Re: RFC: replacing CVS
Hello Tibor et al,

I know that my contribution to this discussion is marginal, because I'm not a core contributor, and for my little patches I'm perfectly happy with quilt. But, like you, I have been following all this SCM/DSCM business during the last few years, and I need to keep track of my own scripts and utilities. My only real (read+write) experience with SCM was with CVS many years ago; I have used Subversion to download a couple of software packages, and I have run some tests with tla, Git and Mercurial.

I subscribe to your original conclusions, and I have not much to add to what the others have said, except some random thoughts.

As we have settled on DrProject[1], a trac fork that can handle multiple projects, for our wiki and task handling, and both trac and DrProject have good integration with Subversion, in principle I favoured this option. But my read-only experience with Subversion has been quite disappointing: when downloading software, it cannot keep the original timestamps of the files, something that CVS does perfectly. This annoys me so much!

My tests with Git and Mercurial have been driven by a quest to find a nice tool for distributed storage for digital preservation[2]. We did some stress tests, ingesting into both of them 500 GB of fat (25-75 MB) TIFF files plus some technical metadata[3]. We found that both are similarly capable and that both handled this load with similar timings and overhead. We were especially interested in Git's content-addressable filesystem[4] concept and in the hash tree's ability to check whether two repositories are identical. We haven't concluded anything yet, except that, compared with the mighty Git, Mercurial is no toy either.

What else? As there is no attractive (to me) centralised SCM option, and being a low-profile developer myself, I'd be happy to go to a DSCM (it can be fun), but the easier the better. I'd be more than happy with Mercurial. Moreover, Mercurial has reached 1.0 this week, it has enthusiastic followers[5], and it has an Emacs frontend[6].

My cent,
Ferran

[1] http://www.drproject.org/
[2] See a summary of our preliminary findings at http://www.cesca.es/promocio/congressos/tsiuc2007/FerranJorba.pdf
[3] This .info file contains the md5sum and the output of ImageMagick's `identify -verbose' (http://www.imagemagick.org/script/identify.php).
[4] http://en.wikipedia.org/wiki/Content-addressable_storage
[5] For example, the first comment at http://lwn.net/Articles/274823/
[6] http://freehg.org/u/agriggio/ahg/
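The hash-tree check mentioned in the message above can be illustrated with a minimal sketch. This is not Git's actual object format: it is a simplified Merkle-style digest over a directory, assuming SHA-1, and the function names `hash_file` and `hash_tree` are hypothetical. Two directory trees with the same file names and contents should produce the same root digest, so comparing one short string is enough to tell whether two copies match.

```python
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Return one digest covering every name and file content under root."""
    digest = hashlib.sha1()
    # Sort entries so the result does not depend on directory listing order.
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            kind, child = "tree", hash_tree(path)
        else:
            kind, child = "blob", hash_file(path)
        digest.update(f"{kind} {name} {child}\n".encode("utf-8"))
    return digest.hexdigest()
```

Reading files in chunks keeps memory use flat even for the 25-75 MB images described above; a change to any single file changes the digest of every directory above it, up to the root.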
CDS Invenio v0.99.0 is released
March 27, 2008
http://cdsware.cern.ch/invenio/news.html

CDS Invenio v0.99.0 was released on March 27, 2008.

What's new:
-----------

 *) new Invenio configuration language, new inveniocfg configuration tool permitting more runtime changes and enabling separate local customizations (MiscUtil)
 *) phased out WML dependency everywhere (all modules)
 *) new common RSS cache implementation (WebSearch)
 *) improved access control to the detailed record pages (WebSearch)
 *) when searching non-existing collections, do not revert to searching in public Home anymore (WebSearch)
 *) strict calculation of number of hits per multiple collections (WebSearch)
 *) propagate properly language environment in browse pages, thanks to Ferran Jorba (WebSearch)
 *) search results sorting made accentless, thanks to Ferran Jorba (WebSearch)
 *) new OpenURL interface (WebSearch)
 *) added new search engine API argument to limit searches to record creation/modification dates and times instead of hitherto creation dates only (WebSearch)
 *) do not allow HTTP POST method for searches to prevent hidden mining (WebSearch)
 *) added alert and RSS teaser for search engine queries (WebSearch)
 *) new optimized index structure for fast integer bit vector operations, leading to significant indexing time improvements (MiscUtil, BibIndex, WebSearch)
 *) new tab-based organisation of detailed record pages, with new URL schema (/record/1/usage) and related CSS changes (BibFormat, MiscUtil, WebComment, WebSearch, WebStyle, WebSubmit)
 *) phased out old PHP based code; migration to Python-based output formats recommended (BibFormat, WebSubmit)
 *) new configurability to show/hide specific output formats for specific collections (BibFormat, WebSearch)
 *) new configurability to have specific stemming settings for specific indexes (BibIndex, WebSearch)
 *) optional removal of LaTeX markup for indexer (BibIndex, WebSearch)
 *) performance optimization for webcoll and optional arguments to refresh only parts of collection cache (WebSearch)
 *) optional verbosity argument propagation to the output formatter (BibFormat, WebSearch)
 *) new convenient reindex option to the indexer (BibIndex)
 *) fixed problem with indexing of some lengthy UTF-8 accented names, thanks to Theodoros Theodoropoulos for reporting the problem (BibIndex)
 *) fixed full-text indexing of HTML pages (BibIndex)
 *) new Stemmer module dependency, fixes issues on 64-bit systems (BibIndex)
 *) fixed download history graph display (BibRank)
 *) improved citation ranking and history graphs, introduced self-citation distinction, added new demo records (BibRank)
 *) fixed range redefinition and output message printing problems in the ranking indexer, thanks to Mike Marino (BibRank)
 *) new XSLT output formatter support; phased out old BFX formats (BibFormat)
 *) I18N output messages are now translated in the output formatter templates (BibFormat)
 *) formats fixed to allow multiple author affiliations (BibFormat)
 *) improved speed of the record output reformatter in case of large sets (BibFormat)
 *) support for displaying LaTeX formulas via JavaScript (BibFormat)
 *) new and improved output formatter elements (BibFormat)
 *) new escaping modes for format elements (BibFormat)
 *) output format template editor cache and element dependency checker improvements (BibFormat)
 *) output formatter speed improvements in PHP-compatible mode (BibFormat)
 *) new demo submission configuration and approval workflow examples (WebSubmit)
 *) new submission full-text file stamper utility (WebSubmit)
 *) new submission icon-creation utility (WebSubmit)
 *) separated submission engine and database layer (WebSubmit)
 *) submission functions can now access user information (WebSubmit)
 *) implemented support for restricted icons (WebSubmit, WebAccess)
 *) new full-text file URL and cleaner storage facility; requires file names to be unique within a given record (WebSearch, WebSubmit)
 *) experimental release of the complex approval and refereeing workflow (WebSubmit)
 *) new end-submission functions to move files to storage space (WebSubmit)
 *) added support for MD5 checking of full-text files (WebSubmit)
 *) improved behaviour of the submission system with respect to the browser back button (WebSubmit)
 *) removed support for submission cookies (WebSubmit)
 *) flexible report number generation during submission (WebSubmit)
 *) added support for optional filtering step in the OAI harvesting chain (BibHarvest)
 *) new text-oriented converter functions IFDEFP, JOINMULTILINES (BibConvert)
 *) selective harvesting improvements, sets, non-standard
Re: release news and commit moratorium
Hi,

just submitting changes in citation_indexer. I'm testing it in a 'clean' installation, but for some minutes 'make test' may not work. Moreover, pylint will complain about this module until tomorrow.

Yours,
Marko
Re: release news and commit moratorium
| just submitting changes in citation_indexer. I'm testing it in a 'clean' installation but for some minutes 'make test' may not work.

run-unit-tests etc work ok now.

Yours,
Marko
Re: release news and commit moratorium
Hi gang:

On Wed, 26 Mar 2008, Tibor Simko wrote:
| 2) As I told you IRL, there is a tentative absolute commit moratorium planned for tonight at 23:59. If you discover any important fixes that should go into the release after that time, please check with me.

The commit moratorium is over now. But please keep in mind that:

- CVS may soon be replaced: I will restart the SCM discussion soon, now that the release is over

- CVS should host only bugfixes to the 0.99.x series, meaning no API changes over 0.99.0, no backward incompatibilities in configurations, no template changes, etc

- all important newly added functionality must be made optional, e.g. via CFG_FOO_BAR config variables to switch it on or off, or via new database tables, etc

- ... and all this so that we can have frequent releases of the 0.99.y series that people could simply drop in place of their older installed 0.99.x version and that their custom templates and etc configurations will continue to work out of the box.

For rougher developments there is the -dev CVS repository (no branches please), and soon we shall possibly have a new SCM (with possibly good branching capabilities). To be continued soon...

Best regards
--
Tibor Simko ** CERN Document Server ** http://cds.cern.ch/
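The rule above that new functionality must be switchable can be sketched roughly as follows. `CFG_FOO_BAR` is the placeholder name used in the message; the `config` dictionary and the three functions are hypothetical illustrations, not Invenio's actual API. The point is only that the 0.99.0 behaviour stays the default and the new code path runs solely when a site opts in.

```python
# Hypothetical configuration; in a real installation the value would come
# from the site configuration rather than from a hard-coded dictionary.
config = {"CFG_FOO_BAR": 0}  # 0 = new feature switched off (0.99.0 behaviour)


def format_title_old(title):
    """Behaviour as shipped in 0.99.0; remains the default."""
    return title


def format_title_new(title):
    """Hypothetical new behaviour added in a later 0.99.y release."""
    return title.strip().capitalize()


def format_title(title, cfg=config):
    """Use the new code path only when the site explicitly enables it."""
    if cfg.get("CFG_FOO_BAR", 0):
        return format_title_new(title)
    return format_title_old(title)
```

Because the flag defaults to off, a site that drops a newer 0.99.y release over an older 0.99.x installation sees no change until it sets the variable.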
Re: Handling of sessions and robots
Hi Greg:

On Wed, 19 Mar 2008, Gregory Favre wrote:
| Googlebot, MSNbot and yahoo slurp accept cookies but do not use them. Therefore, each access creates a new session. [...] Discussing with Sam about this issue, we went to the conclusion that the robots should not receive any session.

The patch looks good, but this technique has several drawbacks: it does not catch all robots, in particular the ones that bypass robots.txt or that pretend to be MSIE (we have been exposed to a few), and it slows down the system a bit for regular end users. Which is why I suggested in the past:

On Fri, 11 Jan 2008, Tibor Simko wrote:
| Another thing is not to distribute sessions to anonymous users,
| i.e. to treat them all as uid=0. This would force the personal
| features to be available only to registered users, but you do that
| on your site anyway, I think.

We have discussed this option within the CDS team and the Inspire collaboration, and we have agreed that we don't need to distinguish between a guest A and a guest B anymore. We did this in the past mostly so that people could try out baskets without having to register. At CERN, there is less need for this now thanks to the Single Sign On, and outside of CERN there may be less need for it too, thanks to the forthcoming basket facility end-user documentation.[1] Therefore please expect a new invenio.conf option soon that would fix this problem once and for all...

[1] I was actually planning to address the -users list in the future to see if people would like to switch or if there is an interest to keep the old behaviour...

Best regards
--
Tibor Simko ** CERN Document Server ** http://cds.cern.ch/
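The two approaches discussed in this message can be contrasted in a short sketch. Everything here is an assumption made for illustration: the pattern list, the function names and the `GUEST_UID` constant are hypothetical and are not taken from the patch or from Invenio itself. The first function guesses robots from the User-Agent header; the other two implement the "all anonymous visitors are uid=0" idea, which needs no guessing at all.

```python
import re

# Assumed, incomplete list of crawler markers; a real list would be longer.
ROBOT_PATTERN = re.compile(r"googlebot|msnbot|slurp", re.IGNORECASE)

GUEST_UID = 0  # shared identity for every anonymous visitor


def is_known_robot(user_agent):
    """First approach: guess from the User-Agent header (easy to evade)."""
    return bool(ROBOT_PATTERN.search(user_agent or ""))


def needs_session(logged_in_uid):
    """Second approach: only registered users get a session of their own."""
    return logged_in_uid is not None and logged_in_uid != GUEST_UID


def resolve_uid(logged_in_uid):
    """Map every anonymous request, human or robot, to the guest uid."""
    return logged_in_uid if needs_session(logged_in_uid) else GUEST_UID
```

A robot that pretends to be MSIE slips past `is_known_robot`, but it still ends up as the shared guest under `resolve_uid`, so no per-visit session is created for it.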
CVS Commit Overview for 2008-03-26
2008-03-26  Jerome Caffaro  jerome.caff...@cern.ch

  * modules/bibclassify/doc/hacking/bibclassify-internals.webdoc:
    Fixed phrase.

2008-03-26  Jerome Caffaro  jerome.caff...@cern.ch

  * modules/webstyle/lib/webdoc.py:
    Fixed 'content' pages to display correctly when translation is missing.

2008-03-26  Tibor Simko  tibor.si...@cern.ch

  * modules/bibedit/lib/bibedit_dblayer.py,
    modules/bibformat/lib/bibformat_utils.py,
    modules/bibformat/lib/elements/bfe_additional_report_numbers.py,
    modules/bibformat/lib/elements/bfe_addresses.py,
    modules/bibformat/lib/elements/bfe_affiliation.py,
    modules/bibformat/lib/elements/bfe_bfx_engine.py,
    modules/bibformat/lib/elements/bfe_cited_by.py,
    modules/bibformat/lib/elements/bfe_collection.py,
    modules/bibformat/lib/elements/bfe_comments.py,
    modules/bibformat/lib/elements/bfe_contact.py,
    modules/bibformat/lib/elements/bfe_creation_date.py,
    modules/bibformat/lib/elements/bfe_date_rec.py,
    modules/bibformat/lib/elements/bfe_editors.py,
    modules/bibformat/lib/elements/bfe_external_publications.py,
    modules/bibformat/lib/elements/bfe_issn.py,
    modules/bibformat/lib/elements/bfe_keywords.py,
    modules/bibformat/lib/elements/bfe_language.py,
    modules/bibformat/lib/elements/bfe_notes.py,
    modules/bibformat/lib/elements/bfe_photo_resources.py,
    modules/bibformat/lib/elements/bfe_publi_info.py,
    modules/bibformat/lib/elements/bfe_record_id.py,
    modules/bibformat/lib/elements/bfe_references.py,
    modules/bibformat/lib/elements/bfe_report_numbers.py,
    modules/bibformat/lib/elements/bfe_test_2.py,
    modules/bibformat/lib/elements/bfe_test_4.py,
    modules/bibformat/lib/elements/bfe_url.py,
    modules/bibformat/lib/elements/bfe_xml_record.py,
    modules/bibformat/lib/elements/test_1.py,
    modules/bibformat/lib/elements/test_5.py,
    modules/bibharvest/lib/oaiarchiveadmin_regression_tests.py,
    modules/bibindex/lib/bibindex_engine_stopwords.py,
    modules/bibrank/lib/bibrank_downloads_indexer.py,
    modules/elmsubmit/lib/elmsubmit_EZArchive.py,
    modules/elmsubmit/lib/elmsubmit_enriched2txt.py,
    modules/elmsubmit/lib/elmsubmit_field_validation.py,
    modules/elmsubmit/lib/elmsubmit_generate_marc.py,
    modules/elmsubmit/lib/elmsubmit_html2txt.py,
    modules/elmsubmit/lib/elmsubmit_misc.py,
    modules/elmsubmit/lib/elmsubmit_richtext2txt.py,
    modules/elmsubmit/lib/elmsubmit_submission_parser.py,
    modules/elmsubmit/lib/myhtmlentitydefs.py,
    modules/miscutil/lib/htmlutils.py,
    modules/webbasket/lib/webbasket_config.py,
    modules/webcomment/lib/webcomment_config.py,
    modules/webjournal/lib/widgets/bfe_webjournal_widget_latestPhoto.py,
    modules/webmessage/lib/webmessage_config.py,
    modules/webmessage/lib/webmessage_mailutils.py,
    modules/websearch/lib/search_engine_config.py,
    modules/websearch/lib/websearch_external_collections_config.py,
    modules/websearch/lib/websearch_external_collections_getter.py,
    modules/websearch/lib/websearch_external_collections_parser.py,
    modules/websubmit/lib/websubmit_file_stamper.py,
    modules/websubmit/lib/websubmitadmin_dblayer.py,
    modules/websubmit/lib/functions/CaseEDS.py,
    modules/websubmit/lib/functions/Create_Modify_Interface.py,
    modules/websubmit/lib/functions/Create_Recid.py,
    modules/websubmit/lib/functions/Print_Success_DEL.py,
    modules/websubmit/lib/functions/Print_Success_SRV.py,
    modules/websubmit/lib/functions/Shared_Functions.py,
    modules/websubmit/lib/functions/Stamp_Uploaded_Files.py:
    Deleted trailing whitespace in all Python files.

2008-03-26  Jerome Caffaro  jerome.caff...@cern.ch

  * modules/webstyle/doc/hacking/Makefile.am,
    modules/webstyle/doc/hacking/webstyle-webdoc-syntax.webdoc:
    Initial release of the WebDoc syntax guide.

2008-03-26  Jerome Caffaro  jerome.caff...@cern.ch

  * modules/bibformat/doc/hacking/bibformat-api.webdoc:
    Fixed HTML.

2008-03-26  Jerome Caffaro  jerome.caff...@cern.ch

  * modules/webstyle/doc/admin/webstyle-admin-guide.webdoc:
    Added link to WebDoc syntax.

2008-03-26  Jerome Caffaro  jerome.caff...@cern.ch

  * modules/webstyle/doc/admin/webstyle-admin-guide.webdoc:
    Replaced config.py by invenio.conf

2008-03-26  Jerome Caffaro  jerome.caff...@cern.ch

  * modules/webstyle/doc/admin/webstyle-admin-guide.webdoc:
    Updated doc.

2008-03-26  Tibor Simko  tibor.si...@cern.ch

  * modules/bibclassify/lib/bibclassify_daemon.py:
    Improved code kwalitee.

2008-03-26  Jerome Caffaro  jerome.caff...@cern.ch

  *