Re: RFC: replacing CVS

2008-03-27 Thread Ferran Jorba
Hello Tibor et al,

I know that my contribution to this discussion is absolutely marginal,
because I'm not a core contributor, and for my little patches, I'm
perfectly happy with quilt.

But, like you, I have been following all this SCM-DSCM business during
the last few years, and I have a need myself to keep track of my
scripts and utilities.  My only real experience (read+write) with SCM
was with CVS many years ago, and I have used Subversion for
downloading a couple of software packages and have done some tests with
tla, Git and Mercurial.

I subscribe to your original conclusions, and I don't have much to
add to what the others have said, except for some random thoughts.

As we have settled on DrProject[1], a Trac fork that can handle
multiple projects, for our wiki and task handling, and as both Trac and
DrProject integrate well with Subversion, in principle I was
favouring this option.

But my read-only experience with Subversion has been quite
disappointing: when downloading software, I saw that it does not
keep the original timestamps of the files, something that CVS does
perfectly.  This annoys me so much!

My tests with Git and Mercurial have been driven by a quest to find a
nice tool for distributed storage for digital preservation[2].  We ran
ingestion stress tests with both, using 500 GB of large (25-75 MB) TIFF
files plus some technical metadata[3].  We found that both are
similarly capable: they handled this load with similar timings and
overhead.  We were especially interested in Git's
content-addressable filesystem[4] concept and in the hash-tree ability to
check whether two repositories are identical.  We haven't concluded
anything yet, except that, compared with the mighty Git, Mercurial is
no toy either.
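To illustrate the content-addressable idea, here is a minimal Python
sketch.  blob_id() follows the scheme Git uses to name file contents
(SHA-1 over a "blob <size>" header, a NUL byte and the data); tree_id()
and identical() are hypothetical helpers that use a simplified hash
tree, not Git's actual tree object format, to decide whether two
directory trees hold the same content by comparing one root hash.

```python
import hashlib
import os


def blob_id(data: bytes) -> str:
    """Name file contents as Git does: SHA-1 of header, NUL byte and data."""
    return hashlib.sha1((b"blob %d\0" % len(data)) + data).hexdigest()


def tree_id(path: str) -> str:
    """Simplified hash tree (NOT Git's tree format): hash of sorted child ids."""
    lines = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            kind, child = "tree", tree_id(full)
        else:
            with open(full, "rb") as f:
                kind, child = "blob", blob_id(f.read())
        lines.append(f"{kind} {child} {name}\n")
    return hashlib.sha1("".join(lines).encode("utf-8")).hexdigest()


def identical(repo_a: str, repo_b: str) -> bool:
    """Equal root hashes mean equal trees (barring SHA-1 collisions)."""
    return tree_id(repo_a) == tree_id(repo_b)
```

Reading each file whole keeps the sketch short; for much larger files
one would take the size from os.path.getsize() and hash in chunks.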

What else?  As there is no attractive (to me) centralised SCM option,
and being myself a low-profile developer, I'd be happy to go to a DSCM
(it can be fun), but the easier the better.  I'd be more than happy
with Mercurial.

Moreover, Mercurial reached 1.0 this week, has enthusiastic
followers[5], and has an Emacs frontend[6].

My cent,

Ferran


[1] http://www.drproject.org/
[2] See a summary of our preliminary findings at
http://www.cesca.es/promocio/congressos/tsiuc2007/FerranJorba.pdf
[3] This .info file contains the md5sum and the output of
ImageMagick's `identify -verbose'
(http://www.imagemagick.org/script/identify.php).
[4] http://en.wikipedia.org/wiki/Content-addressable_storage
[5] For example, the first comment at http://lwn.net/Articles/274823/
[6] http://freehg.org/u/agriggio/ahg/



CDS Invenio v0.99.0 is released

2008-03-27 Thread Tibor Simko

CDS Invenio v0.99.0 is released
March 27, 2008
http://cdsware.cern.ch/invenio/news.html


CDS Invenio v0.99.0 was released on March 27, 2008.

What's new:
---

 *) new Invenio configuration language, new inveniocfg configuration
tool permitting more runtime changes and enabling separate local
customizations (MiscUtil)

 *) phased out WML dependency everywhere (all modules)

 *) new common RSS cache implementation (WebSearch)

 *) improved access control to the detailed record pages (WebSearch)

 *) when searching non-existing collections, do not revert to
searching in public Home anymore (WebSearch)

 *) strict calculation of number of hits per multiple collections
(WebSearch)

 *) properly propagate language environment in browse pages, thanks to
Ferran Jorba (WebSearch)

 *) search results sorting made accentless, thanks to Ferran Jorba
(WebSearch)

 *) new OpenURL interface (WebSearch)

 *) added new search engine API argument to limit searches to record
creation/modification dates and times instead of only creation
dates as hitherto (WebSearch)

 *) do not allow HTTP POST method for searches to prevent hidden
mining (WebSearch)

 *) added alert and RSS teaser for search engine queries (WebSearch)

 *) new optimized index structure for fast integer bit vector
operations, leading to significant indexing time improvements
(MiscUtil, BibIndex, WebSearch)

 *) new tab-based organisation of detailed record pages, with new URL
schema (/record/1/usage) and related CSS changes (BibFormat,
MiscUtil, WebComment, WebSearch, WebStyle, WebSubmit)

 *) phased out old PHP-based code; migration to Python-based output
formats recommended (BibFormat, WebSubmit)

 *) new configurability to show/hide specific output formats for
specific collections (BibFormat, WebSearch)

 *) new configurability to have specific stemming settings for
specific indexes (BibIndex, WebSearch)

 *) optional removal of LaTeX markup for indexer (BibIndex, WebSearch)

 *) performance optimization for webcoll and optional arguments to
refresh only parts of collection cache (WebSearch)

 *) optional verbosity argument propagation to the output formatter
(BibFormat, WebSearch)

 *) new convenient reindex option to the indexer (BibIndex)

 *) fixed problem with indexing of some lengthy UTF-8 accented names,
thanks to Theodoros Theodoropoulos for reporting the problem
(BibIndex)

 *) fixed full-text indexing of HTML pages (BibIndex)

 *) new Stemmer module dependency, fixes issues on 64-bit systems
(BibIndex)

 *) fixed download history graph display (BibRank)

 *) improved citation ranking and history graphs, introduced
self-citation distinction, added new demo records (BibRank)

 *) fixed range redefinition and output message printing problems in
the ranking indexer, thanks to Mike Marino (BibRank)

 *) new XSLT output formatter support; phased out old BFX formats
(BibFormat)

 *) I18N output messages are now translated in the output formatter
templates (BibFormat)

 *) formats fixed to allow multiple author affiliations (BibFormat)

 *) improved speed of the record output reformatter in case of large
sets (BibFormat)

 *) support for displaying LaTeX formulas via JavaScript (BibFormat)

 *) new and improved output formatter elements (BibFormat)

 *) new escaping modes for format elements (BibFormat)

 *) output format template editor cache and element dependency
checker improvements (BibFormat)

 *) output formatter speed improvements in PHP-compatible mode
(BibFormat)

 *) new demo submission configuration and approval workflow examples
(WebSubmit)

 *) new submission full-text file stamper utility (WebSubmit)

 *) new submission icon-creation utility (WebSubmit)

 *) separated submission engine and database layer (WebSubmit)

 *) submission functions can now access user information (WebSubmit)

 *) implemented support for restricted icons (WebSubmit, WebAccess)

 *) new full-text file URL and cleaner storage facility; requires file
names to be unique within a given record (WebSearch, WebSubmit)

 *) experimental release of the complex approval and refereeing
workflow (WebSubmit)

 *) new end-submission functions to move files to storage space
(WebSubmit)

 *) added support for MD5 checking of full-text files (WebSubmit)

 *) improved behaviour of the submission system with respect to the
browser back button (WebSubmit)

 *) removed support for submission cookies (WebSubmit)

 *) flexible report number generation during submission (WebSubmit)

 *) added support for optional filtering step in the OAI harvesting
chain (BibHarvest)

 *) new text-oriented converter functions IFDEFP, JOINMULTILINES
(BibConvert)

 *) selective harvesting improvements, sets, non-standard 

Re: release news and commit moratorium

2008-03-27 Thread Marko Niinimaki
Hi,

just submitting changes in citation_indexer. I'm testing it in a
'clean' installation, but for a few minutes 'make test' may not work.
Moreover, pylint will complain about this module until tomorrow.

Yours,
Marko



Re: release news and commit moratorium

2008-03-27 Thread Marko Niinimaki
  just submitting changes in citation_indexer. I'm testing it in a
  'clean' installation, but for a few minutes 'make test' may not work.

run-unit-tests etc. work OK now.

Yours,
Marko



Re: release news and commit moratorium

2008-03-27 Thread Tibor Simko
Hi gang:

On Wed, 26 Mar 2008, Tibor Simko wrote:

 2) As I told you IRL, there is a tentative absolute commit moratorium
 planned for tonight at 23:59.  If you discover any important fixes
 that should go into the release after that time, please check with me.

The commit moratorium is over now.  But please keep in mind that:

 - CVS may soon be replaced: I will restart the SCM discussion soon
   now that the release is over

 - CVS should host only bugfixes to the 0.99.x series, meaning no API
   changes over 0.99.0, no backward incompatibilities in
   configurations, no template changes, etc

 - all important newly added functionality must be made optional,
   e.g. via CFG_FOO_BAR config variables to switch it on or off, or
   via new database tables, etc

 - ... and all this so that we can have frequent releases of the
   0.99.y series that people can simply drop in place of their older
   installed 0.99.x version, with their custom templates and etc
   configurations continuing to work out of the box.
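A minimal sketch of the "optional via CFG_FOO_BAR" rule, assuming the
variables are read from an INI-style invenio.conf with an [Invenio]
section (an assumption for illustration) and reusing CFG_FOO_BAR as the
same placeholder name as above; render_record() is a hypothetical
caller.  The point is that the default keeps the old behaviour, so a
drop-in 0.99.y upgrade changes nothing until a site switches the
feature on.

```python
import configparser

# Hypothetical defaults: each new feature ships switched off, so an
# upgraded 0.99.x installation behaves exactly as before.
DEFAULTS = {"CFG_FOO_BAR": "0"}


def load_config(text: str) -> dict:
    """Merge site settings from invenio.conf-style text over the defaults."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep CFG_* names in upper case
    parser.read_string(text)
    values = dict(DEFAULTS)
    if parser.has_section("Invenio"):  # assumed section name
        values.update(parser["Invenio"])
    return values


def render_record(recid: int, cfg: dict) -> str:
    """Hypothetical caller: new output appears only when explicitly enabled."""
    output = f"record {recid}"
    if cfg.get("CFG_FOO_BAR") == "1":
        output += " [foo-bar]"
    return output
```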

For rougher developments there is the -dev CVS repository (no
branches please), and soon we shall possibly have a new SCM (with
possibly good branching capabilities).

To be continued soon...

Best regards
-- 
Tibor Simko ** CERN Document Server ** http://cds.cern.ch/



Re: Handling of sessions and robots

2008-03-27 Thread Tibor Simko
Hi Greg:

On Wed, 19 Mar 2008, Gregory Favre wrote:

 Googlebot, MSNbot and Yahoo! Slurp accept cookies but do not use
 them. Therefore, each access creates a new session. [...]
 After discussing this issue with Sam, we came to the conclusion that
 the robots should not receive any session.

The patch looks good, but this technique has drawbacks: it does not
catch all robots, in particular those that bypass robots.txt or
pretend to be MSIE (we have been exposed to a few), and it slows
down the system a bit for regular end users.  This is why I
suggested in the past:

On Fri, 11 Jan 2008, Tibor Simko wrote:

| Another thing is not to distribute sessions to anonymous users,
| i.e. to treat them all as uid=0.  This would force the personal
| features to be available only to registered users, but you do that
| on your site anyway, I think.

We have discussed this option within the CDS team and the Inspire
collaboration, and we have agreed that we don't need to distinguish
between a guest A and a guest B anymore.  We did this in the past
mostly so that people could try out baskets without having to
register.  At CERN, there is less need now for this due to the Single
Sign On, and outside of CERN, there may be less need for this too due
to forthcoming basket facility end-user documentation.[1]

Therefore, please expect a new invenio.conf option soon that would fix
this problem once and for all...
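A rough sketch of the difference between the two approaches, assuming a
toy in-memory session table; the CFG_DISTINGUISH_GUESTS flag,
current_uid() and the uid allocation are hypothetical and do not
reflect the name or code of the announced invenio.conf option.  With
guests distinguished, a robot that accepts a cookie but never returns
it creates a new session on every request; with all anonymous users
treated as uid=0, no session is written at all, whatever the
User-Agent says.

```python
from typing import Dict, Optional

# Hypothetical switch; the real invenio.conf option name is not given here.
CFG_DISTINGUISH_GUESTS = False

GUEST_UID = 0  # every anonymous visitor shares this uid in the new scheme


def current_uid(sessions: Dict[str, int], session_id: Optional[str],
                distinguish_guests: bool = CFG_DISTINGUISH_GUESTS) -> int:
    """Return the uid for a request, writing a session only when needed."""
    if session_id is not None and session_id in sessions:
        # Visitor sent back a known session cookie: reuse its uid
        return sessions[session_id]
    if not distinguish_guests:
        # New scheme: anonymous users (robots included) are all uid=0
        # and nothing is stored, so cookie-ignoring crawlers cost nothing
        return GUEST_UID
    # Old scheme: mint a distinct guest and store a fresh session; a robot
    # that never returns the cookie ends up here on every request
    n = len(sessions) + 1
    sid = f"guest-{n}"
    sessions[sid] = 1000 + n  # toy uid allocation, not Invenio's
    return sessions[sid]
```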

[1] I was actually planning to address the -users list in the future
to see whether people would like to switch or whether there is interest
in keeping the old behaviour...

Best regards
-- 
Tibor Simko ** CERN Document Server ** http://cds.cern.ch/



CVS Commit Overview for 2008-03-26

2008-03-27 Thread CDS Invenio CVS
CVS Commit Overview for 2008-03-26
==

2008-03-26  Jerome Caffaro jerome.caff...@cern.ch

* modules/bibclassify/doc/hacking/bibclassify-internals.webdoc:
Fixed phrase.

2008-03-26  Jerome Caffaro jerome.caff...@cern.ch

* modules/webstyle/lib/webdoc.py: Fixed 'content' pages to display
correctly when translation is missing.

2008-03-26  Tibor Simko tibor.si...@cern.ch

* modules/bibedit/lib/bibedit_dblayer.py,
modules/bibformat/lib/bibformat_utils.py,
modules/bibformat/lib/elements/bfe_additional_report_numbers.py,
modules/bibformat/lib/elements/bfe_addresses.py,
modules/bibformat/lib/elements/bfe_affiliation.py,
modules/bibformat/lib/elements/bfe_bfx_engine.py,
modules/bibformat/lib/elements/bfe_cited_by.py,
modules/bibformat/lib/elements/bfe_collection.py,
modules/bibformat/lib/elements/bfe_comments.py,
modules/bibformat/lib/elements/bfe_contact.py,
modules/bibformat/lib/elements/bfe_creation_date.py,
modules/bibformat/lib/elements/bfe_date_rec.py,
modules/bibformat/lib/elements/bfe_editors.py,
modules/bibformat/lib/elements/bfe_external_publications.py,
modules/bibformat/lib/elements/bfe_issn.py,
modules/bibformat/lib/elements/bfe_keywords.py,
modules/bibformat/lib/elements/bfe_language.py,
modules/bibformat/lib/elements/bfe_notes.py,
modules/bibformat/lib/elements/bfe_photo_resources.py,
modules/bibformat/lib/elements/bfe_publi_info.py,
modules/bibformat/lib/elements/bfe_record_id.py,
modules/bibformat/lib/elements/bfe_references.py,
modules/bibformat/lib/elements/bfe_report_numbers.py,
modules/bibformat/lib/elements/bfe_test_2.py,
modules/bibformat/lib/elements/bfe_test_4.py,
modules/bibformat/lib/elements/bfe_url.py,
modules/bibformat/lib/elements/bfe_xml_record.py,
modules/bibformat/lib/elements/test_1.py,
modules/bibformat/lib/elements/test_5.py,
modules/bibharvest/lib/oaiarchiveadmin_regression_tests.py,
modules/bibindex/lib/bibindex_engine_stopwords.py,
modules/bibrank/lib/bibrank_downloads_indexer.py,
modules/elmsubmit/lib/elmsubmit_EZArchive.py,
modules/elmsubmit/lib/elmsubmit_enriched2txt.py,
modules/elmsubmit/lib/elmsubmit_field_validation.py,
modules/elmsubmit/lib/elmsubmit_generate_marc.py,
modules/elmsubmit/lib/elmsubmit_html2txt.py,
modules/elmsubmit/lib/elmsubmit_misc.py,
modules/elmsubmit/lib/elmsubmit_richtext2txt.py,
modules/elmsubmit/lib/elmsubmit_submission_parser.py,
modules/elmsubmit/lib/myhtmlentitydefs.py,
modules/miscutil/lib/htmlutils.py,
modules/webbasket/lib/webbasket_config.py,
modules/webcomment/lib/webcomment_config.py,
modules/webjournal/lib/widgets/bfe_webjournal_widget_latestPhoto.py,
modules/webmessage/lib/webmessage_config.py,
modules/webmessage/lib/webmessage_mailutils.py,
modules/websearch/lib/search_engine_config.py,
modules/websearch/lib/websearch_external_collections_config.py,
modules/websearch/lib/websearch_external_collections_getter.py,
modules/websearch/lib/websearch_external_collections_parser.py,
modules/websubmit/lib/websubmit_file_stamper.py,
modules/websubmit/lib/websubmitadmin_dblayer.py,
modules/websubmit/lib/functions/CaseEDS.py,
modules/websubmit/lib/functions/Create_Modify_Interface.py,
modules/websubmit/lib/functions/Create_Recid.py,
modules/websubmit/lib/functions/Print_Success_DEL.py,
modules/websubmit/lib/functions/Print_Success_SRV.py,
modules/websubmit/lib/functions/Shared_Functions.py,
modules/websubmit/lib/functions/Stamp_Uploaded_Files.py: Deleted
trailing whitespace in all Python files.

2008-03-26  Jerome Caffaro jerome.caff...@cern.ch

* modules/webstyle/doc/hacking/Makefile.am,
modules/webstyle/doc/hacking/webstyle-webdoc-syntax.webdoc: Initial
release of the WebDoc syntax guide.

2008-03-26  Jerome Caffaro jerome.caff...@cern.ch

* modules/bibformat/doc/hacking/bibformat-api.webdoc: Fixed HTML.

2008-03-26  Jerome Caffaro jerome.caff...@cern.ch

* modules/webstyle/doc/admin/webstyle-admin-guide.webdoc: Added
link to WebDoc syntax.

2008-03-26  Jerome Caffaro jerome.caff...@cern.ch

* modules/webstyle/doc/admin/webstyle-admin-guide.webdoc: Replaced
config.py by invenio.conf

2008-03-26  Jerome Caffaro jerome.caff...@cern.ch

* modules/webstyle/doc/admin/webstyle-admin-guide.webdoc: Updated
doc.

2008-03-26  Tibor Simko tibor.si...@cern.ch

* modules/bibclassify/lib/bibclassify_daemon.py: Improved code
kwalitee.

2008-03-26  Jerome Caffaro jerome.caff...@cern.ch

*