Thank you very much for your suggestions, Jerome. It'll give me a good starting point.

Kind regards,
Sergio
On Fri, Oct 31, 2008 at 8:30 AM, Jerome Caffaro wrote:

> Dear Sergio,
>
> Sergio Laberer wrote:
>
>> The documents within such a collection are now exported by someone via
>> Excel and processed further outside CDS-invenio. This person would now be
>> interested in being able to pick up the incremental additions to this
>> collection since they exported the documents last.
>>
>
> You could simply implement a check in the Excel output:
>
>   for each record:
>       if the record modification/addition time is older
>          than the last export time:
>           skip record
>       else:
>           format record to Excel
>   save current export time in a file
>   (also save the collection if export is done on a collection basis)
>
> You can find the Excel export function in
> /opt/cds-invenio/lib/python/invenio/bibformat.py
>
> That would however not be very efficient and could lead to a
> few problems (see below).
>
>> Alternatively, I was thinking of possibly doing this by adding a field for
>> each document, which indicates whether the document was already exported or
>> not. However, I would then have to batch update all those fields after they
>> have been exported to Excel. Do you think this would be feasible? If so, do
>> you have some pointers?
>>
>
> That would be feasible: at export time (see "algorithm" above), just
> start a BibUpload task with the updated XML. That would be as simple
> as adding a field.
>
> For example, generate /tmp/your_modif.xml:
>
>   <record>
>     <controlfield tag="001">XXXX</controlfield>
>     <datafield tag="999" ind1="9" ind2="9">
>       <subfield code="a">PROCESSED</subfield>
>     </datafield>
>   </record>
>
> and then:
>
>   $ /opt/cds-invenio/bin/bibupload -a /tmp/your_modif.xml
>
> You could even update the collection of the record, if you want
> to have a collection tree in this form:
>
>   "Collection YYYYY"
>      -> "Collection YYYYY new"
>      -> "Collection YYYYY processed"
>
> See the BibUpload admin guide:
> <http://invenio-demo.cern.ch/help/admin/bibupload-admin-guide>
>
> However, I see a problem in this workflow: what if someone exports the
> data to an Excel output but does not save the file (clicks "Cancel" by
> mistake when asked to save)? It would no longer be possible to
> re-export these records easily.
>
> There would also be a problem if the documents are exported
> concurrently by several people at the same time.
>
>> While there is the alerting functionality which in my view would solve this
>> matter, they actually would like to do this whenever they want and not
>> be bound to a regular schedule.
>>
>
> If they set up a daily alert that goes to some dedicated mailbox,
> they could process the alerts whenever they want. That would also give
> them a good idea of the amount of work ("ZZZZ unread messages").
>
> The alerts could all go to a shared mailbox, or could be set up so
> that they go to a particular person depending on the collection.
>
> Note that alerts can be sent to WebBaskets too: your users could
> progressively empty their baskets as records are processed. Also,
> the contents of WebBaskets can be exported to some output formats: you
> could plug one of the possibilities discussed above into this module
> (removing records from the basket once they have been exported). However,
> the same limitations apply.
>
>> If I understand the new feature you mentioned, there would be a staging
>> area where the latest additions would be queued before being promoted to be
>> viewable. Will those "pre-release" documents already be in the database,
>> i.e. searchable, or would they simply be queued before being loaded into the
>> repository? If it is the former, then I could see a way around my problem.
>> The latter might not solve my problem.
>>
>
> They would be queued before being searchable/viewable.
>
> Then you might think about setting up a second Invenio server: records
> are harvested from external sources on the first Invenio instance,
> while the second Invenio instance harvests from the first one. The
> first server would be for search/view, while the second would be used
> only for its ability to queue new records before they are
> integrated. That could however be a bit heavy for your needs.
>
> Finally, consider the possibility of using CDS Invenio to process the
> records instead of producing Excel listings for external processing:
> with some custom WebSubmit submissions you can achieve quite powerful
> workflows.
>
> Best regards
>
> --
> Jerome Caffaro ** CERN Document Server ** <http://cds.cern.ch/>
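The export-time check and the "mark as PROCESSED" step Jerome describes can be combined in a short Python sketch. It assumes each record is available as a (recid, modification datetime) pair and that the last export time is kept in a plain state file; the names `select_new_records`, `processed_marcxml`, `read_last_export_time`, `save_export_time` and `EXPORT_STATE_FILE` are hypothetical and are not part of the CDS Invenio API or of bibformat.py. The generated MARCXML mirrors the 999 $a PROCESSED example above, wrapped in a MARCXML `<collection>` element so several records fit in one file (an assumption worth checking against the BibUpload admin guide).

```python
import os
import tempfile
from datetime import datetime
from xml.sax.saxutils import escape

# Hypothetical location of the state file holding the last export time.
EXPORT_STATE_FILE = os.path.join(tempfile.gettempdir(), "last_export_time.txt")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def read_last_export_time(path=EXPORT_STATE_FILE):
    """Return the saved export time, or None if nothing was exported yet."""
    try:
        with open(path) as f:
            return datetime.strptime(f.read().strip(), TIME_FORMAT)
    except (FileNotFoundError, ValueError):
        return None


def save_export_time(when, path=EXPORT_STATE_FILE):
    """Persist the export time so the next run only picks up newer records."""
    with open(path, "w") as f:
        f.write(when.strftime(TIME_FORMAT))


def select_new_records(records, last_export):
    """Skip records modified/added before the last export; keep the rest.

    `records` is an iterable of (recid, modification_datetime) pairs.
    """
    if last_export is None:
        return [recid for recid, _ in records]
    return [recid for recid, modified in records if modified > last_export]


def processed_marcxml(recids, marker="PROCESSED"):
    """Build MARCXML that adds a 999 9 9 $a marker field to each record."""
    parts = ["<collection>"]
    for recid in recids:
        parts.append(
            "  <record>\n"
            f'    <controlfield tag="001">{int(recid)}</controlfield>\n'
            '    <datafield tag="999" ind1="9" ind2="9">\n'
            f'      <subfield code="a">{escape(marker)}</subfield>\n'
            "    </datafield>\n"
            "  </record>"
        )
    parts.append("</collection>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Illustrative data only: (recid, last modification time) pairs.
    records = [
        (1, datetime(2008, 10, 1, 12, 0)),
        (2, datetime(2008, 10, 30, 17, 0)),
        (3, datetime(2008, 10, 31, 9, 0)),
    ]
    last_export = datetime(2008, 10, 31, 8, 30)
    new_recids = select_new_records(records, last_export)
    print(new_recids)  # [3]
    print(processed_marcxml(new_recids))
```

The XML string would then be written to a file such as /tmp/your_modif.xml and passed to `/opt/cds-invenio/bin/bibupload -a`, as in the example above. Two design choices follow from the caveats in the thread: capture the new export time before selecting records, so records changed during the export are not lost, and only call `save_export_time` (or run BibUpload) once the user has actually saved the Excel file, so a mistaken "Cancel" does not hide records from the next export. Neither choice addresses concurrent exports by several people.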
