Thank you very much for your suggestions, Jerome. It'll give me a good
starting point.
Kind Regards
Sergio

On Fri, Oct 31, 2008 at 8:30 AM, Jerome Caffaro <[email protected]> wrote:

> Dear Sergio,
>
> Sergio Laberer wrote:
>
>> The documents within such a collection are now exported by someone via
>> Excel and processed further outside CDS Invenio. This person would now be
>> interested in being able to pick up the incremental additions to this
>> collection since they last exported the documents.
>>
>
> You could simply implement a check in the Excel output:
>
> for each record:
>     if the record modification/addition time is older
>        than the last export time:
>         skip record
>     else:
>         format record to Excel
> save current export time in a file
> (also save the collection if the export is done on a collection basis)
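
The check outlined above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: records are taken to be plain (record_id, modification_time) pairs, the state is kept in a small JSON file keyed by collection name, and every function name here is hypothetical. None of it is part of bibformat.py, and obtaining the modification times from CDS Invenio is left out.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_last_export(state_file, collection):
    """Return the last export time of a collection, or None if never exported."""
    path = Path(state_file)
    if not path.exists():
        return None
    stamp = json.loads(path.read_text()).get(collection)
    return datetime.fromisoformat(stamp) if stamp else None


def save_last_export(state_file, collection, when):
    """Store the export time per collection (hypothetical JSON state file)."""
    path = Path(state_file)
    state = json.loads(path.read_text()) if path.exists() else {}
    state[collection] = when.isoformat()
    path.write_text(json.dumps(state))


def records_to_export(records, last_export):
    """Skip records older than the last export; keep all on a first export."""
    if last_export is None:
        return list(records)
    return [(recid, mtime) for recid, mtime in records if mtime >= last_export]


def incremental_export(records, collection, state_file, now=None):
    """Select the records to format to Excel, then save the export time."""
    now = now or datetime.now(timezone.utc)
    selected = records_to_export(records, load_last_export(state_file, collection))
    save_last_export(state_file, collection, now)
    return selected
```

The export time is recorded as soon as the selection is made, so this sketch shares the weaknesses discussed further down: a cancelled download or two concurrent exports would still advance the saved time.
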
>
> You can find the Excel export function in
> /opt/cds-invenio/lib/python/invenio/bibformat.py
>
> That would however not be very efficient and could lead to a
> few problems (see below).
>
>> Alternatively, I was thinking of possibly doing this by adding a field for
>> each document, which indicates whether the document was already exported or
>> not. However, I would then have to batch update all those fields after the
>> documents are exported to Excel. Do you think this would be feasible? If so,
>> do you have some pointers?
>>
>>
>
> That would be feasible: at export time (see "algorithm" above), just
> start a BibUpload task with the updated XML. That would be as simple
> as adding a field:
>
> E.g. generate /tmp/your_modif.xml:
> <record>
>  <controlfield tag="001">XXXX</controlfield>
>  <datafield tag="999" ind1="9" ind2="9">
>   <subfield code="a">PROCESSED</subfield>
>  </datafield>
> </record>
>
> and then:
> $ /opt/cds-invenio/bin/bibupload -a /tmp/your_modif.xml
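
Generating that file for a whole batch of exported records could look roughly like the sketch below. It is only an illustration: the function names are hypothetical, the 999/9/9 "PROCESSED" marker is simply the one from the example above, and wrapping several records in a `<collection>` element is an assumption about what BibUpload accepts that should be checked against the admin guide. The only command-line usage relied on is the `bibupload -a` call quoted above.

```python
import subprocess
import xml.etree.ElementTree as ET

BIBUPLOAD = "/opt/cds-invenio/bin/bibupload"  # path quoted in the message above


def build_marker_xml(record_ids, tag="999", ind1="9", ind2="9", value="PROCESSED"):
    """Build MARCXML adding one marker field to each given record ID."""
    # Assumption: several <record> elements may be wrapped in <collection>.
    collection = ET.Element("collection")
    for recid in record_ids:
        record = ET.SubElement(collection, "record")
        ET.SubElement(record, "controlfield", {"tag": "001"}).text = str(recid)
        field = ET.SubElement(
            record, "datafield", {"tag": tag, "ind1": ind1, "ind2": ind2}
        )
        ET.SubElement(field, "subfield", {"code": "a"}).text = value
    return ET.tostring(collection, encoding="unicode")


def upload_marker(record_ids, xml_path="/tmp/your_modif.xml"):
    """Write the XML to disk and hand it to bibupload in append (-a) mode."""
    with open(xml_path, "w", encoding="utf-8") as handle:
        handle.write(build_marker_xml(record_ids))
    subprocess.run([BIBUPLOAD, "-a", xml_path], check=True)
```

Calling `upload_marker([12, 34])` after a successful export would then mark records 12 and 34 in one BibUpload task instead of one task per record.
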
>
> You could even update the collection of the record, if you want
> to have a collection tree in this form:
> "Collection YYYYY"
>  -> "Collection YYYYY new"
>  -> "Collection YYYYY processed"
>
> See BibUpload admin guide:
> <http://invenio-demo.cern.ch/help/admin/bibupload-admin-guide>
>
> However I see a problem in this workflow: what if someone exports the
> data to an Excel output, but does not save the file (clicks "Cancel"
> by mistake when asked to save)? The records would already be marked as
> processed, so it would no longer be possible to re-export them "easily".
>
> Also there would be a problem if the documents are exported
> concurrently, by several people at the same time.
>
>> While there is the alerting functionality which in my view would solve this
>> matter, they actually would like to do this whenever they want and not
>> be bound to a regular schedule.
>>
>
> If they set up daily alerts that go to some dedicated mailbox,
> they could process them whenever they want. That would also give them
> a good idea of the amount of work ("ZZZZ unread messages").
>
> The alerts could all go to a shared mailbox, or could be set up so
> that they go to a particular person depending on the collection.
>
> Note that alerts can be sent to WebBaskets too: your users could
> progressively empty their baskets while records are processed.  Also
> the contents of WebBaskets can be exported to some output formats: you
> could plug one of the possibilities discussed above in this module
> (remove records from the basket once they have been exported). However
> the same limitations apply.
>
>> If I understand the new feature you mentioned, there would be a staging
>> area where the latest additions would be queued before being promoted to be
>> viewable. Will those "pre-release" documents already be within the database,
>> i.e. searchable, or would they simply be queued before being loaded into the
>> repository? If it is the former, then I could see a way around my problem.
>> The latter might not solve my problem.
>>
>
> They would be queued before being searchable/viewable.
>
> Then you might think about setting up a second Invenio server: Records
> are harvested from external sources on the first Invenio instance,
> while the second Invenio instance harvests from the first one. The
> first server would be for search/view, while the second would be used
> only for its possibility to queue new records before they are
> integrated. That could however be a bit heavy for your needs.
>
> Finally, consider using CDS Invenio itself to process the records
> instead of producing Excel listings for external processing:
> with some custom WebSubmit submissions you can achieve quite powerful
> workflows.
>
> Best regards
>
> --
> Jerome Caffaro ** CERN Document Server ** <http://cds.cern.ch/>
>
>
