Re: Re-implementation of OAI repository in Invenio

2011-10-03 Thread Samuele Kaplun
Dear All,

On Mon, 03/10/2011 at 10:27 +0200, Samuele Kaplun wrote:
 So what about having a new MARC tag 909CP with the following assignment:
 
 * baseURL - $u
 * identifier - $i
 * datestamp - $d
 * metadataNamespace - $m
 * originDescription - $o
 * harvestDate - $h
 * altered - $a
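To make the proposed 909CP mapping concrete, here is a minimal sketch that renders harvested OAI-PMH provenance values as a MARCXML datafield. The helper name `build_provenance_field` and the input dict shape are hypothetical; only the subfield letters come from the list above, and writing `altered` as the strings true/false is an assumption.

```python
import xml.etree.ElementTree as ET

# Element -> subfield code, as proposed above for the new 909CP tag.
PROVENANCE_SUBFIELDS_909CP = {
    "baseURL": "u",
    "identifier": "i",
    "datestamp": "d",
    "metadataNamespace": "m",
    "originDescription": "o",
    "harvestDate": "h",
    "altered": "a",
}


def build_provenance_field(provenance):
    """Hypothetical helper: turn a dict of OAI-PMH provenance values into a
    MARCXML <datafield> for tag 909, ind1 'C', ind2 'P' (i.e. "909CP")."""
    field = ET.Element("datafield", tag="909", ind1="C", ind2="P")
    for element, code in PROVENANCE_SUBFIELDS_909CP.items():
        value = provenance.get(element)
        if value is None:
            continue  # element absent from the harvested record: emit nothing
        # Assumption: booleans (e.g. 'altered') are stored as "true"/"false".
        text = str(value).lower() if isinstance(value, bool) else str(value)
        ET.SubElement(field, "subfield", code=code).text = text
    return field


if __name__ == "__main__":
    example = build_provenance_field({
        "baseURL": "http://example.org/oai2d",  # illustrative values only
        "identifier": "oai:example.org:1234",
        "altered": False,
    })
    print(ET.tostring(example, encoding="unicode"))
```

Subfields are emitted in the order of the mapping, and elements missing from the input are simply skipped.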

Answering myself: I see we already have this in invenio.conf:

[...]
## CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG -- where do we store OAI ID tags
## of harvested records?  Useful for matching when we harvest stuff
## via OAI that we do not want to reexport via Invenio OAI; so records
## may have only the source OAI ID stored in this tag (kind of like
## external system number too).
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG = 035__a

## CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG -- where do we store OAI SRC
## tags of harvested records?  Useful for matching when we harvest stuff
## via OAI that we do not want to reexport via Invenio OAI; so records
## may have only the source OAI SRC stored in this tag (kind of like
## external system number too). Note that the field should be the same as
## CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG.
CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG = 035__9
[...]
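The note that the provenance field should be the same as CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG means both settings must share tag and indicators and differ only in the subfield code. A small sketch of that check, assuming the usual Invenio notation of a three-character tag, two indicators (`_` for blank) and a one-character subfield code; `parse_tag` and `same_field` are hypothetical names, not Invenio functions.

```python
from typing import NamedTuple


class TagSpec(NamedTuple):
    tag: str
    ind1: str
    ind2: str
    code: str


def parse_tag(spec: str) -> TagSpec:
    """Split a tag spec such as '035__a' into tag, indicators and subfield code."""
    if len(spec) != 6:
        raise ValueError("expected a 6-character tag spec, got %r" % spec)
    # '_' is the placeholder for a blank indicator; normalise it to a space.
    ind1 = " " if spec[3] == "_" else spec[3]
    ind2 = " " if spec[4] == "_" else spec[4]
    return TagSpec(spec[:3], ind1, ind2, spec[5])


def same_field(spec_a: str, spec_b: str) -> bool:
    """True when two specs address the same field (tag + indicators)."""
    a, b = parse_tag(spec_a), parse_tag(spec_b)
    return (a.tag, a.ind1, a.ind2) == (b.tag, b.ind1, b.ind2)


CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG = "035__a"
CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG = "035__9"

# The two settings above differ only in the subfield code ($a vs $9).
assert same_field(CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG,
                  CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG)
```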

As far as I know, the main user of this 035 field in the Invenio
community has been INSPIRE, when harvesting from arXiv. However, what
has been put in 035__9 is not the arXiv baseURL but rather the plain
string "arXiv".

In Invenio this 035 field receives special treatment: the pair
OAIID_TAG + OAIID_PROVENANCE_TAG is used to uniquely identify a
record.
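A minimal sketch of why the pair matters: the same OAI identifier string coming from two different sources must not collapse onto one record. Records are modelled here as plain dicts of 035 fields (subfield code to value); this is an illustrative data model, not BibUpload's actual matching code, and the identifiers are made up.

```python
from typing import Dict, List, Optional, Tuple

Field = Dict[str, str]  # subfield code -> value, e.g. {"a": ..., "9": ...}
Key = Tuple[str, str]   # (provenance $9, external OAI id $a)


def build_oai_index(records: Dict[int, List[Field]]) -> Dict[Key, int]:
    """Index local record ids by the ($9, $a) pair found in their 035 fields."""
    index: Dict[Key, int] = {}
    for recid, fields in records.items():
        for field in fields:
            if "a" not in field or "9" not in field:
                continue  # only a complete pair can identify a record
            key = (field["9"], field["a"])
            if index.get(key, recid) != recid:
                raise ValueError("%r is claimed by records %d and %d"
                                 % (key, index[key], recid))
            index[key] = recid
    return index


def match_record(index: Dict[Key, int], provenance: str,
                 oai_id: str) -> Optional[int]:
    """Return the local record id for an incoming harvested record, if any."""
    return index.get((provenance, oai_id))


records = {
    1: [{"a": "oai:example.org:42", "9": "arXiv"}],
    2: [{"a": "oai:example.org:42", "9": "OtherRepo"}],  # same id, other source
}
index = build_oai_index(records)
print(match_record(index, "arXiv", "oai:example.org:42"))      # 1
print(match_record(index, "OtherRepo", "oai:example.org:42"))  # 2
```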

So shall I simply add the above-mentioned attributes to 035 by default?

E.g. 

* baseURL - $u (distinct from $9, which is a semantic string. The
baseURL might change for technical reasons, so the $9 subfield, when
present, will take priority in identifying a record).
* identifier - $a (as per CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG)
* datestamp - $d
* metadataNamespace - $m
* originDescription - $o
* harvestDate - $h
* altered - $a
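The priority rule in the baseURL item ($9, when present, wins over $u) could look like the sketch below. It assumes fields are dicts of subfield code to value, and `identity_key` is a hypothetical name. The subfield code is kept in the key so that a semantic $9 string can never be confused with a $u URL.

```python
from typing import Dict, Optional, Tuple


def identity_key(field: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Key used to recognise a harvested record from one 035 field.

    $9 (a stable semantic string such as "arXiv") takes priority; $u (the
    baseURL, which may change for technical reasons) is only a fallback.
    Returns None when the identifier or all source information is missing.
    """
    oai_id = field.get("a")
    if not oai_id:
        return None
    if field.get("9"):
        return ("9", field["9"], oai_id)
    if field.get("u"):
        return ("u", field["u"], oai_id)
    return None


print(identity_key({"a": "oai:example.org:42", "9": "arXiv",
                    "u": "http://example.org/oai2d"}))
# ('9', 'arXiv', 'oai:example.org:42')
```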

Is there anyone in the Invenio community whose system harvests
records (putting external IDs in 035) and is trying to expose them?

Cheers!
Sam
-- 
Samuele Kaplun
Invenio Developer ** http://invenio-software.org/



Re: Re-implementation of OAI repository in Invenio

2011-10-03 Thread Benoit Thiell
Hi Sam.

We'll be storing our main identifiers (bibcodes) in 035.

Benoit.

On Mon, Oct 3, 2011 at 9:02 AM, Gregory Favre gregory.fa...@epfl.ch wrote:
 Hi Sam!

 So what about having a new MARC tag 909CP with the following assignment:

 I use the 909C0p field for a completely different thing (storing info about
 the labs responsible for the record). Maybe this is the reason why OAI export
 has been broken since our migration to Invenio 1.0 (all the sets return the
 whole database)?

 Is there anyone in the Invenio community whose system harvests
 records (putting external IDs in 035) and is trying to expose them?

 Yup, we do. We harvest records from several external databases (WoS, Scopus,
 PubMed, arXiv, ...); we enrich them (labs, fulltext, ...) and expose them like
 any other record. For us this is just an alternative to WebSubmit. We keep the
 external identifiers in the 035__ subfields.

 Cheers,
 Greg

 
 Gregory Favre
 Coordinateur Infoscience
 École Polytechnique Fédérale de Lausanne
 KIS - DIT
 Station 8
 CH-1015 Lausanne
 +41 21 693 22 88
 +41 79 599 09 06
 gregory.fa...@epfl.ch
 http://plan.epfl.ch/?sciper=128933
 

-- 
Benoit Thiell
The SAO/NASA Astrophysics Data System
http://adswww.harvard.edu/


Re: Re-implementation of OAI repository in Invenio

2011-10-03 Thread Ferran Jorba
Hello Samuele,

[...]
 In Invenio this 035 field receives special treatment: the pair
 OAIID_TAG + OAIID_PROVENANCE_TAG is used to uniquely identify a
 record.

 So shall I simply add the above-mentioned attributes to 035 by default?

 E.g. 

 * baseURL - $u (distinct from $9, which is a semantic string. The
 baseURL might change for technical reasons, so the $9 subfield, when
 present, will take priority in identifying a record).
 * identifier - $a (as per CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG)
 * datestamp - $d
 * metadataNamespace - $m
 * originDescription - $o
 * harvestDate - $h
 * altered - $a

 Is there anyone in the Invenio community whose system harvests
 records (putting external IDs in 035) and is trying to expose them?

We do too. After backporting your
2fb7275849e83f5afbb7915000e208a3e053889a patch to 0.99.1, we now store
in 035 $a all kinds of external identifiers, including external OAI ids,
with $9 acting as a kind of «namespace identifier» to avoid conflicts.

As for our own OAI id, we spent some time before concluding that, instead
of a local 9XX field, it should go in 024.8_:

 http://www.loc.gov/marc/bibliographic/bd024.html

 CFG_OAI_SET_FIELD = 0248_9
 CFG_OAI_ID_FIELD = 0248_a
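With the two settings above, a record's own OAI identifier and its set memberships end up in one 024 field whose first indicator is 8 (unspecified type of standard number or code, per the LoC page linked above). A minimal sketch of emitting that field as MARCXML; the helper name is hypothetical, and writing one $9 per set is an assumption rather than something stated in this thread.

```python
import xml.etree.ElementTree as ET

CFG_OAI_ID_FIELD = "0248_a"
CFG_OAI_SET_FIELD = "0248_9"


def build_local_oai_field(oai_id, set_specs):
    """Hypothetical helper: one 024 8_ datafield with $a = OAI id, $9 = set(s)."""
    # Both settings must point at the same field; only the subfield differs.
    if CFG_OAI_ID_FIELD[:5] != CFG_OAI_SET_FIELD[:5]:
        raise ValueError("OAI id and set settings must share tag and indicators")
    tag, ind1, ind2 = CFG_OAI_ID_FIELD[:3], CFG_OAI_ID_FIELD[3], CFG_OAI_ID_FIELD[4]
    field = ET.Element("datafield", tag=tag,
                       ind1=" " if ind1 == "_" else ind1,
                       ind2=" " if ind2 == "_" else ind2)
    ET.SubElement(field, "subfield", code=CFG_OAI_ID_FIELD[5]).text = oai_id
    for spec in set_specs:  # assumption: one $9 per OAI set
        ET.SubElement(field, "subfield", code=CFG_OAI_SET_FIELD[5]).text = spec
    return field


print(ET.tostring(build_local_oai_field("oai:example.org:1", ["theses"]),
                  encoding="unicode"))
```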

So, even though we re-expose a small part of our harvested holdings,
we currently don't reuse the same tag for both purposes.

I understand the need for your suggested fields ($d, $m, etc.), but
please don't rush into adding non-standard subfields to 035. The further
your default values depart from the MARC 21 standard, the more difficult
you make it to interchange records with other databases, and the more
trouble you cause potential Invenio newcomers. I don't have a solution
right now, but your subfields don't appear in the standard:

 http://www.loc.gov/marc/bibliographic/bd035.html

Maybe you could ask a librarian before deciding on them.
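Ferran's point can be checked mechanically. The sketch below compares the proposal's subfield codes with those MARC 21 defines for 035 ($a, $z, $6, $8, as I read the LoC page linked above; verify against the current page). $9 is a local convention and is not part of the proposal map. The same check also surfaces that, as posted, identifier and altered are both mapped to $a.

```python
# Subfields defined for MARC 21 field 035 (System Control Number), as read from
# http://www.loc.gov/marc/bibliographic/bd035.html; re-check before relying on it.
MARC21_035_SUBFIELDS = {"a", "z", "6", "8"}

# Element -> subfield code, copied from the 035 proposal in this thread.
PROPOSED_035 = {
    "baseURL": "u",
    "identifier": "a",
    "datestamp": "d",
    "metadataNamespace": "m",
    "originDescription": "o",
    "harvestDate": "h",
    "altered": "a",
}


def nonstandard_elements(mapping, defined=MARC21_035_SUBFIELDS):
    """Elements whose subfield code is not defined for 035 in MARC 21."""
    return sorted(el for el, code in mapping.items() if code not in defined)


def shared_codes(mapping):
    """Subfield codes that more than one element would be written to."""
    by_code = {}
    for el, code in mapping.items():
        by_code.setdefault(code, []).append(el)
    return {code: els for code, els in by_code.items() if len(els) > 1}


print(nonstandard_elements(PROPOSED_035))
# ['baseURL', 'datestamp', 'harvestDate', 'metadataNamespace', 'originDescription']
print(shared_codes(PROPOSED_035))
# {'a': ['identifier', 'altered']}
```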

Thanks,

Ferran


Re: Re-implementation of OAI repository in Invenio

2011-10-03 Thread Samuele Kaplun
Hi Ferran et al.

On Mon, 03/10/2011 at 16:09 +0200, Ferran Jorba wrote:
 I understand the need for your suggested fields ($d, $m, etc.), but
 please don't rush into adding non-standard subfields to 035. The further
 your default values depart from the MARC 21 standard, the more difficult
 you make it to interchange records with other databases, and the more
 trouble you cause potential Invenio newcomers. I don't have a solution
 right now, but your subfields don't appear in the standard:
 
  http://www.loc.gov/marc/bibliographic/bd035.html
 
 Maybe you could ask a librarian before deciding on them.

Thanks for your comments. Actually, my implementation is very flexible
and you will be able to tune every single subfield (of a given tag). The
thing is that Invenio already had some special handling for 035 (or
whatever tag is specified in CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG), and I
wanted to know what conventions/assumptions you had already made about
it, in case my branch was about to break compatibility.
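As a rough illustration of tuning every subfield of a given tag: a default mapping that keeps the existing 035 conventions ($a for the external OAI id, $9 for the provenance string) and lets a site remap or drop any other element. All names and defaults here are hypothetical, not the actual configuration of the branch.

```python
from typing import Dict, Optional

# Hypothetical defaults: keep the existing $a/$9 convention and put the OAI-PMH
# provenance elements on other subfields. 'altered' is left out because the
# thread maps it to $a, which already holds the identifier.
DEFAULT_PROVENANCE_SUBFIELDS: Dict[str, str] = {
    "identifier": "a",
    "provenance": "9",
    "baseURL": "u",
    "datestamp": "d",
    "metadataNamespace": "m",
    "originDescription": "o",
    "harvestDate": "h",
}


def effective_mapping(
        overrides: Optional[Dict[str, Optional[str]]] = None) -> Dict[str, str]:
    """Merge site overrides into the defaults; None means "do not store it"."""
    mapping = dict(DEFAULT_PROVENANCE_SUBFIELDS)
    for element, code in (overrides or {}).items():
        if code is None:
            mapping.pop(element, None)
        else:
            mapping[element] = code
    # Refuse configurations where two elements would share one subfield.
    codes = list(mapping.values())
    if len(codes) != len(set(codes)):
        raise ValueError("two elements share a subfield code: %r" % mapping)
    return mapping


site = effective_mapping({"harvestDate": None, "baseURL": "x"})
print(sorted(site.items()))
```

Rejecting shared codes up front keeps a site override from silently overwriting the $a/$9 pair that record matching depends on.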

However, I think I have found a way to respect any existing convention
while also adding the previously mentioned support for OAI-PMH origin
information.

Cheers!
Sam
