Re: Re-implementation of OAI repository in Invenio
Dear All,

On Mon, 03/10/2011 at 10:27 +0200, Samuele Kaplun wrote:
> So what about having a new MARC tag 909CP with the assignment:
> * baseURL - $u
> * identifier - $i
> * datestamp - $d
> * metadataNamespace - $m
> * originDescription - $o
> * harvestDate - $h
> * altered - $a

answering myself, I've seen that we already have in invenio.conf:

[...]
## CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG -- where do we store OAI ID tags
## of harvested records? Useful for matching when we harvest stuff
## via OAI that we do not want to reexport via Invenio OAI; so records
## may have only the source OAI ID stored in this tag (kind of like
## external system number too).
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG = 035__a

## CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG -- where do we store OAI SRC
## tags of harvested records? Useful for matching when we harvest stuff
## via OAI that we do not want to reexport via Invenio OAI; so records
## may have only the source OAI SRC stored in this tag (kind of like
## external system number too). Note that the field should be the same of
## CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG.
CFG_BIBUPLOAD_EXTERNAL_OAIID_PROVENANCE_TAG = 035__9
[...]

As far as I know, the main usage of this 035 field in the Invenio community has been by INSPIRE when harvesting from arXiv. However, what has been put in 035__9 was not the baseURL of arXiv, but rather the string "arXiv".

In Invenio there is a special treatment for this 035 field, namely that the pair OAIID_TAG + OAIID_PROVENANCE_TAG is used to uniquely identify a record. So shall I simply add the above-mentioned attributes to 035 by default? E.g.:

* baseURL - $u (different from $9, which is a semantic string. The baseURL might change for technical reasons, and therefore the $9 subfield, when present, will receive priority in identifying a record)
* identifier - $a (as per CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG)
* datestamp - $d
* metadataNamespace - $m
* originDescription - $o
* harvestDate - $h
* altered - $a

Is there anyone in the Invenio community whose system is harvesting records (putting external IDs in 035) and is trying to expose them?

Cheers!
Sam
--
Samuele Kaplun
Invenio Developer ** http://invenio-software.org/
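A minimal sketch of the matching rule proposed above, assuming each 035 instance is represented as a plain dict of subfield code to value (a hypothetical data model, not Invenio's bibupload internals): the OAI identifier comes from $a, and the provenance comes from $9 when present, falling back to $u. Note that the list above assigns $a to both identifier and altered; the sketch reads $a only as the identifier.

```python
"""Sketch: derive matching keys for a harvested record from its 035 fields.

Hypothetical data model: each 035 instance is a dict mapping subfield code
to value; $a = OAI identifier, $9 = semantic provenance label, $u = baseURL.
"""
from typing import Dict, List, Optional, Tuple

Field = Dict[str, str]


def provenance_of(field: Field) -> Optional[str]:
    # $9 (e.g. "arXiv") takes priority over $u: a baseURL may change for
    # technical reasons, while the semantic label should stay stable.
    return field.get("9") or field.get("u")


def matching_keys(fields: List[Field]) -> List[Tuple[str, str]]:
    """Return (provenance, identifier) pairs, skipping incomplete fields."""
    keys = []
    for field in fields:
        identifier = field.get("a")
        provenance = provenance_of(field)
        if identifier and provenance:
            keys.append((provenance, identifier))
    return keys


# Illustrative values only.
record_035 = [
    {"a": "oai:arXiv.org:1110.0001", "9": "arXiv", "u": "http://example.org/oai2"},
    {"a": "oai:example.org:42", "u": "http://example.org/oai2d"},
    {"a": "no-provenance"},  # neither $9 nor $u: not usable for matching
]
print(matching_keys(record_035))
```

Keying on the semantic $9 label first means a repository that moves to a new baseURL still matches records harvested before the move.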
Re: Re-implementation of OAI repository in Invenio
Hi Sam.

We'll be storing our main identifiers (bibcodes) in 035.

Benoit.

On Mon, Oct 3, 2011 at 9:02 AM, Gregory Favre gregory.fa...@epfl.ch wrote:
> Hi Sam!
>
>> So what about having a new MARC tag 909CP with the assignment:
>
> I use the 909C0p field for a completely different thing (storing info about the labs responsible for the record). Maybe this is the reason why OAI export has been broken since our migration to Invenio 1.0 (all the sets return the whole database)?
>
>> Is there anyone in the Invenio community whose system is harvesting records (putting external IDs in 035) and is trying to expose them?
>
> Yup, we do. We harvest records from several external databases (WoS, Scopus, PubMed, arXiv, ...); we enrich them (labs, fulltext, ...) and expose them like any other record. This is just an alternative to WebSubmit for us. We keep the external identifiers in the 035__ subfields.
>
> Cheers,
> Greg
>
> Gregory Favre
> Infoscience Coordinator
> École Polytechnique Fédérale de Lausanne
> KIS - DIT
> Station 8
> CH-1015 Lausanne
> +41 21 693 22 88
> +41 79 599 09 06
> gregory.fa...@epfl.ch
> http://plan.epfl.ch/?sciper=128933

--
Benoit Thiell
The SAO/NASA Astrophysics Data System
http://adswww.harvard.edu/
Re: Re-implementation of OAI repository in Invenio
Hello Samuele,

[...]
> In Invenio there is a special treatment for this 035 field, namely that the pair OAIID_TAG + OAIID_PROVENANCE_TAG is used to uniquely identify a record. So shall I simply add the above-mentioned attributes to 035 by default? E.g.:
> * baseURL - $u (different from $9, which is a semantic string. The baseURL might change for technical reasons, and therefore the $9 subfield, when present, will receive priority in identifying a record)
> * identifier - $a (as per CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG)
> * datestamp - $d
> * metadataNamespace - $m
> * originDescription - $o
> * harvestDate - $h
> * altered - $a
>
> Is there anyone in the Invenio community whose system is harvesting records (putting external IDs in 035) and is trying to expose them?

We do it too. After backporting your 2fb7275849e83f5afbb7915000e208a3e053889a patch to 0.99.1, we now store all kinds of external identifiers in 035 $a, including external OAI ids, with $9 acting as a kind of «namespace identifier» to avoid conflicts.

As for our own OAI id, after spending some time on it we concluded that, instead of a local 9XX field, it should go to 024.8_:

http://www.loc.gov/marc/bibliographic/bd024.html

CFG_OAI_SET_FIELD = 0248_9
CFG_OAI_ID_FIELD = 0248_a

So, even though we are re-exposing a small part of our harvested holdings, at the moment we don't reuse the same tag for both purposes.

I understand the need for your suggested subfields ($d, $m, etc.), but please don't rush into adding non-standard subfields to 035. The more your default values depart from the MARC 21 standard, the more difficult you make it to interchange records with other databases, and the more trouble you cause for potential Invenio newcomers. I don't have the solution right now, but your subfields don't appear in the standard:

http://www.loc.gov/marc/bibliographic/bd035.html

Maybe you could ask a librarian before deciding on them.

Thanks,

Ferran
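To make Ferran's point concrete, here is a small sketch that reads Invenio-style field specs (tag, two indicators with "_" for blank, optional subfield code, as in 035__9 or 0248_a) and flags subfield codes that MARC 21 does not define for 035. The set of defined 035 codes ($a, $z, $6, $8) is my reading of the LoC bd035 page and should be double-checked there; the helper names are hypothetical.

```python
"""Sketch: parse Invenio-style field specs and flag non-standard 035 codes.

Assumption: a spec is tag (3 chars) + ind1 + ind2 + optional subfield
code, with "_" meaning a blank indicator ("035__9", "0248_a", "909CP").
"""
from typing import Iterable, NamedTuple, Set

# Subfield codes defined for MARC 21 field 035 (System Control Number):
# $a, $z, $6, $8. Double-check against the LoC bd035 page.
MARC21_035_CODES: Set[str] = {"a", "z", "6", "8"}


class FieldSpec(NamedTuple):
    tag: str
    ind1: str  # " " stands for a blank indicator
    ind2: str
    code: str  # "" when the spec names the whole field


def _indicator(char: str) -> str:
    # Invenio writes blank indicators as "_".
    return " " if char == "_" else char


def parse_spec(spec: str) -> FieldSpec:
    if len(spec) not in (5, 6):
        raise ValueError(f"expected 5 or 6 characters, got {spec!r}")
    return FieldSpec(spec[:3], _indicator(spec[3]), _indicator(spec[4]), spec[5:])


def nonstandard_035_codes(codes: Iterable[str]) -> Set[str]:
    """Return the subfield codes that MARC 21 does not define for 035."""
    return set(codes) - MARC21_035_CODES


print(parse_spec("0248_a"))
print(sorted(nonstandard_035_codes(["u", "a", "d", "m", "o", "h", "9"])))
```

With the codes proposed in this thread, everything except $a is flagged, including the $9 already used as a local convention by Invenio's default 035__9. Field 024 with first indicator 8 ("unspecified type of standard number or code") is the standard home Ferran chose for the local OAI id.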
Re: Re-implementation of OAI repository in Invenio
Hi Ferran et al.

On Mon, 03/10/2011 at 16:09 +0200, Ferran Jorba wrote:
> I understand the need for your suggested subfields ($d, $m, etc.), but please don't rush into adding non-standard subfields to 035. The more your default values depart from the MARC 21 standard, the more difficult you make it to interchange records with other databases, and the more trouble you cause for potential Invenio newcomers. I don't have the solution right now, but your subfields don't appear in the standard:
>
> http://www.loc.gov/marc/bibliographic/bd035.html
>
> Maybe you could ask a librarian before deciding on them.

Thanks for your comments. Actually, my implementation is very flexible, and you will be able to tune every single subfield (of a given tag). The thing is that Invenio already had some special handling for 035 (or any other tag specified in CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG), and I wanted to know which conventions/assumptions you had already made about it, in case my branch was going to break compatibility.

However, I think I found a way to respect any existing convention while additionally providing the previously mentioned support for OAI-PMH origin.

Cheers!
Sam
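As an illustration of what "tunable subfields" plus OAI-PMH origin support could look like, here is a sketch that renders an OAI-PMH provenance container (originDescription with harvestDate/altered attributes and baseURL, identifier, datestamp, metadataNamespace children) from one stored MARC field. The subfield mapping is a hypothetical default taken from the codes proposed in this thread, not Invenio's actual configuration. Because the proposal assigns $a to both identifier and altered, `altered` is passed as an argument here, and the nested originDescription ($o) is left out.

```python
"""Sketch: render an OAI-PMH <provenance> block from one stored MARC field.

The subfield mapping is a hypothetical, tunable default (the codes
proposed in this thread); it is not Invenio's actual configuration.
"""
import xml.etree.ElementTree as ET
from typing import Dict, Mapping

PROVENANCE_NS = "http://www.openarchives.org/OAI/2.0/provenance"

# Provenance element/attribute name -> subfield code (hypothetical).
DEFAULT_MAPPING: Dict[str, str] = {
    "baseURL": "u",
    "identifier": "a",
    "datestamp": "d",
    "metadataNamespace": "m",
    "harvestDate": "h",
}

# Child elements of originDescription, in schema order.
CHILDREN = ("baseURL", "identifier", "datestamp", "metadataNamespace")


def provenance_xml(field: Mapping[str, str],
                   mapping: Mapping[str, str] = DEFAULT_MAPPING,
                   altered: bool = False) -> str:
    root = ET.Element("provenance", xmlns=PROVENANCE_NS)
    origin = ET.SubElement(root, "originDescription")
    origin.set("harvestDate", field.get(mapping["harvestDate"], ""))
    origin.set("altered", "true" if altered else "false")
    for name in CHILDREN:
        # Missing subfields become empty elements; a real exporter
        # should probably reject or skip such records instead.
        ET.SubElement(origin, name).text = field.get(mapping[name], "")
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
stored_035 = {
    "a": "oai:example.org:42",
    "u": "http://example.org/oai2d",
    "d": "2011-10-03",
    "m": "http://www.loc.gov/MARC21/slim",
    "h": "2011-10-03T10:27:00Z",
}
print(provenance_xml(stored_035))
```

Passing a different `mapping` is how a site with its own conventions (for example, provenance in $9 rather than $u) could reuse the same exporter without changing its stored records.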