Hi Jan,

Yes, I'm somewhat familiar with SolrMarc, in that I know a couple of the developers and I know it is used internally by Blacklight and VuFind. It is definitely a speedy solution for getting MARC into Solr, but that part wasn't where I was having problems. The challenge was getting usable MARC out of Invenio.
--jay

On Tue, Jun 15, 2010 at 11:14 AM, Jan Iwaszkiewicz <[email protected]> wrote:
> Hi Jay,
>
> Let me add my 2 cents.
> Have you come across the solrmarc project
> (http://code.google.com/p/solrmarc/)?
> It is supposed to be a solution for the problem of uploading MARC data to
> Solr.
>
> --Jan
>
> Jay Luker wrote:
>>
>> On Thu, Jun 10, 2010 at 5:02 PM, Samuele Kaplun <[email protected]>
>> wrote:
>>
>>>
>>> Currently, when you use the xmlmarc2textmarc utility and export to
>>> Aleph, a dummy leader is generated, which has proved to be enough for Aleph.
>>>
>>
>> Ahh, I wasn't aware of that existing utility. I showed the MARC output
>> from BibFormat to some librarian coder friends and they said, "huh,
>> looks like Aleph sequential" :)
>>
>> So Aleph doesn't care about the leader values?
>>
>>
>>>
>>> I am not an expert on the leader, but to me it seems that the leader
>>> makes sense mostly in the MARC21 binary format and when dealing with
>>> plain library records. It exists in MARCXML only as a consequence of
>>> conversion. Is this correct? The proof is that Invenio can do powerful
>>> and extremely flexible things without any need for the leader.
>>>
>>
>> Not an expert either. I think for Invenio it really just boils down to
>> interoperability. I mean, saying "Software X can do incredible,
>> amazing things with internal, non-standard format Y" isn't really a
>> remarkable statement.
>>
>>
>>>
>>> In particular, if Invenio has to support the leader in MARCXML, how can we
>>> map its workflows onto the rigid schema of the leader:
>>>
>>
>> I think sensible defaults for some values combined with a minimum of
>> conditional logic should suffice. The first part may be the trickier
>> one, as I'm still trying to figure out the defaults myself.
>>
>>
>>>
>>> Also, what is the meaning of certain bytes of the leader in MARCXML
>>> (from <http://www.loc.gov/marc/bibliographic/bdleader.html>):
>>>
>>> [...]
>>> Character Positions
>>> 00-04 - Record length
>>> [...]
>>> 12-16 - Base address of data
>>> [...]
>>>
>>
>> leader/05 = 'n' - the term "new" in this context is confusing, but
>> I've been told "don't overthink it"
>> leader/06 = 'a' - "...electronic resources that are basically textual in
>> nature"
>> leader/07 is where I'm less confident, but I *think* the logic is
>> simply 'b' for articles, 'a' for things that are part of a collection
>> or proceedings, and 'm' for everything else. For ADS we are currently
>> storing our internal item-type description in the 690a (which may be
>> incorrect), and this is how I'm determining leader/07.
>> leader/08 = '#' - not sure about this one
>> leader/09 = 'a' - assuming Unicode
>>
>> I've been looking at http://www.itsmarc.com/crs/bib1465.htm for some
>> guidance, and when I have a dumb question about something I'll ask in
>> the #code4lib IRC channel.
>>
>> Apparently the 008 control field
>> (http://www.loc.gov/marc/bibliographic/bd008.html) is also important
>> to many applications, but I haven't really explored it or determined
>> how important it is.
>>
>>
>>>
>>> In the end, the best thing is probably still to put in a fake leader,
>>> as xmlmarc2textmarc currently does, with the most neutral values.
>>>
>>
>> Yeah, I guess I agree, although I'm not sure what a "neutral" value
>> would be for something like leader/07. Also, it's important to get
>> leader/09 right, as tools like pymarc need it to know how to decode
>> the record.
>>
>>
>>>
>>> On the other hand, when Invenio records have been imported from
>>> original MARC21 or from MARCXML with a leader, Invenio should not
>>> throw that information away.
>>>
>>
>> Agreed.
>>
>> --jay
>>
>> ******************************************************
>> Jay Luker                      Astrophysics Data System (ADS)
>> [email protected]           Center for Astrophysics
>> 617-495-4588                   60 Garden Street MS 67
>> 617-495-7356 fax               Cambridge, MA 02138
>> ******************************************************
>>
>
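The leader defaults discussed in this thread can be sketched in Python. This is a minimal, hypothetical sketch, not the xmlmarc2textmarc implementation: the item-type keys in `BIB_LEVEL` are assumptions standing in for whatever is stored in 690a, and leaving positions 17-19 blank is an assumed "neutral" choice. In the LC documentation `#` denotes a blank, so leader/08 is written as a space. The helper `binary_lengths` (also hypothetical) illustrates Samuele's point that positions 00-04 and 12-16 only mean something in the binary (ISO 2709) format: they are computed from the directory and the field data, so in MARCXML they can simply stay as zero placeholders.

```python
# Hypothetical sketch of a "neutral" dummy MARC21 bibliographic leader.
# ASSUMPTION: the item-type keys below stand in for the internal item
# type stored in 690a; they are illustrative, not an ADS or Invenio API.

FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"

# leader/07 (bibliographic level): 'b' = serial component part (article),
# 'a' = monographic component part (e.g. a paper in proceedings).
BIB_LEVEL = {
    "article": "b",
    "inproceedings": "a",
}


def build_leader(item_type: str, record_length: int = 0, base_address: int = 0) -> str:
    """Return a 24-character leader; 00-04 and 12-16 default to zeros,
    since they only carry meaning in the binary (ISO 2709) format."""
    leader = (
        f"{record_length:05d}"            # 00-04 record length
        + "n"                             # 05 record status: new
        + "a"                             # 06 type of record: language material
        + BIB_LEVEL.get(item_type, "m")   # 07 bibliographic level, 'm' otherwise
        + " "                             # 08 type of control: blank ('#' in the LC docs)
        + "a"                             # 09 character coding: UCS/Unicode
        + "22"                            # 10-11 indicator and subfield code counts
        + f"{base_address:05d}"           # 12-16 base address of data
        + "   "                           # 17-19 assumed neutral: left blank
        + "4500"                          # 20-23 entry map
    )
    if len(leader) != 24:
        raise ValueError("record length or base address does not fit in 5 digits")
    return leader


def binary_lengths(fields: list[tuple[str, bytes]]) -> tuple[int, int]:
    """Compute leader/00-04 and leader/12-16 for a binary record.
    Each field's data must already end with FIELD_TERMINATOR."""
    # Each directory entry is 12 bytes: tag (3) + length (4) + start (5).
    directory_len = 12 * len(fields) + len(FIELD_TERMINATOR)
    base_address = 24 + directory_len
    data_len = sum(len(data) for _tag, data in fields)
    record_length = base_address + data_len + len(RECORD_TERMINATOR)
    return record_length, base_address


if __name__ == "__main__":
    print(repr(build_leader("article")))  # '00000nab a2200000   4500'
    fields = [("001", b"12345" + FIELD_TERMINATOR),
              ("245", b"00\x1faTitle" + FIELD_TERMINATOR)]
    length, base = binary_lengths(fields)
    print(repr(build_leader("article", length, base)))  # '00066nab a2200049   4500'
```

Keeping the item-type mapping in a single dictionary keeps the "minimum of conditional logic" in one place; leader/09 is fixed to 'a' here because, as noted in the thread, decoders such as pymarc rely on it.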

