You don't have to build your own indexer. You might use the pymarc parser to pull the records into a document database like MongoDB, then run reports from there. It really depends on what the service is delivering.
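To make that concrete, here is a rough sketch of the pymarc side, aimed at Stuart's report (a) below. It is untested against the INNZ data and makes some assumptions. Personal names are taken from 100/600/700 $a. Headings are matched verbatim after trimming punctuation, so it does none of the authority cleanup Jonathan warns about. Titles come from 245 $a. pymarc's MARCReader streams one record at a time, so the 800k records never have to fit in memory; only the name tallies do. It skips Mongo entirely and just counts in a Python Counter, which may be enough for the counts report. The helper names (iter_marc_file, count_people, titles_for, to_wikitable) are my own, not part of pymarc.

```python
"""Tally personal-name headings across a MARC file (rough sketch).

Assumes names live in 100/600/700 $a and that headings match verbatim
after light punctuation trimming (no authority reconciliation).
"""
from collections import Counter
from typing import Iterable, Iterator, List

# MARC tags carrying personal-name headings: main entry, subject, added entry.
PERSON_TAGS = ("100", "600", "700")


def iter_marc_file(path: str) -> Iterator:
    """Yield records one at a time so the whole file never sits in memory."""
    from pymarc import MARCReader  # local import keeps the counting logic usable without pymarc

    with open(path, "rb") as fh:
        for record in MARCReader(fh):
            if record is None:  # pymarc yields None for records it cannot parse
                continue
            yield record


def normalize(value: str) -> str:
    """Very light cleanup: trim whitespace and trailing ISBD punctuation."""
    return value.strip().rstrip(",.;:/").strip()


def count_people(records: Iterable) -> Counter:
    """Report (a): how many heading occurrences each person has."""
    counts: Counter = Counter()
    for record in records:
        for field in record.get_fields(*PERSON_TAGS):
            for name in field.get_subfields("a"):
                key = normalize(name)
                if key:
                    counts[key] += 1
    return counts


def titles_for(person: str, records: Iterable) -> List[str]:
    """Report (b), crudely: 245 $a titles of every record citing one person."""
    titles: List[str] = []
    for record in records:
        names = {
            normalize(n)
            for f in record.get_fields(*PERSON_TAGS)
            for n in f.get_subfields("a")
        }
        if person in names:
            for f in record.get_fields("245"):
                titles.extend(normalize(t) for t in f.get_subfields("a"))
    return titles


def to_wikitable(counts: Counter, limit: int = 50) -> str:
    """Render the most-referenced people as a sortable MediaWiki table."""
    rows = ['{| class="wikitable sortable"', "! Person !! References"]
    for name, n in counts.most_common(limit):
        rows.append("|-")
        rows.append(f"| {name} || {n}")
    rows.append("|}")
    return "\n".join(rows)


if __name__ == "__main__":
    import sys

    print(to_wikitable(count_people(iter_marc_file(sys.argv[1]))))
```

Something like `python count_people.py innz.mrc` would print the top 50 as a sortable wikitable (script and file names are placeholders). For a view like (b), call `titles_for("Smith, John", iter_marc_file(path))` with a fresh reader, since the generator is used up after one pass.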
This would be much less insanity-inducing than regexes in vi. I do agree with Jonathan. If authorities were easy, everyone would be doing them.

Cary

On Sunday, November 2, 2014, Stuart Yeates <stuart.yea...@vuw.ac.nz> wrote:

> Do any of these have built-in indexing? 800k records isn't going to fit in
> memory and if building my own MARC indexer is 'relatively straightforward'
> then you're a better coder than I am.
>
> cheers
> stuart
>
> --
> I have a new phone number: 04 463 5692
>
> ________________________________________
> From: Code for Libraries <CODE4LIB@LISTSERV.ND.EDU> on behalf of Jonathan
> Rochkind <rochk...@jhu.edu>
> Sent: Monday, 3 November 2014 1:24 p.m.
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] MARC reporting engine
>
> If you are, can become, or know, a programmer, that would be relatively
> straightforward in any programming language using the open source MARC
> processing library for that language. (ruby marc, pymarc, perl marc,
> whatever).
>
> Although you might find more trouble than you expect around authorities,
> with them being less standardized in your corpus than you might like.
> ________________________________________
> From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Stuart
> Yeates [stuart.yea...@vuw.ac.nz]
> Sent: Sunday, November 02, 2014 5:48 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] MARC reporting engine
>
> I have ~800,000 MARC records from an indexing service (
> http://natlib.govt.nz/about-us/open-data/innz-metadata CC-BY). I am
> trying to generate:
>
> (a) a list of person authorities (and sundry metadata), sorted by how many
> times they're referenced, in wikimedia syntax
>
> (b) a view of a person authority, with all the records by which they're
> referenced, processed into a wikipedia stub biography
>
> I have established that this is too much data to process in XSLT or
> multi-line regexps in vi. What other MARC engines are there out there?
>
> The two options I'm aware of are learning multi-line processing in sed or
> learning enough koha to write reports in whatever their reporting engine is.
>
> Any advice?
>
> cheers
> stuart
> --
> I have a new phone number: 04 463 5692
>

--
Cary Gordon
The Cherry Hill Company
http://chillco.com