It looks like the dataset is available in XML format. Perhaps you can import it into an XML database (eXist - exist-db.org comes to mind), and then generate a report via its query capabilities.

Miles Fidelman

Jonathan Rochkind wrote:
If you are, can become, or know, a programmer, that would be relatively 
straightforward in any programming language using the open source MARC 
processing library for that language. (ruby marc, pymarc, perl marc, whatever).

Although you might find more trouble than you expect around authorities, with 
them being less standardized in your corpus than you might like.
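
As a rough illustration of the library approach, here is a minimal sketch in Python with pymarc that covers (a), the reference counts. It makes several assumptions that are not from the original post: that person names sit in fields 100, 600 and 700, that subfields $a and $d together are enough to identify a person, and that stripping trailing punctuation is acceptable normalization. The function names are made up for the example. Given the caveat about unstandardized authorities, expect variant spellings of one person to be counted separately.

```python
from collections import Counter

# Assumption: personal names live in these MARC fields in this corpus.
PERSON_TAGS = ("100", "600", "700")


def person_heading(field):
    """Join $a (name) and $d (dates) into one heading string.

    Crude normalization (an assumption): trailing spaces, commas and
    full stops are stripped so "Smith, John," and "Smith, John" match.
    """
    parts = field.get_subfields("a", "d")
    return " ".join(p.strip() for p in parts).rstrip(" ,.")


def count_people(records):
    """Count how often each person heading occurs across the records."""
    counts = Counter()
    for record in records:
        if record is None:  # some readers yield None for unparseable records
            continue
        for field in record.get_fields(*PERSON_TAGS):
            heading = person_heading(field)
            if heading:
                counts[heading] += 1
    return counts


def to_wikitable(counts, limit=None):
    """Render the counts as a MediaWiki table, most referenced first."""
    lines = ['{| class="wikitable sortable"', "! Person !! References"]
    for name, n in counts.most_common(limit):
        lines.append("|-")
        lines.append(f"| {name} || {n}")
    lines.append("|}")
    return "\n".join(lines)


def report_from_marcxml(paths, limit=None):
    """Stream MARCXML files through pymarc and return the wiki table."""
    import pymarc  # third-party: pip install pymarc

    counts = Counter()
    # map_xml calls the function once per record without loading
    # the whole file into memory.
    pymarc.map_xml(lambda rec: counts.update(count_people([rec])), *paths)
    return to_wikitable(counts, limit)
```

Calling `print(report_from_marcxml(["innz.xml"], limit=500))` would print the table for a hypothetical file named `innz.xml`. For (b), the same loop could collect record identifiers per heading instead of a bare count.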
________________________________________
From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Stuart Yeates 
[stuart.yea...@vuw.ac.nz]
Sent: Sunday, November 02, 2014 5:48 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] MARC reporting engine

I have ~800,000 MARC records from an indexing service 
(http://natlib.govt.nz/about-us/open-data/innz-metadata CC-BY). I am trying to 
generate:

(a) a list of person authorities (and sundry metadata), sorted by how many 
times they're referenced, in MediaWiki syntax

(b) a view of a person authority, with all the records by which they're 
referenced, processed into a Wikipedia stub biography

I have established that this is too much data to process in XSLT or multi-line 
regexps in vi. What other MARC engines are there out there?

The two options I'm aware of are learning multi-line processing in sed or 
learning enough Koha to write reports in whatever its reporting engine is.

Any advice?

cheers
stuart
--
I have a new phone number: 04 463 5692


--
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra
