Thanks for the replies. To clarify, I am working with two (or more in the 
future) sets of MARC records outside of the ILS. I've tried MarcEdit, but 
results varied: there was not much overlap in the control fields available to 
me, and I have a feeling the fields themselves are a bit varied. I'm also 
messing around with marcXimiL a little, but I'm having trouble getting it to 
output any records at all. I also looked at the XC aggregation module, but I 
had trouble getting that to work properly as well, and the listserv was 
unresponsive. It seemed like good software, but it required me to set up an 
OAI harvest source before it could ingest the records, and that... well... 
enough is enough. I think I will probably need to write something myself; at 
least that way I will know what it is doing, rather than plowing through 
software that has little to no support. Please feel free to suggest a 
particular strategy you think might work best.

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
mbecca...@paulsmiths.edu


-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Andy 
Kohler
Sent: Thursday, August 15, 2013 2:29 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] De-dup MARC Ebook records

Are you expecting to work with two files of records, outside of your ILS?
If so, for a project like that I'd probably write Perl script(s) using 
MARC::Record (there are similar code libraries for Ruby, Python and Java at 
least).

For each record in each file, use the ISBN (and/or OCLC number and/or LCCN) as 
a key.  Compare all sets, and keep one record per key.
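
One wrinkle with ISBN keys is that the same ISBN may appear as a 10-digit 
form in one file and a 13-digit form in another. A minimal sketch of a 
normalizer follows. It is an illustration only: the function name 
normalize_isbn is hypothetical, and it assumes the raw 020 $a value has 
already been pulled out of the record by whatever MARC library is in use.

```python
import re


def normalize_isbn(raw):
    """Return a 13-digit ISBN string for a raw 020 $a value, or None.

    Hypothetical helper: it does not validate the incoming check digit,
    it only brings 10- and 13-digit forms to a common 13-digit key.
    """
    # 020 $a often carries a qualifier such as "(ebook)"; keep the leading token.
    match = re.match(r"\s*([0-9Xx\- ]+)", raw or "")
    if not match:
        return None
    digits = re.sub(r"[^0-9Xx]", "", match.group(1)).upper()

    if len(digits) == 13 and digits.isdigit():
        return digits

    if len(digits) == 10 and digits[:9].isdigit():
        # ISBN-10 to ISBN-13: prefix 978, drop the old check digit,
        # then recompute the check digit with alternating weights 1 and 3.
        core = "978" + digits[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        check = (10 - total % 10) % 10
        return core + str(check)

    return None
```

With this, "0-306-40615-2 (ebook)" and "978-0-306-40615-7" both reduce to the 
same key, so records carrying either form would match.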

This assumes that the vendors are supplying records with standard identifiers, 
and not just their own record numbers.
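
The keyed comparison described above could be sketched roughly as follows. 
This is not tied to any particular MARC library: each record is assumed to 
have been reduced beforehand to a plain dict with optional "oclc", "lccn" and 
"isbn" lists (for example from fields 035, 010 and 020), and the names 
record_keys and dedupe are hypothetical.

```python
def record_keys(record):
    """Yield (kind, value) match keys for one record.

    `record` is assumed to be a dict with optional "oclc", "lccn" and
    "isbn" entries, each a list of already-normalized strings.
    """
    for kind in ("oclc", "lccn", "isbn"):
        for value in record.get(kind, []):
            value = value.strip()
            if value:
                yield (kind, value)


def dedupe(*files):
    """Keep the first record seen for each key across all files.

    Returns (kept, dropped). Files listed earlier win, so put the
    preferred vendor first. Records with no usable key are always kept,
    because there is nothing to match them on.
    """
    seen = set()
    kept, dropped = [], []
    for records in files:
        for record in records:
            keys = list(record_keys(record))
            is_duplicate = any(key in seen for key in keys)
            # Remember every key, including those of dropped records, so a
            # later record sharing any of them is also treated as a duplicate.
            seen.update(keys)
            (dropped if is_duplicate else kept).append(record)
    return kept, dropped
```

Keeping the dropped list around makes it possible to review what was 
discarded before loading anything into the ILS.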

If you're comparing each file with what's already in your ILS, then it'll 
depend on the tools the ILS offers for matching incoming records to the 
database.  Or, export the database and compare it with the files, as above.

Andy Kohler / UCLA Library Info Tech
akoh...@library.ucla.edu / 310 206-8312

On Thu, Aug 15, 2013 at 10:11 AM, Michael Beccaria <mbecca...@paulsmiths.edu> wrote:

> Has anyone had any luck finding a good way to de-duplicate MARC 
> records from ebook vendors? We're looking to integrate Ebrary and 
> Ebsco Academic Ebook collections, and they estimate an overlap in the tens 
> of thousands.
>
>
