Better yet, we could use 'uniq -d'.  It prints one copy of each line that
occurs more than once, so the following should suffice:
marcprint | grep "=001" | sort | uniq -d
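
If the counts are still wanted too, uniq's -c and -d flags combine, so
this should show each duplicated 001 along with how often it occurs:

marcprint | grep "=001" | sort | uniq -cd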

I agree we should also be able to do something with SQL or SQL/Perl pretty
easily, but this was quick.
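
For instance, something along these lines might do it in a single query
(untested sketch; note that in MARCXML the 001 is a controlfield rather
than a datafield, so the XPath differs slightly from the draft quoted
below):

-- List any 001 control number that appears on more than one biblio.
-- Assumes MySQL's ExtractValue() and Koha's biblioitems.marcxml column;
-- GROUP BY on a select alias is a MySQL extension.
SELECT ExtractValue(marcxml, '//controlfield[@tag="001"]') AS Control,
       COUNT(*) AS Cnt
FROM biblioitems
GROUP BY Control
HAVING COUNT(*) > 1;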

-Doug

On 1 September 2012 17:26, Mark Tompsett <mtomp...@hotmail.com> wrote:

> Greetings,
>
> I thought I'd interject a bit.
>
>
>> It would be good to build a tool to find duplicate control numbers.
>> I did this by exporting all the biblios, using:
>> marcprint (my python utility) | grep "=001" | sort | uniq -c | sort -r | less
>> and looked for counts greater than 1.
>>
>
> I'll use your "marcprint" in my example, though I suspect exporting
> a MARC file would be useful enough, if the MARC file is then
> converted to something "human readable". Under Windows, I would
> likely use MarcEdit to "break" the .mrc file into a .mrk file:
> http://people.oregonstate.edu/~reeset/marcedit/html/
> And then having uploaded my .mrk files into a linux environment,
> substitute "marcprint" with "cat mymarcfile.mrk". All this uploading
> got me thinking that perhaps something like:
> SELECT ExtractValue(marcxml,'//datafield[@tag="001"]/*') AS Control
> FROM biblioitems
> But I didn't take it further than this, since I don't have time.
>
> NOTATION: # is a comment. $ is a command line prompt
>
> # Get a list of unique "=001" fields, should be one per record, right?
> $ marcprint | grep "=001" | sort -u -r > ~/check1.txt
> # Get a list of all the "=001" fields.
> $ marcprint | grep "=001" | sort -r > ~/check2.txt
> # Compare the two. Any differences will be due to duplications.
> $ diff ~/check1.txt ~/check2.txt
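>
> # Alternatively, comm should be able to list just the surplus lines in
> # one step. NOTE: comm wants ascending input, so drop the -r flags above.
> $ comm -13 ~/check1.txt ~/check2.txt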
>
> GPML,
> Mark Tompsett
>