Better yet, we could use 'uniq -d'. It prints only the duplicate lines, so
the following should suffice:

marcprint | grep "=001" | sort | uniq -d
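A slightly fuller sketch of the same idea, assuming marcprint emits one
MarcEdit-style mnemonic "=001" line per record (using the # / $ notation
from Mark's message below):

# Print each duplicated "=001" line once, prefixed with its count,
# most-duplicated first (uniq -cd combines counting with -d).
$ marcprint | grep "=001" | sort | uniq -cd | sort -rn

# Keep just the duplicated control-number values. The "=001 " prefix
# stripped here is the MarcEdit mnemonic convention; adjust the pattern
# if marcprint formats its output differently.
$ marcprint | grep "=001" | sort | uniq -d | sed 's/^=001 *//' > ~/dup-001s.txt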
I agree we should also be able to do something with SQL or SQL/Perl
pretty easily, but this was quick.

-Doug

On 1 September 2012 17:26, Mark Tompsett <mtomp...@hotmail.com> wrote:
> Greetings,
>
> I thought I'd interject a bit.
>
>> It would be good to build a tool to find duplicate control numbers.
>> I did this by exporting all the biblios, using:
>> marcprint (my python utility) | grep "=001" | sort | uniq -c | sort -r | less
>> and looked for counts greater than 1.
>
> I'll use your "marcprint" in my example, though I suspect exporting
> a MARC file would be useful enough, if the MARC file is then
> converted to something "human readable". Under Windows, I would
> likely use MarcEdit to "break" the .mrc file into a .mrk file.
> http://people.oregonstate.edu/~reeset/marcedit/html/
> And then, having uploaded my .mrk files into a linux environment,
> substitute "marcprint" with "cat mymarcfile.mrk". All this uploading
> got me thinking that perhaps something like:
> SELECT ExtractValue(marcxml, '//controlfield[@tag="001"]') AS Control
> FROM biblioitems
> But I didn't take it further than this, since I don't have time.
>
> NOTATION: # is a comment. $ is a command line prompt
>
> # Get a list of unique "=001" fields; there should be one per record, right?
> $ marcprint | grep "=001" | sort -u -r > ~/check1.txt
> # Get a list of all the "=001" fields.
> $ marcprint | grep "=001" | sort -r > ~/check2.txt
> # Compare the two. Any differences will be due to duplications.
> $ diff ~/check1.txt ~/check2.txt
>
> GPML,
> Mark Tompsett
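For the SQL route sketched above, the ExtractValue() fragment can be
finished into a grouped query that reports duplicates directly, with no
export step. This is only a sketch: it assumes a Koha schema where
biblioitems still carries the marcxml column (as in Mark's fragment) and
a database named "koha", which may differ on your install:

$ mysql koha <<'SQL'
-- Pull the 001 control number out of each record's MARCXML and
-- keep only the values that occur more than once.
SELECT ExtractValue(marcxml, '//controlfield[@tag="001"]') AS Control,
       COUNT(*) AS Hits
  FROM biblioitems
 GROUP BY Control
HAVING Hits > 1
 ORDER BY Hits DESC;
SQL

One caveat: records with no 001 at all will collapse into a single
empty-string group, so an empty Control value with a high count usually
means blank fields rather than true duplicates.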