Hi,

> I'm using MARC::Batch and MARC::Field to iterate through a text file of
> bibliographic records from Voyager.
>
> The unrecoverable error is actually occurring in the Perl Unicode module
> which is, of course, called by MARC::Record.
> It's running into "invalid UTF-8 character 0xC2."
> When I looked up the Unicode character list, all of the C2 entries are found
> hex characters, so it appears that the second half is missing.

I don't have my MARC-8 character set list with me, but I'd guess it's
some good old MARC-8 data mixed in with UTF-8. Or a MARC-8 record with
the wrong leader info.

Unfortunately bad UTF-8 is pretty common in my experience. You can use
a regexp to check if something is valid UTF-8:

http://keithdevens.com/weblog/archive/2004/Jun/29/UTF-8.regex

Then it's up to you to take appropriate action.

Ashley.
-- 
Ashley Sanders a.sand...@manchester.ac.uk
Copac http://copac.ac.uk -- A Mimas service funded by JISC
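
P.S. As a rough sketch of the validity check mentioned above: instead of a
regexp, this version leans on the core Encode module's strict decoding, which
is a swapped-in technique and not the regex from the link. In UTF-8, 0xC2 is
the lead byte of a two-byte sequence and must be followed by a byte in the
range 0x80-0xBF, so a 0xC2 followed by anything else is malformed. The helper
name is_valid_utf8 and the sample strings are made up for illustration, and
the sketch assumes you already have the raw field bytes in hand.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of MARC::Record or MARC::Batch):
# returns 1 if $bytes is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        # FB_CROAK makes decode() die on malformed input;
        # LEAVE_SRC stops decode() from modifying $bytes.
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# Illustrative byte strings only, not taken from real records.
my @samples = (
    [ 'plain ASCII',              "cafe"            ],
    [ 'valid two-byte sequence',  "caf\xC3\xA9"     ],
    [ '0xC2 with no continuation', "caf\xC2 au lait" ],
);

for my $sample (@samples) {
    my ($label, $bytes) = @$sample;
    printf "%-28s %s\n", $label, is_valid_utf8($bytes) ? 'ok' : 'INVALID';
}
```

A record that fails a check like this could be logged and skipped, or handed
to a MARC-8 conversion step, depending on what "appropriate action" means for
your data.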