Re: Invalid UTF-8 characters causing MARC::Record crash.

Ashley Sanders Tue, 17 May 2011 06:30:46 -0700

Hi,

> I'm using MARC::Batch and MARC::Field to iterate through a text file of
> bibliographic records from Voyager.
> 
> The unrecoverable error is actually occurring in the Perl Unicode module
> which is, of course, called by MARC::Record.
> It's running into "invalid UTF-8 character 0xC2."
> When I looked up the Unicode character list, all of the C2 entries are four
> hex characters, so it appears that the second half is missing.
>


I don't have my MARC-8 character set list with me, but I'd guess it's
some good old MARC-8 data mixed in with UTF-8. Or a MARC-8 record with
the wrong leader info.
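
To illustrate the wrong-leader case: in MARC 21, leader position 09 is
"a" for UCS/Unicode and blank for MARC-8. Below is a rough sketch in
Python (not the MARC::Record API; the function name is made up for
illustration, and it assumes you have the raw record bytes in hand) that
flags a record whose leader claims Unicode but whose bytes do not decode
as UTF-8:

```python
def leader_mismatch(raw: bytes) -> bool:
    """Return True if the leader claims Unicode but the bytes are not valid UTF-8.

    Hypothetical helper: `raw` is assumed to be one complete raw MARC record.
    """
    # Leader/09 is the character coding scheme: b"a" = UCS/Unicode, blank = MARC-8.
    claims_unicode = raw[9:10] == b"a"
    try:
        raw.decode("utf-8", errors="strict")
        is_utf8 = True
    except UnicodeDecodeError:
        is_utf8 = False
    return claims_unicode and not is_utf8
```

A record that trips this check is a candidate for "MARC-8 data with a
Unicode leader", though it could just as well be damaged UTF-8.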

Unfortunately, bad UTF-8 is pretty common in my experience. You can
use a regexp to check whether something is valid UTF-8:

  http://keithdevens.com/weblog/archive/2004/Jun/29/UTF-8.regex

Then it's up to you to take appropriate action.
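
As an alternative to the regexp, a strict decode also tells you where
the data goes wrong. This is only a sketch in Python rather than Perl,
and the function name is hypothetical. It reports the byte offsets at
which strict UTF-8 decoding fails, resuming one byte after each failure:

```python
def invalid_utf8_offsets(data: bytes) -> list:
    """Return the byte offsets at which strict UTF-8 decoding of `data` fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding raises on the first malformed sequence.
            data[pos:].decode("utf-8", errors="strict")
            break  # the remainder is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so add the slice offset.
            offsets.append(pos + err.start)
            pos += err.start + 1  # skip the offending byte and try again
    return offsets
```

A lone 0xC2 followed by an ASCII byte, which is what the error message
quoted above suggests, shows up as a single offset. An empty list means
the data is valid UTF-8. Whether you then skip the record, log it, or
try treating it as MARC-8 is the "appropriate action" part.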

Ashley.
--
Ashley Sanders [email protected]
Copac http://copac.ac.uk -- A Mimas service funded by JISC
