Re: [CODE4LIB] utf8 "\xC2" does not map to Unicode

Jonathan Rochkind Wed, 06 Apr 2011 13:29:52 -0700

I am not familiar with that Perl module. But I'm more familiar than I'd want with char encoding in Marc.

I don't recognize the byte 0xC2 (there are some bytes I became pathetically familiar with in past debugging, but I've forgotten 'em), but the first things to look at:

1. Is your Marc file encoded in Marc8 or UTF-8? I'm betting Marc8. Theoretically there is a Marc leader byte that tells you whether it's Marc8 or UTF-8, but the leader byte is often wrong in real-world records. Is it wrong?

2. Does Perl MARC::Batch have a function to convert from Marc8 to UTF-8? If so, how does it decide whether to convert? Is it trying to do that? Is it assuming that the leader byte of the record accurately identifies the encoding, and if so, is the leader byte wrong? Is it trying to convert from Marc8 to UTF-8 when the source was UTF-8 in the first place? Or is it assuming the source was UTF-8 in the first place, when in fact it was Marc8?
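
To make the two questions above concrete, here is a minimal sketch of the check, written in Python rather than Perl so that it does not depend on how MARC::Batch behaves internally (which I have not verified). It assumes a raw MARC 21 (ISO 2709) file in which leader position 09 is 'a' for UCS/Unicode and blank for Marc8. The function names and return labels are made up for illustration.

```python
def leader_claims_utf8(record: bytes) -> bool:
    """Leader position 09 (zero-based index 9): b'a' means UCS/Unicode,
    a blank means Marc8. This is the flag that is often wrong in practice."""
    return record[9:10] == b"a"


def is_valid_utf8(record: bytes) -> bool:
    """True if the raw bytes decode as strict UTF-8."""
    try:
        record.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def diagnose(record: bytes) -> str:
    """Compare what the leader claims with what the bytes look like.
    The labels are hypothetical; this is a heuristic, not a proof."""
    claims_utf8 = leader_claims_utf8(record)
    valid_utf8 = is_valid_utf8(record)
    has_non_ascii = any(byte >= 0x80 for byte in record)

    if claims_utf8 and not valid_utf8:
        # Typical cause: Marc8 data with a leader that says Unicode.
        return "leader-says-utf8-but-bytes-are-not"
    if not claims_utf8 and valid_utf8 and has_non_ascii:
        # Typical cause: UTF-8 data with a leader that says Marc8.
        return "leader-says-marc8-but-bytes-look-like-utf8"
    return "consistent"


# In-memory examples: a 24-byte leader followed by some field data.
LEADER_UTF8 = b"00000nam a2200000 a 4500"   # index 9 is 'a'
LEADER_MARC8 = b"00000nam  2200000 a 4500"  # index 9 is blank

lone_c2 = LEADER_UTF8 + b"caf\xc2 e"                  # 0xC2 with no continuation byte
mislabeled = LEADER_MARC8 + "café".encode("utf-8")    # real UTF-8, wrong flag
print(diagnose(lone_c2))
print(diagnose(mislabeled))
```

A lone 0xC2 is the first byte of a two-byte UTF-8 sequence, so a strict decoder rejects it unless a continuation byte (0x80 to 0xBF) follows. A record that lands in the first branch is a good candidate for being Marc8 with a wrong leader byte, though only a look at the actual data can confirm it.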

Not the answer you wanted; maybe someone else will have that. Debugging char encoding is hands down the most annoying kind of debugging I ever do.


On 4/6/2011 4:13 PM, Eric Lease Morgan wrote:

Ack! While using the venerable Perl MARC::Batch module I get the following 
error while trying to read a MARC record:

   utf8 "\xC2" does not map to Unicode

This is a real pain, and I'm hoping someone here can help me either: 1) trap this error 
allowing me to move on, or 2) figure out how to open the file "correctly".
