I am not familiar with that Perl module. But I'm more familiar than I'd want to be with char encoding in Marc.

I don't recognize the byte 0xC2 (there are some bytes I became pathetically familiar with in past debugging, but I've forgotten them), but here are the first things to look at:

1. Is your Marc file encoded in Marc8 or UTF-8? I'm betting Marc8. Theoretically there is a Marc leader byte (position 09) that tells you whether it's Marc8 or UTF-8, but the leader byte is often wrong in real-world records. Is it wrong?

2. Does Perl MARC::Batch have a function to convert from Marc8 to UTF-8? If so, how does it decide whether to convert? Is it trying to do that? Is it assuming that the leader byte in the record accurately identifies the encoding, and if so, is the leader byte wrong? Is it trying to convert from Marc8 to UTF-8 when the source was UTF-8 in the first place? Or is it assuming the source was UTF-8 in the first place, when in fact it was Marc8?
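
If you want to check the first question by hand, here is a minimal sketch in Python (not Perl, and nothing to do with how MARC::Batch works internally) that compares what the leader claims with what the bytes actually look like. It assumes the standard layout: a 24-byte leader, with position 09 set to 'a' for Unicode and blank for Marc8, and 0x1D ending each record. The function names are made up for illustration, and "decodes cleanly as UTF-8" is only a heuristic, not proof.

```python
RECORD_TERMINATOR = b"\x1d"  # ends every MARC record


def split_records(data):
    """Split raw file bytes into individual MARC records (hypothetical helper)."""
    chunks = data.split(RECORD_TERMINATOR)
    return [c + RECORD_TERMINATOR for c in chunks if c.strip()]


def diagnose_marc_encoding(record):
    """Compare the encoding the leader declares with what the bytes look like."""
    if len(record) < 24:
        raise ValueError("a MARC record needs at least a 24-byte leader")

    # Leader/09: b"a" means Unicode (UTF-8), blank means Marc8.
    flag = record[9:10]
    declared = {b"a": "utf-8", b" ": "marc-8"}.get(flag, "unknown")

    # Strict decoding fails on stray bytes such as a lone 0xC2.
    try:
        record.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    has_high_bytes = any(b >= 0x80 for b in record)

    if declared == "unknown":
        verdict = "leader/09 is neither blank nor 'a'"
    elif not has_high_bytes:
        verdict = "pure ASCII, so either reading works"
    elif declared == "utf-8" and not valid_utf8:
        verdict = "leader says UTF-8 but bytes are not valid UTF-8 (possibly Marc8)"
    elif declared == "marc-8" and valid_utf8:
        verdict = "leader says Marc8 but bytes decode cleanly as UTF-8"
    else:
        verdict = "leader and bytes look consistent"

    return {"declared": declared, "valid_utf8": valid_utf8, "verdict": verdict}
```

Read the file in binary mode, pass the bytes to split_records, and run diagnose_marc_encoding on each record to see which ones disagree with their own leader. For what it's worth, in UTF-8 the byte 0xC2 is the first byte of a two-byte sequence and has to be followed by a byte in the range 0x80-0xBF, so a complaint about 0xC2 is consistent with non-UTF-8 data being read as UTF-8.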

Not the answer you wanted, I know; maybe someone else will have that. Debugging char encoding is hands down the most annoying kind of debugging I ever do.

On 4/6/2011 4:13 PM, Eric Lease Morgan wrote:
Ack! While using the venerable Perl MARC::Batch module I get the following 
error while trying to read a MARC record:

   utf8 "\xC2" does not map to Unicode

This is a real pain, and I'm hoping someone here can help me either: 1) trap this error 
allowing me to move on, or 2) figure out how to open the file "correctly".
