printing UTF-8 encoded MARC records with as_usmarc

Shelley Doljack Mon, 30 Jul 2012 15:51:46 -0700

Hi,

I wrote a script that extracts MARC records from a file when they meet certain 
conditions and writes them to a new file. When my input record is correctly 
encoded in UTF-8 and I run the script from the Windows command prompt, this 
warning appears: "Wide character in print at record_extraction.pl line 99" 
(line 99 is where I print to the new file using as_usmarc). I compared the 
record before and after extraction in MarcEdit, and the diacritic had changed. 
I then ran marcdump newfile.mrc to see what would happen and got this error: 
"utf8 \xF4 does not map to Unicode at C:/Perl64/lib/Encode.pm line 176." When 
I run the extraction script on MARC-8 encoded data, I don't have this problem.


The basic outline of my script is:

my $batch = MARC::Batch->new('USMARC', $input_file);

while (my $record = $batch->next()) {
     #do some checks
     #if checks ok then
     print FILE $record->as_usmarc();
}

Do I need to add something that tells the script to treat the data as UTF-8? 
Does MARC::Record not handle UTF-8 at all?
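
One possible explanation, offered as a sketch rather than a confirmed diagnosis: if as_usmarc returns a decoded Perl character string for UTF-8 records (an assumption here, not something verified against MARC::Record), then printing it to a filehandle that has no encoding layer would produce exactly this kind of "Wide character in print" warning. The example below does not use MARC::Record at all. It uses only core Perl, a made-up sample string, and a hypothetical helper named bytes_written to show the difference an ":encoding(UTF-8)" layer makes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to a temporary file using the given
# open mode (including any PerlIO layer) and return the raw bytes on disk.
sub bytes_written {
    my ($mode, $text) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;

    open(my $out, $mode, $path) or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";

    # Read the file back as plain bytes, with no decoding.
    open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
    local $/;
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

# Made-up stand-in for a decoded record string: it contains U+00F4
# (o with circumflex) and one character above U+00FF.
my $text = "H\x{F4}tel \x{4E2D}";

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# No encoding layer: Perl has to guess how to turn characters into bytes.
my $plain = bytes_written('>', $text);

# Explicit layer: characters are encoded as UTF-8 on the way out.
my $encoded = bytes_written('>:encoding(UTF-8)', $text);

printf "warnings without a layer: %d\n", scalar @warnings;
printf "bytes with the layer: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $encoded;
```

If the assumption holds, the equivalent change in the script outlined above would be to call binmode(FILE, ':encoding(UTF-8)') once after opening FILE. That layer would probably be wrong for MARC-8 input, since those records are handled as bytes, so it would need to be applied only when the records are UTF-8.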

Thanks,
Shelley

----
Shelley Doljack  
E-Resources Metadata Librarian 
Metadata and Library Systems
Stanford University Libraries
sdolj...@stanford.edu
650-725-0167
