I recently came across a nasty issue with MARC::Record to do with the output of MARC-8 encoded records. I was converting XML (which was in UTF-8) into MARC records using MARC::Record, and initially I successfully got good UTF-8 encoded MARC records out at the end.
However, I then could not load them into our LMS, and realised it was going to be easier at the LMS end if the records were presented in MARC-8. While the Perl modules largely worked and I got the right MARC-8 representation out at the end, the record length and the field offsets and lengths in the directory got in a real mess, because the top-bit-set characters in MARC-8 got counted as though they were code-points 0x80 to 0xFF encoded as two bytes of UTF-8.

I found a solution by hackily recalculating the lengths when needed, but I thought I'd mention it as the thread has touched on this area.

Matthew

-- 
Matthew Phillips
Electronic Systems Librarian, Durham University
Durham University Library, Stockton Road, Durham, DH1 3LY
+44 (0)191 334 2941

> On Mon, Jul 30, 2012 at 6:51 PM, Shelley Doljack
> <sdolj...@stanford.edu> wrote:
>
> > Hi,
> >
> > I wrote a script that extracts marc records from a file given certain
> > conditions and puts them in a new file. When my input record is correctly
> > encoded in UTF-8 and I run my script from windows command prompt, this
> > warning message appears: "Wide character in print at
> > record_extraction.pl line 99" (the line in my script where I print to a
> > new file using as_usmarc). I compared the extracted record before and
> > after in MarcEdit and the diacritic was changed. I tried marcdump
> > newfile.mrc to see what happens and I get this error: "utf8 \xF4 does not
> > map to Unicode at C:/Perl64/lib/Encode.pm line 176." When I run my
> > extraction script again with MARC-8 encoded data then I don't have the
> > same problem.
> >
> > The basic outline of my script is:
> >
> > my $batch = MARC::Batch->new('USMARC', $input_file);
> >
> > while (my $record = $batch->next()) {
> >     #do some checks
> >     #if checks ok then
> >     print FILE $record->as_usmarc();
> > }
> >
> > Do I need to add something that specifies to interpret the data as UTF-8?
> > Does MARC::Record not handle UTF-8 at all?
> >
> > Thanks,
> > Shelley
> >
> > ----
> > Shelley Doljack
> > E-Resources Metadata Librarian
> > Metadata and Library Systems
> > Stanford University Libraries
> > sdolj...@stanford.edu
> > 650-725-0167
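
For anyone who wants to see the kind of length miscount described above, here is a minimal sketch using only core Perl. It is an illustration of the symptom, not MARC::Record's actual code: the sample field data and the helper names octet_length and utf8_length are made up for the example, and it assumes the MARC-8 data is held as a plain string of bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical MARC-8 field data, held as raw bytes. In MARC-8 (ANSEL),
# 0xE2 is the combining acute accent, which precedes the letter it modifies.
my $marc8_field = "caf\xE2e";

# The length the directory needs: the number of bytes actually written.
sub octet_length {
    my ($octets) = @_;
    return length $octets;
}

# The miscount: each byte is treated as a code point and the length of its
# UTF-8 encoding is counted, so every byte in 0x80..0xFF counts as two.
sub utf8_length {
    my ($octets) = @_;
    return length encode('UTF-8', $octets);
}

printf "octets: %d, counted as UTF-8: %d\n",
    octet_length($marc8_field), utf8_length($marc8_field);
```

This prints "octets: 5, counted as UTF-8: 6". Every top-bit-set byte in a field adds one to the miscounted length, so the error accumulates through the directory offsets and the record length in the leader.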