RE: printing UTF-8 encoded MARC records with as_usmarc

PHILLIPS M.E. Tue, 31 Jul 2012 06:41:28 -0700

I recently came across a nasty issue with MARC::Record to do with output of 
MARC-8 encoded records.  I was converting XML (which was in UTF-8) into MARC 
records using MARC::Record and had initially, and successfully, got good UTF-8 
encoded MARC records out at the end.


However, I then could not load them into our LMS, and realised it would be 
easier at the LMS end if the records were presented in MARC-8.  The Perl 
modules largely worked and I got the right MARC-8 representation out at the 
end, but the record length and the field offsets and lengths in the directory 
were all wrong, because each top-bit-set MARC-8 byte was counted as though it 
were a code point in the range 0x80 to 0xFF, which takes two bytes in UTF-8.  
I found a solution by hackily recalculating the lengths when needed, but I 
thought I'd mention it as the thread has touched on this area.
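A minimal illustration of the miscount, assuming a field that contains one MARC-8 combining acute (byte 0xE2, which in MARC-8 precedes the letter it modifies). This only shows the arithmetic behind the wrong lengths; it is not the recalculation code described above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical MARC-8 field data: "Cafe" with the MARC-8 combining
# acute (byte 0xE2) placed before the "e".
# Each character in this string stands for exactly one MARC-8 byte.
my $marc8 = "Caf\xE2e";

# The miscount: 0xE2 is treated as the code point U+00E2, whose
# UTF-8 encoding takes two bytes, so the field looks one byte longer.
my $utf8_count = length( encode('UTF-8', $marc8) );

# The count the directory needs: one byte per MARC-8 character.
# ISO-8859-1 maps code points 0x00-0xFF to the same single byte.
my $marc8_count = length( encode('ISO-8859-1', $marc8) );

print "UTF-8 count: $utf8_count, MARC-8 byte count: $marc8_count\n";
```

Every top-bit-set byte in a field adds one spurious byte to that field's length, and the errors accumulate through the following offsets and the record length in the leader.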

Matthew

-- 
Matthew Phillips
Electronic Systems Librarian, Durham University
Durham University Library, Stockton Road, Durham, DH1 3LY
+44 (0)191 334 2941


> On Mon, Jul 30, 2012 at 6:51 PM, Shelley Doljack
> <[email protected]> wrote:
> 
> > Hi,
> >
> > I wrote a script that extracts marc records from a file given certain
> > conditions and puts them in a new file. When my input record is correctly
> > encoded in UTF-8 and I run my script from windows command prompt, this
> > warning message appears: "Wide character in print at
> > record_extraction.pl line 99" (the line in my script where I print to a
> > new file using as_usmarc). I compared the extracted record before and
> > after in MarcEdit, and the diacritic had changed. I tried marcdump
> > newfile.mrc to see what
> > happens and I get this error: "utf8 \xF4 does not map to Unicode at
> > C:/Perl64/lib/Encode.pm line 176." When I run my extraction script again
> > with MARC-8 encoded data then I don't have the same problem.
> >
> > The basic outline of my script is:
> >
> > my $batch = MARC::Batch->new('USMARC', $input_file);
> >
> > while (my $record = $batch->next()) {
> >      #do some checks
> >      #if checks ok then
> >      print FILE $record->as_usmarc();
> > }
> >
> > Do I need to add something that specifies to interpret the data as UTF-8?
> > Does MARC::Record not handle UTF-8 at all?
> >
> > Thanks,
> > Shelley
> >
> > ----
> > Shelley Doljack
> > E-Resources Metadata Librarian
> > Metadata and Library Systems
> > Stanford University Libraries
> > [email protected]
> > 650-725-0167
> >
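On the original question, as I understand it: when a record holds UTF-8 data, as_usmarc() can return a Perl character string, and printing such a string to a handle with no encoding layer is what produces "Wide character in print". A common remedy is to put an encoding layer on the output handle, for example binmode(FILE, ':utf8') straight after opening FILE in the script above. The sketch below shows the effect with a stand-in string and an in-memory file rather than MARC::Record itself; write_records is a hypothetical helper, not part of MARC::Record.

```perl
use strict;
use warnings;

# Hypothetical helper: write already-serialised records (in the real
# script, the strings returned by $record->as_usmarc()) to a handle
# that encodes characters as UTF-8 on output.
sub write_records {
    my ($fh, @records) = @_;
    binmode($fh, ':encoding(UTF-8)');
    print {$fh} $_ for @records;
    return;
}

# Stand-in record text containing a character above U+00FF, which is
# what triggers "Wide character in print" on a handle with no layer.
my $record_text = "Caf\x{e9} \x{2113}";

# Write to an in-memory file so the resulting bytes can be inspected.
my $bytes = '';
open(my $out, '>', \$bytes) or die "Cannot open in-memory handle: $!";
write_records($out, $record_text);
close($out) or die "Cannot close in-memory handle: $!";

printf "%d characters written as %d bytes\n", length($record_text), length($bytes);
```

With the layer in place, every character is written as its UTF-8 byte sequence, with no warning and no mixing of Latin-1 and UTF-8 bytes in the output.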
