Eric,
Have you tried checking how MARC::Batch views the encoding?
e.g.
# read & write
while ( my $marc = $batch->next ) {
    # report the encoding on STDERR so it doesn't end up in the MARC output
    print STDERR $marc->encoding(), "\n";
    print $marc->as_usmarc;
}
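To see how MARC::Batch views the whole file at once, a minimal sketch like the one below could tally the value `encoding()` reports for every record. It assumes MARC::Record reports 'UTF-8' when leader position 09 is 'a' and 'MARC-8' otherwise. `encoding_tally` is a hypothetical helper name, and the script takes the file path as its first argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MARC::Batch;

# Sketch: count records by the encoding MARC::Record reports for them
# (assumed: 'UTF-8' when leader/09 is 'a', otherwise 'MARC-8').
sub encoding_tally {
    my ($file) = @_;
    my $batch = MARC::Batch->new( 'USMARC', $file );
    $batch->strict_off;
    $batch->warnings_off;
    my %count;
    while ( my $marc = $batch->next ) {
        $count{ $marc->encoding }++;
    }
    return \%count;
}

# Usage (hypothetical script name): perl tally.pl und.marc
if ( !caller && @ARGV ) {
    my $count = encoding_tally( $ARGV[0] );
    printf "%-6s %d\n", $_, $count->{$_} for sort keys %$count;
}
```

If every record comes back as MARC-8 for a file you know is UTF-8, leader/09 is the first thing to check.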
It is supposed to pick up the encoding from position 09 of the leader, but I am
not sure this is totally reliable. If you know for certain this is a UTF-8 file,
you can manually set the encoding (but you shouldn't have to).
e.g.
# read & write
while ( my $marc = $batch->next ) {
    $marc->encoding('UTF-8');    # sets leader position 09 to 'a'
    print $marc->as_usmarc;
}
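Since leader position 09 may not be trustworthy, it can help to look at that byte directly, without going through MARC::Batch at all. This sketch uses only core Perl and assumes plain ISO 2709 records, each ending in the record terminator 0x1D; `leader09_tally` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: tally the raw leader/09 byte of every record, bypassing MARC::Batch.
# Assumes ISO 2709 records, each ending in the record terminator 0x1D.
sub leader09_tally {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "Can't open $file: $!";
    local $/ = "\x1D";
    my %count;
    while ( my $rec = <$fh> ) {
        $rec =~ s/^[\r\n]+//;          # ignore stray newlines between records
        next if length($rec) < 24;     # too short to hold a leader
        my $flag = substr( $rec, 9, 1 );
        $count{ $flag eq ' ' ? 'blank' : $flag }++;
    }
    close $fh;
    return \%count;
}

if ( !caller && @ARGV ) {
    my $count = leader09_tally( $ARGV[0] );
    print "leader/09 '$_': $count->{$_}\n" for sort keys %$count;
}
```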
regards
Alan
--
Alan Brown
Library Systems Liaison Officer
Bury Library Service
Resource Services
Textile Hall
Manchester Rd
Bury BL9 0DG
0161 253 5877
http://www.bury.gov.uk/libraries
http://library.bury.gov.uk
-----Original Message-----
From: Eric Lease Morgan [mailto:[email protected]]
Sent: 26 March 2013 20:22
To: [email protected]
Subject: reading and writing of utf-8 with marc::batch
For the life of me I can't figure out how to read and write UTF-8 with
MARC::Batch.
I have a UTF-8 encoded file of MARC records. Dumping the records and grepping
for a particular string shows that the data is valid:
$ marcdump und.marc | grep Sainte-Face
und.marc
1000 records
2000 records
3000 records
4000 records
5000 records
6000 records
7000 records
8000 records
9000 records
10000 records
11000 records
12000 records
245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
610 20 _aArchiconfrérie de la Sainte-Face
13000 records
$
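Grepping marcdump output shows that the characters look right, but it does not prove every record is well-formed UTF-8. A stricter check, sketched here with the core Encode module, tries to decode each record's bytes as UTF-8 and reports the ones that fail. The helper name `invalid_utf8_records` is hypothetical, and records are assumed to end with 0x1D. Passing this check only means the bytes are valid UTF-8; it says nothing about whether leader/09 is set to 'a'.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Sketch: return the 1-based numbers of records whose bytes are not
# well-formed UTF-8. Assumes records end with the 0x1D record terminator.
sub invalid_utf8_records {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "Can't open $file: $!";
    local $/ = "\x1D";
    my ( $n, @bad ) = (0);
    while ( my $rec = <$fh> ) {
        $rec =~ s/^[\r\n]+//;
        next unless length $rec;
        $n++;
        # FB_CROAK dies on malformed input; LEAVE_SRC keeps $rec unmodified
        my $ok = eval {
            Encode::decode( 'UTF-8', $rec,
                Encode::FB_CROAK() | Encode::LEAVE_SRC() );
            1;
        };
        push @bad, $n unless $ok;
    }
    close $fh;
    return @bad;
}

if ( !caller && @ARGV ) {
    my @bad = invalid_utf8_records( $ARGV[0] );
    print @bad ? "not valid UTF-8: records @bad\n"
               : "all records are valid UTF-8\n";
}
```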
I then run a Perl script that simply reads each record and dumps it to STDOUT.
Notice how I define both my input and output as UTF-8:
#!/shared/perl/current/bin/perl
# configure
use constant MARC => './und.marc';
# require
use strict;
use MARC::Batch;
# initialize
binmode ( MARC, ":utf8" );
my $batch = MARC::Batch->new( 'USMARC', MARC );
$batch->strict_off;
$batch->warnings_off;
binmode( STDOUT, ":utf8" );
# read & write
while ( my $marc = $batch->next ) { print $marc->as_usmarc }
# done
exit;
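Two things in this script may matter, though neither is confirmed in the thread. First, `binmode ( MARC, ":utf8" )` does not appear to affect the filehandle MARC::Batch opens internally from the path, so it likely has no effect on reading. Second, if a record's leader/09 is blank, MARC::Record presumably leaves the field data as raw bytes, and the `:utf8` layer on STDOUT then encodes those bytes a second time, so the lengths in the leader and directory no longer match what is written. The sketch below writes each record as bytes exactly once. `copy_records` is a hypothetical name, and the assumption about when MARC::Record decodes field data is marked in the comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();
use MARC::Batch;

# Sketch: copy USMARC records so each one is written as bytes exactly once.
# ASSUMPTION: MARC::Record decodes field data to Perl characters only when
# leader/09 is 'a' (encoding() eq 'UTF-8'); other records keep raw bytes.
sub copy_records {
    my ( $in, $out ) = @_;
    binmode $out, ':raw';    # no :utf8 layer, so nothing is encoded twice
    my $batch = MARC::Batch->new( 'USMARC', $in );
    $batch->strict_off;
    $batch->warnings_off;
    my $n = 0;
    while ( my $marc = $batch->next ) {
        my $usmarc = $marc->as_usmarc;
        # decoded characters -> UTF-8 bytes; undecoded bytes pass through
        $usmarc = Encode::encode_utf8($usmarc) if $marc->encoding eq 'UTF-8';
        print {$out} $usmarc;
        $n++;
    }
    return $n;
}

copy_records( $ARGV[0], \*STDOUT ) if !caller && @ARGV;
```

The same idea as a one-line change to the original script would be to drop `binmode( STDOUT, ":utf8" )`, which should be enough when every record has a blank leader/09.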
But my output is munged:
$ ./marc.pl > und.mrc
$ marcdump und.mrc | grep Sainte-Face
und.mrc
1000 records
2000 records
3000 records
4000 records
5000 records
6000 records
7000 records
8000 records
9000 records
10000 records
11000 records
12000 records
245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
610 _aArchiconfrérie de la Sainte-Face
13000 records
$
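In the munged output the 610 field has lost its indicators, which is the kind of damage to expect when a record's real byte length drifts from the lengths stored in its leader and directory. A quick way to test that guess, sketched in core Perl, is to compare leader positions 00-04 with each record's actual size in bytes. `length_mismatches` is a hypothetical helper, and it assumes records end with 0x1D.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: list records whose leader/00-04 length disagrees with their real size.
# Assumes ISO 2709 records, each ending in the record terminator 0x1D.
sub length_mismatches {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "Can't open $file: $!";
    local $/ = "\x1D";
    my ( $n, @bad ) = (0);
    while ( my $rec = <$fh> ) {
        $rec =~ s/^[\r\n]+//;          # ignore stray newlines between records
        next unless length $rec;
        $n++;
        my ($declared) = $rec =~ /^(\d{5})/;
        # under :raw, length() counts bytes
        if ( !defined $declared || $declared != length $rec ) {
            push @bad, [ $n, $declared, length $rec ];
        }
    }
    close $fh;
    return @bad;
}

if ( !caller && @ARGV ) {
    for my $m ( length_mismatches( $ARGV[0] ) ) {
        printf "record %d: leader says %s, actual %d bytes\n",
            $m->[0], $m->[1] // 'n/a', $m->[2];
    }
}
```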
What am I doing wrong!?
--
Eric Lease Morgan
University of Notre Dame
574/631-8604