For the life of me I can't figure out how to read and write UTF-8 with
MARC::Batch.
I have a UTF-8 encoded file of MARC records. Dumping the records and grepping
for a particular string shows that the data is valid:
$ marcdump und.marc | grep Sainte-Face
und.marc
1000 records
2000 records
3000 records
4000 records
5000 records
6000 records
7000 records
8000 records
9000 records
10000 records
11000 records
12000 records
245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
610 20 _aArchiconfrérie de la Sainte-Face
13000 records
$
I then run a Perl script that simply reads each record and dumps it to STDOUT.
Notice how I define both my input and output as UTF-8:
#!/shared/perl/current/bin/perl
# configure
use constant MARC => './und.marc';
# require
use strict;
use MARC::Batch;
# initialize
binmode ( MARC, ":utf8" );
my $batch = MARC::Batch->new( 'USMARC', MARC );
$batch->strict_off;
$batch->warnings_off;
binmode( STDOUT, ":utf8" );
# read & write
while ( my $marc = $batch->next ) { print $marc->as_usmarc }
# done
exit;
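Below is a small check of what a `:utf8` output layer does to data that is already UTF-8. It uses only core Perl, and the `é` is a made-up sample, not taken from und.marc. It assumes, without confirming, that MARC::Batch hands back records as raw bytes. MARC::Batch opens ./und.marc itself by name, so the `binmode( MARC, ":utf8" )` line above does not put a layer on the handle it actually reads from.

```perl
#!/usr/bin/perl
# Illustration only: print a string that already holds UTF-8 *bytes*
# through a ":utf8" layer, the way the script above prints to STDOUT.
use strict;
use warnings;
use Encode qw(encode);

# "é" as the two UTF-8 octets 0xC3 0xA9 (a byte string, not a character string)
my $bytes = encode( 'UTF-8', "\x{e9}" );
printf "input:  %d bytes (%s)\n", length $bytes,
    join ' ', map { sprintf '%02X', ord } split //, $bytes;

# Write it to an in-memory filehandle that has a :utf8 layer
my $buffer = '';
open my $out, '>:utf8', \$buffer or die "Cannot open in-memory handle: $!";
print {$out} $bytes;
close $out;

# Each input byte was treated as a Latin-1 character and encoded again
printf "output: %d bytes (%s)\n", length $buffer,
    join ' ', map { sprintf '%02X', ord } split //, $buffer;
```

Run as is, it prints `input:  2 bytes (C3 A9)` and `output: 4 bytes (C3 83 C2 A9)`. The two-byte character comes out double-encoded as four bytes.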
But my output is munged:
$ ./marc.pl > und.mrc
$ marcdump und.mrc | grep Sainte-Face
und.mrc
1000 records
2000 records
3000 records
4000 records
5000 records
6000 records
7000 records
8000 records
9000 records
10000 records
11000 records
12000 records
245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
610 _aArchiconfrérie de la Sainte-Face
13000 records
$
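A MARC record's leader and directory give field lengths and starting positions in bytes (the ISO 2709 structure). Suppose those numbers are computed from the byte string before STDOUT's `:utf8` layer re-encodes it. Then every offset after the first non-ASCII character points too early. That is one possible explanation for the 610 line losing its `20` indicators, though it is not verified here. The toy sketch below uses two hypothetical "fields" with no real leader or directory, and shows only the offset shift:

```perl
#!/usr/bin/perl
# Toy illustration (NOT a real MARC/ISO 2709 record): an offset computed on a
# byte string stops pointing at the right place once the string is re-encoded.
use strict;
use warnings;
use Encode qw(encode);

# Two hypothetical fields (indicators + data), stored as UTF-8 octets
my $field_245 = encode( 'UTF-8', "00 Annales de l'Archiconfr\x{e9}rie" );
my $field_610 = encode( 'UTF-8', "20 Archiconfr\x{e9}rie" );
my $record    = $field_245 . $field_610;

# A "directory" value recorded before output: where the second field starts
my $offset_610 = length $field_245;

# Send the whole record through a :utf8 layer, as the script does with STDOUT
my $written = '';
open my $out, '>:utf8', \$written or die "Cannot open in-memory handle: $!";
print {$out} $record;
close $out;

# Read the start of the second field back using the old offset
my $before = substr( $record,  $offset_610, 5 );
my $after  = substr( $written, $offset_610, 5 );
print "before: [$before]\nafter:  [$after]\n";
```

This prints `before: [20 Ar]` and `after:  [ie20 ]`. The double-encoded `é` in the first field pushes everything after it two bytes later, so the stored offset now lands inside the previous field.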
What am I doing wrong!?
--
Eric Lease Morgan
University of Notre Dame
574/631-8604