This is related to my previous post (9/17/2015) about deleting 035 fields after
RDA-ification. Jon Gorman solved that one for me by pointing out that I
probably had a problem with my Perl libraries.
But now I have a new variation. Instead of creating the record from the database
and writing it back to the database, I am reading from a file exported from my
database, which is UTF-8. The trouble is, specifically, the blasted copyright
symbol again. As stored in the database, the copyright symbol is encoded as
C2 A9, which, if I read the tables correctly, is the correct UTF-8 encoding for
copyright. But when I read the record from a file and write it out to a new
file after deleting the problematic 035, the encoding for the copyright symbol
has been turned into a bare A9.
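For reference, here is a minimal sketch, separate from the program in this post, that prints the two byte sequences in question using Perl's core Encode module. The helper name hex_bytes is made up for illustration. It only shows that U+00A9 is C2 A9 in UTF-8 and a single A9 in Latin-1 (ISO-8859-1); it does not by itself explain where the conversion happens.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as uppercase hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# U+00A9 COPYRIGHT SIGN as a Perl character string.
my $copyright = "\x{A9}";

# Encode the same character two ways and compare the resulting bytes.
my $as_utf8   = hex_bytes(encode('UTF-8', $copyright));
my $as_latin1 = hex_bytes(encode('ISO-8859-1', $copyright));

print "UTF-8:   $as_utf8\n";    # C2 A9
print "Latin-1: $as_latin1\n";  # A9
```

The single A9 in the edited file matches the Latin-1 form, which suggests the character is being written in a one-byte encoding at some point rather than having a byte dropped at random.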
This "transformation" happens both when running the Perl program on my PC and
on the Unix server. Interestingly, complicated Unicode seems to be okay. I took
a record with Hebrew vernacular characters and edited it using my program, then
ran the source record and target record through xxd. I then diffed the two
dumps; diff showed no difference. But in the record that has the copyright
symbol, the same before-and-after comparison shows the copyright munged: the C2
has been stripped.
Here's the program. If anybody can tell me what I'm doing wrong, I'd really
appreciate it.
----------------------------------------------------------------------------------------------------------
use strict;
use warnings;
use MARC::Record;
use MARC::Batch;

my $infile='4788022.bib';
my $batch = MARC::Batch->new('USMARC',"$infile");
my $outfile='4788022.edited.bib';
open(OUTPUT, ">$outfile");

while (my $record = $batch->next) {
    my $f001 = $record->field('001');
    my $bib_id = $f001->as_string();
    my @a035 = $record->field('035');
    foreach my $f035 (@a035) {
        if (my $f035a = $f035->subfield('a')) {
            if ($f035a eq $bib_id) {
                $record->delete_field($f035);
            }
        }
    }
    print OUTPUT $record->as_usmarc();
}
Anne L. Highsmith
Director, Consortia Systems
TAMU Libraries
5000 TAMU
College Station, TX 77843-5000
979 862 4234
[email protected]