Opening & writing to UTF-8 files; copyright symbol again

Highsmith, Anne L Fri, 13 Nov 2015 12:02:25 -0800

This is related to my previous post (9/17/2015) about deleting 035 fields after 
RDA-ification. Jon Gorman solved that one for me by pointing out that I 
probably had a problem with my perl libraries.


But now, instead of creating the record from the database and writing it back 
to the database, I am reading from a file exported from my database, which is 
UTF-8, and I have an encoding problem. Specifically, the blasted copyright 
symbol again. As stored in the database, the copyright symbol is encoded as 
C2 A9, which, if I read the tables correctly, is the correct UTF-8 encoding for 
copyright. But when I read the record from a file and write it back to a file 
after deleting the problematic 035, the encoding for the copyright symbol has 
been turned into A9.

This "transformation" happens both when running the Perl program on my PC and 
on the Unix server. Interestingly, complicated Unicode seems to be okay. I took 
a record with Hebrew vernacular characters and edited it using my program, then 
ran the source record and target record through xxd. I then diffed the two 
dumps; diff showed no difference. But when I do the same with the record that 
has the copyright symbol, the output has the copyright munged: the C2 has been 
stripped.

Here's the program. If anybody can tell me what I'm doing wrong I'd really 
appreciate it.
----------------------------------------------------------------------------------------------------------
use strict;
use warnings;
use MARC::Record;
use MARC::Batch;
my $infile='4788022.bib';
my $batch = MARC::Batch->new('USMARC',"$infile");
my $outfile='4788022.edited.bib';
open(OUTPUT, ">$outfile");

while (my $record = $batch->next) {
     my $f001 = $record->field('001');
     my $bib_id = $f001->as_string();

     my @a035 = $record->field('035');
     foreach my $f035 (@a035) {
           if (my $f035a = $f035->subfield('a')) {
                if ($f035a eq $bib_id) {
                     $record->delete_field($f035);
                }
           }
     }
     print OUTPUT $record->as_usmarc();
}
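
One possible explanation, offered as a sketch rather than a confirmed diagnosis: it assumes that MARC::Batch hands back field data as decoded Perl character strings when a record is flagged as UTF-8 (an assumption about the library, not something shown in the post). If so, the symptoms match how plain Perl prints character strings to a filehandle that has no encoding layer. The helper `bytes_written` below is hypothetical and uses only core modules; it writes a string to a temporary file and reports the bytes that ended up on disk.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Hypothetical helper: write a character string to a temporary file through
# the given PerlIO layer (raw bytes if none is given) and return the bytes
# that landed on disk as an uppercase hex string.
sub bytes_written {
    my ($string, $layer) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode($fh, defined $layer ? $layer : ':raw');
    {
        no warnings 'utf8';    # silence "Wide character in print" for the demo
        print {$fh} $string;
    }
    close($fh) or die "close failed: $!";

    open(my $in, '<:raw', $path) or die "cannot reopen $path: $!";
    local $/;                  # slurp the whole file
    my $raw = <$in>;
    close($in);
    return uc(unpack('H*', $raw));
}

# Decoded character strings, as a UTF-8-aware reader would produce them.
my $copyright = decode('UTF-8', "\xC2\xA9");    # U+00A9 COPYRIGHT SIGN
my $hebrew    = decode('UTF-8', "\xD7\x90");    # U+05D0 HEBREW LETTER ALEF

print bytes_written($copyright), "\n";                      # A9   (C2 is lost)
print bytes_written($hebrew), "\n";                         # D790 (intact)
print bytes_written($copyright, ':encoding(UTF-8)'), "\n";  # C2A9 (intact)
```

A string whose characters all fit below 0x100 is written as single bytes, which is how C2 A9 can collapse to A9, while a string containing any character above 0xFF is written as UTF-8 (with a "Wide character" warning), which would explain why the Hebrew record survives. If the assumption about MARC::Batch holds, giving the output handle a UTF-8 layer, for example `binmode(OUTPUT, ':utf8');` right after the `open` in the program above, would be the first thing to try.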



Anne L. Highsmith
Director, Consortia Systems
TAMU Libraries
5000 TAMU
College Station, TX   77843-5000
979 862 4234
hism...@tamu.edu
