Al, Your code worked like a charm. Ran the entire test data set (about 1,000 bibs) w/out a problem.
Dave, Being a Perl novice, I decided to forgo LocalOverride. Your description scared me off. :) I'll make that an adventure for another day.

Now I can start on my real project: mining bib records for dates and date-related phrases. So far, I appear to be chasing a nonexistent entity, but I'm determined.

Thanks to all for advice!

Mike

On Tue, May 17, 2011 at 9:27 AM, Al <ra...@berkeley.edu> wrote:
> >Anybody ever see this before?
>
> All. The. Time.
>
> When I use Encode.pm version 2.12 I don't have this problem. But it occurs
> repeatedly with version 2.40.
>
> There are a few different solutions, but I'm assuming, like me, that it's
> not practical for you to clean up your MARC records *before* you try and
> process them. So you can downgrade your Encode.pm or modify it to make it
> less demanding. For me I've found the best solution is to leave Encode.pm
> alone and redefine the offending subroutine within my processing script. I
> paste this in at the bottom of every script:
>
> package Encode;
> use Encode::Alias;
>
> sub decode($$;$)
> {
>     my ($name,$octets,$check) = @_;
>     my $altstring = $octets;
>     return undef unless defined $octets;
>     $octets .= '' if ref $octets;
>     $check ||=0;
>     my $enc = find_encoding($name);
>     unless(defined $enc){
>         require Carp;
>         Carp::croak("Unknown encoding '$name'");
>     }
>     my $string;
>     eval { $string = $enc->decode($octets,$check); };
>     $_[1] = $octets if $check and !($check & LEAVE_SRC());
>     if ($@) {
>         return $altstring;
>     } else {
>         return $string;
>     }
> }
>
> But I'll be interested in other solutions people may bring up.
>
> Good luck!
>
> Al
>
> At 5/17/2011, Mike Barrett wrote:
> >I'm using MARC::Batch and MARC::Field to iterate through a text file of
> >bibliographic records from Voyager.
> >
> >The unrecoverable error is actually occurring in the Perl Unicode module
> >which is, of course, called by MARC::Record.
> >It's running into "invalid UTF-8 character 0xC2."
> >When I looked up the Unicode character list, all of the C2 entries are found
> >hex characters, so it appears that the second half is missing.
> >
> >After looking at the records in Voyager (using Arial Unicode MS font), I
> >find that all of the problem records I've found are maps with Field 255|a
> >[scale] |b [projection] |c [geo coordinates].
> >
> >Here's an example:
> >As it appears in the text file: c(W 106¿¿¿30¿¿00¿¿--W 104¿¿¿52¿¿30¿¿/N 39¿¿¿22¿¿30¿¿--N 37¿¿¿15¿¿00¿¿).
> >As it appears in Voyager Cataloging module: ‡a Scale 1:126,720 â‡c (W 106â °30ʹ00ʺ--W 104â °52ʹ30ʺ/N 39â °22ʹ30ʺ--N 37â °15ʹ00ʺ).
> >
> >Thanks,
> >Mike Barrett
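
For anyone landing on this thread later, here is a minimal sketch of why a lone 0xC2 byte trips the decoder. In UTF-8, 0xC2 is the lead byte of a two-byte sequence, so it is only valid when a continuation byte follows it. The sketch assumes the missing second byte was 0xB0 (C2 B0 is the degree sign, U+00B0), which is only a guess based on the coordinates in field 255 |c; the helper name try_decode_utf8 is hypothetical and not part of Encode or MARC::Record.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: attempt a strict UTF-8 decode and return the decoded
# string, or undef if the octets are not well-formed UTF-8.
sub try_decode_utf8 {
    my ($octets) = @_;
    my $copy = $octets;    # decode may modify its argument when a check flag is set
    return eval { decode('UTF-8', $copy, FB_CROAK) };
}

# Assumption: the damaged character was a degree sign (U+00B0 = C2 B0).
my $complete  = "\xC2\xB0";    # lead byte plus its continuation byte
my $truncated = "\xC2";        # lead byte only, second half missing

for my $case ( [ complete => $complete ], [ truncated => $truncated ] ) {
    my ($label, $octets) = @$case;
    my $status = defined try_decode_utf8($octets) ? 'decodes' : 'invalid UTF-8';
    print "$label: $status\n";
}
```

Running a check like this over the raw 255 fields before handing records to MARC::Batch might help pin down which records are damaged, without touching Encode.pm itself.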