Re: Invalid UTF-8 characters causing MARC::Record crash.

Mike Barrett Tue, 17 May 2011 16:29:54 -0700

Al,
Your code worked like a charm.  I ran the entire test data set (about 1,000
bibs) without a problem.


Dave,
Being a Perl novice, I decided to forgo LocalOverride.  Your description
scared me off.  :)
I'll make that an adventure for another day.

Now I can start on my real project:  Mining bib records for dates and
date-related phrases.  So far, I appear to be chasing a nonexistent entity,
but I'm determined.

Thanks to all for advice!
Mike

On Tue, May 17, 2011 at 9:27 AM, Al <ra...@berkeley.edu> wrote:

> >Anybody ever see this before?
>
> All. The. Time.
>
> When I use Encode.pm version 2.12 I don't have this problem. But it occurs
> repeatedly with version 2.40.
>
> There are a few different solutions, but I'm assuming that, as in my case,
> it's not practical for you to clean up your MARC records *before* you try to
> process them. So you can downgrade your Encode.pm or modify it to make it
> less demanding. For me, the best solution has been to leave Encode.pm
> alone and redefine the offending subroutine within my processing script. I
> paste this in at the bottom of every script:
>
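> package Encode;
> use Encode::Alias;
> no warnings 'redefine';   # we are deliberately replacing Encode::decode
>
> # Same interface as the stock Encode::decode, but if the decoder dies on
> # malformed input, hand back the original octets instead of crashing.
> sub decode($$;$)
> {
>   my ($name,$octets,$check) = @_;
>   my $altstring = $octets;        # untouched copy to fall back on
>   return undef unless defined $octets;
>   $octets .= '' if ref $octets;   # stringify references/objects
>   $check ||=0;
>   my $enc = find_encoding($name);
>   unless(defined $enc){
>      require Carp;
>      Carp::croak("Unknown encoding '$name'");
>   }
>   my $string;
>   # Trap the die that a strict check raises on invalid UTF-8
>   eval { $string = $enc->decode($octets,$check); };
>   $_[1] = $octets if $check and !($check & LEAVE_SRC());
>   if ($@) {
>      return $altstring;           # decode failed: return the raw octets
>   } else {
>      return $string;
>   }
> }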
>
> But I'll be interested in other solutions people may bring up.
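
One other possibility, offered only as a rough sketch: when the CHECK
argument to Encode::decode is FB_DEFAULT (0), Encode substitutes U+FFFD for
malformed bytes instead of dying. The helper name lenient_decode_utf8 below is
hypothetical, and this does not by itself change how MARC::Record calls Encode
internally; it only helps for strings you decode yourself.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Encode or MARC::Record): decode UTF-8
# octets, substituting U+FFFD for malformed bytes instead of dying.
sub lenient_decode_utf8 {
    my ($octets) = @_;
    return undef unless defined $octets;
    # FB_DEFAULT (0) leaves the source untouched and replaces bad bytes
    return Encode::decode('UTF-8', $octets, Encode::FB_DEFAULT);
}

# A lone 0xC2 followed by a space, similar to the error reported in this thread
my $text = lenient_decode_utf8("106\xC2 30");

# Print the code points rather than the text to avoid wide-character warnings
print join(' ', map { sprintf 'U+%04X', ord } split //, $text), "\n";
```

Unlike the redefinition above, which returns the raw octets on failure, this
returns decoded text with the bad byte visibly marked, which may or may not be
what you want downstream.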
>
> Good luck!
>
> Al
>
>
>
> At 5/17/2011, Mike Barrett wrote:
> >I'm using MARC::Batch and MARC::Field to iterate through a text file of
> >bibliographic records from Voyager.
> >
> >The unrecoverable error is actually occurring in the Perl Unicode module
> >which is, of course, called by MARC::Record.
> >It's running into "invalid UTF-8 character 0xC2."
> >When I looked up the Unicode character list, all of the C2 entries are four
> >hex characters, so it appears that the second half is missing.
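
For what it's worth, that reading matches how UTF-8 works: 0xC2 is the lead
byte of a two-byte sequence (for example, 0xC2 0xB0 is U+00B0, the degree
sign), so a 0xC2 without a continuation byte is malformed. A minimal sketch of
checking this with a strict decode follows; try_strict_decode is a hypothetical
helper name and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: attempt a strict UTF-8 decode and return the decoded
# text, or undef if the octets are malformed.
sub try_strict_decode {
    my ($octets) = @_;
    # FB_CROAK makes decode die on bad input; LEAVE_SRC keeps $octets intact
    my $text = eval {
        Encode::decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $text;
}

my $complete  = "106\xC2\xB0" . "30";  # 0xC2 0xB0 encode U+00B0 (degree sign)
my $truncated = "106\xC2" . "30";      # 0xC2 with its continuation byte missing

for my $sample ($complete, $truncated) {
    my $status = defined try_strict_decode($sample) ? 'valid' : 'malformed';
    print "$status UTF-8\n";
}
```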
> >
> >After looking at the records in Voyager (using Arial Unicode MS font), I
> >find that all of the problem records I've found are maps with Field 255|a
> >[scale] |b [projection] |c [geo coordinates].
> >
> >Here's an example:
> >As it appears in the text file:  c(W 106Â¿Â¿Â¿30Â¿Â¿00Â¿Â¿--W
> >104Â¿Â¿Â¿52Â¿Â¿30Â¿Â¿/N
> >39Â¿Â¿Â¿22Â¿Â¿30Â¿Â¿--N 37Â¿Â¿Â¿15Â¿Â¿00Â¿Â¿).
> >As it appears in Voyager Cataloging module:  ‡a Scale 1:126,720 ââ€¡c (W
> >106â °30Ê¹00Êº--W 104â °52Ê¹30Êº/N 39â °22Ê¹30Êº--N 37â °15Ê¹00Êº).
> >
> >
> >Thanks,
> >Mike Barrett
>
>
