On 12/15/03 8:54 AM, Eric Lease Morgan <[EMAIL PROTECTED]> wrote:

> In order to get the MARC records for my "catalog" I have been searching the
> LOC catalog, identifying the record I desire, and using Net::Z3950 to download
> the desired record via the MARC 001 tag. Tastes great. Less filling.
> 
> When I loop through my MARC records MARC::Batch sometimes warns that the MARC
> leader is incorrect. This happens when the record contains a diacritic.
> Specifically, my MARC::Batch object returns "Invalid record length..." I have
> discovered that I can plow right through the record anyway by turning on
> strict_off, but my resulting records get really ugly at the point of the
> diacritic:
> 
>  http://infomotions.com/books/?cmd=search&query=id=russell-world-107149566

Upon further investigation, it seems that MARC::Batch is not necessarily
causing my problem with diacritics. Instead, the problem may lie in the way
I am downloading my records using Net::Z3950.

How do I tell Net::Z3950 to download a specific MARC record using a specific
character set?

To download my MARC records from the LOC I feed the value of field 001 from
a LOC MARC record to a locally developed Perl script that uses Net::Z3950.
This retrieves one and only one record. I then suck up the found record and
put it into a MARC::Record object. It is all done like this:


  # load the modules used below
  use Net::Z3950;
  use MARC::Record;

  # define some constants
  my $DATABASE = 'voyager';
  my $SERVER   = 'z3950.loc.gov';
  my $PORT     = '7090';
  
  # create a LOC (Voyager) 001 query
  my $query = "[EMAIL PROTECTED] 1=7 3118006";
  
  # create a z39.50 object
  my $z3950 = Net::Z3950::Manager->new(databaseName => $DATABASE);
  
  # assign the object some z39.50 characteristics
  $z3950->option(elementSetName => "f");
  $z3950->option(preferredRecordSyntax => Net::Z3950::RecordSyntax::USMARC);
      
  # connect to the server and check for success
  my $connection = $z3950->connect($SERVER, $PORT)
    or die "Cannot connect to $SERVER:$PORT: $!";
      
  # search
  my $results = $connection->search($query);
  
  # get the found record and turn it into a MARC::Record object
  my $record = $results->record(1);
  $record = MARC::Record->new_from_usmarc($record->rawdata());

  # create a file name
  my $id = time;

  # write the record, stopping if the file cannot be opened
  open my $out, '>', "$id.marc" or die "Cannot open $id.marc: $!";
  print $out $record->as_usmarc;
  close $out;


This process works just fine for records that contain no diacritics, but
when diacritics are in the records extra characters end up in my saved
files, like this:

  00901nam  22002651
    ^^^
  45000010008000000050017000080080041000250350021000669060045000870
  10001700132040001800149050001800167082001000185100002900195245009
  20022426000340031630000470035049000290039750400260042660000340045
  27100021004869910044005079910055005510990029006063118006
  19740417000000.0731207s1967    nyuabf   b    000 0beng  
  9(DLC)   67029856  a7bcbccorignewdueocipf19gy-gencatlg
  a   67029856   aDLCcDLCdDLC00aND588.D9bR8500a759.31
  aRussell, Francis,d1910-14aThe world of Dˆ®urer,
                                              ^^^^^^^
  1471-1528,cby Francis Russell and the editors of Time-Life
  Books.  aNew York,bTime, inc.c[1967]  a183 p.billus.,
  maps, col. plates.c32 cm.0 aTime-Life library of art
  aBibliography: p. 177.10aDˆ®urer, Albrecht,d1471-1528.2
                              ^^^^^^^
  aTime-Life Books.  bc-GenCollhND588.D9iR85tCopy 1wBOOKS
  bc-GenCollhND588.D9iR85p00034015107tCopy 2wCCF
  arussell-world-1071495663

Notice how Dürer got munged into Dˆ®urer, twice, and consequently the actual
record length is 903 bytes rather than the 901 declared in the leader.
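
For what it is worth, here is a minimal sketch of how that mismatch could be
checked by hand, assuming the raw record is available as a string of bytes.
The check_record_length() sub is a hypothetical helper of my own, not part of
MARC::Record or MARC::Batch, and the sample string is a made-up stand-in
rather than a real MARC record:

```perl
use strict;
use warnings;

# Compare the length declared in a MARC leader (positions 00-04) with the
# number of bytes actually present in the raw record.
# NOTE: check_record_length() is a hypothetical helper, not a MARC::Record method.
sub check_record_length {
    my ($raw) = @_;
    my $declared = substr($raw, 0, 5) + 0;  # first five characters of the leader
    my $actual   = length $raw;             # assumes $raw holds bytes, not decoded characters
    return ($declared, $actual, $actual - $declared);
}

# Made-up stand-in, not a real record: it claims 24 bytes but holds 26.
my $raw = '00024' . ('x' x 21);

printf "declared=%d actual=%d difference=%d\n", check_record_length($raw);
```

A non-zero difference would point at bytes being added or lost somewhere
between the download and the saved file, which appears to be what happened
to the record above.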

Some people say I must be sure to request a specific character set from the
LOC when downloading my MARC records, specifically MARC-8 or MARC-UCS. Which
one of these character sets do I want and how do I tell the remote database
which one I want?

-- 
Eric "The Ugly American Who Doesn't Understand Diacritics" Morgan
University Libraries of Notre Dame

(574) 631-8604

