On 12/15/03 8:54 AM, Eric Lease Morgan <[EMAIL PROTECTED]> wrote:
> In order to get the MARC records for my "catalog" I have been searching the
> LOC catalog, identifying the record I desire, and using Net::Z3950 to download
> the desired record via the MARC 001 tag. Tastes great. Less filling.
>
> When I loop through my MARC records MARC::Batch sometimes warns that the MARC
> leader is incorrect. This happens when the record contains a diacritic.
> Specifically, my MARC::Batch object returns "Invalid record length..." I have
> discovered that I can plow right through the record anyway by turning on
> strict_off, but my resulting records get really ugly at the point of the
> diacritic:
>
> http://infomotions.com/books/?cmd=search&query=id=russell-world-107149566
Upon further investigation, it seems that MARC::Batch is not necessarily
causing my problem with diacritics. Instead, the problem may lie in the way
I am downloading my records using Net::Z3950.
How do I tell Net::Z3950 to download a specific MARC record using a specific
character set?
To download my MARC records from the LOC, I feed the value of field 001 from
a LOC MARC record to a locally developed Perl script that uses Net::Z3950.
This retrieves one and only one record. I then suck up the found record and
put it into a MARC::Record object. It is all done like this:
use strict;
use warnings;
use Net::Z3950;
use MARC::Record;
# define some constants
my $DATABASE = 'voyager';
my $SERVER = 'z3950.loc.gov';
my $PORT = '7090';
# create a LOC (Voyager) 001 query; single quotes keep Perl from interpolating @attr
my $query = '@attr 1=7 3118006';
# create a z39.50 object
my $z3950 = Net::Z3950::Manager->new(databaseName => $DATABASE);
# assign the object some z39.50 characteristics
$z3950->option(elementSetName => "f");
$z3950->option(preferredRecordSyntax => Net::Z3950::RecordSyntax::USMARC);
# connect to the server and check for success
my $connection = $z3950->connect($SERVER, $PORT)
  or die "Unable to connect to $SERVER:$PORT: $!";
# search
my $results = $connection->search($query)
  or die "Search failed: " . $connection->errmsg();
# get the found record and turn it into a MARC::Record object
my $record = $results->record(1)
  or die "No record retrieved";
$record = MARC::Record->new_from_usmarc($record->rawdata());
# create a file name
my $id = time;
# write the record
open my $marc, '>', "$id.marc" or die "Unable to write $id.marc: $!";
print {$marc} $record->as_usmarc;
close $marc;
This process works just fine for records that contain no diacritics, but
when diacritics are in the records extra characters end up in my saved
files, like this:
00901nam 22002651
^^^
45000010008000000050017000080080041000250350021000669060045000870
10001700132040001800149050001800167082001000185100002900195245009
20022426000340031630000470035049000290039750400260042660000340045
27100021004869910044005079910055005510990029006063118006
19740417000000.0731207s1967 nyuabf b 000 0beng
9(DLC) 67029856 a7bcbccorignewdueocipf19gy-gencatlg
a 67029856 aDLCcDLCdDLC00aND588.D9bR8500a759.31
aRussell, Francis,d1910-14aThe world of D��urer,
^^^^^^^
1471-1528,cby Francis Russell and the editors of Time-Life
Books. aNew York,bTime, inc.c[1967] a183 p.billus.,
maps, col. plates.c32 cm.0 aTime-Life library of art
aBibliography: p. 177.10aD��urer, Albrecht,d1471-1528.2
^^^^^^^
aTime-Life Books. bc-GenCollhND588.D9iR85tCopy 1wBOOKS
bc-GenCollhND588.D9iR85p00034015107tCopy 2wCCF
arussell-world-1071495663
Notice how Dürer got munged into D��urer, twice. Consequently the record
length is not 901 but 903.
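The mismatch can be confirmed outside of MARC::Batch by comparing the length
declared in the first five bytes of the leader with the number of bytes
actually in the saved file, and by listing where the non-ASCII bytes sit. What
follows is only a rough sketch and not part of my download script; the helper
names check_record_length and high_bytes are made up for illustration, and it
assumes each file holds exactly one record.

```perl
use strict;
use warnings;

# Compare the length declared in the leader (bytes 0-4) with the real size.
# Returns nothing if the first five bytes are not digits.
sub check_record_length {
    my ($raw) = @_;
    my $declared = substr($raw, 0, 5);
    return unless $declared =~ /^\d{5}$/;
    my $actual = length $raw;    # byte count, provided $raw was read raw
    return {
        declared => $declared + 0,
        actual   => $actual,
        delta    => $actual - $declared,
    };
}

# List "offset:hex" for every byte outside 7-bit ASCII.
sub high_bytes {
    my ($raw) = @_;
    my @found;
    while ($raw =~ /([\x80-\xFF])/g) {
        push @found, sprintf('%d:0x%02X', pos($raw) - 1, ord $1);
    }
    return @found;
}

# Check each file named on the command line (hypothetical usage).
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Unable to read $file: $!";
    my $raw = do { local $/; <$fh> };
    close $fh;
    my $check = check_record_length($raw) or next;
    printf "%s: leader says %d, file has %d bytes (difference %d)\n",
        $file, @{$check}{qw(declared actual delta)};
    print "  high bytes: @{[ high_bytes($raw) ]}\n";
}
```

Reading the file with the :raw layer matters here, since any decoding on the
way in would change the byte count being measured.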
Some people say I must be sure to request a specific character set from the
LOC when downloading my MARC records, specifically MARC-8 or MARC-UCS. Which
one of these character sets do I want and how do I tell the remote database
which one I want?
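As far as I can tell from the MARC 21 leader definition, a record states which
character set it claims to use in leader position 09: a blank means MARC-8 and
an "a" means UCS/Unicode. A minimal sketch for reading that flag from a raw
record follows; the function name declared_charset is made up for
illustration, and the flag only says what the record claims, not what the
bytes really are after a transfer.

```perl
use strict;
use warnings;

# Report the character coding scheme claimed in leader position 09.
# Returns nothing if the data is shorter than a 24-byte leader.
sub declared_charset {
    my ($raw) = @_;
    return if length($raw) < 24;
    my $flag = substr($raw, 9, 1);
    return 'UCS/Unicode' if $flag eq 'a';
    return 'MARC-8'      if $flag eq ' ';
    return "unknown ($flag)";
}

# Leader shaped like the one in the record above, with its blanks intact.
print declared_charset('00901nam  22002651  4500'), "\n";
```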
--
Eric "The Ugly American Who Doesn't Understand Diacritics" Morgan
University Libraries of Notre Dame
(574) 631-8604