On 12/15/03 8:54 AM, Eric Lease Morgan <[EMAIL PROTECTED]> wrote:

> In order to get the MARC records for my "catalog" I have been searching the
> LOC catalog, identifying the record I desire, and using Net::Z3950 to download
> the desired record via the MARC 001 tag. Tastes great. Less filling.
>
> When I loop through my MARC records MARC::Batch sometimes warns that the MARC
> leader is incorrect. This happens when the record contains a diacritic.
> Specifically, my MARC::Batch object returns "Invalid record length..." I have
> discovered that I can plow right through the record anyway by turning on
> strict_off, but my resulting records get really ugly at the point of the
> diacritic:
>
> http://infomotions.com/books/?cmd=search&query=id=russell-world-107149566

Upon further investigation, it seems that MARC::Batch is not necessarily
causing my problem with diacritics. Instead, the problem may lie in the way I
am downloading my records using Net::Z3950.

How do I tell Net::Z3950 to download a specific MARC record using a specific
character set?

To download my MARC records from the LOC, I feed the value of a LOC MARC
record's 001 field to a locally developed Perl script that uses Net::Z3950.
This retrieves one and only one record. I then suck up the found record and
put it into a MARC::Record object. It is all done like this:

    use Net::Z3950;
    use MARC::Record;

    # define some constants
    my $DATABASE = 'voyager';
    my $SERVER   = 'z3950.loc.gov';
    my $PORT     = '7090';

    # create a LOC (Voyager) 001 query
    my $query = "[EMAIL PROTECTED] 1=7 3118006";

    # create a z39.50 object
    my $z3950 = Net::Z3950::Manager->new(databaseName => $DATABASE);

    # assign the object some z39.50 characteristics
    $z3950->option(elementSetName => "f");
    $z3950->option(preferredRecordSyntax => Net::Z3950::RecordSyntax::USMARC);

    # connect to the server and check for success
    my $connection = $z3950->connect($SERVER, $PORT)
        or die "Cannot connect to $SERVER:$PORT";

    # search
    my $results = $connection->search($query);

    # get the found record and turn it into a MARC::Record object
    my $record = $results->record(1);
    $record = MARC::Record->new_from_usmarc($record->rawdata());

    # create a file name
    my $id = time;

    # write the record
    open my $fh, '>', "$id.marc" or die "Cannot write $id.marc: $!";
    print $fh $record->as_usmarc;
    close $fh;

This process works just fine for records that contain no diacritics, but when
a record does contain diacritics, extra characters end up in my saved files,
like this:

    00901nam 22002651
      ^^^
    45000010008000000050017000080080041000250350021000669060045000870
    10001700132040001800149050001800167082001000185100002900195245009
    20022426000340031630000470035049000290039750400260042660000340045
    27100021004869910044005079910055005510990029006063118006
    19740417000000.0731207s1967 nyuabf b 000 0beng 9(DLC) 67029856
    a7bcbccorignewdueocipf19gy-gencatlg a 67029856
    aDLCcDLCdDLC00aND588.D9bR8500a759.31 aRussell, Francis,d1910-14aThe
    world of Dˆ®urer, 1471-1528,cby Francis Russell and the editors of
             ^^^^^^^
    Time-Life Books. aNew York,bTime, inc.c[1967] a183 p.billus., maps,
    col. plates.c32 cm.0 aTime-Life library of art
    aBibliography: p. 177.10aDˆ®urer, Albrecht,d1471-1528.2
                             ^^^^^^^
    aTime-Life Books. bc-GenCollhND588.D9iR85tCopy 1wBOOKS
    bc-GenCollhND588.D9iR85p00034015107tCopy 2wCCF
    arussell-world-1071495663

Notice how Dürer got munged into Dˆ®urer, twice. Consequently, the record
length is not the 901 bytes declared in the leader but 903.

Some people say I must be sure to request a specific character set from the
LOC when downloading my MARC records, specifically MARC-8 or MARC-UCS. Which
one of these character sets do I want, and how do I tell the remote database
which one I want?

-- 
Eric "The Ugly American Who Doesn't Understand Diacritics" Morgan
University Libraries of Notre Dame

(574) 631-8604
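
P.S. The length arithmetic above (901 declared, 903 actual, two mangled umlauts) can be reproduced with a minimal sketch. It rests on an assumption that the post itself does not confirm: that somewhere between the server and the saved file, MARC-8 bytes were treated as Latin-1 and re-encoded as UTF-8, which turns the single combining-umlaut byte (0xE8) into two bytes. The helper marc_lengths and the toy record are hypothetical, made up for illustration only; the toy is a five-digit length prefix plus one word, not a valid MARC record.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return the length declared in the first five
# bytes of a record and the number of bytes actually present.
sub marc_lengths {
    my ($raw) = @_;
    my $declared = 0 + substr($raw, 0, 5);
    my $actual   = length $raw;    # bytes, as long as $raw is a byte string
    return ($declared, $actual);
}

# Toy "record": a 5-digit length followed by "Dürer" in MARC-8, where the
# combining umlaut (0xE8) comes BEFORE the base letter "u".
my $toy = sprintf('%05d', 11) . "D\xE8urer";

# Assumed mishap: the bytes are read as Latin-1 and written back as UTF-8.
my $mangled = encode('UTF-8', decode('ISO-8859-1', $toy));

printf "clean:   declared %d, actual %d\n", marc_lengths($toy);
printf "mangled: declared %d, actual %d\n", marc_lengths($mangled);
```

Under that assumption each umlaut adds exactly one byte, so a record with two of them grows by two while its leader still states the original length. That would be consistent with the "Invalid record length" warning from MARC::Batch, though other conversions could produce the same symptom.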