In fact, the question is quite complex to explain, and I am not sure that I can explain it well.
At 14.57 16/12/03, you wrote:
This process works just fine for records that contain no diacritics, but when diacritics are in the records extra characters end up in my saved files, like this:
00901nam 22002651 ^^^ 45000010008000000050017000080080041000250350021000669060045000870 10001700132040001800149050001800167082001000185100002900195245009 20022426000340031630000470035049000290039750400260042660000340045 27100021004869910044005079910055005510990029006063118006 19740417000000.0731207s1967 nyuabf b 000 0beng 9(DLC) 67029856 a7bcbccorignewdueocipf19gy-gencatlg a 67029856 aDLCcDLCdDLC00aND588.D9bR8500a759.31 aRussell, Francis,d1910-14aThe world of Dˆ®urer, ^^^^^^^ 1471-1528,cby Francis Russell and the editors of Time-Life Books. aNew York,bTime, inc.c[1967] a183 p.billus., maps, col. plates.c32 cm.0 aTime-Life library of art aBibliography: p. 177.10aDˆ®urer, Albrecht,d1471-1528.2 ^^^^^^^ aTime-Life Books. bc-GenCollhND588.D9iR85tCopy 1wBOOKS bc-GenCollhND588.D9iR85p00034015107tCopy 2wCCF arussell-world-1071495663
Notice how Dürer got munged into Dˆ®urer, twice, and consequently the record length is not 901 but 903 instead.
Some people say I must be sure to request a specific character set from the LOC when downloading my MARC records, specifically MARC-8 or MARC-UCS. Which one of these character sets do I want and how do I tell the remote database which one I want?
1) When you query the LOC without specifying a character set, you receive data in the MARC-8 character set.
2) In the MARC-8 character set, a letter like "è" [e grave] is encoded as TWO bytes: one for the diacritic [the grave accent], which comes first, and one for the base letter [the letter e].
3) In the leader, positions 0-4 hold the number of characters, NOT the number of bytes. Your record has 901 characters and 903 bytes.
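To make points 2 and 3 concrete, here is a minimal sketch in Python (the same byte arithmetic applies in Perl). The byte values 0xE1 (grave) and 0xE8 (umlaut) are my reading of the MARC-8 code table, and marc8_letter, leader_length and with_leader are hypothetical helper names, not part of any MARC library. The toy record is only a string with a 5-digit length prefix, not a valid MARC record.

```python
# MARC-8 combining marks (assumed values from the MARC-8 code table).
GRAVE = 0xE1
UMLAUT = 0xE8

def marc8_letter(mark: int, base: str) -> bytes:
    """Return the two MARC-8 bytes for an accented letter: mark first, then base."""
    return bytes([mark]) + base.encode("ascii")

def leader_length(record: bytes) -> int:
    """Read the record length stored in leader positions 0-4."""
    return int(record[0:5].decode("ascii"))

def with_leader(body: bytes) -> bytes:
    """Prefix a body with a 5-digit length that counts the prefix itself (toy only)."""
    return f"{len(body) + 5:05d}".encode("ascii") + body

e_grave = marc8_letter(GRAVE, "e")
durer = b"D" + marc8_letter(UMLAUT, "u") + b"rer"
print(len(e_grave))  # 2: one byte for the accent, one for the letter
print(len(durer))    # 6: five letters on screen, six bytes on disk

toy_record = with_leader(b"The world of " + durer)
# If these two numbers disagree (as 901 and 903 do in your file),
# something changed the bytes after the leader was written.
print(leader_length(toy_record), len(toy_record))
```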
In fact, Perl's "length" function counts bytes in this case. The best option, for now, is to use a charset in which 1 character is always 1 byte, for example ISO 8859-1.
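A small sketch of why a one-byte-per-character charset keeps the counts stable, again in Python. It assumes that 0xE8 is the MARC-8 combining umlaut and uses a hypothetical helper called reencode. Whether a UTF-8 re-encoding is what actually happened to your saved file is only a guess on my part.

```python
def reencode(raw: bytes, charset: str) -> bytes:
    """Treat every byte as one ISO 8859-1 character, then write it out in `charset`."""
    return raw.decode("iso-8859-1").encode(charset)

raw = b"D\xe8urer"  # assumed MARC-8 bytes for "Dürer"; 0xE8 is the combining umlaut
text = raw.decode("iso-8859-1")

print(len(raw), len(text))                 # 6 6: one character per byte
print(reencode(raw, "iso-8859-1") == raw)  # True: the bytes survive unchanged
print(len(reencode(raw, "utf-8")))         # 7: the high byte became two bytes
```

With two such letters in a record, a UTF-8 rewrite like this would add exactly two bytes, while reading and writing the data as ISO 8859-1 (or as raw bytes) leaves the length alone.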
A good place to understand character sets is http://www.gymel.com/charsets/ [in German]
Bye
Zeno Tajoli [EMAIL PROTECTED] CILEA - Segrate (MI) 02 / 26995321