Hi Henri-Damien,

> And any LOWERCASE DIGRAPH AE or UPPERCASE DIGRAPH AE or
> LOWERCASE DIGRAPH OE is not well encoded. Encoding is
> **assumed** to be latin1 translated into utf-8 in the
> catalogue I am working on but appears respectively µ, ¥,¶
> in biblios.

     hex   MARC-8                ISO-8859-1 (Latin-1)
-    ----  --------------------  --------------------
µ    0xB5  LOWERCASE DIGRAPH AE  MICRO SIGN
¥    0xA5  UPPERCASE DIGRAPH AE  YEN SIGN
¶    0xB6  LOWERCASE DIGRAPH OE  PILCROW SIGN

> Is there a way to fix things up ?

If the underlying numerical encoding in your MARC records for the
digraphs in question is hex 0xB5, 0xA5, and 0xB6, then the character
set is not Latin-1; it is MARC-8. If that is the case, I don't believe
that anything needs to be fixed; if you are using MARC::Charset to
convert the records from MARC-8 to UTF-8, it should work.

However, it may also be that I am misunderstanding the issue. It would
help if you could provide the pertinent Perl code you are using for the
character set translation and a couple of the MARC records with
digraphs that are failing.

> ... but appears respectively µ, ¥,¶ in biblios.

Please excuse my ignorance, but what is 'biblios' in the context of
this discussion?

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Henri-Damien LAURENT [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 14, 2007 4:18 AM
> To: Doran, Michael D; perl4lib
> Subject: Re: MARC::Charset
>
> Doran, Michael D wrote:
> > Hi Henri,
> >
> > Although in my email client, the character in question appears as
> > a MICRO SIGN ("µ"), I am assuming that it is actually meant to be
> > a LOWERCASE DIGRAPH AE ("æ") since that is consistent with the
> > Latin vernacular text in your record. In MARC-8, the LOWERCASE
> > DIGRAPH AE character is a precomposed character represented by
> > 0xB5 in hex [1]. You mention that you are using MARC::File::XML
> > which in turn uses MARC::Charset. I'm wondering if there is some
> > confusion as to the expected encoding of the MARC records being
> > processed/converted?
> > If MARC::Charset is expecting MARC21 Unicode/UCS encoded records,
> > but is actually getting MARC-8 encoded records, then in that
> > context it likely wouldn't know what to do with the 0xB5 octet and
> > that might be the cause of the error you are seeing.
> >
> > -- Michael
> >
> > [1] Your MARC records appear to be encoded in MARC-8 as evidenced
> > by "ergáo" in which the combining accent character comes before
> > the character to be modified. I.e. the byte string that displays
> > as "ergáo" in your email would display as "ergò" (with a Latin
> > small letter o with grave) in a MARC-8 aware client.
> >
>
> Thanks for your answer.
> Well, this could be a precious hint.
> Indeed, in that catalogue I want to process, some books are
> ancient books and were catalogued from OCLC or SUDOC.
> And any LOWERCASE DIGRAPH AE or UPPERCASE DIGRAPH AE or
> LOWERCASE DIGRAPH OE is not well encoded. Encoding is
> **assumed** to be latin1 translated into utf-8 in the
> catalogue I am working on but appears respectively µ, ¥,¶ in biblios.
>
> Is there a way to fix things up ?
>
> --
> Henri Damien LAURENT and Paul POULAIN
> Independent consultants
> in free software and library science (http://www.koha-fr.org)
>
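
For illustration, the two readings of the octets 0xB5, 0xA5, and 0xB6
listed in the table above can be sketched in a few lines. This is only
a minimal sketch in Python, using nothing beyond the standard library;
it is not how MARC::Charset works internally. The lookup table
MARC8_DIGRAPHS and the helper names as_latin1 and as_marc8_digraphs are
hypothetical and cover just those three bytes. A real conversion should
go through a full MARC-8 implementation such as MARC::Charset.

```python
# Hypothetical lookup table: only the three MARC-8 bytes discussed in
# this thread. Real MARC-8 has many more characters plus combining
# diacritics, which this sketch does not attempt to handle.
MARC8_DIGRAPHS = {
    0xB5: "\u00e6",  # LOWERCASE DIGRAPH AE
    0xA5: "\u00c6",  # UPPERCASE DIGRAPH AE
    0xB6: "\u0153",  # LOWERCASE DIGRAPH OE
}


def as_latin1(raw: bytes) -> str:
    """Interpret the bytes as ISO-8859-1, i.e. the mistaken reading."""
    return raw.decode("latin-1")


def as_marc8_digraphs(raw: bytes) -> str:
    """Interpret the bytes as MARC-8, for ASCII and the three digraphs only.

    Any other high byte becomes U+FFFD, because this sketch does not
    know how to map it.
    """
    out = []
    for b in raw:
        if b < 0x80:
            out.append(chr(b))          # plain ASCII is unchanged
        else:
            out.append(MARC8_DIGRAPHS.get(b, "\ufffd"))
    return "".join(out)


raw = bytes([0xB5, 0xA5, 0xB6])
latin1_view = as_latin1(raw)            # micro sign, yen sign, pilcrow
marc8_view = as_marc8_digraphs(raw)     # ae, AE, oe digraphs

# ascii() keeps the output printable on any terminal encoding.
print(ascii(latin1_view))
print(ascii(marc8_view))
```

The same bytes give different characters depending only on which
character set the reader assumes, which is why knowing the actual
source encoding of the records matters before any conversion to UTF-8.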