> From: Paul Hoffman [mailto:[EMAIL PROTECTED] 
> Sent: 01 July, 2004 11:57
> Subject: Re: Displaying diacritics in a terminal vs. a browser
> 
> Unless I'm very much mistaken, Chris's code is outputting 
> UTF-8 to the terminal, not MARC-8.

> >> From: Christopher Morgan [mailto:[EMAIL PROTECTED]
> >> Sent: 01 July, 2004 10:50
> >> Subject: Displaying diacritics in a terminal vs. a browser
> >> 
> >> (I get two characters instead of one -- one with the letter 
> >> and one with the accent mark). Am I doing something wrong? 

I realized that he was outputting UTF-8, but if he started with
MARC-8 and used $cs->to_utf8 in MARC::Charset, MARC::Charset
would most likely keep the data in Unicode Normalization Form D
(NFD, the decomposed form), where an accented letter is stored as
a base letter followed by a separate combining mark. That is why
he sees two characters in the terminal. When he views the output
in a browser, the browser most likely receives the same two
characters but, depending on the fonts in use, renders them so
that they look as *if* they were one combined character.
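
To make the two-character effect concrete, here is a minimal sketch
in Python rather than Perl. It does not use MARC::Charset at all; it
only assumes the data arrives in decomposed form (the variable names
and the sample letter are my own illustration) and shows how
composing it to NFC yields a single code point.

```python
import unicodedata

# Assumed sample: "e" followed by U+0301 COMBINING ACUTE ACCENT,
# i.e. the decomposed (NFD) spelling of e-acute.
decomposed = "e\u0301"

# NFC composes the base letter and the combining mark into U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# Two code points before composition, one after.
print(len(decomposed), [f"U+{ord(c):04X}" for c in decomposed])
print(len(composed), [f"U+{ord(c):04X}" for c in composed])

# Both are valid UTF-8; only the byte sequences differ.
print(decomposed.encode("utf-8"), composed.encode("utf-8"))
```

If the terminal cannot render combining marks, composing to NFC
before printing may be enough to get one character on screen, though
that depends on the terminal and on whether a precomposed character
exists for the pair.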

> 
> http://mail.nl.linux.org/linux-utf8/2003-07/msg00231.html
> 

Nice reference...


Andy.

Andrew Houghton, OCLC Online Computer Library Center, Inc.
http://www.oclc.org/about/
http://www.oclc.org/research/staff/houghton.htm
