> From: Christopher Morgan [mailto:[EMAIL PROTECTED] 
> Sent: 01 July, 2004 10:50
> Subject: Displaying diacritics in a terminal vs. a browser
> 
> I use the $cs->to_utf8 conversion from MARC::Charset to 
> display MARC Authority records in a browser, and the 
> diacritics display properly there.
> But they don't display properly via STDOUT in my terminal 
> window (I get two characters instead of one -- one with the 
> letter and one with the accent mark). Am I doing something 
> wrong? I'm using:
>  
>       binmode (STDOUT, ":utf8");
> 
> Is there any way around this problem, or is it a limitation 
> of terminal displays? 

I'm not sure what MARC::Charset does internally, but MARC-8 
encodes a diacritic as a character separate from its base 
character.  So even with binmode(STDOUT, ":utf8") the output 
contains two characters: the base character followed by the 
combining diacritic.  If you want a single precomposed character, 
you need to combine them yourself.

It just so happens that I have recently been converting MARC-XML
to RDF.  The RDF specification mandates Unicode Normalization Form C 
(NFC), in which the base character and the diacritic are combined 
into a single precomposed character where one exists.  MARC-XML uses 
Unicode Normalization Form D (NFD), in which the base character is 
kept separate from the diacritic.  So I hacked together some Perl 
scripts to convert between NFD and NFC.
The scripts require Perl 5.8.0.
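
For illustration only, here is a minimal sketch of that kind of 
conversion.  It is not the scripts mentioned above; it assumes the 
Unicode::Normalize module (which, as far as I know, ships with Perl 
5.8) and uses a made-up sample string in place of real MARC data.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

binmode(STDOUT, ":utf8");

# Hypothetical sample: "e" followed by U+0301 COMBINING ACUTE ACCENT,
# i.e. a decomposed (NFD-style) sequence of two code points.
my $decomposed = "e\x{301}";

# NFC composes base + combining mark into U+00E9 where possible.
my $composed = NFC($decomposed);

# NFD goes the other way and splits the precomposed character again.
my $roundtrip = NFD($composed);

printf "decomposed: %d code point(s)\n", length $decomposed;
printf "composed:   %d code point(s)\n", length $composed;
print "$composed\n";
```

Whether the terminal then shows a single accented letter still 
depends on the terminal itself being set up for UTF-8.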

I was talking with a colleague, just yesterday, about whether we 
should unleash these on the Net...  They need to be cleaned up a 
little and need some basic documentation on how to run the Perl 
scripts.


Andy.

Andrew Houghton, OCLC Online Computer Library Center, Inc.
http://www.oclc.org/about/
http://www.oclc.org/research/staff/houghton.htm
