Mark Leisher <[EMAIL PROTECTED]> writes:
>    Peter> Also: since the .enc files seem to have adopted the
>    Peter> four-hex-digits-per-code-point format, how is the Encode
>    Peter> module going to handle UTF-16 surrogates?
>
>I haven't looked into the format for .enc files, but another thing that
>happens, for example, is that a single source character set codepoint can
>map to multiple Unicode codepoints.  An example is the latest version of the
>Armenian national standard, which includes single codepoints for three very
>common ligatures, each of which should be converted to two Unicode codepoints.
>The opposite can happen as well.
>
>Although it looks complicated on the surface, I highly recommend using
>Tech Report #22 on the Unicode website as a guideline for designing
>future mapping tables.

All excellent stuff. What we have today is a "trial" API and a prototype
implementation based on what Tcl uses. We needed _something_ and all we 
had was fine words and no actual code. 
(Well, there are various Unicode::Map* modules, but those all seem to
predate, and coexist badly with, native support for chars > 255 - though
I may just be misunderstanding things.)

I would be delighted if people started fixing or improving the
prototype - but we really want to prove that the API is "suitable"
for actual use (by XS modules like Tk, PerlIO, EBCDIC, ...).

What I need for Tk, and what PerlIO will need, is a fast, C-callable
API for converting between the various external encodings used by fonts
or in files and perl's internal form.

-- 
Nick Ing-Simmons <[EMAIL PROTECTED]>
Via, but not speaking for: Texas Instruments Ltd.
