Mark Leisher <[EMAIL PROTECTED]> writes:
> Peter> Also: since the .enc files seem to have adopted the four hex digit
> Peter> per code point format, how is the Encode module going to handle
> Peter> UTF16 surrogates?
>
>I haven't looked into the format for .enc files, but another thing that
>happens, for example, is that a single source character set codepoint can
>map to multiple Unicode codepoints. An example is the latest version of the
>Armenian national standard which includes single codepoints for three very
>common ligatures, each of which should be converted to two Unicode codepoints.
>The opposite can happen as well.
>
>Although it looks complicated on the surface, I highly recommend using Tech Report #22
>on the Unicode website as a guideline for designing future mapping tables.
All excellent stuff. What we have today is a "trial" API and a prototype
implementation based on what Tcl uses. We needed _something_ and all we
had was fine words and no actual code.
(Well, there are various Unicode::Map* modules, but those all seem to
predate, and coexist badly with, native support for chars > 255 - but I
may just be misunderstanding things.)
I would be delighted if people started fixing or improving the
prototype - but we really want to prove that the API is "suitable"
for actual use (by XS modules like Tk, PerlIO, EBCDIC, ...).
What I need for Tk, and what PerlIO will need, is a fast C-callable
API to convert between the various external encodings used by fonts
or in files and perl's internal form.
--
Nick Ing-Simmons <[EMAIL PROTECTED]>
Via, but not speaking for: Texas Instruments Ltd.