Dan Sugalski <[EMAIL PROTECTED]> writes:
> At 05:20 PM 6/7/2001 +0000, Nick Ing-Simmons wrote:

>> One reason perl5.7.1+'s Encode does not do asian encodings yet is that
>> the tables I have found so far (Mainly Unicode 3.0 based) are lossy.

> Joy. Hopefully by the time we're done there'll be a full
> implementation. This makes me even more determined to support non-ASCII,
> non-Unicode encodings in the core if we want to handle non-western text.

Incidentally, the largest body of work in this area that I'm aware of is
the iconv support in current versions of glibc.  That includes (in current
CVS) something pretty close to full bidirectional mappings between a huge
variety of local character sets and Unicode.

It may be worth looking at their code, although unfortunately it can't be
incorporated directly into Perl.  They may have already dealt with the
issues of lossiness or lack thereof.  As I recall from reading mailing
list traffic, one of the major recent additions was a variety of tests in
the glibc test suite to ensure that round-trip conversions through Unicode
are lossless where possible.
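
To make the round-trip idea concrete, here is a rough sketch of what such a
check might look like.  This is not glibc's test code, and it uses Python's
built-in codecs rather than iconv.  The function name round_trip_failures
is made up for illustration.  It decodes each sample to Unicode, re-encodes
it, and reports any sample that fails to convert or comes back changed.

```python
def round_trip_failures(codec, samples):
    """Return the byte strings that do not survive bytes -> Unicode -> bytes.

    Hypothetical helper for illustration only; `codec` is any codec name
    known to Python, and `samples` is an iterable of byte strings.
    """
    failures = []
    for raw in samples:
        try:
            text = raw.decode(codec)   # local character set -> Unicode
            back = text.encode(codec)  # Unicode -> local character set
        except UnicodeError:
            # The sample could not be converted in one direction or the other.
            failures.append(raw)
            continue
        if back != raw:
            # Converted without error but came back as different bytes: lossy.
            failures.append(raw)
    return failures


if __name__ == "__main__":
    samples = ["日本語".encode("euc_jp"), b"plain ascii"]
    print(round_trip_failures("euc_jp", samples))
```

A real test would walk every code point in the mapping table rather than a
few hand-picked samples, since the lossy cases tend to be rare characters
that share a single Unicode mapping.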

The other advantage of looking at glibc's approach is that they get tons
of bug reports about obscure things and conventions for using particular
characters that aren't obvious from the specifications.

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>
