Re: Encode: CJK-Guide

Nick Ing-Simmons Wed, 27 Mar 2002 02:21:22 -0800

Jungshik Shin <[EMAIL PROTECTED]> writes:
>
>> >   For Johab, no new table is necessary because Hangul precomposed
>> > syllable mapping (to Unicode) is algorithmic while Hanjas and symbols can
>> > be mapped to KS X 1001 algorithmically and then mapped to Unicode
>> > using KS X 1001 mapping table.
>
> Before going further, I have a question or two. It appears that
>euc-kr, ksc5601-raw(ksc5601-gl or whatever) and cp949 have their own
>mapping tables although they're closely related. Is there any reason
>for this?


The "compile" process automatically shares the compiled form of the
tables if they are closely related.

>In case of Johab, the easiest way to add support for it is to
>just generate the mapping table for it, but I feel uncomfortable bloating
>the code when it can be done algorithmically if I can make use of the
>mapping table for euc-kr or ksc5601(-raw). It appears that euc-jp and
>shift_jis don't share the mapping table, either although shift_jis and
>euc-jp can be more or less algorithmically converted to/from each other.
>I must be missing something here. There should be a way to do it and
>I'd be glad if someone could tell me where to look for an example case
>(e.g. shift_jis and euc-jp)
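
To make the "more or less algorithmic" relationship above concrete, here
is a minimal Python sketch of the usual byte arithmetic between EUC-JP
and Shift_JIS for the two-byte JIS X 0208 range. It is an illustration
only, not code from Encode; the function names are made up for this
example, and halfwidth katakana (0x8E) and JIS X 0212 (0x8F) sequences
are deliberately rejected rather than handled.

```python
def eucjp_to_sjis(data: bytes) -> bytes:
    """Convert ASCII + JIS X 0208 text from EUC-JP to Shift_JIS (sketch)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                       # ASCII passes through unchanged
            out.append(b)
            i += 1
        elif 0xA1 <= b <= 0xFE and i + 1 < len(data):
            # Clearing the high bit gives the 7-bit JIS X 0208 bytes.
            j1, j2 = b - 0x80, data[i + 1] - 0x80
            if not 0x21 <= j2 <= 0x7E:
                raise ValueError("bad EUC-JP trail byte at offset %d" % (i + 1))
            # Two JIS rows are packed into one Shift_JIS lead byte.
            s1 = (j1 + 1) // 2 + 0x70
            if s1 >= 0xA0:                 # skip the halfwidth katakana area
                s1 += 0x40
            if j1 % 2:                     # odd row: first half of the trail range
                s2 = j2 + 0x1F
                if s2 >= 0x7F:             # 0x7F is never a trail byte
                    s2 += 1
            else:                          # even row: second half
                s2 = j2 + 0x7E
            out += bytes((s1, s2))
            i += 2
        else:
            raise ValueError("unsupported byte 0x%02X at offset %d" % (b, i))
    return bytes(out)


def sjis_to_eucjp(data: bytes) -> bytes:
    """Inverse of eucjp_to_sjis for the same subset (sketch)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(b)
            i += 1
        elif (0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF) and i + 1 < len(data):
            s1, s2 = b, data[i + 1]
            if not 0x40 <= s2 <= 0xFC or s2 == 0x7F:
                raise ValueError("bad Shift_JIS trail byte at offset %d" % (i + 1))
            if s1 >= 0xE0:
                s1 -= 0x40
            pair = s1 - 0x70               # index of the packed row pair
            if s2 >= 0x9F:                 # second half: even JIS row
                j1, j2 = pair * 2, s2 - 0x7E
            else:                          # first half: odd JIS row
                j1 = pair * 2 - 1
                j2 = s2 - 0x1F - (1 if s2 >= 0x80 else 0)
            out += bytes((j1 | 0x80, j2 | 0x80))
            i += 2
        else:
            raise ValueError("unsupported byte 0x%02X at offset %d" % (b, i))
    return bytes(out)
```

With this arithmetic, only one of the two encodings would need a
Unicode mapping table; the other could be reached by converting bytes
first. Vendor extensions to either encoding are outside what this
sketch covers.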

There is some documentation on the API that an encoding must provide.
(I think Dan moved it out of Encode.pm.)

Most of the existing encodings use one multi-byte-to-multi-byte "engine"
with compiled tables. This works well for 8-bit encodings and can
handle the others, though not necessarily optimally.
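
As a rough illustration of what a table-driven multi-byte-to-multi-byte
engine does, here is a small Python sketch. It is not Encode's actual
engine or table format: the nested-dict layout and the names build_trie
and transcode are assumptions made for this example, and the table is
assumed to be prefix-free (no source sequence is a prefix of another).

```python
def build_trie(pairs):
    """Build a nested-dict table from (source bytes, output bytes) pairs."""
    root = {}
    for src, dst in pairs:
        node = root
        for byte in src[:-1]:
            node = node.setdefault(byte, {})   # intermediate state
        node[src[-1]] = dst                    # leaf: bytes to emit
    return root


def transcode(trie, data: bytes) -> bytes:
    """Walk the table one input byte at a time, emitting output at leaves."""
    out = bytearray()
    node = trie
    for offset, byte in enumerate(data):
        nxt = node.get(byte)
        if nxt is None:
            raise ValueError("no mapping at offset %d" % offset)
        if isinstance(nxt, dict):
            node = nxt                         # sequence not finished yet
        else:
            out += nxt                         # complete sequence: emit
            node = trie                        # and start over
    if node is not trie:
        raise ValueError("truncated multi-byte sequence")
    return bytes(out)


# Toy table: a few EUC-KR sequences mapped to UTF-8, taken from
# Python's own codec purely to have some data to load.
chars = "A한글"
table = build_trie((c.encode("euc_kr"), c.encode("utf-8")) for c in chars)
result = transcode(table, "한글A".encode("euc_kr")).decode("utf-8")
```

The same loop serves 8-bit encodings (every key is one byte long) and
multi-byte ones, which is the sense in which one engine can cover both,
even if a purpose-built converter might do better for a given encoding.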

It would be good to have some algorithmic encodings to use as
examples. The only ones we have at present are UCS-2 (as perl code)
and UTF-8 (C but buried in perl's core).
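
For comparison, the Hangul part of Johab mentioned earlier in the
thread is the kind of encoding that can be done without a table of
syllables. The Python sketch below is only an illustration, not Encode
code. It assumes the usual Johab layout (high bit set, then three 5-bit
fields for initial, medial and final), handles complete precomposed
syllables only, and uses made-up function names; Hanja and symbols are
not covered.

```python
# 5-bit Johab codes for each jamo position, in Unicode syllable order.
CHO = list(range(2, 21))                          # 19 initial consonants
JUNG = [3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15,
        18, 19, 20, 21, 22, 23, 26, 27, 28, 29]   # 21 vowels
JONG = list(range(1, 18)) + list(range(19, 30))   # fill code + 27 finals

HANGUL_BASE = 0xAC00
HANGUL_COUNT = 19 * 21 * 28                       # 11172 syllables


def hangul_to_johab(ch: str) -> bytes:
    """Encode one precomposed Hangul syllable as two Johab bytes."""
    idx = ord(ch) - HANGUL_BASE
    if not 0 <= idx < HANGUL_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    cho, rest = divmod(idx, 21 * 28)
    jung, jong = divmod(rest, 28)
    code = 0x8000 | (CHO[cho] << 10) | (JUNG[jung] << 5) | JONG[jong]
    return code.to_bytes(2, "big")


def johab_to_hangul(pair: bytes) -> str:
    """Decode two Johab bytes holding a complete Hangul syllable."""
    if len(pair) != 2 or not pair[0] & 0x80:
        raise ValueError("not a two-byte Johab sequence")
    code = int.from_bytes(pair, "big")
    try:
        cho = CHO.index((code >> 10) & 0x1F)
        jung = JUNG.index((code >> 5) & 0x1F)
        jong = JONG.index(code & 0x1F)
    except ValueError:
        raise ValueError("not a complete precomposed syllable") from None
    return chr(HANGUL_BASE + (cho * 21 + jung) * 28 + jong)
```

The three short lists replace what would otherwise be an 11172-entry
mapping table, which is the size concern raised in the question.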

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/
