Jungshik Shin <[EMAIL PROTECTED]> writes:
>
>> > For Johab, no new table is necessary because Hangul precomposed
>> > syllable mapping (to Unicode) is algorithmic while Hanjas and symbols can
>> > be mapped to KS X 1001 algorithmically and then mapped to Unicode
>> > using KS X 1001 mapping table.
>
>  Before going further, I have a question or two. It appears that
>euc-kr, ksc5601-raw (ksc5601-gl or whatever) and cp949 have their own
>mapping tables although they're closely related. Is there any reason
>for this?
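The first quoted point, that Johab needs no table of its own, can be illustrated with a minimal Python sketch. A precomposed Johab Hangul code is three 5-bit jamo fields, which feed the Unicode syllable formula U+AC00 + (L*21 + V)*28 + T; a Johab symbol or Hanja byte pair folds back into a KS X 1001 row/cell by arithmetic alone. The function names are hypothetical (not Encode's API), the field tables reflect our reading of the Johab layout, and Python's `euc_kr` codec stands in for the existing KS X 1001 mapping table.

```python
# Sketch only: hypothetical helper names, not part of Encode.

# Johab 5-bit field value -> Unicode jamo index (our reading of the layout).
INITIAL = {v: v - 2 for v in range(2, 21)}                 # 19 initial consonants
MEDIAL = {}
for first, last, offset in ((3, 7, 3), (10, 15, 5), (18, 23, 7), (26, 29, 9)):
    for v in range(first, last + 1):
        MEDIAL[v] = v - offset                             # 21 vowels, with gaps
FINAL = {1: 0}                                             # 1 = no final consonant
FINAL.update({v: v - 1 for v in range(2, 18)})             # 2..17  -> 1..16
FINAL.update({v: v - 2 for v in range(19, 30)})            # 19..29 -> 17..27 (18 unused)


def johab_hangul_to_unicode(code):
    """Precomposed Johab Hangul (16-bit, 0x8861..0xD3BD) -> code point."""
    ini = INITIAL.get((code >> 10) & 0x1F)
    med = MEDIAL.get((code >> 5) & 0x1F)
    fin = FINAL.get(code & 0x1F)
    if not code & 0x8000 or None in (ini, med, fin):
        raise ValueError("not a precomposed Johab syllable: %#06x" % code)
    # The same formula Unicode uses for all 11,172 precomposed syllables.
    return 0xAC00 + (ini * 21 + med) * 28 + fin


def johab_ksx1001_rowcell(lead, trail):
    """Johab symbol/Hanja byte pair -> KS X 1001 (row, cell), each 0x21..0x7E."""
    if 0xD9 <= lead <= 0xDE:
        t1 = 2 * (lead - 0xD9)          # symbol rows 0x21..0x2C, two per lead byte
    elif 0xE0 <= lead <= 0xF9:
        t1 = 2 * lead - 0x197           # 0xE0 -> 41, i.e. row 0x4A (first Hanja row)
    else:
        raise ValueError("not a Johab symbol/Hanja lead byte")
    if 0x31 <= trail <= 0x7E:
        t2 = trail - 0x31               # 0..77
    elif 0x91 <= trail <= 0xFE:
        t2 = trail - 0x43               # 78..187
    else:
        raise ValueError("not a Johab symbol/Hanja trail byte")
    if lead == 0xDA and 0xA1 <= trail <= 0xD3:
        # Would be KS X 1001 row 0x24 jamo; Johab encodes those in its Hangul area.
        raise ValueError("jamo are not encoded here in Johab")
    if t2 < 94:                         # first of the two rows
        return 0x21 + t1, 0x21 + t2
    return 0x22 + t1, 0x21 + t2 - 94    # second of the two rows


def johab_to_unicode(lead, trail):
    """Decode one two-byte Johab character (single-byte ASCII not handled)."""
    if lead <= 0xD3:
        return chr(johab_hangul_to_unicode((lead << 8) | trail))
    row, cell = johab_ksx1001_rowcell(lead, trail)
    # Stand-in for the existing KS X 1001 table: EUC-KR is KS X 1001 with
    # 0x80 added to both bytes, so Python's euc_kr codec does the lookup.
    return bytes([row | 0x80, cell | 0x80]).decode("euc_kr")
```

For example, `johab_to_unicode(0x88, 0x61)` gives U+AC00 and `johab_to_unicode(0xD0, 0x65)` gives U+D55C. Only the jamo-index tables above are Johab-specific; everything else is either the Unicode formula or a lookup in a table that euc-kr already needs.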
The "compile" process will share the compiled form of the tables
automatically if they are closely related.

>In case of Johab, the easiest way to add support for it is to
>just generate the mapping table for it, but I feel uncomfortable bloating
>the code when it can be done algorithmically if I can make use of the
>mapping table for euc-kr or ksc5601(-raw). It appears that euc-jp and
>shift_jis don't share the mapping table either, although shift_jis and
>euc-jp can be more or less algorithmically converted to/from each other.
>I must be missing something here. There should be a way to do it, and
>I'd be glad if someone could tell me where to look for an example case
>(e.g. shift_jis and euc-jp).

There is some documentation on the API that an encoding must provide.
(I think Dan moved it out of Encode.pm.)

Most of the existing encodings use one multi-byte-to-multi-byte "engine"
with compiled tables. This works well for 8-bit encodings and can handle
the others, though not necessarily optimally.

It would be good to have some algorithmic encodings to use as examples.
The only ones we have at present are UCS-2 (as perl code) and UTF-8
(in C, but buried in perl's core).

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
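The quoted remark that shift_jis and euc-jp can be converted algorithmically holds for the JIS X 0208 part of both encodings. EUC-JP stores the row/cell bytes with the high bit set, while Shift_JIS folds each pair of rows into one lead byte and splits the trail range between odd and even rows. A minimal Python sketch, assuming JIS X 0208 two-byte characters only (no half-width katakana, no JIS X 0212), with hypothetical function names:

```python
# Sketch only: hypothetical helper names; JIS X 0208 two-byte characters only
# (no JIS X 0201 half-width katakana, no JIS X 0212 via SS3 in EUC-JP).


def euc_jp_to_shift_jis(b1, b2):
    """One EUC-JP JIS X 0208 character -> Shift_JIS (lead, trail)."""
    if not (0xA1 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE):
        raise ValueError("not a JIS X 0208 character in EUC-JP")
    j1, j2 = b1 - 0x80, b2 - 0x80        # JIS row/cell bytes, 0x21..0x7E
    # Two JIS rows share one lead byte; leads jump from 0x9F to 0xE0 because
    # Shift_JIS reserves 0xA1..0xDF for single-byte half-width katakana.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:                           # odd row: trail 0x40..0x9E, skipping 0x7F
        s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:                                # even row: trail 0x9F..0xFC
        s2 = j2 + 0x7E
    return s1, s2


def shift_jis_to_euc_jp(s1, s2):
    """Inverse of euc_jp_to_shift_jis."""
    if 0x81 <= s1 <= 0x9F:
        j1 = (s1 - 0x70) << 1
    elif 0xE0 <= s1 <= 0xEF:
        j1 = (s1 - 0xB0) << 1
    else:
        raise ValueError("not a Shift_JIS lead byte")
    if 0x40 <= s2 <= 0x9E and s2 != 0x7F:
        j1 -= 1                          # first half of the trail range -> odd row
        j2 = s2 - (0x1F if s2 < 0x7F else 0x20)
    elif 0x9F <= s2 <= 0xFC:
        j2 = s2 - 0x7E                   # second half -> even row
    else:
        raise ValueError("not a Shift_JIS trail byte")
    return j1 + 0x80, j2 + 0x80
```

For example, `euc_jp_to_shift_jis(0xA4, 0xA2)` returns `(0x82, 0xA0)`, the two encodings of HIRAGANA LETTER A. With a conversion like this, one JIS X 0208 table could in principle serve both encodings, which is the sharing the quoted question asks about.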