On Mon, 25 Mar 2002 21:56:08 +0900 Dan Kogai <[EMAIL PROTECTED]> wrote:
> On Monday, March 25, 2002, at 09:37 , Nick Ing-Simmons wrote:
> >>
> >> in trouble?  Or perl on such systems are smart enough to load
> >> UNIVERSA.pm (I guess this is the case).
> >
> > They load UNIVERSAL.pm and the OS truncates it and finds UNIVERSA.pm.
> >
> >> Size reduction was a byproduct of */Makefile.PL linting.
> >> As for "Encode::Supports", there is another concern in perldoc; is
> >> perldoc smart enough to 8.3-ize filenames?
> >
> > Same logic as above works - name passed to OS is still the long one.
>
> Okay, I am convinced that we should stick with the original, long,
> user-friendly names but how about ucm-transitions?
> As of Encode-0.98, there are so many duped tables under Encode/ and I
> want to tidy it up if possible.  Well, for this I will wait what
> Sadahiro-san has to say....

hmm.... I'm not opposed to it.

IMO, a more significant point might be which encodings are worth
implementing in the core distribution.

In other words, it would be better to assess each encoding that is
supported only by Encode::Tcl.

AFAIK, such encodings include ISO-2022-JP-2 and ISO-2022-CN
(defined by 2022-jp2.enc and 2022-cn.enc, respectively).

But it may seem weird to encode to them, since their character sets
define many of the same characters more than once.

Here is an example of ISO-2022-CN cited from RFC 1922.

   Example: the hex sequence

      1b 24 29 41 0e 3d 3b 3b 3b 1b 24 29 47 47 28 5f 50 0f

   represents the Chinese word for "Interchange" (jiao huan) twice;
   where, <3d 3b 3b 3b> is "jiao huan" in GB (GB 2312-80),
   and <47 28 5f 50> is "jiao huan" in CNS (CNS 11643 plane-1).

Decoding it gives "\x{4ea4}\x{6362}\x{4ea4}\x{63db}".
Both occurrences of "jiao" map to the same Unicode code point (U+4EA4)!

Encoding "\x{4ea4}\x{6362}\x{4ea4}\x{63db}" back to ISO-2022-CN
gives the following hex sequence:

      1b 24 29 41 0e 3d 3b 3b 3b 3d 3b 1b 24 29 47 5f 50 0f

where <3d 3b 3b 3b 3d 3b> is "jiao huan jiao" in GB,
and <5f 50> is "huan" in CNS. So the round trip does not
reproduce the original byte sequence.

How about it?
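
To make the round trip above concrete, here is a minimal Python sketch.
It is not the Encode::Tcl code: the names GB2312, CNS1, CHARSETS,
decode() and encode() are made up for illustration, the tables hold
only the four codes from the RFC 1922 example, and I assume the encoder
simply tries the character sets in a fixed order (GB first, then CNS).

```python
# Toy tables: only the codes that appear in the RFC 1922 example.
# Keys are 7-bit two-byte codes, values are Unicode characters.
GB2312 = {b"\x3d\x3b": "\u4ea4", b"\x3b\x3b": "\u6362"}
CNS1 = {b"\x47\x28": "\u4ea4", b"\x5f\x50": "\u63db"}

ESC_GB = b"\x1b\x24\x29\x41"   # ESC $ ) A : designate GB 2312
ESC_CNS = b"\x1b\x24\x29\x47"  # ESC $ ) G : designate CNS 11643 plane 1
SO, SI = b"\x0e", b"\x0f"      # shift out / shift in

# Assumed preference order used when encoding (hypothetical).
CHARSETS = [(ESC_GB, GB2312), (ESC_CNS, CNS1)]


def decode(data: bytes) -> str:
    designations = dict(CHARSETS)
    table, shifted, out, i = None, False, [], 0
    while i < len(data):
        if data[i:i + 4] in designations:
            table = designations[data[i:i + 4]]
            i += 4
        elif data[i:i + 1] == SO:
            shifted, i = True, i + 1
        elif data[i:i + 1] == SI:
            shifted, i = False, i + 1
        elif shifted:
            out.append(table[data[i:i + 2]])  # two-byte character
            i += 2
        else:
            out.append(chr(data[i]))          # plain ASCII
            i += 1
    return "".join(out)


def encode(text: str) -> bytes:
    out, current, shifted = bytearray(), None, False
    for ch in text:
        # Take the first character set, in preference order, that has ch.
        for esc, table in CHARSETS:
            code = next((k for k, v in table.items() if v == ch), None)
            if code is not None:
                break
        else:
            raise ValueError(f"no toy mapping for {ch!r}")
        if esc != current:
            out += esc
            current = esc
        if not shifted:
            out += SO
            shifted = True
        out += code
    if shifted:
        out += SI
    return bytes(out)


original = bytes.fromhex("1b 24 29 41 0e 3d 3b 3b 3b 1b 24 29 47 47 28 5f 50 0f")
text = decode(original)
roundtrip = encode(text)
print(roundtrip.hex())
```

With these toy tables, the second "jiao" comes back as a GB code rather
than a CNS code, so roundtrip differs from original even though both
byte sequences decode to the same four characters.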

More confusing is ISO-2022-JP-2, as it has JIS/GB/KS characters.
Many kanji/hanzi/hanja are *triplicated*!
(Of course, the triplicates also include hiragana, katakana, Greek, etc.)

Language tagging might be one way to distinguish the languages,
but would tags be truly useful?

NOTE

In Encode::Tcl::Escape::encode(), each character is looked up in the
character sets in the order in which they are listed in the .enc file.
For example, according to 2022-jp2.enc, jis0212 is preferred to gb2312,
and gb2312 to ksc5601.

    E
    name          iso2022-jp2
    init          {}
    final         {}
    ascii         \x1b(B
    ascii         \x1b(J
    jis0208       \x1b$B
    jis0208       \x1b$@
    jis0212       \x1b$(D
    gb2312        \x1b$A
    ksc5601       \x1b$(C
    7bit-latin1   \x1b.A
    7bit-greek    \x1b.F

> At least euc-jp must be in *.ucm because it contains triple-bytes (JIS
> X 0212), which Encode::Tcl used to handle via Encode::Tcl::Extended but
> now ::Extended is gone....

Well, I have already agreed with that:
http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2002-03/msg00076.html

> Dan the Encode Maintainer

Regards,
SADAHIRO Tomoyuki