Re: README.jp, README.tw, README.cn, README.kr
On Sun, 14 Apr 2002, Dan Kogai wrote:

> On Sunday, April 14, 2002, at 05:38, Sean M. Burke wrote:
>> At 23:30 2002-04-13 +0300, Jarkko Hietaniemi wrote:
>>> (You know what? Since the files will be named README.xx and
>>> written in pod, the build machinery will automatically create the
>>> pod pages perljp, perltw, perlcn, and perlkr...)
>>
>> BTW, you all know those are country codes and not language tags,
>> right?
>
> Right. But sometimes we have to bend the rule to keep legacy systems
> happy. So be it .(cn|jp|kr|tw) instead of .(zh_cn|ja|ko|zh_tw) ;)

I'm just wondering which legacy systems we have to (or can) make happy by using (cn|jp|kr|tw) in place of (zh_cn|ja|ko|zh_tw). My North Korean brethren may not like it much if I use 'kr' instead of 'ko' (ko_kr) :-)

BTW, I'm sorry to make things more complicated when we seem to have enough of a headache with perldoc's handling of 8-bit characters. However, I can't help thinking it'd be better to write README.xx in UTF-8 and let Encode convert it to a legacy encoding depending on the current locale setting (LC_CTYPE, as reported by nl_langinfo(CODESET)) than the other way around. Am I missing something here?

Jungshik
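A minimal sketch of the conversion Jungshik describes: the README is kept as UTF-8 bytes and converted to whatever encoding the current LC_CTYPE locale reports. The helper names `locale_encoding_name` and `readme_for_locale` are hypothetical and appear nowhere in the thread, and the fallback to UTF-8 when the locale's codeset is unknown to Encode is an assumption made for illustration, not something anyone proposed.

```perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding);
use I18N::Langinfo qw(langinfo CODESET);
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: ask the locale which codeset it uses and map it
# to a name Encode knows. Falls back to UTF-8 (an assumption).
sub locale_encoding_name {
    setlocale(LC_CTYPE, "");            # adopt the user's locale settings
    my $codeset = langinfo(CODESET);    # e.g. "EUC-KR" or "UTF-8"
    return 'UTF-8' unless defined $codeset && length $codeset;
    my $enc = find_encoding($codeset);  # undef if Encode does not know it
    return $enc ? $enc->name : 'UTF-8';
}

# Hypothetical helper: take README content as UTF-8 bytes and return
# bytes in the target encoding (default: the locale's encoding).
sub readme_for_locale {
    my ($utf8_bytes, $target) = @_;
    $target = locale_encoding_name() unless defined $target;
    my $text = decode('UTF-8', $utf8_bytes);   # bytes -> Perl characters
    return encode($target, $text);             # characters -> legacy bytes
}

print "Output encoding for this locale: ", locale_encoding_name(), "\n";
```

Characters that the target encoding cannot represent are replaced with a substitution character under Encode's default settings, so a Korean README rendered under an ASCII locale would come out as question marks rather than fail.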
Re: README.jp, README.tw, README.cn, README.kr
At 08:09 2002-04-14 -0400, [EMAIL PROTECTED] wrote:

> BTW, I'm sorry to make things more complicated when we seem to have
> enough headache with perldoc's handling of 8bit characters.

I was thinking today about the situation where someone is Korean (for example), and they try doing perldoc README.kr (or whatever it ends up getting called), and they can't see the content, for any of a dozen reasons. I was a bit bothered by that possibility, until I recalled that they could always just look at the README.kr file in their browser, and have it appear as a plaintext file in a selectable encoding. In my experience, if people have only one program at all that can render their language, it's their browser.

--
Sean M. Burke  http://www.spinn.net/~sburke/
Re: README.jp, README.tw, README.cn, README.kr
> by using (cn|jp|kr|tw) in place of (zh_cn|ja|ko|zh_tw). My North
> Korean brethren may not like it much if I use 'kr' instead of 'ko'
> (ko_kr) :-)

Maybe then go for README.ko? But keep README.tw and README.cn, to separate Traditional from Simplified. Japanese may be either ja or jp, whichever feels better to Dan. Yes, there's no strict country-vs-language logic in there, just the logic that the names are different enough to work in 8.3.

I'll think some more about the legacy vs UTF-8 matter, but as I said, that would sort of ruin the idea of having something in each CJK native encoding and showing how easy it is to start using Perl and convert to using Unicode.

--
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
[Encode] 1.40 will be released in a few hours!
Folks,

I will release ver. 1.40 of Encode after the smoke tests are done. With In-XSimmons' XS version of Unicode transcoders, encoding.pm enhancements and fixes (which led to the discovery of the "child gets croaked before born" bug), and other nits picked, a simple version increment is not enough.

* With all modules loaded, it can transcode some 113 encodings, and it is easy to add more via enc2xs.
* With the encoding pragma, you can emulate Jperl and more.
* Though Encode accounts for some 30% of PERL5LIB in size, its memory consumption is not that big. Here is a list of core file sizes, via dump immediately after the modules were loaded on my FreeBSD box:

    perl alone        774,144 bytes
    No Encode::XX   1,171,456
    With All        2,990,080
    All+HanExtra    3,534,848

* I decided not to include Indics. It is MY obsession to include all encodings that are available at unicode.org, but come to think of it, HanExtra is already 'external', and for other encodings there are always others who have them as their 'obsession'. So I decided to wait till my obsession becomes 'ours'. And I have already added the '-C' option to enc2xs so that post-installed modules can also join the demand-loading list. Better to take time and let it mature enough for production quality.

Detailed Changes right after my signature.

Dan the Encode Maintainer

1.40
+ Encode/ConfigLocal_PM.e2x
! lib/Encode/Config.pm
! bin/enc2xs
  enc2xs -C now generates/updates Encode::ConfigLocal.
  ConfigLocal_PM.e2x is a skeleton thereof.
! lib/Encode/Config.pm
! CN/CN.pm
  use Encode::CN::HZ; was missing.
! t/Unicode.t
! t/unibench.t
  More rigorous tests added to test XS, especially on memory
  allocation.
! Encode.xs
! lib/Encode/Unicode.pm
  NI-S implemented an XS version -- merged
  Message-Id: [EMAIL PROTECTED]
! encoding.pm
! t/jperl.t
  Source filter option added. With this option on, you can write
  perl 5.8-savvy scripts (such as UTF-8 identifiers) in legacy
  encodings. t/jperl.t enhanced to test this feature.
! t/Unicode.t
  ok() gotcha addressed by Benjamin fixed. Though I didn't exactly
  apply his suggestion, this degree of nitting is enough to add him
  to AUTHORS list.
  Message-Id: [EMAIL PROTECTED]
! JP/JP.pm
+ lib/Encode/JP/JIS7.pm
- lib/Encode/JP/JIS.pm
- lib/Encode/JP/2022_JP.pm
- lib/Encode/JP/2022_JP1.pm
  7bit-jis, iso-2022-jp and iso-2022-jp1 are all aggregated to
  JIS7.pm for better maintainability and performance
! encoding.pm
  Added caveat for non-ascii identifiers.
! encoding.pm
  fixes by jhi, the original author of this pragmatic module.
  Message-Id: [EMAIL PROTECTED]
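
For readers who have not used Encode, here is a small sketch of the transcoding and demand-loading the announcement refers to. It uses only `Encode->encodings` and `from_to` from the Encode module. The sample string and variable names are illustrative assumptions, and the number of encodings reported depends on the installed Encode version, so no count is shown.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Encodings already loaded vs. everything Encode can load on demand.
my @loaded = Encode->encodings();
my @all    = Encode->encodings(":all");
printf "%d encodings loaded, %d available on demand\n",
    scalar @loaded, scalar @all;

# Transcode a byte string in place from EUC-JP to UTF-8.
# The sample is HIRAGANA LETTER A (U+3042), which is A4 A2 in EUC-JP.
my $bytes = "\xA4\xA2";
my $len   = from_to($bytes, "euc-jp", "utf8");   # new length in bytes

printf "converted to %d bytes: %s\n", $len,
    join(" ", map { sprintf "%02X", ord } split //, $bytes);
```

`from_to` works on byte strings and overwrites its first argument, which suits bulk conversion of files. For text that will be processed as characters inside Perl, `decode` followed by `encode` is usually the clearer choice.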