Re: README.jp, README.tw, README.cn, README.kr

2002-04-14 Thread jshin

On Sun, 14 Apr 2002, Dan Kogai wrote:

 On Sunday, April 14, 2002, at 05:38 , Sean M. Burke wrote:
  At 23:30 2002-04-13 +0300, Jarkko Hietaniemi wrote:
  (You know what?  Since the files will be named README.xx and
  written in pod, the build machinery will automatically create
  the pod pages perljp, perltw, perlcn, and perlkr...)
 
  BTW, you all know those are country codes and not language tags, right?
 
 Right.  But sometimes we have to bend the rule to keep legacy systems 
 happy.  So be it .(cn|jp|kr|tw) instead of .(zh_cn|ja|ko|zh_tw) ;)

  I'm just wondering what legacy system we have to/can make happy
by using (cn|jp|kr|tw) in place of (zh_cn|ja|ko|zh_tw).  My North Korean
brethren may not like it much if I use 'kr' instead of 'ko' (ko_kr) :-)

  BTW, I'm sorry to make things more complicated when we seem to
have enough headache with perldoc's handling of 8bit characters.  However,
I can't help thinking it'd be better to write README.xx in UTF-8 and let
Encode convert it to the legacy encoding of the current locale
(LC_CTYPE, i.e. whatever nl_langinfo(CODESET) reports) than the other
way around.  Am I missing something here?

  Jungshik 




Re: README.jp, README.tw, README.cn, README.kr

2002-04-14 Thread Sean M. Burke

At 08:09 2002-04-14 -0400, [EMAIL PROTECTED] wrote:
BTW, I'm sorry to make things more complicated when we seem to have enough 
headache with perldoc's handling of 8bit characters.

I was thinking today about the situation where someone is Korean (for 
example), and they try doing perldoc README.kr (or whatever it ends up 
getting called), and they can't see the content, for any of a dozen reasons.

I was a bit bothered by that possibility, until I recalled that they
could always just open the README.kr file in their browser and have it
appear as a plain-text file in a selectable encoding.  In my experience,
if people have only one program that can render their language, it's
their browser.


--
Sean M. Burke    http://www.spinn.net/~sburke/




Re: README.jp, README.tw, README.cn, README.kr

2002-04-14 Thread Jarkko Hietaniemi

  by using (cn|jp|kr|tw) in place of (zh_cn|ja|ko|zh_tw).  My North Korean
  brethren may not like it much if I use 'kr' instead of 'ko' (ko_kr) :-)

Maybe go for README.ko, then?  But keep README.tw and README.cn to separate
Traditional and Simplified Chinese.  Japanese may be either ja or jp,
whichever feels better to Dan.  Yes, there's no strict country-vs-language
logic in there, just the logic that the names are distinct enough to work
in 8.3 filenames.

I'll think some more about the legacy vs UTF-8 matter, but as
I said, that would sort of ruin the idea of having something in each
CJK native encoding and showing how easy it is to start using Perl
and converting to Unicode.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



[Encode] 1.40 will be released in a few hours!

2002-04-14 Thread Dan Kogai

Folks,

   I will release ver. 1.40 of Encode after the smoke testings are done.  
With Nick Ing-Simmons' XS version of the Unicode transcoders, encoding.pm
enhancements and fixes (which led to the discovery of the "child gets
croaked before born" bug), and other nits picked, a simple version
increment is not enough.

* With all modules loaded, it can transcode some 113 encodings and it is 
easy to add more via enc2xs.
* With encoding pragma, you can emulate Jperl and more
* Though Encode accounts for some 30% of PERL5LIB in size, its memory
consumption is not that big.  Here are the sizes of core files dumped
immediately after the modules were loaded on my FreeBSD box.

perl alone          774,144 bytes
No Encode::XX     1,171,456 bytes
With All          2,990,080 bytes
All+HanExtra      3,534,848 bytes

* I decided not to include the Indic encodings.  It is MY obsession to
include every encoding available at unicode.org, but come to think of it,
HanExtra is already 'external', and for other encodings there will always
be someone else's 'obsession'.  So I decided to wait till my obsession
becomes 'ours'.  I have already added a '-C' option to enc2xs so that
post-installed modules can also join the demand-loading list.  Better to
take time and let it mature to production quality.

Detailed changes follow my signature.

Dan the Encode Maintainer

1.40
+ Encode/ConfigLocal_PM.e2x
! lib/Encode/Config.pm
! bin/enc2xs
   enc2xs -C now generates/updates Encode::ConfigLocal.
   ConfigLocal_PM.e2x is a skeleton thereof.
! lib/Encode/Config.pm
! CN/CN.pm
   use Encode::CN::HZ; was missing.
! t/Unicode.t
! t/unibench.t
   More rigorous tests added to test XS, especially on memory allocation.
! Encode.xs
! lib/Encode/Unicode.pm
   NI-S implemented an XS version -- merged
   Message-Id: [EMAIL PROTECTED]
! encoding.pm
! t/jperl.t
   Source filter option added.  With this option on, you can write
   perl 5.8-savvy scripts (such as UTF-8 identifiers) in legacy
   encodings.  t/jperl.t enhanced to test this feature.
! t/Unicode.t
   The ok() gotcha pointed out by Benjamin is fixed.  Though I didn't
   apply his suggestion exactly, this degree of nit-picking is enough to
   add him to the AUTHORS list.
   Message-Id: [EMAIL PROTECTED]
! JP/JP.pm
+ lib/Encode/JP/JIS7.pm
- lib/Encode/JP/JIS.pm
- lib/Encode/JP/2022_JP.pm
- lib/Encode/JP/2022_JP1.pm
   7bit-jis, iso-2022-jp and iso-2022-jp1 are all aggregated into
   JIS7.pm for better maintainability and performance.
! encoding.pm
   Added caveat for non-ascii identifiers.
! encoding.pm
   Fixes by jhi, the original author of this pragmatic module.
   Message-Id: [EMAIL PROTECTED]