Dan Kogai <[EMAIL PROTECTED]> writes:
 
>    Japanese is notorious for the number of character encodings used: JIS,
> Shift JIS, EUC, and now Unicode.  JIS (ISO-2022-JP, to be more exact) is the
> de facto standard for e-mail, Shift JIS is the de facto standard for Win/Mac
> files, and EUC is the de facto standard for Unixen.  Unicode is the de facto
> standard for internal representation but is not as popular as a data
> exchange format.  When you handle Japanese strings, you must not assume
> incoming data is in the character set you are using.
>   The easiest solution is as follows:

Why should Unicode be the "de facto standard for internal
representation"? Or an "internal standard" for whom, or for what? In Perl
that may be true, but as a general statement I cannot agree. Still, I
would like to hear your reasoning.
E.g. if I were going to build a database for one of the bigger
Kanwa-Jiten (Chinese/Japanese character dictionaries), I would rather
use TAD (TRON encoding) than its competitor Unicode.
For most other things I am quite happy with euc-jp.
  
> * Use perl 5.6.0 or above
Or, if you can't use 5.6.0, learn the basics of Japanese information
processing with byte-oriented Perl.
Good starting points are Ken Lunde's PDF files at:
  http://examples.oreilly.com/cjkvinfo/perl/
but if you want to get serious about Japanese information processing
with byte-oriented Perl, you should get the whole book:
   http://www.oreilly.com/catalog/cjkvinfo/

> * convert any string to utf8 using Jcode or other modules
> * convert to other character set when you need to output
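For anyone who wants to see the recipe in concrete form, here is a minimal sketch of "decode once on input, encode only on output". It is written in Python rather than Perl, and Python's standard codecs (iso2022_jp, utf_8, euc_jp, shift_jis) stand in for Jcode. The function names, the fixed candidate list and the naive "first codec that decodes cleanly" guess are all hypothetical simplifications, not how Jcode actually detects encodings.

```python
# Sketch only: Python's built-in codecs stand in for Jcode here.
# The candidate order and the "first codec that decodes cleanly" rule are
# hypothetical simplifications; real detection is considerably smarter.

# 7-bit ISO-2022-JP first (it rejects bytes >= 0x80), then strict UTF-8
# (it usually rejects EUC/SJIS byte patterns), then EUC-JP, then Shift JIS.
CANDIDATES = ("iso2022_jp", "utf_8", "euc_jp", "shift_jis")


def to_internal(raw: bytes) -> str:
    """Convert incoming bytes to an internal Unicode string."""
    for enc in CANDIDATES:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("could not decode input with any candidate encoding")


def to_output(text: str, enc: str) -> bytes:
    """Convert the internal string to the character set needed for output."""
    return text.encode(enc)


s = "あいう"
print(to_internal(s.encode("euc_jp")) == s)  # True
print(to_output(s, "shift_jis"))             # b'\x82\xa0\x82\xa2\x82\xa4'
```

All processing then happens on the decoded string, and the output encoding (for example iso2022_jp for mail) is chosen only at the last step.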

Maybe I am old-fashioned, but I still use euc-jp or sjis for
most of my processing and output, and I am quite happy with
them.


>   Perl 5.0.x and below can handle EUC fairly well, but regexes may fail.  If
> you don't use regexes, just replace utf8 with EUC in the recipe above.

Ken Lunde's PDFs and book will help with the regex problem.
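The regex problem comes from byte-oriented matching having no idea where characters begin: a two-byte EUC-JP pattern can match the second byte of one character plus the first byte of the next. A small sketch of this, in Python rather than Perl (the standard re module and euc_jp codec; the helper names are hypothetical), compares a bytes regex with a character regex on "あい":

```python
import re

# "あい" in EUC-JP is A4 A2 A4 A4: two 2-byte characters.
text = "あい"
raw = text.encode("euc_jp")

# The middle two bytes (A2 A4) straddle the character boundary, yet they
# are themselves a valid EUC-JP character, so a byte-level search can
# "find" a character that is not in the text at all.
straddle = raw[1:3]
phantom_char = straddle.decode("euc_jp")


def byte_match(haystack: bytes, needle: bytes) -> bool:
    """Byte-oriented search, like a regex run on undecoded EUC data."""
    return re.search(re.escape(needle), haystack) is not None


def char_match(haystack: str, needle: str) -> bool:
    """Character-oriented search on decoded text."""
    return re.search(re.escape(needle), haystack) is not None


print(byte_match(raw, straddle))       # True: a false positive
print(char_match(text, phantom_char))  # False: no such character in "あい"
```

Byte-oriented code has to anchor its patterns to character boundaries to avoid this, which is the kind of technique Lunde describes.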

> Dan the Developer of Jcode
Andreas Marcel the happy and thankful user of Jcode