Re: Unicode. Perl does the right thing?

2002-10-24 Thread Dan Kogai
On Friday, Oct 25, 2002, at 14:10 Asia/Tokyo, Philip Newton wrote:
> Well, partially because there are no "good" names for many of the
> characters. What do you call "生"? "CJK UNIFIED IDEOGRAPH-751F"? (That's
> the current Unicode "name", but it's not particularly useful.) "CJK
> shou"? "CJK sei"? "CJK sheng1"? "CJK saeng"? "CJK ikiru"? ikasu, ikeru,
> umareru, umu, ou, haeru, hayasu, ki, nama, naru, nasu, musu,  which
> one do you pick?

If we are stuck with the de jure, ex officio names from the Unicode
Consortium, we are out of luck. But this is Perl: if there is more than
one way to do it, why not more than one way to name it? I am kind of
wondering about a charnames extension that goes like this:

use charnames ":ja"; # Japanese
print "\N{sei-ikiru}";
#
use charnames ":ko";
print "\N{saeng}";
#
use charnames ":zh";
print "\N{sheng1}";
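
For what it's worth, later releases of the charnames pragma (well after the 5.8 era of this thread) gained a real ":alias" option that covers part of this wish. The sketch below assumes such a perl. The reading "sei-ikiru" is an illustrative alias of my own, not a name charnames knows about, and the official name it reports is still the algorithmic CJK one.

```perl
use strict;
use warnings;

# ":alias" is a real charnames option in later perls (not 5.8); the alias
# name "sei-ikiru" is a hypothetical reading chosen for this example.
use charnames ":full", ":alias" => {
    "sei-ikiru" => 0x751F,    # U+751F, the ideograph discussed above
};

my $char = "\N{sei-ikiru}";

# The official Unicode name stays the algorithmic "CJK UNIFIED IDEOGRAPH-XXXX".
my $official = charnames::viacode(ord $char);

printf "U+%04X %s\n", ord $char, $official;
```

A per-language tag such as ":ja" would still need a table of readings behind it; the alias mechanism only lets each script supply its own.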

Since the pragma approach is rather inflexible, I would prefer an OO
approach, like:

use Char::Name;

my $char = Char::Name->new;

print $char->jp("sei-ikiru");
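
To make the OO idea concrete, here is a minimal sketch of what such a class might look like. Char::Name and its jp method are hypothetical (taken from the proposal, not an existing CPAN module), and the tiny "onyomi-kunyomi" table is hand-made for illustration; a real module would presumably build it from the Unihan data.

```perl
use strict;
use warnings;

package Char::Name;    # hypothetical module, used only for this sketch

# Hand-made "onyomi-kunyomi" => code point table; illustrative only.
my %JA = (
    "sei-ikiru"    => 0x751F,    # sei / ikiru
    "shou-chiisai" => 0x5C0F,    # shou / chiisai
    "shi-kau"      => 0x98FC,    # shi / kau
);

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Return the character for a Japanese reading pair, or die if unknown.
sub jp {
    my ($self, $name) = @_;
    my $cp = $JA{$name};
    die "Char::Name: no character for '$name'\n" unless defined $cp;
    return chr $cp;
}

package main;

binmode STDOUT, ":encoding(UTF-8)";
my $char = Char::Name->new;
print $char->jp("sei-ikiru"), "\n";
```

Keying on the on/kun pair rather than a single reading is one way to cope with ambiguity, since a bare "sei" would match a great many characters.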

I know Japanese is the biggest nightmare when it comes to naming
characters, because in Japanese we give each character too many
"names"; it's really hard to disambiguate them.

I may come up with something as I look through the Unihan DB, now
accessible via CPAN (Unicode::Unihan).

> Cheers,
> Philip Newton (不衣律不入豚)

\x{5c0f}\x{98fc} \x{5f3e}


Re: [Encode] 1.80 released

2002-10-24 Thread Dan Kogai
On Friday, Oct 25, 2002, at 09:29 Asia/Tokyo, [EMAIL PROTECTED] wrote:

> I'd recommend the small patch below, which will make it possible to
> run the new rt.pl in any of the standard manners under the core:
>   ( cd t ; ./perl TEST ../ext/Encode/t/rt.pl )
>   ( cd t ; ./perl harness ../ext/Encode/t/rt.pl )
>   PERL_CORE=1 ./perl -Ilib ext/Encode/t/rt.pl
>
> With this patch, those tests also pass (eventually :).


Thanks, applied back :) I feel relieved now, and doubly relieved to find
what a meticulous pumpking you are. I wonder how Net::Ping's rather
obvious bug (for *BSD users) slipped through :) I was also surprised to
find that your name was not in ext/Encode/AUTHORS. Now added.

With that done, please proceed to the next patch, the one fixing tr///:

From: Dan Kogai <[EMAIL PROTECTED]>
Date: Mon Oct 21, 2002  17:36:02 Asia/Tokyo
To: hv <[EMAIL PROTECTED]>, Inaba Hiroto <[EMAIL PROTECTED]>, Jarkko 
Hietaniemi <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: The Inaba patch for tr/// vs. use encoding
Message-Id: <[EMAIL PROTECTED]>

I KNOW you are working on it (or at least reviewing it), but this is
just a reminder.

Dan the Perl5 Porter



Re: [Encode] 1.80 released

2002-10-24 Thread hv
Dan Kogai <[EMAIL PROTECTED]> wrote:
:   I have released Encode 1.80 despite the fact I just released 1.79 
:less than 24 hours ago.

Thanks, integrated as change #18057.

I'd recommend the small patch below, which will make it possible to
run the new rt.pl in any of the standard manners under the core:
  ( cd t ; ./perl TEST ../ext/Encode/t/rt.pl )
  ( cd t ; ./perl harness ../ext/Encode/t/rt.pl )
  PERL_CORE=1 ./perl -Ilib ext/Encode/t/rt.pl

With this patch, those tests also pass (eventually :).

Hugo
--- Encode-1.80/t/rt.pl.old Fri Oct 25 00:14:20 2002
+++ Encode-1.80/t/rt.pl Fri Oct 25 00:14:55 2002
@@ -4,9 +4,11 @@
 #
 
 BEGIN {
+my $ucmdir  = "ucm";
 if ($ENV{'PERL_CORE'}){
 chdir 't';
 unshift @INC, '../lib';
+$ucmdir = "../ext/Encode/ucm";
 }
 require Config; import Config;
 if ($Config{'extensions'} !~ /\bEncode\b/) {
@@ -19,7 +21,6 @@
 }
 use strict;
 require Test::More;
-my $ucmdir  = "ucm";
 our $DEBUG;
 our @ucm;
 unless(@ARGV){



Re: MD5 digest of UTF-8 string in Perl 5.8

2002-10-24 Thread Markus Kuhn
Gisle Aas wrote on 2002-10-23 15:01 UTC:
> md5_hex(Encode::encode_utf8($string))

Thanks, that looks indeed like the proper solution.
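
A minimal sketch of Gisle's suggestion, using the core Digest::MD5 and Encode modules: md5_hex only hashes bytes, so a string holding U+20AC croaks, while its UTF-8 encoding (the three bytes E2 82 AC) hashes fine.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

my $string = "\x{20ac}";    # EURO SIGN, a character above 0xFF

# Hashing the character string directly croaks with "Wide character".
my $ok    = eval { md5_hex($string); 1 };
my $error = $@;
print "direct call failed: $error" unless $ok;

# Encoding to UTF-8 first hands md5_hex the bytes E2 82 AC.
my $digest = md5_hex(encode_utf8($string));
print "$digest\n";
```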

Juha-Mikko Ahonen wrote on 2002-10-23 14:42 UTC:
> >   $ perl -e 'use Digest::MD5 qw(md5_hex); print md5_hex("\x{20ac}");'
> >   Wide character in subroutine entry at -e line 1.
> 
> The problem is in \x{20ac}. If you place the character in UTF-8 encoding 
> in place of the escape, it works perfectly. If you have real UTF-8 
> data, not perl escapes, then there is no problem.

I'm afraid this didn't make sense to me. The internal representation of
the input value of the MD5 function should not depend on whether I used
the UTF-8 character or the hex escape notation in the source code. The
Perl compiler should already eliminate this difference in the scanner.
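
As far as I understand perl's scanner, the two spellings do produce different strings unless the source is marked with "use utf8": without it, a literal UTF-8 euro sign arrives as three Latin-1 characters, which md5_hex accepts as bytes, while "\x{20ac}" is a single wide character. The sketch spells the literal's bytes out explicitly so it does not depend on the file's own encoding.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# What perl sees for a raw UTF-8 euro sign in a source file without
# "use utf8": three characters, one per byte.
my $from_literal = "\xE2\x82\xAC";

# The escape form is a single character, U+20AC.
my $from_escape = "\x{20ac}";

printf "literal: %d chars, escape: %d char(s)\n",
    length $from_literal, length $from_escape;

# Decoding the literal's bytes as UTF-8 yields the same one-character string.
my $same_text = decode_utf8($from_literal) eq $from_escape;
print "same text after decode_utf8: ", ($same_text ? "yes" : "no"), "\n";
```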

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: