date:20021024

Re: Unicode. Perl does the right thing?

2002-10-24 Thread Dan Kogai

On Friday, Oct 25, 2002, at 14:10 Asia/Tokyo, Philip Newton wrote:
> Well, partially because there's no "good" names for many of the
> characters. What do you call "生"? "CJK UNIFIED IDEOGRAPH-751F"? (That's
> the current Unicode "name", but it's not particularly useful.) "CJK
> shou"? "CJK sei"? "CJK sheng1"? "CJK saeng"? "CJK ikiru"? ikasu, ikeru,
> umareru, umu, ou, haeru, hayasu, ki, nama, naru, nasu, musu, which
> one do you pick?

If we are stuck with the de jure, ex officio names from the Unicode
Consortium, we are out of luck. But this is Perl; if there is more than
one way to do it, why not more than one way to name it? I am kind of
wondering about a charnames extension that goes like

use charnames ":ja"; # Japanese
print "\N{sei-ikiru}";
#
use charnames ":ko";
print "\N{saeng}";
#
use charnames ":zh";
print "\N{sheng1}";

Since the pragmatic approach is rather inflexible, I would prefer an OO
approach, like

use Char::Name;

my $char = Char::Name->new;

print $char->jp("sei-ikiru");

I know Japanese is the biggest nightmare for naming characters because
in Japanese we give too many "names" to each character; it's really
hard to disambiguate these.

I may come up with something as I look through the Unihan DB, now
accessible via CPAN (Unicode::Unihan).

> Cheers,
> Philip Newton ($BIT0aN'ITF~FZ(B)

\x{5c0f}\x{98fc} \x{5f3e}
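
The OO interface proposed in the message above could look roughly like the sketch below. This is only an illustration: Char::Name is the hypothetical module from that message, not an existing CPAN distribution, and the lookup() method and the three-entry table are assumptions made up for the example. The readings and the code point U+751F are the ones quoted in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch only: Char::Name is the module imagined in the
# message above, not an existing CPAN distribution.
package Char::Name;

# Hand-written sample table (reading => code point), using readings quoted
# in this thread. A real table might be derived from Unihan data instead.
my %READINGS = (
    jp => { 'sei-ikiru' => 0x751F },
    ko => { 'saeng'     => 0x751F },
    zh => { 'sheng1'    => 0x751F },
);

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Return the character for a reading in the given language, or die.
sub lookup {
    my ($self, $lang, $reading) = @_;
    my $table = $READINGS{$lang}
        or die "no table for language '$lang'\n";
    my $code = $table->{$reading};
    defined $code
        or die "no '$lang' reading '$reading'\n";
    return chr $code;
}

# One convenience method per language, as in $char->jp("sei-ikiru").
sub jp { my ($self, $reading) = @_; return $self->lookup(jp => $reading) }
sub ko { my ($self, $reading) = @_; return $self->lookup(ko => $reading) }
sub zh { my ($self, $reading) = @_; return $self->lookup(zh => $reading) }

package main;

my $char = Char::Name->new;
for my $pair ([jp => 'sei-ikiru'], [ko => 'saeng'], [zh => 'sheng1']) {
    my ($lang, $reading) = @$pair;
    # Print the code point rather than the character to avoid needing
    # an output encoding layer on STDOUT.
    printf "%s %-10s U+%04X\n", $lang, $reading,
        ord($char->$lang($reading));
}
```

A hard-coded table sidesteps the disambiguation problem the message mentions; it only shows the calling convention, not how readings that map to several characters would be resolved.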

Re: [Encode] 1.80 released

2002-10-24 Thread Dan Kogai

On Friday, Oct 25, 2002, at 09:29 Asia/Tokyo, [EMAIL PROTECTED] wrote:

> I'd recommend the small patch below, which will make it possible to
> run the new rt.pl in any of the standard manners under the core:
>   ( cd t ; ./perl TEST ../ext/Encode/t/rt.pl )
>   ( cd t ; ./perl harness ../ext/Encode/t/rt.pl )
>   PERL_CORE=1 ./perl -Ilib ext/Encode/t/rt.pl
>
> With this patch, those tests also pass (eventually :).


Thanks, applied back :)  I feel relieved now.  And I am doubly relieved 
to find how meticulous a pumpking you are.  I wonder why Net::Ping's 
(rather obvious bug (for *BSD users)) slipped thru :)  And I was 
surprised to find your name was not on ext/Encode/AUTHORS.  Now added.

With that done, please proceed to the next patch to fix tr///.

From: Dan Kogai <[EMAIL PROTECTED]>
Date: Mon Oct 21, 2002  17:36:02 Asia/Tokyo
To: hv <[EMAIL PROTECTED]>, Inaba Hiroto <[EMAIL PROTECTED]>, Jarkko 
Hietaniemi <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: The Inaba patch for tr/// vs. use encoding
Message-Id: <[EMAIL PROTECTED]>

I KNOW you are working on it (at least reviewing it) but just as a
reminder.

Dan the Perl5 Porter

Re: [Encode] 1.80 released

2002-10-24 Thread hv

Dan Kogai <[EMAIL PROTECTED]> wrote:
:   I have released Encode 1.80 despite the fact I just released 1.79 
:less than 24 hours ago.

Thanks, integrated as change #18057.

I'd recommend the small patch below, which will make it possible to
run the new rt.pl in any of the standard manners under the core:
  ( cd t ; ./perl TEST ../ext/Encode/t/rt.pl )
  ( cd t ; ./perl harness ../ext/Encode/t/rt.pl )
  PERL_CORE=1 ./perl -Ilib ext/Encode/t/rt.pl

With this patch, those tests also pass (eventually :).

Hugo
--- Encode-1.80/t/rt.pl.old Fri Oct 25 00:14:20 2002
+++ Encode-1.80/t/rt.pl Fri Oct 25 00:14:55 2002
@@ -4,9 +4,11 @@
 #
 
 BEGIN {
+my $ucmdir  = "ucm";
 if ($ENV{'PERL_CORE'}){
 chdir 't';
 unshift @INC, '../lib';
+$ucmdir = "../ext/Encode/ucm";
 }
 require Config; import Config;
 if ($Config{'extensions'} !~ /\bEncode\b/) {
@@ -19,7 +21,6 @@
 }
 use strict;
 require Test::More;
-my $ucmdir  = "ucm";
 our $DEBUG;
 our @ucm;
 unless(@ARGV){

Re: MD5 digest of UTF-8 string in Perl 5.8

2002-10-24 Thread Markus Kuhn

Gisle Aas wrote on 2002-10-23 15:01 UTC:
> md5_hex(Encode::encode_utf8($string))

Thanks, that looks indeed like the proper solution.
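
As a minimal sketch of the suggestion quoted above, assuming only the core Digest::MD5 and Encode modules: the character string is first encoded to UTF-8 octets, and the digest is computed over those octets. The variable names are illustrative.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

my $string = "\x{20ac}";             # one character: U+20AC EURO SIGN

# Passing the character string directly croaks, as reported in this thread.
my $ok = eval { md5_hex($string); 1 };
print "direct call failed: $@" unless $ok;

# Encode to UTF-8 octets first, then digest the octets.
my $octets = encode_utf8($string);   # three bytes: E2 82 AC
my $digest = md5_hex($octets);
print "length in octets: ", length($octets), "\n";
print "md5: $digest\n";
```

MD5 is defined over bytes, so the caller has to pick an encoding explicitly; UTF-8 is the choice made in the quoted one-liner.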

Juha-Mikko Ahonen wrote on 2002-10-23 14:42 UTC:
> >   $ perl -e 'use Digest::MD5 qw(md5_hex); print md5_hex("\x{20ac}");'
> >   Wide character in subroutine entry at -e line 1.
> 
> The problem is in \x{20ac}. If you place the character in UTF-8 encoding 
> in place of the escape, it works perfectly. If you have real UTF-8 
> data, not perl escapes, then there is no problem.

I'm afraid this didn't make sense to me. The internal representation of
the input value of the MD5 function should not depend on whether I used
the UTF-8 character or the hex escape notation in the source code. The
Perl compiler should eliminate this difference already in the scanner.
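
One likely explanation for the difference, offered as a hedged sketch rather than a definitive account: unless "use utf8" is in effect, perl reads a UTF-8 character typed into the source as its individual bytes, whereas \x{20ac} always yields a single character. The example below writes those bytes with escapes so the file stays ASCII; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Without "use utf8", a UTF-8 euro sign typed into the source is read as
# three separate bytes; they are spelled with escapes here on purpose.
my $literal_bytes = "\xE2\x82\xAC";
my $escaped       = "\x{20ac}";       # one character, U+20AC

printf "literal: %d element(s), escape: %d element(s)\n",
    length($literal_bytes), length($escaped);

# The two only compare equal after the bytes are decoded as UTF-8.
print "equal as-is:   ", ($literal_bytes eq $escaped ? "yes" : "no"), "\n";
print "equal decoded: ",
    (decode_utf8($literal_bytes) eq $escaped ? "yes" : "no"), "\n";
```

A byte string passes straight through md5_hex, which would explain why the literal form appeared to work while the escape form raised "Wide character".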

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW:
