Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi

 As you see, tr/// is not subject to the magic of 'use encoding'.  
 jhi, did we make it so deliberately?  I am beginning to think tr/// 

Not deliberately, no.  I agree that making tr/// understand
'use encoding' would be good.

 would be happier to embrace the power thereof.
 
 Still, it can be overcome by a simple eval qq{}, as illustrated.  This 
 much idiom would not hurt much, at least not as much as the Cookbook 
 sample.

-- 
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
There is this special biologist word we use for 'stable'.  It is 'dead'. -- Jack Cohen



Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi

(Not that I understand any Japanese but) could you resend your script
as an attachment?  I'm afraid it might get mangled otherwise.  In the
headers I see the following:

  Content-Type: text/plain; charset=ISO-2022-JP; format=flowed
  ...
  Content-Transfer-Encoding: 7bit

and when I save the message from mutt, I do not see any eight-bit
characters in the saved file...





Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi

(Hi, it's me again...)

Are you doing character ranges in the tr/// under 'use encoding'?
(I'm asking because I see a - in the middle of what I assume is
mangled EUC-JP)




Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Dan Kogai

On Wednesday, Oct 2, 2002, at 22:15 Asia/Tokyo, Jarkko Hietaniemi wrote:
 (Hi, it's me again...)

 Are you doing character ranges in the tr/// under 'use encoding'?
 (I'm asking because I see a - in the middle of what I assume is
 mangled EUC-JP)

Yes. That's where the hiragana-to-katakana conversion is attempted;
the English equivalent of tr/A-Z/a-z/.

Dan




Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Dan Kogai

On Wednesday, Oct 2, 2002, at 21:51 Asia/Tokyo, Jarkko Hietaniemi wrote:
 However, I will need to stare at your example some more, since
 for simpler cases I think tr/// *is* obeying the 'use encoding':

 use encoding 'greek';
 ($a = "\x{3af}bc\x{3af}de") =~ tr/\xdf/a/;
 print $a, "\n";

 This does print "abcade\n", and it also works when I replace the \xdf
 with the literal \xdf.
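A sketch of why the Greek example above behaves as described, assuming only the core Encode module and decoding explicitly instead of going through `use encoding`: in ISO-8859-7 the byte 0xDF is U+03AF (GREEK SMALL LETTER IOTA WITH TONOS), so once the search list is read as Greek, \xdf and \x{3af} name the same character and tr/// matches both occurrences.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte 0xDF in ISO-8859-7 (Greek) decodes to U+03AF, iota with tonos.
my $iota = decode('iso-8859-7', "\xDF");
printf "U+%04X\n", ord $iota;

# With the search list written as the decoded code point, tr/// matches
# both occurrences in the character string.
(my $s = "\x{3af}bc\x{3af}de") =~ tr/\x{3af}/a/;
print "$s\n";
```

This prints U+03AF and then abcade, matching the result reported for the `use encoding 'greek'` version.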

I can explain that.  "\x{3af}bc\x{3af}de" is a string literal, so it
gets converted.  However, my example, in escaped form, is

   $kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/

which does not get converted.  The intention was

   $kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/

That's why

   eval qq{ \$kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/ }

works: \xA4\xA1-\xA4\xF3 and \xA5\xA1-\xA5\xF3 are converted to
\x{3041}-\x{3093} and \x{30a1}-\x{30f3}, respectively.
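The intended conversion can also be written without relying on `use encoding`: decode the EUC-JP bytes first, then run the Unicode-range tr/// on characters. A minimal sketch using the core Encode module; the sample input (EUC-JP bytes for hiragana a, i, n) is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample input: EUC-JP bytes for hiragana "a" (A4 A2), "i" (A4 A4), "n" (A4 F3).
my $euc = "\xA4\xA2\xA4\xA4\xA4\xF3";

# Decode first, so tr/// operates on code points rather than bytes.
my $kana = decode('euc-jp', $euc);

# Hiragana U+3041..U+3093 map one-to-one onto katakana U+30A1..U+30F3.
$kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/;

# Encode back to EUC-JP for output; katakana lives in the A5 row.
my $out = encode('euc-jp', $kana);
print join(' ', map { sprintf '%02X', ord($_) } split //, $out), "\n";
```

Keeping the decode step explicit means the ranges in tr/// are always Unicode ranges, which is what the hiragana/katakana mapping needs.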

Dan




Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi

 I can explain that.  "\x{3af}bc\x{3af}de" is a string literal, so 
 it gets converted.  However, my example, in escaped form, is
 
   $kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/
 
 which does not get converted.  The intention was
 
   $kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/
 
 That's why
 
   eval qq{ \$kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/ }
 
 works: \xA4\xA1-\xA4\xF3 and \xA5\xA1-\xA5\xF3 are converted 
 to \x{3041}-\x{3093} and \x{30a1}-\x{30f3}, respectively.

I'm confused.  Firstly, the tr/\xA4... converts bytes thusly:

  A1 -> A1
  A2 -> A2
  A3 -> A3
  A4 -> A5
  A5 -> A5
  F3 -> A5

So why isn't it just tr/\xA4\xF3/\xA5/?
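The byte table above can be checked directly. Without any encoding pragma, tr/// treats both lists as plain bytes: the \xA1-\xA4 and \xA1-\xA5 ranges line up position by position, and a search character that appears twice keeps only its first mapping. A small sketch comparing the long form with the short one:

```perl
use strict;
use warnings;

# Every byte that appears in either list: A1..A5 and F3.
my $bytes = join '', map { chr($_) } (0xA1 .. 0xA5, 0xF3);

# Search list:      A4, A1 A2 A3 A4, F3
# Replacement list: A5, A1 A2 A3 A4 A5, F3
# The second A4 is ignored (first mapping wins), so F3 pairs with A5 and
# the final F3 of the replacement list is unused (perl may warn about this).
(my $long  = $bytes) =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/;

# Short replacement list: its last character is reused, so F3 -> A5 too.
(my $short = $bytes) =~ tr/\xA4\xF3/\xA5/;

for my $i (0 .. length($bytes) - 1) {
    printf "%02X -> %02X\n", ord(substr($bytes, $i, 1)), ord(substr($long, $i, 1));
}
print $long eq $short ? "same mapping\n" : "different mapping\n";
```

The loop reproduces the six rows of the table, and the final line reports that the two tr/// forms are equivalent at the byte level.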

Secondly, aren't you expecting tr/// to magically recognize that when
the EUC-JP codes \xA4, \xA1 to \xA4, and \xF3 are converted to their
Unicode counterparts, they are supposed to spell out the Hiragana range?
The range concept of tr/// is very limited.  I think you want s///e.
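One reading of the s///e suggestion, sketched under the assumption that the text has already been decoded to characters (for example with Encode's `decode`): match each hiragana character and compute its katakana counterpart in the replacement expression, relying on U+3041..U+3093 and U+30A1..U+30F3 sitting a constant 0x60 apart.

```perl
use strict;
use warnings;

# Already-decoded text: hiragana "a", "i", some ASCII, then hiragana "n".
my $text = "\x{3042}\x{3044}abc\x{3093}";

# /e evaluates the replacement as Perl code for every match;
# U+3041..U+3093 -> U+30A1..U+30F3 is a fixed offset of 0x60.
(my $kata = $text) =~ s/([\x{3041}-\x{3093}])/chr(ord($1) + 0x60)/ge;

# Report the resulting code points.
print join(' ', map { sprintf 'U+%04X', ord($_) } split //, $kata), "\n";
```

Unlike tr///, the replacement here is arbitrary code, so it does not depend on how the range endpoints are parsed; ASCII characters pass through untouched.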




Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi

On Wed, Oct 02, 2002 at 10:44:06PM +0900, Dan Kogai wrote:
 On Wednesday, Oct 2, 2002, at 22:34 Asia/Tokyo, Jarkko Hietaniemi wrote:
 Yes. That's where the hiragana-to-katakana conversion is attempted;
 the English equivalent of tr/A-Z/a-z/.
 
 Okay...  What are the {begin,end} codepoints of those ranges,
 both LHS and RHS of tr, both in EUC-JP and in Unicode?
 
 Both.  I think the operation needed is straightforward: when you get 
 tr[LHS][RHS], decode 'em, then
 feed them to the naked tr//.

Urk...  That means a dip into toke.c; how the tr/// ranges are
implemented is... tricky.  sv_recode_to_utf8() is needed somewhere...
but I'm a little bit pressed for time right now.  I suggest you
perlbug this and move the discussion to perl5-porters.  (Inaba Hiroto
also might have insight on this; he's the tr///-with-Unicode sensei,
really: he practically implemented all of it.  And he might read
*[gk]ana much better than me :-)
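Dan's proposal (decode LHS and RHS, then feed them to a plain tr///) can be approximated at the Perl level, outside toke.c, with a string eval. This is only a sketch: `tr_decoded` is a hypothetical helper, not part of Perl or Encode, and it treats every '-' in the decoded lists as a range operator.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode both tr/// lists from a legacy encoding,
# then run a character-level tr/// on an already-decoded string.
sub tr_decoded {
    my ($str, $enc, $lhs_bytes, $rhs_bytes) = @_;
    # Rewrite each decoded character as a \x{...} escape so the eval'd
    # source is plain ASCII; '-' is kept bare so ranges still work.
    my $escape = sub {
        join '', map { $_ eq '-' ? '-' : sprintf('\\x{%X}', ord($_)) }
                 split //, decode($enc, $_[0]);
    };
    my ($lhs, $rhs) = map { $escape->($_) } ($lhs_bytes, $rhs_bytes);
    eval "\$str =~ tr/$lhs/$rhs/; 1" or die $@;
    return $str;
}

my $hira = decode('euc-jp', "\xA4\xA2\xA4\xA4\xA4\xF3");
my $kata = tr_decoded($hira, 'euc-jp', "\xA4\xA1-\xA4\xF3", "\xA5\xA1-\xA5\xF3");
print join(' ', map { sprintf 'U+%04X', ord($_) } split //, $kata), "\n";
```

Escaping to \x{...} sidesteps any question of how eval treats wide characters in its source and keeps '/' or '\' in the decoded lists from breaking the tr/// syntax; a real fix in toke.c would avoid the eval entirely.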

