Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;
> As you see, tr/// is not subject to the magic of 'use encoding'. jhi, have we made it so deliberately?

Not deliberately, no. I agree that making tr/// understand 'use encoding' would be good.

> I am beginning to think tr/// is happier to embrace the power thereof. Still, it can be overcome by a simple eval qq{} as illustrated. This much idiom would not hurt much, at least not as much as the Cookbook sample

--
Jarkko Hietaniemi [EMAIL PROTECTED] http://www.iki.fi/jhi/
There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen
Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;
(Not that I understand any Japanese, but) could you resend your script as an attachment? I'm afraid it might get mangled otherwise. In the headers I see the following:

    Content-Type: text/plain; charset=ISO-2022-JP; format=flowed
    ...
    Content-Transfer-Encoding: 7bit

and when I save the message from mutt, I do not see any eight-bit characters in the saved file...

--
Jarkko Hietaniemi
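The headers quoted above are consistent with the saved file having no eight-bit characters: ISO-2022-JP is a 7-bit encoding, while EUC-JP uses bytes with the high bit set. A minimal sketch, assuming the core Encode module is available; the sample text and the helper name high_bit_bytes are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample text: HIRAGANA LETTER A and I as Unicode code points.
my $text = "\x{3042}\x{3044}";

# Hypothetical helper: count bytes with the high bit set after encoding.
sub high_bit_bytes {
    my ($encoding, $chars) = @_;
    my $bytes = encode($encoding, $chars);
    my $count = () = $bytes =~ /[\x80-\xFF]/g;
    return $count;
}

printf "%-12s %d eight-bit bytes\n", $_, high_bit_bytes($_, $text)
    for qw(iso-2022-jp euc-jp);
```

So a script written in EUC-JP and sent as ISO-2022-JP text arrives transcoded, which is one reason an attachment is the safer way to pass it along.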
Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;
(Hi, it's me again...) Are you doing character ranges in the tr/// under 'use encoding'? (I'm asking because I see a - in the middle of what I assume is mangled EUC-JP)

--
Jarkko Hietaniemi
Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;
On Wednesday, Oct 2, 2002, at 22:15 Asia/Tokyo, Jarkko Hietaniemi wrote:

> (Hi, it's me again...) Are you doing character ranges in the tr/// under 'use encoding'? (I'm asking because I see a - in the middle of what I assume is mangled EUC-JP)

Yes. That's where the hiragana -> katakana conversion is attempted; it is the equivalent of the English tr/A-Z/a-z/.

Dan
Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;
On Wednesday, Oct 2, 2002, at 21:51 Asia/Tokyo, Jarkko Hietaniemi wrote:

> However, I will need to stare at your example some more, since for simpler cases I think tr/// *is* obeying the 'use encoding':
>
>     use encoding 'greek';
>     ($a = "\x{3af}bc\x{3af}de") =~ tr/\xdf/a/;
>     print $a, "\n";
>
> This does print "abcade\n", and it also works when I replace the \xdf with the literal \xdf.

I can explain that. "\x{3af}bc\x{3af}de" is a string literal, so it gets encoded. However, my example in escaped form is

    $kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/

which does not get encoded. The intention was

    $kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/

That's why

    eval qq{ $kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/ }

works: \xA4\xA1-\xA4\xF3 and \xA5\xA1-\xA5\xF3 are converted to \x{3041}-\x{3093} and \x{30a1}-\x{30f3}, respectively.

Dan
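The intended conversion can be sketched without 'use encoding' at all, by decoding the EUC-JP bytes explicitly and running tr/// on the code-point ranges given above. This is only an illustration, assuming the core Encode module; the helper name hira2kata_eucjp is hypothetical and not something from the thread:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: hiragana -> katakana for a string of EUC-JP bytes.
sub hira2kata_eucjp {
    my ($bytes) = @_;
    # Decode first, so tr/// sees characters instead of raw bytes.
    my $chars = decode('euc-jp', $bytes);
    # The ranges from the message: U+3041..U+3093 -> U+30A1..U+30F3.
    $chars =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/;
    return encode('euc-jp', $chars);
}

my $hira = "\xA4\xA2\xA4\xA4";    # EUC-JP bytes for hiragana A, I
my $kata = hira2kata_eucjp($hira);
print join(' ', map { sprintf '%02X', ord $_ } split //, $kata), "\n";
# prints "A5 A2 A5 A4", the EUC-JP bytes for katakana A, I
```

Both ranges hold 83 code points, so each hiragana character maps to the katakana character at the same offset.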
Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;
> I can explain that. "\x{3af}bc\x{3af}de" is a string literal, so it gets encoded. However, my example in escaped form is
>
>     $kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/
>
> which does not get encoded. The intention was
>
>     $kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/
>
> That's why
>
>     eval qq{ $kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/ }
>
> works: \xA4\xA1-\xA4\xF3 and \xA5\xA1-\xA5\xF3 are converted to \x{3041}-\x{3093} and \x{30a1}-\x{30f3}, respectively.

I'm confused. Firstly, the tr/\xA4... converts bytes thusly:

    A1 -> A1
    A2 -> A2
    A3 -> A3
    A4 -> A5
    A5 -> A5
    F3 -> A5

So why isn't it just tr/\xA4\xF3/\xA5/?

Secondly, aren't you expecting tr/// to magically recognize that when the EUC-JP codes \xA4, \xA1 to \xA4, and \xF3 are converted to their Unicode counterparts they are supposed to spell out the Hiragana range? The range concept of tr/// is very limited. I think you want s///e.

--
Jarkko Hietaniemi
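The byte-level reading in the table above can be checked directly. The following is a small sketch with no 'use encoding' in effect, so tr/// sees single bytes; the sub names are made up for the illustration:

```perl
use strict;
use warnings;

# The tr/// from the thread, applied to plain bytes.
sub tr_as_written {
    my ($bytes) = @_;
    # Perl may warn that the replacement list is longer than the search
    # list; silenced here, since that mismatch is part of what is shown.
    no warnings 'misc';
    $bytes =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/;
    return $bytes;
}

# The shorter form suggested in the message above.
sub tr_simplified {
    my ($bytes) = @_;
    $bytes =~ tr/\xA4\xF3/\xA5/;
    return $bytes;
}

# Compare the two on every possible byte value.
my @diff = grep { tr_as_written(chr $_) ne tr_simplified(chr $_) } 0 .. 255;
print @diff ? "the two forms differ\n" : "identical on all 256 byte values\n";

# Hiragana I is A4 A4 in EUC-JP; both bytes become A5, which is not
# katakana I (A5 A4).
print join(' ', map { sprintf '%02X', ord $_ } split //,
    tr_as_written("\xA4\xA4")), "\n";    # prints "A5 A5"
```

At the byte level only A4 and F3 actually change, which is why the two-byte ranges never take effect unless the lists are decoded to characters first.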
Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;
On Wed, Oct 02, 2002 at 10:44:06PM +0900, Dan Kogai wrote:

> On Wednesday, Oct 2, 2002, at 22:34 Asia/Tokyo, Jarkko Hietaniemi wrote:
>
>>> Yes. That's where the hiragana -> katakana conversion is attempted; it is the equivalent of the English tr/A-Z/a-z/.
>>
>> Okay... What are the {begin,end} codepoints of those ranges, both LHS and RHS of tr, both in EUC-JP and in Unicode?
>
> Both. I think the operation needed is straightforward. When you get tr[LHS][RHS], decode'em then feed it to the naked tr//.

Urk... That means a dip into toke.c; how the tr/// ranges are implemented is... tricky. sv_recode_to_utf8() is needed somewhere... but I'm a little bit pressed for time right now. I suggest you perlbug this and move the process to perl5-porters. (Inaba Hiroto also might have insight on this; he's the tr///-with-Unicode sensei, really; he practically implemented all of it. And he might read *[gk]ana much better than me :-)

> Dan

--
Jarkko Hietaniemi
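The "decode'em then feed it to the naked tr//" idea can be approximated in user code, without touching toke.c, by decoding both lists and compiling the tr/// at run time. This is a rough sketch only: decoded_tr_list and make_tr are hypothetical helpers, not part of Perl or Encode, and the code assumes every literal '-' in a list is meant as a range operator.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn a tr/// list given as legacy-encoded bytes
# into \x{...} escapes of the decoded code points.
sub decoded_tr_list {
    my ($encoding, $bytes) = @_;
    my @parts;
    for my $char (split //, decode($encoding, $bytes)) {
        # Assumption: every literal '-' is meant as a range operator.
        push @parts, $char eq '-' ? '-' : sprintf('\x{%x}', ord $char);
    }
    return join '', @parts;
}

# Hypothetical helper: compile a transliteration sub from two encoded lists.
sub make_tr {
    my ($encoding, $lhs, $rhs) = @_;
    my $code = sprintf 'sub { (my $s = shift) =~ tr/%s/%s/; $s }',
        decoded_tr_list($encoding, $lhs), decoded_tr_list($encoding, $rhs);
    my $sub = eval $code;
    die "could not compile tr///: $@" unless $sub;
    return $sub;
}

my $hira2kata = make_tr('euc-jp', "\xA4\xA1-\xA4\xF3", "\xA5\xA1-\xA5\xF3");
my $out = $hira2kata->("\x{3042}\x{3044}");    # hiragana A, I as characters
print join(' ', map { sprintf 'U+%04X', ord $_ } split //, $out), "\n";
# prints "U+30A2 U+30A4"
```

The string eval is the same trick as the eval qq{} workaround mentioned earlier in the thread; escaping each code point as \x{...} keeps the generated source pure ASCII.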