On 31 Dec 03, at 16:28, [EMAIL PROTECTED] wrote:


Why are you using:


use encoding 'utf8';

?

So that, for the sake of keeping the snippet short, Perl would know that my character constant was in UTF-8 and that the "print" statements should output UTF-8 as well. I typed the source code in a UTF-8 editor and ran it in a UTF-8 terminal. I apologize for not making this clear.


Without it, perl 5.8.1, I see output:


1 ß
2 ß
3 Gro

Without "use encoding", Perl just works on bytes: you lose the Unicode character semantics and end up with "3 Gro", which is wrong, since Großbritannien is one word.
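To illustrate the bytes-versus-characters point, here is a minimal sketch. It assumes a Perl where the "unicode_strings" feature is not enabled, and it writes the UTF-8 bytes with \x escapes so the result does not depend on how the file itself is saved. The variable names are mine, chosen for illustration only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of "Großbritannien" as raw bytes (ß is 0xc3 0x9f).
my $bytes = "Gro\xc3\x9fbritannien";

# The same text decoded into a Perl character string.
my $chars = decode('UTF-8', $bytes);

# On the byte string, \w stops at the first non-ASCII byte.
my ($w_bytes) = $bytes =~ /(\w+)/;

# On the character string, ß is a word character, so the match continues.
my ($w_chars) = $chars =~ /(\w+)/;

print length($w_bytes), "\n";   # 3  (just "Gro")
print length($w_chars), "\n";   # 14 (the whole word)
```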


When I run with your use encoding 'utf8'; I get an error from perl:
Malformed UTF-8 character (unexpected non-continuation byte 0x62, immediately after start byte 0xdf) in pattern match (m//) at /tmp/w.pl line 9.

So you have 0xdf 0x62, which is ßb in Latin-1. My sample assumes UTF-8, where ßb is 0xc3 0x9f 0x62.
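If you want to check those byte sequences yourself, a small sketch along these lines should do it. The hex_bytes helper is a hypothetical name of mine, not something from Encode; only Encode::encode is the module's own function.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00DF (ß) followed by "b", written with an escape so the
# encoding of this source file does not matter.
my $chars = "\x{df}b";

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '0x%02x', ord } split //, $bytes;
}

print 'iso-8859-1: ', hex_bytes(encode('iso-8859-1', $chars)), "\n";   # 0xdf 0x62
print 'utf-8:      ', hex_bytes(encode('UTF-8', $chars)), "\n";        # 0xc3 0x9f 0x62
```

The 0xdf 0x62 pair is exactly what the error message complains about: 0xdf looks like a UTF-8 start byte, but 0x62 is not a valid continuation byte.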

In other words you're not running the same code as I am.
With such a latin1 source code and of course dropping
the "use encoding" line, the character constant needs to
be explicitly decoded to Unicode:

$x = Encode::decode("iso-8859-1", "Großbritannien");

...which yields the same results of course:

1
2 ß
3 Großbritannien


------------------------------------------
#!/usr/bin/perl -w

use strict;
use encoding 'utf8';

my $x = 'Großbritannien';
$\ = "\n";

print '1 ', $x =~ /(\W+)/;
print '2 ', $x =~ /([\W]+)/;
print '3 ', $x =~ /(\w+)/;

exit(0);

--
Eric Cholet


