On 05/16/2007 12:57 AM, Neil wrote:
Dear All:

Question:

Why does the length of the Chinese character I print show as “3”?
Isn’t it supposed to be 2 bytes?

Program:
-----------------------------------
$str=”我”;

$str_len = length($str);

Print $str_len, “\n\n”;

------------------------------------
The result is 3

I took a picture of the program, in case the Chinese character doesn’t
display on some of your systems,
[...]
My environment:
[...]
Encode: Big5


Something is off in your locale or environment. Since $str holds only one character, the length should be 1, and that's what I get.

I saved your program two ways, as a UTF-8 file and as a Big5 file, and both versions print the same result on my system: 1. To get your program to run at all, though, I had to replace the curly quotes with plain ASCII double quotes.

Here is the first program (saved in UTF8):
-----------------------------------
#!/usr/bin/perl
use utf8;
use strict;
use warnings;

my $str="我";

my $str_len = length($str);

print $str_len, "\n\n";
----------------------------------
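
The first program prints only the length. As a minimal sketch, assuming the file is saved as UTF-8 and the terminal expects UTF-8, the variant below also prints the string itself. The binmode call is my addition, not part of the program above; without an encoding layer on STDOUT, Perl would typically warn about a wide character in print.

```perl
#!/usr/bin/perl
use utf8;                               # the source file is saved as UTF-8
use strict;
use warnings;

# Assumption: the terminal expects UTF-8, so encode characters on output.
binmode(STDOUT, ':encoding(UTF-8)');

my $str = "我";

my $str_len = length($str);             # counts characters, not bytes

print $str_len, "\n\n";
print "data = $str\n";
```

This mirrors the "data =" line of the Big5 version, with the output encoding set through binmode instead of the encoding pragma.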

Here is the second program (saved in Big5):
--------------------------------------------
#!/usr/bin/perl
use encoding big5 => STDOUT => 'utf8';
use strict;
use warnings;

my $str="我";   # stored in this file as the Big5 bytes \xA7\xDA

my $str_len = length($str);

print $str_len, "\n\n";
print "data = $str\n";
--------------------------------------------

The second program displays this:
------start output-------
1

data = 我
-------end output--------

Evidently the Big5 byte sequence \xA7\xDA represents the single Unicode character U+6211 (written \x{6211} in Perl), which is the Chinese character 我. You probably just need to tell Perl about the encoding of your script.
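
To make the byte-versus-character distinction concrete, here is a small sketch using the core Encode module. It is only an illustration, not a diagnosis of your setup: the variable names are mine, and the UTF-8 byte sequence \xE6\x88\x91 is the standard encoding of U+6211. If Perl is not told the source encoding, length() counts bytes, so a result of 3 would be consistent with the script actually being saved as UTF-8 rather than Big5.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Raw bytes of the same character in two encodings.
my $big5_bytes = "\xA7\xDA";         # Big5
my $utf8_bytes = "\xE6\x88\x91";     # UTF-8

# On undecoded byte strings, length() counts bytes.
my $big5_byte_len = length($big5_bytes);
my $utf8_byte_len = length($utf8_bytes);

# After decoding, length() counts characters.
my $from_big5 = decode('big5',  $big5_bytes);
my $from_utf8 = decode('UTF-8', $utf8_bytes);

printf "Big5 bytes: %d, UTF-8 bytes: %d\n", $big5_byte_len, $utf8_byte_len;
printf "characters after decoding: %d and %d\n",
    length($from_big5), length($from_utf8);
printf "code point: U+%04X\n", ord($from_big5);
```

The byte lengths should come out as 2 and 3, and both decoded strings should have length 1.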

My environment:
Perl 5.8.4
Debian 3.1
Encoding: UTF8

