Quoth [EMAIL PROTECTED] (John Delacour):
> At 12:31 am +0800 3/12/04, He Zhiqiang wrote:
>
> >Now i encountered another problem, there are a few files contains
> >not only one charset but also two or more, for example, file1
> >contains japanese and chinese, if i use open() to load the data
> >into memory, ord and length etc.. can't correctly work! Perhasp i
> >miss something to encode or decode the data ?
> >code:
> >#!/usr/bin/perl -w
> >use utf8;
> >open(FD, "< file1");
> >while(<FD>) {
> >chomp;
> >print "length = ".length($_);
> >}
> >close FD;
> >----------
> >length() can not count the correct non-ASCII characters. :(
>
> If the file is in UTF-8, then it may be in any number of _languages_
> but it uses only one character set -- Unicode. So far as I know "use
> utf8" is now redundant and ineffectual in Perl.
Both utf8.pm and encoding.pm change the encoding Perl assumes your
*source file* is written in. This is different from what utf8.pm did
under 5.6. "use utf8" does nothing to data read from a filehandle;
that is decided by the handle's I/O layers.

> You will get the
> correct character count (6 characters rather than 18 bytes) by
> opening the file handle as utf-8 as below.
>
> no warnings;
> my $f = "/tmp/cjk.txt";
> my $text = "\x{56d8}\x{56d9}\x{56da}\x{56db}\x{56dc}\x{56dd}\n";
> open F, ">$f";

binmode F;

Add this both for portability and in case some environment setting
(PERLIO, the locale variables with 5.8.0, or -C) has put some other
encoding on the handle.

> print F $text; # writes $text to $f as UTF-8

Printing wide characters to a raw handle only produces UTF-8 with a
"Wide character in print" warning, which "no warnings" hides here.
Encode explicitly instead:

utf8::encode $text; # make sure $text is a sequence of octets, not
                    # characters
print F $text;

> close F;
> open F, "<:utf8", $f;
> for (<F>) {
> chomp;
> print "$_ - Length = " . length() . $/;
> }

Ben

-- 
Joy and Woe are woven fine,
A Clothing for the Soul divine              William Blake
Under every grief and pine                  'Auguries of Innocence'
Runs a joy with silken twine.
                                            [EMAIL PROTECTED]
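A minimal sketch of the same round trip, pulling together the advice in this thread. It assumes Perl 5.8 or later and uses a temporary file from the core File::Temp module instead of the hard-coded /tmp/cjk.txt. It also reads through the :encoding(UTF-8) layer rather than the bare :utf8 layer in the quoted code, since :encoding(UTF-8) is stricter about malformed input. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Six CJK ideographs (U+56D8..U+56DD) plus a newline, as in the thread.
my $text = "\x{56d8}\x{56d9}\x{56da}\x{56db}\x{56dc}\x{56dd}\n";

# Write: binmode with no layer makes the handle raw, so no PERLIO,
# locale or -C setting can apply some other encoding. We then turn
# the characters into UTF-8 octets ourselves before printing.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
my $octets = $text;
utf8::encode($octets);          # characters -> UTF-8 octets, in place
print {$fh} $octets;
close $fh or die "close $path: $!";

# Read: the :encoding(UTF-8) layer decodes octets back into characters.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
my @char_counts;
while (my $line = <$in>) {
    chomp $line;
    push @char_counts, length $line;    # counts characters, not bytes
}
close $in;

# For comparison: the byte length of the encoded text, minus the "\n".
my $byte_count = length($octets) - 1;
print "characters: @char_counts, bytes: $byte_count\n";
```

Run as-is, it should print `characters: 6, bytes: 18`: once the layer has decoded the data, length() sees six characters, while on disk each ideograph takes three octets. Putting the same layer on He Zhiqiang's `open(FD, "< file1")` should likewise make length() count characters, assuming file1 really is UTF-8.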