Dana Sharvit - M <[EMAIL PROTECTED]> writes:
>Hi ,
>I am using the Encode module (perl 5.8) to convert a string from utf8 to
>big5.
>There is something that I do not understand that I thought you may help
>with:
>The input to the program is a file that contains a utf8 string,
>The encoding works properly only when I use the following code:
> use Encode qw(encode decode find_encoding from_to);
>
>$file = shift;
>open my $in, $file;
>
>while ($str = <$in>) {
>    chomp;
>    open my $in1, "<:encoding(utf8)", \$str;
>    while (<$in1>) {
>        $octet = encode("utf8", $_);
>        from_to($octet, "utf8","big5");
>        print "$octet\n";
>    }
>
>
>}
>close $in;

That code reads the file, and decodes the UTF-8 to get characters (due to
:encoding). As they are now characters, you re-encode them back to UTF-8
octets, then use from_to to take those re-encoded octets, decode them again
(internal to from_to) and then re-encode as big5.

This is a lot of pointless re-encoding!
You should either read the file as octets, or keep the :encoding (which is
safer with respect to locale effects) and just encode:

    open my $in1, "<:encoding(utf8)", $file;
    while (<$in1>) {
        chomp;
        $octet = encode("big5", $_);
        print "$octet\n";
    }
    close $in1;

>
>what I don't understand is two things:
>1.why do I need to read the string using IO ( open my $in1,
>"<:encoding(utf8)", \$str;)

If your environment expects big5 (as the fact that you do a raw print
suggests) then something is probably assuming that data read from files is
big5 and not utf8, so you have to tell it.

>2.why do I need to use the encode function before the from_to
>function($octet = encode("utf8", $_);)

See above: the :encoding layer has already decoded the data to characters,
and from_to works on octets, so the characters have to be turned back into
UTF-8 octets first.

>
>I thought that the below code will convert correctly but it does not:
>use Encode qw(encode decode find_encoding from_to);
>
>$file = shift;
>open my $in, $file;
>
>while ($str = <$in>) {
>    chomp($str);
>    from_to($octet, "utf8","big5");
>    print "$octet\n";

Presumably, as from_to modifies the string "in place" (ugh), you meant:

    $file = shift;
    open my $in, $file;
    while (defined($str = <$in>)) {
        from_to($str, "utf8", "big5");
        print $str;
    }

That should work unless you have something which causes open to assume
some encoding.

>
>}
>close $in;
>
>Thank you
>Dana
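
P.S. A minimal sketch of why the two routes give the same result, assuming
Encode resolves the name "big5" on your installation. The sample octets and
the variable names are mine, for illustration only, and are not taken from
your program. It converts the same UTF-8 octets once with decode + encode
and once with from_to, then compares the results:

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

# Sample input (assumed): UTF-8 octets for the characters U+4E2D U+6587.
my $utf8_octets = "\xE4\xB8\xAD\xE6\x96\x87";

# Route 1: decode the octets to characters once, then encode to the target.
my $chars       = decode("utf8", $utf8_octets);
my $big5_direct = encode("big5", $chars);

# Route 2: from_to rewrites its first argument in place, so work on a copy.
my $big5_inplace = $utf8_octets;
from_to($big5_inplace, "utf8", "big5");

# Both routes should yield identical big5 octets.
print $big5_direct eq $big5_inplace ? "same octets\n" : "different octets\n";
```

If a file handle is opened with :encoding(utf8), the decode step of route 1
has already been done for you, which is why only encode("big5", ...) is
needed in that case.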