On 04/24/2007 03:06 AM, Jeff Pang wrote:
2007/4/24, Beau E. Cox <[EMAIL PROTECTED]>:
How do I get a proper conversion from iso-8859-1 to perl's internal utf8?
Hello,
You can use the Encode module's decode function to do this conversion.
For example, take this string, which is in 'gb2312' format:
$str = "中文";
Use decode to convert it to perl's internal utf8:
$str2 = decode('gb2312',$str);
But you can't print $str2 as it is, since it is in perl's internal
format. You need to encode it into the output encoding you want first, like this:
$output = encode('utf8',$str2);
Then print it out:
print $output;
The whole thing can be written in one line:
$ perl -MEncode -e '$out=encode("utf8",decode("gb2312","中文"));print $out'
That should give you the result you wanted (output in utf8).
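A minimal sketch of the same decode/encode round trip as a standalone script. The string is written as byte escapes (the GB2312 bytes for the two characters above) so the result does not depend on the encoding of the source file or the terminal. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "中文" as raw GB2312 bytes, written as escapes so nothing depends on
# how this file or the terminal is encoded.
my $gb_bytes = "\xD6\xD0\xCE\xC4";

# Bytes -> Perl character string (U+4E2D, U+6587).
my $chars = decode('gb2312', $gb_bytes);

# Character string -> UTF-8 bytes, ready to be written out.
my $utf8_bytes = encode('UTF-8', $chars);

printf "characters: %d, utf8 bytes: %d\n", length($chars), length($utf8_bytes);
print $utf8_bytes, "\n";
```

The four input bytes become two characters after decode, and those two characters become six bytes after encode.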
Hope this helps.
I don't think it'll work in this case because \x99 doesn't seem to be
inside my list of iso-8859-1 characters.
The document Mr. Cox is downloading is less than truthful about its
character set. Although it advertises itself as iso-8859-1, it's
actually cp1250.
Mr. Cox, I got your program to decode the text properly by changing the
decoding line to this:
my $name1 = decode( 'cp1250', $name );
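To illustrate why the encoding name matters here, this small sketch decodes the same \x99 byte both ways. The sample value of $name is made up for the example; it is not taken from the downloaded document.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample data: a name followed by the problem byte.
my $name = "Acme\x99";

# iso-8859-1 maps byte 0x99 straight to U+0099, a control character.
my $as_latin1 = decode('iso-8859-1', $name);

# cp1250 maps byte 0x99 to U+2122, the trade mark sign.
my $as_cp1250 = decode('cp1250', $name);

printf "iso-8859-1: U+%04X\n", ord substr($as_latin1, -1);
printf "cp1250:     U+%04X\n", ord substr($as_cp1250, -1);

# Encode the correctly decoded string for output.
my $utf8 = encode('UTF-8', $as_cp1250);
print $utf8, "\n";
```

With iso-8859-1 the decode succeeds, but the last character comes out as an invisible control code, which is why the output looked wrong.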
Have a nice day.
BTW, I got the list of valid cp1250 characters here:
http://www.microsoft.com/typography/unicode/1250.htm
Read "perldoc Encode::Supported" to see the list of supported character
sets.
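The same information can also be queried from a script. This is a rough sketch using Encode's encodings() and find_encoding(). The three labels checked are just the ones mentioned in this thread, and the exact list you get depends on your Encode installation.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Every encoding Encode can load on this installation.
my @all = Encode->encodings(':all');
print scalar(@all), " encodings available\n";

# Look up the canonical name for a few labels (aliases are resolved).
for my $label ('cp1250', 'iso-8859-1', 'gb2312') {
    my $enc = find_encoding($label);
    printf "%-12s => %s\n", $label, $enc ? $enc->name : '(not supported)';
}
```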