On 04/24/2007 03:06 AM, Jeff Pang wrote:
2007/4/24, Beau E. Cox <[EMAIL PROTECTED]>:

How do I get a proper conversion from iso-8859-1 to perl's internal utf8?

Hello,

You can use the Encode module's decode function for this conversion.
For example, take this string, which is in 'gb2312' format:

$str = "中文";

We use decode to convert it to Perl's internal utf8:

$str2 = decode('gb2312',$str);

But you can't print $str2 directly, since it's in Perl's internal
format. So you need to encode it to the output format you want, like this:

$output = encode('utf8',$str2);

then print it out,
print $output;
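
The steps above, put together as a small script. This is only a sketch: the gb2312 bytes of "中文" are written as hex escapes (an assumption made here so the result doesn't depend on the encoding of the source file or terminal), and the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "中文" as gb2312 bytes, spelled out so the script is independent
# of how the source file or the terminal is encoded.
my $bytes = "\xD6\xD0\xCE\xC4";

my $chars = decode('gb2312', $bytes);   # bytes -> Perl character string
my $utf8  = encode('UTF-8', $chars);    # characters -> UTF-8 bytes

printf "%d bytes in, %d characters, %d bytes out\n",
    length($bytes), length($chars), length($utf8);

binmode STDOUT;                         # write the encoded bytes untouched
print $utf8, "\n";
```

The first line printed should be "4 bytes in, 2 characters, 6 bytes out": two double-byte gb2312 characters become two three-byte UTF-8 sequences.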

The whole thing can be written in one line:

$ perl -MEncode -e '$out=encode("utf8",decode("gb2312","中文"));print $out'

That should give you the result you wanted (output in utf8).
Hope this helps.


I don't think that will work in this case, because \x99 doesn't seem to be in my list of iso-8859-1 characters.

The document Mr. Cox is downloading is less than truthful about its character set. Although it advertises itself as iso-8859-1, it's actually cp1250.

Mr. Cox, I got your program to decode the text properly by changing the decoding line to this:

my $name1 = decode( 'cp1250', $name );
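
To see why the declared character set matters, here is a quick sketch that decodes the single byte \x99 under both encodings and prints the resulting code point. The variable names are made up for the illustration and are not from Mr. Cox's program.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $byte = "\x99";    # the byte that caused the trouble

my %codepoint;
for my $enc ('iso-8859-1', 'cp1250') {
    my $char = decode($enc, $byte);
    $codepoint{$enc} = ord $char;
    printf "%-10s -> U+%04X\n", $enc, $codepoint{$enc};
}
```

With iso-8859-1 the byte comes out as U+0099, an invisible control character; with cp1250 it becomes U+2122, the trademark sign.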

Have a nice day.

BTW, I got the list of valid cp1250 characters here: http://www.microsoft.com/typography/unicode/1250.htm

Read "perldoc Encode::Supported" to see the list of supported character sets.
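
You can also ask Encode itself what it knows about. A minimal sketch, assuming a stock Encode installation; the three names checked are just the ones mentioned in this thread.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Every encoding this installation can provide, including ones
# whose modules have not been loaded yet.
my @all = Encode->encodings(':all');
print scalar(@all), " encodings available\n";

# find_encoding returns an encoding object if the name (or an
# alias for it) is known, and undef otherwise.
for my $name ('cp1250', 'iso-8859-1', 'gb2312') {
    my $enc = find_encoding($name);
    printf "%-10s %s\n", $name,
        $enc ? 'resolves to ' . $enc->name : 'not supported';
}
```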



