Hi All, I'm reading loads and loads of very confusing and contradictory information about UTF-8 in Perl. A lot of posts are also (rightfully, IMHO) stating that UTF-8 is an absolute nightmare in Perl.
Can someone please shed some light on what is going on here:

    use Encode;

    SysLog("debug", "1 - DEBUG LENGTH: " . length($Response));
    my $unicode_chars = Encode::decode('utf8', $Response);
    SysLog("debug", "** ENCODING: " . find_encoding($Response));
    my $newunicode_chars = substr($unicode_chars, 0, -3);
    my $Body = $newunicode_chars;

Log:

    Nov 8 11:44:59 cache12 perl[44786]: DEBUG: 1 - DEBUG LENGTH: 1001
    Nov 8 11:44:59 cache12 perl[44786]: DEBUG: ** ENCODING:
    Nov 8 11:44:59 cache12 perl[44786]: DEBUG: 2 - DEBUG LENGTH: 998

The idea is to remove the last three characters from the string (.\r\n). Whilst it looks like it worked, because the length is 3 less, the encoding is entirely whacked. The encoding at the beginning and the encoding at the end are different.

find_encoding() does not state which encoding is used on the string initially, yet decode() is apparently more than happy to decode it as utf8. When Perl then re-encodes the string as utf8, the result is simply wrong and the data does not match its CRC checksums.

I know for a FACT that the initial data is encoded using UTF-8. When I remove the code that strips the last 3 characters (.\r\n) from $Response, everything works absolutely fine. Unfortunately, I *must* remove these last three characters.

Can anyone please shed some light on the subject for me?
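
For reference, the round trip described above (decode the bytes, strip the trailing ".\r\n", re-encode before checksumming) can be sketched in a self-contained way. This is only a sketch under assumptions, not the code from the original program: the sample payload and the variable $chars are made up for illustration, SysLog() is replaced by printf, and it assumes the checksum is computed over UTF-8 bytes rather than over Perl's decoded character string.

```perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding);

# Hypothetical sample payload: UTF-8 encoded bytes ending in ".\r\n".
my $Response = "Gr\xC3\xBC\xC3\x9Fe aus K\xC3\xB6ln.\r\n";

# Decode the UTF-8 bytes into a Perl character string.
my $chars = decode('UTF-8', $Response);

# Remove the trailing three characters, but only if they really are ".\r\n".
$chars =~ s/\.\r\n\z//;

# Re-encode to UTF-8 bytes; this is what would be checksummed or sent on
# (assumption: the checksum is defined over the encoded bytes).
my $Body = encode('UTF-8', $chars);

# find_encoding() takes an encoding *name* and returns an encoding object;
# it does not inspect data to detect which encoding a string is in.
my $enc = find_encoding('UTF-8');

printf "bytes in: %d, characters after strip: %d, bytes out: %d\n",
    length($Response), length($chars), length($Body);
```

Note that length() counts bytes on $Response and $Body but characters on $chars, so the numbers only differ by exactly 3 when the data is pure ASCII.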