UTF8, UTF-8, utf8, Utf8 encoding blues

Chris Knipe Sat, 08 Nov 2014 01:54:08 -0800

Hi All,

I'm reading loads and loads of very confusing and contradictory information
about UTF8 in Perl.  A lot of posts also state (rightfully, IMHO) that
UTF8 is an absolute nightmare in Perl.


Can someone please shed some light on what is going on here:

use Encode;

SysLog("debug", "1 - DEBUG LENGTH: " . length($Response));
my $unicode_chars = Encode::decode('utf8', $Response);
SysLog("debug", "** ENCODING: " . find_encoding($Response));
my $newunicode_chars = substr($unicode_chars, 0, -3);
my $Body = $newunicode_chars;

Log:
Nov  8 11:44:59 cache12 perl[44786]: DEBUG: 1 - DEBUG LENGTH: 1001 
Nov  8 11:44:59 cache12 perl[44786]: DEBUG: ** ENCODING:  
Nov  8 11:44:59 cache12 perl[44786]: DEBUG: 2 - DEBUG LENGTH: 998

The idea is to remove the last three characters from the string (.\r\n).
Whilst it looks like it worked, because the length is 3 less, the encoding
is entirely whacked.  The encoding at the beginning and the encoding at the
end are different.  find_encoding() does not report which encoding the
string uses initially, yet Perl is apparently more than happy to decode it
as utf8.  When Perl then re-encodes the string as utf8, the result is
simply wrong and the data does not match its CRC checksums.
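
For reference, a minimal sketch of the two calls involved, assuming the
documented behaviour of the Encode module: find_encoding() looks an
encoding up by name and does not inspect data, and length() counts bytes
before decoding and characters after it.  The sample string and variable
names are hypothetical stand-ins, not the real $Response.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical stand-in for $Response: the UTF-8 bytes of "cafe" with an
# accented e (two bytes, C3 A9), followed by the trailing ".\r\n".
my $response = "caf\xC3\xA9.\r\n";

# find_encoding() expects an encoding NAME and returns an encoding object.
my $by_name = find_encoding('UTF-8');

# Handing it the data itself is just a lookup of an unknown name, so the
# result is undef, which stringifies to an empty string in a log line.
my $by_data = find_encoding($response);

print "by name: ", (defined $by_name ? $by_name->name : "(undef)"), "\n";
print "by data: ", (defined $by_data ? $by_data->name : "(undef)"), "\n";

# Before decoding, length() counts bytes; after decoding, characters.
my $chars = decode('UTF-8', $response);
print "bytes: ", length($response), ", chars: ", length($chars), "\n";
```

If that reading is right, the empty "** ENCODING:" log line would say
nothing about how the data itself is encoded.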

I know for a FACT that the initial data is encoded using UTF8.  When I
remove the code that strips the last 3 characters (.\r\n) from $Response,
everything works absolutely fine.  Unfortunately, I *must* remove these
last three characters.
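
A hedged sketch of two ways the stripping could be done, assuming
$Response holds raw UTF-8 bytes that end in ".\r\n".  Since those three
characters are plain ASCII, each is a single byte in UTF-8, so they can be
removed without decoding at all.  The sample data is hypothetical, and the
strict 'UTF-8' encoding name is swapped in for the lax 'utf8' used above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for $Response: raw UTF-8 bytes ending in ".\r\n".
my $response = "na\xC3\xAFve caf\xC3\xA9.\r\n";

# Option 1: strip the trailer at the byte level.  The payload bytes are
# never decoded, so they cannot change.
(my $body_bytes = $response) =~ s/\.\r\n\z//;

# Option 2: decode strictly (croak on malformed input, leave the source
# untouched), strip the trailer as characters, then encode back to bytes.
my $chars = decode('UTF-8', $response,
                   Encode::FB_CROAK() | Encode::LEAVE_SRC());
$chars =~ s/\.\r\n\z//;
my $round_trip = encode('UTF-8', $chars);

# For well-formed UTF-8 input, both options yield the same bytes.
print $body_bytes eq $round_trip ? "identical bytes\n" : "bytes differ\n";
```

Whichever option is used, the checksum would need to be computed over the
byte string (here $body_bytes or $round_trip), not over the decoded
character string.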

Can anyone please shed some light on the subject for me?



