On Fri, 11 Oct 2019, Matthias Apitz wrote:
The resulting column contains ISO 8859-1 data:
HexStr:
50e46461676f67697363686520486f6368736368756c65205765696e67617274656e2020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020
P<E4>dagogische Hochschule Weingarten
That output is from a print statement going to STDOUT, so I wonder, are
you already telling Perl that you want UTF-8 output with something like
this at the beginning of your script?
    use open qw( :std :utf8 );
Otherwise Perl may be defaulting to writing out Latin-1.
Hi Jon,
Of course the line
P<E4>dagogische Hochschule Weingarten
is caused by the missing UTF-8 layer on STDOUT. But the line with
HexStr: 50e464616 ...
is not, and it shows that the \xe4 is there. Why?
Perl's internal storage of string data is a little odd. \xe4 is the
correct Unicode code point (U+00E4, LATIN SMALL LETTER A WITH DIAERESIS)
as per:
https://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29
It is not UTF-8 encoded, true, but there's no reason Perl internally needs
to use UTF-8 specifically, and I believe for Latin-1 it does not by
default. It's a question of in-memory storage and processing (Unicode
code points) vs. input/output (where you want UTF-8 bytes).
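
A minimal sketch of that distinction, assuming only the core Encode module. The variable names are made up for illustration and are not from your script. It hex-dumps the same one-character string twice: once as Perl holds it, and once after explicitly encoding it to UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string holding the single code point U+00E4.
my $char = "\xe4";

# Hex dump of the string as Perl holds it: one character, value 0xE4.
my $as_stored = unpack("H*", $char);

# Hex dump after explicitly encoding the character to UTF-8 bytes.
my $as_utf8 = unpack("H*", encode("UTF-8", $char));

print "in memory: $as_stored\n";   # in memory: e4
print "as UTF-8:  $as_utf8\n";     # as UTF-8:  c3a4
```

That would explain why a hex dump taken before any output encoding shows e4 and not c3a4.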
If your script is configured to send UTF-8 to STDOUT, then I would expect
that \xe4 will show up as the UTF-8 \xc3\xa4 instead.
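
Here is a rough sketch of what I mean, using in-memory filehandles as stand-ins for STDOUT so the bytes written can be inspected. The handle and variable names are hypothetical. I am assuming the :utf8 layer on these handles behaves as it would on STDOUT under "use open qw( :std :utf8 )".

```perl
use strict;
use warnings;

my $name = "P\xe4dagogische";   # contains the code point U+00E4

# Handle with no encoding layer: the character is written as one byte.
open(my $plain, '>:raw', \my $latin1_bytes) or die "open failed: $!";
print {$plain} $name;
close($plain);

# Handle with the :utf8 layer: the character is written as two bytes.
open(my $layered, '>:utf8', \my $utf8_bytes) or die "open failed: $!";
print {$layered} $name;
close($layered);

print unpack("H*", $latin1_bytes), "\n";   # 50e46461676f676973636865
print unpack("H*", $utf8_bytes), "\n";     # 50c3a46461676f676973636865
```

The first dump matches the start of your HexStr. The second is what I would expect a terminal to receive once the output layer is in place.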
Jon
--
Jon Jensen
End Point Corporation
https://www.endpoint.com/