> On Oct 12, 2019, at 3:32 AM, Matthias Apitz <[email protected]> wrote:
>
> On Friday, October 11, 2019 at 04:03:31 PM -0600, Jon Jensen
> wrote:
>
>> Perl's internal storage of string data is a little odd. \xe4 is the
>> correct Unicode code point as per:
>>
>> https://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29
>>
>> It is not UTF-8 encoded, true, but there's no reason Perl internally needs
>> to use UTF-8 specifically, and I believe for Latin-1 it does not by
>> default. It's a question of in-memory storage and processing (some kind of
>> Unicode) vs. input/output (where you want UTF-8).
>>
>> If your script is configured to send UTF-8 to STDOUT, then I would expect
>> that \xe4 will show up as the UTF-8 \xc3\xa4 instead.
>
> I inserted another row into this table, encoded in UTF-8:
>
> pos71=# select d02name from d02ben where d02bnr = '08.05.1945' ;
> освобождение
>
> pos71=# select d02name::bytea from d02ben where d02bnr = '08.05.1945' ;
> \xd0bed181d0b2d0bed0b1d0bed0b6d0b4d0b5d0bdd0b8d0b520202020202020 ...
>
> If I run this through Perl DBD::Pg:
>
> @row = $sth->fetchrow_array;
> $HexStr = unpack("H*", $row[0]);
> print "HexStr: " . $HexStr . "\n";
> print "$row[0]\n";
>
> binmode(STDOUT, ':encoding(utf8)');
> print "after binmode: $row[0]\n";
>
>
> it gives:
>
> DBI is version 1.642, DBD::Pg is version 3.10.0
> client_encoding=UTF8, server_encoding=UTF8
> HexStr: 3e41323e313e3634353d38352020202020202020 ...
> Wide character in print at ./utf8-01.pl line 66.
> освобождение
> after binmode: освобождение
>
> and if I add an utf8::encode($row[0]) after the fetch, like:
>
> @row = $sth->fetchrow_array;
> utf8::encode($row[0]);
>
> it gives the correct UTF-8 encoding:
>
> DBI is version 1.642, DBD::Pg is version 3.10.0
> client_encoding=UTF8, server_encoding=UTF8
> HexStr: d0bed181d0b2d0bed0b1d0bed0b6d0b4d0b5d0bdd0b8d0b520202020202020 ...
> освобождение
> after binmode: оÑвобождение
>
> i.e. the array returned by $sth->fetchrow_array does not contain a UTF-8
> string.
>
> Why does it have to be passed through utf8::encode($row[0])?
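This seems related to the behavior of DBD::Pg's pg_enable_utf8 attribute. By
default, with client_encoding=UTF8, the driver decodes the data and returns
Perl character strings rather than UTF-8 octets, which is why your first
HexStr shows only the low byte of each code point. If you want/need to deal
with octets only, then set it to 0 on your $dbh.
$dbh = DBI->connect(...);
$dbh->{pg_enable_utf8} = 0;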
...
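To illustrate the difference between a decoded character string and its UTF-8 octets, here is a minimal sketch that needs no database. It assumes the driver's decoding behaves like decode('UTF-8', ...) from the Encode module, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 octets of the first two letters, copied from the bytea dump: d0be d181
my $octets = "\xd0\xbe\xd1\x81";

# Assumption: this stands in for what the driver does when it decodes the data.
# The result holds two characters, U+043E and U+0441.
my $chars = decode('UTF-8', $octets);

# Low byte of each code point; unpack("H*") effectively reports this for a
# character string whose code points are above 0xFF.
my $low_bytes = join '', map { sprintf '%02x', ord($_) & 0xff } split //, $chars;

# Encoding the characters again gives back the original octets.
my $reencoded = encode('UTF-8', $chars);

# Encoding the octets a second time double-encodes them, which is what an
# :encoding(utf8) layer does to a string that already went through utf8::encode.
my $double = encode('UTF-8', $reencoded);

printf "characters: %d, low bytes: %s\n", length($chars), $low_bytes;
printf "octets:     %d, hex: %s\n", length($reencoded), unpack('H*', $reencoded);
printf "double:     %d, hex: %s\n", length($double), unpack('H*', $double);
```

The low bytes come out as 3e41, matching the start of your first HexStr, and the re-encoded octets match the bytea dump. So either print decoded strings through an :encoding(utf8) layer, or print octets with no layer, but not both at once.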
HTH,
David