> On Oct 12, 2019, at 3:32 AM, Matthias Apitz <[email protected]> wrote:
> 
> On Friday, October 11, 2019 at 04:03:31 PM -0600, Jon Jensen wrote:
> 
>> Perl's internal storage of string data is a little odd. \xe4 is the 
>> correct Unicode code point as per:
>> 
>> https://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29
>> 
>> It is not UTF-8 encoded, true, but there's no reason Perl internally needs 
>> to use UTF-8 specifically, and I believe for Latin-1 it does not by 
>> default. It's a question of in-memory storage and processing (some kind of 
>> Unicode) vs. input/output (where you want UTF-8).
>> 
>> If your script is configured to send UTF-8 to STDOUT, then I would expect 
>> that \xe4 will show up as the UTF-8 \xc3\xa4 instead.
> 
> I inserted another row into this table, encoded in UTF-8:
> 
> pos71=# select d02name from d02ben where d02bnr = '08.05.1945' ;
> освобождение
> 
> pos71=# select d02name::bytea from d02ben where d02bnr = '08.05.1945' ;
> \xd0bed181d0b2d0bed0b1d0bed0b6d0b4d0b5d0bdd0b8d0b520202020202020 ...
> 
> If I run this through Perl DBD::Pg:
> 
>   @row = $sth->fetchrow_array;
>   $HexStr = unpack("H*", $row[0]);
>   print "HexStr: " . $HexStr . "\n";
>   print "$row[0]\n";
> 
>   binmode(STDOUT, ':encoding(utf8)');
>   print "after binmode: $row[0]\n";
> 
> 
> it gives:
> 
> DBI is version 1.642, DBD::Pg is version 3.10.0
> client_encoding=UTF8, server_encoding=UTF8
> HexStr: 3e41323e313e3634353d38352020202020202020 ...
> Wide character in print at ./utf8-01.pl line 66.
> освобождение
> after binmode: освобождение
> 
> and if I add a utf8::encode($row[0]) after the fetch, like:
> 
>   @row = $sth->fetchrow_array;
>   utf8::encode($row[0]);
> 
> it gives the correct UTF-8 encoding:
> 
> DBI is version 1.642, DBD::Pg is version 3.10.0
> client_encoding=UTF8, server_encoding=UTF8
> HexStr: d0bed181d0b2d0bed0b1d0bed0b6d0b4d0b5d0bdd0b8d0b520202020202020 ...
> освобождение
> after binmode: оÑвобождение
> 
> i.e. the array returned by $sth->fetchrow_array does not contain a UTF-8 
> encoded string.
> 
> Why does it have to be passed through utf8::encode($row[0])?

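This seems related to the behavior of DBD::Pg's pg_enable_utf8 attribute. With 
client_encoding=UTF8, fetched strings come back as decoded Perl character 
strings rather than raw UTF-8 octets, which is why utf8::encode is needed to 
get the bytes. If you want/need to deal with octets only then set it to 0 on 
your $dbh.

$dbh = DBI->connect(...);
$dbh->{pg_enable_utf8} = 0;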
...
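
To see the difference between characters and octets without a database, 
here is a minimal sketch. The string literal only stands in for a fetched 
column (without the trailing padding), the variable names are mine, and it 
assumes the script file itself is saved as UTF-8:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                       # the string literal below is UTF-8 source text
use Encode qw(encode decode);

# A decoded character string: 12 characters, one Unicode code point each.
# This stands in for what fetchrow_array hands back when decoding is on.
my $chars = "освобождение";

# Encoding turns the characters into UTF-8 octets (2 bytes per letter here).
my $octets = encode('UTF-8', $chars);

# unpack "H*" is only meaningful on octets, so apply it after encoding.
my $hex = unpack('H*', $octets);

print "characters: ", length($chars),  "\n";
print "octets:     ", length($octets), "\n";
print "hex:        $hex\n";

# Decoding the octets gives the original character string back.
my $roundtrip = decode('UTF-8', $octets);

# Print character strings through an encoding layer. Printing already
# encoded octets through the same layer would encode them a second time.
binmode(STDOUT, ':encoding(UTF-8)');
print "$roundtrip\n";
```

The hex value is the same byte sequence as your d02name::bytea output, minus 
the trailing spaces. Your first HexStr looks different because unpack was 
applied to the character string before it was encoded.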

HTH,

David 
