On Friday, October 11, 2019 at 04:03:31 p.m. -0600, Jon Jensen wrote:

> Perl's internal storage of string data is a little odd. \xe4 is the 
> correct Unicode code point as per:
> 
> https://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29
> 
> It is not UTF-8 encoded, true, but there's no reason Perl internally needs 
> to use UTF-8 specifically, and I believe for Latin-1 it does not by 
> default. It's a question of in-memory storage and processing (some kind of 
> Unicode) vs. input/output (where you want UTF-8).
> 
> If your script is configured to send UTF-8 to STDOUT, then I would expect 
> that \xe4 will show up as the UTF-8 \xc3\xa4 instead.
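A minimal Perl sketch of the distinction Jon describes (a character in memory vs. its UTF-8 bytes on output). It assumes a stock Perl 5 interpreter; the variable names are only illustrative:

```perl
use strict;
use warnings;

# "\x{e4}" is one character, the code point U+00E4 ("ä"),
# regardless of how perl stores it internally.
my $char = "\x{e4}";
printf "characters: %d, code point: U+%04X\n", length($char), ord($char);

# utf8::encode turns a copy, in place, into its UTF-8 byte sequence.
my $bytes = $char;
utf8::encode($bytes);
printf "bytes: %d, hex: %s\n", length($bytes), unpack("H*", $bytes);
```

This prints "characters: 1, code point: U+00E4" and then "bytes: 2, hex: c3a4", i.e. the \xc3\xa4 Jon expects to see once the output is UTF-8 encoded.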

I inserted another row into this table, encoded in UTF-8:

pos71=# select d02name from d02ben where d02bnr = '08.05.1945' ;
 освобождение

pos71=# select d02name::bytea from d02ben where d02bnr = '08.05.1945' ;
 \xd0bed181d0b2d0bed0b1d0bed0b6d0b4d0b5d0bdd0b8d0b520202020202020 ...

If I fetch this row through Perl DBD::Pg:

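   @row = $sth->fetchrow_array;
   # hex dump of the fetched value
   $HexStr = unpack("H*", $row[0]);
   print "HexStr: " . $HexStr . "\n";
   # print without any output encoding layer
   print "$row[0]\n";

   # add a UTF-8 encoding layer to STDOUT and print again
   binmode(STDOUT, ':encoding(utf8)');
   print "after binmode: $row[0]\n";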


it gives:

DBI is version 1.642, DBD::Pg is version 3.10.0
client_encoding=UTF8, server_encoding=UTF8
HexStr: 3e41323e313e3634353d38352020202020202020 ...
Wide character in print at ./utf8-01.pl line 66.
освобождение
after binmode: освобождение

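The HexStr value above looks like the low byte of each Cyrillic code point (о is U+043E, с is U+0441, and so on), which is what unpack "H*" gives for a string of decoded characters rather than UTF-8 bytes. A minimal Perl sketch reproducing both hex dumps, assuming the fetched value is such a character string; the string is built from \x{...} escapes so the source file's encoding does not matter:

```perl
use strict;
use warnings;

# "освобождение" as a string of characters (not bytes).
my $chars = "\x{43e}\x{441}\x{432}\x{43e}\x{431}\x{43e}"
          . "\x{436}\x{434}\x{435}\x{43d}\x{438}\x{435}";

# On a character string, unpack "H*" works per character; code points
# above 0xFF are wrapped to their low byte. Perl warns about this
# ("Character in 'H' format wrapped in unpack"), so that category is
# silenced inside this block only.
my $hex_chars;
{
    no warnings 'unpack';
    $hex_chars = unpack("H*", $chars);
}
print "characters:  $hex_chars\n";

# After encoding a copy to UTF-8, unpack shows the real byte sequence.
my $bytes = $chars;
utf8::encode($bytes);
print "UTF-8 bytes: ", unpack("H*", $bytes), "\n";
```

The first line matches the start of the HexStr output of the first run (3e41323e...), the second matches the psql bytea output (d0bed181...).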
And if I add a utf8::encode($row[0]) call after the fetch, like this:

   @row = $sth->fetchrow_array;
   # convert the fetched value to UTF-8 bytes in place
   utf8::encode($row[0]);

it gives the correct UTF-8 encoding:

DBI is version 1.642, DBD::Pg is version 3.10.0
client_encoding=UTF8, server_encoding=UTF8
HexStr: d0bed181d0b2d0bed0b1d0bed0b6d0b4d0b5d0bdd0b8d0b520202020202020 ...
освобождение
after binmode: оÑвобождение

i.e. the array returned by $sth->fetchrow_array does not contain a UTF-8
encoded string.
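The garbled "after binmode" line of the second run looks like double encoding: once the value is already UTF-8 bytes, the ':encoding(utf8)' layer treats each byte as a separate character (U+0080..U+00FF here) and encodes it a second time. A minimal Perl sketch, assuming the layer behaves like utf8::encode applied to a byte string (that equivalence is my assumption, used here only to show the effect):

```perl
use strict;
use warnings;

# UTF-8 bytes of "освобождение", as produced by utf8::encode($row[0]).
my $bytes = pack("H*", "d0bed181d0b2d0bed0b1d0bed0b6d0b4d0b5d0bdd0b8d0b5");

# Encode the already-encoded bytes again: every byte >= 0x80 becomes
# a two-byte UTF-8 sequence.
my $twice = $bytes;
utf8::encode($twice);

printf "%d bytes become %d bytes\n", length($bytes), length($twice);
print "d0be becomes ", unpack("H*", substr($twice, 0, 4)), "\n";
```

It prints "24 bytes become 48 bytes" and "d0be becomes c390c2be"; a UTF-8 terminal then shows Latin-1 style mojibake instead of Cyrillic.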

Why does it have to be passed through utf8::encode($row[0])?

Thanks

        matthias

-- 
Matthias Apitz, ✉ [email protected], http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

October 3! Congratulations! The Berlin TV Tower (Berliner Fernsehturm) turns 50
from: https://www.jungewelt.de/2019/10-02/index.php
