On Mon, Oct 14, 2019 at 10:46 PM Matthias Apitz <[email protected]> wrote:
> i.e. the code point is coming correctly from the PG database server as
> (octal) \303\244 which is the same as \xc3\xa4. And Perl mangles this to
>
> [UTF8 "P\x{e4}dagogische Hochschule Weingarten"]
>
> which is IMHO not correct and causing all this confusion.
>
> We have to deal with this in our Perl code. It's not a PostgreSQL
> problem.
>
We use UTF8 extensively with Pg and Perl and have no issues,
so I suspect there's a configuration issue somewhere.
And yes, I don't think you should be worrying about how Perl encodes things
internally.
From perlunifaq:
> *I lost track; what encoding is the internal format really?*
> It's good that you lost track, because you shouldn't depend on the internal
> format being any specific encoding. But since you asked: by default, the
> internal format is either ISO-8859-1 (latin-1), or utf8, depending on the
> history of the string. On EBCDIC platforms, this may be different even.
> Perl knows how it stored the string internally, and will use that
> knowledge when you encode. In other words: don't try to find out what the
> internal encoding for a certain string is, but instead just encode it into
> the encoding that you want.
https://perldoc.perl.org/perlunifaq.html#INTERNALS
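
For what it's worth, here is a minimal sketch of what "just encode it into the encoding that you want" can look like with the core Encode module. The byte string is typed in by hand to stand in for what the server sends (an assumption, there is no database connection here), and the variable names are only illustrative. It also suggests why a dump of the decoded string shows \x{e4}: that is the single code point U+00E4, not mangled bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for the bytes coming from the server (assumption: hand-typed here).
# UTF-8 for "ä" is the two bytes \xc3\xa4.
my $bytes = "P\xc3\xa4dagogische Hochschule Weingarten";

# Decode once at the input boundary: bytes -> Perl character string.
my $text = decode('UTF-8', $bytes);

# The character string has one code point for "ä", so it is one shorter
# than the byte string.
printf "characters: %d, bytes: %d\n", length($text), length($bytes);
printf "second character: U+%04X\n", ord(substr($text, 1, 1));

# Encode once at the output boundary, into whatever encoding is wanted.
# How Perl stored $text internally does not matter at this point.
my $utf8   = encode('UTF-8', $text);
my $latin1 = encode('ISO-8859-1', $text);
```

Whether your driver already does the decode step for you depends on how it is configured, so treat this as an illustration of the principle rather than a fix for your setup.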
Maurice