Marko Karppinen wrote:
I think this interaction between the locale and server_encoding is
confusing. Is there any use case for running an incompatible mix?
If not, would it not make sense to fetch initdb's default database
encoding with nl_langinfo(CODESET) instead of using SQL_ASCII?

Peter Eisentraut wrote:
This would be fine and dandy if we had any sort of idea about what sort
of strings nl_langinfo(CODESET) returns and how to map them to our
encoding names.

Karel Zak posted an answer to this last year, here on pgsql-hackers:
http://archives.postgresql.org/pgsql-hackers/2003-05/msg00744.php
It's not complete, but it's a starting point.
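To make the idea concrete, here is a minimal sketch of what such a lookup could look like. It is not taken from Karel Zak's code. The table below is a tiny hypothetical subset, and the target names are assumptions (UNICODE is how 7.4 spells UTF-8; later releases may differ). Normalizing case and dropping '-' and '_' lets variants like glibc's "ISO-8859-1" and other platforms' "ISO8859-1" share one entry. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately incomplete table:
 * normalized codeset string -> PostgreSQL encoding name (assumed names). */
struct codeset_map {
    const char *codeset;     /* lowercase, '-' and '_' removed */
    const char *pg_encoding;
};

static const struct codeset_map codeset_table[] = {
    {"utf8",         "UNICODE"},
    {"iso88591",     "LATIN1"},
    {"iso88592",     "LATIN2"},
    {"eucjp",        "EUC_JP"},
    {"koi8r",        "KOI8"},
    {"ansix3.41968", "SQL_ASCII"},  /* glibc's name for ASCII */
    {"usascii",      "SQL_ASCII"},
    {"646",          "SQL_ASCII"},  /* Solaris' name for ASCII */
};

/* Lowercase the codeset and drop '-' and '_' so spelling variants compare equal. */
static void normalize_codeset(const char *in, char *out, size_t outlen)
{
    size_t j = 0;

    assert(in != NULL && out != NULL && outlen > 0);
    for (size_t i = 0; in[i] != '\0' && j + 1 < outlen; i++) {
        unsigned char c = (unsigned char) in[i];
        if (c == '-' || c == '_')
            continue;
        out[j++] = (char) tolower(c);
    }
    out[j] = '\0';
}

/* Returns the PostgreSQL encoding name, or NULL if the codeset is unknown. */
const char *pg_encoding_from_codeset(const char *codeset)
{
    char norm[64];

    normalize_codeset(codeset, norm, sizeof norm);
    for (size_t i = 0; i < sizeof codeset_table / sizeof codeset_table[0]; i++)
        if (strcmp(norm, codeset_table[i].codeset) == 0)
            return codeset_table[i].pg_encoding;
    return NULL;
}

/* Encoding of the environment's LC_CTYPE, or NULL if it cannot be mapped. */
const char *guess_initdb_encoding(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return NULL;
    return pg_encoding_from_codeset(nl_langinfo(CODESET));
}
```

Returning NULL for unknown codesets leaves initdb free to fall back to today's behaviour (SQL_ASCII) instead of guessing.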

The code is under the LGPL, but copyright doesn't extend to the
underlying facts about which encoding strings various operating
systems use, so that information can be reused. I'd imagine
that it covers many, if not most, of the likely cases.

The current situation, in which upper(), lower(), collation and so on
are simply broken by default under many non-C locales, is bad enough
to warrant bailing out during initdb when this situation is detected
(with a reasonably cautious heuristic).
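One way to read "reasonably cautious" is sketched below, under assumptions: the caller has already mapped the locale's codeset to a PostgreSQL encoding name (or NULL when unknown), and the function name is hypothetical. It only refuses when the locale is non-C *and* its encoding is known *and* that encoding differs from the one requested, so an unrecognized codeset never blocks initdb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cautious check: returns true when initdb should refuse to
 * proceed because the locale and the requested encoding are incompatible.
 * locale_encoding may be NULL when the codeset could not be mapped. */
bool locale_encoding_mismatch(const char *lc_ctype,
                              const char *locale_encoding,
                              const char *db_encoding)
{
    assert(lc_ctype != NULL && db_encoding != NULL);

    /* C and POSIX locales work bytewise, so any encoding is acceptable. */
    if (strcmp(lc_ctype, "C") == 0 || strcmp(lc_ctype, "POSIX") == 0)
        return false;

    /* Unknown codeset: be cautious and do not bail out. */
    if (locale_encoding == NULL)
        return false;

    /* Known encoding that differs from the requested one: incompatible mix. */
    return strcmp(locale_encoding, db_encoding) != 0;
}
```

With this rule, a UTF-8 locale combined with the SQL_ASCII default is rejected, while exotic locales whose codeset is not in the table still initialize as before.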

It used to be that you got what you deserved if you were stupid
enough to define a non-C, non-ASCII-based locale. You had only
yourself to blame for everything breaking. These days, however,
millions of systems ship and are installed with UTF-8 locales
enabled by default, so it's not possible to portray this as a user error.

Requiring every one of these people to configure initdb's encoding
manually would be harsh, however, so I think that a heuristic
that works on most modern systems would strike an appropriate
balance between correctness and the path of least surprise.

mk

