Marko Karppinen wrote:
I think this interaction between the locale and server_encoding is confusing. Is there any use case for running an incompatible mix? If not, would it not make sense to fetch initdb's default database encoding with nl_langinfo(CODESET) instead of using SQL_ASCII?
Peter Eisentraut wrote:
This would be fine and dandy if we had any sort of idea about what sort of strings nl_langinfo(CODESET) returns and how to map them to our encoding names.
Karel Zak posted an answer to this last year, here on pgsql-hackers:
http://archives.postgresql.org/pgsql-hackers/2003-05/msg00744.php
It's not complete, but it's sort of an idea.
The code is under LGPL, but copyright doesn't reach down to the actual information about the encoding strings used by various operating systems, so the mappings themselves can be reused. I'd imagine that it covers many, if not most, of the likely cases.
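To make the mapping idea concrete, here is a minimal sketch in Python of a table-driven lookup from codeset strings to encoding names. Everything in it is an assumption for illustration: the function names are hypothetical, the table is a small hand-picked sample rather than the data from the message linked above, and the encoding names follow the spelling used by recent PostgreSQL releases (older releases called UTF-8 "UNICODE"). The codeset spellings are examples of what different systems are believed to report, not an exhaustive list.

```python
import re

# Hypothetical sample table: normalized codeset spelling -> PostgreSQL
# encoding name. A real table would need many more entries per platform.
CODESET_TO_PG = {
    "utf8": "UTF8",
    "iso88591": "LATIN1",
    "iso88592": "LATIN2",
    "iso885915": "LATIN9",
    "eucjp": "EUC_JP",
    "koi8r": "KOI8R",
    # Plain ASCII under its various names maps to the current default.
    "ansix341968": "SQL_ASCII",
    "usascii": "SQL_ASCII",
    "646": "SQL_ASCII",
}


def normalize_codeset(codeset):
    """Lowercase and drop punctuation so "UTF-8", "utf8" and "UTF_8" compare equal."""
    return re.sub(r"[^a-z0-9]", "", codeset.lower())


def pg_encoding_for_codeset(codeset):
    """Return the mapped encoding name, or None when the codeset is not recognized."""
    return CODESET_TO_PG.get(normalize_codeset(codeset))
```

Normalizing before the lookup keeps the table small, since operating systems differ mostly in case and punctuation. Returning None for unknown strings leaves the caller free to fall back to the existing default instead of guessing.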
The current situation, where upper/lower/collation etc. are simply broken by default in many non-C locales, is bad enough to warrant bailing out during initdb when this situation is detected (with a reasonably cautious heuristic).
It used to be that you got what you deserved if you were stupid enough to define a non-C, non-ASCII-based locale. You had only yourself to blame for everything breaking. These days, however, millions of systems get shipped and installed with UTF-8 locales on by default, so it's not possible to portray this as a user error.
Requiring every one of these people to configure initdb's encoding manually would be harsh, however, so I think that a heuristic that would work with most modern systems would strike an appropriate balance of correctness and path-of-least-surprise.
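One possible shape for such a cautious heuristic, sketched in Python rather than in initdb's own C. The function name and its arguments are hypothetical, and this is not how initdb behaves today. It assumes the caller has already derived an encoding name from the locale's codeset (or None if the codeset was not recognized), and it bails out only when both sides are positively known and disagree.

```python
def choose_initdb_encoding(locale_encoding, requested=None):
    """Pick a database encoding, or raise when the choice is known to be incompatible.

    locale_encoding: encoding name derived from the locale's codeset,
                     or None when the codeset could not be recognized.
    requested:       encoding the user asked for explicitly, or None.
    """
    if requested is None:
        # No explicit choice: follow the locale when we understand it,
        # otherwise keep the old SQL_ASCII default.
        return locale_encoding or "SQL_ASCII"

    # Be cautious: an unrecognized codeset or a plain-ASCII (C-like) locale
    # gives no grounds for rejecting what the user asked for.
    if locale_encoding in (None, "SQL_ASCII"):
        return requested

    if requested == locale_encoding:
        return requested

    raise ValueError(
        "encoding %s does not match the locale's encoding %s"
        % (requested, locale_encoding)
    )
```

On a Unix system, the codeset itself could be obtained in Python with locale.setlocale(locale.LC_CTYPE, "") followed by locale.nl_langinfo(locale.CODESET). With this sketch, a UTF-8 locale and no explicit encoding yields UTF8, while explicitly asking for SQL_ASCII under a UTF-8 locale is rejected.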
mk