On 26.01.2012 17:31, Tom Lane wrote:
Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
The thing is, there's currently no encoding conversion happening, so if
you have one database in LATIN1 encoding and another in UTF-8, for
example, whatever you put in your postgresql.conf is going to be wrong
for one database. I'm happy to just document the issue for per-database
messages ("ALTER DATABASE ... SET welcome_message"): the encoding used
there needs to match the encoding of the database, or the message is
displayed as garbage. But what about per-user messages, when the user has access to
several databases, or postgresql.conf?
I've not looked at the patch, but what exactly will happen if the string
has the wrong encoding?
You get an incorrectly encoded string, i.e. garbage, in your console,
when you log in with psql.
You can also use current_setting() to copy the incorrectly-encoded
string elsewhere in the system. If you insert it into a table and run
pg_dump, I think the dump might not be restorable. That's a bit of a
stretch, perhaps, but it would be nice to avoid that.
BTW, you can already do that if you set e.g. default_text_search_config
to something non-ASCII in postgresql.conf. Or if you do it with
search_path, you get a warning at login. For example, I did "ALTER USER
foouser set search_path ='kääk';" in a LATIN1 database, and then
connected to a UTF-8 database and got:
$ ~/pgsql.master/bin/psql postgres foouser
WARNING: invalid value for parameter "search_path": ""k��k""
DETAIL: schema "k��k" does not exist
psql (9.2devel)
Type "help" for help.
(in case that didn't get across right, I set the search_path to a string
containing two a-with-umlauts, and in the warning, they got replaced
with question marks with inverse colors, which is apparently a character
that the console uses to display bytes that are not valid UTF-8).
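
The effect is easy to reproduce outside PostgreSQL. The following is only a
minimal Python sketch, assuming the setting is stored as raw LATIN1 bytes and
the client interprets those bytes as UTF-8; the variable names are
illustrative and nothing here is PostgreSQL code:

```python
# "k\u00e4\u00e4k" is k, a-umlaut, a-umlaut, k.
# A LATIN1 session stores each a-umlaut as the single byte 0xE4.
stored = "k\u00e4\u00e4k".encode("latin-1")

# A UTF-8 client treats 0xE4 as the start of a multi-byte sequence that
# never completes, so a lenient decoder substitutes U+FFFD for each bad byte.
shown = stored.decode("utf-8", errors="replace")

print(stored)  # b'k\xe4\xe4k'
print(shown)   # k, two U+FFFD replacement characters, k
```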
The problem with welcome_message would look just like that. No-one is
likely to run into that with search_path, but it's quite reasonable and
expected to use your native language in a welcome message.
The idea that occurs to me is to have the code that uses the GUC do a
verify_mbstr(noerror) on it, and silently ignore it if it doesn't pass
(maybe with a LOG message). This would have to be documented of course,
but it seems better than the potential consequences of trying to send a
wrongly-encoded string.
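
As a rough illustration of that verify-then-ignore idea, here is a Python
sketch rather than backend code. The function name message_to_send and the
logger name are hypothetical, and Python's decoder merely stands in for the
server's own encoding verification:

```python
import logging
from typing import Optional

log = logging.getLogger("welcome_message_sketch")  # hypothetical logger name


def message_to_send(raw: bytes, db_encoding: str) -> Optional[str]:
    """Return the decoded message, or None if raw is invalid in db_encoding."""
    try:
        return raw.decode(db_encoding)
    except UnicodeDecodeError:
        # Ignore the setting rather than send bytes the client cannot
        # interpret; leave a note in the log for the administrator.
        log.info("ignoring welcome_message: not valid %s", db_encoding)
        return None
```

Note that in this sketch a single-byte encoding such as LATIN1 accepts every
byte value, so a check of this kind catches LATIN1 bytes in a UTF-8 database
but not UTF-8 bytes in a LATIN1 database.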
Hmm, fine with me. It would be nice to plug the hole that these bogus
characters can leak elsewhere into the system through current_setting,
though. Perhaps we could put the verify_mbstr() call somewhere in guc.c,
to forbid incorrectly encoded characters from being stored in the guc
variable in the first place.
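
To make the difference from checking at the point of use concrete, here is a
Python sketch of rejecting the value when it is assigned. SessionSettings and
its methods are hypothetical names that model one session's settings; this is
not how guc.c is structured:

```python
class SessionSettings:
    """Hypothetical per-session settings store that validates on assignment."""

    def __init__(self, db_encoding: str) -> None:
        self.db_encoding = db_encoding
        self._values = {}

    def set(self, name: str, raw: bytes) -> None:
        # Refuse to store bytes that are not valid in the session's encoding,
        # so nothing read back later can carry the bad bytes elsewhere.
        try:
            raw.decode(self.db_encoding)
        except UnicodeDecodeError as exc:
            raise ValueError(
                f"invalid byte sequence for {self.db_encoding} in {name}"
            ) from exc
        self._values[name] = raw

    def current_setting(self, name: str) -> bytes:
        return self._values[name]
```

With this arrangement, whatever current_setting returns has already passed
the check, whereas a check only at the point of use leaves the stored value
unverified.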
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com