On 26.01.2012 17:31, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
>> The thing is, there's currently no encoding conversion happening, so if
>> you have one database in LATIN1 encoding and another in UTF-8, for
>> example, whatever you put in your postgresql.conf is going to be wrong
>> for one database. I'm happy to just document the issue for per-database
>> messages ("ALTER DATABASE ... SET welcome_message"): the encoding used
>> there needs to match the encoding of the database, or it's displayed as
>> garbage. But what about per-user messages, when the user has access to
>> several databases, or postgresql.conf?

> I've not looked at the patch, but what exactly will happen if the string
> has the wrong encoding?

You get an incorrectly encoded string, i.e. garbage, in your console when you log in with psql.

You can also use current_setting() to copy the incorrectly-encoded string elsewhere in the system. If you insert it into a table and run pg_dump, I think the dump might not be restorable. That's a bit of a stretch, perhaps, but it would be nice to avoid that.

BTW, you can already do that if you set e.g. default_text_search_config to something non-ASCII in postgresql.conf. Or, if you do it with search_path, you get a warning at login. For example, I did "ALTER USER foouser set search_path ='kääk';" in a LATIN1 database, and then connected to a UTF-8 database and got:

$ ~/pgsql.master/bin/psql postgres foouser
WARNING:  invalid value for parameter "search_path": ""k��k""
DETAIL:  schema "k��k" does not exist
psql (9.2devel)
Type "help" for help.

(In case that didn't get across right: I set the search_path to a string containing two a-with-umlauts, and in the warning they got replaced with question marks in inverse colors, i.e. the Unicode replacement character U+FFFD, which is apparently what the console uses to display bytes that are not valid UTF-8.)

The problem with welcome_message would look just like that. No one is likely to run into this with search_path, but it's quite reasonable and expected to use your native language in a welcome message.

> The idea that occurs to me is to have the code that uses the GUC do a
> verify_mbstr(noerror) on it, and silently ignore it if it doesn't pass
> (maybe with a LOG message).  This would have to be documented of course,
> but it seems better than the potential consequences of trying to send a
> wrongly-encoded string.

Hmm, fine with me. It would be nice, though, to plug the hole through which these bogus characters can leak elsewhere into the system via current_setting(). Perhaps we could put the verify_mbstr() call somewhere in guc.c, to forbid incorrectly encoded characters from being stored in the GUC variable in the first place.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
