On Sun, Feb 10, 2013 at 06:47:30PM -0500, Tom Lane wrote:
> Noah Misch <n...@leadboat.com> writes:
> > Following some actual testing, I see that we treat postgresql.conf
> > values as byte sequences; any reinterpretation as encoded text happens
> > later.  Hence, contrary to my earlier suspicion, your patch does not
> > make that situation worse.  The present situation is bad; among other
> > things, current_setting() is a vector for injecting invalid text data.
> > But unconditionally validating postgresql.conf values in the platform
> > encoding would not be an improvement.  Suppose you have a UTF-8
> > platform encoding and KOI8R databases.  You may wish to put KOI8R
> > strings in a GUC, say search_path.  That's possible today; if we
> > required that postgresql.conf conform to the platform encoding and no
> > other, it would become impossible.  This area warrants improvement,
> > but doing so will entail careful design.
> 
> The key problem, ISTM, is that it's not at all clear what encoding to
> expect the incoming data to be in.  I'm concerned about trying to fix
> that by assuming it's in some "platform encoding" --- for one thing,
> while that might be a well-defined concept on Windows, I don't believe
> it is anywhere else.

GetPlatformEncoding() does impose a sufficiently portable definition.  I just
don't think that definition yields a value we can presume is desirable and
adequate for postgresql.conf.
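
To illustrate what validation against an explicitly chosen encoding could
look like, a minimal sketch; the wrapper name is made up, but
pg_char_to_encoding() and pg_verify_mbstr() are the existing backend
primitives it would lean on:

    /*
     * Sketch only: validate one postgresql.conf string value against an
     * explicitly chosen encoding, rather than an assumed platform encoding.
     */
    #include "postgres.h"
    #include "mb/pg_wchar.h"

    static bool
    conf_value_is_valid_in(const char *value, const char *enc_name)
    {
        int         encoding = pg_char_to_encoding(enc_name);

        if (encoding < 0)
            return false;           /* unrecognized encoding name */

        /* noError = true: just report validity, don't throw */
        return pg_verify_mbstr(encoding, value, strlen(value), true);
    }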

> If we knew that postgresql.conf was stored in, say, UTF8, then it would
> probably be possible to perform encoding conversion to get string
> variables into the database encoding.  Perhaps we should allow some
> magic syntax to tell us the encoding of a config file?
> 
>       file_encoding = 'utf8'  # must precede any non-ASCII in the file
> 
> There would still be a lot of practical problems to solve, like what to
> do if we fail to convert some string into the database encoding.  But at
> least the problems would be somewhat well-defined.

Agreed.  That's a promising direction.
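
Roughly, once the file's encoding is declared up front, each string value
could be recoded at load time.  A sketch, with an invented wrapper around the
existing pg_do_encoding_conversion() and GetDatabaseEncoding():

    /*
     * Sketch only: recode a postgresql.conf string value from the declared
     * file encoding into the database encoding.  file_encoding would be the
     * encoding ID parsed from the proposed "file_encoding = '...'" line.
     */
    #include "postgres.h"
    #include "mb/pg_wchar.h"

    static char *
    recode_conf_value(const char *value, int file_encoding)
    {
        /* errors out if the bytes cannot be converted */
        return (char *) pg_do_encoding_conversion((unsigned char *) value,
                                                  strlen(value),
                                                  file_encoding,
                                                  GetDatabaseEncoding());
    }

The hard question you raise, namely what to do when conversion fails, then
becomes a question of where it is acceptable for that error to surface.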

> While we're thinking about this, it'd be nice to fix our handling (or
> rather lack of handling) of encoding considerations for database names,
> user names, and passwords.  I could imagine adding some sort of encoding
> marker to connection request packets, which could fix the don't-know-
> the-encoding problem as far as incoming data is concerned.

That deserves a TODO entry under Wire Protocol Changes to avoid losing it.
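
For concreteness, the marker could be just another startup-packet parameter.
A purely hypothetical, standalone sketch of the client side; the
"name_encoding" parameter name is invented, while the framing matches the
existing v3 startup packet:

    /*
     * Hypothetical only: a v3-style startup packet carrying an extra
     * "name_encoding" parameter declaring the encoding of the user and
     * database names (and, by extension, the password exchange).
     */
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>          /* htonl */

    /* append one NUL-terminated string, return the new offset */
    static size_t
    put_str(char *buf, size_t off, const char *s)
    {
        size_t      len = strlen(s) + 1;

        memcpy(buf + off, s, len);
        return off + len;
    }

    static size_t
    build_startup_packet(char *buf, const char *user, const char *db)
    {
        uint32_t    version = htonl(196608);   /* protocol 3.0 */
        uint32_t    netlen;
        size_t      off = 8;                   /* room for length + version */

        off = put_str(buf, off, "user");
        off = put_str(buf, off, user);
        off = put_str(buf, off, "database");
        off = put_str(buf, off, db);
        off = put_str(buf, off, "name_encoding");   /* the new part */
        off = put_str(buf, off, "UTF8");
        buf[off++] = '\0';                     /* end of parameter list */

        netlen = htonl((uint32_t) off);        /* length includes itself */
        memcpy(buf, &netlen, 4);
        memcpy(buf + 4, &version, 4);
        return off;
    }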

> But how
> shall we deal with storing the strings in shared catalogs, which have to
> be readable from multiple databases possibly of different encodings?

I suppose we would pick an encoding sufficient for all values we intend to
support (UTF8?  MULE_INTERNAL?), then store the data in that encoding using
either bytea or a new type, say "omnitext".
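
On the reading side that would look much like the postgresql.conf case.  A
sketch, assuming UTF8 as the common storage encoding (the wrapper name is
made up):

    /*
     * Sketch only: recode a shared-catalog string, stored in the common
     * encoding (UTF8 here), into the current database's encoding on access.
     * The open question is again the failure policy when the database
     * encoding cannot represent the stored name.
     */
    #include "postgres.h"
    #include "mb/pg_wchar.h"

    static char *
    shared_name_to_db_encoding(const char *stored)
    {
        return (char *) pg_do_encoding_conversion((unsigned char *) stored,
                                                  strlen(stored),
                                                  PG_UTF8,
                                                  GetDatabaseEncoding());
    }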

Thanks,
nm

