[HACKERS] Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding

Andrew Dunstan Sun, 09 Jun 2013 06:52:23 -0700


On 06/09/2013 12:38 AM, Noah Misch wrote:

On Sat, Jun 08, 2013 at 11:50:53PM -0400, Andrew Dunstan wrote:

On 06/08/2013 10:52 PM, Noah Misch wrote:

Let's return to the drawing board on this one.  I would be inclined to keep
the current bad behavior until we implement the i18n-aware case folding
required by SQL.  If I'm alone in thinking that, perhaps switch to downcasing
only ASCII characters regardless of the encoding.  That at least gives
consistent application behavior.


I apologize for not noticing to comment on this week's thread.

The behaviour which this fixes is an unambiguous bug. Calling tolower()
on the individual bytes of a multi-byte character can't possibly produce
any sort of correct result. A database that contains such corrupted
names, probably not valid in any encoding at all, is almost certainly
not restorable, and I'm not sure if it's dumpable either.
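
For illustration, here is a minimal Python sketch of the failure described above, next to the ASCII-only alternative suggested earlier in the thread. It is not PostgreSQL's code. The function names are hypothetical, and it assumes a tolower() that maps high bytes the way a Latin-1 table would; real tolower() behaviour depends on platform and locale.

```python
def bytewise_lower_latin1(raw: bytes) -> bytes:
    """Downcase each byte on its own, as a Latin-1 style tolower() table would.

    Assumption: high bytes are folded as if they were Latin-1 characters.
    """
    return bytes(ord(chr(b).lower()) for b in raw)


def ascii_only_lower(raw: bytes) -> bytes:
    """Downcase only the bytes for ASCII 'A'..'Z'; leave every other byte alone."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in raw)


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether the byte string still decodes as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    name = "ÉCOLE".encode("utf-8")        # b'\xc3\x89COLE'

    broken = bytewise_lower_latin1(name)  # lead byte 0xC3 becomes 0xE3
    print(broken, is_valid_utf8(broken))  # b'\xe3\x89cole' False

    safe = ascii_only_lower(name)         # multi-byte sequence untouched
    print(safe.decode("utf-8"))           # École
```

Under that assumption the lead byte 0xC3 of "É" becomes 0xE3, which announces a three-byte sequence that never arrives, so the folded name is no longer valid UTF-8. The ASCII-only variant leaves the multi-byte character alone, at the cost of not folding it at all.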

I agree with each of those points.  However, since any change here breaks
compatibility, we should fix it right the first time.  A second compatibility
break would be all the more onerous once this intermediate step helps more
users to start using unquoted, non-ASCII object names.

It's already
produced several complaints in recent months, so ISTM that returning to
it for any period of time is unthinkable.

PostgreSQL has lived with this wrong behavior since ... the beginning?  It's a
problem, certainly, but a bandage fix brings its own trouble.

If you have a better fix I am all ears. I can recall at least one
discussion of this area (concerning Turkish I quite a few years ago)
where we failed to come up with anything.

I have a fairly hard time believing in your "relies on this and somehow
works" scenario.


cheers

andrew



