[HACKERS] Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding

Noah Misch Sat, 08 Jun 2013 21:39:26 -0700

On Sat, Jun 08, 2013 at 11:50:53PM -0400, Andrew Dunstan wrote:
> On 06/08/2013 10:52 PM, Noah Misch wrote:


>> Let's return to the drawing board on this one.  I would be inclined to keep
>> the current bad behavior until we implement the i18n-aware case folding
>> required by SQL.  If I'm alone in thinking that, perhaps switch to downcasing
>> only ASCII characters regardless of the encoding.  That at least gives
>> consistent application behavior.
>>
>> I apologize for not noticing this week's thread in time to comment on it.
>>
>
> The behaviour which this fixes is an unambiguous bug. Calling tolower()  
> on the individual bytes of a multi-byte character can't possibly produce  
> any sort of correct result. A database that contains such corrupted  
> names, probably not valid in any encoding at all, is almost certainly  
> not restorable, and I'm not sure if it's dumpable either.
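
To make the failure concrete: in UTF-8, "É" is the byte pair 0xC3 0x89. If
tolower() runs per byte under a single-byte locale such as ISO-8859-1, it can
map 0xC3 to 0xE3, which is a three-byte lead byte in UTF-8, so the result is no
longer valid text. Below is a minimal sketch of the ASCII-only fallback
mentioned upthread. The function name ascii_downcase and its signature are
hypothetical, chosen for illustration; this is not the code from the commit
under discussion.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not PostgreSQL's actual code: fold only the ASCII
 * letters A-Z to lower case and copy every other byte through unchanged.
 * Bytes >= 0x80 (all bytes of a multi-byte character) are never touched,
 * so the result does not depend on the locale or the encoding.
 */
static void
ascii_downcase(const char *src, char *dst, size_t dstlen)
{
    size_t      i = 0;

    if (dstlen == 0)
        return;

    /* Stop one byte short of dstlen to leave room for the terminator. */
    for (; src[i] != '\0' && i + 1 < dstlen; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';
        dst[i] = (char) ch;
    }
    dst[i] = '\0';
}
```

With this sketch, the UTF-8 input "TABLE_É" comes back as "table_É": the ASCII
letters are folded and the two bytes of "É" pass through intact. It does not
fold non-ASCII letters at all, so it falls short of the i18n-aware folding that
SQL requires.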

I agree with each of those points.  However, since any change here breaks
compatibility, we should fix it right the first time.  A second compatibility
break would be all the more onerous once this intermediate step helps more
users to start using unquoted, non-ASCII object names.

> It's already  
> produced several complaints in recent months, so ISTM that returning to  
> it for any period of time is unthinkable.

PostgreSQL has lived with this wrong behavior since ... the beginning?  It's a
problem, certainly, but a bandage fix brings its own trouble.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

