On Sun, Jun 09, 2013 at 11:39:18AM -0400, Tom Lane wrote:
> The key point for me is that if tolower() actually does anything in the
> previous state of the code, it's more than likely going to produce
> invalidly encoded data.  The consequences of that can't be good.
> You can argue that there might be people out there for whom the
> transformation accidentally produced a validly-encoded string, but how
> likely is that really?  It seems much more likely that the only reason
> we've not had more complaints is that on most popular platforms, the
> code accidentally fails to fire on any UTF8 characters (or any common
> ones, anyway).  On those platforms, there will be no change of behavior.

Your hypothesis is that almost all libc tolower() implementations will in
every case either (a) turn a multi-byte character into byte soup not valid in
the server encoding or (b) leave it unchanged?  Quite possible.  If that
hypothesis holds, I agree that the committed change does not break
compatibility.  That carries a certain appeal.

I still anticipate regretting that we have approved, and made reliable, a
behavior that has often sufficed by accident, particularly when the SQL
standard calls for something else.  But I think I now understand your reasoning.

> The resistance to moving this code to use towlower() for non-ASCII
> mainly comes from worries about speed, I think; although there was also
> something about downcasing conversions that change the string's byte
> length being problematic for some callers.

Considering that using ASCII-only or quoted identifiers sidesteps the speed
penalty altogether, that seems a poor cause for demur.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers