David E. Wheeler wrote:
On Jul 7, 2008, at 12:21, David E. Wheeler wrote:
My question is: why? Shouldn't they all use the same function for
comparison? I'm happy to dupe this implementation for citext, but I
don't understand it. Should not all comparisons be executed consistently?
Let me try to answer my own question by citing this comment:
/*
 * Since we only care about equality or not-equality, we can avoid all the
 * expense of strcoll() here, and just do bitwise comparison.
 */
So, the upshot is that the = and <> operators are not locale-aware, yes?
They just do byte comparisons. Is that really the way it should be? I
mean, could there not be strings that are equivalent but have different
bytes?
Correct. The problem is complex. Bytewise comparison works correctly only for
normalized strings, and Postgres currently assumes that all UTF-8 strings are
normalized.
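
To make the "equivalent but different bytes" case concrete, here is a minimal sketch. It is not PostgreSQL code: bytes_equal() and the two string constants are names I made up for illustration. It only shows that the same visible character, once precomposed (NFC) and once decomposed (NFD), fails a bytewise equality test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytewise equality, in the spirit of the comment
 * quoted above. Equal lengths and identical bytes, no locale involved. */
static bool bytes_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}

/* "e with acute" as the single code point U+00E9 (NFC), UTF-8 encoded. */
static const char e_nfc[] = "\xC3\xA9";

/* The same character as 'e' plus combining acute U+0301 (NFD), UTF-8 encoded. */
static const char e_nfd[] = "e\xCC\x81";

/* Self-check: a string equals itself, but the two forms differ bytewise. */
static void demo_normalization(void)
{
    assert(bytes_equal(e_nfc, strlen(e_nfc), e_nfc, strlen(e_nfc)));
    assert(!bytes_equal(e_nfc, strlen(e_nfc), e_nfd, strlen(e_nfd)));
}
```

If both inputs were first brought to the same normalization form, the bytewise test would agree with canonical equivalence; the sketch does not attempt that step.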
If you need to implement the <, <=, >= and > operators, you need to use
strcoll(), which takes care of locale collation.
See the Unicode Collation Algorithm: http://www.unicode.org/reports/tr10/
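
A rough sketch of the difference for ordering, again with made-up helper names (sign(), order_bytes(), order_locale()) rather than anything from the Postgres source. It uses only the standard strcmp(), strcoll() and setlocale() calls. The self-check runs in the "C" locale, where strcoll() is expected to behave like strcmp(); what other locales return depends on the collation data installed on the system, so I make no claim about them here.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or +1 so results can be compared. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}

/* Bytewise ordering: compares raw byte values and ignores the locale. */
static int order_bytes(const char *a, const char *b)
{
    return sign(strcmp(a, b));
}

/* Locale-aware ordering: follows the LC_COLLATE rules of the current locale. */
static int order_locale(const char *a, const char *b)
{
    return sign(strcoll(a, b));
}

/* Self-check in the "C" locale, where both orderings should agree. */
static void demo_ordering(void)
{
    setlocale(LC_COLLATE, "C");
    assert(order_bytes("a", "B") > 0);   /* 'a' (0x61) sorts after 'B' (0x42) */
    assert(order_locale("a", "B") == order_bytes("a", "B"));
    assert(order_locale("abc", "abc") == 0);
}
```

Under a language-specific locale the two functions can disagree on the same pair of strings, which is why the ordering operators cannot take the bytewise shortcut that = and <> use.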
Zdenek
--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql