Re: [HACKERS] [WIP] collation support revisited (phase 1)

Zdenek Kotala Tue, 22 Jul 2008 13:16:36 -0700

Martijn van Oosterhout wrote:

On Mon, Jul 21, 2008 at 03:15:56AM +0200, Radek Strnad wrote:

I was trying to sort out the problem of not creating a new catalog for
character sets and I came up with the following ideas. Correct me if my
ideas are wrong.


Since collation has to have a defined character set.


Not really. AIUI at least glibc and ICU define a collation over all
possible characters (i.e. Unicode). When you create a locale you take a
subset and use that. Think about it: if you want to sort strings and
one of them happens to contain a Chinese character, it can't *fail*.
Note strcoll() has no error return for unknown characters.


It has.
See http://www.opengroup.org/onlinepubs/009695399/functions/strcoll.html

The strcoll() function may fail if:

    [EINVAL]

[CX] The s1 or s2 arguments contain characters outside the domain of the collating sequence.
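
A minimal sketch of how a caller could observe that error, assuming only
what the POSIX page says: strcoll() reserves no return value for errors, so
errno has to be cleared before the call and inspected afterwards. The helper
name checked_strcoll() is hypothetical and is not part of PostgreSQL or of
any library. Whether a given libc ever actually sets EINVAL here is
implementation-specific.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): compare two strings with
 * strcoll() under the current LC_COLLATE setting and report whether the
 * call flagged an error through errno.
 *
 * Returns 0 on success and -1 if strcoll() set errno (for example EINVAL).
 * *result receives the raw strcoll() value in both cases.
 */
static int
checked_strcoll(const char *s1, const char *s2, int *result)
{
    assert(s1 != NULL && s2 != NULL && result != NULL);

    errno = 0;                      /* no return value is reserved for errors */
    *result = strcoll(s1, s2);

    return (errno != 0) ? -1 : 0;   /* errno still holds the cause on failure */
}
```

A caller would check the return value first and only then trust *result;
on failure errno can be examined to see whether it is EINVAL.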

I'm suggesting that we use the
already written infrastructure of encodings and the list of encodings in
chklocale.c. Currently databases are not created with a specified character
set but with a specified encoding. I think that instead of pointing a record
in the collation catalog to another record in a character set catalog, we
might use only the name (string) of the encoding.


That's reasonable. From an abstract point of view collations and
encodings are orthogonal; it's only when you're using POSIX locales
that there are limitations on how you combine them. I think you can
assume a collation can handle any characters that can be produced by
the encoding.

I think you are not correct. You cannot use one collation over all of
Unicode. See http://www.unicode.org/reports/tr10/#Common_Misperceptions.
The same characters can be ordered differently in different languages.
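
To illustrate the last point, here is a rough sketch that compares the same
two strings under a caller-chosen LC_COLLATE locale. The helper name
compare_in_locale() is hypothetical, and the locale names mentioned in the
note below are assumptions: they may not be installed on a given system,
and the resulting order depends on the libc's locale data.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): compare s1 and s2 with strcoll()
 * while LC_COLLATE is temporarily switched to locale_name, then restore the
 * previous setting.
 *
 * Returns 0 and stores the comparison in *result on success, or -1 if the
 * locale could not be selected or memory ran out.
 */
static int
compare_in_locale(const char *locale_name, const char *s1, const char *s2,
                  int *result)
{
    const char *current;
    char       *saved;

    assert(locale_name != NULL && s1 != NULL && s2 != NULL && result != NULL);

    /* Copy the current name: later setlocale() calls may overwrite it. */
    current = setlocale(LC_COLLATE, NULL);
    if (current == NULL)
        return -1;
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_COLLATE, locale_name) == NULL)
    {
        free(saved);
        return -1;              /* requested locale is not available */
    }

    *result = strcoll(s1, s2);

    setlocale(LC_COLLATE, saved);   /* put the original collation back */
    free(saved);
    return 0;
}
```

For example, calling it for "chata" and "hrad" with a Czech locale such as
"cs_CZ.UTF-8" and again with "en_US.UTF-8" (both names assumed) would be
expected to give opposite signs where the Czech locale data treats "ch" as
a letter sorting after "h".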


                Zdenek



--
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql


