Martijn van Oosterhout wrote:
On Mon, Jul 21, 2008 at 03:15:56AM +0200, Radek Strnad wrote:
I was trying to sort out the problem of how to avoid creating a new
catalog for character sets, and I came up with the following ideas.
Correct me if they are wrong.

Since collation has to have a defined character set.

Not really. AIUI at least glibc and ICU define a collation over all
possible characters (i.e. Unicode). When you create a locale you take a
subset and use that. Think about it: if you want to sort strings and
one of them happens to contain a Chinese character, it can't *fail*.
Note strcoll() has no error return for unknown characters.

It does. strcoll() has no reserved error return value, but it may set errno.
See http://www.opengroup.org/onlinepubs/009695399/functions/strcoll.html

The strcoll() function may fail if:

    [EINVAL]
[CX] The s1 or s2 arguments contain characters outside the domain of the collating sequence.
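
Because strcoll() reserves no return value for errors, the only way to notice the EINVAL case is to clear errno before the call and check it afterwards. A minimal sketch of that check follows; strcoll_checked is a hypothetical helper name used only for illustration, not an existing libc or PostgreSQL function, and whether a given libc ever sets EINVAL here is implementation-dependent.

```c
#include <errno.h>
#include <string.h>

/*
 * strcoll_checked: hypothetical helper, for illustration only.
 * Returns 0 and stores the comparison result in *result on success,
 * or -1 if strcoll() reported a problem through errno.
 */
static int strcoll_checked(const char *s1, const char *s2, int *result)
{
    int cmp;

    errno = 0;                  /* no return value is reserved for errors */
    cmp = strcoll(s1, s2);      /* compares using the current LC_COLLATE */
    if (errno != 0)
        return -1;              /* e.g. EINVAL; errno stays set for the caller */

    *result = cmp;
    return 0;
}
```

A caller would test the helper's return value first and only then look at the sign of *result.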


I suggest using the
existing encoding infrastructure and the list of encodings in
chklocale.c. Currently databases are created with a specified encoding,
not with a specified character set. Instead of pointing a record in the
collation catalog to another record in a character set catalog, I think
we could store only the name (string) of the encoding.

That's reasonable. From an abstract point of view collations and
encodings are orthogonal; it's only when you're using POSIX locales
that there are limitations on how you combine them. I think you can
assume a collation can handle any characters that can be produced by
the encoding.

I don't think that is correct. You cannot use one collation over all of Unicode. See http://www.unicode.org/reports/tr10/#Common_Misperceptions. The same characters can be ordered differently in different languages.
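
To make that concrete, here is a toy sketch (not real locale data, and not how glibc or ICU implement collation) in which the same two characters compare differently under two language-specific weight tables. It follows the TR10 example where o-umlaut sorts before z in German but after z in Swedish. All names (weight_german, weight_swedish, toy_compare) and the weight values are made up for illustration, and only single-level weights are modelled.

```c
#include <wchar.h>

/* A per-language function that maps a character to a toy sort weight. */
typedef int (*weight_fn)(wchar_t);

/* Toy German rule: o-umlaut (U+00F6) sorts together with 'o'. */
static int weight_german(wchar_t c)
{
    if (c == 0x00F6)
        return 2 * 'o' + 1;     /* just after 'o', well before 'z' */
    return 2 * (int) c;
}

/* Toy Swedish rule: o-umlaut is a separate letter after 'z'. */
static int weight_swedish(wchar_t c)
{
    if (c == 0x00F6)
        return 2 * 'z' + 3;     /* after 'z' */
    return 2 * (int) c;
}

/* Compare two strings weight by weight; returns -1, 0 or 1. */
static int toy_compare(const wchar_t *a, const wchar_t *b, weight_fn w)
{
    while (*a != L'\0' && *b != L'\0') {
        int wa = w(*a);
        int wb = w(*b);

        if (wa != wb)
            return (wa < wb) ? -1 : 1;
        a++;
        b++;
    }
    if (*a == L'\0' && *b == L'\0')
        return 0;               /* same length, all weights equal */
    return (*a == L'\0') ? -1 : 1;  /* the shorter string sorts first */
}
```

With these tables, comparing the one-character strings U+00F6 and "z" gives a negative result with weight_german and a positive one with weight_swedish, so no single ordering can serve both languages.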

                Zdenek



--
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql

