Re: [HACKERS] [WIP] collation support revisited (phase 1)

Zdenek Kotala Tue, 22 Jul 2008 07:37:34 -0700

Martijn van Oosterhout wrote:

On Sat, Jul 12, 2008 at 10:02:24AM +0200, Zdenek Kotala wrote:
Background:
We specify the encoding in the initdb phase. ANSI specifies repertoire, charset, encoding and collation. If I understand it correctly, a charset is a subset of a repertoire and specifies the list of allowed characters for a language->collation. An encoding is a mapping of a character set to a binary format. For example, for the Czech alphabet (charset) we have 6 different encodings for 8-bit ASCII, but on the other side multiple charsets are specified for UTF8.
Oh, so you're thinking of a charset as a sort of check constraint. If
your locale is Turkish and you have a column marked charset ASCII, then
storing lower('HI') results in an error.


Yeah, if you use the strcoll function, it fails when an illegal character is found.
See
http://www.opengroup.org/onlinepubs/009695399/functions/strcoll.html
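
As a rough illustration of the point above: POSIX reserves no return value of strcoll() for errors, so a caller that wants to notice an illegal character has to clear errno before the call and inspect it afterwards (the page linked above lists EINVAL as a possible error). The helper name checked_strcoll is made up for this sketch and does not come from PostgreSQL or from the patch; whether errno is actually set for a given input depends on the platform's libc and the active locale.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): compare two NUL-terminated
 * strings with strcoll() under the current LC_COLLATE locale.
 *
 * strcoll() has no error return value, so errno is cleared before the
 * call and copied into *err afterwards. A nonzero *err (for example
 * EINVAL) means the result should not be trusted.
 */
static int checked_strcoll(const char *a, const char *b, int *err)
{
    int result;

    errno = 0;
    result = strcoll(a, b);
    *err = errno;

    return result;
}
```

A caller would check *err first and only use the returned ordering when it is zero.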

A collation must be defined over all possible characters; it can't
depend on the character set. That doesn't mean sorting in en_US must do
something meaningful with Japanese characters, but it does mean it can't
throw an error (the usual procedure is to sort on Unicode code point).


Collation cannot be defined over arbitrary characters. There is no relation between Latin and Chinese characters. Collation makes sense only when you are able to define the <, = and > operators.

If you need to compare Japanese and Latin characters, then ANSI specifies a default collation for each repertoire. I think it is usually a bitwise comparison.
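
To make "bitwise comparison" concrete, here is a minimal sketch of such a fallback, assuming the strings are stored as byte sequences with known lengths. The function name bytewise_compare is hypothetical and not taken from any standard or from PostgreSQL. For UTF-8 data, this byte order happens to coincide with Unicode code point order.

```c
#include <string.h>

/*
 * Hypothetical fallback comparison (illustration only): order two byte
 * strings by their raw bytes, independent of any locale.
 *
 * memcmp() compares bytes as unsigned char, so bytes >= 0x80 sort after
 * ASCII. If one string is a prefix of the other, the shorter one sorts
 * first.
 */
static int bytewise_compare(const char *a, size_t alen,
                            const char *b, size_t blen)
{
    size_t n = (alen < blen) ? alen : blen;
    int result = memcmp(a, b, n);

    if (result != 0)
        return result;

    /* Common prefix is equal: decide by length. */
    return (alen > blen) - (alen < blen);
}
```

Such a comparison never fails on unknown characters, but the resulting order carries no linguistic meaning.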



                Zdenek

--
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql

