Martijn van Oosterhout wrote:
On Sat, Jul 12, 2008 at 10:02:24AM +0200, Zdenek Kotala wrote:
Background:
We specify the encoding in the initdb phase. ANSI specifies repertoire,
charset, encoding and collation. If I understand it correctly, a charset is
a subset of a repertoire and specifies the list of characters allowed for a
language->collation. An encoding is a mapping of a character set to a binary
format. For example, for the Czech alphabet (charset) we have 6 different
8-bit encodings, but on the other hand UTF8 covers multiple charsets.
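As a rough illustration of the charset/encoding distinction described above,
the sketch below encodes the same few Czech letters with several encodings.
It assumes Python and its bundled codec names (iso8859_2, cp1250, cp852,
utf_8), none of which come from this thread; the helper name encode_all is
made up for the example.

```python
# One character set (a few Czech letters), several encodings.
# The codec names are Python's and are an assumption of this sketch.
CZECH_SAMPLE = "čšž"
ENCODINGS = ["iso8859_2", "cp1250", "cp852", "utf_8"]


def encode_all(text, encodings=ENCODINGS):
    """Return {encoding name: encoded bytes} for the same text."""
    return {name: text.encode(name) for name in encodings}


if __name__ == "__main__":
    for name, raw in encode_all(CZECH_SAMPLE).items():
        # Same characters, different binary representation.
        print(f"{name:10} {raw.hex(' ')}")
```

Decoding each byte string with the encoding that produced it gives back the
same characters, which is the sense in which the charset stays fixed while
the encoding varies.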
Oh, so you're thinking of a charset as a sort of check constraint. If
your locale is Turkish and you have a column marked charset ASCII, then
storing lower('HI') results in an error.
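A minimal sketch of that "charset as check constraint" idea, in Python rather
than in the backend. Both helpers (turkish_lower, fits_charset) are
hypothetical names for illustration, and the Turkish lowercasing rule is
written out by hand because Python's str.lower() does not consult the locale.

```python
def turkish_lower(text):
    """Lowercase with the Turkish rule: I -> dotless ı (U+0131), İ -> i."""
    # Apply the two Turkish-specific mappings first, then the generic rule.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


def fits_charset(text, charset="ascii"):
    """True if every character of text can be represented in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    lowered = turkish_lower("HI")
    # The dotless ı is outside ASCII, so the "constraint" rejects the value.
    print(repr(lowered), fits_charset(lowered))
```

Under these assumptions the input 'HI' passes the ASCII check while its
Turkish lowercase form does not, which is the error case described above.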
Yeah, if you use the strcoll function, it fails when an illegal character is
found. See
http://www.opengroup.org/onlinepubs/009695399/functions/strcoll.html
A collation must be defined over all possible characters; it can't
depend on the character set. That doesn't mean sorting in en_US must do
something meaningful with Japanese characters, but it does mean it can't
throw an error (the usual procedure is to sort on the Unicode code point).
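One way to picture a collation that is defined everywhere is a sort key with
a fallback. The sketch below is a toy, not how any real locale is
implemented: the RANK table and collation_key are invented for the example,
and only unaccented Latin letters get a tailored order.

```python
import string

# Tailored ranks for the letters this toy "locale" knows: a < A < b < B ...
RANK = {}
for i, letter in enumerate(string.ascii_lowercase):
    RANK[letter] = 2 * i
    RANK[letter.upper()] = 2 * i + 1


def collation_key(text):
    """Sort key that is defined for every string.

    Known letters sort first, in the tailored order; any other character
    sorts after them, by its Unicode code point, instead of raising.
    """
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in text]


if __name__ == "__main__":
    words = ["b", "日本", "A", "あ", "a"]
    print(sorted(words, key=collation_key))
```

The Japanese strings get no meaningful linguistic order here, but sorting a
mixed list never fails.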
A collation cannot be defined over arbitrary characters. There is no relation
between Latin and Chinese characters. A collation makes sense only when you are
able to specify the <, = and > operators.
If you need to compare Japanese and Latin characters, then ANSI specifies a
default collation for each repertoire. I think it is usually a bitwise
comparison.
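For what such a bitwise default could look like, here is a minimal sketch
that compares the encoded bytes directly. It assumes UTF-8, where byte order
happens to match code point order; with another encoding the resulting order
may differ. The helper name bytewise_key is made up, and the sketch does not
model what the standard actually prescribes for each repertoire.

```python
def bytewise_key(text, encoding="utf_8"):
    """Sort key that compares strings by their encoded bytes."""
    return text.encode(encoding)


if __name__ == "__main__":
    words = ["z", "日", "a", "é"]
    # <, = and > are all defined, but the order carries no linguistic meaning.
    print(sorted(words, key=bytewise_key))
```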
Zdenek
--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql