For my own amusement I'm reading the SQL-92 spec on character sets. A few of the concepts are a bit difficult, and maybe someone can explain them to me:

  character set
  character repertoire

For example, in 4.2.1 it says:

  A character set is described by a character set descriptor. A
  character set descriptor includes:

  - the name of the character set or character repertoire,
  - if the character set is a character repertoire, then the name
    of the form-of-use,
  - an indication of what characters are in the character set, and
  - the name of the default collation of the character set.

What I have understood so far is that the form-of-use is the encoding. So if the character set is UNICODE, then the form-of-use could be UTF-8, UTF-16 and so on. For the character repertoire, however, I don't have any intuition at all.

Then we have this little section:

  The <implementation-defined character repertoire name> SQL_TEXT
  specifies the name of a character repertoire and implied
  form-of-use that can represent every character that is in
  <SQL language character> and all other characters that are in
  character sets supported by the implementation.

Had Unicode been a superset of all character sets, one could simply have used Unicode for SQL_TEXT. So exactly how do we create a character repertoire that can store any character from any character set? Storing the character set alongside each character is not such a cool thing to do, even if it would work :-)

SQL_ASCII in pg is similar; it's basically just a sequence of bytes. But the spec seems to say that one should be able to count the characters as well (not just the bytes), so SQL_ASCII is not the same as SQL_TEXT.

PS. This is not me volunteering to implement all this :-)

--
/Dennis Björklund
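
To make the repertoire versus form-of-use reading above concrete, here is a minimal Python sketch. It assumes that a Python `str` can stand in for "characters from the Unicode repertoire" and that each codec stands in for one form-of-use. The sample string and the variable names are purely illustrative and do not come from the spec or from PostgreSQL.

```python
# Illustrative only: a Python str is treated as a sequence of characters
# from one repertoire (Unicode); each codec plays the role of a form-of-use.
text = "Bj\u00f6rklund"  # "Björklund", written with an escape to avoid normalization surprises

# Same characters, three different encodings (forms-of-use).
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
latin1 = text.encode("latin-1")

# The character count belongs to the text itself, not to its encoding.
char_count = len(text)

# The byte count depends on the form-of-use that was chosen.
byte_counts = {
    "utf-8": len(utf8),
    "utf-16-le": len(utf16),
    "latin-1": len(latin1),
}

print(char_count)   # 9
print(byte_counts)  # {'utf-8': 10, 'utf-16-le': 18, 'latin-1': 9}
```

The character count stays at 9 whichever encoding is picked, while the byte count varies. A store that only keeps bytes without knowing the encoding (the SQL_ASCII situation described above) can report the byte length but cannot reliably report the character length.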