[HACKERS] sql92 character sets

Dennis Bjorklund Tue, 13 Apr 2004 01:19:00 -0700

For my own amusement I'm reading the SQL-92 spec about character sets. 
There are some concepts that are a bit difficult, and maybe someone can 
explain them to me:


   character set
   character repertoire

For example, section 4.2.1 says:

  A character set is described by a character set descriptor. A 
  character set descriptor includes:
                                                                                
    -  the name of the character set or character repertoire,
                                                                                
    -  if the character set is a character repertoire, then the name 
       of the form-of-use,
                                                                                
    -  an indication of what characters are in the character set, and
                                                                                
    -  the name of the default collation of the character set.


What I have understood so far is that form-of-use is the encoding. So if 
the character set is UNICODE, then the form-of-use could be UTF-8, UTF-16 
and so on.
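To make the form-of-use idea concrete, here is a minimal Python sketch. It assumes Python's built-in codecs are a fair stand-in for SQL forms-of-use: the characters stay the same, only the bytes differ.

```python
# The same Unicode characters stored under two different forms-of-use.
text = "Björklund"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")  # big-endian, no byte-order mark

print(len(text))   # 9 characters
print(len(utf8))   # 10 bytes: 'ö' needs two bytes in UTF-8
print(len(utf16))  # 18 bytes: each of these characters needs two bytes

# Decoding either form gives back the identical character string.
print(utf8.decode("utf-8") == utf16.decode("utf-16-be"))  # True
```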

The character repertoire, however, I don't have an intuition about at all.



Then we have this little section:

  The <implementation-defined character repertoire name> SQL_TEXT
  specifies the name of a character repertoire and implied form-of-
  use that can represent every character that is in <SQL language
  character> and all other characters that are in character sets
  supported by the implementation.

Had Unicode been a superset of all character sets, one could simply 
have used Unicode for SQL_TEXT. But exactly how do we create a character 
repertoire that can store any character from any character set? Storing 
the character set for each character is not such a cool thing to do, 
even if it would work :-)
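To illustrate the "store the character set for each character" approach, here is a rough Python sketch. It is only an illustration of that idea, not how the spec defines SQL_TEXT. The `TaggedChar` type and `tag` helper are hypothetical, and Python's `latin-1` and `euc_jp` codecs stand in for two character sets.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaggedChar:
    charset: str  # hypothetical tag: name of the character set this char came from
    code: bytes   # the character's bytes in that character set


def tag(text: str, charset: str) -> List[TaggedChar]:
    """Hypothetical helper: encode each character separately and tag it."""
    return [TaggedChar(charset, ch.encode(charset)) for ch in text]


# Mixing characters from two different character sets in one "string".
s = tag("Björk", "latin-1") + tag("日本", "euc_jp")

print(len(s))                        # 7 characters can still be counted
print(sum(len(c.code) for c in s))   # 9 payload bytes, plus a tag per character
print("".join(c.code.decode(c.charset) for c in s))  # Björk日本
```

It works, but every character drags its character-set tag along, which is the cost the post is joking about.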

SQL_ASCII in PostgreSQL is similar: it's basically a sequence of bytes. But 
the spec seems to say that one should be able to count the characters as 
well (not just the bytes), so SQL_ASCII is not the same as SQL_TEXT.
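The difference between counting bytes and counting characters can be sketched in Python as follows. The `char_length` function is hypothetical. Passing `None` models bytes-only semantics (each byte counted as one character, roughly what treating data as "a sequence of bytes" implies); passing a codec name models a known form-of-use.

```python
from typing import Optional


def char_length(raw: bytes, form_of_use: Optional[str]) -> int:
    """Hypothetical: count characters if the form-of-use is known, else bytes."""
    if form_of_use is None:
        # No form-of-use known: the best we can do is count bytes.
        return len(raw)
    # Known form-of-use: decode first, then count real characters.
    return len(raw.decode(form_of_use))


raw = "日本語".encode("utf-8")
print(char_length(raw, None))     # 9 (bytes)
print(char_length(raw, "utf-8"))  # 3 (characters)
```

Counting characters is only possible once the form-of-use is known, which is the gap between a bytes-only encoding and something like SQL_TEXT.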

ps. This is not me volunteering to implement all this :-)

-- 
/Dennis Björklund

