"Arulappan, Arul Shaji" <a...@fast.au.fujitsu.com> writes: > Given below is a design draft for this functionality:
> Core new functionality (new code):
> 1)Create and register independent NCHAR/NVARCHAR/NTEXT data types.
> 2)Provide support for the new GUC nchar_collation to provide the
> database with information about the default collation that needs to be
> used for the new data types.

A GUC seems like completely the wrong tack to be taking. In the first
place, that would mandate just one value (at a time anyway) of collation,
which is surely not much of an advance over what's already possible. In
the second place, what happens if you change the value? All your indexes
on nchar columns are corrupt, that's what. Actually the data itself would
be corrupt, if you intend that this setting determines the encoding and
not just the collation. If you really are speaking only of collation,
it's not clear to me exactly what this proposal offers that can't be
achieved today (with greater security, functionality and spec compliance)
by using COLLATE clauses on plain text columns.

Actually, you really haven't answered at all what it is you want to do
that COLLATE can't do.

> 4)Because all symbols from non-UTF8 encodings could be represented as
> UTF8 (but the reverse is not true) comparison between N* types and the
> regular string types inside database will be performed in UTF8 form.

I believe that in some Far Eastern character sets there are some
characters that map to the same Unicode glyph, but that some people would
prefer to keep separate. So transcoding to UTF8 isn't necessarily
lossless. This is one of the reasons why we've resisted adopting ICU or
standardizing on UTF8 as the One True Database Encoding. Now this may or
may not matter for comparison to strings that were in some other encoding
to start with --- but as soon as you base your design on the premise that
UTF8 is a universal encoding, you are sliding down a slippery slope to a
design that will meet resistance.

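
To make the earlier point about indexes concrete, here is a minimal
Python sketch of why changing a collation setting underneath an existing
sorted index breaks lookups. It is only an illustration, not PostgreSQL
code: collate_c, collate_ci, build_index and index_lookup are
hypothetical names, and the two "collations" are simple stand-ins
(plain code-point order versus case-folded order).

```python
import bisect

# Hypothetical stand-ins for two collations; not PostgreSQL code.
def collate_c(s):
    """Code-point ordering, roughly like the "C" collation."""
    return s

def collate_ci(s):
    """Case-insensitive ordering, standing in for some other collation."""
    return s.casefold()

def build_index(values, collation):
    """Sort the values under one collation, as a btree-like index would."""
    return sorted(values, key=collation)

def index_lookup(index, value, collation):
    """Binary search that assumes the index is ordered under `collation`."""
    keys = [collation(v) for v in index]
    pos = bisect.bisect_left(keys, collation(value))
    return pos < len(index) and index[pos] == value

rows = ["Zebra", "apple", "Mango", "banana"]

# The index is built while the setting is the "C"-like ordering.
index = build_index(rows, collate_c)

# Lookups with the same ordering find every row.
found_before = [r for r in rows if index_lookup(index, r, collate_c)]

# After the setting changes, the same index is searched with a different
# ordering, and rows that are physically present can no longer be found.
missed_after = [r for r in rows if not index_lookup(index, r, collate_ci)]
```

In this sketch every row is found before the change, while after it
several rows that are still stored in the index are silently missed,
which is the sense in which such indexes would be corrupt until rebuilt.
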
> 6)Client input/output of NATIONAL strings - NATIONAL strings will
> respect the client_encoding setting, and their values will be
> transparently converted to the requested client_encoding before
> sending(receiving) to client (the same mechanics as used for usual
> string types).
> So no mixed encoding in client input/output will be supported/available.

If you have this restriction, then I'm really failing to see what benefit
there is over what can be done today with COLLATE.

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers