On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii <is...@postgresql.org> wrote:
> What about limiting NCHAR to databases whose encoding is the same as,
> or "compatible" with, the NCHAR encoding (i.e. one for which an
> encoding conversion is defined)? That way, NCHAR text could be
> automatically converted from the NCHAR encoding to the database
> encoding on the server side, and we could treat NCHAR exactly like
> CHAR afterward. I suppose the encoding used for NCHAR would be fixed
> at initdb time or at database creation (if we allow the latter, we
> need a new column to record which encoding is used for NCHAR).
>
> For example, "CREATE TABLE t1(t NCHAR(10))" would succeed if the
> NCHAR encoding is UTF-8 and the database encoding is UTF-8. It would
> also succeed if the NCHAR encoding is SHIFT-JIS and the database
> encoding is UTF-8, because there is a conversion between UTF-8 and
> SHIFT-JIS. But it would fail if the NCHAR encoding is SHIFT-JIS and
> the database encoding is ISO-8859-1, because there is no conversion
> between them.
I think the point here is that, at least as I understand it, encoding
conversion and sanitization currently happen at a very early stage, when
we first receive the input from the client. If the user sends a string of
bytes, as part of a query or a bind placeholder, that is not valid in the
database encoding, it will error out before any type-specific code has an
opportunity to get control. Look at textin(), for example: there is no
encoding check there, which means the check has already been done by that
point. To make this work, someone is going to have to figure out what to
do about *that*. Until we have a sketch of what that design looks like, I
don't see how we can credibly entertain more specific proposals.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers