On Sat, Jan 23, 2016 at 11:27 PM, Chapman Flack <c...@anastigmatix.net> wrote:
> I see in the documentation (and confirm in practice) that a
> Unicode character string literal U&'...' is only allowed to have
> <Unicode escape value>s representing Unicode characters if the
> server encoding is, exactly and only, UTF8.
>
> Otherwise, it can still have <Unicode escape value>s, but they can only
> be in the range \+000001 to \+00007f and can only represent ASCII characters
> ... and this isn't just for an ASCII server encoding but for _any server
> encoding other than UTF8_.
>
> I'm a newcomer here, so maybe there was an existing long conversation
> where that was determined to be necessary for some deep reason, and I
> just need to be pointed to it.
>
> What I would have expected would be to allow <Unicode escape value>s
> for any Unicode codepoint that's representable in the server encoding,
> whatever encoding that is. Indeed, that's how I read the SQL standard
> (or my scrounged 2006 draft of it, anyway). The standard even lets
> you precede U& with _charsetname and have the escapes be allowed to
> be any character representable in the specified charset. *That*, I assume,
> would be tough to implement in PostgreSQL, since strings don't walk
> around with their own personal charsets attached. But what's the reason
> for not being able to mention characters available in the server encoding?
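To restate the documented rule in code form, roughly -- this is only an
illustrative sketch with made-up names, not the backend's actual scanner
logic:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: the two-way rule the documentation describes. */
typedef enum { SERVER_ENC_UTF8, SERVER_ENC_OTHER } server_enc;

static bool
unicode_escape_allowed(uint32_t cp, server_enc enc)
{
    if (enc == SERVER_ENC_UTF8)
        /* any Unicode scalar value: a pure range check, no lookup */
        return cp >= 0x000001 && cp <= 0x10FFFF &&
               !(cp >= 0xD800 && cp <= 0xDFFF);

    /* any other server encoding: ASCII-range escapes only */
    return cp >= 0x000001 && cp <= 0x00007F;
}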
I don't know anything for sure here, but I wonder if it would make
validating string literals in non-UTF8 encodings significantly more
costly. When the encoding is UTF-8, the test of whether the escape
sequence forms a legal code point doesn't require any table lookups.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
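P.S. To spell out the contrast: under the proposed rule ("any code point
representable in the server encoding"), each escape would need something
like the check below. unicode_to_server_char() is a hypothetical stand-in
for a real conversion routine; the point is only that it implies a
per-character conversion-table lookup rather than the range test above.

#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical conversion routine: maps a Unicode code point to the
 * server encoding, writing the bytes into buf and returning the length,
 * or 0 if the code point has no representation in that encoding.
 */
extern int unicode_to_server_char(uint32_t cp, unsigned char *buf);

static bool
representable_in_server_encoding(uint32_t cp)
{
    unsigned char buf[8];

    /* Unlike the UTF-8 case, this requires consulting a conversion
     * table (or taking a failed-conversion path) for every escape. */
    return unicode_to_server_char(cp, buf) > 0;
}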