On 02/23/2011 10:22 AM, Heikki Linnakangas wrote:
On 23.02.2011 17:16, Andrew Dunstan wrote:
On 02/23/2011 10:09 AM, Peter Geoghegan wrote:
On 23 February 2011 04:36, Greg Stark<gsst...@mit.edu> wrote:
This is only true for server encodings. In a client library I think
you lose on this and do have to deal with it. I'm not sure what client
encodings we do support that aren't ASCII supersets, though; it's
possible none of them generate quote characters this way.
I'm pretty sure all of the client encodings Tatsuo mentions are ASCII
supersets. The absence of by far the most popular non-ASCII-superset
encoding, UTF-16, as a client encoding indicated that to me. It isn't
byte oriented, and Postgres is.

They are not. It's precisely because they are not that they are not
allowed as server encodings.

To be precise, they are all ASCII supersets in the sense that a valid 7-bit ASCII string is valid and means the same thing in all of the client-only encodings as well. The difference between the supported server encodings and those that are supported only as client_encoding is whether *all* bytes in a multi-byte character have the high bit set. All server encodings have that property, and we rely on it in the backend. In the supported client-only encodings, only the *first* byte of a multi-byte character is guaranteed to have the high bit set; the subsequent bytes are not.
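
To make the high-bit distinction concrete, here is a small sketch using Python's standard codecs rather than anything from the backend. The helper name high_bit_profile is made up for illustration, and the assumption is that Python's "euc_jp" and "shift_jis" codecs are close enough to the PostgreSQL encodings EUC_JP and SJIS for this one character.

```python
def high_bit_profile(char, encoding):
    """Return, for each byte of the encoded character, whether its high bit is set."""
    return [(b & 0x80) != 0 for b in char.encode(encoding)]


# KATAKANA LETTER SO (U+30BD) is a commonly cited case: its second
# Shift-JIS byte is 0x5C, the same value as an ASCII backslash.
so = "\u30bd"

for encoding in ("euc_jp", "utf-8", "shift_jis"):
    print(encoding, so.encode(encoding), high_bit_profile(so, encoding))
```

With EUC-JP and UTF-8 every byte of the character has the high bit set, so a byte-wise scan for a quote or backslash cannot land inside it. With Shift-JIS only the first byte does, and the trailing byte is indistinguishable from an ASCII backslash unless the scanner tracks character boundaries.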

Yes, that's a better explanation.


Even that looser property isn't true for UTF-16, which is why we don't support it even as a client-only encoding.

The fact that UTF-16 uses NUL bytes would make it particularly hard to handle.
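
A quick illustration of both points, again with Python's codecs as a stand-in: a plain ASCII query does not keep its byte representation in UTF-16, and every other byte is NUL. The helper c_string_view is hypothetical; it only mimics what code that treats the buffer as a NUL-terminated C string would see. Little-endian UTF-16 without a BOM is assumed.

```python
def c_string_view(data):
    """Return the prefix that code expecting a NUL-terminated string would see."""
    return data.split(b"\x00", 1)[0]


query = "SELECT 1"
ascii_bytes = query.encode("ascii")
utf8_bytes = query.encode("utf-8")
utf16_bytes = query.encode("utf-16-le")  # assumption: little-endian, no BOM

print(utf8_bytes == ascii_bytes)   # UTF-8 leaves ASCII text unchanged
print(utf16_bytes == ascii_bytes)  # UTF-16 does not
print(c_string_view(utf8_bytes))
print(c_string_view(utf16_bytes))  # cut off after the first character
```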

There might be value in having a UTF-16-aware version of libpq that would translate strings into UTF-8 on the way to the server and back to UTF-16 on the way to the client.
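
Roughly, the translation layer would look like the sketch below. This is not libpq code and libpq has no such API; to_server and from_server are hypothetical names, and the sketch assumes the client uses little-endian UTF-16 without a BOM and that the connection's client_encoding is UTF8.

```python
def to_server(utf16_text):
    """Convert client-side UTF-16 bytes to the UTF-8 bytes sent to the server."""
    return utf16_text.decode("utf-16-le").encode("utf-8")


def from_server(utf8_text):
    """Convert UTF-8 bytes received from the server back to client-side UTF-16."""
    return utf8_text.decode("utf-8").encode("utf-16-le")


client_query = "SELECT '\u30bd'".encode("utf-16-le")
wire_query = to_server(client_query)

print(wire_query)                               # no NUL bytes on the wire
print(from_server(wire_query) == client_query)  # the round trip is lossless
```

The point of doing the conversion at the library boundary is that the server and the wire protocol only ever see a byte-oriented encoding without embedded NUL bytes.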

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
