On Wed, 2010-09-08 at 10:18 +0300, Marko Kreen wrote:
> On 9/7/10, Peter Eisentraut <pete...@gmx.net> wrote:
> > On Sun, 2010-08-22 at 15:15 -0400, Tom Lane wrote:
> > > > We combine the surrogate pair components to a single code point and
> > > > encode that in UTF-8. We don't encode the components separately;
> > > > that would be wrong.
> > >
> > > Oh, OK. Should the docs make that a bit clearer?
> >
> > Done.
>
> This is confusing:
>
>   (When surrogate pairs are used when the server encoding is
>   <literal>UTF8</>, they are first combined into a single code point
>   that is then encoded in UTF-8.)
>
> So something else happens if encoding is not UTF8?

Then you can't specify surrogate pairs because they are outside of the
ASCII range, per constraint mentioned earlier in the paragraph.

> I think this part can be simply removed, it does not add anything.
>
> Or say that surrogate pairs are only allowed in UTF8 encoding.
> Reason is that you cannot encode 0..7F codepoints with them,
> and only those are allowed to be given numerically. But this is
> already mentioned before.

Well, Tom wanted an additional explanation. I personally agree with
you; this is not the place to explain encoding and Unicode internals,
when really the code only does what it's supposed to.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
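
For readers unfamiliar with the combining step discussed in this thread, here is a minimal sketch in Python. It is only an illustration of the standard UTF-16 surrogate arithmetic, not the PostgreSQL source; the function names combine_surrogates and encode_pair_utf8 are made up for this example.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point.

    Hypothetical helper for illustration; not taken from PostgreSQL.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low surrogate")
    # Each surrogate carries 10 bits; the pair addresses U+10000..U+10FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_pair_utf8(high: int, low: int) -> bytes:
    """Encode the combined code point (not the two halves) as UTF-8."""
    return chr(combine_surrogates(high, low)).encode("utf-8")


if __name__ == "__main__":
    # The pair D83D DE00 stands for the single code point U+1F600.
    print(hex(combine_surrogates(0xD83D, 0xDE00)))
    print(encode_pair_utf8(0xD83D, 0xDE00).hex())
```

A combined pair always yields a code point of U+10000 or above, so it encodes to four bytes in UTF-8 and can never land in the 0..7F range. That is why surrogate pairs only make sense when the server encoding is UTF8.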