On 06/11/2013 03:42 PM, Andrew Dunstan wrote:
>
> On 06/11/2013 09:16 AM, Hannu Krosing wrote:
>
>>>
>>> It's a pity that we don't have a non-error producing conversion
>>> function (or if we do that I haven't found it). Then we might adopt
>>> a rule for processing unicode escapes that said "convert unicode
>>> escapes to the database encoding
>> only when extracting JSON keys or values to text makes it sense to
>> unescape to database encoding.
>
> That's exactly the scenario we are talking about. When emitting JSON
> the functions have always emitted unicode escapes as they are in the
> text, and will continue to do so.
>
>>
>> strings inside JSON itself are by definition utf8
>
>
> We have deliberately extended that to allow JSON strings to be in any
> database server encoding.

Ugh!
Does that imply that we do not just "allow" it, but rather "require" it?

Why are we arguing about "unicode surrogate pairs" as a "JSON thing" then?

Should it not be a "client to server encoding conversion thing" instead?

> That was argued back in the 9.2 timeframe and I am not interested in
> re-litigating it.
>
> The only issue at hand is how to handle unicode escapes (which in
> their string form are pure ASCII) when emitting text strings.

Unicode escapes in non-unicode strings seem to be something that is
ill-defined by nature ;)

That is, you can't come up with a good general answer for this.

>>> if possible, and if not then emit them unchanged." which might be a
>>> reasonable compromise.
>> I'd opt for "... and if not then emit them quoted". The default should
>> be not losing any data.
>>
>
> I don't know what this means at all. Quoted how? Let's say I have a
> Latin1 database and have the following JSON string: "\u20AC2.00". In a
> UTF8 database the text representation of this is €2.00 - what are you
> saying it should be in the Latin1 database?

utf8-quote the '€' - "\u20AC2.00"

That is, convert unicode --> Latin1 for whatever has a Latin1
equivalent, and utf8-quote anything that does not.

If we allow unicode escapes in non-unicode strings anyway, then this
seems the most logical thing to do.

>
> cheers
>
> andrew
>

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
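
For illustration only, the "convert what has a correspondence, keep the
escape for anything that does not" rule could be sketched roughly as
below. This is Python rather than backend C, and it is not what the json
code does today. The function name unescape_where_possible is made up
for this sketch, and Python's codec names ('latin-1', 'utf-8') stand in
for the server encodings. Other escapes such as \n or \\ are passed
through untouched, and corner cases like \u0000 are ignored.

```python
import re

# Tokenise the body of a JSON string so that "\\u20AC" (an escaped
# backslash followed by plain text) is never mistaken for a unicode escape.
_TOKEN = re.compile(
    r'\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})'  # surrogate pair
    r'|\\u([0-9a-fA-F]{4})'  # single \uXXXX escape
    r'|\\.',                 # any other escape: left alone
    re.DOTALL,
)


def unescape_where_possible(body: str, encoding: str) -> str:
    """Replace \\uXXXX escapes by the character they denote when that
    character exists in `encoding`; otherwise keep the escape as written."""

    def repl(m: re.Match) -> str:
        hi, lo, single = m.group(1), m.group(2), m.group(3)
        if hi is not None:
            # Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((int(hi, 16) - 0xD800) << 10) + (int(lo, 16) - 0xDC00)
        elif single is not None:
            cp = int(single, 16)
        else:
            return m.group(0)  # not a unicode escape
        ch = chr(cp)
        try:
            ch.encode(encoding)  # non-error-producing "can we convert?" check
        except UnicodeEncodeError:
            return m.group(0)    # no equivalent (or lone surrogate): keep escape
        return ch

    return _TOKEN.sub(repl, body)


print(unescape_where_possible(r'\u20AC2.00', 'utf-8'))    # €2.00
print(unescape_where_possible(r'\u20AC2.00', 'latin-1'))  # \u20AC2.00
```

With Andrew's example the UTF8 case yields €2.00 and the Latin1 case
leaves \u20AC2.00 as it was, so no data is lost in either encoding.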