On Sat, Jul 23, 2011 at 11:14 PM, Robert Haas <robertmh...@gmail.com> wrote:
> I doubt you're going to want to reinvent TOAST, ...

I was thinking about making it efficient to access or update foo.a.b.c.d[1000] in a huge JSON tree. Simply TOASTing the varlena text means we have to unpack the entire datum to access or update individual members. An alternative would be to split the JSON into chunks (possibly by using the pg_toast_<id> table) and have some sort of index that can be used to efficiently look up values by path. This would not be trivial, and I don't plan to implement it any time soon.

On Sun, Jul 24, 2011 at 2:19 PM, Florian Pflug <f...@phlo.org> wrote:
> On Jul24, 2011, at 05:14 , Robert Haas wrote:
>> On Fri, Jul 22, 2011 at 10:36 PM, Joey Adams <joeyadams3.14...@gmail.com> wrote:
>>> ... Fortunately, JSON's definition of a
>>> "number" is its decimal syntax, so the algorithm is child's play:
>>>
>>>  * Figure out the digits and exponent.
>>>  * If the exponent is greater than 20 or less than -6 (arbitrary), use
>>>    exponential notation.
>>>
>>
>
> I agree. As for your proposed algorithm, I suggest to instead use
> exponential notation if it produces a shorter textual representation.
> In other words, for values between -1 and 1, we'd switch to exponential
> notation if there's more than 1 leading zero (to the right of the decimal
> point, of course), and for values outside that range if there're more than
> 2 trailing zeros and no decimal point. All after redundant zeros and
> decimal points are removed. So we'd store
>
>   0      as 0
>   1      as 1
>   0.1    as 0.1
>   0.01   as 0.01
>   0.001  as 1e-3
>   10     as 10
>   100    as 100
>   1000   as 1e3
>   1000.1 as 1000.1
>   1001   as 1001
>

Interesting idea. The reason I suggested using exponential notation only for extreme exponents (less than -6 or greater than +20) is partly for presentation value. Users might be annoyed to see 1000000 turned into 1e6. Moreover, applications that work solely with integers and don't expect floating-point syntax may choke on the converted numbers.
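
To make the difference between the two rules concrete, here is a rough Python sketch of both, not PostgreSQL code. The function names (parse, plain, render_threshold, render_shortest) are hypothetical. It assumes "exponent" in the threshold rule means the scientific exponent (one digit before the decimal point), and it assumes the shortest-form rule compares the plain form against a single exponential form with an integer mantissa, a choice made only because it reproduces the quoted table. Ties go to the plain form.

```python
from decimal import Decimal


def parse(text):
    """Return (sign, digits, exp): value = (-1)**sign * int(digits) * 10**exp.

    Trailing zeros are folded into exp; digits is "" when the value is zero.
    """
    sign, digit_tuple, exp = Decimal(text).as_tuple()  # exact, no rounding
    raw = "".join(map(str, digit_tuple))
    digits = raw.rstrip("0")
    return sign, digits, exp + len(raw) - len(digits)


def plain(digits, exp):
    """Positional notation without an exponent."""
    if exp >= 0:
        return digits + "0" * exp
    if -exp < len(digits):
        return digits[:exp] + "." + digits[exp:]
    return "0." + "0" * (-exp - len(digits)) + digits


def render_threshold(text):
    """Exponential only when the scientific exponent is > 20 or < -6."""
    sign, digits, exp = parse(text)
    prefix = "-" if sign else ""
    if not digits:
        return prefix + "0"
    sci = exp + len(digits) - 1  # exponent with one digit before the point
    if sci > 20 or sci < -6:
        mantissa = digits[0] + ("." + digits[1:] if len(digits) > 1 else "")
        return f"{prefix}{mantissa}e{sci}"
    return prefix + plain(digits, exp)


def render_shortest(text):
    """Exponential only when it is strictly shorter than the plain form."""
    sign, digits, exp = parse(text)
    prefix = "-" if sign else ""
    if not digits:
        return prefix + "0"
    # Assumption: integer mantissa, e.g. 1e3 or 12e-4, never 1.2e-3.
    candidates = [plain(digits, exp), f"{digits}e{exp}"]
    return prefix + min(candidates, key=len)  # min() keeps the first on ties


for sample in ["1000000", "0.001", "1000.1"]:
    print(sample, render_threshold(sample), render_shortest(sample))
```

With this sketch, 1000000 stays 1000000 under the threshold rule and becomes 1e6 under the shortest-form rule, which is the case that worries me.
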
32-bit integers can be losslessly encoded as IEEE double-precision floats (JavaScript's internal representation), and JavaScript's algorithm for converting a number to a string ([1], section 9.8.1) happens to preserve the integer syntax (I think).

Should we follow the JavaScript standard for rendering numbers (which my suggestion approximates)? Or should we use the shortest encoding as Florian suggests?

- Joey

[1]: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262%205th%20edition%20December%202009.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
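
On the 32-bit integer point above, a quick sanity check is easy to write. This is only a sketch: it assumes Python's float is an IEEE 754 double, which holds on mainstream platforms, and survives_double is a made-up helper name. It spot-checks the edges of the 32-bit range and shows where exactness actually ends.

```python
def survives_double(n):
    """True if integer n is unchanged by a round trip through a Python float."""
    return int(float(n)) == n


INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

# Spot-check the edges of the 32-bit range rather than all 2**32 values.
for value in (INT32_MIN, -1, 0, 1, INT32_MAX):
    assert survives_double(value)

# Integers stay exact up to 2**53; the next one is rounded.
print(survives_double(2**53), survives_double(2**53 + 1))  # True False
```

A 32-bit integer has at most 10 decimal digits, and as I read section 9.8.1 the plain integer form is used for values of up to 21 digits, so these values should keep their integer syntax under the JavaScript rule.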