Re: [HACKERS] Wanted: jsonb on-disk representation documentation

Andres Freund Wed, 07 May 2014 11:15:06 -0700

On 2014-05-07 10:55:28 -0700, Peter Geoghegan wrote:
> On Wed, May 7, 2014 at 4:40 AM, Andres Freund <[email protected]> wrote:
> > * The jentry representation should be changed so it's possible to get the 
> > type
> >   of a entry without checking individual types. That'll make code like
> >   findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
> >   much easier to read. Preferrably so it an have the same values (after
> >   shifting/masking) ask the JBE variants. And it needs space for futher
> >   types (or representations thereof).
>
> I don't know what you mean. Every place that macros like
> JBE_ISNUMERIC() are used subsequently involves access to (say) numeric
> union fields. At best, you could have those functions check that the
> types match, and then handle each case in a switch that only looked at
> (say) the "candidate", but that doesn't really save much at all. It
> wouldn't take much to have the macros give enum constant values back
> as you suggest, though.


Compare
                for (i = 0; i < count; i++)
                {
                        JEntry     *e = array + i;

                        if (JBE_ISNULL(*e) && key->type == jbvNull)
                        {
                                result->type = jbvNull;
                                result->estSize = sizeof(JEntry);
                        }
                        else if (JBE_ISSTRING(*e) && key->type == jbvString)
                        {
                                result->type = jbvString;
                                result->val.string.val = data + JBE_OFF(*e);
                                result->val.string.len = JBE_LEN(*e);
                                result->estSize = sizeof(JEntry) + 
result->val.string.len;
                        }
                        else if (JBE_ISNUMERIC(*e) && key->type == jbvNumeric)
                        {
                                result->type = jbvNumeric;
                                result->val.numeric = (Numeric) (data + 
INTALIGN(JBE_OFF(*e)));

                                result->estSize = 2 * sizeof(JEntry) +
                                        VARSIZE_ANY(result->val.numeric);
                        }
                        else if (JBE_ISBOOL(*e) && key->type == jbvBool)
                        {
                                result->type = jbvBool;
                                result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
                                result->estSize = sizeof(JEntry);
                        }
                        else
                                continue;

                        if (compareJsonbScalarValue(key, result) == 0)
                                return result;
                }
with

                for (i = 0; i < count; i++)
                {
                        JEntry     *e = array + i;

                        if (!JBE_TYPE_IS_SCALAR(*e))
                            continue;

                        if (JBE_TYPE(*e) != key->type)
                            continue;

                        result = getJsonbValue(e);

                        if (compareJsonbScalarValue(key, result) == 0)
                                return result;
                }

Yes, it's not a fair comparison because I made up getJsonbValue(). But
it's *much* more readable regardless. And there's more places that could
use it. Like the second half of findJsonbValueFromSuperHeader(). FWIW,
that's one of the requests I definitely made before.

> > * I wonder if the hash/object pair representation is wise and if it
> >   shouldn't be keys combined with offsets first, and then the
> >   values. That will make access much faster. And that's what jsonb
> >   essentially is about.
>
> I like that the physical layout reflects the layout of the original
> JSON document.

Don't see much value in that. This is a binary representation and it'd
be bijective.

> Besides, I don't think this is obviously the case. Are
> you sure that it won't be more useful to make children as close to
> their parents as possible? Particularly given idiomatic usage.

Because - if done right - it would allow elementwise access without
scanning previous values.

> > * I think both arrays and hashes should grow individual structs. With
> >   space for additional flags.

> Why?

Because a) it will make the code more readable b) it'll allow for
adding different representations of hashes/arrays.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Wanted: jsonb on-disk representation documentation

Reply via email to