I hear ya. It might be a premature optimization, but I still think there could be a benefit in the case of large-scale extraction and in-database transformation of large JSON data structures. We have terabytes of this stuff, and I'd like something between the hip NoSQL options and a fully structured SQL datastore.
Terry

Sent from my iPhone

On Oct 19, 2010, at 6:36 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Oct 19, 2010 at 6:56 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Greg Stark <gsst...@mit.edu> writes:
>>> The elephant in the room is if the binary encoded form is smaller then
>>> it occupies less ram and disk bandwidth to copy it around.
>>
>> It seems equally likely that a binary-encoded form could be larger
>> than the text form (that's often true for our other datatypes).
>> Again, this is an argument that would require experimental evidence
>> to back it up.
>
> That's exactly what I was thinking when I read Greg's email.  I
> designed something vaguely (very vaguely) like this many years ago and
> the binary format that I worked so hard to create was enormous
> compared to the text format, mostly because I had a lot of small
> integers in the data I was serializing, and as it turns out,
> representing {0,1,2} in less than 7 bytes is not very easy.  It can
> certainly be done if you set out to optimize for precisely those kinds
> of cases, but I ended up with something awful like:
>
> <4 byte type = list> <4 byte list length = 3> <4 byte type = integer>
> <4 byte integer = 0> <4 byte type = integer> <4 byte integer = 1>
> <4 byte type = integer> <4 byte integer = 2>
>
> = 32 bytes.  Even if you were a little smarter than I was and used 2
> byte integers (with some escape hatch allowing larger numbers to be
> represented) it's still more than twice the size of the text
> representation.  Even if you use 1 byte integers it's still bigger.
> To get it down to being smaller, you've got to do something like make
> the high nibble of each byte a type field and the low nibble the first
> 4 payload bits.  You can certainly do all of this but you could also
> just store it as text and let the TOAST compression algorithm worry
> about making it smaller.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
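For what it's worth, Robert's arithmetic above is easy to reproduce. Here's a minimal sketch (in Python, purely illustrative; the type tags and struct layout are my own assumptions, not anyone's actual format) of the naive 4-byte-tag TLV encoding he describes, compared against the 7-byte text form:

```python
import struct

# Hypothetical type tags, just for this illustration
TYPE_LIST = 1
TYPE_INT = 2

def encode_naive(values):
    """Naive TLV encoding: every element gets a 4-byte type tag
    followed by a 4-byte payload, per Robert's example layout."""
    out = struct.pack("<ii", TYPE_LIST, len(values))  # 8-byte list header
    for v in values:
        out += struct.pack("<ii", TYPE_INT, v)        # 8 bytes per integer
    return out

binary = encode_naive([0, 1, 2])
text = "{0,1,2}".encode("ascii")
print(len(binary), len(text))  # 32 vs 7 bytes
```

Even halving the integer width only gets the per-element cost down to 6 bytes, which is still 8 + 18 = 26 bytes for the three-element list, so the quoted conclusion holds: without nibble-packing tricks, the text form plus TOAST compression is hard to beat for data full of small integers.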