On 22/02/16 23:23, Álvaro Hernández Tortosa wrote:


On 22/02/16 05:10, Tom Lane wrote:
Heikki Linnakangas <hlinn...@iki.fi> writes:
On 19/02/16 10:10, Álvaro Hernández Tortosa wrote:
Oleg and I discussed recently that a really good addition to a GSoC
item would be to study whether it's convenient to have a binary
serialization format for jsonb over the wire.
Seems a bit risky for a GSoC project. We don't know if a different
serialization format will be a win, or whether we want to do it in the
end, until the benchmarking is done. It's also not clear what we're
trying to achieve with the serialization format: smaller on-the-wire
size, faster serialization in the server, faster parsing in the client,
or what?
Another variable is that your answers might depend on what format you
assume the client is trying to convert from/to.  (It's presumably not
text JSON, but then what is it?)

As I mentioned before, there are many well-known JSON serialization formats, like:

- http://ubjson.org/
- http://cbor.io/
- http://msgpack.org/
- BSON (ok, let's skip that one hehehe)
- http://wiki.fasterxml.com/SmileFormatSpec


Having said that, I'm not sure that risk is a blocking factor here.
History says that a large fraction of our GSoC projects don't result
in a commit to core PG.  As long as we're clear that "success" in this
project isn't measured by getting a feature committed, it doesn't seem
riskier than any other one.  Maybe it's even less risky, because there's
less of the success condition that's not under the GSoC student's control.


I wanted to bring an update here. It looks like someone did the expected benchmark "for us" :)

https://eng.uber.com/trip-data-squeeze/    (thanks Alam for the link)

While this is Uber's own test, I think the conclusions are quite significant: an encoding like MessagePack + zlib needs only 14% of the size of JSON and encodes+decodes in 76% of the time. There are of course other contenders that trade faster encoding for slightly slower decoding and a bigger size, but the numbers in this benchmark are very interesting. MessagePack, CBOR and UJSON (all + zlib) look like really good options.
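If we do end up running our own numbers, the comparison could be sketched roughly like this. This is only a minimal harness under my own assumptions: the sample records from `make_sample` are made up (they are not Uber's data), the `CODECS` table and `benchmark` function are hypothetical names, and only plain JSON and JSON + zlib from the Python standard library are wired in. It measures encoded size relative to plain JSON and the round-trip time for each codec.

```python
import json
import time
import zlib


def make_sample(n=200):
    # Hypothetical trip-like records, invented for illustration only.
    return [
        {
            "id": i,
            "lat": 40.0 + i * 1e-4,
            "lng": -3.7 - i * 1e-4,
            "status": "completed",
            "tags": ["a", "b"],
        }
        for i in range(n)
    ]


def json_encode(obj):
    # Compact separators so whitespace does not inflate the baseline.
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def json_decode(data):
    return json.loads(data.decode("utf-8"))


# name -> (encode, decode); other formats can be registered the same way.
CODECS = {
    "json": (json_encode, json_decode),
    "json+zlib": (
        lambda obj: zlib.compress(json_encode(obj)),
        lambda data: json_decode(zlib.decompress(data)),
    ),
}


def benchmark(codecs, payload, rounds=20):
    # Size is reported relative to plain JSON, time as total seconds
    # for `rounds` encode+decode round trips.
    baseline = len(codecs["json"][0](payload))
    results = {}
    for name, (encode, decode) in codecs.items():
        start = time.perf_counter()
        for _ in range(rounds):
            blob = encode(payload)
            decoded = decode(blob)
        elapsed = time.perf_counter() - start
        # Sanity check: the codec must round-trip the payload unchanged.
        assert decoded == payload
        results[name] = {
            "bytes": len(blob),
            "size_ratio": len(blob) / baseline,
            "seconds": elapsed,
        }
    return results


if __name__ == "__main__":
    for name, stats in benchmark(CODECS, make_sample()).items():
        print(name, stats)
```

Binary formats could be plugged into `CODECS` as extra entries, for example with the third-party `msgpack` package (`msgpack.packb` / `msgpack.unpackb`), which I have left out so the sketch runs with the standard library alone. The absolute numbers will of course depend heavily on the shape of the documents, so this says nothing by itself about what jsonb over the wire would gain.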

So now that we have this data, I would like to ask the community these questions:

- Is this enough, or do we need to perform our own, different benchmarks?

- If this is enough, and given that we didn't make it into GSoC, is there interest in the community to work on this nonetheless?

- Regarding GSoC: it looks to me that we failed to submit in time. Is this what happened, or were we not selected? If the former (no criticism here, just stating a fact), what can we do next year to keep this from happening again? Is anyone "appointed" to take care of it?


    Álvaro

--
Álvaro Hernández Tortosa


-----------
8Kdata


