(continuing an old thread from
https://www.postgresql.org/message-id/CAMsr%2BYEtamQYZ5EocsuthQCvyvmRnQrucDP6GZynPtf0gsMbuw%40mail.gmail.com
)

Craig Ringer <craig(at)2ndquadrant(dot)com>, 2018-02-26:

> On 26 February 2018 at 04:05, Anthony Communier
<anthony(dot)communier(at)gmail(dot)com> wrote:
>
> > It would be nice if application connected to a Postrgesql database could
> > send and receive JSONB in binary. It could save some useless text
> > conversion. All works could be done on application side which are often
> > more scalable than database itself.
>
> To support this, you'd need to extract PostgreSQL's jsonb support into a C
> library that could be used independently of backend server infrastructure
> like 'palloc' and memory contexts, ereport(), etc. Or write a parallel
> implementation.

At Shift, we also have use cases that would likely be sped up quite a bit
if we could avoid the conversion from JSONB to JSON, and instead pass
binary JSONB to the application side and parse it there (in Go).  I doubt
we'd want to reuse any of Postgres's C code, and would instead go with your
"parallel implementation" idea; I can't imagine it being particularly
difficult to implement a JSONB parser from scratch.

All we need, I think, is a Postgres function raw_jsonb(jsonb) that returns
bytea but is the identity function at the byte level.  (Or allow a cast
from jsonb to bytea.)

Our Go code would then send queries like SELECT col1, col2, raw_jsonb(col3)
FROM table1 WHERE ...; I haven't thought in depth about how we'd parse the
JSONB in Go, but perhaps we can synthesize a stream of JSON tokens from the
binary JSONB (one token at a time, to avoid copies and allocations) and
adapt the streaming parser https://github.com/json-iterator/go to turn it
into Go values.

Sending raw JSONB to Postgres might also be interesting, but I'd start with
receiving.

Would implementing raw_jsonb be as trivial as it sounds?  What about usages
like SELECT raw_jsonb(col3->'foo'); does the subobject returned by '->'
share structure with the containing object, making the conversion to a
self-contained JSONB value less direct?

Can these conversions be implemented without copying the bytes?


An open question about the API contract would be how raw_jsonb would be
affected if Postgres introduces a version 2 of JSONB encoding.  My
intuition is to punt that problem to the application, and define that
raw_jsonb returns whatever version of JSONB is most convenient for Postgres
for that particular datum; this minimizes conversion work on the Postgres
side, which is the purpose of the mechanism.  Applications that want a
stable format can use the conventional textual JSON format.  But I could
see a case for making the function raw_jsonb(int, jsonb) and allowing the
caller to specify what (maximum?) version of JSONB they want.

Reply via email to