On 01/08/2013 01:45 AM, james wrote:
The processing functions have been extended to provide populate_record() and populate_recordset() functions. The latter in particular could be useful in decomposing a piece of JSON representing an array of flat objects (a fairly common pattern) into a set of Postgres records in a single pass.
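For concreteness, a minimal sketch of that decomposition, assuming the functions end up exposed as json_populate_recordset() (as in released 9.3) and a made-up row type t:

    -- Made-up table; the null::t argument just supplies the record shape.
    CREATE TABLE t (a int, b text);

    SELECT *
    FROM json_populate_recordset(null::t,
                                 '[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]');
    --  a | b
    -- ---+---
    --  1 | x
    --  2 | y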

So this would allow an 'insert into ... select ... from <unpack-the-JSON>(...)'?

Yes.
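Something along these lines (a hedged sketch, reusing the made-up table t from above):

    -- Unpack the JSON array and insert the resulting rows in one statement.
    INSERT INTO t (a, b)
    SELECT a, b
    FROM json_populate_recordset(null::t,
                                 '[{"a": 3, "b": "z"}, {"a": 4, "b": "w"}]');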


I had been wondering how to do such an insertion efficiently in the context of SPI, but it seems that there is no SPI_copy equivalent that would allow a query parse and plan to be avoided.

Your query above would need to be planned too, although the plan will be trivial.
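As a hedged illustration of paying that cost only once (the statement and table names are the made-up ones from above; at the C level the analogue would be SPI_prepare/SPI_execute_plan): the statement can be prepared, so the parse - and, after a few executions, usually the plan - is reused rather than redone per call.

    -- Prepare the unpacking INSERT once, then execute it repeatedly with
    -- different JSON payloads; the SQL is parsed at PREPARE time, not on
    -- every EXECUTE.
    PREPARE load_json(json) AS
        INSERT INTO t (a, b)
        SELECT a, b
        FROM json_populate_recordset(null::t, $1);

    EXECUTE load_json('[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]');
    EXECUTE load_json('[{"a": 3, "b": "z"}]');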


Is this mechanism likely to be as fast as we can get at the moment in contexts where copy is not feasible?


You should not try to use it as a general bulk load facility, and it will not be as fast as COPY, for several reasons. The JSON parsing routines are necessarily much heavier than the COPY parse routines, which have in any case been optimized over quite a long period. Also, a single json datum is limited to no more than 1GB; if you have such a datum, parsing it involves having it in memory and then taking a copy (I wonder if we could avoid that step - will take a look). Each object is then decomposed into a hash table of key/value pairs, which is then used to construct the record datum. Each field name in the result record is used to look up the value in the hash table - this happens once in the case of populate_record() and once per object in the array in the case of populate_recordset(). In the latter case the resulting records are put into a tuplestore structure (which spills to disk if necessary), which is returned to the caller once all the objects in the json array have been processed.

COPY doesn't have these sorts of issues. It knows, without having to look anything up, where each datum is in each record, and it stashes the result straight into the target table. It can read and insert huge numbers of rows without significant memory implications.

Both these routines and COPY in non-binary mode use the data type input routines to convert text values. In some cases (very notably timestamps) these routines can easily be shown to be fantastically expensive compared to binary input. This is part of what has led to the creation of utilities like pg_bulkload.
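For what it's worth, a minimal sketch of the binary alternative (table and file paths here are made up): COPY in binary mode reads the binary wire representation directly and skips the text input routines such as the timestamp parser entirely.

    -- Hypothetical table with a timestamp column.
    CREATE TABLE events (id int, created_at timestamptz);

    -- Text format: every value goes through the type input routines on load.
    COPY events FROM '/tmp/events.txt';

    -- Binary format: values are read in binary wire representation; the file
    -- must itself be in COPY binary format, e.g. produced earlier by
    -- COPY ... TO ... WITH (FORMAT binary).
    COPY events FROM '/tmp/events.bin' WITH (FORMAT binary);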

Perhaps if you give us a higher level view of what you're trying to achieve we can help you better.

cheers

andrew




