I've been thinking about INSERT performance and noticed that copyfrom.c
(COPY FROM) performs ~4 unnecessary pointer-deferences per record in the
case when there's no indexes and no AFTER ROW INSERT triggers (i.e. when
you just want to load data really fast!).

I moved the for-loop inside the per-batch if-checks and got a little
speedup. Obviously, this only matters for CPU-bound INSERTs with very
narrow tables - if there's other overhead (including parsing), this gain
disappears into the noise. I'm not a regular contributor, apologies in
advance if I got something wrong, and no worries if this is too small to
bother. My patch below passes "make check". I'll of course post other wins
as I find them, but this one seemed easy.

My reference test comes from a conversation on HN (
https://news.ycombinator.com/item?id=38864213 ) loading 100M tiny records
from COPY TO ... BINARY on a GCP c2d-standard-8:
https://gcloud-compute.com/c2-standard-8.html (8 vCPU, 32GB, network SSD).

time sh -c "echo \"drop table if exists tbl; create unlogged table tbl(city
int2, temp int2);copy tbl FROM '/home/asah/citydata.bin' binary;\" |
./pg/bin/postgres --single -D tmp -p 9999 postgres";

results from 3 runs:
real 0m26.488s, user 0m14.745s, sys 0m3.299s
real 0m28.978s, user 0m14.010s, sys 0m3.288s
real 0m28.920s, user 0m14.028s, sys 0m3.201s
==>
real 0m24.483s, user 0m13.280s, sys 0m3.305s
real 0m28.668s, user 0m13.095s, sys 0m3.501s
real 0m28.306s, user 0m13.032s, sys 0m3.505s


On my mac m1 air,

real 0m11.922s, user 0m10.220s, sys 0m1.302s
real 0m12.761s, user 0m10.137s, sys 0m1.401s
real 0m12.734s, user 0m10.146s, sys 0m1.376s
==>
real 0m12.173s, user 0m9.785s, sys 0m1.221s
real 0m12.462s, user 0m9.691s, sys 0m1.393s
real 0m12.266s, user 0m9.719s, sys 0m1.390s


patch: (passes "make check" - feel free to drop/replace my comments of
course)

diff --git a/src/backend/commands/copyfrom.c
b/src/backend/commands/copyfrom.c
index 37836a769c..d3783678e0 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -421,13 +421,14 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo
*miinfo,
                                                   buffer->bistate);
                MemoryContextSwitchTo(oldcontext);

-               for (i = 0; i < nused; i++)
-               {
                /*
                 * If there are any indexes, update them for all the
inserted
                 * tuples, and run AFTER ROW INSERT triggers.
                 */
                if (resultRelInfo->ri_NumIndices > 0)
+               {
+                       /* expensive inner loop hidden by if-check */
+                       for (i = 0; i < nused; i++)
                        {
                                List       *recheckIndexes;

@@ -441,6 +442,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,

 cstate->transition_capture);
                                list_free(recheckIndexes);
                        }
+               }

                /*
                 * There's no indexes, but see if we need to run AFTER ROW
INSERT
@@ -449,15 +451,18 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo
*miinfo,
                else if (resultRelInfo->ri_TrigDesc != NULL &&

 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||

resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+               {
+                       /* expensive inner loop hidden by if-check */
+                       for (i = 0; i < nused; i++)
                        {
                                cstate->cur_lineno = buffer->linenos[i];
                                ExecARInsertTriggers(estate, resultRelInfo,

 slots[i], NIL,

 cstate->transition_capture);
-                       }

                                ExecClearTuple(slots[i]);
                        }
+               }

                /* Update the row counter and progress of the COPY command
*/
                *processed += nused;

hope this helps,
adam

Reply via email to