Hello, Postgres hackers,

The COPY code uses batched inserts, via heap_multi_insert(), to speed up
loading. It seems that CREATE TABLE AS and CREATE MATERIALIZED VIEW could
leverage the same code to boost their performance as well. Attached is a
patch implementing that, done by Taylor (cc-ed) and me.

The patch also modifies heap_multi_insert() a bit as a further code-level
optimization: it uses static memory instead of a memory context and dynamic
allocation. For ModifyTable inserts there appear to be more limitations on
batch insert (triggers, etc.?), but perhaps we could still do batch insert
in the cases where it is safe?

By the way, while looking at the code, I noticed that
toast_insert_or_update() has 9 large local arrays, which looks like a
stack-overflow risk. Maybe we should make them static or global.

Here is some quick, simple performance testing on a mirrorless
(no-replication) Postgres instance, using the SQL below. The tests cover
tables with small columns, wide columns, and toasted data.

-- tuples with small size.
drop table if exists t1;
create table t1 (a int);

insert into t1 select * from generate_series(1, 10000000);
drop table if exists t2;
\timing
create table t2 as select * from t1;
\timing

-- tuples that are untoasted and data that is 1664 bytes wide
drop table if exists t1;
create table t1 (a name, b name, c name, d name, e name, f name, g name, h
name, i name, j name, k name, l name, m name, n name, o name, p name, q
name, r name, s name, t name, u name, v name, w name, x name, y name, z
name);

insert into t1 select 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y',
'z' from generate_series(1, 500000);
drop table if exists t2;
\timing
create table t2 as select * from t1;
\timing

-- tuples that are toastable.
drop table if exists t1;
create table t1 (a text, b text, c text, d text, e text, f text, g text, h
text, i text, j text, k text, l text, m text, n text, o text, p text, q
text, r text, s text, t text, u text, v text, w text, x text, y text, z
text);

insert into t1 select i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i,
i, i, i, i, i, i, i, i from (select repeat('123456789', 10000) as i from
generate_series(1, 2000)) t;
drop table if exists t2;
\timing
create table t2 as select * from t1;
\timing

Here are the timing results:

With the patch,

Time: 4728.142 ms (00:04.728)    -- small tuples
Time: 14203.983 ms (00:14.204)   -- wide, untoasted tuples
Time: 1008.669 ms (00:01.009)    -- toasted tuples

Baseline,

Time: 11096.146 ms (00:11.096)   -- small tuples
Time: 13106.741 ms (00:13.107)   -- wide, untoasted tuples
Time: 1100.174 ms (00:01.100)    -- toasted tuples

For the toast and wide-column cases the difference is within 10% (a small
win for toast, a small regression for wide columns), but for the
small-column case the improvement is very large. If I hardcode the batch
count as 4, all test cases improve, but the small-column gain is smaller
than with the current patch. The value 4 is quite case-specific, so I cannot
hardcode it in the patch. Of course we could tune it further, but the
current value seems to be a good trade-off?

Thanks.

Attachment: 0001-Heap-batch-insert-for-CTAS.patch
Description: Binary data
