RE: Parallel INSERT (INTO ... SELECT ...)

tsunakawa.ta...@fujitsu.com Fri, 26 Feb 2021 00:02:19 -0800

From: Hou, Zhijie/侯 志杰 <houzj.f...@fujitsu.com>
> After doing some more tests on it (performance degradation will not happen
> when source table is out of order).
> I think we can say the performance degradation is related to the order of the
> data in source table.
...
> So, the order of data 's influence seems a normal phenomenon, I think it seems
> we do not need to do anything about it (currently).
> It seems better to mark it as todo which we can improve this in the future.
> 
> (Since the performance degradation in parallel bitmap is because of the lock 
> in
> _bt_search, It will not always happen when the target table already have data,
> less race condition will happened when parallel insert into a evenly 
> distributed
> btree).

I think so, too. The slowness of parallel insert operation due to index page
contention, and index bloat, would occur depending on the order of the index
key values of source records.

I guess other DBMSs exhibit similar phenomenon, but I couldn't find such
description in the manual, whitepapers, or several books on Oracle. One
relevant excerpt is the following. This is about parallel index build. Oracle
tries to minimize page contention and index bloat. This is completely my
guess, but they may do similar things in parallel INSERT SELECT, because Oracle
holds an exclusive lock on the target table. SQL Server also acquires an
exclusive lock. Maybe we can provide an option to do so in the future.

https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-tips.html#GUID-08A08783-C243-4872-AFFA-56B603F1F0F5
--------------------------------------------------
Optimizing Performance by Creating Indexes in Parallel
...
Multiple processes can work simultaneously to create an index. By dividing the
work necessary to create an index among multiple server processes, Oracle
Database can create the index more quickly than if a single server process
created the index serially.

Parallel index creation works in much the same way as a table scan with an
ORDER BY clause. The table is randomly sampled and a set of index keys is found
that equally divides the index into the same number of pieces as the DOP. A
first set of query processes scans the table, extracts key-rowid pairs, and
sends each pair to a process in a second set of query processes based on a key.
Each process in the second set sorts the keys and builds an index in the usual
fashion. After all index pieces are built, the parallel execution coordinator
simply concatenates the pieces (which are ordered) to form the final index.
...
When creating an index in parallel, the STORAGE clause refers to the storage of
each of the subindexes created by the query server processes. Therefore, an
index created with an INITIAL value of 5 MB and a DOP of 12 consumes at least
60 MB of storage during index creation because each process starts with an
extent of 5 MB. When the query coordinator process combines the sorted
subindexes, some extents might be trimmed, and the resulting index might be
smaller than the requested 60 MB.
--------------------------------------------------

IIRC, the current patch showd nice performance improvement in some (many?)
patterns. So, I think it can be committed in PG 14, when it has addressed the
plan cache issue that Amit Langote-san posed. I remember the following
issues/comments are pending, but they are not blockers:

1. Insert operation is run serially when the target table has a foreign key,
sequence or identity column.
This can be added later based on the current design without requiring rework.
That is, the current patch leaves no debt. (Personally, foreign key and
sequence support will also be wanted in PG 14. We may try them in the last CF
once the current patch is likely to be committable.)

2. There's a plausible reason for the performance variation and index bloat
with the bitmap scan case.
Ideally, we want to come up with a solution that can be incorporated in PG 15.

Or, it may be one conclusion that we can't prevent performance degradation in
all cases. That may be one hidden reason why Oracle and SQL Server doesn't
enable parallel DML by default.

We can advise the user in the manual that parallel DML is not always faster
than serial operation so he should test performance by enabling and disabling
parallel DML. Also, maybe we should honestly state that indexes can get a bit
bigger after parallel insert than after serial insert, and advise the user to
do REINDEX CONCURRENTLY if necessary.

3. The total time of parallel execution can get longer because of unbalanced
work distribution among parallel workers.
This seems to be an existing problem, so we can pursue the improvement later,
hopefully before the release of PG 14.

Does anyone see any problem with committing the current patch (after polishing
it)?

Regards
Takayuki Tsunakawa

RE: Parallel INSERT (INTO ... SELECT ...)

Reply via email to