From: Hou, Zhijie/侯 志杰 <houzj.f...@fujitsu.com>
> After doing some more tests on it (performance degradation will not happen
> when source table is out of order).
> I think we can say the performance degradation is related to the order of the
> data in source table.
...
> So, the order of data 's influence seems a normal phenomenon, I think it seems
> we do not need to do anything about it (currently).
> It seems better to mark it as todo which we can improve this in the future.
> 
> (Since the performance degradation in parallel bitmap is because of the lock 
> in
> _bt_search, It will not always happen when the target table already have data,
> less race condition will happened when parallel insert into a evenly 
> distributed
> btree).

I think so, too.  The slowness of parallel insert operation due to index page 
contention, and index bloat, would occur depending on the order of the index 
key values of source records.

I guess other DBMSs exhibit similar phenomenon, but I couldn't find such 
description in the manual, whitepapers, or several books on Oracle.  One 
relevant excerpt is the following.  This is about parallel index build.  Oracle 
tries to minimize page contention and index bloat.  This is completely my 
guess, but they may do similar things in parallel INSERT SELECT, because Oracle 
holds an exclusive lock on the target table.  SQL Server also acquires an 
exclusive lock.  Maybe we can provide an option to do so in the future.

https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-tips.html#GUID-08A08783-C243-4872-AFFA-56B603F1F0F5
--------------------------------------------------
Optimizing Performance by Creating Indexes in Parallel
...
Multiple processes can work simultaneously to create an index. By dividing the 
work necessary to create an index among multiple server processes, Oracle 
Database can create the index more quickly than if a single server process 
created the index serially.

Parallel index creation works in much the same way as a table scan with an 
ORDER BY clause. The table is randomly sampled and a set of index keys is found 
that equally divides the index into the same number of pieces as the DOP. A 
first set of query processes scans the table, extracts key-rowid pairs, and 
sends each pair to a process in a second set of query processes based on a key. 
Each process in the second set sorts the keys and builds an index in the usual 
fashion. After all index pieces are built, the parallel execution coordinator 
simply concatenates the pieces (which are ordered) to form the final index.
...
When creating an index in parallel, the STORAGE clause refers to the storage of 
each of the subindexes created by the query server processes. Therefore, an 
index created with an INITIAL value of 5 MB and a DOP of 12 consumes at least 
60 MB of storage during index creation because each process starts with an 
extent of 5 MB. When the query coordinator process combines the sorted 
subindexes, some extents might be trimmed, and the resulting index might be 
smaller than the requested 60 MB.
--------------------------------------------------


IIRC, the current patch showd nice performance improvement in some (many?) 
patterns.  So, I think it can be committed in PG 14, when it has addressed the 
plan cache issue that Amit Langote-san posed.  I remember the following 
issues/comments are pending, but they are not blockers:

1. Insert operation is run serially when the target table has a foreign key, 
sequence or identity column.
This can be added later based on the current design without requiring rework.  
That is, the current patch leaves no debt.  (Personally, foreign key and 
sequence support will also be wanted in PG 14.  We may try them in the last CF 
once the current patch is likely to be committable.)

2. There's a plausible reason for the performance variation and index bloat 
with the bitmap scan case.
Ideally, we want to come up with a solution that can be incorporated in PG 15.

Or, it may be one conclusion that we can't prevent performance degradation in 
all cases.  That may be one hidden reason why Oracle and SQL Server doesn't 
enable parallel DML by default.

We can advise the user in the manual that parallel DML is not always faster 
than serial operation so he should test performance by enabling and disabling 
parallel DML.  Also, maybe we should honestly state that indexes can get a bit 
bigger after parallel insert than after serial insert, and advise the user to 
do REINDEX CONCURRENTLY if necessary.

3. The total time of parallel execution can get longer because of unbalanced 
work distribution among parallel workers.
This seems to be an existing problem, so we can pursue the improvement later, 
hopefully before the release of PG 14.


Does anyone see any problem with committing the current patch (after polishing 
it)?


Regards
Takayuki Tsunakawa


Reply via email to