> > > It did have performance gain, but I think it's not huge enough to > > > ignore the extra's index cost. > > > What do you think ? > > > > Yes... as you suspect, I'm afraid the benefit from parallel bitmap > > scan may not compensate for the loss of the parallel insert operation. > > > > The loss is probably due to 1) more index page splits, 2) more buffer > > writes (table and index), and 3) internal locks for things such as > > relation extension and page content protection. To investigate 3), we > > should want something like [1], which tells us the wait event > > statistics (wait count and time for each wait event) per session or > > across the instance like Oracke, MySQL and EDB provides. I want to > continue this in the near future. > > What would the result look like if you turn off > parallel_leader_participation? If the leader is freed from > reading/writing the table and index, the index page splits and internal > lock contention may decrease enough to recover part of the loss. > > https://www.postgresql.org/docs/devel/parallel-plans.html > > "In a parallel bitmap heap scan, one process is chosen as the leader. That > process performs a scan of one or more indexes and builds a bitmap indicating > which table blocks need to be visited. These blocks are then divided among > the cooperating processes as in a parallel sequential scan. In other words, > the heap scan is performed in parallel, but the underlying index scan is > not."
If I disable parallel_leader_participation. For max_parallel_workers_per_gather = 4, It still have performance degradation. For max_parallel_workers_per_gather = 2, the performance degradation will not happen in most of the case. There is sometimes a noise(performance degradation), but most of result(about 80%) is good. Best regards, houzj
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=4272.20..1269111.15 rows=79918 width=0) (actual time=381.764..382.715 rows=0 loops=1) Workers Planned: 4 Workers Launched: 4 Buffers: shared hit=407094 read=4 dirtied=1085 written=1158 WAL: records=260498 bytes=17019359 -> Insert on public.testscan (cost=3272.20..1260119.35 rows=0 width=0) (actual time=378.227..378.229 rows=0 loops=5) Buffers: shared hit=407094 read=4 dirtied=1085 written=1158 WAL: records=260498 bytes=17019359 Worker 0: actual time=376.638..376.640 rows=0 loops=1 Buffers: shared hit=88281 dirtied=236 written=337 WAL: records=56167 bytes=3674994 Worker 1: actual time=377.889..377.892 rows=0 loops=1 Buffers: shared hit=81231 dirtied=213 written=99 WAL: records=52175 bytes=3386885 Worker 2: actual time=377.388..377.389 rows=0 loops=1 Buffers: shared hit=82544 dirtied=232 written=279 WAL: records=52469 bytes=3443843 Worker 3: actual time=377.733..377.734 rows=0 loops=1 Buffers: shared hit=87728 dirtied=225 written=182 WAL: records=56307 bytes=3660758 -> Parallel Bitmap Heap Scan on public.x (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5) Output: x.a, NULL::integer Recheck Cond: ((x.a < 80000) OR (x.a > 199900000)) Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000))) Rows Removed by Filter: 10000 Heap Blocks: exact=167 Buffers: shared hit=1475 Worker 0: actual time=5.203..14.921 rows=28028 loops=1 Buffers: shared hit=209 Worker 1: actual time=5.165..14.403 rows=26039 loops=1 Buffers: shared hit=196 Worker 2: actual time=5.188..14.284 rows=26177 loops=1 Buffers: shared hit=195 Worker 3: actual time=5.151..14.760 rows=28104 loops=1 Buffers: shared hit=208 -> BitmapOr (cost=3272.20..3272.20 rows=174813 width=0) (actual time=8.288..8.290 rows=0 loops=1) Buffers: shared hit=500 -> Bitmap Index Scan on x_a_idx (cost=0.00..1468.38 rows=79441 width=0) (actual time=3.681..3.681 rows=79999 loops=1) Index Cond: (x.a < 80000) Buffers: shared hit=222 -> Bitmap Index Scan on x_a_idx (cost=0.00..1763.86 rows=95372 width=0) (actual time=4.605..4.605 rows=100000 loops=1) Index Cond: (x.a > 199900000) Buffers: shared hit=278 Planning: Buffers: shared hit=19 Planning Time: 0.173 ms Execution Time: 382.776 ms (47 rows) Execuse me, tsunakawa san, sorry to bother you, May I ask you some questions ? >[1] >Add accumulated statistics for wait event >https://commitfest.postgresql.org/28/2332/ I remembered FEP has such feature, is this the same as FEP. Ok. thanks a lot for the explanation. BTW, I am sorry the tablename I tested today is different from yesterday, It may confused you, I have check that the result of both is the same.
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ - Gather (cost=4258.59..2063373.72 rows=81338 width=0) (actual time=0.203..310.184 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=393777 read=4 dirtied=939 written=953 WAL: records=260356 bytes=16281081 -> Insert on public.testscan (cost=3258.59..2054239.92 rows=0 width=0) (actual time=305.474..305.476 rows=0 loops=2) Buffers: shared hit=393777 read=4 dirtied=939 written=953 WAL: records=260356 bytes=16281081 Worker 0: actual time=305.378..305.379 rows=0 loops=1 Buffers: shared hit=196680 read=3 dirtied=462 written=478 WAL: records=130155 bytes=8129880 Worker 1: actual time=305.570..305.572 rows=0 loops=1 Buffers: shared hit=197097 read=1 dirtied=477 written=475 WAL: records=130201 bytes=8151201 -> Parallel Bitmap Heap Scan on public.x (cost=3258.59..2054239.92 rows=40669 width=8) (actual time=8.883..27.998 rows=65000 loops=2) Output: x.a, NULL::integer Recheck Cond: ((x.a < 80000) OR (x.a > 199900000)) Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000))) Rows Removed by Filter: 25000 Buffers: shared hit=1476 Worker 0: actual time=8.796..27.982 rows=64992 loops=1 Buffers: shared hit=487 Worker 1: actual time=8.971..28.014 rows=65007 loops=1 Buffers: shared hit=989 -> BitmapOr (cost=3258.59..3258.59 rows=173971 width=0) (actual time=8.810..8.811 rows=0 loops=1) Buffers: shared hit=501 Worker 1: actual time=8.810..8.811 rows=0 loops=1 Buffers: shared hit=501 -> Bitmap Index Scan on x_a_idx (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.832..3.832 rows=79999 loops=1) Index Cond: (x.a < 80000) Buffers: shared hit=223 Worker 1: actual time=3.832..3.832 rows=79999 loops=1 Buffers: shared hit=223 -> Bitmap Index Scan on x_a_idx (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.976..4.976 rows=100000 loops=1) Index Cond: (x.a > 199900000) Buffers: shared hit=278 Worker 1: actual time=4.976..4.976 rows=100000 loops=1 Buffers: shared hit=278 Planning: Buffers: shared hit=19 Planning Time: 0.146 ms Execution Time: 310.257 ms (42 rows)
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Insert on public.testscan (cost=3272.20..3652841.26 rows=0 width=0) (actual time=360.474..360.476 rows=0 loops=1) Buffers: shared hit=392569 read=3 dirtied=934 written=933 WAL: records=260354 bytes=16259841 -> Bitmap Heap Scan on public.x (cost=3272.20..3652841.26 rows=79918 width=8) (actual time=8.096..41.005 rows=129999 loops=1) Output: x.a, NULL::integer Recheck Cond: ((x.a < 80000) OR (x.a > 199900000)) Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000))) Rows Removed by Filter: 50000 Heap Blocks: exact=975 Buffers: shared hit=1475 -> BitmapOr (cost=3272.20..3272.20 rows=174813 width=0) (actual time=7.975..7.976 rows=0 loops=1) Buffers: shared hit=500 -> Bitmap Index Scan on x_a_idx (cost=0.00..1468.38 rows=79441 width=0) (actual time=3.469..3.470 rows=79999 loops=1) Index Cond: (x.a < 80000) Buffers: shared hit=222 -> Bitmap Index Scan on x_a_idx (cost=0.00..1763.86 rows=95372 width=0) (actual time=4.499..4.499 rows=100000 loops=1) Index Cond: (x.a > 199900000) Buffers: shared hit=278 Planning: Buffers: shared hit=10 Planning Time: 0.126 ms Execution Time: 360.547 ms (22 rows)