> > > It did have performance gain, but I think it's not huge enough to
> > > ignore the extra's index cost.
> > > What do you think ?
> >
> > Yes... as you suspect, I'm afraid the benefit from parallel bitmap
> > scan may not compensate for the loss of the parallel insert operation.
> >
> > The loss is probably due to 1) more index page splits, 2) more buffer
> > writes (table and index), and 3) internal locks for things such as
> > relation extension and page content protection.  To investigate 3), we
> > should want something like [1], which tells us the wait event
> > statistics (wait count and time for each wait event) per session or
> > across the instance like Oracke, MySQL and EDB provides.  I want to
> continue this in the near future.
> 
> What would the result look like if you turn off
> parallel_leader_participation?  If the leader is freed from
> reading/writing the table and index, the index page splits and internal
> lock contention may decrease enough to recover part of the loss.
> 
> https://www.postgresql.org/docs/devel/parallel-plans.html
> 
> "In a parallel bitmap heap scan, one process is chosen as the leader. That
> process performs a scan of one or more indexes and builds a bitmap indicating
> which table blocks need to be visited. These blocks are then divided among
> the cooperating processes as in a parallel sequential scan. In other words,
> the heap scan is performed in parallel, but the underlying index scan is
> not."

If I disable parallel_leader_participation.

For max_parallel_workers_per_gather = 4, It still have performance degradation.

For max_parallel_workers_per_gather = 2, the performance degradation will not 
happen in most of the case.
There is sometimes a noise(performance degradation), but most of result(about 
80%) is good.

Best regards,
houzj


postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL)  insert into testscan 
select a from x where a<80000 or (a%2=0 and a>199900000);
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4272.20..1269111.15 rows=79918 width=0) (actual 
time=381.764..382.715 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
   WAL: records=260498 bytes=17019359
   ->  Insert on public.testscan  (cost=3272.20..1260119.35 rows=0 width=0) 
(actual time=378.227..378.229 rows=0 loops=5)
         Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
         WAL: records=260498 bytes=17019359
         Worker 0:  actual time=376.638..376.640 rows=0 loops=1
           Buffers: shared hit=88281 dirtied=236 written=337
           WAL: records=56167 bytes=3674994
         Worker 1:  actual time=377.889..377.892 rows=0 loops=1
           Buffers: shared hit=81231 dirtied=213 written=99
           WAL: records=52175 bytes=3386885
         Worker 2:  actual time=377.388..377.389 rows=0 loops=1
           Buffers: shared hit=82544 dirtied=232 written=279
           WAL: records=52469 bytes=3443843
         Worker 3:  actual time=377.733..377.734 rows=0 loops=1
           Buffers: shared hit=87728 dirtied=225 written=182
           WAL: records=56307 bytes=3660758
         ->  Parallel Bitmap Heap Scan on public.x  (cost=3272.20..1260119.35 
rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5)
               Output: x.a, NULL::integer
               Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 
199900000)))
               Rows Removed by Filter: 10000
               Heap Blocks: exact=167
               Buffers: shared hit=1475
               Worker 0:  actual time=5.203..14.921 rows=28028 loops=1
                 Buffers: shared hit=209
               Worker 1:  actual time=5.165..14.403 rows=26039 loops=1
                 Buffers: shared hit=196
               Worker 2:  actual time=5.188..14.284 rows=26177 loops=1
                 Buffers: shared hit=195
               Worker 3:  actual time=5.151..14.760 rows=28104 loops=1
                 Buffers: shared hit=208
               ->  BitmapOr  (cost=3272.20..3272.20 rows=174813 width=0) 
(actual time=8.288..8.290 rows=0 loops=1)
                     Buffers: shared hit=500
                     ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1468.38 
rows=79441 width=0) (actual time=3.681..3.681 rows=79999 loops=1)
                           Index Cond: (x.a < 80000)
                           Buffers: shared hit=222
                     ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1763.86 
rows=95372 width=0) (actual time=4.605..4.605 rows=100000 loops=1)
                           Index Cond: (x.a > 199900000)
                           Buffers: shared hit=278
 Planning:
   Buffers: shared hit=19
 Planning Time: 0.173 ms
 Execution Time: 382.776 ms
(47 rows)

Execuse me, tsunakawa san, sorry to bother you,
 May I ask you some questions ?

>[1]
>Add accumulated statistics for wait event 
>https://commitfest.postgresql.org/28/2332/

I remembered FEP has such feature, is this the same as FEP.

Ok. thanks a lot for the explanation.
BTW, I am sorry the tablename I tested today is different from yesterday,
It may confused you, I have check that the result of both is the same.
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL)  insert into testscan 
select a from x where a<80000 or (a%2=0 and a>199900000);
                                                                   QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
-
 Gather  (cost=4258.59..2063373.72 rows=81338 width=0) (actual 
time=0.203..310.184 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=393777 read=4 dirtied=939 written=953
   WAL: records=260356 bytes=16281081
   ->  Insert on public.testscan  (cost=3258.59..2054239.92 rows=0 width=0) 
(actual time=305.474..305.476 rows=0 loops=2)
         Buffers: shared hit=393777 read=4 dirtied=939 written=953
         WAL: records=260356 bytes=16281081
         Worker 0:  actual time=305.378..305.379 rows=0 loops=1
           Buffers: shared hit=196680 read=3 dirtied=462 written=478
           WAL: records=130155 bytes=8129880
         Worker 1:  actual time=305.570..305.572 rows=0 loops=1
           Buffers: shared hit=197097 read=1 dirtied=477 written=475
           WAL: records=130201 bytes=8151201
         ->  Parallel Bitmap Heap Scan on public.x  (cost=3258.59..2054239.92 
rows=40669 width=8) (actual time=8.883..27.998 rows=65000 loops=2)
               Output: x.a, NULL::integer
               Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
               Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 
199900000)))
               Rows Removed by Filter: 25000
               Buffers: shared hit=1476
               Worker 0:  actual time=8.796..27.982 rows=64992 loops=1
                 Buffers: shared hit=487
               Worker 1:  actual time=8.971..28.014 rows=65007 loops=1
                 Buffers: shared hit=989
               ->  BitmapOr  (cost=3258.59..3258.59 rows=173971 width=0) 
(actual time=8.810..8.811 rows=0 loops=1)
                     Buffers: shared hit=501
                     Worker 1:  actual time=8.810..8.811 rows=0 loops=1
                       Buffers: shared hit=501
                     ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1495.11 
rows=80872 width=0) (actual time=3.832..3.832 rows=79999 loops=1)
                           Index Cond: (x.a < 80000)
                           Buffers: shared hit=223
                           Worker 1:  actual time=3.832..3.832 rows=79999 
loops=1
                             Buffers: shared hit=223
                     ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1722.81 
rows=93099 width=0) (actual time=4.976..4.976 rows=100000 loops=1)
                           Index Cond: (x.a > 199900000)
                           Buffers: shared hit=278
                           Worker 1:  actual time=4.976..4.976 rows=100000 
loops=1
                             Buffers: shared hit=278
 Planning:
   Buffers: shared hit=19
 Planning Time: 0.146 ms
 Execution Time: 310.257 ms
(42 rows)
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL)  insert into testscan 
select a from x where a<80000 or (a%2=0 and a>199900000);
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Insert on public.testscan  (cost=3272.20..3652841.26 rows=0 width=0) (actual 
time=360.474..360.476 rows=0 loops=1)
   Buffers: shared hit=392569 read=3 dirtied=934 written=933
   WAL: records=260354 bytes=16259841
   ->  Bitmap Heap Scan on public.x  (cost=3272.20..3652841.26 rows=79918 
width=8) (actual time=8.096..41.005 rows=129999 loops=1)
         Output: x.a, NULL::integer
         Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
         Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
         Rows Removed by Filter: 50000
         Heap Blocks: exact=975
         Buffers: shared hit=1475
         ->  BitmapOr  (cost=3272.20..3272.20 rows=174813 width=0) (actual 
time=7.975..7.976 rows=0 loops=1)
               Buffers: shared hit=500
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1468.38 rows=79441 
width=0) (actual time=3.469..3.470 rows=79999 loops=1)
                     Index Cond: (x.a < 80000)
                     Buffers: shared hit=222
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1763.86 rows=95372 
width=0) (actual time=4.499..4.499 rows=100000 loops=1)
                     Index Cond: (x.a > 199900000)
                     Buffers: shared hit=278
 Planning:
   Buffers: shared hit=10
 Planning Time: 0.126 ms
 Execution Time: 360.547 ms
(22 rows)

Reply via email to