Hi Bharath,

I'm trying to take some performance measurements on your patch v23.
But when I started, I found an issue with unbalanced tuple distribution among 
workers (99% of the tuples are read by one worker) in a specific case, which makes 
the "parallel select" part give no performance gain.
Then I found it's not introduced by your patch, because it also happens on 
master (HEAD). But I don't know how to deal with it, so I'm posting it here to see 
if anybody knows what's going wrong or has good ideas to address this issue.

Here are the conditions to produce the issue:
1. A high-CPU environment (say, more than 20 processors). With fewer CPUs it 
also happens, but less obviously (40% of the tuples on one worker in my tests).
2. The query plan is "serial insert + parallel select". I have reproduced this 
behavior with CTAS, SELECT INTO, and INSERT INTO ... SELECT.
3. The select part needs to read a large amount of data (e.g., query 100 million 
rows out of 200 million).

Based on the above, IMHO, I guess it may be caused by the leader's write rate 
not keeping up with the workers' read rate: the tuples produced by a worker get 
blocked in its shared tuple queue and pile up more and more.
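
One way to test that guess (just a sketch, assuming the planner picks a 
parallel aggregate here): run a query over the same data that lets the workers 
aggregate locally instead of shipping every tuple through the queue to the 
leader, and see whether the per-worker row counts become even again:

  explain analyze verbose select count(*) from x where b%2=0;

If the skew disappears there, the bottleneck is likely the leader draining the 
tuple queues rather than the parallel scan itself.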

Below is my test info:
1. test spec environment
  CentOS 8.2, 128GB RAM, 40 processors, SAS disk

2. test data prepare
  -- 200 million rows; b and c are random integers in [1, 10000]
  create table x(a int, b int, c int);
  create index on x(a);
  insert into x
    select generate_series(1, 200000000),
           floor(random()*(10001-1)+1),
           floor(random()*(10001-1)+1);
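
The plans below use 4 parallel workers. Something along these lines should 
reproduce that (a sketch; the default of 2 would otherwise cap the worker count, 
and my exact settings may have differed):

  set max_parallel_workers_per_gather = 4;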

3. test execute results
  *Patched CTAS*: please look at Worker 2: 99% of the tuples are read by it.
  explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.203..24023.686 rows=100006268 loops=1)
   Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.016..4367.035 rows=20001254 loops=5)
         Output: a, c
         Filter: ((x.b % 2) = 0)
         Rows Removed by Filter: 19998746
         Worker 0:  actual time=0.016..19.265 rows=94592 loops=1
         Worker 1:  actual time=0.027..31.422 rows=94574 loops=1
         Worker 2:  actual time=0.014..21744.549 rows=99627749 loops=1
         Worker 3:  actual time=0.015..19.347 rows=94586 loops=1
 Planning Time: 0.098 ms
 Execution Time: 91054.828 ms

  *Non-patched CTAS*: please look at Worker 0; again 99% of the tuples are read by it.
  explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.283..19216.157 rows=100003148 loops=1)
   Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.020..4380.360 rows=20000630 loops=5)
         Output: a, c
         Filter: ((x.b % 2) = 0)
         Rows Removed by Filter: 19999370
         Worker 0:  actual time=0.013..21805.647 rows=99624833 loops=1
         Worker 1:  actual time=0.016..19.790 rows=94398 loops=1
         Worker 2:  actual time=0.013..35.340 rows=94423 loops=1
         Worker 3:  actual time=0.035..19.849 rows=94679 loops=1
 Planning Time: 0.083 ms
 Execution Time: 91151.097 ms
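
If the queue theory is right, the stalled workers should be visible waiting on 
their tuple queues while the query runs. A rough way to check from another 
session while the CTAS is running (just a sketch):

  select pid, backend_type, wait_event_type, wait_event
  from pg_stat_activity
  where backend_type in ('client backend', 'parallel worker');

I would expect the leader to be busy writing the new table while most workers 
sit in IPC waits such as MessageQueueSend.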

I'm still working on the performance tests for your patch; if I make some 
progress, I will post my results here.

Regards,
Tang

