On Wed, Jan 28, 2015 at 8:59 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> I have tried the tests again and found that I have forgotten to increase
> max_worker_processes due to which the data is so different. Basically
> at higher client count it is just scanning lesser number of blocks in
> fixed chunk approach. So today I again tried with changing
> max_worker_processes and found that there is not much difference in
> performance at higher client count. I will take some more data for
> both block_by_block and fixed_chunk approach and repost the data.
>
I have again taken the data and found that there is not much difference
between the block-by-block and the fixed-chunk approach; the data is at the
end of this mail for your reference. There is some variation in a few cases:
for example, the fixed-chunk approach shows a lower time in the 8-worker
case, although on certain executions it has taken almost the same time as
with the other worker counts.

If we go with the block-by-block approach, we have the advantage that the
work-distribution granularity is smaller and hence better; if we go with the
chunk-by-chunk approach (fixed chunks of 1GB), there is a good chance that
the kernel can better optimize the sequential reads.

Based on the inputs on this thread, one possible execution strategy could be:

a. In the optimizer, based on effective_io_concurrency, the size of the
   relation and parallel_seqscan_degree, decide how many workers can be used
   for executing the plan (a rough sketch of this heuristic is at the end of
   this mail):
   - Choose number_of_workers equal to effective_io_concurrency if it is
     less than parallel_seqscan_degree, else number_of_workers will be equal
     to parallel_seqscan_degree.
   - If the size of the relation is greater than number_of_workers GB (if
     number_of_workers is 8, then we need to compare the size of the
     relation with 8GB), then keep number_of_workers intact and distribute
     the remaining chunks/segments during execution; else reduce
     number_of_workers such that each worker gets 1GB to operate on.
   - If the size of the relation is less than 1GB, then we can either not
     choose parallel_seqscan at all, or use smaller chunks, or use the
     block-by-block approach.
   - Here we also need to consider other parameters like parallel_setup,
     parallel_startup and tuple_communication cost.

b. In the executor, if fewer workers are available than are required for
   statement execution, redistribute the remaining work among the available
   workers.

Performance Data:
Before the first run for each worker count, I have executed drop_caches to
clear the OS cache and restarted the server, so we can assume that except
for Run-1, all other runs have some caching effect.

Fixed-Chunks (time in ms)

No. of workers      0        8       16       32
Run-1          322822   245759   330097   330002
Run-2          275685   275428   301625   286251
Run-3          252129   244167   303494   278604
Run-4          252528   259273   250438   258636
Run-5          250612   242072   235384   265918

Block-By-Block (time in ms)

No. of workers      0        8       16       32
Run-1          323084   341950   338999   334100
Run-2          310968   349366   344272   322643
Run-3          250312   336227   346276   322274
Run-4          262744   314489   351652   325135
Run-5          265987   316260   342924   319200

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
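
To make the heuristic in (a) concrete, below is a minimal, self-contained C
sketch of the decision logic only. The names (choose_parallel_workers,
rel_size_bytes, and the example values in main) are assumptions made purely
for illustration and do not correspond to the actual patch; the sketch also
ignores the parallel_setup, parallel_startup and tuple_communication costs
mentioned above.

#include <stdio.h>
#include <stdint.h>

#define GB ((uint64_t) 1024 * 1024 * 1024)

/*
 * Pick the number of workers for a parallel sequential scan, following the
 * steps listed in (a).  Returns 0 when the relation is smaller than 1GB,
 * i.e. when we would either skip parallel_seqscan or fall back to smaller
 * chunks / block-by-block (not modelled here).
 */
static int
choose_parallel_workers(uint64_t rel_size_bytes,
                        int effective_io_concurrency,
                        int parallel_seqscan_degree)
{
    int nworkers;

    if (rel_size_bytes < GB)
        return 0;

    /* number_of_workers = min(effective_io_concurrency, parallel_seqscan_degree) */
    if (effective_io_concurrency < parallel_seqscan_degree)
        nworkers = effective_io_concurrency;
    else
        nworkers = parallel_seqscan_degree;

    /*
     * If the relation is smaller than number_of_workers GB, shrink the
     * worker count so that each worker gets roughly 1GB to operate on;
     * otherwise keep it intact and distribute the remaining 1GB chunks
     * during execution.
     */
    if (rel_size_bytes < (uint64_t) nworkers * GB)
        nworkers = (int) (rel_size_bytes / GB);

    return nworkers;
}

int
main(void)
{
    /* 12GB relation, effective_io_concurrency = 16, degree = 32 -> 12 workers */
    printf("%d\n", choose_parallel_workers(12 * GB, 16, 32));

    /* 100GB relation, effective_io_concurrency = 8, degree = 32 -> 8 workers,
     * with the remaining 1GB chunks handed out during execution */
    printf("%d\n", choose_parallel_workers(100 * GB, 8, 32));

    return 0;
}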