Re: [HACKERS] Parallel Seq Scan

Amit Kapila Tue, 30 Jun 2015 22:38:16 -0700

On Tue, Jun 30, 2015 at 4:00 AM, Jeff Davis <[email protected]> wrote:
>
> [Jumping in without catching up on entire thread.


No problem.

> Please let me know
> if these questions have already been covered.]
>
> 1. Can you change the name to something like ParallelHeapScan?
> Parallel Sequential is a contradiction. (I know this is bikeshedding
> and I won't protest further if you keep the name.)
>

For what you are asking to change name for?
We have two nodes in patch (Funnel and PartialSeqScan). Funnel is
the name given to node because it is quite generic and can be
used in multiple ways (other than plain parallel sequiantial scan)
and other node is named as PartialSeqScan because it is used
for doing the part of sequence scan.

> 2. Where is the speedup coming from? How much of it is CPU and IO
> overlapping (i.e. not leaving disk or CPU idle while the other is
> working), and how much from the CPU parallelism? I know this is
> difficult to answer rigorously, but it would be nice to have some
> breakdown even if for a specific machine.
>

Yes, you are right and we have done quite some testing (on the hardware
available) with this patch (with different approaches) to see how much
difference it creates for IO and CPU, with respect to IO we have found
that it doesn't help much [1], though it helps when the data is cached
and there are really good benefits in terms of CPU [2].

In terms of completeness, I think we should add some documentation
for this patch, one way is to update about the execution mechanism in
src/backend/access/transam/README.parallel and then explain about
new configuration knobs in documentation (.sgml files).  Also we
can have a separate page in itself in documentation under Server
Programming Section (Parallel Query -> Parallel Scan;
Parallel Scan Examples; ...)

Another thing to think about this patch at this stage do we need to
breakup this patch and if yes, how to break it up into multiple patches,
so that it can be easier to complete the review. I could see that it
can be splitted into 2 or 3 patches.
a. Infrastructure for parallel execution, like some of the stuff in
execparallel.c, heapam.c,tqueue.c, etc and all other generic
(non-nodes specific) code.
b. Nodes (Funnel and PartialSeqScan) specific code for optimiser
and executor.
c. Documentation

Suggestions?

[1] -
http://www.postgresql.org/message-id/caa4ek1jhcmn2x1ljq4bomlapt+btouid5vqqk5g6ddfv69i...@mail.gmail.com
[2] - Refer slides 14-15 for the presentation in PGCon, I can repost the
data here if required.
https://www.pgcon.org/2015/schedule/events/785.en.html


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Parallel Seq Scan

Reply via email to