Hello,

>Agreed that it makes sense to consider only the number of pages to
>scan for the computation of parallel workers.  I think for index scan we
>should consider both index and heap pages that need to be scanned
>(the costing of an index scan considers both index and heap pages).  I think
>where considering heap pages matters more is when the finally selected
>rows are scattered across heap pages or we need to apply a filter on
>rows after fetching from the heap.  OTOH, we can consider just pages
>in the index as that is where the parallelism mainly works.
IMO, considering just the index pages will give a better estimate of the
work to be done in parallel, since the number of pages divided amongst
the workers is independent of the number of heap pages scanned.
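
For reference, the page-count scaling in compute_parallel_worker() looks
roughly like the sketch below (a simplified, self-contained sketch, not
the exact source; the real function also caps the result at
max_parallel_workers_per_gather).  Since each additional worker requires
three times as many pages, whichever page count we pass in directly
drives the worker estimate:

    #include <limits.h>

    typedef unsigned int BlockNumber;   /* as in PostgreSQL's block.h */
    #define Max(x, y)   ((x) > (y) ? (x) : (y))

    static int
    compute_workers_sketch(BlockNumber pages, int min_parallel_relation_size)
    {
        int         parallel_workers = 1;
        int         parallel_threshold;

        /* Below the threshold, don't launch any workers at all. */
        if (pages < (BlockNumber) min_parallel_relation_size)
            return 0;

        /* Each additional worker requires three times as many pages. */
        parallel_threshold = Max(min_parallel_relation_size, 1);
        for (;;)
        {
            if (parallel_threshold > INT_MAX / 3)
                break;          /* avoid integer overflow */
            parallel_threshold *= 3;
            if (pages < (BlockNumber) parallel_threshold)
                break;
            parallel_workers++;
        }
        return parallel_workers;
    }

With the default min_parallel_relation_size of 1024 pages (8MB), for
instance, a count of 10000 pages yields 3 workers, while a scan expected
to touch only a handful of pages yields none.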

Thank you,
Rahila Syed

On Mon, Jan 30, 2017 at 2:52 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Sat, Jan 28, 2017 at 1:34 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> >> parallel_index_opt_exec_support_v6 - Removed the usage of
> >> GatherSupportsBackwardScan.  Expanded the comments in
> >> ExecReScanIndexScan.
> >
> > I looked through this and in general it looks reasonable to me.
> > However, I did notice one thing that I think is wrong.  In the
> > parallel bitmap heap scan patch, the second argument to
> > compute_parallel_worker() is the number of pages that the parallel
> > scan is expected to fetch from the heap.  In this patch, it's the
> > total number of pages in the index.  The former seems to me to be
> > better, because the point of having a threshold relation size for
> > parallelism is that we don't want to use a lot of workers to scan a
> > small number of pages -- the distribution of work won't be even, and
> > the potential savings are limited.  If we've got a big index but are
> > using a very selective qual to pull out only one or a small number of
> > rows on a single page or a small handful of pages, we shouldn't
> > generate a parallel path for that.
> >
>
> Agreed that it makes sense to consider only the number of pages to
> scan for the computation of parallel workers.  I think for index scan we
> should consider both index and heap pages that need to be scanned
> (the costing of an index scan considers both index and heap pages).  I think
> where considering heap pages matters more is when the finally selected
> rows are scattered across heap pages or we need to apply a filter on
> rows after fetching from the heap.  OTOH, we can consider just pages
> in the index as that is where the parallelism mainly works.  In the
> attached patch (parallel_index_opt_exec_support_v7.patch), I have
> considered both index and heap pages; let me know if you think some
> other way is better.  I have also prepared a separate, independent
> patch (compute_index_pages_v1) on HEAD to compute the index pages, which
> can be used by the parallel index scan.  There is no change in the
> parallel_index_scan (parallel btree scan) patch, so I am not attaching
> a new version of it.
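
To make the two options concrete, here are hypothetical calls against the
sketch above (the actual v7 patch may combine the counts differently):

    /* Option A: size workers on the index pages alone. */
    workers = compute_workers_sketch(index_pages, min_parallel_relation_size);

    /* Option B: account for heap fetches by counting both. */
    workers = compute_workers_sketch(index_pages + heap_pages,
                                     min_parallel_relation_size);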
>
> > Now, against that theory, the GUC that controls the behavior of
> > compute_parallel_worker() is called min_parallel_relation_size, which
> > might make you think that the decision is supposed to be based on the
> > whole size of some relation.  But I think that just means we need to
> > rename the GUC to something like min_parallel_scan_size.  Possibly we
> > also ought to consider reducing the default value somewhat, because it
> > seems like both sequential and index scans can benefit even when
> > scanning less than 8MB.
> >
>
> Agreed, but let's consider it separately.
>
>
> The order in which the patches need to be applied:
> compute_index_pages_v1.patch, parallel_index_scan_v6.patch[1],
> parallel_index_opt_exec_support_v7.patch
>
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1J%3DLSBpDx7i_izGJxGVUryqPe-2SKT02De-PrQvywiMxw%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
