On Sat, Jul 20, 2019 at 12:21:01PM -0400, James Coleman wrote:
On Sat, Jul 20, 2019 at 12:12 PM James Coleman <jtc...@gmail.com> wrote:

On Sat, Jul 20, 2019 at 11:25 AM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
...
> >My current line of investigation is whether we need to do anything in
> >the parallel portion of create_ordered_paths(). I noticed that the
> >first-pass patch adding generate_useful_gather_paths() modified that
> >section but wasn't actually adding any new gather-merge paths (just
> >bare incremental sort paths). That seems pretty clearly just a
> >prototype miss, so I modified the prototype to build gather-merge
> >paths instead (as a side note that change seems to fix an oddity I was
> >seeing where plans would include a parallel index scan node even
> >though they weren't parallel plans). While the resulting plan for
> >something like:
> >
>
> Yes, that seems to be a bug. The block above it clealy has a gather
> merge nodes, so this one should too.
>
> >explain analyze select * from t where t.a in (1,2,3,4,5,6) order by
> >t.a, t.b limit 50;
> >
> >changes cost (to be cheaper) ever so slightly with the gather-merge
> >addition to create_ordered_paths(), the plan itself is otherwise
> >identical (including row estimates):
> >
> >Limit
> >  -> Gather Merge
> >       -> Incremental Sort
> >          -> Parallel Index Scan
> >
> >(Note: I'm forcing parallel plans here with: set
> >max_parallel_workers_per_gather=4; set min_parallel_table_scan_size=0;
> >set parallel_tuple_cost=0; set parallel_setup_cost=0; set
> >min_parallel_index_scan_size=0;)
> >
> >I can't seem to come up with a case where adding these gather-merge
> >paths in create_ordered_paths() isn't entirely duplicative of paths
> >already created by generate_useful_gather_paths() as called from
> >apply_scanjoin_target_to_paths() -- which I _think_ makes sense given
> >that both apply_scanjoin_target_to_paths() and create_ordered_paths()
> >are called by grouping_planner().
> >
> >Can you think of a case I'm missing here that would make it valuable
> >to generate new parallel plans in create_ordered_paths()?
> >
>
> Good question. Not sure. I think such path would need to do something on
> a relation that is neither a join nor a scan - in which case the path
> should not be created by apply_scanjoin_target_to_paths().
>
> So for example a query like this:
>
>   SELECT
>       a, b, sum(expensive_function(c))
>   FROM
>       t
>   GROUP BY a,b
>   ORDER BY a,sum(...) LIMIT 10;
>
> should be able to produce a plan like this:
>
>   -> Limit
>       -> Gather Merge
>         -> Incremental Sort (pathkeys: a, sum)
>           -> Group Aggregate
>              a, b, sum(expensive_function(c))
>             -> Index Scan (pathkeys: a, b)
>
> or something like that, maybe. I haven't tried this, though. The other
> question is whether such queries are useful in practice ...

Hmm, when I step through on that example input_rel->partial_pathlist
!= NIL is false, so we don't even attempt to consider any extra
parallel paths in create_ordered_paths(). Nonetheless we get a
parallel plan, but with a different shape:

 Limit
   ->  Incremental Sort
         Sort Key: a, (sum(expensive_function(c)))
         Presorted Key: a
         ->  Finalize GroupAggregate
               Group Key: a, b
               ->  Gather Merge
                     Workers Planned: 4
                     ->  Partial GroupAggregate
                           Group Key: a, b
                           ->  Sort
                                 Sort Key: a, b
                                 ->  Parallel Seq Scan on t

(or if I disable seqscan then the sort+seq scan above becomes inc sort
+ index scan)

To be honest, I don't think I understand how you would get a plan like
that anyway since the index here only has "a" and so can't provide
(pathkeys: a, b).

Perhaps there's something I'm still missing though.

I wasn't stricly adhering to the example we used before, and I imagined
there would be an index on (a,b). Sorry if that wasn't clear.


Also just realized I don't think (?) we can order by the sum inside a
gather-merge -- at least not without having another sort above the
parallel portion? Or is the group aggregate able to also provide
ordering on the final sum after aggregating the partial sums?


Yes, you're right - an extra sort node would be necessary. But I don't
think that changes the idea behind that example. The question is whether
the extra sorts below the gather would be cheaper than doing a large sort
on top of it - but I don't see why wouldn't that be the case, and if we
only need a couple of rows from the beginning ...


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Reply via email to