Re: [HACKERS] Parallel Aggregate

David Rowley Tue, 20 Oct 2015 03:24:09 -0700

On 13 October 2015 at 20:57, Haribabu Kommi <[email protected]>
wrote:


> On Tue, Oct 13, 2015 at 5:53 PM, David Rowley
> <[email protected]> wrote:
> > On 13 October 2015 at 17:09, Haribabu Kommi <[email protected]>
> > wrote:
> >>
> >> On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <[email protected]>
> >> wrote:
> >> > Also, I think the path for parallel aggregation should probably be
> >> > something like FinalizeAgg -> Gather -> PartialAgg -> some partial
> >> > path here.  I'm not clear whether that is what you are thinking or
> >> > not.
> >>
> >> No. I am thinking of the following way.
> >> Gather->partialagg->some partial path
> >>
> >> I want the Gather node to merge the results coming from all workers;
> >> otherwise it may be difficult to merge them at the parent of the
> >> Gather node. If the partial group aggregate is under the Gather node
> >> and any two workers return data for the same group key, we need to
> >> compare those rows and combine them into a single group. At the
> >> Gather node, we can wait until we get slots from all workers. Once
> >> all workers have returned their slots, we can compare and merge the
> >> necessary slots and return the result. Am I missing something?
> >
> >
> > My assumption is the same as Robert's here.
> > Unless I've misunderstood, it sounds like you're proposing to add logic
> > into the Gather node to handle final aggregation? That sounds like a
> > modularity violation of the whole node concept.
> >
> > The handling of the final aggregate stage is not all that different
> > from the initial aggregate stage. The primary difference is just that
> > you're calling the combine function instead of the transition
> > function, and the values
>
> Yes, you are correct. Until now I had been thinking of using transition
> types as the approach; that is the only reason I proposed having the
> Gather node handle the final aggregation.
>
> > being aggregated are aggregate states rather than the type of the
> > values which were initially aggregated. The handling of GROUP BY is all
> > the same, yet you only apply the HAVING clause during final aggregation.
> > This is why I ended up implementing this in nodeAgg.c instead of
> > inventing some new node type that's mostly a copy and paste of
> > nodeAgg.c [1]
>
> After going through your Partial Aggregation / GROUP BY before JOIN patch,
> the following is my understanding of parallel aggregate.
>
> Finalize [hash] aggregate
>         -> Gather
>               -> Partial [hash] aggregate
>
> The data that comes from the Gather node contains the group key and the
> grouping results. Based on these, the finalize aggregate can build
> another hash table (in the hash aggregate case) and return the final
> results. This approach works for both plain and hash aggregates.
>
> For group aggregate support of parallel aggregate, the plan should be
> as follows.
>
> Finalize Group aggregate
>     -> Sort
>         -> Gather
>               -> Partial group aggregate
>                    -> Sort
>
> The data that comes from the Gather node needs to be sorted again on the
> grouping key; the finalize step then merges it and generates the final
> grouping result.
>
> With this approach, we do not need to change anything in the Gather node.
> Is my understanding correct?
>
>
Our understandings are aligned.
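
To make the plan shapes above concrete, here is a rough sketch in Python
rather than executor code. It is not taken from the patch. The avg()
aggregate with a (sum, count) state and every function name below are
assumptions chosen for illustration. It only tries to show that Gather
passes rows through untouched, that the finalize stage calls the combine
function on aggregate states, and that HAVING is applied only at the end.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical aggregate for illustration: avg(x) with state (sum, count).
INITIAL = (0, 0)

def transition(state, value):
    """Transition function: fold one input value into a state."""
    return (state[0] + value, state[1] + 1)

def combine(a, b):
    """Combine function: merge two aggregate states."""
    return (a[0] + b[0], a[1] + b[1])

def final(state):
    """Final function: turn a state into the aggregate result."""
    return state[0] / state[1]

def partial_hash_agg(rows):
    """Partial [hash] aggregate, run by each worker on its own rows."""
    table = defaultdict(lambda: INITIAL)
    for key, value in rows:
        table[key] = transition(table[key], value)
    return list(table.items())

def gather(worker_outputs):
    """Gather: concatenates worker output and holds no aggregate logic."""
    for output in worker_outputs:
        yield from output

def finalize_hash_agg(states, having=lambda result: True):
    """Finalize [hash] aggregate: second hash table keyed by group key."""
    table = {}
    for key, state in states:
        table[key] = combine(table[key], state) if key in table else state
    results = {key: final(state) for key, state in table.items()}
    # HAVING is evaluated only here, on the final result.
    return {key: res for key, res in results.items() if having(res)}

def finalize_group_agg(states, having=lambda result: True):
    """Finalize Group aggregate: sort on the key, merge adjacent states."""
    ordered = sorted(states, key=itemgetter(0))  # the Sort above Gather
    results = {}
    for key, group in groupby(ordered, key=itemgetter(0)):
        state = INITIAL
        for _, partial_state in group:
            state = combine(state, partial_state)
        result = final(state)
        if having(result):
            results[key] = result
    return results

if __name__ == "__main__":
    worker1 = [("a", 1), ("b", 10), ("a", 3)]
    worker2 = [("a", 5), ("c", 7)]
    partials = [partial_hash_agg(worker1), partial_hash_agg(worker2)]
    print(finalize_hash_agg(gather(partials)))
    print(finalize_group_agg(gather(partials), having=lambda avg: avg > 5))
```

Group "a" shows up in both workers, so its two states only meet in the
finalize stage, and gather() never has to look at a group key. For brevity
the sketch feeds the group variant from the hash-based partial step; in the
sorted plan each worker would sort before its partial aggregate instead.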

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
