On 13 October 2015 at 20:57, Haribabu Kommi <kommi.harib...@gmail.com> wrote:
> On Tue, Oct 13, 2015 at 5:53 PM, David Rowley
> <david.row...@2ndquadrant.com> wrote:
> > On 13 October 2015 at 17:09, Haribabu Kommi <kommi.harib...@gmail.com>
> > wrote:
> >>
> >> On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmh...@gmail.com>
> >> wrote:
> >> > Also, I think the path for parallel aggregation should probably be
> >> > something like FinalizeAgg -> Gather -> PartialAgg -> some partial
> >> > path here. I'm not clear whether that is what you are thinking or
> >> > not.
> >>
> >> No. I am thinking of the following way.
> >> Gather->partialagg->some partial path
> >>
> >> I want the Gather node to merge the results coming from all workers,
> >> otherwise it may be difficult to merge at the parent of the Gather node.
> >> Because in case the partial group aggregate is under the Gather node,
> >> if any two workers return the same group key data, we need to compare
> >> them and combine them into a single group. If we are at the Gather
> >> node, it is possible that we can wait till we get slots from all
> >> workers. Once all workers return the slots we can compare and merge
> >> the necessary slots and return the result. Am I missing something?
> >
> >
> > My assumption is the same as Robert's here.
> > Unless I've misunderstood, it sounds like you're proposing to add logic
> > into the Gather node to handle final aggregation? That sounds like a
> > modularity violation of the whole node concept.
> >
> > The handling of the final aggregate stage is not all that different from
> > the initial aggregate stage. The primary difference is just that you're
> > calling the combine function instead of the transition function, and the
> > values
>
> Yes, you are correct. Until now I was thinking of using transition types
> as the approach; for that reason only I proposed that the Gather node
> handle the finalize aggregation.
>
> > being aggregated are aggregate states rather than the type of the values
> > which were initially aggregated. The handling of GROUP BY is all the
> > same, yet you only apply the HAVING clause during final aggregation.
> > This is why I ended up implementing this in nodeAgg.c instead of
> > inventing some new node type that's mostly a copy and paste of
> > nodeAgg.c [1]
>
> After going through your Partial Aggregation / GROUP BY before JOIN patch,
> the following is my understanding of parallel aggregate.
>
> Finalize [hash] aggregate
>     -> Gather
>         -> Partial [hash] aggregate
>
> The data that comes from the Gather node contains the group key and
> grouping results. Based on these we can generate another hash table in
> case of hash aggregate at finalize aggregate and return the final
> results. This approach works for both plain and hash aggregates.
>
> For group aggregate support of parallel aggregate, the plan should be
> as follows.
>
> Finalize Group aggregate
>     -> sort
>         -> Gather
>             -> Partial group aggregate
>                 -> sort
>
> The data that comes from the Gather node needs to be sorted again based
> on the grouping key, then merged to generate the final grouping result.
>
> With this approach, we do not need to change anything in the Gather node.
> Is my understanding correct?

Our understandings are aligned.

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
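
For anyone following along, the plan shape agreed on in this thread (Finalize aggregate -> Gather -> Partial aggregate) can be illustrated with a small Python sketch. This is only a toy model, not PostgreSQL code: the avg() aggregate with a (sum, count) state and every function name below are assumptions made up for the illustration. It shows the two points discussed: the Gather step only passes tuples through, and the final stage calls a combine function on aggregate states and applies HAVING there.

```python
from collections import defaultdict

# Hypothetical aggregate for illustration: avg(x) with state (sum, count).
def transition(state, value):
    """Fold one input value into an aggregate state (partial stage)."""
    total, count = state
    return (total + value, count + 1)

def combine(left, right):
    """Merge two aggregate states (final stage)."""
    return (left[0] + right[0], left[1] + right[1])

def final(state):
    """Turn an aggregate state into the result value."""
    total, count = state
    return total / count

def partial_agg(rows):
    """Runs in each worker: group (key, value) rows via the transition function."""
    states = defaultdict(lambda: (0, 0))
    for key, value in rows:
        states[key] = transition(states[key], value)
    return list(states.items())

def gather(worker_outputs):
    """Only concatenates worker output; it contains no aggregation logic."""
    for output in worker_outputs:
        yield from output

def finalize_agg(partials, having=lambda key, result: True):
    """Hash-style final stage: combine states per group key, then apply HAVING."""
    states = {}
    for key, state in partials:
        states[key] = combine(states[key], state) if key in states else state
    results = {key: final(state) for key, state in states.items()}
    return {key: res for key, res in results.items() if having(key, res)}

# Two workers that both see rows for groups "a" and "b".
workers = [
    [("a", 1), ("b", 10), ("a", 3)],
    [("a", 5), ("b", 20)],
]
result = finalize_agg(gather(partial_agg(w) for w in workers))
```

Both workers emit a state for group "a", and the duplicate keys are merged in finalize_agg rather than in gather, which is the modularity point made above. A sorted (group aggregate) variant would sort the gathered states on the key before combining instead of using a dictionary.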