On Fri, Mar 6, 2015 at 12:41 PM, David Rowley <dgrow...@gmail.com> wrote:
> On 6 March 2015 at 19:01, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com>
> wrote:
>
>> Postgres-XC solved this question by creating a plan with two Agg/Group
>> nodes, one for combining transitioned results and one for creating the
>> distributed transition results (one per distributed run per group).
>>
>> So, the Agg/Group node for combining results had as many Agg/Group nodes
>> below it as there are distributed/parallel runs.
>>
>
> This sounds as though the planner must be forcing the executor to execute
> the plan on a fixed number of worker processes.
>
> I really hoped that we could, one day, have a load monitor process that
> decided what might be the best number of threads to execute a parallel plan
> on. Otherwise, how would we decide how many worker processes to allocate to
> a plan? Surely there must be times when utilising only half of the
> processors for a query would be better than trying to use all processors
> and having many more context switches to perform.
>
> Probably the harder part of dynamically deciding the number of workers
> would be the costing. For example, a plan might execute fastest with 32
> workers, but if it were given only 2 workers, it might run better as a
> non-parallel plan.
>

XC does that because, while creating the plan, it knew how many nodes it had
to distribute the aggregation across. To keep that dynamic, we can add a
placeholder planner node for producing transitioned results for a given
distributed run. At execution time, that node creates one executor node
(corresponding to the placeholder node) per parallel run. I haven't seen a
precedent in the PG code for creating more than one executor node from a
given planner node, but is that a rule?

>
>> But XC chose this way to reduce the code footprint. In Postgres, we can
>> have different nodes for combining and transitioning, as you have specified
>> above.
>> Aggregation is not pathified in the current planner, hence XC took the
>> approach of pushing the Agg nodes down the plan tree when
>> distributed/parallel execution was possible. If we can get aggregation
>> pathified, we can take a path-based approach, which might give a better
>> judgement of whether or not to distribute the aggregates at all.
>>
>> Looking at Postgres-XC might be useful to get ideas. I can help you there.
>
> Regards
>
> David Rowley
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company