On Fri, Mar 6, 2015 at 12:41 PM, David Rowley <dgrow...@gmail.com> wrote:

> On 6 March 2015 at 19:01, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com>
> wrote:
>
>> Postgres-XC solved this question by creating a plan with two Agg/Group
>> nodes, one for combining the transitioned results and one for creating the
>> distributed transition results (one per distributed run per group).
>>
>
>
>> So, the Agg/Group node for combining results had as many Agg/Group nodes
>> as there are distributed/parallel runs.
>>
>
> This sounds as though the planner must force the executor to run the plan
> on a fixed number of worker processes.
>
> I really hoped that we could, one day, have a load monitor process that
> decided what might be the best number of threads to execute a parallel plan
> on. Otherwise how would we decide how many worker processes to allocate to
> a plan? Surely there must be times where only utilising half of the
> processors for a query would be better than trying to use all processors
> and having many more context switches to perform.
>
> Probably the harder part about dynamically deciding the number of workers
> would be the costing. Maybe the plan will execute the fastest with 32
> workers, but if it were given only 2 workers then it might execute
> better as a non-parallel plan.
>

XC does that because it knew, while creating the plan, how many nodes it
had to distribute the aggregation across. To keep that dynamic, we can add a
place-holder planner node for producing transitioned results for a given
distributed run. At execution time, that node creates one executor
node (corresponding to the place-holder node) per parallel run. I haven't
seen a precedent in the PG code for creating more than one executor node
for a given planner node, but is that a rule?
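
A minimal sketch of that idea, written in Python rather than PG's C purely for illustration. Every name here (PartialAggPlan, partial_agg, execute, nworkers) is hypothetical and not taken from Postgres or Postgres-XC. It assumes an avg()-style aggregate whose transition state is a (sum, count) pair, and it shows one place-holder plan node being expanded into as many partial-aggregation runs as the executor decides on, with a single combining step on top.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Sequence, Tuple

# Transition state for an avg()-like aggregate: (running sum, row count).
State = Tuple[float, int]
Row = Tuple[Hashable, float]


def transition(state: State, value: float) -> State:
    """Fold one input value into a transition state."""
    return (state[0] + value, state[1] + 1)


def combine(a: State, b: State) -> State:
    """Merge two transition states produced by different runs."""
    return (a[0] + b[0], a[1] + b[1])


def finalize(state: State) -> float:
    """Turn a fully combined transition state into the final result."""
    return state[0] / state[1]


@dataclass
class PartialAggPlan:
    """Hypothetical place-holder plan node.

    It records which columns to group and aggregate, but not how many
    parallel runs there will be.
    """
    group_col: int
    value_col: int


def partial_agg(plan: PartialAggPlan, rows: Sequence[Row]) -> Dict[Hashable, State]:
    """One executor node: one transition state per group for a single run."""
    states: Dict[Hashable, State] = {}
    for row in rows:
        key = row[plan.group_col]
        states[key] = transition(states.get(key, (0.0, 0)), row[plan.value_col])
    return states


def execute(plan: PartialAggPlan, rows: Sequence[Row], nworkers: int) -> Dict[Hashable, float]:
    """Expand the place-holder into nworkers runs, then combine their states."""
    # The number of runs is chosen here, at execution time, not in the plan.
    chunks: List[Sequence[Row]] = [rows[i::nworkers] for i in range(nworkers)]
    partials = [partial_agg(plan, chunk) for chunk in chunks]

    combined: Dict[Hashable, State] = {}
    for partial in partials:
        for key, state in partial.items():
            combined[key] = combine(combined[key], state) if key in combined else state
    return {key: finalize(state) for key, state in combined.items()}
```

In this sketch the same plan object gives the same answer whether execute() is called with 1, 2 or 32 runs, which is the property the place-holder node is meant to provide. The runs here are executed sequentially; real worker processes and the costing question are left out.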


>
>
>> But XC chose this way to reduce the code footprint. In Postgres, we can
>> have different nodes for combining and transitioning as you have specified
>> above. Aggregation is not pathified in the current planner, hence XC took
>> the approach of pushing the Agg nodes down the plan tree when
>> distributed/parallel execution was possible. If we can get aggregation
>> pathified, we can go with a path-based approach, which might give a better
>> judgement of whether or not to distribute the aggregates themselves.
>>
>> Looking at Postgres-XC might be useful to get ideas. I can help you there.
>>
>>
>
>  Regards
>
> David Rowley
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
