> 
> This allows us to give presorted input to both aggregates in the following
> case:
> 
> SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ...
> 
> but just the first agg in this one:
> 
> SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ...

I don't know if it's acceptable, but in the case where you add both an 
aggregate with an ORDER BY clause, and another aggregate without the clause, 
the output for the unordered one will change and use the same ordering, maybe 
suprising the unsuspecting user. Would that be acceptable ?

> When testing the performance of all this I found that when a suitable
> index exists to provide pre-sorted input for the aggregation that the
> performance does improve. Unfortunately, it looks like things get more
> complex when no index exists.  In this case, since we're setting
> pathkeys to tell the planner we need a plan that provides pre-sorted
> input to the aggregates, the planner will add a sort below the
> aggregate node.  I initially didn't see any problem with that as it
> just moves the sort to a Sort node rather than having it done
> implicitly inside nodeAgg.c.  The problem is, it just does not perform
> as well.  I guess this is because when the sort is done inside
> nodeAgg.c that the transition function is called in a tight loop while
> reading records back from the tuplestore.  In the patched version,
> there's an additional node transition in between nodeAgg and nodeSort
> and that causes slower performance.  For now, I'm not quite sure what
> to do about that.  We set the plan pathkeys well before we could
> possibly decide if asking for pre-sorted input for the aggregates
> would be a good idea or not.

I was curious about the performance implication of that additional transition, 
and could not reproduce a signifcant difference. I may be doing something 
wrong: how did you highlight it ?

Regards,

--
Ronan Dunklau




Reply via email to