> On Mar 27, 2018, at 7:58 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> David Rowley <david.row...@2ndquadrant.com> writes:
>> On 27 March 2018 at 13:26, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
>>> synchronized_seqscans is another piece of precedent in the area, FWIW.
> 
>> This is true. I guess the order of aggregation could be made more
>> certain if we remove the cost based optimiser completely, and just
>> rely on a syntax based optimiser.
> 
> None of this is responding to my point.  I think the number of people
> who actually don't care about aggregation order for these aggregates
> is negligible, and none of you have argued against that; you've instead
> selected straw men to attack.

I frequently care about the order, but only to the extent that the order
is stable between aggregates of several different columns, along the lines
of:

select array_agg(a) AS x, array_agg(b) AS y
  from generate_a_b_func(foo);

I don't care which order the data is in, as long as x[i] and y[i] are
matched correctly.  It sounds like this patch would force me to write
that as, for example:

select array_agg(a order by a, b) AS x, array_agg(b order by a, b) AS y
  from generate_a_b_func(foo);

which I did not need to do before.  I would expect a performance regression
from the two newly required sorts.  So in that case I agree with Tom.
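
To make the requirement concrete, here is a toy Python model. It is not
PostgreSQL's executor: the round-robin worker split, the combine step, and
every name in it are assumptions made purely for illustration. It shows that
the x[i]/y[i] pairing survives any combine order as long as both aggregates
are combined in the same order, and breaks only if the two are combined in
different orders. Whether a real parallel plan could ever do the latter is
not something this sketch establishes.

```python
from itertools import permutations

# Hypothetical stand-in for the rows returned by generate_a_b_func(foo).
rows = [(1, "one"), (2, "two"), (3, "three"), (4, "four")]


def split_into_chunks(data, workers):
    """Deal rows out round-robin, as if to `workers` parallel workers."""
    return [data[i::workers] for i in range(workers)]


def combine(partials, order):
    """Concatenate per-worker partial arrays in the given worker order."""
    out = []
    for worker in order:
        out.extend(partials[worker])
    return out


def pairs_match(x, y):
    """True when every (x[i], y[i]) is one of the original rows."""
    return len(x) == len(rows) and sorted(zip(x, y)) == sorted(rows)


chunks = split_into_chunks(rows, 2)
# Each worker builds a partial array per aggregate from the same rows.
partial_a = [[a for a, _ in chunk] for chunk in chunks]
partial_b = [[b for _, b in chunk] for chunk in chunks]

# Same combine order for both aggregates: pairing holds for every order.
same_order_ok = all(
    pairs_match(combine(partial_a, order), combine(partial_b, order))
    for order in permutations(range(len(chunks)))
)

# Different combine orders for the two aggregates: pairing is lost.
mixed_order_ok = pairs_match(
    combine(partial_a, (0, 1)), combine(partial_b, (1, 0))
)

print("same order keeps pairs:", same_order_ok)
print("mixed order keeps pairs:", mixed_order_ok)
```

In this model the absolute order of the output is irrelevant; only the
consistency between the two aggregates matters, which is the property
described above.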

But, like others here, I also want the parallel aggregation functionality.
Could we perhaps introduce some option for the aggregate to force it to be
stable?  Something like:

select array_agg(a order undisturbed) AS x, array_agg(b order undisturbed) AS y
  from generate_a_b_func(foo);

which would not perform an extra sort operation but would guarantee not to
disturb the pre-existing order coming from generate_a_b_func(foo)?


I don't care about the syntax / keywords in the example above.  I'm just
looking to get the benefits of parallel aggregation when I don't care
about ordering, while preserving the order in those cases where it matters.

mark 