Re: [GENERAL] Do AGGREGATES consistently use sort order?

2007-09-06 Thread Gregory Stark

"Chris Browne" <[EMAIL PROTECTED]> writes:

> [EMAIL PROTECTED] (Gregory Stark) writes:
>
>> You can even do this with GROUP BY as long as the leading columns of
>> the ORDER BY inside the subquery exactly matches the GROUP BY
>> columns.
...
> Is there not some risk that the query planner might choose to do
> hash-based accumulation could discard the subquery's ordering?

This is an interesting meme. Every time this topic comes up people always
think it's hash aggregates that risk destroying the ordering. I suppose
because usually the fear is that you can't iterate through hash keys in the
same order you created them.

However the ordering we're concerned with here isn't the order of the hash
keys. It's the order in which the elements making up the aggregate are applied
to each key. That order is always going to be the order in which the values
are seen.

In fact hash aggregates aren't the question at all; they're pretty much always
going to work. There's no reason for hash aggregates to change the order in
which individual data for a given hash key are processed. That's the whole
advantage of hash aggregates, they don't need to be pre-sorted. So they'll
always see the data in the order the subquery provides them.

The dangerous case is *non* hash aggregates. Regular sorted aggregates need to
have their inputs sorted so Postgres has to go out of its way to check for a
pre-existing matching ordering and avoid re-sorting the data. If it re-sorted
the inputs according to just the GROUP BY key it would destroy the
pre-existing order and the aggregate would see the data for an individual
group by key in an arbitrary order. 

(In fact, it's worse, it would work sometimes and not other times depending on
which sort algorithm was used because in-memory we use qsort which is not
stable but on-disk we use mergesort which is.)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Do AGGREGATES consistently use sort order?

2007-09-06 Thread Tom Lane
Chris Browne <[EMAIL PROTECTED]> writes:
> [EMAIL PROTECTED] (Gregory Stark) writes:
>> In theory we can't promise anything about future versions of
>> Postgres but there are lots of people doing this already so if ever
>> this was lost there would probably be some new explicit way to
>> achieve the same thing.

> Is there not some risk that the query planner might choose to do
> hash-based accumulation could discard the subquery's ordering?

Hash aggregation doesn't change the order in which input rows are fed to
any one aggregate; it only results in nominally-independent aggregate
functions being computed in parallel.  If you tried to write an
aggregate function that had hidden private state and wasn't re-entrant,
it would likely fail under hash aggregation; but aggregate functions
that "play by the rules" by keeping all their state in the declared
state object will work fine.

The long and the short of it is that we are very well aware that people
depend on this behavior, and we are not likely to break it.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Do AGGREGATES consistently use sort order?

2007-09-06 Thread Chris Browne
[EMAIL PROTECTED] (Gregory Stark) writes:
> "Webb Sprague" <[EMAIL PROTECTED]> writes:
>
>> I can always count on (note the order name):
>>
>> \a
>> oregon_2007_08_20=# select array_accum(name) from (select name from
>> placenames where desig='crater' order by name desc) a;
>> array_accum
>> {"Yapoah Crater","West Crater","Twin Craters","Timber Crater","Red
>> Crater","Newberry Crater","Nash Crater","Mount Mazama","Millican
>> Crater","Little Nash Crater","Le Conte Crater","Jordan
>> Craters","Diamond Craters","Coffeepot Crater","Cayuse Crater","Black
>> Crater","Big Hole","Belknap Crater"}
>> (1 row)
>>
>> I am interested in stitching a line out of points in postgis, but
>> the order/aggregate thing is a general question.
>
> Yes.
>
> You can even do this with GROUP BY as long as the leading columns of
> the ORDER BY inside the subquery exactly matches the GROUP BY
> columns.
>
> In theory we can't promise anything about future versions of
> Postgres but there are lots of people doing this already so if ever
> this was lost there would probably be some new explicit way to
> achieve the same thing.

Is there not some risk that the query planner might choose to do
hash-based accumulation could discard the subquery's ordering?

Under the visible circumstances, it's unlikely, but isn't it possible
for the aggregation to pick hashing and make a hash of this?
-- 
output = reverse("gro.mca" "@" "enworbbc")
http://linuxfinances.info/info/spiritual.html
If anyone ever  markets  a really  well-documented Unix that   doesn't
require  babysitting by a phalanx of  provincial Unix clones, there'll
be a  lot of unemployable,  twinky-braindamaged misfits out deservedly
pounding the pavements.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Do AGGREGATES consistently use sort order?

2007-09-06 Thread Webb Sprague
order/aggregate thing is a general question.
>
> Yes.
>
> You can even do this with GROUP BY as long as the leading columns of the ORDER
> BY inside the subquery exactly matches the GROUP BY columns.
>
> In theory we can't promise anything about future versions of Postgres but
> there are lots of people doing this already so if ever this was lost there
> would probably be some new explicit way to achieve the same thing.

Does anyone have any spec links, oracle behavior, or whatever?  For
now I will trust Postgres to continue behaving sanely, but I am
curious.

Thx to Gregory for the quick reply

>
> --
>   Gregory Stark
>   EnterpriseDB  http://www.enterprisedb.com
>

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Do AGGREGATES consistently use sort order?

2007-09-06 Thread Gregory Stark

"Webb Sprague" <[EMAIL PROTECTED]> writes:

> I can always count on (note the order name):
>
> \a
> oregon_2007_08_20=# select array_accum(name) from (select name from
> placenames where desig='crater' order by name desc) a;
> array_accum
> {"Yapoah Crater","West Crater","Twin Craters","Timber Crater","Red
> Crater","Newberry Crater","Nash Crater","Mount Mazama","Millican
> Crater","Little Nash Crater","Le Conte Crater","Jordan
> Craters","Diamond Craters","Coffeepot Crater","Cayuse Crater","Black
> Crater","Big Hole","Belknap Crater"}
> (1 row)
>
> I am interested in stitching a line out of points in postgis, but the
> order/aggregate thing is a general question.

Yes.

You can even do this with GROUP BY as long as the leading columns of the ORDER
BY inside the subquery exactly matches the GROUP BY columns.

In theory we can't promise anything about future versions of Postgres but
there are lots of people doing this already so if ever this was lost there
would probably be some new explicit way to achieve the same thing.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings