Re: [GENERAL] Ordering of records in group by not possible

2006-04-26 Thread Martijn van Oosterhout
On Wed, Apr 26, 2006 at 09:19:41AM -0400, Chris Kratz wrote:
 Hello all,
 
 I wanted to verify what we are seeing.

 Select a, aggregate(b)
 from c
 group by a
 order by a,b
 
 Is not accepted by postgres.  This will only work if you order by a.  But, 
 this means that the records that are grouped are processed in no apparent 
 order.

Well, ORDER BY happens *after* the select values have been calculated,
so it can't possibly affect the order of the rows into the aggregate.

 We have some custom aggregate functions where the order of the rows is 
 important.  Is there no way to do this without a subselect?

You found the right solution, use ORDER BY in a subselect.

 Even with a subselect doing the ordering ahead of time, is there any 
 guarrantee that the records will be processed in the group by with the 
 specified order?  Or will the group by always be arbitrary in it's ordering 
 of the records?

Currently, if the sub-select orders the rows, the outer query will see
the rows in that order. I don't think the SQL standard even allows
ORDER BY there, but PostgreSQL does support it for this reason. BTW,
GROUP BY doesn't not imply any ordering at all, consider a Hash
Aggregate that calculates all the aggregates simultaneously...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [GENERAL] Ordering of records in group by not possible

2006-04-26 Thread Alban Hertroys

Chris Kratz wrote:

Hello all,

I wanted to verify what we are seeing.

Select a, aggregate(b)
from c
group by a
order by a,b


That's a rather odd query... Values in b aren't available to order by, 
as they have been aggregated. There is no relation to the values in b 
and the values in your result set.


You could order by column 2 if you want to order on the results on 
your aggregate:

Select a, aggregate(b)
from c
group by a
order by a,2

--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Ordering of records in group by not possible

2006-04-26 Thread Chris Kratz
On Wednesday 26 April 2006 10:30 am, you wrote:
 Chris Kratz wrote:
  Hello all,
 
  I wanted to verify what we are seeing.
 
  Select a, aggregate(b)
  from c
  group by a
  order by a,b

 That's a rather odd query... Values in b aren't available to order by,
 as they have been aggregated. There is no relation to the values in b
 and the values in your result set.

 You could order by column 2 if you want to order on the results on
 your aggregate:
   Select a, aggregate(b)
   from c
   group by a
   order by a,2

Hello Alban,

The point is that the aggregates we are working on in our application are 
order sensitive.  One common example of order sensitive aggregates are the 
first and last aggregate functions found in some other databases.  Both of 
which are very dependant on the order of the records going into the 
aggregate.  Since you can't order the records before they hit the group by in 
postgres, the actual ordering of the records when they hit the aggregate 
function is indeterminate and so order sensitive aggregates really don't 
work.

The best solution we have found so far is to feed the aggregating query with a 
subquery that orders on the appropriate columns.  But even then, since the 
group by forces another sort, I'm not convinced that the ordering is still 
not indeterminate.

I know you can use distinct on to get similar behavior for first and last, but 
that's not a general solution for aggregates that are order sensitive.

I'm almost convinced that there isn't a way to do order sensitive aggregates 
in pg, but will have to pull the record set out and process it 
programmatically externally if the subquery trick doesn't work.

-Chris

-- 
Chris Kratz

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Ordering of records in group by not possible

2006-04-26 Thread Ludwig Isaac Lim
Hi:

 You could order by column 2 if you want to order on the
 results on 
 your aggregate:
   Select a, aggregate(b)
   from c
   group by a
   order by a,2
 

another alternative is :

   select a , aggregrate(b) as alias
   from c
   group by a
   order by a, alias

e.g.
   
   select a,sum(b) as sum_b
   from c
   group by a
   order by a, sum_b

ludwig lim

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Ordering of records in group by not possible

2006-04-26 Thread Chris Kratz
On Wednesday 26 April 2006 11:19 am, Ludwig Isaac Lim wrote:
 Hi:
  You could order by column 2 if you want to order on the
  results on
  your aggregate:
  Select a, aggregate(b)
  from c
  group by a
  order by a,2

 another alternative is :

select a , aggregrate(b) as alias
from c
group by a
order by a, alias

 e.g.

select a,sum(b) as sum_b
from c
group by a
order by a, sum_b

 ludwig lim

 __
 Do You Yahoo!?
 Tired of spam?  Yahoo! Mail has the best spam protection around
 http://mail.yahoo.com

 ---(end of broadcast)---
 TIP 6: explain analyze is your friend

Ironically, before we realized we weren't getting the results we expected, we 
were doing something like this.

select a,aggregate(b) as b
from c
group by a
order by a,b

We got lucky the first couple of tests and it appeared to work.  It was only 
later that we realized that the sort was actually sorting the aggregated 
values after the group by, not the rows going into the order sensitive 
aggregate as we had hoped.

I actually like the fact that the sort is the last thing that is done on a 
query, in most cases this is what makes sense.  It's just frustrating in this 
one place to have little control of the records going into the grouping code 
since the aggregates we are using are order sensitive.  Think functions like 
first and last.

-Chris
-- 
Chris Kratz

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Ordering of records in group by not possible

2006-04-26 Thread Alban Hertroys

Chris Kratz wrote:

On Wednesday 26 April 2006 10:30 am, you wrote:
Hello Alban,

The point is that the aggregates we are working on in our application are 
order sensitive.  One common example of order sensitive aggregates are the 
first and last aggregate functions found in some other databases.  Both of 


Ah, order sensitive aggregates... Of course. I've been playing with the 
thought of implementing first and last aggregates myself, but never got 
around to it. It is not that often that you need those, the same result 
can often be achieved by ordering and limitting.


Ordering the data before aggregating should do the trick though, but 
you'll need to order your records in the subquery. Something like this, 
I think:


Select a, aggregate(b)
from (
select a, b
from c
order by a,b
) d
group by a

Ordering on the aggregated values afterwards shouldn't make a difference 
to the aggregated results. I'm pretty sure of that.


The best solution we have found so far is to feed the aggregating query with a 
subquery that orders on the appropriate columns.  But even then, since the 
group by forces another sort, I'm not convinced that the ordering is still 
not indeterminate.


The experts on the internals will know for sure, but I think their 
answer'll be that the ordering is determinate this way.


I'm almost convinced that there isn't a way to do order sensitive aggregates 
in pg, but will have to pull the record set out and process it 
programmatically externally if the subquery trick doesn't work.


No matter how you're going to achieve your aggregate, you'll need to 
order the records you're aggregating over before doing so. Whether that 
can be done in SQL, a stored procedure or in application code doesn't 
change that.


I don't know where your idea to order on the original column outside the 
aggregate originated, but it looks like a rather confusing way to 
achieve this. It may even be indeterminate... It completely depends on 
the database system knowing what the aggregate is about, AFAICS.


For the record, I'm just a regular here. I'm not directly involved with 
PostgreSQL.


Regards,
--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Ordering of records in group by not possible

2006-04-26 Thread Tom Lane
Alban Hertroys [EMAIL PROTECTED] writes:
 Ordering the data before aggregating should do the trick though, but 
 you'll need to order your records in the subquery. Something like this, 
 I think:

 Select a, aggregate(b)
 from (
   select a, b
   from c
   order by a,b
 ) d
 group by a

Right.  The key point there is that the subquery is already delivering
outputs that are suitably sorted for grouping by A, and so even if the
planner wants to do it via group rather than hash aggregation, it will
not need to put in another sort step.  You can't just order by B in
the subquery, or it won't work.

This should work reliably since [ checks CVS... ] PG 7.4.  Just to be
sure, you might try looking at the query plan with set enable_hashagg =
off to verify that the planner recognizes it doesn't need a second sort
step.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org