[HACKERS] HAVING push-down

2007-01-26 Thread Simon Riggs
I've just read a paper that says PostgreSQL doesn't do this. My reading
of the code is that we *do*  evaluate the HAVING clause prior to
calculating the aggregates for it. I thought I'd check to resolve the
confusion.

- - -

If not, it seems fairly straightforward to push down some or all of a
HAVING clause so that the qual clause is tested prior to aggregation,
not after aggregation. This could, for certain queries, significantly
reduce the amount of effort that the final Agg node performs.

We might think about deeper push-down within the query, but since the
Agg node already has the havingQual, it seems a straightforward act to
decide whether to apply it before or after the aggregation.

We already do find_unaggregated_cols(), so little additional analysis
seems required.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] HAVING push-down

2007-01-26 Thread Gregory Stark

Simon Riggs [EMAIL PROTECTED] writes:

 I've just read a paper that says PostgreSQL doesn't do this. My reading
 of the code is that we *do*  evaluate the HAVING clause prior to
 calculating the aggregates for it. I thought I'd check to resolve the
 confusion.

 - - -

 If not, it seems fairly straightforward to push down some or all of a
 HAVING clause so that the qual clause is tested prior to aggregation,
 not after aggregation. This could, for certain queries, significantly
 reduce the amount of effort that the final Agg node performs.

You mean in cases like this?

postgres=# explain select  count(*) from customer group by c_w_id,c_d_id,c_id 
having c_w_id = 1 and c_d_id=1 and c_id=1;
 QUERY PLAN 


 GroupAggregate  (cost=0.00..13.61 rows=1 width=12)
   -  Index Scan using pk_customer on customer  (cost=0.00..13.56 rows=4 
width=12)
 Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1))
(3 rows)

I think we push having clauses into WHERE clauses whenever there are no
aggregates in them.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] HAVING push-down

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 15:22 +, Gregory Stark wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
 
  I've just read a paper that says PostgreSQL doesn't do this. My reading
  of the code is that we *do*  evaluate the HAVING clause prior to
  calculating the aggregates for it. I thought I'd check to resolve the
  confusion.
 

 You mean in cases like this?
 
 postgres=# explain select  count(*) from customer group by c_w_id,c_d_id,c_id 
 having c_w_id = 1 and c_d_id=1 and c_id=1;
  QUERY PLAN   
   
 
  GroupAggregate  (cost=0.00..13.61 rows=1 width=12)
-  Index Scan using pk_customer on customer  (cost=0.00..13.56 rows=4 
 width=12)
  Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1))
 (3 rows)

OK, thanks. I'll feedback to the author of the paper I was reviewing.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] HAVING push-down

2007-01-26 Thread Joshua D. Drake
Simon Riggs wrote:
 On Fri, 2007-01-26 at 15:22 +, Gregory Stark wrote:
 Simon Riggs [EMAIL PROTECTED] writes:

 I've just read a paper that says PostgreSQL doesn't do this. My reading
 of the code is that we *do*  evaluate the HAVING clause prior to
 calculating the aggregates for it. I thought I'd check to resolve the
 confusion.

 
 You mean in cases like this?

 postgres=# explain select  count(*) from customer group by 
 c_w_id,c_d_id,c_id having c_w_id = 1 and c_d_id=1 and c_id=1;
  QUERY PLAN  

 
  GroupAggregate  (cost=0.00..13.61 rows=1 width=12)
-  Index Scan using pk_customer on customer  (cost=0.00..13.56 rows=4 
 width=12)
  Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1))
 (3 rows)
 
 OK, thanks. I'll feedback to the author of the paper I was reviewing.
 

Care to share the paper in general? It might be beneficial for all of us.

Joshua D. Drake

-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] HAVING push-down

2007-01-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 I've just read a paper that says PostgreSQL doesn't do this.

What does he mean by that exactly, and which PG version is he looking
at?  As Greg notes, we do know how to push down non-aggregated
conditions, but I'm not sure that's what he's thinking of.  There have
been some relevant bug fixes, eg

2004-07-10 14:39  tgl

* src/backend/executor/: nodeAgg.c (REL7_4_STABLE), nodeAgg.c: Test
HAVING condition before computing targetlist of an Aggregate node. 
This is required by SQL spec to avoid failures in cases like  
SELECT sum(win)/sum(lose) FROM ... GROUP BY ... HAVING sum(lose) 
0; AFAICT we have gotten this wrong since day one.  Kudos to Holger
Jakobs for being the first to notice.

Also, it's still true that we run all the aggregate transition functions
in parallel, so if you were hoping to use HAVING on an aggregate
condition to prevent an overflow or something in the state accumulation
function for a targetlist aggregate, you'd lose.  But I don't see any
way to avoid that without scanning the data twice, which we're surely
not gonna do.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] HAVING push-down

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 11:16 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  I've just read a paper that says PostgreSQL doesn't do this.
 
 What does he mean by that exactly, and which PG version is he looking
 at?  As Greg notes, we do know how to push down non-aggregated
 conditions, but I'm not sure that's what he's thinking of.  

Yes, it was specifically non-aggregated conditions.

 There have
 been some relevant bug fixes, eg
 
 2004-07-10 14:39  tgl
 
   * src/backend/executor/: nodeAgg.c (REL7_4_STABLE), nodeAgg.c: Test
   HAVING condition before computing targetlist of an Aggregate node. 
   This is required by SQL spec to avoid failures in cases like  
   SELECT sum(win)/sum(lose) FROM ... GROUP BY ... HAVING sum(lose) 
   0; AFAICT we have gotten this wrong since day one.  Kudos to Holger
   Jakobs for being the first to notice.
 
 Also, it's still true that we run all the aggregate transition functions
 in parallel, so if you were hoping to use HAVING on an aggregate
 condition to prevent an overflow or something in the state accumulation
 function for a targetlist aggregate, you'd lose.  But I don't see any
 way to avoid that without scanning the data twice, which we're surely
 not gonna do.

I'll send you the paper off-line, there's some more interesting stuff
also. p.12

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] HAVING push-down

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 07:46 -0800, Joshua D. Drake wrote:

 Care to share the paper in general? It might be beneficial for all of us.

I'll ask the author, but don't expect an immediate response.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq