[HACKERS] HAVING push-down
I've just read a paper that says PostgreSQL doesn't do this. My reading of the code is that we *do* evaluate the HAVING clause prior to calculating the aggregates for it. I thought I'd check to resolve the confusion. - - - If not, it seems fairly straightforward to push down some or all of a HAVING clause so that the qual clause is tested prior to aggregation, not after aggregation. This could, for certain queries, significantly reduce the amount of effort that the final Agg node performs. We might think about deeper push-down within the query, but since the Agg node already has the havingQual, it seems a straightforward act to decide whether to apply it before or after the aggregation. We already do find_unaggregated_cols(), so little additional analysis seems required. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] HAVING push-down
Simon Riggs [EMAIL PROTECTED] writes: I've just read a paper that says PostgreSQL doesn't do this. My reading of the code is that we *do* evaluate the HAVING clause prior to calculating the aggregates for it. I thought I'd check to resolve the confusion. - - - If not, it seems fairly straightforward to push down some or all of a HAVING clause so that the qual clause is tested prior to aggregation, not after aggregation. This could, for certain queries, significantly reduce the amount of effort that the final Agg node performs. You mean in cases like this? postgres=# explain select count(*) from customer group by c_w_id,c_d_id,c_id having c_w_id = 1 and c_d_id=1 and c_id=1; QUERY PLAN GroupAggregate (cost=0.00..13.61 rows=1 width=12) - Index Scan using pk_customer on customer (cost=0.00..13.56 rows=4 width=12) Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1)) (3 rows) I think we push having clauses into WHERE clauses whenever there are no aggregates in them. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] HAVING push-down
On Fri, 2007-01-26 at 15:22 +, Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: I've just read a paper that says PostgreSQL doesn't do this. My reading of the code is that we *do* evaluate the HAVING clause prior to calculating the aggregates for it. I thought I'd check to resolve the confusion. You mean in cases like this? postgres=# explain select count(*) from customer group by c_w_id,c_d_id,c_id having c_w_id = 1 and c_d_id=1 and c_id=1; QUERY PLAN GroupAggregate (cost=0.00..13.61 rows=1 width=12) - Index Scan using pk_customer on customer (cost=0.00..13.56 rows=4 width=12) Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1)) (3 rows) OK, thanks. I'll feedback to the author of the paper I was reviewing. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] HAVING push-down
Simon Riggs wrote: On Fri, 2007-01-26 at 15:22 +, Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: I've just read a paper that says PostgreSQL doesn't do this. My reading of the code is that we *do* evaluate the HAVING clause prior to calculating the aggregates for it. I thought I'd check to resolve the confusion. You mean in cases like this? postgres=# explain select count(*) from customer group by c_w_id,c_d_id,c_id having c_w_id = 1 and c_d_id=1 and c_id=1; QUERY PLAN GroupAggregate (cost=0.00..13.61 rows=1 width=12) - Index Scan using pk_customer on customer (cost=0.00..13.56 rows=4 width=12) Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1)) (3 rows) OK, thanks. I'll feedback to the author of the paper I was reviewing. Care to share the paper in general? It might be beneficial for all of us. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] HAVING push-down
Simon Riggs [EMAIL PROTECTED] writes: I've just read a paper that says PostgreSQL doesn't do this. What does he mean by that exactly, and which PG version is he looking at? As Greg notes, we do know how to push down non-aggregated conditions, but I'm not sure that's what he's thinking of. There have been some relevant bug fixes, eg 2004-07-10 14:39 tgl * src/backend/executor/: nodeAgg.c (REL7_4_STABLE), nodeAgg.c: Test HAVING condition before computing targetlist of an Aggregate node. This is required by SQL spec to avoid failures in cases like SELECT sum(win)/sum(lose) FROM ... GROUP BY ... HAVING sum(lose) 0; AFAICT we have gotten this wrong since day one. Kudos to Holger Jakobs for being the first to notice. Also, it's still true that we run all the aggregate transition functions in parallel, so if you were hoping to use HAVING on an aggregate condition to prevent an overflow or something in the state accumulation function for a targetlist aggregate, you'd lose. But I don't see any way to avoid that without scanning the data twice, which we're surely not gonna do. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] HAVING push-down
On Fri, 2007-01-26 at 11:16 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I've just read a paper that says PostgreSQL doesn't do this. What does he mean by that exactly, and which PG version is he looking at? As Greg notes, we do know how to push down non-aggregated conditions, but I'm not sure that's what he's thinking of. Yes, it was specifically non-aggregated conditions. There have been some relevant bug fixes, eg 2004-07-10 14:39 tgl * src/backend/executor/: nodeAgg.c (REL7_4_STABLE), nodeAgg.c: Test HAVING condition before computing targetlist of an Aggregate node. This is required by SQL spec to avoid failures in cases like SELECT sum(win)/sum(lose) FROM ... GROUP BY ... HAVING sum(lose) 0; AFAICT we have gotten this wrong since day one. Kudos to Holger Jakobs for being the first to notice. Also, it's still true that we run all the aggregate transition functions in parallel, so if you were hoping to use HAVING on an aggregate condition to prevent an overflow or something in the state accumulation function for a targetlist aggregate, you'd lose. But I don't see any way to avoid that without scanning the data twice, which we're surely not gonna do. I'll send you the paper off-line, there's some more interesting stuff also. p.12 -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] HAVING push-down
On Fri, 2007-01-26 at 07:46 -0800, Joshua D. Drake wrote: Care to share the paper in general? It might be beneficial for all of us. I'll ask the author, but don't expect an immediate response. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq