Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-23 Thread Marti Raudsepp
On Tue, Mar 22, 2011 at 01:02, Tom Lane t...@sss.pgh.pa.us wrote:
 I studied the code some more, and I think this probably can be made to
 work.  The basic idea is to have preprocess_minmax_aggregates build
 simplified queries like the above (working by modifying the query tree
 that exists at the point where it's called) and call query_planner on
 them.  Save aside the resulting path data, then let the regular planning
 process continue.  When optimize_minmax_aggregates is called, see
 whether the regular plan is cheaper than the sum of the path costs.
 If not, use the paths to construct a replacement plan, same as now.

Thanks a lot! I can confirm that this is fixed now in git version, and
now also works with partitioned tables, which is great news.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 7:17 AM, Marti Raudsepp ma...@juffo.org wrote:
 Hi list,

 When I have fields with lots of null values, I often create indexes
 like this: CREATE INDEX foo_i ON foo(i) WHERE i IS NOT NULL;
 This saves me some space, as most indexed queries exclude NULLs anyway.

 In PostgreSQL 9.0.3, min(i) can successfully use this index:
 ---
 marti=# create table foo as select null::int as i from
 generate_series(1,10);
 marti=# create index foo_i_notnull on foo (i) where i is not null;
 marti=# analyze foo;
 marti=# explain analyze select min(i) from foo;
  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.025..0.026
 rows=1 loops=1)
   InitPlan 1 (returns $0)
     -  Limit  (cost=0.00..0.00 rows=1 width=4) (actual
 time=0.021..0.021 rows=0 loops=1)
           -  Index Scan using foo_i_notnull on foo  (cost=0.00..8.27
 rows=10 width=4) (actual time=0.019..0.019 rows=0 loops=1)
  Total runtime: 0.063 ms
 ---

 It seems that PostgreSQL 9.1alpha3 cannot, however:
 ---
 marti=# explain analyze select min(i) from foo;
  Aggregate  (cost=1594.00..1594.01 rows=1 width=4) (actual
 time=29.612..29.612 rows=1 loops=1)
   -  Seq Scan on foo  (cost=0.00..1344.00 rows=10 width=4)
 (actual time=0.023..14.221 rows=10 loops=1)
  Total runtime: 29.661 ms
 ---

 It would be cool to have this feature re-added before a 9.1 release.

 I know that the Merge Append patch required some changes in the
 min/max optimization, which is probably the cause.

Yeah, I think this is a direct result of commit
034967bdcbb0c7be61d0500955226e1234ec5f04.

I was kind of nervous about that one when it went in, and the fact
that we're getting our first complaint about it before we've even hit
beta is not setting my mind at ease...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 7:17 AM, Marti Raudsepp ma...@juffo.org wrote:
 I know that the Merge Append patch required some changes in the
 min/max optimization, which is probably the cause.

 Yeah, I think this is a direct result of commit
 034967bdcbb0c7be61d0500955226e1234ec5f04.

Yeah, looks that way.  I'm not sure what it would take to re-support
this case without losing the other advantages of the change.  Personally
I'm not terribly excited about it: I don't think that suppressing nulls
from an index this way is really very useful.  Using a partial index
probably eats more planner cycles than you'll save, overall.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Robert Haas
On Mon, Mar 21, 2011 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 21, 2011 at 7:17 AM, Marti Raudsepp ma...@juffo.org wrote:
 I know that the Merge Append patch required some changes in the
 min/max optimization, which is probably the cause.

 Yeah, I think this is a direct result of commit
 034967bdcbb0c7be61d0500955226e1234ec5f04.

 Yeah, looks that way.  I'm not sure what it would take to re-support
 this case without losing the other advantages of the change.  Personally
 I'm not terribly excited about it: I don't think that suppressing nulls
 from an index this way is really very useful.  Using a partial index
 probably eats more planner cycles than you'll save, overall.

If only 1% of the table has non-NULL values in that column, maybe not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Kevin Grittner
Robert Haas  wrote:
 Tom Lane  wrote:
 
 I don't think that suppressing nulls from an index this way is
 really very useful. Using a partial index probably eats more
 planner cycles than you'll save, overall.
 
 If only 1% of the table has non-NULL values in that column, maybe
 not.
 
We definitely have indexes with less than 1% non-NULL, and we've
found partial indexes to be efficient for them.  On the other hand,
I can't think where we do min/max on any of them; so as long as this
regression only affects those aggregates, it won't hurt our shop.
 
The use case doesn't seem all that far-fetched to me, though.
 
-Kevin



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Robert Haas  wrote:
 Tom Lane  wrote:
 I don't think that suppressing nulls from an index this way is
 really very useful. Using a partial index probably eats more
 planner cycles than you'll save, overall.

 If only 1% of the table has non-NULL values in that column, maybe
 not.
 
 We definitely have indexes with less than 1% non-NULL, and we've
 found partial indexes to be efficient for them.  On the other hand,
 I can't think where we do min/max on any of them; so as long as this
 regression only affects those aggregates, it won't hurt our shop.
 
 The use case doesn't seem all that far-fetched to me, though.

Hmm.  We could possibly fix this by having planagg.c do a completely
separate planner run for each aggregate, wherein it actually does build
the equivalent query
SELECT col FROM tab WHERE existing-quals AND col IS NOT NULL
ORDER BY col ASC/DESC LIMIT 1
and plan that.  That'd be less efficient than the current way,
especially for cases where there are multiple aggregates, because there
would be some duplication of processing between the per-aggregate
planner runs and the main one.  But since we can only do this
optimization for rather simple queries anyway, maybe it wouldn't matter
much.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 So it's a clever hack that we used to allow the partial indexes to be
 used. It relied on the implicit assumption that min(x) and max(x)
 where the only values of x where NULL were both NULL.

 It would be nice if we were clever enough to support *any* strict
 aggregate using partial indexes on WHERE NOT NULL since they'll all
 have that property.

Huh?  The point of the min/max optimization is to not scan the whole
index but just fetch the endpoint value.  For general aggregates, you
have to scan the table anyway.  If an index is useful for that, it'll
get picked up in the normal planning process.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Mon, Mar 21, 2011 at 5:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 For general aggregates, you
 have to scan the table anyway.  If an index is useful for that, it'll
 get picked up in the normal planning process.

 if I do SELECT count(col) from tab with no WHERE clauses on a table
 with 1% non-null values in col will the planner correctly find the
 partial index? If so why doesn't the min/max planning find it?

It will not.  The hard part of doing something with that is that there
could be more than one aggregate.  I did think about whether we could
just push the IS NOT NULL into the main query, but that falls down on
cases like this:

select min(x), max(y) from tab;

If we try to modify that to

select min(x), max(y) from tab where x is not null and y is not null;

then we get the wrong answers, since x and y are probably nonnull in
different subsets of the table.

In the case of min/max, the endpoint hack makes the aggregates so cheap
that we can afford to perform a separate indexscan for each aggregate,
and thus having a NOT NULL qual that is different for each aggregate
isn't a problem (as long as we make sure it only affects that
aggregate's subquery and not the whole query).  This approach doesn't
scale to aggregates that will scan the whole table, though.

I suppose we might be able to do what you're suggesting for the case of
only one aggregate, but that isn't going to meet the desire of not
having a regression from what 9.0 could do with min/max.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Tom Lane
I wrote:
 Hmm.  We could possibly fix this by having planagg.c do a completely
 separate planner run for each aggregate, wherein it actually does build
 the equivalent query
   SELECT col FROM tab WHERE existing-quals AND col IS NOT NULL
   ORDER BY col ASC/DESC LIMIT 1
 and plan that.  That'd be less efficient than the current way,
 especially for cases where there are multiple aggregates, because there
 would be some duplication of processing between the per-aggregate
 planner runs and the main one.  But since we can only do this
 optimization for rather simple queries anyway, maybe it wouldn't matter
 much.

I studied the code some more, and I think this probably can be made to
work.  The basic idea is to have preprocess_minmax_aggregates build
simplified queries like the above (working by modifying the query tree
that exists at the point where it's called) and call query_planner on
them.  Save aside the resulting path data, then let the regular planning
process continue.  When optimize_minmax_aggregates is called, see
whether the regular plan is cheaper than the sum of the path costs.
If not, use the paths to construct a replacement plan, same as now.

The reason this should work is that query_planner() embodies pretty
much all the useful processing that happens between
preprocess_minmax_aggregates and optimize_minmax_aggregates --- the
other code in that stretch is mostly about grouping, which would disable
the minmax optimization anyway.  So no important steps will get left
out.  Of course, this introduces still more coupling between planagg.c
and planner.c, but I think that's probably tolerable.  The main
objection to this approach is having to do all the index analysis
N+1 times for an N-aggregate query.  I don't see any practical
alternative though if we want to make use of indexes that wouldn't be
used without the IS NOT NULL clause.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-21 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Tue, Mar 22, 2011 at 4:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Reimplement planner's handling of MIN/MAX aggregate optimization (again).

 I'm just curious, Why is this no longer an interesting special case?

 --- this is an interesting special case as of 9.1
 -explain (costs off)
 -  select min(unique2) from tenk1 where unique2 = 42;
 -  QUERY PLAN
 
 - Aggregate
 -   -  Index Scan using tenk1_unique2 on tenk1
 - Index Cond: (unique2 = 42)

In the pathkey-based implementation, that resulted in an empty pathkey
list, which that implementation couldn't deal with.  I figured that was
okay because the default plan isn't bad in such a case, but I put in a
test case (probably because the code failed before I put in a defense
against it, but I don't recall for sure).  It's not particularly a
corner case for the new code, though, and the resulting plan changed
(because the new code will in fact turn this into a LIMIT subselect
anyway).  So I debated whether to change the expected output or just
take it out, and I chose the latter.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers