On 07/11/2015 06:32 PM, Tom Lane wrote:
> ...
> Presumably, this is happening because the numbers of rows actually
> satisfying the index predicates are so small that it's a matter of
> luck whether any of them are included in ANALYZE's sample. Given this
> bad data for the index sizes, it's not totally surprising that
> choose_bitmap_and() does something wacko. I'm not sure whether we
> should try to make it smarter, or write this off as "garbage in,
> garbage out".
I think we should make it smarter, if possible - while this example is somewhat artificial, partial indexes are often used exactly like this, i.e. to index only a very small subset of the data. A good example may be an index on "active invoices", i.e. invoices that have not been sorted out yet. There may be a lot of invoices in the table, but only a very small fraction of them will be active (and thus in the index).
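For concreteness, the setup I have in mind looks roughly like this (the table and column names are of course made up):

    CREATE TABLE invoices (
        id       serial PRIMARY KEY,
        customer int,
        amount   numeric,
        active   boolean
    );

    -- index only the invoices that have not been sorted out yet
    CREATE INDEX invoices_active_idx ON invoices (customer) WHERE active;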
So I don't think this is an artificial problem, and we should not write it off as "garbage in".
> Another idea is to not trust any individual ANALYZE's estimate of the
> index rowcount so completely. (I'd thought that the moving-average
> logic would get applied to that, but it doesn't seem to be kicking in
> for some reason.) We could probably make this smarter if we were
> willing to apply the predicate-proof machinery in more situations; in
> this example, once we know that idx001 is applicable, we really should
> disregard idx002 and idx003 altogether because their predicates are
> implied by idx001's. I've always been hesitant to do that because the
> cost of checking seemed likely to greatly outweigh the benefits. But
> since Tomas is nosing around in this territory already, maybe he'd
> like to investigate that further.
I think there are two possible approaches in general - we may improve the statistics somehow, or we may start doing the predicate proofing.
I doubt approaching this at the statistics level alone is sufficient, because even with statistics target 10k (i.e. the maximum), the sample is still of a fixed size. So there will always be some combination of a sufficiently large data set and a sufficiently selective partial index that causes trouble with the sampling.
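For illustration (sticking with the made-up invoices table from above): IIRC ANALYZE samples about 300 * statistics_target rows, i.e. ~3M rows at the maximum target, so with say a billion invoices and a few hundred active ones the expected number of active rows in the sample is below one, and the row count recorded for the partial index is mostly a matter of luck:

    -- raise the statistics target to the maximum and re-analyze
    ALTER TABLE invoices ALTER COLUMN active SET STATISTICS 10000;
    ANALYZE invoices;

    -- the number of index rows ANALYZE derived from the sample; when the
    -- sample happens to contain no active invoices, this ends up (near) zero
    SELECT relname, reltuples
      FROM pg_class
     WHERE relname = 'invoices_active_idx';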
Moreover, I can't really think of a way to fix this at the statistics level. Maybe there's a clever trick guarding against this particular issue, but my personal experience is that whenever I used such a smart hack, it eventually caused strange issues elsewhere.
So I think the predicate proofing is a better approach, but of course the planning cost may be an issue. Maybe we can make it cheaper with some clever tricks, though? For example, given two predicates A and B, it seems that if A => B, then selectivity(A) <= selectivity(B). Could we use that as a cheap filter, i.e. only attempt the expensive implication proof when the selectivities are ordered the right way? We should have the selectivities anyway, no?
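To illustrate with a made-up version of the idx001/idx002/idx003 case Tom mentioned, where the predicates are nested:

    -- hypothetical nested predicates: (val > 900) => (val > 600) => (val > 300)
    CREATE INDEX idx001 ON t (id) WHERE val > 900;
    CREATE INDEX idx002 ON t (id) WHERE val > 600;
    CREATE INDEX idx003 ON t (id) WHERE val > 300;

Here selectivity(val > 900) <= selectivity(val > 600) <= selectivity(val > 300), so e.g. we'd never have to attempt proving that idx003's predicate implies idx001's - the ordering of the already-computed selectivity estimates rules that out cheaply. Of course those are just estimates, so this could only be used to skip proof attempts, not to replace them.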
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services