Andres Freund <and...@anarazel.de> writes:
> On 2015-07-11 14:31:25 +0200, Tomas Vondra wrote:
>> While working on the "IOS with partial indexes" patch, I've noticed a bit
>> strange plan. It's unrelated to that particular patch (reproducible on
>> master), so I'm starting a new thread for it.

> It's indeed interesting. Running
> ANALYZE t;EXPLAIN SELECT a FROM t WHERE b < 100;
> repeatedly switches back and forth between the plans.

The issue basically is that ANALYZE is putting quasi-random numbers into
the reltuples estimates for the indexes.  Things seem to be consistently
sane after a VACUUM:

regression=# vacuum t;
VACUUM
regression=# select relname,relpages,reltuples from pg_class where relname in ( 
't', 'idx001', 'idx002', 'idx003');
 relname | relpages |  reltuples  
---------+----------+-------------
 t       |    44248 | 9.99998e+06
 idx001  |        2 |          99
 idx002  |        2 |         199
 idx003  |        2 |         299
(4 rows)

but not so much after ANALYZE:

regression=# analyze t;
ANALYZE
regression=# select relname,relpages,reltuples from pg_class where relname in ( 
't', 'idx001', 'idx002', 'idx003');
 relname | relpages | reltuples 
---------+----------+-----------
 t       |    44248 |     1e+07
 idx001  |        2 |         0
 idx002  |        2 |         0
 idx003  |        2 |         0
(4 rows)

I've also seen states like

 relname | relpages |  reltuples  
---------+----------+-------------
 t       |    44248 | 9.99998e+06
 idx001  |        2 |           0
 idx002  |        2 |         334
 idx003  |        2 |         334
(4 rows)

Presumably, this is happening because the numbers of rows actually
satisfying the index predicates are so small that it's a matter of luck
whether any of them are included in ANALYZE's sample.

Given this bad data for the index sizes, it's not totally surprising that
choose_bitmap_and() does something wacko.  I'm not sure whether we should
try to make it smarter, or write this off as "garbage in, garbage out".

Another idea is to not trust any individual ANALYZE's estimate of the
index rowcount so completely.  (I'd thought that the moving-average logic
would get applied to that, but it doesn't seem to be kicking in for some
reason.)

We could probably make this smarter if we were willing to apply the
predicate-proof machinery in more situations; in this example, once we know
that idx001 is applicable, we really should disregard idx002 and idx003
altogether because their predicates are implied by idx001's.  I've always
been hesitant to do that because the cost of checking seemed likely to
greatly outweigh the benefits.  But since Tomas is nosing around in this
territory already, maybe he'd like to investigate that further.

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to