On 3/14/19 12:56 PM, Kyotaro HORIGUCHI wrote: > At Wed, 13 Mar 2019 19:37:45 +1300, David Rowley > <[email protected]> wrote in > <cakjs1f_6qdqj9m2h0jf4brkzvlpfc7o9e+mxdxrq0wgv0z1...@mail.gmail.com> >> On Wed, 13 Mar 2019 at 17:20, Kyotaro HORIGUCHI >> <[email protected]> wrote: >>> bms_member_index seems working differently than maybe expected. >>> >>> bms_member_index((2, 4), 0) => 0, (I think) should be -1 >>> bms_member_index((2, 4), 1) => 0, should be -1 >>> bms_member_index((2, 4), 2) => 0, should be 0 >>> bms_member_index((2, 4), 3) => 1, should be -1 >>> bms_member_index((2, 4), 4) => 1, should be 1 >>> bms_member_index((2, 4), 5) => 2, should be -1 >>> bms_member_index((2, 4), 6) => 2, should be -1 >>> ... >>> bms_member_index((2, 4), 63) => 2, should be -1 >>> bms_member_index((2, 4), 64) => -1, correct >>> >>> It works correctly only when x is a member - the way the function >>> is maybe actually used in this patch -, or needs to change the >>> specifiction (or the comment) of the function. >> >> Looks like: >> >> + if (wordnum >= a->nwords) >> + return -1; >> >> should be: >> >> + if (wordnum >= a->nwords || >> + (a->word[wordnum] & ((bitmapword) 1 << bitnum)) == 0) >> + return -1; > > Yeah, seems right. >
Yep, that was broken. The attached patch fixes this by simply calling
bms_is_member, instead of copying the checks into bms_member_index.
I've also reworked the regression tests to use a function extracting the
cardinality estimates, as proposed by Dean and David. I have not reduced
the size of data sets yet, so the tests are not much faster, but we no
longer check the exact query plan. That's probably a good idea anyway.
Actually - the tests are a bit faster because it allows removing indexes
that were used for the query plans.
FWIW I've noticed an annoying thing when modifying type of column not
included in a statistics. Consider this:
create table t (a int, b int, c text);
insert into t select mod(i,10), mod(i,10), ''
from generate_series(1,10000) s(i);
create statistics s (dependencies) on a,b from t;
analyze t;
explain analyze select * from t where a = 1 and b = 1;
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on t (cost=0.00..205.00 rows=1000 width=9)
(actual time=0.014..1.910 rows=1000 loops=1)
Filter: ((a = 1) AND (b = 1))
Rows Removed by Filter: 9000
Planning Time: 0.119 ms
Execution Time: 2.234 ms
(5 rows)
alter table t alter c type varchar(61);
explain analyze select * from t where a = 1 and b = 1;
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on t (cost=0.00..92.95 rows=253 width=148)
(actual time=0.020..2.420 rows=1000 loops=1)
Filter: ((a = 1) AND (b = 1))
Rows Removed by Filter: 9000
Planning Time: 0.128 ms
Execution Time: 2.767 ms
(5 rows)
select stxdependencies from pg_statistic_ext;
stxdependencies
------------------------------------------
{"1 => 2": 1.000000, "2 => 1": 1.000000}
(1 row)
That is, we don't remove the statistics, but the estimate still changes.
But that's because the ALTER TABLE also resets reltuples/relpages:
select relpages, reltuples from pg_class where relname = 't';
relpages | reltuples
----------+-----------
0 | 0
(1 row)
That's a bit unfortunate, and it kinda makes the whole effort to not
drop the statistics unnecessarily kinda pointless :-(
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
0001-multivariate-MCV-lists-20190315.patch.gz
Description: application/gzip
0002-multivariate-histograms-20190315.patch.gz
Description: application/gzip
