On Thu, Dec 14, 2023 at 6:02 PM Richard Guo <guofengli...@gmail.com> wrote:
> It seems that we need to improve estimate of distinct values in > estimate_num_groups() when taking the selectivity of restrictions into > account. > > In 84f9a35e3 we changed to a new formula to perform such estimation. > But that does not apply to the case here, because for an appendrel, > set_append_rel_size() always sets "raw tuples" count equal to "rows", > and that would make estimate_num_groups() skip the adjustment of the > estimate using the new formula. > I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'. Why don't we set it to the accumulated estimate of tuples from each live child, like attached? I believe this aligns more closely with reality. And this would also allow us to adjust the estimate for the number of distinct values in estimate_num_groups() for appendrels using the new formula introduced in 84f9a35e3. As I experimented, this can improve the estimate for appendrels. For instance, create table t (a int, b int, c float) partition by range(a); create table tp1 partition of t for values from (0) to (1000); create table tp2 partition of t for values from (1000) to (2000); insert into t select i%2000, (100000 * random())::int, random() from generate_series(1,1000000) i; analyze t; explain analyze select b from t where c < 0.1 group by b; -- on master HashAggregate (cost=18659.28..19598.74 rows=93946 width=4) (actual time=220.760..234.439 rows=63224 loops=1) -- on patched HashAggregate (cost=18659.28..19294.25 rows=63497 width=4) (actual time=235.161..250.023 rows=63224 loops=1) With the patch the estimate for the number of distinct 'b' values is more accurate. BTW, this patch does not change any existing regression test results. I attempted to devise a regression test that shows how this change can improve query plans, but failed. Should I try harder to find such a test case? Thanks Richard
v1-0001-Adjust-tuples-estimate-for-appendrel.patch
Description: Binary data