On Thu, Dec 14, 2023 at 6:02 PM Richard Guo <guofengli...@gmail.com> wrote:

> It seems that we need to improve estimate of distinct values in
> estimate_num_groups() when taking the selectivity of restrictions into
> account.
>
> In 84f9a35e3 we changed to a new formula to perform such estimation.
> But that does not apply to the case here, because for an appendrel,
> set_append_rel_size() always sets "raw tuples" count equal to "rows",
> and that would make estimate_num_groups() skip the adjustment of the
> estimate using the new formula.
>

I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why don't we set it to the accumulated estimate of tuples from each live
child, like attached?  I believe this aligns more closely with reality.

And this would also allow us to adjust the estimate for the number of
distinct values in estimate_num_groups() for appendrels using the new
formula introduced in 84f9a35e3.  As I experimented, this can improve
the estimate for appendrels.  For instance,

create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);

insert into t select i%2000, (100000 * random())::int, random() from
generate_series(1,1000000) i;
analyze t;

explain analyze select b from t where c < 0.1 group by b;

-- on master
 HashAggregate  (cost=18659.28..19598.74 rows=93946 width=4)
                (actual time=220.760..234.439 rows=63224 loops=1)

-- on patched
 HashAggregate  (cost=18659.28..19294.25 rows=63497 width=4)
                (actual time=235.161..250.023 rows=63224 loops=1)

With the patch the estimate for the number of distinct 'b' values is
more accurate.

BTW, this patch does not change any existing regression test results.  I
attempted to devise a regression test that shows how this change can
improve query plans, but failed.  Should I try harder to find such a
test case?

Thanks
Richard

Attachment: v1-0001-Adjust-tuples-estimate-for-appendrel.patch
Description: Binary data

Reply via email to