On 26 July 2018 at 07:12, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:
> In the patch clauselist_selectivity() gets called repeatedly for every
> new qual added to the clause list. Instead, if we try to combine the
> cost/row estimation with order_qual_clauses() or
> clauselist_selectivity(), we might be able to do what the current patch
> does in O(n). clauselist_selectivity() calculates combined selectivity
> of all the given quals in O(n). We should do something similar to that
> in this case.
Duplicating the logic of clauselist_selectivity() seems like quite a lot of effort to go to just for improved filter cost estimates. Note also that clauselist_selectivity() might get a good deal more complex with multivariate stats.

Perhaps a reasonable simplification would be to just treat the clauses as independent when estimating the filter cost, and use the following as a "good enough" estimate:

    cost(A) + sel(A) * cost(B) + sel(A) * sel(B) * cost(C) + ...

That would probably be an improvement over what we currently have. It would be O(n) to compute, and it could use the cached selectivity estimates for the clauses.

Note also that with this simplified expression for the filter cost, it would be possible to improve order_qual_clauses() to take the clause selectivities into account when sorting, and minimise the cost of the filter by evaluating more selective clauses sooner, if they're not too expensive. I believe that amounts to sorting by cost/(1-sel) rather than just cost, except for clauses with sel == 1, which it makes sense to move to the end and sort by cost alone.

Regards,
Dean
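To make the two ideas concrete, here is a hypothetical sketch in Python (not the actual PostgreSQL C code; clause representation and function names are invented for illustration). It computes the independent-clauses filter cost estimate in O(n), and orders clauses by cost/(1-sel), moving sel == 1 clauses to the end sorted by cost alone:

```python
def filter_cost(clauses):
    """Estimate cost(A) + sel(A)*cost(B) + sel(A)*sel(B)*cost(C) + ...

    'clauses' is a list of (cost, sel) pairs, assumed independent.
    Runs in O(n): keep a running product of selectivities.
    """
    total = 0.0
    running_sel = 1.0
    for cost, sel in clauses:
        total += running_sel * cost   # this clause is evaluated for the
        running_sel *= sel            # fraction of rows surviving so far
    return total


def order_clauses(clauses):
    """Order clauses to minimise filter_cost().

    Sort by cost/(1-sel); clauses with sel >= 1 filter nothing, so they
    go last, sorted by cost alone.
    """
    selective = sorted((c for c in clauses if c[1] < 1.0),
                       key=lambda c: c[0] / (1.0 - c[1]))
    nonselective = sorted((c for c in clauses if c[1] >= 1.0),
                          key=lambda c: c[0])
    return selective + nonselective
```

For example, with clauses (cost=5.0, sel=0.1) and (cost=1.0, sel=0.9), the ratios are 5.0/0.9 ≈ 5.56 and 1.0/0.1 = 10, so the expensive-but-selective clause is evaluated first, giving an estimated cost of 5.0 + 0.1 * 1.0 = 5.1 rather than 1.0 + 0.9 * 5.0 = 5.5 for the cost-only ordering.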