Re: Consider low startup cost in add_partial_path

Robert Haas Wed, 04 Mar 2026 14:04:53 -0800

On Wed, Feb 25, 2026 at 1:35 AM Richard Guo <[email protected]> wrote:
> I noticed that in this patch, add_partial_path_precheck duplicates the
> entire logic of compare_path_costs_fuzzily.  I wonder if it would be
> better to avoid this code duplication.
>
> I'm thinking that maybe we could extract the core math of
> compare_path_costs_fuzzily into an internal helper function that
> operates on cost values (and disabled_nodes) rather than on Path
> structures.  We could then call this helper directly from within
> add_partial_path_precheck.


Maybe, but it would be a ten-parameter function.

> Otherwise, the patch looks good to me.

Turns out that version had a few problems:

1. It didn't handle the case where costcmp was COSTS_BETTER{1|2} but
keyscmp was PATHKEYS_EQUAL correctly. It had logic like this: if
(costcmp == COSTS_BETTER1) { if (keyscmp == PATHKEYS_BETTER1)
remove_old = true; }. This says that if the new path is better on cost
and the new path is also better on pathkeys, throw away the old path.
That's too weak. It should instead say if (costcmp == COSTS_BETTER1) {
if (keyscmp != PATHKEYS_BETTER2) remove_old = true; }. That is, if the
new path is better cost and is EQUAL OR BETTER on pathkeys, throw away
the old path. As a result, v56 would have kept too many paths.

2. It failed to preserve an equivalent of the (old_path->total_cost >
new_path->total_cost * 1.0000000001) test. It now tests
compare_path_costs_fuzzily(new_path, old_path, 1.0000000001) ==
COSTS_BETTER1).

3. It wasn't pgindent-clean.

Fixing the first of these problems results in one additional
regression test change. The query is:

explain (costs off) select count(*)
  from tenk1 t1
  join tenk1 t2 on t1.unique1 = t2.unique2
  join tenk1 t3 on t2.unique1 = t3.unique1
   order by count(*);

The sides of the t1-t2 join get swapped: the driving table is chosen
to be t2 rather than t1. Without that fix, we keep both
hash_join(t1,t2) and hash_join(t2,t1), The former has a total cost
that is lower by a very small amount and therefore ends up first in
the partial pathlist and so wins. With that fix, we see that
hash_join(t2,t1) has an essentially equal total cost and a much lower
startup cost, so we keep only that one. In other words, this change
seems to be correct.

Here's v57.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

v57-0001-Consider-startup-cost-as-a-figure-of-merit-for-p.patch
Description: Binary data

Re: Consider low startup cost in add_partial_path

Reply via email to