Hi, On 2024-06-12 11:35:48 -0400, Robert Haas wrote: > Subject: [PATCH v2 3/4] Treat the # of disabled nodes in a path as a separate > cost metric. > > Previously, when a path type was disabled by e.g. enable_seqscan=false, > we either avoided generating that path type in the first place, or > more commonly, we added a large constant, called disable_cost, to the > estimated startup cost of that path. This latter approach can distort > planning. For instance, an extremely expensive non-disabled path > could seem to be worse than a disabled path, especially if the full > cost of that path node need not be paid (e.g. due to a Limit). > Or, as in the regression test whose expected output changes with this > commit, the addition of disable_cost can make two paths that would > normally be distinguishible cost seem to have fuzzily the same cost. > > To fix that, we now count the number of disabled path nodes and > consider that a high-order component of both the cost. Hence, the > path list is now sorted by disabled_nodes and then by total_cost, > instead of just by the latter, and likewise for the partial path list. > It is important that this number is a count and not simply a Boolean; > else, as soon as we're unable to respect disabled path types in all > portions of the path, we stop trying to avoid them where we can.
> if (criterion == STARTUP_COST) > { > if (path1->startup_cost < path2->startup_cost) > @@ -118,6 +127,15 @@ compare_fractional_path_costs(Path *path1, Path *path2, > Cost cost1, > cost2; > > + /* Number of disabled nodes, if different, trumps all else. */ > + if (unlikely(path1->disabled_nodes != path2->disabled_nodes)) > + { > + if (path1->disabled_nodes < path2->disabled_nodes) > + return -1; > + else > + return +1; > + } I suspect it's going to be ok, because the branch is going to be very predictable in normal workloads, but I still worry a bit about making compare_path_costs_fuzzily() more expensive. For more join-heavy queries it can really show up and there's plenty ORM generated join-heavy query workloads. If costs were 32 bit integers, I'd have suggested just stashing the disabled counts in the upper 32 bits of a 64bit integer. But ... <can't resist trying if I see overhead> In an extreme case i can see a tiny bit of overhead, but not enough to be worth worrying about. Mostly because we're so profligate in doing bms_overlap() that cost comparisons don't end up mattering as much - I seem to recall that being different in the not distant past though. Aside: I'm somewhat confused by add_paths_to_joinrel()'s handling of mergejoins_allowed. If mergejoins are disabled we end up reaching match_unsorted_outer() in more cases than with mergejoins enabled. E.g. we only set mergejoin_enabled for right joins inside select_mergejoin_clauses(), but we don't call select_mergejoin_clauses() if !enable_mergejoin and jointype != FULL. I, what? Greetings, Andres Freund