Hi,

On 2024-06-12 11:35:48 -0400, Robert Haas wrote:
> Subject: [PATCH v2 3/4] Treat the # of disabled nodes in a path as a separate
>  cost metric.
> 
> Previously, when a path type was disabled by e.g. enable_seqscan=false,
> we either avoided generating that path type in the first place, or
> more commonly, we added a large constant, called disable_cost, to the
> estimated startup cost of that path. This latter approach can distort
> planning. For instance, an extremely expensive non-disabled path
> could seem to be worse than a disabled path, especially if the full
> cost of that path node need not be paid (e.g. due to a Limit).
> Or, as in the regression test whose expected output changes with this
> commit, the addition of disable_cost can make two paths that would
> normally be distinguishible cost seem to have fuzzily the same cost.
> 
> To fix that, we now count the number of disabled path nodes and
> consider that a high-order component of both the cost. Hence, the
> path list is now sorted by disabled_nodes and then by total_cost,
> instead of just by the latter, and likewise for the partial path list.
> It is important that this number is a count and not simply a Boolean;
> else, as soon as we're unable to respect disabled path types in all
> portions of the path, we stop trying to avoid them where we can.


>       if (criterion == STARTUP_COST)
>       {
>               if (path1->startup_cost < path2->startup_cost)
> @@ -118,6 +127,15 @@ compare_fractional_path_costs(Path *path1, Path *path2,
>       Cost            cost1,
>                               cost2;
>  
> +     /* Number of disabled nodes, if different, trumps all else. */
> +     if (unlikely(path1->disabled_nodes != path2->disabled_nodes))
> +     {
> +             if (path1->disabled_nodes < path2->disabled_nodes)
> +                     return -1;
> +             else
> +                     return +1;
> +     }

I suspect it's going to be ok, because the branch is going to be very
predictable in normal workloads, but I still worry a bit about making
compare_path_costs_fuzzily() more expensive. For more join-heavy queries it
can really show up and there's plenty ORM generated join-heavy query
workloads.

If costs were 32 bit integers, I'd have suggested just stashing the disabled
counts in the upper 32 bits of a 64bit integer. But ...

<can't resist trying if I see overhead>


In an extreme case i can see a tiny bit of overhead, but not enough to be
worth worrying about. Mostly because we're so profligate in doing
bms_overlap() that cost comparisons don't end up mattering as much - I seem to
recall that being different in the not distant past though.


Aside: I'm somewhat confused by add_paths_to_joinrel()'s handling of
mergejoins_allowed. If mergejoins are disabled we end up reaching
match_unsorted_outer() in more cases than with mergejoins enabled. E.g. we
only set mergejoin_enabled for right joins inside select_mergejoin_clauses(),
but we don't call select_mergejoin_clauses() if !enable_mergejoin and jointype
!= FULL.  I, what?


Greetings,

Andres Freund


Reply via email to