Hi, For the record, here is the relevant part of the Incremental Sort patch series, updating add_partial_path and add_partial_path_precheck to also consider startup cost.
The changes in the first two patches are pretty straight-forward, plus there's a proposed optimization in the precheck function to only run compare_pathkeys if entirely necessary. I'm currently evaluating those changes and I'll post the results to the incremental sort thread. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 761b935584229243ecc6fd47d83e86d6b1b382c7 Mon Sep 17 00:00:00 2001 From: Tomas Vondra <[email protected]> Date: Sun, 28 Jul 2019 15:55:54 +0200 Subject: [PATCH 1/5] Consider low startup cost when adding partial path 45be99f8cd5d606086e0a458c9c72910ba8a613d added `add_partial_path` with the comment: > Neither do we need to consider startup costs: > parallelism is only used for plans that will be run to completion. > Therefore, this routine is much simpler than add_path: it needs to > consider only pathkeys and total cost. I'm not entirely sure if that is still true or not--I can't easily come up with a scenario in which it's not, but I also can't come up with an inherent reason why such a scenario cannot exist. Regardless, the in-progress incremental sort patch uncovered a new case where it definitely no longer holds, and, as a result a higher cost plan ends up being chosen because a low startup cost partial path is ignored in favor of a lower total cost partial path and a limit is a applied on top of that which would normal favor the lower startup cost plan. --- src/backend/optimizer/util/pathnode.c | 65 +++++++++++++-------------- 1 file changed, 31 insertions(+), 34 deletions(-) diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 8ba8122ee2..b570bfd3be 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -733,10 +733,11 @@ add_path_precheck(RelOptInfo *parent_rel, * * Because we don't consider parameterized paths here, we also don't * need to consider the row counts as a measure of quality: every path will - * produce the same number of rows. Neither do we need to consider startup - * costs: parallelism is only used for plans that will be run to completion. - * Therefore, this routine is much simpler than add_path: it needs to - * consider only pathkeys and total cost. + * produce the same number of rows. It may however matter how much the + * path ordering matches the final ordering, needed by upper parts of the + * plan. Because that will affect how expensive the incremental sort is, + * we need to consider both the total and startup path, in addition to + * pathkeys. * * As with add_path, we pfree paths that are found to be dominated by * another partial path; this requires that there be no other references to @@ -774,44 +775,40 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path) /* Compare pathkeys. */ keyscmp = compare_pathkeys(new_path->pathkeys, old_path->pathkeys); - /* Unless pathkeys are incompatible, keep just one of the two paths. */ + /* + * Unless pathkeys are incompatible, see if one of the paths dominates + * the other (both in startup and total cost). It may happen that one + * path has lower startup cost, the other has lower total cost. + * + * XXX Perhaps we could do this only when incremental sort is enabled, + * and use the simpler version (comparing just total cost) otherwise? + */ if (keyscmp != PATHKEYS_DIFFERENT) { - if (new_path->total_cost > old_path->total_cost * STD_FUZZ_FACTOR) - { - /* New path costs more; keep it only if pathkeys are better. */ - if (keyscmp != PATHKEYS_BETTER1) - accept_new = false; - } - else if (old_path->total_cost > new_path->total_cost - * STD_FUZZ_FACTOR) + PathCostComparison costcmp; + + /* + * Do a fuzzy cost comparison with standard fuzziness limit. + */ + costcmp = compare_path_costs_fuzzily(new_path, old_path, + STD_FUZZ_FACTOR); + + if (costcmp == COSTS_BETTER1) { - /* Old path costs more; keep it only if pathkeys are better. */ - if (keyscmp != PATHKEYS_BETTER2) + if (keyscmp == PATHKEYS_BETTER1) remove_old = true; } - else if (keyscmp == PATHKEYS_BETTER1) + else if (costcmp == COSTS_BETTER2) { - /* Costs are about the same, new path has better pathkeys. */ - remove_old = true; - } - else if (keyscmp == PATHKEYS_BETTER2) - { - /* Costs are about the same, old path has better pathkeys. */ - accept_new = false; - } - else if (old_path->total_cost > new_path->total_cost * 1.0000000001) - { - /* Pathkeys are the same, and the old path costs more. */ - remove_old = true; + if (keyscmp == PATHKEYS_BETTER2) + accept_new = false; } - else + else if (costcmp == COSTS_EQUAL) { - /* - * Pathkeys are the same, and new path isn't materially - * cheaper. - */ - accept_new = false; + if (keyscmp == PATHKEYS_BETTER1) + remove_old = true; + else if (keyscmp == PATHKEYS_BETTER2) + accept_new = false; } } -- 2.21.1
>From 060234426851cb8f815fb873ab5aaf33b3830143 Mon Sep 17 00:00:00 2001 From: Tomas Vondra <[email protected]> Date: Fri, 3 Apr 2020 18:26:29 +0200 Subject: [PATCH 2/5] rework add_partial_path_precheck too --- src/backend/optimizer/path/joinpath.c | 12 ++++++++--- src/backend/optimizer/util/pathnode.c | 31 ++++++++++++--------------- src/include/optimizer/pathnode.h | 3 ++- 3 files changed, 25 insertions(+), 21 deletions(-) diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index db54a6ba2e..1d0c3e8027 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -515,7 +515,9 @@ try_partial_nestloop_path(PlannerInfo *root, */ initial_cost_nestloop(root, &workspace, jointype, outer_path, inner_path, extra); - if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys)) + if (!add_partial_path_precheck(joinrel, + workspace.startup_cost, workspace.total_cost, + pathkeys)) return; /* @@ -693,7 +695,9 @@ try_partial_mergejoin_path(PlannerInfo *root, outersortkeys, innersortkeys, extra); - if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys)) + if (!add_partial_path_precheck(joinrel, + workspace.startup_cost, workspace.total_cost, + pathkeys)) return; /* Might be good enough to be worth trying, so let's try it. */ @@ -817,7 +821,9 @@ try_partial_hashjoin_path(PlannerInfo *root, */ initial_cost_hashjoin(root, &workspace, jointype, hashclauses, outer_path, inner_path, extra, parallel_hash); - if (!add_partial_path_precheck(joinrel, workspace.total_cost, NIL)) + if (!add_partial_path_precheck(joinrel, + workspace.startup_cost, workspace.total_cost, + NIL)) return; /* Might be good enough to be worth trying, so let's try it. */ diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index b570bfd3be..7211fc35fd 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -854,15 +854,14 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path) * add_partial_path_precheck * Check whether a proposed new partial path could possibly get accepted. * - * Unlike add_path_precheck, we can ignore startup cost and parameterization, - * since they don't matter for partial paths (see add_partial_path). But - * we do want to make sure we don't add a partial path if there's already - * a complete path that dominates it, since in that case the proposed path - * is surely a loser. + * Unlike add_path_precheck, we can ignore parameterization, since it doesn't + * matter for partial paths (see add_partial_path). But we do want to make + * sure we don't add a partial path if there's already a complete path that + * dominates it, since in that case the proposed path is surely a loser. */ bool -add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost, - List *pathkeys) +add_partial_path_precheck(RelOptInfo *parent_rel, Cost startup_cost, + Cost total_cost, List *pathkeys) { ListCell *p1; @@ -885,11 +884,14 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost, keyscmp = compare_pathkeys(pathkeys, old_path->pathkeys); if (keyscmp != PATHKEYS_DIFFERENT) { - if (total_cost > old_path->total_cost * STD_FUZZ_FACTOR && - keyscmp != PATHKEYS_BETTER1) + if ((startup_cost > old_path->startup_cost * STD_FUZZ_FACTOR) && + (total_cost > old_path->total_cost * STD_FUZZ_FACTOR) && + (keyscmp != PATHKEYS_BETTER1)) return false; - if (old_path->total_cost > total_cost * STD_FUZZ_FACTOR && - keyscmp != PATHKEYS_BETTER2) + + if ((old_path->startup_cost > startup_cost * STD_FUZZ_FACTOR) && + (old_path->total_cost > total_cost * STD_FUZZ_FACTOR) && + (keyscmp != PATHKEYS_BETTER2)) return true; } } @@ -899,13 +901,8 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost, * clearly good enough that it might replace one. Compare it to * non-parallel plans. If it loses even before accounting for the cost of * the Gather node, we should definitely reject it. - * - * Note that we pass the total_cost to add_path_precheck twice. This is - * because it's never advantageous to consider the startup cost of a - * partial path; the resulting plans, if run in parallel, will be run to - * completion. */ - if (!add_path_precheck(parent_rel, total_cost, total_cost, pathkeys, + if (!add_path_precheck(parent_rel, startup_cost, total_cost, pathkeys, NULL)) return false; diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index e450fe112a..e73c5637cc 100644 --- a/src/include/optimizer/pathnode.h +++ b/src/include/optimizer/pathnode.h @@ -32,7 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel, List *pathkeys, Relids required_outer); extern void add_partial_path(RelOptInfo *parent_rel, Path *new_path); extern bool add_partial_path_precheck(RelOptInfo *parent_rel, - Cost total_cost, List *pathkeys); + Cost startup_cost, Cost total_cost, + List *pathkeys); extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer, int parallel_workers); -- 2.21.1
>From e5dc0ccd72cd4dce25de22982a1d950ae73f1f6a Mon Sep 17 00:00:00 2001 From: Tomas Vondra <[email protected]> Date: Fri, 3 Apr 2020 18:43:50 +0200 Subject: [PATCH 3/5] rework add_partial_path_precheck - check costs first --- src/backend/optimizer/util/pathnode.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 7211fc35fd..4e798b801a 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -880,18 +880,25 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost startup_cost, { Path *old_path = (Path *) lfirst(p1); PathKeysComparison keyscmp; + bool compared = false; - keyscmp = compare_pathkeys(pathkeys, old_path->pathkeys); - if (keyscmp != PATHKEYS_DIFFERENT) + if ((startup_cost > old_path->startup_cost * STD_FUZZ_FACTOR) && + (total_cost > old_path->total_cost * STD_FUZZ_FACTOR)) { - if ((startup_cost > old_path->startup_cost * STD_FUZZ_FACTOR) && - (total_cost > old_path->total_cost * STD_FUZZ_FACTOR) && - (keyscmp != PATHKEYS_BETTER1)) + keyscmp = compare_pathkeys(pathkeys, old_path->pathkeys); + compared = true; + + if (keyscmp != PATHKEYS_BETTER1) return false; + } + + if ((old_path->startup_cost > startup_cost * STD_FUZZ_FACTOR) && + (old_path->total_cost > total_cost * STD_FUZZ_FACTOR)) + { + if (!compared) + keyscmp = compare_pathkeys(pathkeys, old_path->pathkeys); - if ((old_path->startup_cost > startup_cost * STD_FUZZ_FACTOR) && - (old_path->total_cost > total_cost * STD_FUZZ_FACTOR) && - (keyscmp != PATHKEYS_BETTER2)) + if (keyscmp != PATHKEYS_BETTER2) return true; } } -- 2.21.1
