Hi,

For the record, here is the relevant part of the Incremental Sort patch
series, updating add_partial_path and add_partial_path_precheck to also
consider startup cost.

The changes in the first two patches are pretty straight-forward, plus
there's a proposed optimization in the precheck function to only run
compare_pathkeys if entirely necessary. I'm currently evaluating those
changes and I'll post the results to the incremental sort thread.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 761b935584229243ecc6fd47d83e86d6b1b382c7 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Sun, 28 Jul 2019 15:55:54 +0200
Subject: [PATCH 1/5] Consider low startup cost when adding partial path

45be99f8cd5d606086e0a458c9c72910ba8a613d added `add_partial_path` with the
comment:

> Neither do we need to consider startup costs:
> parallelism is only used for plans that will be run to completion.
> Therefore, this routine is much simpler than add_path: it needs to
> consider only pathkeys and total cost.

I'm not entirely sure if that is still true or not--I can't easily come
up with a scenario in which it's not, but I also can't come up with an
inherent reason why such a scenario cannot exist.

Regardless, the in-progress incremental sort patch uncovered a new case
where it definitely no longer holds, and, as a result a higher cost plan
ends up being chosen because a low startup cost partial path is ignored
in favor of a lower total cost partial path and a limit is a applied on
top of that which would normal favor the lower startup cost plan.
---
 src/backend/optimizer/util/pathnode.c | 65 +++++++++++++--------------
 1 file changed, 31 insertions(+), 34 deletions(-)

diff --git a/src/backend/optimizer/util/pathnode.c 
b/src/backend/optimizer/util/pathnode.c
index 8ba8122ee2..b570bfd3be 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -733,10 +733,11 @@ add_path_precheck(RelOptInfo *parent_rel,
  *
  *       Because we don't consider parameterized paths here, we also don't
  *       need to consider the row counts as a measure of quality: every path 
will
- *       produce the same number of rows.  Neither do we need to consider 
startup
- *       costs: parallelism is only used for plans that will be run to 
completion.
- *       Therefore, this routine is much simpler than add_path: it needs to
- *       consider only pathkeys and total cost.
+ *       produce the same number of rows.  It may however matter how much the
+ *       path ordering matches the final ordering, needed by upper parts of the
+ *       plan. Because that will affect how expensive the incremental sort is,
+ *       we need to consider both the total and startup path, in addition to
+ *       pathkeys.
  *
  *       As with add_path, we pfree paths that are found to be dominated by
  *       another partial path; this requires that there be no other references 
to
@@ -774,44 +775,40 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
                /* Compare pathkeys. */
                keyscmp = compare_pathkeys(new_path->pathkeys, 
old_path->pathkeys);
 
-               /* Unless pathkeys are incompatible, keep just one of the two 
paths. */
+               /*
+                * Unless pathkeys are incompatible, see if one of the paths 
dominates
+                * the other (both in startup and total cost). It may happen 
that one
+                * path has lower startup cost, the other has lower total cost.
+                *
+                * XXX Perhaps we could do this only when incremental sort is 
enabled,
+                * and use the simpler version (comparing just total cost) 
otherwise?
+                */
                if (keyscmp != PATHKEYS_DIFFERENT)
                {
-                       if (new_path->total_cost > old_path->total_cost * 
STD_FUZZ_FACTOR)
-                       {
-                               /* New path costs more; keep it only if 
pathkeys are better. */
-                               if (keyscmp != PATHKEYS_BETTER1)
-                                       accept_new = false;
-                       }
-                       else if (old_path->total_cost > new_path->total_cost
-                                        * STD_FUZZ_FACTOR)
+                       PathCostComparison costcmp;
+
+                       /*
+                        * Do a fuzzy cost comparison with standard fuzziness 
limit.
+                        */
+                       costcmp = compare_path_costs_fuzzily(new_path, old_path,
+                                                                               
                 STD_FUZZ_FACTOR);
+
+                       if (costcmp == COSTS_BETTER1)
                        {
-                               /* Old path costs more; keep it only if 
pathkeys are better. */
-                               if (keyscmp != PATHKEYS_BETTER2)
+                               if (keyscmp == PATHKEYS_BETTER1)
                                        remove_old = true;
                        }
-                       else if (keyscmp == PATHKEYS_BETTER1)
+                       else if (costcmp == COSTS_BETTER2)
                        {
-                               /* Costs are about the same, new path has 
better pathkeys. */
-                               remove_old = true;
-                       }
-                       else if (keyscmp == PATHKEYS_BETTER2)
-                       {
-                               /* Costs are about the same, old path has 
better pathkeys. */
-                               accept_new = false;
-                       }
-                       else if (old_path->total_cost > new_path->total_cost * 
1.0000000001)
-                       {
-                               /* Pathkeys are the same, and the old path 
costs more. */
-                               remove_old = true;
+                               if (keyscmp == PATHKEYS_BETTER2)
+                                       accept_new = false;
                        }
-                       else
+                       else if (costcmp == COSTS_EQUAL)
                        {
-                               /*
-                                * Pathkeys are the same, and new path isn't 
materially
-                                * cheaper.
-                                */
-                               accept_new = false;
+                               if (keyscmp == PATHKEYS_BETTER1)
+                                       remove_old = true;
+                               else if (keyscmp == PATHKEYS_BETTER2)
+                                       accept_new = false;
                        }
                }
 
-- 
2.21.1

>From 060234426851cb8f815fb873ab5aaf33b3830143 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <t...@fuzzy.cz>
Date: Fri, 3 Apr 2020 18:26:29 +0200
Subject: [PATCH 2/5] rework add_partial_path_precheck too

---
 src/backend/optimizer/path/joinpath.c | 12 ++++++++---
 src/backend/optimizer/util/pathnode.c | 31 ++++++++++++---------------
 src/include/optimizer/pathnode.h      |  3 ++-
 3 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/src/backend/optimizer/path/joinpath.c 
b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..1d0c3e8027 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -515,7 +515,9 @@ try_partial_nestloop_path(PlannerInfo *root,
         */
        initial_cost_nestloop(root, &workspace, jointype,
                                                  outer_path, inner_path, 
extra);
-       if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+       if (!add_partial_path_precheck(joinrel,
+                                                                  
workspace.startup_cost, workspace.total_cost,
+                                                                  pathkeys))
                return;
 
        /*
@@ -693,7 +695,9 @@ try_partial_mergejoin_path(PlannerInfo *root,
                                                   outersortkeys, innersortkeys,
                                                   extra);
 
-       if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
+       if (!add_partial_path_precheck(joinrel,
+                                                                  
workspace.startup_cost, workspace.total_cost,
+                                                                  pathkeys))
                return;
 
        /* Might be good enough to be worth trying, so let's try it. */
@@ -817,7 +821,9 @@ try_partial_hashjoin_path(PlannerInfo *root,
         */
        initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
                                                  outer_path, inner_path, 
extra, parallel_hash);
-       if (!add_partial_path_precheck(joinrel, workspace.total_cost, NIL))
+       if (!add_partial_path_precheck(joinrel,
+                                                                  
workspace.startup_cost, workspace.total_cost,
+                                                                  NIL))
                return;
 
        /* Might be good enough to be worth trying, so let's try it. */
diff --git a/src/backend/optimizer/util/pathnode.c 
b/src/backend/optimizer/util/pathnode.c
index b570bfd3be..7211fc35fd 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -854,15 +854,14 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
  * add_partial_path_precheck
  *       Check whether a proposed new partial path could possibly get accepted.
  *
- * Unlike add_path_precheck, we can ignore startup cost and parameterization,
- * since they don't matter for partial paths (see add_partial_path).  But
- * we do want to make sure we don't add a partial path if there's already
- * a complete path that dominates it, since in that case the proposed path
- * is surely a loser.
+ * Unlike add_path_precheck, we can ignore parameterization, since it doesn't
+ * matter for partial paths (see add_partial_path).  But we do want to make
+ * sure we don't add a partial path if there's already a complete path that
+ * dominates it, since in that case the proposed path is surely a loser.
  */
 bool
-add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
-                                                 List *pathkeys)
+add_partial_path_precheck(RelOptInfo *parent_rel, Cost startup_cost,
+                                                 Cost total_cost, List 
*pathkeys)
 {
        ListCell   *p1;
 
@@ -885,11 +884,14 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost 
total_cost,
                keyscmp = compare_pathkeys(pathkeys, old_path->pathkeys);
                if (keyscmp != PATHKEYS_DIFFERENT)
                {
-                       if (total_cost > old_path->total_cost * STD_FUZZ_FACTOR 
&&
-                               keyscmp != PATHKEYS_BETTER1)
+                       if ((startup_cost > old_path->startup_cost * 
STD_FUZZ_FACTOR) &&
+                               (total_cost > old_path->total_cost * 
STD_FUZZ_FACTOR) &&
+                               (keyscmp != PATHKEYS_BETTER1))
                                return false;
-                       if (old_path->total_cost > total_cost * STD_FUZZ_FACTOR 
&&
-                               keyscmp != PATHKEYS_BETTER2)
+
+                       if ((old_path->startup_cost > startup_cost * 
STD_FUZZ_FACTOR) &&
+                               (old_path->total_cost > total_cost * 
STD_FUZZ_FACTOR) &&
+                               (keyscmp != PATHKEYS_BETTER2))
                                return true;
                }
        }
@@ -899,13 +901,8 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost 
total_cost,
         * clearly good enough that it might replace one.  Compare it to
         * non-parallel plans.  If it loses even before accounting for the cost 
of
         * the Gather node, we should definitely reject it.
-        *
-        * Note that we pass the total_cost to add_path_precheck twice.  This is
-        * because it's never advantageous to consider the startup cost of a
-        * partial path; the resulting plans, if run in parallel, will be run to
-        * completion.
         */
-       if (!add_path_precheck(parent_rel, total_cost, total_cost, pathkeys,
+       if (!add_path_precheck(parent_rel, startup_cost, total_cost, pathkeys,
                                                   NULL))
                return false;
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e450fe112a..e73c5637cc 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,7 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
                                                          List *pathkeys, 
Relids required_outer);
 extern void add_partial_path(RelOptInfo *parent_rel, Path *new_path);
 extern bool add_partial_path_precheck(RelOptInfo *parent_rel,
-                                                                         Cost 
total_cost, List *pathkeys);
+                                                                         Cost 
startup_cost, Cost total_cost,
+                                                                         List 
*pathkeys);
 
 extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
                                                                 Relids 
required_outer, int parallel_workers);
-- 
2.21.1

>From e5dc0ccd72cd4dce25de22982a1d950ae73f1f6a Mon Sep 17 00:00:00 2001
From: Tomas Vondra <t...@fuzzy.cz>
Date: Fri, 3 Apr 2020 18:43:50 +0200
Subject: [PATCH 3/5] rework add_partial_path_precheck - check costs first

---
 src/backend/optimizer/util/pathnode.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/src/backend/optimizer/util/pathnode.c 
b/src/backend/optimizer/util/pathnode.c
index 7211fc35fd..4e798b801a 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -880,18 +880,25 @@ add_partial_path_precheck(RelOptInfo *parent_rel, Cost 
startup_cost,
        {
                Path       *old_path = (Path *) lfirst(p1);
                PathKeysComparison keyscmp;
+               bool            compared = false;
 
-               keyscmp = compare_pathkeys(pathkeys, old_path->pathkeys);
-               if (keyscmp != PATHKEYS_DIFFERENT)
+               if ((startup_cost > old_path->startup_cost * STD_FUZZ_FACTOR) &&
+                       (total_cost > old_path->total_cost * STD_FUZZ_FACTOR))
                {
-                       if ((startup_cost > old_path->startup_cost * 
STD_FUZZ_FACTOR) &&
-                               (total_cost > old_path->total_cost * 
STD_FUZZ_FACTOR) &&
-                               (keyscmp != PATHKEYS_BETTER1))
+                       keyscmp = compare_pathkeys(pathkeys, 
old_path->pathkeys);
+                       compared = true;
+
+                       if (keyscmp != PATHKEYS_BETTER1)
                                return false;
+               }
+
+               if ((old_path->startup_cost > startup_cost * STD_FUZZ_FACTOR) &&
+                       (old_path->total_cost > total_cost * STD_FUZZ_FACTOR))
+               {
+                       if (!compared)
+                               keyscmp = compare_pathkeys(pathkeys, 
old_path->pathkeys);
 
-                       if ((old_path->startup_cost > startup_cost * 
STD_FUZZ_FACTOR) &&
-                               (old_path->total_cost > total_cost * 
STD_FUZZ_FACTOR) &&
-                               (keyscmp != PATHKEYS_BETTER2))
+                       if (keyscmp != PATHKEYS_BETTER2)
                                return true;
                }
        }
-- 
2.21.1

Reply via email to