Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Tomas Vondra Sat, 20 Jul 2019 06:23:32 -0700

On Fri, Jul 19, 2019 at 04:59:21PM -0400, James Coleman wrote:

On Mon, Jul 8, 2019 at 9:37 PM Tomas Vondra
<[email protected]> wrote:

Now, consider this example:


  create table t (a int, b int, c int);
  insert into t select mod(i,100),mod(i,100),i from generate_series(1,10000000) 
s(i);
  create index on t (a);
  analyze t;
  explain select a,b,sum(c) from t group by 1,2 order by 1,2,3 limit 1;

With 0001+0002+0003 pathes, I get a plan like this:

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10375.39..10594.72 rows=1 width=16)
   ->  Incremental Sort  (cost=10375.39..2203675.71 rows=10000 width=16)
         Sort Key: a, b, (sum(c))
         Presorted Key: a, b
         ->  GroupAggregate  (cost=10156.07..2203225.71 rows=10000 width=16)
               Group Key: a, b
               ->  Gather Merge  (cost=10156.07..2128124.39 rows=10000175 
width=12)
                     Workers Planned: 2
                     ->  Incremental Sort  (cost=9156.04..972856.05 
rows=4166740 width=12)
                           Sort Key: a, b
                           Presorted Key: a
                           ->  Parallel Index Scan using t_a_idx on t  
(cost=0.43..417690.30 rows=4166740 width=12)
(12 rows)


and with 0004, I get this:

                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Limit  (cost=20443.84..20665.32 rows=1 width=16)
   ->  Incremental Sort  (cost=20443.84..2235250.05 rows=10000 width=16)
         Sort Key: a, b, (sum(c))
         Presorted Key: a, b
         ->  GroupAggregate  (cost=20222.37..2234800.05 rows=10000 width=16)
               Group Key: a, b
               ->  Incremental Sort  (cost=20222.37..2159698.74 rows=10000175 
width=12)
                     Sort Key: a, b
                     Presorted Key: a
                     ->  Index Scan using t_a_idx on t  (cost=0.43..476024.65 
rows=10000175 width=12)
(10 rows)

Notice that cost of the second plan is almost double the first one. That
means 0004 does not even generate the first plan, i.e. there are cases
where we don't try to add the explicit sort before passing the path to
generate_gather_paths().

And I think I know why is that - while gather_grouping_paths() tries to
add explicit sort below the gather merge, there are other places that
call generate_gather_paths() that don't do that. In this case it's
probably apply_scanjoin_target_to_paths() which simply builds

   parallel (seq|index) scan + gather merge

and that's it. The problem is likely the same - the code does not know
which pathkeys are "interesting" at that point. We probably need to
teach planner to do this.


I've been working on figuring out sample queries for each of the
places we're looking at adding create_increment_sort() (starting with
the cases enabled by gather-merge nodes). The
generate_useful_gather_paths() call in
apply_scanjoin_target_to_paths() is required to generate the above
preferred plan.

But I found that if I set enable_sort=off (with our without the
_useful_ variant of generate_gather_paths()) I get a very similar plan
that's actually lower cost yet:

                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Limit  (cost=10255.98..10355.77 rows=1 width=16)
  ->  Incremental Sort  (cost=10255.98..1008228.67 rows=10000 width=16)
        Sort Key: a, b, (sum(c))
        Presorted Key: a, b
        ->  Finalize GroupAggregate  (cost=10156.20..1007778.67
rows=10000 width=16)
              Group Key: a, b
              ->  Gather Merge  (cost=10156.20..1007528.67 rows=20000 width=16)
                    Workers Planned: 2
                    ->  Partial GroupAggregate
(cost=9156.18..1004220.15 rows=10000 width=16)
                          Group Key: a, b
                          ->  Incremental Sort
(cost=9156.18..972869.60 rows=4166740 width=12)
                                Sort Key: a, b
                                Presorted Key: a
                                ->  Parallel Index Scan using t_a_idx
on t  (cost=0.43..417703.85 rows=4166740 width=12)

Is that something we should consider a bug at this stage? It's also
not clear to mean (costing aside) which plan is intuitively
preferable.


This seems like a thinko in add_partial_path() - it only looks at total
cost of the paths, and ignores the startup cost entirely. I've debugged
it a bit, and what's happening for the partially-grouped relation is
roughly this:

1) We add a partial path with startup/total costs 696263 / 738029

2) We attempt to add the "Partial GroupAggregate" path, but it loses the
fight because the total cost (1004207) is higher than the first path.
Which entirely misses that the startup cost is way lower.

3) We however use the startup cost later when computing the LIMIT cost
(because that's linear approximation between startup and total cost),
and we reject the first path too, because we happen to find something
cheaper (but more expensive than what we'd get the path rejected in 2).

Attached is a debug patch which makes this clear - it only prints info
about first step of partial aggregates, because that's what matters in
this example. You should see something like this:

WARNING:  rel 0x2aa8e00 adding partial agg path 0x2aa9448 startup 696263.839993 
total 738029.919993
WARNING:  rel 0x2aa8e00 path 0x2aa9448 adding new = 1
WARNING:  rel 0x2aa8e00 adding partial agg path 0x2aa9710 startup 9156.084995 
total 1004207.854495
WARNING:  A: new path 0x2aa9710 rejected because of 0x2aa9448
WARNING:  rel 0x2aa8e00 path 0x2aa9710 adding new = 0

which essentially says "path rejected because of total cost" (and you
know it's the interesting partial aggregate from the second plan). And
if you disable sort, you get this:

WARNING:  rel 0x2aa8e00 adding partial agg path 0x2aa9448 startup 
10000696263.839994 total 10000738029.919996
WARNING:  rel 0x2aa8e00 path 0x2aa9448 adding new = 1
WARNING:  rel 0x2aa8e00 adding partial agg path 0x2aa9710 startup 9156.084995 
total 1004207.854495
WARNING:  rel 0x2aa8e00 path 0x2aa9710 adding new = 1
...

So in this case we decided the path is interesting, thanks to the
increased cost of sort.

The comment for add_partial_path says this:

*  Neither do we need to consider startup costs: parallelism is only
*  used for plans that will be run to completion.  Therefore, this
*  routine is much simpler than add_path: it needs to consider only
*  pathkeys and total cost.

I think this may be a thinko, as this plan demonstrates - but I'm not
sure about it. I wonder if this might be penalizing some other types of
plans (essentially anything with limit + gather).

Attached is a WIP patch fixing this by considering both startup and
total cost (by calling compare_path_costs_fuzzily).


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/optimizer/path/costsize.c 
b/src/backend/optimizer/path/costsize.c
index 1363b9bd45..4fd8a77f48 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1833,9 +1833,6 @@ cost_incremental_sort(Path *path,
 
        Assert(presorted_keys != 0);
 
-       if (!enable_sort)
-               startup_cost += disable_cost;
-
        if (!enable_incrementalsort)
                startup_cost += disable_cost;
 
diff --git a/src/backend/optimizer/util/pathnode.c 
b/src/backend/optimizer/util/pathnode.c
index 704cfd032d..f920fbff30 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -762,6 +762,15 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
        /* Relation should be OK for parallelism, too. */
        Assert(parent_rel->consider_parallel);
 
+       if (IsA(new_path, AggPath))
+       {
+               AggPath *path = (AggPath *) new_path;
+               /* we only care about first step of partial aggregates here */
+               if (path->aggsplit == AGGSPLIT_INITIAL_SERIAL)
+                       elog(WARNING, "rel %p adding partial agg path %p 
startup %f total %f", 
+                                parent_rel, path, new_path->startup_cost, 
new_path->total_cost);
+       }
+
        /*
         * As in add_path, throw out any paths which are dominated by the new
         * path, but throw out the new path if some existing path dominates it.
@@ -782,7 +791,10 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
                        {
                                /* New path costs more; keep it only if 
pathkeys are better. */
                                if (keyscmp != PATHKEYS_BETTER1)
+                               {
+                                       elog(WARNING, "A: new path %p rejected 
because of %p", new_path, old_path);
                                        accept_new = false;
+                               }
                        }
                        else if (old_path->total_cost > new_path->total_cost
                                         * STD_FUZZ_FACTOR)
@@ -799,6 +811,7 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
                        else if (keyscmp == PATHKEYS_BETTER2)
                        {
                                /* Costs are about the same, old path has 
better pathkeys. */
+                               elog(WARNING, "B: new path %p rejected because 
of %p", new_path, old_path);
                                accept_new = false;
                        }
                        else if (old_path->total_cost > new_path->total_cost * 
1.0000000001)
@@ -813,6 +826,7 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
                                 * cheaper.
                                 */
                                accept_new = false;
+                               elog(WARNING, "C: new path %p rejected because 
of %p", new_path, old_path);
                        }
                }
 
@@ -841,6 +855,8 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
                        break;
        }
 
+       elog(WARNING, "rel %p path %p adding new = %d", parent_rel, new_path, 
accept_new);
+
        if (accept_new)
        {
                /* Accept the new path: insert it at proper place */

diff --git a/src/backend/optimizer/util/pathnode.c 
b/src/backend/optimizer/util/pathnode.c
index 0ac73984d2..66afbd93ac 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -778,41 +807,30 @@ add_partial_path(RelOptInfo *parent_rel, Path *new_path)
                /* Unless pathkeys are incompable, keep just one of the two 
paths. */
                if (keyscmp != PATHKEYS_DIFFERENT)
                {
-                       if (new_path->total_cost > old_path->total_cost * 
STD_FUZZ_FACTOR)
-                       {
-                               /* New path costs more; keep it only if 
pathkeys are better. */
-                               if (keyscmp != PATHKEYS_BETTER1)
-                                       accept_new = false;
-                       }
-                       else if (old_path->total_cost > new_path->total_cost
-                                        * STD_FUZZ_FACTOR)
+                       PathCostComparison costcmp;
+
+                       /*
+                        * Do a fuzzy cost comparison with standard fuzziness 
limit.
+                        */
+                       costcmp = compare_path_costs_fuzzily(new_path, old_path,
+                                                                               
                 STD_FUZZ_FACTOR);
+
+                       if (costcmp == COSTS_BETTER1)
                        {
-                               /* Old path costs more; keep it only if 
pathkeys are better. */
-                               if (keyscmp != PATHKEYS_BETTER2)
+                               if (keyscmp == PATHKEYS_BETTER1)
                                        remove_old = true;
                        }
-                       else if (keyscmp == PATHKEYS_BETTER1)
+                       else if (costcmp == COSTS_BETTER2)
                        {
-                               /* Costs are about the same, new path has 
better pathkeys. */
-                               remove_old = true;
-                       }
-                       else if (keyscmp == PATHKEYS_BETTER2)
-                       {
-                               /* Costs are about the same, old path has 
better pathkeys. */
-                               accept_new = false;
-                       }
-                       else if (old_path->total_cost > new_path->total_cost * 
1.0000000001)
-                       {
-                               /* Pathkeys are the same, and the old path 
costs more. */
-                               remove_old = true;
+                               if (keyscmp == PATHKEYS_BETTER2)
+                                       accept_new = false;
                        }
-                       else
+                       else if (costcmp == COSTS_EQUAL)
                        {
-                               /*
-                                * Pathkeys are the same, and new path isn't 
materially
-                                * cheaper.
-                                */
-                               accept_new = false;
+                               if (keyscmp == PATHKEYS_BETTER1)
+                                       remove_old = true;
+                               else if (keyscmp == PATHKEYS_BETTER2)
+                                       accept_new = false;
                        }
                }

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Reply via email to