On Thu, Apr 16, 2020 at 12:04:03PM -0400, James Coleman wrote:
On Thu, Apr 16, 2020 at 8:22 AM Richard Guo <guofengli...@gmail.com> wrote:


On Thu, Apr 16, 2020 at 6:35 PM Richard Guo <guofengli...@gmail.com> wrote:


On Wed, Apr 15, 2020 at 10:47 PM Tomas Vondra <tomas.von...@2ndquadrant.com> 
wrote:


Well, yeah. The problem is the Unique simply compares the columns in the
order it sees them, and it does not match the column order desired by
incremental sort. But we don't push down this information at all :-(


This is a nice optimization better to have. Since the 'Sort and Unique'
would unique-ify the result of a UNION by sorting on all columns, why
not we adjust the sort order trying to match parse->sortClause so that
we can avoid the final sort node?

Doing that we can transform plan from:

# explain (costs off) select * from foo union select * from foo order by 1,3;
                  QUERY PLAN
-----------------------------------------------
 Incremental Sort
   Sort Key: foo.a, foo.c
   Presorted Key: foo.a
   ->  Unique
         ->  Sort
               Sort Key: foo.a, foo.b, foo.c
               ->  Append
                     ->  Seq Scan on foo
                     ->  Seq Scan on foo foo_1
(9 rows)

To:

# explain (costs off) select * from foo union select * from foo order by 1,3;
               QUERY PLAN
-----------------------------------------
 Unique
   ->  Sort
         Sort Key: foo.a, foo.c, foo.b
         ->  Append
               ->  Seq Scan on foo
               ->  Seq Scan on foo foo_1
(6 rows)


Attached is what I'm thinking about this optimization. Does it make any
sense?

Shouldn't this go one either a new thread or on the thread for the
patch Tomas was referencing (by Teodor I believe)?


FWIW the optimization I had in mind is this:

  https://commitfest.postgresql.org/21/1651/

I now realize that was about GROUP BY, but it's not all that different
and the concerns will / should be fairly similar, I think.

IMO simply tweaking the sort keys to match the upper parts of the plan
is probably way too simplistic, I'm afraid. For example, if the Unique
significantly reduces cardinality, then the cost of the additional sort
is much less important. It may be much better to optimize the "large"
sort of the whole data set, either by reordering the columns as proposed
by Teodor in his patch (by number of distinct values and/or cost of the
comparison function function).

Furthermore, this is one of the places that is not using incremental
sort yet - I can easily imagine doing something like this:


   Sort
     -> Unique
        -> Incremenal Sort
           -> ...

could be a massive win. So I think we can't just rejigger the sort keys
abitrarily, we should / need to consider those alternatives.

Or are you saying you believe this patch guarantees we never see this
problem in incremental sort costing?


Yeah, that's not entirely close to me. But maybe it shows us where we to
get the unprocessed target list?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply via email to