Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-08-16 Thread John Naylor
On Mon, Jul 3, 2023 at 5:15 PM David Rowley  wrote:
>
> On Fri, 30 Jun 2023 at 18:45, John Naylor wrote:

> I looked over your patch and don't see anything to report aside from
> the unfinished/undecided part around the tiebreak function for
> tuplesort_begin_index_hash().

I went ahead and added a degenerate function, just for consistency -- might
also head off possible alarms from code analysis tools.
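
One way such a degenerate tiebreak can look (a minimal sketch only, in the
style of the other tiebreak functions -- the committed function's name and
body may differ in detail):

static int
comparetup_index_hash_tiebreak(const SortTuple *a, const SortTuple *b,
                               Tuplesortstate *state)
{
    /* sketch: the hash key was already compared, and there are no other keys */
    return 0;
}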

> I also ran the benchmark script [1] with the patch from [2] and
> calculated the speedup with [2] with and without your v3 patch. I've
> attached two graphs with the benchmark results. Any value >100%
> indicates that performing the sort for the ORDER BY at the same time
> as the WindowAgg improves performance, whereas anything < 100%
> indicates a regression.  The bars in blue show the results without
> your v3 patch and the red bars show the results with your v3 patch.
> Looking at the remaining regressions it does not really feel like
> we've found the culprit for the regressions.  Certainly, v3 helps, but
> I just don't think it's to the level we'd need to make the window sort
> changes a good idea.
>
> I'm not sure exactly how best to proceed here. I think the tiebreak
> stuff is worth doing regardless, so maybe that can just go in to
> eliminate that as a factor and we or I can continue to see what else
> is to blame.

Thanks for testing again. Sounds good, I removed a now-invalidated comment,
pgindent'd, and pushed.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-06-30 Thread John Naylor
On Wed, Feb 15, 2023 at 11:23 AM John Naylor wrote:
>
>
> On Tue, Feb 14, 2023 at 5:46 PM David Rowley  wrote:
> >
> > I didn't do a detailed review of the sort patch, but I did wonder
> > about the use of the name "fallback"  in the new functions. The
> > comment in the following snippet from qsort_tuple_unsigned_compare()
> > makes me think "tiebreak" is a better name.
>
> I agree "tiebreak" is better.

Here is v3 with that change. I still need to make sure the tests cover all
cases, so I'll do that as time permits. I'm also creating a CF entry.

--
John Naylor
EDB: http://www.enterprisedb.com
From be88d7cab46e44385762365f1ba23d7684f76d3a Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Mon, 30 Jan 2023 17:10:00 +0700
Subject: [PATCH v3] Split out tiebreaker comparisons from comparetup_*
 functions

Previously, if a specialized comparator found equal datum1 keys,
the "comparetup" function would repeat the comparison on the
datum before proceeding with the unabbreviated first key
and/or additional sort keys.

Move comparing additional sort keys into "tiebreak" functions so
that specialized comparators can call these directly if needed,
avoiding duplicate work.

Reviewed by David Rowley
Discussion: https://www.postgresql.org/message-id/CAFBsxsGaVfUrjTghpf%3DkDBYY%3DjWx1PN-fuusVe7Vw5s0XqGdGw%40mail.gmail.com
---
 src/backend/utils/sort/tuplesort.c |   6 +-
 src/backend/utils/sort/tuplesortvariants.c | 112 -
 src/include/utils/tuplesort.h  |   7 ++
 3 files changed, 95 insertions(+), 30 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index e5a4e5b371..3e872d7518 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -513,7 +513,7 @@ qsort_tuple_unsigned_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
 	if (state->base.onlyKey != NULL)
 		return 0;
 
-	return state->base.comparetup(a, b, state);
+	return state->base.comparetup_tiebreak(a, b, state);
 }
 
 #if SIZEOF_DATUM >= 8
@@ -537,7 +537,7 @@ qsort_tuple_signed_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
 	if (state->base.onlyKey != NULL)
 		return 0;
 
-	return state->base.comparetup(a, b, state);
+	return state->base.comparetup_tiebreak(a, b, state);
 }
 #endif
 
@@ -561,7 +561,7 @@ qsort_tuple_int32_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
 	if (state->base.onlyKey != NULL)
 		return 0;
 
-	return state->base.comparetup(a, b, state);
+	return state->base.comparetup_tiebreak(a, b, state);
 }
 
 /*
diff --git a/src/backend/utils/sort/tuplesortvariants.c b/src/backend/utils/sort/tuplesortvariants.c
index eb6cfcfd00..dd9d4d3764 100644
--- a/src/backend/utils/sort/tuplesortvariants.c
+++ b/src/backend/utils/sort/tuplesortvariants.c
@@ -47,18 +47,24 @@ static void removeabbrev_datum(Tuplesortstate *state, SortTuple *stups,
 			   int count);
 static int	comparetup_heap(const SortTuple *a, const SortTuple *b,
 			Tuplesortstate *state);
+static int	comparetup_heap_tiebreak(const SortTuple *a, const SortTuple *b,
+			Tuplesortstate *state);
 static void writetup_heap(Tuplesortstate *state, LogicalTape *tape,
 		  SortTuple *stup);
 static void readtup_heap(Tuplesortstate *state, SortTuple *stup,
 		 LogicalTape *tape, unsigned int len);
 static int	comparetup_cluster(const SortTuple *a, const SortTuple *b,
 			   Tuplesortstate *state);
+static int	comparetup_cluster_tiebreak(const SortTuple *a, const SortTuple *b,
+			   Tuplesortstate *state);
 static void writetup_cluster(Tuplesortstate *state, LogicalTape *tape,
 			 SortTuple *stup);
 static void readtup_cluster(Tuplesortstate *state, SortTuple *stup,
 			LogicalTape *tape, unsigned int tuplen);
 static int	comparetup_index_btree(const SortTuple *a, const SortTuple *b,
    Tuplesortstate *state);
+static int	comparetup_index_btree_tiebreak(const SortTuple *a, const SortTuple *b,
+   Tuplesortstate *state);
 static int	comparetup_index_hash(const SortTuple *a, const SortTuple *b,
   Tuplesortstate *state);
 static void writetup_index(Tuplesortstate *state, LogicalTape *tape,
@@ -67,6 +73,8 @@ static void readtup_index(Tuplesortstate *state, SortTuple *stup,
 		  LogicalTape *tape, unsigned int len);
 static int	comparetup_datum(const SortTuple *a, const SortTuple *b,
 			 Tuplesortstate *state);
+static int	comparetup_datum_tiebreak(const SortTuple *a, const SortTuple *b,
+			 Tuplesortstate *state);
 static void writetup_datum(Tuplesortstate *state, LogicalTape *tape,
 		   SortTuple *stup);
 static void readtup_datum(Tuplesortstate *state, SortTuple *stup,
@@ -165,6 +173,7 @@ tuplesort_begin_heap(TupleDesc tupDesc,
 
 	base->removeabbrev = removeabbrev_heap;
 	base->comparetup = comparetup_heap;
+	base->comparetup_tiebreak = comparetup_heap_tiebreak;
 	base->writetup = writetup_heap;
 	base->readtup = readtup_heap;
 	base->haveDatum1 = true;
@@ -242,6 +251,7 @@ 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-02-16 Thread John Naylor
On Thu, Feb 16, 2023 at 10:03 AM David Rowley  wrote:

> I suspect it's slower because the final sort must sort the entire
> array still without knowledge that portions of it are pre-sorted.  It
> would be very interesting to improve this and do some additional work
> and keep track of the "memtupsortedto" index by pushing them onto a
> List each time we cross the availMem boundary, then qsort just
> the final portion of the array in tuplesort_performsort() before doing
> a k-way merge on each segment rather than qsorting the entire thing
> again. I suspect this would be faster when work_mem exceeds L3 by some
> large amount.

Sounds like a reasonable thing to try.

It seems like in-memory merge could still use abbreviation, unlike external
merge.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-02-15 Thread David Rowley
On Thu, 16 Feb 2023 at 00:45, John Naylor  wrote:
> Okay, here's a rerun including the sort hack, and it looks like incremental 
> sort is only ahead with the smallest set, otherwise same or maybe slightly 
> slower:
>
> HEAD:
>
> 4 ^ 8: latency average = 113.461 ms
> 5 ^ 8: latency average = 786.080 ms
> 6 ^ 8: latency average = 3948.388 ms
> 7 ^ 8: latency average = 15733.348 ms
>
> tiebreaker:
>
> 4 ^ 8: latency average = 106.556 ms
> 5 ^ 8: latency average = 734.834 ms
> 6 ^ 8: latency average = 3640.507 ms
> 7 ^ 8: latency average = 14470.199 ms
>
> tiebreaker + incr sort hack:
>
> 4 ^ 8: latency average = 93.998 ms
> 5 ^ 8: latency average = 740.120 ms
> 6 ^ 8: latency average = 3715.942 ms
> 7 ^ 8: latency average = 14749.323 ms

Sad news :(  the sort hacks are still quite a bit faster for 4 ^ 8.

I was fooling around with the attached (very quickly and crudely put
together) patch just there. The idea is to sort during
tuplesort_puttuple_common() when the memory consumption goes over some
approximation of L3 cache size, in the hope that the tuples in question
are still sitting in cache lines.  The code is not ideal there, as we
track availMem rather than the memory actually used, so maybe I need to
do that better; we could cross a boundary without actually having done
very much, plus we USEMEM for other reasons too.

I found that the patch didn't really help:

create table t (a int not null, b int not null);
insert into t select (x*random())::int % 100,(x*random())::int % 100
from generate_Series(1,1000)x;
vacuum freeze t;
select pg_prewarm('t');
show work_mem;
 work_mem
----------
 4GB

explain (analyze, timing off) select * from t order by a,b;

master:
Execution Time: 5620.704 ms
Execution Time: 5506.705 ms

patched:
Execution Time: 6801.421 ms
Execution Time: 6762.130 ms

I suspect it's slower because the final sort must sort the entire
array still without knowledge that portions of it are pre-sorted.  It
would be very interesting to improve this and do some additional work
and keep track of the "memtupsortedto" index by pushing them onto a
List each time we cross the availMem boundary, then qsort just
the final portion of the array in tuplesort_performsort() before doing
a k-way merge on each segment rather than qsorting the entire thing
again. I suspect this would be faster when work_mem exceeds L3 by some
large amount.
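
To make that idea concrete, here is a toy, self-contained sketch (plain ints
and a fixed run size standing in for SortTuples and the availMem/L3 boundary;
purely illustrative, not tuplesort code): sort each run as it fills, remember
the run boundaries, qsort only the unsorted tail at "performsort" time, and
then merge the presorted runs.

#include <stdio.h>
#include <stdlib.h>

#define NTUPLES 20
#define RUN_SIZE 6              /* stand-in for the L3-sized boundary */

static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

int
main(void)
{
    int         vals[NTUPLES];
    int         bounds[NTUPLES / RUN_SIZE + 2];
    int         pos[NTUPLES / RUN_SIZE + 1];
    int         nruns = 0;
    int         start = 0;

    for (int i = 0; i < NTUPLES; i++)
        vals[i] = rand() % 100;

    /* "puttuple" phase: sort each run once it fills and record its start */
    for (int i = 0; i < NTUPLES; i++)
    {
        if (i - start == RUN_SIZE)
        {
            qsort(vals + start, i - start, sizeof(int), cmp_int);
            bounds[nruns++] = start;
            start = i;
        }
    }

    /* "performsort" phase: sort only the final, still-unsorted tail */
    qsort(vals + start, NTUPLES - start, sizeof(int), cmp_int);
    bounds[nruns++] = start;
    bounds[nruns] = NTUPLES;

    /* naive k-way merge of the presorted runs */
    for (int r = 0; r < nruns; r++)
        pos[r] = bounds[r];

    for (int out = 0; out < NTUPLES; out++)
    {
        int         best = -1;

        for (int r = 0; r < nruns; r++)
            if (pos[r] < bounds[r + 1] &&
                (best < 0 || vals[pos[r]] < vals[pos[best]]))
                best = r;
        printf("%d ", vals[pos[best]++]);
    }
    printf("\n");
    return 0;
}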

David
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 9ca9835aab..65ffe442cd 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -216,6 +216,7 @@ struct Tuplesortstate
 	SortTuple  *memtuples;		/* array of SortTuple structs */
 	int			memtupcount;	/* number of tuples currently present */
 	int			memtupsize;		/* allocated length of memtuples array */
+	int			memtupsortedto; /* index of where we've already sorted up to */
 	bool		growmemtuples;	/* memtuples' growth still underway? */
 
 	/*
@@ -465,7 +466,7 @@ static bool mergereadnext(Tuplesortstate *state, LogicalTape *srcTape, SortTuple
 static void dumptuples(Tuplesortstate *state, bool alltuples);
 static void make_bounded_heap(Tuplesortstate *state);
 static void sort_bounded_heap(Tuplesortstate *state);
-static void tuplesort_sort_memtuples(Tuplesortstate *state);
+static void tuplesort_sort_memtuples(Tuplesortstate *state, int start_index);
 static void tuplesort_heap_insert(Tuplesortstate *state, SortTuple *tuple);
 static void tuplesort_heap_replace_top(Tuplesortstate *state, SortTuple *tuple);
 static void tuplesort_heap_delete_top(Tuplesortstate *state);
@@ -708,6 +709,7 @@ tuplesort_begin_common(int workMem, SortCoordinate coordinate, int sortopt)
 	 */
 	state->memtupsize = INITIAL_MEMTUPSIZE;
 	state->memtuples = NULL;
+	state->memtupsortedto = 0;
 
 	/*
 	 * After all of the other non-parallel-related state, we setup all of the
@@ -793,6 +795,7 @@ tuplesort_begin_batch(Tuplesortstate *state)
 	state->tapeset = NULL;
 
 	state->memtupcount = 0;
+	state->memtupsortedto = 0;
 
 	/*
	 * Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD;
@@ -1193,10 +1196,29 @@ tuplesort_puttuple_common(Tuplesortstate *state, SortTuple *tuple, bool useAbbrev
 
 	Assert(!LEADER(state));
 
+/* just roughly. Must be power-of-2 for efficient division */
+#define EFFECTIVE_CPU_L3_SIZE 16777216
+
 	/* Count the size of the out-of-line data */
 	if (tuple->tuple != NULL)
+	{
+		size_t		before_mem = state->availMem / EFFECTIVE_CPU_L3_SIZE;
+
 		USEMEM(state, GetMemoryChunkSpace(tuple->tuple));
 
+		/*
+		 * Be a bit more cache aware in the sort and do a presort of the
+		 * tuples seen thus far while they're likely still sitting in
+		 *
Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-02-15 Thread John Naylor
On Wed, Feb 15, 2023 at 3:02 PM David Rowley  wrote:
>
> On Wed, 15 Feb 2023 at 17:23, John Naylor wrote:
> > HEAD:
> >
> > 4 ^ 8: latency average = 113.976 ms
> > 5 ^ 8: latency average = 783.830 ms
> > 6 ^ 8: latency average = 3990.351 ms
> > 7 ^ 8: latency average = 15793.629 ms
> >
> > Skip rechecking first key:
> >
> > 4 ^ 8: latency average = 107.028 ms
> > 5 ^ 8: latency average = 732.327 ms
> > 6 ^ 8: latency average = 3709.882 ms
> > 7 ^ 8: latency average = 14570.651 ms

> Yeah, just a hack. My intention with it was just to prove we had a
> problem because sometimes Sort -> Incremental Sort was faster than
> Sort.  Ideally, with your change, we'd see that it's always faster to
> do the full sort in one go. It would be good to see your patch with
> and without the planner hack patch to ensure sort is now always faster
> than sort -> incremental sort.

Okay, here's a rerun including the sort hack, and it looks like incremental
sort is only ahead with the smallest set, otherwise same or maybe slightly
slower:

HEAD:

4 ^ 8: latency average = 113.461 ms
5 ^ 8: latency average = 786.080 ms
6 ^ 8: latency average = 3948.388 ms
7 ^ 8: latency average = 15733.348 ms

tiebreaker:

4 ^ 8: latency average = 106.556 ms
5 ^ 8: latency average = 734.834 ms
6 ^ 8: latency average = 3640.507 ms
7 ^ 8: latency average = 14470.199 ms

tiebreaker + incr sort hack:

4 ^ 8: latency average = 93.998 ms
5 ^ 8: latency average = 740.120 ms
6 ^ 8: latency average = 3715.942 ms
7 ^ 8: latency average = 14749.323 ms

And as far as this:

> I suspect that I fat-fingered work_mem yesterday, so next I'll pick a
badly-performing mod like 32, then range over work_mem and see if that
explains anything, especially whether L3 effects are in fact more important
in this workload.

Attached is a script and image for fixing the input at random / mod32 and
varying work_mem. There is not a whole lot of variation here: pushdown
regresses and the tiebreaker patch only helped marginally. I'm still not
sure why the results from yesterday looked better than today.

--
John Naylor
EDB: http://www.enterprisedb.com


bench_windowsort_workmem-20230215.sh
Description: application/shellscript


Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-02-15 Thread John Naylor
I wrote:
> it might be worthwhile to "zoom in" with more measurements, but haven't
> done that yet.

I've attached the script and image for 1 million / random / varying the mod
by quarter-log intervals. Unfortunately I didn't get as good results as
yesterday. Immediately going from mod 1 to mod 2, sort pushdown regresses
sharply and stays regressed up until 1. The tiebreaker patch helps but
never removes the regression.

I suspect that I fat-fingered work_mem yesterday, so next I'll pick a
badly-performing mod like 32, then range over work_mem and see if that
explains anything, especially whether L3 effects are in fact more important
in this workload.

--
John Naylor
EDB: http://www.enterprisedb.com


bench_windowsort-jcn-random-finegrained.sh
Description: application/shellscript


Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-02-15 Thread David Rowley
On Wed, 15 Feb 2023 at 17:23, John Naylor  wrote:
> HEAD:
>
> 4 ^ 8: latency average = 113.976 ms
> 5 ^ 8: latency average = 783.830 ms
> 6 ^ 8: latency average = 3990.351 ms
> 7 ^ 8: latency average = 15793.629 ms
>
> Skip rechecking first key:
>
> 4 ^ 8: latency average = 107.028 ms
> 5 ^ 8: latency average = 732.327 ms
> 6 ^ 8: latency average = 3709.882 ms
> 7 ^ 8: latency average = 14570.651 ms

Thanks for testing that. It's good to see improvements in each of them.

> I gather that planner hack was just a demonstration, so I didn't test it, but 
> if that was a move toward something larger I can run additional tests.

Yeah, just a hack. My intention with it was just to prove we had a
problem because sometimes Sort -> Incremental Sort was faster than
Sort.  Ideally, with your change, we'd see that it's always faster to
do the full sort in one go. It would be good to see your patch with
and without the planner hack patch to ensure sort is now always faster
than sort -> incremental sort.

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-02-14 Thread John Naylor
On Tue, Feb 14, 2023 at 5:46 PM David Rowley  wrote:
>
> I didn't do a detailed review of the sort patch, but I did wonder
> about the use of the name "fallback"  in the new functions. The
> comment in the following snippet from qsort_tuple_unsigned_compare()
> makes me think "tiebreak" is a better name.

I agree "tiebreak" is better.

> I also wonder if the weirdness I reported in [1] would also disappear
> with your patch.  There's a patch on that thread that hacks up the
> planner to split multi-column sorts into Sort -> Incremental Sort
> rather than just a single sort.

I tried that test (attached in script form) with and without the tiebreaker
patch and got some improvement:

HEAD:

4 ^ 8: latency average = 113.976 ms
5 ^ 8: latency average = 783.830 ms
6 ^ 8: latency average = 3990.351 ms
7 ^ 8: latency average = 15793.629 ms

Skip rechecking first key:

4 ^ 8: latency average = 107.028 ms
5 ^ 8: latency average = 732.327 ms
6 ^ 8: latency average = 3709.882 ms
7 ^ 8: latency average = 14570.651 ms

I gather that planner hack was just a demonstration, so I didn't test
it, but if that was a move toward something larger I can run additional
tests.

The configuration was (same as yesterday, but I forgot to mention it then):

turbo off
shared_buffers = '8GB';
work_mem = '4GB';
max_parallel_workers = 0;

--
John Naylor
EDB: http://www.enterprisedb.com


bench_cartesiansort.sh
Description: application/shellscript


Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-02-14 Thread David Rowley
On Tue, 14 Feb 2023 at 17:21, John Naylor  wrote:
> Not rechecking seems to eliminate the regression in 4 cases, and reduce it in 
> the other 2 cases. For those 2 cases (10e6 rows, random, mod 10 and 100), it 
> might be worthwhile to "zoom in" with more measurements, but haven't done 
> that yet.

Thanks for writing up this patch and for running those tests again.
I'm surprised to see there's a decent amount of truth in the surplus
recheck of the first column in tiebreaks (mostly) causing the
regression. I really would have suspected CPU caching effects to be a
bigger factor. From looking at your numbers, it looks like it's just
the mod=100 test in random and unsorted. fallback-on looks faster than
master-off for random mod=10.

I didn't do a detailed review of the sort patch, but I did wonder
about the use of the name "fallback"  in the new functions. The
comment in the following snippet from qsort_tuple_unsigned_compare()
makes me think "tiebreak" is a better name.

/*
* No need to waste effort calling the tiebreak function when there are no
* other keys to sort on.
*/
if (state->base.onlyKey != NULL)
return 0;

return state->base.comparetup_fallback(a, b, state);

I think if we fixed this duplicate recompare thing then I'd be much
more inclined to push the windowagg sort reduction stuff.

I also wonder if the weirdness I reported in [1] would also disappear
with your patch.  There's a patch on that thread that hacks up the
planner to split multi-column sorts into Sort -> Incremental Sort
rather than just a single sort.

David

[1] https://www.postgresql.org/message-id/CAApHDvpAO5H_L84kn9gCJ_hihOavtmDjimKYyftjWtF69BJ=8...@mail.gmail.com




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-02-13 Thread John Naylor
On Thu, Jan 26, 2023 at 9:11 AM David Rowley  wrote:
>
> I'm unsure if 69749243 might be partially to blame here as it favours
> single-key sorts.  If you look at qsort_tuple_signed_compare(), you'll
> see that the tiebreak function will be called only when it's needed
> and there are > 1 sort keys. The comparetup function will re-compare
> the first key all over again. If I get some more time I'll run the
> tests again with the sort specialisation code disabled to see if the
> situation is the same or not.

> I've attached the benchmark script that I used and also a copy of the
> patch with a GUC added solely to allow easier benchmarking of patched
> vs unpatched.

I've attached a cleaned up v2 (*) of a patch to avoid rechecking the first
column if a specialized comparator already did so, and the results of the
"bench_windowsort" benchmark. Here, master vs. patch refers to skipping the
first column recheck, and on/off is the dev guc from David's last patch.

In my test, orderby_windowclause_pushdown caused 6 regressions by itself.

Not rechecking seems to eliminate the regression in 4 cases, and reduce it
in the other 2 cases. For those 2 cases (10e6 rows, random, mod 10 and
100), it might be worthwhile to "zoom in" with more measurements, but
haven't done that yet.

* v1 was here, but I thought it best to keep everything in the same thread,
and that thread is nominally about a different kind of specialization:

https://www.postgresql.org/message-id/CAFBsxsFdFpzyBekxxkiA4vXnLpw-wcaQXz%3DEAP4pzkZMo91-MA%40mail.gmail.com

--
John Naylor
EDB: http://www.enterprisedb.com
From d051fc02cfae4053d22ae5f87f767636c2e6ff7f Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Mon, 30 Jan 2023 17:10:00 +0700
Subject: [PATCH v2] Split out fallback functionality from comparetup*
 functions

Previously, if a specialized comparator found equal datum1 keys,
the comparetup function would call the full comparator on the
datum before proceeding with either the unabbreviated first key
or the second key.

Add a comparetup_fallback field for these that calls special
fallback functions. The existing comparetup functions just call
ApplySortComparator where we can, then call our new fallback.
---
 src/backend/utils/sort/tuplesort.c |   6 +-
 src/backend/utils/sort/tuplesortvariants.c | 109 -
 src/include/utils/tuplesort.h  |   7 ++
 3 files changed, 93 insertions(+), 29 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 9ca9835aab..54f64d6c2d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -513,7 +513,7 @@ qsort_tuple_unsigned_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
 	if (state->base.onlyKey != NULL)
 		return 0;
 
-	return state->base.comparetup(a, b, state);
+	return state->base.comparetup_fallback(a, b, state);
 }
 
 #if SIZEOF_DATUM >= 8
@@ -537,7 +537,7 @@ qsort_tuple_signed_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
 	if (state->base.onlyKey != NULL)
 		return 0;
 
-	return state->base.comparetup(a, b, state);
+	return state->base.comparetup_fallback(a, b, state);
 }
 #endif
 
@@ -561,7 +561,7 @@ qsort_tuple_int32_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
 	if (state->base.onlyKey != NULL)
 		return 0;
 
-	return state->base.comparetup(a, b, state);
+	return state->base.comparetup_fallback(a, b, state);
 }
 
 /*
diff --git a/src/backend/utils/sort/tuplesortvariants.c b/src/backend/utils/sort/tuplesortvariants.c
index eb6cfcfd00..4b49cd0a8d 100644
--- a/src/backend/utils/sort/tuplesortvariants.c
+++ b/src/backend/utils/sort/tuplesortvariants.c
@@ -47,18 +47,24 @@ static void removeabbrev_datum(Tuplesortstate *state, SortTuple *stups,
 							   int count);
 static int	comparetup_heap(const SortTuple *a, const SortTuple *b,
 							Tuplesortstate *state);
+static int	comparetup_heap_fallback(const SortTuple *a, const SortTuple *b,
+									 Tuplesortstate *state);
 static void writetup_heap(Tuplesortstate *state, LogicalTape *tape,
 						  SortTuple *stup);
 static void readtup_heap(Tuplesortstate *state, SortTuple *stup,
 						 LogicalTape *tape, unsigned int len);
 static int	comparetup_cluster(const SortTuple *a, const SortTuple *b,
 							   Tuplesortstate *state);
+static int	comparetup_cluster_fallback(const SortTuple *a, const SortTuple *b,
+										Tuplesortstate *state);
 static void writetup_cluster(Tuplesortstate *state, LogicalTape *tape,

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-30 Thread Ankit Kumar Pandey



> On 30/01/23 11:01, John Naylor wrote:

> Since David referred to L3 size as the starting point of a possible
> configuration parameter, that's actually cache-conscious.


Okay, makes sense. I am correcting error on my part.


> I'm not close enough to this thread to guess at the right direction
> (although I hope related work will help), but I have a couple general
> remarks:

> 1. In my experience, academic papers like to test sorting with
> register-sized numbers or strings. Our sorttuples are bigger, have
> complex comparators, and can fall back to fields in the full tuple.
> 2. That paper is over 20 years old. If it demonstrated something
> genuinely useful, some of those concepts would likely be implemented in
> the real-world somewhere. Looking for evidence of that might be a good
> exercise.
> 3. 20 year-old results may not carry over to modern hardware.
> 4. Open source software is more widespread in the academic world now
> than 20 years ago, so papers with code (maybe even the author's github)
> are much more useful to us in my view.
> 5. It's actually not terribly hard to make sorting faster for some
> specific cases -- the hard part is keeping other inputs of interest from
> regressing.
> 6. The bigger the change, the bigger the risk of regressing somewhere.

Thanks, John, these inputs are just what I was looking for. I will do
more research based on them and build up my understanding.



Regards,

Ankit





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-29 Thread John Naylor
On Sun, Jan 29, 2023 at 7:05 PM Ankit Kumar Pandey wrote:
>
>
> > On 26/01/23 07:40, David Rowley wrote:

> > Sorting in smaller batches that better fit into CPU cache.
>
> More reading yielded that we are looking for cache-oblivious
> sorting algorithm.

Since David referred to L3 size as the starting point of a possible
configuration parameter, that's actually cache-conscious.

> One of the papers [1] mentions that in quicksort, whenever we reach a size
> which can fit in cache, instead of partitioning it further, we can do
> insertion sort there itself.

> > Memory-tuned quicksort uses insertion sort to sort small subsets while
> > they are in the cache, instead of partitioning further. This increases
> > the instruction count, but reduces the cache misses
>
> If this looks like a step in the right direction, I can give it a try and
> do more reading and experimentation.

I'm not close enough to this thread to guess at the right direction
(although I hope related work will help), but I have a couple general
remarks:

1. In my experience, academic papers like to test sorting with
register-sized numbers or strings. Our sorttuples are bigger, have
complex comparators, and can fall back to fields in the full tuple.
2. That paper is over 20 years old. If it demonstrated something genuinely
useful, some of those concepts would likely be implemented in the
real-world somewhere. Looking for evidence of that might be a good exercise.
3. 20 year-old results may not carry over to modern hardware.
4. Open source software is more widespread in the academic world now than
20 years ago, so papers with code (maybe even the author's github) are much
more useful to us in my view.
5. It's actually not terribly hard to make sorting faster for some specific
cases -- the hard part is keeping other inputs of interest from regressing.
6. The bigger the change, the bigger the risk of regressing somewhere.
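
For reference, the technique quoted above boils down to something like this
toy, self-contained sketch (not PostgreSQL code): a quicksort that stops
partitioning once a subset is small enough to plausibly sit in cache and
finishes it with insertion sort.

#include <stdio.h>
#include <stdlib.h>

#define CUTOFF 16               /* stand-in for a cache-sized subset */

static void
insertion_sort(int *a, int lo, int hi)
{
    for (int i = lo + 1; i <= hi; i++)
    {
        int         key = a[i];
        int         j = i - 1;

        while (j >= lo && a[j] > key)
        {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

static void
cache_aware_qsort(int *a, int lo, int hi)
{
    if (hi - lo + 1 <= CUTOFF)
    {
        insertion_sort(a, lo, hi);  /* finish small subsets while cached */
        return;
    }

    /* simple Lomuto partition around the last element */
    int         pivot = a[hi];
    int         i = lo - 1;

    for (int j = lo; j < hi; j++)
    {
        if (a[j] <= pivot)
        {
            int         tmp = a[++i];

            a[i] = a[j];
            a[j] = tmp;
        }
    }

    int         tmp = a[i + 1];

    a[i + 1] = a[hi];
    a[hi] = tmp;

    cache_aware_qsort(a, lo, i);
    cache_aware_qsort(a, i + 2, hi);
}

int
main(void)
{
    int         vals[100];

    for (int i = 0; i < 100; i++)
        vals[i] = rand() % 1000;
    cache_aware_qsort(vals, 0, 99);
    for (int i = 0; i < 100; i++)
        printf("%d ", vals[i]);
    printf("\n");
    return 0;
}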

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-28 Thread Ankit Kumar Pandey



> On 28/01/23 14:31, John Naylor wrote:
> I recently brought up this older thread (mostly about #1), partly
> because of the issues discovered above, and partly because I hope to
> make progress on it before feature freeze (likely early April):
> https://www.postgresql.org/message-id/CAApHDvqXcmzAZDsj8iQs84%2Bdrzo0PgYub_QYzT6Zdj6p4A3A5A%40mail.gmail.com



Thanks, John, for letting me know. I will keep an eye on the above thread and
will focus on the rest of the issues.


Regards,
Ankit




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-28 Thread John Naylor
On Sat, Jan 28, 2023 at 3:21 PM Ankit Kumar Pandey 
wrote:
>
> > On 26/01/23 07:40, David Rowley wrote:

> > We might want to look at 1) Expanding
> > on what 69749243 did and considering if we want sort specialisations
> > that are specifically for 1 column and another set for multi-columns.
> > The multi-column ones don't need to re-compare key[0] again. 2)
> > Sorting in smaller batches that better fit into CPU cache. Both of
> > these ideas would require a large amount of testing and discussion.
> > For #1 we're considering other specialisations, for example NOT NULL,
> > and we don't want to explode the number of specialisations we have to
> > compile into the binary.
>
> Yes, #1 and #2 need to be addressed before going ahead with this patch.
> Do we have any ongoing threads for #1 and #2?

I recently brought up this older thread (mostly about #1), partly because
of the issues discovered above, and partly because I hope to make progress
on it before feature freeze (likely early April):

https://www.postgresql.org/message-id/CAApHDvqXcmzAZDsj8iQs84%2Bdrzo0PgYub_QYzT6Zdj6p4A3A5A%40mail.gmail.com

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-28 Thread Ankit Kumar Pandey




On 26/01/23 07:40, David Rowley wrote:



You can see from the results that the patched version is not looking
particularly great. I did a 1 million row test and a 10 million row
test. work_mem was 4GB for each, so the sorts never went to disk.


Yes, its gains are lackluster and pretty much limited to when data gets
pushed to disk.



I'm unsure if 69749243 might be partially to blame here as it favours
single-key sorts.  If you look at qsort_tuple_signed_compare(), you'll
see that the tiebreak function will be called only when it's needed
and there are > 1 sort keys. The comparetup function will re-compare
the first key all over again. If I get some more time I'll run the
tests again with the sort specialisation code disabled to see if the
situation is the same or not.
Another way to test that would be to have a table with 3 columns and
always sort by at least 2.


I will need to go through this.


I've attached the benchmark script that I used and also a copy of the
patch with a GUC added solely to allow easier benchmarking of patched
vs unpatched.


This is a big relief; it will be easier to repro and create more cases. Thanks.


I think we need to park this patch until we can figure out what can be
done to stop these regressions. 


Makes sense.


We might want to look at 1) Expanding
on what 69749243 did and considering if we want sort specialisations
that are specifically for 1 column and another set for multi-columns.
The multi-column ones don't need to re-compare key[0] again. 2)
Sorting in smaller batches that better fit into CPU cache. Both of
these ideas would require a large amount of testing and discussion.
For #1 we're considering other specialisations, for example NOT NULL,
and we don't want to explode the number of specialisations we have to
compile into the binary.


Yes, #1 and #2 need to be addressed before going ahead with this patch.
Do we have any ongoing threads for #1 and #2?

Also, we have seen an issue with cost (1 sort vs 2 sorts, both having the same
cost):
https://www.postgresql.org/message-id/caaphdvo2y9s2ao-bpyo7gmpyd0xe2lo-kflnqx80fcftqbc...@mail.gmail.com
This needs to be investigated too (it might not be related, but we need to
check at least).

I will do some study on the things mentioned here and will set up new threads
(if they don't exist) based on the discussions we had here.

Thanks,
Ankit





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-24 Thread Ankit Kumar Pandey




I think more benchmarking is required
so we can figure out if this is a corner case or a common case


I did some more benchmarks:

#1. AIM: Pushdown column whose size is very high

create table test(a int, b int, c text);
insert into test select a,b,c from generate_series(1,1000)a, 
generate_series(1,1000)b, repeat(md5(random()::text), 999)c;

explain (analyze, costs off) select count(*) over (order by a), row_number() 
over (order by a, b) from test order by a,b,c;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Incremental Sort (actual time=1161.605..6577.141 rows=100 loops=1)
   Sort Key: a, b, c
   Presorted Key: a, b
   Full-sort Groups: 31250  Sort Method: quicksort  Average Memory: 39kB  Peak Memory: 39kB
   ->  WindowAgg (actual time=1158.896..5819.460 rows=100 loops=1)
         ->  WindowAgg (actual time=1154.614..3391.537 rows=100 loops=1)
               ->  Gather Merge (actual time=1154.602..2404.125 rows=100 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Sort (actual time=1118.326..1295.743 rows=33 loops=3)
                           Sort Key: a, b
                           Sort Method: external merge  Disk: 145648kB
                           Worker 0:  Sort Method: external merge  Disk: 140608kB
                           Worker 1:  Sort Method: external merge  Disk: 132792kB
                           ->  Parallel Seq Scan on test (actual time=0.018..169.319 rows=33 loops=3)
 Planning Time: 0.091 ms
 Execution Time: 6816.616 ms
(17 rows)

The planner correctly chose the faster path (which was not the path with the
pushed-down column).

#2. AIM: Check strict vs incremental sorts with respect to large data sizes
The patched version is faster because, for an external merge sort, disk I/O is
the main bottleneck, and sorting an extra column doesn't have a major impact.
This is when work_mem is very small.

For larger work_mem, the difference between the patched version and master is
minimal and they both provide somewhat comparable performance.

I tried permutations of a few cases which we have already covered, but I did
not see anything alarming in those.



I'm just unsure if we should write this off as the expected behaviour
of Sort and continue with the patch or delay the whole thing until we
make some improvements to sort.

I am not seeing other cases where the patched version is consistently slower.


Thanks,
Ankit






Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-18 Thread Ankit Kumar Pandey




On 19/01/23 08:58, David Rowley wrote:



The problem is that the next item looked at is 1 and the value 2 is skipped.


Okay, I see the issue.


I think you are misinterpreting the results, but the main point
remains - it's slower.  The explain analyze timing shows the time
between outputting the first row and the last row. For sort, there's a
lot of work that needs to be done before you output the first row.


Okay got it.


I looked into this a bit further and using the same table as you, and
the attached set of hacks that adjust the ORDER BY path generation to
split a Sort into a Sort and Incremental Sort when the number of
pathkeys to sort by is > 2.



Okay, so this is an issue with the strict sort itself.

I will play around with the current patch, but it should be fine for the
current work and would account for the issue in the strict sort.


I suspect this might be to do with having to perform more swaps in the
array elements that we're sorting by with the full sort.  When work_mem is
large this array no longer fits in CPU cache and I suspect those swaps become
more costly when there are more cache lines having to be fetched from RAM.


Looks like a possible reason.


I think we really should fix tuplesort.c so that it batches sorts into
about L3 CPU cache-sized chunks in RAM rather than trying to sort much
larger arrays.



I assume we are doing the same thing for incremental sort and that's why it
performs better here?

Or is it due to the shorter tuple width and thus being able to fit into
memory more easily?
memory easily?


 


On the
other hand, we already sort WindowClauses with the most strict sort
order first which results in a full Sort and no additional sort for
subsequent WindowClauses that can make use of that sort order.  It
would be bizarre to reverse the final few lines of common_prefix_cmp
so that we sort the least strict order first so we end up injecting
Incremental Sorts into the plan to make it faster!


I would see this as beneficial when the tuple size is too large for the strict sort.


Basically:

cost(sort(a,b,c,d,e)) > cost(sort(a) + sort(b) + sort(c) + sort(d,e))

I don't know if a cost-based decision is realistic in the current
state of things.


I think more benchmarking is required
so we can figure out if this is a corner case or a common case


I will do a few more benchmarks. I assume all scaling issues with the strict
sort remain as they are, but that no further issues come up due to this change
itself.



I'm just unsure if we should write this off as the expected behaviour
of Sort and continue with the patch or delay the whole thing until we
make some improvements to sort.


Unless we find more issues, we might be good with the former, but you can
make the better call on this. As much as I would love my first patch to get
merged, I want it to be right.


Thanks,

Ankit






Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-18 Thread David Rowley
On Thu, 19 Jan 2023 at 06:27, Ankit Kumar Pandey  wrote:
> Hmm, not really sure why I missed that. I tried this again (added the
> following in postgres.c above
>
> PortalStart)
>
> List* l = NIL;
> l = lappend(l, 1);
> l = lappend(l, 2);
> l = lappend(l, 3);
> l = lappend(l, 4);
>
> ListCell *lc;
> foreach_reverse(lc, l)
> {
> if (foreach_current_index(lc) == 2) // delete 3
> {
> foreach_delete_current(l, lc);
> }
> }

The problem is that the next item looked at is 1 and the value 2 is skipped.

> I also tried a bit unrealistic case.
>
> create table abcdefgh(a int, b int, c int, d int, e int, f int, g int, h int);
>
> insert into abcdefgh select a,b,c,d,e,f,g,h from
> generate_series(1,7) a,
> generate_series(1,7) b,
> generate_series(1,7) c,
> generate_series(1,7) d,
> generate_series(1,7) e,
> generate_series(1,7) f,
> generate_series(1,7) g,
> generate_series(1,7) h;
>
> explain analyze select count(*) over (order by a),
> row_number() over (partition by a order by b) from abcdefgh order by 
> a,b,c,d,e,f,g,h;
>
> In patch version

>   Execution Time: 82748.789 ms

> In Master

>   Execution Time: 71172.586 ms

> Patch version sort took around 15 sec, whereas master sort took ~2 + 46
> = around 48 sec


> Maybe I am interpreting it wrong but still wanted to bring this to notice.

I think you are misinterpreting the results, but the main point
remains - it's slower.  The explain analyze timing shows the time
between outputting the first row and the last row. For sort, there's a
lot of work that needs to be done before you output the first row.

I looked into this a bit further and using the same table as you, and
the attached set of hacks that adjust the ORDER BY path generation to
split a Sort into a Sort and Incremental Sort when the number of
pathkeys to sort by is > 2.

work_mem = '1GB' in all cases below:

I turned off the timing in EXPLAIN so that wasn't a factor in the
results. Generally having more nodes means more timing requests and
that's got > 0 overhead.

explain (analyze,timing off) select * from abcdefgh order by a,b,c,d,e,f,g,h;

7^8 rows
Master: Execution Time: 7444.479 ms
master + sort_hacks.diff: Execution Time: 5147.937 ms

So I'm also seeing Incremental Sort - > Sort faster than a single Sort
for 7^8 rows.

With 5^8 rows:
master: Execution Time: 299.949 ms
master + sort_hacks: Execution Time: 239.447 ms

and 4^8 rows:
master: Execution Time: 62.596 ms
master + sort_hacks: Execution Time: 67.900 ms

So at 4^8 sort_hacks becomes slower.  I suspect this might be to do
with having to perform more swaps in the array elements that we're
sorting by with the full sort.  When work_mem is large this array no
longer fits in CPU cache and I suspect those swaps become more costly
when there are more cache lines having to be fetched from RAM.

I think we really should fix tuplesort.c so that it batches sorts into
about L3 CPU cache-sized chunks in RAM rather than trying to sort much
larger arrays.

I'm just unsure if we should write this off as the expected behaviour
of Sort and continue with the patch or delay the whole thing until we
make some improvements to sort.  I think more benchmarking is required
so we can figure out if this is a corner case or a common case. On the
other hand, we already sort WindowClauses with the most strict sort
order first which results in a full Sort and no additional sort for
subsequent WindowClauses that can make use of that sort order.  It
would be bizarre to reverse the final few lines of common_prefix_cmp
so that we sort the least strict order first so we end up injecting
Incremental Sorts into the plan to make it faster!

David
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 093ace0864..a714e83a69 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5085,11 +5085,30 @@ create_ordered_paths(PlannerInfo *root,
 		 * incremental sort when there are presorted keys.
 		 */
 		if (presorted_keys == 0 || !enable_incremental_sort)
-			sorted_path = (Path *) create_sort_path(root,
-													ordered_rel,
-													input_path,
-													root->sort_pathkeys,
-													limit_tuples);
+		{
+			if (list_length(root->sort_pathkeys) > 2)
+			{
+				sorted_path = (Path *) create_sort_path(root,
+

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-18 Thread Ankit Kumar Pandey




On 18/01/23 15:12, David Rowley wrote:



I also thought I'd better test that foreach_delete_current() works
with foreach_reverse(). I can confirm that it *does not* work
correctly.  I guess maybe you only tested the fact that it deleted the
current item and not that the subsequent loop correctly went to the
item directly before the deleted item. There's a problem with that. We
skip an item.


Hmm, not really sure why I missed that. I tried this again (added the
following in postgres.c above


PortalStart)

List* l = NIL;
l = lappend(l, 1);
l = lappend(l, 2);
l = lappend(l, 3);
l = lappend(l, 4);

ListCell *lc;
foreach_reverse(lc, l)
{
if (foreach_current_index(lc) == 2) // delete 3
{
foreach_delete_current(l, lc);
}
}

foreach(lc, l)
{
int i = (int) lfirst(lc);
ereport(LOG,(errmsg("%d", i)));
}

Got result:
2023-01-18 20:23:28.115 IST [51007] LOG:  1
2023-01-18 20:23:28.115 IST [51007] STATEMENT:  select pg_backend_pid();
2023-01-18 20:23:28.115 IST [51007] LOG:  2
2023-01-18 20:23:28.115 IST [51007] STATEMENT:  select pg_backend_pid();
2023-01-18 20:23:28.115 IST [51007] LOG:  4
2023-01-18 20:23:28.115 IST [51007] STATEMENT:  select pg_backend_pid();

I had expected list_delete_cell to take care of the rest.


Instead of fixing that, I think it's likely better just to loop
backwards manually with a for() loop, so I've adjusted the patch to
work that way.  It's quite likely that the additional code in
foreach() and what was in foreach_reverse() is slower than looping
manually due to the additional work those macros do to set the cell to
NULL when we run out of cells to loop over.


Okay, current version looks fine as well.


I made another pass over the v7 patch and fixed a bug that was
disabling the optimization when the deepest WindowAgg had a
runCondition.  This should have been using llast_node instead of
linitial_node.  The top-level WindowAgg is the last in the list.



I also made a pass over the tests and added a test case for the
runCondition check to make sure we disable the optimization when the
top-level WindowAgg has one of those.  


I wasn't sure what your test comments case numbers were meant to represent. 
They were not aligned with the code comments that document when the optimisation is

disabled, they started out aligned, but seemed to go off the rails at
#3. I've now made them follow the comments in create_one_window_path()
and made it more clear what we expect the test outcome to be in each
case.


Those were just numbering for the exceptional cases; keeping them in sync with
the comments wasn't really on my mind, but now they look better.


I've attached the v9 patch. I feel like this patch is quite
self-contained and I'm quite happy with it now.  If there are no
objections soon, I'm planning on pushing it.


The patch is already rebased on the latest master and the tests are all green.

I tried some basic profiling and it looked good.


I also tried a bit unrealistic case.

create table abcdefgh(a int, b int, c int, d int, e int, f int, g int, h int);

insert into abcdefgh select a,b,c,d,e,f,g,h from
generate_series(1,7) a,
generate_series(1,7) b,
generate_series(1,7) c,
generate_series(1,7) d,
generate_series(1,7) e,
generate_series(1,7) f,
generate_series(1,7) g,
generate_series(1,7) h;

explain analyze select count(*) over (order by a),
row_number() over (partition by a order by b) from abcdefgh order by 
a,b,c,d,e,f,g,h;

In patch version

                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=1023241.14..1225007.67 rows=5764758 width=48) (actual time=64957.894..81950.352 rows=5764801 loops=1)
   ->  WindowAgg  (cost=1023241.14..1138536.30 rows=5764758 width=40) (actual time=37959.055..60391.799 rows=5764801 loops=1)
         ->  Sort  (cost=1023241.14..1037653.03 rows=5764758 width=32) (actual time=37959.045..52968.791 rows=5764801 loops=1)
               Sort Key: a, b, c, d, e, f, g, h
               Sort Method: external merge  Disk: 237016kB
               ->  Seq Scan on abcdefgh  (cost=0.00..100036.58 rows=5764758 width=32) (actual time=0.857..1341.107 rows=5764801 loops=1)
 Planning Time: 0.168 ms
 Execution Time: 82748.789 ms
(8 rows)

In Master

                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Incremental Sort  (cost=1040889.72..1960081.97 rows=5764758 width=48) (actual time=23461.815..69654.700 rows=5764801 loops=1)
   Sort Key: a, b, c, d, e, f, g, h
   Presorted Key: a, b
   Full-sort Groups: 49  Sort Method: quicksort  Average Memory: 30kB  Peak Memory: 30kB
   Pre-sorted Groups: 49  Sort Method: external merge  Average Disk: 6688kB  Peak Disk: 6688kB
   ->  WindowAgg  (cost=1023241.14..1225007.67 rows=5764758 width=48) (actual time=22729.171..40189.407 rows=5764801 loops=1)
         ->  

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-18 Thread David Rowley
On Tue, 10 Jan 2023 at 06:15, Ankit Kumar Pandey  wrote:
>
>
> > On 09/01/23 17:53, David Rowley wrote:
> > I gave some thought to whether doing foreach_delete_current() is safe
> > within a foreach_reverse() loop. I didn't try it, but I couldn't see
> > any reason why not.  It is pretty late here and I'd need to test that
> > to be certain. If it turns out not to be safe then we need to document
> > that fact in the comments of the foreach_reverse() macro and the
> > foreach_delete_current() macro.
>
> I tested foreach_delete_current inside foreach_reverse loop.
>
> It worked fine.

I also thought I'd better test that foreach_delete_current() works
with foreach_reverse(). I can confirm that it *does not* work
correctly.  I guess maybe you only tested the fact that it deleted the
current item and not that the subsequent loop correctly went to the
item directly before the deleted item. There's a problem with that. We
skip an item.

Instead of fixing that, I think it's likely better just to loop
backwards manually with a for() loop, so I've adjusted the patch to
work that way.  It's quite likely that the additional code in
foreach() and what was in foreach_reverse() is slower than looping
manually due to the additional work those macros do to set the cell to
NULL when we run out of cells to loop over.
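
Something like this (sketch only; the actual patch may differ in the details):

	/* loop over, say, the activeWindows list in reverse with a plain for() */
	for (int i = list_length(activeWindows) - 1; i >= 0; i--)
	{
		WindowClause *wc = list_nth_node(WindowClause, activeWindows, i);

		/* ... examine wc, break/continue as needed ... */
	}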

> I have attached patch with one extra test case (as mentioned above).
> Rest of the changes are looking fine.

I made another pass over the v7 patch and fixed a bug that was
disabling the optimization when the deepest WindowAgg had a
runCondition.  This should have been using llast_node instead of
linitial_node.  The top-level WindowAgg is the last in the list.

I also made a pass over the tests and added a test case for the
runCondition check to make sure we disable the optimization when the
top-level WindowAgg has one of those.  I wasn't sure what your test
comments case numbers were meant to represent. They were not aligned
with the code comments that document when the optimisation is
disabled, they started out aligned, but seemed to go off the rails at
#3. I've now made them follow the comments in create_one_window_path()
and made it more clear what we expect the test outcome to be in each
case.

I've attached the v9 patch. I feel like this patch is quite
self-contained and I'm quite happy with it now.  If there are no
objections soon, I'm planning on pushing it.

David
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 093ace0864..c110696403 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4464,8 +4464,11 @@ create_one_window_path(PlannerInfo *root,
 					   List *activeWindows)
 {
 	PathTarget *window_target;
+	Query	   *parse = root->parse;
 	ListCell   *l;
 	List	   *topqual = NIL;
+	List	   *orderby_pathkeys = NIL;
+	int			sort_pushdown_idx = -1;
 
 	/*
 	 * Since each window clause could require a different sort order, we stack
@@ -4484,6 +4487,75 @@ create_one_window_path(PlannerInfo *root,
 	 */
 	window_target = input_target;
 
+	/*
+	 * Here we attempt to minimize the number of sorts which must be performed
+	 * for queries with an ORDER BY clause.
+	 *
+	 * It's possible due to select_active_windows() putting the WindowClauses
+	 * with the lowest tleSortGroupRef last in the activeWindows list that the
+	 * final WindowClause has a subset of the sort requirements that the
+	 * query's ORDER BY clause has.  Below we try to detect this case and if
+	 * we find this is true, we consider adjusting the sort that's done for
+	 * WindowAggs and include the additional clauses so that no additional
+	 * sorting is required for the query's ORDER BY clause.
+	 *
+	 * We don't try this optimization for the following cases:
+	 *
+	 * 1. If the query has a DISTINCT clause.  We needn't waste any additional
+	 * effort for the more strict sort order as if DISTINCT is done via Hash
+	 * Aggregate then that's going to undo this work.
+	 *
+	 * 2. If the query has a LIMIT clause.  The top-level sort will be able to
+	 * use a top-n sort which might be more efficient than sorting by the
+	 * additional columns.  If the LIMIT does not reduce the number of rows
+	 * significantly then this might not be true, but we don't try to consider
+	 * that here.
+	 *
+	 * 3. If the top-level WindowClause has a runCondition then this can
+	 * filter out tuples and make the final sort cheaper.  If we pushed the
+	 * sort down below the WindowAgg then we'd need to sort all rows including
+	 * ones that the runCondition might filter out.  This may waste effort so
+	 * we just don't try to push down the sort for this case.
+

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-15 Thread Ankit Kumar Pandey




On 16/01/23 09:48, David Rowley wrote:
I don't think we should touch this.  It could significantly increase
the number of indexes that we consider when generating paths on base
relations and therefore *significantly* increase the number of paths
we consider during the join search.  What I had in mind was just
making better use of existing paths to see if we can find a cheaper
way to perform the DISTINCT.  That'll only possibly increase the
number of paths for the distinct upper relation which really only
increases the number of paths which are considered in
create_ordered_paths(). That's unlikely to cause much of a slowdown in
the planner.


Okay, I see the issue. Makes sense.



I'm seeing these two things as separate patches. I don't think there's
any need to add further complexity to the patch that tries to reduce
the number of sorts for WindowAggs. I think you'd better start a new
thread for this.


I will start a new thread for this with a separate patch.



As far as I see it, you shouldn't be touching the distinct_pathkeys.
Those are set in such a way as to minimise the likelihood of an
additional sort for the ORDER BY.  If you've fiddled with that, then I
imagine this is why the plan below has an additional Incremental Sort
that didn't exist before.



I've not looked at your patch, but all I imagine you need to do for it
is to invent a function in pathkeys.c which is along the lines of what
pathkeys_count_contained_in() does, but returns a List of pathkeys
which are in keys1 but not in keys2 and NIL if keys2 has a pathkey
that does not exist as a pathkey in keys1. In
create_final_distinct_paths(), you can then perform an incremental
sort on any input_path which has a non-empty return list and in
create_incremental_sort_path(), you'll pass presorted_keys as the
number of pathkeys in the path, and the required pathkeys the
input_path->pathkeys + the pathkeys returned from the new function.


Okay, this should be straightforward. Let me try this.


As an optimization, you might want to consider that the
distinct_pathkeys list might be long and that the new function, if you
code the lookup as a nested loop, might be slow. You might want to
consider hashing the distinct_pathkeys once in
create_final_distinct_paths(), then for each input_path, perform a
series of hash lookups to see which of the input_path->pathkeys are in
the hash table. That might require adding two functions to pathkeys.c,
one to build the hash table and then another to probe it and return
the remaining pathkeys list.  I'd go and make sure it all works as we
expect before going to the trouble of trying to optimize this. A
simple nested loop lookup will allow us to review that this works as
we expect.


Okay, makes sense. I will start with the nested loop while it is in review and
then do the optimal version once it is all good to go.

Thanks,

Ankit





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-15 Thread David Rowley
On Mon, 16 Jan 2023 at 06:52, Ankit Kumar Pandey  wrote:
> Hence, input_rel->pathlist returns null for select distinct b,a from ab
> where a < 10; even if index is created on a.
>
> In order to tackle this, I have added index_pathkeys in indexpath node
> itself.

I don't think we should touch this.  It could significantly increase
the number of indexes that we consider when generating paths on base
relations and therefore *significantly* increase the number of paths
we consider during the join search.  What I had in mind was just
making better use of existing paths to see if we can find a cheaper
way to perform the DISTINCT.  That'll only possibly increase the
number of paths for the distinct upper relation which really only
increases the number of paths which are considered in
create_ordered_paths(). That's unlikely to cause much of a slowdown in
the planner.

> Although I started this patch from master, I merged changes to window sort
> optimizations.

I'm seeing these two things as separate patches. I don't think there's
any need to add further complexity to the patch that tries to reduce
the number of sorts for WindowAggs. I think you'd better start a new
thread for this.

> Also, I am assuming distinct pathkeys can be changed without any issues.
> As changes are limited to modification in distinct path only,

As far as I see it, you shouldn't be touching the distinct_pathkeys.
Those are set in such a way as to minimise the likelihood of an
additional sort for the ORDER BY.  If you've fiddled with that, then I
imagine this is why the plan below has an additional Incremental Sort
that didn't exist before.

I've not looked at your patch, but all I imagine you need to do for it
is to invent a function in pathkeys.c which is along the lines of what
pathkeys_count_contained_in() does, but returns a List of pathkeys
which are in keys1 but not in keys2 and NIL if keys2 has a pathkey
that does not exist as a pathkey in keys1. In
create_final_distinct_paths(), you can then perform an incremental
sort on any input_path which has a non-empty return list and in
create_incremental_sort_path(), you'll pass presorted_keys as the
number of pathkeys in the path, and the required pathkeys the
input_path->pathkeys + the pathkeys returned from the new function.
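
A minimal sketch of such a function (hypothetical name, not committed code;
pathkeys are canonical, so pointer membership via list_member_ptr() should be
enough):

static List *
pathkeys_useful_difference(List *keys1, List *keys2)
{
	List	   *result = NIL;
	ListCell   *lc;

	/* if keys2 needs a pathkey that keys1 can't provide, give up */
	foreach(lc, keys2)
	{
		if (!list_member_ptr(keys1, lfirst(lc)))
			return NIL;
	}

	/* otherwise return the pathkeys in keys1 that keys2 doesn't have */
	foreach(lc, keys1)
	{
		PathKey    *pathkey = lfirst_node(PathKey, lc);

		if (!list_member_ptr(keys2, pathkey))
			result = lappend(result, pathkey);
	}

	return result;
}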

As an optimization, you might want to consider that the
distinct_pathkeys list might be long and that the new function, if you
code the lookup as a nested loop, might be slow. You might want to
consider hashing the distinct_pathkeys once in
create_final_distinct_paths(), then for each input_path, perform a
series of hash lookups to see which of the input_path->pathkeys are in
the hash table. That might require adding two functions to pathkeys.c,
one to build the hash table and then another to probe it and return
the remaining pathkeys list.  I'd go and make sure it all works as we
expect before going to the trouble of trying to optimize this. A
simple nested loop lookup will allow us to review that this works as
we expect.

> EXPLAIN (COSTS OFF)
> SELECT DISTINCT
> empno,
> enroll_date,
> depname,
> sum(salary) OVER (PARTITION BY depname order by empno) depsalary,
> min(salary) OVER (PARTITION BY depname order by enroll_date) 
> depminsalary
> FROM empsalary
> ORDER BY depname, empno;
>   QUERY PLAN
> -
>   Incremental Sort
> Sort Key: depname, empno
> Presorted Key: depname
> ->  Unique
>   ->  Incremental Sort
> Sort Key: depname, enroll_date, empno, (sum(salary) OVER 
> (?)), (min(salary) OVER (?))
> Presorted Key: depname
> ->  WindowAgg
>   ->  Incremental Sort
> Sort Key: depname, empno
> Presorted Key: depname
> ->  WindowAgg
>   ->  Sort
> Sort Key: depname, enroll_date
> ->  Seq Scan on empsalary
> (15 rows)

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-15 Thread Ankit Kumar Pandey



On 13/01/23 07:48, David Rowley wrote:



It would just care if the
pathkeys were present and return a list of pathkeys not contained so
that an incremental sort could be done only on the returned list and a
Unique on an empty returned list. 


In create_final_distinct_paths(), the presorted keys are determined from
input_rel->pathlist and needed_pathkeys. The problem with input_rel->pathlist
is that, for an index path, the pathkeys stored there are the useful_pathkeys
computed by truncate_useless_pathkeys(index_pathkeys), which removes the index
keys if their ordering differs from what the query asks for.

Hence, input_rel->pathlist returns null for select distinct b,a from ab
where a < 10; even if an index is created on a.

In order to tackle this, I have added index_pathkeys in the IndexPath node itself.

Although I started this patch from master, I merged in the changes for the
window sort optimizations.


In patched version:

set enable_hashagg=0;
set enable_seqscan=0;

explain select distinct relname,relkind,count(*) over (partition by
relkind) from pg_Class;
  QUERY PLAN
---
 Unique  (cost=139.49..163.73 rows=415 width=73)
   ->  Incremental Sort  (cost=139.49..160.62 rows=415 width=73)
 Sort Key: relkind, relname, (count(*) OVER (?))
 Presorted Key: relkind
 ->  WindowAgg  (cost=134.20..141.46 rows=415 width=73)
   ->  Sort  (cost=134.20..135.23 rows=415 width=65)
 Sort Key: relkind
 ->  Seq Scan on pg_class  
(cost=100.00..116.15 rows=415 width=65)
(8 rows)

explain select distinct b,a from ab where a < 10;
    QUERY PLAN
--
 Unique  (cost=0.71..60.05 rows=611 width=8)
   ->  Incremental Sort  (cost=0.71..55.55 rows=900 width=8)
 Sort Key: a, b
 Presorted Key: a
 ->  Index Scan using ab_a_idx on ab  (cost=0.29..28.04 rows=900 
width=8)
   Index Cond: (a < 10)
(6 rows)

explain select distinct b,a, count(*) over (partition by a) from abcd order by 
a,b;
 QUERY PLAN
-
 Unique  (cost=1021174.63..1038095.75 rows=60 width=16)
   ->  Incremental Sort  (cost=1021174.63..1036745.75 rows=18 
width=16)
 Sort Key: a, b, (count(*) OVER (?))
 Presorted Key: a, b
 ->  WindowAgg  (cost=1020948.87..1024098.87 rows=18 
width=16)
   ->  Sort  (cost=1020948.87..1021398.87 rows=18 
width=8)
 Sort Key: a, b
 ->  Seq Scan on abcd  (cost=100.00..1002773.00 
rows=18 width=8)
(8 rows)

explain select distinct a, b, count(*) over (partition by a,b,c) from abcd;
  QUERY PLAN
--
 Unique  (cost=1021580.47..1036629.31 rows=60 width=20)
   ->  Incremental Sort  (cost=1021580.47..1035279.31 rows=18 
width=20)
 Sort Key: a, b, c, (count(*) OVER (?))
 Presorted Key: a, b, c
 ->  WindowAgg  (cost=1021561.37..1025611.37 rows=18 
width=20)
   ->  Sort  (cost=1021561.37..1022011.37 rows=18 
width=12)
 Sort Key: a, b, c
 ->  Seq Scan on abcd  (cost=100.00..1002773.00 
rows=18 width=12)
(8 rows)

explain select distinct a, b, count(*) over (partition by b,a, c) from abcd;

    QUERY PLAN

---

 Unique  (cost=2041.88..36764.90 rows=60 width=20)

   ->  Incremental Sort  (cost=2041.88..35414.90 rows=18 width=20)

 Sort Key: b, a, c, (count(*) OVER (?))

 Presorted Key: b, a, c

 ->  WindowAgg  (cost=1989.94..25746.96 rows=18 width=20)

   ->  Incremental Sort  (cost=1989.94..22146.96 rows=18 
width=12)

 Sort Key: b, a, c

 Presorted Key: b

 ->  Index Scan using b_idx on abcd  (cost=0.29..7174.62 
rows=18 width=12)

(9 rows)


In master:

explain select distinct relname,relkind,count(*) over (partition by
relkind) from pg_Class;

  QUERY PLAN
---
 Unique  (cost=159.50..163.65 rows=415 width=73)
   ->  Sort  (cost=159.50..160.54 rows=415 width=73)
 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-12 Thread Ankit Kumar Pandey




On 13/01/23 07:48, David Rowley wrote:



I don't think you can claim that one so easily.  The two should have
quite different scaling characteristics which will be more evident
with a larger number of input rows. Also, Hash Aggregate makes use of
work_mem * hash_mem_multiplier, whereas sort uses work_mem.  Consider
a hash_mem_multiplier less than 1.0.


In that case, it would make sense to do the pushdown. I will test with larger
data sets to see this for myself.




We could consider adjusting the create_distinct_paths() so that it
uses some newly invented and less strict pathkey comparison where the
order of the pathkeys does not matter. It would just care if the
pathkeys were present and return a list of pathkeys not contained so
that an incremental sort could be done only on the returned list and a
Unique on an empty returned list.  Something like that might be able
to apply in more cases, for example:



select distinct b,a from ab where a < 10;



the distinct pathkeys would be b,a but if there's an index on (a),
then we might have a path with pathkeys containing "a".


This would be a very good improvement.


even if they're not costed quite as
accurately as we might have liked


This is actually a very exciting piece. Once the current set of optimizations
is in, I will give this a shot; we need to look at the cost model for sorting.



We might also want to consider if Pathkey.pk_strategy and
pk_nulls_first need to be compared too.  That makes the check a bit
more expensive as Pathkeys are canonical and if those fields vary then
we need to perform more than just a comparison by the memory address
of the pathkey.
 

This very much seems like a separate effort than the
WindowClause sort reduction work. I think it gives us everything we've
talked about extra we might want out of reducing WindowClause sorts
and more.


I will work on this as a separate patch (against HEAD). It makes much more
sense to treat this as a DISTINCT sort optimization (which the window sort
optimization can then benefit from). We can decide whether to combine them or
apply them in series; from a unit-of-work perspective, I would prefer the latter.


Anyway, the forthcoming patch will contain the following:

1. Modify create_distinct_paths() with a newly invented, less strict pathkey
comparison where the order of the pathkeys does not matter.

2. Handle the Pathkey.pk_strategy and pk_nulls_first comparison.

3. Test cases.


Thanks


Regards,

Ankit Kumar Pandey





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-12 Thread David Rowley
On Wed, 11 Jan 2023 at 19:21, Ankit Kumar Pandey  wrote:
> HashAgg has better cost than Unique even with incremental sort (tried
> with other case
>
> where we have more columns pushed down but still hashAgg wins).

I don't think you can claim that one so easily.  The two should have
quite different scaling characteristics which will be more evident
with a larger number of input rows. Also, Hash Aggregate makes use of
work_mem * hash_mem_multiplier, whereas sort uses work_mem.  Consider
a hash_mem_multiplier less than 1.0.

> > I'm not really all that sure the above query shape makes much sense in
> > the real world. Would anyone ever want to use DISTINCT on some results
> > containing WindowFuncs?
>
> This could still have been good to have if there were no negative impact
>
> and some benefit in few cases but as mentioned before, if hashagg removes
>
> any sort (which happened due to push down), all gains will be lost
>
> and we will be probably worse off than before.

We could consider adjusting the create_distinct_paths() so that it
uses some newly invented and less strict pathkey comparison where the
order of the pathkeys does not matter. It would just care if the
pathkeys were present and return a list of pathkeys not contained so
that an incremental sort could be done only on the returned list and a
Unique on an empty returned list.  Something like that might be able
to apply in more cases, for example:

select distinct b,a from ab where a < 10;

the distinct pathkeys would be b,a but if there's an index on (a),
then we might have a path with pathkeys containing "a".

You can see when we manually swap the order of the DISTINCT clause
that we get a more optimal plan (even if they're not costed quite as
accurately as we might have liked)

create table ab(a int, b int);
create index on ab(a);
set enable_hashagg=0;
set enable_seqscan=0;
insert into ab select x,y from generate_series(1,100)x, generate_Series(1,100)y;
analyze ab;

# explain select distinct b,a from ab where a < 10;
QUERY PLAN
--
 Unique  (cost=72.20..78.95 rows=611 width=8)
   ->  Sort  (cost=72.20..74.45 rows=900 width=8)
 Sort Key: b, a
 ->  Index Scan using ab_a_idx on ab  (cost=0.29..28.04
rows=900 width=8)
   Index Cond: (a < 10)
(5 rows)

# explain select distinct a,b from ab where a < 10; -- manually swap
DISTINCT column order.
QUERY PLAN
--
 Unique  (cost=0.71..60.05 rows=611 width=8)
   ->  Incremental Sort  (cost=0.71..55.55 rows=900 width=8)
 Sort Key: a, b
 Presorted Key: a
 ->  Index Scan using ab_a_idx on ab  (cost=0.29..28.04
rows=900 width=8)
   Index Cond: (a < 10)
(6 rows)

We might also want to consider if Pathkey.pk_strategy and
pk_nulls_first need to be compared too.  That makes the check a bit
more expensive as Pathkeys are canonical and if those fields vary then
we need to perform more than just a comparison by the memory address
of the pathkey. This very much seems like a separate effort than the
WindowClause sort reduction work. I think it gives us everything we've
talked about extra we might want out of reducing WindowClause sorts
and more.
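
To illustrate that last point, here is a hedged sketch of what looking inside the PathKey
would involve. The helper name and the "strict" flag are hypothetical; only the PathKey
fields (pk_eclass, pk_strategy, pk_nulls_first) are the existing ones.

/*
 * Sketch only.  Because PathKeys are canonical, the cheap test is pointer
 * equality; anything "less strict" has to look at the fields instead.
 */
static bool
pathkeys_order_compatible(PathKey *pk1, PathKey *pk2, bool strict)
{
    if (pk1 == pk2)
        return true;            /* canonical: same pointer, same sort order */

    if (pk1->pk_eclass != pk2->pk_eclass)
        return false;           /* sorting different expressions */

    /* same expression; optionally require direction and NULLS placement too */
    if (strict)
        return pk1->pk_strategy == pk2->pk_strategy &&
               pk1->pk_nulls_first == pk2->pk_nulls_first;

    return true;
}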

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-10 Thread Ankit Kumar Pandey

On 11/01/23 06:18, David Rowley wrote:




Not sure if we should be trying to improve that in this patch. I just
wanted to identify it as something else that perhaps could be done.


This could be within reach, but the original problem remains: hashagg can
remove any gains from this.


eg

set enable_hashagg=0;

explain select distinct relkind, relname, count(*) over (partition by
relkind) from pg_Class;
 QUERY PLAN

 Unique  (cost=41.26..65.32 rows=412 width=73)
   ->  Incremental Sort  (cost=41.26..62.23 rows=412 width=73)
 Sort Key: relkind, relname, (count(*) OVER (?))
 Presorted Key: relkind
 ->  WindowAgg  (cost=36.01..43.22 rows=412 width=73)
   ->  Sort  (cost=36.01..37.04 rows=412 width=65)
 Sort Key: relkind
 ->  Seq Scan on pg_class  (cost=0.00..18.12 rows=412 
width=65)
(8 rows)

reset enable_hashagg;
explain select distinct relkind, relname, count(*) over (partition by
relkind) from pg_Class;
  QUERY PLAN
--
 HashAggregate  (cost=46.31..50.43 rows=412 width=73)
   Group Key: relkind, relname, count(*) OVER (?)
   ->  WindowAgg  (cost=36.01..43.22 rows=412 width=73)
 ->  Sort  (cost=36.01..37.04 rows=412 width=65)
   Sort Key: relkind
   ->  Seq Scan on pg_class  (cost=0.00..18.12 rows=412 width=65)
(6 rows)

HashAgg has a better cost than Unique even with the incremental sort (I tried
another case where more columns are pushed down, but HashAgg still wins).

explain select distinct a, b, count(*) over (partition by a order by b) from 
abcd;
  QUERY PLAN
--
 Unique  (cost=345712.12..400370.25 rows=1595 width=16)
   ->  Incremental Sort  (cost=345712.12..395456.14 rows=655214 width=16)
 Sort Key: a, b, (count(*) OVER (?))
 Presorted Key: a, b
 ->  WindowAgg  (cost=345686.08..358790.36 rows=655214 width=16)
   ->  Sort  (cost=345686.08..347324.11 rows=655214 width=8)
 Sort Key: a, b
 ->  Seq Scan on abcd  (cost=0.00..273427.14 rows=655214 
width=8)

explain select distinct a, b, count(*) over (partition by a order by b) from 
abcd;

   QUERY PLAN



 HashAggregate  (cost=363704.46..363720.41 rows=1595 width=16)

   Group Key: a, b, count(*) OVER (?)

   ->  WindowAgg  (cost=345686.08..358790.36 rows=655214 width=16)

 ->  Sort  (cost=345686.08..347324.11 rows=655214 width=8)

   Sort Key: a, b

   ->  Seq Scan on abcd  (cost=0.00..273427.14 rows=655214 width=8)

(6 rows)



I'm not really all that sure the above query shape makes much sense in
the real world. Would anyone ever want to use DISTINCT on some results
containing WindowFuncs?


This could still have been good to have if there were no negative impact and
some benefit in a few cases, but as mentioned before, if hashagg removes a sort
that was added due to the pushdown, all gains are lost and we are probably
worse off than before.

--
Regards,
Ankit Kumar Pandey





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-10 Thread David Rowley
On Tue, 10 Jan 2023 at 18:36, Tom Lane  wrote:
>
> David Rowley  writes:
> > Ideally, our sort costing would just be better, but I think that
> > raises the bar a little too high to start thinking of making
> > improvements to that for this patch.
>
> It's trickier than it looks, cf f4c7c410e.  But if you just want
> to add a small correction based on number of columns being sorted
> by, that seems within reach.  See the comment for cost_sort though.
> Also, I suppose for incremental sorts we'd want to consider only
> the number of newly-sorted columns, but I'm not sure if that info
> is readily at hand either.

Yeah, I had exactly that in mind when I mentioned about setting the
bar higher. It seems like a worthy enough goal to improve the sort
costs separately from this work. I'm starting to consider if we might
need to revisit cost_sort() anyway. There's been quite a number of
performance improvements made to sort in the past few years and I
don't recall if anything has been done to check if the sort costs are
still realistic. I'm aware that it's a difficult problem as the number
of comparisons is highly dependent on the order of the input rows.

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-10 Thread David Rowley
On Wed, 11 Jan 2023 at 08:17, Ankit Kumar Pandey  wrote:
>
>
> > On 10/01/23 10:53, David Rowley wrote:
>
> > the total cost is the same for both of these, but the execution time
> > seems to vary quite a bit.
>
> This is really weird, I tried it different ways (to rule out any issues
> due to
>
> caching) and execution time varied in spite of having same cost.
>
> > Maybe we should try and do this for DISTINCT queries if the
> > distinct_pathkeys match the orderby_pathkeys. That seems a little less
> > copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
> > likely that the ORDER BY might opt to use the Unique path for DISTINCT
> > since it'll already have the correct pathkeys.
>
> > However, if the ORDER BY has fewer columns then it might be cheaper to Hash 
> > Aggregate and
> > then sort all over again, especially so when the DISTINCT removes a
> > large proportion of the rows.
>
> Isn't order by pathkeys are always fewer than distinct pathkeys?

Just thinking about this again, I remembered why I thought DISTINCT
was uninteresting to start with.  The problem is that if the query has
WindowFuncs and also has a DISTINCT clause, then the WindowFunc
results *must* be in the DISTINCT clause and, optionally also in the
ORDER BY clause.  There's no other place to write WindowFuncs IIRC.
Since we cannot pushdown the sort when the more strict version of the
pathkeys have WindowFuncs, then we must still perform the additional
sort if the planner chooses to do a non-hashed DISTINCT.  The aim of
this patch is to reduce the total number of sorts, and I don't think
that's possible in this case as you can't have WindowFuncs in the
ORDER BY when they're not in the DISTINCT clause:

postgres=# select distinct relname from pg_Class order by row_number()
over (order by oid);
ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: select distinct relname from pg_Class order by row_number() ...

Another type of query which is suboptimal still is when there's a
DISTINCT and WindowClause but no ORDER BY. We'll reorder the DISTINCT
clause so that the leading columns of the ORDER BY come first in
transformDistinctClause(), but we've nothing to do the same for
WindowClauses.  It can't happen around when transformDistinctClause()
is called  as we've yet to decide the WindowClause evaluation order,
so if we were to try to make that better it would maybe have to do in
the upper planner somewhere. It's possible it's too late by that time
to adjust the DISTINCT clause.

Here's an example of it.

# set enable_hashagg=0;
# explain select distinct relname,relkind,count(*) over (partition by
relkind) from pg_Class;
 QUERY PLAN

 Unique  (cost=61.12..65.24 rows=412 width=73)
   ->  Sort  (cost=61.12..62.15 rows=412 width=73)
 Sort Key: relname, relkind, (count(*) OVER (?))
 ->  WindowAgg  (cost=36.01..43.22 rows=412 width=73)
   ->  Sort  (cost=36.01..37.04 rows=412 width=65)
 Sort Key: relkind
 ->  Seq Scan on pg_class  (cost=0.00..18.12
rows=412 width=65)
(7 rows)

We can simulate the optimisation by swapping the column order in the
targetlist. Note the planner can use Incremental Sort (at least since
3c6fc5820, from about 2 hours ago)

# explain select distinct relkind,relname,count(*) over (partition by
relkind) from pg_Class;
 QUERY PLAN

 Unique  (cost=41.26..65.32 rows=412 width=73)
   ->  Incremental Sort  (cost=41.26..62.23 rows=412 width=73)
 Sort Key: relkind, relname, (count(*) OVER (?))
 Presorted Key: relkind
 ->  WindowAgg  (cost=36.01..43.22 rows=412 width=73)
   ->  Sort  (cost=36.01..37.04 rows=412 width=65)
 Sort Key: relkind
 ->  Seq Scan on pg_class  (cost=0.00..18.12
rows=412 width=65)
(8 rows)

Not sure if we should be trying to improve that in this patch. I just
wanted to identify it as something else that perhaps could be done.
I'm not really all that sure the above query shape makes much sense in
the real world. Would anyone ever want to use DISTINCT on some results
containing WindowFuncs?

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-10 Thread Ankit Kumar Pandey




On 10/01/23 10:53, David Rowley wrote:



the total cost is the same for both of these, but the execution time
seems to vary quite a bit.


This is really weird. I tried it in different ways (to rule out any issues due
to caching) and the execution time varied in spite of having the same cost.


Maybe we should try and do this for DISTINCT queries if the
distinct_pathkeys match the orderby_pathkeys. That seems a little less
copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
likely that the ORDER BY might opt to use the Unique path for DISTINCT
since it'll already have the correct pathkeys.



However, if the ORDER BY has fewer columns then it might be cheaper to Hash 
Aggregate and
then sort all over again, especially so when the DISTINCT removes a
large proportion of the rows.


Aren't the ORDER BY pathkeys always fewer than the distinct pathkeys?

distinct pathkeys = order by pathkeys + window function pathkeys

Again, I take your point that it is okay to push down the ORDER BY clause if
DISTINCT is doing a unique sort. But the problem (at least from what I am
seeing) is that DISTINCT does not care about the pushed-down sort keys; it
chooses hashagg or unique based on some other logic (mentioned below).


Consider the following (with the distinct-clause restriction removed):

if (parse->distinctClause)
{
    List   *distinct_pathkeys = make_pathkeys_for_sortclauses(root, parse->distinctClause,
                                                              root->processed_tlist);

    if (!compare_pathkeys(distinct_pathkeys, orderby_pathkeys) == 1)    /* distinct key > order by key */
        skip = true;    /* this is used to skip the order by pushdown */
}

CASE #1:

explain (costs off) select distinct a,b, min(a) over (partition by a), sum (a) 
over (partition by a) from abcd order by a,b;
QUERY PLAN
---
 Sort
   Sort Key: a, b
   ->  HashAggregate
 Group Key: a, b, min(a) OVER (?), sum(a) OVER (?)
 ->  WindowAgg
   ->  Sort
 Sort Key: a, b
 ->  Seq Scan on abcd
(8 rows)

explain (costs off) select distinct a,b,c, min(a) over (partition by a), sum 
(a) over (partition by a) from abcd order by a,b,c;
  QUERY PLAN
--
 Sort
   Sort Key: a, b, c
   ->  HashAggregate
 Group Key: a, b, c, min(a) OVER (?), sum(a) OVER (?)
 ->  WindowAgg
   ->  Sort
 Sort Key: a, b, c
 ->  Seq Scan on abcd
(8 rows)

No matter how many columns are pushed down, it does hashagg.

On the other hand:

CASE #2:

EXPLAIN (costs off) SELECT DISTINCT depname, empno, min(salary) OVER (PARTITION 
BY depname) depminsalary,sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno;
QUERY PLAN
--
 Unique
   ->  Sort
 Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER 
(?))
 ->  WindowAgg
   ->  Sort
 Sort Key: depname, empno
 ->  Seq Scan on empsalary
(7 rows)

EXPLAIN (costs off) SELECT DISTINCT depname, empno, enroll_date, min(salary) 
OVER (PARTITION BY depname) depminsalary,sum(salary) OVER (PARTITION BY 
depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
  QUERY PLAN
---
 Unique
   ->  Sort
 Sort Key: depname, empno, enroll_date, (min(salary) OVER (?)), 
(sum(salary) OVER (?))
 ->  WindowAgg
   ->  Sort
 Sort Key: depname, empno, enroll_date
 ->  Seq Scan on empsalary
(7 rows)

It keeps doing Unique.

In both of these cases, compare_pathkeys(distinct_pathkeys, orderby_pathkeys)
returns 1.

Looking a bit further, the planner is choosing things correctly: I can see the
cost of Unique being higher in the first case and lower in the second case.

But the point is, if the sort for the ORDER BY is pushed down, shouldn't there
be some discount on the cost of the Unique's sort (so that there is more
possibility of it being favorable compared to HashAgg in certain cases)?


Again, the cost of the Unique node is taken from the cost of the Sort node
as-is, but for HashAgg a new cost is computed. If we do an incremental sort
here (for the Unique node), then since we have pushed down the ORDER BY keys,
the Unique cost could be reduced and our optimization could be made worthwhile
in the DISTINCT case (I assume this is what you intended here).
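
Roughly, I am imagining something of this shape inside create_final_distinct_paths().
This is only a sketch, not an actual change; the variable names (input_path,
needed_pathkeys, distinct_rel, numDistinctRows) are meant to mirror that function and
should be treated as illustrative.

if (!pathkeys_contained_in(needed_pathkeys, input_path->pathkeys))
{
    int         presorted_keys;
    Path       *sorted_path;

    pathkeys_count_contained_in(needed_pathkeys, input_path->pathkeys,
                                &presorted_keys);

    if (presorted_keys > 0)
        sorted_path = (Path *) create_incremental_sort_path(root, distinct_rel,
                                                            input_path,
                                                            needed_pathkeys,
                                                            presorted_keys,
                                                            -1.0);
    else
        sorted_path = (Path *) create_sort_path(root, distinct_rel, input_path,
                                                needed_pathkeys, -1.0);

    add_path(distinct_rel,
             (Path *) create_upper_unique_path(root, distinct_rel, sorted_path,
                                               list_length(root->distinct_pathkeys),
                                               numDistinctRows));
}

With presorted_keys > 0 the incremental sort only charges for the suffix columns, which is
exactly the kind of discount I am asking about above.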


Eg:

EXPLAIN SELECT DISTINCT depname, empno, enroll_date, min(salary) OVER 
(PARTITION BY depname) depminsalary,sum(salary) OVER (PARTITION BY depname) 
depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
  QUERY PLAN

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-10 Thread Ankit Kumar Pandey




On 10/01/23 10:53, David Rowley wrote:



On Tue, 10 Jan 2023 at 06:15, Ankit Kumar Pandey  wrote:
> Do we have any pending items for this patch now?

I'm just wondering if not trying this when the query has a DISTINCT
clause is a copout.  What I wanted to avoid was doing additional
sorting work for WindowAgg just to have it destroyed by Hash
Aggregate.  I'm now wondering if adding both the original
slightly-less-sorted path plus the new slightly-more-sorted path then
if distinct decides to Hash Aggregate then it'll still be able to pick
the cheapest input path to do that on.  Unfortunately, our sort
costing just does not seem to be advanced enough to know that sorting
by fewer columns might be cheaper, so adding the additional path is
likely just going to result in add_path() ditching the old
slightly-less-sorted path due to the new slightly-more-sorted path
having better pathkeys. So, we'd probably be wasting our time if we
added both paths with the current sort costing code.



Maybe we should try and do this for DISTINCT queries if the
distinct_pathkeys match the orderby_pathkeys. That seems a little less
copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
likely that the ORDER BY might opt to use the Unique path for DISTINCT
since it'll already have the correct pathkeys.  However, if the ORDER
BY has fewer columns then it might be cheaper to Hash Aggregate and
then sort all over again, especially so when the DISTINCT removes a
large proportion of the rows.

Ideally, our sort costing would just be better, but I think that
raises the bar a little too high to start thinking of making
improvements to that for this patch.


Let me take a stab at this. Depending on the complexity, we can decide whether
to address it in the current patch or in a follow-up.

--
Regards,
Ankit Kumar Pandey





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-09 Thread Tom Lane
David Rowley  writes:
> Ideally, our sort costing would just be better, but I think that
> raises the bar a little too high to start thinking of making
> improvements to that for this patch.

It's trickier than it looks, cf f4c7c410e.  But if you just want
to add a small correction based on number of columns being sorted
by, that seems within reach.  See the comment for cost_sort though.
Also, I suppose for incremental sorts we'd want to consider only
the number of newly-sorted columns, but I'm not sure if that info
is readily at hand either.

regards, tom lane




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-09 Thread David Rowley
On Tue, 10 Jan 2023 at 06:15, Ankit Kumar Pandey  wrote:
> Do we have any pending items for this patch now?

I'm just wondering if not trying this when the query has a DISTINCT
clause is a copout.  What I wanted to avoid was doing additional
sorting work for WindowAgg just to have it destroyed by Hash
Aggregate.  I'm now wondering if adding both the original
slightly-less-sorted path plus the new slightly-more-sorted path then
if distinct decides to Hash Aggregate then it'll still be able to pick
the cheapest input path to do that on.  Unfortunately, our sort
costing just does not seem to be advanced enough to know that sorting
by fewer columns might be cheaper, so adding the additional path is
likely just going to result in add_path() ditching the old
slightly-less-sorted path due to the new slightly-more-sorted path
having better pathkeys. So, we'd probably be wasting our time if we
added both paths with the current sort costing code.

# explain analyze select * from pg_Class order by relkind,relname;
  QUERY PLAN
---
 Sort  (cost=36.01..37.04 rows=412 width=273) (actual
time=0.544..0.567 rows=412 loops=1)
   Sort Key: relkind, relname
   Sort Method: quicksort  Memory: 109kB
   ->  Seq Scan on pg_class  (cost=0.00..18.12 rows=412 width=273)
(actual time=0.014..0.083 rows=412 loops=1)
 Planning Time: 0.152 ms
 Execution Time: 0.629 ms
(6 rows)


# explain analyze select * from pg_Class order by relkind;
  QUERY PLAN
---
 Sort  (cost=36.01..37.04 rows=412 width=273) (actual
time=0.194..0.218 rows=412 loops=1)
   Sort Key: relkind
   Sort Method: quicksort  Memory: 109kB
   ->  Seq Scan on pg_class  (cost=0.00..18.12 rows=412 width=273)
(actual time=0.014..0.083 rows=412 loops=1)
 Planning Time: 0.143 ms
 Execution Time: 0.278 ms
(6 rows)

the total cost is the same for both of these, but the execution time
seems to vary quite a bit.

Maybe we should try and do this for DISTINCT queries if the
distinct_pathkeys match the orderby_pathkeys. That seems a little less
copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
likely that the ORDER BY might opt to use the Unique path for DISTINCT
since it'll already have the correct pathkeys.  However, if the ORDER
BY has fewer columns then it might be cheaper to Hash Aggregate and
then sort all over again, especially so when the DISTINCT removes a
large proportion of the rows.

Ideally, our sort costing would just be better, but I think that
raises the bar a little too high to start thinking of making
improvements to that for this patch.

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-09 Thread Ankit Kumar Pandey



On 09/01/23 17:53, David Rowley wrote:



We need to keep looping backwards until we find the first WindowClause
which does not contain the pathkeys of the ORDER BY.


I found the cause of the confusion: *first* WindowClause means first in the
forward direction. Since we are looping backwards, I had taken it to mean
first from the end.


eg:

select count(*) over (order by a), count(*) over (order by a,b), 
count(*) over (order by a,b,c) from abcd order by a,b,c,d;


The first window clause is count(*) over (order by a), which we are using for
the ORDER BY pushdown.

This is in sync with the implementation as well.



I couldn't quite understand why the foreach() loop's
condition couldn't just be "if (foreach_current_index(l) ==
sort_pushdown_idx)", but I see that if we don't check if the path is
already correctly sorted that we could end up pushing the sort down
onto the path that's already correctly sorted.  We decided we didn't
want to move the sort around if it does not reduce the amount of
sorting.


Yes, this was the reason; the current patch handles this without is_sorted
now, which is great.



All the tests you added still pass. Although, I didn't
really study the tests yet to see if everything we talked about is
covered.


It covers the general cases and the exceptions. I also did a few additional
tests; everything looked good.



It turned out the sort_pushdown_idx = foreach_current_index(l) - 1;
break; didn't work as if all the WindowClauses have pathkeys contained
in the order by pathkeys then we don't ever set sort_pushdown_idx. I
adjusted it to do:



if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
   sort_pushdown_idx = foreach_current_index(l);
else
  break;


Yes, that would have been problematic. I have verified this case, and on a
related note, I have added a test case that ensures the ORDER BY pushdown does
not happen if the window function's ORDER BY is a superset of the query's
ORDER BY.



I also fixed up the outdated comments and changed it so we only set
orderby_pathkeys once instead of once per loop in the
foreach_reverse() loop.


Thanks, the code looks a lot neater now (is_sorted is gone and handled in a
better way).



I gave some thought to whether doing foreach_delete_current() is safe
within a foreach_reverse() loop. I didn't try it, but I couldn't see
any reason why not.  It is pretty late here and I'd need to test that
to be certain. If it turns out not to be safe then we need to document
that fact in the comments of the foreach_reverse() macro and the
foreach_delete_current() macro.


I tested foreach_delete_current() inside a foreach_reverse() loop.

It worked fine.


I have attached a patch with one extra test case (as mentioned above). The
rest of the changes look fine.

I ran pgbench again and the optimized version still had a lead in performance
(168 tps vs 135 tps).



Do we have any pending items for this patch now?

--

Regards,
Ankit Kumar Pandey
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 000d757bdd..cf59ebfe87 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4421,8 +4421,14 @@ create_one_window_path(PlannerInfo *root,
 	   List *activeWindows)
 {
 	PathTarget *window_target;
+	Query	   *parse = root->parse;
 	ListCell   *l;
 	List	   *topqual = NIL;
+	List	   *window_pathkeys;
+	List	   *orderby_pathkeys = NIL;
+	int			sort_pushdown_idx = -1;
+	int			presorted_keys;
+	bool		is_sorted;
 
 	/*
 	 * Since each window clause could require a different sort order, we stack
@@ -4441,17 +4447,99 @@ create_one_window_path(PlannerInfo *root,
 	 */
 	window_target = input_target;
 
+	/*
+	 * Here we attempt to minimize the number of sorts which must be performed
+	 * for queries with an ORDER BY clause.
+	 *
+	 * It's possible due to select_active_windows() putting the WindowClauses
+	 * with the lowest tleSortGroupRef last in the activeWindows list that the
+	 * final WindowClause has a subset of the sort requirements that the
+	 * query's ORDER BY clause has.  Below we try to detect this case and if
+	 * we find this is true, we consider adjusting the sort that's done for
+	 * WindowAggs and include the additional clauses so that no additional
+	 * sorting is required for the query's ORDER BY clause.
+	 *
+	 * We don't try this optimization for the following cases:
+	 *
+	 * 1.	If the query has a DISTINCT clause.  We needn't waste any
+	 *		additional effort for the more strict sort order as if DISTINCT
+	 *		is done via Hash Aggregate then that's going to undo this work.
+	 *
+	 * 2.	If the query has a LIMIT clause.  The top-level sort will be
+	 *		able to use a top-n sort which might be more efficient than
+	 *		sorting	by the additional columns.  If the LIMIT does not reduce
+	 *		the	number of rows significantly then this might not be true, but
+	 *		we don't try to consider that here.
+	 *
+	 * 3.	If the top-level WindowClause has a runCondition then this can
+	 *		filter out tuples and make the final sort 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-09 Thread David Rowley
On Mon, 9 Jan 2023 at 21:34, Ankit Kumar Pandey  wrote:
>
>
> > On 09/01/23 03:48, David Rowley wrote:
> > 3. I don't think you need to set "index" on every loop. Why not just
> set it to foreach_current_index(l) - 1; break;
>
> Consider this query
>
> EXPLAIN (COSTS OFF)
> SELECT empno,
> depname,
> min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
> sum(salary) OVER (PARTITION BY depname) depsalary,
> count(*) OVER (ORDER BY enroll_date DESC) c
> FROM empsalary
> ORDER BY depname, empno, enroll_date;
>
>
> Here, W1 = min(salary) OVER (PARTITION BY depname ORDER BY empno) W2 =
> sum(salary) OVER (PARTITION BY depname)
>
> W3 = count(*) OVER (ORDER BY enroll_date DESC)
>
> (1,2,3 are winref).
>
>
> activeWindows = [W3, W1, W2]
>
> If we iterate from reverse and break at first occurrence, we will
> break at W2 and add extra keys there, but what we want it to add
> keys at W1 so that it gets spilled to W2 (as existing logic is designed to
> carry over sorted cols first to last).

We need to keep looping backwards until we find the first WindowClause
which does not contain the pathkeys of the ORDER BY.  When we find a
WindowClause that does not contain the pathkeys of the ORDER BY, then
we must set the sort_pushdown_idx to the index of the prior
WindowClause.  I couldn't quite understand why the foreach() loop's
condition couldn't just be "if (foreach_current_index(l) ==
sort_pushdown_idx)", but I see that if we don't check if the path is
already correctly sorted that we could end up pushing the sort down
onto the path that's already correctly sorted.  We decided we didn't
want to move the sort around if it does not reduce the amount of
sorting.

I had to try this out for myself and I've ended up with the attached
v6 patch.  All the tests you added still pass. Although, I didn't
really study the tests yet to see if everything we talked about is
covered.

It turned out the sort_pushdown_idx = foreach_current_index(l) - 1;
break; didn't work as if all the WindowClauses have pathkeys contained
in the order by pathkeys then we don't ever set sort_pushdown_idx. I
adjusted it to do:

if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
sort_pushdown_idx = foreach_current_index(l);
else
break;
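
Put together, the whole backward scan then looks roughly like the sketch below. It is only
an illustration (it omits the is_sorted and runCondition handling from the actual patch)
and assumes the foreach_reverse() macro discussed earlier in this thread.

orderby_pathkeys = make_pathkeys_for_sortclauses(root,
                                                 root->parse->sortClause,
                                                 root->processed_tlist);

foreach_reverse(l, activeWindows)
{
    WindowClause *wc = lfirst_node(WindowClause, l);

    window_pathkeys = make_pathkeys_for_window(root, wc,
                                               root->processed_tlist);

    /* remember the earliest WindowClause whose sort the ORDER BY covers */
    if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
        sort_pushdown_idx = foreach_current_index(l);
    else
        break;
}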

I also fixed up the outdated comments and changed it so we only set
orderby_pathkeys once instead of once per loop in the
foreach_reverse() loop.

I gave some thought to whether doing foreach_delete_current() is safe
within a foreach_reverse() loop. I didn't try it, but I couldn't see
any reason why not.  It is pretty late here and I'd need to test that
to be certain. If it turns out not to be safe then we need to document
that fact in the comments of the foreach_reverse() macro and the
foreach_delete_current() macro.

David
diff --git a/src/backend/optimizer/plan/planner.c 
b/src/backend/optimizer/plan/planner.c
index 000d757bdd..cf59ebfe87 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4421,8 +4421,14 @@ create_one_window_path(PlannerInfo *root,
   List *activeWindows)
 {
PathTarget *window_target;
+   Query  *parse = root->parse;
ListCell   *l;
List   *topqual = NIL;
+   List   *window_pathkeys;
+   List   *orderby_pathkeys = NIL;
+   int sort_pushdown_idx = -1;
+   int presorted_keys;
+   boolis_sorted;
 
/*
 * Since each window clause could require a different sort order, we 
stack
@@ -4441,17 +4447,99 @@ create_one_window_path(PlannerInfo *root,
 */
window_target = input_target;
 
+   /*
+* Here we attempt to minimize the number of sorts which must be 
performed
+* for queries with an ORDER BY clause.
+*
+* It's possible due to select_active_windows() putting the 
WindowClauses
+* with the lowest tleSortGroupRef last in the activeWindows list that 
the
+* final WindowClause has a subset of the sort requirements that the
+* query's ORDER BY clause has.  Below we try to detect this case and if
+* we find this is true, we consider adjusting the sort that's done for
+* WindowAggs and include the additional clauses so that no additional
+* sorting is required for the query's ORDER BY clause.
+*
+* We don't try this optimization for the following cases:
+*
+* 1.   If the query has a DISTINCT clause.  We needn't waste any
+*  additional effort for the more strict sort order as if 
DISTINCT
+*  is done via Hash Aggregate then that's going to undo 
this work.
+*
+* 2.   If the query has a LIMIT clause.  The top-level sort will be
+*  able to use a top-n sort which might be more efficient 
than
+*  sorting by the 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-09 Thread Ankit Kumar Pandey



On 09/01/23 03:48, David Rowley wrote:



On Mon, 9 Jan 2023 at 06:17, Ankit Kumar Pandey   wrote:

I have addressed all points 1-6 in the attached patch.



A few more things:



1. You're still using the 'i' variable in the foreach loop.

foreach_current_index() will work.


2. I think the "index" variable needs a better name. sort_pushdown_idx maybe.


Done these (1 & 2)


3. I don't think you need to set "index" on every loop. Why not just

set it to foreach_current_index(l) - 1; break;

Consider this query

EXPLAIN (COSTS OFF)

SELECT empno,

   depname,

   min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,

   sum(salary) OVER (PARTITION BY depname) depsalary,

   count(*) OVER (ORDER BY enroll_date DESC) c

FROM empsalary

ORDER BY depname, empno, enroll_date;


Here, W1 = min(salary) OVER (PARTITION BY depname ORDER BY empno) W2 = 
sum(salary) OVER (PARTITION BY depname)


W3 = count(*) OVER (ORDER BY enroll_date DESC)

(1,2,3 are winref).


activeWindows = [W3, W1, W2]


If we iterate in reverse and break at the first occurrence, we will break at
W2 and add the extra keys there, but what we want is to add the keys at W1 so
that they get spilled over to W2 (as the existing logic is designed to carry
the sorted columns from first to last).


4. You're still setting orderby_pathkeys in the foreach loop. That's

already been set above and it won't have changed.


5. I don't think there's any need to check pathkeys_contained_in() in

the foreach loop anymore. With #3 the index will be -1 if the
optimisation cannot apply. You could likely also get rid of
try_sort_pushdown too and just make the condition "if
(sort_pushdown_idx == foreach_current_index(l))".

Done this. I added pathkeys_contained_in() as an assert; hope that's okay.

I'm a little unsure why there's still the is_sorted check there.  

Shouldn't that always be false now that you're looping until the pathkeys
don't match in the foreach_reverse loop?

Removing is_sorted causes an issue if there is a matching pathkey which is
already presorted, e.g. this case:

-- Do not perform sort pushdown if column is presorted
CREATE INDEX depname_idx ON empsalary(depname);
SET enable_seqscan=0;

EXPLAIN (COSTS OFF)
SELECT empno,
min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary
ORDER BY depname, empno;

We could move this into the if (try_sort_pushdown) block, but that looks a bit
ugly to me. Nevertheless, it makes sense to have it here: sort_pushdown_idx
should point to the exact window function which needs to be modified. Having
the extra is_sorted check in the second foreach loop adds ambiguity if we
don't also add it to the first check.

foreach_reverse(l, activeWindows)
{
    WindowClause *wc = lfirst_node(WindowClause, l);

    orderby_pathkeys = make_pathkeys_for_sortclauses(root, root->parse->sortClause,
                                                     root->processed_tlist);
    window_pathkeys = make_pathkeys_for_window(root, wc, root->processed_tlist);
    is_sorted = pathkeys_count_contained_in(window_pathkeys, path->pathkeys,
                                            &presorted_keys);
    has_runcondition |= (wc->runCondition != NIL);

    if (!pathkeys_contained_in(window_pathkeys, orderby_pathkeys) || has_runcondition)
        break;

    if (!is_sorted)
        sort_pushdown_idx = foreach_current_index(l);
}

Tests pass with this, so logically it is OK.


Correct me if I'm wrong as I've not tested, but I think the new code
in the foreach loop can just become:

if (sort_pushdown_idx == foreach_current_index(l))
{
   Assert(!is_sorted);
   window_pathkeys = orderby_pathkeys;
   is_sorted = pathkeys_count_contained_in(window_pathkeys,
path->pathkeys, &presorted_keys);
}


Depending on where we keep is_sorted (as mentioned above), it ends up looking
a lot like what you described.

Also, we can add Assert(pathkeys_contained_in(window_pathkeys, orderby_pathkeys)).



I have one doubt regarding runCondition, do we only need to ensure
that window function which has subset sort clause of main query should
not have runCondition or none of the window functions should not contain
runCondition? I have gone with later case but wanted to clarify.



Actually, maybe it's ok just to check the top-level WindowClause for
runConditions. It's only that one that'll filter rows.  That probably
simplifies the code quite a bit. Lower-level runConditions only serve
to halt the evaluation of WindowFuncs when the runCondition is no
longer met.


Okay, then this approach makes sense.


hmm, perhaps the Limit would have to be put between the WindowAgg and
Sort for it to work.  Maybe that's more complexity than it's worth.


Yes, this is not specific to this change; it is more about allowing a top-N
sort with window functions in general. Once we have that, this could be taken
care of.



I have attached a patch which fixes #1 and #2 and rearranges is_sorted.

Point #3 still needs to be resolved (and perhaps another way to handle is_sorted found).


Thanks,

--
Regards,
Ankit Kumar Pandey
diff --git a/src/backend/optimizer/plan/planner.c 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-08 Thread David Rowley
On Mon, 9 Jan 2023 at 06:17, Ankit Kumar Pandey  wrote:
> I have addressed all points 1-6 in the attached patch.

A few more things:

1. You're still using the 'i' variable in the foreach loop.
foreach_current_index() will work.

2. I think the "index" variable needs a better name. sort_pushdown_idx maybe.

3. I don't think you need to set "index" on every loop. Why not just
set it to foreach_current_index(l) - 1; break;

4. You're still setting orderby_pathkeys in the foreach loop. That's
already been set above and it won't have changed.

5. I don't think there's any need to check pathkeys_contained_in() in
the foreach loop anymore. With #3 the index will be -1 if the
optimisation cannot apply. You could likely also get rid of
try_sort_pushdown too and just make the condition "if
(sort_pushdown_idx == foreach_current_index(l))".  I'm a little unsure
why there's still the is_sorted check there.  Shouldn't that always be
false now that you're looping until the pathkeys don't match in the
foreach_reverse loop?

Correct me if I'm wrong as I've not tested, but I think the new code
in the foreach loop can just become:

if (sort_pushdown_idx == foreach_current_index(l))
{
Assert(!is_sorted);
window_pathkeys = orderby_pathkeys;
is_sorted = pathkeys_count_contained_in(window_pathkeys,
path->pathkeys, &presorted_keys);
}


> I have one doubt regarding runCondition, do we only need to ensure
> that window function which has subset sort clause of main query should
> not have runCondition or none of the window functions should not contain
> runCondition? I have gone with later case but wanted to clarify.

Actually, maybe it's ok just to check the top-level WindowClause for
runConditions. It's only that one that'll filter rows.  That probably
simplifies the code quite a bit. Lower-level runConditions only serve
to halt the evaluation of WindowFuncs when the runCondition is no
longer met.

>
>
> > Also, do you want to have a go at coding up the sort bound pushdown
> > too?  It'll require removing the limitCount restriction and adjusting
> > ExecSetTupleBound() to recurse through a WindowAggState. I think it's
> > pretty easy. You could try it then play around with it to make sure it
> > works and we get the expected performance.
>
> I tried this in the patch but kept getting `retrieved too many tuples in
> a bounded sort`.
>
> Added following code in ExecSetTupleBound which correctly found sortstate
>
> and set bound value.
>
> else if(IsA(child_node, WindowAggState))
>
> {
>
> WindowAggState  *winstate = (WindowAggState *) child_node;
>
> if (outerPlanState(winstate))
>
> ExecSetTupleBound(tuples_needed, 
> outerPlanState(winstate));
>
> }
>
> I think problem is that are not using limit clause inside window
> function (which
> may need to scan all tuples) so passing bound value to
> WindowAggState->sortstate
> is not working as we might expect. Or maybe I am getting it wrong? I was
> trying to
> have top-N sort for limit clause if orderby pushdown happens.

hmm, perhaps the Limit would have to be put between the WindowAgg and
Sort for it to work.  Maybe that's more complexity than it's worth.

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-08 Thread David Rowley
On Mon, 9 Jan 2023 at 05:06, Vik Fearing  wrote:
> +EXPLAIN (COSTS OFF)
> +SELECT empno,
> +   depname,
> +   min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
> +   sum(salary) OVER (PARTITION BY depname) depsalary,
> +   count(*) OVER (ORDER BY enroll_date DESC) c
> +FROM empsalary
> +ORDER BY depname, empno, enroll_date;
> +  QUERY PLAN
> +--
> + WindowAgg
> +   ->  WindowAgg
> + ->  Sort
> +   Sort Key: depname, empno, enroll_date
> +   ->  WindowAgg
> + ->  Sort
> +   Sort Key: enroll_date DESC
> +   ->  Seq Scan on empsalary

> Why aren't min() and sum() calculated on the same WindowAgg run?

We'd need to have an ORDER BY per WindowFunc rather than per
WindowClause to do that.  The problem is when there is no ORDER BY,
all rows are peers.

There are likely a bunch more optimisations we could do in that
area. I think all the builtin window functions (not aggregates being
used as window functions) don't care about peer rows, so it may be
possible to merge the WindowClauses when the WindowClause being merged
only has window functions that don't care about peer rows.  Not for
this patch though.

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-08 Thread Vik Fearing

On 1/8/23 18:05, Ankit Kumar Pandey wrote:



On 08/01/23 21:36, Vik Fearing wrote:



On 1/8/23 11:21, Ankit Kumar Pandey wrote:


Please find attached patch with addressed issues mentioned before.




I am curious about this plan:



+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+   depname,
+   min(salary) OVER (PARTITION BY depname ORDER BY empno) 
depminsalary,

+   sum(salary) OVER (PARTITION BY depname) depsalary,
+   count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+  QUERY PLAN
+--
+ WindowAgg
+   ->  WindowAgg
+ ->  Sort
+   Sort Key: depname, empno, enroll_date
+   ->  WindowAgg
+ ->  Sort
+   Sort Key: enroll_date DESC
+   ->  Seq Scan on empsalary
+(8 rows)
+




Why aren't min() and sum() calculated on the same WindowAgg run?


Isn't that exactly what is happening here? First count() with sort on 
enroll_date is run and


then min() and sum()?


No, there are two passes over the window for those two but I don't see 
that there needs to be.


Only difference between this and plan generated by master(given below) 
is a sort in the end.


Then this is probably not this patch's job to fix.
--
Vik Fearing





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-08 Thread Ankit Kumar Pandey



On 08/01/23 16:33, David Rowley wrote:



On Sun, 8 Jan 2023 at 23:21, Ankit Kumar Pandey  wrote:

Please find attached patch with addressed issues mentioned before.


Here's a quick review:

1. You can use foreach_current_index(l) to obtain the index of the list element.

2. I'd rather see you looping backwards over the list in the first
loop. I think it'll be more efficient to loop backwards as you can
just break out the loop when you see the pathkeys are not contained in
the order by pathkeys.  When the optimisation does not apply that
means you only need to look at the last item in the list.  You could
maybe just invent foreach_reverse() for this purpose and put it in
pg_list.h.  That'll save having to manually code up the loop.

3. I don't think you should call the variable
enable_order_by_pushdown. We've a bunch of planner related GUCs that
start with enable_. That might cause a bit of confusion.  Maybe just
try_sort_pushdown.

4. You should widen the scope of orderby_pathkeys and set it within
the if (enable_order_by_pushdown) scope. You can reuse this again in
the 2nd loop too.  Just initialise it to NIL

5. You don't seem to be checking all WindowClauses for a runCondition.
If you do #2 above then you can start that process in the initial
reverse loop so that you've checked them all by the time you get
around to that WindowClause again in the 2nd loop.

6. The test with "+-- Do not perform additional sort if column is
presorted". I don't think "additional" is the correct word here.  I
think you want to ensure that we don't push down the ORDER BY below
the WindowAgg for this case. There is no ability to reduce the sorts
here, only move them around, which we agreed we don't want to do for
this case.



I have addressed all points 1-6 in the attached patch.

I have one doubt regarding runCondition: do we only need to ensure that the
window function whose sort clause is a subset of the main query's ORDER BY has
no runCondition, or that none of the window functions has a runCondition? I
have gone with the latter but wanted to clarify.



Also, do you want to have a go at coding up the sort bound pushdown
too?  It'll require removing the limitCount restriction and adjusting
ExecSetTupleBound() to recurse through a WindowAggState. I think it's
pretty easy. You could try it then play around with it to make sure it
works and we get the expected performance.


I tried this in the patch but kept getting `retrieved too many tuples in 
a bounded sort`.


I added the following code in ExecSetTupleBound(), which correctly found the
sort state and set the bound value:

else if (IsA(child_node, WindowAggState))
{
    WindowAggState *winstate = (WindowAggState *) child_node;

    if (outerPlanState(winstate))
        ExecSetTupleBound(tuples_needed, outerPlanState(winstate));
}

I think the problem is that the LIMIT clause is not applied inside the window
function (which may need to scan all tuples), so passing the bound value down
to WindowAggState->sortstate does not work as we might expect. Or maybe I am
getting it wrong? I was trying to get a top-N sort for the LIMIT clause when
the ORDER BY pushdown happens.


You'll likely want to add a few more rows than the last performance test you 
did or run the query
with pgbench. Running a query once that only takes 1-2ms is likely not
a reliable way to test the performance of something.


I did some profiling.

CREATE TABLE empsalary1 (
depname varchar,
empno bigint,
salary int,
enroll_date date
);
INSERT INTO empsalary1(depname, empno, salary, enroll_date)
SELECT string_agg 
(substr('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', ceil 
(random() * 62)::integer, 1000), '')
 AS depname, generate_series(1, 1000) AS empno, ceil 
(random()*1)::integer AS salary
, NOW() + (random() * (interval '90 days')) + '30 days' AS enroll_date;

1) Optimization case

EXPLAIN (ANALYZE)
SELECT empno,
   depname,
   min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
   sum(salary) OVER (PARTITION BY depname) depsalary,
   count(*) OVER (ORDER BY enroll_date DESC) c
FROM empsalary1
ORDER BY depname, empno, enroll_date;

EXPLAIN (ANALYZE)
SELECT empno,
   depname,
   min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary1
ORDER BY depname, empno;



In patched version:

   QUERY PLAN
   
--

---
 WindowAgg  (cost=93458.04..93458.08 rows=1 width=61) (actual 
time=149996.006..156756.995 rows=1000 loops=1)
   ->  WindowAgg  (cost=93458.04..93458.06 rows=1 width=57) (actual 
time=108559.126..135892.188 rows=1000 loops=1)
 ->  Sort  (cost=93458.04..93458.04 rows=1 width=53) (actual 
time=108554.213..112564.168 rows=1000 loops=1)
  

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-08 Thread Ankit Kumar Pandey




On 08/01/23 21:36, Vik Fearing wrote:



On 1/8/23 11:21, Ankit Kumar Pandey wrote:


Please find attached patch with addressed issues mentioned before.




I am curious about this plan:



+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+   depname,
+   min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+   sum(salary) OVER (PARTITION BY depname) depsalary,
+   count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+  QUERY PLAN
+--
+ WindowAgg
+   ->  WindowAgg
+ ->  Sort
+   Sort Key: depname, empno, enroll_date
+   ->  WindowAgg
+ ->  Sort
+   Sort Key: enroll_date DESC
+   ->  Seq Scan on empsalary
+(8 rows)
+




Why aren't min() and sum() calculated on the same WindowAgg run?


Isn't that exactly what is happening here? First count() with sort on 
enroll_date is run and


then min() and sum()?


The only difference between this and the plan generated by master (given
below) is a sort at the end.


 QUERY PLAN

 Incremental Sort
   Sort Key: depname, empno, enroll_date
   Presorted Key: depname, empno
   ->  WindowAgg
 ->  WindowAgg
   ->  Sort
 Sort Key: depname, empno
 ->  WindowAgg
   ->  Sort
 Sort Key: enroll_date DESC
 ->  Seq Scan on empsalary

Let me know if I am missing anything. Thanks.

--
Regards,
Ankit Kumar Pandey





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-08 Thread Vik Fearing

On 1/8/23 11:21, Ankit Kumar Pandey wrote:


Please find attached patch with addressed issues mentioned before.



I am curious about this plan:

+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+   depname,
+   min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+   sum(salary) OVER (PARTITION BY depname) depsalary,
+   count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+  QUERY PLAN
+--
+ WindowAgg
+   ->  WindowAgg
+ ->  Sort
+   Sort Key: depname, empno, enroll_date
+   ->  WindowAgg
+ ->  Sort
+   Sort Key: enroll_date DESC
+   ->  Seq Scan on empsalary
+(8 rows)
+


Why aren't min() and sum() calculated on the same WindowAgg run?
--
Vik Fearing





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-08 Thread David Rowley
On Sun, 8 Jan 2023 at 23:21, Ankit Kumar Pandey  wrote:
> Please find attached patch with addressed issues mentioned before.

Here's a quick review:

1. You can use foreach_current_index(l) to obtain the index of the list element.

2. I'd rather see you looping backwards over the list in the first
loop. I think it'll be more efficient to loop backwards as you can
just break out of the loop when you see the pathkeys are not contained in
the order by pathkeys.  When the optimisation does not apply that
means you only need to look at the last item in the list.  You could
maybe just invent foreach_reverse() for this purpose and put it in
pg_list.h.  That'll save having to manually code up the loop (a rough
sketch of such a macro is included after this list).

3. I don't think you should call the variable
enable_order_by_pushdown. We've a bunch of planner related GUCs that
start with enable_. That might cause a bit of confusion.  Maybe just
try_sort_pushdown.

4. You should widen the scope of orderby_pathkeys and set it within
the if (enable_order_by_pushdown) scope. You can reuse this again in
the 2nd loop too.  Just initialise it to NIL

5. You don't seem to be checking all WindowClauses for a runCondition.
If you do #2 above then you can start that process in the initial
reverse loop so that you've checked them all by the time you get
around to that WindowClause again in the 2nd loop.

6. The test with "+-- Do not perform additional sort if column is
presorted". I don't think "additional" is the correct word here.  I
think you want to ensure that we don't push down the ORDER BY below
the WindowAgg for this case. There is no ability to reduce the sorts
here, only move them around, which we agreed we don't want to do for
this case.
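
Picking up on #2: just to illustrate, here is a rough, untested sketch of what
a foreach_reverse() macro might look like, modelled on the existing foreach()
machinery in pg_list.h (it assumes the ForEachState layout from that header;
none of this is existing code, and note that lst is evaluated twice here):

/*
 * foreach_reverse -- hypothetical counterpart of foreach() that walks a
 * List from its last cell back to its first.  foreach_current_index()
 * would keep working unchanged, since the same ForEachState is used.
 */
#define foreach_reverse(cell, lst) \
    for (ForEachState cell##__state = {(lst), list_length(lst) - 1}; \
         (cell##__state.l != NIL && \
          cell##__state.i >= 0) ? \
         (cell = &cell##__state.l->elements[cell##__state.i], true) : \
         (cell = NULL, false); \
         cell##__state.i--)

Usage would then look like a normal foreach(), e.g. walking activeWindows from
the last WindowClause towards the first and breaking out as soon as a clause's
pathkeys are no longer contained in the ORDER BY pathkeys.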

Also, do you want to have a go at coding up the sort bound pushdown
too?  It'll require removing the limitCount restriction and adjusting
ExecSetTupleBound() to recurse through a WindowAggState. I think it's
pretty easy. You could try it then play around with it to make sure it
works and we get the expected performance. You'll likely want to add a
few more rows than the last performance test you did or run the query
with pgbench. Running a query once that only takes 1-2ms is likely not
a reliable way to test the performance of something.

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-08 Thread Ankit Kumar Pandey

Hi David,

Please find attached patch with addressed issues mentioned before.

Things resolved:

1. The correct position of the window function from which the ORDER BY
pushdown can happen is now determined; this fixes the issue mentioned in
case #1.

While writing test cases, I had found that the optimization did not happen
for case #1 (which is a prime candidate for such an operation), e.g.:

EXPLAIN (COSTS OFF)
SELECT empno,
 depname,
 min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
 sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date


2. Point #2 as in above discussions


a) looks like the best plan to me.  What's the point of pushing the
sort below the WindowAgg in this case? The point of this optimisation
is to reduce the number of sorts not to push them as deep into the
plan as possible. We should only be pushing them down when it can
reduce the number of sorts. There's no reduction in the number of
sorts in the above plan.


Works as mentioned.

All test cases (newly added and existing ones) are green.

--
Regards,
Ankit Kumar Pandey
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 000d757bdd..fd3fcd43a0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4423,6 +4423,12 @@ create_one_window_path(PlannerInfo *root,
 	PathTarget *window_target;
 	ListCell   *l;
 	List	   *topqual = NIL;
+	bool		has_runcondition = false;
+
+	List	   *window_pathkeys;
+	int			index = 0;
+	int			i;
+	bool		enable_order_by_pushdown = false;
 
 	/*
 	 * Since each window clause could require a different sort order, we stack
@@ -4441,10 +4447,45 @@ create_one_window_path(PlannerInfo *root,
 	 */
 	window_target = input_target;
 
+	enable_order_by_pushdown = (root->parse->distinctClause == NIL &&
+root->parse->sortClause != NIL &&
+root->parse->limitCount == NULL &&
+!contain_window_function((Node *) get_sortgrouplist_exprs(root->parse->sortClause,
+	  root->processed_tlist)));
+
+
+	i=0;
+	if (enable_order_by_pushdown)
+	{
+		/*
+		 * Search for the window function whose path keys are
+		 * subset of orderby_path keys, this allows us to perform
+		 * order by pushdown from this position of window function onwards
+		 */
+		foreach(l, activeWindows)
+		{
+			List	   *orderby_pathkeys;
+			WindowClause *wc = lfirst_node(WindowClause, l);
+			orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+			 root->parse->sortClause,
+			 root->processed_tlist);
+			window_pathkeys = make_pathkeys_for_window(root,
+		wc,
+		root->processed_tlist);
+			if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
+			{
+
+index = i;
+break;
+			}
+			i++;
+		}
+	}
+
+	i=0;
 	foreach(l, activeWindows)
 	{
 		WindowClause *wc = lfirst_node(WindowClause, l);
-		List	   *window_pathkeys;
 		int			presorted_keys;
 		bool		is_sorted;
 		bool		topwindow;
@@ -4457,6 +4498,38 @@ create_one_window_path(PlannerInfo *root,
 path->pathkeys,
 &presorted_keys);
 
+		has_runcondition |= (wc->runCondition != NIL);
+
+		/*
+		 * If not sorted already and the query has an ORDER BY clause, then we
+		 * try to push the entire ORDER BY below the final WindowAgg.  We
+		 * don't do this when the query has a DISTINCT clause as the planner
+		 * might opt to do a Hash Aggregate and spoil our efforts here.  We
+		 * also can't do this when the ORDER BY contains any WindowFuncs as we
+		 * obviously can't sort something that's yet to be evaluated.  We also
+		 * don't bother with this when there are WindowClauses with a
+		 * runCondition as those can reduce the number of rows to sort in the
+		 * ORDER BY and we don't want to add complexity to the WindowAgg sort if
+		 * the final ORDER BY sort is going to have far fewer rows to sort.
+		 * For the same reason, don't bother if the query has a LIMIT clause.
+		 */
+		if (enable_order_by_pushdown && !is_sorted && !has_runcondition && (i == index))
+		{
+			List	   *orderby_pathkeys;
+
+			orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+			 root->parse->sortClause,
+			 root->processed_tlist);
+
+			if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
+window_pathkeys = orderby_pathkeys;
+
+			is_sorted = pathkeys_count_contained_in(window_pathkeys,
+	path->pathkeys,
+	&presorted_keys);
+		}
+		i++;
+
 		/* Sort if necessary */
 		if (!is_sorted)
 		{
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 90e89fb5b6..08663fd914 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3982,7 +3982,194 @@ ORDER BY depname, empno;
  ->  Seq Scan on empsalary
 (11 rows)
 
+-- Do not perform additional sort if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey



On 08/01/23 04:06, David Rowley wrote:


On Sun, 8 Jan 2023 at 05:45, Ankit Kumar Pandey  wrote:

Attached patch with test cases.



I can look at this in a bit more detail if you find a way to fix the
case you mentioned earlier. i.e, push the sort down to the deepest
WindowAgg that has pathkeys contained in the query's ORDER BY
pathkeys.



EXPLAIN (COSTS OFF)
SELECT empno,
  depname,
  min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
  sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
 QUERY PLAN
---
Incremental Sort
  Sort Key: depname, empno, enroll_date
  Presorted Key: depname, empno
  ->  WindowAgg
->  WindowAgg
 ->  Sort
   Sort Key: depname, empno
   ->  Seq Scan on empsalary
(8 rows)

You'll also need to pay attention to how the has_runcondition is set.
If you start pushing before looking at all the WindowClauses then you
won't know if some later WindowClause has a runCondition. 


Yes, this should be the main culprit.



Adding an additional backwards foreach loop should allow you to do all the
required prechecks and find the index of the WindowClause which you
should start pushing from.


This should do the trick. Thanks for the heads-up; I will update the patch
with the suggested changes and required fixes.


Regards,

Ankit





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey



On 08/01/23 03:56, David Rowley wrote:

> (your email client still seems broken)

I am looking at this again; I will be changing my client from here onward.


You might need to have another loop before the foreach loop that loops
backwards through the WindowClauses and remembers the index of the
WindowClause which has pathkeys contained in the query's ORDER BY
pathkeys then apply the optimisation from that point in the main
foreach loop.  Also, if the condition within the foreach loop which
checks when we want to apply this optimisation is going to be run > 1
time, then you should probably have  boolean variable that's set
before the loop which saves if we're going to try to apply the
optimisation.  That'll save from having to check things like if the
 query has a LIMIT clause multiple times.


Thanks, this should do the trick.


a) looks like the best plan to me.  What's the point of pushing the
sort below the WindowAgg in this case? The point of this optimisation
is to reduce the number of sorts not to push them as deep into the
plan as possible. We should only be pushing them down when it can
reduce the number of sorts. There's no reduction in the number of
sorts in the above plan.


Yes, you are right, not in this case. I actually mentioned the wrong case here;
the real problematic case is:

EXPLAIN (COSTS OFF)
SELECT empno,
   depname,
   min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
   sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
QUERY PLAN
---
 Incremental Sort
   Sort Key: depname, empno, enroll_date
   Presorted Key: depname, empno
   ->  WindowAgg
 ->  WindowAgg
   ->  Incremental Sort
 Sort Key: depname, empno
 Presorted Key: depname
 ->  Index Scan using depname_idx on empsalary
(9 rows)

Here, it could have sorted on depname, empno, enroll_date.

Again, as I mentioned before, this is an implementation issue. We shouldn't be
skipping the optimization if pre-sorted keys are present.

--
Regards,
Ankit Kumar Pandey





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread David Rowley
On Sun, 8 Jan 2023 at 05:45, Ankit Kumar Pandey  wrote:
> Attached patch with test cases.

I can look at this in a bit more detail if you find a way to fix the
case you mentioned earlier. i.e, push the sort down to the deepest
WindowAgg that has pathkeys contained in the query's ORDER BY
pathkeys.

EXPLAIN (COSTS OFF)
SELECT empno,
   depname,
   min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
   sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
  QUERY PLAN
---
 Incremental Sort
   Sort Key: depname, empno, enroll_date
   Presorted Key: depname, empno
   ->  WindowAgg
 ->  WindowAgg
   ->  Sort
 Sort Key: depname, empno
 ->  Seq Scan on empsalary
(8 rows)

You'll also need to pay attention to how the has_runcondition is set.
If you start pushing before looking at all the WindowClauses then you
won't know if some later WindowClause has a runCondition. Adding an
additional backwards foreach loop should allow you to do all the
required prechecks and find the index of the WindowClause which you
should start pushing from.

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread David Rowley
(your email client still seems broken)

On Sun, 8 Jan 2023 at 05:27, Ankit Kumar Pandey  wrote:
>
>
> While writing test cases, I found that optimization do not happen for
> case #1
>
> (which is prime candidate for such operation) like
>
> EXPLAIN (COSTS OFF)
> SELECT empno,
> depname,
> min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
> sum(salary) OVER (PARTITION BY depname) depsalary
> FROM empsalary
> ORDER BY depname, empno, enroll_date
>
> This happens because mutual exclusiveness of two operands (when number
> of window functions > 1) viz
>
> is_sorted and last activeWindow in the condition:
>
> ( !is_sorted && lnext(activeWindows, l) == NULL)
>
> For 2nd last window function, is_sorted is false and path keys get added.
>
> In next run (for last window function), is_sorted becomes true and whole
> optimization
>
> part is skipped.
>
> Note: Major issue that if I remove is_sorted from condition, even though
>
> path keys are added, it still do not perform optimization and works same
> as in master/unoptimized case.
>
> Perhaps adding path keys at last window function is not doing trick?
> Maybe we need to add pathkeys
>
> to all window functions which are subset of query's order by
> irrespective of being last or not?

You might need to have another loop before the foreach loop that loops
backwards through the WindowClauses and remembers the index of the
WindowClause which has pathkeys contained in the query's ORDER BY
pathkeys then apply the optimisation from that point in the main
foreach loop.  Also, if the condition within the foreach loop which
checks when we want to apply this optimisation is going to be run > 1
time, then you should probably have a boolean variable that's set
before the loop which saves if we're going to try to apply the
optimisation.  That'll save from having to check things like if the
query has a LIMIT clause multiple times.
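
To make that concrete, such a pre-check pass inside create_one_window_path()
could look roughly like the sketch below. This is illustrative only, not code
from any posted patch; try_sort_pushdown and start_index are made-up names,
and the runCondition handling discussed elsewhere in the thread would also
belong in this loop:

    bool        try_sort_pushdown;
    int         start_index = -1;
    List       *orderby_pathkeys = NIL;

    /* Decide once whether the pushdown can apply to this query at all. */
    try_sort_pushdown = (root->parse->distinctClause == NIL &&
                         root->parse->sortClause != NIL &&
                         root->parse->limitCount == NULL &&
                         !contain_window_function((Node *)
                             get_sortgrouplist_exprs(root->parse->sortClause,
                                                     root->processed_tlist)));

    if (try_sort_pushdown)
    {
        int         i;

        orderby_pathkeys = make_pathkeys_for_sortclauses(root,
                                                         root->parse->sortClause,
                                                         root->processed_tlist);

        /* Walk the WindowClauses from last to first. */
        for (i = list_length(activeWindows) - 1; i >= 0; i--)
        {
            WindowClause *wc = list_nth_node(WindowClause, activeWindows, i);
            List       *wc_pathkeys = make_pathkeys_for_window(root, wc,
                                                               root->processed_tlist);

            /* Stop once the window's pathkeys are no longer a prefix. */
            if (!pathkeys_contained_in(wc_pathkeys, orderby_pathkeys))
                break;

            start_index = i;    /* deepest WindowAgg the sort can be pushed to */
        }
    }

The main foreach loop would then only attempt the pushdown once
foreach_current_index(l) reaches start_index.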

> Case #2:
>
> For presorted columns, eg
>
> CREATE INDEX depname_idx ON empsalary(depname);
> SET enable_seqscan=0;
> EXPLAIN (COSTS OFF)
> SELECT empno,
>  min(salary) OVER (PARTITION BY depname) depminsalary
> FROM empsalary
> ORDER BY depname, empno;
>
> Is this correct plan:
>
> a)
>
>QUERY PLAN
> ---
>   Incremental Sort
> Sort Key: depname, empno
> Presorted Key: depname
> ->  WindowAgg
>   ->  Index Scan using depname_idx on empsalary
> (5 rows)
>
> or this:
>
> b) (Akin to Optimized version)
>
>QUERY PLAN
> ---
>   WindowAgg
> ->  Incremental Sort
>   Sort Key: depname, empno
>   Presorted Key: depname
>   ->  Index Scan using depname_idx on empsalary
> (5 rows)
>
> Patched version does (a) because of is_sorted condition.

a) looks like the best plan to me.  What's the point of pushing the
sort below the WindowAgg in this case? The point of this optimisation
is to reduce the number of sorts not to push them as deep into the
plan as possible. We should only be pushing them down when it can
reduce the number of sorts. There's no reduction in the number of
sorts in the above plan.

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread David Rowley
On Sun, 8 Jan 2023 at 01:52, Ankit Kumar Pandey  wrote:
> I am just wondering though, why can we not do a top-N sort
> in the optimized version if we include the limit clause? Is top-N sort
> limited to non-strict sorting, or to cases where the last operation
> before the limit is a sort?

Maybe the sort bound can be pushed down. You'd need to adjust
ExecSetTupleBound() so that it pushes the bound through
WindowAggState.
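
For what it's worth, the mechanical part of that would presumably be a new
branch in ExecSetTupleBound()'s IsA() chain, something like the hypothetical
fragment below. Whether forwarding the bound through a WindowAgg is actually
safe (given frames and peer rows) is exactly the part that would need to be
worked out; none of this is existing code:

    else if (IsA(child_node, WindowAggState))
    {
        /*
         * Hypothetical: let a bound coming from an ORDER BY ... LIMIT above
         * reach the node below the WindowAgg (typically the Sort feeding
         * it), so that sort can become a top-N sort.
         */
        ExecSetTupleBound(tuples_needed, outerPlanState(child_node));
    }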

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey


On 07/01/23 21:57, Ankit Kumar Pandey wrote:

On 07/01/23 09:58, David Rowley wrote:
>
> The attached patch has no tests added. It's going to need some of
> those.

While writing test cases, I found that optimization do not happen for
case #1

(which is prime candidate for such operation) like

EXPLAIN (COSTS OFF)
SELECT empno,
     depname,
     min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
     sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date

This happens because mutual exclusiveness of two operands (when number
of window functions > 1) viz

is_sorted and last activeWindow in the condition:

( !is_sorted && lnext(activeWindows, l) == NULL)

For 2nd last window function, is_sorted is false and path keys get added.

In next run (for last window function), is_sorted becomes true and whole
optimization

part is skipped.

Note: Major issue that if I remove is_sorted from condition, even though

path keys are added, it still do not perform optimization and works same
as in master/unoptimized case.

Perhaps adding path keys at last window function is not doing trick?
Maybe we need to add pathkeys

to all window functions which are subset of query's order by
irrespective of being last or not?


Case #2:

For presorted columns, eg

CREATE INDEX depname_idx ON empsalary(depname);
SET enable_seqscan=0;
EXPLAIN (COSTS OFF)
SELECT empno,
      min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary
ORDER BY depname, empno;

Is this correct plan:

a)

    QUERY PLAN
---
   Incremental Sort
     Sort Key: depname, empno
     Presorted Key: depname
     ->  WindowAgg
   ->  Index Scan using depname_idx on empsalary
(5 rows)

or this:

b) (Akin to Optimized version)

    QUERY PLAN
---
   WindowAgg
     ->  Incremental Sort
   Sort Key: depname, empno
   Presorted Key: depname
   ->  Index Scan using depname_idx on empsalary
(5 rows)

Patched version does (a) because of is_sorted condition.

If we remove both is_sorted and lnext(activeWindows, l) == NULL conditions,

we get correct results in these two cases.



Attached patch with test cases.

For case #2, test cases still uses (a) as expected output which I don't 
think is right


and we should revisit. Other than that, only failing case is due to 
issue mentioned in case #1.


Thanks


--
Regards,
Ankit Kumar Pandey
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 000d757bdd..781916f219 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4423,6 +4423,7 @@ create_one_window_path(PlannerInfo *root,
 	PathTarget *window_target;
 	ListCell   *l;
 	List	   *topqual = NIL;
+	bool		has_runcondition = false;
 
 	/*
 	 * Since each window clause could require a different sort order, we stack
@@ -4457,6 +4458,42 @@ create_one_window_path(PlannerInfo *root,
 path->pathkeys,
 &presorted_keys);
 
+		has_runcondition |= (wc->runCondition != NIL);
+
+		/*
+		 * If not sorted already and the query has an ORDER BY clause, then we
+		 * try to push the entire ORDER BY below the final WindowAgg.  We
+		 * don't do this when the query has a DISTINCT clause as the planner
+		 * might opt to do a Hash Aggregate and spoil our efforts here.  We
+		 * also can't do this when the ORDER BY contains any WindowFuncs as we
+		 * obviously can't sort something that's yet to be evaluated.  We also
+		 * don't bother with this when there are WindowClauses with a
+		 * runCondition as those can reduce the number of rows to sort in the
+		 * ORDER BY and we don't want to add complexity to the WindowAgg sort if
+		 * the final ORDER BY sort is going to have far fewer rows to sort.
+		 * For the same reason, don't bother if the query has a LIMIT clause.
+		 */
+		if (!is_sorted && lnext(activeWindows, l) == NULL &&
+			root->parse->distinctClause == NIL &&
+			root->parse->sortClause != NIL &&
+			!has_runcondition && root->parse->limitCount == NULL &&
+			!contain_window_function((Node *) get_sortgrouplist_exprs(root->parse->sortClause,
+	  root->processed_tlist)))
+		{
+			List	   *orderby_pathkeys;
+
+			orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+			 root->parse->sortClause,
+			 root->processed_tlist);
+
+			if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
+window_pathkeys = orderby_pathkeys;
+
+			is_sorted = pathkeys_count_contained_in(window_pathkeys,
+	path->pathkeys,
+	&presorted_keys);
+		}
+
 		/* Sort if necessary */
 		if (!is_sorted)
 		{
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 90e89fb5b6..2b8b5c4533 100644
--- a/src/test/regress/expected/window.out
+++ 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey



On 07/01/23 09:58, David Rowley wrote:


The attached patch has no tests added. It's going to need some of
those.


While writing test cases, I found that the optimization does not happen for
case #1 (which is a prime candidate for such an operation), e.g.:

EXPLAIN (COSTS OFF)
SELECT empno,
   depname,
   min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
   sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date

This happens because of the mutual exclusiveness of two operands (when the
number of window functions is > 1), namely is_sorted and "last activeWindow",
in the condition:

( !is_sorted && lnext(activeWindows, l) == NULL)

For the 2nd-last window function, is_sorted is false and the pathkeys get added.

In the next iteration (for the last window function), is_sorted becomes true
and the whole optimization part is skipped.

Note: the major issue is that if I remove is_sorted from the condition, even
though the pathkeys are added, it still does not perform the optimization and
works the same as the master/unoptimized case.

Perhaps adding the pathkeys at the last window function is not doing the
trick? Maybe we need to add the pathkeys to all window functions whose keys
are a subset of the query's ORDER BY, irrespective of whether they are last
or not?



Case #2:

For presorted columns, eg

CREATE INDEX depname_idx ON empsalary(depname);
SET enable_seqscan=0;
EXPLAIN (COSTS OFF)
SELECT empno,
    min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary
ORDER BY depname, empno;

Is this correct plan:

a)

  QUERY PLAN
---
 Incremental Sort
   Sort Key: depname, empno
   Presorted Key: depname
   ->  WindowAgg
 ->  Index Scan using depname_idx on empsalary
(5 rows)

or this:

b) (Akin to Optimized version)

  QUERY PLAN
---
 WindowAgg
   ->  Incremental Sort
 Sort Key: depname, empno
 Presorted Key: depname
 ->  Index Scan using depname_idx on empsalary
(5 rows)

Patched version does (a) because of is_sorted condition.

If we remove both is_sorted and lnext(activeWindows, l) == NULL conditions,

we get correct results in these two cases.


--
Regards,
Ankit Kumar Pandey





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey



On 07/01/23 17:28, David Rowley wrote:

Your email client seems to be adding additional vertical space to your
emails. I've removed the additional newlines in the quotes. Are you
able to fix the client so it does not do that?

I have adjusted my mail client, hope it is better now?

On Sun, 8 Jan 2023 at 00:10, Ankit Kumar Pandey  wrote:
>
> On 07/01/23 09:58, David Rowley wrote:
> > You also don't seem to be considering the fact that the query might
> > have a DISTINCT clause.
>
> Major reason for this was that I am not exactly aware of what distinct
> clause means (especially in
>
> context of window functions) and how it is different from other
> sortClauses (partition, order by, group).

I'm talking about the query's DISTINCT clause. i.e SELECT DISTINCT.
If you look in the grouping_planner() function, you'll see that
create_distinct_paths() is called between create_window_paths() and
create_ordered_paths().


Yes just saw this and got what you meant.



> Yes, this is a fair point. Multiple sort is actually beneficial in cases
> like this, perhaps limits clause and runCondition should be no op too?

I'm not sure what you mean by "no op". Do you mean not apply the optimization?

Yes, no op = no optimization. Sorry I didn't mention it before.

> > I think the patch should also be using pathkeys_contained_in() and
> > Lists of pathkeys rather than concatenating lists of SortGroupClauses
> > together.  That should allow things to work correctly when a given
> > pathkey has become redundant due to either duplication or a Const in
> > the Eclass.
>
> Make sense, I actually duplicated that logic from
> make_pathkeys_for_window. We should make this changes there as well because
> if we have SELECT rank() OVER (PARTITION BY a ORDER BY a)
> (weird example but you get the idea), it leads to duplicates in
> window_sortclauses.

It won't lead to duplicate pathkeys. Look in
make_pathkeys_for_sortclauses() and what pathkey_is_redundant() does.
Notice that it checks if the pathkey already exists in the list before
appending.


Okay, I see this; pathkey_is_redundant is much smarter as well.

Replacing list_concat_copy with list_concat_unique in make_pathkeys_for_window
won't be of much benefit.


> Agree with runConditions part but for limit clause, row reduction happens
> at the last, so whether we use patched version or master version,
> none of sorts would benefit/degrade from that, right?

Maybe you're right. Just be aware that a sort done for a query with an
ORDER BY LIMIT will perform a top-n sort. top-n sorts only need to
store the top-n tuples and that can significantly reduce the memory
required for sorting perhaps resulting in the sort fitting in memory
rather than spilling out to disk.

You might want to test this by having the leading sort column as an
INT, and then the 2nd one as a long text column of maybe around two
kilobytes. Make all the leading column values the same so that the
comparison for the text column is always performed. Make the LIMIT
small compared to the total number of rows, that should test the worse
case.  Check the performance with and without the limitCount != NULL
part of the patch that disables the optimization for LIMIT.


I checked this. For limit <<< total number of rows, a top-N sort was
performed, but when I changed the limit to a higher value (or no limit),
quicksort was performed.

The top-N sort was twice as fast.

Also, testing the (first) patch version vs master, the top-N sort was twice
as fast there as well (outputs below).

Current patch version (with the LIMIT clause excluded from the optimization):

explain (analyze ,costs off) select count(*) over (order by id) from tt 
order by id, name limit 1;

                              QUERY PLAN
---------------------------------------------------------------------------
 Limit (actual time=1.718..1.719 rows=1 loops=1)
   ->  Incremental Sort (actual time=1.717..1.717 rows=1 loops=1)
         Sort Key: id, name
         Presorted Key: id
         Full-sort Groups: 1  Sort Method: top-N heapsort  Average Memory: 25kB  Peak Memory: 25kB
         ->  WindowAgg (actual time=0.028..0.036 rows=6 loops=1)
               ->  Sort (actual time=0.017..0.018 rows=6 loops=1)
                     Sort Key: id
                     Sort Method: quicksort  Memory: 25kB
                     ->  Seq Scan on tt (actual time=0.011..0.012 rows=6 loops=1)
 Planning Time: 0.069 ms
 Execution Time: 1.799 ms

Earlier patch (which included the LIMIT clause in the optimization):

explain (analyze ,costs off) select count(*) over (order by id) from tt 
order by id, name limit 1;

                          QUERY PLAN
--------------------------------------------------------------
 Limit (actual time=3.766..3.767 rows=1 loops=1)
   ->  WindowAgg (actual time=3.764..3.765 rows=1 loops=1)
         ->  Sort (actual time=3.749..3.750 rows=6 loops=1)
               Sort Key: id, name
               Sort Method: quicksort 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread David Rowley
Your email client seems to be adding additional vertical space to your
emails. I've removed the additional newlines in the quotes. Are you
able to fix the client so it does not do that?

On Sun, 8 Jan 2023 at 00:10, Ankit Kumar Pandey  wrote:
>
> On 07/01/23 09:58, David Rowley wrote:
> > You also don't seem to be considering the fact that the query might
> > have a DISTINCT clause.
>
> Major reason for this was that I am not exactly aware of what distinct
> clause means (especially in
>
> context of window functions) and how it is different from other
> sortClauses (partition, order by, group).

I'm talking about the query's DISTINCT clause. i.e SELECT DISTINCT.
If you look in the grouping_planner() function, you'll see that
create_distinct_paths() is called between create_window_paths() and
create_ordered_paths().

> Yes, this is a fair point. Multiple sort is actually beneficial in cases
> like this, perhaps limits clause and runCondition should be no op too?

I'm not sure what you mean by "no op". Do you mean not apply the optimization?

> > I think the patch should also be using pathkeys_contained_in() and
> > Lists of pathkeys rather than concatenating lists of SortGroupClauses
> > together.  That should allow things to work correctly when a given
> > pathkey has become redundant due to either duplication or a Const in
> > the Eclass.
>
> Make sense, I actually duplicated that logic from
> make_pathkeys_for_window. We should make this changes there as well because
> if we have SELECT rank() OVER (PARTITION BY a ORDER BY a)
> (weird example but you get the idea), it leads to duplicates in
> window_sortclauses.

It won't lead to duplicate pathkeys. Look in
make_pathkeys_for_sortclauses() and what pathkey_is_redundant() does.
Notice that it checks if the pathkey already exists in the list before
appending.
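
For readers following along, the check being referred to is roughly of this
shape (a simplified paraphrase of pathkey_is_redundant() in pathkeys.c, from
memory; see the actual source for the precise conditions):

    /*
     * A pathkey adds nothing if its equivalence class is known constant, or
     * if a pathkey for the same equivalence class is already in the list.
     */
    static bool
    pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)
    {
        EquivalenceClass *new_ec = new_pathkey->pk_eclass;
        ListCell   *lc;

        /* EC containing a constant --- sorting by it is pointless */
        if (new_ec->ec_has_const)
            return true;

        /* same EC already used earlier in the list --- also redundant */
        foreach(lc, pathkeys)
        {
            PathKey    *old_pathkey = (PathKey *) lfirst(lc);

            if (new_ec == old_pathkey->pk_eclass)
                return true;
        }

        return false;
    }

This is why make_pathkeys_for_sortclauses() does not produce duplicate
pathkeys even when the input sort clauses repeat a column.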

> Agree with runConditions part but for limit clause, row reduction happens
> at the last, so whether we use patched version or master version,
> none of sorts would benefit/degrade from that, right?

Maybe you're right. Just be aware that a sort done for a query with an
ORDER BY LIMIT will perform a top-n sort. top-n sorts only need to
store the top-n tuples and that can significantly reduce the memory
required for sorting perhaps resulting in the sort fitting in memory
rather than spilling out to disk.

You might want to test this by having the leading sort column as an
INT, and then the 2nd one as a long text column of maybe around two
kilobytes. Make all the leading column values the same so that the
comparison for the text column is always performed. Make the LIMIT
small compared to the total number of rows; that should test the worst
case.  Check the performance with and without the limitCount != NULL
part of the patch that disables the optimization for LIMIT.

> Is it okay
> to add comments in test cases? I don't see it much on existing cases
> so kind of reluctant to add but it makes intentions much more clear.

I think tests should always have a comment to state what they're
testing. Not many people seem to do that, unfortunately. The problem
with not stating what the test is testing is that if, for example, the
test is checking that the EXPLAIN output is showing a Sort, what if at
some point in the future someone adjusts some costing code and the
plan changes to an Index Scan.  If there's no comment to state that
we're looking for a Sort plan, then the author of the patch that's
adjusting the costs might just think it's ok to change the expected
plan to an Index Scan.   I've seen this problem occur even when the
comments *do* exist. There's just about no hope of such a test
continuing to do what it's meant to if the comments don't exist.

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey



On 07/01/23 07:59, David Rowley wrote:

On Thu, 5 Jan 2023 at 04:11, Ankit Kumar Pandey  wrote:

Attaching test cases for this (+ small change in doc).

Tested this in one of WIP branch where I had modified
select_active_windows and it failed

as expected.

Please let me know if something can be improved in this.

Thanks for writing that.

I had a look over the patch and ended up making some adjustments to
the tests. Looking back at 728202b63, I think any tests we add here
should be kept alongside the tests added by that commit rather than
tacked on to the end of the test file.  It also makes sense to me just
to use the same table as the original tests. I also thought the
comment in select_active_windows should be in the sort comparator
function instead. I think that's a more likely place to capture the
attention of anyone making modifications.

Thanks, I will look it through.

I've now pushed the adjusted patch.


I can't seem to find the updated patch in the attachment; can you please
forward the patch again?

Thanks.

--
Regards,
Ankit Kumar Pandey





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-07 Thread Ankit Kumar Pandey

Thanks for looking into this.

On 07/01/23 09:58, David Rowley wrote:

On Sat, 7 Jan 2023 at 02:11, Ankit Kumar Pandey  wrote:

On 05/01/23 12:53, David Rowley wrote:

We *can* reuse Sorts where a more strict or equivalent sort order is
available.  The question is how do we get the final WindowClause to do
something slightly more strict to save having to do anything for the
ORDER BY.  One way you might think would be to adjust the
WindowClause's orderClause to add the additional clauses, but that
cannot be done because that would cause are_peers() in nodeWindowAgg.c
to not count some rows as peers when they maybe should be given a less
strict orderClause in the WindowClause.

I attempted this in attached patch.

I had a quick look at this and it's going to need some additional code
to ensure there are no WindowFuncs in the ORDER BY clause.  We can't
sort on those before we evaluate them.

Okay I will add this check.

I also don't think there's any point in adding the additional pathkeys
when the input path is already presorted.

It would be better to leave this case alone and just do the
incremental sort afterwards.


So this will be a no-op case as well.


You also don't seem to be considering the fact that the query might
have a DISTINCT clause.


The major reason for this was that I am not exactly aware of what the
DISTINCT clause means (especially in the context of window functions) and how
it is different from the other sortClauses (partition, order by, group).

The comments in parsenodes.h didn't help.


   That's evaluated between window functions and
the order by. It would be fairly useless to do a more strict sort when
the sort order is going to be obliterated by a Hash Aggregate. Perhaps
we can just not do this when the query has a DISTINCT clause.

On the other hand, there are also a few reasons why we shouldn't do
this. I mentioned the WindowClause runConditions earlier here.

The patched version produces:

postgres=# explain (analyze, costs off) select * from (select
oid,relname,row_number() over (order by oid) rn1 from pg_class order
by oid,relname) where rn1 < 10;
   QUERY PLAN
--
  WindowAgg (actual time=0.488..0.497 rows=9 loops=1)
Run Condition: (row_number() OVER (?) < 10)
->  Sort (actual time=0.466..0.468 rows=10 loops=1)
  Sort Key: pg_class.oid, pg_class.relname
  Sort Method: quicksort  Memory: 67kB
  ->  Seq Scan on pg_class (actual time=0.028..0.170 rows=420 loops=1)
  Planning Time: 0.214 ms
  Execution Time: 0.581 ms
(8 rows)

Whereas master produces:

postgres=# explain (analyze, costs off) select * from (select
oid,relname,row_number() over (order by oid) rn1 from pg_class order
by oid,relname) where rn1 < 10;
QUERY PLAN

  Incremental Sort (actual time=0.506..0.508 rows=9 loops=1)
Sort Key: pg_class.oid, pg_class.relname
Presorted Key: pg_class.oid
Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 26kB
Peak Memory: 26kB
->  WindowAgg (actual time=0.475..0.483 rows=9 loops=1)
  Run Condition: (row_number() OVER (?) < 10)
  ->  Sort (actual time=0.461..0.461 rows=10 loops=1)
Sort Key: pg_class.oid
Sort Method: quicksort  Memory: 67kB
->  Seq Scan on pg_class (actual time=0.022..0.178
rows=420 loops=1)
  Planning Time: 0.245 ms
  Execution Time: 0.594 ms
(12 rows)

(slightly bad example since oid is unique but...)

It's not too clear to me that the patched version is a better plan.
The bottom level sort, which sorts 420 rows has a more complex
comparison to do. Imagine the 2nd column is a long text string. That
would make the sort much more expensive.  The top-level sort has far
fewer rows to sort due to the runCondition filtering out anything that
does not match rn1 < 10. The same can be said for a query with a LIMIT
clause.


Yes, this is a fair point. Multiple sorts are actually beneficial in cases
like this; perhaps the LIMIT clause and runCondition should be no-op too?


I think the patch should also be using pathkeys_contained_in() and
Lists of pathkeys rather than concatenating lists of SortGroupClauses
together.  That should allow things to work correctly when a given
pathkey has become redundant due to either duplication or a Const in
the Eclass.


Makes sense; I actually duplicated that logic from make_pathkeys_for_window.
We should make this change there as well, because if we have
SELECT rank() OVER (PARTITION BY a ORDER BY a) (weird example, but you get
the idea), it leads to duplicates in window_sortclauses.



Also, since I seem to be only be able to think of these cases properly
by actually trying them, I ended up with the attached patch.  I opted
to not do the optimisation when there are runConditions or a LIMIT
clause.  

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-06 Thread David Rowley
On Sat, 7 Jan 2023 at 02:11, Ankit Kumar Pandey  wrote:
> > On 05/01/23 12:53, David Rowley wrote:
> >>
> >> We *can* reuse Sorts where a more strict or equivalent sort order is
> >> available.  The question is how do we get the final WindowClause to do
> >> something slightly more strict to save having to do anything for the
> >> ORDER BY.  One way you might think would be to adjust the
> >> WindowClause's orderClause to add the additional clauses, but that
> >> cannot be done because that would cause are_peers() in nodeWindowAgg.c
> >> to not count some rows as peers when they maybe should be given a less
> >> strict orderClause in the WindowClause.
>
> I attempted this in attached patch.

I had a quick look at this and it's going to need some additional code
to ensure there are no WindowFuncs in the ORDER BY clause.  We can't
sort on those before we evaluate them.

Right now you get:

postgres=# explain select *,row_number() over (order by oid) rn1 from
pg_class order by oid,rn1;
ERROR:  could not find pathkey item to sort

I also don't think there's any point in adding the additional pathkeys
when the input path is already presorted.  Have a look at:

postgres=# set enable_seqscan=0;
SET
postgres=# explain select *,row_number() over (order by oid) rn1 from
pg_class order by oid,relname;
 QUERY PLAN

 WindowAgg  (cost=0.43..85.44 rows=412 width=281)
   ->  Incremental Sort  (cost=0.43..79.26 rows=412 width=273)
 Sort Key: oid, relname
 Presorted Key: oid
 ->  Index Scan using pg_class_oid_index on pg_class
(cost=0.27..60.72 rows=412 width=273)
(5 rows)

It would be better to leave this case alone and just do the
incremental sort afterwards.

You also don't seem to be considering the fact that the query might
have a DISTINCT clause.  That's evaluated between window functions and
the order by. It would be fairly useless to do a more strict sort when
the sort order is going to be obliterated by a Hash Aggregate. Perhaps
we can just not do this when the query has a DISTINCT clause.

On the other hand, there are also a few reasons why we shouldn't do
this. I mentioned the WindowClause runConditions earlier here.

The patched version produces:

postgres=# explain (analyze, costs off) select * from (select
oid,relname,row_number() over (order by oid) rn1 from pg_class order
by oid,relname) where rn1 < 10;
  QUERY PLAN
--
 WindowAgg (actual time=0.488..0.497 rows=9 loops=1)
   Run Condition: (row_number() OVER (?) < 10)
   ->  Sort (actual time=0.466..0.468 rows=10 loops=1)
 Sort Key: pg_class.oid, pg_class.relname
 Sort Method: quicksort  Memory: 67kB
 ->  Seq Scan on pg_class (actual time=0.028..0.170 rows=420 loops=1)
 Planning Time: 0.214 ms
 Execution Time: 0.581 ms
(8 rows)

Whereas master produces:

postgres=# explain (analyze, costs off) select * from (select
oid,relname,row_number() over (order by oid) rn1 from pg_class order
by oid,relname) where rn1 < 10;
   QUERY PLAN

 Incremental Sort (actual time=0.506..0.508 rows=9 loops=1)
   Sort Key: pg_class.oid, pg_class.relname
   Presorted Key: pg_class.oid
   Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 26kB
Peak Memory: 26kB
   ->  WindowAgg (actual time=0.475..0.483 rows=9 loops=1)
 Run Condition: (row_number() OVER (?) < 10)
 ->  Sort (actual time=0.461..0.461 rows=10 loops=1)
   Sort Key: pg_class.oid
   Sort Method: quicksort  Memory: 67kB
   ->  Seq Scan on pg_class (actual time=0.022..0.178
rows=420 loops=1)
 Planning Time: 0.245 ms
 Execution Time: 0.594 ms
(12 rows)

(slightly bad example since oid is unique but...)

It's not too clear to me that the patched version is a better plan.
The bottom level sort, which sorts 420 rows has a more complex
comparison to do. Imagine the 2nd column is a long text string. That
would make the sort much more expensive.  The top-level sort has far
fewer rows to sort due to the runCondition filtering out anything that
does not match rn1 < 10. The same can be said for a query with a LIMIT
clause.

I think the patch should also be using pathkeys_contained_in() and
Lists of pathkeys rather than concatenating lists of SortGroupClauses
together.  That should allow things to work correctly when a given
pathkey has become redundant due to either duplication or a Const in
the Eclass.

Also, since I seem to be only be able to think of these cases properly
by actually trying them, I ended up with the attached patch.  I opted
to not do the optimisation when there are runConditions or a LIMIT
clause.  Doing it when there are runConditions 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-06 Thread David Rowley
On Thu, 5 Jan 2023 at 04:11, Ankit Kumar Pandey  wrote:
>
> Attaching test cases for this (+ small change in doc).
>
> Tested this in one of WIP branch where I had modified
> select_active_windows and it failed
>
> as expected.
>
> Please let me know if something can be improved in this.

Thanks for writing that.

I had a look over the patch and ended up making some adjustments to
the tests. Looking back at 728202b63, I think any tests we add here
should be kept alongside the tests added by that commit rather than
tacked on to the end of the test file.  It also makes sense to me just
to use the same table as the original tests. I also thought the
comment in select_active_windows should be in the sort comparator
function instead. I think that's a more likely place to capture the
attention of anyone making modifications.

I've now pushed the adjusted patch.

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-06 Thread Ankit Kumar Pandey

Sorry if multiple mails have been sent for this.



On 05/01/23 12:53, David Rowley wrote:


We *can* reuse Sorts where a more strict or equivalent sort order is
available.  The question is how do we get the final WindowClause to do
something slightly more strict to save having to do anything for the
ORDER BY.  One way you might think would be to adjust the
WindowClause's orderClause to add the additional clauses, but that
cannot be done because that would cause are_peers() in nodeWindowAgg.c
to not count some rows as peers when they maybe should be given a less
strict orderClause in the WindowClause.


I attempted this in attached patch.


#1. No op case

--In patched version---


explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER 
(ORDER BY a,b,c) FROM abcd order by a;

    QUERY PLAN
--
 WindowAgg
   ->  Sort
 Sort Key: a, b, c
 ->  WindowAgg
   ->  Sort
 Sort Key: b
 ->  Seq Scan on abcd
(7 rows)

explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER 
(ORDER BY a,b,c) FROM abcd;

    QUERY PLAN
--
 WindowAgg
   ->  Sort
 Sort Key: b
 ->  WindowAgg
   ->  Sort
 Sort Key: a, b, c
 ->  Seq Scan on abcd
(7 rows)

--In master



explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER 
(ORDER BY a,b,c) FROM abcd order by a;

    QUERY PLAN
--
 WindowAgg
   ->  Sort
 Sort Key: a, b, c
 ->  WindowAgg
   ->  Sort
 Sort Key: b
 ->  Seq Scan on abcd
(7 rows)
explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER 
(ORDER BY a,b,c) FROM abcd;

    QUERY PLAN
--
 WindowAgg
   ->  Sort
 Sort Key: b
 ->  WindowAgg
   ->  Sort
 Sort Key: a, b, c
 ->  Seq Scan on abcd
(7 rows)

No change between patched version and master.


2. In case where optimization can happen

In patched version---


explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER 
(ORDER BY a) FROM abcd order by a,b;

    QUERY PLAN
--
 WindowAgg
   ->  Sort
 Sort Key: a, b
 ->  WindowAgg
   ->  Sort
 Sort Key: b
 ->  Seq Scan on abcd
(7 rows)

explain (costs off)  SELECT rank() OVER (ORDER BY a), count(*) OVER 
(ORDER BY b), count(*) OVER (PARTITION BY a ORDER BY b) FROM abcd order 
by a,b,c,d;

   QUERY PLAN

 WindowAgg
   ->  WindowAgg
 ->  Sort
   Sort Key: a, b, c, d
   ->  WindowAgg
 ->  Sort
   Sort Key: b
   ->  Seq Scan on abcd
(8 rows)

---In master


explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER 
(ORDER BY a) FROM abcd order by a,b;

   QUERY PLAN

 Incremental Sort
   Sort Key: a, b
   Presorted Key: a
   ->  WindowAgg
 ->  Sort
   Sort Key: a
   ->  WindowAgg
 ->  Sort
   Sort Key: b
   ->  Seq Scan on abcd
(10 rows)

explain (costs off)  SELECT rank() OVER (ORDER BY a), count(*) OVER 
(ORDER BY b), count(*) OVER (PARTITION BY a ORDER BY b) FROM abcd order 
by a,b,c,d;

  QUERY PLAN
--
 Incremental Sort
   Sort Key: a, b, c, d
   Presorted Key: a, b
   ->  WindowAgg
 ->  WindowAgg
   ->  Sort
 Sort Key: a, b
 ->  WindowAgg
   ->  Sort
 Sort Key: b
 ->  Seq Scan on abcd
(11 rows)

The patched version removes a few sorts.

All regression tests passed, so it is not breaking anything existing.

We don't have any tests verifying the sort plans of window functions (which
would have failed, if present).

Please let me know any feedback (I have added some of my own concerns in the
comments).


Thanks


--
Regards,
Ankit Kumar Pandey
From 8fb4b6188eda111bee2e47d3e85289064b428702 Mon Sep 17 00:00:00 2001
From: Ankit Kumar Pandey 
Date: Fri, 6 Jan 2023 14:30:38 +0530
Subject: [PATCH] Teach 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-05 Thread David Rowley
On Thu, 5 Jan 2023 at 19:14, Ankit Kumar Pandey  wrote:
> We are already doing something like I mentioned.
>
> Consider this example:
>
> explain SELECT rank() OVER (ORDER BY a), count(*) OVER (ORDER BY a,b)
> FROM abcd;
>  QUERY PLAN
> --
>   WindowAgg  (cost=83.80..127.55 rows=1250 width=24)
> ->  WindowAgg  (cost=83.80..108.80 rows=1250 width=16)
>   ->  Sort  (cost=83.80..86.92 rows=1250 width=8)
> Sort Key: a, b
> ->  Seq Scan on abcd  (cost=0.00..19.50 rows=1250 width=8)
> (5 rows)
>
>
> If it is okay to do extra sort for first window function (rank) here,
> why would it be
>
> any different in case which I mentioned?

We *can* reuse Sorts where a more strict or equivalent sort order is
available.  The question is how do we get the final WindowClause to do
something slightly more strict to save having to do anything for the
ORDER BY.  One way you might think would be to adjust the
WindowClause's orderClause to add the additional clauses, but that
cannot be done because that would cause are_peers() in nodeWindowAgg.c
to not count some rows as peers when they maybe should be given a less
strict orderClause in the WindowClause.

It might be possible to adjust create_one_window_path() so that when
processing the final WindowClause that it looks at the DISTINCT or
ORDER BY clause to see if we can sort on a few extra columns to save
having to do any further sorting.  We just *cannot* make any
adjustments to the WindowClause's orderClause.
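
Condensed into code, the shape being suggested here (and which the patches
later in this thread essentially follow) is: when handling the final
WindowClause, swap the Sort's pathkeys, not the WindowClause. A sketch only;
window_pathkeys, is_sorted, presorted_keys and l are the existing locals of
create_one_window_path(), and wc->orderClause is never modified, so
are_peers() still sees the less strict ordering:

    if (!is_sorted && lnext(activeWindows, l) == NULL &&
        root->parse->sortClause != NIL)
    {
        List       *orderby_pathkeys;

        orderby_pathkeys = make_pathkeys_for_sortclauses(root,
                                                         root->parse->sortClause,
                                                         root->processed_tlist);

        /* sort by the stricter ORDER BY keys if the window's keys are a prefix */
        if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
            window_pathkeys = orderby_pathkeys;

        is_sorted = pathkeys_count_contained_in(window_pathkeys,
                                                path->pathkeys,
                                                &presorted_keys);
    }

(The extra guards against DISTINCT, LIMIT, runConditions and WindowFuncs in
the ORDER BY, discussed later in the thread, would be layered on top of this.)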

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-04 Thread Ankit Kumar Pandey



On 05/01/23 12:53, David Rowley wrote:


We *can* reuse Sorts where a more strict or equivalent sort order is
available.  The question is how do we get the final WindowClause to do
something slightly more strict to save having to do anything for the
ORDER BY.  One way you might think would be to adjust the
WindowClause's orderClause to add the additional clauses, but that
cannot be done because that would cause are_peers() in nodeWindowAgg.c
to not count some rows as peers when they maybe should be given a less
strict orderClause in the WindowClause.

Okay, now I see the issue in my approach.

It might be possible to adjust create_one_window_path() so that when
processing the final WindowClause that it looks at the DISTINCT or
ORDER BY clause to see if we can sort on a few extra columns to save
having to do any further sorting.  We just *cannot* make any
adjustments to the WindowClause's orderClause.


This is a much better solution. I will check create_one_window_path for the
same.

--
Regards,
Ankit Kumar Pandey





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-04 Thread David Rowley
On Thu, 5 Jan 2023 at 16:12, Tom Lane  wrote:
>
> David Rowley  writes:
> > Additionally, it's also not that clear to me that sorting by more
> > columns in the sort below the WindowAgg would always be a win over
> > doing the final sort for the ORDER BY.  What if the WHERE clause (that
> > could not be pushed down before a join) filtered out the vast majority
> > of the rows before the ORDER BY. It might be cheaper to do the sort
> > then than to sort by the additional columns earlier.
>
> That's certainly a legitimate question to ask, but I don't quite see
> where you figure we'd be sorting more rows?  WHERE filtering happens
> before window functions, which never eliminate any rows.  So it seems
> like a sort just before the window functions must sort the same number
> of rows as one just after them.

Yeah, I didn't think the WHERE clause thing out carefully enough. I
think it's only the WindowClause's runCondition that could possibly
filter any rows between the Sort below the WindowAgg and before the
ORDER BY is evaluated.

David




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-04 Thread Ankit Kumar Pandey



On 05/01/23 07:48, Vik Fearing wrote:

On 1/4/23 13:07, Ankit Kumar Pandey wrote:

Also, one thing, consider the following query:

explain analyze select row_number() over (order by a,b),count(*) over 
(order by a) from abcd order by a,b,c;


In this case, sorting is done on (a,b) followed by incremental sort 
on c at final stage.


If we do just one sort: a,b,c at first stage then there won't be need 
to do another sort (incremental one).



This could give incorrect results.  Consider the following query:

postgres=# select a, b, c, rank() over (order by a, b)
from (values (1, 2, 1), (1, 2, 2), (1, 2, 1)) as abcd (a, b, c)
order by a, b, c;

 a | b | c | rank
---+---+---+--
 1 | 2 | 1 |    1
 1 | 2 | 1 |    1
 1 | 2 | 2 |    1
(3 rows)


If you change the window's ordering like you suggest, you get this 
different result:



postgres=# select a, b, c, rank() over (order by a, b, c)
from (values (1, 2, 1), (1, 2, 2), (1, 2, 1)) as abcd (a, b, c)
order by a, b, c;

 a | b | c | rank
---+---+---+--
 1 | 2 | 1 |    1
 1 | 2 | 1 |    1
 1 | 2 | 2 |    3
(3 rows)



We are already doing something like I mentioned.

Consider this example:

explain SELECT rank() OVER (ORDER BY a), count(*) OVER (ORDER BY a,b) 
FROM abcd;

    QUERY PLAN
--
 WindowAgg  (cost=83.80..127.55 rows=1250 width=24)
   ->  WindowAgg  (cost=83.80..108.80 rows=1250 width=16)
 ->  Sort  (cost=83.80..86.92 rows=1250 width=8)
   Sort Key: a, b
   ->  Seq Scan on abcd  (cost=0.00..19.50 rows=1250 width=8)
(5 rows)


If it is okay to do an extra sort for the first window function (rank) here,
why would it be any different in the case which I mentioned?

My suggestion rests on the assumption that for a window function, say
rank() OVER (ORDER BY a), the ordering of columns (other than column 'a')
shouldn't matter.



--
Regards,
Ankit Kumar Pandey





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-04 Thread Tom Lane
David Rowley  writes:
> Additionally, it's also not that clear to me that sorting by more
> columns in the sort below the WindowAgg would always be a win over
> doing the final sort for the ORDER BY.  What if the WHERE clause (that
> could not be pushed down before a join) filtered out the vast majority
> of the rows before the ORDER BY. It might be cheaper to do the sort
> then than to sort by the additional columns earlier.

That's certainly a legitimate question to ask, but I don't quite see
where you figure we'd be sorting more rows?  WHERE filtering happens
before window functions, which never eliminate any rows.  So it seems
like a sort just before the window functions must sort the same number
of rows as one just after them.

regards, tom lane




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-04 Thread David Rowley
On Thu, 5 Jan 2023 at 15:18, Vik Fearing  wrote:
>
> On 1/4/23 13:07, Ankit Kumar Pandey wrote:
> > Also, one thing, consider the following query:
> >
> > explain analyze select row_number() over (order by a,b),count(*) over
> > (order by a) from abcd order by a,b,c;
> >
> > In this case, sorting is done on (a,b) followed by incremental sort on c
> > at final stage.
> >
> > If we do just one sort: a,b,c at first stage then there won't be need to
> > do another sort (incremental one).
>
>
> This could give incorrect results.

Yeah, this seems to be what I warned against in [1].

If we wanted to make that work we'd need to do it without adjusting
the WindowClause's orderClause so that the peer row checks still
worked correctly in nodeWindowAgg.c.

Additionally, it's also not that clear to me that sorting by more
columns in the sort below the WindowAgg would always be a win over
doing the final sort for the ORDER BY.  What if the WHERE clause (that
could not be pushed down before a join) filtered out the vast majority
of the rows before the ORDER BY. It might be cheaper to do the sort
then than to sort by the additional columns earlier.

David

[1] 
https://www.postgresql.org/message-id/CAApHDvp=r1lnekcmwcyarumpl-jp4j_sdc8yefywadt1ac5...@mail.gmail.com




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-04 Thread Tom Lane
Vik Fearing  writes:
> On 1/4/23 13:07, Ankit Kumar Pandey wrote:
>> Also, one thing, consider the following query:
>> explain analyze select row_number() over (order by a,b),count(*) over 
>> (order by a) from abcd order by a,b,c;
>> In this case, sorting is done on (a,b) followed by incremental sort on c 
>> at final stage.
>> If we do just one sort: a,b,c at first stage then there won't be need to 
>> do another sort (incremental one).

> This could give incorrect results.

Mmmm ... your counterexample doesn't really prove that.  Yes,
the "rank()" step must consider only two ORDER BY columns while
deciding which rows are peers, but I don't see why it wouldn't
be okay if the rows happened to already be sorted by "c" within
those peer groups.

I don't recall the implementation details well enough to be sure
how hard it would be to keep that straight.

regards, tom lane




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-04 Thread Vik Fearing

On 1/4/23 13:07, Ankit Kumar Pandey wrote:

Also, one thing, consider the following query:

explain analyze select row_number() over (order by a,b),count(*) over 
(order by a) from abcd order by a,b,c;


In this case, sorting is done on (a,b) followed by incremental sort on c 
at final stage.


If we do just one sort: a,b,c at first stage then there won't be need to 
do another sort (incremental one).



This could give incorrect results.  Consider the following query:

postgres=# select a, b, c, rank() over (order by a, b)
from (values (1, 2, 1), (1, 2, 2), (1, 2, 1)) as abcd (a, b, c)
order by a, b, c;

 a | b | c | rank
---+---+---+--
 1 | 2 | 1 |1
 1 | 2 | 1 |1
 1 | 2 | 2 |1
(3 rows)


If you change the window's ordering like you suggest, you get this 
different result:



postgres=# select a, b, c, rank() over (order by a, b, c)
from (values (1, 2, 1), (1, 2, 2), (1, 2, 1)) as abcd (a, b, c)
order by a, b, c;

 a | b | c | rank
---+---+---+--
 1 | 2 | 1 |1
 1 | 2 | 1 |1
 1 | 2 | 2 |3
(3 rows)


--
Vik Fearing





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-04 Thread Ankit Kumar Pandey

Attaching test cases for this (+ small change in doc).

Tested this in one of my WIP branches where I had modified
select_active_windows, and it failed as expected.

Please let me know if something can be improved in this.


Regards,
Ankit Kumar Pandey
From 7647759eb92e1a0560bcff73b4169be8694f83d8 Mon Sep 17 00:00:00 2001
From: Ankit Kumar Pandey 
Date: Wed, 4 Jan 2023 19:57:57 +0530
Subject: [PATCH] Add test cases for optimal ordering of window function order
 by clauses

In case of multiple orderby clauses for window function, it is desired
to have minimum sort operations. This is already implemented in code but
corresponding test cases were missing.
This patch adds test cases covering various cases where order by clause
get optimized and some additional comments in function which implements
this feature.
---
 src/backend/optimizer/plan/planner.c |   5 +
 src/test/regress/expected/window.out | 140 +++
 src/test/regress/sql/window.sql  |  51 ++
 3 files changed, 196 insertions(+)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6ba7589f3..e3989a7b44 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5592,6 +5592,11 @@ optimize_window_clauses(PlannerInfo *root, WindowFuncLists *wflists)
  * select_active_windows
  *		Create a list of the "active" window clauses (ie, those referenced
  *		by non-deleted WindowFuncs) in the order they are to be executed.
+ *		Window clauses are ordered by the highest tleSortGroupRef first,
+ *		resulting in the lowest order tleSortGroupRefs being the last WindowAgg to be
+ *		processed. This reduces requirement for sort in lower order WindowAgg
+ *		as it is quite likely that required column is already sorted and if not,
+ *		there is possibility of incremental sort.
  */
 static List *
 select_active_windows(PlannerInfo *root, WindowFuncLists *wflists)
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 5505e9a2da..71d0cff587 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -4733,3 +4733,143 @@ SELECT * FROM pg_temp.f(2);
  {5}
 (5 rows)
 
+-- test optimal ordering for minimal sort operations
+CREATE temp TABLE abcd (a int, b int, c int, d int);
+INSERT INTO abcd
+  SELECT p,q,r,s FROM
+generate_series(1,5) p,
+generate_Series(1,5) q,
+generate_Series(1,5) r,
+generate_Series(1,5) s;
+EXPLAIN (COSTS OFF)
+  SELECT row_number() OVER (ORDER BY b),
+  count(*) OVER (ORDER BY a,b)
+FROM abcd ORDER BY a,b,c;
+   QUERY PLAN   
+
+ Incremental Sort
+   Sort Key: a, b, c
+   Presorted Key: a, b
+   ->  WindowAgg
+ ->  Sort
+   Sort Key: a, b
+   ->  WindowAgg
+ ->  Sort
+   Sort Key: b
+   ->  Seq Scan on abcd
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+  SELECT row_number() OVER (ORDER BY a, b),
+  count(*) OVER (ORDER BY a)
+FROM abcd ORDER BY a,b,c;
+QUERY PLAN
+--
+ Incremental Sort
+   Sort Key: a, b, c
+   Presorted Key: a, b
+   ->  WindowAgg
+ ->  WindowAgg
+   ->  Sort
+ Sort Key: a, b
+ ->  Seq Scan on abcd
+(8 rows)
+
+EXPLAIN (COSTS OFF)
+  SELECT row_number() OVER (ORDER BY a),
+  count(*) OVER (ORDER BY a,b,c)
+FROM abcd ORDER BY a,b;
+ QUERY PLAN 
+
+ WindowAgg
+   ->  WindowAgg
+ ->  Sort
+   Sort Key: a, b, c
+   ->  Seq Scan on abcd
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+  SELECT row_number() OVER (ORDER BY a),
+  count(*) OVER (ORDER BY a,c),
+  count(*) OVER (ORDER BY a,b)
+FROM abcd ORDER BY a,d;
+  QUERY PLAN  
+--
+ Incremental Sort
+   Sort Key: a, d
+   Presorted Key: a
+   ->  WindowAgg
+ ->  WindowAgg
+   ->  Incremental Sort
+ Sort Key: a, c
+ Presorted Key: a
+ ->  WindowAgg
+   ->  Sort
+ Sort Key: a, b
+ ->  Seq Scan on abcd
+(12 rows)
+
+EXPLAIN (COSTS OFF)
+  SELECT row_number() OVER (ORDER BY a),
+  count(*) OVER (ORDER BY a,b),
+  count(*) OVER (ORDER BY a,c)
+FROM abcd ORDER BY a,d;
+  QUERY PLAN  
+--
+ Incremental Sort
+   Sort Key: a, d
+   Presorted Key: a
+   ->  WindowAgg
+ ->  WindowAgg
+   ->  Incremental Sort
+ Sort Key: a, b
+ Presorted Key: a
+ ->  WindowAgg
+ 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-04 Thread Ankit Kumar Pandey


On 04/01/23 09:32, David Rowley wrote:


It looks like that works by accident. I see no mention of this either
in the comments or in [1].


This kind of troubles me because the function name select_active_windows
doesn't tell me whether its only job is to reorder window clauses to
optimize sorting. From the code, I don't see it doing anything else either.




If we don't have one already, then we should likely add a regression
test that ensures that this remains true.  Since it does not seem to
be documented in the code anywhere, it seems like something that could
easily be overlooked if we were to ever refactor that code.

I don't see any window tests specific to the sort operations (and the
order they happen in). I will add those.



Also, one thing, consider the following query:

explain analyze select row_number() over (order by a,b),count(*) over 
(order by a) from abcd order by a,b,c;


In this case, sorting is done on (a,b), followed by an incremental sort
on c at the final stage.

If we did just one sort on (a,b,c) at the first stage, there would be no
need to do another (incremental) sort.



Now, I am not sure which one would be faster: a sort on (a,b,c) versus a
sort on (a,b) plus an incremental sort on c, because even though a datum
sort is fast, there are plenty of combinations where we won't be doing one.

I might be looking at extreme corner cases, but I still wanted to share.




--
Regards,
Ankit Kumar Pandey


Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-03 Thread David Rowley
On Wed, 4 Jan 2023 at 03:11, Ankit Kumar Pandey  wrote:
> #2. If order by clause in the Window function is superset of order by in query
>
> explain analyze select a,row_number() over (order by a,b,c),count(*) over 
> (order by a,b) from abcd order by a;
>
>   QUERY PLAN
> --
>  WindowAgg  (cost=39.27..64.27 rows=625 width=28) (actual time=1.089..3.020 
> rows=625 loops=1)
>->  WindowAgg  (cost=39.27..53.34 rows=625 width=20) (actual 
> time=1.024..1.635 rows=625 loops=1)
>  ->  Sort  (cost=39.27..40.84 rows=625 width=12) (actual 
> time=1.019..1.084 rows=625 loops=1)
>Sort Key: a, b, c
>Sort Method: quicksort  Memory: 54kB
>->  Seq Scan on abcd  (cost=0.00..10.25 rows=625 width=12) 
> (actual time=0.023..0.265 rows=625 loops=1)
>  Planning Time: 0.071 ms
>  Execution Time: 3.156 ms
> (8 rows)
>
> No additional sort needs to be performed in this case, as you noted.

It looks like that works by accident. I see no mention of this either
in the comments or in [1].  What seems to be going on is that
common_prefix_cmp() is coded in such a way that the WindowClauses end
up ordered by the highest tleSortGroupRef first, resulting in the
lowest order tleSortGroupRefs being the last WindowAgg to be
processed.  We do transformSortClause() before
transformWindowDefinitions(), this is where the tleSortGroupRef
indexes are assigned, so the ORDER BY clause will have a lower
tleSortGroupRef than the WindowClauses.

If we don't have one already, then we should likely add a regression
test that ensures that this remains true.  Since it does not seem to
be documented in the code anywhere, it seems like something that could
easily be overlooked if we were to ever refactor that code.

I just tried moving the calls to transformWindowDefinitions() so that
they come before transformSortClause() and our regression tests still
pass.  That's not great.

With that change, the following query has an additional sort for the
ORDER BY clause which previously wasn't done.

explain select a,b,c,row_number() over (order by a) rn1, row_number()
over(partition by b) rn2, row_number() over (order by c) from abc
order by b;

David

[1] 
https://www.postgresql.org/message-id/flat/124A7F69-84CD-435B-BA0E-2695BE21E5C2%40yesql.se




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-03 Thread Ankit Kumar Pandey

Hi,

On 03/01/23 08:21, David Rowley wrote:


I do think you'll likely want to put any WindowClauses which have
pathkeys which are a true subset or true superset of the ORDER BY /
DISTINCT pathkeys last.  If they're a superset then we won't need to
perform any additional ordering for the DISTINCT / ORDER BY clause.
If they're a subset then we might be able to perform an Incremental
Sort, which is likely much cheaper than a full sort.  The existing
code should handle that part. You just need to make
select_active_windows() more intelligent.


I think the current implementation does exactly this.

#1. The ORDER BY clause in the window function is a subset of the query's ORDER BY

create table abcd(a int, b int, c int, d int);
insert into abcd select x,y,z,c from generate_series(1,5) x, 
generate_Series(1,5)y, generate_Series(1,5) z, generate_Series(1,5) c;
explain analyze select a,row_number() over (order by b),count(*) over (order by 
a,b) from abcd order by a,b,c;

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Incremental Sort  (cost=80.32..114.56 rows=625 width=28) (actual time=1.440..3.311 rows=625 loops=1)
   Sort Key: a, b, c
   Presorted Key: a, b
   Full-sort Groups: 13  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
   ->  WindowAgg  (cost=79.24..91.74 rows=625 width=28) (actual time=1.272..2.567 rows=625 loops=1)
         ->  Sort  (cost=79.24..80.80 rows=625 width=20) (actual time=1.233..1.296 rows=625 loops=1)
               Sort Key: a, b
               Sort Method: quicksort  Memory: 64kB
               ->  WindowAgg  (cost=39.27..50.21 rows=625 width=20) (actual time=0.304..0.786 rows=625 loops=1)
                     ->  Sort  (cost=39.27..40.84 rows=625 width=12) (actual time=0.300..0.354 rows=625 loops=1)
                           Sort Key: b
                           Sort Method: quicksort  Memory: 54kB
                           ->  Seq Scan on abcd  (cost=0.00..10.25 rows=625 width=12) (actual time=0.021..0.161 rows=625 loops=1)
 Planning Time: 0.068 ms
 Execution Time: 3.509 ms
(15 rows)

Here, since the count(*) window function orders by the two columns a and b,
an incremental sort is performed for the remaining column in the query's
ORDER BY, which makes sense.


#2. The ORDER BY clause in the window function is a superset of the query's
ORDER BY


explain analyze select a,row_number() over (order by a,b,c),count(*) over 
(order by a,b) from abcd order by a;

                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=39.27..64.27 rows=625 width=28) (actual time=1.089..3.020 rows=625 loops=1)
   ->  WindowAgg  (cost=39.27..53.34 rows=625 width=20) (actual time=1.024..1.635 rows=625 loops=1)
         ->  Sort  (cost=39.27..40.84 rows=625 width=12) (actual time=1.019..1.084 rows=625 loops=1)
               Sort Key: a, b, c
               Sort Method: quicksort  Memory: 54kB
               ->  Seq Scan on abcd  (cost=0.00..10.25 rows=625 width=12) (actual time=0.023..0.265 rows=625 loops=1)
 Planning Time: 0.071 ms
 Execution Time: 3.156 ms
(8 rows)

No additional sort needs to be performed in this case, as you noted.

On 03/01/23 08:21, David Rowley wrote:
If they're a superset then we won't need to perform any additional 
ordering for the DISTINCT / ORDER BY clause.

If they're a subset then we might be able to perform an Incremental
Sort, which is likely much cheaper than a full sort.


So the question is: how is the current implementation different from the desired one?

--
Regards,
Ankit Kumar Pandey


Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-02 Thread Ankit Kumar Pandey



On 03/01/23 08:21, David Rowley wrote:

On Mon, 26 Dec 2022 at 02:04, Ankit Kumar Pandey  wrote:

Point #1

In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3
sorts. We also perform 3.

This shouldn't be too hard to do. See the code in
select_active_windows().  You'll likely want to pay attention to the
DISTINCT pathkeys if they exist and just use the ORDER BY pathkeys if
the query has no DISTINCT clause. DISTINCT is evaluated after Window
and before ORDER BY.

One idea to implement this would be to adjust the loop in
select_active_windows() so that we record any WindowClauses which have
the pathkeys contained in the ORDER BY / DISTINCT pathkeys then record
those separately and append those onto the end of the actives array
after the sort.

I do think you'll likely want to put any WindowClauses which have
pathkeys which are a true subset or true superset of the ORDER BY /
DISTINCT pathkeys last.  If they're a superset then we won't need to
perform any additional ordering for the DISTINCT / ORDER BY clause.
If they're a subset then we might be able to perform an Incremental
Sort, which is likely much cheaper than a full sort.  The existing
code should handle that part. You just need to make
select_active_windows() more intelligent.

You might also think that we could perform additional optimisations
and also adjust the ORDER BY clause of a WindowClause if it contains
the pathkeys of the DISTINCT / ORDER BY clause. For example:

SELECT *,row_number() over (order by a,b) from tab order by a,b,c;

However, if you were to adjust the WindowClauses ORDER BY to become
a,b,c then you could produce incorrect results for window functions
that change their result based on peer rows.

Note the difference in results from:

create table ab(a int, b int);
insert into ab select x,y from generate_series(1,5) x, generate_Series(1,5)y;

select a,b,count(*) over (order by a) from ab order by a,b;
select a,b,count(*) over (order by a,b) from ab order by a,b;


Thanks, let me try this.



and Point #2

Teach planner to decide which window to evaluate first based on costs.
Currently the first window in the query is evaluated first, there may be no
index to help sort the first window, but perhaps there are for other windows
in the query. This may allow an index scan instead of a seqscan -> sort.

What Tom wrote about that in the first paragraph of [1] still applies.
The problem is that if the query contains many joins that to properly
find the cheapest way of executing the query we'd have to perform the
join search once for each unique sort order of each WindowClause.
That's just not practical to do from a performance standpoint. The
join search can be very expensive. There may be something that could
be done to better determine the most likely candidate for the first
WindowClause using some heuristics, but I've no idea what those would
be. You should look into point #1 first. Point #2 is significantly
more difficult to solve in a way that would be acceptable to the
project.


Okay, leaving this out for now.

--
Regards,
Ankit Kumar Pandey





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-02 Thread David Rowley
On Mon, 26 Dec 2022 at 02:04, Ankit Kumar Pandey  wrote:
> Point #1
>
> In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3
> sorts. We also perform 3.

This shouldn't be too hard to do. See the code in
select_active_windows().  You'll likely want to pay attention to the
DISTINCT pathkeys if they exist and just use the ORDER BY pathkeys if
the query has no DISTINCT clause. DISTINCT is evaluated after Window
and before ORDER BY.
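
A tiny illustration of that evaluation order (using a three-row VALUES list
in place of a real table): the window function still counts both duplicate
rows, because DISTINCT only removes duplicates afterwards.

select distinct a, count(*) over (order by a) as c
from (values (1), (1), (2)) as t(a)
order by a;

 a | c
---+---
 1 | 2
 2 | 3
(2 rows)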

One idea to implement this would be to adjust the loop in
select_active_windows() so that we record any WindowClauses which have
the pathkeys contained in the ORDER BY / DISTINCT pathkeys then record
those separately and append those onto the end of the actives array
after the sort.
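
For illustration, a query of the shape this idea targets (reusing the abcd
table defined elsewhere in the thread; no EXPLAIN output is shown, since the
plan shape is exactly what the proposed change would affect):

select a, b,
       count(*)     over (order by c)    as w1,
       row_number() over (order by a, b) as w2
from abcd
order by a, b, c;

Here the (ORDER BY a, b) window clause would be appended to the end of the
actives array, so the final ORDER BY a, b, c could be satisfied with an
incremental sort on c rather than a full sort.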

I do think you'll likely want to put any WindowClauses which have
pathkeys which are a true subset or true superset of the ORDER BY /
DISTINCT pathkeys last.  If they're a superset then we won't need to
perform any additional ordering for the DISTINCT / ORDER BY clause.
If they're a subset then we might be able to perform an Incremental
Sort, which is likely much cheaper than a full sort.  The existing
code should handle that part. You just need to make
select_active_windows() more intelligent.

You might also think that we could perform additional optimisations
and also adjust the ORDER BY clause of a WindowClause if it contains
the pathkeys of the DISTINCT / ORDER BY clause. For example:

SELECT *,row_number() over (order by a,b) from tab order by a,b,c;

However, if you were to adjust the WindowClauses ORDER BY to become
a,b,c then you could produce incorrect results for window functions
that change their result based on peer rows.

Note the difference in results from:

create table ab(a int, b int);
insert into ab select x,y from generate_series(1,5) x, generate_Series(1,5)y;

select a,b,count(*) over (order by a) from ab order by a,b;
select a,b,count(*) over (order by a,b) from ab order by a,b;
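
For concreteness, here is what those two queries return for the 25-row table
above (only the first six rows shown, output trimmed); with the default
frame, all peer rows are counted together, so widening the window's ORDER BY
changes the results:

select a,b,count(*) over (order by a) from ab order by a,b;
 a | b | count
---+---+-------
 1 | 1 |     5
 1 | 2 |     5
 1 | 3 |     5
 1 | 4 |     5
 1 | 5 |     5
 2 | 1 |    10
 ...

select a,b,count(*) over (order by a,b) from ab order by a,b;
 a | b | count
---+---+-------
 1 | 1 |     1
 1 | 2 |     2
 1 | 3 |     3
 1 | 4 |     4
 1 | 5 |     5
 2 | 1 |     6
 ...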

> and Point #2
>
> Teach planner to decide which window to evaluate first based on costs.
> Currently the first window in the query is evaluated first, there may be no
> index to help sort the first window, but perhaps there are for other windows
> in the query. This may allow an index scan instead of a seqscan -> sort.

What Tom wrote about that in the first paragraph of [1] still applies.
The problem is that if the query contains many joins that to properly
find the cheapest way of executing the query we'd have to perform the
join search once for each unique sort order of each WindowClause.
That's just not practical to do from a performance standpoint. The
join search can be very expensive. There may be something that could
be done to better determine the most likely candidate for the first
WindowClause using some heuristics, but I've no idea what those would
be. You should look into point #1 first. Point #2 is significantly
more difficult to solve in a way that would be acceptable to the
project.

David

[1] https://www.postgresql.org/message-id/11535.1230501658%40sss.pgh.pa.us




Todo: Teach planner to evaluate multiple windows in the optimal order

2022-12-25 Thread Ankit Kumar Pandey

Hi,

While looking at one of the todo item in Window function, namely:

Teach planner to evaluate multiple windows in the optimal order
Currently windows are always evaluated in the query-specified order.


From the threads, the relevant points are:

Point #1


In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3
sorts. We also perform 3.

and Point #2

Teach planner to decide which window to evaluate first based on costs.
Currently the first window in the query is evaluated first, there may be no
index to help sort the first window, but perhaps there are for other windows
in the query. This may allow an index scan instead of a seqscan -> sort.

Repro:

select pg_catalog.version();

                                               version
------------------------------------------------------------------------------------------------------
 PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0, 64-bit
(1 row)

create table empsalary(depname text, empno int, salary int);
insert into empsalary select substr(md5(random()::text), 0, 25),
generate_series(1,10), generate_series(1,12000);


explain SELECT depname, SUM(salary) OVER (ORDER BY salary), SUM(salary) 
OVER (ORDER BY empno) FROM empsalary ORDER BY salary;


                                    QUERY PLAN
-----------------------------------------------------------------------------------
 WindowAgg  (cost=289.47..324.48 rows=2001 width=49)
   ->  Sort  (cost=289.47..294.47 rows=2001 width=41)
         Sort Key: salary
         ->  WindowAgg  (cost=144.73..179.75 rows=2001 width=41)
               ->  Sort  (cost=144.73..149.73 rows=2001 width=33)
                     Sort Key: empno
                     ->  Seq Scan on empsalary  (cost=0.00..35.01 rows=2001 width=33)
(7 rows)

As can be seen, for case #1 the issue looks resolved and only 2 sorts
are performed.


For #2, an index is created and the order of the window clauses is swapped:

create index idx_emp on empsalary (empno);
explain SELECT depname, SUM(salary) OVER (ORDER BY salary), SUM(salary) 
OVER (ORDER BY empno) FROM empsalary ORDER BY salary;

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 WindowAgg  (cost=204.03..239.04 rows=2001 width=49)
   ->  Sort  (cost=204.03..209.03 rows=2001 width=41)
         Sort Key: salary
         ->  WindowAgg  (cost=0.28..94.31 rows=2001 width=41)
               ->  Index Scan using idx_emp on empsalary  (cost=0.28..64.29 rows=2001 width=33)
(5 rows)

explain SELECT depname, SUM(salary) OVER (ORDER BY empno), SUM(salary) 
OVER (ORDER BY salary) FROM empsalary ORDER BY salary;

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 WindowAgg  (cost=204.03..239.04 rows=2001 width=49)
   ->  Sort  (cost=204.03..209.03 rows=2001 width=41)
         Sort Key: salary
         ->  WindowAgg  (cost=0.28..94.31 rows=2001 width=41)
               ->  Index Scan using idx_emp on empsalary  (cost=0.28..64.29 rows=2001 width=33)
(5 rows)

In both cases, an index scan is performed, which means this issue is
resolved as well.


Is this todo still relevant?

Further down the threads:


I do think the patch has probably left some low-hanging fruit on the
simpler end of the difficulty spectrum, namely when the window stuff
requires only one ordering that could be done either explicitly or
by an indexscan. That choice should ideally be done with a proper
cost comparison taking any LIMIT into account. I think right now
the LIMIT might not be accounted for, or might be considered even
when it shouldn't be because another sort is needed anyway.


I am not sure I understand this fully, but does it mean the proposed todo
(to use indexes) should be refined to: teach the planner to take the cost
model into account when deciding whether or not to use an index for window
functions? Meaning it should not always go with the index route (i.e.,
modify point #2)?


--
Regards,
Ankit Kumar Pandey