Re: Expand applicability of aggregate's sortop optimization

2024-10-01 Thread Andrei Lepikhov

On 18/7/2024 00:03, David Rowley wrote:

On Wed, 17 Jul 2024 at 17:12, Andrei Lepikhov  wrote:

Also, I don't clearly understand the case you mentioned here - does it
mean that you want to nullify orders for other aggregate types if they
are the same as the incoming order?


No, I mean unconditionally nullify Aggref->aggorder and
Aggref->aggdistinct for aggregate functions where ORDER BY / DISTINCT
in the Aggref makes no difference to the result. I think that's ok for
max() and min() for everything besides NUMERIC.  For aggorder, we'd
have to *not* optimise sum() and avg() for floating point types as
that could change the result. sum() and avg() for INT2, INT4 and INT8
seems fine. I'd need to check, but I think sum(numeric) is ok too as
the dscale should end up the same regardless of the order. Obviously,
aggdistinct can't be changed for sum() and avg() on any type.
I have written a sketch patch to implement the idea with aggregate 
prosupport in code - see the attachment.

My intention was to provide an example, not a ready-to-commit patch.
As I see, the only problem there lies in the tests - CREATE AGGREGATE 
doesn't allow us to set a prosupport routine, and we need to update the 
pg_proc table manually.


--
regards, Andrei Lepikhov
From b272413e64dd45d0b8cff11a18a16bdcc2146cab Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Tue, 1 Oct 2024 16:08:11 +0700
Subject: [PATCH] Introduce prosupport helpers for aggregates.

For example, add minmax_support routine to simplify min/max aggregates.
---
 src/backend/optimizer/plan/planagg.c |  73 +
 src/backend/optimizer/prep/prepagg.c |  26 +
 src/include/catalog/pg_proc.dat  |  93 
 src/include/nodes/supportnodes.h |   7 ++
 src/test/regress/expected/aggregates.out | 128 ++-
 src/test/regress/sql/aggregates.sql  |  55 ++
 src/tools/pgindent/typedefs.list |   1 +
 7 files changed, 336 insertions(+), 47 deletions(-)

diff --git a/src/backend/optimizer/plan/planagg.c 
b/src/backend/optimizer/plan/planagg.c
index afb5445b77..93131fa022 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -33,6 +33,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
@@ -43,6 +44,7 @@
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
@@ -510,3 +512,74 @@ fetch_agg_sort_op(Oid aggfnoid)
 
return aggsortop;
 }
+
+Datum
+minmax_support(PG_FUNCTION_ARGS)
+{
+   Node   *rawreq = (Node *) PG_GETARG_POINTER(0);
+   Oid aggsortop;
+
+   if (IsA(rawreq, SupportRequestMinMax))
+   {
+   SupportRequestMinMax   *req = (SupportRequestMinMax *) rawreq;
+   Aggref *aggref = req->aggref;
+
+   if (list_length(aggref->args) != 1 || aggref->aggfilter != NULL)
+   PG_RETURN_POINTER(NULL);
+
+   aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+   if (!OidIsValid(aggsortop))
+   PG_RETURN_POINTER(NULL);/* not a 
MIN/MAX aggregate */
+
+   if (aggref->aggorder != NIL)
+   {
+   SortGroupClause*orderClause;
+   TargetEntry*curTarget;
+
+   curTarget = (TargetEntry *) linitial(aggref->args);
+
+   /*
+* If the order clause is the same column as the one 
we're
+* aggregating, we can still use the index: It is 
undefined which
+* value is MIN() or MAX(), as well as which value is 
first or
+* last when sorted. So, we can still use the index IFF 
the
+* aggregated expression equals the expression used in 
the
+* ordering operation.
+*/
+
+   /*
+* We only accept a single argument to min/max 
aggregates,
+* orderings that have more clauses won't provide 
correct results.
+*/
+   Assert(list_length(aggref->aggorder) == 1);
+
+   orderClause = castNode(SortGroupClause, 
linitial(aggref->aggorder));
+
+   Assert(orderClause->tleSortGroupRef == 
curTarget->ressortgroupref);
+
+   if (orderClause->sortop != aggsortop)
+   {
+   List   *btclasses;
+   ListCell   *lc;
+
+   btclasses = 
get_op_btree_interpretation(orderClause

Re: Expand applicability of aggregate's sortop optimization

2024-09-02 Thread Andrei Lepikhov

On 18/7/2024 14:49, Matthias van de Meent wrote:

Aside: Arguably, checking for commutator operators would not be
incorrect when looking at it from "applied operators" point of view,
but if that commutative operator isn't registered as opposite ordering
of the same btree opclass, then we'd probably break some assumptions
of some aggregate's sortop - it could be registered with another
opclass, and that could cause us to select a different btree opclass
(thus: ordering) than is indicated to be required by the aggregate;
the thing we're trying to protect against here.

Hi,
This thread stands idle. At the same time, the general idea of this 
patch and the idea of utilising prosupport functions look promising. Are 
you going to develop this feature further?


--
regards, Andrei Lepikhov





Re: Expand applicability of aggregate's sortop optimization

2024-07-18 Thread Andrei Lepikhov

On 18/7/2024 19:49, Matthias van de Meent wrote:

On Wed, 17 Jul 2024 at 16:09, Andrei Lepikhov  wrote:


On 17/7/2024 16:33, Matthias van de Meent wrote:

On Wed, 17 Jul 2024 at 05:29, Andrei Lepikhov  wrote:

Because the @<@ and @>@ operators are not registered as commutative,
it couldn't apply the optimization in your patch, while the btree
operator check does allow it to apply the optimization.

Ok, I got it.
And next issue: I think it would be better to save cycles than to free 
some piece of memory, so why not to break the foreach cycle if you 
already matched the opfamily?

Also, in the patch attached I added your smoothed test to the aggregates.sql


Aside: Arguably, checking for commutator operators would not be
incorrect when looking at it from "applied operators" point of view,
but if that commutative operator isn't registered as opposite ordering
of the same btree opclass, then we'd probably break some assumptions
of some aggregate's sortop - it could be registered with another
opclass, and that could cause us to select a different btree opclass
(thus: ordering) than is indicated to be required by the aggregate;
the thing we're trying to protect against hereYes, I also think if someone doesn't register < as a commutator to >, it 
may mean they do it intentionally: may be it is a bit different 
sortings? - this subject is too far from my experience and I can agree 
with your approach.


--
regards, Andrei Lepikhov
diff --git a/src/backend/optimizer/plan/planagg.c 
b/src/backend/optimizer/plan/planagg.c
index afb5445b77..9b1d6f4e43 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -253,6 +253,20 @@ can_minmax_aggs(PlannerInfo *root, List **context)
if (list_length(aggref->args) != 1)
return false;   /* it couldn't be MIN/MAX */
 
+   /*
+* We might implement the optimization when a FILTER clause is 
present
+* by adding the filter to the quals of the generated subquery. 
 For
+* now, just punt.
+*/
+   if (aggref->aggfilter != NULL)
+   return false;
+
+   curTarget = (TargetEntry *) linitial(aggref->args);
+
+   aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+   if (!OidIsValid(aggsortop))
+   return false;   /* not a MIN/MAX aggregate */
+
/*
 * ORDER BY is usually irrelevant for MIN/MAX, but it can 
change the
 * outcome if the aggsortop's operator class recognizes 
non-identical
@@ -267,22 +281,55 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 * quickly.
 */
if (aggref->aggorder != NIL)
-   return false;
-   /* note: we do not care if DISTINCT is mentioned ... */
-
-   /*
-* We might implement the optimization when a FILTER clause is 
present
-* by adding the filter to the quals of the generated subquery. 
 For
-* now, just punt.
-*/
-   if (aggref->aggfilter != NULL)
-   return false;
+   {
+   SortGroupClause *orderClause;
+
+   /*
+* If the order clause is the same column as the one 
we're
+* aggregating, we can still use the index: It is 
undefined which
+* value is MIN() or MAX(), as well as which value is 
first or
+* last when sorted. So, we can still use the index IFF 
the
+* aggregated expression equals the expression used in 
the
+* ordering operation.
+*/
+
+   /*
+* We only accept a single argument to min/max 
aggregates,
+* orderings that have more clauses won't provide 
correct results.
+*/
+   Assert(list_length(aggref->aggorder) == 1);
+
+   orderClause = castNode(SortGroupClause, 
linitial(aggref->aggorder));
+
+   Assert(orderClause->tleSortGroupRef == 
curTarget->ressortgroupref);
+
+   if (orderClause->sortop != aggsortop)
+   {
+   List   *btclasses;
+   ListCell   *cell;
+   boolmatch = false;
+
+   btclasses = 
get_op_btree_interpretation(orderClause->sortop);
+
+   foreach(cell, btclasses)
+   {
+   OpBtreeInterpretation *interpretation;
+   interpretation = (OpBtreeInterpretation 
*) lfirst(cell);
+
+ 

Re: Expand applicability of aggregate's sortop optimization

2024-07-18 Thread Matthias van de Meent
On Wed, 17 Jul 2024 at 16:09, Andrei Lepikhov  wrote:
>
> On 17/7/2024 16:33, Matthias van de Meent wrote:
> > On Wed, 17 Jul 2024 at 05:29, Andrei Lepikhov  wrote:
> >> Thanks for the job! I guess you could be more brave and push down also
> >> FILTER statement.
> >
> > While probably not impossible, I wasn't planning on changing this code
> > with new optimizations; just expanding the applicability of the
> > current optimizations.
> Got it>> As I see, the code:
> >> aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
> >> if (!OidIsValid(aggsortop))
> >> return false;/* not a MIN/MAX aggregate */
> >>
> >> used twice and can be evaluated earlier to avoid duplicated code.
> >
> > The code is structured like this to make sure we only start accessing
> > catalogs once we know that all other reasons to bail out from this
> > optimization indicate we can apply the opimization. You'll notice that
> > I've tried to put the cheapest checks that only use caller-supplied
> > information first, and catalog accesses only after that.
> >
> > If the fetch_agg_sort_op clause would be deduplicated, it would either
> > increase code complexity to handle both aggref->aggorder paths, or it
> > would increase the cost of planning MAX(a ORDER BY b) because of the
> > newly added catalog access.
> IMO it looks like a micro optimisation. But I agree, it is more about
> code style - let the committer decide what is better.>> Also, I'm unsure
> about the necessity of looking through the btree
> >> classes. Maybe just to check the commutator to the sortop, like in the
> >> diff attached? Or could you provide an example to support your approach?
> >
> > I think it could work, but I'd be hesitant to rely on that, as
> > commutator registration is optional (useful, but never required for
> > btree operator classes' operators). Looking at the btree operator
> > class, which is the definition of sortability in PostgreSQL, seems
> > more suitable and correct.
> Hm, I dubious about that. Can you provide an example which my variant
> will not pass but your does that correctly?

Here is one:

"""
CREATE OPERATOR @@> (
  function=int4gt, leftarg=int4, rightarg=int4
); CREATE OPERATOR @@>= (
  function=int4ge, leftarg=int4, rightarg=int4
); CREATE OPERATOR @@= (
  function=int4eq, leftarg=int4, rightarg=int4
); CREATE OPERATOR @@<= (
  function=int4le, leftarg=int4, rightarg=int4
); CREATE OPERATOR @@< (
  function=int4lt, leftarg=int4, rightarg=int4
);

CREATE OPERATOR CLASS my_int_ops
  FOR TYPE int
  USING btree AS
  OPERATOR 1 @<@,
  OPERATOR 2 @<=@,
  OPERATOR 3 @=@,
  OPERATOR 4 @>=@,
  OPERATOR 5 @>@,
  FUNCTION 1 btint4cmp;

CREATE AGGREGATE my_int_max (
  BASETYPE = int4,
  SFUNC = int4larger,
  STYPE = int4,
  SORTOP = @>@
);

CREATE TABLE my_table AS
SELECT id::int4 FROM generate_series(1, 1) id;

CREATE INDEX ON my_table (id my_int_ops);

SELECT my_int_max(id ORDER BY id USING @<@ ) from my_table;
"""

Because the @<@ and @>@ operators are not registered as commutative,
it couldn't apply the optimization in your patch, while the btree
operator check does allow it to apply the optimization.

Aside: Arguably, checking for commutator operators would not be
incorrect when looking at it from "applied operators" point of view,
but if that commutative operator isn't registered as opposite ordering
of the same btree opclass, then we'd probably break some assumptions
of some aggregate's sortop - it could be registered with another
opclass, and that could cause us to select a different btree opclass
(thus: ordering) than is indicated to be required by the aggregate;
the thing we're trying to protect against here.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Expand applicability of aggregate's sortop optimization

2024-07-17 Thread Andrei Lepikhov

On 17/7/2024 16:33, Matthias van de Meent wrote:

On Wed, 17 Jul 2024 at 05:29, Andrei Lepikhov  wrote:

As I see, the code:
aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
if (!OidIsValid(aggsortop))
return false;/* not a MIN/MAX aggregate */

used twice and can be evaluated earlier to avoid duplicated code.


The code is structured like this to make sure we only start accessing
catalogs once we know that all other reasons to bail out from this
optimization indicate we can apply the opimization. You'll notice that
I've tried to put the cheapest checks that only use caller-supplied
information first, and catalog accesses only after that.
After additional research I think I get the key misunderstanding why you 
did so:

As I see, the checks:
if (list_length(aggref->aggorder) > 1)
  return false;
if (orderClause->tleSortGroupRef != curTarget->ressortgroupref)
  return false;

not needed at all. You already have check:
if (list_length(aggref->args) != 1)
and this tells us, that if we have ordering like MIN(x ORDER BY ), 
this  ordering contains only aggregate argument x. Because if it 
contained some expression, the transformAggregateCall() would add this 
expression to agg->args by calling the transformSortClause() routine.
The tleSortGroupRef is just exactly ressortgroupref - no need to recheck 
it one more time. Of course, it is suitable only for MIN/MAX aggregates, 
but we discuss only them right now. Am I wrong?

If you want, you can place it as assertions (see the diff in attachment).

--
regards, Andrei Lepikhov
diff --git a/src/backend/optimizer/plan/planagg.c 
b/src/backend/optimizer/plan/planagg.c
index afb5445b77..e0cbe5c923 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -253,6 +253,20 @@ can_minmax_aggs(PlannerInfo *root, List **context)
if (list_length(aggref->args) != 1)
return false;   /* it couldn't be MIN/MAX */
 
+   /*
+* We might implement the optimization when a FILTER clause is 
present
+* by adding the filter to the quals of the generated subquery. 
 For
+* now, just punt.
+*/
+   if (aggref->aggfilter != NULL)
+   return false;
+
+   curTarget = (TargetEntry *) linitial(aggref->args);
+
+   aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+   if (!OidIsValid(aggsortop))
+   return false;   /* not a MIN/MAX aggregate */
+
/*
 * ORDER BY is usually irrelevant for MIN/MAX, but it can 
change the
 * outcome if the aggsortop's operator class recognizes 
non-identical
@@ -267,22 +281,35 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 * quickly.
 */
if (aggref->aggorder != NIL)
-   return false;
-   /* note: we do not care if DISTINCT is mentioned ... */
+   {
+   SortGroupClause *orderClause;
 
-   /*
-* We might implement the optimization when a FILTER clause is 
present
-* by adding the filter to the quals of the generated subquery. 
 For
-* now, just punt.
-*/
-   if (aggref->aggfilter != NULL)
-   return false;
+   /*
+* If the order clause is the same column as the one 
we're
+* aggregating, we can still use the index: It is 
undefined which
+* value is MIN() or MAX(), as well as which value is 
first or
+* last when sorted. So, we can still use the index IFF 
the
+* aggregated expression equals the expression used in 
the
+* ordering operation.
+*/
 
-   aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
-   if (!OidIsValid(aggsortop))
-   return false;   /* not a MIN/MAX aggregate */
+   /*
+* We only accept a single argument to min/max 
aggregates,
+* orderings that have more clauses won't provide 
correct results.
+*/
+   Assert(list_length(aggref->aggorder) == 1);
+
+   orderClause = castNode(SortGroupClause, 
linitial(aggref->aggorder));
+
+   Assert(orderClause->tleSortGroupRef == 
curTarget->ressortgroupref);
+
+   if (orderClause->sortop != aggsortop &&
+   orderClause->sortop != 
get_commutator(aggsortop))
+   return false;
+   }
+
+   /* note: we do not care if DISTINCT is mentioned ... */
 
-   curTarget = (TargetEntry *) linit

Re: Expand applicability of aggregate's sortop optimization

2024-07-17 Thread David Rowley
On Wed, 17 Jul 2024 at 17:12, Andrei Lepikhov  wrote:
> I generally like the idea of a support function. But as I can see,  the
> can_minmax_aggs() rejects if any of the aggregates don't pass the
> checks. The prosupport feature is designed to be applied to each
> function separately. How do you think to avoid it?

You wouldn't avoid it.  The prosupport function would be called once
for each Aggref in the query. Why is that a problem?

> Also, I don't clearly understand the case you mentioned here - does it
> mean that you want to nullify orders for other aggregate types if they
> are the same as the incoming order?

No, I mean unconditionally nullify Aggref->aggorder and
Aggref->aggdistinct for aggregate functions where ORDER BY / DISTINCT
in the Aggref makes no difference to the result. I think that's ok for
max() and min() for everything besides NUMERIC.  For aggorder, we'd
have to *not* optimise sum() and avg() for floating point types as
that could change the result. sum() and avg() for INT2, INT4 and INT8
seems fine. I'd need to check, but I think sum(numeric) is ok too as
the dscale should end up the same regardless of the order. Obviously,
aggdistinct can't be changed for sum() and avg() on any type.

It seems also possible to adjust count(non-nullable-var) into
count(*), which, if done early enough in planning could help
significantly by both reducing evaluation during execution, but also
possibly reduce tuple deformation if that Var has a higher varattno
than anything else in the relation. That would require checking
varnullingrels is empty and the respective RelOptInfo's notnullattnums
mentions the Var.

David




Re: Expand applicability of aggregate's sortop optimization

2024-07-17 Thread Andrei Lepikhov

On 17/7/2024 16:33, Matthias van de Meent wrote:

On Wed, 17 Jul 2024 at 05:29, Andrei Lepikhov  wrote:

Thanks for the job! I guess you could be more brave and push down also
FILTER statement.


While probably not impossible, I wasn't planning on changing this code
with new optimizations; just expanding the applicability of the
current optimizations.

Got it>> As I see, the code:

aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
if (!OidIsValid(aggsortop))
return false;/* not a MIN/MAX aggregate */

used twice and can be evaluated earlier to avoid duplicated code.


The code is structured like this to make sure we only start accessing
catalogs once we know that all other reasons to bail out from this
optimization indicate we can apply the opimization. You'll notice that
I've tried to put the cheapest checks that only use caller-supplied
information first, and catalog accesses only after that.

If the fetch_agg_sort_op clause would be deduplicated, it would either
increase code complexity to handle both aggref->aggorder paths, or it
would increase the cost of planning MAX(a ORDER BY b) because of the
newly added catalog access.
IMO it looks like a micro optimisation. But I agree, it is more about 
code style - let the committer decide what is better.>> Also, I'm unsure 
about the necessity of looking through the btree

classes. Maybe just to check the commutator to the sortop, like in the
diff attached? Or could you provide an example to support your approach?


I think it could work, but I'd be hesitant to rely on that, as
commutator registration is optional (useful, but never required for
btree operator classes' operators). Looking at the btree operator
class, which is the definition of sortability in PostgreSQL, seems
more suitable and correct.
Hm, I dubious about that. Can you provide an example which my variant 
will not pass but your does that correctly?


--
regards, Andrei Lepikhov





Re: Expand applicability of aggregate's sortop optimization

2024-07-17 Thread Matthias van de Meent
On Wed, 17 Jul 2024 at 05:29, Andrei Lepikhov  wrote:
>
> On 5/8/24 17:13, Matthias van de Meent wrote:
> > As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
> > rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
> > 1; by using the optional sortop field in the aggregator.
> > However, this optimization is disabled for clauses that in itself have
> > an ORDER BY clause such as `MIN(unique1 ORDER BY ), because
> >  can cause reordering of distinguisable values like 1.0 and
> > 1.00, which then causes measurable differences in the output. In the
> > general case, that's a good reason to not apply this optimization, but
> > in some cases, we could still apply the index optimization.
>
> Thanks for the job! I guess you could be more brave and push down also
> FILTER statement.

While probably not impossible, I wasn't planning on changing this code
with new optimizations; just expanding the applicability of the
current optimizations.

Note that the "aggfilter" clause was not new, but moved up in the code
to make sure we use this local information to bail out (if applicable)
before trying to use the catalogs for bail-out information.

> >
> > One of those cases is fixed in the attached patch: if we order by the
> > same column that we're aggregating, using the same order class as the
> > aggregate's sort operator (i.e. the aggregate's sortop is in the same
> > btree opclass as the ORDER BY's sort operator), then we can still use
> > the index operation: The sort behaviour isn't changed, thus we can
> > apply the optimization.
> >
> > PFA the small patch that implements this.
> >
> > Note that we can't blindly accept just any ordering by the same
> > column: If we had an opclass that sorted numeric values by the length
> > of the significant/mantissa, then that'd provide a different (and
> > distinct) ordering from that which is expected by the normal
> > min()/max() aggregates for numeric, which could cause us to return
> > arguably incorrect results for the aggregate expression.
>
> As I see, the code:
> aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
> if (!OidIsValid(aggsortop))
>return false;/* not a MIN/MAX aggregate */
>
> used twice and can be evaluated earlier to avoid duplicated code.

The code is structured like this to make sure we only start accessing
catalogs once we know that all other reasons to bail out from this
optimization indicate we can apply the opimization. You'll notice that
I've tried to put the cheapest checks that only use caller-supplied
information first, and catalog accesses only after that.

If the fetch_agg_sort_op clause would be deduplicated, it would either
increase code complexity to handle both aggref->aggorder paths, or it
would increase the cost of planning MAX(a ORDER BY b) because of the
newly added catalog access.

> Also, I'm unsure about the necessity of looking through the btree
> classes. Maybe just to check the commutator to the sortop, like in the
> diff attached? Or could you provide an example to support your approach?

I think it could work, but I'd be hesitant to rely on that, as
commutator registration is optional (useful, but never required for
btree operator classes' operators). Looking at the btree operator
class, which is the definition of sortability in PostgreSQL, seems
more suitable and correct.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Expand applicability of aggregate's sortop optimization

2024-07-16 Thread Andrei Lepikhov

On 5/9/24 08:08, David Rowley wrote:

On Thu, 9 May 2024 at 12:26, David Rowley  wrote:

I wonder if we should also consider as an alternative to this to just
have an aggregate support function, similar to
SupportRequestOptimizeWindowClause that just nullifies the aggorder /
aggdistinct fields for Min/Max aggregates on types where there's no
possible difference in output when calling the transition function on
rows in a different order.

Would that apply in enough cases for you?


One additional thought is that the above method would also help
eliminate redundant sorting in queries with a GROUP BY clause.
Whereas, the can_minmax_aggs optimisation is not applied in that case.
I generally like the idea of a support function. But as I can see,  the 
can_minmax_aggs() rejects if any of the aggregates don't pass the 
checks. The prosupport feature is designed to be applied to each 
function separately. How do you think to avoid it?
Also, I don't clearly understand the case you mentioned here - does it 
mean that you want to nullify orders for other aggregate types if they 
are the same as the incoming order?


--
regards, Andrei Lepikhov





Re: Expand applicability of aggregate's sortop optimization

2024-07-16 Thread Andrei Lepikhov

On 5/8/24 17:13, Matthias van de Meent wrote:

As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
1; by using the optional sortop field in the aggregator.
However, this optimization is disabled for clauses that in itself have
an ORDER BY clause such as `MIN(unique1 ORDER BY ), because
 can cause reordering of distinguisable values like 1.0 and
1.00, which then causes measurable differences in the output. In the
general case, that's a good reason to not apply this optimization, but
in some cases, we could still apply the index optimization.
Thanks for the job! I guess you could be more brave and push down also 
FILTER statement.


One of those cases is fixed in the attached patch: if we order by the
same column that we're aggregating, using the same order class as the
aggregate's sort operator (i.e. the aggregate's sortop is in the same
btree opclass as the ORDER BY's sort operator), then we can still use
the index operation: The sort behaviour isn't changed, thus we can
apply the optimization.

PFA the small patch that implements this.

Note that we can't blindly accept just any ordering by the same
column: If we had an opclass that sorted numeric values by the length
of the significant/mantissa, then that'd provide a different (and
distinct) ordering from that which is expected by the normal
min()/max() aggregates for numeric, which could cause us to return
arguably incorrect results for the aggregate expression.

As I see, the code:
aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
if (!OidIsValid(aggsortop))
  return false; /* not a MIN/MAX aggregate */

used twice and can be evaluated earlier to avoid duplicated code.

Also, I'm unsure about the necessity of looking through the btree 
classes. Maybe just to check the commutator to the sortop, like in the 
diff attached? Or could you provide an example to support your approach?


--
regards, Andrei Lepikhov
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index afb5445b77..e9e95e2b2a 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -253,6 +253,20 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		if (list_length(aggref->args) != 1)
 			return false;		/* it couldn't be MIN/MAX */
 
+		/*
+		 * We might implement the optimization when a FILTER clause is present
+		 * by adding the filter to the quals of the generated subquery.  For
+		 * now, just punt.
+		 */
+		if (aggref->aggfilter != NULL)
+			return false;
+
+		curTarget = (TargetEntry *) linitial(aggref->args);
+
+		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+		if (!OidIsValid(aggsortop))
+			return false;		/* not a MIN/MAX aggregate */
+
 		/*
 		 * ORDER BY is usually irrelevant for MIN/MAX, but it can change the
 		 * outcome if the aggsortop's operator class recognizes non-identical
@@ -267,22 +281,37 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		 * quickly.
 		 */
 		if (aggref->aggorder != NIL)
-			return false;
-		/* note: we do not care if DISTINCT is mentioned ... */
-
-		/*
-		 * We might implement the optimization when a FILTER clause is present
-		 * by adding the filter to the quals of the generated subquery.  For
-		 * now, just punt.
-		 */
-		if (aggref->aggfilter != NULL)
-			return false;
+		{
+			SortGroupClause *orderClause;
+
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
+
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			if (list_length(aggref->aggorder) > 1)
+return false;
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			if (orderClause->tleSortGroupRef != curTarget->ressortgroupref)
+return false;
+
+			if (orderClause->sortop != aggsortop &&
+orderClause->sortop != get_commutator(aggsortop))
+return false;
+		}
 
-		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
-		if (!OidIsValid(aggsortop))
-			return false;		/* not a MIN/MAX aggregate */
+		/* note: we do not care if DISTINCT is mentioned ... */
 
-		curTarget = (TargetEntry *) linitial(aggref->args);
 
 		if (contain_mutable_functions((Node *) curTarget->expr))
 			return false;		/* not potentially indexable */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index a5596ab210..5d37510d64 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1003,6 +1003,71 @@ select max(unique1) from tenk1 where unique1 > 42;
  
 (1 row)
 
+-- When sorting on the colu

Re: Expand applicability of aggregate's sortop optimization

2024-05-08 Thread David Rowley
On Thu, 9 May 2024 at 13:08, David Rowley  wrote:
> One additional thought is that the above method would also help
> eliminate redundant sorting in queries with a GROUP BY clause.
> Whereas, the can_minmax_aggs optimisation is not applied in that case.

Another argument for using this method is that
SupportRequestOptimizeAggref could allow future unrelated
optimisations such as swapping count() for count(*).
Where  is a NOT NULL column and isn't nullable by
any outer join.  Doing that could speed some queries up quite a bit as
it may mean fewer columns to deform from the tuple. You could imagine
a fact table with many columns and a few dimensions, often the
dimension columns that you'd expect to use in GROUP BY would appear
before the columns you'd aggregate on.

David




Re: Expand applicability of aggregate's sortop optimization

2024-05-08 Thread David Rowley
On Thu, 9 May 2024 at 12:26, David Rowley  wrote:
> I wonder if we should also consider as an alternative to this to just
> have an aggregate support function, similar to
> SupportRequestOptimizeWindowClause that just nullifies the aggorder /
> aggdistinct fields for Min/Max aggregates on types where there's no
> possible difference in output when calling the transition function on
> rows in a different order.
>
> Would that apply in enough cases for you?

One additional thought is that the above method would also help
eliminate redundant sorting in queries with a GROUP BY clause.
Whereas, the can_minmax_aggs optimisation is not applied in that case.

David




Re: Expand applicability of aggregate's sortop optimization

2024-05-08 Thread David Rowley
On Wed, 8 May 2024 at 22:13, Matthias van de Meent
 wrote:
> As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
> rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
> 1; by using the optional sortop field in the aggregator.
> However, this optimization is disabled for clauses that in itself have
> an ORDER BY clause such as `MIN(unique1 ORDER BY ), because
>  can cause reordering of distinguisable values like 1.0 and
> 1.00, which then causes measurable differences in the output. In the
> general case, that's a good reason to not apply this optimization, but
> in some cases, we could still apply the index optimization.

I wonder if we should also consider as an alternative to this to just
have an aggregate support function, similar to
SupportRequestOptimizeWindowClause that just nullifies the aggorder /
aggdistinct fields for Min/Max aggregates on types where there's no
possible difference in output when calling the transition function on
rows in a different order.

Would that apply in enough cases for you?

I think it would rule out Min(numeric) and Max(numeric). We were
careful not to affect the number of decimal places in the numeric
output when using the moving aggregate inverse transition
infrastructure for WindowFuncs, so I agree we should maintain an
ability to control the aggregate transition order for numeric. (See
do_numeric_discard's maxScale if check)

I don't think floating point types have the same issues here. At least
+1.0 is greater than -1.0.

Are there any strange collation rules that would cause issues if we
did this with the text types?

David




Re: Expand applicability of aggregate's sortop optimization

2024-05-08 Thread Dagfinn Ilmari Mannsåker
Matthias van de Meent  writes:

> PFA the small patch that implements this.

I don't have enough knowledge to have an opinion on most of the patch
other than it looks okay at a glance, but the list API usage could be
updated to more modern variants:

> diff --git a/src/backend/optimizer/plan/planagg.c 
> b/src/backend/optimizer/plan/planagg.c
> index afb5445b77..d8479fe286 100644
> --- a/src/backend/optimizer/plan/planagg.c
> +++ b/src/backend/optimizer/plan/planagg.c
> @@ -253,6 +253,16 @@ can_minmax_aggs(PlannerInfo *root, List **context)
>   if (list_length(aggref->args) != 1)
>   return false;   /* it couldn't be MIN/MAX */
>  
> + /*
> +  * We might implement the optimization when a FILTER clause is 
> present
> +  * by adding the filter to the quals of the generated subquery. 
>  For
> +  * now, just punt.
> +  */
> + if (aggref->aggfilter != NULL)
> + return false;
> +
> + curTarget = (TargetEntry *) linitial(aggref->args);

This could be linitial_node(TargetEntry, aggref->args).

> + if (list_length(aggref->aggorder) > 1)
> + return false;
> +
> + orderClause = castNode(SortGroupClause, 
> linitial(aggref->aggorder));

This could be linitial_node(SortGroupClause, aggref->aggorder).

> + if (orderClause->sortop != aggsortop)
> + {
> + List   *btclasses;
> + ListCell *cell;
> + boolmatch = false;
> +
> + btclasses = 
> get_op_btree_interpretation(orderClause->sortop);
> +
> + foreach(cell, btclasses)
> + {
> + OpBtreeInterpretation *interpretation;
> + interpretation = (OpBtreeInterpretation 
> *) lfirst(cell);

This could be foreach_ptr(OpBtreeInterpretation, interpretation, btclasses),
which also eliminates the need for the explicit `ListCell *` variable
and lfirst() call.

- ilmari




Expand applicability of aggregate's sortop optimization

2024-05-08 Thread Matthias van de Meent
Hi,

As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
1; by using the optional sortop field in the aggregator.
However, this optimization is disabled for clauses that in itself have
an ORDER BY clause such as `MIN(unique1 ORDER BY ), because
 can cause reordering of distinguisable values like 1.0 and
1.00, which then causes measurable differences in the output. In the
general case, that's a good reason to not apply this optimization, but
in some cases, we could still apply the index optimization.

One of those cases is fixed in the attached patch: if we order by the
same column that we're aggregating, using the same order class as the
aggregate's sort operator (i.e. the aggregate's sortop is in the same
btree opclass as the ORDER BY's sort operator), then we can still use
the index operation: The sort behaviour isn't changed, thus we can
apply the optimization.

PFA the small patch that implements this.

Note that we can't blindly accept just any ordering by the same
column: If we had an opclass that sorted numeric values by the length
of the significant/mantissa, then that'd provide a different (and
distinct) ordering from that which is expected by the normal
min()/max() aggregates for numeric, which could cause us to return
arguably incorrect results for the aggregate expression.

Alternatively, the current code could be changed to build indexed
paths that append the SORT BY paths to the aggregate's sort operator,
but that'd significantly increase complexity here.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
From 215600d12164f214aae8f0de94b16373bd202d69 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Thu, 25 Apr 2024 15:23:57 +0200
Subject: [PATCH v1] Plan min()/max()-like aggregates with index accesses in
 more cases

When the aggregate's operator is included in the ORDER BY's ordering
opclass, we know they have a common ordering, and thus should get the
same output (or at least same consistency guarantee for that output).

So, check if the ORDER BY orders things by only the aggregated
expression, and check if that ordering shares a btree opclass with
the aggregator's operator. If so, we can use the index.
---
 src/backend/optimizer/plan/planagg.c | 87 
 src/test/regress/expected/aggregates.out | 65 ++
 src/test/regress/sql/aggregates.sql  | 18 +
 3 files changed, 156 insertions(+), 14 deletions(-)

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index afb5445b77..d8479fe286 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -253,6 +253,16 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		if (list_length(aggref->args) != 1)
 			return false;		/* it couldn't be MIN/MAX */
 
+		/*
+		 * We might implement the optimization when a FILTER clause is present
+		 * by adding the filter to the quals of the generated subquery.  For
+		 * now, just punt.
+		 */
+		if (aggref->aggfilter != NULL)
+			return false;
+
+		curTarget = (TargetEntry *) linitial(aggref->args);
+
 		/*
 		 * ORDER BY is usually irrelevant for MIN/MAX, but it can change the
 		 * outcome if the aggsortop's operator class recognizes non-identical
@@ -267,22 +277,71 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		 * quickly.
 		 */
 		if (aggref->aggorder != NIL)
-			return false;
+		{
+			SortGroupClause *orderClause;
+
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
+
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			if (list_length(aggref->aggorder) > 1)
+return false;
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			if (orderClause->tleSortGroupRef != curTarget->ressortgroupref)
+return false;
+
+			aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+			if (!OidIsValid(aggsortop))
+return false;		/* not a MIN/MAX aggregate */
+
+			if (orderClause->sortop != aggsortop)
+			{
+List   *btclasses;
+ListCell *cell;
+bool	match = false;
+
+btclasses = get_op_btree_interpretation(orderClause->sortop);
+
+foreach(cell, btclasses)
+{
+	OpBtreeInterpretation *interpretation;
+	interpretation = (OpBtreeInterpretation *) lfirst(cell);
+
+	if (!match)
+	{
+		if (op_in_opfamily(aggsortop, interpretation->opfamily_id))
+			match = true;
+	}
+	/* Now useless */
+	pfree(interpretation);
+}
+
+/* Now not useful anymore */
+pfree(btclasses);
+
+if (!match)