subject:"POC\: GROUP BY optimization"

Re: POC: GROUP BY optimization

2024-06-04 Thread Alexander Korotkov

On Tue, Jun 4, 2024 at 1:47 AM Alexander Korotkov  wrote:
> On Mon, Jun 3, 2024 at 6:55 PM Pavel Borisov  wrote:
> > On Sun, 2 Jun 2024 at 17:18, Alexander Korotkov  
> > wrote:
> >> On Sun, Jun 2, 2024 at 10:55 AM jian he  
> >> wrote:
> >> >
> >> > On Fri, May 31, 2024 at 8:12 AM Alexander Korotkov 
> >> >  wrote:
> >> > >
> >> > > I've revised some grammar including the sentence you've proposed.
> >> > >
> >> >
> >> > -static List *groupclause_apply_groupingset(PlannerInfo *root, List 
> >> > *force);
> >> > +static List *preprocess_groupclause(PlannerInfo *root, List *force);
> >> >
> >> > changing preprocess_groupclause the second argument
> >> > from "force" to "gset" would be more intuitive, I think.
> >>
> >> Probably, but my intention is to restore preprocess_groupclause() as
> >> it was before 0452b461bc with minimal edits to support incremental
> >> sort.  I'd rather avoid refactoring if this area for now.
> >>
> >> > `elog(ERROR, "Order of group-by clauses doesn't correspond incoming
> >> > sort order");`
> >> >
> >> > I think this error message makes people wonder what "incoming sort 
> >> > order" is.
> >> > BTW, "correspond", generally people use  "correspond to".
> >>
> >> Thank you.  On the second thought, I think it would be better to turn
> >> this into an assertion like the checks before.
> >>
> >> > I did some minor cosmetic changes, mainly changing foreach to 
> >> > foreach_node.
> >> > Please check the attachment.
> >>
> >> I would avoid refactoring of preprocess_groupclause() for the reason
> >> described above.  But I picked the grammar fix for PlannerInfo's
> >> comment.
> >
> >
> > Thank you for working on this patchset.
> >
> > 0001: Patch looks right
> >
> > Comments:
> >
> > Covert -> Convert
> > sets the uninitialized value of ec_sortref of the pathkey's 
> > EquivalenceClass -> sets the value of the pathkey's EquivalenceClass unless 
> > it's already initialized
> > wasn't set yet -> hasn't been set yet
> >
> > 0002: additional assert checking only. Looks right.
> >
> > 0003: pure renaming, looks good.
> >
> > 0004: Restores pre 0452b461bc state to preprocess_groupclause with removed 
> > partial_match fallback. Looks right. I haven't checked the comments 
> > provided they are restored from pre 0452b461bc state.
> >
> > 0005: Looks right.
>
> Thank you.  Revised according to your comments.  I think 0001-0004
> comprising a good refactoring addressing concerns from Tom [1].
> 0001 Removes from get_eclass_for_sort_expr() unnatural responsibility
> of setting EquivalenceClass.ec_sortref.  Instead this field is set in
> make_pathkeys_for_sortclauses_extended() while processing of group
> clauses.
> 0002 Provides additional asserts.  That should helping to defend
> against lurking bugs.
> 0003 Fixes unsuitable name of data structure.
> 0004 Restores pre 0452b461bc state to preprocess_groupclause() and
> lowers the number of paths to consider.
> It would be good to explicitly hear Tom about this.  But I'm quite
> sure these patches makes situation better not worse.  I'm going to
> push them if no objections.
>
> I'm putting 0005 aside.  It's unclear why and how there could be
> duplicate SortGroupClauses at that stage.  Also, it's unclear why
> existing code handles them bad.  So, this should wait a comment from
> Tom.
>
> Links.
> 1. https://www.postgresql.org/message-id/266850.1712879082%40sss.pgh.pa.us

Ops... I found I've published an old version of patchset.  The
relevant version is attached.

--
Regards,
Alexander Korotkov
Supabase


v6-0002-Add-invariants-check-to-get_useful_group_keys_ord.patch
Description: Binary data


v6-0004-Restore-preprocess_groupclause.patch
Description: Binary data


v6-0003-Rename-PathKeyInfo-to-GroupByOrdering.patch
Description: Binary data


v6-0005-Teach-group_keys_reorder_by_pathkeys-about-redund.patch
Description: Binary data


v6-0001-Fix-asymmetry-in-setting-EquivalenceClass.ec_sort.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-06-03 Thread Alexander Korotkov

On Mon, Jun 3, 2024 at 6:55 PM Pavel Borisov  wrote:
> On Sun, 2 Jun 2024 at 17:18, Alexander Korotkov  wrote:
>>
>> Hi!
>>
>> On Sun, Jun 2, 2024 at 10:55 AM jian he  wrote:
>> >
>> > On Fri, May 31, 2024 at 8:12 AM Alexander Korotkov  
>> > wrote:
>> > >
>> > > I've revised some grammar including the sentence you've proposed.
>> > >
>> >
>> > -static List *groupclause_apply_groupingset(PlannerInfo *root, List 
>> > *force);
>> > +static List *preprocess_groupclause(PlannerInfo *root, List *force);
>> >
>> > changing preprocess_groupclause the second argument
>> > from "force" to "gset" would be more intuitive, I think.
>>
>> Probably, but my intention is to restore preprocess_groupclause() as
>> it was before 0452b461bc with minimal edits to support incremental
>> sort.  I'd rather avoid refactoring if this area for now.
>>
>> > `elog(ERROR, "Order of group-by clauses doesn't correspond incoming
>> > sort order");`
>> >
>> > I think this error message makes people wonder what "incoming sort order" 
>> > is.
>> > BTW, "correspond", generally people use  "correspond to".
>>
>> Thank you.  On the second thought, I think it would be better to turn
>> this into an assertion like the checks before.
>>
>> > I did some minor cosmetic changes, mainly changing foreach to foreach_node.
>> > Please check the attachment.
>>
>> I would avoid refactoring of preprocess_groupclause() for the reason
>> described above.  But I picked the grammar fix for PlannerInfo's
>> comment.
>
>
> Thank you for working on this patchset.
>
> 0001: Patch looks right
>
> Comments:
>
> Covert -> Convert
> sets the uninitialized value of ec_sortref of the pathkey's EquivalenceClass 
> -> sets the value of the pathkey's EquivalenceClass unless it's already 
> initialized
> wasn't set yet -> hasn't been set yet
>
> 0002: additional assert checking only. Looks right.
>
> 0003: pure renaming, looks good.
>
> 0004: Restores pre 0452b461bc state to preprocess_groupclause with removed 
> partial_match fallback. Looks right. I haven't checked the comments provided 
> they are restored from pre 0452b461bc state.
>
> 0005: Looks right.

Thank you.  Revised according to your comments.  I think 0001-0004
comprising a good refactoring addressing concerns from Tom [1].
0001 Removes from get_eclass_for_sort_expr() unnatural responsibility
of setting EquivalenceClass.ec_sortref.  Instead this field is set in
make_pathkeys_for_sortclauses_extended() while processing of group
clauses.
0002 Provides additional asserts.  That should helping to defend
against lurking bugs.
0003 Fixes unsuitable name of data structure.
0004 Restores pre 0452b461bc state to preprocess_groupclause() and
lowers the number of paths to consider.
It would be good to explicitly hear Tom about this.  But I'm quite
sure these patches makes situation better not worse.  I'm going to
push them if no objections.

I'm putting 0005 aside.  It's unclear why and how there could be
duplicate SortGroupClauses at that stage.  Also, it's unclear why
existing code handles them bad.  So, this should wait a comment from
Tom.

Links.
1. https://www.postgresql.org/message-id/266850.1712879082%40sss.pgh.pa.us

--
Regards,
Alexander Korotkov
Supabase


v5-0001-Fix-asymmetry-in-setting-EquivalenceClass.ec_sort.patch
Description: Binary data


v5-0002-Add-invariants-check-to-get_useful_group_keys_ord.patch
Description: Binary data


v5-0004-Restore-preprocess_groupclause.patch
Description: Binary data


v5-0003-Rename-PathKeyInfo-to-GroupByOrdering.patch
Description: Binary data


v5-0005-Teach-group_keys_reorder_by_pathkeys-about-redund.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-06-03 Thread Pavel Borisov

Hi, Alexander!

On Sun, 2 Jun 2024 at 17:18, Alexander Korotkov 
wrote:

> Hi!
>
> On Sun, Jun 2, 2024 at 10:55 AM jian he 
> wrote:
> >
> > On Fri, May 31, 2024 at 8:12 AM Alexander Korotkov 
> wrote:
> > >
> > > I've revised some grammar including the sentence you've proposed.
> > >
> >
> > -static List *groupclause_apply_groupingset(PlannerInfo *root, List
> *force);
> > +static List *preprocess_groupclause(PlannerInfo *root, List *force);
> >
> > changing preprocess_groupclause the second argument
> > from "force" to "gset" would be more intuitive, I think.
>
> Probably, but my intention is to restore preprocess_groupclause() as
> it was before 0452b461bc with minimal edits to support incremental
> sort.  I'd rather avoid refactoring if this area for now.
>
> > `elog(ERROR, "Order of group-by clauses doesn't correspond incoming
> > sort order");`
> >
> > I think this error message makes people wonder what "incoming sort
> order" is.
> > BTW, "correspond", generally people use  "correspond to".
>
> Thank you.  On the second thought, I think it would be better to turn
> this into an assertion like the checks before.
>
> > I did some minor cosmetic changes, mainly changing foreach to
> foreach_node.
> > Please check the attachment.
>
> I would avoid refactoring of preprocess_groupclause() for the reason
> described above.  But I picked the grammar fix for PlannerInfo's
> comment.
>

Thank you for working on this patchset.

0001: Patch looks right

Comments:

Covert -> Convert
sets the uninitialized value of ec_sortref of the pathkey's
EquivalenceClass -> sets the value of the pathkey's EquivalenceClass unless
it's already initialized
wasn't set yet -> hasn't been set yet

0002: additional assert checking only. Looks right.

0003: pure renaming, looks good.

0004: Restores pre 0452b461bc state to preprocess_groupclause with removed
partial_match fallback. Looks right. I haven't checked the comments
provided they are restored from pre 0452b461bc state.

0005: Looks right.

Regards,
Pavel Borisov
Supabase

Re: POC: GROUP BY optimization

2024-06-02 Thread Alexander Korotkov

Hi!

On Sun, Jun 2, 2024 at 10:55 AM jian he  wrote:
>
> On Fri, May 31, 2024 at 8:12 AM Alexander Korotkov  
> wrote:
> >
> > I've revised some grammar including the sentence you've proposed.
> >
>
> -static List *groupclause_apply_groupingset(PlannerInfo *root, List *force);
> +static List *preprocess_groupclause(PlannerInfo *root, List *force);
>
> changing preprocess_groupclause the second argument
> from "force" to "gset" would be more intuitive, I think.

Probably, but my intention is to restore preprocess_groupclause() as
it was before 0452b461bc with minimal edits to support incremental
sort.  I'd rather avoid refactoring if this area for now.

> `elog(ERROR, "Order of group-by clauses doesn't correspond incoming
> sort order");`
>
> I think this error message makes people wonder what "incoming sort order" is.
> BTW, "correspond", generally people use  "correspond to".

Thank you.  On the second thought, I think it would be better to turn
this into an assertion like the checks before.

> I did some minor cosmetic changes, mainly changing foreach to foreach_node.
> Please check the attachment.

I would avoid refactoring of preprocess_groupclause() for the reason
described above.  But I picked the grammar fix for PlannerInfo's
comment.

--
Regards,
Alexander Korotkov
Supabase


v4-0003-Rename-PathKeyInfo-to-GroupByOrdering.patch
Description: Binary data


v4-0004-Restore-preprocess_groupclause.patch
Description: Binary data


v4-0005-Teach-group_keys_reorder_by_pathkeys-about-redund.patch
Description: Binary data


v4-0001-Fix-asymmetry-in-setting-EquivalenceClass.ec_sort.patch
Description: Binary data


v4-0002-Add-invariants-check-to-get_useful_group_keys_ord.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-06-02 Thread jian he

On Fri, May 31, 2024 at 8:12 AM Alexander Korotkov  wrote:
>
> I've revised some grammar including the sentence you've proposed.
>

-static List *groupclause_apply_groupingset(PlannerInfo *root, List *force);
+static List *preprocess_groupclause(PlannerInfo *root, List *force);

changing preprocess_groupclause the second argument
from "force" to "gset" would be more intuitive, I think.


`elog(ERROR, "Order of group-by clauses doesn't correspond incoming
sort order");`

I think this error message makes people wonder what "incoming sort order" is.
BTW, "correspond", generally people use  "correspond to".


I did some minor cosmetic changes, mainly changing foreach to foreach_node.
Please check the attachment.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6a1fb8e..801d9f6d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2852,15 +2852,13 @@ preprocess_groupclause(PlannerInfo *root, List *force)
 {
 	Query	   *parse = root->parse;
 	List	   *new_groupclause = NIL;
-	ListCell   *sl;
-	ListCell   *gl;
+	ListCell	*gl;
 
 	/* For grouping sets, we need to force the ordering */
-	if (force)
+	if (force != NIL)
 	{
-		foreach(sl, force)
+		foreach_int(ref, force)
 		{
-			Index		ref = lfirst_int(sl);
 			SortGroupClause *cl = get_sortgroupref_clause(ref, parse->groupClause);
 
 			new_groupclause = lappend(new_groupclause, cl);
@@ -2879,10 +2877,8 @@ preprocess_groupclause(PlannerInfo *root, List *force)
 	 *
 	 * This code assumes that the sortClause contains no duplicate items.
 	 */
-	foreach(sl, parse->sortClause)
+	foreach_node(SortGroupClause, sc, parse->sortClause)
 	{
-		SortGroupClause *sc = lfirst_node(SortGroupClause, sl);
-
 		foreach(gl, parse->groupClause)
 		{
 			SortGroupClause *gc = lfirst_node(SortGroupClause, gl);
@@ -2908,10 +2904,8 @@ preprocess_groupclause(PlannerInfo *root, List *force)
 	 * implemented using incremental sort.  Also, give up if there are any
 	 * non-sortable GROUP BY items, since then there's no hope anyway.
 	 */
-	foreach(gl, parse->groupClause)
+	foreach_node(SortGroupClause,gc, parse->groupClause)
 	{
-		SortGroupClause *gc = lfirst_node(SortGroupClause, gl);
-
 		if (list_member_ptr(new_groupclause, gc))
 			continue;			/* it matched an ORDER BY item */
 		if (!OidIsValid(gc->sortop))	/* give up, GROUP BY can't be sorted */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index af8de421..2ba297c1 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -426,7 +426,7 @@ struct PlannerInfo
 	 * items to be proven redundant, implying that there is only one group
 	 * containing all the query's rows.  Hence, if you want to check whether
 	 * GROUP BY was specified, test for nonempty parse->groupClause, not for
-	 * nonempty processed_groupClause.  Optimiser chooses specific order of
+	 * nonempty processed_groupClause.  Optimizer chooses specific order of
 	 * group-by clauses during the upper paths generation process, attempting
 	 * to use different strategies to minimize number of sorts or engage
 	 * incremental sort.  See preprocess_groupclause() and

Re: POC: GROUP BY optimization

2024-05-30 Thread Alexander Korotkov

Hi!

On Thu, May 30, 2024 at 7:22 AM Andrei Lepikhov
 wrote:
> On 5/29/24 19:53, Alexander Korotkov wrote:
> > Thank you for your feedback.
> >
> > On Wed, May 29, 2024 at 11:08 AM Andrei Lepikhov
> >  wrote:
> >> On 5/27/24 19:41, Alexander Korotkov wrote:
> >>> Any thoughts?
> >> About 0001:
> >> Having overviewed it, I don't see any issues (but I'm the author),
> >> except grammatical ones - but I'm not a native to judge it.
> >> Also, the sentence 'turning GROUP BY clauses  into pathkeys' is unclear
> >> to me. It may be better to write something like:  'building pathkeys by
> >> the list of grouping clauses'.
> >
> > OK, thank you.  I'll run once again for the grammar issues.

I've revised some grammar including the sentence you've proposed.

> >> 0002:
> >> The part under USE_ASSERT_CHECKING looks good to me. But the code in
> >> group_keys_reorder_by_pathkeys looks suspicious: of course, we do some
> >> doubtful work without any possible way to reproduce, but if we envision
> >> some duplicated elements in the group_clauses, we should avoid usage of
> >> the list_concat_unique_ptr.
> >
> > As I understand Tom, there is a risk that clauses list may contain
> > multiple instances of equivalent SortGroupClause, not duplicate
> > pointers.
> Maybe. I just implemented the worst-case scenario with the intention of
> maximum safety. So, it's up to you.
> >
> >> What's more, why do you not exit from
> >> foreach_ptr immediately after SortGroupClause has been found? I think
> >> the new_group_clauses should be consistent with the new_group_pathkeys.
> >
> > I wanted this to be consistent with preprocess_groupclause(), where
> > duplicate SortGroupClause'es are grouped together.  Otherwise, we
> > could just delete redundant SortGroupClause'es.
> Hm, preprocess_groupclause is called before the standard_qp_callback
> forms the 'canonical form' of group_pathkeys and such behaviour needed.
> But the code that chooses the reordering strategy uses already processed
> grouping clauses, where we don't have duplicates in the first
> num_groupby_pathkeys elements, do we?

Yep, it seems that we work with group clauses which were processed by
standard_qp_callback().  In turn standard_qp_callback() called
make_pathkeys_for_sortclauses_extended() with remove_redundant ==
true.  So, there shouldn't be duplicates.  And it's unclear what
should we be protected from.

I leaved 0002 with just asserts.  And extracted questionable changes into 0005.

--
Regards,
Alexander Korotkov
Supabase


v3-0004-Restore-preprocess_groupclause.patch
Description: Binary data


v3-0005-Teach-group_keys_reorder_by_pathkeys-about-redund.patch
Description: Binary data


v3-0002-Add-invariants-check-to-get_useful_group_keys_ord.patch
Description: Binary data


v3-0003-Rename-PathKeyInfo-to-GroupByOrdering.patch
Description: Binary data


v3-0001-Fix-asymmetry-in-setting-EquivalenceClass.ec_sort.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-05-29 Thread Andrei Lepikhov


On 5/29/24 19:53, Alexander Korotkov wrote:

Hi, Andrei!

Thank you for your feedback.

On Wed, May 29, 2024 at 11:08 AM Andrei Lepikhov
 wrote:

On 5/27/24 19:41, Alexander Korotkov wrote:

Any thoughts?

About 0001:
Having overviewed it, I don't see any issues (but I'm the author),
except grammatical ones - but I'm not a native to judge it.
Also, the sentence 'turning GROUP BY clauses  into pathkeys' is unclear
to me. It may be better to write something like:  'building pathkeys by
the list of grouping clauses'.


OK, thank you.  I'll run once again for the grammar issues.


0002:
The part under USE_ASSERT_CHECKING looks good to me. But the code in
group_keys_reorder_by_pathkeys looks suspicious: of course, we do some
doubtful work without any possible way to reproduce, but if we envision
some duplicated elements in the group_clauses, we should avoid usage of
the list_concat_unique_ptr.


As I understand Tom, there is a risk that clauses list may contain
multiple instances of equivalent SortGroupClause, not duplicate
pointers.
Maybe. I just implemented the worst-case scenario with the intention of 
maximum safety. So, it's up to you.



What's more, why do you not exit from
foreach_ptr immediately after SortGroupClause has been found? I think
the new_group_clauses should be consistent with the new_group_pathkeys.


I wanted this to be consistent with preprocess_groupclause(), where
duplicate SortGroupClause'es are grouped together.  Otherwise, we
could just delete redundant SortGroupClause'es.
Hm, preprocess_groupclause is called before the standard_qp_callback 
forms the 'canonical form' of group_pathkeys and such behaviour needed. 
But the code that chooses the reordering strategy uses already processed 
grouping clauses, where we don't have duplicates in the first 
num_groupby_pathkeys elements, do we?

0004:
I was also thinking about reintroducing the preprocess_groupclause
because with the re-arrangement of GROUP-BY clauses according to
incoming pathkeys, it doesn't make sense to have a user-defined order—at
least while cost_sort doesn't differ costs for alternative column orderings.
So, I'm okay with the code. But why don't you use the same approach with
foreach_ptr as before?


I restored the function as it was before 0452b461bc with minimal edits
to support the incremental sort.  I think it would be more valuable to
keep the difference with pg16 code small rather than refactor to
simplify existing code.

Got it

--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2024-05-27 Thread Alexander Korotkov

Hi!

On Tue, Apr 23, 2024 at 1:19 AM Alexander Korotkov  wrote:
> On Thu, Apr 18, 2024 at 1:57 PM Alexander Korotkov  
> wrote:
> > Thank you for the fixes you've proposed.  I didn't look much into
> > details yet, but I think the main concern Tom expressed in [1] is
> > whether the feature is reasonable at all.  I think at this stage the
> > most important thing is to come up with convincing examples showing
> > how huge performance benefits it could cause.  I will return to this
> > later today and will try to provide some convincing examples.
>
> I took a fresh look at 0452b461b, and have the following thoughts:
> 1) Previously, preprocess_groupclause() tries to reorder GROUP BY
> clauses to match the required ORDER BY order.  It only reorders if
> GROUP BY pathkeys are the prefix of ORDER BY pathkeys or vice versa.
> So, both of them need to be satisfied by one ordering.  0452b461b also
> tries to match GROUP BY clauses to ORDER BY clauses, but takes into
> account an incremental sort.  Actually, instead of removing
> preprocess_groupclause(), we could just teach it to take incremental
> sort into account.
> 2) The get_useful_group_keys_orderings() function takes into account 3
> orderings of pathkeys and clauses: original order as written in GROUP
> BY, matching ORDER BY clauses as much as possible, and matching the
> input path as much as possible.  Given that even before 0452b461b,
> preprocess_groupclause() could change the original order of GROUP BY
> clauses, so we don't need to distinguish the first two.  We could just
> consider output of new preprocess_groupclause() taking into account an
> incremental sort and the ordering matching the input path as much as
> possible.  This seems like significant simplification.
>
> Let me also describe my thoughts about the justification of the
> feature itself.  As Tom pointed upthread, Sort + Grouping is generally
> unlikely faster than Hash Aggregate.  The main point of this feature
> is being able to match the order of GROUP BY clauses to the order of
> the input path.  That input path could be Index Scan or Subquery.
> Let's concentrate on Index Scan.  Index Scan can give us the required
> order, so we can do grouping without Sort or with significantly
> cheaper Incremental Sort.  That could be much faster than Hash
> Aggregate.  But if we scan the full table (or its significant
> fraction), Index Scan is typically much more expensive than Sequential
> Scan because of random page access.  However, there are cases when
> Index Scan could be cheaper.
> 1) If the heap row is wide and the index contains all the required
> columns, Index Only Scan can be cheaper than Sequential Scan because
> of lesser volume.
> 2) If the query predicate is selective enough and matches the index,
> Index Scan might be significantly cheaper.  One may say that if the
> query predicate is selective enough then there are not that many
> matching rows, so aggregating them either way isn't a big deal anyway.
> However, given that data volumes are growing tremendously, it's not
> hard to imagine that the index selected a small fraction of table
> rows, but they are still enough to be costly for aggregating.
> Therefore, I think there are significant cases where matching GROUP BY
> clauses to the order of the input path could give a substantial
> improvement over Hash Aggregate.
>
> While there are some particular use-cases by Jian He, I hope that
> above could give some rationale.

I've assembled patches in this thread into one patchset.
0001 The patch fixing asymmetry in setting EquivalenceClass.ec_sortref
by Andrei [1].  I've revised comments and wrote the commit message.
0002 The patch for handling duplicates of SortGroupClause.  I didn't
get the sense of Andrei implementation.  It seems to care about
duplicate pointers in group clauses list.  But the question is the
equal SortGroupClause's comprising different pointers.  I think we
should group duplicate SortGroupClause's together as
preprocess_groupclause() used to do.  Reimplemented patch to do so.
0003 Rename PathKeyInfo to GroupByOrdering by Andres [3].  I only
revised comments and wrote the commit message.
0004 Turn back preprocess_groupclause() for the reason I described upthread [4].

Any thoughts?

Links.
1. 
https://www.postgresql.org/message-id/17037754-f187-4138-8285-0e2bfebd0dea%40postgrespro.ru
2. 
https://www.postgresql.org/message-id/a663f0f6-cbf6-49aa-af2e-234dc6768a07%40postgrespro.ru
3. 
https://www.postgresql.org/message-id/db0fc3a4-966c-4cec-a136-94024d39212d%40postgrespro.ru
4. 
https://www.postgresql.org/message-id/CAPpHfdvyWLMGwvxaf%3D7KAp-z-4mxbSH8ti2f6mNOQv5metZFzg%40mail.gmail.com

--
Regards,
Alexander Korotkov
Supabase


v2-0003-Rename-PathKeyInfo-to-GroupByOrdering.patch
Description: Binary data


v2-0004-Restore-preprocess_groupclause.patch
Description: Binary data


v2-0002-Teach-group_keys_reorder_by_pathkeys-about-redund.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-05-23 Thread jian he

On Mon, May 20, 2024 at 4:54 PM jian he  wrote:
>
>
> The behavior is still the same as the master.
> meaning that below quoted queries are still using "Presorted Key: x".
>
> > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
> > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
> > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
> > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
> > > the above part will use:
> > >->  Incremental Sort
> > >   Sort Key: x, $, $, $
> > >   Presorted Key: x
> > >   ->  Index Scan using t1_x_y_idx on t1
>
>
> > 2. Could you try to find the reason?
> the following are my understanding, it may be wrong...
>

I think
my previous thread only explained that two paths have the same cost,
the planner chooses the one that was first added into the pathlist.

but didn't explain why Incremental Sort path, presorted by one key and
presorted by two keys
yield the same cost.

if (rel->tuples > 0)
{
/*
* Clamp to size of rel, or size of rel / 10 if multiple Vars. The
* fudge factor is because the Vars are probably correlated but we
* don't know by how much.  We should never clamp to less than the
* largest ndistinct value for any of the Vars, though, since
* there will surely be at least that many groups.
*/
double clamp = rel->tuples;

if (relvarcount > 1)
{
clamp *= 0.1;
if (clamp < relmaxndistinct)
{
clamp = relmaxndistinct;
/* for sanity in case some ndistinct is too large: */
if (clamp > rel->tuples)
clamp = rel->tuples;
}
}
if (reldistinct > clamp)
reldistinct = clamp;
...
}

i think,
the above code[0] snippet within function estimate_num_groups
makes Incremental Sort Path not pickup the optimal presorted keys.

see original problem thread: [1]

CREATE TABLE t1 AS
SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
z, i::int4 AS w
FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
will generate 2 Incremental Sort path, one: "Presorted Key: x",
another one: "Presorted Key: x,y".
The first Incremental Sort path is added first.
The function estimate_num_groups returned value (input_groups) is the
main key to
calculate the cost of Incremental Sort path!

But here, the estimate_num_groups function returned the same
value: 10 for
"Presorted Key: x" and "Presorted Key: x,y".
then later cost_incremental_sort returns the same cost for "Presorted
Key: x" and "Presorted Key: x,y".

(line refers to src/backend/utils/adt/selfuncs line number).
why "Presorted Key: x,y" return 10 in estimate_num_groups:
line 3667 assign clamp to 100.0, because rel->tuples is 100.
line 3671 clamp *= 0.1; make clamp because 10.
line 3680, 3681  `if (reldistinct > clamp) branch` make reldistinct
from 100 to 10.
line 3733 make  `numdistinct *= reldistinct;` makes numdistinct because 10.

If I change the total number of rows in a relation, or change the
distinct number of y values
then the planner will use "Incremental Sort Presorted Key: x,y".
for example:

CREATE TABLE t10 AS
SELECT (i % 10)::numeric AS x,(i % 11)::int8 AS y,'abc' || i % 10
AS z, i::float4 AS w FROM generate_series(1, 1E2) AS i;
CREATE INDEX t10_x_y_idx ON t10 (x, y);
ANALYZE t10;

The above setup will make the following query using "Incremental Sort
Presorted Key: x,y".
EXPLAIN (COSTS OFF) SELECT count(*) FROM t10 GROUP BY x,z,y,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t10 GROUP BY x,w,y,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t10 GROUP BY x,z,w,y;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t10 GROUP BY x,w,z,y;

summary:
* the regression test setup at [2] can make some cases not pickup the
best optimal
Incremental Sort path.

we use this test in src/test/regress/sql/aggregates.sql
-- Engage incremental sort
EXPLAIN (COSTS OFF)
SELECT count(*) FROM btg GROUP BY z, y, w, x;

but if we use:
EXPLAIN (COSTS OFF)
SELECT count(*) FROM btg GROUP BY x, z, w, y;
then the plan is not the best.

* The regression test setup makes estimate_num_groups code logic
return the same value.
therefore make Incremental Sort presort by one key, two keys yield the
same cost.
the cost_incremental_sort will return the same cost.

* 
https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/sql/aggregates.sql#n1194
maybe we can change:

CREATE TABLE btg AS SELECT
  i % 10 AS x,
  i % 10 AS y,
  'abc' || i % 10 AS z,
  i AS w
FROM generate_series(1, 100) AS i;

to

CREATE TABLE btg AS SELECT
  i % 10 AS x,
  i % 10 AS y,
  'abc' || i % 10 AS z,
  i AS w
FROM generate_series(1, 1000) AS i;

[0] 
https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/utils/adt/selfuncs.c#n3669
[1]

Re: POC: GROUP BY optimization

2024-05-21 Thread Andrei Lepikhov


On 20/5/2024 15:54, jian he wrote:

As mentioned previously,
both A and B can use Incremental Sort, set_cheapest will choose the A
instead of B,
because A step generated path **first** satisfies increment sort.

Yeah, I agree with your analysis.
Looking into the cost_incremental_sort, I see that we have ten 
group_tuples. This value doesn't depend on how many presorted keys (1 or 
2) we use. This is caused by default estimation.
Given the current circumstances, it seems unlikely that we can make any 
significant changes without introducing a new sort cost model that 
accounts for the number of sorted columns.


--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2024-05-20 Thread jian he

On Thu, May 16, 2024 at 3:47 PM Andrei Lepikhov
 wrote:
>
> On 24.04.2024 13:25, jian he wrote:
> > hi.
> > I found an interesting case.
> >
> > CREATE TABLE t1 AS
> >SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
> > z, i::int4 AS w
> >FROM generate_series(1, 100) AS i;
> > CREATE INDEX t1_x_y_idx ON t1 (x, y);
> > ANALYZE t1;
> > SET enable_hashagg = off;
> > SET enable_seqscan = off;
> >
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
> > the above part will use:
> >->  Incremental Sort
> >   Sort Key: x, $, $, $
> >   Presorted Key: x
> >   ->  Index Scan using t1_x_y_idx on t1
> >
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z;
> >
> > these will use:
> >->  Incremental Sort
> >   Sort Key: x, y, $, $
> >   Presorted Key: x, y
> >   ->  Index Scan using t1_x_y_idx on t1
> >
> > I guess this is fine, but not optimal?
> It looks like a bug right now - in current implementation we don't
> differentiate different orders. So:
> 1. Applying all the patches from the thread which I proposed as an
> answer to T.Lane last rebuke - does behavior still the same?.

I've applied these 4 patches in  this thread.
final_improvements.patch
minor_comment.patch
get_useful_group_keys_orderings.patch
group_by_asymmetry.diff

The behavior is still the same as the master.
meaning that below quoted queries are still using "Presorted Key: x".

> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
> > the above part will use:
> >->  Incremental Sort
> >   Sort Key: x, $, $, $
> >   Presorted Key: x
> >   ->  Index Scan using t1_x_y_idx on t1

> 2. Could you try to find the reason?
the following are my understanding, it may be wrong...

function get_useful_group_keys_orderings, may generate alternative
pathkeys and group clauses.
for different PathKeyInfo
sub functions within add_paths_to_grouping_rel:
make_ordered_path, create_agg_path will yield the same cost.
function add_path (within add_paths_to_grouping_rel) will add these
paths (different PathKeyInfos) in a *sequential* order.

then later in the function set_cheapest.
`foreach(p, parent_rel->pathlist)`  will iterate these different paths
in a sequential order.
so in set_cheapest if 2 path costs are the same, then the first path
residing in parent_rel->pathlist wins.

fo a concrete example:
CREATE TABLE t1 AS
SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
z, i::float4 AS w FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;
EXPLAIN (COSTS OFF, verbose) SELECT count(*) FROM t1 GROUP BY x,z,y,w;

add_paths_to_grouping_rel will do the following in a sequential order.
A. generate and add original PathKeyInfo(groupby x,z,y,w)
B. generate and add alternative PathKeyInfo(groupby x,y,z,w). (because
of the index path)
C. can_hash is true, generate a HashAgg Path (because of
enable_hashagg is set to off, so this path the cost is very high)

As mentioned previously,
both A and B can use Incremental Sort, set_cheapest will choose the A
instead of B,
because A step generated path **first** satisfies increment sort.

Re: POC: GROUP BY optimization

2024-05-16 Thread Andrei Lepikhov


On 24.04.2024 13:25, jian he wrote:

hi.
I found an interesting case.

CREATE TABLE t1 AS
   SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
z, i::int4 AS w
   FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
the above part will use:
   ->  Incremental Sort
  Sort Key: x, $, $, $
  Presorted Key: x
  ->  Index Scan using t1_x_y_idx on t1

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z;

these will use:
   ->  Incremental Sort
  Sort Key: x, y, $, $
  Presorted Key: x, y
  ->  Index Scan using t1_x_y_idx on t1

I guess this is fine, but not optimal?
It looks like a bug right now - in current implementation we don't 
differentiate different orders. So:
1. Applying all the patches from the thread which I proposed as an 
answer to T.Lane last rebuke - does behavior still the same?.

2. Could you try to find the reason?

--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2024-04-28 Thread jian he

On Wed, Apr 24, 2024 at 2:25 PM jian he  wrote:
>
> hi.
> I found an interesting case.
>
> CREATE TABLE t1 AS
>   SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
> z, i::int4 AS w
>   FROM generate_series(1, 100) AS i;
> CREATE INDEX t1_x_y_idx ON t1 (x, y);
> ANALYZE t1;
> SET enable_hashagg = off;
> SET enable_seqscan = off;
>
> EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
> EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
> EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
> EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
> the above part will use:
>   ->  Incremental Sort
>  Sort Key: x, $, $, $
>  Presorted Key: x
>  ->  Index Scan using t1_x_y_idx on t1

We can make these cases also `Presorted Key: x, y`.

in
`if (path->pathkeys && !pathkeys_contained_in(path->pathkeys,
root->group_pathkeys))` branch
we can simple do
-   infos = lappend(infos, info);
+   infos = lcons(info, infos);

similar to what we did at plancat.c (search lcons).

get_useful_group_keys_orderings returns a  list of PathKeyInfo,
then the caller function just iterates each element.
so for the caller, order of the returned list element from
get_useful_group_keys_orderings
does not matter.

for path Incremental Sort:
function make_ordered_path will return the same cost for different
numbers of  presorted keys.

for example:
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
make_ordered_path cost is same for:
`info->pathkeys: x,y,z,w`
`info->pathkeys:x,z,y,w`

if we arrange `info->pathkeys: x,y,z,w` before `info->pathkeys:x,z,y,w`
in get_useful_group_keys_orderings.
then with the same cost, we will choose the first one
(`info->pathkeys: x,y,z,w`),
if we use IncrementalSort, then we use `Presorted Key: x, y`.

Re: POC: GROUP BY optimization

2024-04-24 Thread jian he

hi
one more question (maybe a dumb one)

drop table if exists t1;
CREATE TABLE t1 AS
  SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
z, i::int4 AS w
  FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;


EXPLAIN (COSTS OFF, verbose) SELECT count(*) FROM t1 GROUP BY z,x,y,w
order by w;


  QUERY PLAN
--
 GroupAggregate
   Output: count(*), w, z, x, y
   Group Key: t1.w, t1.x, t1.y, t1.z
   ->  Sort
 Output: w, z, x, y
 Sort Key: t1.w, t1.x, t1.y, t1.z
 ->  Index Scan using t1_x_y_idx on public.t1
   Output: w, z, x, y
(8 rows)


if you do
` Sort Key: t1.w, t1.x, t1.y, t1.z`

then the output is supposed to be:

Output: w, x, y, z

?

Re: POC: GROUP BY optimization

2024-04-24 Thread jian he

hi.
I found an interesting case.

CREATE TABLE t1 AS
  SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
z, i::int4 AS w
  FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
the above part will use:
  ->  Incremental Sort
 Sort Key: x, $, $, $
 Presorted Key: x
 ->  Index Scan using t1_x_y_idx on t1

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z;

these will use:
  ->  Incremental Sort
 Sort Key: x, y, $, $
 Presorted Key: x, y
 ->  Index Scan using t1_x_y_idx on t1

I guess this is fine, but not optimal?

Re: POC: GROUP BY optimization

2024-04-22 Thread Alexander Korotkov

Hi!

On Thu, Apr 18, 2024 at 1:57 PM Alexander Korotkov  wrote:
> Thank you for the fixes you've proposed.  I didn't look much into
> details yet, but I think the main concern Tom expressed in [1] is
> whether the feature is reasonable at all.  I think at this stage the
> most important thing is to come up with convincing examples showing
> how huge performance benefits it could cause.  I will return to this
> later today and will try to provide some convincing examples.

I took a fresh look at 0452b461b, and have the following thoughts:
1) Previously, preprocess_groupclause() tries to reorder GROUP BY
clauses to match the required ORDER BY order.  It only reorders if
GROUP BY pathkeys are the prefix of ORDER BY pathkeys or vice versa.
So, both of them need to be satisfied by one ordering.  0452b461b also
tries to match GROUP BY clauses to ORDER BY clauses, but takes into
account an incremental sort.  Actually, instead of removing
preprocess_groupclause(), we could just teach it to take incremental
sort into account.
2) The get_useful_group_keys_orderings() function takes into account 3
orderings of pathkeys and clauses: original order as written in GROUP
BY, matching ORDER BY clauses as much as possible, and matching the
input path as much as possible.  Given that even before 0452b461b,
preprocess_groupclause() could change the original order of GROUP BY
clauses, so we don't need to distinguish the first two.  We could just
consider output of new preprocess_groupclause() taking into account an
incremental sort and the ordering matching the input path as much as
possible.  This seems like significant simplification.

Let me also describe my thoughts about the justification of the
feature itself.  As Tom pointed upthread, Sort + Grouping is generally
unlikely faster than Hash Aggregate.  The main point of this feature
is being able to match the order of GROUP BY clauses to the order of
the input path.  That input path could be Index Scan or Subquery.
Let's concentrate on Index Scan.  Index Scan can give us the required
order, so we can do grouping without Sort or with significantly
cheaper Incremental Sort.  That could be much faster than Hash
Aggregate.  But if we scan the full table (or its significant
fraction), Index Scan is typically much more expensive than Sequential
Scan because of random page access.  However, there are cases when
Index Scan could be cheaper.
1) If the heap row is wide and the index contains all the required
columns, Index Only Scan can be cheaper than Sequential Scan because
of lesser volume.
2) If the query predicate is selective enough and matches the index,
Index Scan might be significantly cheaper.  One may say that if the
query predicate is selective enough then there are not that many
matching rows, so aggregating them either way isn't a big deal anyway.
However, given that data volumes are growing tremendously, it's not
hard to imagine that the index selected a small fraction of table
rows, but they are still enough to be costly for aggregating.
Therefore, I think there are significant cases where matching GROUP BY
clauses to the order of the input path could give a substantial
improvement over Hash Aggregate.

While there are some particular use-cases by Jian He, I hope that
above could give some rationale.

--
Regards,
Alexander Korotkov

Re: POC: GROUP BY optimization

2024-04-22 Thread jian he

On Fri, Apr 19, 2024 at 6:44 PM jian he  wrote:
>
> On Thu, Apr 18, 2024 at 6:58 PM Alexander Korotkov  
> wrote:
> >
> > Thank you for the fixes you've proposed.  I didn't look much into
> > details yet, but I think the main concern Tom expressed in [1] is
> > whether the feature is reasonable at all.  I think at this stage the
> > most important thing is to come up with convincing examples showing
> > how huge performance benefits it could cause.  I will return to this
> > later today and will try to provide some convincing examples.
> >

hi.
previously preprocess_groupclause will not process cases
where no ORDER BY clause is specified.
commit 0452b461b will reorder the GROUP BY element even though no
ORDER BY clause is specified
, if there are associated indexes on it.
(hope I understand it correctly).

for example (when enable_hashagg is false)
explain(verbose) select count(*) FROM btg GROUP BY y,x;
in pg16 will not reorder, it will be as is: `GROUP BY y,x`

after commit 0452b461b, it will reorder to `GROUP BY x,y`.
because there is an index `btree (x, y)` (only one) associated with it.
if you drop the index `btree (x, y)` , it will be `GROUP BY y,x` as pg16.

This reordering GROUP BY element when no ORDER BY clause is not specified
is performant useful when the work_mem is small.
I've attached some tests comparing master with REL_16_STABLE to
demonstrate that.
all the tests attached are under the condition:
work_mem='64kB', buildtype=release, max_parallel_workers_per_gather=0.

one example:
CREATE TABLE btg5 AS
SELECT i::numeric % 10 AS x, i % 10 AS y, 'abc' || i % 10 AS z, i % 10 AS w
FROM generate_series(1, 1e6) AS i;
CREATE INDEX btg5_x_y_idx ON btg5(x, y);

explain(analyze) SELECT count(*) FROM btg5 GROUP BY z, y, w, x;
in pg17,  the execution time is: 746.574 ms
in pg16,  the execution time is: 1693.483 ms

if I reorder it manually as:
`explain(analyze) SELECT count(*) FROM btg5 GROUP BY x, y, w, z;`
then in pg16, the execution time is 630.394 ms

low_mem_groupby_reorder.sql
Description: application/sql

Re: POC: GROUP BY optimization

2024-04-21 Thread Andrei Lepikhov


On 4/12/24 06:44, Tom Lane wrote:

If this patch were producing better results I'd be more excited
about putting more work into it.  But on the basis of what I'm
seeing right now, I think maybe we ought to give up on it.
Let me show current cases where users have a profit with this tiny 
improvement (see queries and execution results in query.sql):
1. 'Not optimised query text' — when we didn't consider group-by 
ordering during database development.
2. 'Accidental pathkeys' - we didn't see any explicit orderings, but 
accidentally, the planner used merge join that caused some orderings and 
we can utilise it.
3. 'Uncertain scan path' — We have options regarding which index to use, 
and because of that, we can't predict the optimal group-by ordering 
before the start of query planning.
4. 'HashAgg V/S GroupAgg' — sometimes, the GroupAgg strategy outperforms 
HashAgg just because we don't need any ordering procedure at all.


And the last thing here — this code introduces the basics needed to add 
more sophisticated strategies, like ordering according to uniqueness or 
preferring to set constant-width columns first in grouping.


--
regards,
Andrei Lepikhov
Postgres Professional


query.sql
Description: application/sql

Re: POC: GROUP BY optimization

2024-04-19 Thread jian he

On Thu, Apr 18, 2024 at 6:58 PM Alexander Korotkov  wrote:
>
> Thank you for the fixes you've proposed.  I didn't look much into
> details yet, but I think the main concern Tom expressed in [1] is
> whether the feature is reasonable at all.  I think at this stage the
> most important thing is to come up with convincing examples showing
> how huge performance benefits it could cause.  I will return to this
> later today and will try to provide some convincing examples.
>

hi.
I found a case where it improved performance.

+-- GROUP BY optimization by reorder columns
+CREATE TABLE btg AS SELECT
+ i % 100 AS x,
+ i % 100 AS y,
+ 'abc' || i % 10 AS z,
+ i AS w
+FROM generate_series(1,1) AS i;
+CREATE INDEX abc ON btg(x,y);
+ANALYZE btg;
+
I change
+FROM generate_series(1,1) AS i;
to
+ FROM generate_series(1, 1e6) AS i;

Then I found out about these 2 queries performance improved a lot.
A: explain(analyze) SELECT count(*) FROM btg GROUP BY w, x, y, z ORDER
BY y, x \watch i=0.1 c=10
B: explain(analyze) SELECT count(*) FROM btg GROUP BY w, x, z, y ORDER
BY y, x, z, w \watch i=0.1 c=10

set (enable_seqscan,enable_hashagg) from on to off:
queryA execution time from 1533.013 ms to 533.430 ms
queryB execution time from 1996.817 ms to 497.020 ms

Re: POC: GROUP BY optimization

2024-04-18 Thread Alexander Korotkov

Hi, Andrei!

On Thu, Apr 18, 2024 at 11:54 AM Andrei Lepikhov
 wrote:
> On 4/12/24 06:44, Tom Lane wrote:
> > * Speaking of pathnodes.h, PathKeyInfo is a pretty uninformative node
> > type name, and the comments provided for it are not going to educate
> > anybody.  What is the "association" between the pathkeys and clauses?
> > I guess the clauses are supposed to be SortGroupClauses, but not all
> > pathkeys match up to SortGroupClauses, so what then?  This is very
> > underspecified, and fuzzy thinking about data structures tends to lead
> > to bugs.
> I'm not the best in English documentation and naming style. So, feel
> free to check my version.
> >
> > So I'm quite afraid that there are still bugs lurking here.
> > What's more, given that the plans this patch makes apparently
> > seldom win when you don't put a thumb on the scales, it might
> > take a very long time to isolate those bugs.  If the patch
> > produces a faulty plan (e.g. not sorted properly) we'd never
> > notice if that plan isn't chosen, and even if it is chosen
> > it probably wouldn't produce anything as obviously wrong as
> > a crash.
> I added checkings on the proper order of pathkeys and clauses.
>   If you really care about that, we should spend additional time and
> rewrite the code to generate an order of clauses somewhere in the plan
> creation phase. For example, during the create_group_plan call, we could
> use the processed_groupClause list, cycle through subpath->pathkeys, set
> the order, and detect whether the pathkeys list corresponds to the
> group-by or is enough to build a grouping plan.
> Anyway, make this part of code more resistant to blunders is another story.

Thank you for the fixes you've proposed.  I didn't look much into
details yet, but I think the main concern Tom expressed in [1] is
whether the feature is reasonable at all.  I think at this stage the
most important thing is to come up with convincing examples showing
how huge performance benefits it could cause.  I will return to this
later today and will try to provide some convincing examples.

Links
1. https://www.postgresql.org/message-id/266850.1712879082%40sss.pgh.pa.us

--
Regards,
Alexander Korotkov

Re: POC: GROUP BY optimization

2024-04-18 Thread Andrei Lepikhov


On 4/12/24 06:44, Tom Lane wrote:

* Speaking of pathnodes.h, PathKeyInfo is a pretty uninformative node
type name, and the comments provided for it are not going to educate
anybody.  What is the "association" between the pathkeys and clauses?
I guess the clauses are supposed to be SortGroupClauses, but not all
pathkeys match up to SortGroupClauses, so what then?  This is very
underspecified, and fuzzy thinking about data structures tends to lead
to bugs.
I'm not the best in English documentation and naming style. So, feel 
free to check my version.


So I'm quite afraid that there are still bugs lurking here.
What's more, given that the plans this patch makes apparently
seldom win when you don't put a thumb on the scales, it might
take a very long time to isolate those bugs.  If the patch
produces a faulty plan (e.g. not sorted properly) we'd never
notice if that plan isn't chosen, and even if it is chosen
it probably wouldn't produce anything as obviously wrong as
a crash.

I added checkings on the proper order of pathkeys and clauses.
 If you really care about that, we should spend additional time and 
rewrite the code to generate an order of clauses somewhere in the plan 
creation phase. For example, during the create_group_plan call, we could 
use the processed_groupClause list, cycle through subpath->pathkeys, set 
the order, and detect whether the pathkeys list corresponds to the 
group-by or is enough to build a grouping plan.

Anyway, make this part of code more resistant to blunders is another story.

--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index aa80f6486c..9c079270ec 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -461,7 +461,7 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 /*
  * pathkeys_are_duplicate
  *		Check if give pathkeys are already contained the list of
- *		PathKeyInfo's.
+ *		GroupByOrdering's.
  */
 static bool
 pathkeys_are_duplicate(List *infos, List *pathkeys)
@@ -470,7 +470,7 @@ pathkeys_are_duplicate(List *infos, List *pathkeys)
 
 	foreach(lc, infos)
 	{
-		PathKeyInfo *info = lfirst_node(PathKeyInfo, lc);
+		GroupByOrdering *info = lfirst_node(GroupByOrdering, lc);
 
 		if (compare_pathkeys(pathkeys, info->pathkeys) == PATHKEYS_EQUAL)
 			return true;
@@ -482,7 +482,7 @@ pathkeys_are_duplicate(List *infos, List *pathkeys)
  * get_useful_group_keys_orderings
  *		Determine which orderings of GROUP BY keys are potentially interesting.
  *
- * Returns a list of PathKeyInfo items, each representing an interesting
+ * Returns a list of GroupByOrdering items, each representing an interesting
  * ordering of GROUP BY keys.  Each item stores pathkeys and clauses in the
  * matching order.
  *
@@ -495,15 +495,15 @@ pathkeys_are_duplicate(List *infos, List *pathkeys)
 List *
 get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 {
-	Query		   *parse = root->parse;
-	List		   *infos = NIL;
-	PathKeyInfo	   *info;
+	Query			   *parse = root->parse;
+	List			   *infos = NIL;
+	GroupByOrdering	   *info;
 
 	List	   *pathkeys = root->group_pathkeys;
 	List	   *clauses = root->processed_groupClause;
 
 	/* always return at least the original pathkeys/clauses */
-	info = makeNode(PathKeyInfo);
+	info = makeNode(GroupByOrdering);
 	info->pathkeys = pathkeys;
 	info->clauses = clauses;
 	infos = lappend(infos, info);
@@ -539,7 +539,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 			(enable_incremental_sort || n == root->num_groupby_pathkeys) &&
 			!pathkeys_are_duplicate(infos, pathkeys))
 		{
-			info = makeNode(PathKeyInfo);
+			info = makeNode(GroupByOrdering);
 			info->pathkeys = pathkeys;
 			info->clauses = clauses;
 
@@ -564,7 +564,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 			(enable_incremental_sort || n == list_length(root->sort_pathkeys)) &&
 			!pathkeys_are_duplicate(infos, pathkeys))
 		{
-			info = makeNode(PathKeyInfo);
+			info = makeNode(GroupByOrdering);
 			info->pathkeys = pathkeys;
 			info->clauses = clauses;
 
@@ -574,18 +574,29 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 
 #ifdef USE_ASSERT_CHECKING
 	{
-		PathKeyInfo	   *pinfo = linitial_node(PathKeyInfo, infos);
+		GroupByOrdering	   *pinfo = linitial_node(GroupByOrdering, infos);
 		ListCell	   *lc;
 
 		/* Test consistency of info structures */
 		for_each_from(lc, infos, 1)
 		{
-			info = lfirst_node(PathKeyInfo, lc);
+			ListCell *lc1, *lc2;
+
+			info = lfirst_node(GroupByOrdering, lc);
 
 			Assert(list_length(info->clauses) == list_length(pinfo->clauses));
 			Assert(list_length(info->pathkeys) == list_length(pinfo->pathkeys));
 			Assert(list_difference(info->clauses, pinfo->clauses) == NIL);
 			Assert(list_difference_ptr(info->pathkeys, pinfo->pathkeys) == NIL);
+
+			forboth(lc1, info->clauses, lc2, info->pathkeys)
+			{
+SortGroupClause *sgc

Re: POC: GROUP BY optimization

2024-04-17 Thread Andrei Lepikhov


On 4/12/24 06:44, Tom Lane wrote:

* It seems like root->processed_groupClause no longer has much to do
with the grouping behavior, which is scary given how much code still
believes that it does.  I suspect there are bugs lurking there, and

1. Analysing the code, processed_groupClause is used in:
- adjust_foreign_grouping_path_cost - to decide, do we need to add cost 
of sort to the foreign grouping.

- just for replacing relids during SJE process.
- create_groupingsets_plan(), preprocess_grouping_sets, 
remove_useless_groupby_columns - we don't apply this feature in the case 
of grouping sets.
- get_number_of_groups - estimation shouldn't depend on the order of 
clauses.

- create_grouping_paths - to analyse hashability of grouping clause
- make_group_input_target, make_partial_grouping_target - doesn't depend 
on the order of clauses
planner.c: add_paths_to_grouping_rel, create_partial_grouping_paths - in 
the case of hash grouping.


So, the only case we can impact current behavior lies in the 
postgres_fdw. But here we don't really know which plan will be chosen 
during planning on foreign node and stay the same behavior is legal for me.

am not comforted by the fact that the patch changed exactly nothing
in the pathnodes.h documentation of that field.  This comment looks
pretty silly now too:

 /* Preprocess regular GROUP BY clause, if any */
 root->processed_groupClause = list_copy(parse->groupClause);

What "preprocessing" is going on there?  This comment was adequate
before, when the code line invoked preprocess_groupclause which had
a bunch of commentary; but now it has to stand on its own and it's
not doing a great job of that.

Thanks for picking this place. I fixed it.
More interesting here is that we move decisions on the order of group-by 
clauses to the stage, where we already have underlying subtree and 
incoming path keys. In principle, we could return the preprocessing 
routine and arrange GROUP-BY with the ORDER-BY clause as it was before. 
But looking into the code, I found only one reason to do this: 
adjust_group_pathkeys_for_groupagg. Curious about how much profit we get 
here, I think we can discover it later with no hurry. A good outcome 
here will be a test case that can show the importance of arranging 
GROUP-BY and ORDER-BY at an early stage.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5ea3705796..861656a192 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1424,7 +1424,10 @@ grouping_planner(PlannerInfo *root, double tuple_fraction,
 		}
 		else if (parse->groupClause)
 		{
-			/* Preprocess regular GROUP BY clause, if any */
+			/*
+			 * Make a copy of origin groupClause because we are going to
+			 * remove redundant clauses.
+			 */
 			root->processed_groupClause = list_copy(parse->groupClause);
 			/* Remove any redundant GROUP BY columns */
 			remove_useless_groupby_columns(root);

Re: POC: GROUP BY optimization

2024-04-16 Thread Andrei Lepikhov


On 4/12/24 06:44, Tom Lane wrote:

* I'm pretty unconvinced by group_keys_reorder_by_pathkeys (which
I notice has already had one band-aid added to it since commit).
In particular, it seems to believe that the pathkeys and clauses
lists match one-for-one, but I seriously doubt that that invariant
remains guaranteed after the cleanup steps

 /* append the remaining group pathkeys (will be treated as not sorted) */
 *group_pathkeys = list_concat_unique_ptr(new_group_pathkeys,
  *group_pathkeys);
 *group_clauses = list_concat_unique_ptr(new_group_clauses,
 *group_clauses);

For that to be reliable, the SortGroupClauses added to
new_group_clauses in the main loop have to be exactly those
that are associated with the same pathkeys in the old lists.
I doubt that that's necessarily true in the presence of redundant
grouping clauses.  (Maybe we can't get here with any redundant
grouping clauses, but still, we don't really guarantee uniqueness of
SortGroupClauses, much less that they are never copied which is what
you need if you want to believe that pointer equality is sufficient
for de-duping here.  PathKeys are explicitly made to be safe to compare
pointer-wise, but I know of no such guarantee for SortGroupClauses.)
I spent a lot of time inventing situations with SortGroupClause 
duplicates. Unfortunately, it looks impossible so far. But because we 
really don't guarantee uniqueness, I changed the code to survive in this 
case. Also, I added assertion checking to be sure we don't have logical 
mistakes here - see attachment.
About the band-aid mentioned above - as I see, 4169850 introduces the 
same trick in planner.c. So, it looks like result of design of the 
current code.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 5d9597adcd..aa80f6486c 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -380,15 +380,18 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 
 	/*
 	 * We're going to search within just the first num_groupby_pathkeys of
-	 * *group_pathkeys.  The thing is that root->group_pathkeys is passed as
+	 * *group_pathkeys. The thing is that root->group_pathkeys is passed as
 	 * *group_pathkeys containing grouping pathkeys altogether with aggregate
-	 * pathkeys.  If we process aggregate pathkeys we could get an invalid
+	 * pathkeys. If we process aggregate pathkeys we could get an invalid
 	 * result of get_sortgroupref_clause_noerr(), because their
-	 * pathkey->pk_eclass->ec_sortref doesn't referece query targetlist.  So,
+	 * pathkey->pk_eclass->ec_sortref doesn't reference query targetlist. So,
 	 * we allocate a separate list of pathkeys for lookups.
 	 */
 	grouping_pathkeys = list_copy_head(*group_pathkeys, num_groupby_pathkeys);
 
+	/* Make a new copy before reordering clauses */
+	*group_clauses = list_copy(*group_clauses);
+
 	/*
 	 * Walk the pathkeys (determining ordering of the input path) and see if
 	 * there's a matching GROUP BY key. If we find one, we append it to the
@@ -400,8 +403,8 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 	 */
 	foreach(lc, pathkeys)
 	{
-		PathKey*pathkey = (PathKey *) lfirst(lc);
-		SortGroupClause *sgc;
+		PathKey			   *pathkey = (PathKey *) lfirst(lc);
+		SortGroupClause	   *sgc;
 
 		/*
 		 * Pathkeys are built in a way that allows simply comparing pointers.
@@ -431,17 +434,25 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 		Assert(OidIsValid(sgc->sortop));
 
 		new_group_pathkeys = lappend(new_group_pathkeys, pathkey);
+
+		/*
+		 * Keeping in mind that the SortGroupClause list doesn't guarantee
+		 * unique pointers we must explicitly transfer elements one-by-one.
+		 */
 		new_group_clauses = lappend(new_group_clauses, sgc);
+		*group_clauses = list_delete_ptr(*group_clauses, sgc);
 	}
 
 	/* remember the number of pathkeys with a matching GROUP BY key */
 	n = list_length(new_group_pathkeys);
 
-	/* append the remaining group pathkeys (will be treated as not sorted) */
+	/*
+	 * Append the remaining group pathkeys (will be treated as not sorted) and
+	 * grouping clauses.
+	 */
 	*group_pathkeys = list_concat_unique_ptr(new_group_pathkeys,
 			 *group_pathkeys);
-	*group_clauses = list_concat_unique_ptr(new_group_clauses,
-			*group_clauses);
+	*group_clauses = list_concat(new_group_clauses, *group_clauses);
 
 	list_free(grouping_pathkeys);
 	return n;
@@ -484,9 +495,9 @@ pathkeys_are_duplicate(List *infos, List *pathkeys)
 List *
 get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 {
-	Query	   *parse = root->parse;
-	List	   *infos = NIL;
-	PathKeyInfo *info;
+	Query		   *parse = root->parse;
+	List		   *infos = NIL;
+	PathKeyInfo	   *info;
 
 	List	   *pathkeys = root->group_pathkeys;
 	List

Re: POC: GROUP BY optimization

2024-04-16 Thread Andrei Lepikhov


On 4/12/24 06:44, Tom Lane wrote:

* The very first hunk causes get_eclass_for_sort_expr to have
side-effects on existing EquivalenceClass data structures.
What is the argument that that's not going to cause problems?
At the very least it introduces asymmetry, in that the first
caller wins the chance to change the EC's ec_sortref and later
callers won't be able to.

Let me resolve issues bit by bit.
Addressing your first note, 'asymmetry,' I discovered this part of the 
code. Before the 8d83a5d, it could cause some artefacts, but since then, 
a kind of asymmetry has been introduced into the GROUP-BY clause list.
As you can see in the attached patch, GROUP-BY reordering works even 
when the asymmetry of the 8d83a5d chooses different keys.


At the same time, I agree that implicitly setting anything in an 
exporting function, which should just look up for EC, is a questionable 
coding style. To avoid possible usage issues (in extensions, for 
example), I propose setting it up afterwards, explicitly forcing this 
action by input parameter - see my attempt in the attachment.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index a619ff9177..bc08dfadaf 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -652,18 +652,7 @@ get_eclass_for_sort_expr(PlannerInfo *root,
 
 			if (opcintype == cur_em->em_datatype &&
 equal(expr, cur_em->em_expr))
-			{
-/*
- * Match!
- *
- * Copy the sortref if it wasn't set yet.  That may happen if
- * the ec was constructed from a WHERE clause, i.e. it doesn't
- * have a target reference at all.
- */
-if (cur_ec->ec_sortref == 0 && sortref > 0)
-	cur_ec->ec_sortref = sortref;
-return cur_ec;
-			}
+return cur_ec; /* Match! */
 		}
 	}
 
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 1d61881a6b..5d9597adcd 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1355,7 +1355,8 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
 	,
 	tlist,
 	false,
-	);
+	,
+	false);
 	/* It's caller error if not all clauses were sortable */
 	Assert(sortable);
 	return result;
@@ -1379,13 +1380,16 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
  * to remove any clauses that can be proven redundant via the eclass logic.
  * Even though we'll have to hash in that case, we might as well not hash
  * redundant columns.)
+ * *set_ec_sortref is true if caller wants to set up ec_sortref field in
+ * the pathkey Equivalence Class, if it have not initialized beforehand.
  */
 List *
 make_pathkeys_for_sortclauses_extended(PlannerInfo *root,
 	   List **sortclauses,
 	   List *tlist,
 	   bool remove_redundant,
-	   bool *sortable)
+	   bool *sortable,
+	   bool set_ec_sortref)
 {
 	List	   *pathkeys = NIL;
 	ListCell   *l;
@@ -1409,6 +1413,13 @@ make_pathkeys_for_sortclauses_extended(PlannerInfo *root,
 		   sortcl->nulls_first,
 		   sortcl->tleSortGroupRef,
 		   true);
+		if (pathkey->pk_eclass->ec_sortref == 0 && set_ec_sortref)
+			/*
+			 * Copy the sortref if it wasn't set yet.  That may happen if
+			 * the ec was constructed from a WHERE clause, i.e. it doesn't
+			 * have a target reference at all.
+			 */
+			pathkey->pk_eclass->ec_sortref = sortcl->tleSortGroupRef;
 
 		/* Canonical form eliminates redundant ordering keys */
 		if (!pathkey_is_redundant(pathkey, pathkeys))
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5320da51a0..dee98c9a90 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3391,7 +3391,7 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 		/*
 		 * With a plain GROUP BY list, we can remove any grouping items that
 		 * are proven redundant by EquivalenceClass processing.  For example,
-		 * we can remove y given "WHERE x = y GROUP BY x, y".  These aren't
+		 * we can remove y given "WHERE x get_eclass_for_sort_expr= y GROUP BY x, y".  These aren't
 		 * especially common cases, but they're nearly free to detect.  Note
 		 * that we remove redundant items from processed_groupClause but not
 		 * the original parse->groupClause.
@@ -3403,7 +3403,8 @@ standard_qp_callback(PlannerInfo *root, void *extra)
    >processed_groupClause,
    tlist,
    true,
-   );
+   ,
+   true);
 		if (!sortable)
 		{
 			/* Can't sort; no point in considering aggregate ordering either */
@@ -3453,7 +3454,8 @@ standard_qp_callback(PlannerInfo *root, void *extra)
    >processed_distinctClause,
    tlist,
    true,
-   );
+   ,
+

Re: POC: GROUP BY optimization

2024-04-11 Thread Andrei Lepikhov


On 4/12/24 06:44, Tom Lane wrote:

If this patch were producing better results I'd be more excited
about putting more work into it.  But on the basis of what I'm
seeing right now, I think maybe we ought to give up on it.
First, thanks for the deep review - sometimes, only a commit gives us a 
chance to get such observation :))).
On a broader note, introducing automatic group-by-order choosing is a 
step towards training the optimiser to handle poorly tuned incoming 
queries. While it's true that this may initially impact performance, 
it's crucial to weigh the potential benefits. So, beforehand, we should 
agree: Is it worth it?
If yes, I would say I see how often hashing doesn't work in grouping. 
Sometimes because of estimation errors, sometimes because grouping 
already has sorted input, sometimes in analytical queries when planner 
doesn't have enough memory for hashing. In analytical cases, the only 
way to speed up queries sometimes is to be smart with features like 
IncrementalSort and this one.
About low efficiency. Remember the previous version of the GROUP-BY 
optimisation - we disagreed on operator costs and the cost model in 
general. In the current version, we went the opposite - adding small 
features step-by-step. The current commit contains an integral part of 
the feature and is designed for safely testing the approach and adding 
more profitable parts like choosing group-by-order according to distinct 
values or unique indexes on grouping columns.
I have passed through the code being steered by the issues explained in 
detail. I see seven issues. Two of them definitely should be scrutinised 
right now, and I'm ready to do that.


--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2024-04-11 Thread Tom Lane

I'm late to the party, and I apologize for not having paid attention
to this thread for months ... but I am wondering why 0452b461b got
committed.  It seems to add a substantial amount of complexity and
planner cycles in return for not much.  The reason why I'm dubious
about it can be found in this comment that the patch deleted:

- * In principle it might be interesting to consider other orderings of the
- * GROUP BY elements, which could match the sort ordering of other
- * possible plans (eg an indexscan) and thereby reduce cost.  We don't
- * bother with that, though.  Hashed grouping will frequently win anyway.

Indeed, if you remove the

SET enable_hashagg = off;
SET enable_seqscan = off;

lines added to aggregates.sql, you find that every single one of the
test cases added there changes plan, except for the one case that is
there to prove that the planner *doesn't* pick this type of plan.
That shows that the planner doesn't believe any of the plans developed
here are cheaper than other alternatives it would normally consider
(usually HashAgg).  So it sure seems like we're spending a lot of
effort on uninteresting plans.  Maybe this is an artifact of testing
against toy tables and a useful win could be shown on bigger tables,
but I don't see any indication in the thread that anyone actually
demonstrated that.

I might hold still for this patch anyway if the patch were simpler
and more obviously correct, but there are scary bits in it:

* The very first hunk causes get_eclass_for_sort_expr to have
side-effects on existing EquivalenceClass data structures.
What is the argument that that's not going to cause problems?
At the very least it introduces asymmetry, in that the first
caller wins the chance to change the EC's ec_sortref and later
callers won't be able to.

* I'm pretty unconvinced by group_keys_reorder_by_pathkeys (which
I notice has already had one band-aid added to it since commit).
In particular, it seems to believe that the pathkeys and clauses
lists match one-for-one, but I seriously doubt that that invariant
remains guaranteed after the cleanup steps

/* append the remaining group pathkeys (will be treated as not sorted) */
*group_pathkeys = list_concat_unique_ptr(new_group_pathkeys,
 *group_pathkeys);
*group_clauses = list_concat_unique_ptr(new_group_clauses,
*group_clauses);

For that to be reliable, the SortGroupClauses added to
new_group_clauses in the main loop have to be exactly those
that are associated with the same pathkeys in the old lists.
I doubt that that's necessarily true in the presence of redundant
grouping clauses.  (Maybe we can't get here with any redundant
grouping clauses, but still, we don't really guarantee uniqueness of
SortGroupClauses, much less that they are never copied which is what
you need if you want to believe that pointer equality is sufficient
for de-duping here.  PathKeys are explicitly made to be safe to compare
pointer-wise, but I know of no such guarantee for SortGroupClauses.)

* It seems like root->processed_groupClause no longer has much to do
with the grouping behavior, which is scary given how much code still
believes that it does.  I suspect there are bugs lurking there, and
am not comforted by the fact that the patch changed exactly nothing
in the pathnodes.h documentation of that field.  This comment looks
pretty silly now too:

/* Preprocess regular GROUP BY clause, if any */
root->processed_groupClause = list_copy(parse->groupClause);

What "preprocessing" is going on there?  This comment was adequate
before, when the code line invoked preprocess_groupclause which had
a bunch of commentary; but now it has to stand on its own and it's
not doing a great job of that.

* Speaking of pathnodes.h, PathKeyInfo is a pretty uninformative node
type name, and the comments provided for it are not going to educate
anybody.  What is the "association" between the pathkeys and clauses?
I guess the clauses are supposed to be SortGroupClauses, but not all
pathkeys match up to SortGroupClauses, so what then?  This is very
underspecified, and fuzzy thinking about data structures tends to lead
to bugs.

So I'm quite afraid that there are still bugs lurking here.
What's more, given that the plans this patch makes apparently
seldom win when you don't put a thumb on the scales, it might
take a very long time to isolate those bugs.  If the patch
produces a faulty plan (e.g. not sorted properly) we'd never
notice if that plan isn't chosen, and even if it is chosen
it probably wouldn't produce anything as obviously wrong as
a crash.

If this patch were producing better results I'd be more excited
about putting more work into it.  But on the basis of what I'm
seeing right now, I think maybe we ought to give up on it.

regards, tom lane

Re: POC: GROUP BY optimization

2024-02-21 Thread Andrei Lepikhov

On 22/2/2024 13:35, Richard Guo wrote:

The avg() function on integer argument is commonly used in
aggregates.sql.  I don't think this is an issue.  See the first test
query in aggregates.sql.

Make sense

 > it should be parallel to the test cases for utilize the ordering of
 > index scan and subquery scan.

Also, I'm unsure about removing the disabling of the
max_parallel_workers_per_gather parameter. Have you discovered the
domination of the current plan over the partial one? Do the cost
fluctuations across platforms not trigger a parallel plan?

The table used for testing contains only 100 tuples, which is the size
of only one page.  I don't believe it would trigger any parallel plans,
unless we manually change min_parallel_table_scan_size.
I don't intend to argue it, but just for the information, I frequently 
reduce it to zero, allowing PostgreSQL to make a decision based on 
costs. It sometimes works much better, because one small table in multi 
join can disallow an effective parallel plan.

What's more, I suggest to address here the complaint from [1]. As I
see,
cost difference between Sort and IncrementalSort strategies in that
case
is around 0.5. To make the test more stable I propose to change it a
bit
and add a limit:
SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
It makes efficacy of IncrementalSort more obvious difference around 10
cost points.

I don't think that's necessary.  With Incremental Sort the final cost
is:

     GroupAggregate  (cost=1.66..19.00 rows=100 width=25)

while with full Sort it is:

     GroupAggregate  (cost=16.96..19.46 rows=100 width=25)

With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
is cheaper on total cost.  Not to say that even if somehow we decide the
two paths are fuzzily the same on total cost, the first path still
dominates because its startup cost is much cheaper.
As before, I won't protest here - it needs some computations about how 
much cost can be added by bulk extension of the relation blocks. If 
Maxim will answer that it's enough to resolve his issue, why not?

--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2024-02-21 Thread Richard Guo

On Thu, Feb 22, 2024 at 12:18 PM Andrei Lepikhov 
wrote:

> On 22/2/2024 09:09, Richard Guo wrote:
> > I looked through the v2 patch and have two comments.
> >
> > * The test case under "Check we don't pick aggregate path key instead of
> > grouping path key" does not have EXPLAIN to show the plan.  So how can
> > we ensure we do not mistakenly select the aggregate pathkeys instead of
> > the grouping pathkeys?
>


> I confirm it works correctly. I am not sure about the stability of the
> zeros number in the output on different platforms here:
> avg
> 
>   4.
>   5.
> It was why I'd used the format() function before. So, may we elaborate
> on the test and restrict the output?


The avg() function on integer argument is commonly used in
aggregates.sql.  I don't think this is an issue.  See the first test
query in aggregates.sql.

SELECT avg(four) AS avg_1 FROM onek;
   avg_1

 1.5000
(1 row)


> > * I don't think the test case introduced by e1b7fde418 is still needed,
> > because we already have one under "Utilize the ordering of merge join to
> > avoid a full Sort operation".  This kind of test case is just to ensure
> > that we are able to utilize the ordering of the subplans underneath.  So
> > it should be parallel to the test cases for utilize the ordering of
> > index scan and subquery scan.
>
> Also, I'm unsure about removing the disabling of the
> max_parallel_workers_per_gather parameter. Have you discovered the
> domination of the current plan over the partial one? Do the cost
> fluctuations across platforms not trigger a parallel plan?


The table used for testing contains only 100 tuples, which is the size
of only one page.  I don't believe it would trigger any parallel plans,
unless we manually change min_parallel_table_scan_size.


> What's more, I suggest to address here the complaint from [1]. As I see,
> cost difference between Sort and IncrementalSort strategies in that case
> is around 0.5. To make the test more stable I propose to change it a bit
> and add a limit:
> SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
> It makes efficacy of IncrementalSort more obvious difference around 10
> cost points.


I don't think that's necessary.  With Incremental Sort the final cost
is:

GroupAggregate  (cost=1.66..19.00 rows=100 width=25)

while with full Sort it is:

GroupAggregate  (cost=16.96..19.46 rows=100 width=25)

With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
is cheaper on total cost.  Not to say that even if somehow we decide the
two paths are fuzzily the same on total cost, the first path still
dominates because its startup cost is much cheaper.

Thanks
Richard

Re: POC: GROUP BY optimization

2024-02-21 Thread Andrei Lepikhov

On 22/2/2024 09:09, Richard Guo wrote:

On Wed, Feb 21, 2024 at 6:20 PM Alexander Korotkov > wrote:

Hi, Richard!

> What do you think about the revisions for the test cases?

I've rebased your patch upthread. Did some minor beautifications.

> * The table 'btg' is inserted with 1 tuples, which seems a bit
> expensive for a test. I don't think we need such a big table to test
> what we want.

Your patch reduces the number of rows to 1000 tuples. I found it
possible to further reduce it to 100 tuples. That also allowed me to
save the plan in the test case introduced by e1b7fde418.

Please check if you're OK with the patch attached.

I looked through the v2 patch and have two comments.

* The test case under "Check we don't pick aggregate path key instead of
grouping path key" does not have EXPLAIN to show the plan. So how can
we ensure we do not mistakenly select the aggregate pathkeys instead of
the grouping pathkeys?
I confirm it works correctly. I am not sure about the stability of the
zeros number in the output on different platforms here:

avg

4.
5.
It was why I'd used the format() function before. So, may we elaborate
on the test and restrict the output?

* I don't think the test case introduced by e1b7fde418 is still needed,
because we already have one under "Utilize the ordering of merge join to
avoid a full Sort operation". This kind of test case is just to ensure
that we are able to utilize the ordering of the subplans underneath. So
it should be parallel to the test cases for utilize the ordering of
index scan and subquery scan.
I confirm, this version also checks ec_sortref initialization in the
case when ec are contructed from WHERE clause. Generally, I like more
one test for one issue instead of one test for all at once. But it works
and I don't have any reason to dispute it.

Also, I'm unsure about removing the disabling of the
max_parallel_workers_per_gather parameter. Have you discovered the
domination of the current plan over the partial one? Do the cost
fluctuations across platforms not trigger a parallel plan?

What's more, I suggest to address here the complaint from [1]. As I see,
cost difference between Sort and IncrementalSort strategies in that case
is around 0.5. To make the test more stable I propose to change it a bit
and add a limit:

SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
It makes efficacy of IncrementalSort more obvious difference around 10
cost points.

[1]
https://www.postgresql.org/message-id/CACG=ezaYM1tr6Lmp8PRH1aeZq=rbkxeotwgzmclad5mphfw...@mail.gmail.com

--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2024-02-21 Thread Richard Guo

On Wed, Feb 21, 2024 at 6:20 PM Alexander Korotkov 
wrote:

> Hi, Richard!
>
> > What do you think about the revisions for the test cases?
>
> I've rebased your patch upthread.  Did some minor beautifications.
>
> > * The table 'btg' is inserted with 1 tuples, which seems a bit
> > expensive for a test.  I don't think we need such a big table to test
> > what we want.
>
> Your patch reduces the number of rows to 1000 tuples.  I found it
> possible to further reduce it to 100 tuples.  That also allowed me to
> save the plan in the test case introduced by e1b7fde418.
>
> Please check if you're OK with the patch attached.

I looked through the v2 patch and have two comments.

* The test case under "Check we don't pick aggregate path key instead of
grouping path key" does not have EXPLAIN to show the plan.  So how can
we ensure we do not mistakenly select the aggregate pathkeys instead of
the grouping pathkeys?

* I don't think the test case introduced by e1b7fde418 is still needed,
because we already have one under "Utilize the ordering of merge join to
avoid a full Sort operation".  This kind of test case is just to ensure
that we are able to utilize the ordering of the subplans underneath.  So
it should be parallel to the test cases for utilize the ordering of
index scan and subquery scan.

See the attached v3 patch.  I also made cosmetic tweaks to the comments,
and simplified a test case query.

Thanks
Richard

v3-0001-Multiple-revisions-to-the-GROUP-BY-reordering-tests.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-02-21 Thread Maxim Orlov

Hi!

Another issue on test introduced in 0452b461bc405. I think it may be
unstable in some circumstances.
For example, if we'll try to use different BLCKSZ. See, I've made a little
change in the number of tuples to be inserted:

$ git diff
diff --git a/src/test/regress/sql/aggregates.sql
b/src/test/regress/sql/aggregates.sql
index d6ed5d0eff..414078d4ec 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1187,7 +1187,7 @@ CREATE TABLE btg AS SELECT
   i % 100 AS y,
   'abc' || i % 10 AS z,
   i AS w
-FROM generate_series(1,1) AS i;
+FROM generate_series(1,11900) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x,y);
 ANALYZE btg;

And the bulk extension is kicked, so we got zeroed pages in the relation.
The plane is also changed,
switched to seq scan from index scan:
@@ -2734,7 +2734,7 @@
   i % 100 AS y,
   'abc' || i % 10 AS z,
   i AS w
-FROM generate_series(1,1) AS i;
+FROM generate_series(1,11900) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x,y);
 ANALYZE btg;
 -- GROUP BY optimization by reorder columns by frequency
@@ -2760,62 +2760,57 @@

 -- Engage incremental sort
 explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
-   QUERY PLAN
--
+  QUERY PLAN
+--
  Group
Group Key: x, y, z, w
-   ->  Incremental Sort
+   ->  Sort
  Sort Key: x, y, z, w
- Presorted Key: x, y
- ->  Index Scan using btg_x_y_idx on btg
-(6 rows)
+ ->  Seq Scan on btg
+(5 rows)
... and so on.

So, my proposal is simple. I think we need not just "ANALYZE btg", but
"VACUUM ANALYZE btg", to get rid of zeroed pages in this particular
case. PFA corresponding patch.

-- 
Best regards,
Maxim Orlov.


0001-Force-VACUUM-to-avoid-zeroed-pages-from-bulk-extensi.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-02-21 Thread Alexander Korotkov

Hi, Richard!

> What do you think about the revisions for the test cases?

I've rebased your patch upthread.  Did some minor beautifications.

> * The table 'btg' is inserted with 1 tuples, which seems a bit
> expensive for a test.  I don't think we need such a big table to test
> what we want.

Your patch reduces the number of rows to 1000 tuples.  I found it
possible to further reduce it to 100 tuples.  That also allowed me to
save the plan in the test case introduced by e1b7fde418.

Please check if you're OK with the patch attached.

--
Regards,
Alexander Korotkov


0001-Multiple-revises-for-the-GROUP-BY-reordering-test-v2.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-02-21 Thread Richard Guo

On Fri, Feb 2, 2024 at 12:40 PM Andrei Lepikhov 
wrote:

> On 2/2/2024 11:06, Richard Guo wrote:
> >
> > On Fri, Feb 2, 2024 at 11:32 AM Richard Guo  > > wrote:
> >
> > On Fri, Feb 2, 2024 at 10:02 AM Tom Lane  > > wrote:
> >
> > One of the test cases added by this commit has not been very
> > stable in the buildfarm.  Latest example is here:
> >
> >
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04
> <
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04
> >
> >
> > and I've seen similar failures intermittently on other machines.
> >
> > I'd suggest building this test atop a table that is more stable
> > than pg_class.  You're just waving a red flag in front of a bull
> > if you expect stable statistics from that during a regression
> run.
> > Nor do I see any particular reason for pg_class to be especially
> > suited to the test.
> >
> >
> > Yeah, it's not a good practice to use pg_class in this place.  While
> > looking through the test cases added by this commit, I noticed some
> > other minor issues that are not great.  Such as
> >
> > * The table 'btg' is inserted with 1 tuples, which seems a bit
> > expensive for a test.  I don't think we need such a big table to test
> > what we want.
> >
> > * I don't see why we need to manipulate GUC max_parallel_workers and
> > max_parallel_workers_per_gather.
> >
> > * I think we'd better write the tests with the keywords being all
> upper
> > or all lower.  A mixed use of upper and lower is not great. Such as
> in
> >
> >  explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
> >
> > * Some comments for the test queries are not easy to read.
> >
> > * For this statement
> >
> >  CREATE INDEX idx_y_x_z ON btg(y,x,w);
> >
> > I think the index name would cause confusion.  It creates an index on
> > columns y, x and w, but the name indicates an index on y, x and z.
> >
> > I'd like to write a draft patch for the fixes.
> >
> >
> > Here is the draft patch that fixes the issues I complained about in
> > upthread.
>


> I passed through the patch. Looks like it doesn't break anything. Why do
> you prefer to use count(*) in EXPLAIN instead of plain targetlist, like
> "SELECT x,y,..."?


Nothing special.  Just making the test cases consistent as much as
possible.


> Also, according to the test mentioned by Tom:
> 1. I see, PG uses IndexScan on (x,y). So, column x will be already
> sorted before the MergeJoin. Why not use Incremental Sort on (x,z,w)
> instead of full sort?


I think that's because the planner chooses to use (z, w, x) to perform
the mergejoin.  I did not delve into the details, but I guess the cost
estimation decides this is cheaper.

Hi Alexander,

What do you think about the revisions for the test cases?

Thanks
Richard

Re: POC: GROUP BY optimization

2024-02-01 Thread Andrei Lepikhov


On 2/2/2024 11:06, Richard Guo wrote:


On Fri, Feb 2, 2024 at 11:32 AM Richard Guo > wrote:


On Fri, Feb 2, 2024 at 10:02 AM Tom Lane mailto:t...@sss.pgh.pa.us>> wrote:

One of the test cases added by this commit has not been very
stable in the buildfarm.  Latest example is here:


https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04
 


and I've seen similar failures intermittently on other machines.

I'd suggest building this test atop a table that is more stable
than pg_class.  You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.


Yeah, it's not a good practice to use pg_class in this place.  While
looking through the test cases added by this commit, I noticed some
other minor issues that are not great.  Such as

* The table 'btg' is inserted with 1 tuples, which seems a bit
expensive for a test.  I don't think we need such a big table to test
what we want.

* I don't see why we need to manipulate GUC max_parallel_workers and
max_parallel_workers_per_gather.

* I think we'd better write the tests with the keywords being all upper
or all lower.  A mixed use of upper and lower is not great. Such as in

     explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;

* Some comments for the test queries are not easy to read.

* For this statement

     CREATE INDEX idx_y_x_z ON btg(y,x,w);

I think the index name would cause confusion.  It creates an index on
columns y, x and w, but the name indicates an index on y, x and z.

I'd like to write a draft patch for the fixes.


Here is the draft patch that fixes the issues I complained about in
upthread.
I passed through the patch. Looks like it doesn't break anything. Why do 
you prefer to use count(*) in EXPLAIN instead of plain targetlist, like 
"SELECT x,y,..."?

Also, according to the test mentioned by Tom:
1. I see, PG uses IndexScan on (x,y). So, column x will be already 
sorted before the MergeJoin. Why not use Incremental Sort on (x,z,w) 
instead of full sort?
2. For memo, IMO, this test shows us the future near perspective of this 
feature: It is cheaper to use grouping order (w,z) instead of (z,w).


--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2024-02-01 Thread Richard Guo

On Fri, Feb 2, 2024 at 11:32 AM Richard Guo  wrote:

> On Fri, Feb 2, 2024 at 10:02 AM Tom Lane  wrote:
>
>> One of the test cases added by this commit has not been very
>> stable in the buildfarm.  Latest example is here:
>>
>>
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04
>>
>> and I've seen similar failures intermittently on other machines.
>>
>> I'd suggest building this test atop a table that is more stable
>> than pg_class.  You're just waving a red flag in front of a bull
>> if you expect stable statistics from that during a regression run.
>> Nor do I see any particular reason for pg_class to be especially
>> suited to the test.
>
>
> Yeah, it's not a good practice to use pg_class in this place.  While
> looking through the test cases added by this commit, I noticed some
> other minor issues that are not great.  Such as
>
> * The table 'btg' is inserted with 1 tuples, which seems a bit
> expensive for a test.  I don't think we need such a big table to test
> what we want.
>
> * I don't see why we need to manipulate GUC max_parallel_workers and
> max_parallel_workers_per_gather.
>
> * I think we'd better write the tests with the keywords being all upper
> or all lower.  A mixed use of upper and lower is not great. Such as in
>
> explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
>
> * Some comments for the test queries are not easy to read.
>
> * For this statement
>
> CREATE INDEX idx_y_x_z ON btg(y,x,w);
>
> I think the index name would cause confusion.  It creates an index on
> columns y, x and w, but the name indicates an index on y, x and z.
>
> I'd like to write a draft patch for the fixes.
>

Here is the draft patch that fixes the issues I complained about in
upthread.

Thanks
Richard


v1-0001-Multiple-revises-for-the-GROUP-BY-reordering-tests.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-02-01 Thread Andrei Lepikhov


On 2/2/2024 09:02, Tom Lane wrote:

Alexander Korotkov  writes:

I'm going to push this if there are no objections.


One of the test cases added by this commit has not been very
stable in the buildfarm.  Latest example is here:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04

and I've seen similar failures intermittently on other machines.

I'd suggest building this test atop a table that is more stable
than pg_class.  You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.

Yeah, It is my fault. Please, see in the attachment the patch fixing that.
--
regards,
Andrei Lepikhov
Postgres Professional
From 11a049d95ee48e38ad569aab7663d8de91f946ad Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 2 Feb 2024 10:39:55 +0700
Subject: [PATCH] Replace the GROUP-BY optimization test with the same based on
 something less volatile when the pg_class relation.

---
 src/test/regress/expected/aggregates.out | 32 +++-
 src/test/regress/sql/aggregates.sql  |  9 +++
 2 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/src/test/regress/expected/aggregates.out 
b/src/test/regress/expected/aggregates.out
index 7a73c19314..c2e1b8c9ed 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2873,7 +2873,6 @@ SELECT y,x,array_agg(distinct w) FROM btg WHERE y < 0 
GROUP BY x,y;
 (6 rows)
 
 RESET enable_incremental_sort;
-DROP TABLE btg;
 -- The case, when scanning sort order correspond to aggregate sort order but
 -- can not be found in the group-by list
 CREATE TABLE agg_sort_order (c1 int PRIMARY KEY, c2 int);
@@ -2901,32 +2900,31 @@ DROP TABLE agg_sort_order CASCADE;
 SET enable_hashjoin = off;
 SET enable_nestloop = off;
 explain (COSTS OFF)
-SELECT c1.relname,c1.relpages
-FROM pg_class c1 JOIN pg_class c2 ON (c1.relname=c2.relname AND 
c1.relpages=c2.relpages)
-GROUP BY c1.reltuples,c1.relpages,c1.relname
-ORDER BY c1.relpages, c1.relname, c1.relpages*c1.relpages;
- QUERY PLAN
  
--
+SELECT b1.x,b1.w FROM btg b1 JOIN btg b2 ON (b1.z=b2.z AND b1.w=b2.w)
+GROUP BY b1.x,b1.z,b1.w ORDER BY b1.z, b1.w, b1.x*b1.x;
+QUERY PLAN 
+---
  Incremental Sort
-   Sort Key: c1.relpages, c1.relname, ((c1.relpages * c1.relpages))
-   Presorted Key: c1.relpages, c1.relname
+   Sort Key: b1.z, b1.w, ((b1.x * b1.x))
+   Presorted Key: b1.z, b1.w
->  Group
- Group Key: c1.relpages, c1.relname, c1.reltuples
+ Group Key: b1.z, b1.w, b1.x
  ->  Incremental Sort
-   Sort Key: c1.relpages, c1.relname, c1.reltuples
-   Presorted Key: c1.relpages, c1.relname
+   Sort Key: b1.z, b1.w, b1.x
+   Presorted Key: b1.z, b1.w
->  Merge Join
- Merge Cond: ((c1.relpages = c2.relpages) AND (c1.relname 
= c2.relname))
+ Merge Cond: ((b1.z = b2.z) AND (b1.w = b2.w))
  ->  Sort
-   Sort Key: c1.relpages, c1.relname
-   ->  Seq Scan on pg_class c1
+   Sort Key: b1.z, b1.w
+   ->  Seq Scan on btg b1
  ->  Sort
-   Sort Key: c2.relpages, c2.relname
-   ->  Seq Scan on pg_class c2
+   Sort Key: b2.z, b2.w
+   ->  Seq Scan on btg b2
 (16 rows)
 
 RESET enable_hashjoin;
 RESET enable_nestloop;
+DROP TABLE btg;
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;
diff --git a/src/test/regress/sql/aggregates.sql 
b/src/test/regress/sql/aggregates.sql
index 916dbf908f..3548fbb8db 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1229,8 +1229,6 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT y,x,array_agg(distinct w) FROM btg WHERE y < 0 GROUP BY x,y;
 RESET enable_incremental_sort;
 
-DROP TABLE btg;
-
 -- The case, when scanning sort order correspond to aggregate sort order but
 -- can not be found in the group-by list
 CREATE TABLE agg_sort_order (c1 int PRIMARY KEY, c2 int);
@@ -1245,13 +1243,12 @@ DROP TABLE agg_sort_order CASCADE;
 SET enable_hashjoin = off;
 SET enable_nestloop = off;
 explain (COSTS OFF)
-SELECT c1.relname,c1.relpages
-FROM pg_class c1 JOIN pg_class c2 ON (c1.relname=c2.relname AND 
c1.relpages=c2.relpages)
-GROUP BY c1.reltuples,c1.relpages,c1.relname
-ORDER BY c1.relpages, c1.relname, c1.relpages*c1.relpages;
+SELECT b1.x,b1.w FROM btg b1 JOIN

Re: POC: GROUP BY optimization

2024-02-01 Thread Richard Guo

On Fri, Feb 2, 2024 at 10:02 AM Tom Lane  wrote:

> Alexander Korotkov  writes:
> > I'm going to push this if there are no objections.
>
> One of the test cases added by this commit has not been very
> stable in the buildfarm.  Latest example is here:
>
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04
>
> and I've seen similar failures intermittently on other machines.
>
> I'd suggest building this test atop a table that is more stable
> than pg_class.  You're just waving a red flag in front of a bull
> if you expect stable statistics from that during a regression run.
> Nor do I see any particular reason for pg_class to be especially
> suited to the test.

Yeah, it's not a good practice to use pg_class in this place.  While
looking through the test cases added by this commit, I noticed some
other minor issues that are not great.  Such as

* The table 'btg' is inserted with 1 tuples, which seems a bit
expensive for a test.  I don't think we need such a big table to test
what we want.

* I don't see why we need to manipulate GUC max_parallel_workers and
max_parallel_workers_per_gather.

* I think we'd better write the tests with the keywords being all upper
or all lower.  A mixed use of upper and lower is not great. Such as in

explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;

* Some comments for the test queries are not easy to read.

* For this statement

CREATE INDEX idx_y_x_z ON btg(y,x,w);

I think the index name would cause confusion.  It creates an index on
columns y, x and w, but the name indicates an index on y, x and z.

I'd like to write a draft patch for the fixes.

Thanks
Richard

Re: POC: GROUP BY optimization

2024-02-01 Thread Tom Lane

Alexander Korotkov  writes:
> I'm going to push this if there are no objections.

One of the test cases added by this commit has not been very
stable in the buildfarm.  Latest example is here:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion=2024-02-01%2021%3A28%3A04

and I've seen similar failures intermittently on other machines.

I'd suggest building this test atop a table that is more stable
than pg_class.  You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.

regards, tom lane

Re: POC: GROUP BY optimization

2024-01-26 Thread Robert Haas

On Fri, Jan 26, 2024 at 10:38 AM Tom Lane  wrote:
> Sadly, that's not a small task:
>
> * We'd need to put effort into assigning more realistic procost
> values --- preferably across the board, not just comparison functions.
> As long as all the comparison functions have procost 1.0, you're
> just flying blind.
>
> * As you mentioned, there'd need to be some accounting for the
> likely size of varlena inputs, and especially whether they might
> be toasted.
>
> * cost_sort knows nothing of the low-level sort algorithm improvements
> we've made in recent years, such as abbreviated keys.
>
> That's a lot of work, and I think it has to be done before we try
> to build infrastructure on top, not afterwards.

OK, that makes sense to me. Thanks.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: POC: GROUP BY optimization

2024-01-26 Thread Tom Lane

Robert Haas  writes:
> On Tue, Dec 26, 2023 at 10:23 PM Tom Lane  wrote:
>> I think it's a fool's errand to even try to separate different sort
>> column orderings by cost.  We simply do not have sufficiently accurate
>> cost information.  The previous patch in this thread got reverted because
>> of that (well, also some implementation issues, but mostly that), and
>> nothing has happened to make me think that another try will fare any
>> better.

> I'm late to the party, but I'd like to better understand what's being
> argued here.

What I am saying is that we don't have sufficiently accurate cost
information to support the sort of logic that got committed and
reverted before.  I did not mean to imply that it's not possible
to have such info, only that it is not present today.  IOW, what
I'm saying is that if you want to write code that tries to make
a cost-based preference of one sorting over another, you *first*
need to put in a bunch of legwork to create more accurate cost
numbers.  Trying to make such logic depend on the numbers we have
today is just going to result in garbage in, garbage out.

Sadly, that's not a small task:

* We'd need to put effort into assigning more realistic procost
values --- preferably across the board, not just comparison functions.
As long as all the comparison functions have procost 1.0, you're
just flying blind.

* As you mentioned, there'd need to be some accounting for the
likely size of varlena inputs, and especially whether they might
be toasted.

* cost_sort knows nothing of the low-level sort algorithm improvements
we've made in recent years, such as abbreviated keys.

That's a lot of work, and I think it has to be done before we try
to build infrastructure on top, not afterwards.

regards, tom lane

Re: POC: GROUP BY optimization

2024-01-26 Thread Robert Haas

On Tue, Dec 26, 2023 at 10:23 PM Tom Lane  wrote:
> I think it's a fool's errand to even try to separate different sort
> column orderings by cost.  We simply do not have sufficiently accurate
> cost information.  The previous patch in this thread got reverted because
> of that (well, also some implementation issues, but mostly that), and
> nothing has happened to make me think that another try will fare any
> better.

I'm late to the party, but I'd like to better understand what's being
argued here. If you're saying that, for some particular planner
problem, we should prefer a solution that doesn't need to know about
the relative cost of various sorts over one that does, I agree, for
exactly the reason that you state: our knowledge of sort costs won't
be reliable, and we will make mistakes. That's true in lots of
situations, not just related to sorts,
because estimation is a hard problem. Heuristics not based on cost are
going to be, in many cases, more accurate than heuristics based on
cost. They're also often cheaper, since they often let us reject
possible approaches very early, without all the bother of a cost
comparison.

But if you're saying that it's utterly impossible to know whether
sorting text will be cheaper or more expensive than sorting 4-byte
integers, and that if a particular problem can be solved only by
knowing which one is cheaper we should just give up, then I disagree.
In the absence of any other information, it must be right, at the very
least, to bank on varlena data types being more expensive to sort than
fixed-length data types. How much more expensive is hard to know,
because toasted blobs are going to be more expensive to sort than
short varlenas. But even before you reach the comparison function, a
pass-by-value datum has a significantly lower access cost than a
pass-by-reference datum. The fact that the pass-by-reference value
might be huge only compounds the problem.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: POC: GROUP BY optimization

2024-01-26 Thread vignesh C

On Thu, 25 Jan 2024 at 01:15, Alexander Korotkov  wrote:
>
> On Wed, Jan 24, 2024 at 7:38 PM Nathan Bossart  
> wrote:
> > A recent buildfarm failure [0] seems to indicate a name collision with the
> > "abc" index in the aggregates.sql test and the "abc" table in
> > namespace.sql.
> >
> > [0] 
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet=2024-01-24%2014%3A05%3A14
>
> Thank you for catching this.  Fixed.

Since the patch has been committed, I have marked this entry as committed.

Regards,
Vignesh

Re: POC: GROUP BY optimization

2024-01-24 Thread Alexander Korotkov

On Wed, Jan 24, 2024 at 7:38 PM Nathan Bossart  wrote:
> A recent buildfarm failure [0] seems to indicate a name collision with the
> "abc" index in the aggregates.sql test and the "abc" table in
> namespace.sql.
>
> [0] 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet=2024-01-24%2014%3A05%3A14

Thank you for catching this.  Fixed.

--
Regards,
Alexander Korotkov

Re: POC: GROUP BY optimization

2024-01-24 Thread Nathan Bossart

A recent buildfarm failure [0] seems to indicate a name collision with the
"abc" index in the aggregates.sql test and the "abc" table in
namespace.sql.

[0] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet=2024-01-24%2014%3A05%3A14

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: POC: GROUP BY optimization

2024-01-19 Thread Alexander Korotkov

Hi!

I've applied your changes with minor editing, thank you.

On Thu, Jan 18, 2024 at 11:49 AM Andrei Lepikhov
 wrote:
> >> * Part of the work performed in this patch overlaps with that of
> >> preprocess_groupclause.  They are both trying to adjust the ordering of
> >> the GROUP BY keys to match ORDER BY.  I wonder if it would be better to
> >> perform this work only once.
> >
> > Andrei, could you take a look.
> As I see, the PathKeyInfo list often contains duplicated pathkeys,
> coming from either sort_pathkeys or path->pathkeys orderings. So, I can
> propose to check duplicates each time (see step2.txt in attachment).

Actually I asked to recheck if we can cut some part of
preprocess_groupclause() given that we're reordering the pathkeys
later.  It seems that we can remove everything except the work with a
grouping set.  I've done this in the revised patchset.

I'm going to push this if there are no objections.

--
Regards,
Alexander Korotkov

0002-Explore-alternative-orderings-of-group-by-p-20240119.patch
Description: Binary data

0001-Generalize-common-code-of-adding-sort-befor-20240119.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-01-18 Thread Andrei Lepikhov


Just forgotten second attachment.

--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/pathkeys.c 
b/src/backend/optimizer/path/pathkeys.c
index 1095b73dac..b612420547 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -432,6 +432,21 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List 
**group_pathkeys,
return n;
 }
 
+static bool
+duplicated_pathkey_combination(List *infos, List *pathkeys)
+{
+   ListCell *lc;
+
+   foreach (lc, infos)
+   {
+   PathKeyInfo *info = lfirst_node(PathKeyInfo, lc);
+
+   if (compare_pathkeys(pathkeys, info->pathkeys) == 
PATHKEYS_EQUAL)
+   return true;
+   }
+   return false;
+}
+
 /*
  * get_useful_group_keys_orderings
  * Determine which orderings of GROUP BY keys are potentially 
interesting.
@@ -491,7 +506,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path 
*path)
 
if (n > 0 &&
(enable_incremental_sort || n == 
root->num_groupby_pathkeys) &&
-   compare_pathkeys(pathkeys, root->group_pathkeys) != 
PATHKEYS_EQUAL)
+   !duplicated_pathkey_combination(infos, pathkeys))
{
info = makeNode(PathKeyInfo);
info->pathkeys = pathkeys;
@@ -514,8 +529,9 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path 
*path)

   ,

   root->num_groupby_pathkeys);
 
-   if (n > 0 && compare_pathkeys(pathkeys, root->group_pathkeys) 
!= PATHKEYS_EQUAL &&
-   (enable_incremental_sort || n == 
list_length(root->sort_pathkeys)))
+   if (n > 0 &&
+   (enable_incremental_sort || n == 
list_length(root->sort_pathkeys)) &&
+   !duplicated_pathkey_combination(infos, pathkeys))
{
info = makeNode(PathKeyInfo);
info->pathkeys = pathkeys;

Re: POC: GROUP BY optimization

2024-01-18 Thread Andrei Lepikhov


On 16/1/2024 22:05, Alexander Korotkov wrote:

On Tue, Jan 16, 2024 at 4:48 AM Richard Guo  wrote:

* When trying to match the ordering of GROUP BY to that of ORDER BY in
get_useful_group_keys_orderings, this patch checks against the length of
path->pathkeys.  This does not make sense.  I guess this is a typo and
the real intention is to check against root->sort_pathkeys.

Thanks! It is really my blunder - fresh look works.


--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path 
*path)
root->num_groupby_pathkeys);

 if (n > 0 &&
-   (enable_incremental_sort || n == list_length(path->pathkeys)))
+   (enable_incremental_sort || n == list_length(root->sort_pathkeys)))

However, I think this is also incorrect.  When incremental sort is
disabled, it is reasonable to consider reordering the GROUP BY keys only
if the number of matching pathkeys is equal to the length of
root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'.


Hmm... Why should this be list_length(root->group_pathkeys) while
we're targeting root->sort_pathkeys.  I yet changed that to
list_length(root->sort_pathkeys).
I think, in the first case, when we are trying to arrange group-by keys 
according to underlying pathkeys with incremental sort = off, it makes 
sense to do if we fetch all group-by keys regardless of a more or equal 
number of path keys the incoming path contains. The code and test case 
are in step1.txt.



* When the original ordering of GROUP BY keys matches the path's
pathkeys or ORDER BY keys, get_useful_group_keys_orderings would
generate duplicate PathKeyInfos.  Consequently, this duplication would
lead to the creation and addition of identical paths multiple times.
This is not great.  Consider the query below:

create table t (a int, b int);
create index on t (a, b);
set enable_hashagg to off;

explain select count(*) from t group by a, b;
 QUERY PLAN
--
  GroupAggregate  (cost=0.15..97.27 rows=226 width=16)
Group Key: a, b
->  Index Only Scan using t_a_b_idx on t  (cost=0.15..78.06 rows=2260 
width=8)
(3 rows)

The same path with group keys {a, b} is created and added twice.


I tried to address that.


* Part of the work performed in this patch overlaps with that of
preprocess_groupclause.  They are both trying to adjust the ordering of
the GROUP BY keys to match ORDER BY.  I wonder if it would be better to
perform this work only once.


Andrei, could you take a look.
As I see, the PathKeyInfo list often contains duplicated pathkeys, 
coming from either sort_pathkeys or path->pathkeys orderings. So, I can 
propose to check duplicates each time (see step2.txt in attachment).



* When reordering the GROUP BY keys, I think the following checks need
some improvements.

+   /*
+* Since 1349d27 pathkey coming from underlying node can be in the
+* root->group_pathkeys but not in the processed_groupClause. So, we
+* should be careful here.
+*/
+   sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,
+   *group_clauses);
+   if (!sgc)
+   /* The grouping clause does not cover this pathkey */
+   break;

I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is
valid before calling get_sortgroupref_clause_noerr with it.  Also, how
can we guarantee that the returned GROUP BY item is sortable?  Should we
check that with OidIsValid(sgc->sortop)?


Added.

Reviewed it, looks good.

--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/pathkeys.c 
b/src/backend/optimizer/path/pathkeys.c
index 919d54dd79..1095b73dac 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -489,8 +489,9 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path 
*path)
n = group_keys_reorder_by_pathkeys(path->pathkeys, , 
,

   root->num_groupby_pathkeys);
 
-   if (n > 0 && compare_pathkeys(pathkeys, root->group_pathkeys) 
!= PATHKEYS_EQUAL &&
-   (enable_incremental_sort || n == 
list_length(path->pathkeys)))
+   if (n > 0 &&
+   (enable_incremental_sort || n == 
root->num_groupby_pathkeys) &&
+   compare_pathkeys(pathkeys, root->group_pathkeys) != 
PATHKEYS_EQUAL)
{
info = makeNode(PathKeyInfo);
info->pathkeys = pathkeys;
diff --git a/src/test/regress/expected/aggregates.out 
b/src/test/regress/expected/aggregates.out
index ca38e78f21..67dd20f375 100644
---

Re: POC: GROUP BY optimization

2024-01-16 Thread Alexander Korotkov

Hi!

Thank you for your review.  The revised patchset is attached.

On Tue, Jan 16, 2024 at 4:48 AM Richard Guo  wrote:
> On Mon, Jan 15, 2024 at 3:56 PM Alexander Korotkov  
> wrote:
>>
>> On Mon, Jan 15, 2024 at 8:42 AM Richard Guo  wrote:
>> > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov  
>> > wrote:
>> >>
>> >> Thank you for providing the test case relevant for this code change.
>> >> The revised patch incorporating this change is attached.  Now the
>> >> patchset looks good to me.  I'm going to push it if there are no
>> >> objections.
>> >
>> > Seems I'm late for the party.  Can we hold for several more days?  I'd
>> > like to have a review on this patch.
>>
>> Sure, NP.  I'll hold it till your review.
>
>
> Thanks.  Appreciate that.
>
> I have briefly reviewed this patch and here are some comments.
>
> * When trying to match the ordering of GROUP BY to that of ORDER BY in
> get_useful_group_keys_orderings, this patch checks against the length of
> path->pathkeys.  This does not make sense.  I guess this is a typo and
> the real intention is to check against root->sort_pathkeys.
>
> --- a/src/backend/optimizer/path/pathkeys.c
> +++ b/src/backend/optimizer/path/pathkeys.c
> @@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path 
> *path)
>root->num_groupby_pathkeys);
>
> if (n > 0 &&
> -   (enable_incremental_sort || n == list_length(path->pathkeys)))
> +   (enable_incremental_sort || n == 
> list_length(root->sort_pathkeys)))
>
> However, I think this is also incorrect.  When incremental sort is
> disabled, it is reasonable to consider reordering the GROUP BY keys only
> if the number of matching pathkeys is equal to the length of
> root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'.

Hmm... Why should this be list_length(root->group_pathkeys) while
we're targeting root->sort_pathkeys.  I yet changed that to
list_length(root->sort_pathkeys).

> * When the original ordering of GROUP BY keys matches the path's
> pathkeys or ORDER BY keys, get_useful_group_keys_orderings would
> generate duplicate PathKeyInfos.  Consequently, this duplication would
> lead to the creation and addition of identical paths multiple times.
> This is not great.  Consider the query below:
>
> create table t (a int, b int);
> create index on t (a, b);
> set enable_hashagg to off;
>
> explain select count(*) from t group by a, b;
> QUERY PLAN
> --
>  GroupAggregate  (cost=0.15..97.27 rows=226 width=16)
>Group Key: a, b
>->  Index Only Scan using t_a_b_idx on t  (cost=0.15..78.06 rows=2260 
> width=8)
> (3 rows)
>
> The same path with group keys {a, b} is created and added twice.

I tried to address that.

> * Part of the work performed in this patch overlaps with that of
> preprocess_groupclause.  They are both trying to adjust the ordering of
> the GROUP BY keys to match ORDER BY.  I wonder if it would be better to
> perform this work only once.

Andrei, could you take a look.

> * When reordering the GROUP BY keys, I think the following checks need
> some improvements.
>
> +   /*
> +* Since 1349d27 pathkey coming from underlying node can be in the
> +* root->group_pathkeys but not in the processed_groupClause. So, we
> +* should be careful here.
> +*/
> +   sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,
> +   *group_clauses);
> +   if (!sgc)
> +   /* The grouping clause does not cover this pathkey */
> +   break;
>
> I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is
> valid before calling get_sortgroupref_clause_noerr with it.  Also, how
> can we guarantee that the returned GROUP BY item is sortable?  Should we
> check that with OidIsValid(sgc->sortop)?

Added.

--
Regards,
Alexander Korotkov


0001-Generalize-common-code-of-adding-sort-befor-20240116.patch
Description: Binary data


0002-Explore-alternative-orderings-of-group-by-p-20240116.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-01-15 Thread Richard Guo

On Mon, Jan 15, 2024 at 3:56 PM Alexander Korotkov 
wrote:

> On Mon, Jan 15, 2024 at 8:42 AM Richard Guo 
> wrote:
> > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov 
> wrote:
> >>
> >> Thank you for providing the test case relevant for this code change.
> >> The revised patch incorporating this change is attached.  Now the
> >> patchset looks good to me.  I'm going to push it if there are no
> >> objections.
> >
> > Seems I'm late for the party.  Can we hold for several more days?  I'd
> > like to have a review on this patch.
>
> Sure, NP.  I'll hold it till your review.

Thanks.  Appreciate that.

I have briefly reviewed this patch and here are some comments.

* When trying to match the ordering of GROUP BY to that of ORDER BY in
get_useful_group_keys_orderings, this patch checks against the length of
path->pathkeys.  This does not make sense.  I guess this is a typo and
the real intention is to check against root->sort_pathkeys.

--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path
*path)
   root->num_groupby_pathkeys);

if (n > 0 &&
-   (enable_incremental_sort || n == list_length(path->pathkeys)))
+   (enable_incremental_sort || n ==
list_length(root->sort_pathkeys)))

However, I think this is also incorrect.  When incremental sort is
disabled, it is reasonable to consider reordering the GROUP BY keys only
if the number of matching pathkeys is equal to the length of
root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'.

* When the original ordering of GROUP BY keys matches the path's
pathkeys or ORDER BY keys, get_useful_group_keys_orderings would
generate duplicate PathKeyInfos.  Consequently, this duplication would
lead to the creation and addition of identical paths multiple times.
This is not great.  Consider the query below:

create table t (a int, b int);
create index on t (a, b);
set enable_hashagg to off;

explain select count(*) from t group by a, b;
QUERY PLAN
--
 GroupAggregate  (cost=0.15..97.27 rows=226 width=16)
   Group Key: a, b
   ->  Index Only Scan using t_a_b_idx on t  (cost=0.15..78.06 rows=2260
width=8)
(3 rows)

The same path with group keys {a, b} is created and added twice.

* Part of the work performed in this patch overlaps with that of
preprocess_groupclause.  They are both trying to adjust the ordering of
the GROUP BY keys to match ORDER BY.  I wonder if it would be better to
perform this work only once.

* When reordering the GROUP BY keys, I think the following checks need
some improvements.

+   /*
+* Since 1349d27 pathkey coming from underlying node can be in the
+* root->group_pathkeys but not in the processed_groupClause. So, we
+* should be careful here.
+*/
+   sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,
+   *group_clauses);
+   if (!sgc)
+   /* The grouping clause does not cover this pathkey */
+   break;

I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is
valid before calling get_sortgroupref_clause_noerr with it.  Also, how
can we guarantee that the returned GROUP BY item is sortable?  Should we
check that with OidIsValid(sgc->sortop)?

Thanks
Richard

Re: POC: GROUP BY optimization

2024-01-15 Thread Alena Rybakina


On 15.01.2024 12:46, Andrei Lepikhov wrote:

On 15/1/2024 13:42, Richard Guo wrote:


On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov 
mailto:aekorot...@gmail.com>> wrote:


    Thank you for providing the test case relevant for this code change.
    The revised patch incorporating this change is attached. Now the
    patchset looks good to me.  I'm going to push it if there are no
    objections.


Seems I'm late for the party.  Can we hold for several more days?  I'd
like to have a review on this patch.
Get on board! It looks like this feature needs as much review as 
possible (likewise SJE).


Hi! Thank you for your work on this issue! I believe that this will help 
the scheduler to make a more optimal query plan here and therefore speed 
up their execution.


I have reviewed patches and noticed that we can add some code 
refactoring. I have attached a diff file (group_by.diff) to this email.


The changes involve spelling corrections, renaming variables and porting 
some common parts.



In addition, I have a few questions, since some points in the code 
remained unclear to me.


1.  I didn't understand why we have a question in the comment next to 
the enable_group_by_reordering variable in 
src/backend/optimizer/path/pathkeys.c file, I assumed it was spelling 
and fixed it in the diff file.


2. Why do we set the variable (path = path_save) here 
(add_paths_to_grouping_rel function) if we change its variable below and 
we can pass path_save as a parameter?


foreach(lc2, pathkey_orderings)
{
    PathKeyInfo *info = (PathKeyInfo *) lfirst(lc2);

    /* restore the path (we replace it in the loop) */
    path = path_save;

    path = make_ordered_path(root,
                         grouped_rel,
                         path,
                         cheapest_path,
                         info->pathkeys);
    if (path == NULL)
    continue;

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 5aac6d66776..8be58fa2b0e 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,7 +29,7 @@
 #include "partitioning/partbounds.h"
 #include "utils/lsyscache.h"
 
-/* Consider reordering of GROUP BY keys? */
+/* Consider reordering of GROUP BY keys */
 bool		enable_group_by_reordering = true;
 
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
@@ -362,7 +362,7 @@ pathkeys_contained_in(List *keys1, List *keys2)
  *
  * Returns the number of GROUP BY keys with a matching pathkey.
  */
-static int
+static PathKeyInfo *
 group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 			   List **group_clauses,
 			   int num_groupby_pathkeys)
@@ -421,7 +421,16 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 	*group_clauses = list_concat_unique_ptr(new_group_clauses,
 			*group_clauses);
 
-	return n;
+	if (n > 0 &&
+		(enable_incremental_sort || n == list_length(*group_pathkeys)))
+	{
+		PathKeyInfo *info = makeNode(PathKeyInfo);
+		info->pathkeys = *group_pathkeys;
+		info->clauses = *group_clauses;
+		return info;
+	}
+
+	return NULL;
 }
 
 /*
@@ -436,7 +445,7 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
  *
  * - the original ordering, as specified by the GROUP BY clause,
  * - GROUP BY keys reordered to match 'path' ordering (as much as possible),
- * - GROUP BY keys to match target ORDER BY clause (as much as possible).
+ * - GROUP BY keys should match the target ORDER BY clause (as much as possible).
  */
 List *
 get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
@@ -475,20 +484,11 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 	 */
 	if (path->pathkeys)
 	{
-		int			n;
-
-		n = group_keys_reorder_by_pathkeys(path->pathkeys, , ,
+		info = group_keys_reorder_by_pathkeys(path->pathkeys, , ,
 		   root->num_groupby_pathkeys);
 
-		if (n > 0 &&
-			(enable_incremental_sort || n == list_length(path->pathkeys)))
-		{
-			info = makeNode(PathKeyInfo);
-			info->pathkeys = pathkeys;
-			info->clauses = clauses;
-
+		if (info)
 			infos = lappend(infos, info);
-		}
 	}
 
 	/*
@@ -497,21 +497,12 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 	 */
 	if (root->sort_pathkeys)
 	{
-		int			n;
-
-		n = group_keys_reorder_by_pathkeys(root->sort_pathkeys, ,
+		info = group_keys_reorder_by_pathkeys(root->sort_pathkeys, ,
 		   ,
 		   root->num_groupby_pathkeys);
 
-		if (n > 0 &&
-			(enable_incremental_sort || n == list_length(path->pathkeys)))
-		{
-			info = makeNode(PathKeyInfo);
-			info->pathkeys = pathkeys;
-			info->clauses = clauses;
-
+		if (info)
 			infos = lappend(infos, info);
-		}
 	}
 
 	return infos;
@@ -2163,27 +2154,26 @@ truncate_useless_pathkeys(PlannerInfo *root,
 		  RelOptInfo *rel,
 		  List *pathkeys)
 {

Re: POC: GROUP BY optimization

2024-01-15 Thread Andrei Lepikhov


On 15/1/2024 13:42, Richard Guo wrote:


On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov > wrote:


Thank you for providing the test case relevant for this code change.
The revised patch incorporating this change is attached.  Now the
patchset looks good to me.  I'm going to push it if there are no
objections.


Seems I'm late for the party.  Can we hold for several more days?  I'd
like to have a review on this patch.
Get on board! It looks like this feature needs as much review as 
possible (likewise SJE).


--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2024-01-14 Thread Alexander Korotkov

On Mon, Jan 15, 2024 at 8:42 AM Richard Guo  wrote:
> On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov  
> wrote:
>>
>> Thank you for providing the test case relevant for this code change.
>> The revised patch incorporating this change is attached.  Now the
>> patchset looks good to me.  I'm going to push it if there are no
>> objections.
>
> Seems I'm late for the party.  Can we hold for several more days?  I'd
> like to have a review on this patch.

Sure, NP.  I'll hold it till your review.

--
Regards,
Alexander Korotkov

Re: POC: GROUP BY optimization

2024-01-14 Thread Richard Guo

On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov 
wrote:

> Thank you for providing the test case relevant for this code change.
> The revised patch incorporating this change is attached.  Now the
> patchset looks good to me.  I'm going to push it if there are no
> objections.

Seems I'm late for the party.  Can we hold for several more days?  I'd
like to have a review on this patch.

Thanks
Richard

Re: POC: GROUP BY optimization

2024-01-14 Thread Alexander Korotkov

On Sun, Jan 14, 2024 at 2:14 PM Andrei Lepikhov
 wrote:
> On 13/1/2024 22:00, Alexander Korotkov wrote:
> > On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov
> >  wrote:
> >> On 11/1/2024 18:30, Alexander Korotkov wrote:
> >>> On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov  
> >>> wrote:
> > Hmm, I don't see this old code in these patches. Resend 0002-* because
> > of trailing spaces.
> 
> 
>  AFAIK, cfbot does not seek old versions of patchset parts in previous 
>  messages. So for it to run correctly, a new version of the whole 
>  patchset should be sent even if most patches are unchanged.
> >>>
> >>> Please, find the revised patchset with some refactoring and comments
> >>> improvement from me.  I'll continue looking into this.
> >> The patch looks better, thanks to your refactoring.
> >> I propose additional comments and tests to make the code more
> >> understandable (see attachment).
> >> I intended to look into this part of the code more, but the tests show a
> >> difference between PostgreSQL v.15 and v.16, which causes changes in the
> >> code of this feature.
> >
> > Makes sense.  I've incorporated your changes into the patchset.
> One more improvement. To underpin code change:
>
> -   return cur_ec;  /* Match! */
> +   {
> +   /*
> +* Match!
> +*
> +* Copy the sortref if it wasn't set yet. That may happen if
> +* the ec was constructed from WHERE clause, i.e. it doesn't
> +* have a target reference at all.
> +*/
> +   if (cur_ec->ec_sortref == 0 && sortref > 0)
> +   cur_ec->ec_sortref = sortref;
> +   return cur_ec;
> +   }
>
> I propose the test (see attachment). It shows why we introduce this
> change: GROUP-BY should juggle not only pathkeys generated by explicit
> sort operators but also planner-made, likewise in this example, by
> MergeJoin.

Thank you for providing the test case relevant for this code change.
The revised patch incorporating this change is attached.  Now the
patchset looks good to me.  I'm going to push it if there are no
objections.

--
Regards,
Alexander Korotkov


0002-Explore-alternative-orderings-of-group-by-p-20240115.patch
Description: Binary data


0001-Generalize-common-code-of-adding-sort-befor-20240115.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-01-14 Thread Andrei Lepikhov


On 13/1/2024 22:00, Alexander Korotkov wrote:

On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov
 wrote:

On 11/1/2024 18:30, Alexander Korotkov wrote:

On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov  wrote:

Hmm, I don't see this old code in these patches. Resend 0002-* because
of trailing spaces.



AFAIK, cfbot does not seek old versions of patchset parts in previous messages. 
So for it to run correctly, a new version of the whole patchset should be sent 
even if most patches are unchanged.


Please, find the revised patchset with some refactoring and comments
improvement from me.  I'll continue looking into this.

The patch looks better, thanks to your refactoring.
I propose additional comments and tests to make the code more
understandable (see attachment).
I intended to look into this part of the code more, but the tests show a
difference between PostgreSQL v.15 and v.16, which causes changes in the
code of this feature.


Makes sense.  I've incorporated your changes into the patchset.

One more improvement. To underpin code change:

-   return cur_ec;  /* Match! */
+   {
+   /*
+* Match!
+*
+* Copy the sortref if it wasn't set yet. That may happen if
+* the ec was constructed from WHERE clause, i.e. it doesn't
+* have a target reference at all.
+*/
+   if (cur_ec->ec_sortref == 0 && sortref > 0)
+   cur_ec->ec_sortref = sortref;
+   return cur_ec;
+   }

I propose the test (see attachment). It shows why we introduce this 
change: GROUP-BY should juggle not only pathkeys generated by explicit 
sort operators but also planner-made, likewise in this example, by 
MergeJoin.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/test/regress/expected/aggregates.out 
b/src/test/regress/expected/aggregates.out
index 0d46e096e5..ca38e78f21 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2879,6 +2879,37 @@ FROM t1 WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
 (10 rows)
 
 DROP TABLE t1 CASCADE;
+-- Check, that GROUP-BY reordering optimization can operate with pathkeys, 
built
+-- by planner itself. For example, by MergeJoin.
+SET enable_hashjoin = off;
+SET enable_nestloop = off;
+explain (COSTS OFF)
+SELECT c1.relname,c1.relpages
+FROM pg_class c1 JOIN pg_class c2 ON (c1.relname=c2.relname AND 
c1.relpages=c2.relpages)
+GROUP BY c1.reltuples,c1.relpages,c1.relname
+ORDER BY c1.relpages, c1.relname, c1.relpages*c1.relpages;
+ QUERY PLAN
  
+-
+ Incremental Sort
+   Sort Key: c1.relpages, c1.relname, ((c1.relpages * c1.relpages))
+   Presorted Key: c1.relpages, c1.relname
+   ->  Group
+ Group Key: c1.relpages, c1.relname, c1.reltuples
+ ->  Incremental Sort
+   Sort Key: c1.relpages, c1.relname, c1.reltuples
+   Presorted Key: c1.relpages, c1.relname
+   ->  Merge Join
+ Merge Cond: ((c1.relpages = c2.relpages) AND (c1.relname 
= c2.relname))
+ ->  Sort
+   Sort Key: c1.relpages, c1.relname
+   ->  Seq Scan on pg_class c1
+ ->  Sort
+   Sort Key: c2.relpages, c2.relname
+   ->  Seq Scan on pg_class c2
+(16 rows)
+
+RESET enable_hashjoin;
+RESET enable_nestloop;
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;
diff --git a/src/test/regress/sql/aggregates.sql 
b/src/test/regress/sql/aggregates.sql
index f99167ac9e..cf87b5d5dd 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1233,6 +1233,18 @@ SELECT array_agg(c1 ORDER BY c2),c2
 FROM t1 WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
 DROP TABLE t1 CASCADE;
 
+-- Check, that GROUP-BY reordering optimization can operate with pathkeys, 
built
+-- by planner itself. For example, by MergeJoin.
+SET enable_hashjoin = off;
+SET enable_nestloop = off;
+explain (COSTS OFF)
+SELECT c1.relname,c1.relpages
+FROM pg_class c1 JOIN pg_class c2 ON (c1.relname=c2.relname AND 
c1.relpages=c2.relpages)
+GROUP BY c1.reltuples,c1.relpages,c1.relname
+ORDER BY c1.relpages, c1.relname, c1.relpages*c1.relpages;
+RESET enable_hashjoin;
+RESET enable_nestloop;
+
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;

Re: POC: GROUP BY optimization

2024-01-13 Thread Alexander Korotkov

On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov
 wrote:
> On 11/1/2024 18:30, Alexander Korotkov wrote:
> > On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov  wrote:
> >>> Hmm, I don't see this old code in these patches. Resend 0002-* because
> >>> of trailing spaces.
> >>
> >>
> >> AFAIK, cfbot does not seek old versions of patchset parts in previous 
> >> messages. So for it to run correctly, a new version of the whole patchset 
> >> should be sent even if most patches are unchanged.
> >
> > Please, find the revised patchset with some refactoring and comments
> > improvement from me.  I'll continue looking into this.
> The patch looks better, thanks to your refactoring.
> I propose additional comments and tests to make the code more
> understandable (see attachment).
> I intended to look into this part of the code more, but the tests show a
> difference between PostgreSQL v.15 and v.16, which causes changes in the
> code of this feature.

Makes sense.  I've incorporated your changes into the patchset.

--
Regards,
Alexander Korotkov


0001-Generalize-common-code-of-adding-sort-befor-20240113.patch
Description: Binary data


0002-Explore-alternative-orderings-of-group-by-p-20240113.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-01-13 Thread Andrei Lepikhov


On 11/1/2024 18:30, Alexander Korotkov wrote:

Hi!

On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov  wrote:

Hmm, I don't see this old code in these patches. Resend 0002-* because
of trailing spaces.



AFAIK, cfbot does not seek old versions of patchset parts in previous messages. 
So for it to run correctly, a new version of the whole patchset should be sent 
even if most patches are unchanged.


Please, find the revised patchset with some refactoring and comments
improvement from me.  I'll continue looking into this.

The patch looks better, thanks to your refactoring.
I propose additional comments and tests to make the code more 
understandable (see attachment).
I intended to look into this part of the code more, but the tests show a 
difference between PostgreSQL v.15 and v.16, which causes changes in the 
code of this feature.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/pathkeys.c 
b/src/backend/optimizer/path/pathkeys.c
index f4b7dcac21..5aac6d6677 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -397,6 +397,11 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List 
**group_pathkeys,
!list_member_ptr(*group_pathkeys, pathkey))
break;
 
+   /*
+* Since 1349d27 pathkey coming from underlying node can be in 
the
+* root->group_pathkeys but not in the processed_groupClause. 
So, we
+* should be careful here.
+*/
sgc = 
get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,

*group_clauses);
if (!sgc)
diff --git a/src/test/regress/expected/aggregates.out 
b/src/test/regress/expected/aggregates.out
index 423c8ec3b6..0d46e096e5 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2857,6 +2857,28 @@ explain (COSTS OFF) SELECT x,w FROM btg GROUP BY w,x,y,z 
ORDER BY x*x,z;
 (8 rows)
 
 DROP TABLE btg;
+-- The case, when scanning sort order correspond to aggregate sort order but
+-- can not be found in the group-by list
+CREATE TABLE t1 (c1 int PRIMARY KEY, c2 int);
+CREATE UNIQUE INDEX ON t1(c2);
+explain (costs off)
+SELECT array_agg(c1 ORDER BY c2),c2
+FROM t1 WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
+   QUERY PLAN   
+
+ Sort
+   Sort Key: c2
+   ->  GroupAggregate
+ Group Key: c1
+ ->  Sort
+   Sort Key: c1, c2
+   ->  Bitmap Heap Scan on t1
+ Recheck Cond: (c2 < 100)
+ ->  Bitmap Index Scan on t1_c2_idx
+   Index Cond: (c2 < 100)
+(10 rows)
+
+DROP TABLE t1 CASCADE;
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;
diff --git a/src/test/regress/sql/aggregates.sql 
b/src/test/regress/sql/aggregates.sql
index b9fcceedd7..f99167ac9e 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1224,6 +1224,15 @@ explain (COSTS OFF) SELECT x,w FROM btg GROUP BY w,x,y,z 
ORDER BY x*x,z;
 
 DROP TABLE btg;
 
+-- The case, when scanning sort order correspond to aggregate sort order but
+-- can not be found in the group-by list
+CREATE TABLE t1 (c1 int PRIMARY KEY, c2 int);
+CREATE UNIQUE INDEX ON t1(c2);
+explain (costs off)
+SELECT array_agg(c1 ORDER BY c2),c2
+FROM t1 WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
+DROP TABLE t1 CASCADE;
+
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;

Re: POC: GROUP BY optimization

2024-01-11 Thread Alexander Korotkov

Hi!

On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov  wrote:
>> Hmm, I don't see this old code in these patches. Resend 0002-* because
>> of trailing spaces.
>
>
> AFAIK, cfbot does not seek old versions of patchset parts in previous 
> messages. So for it to run correctly, a new version of the whole patchset 
> should be sent even if most patches are unchanged.

Please, find the revised patchset with some refactoring and comments
improvement from me.  I'll continue looking into this.

--
Regards,
Alexander Korotkov


0001-Generalize-common-code-of-adding-sort-befor-20240111.patch
Description: Binary data


0002-Explore-alternative-orderings-of-group-by-p-20240111.patch
Description: Binary data

Re: POC: GROUP BY optimization

2024-01-09 Thread Pavel Borisov

Hi, Andrei!

> Hmm, I don't see this old code in these patches. Resend 0002-* because
> of trailing spaces.
>

AFAIK, cfbot does not seek old versions of patchset parts in previous
messages. So for it to run correctly, a new version of the whole patchset
should be sent even if most patches are unchanged.

Regards,
Pavel Borisov

Re: POC: GROUP BY optimization

2024-01-09 Thread Andrei Lepikhov


On 9/1/2024 16:45, vignesh C wrote:

On Tue, 9 Jan 2024 at 14:31, Andrei Lepikhov  wrote:


Here is a new version of GROUP-BY optimization without sort model.

On 21/12/2023 17:53, Alexander Korotkov wrote:

I'd like to make some notes.

1) As already mentioned, there is clearly a repetitive pattern for the
code following after get_useful_group_keys_orderings() calls.  I think
it would be good to extract it into a separate function.  Please, do
this as a separate patch coming before the group-by patch. That would
simplify the review.

Done. See patch 0001-*. Unfortunately, extraction of whole cycle isn't
practical, because it blows out the interface of the routine.


2) I wonder what planning overhead this patch could introduce?  Could
you try to measure the worst case?  What if we have a table with a lot
of indexes and a long list of group-by clauses partially patching
every index.  This should give us an understanding on whether we need
a separate GUC to control this feature.

In current implementation I don't anticipate any significant overhead.
GUC is needed here to allow users adhere their own ordering and to
disable feature in the case of problems.


4) I think we can do some optimizations when enable_incremental_sort
== off.  Then in get_useful_group_keys_orderings() we should only deal
with input_path fully matching the group-by clause, and try only full
match of group-by output to the required order.

Hm, is it really make sense in current implementation?


CFBot shows the following errors at [1] with:
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c: In function
‘estimate_num_groups’:
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: warning:
implicit declaration of function ‘estimate_num_groups_incremental’
[-Wimplicit-function-declaration]
[08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs,
[08:33:28.813] | ^~~
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c: At top level:
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: warning: no
previous prototype for ‘estimate_num_groups_incremental’
[-Wmissing-prototypes]
[08:33:28.813] 3400 | estimate_num_groups_incremental(PlannerInfo
*root, List *groupExprs,
[08:33:28.813] | ^~~
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: error:
conflicting types for ‘estimate_num_groups_incremental’
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: note:
previous implicit declaration of ‘estimate_num_groups_incremental’ was
here
[08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs,
Hmm, I don't see this old code in these patches. Resend 0002-* because 
of trailing spaces.


--
regards,
Andrei Lepikhov
Postgres Professional
From 45cfff5731b81c0df2af00f5e4212fc598f6a231 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 9 Jan 2024 12:32:15 +0700
Subject: [PATCH 2/2] Explore alternative orderings of group-by pathkeys during
 optimization.

When evaluating a query with a multi-column GROUP BY clause, we can minimize
sort operations or avoid them if we synchronize the order of GROUP BY clauses
with the ORDER BY sort clause or sort order, which comes from the underlying
query tree. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we simply compared keys in the order specified in the query. This commit
explores alternative ordering of the keys, trying to find a cheaper one.

The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow using either incremental
sort or even eliminating the sort entirely.

The patch always keeps the ordering specified in the query, assuming the user
might have additional insights.

This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.
---
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 222 ++
 src/backend/optimizer/plan/planner.c  | 214 +++--
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/paths.h |   3 +
 src/test/regress/expected/aggregates.out  | 132 +++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/aggregates.sql   |  47 
 10 files changed, 586 insertions(+), 69 deletions(-)

diff --git a/src/backend/optimizer/path/equivclass.c 
b/src/backend/optimizer/path/equivclass.c
index e86dfeaecd..7dd14d0a43 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -652,7 +652,18

Re: POC: GROUP BY optimization

2024-01-09 Thread vignesh C

On Tue, 9 Jan 2024 at 14:31, Andrei Lepikhov  wrote:
>
> Here is a new version of GROUP-BY optimization without sort model.
>
> On 21/12/2023 17:53, Alexander Korotkov wrote:
> > I'd like to make some notes.
> >
> > 1) As already mentioned, there is clearly a repetitive pattern for the
> > code following after get_useful_group_keys_orderings() calls.  I think
> > it would be good to extract it into a separate function.  Please, do
> > this as a separate patch coming before the group-by patch. That would
> > simplify the review.
> Done. See patch 0001-*. Unfortunately, extraction of whole cycle isn't
> practical, because it blows out the interface of the routine.
>
> > 2) I wonder what planning overhead this patch could introduce?  Could
> > you try to measure the worst case?  What if we have a table with a lot
> > of indexes and a long list of group-by clauses partially patching
> > every index.  This should give us an understanding on whether we need
> > a separate GUC to control this feature.
> In current implementation I don't anticipate any significant overhead.
> GUC is needed here to allow users adhere their own ordering and to
> disable feature in the case of problems.
>
> > 4) I think we can do some optimizations when enable_incremental_sort
> > == off.  Then in get_useful_group_keys_orderings() we should only deal
> > with input_path fully matching the group-by clause, and try only full
> > match of group-by output to the required order.
> Hm, is it really make sense in current implementation?

CFBot shows the following errors at [1] with:
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c: In function
‘estimate_num_groups’:
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: warning:
implicit declaration of function ‘estimate_num_groups_incremental’
[-Wimplicit-function-declaration]
[08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs,
[08:33:28.813] | ^~~
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c: At top level:
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: warning: no
previous prototype for ‘estimate_num_groups_incremental’
[-Wmissing-prototypes]
[08:33:28.813] 3400 | estimate_num_groups_incremental(PlannerInfo
*root, List *groupExprs,
[08:33:28.813] | ^~~
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: error:
conflicting types for ‘estimate_num_groups_incremental’
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: note:
previous implicit declaration of ‘estimate_num_groups_incremental’ was
here
[08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs,

[1] - https://cirrus-ci.com/task/5074942069309440

Regards,
Vignesh

Re: POC: GROUP BY optimization

2024-01-09 Thread Andrei Lepikhov


Here is a new version of GROUP-BY optimization without sort model.

On 21/12/2023 17:53, Alexander Korotkov wrote:

I'd like to make some notes.

1) As already mentioned, there is clearly a repetitive pattern for the
code following after get_useful_group_keys_orderings() calls.  I think
it would be good to extract it into a separate function.  Please, do
this as a separate patch coming before the group-by patch. That would
simplify the review.
Done. See patch 0001-*. Unfortunately, extraction of whole cycle isn't 
practical, because it blows out the interface of the routine.



2) I wonder what planning overhead this patch could introduce?  Could
you try to measure the worst case?  What if we have a table with a lot
of indexes and a long list of group-by clauses partially patching
every index.  This should give us an understanding on whether we need
a separate GUC to control this feature.
In current implementation I don't anticipate any significant overhead. 
GUC is needed here to allow users adhere their own ordering and to 
disable feature in the case of problems.



4) I think we can do some optimizations when enable_incremental_sort
== off.  Then in get_useful_group_keys_orderings() we should only deal
with input_path fully matching the group-by clause, and try only full
match of group-by output to the required order.

Hm, is it really make sense in current implementation?

--
regards,
Andrei Lepikhov
Postgres Professional
From ea068e221a754a463142575f6e0360d3118475f8 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 9 Jan 2024 12:00:27 +0700
Subject: [PATCH 1/2] Generalize common code of adding sort before generation
 of grouping paths.

---
 src/backend/optimizer/plan/planner.c | 209 ---
 1 file changed, 61 insertions(+), 148 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c 
b/src/backend/optimizer/plan/planner.c
index 667723b675..fcf647940e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6809,6 +6809,52 @@ done:
return parallel_workers;
 }
 
+static Path *
+try_presorting(PlannerInfo *root, RelOptInfo *grouped_rel,
+   Path *path, Path *cheapest_path)
+{
+   boolis_sorted;
+   int presorted_keys;
+
+   is_sorted = pathkeys_count_contained_in(root->group_pathkeys,
+   
path->pathkeys,
+   
_keys);
+
+   if (!is_sorted)
+   {
+   /*
+* Try at least sorting the cheapest path and also try
+* incrementally sorting any path which is partially sorted
+* already (no need to deal with paths which have presorted
+* keys when incremental sort is disabled unless it's the
+* cheapest input path).
+*/
+   if (path != cheapest_path &&
+   (presorted_keys == 0 || !enable_incremental_sort))
+   return NULL;
+
+   /*
+* We've no need to consider both a sort and incremental sort.
+* We'll just do a sort if there are no presorted keys and an
+* incremental sort when there are presorted keys.
+*/
+   if (presorted_keys == 0 || !enable_incremental_sort)
+   path = (Path *) create_sort_path(root,
+   
 grouped_rel,
+   
 path,
+   
 root->group_pathkeys,
+   
 -1.0);
+   else
+   path = (Path *) create_incremental_sort_path(root,
+   
 grouped_rel,
+   
 path,
+   
 root->group_pathkeys,
+   
 presorted_keys,
+   
 -1.0);
+   }
+   return path;
+}
+
 /*
  * add_paths_to_grouping_rel
  *
@@ -6840,45 +6886,11 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo 
*input_rel,
foreach(lc, input_rel->pathlist)
{
Path   *path = (Path *)

Re: POC: GROUP BY optimization

2023-12-28 Thread Andrei Lepikhov


On 28/12/2023 18:29, Alexander Korotkov wrote:

On Thu, Dec 28, 2023 at 10:22 AM Andrei Lepikhov
 wrote:

But arrangement with an ORDER BY clause doesn't work:

DROP INDEX abc;
explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w);

I think the reason is that the sort_pathkeys and group_pathkeys are
physically different structures, and we can't just compare pointers here.


I haven't yet looked into the code.  But this looks strange to me.
Somehow, optimizer currently matches index pathkeys to ORDER BY
pathkeys.  If GROUP BY pathkeys could be matched to index pathkeys,
then it should be possible to match them to ORDER BY pathkeys too.
Oh, I found the mistake: I got used to using GROUP BY and ORDER BY on 
many columns with round brackets. In the case of the grouping list, it 
doesn't change anything. But ordering treats it as a WholeRowVar and 
breaks group-by arrangement. Look:

explain (COSTS OFF) SELECT relname,reltuples FROM pg_class
GROUP BY relname,reltuples ORDER BY reltuples,relname;

 Group
   Group Key: reltuples, relname
   ->  Sort
 Sort Key: reltuples, relname
 ->  Seq Scan on pg_class
But:
explain (COSTS OFF) SELECT relname,reltuples FROM pg_class
GROUP BY relname,reltuples ORDER BY (reltuples,relname);

 Sort
   Sort Key: (ROW(reltuples, relname))
   ->  Group
 Group Key: relname, reltuples
 ->  Sort
   Sort Key: relname, reltuples
   ->  Seq Scan on pg_class

So, let's continue to work.

--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2023-12-28 Thread Alexander Korotkov

On Thu, Dec 28, 2023 at 10:22 AM Andrei Lepikhov
 wrote:
> But arrangement with an ORDER BY clause doesn't work:
>
> DROP INDEX abc;
> explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w);
>
> I think the reason is that the sort_pathkeys and group_pathkeys are
> physically different structures, and we can't just compare pointers here.

I haven't yet looked into the code.  But this looks strange to me.
Somehow, optimizer currently matches index pathkeys to ORDER BY
pathkeys.  If GROUP BY pathkeys could be matched to index pathkeys,
then it should be possible to match them to ORDER BY pathkeys too.

--
Regards,
Alexander Korotkov

Re: POC: GROUP BY optimization

2023-12-28 Thread Andrei Lepikhov


On 27/12/2023 12:07, Tom Lane wrote:

Andrei Lepikhov  writes:

To be clear. In [1], I mentioned we can perform micro-benchmarks and
structure costs of operators. At least for fixed-length operators, it is
relatively easy.


I repeat what I said: this is a fool's errand.  You will not get
trustworthy results even for the cases you measured, let alone
all the rest.  I'd go as far as to say I would not believe your
microbenchmarks, because they would only apply for one platform,
compiler, backend build, phase of the moon, etc.


Thanks for the explanation.
I removed all cost-related codes. It still needs to be finished; I will 
smooth the code further and rewrite regression tests - many of them 
without cost-dependent reorderings look silly. Also, remember 
Alexander's remarks, which must be implemented, too.

But already here, it works well. Look:

Preliminaries:
CREATE TABLE t(x int, y int, z text, w int);
INSERT INTO t SELECT gs%100,gs%100, 'abc' || gs%10, gs
  FROM generate_series(1,1) AS gs;
CREATE INDEX abc ON t(x,y);
ANALYZE t;
SET enable_hashagg = 'off';

This patch eliminates unneeded Sort operation:
explain SELECT x,y FROM t GROUP BY (x,y);
explain SELECT x,y FROM t GROUP BY (y,x);

Engages incremental sort:
explain SELECT x,y FROM t GROUP BY (x,y,z,w);
explain SELECT x,y FROM t GROUP BY (z,y,w,x);
explain SELECT x,y FROM t GROUP BY (w,z,x,y);
explain SELECT x,y FROM t GROUP BY (w,x,z,y);

Works with subqueries:
explain SELECT x,y
FROM (SELECT * FROM t ORDER BY x,y,w,z) AS q1
GROUP BY (w,x,z,y);
explain SELECT x,y
FROM (SELECT * FROM t ORDER BY x,y,w,z LIMIT 100) AS q1
GROUP BY (w,x,z,y);

But arrangement with an ORDER BY clause doesn't work:

DROP INDEX abc;
explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w);

I think the reason is that the sort_pathkeys and group_pathkeys are 
physically different structures, and we can't just compare pointers here.


--
regards,
Andrei Lepikhov
Postgres Professional
From 6183555c2c56d7cbbc44f213f6c5bc4cbcaef1ec Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 13 Sep 2023 11:20:03 +0700
Subject: [PATCH] Explore alternative orderings of group-by pathkeys during
 optimization.

When evaluating a query with a multi-column GROUP BY clause, we can minimize
sort operations or avoid them if we synchronize the order of GROUP BY clauses
with the ORDER BY sort clause or sort order, which comes from the underlying
query tree. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we simply compared keys in the order specified in the query. This commit
explores alternative ordering of the keys, trying to find a cheaper one.

The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow using either incremental
sort or even eliminating the sort entirely.

The patch always keeps the ordering specified in the query, assuming the user
might have additional insights.

This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.
---
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 224 +
 src/backend/optimizer/plan/planner.c  | 464 ++
 src/backend/utils/adt/selfuncs.c  |  38 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/paths.h |   3 +
 src/test/regress/expected/aggregates.out  | 235 +
 .../regress/expected/incremental_sort.out |  20 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/aggregates.sql   |  99 
 src/test/regress/sql/incremental_sort.sql |   2 +-
 13 files changed, 912 insertions(+), 210 deletions(-)

diff --git a/src/backend/optimizer/path/equivclass.c 
b/src/backend/optimizer/path/equivclass.c
index 7fa502d6e2..07edd4f38e 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -652,7 +652,18 @@ get_eclass_for_sort_expr(PlannerInfo *root,
 
if (opcintype == cur_em->em_datatype &&
equal(expr, cur_em->em_expr))
-   return cur_ec;  /* Match! */
+   {
+   /*
+* Match!
+*
+* Copy the sortref if it wasn't set yet. That 
may happen if
+* the ec was constructed from WHERE clause, 
i.e. it doesn't
+* have a target reference at all.
+

Re: POC: GROUP BY optimization

2023-12-26 Thread Tom Lane

Andrei Lepikhov  writes:
> To be clear. In [1], I mentioned we can perform micro-benchmarks and 
> structure costs of operators. At least for fixed-length operators, it is 
> relatively easy.

I repeat what I said: this is a fool's errand.  You will not get
trustworthy results even for the cases you measured, let alone
all the rest.  I'd go as far as to say I would not believe your
microbenchmarks, because they would only apply for one platform,
compiler, backend build, phase of the moon, etc.

regards, tom lane

Re: POC: GROUP BY optimization

2023-12-26 Thread Andrei Lepikhov


On 27/12/2023 11:15, Alexander Korotkov wrote:

On Wed, Dec 27, 2023 at 5:23 AM Tom Lane  wrote:

Alexander Korotkov  writes:

2) An accurate estimate of the sorting cost is quite a difficult task.


Indeed.


What if we make a simple rule of thumb that sorting integers and
floats is cheaper than sorting numerics and strings with collation C,
in turn, that is cheaper than sorting collation-aware strings
(probably more groups)?  Within the group, we could keep the original
order of items.


I think it's a fool's errand to even try to separate different sort
column orderings by cost.  We simply do not have sufficiently accurate
cost information.  The previous patch in this thread got reverted because
of that (well, also some implementation issues, but mostly that), and
nothing has happened to make me think that another try will fare any
better.
To be clear. In [1], I mentioned we can perform micro-benchmarks and 
structure costs of operators. At least for fixed-length operators, it is 
relatively easy. So, the main block here is an accurate prediction of 
ndistincts for different combinations of columns. Does it make sense to 
continue to design the feature in the direction of turning on choosing 
between different sort column orderings if we have extended statistics 
on the columns?


[1] 
https://www.postgresql.org/message-id/e3602ccb-e643-2e79-ed2c-1175a8053...@postgrespro.ru


--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2023-12-26 Thread Tom Lane

Alexander Korotkov  writes:
> On Wed, Dec 27, 2023 at 5:23 AM Tom Lane  wrote:
>> I think it's a fool's errand to even try to separate different sort
>> column orderings by cost.

> Besides sorting column orderings by cost, this patch also tries to
> match GROUP BY pathkeys to input pathkeys and ORDER BY pathkeys.  Do
> you think there is a chance for the second part if we leave the cost
> part aside?

I think it's definitely reasonable to try to match up available
orderings, because that doesn't really require fine distinctions
of cost: either it matches or it doesn't.  Eliminating a sort step
entirely is clearly a win.  (Incremental sort complicates this though.
I doubt our cost model for incremental sorts is any good either, so
I am not eager to rely on that more heavily.)

regards, tom lane

Re: POC: GROUP BY optimization

2023-12-26 Thread Alexander Korotkov

On Wed, Dec 27, 2023 at 5:23 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > 2) An accurate estimate of the sorting cost is quite a difficult task.
>
> Indeed.
>
> > What if we make a simple rule of thumb that sorting integers and
> > floats is cheaper than sorting numerics and strings with collation C,
> > in turn, that is cheaper than sorting collation-aware strings
> > (probably more groups)?  Within the group, we could keep the original
> > order of items.
>
> I think it's a fool's errand to even try to separate different sort
> column orderings by cost.  We simply do not have sufficiently accurate
> cost information.  The previous patch in this thread got reverted because
> of that (well, also some implementation issues, but mostly that), and
> nothing has happened to make me think that another try will fare any
> better.

If there is a choice of what to compare first: 8-bytes integers or
collation-aware texts possibly toasted, then the answer is pretty
evident for me.  For sure, there are cases then this choice is wrong.
But even if all the integers appear to be the same, the penalty isn't
that much.

Besides sorting column orderings by cost, this patch also tries to
match GROUP BY pathkeys to input pathkeys and ORDER BY pathkeys.  Do
you think there is a chance for the second part if we leave the cost
part aside?

--
Regards,
Alexander Korotkov

Re: POC: GROUP BY optimization

2023-12-26 Thread Tom Lane

Alexander Korotkov  writes:
> 2) An accurate estimate of the sorting cost is quite a difficult task.

Indeed.

> What if we make a simple rule of thumb that sorting integers and
> floats is cheaper than sorting numerics and strings with collation C,
> in turn, that is cheaper than sorting collation-aware strings
> (probably more groups)?  Within the group, we could keep the original
> order of items.

I think it's a fool's errand to even try to separate different sort
column orderings by cost.  We simply do not have sufficiently accurate
cost information.  The previous patch in this thread got reverted because
of that (well, also some implementation issues, but mostly that), and
nothing has happened to make me think that another try will fare any
better.

regards, tom lane

Re: POC: GROUP BY optimization

2023-12-26 Thread Alexander Korotkov

On Tue, Dec 26, 2023 at 1:37 PM Andrei Lepikhov
 wrote:
> On 21/12/2023 17:53, Alexander Korotkov wrote:
> > On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
> >  wrote:
> >> New version of the patch. Fixed minor inconsistencies and rebased onto
> >> current master.
> > Thank you (and other authors) for working on this subject.  Indeed to
> > GROUP BY clauses are order-agnostic.  Reordering them in the most
> > suitable order could give up significant query planning benefits.  I
> > went through the thread: I see significant work has been already made
> > on this patch, the code is quite polished.
> Maybe, but issues, mentioned in [1], still not resolved. It is the only
> reason, why this thread hasn't been active.

Yes, this makes sense.  I have a couple of points from me on this subject.
1) The patch reorders GROUP BY items not only to make comparison
cheaper but also to match the ordering of input paths and to match the
ORDER BY clause.  Thus, even if we leave aside for now sorting GROUP
BY items by their cost, the patch will remain valuable.
2) An accurate estimate of the sorting cost is quite a difficult task.
What if we make a simple rule of thumb that sorting integers and
floats is cheaper than sorting numerics and strings with collation C,
in turn, that is cheaper than sorting collation-aware strings
(probably more groups)?  Within the group, we could keep the original
order of items.

--
Regards,
Alexander Korotkov

Re: POC: GROUP BY optimization

2023-12-26 Thread Andrei Lepikhov

On 21/12/2023 17:53, Alexander Korotkov wrote:

On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
wrote:

New version of the patch. Fixed minor inconsistencies and rebased onto
current master.

Thank you (and other authors) for working on this subject. Indeed to
GROUP BY clauses are order-agnostic. Reordering them in the most
suitable order could give up significant query planning benefits. I
went through the thread: I see significant work has been already made
on this patch, the code is quite polished.
Maybe, but issues, mentioned in [1], still not resolved. It is the only
reason, why this thread hasn't been active.

I'd like to make some notes.
1) As already mentioned, there is clearly a repetitive pattern for the
code following after get_useful_group_keys_orderings() calls. I think
it would be good to extract it into a separate function. Please, do
this as a separate patch coming before the group-by patch. That would
simplify the review.
Yeah, these parts of code a bit different. I will try to make common
routine.

2) I wonder what planning overhead this patch could introduce? Could
you try to measure the worst case? What if we have a table with a lot
of indexes and a long list of group-by clauses partially patching
every index. This should give us an understanding on whether we need
a separate GUC to control this feature.

Ok> 3) I see that get_useful_group_keys_orderings() makes 3 calls to

get_cheapest_group_keys_order() function. Each time
get_cheapest_group_keys_order() performs the cost estimate and
reorders the free keys. However, cost estimation implies the system
catalog lookups (that is quite expensive). I wonder if we could
change the algorithm. Could we just sort the group-by keys by cost
once, save this ordering and then just re-use it. So, every time we
need to reorder a group by, we can just pull the required keys to the
top and use saved ordering for the rest. I also wonder if we could do
this once for add_paths_to_grouping_rel() and
create_partial_grouping_paths() calls. So, it probably should be
somewhere in create_ordinary_grouping_paths().
Thanks for the idea!> 4) I think we can do some optimizations when
enable_incremental_sort

== off. Then in get_useful_group_keys_orderings() we should only deal
with input_path fully matching the group-by clause, and try only full
match of group-by output to the required order.
Oh, we had designed before the incremental sort was invented. Will see
what we can do here.

[1]
https://www.postgresql.org/message-id/60610df1-c32f-ebdf-e58c-7a664431f452%40enterprisedb.com

--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2023-12-21 Thread Alexander Korotkov

Hi!

On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
 wrote:
>
> New version of the patch. Fixed minor inconsistencies and rebased onto
> current master.

Thank you (and other authors) for working on this subject.  Indeed to
GROUP BY clauses are order-agnostic.  Reordering them in the most
suitable order could give up significant query planning benefits.  I
went through the thread: I see significant work has been already made
on this patch, the code is quite polished.

I'd like to make some notes.

1) As already mentioned, there is clearly a repetitive pattern for the
code following after get_useful_group_keys_orderings() calls.  I think
it would be good to extract it into a separate function.  Please, do
this as a separate patch coming before the group-by patch. That would
simplify the review.

2) I wonder what planning overhead this patch could introduce?  Could
you try to measure the worst case?  What if we have a table with a lot
of indexes and a long list of group-by clauses partially patching
every index.  This should give us an understanding on whether we need
a separate GUC to control this feature.

3) I see that get_useful_group_keys_orderings() makes 3 calls to
get_cheapest_group_keys_order() function.  Each time
get_cheapest_group_keys_order() performs the cost estimate and
reorders the free keys.  However, cost estimation implies the system
catalog lookups (that is quite expensive).  I wonder if we could
change the algorithm.  Could we just sort the group-by keys by cost
once, save this ordering and then just re-use it.  So, every time we
need to reorder a group by, we can just pull the required keys to the
top and use saved ordering for the rest.  I also wonder if we could do
this once for add_paths_to_grouping_rel() and
create_partial_grouping_paths() calls.  So, it probably should be
somewhere in create_ordinary_grouping_paths().

4) I think we can do some optimizations when enable_incremental_sort
== off.  Then in get_useful_group_keys_orderings() we should only deal
with input_path fully matching the group-by clause, and try only full
match of group-by output to the required order.

--
Regards,
Alexander Korotkov

Re: POC: GROUP BY optimization

2023-10-01 Thread Andrei Lepikhov

New version of the patch. Fixed minor inconsistencies and rebased onto 
current master.


--
regards,
Andrey Lepikhov
Postgres Professional
From 2f5a42c8a53286f830e3376ff4d3f8b7d4215b4b Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 13 Sep 2023 11:20:03 +0700
Subject: [PATCH] Explore alternative orderings of group-by pathkeys during
 optimization.

When evaluating a query with a multi-column GROUP BY clause using sort,
the cost may depend heavily on the order in which the keys are compared when
building the groups. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we simply compared keys in the order specified in the query. This commit
explores alternative ordering of the keys, trying to find a cheaper one.

In principle, we might generate grouping paths for all permutations of the keys
and leave the rest to the optimizer. But that might get very expensive, so we
try to pick only a couple interesting orderings based on both local and global
information.

When planning the grouping path, we explore statistics (number of distinct
values, cost of the comparison function) for the keys and reorder them
to minimize comparison costs. Intuitively, it may be better to perform more
expensive comparisons (for complex data types, etc.) last because maybe
the cheaper comparisons will be enough. Similarly, the higher the cardinality
of a key, the lower the probability we'll need to compare more keys. The patch
generates and costs various orderings, picking the cheapest ones.

The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow using either incremental
sort or even eliminating the sort entirely.

The patch generates orderings and picks those, minimizing the comparison cost
(for various path keys), and then adds orderings that might be useful for
operations higher up in the plan (ORDER BY, etc.). Finally, it always keeps
the ordering specified in the query, assuming the user might have additional
insights.

This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.
---
 .../postgres_fdw/expected/postgres_fdw.out|  36 +-
 src/backend/optimizer/path/costsize.c | 363 ++-
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 600 ++
 src/backend/optimizer/plan/planner.c  | 465 --
 src/backend/optimizer/util/pathnode.c |   4 +-
 src/backend/utils/adt/selfuncs.c  |  38 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/cost.h  |   4 +-
 src/include/optimizer/paths.h |   7 +
 src/include/utils/selfuncs.h  |   5 +
 src/test/regress/expected/aggregates.out  | 244 ++-
 .../regress/expected/incremental_sort.out |   2 +-
 src/test/regress/expected/join.out|  51 +-
 src/test/regress/expected/merge.out   |  15 +-
 .../regress/expected/partition_aggregate.out  |  44 +-
 src/test/regress/expected/partition_join.out  |  75 +--
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/union.out   |  60 +-
 src/test/regress/sql/aggregates.sql   |  99 +++
 src/test/regress/sql/incremental_sort.sql |   2 +-
 23 files changed, 1769 insertions(+), 382 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 144c114d0f..63af7feabe 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2319,18 +2319,21 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT 
DISTINCT t2.c1, t3.c1 FROM
 -- join with pseudoconstant quals
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1 AND CURRENT_USER 
= SESSION_USER) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
-   
QUERY PLAN  
 
-
+  QUERY PLAN

Re: POC: GROUP BY optimization

2023-09-25 Thread Andrey Lepikhov


On 20/7/2023 18:46, Tomas Vondra wrote:

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

According to this issue, I see two options:
1. Go through the grouping column list and find the most reliable one. 
If we have a unique column or column with statistics on the number of 
distinct values, which is significantly more than ndistincts for other 
grouping columns, we can place this column as the first in the grouping. 
It should guarantee the reliability of such a decision, isn't it?
2. If we have extended statistics on distinct values and these 
statistics cover some set of first columns in the grouping list, we can 
optimize these positions. It also looks reliable.


Any thoughts?

--
regards,
Andrey Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2023-09-18 Thread Andrey Lepikhov


On 20/7/2023 18:46, Tomas Vondra wrote:

On 7/20/23 08:37, Andrey Lepikhov wrote:

On 3/10/2022 21:56, Tom Lane wrote:

Revert "Optimize order of GROUP BY keys".

This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
several follow-on fixes.
...
Since we're hard up against the release deadline for v15, let's
revert these changes for now.  We can always try again later.


It may be time to restart the project. As a first step, I rebased the
patch on the current master. It wasn't trivial because of some latest
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, Let's repeat the review and rewrite the current path according to
the reasons uttered in the revert commit.

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.
Ok, some thoughts on this part of the task. As I see, we have not so 
many different operators: 26 with fixed width and 16 with variable width:


SELECT o.oid,oprcode,typname,typlen FROM pg_operator o
  JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname='<') AND oprleft=oprright AND typlen>0
ORDER BY o.oid;

SELECT o.oid,oprcode,typname,typlen FROM pg_operator o
  JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname='<') AND oprleft=oprright AND typlen<0
ORDER BY o.oid;

Benchmarking procedure of types with fixed length can be something like 
below:


CREATE OR REPLACE FUNCTION pass_sort(typ regtype) RETURNS TABLE (
nrows integer,
exec_time float
)  AS $$
DECLARE
  data json;
  step integer;
BEGIN
  SET work_mem='1GB';

  FOR step IN 0..3 LOOP
SELECT pow(100, step)::integer INTO nrows;
DROP TABLE IF EXISTS test CASCADE;
EXECUTE format('CREATE TABLE test AS SELECT gs::%s AS x
FROM generate_series(1,%s) AS gs;', typ, nrows);

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, FORMAT JSON)
  SELECT * FROM test ORDER BY (x) DESC INTO data;
SELECT data->0->'Execution Time' INTO exec_time;
RETURN NEXT;
  END LOOP;
END;
$$ LANGUAGE plpgsql VOLATILE;

Execution of SELECT * FROM pass_sort('integer'); shows quite linear grow 
of the execution time. So, using '2.0 * cpu_operator_cost' as a cost for 
the integer type (as a basis) we can calculate costs for other 
operators. Variable-width types, i think, could require more complex 
technique to check dependency on the length.


Does this way look worthwhile?

--
regards,
Andrey Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2023-09-12 Thread Andrey Lepikhov


Hi,
Here is the patch rebased on the current master. Also, I fixed some 
minor slips and one static analyzer warning.
This is just for adding to the next commitfest and enforcing work with 
this patch.


One extra difference in newly added postgres_fdw tests is caused by this 
patch - see changes in the query plan in attachment.


--
regards,
Andrey Lepikhov
Postgres Professional
From 33953655c9ac3f9ec64b80c9f2a2ff38bd178745 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 13 Sep 2023 11:20:03 +0700
Subject: [PATCH] Explore alternative orderings of group-by pathkeys during
 optimization.

When evaluating a query with a multi-column GROUP BY clause using sort,
the cost may depend heavily on the order in which the keys are compared when
building the groups. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we simply compared keys in the order specified in the query. This commit
explores alternative ordering of the keys, trying to find a cheaper one.

In principle, we might generate grouping paths for all permutations of the keys
and leave the rest to the optimizer. But that might get very expensive, so we
try to pick only a couple interesting orderings based on both local and global
information.

When planning the grouping path, we explore statistics (number of distinct
values, cost of the comparison function) for the keys and reorder them
to minimize comparison costs. Intuitively, it may be better to perform more
expensive comparisons (for complex data types, etc.) last because maybe
the cheaper comparisons will be enough. Similarly, the higher the cardinality
of a key, the lower the probability we'll need to compare more keys. The patch
generates and costs various orderings, picking the cheapest ones.

The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow using either incremental
sort or even eliminating the sort entirely.

The patch generates orderings and picks those, minimizing the comparison cost
(for various path keys), and then adds orderings that might be useful for
operations higher up in the plan (ORDER BY, etc.). Finally, it always keeps
the ordering specified in the query, assuming the user might have additional
insights.

This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.
---
 .../postgres_fdw/expected/postgres_fdw.out|  36 +-
 src/backend/optimizer/path/costsize.c | 363 ++-
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 602 ++
 src/backend/optimizer/plan/planner.c  | 465 --
 src/backend/optimizer/util/pathnode.c |   4 +-
 src/backend/utils/adt/selfuncs.c  |  38 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/cost.h  |   4 +-
 src/include/optimizer/paths.h |   7 +
 src/include/utils/selfuncs.h  |   5 +
 src/test/regress/expected/aggregates.out  | 244 ++-
 .../regress/expected/incremental_sort.out |   2 +-
 src/test/regress/expected/join.out|  51 +-
 src/test/regress/expected/merge.out   |  15 +-
 .../regress/expected/partition_aggregate.out  |  44 +-
 src/test/regress/expected/partition_join.out  |  75 +--
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/union.out   |  60 +-
 src/test/regress/sql/aggregates.sql   |  99 +++
 src/test/regress/sql/incremental_sort.sql |   2 +-
 23 files changed, 1771 insertions(+), 382 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 144c114d0f..63af7feabe 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2319,18 +2319,21 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT 
DISTINCT t2.c1, t3.c1 FROM
 -- join with pseudoconstant quals
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1 AND CURRENT_USER 
= SESSION_USER) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
-   
QUERY PLAN  
 
-
+

Re: POC: GROUP BY optimization

2023-07-24 Thread Tomas Vondra




On 7/24/23 14:04, Andrey Lepikhov wrote:
> On 24/7/2023 16:56, Tomas Vondra wrote:
>> On 7/24/23 04:10, Andrey Lepikhov wrote:
>>> On 20/7/2023 18:46, Tomas Vondra wrote:
 On 7/20/23 08:37, Andrey Lepikhov wrote:
> On 3/10/2022 21:56, Tom Lane wrote:
>> Revert "Optimize order of GROUP BY keys".
>>
>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>> several follow-on fixes.
>> ...
>> Since we're hard up against the release deadline for v15, let's
>> revert these changes for now.  We can always try again later.
>
> It may be time to restart the project. As a first step, I rebased the
> patch on the current master. It wasn't trivial because of some latest
> optimizations (a29eab, 1349d27 and 8d83a5d).
> Now, Let's repeat the review and rewrite the current path according to
> the reasons uttered in the revert commit.

 I think the fundamental task is to make the costing more reliable, and
 the commit message 443df6e2db points out a couple challenges in this
 area. Not sure how feasible it is to address enough of them ...

 1) procost = 1.0 - I guess we could make this more realistic by doing
 some microbenchmarks and tuning the costs for the most expensive cases.

 2) estimating quicksort comparisons - This relies on ndistinct
 estimates, and I'm not sure how much more reliable we can make those.
 Probably not much :-( Not sure what to do about this, the only thing I
 can think of is to track "reliability" of the estimates and only do the
 reordering if we have high confidence in the estimates. That means
 we'll
 miss some optimization opportunities, but it should limit the risk.
>>> I read up on the history of this thread.
>>> As I see, all the problems mentioned above can be beaten by excluding
>>> the new cost model at all. We can sort GROUP BY columns according to the
>>> 'ndistinct' value.
>>> I see the reason for introducing the cost model in [1]. The main
>>> supporting point here is that with this patch, people couldn't optimize
>>> the query by themselves, organizing the order of the columns in a more
>>> optimal way. But now we have at least the GUC to switch off the
>>> behaviour introduced here. Also, some extensions, like the well-known
>>> pg_hint_plan, can help with automation.
>>
>> I think the main concern is that if we reorder the group keys and get it
>> wrong, it's a regression. If that happens often (due to costing based on
>> poor stats), it's a problem. Yes, there's a GUC, but that's a rather
>> blunt instrument, unfortunately.
> I see. My point here is if the ndistinct of one column is much more than
> the ndistinct of another, it is more probable that this correlation will
> be kept in the grouping phase. Here we can get some regression, which
> can be overweighed by the possibility below.

I think the word "probable" hits what the problem is. Because what if it
isn't? Also, we don't actually need ndistinct for individual attributes,
but for the groups of attributes. Imagine you do

GROUP BY a, b

so you need to estimate whether to do (a,b) or (b,a). You need to calculate

  ndistinct(a,b) / ndistinct(a)

and

  ndistinct(b,a) / ndistinct(b)

And similarly for more grouping keys. This is why we have ndistinct
extended statistics, mostly.

>>
>>> So, how about committing of the undoubted part of the feature and
>>> working on the cost model in a new thread?
>>>
>>
>> But Tom's commit message says this:
>>
>>  Worse, to arrive at estimates of the number of calls made to the
>>  lower-order-column comparison functions, the code needs to make
>>  estimates of the numbers of distinct values of multiple columns,
>>  which are necessarily even less trustworthy than per-column stats.
>>
>> so I'm not sure this really counts as "undoubted".
> Don't try to estimate multiple  columns - just sort columns according to
> the value of ndistinct as a heuristic.

Surely you realize how easy such simple heuristics can fail, right?

> I think we should estimate the number of values of multiple columns only
> if we have extended statistics on these columns. And this can extend the
> applicability of extended statistics.
> 

I don't see why this should estimate ndistinct differently than every
other place. That'd certainly be surprising. I can be convinced, but
there needs to be sound justification why that's the right way to do
(i.e. why it's less likely to cause poor planning choices).

> The suggestion above is just an attempt to gather low-hanging fruit
> right now. If it is not applicable, we will go a long way.
> 
I'm not sure this qualities as "low hanging fruit" - that generally
means simple changes with little risk. But you're using it for rather
simplistic heuristic that can easily misfire (I think). At least that's
my impression, I can be convinced otherwise.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The

Re: POC: GROUP BY optimization

2023-07-24 Thread Andrey Lepikhov


On 24/7/2023 16:56, Tomas Vondra wrote:

On 7/24/23 04:10, Andrey Lepikhov wrote:

On 20/7/2023 18:46, Tomas Vondra wrote:

On 7/20/23 08:37, Andrey Lepikhov wrote:

On 3/10/2022 21:56, Tom Lane wrote:

Revert "Optimize order of GROUP BY keys".

This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
several follow-on fixes.
...
Since we're hard up against the release deadline for v15, let's
revert these changes for now.  We can always try again later.


It may be time to restart the project. As a first step, I rebased the
patch on the current master. It wasn't trivial because of some latest
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, Let's repeat the review and rewrite the current path according to
the reasons uttered in the revert commit.


I think the fundamental task is to make the costing more reliable, and
the commit message 443df6e2db points out a couple challenges in this
area. Not sure how feasible it is to address enough of them ...

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

I read up on the history of this thread.
As I see, all the problems mentioned above can be beaten by excluding
the new cost model at all. We can sort GROUP BY columns according to the
'ndistinct' value.
I see the reason for introducing the cost model in [1]. The main
supporting point here is that with this patch, people couldn't optimize
the query by themselves, organizing the order of the columns in a more
optimal way. But now we have at least the GUC to switch off the
behaviour introduced here. Also, some extensions, like the well-known
pg_hint_plan, can help with automation.


I think the main concern is that if we reorder the group keys and get it
wrong, it's a regression. If that happens often (due to costing based on
poor stats), it's a problem. Yes, there's a GUC, but that's a rather
blunt instrument, unfortunately.
I see. My point here is if the ndistinct of one column is much more than 
the ndistinct of another, it is more probable that this correlation will 
be kept in the grouping phase. Here we can get some regression, which 
can be overweighed by the possibility below.



So, how about committing of the undoubted part of the feature and
working on the cost model in a new thread?



But Tom's commit message says this:

 Worse, to arrive at estimates of the number of calls made to the
 lower-order-column comparison functions, the code needs to make
 estimates of the numbers of distinct values of multiple columns,
 which are necessarily even less trustworthy than per-column stats.

so I'm not sure this really counts as "undoubted".
Don't try to estimate multiple  columns - just sort columns according to 
the value of ndistinct as a heuristic.
I think we should estimate the number of values of multiple columns only 
if we have extended statistics on these columns. And this can extend the 
applicability of extended statistics.


The suggestion above is just an attempt to gather low-hanging fruit 
right now. If it is not applicable, we will go a long way.


--
regards,
Andrey Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2023-07-24 Thread Tomas Vondra

On 7/24/23 04:10, Andrey Lepikhov wrote:
> On 20/7/2023 18:46, Tomas Vondra wrote:
>> On 7/20/23 08:37, Andrey Lepikhov wrote:
>>> On 3/10/2022 21:56, Tom Lane wrote:
 Revert "Optimize order of GROUP BY keys".

 This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
 several follow-on fixes.
 ...
 Since we're hard up against the release deadline for v15, let's
 revert these changes for now.  We can always try again later.
>>>
>>> It may be time to restart the project. As a first step, I rebased the
>>> patch on the current master. It wasn't trivial because of some latest
>>> optimizations (a29eab, 1349d27 and 8d83a5d).
>>> Now, Let's repeat the review and rewrite the current path according to
>>> the reasons uttered in the revert commit.
>>
>> I think the fundamental task is to make the costing more reliable, and
>> the commit message 443df6e2db points out a couple challenges in this
>> area. Not sure how feasible it is to address enough of them ...
>>
>> 1) procost = 1.0 - I guess we could make this more realistic by doing
>> some microbenchmarks and tuning the costs for the most expensive cases.
>>
>> 2) estimating quicksort comparisons - This relies on ndistinct
>> estimates, and I'm not sure how much more reliable we can make those.
>> Probably not much :-( Not sure what to do about this, the only thing I
>> can think of is to track "reliability" of the estimates and only do the
>> reordering if we have high confidence in the estimates. That means we'll
>> miss some optimization opportunities, but it should limit the risk.
> I read up on the history of this thread.
> As I see, all the problems mentioned above can be beaten by excluding
> the new cost model at all. We can sort GROUP BY columns according to the
> 'ndistinct' value.
> I see the reason for introducing the cost model in [1]. The main
> supporting point here is that with this patch, people couldn't optimize
> the query by themselves, organizing the order of the columns in a more
> optimal way. But now we have at least the GUC to switch off the
> behaviour introduced here. Also, some extensions, like the well-known
> pg_hint_plan, can help with automation.

I think the main concern is that if we reorder the group keys and get it
wrong, it's a regression. If that happens often (due to costing based on
poor stats), it's a problem. Yes, there's a GUC, but that's a rather
blunt instrument, unfortunately.

> So, how about committing of the undoubted part of the feature and
> working on the cost model in a new thread?
> 

But Tom's commit message says this:

Worse, to arrive at estimates of the number of calls made to the
lower-order-column comparison functions, the code needs to make
estimates of the numbers of distinct values of multiple columns,
which are necessarily even less trustworthy than per-column stats.

so I'm not sure this really counts as "undoubted".


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: POC: GROUP BY optimization

2023-07-23 Thread Andrey Lepikhov

On 20/7/2023 18:46, Tomas Vondra wrote:

On 7/20/23 08:37, Andrey Lepikhov wrote:

On 3/10/2022 21:56, Tom Lane wrote:

Revert "Optimize order of GROUP BY keys".

This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
several follow-on fixes.
...
Since we're hard up against the release deadline for v15, let's
revert these changes for now. We can always try again later.

It may be time to restart the project. As a first step, I rebased the
patch on the current master. It wasn't trivial because of some latest
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, Let's repeat the review and rewrite the current path according to
the reasons uttered in the revert commit.

I think the fundamental task is to make the costing more reliable, and
the commit message 443df6e2db points out a couple challenges in this
area. Not sure how feasible it is to address enough of them ...

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

I read up on the history of this thread.
As I see, all the problems mentioned above can be beaten by excluding
the new cost model at all. We can sort GROUP BY columns according to the
'ndistinct' value.
I see the reason for introducing the cost model in [1]. The main
supporting point here is that with this patch, people couldn't optimize
the query by themselves, organizing the order of the columns in a more
optimal way. But now we have at least the GUC to switch off the
behaviour introduced here. Also, some extensions, like the well-known
pg_hint_plan, can help with automation.
So, how about committing of the undoubted part of the feature and
working on the cost model in a new thread?

[1]
https://www.postgresql.org/message-id/6d1e0cdb-dde3-f62a-43e2-e90bbd9b0f42%402ndquadrant.com

--
regards,
Andrey Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2023-07-20 Thread Andrey Lepikhov


On 20/7/2023 18:46, Tomas Vondra wrote:

On 7/20/23 08:37, Andrey Lepikhov wrote:

On 3/10/2022 21:56, Tom Lane wrote:

Revert "Optimize order of GROUP BY keys".

This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
several follow-on fixes.
...
Since we're hard up against the release deadline for v15, let's
revert these changes for now.  We can always try again later.


It may be time to restart the project. As a first step, I rebased the
patch on the current master. It wasn't trivial because of some latest
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, Let's repeat the review and rewrite the current path according to
the reasons uttered in the revert commit.


I think the fundamental task is to make the costing more reliable, and
the commit message 443df6e2db points out a couple challenges in this
area. Not sure how feasible it is to address enough of them ...

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

For me personally, the most challenging issue is:
3. Imbalance, induced by the cost_sort() changes. It may increase or 
decrease the contribution of sorting to the total cost compared to other 
factors and change choice of sorted paths.


--
regards,
Andrey Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

2023-07-20 Thread Tomas Vondra

On 7/20/23 08:37, Andrey Lepikhov wrote:
> On 3/10/2022 21:56, Tom Lane wrote:
>> Revert "Optimize order of GROUP BY keys".
>>
>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>> several follow-on fixes.
>> ...
>> Since we're hard up against the release deadline for v15, let's
>> revert these changes for now.  We can always try again later.
> 
> It may be time to restart the project. As a first step, I rebased the
> patch on the current master. It wasn't trivial because of some latest
> optimizations (a29eab, 1349d27 and 8d83a5d).
> Now, Let's repeat the review and rewrite the current path according to
> the reasons uttered in the revert commit.

I think the fundamental task is to make the costing more reliable, and
the commit message 443df6e2db points out a couple challenges in this
area. Not sure how feasible it is to address enough of them ...

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: POC: GROUP BY optimization

2023-07-20 Thread Andrey Lepikhov

On 3/10/2022 21:56, Tom Lane wrote:
> Revert "Optimize order of GROUP BY keys".
>
> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
> several follow-on fixes.
> ...
> Since we're hard up against the release deadline for v15, let's
> revert these changes for now.  We can always try again later.

It may be time to restart the project. As a first step, I rebased the 
patch on the current master. It wasn't trivial because of some latest 
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, Let's repeat the review and rewrite the current path according to 
the reasons uttered in the revert commit.

--
regards,
Andrey Lepikhov
Postgres Professional
From 913d55ee887dccfeba360f5f44ed347dd1ba9044 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 14 Jul 2023 10:29:36 +0700
Subject: [PATCH] When evaluating a query with a multi-column GROUP BY clause
 using sort, the cost may be heavily dependent on the order in which the keys
 are compared when building the groups. Grouping does not imply any ordering,
 so we're allowed to compare the keys in arbitrary order, and a Hash Agg
 leverages this. But for Group Agg, we simply compared keys in the order as
 specified in the query. This commit explores alternative ordering of the
 keys, trying to find a cheaper one.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In principle, we might generate grouping paths for all permutations of
the keys, and leave the rest to the optimizer. But that might get very
expensive, so we try to pick only a couple interesting orderings based
on both local and global information.

When planning the grouping path, we explore statistics (number of
distinct values, cost of the comparison function) for the keys and
reorder them to minimize comparison costs. Intuitively, it may be better
to perform more expensive comparisons (for complex data types etc.)
last, because maybe the cheaper comparisons will be enough. Similarly,
the higher the cardinality of a key, the lower the probability we’ll
need to compare more keys. The patch generates and costs various
orderings, picking the cheapest ones.

The ordering of group keys may interact with other parts of the query,
some of which may not be known while planning the grouping. E.g. there
may be an explicit ORDER BY clause, or some other ordering-dependent
operation, higher up in the query, and using the same ordering may allow
using either incremental sort or even eliminate the sort entirely.

The patch generates orderings and picks those minimizing the comparison
cost (for various pathkeys), and then adds orderings that might be
useful for operations higher up in the plan (ORDER BY, etc.). Finally,
it always keeps the ordering specified in the query, on the assumption
the user might have additional insights.

This introduces a new GUC enable_group_by_reordering, so that the
optimization may be disabled if needed.
---
 .../postgres_fdw/expected/postgres_fdw.out|  15 +-
 src/backend/optimizer/path/costsize.c | 363 ++-
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 602 ++
 src/backend/optimizer/plan/planner.c  | 467 --
 src/backend/optimizer/util/pathnode.c |   4 +-
 src/backend/utils/adt/selfuncs.c  |  38 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/cost.h  |   4 +-
 src/include/optimizer/paths.h |   7 +
 src/include/utils/selfuncs.h  |   5 +
 src/test/regress/expected/aggregates.out  | 244 ++-
 .../regress/expected/incremental_sort.out |   2 +-
 src/test/regress/expected/join.out|  51 +-
 src/test/regress/expected/merge.out   |  15 +-
 .../regress/expected/partition_aggregate.out  |  44 +-
 src/test/regress/expected/partition_join.out  |  75 +--
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/union.out   |  60 +-
 src/test/regress/sql/aggregates.sql   |  99 +++
 src/test/regress/sql/incremental_sort.sql |   2 +-
 23 files changed, 1760 insertions(+), 374 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 852b5b4707..3f3c181ac8 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9593,13 +9593,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 
ON (t1.a = t2.b) INNER J
 -- left outer join + nullable clause
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 
10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
-

Re: POC: GROUP BY optimization

2022-09-19 Thread David Rowley

On Wed, 13 Jul 2022 at 15:37, David Rowley  wrote:
> I'm just in this general area of the code again today and wondered
> about the header comment for the preprocess_groupclause() function.
>
> It says:
>
>  * In principle it might be interesting to consider other orderings of the
>  * GROUP BY elements, which could match the sort ordering of other
>  * possible plans (eg an indexscan) and thereby reduce cost.  We don't
>  * bother with that, though.  Hashed grouping will frequently win anyway.
>
> I'd say this commit makes that paragraph mostly obsolete.  It's only
> true now in the sense that we don't try orders that suit some index
> that would provide pre-sorted results for a GroupAggregate path.  The
> comment leads me to believe that we don't do anything at all to find a
> better order, and that's not true now.

I've just pushed a fix for this.

David

Re: POC: GROUP BY optimization

2022-09-05 Thread Michael Paquier

On Mon, Sep 05, 2022 at 12:14:48AM +0200, Tomas Vondra wrote:
> I've pushed the fix to 15+master. In the end I just used David's patches
> that set parallel_setup_cost to 0.

Thanks, Tomas!
--
Michael


signature.asc
Description: PGP signature

Re: POC: GROUP BY optimization

2022-09-04 Thread Jonathan S. Katz


On 9/4/22 6:14 PM, Tomas Vondra wrote:


I've pushed the fix to 15+master. In the end I just used David's patches
that set parallel_setup_cost to 0.


Thanks! I have closed the open item.

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature

Re: POC: GROUP BY optimization

2022-09-04 Thread Tomas Vondra

On 9/1/22 16:05, Jonathan S. Katz wrote:
> On 9/1/22 9:06 AM, Tomas Vondra wrote:
>>
>>
>> On 8/30/22 15:15, Jonathan S. Katz wrote:
>>> On 8/18/22 3:29 AM, Tomas Vondra wrote:


 On 8/18/22 03:32, David Rowley wrote:
>>>
> Here are a couple of patches to demo the idea.
>

 Yeah, that's an option too. I should have mentioned it along with the
 cpu_operator_cost.

 BTW would you mind taking a look at the costing? I think it's fine, but
 it would be good if someone not involved in the patch takes a look.
>>>
>>> With RMT hat on, just a friendly reminder that this is still on the open
>>> items list[1] -- we have the Beta 4 release next week and it would be
>>> great if we can get this resolved prior to it.
>>>
>>
>> I know. I'll fix this by tweaking one of the cost gucs, mentioned by
>> David.
> 
> Great -- thanks for the update!
> 

I've pushed the fix to 15+master. In the end I just used David's patches
that set parallel_setup_cost to 0.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: POC: GROUP BY optimization

2022-09-01 Thread Jonathan S. Katz


On 9/1/22 9:06 AM, Tomas Vondra wrote:



On 8/30/22 15:15, Jonathan S. Katz wrote:

On 8/18/22 3:29 AM, Tomas Vondra wrote:



On 8/18/22 03:32, David Rowley wrote:



Here are a couple of patches to demo the idea.



Yeah, that's an option too. I should have mentioned it along with the
cpu_operator_cost.

BTW would you mind taking a look at the costing? I think it's fine, but
it would be good if someone not involved in the patch takes a look.


With RMT hat on, just a friendly reminder that this is still on the open
items list[1] -- we have the Beta 4 release next week and it would be
great if we can get this resolved prior to it.



I know. I'll fix this by tweaking one of the cost gucs, mentioned by David.


Great -- thanks for the update!

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature

Re: POC: GROUP BY optimization

2022-09-01 Thread Tomas Vondra




On 8/30/22 15:15, Jonathan S. Katz wrote:
> On 8/18/22 3:29 AM, Tomas Vondra wrote:
>>
>>
>> On 8/18/22 03:32, David Rowley wrote:
> 
>>> Here are a couple of patches to demo the idea.
>>>
>>
>> Yeah, that's an option too. I should have mentioned it along with the
>> cpu_operator_cost.
>>
>> BTW would you mind taking a look at the costing? I think it's fine, but
>> it would be good if someone not involved in the patch takes a look.
> 
> With RMT hat on, just a friendly reminder that this is still on the open
> items list[1] -- we have the Beta 4 release next week and it would be
> great if we can get this resolved prior to it.
> 

I know. I'll fix this by tweaking one of the cost gucs, mentioned by David.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: POC: GROUP BY optimization

2022-08-30 Thread Jonathan S. Katz


On 8/18/22 3:29 AM, Tomas Vondra wrote:



On 8/18/22 03:32, David Rowley wrote:



Here are a couple of patches to demo the idea.



Yeah, that's an option too. I should have mentioned it along with the
cpu_operator_cost.

BTW would you mind taking a look at the costing? I think it's fine, but
it would be good if someone not involved in the patch takes a look.


With RMT hat on, just a friendly reminder that this is still on the open 
items list[1] -- we have the Beta 4 release next week and it would be 
great if we can get this resolved prior to it.


Thanks!

Jonathan

[1] https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items
[2] 
https://www.postgresql.org/message-id/9d251aec-cea2-bc1a-5ed8-46ef0bcf6...@postgresql.org


OpenPGP_signature
Description: OpenPGP digital signature

Re: POC: GROUP BY optimization

2022-08-18 Thread Tomas Vondra




On 8/18/22 03:32, David Rowley wrote:
> On Thu, 18 Aug 2022 at 02:46, Tomas Vondra
>  wrote:
>> So I don't think the current costing is wrong, but it certainly is more
>> complex. But the test does not test what it intended - I have two ideas
>> how to make it work:
>>
>> 1) increase the number of rows in the table
>>
>> 2) increase cpu_operator_cost (for that one test?)
>>
>> 3) tweak the costing somehow, to increase the cost a bit
> 
> Why not, 4) SET parallel_setup_cost = 0;  there are plenty of other
> places we do just that so we get a parallel plan without having to
> generate enough cost to drown out the parallel worker startup cost.
> 
> Here are a couple of patches to demo the idea.
> 

Yeah, that's an option too. I should have mentioned it along with the
cpu_operator_cost.

BTW would you mind taking a look at the costing? I think it's fine, but
it would be good if someone not involved in the patch takes a look.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: POC: GROUP BY optimization

2022-08-17 Thread David Rowley

On Thu, 18 Aug 2022 at 02:46, Tomas Vondra
 wrote:
> So I don't think the current costing is wrong, but it certainly is more
> complex. But the test does not test what it intended - I have two ideas
> how to make it work:
>
> 1) increase the number of rows in the table
>
> 2) increase cpu_operator_cost (for that one test?)
>
> 3) tweak the costing somehow, to increase the cost a bit

Why not, 4) SET parallel_setup_cost = 0;  there are plenty of other
places we do just that so we get a parallel plan without having to
generate enough cost to drown out the parallel worker startup cost.

Here are a couple of patches to demo the idea.

David
diff --git a/src/test/regress/expected/partition_aggregate.out 
b/src/test/regress/expected/partition_aggregate.out
index 4b41ccf1aa..db36e3a150 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -942,6 +942,7 @@ INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 
4, 'FM') FROM gen
 ANALYZE pagg_tab_ml;
 -- For Parallel Append
 SET max_parallel_workers_per_gather TO 2;
+SET parallel_setup_cost = 0;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, but still we do not see a partial aggregation as array_agg()
@@ -1025,6 +1026,7 @@ SELECT a, sum(b), array_agg(distinct c), count(*) FROM 
pagg_tab_ml GROUP BY a HA
->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
 (25 rows)
 
+RESET parallel_setup_cost;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, thus we will have a partial aggregation for them.
diff --git a/src/test/regress/sql/partition_aggregate.sql 
b/src/test/regress/sql/partition_aggregate.sql
index c17294b15b..ab070fee24 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -222,6 +222,7 @@ ANALYZE pagg_tab_ml;
 
 -- For Parallel Append
 SET max_parallel_workers_per_gather TO 2;
+SET parallel_setup_cost = 0;
 
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
@@ -235,6 +236,8 @@ SELECT a, sum(b), array_agg(distinct c), count(*) FROM 
pagg_tab_ml GROUP BY a HA
 EXPLAIN (COSTS OFF)
 SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a 
HAVING avg(b) < 3;
 
+RESET parallel_setup_cost;
+
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, thus we will have a partial aggregation for them.
diff --git a/src/test/regress/expected/partition_aggregate.out 
b/src/test/regress/expected/partition_aggregate.out
index a08a3825ff..0dc6d63347 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -942,40 +942,43 @@ INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i 
% 4, 'FM') FROM gen
 ANALYZE pagg_tab_ml;
 -- For Parallel Append
 SET max_parallel_workers_per_gather TO 2;
+SET parallel_setup_cost = 0;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, but still we do not see a partial aggregation as array_agg()
 -- is not partial agg safe.
 EXPLAIN (COSTS OFF)
 SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a 
HAVING avg(b) < 3 ORDER BY 1, 2, 3;
-  QUERY PLAN   
   
---
- Sort
-   Sort Key: pagg_tab_ml.a, (sum(pagg_tab_ml.b)), (array_agg(DISTINCT 
pagg_tab_ml.c))
-   ->  Append
- ->  GroupAggregate
-   Group Key: pagg_tab_ml.a
-   Filter: (avg(pagg_tab_ml.b) < '3'::numeric)
-   ->  Sort
- Sort Key: pagg_tab_ml.a
- ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
- ->  GroupAggregate
-   Group Key: pagg_tab_ml_2.a
-   Filter: (avg(pagg_tab_ml_2.b) < '3'::numeric)
-   ->  Sort
- Sort Key: pagg_tab_ml_2.a
- ->  Append
-   ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
-   ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
- ->  GroupAggregate
-   Group Key: pagg_tab_ml_5.a
-   Filter: (avg(pagg_tab_ml_5.b) < '3'::numeric)
-   ->  Sort
- Sort Key: pagg_tab_ml_5.a
- ->  Append
-   ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
-

Re: POC: GROUP BY optimization

2022-08-17 Thread Tomas Vondra

On 8/2/22 13:14, David Rowley wrote:
> On Tue, 2 Aug 2022 at 22:21, Michael Paquier  wrote:
>>
>> On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote:
>>> I agree this is a mistake in db0d67db2 that makes the test useless.
>>
>> Tomas, please note that this is an open item assigned to you.  Are you
>> planning to do something with these regression tests by beta3?
> 
> There's still something to do there for PG15, but 1349d2790 (just gone
> in to master) made those two tests run as a parallel query again.
> 

Hi,

I've been looking at this, and as I speculated before it's due to the
sort costing changing a little bit. On PG14, the costing of the plans
looks like this:

 Gather  (cost=1869.39..2823.15 rows=8 width=52)

and with disabled parallelism, it's like this:

 Append  (cost=998.04..3000.64 rows=10 width=52)

so there's a (fairly small) diffrence in favor of the parallel plan. But
on PG15 it's the other way around - the selected non-parallel one is
costed like this:

 Append  (cost=779.41..2490.61 rows=10 width=52)

and by setting parallel_setup_cost=0 you get this:

 Gather  (cost=700.34..1531.76 rows=8 width=52)

So with the setup cost included it's ~2531, and it loses to the simple
plan. This happens because the patch changed sort costing - the same
sort on PG14 and PG15 looks like this:

 PG14: ->  Sort  (cost=998.04..1028.04 rows=12000 width=13)

 PG15: ->  Sort  (cost=779.41..809.41 rows=12000 width=13)

As mentioned, the commit tweaked sort costing - before it was pretty
much just

  comparison_cost * tuples * LOG2(tuples)

but the patch needs to cost different pathkey orderings, and consider
that maybe we don't need to compare all the keys (which the original
costing kind assumes). That's the whole point of this optimization.

The costing (compute_cpu_sort_cost) considers a couple other things,
like cost of the comparator function for the data type, width of the
values, groupings determined by preceding keys, and so on.

It might seem strange that a query with a single pathkey changes, but
that single pathkey is costed the same way, of course. In principle we
might have a special costing for this case, but I'd bet that would
result in pretty surprising inconsistencies when adding a sort key
(going from 1 to 2 keys).

So I don't think the current costing is wrong, but it certainly is more
complex. But the test does not test what it intended - I have two ideas
how to make it work:

1) increase the number of rows in the table

2) increase cpu_operator_cost (for that one test?)

3) tweak the costing somehow, to increase the cost a bit

Both (1) and (2) work - I've tried doubling the number of rows or
setting the operator cost to 0.005, and that does the trick, but maybe a
smaller change would be enough.

I don't like (3) very much - changing the costing just to get the same
test behavior as on older release does not seem very principled. Yes,
maybe it should be tweaked, but not because of a regression test.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company-- Without ORDER BY clause, to test Gather at top-most path
EXPLAIN (COSTS ON)
SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3;
QUERY PLAN 
---
 Gather  (cost=1869.39..2823.15 rows=8 width=52)
   Workers Planned: 2
   ->  Parallel Append  (cost=869.39..1822.35 rows=4 width=52)
 ->  GroupAggregate  (cost=998.04..1178.25 rows=4 width=52)
   Group Key: pagg_tab_ml.a
   Filter: (avg(pagg_tab_ml.b) < '3'::numeric)
   ->  Sort  (cost=998.04..1028.04 rows=12000 width=13)
 Sort Key: pagg_tab_ml.a
 ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml  (cost=0.00..185.00 rows=12000 width=13)
 ->  GroupAggregate  (cost=869.39..1019.56 rows=3 width=52)
   Group Key: pagg_tab_ml_5.a
   Filter: (avg(pagg_tab_ml_5.b) < '3'::numeric)
   ->  Sort  (cost=869.39..894.39 rows=1 width=13)
 Sort Key: pagg_tab_ml_5.a
 ->  Append  (cost=0.00..205.00 rows=1 width=13)
   ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5  (cost=0.00..108.00 rows=7000 width=13)
   ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6  (cost=0.00..47.00 rows=3000 width=13)
 ->  GroupAggregate  (cost=682.63..802.77 rows=3 width=52)
   Group Key: pagg_tab_ml_2.a
   Filter: (avg(pagg_tab_ml_2.b) < '3'::numeric)
   ->  Sort  (cost=682.63..702.63 rows=8000 width=13)
 Sort Key: pagg_tab_ml_2.a
 ->  Append  (cost=0.00..164.00 rows=8000 width=13)
   ->  Seq Scan on

Re: POC: GROUP BY optimization

2022-08-02 Thread David Rowley

On Tue, 2 Aug 2022 at 22:21, Michael Paquier  wrote:
>
> On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote:
> > I agree this is a mistake in db0d67db2 that makes the test useless.
>
> Tomas, please note that this is an open item assigned to you.  Are you
> planning to do something with these regression tests by beta3?

There's still something to do there for PG15, but 1349d2790 (just gone
in to master) made those two tests run as a parallel query again.

David

Re: POC: GROUP BY optimization

2022-08-02 Thread Michael Paquier

On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote:
> I agree this is a mistake in db0d67db2 that makes the test useless.

Tomas, please note that this is an open item assigned to you.  Are you
planning to do something with these regression tests by beta3?
--
Michael


signature.asc
Description: PGP signature

Re: POC: GROUP BY optimization

2022-07-15 Thread Tomas Vondra




On 7/15/22 07:18, David Rowley wrote:
> On Thu, 31 Mar 2022 at 12:19, Tomas Vondra
>  wrote:
>> Pushed, after going through the patch once more, running check-world
>> under valgrind, and updating the commit message.
> 
> I'm still working in this area and I noticed that db0d67db2 updated
> some regression tests in partition_aggregate.out without any care as
> to what the test was testing.
> 
> The comment above the test reads:
> 
> -- Without ORDER BY clause, to test Gather at top-most path
> 
> and you've changed the expected plan from being a parallel plan with a
> Gather to being a serial plan. So it looks like the test might have
> become useless.
> 

I agree this is a mistake in db0d67db2 that makes the test useless.

> I see that the original plan appears to come back with some
> adjustments to parallel_setup_cost and parallel_tuple_cost. It seems a
> bit strange to me that the changes with this patch would cause a
> change of plan for this. There is only 1 GROUP BY column in the query
> in question. There's no rearrangement to do with a single column GROUP
> BY.
> 

It might seem a bit strange, but the patch tweaked the costing a bit, so
it's not entirely unexpected. I'd bet the plan cost changed just a teeny
bit, but enough to change the cheapest plan. The costing changed for all
group counts, including a single group.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: POC: GROUP BY optimization

2022-07-14 Thread David Rowley

On Thu, 31 Mar 2022 at 12:19, Tomas Vondra
 wrote:
> Pushed, after going through the patch once more, running check-world
> under valgrind, and updating the commit message.

I'm still working in this area and I noticed that db0d67db2 updated
some regression tests in partition_aggregate.out without any care as
to what the test was testing.

The comment above the test reads:

-- Without ORDER BY clause, to test Gather at top-most path

and you've changed the expected plan from being a parallel plan with a
Gather to being a serial plan. So it looks like the test might have
become useless.

I see that the original plan appears to come back with some
adjustments to parallel_setup_cost and parallel_tuple_cost. It seems a
bit strange to me that the changes with this patch would cause a
change of plan for this. There is only 1 GROUP BY column in the query
in question. There's no rearrangement to do with a single column GROUP
BY.

David

Re: POC: GROUP BY optimization

2022-07-12 Thread David Rowley

On Thu, 31 Mar 2022 at 12:19, Tomas Vondra
 wrote:
> Pushed, after going through the patch once more, running check-world
> under valgrind, and updating the commit message.

I'm just in this general area of the code again today and wondered
about the header comment for the preprocess_groupclause() function.

It says:

 * In principle it might be interesting to consider other orderings of the
 * GROUP BY elements, which could match the sort ordering of other
 * possible plans (eg an indexscan) and thereby reduce cost.  We don't
 * bother with that, though.  Hashed grouping will frequently win anyway.

I'd say this commit makes that paragraph mostly obsolete.  It's only
true now in the sense that we don't try orders that suit some index
that would provide pre-sorted results for a GroupAggregate path.  The
comment leads me to believe that we don't do anything at all to find a
better order, and that's not true now.

David

Re: POC: GROUP BY optimization

2022-03-30 Thread Tomas Vondra

Pushed, after going through the patch once more, running check-world
under valgrind, and updating the commit message.

Thanks to everyone who reviewed/tested this patch!

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

1 2 >

1 - 100 of 167 matches

Mail list logo