Re: Multiple grouping set specs referencing duplicate alias

2022-10-23 Thread Tom Lane
David Kimura  writes:
> I think I may have stumbled across a case of wrong results on HEAD (same
> through version 9.6, though interestingly 9.5 produces different results
> altogether).

> test=# SELECT i AS ai1, i AS ai2 FROM generate_series(1,3)i GROUP BY
> ai2, ROLLUP(ai1) ORDER BY ai1, ai2;

Yeah, this is an instance of an issue we've known about for awhile:
when using grouping sets (ROLLUP), the planner fails to distinguish
between "ai1" and "ai1 as possibly nulled by the action of the
grouping node".  This has been discussed at, eg, [1] and [2].
The direction I'd like to take to fix it is to invent explicit
labeling of Vars that have been nulled by some operation such as
outer joins or grouping, and then represent grouping set outputs
as either PlaceHolderVars or Vars tied to a new RTE that represents
the grouping step.  I have been working on a patch that'd do the
first half of that [3], but it's been slow going, because we've
indulged in a lot of semantic squishiness in this area and cleaning
it all up is a large undertaking.

> I tinkered a bit and hacked together an admittedly ugly patch that triggers an
> explicit sort constructed from the parse tree.

I seriously doubt that that'll fix all the issues in this area.
We really really need to understand that a PathKey based on
the scan-level value of a Var is different from a PathKey based
on a post-nulling-step value.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/CAMbWs48AtQTQGk37MSyDk_EAgDO3Y0iA_LzvuvGQ2uO_Wh2muw%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/7dbdcf5c-b5a6-ef89-4958-da212fe10176%40iki.fi
[3] https://www.postgresql.org/message-id/flat/830269.1656693...@sss.pgh.pa.us




Multiple grouping set specs referencing duplicate alias

2022-10-21 Thread David Kimura
Hi all!

I think I may have stumbled across a case of wrong results on HEAD (same
through version 9.6, though interestingly 9.5 produces different results
altogether).

test=# SELECT i AS ai1, i AS ai2 FROM generate_series(1,3)i GROUP BY
ai2, ROLLUP(ai1) ORDER BY ai1, ai2;

 ai1 | ai2
-+-
   1 |   1
 |   1
   2 |   2
 |   2
   3 |   3
 |   3
(6 rows)

I had expected:

 ai1 | ai2
-+-
   1 |   1
   2 |   2
   3 |   3
 |   1
 |   2
 |   3
(6 rows)

It seems to me that the plan is missing a Sort node (on ai1 and ai2) above the
Aggregate node.

   QUERY PLAN

 GroupAggregate
   Group Key: i, i
   Group Key: i
   ->  Sort
 Sort Key: i
 ->  Function Scan on generate_series i

I have a hunch part of the issue may be an assumption that a duplicate aliased
column will produce the same column values and hence isn't included in the
range table, nor subsequently the pathkeys. However, that assumption does not
seem to be true for queries with multiple grouping set specifications:

test=#  SELECT i as ai1, i as ai2 FROM (values (1),(2),(3)) v(i) GROUP
BY ai1, ROLLUP(ai2);
 ai1 | ai2
-+-
   1 |   1
   2 |   2
   3 |   3
   1 |
   2 |
   3 |
(6 rows)

It seems to me that excluding the duplicate alias from the pathkeys is leading
to a case where the group order is incorrectly determined to satisfy the sort
order. Thus create_ordered_paths() decides against applying an explicit sort
node. But simply forcing an explicit sort still seems wrong since we've
effectively lost a relevant column for the sort.

I tinkered a bit and hacked together an admittedly ugly patch that triggers an
explicit sort constructed from the parse tree. An alternative approach I had
considered was to update the rewriteHandler to explicitly force the existence of
the duplicate alias column in the range tables. But that also felt meh.

Does this seem like a legit issue? And if so, any thoughts on alternative
approaches?

Thanks,
David Kimura
commit 7e7b9a0dce1ba60f4fc087082f04b04e7f405539
Author: David Kimura 
Date:   Wed Oct 19 21:32:27 2022 +

Fix multiple grouping specs using duplicate alias bug

There seems to have been an assumption that multiple aliases that refer
to the same column would always produce mirrored results. However, that
does not seem to be the case when the aliased columns are passed through
different grouping set specs. Consider the case:

  SELECT i AS alias1, i AS alias2 FROM generate_series(1,3)i GROUP BY alias2, ROLLUP(alias1);

This led to a scenario where a relevant column is collapsed at parse
time and thus excluded from sort pathkeys. Consequently group order
could then incorrectly decide it satisfied the sort order. Which leads
to create_ordered_paths() forgoing an explicit sort node.

A solution to the issue is to force an explicit sort and revive the
relevant column info.

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ac86ce9003..0b81d4d211 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2157,6 +2157,46 @@ change_plan_targetlist(Plan *subplan, List *tlist, bool tlist_parallel_safe)
 	return subplan;
 }
 
+/*
+ * reevaluate_sort_info
+ *
+ * Build the sort node info from the parse tree.
+ */
+static void
+reevaluate_sort_info(Query *parse, Sort *plan)
+{
+	int			i = 0;
+	ListCell   *l1;
+	ListCell   *l2;
+
+	plan->numCols = list_length(parse->sortClause);
+
+	plan->sortColIdx = palloc(sizeof(int) * plan->numCols);
+	plan->sortOperators = palloc(sizeof(int) * plan->numCols);
+	plan->collations = palloc(sizeof(int) * plan->numCols);
+	plan->nullsFirst = palloc(sizeof(int) * plan->numCols);
+
+	foreach(l1, parse->sortClause)
+	{
+		foreach(l2, parse->targetList)
+		{
+			SortGroupClause *sortClause = (SortGroupClause *) lfirst(l1);
+			TargetEntry *targetEntry = (TargetEntry *) lfirst(l2);
+
+			if (sortClause->tleSortGroupRef == targetEntry->ressortgroupref)
+			{
+plan->sortColIdx[i] = targetEntry->resno;
+plan->sortOperators[i] = sortClause->sortop;
+plan->collations[i] = exprCollation((Node *) targetEntry->expr);
+plan->nullsFirst[i] = sortClause->nulls_first;
+
+i++;
+break;
+			}
+		}
+	}
+}
+
 /*
  * create_sort_plan
  *
@@ -2187,6 +2227,9 @@ create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
    IS_OTHER_REL(best_path->subpath->parent) ?
    best_path->path.parent->relids : NULL);
 
+	if (best_path->need_evaluation)
+		reevaluate_sort_info(root->parse, plan);
+
 	copy_generic_path_info(&plan->plan, (Path *) best_path);
 
 	return plan;
@@ -2213,6 +2256,9 @@ create_incrementalsort_plan(PlannerInfo *root, IncrementalSortPath *best_path,
 			  best_path->spath.path.parent->relids : NULL,
 			  best_path->nPresortedCols);
 
+	if (best_path->spath.nee