ethan-tyler commented on code in PR #19252:
URL: https://github.com/apache/datafusion/pull/19252#discussion_r2607956837
##########
datafusion/sqllogictest/test_files/grouping.slt:
##########
@@ -212,3 +212,15 @@ select c1, grouping(c1, c2) from test group by CUBE(c1);
statement error zero arguments
select c1, grouping() from test group by CUBE(c1);
+
+# grouping_sets_with_empty_set
+query I
+SELECT COUNT(*) FROM test GROUP BY GROUPING SETS (());
+----
+2
+
+# grouping_sets_with_empty_set
+query I
+SELECT SUM(v1) FROM generate_series(10) AS t1(v1) GROUP BY GROUPING SETS(())
+----
+55
Review Comment:
This is good to have both COUNT and SUM tested here.
##########
datafusion/physical-plan/src/aggregates/no_grouping.rs:
##########
@@ -249,12 +250,30 @@ fn scalar_cmp_null_short_circuit(
}
}
+/// Prepend the grouping ID column to the output columns if present.
+///
+/// For GROUPING SETS with no GROUP BY expressions, the schema includes a
`__grouping_id`
+/// column that must be present in the output. This function inserts it at the
beginning
+/// of the columns array to maintain schema alignment.
+fn prepend_grouping_id_column(
+ mut columns: Vec<Arc<dyn arrow::array::Array>>,
+ grouping_id: Option<&ScalarValue>,
+) -> Result<Vec<Arc<dyn arrow::array::Array>>> {
+ if let Some(id) = grouping_id {
+ let num_rows = columns.first().map(|array| array.len()).unwrap_or(1);
+ let grouping_ids = id.to_array_of_size(num_rows)?;
+ columns.insert(0, grouping_ids);
+ }
+ Ok(columns)
+}
+
impl AggregateStream {
/// Create a new AggregateStream
pub fn new(
agg: &AggregateExec,
context: &Arc<TaskContext>,
partition: usize,
+ grouping_id: Option<ScalarValue>,
Review Comment:
Looks like this is always None at the call site `mod.rs:742` , is this
scaffolding for
something planned, or can we drop it? If it's for future use, maybe add a
comment so nobody removes it thinking it's dead code.
##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -885,7 +898,7 @@ impl AggregateExec {
};
match self.mode {
AggregateMode::Final | AggregateMode::FinalPartitioned
- if self.group_by.expr.is_empty() =>
+ if self.group_by.is_true_no_grouping() =>
Review Comment:
This should probably be `self.group_by.expr.is_empty()` instead.
`GROUPING SETS(())` still produces exactly one row (it's aggregating
everything),
but `is_true_no_grouping()` returns false for it since has_grouping_set is
true.
So we'd fall through to the inexact branch and report the wrong cardinality.
```suggestion
if self.group_by.expr.is_empty() =>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]