alamb commented on code in PR #9991:
URL: https://github.com/apache/arrow-datafusion/pull/9991#discussion_r1559832348
##########
datafusion/sqllogictest/test_files/group_by.slt:
##########
@@ -2129,41 +2129,11 @@ query III
1 2 1550
1 3 2175
-
-# test_source_sorted_groupby2
-# If ordering is not important for the aggregation function, we should ignore
the ordering requirement. Hence
-# "ORDER BY a DESC" should have no effect.
-query TT
-EXPLAIN SELECT a, d,
- SUM(c ORDER BY a DESC) as summation1
- FROM annotated_data_infinite2
- GROUP BY d, a
-----
-logical_plan
-Projection: annotated_data_infinite2.a, annotated_data_infinite2.d,
SUM(annotated_data_infinite2.c) ORDER BY [annotated_data_infinite2.a DESC NULLS
FIRST] AS summation1
---Aggregate: groupBy=[[annotated_data_infinite2.d,
annotated_data_infinite2.a]], aggr=[[SUM(CAST(annotated_data_infinite2.c AS
Int64)) ORDER BY [annotated_data_infinite2.a DESC NULLS FIRST]]]
-----TableScan: annotated_data_infinite2 projection=[a, c, d]
-physical_plan
-ProjectionExec: expr=[a@1 as a, d@0 as d, SUM(annotated_data_infinite2.c)
ORDER BY [annotated_data_infinite2.a DESC NULLS FIRST]@2 as summation1]
---AggregateExec: mode=Single, gby=[d@2 as d, a@0 as a],
aggr=[SUM(annotated_data_infinite2.c)], ordering_mode=PartiallySorted([1])
-----StreamingTableExec: partition_sizes=1, projection=[a, c, d],
infinite_source=true, output_ordering=[a@0 ASC NULLS LAST]
-
-query III
+statement error DataFusion error: This feature is not implemented: ORDER BY is
not implemented for SUM
Review Comment:
@mustafasrepo / @ozankabak / @metesynnada can you help us understand if
the ordering for `SUM(c ORDER BY a DESC)` has meaning?
DataFusion currently ignores cases
We propose making it an error to include clauses that make no sense
However, I could see an argument for permitting the user to write `ORDER BY`
even if the aggregate didn't care about ordering (and simply remove the ORDER
BY as a optimization)
I can also see the rationale for failing fast and erroring as the user
probably didn't meant to specify an ordering on SUM 🤔
##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -3428,3 +3428,16 @@ SELECT LAST_VALUE(column1 ORDER BY column2 DESC) IGNORE
NULLS FROM t;
statement ok
DROP TABLE t;
+
+# Test for IGNORE NULLS / ORDER BY not implemented
+statement error DataFusion error: This feature is not implemented: IGNORE
NULLS is not implemented for COUNT
+SELECT COUNT(*) IGNORE NULLS FROM (values (1), (null), (2));
Review Comment:
I think we need to check in the physical plan layer as well for the reason
@andygrove says.
Another approach could be an analyzer rule (following the model of how
certain subqueries are handled) like this:
https://github.com/apache/arrow-datafusion/blob/582050728914650c6d4340ca803a0e9af087d8ec/datafusion/optimizer/src/analyzer/subquery.rs#L32-L39
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]