alamb commented on code in PR #9991:
URL: https://github.com/apache/arrow-datafusion/pull/9991#discussion_r1559832348


##########
datafusion/sqllogictest/test_files/group_by.slt:
##########
@@ -2129,41 +2129,11 @@ query III
 1 2 1550
 1 3 2175
 
-
-# test_source_sorted_groupby2
-# If ordering is not important for the aggregation function, we should ignore 
the ordering requirement. Hence
-# "ORDER BY a DESC" should have no effect.
-query TT
-EXPLAIN SELECT a, d,
- SUM(c ORDER BY a DESC) as summation1
- FROM annotated_data_infinite2
- GROUP BY d, a
-----
-logical_plan
-Projection: annotated_data_infinite2.a, annotated_data_infinite2.d, 
SUM(annotated_data_infinite2.c) ORDER BY [annotated_data_infinite2.a DESC NULLS 
FIRST] AS summation1
---Aggregate: groupBy=[[annotated_data_infinite2.d, 
annotated_data_infinite2.a]], aggr=[[SUM(CAST(annotated_data_infinite2.c AS 
Int64)) ORDER BY [annotated_data_infinite2.a DESC NULLS FIRST]]]
-----TableScan: annotated_data_infinite2 projection=[a, c, d]
-physical_plan
-ProjectionExec: expr=[a@1 as a, d@0 as d, SUM(annotated_data_infinite2.c) 
ORDER BY [annotated_data_infinite2.a DESC NULLS FIRST]@2 as summation1]
---AggregateExec: mode=Single, gby=[d@2 as d, a@0 as a], 
aggr=[SUM(annotated_data_infinite2.c)], ordering_mode=PartiallySorted([1])
-----StreamingTableExec: partition_sizes=1, projection=[a, c, d], 
infinite_source=true, output_ordering=[a@0 ASC NULLS LAST]
-
-query III
+statement error DataFusion error: This feature is not implemented: ORDER BY is 
not implemented for SUM

Review Comment:
   @mustafasrepo  / @ozankabak  / @metesynnada  can you help us understand if 
the ordering for `SUM(c ORDER BY a DESC)` has meaning?
   
   DataFusion currently ignores cases
   
   We propose making it an error to include clauses that make no sense
   
   However, I could see an argument for permitting the user to write `ORDER BY` 
even if the aggregate didn't care about ordering (and simply remove the ORDER 
BY as a optimization) 
   
   I can also see the rationale for failing fast and erroring as the user 
probably didn't meant to specify an ordering on SUM 🤔 



##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -3428,3 +3428,16 @@ SELECT LAST_VALUE(column1 ORDER BY column2 DESC) IGNORE 
NULLS FROM t;
 
 statement ok
 DROP TABLE t;
+
+# Test for IGNORE NULLS / ORDER BY not implemented
+statement error DataFusion error: This feature is not implemented: IGNORE 
NULLS is not implemented for COUNT
+SELECT COUNT(*) IGNORE NULLS FROM (values (1), (null), (2));

Review Comment:
   I think we need to check in the physical plan layer as well for the reason 
@andygrove  says.
   
   Another approach could be an analyzer rule (following the model of how 
certain subqueries are handled) like this:
    
   
https://github.com/apache/arrow-datafusion/blob/582050728914650c6d4340ca803a0e9af087d8ec/datafusion/optimizer/src/analyzer/subquery.rs#L32-L39
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to