[GitHub] [spark] tanelk commented on a change in pull request #30999: [SPARK-33971][SQL] Eliminate distinct from more aggregates

GitBox Sun, 10 Jan 2021 20:27:16 -0800


tanelk commented on a change in pull request #30999:
URL: https://github.com/apache/spark/pull/30999#discussion_r554702109




##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -349,11 +349,19 @@ abstract class Optimizer(catalogManager: CatalogManager)
  */
 object EliminateDistinct extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformExpressions  {
-    case ae: AggregateExpression if ae.isDistinct =>
-      ae.aggregateFunction match {
-        case _: Max | _: Min => ae.copy(isDistinct = false)
-        case _ => ae
-      }
+    case ae: AggregateExpression if ae.isDistinct && 
isDuplicateAgnostic(ae.aggregateFunction) =>
+      ae.copy(isDistinct = false)
+  }
+
+  private def isDuplicateAgnostic(af: AggregateFunction): Boolean = af match {
+    case _: Max => true
+    case _: Min => true
+    case _: BitAndAgg => true
+    case _: BitOrAgg => true
+    case _: First => true

Review comment:
       In another PR I try to argue that they should be deterministic: #29810
   TLDR: An analogous aggregator would be  sum on float and double datatype - 
its result does depend on the order of its inputs, but is deterministic. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] tanelk commented on a change in pull request #30999: [SPARK-33971][SQL] Eliminate distinct from more aggregates

Reply via email to