Re: [PR] [SPARK-45796][SQL] Support MODE() WITHIN GROUP (ORDER BY col) [spark]

via GitHub Thu, 07 Dec 2023 17:40:18 -0800


beliefer commented on code in PR #44184:
URL: https://github.com/apache/spark/pull/44184#discussion_r1419842677



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala:
##########
@@ -18,22 +18,21 @@
 package org.apache.spark.sql.catalyst.expressions.aggregate
 
 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
-import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{DataTypeMismatch, TypeCheckSuccess}
-import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionDescription, ImplicitCastInputTypes, Literal}
-import org.apache.spark.sql.catalyst.trees.{BinaryLike, UnaryLike}
+import org.apache.spark.sql.catalyst.analysis.{ExpressionBuilder, UnresolvedWithinGroup}
+import org.apache.spark.sql.catalyst.expressions.{Ascending, Descending, Expression, ExpressionDescription, ImplicitCastInputTypes, SortOrder}
+import org.apache.spark.sql.catalyst.trees.UnaryLike
 import org.apache.spark.sql.catalyst.types.PhysicalDataType
 import org.apache.spark.sql.catalyst.util.GenericArrayData
-import org.apache.spark.sql.catalyst.util.TypeUtils.toSQLExpr
-import org.apache.spark.sql.errors.DataTypeErrors.{toSQLId, toSQLType}
+import org.apache.spark.sql.errors.QueryCompilationErrors
 import org.apache.spark.sql.types.{AbstractDataType, AnyDataType, ArrayType, BooleanType, DataType}
 import org.apache.spark.util.collection.OpenHashMap
 
 // scalastyle:off line.size.limit
 @ExpressionDescription(
   usage = """
-    _FUNC_(col[, deterministic]) - Returns the most frequent value for the values within `col`. NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL.
-      When multiple values have the same greatest frequency then either any of values is returned if `deterministic` is false or is not defined, or the lowest value is returned if `deterministic` is true.""",
+    _FUNC_(col[, reverse]) - Returns the most frequent value for the values within `col`. NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL.
+      When multiple values have the same greatest frequency only one value will be returned. The value will be chosen based on optional reverse value. Return the smallest value
+      if reverse is false or the largest value if reverse is true from multiple values with the same frequency. If reverse is not specified the chosen value is not determined.""",

Review Comment:
   This is to keep compatibility with the original `deterministicExpr`:
   the `false` in `mode(col, false)` is the value of `deterministicExpr`,
   and the `true` in `mode(col, true)` is also the value of `deterministicExpr`.
   
   After an offline discussion, @cloud-fan suggested keeping this compatibility for now.
   I'm not sure whether users really need `deterministicExpr`.
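   To make the compatibility question concrete, here is a minimal Scala sketch of the tie-breaking rules exactly as worded in the two usage strings in the diff above. It is a hypothetical, simplified model over a `Seq[Option[T]]` (with `None` standing in for SQL NULL), not Spark's `Mode` aggregate; the names `ModeSketch`, `modeDeterministic` and `modeReverse` are made up for illustration.

```scala
// Hypothetical model of the two documented contracts; NOT Spark's Mode implementation.
object ModeSketch {

  // Frequency of each non-NULL value; None models SQL NULL and is dropped.
  private def counts[T](values: Seq[Option[T]]): Map[T, Int] =
    values.flatten.groupBy(identity).map { case (v, group) => v -> group.size }

  // All values sharing the greatest frequency (empty if there are no non-NULL values).
  private def tiedForMostFrequent[T](values: Seq[Option[T]]): Seq[T] = {
    val freq = counts(values)
    if (freq.isEmpty) Seq.empty
    else {
      val top = freq.values.max
      freq.filter { case (_, n) => n == top }.keys.toSeq
    }
  }

  // Old wording: mode(col[, deterministic]).
  // deterministic = true -> lowest tied value; otherwise any tied value
  // (here simply whichever the map yields first).
  def modeDeterministic[T](values: Seq[Option[T]], deterministic: Boolean)(
      implicit ord: Ordering[T]): Option[T] = {
    val tied = tiedForMostFrequent(values)
    if (tied.isEmpty) None
    else if (deterministic) Some(tied.min)
    else tied.headOption
  }

  // Proposed wording: mode(col[, reverse]).
  // reverse = false -> smallest, reverse = true -> largest, absent -> unspecified.
  def modeReverse[T](values: Seq[Option[T]], reverse: Option[Boolean])(
      implicit ord: Ordering[T]): Option[T] = {
    val tied = tiedForMostFrequent(values)
    if (tied.isEmpty) None
    else reverse match {
      case Some(false) => Some(tied.min)
      case Some(true)  => Some(tied.max)
      case None        => tied.headOption
    }
  }
}
```

   With the input `Seq(Some(3), Some(1), Some(3), Some(1), None)`, the values `1` and `3` tie. Under the old `deterministic` wording, `mode(col, true)` returns `1` (the lowest); under the proposed `reverse` wording, the same call returns `3` (the largest). Reinterpreting the existing boolean argument could therefore change results for existing queries.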



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

