Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

via GitHub Tue, 04 Jun 2024 15:17:48 -0700


GideonPotok commented on code in PR #46597:
URL: https://github.com/apache/spark/pull/46597#discussion_r1626660746



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala:
##########
@@ -74,16 +90,25 @@ case class Mode(
     if (buffer.isEmpty) {
       return null
     }
-
+    val collationAwareBuffer = child.dataType match {
+      case c: StringType if
+        !CollationFactory.fetchCollation(c.collationId).supportsBinaryEquality =>
+        val collationId = c.collationId
+        val modeMap = buffer.toSeq.groupMapReduce {

Review Comment:
   Also, doing it that way is a code smell. It brings a lot of type-aware logic
   into a class where such logic is not otherwise seen. E.g., the following
   would have to be changed to include:
   1. an `isInstanceOf[UTF8String]` condition,
   2. an `asInstanceOf[UTF8String]` cast and the transformation into a collation key,
   3. etc.
   ```
     /**
      * Check if a key exists at the provided position using object equality rather than
      * cooperative equality. Otherwise, hash sets will mishandle values for which `==`
      * and `equals` return different results, like 0.0/-0.0 and NaN/NaN.
      *
      * See: https://issues.apache.org/jira/browse/SPARK-45599
      */
     @annotation.nowarn("cat=other-non-cooperative-equals")
     private def keyExistsAtPos(k: T, pos: Int) =
       _data(pos) equals k
   ```
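
To make the trade-off concrete, here is a minimal, Spark-free sketch of the two options. Everything in it is a stand-in: `String` plays the role of `UTF8String`, `collationKey` is a hypothetical helper that lower-cases to approximate a case-insensitive collation (it is not Spark's key function), and `mergeCounts` is only a guess at the shape of the `groupMapReduce` step in the diff above, which is truncated here.

```scala
import java.util.Locale

object CollationSketch {
  // Hypothetical collation key: lower-casing approximates a case-insensitive
  // collation. An assumption for illustration, not Spark's implementation.
  def collationKey(s: String): String = s.toLowerCase(Locale.ROOT)

  // Option A: push type-aware equality into the generic container.
  // The generic key check now needs a type test, a cast (done here by the
  // pattern match), and the collation-key transformation.
  def keyExistsAtPos[T](data: IndexedSeq[T], k: T, pos: Int): Boolean =
    (data(pos), k) match {
      case (a: String, b: String) => collationKey(a) == collationKey(b)
      case (a, b)                 => a.equals(b)
    }

  // Option B: leave the container generic and merge the counts afterwards,
  // grouping by collation key and summing the per-value counts.
  def mergeCounts(buffer: Map[String, Long]): Map[String, Long] =
    buffer.toSeq.groupMapReduce(kv => collationKey(kv._1))(_._2)(_ + _)
}
```

With Option B the container stays untouched, and the collation handling lives in a single place next to the aggregate. Note that `groupMapReduce` requires Scala 2.13 or later.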



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

