GideonPotok commented on code in PR #46597:
URL: https://github.com/apache/spark/pull/46597#discussion_r1626660746


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala:
##########
@@ -74,16 +90,25 @@ case class Mode(
     if (buffer.isEmpty) {
       return null
     }
-
+    val collationAwareBuffer = child.dataType match {
+      case c: StringType if
+        !CollationFactory.fetchCollation(c.collationId).supportsBinaryEquality =>
+        val collationId = c.collationId
+        val modeMap = buffer.toSeq.groupMapReduce {

Review Comment:
   Also, doing it that way is a code smell. It brings a lot of type-aware logic into a class that otherwise has none. E.g., the following would have to be changed to include:
   
   1. an `isInstanceOf[UTF8String]` condition,
   2. an `asInstanceOf[UTF8String]` cast and the transformation into a collation key,
   3. etc.
   
   ```
     /**
      * Check if a key exists at the provided position using object equality rather than
      * cooperative equality. Otherwise, hash sets will mishandle values for which `==`
      * and `equals` return different results, like 0.0/-0.0 and NaN/NaN.
      *
      * See: https://issues.apache.org/jira/browse/SPARK-45599
      */
     @annotation.nowarn("cat=other-non-cooperative-equals")
     private def keyExistsAtPos(k: T, pos: Int) =
       _data(pos) equals k
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org