GideonPotok commented on code in PR #46597:
URL: https://github.com/apache/spark/pull/46597#discussion_r1626659308


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala:
##########
@@ -74,16 +90,25 @@ case class Mode(
     if (buffer.isEmpty) {
       return null
     }
-
+    val collationAwareBuffer = child.dataType match {
+      case c: StringType if
+          !CollationFactory.fetchCollation(c.collationId).supportsBinaryEquality =>
+        val collationId = c.collationId
+        val modeMap = buffer.toSeq.groupMapReduce {

Review Comment:
   @uros-db can you weigh in on whether you think further exploration of @dbatomic's approach would be warranted? What @dbatomic is suggesting is slightly different than the approach I have prototyped in the past (though it is basically what you suggested very early on).

   @dbatomic's suggestion might look something like the change to core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala seen here, which I have just now whipped up: https://github.com/GideonPotok/spark/pull/2/files#diff-d758fde336b16d62c0feb00db9350822d60bd16805f8a816f2ee82b343d851aa

   It is possible his approach will work well, but as with our past experience when we tried to use the hashmap plus an auxiliary data structure (https://github.com/GideonPotok/spark/pull/1/files), it might bring more complexity and gotchas than we are interested in taking on for this feature. Is it worth it? I am capable of implementing it that way, but it is going to take a bit of work.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
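For readers following along, the truncated `groupMapReduce` call in the diff can be sketched roughly as follows: merge the counts of buffer values that compare equal under the collation, keeping one representative per equivalence class, then take the class with the highest total count. `lowercaseKey` below is a hypothetical stand-in for deriving a collation key (something Spark would do internally via `CollationFactory` for the column's `collationId`); it is an assumption for illustration, not Spark's actual API.

```scala
// Hypothetical stand-in for a collation key: treats strings that differ only
// in case as equal (UTF8_LCASE-like). Spark's Mode would derive the real key
// from CollationFactory for the column's collationId (assumption, not real API).
def lowercaseKey(s: String): String = s.toLowerCase

// Collapse a value -> count buffer so that values equal under the collation
// share one entry; keep the first representative seen and sum the counts,
// then return the representative of the most frequent equivalence class.
def collationAwareMode(buffer: Map[String, Long]): Option[String] = {
  if (buffer.isEmpty) return None
  val merged = buffer.toSeq
    .groupMapReduce { case (value, _) => lowercaseKey(value) } {
      case (value, count) => (value, count)
    } { case ((v1, c1), (_, c2)) => (v1, c1 + c2) }
  Some(merged.values.maxBy(_._2)._1)
}
```

For example, a buffer like `Map("x" -> 1L, "X" -> 2L, "y" -> 2L)` collapses to a single class for `x`/`X` with total count 3, which wins over `y`'s count of 2.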
########## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ########## @@ -74,16 +90,25 @@ case class Mode( if (buffer.isEmpty) { return null } - + val collationAwareBuffer = child.dataType match { + case c: StringType if + !CollationFactory.fetchCollation(c.collationId).supportsBinaryEquality => + val collationId = c.collationId + val modeMap = buffer.toSeq.groupMapReduce { Review Comment: @uros-db can you weigh in on whether you think that further exploration of dbatomic's approach would be warranted? What @dbatomic is suggesting is slightly different then the approach i have prototyped in the past (though is basically what you suggested very early on). dbatomic's suggestion might look something like the change seen here in core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala: https://github.com/GideonPotok/spark/pull/2/files#diff-d758fde336b16d62c0feb00db9350822d60bd16805f8a816f2ee82b343d851aa which I have just now whipped up. It is possible his approach will work well, but as with past experience when we tried to use the hashmap plus an auxilliary data structure (https://github.com/GideonPotok/spark/pull/1/files), it might bring complexity and more gotcha's then we might be interested in taking on for this feature. Is it worth it? I am capable of implementing it that way, but it is going to take a bit of work. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org