GideonPotok commented on PR #46597:
URL: https://github.com/apache/spark/pull/46597#issuecomment-2129923419

   @uros-db I forgot to ask: should I add collation support to
`org.apache.spark.sql.catalyst.expressions.aggregate.PandasMode`?

   The only differences would be:
   1. Support for null keys (so StringType would not necessarily mean that
every value in the buffer is a UTF8String; some might just be null, right?)
   2. PandasMode returns a list of all values that are tied for the mode. In
that case, should all of those values be present? E.g., if you take the
pandas_mode of ['a', 'a', 'a', 'b', 'b', 'B'] with the UTF8_BINARY_LCASE
collation, what do you think pandas_mode should return? If we want to support
PandasMode, I can do a little research on what other databases seem to favor
for this type of question.
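
   To make the second question concrete, here is a minimal plain-Python sketch
of the two candidate behaviors. It is not Spark code and does not reflect what
PandasMode does today: `collation_key`, `tied_modes`, and the `ignore_nulls` and
`keep_all_variants` flags are hypothetical names, and `str.lower()` is only a
rough stand-in for a lowercase collation.

```python
from collections import Counter


def collation_key(value):
    # Hypothetical stand-in for a lowercase collation:
    # None stays None, strings are compared by their lowercased form.
    return None if value is None else value.lower()


def tied_modes(values, ignore_nulls=True, keep_all_variants=False):
    counts = Counter()   # collation key -> number of occurrences
    variants = {}        # collation key -> distinct originals, first-seen order
    for v in values:
        if v is None and ignore_nulls:
            continue
        k = collation_key(v)
        counts[k] += 1
        seen = variants.setdefault(k, [])
        if v not in seen:
            seen.append(v)
    if not counts:
        return []
    top = max(counts.values())
    result = []
    for k, n in counts.items():
        if n == top:
            # Either every distinct spelling in the tied group,
            # or only the first spelling that was seen.
            result.extend(variants[k] if keep_all_variants else variants[k][:1])
    return result


data = ['a', 'a', 'a', 'b', 'b', 'B']
print(tied_modes(data))                          # ['a', 'b']
print(tied_modes(data, keep_all_variants=True))  # ['a', 'b', 'B']
```

   Under this sketch 'b' and 'B' fall into one group of three, so that group
ties with 'a'. The open question is whether the result should carry one
representative per group or every spelling in it. With `ignore_nulls=False`,
None is counted as a key of its own, which is the null-key case from point 1.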


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

