GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2129923419
@uros-db I forgot to ask: should I also add collation support to `org.apache.spark.sql.catalyst.expressions.aggregate.PandasMode`? The only differences would be:

1. Support for null keys. With `StringType`, the values in the buffer would not necessarily all be `UTF8String`; some might just be null, right?
2. `PandasMode` returns a list of all values that are tied for the mode. In that case, should all of those values be present in the result? For example, given the `pandas_mode` of `['a', 'a', 'a', 'b', 'b', 'B']` under the `UTF8_BINARY_LCASE` collation, what do you think `pandas_mode` should return?

If we want to support `PandasMode`, I can do a little research on what other databases seem to favor for this kind of question.
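
To make the tie question concrete, here is a minimal plain-Python sketch (not Spark code and not the `PandasMode` implementation). It assumes `str.lower` is a good enough stand-in for the lowercase collation key, and the helper name `tied_modes` and its `ignore_nulls` flag are hypothetical, introduced only for illustration. It groups values by collation key and returns, for every key tied for the highest count, the distinct original spellings that fell into that group.

```python
from collections import Counter


def tied_modes(values, key=str.lower, ignore_nulls=False):
    """Return the spellings of every collation group tied for the top count.

    `key` maps a string to its collation key (assumption: str.lower
    approximates a lowercase collation). None is treated as its own key
    unless ignore_nulls is True.
    """
    counts = Counter()   # collation key -> number of occurrences
    spellings = {}       # collation key -> distinct original values, in first-seen order
    for v in values:
        if v is None and ignore_nulls:
            continue
        k = None if v is None else key(v)
        counts[k] += 1
        group = spellings.setdefault(k, [])
        if v not in group:
            group.append(v)
    if not counts:
        return []
    top = max(counts.values())
    # Counter keeps insertion order, so groups come back in first-seen order.
    return [spellings[k] for k in counts if counts[k] == top]


data = ['a', 'a', 'a', 'b', 'b', 'B']
print(tied_modes(data))                     # case-insensitive grouping
print(tied_modes(data, key=lambda s: s))    # binary (case-sensitive) grouping
```

Under the case-insensitive key, `'a'` and the `'b'`/`'B'` group both have count 3, so the open question is whether the result should be one representative per group (for example `['a', 'b']`) or every spelling (`['a', 'b', 'B']`). Under the binary key only `'a'` is the mode.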