GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2118991669
> > since Mode expression works with any child expression, and you special-cased handling Strings, how do we handle Array(String) and Struct(String), etc.?
>
> In my local tests, I found that Mode performs a byte-by-byte comparison for structs, which does not consider collation. So that is still outstanding. Good catch!
>
> @uros-db There are several strategies we might adopt to handle structs with collated fields. I am looking into implementations. It is potentially straightforward, though there are some gotchas.
>
> Do you feel I should solve that in a separate PR or in this one? I assume you prefer that this get solved in this PR and not in a follow-up PR, right?

@uros-db I added an implementation for mode that supports structs whose fields use the various collations. Performance is not great so far:

```
[info] collation unit benchmarks - mode - 30105 elements:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------
[info] UTF8_BINARY_LCASE - mode - 30105 elements                      31             32           1          9.8         102.3       1.0X
[info] UNICODE - mode - 30105 elements                                 1              1           0        240.4           4.2      24.6X
[info] UTF8_BINARY - mode - 30105 elements                             1              1           0        239.1           4.2      24.5X
[info] UNICODE_CI - mode - 30105 elements                             57             59           2          5.3         189.9       0.5X
```

I will add the benchmark results from GHA once I get your feedback.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
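To illustrate the struct issue discussed above, here is a minimal Python sketch, not the Spark implementation. It models a struct as a tuple and assumes that `UTF8_BINARY_LCASE` equality can be approximated by comparing lowercased strings. The names `collation_key` and `mode` are hypothetical, and `UNICODE`/`UNICODE_CI` are not modelled because they would need real ICU collation keys.

```python
from collections import Counter


def collation_key(value, collation):
    """Return a grouping key that is equal for collation-equal values.

    Assumption: UTF8_BINARY_LCASE is approximated with str.lower().
    Any other collation name falls back to exact (binary) comparison.
    """
    if isinstance(value, str):
        return value.lower() if collation == "UTF8_BINARY_LCASE" else value
    if isinstance(value, tuple):
        # A struct is modelled as a tuple; recurse into every field.
        return tuple(collation_key(field, collation) for field in value)
    return value


def mode(values, collation="UTF8_BINARY"):
    """Return the most frequent value, grouping by collation key.

    The first value seen for the winning key is returned as its representative.
    """
    counts = Counter()
    first_seen = {}
    for value in values:
        key = collation_key(value, collation)
        counts[key] += 1
        first_seen.setdefault(key, value)
    if not counts:
        return None
    best_key = max(counts, key=counts.get)
    return first_seen[best_key]


rows = [("Spark", 1), ("spark", 1), ("SPARK", 1), ("sql", 1), ("sql", 1)]
binary_mode = mode(rows, "UTF8_BINARY")
lcase_mode = mode(rows, "UTF8_BINARY_LCASE")
```

With exact comparison the three spellings of "Spark" count as different structs, so `("sql", 1)` wins. With the lowercase key they collapse into one group of three. Computing a key for every row is extra work per element, which may be one reason a collation-aware path runs slower than the binary one, though this sketch does not measure that.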