GideonPotok commented on PR #46597:
URL: https://github.com/apache/spark/pull/46597#issuecomment-2118991669

   > > since Mode expression works with any child expression, and you special-cased handling Strings, how do we handle Array(String) and Struct(String), etc.?
   > 
   > In my local tests, I found that Mode performs a byte-by-byte comparison for structs, which does not consider collation. So that is still outstanding. Good catch!
   > 
   > @uros-db There are several strategies we might adopt to handle structs with collated fields. I am looking into implementations. It is potentially straightforward, though it has some gotchas.
   > 
   > Do you feel I should solve that in a separate PR or in this one? I assume you prefer that this gets solved in this PR and not in a follow-up PR, right?
   
   @uros-db 
   
   Added an implementation for Mode that supports structs whose fields use the 
various collations. Performance is not great so far:
     
   ```
   [info] collation unit benchmarks - mode - 30105 elements:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ---------------------------------------------------------------------------------------------------------------------------------
   [info] UTF8_BINARY_LCASE - mode - 30105 elements                     31             32           1          9.8         102.3       1.0X
   [info] UNICODE - mode - 30105 elements                                1              1           0        240.4           4.2      24.6X
   [info] UTF8_BINARY - mode - 30105 elements                            1              1           0        239.1           4.2      24.5X
   [info] UNICODE_CI - mode - 30105 elements                            57             59           2          5.3         189.9       0.5X
   ```
   
   I will add the benchmark results from GHA once I get your feedback.
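
For anyone following the struct discussion, here is a minimal sketch of one possible strategy: derive a collation key recursively for nested values and count by that key instead of by raw bytes. It is not the code in this PR and not Spark's API. The names `collation_key` and `collation_aware_mode` are hypothetical, tuples and lists stand in for structs and arrays, and `str.lower()` is only a stand-in for UTF8_BINARY_LCASE (a real UNICODE_CI key would have to come from an ICU collator).

```python
from collections import Counter


def collation_key(value, collation):
    """Return a key that is equal for values the collation treats as equal."""
    if isinstance(value, str):
        # Assumption: lowercasing approximates UTF8_BINARY_LCASE only.
        # Every other collation name falls back to exact comparison here.
        return value.lower() if collation == "UTF8_BINARY_LCASE" else value
    if isinstance(value, (tuple, list)):
        # Structs and arrays: build the key field by field, recursively.
        return tuple(collation_key(v, collation) for v in value)
    return value


def collation_aware_mode(values, collation):
    """Most frequent value, grouping collation-equal values together."""
    counts = Counter()
    first_seen = {}
    for v in values:
        key = collation_key(v, collation)
        counts[key] += 1
        # Assumption: report the first original value seen for each key.
        first_seen.setdefault(key, v)
    if not counts:
        return None
    best_key, _ = counts.most_common(1)[0]
    return first_seen[best_key]


rows = [("x", 1), ("x", 1), ("x", 1), ("a", 1), ("A", 1), ("a", 1), ("A", 1)]
print(collation_aware_mode(rows, "UTF8_BINARY"))        # byte-wise grouping
print(collation_aware_mode(rows, "UTF8_BINARY_LCASE"))  # case-insensitive grouping
```

With byte-wise grouping, `("x", 1)` wins with three occurrences. With the lowercase key, `("a", 1)` and `("A", 1)` collapse into one group of four, so the mode changes. Building a key for every nested field on every row is extra work on top of the plain byte comparison.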
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

