GideonPotok commented on code in PR #46597:
URL: https://github.com/apache/spark/pull/46597#discussion_r1604951157


##########
sql/core/benchmarks/CollationBenchmark-jdk21-results.txt:
##########
@@ -1,54 +1,63 @@
-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - equalsFunction:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                    2948           2958          13          0.0       29483.6       1.0X
-UNICODE                                              2040           2042           3          0.0       20396.6       1.4X
-UTF8_BINARY                                          2043           2043           0          0.0       20426.3       1.4X
-UNICODE_CI                                          16318          16338          28          0.0      163178.4       0.2X
+UTF8_BINARY_LCASE                                    2896           2898           3          0.0       28958.7       1.0X
+UNICODE                                              2038           2040           3          0.0       20377.5       1.4X
+UTF8_BINARY                                          2053           2054           1          0.0       20534.9       1.4X
+UNICODE_CI                                          16779          16802          34          0.0      167785.2       0.2X
 
-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - compareFunction:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                     3227           3228           1          0.0       32272.1       1.0X
-UNICODE                                              16637          16643           9          0.0      166367.7       0.2X
-UTF8_BINARY                                           3132           3137           7          0.0       31319.2       1.0X
-UNICODE_CI                                           17816          17829          18          0.0      178162.4       0.2X
+UTF8_BINARY_LCASE                                     4705           4705           0          0.0       47048.0       1.0X
+UNICODE                                              18863          18867           6          0.0      188625.3       0.2X
+UTF8_BINARY                                           4894           4901          11          0.0       48936.8       1.0X
+UNICODE_CI                                           19595          19598           4          0.0      195953.0       0.2X
 
-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - hashFunction:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                  4824           4824           0          0.0       48243.7       1.0X
-UNICODE                                           69416          69475          84          0.0      694158.3       0.1X
-UTF8_BINARY                                        3806           3808           2          0.0       38062.8       1.3X
-UNICODE_CI                                        60943          60975          45          0.0      609426.2       0.1X
+UTF8_BINARY_LCASE                                  5011           5013           2          0.0       50113.1       1.0X
+UNICODE                                           68309          68319          13          0.0      683094.7       0.1X
+UTF8_BINARY                                        3887           3887           1          0.0       38869.8       1.3X
+UNICODE_CI                                        56675          56686          15          0.0      566750.3       0.1X
 
-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - contains:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 11979          11980           1          0.0      119790.4       1.0X
-UNICODE                                            6469           6474           7          0.0       64694.8       1.9X
-UTF8_BINARY                                        7253           7253           1          0.0       72528.3       1.7X
-UNICODE_CI                                       319124         319881        1070          0.0     3191244.0       0.0X
+UTF8_BINARY_LCASE                                 10534          10534           1          0.0      105336.8       1.0X
+UNICODE                                            5835           5836           2          0.0       58348.9       1.8X
+UTF8_BINARY                                        6451           6453           3          0.0       64506.4       1.6X
+UNICODE_CI                                       313827         314029         285          0.0     3138270.1       0.0X
 
-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - startsWith:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 11584          11595          15          0.0      115841.4       1.0X
-UNICODE                                            6155           6156           2          0.0       61548.7       1.9X
-UTF8_BINARY                                        6979           6982           5          0.0       69785.6       1.7X
-UNICODE_CI                                       318228         318726         705          0.0     3182275.2       0.0X
+UTF8_BINARY_LCASE                                 10164          10165           2          0.0      101635.6       1.0X
+UNICODE                                            5683           5684           1          0.0       56828.5       1.8X
+UTF8_BINARY                                        6280           6281           2          0.0       62802.3       1.6X
+UNICODE_CI                                       307901         317477       13542          0.0     3079007.4       0.0X
 
-OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - endsWith:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 11655          11664          12          0.0      116552.8       1.0X
-UNICODE                                            6235           6239           5          0.0       62350.8       1.9X
-UTF8_BINARY                                        7066           7069           5          0.0       70658.1       1.6X
-UNICODE_CI                                       313515         313999         685          0.0     3135149.1       0.0X
+UTF8_BINARY_LCASE                                 10360          10361           1          0.0      103596.7       1.0X
+UNICODE                                            5667           5668           0          0.0       56674.0       1.8X
+UTF8_BINARY                                        6307           6309           3          0.0       63069.2       1.6X
+UNICODE_CI                                       311942         312293         496          0.0     3119419.4       0.0X
+
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1021-azure
+AMD EPYC 7763 64-Core Processor
+collation unit benchmarks - mode - 30105 elements:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+---------------------------------------------------------------------------------------------------------------------------------
+UTF8_BINARY_LCASE - mode - 30105 elements                      4              4           0         80.4          12.4       1.0X
+UNICODE - mode - 30105 elements                                0              0           0       1277.7           0.8      15.9X
+UTF8_BINARY - mode - 30105 elements                            0              0           0       1282.2           0.8      15.9X
+UNICODE_CI - mode - 30105 elements                             9              9           0         32.5          30.7       0.4X

Review Comment:
   The best performance we have seen so far is with an OpenHashMap approach,
which basically forces binary equality. That approach has a lot of pitfalls,
though, so we should get the current implementation in place first and then
decide whether the faster one is necessary.
   
   We can consider switching to that implementation later.
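
For readers unfamiliar with the idea, here is a minimal sketch of what "forcing binary equality" could look like when computing a mode. It is an illustration under stated assumptions, not the code in this PR: `CollationModeSketch`, `Tally`, and `mode` are hypothetical names, `java.util.HashMap` stands in for Spark's OpenHashMap, and lowercasing with `Locale.ROOT` is only a rough stand-in for a case-insensitive collation. Each value is mapped to a key first, so that ordinary hash-map equality on the key decides which values are grouped together.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch; not the code under review in the PR. */
final class CollationModeSketch {

    /** Remembers the first original value seen for a key and how often that key occurred. */
    private static final class Tally {
        final String firstSeen;
        long count;

        Tally(String firstSeen) {
            this.firstSeen = firstSeen;
        }
    }

    /**
     * Returns the most frequent value, where two values count as equal when
     * toKey maps them to the same (binary-equal) key. Ties go to the key that
     * reached the winning count first; an empty input yields null.
     */
    static String mode(List<String> values, UnaryOperator<String> toKey) {
        Map<String, Tally> tallies = new HashMap<>();
        Tally best = null;
        for (String value : values) {
            Tally tally = tallies.computeIfAbsent(toKey.apply(value), k -> new Tally(value));
            tally.count++;
            if (best == null || tally.count > best.count) {
                best = tally;
            }
        }
        return best == null ? null : best.firstSeen;
    }

    public static void main(String[] args) {
        List<String> values = List.of("Spark", "spark", "SPARK", "Flink", "Flink");
        // Plain binary equality: every spelling of "spark" is its own group.
        System.out.println(mode(values, UnaryOperator.identity()));
        // Lowercased key: the three spellings collapse into one group.
        System.out.println(mode(values, s -> s.toLowerCase(Locale.ROOT)));
    }
}
```

With the identity key the sketch prints `Flink`, and with the lowercased key it prints `Spark`, which hints at two possible pitfalls: the result depends on how faithfully the key reproduces the collation's notion of equality, and the code has to pick which original spelling represents a group (here, arbitrarily, the first one seen).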
   
   


