Re: [PR] [SPARK-46840][SQL][TESTS] Add `CollationBenchmark` [spark]

via GitHub Tue, 26 Mar 2024 07:00:25 -0700


GideonPotok commented on code in PR #45453:
URL: https://github.com/apache/spark/pull/45453#discussion_r1538210182



##########
sql/core/benchmarks/CollationBenchmark-results.txt:
##########
@@ -0,0 +1,26 @@
+OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1016-azure
+AMD EPYC 7763 64-Core Processor
+filter df column with collation:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-----------------------------------------------------------------------------------------------------------------------------------
+filter df column with collation - UNICODE_CI                   403            463          39          0.0    20147470.0       1.0X
+filter df column with collation - UNICODE                      187            223          37          0.0     9339586.0       2.2X
+filter df column with collation - UTF8_BINARY_LCASE            426            434           7          0.0    21300903.4       0.9X
+filter df column with collation - UTF8_BINARY                  188            199           5          0.0     9403169.1       2.1X
+
+OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1016-azure
+AMD EPYC 7763 64-Core Processor
+collation unit benchmarks:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+equalsFunction - UTF8_BINARY                          0              0           0         10.4          96.6       1.0X

Review Comment:
   Sure, I am happy to make that change. One caveat: the e2e benchmarks and 
the unit benchmarks will then use different input data. That is fine with me, 
but it was raised as a concern earlier, if I understood correctly. @dbatomic 
Heads up that the two test suites will be using input data of different sizes.



##########
sql/core/benchmarks/CollationBenchmark-results.txt:
##########
@@ -0,0 +1,26 @@
+OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1016-azure
+AMD EPYC 7763 64-Core Processor
+filter df column with collation:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-----------------------------------------------------------------------------------------------------------------------------------
+filter df column with collation - UNICODE_CI                   403            463          39          0.0    20147470.0       1.0X
+filter df column with collation - UNICODE                      187            223          37          0.0     9339586.0       2.2X
+filter df column with collation - UTF8_BINARY_LCASE            426            434           7          0.0    21300903.4       0.9X
+filter df column with collation - UTF8_BINARY                  188            199           5          0.0     9403169.1       2.1X
+
+OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1016-azure
+AMD EPYC 7763 64-Core Processor
+collation unit benchmarks:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+equalsFunction - UTF8_BINARY                          0              0           0         10.4          96.6       1.0X

Review Comment:
   @MaxGekk Or, instead, why don't I see whether I can switch the unit used 
to nanoseconds?
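
   To illustrate why the unit matters here, below is a rough sketch of the 
rounding effect, not the benchmark framework's actual formatting code. The 
helper `format_time` is hypothetical, and the row count of 20 is an assumption 
(it is what 403 ms / 20147470 ns per row works out to for the e2e table; the 
unit benchmark may use a different count). Only the 96.6 ns per-row figure 
comes from the results above.

```python
def format_time(total_ns: float, unit: str) -> str:
    """Render a duration as a whole number of the given unit (hypothetical helper)."""
    divisor = {"ms": 1_000_000, "ns": 1}[unit]
    return f"{total_ns / divisor:.0f} {unit}"


# Assumption: 20 rows per iteration; 96.6 ns per row is taken from the results table.
rows = 20
per_row_ns = 96.6
total_ns = rows * per_row_ns

# In whole milliseconds the measurement collapses to zero...
print(format_time(total_ns, "ms"))
# ...while in nanoseconds it remains readable.
print(format_time(total_ns, "ns"))
```

   With these assumed inputs the millisecond column shows 0, which matches 
the `Best Time(ms)` value in the unit benchmark row, while the nanosecond 
rendering keeps the measurement visible.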



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


