GideonPotok commented on code in PR #45453: URL: https://github.com/apache/spark/pull/45453#discussion_r1538210182
########## sql/core/benchmarks/CollationBenchmark-results.txt: ##########
@@ -0,0 +1,26 @@
+OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1016-azure
+AMD EPYC 7763 64-Core Processor
+filter df column with collation:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
+-----------------------------------------------------------------------------------------------------------------------------------
+filter df column with collation - UNICODE_CI                    403            463          39         0.0    20147470.0       1.0X
+filter df column with collation - UNICODE                       187            223          37         0.0     9339586.0       2.2X
+filter df column with collation - UTF8_BINARY_LCASE             426            434           7         0.0    21300903.4       0.9X
+filter df column with collation - UTF8_BINARY                   188            199           5         0.0     9403169.1       2.1X
+
+OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1016-azure
+AMD EPYC 7763 64-Core Processor
+collation unit benchmarks:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+equalsFunction - UTF8_BINARY                           0              0           0        10.4          96.6       1.0X

Review Comment:
Sure, I am happy to make that change. One warning, though: the e2e benchmarks and the unit benchmarks will then use different data. That is fine with me, but it was a previously raised concern, if I understood correctly. @dbatomic Heads up that the two benchmark suites will be using input data of different sizes.
########## sql/core/benchmarks/CollationBenchmark-results.txt: ##########

Review Comment:
@MaxGekk Alternatively, I could see whether I can switch the reported time unit to nanoseconds.
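
For context on the nanoseconds suggestion, here is a rough sketch of why the `equalsFunction - UTF8_BINARY` row can show 0 in every millisecond column while `Per Row(ns)` is 96.6. It assumes that `Per Row(ns)` is total time divided by row count and that `Rate(M/s)` is millions of rows per second; the helper names and the row count of 1000 are hypothetical and not taken from the PR.

```python
# Sketch only: relates the benchmark columns under the assumptions stated above.
# HYPOTHETICAL_ROWS is an illustrative value, not the row count used in the PR.

HYPOTHETICAL_ROWS = 1000


def rate_m_per_s(per_row_ns: float) -> float:
    """Millions of rows per second implied by a per-row cost in nanoseconds."""
    return 1e3 / per_row_ns


def whole_ms(total_ns: float) -> int:
    """Total time expressed in whole milliseconds, as a ms-only column would show it."""
    return round(total_ns / 1e6)


per_row_ns = 96.6  # from the equalsFunction - UTF8_BINARY row
total_ns = per_row_ns * HYPOTHETICAL_ROWS

print(round(rate_m_per_s(per_row_ns), 1))  # 10.4, matching the Rate(M/s) column
print(whole_ms(total_ns))                  # 0: the run is far shorter than 1 ms
```

With these assumptions, either more input rows or a finer time unit would make the time columns informative; which of the two fits the PR better is the open question in this thread.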