dbatomic commented on PR #45453:
URL: https://github.com/apache/spark/pull/45453#issuecomment-1988798774

   @GideonPotok - I think a better approach to benchmarking the collation track 
is to start with the basics, e.g. unit benchmarks against `CollationFactory` 
+ `UTF8String`. For example, what is the perf diff for a simple filter, without 
the rest of the Spark stack, between UTF8_BINARY, UTF8_BINARY_LCASE, UNICODE and 
UNICODE_CI? After filter we can do the same for `hashFunction`. You should be 
able to just generate a bunch of `UTF8String`s and run them through the 
`comparator`/`hashFunction` of `Collations` in `CollationFactory`.
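
   A minimal sketch of what such a unit-level filter benchmark could look like 
is below. It deliberately does not touch Spark: the four comparators are plain 
JDK stand-ins (`String.compareTo` and `java.text.Collator`), chosen only to 
mimic the behavior of the collations named above, and the class name 
`CollationFilterBench` and its helper methods are hypothetical. In a real 
benchmark they would be replaced by the `comparator` of each collation taken 
from `CollationFactory`, applied to `UTF8String` values.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;

// Stand-alone sketch. The comparators are JDK stand-ins (assumptions),
// NOT Spark's CollationFactory / UTF8String implementations.
final class CollationFilterBench {

    // One comparator per collation name; insertion order is preserved.
    static Map<String, Comparator<String>> comparators() {
        Collator unicode = Collator.getInstance(Locale.ROOT);
        unicode.setStrength(Collator.TERTIARY);     // case differences matter
        Collator unicodeCi = Collator.getInstance(Locale.ROOT);
        unicodeCi.setStrength(Collator.SECONDARY);  // case differences ignored

        Map<String, Comparator<String>> m = new LinkedHashMap<>();
        // For the ASCII-only test data, String.compareTo matches byte order.
        m.put("UTF8_BINARY", String::compareTo);
        m.put("UTF8_BINARY_LCASE", (a, b) ->
            a.toLowerCase(Locale.ROOT).compareTo(b.toLowerCase(Locale.ROOT)));
        m.put("UNICODE", unicode::compare);
        m.put("UNICODE_CI", unicodeCi::compare);
        return m;
    }

    // The "simple filter": count values equal to the probe under cmp.
    static long countMatches(List<String> data, String probe, Comparator<String> cmp) {
        long hits = 0;
        for (String s : data) {
            if (cmp.compare(s, probe) == 0) {
                hits++;
            }
        }
        return hits;
    }

    // Deterministic mixed-case ASCII strings, so runs are repeatable.
    static List<String> randomStrings(int count, int len, long seed) {
        String alphabet = "abcABC";
        Random rnd = new Random(seed);
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            StringBuilder sb = new StringBuilder(len);
            for (int j = 0; j < len; j++) {
                sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = randomStrings(200_000, 8, 42L);
        String probe = data.get(0);
        for (Map.Entry<String, Comparator<String>> e : comparators().entrySet()) {
            countMatches(data, probe, e.getValue());  // warm-up pass
            long start = System.nanoTime();
            long hits = countMatches(data, probe, e.getValue());
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(e.getKey() + ": " + hits + " matches in " + micros + " us");
        }
    }
}
```

   A single timed pass like this is only indicative. For numbers worth acting 
on, the same loop would need to run under a proper harness such as JMH, and the 
same structure can be reused for `hashFunction` by hashing each value instead 
of comparing it.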
   
   That way the benchmarking will be actionable. Starting immediately with joins 
is too high up the stack, and I think that we will not be able to do much with 
the results.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

