dbatomic commented on code in PR #45643:
URL: https://github.com/apache/spark/pull/45643#discussion_r1555389826
########## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ##########
@@ -549,6 +549,51 @@ public int findInSet(UTF8String match) {
     return 0;
   }
+  public int findInSet(UTF8String match, int collationId) {
+    if (CollationFactory.fetchCollation(collationId).supportsBinaryEquality) {
+      return this.findInSet(match);
+    }
+    if (collationId == CollationFactory.UTF8_BINARY_LCASE_COLLATION_ID) {

Review Comment:
That being said, I also don't like the current direction of pushing everything into `UTF8String`. Let me see if we can come up with a cleaner approach.
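For illustration only, here is one hypothetical way to keep collation dispatch out of the string type: leave the plain `findInSet` where it is and put the per-collation comparison in a separate helper. This is a minimal sketch, not Spark's API and not necessarily what the reviewer has in mind. It uses `java.lang.String` instead of `UTF8String` so it stays self-contained. The class name `CollationAwareFindInSet` and the IDs `BINARY` and `BINARY_LCASE` are placeholders, not `CollationFactory` constants. Lowercase comparison is approximated with `toLowerCase(Locale.ROOT)`. The sketch assumes the usual `FIND_IN_SET` convention: a 1-based position, or 0 when the value is absent or itself contains a comma.

```java
import java.util.Locale;

// Hypothetical helper: collation-specific logic lives here instead of on the string type.
// The collation IDs are illustrative placeholders, not Spark's CollationFactory constants.
final class CollationAwareFindInSet {
  public static final int BINARY = 0;
  public static final int BINARY_LCASE = 1;

  private CollationAwareFindInSet() {}

  /**
   * Returns the 1-based position of {@code match} in the comma-separated {@code set},
   * or 0 if it is absent or {@code match} itself contains a comma.
   */
  public static int findInSet(String set, String match, int collationId) {
    if (match.indexOf(',') >= 0) {
      return 0;
    }
    // A limit of -1 keeps trailing empty fields, so positions match the raw list.
    String[] items = set.split(",", -1);
    for (int i = 0; i < items.length; i++) {
      if (equalsUnderCollation(items[i], match, collationId)) {
        return i + 1;
      }
    }
    return 0;
  }

  // Single dispatch point: supporting another collation means touching only this method.
  private static boolean equalsUnderCollation(String a, String b, int collationId) {
    switch (collationId) {
      case BINARY:
        return a.equals(b);
      case BINARY_LCASE:
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
      default:
        throw new IllegalArgumentException("Unsupported collation id: " + collationId);
    }
  }

  public static void main(String[] args) {
    System.out.println(findInSet("abc,Def,ghi", "def", BINARY));       // 0
    System.out.println(findInSet("abc,Def,ghi", "def", BINARY_LCASE)); // 2
    System.out.println(findInSet("abc,Def,ghi", "d,e", BINARY_LCASE)); // 0
  }
}
```

The design choice here is simply that the string type keeps only byte-level operations. Collation awareness is layered on top in one place. Whether this fits Spark's performance and codegen constraints is beyond what this sketch shows.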