uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1612011134
##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##########
@@ -430,7 +430,7 @@ public static int execBinary(final UTF8String string, final UTF8String substring
     }
     public static int execLowercase(final UTF8String string, final UTF8String substring, final int start) {
-      return string.toLowerCase().indexOf(substring.toLowerCase(), start);

Review Comment:
   As part of this PR, we actually changed the core definition of string searching in UTF8_BINARY_LCASE, i.e. what it means for a substring (_pattern_) to be found in another string (_target_) under UTF8_BINARY_LCASE.

   In the old implementation, `contains("İ", "i")` would return `true`. However, this behaviour is incorrect, because it relies on the fact that `substr(lower("İ"), 1, 1)` == `"i"` (incorrect, old implementation) instead of checking `lower(substr("İ", 1, 1))` == `"i"` (correct, new implementation).
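
To make the difference concrete, here is a minimal sketch of the two definitions. It is not the code from this PR: it uses plain `java.lang.String` with `Locale.ROOT` instead of `UTF8String`, and the class and method names (`LcaseContainsSketch`, `containsOld`, `containsNew`) are hypothetical. The underlying fact is that lowercasing `İ` (U+0130) yields two code points, `i` followed by U+0307 (combining dot above), so searching inside the lowercased target can match only part of what was originally a single character.

```java
import java.util.Locale;

final class LcaseContainsSketch {

    // Old definition: lowercase both strings, then search the lowercased target.
    // A match may start or end in the middle of the expansion of one character.
    static boolean containsOld(String target, String pattern) {
        return target.toLowerCase(Locale.ROOT)
                     .indexOf(pattern.toLowerCase(Locale.ROOT)) >= 0;
    }

    // New definition (brute-force sketch): the pattern is found only if some
    // substring of the ORIGINAL target, cut at code point boundaries,
    // lowercases to exactly the lowercased pattern.
    static boolean containsNew(String target, String pattern) {
        String patternLower = pattern.toLowerCase(Locale.ROOT);
        if (patternLower.isEmpty()) {
            return true; // assumption: the empty pattern matches everywhere
        }
        for (int start = 0; start < target.length();
             start = target.offsetByCodePoints(start, 1)) {
            int end = target.offsetByCodePoints(start, 1);
            while (true) {
                String candidate = target.substring(start, end).toLowerCase(Locale.ROOT);
                if (candidate.equals(patternLower)) {
                    return true;
                }
                if (end >= target.length()) {
                    break;
                }
                end = target.offsetByCodePoints(end, 1);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String dottedI = "\u0130"; // İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
        System.out.println(containsOld(dottedI, "i")); // true
        System.out.println(containsNew(dottedI, "i")); // false
    }
}
```

The quadratic scan is only meant to spell out the definition; a real implementation would want a more efficient matching strategy.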