uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1612004770
########## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ########## @@ -430,7 +430,7 @@ public static int execBinary(final UTF8String string, final UTF8String substring } public static int execLowercase(final UTF8String string, final UTF8String substring, final int start) { - return string.toLowerCase().indexOf(substring.toLowerCase(), start); Review Comment: no, unfortunately it's not - while it works fine for ASCII, it actually gives wrong results in some special cases featuring conditional case mapping, when a character has a lowercase equivalent that consists of multiple characters, or is found at a particular place in the string (context-awareness) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org