uros-db commented on code in PR #46511:
URL: https://github.com/apache/spark/pull/46511#discussion_r1612011134


##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##########
@@ -430,7 +430,7 @@ public static int execBinary(final UTF8String string, final UTF8String substring
     }
     public static int execLowercase(final UTF8String string, final UTF8String substring,
         final int start) {
-      return string.toLowerCase().indexOf(substring.toLowerCase(), start);

Review Comment:
   As part of this PR, we changed the core definition of string searching in 
UTF8_BINARY_LCASE, i.e. what it means for one string (the _pattern_) to be 
found in another string (the _target_) under UTF8_BINARY_LCASE.
   
   In the old implementation, `contains("İ", "i")` would return `true`. That 
behaviour is incorrect: it lowercases the whole target first and only then 
looks at substrings, so it relies on `substr(lower("İ"), 1, 1)` == `"i"` 
(old implementation, incorrect). The new implementation takes the substring 
first and then lowercases it, i.e. it checks `lower(substr("İ", 1, 1))` == 
`"i"` (new implementation, correct). That check fails, because `lower("İ")` 
is `"i̇"` (`i` followed by U+0307 COMBINING DOT ABOVE), not `"i"`.
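
   To make the difference concrete, here is a minimal sketch of the two 
definitions. It uses plain `java.lang.String` rather than `UTF8String`, and 
the class and method names (`LcaseContainsSketch`, `containsOld`, 
`containsNew`) are made up for illustration. `containsNew` is a naive 
quadratic search that only demonstrates the semantics; it is not the 
implementation in this PR.

```java
import java.util.Locale;

public class LcaseContainsSketch {

    // Old semantics: lowercase the whole target, then search inside the result.
    static boolean containsOld(String target, String pattern) {
        return target.toLowerCase(Locale.ROOT)
                     .contains(pattern.toLowerCase(Locale.ROOT));
    }

    // New semantics (naive illustration): some substring of the ORIGINAL target
    // must lowercase to exactly the lowercased pattern.
    static boolean containsNew(String target, String pattern) {
        String lowerPattern = pattern.toLowerCase(Locale.ROOT);
        for (int i = 0; i <= target.length(); i = next(target, i)) {
            for (int j = i; j <= target.length(); j = next(target, j)) {
                String candidate = target.substring(i, j).toLowerCase(Locale.ROOT);
                if (candidate.equals(lowerPattern)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Advance by one code point so substrings never split a surrogate pair.
    // Returns length + 1 once the end is reached, which terminates the loops.
    private static int next(String s, int i) {
        return i < s.length() ? s.offsetByCodePoints(i, 1) : i + 1;
    }

    public static void main(String[] args) {
        String target = "\u0130"; // "İ", LATIN CAPITAL LETTER I WITH DOT ABOVE
        // Lowercasing "İ" yields two code points: 'i' followed by U+0307.
        System.out.println(target.toLowerCase(Locale.ROOT).length()); // 2
        System.out.println(containsOld(target, "i"));                 // true
        System.out.println(containsNew(target, "i"));                 // false
        System.out.println(containsNew(target, "i\u0307"));           // true
    }
}
```

   Under the new definition a match always corresponds to a whole substring 
of the original target, so a position found in the lowercased string can no 
longer fall in the middle of an original character.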



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


