Re: [PR] [SPARK-47567][SQL] Support LOCATE function to work with collated strings [spark]

via GitHub Mon, 08 Apr 2024 01:41:03 -0700


miland-db commented on code in PR #45791:
URL: https://github.com/apache/spark/pull/45791#discussion_r1555445740



##########
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##########
@@ -176,15 +176,31 @@ public Collation(
    */
 
   public static StringSearch getStringSearch(
-      final UTF8String left,
-      final UTF8String right,
+      final UTF8String targetUTF8String,
+      final UTF8String patternUTF8String,
       final int collationId) {
-    String pattern = right.toString();
-    CharacterIterator target = new StringCharacterIterator(left.toString());
+
+    if (collationId == UTF8_BINARY_COLLATION_ID) {
+      return getStringSearch(targetUTF8String, patternUTF8String);
+    } else if (collationId == UTF8_BINARY_LCASE_COLLATION_ID) {
+      return getStringSearch(targetUTF8String.toLowerCase(), patternUTF8String.toLowerCase());
+    }
+
+    String pattern = patternUTF8String.toString();

Review Comment:
   1. _2 allocations for UTF8_BINARY (should be 0)_ - this should never be 
called for UTF8_BINARY.
   2. For this one, I understand the consequences, but it's very similar to 
what we had to do in `UTF8String` to work with the UTF8_BINARY_LCASE 
collation. It makes the code a lot cleaner than before, when we had separate 
methods with _almost identical_ code for UTF8_BINARY_LCASE and other 
collations.
   
   If this is not good/performant enough, we should think of some other way 
to solve it, because more and more PRs are coming with this change.
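
To make the trade-off in point 2 concrete, here is a minimal sketch in plain Java. It is not the code from this PR: `LcaseLocateSketch`, `indexOfLowercased`, and `indexOfInPlace` are hypothetical names, and it works on `java.lang.String` with 0-based char indices rather than on `UTF8String`. The first method mirrors the lowercase-then-search shape of the diff, which may create two new strings per call. The second compares in place with `String.regionMatches(ignoreCase = true, ...)` and builds no lowercased copies. The two can disagree on some non-ASCII characters, because `regionMatches` folds case per character while `toLowerCase` can change string length.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of CollationFactory.
final class LcaseLocateSketch {

  private LcaseLocateSketch() {}

  // Shape similar to the diff: lowercase both inputs, then do a binary search.
  // Each toLowerCase call may allocate a new string.
  static int indexOfLowercased(String target, String pattern) {
    String loweredTarget = target.toLowerCase(Locale.ROOT);
    String loweredPattern = pattern.toLowerCase(Locale.ROOT);
    return loweredTarget.indexOf(loweredPattern);
  }

  // Alternative: case-insensitive comparison at each offset, no lowercased copies.
  static int indexOfInPlace(String target, String pattern) {
    int lastStart = target.length() - pattern.length();
    for (int start = 0; start <= lastStart; start++) {
      if (target.regionMatches(true, start, pattern, 0, pattern.length())) {
        return start;
      }
    }
    return -1; // pattern not found, or longer than target
  }
}
```

The in-place variant costs O(n * m) comparisons in the worst case, so whether it beats the two extra allocations would have to be measured.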
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

