This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 890f78d03020 [SPARK-47418][SQL] Add hand-crafted implementations for 
lowercase unicode-aware contains, startsWith and endsWith and optimize 
UTF8_BINARY_LCASE
890f78d03020 is described below

commit 890f78d03020f905b732054c78748d8d21a69fcf
Author: Vladimir Golubev <vladimir.golu...@databricks.com>
AuthorDate: Wed Apr 24 15:59:54 2024 +0800

    [SPARK-47418][SQL] Add hand-crafted implementations for lowercase 
unicode-aware contains, startsWith and endsWith and optimize UTF8_BINARY_LCASE
    
    ### What changes were proposed in this pull request?
    Added hand-crafted implementations of Unicode-aware lowercase `contains`, 
`startsWith` and `endsWith` to optimize UTF8_BINARY_LCASE for ASCII-only strings.
    
    ### Why are the changes needed?
    `UTF8String.toLowerCase()`, which is used for the aforementioned 
collation-aware functions, has an optimization for full-ASCII strings, but 
still always allocates a new object. In this PR I introduced loop-based 
implementations, which fall back to `toLowerCase()` when they encounter a 
non-ASCII character.
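
The idea described above can be sketched outside of Spark roughly as follows. This is a minimal illustration, not the code from the patch: the class name `LowerCaseStartsWithSketch` is hypothetical, it works on plain UTF-8 `byte[]` arrays instead of `UTF8String`, it checks both whole inputs for ASCII (the patch inspects only the relevant part of `this`), and the fallback uses `Locale.ROOT`, which is an assumption made here for reproducibility.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical stand-alone sketch; not part of Spark.
final class LowerCaseStartsWithSketch {

  // In UTF-8, every byte of a non-ASCII character has the high bit set,
  // so it is negative when read as a signed Java byte.
  static boolean isFullAscii(byte[] bytes) {
    for (byte b : bytes) {
      if (b < 0) {
        return false;
      }
    }
    return true;
  }

  // Compares ASCII bytes case-insensitively in a loop, without allocating.
  static boolean matchAtInLowerCaseAscii(byte[] s, byte[] prefix, int pos) {
    if (pos < 0 || prefix.length + pos > s.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if (Character.toLowerCase(s[pos + i]) != Character.toLowerCase(prefix[i])) {
        return false;
      }
    }
    return true;
  }

  static boolean startsWithInLowerCase(byte[] s, byte[] prefix) {
    if (prefix.length == 0) {
      return true;
    }
    if (isFullAscii(s) && isFullAscii(prefix)) {
      // Fast path: no intermediate lowercased copies are created.
      return matchAtInLowerCaseAscii(s, prefix, 0);
    }
    // Slow path: allocate lowercased strings (Locale.ROOT is an assumption of this sketch).
    String left = new String(s, StandardCharsets.UTF_8).toLowerCase(Locale.ROOT);
    String right = new String(prefix, StandardCharsets.UTF_8).toLowerCase(Locale.ROOT);
    return left.startsWith(right);
  }

  public static void main(String[] args) {
    byte[] s = "HeLlO".getBytes(StandardCharsets.UTF_8);
    byte[] p = "hElL".getBytes(StandardCharsets.UTF_8);
    System.out.println(startsWithInLowerCase(s, p));
  }
}
```

The size comparison is deliberately left to the ASCII path only, because lowercasing a non-ASCII character can change its byte length.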
    
    ### Does this PR introduce _any_ user-facing change?
    No, these functions should behave exactly as:
    - `lhs.containsInLowerCase(rhs)` == 
`lhs.toLowerCase().contains(rhs.toLowerCase())`
    - `lhs.startsWithInLowerCase(rhs)` == 
`lhs.toLowerCase().startsWith(rhs.toLowerCase())`
    - `lhs.endsWithInLowerCase(rhs)` == 
`lhs.toLowerCase().endsWith(rhs.toLowerCase())`
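
The equivalences listed above can be spot-checked with a small harness like the one below. It is only a sketch under stated assumptions: it uses `java.lang.String` rather than `UTF8String`, the class `LowerCaseContainsCheck` and its methods are hypothetical names, and `Locale.ROOT` is chosen here so the reference result does not depend on the default locale.

```java
import java.util.Locale;

// Hypothetical harness; not part of Spark.
final class LowerCaseContainsCheck {

  // Reference semantics: lowercase both sides, then search.
  static boolean reference(String lhs, String rhs) {
    return lhs.toLowerCase(Locale.ROOT).contains(rhs.toLowerCase(Locale.ROOT));
  }

  static boolean isAscii(String s) {
    for (int i = 0; i < s.length(); i++) {
      if (s.charAt(i) >= 0x80) {
        return false;
      }
    }
    return true;
  }

  // Loop-based variant: compares character by character for ASCII inputs
  // and falls back to the reference implementation otherwise.
  static boolean containsInLowerCase(String lhs, String rhs) {
    if (rhs.isEmpty()) {
      return true;
    }
    if (!isAscii(lhs) || !isAscii(rhs)) {
      return reference(lhs, rhs);
    }
    for (int i = 0; i + rhs.length() <= lhs.length(); i++) {
      int j = 0;
      while (j < rhs.length()
          && Character.toLowerCase(lhs.charAt(i + j)) == Character.toLowerCase(rhs.charAt(j))) {
        j++;
      }
      if (j == rhs.length()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String[][] pairs = {
      {"HeLlO", "ElL"}, {"HeLlO", "ElLoO"}, {"", "a"}, {"a", ""},
      {"The \u212Aelvin.", "Kelvin"}, {"\u0130odiNe", " i\u0307oDin"}
    };
    for (String[] p : pairs) {
      boolean same = containsInLowerCase(p[0], p[1]) == reference(p[0], p[1]);
      System.out.println(same ? "consistent" : "MISMATCH");
    }
  }
}
```

A check of this kind compares the two implementations on the same inputs; it does not replace the unit tests added by the patch.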
    
    ### How was this patch tested?
    Added new test cases to 
`org.apache.spark.unsafe.types.CollationSupportSuite` and 
`org.apache.spark.unsafe.types.UTF8StringSuite`, including several specific to 
Unicode lowercasing. I also ran `CollationBenchmark` on GHA for JDK 17 and 
JDK 21 and updated the results.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No
    
    Closes #46181 from 
vladimirg-db/vladimirg-db/add-hand-crafted-string-function-implementations-for-utf8-binary-lcase-collations.
    
    Authored-by: Vladimir Golubev <vladimir.golu...@databricks.com>
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
---
 .../spark/sql/catalyst/util/CollationSupport.java  |   6 +-
 .../org/apache/spark/unsafe/types/UTF8String.java  | 143 +++++++++++++++++++--
 .../spark/unsafe/types/CollationSupportSuite.java  |  34 +++++
 .../apache/spark/unsafe/types/UTF8StringSuite.java | 105 +++++++++++++++
 .../CollationBenchmark-jdk21-results.txt           |  60 ++++-----
 sql/core/benchmarks/CollationBenchmark-results.txt |  60 ++++-----
 .../CollationNonASCIIBenchmark-jdk21-results.txt   |  60 ++++-----
 .../CollationNonASCIIBenchmark-results.txt         |  60 ++++-----
 8 files changed, 396 insertions(+), 132 deletions(-)

diff --git 
a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
 
b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
index b28321230840..3e4973f5c187 100644
--- 
a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
+++ 
b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
@@ -60,7 +60,7 @@ public final class CollationSupport {
       return l.contains(r);
     }
     public static boolean execLowercase(final UTF8String l, final UTF8String 
r) {
-      return l.toLowerCase().contains(r.toLowerCase());
+      return l.containsInLowerCase(r);
     }
     public static boolean execICU(final UTF8String l, final UTF8String r,
         final int collationId) {
@@ -98,7 +98,7 @@ public final class CollationSupport {
       return l.startsWith(r);
     }
     public static boolean execLowercase(final UTF8String l, final UTF8String 
r) {
-      return l.toLowerCase().startsWith(r.toLowerCase());
+      return l.startsWithInLowerCase(r);
     }
     public static boolean execICU(final UTF8String l, final UTF8String r,
         final int collationId) {
@@ -135,7 +135,7 @@ public final class CollationSupport {
       return l.endsWith(r);
     }
     public static boolean execLowercase(final UTF8String l, final UTF8String 
r) {
-      return l.toLowerCase().endsWith(r.toLowerCase());
+      return l.endsWithInLowerCase(r);
     }
     public static boolean execICU(final UTF8String l, final UTF8String r,
         final int collationId) {
diff --git 
a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java 
b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
index 2009f1d20442..8ceeddb0c3dd 100644
--- a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
+++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
@@ -341,6 +341,44 @@ public final class UTF8String implements 
Comparable<UTF8String>, Externalizable,
     return false;
   }
 
+  /**
+   * Returns whether `this` contains `substring` in a lowercase unicode-aware 
manner
+   *
+   * This function is written in a way that avoids excessive allocations 
when working with
+   * ASCII-only strings.
+   */
+  public boolean containsInLowerCase(final UTF8String substring) {
+    if (substring.numBytes == 0) {
+      return true;
+    }
+
+    // Both `this` and the `substring` are checked for non-ASCII characters, 
otherwise we would
+    // have to use `startsWithInLowerCase(...)` in a loop, and it would 
frequently allocate
+    // (e.g. in case of `containsInLowerCase("1大1大1大...", "11")`)
+    if (!substring.isFullAscii()) {
+      return toLowerCase().contains(substring.toLowerCaseSlow());
+    }
+    if (!isFullAscii()) {
+      return toLowerCaseSlow().contains(substring.toLowerCaseAscii());
+    }
+
+    if (numBytes < substring.numBytes) {
+      return false;
+    }
+
+    final var firstLower = Character.toLowerCase(substring.getByte(0));
+    for (var i = 0; i <= (numBytes - substring.numBytes); i++) {
+      if (Character.toLowerCase(getByte(i)) == firstLower) {
+        final var rest = UTF8String.fromAddress(base, offset + i, numBytes - 
i);
+        if (rest.matchAtInLowerCaseAscii(substring, 0)) {
+          return true;
+        }
+      }
+    }
+
+    return false;
+  }
+
   /**
    * Returns the byte at position `i`.
    */
@@ -355,14 +393,94 @@ public final class UTF8String implements 
Comparable<UTF8String>, Externalizable,
     return ByteArrayMethods.arrayEquals(base, offset + pos, s.base, s.offset, 
s.numBytes);
   }
 
+  private boolean matchAtInLowerCaseAscii(final UTF8String s, int pos) {
+    if (s.numBytes + pos > numBytes || pos < 0) {
+      return false;
+    }
+
+    for (var i = 0; i < s.numBytes; i++) {
+      if (Character.toLowerCase(getByte(pos + i)) != 
Character.toLowerCase(s.getByte(i))) {
+        return false;
+      }
+    }
+
+    return true;
+  }
+
   public boolean startsWith(final UTF8String prefix) {
     return matchAt(prefix, 0);
   }
 
+  /**
+   * Checks whether `prefix` is a prefix of `this` in a lowercase 
unicode-aware manner
+   *
+   * This function is written in a way that avoids excessive allocations 
when working with
+   * ASCII-only strings.
+   */
+  public boolean startsWithInLowerCase(final UTF8String prefix) {
+    // No way to match sizes of strings for early return, since single 
grapheme can be expanded
+    // into several independent ones in lowercase
+    if (prefix.numBytes == 0) {
+      return true;
+    }
+    if (numBytes == 0) {
+      return false;
+    }
+
+    if (!prefix.isFullAscii()) {
+      return toLowerCase().startsWith(prefix.toLowerCaseSlow());
+    }
+
+    final var part = prefix.numBytes >= numBytes ? this : 
UTF8String.fromAddress(
+      base, offset, prefix.numBytes);
+    if (!part.isFullAscii()) {
+      return toLowerCaseSlow().startsWith(prefix.toLowerCaseAscii());
+    }
+
+    if (numBytes < prefix.numBytes) {
+      return false;
+    }
+
+    return matchAtInLowerCaseAscii(prefix, 0);
+  }
+
   public boolean endsWith(final UTF8String suffix) {
     return matchAt(suffix, numBytes - suffix.numBytes);
   }
 
+  /**
+   * Checks whether `suffix` is a suffix of `this` in a lowercase 
unicode-aware manner
+   *
+   * This function is written in a way that avoids excessive allocations 
when working with
+   * ASCII-only strings.
+   */
+  public boolean endsWithInLowerCase(final UTF8String suffix) {
+    // No way to match sizes of strings for early return, since single 
grapheme can be expanded
+    // into several independent ones in lowercase
+    if (suffix.numBytes == 0) {
+      return true;
+    }
+    if (numBytes == 0) {
+      return false;
+    }
+
+    if (!suffix.isFullAscii()) {
+      return toLowerCase().endsWith(suffix.toLowerCaseSlow());
+    }
+
+    final var part = suffix.numBytes >= numBytes ? this : 
UTF8String.fromAddress(
+      base, offset + numBytes - suffix.numBytes, suffix.numBytes);
+    if (!part.isFullAscii()) {
+      return toLowerCaseSlow().endsWith(suffix.toLowerCaseAscii());
+    }
+
+    if (numBytes < suffix.numBytes) {
+      return false;
+    }
+
+    return matchAtInLowerCaseAscii(suffix, numBytes - suffix.numBytes);
+  }
+
   /**
    * Returns the upper case of this string
    */
@@ -423,24 +541,31 @@ public final class UTF8String implements 
Comparable<UTF8String>, Externalizable,
     if (numBytes == 0) {
       return EMPTY_UTF8;
     }
-    // Optimization - do char level lowercase conversion in case of chars in 
ASCII range
-    for (int i = 0; i < numBytes; i++) {
+
+    return isFullAscii() ? toLowerCaseAscii() : toLowerCaseSlow();
+  }
+
+  private boolean isFullAscii() {
+    for (var i = 0; i < numBytes; i++) {
       if (getByte(i) < 0) {
-        // non-ASCII
-        return toLowerCaseSlow();
+        return false;
       }
     }
-    byte[] bytes = new byte[numBytes];
-    for (int i = 0; i < numBytes; i++) {
-      bytes[i] = (byte) Character.toLowerCase(getByte(i));
-    }
-    return fromBytes(bytes);
+    return true;
   }
 
   private UTF8String toLowerCaseSlow() {
     return fromString(toString().toLowerCase());
   }
 
+  private UTF8String toLowerCaseAscii() {
+    final var bytes = new byte[numBytes];
+    for (var i = 0; i < numBytes; i++) {
+      bytes[i] = (byte) Character.toLowerCase(getByte(i));
+    }
+    return fromBytes(bytes);
+  }
+
   /**
    * Returns the title case of this string, that could be used as title.
    */
diff --git 
a/common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java
 
b/common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java
index 3fca7296b832..d59bd5c20e67 100644
--- 
a/common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java
+++ 
b/common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java
@@ -104,6 +104,18 @@ public class CollationSupportSuite {
     // Case-variable character length
     assertContains("abİo12", "i̇o", "UNICODE_CI", true);
     assertContains("abi̇o12", "İo", "UNICODE_CI", true);
+    assertContains("the İodine", "the i̇odine", "UTF8_BINARY_LCASE", true);
+    assertContains("the i̇odine", "the İodine", "UTF8_BINARY_LCASE", true);
+    assertContains("The İodiNe", " i̇oDin", "UTF8_BINARY_LCASE", true);
+    assertContains("İodiNe", "i̇oDine", "UTF8_BINARY_LCASE", true);
+    assertContains("İodiNe", " i̇oDin", "UTF8_BINARY_LCASE", false);
+    // Characters with the same binary lowercase representation
+    assertContains("The Kelvin.", "Kelvin", "UTF8_BINARY_LCASE", true);
+    assertContains("The Kelvin.", "Kelvin", "UTF8_BINARY_LCASE", true);
+    assertContains("The KKelvin.", "KKelvin", "UTF8_BINARY_LCASE", true);
+    assertContains("2 Kelvin.", "2 Kelvin", "UTF8_BINARY_LCASE", true);
+    assertContains("2 Kelvin.", "2 Kelvin", "UTF8_BINARY_LCASE", true);
+    assertContains("The KKelvin.", "KKelvin,", "UTF8_BINARY_LCASE", false);
   }
 
   private void assertStartsWith(
@@ -182,6 +194,17 @@ public class CollationSupportSuite {
     // Case-variable character length
     assertStartsWith("İonic", "i̇o", "UNICODE_CI", true);
     assertStartsWith("i̇onic", "İo", "UNICODE_CI", true);
+    assertStartsWith("the İodine", "the i̇odine", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("the i̇odine", "the İodine", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("İodiNe", "i̇oDin", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("The İodiNe", "i̇oDin", "UTF8_BINARY_LCASE", false);
+    // Characters with the same binary lowercase representation
+    assertStartsWith("Kelvin.", "Kelvin", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("Kelvin.", "Kelvin", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("KKelvin.", "KKelvin", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("2 Kelvin.", "2 Kelvin", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("2 Kelvin.", "2 Kelvin", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("KKelvin.", "KKelvin,", "UTF8_BINARY_LCASE", false);
   }
 
   private void assertEndsWith(String pattern, String suffix, String 
collationName, boolean expected)
@@ -259,6 +282,17 @@ public class CollationSupportSuite {
     // Case-variable character length
     assertEndsWith("The İo", "i̇o", "UNICODE_CI", true);
     assertEndsWith("The i̇o", "İo", "UNICODE_CI", true);
+    assertEndsWith("the İodine", "the i̇odine", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("the i̇odine", "the İodine", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The İodiNe", "i̇oDine", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The İodiNe", "i̇oDin", "UTF8_BINARY_LCASE", false);
+    // Characters with the same binary lowercase representation
+    assertEndsWith("The Kelvin", "Kelvin", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The Kelvin", "Kelvin", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The KKelvin", "KKelvin", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The 2 Kelvin", "2 Kelvin", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The 2 Kelvin", "2 Kelvin", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The KKelvin", "KKelvin,", "UTF8_BINARY_LCASE", false);
   }
 
 
diff --git 
a/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java
 
b/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java
index 934b93c9345b..711e31fd6881 100644
--- 
a/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java
+++ 
b/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java
@@ -215,6 +215,43 @@ public class UTF8StringSuite {
     assertFalse(fromString("大千世界").contains(fromString("大千世界好")));
   }
 
+  @Test
+  public void containsInLowerCase() {
+    // Corner cases
+    assertTrue(EMPTY_UTF8.containsInLowerCase(EMPTY_UTF8));
+    assertTrue(fromString("a").containsInLowerCase(EMPTY_UTF8));
+    assertTrue(fromString("A").containsInLowerCase(fromString("a")));
+    assertTrue(fromString("a").containsInLowerCase(fromString("A")));
+    assertFalse(EMPTY_UTF8.containsInLowerCase(fromString("a")));
+    // ASCII
+    assertTrue(fromString("hello").containsInLowerCase(fromString("ello")));
+    assertFalse(fromString("hello").containsInLowerCase(fromString("vello")));
+    
assertFalse(fromString("hello").containsInLowerCase(fromString("hellooo")));
+    // Unicode
+    assertTrue(fromString("大千世界").containsInLowerCase(fromString("千世界")));
+    assertFalse(fromString("大千世界").containsInLowerCase(fromString("世千")));
+    assertFalse(fromString("大千世界").containsInLowerCase(fromString("大千世界好")));
+    // ASCII lowercase
+    assertTrue(fromString("HeLlO").containsInLowerCase(fromString("ElL")));
+    assertFalse(fromString("HeLlO").containsInLowerCase(fromString("ElLoO")));
+    // Unicode lowercase
+    assertTrue(fromString("ЯбЛоКо").containsInLowerCase(fromString("БлОк")));
+    assertFalse(fromString("ЯбЛоКо").containsInLowerCase(fromString("лОкБ")));
+    // Characters with the same binary lowercase representation
+    assertTrue(fromString("The 
Kelvin.").containsInLowerCase(fromString("Kelvin")));
+    assertTrue(fromString("The 
Kelvin.").containsInLowerCase(fromString("Kelvin")));
+    assertTrue(fromString("The 
KKelvin.").containsInLowerCase(fromString("KKelvin")));
+    assertTrue(fromString("2 Kelvin.").containsInLowerCase(fromString("2 
Kelvin")));
+    assertTrue(fromString("2 Kelvin.").containsInLowerCase(fromString("2 
Kelvin")));
+    assertFalse(fromString("The 
KKelvin.").containsInLowerCase(fromString("KKelvin,")));
+    // Characters with longer binary lowercase representation
+    assertTrue(fromString("the İodine").containsInLowerCase(fromString("the 
i̇odine")));
+    assertTrue(fromString("the i̇odine").containsInLowerCase(fromString("the 
İodine")));
+    assertTrue(fromString("The İodiNe").containsInLowerCase(fromString(" 
i̇oDin")));
+    assertTrue(fromString("İodiNe").containsInLowerCase(fromString("i̇oDin")));
+    assertFalse(fromString("İodiNe").containsInLowerCase(fromString(" 
i̇oDin")));
+  }
+
   @Test
   public void startsWith() {
     assertTrue(EMPTY_UTF8.startsWith(EMPTY_UTF8));
@@ -226,6 +263,40 @@ public class UTF8StringSuite {
     assertFalse(fromString("大千世界").startsWith(fromString("大千世界好")));
   }
 
+  @Test
+  public void startsWithInLowerCase() {
+    // Corner cases
+    assertTrue(EMPTY_UTF8.startsWithInLowerCase(EMPTY_UTF8));
+    assertTrue(fromString("a").startsWithInLowerCase(EMPTY_UTF8));
+    assertTrue(fromString("A").startsWithInLowerCase(fromString("a")));
+    assertTrue(fromString("a").startsWithInLowerCase(fromString("A")));
+    assertFalse(EMPTY_UTF8.startsWithInLowerCase(fromString("a")));
+    // ASCII
+    assertTrue(fromString("hello").startsWithInLowerCase(fromString("hell")));
+    assertFalse(fromString("hello").startsWithInLowerCase(fromString("ell")));
+    // Unicode
+    assertTrue(fromString("大千世界").startsWithInLowerCase(fromString("大千")));
+    assertFalse(fromString("大千世界").startsWithInLowerCase(fromString("世千")));
+    // ASCII lowercase
+    assertTrue(fromString("HeLlO").startsWithInLowerCase(fromString("hElL")));
+    assertFalse(fromString("HeLlO").startsWithInLowerCase(fromString("ElL")));
+    // Unicode lowercase
+    
assertTrue(fromString("ЯбЛоКо").startsWithInLowerCase(fromString("яБлОк")));
+    
assertFalse(fromString("ЯбЛоКо").startsWithInLowerCase(fromString("БлОк")));
+    // Characters with the same binary lowercase representation
+    
assertTrue(fromString("Kelvin.").startsWithInLowerCase(fromString("Kelvin")));
+    
assertTrue(fromString("Kelvin.").startsWithInLowerCase(fromString("Kelvin")));
+    
assertTrue(fromString("KKelvin.").startsWithInLowerCase(fromString("KKelvin")));
+    assertTrue(fromString("2 Kelvin.").startsWithInLowerCase(fromString("2 
Kelvin")));
+    assertTrue(fromString("2 Kelvin.").startsWithInLowerCase(fromString("2 
Kelvin")));
+    
assertFalse(fromString("KKelvin.").startsWithInLowerCase(fromString("KKelvin,")));
+    // Characters with longer binary lowercase representation
+    assertTrue(fromString("the İodine").startsWithInLowerCase(fromString("the 
i̇odine")));
+    assertTrue(fromString("the i̇odine").startsWithInLowerCase(fromString("the 
İodine")));
+    
assertTrue(fromString("İodiNe").startsWithInLowerCase(fromString("i̇oDin")));
+    assertFalse(fromString("The 
İodiNe").startsWithInLowerCase(fromString("i̇oDin")));
+  }
+
   @Test
   public void endsWith() {
     assertTrue(EMPTY_UTF8.endsWith(EMPTY_UTF8));
@@ -237,6 +308,40 @@ public class UTF8StringSuite {
     assertFalse(fromString("数据砖头").endsWith(fromString("我的数据砖头")));
   }
 
+  @Test
+  public void endsWithInLowerCase() {
+    // Corner cases
+    assertTrue(EMPTY_UTF8.endsWithInLowerCase(EMPTY_UTF8));
+    assertTrue(fromString("a").endsWithInLowerCase(EMPTY_UTF8));
+    assertTrue(fromString("A").endsWithInLowerCase(fromString("a")));
+    assertTrue(fromString("a").endsWithInLowerCase(fromString("A")));
+    assertFalse(EMPTY_UTF8.endsWithInLowerCase(fromString("a")));
+    // ASCII
+    assertTrue(fromString("hello").endsWithInLowerCase(fromString("ello")));
+    assertFalse(fromString("hello").endsWithInLowerCase(fromString("hell")));
+    // Unicode
+    assertTrue(fromString("大千世界").endsWithInLowerCase(fromString("世界")));
+    assertFalse(fromString("大千世界").endsWithInLowerCase(fromString("大千")));
+    // ASCII lowercase
+    assertTrue(fromString("HeLlO").endsWithInLowerCase(fromString("ElLo")));
+    assertFalse(fromString("HeLlO").endsWithInLowerCase(fromString("hElL")));
+    // Unicode lowercase
+    assertTrue(fromString("ЯбЛоКо").endsWithInLowerCase(fromString("БлОкО")));
+    assertFalse(fromString("ЯбЛоКо").endsWithInLowerCase(fromString("яБлОк")));
+    // Characters with the same binary lowercase representation
+    assertTrue(fromString("The 
Kelvin").endsWithInLowerCase(fromString("Kelvin")));
+    assertTrue(fromString("The 
Kelvin").endsWithInLowerCase(fromString("Kelvin")));
+    assertTrue(fromString("The 
KKelvin").endsWithInLowerCase(fromString("KKelvin")));
+    assertTrue(fromString("The 2 Kelvin").endsWithInLowerCase(fromString("2 
Kelvin")));
+    assertTrue(fromString("The 2 Kelvin").endsWithInLowerCase(fromString("2 
Kelvin")));
+    assertFalse(fromString("The 
KKelvin").endsWithInLowerCase(fromString("KKelvin,")));
+    // Characters with longer binary lowercase representation
+    assertTrue(fromString("the İodine").endsWithInLowerCase(fromString("the 
i̇odine")));
+    assertTrue(fromString("the i̇odine").endsWithInLowerCase(fromString("the 
İodine")));
+    assertTrue(fromString("The 
İodiNe").endsWithInLowerCase(fromString("i̇oDine")));
+    assertFalse(fromString("The 
İodiNe").endsWithInLowerCase(fromString("i̇oDin")));
+  }
+
   @Test
   public void substring() {
     assertEquals(EMPTY_UTF8, fromString("hello").substring(0, 0));
diff --git a/sql/core/benchmarks/CollationBenchmark-jdk21-results.txt 
b/sql/core/benchmarks/CollationBenchmark-jdk21-results.txt
index 24605e051dbb..326e6b705313 100644
--- a/sql/core/benchmarks/CollationBenchmark-jdk21-results.txt
+++ b/sql/core/benchmarks/CollationBenchmark-jdk21-results.txt
@@ -1,54 +1,54 @@
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - equalsFunction:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
--------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                    6910           6912       
    3          0.0       69099.7       1.0X
-UNICODE                                              4367           4368       
    1          0.0       43669.6       1.6X
-UTF8_BINARY                                          4361           4364       
    4          0.0       43606.5       1.6X
-UNICODE_CI                                          46480          46526       
   66          0.0      464795.7       0.1X
+UTF8_BINARY_LCASE                                    6686           6690       
    5          0.0       66862.9       1.0X
+UNICODE                                              4302           4314       
   17          0.0       43021.3       1.6X
+UTF8_BINARY                                          4295           4299       
    6          0.0       42951.9       1.6X
+UNICODE_CI                                          43948          43951       
    4          0.0      439481.4       0.2X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - compareFunction:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                     6522           6526      
     4          0.0       65223.9       1.0X
-UNICODE                                              45792          45797      
     7          0.0      457922.3       0.1X
-UTF8_BINARY                                           7092           7112      
    29          0.0       70921.7       0.9X
-UNICODE_CI                                           47548          47564      
    22          0.0      475476.7       0.1X
+UTF8_BINARY_LCASE                                     7919           7920      
     1          0.0       79188.2       1.0X
+UNICODE                                              45764          45795      
    44          0.0      457641.3       0.2X
+UTF8_BINARY                                           7384           7388      
     5          0.0       73839.9       1.1X
+UNICODE_CI                                           48078          48099      
    29          0.0      480782.5       0.2X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - hashFunction:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 11716          11716         
  1          0.0      117157.9       1.0X
-UNICODE                                          180133         180137         
  5          0.0     1801332.1       0.1X
-UTF8_BINARY                                       10476          10477         
  1          0.0      104757.4       1.1X
-UNICODE_CI                                       148171         148190         
 28          0.0     1481705.6       0.1X
+UTF8_BINARY_LCASE                                 11353          11354         
  2          0.0      113527.0       1.0X
+UNICODE                                          175533         175720         
265          0.0     1755327.6       0.1X
+UTF8_BINARY                                        9995           9998         
  3          0.0       99953.2       1.1X
+UNICODE_CI                                       148475         148498         
 33          0.0     1484745.3       0.1X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - contains:     Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 49257          49280         
 32          0.0      492574.0       1.0X
-UNICODE                                           18253          18293         
 57          0.0      182530.8       2.7X
-UTF8_BINARY                                       20199          20247         
 68          0.0      201987.8       2.4X
-UNICODE_CI                                       882302         882576         
387          0.0     8823023.9       0.1X
+UTF8_BINARY_LCASE                                 28707          28715         
 12          0.0      287065.8       1.0X
+UNICODE                                           15578          15623         
 64          0.0      155783.5       1.8X
+UTF8_BINARY                                       17321          17410         
126          0.0      173208.7       1.7X
+UNICODE_CI                                       907463         907667         
289          0.0     9074632.3       0.0X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - startsWith:   Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 45015          45024         
 13          0.0      450153.7       1.0X
-UNICODE                                           17425          17455         
 43          0.0      174247.1       2.6X
-UTF8_BINARY                                       19237          19268         
 44          0.0      192371.4       2.3X
-UNICODE_CI                                       954993         955680         
971          0.0     9549930.3       0.0X
+UTF8_BINARY_LCASE                                 28001          28011         
 14          0.0      280014.2       1.0X
+UNICODE                                           15284          15288         
  5          0.0      152841.3       1.8X
+UTF8_BINARY                                       17035          17042         
 10          0.0      170348.1       1.6X
+UNICODE_CI                                       873571         874628        
1494          0.0     8735712.7       0.0X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - endsWith:     Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 45919          45966         
 67          0.0      459187.0       1.0X
-UNICODE                                           17697          17713         
 23          0.0      176970.4       2.6X
-UTF8_BINARY                                       19448          19449         
  2          0.0      194479.6       2.4X
-UNICODE_CI                                       962916         963010         
133          0.0     9629158.5       0.0X
+UTF8_BINARY_LCASE                                 28260          28263         
  3          0.0      282603.9       1.0X
+UNICODE                                           15531          15538         
  9          0.0      155312.3       1.8X
+UTF8_BINARY                                       17239          17242         
  5          0.0      172387.3       1.6X
+UNICODE_CI                                       886437         888336        
2685          0.0     8864372.1       0.0X
 
diff --git a/sql/core/benchmarks/CollationBenchmark-results.txt 
b/sql/core/benchmarks/CollationBenchmark-results.txt
index a92aadc52ee2..7b28c96f379c 100644
--- a/sql/core/benchmarks/CollationBenchmark-results.txt
+++ b/sql/core/benchmarks/CollationBenchmark-results.txt
@@ -1,54 +1,54 @@
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - equalsFunction:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
--------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                    7692           7731       
   55          0.0       76919.2       1.0X
-UNICODE                                              4378           4379       
    0          0.0       43784.6       1.8X
-UTF8_BINARY                                          4382           4396       
   19          0.0       43821.6       1.8X
-UNICODE_CI                                          48344          48360       
   23          0.0      483436.5       0.2X
+UTF8_BINARY_LCASE                                    7726           7727       
    2          0.0       77260.4       1.0X
+UNICODE                                              4411           4412       
    2          0.0       44106.7       1.8X
+UTF8_BINARY                                          4409           4414       
    6          0.0       44090.3       1.8X
+UNICODE_CI                                          46811          46820       
   12          0.0      468113.1       0.2X
 
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - compareFunction:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                     9819           9820      
     0          0.0       98194.9       1.0X
-UNICODE                                              49507          49518      
    17          0.0      495066.2       0.2X
-UTF8_BINARY                                           7354           7365      
    17          0.0       73536.3       1.3X
-UNICODE_CI                                           52149          52163      
    20          0.0      521489.4       0.2X
+UTF8_BINARY_LCASE                                     6290           6293      
     5          0.0       62895.3       1.0X
+UNICODE                                              48173          48210      
    53          0.0      481725.5       0.1X
+UTF8_BINARY                                           5252           5259      
     9          0.0       52524.2       1.2X
+UNICODE_CI                                           48093          48104      
    16          0.0      480931.0       0.1X
 
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - hashFunction:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 18110          18127         
 24          0.0      181103.9       1.0X
-UNICODE                                          171375         171435         
 85          0.0     1713752.3       0.1X
-UTF8_BINARY                                       14012          14030         
 26          0.0      140116.7       1.3X
-UNICODE_CI                                       153847         153901         
 76          0.0     1538471.1       0.1X
+UTF8_BINARY_LCASE                                 18369          18386         
 25          0.0      183685.5       1.0X
+UNICODE                                          177476         177572         
135          0.0     1774764.4       0.1X
+UTF8_BINARY                                       14029          14039         
 13          0.0      140293.8       1.3X
+UNICODE_CI                                       150438         150527         
126          0.0     1504375.8       0.1X
 
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - contains:     Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 48528          48534         
  8          0.0      485281.3       1.0X
-UNICODE                                           17612          17628         
 23          0.0      176119.4       2.8X
-UTF8_BINARY                                       19664          19671         
 11          0.0      196636.4       2.5X
-UNICODE_CI                                       860919         862936        
2853          0.0     8609190.8       0.1X
+UTF8_BINARY_LCASE                                 33830          33842         
 17          0.0      338295.2       1.0X
+UNICODE                                           19038          19040         
  3          0.0      190376.9       1.8X
+UTF8_BINARY                                       21217          21222         
  7          0.0      212165.3       1.6X
+UNICODE_CI                                       888851         890073        
1729          0.0     8888510.1       0.0X
 
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - startsWith:   Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 49520          49524         
  7          0.0      495195.4       1.0X
-UNICODE                                           18346          18346         
  0          0.0      183457.7       2.7X
-UTF8_BINARY                                       20483          20488         
  7          0.0      204828.7       2.4X
-UNICODE_CI                                       928756         930065        
1851          0.0     9287564.4       0.1X
+UTF8_BINARY_LCASE                                 31240          31248         
 11          0.0      312403.3       1.0X
+UNICODE                                           17197          17208         
 16          0.0      171969.9       1.8X
+UTF8_BINARY                                       19262          19263         
  1          0.0      192620.0       1.6X
+UNICODE_CI                                       879963         881716        
2479          0.0     8799628.7       0.0X
 
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - endsWith:     Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 49501          49504         
  5          0.0      495006.9       1.0X
-UNICODE                                           18052          18095         
 61          0.0      180523.7       2.7X
-UTF8_BINARY                                       20187          20197         
 15          0.0      201867.1       2.5X
-UNICODE_CI                                       934011         938842        
6833          0.0     9340109.8       0.1X
+UTF8_BINARY_LCASE                                 31490          31505         
 21          0.0      314902.0       1.0X
+UNICODE                                           17129          17157         
 40          0.0      171292.3       1.8X
+UTF8_BINARY                                       19336          19340         
  6          0.0      193362.0       1.6X
+UNICODE_CI                                       908514         911193        
3788          0.0     9085140.8       0.0X
 
diff --git a/sql/core/benchmarks/CollationNonASCIIBenchmark-jdk21-results.txt b/sql/core/benchmarks/CollationNonASCIIBenchmark-jdk21-results.txt
index 0a50baab36ea..9573a37c3a9c 100644
--- a/sql/core/benchmarks/CollationNonASCIIBenchmark-jdk21-results.txt
+++ b/sql/core/benchmarks/CollationNonASCIIBenchmark-jdk21-results.txt
@@ -1,54 +1,54 @@
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - equalsFunction:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
--------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                   18244          18258       
   20          0.0      456096.4       1.0X
-UNICODE                                               498            498       
    0          0.1       12440.3      36.7X
-UTF8_BINARY                                           499            500       
    1          0.1       12467.7      36.6X
-UNICODE_CI                                          13429          13443       
   19          0.0      335725.4       1.4X
+UTF8_BINARY_LCASE                                   18521          18553       
   44          0.0      463036.6       1.0X
+UNICODE                                               459            461       
    2          0.1       11475.9      40.3X
+UTF8_BINARY                                           458            459       
    2          0.1       11442.0      40.5X
+UNICODE_CI                                          13493          13528       
   49          0.0      337322.0       1.4X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - compareFunction:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                    18377          18399      
    31          0.0      459430.5       1.0X
-UNICODE                                              14238          14240      
     3          0.0      355957.4       1.3X
-UTF8_BINARY                                            975            976      
     1          0.0       24371.3      18.9X
-UNICODE_CI                                           13819          13826      
    10          0.0      345482.6       1.3X
+UTF8_BINARY_LCASE                                    18163          18171      
    11          0.0      454086.2       1.0X
+UNICODE                                              14502          14505      
     4          0.0      362541.9       1.3X
+UTF8_BINARY                                            970            972      
     2          0.0       24246.1      18.7X
+UNICODE_CI                                           14209          14216      
     9          0.0      355231.9       1.3X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - hashFunction:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                  9183           9230         
 67          0.0      229564.0       1.0X
-UNICODE                                           38937          38952         
 22          0.0      973421.3       0.2X
-UTF8_BINARY                                        1376           1376         
  0          0.0       34397.5       6.7X
-UNICODE_CI                                        32881          32882         
  1          0.0      822027.4       0.3X
+UTF8_BINARY_LCASE                                 10171          10173         
  2          0.0      254276.4       1.0X
+UNICODE                                           39033          39056         
 32          0.0      975819.3       0.3X
+UTF8_BINARY                                        1389           1389         
  1          0.0       34719.6       7.3X
+UNICODE_CI                                        34546          34552         
  9          0.0      863641.0       0.3X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - contains:     Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 22429          22438         
 13          0.0      560735.1       1.0X
-UNICODE                                            2900           2901         
  2          0.0       72503.2       7.7X
-UTF8_BINARY                                        3190           3198         
 11          0.0       79740.5       7.0X
-UNICODE_CI                                       166847         167278         
609          0.0     4171180.3       0.1X
+UTF8_BINARY_LCASE                                 23928          23938         
 15          0.0      598196.6       1.0X
+UNICODE                                            2711           2712         
  1          0.0       67778.9       8.8X
+UTF8_BINARY                                        2991           2994         
  5          0.0       74774.5       8.0X
+UNICODE_CI                                       168620         168643         
 33          0.0     4215495.0       0.1X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - startsWith:   Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 22865          22875         
 13          0.0      571636.3       1.0X
-UNICODE                                            3137           3137         
  0          0.0       78422.3       7.3X
-UTF8_BINARY                                        3448           3450         
  3          0.0       86188.5       6.6X
-UNICODE_CI                                       190473         190894         
595          0.0     4761831.2       0.1X
+UTF8_BINARY_LCASE                                 25400          25416         
 23          0.0      634993.8       1.0X
+UNICODE                                            3079           3079         
  1          0.0       76968.9       8.3X
+UTF8_BINARY                                        3376           3380         
  5          0.0       84401.9       7.5X
+UNICODE_CI                                       168738         168850         
159          0.0     4218448.7       0.2X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - endsWith:     Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 23693          23695         
  3          0.0      592333.2       1.0X
-UNICODE                                            3170           3172         
  3          0.0       79243.5       7.5X
-UTF8_BINARY                                        3472           3473         
  2          0.0       86788.8       6.8X
-UNICODE_CI                                        63331          63603         
384          0.0     1583274.3       0.4X
+UTF8_BINARY_LCASE                                 25067          25113         
 65          0.0      626683.3       1.0X
+UNICODE                                            3070           3082         
 16          0.0       76758.7       8.2X
+UTF8_BINARY                                        3359           3366         
 10          0.0       83983.5       7.5X
+UNICODE_CI                                       180852         180985         
189          0.0     4521288.1       0.1X
 
diff --git a/sql/core/benchmarks/CollationNonASCIIBenchmark-results.txt b/sql/core/benchmarks/CollationNonASCIIBenchmark-results.txt
index bef5f9d7211f..6df1f69174d6 100644
--- a/sql/core/benchmarks/CollationNonASCIIBenchmark-results.txt
+++ b/sql/core/benchmarks/CollationNonASCIIBenchmark-results.txt
@@ -1,54 +1,54 @@
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - equalsFunction:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
--------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                   17881          17885       
    6          0.0      447017.7       1.0X
-UNICODE                                               493            495       
    2          0.1       12328.9      36.3X
-UTF8_BINARY                                           493            494       
    1          0.1       12331.4      36.3X
-UNICODE_CI                                          13731          13737       
    8          0.0      343284.6       1.3X
+UTF8_BINARY_LCASE                                   19237          19255       
   26          0.0      480925.9       1.0X
+UNICODE                                               311            319       
   17          0.1        7764.9      61.9X
+UTF8_BINARY                                           313            314       
    1          0.1        7817.9      61.5X
+UNICODE_CI                                          15481          15517       
   52          0.0      387018.9       1.2X
 
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - compareFunction:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                    18041          18047      
     8          0.0      451030.2       1.0X
-UNICODE                                              14023          14047      
    34          0.0      350573.9       1.3X
-UTF8_BINARY                                           1387           1397      
    14          0.0       34680.4      13.0X
-UNICODE_CI                                           14232          14242      
    14          0.0      355808.4       1.3X
+UTF8_BINARY_LCASE                                    17886          17892      
     9          0.0      447142.3       1.0X
+UNICODE                                              13888          13908      
    28          0.0      347192.7       1.3X
+UTF8_BINARY                                           1384           1387      
     5          0.0       34589.8      12.9X
+UNICODE_CI                                           14209          14221      
    17          0.0      355233.9       1.3X
 
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - hashFunction:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 10494          10499         
  6          0.0      262360.0       1.0X
-UNICODE                                           40410          40422         
 17          0.0     1010261.8       0.3X
-UTF8_BINARY                                        2035           2035         
  1          0.0       50877.8       5.2X
-UNICODE_CI                                        31470          31493         
 32          0.0      786752.4       0.3X
+UTF8_BINARY_LCASE                                 10311          10316         
  7          0.0      257774.0       1.0X
+UNICODE                                           39377          39379         
  4          0.0      984422.3       0.3X
+UTF8_BINARY                                        2030           2032         
  3          0.0       50751.8       5.1X
+UNICODE_CI                                        31011          31035         
 34          0.0      775281.7       0.3X
 
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - contains:     Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 22342          22352         
 13          0.0      558560.4       1.0X
-UNICODE                                            3073           3074         
  0          0.0       76829.5       7.3X
-UTF8_BINARY                                        3486           3487         
  2          0.0       87147.6       6.4X
-UNICODE_CI                                       162838         164378        
2177          0.0     4070960.3       0.1X
+UTF8_BINARY_LCASE                                 21933          21953         
 28          0.0      548332.8       1.0X
+UNICODE                                            2951           2954         
  4          0.0       73782.5       7.4X
+UTF8_BINARY                                        3273           3279         
  8          0.0       81830.0       6.7X
+UNICODE_CI                                       158862         159283         
596          0.0     3971544.5       0.1X
 
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - startsWith:   Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 21882          21890         
 11          0.0      547051.8       1.0X
-UNICODE                                            2672           2676         
  6          0.0       66799.0       8.2X
-UTF8_BINARY                                        3069           3071         
  2          0.0       76732.2       7.1X
-UNICODE_CI                                       187853         188724        
1232          0.0     4696336.1       0.1X
+UTF8_BINARY_LCASE                                 22054          22093         
 55          0.0      551348.0       1.0X
+UNICODE                                            2745           2779         
 48          0.0       68623.1       8.0X
+UTF8_BINARY                                        3068           3069         
  2          0.0       76703.7       7.2X
+UNICODE_CI                                       157491         157671         
254          0.0     3937270.4       0.1X
 
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - endsWith:     Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 21818          21866         
 68          0.0      545439.9       1.0X
-UNICODE                                            2637           2643         
  9          0.0       65913.3       8.3X
-UTF8_BINARY                                        3037           3039         
  2          0.0       75934.6       7.2X
-UNICODE_CI                                        61372          61510         
195          0.0     1534307.9       0.4X
+UTF8_BINARY_LCASE                                 21932          21970         
 55          0.0      548288.1       1.0X
+UNICODE                                            2743           2758         
 22          0.0       68566.3       8.0X
+UTF8_BINARY                                        3057           3071         
 19          0.0       76428.2       7.2X
+UNICODE_CI                                       172037         172321         
403          0.0     4300920.2       0.1X
 


