This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 890f78d03020 [SPARK-47418][SQL] Add hand-crafted implementations for lowercase unicode-aware contains, startsWith and endsWith and optimize UTF8_BINARY_LCASE

890f78d03020 is described below

commit 890f78d03020f905b732054c78748d8d21a69fcf
Author: Vladimir Golubev <vladimir.golu...@databricks.com>
AuthorDate: Wed Apr 24 15:59:54 2024 +0800

    [SPARK-47418][SQL] Add hand-crafted implementations for lowercase unicode-aware contains, startsWith and endsWith and optimize UTF8_BINARY_LCASE

    ### What changes were proposed in this pull request?

    Added hand-crafted implementations of Unicode-aware lowercase `contains`, `startsWith` and `endsWith` to optimize UTF8_BINARY_LCASE for ASCII-only strings.

    ### Why are the changes needed?

    `UTF8String.toLowerCase()`, which the collation-aware functions above relied on, already has a fast path for all-ASCII strings, but it still allocates a new string on every call. In this PR I introduced loop-based implementations that compare bytes in place and fall back to `toLowerCase()` only when an input contains a non-ASCII character.

    ### Does this PR introduce _any_ user-facing change?

    No, these functions should behave exactly as:
    - `lhs.containsInLowerCase(rhs)` == `lhs.toLowerCase().contains(rhs.toLowerCase())`
    - `lhs.startsWithInLowerCase(rhs)` == `lhs.toLowerCase().startsWith(rhs.toLowerCase())`
    - `lhs.endsWithInLowerCase(rhs)` == `lhs.toLowerCase().endsWith(rhs.toLowerCase())`

    ### How was this patch tested?

    Added new test cases to `org.apache.spark.unsafe.types.CollationSupportSuite` and `org.apache.spark.unsafe.types.UTF8StringSuite`, including several specific to Unicode lowercasing. I also ran `CollationBenchmark` on GHA for JDK 17 and JDK 21 and updated the results.

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #46181 from vladimirg-db/vladimirg-db/add-hand-crafted-string-function-implementations-for-utf8-binary-lcase-collations.
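
The fast-path-plus-fallback idea described in the commit message can be illustrated outside Spark. The following is a minimal standalone sketch, not the code from this commit: it works on plain UTF-8 `byte[]` instead of `UTF8String`, and the class `LowerCaseMatchSketch` and all of its methods are hypothetical names. It lowercases the slow path with `Locale.ROOT`, which is an assumption made here for determinism (the patch's `toLowerCaseSlow()` calls `String.toLowerCase()` with the default locale). Unlike the patch's `startsWithInLowerCase`/`endsWithInLowerCase`, which only check the compared slice of `this` for non-ASCII bytes, it checks the whole left-hand input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical, self-contained illustration of an ASCII fast path with a Unicode
 * fallback for case-insensitive contains/startsWith/endsWith. Not Spark's API.
 */
public final class LowerCaseMatchSketch {

  private LowerCaseMatchSketch() {}

  /** True if every byte is 7-bit ASCII; bytes of multi-byte UTF-8 sequences are negative in Java. */
  public static boolean isFullAscii(byte[] bytes) {
    for (byte b : bytes) {
      if (b < 0) {
        return false;
      }
    }
    return true;
  }

  /** Lowercases a single ASCII byte without allocating anything. */
  private static int lowerAscii(byte b) {
    return (b >= 'A' && b <= 'Z') ? b + ('a' - 'A') : b;
  }

  /** Compares `pattern` with `text` at offset `pos`, ignoring ASCII case only. */
  private static boolean matchAtIgnoreAsciiCase(byte[] text, byte[] pattern, int pos) {
    if (pos < 0 || pos + pattern.length > text.length) {
      return false;
    }
    for (int i = 0; i < pattern.length; i++) {
      if (lowerAscii(text[pos + i]) != lowerAscii(pattern[i])) {
        return false;
      }
    }
    return true;
  }

  /** Slow path: decode and apply full Unicode lowercasing (allocates). Locale.ROOT is an assumption. */
  private static String lowerSlow(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8).toLowerCase(Locale.ROOT);
  }

  public static boolean containsInLowerCase(byte[] text, byte[] sub) {
    if (!isFullAscii(text) || !isFullAscii(sub)) {
      return lowerSlow(text).contains(lowerSlow(sub));
    }
    // An empty `sub` matches at i = 0, so no special case is needed.
    for (int i = 0; i + sub.length <= text.length; i++) {
      if (matchAtIgnoreAsciiCase(text, sub, i)) {
        return true;
      }
    }
    return false;
  }

  public static boolean startsWithInLowerCase(byte[] text, byte[] prefix) {
    if (!isFullAscii(text) || !isFullAscii(prefix)) {
      return lowerSlow(text).startsWith(lowerSlow(prefix));
    }
    return matchAtIgnoreAsciiCase(text, prefix, 0);
  }

  public static boolean endsWithInLowerCase(byte[] text, byte[] suffix) {
    if (!isFullAscii(text) || !isFullAscii(suffix)) {
      return lowerSlow(text).endsWith(lowerSlow(suffix));
    }
    // A negative offset (suffix longer than text) is rejected by matchAtIgnoreAsciiCase.
    return matchAtIgnoreAsciiCase(text, suffix, text.length - suffix.length);
  }

  public static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(containsInLowerCase(utf8("HeLlO"), utf8("ElL")));    // true
    System.out.println(startsWithInLowerCase(utf8("HeLlO"), utf8("hElL"))); // true
    System.out.println(endsWithInLowerCase(utf8("HeLlO"), utf8("hElL")));   // false
    // Non-ASCII input takes the slow path; U+0130 lowercases to "i" + U+0307.
    System.out.println(
        startsWithInLowerCase(utf8("the \u0130odine"), utf8("the i\u0307odine"))); // true
  }
}
```

Both sides are checked for non-ASCII bytes before the byte-wise path is taken because full Unicode lowercasing can change a string's length (U+0130 `İ` becomes `i` followed by U+0307) and can map a non-ASCII character to an ASCII one (the Kelvin sign U+212A lowercases to `k`). In either case a byte-by-byte comparison could disagree with `toLowerCase().contains(...)`, which is the equivalence the commit promises to preserve.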

    Authored-by: Vladimir Golubev <vladimir.golu...@databricks.com>
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
---
 .../spark/sql/catalyst/util/CollationSupport.java  |   6 +-
 .../org/apache/spark/unsafe/types/UTF8String.java  | 143 +++++++++++++++++++--
 .../spark/unsafe/types/CollationSupportSuite.java  |  34 +++++
 .../apache/spark/unsafe/types/UTF8StringSuite.java | 105 +++++++++++++++
 .../CollationBenchmark-jdk21-results.txt           |  60 ++++----
 sql/core/benchmarks/CollationBenchmark-results.txt |  60 ++++----
 .../CollationNonASCIIBenchmark-jdk21-results.txt   |  60 ++++----
 .../CollationNonASCIIBenchmark-results.txt         |  60 ++++----
 8 files changed, 396 insertions(+), 132 deletions(-)

diff --git a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
index b28321230840..3e4973f5c187 100644
--- a/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
+++ b/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
@@ -60,7 +60,7 @@ public final class CollationSupport {
       return l.contains(r);
     }
     public static boolean execLowercase(final UTF8String l, final UTF8String r) {
-      return l.toLowerCase().contains(r.toLowerCase());
+      return l.containsInLowerCase(r);
     }
     public static boolean execICU(final UTF8String l, final UTF8String r,
         final int collationId) {
@@ -98,7 +98,7 @@ public final class CollationSupport {
       return l.startsWith(r);
     }
     public static boolean execLowercase(final UTF8String l, final UTF8String r) {
-      return l.toLowerCase().startsWith(r.toLowerCase());
+      return l.startsWithInLowerCase(r);
     }
     public static boolean execICU(final UTF8String l, final UTF8String r,
         final int collationId) {
@@ -135,7 +135,7 @@ public final class CollationSupport {
       return l.endsWith(r);
     }
     public static boolean execLowercase(final UTF8String l, final UTF8String r) {
-      return l.toLowerCase().endsWith(r.toLowerCase());
+      return l.endsWithInLowerCase(r);
     }
     public static boolean execICU(final UTF8String l, final UTF8String r,
         final int collationId) {
diff --git a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
index 2009f1d20442..8ceeddb0c3dd 100644
--- a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
+++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
@@ -341,6 +341,44 @@ public final class UTF8String implements Comparable<UTF8String>, Externalizable,
     return false;
   }

+  /**
+   * Returns whether `this` contains `substring` in a lowercase unicode-aware manner
+   *
+   * This function is written in a way which avoids excessive allocations in case if we work with
+   * bare ASCII-character strings.
+   */
+  public boolean containsInLowerCase(final UTF8String substring) {
+    if (substring.numBytes == 0) {
+      return true;
+    }
+
+    // Both `this` and the `substring` are checked for non-ASCII characters, otherwise we would
+    // have to use `startsWithLowerCase(...)` in a loop, and it would frequently allocate
+    // (e.g. in case of `containsInLowerCase("1大1大1大...", "11")`)
+    if (!substring.isFullAscii()) {
+      return toLowerCase().contains(substring.toLowerCaseSlow());
+    }
+    if (!isFullAscii()) {
+      return toLowerCaseSlow().contains(substring.toLowerCaseAscii());
+    }
+
+    if (numBytes < substring.numBytes) {
+      return false;
+    }
+
+    final var firstLower = Character.toLowerCase(substring.getByte(0));
+    for (var i = 0; i <= (numBytes - substring.numBytes); i++) {
+      if (Character.toLowerCase(getByte(i)) == firstLower) {
+        final var rest = UTF8String.fromAddress(base, offset + i, numBytes - i);
+        if (rest.matchAtInLowerCaseAscii(substring, 0)) {
+          return true;
+        }
+      }
+    }
+
+    return false;
+  }
+
   /**
    * Returns the byte at position `i`.
    */
@@ -355,14 +393,94 @@ public final class UTF8String implements Comparable<UTF8String>, Externalizable,
     return ByteArrayMethods.arrayEquals(base, offset + pos, s.base, s.offset, s.numBytes);
   }

+  private boolean matchAtInLowerCaseAscii(final UTF8String s, int pos) {
+    if (s.numBytes + pos > numBytes || pos < 0) {
+      return false;
+    }
+
+    for (var i = 0; i < s.numBytes; i++) {
+      if (Character.toLowerCase(getByte(pos + i)) != Character.toLowerCase(s.getByte(i))) {
+        return false;
+      }
+    }
+
+    return true;
+  }
+
   public boolean startsWith(final UTF8String prefix) {
     return matchAt(prefix, 0);
   }

+  /**
+   * Checks whether `prefix` is a prefix of `this` in a lowercase unicode-aware manner
+   *
+   * This function is written in a way which avoids excessive allocations in case if we work with
+   * bare ASCII-character strings.
+   */
+  public boolean startsWithInLowerCase(final UTF8String prefix) {
+    // No way to match sizes of strings for early return, since single grapheme can be expanded
+    // into several independent ones in lowercase
+    if (prefix.numBytes == 0) {
+      return true;
+    }
+    if (numBytes == 0) {
+      return false;
+    }
+
+    if (!prefix.isFullAscii()) {
+      return toLowerCase().startsWith(prefix.toLowerCaseSlow());
+    }
+
+    final var part = prefix.numBytes >= numBytes ? this : UTF8String.fromAddress(
+      base, offset, prefix.numBytes);
+    if (!part.isFullAscii()) {
+      return toLowerCaseSlow().startsWith(prefix.toLowerCaseAscii());
+    }
+
+    if (numBytes < prefix.numBytes) {
+      return false;
+    }
+
+    return matchAtInLowerCaseAscii(prefix, 0);
+  }
+
   public boolean endsWith(final UTF8String suffix) {
     return matchAt(suffix, numBytes - suffix.numBytes);
   }

+  /**
+   * Checks whether `suffix` is a suffix of `this` in a lowercase unicode-aware manner
+   *
+   * This function is written in a way which avoids excessive allocations in case if we work with
+   * bare ASCII-character strings.
+   */
+  public boolean endsWithInLowerCase(final UTF8String suffix) {
+    // No way to match sizes of strings for early return, since single grapheme can be expanded
+    // into several independent ones in lowercase
+    if (suffix.numBytes == 0) {
+      return true;
+    }
+    if (numBytes == 0) {
+      return false;
+    }
+
+    if (!suffix.isFullAscii()) {
+      return toLowerCase().endsWith(suffix.toLowerCaseSlow());
+    }
+
+    final var part = suffix.numBytes >= numBytes ? this : UTF8String.fromAddress(
+      base, offset + numBytes - suffix.numBytes, suffix.numBytes);
+    if (!part.isFullAscii()) {
+      return toLowerCaseSlow().endsWith(suffix.toLowerCaseAscii());
+    }
+
+    if (numBytes < suffix.numBytes) {
+      return false;
+    }
+
+    return matchAtInLowerCaseAscii(suffix, numBytes - suffix.numBytes);
+  }
+
   /**
    * Returns the upper case of this string
    */
@@ -423,24 +541,31 @@ public final class UTF8String implements Comparable<UTF8String>, Externalizable,
     if (numBytes == 0) {
       return EMPTY_UTF8;
     }
-    // Optimization - do char level lowercase conversion in case of chars in ASCII range
-    for (int i = 0; i < numBytes; i++) {
+
+    return isFullAscii() ? toLowerCaseAscii() : toLowerCaseSlow();
+  }
+
+  private boolean isFullAscii() {
+    for (var i = 0; i < numBytes; i++) {
       if (getByte(i) < 0) {
-        // non-ASCII
-        return toLowerCaseSlow();
+        return false;
       }
     }
-    byte[] bytes = new byte[numBytes];
-    for (int i = 0; i < numBytes; i++) {
-      bytes[i] = (byte) Character.toLowerCase(getByte(i));
-    }
-    return fromBytes(bytes);
+    return true;
   }

   private UTF8String toLowerCaseSlow() {
     return fromString(toString().toLowerCase());
   }

+  private UTF8String toLowerCaseAscii() {
+    final var bytes = new byte[numBytes];
+    for (var i = 0; i < numBytes; i++) {
+      bytes[i] = (byte) Character.toLowerCase(getByte(i));
+    }
+    return fromBytes(bytes);
+  }
+
   /**
    * Returns the title case of this string, that could be used as title.
    */
diff --git a/common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java b/common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java
index 3fca7296b832..d59bd5c20e67 100644
--- a/common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java
+++ b/common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java
@@ -104,6 +104,18 @@ public class CollationSupportSuite {
     // Case-variable character length
     assertContains("abİo12", "i̇o", "UNICODE_CI", true);
     assertContains("abi̇o12", "İo", "UNICODE_CI", true);
+    assertContains("the İodine", "the i̇odine", "UTF8_BINARY_LCASE", true);
+    assertContains("the i̇odine", "the İodine", "UTF8_BINARY_LCASE", true);
+    assertContains("The İodiNe", " i̇oDin", "UTF8_BINARY_LCASE", true);
+    assertContains("İodiNe", "i̇oDine", "UTF8_BINARY_LCASE", true);
+    assertContains("İodiNe", " i̇oDin", "UTF8_BINARY_LCASE", false);
+    // Characters with the same binary lowercase representation
+    assertContains("The Kelvin.", "Kelvin", "UTF8_BINARY_LCASE", true);
+    assertContains("The Kelvin.", "Kelvin", "UTF8_BINARY_LCASE", true);
+    assertContains("The KKelvin.", "KKelvin", "UTF8_BINARY_LCASE", true);
+    assertContains("2 Kelvin.", "2 Kelvin", "UTF8_BINARY_LCASE", true);
+    assertContains("2 Kelvin.", "2 Kelvin", "UTF8_BINARY_LCASE", true);
+    assertContains("The KKelvin.", "KKelvin,", "UTF8_BINARY_LCASE", false);
   }

   private void assertStartsWith(
@@ -182,6 +194,17 @@ public class CollationSupportSuite {
     // Case-variable character length
     assertStartsWith("İonic", "i̇o", "UNICODE_CI", true);
     assertStartsWith("i̇onic", "İo", "UNICODE_CI", true);
+    assertStartsWith("the İodine", "the i̇odine", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("the i̇odine", "the İodine", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("İodiNe", "i̇oDin", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("The İodiNe", "i̇oDin", "UTF8_BINARY_LCASE", false);
+    // Characters with the same binary lowercase representation
+    assertStartsWith("Kelvin.", "Kelvin", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("Kelvin.", "Kelvin", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("KKelvin.", "KKelvin", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("2 Kelvin.", "2 Kelvin", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("2 Kelvin.", "2 Kelvin", "UTF8_BINARY_LCASE", true);
+    assertStartsWith("KKelvin.", "KKelvin,", "UTF8_BINARY_LCASE", false);
   }

   private void assertEndsWith(String pattern, String suffix, String collationName, boolean expected)
@@ -259,6 +282,17 @@ public class CollationSupportSuite {
     // Case-variable character length
     assertEndsWith("The İo", "i̇o", "UNICODE_CI", true);
     assertEndsWith("The i̇o", "İo", "UNICODE_CI", true);
+    assertEndsWith("the İodine", "the i̇odine", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("the i̇odine", "the İodine", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The İodiNe", "i̇oDine", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The İodiNe", "i̇oDin", "UTF8_BINARY_LCASE", false);
+    // Characters with the same binary lowercase representation
+    assertEndsWith("The Kelvin", "Kelvin", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The Kelvin", "Kelvin", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The KKelvin", "KKelvin", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The 2 Kelvin", "2 Kelvin", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The 2 Kelvin", "2 Kelvin", "UTF8_BINARY_LCASE", true);
+    assertEndsWith("The KKelvin", "KKelvin,", "UTF8_BINARY_LCASE", false);
   }


diff --git a/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java b/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java
index 934b93c9345b..711e31fd6881 100644
--- a/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java
+++ b/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java
@@ -215,6 +215,43 @@ public class UTF8StringSuite {
     assertFalse(fromString("大千世界").contains(fromString("大千世界好")));
   }

+  @Test
+  public void containsInLowerCase() {
+    // Corner cases
+    assertTrue(EMPTY_UTF8.containsInLowerCase(EMPTY_UTF8));
+    assertTrue(fromString("a").containsInLowerCase(EMPTY_UTF8));
+    assertTrue(fromString("A").containsInLowerCase(fromString("a")));
+    assertTrue(fromString("a").containsInLowerCase(fromString("A")));
+    assertFalse(EMPTY_UTF8.containsInLowerCase(fromString("a")));
+    // ASCII
+    assertTrue(fromString("hello").containsInLowerCase(fromString("ello")));
+    assertFalse(fromString("hello").containsInLowerCase(fromString("vello")));
+    assertFalse(fromString("hello").containsInLowerCase(fromString("hellooo")));
+    // Unicode
+    assertTrue(fromString("大千世界").containsInLowerCase(fromString("千世界")));
+    assertFalse(fromString("大千世界").containsInLowerCase(fromString("世千")));
+    assertFalse(fromString("大千世界").containsInLowerCase(fromString("大千世界好")));
+    // ASCII lowercase
+    assertTrue(fromString("HeLlO").containsInLowerCase(fromString("ElL")));
+    assertFalse(fromString("HeLlO").containsInLowerCase(fromString("ElLoO")));
+    // Unicode lowercase
+    assertTrue(fromString("ЯбЛоКо").containsInLowerCase(fromString("БлОк")));
+    assertFalse(fromString("ЯбЛоКо").containsInLowerCase(fromString("лОкБ")));
+    // Characters with the same binary lowercase representation
+    assertTrue(fromString("The Kelvin.").containsInLowerCase(fromString("Kelvin")));
+    assertTrue(fromString("The Kelvin.").containsInLowerCase(fromString("Kelvin")));
+    assertTrue(fromString("The KKelvin.").containsInLowerCase(fromString("KKelvin")));
+    assertTrue(fromString("2 Kelvin.").containsInLowerCase(fromString("2 Kelvin")));
+    assertTrue(fromString("2 Kelvin.").containsInLowerCase(fromString("2 Kelvin")));
+    assertFalse(fromString("The KKelvin.").containsInLowerCase(fromString("KKelvin,")));
+    // Characters with longer binary lowercase representation
+    assertTrue(fromString("the İodine").containsInLowerCase(fromString("the i̇odine")));
+    assertTrue(fromString("the i̇odine").containsInLowerCase(fromString("the İodine")));
+    assertTrue(fromString("The İodiNe").containsInLowerCase(fromString(" i̇oDin")));
+    assertTrue(fromString("İodiNe").containsInLowerCase(fromString("i̇oDin")));
+    assertFalse(fromString("İodiNe").containsInLowerCase(fromString(" i̇oDin")));
+  }
+
   @Test
   public void startsWith() {
     assertTrue(EMPTY_UTF8.startsWith(EMPTY_UTF8));
@@ -226,6 +263,40 @@ public class UTF8StringSuite {
     assertFalse(fromString("大千世界").startsWith(fromString("大千世界好")));
   }

+  @Test
+  public void startsWithInLowerCase() {
+    // Corner cases
+    assertTrue(EMPTY_UTF8.startsWithInLowerCase(EMPTY_UTF8));
+    assertTrue(fromString("a").startsWithInLowerCase(EMPTY_UTF8));
+    assertTrue(fromString("A").startsWithInLowerCase(fromString("a")));
+    assertTrue(fromString("a").startsWithInLowerCase(fromString("A")));
+    assertFalse(EMPTY_UTF8.startsWithInLowerCase(fromString("a")));
+    // ASCII
+    assertTrue(fromString("hello").startsWithInLowerCase(fromString("hell")));
+    assertFalse(fromString("hello").startsWithInLowerCase(fromString("ell")));
+    // Unicode
+    assertTrue(fromString("大千世界").startsWithInLowerCase(fromString("大千")));
+    assertFalse(fromString("大千世界").startsWithInLowerCase(fromString("世千")));
+    // ASCII lowercase
+    assertTrue(fromString("HeLlO").startsWithInLowerCase(fromString("hElL")));
+    assertFalse(fromString("HeLlO").startsWithInLowerCase(fromString("ElL")));
+    // Unicode lowercase
+    assertTrue(fromString("ЯбЛоКо").startsWithInLowerCase(fromString("яБлОк")));
+    assertFalse(fromString("ЯбЛоКо").startsWithInLowerCase(fromString("БлОк")));
+    // Characters with the same binary lowercase representation
+    assertTrue(fromString("Kelvin.").startsWithInLowerCase(fromString("Kelvin")));
+    assertTrue(fromString("Kelvin.").startsWithInLowerCase(fromString("Kelvin")));
+    assertTrue(fromString("KKelvin.").startsWithInLowerCase(fromString("KKelvin")));
+    assertTrue(fromString("2 Kelvin.").startsWithInLowerCase(fromString("2 Kelvin")));
+    assertTrue(fromString("2 Kelvin.").startsWithInLowerCase(fromString("2 Kelvin")));
+    assertFalse(fromString("KKelvin.").startsWithInLowerCase(fromString("KKelvin,")));
+    // Characters with longer binary lowercase representation
+    assertTrue(fromString("the İodine").startsWithInLowerCase(fromString("the i̇odine")));
+    assertTrue(fromString("the i̇odine").startsWithInLowerCase(fromString("the İodine")));
+    assertTrue(fromString("İodiNe").startsWithInLowerCase(fromString("i̇oDin")));
+    assertFalse(fromString("The İodiNe").startsWithInLowerCase(fromString("i̇oDin")));
+  }
+
   @Test
   public void endsWith() {
     assertTrue(EMPTY_UTF8.endsWith(EMPTY_UTF8));
@@ -237,6 +308,40 @@ public class UTF8StringSuite {
     assertFalse(fromString("数据砖头").endsWith(fromString("我的数据砖头")));
   }

+  @Test
+  public void endsWithInLowerCase() {
+    // Corner cases
+    assertTrue(EMPTY_UTF8.endsWithInLowerCase(EMPTY_UTF8));
+    assertTrue(fromString("a").endsWithInLowerCase(EMPTY_UTF8));
+    assertTrue(fromString("A").endsWithInLowerCase(fromString("a")));
+    assertTrue(fromString("a").endsWithInLowerCase(fromString("A")));
+    assertFalse(EMPTY_UTF8.endsWithInLowerCase(fromString("a")));
+    // ASCII
+    assertTrue(fromString("hello").endsWithInLowerCase(fromString("ello")));
+    assertFalse(fromString("hello").endsWithInLowerCase(fromString("hell")));
+    // Unicode
+    assertTrue(fromString("大千世界").endsWithInLowerCase(fromString("世界")));
+    assertFalse(fromString("大千世界").endsWithInLowerCase(fromString("大千")));
+    // ASCII lowercase
+    assertTrue(fromString("HeLlO").endsWithInLowerCase(fromString("ElLo")));
+    assertFalse(fromString("HeLlO").endsWithInLowerCase(fromString("hElL")));
+    // Unicode lowercase
+    assertTrue(fromString("ЯбЛоКо").endsWithInLowerCase(fromString("БлОкО")));
+    assertFalse(fromString("ЯбЛоКо").endsWithInLowerCase(fromString("яБлОк")));
+    // Characters with the same binary lowercase representation
+    assertTrue(fromString("The Kelvin").endsWithInLowerCase(fromString("Kelvin")));
+    assertTrue(fromString("The Kelvin").endsWithInLowerCase(fromString("Kelvin")));
+    assertTrue(fromString("The KKelvin").endsWithInLowerCase(fromString("KKelvin")));
+    assertTrue(fromString("The 2 Kelvin").endsWithInLowerCase(fromString("2 Kelvin")));
+    assertTrue(fromString("The 2 Kelvin").endsWithInLowerCase(fromString("2 Kelvin")));
+    assertFalse(fromString("The KKelvin").endsWithInLowerCase(fromString("KKelvin,")));
+    // Characters with longer binary lowercase representation
+    assertTrue(fromString("the İodine").endsWithInLowerCase(fromString("the i̇odine")));
+    assertTrue(fromString("the i̇odine").endsWithInLowerCase(fromString("the İodine")));
+    assertTrue(fromString("The İodiNe").endsWithInLowerCase(fromString("i̇oDine")));
+    assertFalse(fromString("The İodiNe").endsWithInLowerCase(fromString("i̇oDin")));
+  }
+
   @Test
   public void substring() {
     assertEquals(EMPTY_UTF8, fromString("hello").substring(0, 0));
diff --git a/sql/core/benchmarks/CollationBenchmark-jdk21-results.txt b/sql/core/benchmarks/CollationBenchmark-jdk21-results.txt
index 24605e051dbb..326e6b705313 100644
--- a/sql/core/benchmarks/CollationBenchmark-jdk21-results.txt
+++ b/sql/core/benchmarks/CollationBenchmark-jdk21-results.txt
@@ -1,54 +1,54 @@
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - equalsFunction:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                    6910           6912           3          0.0       69099.7       1.0X
-UNICODE                                              4367           4368           1          0.0       43669.6       1.6X
-UTF8_BINARY                                          4361           4364           4          0.0       43606.5       1.6X
-UNICODE_CI                                          46480          46526          66          0.0      464795.7       0.1X
+UTF8_BINARY_LCASE                                    6686           6690           5          0.0       66862.9       1.0X
+UNICODE                                              4302           4314          17          0.0       43021.3       1.6X
+UTF8_BINARY                                          4295           4299           6          0.0       42951.9       1.6X
+UNICODE_CI                                          43948          43951           4          0.0      439481.4       0.2X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - compareFunction:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                     6522           6526           4          0.0       65223.9       1.0X
-UNICODE                                              45792          45797           7          0.0      457922.3       0.1X
-UTF8_BINARY                                           7092           7112          29          0.0       70921.7       0.9X
-UNICODE_CI                                           47548          47564          22          0.0      475476.7       0.1X
+UTF8_BINARY_LCASE                                     7919           7920           1          0.0       79188.2       1.0X
+UNICODE                                              45764          45795          44          0.0      457641.3       0.2X
+UTF8_BINARY                                           7384           7388           5          0.0       73839.9       1.1X
+UNICODE_CI                                           48078          48099          29          0.0      480782.5       0.2X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - hashFunction:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                11716          11716           1          0.0      117157.9       1.0X
-UNICODE                                         180133         180137           5          0.0     1801332.1       0.1X
-UTF8_BINARY                                      10476          10477           1          0.0      104757.4       1.1X
-UNICODE_CI                                      148171         148190          28          0.0     1481705.6       0.1X
+UTF8_BINARY_LCASE                                11353          11354           2          0.0      113527.0       1.0X
+UNICODE                                         175533         175720         265          0.0     1755327.6       0.1X
+UTF8_BINARY                                       9995           9998           3          0.0       99953.2       1.1X
+UNICODE_CI                                      148475         148498          33          0.0     1484745.3       0.1X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - contains:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                49257          49280          32          0.0      492574.0       1.0X
-UNICODE                                          18253          18293          57          0.0      182530.8       2.7X
-UTF8_BINARY                                      20199          20247          68          0.0      201987.8       2.4X
-UNICODE_CI                                      882302         882576         387          0.0     8823023.9       0.1X
+UTF8_BINARY_LCASE                                28707          28715          12          0.0      287065.8       1.0X
+UNICODE                                          15578          15623          64          0.0      155783.5       1.8X
+UTF8_BINARY                                      17321          17410         126          0.0      173208.7       1.7X
+UNICODE_CI                                      907463         907667         289          0.0     9074632.3       0.0X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - startsWith:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                45015          45024          13          0.0      450153.7       1.0X
-UNICODE                                          17425          17455          43          0.0      174247.1       2.6X
-UTF8_BINARY                                      19237          19268          44          0.0      192371.4       2.3X
-UNICODE_CI                                      954993         955680         971          0.0     9549930.3       0.0X
+UTF8_BINARY_LCASE                                28001          28011          14          0.0      280014.2       1.0X
+UNICODE                                          15284          15288           5          0.0      152841.3       1.8X
+UTF8_BINARY                                      17035          17042          10          0.0      170348.1       1.6X
+UNICODE_CI                                      873571         874628        1494          0.0     8735712.7       0.0X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - endsWith:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                45919          45966          67          0.0      459187.0       1.0X
-UNICODE                                          17697          17713          23          0.0      176970.4       2.6X
-UTF8_BINARY                                      19448          19449           2          0.0      194479.6       2.4X
-UNICODE_CI                                      962916         963010         133          0.0     9629158.5       0.0X
+UTF8_BINARY_LCASE                                28260          28263           3          0.0      282603.9       1.0X
+UNICODE                                          15531          15538           9          0.0      155312.3       1.8X
+UTF8_BINARY                                      17239          17242           5          0.0      172387.3       1.6X
+UNICODE_CI                                      886437         888336        2685          0.0     8864372.1       0.0X

diff --git a/sql/core/benchmarks/CollationBenchmark-results.txt b/sql/core/benchmarks/CollationBenchmark-results.txt
index a92aadc52ee2..7b28c96f379c 100644
--- a/sql/core/benchmarks/CollationBenchmark-results.txt
+++ b/sql/core/benchmarks/CollationBenchmark-results.txt
@@ -1,54 +1,54 @@
-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - equalsFunction:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                    7692           7731          55          0.0       76919.2       1.0X
-UNICODE                                              4378           4379           0          0.0       43784.6       1.8X
-UTF8_BINARY                                          4382           4396          19          0.0       43821.6       1.8X
-UNICODE_CI                                          48344          48360          23          0.0      483436.5       0.2X
+UTF8_BINARY_LCASE                                    7726           7727           2          0.0       77260.4       1.0X
+UNICODE                                              4411           4412           2          0.0       44106.7       1.8X
+UTF8_BINARY                                          4409           4414           6          0.0       44090.3       1.8X
+UNICODE_CI                                          46811          46820          12          0.0      468113.1       0.2X

-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - compareFunction:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                     9819           9820           0          0.0       98194.9       1.0X
-UNICODE                                              49507          49518          17          0.0      495066.2       0.2X
-UTF8_BINARY                                           7354           7365          17          0.0       73536.3       1.3X
-UNICODE_CI                                           52149          52163          20          0.0      521489.4       0.2X
+UTF8_BINARY_LCASE                                     6290           6293           5          0.0       62895.3       1.0X
+UNICODE                                              48173          48210          53          0.0      481725.5       0.1X
+UTF8_BINARY                                           5252           5259           9          0.0       52524.2       1.2X
+UNICODE_CI                                           48093          48104          16          0.0      480931.0       0.1X

-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - hashFunction:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                18110          18127          24          0.0      181103.9       1.0X
-UNICODE                                         171375         171435          85          0.0     1713752.3       0.1X
-UTF8_BINARY                                      14012          14030          26          0.0      140116.7       1.3X
-UNICODE_CI                                      153847         153901          76          0.0     1538471.1       0.1X
+UTF8_BINARY_LCASE                                18369          18386          25          0.0      183685.5       1.0X
+UNICODE                                         177476         177572         135          0.0     1774764.4       0.1X
+UTF8_BINARY                                      14029          14039          13          0.0      140293.8       1.3X
+UNICODE_CI                                      150438         150527         126          0.0     1504375.8       0.1X

-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - contains:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                48528          48534           8          0.0      485281.3       1.0X
-UNICODE                                          17612          17628          23          0.0      176119.4       2.8X
-UTF8_BINARY                                      19664          19671          11          0.0      196636.4       2.5X
-UNICODE_CI                                      860919         862936        2853          0.0     8609190.8       0.1X
+UTF8_BINARY_LCASE                                33830          33842          17          0.0      338295.2       1.0X
+UNICODE                                          19038          19040           3          0.0      190376.9       1.8X
+UTF8_BINARY                                      21217          21222           7          0.0      212165.3       1.6X
+UNICODE_CI                                      888851         890073        1729          0.0     8888510.1       0.0X

-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - startsWith:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                49520          49524           7          0.0      495195.4       1.0X
-UNICODE                                          18346          18346           0          0.0      183457.7       2.7X
-UTF8_BINARY                                      20483          20488           7          0.0      204828.7       2.4X
-UNICODE_CI                                      928756         930065        1851          0.0     9287564.4       0.1X
+UTF8_BINARY_LCASE                                31240          31248          11          0.0      312403.3       1.0X
+UNICODE                                          17197          17208          16          0.0      171969.9       1.8X
+UTF8_BINARY                                      19262          19263           1          0.0      192620.0       1.6X
+UNICODE_CI                                      879963         881716        2479          0.0     8799628.7       0.0X

-OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - endsWith:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                49501          49504           5          0.0      495006.9       1.0X
-UNICODE                                          18052          18095          61          0.0      180523.7       2.7X
-UTF8_BINARY                                      20187          20197          15          0.0      201867.1       2.5X
-UNICODE_CI                                      934011         938842        6833          0.0     9340109.8       0.1X
+UTF8_BINARY_LCASE                                31490          31505          21          0.0      314902.0       1.0X
+UNICODE                                          17129          17157          40          0.0      171292.3       1.8X
+UTF8_BINARY                                      19336          19340           6          0.0      193362.0       1.6X
+UNICODE_CI                                      908514         911193        3788          0.0     9085140.8       0.0X

diff --git a/sql/core/benchmarks/CollationNonASCIIBenchmark-jdk21-results.txt b/sql/core/benchmarks/CollationNonASCIIBenchmark-jdk21-results.txt
index 0a50baab36ea..9573a37c3a9c 100644
--- a/sql/core/benchmarks/CollationNonASCIIBenchmark-jdk21-results.txt
+++ b/sql/core/benchmarks/CollationNonASCIIBenchmark-jdk21-results.txt
@@ -1,54 +1,54 @@
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - equalsFunction:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                   18244          18258          20          0.0      456096.4       1.0X
-UNICODE                                               498            498           0          0.1       12440.3      36.7X
-UTF8_BINARY                                           499            500           1          0.1       12467.7      36.6X
-UNICODE_CI                                          13429          13443          19          0.0      335725.4       1.4X
+UTF8_BINARY_LCASE                                   18521          18553          44          0.0      463036.6       1.0X
+UNICODE                                               459            461           2          0.1       11475.9      40.3X
+UTF8_BINARY                                           458            459           2          0.1       11442.0      40.5X
+UNICODE_CI                                          13493          13528          49          0.0      337322.0       1.4X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - compareFunction:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ---------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                    18377          18399          31          0.0      459430.5       1.0X
-UNICODE                                              14238          14240           3          0.0      355957.4       1.3X
-UTF8_BINARY                                            975            976           1          0.0       24371.3      18.9X
-UNICODE_CI                                           13819          13826          10          0.0      345482.6       1.3X
+UTF8_BINARY_LCASE                                    18163          18171          11          0.0      454086.2       1.0X
+UNICODE                                              14502          14505           4          0.0      362541.9       1.3X
+UTF8_BINARY                                            970            972           2          0.0       24246.1      18.7X
+UNICODE_CI                                           14209          14216           9          0.0      355231.9       1.3X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - hashFunction:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                 9183           9230          67          0.0      229564.0       1.0X
-UNICODE                                          38937          38952          22          0.0      973421.3       0.2X
-UTF8_BINARY                                       1376           1376           0          0.0       34397.5       6.7X
-UNICODE_CI                                       32881          32882           1          0.0      822027.4       0.3X
+UTF8_BINARY_LCASE                                10171          10173           2          0.0      254276.4       1.0X
+UNICODE                                          39033          39056          32          0.0      975819.3       0.3X
+UTF8_BINARY                                       1389           1389           1          0.0       34719.6       7.3X
+UNICODE_CI                                       34546          34552           9          0.0      863641.0       0.3X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 collation unit benchmarks - contains:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-UTF8_BINARY_LCASE                                22429          22438          13          0.0      560735.1       1.0X
-UNICODE                                           2900           2901           2          0.0       72503.2       7.7X
-UTF8_BINARY                                       3190           3198          11          0.0       79740.5       7.0X
-UNICODE_CI                                      166847         167278         609          0.0     4171180.3       0.1X
+UTF8_BINARY_LCASE 23928 23938 15 0.0 598196.6 1.0X +UNICODE 2711 2712 1 0.0 67778.9 8.8X +UTF8_BINARY 2991 2994 5 0.0 74774.5 8.0X +UNICODE_CI 168620 168643 33 0.0 4215495.0 0.1X -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor collation unit benchmarks - startsWith: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -UTF8_BINARY_LCASE 22865 22875 13 0.0 571636.3 1.0X -UNICODE 3137 3137 0 0.0 78422.3 7.3X -UTF8_BINARY 3448 3450 3 0.0 86188.5 6.6X -UNICODE_CI 190473 190894 595 0.0 4761831.2 0.1X +UTF8_BINARY_LCASE 25400 25416 23 0.0 634993.8 1.0X +UNICODE 3079 3079 1 0.0 76968.9 8.3X +UTF8_BINARY 3376 3380 5 0.0 84401.9 7.5X +UNICODE_CI 168738 168850 159 0.0 4218448.7 0.2X -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor collation unit benchmarks - endsWith: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -UTF8_BINARY_LCASE 23693 23695 3 0.0 592333.2 1.0X -UNICODE 3170 3172 3 0.0 79243.5 7.5X -UTF8_BINARY 3472 3473 2 0.0 86788.8 6.8X -UNICODE_CI 63331 63603 384 0.0 1583274.3 0.4X +UTF8_BINARY_LCASE 25067 25113 65 0.0 626683.3 1.0X +UNICODE 3070 3082 16 0.0 76758.7 8.2X +UTF8_BINARY 3359 3366 10 0.0 83983.5 7.5X +UNICODE_CI 180852 180985 189 0.0 4521288.1 0.1X diff --git a/sql/core/benchmarks/CollationNonASCIIBenchmark-results.txt b/sql/core/benchmarks/CollationNonASCIIBenchmark-results.txt index bef5f9d7211f..6df1f69174d6 100644 --- a/sql/core/benchmarks/CollationNonASCIIBenchmark-results.txt +++ b/sql/core/benchmarks/CollationNonASCIIBenchmark-results.txt @@ -1,54 
+1,54 @@ -OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure +OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor collation unit benchmarks - equalsFunction: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative -------------------------------------------------------------------------------------------------------------------------- -UTF8_BINARY_LCASE 17881 17885 6 0.0 447017.7 1.0X -UNICODE 493 495 2 0.1 12328.9 36.3X -UTF8_BINARY 493 494 1 0.1 12331.4 36.3X -UNICODE_CI 13731 13737 8 0.0 343284.6 1.3X +UTF8_BINARY_LCASE 19237 19255 26 0.0 480925.9 1.0X +UNICODE 311 319 17 0.1 7764.9 61.9X +UTF8_BINARY 313 314 1 0.1 7817.9 61.5X +UNICODE_CI 15481 15517 52 0.0 387018.9 1.2X -OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure +OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor collation unit benchmarks - compareFunction: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative --------------------------------------------------------------------------------------------------------------------------- -UTF8_BINARY_LCASE 18041 18047 8 0.0 451030.2 1.0X -UNICODE 14023 14047 34 0.0 350573.9 1.3X -UTF8_BINARY 1387 1397 14 0.0 34680.4 13.0X -UNICODE_CI 14232 14242 14 0.0 355808.4 1.3X +UTF8_BINARY_LCASE 17886 17892 9 0.0 447142.3 1.0X +UNICODE 13888 13908 28 0.0 347192.7 1.3X +UTF8_BINARY 1384 1387 5 0.0 34589.8 12.9X +UNICODE_CI 14209 14221 17 0.0 355233.9 1.3X -OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure +OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor collation unit benchmarks - hashFunction: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -UTF8_BINARY_LCASE 10494 10499 6 0.0 262360.0 1.0X -UNICODE 40410 40422 17 0.0 
1010261.8 0.3X -UTF8_BINARY 2035 2035 1 0.0 50877.8 5.2X -UNICODE_CI 31470 31493 32 0.0 786752.4 0.3X +UTF8_BINARY_LCASE 10311 10316 7 0.0 257774.0 1.0X +UNICODE 39377 39379 4 0.0 984422.3 0.3X +UTF8_BINARY 2030 2032 3 0.0 50751.8 5.1X +UNICODE_CI 31011 31035 34 0.0 775281.7 0.3X -OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure +OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor collation unit benchmarks - contains: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -UTF8_BINARY_LCASE 22342 22352 13 0.0 558560.4 1.0X -UNICODE 3073 3074 0 0.0 76829.5 7.3X -UTF8_BINARY 3486 3487 2 0.0 87147.6 6.4X -UNICODE_CI 162838 164378 2177 0.0 4070960.3 0.1X +UTF8_BINARY_LCASE 21933 21953 28 0.0 548332.8 1.0X +UNICODE 2951 2954 4 0.0 73782.5 7.4X +UTF8_BINARY 3273 3279 8 0.0 81830.0 6.7X +UNICODE_CI 158862 159283 596 0.0 3971544.5 0.1X -OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure +OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor collation unit benchmarks - startsWith: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -UTF8_BINARY_LCASE 21882 21890 11 0.0 547051.8 1.0X -UNICODE 2672 2676 6 0.0 66799.0 8.2X -UTF8_BINARY 3069 3071 2 0.0 76732.2 7.1X -UNICODE_CI 187853 188724 1232 0.0 4696336.1 0.1X +UTF8_BINARY_LCASE 22054 22093 55 0.0 551348.0 1.0X +UNICODE 2745 2779 48 0.0 68623.1 8.0X +UTF8_BINARY 3068 3069 2 0.0 76703.7 7.2X +UNICODE_CI 157491 157671 254 0.0 3937270.4 0.1X -OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1017-azure +OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor collation unit benchmarks - endsWith: Best 
Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -UTF8_BINARY_LCASE 21818 21866 68 0.0 545439.9 1.0X -UNICODE 2637 2643 9 0.0 65913.3 8.3X -UTF8_BINARY 3037 3039 2 0.0 75934.6 7.2X -UNICODE_CI 61372 61510 195 0.0 1534307.9 0.4X +UTF8_BINARY_LCASE 21932 21970 55 0.0 548288.1 1.0X +UNICODE 2743 2758 22 0.0 68566.3 8.0X +UTF8_BINARY 3057 3071 19 0.0 76428.2 7.2X +UNICODE_CI 172037 172321 403 0.0 4300920.2 0.1X --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org