[spark] branch branch-2.4 updated: [SPARK-29758][SQL][2.4] Fix truncation of requested string fields in `json_tuple`
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new a936522  [SPARK-29758][SQL][2.4] Fix truncation of requested string fields in `json_tuple`
a936522 is described below

commit a9365221133caadffce1aae1ace799a588a3
Author: Maxim Gekk
AuthorDate: Wed Nov 20 15:32:28 2019 +0800

    [SPARK-29758][SQL][2.4] Fix truncation of requested string fields in `json_tuple`

    ### What changes were proposed in this pull request?
    In the PR, I propose to remove an optimization in `json_tuple` that causes truncation of results for large requested string fields.

    ### Why are the changes needed?
    Spark 2.4 uses Jackson Core 2.6.7, which has a bug in copying strings. This bug may lead to truncation of results in some cases. The bug has already been fixed by the commit https://github.com/FasterXML/jackson-core/commit/554f8db0f940b2a53f974852a2af194739d65200, which has been part of Jackson Core since version 2.7.7. Upgrading Jackson Core to 2.7.7 or a later version is risky. That's why this PR proposes to avoid using the buggy methods of Jackson Core 2.6.7.

    ### Does this PR introduce any user-facing change?
    No

    ### How was this patch tested?
    By a new test added to `JsonFunctionsSuite`.

    Closes #26563 from MaxGekk/fix-truncation-by-json_tuple-2.4.
Authored-by: Maxim Gekk
Signed-off-by: Wenchen Fan
---
 .../spark/sql/catalyst/expressions/jsonExpressions.scala     |  5 -----
 .../test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala | 10 ++++++++++
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
index 6650e45..4cd1a091 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
@@ -472,11 +472,6 @@ case class JsonTuple(children: Seq[Expression])
       parser.getCurrentToken match {
         // if the user requests a string field it needs to be returned without enclosing
         // quotes which is accomplished via JsonGenerator.writeRaw instead of JsonGenerator.write
-        case JsonToken.VALUE_STRING if parser.hasTextCharacters =>
-          // slight optimization to avoid allocating a String instance, though the characters
-          // still have to be decoded... Jackson doesn't have a way to access the raw bytes
-          generator.writeRaw(parser.getTextCharacters, parser.getTextOffset, parser.getTextLength)
-
         case JsonToken.VALUE_STRING =>
           // the normal String case, pass it through to the output without enclosing quotes
           generator.writeRaw(parser.getText)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala
index b1f7446..18335ef 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala
@@ -535,4 +535,14 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
       to_json(struct($"t"), Map("timestampFormat" -> "-MM-dd HH:mm:ss.SS")))
     checkAnswer(df, Row(s"""{"t":"$s"}"""))
   }
+
+  test("json_tuple - do not truncate results") {
+    val len = 2800
+    val str = Array.tabulate(len)(_ => "a").mkString
+    val json_tuple_result = Seq(s"""{"test":"$str"}""").toDF("json")
+      .withColumn("result", json_tuple('json, "test"))
+      .select('result)
+      .as[String].head.length
+    assert(json_tuple_result === len)
+  }
 }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
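The removed branch handed Jackson's internal char buffer (via offset/length) straight to `writeRaw`; when the buffered segment did not cover the whole value, the output was cut short. The hazard can be sketched with a toy tokenizer (NOT Jackson's actual internals; `ChunkedTokenizer`, `rawChunk`, and `fullText` are hypothetical names for illustration):

```java
// Toy model: a tokenizer that buffers a long string value in fixed-size
// chunks. Exposing only the currently buffered chunk truncates the value;
// materializing the full String first (as the fix does via parser.getText)
// is always safe.
final class ChunkedTokenizer {
    private final String value;
    private final int chunkSize;

    ChunkedTokenizer(String value, int chunkSize) {
        this.value = value;
        this.chunkSize = chunkSize;
    }

    // Mimics the buggy getTextCharacters/getTextLength path: only the last
    // buffered chunk of the value is visible to the caller.
    String rawChunk() {
        int start = (value.length() / chunkSize) * chunkSize;
        return value.substring(Math.min(start, value.length()));
    }

    // Mimics getText(): assembles the complete value regardless of chunking.
    String fullText() {
        return value;
    }
}

public class TruncationDemo {
    public static void main(String[] args) {
        String payload = "a".repeat(2800);          // same length as the new test
        ChunkedTokenizer t = new ChunkedTokenizer(payload, 2000);
        System.out.println(t.rawChunk().length());  // 800 -- truncated
        System.out.println(t.fullText().length());  // 2800 -- complete
    }
}
```

This mirrors why the new test uses a 2800-character field: short values fit in one buffered chunk and never exposed the bug.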
[spark] branch master updated (9e58b10 -> 5a70af7)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 9e58b10  [SPARK-29945][SQL] do not handle negative sign specially in the parser
 add 5a70af7  [SPARK-29029][SQL] Use AttributeMap in PhysicalOperation.collectProjectsAndFilters

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/planning/patterns.scala | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)
[spark] branch master updated (40b8a08 -> 9e58b10)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 40b8a08  [SPARK-29963][SQL][TESTS] Check formatting timestamps up to microsecond precision by JSON/CSV datasource
 add 9e58b10  [SPARK-29945][SQL] do not handle negative sign specially in the parser

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |  4 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala     | 31 +++
 .../catalyst/parser/ExpressionParserSuite.scala    |  5 +-
 .../test/resources/sql-tests/inputs/literals.sql   |  5 +-
 .../sql-tests/results/ansi/interval.sql.out        | 12 ++--
 .../sql-tests/results/ansi/literals.sql.out        | 65 +++---
 .../results/interval-display-iso_8601.sql.out      |  2 +-
 .../results/interval-display-sql_standard.sql.out  |  2 +-
 .../sql-tests/results/interval-display.sql.out     |  2 +-
 .../resources/sql-tests/results/interval.sql.out   | 12 ++--
 .../resources/sql-tests/results/literals.sql.out   | 65 +++---
 .../sql-tests/results/postgreSQL/interval.sql.out  |  2 +-
 12 files changed, 98 insertions(+), 109 deletions(-)
[spark] branch master updated (e753aa3 -> 40b8a08)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from e753aa3  [SPARK-29964][BUILD] lintr github workflows failed due to buggy GnuPG
 add 40b8a08  [SPARK-29963][SQL][TESTS] Check formatting timestamps up to microsecond precision by JSON/CSV datasource

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/util/TimestampFormatterSuite.scala  | 40 ++
 .../org/apache/spark/sql/JsonFunctionsSuite.scala |  7
 .../sql/execution/datasources/csv/CSVSuite.scala  | 15
 3 files changed, 62 insertions(+)
[spark] branch master updated (e804ed5 -> e753aa3)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from e804ed5  [SPARK-29691][ML][PYTHON] ensure Param objects are valid in fit, transform
 add e753aa3  [SPARK-29964][BUILD] lintr github workflows failed due to buggy GnuPG

No new revisions were added by this update.

Summary of changes:
 .github/workflows/master.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch branch-2.4 updated: [SPARK-29964][BUILD] lintr github workflows failed due to buggy GnuPG
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 1a26c8e  [SPARK-29964][BUILD] lintr github workflows failed due to buggy GnuPG
1a26c8e is described below

commit 1a26c8edf15d2647c1462fa9971eae746bbe0b17
Author: Liang-Chi Hsieh
AuthorDate: Tue Nov 19 15:56:50 2019 -0800

    [SPARK-29964][BUILD] lintr github workflows failed due to buggy GnuPG

    ### What changes were proposed in this pull request?
    The Linter (R) github workflow fails sometimes, e.g. https://github.com/apache/spark/pull/26509/checks?check_run_id=310718016

    Failure message:
    ```
    Executing: /tmp/apt-key-gpghome.8r74rQNEjj/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
    gpg: connecting dirmngr at '/tmp/apt-key-gpghome.8r74rQNEjj/S.dirmngr' failed: IPC connect call failed
    gpg: keyserver receive failed: No dirmngr
    ##[error]Process completed with exit code 2.
    ```

    It is due to a buggy GnuPG. Context:
    https://github.com/sbt/website/pull/825
    https://github.com/sbt/sbt/issues/4261
    https://github.com/microsoft/WSL/issues/3286

    ### Why are the changes needed?
    Make the lint-r github workflow work.

    ### Does this PR introduce any user-facing change?
    No

    ### How was this patch tested?
    Pass github workflows.

    Closes #26602 from viirya/SPARK-29964.
Authored-by: Liang-Chi Hsieh
Signed-off-by: Dongjoon Hyun
(cherry picked from commit e753aa30e659706c3fa3414bf38566a79e0af8d6)
Signed-off-by: Dongjoon Hyun
---
 .github/workflows/branch-2.4.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/branch-2.4.yml b/.github/workflows/branch-2.4.yml
index b466995..2aeffc5 100644
--- a/.github/workflows/branch-2.4.yml
+++ b/.github/workflows/branch-2.4.yml
@@ -84,7 +84,7 @@ jobs:
     - name: install R
       run: |
         echo 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/' | sudo tee -a /etc/apt/sources.list
-        sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
+        curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0xE298A3A825C0D65DFD57CBB651716619E084DAB9" | sudo apt-key add
         sudo apt-get update
         sudo apt-get install -y r-base r-base-dev libcurl4-openssl-dev
     - name: install R packages
[spark] branch master updated (3d2a6f4 -> e804ed5)
This is an automated email from the ASF dual-hosted git repository.

cutlerb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3d2a6f4  [SPARK-29906][SQL] AQE should not introduce extra shuffle for outermost limit
 add e804ed5  [SPARK-29691][ML][PYTHON] ensure Param objects are valid in fit, transform

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/param/__init__.py    | 12 ++--
 python/pyspark/ml/tests/test_param.py  |  4
 python/pyspark/ml/tests/test_tuning.py |  9 +
 python/pyspark/ml/tuning.py            |  8 +++-
 4 files changed, 30 insertions(+), 3 deletions(-)
[spark] branch master updated (6fb8b86 -> 3d2a6f4)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 6fb8b86  [SPARK-29913][SQL] Improve Exception in postgreCastToBoolean
 add 3d2a6f4  [SPARK-29906][SQL] AQE should not introduce extra shuffle for outermost limit

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 23 ++
 .../adaptive/AdaptiveQueryExecSuite.scala          | 21
 2 files changed, 36 insertions(+), 8 deletions(-)
[spark] branch master updated (79ed4ae -> 6fb8b86)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 79ed4ae  [SPARK-29926][SQL] Fix weird interval string whose value is only a dangling decimal point
 add 6fb8b86  [SPARK-29913][SQL] Improve Exception in postgreCastToBoolean

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala | 1 +
 1 file changed, 1 insertion(+)
[spark] branch master updated (a8d9883 -> 79ed4ae)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from a8d9883  [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi
 add 79ed4ae  [SPARK-29926][SQL] Fix weird interval string whose value is only a dangling decimal point

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/util/IntervalUtils.scala      | 10 +++---
 .../apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala     |  2 +-
 2 files changed, 8 insertions(+), 4 deletions(-)
[spark] branch master updated (ffc9753 -> a8d9883)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from ffc9753  [SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long
 add a8d9883  [SPARK-29893] improve the local shuffle reader performance by changing the reading task number from 1 to multi

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/MapOutputTracker.scala  |   3 +-
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  13 +--
 .../execution/adaptive/LocalShuffledRowRDD.scala   |  52 +++---
 .../adaptive/OptimizeLocalShuffleReader.scala      | 114 -
 .../execution/exchange/ShuffleExchangeExec.scala   |   5 +-
 .../adaptive/AdaptiveQueryExecSuite.scala          |  61 +++
 6 files changed, 170 insertions(+), 78 deletions(-)
[spark] branch branch-2.4 updated: [SPARK-29949][SQL][2.4] Fix formatting of timestamps by JSON/CSV datasources
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 47cb1f3  [SPARK-29949][SQL][2.4] Fix formatting of timestamps by JSON/CSV datasources
47cb1f3 is described below

commit 47cb1f359af62383e24198dbbaa0b4503348cd04
Author: Maxim Gekk
AuthorDate: Tue Nov 19 17:10:16 2019 +0800

    [SPARK-29949][SQL][2.4] Fix formatting of timestamps by JSON/CSV datasources

    ### What changes were proposed in this pull request?
    In the PR, I propose to use the `format()` method of `FastDateFormat` which accepts an instance of the `Calendar` type. This allows adjusting the `MILLISECOND` field of the calendar directly before formatting. I added a new method `format()` to `DateTimeUtils.TimestampParser`. This method splits the input timestamp into a part truncated to seconds and the fractional seconds part. The calendar is initialized by the first part in the normal way, and the last one is converted to a form appropria [...]

    I refactored `MicrosCalendar` by passing the number of digits from the fraction pattern as a parameter to the default constructor, because it is used by the existing `getMicros()` and the new `setMicros()`. `setMicros()` is used to set the seconds fraction in the calendar's `MILLISECOND` field directly before formatting.

    This PR supports various patterns for seconds fractions from `S` up to `SS`. If the pattern has more than 6 `S`, the first 6 digits reflect the milliseconds and microseconds of the input timestamp, and the rest of the digits are set to `0`.

    ### Why are the changes needed?
    This fixes a bug of incorrectly formatting timestamps in microsecond precision.
    For example:
    ```scala
    Seq(java.sql.Timestamp.valueOf("2019-11-18 11:56:00.123456")).toDF("t")
      .select(to_json(struct($"t"), Map("timestampFormat" -> "-MM-dd HH:mm:ss.SS")).as("json"))
      .show(false)
    +----------------------------------+
    |json                              |
    +----------------------------------+
    |{"t":"2019-11-18 11:56:00.000123"}|
    +----------------------------------+
    ```

    ### Does this PR introduce any user-facing change?
    Yes. The example above outputs:
    ```scala
    +----------------------------------+
    |json                              |
    +----------------------------------+
    |{"t":"2019-11-18 11:56:00.123456"}|
    +----------------------------------+
    ```

    ### How was this patch tested?
    - By new tests for formatting by different patterns from `S` to `SS` in `DateTimeUtilsSuite`
    - A test for `to_json()` in `JsonFunctionsSuite`
    - A roundtrip test for writing and reading back a timestamp in a CSV file

    Closes #26582 from MaxGekk/micros-format-2.4.

Authored-by: Maxim Gekk
Signed-off-by: Wenchen Fan
---
 .../spark/sql/catalyst/json/JacksonGenerator.scala |  6 ++--
 .../spark/sql/catalyst/util/DateTimeUtils.scala    | 35 ++-
 .../sql/catalyst/util/DateTimeUtilsSuite.scala     | 40 ++
 .../datasources/csv/UnivocityGenerator.scala       |  6 ++--
 .../org/apache/spark/sql/JsonFunctionsSuite.scala  |  7
 .../sql/execution/datasources/csv/CSVSuite.scala   | 15
 6 files changed, 97 insertions(+), 12 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala
index 9b86d86..a379f86 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala
@@ -24,6 +24,7 @@ import com.fasterxml.jackson.core._
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.SpecializedGetters
 import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData}
+import org.apache.spark.sql.catalyst.util.DateTimeUtils.TimestampParser
 import org.apache.spark.sql.types._

 /**
@@ -74,6 +75,8 @@ private[sql] class JacksonGenerator(
   private val lineSeparator: String = options.lineSeparatorInWrite

+  @transient private lazy val timestampParser = new TimestampParser(options.timestampFormat)
+
   private def makeWriter(dataType: DataType): ValueWriter = dataType match {
     case NullType =>
       (row: SpecializedGetters, ordinal: Int) =>
@@ -113,8 +116,7 @@ private[sql] class JacksonGenerator(
     case TimestampType =>
       (row: SpecializedGetters, ordinal: Int) =>
-        val timestampString =
-          options.timestampFormat.format(DateTimeUtils.toJavaTimestamp(row.getLong(ordinal)))
+        val timestampString = timestampParser.format(row.getLong(ordinal))
         gen.writeString(timestampString)
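The core idea of the fix — format the whole-second part with a pattern-driven formatter and handle the microsecond fraction separately, since `Calendar`-backed formatters carry only a `MILLISECOND` field — can be sketched outside Spark. This is a simplified stand-in using `SimpleDateFormat` rather than the actual `FastDateFormat`/`MicrosCalendar` code, and `formatMicros` is a hypothetical helper name:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class MicrosFormatDemo {
    // Formats microseconds-since-epoch by splitting off the sub-second part,
    // so the fraction never has to squeeze through a millisecond-only field.
    static String formatMicros(long micros, String secondsPattern) {
        long secs = Math.floorDiv(micros, 1_000_000L);  // truncate to whole seconds
        long frac = Math.floorMod(micros, 1_000_000L);  // 0..999999 microseconds
        SimpleDateFormat fmt = new SimpleDateFormat(secondsPattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(secs * 1000L)) + String.format(".%06d", frac);
    }

    public static void main(String[] args) {
        // 123456 microseconds after the epoch
        System.out.println(formatMicros(123_456L, "yyyy-MM-dd HH:mm:ss"));
        // -> 1970-01-01 00:00:00.123456
    }
}
```

Note how `floorDiv`/`floorMod` keep the fraction non-negative for pre-epoch timestamps, which a naive `/` and `%` split would get wrong.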
[spark] branch branch-2.4 updated: [SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new dc2abe51  [SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long
dc2abe51 is described below

commit dc2abe51ca2d3d702d6b6457301c3ca9c7244212
Author: wangguangxin.cn
AuthorDate: Tue Nov 19 16:10:22 2019 +0800

    [SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long

    ### What changes were proposed in this pull request?
    This PR tries to make the results of *compared 8 bytes at a time* and *compared byte by byte* in `RecordBinaryComparator` consistent, by reversing the long's bytes if the platform is little-endian and using `Long.compareUnsigned`.

    ### Why are the changes needed?
    If the architecture supports unaligned access or the offset is 8-byte aligned, `RecordBinaryComparator` compares 8 bytes at a time by reading 8 bytes as a long. The related code is:
    ```
    if (Platform.unaligned() || (((leftOff + i) % 8 == 0) && ((rightOff + i) % 8 == 0))) {
      while (i <= leftLen - 8) {
        final long v1 = Platform.getLong(leftObj, leftOff + i);
        final long v2 = Platform.getLong(rightObj, rightOff + i);
        if (v1 != v2) {
          return v1 > v2 ? 1 : -1;
        }
        i += 8;
      }
    }
    ```
    Otherwise, it compares byte by byte. The related code is:
    ```
    while (i < leftLen) {
      final int v1 = Platform.getByte(leftObj, leftOff + i) & 0xff;
      final int v2 = Platform.getByte(rightObj, rightOff + i) & 0xff;
      if (v1 != v2) {
        return v1 > v2 ? 1 : -1;
      }
      i += 1;
    }
    ```
    However, on a little-endian machine, the result of *compared by a long value* and *compared byte by byte* may differ. For two identical records, the offsets may vary between the first run and the second run, so the records may be compared by long comparison in one run and byte-by-byte comparison in the other, with possibly different results.

    ### Does this PR introduce any user-facing change?
    No

    ### How was this patch tested?
    New test cases added in `RecordBinaryComparatorSuite`.

    Closes #26548 from WangGuangxin/binary_comparator.

Authored-by: wangguangxin.cn
Signed-off-by: Wenchen Fan
(cherry picked from commit ffc97530371433bc0221e06d8c1d11af8d92bd94)
Signed-off-by: Wenchen Fan
---
 .../sql/execution/RecordBinaryComparator.java | 30 +-
 .../sort/RecordBinaryComparatorSuite.java     | 47 +-
 2 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/execution/RecordBinaryComparator.java b/sql/catalyst/src/main/java/org/apache/spark/sql/execution/RecordBinaryComparator.java
index 40c2cc8..1f24340 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/execution/RecordBinaryComparator.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/execution/RecordBinaryComparator.java
@@ -20,8 +20,13 @@ package org.apache.spark.sql.execution;
 import org.apache.spark.unsafe.Platform;
 import org.apache.spark.util.collection.unsafe.sort.RecordComparator;

+import java.nio.ByteOrder;
+
 public final class RecordBinaryComparator extends RecordComparator {

+  private static final boolean LITTLE_ENDIAN =
+      ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN);
+
   @Override
   public int compare(
       Object leftObj, long leftOff, int leftLen, Object rightObj, long rightOff, int rightLen) {
@@ -38,10 +43,10 @@ public final class RecordBinaryComparator extends RecordComparator {
     // check if stars align and we can get both offsets to be aligned
     if ((leftOff % 8) == (rightOff % 8)) {
       while ((leftOff + i) % 8 != 0 && i < leftLen) {
-        final int v1 = Platform.getByte(leftObj, leftOff + i) & 0xff;
-        final int v2 = Platform.getByte(rightObj, rightOff + i) & 0xff;
+        final int v1 = Platform.getByte(leftObj, leftOff + i);
+        final int v2 = Platform.getByte(rightObj, rightOff + i);
         if (v1 != v2) {
-          return v1 > v2 ? 1 : -1;
+          return (v1 & 0xff) > (v2 & 0xff) ? 1 : -1;
         }
         i += 1;
       }
@@ -49,10 +54,17 @@ public final class RecordBinaryComparator extends RecordComparator {
     // for architectures that support unaligned accesses, chew it up 8 bytes at a time
     if (Platform.unaligned() || (((leftOff + i) % 8 == 0) && ((rightOff + i) % 8 == 0))) {
       while (i <= leftLen - 8) {
-        final long v1 = Platform.getLong(leftObj, leftOff + i);
-        final long v2 = Platform.getLong(rightObj, rightOff + i);
+        long v1 = Platform.getLong(leftObj, leftOff + i);
+        long v2 = Platform.getLong(rightObj, rightOff + i)
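The inconsistency and the fix can be demonstrated outside Spark with plain `ByteBuffer` in place of `Platform` (the helper names below are illustrative): reading 8 bytes as a native-order long on a little-endian machine orders records differently from unsigned byte-by-byte comparison, while `Long.reverseBytes` plus `Long.compareUnsigned` restores agreement.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianCompareDemo {
    // Lexicographic comparison of 8 bytes, treating each byte as unsigned --
    // the byte-by-byte path in RecordBinaryComparator.
    static int compareBytewise(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
            int v1 = a[i] & 0xff, v2 = b[i] & 0xff;
            if (v1 != v2) return v1 > v2 ? 1 : -1;
        }
        return 0;
    }

    // The fixed 8-bytes-at-a-time path: normalize a native-order long to
    // big-endian byte significance, then compare unsigned.
    static int compareAsLong(byte[] a, byte[] b) {
        long v1 = ByteBuffer.wrap(a).order(ByteOrder.nativeOrder()).getLong();
        long v2 = ByteBuffer.wrap(b).order(ByteOrder.nativeOrder()).getLong();
        if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
            v1 = Long.reverseBytes(v1);
            v2 = Long.reverseBytes(v2);
        }
        return Long.compareUnsigned(v1, v2);
    }

    public static void main(String[] args) {
        byte[] a = {1, 0, 0, 0, 0, 0, 0, 0};            // differs in the first byte
        byte[] b = {0, 0, 0, 0, 0, 0, 0, 2};
        // Without reverseBytes, a little-endian getLong reads a as 1 and b as
        // 2L << 56, flipping the order; with the fix both paths agree.
        System.out.println(compareBytewise(a, b));      // 1
        System.out.println(compareAsLong(a, b));        // 1

        byte[] c = {(byte) 0x80, 0, 0, 0, 0, 0, 0, 0};  // high bit set
        byte[] d = {0x01, 0, 0, 0, 0, 0, 0, 0};
        // A signed long comparison would rank c below d; compareUnsigned
        // agrees with the unsigned byte-by-byte result.
        System.out.println(compareBytewise(c, d));      // 1
        System.out.println(compareAsLong(c, d));        // 1
    }
}
```

The second pair shows why `compareUnsigned` is needed even after the byte reversal: bytes with the high bit set must still sort above smaller values, which a signed `>` on the assembled long would get wrong.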
[spark] branch master updated (16134d6 -> ffc9753)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 16134d6  [SPARK-29948][SQL] make the default alias consistent between date, timestamp and interval
 add ffc9753  [SPARK-29918][SQL] RecordBinaryComparator should check endianness when compared by long

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/RecordBinaryComparator.java | 30 +-
 .../sort/RecordBinaryComparatorSuite.java     | 47 +-
 2 files changed, 67 insertions(+), 10 deletions(-)