[spark] branch branch-3.1 updated (9c95d3f -> aece7e7)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9c95d3f  [SPARK-36441][INFRA] Fix GA failure related to downloading lintr dependencies
     add aece7e7  [SPARK-36393][BUILD][3.1] Try to raise memory for GHA

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 2 +-
 build/sbt-launch-lib.bash            | 6 --
 dev/run-tests.py                     | 7 +--
 pom.xml                              | 6 +++---
 project/SparkBuild.scala             | 4 ++--
 5 files changed, 11 insertions(+), 14 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (6e72951 -> 4624e59)
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6e72951  [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle
     add 4624e59  [SPARK-36359][SQL] Coalesce drop all expressions after the first non nullable expression

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/nullExpressions.scala |  4 +++-
 .../spark/sql/catalyst/optimizer/expressions.scala | 11 ++++++++---
 .../spark/sql/catalyst/trees/TreePatterns.scala    |  1 +
 .../BinaryComparisonSimplificationSuite.scala      | 21 +++++++++++++++++++++
 .../scala/org/apache/spark/sql/ExplainSuite.scala  |  4 ++--
 5 files changed, 35 insertions(+), 6 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 552a332  [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle

552a332 is described below

commit 552a332dd464a47531574c4aba97060410d889cb
Author: Min Shen
AuthorDate: Fri Aug 6 09:47:42 2021 -0500

    [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle

    ### What changes were proposed in this pull request?
    On the client side, we currently randomize the order of push requests before processing each request. In addition, we can further randomize the order of blocks within each push request before pushing them. In our benchmark, this resulted in a 60%-70% reduction of blocks that fail to be merged due to block collision (the existing block merge ratio is already pretty good in general, and this further improves it).

    ### Why are the changes needed?
    Improve block merge ratio for push-based shuffle.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Straightforward small change, no additional test needed.

    Closes #33649 from Victsm/SPARK-36423.

    Lead-authored-by: Min Shen
    Co-authored-by: Min Shen
    Signed-off-by: Mridul Muralidharan
    (cherry picked from commit 6e729515fd2bb228afed964b50f0d02329684934)
    Signed-off-by: Mridul Muralidharan
---
 .../scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala b/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
index 56f915b..ecaa4f0 100644
--- a/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
+++ b/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
@@ -242,10 +242,16 @@ private[spark] class ShuffleBlockPusher(conf: SparkConf) extends Logging {
           handleResult(PushResult(blockId, exception))
         }
       }
+    // In addition to randomizing the order of the push requests, further randomize the order
+    // of blocks within the push request to further reduce the likelihood of shuffle server side
+    // collision of pushed blocks. This does not increase the cost of reading unmerged shuffle
+    // files on the executor side, because we are still reading MB-size chunks and only randomize
+    // the in-memory sliced buffers post reading.
+    val (blockPushIds, blockPushBuffers) = Utils.randomize(blockIds.zip(
+      sliceReqBufferIntoBlockBuffers(request.reqBuffer, request.blocks.map(_._2)))).unzip
     SparkEnv.get.blockManager.blockStoreClient.pushBlocks(
-      address.host, address.port, blockIds.toArray,
-      sliceReqBufferIntoBlockBuffers(request.reqBuffer, request.blocks.map(_._2)),
-      blockPushListener)
+      address.host, address.port, blockPushIds.toArray,
+      blockPushBuffers.toArray, blockPushListener)
   }

   /**
[spark] branch master updated: [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6e72951  [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle

6e72951 is described below

commit 6e729515fd2bb228afed964b50f0d02329684934
Author: Min Shen
AuthorDate: Fri Aug 6 09:47:42 2021 -0500

    [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle

    ### What changes were proposed in this pull request?
    On the client side, we currently randomize the order of push requests before processing each request. In addition, we can further randomize the order of blocks within each push request before pushing them. In our benchmark, this resulted in a 60%-70% reduction of blocks that fail to be merged due to block collision (the existing block merge ratio is already pretty good in general, and this further improves it).

    ### Why are the changes needed?
    Improve block merge ratio for push-based shuffle.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Straightforward small change, no additional test needed.

    Closes #33649 from Victsm/SPARK-36423.

    Lead-authored-by: Min Shen
    Co-authored-by: Min Shen
    Signed-off-by: Mridul Muralidharan
---
 .../scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala b/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
index 56f915b..ecaa4f0 100644
--- a/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
+++ b/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
@@ -242,10 +242,16 @@ private[spark] class ShuffleBlockPusher(conf: SparkConf) extends Logging {
           handleResult(PushResult(blockId, exception))
         }
       }
+    // In addition to randomizing the order of the push requests, further randomize the order
+    // of blocks within the push request to further reduce the likelihood of shuffle server side
+    // collision of pushed blocks. This does not increase the cost of reading unmerged shuffle
+    // files on the executor side, because we are still reading MB-size chunks and only randomize
+    // the in-memory sliced buffers post reading.
+    val (blockPushIds, blockPushBuffers) = Utils.randomize(blockIds.zip(
+      sliceReqBufferIntoBlockBuffers(request.reqBuffer, request.blocks.map(_._2)))).unzip
     SparkEnv.get.blockManager.blockStoreClient.pushBlocks(
-      address.host, address.port, blockIds.toArray,
-      sliceReqBufferIntoBlockBuffers(request.reqBuffer, request.blocks.map(_._2)),
-      blockPushListener)
+      address.host, address.port, blockPushIds.toArray,
+      blockPushBuffers.toArray, blockPushListener)
   }

   /**
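The core of the patch is a zip, shuffle, unzip over two parallel sequences: block IDs and their sliced buffers are paired, the pairs are randomized, and the pair is split back apart so `pushBlocks` still receives matching arrays. A minimal Python analogue of that pattern (function and variable names here are illustrative, not Spark's API):

```python
import random

def randomize_push_order(block_ids, block_buffers, seed=None):
    """Shuffle parallel lists of IDs and buffers together, mirroring
    Utils.randomize(blockIds.zip(buffers)).unzip in the Scala patch."""
    rng = random.Random(seed)
    pairs = list(zip(block_ids, block_buffers))  # keep each ID with its buffer
    rng.shuffle(pairs)                           # randomize the push order
    ids, buffers = zip(*pairs)                   # "unzip" back into two lists
    return list(ids), list(buffers)

ids, bufs = randomize_push_order(["b0", "b1", "b2"], [b"x", b"y", b"z"], seed=7)
# Whatever the resulting order, the ID/buffer pairing is preserved.
```

The point of shuffling the zipped pairs, rather than the two lists independently, is exactly what the commit comment stresses: the order changes but each ID stays attached to its buffer.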
[spark] branch branch-3.2 updated: [SPARK-595][DOCS] Add local-cluster mode option in Documentation
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new a5d0eaf  [SPARK-595][DOCS] Add local-cluster mode option in Documentation

a5d0eaf is described below

commit a5d0eafa324279e4516bd4c6b544b0cc7dbbd4e3
Author: Yuto Akutsu
AuthorDate: Fri Aug 6 09:26:13 2021 -0500

    [SPARK-595][DOCS] Add local-cluster mode option in Documentation

    ### What changes were proposed in this pull request?
    Add the local-cluster mode option to submitting-applications.md.

    ### Why are the changes needed?
    Help users find/use this option for unit tests.

    ### Does this PR introduce _any_ user-facing change?
    Yes, docs changed.

    ### How was this patch tested?
    `SKIP_API=1 bundle exec jekyll build`
    https://user-images.githubusercontent.com/87687356/127125380-6beb4601-7cf4-4876-b2c6-459454ce2a02.png

    Closes #33537 from yutoacts/SPARK-595.

    Lead-authored-by: Yuto Akutsu
    Co-authored-by: Yuto Akutsu
    Co-authored-by: Yuto Akutsu <87687356+yutoa...@users.noreply.github.com>
    Signed-off-by: Thomas Graves
    (cherry picked from commit 41b011e416286374e2e8e8dea36ba79f4c403040)
    Signed-off-by: Thomas Graves
---
 docs/submitting-applications.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md
index 0319859..402dd06 100644
--- a/docs/submitting-applications.md
+++ b/docs/submitting-applications.md
@@ -162,9 +162,10 @@ The master URL passed to Spark can be in one of the following formats:
 Master URL  Meaning
 local  Run Spark locally with one worker thread (i.e. no parallelism at all).
 local[K]  Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
-local[K,F]  Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable)
+local[K,F]  Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable).
 local[*]  Run Spark locally with as many worker threads as logical cores on your machine.
 local[*,F]  Run Spark locally with as many worker threads as logical cores on your machine and F maxFailures.
+local-cluster[N,C,M]  Local-cluster mode is only for unit tests. It emulates a distributed cluster in a single JVM with N number of workers, C cores per worker and M MiB of memory per worker.
 spark://HOST:PORT  Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.
[spark] branch master updated: [SPARK-595][DOCS] Add local-cluster mode option in Documentation
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 41b011e  [SPARK-595][DOCS] Add local-cluster mode option in Documentation

41b011e is described below

commit 41b011e416286374e2e8e8dea36ba79f4c403040
Author: Yuto Akutsu
AuthorDate: Fri Aug 6 09:26:13 2021 -0500

    [SPARK-595][DOCS] Add local-cluster mode option in Documentation

    ### What changes were proposed in this pull request?
    Add the local-cluster mode option to submitting-applications.md.

    ### Why are the changes needed?
    Help users find/use this option for unit tests.

    ### Does this PR introduce _any_ user-facing change?
    Yes, docs changed.

    ### How was this patch tested?
    `SKIP_API=1 bundle exec jekyll build`
    https://user-images.githubusercontent.com/87687356/127125380-6beb4601-7cf4-4876-b2c6-459454ce2a02.png

    Closes #33537 from yutoacts/SPARK-595.

    Lead-authored-by: Yuto Akutsu
    Co-authored-by: Yuto Akutsu
    Co-authored-by: Yuto Akutsu <87687356+yutoa...@users.noreply.github.com>
    Signed-off-by: Thomas Graves
---
 docs/submitting-applications.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md
index 0319859..402dd06 100644
--- a/docs/submitting-applications.md
+++ b/docs/submitting-applications.md
@@ -162,9 +162,10 @@ The master URL passed to Spark can be in one of the following formats:
 Master URL  Meaning
 local  Run Spark locally with one worker thread (i.e. no parallelism at all).
 local[K]  Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
-local[K,F]  Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable)
+local[K,F]  Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable).
 local[*]  Run Spark locally with as many worker threads as logical cores on your machine.
 local[*,F]  Run Spark locally with as many worker threads as logical cores on your machine and F maxFailures.
+local-cluster[N,C,M]  Local-cluster mode is only for unit tests. It emulates a distributed cluster in a single JVM with N number of workers, C cores per worker and M MiB of memory per worker.
 spark://HOST:PORT  Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.
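The bracketed master-URL forms in the table above follow a small, regular grammar. A hypothetical parser for the new `local-cluster[N,C,M]` form (not part of Spark; purely illustrative) makes the three parameters concrete:

```python
import re

def parse_local_cluster(master):
    """Parse a local-cluster[N,C,M] master URL into its parameters:
    N workers, C cores per worker, M MiB of memory per worker."""
    m = re.fullmatch(r"local-cluster\[(\d+),(\d+),(\d+)\]", master)
    if m is None:
        raise ValueError(f"not a local-cluster master URL: {master}")
    workers, cores, mem_mib = map(int, m.groups())
    return workers, cores, mem_mib

# Two workers, one core each, 1024 MiB of memory per worker:
print(parse_local_cluster("local-cluster[2,1,1024]"))  # (2, 1, 1024)
```

For example, `spark-submit --master local-cluster[2,1,1024]` would ask for the configuration shown above; since the mode runs everything in a single JVM, it is suitable only for tests, as the docs change states.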
[spark] branch branch-3.2 updated: Revert "[SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported"
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 586eb5d  Revert "[SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported"

586eb5d is described below

commit 586eb5d4c6b01b008cb0ace076f94f49580201de
Author: Kousuke Saruta
AuthorDate: Fri Aug 6 20:56:24 2021 +0900

    Revert "[SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported"

    ### What changes were proposed in this pull request?
    This PR reverts the change in SPARK-36429 (#33654).
    See [conversation](https://github.com/apache/spark/pull/33654#issuecomment-894160037).

    ### Why are the changes needed?
    To recover CIs.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    N/A

    Closes #33670 from sarutak/revert-SPARK-36429.

    Authored-by: Kousuke Saruta
    Signed-off-by: Kousuke Saruta
    (cherry picked from commit e17612d0bfa1b1dc719f6f2c202e2a4ea7870ff1)
    Signed-off-by: Kousuke Saruta
---
 .../scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala  | 8 ++++++--
 .../sql-tests/results/timestampNTZ/timestamp-ansi.sql.out         | 5 ++---
 .../resources/sql-tests/results/timestampNTZ/timestamp.sql.out    | 5 ++---
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
index 2761c52..04a0f1a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
@@ -330,8 +330,12 @@ class JacksonParser(
     case udt: UserDefinedType[_] =>
       makeConverter(udt.sqlType)

-    // We don't actually hit this exception though, we keep it for understandability
-    case _ => throw QueryExecutionErrors.unsupportedTypeError(dataType)
+    case _ =>
+      (parser: JsonParser) =>
+        // Here, we pass empty `PartialFunction` so that this case can be
+        // handled as a failed conversion. It will throw an exception as
+        // long as the value is not null.
+        parseJsonToken[AnyRef](parser, dataType)(PartialFunction.empty[JsonToken, AnyRef])
   }

 /**
diff --git a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
index fae7721..fe83675 100644
--- a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
@@ -661,10 +661,9 @@ You may get a different result due to the upgrading of Spark 3.0: Fail to recogn
 -- !query
 select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))
 -- !query schema
-struct<>
+struct>
 -- !query output
-java.lang.Exception
-Unsupported type: timestamp_ntz
+{"t":null}


 -- !query
diff --git a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
index c6de535..b8a6800 100644
--- a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
@@ -642,10 +642,9 @@ You may get a different result due to the upgrading of Spark 3.0: Fail to recogn
 -- !query
 select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))
 -- !query schema
-struct<>
+struct>
 -- !query output
-java.lang.Exception
-Unsupported type: timestamp_ntz
+{"t":null}


 -- !query
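The reverted-to code builds a converter even for unsupported types: by applying an empty `PartialFunction`, a null value still converts to null and only non-null values fail, at parse time. A rough Python analogue of the two behaviors being swapped here (eager rejection vs. deferred per-value failure; all names are illustrative, not Spark's API):

```python
def make_converter_eager(data_type, supported):
    # SPARK-36429 behavior: reject an unsupported type when the
    # converter is built, before any value is ever seen.
    if data_type not in supported:
        raise TypeError(f"Unsupported type: {data_type}")
    return supported[data_type]

def make_converter_deferred(data_type, supported):
    # Reverted behavior: always build a converter. Like the empty
    # PartialFunction, it handles no token, so null stays null and
    # only a non-null value triggers a conversion failure.
    def convert(token):
        if token is None:
            return None
        conv = supported.get(data_type)
        if conv is None:
            raise ValueError(f"cannot convert {token!r} to {data_type}")
        return conv(token)
    return convert

supported = {"string": str, "int": int}
conv = make_converter_deferred("timestamp_ntz", supported)
print(conv(None))  # None: nulls still parse even for an unsupported type
```

This mirrors why the golden files flip back to `{"t":null}`: with the deferred converter, the unparseable timestamp becomes a null field instead of a query-level `Unsupported type` exception.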
[spark] branch master updated (888f8f0 -> e17612d)
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 888f8f0  [SPARK-36339][SQL] References to grouping that not part of aggregation should be replaced
     add e17612d  Revert "[SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported"

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala  | 8 ++++++--
 .../sql-tests/results/timestampNTZ/timestamp-ansi.sql.out         | 5 ++---
 .../resources/sql-tests/results/timestampNTZ/timestamp.sql.out    | 5 ++---
 3 files changed, 10 insertions(+), 8 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-36339][SQL] References to grouping that not part of aggregation should be replaced
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 33e4ce5  [SPARK-36339][SQL] References to grouping that not part of aggregation should be replaced

33e4ce5 is described below

commit 33e4ce562a59d191b06d477ecc6d9230e43b96b8
Author: gaoyajun02
AuthorDate: Fri Aug 6 16:34:37 2021 +0800

    [SPARK-36339][SQL] References to grouping that not part of aggregation should be replaced

    ### What changes were proposed in this pull request?
    Currently, references to grouping sets are reported as errors after aggregated expressions, e.g.
    ```
    SELECT count(name) c, name
    FROM VALUES ('Alice'), ('Bob') people(name)
    GROUP BY name GROUPING SETS(name);
    ```
    Error in query: expression 'people.`name`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;

    ### Why are the changes needed?
    Fix the anonymous function passed to `map` in `constructAggregateExprs`: it should use a named parameter instead of the underscore placeholder, so that `replaceGroupingFunc` is applied to each aggregation expression.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Unit tests.

    Closes #33574 from gaoyajun02/SPARK-36339.

    Lead-authored-by: gaoyajun02
    Co-authored-by: gaoyajun02
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 888f8f03c89ea7ee8997171eadf64c87e17c4efe)
    Signed-off-by: Wenchen Fan
---
 .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala   |  4 ++--
 .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 13 +++++++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 75fad11a..963b42b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -580,7 +580,7 @@ class Analyzer(override val catalogManager: CatalogManager)
         aggregations: Seq[NamedExpression],
         groupByAliases: Seq[Alias],
         groupingAttrs: Seq[Expression],
-        gid: Attribute): Seq[NamedExpression] = aggregations.map {
+        gid: Attribute): Seq[NamedExpression] = aggregations.map { agg =>
       // collect all the found AggregateExpression, so we can check an expression is part of
       // any AggregateExpression or not.
       val aggsBuffer = ArrayBuffer[Expression]()
@@ -588,7 +588,7 @@ class Analyzer(override val catalogManager: CatalogManager)
       def isPartOfAggregation(e: Expression): Boolean = {
         aggsBuffer.exists(a => a.find(_ eq e).isDefined)
       }
-      replaceGroupingFunc(_, groupByExprs, gid).transformDown {
+      replaceGroupingFunc(agg, groupByExprs, gid).transformDown {
         // AggregateExpression should be computed on the unmodified value of its argument
         // expressions, so we should not replace any references to grouping expression
         // inside it.
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index ed3b479..032ddbb 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -3405,6 +3405,19 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     }
   }

+  test("SPARK-36339: References to grouping attributes should be replaced") {
+    withTempView("t") {
+      Seq("a", "a", "b").toDF("x").createOrReplaceTempView("t")
+      checkAnswer(
+        sql(
+          """
+            |select count(x) c, x from t
+            |group by x grouping sets(x)
+          """.stripMargin),
+        Seq(Row(2, "a"), Row(1, "b")))
+    }
+  }
+
   test("SPARK-31166: UNION map and other maps should not fail") {
     checkAnswer(
       sql("(SELECT map()) UNION ALL (SELECT map(1, 2))"),
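The expected answer in the new regression test, Row(2, "a") and Row(1, "b") for `count(x)` grouped by `x` over values a, a, b, can be sanity-checked with a plain group-by count outside Spark (a Python stand-in, not the Catalyst code path):

```python
from collections import Counter

def count_by_group(rows):
    """GROUP BY x with count(x): each distinct value with its count,
    mirroring the query in the SPARK-36339 regression test."""
    counts = Counter(rows)
    return sorted((n, x) for x, n in counts.items())

print(count_by_group(["a", "a", "b"]))  # [(1, 'b'), (2, 'a')]
```

Before the fix, it was this ordinary query shape (an aggregate plus a bare reference to the grouping column) that the analyzer wrongly rejected; the fix makes it resolve and return the counts above.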
[spark] branch master updated (7bb53b8 -> 888f8f0)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 7bb53b8  [SPARK-36098][CORE] Grouping exception in core/storage
     add 888f8f0  [SPARK-36339][SQL] References to grouping that not part of aggregation should be replaced

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala   |  4 ++--
 .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 13 +++++++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)
[spark] branch master updated (c97fb68 -> 7bb53b8)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c97fb68  [SPARK-35221][SQL] Add the check of supported join hints
     add 7bb53b8  [SPARK-36098][CORE] Grouping exception in core/storage

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/errors/SparkCoreErrors.scala  | 97 +-
 .../scala/org/apache/spark/storage/BlockId.scala   |  4 +-
 .../apache/spark/storage/BlockInfoManager.scala    |  8 +-
 .../org/apache/spark/storage/BlockManager.scala    | 26 +++---
 .../spark/storage/BlockManagerDecommissioner.scala |  3 +-
 .../apache/spark/storage/BlockManagerMaster.scala  |  7 +-
 ...avedOnDecommissionedBlockManagerException.scala |  2 +-
 .../apache/spark/storage/DiskBlockManager.scala    |  5 +-
 .../spark/storage/DiskBlockObjectWriter.scala      |  3 +-
 .../storage/ShuffleBlockFetcherIterator.scala      | 16 ++--
 10 files changed, 133 insertions(+), 38 deletions(-)
[spark] branch master updated (63c7d18 -> c97fb68)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 63c7d18  [SPARK-36429][SQL][FOLLOWUP] Update a golden file to comply with the change in SPARK-36429
     add c97fb68  [SPARK-35221][SQL] Add the check of supported join hints

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/HintErrorLogger.scala    |  4 +
 .../spark/sql/catalyst/optimizer/joins.scala       |  4 +
 .../spark/sql/catalyst/plans/logical/hints.scala   |  6 ++
 .../spark/sql/execution/SparkStrategies.scala      | 48 ++-
 .../scala/org/apache/spark/sql/JoinHintSuite.scala | 95 ++
 5 files changed, 154 insertions(+), 3 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-36429][SQL][FOLLOWUP] Update a golden file to comply with the change in SPARK-36429
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new f3761bd  [SPARK-36429][SQL][FOLLOWUP] Update a golden file to comply with the change in SPARK-36429

f3761bd is described below

commit f3761bdb76559ff666effde31bf14773a75c452b
Author: Kousuke Saruta
AuthorDate: Fri Aug 6 15:20:54 2021 +0800

    [SPARK-36429][SQL][FOLLOWUP] Update a golden file to comply with the change in SPARK-36429

    ### What changes were proposed in this pull request?
    This PR updates a golden file to comply with the change in SPARK-36429 (#33654).

    ### Why are the changes needed?
    To recover GA failure.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    GA itself.

    Closes #33663 from sarutak/followup-SPARK-36429.

    Authored-by: Kousuke Saruta
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 63c7d1847d97dca5ceb9a46c77a623cb78565f5b)
    Signed-off-by: Wenchen Fan
---
 .../resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
index fe83675..fae7721 100644
--- a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
@@ -661,9 +661,10 @@ You may get a different result due to the upgrading of Spark 3.0: Fail to recogn
 -- !query
 select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))
 -- !query schema
-struct>
+struct<>
 -- !query output
-{"t":null}
+java.lang.Exception
+Unsupported type: timestamp_ntz


 -- !query
[spark] branch master updated (eb12727 -> 63c7d18)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from eb12727  [SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported
     add 63c7d18  [SPARK-36429][SQL][FOLLOWUP] Update a golden file to comply with the change in SPARK-36429

No new revisions were added by this update.

Summary of changes:
 .../resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new be19270  [SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported

be19270 is described below

commit be192708809de363a04895e62bc1ca1216658395
Author: gengjiaan
AuthorDate: Fri Aug 6 12:53:04 2021 +0800

    [SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported

    ### What changes were proposed in this pull request?
    Currently, when we `set spark.sql.timestampType=TIMESTAMP_NTZ`, the behavior differs between `from_json` and `from_csv`:
    ```
    -- !query
    select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))
    -- !query schema
    struct>
    -- !query output
    {"t":null}
    ```
    ```
    -- !query
    select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/M/'))
    -- !query schema
    struct<>
    -- !query output
    java.lang.Exception
    Unsupported type: timestamp_ntz
    ```
    We should make `from_json` throw an exception too. This PR fixes the issue discussed at https://github.com/apache/spark/pull/33640#discussion_r682862523.

    ### Why are the changes needed?
    Make the behavior of `from_json` more reasonable.

    ### Does this PR introduce _any_ user-facing change?
    Yes. `from_json` throws an exception when we set spark.sql.timestampType=TIMESTAMP_NTZ.

    ### How was this patch tested?
    Tests updated.

    Closes #33654 from beliefer/SPARK-36429.

    Authored-by: gengjiaan
    Signed-off-by: Wenchen Fan
---
 .../scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala  | 8 ++------
 .../resources/sql-tests/results/timestampNTZ/timestamp.sql.out    | 5 +++--
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
index 04a0f1a..2761c52 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
@@ -330,12 +330,8 @@ class JacksonParser(
     case udt: UserDefinedType[_] =>
       makeConverter(udt.sqlType)

-    case _ =>
-      (parser: JsonParser) =>
-        // Here, we pass empty `PartialFunction` so that this case can be
-        // handled as a failed conversion. It will throw an exception as
-        // long as the value is not null.
-        parseJsonToken[AnyRef](parser, dataType)(PartialFunction.empty[JsonToken, AnyRef])
+    // We don't actually hit this exception though, we keep it for understandability
+    case _ => throw QueryExecutionErrors.unsupportedTypeError(dataType)
   }

 /**
diff --git a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
index b8a6800..c6de535 100644
--- a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
@@ -642,9 +642,10 @@ You may get a different result due to the upgrading of Spark 3.0: Fail to recogn
 -- !query
 select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))
 -- !query schema
-struct>
+struct<>
 -- !query output
-{"t":null}
+java.lang.Exception
+Unsupported type: timestamp_ntz


 -- !query