[spark] branch branch-3.2 updated: [MINOR][DOCS] Add Apache license header to GitHub Actions workflow files
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new e3edb65  [MINOR][DOCS] Add Apache license header to GitHub Actions workflow files
e3edb65 is described below

commit e3edb65bf07ded55289c7101876ead9fa8633375
Author: Hyukjin Kwon
AuthorDate: Sat Aug 28 20:30:16 2021 -0700

    [MINOR][DOCS] Add Apache license header to GitHub Actions workflow files

    ### What changes were proposed in this pull request?

    Some of the GitHub Actions workflow files do not have the Apache license header. This PR adds them.

    ### Why are the changes needed?

    To comply with the Apache license.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A

    Closes #33862 from HyukjinKwon/minor-lisence.

    Authored-by: Hyukjin Kwon
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 22c492a6b827be65fc42b3130c67a698323b9b4e)
    Signed-off-by: Dongjoon Hyun
---
 .github/workflows/benchmark.yml                      | 19 +++
 .github/workflows/build_and_test.yml                 | 19 +++
 .github/workflows/cancel_duplicate_workflow_runs.yml | 19 +++
 .github/workflows/publish_snapshot.yml               | 19 +++
 .github/workflows/stale.yml                          | 19 +++
 .github/workflows/test_report.yml                    | 19 +++
 6 files changed, 114 insertions(+)

diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
index 9a599b9..2e5fe3f 100644
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -1,3 +1,22 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
 name: Run benchmarks

 on:
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 7fc99ef..23406f9 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -1,3 +1,22 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
 name: Build and test

 on:
diff --git a/.github/workflows/cancel_duplicate_workflow_runs.yml b/.github/workflows/cancel_duplicate_workflow_runs.yml
index b20fc94..1077371 100644
--- a/.github/workflows/cancel_duplicate_workflow_runs.yml
+++ b/.github/workflows/cancel_duplicate_workflow_runs.yml
@@ -1,3 +1,22 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
 name: Cancelling Duplicates
 on:
   workflow_run:
diff --git a/.github/workflows/publish_snapshot.yml
[spark] branch master updated: [MINOR][DOCS] Add Apache license header to GitHub Actions workflow files
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 22c492a6  [MINOR][DOCS] Add Apache license header to GitHub Actions workflow files
22c492a6 is described below

commit 22c492a6b827be65fc42b3130c67a698323b9b4e
Author: Hyukjin Kwon
AuthorDate: Sat Aug 28 20:30:16 2021 -0700

    [MINOR][DOCS] Add Apache license header to GitHub Actions workflow files

    ### What changes were proposed in this pull request?

    Some of the GitHub Actions workflow files do not have the Apache license header. This PR adds them.

    ### Why are the changes needed?

    To comply with the Apache license.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A

    Closes #33862 from HyukjinKwon/minor-lisence.

    Authored-by: Hyukjin Kwon
    Signed-off-by: Dongjoon Hyun
---
 .github/workflows/benchmark.yml                      | 19 +++
 .github/workflows/build_and_test.yml                 | 19 +++
 .github/workflows/cancel_duplicate_workflow_runs.yml | 19 +++
 .github/workflows/publish_snapshot.yml               | 19 +++
 .github/workflows/stale.yml                          | 19 +++
 .github/workflows/test_report.yml                    | 19 +++
 6 files changed, 114 insertions(+)

diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
index 9a599b9..2e5fe3f 100644
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -1,3 +1,22 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
 name: Run benchmarks

 on:
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 77b7111..20ee740 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -1,3 +1,22 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
 name: Build and test

 on:
diff --git a/.github/workflows/cancel_duplicate_workflow_runs.yml b/.github/workflows/cancel_duplicate_workflow_runs.yml
index b20fc94..1077371 100644
--- a/.github/workflows/cancel_duplicate_workflow_runs.yml
+++ b/.github/workflows/cancel_duplicate_workflow_runs.yml
@@ -1,3 +1,22 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
 name: Cancelling Duplicates
 on:
   workflow_run:
diff --git a/.github/workflows/publish_snapshot.yml b/.github/workflows/publish_snapshot.yml
index 75ba737..46f4f7a 100644
--- a/.github/workflows/publish_snapshot.yml
+++
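Both pushes above add the same standard ASF header to workflow files that lacked it. A check like this can be automated; the sketch below is a hypothetical helper (Apache projects typically use Apache RAT for real license auditing — this is only an illustration), flagging YAML workflow files whose opening lines lack the header marker:

```python
import os

# The distinctive first sentence of the ASF license header.
ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"

def files_missing_header(root):
    """Return YAML files under `root` whose first 30 lines lack the ASF header."""
    missing = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            if not name.endswith((".yml", ".yaml")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                head = "".join(f.readlines()[:30])
            if ASF_MARKER not in head:
                missing.append(path)
    return sorted(missing)
```

Running this over `.github/workflows/` before the patch would have flagged all six files touched by the commit.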
[spark] branch branch-3.2 updated: [SPARK-36606][DOCS][TESTS] Enhance the docs and tests of try_add/try_divide
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 3719d87  [SPARK-36606][DOCS][TESTS] Enhance the docs and tests of try_add/try_divide
3719d87 is described below

commit 3719d8766868a24706c3172fbfb4bb4e5ea7c4b4
Author: Gengliang Wang
AuthorDate: Sun Aug 29 10:30:04 2021 +0900

    [SPARK-36606][DOCS][TESTS] Enhance the docs and tests of try_add/try_divide

    ### What changes were proposed in this pull request?

    The `try_add` function allows the following inputs:
    - number, number
    - date, number
    - date, interval
    - timestamp, interval
    - interval, interval

    And the `try_divide` function allows the following inputs:
    - number, number
    - interval, number

    However, the current code has examples and tests only for the (number, number) inputs. We should enhance the docs to let users know that the functions can be used for datetime and interval operations too.

    ### Why are the changes needed?

    Improve documentation and tests.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    New UT. Also built the docs for preview:
    ![image](https://user-images.githubusercontent.com/1097932/131212897-8aea14c8-a882-4e12-94e2-f56bde7c0367.png)

    Closes #33861 from gengliangwang/enhanceTryDoc.

    Authored-by: Gengliang Wang
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit 8a52ad9f82982b443afce6b92ccbd9c0d7e88a21)
    Signed-off-by: Hyukjin Kwon
---
 .../spark/sql/catalyst/expressions/TryEval.scala   |  20 ++-
 .../resources/sql-tests/inputs/try_arithmetic.sql  |  37 -
 .../sql-tests/results/try_arithmetic.sql.out       | 171 -
 3 files changed, 222 insertions(+), 6 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala
index a75db1b..bc2604a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala
@@ -53,17 +53,28 @@ case class TryEval(child: Expression) extends UnaryExpression with NullIntoleran
     copy(child = newChild)
 }

+// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr1, expr2) - Returns `expr1`+`expr2` and the result is null on overflow.",
+  usage = "_FUNC_(expr1, expr2) - Returns the sum of `expr1` and `expr2` and the result is null on overflow. " +
+    "The acceptable input types are the same with the `+` operator.",
   examples = """
     Examples:
       > SELECT _FUNC_(1, 2);
       3
      > SELECT _FUNC_(2147483647, 1);
       NULL
+      > SELECT _FUNC_(date'2021-01-01', 1);
+       2021-01-02
+      > SELECT _FUNC_(date'2021-01-01', interval 1 year);
+       2022-01-01
+      > SELECT _FUNC_(timestamp'2021-01-01 00:00:00', interval 1 day);
+       2021-01-02 00:00:00
+      > SELECT _FUNC_(interval 1 year, interval 2 year);
+       3-0
   """,
   since = "3.2.0",
   group = "math_funcs")
+// scalastyle:on line.size.limit
 case class TryAdd(left: Expression, right: Expression, child: Expression)
     extends RuntimeReplaceable {
   def this(left: Expression, right: Expression) =
@@ -81,7 +92,8 @@ case class TryAdd(left: Expression, right: Expression, child: Expression)

 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr1, expr2) - Returns `expr1`/`expr2`. It always performs floating point division. Its result is always null if `expr2` is 0.",
+  usage = "_FUNC_(dividend, divisor) - Returns `dividend`/`divisor`. It always performs floating point division. Its result is always null if `divisor` is 0. " +
+    "`dividend` must be a numeric or an interval. `divisor` must be a numeric.",
   examples = """
     Examples:
       > SELECT _FUNC_(3, 2);
@@ -90,6 +102,10 @@ case class TryAdd(left: Expression, right: Expression, child: Expression)
       1.0
       > SELECT _FUNC_(1, 0);
       NULL
+      > SELECT _FUNC_(interval 2 month, 2);
+       0-1
+      > SELECT _FUNC_(interval 2 month, 0);
+       NULL
   """,
   since = "3.2.0",
   group = "math_funcs")
diff --git a/sql/core/src/test/resources/sql-tests/inputs/try_arithmetic.sql b/sql/core/src/test/resources/sql-tests/inputs/try_arithmetic.sql
index cda83e8..5962a5d 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/try_arithmetic.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/try_arithmetic.sql
@@ -1,11 +1,42 @@
--- TRY_ADD
+-- Numeric + Numeric
 SELECT try_add(1, 1);
 SELECT try_add(2147483647, 1);
 SELECT try_add(-2147483648, -1);
 SELECT try_add(9223372036854775807L, 1);
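The contract documented above — null instead of an error on overflow or division by zero — can be sketched outside Spark. Below is a rough Python model of the integer cases, purely as an illustration of the documented behaviour (Spark's real implementation works on Catalyst expressions, not Python ints):

```python
# 32-bit signed integer bounds, matching Spark's IntegerType.
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def try_add_int(a, b):
    """Return a + b as a 32-bit signed int, or None on overflow,
    mirroring the documented try_add contract for integer inputs."""
    result = a + b
    return result if INT_MIN <= result <= INT_MAX else None

def try_divide(a, b):
    """Return a / b as floating point, or None when the divisor is 0,
    mirroring the documented try_divide contract."""
    return None if b == 0 else a / b
```

With these definitions, `try_add_int(2147483647, 1)` yields `None` and `try_divide(1, 0)` yields `None`, matching the `NULL` rows in the SQL examples above.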
[spark] branch master updated (8ffb59d -> 8a52ad9)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 8ffb59d  [SPARK-36583][BUILD] Upgrade Apache commons-pool2 from 2.6.2 to 2.11.1
  add 8a52ad9  [SPARK-36606][DOCS][TESTS] Enhance the docs and tests of try_add/try_divide

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/TryEval.scala   |  20 ++-
 .../resources/sql-tests/inputs/try_arithmetic.sql  |  37 -
 .../sql-tests/results/try_arithmetic.sql.out       | 171 -
 3 files changed, 222 insertions(+), 6 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (4356d66 -> 8ffb59d)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 4356d66  [SPARK-36605][BUILD] Upgrade Jackson to 2.12.5
  add 8ffb59d  [SPARK-36583][BUILD] Upgrade Apache commons-pool2 from 2.6.2 to 2.11.1

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated: [SPARK-36605][BUILD] Upgrade Jackson to 2.12.5
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4356d66  [SPARK-36605][BUILD] Upgrade Jackson to 2.12.5
4356d66 is described below

commit 4356d6603a6dbb965a773dbb1c3d34bfa0bca7bc
Author: Kousuke Saruta
AuthorDate: Sat Aug 28 15:57:24 2021 -0700

    [SPARK-36605][BUILD] Upgrade Jackson to 2.12.5

    ### What changes were proposed in this pull request?

    This PR upgrades Jackson from `2.12.3` to `2.12.5`.

    ### Why are the changes needed?

    Recently, Jackson `2.12.5` was released, and it seems to be expected as the last full patch release for 2.12.x. This release includes a fix for a regression in jackson-databind introduced in `2.12.3`, which Spark 3.2 currently depends on.
    https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.12.5

    ### Does this PR introduce _any_ user-facing change?

    Dependency maintenance.

    ### How was this patch tested?

    CIs.

    Closes #33860 from sarutak/upgrade-jackson-2.12.5.

    Authored-by: Kousuke Saruta
    Signed-off-by: Liang-Chi Hsieh
---
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 10 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 10 +-
 pom.xml                                 |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
index 10eaa5c..512b3c1 100644
--- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
@@ -106,15 +106,15 @@ httpclient/4.5.13//httpclient-4.5.13.jar
 httpcore/4.4.14//httpcore-4.4.14.jar
 istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar
 ivy/2.5.0//ivy-2.5.0.jar
-jackson-annotations/2.12.3//jackson-annotations-2.12.3.jar
+jackson-annotations/2.12.5//jackson-annotations-2.12.5.jar
 jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar
-jackson-core/2.12.3//jackson-core-2.12.3.jar
-jackson-databind/2.12.3//jackson-databind-2.12.3.jar
-jackson-dataformat-yaml/2.12.3//jackson-dataformat-yaml-2.12.3.jar
+jackson-core/2.12.5//jackson-core-2.12.5.jar
+jackson-databind/2.12.5//jackson-databind-2.12.5.jar
+jackson-dataformat-yaml/2.12.5//jackson-dataformat-yaml-2.12.5.jar
 jackson-datatype-jsr310/2.11.2//jackson-datatype-jsr310-2.11.2.jar
 jackson-jaxrs/1.9.13//jackson-jaxrs-1.9.13.jar
 jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
-jackson-module-scala_2.12/2.12.3//jackson-module-scala_2.12-2.12.3.jar
+jackson-module-scala_2.12/2.12.5//jackson-module-scala_2.12-2.12.5.jar
 jackson-xc/1.9.13//jackson-xc-1.9.13.jar
 jakarta.annotation-api/1.3.5//jakarta.annotation-api-1.3.5.jar
 jakarta.inject/2.6.1//jakarta.inject-2.6.1.jar
diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
index 2f9e709..5c05be7 100644
--- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
@@ -84,14 +84,14 @@ httpclient/4.5.13//httpclient-4.5.13.jar
 httpcore/4.4.14//httpcore-4.4.14.jar
 istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar
 ivy/2.5.0//ivy-2.5.0.jar
-jackson-annotations/2.12.3//jackson-annotations-2.12.3.jar
+jackson-annotations/2.12.5//jackson-annotations-2.12.5.jar
 jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar
-jackson-core/2.12.3//jackson-core-2.12.3.jar
-jackson-databind/2.12.3//jackson-databind-2.12.3.jar
-jackson-dataformat-yaml/2.12.3//jackson-dataformat-yaml-2.12.3.jar
+jackson-core/2.12.5//jackson-core-2.12.5.jar
+jackson-databind/2.12.5//jackson-databind-2.12.5.jar
+jackson-dataformat-yaml/2.12.5//jackson-dataformat-yaml-2.12.5.jar
 jackson-datatype-jsr310/2.11.2//jackson-datatype-jsr310-2.11.2.jar
 jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
-jackson-module-scala_2.12/2.12.3//jackson-module-scala_2.12-2.12.3.jar
+jackson-module-scala_2.12/2.12.5//jackson-module-scala_2.12-2.12.5.jar
 jakarta.annotation-api/1.3.5//jakarta.annotation-api-1.3.5.jar
 jakarta.inject/2.6.1//jakarta.inject-2.6.1.jar
 jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar
diff --git a/pom.xml b/pom.xml
index 214a5e8..08d5aa8 100644
--- a/pom.xml
+++ b/pom.xml
@@ -167,7 +167,7 @@
     true
     1.9.13
-    2.12.3
+    2.12.5
     1.1.8.4
     1.1.2
     2.2.0
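The `dev/deps` manifests touched above use a regular `name/version//name-version.jar` line format, so a coordinated bump like this one can be scripted. The helper below is purely illustrative (Spark maintains these files with its own tooling; `bump_dep` is a hypothetical name):

```python
def bump_dep(lines, artifact_prefix, old, new):
    """Rewrite `name/version//name-version.jar` manifest lines, replacing
    version `old` with `new` for artifacts whose name starts with artifact_prefix.
    Lines for other artifacts, or other versions, pass through unchanged."""
    out = []
    for line in lines:
        name = line.split("/", 1)[0]
        if name.startswith(artifact_prefix):
            line = line.replace(old, new)
        out.append(line)
    return out
```

Note that an entry such as `jackson-datatype-jsr310/2.11.2//…` matches the prefix but not the old version, so it is left alone — exactly as in the diff above, where that artifact stays at 2.11.2.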
[spark] branch branch-3.0 updated (c420149 -> 068465d)
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from c420149  [SPARK-36352][SQL][3.0] Spark should check result plan's output schema name
  add 068465d  [SPARK-36509][CORE] Fix the issue that executors are never re-scheduled if the worker stops with standalone cluster

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/deploy/master/Master.scala         | 1 +
 core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
[spark] branch branch-3.1 updated: [SPARK-36509][CORE] Fix the issue that executors are never re-scheduled if the worker stops with standalone cluster
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 0af666a  [SPARK-36509][CORE] Fix the issue that executors are never re-scheduled if the worker stops with standalone cluster
0af666a is described below

commit 0af666a310590367a80439000d74975526064c87
Author: Kousuke Saruta
AuthorDate: Sat Aug 28 18:01:55 2021 +0900

    [SPARK-36509][CORE] Fix the issue that executors are never re-scheduled if the worker stops with standalone cluster

    ### What changes were proposed in this pull request?

    This PR fixes an issue where executors are never re-scheduled if the worker they run on stops. As a result, the application gets stuck.

    You can easily reproduce this issue with the following procedure.
    ```
    # Run master
    $ sbin/start-master.sh

    # Run worker 1
    $ SPARK_LOG_DIR=/tmp/worker1 SPARK_PID_DIR=/tmp/worker1/ sbin/start-worker.sh -c 1 -h localhost -d /tmp/worker1 --webui-port 8081 spark://:7077

    # Run worker 2
    $ SPARK_LOG_DIR=/tmp/worker2 SPARK_PID_DIR=/tmp/worker2/ sbin/start-worker.sh -c 1 -h localhost -d /tmp/worker2 --webui-port 8082 spark://:7077

    # Run Spark Shell
    $ bin/spark-shell --master spark://:7077 --executor-cores 1 --total-executor-cores 1

    # Check which worker the executor runs on, and then kill that worker.
    $ kill
    ```
    With the procedure above, we would expect the executor to be re-scheduled on the other worker, but it never is. The reason seems to be that `Master.schedule` cannot be called after the worker is marked as `WorkerState.DEAD`. So the solution this PR proposes is to call `Master.schedule` whenever `Master.removeWorker` is called.

    This PR also fixes an issue where `ExecutorRunner` can send an `ExecutorStateChanged` message without changing its state. This issue causes an assertion error.
    ```
    2021-08-13 14:05:37,991 [dispatcher-event-loop-9] ERROR: Ignoring error
    java.lang.AssertionError: assertion failed: executor 0 state transfer from RUNNING to RUNNING is illegal
    ```

    ### Why are the changes needed?

    It's a critical bug.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manually tested with the procedure shown above and confirmed the executor is re-scheduled.

    Closes #33818 from sarutak/fix-scheduling-stuck.

    Authored-by: Kousuke Saruta
    Signed-off-by: Kousuke Saruta
    (cherry picked from commit ea8c31e5ea233da4407f6821b2d6dd7f3c88f8d9)
    Signed-off-by: Kousuke Saruta
---
 core/src/main/scala/org/apache/spark/deploy/master/Master.scala         | 1 +
 core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
index 9f1b36a..1cbeacf 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
@@ -964,6 +964,7 @@ private[deploy] class Master(
       app.driver.send(WorkerRemoved(worker.id, worker.host, msg))
     }
     persistenceEngine.removeWorker(worker)
+    schedule()
   }

   private def relaunchDriver(driver: DriverInfo): Unit = {
diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala b/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
index 974c2d6..40d9407 100644
--- a/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
@@ -83,7 +83,7 @@ private[deploy] class ExecutorRunner(
     shutdownHook = ShutdownHookManager.addShutdownHook { () =>
       // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
       // be `ExecutorState.LAUNCHING`. In this case, we should set `state` to `FAILED`.
-      if (state == ExecutorState.LAUNCHING) {
+      if (state == ExecutorState.LAUNCHING || state == ExecutorState.RUNNING) {
         state = ExecutorState.FAILED
       }
       killProcess(Some("Worker shutting down"))
     }
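The one-line `ExecutorRunner` change above widens the set of states from which the shutdown hook may mark an executor `FAILED`, so the master never sees the illegal RUNNING-to-RUNNING transition from the assertion error. A toy Python model of that guard (a simplified sketch of the rule the patch enforces, not Spark's actual `ExecutorState` type):

```python
from enum import Enum

class ExecutorState(Enum):
    LAUNCHING = 1
    RUNNING = 2
    FAILED = 3
    KILLED = 4

def on_worker_shutdown(state):
    """Mirror the patched shutdown hook: executors still LAUNCHING or
    RUNNING transition to FAILED; states that are already terminal
    (FAILED, KILLED) are left untouched, so no state is 're-sent'."""
    if state in (ExecutorState.LAUNCHING, ExecutorState.RUNNING):
        return ExecutorState.FAILED
    return state
```

Before the patch, the equivalent model would return `RUNNING` unchanged for a running executor, which is exactly the no-op transition the master's assertion rejects.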
[spark] branch branch-3.2 updated: [SPARK-36509][CORE] Fix the issue that executors are never re-scheduled if the worker stops with standalone cluster
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 93f2b00  [SPARK-36509][CORE] Fix the issue that executors are never re-scheduled if the worker stops with standalone cluster
93f2b00 is described below

commit 93f2b00501c7fad20fb6bc130b548cb87e9f91f1
Author: Kousuke Saruta
AuthorDate: Sat Aug 28 18:01:55 2021 +0900

    [SPARK-36509][CORE] Fix the issue that executors are never re-scheduled if the worker stops with standalone cluster

    ### What changes were proposed in this pull request?

    This PR fixes an issue where executors are never re-scheduled if the worker they run on stops. As a result, the application gets stuck.

    You can easily reproduce this issue with the following procedure.
    ```
    # Run master
    $ sbin/start-master.sh

    # Run worker 1
    $ SPARK_LOG_DIR=/tmp/worker1 SPARK_PID_DIR=/tmp/worker1/ sbin/start-worker.sh -c 1 -h localhost -d /tmp/worker1 --webui-port 8081 spark://:7077

    # Run worker 2
    $ SPARK_LOG_DIR=/tmp/worker2 SPARK_PID_DIR=/tmp/worker2/ sbin/start-worker.sh -c 1 -h localhost -d /tmp/worker2 --webui-port 8082 spark://:7077

    # Run Spark Shell
    $ bin/spark-shell --master spark://:7077 --executor-cores 1 --total-executor-cores 1

    # Check which worker the executor runs on, and then kill that worker.
    $ kill
    ```
    With the procedure above, we would expect the executor to be re-scheduled on the other worker, but it never is. The reason seems to be that `Master.schedule` cannot be called after the worker is marked as `WorkerState.DEAD`. So the solution this PR proposes is to call `Master.schedule` whenever `Master.removeWorker` is called.

    This PR also fixes an issue where `ExecutorRunner` can send an `ExecutorStateChanged` message without changing its state. This issue causes an assertion error.
    ```
    2021-08-13 14:05:37,991 [dispatcher-event-loop-9] ERROR: Ignoring error
    java.lang.AssertionError: assertion failed: executor 0 state transfer from RUNNING to RUNNING is illegal
    ```

    ### Why are the changes needed?

    It's a critical bug.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manually tested with the procedure shown above and confirmed the executor is re-scheduled.

    Closes #33818 from sarutak/fix-scheduling-stuck.

    Authored-by: Kousuke Saruta
    Signed-off-by: Kousuke Saruta
    (cherry picked from commit ea8c31e5ea233da4407f6821b2d6dd7f3c88f8d9)
    Signed-off-by: Kousuke Saruta
---
 core/src/main/scala/org/apache/spark/deploy/master/Master.scala         | 1 +
 core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
index c964e34..7dbf6b9 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
@@ -965,6 +965,7 @@ private[deploy] class Master(
       app.driver.send(WorkerRemoved(worker.id, worker.host, msg))
     }
     persistenceEngine.removeWorker(worker)
+    schedule()
   }

   private def relaunchDriver(driver: DriverInfo): Unit = {
diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala b/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
index 974c2d6..40d9407 100644
--- a/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
@@ -83,7 +83,7 @@ private[deploy] class ExecutorRunner(
     shutdownHook = ShutdownHookManager.addShutdownHook { () =>
       // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
       // be `ExecutorState.LAUNCHING`. In this case, we should set `state` to `FAILED`.
-      if (state == ExecutorState.LAUNCHING) {
+      if (state == ExecutorState.LAUNCHING || state == ExecutorState.RUNNING) {
         state = ExecutorState.FAILED
       }
       killProcess(Some("Worker shutting down"))
     }
[spark] branch master updated (fe7bf5f -> ea8c31e)
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from fe7bf5f  [SPARK-36327][SQL] Spark sql creates staging dir inside database directory rather than creating inside table directory
  add ea8c31e  [SPARK-36509][CORE] Fix the issue that executors are never re-scheduled if the worker stops with standalone cluster

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/deploy/master/Master.scala         | 1 +
 core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)