[spark] branch branch-3.0 updated (635feaa -> d270be4)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 635feaa  [SPARK-31712][SQL][TESTS] Check casting timestamps before the epoch to Byte/Short/Int/Long types
     add d270be4  [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala | 10 ++
 1 file changed, 10 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new d270be4  [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

d270be4 is described below

commit d270be43e38f1b44a99c4035cc1829751c223346
Author: Kent Yao
AuthorDate: Fri May 15 06:36:34 2020 +0000

    [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

    ### What changes were proposed in this pull request?

    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122622/testReport/junit/org.apache.spark.sql.hive.thriftserver/SparkSQLEnvSuite/SPARK_29604_external_listeners_should_be_initialized_with_Spark_classloader/history/?start=25

    According to the test report history of SparkSQLEnvSuite, this test fails frequently, caused by the single-Derby-instance restriction:

    ```java
    Caused by: sbt.ForkMain$ForkError: org.apache.derby.iapi.error.StandardException: Another instance of Derby may have already booted the database /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db.
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
        at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
        ... 138 more
    ```

    This PR adds a separate directory to locate the metastore_db for this test, which runs in a dedicated JVM. Besides, it disables the UI because of a potential race on `spark.ui.port`, which may also make the test case flaky.

    ### Why are the changes needed?

    Test fix.

    ### Does this PR introduce _any_ user-facing change?

    NO

    ### How was this patch tested?

    SparkSQLEnvSuite itself.

    Closes #28537 from yaooqinn/SPARK-31715.

    Authored-by: Kent Yao
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 503faa24d33eb52da79ad99e39c8c011597499ea)
    Signed-off-by: Wenchen Fan
---
 .../apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala
index ffd1fc4..f28faea 100644
--- a/sql/hive-thriftser
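The isolation idea behind the fix can be sketched in shell. This is only an illustration of the technique, not the lines from the patch itself: the directory name and the Derby connection URL below are hypothetical stand-ins for whatever the suite actually configures.

```shell
# Give each test JVM a private scratch directory, so two concurrent suites
# can never contend for the same embedded-Derby lock (metastore_db/db.lck).
METASTORE_DIR=$(mktemp -d)

# Hypothetical property value: a dedicated Derby database path per JVM.
# The exact property and flags used by the real suite may differ.
DERBY_URL="jdbc:derby:;databaseName=${METASTORE_DIR}/metastore_db;create=true"

echo "metastore isolated at: ${METASTORE_DIR}"
echo "connection URL: ${DERBY_URL}"
```

Because every JVM points at a distinct `databaseName`, the "Another instance of Derby may have already booted" error above cannot occur between suites.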
[spark] branch master updated (c7ce37d -> 503faa2)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c7ce37d  [SPARK-31712][SQL][TESTS] Check casting timestamps before the epoch to Byte/Short/Int/Long types
     add 503faa2  [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala | 10 ++
 1 file changed, 10 insertions(+)
[spark] branch master updated: [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 503faa2  [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

503faa2 is described below

commit 503faa24d33eb52da79ad99e39c8c011597499ea
Author: Kent Yao
AuthorDate: Fri May 15 06:36:34 2020 +0000

    [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

    ### What changes were proposed in this pull request?

    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122622/testReport/junit/org.apache.spark.sql.hive.thriftserver/SparkSQLEnvSuite/SPARK_29604_external_listeners_should_be_initialized_with_Spark_classloader/history/?start=25

    According to the test report history of SparkSQLEnvSuite, this test fails frequently, caused by the single-Derby-instance restriction:

    ```java
    Caused by: sbt.ForkMain$ForkError: org.apache.derby.iapi.error.StandardException: Another instance of Derby may have already booted the database /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db.
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
        at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
        ... 138 more
    ```

    This PR adds a separate directory to locate the metastore_db for this test, which runs in a dedicated JVM. Besides, it disables the UI because of a potential race on `spark.ui.port`, which may also make the test case flaky.

    ### Why are the changes needed?

    Test fix.

    ### Does this PR introduce _any_ user-facing change?

    NO

    ### How was this patch tested?

    SparkSQLEnvSuite itself.

    Closes #28537 from yaooqinn/SPARK-31715.

    Authored-by: Kent Yao
    Signed-off-by: Wenchen Fan
---
 .../apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala
index ffd1fc4..f28faea 100644
--- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala
+++ b/sql/hive-thriftserver/src
[spark] branch branch-3.0 updated: [SPARK-31712][SQL][TESTS] Check casting timestamps before the epoch to Byte/Short/Int/Long types
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 635feaa  [SPARK-31712][SQL][TESTS] Check casting timestamps before the epoch to Byte/Short/Int/Long types

635feaa is described below

commit 635feaa0ef6298e336a447e1fdcfeca403b741bd
Author: Max Gekk
AuthorDate: Fri May 15 04:24:58 2020 +0000

    [SPARK-31712][SQL][TESTS] Check casting timestamps before the epoch to Byte/Short/Int/Long types

    ### What changes were proposed in this pull request?

    Added tests to check casting timestamps before 1970-01-01 00:00:00Z to ByteType, ShortType, IntegerType and LongType in ANSI and non-ANSI modes.

    ### Why are the changes needed?

    To improve test coverage and prevent errors while modifying the CAST expression code.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    By running the modified test suites:
    ```
    $ ./build/sbt "test:testOnly *CastSuite"
    ```

    Closes #28531 from MaxGekk/test-cast-timestamp-to-byte.

    Authored-by: Max Gekk
    Signed-off-by: Wenchen Fan
    (cherry picked from commit c7ce37dfa713f80c5f0157719f0e3d9bf0d271dd)
    Signed-off-by: Wenchen Fan
---
 .../spark/sql/catalyst/expressions/CastSuite.scala | 27 ++
 1 file changed, 27 insertions(+)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index 5c57843..334b43e 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -1299,6 +1299,18 @@ class CastSuite extends CastSuiteBase {
       }
     }
   }
+
+  test("cast a timestamp before the epoch 1970-01-01 00:00:00Z") {
+    withDefaultTimeZone(UTC) {
+      val negativeTs = Timestamp.valueOf("1900-05-05 18:34:56.1")
+      assert(negativeTs.getTime < 0)
+      val expectedSecs = Math.floorDiv(negativeTs.getTime, MILLIS_PER_SECOND)
+      checkEvaluation(cast(negativeTs, ByteType), expectedSecs.toByte)
+      checkEvaluation(cast(negativeTs, ShortType), expectedSecs.toShort)
+      checkEvaluation(cast(negativeTs, IntegerType), expectedSecs.toInt)
+      checkEvaluation(cast(negativeTs, LongType), expectedSecs)
+    }
+  }
 }

 /**
@@ -1341,4 +1353,19 @@ class AnsiCastSuite extends CastSuiteBase {
         cast("abc.com", dataType), "invalid input")
     }
   }
+
+  test("cast a timestamp before the epoch 1970-01-01 00:00:00Z") {
+    withDefaultTimeZone(UTC) {
+      val negativeTs = Timestamp.valueOf("1900-05-05 18:34:56.1")
+      assert(negativeTs.getTime < 0)
+      val expectedSecs = Math.floorDiv(negativeTs.getTime, MILLIS_PER_SECOND)
+      checkExceptionInExpression[ArithmeticException](
+        cast(negativeTs, ByteType), "to byte causes overflow")
+      checkExceptionInExpression[ArithmeticException](
+        cast(negativeTs, ShortType), "to short causes overflow")
+      checkExceptionInExpression[ArithmeticException](
+        cast(negativeTs, IntegerType), "to int causes overflow")
+      checkEvaluation(cast(negativeTs, LongType), expectedSecs)
+    }
+  }
 }
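The tests above use `Math.floorDiv` rather than plain integer division, and the distinction only matters for pre-epoch (negative) millisecond values: truncating division rounds toward zero, while the expected seconds value must round toward negative infinity. A small shell sketch of the same arithmetic (the millisecond values are illustrative, not the exact millis of the 1900 timestamp):

```shell
# Convert milliseconds to whole seconds, rounding toward negative infinity
# like Java's Math.floorDiv(millis, 1000). Note that shell's $(( / ))
# truncates toward zero, so negative inputs need a correction step.
floor_div_1000() {
  millis=$1
  q=$(( millis / 1000 ))
  # Truncation and floor differ only when there is a nonzero remainder and
  # the dividend is negative (the divisor 1000 is always positive here).
  if [ $(( millis % 1000 )) -ne 0 ] && [ "${millis}" -lt 0 ]; then
    q=$(( q - 1 ))
  fi
  echo "${q}"
}

floor_div_1000 -1100   # prints -2: floor(-1.1), where truncation would give -1
floor_div_1000 1100    # prints 1: floor(1.1), same as truncation
```

This is why the suite computes `expectedSecs` with `Math.floorDiv(negativeTs.getTime, MILLIS_PER_SECOND)` instead of `/`: for a timestamp before the epoch the two disagree by one second whenever the millis are not an exact multiple of 1000.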
[spark] branch master updated (cd5fbcf -> c7ce37d)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from cd5fbcf  [SPARK-31713][INFRA] Make test-dependencies.sh detect version string correctly
     add c7ce37d  [SPARK-31712][SQL][TESTS] Check casting timestamps before the epoch to Byte/Short/Int/Long types

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/CastSuite.scala | 27 ++
 1 file changed, 27 insertions(+)
[spark] branch branch-2.4 updated: [SPARK-31713][INFRA] Make test-dependencies.sh detect version string correctly
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 630f4dd  [SPARK-31713][INFRA] Make test-dependencies.sh detect version string correctly

630f4dd is described below

commit 630f4dd2b2bb1d33f8c85cc3506a55f0ad70ef7f
Author: Dongjoon Hyun
AuthorDate: Thu May 14 19:28:25 2020 -0700

    [SPARK-31713][INFRA] Make test-dependencies.sh detect version string correctly

    ### What changes were proposed in this pull request?

    This PR makes `test-dependencies.sh` detect the version string correctly by ignoring all the other lines.

    ### Why are the changes needed?

    Currently, all SBT jobs are broken like the following.
    - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console

    ```
    [error] running /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh ; received return code 1
    Build step 'Execute shell' marked build as failure
    ```

    The reason is that the script picks up stray output like `Falling back to archive.apache.org to download Maven` when `build/mvn` falls back. Specifically, `OLD_VERSION` became `Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT` instead of `3.1.0-SNAPSHOT`. The `pom.xml` file is then corrupted like the following, and the exit code becomes `1` instead of `0`, which causes the Jenkins jobs to fail:

    ```
    -3.1.0-SNAPSHOT
    +Falling
    ```

    **NO FALLBACK**
    ```
    $ build/mvn -q -Dexec.executable="echo" -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.6.0:exec
    Using `mvn` from path: /Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.6.3/bin/mvn
    3.1.0-SNAPSHOT
    ```

    **FALLBACK**
    ```
    $ build/mvn -q -Dexec.executable="echo" -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.6.0:exec
    Falling back to archive.apache.org to download Maven
    Using `mvn` from path: /Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.6.3/bin/mvn
    3.1.0-SNAPSHOT
    ```

    **In the script**
    ```
    $ echo $(build/mvn -q -Dexec.executable="echo" -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.6.0:exec)
    Using `mvn` from path: /Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.6.3/bin/mvn Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT
    ```

    This PR will filter out irrelevant logs like `Falling back to archive.apache.org to download Maven`.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the PR Builder.

    Closes #28532 from dongjoon-hyun/SPARK-31713.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit cd5fbcf9a0151f10553f67bcaa22b8122b3cf263)
    Signed-off-by: Dongjoon Hyun
---
 dev/test-dependencies.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dev/test-dependencies.sh b/dev/test-dependencies.sh
index 6b4fa71..58e9112 100755
--- a/dev/test-dependencies.sh
+++ b/dev/test-dependencies.sh
@@ -56,7 +56,7 @@ OLD_VERSION=$($MVN -q \
     -Dexec.executable="echo" \
     -Dexec.args='${project.version}' \
     --non-recursive \
-    org.codehaus.mojo:exec-maven-plugin:1.6.0:exec)
+    org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E '[0-9]+\.[0-9]+\.[0-9]+')
 if [ $? != 0 ]; then
     echo -e "Error while getting version string from Maven:\n$OLD_VERSION"
     exit 1
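The one-line `grep` change works because only the real version string matches a `digit.digit.digit` pattern, while the fallback log line contains no such sequence. A quick simulation of the polluted output (the `mvn_output` function is a stand-in for the real `build/mvn` invocation):

```shell
# Simulated output of `build/mvn -q ... exec` when the Maven download
# fell back: an informational line precedes the actual project version.
mvn_output() {
  echo 'Falling back to archive.apache.org to download Maven'
  echo '3.1.0-SNAPSHOT'
}

# Same filter as the patch: keep only lines containing an x.y.z version.
OLD_VERSION=$(mvn_output | grep -E '[0-9]+\.[0-9]+\.[0-9]+')
echo "${OLD_VERSION}"    # prints 3.1.0-SNAPSHOT
```

Note the filter is deliberately loose (it matches anywhere in a line), which is sufficient here because the stray Maven log lines carry no dotted version numbers.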
[spark] branch branch-3.0 updated (cf708f9 -> 8ba1578)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from cf708f9  Revert "[SPARK-31387] Handle unknown operation/session ID in HiveThriftServer2Listener"
     add 8ba1578  [SPARK-31713][INFRA] Make test-dependencies.sh detect version string correctly

No new revisions were added by this update.

Summary of changes:
 dev/test-dependencies.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated (bbb62c5 -> cd5fbcf)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from bbb62c5  Revert "[SPARK-31387] Handle unknown operation/session ID in HiveThriftServer2Listener"
     add cd5fbcf  [SPARK-31713][INFRA] Make test-dependencies.sh detect version string correctly

No new revisions were added by this update.

Summary of changes:
 dev/test-dependencies.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch branch-3.0 updated: Revert "[SPARK-31387] Handle unknown operation/session ID in HiveThriftServer2Listener"
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new cf708f9 Revert "[SPARK-31387] Handle unknown operation/session ID in HiveThriftServer2Listener" cf708f9 is described below commit cf708f970b640722062b3102b8757a554aa0c841 Author: Dongjoon Hyun AuthorDate: Thu May 14 12:06:13 2020 -0700 Revert "[SPARK-31387] Handle unknown operation/session ID in HiveThriftServer2Listener" This reverts commit 512cb2f0246a0d020f0ba726b4596555b15797c6. --- .../ui/HiveThriftServer2Listener.scala | 120 + .../hive/thriftserver/HiveSessionImplSuite.scala | 73 - .../ui/HiveThriftServer2ListenerSuite.scala| 16 --- .../hive/service/cli/session/HiveSessionImpl.java | 6 +- .../hive/service/cli/session/HiveSessionImpl.java | 6 +- 5 files changed, 51 insertions(+), 170 deletions(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala index 20a8f2c..6d0a506 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala @@ -25,7 +25,6 @@ import scala.collection.mutable.ArrayBuffer import org.apache.hive.service.server.HiveServer2 import org.apache.spark.{SparkConf, SparkContext} -import org.apache.spark.internal.Logging import org.apache.spark.internal.config.Status.LIVE_ENTITY_UPDATE_PERIOD import org.apache.spark.scheduler._ import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.ExecutionState @@ -39,7 +38,7 @@ private[thriftserver] class HiveThriftServer2Listener( kvstore: ElementTrackingStore, sparkConf: SparkConf, server: 
Option[HiveServer2], -live: Boolean = true) extends SparkListener with Logging { +live: Boolean = true) extends SparkListener { private val sessionList = new ConcurrentHashMap[String, LiveSessionData]() private val executionList = new ConcurrentHashMap[String, LiveExecutionData]() @@ -132,81 +131,60 @@ private[thriftserver] class HiveThriftServer2Listener( updateLiveStore(session) } - private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit = -Option(sessionList.get(e.sessionId)) match { - case None => logWarning(s"onSessionClosed called with unknown session id: ${e.sessionId}") - case Some(sessionData) => -val session = sessionData -session.finishTimestamp = e.finishTime -updateStoreWithTriggerEnabled(session) -sessionList.remove(e.sessionId) -} + private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit = { +val session = sessionList.get(e.sessionId) +session.finishTimestamp = e.finishTime +updateStoreWithTriggerEnabled(session) +sessionList.remove(e.sessionId) + } - private def onOperationStart(e: SparkListenerThriftServerOperationStart): Unit = -Option(sessionList.get(e.sessionId)) match { - case None => logWarning(s"onOperationStart called with unknown session id: ${e.sessionId}") - case Some(sessionData) => -val info = getOrCreateExecution( - e.id, - e.statement, - e.sessionId, - e.startTime, - e.userName) - -info.state = ExecutionState.STARTED -executionList.put(e.id, info) -sessionData.totalExecution += 1 -executionList.get(e.id).groupId = e.groupId -updateLiveStore(executionList.get(e.id)) -updateLiveStore(sessionData) -} + private def onOperationStart(e: SparkListenerThriftServerOperationStart): Unit = { +val info = getOrCreateExecution( + e.id, + e.statement, + e.sessionId, + e.startTime, + e.userName) + +info.state = ExecutionState.STARTED +executionList.put(e.id, info) +sessionList.get(e.sessionId).totalExecution += 1 +executionList.get(e.id).groupId = e.groupId +updateLiveStore(executionList.get(e.id)) 
+updateLiveStore(sessionList.get(e.sessionId)) + } - private def onOperationParsed(e: SparkListenerThriftServerOperationParsed): Unit = -Option(executionList.get(e.id)) match { - case None => logWarning(s"onOperationParsed called with unknown operation id: ${e.id}") - case Some(executionData) => -executionData.executePlan = e.executionPlan -executionData.state = ExecutionState.COMPILED -updateLiveStore(executionData) -} + private def onOperationParsed(e: SparkListenerThriftServerOperationParsed): Unit = { +executionList.get(e.id).executePlan = e.executionPlan
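The reverted SPARK-31387 change had wrapped each `sessionList`/`executionList` lookup in an `Option(...)` match so that an unknown ID produced a `logWarning` instead of a `NullPointerException`; the revert restores the direct `get` calls. A minimal Python sketch of that defensive-lookup pattern (hypothetical stand-in names, not the Spark listener itself):

```python
import logging

logger = logging.getLogger("listener")

class Listener:
    def __init__(self):
        self.sessions = {}  # session_id -> mutable session data

    def on_session_closed(self, session_id, finish_time):
        session = self.sessions.get(session_id)
        if session is None:
            # Mirrors: Option(sessionList.get(id)) match { case None => logWarning(...) }
            logger.warning("onSessionClosed called with unknown session id: %s", session_id)
            return
        session["finish_timestamp"] = finish_time
        del self.sessions[session_id]

listener = Listener()
listener.on_session_closed("missing", 123)  # warns instead of raising KeyError
listener.sessions["s1"] = {}
listener.on_session_closed("s1", 456)
print(listener.sessions)  # -> {}
```

The trade-off debated in the revert is visible here: the guarded version hides bookkeeping bugs behind a warning, while the direct-lookup version fails fast on an unknown ID.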
[spark] branch master updated (7ce3f76 -> bbb62c5)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7ce3f76 [SPARK-31696][DOCS][FOLLOWUP] Update version in documentation add bbb62c5 Revert "[SPARK-31387] Handle unknown operation/session ID in HiveThriftServer2Listener" No new revisions were added by this update. Summary of changes: .../ui/HiveThriftServer2Listener.scala | 120 + .../hive/thriftserver/HiveSessionImplSuite.scala | 73 - .../ui/HiveThriftServer2ListenerSuite.scala| 16 --- .../hive/service/cli/session/HiveSessionImpl.java | 6 +- .../hive/service/cli/session/HiveSessionImpl.java | 6 +- 5 files changed, 51 insertions(+), 170 deletions(-) delete mode 100644 sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveSessionImplSuite.scala
[spark] branch branch-3.0 updated: [SPARK-31696][DOCS][FOLLOWUP] Update version in documentation
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ca9cde8 [SPARK-31696][DOCS][FOLLOWUP] Update version in documentation ca9cde8 is described below commit ca9cde8e76abfdb73f0202fbf043e4d895d7163b Author: Dongjoon Hyun AuthorDate: Thu May 14 10:25:22 2020 -0700 [SPARK-31696][DOCS][FOLLOWUP] Update version in documentation # What changes were proposed in this pull request? This PR is a follow-up to fix a version of configuration document. ### Why are the changes needed? The original PR is backported to branch-3.0. ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? Manual. Closes #28530 from dongjoon-hyun/SPARK-31696-2. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 7ce3f76af6b72e88722a89f792e6fded9c586795) Signed-off-by: Dongjoon Hyun --- docs/running-on-kubernetes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 2739149..3abb891 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -815,7 +815,7 @@ See the [configuration page](configuration.html) for information on Spark config Add the Kubernetes https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/";>annotation specified by AnnotationName to the driver service. For example, spark.kubernetes.driver.service.annotation.something=true. - 3.1.0 + 3.0.0 spark.kubernetes.executor.label.[LabelName] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (e10516a -> 7ce3f76)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from e10516a [SPARK-31681][ML][PYSPARK] Python multiclass logistic regression evaluate should return LogisticRegressionSummary add 7ce3f76 [SPARK-31696][DOCS][FOLLOWUP] Update version in documentation No new revisions were added by this update. Summary of changes: docs/running-on-kubernetes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch branch-3.0 updated: [SPARK-31696][K8S] Support driver service annotation in K8S
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 541d451 [SPARK-31696][K8S] Support driver service annotation in K8S

541d451 is described below

commit 541d451cd7062f04e53d93719034877f27452a3e
Author: Dongjoon Hyun
AuthorDate: Wed May 13 13:59:42 2020 -0700

[SPARK-31696][K8S] Support driver service annotation in K8S

### What changes were proposed in this pull request?

This PR aims to add `spark.kubernetes.driver.service.annotation` like `spark.kubernetes.driver.annotation`.

### Why are the changes needed?

Annotations are used in many ways. One example is that the Prometheus monitoring system discovers metric endpoints via annotations.
- https://github.com/helm/charts/tree/master/stable/prometheus#scraping-pod-metrics-via-annotations

### Does this PR introduce _any_ user-facing change?

Yes. The documentation is added.

### How was this patch tested?

Pass Jenkins with the updated unit tests.

Closes #28518 from dongjoon-hyun/SPARK-31696.
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit c8f3bd861d96cf3f7b01cd9f864c181a57e1c77a) Signed-off-by: Dongjoon Hyun --- docs/running-on-kubernetes.md | 13 +++-- .../src/main/scala/org/apache/spark/deploy/k8s/Config.scala | 1 + .../scala/org/apache/spark/deploy/k8s/KubernetesConf.scala | 5 + .../deploy/k8s/features/DriverServiceFeatureStep.scala | 1 + .../org/apache/spark/deploy/k8s/KubernetesTestConf.scala| 2 ++ .../deploy/k8s/features/DriverServiceFeatureStepSuite.scala | 13 +++-- 6 files changed, 31 insertions(+), 4 deletions(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 4f228a5..2739149 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -803,12 +803,21 @@ See the [configuration page](configuration.html) for information on Spark config spark.kubernetes.driver.annotation.[AnnotationName] (none) -Add the annotation specified by AnnotationName to the driver pod. +Add the Kubernetes https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/";>annotation specified by AnnotationName to the driver pod. For example, spark.kubernetes.driver.annotation.something=true. 2.3.0 + spark.kubernetes.driver.service.annotation.[AnnotationName] + (none) + +Add the Kubernetes https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/";>annotation specified by AnnotationName to the driver service. +For example, spark.kubernetes.driver.service.annotation.something=true. + + 3.1.0 + + spark.kubernetes.executor.label.[LabelName] (none) @@ -823,7 +832,7 @@ See the [configuration page](configuration.html) for information on Spark config spark.kubernetes.executor.annotation.[AnnotationName] (none) -Add the annotation specified by AnnotationName to the executor pods. +Add the Kubernetes https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/";>annotation specified by AnnotationName to the executor pods. 
For example, spark.kubernetes.executor.annotation.something=true. 2.3.0 diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala index 8684a60..22f4c75 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala @@ -411,6 +411,7 @@ private[spark] object Config extends Logging { val KUBERNETES_DRIVER_LABEL_PREFIX = "spark.kubernetes.driver.label." val KUBERNETES_DRIVER_ANNOTATION_PREFIX = "spark.kubernetes.driver.annotation." + val KUBERNETES_DRIVER_SERVICE_ANNOTATION_PREFIX = "spark.kubernetes.driver.service.annotation." val KUBERNETES_DRIVER_SECRETS_PREFIX = "spark.kubernetes.driver.secrets." val KUBERNETES_DRIVER_SECRET_KEY_REF_PREFIX = "spark.kubernetes.driver.secretKeyRef." val KUBERNETES_DRIVER_VOLUMES_PREFIX = "spark.kubernetes.driver.volumes." diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala index 09943b7..6dcae3e 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala @@ -106,6 +106,11 @@ private[spark] class KubernetesDr
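Prefixed settings such as `spark.kubernetes.driver.service.annotation.[AnnotationName]` are resolved by scanning the configuration for keys that start with the prefix and treating the remainder of each key as the annotation name. A rough Python sketch of that resolution (illustrative only; the real logic lives in `KubernetesConf` in Scala):

```python
# Prefix constant mirrors KUBERNETES_DRIVER_SERVICE_ANNOTATION_PREFIX from Config.scala
KUBERNETES_DRIVER_SERVICE_ANNOTATION_PREFIX = "spark.kubernetes.driver.service.annotation."

def get_prefixed_settings(conf: dict, prefix: str) -> dict:
    """Collect all entries whose key starts with `prefix`, keyed by the
    remainder of the key (the AnnotationName part)."""
    return {k[len(prefix):]: v for k, v in conf.items() if k.startswith(prefix)}

conf = {
    "spark.app.name": "demo",
    "spark.kubernetes.driver.service.annotation.prometheus.io/scrape": "true",
    "spark.kubernetes.driver.service.annotation.prometheus.io/port": "4040",
}
annotations = get_prefixed_settings(conf, KUBERNETES_DRIVER_SERVICE_ANNOTATION_PREFIX)
print(annotations)
# -> {'prometheus.io/scrape': 'true', 'prometheus.io/port': '4040'}
```

The resulting map is what the driver-service feature step would stamp onto the Kubernetes Service object as annotations.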
[spark] branch branch-3.0 updated: [SPARK-31681][ML][PYSPARK] Python multiclass logistic regression evaluate should return LogisticRegressionSummary
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 6834f46 [SPARK-31681][ML][PYSPARK] Python multiclass logistic regression evaluate should return LogisticRegressionSummary

6834f46 is described below

commit 6834f4691b3e2603d410bfe24f0db0b6e3a36446
Author: Huaxin Gao
AuthorDate: Thu May 14 10:54:35 2020 -0500

[SPARK-31681][ML][PYSPARK] Python multiclass logistic regression evaluate should return LogisticRegressionSummary

### What changes were proposed in this pull request?

Return LogisticRegressionSummary for multiclass logistic regression evaluate in PySpark.

### Why are the changes needed?

Currently we have:
```
since("2.0.0")
def evaluate(self, dataset):
    if not isinstance(dataset, DataFrame):
        raise ValueError("dataset must be a DataFrame but got %s." % type(dataset))
    java_blr_summary = self._call_java("evaluate", dataset)
    return BinaryLogisticRegressionSummary(java_blr_summary)
```
We should return LogisticRegressionSummary for multiclass logistic regression.

### Does this PR introduce _any_ user-facing change?

Yes, return LogisticRegressionSummary instead of BinaryLogisticRegressionSummary for multiclass logistic regression in Python.

### How was this patch tested?

Unit test.

Closes #28503 from huaxingao/lr_summary.
Authored-by: Huaxin Gao Signed-off-by: Sean Owen (cherry picked from commit e10516ae63cfc58f2d493e4d3f19940d45c8f033) Signed-off-by: Sean Owen --- python/pyspark/ml/classification.py | 5 - python/pyspark/ml/tests/test_training_summary.py | 6 +- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/python/pyspark/ml/classification.py b/python/pyspark/ml/classification.py index 1436b78..424e16a 100644 --- a/python/pyspark/ml/classification.py +++ b/python/pyspark/ml/classification.py @@ -831,7 +831,10 @@ class LogisticRegressionModel(JavaProbabilisticClassificationModel, _LogisticReg if not isinstance(dataset, DataFrame): raise ValueError("dataset must be a DataFrame but got %s." % type(dataset)) java_blr_summary = self._call_java("evaluate", dataset) -return BinaryLogisticRegressionSummary(java_blr_summary) +if self.numClasses <= 2: +return BinaryLogisticRegressionSummary(java_blr_summary) +else: +return LogisticRegressionSummary(java_blr_summary) class LogisticRegressionSummary(JavaWrapper): diff --git a/python/pyspark/ml/tests/test_training_summary.py b/python/pyspark/ml/tests/test_training_summary.py index 1d19ebf..b505409 100644 --- a/python/pyspark/ml/tests/test_training_summary.py +++ b/python/pyspark/ml/tests/test_training_summary.py @@ -21,7 +21,8 @@ import unittest if sys.version > '3': basestring = str -from pyspark.ml.classification import LogisticRegression +from pyspark.ml.classification import BinaryLogisticRegressionSummary, LogisticRegression, \ +LogisticRegressionSummary from pyspark.ml.clustering import BisectingKMeans, GaussianMixture, KMeans from pyspark.ml.linalg import Vectors from pyspark.ml.regression import GeneralizedLinearRegression, LinearRegression @@ -149,6 +150,7 @@ class TrainingSummaryTest(SparkSessionTestCase): # test evaluation (with training dataset) produces a summary with same values # one check is enough to verify a summary is returned, Scala version runs full test sameSummary = model.evaluate(df) 
+self.assertTrue(isinstance(sameSummary, BinaryLogisticRegressionSummary)) self.assertAlmostEqual(sameSummary.areaUnderROC, s.areaUnderROC) def test_multiclass_logistic_regression_summary(self): @@ -187,6 +189,8 @@ class TrainingSummaryTest(SparkSessionTestCase): # test evaluation (with training dataset) produces a summary with same values # one check is enough to verify a summary is returned, Scala version runs full test sameSummary = model.evaluate(df) +self.assertTrue(isinstance(sameSummary, LogisticRegressionSummary)) +self.assertFalse(isinstance(sameSummary, BinaryLogisticRegressionSummary)) self.assertAlmostEqual(sameSummary.accuracy, s.accuracy) def test_gaussian_mixture_summary(self): - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
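The patch dispatches on `numClasses`: binary models get the richer `BinaryLogisticRegressionSummary`, everything else the base `LogisticRegressionSummary`. A stripped-down sketch of that dispatch, using plain Python stand-ins rather than the real PySpark wrapper classes:

```python
class LogisticRegressionSummary:
    """Stand-in for the PySpark base summary class."""
    def __init__(self, java_summary):
        self.java_summary = java_summary

class BinaryLogisticRegressionSummary(LogisticRegressionSummary):
    """Stand-in subclass; the real one adds binary-only metrics like areaUnderROC."""
    pass

def make_summary(num_classes, java_summary):
    # Mirrors the patched evaluate(): binary -> subclass, multiclass -> base class
    if num_classes <= 2:
        return BinaryLogisticRegressionSummary(java_summary)
    return LogisticRegressionSummary(java_summary)

binary = make_summary(2, object())
multi = make_summary(3, object())
print(isinstance(binary, BinaryLogisticRegressionSummary))  # -> True
print(isinstance(multi, BinaryLogisticRegressionSummary))   # -> False
print(isinstance(multi, LogisticRegressionSummary))         # -> True
```

This is exactly what the added assertions in test_training_summary.py verify: a multiclass model's `evaluate` result is a `LogisticRegressionSummary` but not a `BinaryLogisticRegressionSummary`.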
[spark] branch master updated (b2300fc -> e10516a)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b2300fc [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0) add e10516a [SPARK-31681][ML][PYSPARK] Python multiclass logistic regression evaluate should return LogisticRegressionSummary No new revisions were added by this update. Summary of changes: python/pyspark/ml/classification.py | 5 - python/pyspark/ml/tests/test_training_summary.py | 6 +- 2 files changed, 9 insertions(+), 2 deletions(-)
[spark] branch branch-2.4 updated: [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 1ea5844 [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)

1ea5844 is described below

commit 1ea584443e9372a6a0b3c8449f5bf7e9e1369b0d
Author: Weichen Xu
AuthorDate: Thu May 14 09:24:40 2020 -0500

[SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)

In QuantileDiscretizer.getDistinctSplits, before invoking distinct, normalize all -0.0 and 0.0 to be 0.0:
```
for (i <- 0 until splits.length) {
  if (splits(i) == -0.0) {
    splits(i) = 0.0
  }
}
```
Fix bug. No unit test.

~~~scala
import scala.util.Random
val rng = new Random(3)
val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0)
import spark.implicits._
val df1 = sc.parallelize(a1, 2).toDF("id")
import org.apache.spark.ml.feature.QuantileDiscretizer
val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0)
val model = qd.fit(df1) // will raise error in spark master.
~~~

In Scala, `0.0 == -0.0` is true but `0.0.hashCode == -0.0.hashCode()` is false. This breaks the contract between equals() and hashCode(): if two objects are equal, then they must have the same hash code. `array.distinct` relies on elem.hashCode, so it leads to this error.

Test code on distinct:
```
import scala.util.Random
val rng = new Random(3)
val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0)
a1.distinct.sorted.foreach(x => print(x.toString + "\n"))
```
Then you will see output like:
```
...
-0.009292684662246975
-0.0033280686465135823
-0.0
0.0
0.0022219556032221366
0.02217419561977274
``` Closes #28498 from WeichenXu123/SPARK-31676. Authored-by: Weichen Xu Signed-off-by: Sean Owen (cherry picked from commit b2300fca1e1a22d74c6eeda37942920a6c6299ff) Signed-off-by: Sean Owen --- .../apache/spark/ml/feature/QuantileDiscretizer.scala | 12 .../spark/ml/feature/QuantileDiscretizerSuite.scala| 18 ++ 2 files changed, 30 insertions(+) diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala index 56e2c54..f3ec358 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala @@ -243,6 +243,18 @@ final class QuantileDiscretizer @Since("1.6.0") (@Since("1.6.0") override val ui private def getDistinctSplits(splits: Array[Double]): Array[Double] = { splits(0) = Double.NegativeInfinity splits(splits.length - 1) = Double.PositiveInfinity + +// 0.0 and -0.0 are distinct values, array.distinct will preserve both of them. +// but 0.0 > -0.0 is False which will break the parameter validation checking. +// and in scala <= 2.12, there's bug which will cause array.distinct generate +// non-deterministic results when array contains both 0.0 and -0.0 +// So that here we should first normalize all 0.0 and -0.0 to be 0.0 +// See https://github.com/scala/bug/issues/11995 +for (i <- 0 until splits.length) { + if (splits(i) == -0.0) { +splits(i) = 0.0 + } +} val distinctSplits = splits.distinct if (splits.length != distinctSplits.length) { log.warn(s"Some quantiles were identical. 
Bucketing to ${distinctSplits.length - 1}" + diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala index b009038..9c37416 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala @@ -443,4 +443,22 @@ class QuantileDiscretizerSuite extends MLTest with DefaultReadWriteTest { discretizer.fit(df) } } + + test("[SPARK-31676] QuantileDiscretizer raise error parameter splits given invalid value") { +import scala.util.Random +val rng = new Random(3) + +val a1 = Array.tabulate(200)(_ => rng.nextDouble * 2.0 - 1.0) ++ + Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) + +val df1 = sc.parallelize(a1, 2).toDF("id") + +val qd = new QuantileDiscretizer() + .setInputCol("id") + .se
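The equals()/hashCode() mismatch described in this commit message can be reproduced outside Spark. The sketch below is illustrative (the object name `ZeroHashDemo` is not part of the patch) and relies only on standard JVM boxing of `Double`:

```scala
object ZeroHashDemo {
  def main(args: Array[String]): Unit = {
    val pos = 0.0
    val neg = -0.0
    // IEEE 754 comparison treats the two zeros as equal ...
    println(s"0.0 == -0.0   : ${pos == neg}")
    // ... but java.lang.Double.hashCode is derived from the raw bit pattern,
    // and -0.0 differs from 0.0 only in the sign bit, so the hashes disagree.
    println(s"same hashCode : ${pos.hashCode == neg.hashCode}")
    assert(pos == neg && pos.hashCode != neg.hashCode)
  }
}
```

Any hash-based deduplication such as `Array.distinct` can therefore keep both zeros, which is exactly what breaks the strictly-increasing-splits validation.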
[spark] branch branch-3.0 updated: [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 00e6acc [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0) 00e6acc is described below commit 00e6acc9c6d45c5dd3b3f70c87909743a8073dba Author: Weichen Xu AuthorDate: Thu May 14 09:24:40 2020 -0500 [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0) ### What changes were proposed in this pull request? In QuantileDiscretizer.getDistinctSplits, before invoking distinct, normalize all -0.0 and 0.0 to be 0.0 ``` for (i <- 0 until splits.length) { if (splits(i) == -0.0) { splits(i) = 0.0 } } ``` ### Why are the changes needed? Fix bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test. Manually test: ~~~scala import scala.util.Random val rng = new Random(3) val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) import spark.implicits._ val df1 = sc.parallelize(a1, 2).toDF("id") import org.apache.spark.ml.feature.QuantileDiscretizer val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) val model = qd.fit(df1) // will raise error in spark master. ~~~ ### Explain In Scala, `0.0 == -0.0` is true but `0.0.hashCode == -0.0.hashCode()` is false. This breaks the contract between equals() and hashCode(): if two objects are equal, then they must have the same hash code. And array.distinct relies on elem.hashCode, which leads to this error. 
Test code on distinct ``` import scala.util.Random val rng = new Random(3) val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) a1.distinct.sorted.foreach(x => print(x.toString + "\n")) ``` Then you will see output like: ``` ... -0.009292684662246975 -0.0033280686465135823 -0.0 0.0 0.0022219556032221366 0.02217419561977274 ... ``` Closes #28498 from WeichenXu123/SPARK-31676. Authored-by: Weichen Xu Signed-off-by: Sean Owen (cherry picked from commit b2300fca1e1a22d74c6eeda37942920a6c6299ff) Signed-off-by: Sean Owen --- .../apache/spark/ml/feature/QuantileDiscretizer.scala | 12 .../spark/ml/feature/QuantileDiscretizerSuite.scala| 18 ++ 2 files changed, 30 insertions(+) diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala index 216d99d..4eedfc4 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala @@ -236,6 +236,18 @@ final class QuantileDiscretizer @Since("1.6.0") (@Since("1.6.0") override val ui private def getDistinctSplits(splits: Array[Double]): Array[Double] = { splits(0) = Double.NegativeInfinity splits(splits.length - 1) = Double.PositiveInfinity + +// 0.0 and -0.0 are distinct values, array.distinct will preserve both of them. +// but 0.0 > -0.0 is False which will break the parameter validation checking. +// and in scala <= 2.12, there's bug which will cause array.distinct generate +// non-deterministic results when array contains both 0.0 and -0.0 +// So that here we should first normalize all 0.0 and -0.0 to be 0.0 +// See https://github.com/scala/bug/issues/11995 +for (i <- 0 until splits.length) { + if (splits(i) == -0.0) { +splits(i) = 0.0 + } +} val distinctSplits = splits.distinct if (splits.length != distinctSplits.length) { log.warn(s"Some quantiles were identical. 
Bucketing to ${distinctSplits.length - 1}" + diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala index 6f6ab26..682b87a 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala @@ -512,4 +512,22 @@ class QuantileDiscretizerSuite extends MLTest with DefaultReadWriteTest { assert(observedNumBuckets === numBuckets, "Observed number of buckets does not equal expected number of buckets.") } + + test("[SPARK-31676] QuantileDiscretizer raise error parameter splits given
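The normalization loop added by this patch can be exercised standalone. This is a minimal sketch with illustrative names, not the actual `getDistinctSplits` code:

```scala
object NormalizeZeroDemo {
  // Rewrite every -0.0 in place to 0.0 before deduplicating, as the patch
  // does in getDistinctSplits. Note that -0.0 == 0.0 is true, so the branch
  // also matches plain 0.0 and rewrites it harmlessly.
  def normalize(splits: Array[Double]): Array[Double] = {
    for (i <- splits.indices) {
      if (splits(i) == -0.0) splits(i) = 0.0
    }
    splits
  }

  def main(args: Array[String]): Unit = {
    val splits = Array(Double.NegativeInfinity, -0.5, -0.0, 0.0, 0.5, Double.PositiveInfinity)
    val distinct = normalize(splits).distinct
    // After normalization both zeros carry the same bit pattern, so distinct
    // keeps exactly one of them and validation sees strictly increasing splits.
    println(distinct.mkString(", "))
  }
}
```

Because the normalized zeros are bitwise identical, `distinct` collapses them regardless of how it hashes elements, which sidesteps the scala/bug#11995 non-determinism.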
[spark] branch master updated (ddbce4e -> b2300fc)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ddbce4e [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination … add b2300fc [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0) No new revisions were added by this update. Summary of changes: .../apache/spark/ml/feature/QuantileDiscretizer.scala | 12 .../spark/ml/feature/QuantileDiscretizerSuite.scala| 18 ++ 2 files changed, 30 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new f5cf11c [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination … f5cf11c is described below commit f5cf11c4d39190f7f5f8a20c8c634c0dc2d6c212 Author: sunke.03 AuthorDate: Thu May 14 13:55:24 2020 + [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination … ### What changes were proposed in this pull request? This PR tries to fix a bug in `org.apache.spark.sql.hive.execution.ScriptTransformationExec`. This bug appeared in our online cluster. `ScriptTransformationExec` should throw an exception when the user's Python script contains a parse error, but the current implementation may miss this case of failure. ### Why are the changes needed? When the user's Python script contains a parse error, there will be no output. So `scriptOutputReader.next(scriptOutputWritable) <= 0` matches, then we use `checkFailureAndPropagate()` to check the `proc`. But the `proc` may still be alive and `writerThread.exception` is not defined, so `checkFailureAndPropagate` cannot detect this case of failure. In the end, the Spark SQL job runs successfully and returns no result. In fact, the Spark SQL job should fail and show the except [...] For example, the failing Python script is below. ``` python # encoding: utf8 import unknow_module import sys for line in sys.stdin: print line ``` The bug can be reproduced by running the following code in our cluster. ``` spark.range(100*100).toDF("index").createOrReplaceTempView("test") spark.sql("select TRANSFORM(index) USING 'python error_python.py' as new_index from test").collect.foreach(println) ``` ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing UT Closes #27724 from slamke/transformation. 
Authored-by: sunke.03 Signed-off-by: Wenchen Fan (cherry picked from commit ddbce4edee6d4de30e6900bc0f03728a989aef0a) Signed-off-by: Wenchen Fan --- .../org/apache/spark/sql/internal/SQLConf.scala| 9 ++ .../hive/execution/ScriptTransformationExec.scala | 12 +++- .../hive/execution/ScriptTransformationSuite.scala | 36 ++ 3 files changed, 56 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 6c18280..31038a0e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -2561,6 +2561,15 @@ object SQLConf { .booleanConf .createWithDefault(false) + val SCRIPT_TRANSFORMATION_EXIT_TIMEOUT = +buildConf("spark.sql.scriptTransformation.exitTimeoutInSeconds") + .internal() + .doc("Timeout for executor to wait for the termination of transformation script when EOF.") + .version("3.0.0") + .timeConf(TimeUnit.SECONDS) + .checkValue(_ > 0, "The timeout value must be positive") + .createWithDefault(10L) + /** * Holds information about keys that have been deprecated. 
* diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala index 40f7b4e..c7183fd 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala @@ -20,6 +20,7 @@ package org.apache.spark.sql.hive.execution import java.io._ import java.nio.charset.StandardCharsets import java.util.Properties +import java.util.concurrent.TimeUnit import javax.annotation.Nullable import scala.collection.JavaConverters._ @@ -42,6 +43,7 @@ import org.apache.spark.sql.catalyst.plans.physical.Partitioning import org.apache.spark.sql.execution._ import org.apache.spark.sql.hive.HiveInspectors import org.apache.spark.sql.hive.HiveShim._ +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.DataType import org.apache.spark.util.{CircularBuffer, RedirectThread, SerializableConfiguration, Utils} @@ -136,6 +138,15 @@ case class ScriptTransformationExec( throw writerThread.exception.get } + // There can be a lag between reader read EOF and the process termination. + // If the script fails to startup, this kind of error may be missed. + // So explicitly waiting for the process termination. + val timeout = conf.getConf(SQLConf.
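The gist of the fix can be sketched without Spark: after the reader sees EOF, wait a bounded time for the child process and check its exit code, so a script that fails at startup surfaces as an error instead of an empty result. The object and method names below are illustrative, and the sketch assumes a POSIX `sh` is available; the real code reads the timeout from `spark.sql.scriptTransformation.exitTimeoutInSeconds`:

```scala
import java.util.concurrent.TimeUnit

object ScriptExitCheckDemo {
  // Bounded wait for the child process, mirroring the patched logic:
  // a hung or crashed script raises an error instead of silently
  // yielding zero rows.
  def awaitExitCode(proc: Process, timeoutSeconds: Long): Int = {
    if (!proc.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
      proc.destroyForcibly()
      throw new IllegalStateException("Transformation script did not terminate in time")
    }
    proc.exitValue()
  }

  def main(args: Array[String]): Unit = {
    // A script that fails on startup and produces no output, like the
    // broken Python script in the commit message.
    val failing = new ProcessBuilder("sh", "-c", "exit 3").start()
    val code = awaitExitCode(failing, 10L)
    if (code != 0) {
      println(s"script failed with exit code $code") // the job now fails loudly
    }
  }
}
```

The key design point is that EOF on the script's stdout alone is ambiguous — it looks the same for a successful empty result and a crashed interpreter — so only the exit code, obtained after the process actually terminates, can disambiguate the two.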
[spark] branch master updated (7260146 -> ddbce4e)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7260146 [SPARK-31692][SQL] Pass hadoop confs specifed via Spark confs to URLStreamHandlerfactory add ddbce4e [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination … No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/internal/SQLConf.scala| 9 ++ .../hive/execution/ScriptTransformationExec.scala | 12 +++- .../hive/execution/ScriptTransformationSuite.scala | 36 ++ 3 files changed, 56 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org