[spark] branch branch-3.0 updated (635feaa -> d270be4)

2020-05-14 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 635feaa  [SPARK-31712][SQL][TESTS] Check casting timestamps before the 
epoch to Byte/Short/Int/Long types
 add d270be4  [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that 
sometimes varies single derby instance standard

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala  | 10 ++
 1 file changed, 10 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

2020-05-14 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new d270be4  [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that 
sometimes varies single derby instance standard
d270be4 is described below

commit d270be43e38f1b44a99c4035cc1829751c223346
Author: Kent Yao 
AuthorDate: Fri May 15 06:36:34 2020 +

[SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies 
single derby instance standard

### What changes were proposed in this pull request?


https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122622/testReport/junit/org.apache.spark.sql.hive.thriftserver/SparkSQLEnvSuite/SPARK_29604_external_listeners_should_be_initialized_with_Spark_classloader/history/?start=25

According to the test report history of SparkSQLEnvSuite, this test fails frequently because of the single Derby instance restriction.

```java
Caused by: sbt.ForkMain$ForkError: org.apache.derby.iapi.error.StandardException: Another instance of Derby may have already booted the database /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db.
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
    at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
    ... 138 more
```

This PR adds a separate directory to locate the metastore_db for this test, which runs in a dedicated JVM.

Besides, it disables the UI to avoid a potential race on `spark.ui.port`, which may also make the test case flaky. A sketch of both ideas follows below.
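
A minimal sketch of the isolation described above (hedged: the property values here are illustrative assumptions, not the suite's exact settings; `javax.jdo.option.ConnectionURL` is the standard Hive metastore JDBC property and `spark.ui.enabled` is the standard UI switch):

```scala
import java.nio.file.Files
import org.apache.spark.SparkConf

// Give this suite its own Derby location instead of the shared ./metastore_db,
// and skip the UI entirely so spark.ui.port can never collide.
val metastoreDir = Files.createTempDirectory("sparksqlenv-metastore-").toFile
val conf = new SparkConf()
  .set("spark.hadoop.javax.jdo.option.ConnectionURL",
    s"jdbc:derby:;databaseName=$metastoreDir/metastore_db;create=true")
  .set("spark.ui.enabled", "false")
```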

### Why are the changes needed?
test fix

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
SparkSQLEnvSuite itself.

Closes #28537 from yaooqinn/SPARK-31715.

Authored-by: Kent Yao 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 503faa24d33eb52da79ad99e39c8c011597499ea)
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala  | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala
index ffd1fc4..f28faea 100644
--- a/sql/hive-thriftser

[spark] branch master updated (c7ce37d -> 503faa2)

2020-05-14 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c7ce37d  [SPARK-31712][SQL][TESTS] Check casting timestamps before the 
epoch to Byte/Short/Int/Long types
 add 503faa2  [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that 
sometimes varies single derby instance standard

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala  | 10 ++
 1 file changed, 10 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

2020-05-14 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 503faa2  [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that 
sometimes varies single derby instance standard
503faa2 is described below

commit 503faa24d33eb52da79ad99e39c8c011597499ea
Author: Kent Yao 
AuthorDate: Fri May 15 06:36:34 2020 +

[SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies 
single derby instance standard

### What changes were proposed in this pull request?


https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122622/testReport/junit/org.apache.spark.sql.hive.thriftserver/SparkSQLEnvSuite/SPARK_29604_external_listeners_should_be_initialized_with_Spark_classloader/history/?start=25

According to the test report history of SparkSQLEnvSuite, this test fails frequently because of the single Derby instance restriction.

```java
Caused by: sbt.ForkMain$ForkError: org.apache.derby.iapi.error.StandardException: Another instance of Derby may have already booted the database /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db.
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
    at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
    ... 138 more
```

This PR adds a separate directory to locate the metastore_db for this test, which runs in a dedicated JVM.

Besides, it disables the UI to avoid a potential race on `spark.ui.port`, which may also make the test case flaky.

### Why are the changes needed?
test fix

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
SparkSQLEnvSuite itself.

Closes #28537 from yaooqinn/SPARK-31715.

Authored-by: Kent Yao 
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala  | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala
index ffd1fc4..f28faea 100644
--- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala
+++ b/sql/hive-thriftserver/src

[spark] branch branch-3.0 updated: [SPARK-31712][SQL][TESTS] Check casting timestamps before the epoch to Byte/Short/Int/Long types

2020-05-14 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 635feaa  [SPARK-31712][SQL][TESTS] Check casting timestamps before the 
epoch to Byte/Short/Int/Long types
635feaa is described below

commit 635feaa0ef6298e336a447e1fdcfeca403b741bd
Author: Max Gekk 
AuthorDate: Fri May 15 04:24:58 2020 +

[SPARK-31712][SQL][TESTS] Check casting timestamps before the epoch to 
Byte/Short/Int/Long types

### What changes were proposed in this pull request?
Added tests to check casting timestamps before 1970-01-01 00:00:00Z to ByteType, ShortType, IntegerType and LongType in ANSI and non-ANSI modes.
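
Note that the expected seconds are computed with `Math.floorDiv` rather than plain division: for a pre-epoch timestamp `getTime` is negative, and the two round differently. A minimal illustration (the sample value below is hypothetical, not from the suite):

```scala
// 0.1s past one hour before the epoch (hypothetical sample value).
val millis = -3600L * 1000 + 100            // -3599900 ms
val floored = Math.floorDiv(millis, 1000L)  // -3600: rounds toward negative infinity
val truncated = millis / 1000L              // -3599: rounds toward zero, off by one second
```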

### Why are the changes needed?
To improve test coverage and prevent errors while modifying the CAST 
expression code.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the modified test suites:
```
$ ./build/sbt "test:testOnly *CastSuite"
```

Closes #28531 from MaxGekk/test-cast-timestamp-to-byte.

Authored-by: Max Gekk 
Signed-off-by: Wenchen Fan 
(cherry picked from commit c7ce37dfa713f80c5f0157719f0e3d9bf0d271dd)
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/expressions/CastSuite.scala | 27 ++
 1 file changed, 27 insertions(+)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index 5c57843..334b43e 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -1299,6 +1299,18 @@ class CastSuite extends CastSuiteBase {
       }
     }
   }
+
+  test("cast a timestamp before the epoch 1970-01-01 00:00:00Z") {
+    withDefaultTimeZone(UTC) {
+      val negativeTs = Timestamp.valueOf("1900-05-05 18:34:56.1")
+      assert(negativeTs.getTime < 0)
+      val expectedSecs = Math.floorDiv(negativeTs.getTime, MILLIS_PER_SECOND)
+      checkEvaluation(cast(negativeTs, ByteType), expectedSecs.toByte)
+      checkEvaluation(cast(negativeTs, ShortType), expectedSecs.toShort)
+      checkEvaluation(cast(negativeTs, IntegerType), expectedSecs.toInt)
+      checkEvaluation(cast(negativeTs, LongType), expectedSecs)
+    }
+  }
 }
 
 /**
@@ -1341,4 +1353,19 @@ class AnsiCastSuite extends CastSuiteBase {
         cast("abc.com", dataType), "invalid input")
     }
   }
+
+  test("cast a timestamp before the epoch 1970-01-01 00:00:00Z") {
+    withDefaultTimeZone(UTC) {
+      val negativeTs = Timestamp.valueOf("1900-05-05 18:34:56.1")
+      assert(negativeTs.getTime < 0)
+      val expectedSecs = Math.floorDiv(negativeTs.getTime, MILLIS_PER_SECOND)
+      checkExceptionInExpression[ArithmeticException](
+        cast(negativeTs, ByteType), "to byte causes overflow")
+      checkExceptionInExpression[ArithmeticException](
+        cast(negativeTs, ShortType), "to short causes overflow")
+      checkExceptionInExpression[ArithmeticException](
+        cast(negativeTs, IntegerType), "to int causes overflow")
+      checkEvaluation(cast(negativeTs, LongType), expectedSecs)
+    }
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (cd5fbcf -> c7ce37d)

2020-05-14 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cd5fbcf  [SPARK-31713][INFRA] Make test-dependencies.sh detect version 
string correctly
 add c7ce37d  [SPARK-31712][SQL][TESTS] Check casting timestamps before the 
epoch to Byte/Short/Int/Long types

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/CastSuite.scala | 27 ++
 1 file changed, 27 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-31713][INFRA] Make test-dependencies.sh detect version string correctly

2020-05-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 630f4dd  [SPARK-31713][INFRA] Make test-dependencies.sh detect version 
string correctly
630f4dd is described below

commit 630f4dd2b2bb1d33f8c85cc3506a55f0ad70ef7f
Author: Dongjoon Hyun 
AuthorDate: Thu May 14 19:28:25 2020 -0700

[SPARK-31713][INFRA] Make test-dependencies.sh detect version string 
correctly

### What changes were proposed in this pull request?

This PR makes `test-dependencies.sh` detect the version string correctly by 
ignoring all the other lines.

### Why are the changes needed?

Currently, all SBT jobs are broken like the following.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console
```
[error] running /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh ; received return code 1
Build step 'Execute shell' marked build as failure
```

The reason is that the script detects the old version as `Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT` when `build/mvn` falls back.

Specifically, in the script, `OLD_VERSION` became `Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT` instead of `3.1.0-SNAPSHOT` when `build/mvn` fell back. Then the `pom.xml` file was corrupted like the following at the end, and the exit code became `1` instead of `0`, which caused the Jenkins jobs to fail:
```
-3.1.0-SNAPSHOT
+Falling
```

**NO FALLBACK**
```
$ build/mvn -q -Dexec.executable="echo" -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.6.0:exec
Using `mvn` from path: /Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.6.3/bin/mvn
3.1.0-SNAPSHOT
```

**FALLBACK**
```
$ build/mvn -q -Dexec.executable="echo" -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.6.0:exec
Falling back to archive.apache.org to download Maven
Using `mvn` from path: /Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.6.3/bin/mvn
3.1.0-SNAPSHOT
```

**In the script**
```
$ echo $(build/mvn -q -Dexec.executable="echo" -Dexec.args='${project.version}' --non-recursive org.codehaus.mojo:exec-maven-plugin:1.6.0:exec)
Using `mvn` from path: /Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.6.3/bin/mvn
Falling back to archive.apache.org to download Maven 3.1.0-SNAPSHOT
```

This PR will prevent irrelevant logs like `Falling back to 
archive.apache.org to download Maven`.
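
The same filtering idea, sketched in Scala under the assumption that only the fallback banner and the version string reach the captured stdout (the "Using `mvn` from path" line is printed separately):

```scala
// Hypothetical stdout of build/mvn when the fallback banner leaks in.
val stdoutLines = Seq(
  "Falling back to archive.apache.org to download Maven",
  "3.1.0-SNAPSHOT")

// Keep only lines that look like a version string, as `grep -E` does above.
val versionRe = raw"[0-9]+\.[0-9]+\.[0-9]+".r
val oldVersion = stdoutLines.filter(l => versionRe.findFirstIn(l).isDefined).mkString
// oldVersion == "3.1.0-SNAPSHOT"
```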

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the PR Builder.

Closes #28532 from dongjoon-hyun/SPARK-31713.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit cd5fbcf9a0151f10553f67bcaa22b8122b3cf263)
Signed-off-by: Dongjoon Hyun 
---
 dev/test-dependencies.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dev/test-dependencies.sh b/dev/test-dependencies.sh
index 6b4fa71..58e9112 100755
--- a/dev/test-dependencies.sh
+++ b/dev/test-dependencies.sh
@@ -56,7 +56,7 @@ OLD_VERSION=$($MVN -q \
     -Dexec.executable="echo" \
     -Dexec.args='${project.version}' \
     --non-recursive \
-    org.codehaus.mojo:exec-maven-plugin:1.6.0:exec)
+    org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E '[0-9]+\.[0-9]+\.[0-9]+')
 if [ $? != 0 ]; then
   echo -e "Error while getting version string from Maven:\n$OLD_VERSION"
   exit 1


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (cf708f9 -> 8ba1578)

2020-05-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cf708f9  Revert "[SPARK-31387] Handle unknown operation/session ID in 
HiveThriftServer2Listener"
 add 8ba1578  [SPARK-31713][INFRA] Make test-dependencies.sh detect version 
string correctly

No new revisions were added by this update.

Summary of changes:
 dev/test-dependencies.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (bbb62c5 -> cd5fbcf)

2020-05-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bbb62c5  Revert "[SPARK-31387] Handle unknown operation/session ID in 
HiveThriftServer2Listener"
 add cd5fbcf  [SPARK-31713][INFRA] Make test-dependencies.sh detect version 
string correctly

No new revisions were added by this update.

Summary of changes:
 dev/test-dependencies.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: Revert "[SPARK-31387] Handle unknown operation/session ID in HiveThriftServer2Listener"

2020-05-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new cf708f9  Revert "[SPARK-31387] Handle unknown operation/session ID in 
HiveThriftServer2Listener"
cf708f9 is described below

commit cf708f970b640722062b3102b8757a554aa0c841
Author: Dongjoon Hyun 
AuthorDate: Thu May 14 12:06:13 2020 -0700

Revert "[SPARK-31387] Handle unknown operation/session ID in 
HiveThriftServer2Listener"

This reverts commit 512cb2f0246a0d020f0ba726b4596555b15797c6.
---
 .../ui/HiveThriftServer2Listener.scala | 120 +
 .../hive/thriftserver/HiveSessionImplSuite.scala   |  73 -
 .../ui/HiveThriftServer2ListenerSuite.scala|  16 ---
 .../hive/service/cli/session/HiveSessionImpl.java  |   6 +-
 .../hive/service/cli/session/HiveSessionImpl.java  |   6 +-
 5 files changed, 51 insertions(+), 170 deletions(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
index 20a8f2c..6d0a506 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
@@ -25,7 +25,6 @@ import scala.collection.mutable.ArrayBuffer
 import org.apache.hive.service.server.HiveServer2
 
 import org.apache.spark.{SparkConf, SparkContext}
-import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config.Status.LIVE_ENTITY_UPDATE_PERIOD
 import org.apache.spark.scheduler._
 import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.ExecutionState
@@ -39,7 +38,7 @@ private[thriftserver] class HiveThriftServer2Listener(
     kvstore: ElementTrackingStore,
     sparkConf: SparkConf,
     server: Option[HiveServer2],
-    live: Boolean = true) extends SparkListener with Logging {
+    live: Boolean = true) extends SparkListener {
 
   private val sessionList = new ConcurrentHashMap[String, LiveSessionData]()
   private val executionList = new ConcurrentHashMap[String, LiveExecutionData]()
@@ -132,81 +131,60 @@ private[thriftserver] class HiveThriftServer2Listener(
     updateLiveStore(session)
   }
 
-  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit =
-    Option(sessionList.get(e.sessionId)) match {
-      case None => logWarning(s"onSessionClosed called with unknown session id: ${e.sessionId}")
-      case Some(sessionData) =>
-        val session = sessionData
-        session.finishTimestamp = e.finishTime
-        updateStoreWithTriggerEnabled(session)
-        sessionList.remove(e.sessionId)
-    }
+  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit = {
+    val session = sessionList.get(e.sessionId)
+    session.finishTimestamp = e.finishTime
+    updateStoreWithTriggerEnabled(session)
+    sessionList.remove(e.sessionId)
+  }
 
-  private def onOperationStart(e: SparkListenerThriftServerOperationStart): Unit =
-    Option(sessionList.get(e.sessionId)) match {
-      case None => logWarning(s"onOperationStart called with unknown session id: ${e.sessionId}")
-      case Some(sessionData) =>
-        val info = getOrCreateExecution(
-          e.id,
-          e.statement,
-          e.sessionId,
-          e.startTime,
-          e.userName)
-
-        info.state = ExecutionState.STARTED
-        executionList.put(e.id, info)
-        sessionData.totalExecution += 1
-        executionList.get(e.id).groupId = e.groupId
-        updateLiveStore(executionList.get(e.id))
-        updateLiveStore(sessionData)
-    }
+  private def onOperationStart(e: SparkListenerThriftServerOperationStart): Unit = {
+    val info = getOrCreateExecution(
+      e.id,
+      e.statement,
+      e.sessionId,
+      e.startTime,
+      e.userName)
+
+    info.state = ExecutionState.STARTED
+    executionList.put(e.id, info)
+    sessionList.get(e.sessionId).totalExecution += 1
+    executionList.get(e.id).groupId = e.groupId
+    updateLiveStore(executionList.get(e.id))
+    updateLiveStore(sessionList.get(e.sessionId))
+  }
 
-  private def onOperationParsed(e: SparkListenerThriftServerOperationParsed): Unit =
-    Option(executionList.get(e.id)) match {
-      case None => logWarning(s"onOperationParsed called with unknown operation id: ${e.id}")
-      case Some(executionData) =>
-        executionData.executePlan = e.executionPlan
-        executionData.state = ExecutionState.COMPILED
-        updateLiveStore(executionData)
-    }
+  private def onOperationParsed(e: SparkListenerThriftServerOperationParsed): Unit = {
+    executionList.get(e.id).executePlan = e.executionPlan

[spark] branch master updated (7ce3f76 -> bbb62c5)

2020-05-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7ce3f76  [SPARK-31696][DOCS][FOLLOWUP] Update version in documentation
 add bbb62c5  Revert "[SPARK-31387] Handle unknown operation/session ID in 
HiveThriftServer2Listener"

No new revisions were added by this update.

Summary of changes:
 .../ui/HiveThriftServer2Listener.scala | 120 +
 .../hive/thriftserver/HiveSessionImplSuite.scala   |  73 -
 .../ui/HiveThriftServer2ListenerSuite.scala|  16 ---
 .../hive/service/cli/session/HiveSessionImpl.java  |   6 +-
 .../hive/service/cli/session/HiveSessionImpl.java  |   6 +-
 5 files changed, 51 insertions(+), 170 deletions(-)
 delete mode 100644 sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveSessionImplSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31696][DOCS][FOLLOWUP] Update version in documentation

2020-05-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ca9cde8  [SPARK-31696][DOCS][FOLLOWUP] Update version in documentation
ca9cde8 is described below

commit ca9cde8e76abfdb73f0202fbf043e4d895d7163b
Author: Dongjoon Hyun 
AuthorDate: Thu May 14 10:25:22 2020 -0700

[SPARK-31696][DOCS][FOLLOWUP] Update version in documentation

### What changes were proposed in this pull request?

This PR is a follow-up that fixes the version shown for a configuration entry in the documentation.

### Why are the changes needed?

The original PR was backported to branch-3.0, so the version shown in the docs should be 3.0.0.

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Manual.

Closes #28530 from dongjoon-hyun/SPARK-31696-2.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 7ce3f76af6b72e88722a89f792e6fded9c586795)
Signed-off-by: Dongjoon Hyun 
---
 docs/running-on-kubernetes.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 2739149..3abb891 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -815,7 +815,7 @@ See the [configuration page](configuration.html) for information on Spark config
     Add the Kubernetes <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/">annotation</a> specified by <code>AnnotationName</code> to the driver service.
     For example, <code>spark.kubernetes.driver.service.annotation.something=true</code>.
   </td>
-  <td>3.1.0</td>
+  <td>3.0.0</td>
 </tr>
 <tr>
   <td><code>spark.kubernetes.executor.label.[LabelName]</code></td>


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (e10516a -> 7ce3f76)

2020-05-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e10516a  [SPARK-31681][ML][PYSPARK] Python multiclass logistic 
regression evaluate should return LogisticRegressionSummary
 add 7ce3f76  [SPARK-31696][DOCS][FOLLOWUP] Update version in documentation

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31696][K8S] Support driver service annotation in K8S

2020-05-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 541d451  [SPARK-31696][K8S] Support driver service annotation in K8S
541d451 is described below

commit 541d451cd7062f04e53d93719034877f27452a3e
Author: Dongjoon Hyun 
AuthorDate: Wed May 13 13:59:42 2020 -0700

[SPARK-31696][K8S] Support driver service annotation in K8S

### What changes were proposed in this pull request?

This PR aims to add `spark.kubernetes.driver.service.annotation`, analogous to the existing `spark.kubernetes.driver.annotation`.

### Why are the changes needed?

Annotations are used in many ways. One example is that the Prometheus monitoring system discovers metric endpoints via annotations.
- https://github.com/helm/charts/tree/master/stable/prometheus#scraping-pod-metrics-via-annotations
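
For instance, a hypothetical usage sketch (the `prometheus.io/*` annotation keys are conventions from the chart linked above, not something this PR defines):

```scala
import org.apache.spark.SparkConf

// Everything after the new prefix becomes an annotation on the driver service.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.service.annotation.prometheus.io/scrape", "true")
  .set("spark.kubernetes.driver.service.annotation.prometheus.io/port", "4040")
```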

### Does this PR introduce _any_ user-facing change?

Yes. The documentation is added.

### How was this patch tested?

Pass Jenkins with the updated unit tests.

Closes #28518 from dongjoon-hyun/SPARK-31696.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit c8f3bd861d96cf3f7b01cd9f864c181a57e1c77a)
Signed-off-by: Dongjoon Hyun 
---
 docs/running-on-kubernetes.md   | 13 +++--
 .../src/main/scala/org/apache/spark/deploy/k8s/Config.scala |  1 +
 .../scala/org/apache/spark/deploy/k8s/KubernetesConf.scala  |  5 +
 .../deploy/k8s/features/DriverServiceFeatureStep.scala  |  1 +
 .../org/apache/spark/deploy/k8s/KubernetesTestConf.scala|  2 ++
 .../deploy/k8s/features/DriverServiceFeatureStepSuite.scala | 13 +++--
 6 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 4f228a5..2739149 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -803,12 +803,21 @@ See the [configuration page](configuration.html) for information on Spark config
   <td><code>spark.kubernetes.driver.annotation.[AnnotationName]</code></td>
   <td>(none)</td>
   <td>
-    Add the annotation specified by <code>AnnotationName</code> to the driver pod.
+    Add the Kubernetes <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/">annotation</a> specified by <code>AnnotationName</code> to the driver pod.
     For example, <code>spark.kubernetes.driver.annotation.something=true</code>.
   </td>
   <td>2.3.0</td>
 </tr>
 <tr>
+  <td><code>spark.kubernetes.driver.service.annotation.[AnnotationName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the Kubernetes <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/">annotation</a> specified by <code>AnnotationName</code> to the driver service.
+    For example, <code>spark.kubernetes.driver.service.annotation.something=true</code>.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
   <td><code>spark.kubernetes.executor.label.[LabelName]</code></td>
   <td>(none)</td>
   <td>
@@ -823,7 +832,7 @@ See the [configuration page](configuration.html) for information on Spark config
   <td><code>spark.kubernetes.executor.annotation.[AnnotationName]</code></td>
   <td>(none)</td>
   <td>
-    Add the annotation specified by <code>AnnotationName</code> to the executor pods.
+    Add the Kubernetes <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/">annotation</a> specified by <code>AnnotationName</code> to the executor pods.
     For example, <code>spark.kubernetes.executor.annotation.something=true</code>.
   </td>
   <td>2.3.0</td>
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
index 8684a60..22f4c75 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
@@ -411,6 +411,7 @@ private[spark] object Config extends Logging {
 
   val KUBERNETES_DRIVER_LABEL_PREFIX = "spark.kubernetes.driver.label."
   val KUBERNETES_DRIVER_ANNOTATION_PREFIX = "spark.kubernetes.driver.annotation."
+  val KUBERNETES_DRIVER_SERVICE_ANNOTATION_PREFIX = "spark.kubernetes.driver.service.annotation."
   val KUBERNETES_DRIVER_SECRETS_PREFIX = "spark.kubernetes.driver.secrets."
   val KUBERNETES_DRIVER_SECRET_KEY_REF_PREFIX = "spark.kubernetes.driver.secretKeyRef."
   val KUBERNETES_DRIVER_VOLUMES_PREFIX = "spark.kubernetes.driver.volumes."
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
index 09943b7..6dcae3e 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
@@ -106,6 +106,11 @@ private[spark] class KubernetesDr

[spark] branch branch-3.0 updated: [SPARK-31681][ML][PYSPARK] Python multiclass logistic regression evaluate should return LogisticRegressionSummary

2020-05-14 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 6834f46  [SPARK-31681][ML][PYSPARK] Python multiclass logistic 
regression evaluate should return LogisticRegressionSummary
6834f46 is described below

commit 6834f4691b3e2603d410bfe24f0db0b6e3a36446
Author: Huaxin Gao 
AuthorDate: Thu May 14 10:54:35 2020 -0500

[SPARK-31681][ML][PYSPARK] Python multiclass logistic regression evaluate 
should return LogisticRegressionSummary

### What changes were proposed in this pull request?
Return LogisticRegressionSummary for multiclass logistic regression 
evaluate in PySpark

### Why are the changes needed?
Currently we have
```
@since("2.0.0")
def evaluate(self, dataset):
    if not isinstance(dataset, DataFrame):
        raise ValueError("dataset must be a DataFrame but got %s." % type(dataset))
    java_blr_summary = self._call_java("evaluate", dataset)
    return BinaryLogisticRegressionSummary(java_blr_summary)
```
we should return LogisticRegressionSummary for multiclass logistic 
regression
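
For context, a hedged sketch of the Scala API that the Python change mirrors: `evaluate` returns the base summary type, and callers downcast only when the model is binary.

```scala
import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegressionModel}
import org.apache.spark.sql.DataFrame

def summarize(model: LogisticRegressionModel, df: DataFrame): Unit =
  model.evaluate(df) match {
    case b: BinaryLogisticRegressionSummary => println(s"areaUnderROC = ${b.areaUnderROC}")
    case s => println(s"accuracy = ${s.accuracy}")
  }
```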

### Does this PR introduce _any_ user-facing change?
Yes. `evaluate` now returns `LogisticRegressionSummary` instead of `BinaryLogisticRegressionSummary` for multiclass logistic regression in Python.

### How was this patch tested?
unit test

Closes #28503 from huaxingao/lr_summary.

Authored-by: Huaxin Gao 
Signed-off-by: Sean Owen 
(cherry picked from commit e10516ae63cfc58f2d493e4d3f19940d45c8f033)
Signed-off-by: Sean Owen 
---
 python/pyspark/ml/classification.py  | 5 -
 python/pyspark/ml/tests/test_training_summary.py | 6 +-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/ml/classification.py b/python/pyspark/ml/classification.py
index 1436b78..424e16a 100644
--- a/python/pyspark/ml/classification.py
+++ b/python/pyspark/ml/classification.py
@@ -831,7 +831,10 @@ class LogisticRegressionModel(JavaProbabilisticClassificationModel, _LogisticReg
         if not isinstance(dataset, DataFrame):
             raise ValueError("dataset must be a DataFrame but got %s." % type(dataset))
         java_blr_summary = self._call_java("evaluate", dataset)
-        return BinaryLogisticRegressionSummary(java_blr_summary)
+        if self.numClasses <= 2:
+            return BinaryLogisticRegressionSummary(java_blr_summary)
+        else:
+            return LogisticRegressionSummary(java_blr_summary)
 
 
 class LogisticRegressionSummary(JavaWrapper):
diff --git a/python/pyspark/ml/tests/test_training_summary.py b/python/pyspark/ml/tests/test_training_summary.py
index 1d19ebf..b505409 100644
--- a/python/pyspark/ml/tests/test_training_summary.py
+++ b/python/pyspark/ml/tests/test_training_summary.py
@@ -21,7 +21,8 @@ import unittest
 if sys.version > '3':
     basestring = str
 
-from pyspark.ml.classification import LogisticRegression
+from pyspark.ml.classification import BinaryLogisticRegressionSummary, LogisticRegression, \
+    LogisticRegressionSummary
 from pyspark.ml.clustering import BisectingKMeans, GaussianMixture, KMeans
 from pyspark.ml.linalg import Vectors
 from pyspark.ml.regression import GeneralizedLinearRegression, LinearRegression
@@ -149,6 +150,7 @@ class TrainingSummaryTest(SparkSessionTestCase):
         # test evaluation (with training dataset) produces a summary with same values
         # one check is enough to verify a summary is returned, Scala version runs full test
         sameSummary = model.evaluate(df)
+        self.assertTrue(isinstance(sameSummary, BinaryLogisticRegressionSummary))
         self.assertAlmostEqual(sameSummary.areaUnderROC, s.areaUnderROC)
 
     def test_multiclass_logistic_regression_summary(self):
@@ -187,6 +189,8 @@ class TrainingSummaryTest(SparkSessionTestCase):
         # test evaluation (with training dataset) produces a summary with same values
         # one check is enough to verify a summary is returned, Scala version runs full test
         sameSummary = model.evaluate(df)
+        self.assertTrue(isinstance(sameSummary, LogisticRegressionSummary))
+        self.assertFalse(isinstance(sameSummary, BinaryLogisticRegressionSummary))
         self.assertAlmostEqual(sameSummary.accuracy, s.accuracy)
 
     def test_gaussian_mixture_summary(self):


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b2300fc -> e10516a)

2020-05-14 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b2300fc  [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
 add e10516a  [SPARK-31681][ML][PYSPARK] Python multiclass logistic regression evaluate should return LogisticRegressionSummary

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/classification.py  | 5 -
 python/pyspark/ml/tests/test_training_summary.py | 6 +-
 2 files changed, 9 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)

2020-05-14 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 1ea5844  [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
1ea5844 is described below

commit 1ea584443e9372a6a0b3c8449f5bf7e9e1369b0d
Author: Weichen Xu 
AuthorDate: Thu May 14 09:24:40 2020 -0500

[SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)

### What changes were proposed in this pull request?
In QuantileDiscretizer.getDistinctSplits, before invoking distinct, normalize all -0.0 and 0.0 to be 0.0:
```
for (i <- 0 until splits.length) {
  if (splits(i) == -0.0) {
    splits(i) = 0.0
  }
}
```
### Why are the changes needed?
Fixes the bug explained below.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test. A manual test follows:

~~~scala
import scala.util.Random
val rng = new Random(3)

val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0)

import spark.implicits._
val df1 = sc.parallelize(a1, 2).toDF("id")

import org.apache.spark.ml.feature.QuantileDiscretizer
val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0)

val model = qd.fit(df1) // will raise error in spark master.
~~~
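
Because the faulty distinct runs on the JVM, the same failure could also be provoked from PySpark. A hedged sketch of such a repro, mirroring the Scala snippet above (session setup and variable names are mine):
``` python
# Sketch only: drives the same JVM code path as the Scala repro above.
import random

from pyspark.ml.feature import QuantileDiscretizer
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

random.seed(3)
rows = [(random.random() * 2.0 - 1.0,) for _ in range(200)] \
    + [(0.0,)] * 20 + [(-0.0,)] * 20
df = spark.createDataFrame(rows, ["id"])

qd = QuantileDiscretizer(inputCol="id", outputCol="out",
                         numBuckets=200, relativeError=0.0)
model = qd.fit(df)  # raised the split-validation error before this fix
```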

In Scala, `0.0 == -0.0` is true but `0.0.hashCode == -0.0.hashCode()` is false. This breaks the contract between equals() and hashCode(): if two objects are equal, then they must have the same hash code.

array.distinct relies on elem.hashCode, so it leads to this error.

Test code on distinct:
```
import scala.util.Random
val rng = new Random(3)

val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0)
a1.distinct.sorted.foreach(x => print(x.toString + "\n"))
```

Then you will see output like:
```
...
-0.009292684662246975
-0.0033280686465135823
-0.0
0.0
0.0022219556032221366
0.02217419561977274
...
```

Closes #28498 from WeichenXu123/SPARK-31676.

Authored-by: Weichen Xu 
Signed-off-by: Sean Owen 
(cherry picked from commit b2300fca1e1a22d74c6eeda37942920a6c6299ff)
Signed-off-by: Sean Owen 
---
 .../apache/spark/ml/feature/QuantileDiscretizer.scala  | 12 
 .../spark/ml/feature/QuantileDiscretizerSuite.scala| 18 ++
 2 files changed, 30 insertions(+)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala
index 56e2c54..f3ec358 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala
@@ -243,6 +243,18 @@ final class QuantileDiscretizer @Since("1.6.0") (@Since("1.6.0") override val ui
   private def getDistinctSplits(splits: Array[Double]): Array[Double] = {
     splits(0) = Double.NegativeInfinity
     splits(splits.length - 1) = Double.PositiveInfinity
+
+    // 0.0 and -0.0 are distinct values, array.distinct will preserve both of them.
+    // but 0.0 > -0.0 is False which will break the parameter validation checking.
+    // and in scala <= 2.12, there's bug which will cause array.distinct generate
+    // non-deterministic results when array contains both 0.0 and -0.0
+    // So that here we should first normalize all 0.0 and -0.0 to be 0.0
+    // See https://github.com/scala/bug/issues/11995
+    for (i <- 0 until splits.length) {
+      if (splits(i) == -0.0) {
+        splits(i) = 0.0
+      }
+    }
     val distinctSplits = splits.distinct
     if (splits.length != distinctSplits.length) {
       log.warn(s"Some quantiles were identical. Bucketing to ${distinctSplits.length - 1}" +
diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala
index b009038..9c37416 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala
@@ -443,4 +443,22 @@ class QuantileDiscretizerSuite extends MLTest with DefaultReadWriteTest {
       discretizer.fit(df)
     }
   }
+
+  test("[SPARK-31676] QuantileDiscretizer raise error parameter splits given invalid value") {
+    import scala.util.Random
+    val rng = new Random(3)
+
+    val a1 = Array.tabulate(200)(_ => rng.nextDouble * 2.0 - 1.0) ++
+      Array.fill(20)(0.0) ++ Array.fill(20)(-0.0)
+
+    val df1 = sc.parallelize(a1, 2).toDF("id")
+
+    val qd = new QuantileDiscretizer()
+      .setInputCol("id")
+      .se

[spark] branch branch-3.0 updated: [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)

2020-05-14 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 00e6acc  [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
00e6acc is described below

commit 00e6acc9c6d45c5dd3b3f70c87909743a8073dba
Author: Weichen Xu 
AuthorDate: Thu May 14 09:24:40 2020 -0500

[SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)

### What changes were proposed in this pull request?

In QuantileDiscretizer.getDistinctSplits, before invoking distinct, normalize all -0.0 and 0.0 to be 0.0:
```
for (i <- 0 until splits.length) {
  if (splits(i) == -0.0) {
    splits(i) = 0.0
  }
}
```
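
The normalization trick itself is language-independent. As a side note, here is a small Python rendering of the same idea; the `splits` values are illustrative only:
``` python
# Illustrative values; only the negative-zero rewrite matters here.
splits = [float("-inf"), -0.5, -0.0, 0.0, 0.5, float("inf")]
# -0.0 == 0.0 holds, so this maps every negative zero to a plain 0.0
# before deduplication, mirroring the Scala loop above.
splits = [0.0 if s == 0.0 else s for s in splits]
assert all(repr(s) != "-0.0" for s in splits)
```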
### Why are the changes needed?
Fixes the bug explained below.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test.

Manual test:

~~~scala
import scala.util.Random
val rng = new Random(3)

val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0)

import spark.implicits._
val df1 = sc.parallelize(a1, 2).toDF("id")

import org.apache.spark.ml.feature.QuantileDiscretizer
val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0)

val model = qd.fit(df1) // will raise error in spark master.
~~~

### Explain
In Scala, `0.0 == -0.0` is true but `0.0.hashCode == -0.0.hashCode()` is false. This breaks the contract between equals() and hashCode(): if two objects are equal, then they must have the same hash code.

array.distinct relies on elem.hashCode, so it leads to this error.

Test code on distinct:
```
import scala.util.Random
val rng = new Random(3)

val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0)
a1.distinct.sorted.foreach(x => print(x.toString + "\n"))
```

Then you will see output like:
```
...
-0.009292684662246975
-0.0033280686465135823
-0.0
0.0
0.0022219556032221366
0.02217419561977274
...
```

Closes #28498 from WeichenXu123/SPARK-31676.

Authored-by: Weichen Xu 
Signed-off-by: Sean Owen 
(cherry picked from commit b2300fca1e1a22d74c6eeda37942920a6c6299ff)
Signed-off-by: Sean Owen 
---
 .../apache/spark/ml/feature/QuantileDiscretizer.scala  | 12 
 .../spark/ml/feature/QuantileDiscretizerSuite.scala| 18 ++
 2 files changed, 30 insertions(+)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala
index 216d99d..4eedfc4 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala
@@ -236,6 +236,18 @@ final class QuantileDiscretizer @Since("1.6.0") (@Since("1.6.0") override val ui
   private def getDistinctSplits(splits: Array[Double]): Array[Double] = {
     splits(0) = Double.NegativeInfinity
     splits(splits.length - 1) = Double.PositiveInfinity
+
+    // 0.0 and -0.0 are distinct values, array.distinct will preserve both of them.
+    // but 0.0 > -0.0 is False which will break the parameter validation checking.
+    // and in scala <= 2.12, there's bug which will cause array.distinct generate
+    // non-deterministic results when array contains both 0.0 and -0.0
+    // So that here we should first normalize all 0.0 and -0.0 to be 0.0
+    // See https://github.com/scala/bug/issues/11995
+    for (i <- 0 until splits.length) {
+      if (splits(i) == -0.0) {
+        splits(i) = 0.0
+      }
+    }
     val distinctSplits = splits.distinct
     if (splits.length != distinctSplits.length) {
       log.warn(s"Some quantiles were identical. Bucketing to ${distinctSplits.length - 1}" +
diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala
index 6f6ab26..682b87a 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala
@@ -512,4 +512,22 @@ class QuantileDiscretizerSuite extends MLTest with DefaultReadWriteTest {
     assert(observedNumBuckets === numBuckets,
       "Observed number of buckets does not equal expected number of buckets.")
   }
+
+  test("[SPARK-31676] QuantileDiscretizer raise error parameter splits given

[spark] branch master updated (ddbce4e -> b2300fc)

2020-05-14 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ddbce4e  [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
 add b2300fc  [SPARK-31676][ML] QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/ml/feature/QuantileDiscretizer.scala  | 12 
 .../spark/ml/feature/QuantileDiscretizerSuite.scala| 18 ++
 2 files changed, 30 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

2020-05-14 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new f5cf11c  [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
f5cf11c is described below

commit f5cf11c4d39190f7f5f8a20c8c634c0dc2d6c212
Author: sunke.03 
AuthorDate: Thu May 14 13:55:24 2020 +

[SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

### What changes were proposed in this pull request?

This PR tries to fix a bug in `org.apache.spark.sql.hive.execution.ScriptTransformationExec` that appeared in our online cluster. `ScriptTransformationExec` should throw an exception when the user's Python script contains a parse error, but the current implementation may miss this failure case.

### Why are the changes needed?

When the Python script contains a parse error, it produces no output, so `scriptOutputReader.next(scriptOutputWritable) <= 0` matches and we use `checkFailureAndPropagate()` to check the `proc`. But the `proc` may still be alive and `writerThread.exception` is not defined, so `checkFailureAndPropagate` cannot detect this failure case. In the end, the Spark SQL job runs successfully and returns no result, when in fact it should fail and show the except [...]

For example, a failing Python script is shown below.
``` python
# encoding: utf8
import unknow_module
import sys

for line in sys.stdin:
    print line
```
The bug can be reproduced by running the following code in our cluster.
```
spark.range(100*100).toDF("index").createOrReplaceTempView("test")
spark.sql("select TRANSFORM(index) USING 'python error_python.py' as new_index from test").collect.foreach(println)
```
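
For readers following along in Python, here is a hedged PySpark sketch of the same repro; it assumes Hive support is enabled and that the failing `error_python.py` above is reachable by the executors, and the config key is taken from the SQLConf diff below:
``` python
# Sketch only: a PySpark variant of the repro above, not part of the patch.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# Internal conf added by this patch (see the SQLConf diff below); 10s is the default.
spark.conf.set("spark.sql.scriptTransformation.exitTimeoutInSeconds", "10")

spark.range(100 * 100).toDF("index").createOrReplaceTempView("test")
# Before the fix this silently returned no rows; with the fix the job
# should fail once the script's termination is awaited.
spark.sql(
    "select TRANSFORM(index) USING 'python error_python.py' as new_index from test"
).collect()
```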

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing UT

Closes #27724 from slamke/transformation.

Authored-by: sunke.03 
Signed-off-by: Wenchen Fan 
(cherry picked from commit ddbce4edee6d4de30e6900bc0f03728a989aef0a)
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/sql/internal/SQLConf.scala|  9 ++
 .../hive/execution/ScriptTransformationExec.scala  | 12 +++-
 .../hive/execution/ScriptTransformationSuite.scala | 36 ++
 3 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 6c18280..31038a0e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -2561,6 +2561,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val SCRIPT_TRANSFORMATION_EXIT_TIMEOUT =
+    buildConf("spark.sql.scriptTransformation.exitTimeoutInSeconds")
+      .internal()
+      .doc("Timeout for executor to wait for the termination of transformation script when EOF.")
+      .version("3.0.0")
+      .timeConf(TimeUnit.SECONDS)
+      .checkValue(_ > 0, "The timeout value must be positive")
+      .createWithDefault(10L)
+
   /**
    * Holds information about keys that have been deprecated.
    *
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
index 40f7b4e..c7183fd 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.hive.execution
 import java.io._
 import java.nio.charset.StandardCharsets
 import java.util.Properties
+import java.util.concurrent.TimeUnit
 import javax.annotation.Nullable
 
 import scala.collection.JavaConverters._
@@ -42,6 +43,7 @@ import org.apache.spark.sql.catalyst.plans.physical.Partitioning
 import org.apache.spark.sql.execution._
 import org.apache.spark.sql.hive.HiveInspectors
 import org.apache.spark.sql.hive.HiveShim._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.DataType
 import org.apache.spark.util.{CircularBuffer, RedirectThread, SerializableConfiguration, Utils}
 
@@ -136,6 +138,15 @@ case class ScriptTransformationExec(
         throw writerThread.exception.get
       }
 
+      // There can be a lag between reader read EOF and the process termination.
+      // If the script fails to startup, this kind of error may be missed.
+      // So explicitly waiting for the process termination.
+      val timeout = conf.getConf(SQLConf.

[spark] branch master updated (7260146 -> ddbce4e)

2020-05-14 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7260146  [SPARK-31692][SQL] Pass hadoop confs specified via Spark confs to URLStreamHandlerFactory
 add ddbce4e  [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala|  9 ++
 .../hive/execution/ScriptTransformationExec.scala  | 12 +++-
 .../hive/execution/ScriptTransformationSuite.scala | 36 ++
 3 files changed, 56 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


