[spark] branch master updated (83d0967 -> 245aee9)

2020-05-21 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 83d0967  [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
 add 245aee9  [SPARK-31757][CORE] Improve HistoryServerDiskManager.updateAccessTime()

No new revisions were added by this update.

Summary of changes:
 .../spark/deploy/history/HistoryServerDiskManager.scala   | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

2020-05-21 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ec80e4b  [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
ec80e4b is described below

commit ec80e4b5f80876378765544e0c4c6af1a704
Author: yi.wu 
AuthorDate: Thu May 21 23:34:11 2020 -0700

[SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

### What changes were proposed in this pull request?

Change from `messages.toList.iterator` to `Iterator.single(messages.toList)`.

### Why are the changes needed?

In this test, the expected result of `rdd2.collect().head` should actually be `List("0", "1", "2", "3")` but is `"0"` now.
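As a standalone illustration (a minimal sketch, not the actual barrier-mode suite), the snippet below shows why the iterator shape matters: `mapPartitions` concatenates whatever each partition's iterator yields, so `collect()` flattens the old shape element by element, while the new shape yields one `List` per partition. The `gathered` array stands in for the result of `context.allGather(message)`.

```
import org.apache.spark.{SparkConf, SparkContext}

object IteratorShapeDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("iterator-shape-demo"))
    // Stand-in for the Array[String] returned by context.allGather(message).
    val gathered = Array("0", "1", "2", "3")

    // Old shape: each of the 4 partitions emits 4 separate strings,
    // so collect() returns 16 elements and head is just "0".
    val flattened = sc.parallelize(1 to 4, 4)
      .mapPartitions(_ => gathered.toList.iterator)
      .collect()
    assert(flattened.head == "0")

    // New shape: each partition emits a single List, so collect() returns
    // 4 elements and every one of them is the full gathered list.
    val perPartition = sc.parallelize(1 to 4, 4)
      .mapPartitions(_ => Iterator.single(gathered.toList))
      .collect()
    assert(perPartition.length == 4)
    assert(perPartition.forall(_ == List("0", "1", "2", "3")))

    sc.stop()
  }
}
```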

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated test.

Thanks to WeichenXu123 for reporting this problem.

Closes #28596 from Ngone51/fix_allgather_test.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 83d0967dcc6b205a3fd2003e051f49733f63cb30)
Signed-off-by: Xingbo Jiang 
---
 .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala   | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index b5614b2..6191e41 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -69,12 +69,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   // Pass partitionId message in
   val message: String = context.partitionId().toString
   val messages: Array[String] = context.allGather(message)
-  messages.toList.iterator
+  Iterator.single(messages.toList)
 }
-// Take a sorted list of all the partitionId messages
-val messages = rdd2.collect().head
-// All the task partitionIds are shared
-for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString)
+val messages = rdd2.collect()
+// All the task partitionIds are shared across all tasks
+assert(messages.length === 4)
+assert(messages.forall(_ == List("0", "1", "2", "3")))
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (60118a2 -> 83d0967)

2020-05-21 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 60118a2  [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers
 add 83d0967  [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala   | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

2020-05-21 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 83d0967  [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
83d0967 is described below

commit 83d0967dcc6b205a3fd2003e051f49733f63cb30
Author: yi.wu 
AuthorDate: Thu May 21 23:34:11 2020 -0700

[SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

### What changes were proposed in this pull request?

Change from `messages.toList.iterator` to `Iterator.single(messages.toList)`.

### Why are the changes needed?

In this test, the expected result of `rdd2.collect().head` should actually be `List("0", "1", "2", "3")` but is `"0"` now.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated test.

Thanks to WeichenXu123 for reporting this problem.

Closes #28596 from Ngone51/fix_allgather_test.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
---
 .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala   | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index b5614b2..6191e41 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -69,12 +69,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   // Pass partitionId message in
   val message: String = context.partitionId().toString
   val messages: Array[String] = context.allGather(message)
-  messages.toList.iterator
+  Iterator.single(messages.toList)
 }
-// Take a sorted list of all the partitionId messages
-val messages = rdd2.collect().head
-// All the task partitionIds are shared
-for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString)
+val messages = rdd2.collect()
+// All the task partitionIds are shared across all tasks
+assert(messages.length === 4)
+assert(messages.forall(_ == List("0", "1", "2", "3")))
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers

2020-05-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 212ca86  [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers
212ca86 is described below

commit 212ca86b8c2e7671e4980ea6c2b869286a4dac06
Author: Max Gekk 
AuthorDate: Fri May 22 09:53:35 2020 +0900

[SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers

### What changes were proposed in this pull request?
Add `withAllParquetReaders` to `ParquetTest`. The function allows running a block of code with all available Parquet readers.
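The quoted diff below only shows call sites; as a rough sketch (an assumption, not necessarily the merged code), a helper with this behavior could be defined in `ParquetTest` roughly as follows, running the body once per reader by toggling the vectorized-reader flag, mirroring the `Seq(true, false).foreach` pattern it replaces:

```
import org.apache.spark.sql.internal.SQLConf

// Hedged sketch only — the real helper in ParquetTest may differ. Assumes the
// surrounding suite mixes in SQLTestUtils, which provides withSQLConf.
// It runs `code` once with the vectorized Parquet reader enabled and once
// with it disabled.
protected def withAllParquetReaders(code: => Unit): Unit = {
  Seq(true, false).foreach { vectorized =>
    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorized.toString) {
      code
    }
  }
}
```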

### Why are the changes needed?
1. It simplifies tests.
2. It allows testing all Parquet readers that could be available in projects based on Apache Spark.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running affected test suites.

Closes #28598 from MaxGekk/add-withAllParquetReaders.

Authored-by: Max Gekk 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 60118a242639df060a9fdcaa4f14cd072ea3d056)
Signed-off-by: HyukjinKwon 
---
 .../datasources/parquet/ParquetFilterSuite.scala   |  39 +++---
 .../datasources/parquet/ParquetIOSuite.scala   | 144 ++---
 .../parquet/ParquetInteroperabilitySuite.scala |   8 +-
 .../datasources/parquet/ParquetQuerySuite.scala|  30 ++---
 .../datasources/parquet/ParquetTest.scala  |   7 +
 5 files changed, 106 insertions(+), 122 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
index 5cf2129..7b33cef 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
@@ -781,10 +781,9 @@ abstract class ParquetFilterSuite extends QueryTest with 
ParquetTest with Shared
 
   test("Filter applied on merged Parquet schema with new column should work") {
 import testImplicits._
-Seq("true", "false").foreach { vectorized =>
+withAllParquetReaders {
   withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
-SQLConf.PARQUET_SCHEMA_MERGING_ENABLED.key -> "true",
-SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorized) {
+SQLConf.PARQUET_SCHEMA_MERGING_ENABLED.key -> "true") {
 withTempPath { dir =>
   val path1 = s"${dir.getCanonicalPath}/table1"
   (1 to 3).map(i => (i, i.toString)).toDF("a", 
"b").write.parquet(path1)
@@ -1219,24 +1218,22 @@ abstract class ParquetFilterSuite extends QueryTest 
with ParquetTest with Shared
   }
 
   test("SPARK-17213: Broken Parquet filter push-down for string columns") {
-Seq(true, false).foreach { vectorizedEnabled =>
-  withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> 
vectorizedEnabled.toString) {
-withTempPath { dir =>
-  import testImplicits._
+withAllParquetReaders {
+  withTempPath { dir =>
+import testImplicits._
 
-  val path = dir.getCanonicalPath
-  // scalastyle:off nonascii
-  Seq("a", "é").toDF("name").write.parquet(path)
-  // scalastyle:on nonascii
+val path = dir.getCanonicalPath
+// scalastyle:off nonascii
+Seq("a", "é").toDF("name").write.parquet(path)
+// scalastyle:on nonascii
 
-  assert(spark.read.parquet(path).where("name > 'a'").count() == 1)
-  assert(spark.read.parquet(path).where("name >= 'a'").count() == 2)
+assert(spark.read.parquet(path).where("name > 'a'").count() == 1)
+assert(spark.read.parquet(path).where("name >= 'a'").count() == 2)
 
-  // scalastyle:off nonascii
-  assert(spark.read.parquet(path).where("name < 'é'").count() == 1)
-  assert(spark.read.parquet(path).where("name <= 'é'").count() == 2)
-  // scalastyle:on nonascii
-}
+// scalastyle:off nonascii
+assert(spark.read.parquet(path).where("name < 'é'").count() == 1)
+assert(spark.read.parquet(path).where("name <= 'é'").count() == 2)
+// scalastyle:on nonascii
   }
 }
   }
@@ -1244,8 +1241,8 @@ abstract class ParquetFilterSuite extends QueryTest with 
ParquetTest with Shared
   test("SPARK-31026: Parquet predicate pushdown for fields having dots in the 
names") {
 import testImplicits._
 
-Seq(true, false).foreach { vectorized =>
-  withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> 
vectorized.toString,
+withAllParquetReaders {
+  withSQLConf(
   SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> true

[spark] branch master updated (db5e5fc -> 60118a2)

2020-05-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from db5e5fc  Revert "[SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0"
 add 60118a2  [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers

No new revisions were added by this update.

Summary of changes:
 .../datasources/parquet/ParquetFilterSuite.scala   |  39 +++---
 .../datasources/parquet/ParquetIOSuite.scala   | 144 ++---
 .../parquet/ParquetInteroperabilitySuite.scala |   8 +-
 .../datasources/parquet/ParquetQuerySuite.scala|  30 ++---
 .../datasources/parquet/ParquetTest.scala  |   7 +
 5 files changed, 106 insertions(+), 122 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers

2020-05-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 60118a2  [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers
60118a2 is described below

commit 60118a242639df060a9fdcaa4f14cd072ea3d056
Author: Max Gekk 
AuthorDate: Fri May 22 09:53:35 2020 +0900

[SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers

### What changes were proposed in this pull request?
Add `withAllParquetReaders` to `ParquetTest`. The function allows running a block of code with all available Parquet readers.

### Why are the changes needed?
1. It simplifies tests.
2. It allows testing all Parquet readers that could be available in projects based on Apache Spark.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running affected test suites.

Closes #28598 from MaxGekk/add-withAllParquetReaders.

Authored-by: Max Gekk 
Signed-off-by: HyukjinKwon 
---
 .../datasources/parquet/ParquetFilterSuite.scala   |  39 +++---
 .../datasources/parquet/ParquetIOSuite.scala   | 144 ++---
 .../parquet/ParquetInteroperabilitySuite.scala |   8 +-
 .../datasources/parquet/ParquetQuerySuite.scala|  30 ++---
 .../datasources/parquet/ParquetTest.scala  |   7 +
 5 files changed, 106 insertions(+), 122 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
index 5cf2129..7b33cef 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
@@ -781,10 +781,9 @@ abstract class ParquetFilterSuite extends QueryTest with 
ParquetTest with Shared
 
   test("Filter applied on merged Parquet schema with new column should work") {
 import testImplicits._
-Seq("true", "false").foreach { vectorized =>
+withAllParquetReaders {
   withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
-SQLConf.PARQUET_SCHEMA_MERGING_ENABLED.key -> "true",
-SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorized) {
+SQLConf.PARQUET_SCHEMA_MERGING_ENABLED.key -> "true") {
 withTempPath { dir =>
   val path1 = s"${dir.getCanonicalPath}/table1"
   (1 to 3).map(i => (i, i.toString)).toDF("a", 
"b").write.parquet(path1)
@@ -1219,24 +1218,22 @@ abstract class ParquetFilterSuite extends QueryTest 
with ParquetTest with Shared
   }
 
   test("SPARK-17213: Broken Parquet filter push-down for string columns") {
-Seq(true, false).foreach { vectorizedEnabled =>
-  withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> 
vectorizedEnabled.toString) {
-withTempPath { dir =>
-  import testImplicits._
+withAllParquetReaders {
+  withTempPath { dir =>
+import testImplicits._
 
-  val path = dir.getCanonicalPath
-  // scalastyle:off nonascii
-  Seq("a", "é").toDF("name").write.parquet(path)
-  // scalastyle:on nonascii
+val path = dir.getCanonicalPath
+// scalastyle:off nonascii
+Seq("a", "é").toDF("name").write.parquet(path)
+// scalastyle:on nonascii
 
-  assert(spark.read.parquet(path).where("name > 'a'").count() == 1)
-  assert(spark.read.parquet(path).where("name >= 'a'").count() == 2)
+assert(spark.read.parquet(path).where("name > 'a'").count() == 1)
+assert(spark.read.parquet(path).where("name >= 'a'").count() == 2)
 
-  // scalastyle:off nonascii
-  assert(spark.read.parquet(path).where("name < 'é'").count() == 1)
-  assert(spark.read.parquet(path).where("name <= 'é'").count() == 2)
-  // scalastyle:on nonascii
-}
+// scalastyle:off nonascii
+assert(spark.read.parquet(path).where("name < 'é'").count() == 1)
+assert(spark.read.parquet(path).where("name <= 'é'").count() == 2)
+// scalastyle:on nonascii
   }
 }
   }
@@ -1244,8 +1241,8 @@ abstract class ParquetFilterSuite extends QueryTest with 
ParquetTest with Shared
   test("SPARK-31026: Parquet predicate pushdown for fields having dots in the 
names") {
 import testImplicits._
 
-Seq(true, false).foreach { vectorized =>
-  withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> 
vectorized.toString,
+withAllParquetReaders {
+  withSQLConf(
   SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> true.toString,
   SQLConf.SUPPORT_QUOTED_REGEX_COLUMN_NAME.key -> "false") {
 withTempPath { path =>

[spark] branch master updated (92877c4 -> db5e5fc)

2020-05-21 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 92877c4  [SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0
 add db5e5fc  Revert "[SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0"

No new revisions were added by this update.

Summary of changes:
 core/pom.xml  |  2 +-
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala  |  7 +--
 core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala |  3 +--
 pom.xml   | 10 +-
 sql/core/pom.xml  |  2 +-
 sql/hive-thriftserver/pom.xml |  2 +-
 streaming/pom.xml |  2 +-
 7 files changed, 11 insertions(+), 17 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: Revert "[SPARK-31387][SQL] Handle unknown operation/session ID in HiveThriftServer2Listener"

2020-05-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new cc1e568e Revert "[SPARK-31387][SQL] Handle unknown operation/session ID in HiveThriftServer2Listener"
cc1e568e is described below

commit cc1e568ea9cff11879a7c3eceadfe455443af3af
Author: Dongjoon Hyun 
AuthorDate: Thu May 21 14:20:50 2020 -0700

Revert "[SPARK-31387][SQL] Handle unknown operation/session ID in 
HiveThriftServer2Listener"

This reverts commit 5198b6853b2e3bc69fc013c653aa163c79168366.
---
 .../ui/HiveThriftServer2Listener.scala | 106 -
 .../hive/thriftserver/HiveSessionImplSuite.scala   |  73 --
 .../ui/HiveThriftServer2ListenerSuite.scala|  17 
 .../hive/service/cli/session/HiveSessionImpl.java  |  20 ++--
 .../hive/service/cli/session/HiveSessionImpl.java  |  20 ++--
 5 files changed, 53 insertions(+), 183 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
index 6b7e5ee..6d0a506 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
@@ -25,7 +25,6 @@ import scala.collection.mutable.ArrayBuffer
 import org.apache.hive.service.server.HiveServer2
 
 import org.apache.spark.{SparkConf, SparkContext}
-import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config.Status.LIVE_ENTITY_UPDATE_PERIOD
 import org.apache.spark.scheduler._
 import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.ExecutionState
@@ -39,7 +38,7 @@ private[thriftserver] class HiveThriftServer2Listener(
 kvstore: ElementTrackingStore,
 sparkConf: SparkConf,
 server: Option[HiveServer2],
-live: Boolean = true) extends SparkListener with Logging {
+live: Boolean = true) extends SparkListener {
 
   private val sessionList = new ConcurrentHashMap[String, LiveSessionData]()
   private val executionList = new ConcurrentHashMap[String, 
LiveExecutionData]()
@@ -132,83 +131,60 @@ private[thriftserver] class HiveThriftServer2Listener(
 updateLiveStore(session)
   }
 
-  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit 
=
-Option(sessionList.get(e.sessionId)) match {
-  case Some(sessionData) =>
-sessionData.finishTimestamp = e.finishTime
-updateStoreWithTriggerEnabled(sessionData)
-sessionList.remove(e.sessionId)
-  case None => logWarning(s"onSessionClosed called with unknown session 
id: ${e.sessionId}")
-}
+  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit 
= {
+val session = sessionList.get(e.sessionId)
+session.finishTimestamp = e.finishTime
+updateStoreWithTriggerEnabled(session)
+sessionList.remove(e.sessionId)
+  }
 
   private def onOperationStart(e: SparkListenerThriftServerOperationStart): 
Unit = {
-val executionData = getOrCreateExecution(
+val info = getOrCreateExecution(
   e.id,
   e.statement,
   e.sessionId,
   e.startTime,
   e.userName)
 
-executionData.state = ExecutionState.STARTED
-executionList.put(e.id, executionData)
-executionData.groupId = e.groupId
-updateLiveStore(executionData)
-
-Option(sessionList.get(e.sessionId)) match {
-  case Some(sessionData) =>
-sessionData.totalExecution += 1
-updateLiveStore(sessionData)
-  case None => logWarning(s"onOperationStart called with unknown session 
id: ${e.sessionId}." +
-s"Regardless, the operation has been registered.")
-}
+info.state = ExecutionState.STARTED
+executionList.put(e.id, info)
+sessionList.get(e.sessionId).totalExecution += 1
+executionList.get(e.id).groupId = e.groupId
+updateLiveStore(executionList.get(e.id))
+updateLiveStore(sessionList.get(e.sessionId))
   }
 
-  private def onOperationParsed(e: SparkListenerThriftServerOperationParsed): 
Unit =
-Option(executionList.get(e.id)) match {
-  case Some(executionData) =>
-executionData.executePlan = e.executionPlan
-executionData.state = ExecutionState.COMPILED
-updateLiveStore(executionData)
-  case None => logWarning(s"onOperationParsed called with unknown 
operation id: ${e.id}")
-}
+  private def onOperationParsed(e: SparkListenerThriftServerOperationParsed): 
Unit = {
+executionList.get(e.id).executePlan = e.executionPlan
+executionList.get(e.id).state = ExecutionState.COMPILED
+updateLiveStore(executionList.get(e.id))
+  }
 
-  private def onOperationCanceled(e: 
Spar

[spark] branch branch-2.4 updated: [SPARK-31787][K8S][TESTS][2.4] Fix Minikube.getIfNewMinikubeStatus to understand 1.5+

2020-05-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 1d1a207  [SPARK-31787][K8S][TESTS][2.4] Fix Minikube.getIfNewMinikubeStatus to understand 1.5+
1d1a207 is described below

commit 1d1a2079a56de25b0d14dd57f458a0f3830a6e14
Author: Marcelo Vanzin 
AuthorDate: Thu May 21 14:09:51 2020 -0700

[SPARK-31787][K8S][TESTS][2.4] Fix Minikube.getIfNewMinikubeStatus to understand 1.5+

### What changes were proposed in this pull request?

This PR aims to fix the testing infra to support Minikube 1.5+ in K8s IT.
Also, note that this is a subset of #26488 with the same ownership.

### Why are the changes needed?

This helps us test `master/3.0/2.4` with the same Minikube version.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Pass the Jenkins K8s IT with Minikube v0.34.1.
- Manually, test with Minikube 1.5.x.
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
Run completed in 4 minutes, 37 seconds.
Total number of tests run: 14
Suites: completed 2, aborted 0
Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #28599 from dongjoon-hyun/SPARK-31787.

Authored-by: Marcelo Vanzin 
Signed-off-by: Dongjoon Hyun 
---
 .../integrationtest/backend/minikube/Minikube.scala| 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git 
a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala
 
b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala
index 78ef44b..5968d3a 100644
--- 
a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala
+++ 
b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala
@@ -30,6 +30,7 @@ private[spark] object Minikube extends Logging {
   private val KUBELET_PREFIX = "kubelet:"
   private val APISERVER_PREFIX = "apiserver:"
   private val KUBECTL_PREFIX = "kubectl:"
+  private val KUBECONFIG_PREFIX = "kubeconfig:"
   private val MINIKUBE_VM_PREFIX = "minikubeVM: "
   private val MINIKUBE_PREFIX = "minikube: "
   private val MINIKUBE_PATH = ".minikube"
@@ -86,18 +87,23 @@ private[spark] object Minikube extends Logging {
 val kubeletString = statusString.find(_.contains(s"$KUBELET_PREFIX "))
 val apiserverString = statusString.find(_.contains(s"$APISERVER_PREFIX "))
 val kubectlString = statusString.find(_.contains(s"$KUBECTL_PREFIX "))
+val kubeconfigString = statusString.find(_.contains(s"$KUBECONFIG_PREFIX 
"))
+val hasConfigStatus = kubectlString.isDefined || kubeconfigString.isDefined
 
-if (hostString.isEmpty || kubeletString.isEmpty
-  || apiserverString.isEmpty || kubectlString.isEmpty) {
+if (hostString.isEmpty || kubeletString.isEmpty || apiserverString.isEmpty 
||
+!hasConfigStatus) {
   MinikubeStatus.NONE
 } else {
   val status1 = hostString.get.replaceFirst(s"$HOST_PREFIX ", "")
   val status2 = kubeletString.get.replaceFirst(s"$KUBELET_PREFIX ", "")
   val status3 = apiserverString.get.replaceFirst(s"$APISERVER_PREFIX ", "")
-  val status4 = kubectlString.get.replaceFirst(s"$KUBECTL_PREFIX ", "")
-  if (!status4.contains("Correctly Configured:")) {
-MinikubeStatus.NONE
+  val isConfigured = if (kubectlString.isDefined) {
+val cfgStatus = kubectlString.get.replaceFirst(s"$KUBECTL_PREFIX ", "")
+cfgStatus.contains("Correctly Configured:")
   } else {
+kubeconfigString.get.replaceFirst(s"$KUBECONFIG_PREFIX ", "") == 
"Configured"
+  }
+  if (isConfigured) {
 val stats = List(status1, status2, status3)
   .map(MinikubeStatus.unapply)
   .map(_.getOrElse(throw new IllegalStateException(s"Unknown status 
$statusString")))
@@ -106,6 +112,8 @@ private[spark] object 

[spark] branch branch-2.4 updated (11e97b9 -> 1d1a207)

2020-05-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 11e97b9  [K8S][MINOR] Log minikube version when running integration tests.
 add 1d1a207  [SPARK-31787][K8S][TESTS][2.4] Fix Minikube.getIfNewMinikubeStatus to understand 1.5+

No new revisions were added by this update.

Summary of changes:
 .../integrationtest/backend/minikube/Minikube.scala| 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0

2020-05-21 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 92877c4  [SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0
92877c4 is described below

commit 92877c4ef2ad113c156b7d9c359f396187c78fa3
Author: Kousuke Saruta 
AuthorDate: Thu May 21 11:43:25 2020 -0700

[SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0

### What changes were proposed in this pull request?

This PR upgrades HtmlUnit. Selenium and Jetty are also upgraded because of dependencies.

### Why are the changes needed?

Recently, a security issue which affects HtmlUnit was reported:
https://nvd.nist.gov/vuln/detail/CVE-2020-5529
According to the report, arbitrary code can be run by malicious users.
HtmlUnit is used for tests, so the impact might not be large, but it's better to upgrade it just in case.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing testcases.

Closes #28585 from sarutak/upgrade-htmlunit.

Authored-by: Kousuke Saruta 
Signed-off-by: Gengliang Wang 
---
 core/pom.xml  |  2 +-
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala  |  7 ++-
 core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala |  3 ++-
 pom.xml   | 10 +-
 sql/core/pom.xml  |  2 +-
 sql/hive-thriftserver/pom.xml |  2 +-
 streaming/pom.xml |  2 +-
 7 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/core/pom.xml b/core/pom.xml
index b0f6888..14b217d 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -334,7 +334,7 @@
 
 
   org.seleniumhq.selenium
-  selenium-htmlunit-driver
+  htmlunit-driver
   test
 
 
diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala 
b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index 4b4788f..f1962ef 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -23,6 +23,7 @@ import javax.servlet.DispatcherType
 import javax.servlet.http._
 
 import scala.language.implicitConversions
+import scala.util.Try
 import scala.xml.Node
 
 import org.eclipse.jetty.client.HttpClient
@@ -500,7 +501,11 @@ private[spark] case class ServerInfo(
 threadPool match {
   case pool: QueuedThreadPool =>
 // Workaround for SPARK-30385 to avoid Jetty's acceptor thread shrink.
-pool.setIdleTimeout(0)
+// As of Jetty 9.4.21, the implementation of
+// QueuedThreadPool#setIdleTimeout is changed and IllegalStateException
+// will be thrown if we try to set idle timeout after the server has 
started.
+// But this workaround works for Jetty 9.4.28 by ignoring the 
exception.
+Try(pool.setIdleTimeout(0))
   case _ =>
 }
 server.stop()
diff --git a/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala 
b/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala
index 3ec9385..e96d82a 100644
--- a/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala
+++ b/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala
@@ -24,6 +24,7 @@ import javax.servlet.http.{HttpServletRequest, 
HttpServletResponse}
 import scala.io.Source
 import scala.xml.Node
 
+import com.gargoylesoftware.css.parser.CSSParseException
 import com.gargoylesoftware.htmlunit.DefaultCssErrorHandler
 import org.json4s._
 import org.json4s.jackson.JsonMethods
@@ -33,7 +34,6 @@ import org.scalatest._
 import org.scalatest.concurrent.Eventually._
 import org.scalatest.time.SpanSugar._
 import org.scalatestplus.selenium.WebBrowser
-import org.w3c.css.sac.CSSParseException
 
 import org.apache.spark._
 import org.apache.spark.LocalSparkContext._
@@ -784,6 +784,7 @@ class UISeleniumSuite extends SparkFunSuite with WebBrowser 
with Matchers with B
 
   eventually(timeout(10.seconds), interval(50.milliseconds)) {
 goToUi(sc, "/jobs")
+
 val jobDesc =
   
driver.findElement(By.cssSelector("div[class='application-timeline-content']"))
 jobDesc.getAttribute("data-title") should include  ("collect at 
:25")
diff --git a/pom.xml b/pom.xml
index fd4cebc..29f7fec 100644
--- a/pom.xml
+++ b/pom.xml
@@ -139,7 +139,7 @@
 
 com.twitter
 1.6.0
-9.4.18.v20190429
+9.4.28.v20200408
 3.1.0
 0.9.5
 2.4.0
@@ -187,8 +187,8 @@
 0.12.0
 4.7.1
 1.1
-2.52.0
-2.22
+3.141.59
+2.40.0
 
@@ -591,8 +591,8 @@
   
   
 org.seleniumhq.selenium
-selenium-htmlunit-driver
-${selenium.version}
+   

[spark] branch master updated: [SPARK-29303][WEB UI] Add UI support for stage level scheduling

2020-05-21 Thread tgraves
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b64688e  [SPARK-29303][WEB UI] Add UI support for stage level scheduling
b64688e is described below

commit b64688ebbaac7afd3734c0d84d1e77b1fd2d2e9d
Author: Thomas Graves 
AuthorDate: Thu May 21 13:11:35 2020 -0500

[SPARK-29303][WEB UI] Add UI support for stage level scheduling

### What changes were proposed in this pull request?

This adds UI updates to support stage level scheduling and ResourceProfiles. Three main things have been added: the ResourceProfile id was added to the Stage page, the Executors page now has an optional selectable column to show the ResourceProfile id of each executor, and the Environment page now has a section with the ResourceProfile ids. Along with this, the REST API for the environment page was updated to include the resource profile information.

I debated splitting the resource profile information into its own page, but I wasn't sure it called for a completely separate page. Open to people's thoughts on this.

Screen shots:
![Screen Shot 2020-04-01 at 3 07 46 
PM](https://user-images.githubusercontent.com/4563792/78185169-469a7000-7430-11ea-8b0c-d9ede2d41df8.png)
![Screen Shot 2020-04-01 at 3 08 14 
PM](https://user-images.githubusercontent.com/4563792/78185175-48fcca00-7430-11ea-8d1d-6b9333700f32.png)
![Screen Shot 2020-04-01 at 3 09 03 
PM](https://user-images.githubusercontent.com/4563792/78185176-4a2df700-7430-11ea-92d9-73c382bb0f32.png)
![Screen Shot 2020-04-01 at 11 05 48 
AM](https://user-images.githubusercontent.com/4563792/78185186-4dc17e00-7430-11ea-8962-f749dd47ea60.png)

### Why are the changes needed?

This lets users know what resource profile was used with which stage and executors. The resource profile information is also available so that users debugging can see exactly what resources were requested with that profile.

### Does this PR introduce any user-facing change?

Yes, UI updates.

### How was this patch tested?

Unit tests, and tested on YARN with both active applications and the history server.

Closes #28094 from tgravescs/SPARK-29303-pr.

Lead-authored-by: Thomas Graves 
Co-authored-by: Thomas Graves 
Signed-off-by: Thomas Graves 
---
 .../org/apache/spark/SparkFirehoseListener.java|   5 +
 .../spark/ui/static/executorspage-template.html|   1 +
 .../org/apache/spark/ui/static/executorspage.js|   7 +-
 .../main/scala/org/apache/spark/SparkContext.scala |   2 +-
 .../deploy/history/HistoryAppStatusStore.scala |   3 +-
 .../spark/resource/ResourceProfileManager.scala|   7 +-
 .../spark/scheduler/EventLoggingListener.scala |   4 +
 .../org/apache/spark/scheduler/SparkListener.scala |  12 +++
 .../apache/spark/scheduler/SparkListenerBus.scala  |   2 +
 .../apache/spark/status/AppStatusListener.scala|  33 +-
 .../org/apache/spark/status/AppStatusStore.scala   |   8 +-
 .../scala/org/apache/spark/status/LiveEntity.scala |  25 -
 .../status/api/v1/OneApplicationResource.scala |   4 +-
 .../scala/org/apache/spark/status/api/v1/api.scala |  18 +++-
 .../scala/org/apache/spark/status/storeTypes.scala |   7 ++
 .../org/apache/spark/ui/env/EnvironmentPage.scala  |  48 +
 .../scala/org/apache/spark/ui/jobs/JobPage.scala   |   4 +-
 .../scala/org/apache/spark/ui/jobs/StagePage.scala |   4 +
 .../scala/org/apache/spark/util/JsonProtocol.scala | 101 +--
 .../app_environment_expectation.json   |   3 +-
 .../application_list_json_expectation.json |  15 +++
 .../blacklisting_for_stage_expectation.json|   3 +-
 .../blacklisting_node_for_stage_expectation.json   |   3 +-
 .../complete_stage_list_json_expectation.json  |   9 +-
 .../completed_app_list_json_expectation.json   |  15 +++
 .../executor_list_json_expectation.json|   3 +-
 ...ist_with_executor_metrics_json_expectation.json |  12 ++-
 .../executor_memory_usage_expectation.json |  15 ++-
 .../executor_node_blacklisting_expectation.json|  15 ++-
 ...de_blacklisting_unblacklisting_expectation.json |  15 ++-
 .../executor_resource_information_expectation.json |   9 +-
 .../failed_stage_list_json_expectation.json|   3 +-
 .../limit_app_list_json_expectation.json   |  30 +++---
 .../minDate_app_list_json_expectation.json |  18 +++-
 .../minEndDate_app_list_json_expectation.json  |  15 +++
 .../multiple_resource_profiles_expectation.json| 112 +
 .../one_stage_attempt_json_expectation.json|   3 +-
 .../one_stage_json_expectation.json|   3 +-
 .../stage_list_json_expectation.json   |  12 ++-
 ...age_list_with_accumulable_json_expectation.json |

[spark] branch master updated (dae7988 -> f1495c5)

2020-05-21 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from dae7988  [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener
 add f1495c5  [SPARK-31688][WEBUI] Refactor Pagination framework

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ui/PagedTable.scala | 101 -
 .../org/apache/spark/ui/jobs/AllJobsPage.scala | 128 ++-
 .../org/apache/spark/ui/jobs/StageTable.scala  | 114 ++
 .../org/apache/spark/ui/storage/RDDPage.scala  |  64 ++
 .../spark/sql/execution/ui/AllExecutionsPage.scala | 123 ++
 .../hive/thriftserver/ui/ThriftServerPage.scala| 251 +
 .../thriftserver/ui/ThriftServerSessionPage.scala  |  29 +--
 7 files changed, 238 insertions(+), 572 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [K8S][MINOR] Log minikube version when running integration tests.

2020-05-21 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 11e97b9  [K8S][MINOR] Log minikube version when running integration 
tests.
11e97b9 is described below

commit 11e97b92b576102e77db287199c905abb3ba4244
Author: Marcelo Vanzin 
AuthorDate: Fri Mar 1 11:24:08 2019 -0800

[K8S][MINOR] Log minikube version when running integration tests.

Closes #23893 from vanzin/minikube-version.

Authored-by: Marcelo Vanzin 
Signed-off-by: Marcelo Vanzin 
(cherry picked from commit 9f16af636661ec1f7af057764409fe359da0026a)
Signed-off-by: Dongjoon Hyun 
---
 .../k8s/integrationtest/backend/minikube/MinikubeTestBackend.scala   | 1 +
 1 file changed, 1 insertion(+)

diff --git a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/MinikubeTestBackend.scala b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/MinikubeTestBackend.scala
index cb93241..f92977d 100644
--- a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/MinikubeTestBackend.scala
+++ b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/MinikubeTestBackend.scala
@@ -25,6 +25,7 @@ private[spark] object MinikubeTestBackend extends IntegrationTestBackend {
   private var defaultClient: DefaultKubernetesClient = _
 
   override def initialize(): Unit = {
+    Minikube.logVersion()
     val minikubeStatus = Minikube.getMinikubeStatus
     require(minikubeStatus == MinikubeStatus.RUNNING,
       s"Minikube must be running to use the Minikube backend for integration tests." +


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
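
For illustration only: the one-line patch above makes the test backend log the local minikube version before it checks that minikube is running. Below is a minimal, self-contained Scala sketch of such a version-logging step; the object name, the use of `sys.process`, and the log format are assumptions made for the sketch, not the actual `Minikube` helper in Spark's integration tests.

```
import scala.sys.process._

// Hypothetical sketch, not Spark's Minikube utility: shell out to the local
// `minikube` binary and surface its version before any status checks run.
object MinikubeVersionCheck {
  def logVersion(): Unit = {
    // `minikube version` prints something like "minikube version: v1.9.2";
    // `!!` captures stdout and throws if the command exits non-zero.
    val version = Seq("minikube", "version").!!.trim
    println(s"[integration-test] $version")
  }

  def main(args: Array[String]): Unit = {
    logVersion()
    // ...status checks (e.g. `minikube status`) would follow here...
  }
}
```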



[spark] branch branch-3.0 updated: [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener

2020-05-21 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 0f2afd3  [SPARK-31354] SparkContext only register one SparkSession 
ApplicationEnd listener
0f2afd3 is described below

commit 0f2afd3455e164dcd273f6c69774d08f7121af8d
Author: Vinoo Ganesh 
AuthorDate: Thu May 21 16:06:28 2020 +

[SPARK-31354] SparkContext only register one SparkSession ApplicationEnd 
listener

## What changes were proposed in this pull request?

This change was made as a result of the conversation on 
https://issues.apache.org/jira/browse/SPARK-31354 and is intended to continue 
work from that ticket here.

This change fixes a memory leak where SparkSession listeners are never 
cleared off of the SparkContext listener bus.

Before running this PR, the following code:
```
SparkSession.builder().master("local").getOrCreate()
SparkSession.clearActiveSession()
SparkSession.clearDefaultSession()

SparkSession.builder().master("local").getOrCreate()
SparkSession.clearActiveSession()
SparkSession.clearDefaultSession()
```
would result in a SparkContext with the following listeners on the listener 
bus:
```
[org.apache.spark.status.AppStatusListener5f610071,
org.apache.spark.HeartbeatReceiverd400c17,
org.apache.spark.sql.SparkSession$$anon$125849aeb, <-First instance
org.apache.spark.sql.SparkSession$$anon$1fadb9a0] <- Second instance
```
After this PR, the execution of the same code above results in SparkContext 
with the following listeners on the listener bus:
```
[org.apache.spark.status.AppStatusListener5f610071,
org.apache.spark.HeartbeatReceiverd400c17,
org.apache.spark.sql.SparkSession$$anon$125849aeb] <-One instance
```
## How was this patch tested?

* Unit test included as a part of the PR

Closes #28128 from vinooganesh/vinooganesh/SPARK-27958.

Lead-authored-by: Vinoo Ganesh 
Co-authored-by: Vinoo Ganesh 
Co-authored-by: Vinoo Ganesh 
Signed-off-by: Wenchen Fan 
(cherry picked from commit dae79888dc6476892877d3b3b233381cdbf7fa74)
Signed-off-by: Wenchen Fan 
---
 .../scala/org/apache/spark/sql/SparkSession.scala  | 27 +-
 .../spark/sql/SparkSessionBuilderSuite.scala   | 25 
 2 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
index be597ed..60a6037 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
@@ -19,7 +19,7 @@ package org.apache.spark.sql
 
 import java.io.Closeable
 import java.util.concurrent.TimeUnit._
-import java.util.concurrent.atomic.AtomicReference
+import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}
 
 import scala.collection.JavaConverters._
 import scala.reflect.runtime.universe.TypeTag
@@ -49,7 +49,6 @@ import org.apache.spark.sql.types.{DataType, StructType}
 import org.apache.spark.sql.util.ExecutionListenerManager
 import org.apache.spark.util.{CallSite, Utils}
 
-
 /**
  * The entry point to programming Spark with the Dataset and DataFrame API.
  *
@@ -940,15 +939,7 @@ object SparkSession extends Logging {
         options.foreach { case (k, v) => session.initialSessionOptions.put(k, v) }
         setDefaultSession(session)
         setActiveSession(session)
-
-        // Register a successfully instantiated context to the singleton. This should be at the
-        // end of the class definition so that the singleton is updated only if there is no
-        // exception in the construction of the instance.
-        sparkContext.addSparkListener(new SparkListener {
-          override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
-            defaultSession.set(null)
-          }
-        })
+        registerContextListener(sparkContext)
       }
 
       return session
@@ -1064,6 +1055,20 @@ object SparkSession extends Logging {
   // Private methods from now on
   

 
+  private val listenerRegistered: AtomicBoolean = new AtomicBoolean(false)
+
+  /** Register the AppEnd listener onto the Context  */
+  private def registerContextListener(sparkContext: SparkContext): Unit = {
+    if (!listenerRegistered.get()) {
+      sparkContext.addSparkListener(new SparkListener {
+        override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
+          defaultSession.set(null)
+        }
+      })
+      listenerRegistered.set(true)
+    }
+  }
+
   /** The active SparkSession for the current thread. */
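
The heart of the patch above is an idempotent registration guard: an AtomicBoolean ensures the ApplicationEnd listener is attached to the context's listener bus at most once, no matter how many times getOrCreate() runs. Below is a minimal, dependency-free Scala sketch of that pattern; `Listener`, `ListenerBus`, and `SessionRegistry` are illustrative stand-ins rather than Spark's classes, and it uses `compareAndSet` where the patch uses a separate get/set pair.

```
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ListBuffer

// Stand-ins for SparkListener and the SparkContext listener bus.
trait Listener { def onApplicationEnd(): Unit }

final class ListenerBus {
  val listeners = ListBuffer.empty[Listener]
  def add(l: Listener): Unit = listeners += l
}

object SessionRegistry {
  private val listenerRegistered = new AtomicBoolean(false)

  /** Attach the application-end cleanup listener at most once. */
  def registerContextListener(bus: ListenerBus): Unit = {
    // compareAndSet flips false -> true exactly once, so repeated calls
    // (one per session built) never add a second listener.
    if (listenerRegistered.compareAndSet(false, true)) {
      bus.add(new Listener {
        override def onApplicationEnd(): Unit = {
          // e.g. clear the default session reference here
        }
      })
    }
  }
}

object Demo extends App {
  val bus = new ListenerBus
  // Simulate two sessions being built against the same context.
  SessionRegistry.registerContextListener(bus)
  SessionRegistry.registerContextListener(bus)
  assert(bus.listeners.size == 1, "only one listener should ever be registered")
  println(s"listeners on bus: ${bus.listeners.size}")
}
```

Running Demo prints a listener count of 1 even though registration is attempted twice, mirroring the single-listener state shown in the commit message's "after" listing.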

[spark] branch master updated (5d67331 -> dae7988)

2020-05-21 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5d67331  [SPARK-31762][SQL] Fix perf regression of date/timestamp 
formatting in toHiveString
 add dae7988  [SPARK-31354] SparkContext only register one SparkSession 
ApplicationEnd listener

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/SparkSession.scala  | 27 +-
 .../spark/sql/SparkSessionBuilderSuite.scala   | 25 
 2 files changed, 41 insertions(+), 11 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener

2020-05-21 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dae7988  [SPARK-31354] SparkContext only register one SparkSession 
ApplicationEnd listener
dae7988 is described below

commit dae79888dc6476892877d3b3b233381cdbf7fa74
Author: Vinoo Ganesh 
AuthorDate: Thu May 21 16:06:28 2020 +

[SPARK-31354] SparkContext only register one SparkSession ApplicationEnd 
listener

## What changes were proposed in this pull request?

This change was made as a result of the conversation on 
https://issues.apache.org/jira/browse/SPARK-31354 and is intended to continue 
work from that ticket here.

This change fixes a memory leak where SparkSession listeners are never 
cleared off of the SparkContext listener bus.

Before running this PR, the following code:
```
SparkSession.builder().master("local").getOrCreate()
SparkSession.clearActiveSession()
SparkSession.clearDefaultSession()

SparkSession.builder().master("local").getOrCreate()
SparkSession.clearActiveSession()
SparkSession.clearDefaultSession()
```
would result in a SparkContext with the following listeners on the listener 
bus:
```
[org.apache.spark.status.AppStatusListener5f610071,
org.apache.spark.HeartbeatReceiverd400c17,
org.apache.spark.sql.SparkSession$$anon$125849aeb, <-First instance
org.apache.spark.sql.SparkSession$$anon$1fadb9a0] <- Second instance
```
After this PR, the execution of the same code above results in SparkContext 
with the following listeners on the listener bus:
```
[org.apache.spark.status.AppStatusListener5f610071,
org.apache.spark.HeartbeatReceiverd400c17,
org.apache.spark.sql.SparkSession$$anon$125849aeb] <-One instance
```
## How was this patch tested?

* Unit test included as a part of the PR

Closes #28128 from vinooganesh/vinooganesh/SPARK-27958.

Lead-authored-by: Vinoo Ganesh 
Co-authored-by: Vinoo Ganesh 
Co-authored-by: Vinoo Ganesh 
Signed-off-by: Wenchen Fan 
---
 .../scala/org/apache/spark/sql/SparkSession.scala  | 27 +-
 .../spark/sql/SparkSessionBuilderSuite.scala   | 25 
 2 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
index be597ed..60a6037 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
@@ -19,7 +19,7 @@ package org.apache.spark.sql
 
 import java.io.Closeable
 import java.util.concurrent.TimeUnit._
-import java.util.concurrent.atomic.AtomicReference
+import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}
 
 import scala.collection.JavaConverters._
 import scala.reflect.runtime.universe.TypeTag
@@ -49,7 +49,6 @@ import org.apache.spark.sql.types.{DataType, StructType}
 import org.apache.spark.sql.util.ExecutionListenerManager
 import org.apache.spark.util.{CallSite, Utils}
 
-
 /**
  * The entry point to programming Spark with the Dataset and DataFrame API.
  *
@@ -940,15 +939,7 @@ object SparkSession extends Logging {
         options.foreach { case (k, v) => session.initialSessionOptions.put(k, v) }
         setDefaultSession(session)
         setActiveSession(session)
-
-        // Register a successfully instantiated context to the singleton. This should be at the
-        // end of the class definition so that the singleton is updated only if there is no
-        // exception in the construction of the instance.
-        sparkContext.addSparkListener(new SparkListener {
-          override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
-            defaultSession.set(null)
-          }
-        })
+        registerContextListener(sparkContext)
       }
 
       return session
@@ -1064,6 +1055,20 @@ object SparkSession extends Logging {
   // Private methods from now on
   

 
+  private val listenerRegistered: AtomicBoolean = new AtomicBoolean(false)
+
+  /** Register the AppEnd listener onto the Context  */
+  private def registerContextListener(sparkContext: SparkContext): Unit = {
+    if (!listenerRegistered.get()) {
+      sparkContext.addSparkListener(new SparkListener {
+        override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
+          defaultSession.set(null)
+        }
+      })
+      listenerRegistered.set(true)
+    }
+  }
+
   /** The active SparkSession for the current thread. */
   private val activeThreadSession = new InheritableThreadLocal[SparkSession]
 
diff --git 
a/sql/core/src/test/