[spark] branch master updated (83d0967 -> 245aee9)
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 83d0967  [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
     add 245aee9  [SPARK-31757][CORE] Improve HistoryServerDiskManager.updateAccessTime()

No new revisions were added by this update.

Summary of changes:
 .../spark/deploy/history/HistoryServerDiskManager.scala | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ec80e4b [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" ec80e4b is described below commit ec80e4b5f80876378765544e0c4c6af1a704 Author: yi.wu AuthorDate: Thu May 21 23:34:11 2020 -0700 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" ### What changes were proposed in this pull request? Change from `messages.toList.iterator` to `Iterator.single(messages.toList)`. ### Why are the changes needed? In this test, the expected result of `rdd2.collect().head` should actually be `List("0", "1", "2", "3")` but is `"0"` now. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated test. Thanks WeichenXu123 reported this problem. Closes #28596 from Ngone51/fix_allgather_test. Authored-by: yi.wu Signed-off-by: Xingbo Jiang (cherry picked from commit 83d0967dcc6b205a3fd2003e051f49733f63cb30) Signed-off-by: Xingbo Jiang --- .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index b5614b2..6191e41 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -69,12 +69,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) - messages.toList.iterator + Iterator.single(messages.toList) } -// Take a sorted list of all the partitionId messages -val messages = rdd2.collect().head -// All the task partitionIds are shared -for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) +val messages = rdd2.collect() +// All the task partitionIds are shared across all tasks +assert(messages.length === 4) +assert(messages.forall(_ == List("0", "1", "2", "3"))) } test("throw exception if we attempt to synchronize with different blocking calls") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (60118a2 -> 83d0967)
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 60118a2  [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers
     add 83d0967  [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 83d0967 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" 83d0967 is described below commit 83d0967dcc6b205a3fd2003e051f49733f63cb30 Author: yi.wu AuthorDate: Thu May 21 23:34:11 2020 -0700 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" ### What changes were proposed in this pull request? Change from `messages.toList.iterator` to `Iterator.single(messages.toList)`. ### Why are the changes needed? In this test, the expected result of `rdd2.collect().head` should actually be `List("0", "1", "2", "3")` but is `"0"` now. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated test. Thanks WeichenXu123 reported this problem. Closes #28596 from Ngone51/fix_allgather_test. Authored-by: yi.wu Signed-off-by: Xingbo Jiang --- .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index b5614b2..6191e41 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -69,12 +69,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) - messages.toList.iterator + Iterator.single(messages.toList) } -// Take a sorted list of all the partitionId messages -val messages = rdd2.collect().head -// All the task partitionIds are shared -for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) +val messages = rdd2.collect() +// All the task partitionIds are shared across all tasks +assert(messages.length === 4) +assert(messages.forall(_ == List("0", "1", "2", "3"))) } test("throw exception if we attempt to synchronize with different blocking calls") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
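The fix above turns on what `mapPartitions` does with the iterator each partition returns: `messages.toList.iterator` emits every gathered message as a separate record, so `rdd2.collect().head` is just `"0"`, whereas `Iterator.single(messages.toList)` emits one record per partition containing the whole list. A minimal sketch of that difference, using a plain local RDD instead of `BarrierTaskContext.allGather()` (the object name and the simulated `gathered` list are illustrative only):

```scala
import org.apache.spark.sql.SparkSession

object IteratorSemanticsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[4]").appName("iterator-demo").getOrCreate()
    val sc = spark.sparkContext

    // Four partitions, each pretending it ran allGather() and received all four partition ids.
    val rdd = sc.parallelize(0 until 4, numSlices = 4)
    val gathered = List("0", "1", "2", "3")

    // Returning list.iterator flattens: every element becomes its own record -> 16 records, head == "0".
    val flattened = rdd.mapPartitions(_ => gathered.iterator).collect()

    // Returning Iterator.single(list) keeps one record per partition -> 4 records, each the full list.
    val perPartition = rdd.mapPartitions(_ => Iterator.single(gathered)).collect()

    assert(flattened.length == 16 && flattened.head == "0")
    assert(perPartition.length == 4 && perPartition.forall(_ == gathered))

    spark.stop()
  }
}
```

This is why the updated test can assert `messages.length === 4` and that every collected element equals `List("0", "1", "2", "3")`.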
[spark] branch branch-3.0 updated: [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 212ca86 [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers 212ca86 is described below commit 212ca86b8c2e7671e4980ea6c2b869286a4dac06 Author: Max Gekk AuthorDate: Fri May 22 09:53:35 2020 +0900 [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers ### What changes were proposed in this pull request? Add `withAllParquetReaders` to `ParquetTest`. The function allow to run a block of code for all available Parquet readers. ### Why are the changes needed? 1. It simplifies tests 2. Allow to test all parquet readers that could be available in projects based on Apache Spark. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running affected test suites. Closes #28598 from MaxGekk/add-withAllParquetReaders. Authored-by: Max Gekk Signed-off-by: HyukjinKwon (cherry picked from commit 60118a242639df060a9fdcaa4f14cd072ea3d056) Signed-off-by: HyukjinKwon --- .../datasources/parquet/ParquetFilterSuite.scala | 39 +++--- .../datasources/parquet/ParquetIOSuite.scala | 144 ++--- .../parquet/ParquetInteroperabilitySuite.scala | 8 +- .../datasources/parquet/ParquetQuerySuite.scala| 30 ++--- .../datasources/parquet/ParquetTest.scala | 7 + 5 files changed, 106 insertions(+), 122 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala index 5cf2129..7b33cef 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala @@ -781,10 +781,9 @@ abstract class ParquetFilterSuite extends QueryTest with ParquetTest with Shared test("Filter applied on merged Parquet schema with new column should work") { import testImplicits._ -Seq("true", "false").foreach { vectorized => +withAllParquetReaders { withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true", -SQLConf.PARQUET_SCHEMA_MERGING_ENABLED.key -> "true", -SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorized) { +SQLConf.PARQUET_SCHEMA_MERGING_ENABLED.key -> "true") { withTempPath { dir => val path1 = s"${dir.getCanonicalPath}/table1" (1 to 3).map(i => (i, i.toString)).toDF("a", "b").write.parquet(path1) @@ -1219,24 +1218,22 @@ abstract class ParquetFilterSuite extends QueryTest with ParquetTest with Shared } test("SPARK-17213: Broken Parquet filter push-down for string columns") { -Seq(true, false).foreach { vectorizedEnabled => - withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorizedEnabled.toString) { -withTempPath { dir => - import testImplicits._ +withAllParquetReaders { + withTempPath { dir => +import testImplicits._ - val path = dir.getCanonicalPath - // scalastyle:off nonascii - Seq("a", "é").toDF("name").write.parquet(path) - // scalastyle:on nonascii +val path = dir.getCanonicalPath +// scalastyle:off nonascii +Seq("a", "é").toDF("name").write.parquet(path) +// scalastyle:on nonascii - assert(spark.read.parquet(path).where("name > 'a'").count() == 1) - assert(spark.read.parquet(path).where("name >= 'a'").count() == 2) +assert(spark.read.parquet(path).where("name > 
'a'").count() == 1) +assert(spark.read.parquet(path).where("name >= 'a'").count() == 2) - // scalastyle:off nonascii - assert(spark.read.parquet(path).where("name < 'é'").count() == 1) - assert(spark.read.parquet(path).where("name <= 'é'").count() == 2) - // scalastyle:on nonascii -} +// scalastyle:off nonascii +assert(spark.read.parquet(path).where("name < 'é'").count() == 1) +assert(spark.read.parquet(path).where("name <= 'é'").count() == 2) +// scalastyle:on nonascii } } } @@ -1244,8 +1241,8 @@ abstract class ParquetFilterSuite extends QueryTest with ParquetTest with Shared test("SPARK-31026: Parquet predicate pushdown for fields having dots in the names") { import testImplicits._ -Seq(true, false).foreach { vectorized => - withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorized.toString, +withAllParquetReaders { + withSQLConf( SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> true
[spark] branch master updated (db5e5fc -> 60118a2)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from db5e5fc  Revert "[SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0"
     add 60118a2  [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers

No new revisions were added by this update.

Summary of changes:
 .../datasources/parquet/ParquetFilterSuite.scala   |  39 +++---
 .../datasources/parquet/ParquetIOSuite.scala       | 144 ++---
 .../parquet/ParquetInteroperabilitySuite.scala     |   8 +-
 .../datasources/parquet/ParquetQuerySuite.scala    |  30 ++---
 .../datasources/parquet/ParquetTest.scala          |   7 +
 5 files changed, 106 insertions(+), 122 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 60118a2 [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers 60118a2 is described below commit 60118a242639df060a9fdcaa4f14cd072ea3d056 Author: Max Gekk AuthorDate: Fri May 22 09:53:35 2020 +0900 [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers ### What changes were proposed in this pull request? Add `withAllParquetReaders` to `ParquetTest`. The function allow to run a block of code for all available Parquet readers. ### Why are the changes needed? 1. It simplifies tests 2. Allow to test all parquet readers that could be available in projects based on Apache Spark. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running affected test suites. Closes #28598 from MaxGekk/add-withAllParquetReaders. Authored-by: Max Gekk Signed-off-by: HyukjinKwon --- .../datasources/parquet/ParquetFilterSuite.scala | 39 +++--- .../datasources/parquet/ParquetIOSuite.scala | 144 ++--- .../parquet/ParquetInteroperabilitySuite.scala | 8 +- .../datasources/parquet/ParquetQuerySuite.scala| 30 ++--- .../datasources/parquet/ParquetTest.scala | 7 + 5 files changed, 106 insertions(+), 122 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala index 5cf2129..7b33cef 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala @@ -781,10 +781,9 @@ abstract class ParquetFilterSuite extends QueryTest with ParquetTest with Shared test("Filter applied on merged Parquet schema with new column should work") { import testImplicits._ -Seq("true", "false").foreach { vectorized => +withAllParquetReaders { withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true", -SQLConf.PARQUET_SCHEMA_MERGING_ENABLED.key -> "true", -SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorized) { +SQLConf.PARQUET_SCHEMA_MERGING_ENABLED.key -> "true") { withTempPath { dir => val path1 = s"${dir.getCanonicalPath}/table1" (1 to 3).map(i => (i, i.toString)).toDF("a", "b").write.parquet(path1) @@ -1219,24 +1218,22 @@ abstract class ParquetFilterSuite extends QueryTest with ParquetTest with Shared } test("SPARK-17213: Broken Parquet filter push-down for string columns") { -Seq(true, false).foreach { vectorizedEnabled => - withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorizedEnabled.toString) { -withTempPath { dir => - import testImplicits._ +withAllParquetReaders { + withTempPath { dir => +import testImplicits._ - val path = dir.getCanonicalPath - // scalastyle:off nonascii - Seq("a", "é").toDF("name").write.parquet(path) - // scalastyle:on nonascii +val path = dir.getCanonicalPath +// scalastyle:off nonascii +Seq("a", "é").toDF("name").write.parquet(path) +// scalastyle:on nonascii - assert(spark.read.parquet(path).where("name > 'a'").count() == 1) - assert(spark.read.parquet(path).where("name >= 'a'").count() == 2) +assert(spark.read.parquet(path).where("name > 'a'").count() == 1) +assert(spark.read.parquet(path).where("name >= 'a'").count() == 2) - // scalastyle:off 
nonascii - assert(spark.read.parquet(path).where("name < 'é'").count() == 1) - assert(spark.read.parquet(path).where("name <= 'é'").count() == 2) - // scalastyle:on nonascii -} +// scalastyle:off nonascii +assert(spark.read.parquet(path).where("name < 'é'").count() == 1) +assert(spark.read.parquet(path).where("name <= 'é'").count() == 2) +// scalastyle:on nonascii } } } @@ -1244,8 +1241,8 @@ abstract class ParquetFilterSuite extends QueryTest with ParquetTest with Shared test("SPARK-31026: Parquet predicate pushdown for fields having dots in the names") { import testImplicits._ -Seq(true, false).foreach { vectorized => - withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorized.toString, +withAllParquetReaders { + withSQLConf( SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> true.toString, SQLConf.SUPPORT_QUOTED_REGEX_COLUMN_NAME.key -> "false") { withTempPath { path =>
[spark] branch master updated (92877c4 -> db5e5fc)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 92877c4  [SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0
     add db5e5fc  Revert "[SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0"

No new revisions were added by this update.

Summary of changes:
 core/pom.xml                                                  |  2 +-
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala      |  7 +--
 core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala |  3 +--
 pom.xml                                                       | 10 +-
 sql/core/pom.xml                                              |  2 +-
 sql/hive-thriftserver/pom.xml                                 |  2 +-
 streaming/pom.xml                                             |  2 +-
 7 files changed, 11 insertions(+), 17 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: Revert "[SPARK-31387][SQL] Handle unknown operation/session ID in HiveThriftServer2Listener"
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new cc1e568e Revert "[SPARK-31387][SQL] Handle unknown operation/session ID in HiveThriftServer2Listener" cc1e568e is described below commit cc1e568ea9cff11879a7c3eceadfe455443af3af Author: Dongjoon Hyun AuthorDate: Thu May 21 14:20:50 2020 -0700 Revert "[SPARK-31387][SQL] Handle unknown operation/session ID in HiveThriftServer2Listener" This reverts commit 5198b6853b2e3bc69fc013c653aa163c79168366. --- .../ui/HiveThriftServer2Listener.scala | 106 - .../hive/thriftserver/HiveSessionImplSuite.scala | 73 -- .../ui/HiveThriftServer2ListenerSuite.scala| 17 .../hive/service/cli/session/HiveSessionImpl.java | 20 ++-- .../hive/service/cli/session/HiveSessionImpl.java | 20 ++-- 5 files changed, 53 insertions(+), 183 deletions(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala index 6b7e5ee..6d0a506 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala @@ -25,7 +25,6 @@ import scala.collection.mutable.ArrayBuffer import org.apache.hive.service.server.HiveServer2 import org.apache.spark.{SparkConf, SparkContext} -import org.apache.spark.internal.Logging import org.apache.spark.internal.config.Status.LIVE_ENTITY_UPDATE_PERIOD import org.apache.spark.scheduler._ import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.ExecutionState @@ -39,7 +38,7 @@ private[thriftserver] class HiveThriftServer2Listener( kvstore: ElementTrackingStore, sparkConf: SparkConf, server: Option[HiveServer2], -live: Boolean = true) extends SparkListener with Logging { +live: Boolean = true) extends SparkListener { private val sessionList = new ConcurrentHashMap[String, LiveSessionData]() private val executionList = new ConcurrentHashMap[String, LiveExecutionData]() @@ -132,83 +131,60 @@ private[thriftserver] class HiveThriftServer2Listener( updateLiveStore(session) } - private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit = -Option(sessionList.get(e.sessionId)) match { - case Some(sessionData) => -sessionData.finishTimestamp = e.finishTime -updateStoreWithTriggerEnabled(sessionData) -sessionList.remove(e.sessionId) - case None => logWarning(s"onSessionClosed called with unknown session id: ${e.sessionId}") -} + private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit = { +val session = sessionList.get(e.sessionId) +session.finishTimestamp = e.finishTime +updateStoreWithTriggerEnabled(session) +sessionList.remove(e.sessionId) + } private def onOperationStart(e: SparkListenerThriftServerOperationStart): Unit = { -val executionData = getOrCreateExecution( +val info = getOrCreateExecution( e.id, e.statement, e.sessionId, e.startTime, e.userName) -executionData.state = ExecutionState.STARTED -executionList.put(e.id, executionData) -executionData.groupId = e.groupId -updateLiveStore(executionData) - -Option(sessionList.get(e.sessionId)) match { - case Some(sessionData) => -sessionData.totalExecution += 1 -updateLiveStore(sessionData) - case None => 
logWarning(s"onOperationStart called with unknown session id: ${e.sessionId}." + -s"Regardless, the operation has been registered.") -} +info.state = ExecutionState.STARTED +executionList.put(e.id, info) +sessionList.get(e.sessionId).totalExecution += 1 +executionList.get(e.id).groupId = e.groupId +updateLiveStore(executionList.get(e.id)) +updateLiveStore(sessionList.get(e.sessionId)) } - private def onOperationParsed(e: SparkListenerThriftServerOperationParsed): Unit = -Option(executionList.get(e.id)) match { - case Some(executionData) => -executionData.executePlan = e.executionPlan -executionData.state = ExecutionState.COMPILED -updateLiveStore(executionData) - case None => logWarning(s"onOperationParsed called with unknown operation id: ${e.id}") -} + private def onOperationParsed(e: SparkListenerThriftServerOperationParsed): Unit = { +executionList.get(e.id).executePlan = e.executionPlan +executionList.get(e.id).state = ExecutionState.COMPILED +updateLiveStore(executionList.get(e.id)) + } - private def onOperationCanceled(e: Spar
[spark] branch branch-2.4 updated: [SPARK-31787][K8S][TESTS][2.4] Fix Minikube.getIfNewMinikubeStatus to understand 1.5+
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 1d1a207 [SPARK-31787][K8S][TESTS][2.4] Fix Minikube.getIfNewMinikubeStatus to understand 1.5+ 1d1a207 is described below commit 1d1a2079a56de25b0d14dd57f458a0f3830a6e14 Author: Marcelo Vanzin AuthorDate: Thu May 21 14:09:51 2020 -0700 [SPARK-31787][K8S][TESTS][2.4] Fix Minikube.getIfNewMinikubeStatus to understand 1.5+ ### What changes were proposed in this pull request? This PR aims to fix the testing infra to support Minikube 1.5+ in K8s IT. Also, note that this is a subset of #26488 with the same ownership. ### Why are the changes needed? This helps us testing `master/3.0/2.4` in the same Minikube version. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass the Jenkins K8s IT with Minikube v0.34.1. - Manually, test with Minikube 1.5.x. ``` KubernetesSuite: - Run SparkPi with no resources - Run SparkPi with a very long application name. - Use SparkLauncher.NO_RESOURCE - Run SparkPi with a master URL without a scheme. - Run SparkPi with an argument. - Run SparkPi with custom labels, annotations, and environment variables. - Run extraJVMOptions check on driver - Run SparkRemoteFileTest using a remote data file - Run SparkPi with env and mount secrets. - Run PySpark on simple pi.py example - Run PySpark with Python2 to test a pyfiles example - Run PySpark with Python3 to test a pyfiles example - Run PySpark with memory customization - Run in client mode. Run completed in 4 minutes, 37 seconds. Total number of tests run: 14 Suites: completed 2, aborted 0 Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0 All tests passed. ``` Closes #28599 from dongjoon-hyun/SPARK-31787. 
Authored-by: Marcelo Vanzin Signed-off-by: Dongjoon Hyun --- .../integrationtest/backend/minikube/Minikube.scala| 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala index 78ef44b..5968d3a 100644 --- a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala +++ b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala @@ -30,6 +30,7 @@ private[spark] object Minikube extends Logging { private val KUBELET_PREFIX = "kubelet:" private val APISERVER_PREFIX = "apiserver:" private val KUBECTL_PREFIX = "kubectl:" + private val KUBECONFIG_PREFIX = "kubeconfig:" private val MINIKUBE_VM_PREFIX = "minikubeVM: " private val MINIKUBE_PREFIX = "minikube: " private val MINIKUBE_PATH = ".minikube" @@ -86,18 +87,23 @@ private[spark] object Minikube extends Logging { val kubeletString = statusString.find(_.contains(s"$KUBELET_PREFIX ")) val apiserverString = statusString.find(_.contains(s"$APISERVER_PREFIX ")) val kubectlString = statusString.find(_.contains(s"$KUBECTL_PREFIX ")) +val kubeconfigString = statusString.find(_.contains(s"$KUBECONFIG_PREFIX ")) +val hasConfigStatus = kubectlString.isDefined || kubeconfigString.isDefined -if (hostString.isEmpty || kubeletString.isEmpty - || apiserverString.isEmpty || kubectlString.isEmpty) { +if (hostString.isEmpty || kubeletString.isEmpty || apiserverString.isEmpty || +!hasConfigStatus) { MinikubeStatus.NONE } else { val status1 = hostString.get.replaceFirst(s"$HOST_PREFIX ", "") val status2 = kubeletString.get.replaceFirst(s"$KUBELET_PREFIX ", "") val status3 = apiserverString.get.replaceFirst(s"$APISERVER_PREFIX ", "") - val status4 = kubectlString.get.replaceFirst(s"$KUBECTL_PREFIX ", "") - if (!status4.contains("Correctly Configured:")) { -MinikubeStatus.NONE + val isConfigured = if (kubectlString.isDefined) { +val cfgStatus = kubectlString.get.replaceFirst(s"$KUBECTL_PREFIX ", "") +cfgStatus.contains("Correctly Configured:") } else { +kubeconfigString.get.replaceFirst(s"$KUBECONFIG_PREFIX ", "") == "Configured" + } + if (isConfigured) { val stats = List(status1, status2, status3) .map(MinikubeStatus.unapply) .map(_.getOrElse(throw new IllegalStateException(s"Unknown status $statusString"))) @@ -106,6 +112,8 @@ private[spark] object
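The patched check above accepts either the pre-1.5 `kubectl:` line ("Correctly Configured: ...") or the 1.5+ `kubeconfig:` line ("Configured"). A standalone sketch of that branching, driven by example `minikube status` output for both formats (the sample strings and object name are illustrative, not captured from a real run):

```scala
object MinikubeStatusCheck {
  private val KubectlPrefix = "kubectl:"
  private val KubeconfigPrefix = "kubeconfig:"

  /** True if either the old kubectl line or the new kubeconfig line reports a usable config. */
  def hasConfiguredKubeconfig(statusLines: Seq[String]): Boolean = {
    val kubectlLine = statusLines.find(_.contains(s"$KubectlPrefix "))
    val kubeconfigLine = statusLines.find(_.contains(s"$KubeconfigPrefix "))
    if (kubectlLine.isDefined) {
      // Minikube < 1.5 reports "kubectl: Correctly Configured: ..."
      kubectlLine.get.replaceFirst(s"$KubectlPrefix ", "").contains("Correctly Configured:")
    } else {
      // Minikube 1.5+ reports "kubeconfig: Configured"
      kubeconfigLine.exists(_.replaceFirst(s"$KubeconfigPrefix ", "") == "Configured")
    }
  }

  def main(args: Array[String]): Unit = {
    val pre15 = Seq("host: Running", "kubelet: Running", "apiserver: Running",
      "kubectl: Correctly Configured: pointing to minikube-vm at 192.168.99.100")
    val v15plus = Seq("host: Running", "kubelet: Running", "apiserver: Running",
      "kubeconfig: Configured")
    assert(hasConfiguredKubeconfig(pre15))
    assert(hasConfiguredKubeconfig(v15plus))
  }
}
```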
[spark] branch branch-2.4 updated (11e97b9 -> 1d1a207)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 11e97b9  [K8S][MINOR] Log minikube version when running integration tests.
     add 1d1a207  [SPARK-31787][K8S][TESTS][2.4] Fix Minikube.getIfNewMinikubeStatus to understand 1.5+

No new revisions were added by this update.

Summary of changes:
 .../integrationtest/backend/minikube/Minikube.scala | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 92877c4 [SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0 92877c4 is described below commit 92877c4ef2ad113c156b7d9c359f396187c78fa3 Author: Kousuke Saruta AuthorDate: Thu May 21 11:43:25 2020 -0700 [SPARK-31765][WEBUI] Upgrade HtmlUnit >= 2.37.0 ### What changes were proposed in this pull request? This PR upgrades HtmlUnit. Selenium and Jetty also upgraded because of dependency. ### Why are the changes needed? Recently, a security issue which affects HtmlUnit is reported. https://nvd.nist.gov/vuln/detail/CVE-2020-5529 According to the report, arbitrary code can be run by malicious users. HtmlUnit is used for test so the impact might not be large but it's better to upgrade it just in case. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing testcases. Closes #28585 from sarutak/upgrade-htmlunit. Authored-by: Kousuke Saruta Signed-off-by: Gengliang Wang --- core/pom.xml | 2 +- core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 7 ++- core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala | 3 ++- pom.xml | 10 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- streaming/pom.xml | 2 +- 7 files changed, 17 insertions(+), 11 deletions(-) diff --git a/core/pom.xml b/core/pom.xml index b0f6888..14b217d 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -334,7 +334,7 @@ org.seleniumhq.selenium - selenium-htmlunit-driver + htmlunit-driver test diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala index 4b4788f..f1962ef 100644 --- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala +++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala @@ -23,6 +23,7 @@ import javax.servlet.DispatcherType import javax.servlet.http._ import scala.language.implicitConversions +import scala.util.Try import scala.xml.Node import org.eclipse.jetty.client.HttpClient @@ -500,7 +501,11 @@ private[spark] case class ServerInfo( threadPool match { case pool: QueuedThreadPool => // Workaround for SPARK-30385 to avoid Jetty's acceptor thread shrink. -pool.setIdleTimeout(0) +// As of Jetty 9.4.21, the implementation of +// QueuedThreadPool#setIdleTimeout is changed and IllegalStateException +// will be thrown if we try to set idle timeout after the server has started. +// But this workaround works for Jetty 9.4.28 by ignoring the exception. 
+Try(pool.setIdleTimeout(0)) case _ => } server.stop() diff --git a/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala b/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala index 3ec9385..e96d82a 100644 --- a/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala +++ b/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala @@ -24,6 +24,7 @@ import javax.servlet.http.{HttpServletRequest, HttpServletResponse} import scala.io.Source import scala.xml.Node +import com.gargoylesoftware.css.parser.CSSParseException import com.gargoylesoftware.htmlunit.DefaultCssErrorHandler import org.json4s._ import org.json4s.jackson.JsonMethods @@ -33,7 +34,6 @@ import org.scalatest._ import org.scalatest.concurrent.Eventually._ import org.scalatest.time.SpanSugar._ import org.scalatestplus.selenium.WebBrowser -import org.w3c.css.sac.CSSParseException import org.apache.spark._ import org.apache.spark.LocalSparkContext._ @@ -784,6 +784,7 @@ class UISeleniumSuite extends SparkFunSuite with WebBrowser with Matchers with B eventually(timeout(10.seconds), interval(50.milliseconds)) { goToUi(sc, "/jobs") + val jobDesc = driver.findElement(By.cssSelector("div[class='application-timeline-content']")) jobDesc.getAttribute("data-title") should include ("collect at:25") diff --git a/pom.xml b/pom.xml index fd4cebc..29f7fec 100644 --- a/pom.xml +++ b/pom.xml @@ -139,7 +139,7 @@ com.twitter 1.6.0 -9.4.18.v20190429 +9.4.28.v20200408 3.1.0 0.9.5 2.4.0 @@ -187,8 +187,8 @@ 0.12.0 4.7.1 1.1 -2.52.0 -2.22 +3.141.59 +2.40.0 @@ -591,8 +591,8 @@ org.seleniumhq.selenium -selenium-htmlunit-driver -${selenium.version} +
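The `Try(pool.setIdleTimeout(0))` change above keeps the SPARK-30385 workaround while swallowing the `IllegalStateException` that Jetty 9.4.21+ throws when the idle timeout is set after the server has started. A trimmed illustration of that "best-effort call" pattern, using a stand-in pool class rather than Jetty itself:

```scala
import scala.util.Try

object IgnoreIfUnsupported {
  // Stand-in for Jetty's QueuedThreadPool: rejects the setter once the server is "started".
  class FakePool(started: Boolean) {
    def setIdleTimeout(ms: Int): Unit =
      if (started) throw new IllegalStateException("server already started") else ()
  }

  def main(args: Array[String]): Unit = {
    val runningPool = new FakePool(started = true)
    // Without Try the exception would propagate; wrapping it turns the workaround into a no-op.
    Try(runningPool.setIdleTimeout(0))
    println("shutdown continued despite the unsupported setter")
  }
}
```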
[spark] branch master updated: [SPARK-29303][WEB UI] Add UI support for stage level scheduling
This is an automated email from the ASF dual-hosted git repository. tgraves pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b64688e [SPARK-29303][WEB UI] Add UI support for stage level scheduling b64688e is described below commit b64688ebbaac7afd3734c0d84d1e77b1fd2d2e9d Author: Thomas Graves AuthorDate: Thu May 21 13:11:35 2020 -0500 [SPARK-29303][WEB UI] Add UI support for stage level scheduling ### What changes were proposed in this pull request? This adds UI updates to support stage level scheduling and ResourceProfiles. 3 main things have been added. ResourceProfile id added to the Stage page, the Executors page now has an optional selectable column to show the ResourceProfile Id of each executor, and the Environment page now has a section with the ResourceProfile ids. Along with this the rest api for environment page was updated to include the Resource profile information. I debating on splitting the resource profile information into its own page but I wasn't sure it called for a completely separate page. Open to peoples thoughts on this. Screen shots: ![Screen Shot 2020-04-01 at 3 07 46 PM](https://user-images.githubusercontent.com/4563792/78185169-469a7000-7430-11ea-8b0c-d9ede2d41df8.png) ![Screen Shot 2020-04-01 at 3 08 14 PM](https://user-images.githubusercontent.com/4563792/78185175-48fcca00-7430-11ea-8d1d-6b9333700f32.png) ![Screen Shot 2020-04-01 at 3 09 03 PM](https://user-images.githubusercontent.com/4563792/78185176-4a2df700-7430-11ea-92d9-73c382bb0f32.png) ![Screen Shot 2020-04-01 at 11 05 48 AM](https://user-images.githubusercontent.com/4563792/78185186-4dc17e00-7430-11ea-8962-f749dd47ea60.png) ### Why are the changes needed? For user to be able to know what resource profile was used with which stage and executors. The resource profile information is also available so user debugging can see exactly what resources were requested with that profile. ### Does this PR introduce any user-facing change? Yes, UI updates. ### How was this patch tested? Unit tests and tested on yarn both active applications and with the history server. Closes #28094 from tgravescs/SPARK-29303-pr. 
Lead-authored-by: Thomas Graves Co-authored-by: Thomas Graves Signed-off-by: Thomas Graves --- .../org/apache/spark/SparkFirehoseListener.java| 5 + .../spark/ui/static/executorspage-template.html| 1 + .../org/apache/spark/ui/static/executorspage.js| 7 +- .../main/scala/org/apache/spark/SparkContext.scala | 2 +- .../deploy/history/HistoryAppStatusStore.scala | 3 +- .../spark/resource/ResourceProfileManager.scala| 7 +- .../spark/scheduler/EventLoggingListener.scala | 4 + .../org/apache/spark/scheduler/SparkListener.scala | 12 +++ .../apache/spark/scheduler/SparkListenerBus.scala | 2 + .../apache/spark/status/AppStatusListener.scala| 33 +- .../org/apache/spark/status/AppStatusStore.scala | 8 +- .../scala/org/apache/spark/status/LiveEntity.scala | 25 - .../status/api/v1/OneApplicationResource.scala | 4 +- .../scala/org/apache/spark/status/api/v1/api.scala | 18 +++- .../scala/org/apache/spark/status/storeTypes.scala | 7 ++ .../org/apache/spark/ui/env/EnvironmentPage.scala | 48 + .../scala/org/apache/spark/ui/jobs/JobPage.scala | 4 +- .../scala/org/apache/spark/ui/jobs/StagePage.scala | 4 + .../scala/org/apache/spark/util/JsonProtocol.scala | 101 +-- .../app_environment_expectation.json | 3 +- .../application_list_json_expectation.json | 15 +++ .../blacklisting_for_stage_expectation.json| 3 +- .../blacklisting_node_for_stage_expectation.json | 3 +- .../complete_stage_list_json_expectation.json | 9 +- .../completed_app_list_json_expectation.json | 15 +++ .../executor_list_json_expectation.json| 3 +- ...ist_with_executor_metrics_json_expectation.json | 12 ++- .../executor_memory_usage_expectation.json | 15 ++- .../executor_node_blacklisting_expectation.json| 15 ++- ...de_blacklisting_unblacklisting_expectation.json | 15 ++- .../executor_resource_information_expectation.json | 9 +- .../failed_stage_list_json_expectation.json| 3 +- .../limit_app_list_json_expectation.json | 30 +++--- .../minDate_app_list_json_expectation.json | 18 +++- .../minEndDate_app_list_json_expectation.json | 15 +++ .../multiple_resource_profiles_expectation.json| 112 + .../one_stage_attempt_json_expectation.json| 3 +- .../one_stage_json_expectation.json| 3 +- .../stage_list_json_expectation.json | 12 ++- ...age_list_with_accumulable_json_expectation.json |
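For context on where the new UI columns get their values: the ResourceProfile id shown on the Stage and Executors pages comes from the profile attached to a stage's RDD by the stage-level scheduling API (the SPARK-27495 line of work). The sketch below is illustrative only; the builder method names are as I recall them and may differ from the exact API available at this commit:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

object StageLevelSchedulingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rp-demo").getOrCreate()
    val sc = spark.sparkContext

    // Request bigger executors and 2 CPUs per task for one specific stage.
    val execReqs = new ExecutorResourceRequests().cores(4).memory("6g")
    val taskReqs = new TaskResourceRequests().cpus(2)
    val profile = new ResourceProfileBuilder().require(execReqs).require(taskReqs).build

    // Stages computing this RDD report profile.id on the Stage/Executors pages described above.
    val counted = sc.parallelize(1 to 1000, 8)
      .withResources(profile)
      .map(_ * 2)
      .count()

    println(s"count=$counted, resourceProfileId=${profile.id}")
    spark.stop()
  }
}
```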
[spark] branch master updated (dae7988 -> f1495c5)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from dae7988  [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener
     add f1495c5  [SPARK-31688][WEBUI] Refactor Pagination framework

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ui/PagedTable.scala      | 101 -
 .../org/apache/spark/ui/jobs/AllJobsPage.scala      | 128 ++-
 .../org/apache/spark/ui/jobs/StageTable.scala       | 114 ++
 .../org/apache/spark/ui/storage/RDDPage.scala       |  64 ++
 .../spark/sql/execution/ui/AllExecutionsPage.scala  | 123 ++
 .../hive/thriftserver/ui/ThriftServerPage.scala     | 251 +
 .../thriftserver/ui/ThriftServerSessionPage.scala   |  29 +--
 7 files changed, 238 insertions(+), 572 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [K8S][MINOR] Log minikube version when running integration tests.
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 11e97b9 [K8S][MINOR] Log minikube version when running integration tests. 11e97b9 is described below commit 11e97b92b576102e77db287199c905abb3ba4244 Author: Marcelo Vanzin AuthorDate: Fri Mar 1 11:24:08 2019 -0800 [K8S][MINOR] Log minikube version when running integration tests. Closes #23893 from vanzin/minikube-version. Authored-by: Marcelo Vanzin Signed-off-by: Marcelo Vanzin (cherry picked from commit 9f16af636661ec1f7af057764409fe359da0026a) Signed-off-by: Dongjoon Hyun --- .../k8s/integrationtest/backend/minikube/MinikubeTestBackend.scala | 1 + 1 file changed, 1 insertion(+) diff --git a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/MinikubeTestBackend.scala b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/MinikubeTestBackend.scala index cb93241..f92977d 100644 --- a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/MinikubeTestBackend.scala +++ b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/MinikubeTestBackend.scala @@ -25,6 +25,7 @@ private[spark] object MinikubeTestBackend extends IntegrationTestBackend { private var defaultClient: DefaultKubernetesClient = _ override def initialize(): Unit = { +Minikube.logVersion() val minikubeStatus = Minikube.getMinikubeStatus require(minikubeStatus == MinikubeStatus.RUNNING, s"Minikube must be running to use the Minikube backend for integration tests." + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 0f2afd3 [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener 0f2afd3 is described below commit 0f2afd3455e164dcd273f6c69774d08f7121af8d Author: Vinoo Ganesh AuthorDate: Thu May 21 16:06:28 2020 + [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener ## What changes were proposed in this pull request? This change was made as a result of the conversation on https://issues.apache.org/jira/browse/SPARK-31354 and is intended to continue work from that ticket here. This change fixes a memory leak where SparkSession listeners are never cleared off of the SparkContext listener bus. Before running this PR, the following code: ``` SparkSession.builder().master("local").getOrCreate() SparkSession.clearActiveSession() SparkSession.clearDefaultSession() SparkSession.builder().master("local").getOrCreate() SparkSession.clearActiveSession() SparkSession.clearDefaultSession() ``` would result in a SparkContext with the following listeners on the listener bus: ``` [org.apache.spark.status.AppStatusListener5f610071, org.apache.spark.HeartbeatReceiverd400c17, org.apache.spark.sql.SparkSession$$anon$125849aeb, <-First instance org.apache.spark.sql.SparkSession$$anon$1fadb9a0] <- Second instance ``` After this PR, the execution of the same code above results in SparkContext with the following listeners on the listener bus: ``` [org.apache.spark.status.AppStatusListener5f610071, org.apache.spark.HeartbeatReceiverd400c17, org.apache.spark.sql.SparkSession$$anon$125849aeb] <-One instance ``` ## How was this patch tested? * Unit test included as a part of the PR Closes #28128 from vinooganesh/vinooganesh/SPARK-27958. Lead-authored-by: Vinoo Ganesh Co-authored-by: Vinoo Ganesh Co-authored-by: Vinoo Ganesh Signed-off-by: Wenchen Fan (cherry picked from commit dae79888dc6476892877d3b3b233381cdbf7fa74) Signed-off-by: Wenchen Fan --- .../scala/org/apache/spark/sql/SparkSession.scala | 27 +- .../spark/sql/SparkSessionBuilderSuite.scala | 25 2 files changed, 41 insertions(+), 11 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala index be597ed..60a6037 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -19,7 +19,7 @@ package org.apache.spark.sql import java.io.Closeable import java.util.concurrent.TimeUnit._ -import java.util.concurrent.atomic.AtomicReference +import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference} import scala.collection.JavaConverters._ import scala.reflect.runtime.universe.TypeTag @@ -49,7 +49,6 @@ import org.apache.spark.sql.types.{DataType, StructType} import org.apache.spark.sql.util.ExecutionListenerManager import org.apache.spark.util.{CallSite, Utils} - /** * The entry point to programming Spark with the Dataset and DataFrame API. * @@ -940,15 +939,7 @@ object SparkSession extends Logging { options.foreach { case (k, v) => session.initialSessionOptions.put(k, v) } setDefaultSession(session) setActiveSession(session) - -// Register a successfully instantiated context to the singleton. 
This should be at the -// end of the class definition so that the singleton is updated only if there is no -// exception in the construction of the instance. -sparkContext.addSparkListener(new SparkListener { - override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = { -defaultSession.set(null) - } -}) +registerContextListener(sparkContext) } return session @@ -1064,6 +1055,20 @@ object SparkSession extends Logging { // Private methods from now on + private val listenerRegistered: AtomicBoolean = new AtomicBoolean(false) + + /** Register the AppEnd listener onto the Context */ + private def registerContextListener(sparkContext: SparkContext): Unit = { +if (!listenerRegistered.get()) { + sparkContext.addSparkListener(new SparkListener { +override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = { + defaultSession.set(null) +} + }) + listenerRegistered.set(true) +} + } + /** The active SparkSession for the current thread. */
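The patch above prevents the leak by registering the ApplicationEnd listener at most once per SparkContext, guarded by an AtomicBoolean. Below is a minimal, self-contained sketch of that idempotent-registration pattern; it is not the Spark source itself: the object and method names are hypothetical, and it uses compareAndSet instead of the patch's separate get()/set(true) calls, which closes the small window in which two racing callers could both register.
```
import java.util.concurrent.atomic.AtomicBoolean

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

// Illustrative only: mirrors the shape of the registerContextListener helper added by
// this commit, with hypothetical names and a compare-and-set guard.
object IdempotentListenerRegistration {
  private val registered = new AtomicBoolean(false)

  /** Adds the ApplicationEnd listener on the first call only; later calls are no-ops. */
  def register(sc: SparkContext)(callback: () => Unit): Unit = {
    // compareAndSet flips false -> true atomically, so concurrent callers
    // cannot both add a listener to the bus.
    if (registered.compareAndSet(false, true)) {
      sc.addSparkListener(new SparkListener {
        override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = callback()
      })
    }
  }
}
```
With a guard like this in place, repeating the reproduction snippet from the commit message (two getOrCreate / clearActiveSession / clearDefaultSession cycles) should leave a single SparkSession listener on the bus, which is roughly the behaviour the new test in SparkSessionBuilderSuite is described as verifying.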
[spark] branch master updated (5d67331 -> dae7988)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5d67331 [SPARK-31762][SQL] Fix perf regression of date/timestamp formatting in toHiveString add dae7988 [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/SparkSession.scala | 27 +- .../spark/sql/SparkSessionBuilderSuite.scala | 25 2 files changed, 41 insertions(+), 11 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dae7988 [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener dae7988 is described below commit dae79888dc6476892877d3b3b233381cdbf7fa74 Author: Vinoo Ganesh AuthorDate: Thu May 21 16:06:28 2020 + [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener ## What changes were proposed in this pull request? This change was made as a result of the conversation on https://issues.apache.org/jira/browse/SPARK-31354 and is intended to continue work from that ticket here. This change fixes a memory leak where SparkSession listeners are never cleared off of the SparkContext listener bus. Before running this PR, the following code: ``` SparkSession.builder().master("local").getOrCreate() SparkSession.clearActiveSession() SparkSession.clearDefaultSession() SparkSession.builder().master("local").getOrCreate() SparkSession.clearActiveSession() SparkSession.clearDefaultSession() ``` would result in a SparkContext with the following listeners on the listener bus: ``` [org.apache.spark.status.AppStatusListener5f610071, org.apache.spark.HeartbeatReceiverd400c17, org.apache.spark.sql.SparkSession$$anon$125849aeb, <-First instance org.apache.spark.sql.SparkSession$$anon$1fadb9a0] <- Second instance ``` After this PR, the execution of the same code above results in SparkContext with the following listeners on the listener bus: ``` [org.apache.spark.status.AppStatusListener5f610071, org.apache.spark.HeartbeatReceiverd400c17, org.apache.spark.sql.SparkSession$$anon$125849aeb] <-One instance ``` ## How was this patch tested? * Unit test included as a part of the PR Closes #28128 from vinooganesh/vinooganesh/SPARK-27958. Lead-authored-by: Vinoo Ganesh Co-authored-by: Vinoo Ganesh Co-authored-by: Vinoo Ganesh Signed-off-by: Wenchen Fan --- .../scala/org/apache/spark/sql/SparkSession.scala | 27 +- .../spark/sql/SparkSessionBuilderSuite.scala | 25 2 files changed, 41 insertions(+), 11 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala index be597ed..60a6037 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -19,7 +19,7 @@ package org.apache.spark.sql import java.io.Closeable import java.util.concurrent.TimeUnit._ -import java.util.concurrent.atomic.AtomicReference +import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference} import scala.collection.JavaConverters._ import scala.reflect.runtime.universe.TypeTag @@ -49,7 +49,6 @@ import org.apache.spark.sql.types.{DataType, StructType} import org.apache.spark.sql.util.ExecutionListenerManager import org.apache.spark.util.{CallSite, Utils} - /** * The entry point to programming Spark with the Dataset and DataFrame API. * @@ -940,15 +939,7 @@ object SparkSession extends Logging { options.foreach { case (k, v) => session.initialSessionOptions.put(k, v) } setDefaultSession(session) setActiveSession(session) - -// Register a successfully instantiated context to the singleton. This should be at the -// end of the class definition so that the singleton is updated only if there is no -// exception in the construction of the instance. 
-sparkContext.addSparkListener(new SparkListener { - override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = { -defaultSession.set(null) - } -}) +registerContextListener(sparkContext) } return session @@ -1064,6 +1055,20 @@ object SparkSession extends Logging { // Private methods from now on + private val listenerRegistered: AtomicBoolean = new AtomicBoolean(false) + + /** Register the AppEnd listener onto the Context */ + private def registerContextListener(sparkContext: SparkContext): Unit = { +if (!listenerRegistered.get()) { + sparkContext.addSparkListener(new SparkListener { +override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = { + defaultSession.set(null) +} + }) + listenerRegistered.set(true) +} + } + /** The active SparkSession for the current thread. */ private val activeThreadSession = new InheritableThreadLocal[SparkSession] diff --git a/sql/core/src/test/