[spark] branch master updated (2527fbc -> e449993)
This is an automated email from the ASF dual-hosted git repository.

dkbiswal pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 2527fbc  Revert "[SPARK-32276][SQL] Remove redundant sorts before repartition nodes"
     add e449993  [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/avro/AvroSuite.scala      | 33 +-
 ...treamingUpdate.scala => SupportsMetadata.scala} | 14 +++---
 .../datasources/v2/DataSourceV2ScanExecBase.scala  | 30 -
 .../sql/execution/datasources/v2/FileScan.scala    | 27 +++-
 .../sql/execution/datasources/v2/csv/CSVScan.scala |  4 ++
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  4 ++
 .../datasources/v2/parquet/ParquetScan.scala       |  4 ++
 .../scala/org/apache/spark/sql/ExplainSuite.scala  | 50 +-
 8 files changed, 144 insertions(+), 22 deletions(-)
 copy sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/{SupportsStreamingUpdate.scala => SupportsMetadata.scala} (75%)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
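SPARK-31480 adds a `SupportsMetadata`-style hook so that DSV2 file scans (CSV, ORC, Parquet) can surface key/value details for `EXPLAIN FORMATTED` to render under the scan node. A minimal standalone sketch of that idea, in Python rather than Spark's Scala; all names here (`FileScan`, `get_meta_data`, `format_scan`) are hypothetical stand-ins, not Spark's actual API:

```python
class FileScan:
    """Toy stand-in for a DSV2 file scan that can describe itself."""

    def __init__(self, fmt, location, pushed_filters):
        self.fmt = fmt
        self.location = location
        self.pushed_filters = pushed_filters

    def get_meta_data(self):
        # Key/value pairs the EXPLAIN formatter can print under the scan node.
        return {
            "Format": self.fmt,
            "Location": self.location,
            "PushedFilters": ", ".join(self.pushed_filters) or "[]",
        }

def format_scan(scan):
    # Render the scan node plus its metadata, one "Key: value" per line.
    lines = [f"(1) Scan {scan.fmt}"]
    lines += [f"{k}: {v}" for k, v in scan.get_meta_data().items()]
    return "\n".join(lines)

out = format_scan(FileScan("parquet", "/data/t", ["IsNotNull(a)"]))
print(out)
```

The point of the indirection is that each concrete scan (CSVScan, OrcScan, ParquetScan) only contributes metadata; the shared formatter owns the layout.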
[spark] branch master updated: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8950dcb  [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

8950dcb is described below

commit 8950dcbb1cafccc2ba8bbf030ab7ac86cfe203a4
Author: Dongjoon Hyun
AuthorDate: Wed Jul 15 07:43:56 2020 -0700

    [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

    ### What changes were proposed in this pull request?

    This PR aims to add a test case to EliminateSortsSuite to protect a valid use case: using ORDER BY inside a DISTRIBUTE BY statement.

    ### Why are the changes needed?

    ```scala
    scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

    scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")

    $ ls -al /tmp/master/
    total 56
    drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:12 ./
    drwxrwxrwt  15 root      wheel     480 Jul 14 22:12 ../
    -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:12 ._SUCCESS.crc
    -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:12 .part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel      16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel      16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:12 _SUCCESS
    -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:12 part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel     932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel     939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    ```

    The following was found during SPARK-32276: if the Spark optimizer removes the inner `ORDER BY`, the file size increases.

    ```scala
    scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

    scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")

    $ ls -al /tmp/SPARK-32276/
    total 632
    drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
    drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
    -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
    -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
    -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    ```

    ### Does this PR introduce _any_ user-facing change?

    No. This only improves the test coverage.

    ### How was this patch tested?

    Pass the GitHub Action or Jenkins.

Closes #29118 from dongjoon-hyun/SPARK-32318.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala | 9 +
 1 file changed, 9 insertions(+)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
index d7eb048..e2b599a 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
@@ -284,6 +284,15 @@ class EliminateSortsSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }

+  test("SPARK-32318: should not remove orderBy in distribute statement") {
+    val projectPlan = testRelation.select('a, 'b)
+    val orderByPlan = projectPlan.orderBy('b.desc)
+    val distributedPlan = orderByPlan.distribute('a)(1)
+    val optimized = Optimize.execute(distributedPlan.analyze)
+    val correctAnswer = distributedPlan.analyze
+    comparePlans(optimized, correctAnswer)
+  }
+
   test("should not remove orderBy in le
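The ~150x size gap between the two ORC outputs above is a compression effect: sorted values produce long regular runs that encoders exploit, while shuffled values are near-incompressible. A standalone sketch of the same effect using plain `zlib` instead of ORC/snappy; the helper name `compressed_size` and the serialization scheme are illustrative choices, not anything from the commit:

```python
import random
import zlib

def compressed_size(ints):
    """Deflate-compress the ints serialized as 4-byte big-endian words."""
    data = b"".join(x.to_bytes(4, "big") for x in ints)
    return len(zlib.compress(data, level=9))

# Same shape as the repro: a shuffled integer column vs. the sorted one.
values = list(range(100_000))
random.shuffle(values)

shuffled_size = compressed_size(values)
sorted_size = compressed_size(sorted(values))
print(f"shuffled: {shuffled_size} bytes, sorted: {sorted_size} bytes")
```

The sorted run compresses to a small fraction of the shuffled run, which is why silently dropping the inner `ORDER BY` is a user-visible regression even though the query result is the same.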
[spark] branch master updated (e449993 -> 8950dcb)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from e449993  [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
     add 8950dcb  [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala | 9 +
 1 file changed, 9 insertions(+)
[spark] branch branch-3.0 updated: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 74c910a  [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

74c910a is described below

commit 74c910afb2101ac1335176a0824b508e9fd9e43f
Author: Dongjoon Hyun
AuthorDate: Wed Jul 15 07:43:56 2020 -0700

    [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

    ### What changes were proposed in this pull request?

    This PR aims to add a test case to EliminateSortsSuite to protect a valid use case: using ORDER BY inside a DISTRIBUTE BY statement.

    ### Why are the changes needed?

    ```scala
    scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

    scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")

    $ ls -al /tmp/master/
    total 56
    drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:12 ./
    drwxrwxrwt  15 root      wheel     480 Jul 14 22:12 ../
    -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:12 ._SUCCESS.crc
    -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:12 .part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel      16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel      16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:12 _SUCCESS
    -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:12 part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel     932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel     939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    ```

    The following was found during SPARK-32276: if the Spark optimizer removes the inner `ORDER BY`, the file size increases.

    ```scala
    scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

    scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")

    $ ls -al /tmp/SPARK-32276/
    total 632
    drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
    drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
    -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
    -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
    -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    ```

    ### Does this PR introduce _any_ user-facing change?

    No. This only improves the test coverage.

    ### How was this patch tested?

    Pass the GitHub Action or Jenkins.

Closes #29118 from dongjoon-hyun/SPARK-32318.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 8950dcbb1cafccc2ba8bbf030ab7ac86cfe203a4)
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala | 9 +
 1 file changed, 9 insertions(+)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
index d7eb048..e2b599a 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
@@ -284,6 +284,15 @@ class EliminateSortsSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }

+  test("SPARK-32318: should not remove orderBy in distribute statement") {
+    val projectPlan = testRelation.select('a, 'b)
+    val orderByPlan = projectPlan.orderBy('b.desc)
+    val distributedPlan = orderByPlan.distribute('a)(1)
+    val optimized = Optimize.execute(distributedPlan.analyze)
+    val correctAnswer = distributedPlan.analyze
+    comparePlans(optimized, correctAnswer)
+  }
+
[spark] branch branch-2.4 updated: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 9aeeb0f  [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

9aeeb0f is described below

commit 9aeeb0f5932550c8025b6804235a50fc203da3a1
Author: Dongjoon Hyun
AuthorDate: Wed Jul 15 07:43:56 2020 -0700

    [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

    This PR aims to add a test case to EliminateSortsSuite to protect a valid use case: using ORDER BY inside a DISTRIBUTE BY statement.

    ```scala
    scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

    scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")

    $ ls -al /tmp/master/
    total 56
    drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:12 ./
    drwxrwxrwt  15 root      wheel     480 Jul 14 22:12 ../
    -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:12 ._SUCCESS.crc
    -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:12 .part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel      16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel      16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:12 _SUCCESS
    -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:12 part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel     932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel     939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    ```

    The following was found during SPARK-32276: if the Spark optimizer removes the inner `ORDER BY`, the file size increases.

    ```scala
    scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

    scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")

    $ ls -al /tmp/SPARK-32276/
    total 632
    drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
    drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
    -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
    -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
    -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    ```

    No. This only improves the test coverage.

    Pass the GitHub Action or Jenkins.

Closes #29118 from dongjoon-hyun/SPARK-32318.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 8950dcbb1cafccc2ba8bbf030ab7ac86cfe203a4)
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala | 9 +
 1 file changed, 9 insertions(+)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
index e318f36..5d4f99a 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
@@ -83,4 +83,13 @@ class EliminateSortsSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }
+
+  test("SPARK-32318: should not remove orderBy in distribute statement") {
+    val projectPlan = testRelation.select('a, 'b)
+    val orderByPlan = projectPlan.orderBy('b.desc)
+    val distributedPlan = orderByPlan.distribute('a)(1)
+    val optimized = Optimize.execute(distributedPlan.analyze)
+    val correctAnswer = distributedPlan.analyze
+    comparePlans(optimized, correctAnswer)
+  }
 }
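The backported test pins down a subtle rule boundary: an `EliminateSorts`-style optimization may drop redundant sorts, but not a sort that feeds a repartition/DISTRIBUTE BY, since the per-partition order it produces (and the resulting file-size benefit) makes it a valid use case. A toy sketch of that boundary in Python; the `Plan` tree and `eliminate_sorts` rule are illustrative stand-ins, not Catalyst's actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Plan:
    """Minimal stand-in for a logical plan node with a single child."""
    name: str
    child: Optional["Plan"] = None

def eliminate_sorts(plan, parent=None):
    # Drop Sort nodes, except when a repartition sits directly above them:
    # that is exactly the case SPARK-32318's test protects.
    if plan is None:
        return None
    if plan.name == "Sort" and parent != "RepartitionByExpression":
        return eliminate_sorts(plan.child, parent)
    return Plan(plan.name, eliminate_sorts(plan.child, plan.name))

# SELECT * FROM (SELECT * FROM t ORDER BY b) DISTRIBUTE BY a
guarded = Plan("RepartitionByExpression", Plan("Sort", Plan("Relation")))
optimized = eliminate_sorts(guarded)
assert optimized == guarded  # the inner ORDER BY survives

# A sort under a plain Project, by contrast, is removed.
redundant = Plan("Project", Plan("Sort", Plan("Relation")))
assert eliminate_sorts(redundant) == Plan("Project", Plan("Relation"))
```

The suite's `comparePlans(optimized, correctAnswer)` check is the Catalyst equivalent of the first assertion: optimizing the guarded plan must be a no-op.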
[spark] branch master updated (e449993 -> 8950dcb)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from e449993 [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node add 8950dcb [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala | 9 + 1 file changed, 9 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 74c910a  [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

74c910a is described below

commit 74c910afb2101ac1335176a0824b508e9fd9e43f
Author: Dongjoon Hyun
AuthorDate: Wed Jul 15 07:43:56 2020 -0700

    [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

    ### What changes were proposed in this pull request?

    This PR aims to add a test case to EliminateSortsSuite to protect a valid use case: using ORDER BY inside a DISTRIBUTE BY statement.

    ### Why are the changes needed?

    ```scala
    scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
    scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")

    $ ls -al /tmp/master/
    total 56
    drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:12 ./
    drwxrwxrwt  15 root      wheel     480 Jul 14 22:12 ../
    -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:12 ._SUCCESS.crc
    -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:12 .part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel      16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel      16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:12 _SUCCESS
    -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:12 part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel     932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel     939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    ```

    The following was found during SPARK-32276. If the Spark optimizer removes the inner `ORDER BY`, the file size increases:

    ```scala
    scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
    scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")

    $ ls -al /tmp/SPARK-32276/
    total 632
    drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
    drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
    -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
    -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
    -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    ```

    ### Does this PR introduce _any_ user-facing change?

    No. This only improves the test coverage.

    ### How was this patch tested?

    Pass the GitHub Action or Jenkins.

    Closes #29118 from dongjoon-hyun/SPARK-32318.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 8950dcbb1cafccc2ba8bbf030ab7ac86cfe203a4)
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
index d7eb048..e2b599a 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
@@ -284,6 +284,15 @@ class EliminateSortsSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }

+  test("SPARK-32318: should not remove orderBy in distribute statement") {
+    val projectPlan = testRelation.select('a, 'b)
+    val orderByPlan = projectPlan.orderBy('b.desc)
+    val distributedPlan = orderByPlan.distribute('a)(1)
+    val optimized = Optimize.execute(distributedPlan.analyze)
+    val correctAnswer = distributedPlan.analyze
+    comparePlans(optimized, correctAnswer)
+  }
 }
-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 9aeeb0f  [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

9aeeb0f is described below

commit 9aeeb0f5932550c8025b6804235a50fc203da3a1
Author: Dongjoon Hyun
AuthorDate: Wed Jul 15 07:43:56 2020 -0700

    [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

    This PR aims to add a test case to EliminateSortsSuite to protect a valid use case: using ORDER BY inside a DISTRIBUTE BY statement.

    ```scala
    scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
    scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")

    $ ls -al /tmp/master/
    total 56
    drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:12 ./
    drwxrwxrwt  15 root      wheel     480 Jul 14 22:12 ../
    -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:12 ._SUCCESS.crc
    -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:12 .part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel      16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel      16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:12 _SUCCESS
    -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:12 part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel     932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel     939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
    ```

    The following was found during SPARK-32276. If the Spark optimizer removes the inner `ORDER BY`, the file size increases:

    ```scala
    scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
    scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")

    $ ls -al /tmp/SPARK-32276/
    total 632
    drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
    drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
    -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
    -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
    -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
    -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    -rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
    ```

    No. This only improves the test coverage.

    Pass the GitHub Action or Jenkins.

    Closes #29118 from dongjoon-hyun/SPARK-32318.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 8950dcbb1cafccc2ba8bbf030ab7ac86cfe203a4)
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/catalyst/optimizer/EliminateSortsSuite.scala | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
index e318f36..5d4f99a 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
@@ -83,4 +83,13 @@ class EliminateSortsSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }
+
+  test("SPARK-32318: should not remove orderBy in distribute statement") {
+    val projectPlan = testRelation.select('a, 'b)
+    val orderByPlan = projectPlan.orderBy('b.desc)
+    val distributedPlan = orderByPlan.distribute('a)(1)
+    val optimized = Optimize.execute(distributedPlan.analyze)
+    val correctAnswer = distributedPlan.analyze
+    comparePlans(optimized, correctAnswer)
+  }
 }
-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.
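The size jump above (roughly 1 KB to 150 KB per ORC file) comes from compression: a column written as a sorted run is far more regular than the same values in shuffled order, so ORC's column encoding plus snappy shrinks it much further. The following is a hedged, self-contained illustration of that effect only — it uses Python's `zlib` as a stand-in for ORC's encoder, and the column sizes here are not Spark's actual numbers.

```python
import random
import zlib

# Mimic the column "b" from the repro: distinct integers, once in
# shuffled arrival order and once after the inner ORDER BY.
random.seed(0)
values = list(range(1, 100001))
shuffled = random.sample(values, len(values))

def compressed_size(col):
    # Serialize the column as fixed-width big-endian ints, then compress.
    # zlib here is only a stand-in for ORC encoding + snappy.
    return len(zlib.compress(b"".join(v.to_bytes(4, "big") for v in col)))

sorted_size = compressed_size(sorted(shuffled))
shuffled_size = compressed_size(shuffled)

# A sorted run is highly regular, so it compresses much better --
# the same effect behind the smaller /tmp/master files above.
assert sorted_size < shuffled_size
print(f"sorted: {sorted_size} bytes, shuffled: {shuffled_size} bytes")
```

This is why eliminating the inner sort, while semantically harmless for an unordered write, still changes the on-disk footprint — and why the test above protects the sort from removal.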
[spark] branch master updated: [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature
This is an automated email from the ASF dual-hosted git repository. tgraves pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new cf22d94  [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature

cf22d94 is described below

commit cf22d947fb8f37aa4d394b6633d6f08dbbf6dc1c
Author: Erik Krogen
AuthorDate: Wed Jul 15 11:40:55 2020 -0500

    [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature

    ### What changes were proposed in this pull request?

    This PR removes references to the terms "blacklist" and "whitelist", except for the blacklisting feature as a whole, which can be handled in a separate JIRA/PR. It touches quite a few files, but the changes are straightforward (variable/method/etc. name changes) and mostly self-contained.

    ### Why are the changes needed?

    As per discussion on the Spark dev list, it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist"/"whitelist". While there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world.

    ### Does this PR introduce _any_ user-facing change?

    In the test file `HiveQueryFileTest`, a developer can set the system property `spark.hive.whitelist` to specify a list of Hive query files that should be tested. This system property has been renamed to `spark.hive.includelist`. The old property has been kept for compatibility, but will log a warning if used. I am open to feedback from others on whether keeping a deprecated property here is unnecessary given that this is just for developers running tests.

    ### How was this patch tested?
Existing tests should be suitable since no behavior changes are expected as a result of this PR. Closes #28874 from xkrogen/xkrogen-SPARK-32036-rename-blacklists. Authored-by: Erik Krogen Signed-off-by: Thomas Graves --- R/pkg/tests/fulltests/test_context.R | 2 +- R/pkg/tests/fulltests/test_sparkSQL.R | 8 ++-- R/pkg/tests/run-all.R | 4 +- .../java/org/apache/spark/network/crypto/README.md | 2 +- .../spark/deploy/history/FsHistoryProvider.scala | 29 +++-- .../spark/deploy/rest/RestSubmissionClient.scala | 4 +- .../spark/scheduler/OutputCommitCoordinator.scala | 2 +- .../scala/org/apache/spark/util/JsonProtocol.scala | 4 +- .../test/scala/org/apache/spark/ThreadAudit.scala | 4 +- .../org/apache/spark/deploy/SparkSubmitSuite.scala | 22 +- .../deploy/history/FsHistoryProviderSuite.scala| 8 ++-- .../org/apache/spark/ui/UISeleniumSuite.scala | 14 +++--- dev/sparktestsupport/modules.py| 10 ++--- docs/streaming-programming-guide.md| 50 +++--- .../streaming/JavaRecoverableNetworkWordCount.java | 20 - .../streaming/recoverable_network_wordcount.py | 16 +++ .../streaming/RecoverableNetworkWordCount.scala| 16 +++ .../scala/org/apache/spark/util/DockerUtils.scala | 6 +-- project/SparkBuild.scala | 4 +- python/pylintrc| 2 +- python/pyspark/cloudpickle.py | 6 +-- python/pyspark/sql/functions.py| 4 +- python/pyspark/sql/pandas/typehints.py | 4 +- python/run-tests.py| 2 +- .../cluster/mesos/MesosClusterScheduler.scala | 4 +- .../spark/deploy/yarn/YarnSparkHadoopUtil.scala| 2 +- .../sql/catalyst/analysis/CheckAnalysis.scala | 2 +- .../spark/sql/catalyst/json/JSONOptions.scala | 10 ++--- .../spark/sql/catalyst/optimizer/Optimizer.scala | 34 +++ .../spark/sql/catalyst/optimizer/expressions.scala | 2 +- .../plans/logical/basicLogicalOperators.scala | 2 +- .../spark/sql/catalyst/rules/RuleExecutor.scala| 6 +-- .../catalyst/optimizer/FilterPushdownSuite.scala | 2 +- .../PullupCorrelatedPredicatesSuite.scala | 2 +- .../datasources/json/JsonOutputWriter.scala| 2 +- 
.../inputs/{blacklist.sql => ignored.sql} | 2 +- .../org/apache/spark/sql/SQLQueryTestSuite.scala | 6 +-- .../org/apache/spark/sql/TPCDSQuerySuite.scala | 4 +- .../sql/execution/datasources/json/JsonSuite.scala | 2 +- .../thriftserver/ThriftServerQueryTestSuite.scala | 4 +- .../hive/execution/HiveCompatibilitySuite.scala| 16 +++ .../execution/HiveWindowFunctionQuerySuite.scala | 8 ++-- .../clientpositive/add_partition_no_includelist.q | 7
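The deprecated-property handling the email describes — prefer the new name, accept the old name with a warning — is a common rename-compatibility pattern. A minimal sketch follows; the property names come from the email, but `resolve_includelist` and the dict-based lookup are hypothetical illustrations, not Spark's actual code.

```python
import warnings

NEW_KEY = "spark.hive.includelist"
OLD_KEY = "spark.hive.whitelist"  # deprecated name, kept for compatibility

def resolve_includelist(props):
    """Return the include list, preferring the new property name.

    Falls back to the deprecated name with a DeprecationWarning,
    mirroring the compatibility behavior described above.
    """
    if NEW_KEY in props:
        return props[NEW_KEY]
    if OLD_KEY in props:
        warnings.warn(
            f"System property {OLD_KEY} is deprecated; use {NEW_KEY} instead.",
            DeprecationWarning,
        )
        return props[OLD_KEY]
    return ""

# The new name wins when both are present; the old name alone still works
# but emits a deprecation warning.
assert resolve_includelist({NEW_KEY: "a.q", OLD_KEY: "b.q"}) == "a.q"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert resolve_includelist({OLD_KEY: "b.q"}) == "b.q"
    assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```

Keeping the old key read-only (never written back) lets callers migrate gradually while the warning nudges them toward the new name.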
svn commit: r40495 - in /release/spark: spark-2.3.4/ spark-2.4.5/ spark-3.0.0-preview2/
Author: srowen Date: Wed Jul 15 17:12:28 2020 New Revision: 40495 Log: Remove non-current Spark 2.3, 2.4, 3.0 releases Removed: release/spark/spark-2.3.4/ release/spark/spark-2.4.5/ release/spark/spark-3.0.0-preview2/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (cf22d94 -> b05f309)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from cf22d94 [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature add b05f309 [SPARK-32140][ML][PYSPARK] Add training summary to FMClassificationModel No new revisions were added by this update. Summary of changes: .../spark/ml/classification/FMClassifier.scala | 100 - .../apache/spark/ml/regression/FMRegressor.scala | 10 +-- .../spark/mllib/optimization/GradientDescent.scala | 45 ++ .../apache/spark/mllib/optimization/LBFGS.scala| 11 ++- .../ml/classification/FMClassifierSuite.scala | 26 ++ python/pyspark/ml/classification.py| 48 +- python/pyspark/ml/tests/test_training_summary.py | 49 +- 7 files changed, 257 insertions(+), 32 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b05f309 -> c28a6fa)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b05f309  [SPARK-32140][ML][PYSPARK] Add training summary to FMClassificationModel
     add c28a6fa  [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

No new revisions were added by this update.

Summary of changes:
 .../spark/examples/ml/JavaTokenizerExample.java    |  4 ++--
 .../org/apache/spark/examples/SparkKMeans.scala    |  8 ++-
 .../apache/spark/sql/avro/SchemaConverters.scala   |  4 ++--
 .../spark/sql/kafka010/KafkaOffsetReader.scala     |  2 +-
 .../sql/kafka010/KafkaMicroBatchSourceSuite.scala  |  4 ++--
 .../main/scala/org/apache/spark/ml/Estimator.scala |  2 +-
 .../spark/ml/clustering/GaussianMixture.scala      | 28 +++---
 .../org/apache/spark/ml/feature/RobustScaler.scala |  4 ++--
 .../org/apache/spark/ml/feature/Word2Vec.scala     |  2 +-
 .../scala/org/apache/spark/ml/param/params.scala   |  2 +-
 .../spark/mllib/api/python/PythonMLLibAPI.scala    |  8 +++
 .../spark/mllib/clustering/BisectingKMeans.scala   |  2 +-
 .../spark/mllib/clustering/GaussianMixture.scala   | 10
 .../org/apache/spark/mllib/fpm/PrefixSpan.scala    |  2 +-
 .../org/apache/spark/mllib/rdd/SlidingRDD.scala    |  2 +-
 .../apache/spark/mllib/tree/impurity/Entropy.scala |  2 +-
 .../apache/spark/mllib/tree/impurity/Gini.scala    |  2 +-
 .../spark/mllib/tree/impurity/Variance.scala       |  2 +-
 .../apache/spark/mllib/util/NumericParser.scala    |  8 +++
 .../spark/ml/clustering/BisectingKMeansSuite.scala |  4 ++--
 .../apache/spark/ml/clustering/KMeansSuite.scala   | 12 +-
 .../ml/evaluation/ClusteringEvaluatorSuite.scala   |  2 +-
 .../apache/spark/ml/feature/NormalizerSuite.scala  | 12 +-
 .../apache/spark/ml/recommendation/ALSSuite.scala  | 12 +-
 .../spark/sql/hive/HiveExternalCatalog.scala       |  8 +++
 .../org/apache/spark/sql/hive/HiveInspectors.scala |  4 ++--
 .../spark/sql/hive/HiveMetastoreCatalog.scala      |  4 ++--
 .../org/apache/spark/sql/hive/HiveUtils.scala      |  4 ++--
 .../spark/sql/hive/client/HiveClientImpl.scala     | 24 +--
 .../apache/spark/sql/hive/client/HiveShim.scala    | 10
 .../spark/sql/hive/execution/HiveOptions.scala     |  2 +-
 .../sql/hive/execution/HiveTableScanExec.scala     |  2 +-
 .../scala/org/apache/spark/sql/hive/hiveUDFs.scala |  4 ++--
 .../spark/sql/hive/HiveShowCreateTableSuite.scala  |  2 +-
 .../apache/spark/sql/hive/StatisticsSuite.scala    |  2 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala    |  2 +-
 36 files changed, 106 insertions(+), 102 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: Revert "[SPARK-32307][SQL] ScalaUDF's canonicalized expression should exclude inputEncoders"
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 4ef535ff  Revert "[SPARK-32307][SQL] ScalaUDF's canonicalized expression should exclude inputEncoders"
4ef535ff is described below

commit 4ef535fffbc1cacbacb035b2b1ac1dffcc0352b4
Author: Dongjoon Hyun
AuthorDate: Wed Jul 15 17:43:23 2020 -0700

    Revert "[SPARK-32307][SQL] ScalaUDF's canonicalized expression should exclude inputEncoders"

    This reverts commit 785ec2ee6c2473f54b7ca6c01f446cc8bdf883fa.
---
 .../org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala |  6 --
 sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala  | 12
 2 files changed, 18 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
index 2706e4d..58a9f68 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
@@ -57,12 +57,6 @@ case class ScalaUDF(

   override def toString: String = s"${udfName.getOrElse("UDF")}(${children.mkString(", ")})"

-  override lazy val canonicalized: Expression = {
-    // SPARK-32307: `ExpressionEncoder` can't be canonicalized, and technically we don't
-    // need it to identify a `ScalaUDF`.
-    Canonicalize.execute(copy(children = children.map(_.canonicalized), inputEncoders = Nil))
-  }
-
   /**
    * The analyzer should be aware of Scala primitive types so as to make the
    * UDF return null if there is any null input value of these types.
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala
index 2ab14d5..91e9f1d 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala
@@ -609,16 +609,4 @@ class UDFSuite extends QueryTest with SharedSparkSession {
     }
     assert(e2.getMessage.contains("UDFSuite$MalformedClassObject$MalformedPrimitiveFunction"))
   }
-
-  test("SPARK-32307: Aggression that use map type input UDF as group expression") {
-    spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt))
-    Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t")
-    checkAnswer(sql("SELECT key(a) AS k FROM t GROUP BY key(a)"), Row(1) :: Nil)
-  }
-
-  test("SPARK-32307: Aggression that use array type input UDF as group expression") {
-    spark.udf.register("key", udf((m: Array[Int]) => m.head))
-    Seq(Array(1)).toDF("a").createOrReplaceTempView("t")
-    checkAnswer(sql("SELECT key(a) AS k FROM t GROUP BY key(a)"), Row(1) :: Nil)
-  }
 }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
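The reverted change above overrode `canonicalized` so that `inputEncoders` (which cannot be canonicalized) was dropped before comparing `ScalaUDF` expressions. The general idea, that excluding a non-identifying field from an expression's canonical form makes logically identical nodes compare equal, can be sketched in plain Scala. This is a minimal, hypothetical illustration: `Expr`, `encoder`, and `CanonicalizeDemo` are stand-ins invented here, not Spark's actual Catalyst classes.

```scala
// Hypothetical sketch of canonicalization-by-exclusion. `Expr` stands in for a
// Catalyst expression tree; `encoder` mimics ScalaUDF's inputEncoders: a field
// holding function values, which only ever compare equal by reference.
case class Expr(name: String, children: Seq[Expr], encoder: Seq[() => Int]) {
  // Canonical form: canonicalize children recursively and drop the encoder,
  // since it is not needed to identify the expression.
  lazy val canonicalized: Expr =
    copy(children = children.map(_.canonicalized), encoder = Nil)
}

object CanonicalizeDemo extends App {
  val a = Expr("udf", Nil, Seq(() => 1))
  val b = Expr("udf", Nil, Seq(() => 1))
  // The two function instances differ by reference, so the raw nodes differ...
  assert(a != b)
  // ...but their canonical forms, with encoders dropped, are equal.
  assert(a.canonicalized == b.canonicalized)
}
```

Group-by planning deduplicates expressions by their canonical form, which is why `GROUP BY key(a)` with a map-typed UDF (as in the removed tests) depends on this equality holding.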
[spark] branch master updated (c28a6fa -> db47c6e)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c28a6fa  [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation
     add db47c6e  [SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API

No new revisions were added by this update.

Summary of changes:
 .../api/v1/{StageStatus.java => TaskStatus.java}   |  14 +-
 .../org/apache/spark/status/AppStatusStore.scala   |  16 +-
 .../spark/status/api/v1/StagesResource.scala       |   5 +-
 ...st_w__status___offset___length_expectation.json |  99
 ...__sortBy_short_names__runtime_expectation.json} |   0
 .../stage_task_list_w__status_expectation.json     | 531 +
 .../spark/deploy/history/HistoryServerSuite.scala  |   6 +
 docs/monitoring.md                                 |   3 +-
 8 files changed, 660 insertions(+), 14 deletions(-)
 copy core/src/main/java/org/apache/spark/status/api/v1/{StageStatus.java => TaskStatus.java} (83%)
 create mode 100644 core/src/test/resources/HistoryServerExpectations/stage_task_list_w__status___offset___length_expectation.json
 copy core/src/test/resources/HistoryServerExpectations/{stage_task_list_w__sortBy_short_names__runtime_expectation.json => stage_task_list_w__status___sortBy_short_names__runtime_expectation.json} (100%)
 create mode 100644 core/src/test/resources/HistoryServerExpectations/stage_task_list_w__status_expectation.json

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c28a6fa -> db47c6e)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c28a6fa [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation add db47c6e [SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API No new revisions were added by this update. Summary of changes: .../api/v1/{StageStatus.java => TaskStatus.java} | 14 +- .../org/apache/spark/status/AppStatusStore.scala | 16 +- .../spark/status/api/v1/StagesResource.scala | 5 +- ...st_w__status___offset___length_expectation.json | 99 ...__sortBy_short_names__runtime_expectation.json} | 0 .../stage_task_list_w__status_expectation.json | 531 + .../spark/deploy/history/HistoryServerSuite.scala | 6 + docs/monitoring.md | 3 +- 8 files changed, 660 insertions(+), 14 deletions(-) copy core/src/main/java/org/apache/spark/status/api/v1/{StageStatus.java => TaskStatus.java} (83%) create mode 100644 core/src/test/resources/HistoryServerExpectations/stage_task_list_w__status___offset___length_expectation.json copy core/src/test/resources/HistoryServerExpectations/{stage_task_list_w__sortBy_short_names__runtime_expectation.json => stage_task_list_w__status___sortBy_short_names__runtime_expectation.json} (100%) create mode 100644 core/src/test/resources/HistoryServerExpectations/stage_task_list_w__status_expectation.json - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org