[spark] branch branch-3.0 updated (37d6b3c -> 698ac6a)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 37d6b3c  [SPARK-32761][SQL][3.0] Allow aggregating multiple foldable distinct expressions
     add 698ac6a  [SPARK-33165][SQL][TESTS][FOLLOW-UP] Use scala.Predef.assert instead

No new revisions were added by this update.

Summary of changes:
 core/src/test/scala/org/apache/spark/benchmark/Benchmark.scala | 4
 1 file changed, 4 insertions(+)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (8f4fc22 -> bf52fa8)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 8f4fc22  [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events
     add bf52fa8  [SPARK-33165][SQL][TESTS][FOLLOW-UP] Use scala.Predef.assert instead

No new revisions were added by this update.

Summary of changes:
 core/src/test/scala/org/apache/spark/benchmark/Benchmark.scala | 4
 1 file changed, 4 insertions(+)
[spark] branch branch-3.0 updated: [SPARK-32761][SQL][3.0] Allow aggregating multiple foldable distinct expressions
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 37d6b3c  [SPARK-32761][SQL][3.0] Allow aggregating multiple foldable distinct expressions

37d6b3c is described below

commit 37d6b3c0fafa98922ed1ecf4f8634d962f5bb9d9
Author: Linhong Liu
AuthorDate: Fri Oct 16 03:36:21 2020 +

    [SPARK-32761][SQL][3.0] Allow aggregating multiple foldable distinct expressions

    ### What changes were proposed in this pull request?
    For queries with multiple foldable distinct columns, since they will be eliminated during
    execution, it's not mandatory to let `RewriteDistinctAggregates` handle this case. And in
    the current code, `RewriteDistinctAggregates` *does* miss some "aggregating with multiple
    foldable distinct expressions" cases. For example, `select count(distinct 2), count(distinct 2, 3)`
    will be missed. But in the planner, this will trigger an error that "multiple distinct
    expressions" are not allowed. As the foldable distinct columns can ultimately be eliminated,
    we can allow this in the aggregation planner check.

    ### Why are the changes needed?
    Bug fix.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Added a test case.

    Authored-by: Linhong Liu
    Signed-off-by: Wenchen Fan
    (cherry picked from commit a410658c9bc244e325702dc926075bd835b669ff)

Closes #30052 from linhongliu-db/SPARK-32761-3.0.
Authored-by: Linhong Liu
Signed-off-by: Wenchen Fan
---
 .../main/scala/org/apache/spark/sql/execution/SparkStrategies.scala | 6 --
 sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala   | 4
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
index f836deb..689d1eb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
@@ -517,7 +517,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         val (functionsWithDistinct, functionsWithoutDistinct) =
           aggregateExpressions.partition(_.isDistinct)
-        if (functionsWithDistinct.map(_.aggregateFunction.children.toSet).distinct.length > 1) {
+        if (functionsWithDistinct.map(
+            _.aggregateFunction.children.filterNot(_.foldable).toSet).distinct.length > 1) {
           // This is a sanity check. We should not reach here when we have multiple distinct
           // column sets. Our `RewriteDistinctAggregates` should take care this case.
           sys.error("You hit a query analyzer bug. Please report your query to " +
@@ -548,7 +549,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         // to be [COUNT(DISTINCT foo), MAX(DISTINCT foo)], but
         // [COUNT(DISTINCT bar), COUNT(DISTINCT foo)] is disallowed because those two distinct
         // aggregates have different column expressions.
-        val distinctExpressions = functionsWithDistinct.head.aggregateFunction.children
+        val distinctExpressions =
+          functionsWithDistinct.head.aggregateFunction.children.filterNot(_.foldable)
         val normalizedNamedDistinctExpressions = distinctExpressions.map { e =>
           // Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here
           // because `distinctExpressions` is not extracted during logical phase.
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
index 7869005..85cbe45 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
@@ -2467,6 +2467,10 @@ class DataFrameSuite extends QueryTest
     val df = l.join(r, $"col2" === $"col4", "LeftOuter")
     checkAnswer(df, Row("2", "2"))
   }
+
+  test("SPARK-32761: aggregating multiple distinct CONSTANT columns") {
+    checkAnswer(sql("select count(distinct 2), count(distinct 2,3)"), Row(1, 1))
+  }
 }

 case class GroupByKey(a: Int, b: Int)
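The effect of the patched sanity check can be illustrated with a self-contained sketch. Note that `Expr` and `DistinctAgg` below are hypothetical stand-ins for Catalyst's expression and distinct-aggregate classes, not Spark types; only the filter-then-compare logic mirrors the diff.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FoldableDistinctCheck {
    // Stand-in for a Catalyst expression: just a name and a foldable flag.
    record Expr(String name, boolean foldable) {}

    // Stand-in for a distinct aggregate function with its child expressions.
    record DistinctAgg(List<Expr> children) {}

    // Mirrors the patched check: compare the distinct aggregates' argument
    // sets only *after* dropping foldable (constant) children.
    static boolean hasMultipleDistinctColumnSets(List<DistinctAgg> aggs) {
        long distinctSets = aggs.stream()
            .map(a -> a.children().stream()
                .filter(e -> !e.foldable())
                .collect(Collectors.toSet()))
            .distinct()
            .count();
        return distinctSets > 1;
    }

    public static void main(String[] args) {
        Expr two = new Expr("2", true);
        Expr three = new Expr("3", true);
        // count(distinct 2) and count(distinct 2, 3): both argument sets
        // become empty once constants are dropped, so the check passes.
        List<DistinctAgg> constsOnly = List.of(
            new DistinctAgg(List.of(two)),
            new DistinctAgg(List.of(two, three)));
        System.out.println(hasMultipleDistinctColumnSets(constsOnly)); // false

        // count(distinct foo) and count(distinct bar) still trip the check.
        Expr foo = new Expr("foo", false);
        Expr bar = new Expr("bar", false);
        List<DistinctAgg> twoCols = List.of(
            new DistinctAgg(List.of(foo)),
            new DistinctAgg(List.of(bar)));
        System.out.println(hasMultipleDistinctColumnSets(twoCols)); // true
    }
}
```

Before the fix, the first case produced two unequal sets ({2} and {2, 3}) and hit the "query analyzer bug" error path, even though constant-folding would later remove both constants anyway.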
[spark] branch master updated: [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8f4fc22  [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

8f4fc22 is described below

commit 8f4fc22dc460eb05c47e0d61facf116c60b1be37
Author: Samuel Souza
AuthorDate: Thu Oct 15 22:12:41 2020 -0500

    [SPARK-33088][CORE] Enhance ExecutorPlugin API to include callbacks on task start and end events

    ### What changes were proposed in this pull request?
    Proposing a new set of APIs for ExecutorPlugins, to provide callbacks invoked at the start
    and end of each task of a job. Not very opinionated on the shape of the API; tried to be as
    minimal as possible for now.

    ### Why are the changes needed?
    The changes are described in detail on [SPARK-33088](https://issues.apache.org/jira/browse/SPARK-33088), but mostly boil down to:
    1. This feature was considered when the ExecutorPlugin API was initially introduced in #21923, but never implemented.
    2. The use-case which **requires** this feature is to propagate tracing information from the driver to the executor, such that calls from the same job can all be traced.
       a. Tracing frameworks are usually set up in thread locals, so it's important for the setup to happen in the same thread which runs the tasks.
       b. Executors can serve multiple jobs, so it's not sufficient to set tracing information at executor startup time -- it needs to happen every time a task starts or ends.

    ### Does this PR introduce _any_ user-facing change?
    No. This PR introduces new features for future developers to use.

    ### How was this patch tested?
    Unit tests in `PluginContainerSuite`.

Closes #29977 from fsamuel-bs/SPARK-33088.
Authored-by: Samuel Souza
Signed-off-by: Mridul Muralidharan gmail.com>
---
 .../apache/spark/api/plugin/ExecutorPlugin.java    | 42 +++
 .../scala/org/apache/spark/executor/Executor.scala | 32 --
 .../spark/internal/plugin/PluginContainer.scala    | 49 +-
 .../scala/org/apache/spark/scheduler/Task.scala    |  6 ++-
 .../internal/plugin/PluginContainerSuite.scala     | 47 +
 .../apache/spark/scheduler/TaskContextSuite.scala  |  4 +-
 6 files changed, 163 insertions(+), 17 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/api/plugin/ExecutorPlugin.java b/core/src/main/java/org/apache/spark/api/plugin/ExecutorPlugin.java
index 4961308..481bf98 100644
--- a/core/src/main/java/org/apache/spark/api/plugin/ExecutorPlugin.java
+++ b/core/src/main/java/org/apache/spark/api/plugin/ExecutorPlugin.java
@@ -19,6 +19,7 @@ package org.apache.spark.api.plugin;

 import java.util.Map;

+import org.apache.spark.TaskFailedReason;
 import org.apache.spark.annotation.DeveloperApi;

 /**
@@ -54,4 +55,45 @@ public interface ExecutorPlugin {
    */
   default void shutdown() {}

+  /**
+   * Perform any action before the task is run.
+   *
+   * This method is invoked from the same thread the task will be executed.
+   * Task-specific information can be accessed via {@link org.apache.spark.TaskContext#get}.
+   *
+   * Plugin authors should avoid expensive operations here, as this method will be called
+   * on every task, and doing something expensive can significantly slow down a job.
+   * It is not recommended for a user to call a remote service, for example.
+   *
+   * Exceptions thrown from this method do not propagate - they're caught,
+   * logged, and suppressed. Therefore exceptions when executing this method won't
+   * make the job fail.
+   *
+   * @since 3.1.0
+   */
+  default void onTaskStart() {}
+
+  /**
+   * Perform an action after tasks completes without exceptions.
+ * + * As {@link #onTaskStart() onTaskStart} exceptions are suppressed, this method + * will still be invoked even if the corresponding {@link #onTaskStart} call for this + * task failed. + * + * The same warnings as for {@link #onTaskStart() onTaskStart} apply here. + * + * @since 3.1.0 + */ + default void onTaskSucceeded() {} + + /** + * Perform an action after a task completes with an exception. + * + * The same warnings as for {@link #onTaskStart() onTaskStart} apply here. + * + * @param failureReason the exception thrown from the failed task. + * + * @since 3.1.0 + */ + default void onTaskFailed(TaskFailedReason failureReason) {} } diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/spark/executor/Executor.scala index 27addd8..6653650 100644 --- a/core/src/main/scala/org/apache/spark/executor/Executor.scala +++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala @@ -253,7 +253,7 @@ private[spark] class Executor(
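The thread-local tracing pattern that motivates these callbacks can be sketched without a Spark dependency. The real interface is `org.apache.spark.api.plugin.ExecutorPlugin` and `onTaskFailed` receives a `TaskFailedReason`; the `TracingPlugin` class below is a hypothetical, self-contained mock that uses plain strings so the lifecycle can be shown in isolation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an ExecutorPlugin implementation; not Spark code.
class TracingPlugin {
    // Tracing frameworks usually keep context in thread-locals, which is why
    // the callbacks must run on the same thread that executes the task.
    private static final ThreadLocal<String> traceId = new ThreadLocal<>();

    // Visible record of what happened, for illustration only.
    final List<String> log = new ArrayList<>();

    // Mirrors onTaskStart(): install the per-task trace context.
    void onTaskStart(String taskTraceId) {
        traceId.set(taskTraceId);
        log.add("start:" + taskTraceId);
    }

    // Mirrors onTaskSucceeded(): report and clear the context. Invoked even if
    // onTaskStart threw, since those exceptions are caught and suppressed.
    void onTaskSucceeded() {
        log.add("ok:" + traceId.get());
        traceId.remove();
    }

    // Mirrors onTaskFailed(TaskFailedReason), with the reason as a string here.
    void onTaskFailed(String reason) {
        log.add("fail:" + traceId.get() + ":" + reason);
        traceId.remove();
    }
}
```

Because the executor calls all three hooks on the task's own thread, the `ThreadLocal` set in `onTaskStart` is still visible when the end-of-task hook runs, and clearing it there prevents leakage into the next task scheduled on that thread.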
[spark] branch branch-3.0 updated (d0f1120 -> 160f458)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from d0f1120 [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files add 160f458 [SPARK-33165][SQL][TEST] Remove dependencies(scalatest,scalactic) from Benchmark No new revisions were added by this update. Summary of changes: core/src/test/scala/org/apache/spark/benchmark/Benchmark.scala | 5 - .../apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala | 3 ++- 2 files changed, 2 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (bf594a9 -> a5c17de)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from bf594a9 [SPARK-32402][SQL][FOLLOW-UP] Add case sensitivity tests for column resolution in ALTER TABLE add a5c17de [SPARK-33165][SQL][TEST] Remove dependencies(scalatest,scalactic) from Benchmark No new revisions were added by this update. Summary of changes: core/src/test/scala/org/apache/spark/benchmark/Benchmark.scala | 5 - .../apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala | 3 ++- 2 files changed, 2 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (38c05af -> bf594a9)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 38c05af [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files add bf594a9 [SPARK-32402][SQL][FOLLOW-UP] Add case sensitivity tests for column resolution in ALTER TABLE No new revisions were added by this update. Summary of changes: .../v2/jdbc/JDBCTableCatalogSuite.scala| 155 +++-- 1 file changed, 114 insertions(+), 41 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d0f1120 [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files d0f1120 is described below commit d0f1120f3fb524a52df71e03c3d28ac82f76c1a3 Author: Max Gekk AuthorDate: Fri Oct 16 10:28:15 2020 +0900 [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files ### What changes were proposed in this pull request? Added a couple of tests to `AvroSuite` and to `ParquetIOSuite` to check that the metadata key 'org.apache.spark.legacyDateTime' is written correctly depending on the SQL configs: - spark.sql.legacy.avro.datetimeRebaseModeInWrite - spark.sql.legacy.parquet.datetimeRebaseModeInWrite This is a follow-up to https://github.com/apache/spark/pull/28137. ### Why are the changes needed? 1. To improve test coverage 2. To make sure that the metadata key is actually saved to Avro/Parquet files ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the added tests: ``` $ build/sbt "testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetIOSuite" $ build/sbt "avro/test:testOnly org.apache.spark.sql.avro.AvroV1Suite" $ build/sbt "avro/test:testOnly org.apache.spark.sql.avro.AvroV2Suite" ``` Closes #30061 from MaxGekk/parquet-test-metakey. 
Authored-by: Max Gekk Signed-off-by: HyukjinKwon (cherry picked from commit 38c05af1d5538fc6ad00cdb57c1a90e90d04e25d) Signed-off-by: HyukjinKwon --- .../org/apache/spark/sql/avro/AvroSuite.scala | 40 ++--- .../datasources/parquet/ParquetIOSuite.scala | 51 +- 2 files changed, 73 insertions(+), 18 deletions(-) diff --git a/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala b/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala index d2f49ae..5d7d2e4 100644 --- a/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala +++ b/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala @@ -1788,15 +1788,19 @@ abstract class AvroSuite extends QueryTest with SharedSparkSession { } } + private def checkMetaData(path: java.io.File, key: String, expectedValue: String): Unit = { +val avroFiles = path.listFiles() + .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_")) +assert(avroFiles.length === 1) +val reader = DataFileReader.openReader(avroFiles(0), new GenericDatumReader[GenericRecord]()) +val value = reader.asInstanceOf[DataFileReader[_]].getMetaString(key) +assert(value === expectedValue) + } + test("SPARK-31327: Write Spark version into Avro file metadata") { withTempPath { path => spark.range(1).repartition(1).write.format("avro").save(path.getCanonicalPath) - val avroFiles = path.listFiles() -.filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_")) - assert(avroFiles.length === 1) - val reader = DataFileReader.openReader(avroFiles(0), new GenericDatumReader[GenericRecord]()) - val version = reader.asInstanceOf[DataFileReader[_]].getMetaString(SPARK_VERSION_METADATA_KEY) - assert(version === SPARK_VERSION_SHORT) + checkMetaData(path, SPARK_VERSION_METADATA_KEY, SPARK_VERSION_SHORT) } } @@ -1809,6 +1813,30 @@ abstract class AvroSuite extends QueryTest with SharedSparkSession { spark.read.format("avro").options(conf).load(path) } } + + test("SPARK-33163: 
write the metadata key 'org.apache.spark.legacyDateTime'") { +def saveTs(dir: java.io.File): Unit = { + Seq(Timestamp.valueOf("2020-10-15 01:02:03")).toDF() +.repartition(1) +.write +.format("avro") +.save(dir.getAbsolutePath) +} +withSQLConf(SQLConf.LEGACY_AVRO_REBASE_MODE_IN_WRITE.key -> LEGACY.toString) { + withTempPath { dir => +saveTs(dir) +checkMetaData(dir, SPARK_LEGACY_DATETIME, "") + } +} +Seq(CORRECTED, EXCEPTION).foreach { mode => + withSQLConf(SQLConf.LEGACY_AVRO_REBASE_MODE_IN_WRITE.key -> mode.toString) { +withTempPath { dir => + saveTs(dir) + checkMetaData(dir, SPARK_LEGACY_DATETIME, null) +} + } +} + } } class AvroV1Suite extends AvroSuite { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala index 2dc8a06..ff406f7 100644 ---
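The contract the new `AvroSuite` test checks can be summarized as a small mapping: the writer adds the 'org.apache.spark.legacyDateTime' metadata key only in LEGACY rebase mode, so `getMetaString` reports an empty string in that case and `null` (key absent) for CORRECTED and EXCEPTION. The `ExpectedMetadata` helper below is a hypothetical illustration of that mapping, not Spark code.

```java
// Hypothetical helper encoding the expectation asserted by checkMetaData in
// the test: empty string when the key is written (LEGACY), null otherwise.
class ExpectedMetadata {
    enum RebaseMode { LEGACY, CORRECTED, EXCEPTION }

    // Value getMetaString is expected to return for the
    // 'org.apache.spark.legacyDateTime' key under each write mode.
    static String legacyDateTimeValue(RebaseMode mode) {
        return mode == RebaseMode.LEGACY ? "" : null;
    }
}
```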
[spark] branch master updated (9f5eff0 -> 38c05af)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9f5eff0 [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs add 38c05af [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/avro/AvroSuite.scala | 40 ++--- .../datasources/parquet/ParquetIOSuite.scala | 51 +- 2 files changed, 73 insertions(+), 18 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d0f1120 [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files d0f1120 is described below commit d0f1120f3fb524a52df71e03c3d28ac82f76c1a3 Author: Max Gekk AuthorDate: Fri Oct 16 10:28:15 2020 +0900 [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files ### What changes were proposed in this pull request? Added a couple tests to `AvroSuite` and to `ParquetIOSuite` to check that the metadata key 'org.apache.spark.legacyDateTime' is written correctly depending on the SQL configs: - spark.sql.legacy.avro.datetimeRebaseModeInWrite - spark.sql.legacy.parquet.datetimeRebaseModeInWrite This is a follow up https://github.com/apache/spark/pull/28137. ### Why are the changes needed? 1. To improve test coverage 2. To make sure that the metadata key is actually saved to Avro/Parquet files ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the added tests: ``` $ build/sbt "testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetIOSuite" $ build/sbt "avro/test:testOnly org.apache.spark.sql.avro.AvroV1Suite" $ build/sbt "avro/test:testOnly org.apache.spark.sql.avro.AvroV2Suite" ``` Closes #30061 from MaxGekk/parquet-test-metakey. 
Authored-by: Max Gekk Signed-off-by: HyukjinKwon (cherry picked from commit 38c05af1d5538fc6ad00cdb57c1a90e90d04e25d) Signed-off-by: HyukjinKwon --- .../org/apache/spark/sql/avro/AvroSuite.scala | 40 ++--- .../datasources/parquet/ParquetIOSuite.scala | 51 +- 2 files changed, 73 insertions(+), 18 deletions(-) diff --git a/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala b/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala index d2f49ae..5d7d2e4 100644 --- a/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala +++ b/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala @@ -1788,15 +1788,19 @@ abstract class AvroSuite extends QueryTest with SharedSparkSession { } } + private def checkMetaData(path: java.io.File, key: String, expectedValue: String): Unit = { +val avroFiles = path.listFiles() + .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_")) +assert(avroFiles.length === 1) +val reader = DataFileReader.openReader(avroFiles(0), new GenericDatumReader[GenericRecord]()) +val value = reader.asInstanceOf[DataFileReader[_]].getMetaString(key) +assert(value === expectedValue) + } + test("SPARK-31327: Write Spark version into Avro file metadata") { withTempPath { path => spark.range(1).repartition(1).write.format("avro").save(path.getCanonicalPath) - val avroFiles = path.listFiles() -.filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_")) - assert(avroFiles.length === 1) - val reader = DataFileReader.openReader(avroFiles(0), new GenericDatumReader[GenericRecord]()) - val version = reader.asInstanceOf[DataFileReader[_]].getMetaString(SPARK_VERSION_METADATA_KEY) - assert(version === SPARK_VERSION_SHORT) + checkMetaData(path, SPARK_VERSION_METADATA_KEY, SPARK_VERSION_SHORT) } } @@ -1809,6 +1813,30 @@ abstract class AvroSuite extends QueryTest with SharedSparkSession { spark.read.format("avro").options(conf).load(path) } } + + test("SPARK-33163: 
write the metadata key 'org.apache.spark.legacyDateTime'") { +def saveTs(dir: java.io.File): Unit = { + Seq(Timestamp.valueOf("2020-10-15 01:02:03")).toDF() +.repartition(1) +.write +.format("avro") +.save(dir.getAbsolutePath) +} +withSQLConf(SQLConf.LEGACY_AVRO_REBASE_MODE_IN_WRITE.key -> LEGACY.toString) { + withTempPath { dir => +saveTs(dir) +checkMetaData(dir, SPARK_LEGACY_DATETIME, "") + } +} +Seq(CORRECTED, EXCEPTION).foreach { mode => + withSQLConf(SQLConf.LEGACY_AVRO_REBASE_MODE_IN_WRITE.key -> mode.toString) { +withTempPath { dir => + saveTs(dir) + checkMetaData(dir, SPARK_LEGACY_DATETIME, null) +} + } +} + } } class AvroV1Suite extends AvroSuite { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala index 2dc8a06..ff406f7 100644 ---
[spark] branch master updated (9f5eff0 -> 38c05af)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9f5eff0 [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs add 38c05af [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/avro/AvroSuite.scala | 40 ++--- .../datasources/parquet/ParquetIOSuite.scala | 51 +- 2 files changed, 73 insertions(+), 18 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d0f1120 [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files d0f1120 is described below commit d0f1120f3fb524a52df71e03c3d28ac82f76c1a3 Author: Max Gekk AuthorDate: Fri Oct 16 10:28:15 2020 +0900 [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files ### What changes were proposed in this pull request? Added a couple tests to `AvroSuite` and to `ParquetIOSuite` to check that the metadata key 'org.apache.spark.legacyDateTime' is written correctly depending on the SQL configs: - spark.sql.legacy.avro.datetimeRebaseModeInWrite - spark.sql.legacy.parquet.datetimeRebaseModeInWrite This is a follow up https://github.com/apache/spark/pull/28137. ### Why are the changes needed? 1. To improve test coverage 2. To make sure that the metadata key is actually saved to Avro/Parquet files ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the added tests: ``` $ build/sbt "testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetIOSuite" $ build/sbt "avro/test:testOnly org.apache.spark.sql.avro.AvroV1Suite" $ build/sbt "avro/test:testOnly org.apache.spark.sql.avro.AvroV2Suite" ``` Closes #30061 from MaxGekk/parquet-test-metakey. 
Authored-by: Max Gekk Signed-off-by: HyukjinKwon (cherry picked from commit 38c05af1d5538fc6ad00cdb57c1a90e90d04e25d) Signed-off-by: HyukjinKwon --- .../org/apache/spark/sql/avro/AvroSuite.scala | 40 ++--- .../datasources/parquet/ParquetIOSuite.scala | 51 +- 2 files changed, 73 insertions(+), 18 deletions(-) diff --git a/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala b/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala index d2f49ae..5d7d2e4 100644 --- a/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala +++ b/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala @@ -1788,15 +1788,19 @@ abstract class AvroSuite extends QueryTest with SharedSparkSession { } } + private def checkMetaData(path: java.io.File, key: String, expectedValue: String): Unit = { +val avroFiles = path.listFiles() + .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_")) +assert(avroFiles.length === 1) +val reader = DataFileReader.openReader(avroFiles(0), new GenericDatumReader[GenericRecord]()) +val value = reader.asInstanceOf[DataFileReader[_]].getMetaString(key) +assert(value === expectedValue) + } + test("SPARK-31327: Write Spark version into Avro file metadata") { withTempPath { path => spark.range(1).repartition(1).write.format("avro").save(path.getCanonicalPath) - val avroFiles = path.listFiles() -.filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_")) - assert(avroFiles.length === 1) - val reader = DataFileReader.openReader(avroFiles(0), new GenericDatumReader[GenericRecord]()) - val version = reader.asInstanceOf[DataFileReader[_]].getMetaString(SPARK_VERSION_METADATA_KEY) - assert(version === SPARK_VERSION_SHORT) + checkMetaData(path, SPARK_VERSION_METADATA_KEY, SPARK_VERSION_SHORT) } } @@ -1809,6 +1813,30 @@ abstract class AvroSuite extends QueryTest with SharedSparkSession { spark.read.format("avro").options(conf).load(path) } } + + test("SPARK-33163: 
write the metadata key 'org.apache.spark.legacyDateTime'") { +def saveTs(dir: java.io.File): Unit = { + Seq(Timestamp.valueOf("2020-10-15 01:02:03")).toDF() +.repartition(1) +.write +.format("avro") +.save(dir.getAbsolutePath) +} +withSQLConf(SQLConf.LEGACY_AVRO_REBASE_MODE_IN_WRITE.key -> LEGACY.toString) { + withTempPath { dir => +saveTs(dir) +checkMetaData(dir, SPARK_LEGACY_DATETIME, "") + } +} +Seq(CORRECTED, EXCEPTION).foreach { mode => + withSQLConf(SQLConf.LEGACY_AVRO_REBASE_MODE_IN_WRITE.key -> mode.toString) { +withTempPath { dir => + saveTs(dir) + checkMetaData(dir, SPARK_LEGACY_DATETIME, null) +} + } +} + } } class AvroV1Suite extends AvroSuite { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala index 2dc8a06..ff406f7 100644 ---
[spark] branch master updated (9f5eff0 -> 38c05af)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9f5eff0 [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs add 38c05af [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/avro/AvroSuite.scala | 40 ++--- .../datasources/parquet/ParquetIOSuite.scala | 51 +- 2 files changed, 73 insertions(+), 18 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new d0f1120  [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files
d0f1120 is described below

commit d0f1120f3fb524a52df71e03c3d28ac82f76c1a3
Author: Max Gekk
AuthorDate: Fri Oct 16 10:28:15 2020 +0900

    [SPARK-33163][SQL][TESTS] Check the metadata key 'org.apache.spark.legacyDateTime' in Avro/Parquet files

    ### What changes were proposed in this pull request?
    Added a couple of tests to `AvroSuite` and `ParquetIOSuite` to check that the metadata key 'org.apache.spark.legacyDateTime' is written correctly depending on the SQL configs:
    - spark.sql.legacy.avro.datetimeRebaseModeInWrite
    - spark.sql.legacy.parquet.datetimeRebaseModeInWrite

    This is a follow-up of https://github.com/apache/spark/pull/28137.

    ### Why are the changes needed?
    1. To improve test coverage
    2. To make sure that the metadata key is actually saved to Avro/Parquet files

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    By running the added tests:
    ```
    $ build/sbt "testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetIOSuite"
    $ build/sbt "avro/test:testOnly org.apache.spark.sql.avro.AvroV1Suite"
    $ build/sbt "avro/test:testOnly org.apache.spark.sql.avro.AvroV2Suite"
    ```

    Closes #30061 from MaxGekk/parquet-test-metakey.
[spark] branch master updated (81d3a8e -> 9f5eff0)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 81d3a8e [MINOR][PYTHON] Fix the typo in the docstring of method agg() add 9f5eff0 [SPARK-33162][INFRA] Use pre-built image at GitHub Action PySpark jobs No new revisions were added by this update. Summary of changes: .github/workflows/build_and_test.yml | 119 ++- 1 file changed, 89 insertions(+), 30 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ba69d68 -> 81d3a8e)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ba69d68 [SPARK-33080][BUILD] Replace fatal warnings snippet add 81d3a8e [MINOR][PYTHON] Fix the typo in the docstring of method agg() No new revisions were added by this update. Summary of changes: python/pyspark/sql/dataframe.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] srowen closed pull request #295: Replace test-only to testOnly in Developer tools page
srowen closed pull request #295: URL: https://github.com/apache/spark-website/pull/295 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Replace test-only to testOnly in Developer tools page
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new fe3e503  Replace test-only to testOnly in Developer tools page
fe3e503 is described below

commit fe3e5037d2eef83da136b9f8c66e7e2d6904d2d4
Author: HyukjinKwon
AuthorDate: Thu Oct 15 18:15:03 2020 -0500

    Replace test-only to testOnly in Developer tools page

    See also https://github.com/apache/spark/pull/30028. After SBT was upgraded to 1.3, `test-only` should be `testOnly`.

    Author: HyukjinKwon

    Closes #295 from HyukjinKwon/test-only-sbt-upgrade.
---
 developer-tools.md        | 2 +-
 site/developer-tools.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/developer-tools.md b/developer-tools.md
index 0078538..9d82a25 100644
--- a/developer-tools.md
+++ b/developer-tools.md
@@ -267,7 +267,7 @@ it's due to a classpath issue (some classes were probably not compiled). To fix
 sufficient to run a test from the command line:

 ```
-build/sbt "test-only org.apache.spark.rdd.SortingSuite"
+build/sbt "testOnly org.apache.spark.rdd.SortingSuite"
 ```

 Running Different Test Permutations on Jenkins

diff --git a/site/developer-tools.html b/site/developer-tools.html
index 86918d8..b9ecb5e 100644
--- a/site/developer-tools.html
+++ b/site/developer-tools.html
@@ -447,7 +447,7 @@ java.lang.NullPointerException
 its due to a classpath issue (some classes were probably not compiled). To fix this, it
 sufficient to run a test from the command line:

-build/sbt "test-only org.apache.spark.rdd.SortingSuite"
+build/sbt "testOnly org.apache.spark.rdd.SortingSuite"

 Running Different Test Permutations on Jenkins

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-30894][SQL][2.4] Make Size's nullable independent from SQL config changes
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 4353f7d  [SPARK-30894][SQL][2.4] Make Size's nullable independent from SQL config changes
4353f7d is described below

commit 4353f7d961aba7f1f65066245215b08817663701
Author: Maxim Gekk
AuthorDate: Thu Oct 15 14:00:38 2020 -0700

    [SPARK-30894][SQL][2.4] Make Size's nullable independent from SQL config changes

    This is a backport of https://github.com/apache/spark/pull/27658

    ### What changes were proposed in this pull request?
    In the PR, I propose to add the `legacySizeOfNull` parameter to the `Size` expression, and pass the value of `spark.sql.legacy.sizeOfNull` if `legacySizeOfNull` is not provided on creation of `Size`.

    ### Why are the changes needed?
    This avoids the issue where the configuration changes between different phases of planning, which can silently break a query plan and lead to crashes or data corruption.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    By `CollectionExpressionsSuite`.

    Closes #30058 from anuragmantri/SPARK-30894-2.4.

Authored-by: Maxim Gekk
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/catalyst/expressions/collectionOperations.scala | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 6d74f45..c8bc1e7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -89,9 +89,10 @@ trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression
       > SELECT _FUNC_(NULL);
        -1
   """)
-case class Size(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+case class Size(child: Expression, legacySizeOfNull: Boolean)
+  extends UnaryExpression with ExpectsInputTypes {

-  val legacySizeOfNull = SQLConf.get.legacySizeOfNull
+  def this(child: Expression) = this(child, SQLConf.get.legacySizeOfNull)

   override def dataType: DataType = IntegerType
   override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
@@ -123,6 +124,10 @@ case class Size(child: Expression) extends UnaryExpression with ExpectsInputTypes {
   }
 }

+object Size {
+  def apply(child: Expression): Size = new Size(child)
+}
+
 /**
 * Returns an unordered array containing the keys of the map.
 */

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
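The fix pins the value of `spark.sql.legacy.sizeOfNull` at the moment the `Size` expression is constructed, instead of re-reading the config during later planning phases. A minimal, Spark-free sketch of that capture-at-construction pattern (the `Conf` object and the simplified `Size` below are illustrative stand-ins, not Spark's actual classes):

```scala
// Stand-in for SQLConf: a mutable, globally visible setting.
object Conf { @volatile var legacySizeOfNull: Boolean = true }

// The flag is a constructor parameter, so its value is frozen when the
// expression is created; later config changes cannot alter its behavior.
case class Size(legacySizeOfNull: Boolean) {
  def eval(arr: Seq[Int]): Int =
    if (arr == null) {
      if (legacySizeOfNull) -1 else throw new NullPointerException("size(null)")
    } else arr.length
}

object Size {
  // Auxiliary constructor reads the config exactly once, at creation time.
  def apply(): Size = Size(Conf.legacySizeOfNull)
}
```

Before the patch, the field was initialized from `SQLConf.get` inside the class body, so a `Size` instance copied or re-created during planning could observe a different config value than the one in effect when the query was analyzed.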
[spark] branch master updated (9e37464 -> ba69d68)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9e37464 [SPARK-33078][SQL] Add config for json expression optimization add ba69d68 [SPARK-33080][BUILD] Replace fatal warnings snippet No new revisions were added by this update. Summary of changes: .../shuffle/HostLocalShuffleReadingSuite.scala | 1 + .../apache/spark/storage/BlockManagerSuite.scala | 4 +- project/SparkBuild.scala | 84 -- .../sql/catalyst/optimizer/OptimizerSuite.scala| 2 +- .../spark/sql/catalyst/util/UnsafeArraySuite.scala | 3 +- .../apache/spark/sql/connector/InMemoryTable.scala | 8 +++ .../spark/sql/streaming/StreamingQuerySuite.scala | 2 +- .../spark/sql/hive/thriftserver/CliSuite.scala | 6 +- 8 files changed, 62 insertions(+), 48 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (82eea13 -> 9e37464)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 82eea13  [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks
  add 9e37464  [SPARK-33078][SQL] Add config for json expression optimization

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/optimizer/OptimizeJsonExprs.scala  |  3 ++-
 .../org/apache/spark/sql/internal/SQLConf.scala     | 11 +++
 .../catalyst/optimizer/OptimizeJsonExprsSuite.scala | 21 +
 3 files changed, 34 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 82eea13  [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks
82eea13 is described below

commit 82eea13c7686fb4bfbe8fb4185db81438d2ea884
Author: Min Shen
AuthorDate: Thu Oct 15 12:34:52 2020 -0500

    [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks

    ### What changes were proposed in this pull request?

    This is the first patch for SPIP SPARK-30602 for push-based shuffle. Summary of changes:

    * Introduce new API in ExternalBlockStoreClient to push blocks to a remote shuffle service.
    * Leveraging the streaming upload functionality in SPARK-6237, it also enables the ExternalBlockHandler to delegate the handling of block push requests to MergedShuffleFileManager.
    * Propose the API for MergedShuffleFileManager, where the core logic on the shuffle service side to handle block push requests is defined. The actual implementation of this API is deferred into a later RB to restrict the size of this PR.
    * Introduce OneForOneBlockPusher to enable pushing blocks to remote shuffle services in shuffle RPC layer.
    * New protocols in shuffle RPC layer to support the functionalities.

    ### Why are the changes needed?

    Refer to the SPIP in SPARK-30602.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added unit tests. The reference PR with the consolidated changes covering the complete implementation is also provided in SPARK-30602. We have already verified the functionality and the improved performance as documented in the SPIP doc.

    Lead-authored-by: Min Shen
    Co-authored-by: Chandni Singh
    Co-authored-by: Ye Zhou
    Closes #29855 from Victsm/SPARK-32915.

Lead-authored-by: Min Shen
Co-authored-by: Chandni Singh
Co-authored-by: Ye Zhou
Co-authored-by: Chandni Singh
Co-authored-by: Min Shen
Signed-off-by: Mridul Muralidharan gmail.com>
---
 common/network-common/pom.xml                       |   4 +
 .../apache/spark/network/protocol/Encoders.java     |  63
 common/network-shuffle/pom.xml                      |   9 ++
 .../spark/network/shuffle/BlockStoreClient.java     |  21 +++
 .../apache/spark/network/shuffle/ErrorHandler.java  |  85 +++
 .../network/shuffle/ExternalBlockHandler.java       | 104 +-
 .../network/shuffle/ExternalBlockStoreClient.java   |  52 ++-
 .../spark/network/shuffle/MergedBlockMeta.java      |  64 +
 .../network/shuffle/MergedShuffleFileManager.java   | 116 +++
 .../network/shuffle/OneForOneBlockPusher.java       | 123
 .../network/shuffle/RetryingBlockFetcher.java       |  27 +++-
 .../shuffle/protocol/BlockTransferMessage.java      |   6 +-
 .../shuffle/protocol/FinalizeShuffleMerge.java      |  84 +++
 .../network/shuffle/protocol/MergeStatuses.java     | 118 +++
 .../network/shuffle/protocol/PushBlockStream.java   |  95
 .../spark/network/shuffle/ErrorHandlerSuite.java    |  51 +++
 .../network/shuffle/ExternalBlockHandlerSuite.java  |  40 +-
 .../network/shuffle/OneForOneBlockPusherSuite.java  | 159 +
 .../ExternalShuffleServiceMetricsSuite.scala        |   3 +-
 .../yarn/YarnShuffleServiceMetricsSuite.scala       |   2 +-
 .../network/yarn/YarnShuffleServiceSuite.scala      |   1 +
 21 files changed, 1212 insertions(+), 15 deletions(-)

diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 9d5bc9a..d328a7d 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -91,6 +91,10 @@
       org.apache.commons
       commons-crypto
+
+      org.roaringbitmap
+      RoaringBitmap
+
diff --git a/common/network-common/src/main/java/org/apache/spark/network/protocol/Encoders.java b/common/network-common/src/main/java/org/apache/spark/network/protocol/Encoders.java
index 490915f..4fa191b 100644
--- a/common/network-common/src/main/java/org/apache/spark/network/protocol/Encoders.java
+++ b/common/network-common/src/main/java/org/apache/spark/network/protocol/Encoders.java
@@ -17,9 +17,11 @@
 package org.apache.spark.network.protocol;

+import java.io.IOException;
 import java.nio.charset.StandardCharsets;

 import io.netty.buffer.ByteBuf;
+import org.roaringbitmap.RoaringBitmap;

 /** Provides a canonical set of Encoders for simple types. */
 public class Encoders {
@@ -44,6 +46,40 @@ public class Encoders {
     }
   }

+  /** Bitmaps are encoded with their serialization length followed by the serialization bytes. */
+  public static class Bitmaps {
+    public static int
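The diff above is truncated just as it introduces the `Encoders.Bitmaps` helper, whose Javadoc states the scheme: a bitmap is encoded as its serialization length followed by its serialization bytes. As a rough, illustrative sketch of that length-prefixed pattern only — not Spark's actual implementation, which serializes a `RoaringBitmap` into a Netty `ByteBuf` — here is a self-contained version using `java.util.BitSet` and `java.nio.ByteBuffer` as stand-ins, with a hypothetical class name `BitmapCodec`:

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

// Illustrative sketch of a length-prefixed bitmap codec, in the spirit of the
// Encoders.Bitmaps class added by the commit above. The real Spark code uses
// org.roaringbitmap.RoaringBitmap and io.netty.buffer.ByteBuf; BitSet and
// ByteBuffer are used here only so the example is self-contained.
final class BitmapCodec {

  // Bytes the encoded form occupies: a 4-byte length prefix plus the
  // bitmap's serialized bytes.
  static int encodedLength(BitSet bitmap) {
    return Integer.BYTES + bitmap.toByteArray().length;
  }

  // Write the serialization length, then the serialization bytes.
  static void encode(ByteBuffer buf, BitSet bitmap) {
    byte[] bytes = bitmap.toByteArray();
    buf.putInt(bytes.length);
    buf.put(bytes);
  }

  // Read the length prefix, then rebuild the bitmap from exactly that many
  // bytes, leaving any following data in the buffer untouched.
  static BitSet decode(ByteBuffer buf) {
    int length = buf.getInt();
    byte[] bytes = new byte[length];
    buf.get(bytes);
    return BitSet.valueOf(bytes);
  }
}
```

A round trip through `encode` and `decode` yields an equal bitmap, and the length prefix lets a reader know where the bitmap ends when several values are packed into one buffer — which is what makes the scheme usable inside a larger RPC message.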
[spark] branch master updated (31f7097 -> b089fe5)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 31f7097  [SPARK-32402][SQL][FOLLOW-UP] Use quoted column name for JDBCTableCatalog.alterTable
  add b089fe5  [SPARK-32247][INFRA] Install and test scipy with PyPy in GitHub Actions

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
[spark] branch master updated (513b6f5 -> 31f7097)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 513b6f5  [SPARK-33079][TESTS] Replace the existing Maven job for Scala 2.13 in Github Actions with SBT job
  add 31f7097  [SPARK-32402][SQL][FOLLOW-UP] Use quoted column name for JDBCTableCatalog.alterTable

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/jdbc/DB2Dialect.scala     |  5 +++--
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   | 25 -
 .../org/apache/spark/sql/jdbc/OracleDialect.scala  | 11 +
 .../v2/jdbc/JDBCTableCatalogSuite.scala            | 26 +-
 4 files changed, 39 insertions(+), 28 deletions(-)
svn commit: r41940 - /release/spark/KEYS
Author: srowen
Date: Thu Oct 15 13:18:43 2020
New Revision: 41940

Log:
Add missing key for Ruifeng to Spark KEYS

Modified:
    release/spark/KEYS

Modified: release/spark/KEYS
==============================================================================
--- release/spark/KEYS (original)
+++ release/spark/KEYS Thu Oct 15 13:18:43 2020
@@ -1413,3 +1413,60 @@
 Hy4V/RJiJHCHekSXHCNoxgJz8Jc=
 =+90F
 -----END PGP PUBLIC KEY BLOCK-----
+pub   rsa4096 2020-08-05 [SC]
+      5146FBDC4B90744EA948035795E0EE38CF98F9F4
+uid           [ultimate] Ruifeng Zheng (CODE SIGNING KEY)
+sub   rsa4096 2020-08-05 [E]
+
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+
+mQINBF8qcTwBEADNwwXl2aEihlTGLo4uH4CHyF0Et2qJa0widBEj+LkQg1Alsxml
+Eqh/yea5QJObPmtfvIH8qgtUhOUUANH6+GY7XTtTrd4SU2jYupns1Z7HuTHx75IX
+oi2i2kzffWXPS4LMe9b7QjceHWsAIqKpmG2/tY1Wm9m0emwfa+qDNZaKQFAP+tnp
+24CVGUiNQbUyxDDUlpKHszB2Kw+pj/pFsNqAv30x2QweIVfGTYZAhzgzybR3Oid6
+8Bf1BbkWF9UH5at0Y2+Q9dvhMewRxgbW9jonA9OMy4EBfRqRzauYcjz0F7Pzy+Lk
+fd1/9SE4eFIGVts2XTT//AK0IUwoAdjmOT+aq9x1qSqxzrHqgIj5pssn7sPheUAB
+67a0oiM7r92a/URvskU4csI1LxWJz2oqTeRa1K7cmvw/4nxHqkNCizbXhVWNLiGH
+VC3tZZdgHliMCehCKmFFw9/r0F+XM0cJesUhhbfVL0rPLUaA7tZ5zefKaeDUpUDt
+JB/XFv5am02yInlT+n4Er6fxW9Pp0bEYgBVZY3Agr11VxcKFGhS3eb4iDl+obFN9
+UnuG7Vkm7l8j5NWPdkuzMzLG1+wdUbz9EcHhzt3NLutyo0nzt3uZiZjQONagIwhV
+5SvdTG6eS6QWxKPbgGETmqGaEqKMXbumXTnqgEHm82w2P4J9OU72X+rkPQARAQAB
+tDZSdWlmZW5nIFpoZW5nIChDT0RFIFNJR05JTkcgS0VZKSA8cnVpZmVuZ3pAYXBh
+Y2hlLm9yZz6JAk4EEwEKADgWIQRRRvvcS5B0TqlIA1eV4O44z5j59AUCXypxPAIb
+AwULCQgHAgYVCgkICwIEFgIDAQIeAQIXgAAKCRCV4O44z5j59P1rD/4mkpvICxd4
+tg7r5zgaVtQIaBwgjK9OnsStAiWkpe/PzG3Q0aDNGBO8vuwhI6LHhgU9fea3Mw0N
+tpTFB00qwagKckXTAX9hj2EVcjH6KxUEoDlGyEZHLsUgizzGLy8laF2XaHn/Bs8D
+fl41iF+fvl/XYD8y8f5F6eIWaJROx73Bjk22fWhndPJgtO4HeaL5/JOMdUvU12AE
+Ipk22YBm416rDYixJucoGLlGfRuxMAImlaPgM18NAb25biU8Rd15+c3HgDtVBrTI
+0C3XljKcio1cVAY1MyrcC0mKaTLIhsngD+DsjDItWzp8BYg3kHPFfh/8AMDNA960
+3ACcq436UdoqPzHqA/B6dRgw1M3F+dSlX24DzYZ3qz/sn2d2HmdkMO9+4epnk7lz
+gxwz14F0mTPKiH/rx4dXo4A/D/KurFA8Ed1Div4azDwlKkk5au0C8KrjJstEy27u
+5x41GtY5XoyI+lGGydMC6yrvoDPLxGLZaOIUgkN6hkz/BrkTZ/oEFybx4XxLkZg6
+gQVQTrtqsXZXEL5IEMD8mCP5TYrrTFRwBQNW6ngR7L7kYGb0ksB5TwIu0ZntRZIY
+XgVXMbBCM3ehAWdXR0oj25gtkLzRCZSAkPKK1uMaEbksRrb5uuAnX/F8LxAeunQM
+P2jbZ3ydT2pMPi8X1TYWCYa+56TaxjCzAbkCDQRfKnE8ARAAtG+2ME5GIjWPofPR
+KZkhlMnjbwYL6bVcy2vUmfzuM/sM2SjP8W3x/yPZA+HHfe7+FRaeBzcOhCBuYTKF
+K7F+fw1woljDOU1atVtBJu0MH7r47my/MPtuRg0bltT3AE3qJoAQZeDEefJvCcfZ
+TPmZN1jETjjPRe045zkhk9tFt1ZB7d8wk+yo3PWwp0iX2p9LkyiCLvYFBqs0McLW
+wQI4fgmeA5fiyMpJZJohZjR170Qbyk+QQ3Jri8EWeZvwJEfAPVxVMt1DOxPBv3PI
+2AfYM0V8brEVF/2N/Lorpt3LcN+mAhJfASy4RimvE08gj5nJn3+aA98B3uPCZ6AN
+IEOYIZPNWseYCWCqDHbiFFqaRIxnLfxgTygJzw8lvBAoBr15ZG5e6Xe4JRAn3Cvu
+frkMs4xlnqhFR1tzNezWLn/j7+dOVHzSiPTiKGAjwEiLvusaxNhkVKqrDu3QoPFu
+ogvtfyeSPVYcsP6F5IJ2LQzT5Cq8h+H1/+7/tQrhSWd/KAzRw5+rePuoecbaodfr
+VaG9sqSMe/GlCBuhqGG4Y3mFaHnemgZaCj4jm0wvjyPo1ik5V9j4TU6nKPEEOXX3
+x4mHHflEOWslHeT9xX2aG5dnh7bHQnJLbbNbEilJxXtKeeuA/iOyPq6+lHWVDJYf
+cDuYdAKr2Gzjffg3pfmN2zlOla8AEQEAAYkCNgQYAQoAIBYhBFFG+9xLkHROqUgD
+V5Xg7jjPmPn0BQJfKnE8AhsMAAoJEJXg7jjPmPn0N0UQAIZKhyKBnad4A791bx+4
+iHU/zglxq73nUfRoIy1pxt7Sa7YTSG3029Mj6fsHCr5tCHmcSS8leF28CAz8Qs8S
+UHf/i+aDk6wDk20V80jUYa6DkuUaolf2GxGBW3dwJKufq/L2lgPhN0R2MIL2gQM+
+M5EB+tpD+69laGrMVFqztSPcFpJjysnDKDiu5EFVD74zU8F9jn3kDD50DTx3LvrD
+JD/X5y6TaxUw1TAjdUgrkG/PARxJu3za4anHMiMfHah6Y6dz7ROtCKFMjWH25y28
+O8TMJnVUZdp6uLu3PzWjit9bwB7UuVVlBUQX9piMr/A5WtucpucLGwn7G0ejuJyE
+3Bq502QehItW6Ft0nlI8HGoecHXLQK3HUpLSf3BkBlXNz165iImG/RAgZUucbhHb
+u2Bmj4c9bQuZucQ4j3dUsXc3y4M8V14d5V1MXceWZ0sGkUcXEzJQnQcy98yn5b9K
+71zAI0i5UmtKXU/Xjss+WAfInBzpyq0bk9f9pur9UP7/2visiHQw70AfrSutXWiU
+HzpIypF5A8FUA+gcNsUUPkbm4JeTTxTxb0AEb6iBC5eYmDdehhcMeYnNnE/STejM
+5hUDBpGDAkbw0Wgolr/Qpxfxlkzstz8XSy2U6BVxkan1Oji889sTamWhHzLf7Ofo
+eGh3VPV1RM3YCRkGY7/1fheg
+=/4cF
+-----END PGP PUBLIC KEY BLOCK-----
[spark] branch master updated (e85ed8a -> 513b6f5)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from e85ed8a  [SPARK-33156][INFRA] Upgrade GithubAction image from 18.04 to 20.04
  add 513b6f5  [SPARK-33079][TESTS] Replace the existing Maven job for Scala 2.13 in Github Actions with SBT job

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml                    | 16 ++--
 .../spark/streaming/kinesis/KinesisBackedBlockRDD.scala |  2 +-
 2 files changed, 7 insertions(+), 11 deletions(-)
[GitHub] [spark-website] ScrapCodes commented on pull request #295: Replace test-only to testOnly in Developer tools page
ScrapCodes commented on pull request #295:
URL: https://github.com/apache/spark-website/pull/295#issuecomment-709241862

   Thanks, @HyukjinKwon. I should have done it.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [spark-website] HyukjinKwon commented on pull request #295: Replace test-only to testOnly in Developer tools page
HyukjinKwon commented on pull request #295:
URL: https://github.com/apache/spark-website/pull/295#issuecomment-709235613

   cc @ScrapCodes
[GitHub] [spark-website] HyukjinKwon opened a new pull request #295: Replace test-only to testOnly in Developer tools page
HyukjinKwon opened a new pull request #295:
URL: https://github.com/apache/spark-website/pull/295

   See also https://github.com/apache/spark/pull/30028. After SBT was upgraded to 1.3, `test-only` should be `testOnly`.
[spark] branch master updated (8e7c390 -> e85ed8a)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 8e7c390  [SPARK-33155][K8S] spark.kubernetes.pyspark.pythonVersion allows only '3'
  add e85ed8a  [SPARK-33156][INFRA] Upgrade GithubAction image from 18.04 to 20.04

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)
[spark] branch master updated (77a8efb -> 8e7c390)

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 77a8efb  [SPARK-32932][SQL] Do not use local shuffle reader at final stage on write command
     add 8e7c390  [SPARK-33155][K8S] spark.kubernetes.pyspark.pythonVersion allows only '3'

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md                                      | 2 +-
 .../core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala   | 6 +++---
 .../spark/deploy/k8s/features/DriverCommandFeatureStepSuite.scala  | 4 +---
 .../kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh     | 7 +--
 .../spark/deploy/k8s/integrationtest/DecommissionSuite.scala       | 1 -
 .../apache/spark/deploy/k8s/integrationtest/PythonTestsSuite.scala | 4 +---
 resource-managers/kubernetes/integration-tests/tests/pyfiles.py    | 2 +-
 7 files changed, 8 insertions(+), 18 deletions(-)
[spark] branch master updated (ec34a00 -> 77a8efb)

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ec34a00  [SPARK-33153][SQL][TESTS] Ignore Spark 2.4 in HiveExternalCatalogVersionsSuite on Python 3.8/3.9
     add 77a8efb  [SPARK-32932][SQL] Do not use local shuffle reader at final stage on write command

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 14 +-
 .../adaptive/AdaptiveQueryExecSuite.scala          | 51 +-
 2 files changed, 63 insertions(+), 2 deletions(-)