[spark] branch master updated (6f4a2e4 -> 3995728)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6f4a2e4  [MINOR][ML] Fix confusing error message in VectorAssembler
     add 3995728  [SPARK-30968][BUILD] Upgrade aws-java-sdk-sts to 1.11.655

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30968][BUILD] Upgrade aws-java-sdk-sts to 1.11.655
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new c6f718b  [SPARK-30968][BUILD] Upgrade aws-java-sdk-sts to 1.11.655

c6f718b is described below

commit c6f718b5f41fe76c9c6eedcf2e9684d4d291cb4d
Author: Dongjoon Hyun
AuthorDate: Thu Feb 27 17:05:56 2020 -0800

    [SPARK-30968][BUILD] Upgrade aws-java-sdk-sts to 1.11.655

    ### What changes were proposed in this pull request?

    This PR aims to upgrade `aws-java-sdk-sts` to `1.11.655`.

    ### Why are the changes needed?

    [SPARK-29677](https://github.com/apache/spark/pull/26333) upgrades the AWS Kinesis Client to 1.12.0 for Apache Spark 2.4.5 and 3.0.0. Since AWS Kinesis Client 1.12.0 uses AWS SDK 1.11.655, `aws-java-sdk-sts` should be consistent with the Kinesis client dependency.
    - https://github.com/awslabs/amazon-kinesis-client/releases/tag/v1.12.0

    ### Does this PR introduce any user-facing change?

    No.

    ### How was this patch tested?

    Pass the Jenkins.

    Closes #27720 from dongjoon-hyun/SPARK-30968.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 3995728c3ce9d85b0436c0220f957b9d9133d64a)
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index b3750e4..9d36faf 100644
--- a/pom.xml
+++ b/pom.xml
@@ -149,7 +149,7 @@ hadoop2
 1.12.0
-1.11.271
+1.11.655
 0.12.8
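The pom.xml hunk above lost its XML tags in extraction. What the one-line change relies on is Maven's version-property pattern: every AWS SDK artifact references one shared property, so bumping a single value updates them all. The snippet below is an illustrative sketch only; the property and artifact wiring mirror common practice, and the exact property names in Spark's pom.xml may differ.

```xml
<!-- Sketch of the version-property pattern; property names are
     illustrative, not necessarily Spark's exact ones. -->
<properties>
  <aws.kinesis.client.version>1.12.0</aws.kinesis.client.version>
  <aws.java.sdk.version>1.11.655</aws.java.sdk.version>
</properties>

<!-- Each SDK artifact points at the shared property, so the upgrade
     is a one-line change to the property value. -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-sts</artifactId>
  <version>${aws.java.sdk.version}</version>
</dependency>
```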
[spark] branch branch-2.4 updated: [SPARK-30968][BUILD] Upgrade aws-java-sdk-sts to 1.11.655
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 7574d99  [SPARK-30968][BUILD] Upgrade aws-java-sdk-sts to 1.11.655

7574d99 is described below

commit 7574d99e9c8c9d3e92b1f8269ae09a7b7f0cdbd0
Author: Dongjoon Hyun
AuthorDate: Thu Feb 27 17:05:56 2020 -0800

    [SPARK-30968][BUILD] Upgrade aws-java-sdk-sts to 1.11.655

    ### What changes were proposed in this pull request?

    This PR aims to upgrade `aws-java-sdk-sts` to `1.11.655`.

    ### Why are the changes needed?

    [SPARK-29677](https://github.com/apache/spark/pull/26333) upgrades the AWS Kinesis Client to 1.12.0 for Apache Spark 2.4.5 and 3.0.0. Since AWS Kinesis Client 1.12.0 uses AWS SDK 1.11.655, `aws-java-sdk-sts` should be consistent with the Kinesis client dependency.
    - https://github.com/awslabs/amazon-kinesis-client/releases/tag/v1.12.0

    ### Does this PR introduce any user-facing change?

    No.

    ### How was this patch tested?

    Pass the Jenkins.

    Closes #27720 from dongjoon-hyun/SPARK-30968.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 3995728c3ce9d85b0436c0220f957b9d9133d64a)
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 0741096..a162cd1 100644
--- a/pom.xml
+++ b/pom.xml
@@ -144,7 +144,7 @@ hadoop2
 1.12.0
-1.11.271
+1.11.655
 0.12.8
[spark] branch master updated (c0d4cc3 -> 1383bd4)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c0d4cc3  [MINOR][SQL] Remove unnecessary MiMa excludes
     add 1383bd4  [SPARK-30970][K8S][CORE] Fix NPE while resolving k8s master url

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/util/Utils.scala     | 15 ++-
 .../src/test/scala/org/apache/spark/util/UtilsSuite.scala |  4
 2 files changed, 10 insertions(+), 9 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-30970][K8S][CORE] Fix NPE while resolving k8s master url
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new b8e9cdc  [SPARK-30970][K8S][CORE] Fix NPE while resolving k8s master url

b8e9cdc is described below

commit b8e9cdcd14dcda68dde0c646f58d10880332691e
Author: Kent Yao
AuthorDate: Fri Feb 28 00:01:20 2020 -0800

    [SPARK-30970][K8S][CORE] Fix NPE while resolving k8s master url

    ### What changes were proposed in this pull request?

    ```
    bin/spark-sql --master k8s:///https://kubernetes.docker.internal:6443 --conf spark.kubernetes.container.image=yaooqinn/spark:v2.4.4
    Exception in thread "main" java.lang.NullPointerException
        at org.apache.spark.util.Utils$.checkAndGetK8sMasterUrl(Utils.scala:2739)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:261)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    ```

    Although `k8s:///https://kubernetes.docker.internal:6443` is a wrong master URL, it should not throw an NPE.

    The `case null` will never be touched.
    https://github.com/apache/spark/blob/3f4060c340d6bac412e8819c4388ccba226efcf3/core/src/main/scala/org/apache/spark/util/Utils.scala#L2772-L2776

    ### Why are the changes needed?

    Bug fix.

    ### Does this PR introduce any user-facing change?

    No.

    ### How was this patch tested?

    Added a unit test case.

    Closes #27721 from yaooqinn/SPARK-30970.

    Authored-by: Kent Yao
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 1383bd459a834fb075c5b570338fab0886110df9)
    Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/util/Utils.scala     | 15 ++-
 .../src/test/scala/org/apache/spark/util/UtilsSuite.scala |  4
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 297cc5e..dde4323 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -2772,19 +2772,16 @@ private[spark] object Utils extends Logging {
     }

     val masterScheme = new URI(masterWithoutK8sPrefix).getScheme
-    val resolvedURL = masterScheme.toLowerCase(Locale.ROOT) match {
-      case "https" =>
+
+    val resolvedURL = Option(masterScheme).map(_.toLowerCase(Locale.ROOT)) match {
+      case Some("https") =>
         masterWithoutK8sPrefix
-      case "http" =>
+      case Some("http") =>
         logWarning("Kubernetes master URL uses HTTP instead of HTTPS.")
         masterWithoutK8sPrefix
-      case null =>
-        val resolvedURL = s"https://$masterWithoutK8sPrefix"
-        logInfo("No scheme specified for kubernetes master URL, so defaulting to https. Resolved " +
-          s"URL is $resolvedURL.")
-        resolvedURL
       case _ =>
-        throw new IllegalArgumentException("Invalid Kubernetes master scheme: " + masterScheme)
+        throw new IllegalArgumentException("Invalid Kubernetes master scheme: " + masterScheme +
+          " found in URL: " + masterWithoutK8sPrefix)
     }

     s"k8s://$resolvedURL"

diff --git a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
index 8f8902e..f5e438b 100644
--- a/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -1243,6 +1243,10 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
     intercept[IllegalArgumentException] {
       Utils.checkAndGetK8sMasterUrl("k8s://foo://host:port")
     }
+
+    intercept[IllegalArgumentException] {
+      Utils.checkAndGetK8sMasterUrl("k8s:///https://host:port")
+    }
   }

   test("stringHalfWidth") {
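The heart of the fix is matching on `Option(masterScheme)` instead of the possibly-null scheme itself: `java.net.URI.getScheme` returns `null` for a scheme-less string such as `/https://host:port`, so the old `masterScheme.toLowerCase(...)` call blew up before the match could run. A self-contained sketch of the patched logic (`resolveK8sMasterUrl` is a stand-in for illustration, not Spark's actual `Utils.checkAndGetK8sMasterUrl`):

```scala
import java.net.URI
import java.util.Locale

// Stand-in for Utils.checkAndGetK8sMasterUrl, sketching the patched logic.
def resolveK8sMasterUrl(rawMasterURL: String): String = {
  require(rawMasterURL.startsWith("k8s://"),
    s"Kubernetes master URL must start with k8s://, got $rawMasterURL")
  val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)

  // Like the real method, a URL with no scheme at all defaults to https
  // before any scheme parsing happens.
  if (!masterWithoutK8sPrefix.contains("://")) {
    return s"k8s://https://$masterWithoutK8sPrefix"
  }

  // getScheme returns null for "/https://host:port"; Option(...) turns
  // that null into None, so the malformed URL falls through to a clear
  // IllegalArgumentException instead of an NPE.
  val masterScheme = new URI(masterWithoutK8sPrefix).getScheme
  val resolvedURL = Option(masterScheme).map(_.toLowerCase(Locale.ROOT)) match {
    case Some("https") => masterWithoutK8sPrefix
    case Some("http")  => masterWithoutK8sPrefix // Spark logs a warning here
    case _ => // None (null scheme) or an unsupported scheme
      throw new IllegalArgumentException(
        s"Invalid Kubernetes master scheme: $masterScheme found in URL: $masterWithoutK8sPrefix")
  }
  s"k8s://$resolvedURL"
}
```

With this shape, `k8s:///https://host:port` from the bug report raises `IllegalArgumentException` with a readable message rather than a `NullPointerException`.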
[spark] branch branch-2.4 updated (7574d99 -> ff5ba49)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 7574d99  [SPARK-30968][BUILD] Upgrade aws-java-sdk-sts to 1.11.655
     add ff5ba49  [SPARK-30970][K8S][CORE][2.4] Fix NPE while resolving k8s master url

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/util/Utils.scala     | 15 ++-
 .../src/test/scala/org/apache/spark/util/UtilsSuite.scala |  4
 2 files changed, 10 insertions(+), 9 deletions(-)
[spark] branch master updated: [SPARK-28998][SQL][FOLLOW-UP] Remove unnecessary MiMa excludes
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 961c539  [SPARK-28998][SQL][FOLLOW-UP] Remove unnecessary MiMa excludes

961c539 is described below

commit 961c539a676d2646a9315b427ad81852aa81b658
Author: Huaxin Gao
AuthorDate: Fri Feb 28 11:22:08 2020 -0800

    [SPARK-28998][SQL][FOLLOW-UP] Remove unnecessary MiMa excludes

    ### What changes were proposed in this pull request?

    Remove the cases for `MissingTypesProblem`, `InheritedNewAbstractMethodProblem`, `DirectMissingMethodProblem` and `ReversedMissingMethodProblem`.

    ### Why are the changes needed?

    After the changes, we don't have `org.apache.spark.sql.sources.v2` any more, so the only problem we can get is `MissingClassProblem`.

    ### Does this PR introduce any user-facing change?

    No.

    ### How was this patch tested?

    Manually tested.

    Closes #27731 from huaxingao/spark-28998-followup.

    Authored-by: Huaxin Gao
    Signed-off-by: Dongjoon Hyun
---
 project/MimaExcludes.scala | 8
 1 file changed, 8 deletions(-)

diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index ccb545d..cd55fa8 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -339,14 +339,6 @@ object MimaExcludes {
       (problem: Problem) => problem match {
         case MissingClassProblem(cls) =>
           !cls.fullName.startsWith("org.apache.spark.sql.sources.v2")
-        case MissingTypesProblem(newCls, _) =>
-          !newCls.fullName.startsWith("org.apache.spark.sql.sources.v2")
-        case InheritedNewAbstractMethodProblem(cls, _) =>
-          !cls.fullName.startsWith("org.apache.spark.sql.sources.v2")
-        case DirectMissingMethodProblem(meth) =>
-          !meth.owner.fullName.startsWith("org.apache.spark.sql.sources.v2")
-        case ReversedMissingMethodProblem(meth) =>
-          !meth.owner.fullName.startsWith("org.apache.spark.sql.sources.v2")
         case _ => true
       },
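The exclude being trimmed is a predicate over MiMa's problem hierarchy: return `true` to report a binary-compatibility problem, `false` to suppress it. A self-contained model of the resulting filter (the case classes below are simplified stand-ins for illustration, not the real `com.typesafe.tools.mima.core` types):

```scala
// Simplified stand-ins for MiMa's problem types; the real ones live in
// com.typesafe.tools.mima.core and carry far more information.
sealed trait Problem
case class MissingClassProblem(fullName: String) extends Problem
case class DirectMissingMethodProblem(ownerFullName: String) extends Problem

// After the cleanup only MissingClassProblem needs a special case:
// report (keep) every problem unless it is a missing class under the
// removed org.apache.spark.sql.sources.v2 package.
def keepProblem(problem: Problem): Boolean = problem match {
  case MissingClassProblem(name) =>
    !name.startsWith("org.apache.spark.sql.sources.v2")
  case _ => true
}
```

Since the whole `org.apache.spark.sql.sources.v2` package was deleted, missing-class reports are the only kind of problem it can still trigger, which is why the method- and type-level cases became dead code.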
[spark] branch branch-3.0 updated: [SPARK-28998][SQL][FOLLOW-UP] Remove unnecessary MiMa excludes
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 2342e28  [SPARK-28998][SQL][FOLLOW-UP] Remove unnecessary MiMa excludes

2342e28 is described below

commit 2342e280a57a108ff2327aae7157d85065016244
Author: Huaxin Gao
AuthorDate: Fri Feb 28 11:22:08 2020 -0800

    [SPARK-28998][SQL][FOLLOW-UP] Remove unnecessary MiMa excludes

    ### What changes were proposed in this pull request?

    Remove the cases for `MissingTypesProblem`, `InheritedNewAbstractMethodProblem`, `DirectMissingMethodProblem` and `ReversedMissingMethodProblem`.

    ### Why are the changes needed?

    After the changes, we don't have `org.apache.spark.sql.sources.v2` any more, so the only problem we can get is `MissingClassProblem`.

    ### Does this PR introduce any user-facing change?

    No.

    ### How was this patch tested?

    Manually tested.

    Closes #27731 from huaxingao/spark-28998-followup.

    Authored-by: Huaxin Gao
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 961c539a676d2646a9315b427ad81852aa81b658)
    Signed-off-by: Dongjoon Hyun
---
 project/MimaExcludes.scala | 8
 1 file changed, 8 deletions(-)

diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 23f33a6..7f66577 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -335,14 +335,6 @@ object MimaExcludes {
       (problem: Problem) => problem match {
         case MissingClassProblem(cls) =>
           !cls.fullName.startsWith("org.apache.spark.sql.sources.v2")
-        case MissingTypesProblem(newCls, _) =>
-          !newCls.fullName.startsWith("org.apache.spark.sql.sources.v2")
-        case InheritedNewAbstractMethodProblem(cls, _) =>
-          !cls.fullName.startsWith("org.apache.spark.sql.sources.v2")
-        case DirectMissingMethodProblem(meth) =>
-          !meth.owner.fullName.startsWith("org.apache.spark.sql.sources.v2")
-        case ReversedMissingMethodProblem(meth) =>
-          !meth.owner.fullName.startsWith("org.apache.spark.sql.sources.v2")
         case _ => true
       },
[spark] branch branch-3.0 updated: [SPARK-30977][CORE][3.0] Make ResourceProfile and ResourceProfileBuilder private
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 4cac4a5  [SPARK-30977][CORE][3.0] Make ResourceProfile and ResourceProfileBuilder private

4cac4a5 is described below

commit 4cac4a50e0bd81edc7a6a18674a64045c7c247bb
Author: Thomas Graves
AuthorDate: Fri Feb 28 18:12:20 2020 -0800

    [SPARK-30977][CORE][3.0] Make ResourceProfile and ResourceProfileBuilder private

    ### What changes were proposed in this pull request?

    Make the ResourceProfile and ResourceProfileBuilder APIs private since the entire feature didn't make 3.0.

    ### Why are the changes needed?

    To avoid exposing them to users too early.

    ### Does this PR introduce any user-facing change?

    No.

    ### How was this patch tested?

    Unit tests.

    Closes #27737 from tgravescs/SPARK-30977.

    Authored-by: Thomas Graves
    Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala | 5
 .../scala/org/apache/spark/resource/ResourceProfileBuilder.scala    | 5
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala b/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala
index 14019d2..f3c39d9 100644
--- a/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala
+++ b/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala
@@ -34,9 +34,12 @@ import org.apache.spark.internal.config.Python.PYSPARK_EXECUTOR_MEMORY
  * specify executor and task requirements for an RDD that will get applied during a
  * stage. This allows the user to change the resource requirements between stages.
  * This is meant to be immutable so user can't change it after building.
+ *
+ * This api is currently private until the rest of the pieces are in place and then it
+ * will become public.
  */
 @Evolving
-class ResourceProfile(
+private[spark] class ResourceProfile(
     val executorResources: Map[String, ExecutorResourceRequest],
     val taskResources: Map[String, TaskResourceRequest])
   extends Serializable with Logging {

diff --git a/core/src/main/scala/org/apache/spark/resource/ResourceProfileBuilder.scala b/core/src/main/scala/org/apache/spark/resource/ResourceProfileBuilder.scala
index 0d55c17..db1c77d 100644
--- a/core/src/main/scala/org/apache/spark/resource/ResourceProfileBuilder.scala
+++ b/core/src/main/scala/org/apache/spark/resource/ResourceProfileBuilder.scala
@@ -29,9 +29,12 @@ import org.apache.spark.annotation.Evolving
 * A ResourceProfile allows the user to specify executor and task requirements for an RDD
 * that will get applied during a stage. This allows the user to change the resource
 * requirements between stages.
+ *
+ * This api is currently private until the rest of the pieces are in place and then it
+ * will become public.
 */
 @Evolving
-class ResourceProfileBuilder() {
+private[spark] class ResourceProfileBuilder() {

   private val _taskResources = new ConcurrentHashMap[String, TaskResourceRequest]()
   private val _executorResources = new ConcurrentHashMap[String, ExecutorResourceRequest]()
[spark] branch branch-3.0 updated: [SPARK-30977][CORE][3.0] Make ResourceProfile and ResourceProfileBuilder private
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 4cac4a5 [SPARK-30977][CORE][3.0] Make ResourceProfile and ResourceProfileBuilder private 4cac4a5 is described below commit 4cac4a50e0bd81edc7a6a18674a64045c7c247bb Author: Thomas Graves AuthorDate: Fri Feb 28 18:12:20 2020 -0800 [SPARK-30977][CORE][3.0] Make ResourceProfile and ResourceProfileBuilder private ### What changes were proposed in this pull request? Make the ResourceProfile and ResourceProfileBuilder apis private since the entire feature didn't make 3.0. ### Why are the changes needed? to not expose to user to early. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? unit tests Closes #27737 from tgravescs/SPARK-30977. Authored-by: Thomas Graves Signed-off-by: Dongjoon Hyun --- core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala | 5 - .../scala/org/apache/spark/resource/ResourceProfileBuilder.scala | 5 - 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala b/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala index 14019d2..f3c39d9 100644 --- a/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala +++ b/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala @@ -34,9 +34,12 @@ import org.apache.spark.internal.config.Python.PYSPARK_EXECUTOR_MEMORY * specify executor and task requirements for an RDD that will get applied during a * stage. This allows the user to change the resource requirements between stages. * This is meant to be immutable so user can't change it after building. + * + * This api is currently private until the rest of the pieces are in place and then it + * will become public. 
*/ @Evolving -class ResourceProfile( +private[spark] class ResourceProfile( val executorResources: Map[String, ExecutorResourceRequest], val taskResources: Map[String, TaskResourceRequest]) extends Serializable with Logging { diff --git a/core/src/main/scala/org/apache/spark/resource/ResourceProfileBuilder.scala b/core/src/main/scala/org/apache/spark/resource/ResourceProfileBuilder.scala index 0d55c17..db1c77d 100644 --- a/core/src/main/scala/org/apache/spark/resource/ResourceProfileBuilder.scala +++ b/core/src/main/scala/org/apache/spark/resource/ResourceProfileBuilder.scala @@ -29,9 +29,12 @@ import org.apache.spark.annotation.Evolving * A ResourceProfile allows the user to specify executor and task requirements for an RDD * that will get applied during a stage. This allows the user to change the resource * requirements between stages. + * + * This api is currently private until the rest of the pieces are in place and then it + * will become public. */ @Evolving -class ResourceProfileBuilder() { +private[spark] class ResourceProfileBuilder() { private val _taskResources = new ConcurrentHashMap[String, TaskResourceRequest]() private val _executorResources = new ConcurrentHashMap[String, ExecutorResourceRequest]() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
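The `private[spark]` modifier in the diff above is Scala's package-qualified access. A minimal sketch of how it hides a class from user code (the class and `demo` package names below are hypothetical stand-ins, not from the Spark codebase):

```scala
// Package-qualified access: `private[spark]` makes the class visible
// anywhere under the org.apache.spark package tree, but not outside it.
package org.apache.spark.resource {
  private[spark] class HiddenProfile {  // hypothetical stand-in for ResourceProfile
    def describe: String = "only reachable from within org.apache.spark"
  }
}

package org.apache.spark.demo {
  object InsideSpark {
    // Compiles: org.apache.spark.demo is under the org.apache.spark tree.
    def use(): String = new org.apache.spark.resource.HiddenProfile().describe
  }
}

// User code in any package outside org.apache.spark cannot write
// `new org.apache.spark.resource.HiddenProfile()` -- it fails to compile,
// which is how this commit hides the API until the feature is complete.
```

Because the restriction is enforced at compile time, no runtime check is needed; the annotation `@Evolving` can stay in place for when the API is later made public.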
[spark] branch master updated (b517f99 -> f0010c8)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b517f99 [SPARK-30969][CORE] Remove resource coordination support from Standalone add f0010c8 [SPARK-31003][TESTS] Fix incorrect uses of assume() in tests No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala | 2 +- sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala | 2 +- .../scala/org/apache/spark/sql/execution/command/DDLSuite.scala | 4 ++-- .../src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala | 6 +++--- .../test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala | 2 +- .../scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala | 2 +- .../scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala| 6 +++--- .../src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala| 2 +- .../apache/spark/sql/sources/BucketedReadWithHiveSupportSuite.scala | 2 +- .../spark/sql/sources/BucketedWriteWithHiveSupportSuite.scala | 2 +- 10 files changed, 15 insertions(+), 15 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31003][TESTS] Fix incorrect uses of assume() in tests
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 8cb23f0 [SPARK-31003][TESTS] Fix incorrect uses of assume() in tests 8cb23f0 is described below commit 8cb23f0cb5b20b7e49fdd16c52d6451e901d9a7a Author: Josh Rosen AuthorDate: Mon Mar 2 15:20:45 2020 -0800 [SPARK-31003][TESTS] Fix incorrect uses of assume() in tests ### What changes were proposed in this pull request? This patch fixes several incorrect uses of `assume()` in our tests. If a call to `assume(condition)` fails, then it will cause the test to be marked as skipped instead of failed: this feature allows test cases to be skipped if certain prerequisites are missing. For example, we use this to skip certain tests when running on Windows (or when Python dependencies are unavailable). In contrast, `assert(condition)` will fail the test if the condition doesn't hold. If `assume()` is accidentally substituted for `assert()`, then the resulting test will be marked as skipped in cases where it should have failed, undermining the purpose of the test. This patch fixes several such cases, replacing certain `assume()` calls with `assert()`. Credit to ahirreddy for spotting this problem. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing tests. Closes #27754 from JoshRosen/fix-assume-vs-assert. 
Lead-authored-by: Josh Rosen Co-authored-by: Josh Rosen Signed-off-by: Dongjoon Hyun (cherry picked from commit f0010c81e2ef9b8859b39917bb62b48d739a4a22) Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala | 2 +- sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala | 2 +- .../scala/org/apache/spark/sql/execution/command/DDLSuite.scala | 4 ++-- .../src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala | 6 +++--- .../test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala | 2 +- .../scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala | 2 +- .../scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala| 6 +++--- .../src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala| 2 +- .../apache/spark/sql/sources/BucketedReadWithHiveSupportSuite.scala | 2 +- .../spark/sql/sources/BucketedWriteWithHiveSupportSuite.scala | 2 +- 10 files changed, 15 insertions(+), 15 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala index 94e251d..4488902 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala @@ -106,7 +106,7 @@ class OrderingSuite extends SparkFunSuite with ExpressionEvalHelper { StructField("a", dataType, nullable = true) :: StructField("b", dataType, nullable = true) :: Nil) val maybeDataGenerator = RandomDataGenerator.forType(rowType, nullable = false) -assume(maybeDataGenerator.isDefined) +assert(maybeDataGenerator.isDefined) val randGenerator = maybeDataGenerator.get val toCatalyst = CatalystTypeConverters.createToCatalystConverter(rowType) for (_ <- 1 to 50) { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala index cd2c681..8189353 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala @@ -195,7 +195,7 @@ class CachedTableSuite extends QueryTest with SQLTestUtils } test("SPARK-1669: cacheTable should be idempotent") { -assume(!spark.table("testData").logicalPlan.isInstanceOf[InMemoryRelation]) +assert(!spark.table("testData").logicalPlan.isInstanceOf[InMemoryRelation]) spark.catalog.cacheTable("testData") assertCached(spark.table("testData")) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala index 6c824c2..5a67dce 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala @@ -1033,7 +1033,7 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { df.write.insertInto("students") spark.
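The skip-versus-fail distinction this patch relies on can be sketched in plain Scala. The exception and runner names below are hypothetical simplifications of ScalaTest's actual machinery (where a failed `assume()` throws `TestCanceledException` and a failed `assert()` throws `TestFailedException`):

```scala
// ScalaTest semantics in miniature: a failed assume() cancels (skips) the
// test, while a failed assert() fails it. Substituting one for the other
// silently turns real failures into skips.
final case class TestCanceled(msg: String) extends Exception(msg)
final case class TestFailed(msg: String) extends Exception(msg)

def assumeThat(cond: Boolean): Unit =
  if (!cond) throw TestCanceled("prerequisite missing")  // skip, don't fail
def assertThat(cond: Boolean): Unit =
  if (!cond) throw TestFailed("invariant violated")      // a real failure

def runTest(body: => Unit): String =
  try { body; "PASSED" }
  catch {
    case TestCanceled(_) => "SKIPPED"  // suite stays green
    case TestFailed(_)   => "FAILED"   // suite goes red
  }

println(runTest(assumeThat(false)))  // SKIPPED -- how the bug hid failures
println(runTest(assertThat(false)))  // FAILED  -- the intended outcome
```

This is why each hunk in the diff above is a one-word change: the condition being checked was already correct, only the reporting semantics were wrong.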
[spark] branch branch-2.4 updated (cd8f86a -> 0b71b4d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git. from cd8f86a [SPARK-30813][ML] Fix Matrices.sprand comments add 0b71b4d [SPARK-31003][TESTS] Fix incorrect uses of assume() in tests No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala | 2 +- sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala | 2 +- .../scala/org/apache/spark/sql/execution/command/DDLSuite.scala | 4 ++-- .../src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala | 6 +++--- .../test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala | 2 +- .../scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala | 2 +- .../src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala| 2 +- .../scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala| 6 +++--- .../apache/spark/sql/sources/BucketedReadWithHiveSupportSuite.scala | 2 +- .../spark/sql/sources/BucketedWriteWithHiveSupportSuite.scala | 2 +- 10 files changed, 15 insertions(+), 15 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c263c15 -> 4a1d273)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c263c15 [SPARK-31015][SQL] Star(*) expression fails when used with qualified column names for v2 tables add 4a1d273 [SPARK-30997][SQL] Fix an analysis failure in generators with aggregate functions No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala | 14 ++ .../spark/sql/catalyst/analysis/AnalysisErrorSuite.scala | 15 +++ .../org/apache/spark/sql/GeneratorFunctionSuite.scala | 5 + 3 files changed, 34 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30997][SQL] Fix an analysis failure in generators with aggregate functions
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 7d853ab [SPARK-30997][SQL] Fix an analysis failure in generators with aggregate functions 7d853ab is described below commit 7d853ab6eba479a7cc5d8839b4fc497bc6b6d4c8 Author: Takeshi Yamamuro AuthorDate: Tue Mar 3 12:25:12 2020 -0800 [SPARK-30997][SQL] Fix an analysis failure in generators with aggregate functions ### What changes were proposed in this pull request? We have supported generators in SQL aggregate expressions since SPARK-28782. However, the generator (explode) query with aggregate functions in the DataFrame API failed as follows; ``` // SPARK-28782: Generator support in aggregate expressions scala> spark.range(3).toDF("id").createOrReplaceTempView("t") scala> sql("select explode(array(min(id), max(id))) from t").show() +---+ |col| +---+ | 0| | 2| +---+ // A failure case handled in this PR scala> spark.range(3).select(explode(array(min($"id"), max($"id")))).show() org.apache.spark.sql.AnalysisException: The query operator `Generate` contains one or more unsupported expression types Aggregate, Window or Generate. 
Invalid expressions: [min(`id`), max(`id`)];; Project [col#46L] +- Generate explode(array(min(id#42L), max(id#42L))), false, [col#46L] +- Range (0, 3, step=1, splits=Some(4)) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:49) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:48) at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:129) ``` The root cause is that `ExtractGenerator` wrongly replaces a project with aggregate functions before `GlobalAggregates` replaces it with an aggregate, as follows; ``` scala> sql("SET spark.sql.optimizer.planChangeLog.level=warn") scala> spark.range(3).select(explode(array(min($"id"), max($"id")))).show() 20/03/01 12:51:58 WARN HiveSessionStateBuilder$$anon$1: === Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences === !'Project [explode(array(min('id), max('id))) AS List()] 'Project [explode(array(min(id#72L), max(id#72L))) AS List()] +- Range (0, 3, step=1, splits=Some(4)) +- Range (0, 3, step=1, splits=Some(4)) 20/03/01 12:51:58 WARN HiveSessionStateBuilder$$anon$1: === Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator === !'Project [explode(array(min(id#72L), max(id#72L))) AS List()] Project [col#76L] !+- Range (0, 3, step=1, splits=Some(4)) +- Generate explode(array(min(id#72L), max(id#72L))), false, [col#76L] ! +- Range (0, 3, step=1, splits=Some(4)) 20/03/01 12:51:58 WARN HiveSessionStateBuilder$$anon$1: === Result of Batch Resolution === !'Project [explode(array(min('id), max('id))) AS List()] Project [col#76L] !+- Range (0, 3, step=1, splits=Some(4)) +- Generate explode(array(min(id#72L), max(id#72L))), false, [col#76L] ! +- Range (0, 3, step=1, splits=Some(4)) // the analysis failed here... ``` To avoid this case in `ExtractGenerator`, this PR adds a condition to ignore generators that contain aggregate functions. 
A correct sequence of rules is as follows; ``` 20/03/01 13:19:06 WARN HiveSessionStateBuilder$$anon$1: === Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences === !'Project [explode(array(min('id), max('id))) AS List()] 'Project [explode(array(min(id#27L), max(id#27L))) AS List()] +- Range (0, 3, step=1, splits=Some(4)) +- Range (0, 3, step=1, splits=Some(4)) 20/03/01 13:19:06 WARN HiveSessionStateBuilder$$anon$1: === Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates === !'Project [explode(array(min(id#27L), max(id#27L))) AS List()] 'Aggregate [explode(array(min(id#27L), max(id#27L))) AS List()] +- Range (0, 3, step=1, splits=Some(4)) +- Range (0, 3, step=1, splits=Some(4)) 20/03/01 13:19:06 WARN HiveSessionStateBuilder$$anon$1: === Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator === !'Aggregate [explode(array(min(id#27L), max(id#27L))) AS List()] 'Project
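For reference, the two equivalent query forms discussed above can be written as follows. This is a sketch that assumes a Spark 3.x session named `spark` with `import spark.implicits._` in scope; it will not compile without Spark on the classpath:

```scala
// Sketch: the SQL form worked before this fix (SPARK-28782); the DataFrame
// form previously hit the AnalysisException quoted above, and with this fix
// both produce the rows 0 and 2 (the min and max of the range).
import org.apache.spark.sql.functions.{array, explode, max, min}

spark.range(3).toDF("id").createOrReplaceTempView("t")

// SQL form -- already supported:
spark.sql("SELECT explode(array(min(id), max(id))) FROM t").show()

// DataFrame form -- the case repaired by SPARK-30997:
spark.range(3).select(explode(array(min($"id"), max($"id")))).show()
```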
[spark] branch master updated (ebcff67 -> 3edab6c)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ebcff67 [SPARK-30889][SPARK-30913][CORE][DOC] Add version information to the configuration of Tests.scala and Worker add 3edab6c [MINOR][CORE] Expose the alias -c flag of --conf for spark-submit No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala | 2 +- docs/configuration.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [MINOR][CORE] Expose the alias -c flag of --conf for spark-submit
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 104a768 [MINOR][CORE] Expose the alias -c flag of --conf for spark-submit 104a768 is described below commit 104a768e242bf5399bce642b9c6295476d9cdad8 Author: Kent Yao AuthorDate: Wed Mar 4 20:37:51 2020 -0800 [MINOR][CORE] Expose the alias -c flag of --conf for spark-submit ### What changes were proposed in this pull request? `-c` is short for `--conf`; it has been available since v1.1.0 but was hidden from users until now. ### Why are the changes needed? To expose the hidden feature. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? N/A Closes #27802 from yaooqinn/conf. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun (cherry picked from commit 3edab6cc1d70c102093e973a2cf97208db19be8c) Signed-off-by: Dongjoon Hyun --- core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala | 2 +- docs/configuration.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala index 3f7cfea..3090a3b 100644 --- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala +++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala @@ -513,7 +513,7 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S | directory of each executor. File paths of these files | in executors can be accessed via SparkFiles.get(fileName). | -| --conf PROP=VALUE Arbitrary Spark configuration property. +| --conf, -c PROP=VALUE Arbitrary Spark configuration property. | --properties-file FILE Path to a file from which to load extra properties. If not | specified, this will look for conf/spark-defaults.conf. 
| diff --git a/docs/configuration.md b/docs/configuration.md index 5e6fe93..f7b7e16 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -95,7 +95,7 @@ Then, you can supply configuration values at runtime: The Spark shell and [`spark-submit`](submitting-applications.html) tool support two ways to load configurations dynamically. The first is command line options, -such as `--master`, as shown above. `spark-submit` can accept any Spark property using the `--conf` +such as `--master`, as shown above. `spark-submit` can accept any Spark property using the `--conf/-c` flag, but uses special flags for properties that play a part in launching the Spark application. Running `./bin/spark-submit --help` will show the entire list of these options. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [MINOR][CORE] Expose the alias -c flag of --conf for spark-submit
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 1c17ede [MINOR][CORE] Expose the alias -c flag of --conf for spark-submit 1c17ede is described below commit 1c17ede75082fe56d3c5aedc14ac6246fdf3b333 Author: Kent Yao AuthorDate: Wed Mar 4 20:37:51 2020 -0800 [MINOR][CORE] Expose the alias -c flag of --conf for spark-submit ### What changes were proposed in this pull request? `-c` is short for `--conf`; it has been available since v1.1.0 but was hidden from users until now. ### Why are the changes needed? To expose the hidden feature. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? N/A Closes #27802 from yaooqinn/conf. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun (cherry picked from commit 3edab6cc1d70c102093e973a2cf97208db19be8c) Signed-off-by: Dongjoon Hyun --- core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala | 2 +- docs/configuration.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala index 974a0b7..3d489a3 100644 --- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala +++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala @@ -541,7 +541,7 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S | directory of each executor. File paths of these files | in executors can be accessed via SparkFiles.get(fileName). | -| --conf PROP=VALUE Arbitrary Spark configuration property. +| --conf, -c PROP=VALUE Arbitrary Spark configuration property. | --properties-file FILE Path to a file from which to load extra properties. If not | specified, this will look for conf/spark-defaults.conf. 
| diff --git a/docs/configuration.md b/docs/configuration.md index 1582082..6bb1bda 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -80,7 +80,7 @@ Then, you can supply configuration values at runtime: The Spark shell and [`spark-submit`](submitting-applications.html) tool support two ways to load configurations dynamically. The first is command line options, -such as `--master`, as shown above. `spark-submit` can accept any Spark property using the `--conf` +such as `--master`, as shown above. `spark-submit` can accept any Spark property using the `--conf/-c` flag, but uses special flags for properties that play a part in launching the Spark application. Running `./bin/spark-submit --help` will show the entire list of these options. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
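The change above makes `-c` a documented alias of `--conf`: two spellings of the same flag that both collect `PROP=VALUE` pairs. As a rough analogue (illustrative only — spark-submit's actual parsing lives in the Scala `SparkSubmitArguments` class shown in the diff), Python's `argparse` maps a short and a long option name to the same destination:

```python
import argparse

# Hypothetical parser mimicking spark-submit's --conf/-c alias: both
# spellings append to the same list of PROP=VALUE configuration pairs.
parser = argparse.ArgumentParser()
parser.add_argument("-c", "--conf", action="append", default=[],
                    metavar="PROP=VALUE",
                    help="Arbitrary Spark configuration property.")

args = parser.parse_args(["--conf", "spark.app.name=demo",
                          "-c", "spark.eventLog.enabled=true"])
assert args.conf == ["spark.app.name=demo", "spark.eventLog.enabled=true"]
```

Either spelling works on the command line; the help text only needs to advertise both, which is exactly what the one-line diff to `SparkSubmitArguments.scala` does.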
[spark] branch master updated: [SPARK-31050][TEST] Disable flaky `Roundtrip` test in KafkaDelegationTokenSuite
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0a22f19 [SPARK-31050][TEST] Disable flaky `Roundtrip` test in KafkaDelegationTokenSuite 0a22f19 is described below commit 0a22f1966466629cb745d000a0608d521fece093 Author: yi.wu AuthorDate: Thu Mar 5 00:21:32 2020 -0800 [SPARK-31050][TEST] Disable flaky `Roundtrip` test in KafkaDelegationTokenSuite ### What changes were proposed in this pull request? Disable test `KafkaDelegationTokenSuite`. ### Why are the changes needed? `KafkaDelegationTokenSuite` is too flaky. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass Jenkins. Closes #27789 from Ngone51/retry_kafka. Authored-by: yi.wu Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala index 3064838..79239e5 100644 --- a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala +++ b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala @@ -62,7 +62,7 @@ class KafkaDelegationTokenSuite extends StreamTest with SharedSparkSession with } } - test("Roundtrip") { + ignore("Roundtrip") { val hadoopConf = new Configuration() val manager = new HadoopDelegationTokenManager(spark.sparkContext.conf, hadoopConf, null) val credentials = new Credentials() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
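The one-line `test("Roundtrip")` to `ignore("Roundtrip")` rename is ScalaTest's idiom for keeping a flaky test in the codebase while skipping it at run time. A rough Python analogue (illustrative only; class and reason strings are hypothetical) is `unittest.skip`, which likewise reports the test as skipped rather than deleting it:

```python
import unittest

class KafkaDelegationTokenLikeSuite(unittest.TestCase):
    # Analogue of renaming test("Roundtrip") to ignore("Roundtrip"):
    # the body stays in place but is reported as skipped, not executed.
    @unittest.skip("flaky; see SPARK-31050")
    def test_roundtrip(self):
        raise AssertionError("would be flaky if executed")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(KafkaDelegationTokenLikeSuite)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert len(result.skipped) == 1 and not result.failures and not result.errors
```

Keeping the body intact makes it trivial to re-enable the test once the flakiness is resolved, and the skip shows up in test reports instead of silently disappearing.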
[spark] branch branch-3.0 updated: [SPARK-31050][TEST] Disable flaky `Roundtrip` test in KafkaDelegationTokenSuite
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 9cea92b [SPARK-31050][TEST] Disable flaky `Roundtrip` test in KafkaDelegationTokenSuite 9cea92b is described below commit 9cea92b6f2c2fc8e0effcec710e6ff6e8a7c965f Author: yi.wu AuthorDate: Thu Mar 5 00:21:32 2020 -0800 [SPARK-31050][TEST] Disable flaky `Roundtrip` test in KafkaDelegationTokenSuite ### What changes were proposed in this pull request? Disable test `KafkaDelegationTokenSuite`. ### Why are the changes needed? `KafkaDelegationTokenSuite` is too flaky. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass Jenkins. Closes #27789 from Ngone51/retry_kafka. Authored-by: yi.wu Signed-off-by: Dongjoon Hyun (cherry picked from commit 0a22f1966466629cb745d000a0608d521fece093) Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala index 3064838..79239e5 100644 --- a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala +++ b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala @@ -62,7 +62,7 @@ class KafkaDelegationTokenSuite extends StreamTest with SharedSparkSession with } } - test("Roundtrip") { + ignore("Roundtrip") { val hadoopConf = new Configuration() val manager = new HadoopDelegationTokenManager(spark.sparkContext.conf, hadoopConf, null) val credentials = new Credentials() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, 
e-mail: commits-h...@spark.apache.org
[spark] branch master updated (d705d36 -> afb84e9)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d705d36 [SPARK-31045][SQL] Add config for AQE logging level add afb84e9 [SPARK-30886][SQL] Deprecate two-parameter TRIM/LTRIM/RTRIM functions No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 20 ++--- .../sql/catalyst/analysis/AnalysisSuite.scala | 52 ++ 2 files changed, 66 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30886][SQL] Deprecate two-parameter TRIM/LTRIM/RTRIM functions
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 1535b2b [SPARK-30886][SQL] Deprecate two-parameter TRIM/LTRIM/RTRIM functions 1535b2b is described below commit 1535b2bb51782a89b271b1ebe53273ab610b390b Author: Dongjoon Hyun AuthorDate: Thu Mar 5 20:09:39 2020 -0800 [SPARK-30886][SQL] Deprecate two-parameter TRIM/LTRIM/RTRIM functions ### What changes were proposed in this pull request? This PR aims to show a deprecation warning on two-parameter TRIM/LTRIM/RTRIM function usages based on the community decision. - https://lists.apache.org/thread.html/r48b6c2596ab06206b7b7fd4bbafd4099dccd4e2cf9801aaa9034c418%40%3Cdev.spark.apache.org%3E ### Why are the changes needed? For backward compatibility, SPARK-28093 is reverted. However, from Apache Spark 3.0.0, we should give a safe guideline to use SQL syntax instead of the esoteric function signatures. ### Does this PR introduce any user-facing change? Yes. This shows a directional warning. ### How was this patch tested? Pass the Jenkins with a newly added test case. Closes #27643 from dongjoon-hyun/SPARK-30886. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit afb84e9d378003c57cd01d21cdb1a977ba25454b) Signed-off-by: Dongjoon Hyun --- .../spark/sql/catalyst/analysis/Analyzer.scala | 20 ++--- .../sql/catalyst/analysis/AnalysisSuite.scala | 52 ++ 2 files changed, 66 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 3cb754d..eadcd0f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -19,6 +19,7 @@ package org.apache.spark.sql.catalyst.analysis import java.util import java.util.Locale +import java.util.concurrent.atomic.AtomicBoolean import scala.collection.mutable import scala.collection.mutable.ArrayBuffer @@ -1795,6 +1796,7 @@ class Analyzer( * Replaces [[UnresolvedFunction]]s with concrete [[Expression]]s. */ object ResolveFunctions extends Rule[LogicalPlan] { +val trimWarningEnabled = new AtomicBoolean(true) def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { case q: LogicalPlan => q transformExpressions { @@ -1839,13 +1841,19 @@ class Analyzer( } AggregateExpression(agg, Complete, isDistinct, filter) // This function is not an aggregate function, just return the resolved one. -case other => - if (isDistinct || filter.isDefined) { -failAnalysis("DISTINCT or FILTER specified, " + - s"but ${other.prettyName} is not an aggregate function") - } else { -other +case other if (isDistinct || filter.isDefined) => + failAnalysis("DISTINCT or FILTER specified, " + +s"but ${other.prettyName} is not an aggregate function") +case e: String2TrimExpression if arguments.size == 2 => + if (trimWarningEnabled.get) { +log.warn("Two-parameter TRIM/LTRIM/RTRIM function signatures are deprecated." 
+ + " Use SQL syntax `TRIM((BOTH | LEADING | TRAILING)? trimStr FROM str)`" + + " instead.") +trimWarningEnabled.set(false) } + e +case other => + other } } } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala index d385133..8451b9b 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala @@ -21,6 +21,7 @@ import java.util.{Locale, TimeZone} import scala.reflect.ClassTag +import org.apache.log4j.Level import org.scalatest.Matchers import org.apache.spark.api.python.PythonEvalType @@ -768,4 +769,55 @@ class AnalysisSuite extends AnalysisTest with Matchers { assert(message.startsWith(s"Max iterations ($maxIterations) reached for batch Resolution, " + s"please set '${SQLConf.ANALYZER_MAX_ITERATIONS.key}&
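The `AtomicBoolean` guard (`trimWarningEnabled`) in the diff above makes the deprecation warning fire at most once, even when many two-parameter TRIM calls are resolved. A minimal Python sketch of the same once-only pattern (illustrative analogue; a lock stands in for the atomic flag, and the class name is hypothetical):

```python
import threading

class DeprecationWarner:
    """Emit a deprecation message at most once, like the AtomicBoolean
    trimWarningEnabled flag guarding the TRIM/LTRIM/RTRIM warning."""
    def __init__(self):
        self._enabled = True
        self._lock = threading.Lock()  # stands in for AtomicBoolean's atomicity
        self.emitted = []

    def warn(self, message):
        with self._lock:
            if self._enabled:
                self._enabled = False  # flip the flag so later calls are no-ops
                self.emitted.append(message)

warner = DeprecationWarner()
for _ in range(3):
    warner.warn("Two-parameter TRIM/LTRIM/RTRIM function signatures are deprecated.")
assert len(warner.emitted) == 1
```

The flag avoids flooding the logs: the warning is a migration hint for the whole session, not a per-query diagnostic.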
[spark] branch master updated (587266f -> 1426ad8)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 587266f [SPARK-31010][SQL][FOLLOW-UP] Deprecate untyped scala UDF add 1426ad8 [SPARK-23817][FOLLOWUP][TEST] Add OrcV2QuerySuite No new revisions were added by this update. Summary of changes: .../spark/sql/execution/datasources/orc/OrcQuerySuite.scala | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (5375b40 -> 7c09c9f)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 5375b40 [SPARK-31010][SQL][FOLLOW-UP] Deprecate untyped scala UDF add 7c09c9f [SPARK-23817][FOLLOWUP][TEST] Add OrcV2QuerySuite No new revisions were added by this update. Summary of changes: .../spark/sql/execution/datasources/orc/OrcQuerySuite.scala | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31045][SQL][FOLLOWUP][3.0] Fix build due to divergence between master and 3.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 9b48f33 [SPARK-31045][SQL][FOLLOWUP][3.0] Fix build due to divergence between master and 3.0 9b48f33 is described below commit 9b48f3358d3efb523715a5f258e5ed83e28692f6 Author: Jungtaek Lim (HeartSaVioR) AuthorDate: Thu Mar 5 21:31:08 2020 -0800 [SPARK-31045][SQL][FOLLOWUP][3.0] Fix build due to divergence between master and 3.0 ### What changes were proposed in this pull request? This patch fixes the build failure in `branch-3.0` due to cherry-picking SPARK-31045 to branch-3.0, as `.version()` is not available in `branch-3.0` yet. ### Why are the changes needed? The build is failing in `branch-3.0`. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Jenkins build will verify. Closes #27826 from HeartSaVioR/SPARK-31045-branch-3.0-FOLLOWUP. Authored-by: Jungtaek Lim (HeartSaVioR) Signed-off-by: Dongjoon Hyun --- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 1 - 1 file changed, 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index cd465bc..fdaf0ec 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -382,7 +382,6 @@ object SQLConf { .internal() .doc("Configures the log level for adaptive execution logging of plan changes. The value " + "can be 'trace', 'debug', 'info', 'warn', or 'error'. 
The default log level is 'debug'.") -.version("3.0.0") .stringConf .transform(_.toUpperCase(Locale.ROOT)) .checkValues(Set("TRACE", "DEBUG", "INFO", "WARN", "ERROR")) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (d73ea97 -> 895ddde)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from d73ea97 [SPARK-31012][ML][PYSPARK][DOCS] Updating ML API docs for 3.0 changes add 895ddde [SPARK-31014][CORE][3.0] InMemoryStore: remove key from parentToChildrenMap when removing key from CountingRemoveIfForEach No new revisions were added by this update. Summary of changes: .../apache/spark/util/kvstore/InMemoryStore.java | 30 +++--- 1 file changed, 21 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
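SPARK-31014's fix keeps a secondary parent-to-children index consistent when an entry is removed from the primary map. A dependency-free sketch of the invariant (hypothetical names; the real `InMemoryStore` is Java and considerably more involved):

```python
from collections import defaultdict

store = {}                              # key -> parent
parent_to_children = defaultdict(set)   # parent -> set of child keys

def put(key, parent):
    store[key] = parent
    parent_to_children[parent].add(key)

def remove(key):
    parent = store.pop(key)
    children = parent_to_children[parent]
    children.discard(key)               # the fix: also drop the index entry
    if not children:                    # avoid leaking empty index buckets
        del parent_to_children[parent]

put("task-1", "stage-0")
put("task-2", "stage-0")
remove("task-1")
assert parent_to_children["stage-0"] == {"task-2"}
remove("task-2")
assert "stage-0" not in parent_to_children
```

Forgetting the `discard` step leaves stale child keys in the index after removal, which is the class of leak the patch addresses.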
[spark] branch master updated: [SPARK-31053][SQL] mark connector APIs as Evolving
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1aa1847 [SPARK-31053][SQL] mark connector APIs as Evolving 1aa1847 is described below commit 1aa184763aa49d70907669b2d8af5a713ee0d7fa Author: Wenchen Fan AuthorDate: Sun Mar 8 11:41:09 2020 -0700 [SPARK-31053][SQL] mark connector APIs as Evolving ### What changes were proposed in this pull request? The newly added catalog APIs are marked as Experimental but other DS v2 APIs are marked as Evolving. This PR makes it consistent and mark all Connector APIs as Evolving. ### Why are the changes needed? For consistency. ### Does this PR introduce any user-facing change? no ### How was this patch tested? N/A Closes #27811 from cloud-fan/tag. Authored-by: Wenchen Fan Signed-off-by: Dongjoon Hyun --- .../java/org/apache/spark/sql/connector/catalog/CatalogExtension.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/CatalogPlugin.java| 4 ++-- .../spark/sql/connector/catalog/DelegatingCatalogExtension.java | 4 ++-- .../main/java/org/apache/spark/sql/connector/catalog/Identifier.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/IdentifierImpl.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/NamespaceChange.java | 4 ++-- .../main/java/org/apache/spark/sql/connector/catalog/StagedTable.java | 4 ++-- .../org/apache/spark/sql/connector/catalog/StagingTableCatalog.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/SupportsDelete.java | 4 ++-- .../org/apache/spark/sql/connector/catalog/SupportsNamespaces.java| 4 ++-- .../java/org/apache/spark/sql/connector/catalog/SupportsRead.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/SupportsWrite.java| 4 ++-- .../java/org/apache/spark/sql/connector/catalog/TableCapability.java | 4 ++-- 
.../java/org/apache/spark/sql/connector/catalog/TableCatalog.java | 4 ++-- .../main/java/org/apache/spark/sql/connector/catalog/TableChange.java | 4 ++-- .../java/org/apache/spark/sql/connector/expressions/Expression.java | 4 ++-- .../java/org/apache/spark/sql/connector/expressions/Expressions.java | 4 ++-- .../main/java/org/apache/spark/sql/connector/expressions/Literal.java | 4 ++-- .../org/apache/spark/sql/connector/expressions/NamedReference.java| 4 ++-- .../java/org/apache/spark/sql/connector/expressions/Transform.java| 4 ++-- .../apache/spark/sql/connector/write/SupportsDynamicOverwrite.java| 3 +++ .../java/org/apache/spark/sql/connector/write/SupportsOverwrite.java | 2 ++ .../java/org/apache/spark/sql/connector/write/SupportsTruncate.java | 3 +++ 23 files changed, 48 insertions(+), 40 deletions(-) diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogExtension.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogExtension.java index 61cb83c..155dca5 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogExtension.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogExtension.java @@ -17,7 +17,7 @@ package org.apache.spark.sql.connector.catalog; -import org.apache.spark.annotation.Experimental; +import org.apache.spark.annotation.Evolving; import org.apache.spark.sql.util.CaseInsensitiveStringMap; /** @@ -29,7 +29,7 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap; * * @since 3.0.0 */ -@Experimental +@Evolving public interface CatalogExtension extends TableCatalog, SupportsNamespaces { /** diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogPlugin.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogPlugin.java index 2958538..8ca4f56 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogPlugin.java +++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogPlugin.java @@ -17,7 +17,7 @@ package org.apache.spark.sql.connector.catalog; -import org.apache.spark.annotation.Experimental; +import org.apache.spark.annotation.Evolving; import org.apache.spark.sql.internal.SQLConf; import org.apache.spark.sql.util.CaseInsensitiveStringMap; @@ -41,7 +41,7 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap; * * @since 3.0.0 */ -@Experimental +@Evolving public interface CatalogPlugin { /** * Called to initialize configuration. diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java index 5a51959..d07d299 100644 --- a/sql/catalyst/src/main/java/org/apache
[spark] branch master updated (f8a3730 -> 1aa1847)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f8a3730 [SPARK-30841][SQL][DOC][FOLLOW-UP] Add version information to the configuration of SQL add 1aa1847 [SPARK-31053][SQL] mark connector APIs as Evolving No new revisions were added by this update. Summary of changes: .../java/org/apache/spark/sql/connector/catalog/CatalogExtension.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/CatalogPlugin.java| 4 ++-- .../spark/sql/connector/catalog/DelegatingCatalogExtension.java | 4 ++-- .../main/java/org/apache/spark/sql/connector/catalog/Identifier.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/IdentifierImpl.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/NamespaceChange.java | 4 ++-- .../main/java/org/apache/spark/sql/connector/catalog/StagedTable.java | 4 ++-- .../org/apache/spark/sql/connector/catalog/StagingTableCatalog.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/SupportsDelete.java | 4 ++-- .../org/apache/spark/sql/connector/catalog/SupportsNamespaces.java| 4 ++-- .../java/org/apache/spark/sql/connector/catalog/SupportsRead.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/SupportsWrite.java| 4 ++-- .../java/org/apache/spark/sql/connector/catalog/TableCapability.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/TableCatalog.java | 4 ++-- .../main/java/org/apache/spark/sql/connector/catalog/TableChange.java | 4 ++-- .../java/org/apache/spark/sql/connector/expressions/Expression.java | 4 ++-- .../java/org/apache/spark/sql/connector/expressions/Expressions.java | 4 ++-- .../main/java/org/apache/spark/sql/connector/expressions/Literal.java | 4 ++-- .../org/apache/spark/sql/connector/expressions/NamedReference.java| 4 ++-- .../java/org/apache/spark/sql/connector/expressions/Transform.java| 4 ++-- .../apache/spark/sql/connector/write/SupportsDynamicOverwrite.java| 
3 +++ .../java/org/apache/spark/sql/connector/write/SupportsOverwrite.java | 2 ++ .../java/org/apache/spark/sql/connector/write/SupportsTruncate.java | 3 +++ 23 files changed, 48 insertions(+), 40 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
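`@Experimental` and `@Evolving` are marker annotations: they document an API's stability contract without changing its behavior. A rough Python analogue using class decorators (illustrative only; Spark's markers are Java annotation types in `org.apache.spark.annotation`):

```python
def evolving(cls):
    """Mark an API as Evolving: usable, but its signature may still
    change between minor releases."""
    cls._api_stability = "Evolving"
    return cls

def experimental(cls):
    """Mark an API as Experimental: may change or be removed at any time."""
    cls._api_stability = "Experimental"
    return cls

@evolving
class CatalogPluginLike:
    """Hypothetical stand-in for a DS v2 connector interface."""

assert CatalogPluginLike._api_stability == "Evolving"
```

Because the marker carries no runtime logic, switching every connector interface from `@Experimental` to `@Evolving` is a pure documentation change, which is why the diff touches only import lines and annotation lines.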
[spark] branch branch-3.0 updated: [SPARK-31053][SQL] mark connector APIs as Evolving
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 4287b03 [SPARK-31053][SQL] mark connector APIs as Evolving 4287b03 is described below commit 4287b03a9c564eb2bdb4dfd93ea78728e3a9e440 Author: Wenchen Fan AuthorDate: Sun Mar 8 11:41:09 2020 -0700 [SPARK-31053][SQL] mark connector APIs as Evolving ### What changes were proposed in this pull request? The newly added catalog APIs are marked as Experimental but other DS v2 APIs are marked as Evolving. This PR makes it consistent and mark all Connector APIs as Evolving. ### Why are the changes needed? For consistency. ### Does this PR introduce any user-facing change? no ### How was this patch tested? N/A Closes #27811 from cloud-fan/tag. Authored-by: Wenchen Fan Signed-off-by: Dongjoon Hyun (cherry picked from commit 1aa184763aa49d70907669b2d8af5a713ee0d7fa) Signed-off-by: Dongjoon Hyun --- .../java/org/apache/spark/sql/connector/catalog/CatalogExtension.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/CatalogPlugin.java| 4 ++-- .../spark/sql/connector/catalog/DelegatingCatalogExtension.java | 4 ++-- .../main/java/org/apache/spark/sql/connector/catalog/Identifier.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/IdentifierImpl.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/NamespaceChange.java | 4 ++-- .../main/java/org/apache/spark/sql/connector/catalog/StagedTable.java | 4 ++-- .../org/apache/spark/sql/connector/catalog/StagingTableCatalog.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/SupportsDelete.java | 4 ++-- .../org/apache/spark/sql/connector/catalog/SupportsNamespaces.java| 4 ++-- .../java/org/apache/spark/sql/connector/catalog/SupportsRead.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/SupportsWrite.java| 4 ++-- 
.../java/org/apache/spark/sql/connector/catalog/TableCapability.java | 4 ++-- .../java/org/apache/spark/sql/connector/catalog/TableCatalog.java | 4 ++-- .../main/java/org/apache/spark/sql/connector/catalog/TableChange.java | 4 ++-- .../java/org/apache/spark/sql/connector/expressions/Expression.java | 4 ++-- .../java/org/apache/spark/sql/connector/expressions/Expressions.java | 4 ++-- .../main/java/org/apache/spark/sql/connector/expressions/Literal.java | 4 ++-- .../org/apache/spark/sql/connector/expressions/NamedReference.java| 4 ++-- .../java/org/apache/spark/sql/connector/expressions/Transform.java| 4 ++-- .../apache/spark/sql/connector/write/SupportsDynamicOverwrite.java| 3 +++ .../java/org/apache/spark/sql/connector/write/SupportsOverwrite.java | 2 ++ .../java/org/apache/spark/sql/connector/write/SupportsTruncate.java | 3 +++ 23 files changed, 48 insertions(+), 40 deletions(-) diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogExtension.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogExtension.java index 61cb83c..155dca5 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogExtension.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogExtension.java @@ -17,7 +17,7 @@ package org.apache.spark.sql.connector.catalog; -import org.apache.spark.annotation.Experimental; +import org.apache.spark.annotation.Evolving; import org.apache.spark.sql.util.CaseInsensitiveStringMap; /** @@ -29,7 +29,7 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap; * * @since 3.0.0 */ -@Experimental +@Evolving public interface CatalogExtension extends TableCatalog, SupportsNamespaces { /** diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogPlugin.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogPlugin.java index 2958538..8ca4f56 100644 --- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogPlugin.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/CatalogPlugin.java @@ -17,7 +17,7 @@ package org.apache.spark.sql.connector.catalog; -import org.apache.spark.annotation.Experimental; +import org.apache.spark.annotation.Evolving; import org.apache.spark.sql.internal.SQLConf; import org.apache.spark.sql.util.CaseInsensitiveStringMap; @@ -41,7 +41,7 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap; * * @since 3.0.0 */ -@Experimental +@Evolving public interface CatalogPlugin { /** * Called to initialize configuration. diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DelegatingCatalogExtension.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector
[spark] branch master updated (1aa1847 -> 068bdd4)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 1aa1847 [SPARK-31053][SQL] mark connector APIs as Evolving add 068bdd4 [SPARK-31073][WEBUI] Add "shuffle write time" to task metrics summary in StagePage No new revisions were added by this update. Summary of changes: .../org/apache/spark/ui/static/stagepage.js| 41 ++ .../spark/ui/static/stagespage-template.html | 2 +- .../scala/org/apache/spark/ui/jobs/StagePage.scala | 2 +- 3 files changed, 29 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d21aab4 [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields d21aab4 is described below commit d21aab403a0a32e8b705b38874c0b335e703bd5d Author: Liang-Chi Hsieh AuthorDate: Mon Mar 9 11:06:45 2020 -0700 [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields ### What changes were proposed in this pull request? Adding a note to document `Row.asDict` behavior when there are duplicate fields. ### Why are the changes needed? When a row contains duplicate fields, `asDict` and `_get_item_` behaves differently. We should document it to let users know the difference explicitly. ### Does this PR introduce any user-facing change? No. Only document change. ### How was this patch tested? Existing test. Closes #27853 from viirya/SPARK-30941. Authored-by: Liang-Chi Hsieh Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/types.py | 6 ++ 1 file changed, 6 insertions(+) diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py index a5302e7..320a68d 100644 --- a/python/pyspark/sql/types.py +++ b/python/pyspark/sql/types.py @@ -1528,6 +1528,12 @@ class Row(tuple): :param recursive: turns the nested Rows to dict (default: False). +.. note:: If a row contains duplicate field names, e.g., the rows of a join +between two :class:`DataFrame` that both have the fields of same names, +one of the duplicate fields will be selected by ``asDict``. ``__getitem__`` +will also return one of the duplicate fields, however returned value might +be different to ``asDict``. 
+ >>> Row(name="Alice", age=11).asDict() == {'name': 'Alice', 'age': 11} True >>> row = Row(key=1, value=Row(name='a', age=2)) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
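The caveat the docstring adds can be seen with a plain-Python sketch. This only mimics the behavior being documented (dict construction keeps one entry per duplicate key) and is not PySpark's actual `Row.asDict` implementation:

```python
# Columns after, e.g., a self-join can carry duplicate field names.
fields = ["id", "name", "id"]
values = [1, "Alice", 2]

# Building a dict keeps only one value per duplicate name -- later pairs
# overwrite earlier ones -- which is why asDict() silently drops a column.
as_dict = dict(zip(fields, values))
print(as_dict)                        # {'id': 2, 'name': 'Alice'}

# Positional lookup, by contrast, resolves to the *first* "id", so the value
# it returns can differ from the one asDict() kept.
print(values[fields.index("id")])     # 1
```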
[spark] branch master updated (b6b0343 -> d21aab4)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b6b0343 [SPARK-30929][ML] ML, GraphX 3.0 QA: API: New Scala APIs, docs add d21aab4 [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields No new revisions were added by this update. Summary of changes: python/pyspark/sql/types.py | 6 ++ 1 file changed, 6 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 2e0d2b9 [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields 2e0d2b9 is described below commit 2e0d2b96195b0a3772225501a703fb02304aa346 Author: Liang-Chi Hsieh AuthorDate: Mon Mar 9 11:06:45 2020 -0700 [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields ### What changes were proposed in this pull request? Adding a note to document `Row.asDict` behavior when there are duplicate fields. ### Why are the changes needed? When a row contains duplicate fields, `asDict` and `_get_item_` behaves differently. We should document it to let users know the difference explicitly. ### Does this PR introduce any user-facing change? No. Only document change. ### How was this patch tested? Existing test. Closes #27853 from viirya/SPARK-30941. Authored-by: Liang-Chi Hsieh Signed-off-by: Dongjoon Hyun (cherry picked from commit d21aab403a0a32e8b705b38874c0b335e703bd5d) Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/types.py | 6 ++ 1 file changed, 6 insertions(+) diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py index a5302e7..320a68d 100644 --- a/python/pyspark/sql/types.py +++ b/python/pyspark/sql/types.py @@ -1528,6 +1528,12 @@ class Row(tuple): :param recursive: turns the nested Rows to dict (default: False). +.. note:: If a row contains duplicate field names, e.g., the rows of a join +between two :class:`DataFrame` that both have the fields of same names, +one of the duplicate fields will be selected by ``asDict``. ``__getitem__`` +will also return one of the duplicate fields, however returned value might +be different to ``asDict``. 
+ >>> Row(name="Alice", age=11).asDict() == {'name': 'Alice', 'age': 11} True >>> row = Row(key=1, value=Row(name='a', age=2)) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new f378c7f [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields f378c7f is described below commit f378c7fba29368ca32142a3b7fc169dabe6cb37f Author: Liang-Chi Hsieh AuthorDate: Mon Mar 9 11:06:45 2020 -0700 [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields ### What changes were proposed in this pull request? Adding a note to document `Row.asDict` behavior when there are duplicate fields. ### Why are the changes needed? When a row contains duplicate fields, `asDict` and `_get_item_` behaves differently. We should document it to let users know the difference explicitly. ### Does this PR introduce any user-facing change? No. Only document change. ### How was this patch tested? Existing test. Closes #27853 from viirya/SPARK-30941. Authored-by: Liang-Chi Hsieh Signed-off-by: Dongjoon Hyun (cherry picked from commit d21aab403a0a32e8b705b38874c0b335e703bd5d) Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/types.py | 6 ++ 1 file changed, 6 insertions(+) diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py index 1d24c40..0d73963 100644 --- a/python/pyspark/sql/types.py +++ b/python/pyspark/sql/types.py @@ -1466,6 +1466,12 @@ class Row(tuple): :param recursive: turns the nested Row as dict (default: False). +.. note:: If a row contains duplicate field names, e.g., the rows of a join +between two :class:`DataFrame` that both have the fields of same names, +one of the duplicate fields will be selected by ``asDict``. ``__getitem__`` +will also return one of the duplicate fields, however returned value might +be different to ``asDict``. 
+ >>> Row(name="Alice", age=11).asDict() == {'name': 'Alice', 'age': 11} True >>> row = Row(key=1, value=Row(name='a', age=2)) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (d21aab4 -> e807118)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d21aab4 [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields add e807118 [SPARK-31055][DOCS] Update config docs for shuffle local host reads to have dep on external shuffle service No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/internal/config/package.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31055][DOCS] Update config docs for shuffle local host reads to have dep on external shuffle service
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 9caf009 [SPARK-31055][DOCS] Update config docs for shuffle local host reads to have dep on external shuffle service 9caf009 is described below commit 9caf009ecd9041e71efe2ef56ca0e75cc94cb56e Author: Thomas Graves AuthorDate: Mon Mar 9 12:17:59 2020 -0700 [SPARK-31055][DOCS] Update config docs for shuffle local host reads to have dep on external shuffle service ### What changes were proposed in this pull request? with SPARK-27651 we now support host local reads for shuffle, but only when external shuffle service is enabled. Update the config docs to state that. ### Why are the changes needed? clarify dependency ### Does this PR introduce any user-facing change? no ### How was this patch tested? n/a Closes #27812 from tgravescs/SPARK-27651-follow. 
Authored-by: Thomas Graves Signed-off-by: Dongjoon Hyun (cherry picked from commit e807118eef9e0214170ff62c828524d237bd58e3) Signed-off-by: Dongjoon Hyun --- core/src/main/scala/org/apache/spark/internal/config/package.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index 23c31a5..1308a46 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -1135,7 +1135,8 @@ package object config { private[spark] val SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED = ConfigBuilder("spark.shuffle.readHostLocalDisk") - .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is disabled), shuffle " + + .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is disabled and external " + +s"shuffle `${SHUFFLE_SERVICE_ENABLED.key}` is enabled), shuffle " + "blocks requested from those block managers which are running on the same host are read " + "from the disk directly instead of being fetched as remote blocks over the network.") .booleanConf - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
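Taken together, the updated doc string means host-local shuffle reads only take effect when three settings line up. A minimal `spark-defaults.conf` sketch (illustrative; the values are the ones the doc text requires, not part of the commit):

```
spark.shuffle.service.enabled        true
spark.shuffle.useOldFetchProtocol    false
spark.shuffle.readHostLocalDisk      true
```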
[spark] branch master updated: [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 815c792 [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source 815c792 is described below commit 815c7929c290d6eed86dc5c924f9f7d48cff179d Author: HyukjinKwon AuthorDate: Tue Mar 10 00:33:32 2020 -0700 [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source ### What changes were proposed in this pull request? This PR proposes two things: 1. Convert `null` to `string` type during schema inference of `schema_of_json` as JSON datasource does. This is a bug fix as well because `null` string is not the proper DDL formatted string and it is unable for SQL parser to recognise it as a type string. We should match it to JSON datasource and return a string type so `schema_of_json` returns a proper DDL formatted string. 2. Let `schema_of_json` respect `dropFieldIfAllNull` option during schema inference. ### Why are the changes needed? To let `schema_of_json` return a proper DDL formatted string, and respect `dropFieldIfAllNull` option. ### Does this PR introduce any user-facing change? Yes, it does. ```scala import collection.JavaConverters._ import org.apache.spark.sql.functions._ spark.range(1).select(schema_of_json(lit("""{"id": ""}"""))).show() spark.range(1).select(schema_of_json(lit("""{"id": "a", "drop": {"drop": null}}"""), Map("dropFieldIfAllNull" -> "true").asJava)).show(false) ``` **Before:** ``` struct struct,id:string> ``` **After:** ``` struct struct ``` ### How was this patch tested? Manually tested, and unittests were added. Closes #27854 from HyukjinKwon/SPARK-31065. 
Authored-by: HyukjinKwon Signed-off-by: Dongjoon Hyun --- .../sql/catalyst/expressions/jsonExpressions.scala | 13 +++- .../spark/sql/catalyst/json/JsonInferSchema.scala | 13 .../org/apache/spark/sql/JsonFunctionsSuite.scala | 36 ++ 3 files changed, 54 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index aa4b464..4c2a511 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -777,7 +777,18 @@ case class SchemaOfJson( override def eval(v: InternalRow): Any = { val dt = Utils.tryWithResource(CreateJacksonParser.utf8String(jsonFactory, json)) { parser => parser.nextToken() - jsonInferSchema.inferField(parser) + // To match with schema inference from JSON datasource. 
+ jsonInferSchema.inferField(parser) match { +case st: StructType => + jsonInferSchema.canonicalizeType(st, jsonOptions).getOrElse(StructType(Nil)) +case at: ArrayType if at.elementType.isInstanceOf[StructType] => + jsonInferSchema +.canonicalizeType(at.elementType, jsonOptions) +.map(ArrayType(_, containsNull = at.containsNull)) +.getOrElse(ArrayType(StructType(Nil), containsNull = at.containsNull)) +case other: DataType => + jsonInferSchema.canonicalizeType(other, jsonOptions).getOrElse(StringType) + } } UTF8String.fromString(dt.catalogString) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala index 82dd6d0..3dd8694 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala @@ -92,12 +92,10 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable { } json.sparkContext.runJob(mergedTypesFromPartitions, foldPartition, mergeResult) -canonicalizeType(rootType, options) match { - case Some(st: StructType) => st - case _ => -// canonicalizeType erases all empty structs, including the only one we want to keep -StructType(Nil) -} +canonicalizeType(rootType, options) + .find(_.isInstanceOf[StructType]) + // canonicalizeType erases all empty structs, including the only one we want to keep + .getOrElse(StructType(Nil)).asInstanceOf[StructType]
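The canonicalization rule the patch aligns `schema_of_json` with can be paraphrased in a small Python sketch. The helper below is hypothetical and far simpler than the JSON datasource's real inference; it only illustrates the key point that a bare `null` canonicalizes to `string`, keeping the resulting DDL string parseable:

```python
import json

def infer_type(value):
    """Toy JSON type inference (hypothetical helper, not Spark's code)."""
    if value is None:
        return "string"            # null -> string, as the JSON datasource does
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        fields = ",".join(f"{k}:{infer_type(v)}" for k, v in value.items())
        return f"struct<{fields}>"
    if isinstance(value, list):
        elem = infer_type(value[0]) if value else "string"
        return f"array<{elem}>"
    raise TypeError(f"unsupported JSON value: {value!r}")

# A null field no longer yields an unparseable "null" type in the DDL string.
print(infer_type(json.loads('{"id": null}')))   # struct<id:string>
```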
[spark] branch branch-3.0 updated: [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 0985f13 [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source 0985f13 is described below commit 0985f13bc66a99319820d0d9ba5b3f2a254f61a4 Author: HyukjinKwon AuthorDate: Tue Mar 10 00:33:32 2020 -0700 [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source This PR proposes two things: 1. Convert `null` to `string` type during schema inference of `schema_of_json` as JSON datasource does. This is a bug fix as well because `null` string is not the proper DDL formatted string and it is unable for SQL parser to recognise it as a type string. We should match it to JSON datasource and return a string type so `schema_of_json` returns a proper DDL formatted string. 2. Let `schema_of_json` respect `dropFieldIfAllNull` option during schema inference. To let `schema_of_json` return a proper DDL formatted string, and respect `dropFieldIfAllNull` option. Yes, it does. ```scala import collection.JavaConverters._ import org.apache.spark.sql.functions._ spark.range(1).select(schema_of_json(lit("""{"id": ""}"""))).show() spark.range(1).select(schema_of_json(lit("""{"id": "a", "drop": {"drop": null}}"""), Map("dropFieldIfAllNull" -> "true").asJava)).show(false) ``` **Before:** ``` struct struct,id:string> ``` **After:** ``` struct struct ``` Manually tested, and unittests were added. Closes #27854 from HyukjinKwon/SPARK-31065. 
Authored-by: HyukjinKwon Signed-off-by: Dongjoon Hyun (cherry picked from commit 815c7929c290d6eed86dc5c924f9f7d48cff179d) Signed-off-by: Dongjoon Hyun --- .../sql/catalyst/expressions/jsonExpressions.scala | 13 +++- .../spark/sql/catalyst/json/JsonInferSchema.scala | 13 .../org/apache/spark/sql/JsonFunctionsSuite.scala | 35 ++ 3 files changed, 53 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index 61afdb6..a63e541 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -773,7 +773,18 @@ case class SchemaOfJson( override def eval(v: InternalRow): Any = { val dt = Utils.tryWithResource(CreateJacksonParser.utf8String(jsonFactory, json)) { parser => parser.nextToken() - jsonInferSchema.inferField(parser) + // To match with schema inference from JSON datasource. 
+ jsonInferSchema.inferField(parser) match { +case st: StructType => + jsonInferSchema.canonicalizeType(st, jsonOptions).getOrElse(StructType(Nil)) +case at: ArrayType if at.elementType.isInstanceOf[StructType] => + jsonInferSchema +.canonicalizeType(at.elementType, jsonOptions) +.map(ArrayType(_, containsNull = at.containsNull)) +.getOrElse(ArrayType(StructType(Nil), containsNull = at.containsNull)) +case other: DataType => + jsonInferSchema.canonicalizeType(other, jsonOptions).getOrElse(StringType) + } } UTF8String.fromString(dt.catalogString) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala index 82dd6d0..3dd8694 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala @@ -92,12 +92,10 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable { } json.sparkContext.runJob(mergedTypesFromPartitions, foldPartition, mergeResult) -canonicalizeType(rootType, options) match { - case Some(st: StructType) => st - case _ => -// canonicalizeType erases all empty structs, including the only one we want to keep -StructType(Nil) -} +canonicalizeType(rootType, options) + .find(_.isInstanceOf[StructType]) + // canonicalizeType erases all empty structs, including the only one we want to keep + .getOrElse(StructType(Nil)).asInstanceOf[StructType] } /** @@ -198,7 +196,8 @@ private[sql] class JsonInferSchema(options: JSONO
[spark] branch master updated (3bd6ebf -> 34be83e)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 3bd6ebf [SPARK-30189][SQL] Interval from year-month/date-time string should handle whitespaces add 34be83e [SPARK-31037][SQL][FOLLOW-UP] Replace legacy ReduceNumShufflePartitions with CoalesceShufflePartitions in comment No new revisions were added by this update. Summary of changes: .../apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala | 6 +++--- .../apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala| 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31037][SQL][FOLLOW-UP] Replace legacy ReduceNumShufflePartitions with CoalesceShufflePartitions in comment
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 57bf23c  [SPARK-31037][SQL][FOLLOW-UP] Replace legacy ReduceNumShufflePartitions with CoalesceShufflePartitions in comment

57bf23c is described below

commit 57bf23c01b2cffe5011a9d15eb68eff5c28519f4
Author: yi.wu
AuthorDate: Tue Mar 10 11:09:36 2020 -0700

    [SPARK-31037][SQL][FOLLOW-UP] Replace legacy ReduceNumShufflePartitions with CoalesceShufflePartitions in comment

    ### What changes were proposed in this pull request?

    Replace legacy `ReduceNumShufflePartitions` with `CoalesceShufflePartitions` in comments.

    ### Why are the changes needed?

    The rule `ReduceNumShufflePartitions` has been renamed to `CoalesceShufflePartitions`, so we should update the related comments as well.

    ### Does this PR introduce any user-facing change?

    No.

    ### How was this patch tested?

    N/A.

    Closes #27865 from Ngone51/spark_31037_followup.

    Authored-by: yi.wu
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 34be83e08b6f5313bdd9d165d3e203d06eff677b)
    Signed-off-by: Dongjoon Hyun
---
 .../apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala | 6 +++---
 .../apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala    | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index fc88a7f..c1486aa 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -97,12 +97,12 @@ case class AdaptiveSparkPlanExec(
   @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
     ReuseAdaptiveSubquery(conf, context.subqueryCache),
     // Here the 'OptimizeSkewedJoin' rule should be executed
-    // before 'ReduceNumShufflePartitions', as the skewed partition handled
-    // in 'OptimizeSkewedJoin' rule, should be omitted in 'ReduceNumShufflePartitions'.
+    // before 'CoalesceShufflePartitions', as the skewed partition handled
+    // in 'OptimizeSkewedJoin' rule, should be omitted in 'CoalesceShufflePartitions'.
     OptimizeSkewedJoin(conf),
     CoalesceShufflePartitions(conf),
     // The rule of 'OptimizeLocalShuffleReader' need to make use of the 'partitionStartIndices'
-    // in 'ReduceNumShufflePartitions' rule. So it must be after 'ReduceNumShufflePartitions' rule.
+    // in 'CoalesceShufflePartitions' rule. So it must be after 'CoalesceShufflePartitions' rule.
     OptimizeLocalShuffleReader(conf),
     ApplyColumnarRulesAndInsertTransitions(conf, context.session.sessionState.columnarRules),
     CollapseCodegenStages(conf)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
index c3bcce4..4387409 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
@@ -52,7 +52,7 @@ import org.apache.spark.sql.internal.SQLConf
  * (L4-1, R4-1), (L4-2, R4-1), (L4-1, R4-2), (L4-2, R4-2)
  *
  * Note that, when this rule is enabled, it also coalesces non-skewed partitions like
- * `ReduceNumShufflePartitions` does.
+ * `CoalesceShufflePartitions` does.
  */
 case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[SparkPlan] {
@@ -191,7 +191,7 @@ case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[SparkPlan] {
     val leftSidePartitions = mutable.ArrayBuffer.empty[ShufflePartitionSpec]
     val rightSidePartitions = mutable.ArrayBuffer.empty[ShufflePartitionSpec]
     // This is used to delay the creation of non-skew partitions so that we can potentially
-    // coalesce them like `ReduceNumShufflePartitions` does.
+    // coalesce them like `CoalesceShufflePartitions` does.
     val nonSkewPartitionIndices = mutable.ArrayBuffer.empty[Int]
     val leftSkewDesc = new SkewDesc
     val rightSkewDesc = new SkewDesc

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
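To make the rule-ordering constraint above concrete, here is a minimal, Spark-independent Python sketch of the idea behind `CoalesceShufflePartitions`: adjacent small shuffle partitions are greedily merged up to a target size, while partitions already handled by the skew-join rule are emitted as-is — which is why `OptimizeSkewedJoin` must run first. The function name and sizes are illustrative, not Spark's API.

```python
def coalesce_partitions(sizes, target, skewed=frozenset()):
    """Greedily merge adjacent shuffle partitions up to `target` bytes.

    `skewed` holds partition indices already split by the skew-join rule;
    they are kept as singleton groups and never merged.
    """
    groups, current, current_size = [], [], 0
    for i, size in enumerate(sizes):
        if i in skewed:
            # Flush any pending group, then emit the skewed partition alone.
            if current:
                groups.append(current)
                current, current_size = [], 0
            groups.append([i])
            continue
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        groups.append(current)
    return groups

# 5 partitions; index 2 is skewed and must stay separate.
print(coalesce_partitions([10, 20, 500, 5, 5], target=64, skewed={2}))
# → [[0, 1], [2], [3, 4]]
```

If the coalescing rule ran before the skew rule, partition 2 could be merged with its small neighbors and the skew handling would have nothing left to split.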
[spark] branch master updated (3bd6ebf -> 34be83e)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3bd6ebf  [SPARK-30189][SQL] Interval from year-month/date-time string should handle whitespaces
 add 34be83e  [SPARK-31037][SQL][FOLLOW-UP] Replace legacy ReduceNumShufflePartitions with CoalesceShufflePartitions in comment

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala | 6 +++---
 .../apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala    | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (0f54dc7 -> 93def95)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 0f54dc7  [SPARK-30962][SQL][DOC] Documentation for Alter table command phase 2
 add 93def95  [SPARK-31095][BUILD] Upgrade netty-all to 4.1.47.Final

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-1.2 | 2 +-
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 pom.xml                                 | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31095][BUILD] Upgrade netty-all to 4.1.47.Final
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new d1f5df4  [SPARK-31095][BUILD] Upgrade netty-all to 4.1.47.Final

d1f5df4 is described below

commit d1f5df40cb7687c5fd3145d3d629fb1069227638
Author: Dongjoon Hyun
AuthorDate: Tue Mar 10 17:50:34 2020 -0700

    [SPARK-31095][BUILD] Upgrade netty-all to 4.1.47.Final

    ### What changes were proposed in this pull request?

    This PR aims to bring the bug fixes from the latest netty-all.

    ### Why are the changes needed?

    - 4.1.47.Final: https://github.com/netty/netty/milestone/222?closed=1 (15 patches or issues)
    - 4.1.46.Final: https://github.com/netty/netty/milestone/221?closed=1 (80 patches or issues)
    - 4.1.45.Final: https://github.com/netty/netty/milestone/220?closed=1 (23 patches or issues)
    - 4.1.44.Final: https://github.com/netty/netty/milestone/218?closed=1 (113 patches or issues)
    - 4.1.43.Final: https://github.com/netty/netty/milestone/217?closed=1 (63 patches or issues)

    ### Does this PR introduce any user-facing change?

    No.

    ### How was this patch tested?

    Pass the Jenkins with the existing tests.

    Closes #27869 from dongjoon-hyun/SPARK-31095.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 93def95b0801842e0288a77b3a97f84d31b57366)
    Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-2.7-hive-1.2 | 2 +-
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 pom.xml                                 | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-1.2 b/dev/deps/spark-deps-hadoop-2.7-hive-1.2
index 828b1a6..39f7262 100644
--- a/dev/deps/spark-deps-hadoop-2.7-hive-1.2
+++ b/dev/deps/spark-deps-hadoop-2.7-hive-1.2
@@ -155,7 +155,7 @@ metrics-jmx/4.1.1//metrics-jmx-4.1.1.jar
 metrics-json/4.1.1//metrics-json-4.1.1.jar
 metrics-jvm/4.1.1//metrics-jvm-4.1.1.jar
 minlog/1.3.0//minlog-1.3.0.jar
-netty-all/4.1.42.Final//netty-all-4.1.42.Final.jar
+netty-all/4.1.47.Final//netty-all-4.1.47.Final.jar
 objenesis/2.5.1//objenesis-2.5.1.jar
 okhttp/3.12.6//okhttp-3.12.6.jar
 okio/1.15.0//okio-1.15.0.jar
diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
index 8a65540..26ac30d 100644
--- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
@@ -170,7 +170,7 @@ metrics-jmx/4.1.1//metrics-jmx-4.1.1.jar
 metrics-json/4.1.1//metrics-json-4.1.1.jar
 metrics-jvm/4.1.1//metrics-jvm-4.1.1.jar
 minlog/1.3.0//minlog-1.3.0.jar
-netty-all/4.1.42.Final//netty-all-4.1.42.Final.jar
+netty-all/4.1.47.Final//netty-all-4.1.47.Final.jar
 objenesis/2.5.1//objenesis-2.5.1.jar
 okhttp/3.12.6//okhttp-3.12.6.jar
 okio/1.15.0//okio-1.15.0.jar
diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
index 4dddbba..e908ec8 100644
--- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
@@ -183,7 +183,7 @@ metrics-json/4.1.1//metrics-json-4.1.1.jar
 metrics-jvm/4.1.1//metrics-jvm-4.1.1.jar
 minlog/1.3.0//minlog-1.3.0.jar
 mssql-jdbc/6.2.1.jre7//mssql-jdbc-6.2.1.jre7.jar
-netty-all/4.1.42.Final//netty-all-4.1.42.Final.jar
+netty-all/4.1.47.Final//netty-all-4.1.47.Final.jar
 nimbus-jose-jwt/4.41.1//nimbus-jose-jwt-4.41.1.jar
 objenesis/2.5.1//objenesis-2.5.1.jar
 okhttp/2.7.5//okhttp-2.7.5.jar
diff --git a/pom.xml b/pom.xml
index 8a46197..262f3ac 100644
--- a/pom.xml
+++ b/pom.xml
@@ -698,7 +698,7 @@
       io.netty
       netty-all
-      4.1.42.Final
+      4.1.47.Final
     org.apache.derby

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (5be0d04 -> 8efb710)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 5be0d04  [SPARK-31117][SQL][TEST] reduce the test time of DateTimeUtilsSuite
 add 8efb710  [SPARK-31091] Revert SPARK-24640 Return `NULL` from `size(NULL)` by default

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md                                          | 2 --
 .../apache/spark/sql/catalyst/expressions/collectionOperations.scala | 4 ++--
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala       | 2 +-
 3 files changed, 3 insertions(+), 5 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31091] Revert SPARK-24640 Return `NULL` from `size(NULL)` by default
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new c1e6e14  [SPARK-31091] Revert SPARK-24640 Return `NULL` from `size(NULL)` by default

c1e6e14 is described below

commit c1e6e1439d1a79560f197e2627a334fcd0bb8a28
Author: Wenchen Fan
AuthorDate: Wed Mar 11 09:55:24 2020 -0700

    [SPARK-31091] Revert SPARK-24640 Return `NULL` from `size(NULL)` by default

    ### What changes were proposed in this pull request?

    This PR reverts https://github.com/apache/spark/pull/26051 and https://github.com/apache/spark/pull/26066

    ### Why are the changes needed?

    There is no standard requiring that `size(null)` must return null, and returning -1 looks reasonable as well. This is kind of a cosmetic change and we should avoid it if it breaks existing queries. This is similar to reverting the TRIM function parameter order change.

    ### Does this PR introduce any user-facing change?

    Yes, it changes the behavior of `size(null)` back to be the same as 2.4.

    ### How was this patch tested?

    N/A

    Closes #27834 from cloud-fan/revert.

    Authored-by: Wenchen Fan
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 8efb71013d0c9e8d81430aa48f88b91929425bff)
    Signed-off-by: Dongjoon Hyun
---
 docs/sql-migration-guide.md                                          | 2 --
 .../apache/spark/sql/catalyst/expressions/collectionOperations.scala | 4 ++--
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala       | 2 +-
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 6c73038..e7ac9f0 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -214,8 +214,6 @@ license: |
   - `now` - current query start time For example `SELECT timestamp 'tomorrow';`.

-  - Since Spark 3.0, the `size` function returns `NULL` for the `NULL` input. In Spark version 2.4 and earlier, this function gives `-1` for the same input. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.sizeOfNull` to `true`.
-
   - Since Spark 3.0, when the `array`/`map` function is called without any parameters, it returns an empty collection with `NullType` as element type. In Spark version 2.4 and earlier, it returns an empty collection with `StringType` as element type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.createEmptyCollectionUsingStringType` to `true`.

   - Since Spark 3.0, the interval literal syntax does not allow multiple from-to units anymore. For example, `SELECT INTERVAL '1-1' YEAR TO MONTH '2-2' YEAR TO MONTH'` throws parser exception.
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index cfa877b..6d95909 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -79,7 +79,7 @@ trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression
     _FUNC_(expr) - Returns the size of an array or a map. The function returns -1 if its input is null and spark.sql.legacy.sizeOfNull is set to true. If spark.sql.legacy.sizeOfNull is set to false, the function returns null for null input.
-    By default, the spark.sql.legacy.sizeOfNull parameter is set to false.
+    By default, the spark.sql.legacy.sizeOfNull parameter is set to true.
   """,
   examples = """
     Examples:
@@ -88,7 +88,7 @@ trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression
       > SELECT _FUNC_(map('a', 1, 'b', 2));
        2
       > SELECT _FUNC_(NULL);
-       NULL
+       -1
   """)
 case class Size(child: Expression, legacySizeOfNull: Boolean) extends UnaryExpression with ExpectsInputTypes {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index fdaf0ec..644fe89 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -1942,7 +1942,7 @@ object SQLConf {
     .doc("If it is set to true, size of null returns -1. This behavior was inherited from Hive. " +
       "The size function returns null for null input if the flag is disabled.")
     .booleanConf
-    .createWithDefault(false)
+    .createWithDefault(true)
[spark] branch branch-2.4 updated (f378c7f -> 8e1021d)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.

from f378c7f  [SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields
 add 8e1021d  [SPARK-31095][BUILD][2.4] Upgrade netty-all to 4.1.47.Final

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.6 | 2 +-
 dev/deps/spark-deps-hadoop-2.7 | 2 +-
 dev/deps/spark-deps-hadoop-3.1 | 2 +-
 pom.xml                        | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (2825237 -> 0f0ccda)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 2825237  [SPARK-31062][K8S][TESTS] Improve spark decommissioning k8s test reliability
 add 0f0ccda  [SPARK-31110][DOCS][SQL] refine sql doc for SELECT

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-select-clusterby.md     | 18
 docs/sql-ref-syntax-qry-select-distribute-by.md | 18
 docs/sql-ref-syntax-qry-select-groupby.md       | 48 ++---
 docs/sql-ref-syntax-qry-select-having.md        | 12 +++---
 docs/sql-ref-syntax-qry-select-limit.md         | 23 +++
 docs/sql-ref-syntax-qry-select-orderby.md       | 24 +--
 docs/sql-ref-syntax-qry-select-sortby.md        | 28 ++---
 docs/sql-ref-syntax-qry-select-where.md         | 10 ++---
 docs/sql-ref-syntax-qry-select.md               | 55 ++---
 docs/sql-ref-syntax-qry.md                      |  8 ++--
 10 files changed, 126 insertions(+), 118 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31110][DOCS][SQL] refine sql doc for SELECT
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ffcc4a2 [SPARK-31110][DOCS][SQL] refine sql doc for SELECT ffcc4a2 is described below commit ffcc4a27041abe97991f4bd14d0b5abf3c50a542 Author: Wenchen Fan AuthorDate: Wed Mar 11 16:52:40 2020 -0700 [SPARK-31110][DOCS][SQL] refine sql doc for SELECT ### What changes were proposed in this pull request? A few improvements to the sql ref SELECT doc: 1. correct the syntax of SELECT query 2. correct the default of null sort order 3. correct the GROUP BY syntax 4. several minor fixes ### Why are the changes needed? refine document ### Does this PR introduce any user-facing change? N/A ### How was this patch tested? N/A Closes #27866 from cloud-fan/doc. Authored-by: Wenchen Fan Signed-off-by: Dongjoon Hyun (cherry picked from commit 0f0ccdadb123d5839c34244e25a4ee17dde0fcdc) Signed-off-by: Dongjoon Hyun --- docs/sql-ref-syntax-qry-select-clusterby.md | 18 docs/sql-ref-syntax-qry-select-distribute-by.md | 18 docs/sql-ref-syntax-qry-select-groupby.md | 48 ++--- docs/sql-ref-syntax-qry-select-having.md| 12 +++--- docs/sql-ref-syntax-qry-select-limit.md | 23 +++ docs/sql-ref-syntax-qry-select-orderby.md | 24 +-- docs/sql-ref-syntax-qry-select-sortby.md| 28 ++--- docs/sql-ref-syntax-qry-select-where.md | 10 ++--- docs/sql-ref-syntax-qry-select.md | 55 ++--- docs/sql-ref-syntax-qry.md | 8 ++-- 10 files changed, 126 insertions(+), 118 deletions(-) diff --git a/docs/sql-ref-syntax-qry-select-clusterby.md b/docs/sql-ref-syntax-qry-select-clusterby.md index bb60e8b..8f1dc59 100644 --- a/docs/sql-ref-syntax-qry-select-clusterby.md +++ b/docs/sql-ref-syntax-qry-select-clusterby.md @@ -9,9 +9,9 @@ license: | The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in 
compliance with the License. You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -41,20 +41,20 @@ CLUSTER BY { expression [ , ... ] } ### Examples {% highlight sql %} CREATE TABLE person (name STRING, age INT); -INSERT INTO person VALUES -('Zen Hui', 25), -('Anil B', 18), -('Shone S', 16), +INSERT INTO person VALUES +('Zen Hui', 25), +('Anil B', 18), +('Shone S', 16), ('Mike A', 25), -('John A', 18), +('John A', 18), ('Jack N', 16); -- Reduce the number of shuffle partitions to 2 to illustrate the behavior of `CLUSTER BY`. -- It's easier to see the clustering and sorting behavior with less number of partitions. SET spark.sql.shuffle.partitions = 2; - + -- Select the rows with no ordering. Please note that without any sort directive, the results --- of the query is not deterministic. It's included here to show the difference in behavior +-- of the query is not deterministic. It's included here to show the difference in behavior -- of a query when `CLUSTER BY` is not used vs when it's used. The query below produces rows -- where age column is not sorted. SELECT age, name FROM person; diff --git a/docs/sql-ref-syntax-qry-select-distribute-by.md b/docs/sql-ref-syntax-qry-select-distribute-by.md index 5ade9c1..957df9c 100644 --- a/docs/sql-ref-syntax-qry-select-distribute-by.md +++ b/docs/sql-ref-syntax-qry-select-distribute-by.md @@ -9,9 +9,9 @@ license: | The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -20,7 +20,7 @@ license: | --- The DISTRIBUTE BY clause is used to repartition the data based on the input expressions. Unlike the [CLUSTER BY](sql-ref-syntax-qry-select-clusterby.html) -clause, this does not sort the data within each partition. +clause, this does not sort the data within each partition.
[spark] branch branch-3.0 updated: [SPARK-31126][SS] Upgrade Kafka to 2.4.1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new b86dc6a [SPARK-31126][SS] Upgrade Kafka to 2.4.1 b86dc6a is described below commit b86dc6ab76d945e5c15c12390436ad95c119a493 Author: Dongjoon Hyun AuthorDate: Wed Mar 11 19:26:15 2020 -0700 [SPARK-31126][SS] Upgrade Kafka to 2.4.1 ### What changes were proposed in this pull request? This PR (SPARK-31126) aims to upgrade Kafka library to bring a client-side bug fix like KAFKA-8933 ### Why are the changes needed? The following is the full release note. - https://downloads.apache.org/kafka/2.4.1/RELEASE_NOTES.html ### Does this PR introduce any user-facing change? No ### How was this patch tested? Pass the Jenkins with the existing test. Closes #27881 from dongjoon-hyun/SPARK-KAFKA-2.4.1. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 614323d326db192540c955b4fa9b3b7af7527001) Signed-off-by: Dongjoon Hyun --- pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pom.xml b/pom.xml index 262f3ac..5aa100e 100644 --- a/pom.xml +++ b/pom.xml @@ -132,7 +132,7 @@ 2.3 -2.4.0 +2.4.1 10.12.1.1 1.10.1 1.5.9 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (bd2b3f9 -> 614323d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from bd2b3f9 [SPARK-30911][CORE][DOC] Add version information to the configuration of Status add 614323d [SPARK-31126][SS] Upgrade Kafka to 2.4.1 No new revisions were added by this update. Summary of changes: pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31011][CORE] Log better message if SIGPWR is not supported while setting up decommission
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3946b24 [SPARK-31011][CORE] Log better message if SIGPWR is not supported while setting up decommission 3946b24 is described below commit 3946b243284fbd3bd98b456115ae194ad49fe8fe Author: Jungtaek Lim (HeartSaVioR) AuthorDate: Wed Mar 11 20:27:00 2020 -0700 [SPARK-31011][CORE] Log better message if SIGPWR is not supported while setting up decommission ### What changes were proposed in this pull request? This patch logs a better, decommission-specific message when registering the signal handler for SIGPWR fails. SIGPWR is non-POSIX and not all Unix-like OSes support it; macOS is an easy example. ### Why are the changes needed? Spark already logs a message when it fails to register a handler for SIGPWR, but the message is too general and does not convey the impact. End users should be notified that failing to register a handler for SIGPWR effectively "disables" the decommission feature. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually tested by running a standalone master/worker on macOS 10.14.6 with `spark.worker.decommission.enabled=true` and submitting an example application to run executors. (NOTE: the message may differ slightly, as it can be updated during the review phase.) For worker log: ``` 20/03/06 17:19:13 INFO Worker: Registering SIGPWR handler to trigger decommissioning. 20/03/06 17:19:13 INFO SignalUtils: Registering signal handler for PWR 20/03/06 17:19:13 WARN SignalUtils: Failed to register SIGPWR - disabling worker decommission. 
java.lang.IllegalArgumentException: Unknown signal: PWR at java.base/jdk.internal.misc.Signal.(Signal.java:148) at jdk.unsupported/sun.misc.Signal.(Signal.java:139) at org.apache.spark.util.SignalUtils$.$anonfun$registerSignal$1(SignalUtils.scala:95) at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86) at org.apache.spark.util.SignalUtils$.registerSignal(SignalUtils.scala:93) at org.apache.spark.util.SignalUtils$.register(SignalUtils.scala:81) at org.apache.spark.deploy.worker.Worker.(Worker.scala:73) at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:887) at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:855) at org.apache.spark.deploy.worker.Worker.main(Worker.scala) ``` For executor: ``` 20/03/06 17:21:52 INFO CoarseGrainedExecutorBackend: Registering PWR handler. 20/03/06 17:21:52 INFO SignalUtils: Registering signal handler for PWR 20/03/06 17:21:52 WARN SignalUtils: Failed to register SIGPWR - disabling decommission feature. java.lang.IllegalArgumentException: Unknown signal: PWR at java.base/jdk.internal.misc.Signal.(Signal.java:148) at jdk.unsupported/sun.misc.Signal.(Signal.java:139) at org.apache.spark.util.SignalUtils$.$anonfun$registerSignal$1(SignalUtils.scala:95) at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86) at org.apache.spark.util.SignalUtils$.registerSignal(SignalUtils.scala:93) at org.apache.spark.util.SignalUtils$.register(SignalUtils.scala:81) at org.apache.spark.executor.CoarseGrainedExecutorBackend.onStart(CoarseGrainedExecutorBackend.scala:86) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:120) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) ``` Closes #27832 from HeartSaVioR/SPARK-31011. Authored-by: Jungtaek Lim (HeartSaVioR) Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/deploy/worker/Worker.scala| 3 +- .../executor/CoarseGrainedExecutorBackend.scala| 3 +- .../scala/org/apache/spark/util/Si
[spark] branch branch-2.4 updated: [SPARK-29295][SQL][2.4] Insert overwrite to Hive external table partition should delete old data
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new c017422 [SPARK-29295][SQL][2.4] Insert overwrite to Hive external table partition should delete old data c017422 is described below commit c017422c6582121075738746cf9c7ae2257c658d Author: Liang-Chi Hsieh AuthorDate: Thu Mar 12 03:00:35 2020 -0700 [SPARK-29295][SQL][2.4] Insert overwrite to Hive external table partition should delete old data ### What changes were proposed in this pull request? This patch proposes to delete the old Hive external partition directory, even if the partition does not exist in Hive, when insert overwriting a Hive external table partition. This is a backport of #25979 to branch-2.4. ### Why are the changes needed? When insert overwriting a Hive external table partition, if the partition does not exist, Hive will not check whether the external partition directory exists before copying files. So if users drop the partition and then insert overwrite the same partition, the partition will contain both old and new data. For example: ```scala withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "false") { // test is an external Hive table. sql("INSERT OVERWRITE TABLE test PARTITION(name='n1') SELECT 1") sql("ALTER TABLE test DROP PARTITION(name='n1')") sql("INSERT OVERWRITE TABLE test PARTITION(name='n1') SELECT 2") sql("SELECT id FROM test WHERE name = 'n1' ORDER BY id") // Got both 1 and 2. } ``` ### Does this PR introduce any user-facing change? Yes. This fixes a correctness issue when users drop a partition of a Hive external table and then insert overwrite it. ### How was this patch tested? Added a test. Closes #27887 from viirya/SPARK-29295-2.4. 
Authored-by: Liang-Chi Hsieh Signed-off-by: Dongjoon Hyun --- .../sql/hive/execution/InsertIntoHiveTable.scala | 68 +++--- .../spark/sql/hive/execution/SQLQuerySuite.scala | 80 ++ 2 files changed, 139 insertions(+), 9 deletions(-) diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala index 0ed464d..1365737 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala @@ -24,7 +24,7 @@ import org.apache.hadoop.hive.ql.plan.TableDesc import org.apache.spark.SparkException import org.apache.spark.sql.{AnalysisException, Row, SparkSession} -import org.apache.spark.sql.catalyst.catalog.{CatalogTable, ExternalCatalog} +import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType, ExternalCatalog, ExternalCatalogUtils} import org.apache.spark.sql.catalyst.expressions.Attribute import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.execution.SparkPlan @@ -192,7 +192,7 @@ case class InsertIntoHiveTable( }.asInstanceOf[Attribute] } -saveAsHiveFile( +val writtenParts = saveAsHiveFile( sparkSession = sparkSession, plan = child, hadoopConf = hadoopConf, @@ -202,6 +202,42 @@ case class InsertIntoHiveTable( if (partition.nonEmpty) { if (numDynamicPartitions > 0) { +if (overwrite && table.tableType == CatalogTableType.EXTERNAL) { + // SPARK-29295: When insert overwrite to a Hive external table partition, if the + // partition does not exist, Hive will not check if the external partition directory + // exists or not before copying files. So if users drop the partition, and then do + // insert overwrite to the same partition, the partition will have both old and new + // data. We construct partition path. If the path exists, we delete it manually. 
+ writtenParts.foreach { partPath => +val dpMap = partPath.split("/").map { part => + val splitPart = part.split("=") + assert(splitPart.size == 2, s"Invalid written partition path: $part") + ExternalCatalogUtils.unescapePathName(splitPart(0)) -> +ExternalCatalogUtils.unescapePathName(splitPart(1)) +}.toMap + +val updatedPartitionSpec = partition.map { + case (key, Some(value)) => key -> value + case (key, None) if dpMap.contains(key) => key -> dpMap(key) + case (key, _) => +throw new Spa
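The core of the fix in the diff above is reconstructing a partition spec from each written dynamic-partition path so the old directory can be located and deleted. A hedged, self-contained sketch of that idea — not the actual Spark code; `PartitionPathSketch` is an illustrative name, and `URLDecoder` stands in for `ExternalCatalogUtils.unescapePathName` (an approximation: Hive's escaping is %XX-based but not identical to URL encoding):

```scala
import java.net.URLDecoder

// Hedged sketch of the idea in the patch above, not the actual Spark code:
// turn a written dynamic-partition path such as "country=US/city=San%20Jose"
// into a partition spec (column -> value) so the corresponding old directory
// can be found and removed before the new data is committed.
object PartitionPathSketch {
  def toSpec(partPath: String): Map[String, String] =
    partPath.split("/").map { part =>
      val kv = part.split("=")
      require(kv.length == 2, s"Invalid written partition path: $part")
      // URLDecoder approximates ExternalCatalogUtils.unescapePathName.
      URLDecoder.decode(kv(0), "UTF-8") -> URLDecoder.decode(kv(1), "UTF-8")
    }.toMap
}
```

In the real patch this spec is then merged with the statically specified partition columns before the old directory is deleted.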
[spark] branch master updated (77c49cb -> 972e23d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 77c49cb [SPARK-31124][SQL] change the default value of minPartitionNum in AQE add 972e23d [SPARK-31130][BUILD] Use the same version of `commons-io` in SBT No new revisions were added by this update. Summary of changes: project/SparkBuild.scala | 1 + 1 file changed, 1 insertion(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31130][BUILD] Use the same version of `commons-io` in SBT
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 74cb509 [SPARK-31130][BUILD] Use the same version of `commons-io` in SBT 74cb509 is described below commit 74cb5094ec00c359bb70a456d6490f45bdd5ccd7 Author: Dongjoon Hyun AuthorDate: Thu Mar 12 09:06:29 2020 -0700 [SPARK-31130][BUILD] Use the same version of `commons-io` in SBT ### What changes were proposed in this pull request? This PR (SPARK-31130) aims to pin the `Commons IO` version to `2.4` in the SBT build, like the Maven build. ### Why are the changes needed? [HADOOP-15261](https://issues.apache.org/jira/browse/HADOOP-15261) upgraded `commons-io` from 2.4 to 2.5 at Apache Hadoop 3.1. In `Maven`, Apache Spark always uses `Commons IO 2.4` based on `pom.xml`. ``` $ git grep commons-io.version pom.xml:2.4 pom.xml:${commons-io.version} ``` However, `SBT` chooses `2.5`. **branch-3.0** ``` $ build/sbt -Phadoop-3.2 "core/dependencyTree" | grep commons-io:commons-io | head -n1 [info] | | +-commons-io:commons-io:2.5 ``` **branch-2.4** ``` $ build/sbt -Phadoop-3.1 "core/dependencyTree" | grep commons-io:commons-io | head -n1 [info] | | +-commons-io:commons-io:2.5 ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with `[test-hadoop3.2]` (the default PR Builder is `SBT`) and manually do the following locally. ``` build/sbt -Phadoop-3.2 "core/dependencyTree" | grep commons-io:commons-io | head -n1 ``` Closes #27886 from dongjoon-hyun/SPARK-31130. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 972e23d18186c73026ebed95b37a886ca6eecf3e) Signed-off-by: Dongjoon Hyun --- project/SparkBuild.scala | 1 + 1 file changed, 1 insertion(+) diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index b606bdd..1a2a7c3 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -621,6 +621,7 @@ object KubernetesIntegrationTests { object DependencyOverrides { lazy val settings = Seq( dependencyOverrides += "com.google.guava" % "guava" % "14.0.1", +dependencyOverrides += "commons-io" % "commons-io" % "2.4", dependencyOverrides += "xerces" % "xercesImpl" % "2.12.0", dependencyOverrides += "jline" % "jline" % "2.14.6") } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-31130][BUILD] Use the same version of `commons-io` in SBT
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new e6bcaaa [SPARK-31130][BUILD] Use the same version of `commons-io` in SBT e6bcaaa is described below commit e6bcaaa8a78f6c88b5c5276e90cd049e2cffc658 Author: Dongjoon Hyun AuthorDate: Thu Mar 12 09:06:29 2020 -0700 [SPARK-31130][BUILD] Use the same version of `commons-io` in SBT This PR (SPARK-31130) aims to pin the `Commons IO` version to `2.4` in the SBT build, like the Maven build. [HADOOP-15261](https://issues.apache.org/jira/browse/HADOOP-15261) upgraded `commons-io` from 2.4 to 2.5 at Apache Hadoop 3.1. In `Maven`, Apache Spark always uses `Commons IO 2.4` based on `pom.xml`. ``` $ git grep commons-io.version pom.xml:2.4 pom.xml:${commons-io.version} ``` However, `SBT` chooses `2.5`. **branch-3.0** ``` $ build/sbt -Phadoop-3.2 "core/dependencyTree" | grep commons-io:commons-io | head -n1 [info] | | +-commons-io:commons-io:2.5 ``` **branch-2.4** ``` $ build/sbt -Phadoop-3.1 "core/dependencyTree" | grep commons-io:commons-io | head -n1 [info] | | +-commons-io:commons-io:2.5 ``` No. Pass the Jenkins with `[test-hadoop3.2]` (the default PR Builder is `SBT`) and manually do the following locally. ``` build/sbt -Phadoop-3.2 "core/dependencyTree" | grep commons-io:commons-io | head -n1 ``` Closes #27886 from dongjoon-hyun/SPARK-31130. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 972e23d18186c73026ebed95b37a886ca6eecf3e) Signed-off-by: Dongjoon Hyun --- project/SparkBuild.scala | 1 + 1 file changed, 1 insertion(+) diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 3f85ac6..7ee079c 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -552,6 +552,7 @@ object DockerIntegrationTests { object DependencyOverrides { lazy val settings = Seq( dependencyOverrides += "com.google.guava" % "guava" % "14.0.1", +dependencyOverrides += "commons-io" % "commons-io" % "2.4", dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7.3", dependencyOverrides += "jline" % "jline" % "2.14.6") } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (972e23d -> 7b4b29e8)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 972e23d [SPARK-31130][BUILD] Use the same version of `commons-io` in SBT add 7b4b29e8 [SPARK-31131][SQL] Remove the unnecessary config spark.sql.legacy.timeParser.enabled No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md | 2 +- .../main/scala/org/apache/spark/sql/internal/SQLConf.scala| 11 +-- 2 files changed, 2 insertions(+), 11 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31131][SQL] Remove the unnecessary config spark.sql.legacy.timeParser.enabled
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 4bcba6f [SPARK-31131][SQL] Remove the unnecessary config spark.sql.legacy.timeParser.enabled 4bcba6f is described below commit 4bcba6fa61e4edac3a616403af35d7e2b093fed3 Author: Kent Yao AuthorDate: Thu Mar 12 09:24:49 2020 -0700 [SPARK-31131][SQL] Remove the unnecessary config spark.sql.legacy.timeParser.enabled spark.sql.legacy.timeParser.enabled should be removed from SQLConf and the migration guide spark.sql.legacy.timeParsePolicy is the right one fix doc no Pass the jenkins Closes #27889 from yaooqinn/SPARK-31131. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun (cherry picked from commit 7b4b29e8d955b43daa9ad28667e4fadbb9fce49a) Signed-off-by: Dongjoon Hyun --- docs/sql-migration-guide.md | 2 +- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala| 8 2 files changed, 1 insertion(+), 9 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index e7ac9f0..1081079 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -67,7 +67,7 @@ license: | - Since Spark 3.0, Proleptic Gregorian calendar is used in parsing, formatting, and converting dates and timestamps as well as in extracting sub-components like years, days and etc. Spark 3.0 uses Java 8 API classes from the java.time packages that based on ISO chronology (https://docs.oracle.com/javase/8/docs/api/java/time/chrono/IsoChronology.html). In Spark version 2.4 and earlier, those operations are performed by using the hybrid calendar (Julian + Gregorian, see https://docs.orac [...] -- Parsing/formatting of timestamp/date strings. 
This effects on CSV/JSON datasources and on the `unix_timestamp`, `date_format`, `to_unix_timestamp`, `from_unixtime`, `to_date`, `to_timestamp` functions when patterns specified by users is used for parsing and formatting. Since Spark 3.0, the conversions are based on `java.time.format.DateTimeFormatter`, see https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html. New implementation performs strict checking o [...] +- Parsing/formatting of timestamp/date strings. This effects on CSV/JSON datasources and on the `unix_timestamp`, `date_format`, `to_unix_timestamp`, `from_unixtime`, `to_date`, `to_timestamp` functions when patterns specified by users is used for parsing and formatting. Since Spark 3.0, we define our own pattern strings in `sql-ref-datetime-pattern.md`, which is implemented via `java.time.format.DateTimeFormatter` under the hood. New implementation performs strict checking of its in [...] - The `weekofyear`, `weekday`, `dayofweek`, `date_trunc`, `from_utc_timestamp`, `to_utc_timestamp`, and `unix_timestamp` functions use java.time API for calculation week number of year, day number of week as well for conversion from/to TimestampType values in UTC time zone. diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 06180f6..ba25a68 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -2234,14 +2234,6 @@ object SQLConf { .checkValue(_ > 0, "The value of spark.sql.addPartitionInBatch.size must be positive") .createWithDefault(100) - val LEGACY_TIME_PARSER_ENABLED = buildConf("spark.sql.legacy.timeParser.enabled") -.internal() -.doc("When set to true, java.text.SimpleDateFormat is used for formatting and parsing " + - "dates/timestamps in a locale-sensitive manner. 
When set to false, classes from " + - "java.time.* packages are used for the same purpose.") -.booleanConf -.createWithDefault(false) - val LEGACY_ALLOW_HASH_ON_MAPTYPE = buildConf("spark.sql.legacy.allowHashOnMapType") .doc("When set to true, hash expressions can be applied on elements of MapType. Otherwise, " + "an analysis exception will be thrown.") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
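The removed `spark.sql.legacy.timeParser.enabled` flag and the surviving `spark.sql.legacy.timeParserPolicy` both hinge on the same behavioral difference: `java.text.SimpleDateFormat` is lenient by default (an out-of-range day-of-month silently rolls over into the next month), while the `java.time` formatters reject such input. A minimal Python sketch that mimics the two behaviors — the roll-over arithmetic and the strict parse are stand-ins for illustration, not Spark code:

```python
from datetime import datetime, timedelta

def parse_lenient(s: str) -> datetime:
    # Mimics SimpleDateFormat's default leniency: "2020-02-30"
    # silently rolls over into March instead of failing.
    y, m, d = map(int, s.split("-"))
    return datetime(y, m, 1) + timedelta(days=d - 1)

def parse_strict(s: str) -> datetime:
    # Mimics java.time.format.DateTimeFormatter's strict resolution:
    # an invalid calendar date raises instead of being adjusted.
    return datetime.strptime(s, "%Y-%m-%d")

print(parse_lenient("2020-02-30"))   # rolls over to 2020-03-01
try:
    parse_strict("2020-02-30")
except ValueError as e:
    print("rejected:", e)
```

This is the upgrade path the migration guide describes: queries that relied on the lenient roll-over now fail fast unless the legacy policy is enabled.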
[spark] branch master updated (7b4b29e8 -> fbc9dc7)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7b4b29e8 [SPARK-31131][SQL] Remove the unnecessary config spark.sql.legacy.timeParser.enabled add fbc9dc7 [SPARK-31129][SQL][TESTS] Fix IntervalBenchmark and DateTimeBenchmark No new revisions were added by this update. Summary of changes: .../benchmarks/DateTimeBenchmark-jdk11-results.txt | 434 ++--- sql/core/benchmarks/DateTimeBenchmark-results.txt | 434 ++--- .../benchmarks/IntervalBenchmark-jdk11-results.txt | 52 +-- sql/core/benchmarks/IntervalBenchmark-results.txt | 52 +-- .../execution/benchmark/DateTimeBenchmark.scala| 4 +- .../execution/benchmark/IntervalBenchmark.scala| 4 +- 6 files changed, 491 insertions(+), 489 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31129][SQL][TESTS] Fix IntervalBenchmark and DateTimeBenchmark
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new fd56924 [SPARK-31129][SQL][TESTS] Fix IntervalBenchmark and DateTimeBenchmark fd56924 is described below commit fd5692477ca9ba3407a350caba01a6f192d521b2 Author: Kent Yao AuthorDate: Thu Mar 12 12:59:29 2020 -0700 [SPARK-31129][SQL][TESTS] Fix IntervalBenchmark and DateTimeBenchmark ### What changes were proposed in this pull request? This PR aims to recover `IntervalBenchmark` and `DataTimeBenchmark` due to banning intervals as output. ### Why are the changes needed? This PR recovers the benchmark suite. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually, re-run the benchmark. Closes #27885 from yaooqinn/SPARK-3-2. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun (cherry picked from commit fbc9dc7e9dcde8a77673b1782f4f1141e183ff00) Signed-off-by: Dongjoon Hyun --- .../benchmarks/DateTimeBenchmark-jdk11-results.txt | 434 ++--- sql/core/benchmarks/DateTimeBenchmark-results.txt | 434 ++--- .../benchmarks/IntervalBenchmark-jdk11-results.txt | 52 +-- sql/core/benchmarks/IntervalBenchmark-results.txt | 52 +-- .../execution/benchmark/DateTimeBenchmark.scala| 4 +- .../execution/benchmark/IntervalBenchmark.scala| 4 +- 6 files changed, 491 insertions(+), 489 deletions(-) diff --git a/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt b/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt index 7d9b147..883f9de 100644 --- a/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt +++ b/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt @@ -2,428 +2,428 @@ Extract components -OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1 on Linux 4.15.0-1044-aws -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 11.0.5+10-LTS on Mac OS X 10.15.3 
+Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz cast to timestamp:Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -cast to timestamp wholestage off408445 53 24.5 40.8 1.0X -cast to timestamp wholestage on 401453 63 24.9 40.1 1.0X +cast to timestamp wholestage off221232 16 45.3 22.1 1.0X +cast to timestamp wholestage on 213256 71 46.9 21.3 1.0X -OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1 on Linux 4.15.0-1044-aws -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 11.0.5+10-LTS on Mac OS X 10.15.3 +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz year of timestamp:Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -year of timestamp wholestage off 1197 1246 69 8.4 119.7 1.0X -year of timestamp wholestage on 1123 10 9.0 111.1 1.1X +year of timestamp wholestage off863961 139 11.6 86.3 1.0X +year of timestamp wholestage on 783821 26 12.8 78.3 1.1X -OpenJDK 64-Bit Server VM 11.0.6+10-post-Ubuntu-1ubuntu118.04.1 on Linux 4.15.0-1044-aws -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 11.0.5+10-LTS on Mac OS X 10.15.3 +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz quarter of timestamp: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -quarter of timestamp wholestage off1451 1462 16 6.9 145.1 1.0X -quarter of timestamp wholestage on 1409 1424 13 7.1 140.9 1.0X +quarter of timestamp wholestage off1008 1013 7 9.9 100.8 1.0X +quarter of timestamp wholestage on 926963 36
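The benchmark result columns are all derived from one measurement. Assuming the case's row cardinality — 10 million rows here, which is an assumption about this `DateTimeBenchmark` run, not stated in the table — they relate as in this sketch:

```python
ROWS = 10_000_000  # assumed cardinality for these benchmark cases

def rate_m_per_s(best_ms: float, rows: int = ROWS) -> float:
    # Rate(M/s): millions of rows processed per second at the best time.
    return rows / (best_ms / 1000.0) / 1e6

def per_row_ns(best_ms: float, rows: int = ROWS) -> float:
    # Per Row(ns): nanoseconds spent per row at the best time.
    return best_ms * 1e6 / rows

def relative(baseline_best_ms: float, best_ms: float) -> float:
    # Relative: speedup versus the table's baseline case.
    return baseline_best_ms / best_ms

# "cast to timestamp wholestage off" best time of 408 ms from the old results:
print(round(rate_m_per_s(408), 1), round(per_row_ns(408), 1))  # 24.5 40.8
```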
[spark] branch branch-2.4 updated (e6bcaaa -> 51ccb6f)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git. from e6bcaaa [SPARK-31130][BUILD] Use the same version of `commons-io` in SBT add 51ccb6f [SPARK-31144][SQL][2.4] Wrap Error with QueryExecutionException to notify QueryExecutionListener No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/DataFrameWriter.scala | 2 +- .../main/scala/org/apache/spark/sql/Dataset.scala | 2 +- .../spark/sql/util/QueryExecutionListener.scala| 15 -- .../spark/sql/util/DataFrameCallbackSuite.scala| 34 -- 4 files changed, 46 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (2a4fed0 -> 1ddf44d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2a4fed0 [SPARK-30654][WEBUI] Bootstrap4 WebUI upgrade add 1ddf44d [SPARK-31144][SQL] Wrap Error with QueryExecutionException to notify QueryExecutionListener No new revisions were added by this update. Summary of changes: project/MimaExcludes.scala | 4 -- .../spark/sql/util/QueryExecutionListener.scala| 18 +-- .../apache/spark/sql/DataFrameWriterV2Suite.scala | 2 +- .../org/apache/spark/sql/SessionStateSuite.scala | 2 +- .../spark/sql/TestQueryExecutionListener.scala | 2 +- .../test/scala/org/apache/spark/sql/UDFSuite.scala | 2 +- .../sql/connector/DataSourceV2DataFrameSuite.scala | 2 +- .../connector/FileDataSourceV2FallBackSuite.scala | 6 +-- .../connector/SupportsCatalogOptionsSuite.scala| 2 +- .../sql/test/DataFrameReaderWriterSuite.scala | 2 +- .../spark/sql/util/DataFrameCallbackSuite.scala| 62 -- .../sql/util/ExecutionListenerManagerSuite.scala | 2 +- .../sql/hive/thriftserver/DummyListeners.scala | 2 +- 13 files changed, 71 insertions(+), 37 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31144][SQL] Wrap Error with QueryExecutionException to notify QueryExecutionListener
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 339e4dd [SPARK-31144][SQL] Wrap Error with QueryExecutionException to notify QueryExecutionListener 339e4dd is described below commit 339e4dd3a3daf6c11670e5ca7786c54f68a86bfa Author: Shixiong Zhu AuthorDate: Fri Mar 13 15:55:29 2020 -0700 [SPARK-31144][SQL] Wrap Error with QueryExecutionException to notify QueryExecutionListener ### What changes were proposed in this pull request? This PR manually reverts changes in #25292 and then wraps java.lang.Error with `QueryExecutionException` to notify `QueryExecutionListener` to send it to `QueryExecutionListener.onFailure` which only accepts `Exception`. The bug fix PR for 2.4 is #27904. It needs a separate PR because the touched codes were changed a lot. ### Why are the changes needed? Avoid API changes and fix a bug. ### Does this PR introduce any user-facing change? Yes. Reverting an API change happening in 3.0. QueryExecutionListener APIs will be the same as 2.4. ### How was this patch tested? The new added test. Closes #27907 from zsxwing/SPARK-31144. 
Authored-by: Shixiong Zhu Signed-off-by: Dongjoon Hyun (cherry picked from commit 1ddf44dfcaff53e870a3c9608e31a60805e50c29) Signed-off-by: Dongjoon Hyun --- project/MimaExcludes.scala | 4 -- .../spark/sql/util/QueryExecutionListener.scala| 18 +-- .../apache/spark/sql/DataFrameWriterV2Suite.scala | 2 +- .../org/apache/spark/sql/SessionStateSuite.scala | 2 +- .../spark/sql/TestQueryExecutionListener.scala | 2 +- .../test/scala/org/apache/spark/sql/UDFSuite.scala | 2 +- .../sql/connector/DataSourceV2DataFrameSuite.scala | 2 +- .../connector/FileDataSourceV2FallBackSuite.scala | 6 +-- .../connector/SupportsCatalogOptionsSuite.scala| 2 +- .../sql/test/DataFrameReaderWriterSuite.scala | 2 +- .../spark/sql/util/DataFrameCallbackSuite.scala| 62 -- .../sql/util/ExecutionListenerManagerSuite.scala | 2 +- .../sql/hive/thriftserver/DummyListeners.scala | 2 +- 13 files changed, 71 insertions(+), 37 deletions(-) diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala index 7f66577..f8ad60b 100644 --- a/project/MimaExcludes.scala +++ b/project/MimaExcludes.scala @@ -419,10 +419,6 @@ object MimaExcludes { ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.streaming.ProcessingTime"), ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.streaming.ProcessingTime$"), -// [SPARK-28556][SQL] QueryExecutionListener should also notify Error - ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.util.QueryExecutionListener.onFailure"), - ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.util.QueryExecutionListener.onFailure"), - // [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0 ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ml.image.ImageSchema.readImages"), diff --git a/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala b/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala index 01f8182..0b5951e 100644 --- 
a/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala @@ -23,7 +23,7 @@ import org.apache.spark.annotation.DeveloperApi import org.apache.spark.internal.Logging import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent} import org.apache.spark.sql.SparkSession -import org.apache.spark.sql.execution.QueryExecution +import org.apache.spark.sql.execution.{QueryExecution, QueryExecutionException} import org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionEnd import org.apache.spark.sql.internal.StaticSQLConf._ import org.apache.spark.util.{ListenerBus, Utils} @@ -55,12 +55,13 @@ trait QueryExecutionListener { * @param funcName the name of the action that triggered this query. * @param qe the QueryExecution object that carries detail information like logical plan, * physical plan, etc. - * @param error the error that failed this query. - * + * @param exception the exception that failed this query. If `java.lang.Error` is thrown during + * execution, it will be wrapped with an `Exception` and it can be accessed by + * `exception.getCause`. * @note This can be invoked by multiple different threads. */ @DeveloperApi - def onFailure(funcName: String, qe: QueryExecution,
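The fix keeps `onFailure(funcName, qe, exception: Exception)` and, when execution dies with a `java.lang.Error`, hands listeners a `QueryExecutionException` wrapping it, with the original reachable via `getCause`. Python has no `Error`/`Exception` split in those terms, but `BaseException` subclasses that are not `Exception` play the same role in this sketch (names and the message are illustrative, not Spark's):

```python
class QueryExecutionException(Exception):
    """Wrapper handed to listeners whose callback only accepts Exception."""

def to_listener_exception(throwable: BaseException) -> Exception:
    # Ordinary exceptions pass through unchanged; fatal non-Exception
    # throwables (the analogue of java.lang.Error) get wrapped so the
    # listener still fires and can reach the original via __cause__.
    if isinstance(throwable, Exception):
        return throwable
    wrapped = QueryExecutionException(f"Query failed with a fatal error: {throwable!r}")
    wrapped.__cause__ = throwable
    return wrapped

fatal = KeyboardInterrupt()          # BaseException, but not an Exception
delivered = to_listener_exception(fatal)
print(isinstance(delivered, Exception), delivered.__cause__ is fatal)  # True True
```

The design choice matches the PR's stated goal: the listener API signature stays identical to 2.4, and only the payload is adapted.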
[spark] branch branch-3.0 updated: [MINOR][DOCS] Fix [[...]] to `...` and ... in documentation
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 1fc9833 [MINOR][DOCS] Fix [[...]] to `...` and ... in documentation 1fc9833 is described below commit 1fc98336cfc8390139f2548a1f496d40a6a7f784 Author: HyukjinKwon AuthorDate: Fri Mar 13 16:44:23 2020 -0700 [MINOR][DOCS] Fix [[...]] to `...` and ... in documentation ### What changes were proposed in this pull request? Before: - ![Screen Shot 2020-03-13 at 1 19 12 PM](https://user-images.githubusercontent.com/6477701/76589452-7c34f300-652d-11ea-9da7-3754f8575796.png) - ![Screen Shot 2020-03-13 at 1 19 24 PM](https://user-images.githubusercontent.com/6477701/76589455-7d662000-652d-11ea-9dbe-f5fe10d1e7ad.png) - ![Screen Shot 2020-03-13 at 1 19 03 PM](https://user-images.githubusercontent.com/6477701/76589449-7b03c600-652d-11ea-8e99-dbe47f561f9c.png) After: - ![Screen Shot 2020-03-13 at 1 17 37 PM](https://user-images.githubusercontent.com/6477701/76589437-74754e80-652d-11ea-99f5-14fb4761f915.png) - ![Screen Shot 2020-03-13 at 1 17 46 PM](https://user-images.githubusercontent.com/6477701/76589442-76d7a880-652d-11ea-8c10-53e595421081.png) - ![Screen Shot 2020-03-13 at 1 18 15 PM](https://user-images.githubusercontent.com/6477701/76589443-7808d580-652d-11ea-9b1b-e5d11d638335.png) ### Why are the changes needed? To render the code block properly in the documentation ### Does this PR introduce any user-facing change? Yes, code rendering in documentation. ### How was this patch tested? Manually built the doc via `SKIP_API=1 jekyll build`. Closes #27899 from HyukjinKwon/minor-docss. 
Authored-by: HyukjinKwon Signed-off-by: Dongjoon Hyun (cherry picked from commit 9628aca68ba0821b8f3fa934ed4872cabb2a5d7d) Signed-off-by: Dongjoon Hyun --- docs/monitoring.md | 6 +++--- docs/quick-start.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/monitoring.md b/docs/monitoring.md index 4cba15b..ba3f1dc 100644 --- a/docs/monitoring.md +++ b/docs/monitoring.md @@ -595,7 +595,7 @@ A list of the available metrics, with a short description: inputMetrics.* -Metrics related to reading data from [[org.apache.spark.rdd.HadoopRDD]] +Metrics related to reading data from org.apache.spark.rdd.HadoopRDD or from persisted data. @@ -779,11 +779,11 @@ A list of the available metrics, with a short description: .DirectPoolMemory -Peak memory that the JVM is using for direct buffer pool ([[java.lang.management.BufferPoolMXBean]]) +Peak memory that the JVM is using for direct buffer pool (java.lang.management.BufferPoolMXBean) .MappedPoolMemory -Peak memory that the JVM is using for mapped buffer pool ([[java.lang.management.BufferPoolMXBean]]) +Peak memory that the JVM is using for mapped buffer pool (java.lang.management.BufferPoolMXBean) .ProcessTreeJVMVMemory diff --git a/docs/quick-start.md b/docs/quick-start.md index 86ba2c4..e7a16a3 100644 --- a/docs/quick-start.md +++ b/docs/quick-start.md @@ -264,7 +264,7 @@ Spark README. Note that you'll need to replace YOUR_SPARK_HOME with the location installed. Unlike the earlier examples with the Spark shell, which initializes its own SparkSession, we initialize a SparkSession as part of the program. -We call `SparkSession.builder` to construct a [[SparkSession]], then set the application name, and finally call `getOrCreate` to get the [[SparkSession]] instance. +We call `SparkSession.builder` to construct a `SparkSession`, then set the application name, and finally call `getOrCreate` to get the `SparkSession` instance. 
Our application depends on the Spark API, so we'll also include an sbt configuration file, `build.sbt`, which explains that Spark is a dependency. This file also adds a repository that - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
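The doc fix above is mechanical: Scaladoc-style `[[...]]` cross-references do not render in Markdown/Jekyll, so they become backticked code spans. A one-line sweep of the kind that could catch such cases — the regex is a sketch; the actual patch was edited by hand:

```python
import re

def unlink_scaladoc(markdown: str) -> str:
    # Rewrite Scaladoc cross-references like [[SparkSession]]
    # into Markdown code spans like `SparkSession`.
    return re.sub(r"\[\[([^\]]+)\]\]", r"`\1`", markdown)

print(unlink_scaladoc("construct a [[SparkSession]] instance"))
# construct a `SparkSession` instance
```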
[spark] branch master updated (1ddf44d -> 9628aca)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 1ddf44d [SPARK-31144][SQL] Wrap Error with QueryExecutionException to notify QueryExecutionListener add 9628aca [MINOR][DOCS] Fix [[...]] to `...` and ... in documentation No new revisions were added by this update. Summary of changes: docs/monitoring.md | 6 +++--- docs/quick-start.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (08bdc9c -> b0d2956)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 08bdc9c [SPARK-31068][SQL] Avoid IllegalArgumentException in broadcast exchange add b0d2956 [SPARK-31135][BUILD][TESTS] Upgrdade docker-client version to 8.14.1 No new revisions were added by this update. Summary of changes: external/docker-integration-tests/pom.xml | 1 + pom.xml | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31135][BUILD][TESTS] Upgrdade docker-client version to 8.14.1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b0d2956 [SPARK-31135][BUILD][TESTS] Upgrdade docker-client version to 8.14.1 b0d2956 is described below commit b0d2956a359f00e703d5ebe9a58fb9fec869721e Author: Gabor Somogyi AuthorDate: Sun Mar 15 23:55:04 2020 -0700 [SPARK-31135][BUILD][TESTS] Upgrdade docker-client version to 8.14.1 ### What changes were proposed in this pull request? Upgrdade `docker-client` version. ### Why are the changes needed? `docker-client` what Spark uses is super old. Snippet from the project page: ``` Spotify no longer uses recent versions of this project internally. The version of docker-client we're using is whatever helios has in its pom.xml. => 8.14.1 ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? ``` build/mvn install -DskipTests build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.DB2IntegrationSuite test` build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.MsSqlServerIntegrationSuite test` build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test` ``` Closes #27892 from gaborgsomogyi/docker-client. 
Authored-by: Gabor Somogyi Signed-off-by: Dongjoon Hyun --- external/docker-integration-tests/pom.xml | 1 + pom.xml | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/external/docker-integration-tests/pom.xml b/external/docker-integration-tests/pom.xml index c357a2f..8743d72 100644 --- a/external/docker-integration-tests/pom.xml +++ b/external/docker-integration-tests/pom.xml @@ -50,6 +50,7 @@ com.spotify docker-client test + shaded org.apache.httpcomponents diff --git a/pom.xml b/pom.xml index a335759..c90ac68 100644 --- a/pom.xml +++ b/pom.xml @@ -931,8 +931,9 @@ com.spotify docker-client -5.0.2 +8.14.1 test +shaded guava - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
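The extraction stripped the XML tags from the pom diff above, leaving bare values like `shaded` and `8.14.1`. Reconstructed from those visible values — a plausible reading, not the verbatim patch — the upgraded dependency entry in the root `pom.xml` looks like this; the `shaded` classifier selects docker-client's self-contained artifact, keeping its transitive dependencies out of the test classpath:

```xml
<dependency>
  <groupId>com.spotify</groupId>
  <artifactId>docker-client</artifactId>
  <version>8.14.1</version>
  <classifier>shaded</classifier>
  <scope>test</scope>
</dependency>
```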
[spark] branch branch-3.0 updated: [SPARK-31135][BUILD][TESTS] Upgrdade docker-client version to 8.14.1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new aad1f5a [SPARK-31135][BUILD][TESTS] Upgrdade docker-client version to 8.14.1 aad1f5a is described below commit aad1f5aa2d3e281dde2a019c1c4975533c908b66 Author: Gabor Somogyi AuthorDate: Sun Mar 15 23:55:04 2020 -0700 [SPARK-31135][BUILD][TESTS] Upgrdade docker-client version to 8.14.1 ### What changes were proposed in this pull request? Upgrdade `docker-client` version. ### Why are the changes needed? `docker-client` what Spark uses is super old. Snippet from the project page: ``` Spotify no longer uses recent versions of this project internally. The version of docker-client we're using is whatever helios has in its pom.xml. => 8.14.1 ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? ``` build/mvn install -DskipTests build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.DB2IntegrationSuite test` build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.MsSqlServerIntegrationSuite test` build/mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite test` ``` Closes #27892 from gaborgsomogyi/docker-client. 
Authored-by: Gabor Somogyi Signed-off-by: Dongjoon Hyun (cherry picked from commit b0d2956a359f00e703d5ebe9a58fb9fec869721e) Signed-off-by: Dongjoon Hyun --- external/docker-integration-tests/pom.xml | 1 + pom.xml | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/external/docker-integration-tests/pom.xml b/external/docker-integration-tests/pom.xml index aff79b8..cdf76e9 100644 --- a/external/docker-integration-tests/pom.xml +++ b/external/docker-integration-tests/pom.xml @@ -50,6 +50,7 @@ com.spotify docker-client test + shaded org.apache.httpcomponents diff --git a/pom.xml b/pom.xml index 5aa100e..978127e 100644 --- a/pom.xml +++ b/pom.xml @@ -931,8 +931,9 @@ com.spotify docker-client -5.0.2 +8.14.1 test +shaded guava - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
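The pom.xml diff for SPARK-31135 bumps `docker-client` from 5.0.2 to 8.14.1 and adds a `shaded` classifier, but the email's diff lost its XML markup in transit. As a hedged sketch only (the element names are standard Maven, but the exact placement inside Spark's parent pom is an assumption), the resulting `dependencyManagement` entry would presumably look like:

```xml
<!-- Sketch of the managed dependency after the upgrade; placement is assumed. -->
<dependency>
  <groupId>com.spotify</groupId>
  <artifactId>docker-client</artifactId>
  <version>8.14.1</version>
  <classifier>shaded</classifier>
  <scope>test</scope>
</dependency>
```

The `shaded` classifier selects the artifact with relocated transitive dependencies, which avoids classpath clashes (e.g. with Guava and httpcomponents) in the docker integration tests.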
[spark] branch master updated (21c02ee -> e736c62)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 21c02ee [SPARK-30864][SQL][DOC] add the user guide for Adaptive Query Execution
add e736c62 [SPARK-31116][SQL] Fix nested schema case-sensitivity in ParquetRowConverter

No new revisions were added by this update.

Summary of changes:
 .../datasources/parquet/ParquetRowConverter.scala | 12 +--
 .../spark/sql/FileBasedDataSourceSuite.scala      | 40 ++
 2 files changed, 50 insertions(+), 2 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-31116][SQL] Fix nested schema case-sensitivity in ParquetRowConverter
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new da1f95b  [SPARK-31116][SQL] Fix nested schema case-sensitivity in ParquetRowConverter

da1f95b is described below

commit da1f95be6b9af59a91a14e01613bdc4e8ac35374
Author: Tae-kyeom, Kim
AuthorDate: Mon Mar 16 10:31:56 2020 -0700

    [SPARK-31116][SQL] Fix nested schema case-sensitivity in ParquetRowConverter

    ### What changes were proposed in this pull request?

    This PR (SPARK-31116) adds case-sensitivity handling to ParquetRowConverter so that it materializes Parquet data properly with respect to case sensitivity.

    ### Why are the changes needed?

    Since Spark 3.0.0, the statement below throws IllegalArgumentException in case-insensitive mode because of the exact field-index lookup in ParquetRowConverter. As the Parquet requested schema and the Catalyst requested schema are already constructed during schema clipping in ParquetReadSupport, this change simply follows that behavior.

    ```scala
    val path = "/some/temp/path"

    spark
      .range(1L)
      .selectExpr("NAMED_STRUCT('lowercase', id, 'camelCase', id + 1) AS StructColumn")
      .write.parquet(path)

    val caseInsensitiveSchema = new StructType()
      .add(
        "StructColumn",
        new StructType()
          .add("LowerCase", LongType)
          .add("camelcase", LongType))

    spark.read.schema(caseInsensitiveSchema).parquet(path).show()
    ```

    ### Does this PR introduce any user-facing change?

    No. The changes are only in unreleased branches (`master` and `branch-3.0`).

    ### How was this patch tested?

    Passed new test cases that check Parquet column selection with respect to schemas and case sensitivities.

    Closes #27888 from kimtkyeom/parquet_row_converter_case_sensitivity.
Authored-by: Tae-kyeom, Kim
Signed-off-by: Dongjoon Hyun
(cherry picked from commit e736c62764137b2c3af90d2dc8a77e391891200a)
Signed-off-by: Dongjoon Hyun
---
 .../datasources/parquet/ParquetRowConverter.scala | 12 +--
 .../spark/sql/FileBasedDataSourceSuite.scala      | 40 ++
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
index 850adae..22422c0 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
@@ -33,8 +33,9 @@
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData}
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, CaseInsensitiveMap, DateTimeUtils, GenericArrayData}
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.SQLTimestamp
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String
@@ -178,8 +179,15 @@
   // Converters for each field.
   private[this] val fieldConverters: Array[Converter with HasParentContainerUpdater] = {
+    // (SPARK-31116) Use case insensitive map if spark.sql.caseSensitive is false
+    // to prevent throwing IllegalArgumentException when searching catalyst type's field index
+    val catalystFieldNameToIndex = if (SQLConf.get.caseSensitiveAnalysis) {
+      catalystType.fieldNames.zipWithIndex.toMap
+    } else {
+      CaseInsensitiveMap(catalystType.fieldNames.zipWithIndex.toMap)
+    }
     parquetType.getFields.asScala.map { parquetField =>
-      val fieldIndex = catalystType.fieldIndex(parquetField.getName)
+      val fieldIndex = catalystFieldNameToIndex(parquetField.getName)
       val catalystField = catalystType(fieldIndex)
       // Converted field value should be set to the `fieldIndex`-th cell of `currentRow`
       newConverter(parquetField, catalystField.dataType, new RowUpdater(currentRow, fieldIndex))
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
index c870958..cb410b4 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSui
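The SPARK-31116 patch swaps the exact `catalystType.fieldIndex` lookup for a `CaseInsensitiveMap` when `spark.sql.caseSensitive` is false. A minimal, self-contained sketch of that lookup strategy follows; it is an illustration only, not Spark's actual `CaseInsensitiveMap` (which lives in `org.apache.spark.sql.catalyst.util`), and `FieldIndexSketch` is a hypothetical name:

```scala
// Sketch only: emulates the SPARK-31116 field-index lookup outside Spark.
// The stand-in for CaseInsensitiveMap simply lowercases keys, which assumes
// no two field names collide when lowercased.
object FieldIndexSketch {
  /** Build a name-to-index map; case-insensitive when caseSensitive is false. */
  def fieldIndexMap(fieldNames: Seq[String], caseSensitive: Boolean): Map[String, Int] = {
    val exact = fieldNames.zipWithIndex.toMap
    if (caseSensitive) exact
    else exact.map { case (name, idx) => name.toLowerCase -> idx }
  }

  /** Resolve a user-schema spelling to the underlying field's index. */
  def fieldIndex(indexMap: Map[String, Int], name: String, caseSensitive: Boolean): Int =
    if (caseSensitive) indexMap(name) else indexMap(name.toLowerCase)

  def main(args: Array[String]): Unit = {
    // Parquet file fields, as in the commit message's reproduction.
    val parquetFields = Seq("lowercase", "camelCase")
    val ci = fieldIndexMap(parquetFields, caseSensitive = false)
    // The user-schema spelling "LowerCase" resolves to the field "lowercase".
    println(fieldIndex(ci, "LowerCase", caseSensitive = false)) // prints 0
    // A case-sensitive map would instead fail this lookup with NoSuchElementException.
  }
}
```

This mirrors why the reproduction in the commit message worked in Spark 2.4 (resolution elsewhere was already case-insensitive) but threw in 3.0: only the converter's own lookup was still exact-match.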