[spark] branch master updated (534f5d4 -> da32d1e)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 534f5d4  [SPARK-29138][PYTHON][TEST] Increase timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy
 add da32d1e  [SPARK-30700][ML] NaiveBayesModel predict optimization

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/ml/classification/NaiveBayes.scala | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3538095 -> 534f5d4)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3538095  [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29
 add 534f5d4  [SPARK-29138][PYTHON][TEST] Increase timeout of StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy

No new revisions were added by this update.

Summary of changes:
 python/pyspark/mllib/tests/test_streaming_algorithms.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated (878094f -> 3538095)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 878094f  [SPARK-30689][CORE][YARN] Add resource discovery plugin api to support YARN versions with resource scheduling
 add 3538095  [SPARK-30698][BUILD] Bumps checkstyle from 8.25 to 8.29

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated: [SPARK-30689][CORE][YARN] Add resource discovery plugin api to support YARN versions with resource scheduling
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
 new 878094f  [SPARK-30689][CORE][YARN] Add resource discovery plugin api to support YARN versions with resource scheduling

878094f is described below

commit 878094f9720d3c1866cbc01fb24c9794fe34edd9
Author: Thomas Graves
AuthorDate: Fri Jan 31 22:20:28 2020 -0600

    [SPARK-30689][CORE][YARN] Add resource discovery plugin api to support YARN versions with resource scheduling

    ### What changes were proposed in this pull request?

    This change makes custom resource (GPUs, FPGAs, etc.) discovery for the resource scheduler more flexible. Users have asked for it to work with Hadoop 2.x versions that do not support resource scheduling in YARN, and in environments that are not isolated. This change creates a plugin API so users can write their own resource discovery classes, which allows much more flexibility. The user can chain plugins for different resource types: the user-specified plugins execute in the order specified and fall back to the discovery script plugin if they do not return information for a particular resource. I had to open up a few of the classes to be public, change them to not be case classes, and mark them as developer API in order for the plugin to get the information it needs. I also relaxed the YARN side so that if YARN is not configured for resource scheduling, we just warn and go on. This helps users who have YARN 3.1 but have not yet configured resource scheduling on their cluster, or who are not running in an isolated environment.

    The user would configure this like:
    --conf spark.resources.discovery.plugin="org.apache.spark.resource.ResourceDiscoveryFPGAPlugin, org.apache.spark.resource.ResourceDiscoveryGPUPlugin"

    Note the executor side had to be wrapped with a classloader to make sure we include the user classpath for jars specified on submission.

    Note this is more flexible because the discovery script has limitations, such as being spawned in a separate process: if you are trying to allocate resources in that process, they might be released when the script returns. The plugin class also makes it easier to integrate with existing systems and solutions for assigning resources.

    ### Why are the changes needed?

    To more easily use Spark resource scheduling with older versions of Hadoop or in non-isolated environments.

    ### Does this PR introduce any user-facing change?

    Yes, a plugin API.

    ### How was this patch tested?

    Unit tests added and manual testing done on YARN and standalone modes.

    Closes #27410 from tgravescs/hadoop27spark3.

    Lead-authored-by: Thomas Graves
    Co-authored-by: Thomas Graves
    Signed-off-by: Thomas Graves
---
 .../api/resource/ResourceDiscoveryPlugin.java      |  63 +++
 .../main/scala/org/apache/spark/SparkContext.scala |   8 +-
 .../spark/deploy/StandaloneResourceUtils.scala     |   4 +-
 .../executor/CoarseGrainedExecutorBackend.scala    |  36 +++-
 .../org/apache/spark/internal/config/package.scala |  12 ++
 .../resource/ResourceDiscoveryScriptPlugin.scala   |  62 +++
 .../apache/spark/resource/ResourceProfile.scala    |   4 +-
 .../org/apache/spark/resource/ResourceUtils.scala  | 136 +--
 .../scala/org/apache/spark/SparkConfSuite.scala    |   2 +-
 .../CoarseGrainedExecutorBackendSuite.scala        |   3 +-
 .../resource/ResourceDiscoveryPluginSuite.scala    | 194 +
 .../apache/spark/resource/ResourceUtilsSuite.scala |  65 ---
 .../apache/spark/resource/TestResourceIDs.scala    |  16 +-
 docs/configuration.md                              |  12 ++
 .../apache/spark/deploy/k8s/KubernetesUtils.scala  |   8 +-
 .../k8s/features/BasicDriverFeatureStepSuite.scala |   2 +-
 .../features/BasicExecutorFeatureStepSuite.scala   |   4 +-
 .../spark/deploy/yarn/ResourceRequestHelper.scala  |  31 +++-
 .../org/apache/spark/deploy/yarn/ClientSuite.scala |   6 +-
 19 files changed, 560 insertions(+), 108 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/api/resource/ResourceDiscoveryPlugin.java b/core/src/main/java/org/apache/spark/api/resource/ResourceDiscoveryPlugin.java
new file mode 100644
index 000..ffd2f83
--- /dev/null
+++ b/core/src/main/java/org/apache/spark/api/resource/ResourceDiscoveryPlugin.java
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not
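The chained discovery described above — user-specified plugins tried in order, with the discovery script plugin as a fallback for any resource they do not report — can be sketched as follows. This is an illustrative Python mock, not Spark's actual `ResourceDiscoveryPlugin` API; the class and method names here are invented.

```python
# Hypothetical sketch of chained resource-discovery resolution: each plugin
# may answer for some resource types and return None for others, in which
# case the next plugin in the chain (ultimately a script-based fallback,
# mirroring Spark's discovery script plugin) is consulted.

class FPGAPlugin:
    def discover_resource(self, name):
        # Only knows about FPGAs; returns None for anything else.
        return {"fpga": ["addr0", "addr1"]}.get(name)

class GPUPlugin:
    def discover_resource(self, name):
        # Only knows about GPUs.
        return {"gpu": ["0", "1"]}.get(name)

class ScriptFallbackPlugin:
    def discover_resource(self, name):
        # Stand-in for the discovery-script fallback: always answers.
        return [f"{name}-from-script"]

def discover(plugins, name):
    """Return addresses from the first plugin that reports the resource."""
    for plugin in plugins:
        addrs = plugin.discover_resource(name)
        if addrs is not None:
            return addrs
    return None

chain = [FPGAPlugin(), GPUPlugin(), ScriptFallbackPlugin()]
print(discover(chain, "gpu"))   # answered by GPUPlugin: ['0', '1']
print(discover(chain, "disk"))  # falls through to the script fallback
```

The fallback ordering is the point: a plugin that cannot discover a given resource simply declines, and discovery proceeds down the chain, so users can mix custom discovery with the built-in script mechanism.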
[spark] branch master updated (d0c3e9f -> 8eecc20)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from d0c3e9f  [SPARK-30660][ML][PYSPARK] LinearRegression blockify input vectors
 add 8eecc20  [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md                        |   2 +
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |   2 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala     |   2 +-
 .../sql/catalyst/plans/logical/statements.scala    |   4 +-
 .../catalyst/analysis/ResolveSessionCatalog.scala  |   6 +-
 .../spark/sql/execution/command/tables.scala       | 285 --
 .../org/apache/spark/sql/internal/HiveSerDe.scala  |  16 +
 .../sql-tests/inputs/show-create-table.sql         |  11 +-
 .../sql-tests/results/show-create-table.sql.out    |  34 ++-
 .../apache/spark/sql/ShowCreateTableSuite.scala    |  16 +-
 .../spark/sql/hive/HiveShowCreateTableSuite.scala  | 327 -
 11 files changed, 581 insertions(+), 124 deletions(-)
[spark] branch master updated (2fd15a2 -> d0c3e9f)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 2fd15a2  [SPARK-30695][BUILD] Upgrade Apache ORC to 1.5.9
 add d0c3e9f  [SPARK-30660][ML][PYSPARK] LinearRegression blockify input vectors

No new revisions were added by this update.

Summary of changes:
 .../ml/optim/aggregator/HuberAggregator.scala      | 103 -
 .../optim/aggregator/LeastSquaresAggregator.scala  |  74 +++
 .../spark/ml/regression/LinearRegression.scala     |  49 ++
 .../ml/optim/aggregator/HuberAggregatorSuite.scala |  61 +---
 .../aggregator/LeastSquaresAggregatorSuite.scala   |  62 ++---
 python/pyspark/ml/regression.py                    |  22 +++--
 6 files changed, 289 insertions(+), 82 deletions(-)
[spark] branch master updated (82b4f75 -> 2fd15a2)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 82b4f75  [SPARK-30508][SQL] Add SparkSession.executeCommand API for external datasource
 add 2fd15a2  [SPARK-30695][BUILD] Upgrade Apache ORC to 1.5.9

No new revisions were added by this update.

Summary of changes:
 LICENSE-binary                          | 1 +
 dev/deps/spark-deps-hadoop-2.7-hive-1.2 | 7 ---
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 9 +
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 9 +
 pom.xml                                 | 6 --
 5 files changed, 19 insertions(+), 13 deletions(-)
[GitHub] [spark-website] dongjoon-hyun commented on issue #258: Add target version of blockers
dongjoon-hyun commented on issue #258: Add target version of blockers
URL: https://github.com/apache/spark-website/pull/258#issuecomment-580968165

Thank you all!

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [spark-website] srowen closed pull request #258: Add target version of blockers
srowen closed pull request #258: Add target version of blockers
URL: https://github.com/apache/spark-website/pull/258
[spark-website] branch asf-site updated: Add target version of blockers
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
 new 86f060b  Add target version of blockers

86f060b is described below

commit 86f060b345abe82ff6ae21600de916c7263e5614
Author: Dongjoon Hyun
AuthorDate: Fri Jan 31 18:33:24 2020 -0600

    Add target version of blockers

    Author: Dongjoon Hyun

    Closes #258 from dongjoon-hyun/target.
---
 contributing.md        | 2 +-
 site/contributing.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/contributing.md b/contributing.md
index 9cde6ce..3016a26 100644
--- a/contributing.md
+++ b/contributing.md
@@ -268,7 +268,7 @@ Example: `Fix typos in Foo scaladoc`
 Blockers. JIRA tends to unfortunately conflate "size" and "importance" in its Priority field values. Their meaning is roughly:
 1. Blocker: pointless to release without this change as the release would be unusable
-   to a large minority of users. Correctness and data loss issues should be considered Blockers.
+   to a large minority of users. Correctness and data loss issues should be considered Blockers for their target versions.
 1. Critical: a large minority of users are missing important functionality without this, and/or a workaround is difficult
 1. Major: a small minority of users are missing important functionality without this,
diff --git a/site/contributing.html b/site/contributing.html
index b29d381..98f7ed9 100644
--- a/site/contributing.html
+++ b/site/contributing.html
@@ -488,7 +488,7 @@ Example: Fix typos in Foo scaladoc
 Priority field values. Their meaning is roughly:
 Blocker: pointless to release without this change as the release would be unusable
-   to a large minority of users. Correctness and data loss issues should be considered Blockers.
+   to a large minority of users. Correctness and data loss issues should be considered Blockers for their target versions.
 Critical: a large minority of users are missing important functionality without this, and/or a workaround is difficult
 Major: a small minority of users are missing important functionality without this,
[spark] branch master updated (2d4b5ea -> 82b4f75)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 2d4b5ea  [SPARK-30676][CORE][TESTS] Eliminate warnings from deprecated constructors of java.lang.Integer and java.lang.Double
 add 82b4f75  [SPARK-30508][SQL] Add SparkSession.executeCommand API for external datasource

No new revisions were added by this update.

Summary of changes:
 ...upportsRead.java => ExternalCommandRunner.java} | 30 +++--
 .../scala/org/apache/spark/sql/SparkSession.scala  | 31 +-
 .../spark/sql/execution/command/commands.scala     | 30 ++---
 .../sql/sources/ExternalCommandRunnerSuite.scala   | 50 ++
 4 files changed, 120 insertions(+), 21 deletions(-)
 copy sql/catalyst/src/main/java/org/apache/spark/sql/connector/{catalog/SupportsRead.java => ExternalCommandRunner.java} (51%)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/sources/ExternalCommandRunnerSuite.scala
[spark] branch master updated (387ce89 -> 2d4b5ea)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 387ce89  [SPARK-27324][DOC][CORE] Document configurations related to executor metrics and modify a configuration
 add 2d4b5ea  [SPARK-30676][CORE][TESTS] Eliminate warnings from deprecated constructors of java.lang.Integer and java.lang.Double

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/rdd/RDD.scala                 | 6 +++---
 .../spark/sql/catalyst/expressions/MutableProjectionSuite.scala    | 2 +-
 sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala        | 4 ++--
 .../scala/org/apache/spark/sql/hive/HiveUserDefinedTypeSuite.scala | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)
[spark] branch master updated: [SPARK-27324][DOC][CORE] Document configurations related to executor metrics and modify a configuration
This is an automated email from the ASF dual-hosted git repository.

irashid pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
 new 387ce89  [SPARK-27324][DOC][CORE] Document configurations related to executor metrics and modify a configuration

387ce89 is described below

commit 387ce89a0631f1a4c6668b90ff2a7bbcf11919cd
Author: Wing Yew Poon
AuthorDate: Fri Jan 31 14:28:02 2020 -0600

    [SPARK-27324][DOC][CORE] Document configurations related to executor metrics and modify a configuration

    ### What changes were proposed in this pull request?

    Add a section to the Configuration page to document configurations for executor metrics. At the same time, rename spark.eventLog.logStageExecutorProcessTreeMetrics.enabled to spark.executor.processTreeMetrics.enabled and make it independent of spark.eventLog.logStageExecutorMetrics.enabled.

    ### Why are the changes needed?

    Executor metrics are new in Spark 3.0. They lack documentation. Memory metrics as a whole are always collected, but the ones obtained from the process tree have to be optionally enabled. Making this depend on a single configuration makes for more intuitive behavior. Given this, the configuration property is renamed to better reflect its meaning.

    ### Does this PR introduce any user-facing change?

    Yes, only in that the configurations are all new to 3.0.

    ### How was this patch tested?

    Not necessary.

    Closes #27329 from wypoon/SPARK-27324.

    Authored-by: Wing Yew Poon
    Signed-off-by: Imran Rashid
---
 .../spark/executor/ExecutorMetricsSource.scala     |  3 +-
 .../spark/executor/ProcfsMetricsGetter.scala       |  8 ++---
 .../org/apache/spark/internal/config/package.scala | 17 +++---
 .../spark/deploy/history/HistoryServerSuite.scala  |  2 +-
 docs/configuration.md                              | 37 ++
 docs/monitoring.md                                 | 20 ++--
 6 files changed, 65 insertions(+), 22 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/executor/ExecutorMetricsSource.scala b/core/src/main/scala/org/apache/spark/executor/ExecutorMetricsSource.scala
index b052e43..14645f7 100644
--- a/core/src/main/scala/org/apache/spark/executor/ExecutorMetricsSource.scala
+++ b/core/src/main/scala/org/apache/spark/executor/ExecutorMetricsSource.scala
@@ -32,8 +32,7 @@ import org.apache.spark.metrics.source.Source
  *     spark.executor.metrics.pollingInterval=.
  * (2) Procfs metrics are gathered all in one-go and only conditionally:
  *     if the /proc filesystem exists
- *     and spark.eventLog.logStageExecutorProcessTreeMetrics.enabled=true
- *     and spark.eventLog.logStageExecutorMetrics.enabled=true.
+ *     and spark.executor.processTreeMetrics.enabled=true.
  */
 private[spark] class ExecutorMetricsSource extends Source {

diff --git a/core/src/main/scala/org/apache/spark/executor/ProcfsMetricsGetter.scala b/core/src/main/scala/org/apache/spark/executor/ProcfsMetricsGetter.scala
index 0d5dcfb4..80ef757 100644
--- a/core/src/main/scala/org/apache/spark/executor/ProcfsMetricsGetter.scala
+++ b/core/src/main/scala/org/apache/spark/executor/ProcfsMetricsGetter.scala
@@ -58,11 +58,9 @@ private[spark] class ProcfsMetricsGetter(procfsDir: String = "/proc/") extends L
       logWarning("Exception checking for procfs dir", ioe)
       false
     }
-    val shouldLogStageExecutorMetrics =
-      SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS)
-    val shouldLogStageExecutorProcessTreeMetrics =
-      SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS)
-    procDirExists.get && shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics
+    val shouldPollProcessTreeMetrics =
+      SparkEnv.get.conf.get(config.EXECUTOR_PROCESS_TREE_METRICS_ENABLED)
+    procDirExists.get && shouldPollProcessTreeMetrics
   }
 }

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 40b05cf..e68368f 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -148,11 +148,8 @@ package object config {

   private[spark] val EVENT_LOG_STAGE_EXECUTOR_METRICS =
     ConfigBuilder("spark.eventLog.logStageExecutorMetrics.enabled")
-      .booleanConf
-      .createWithDefault(false)
-
-  private[spark] val EVENT_LOG_PROCESS_TREE_METRICS =
-    ConfigBuilder("spark.eventLog.logStageExecutorProcessTreeMetrics.enabled")
+      .doc("Whether to write per-stage peaks of executor metrics (for each executor) " +
+        "to the event log.")
       .booleanConf
       .createWithDefault(false)

@@ -215,8 +212,18 @@ package object config {

  private[spark] val
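The gating behavior after this change — process-tree metrics are polled only when the procfs directory exists and the single renamed flag is enabled — can be sketched outside Spark like this. Only the config key `spark.executor.processTreeMetrics.enabled` mirrors the patch; the function and the plain-dict conf are illustrative, not Spark's implementation.

```python
import os

def should_compute_process_tree_metrics(conf, procfs_dir="/proc/"):
    """Sketch of the condition in ProcfsMetricsGetter after this patch:
    the procfs directory must exist AND the single flag must be enabled."""
    proc_dir_exists = os.path.isdir(procfs_dir)
    enabled = conf.get("spark.executor.processTreeMetrics.enabled", "false") == "true"
    return proc_dir_exists and enabled

conf = {"spark.executor.processTreeMetrics.enabled": "true"}
# On a non-Linux host /proc may be absent, so this can still print False:
print(should_compute_process_tree_metrics(conf))
```

The simplification is that the old code required two flags (`spark.eventLog.logStageExecutorMetrics.enabled` and the process-tree one) to both be true; now the process-tree flag stands alone.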
[spark] branch master updated (33546d6 -> 18bc4e5)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 33546d6  Revert "[SPARK-30036][SQL] Fix: REPARTITION hint does not work with order by"
 add 18bc4e5  [SPARK-30684][WEBUI] Show the description of metrics for WholeStageCodegen in DAG viz

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/ui/static/spark-sql-viz.css      |  7 -
 .../spark/sql/execution/ui/static/spark-sql-viz.js | 30 ++
 .../spark/sql/execution/ui/SparkPlanGraph.scala    | 17 +++-
 3 files changed, 47 insertions(+), 7 deletions(-)
[spark] branch master updated (5eac2dc -> 33546d6)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 5eac2dc  [SPARK-30691][SQL][DOC] Add a few main pages to SQL Reference
 add 33546d6  Revert "[SPARK-30036][SQL] Fix: REPARTITION hint does not work with order by"

No new revisions were added by this update.

Summary of changes:
 .../execution/exchange/EnsureRequirements.scala    |  2 -
 .../org/apache/spark/sql/ConfigBehaviorSuite.scala |  8 ++--
 .../apache/spark/sql/execution/PlannerSuite.scala  | 50 --
 3 files changed, 5 insertions(+), 55 deletions(-)
[spark] branch master updated (5e0faf9 -> 5eac2dc)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 5e0faf9  [SPARK-29779][SPARK-30479][CORE][SQL][FOLLOWUP] Reflect review comments on post-hoc review
 add 5eac2dc  [SPARK-30691][SQL][DOC] Add a few main pages to SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml                 | 17 -
 docs/sql-ref-functions-builtin.md        |  2 +-
 docs/sql-ref-functions-udf.md            |  2 +-
 docs/sql-ref-functions.md                |  2 +-
 docs/sql-ref-syntax-aux-analyze.md       |  4 ++--
 docs/sql-ref-syntax-aux-cache.md         |  4 ++--
 docs/sql-ref-syntax-aux-conf-mgmt.md     | 10 --
 docs/sql-ref-syntax-aux-describe.md      | 12 ++--
 docs/sql-ref-syntax-aux-resource-mgmt.md | 12 ++--
 docs/sql-ref-syntax-aux-show.md          | 17 ++---
 docs/sql-ref-syntax-aux.md               | 16 ++--
 docs/sql-ref-syntax-ddl.md               | 25 +++--
 docs/sql-ref-syntax-dml.md               |  8 
 docs/sql-ref-syntax.md                   |  9 +++--
 docs/sql-ref.md                          |  2 +-
 15 files changed, 70 insertions(+), 72 deletions(-)
[spark] branch master updated (ff0f636 -> 5e0faf9)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from ff0f636  [SPARK-30638][CORE][FOLLOWUP] Fix a spacing issue and use UTF-8 instead of ASCII
 add 5e0faf9  [SPARK-29779][SPARK-30479][CORE][SQL][FOLLOWUP] Reflect review comments on post-hoc review

No new revisions were added by this update.

Summary of changes:
 .../deploy/history/BasicEventFilterBuilder.scala   | 52 +++---
 .../org/apache/spark/internal/config/History.scala |  3 ++
 .../execution/history/SQLEventFilterBuilder.scala  | 50 ++---
 3 files changed, 54 insertions(+), 51 deletions(-)
[spark] branch master updated (481e521 -> ff0f636)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 481e521  [SPARK-30657][SPARK-30658][SS] Fixed two bugs in streaming limits
 add ff0f636  [SPARK-30638][CORE][FOLLOWUP] Fix a spacing issue and use UTF-8 instead of ASCII

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated: [SPARK-30657][SPARK-30658][SS] Fixed two bugs in streaming limits
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
 new 481e521  [SPARK-30657][SPARK-30658][SS] Fixed two bugs in streaming limits

481e521 is described below

commit 481e5211d237173ea0fb7c0b292eb7abd2b8a3fe
Author: Tathagata Das
AuthorDate: Fri Jan 31 09:26:03 2020 -0800

    [SPARK-30657][SPARK-30658][SS] Fixed two bugs in streaming limits

    This PR solves two bugs related to streaming limits.

    **Bug 1 (SPARK-30658)**: A limit before a streaming aggregate (i.e. `df.limit(5).groupBy().count()`) in complete mode was not being planned as a stateful streaming limit. The planner rule planned a logical limit with a stateful streaming limit plan only if the query was in append mode. As a result, instead of allowing at most 5 rows across batches, the planned streaming query allowed 5 rows in every batch, thus producing incorrect results.

    **Solution**: Change the planner rule to plan the logical limit with a streaming limit plan even when the query is in complete mode, if the logical limit has no stateful operator before it.

    **Bug 2 (SPARK-30657)**: `LocalLimitExec` does not consume the iterator of the child plan. So if there is a limit after a stateful operator like streaming dedup in append mode (e.g. `df.dropDuplicates().limit(5)`), the state changes of the streaming dedup may not be committed (most stateful ops commit state changes only after the generated iterator is fully consumed).

    **Solution**: Change the planner rule to always use a new `StreamingLocalLimitExec`, which always fully consumes the iterator. This is the safest thing to do. However, it introduces a performance regression, as consuming the iterator is extra work. To minimize this performance impact, add an additional post-planner optimization rule to replace `StreamingLocalLimitExec` with `LocalLimitExec` when there is no stateful operator before the limit that could be affected by it.

    No user-facing change.

    Updated incorrect unit tests and added new ones.

    Closes #27373 from tdas/SPARK-30657.

    Authored-by: Tathagata Das
    Signed-off-by: Shixiong Zhu
---
 .../spark/sql/execution/SparkStrategies.scala      |  38 ---
 .../execution/streaming/IncrementalExecution.scala |  34 ++-
 ...GlobalLimitExec.scala => streamingLimits.scala} |  55 --
 .../apache/spark/sql/streaming/StreamSuite.scala   | 112 -
 4 files changed, 211 insertions(+), 28 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
index 00ad4e0..bd2684d 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
@@ -451,21 +451,35 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
    * Used to plan the streaming global limit operator for streams in append mode.
    * We need to check for either a direct Limit or a Limit wrapped in a ReturnAnswer operator,
    * following the example of the SpecialLimits Strategy above.
-   * Streams with limit in Append mode use the stateful StreamingGlobalLimitExec.
-   * Streams with limit in Complete mode use the stateless CollectLimitExec operator.
-   * Limit is unsupported for streams in Update mode.
    */
  case class StreamingGlobalLimitStrategy(outputMode: OutputMode) extends Strategy {
-    override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-      case ReturnAnswer(rootPlan) => rootPlan match {
-        case Limit(IntegerLiteral(limit), child)
-            if plan.isStreaming && outputMode == InternalOutputModes.Append =>
-          StreamingGlobalLimitExec(limit, LocalLimitExec(limit, planLater(child))) :: Nil
-        case _ => Nil
+
+    private def generatesStreamingAppends(plan: LogicalPlan): Boolean = {
+
+      /** Ensures that this plan does not have a streaming aggregate in it. */
+      def hasNoStreamingAgg: Boolean = {
+        plan.collectFirst { case a: Aggregate if a.isStreaming => a }.isEmpty
       }
-      case Limit(IntegerLiteral(limit), child)
-          if plan.isStreaming && outputMode == InternalOutputModes.Append =>
-        StreamingGlobalLimitExec(limit, LocalLimitExec(limit, planLater(child))) :: Nil
+
+      // The following cases of limits on a streaming plan has to be executed with a stateful
+      // streaming plan.
+      // 1. When the query is in append mode (that is, all logical plan operate on appended data).
+      // 2. When the plan does not contain any streaming aggregate (that is, plan has only
+      //    operators that operate on appended data). This must be executed with a stateful
+      //    streaming
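A toy model (plain Python, not Spark code) makes Bug 1 concrete: re-applying a local limit independently to each micro-batch lets N rows through per batch, while the intended stateful streaming limit admits N rows total across all batches.

```python
# Illustration of Bug 1 (SPARK-30658): per-batch vs. stateful global limit.

def buggy_rows_counted(batches, n):
    # Bug: the limit is re-applied independently to every micro-batch,
    # so up to n rows pass through in EACH batch.
    return sum(min(len(batch), n) for batch in batches)

def stateful_rows_counted(batches, n):
    # Intended behavior: a stateful limit remembers how many rows it has
    # already admitted, allowing at most n rows across ALL batches.
    total = 0
    for batch in batches:
        total += min(len(batch), n - total)
    return total

batches = [[1, 2, 3, 4], [5, 6, 7], [8, 9]]
print(buggy_rows_counted(batches, 5))     # 9 (4 + 3 + 2): incorrect
print(stateful_rows_counted(batches, 5))  # 5: correct
```

With `df.limit(5).groupBy().count()` in complete mode, the buggy plan would therefore report a count of 9 after three batches instead of the expected 5, which is exactly why the planner rule now emits the stateful streaming limit in complete mode as well.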
[spark] branch master updated (21bc047 -> 5ccbb38)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 21bc047  [SPARK-30511][SPARK-28403][CORE] Don't treat failed/killed speculative tasks as pending in ExecutorAllocationManager
 add 5ccbb38  [SPARK-29938][SQL][FOLLOW-UP] Improve AlterTableAddPartitionCommand

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala    | 10 
 .../command/AnalyzePartitionCommand.scala          |  2 +-
 .../spark/sql/execution/command/CommandUtils.scala | 64 --
 .../apache/spark/sql/execution/command/ddl.scala   | 15 ++---
 4 files changed, 63 insertions(+), 28 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] dongjoon-hyun opened a new pull request #258: Add target version of blockers
dongjoon-hyun opened a new pull request #258: Add target version of blockers
URL: https://github.com/apache/spark-website/pull/258
[spark] branch master updated: [SPARK-30511][SPARK-28403][CORE] Don't treat failed/killed speculative tasks as pending in ExecutorAllocationManager
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 21bc047  [SPARK-30511][SPARK-28403][CORE] Don't treat failed/killed speculative tasks as pending in ExecutorAllocationManager
21bc047 is described below

commit 21bc0474bbb16c7648aed40f25a2945d98d2a167
Author: zebi...@fb.com
AuthorDate: Fri Jan 31 08:49:34 2020 -0600

[SPARK-30511][SPARK-28403][CORE] Don't treat failed/killed speculative tasks as pending in ExecutorAllocationManager

### What changes were proposed in this pull request?

Currently, when speculative tasks fail/get killed, they are still considered as pending and count towards the calculation of number of needed executors. To be more accurate: `stageAttemptToNumSpeculativeTasks(stageAttempt)` is incremented on onSpeculativeTaskSubmitted, but never decremented. `stageAttemptToNumSpeculativeTasks -= stageAttempt` is performed on stage completion. **This means Spark is marking ended speculative tasks as pending, which leads Spark to hold more executors [...]

This PR fixes this issue by updating `stageAttemptToSpeculativeTaskIndices` and `stageAttemptToNumSpeculativeTasks` on speculative task completion. This PR also addresses some other minor issues: scheduler behavior after receiving an intentionally killed task event; try to address [SPARK-28403](https://issues.apache.org/jira/browse/SPARK-28403).

### Why are the changes needed?

This has caused resource wastage in our production with speculation enabled. With aggressive speculation, we found data-skewed jobs can hold hundreds of idle executors with less than 10 tasks running.

An easy repro of the issue (`--conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.maxExecutors=1000` in cluster mode):

```
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
  if (index < 300 && index >= 150) {
    Thread.sleep(index * 1000) // Fake running tasks
  } else if (index == 300) {
    Thread.sleep(1000 * 1000) // Fake long running tasks
  }
  it.toList.map(x => index + ", " + x).iterator
}).collect
```

You will see that when running the last task, we would be holding 38 executors (see below), which is exactly (152 + 3) / 4 = 38.

![image](https://user-images.githubusercontent.com/9404831/72469112-9a7fac00-3793-11ea-8f50-74d0ab7325a4.png)

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added a comprehensive unit test.

Test with the above repro shows that we are holding 2 executors at the end

![image](https://user-images.githubusercontent.com/9404831/72469177-bbe09800-3793-11ea-850f-4a2c67142899.png)

Closes #27223 from linzebing/speculation_fix.

Authored-by: zebi...@fb.com
Signed-off-by: Thomas Graves
---
 .../apache/spark/ExecutorAllocationManager.scala   |  61 ++
 .../spark/ExecutorAllocationManagerSuite.scala     | 135 +
 2 files changed, 172 insertions(+), 24 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
index bff854a..677386c 100644
--- a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
+++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
@@ -263,9 +263,16 @@ private[spark] class ExecutorAllocationManager(
    */
   private def maxNumExecutorsNeeded(): Int = {
     val numRunningOrPendingTasks = listener.totalPendingTasks + listener.totalRunningTasks
-    math.ceil(numRunningOrPendingTasks * executorAllocationRatio /
-      tasksPerExecutorForFullParallelism)
-      .toInt
+    val maxNeeded = math.ceil(numRunningOrPendingTasks * executorAllocationRatio /
+      tasksPerExecutorForFullParallelism).toInt
+    if (tasksPerExecutorForFullParallelism > 1 && maxNeeded == 1 &&
+      listener.pendingSpeculativeTasks > 0) {
+      // If we have pending speculative tasks and only need a single executor, allocate one more
+      // to satisfy the locality requirements of speculation
+      maxNeeded + 1
+    } else {
+      maxNeeded
+    }
   }

   private def totalRunningTasks(): Int = synchronized {
@@ -377,14 +384,8 @@ private[spark] class ExecutorAllocationManager(
       // If our target has not changed, do not send a message
       // to the cluster manager and reset our exponential growth
       if (delta == 0) {
-        // Check if there is any speculative jobs pending
-        if (listener.pendingTasks == 0 && listener.pendingSpeculativeTasks > 0) {
-          numExecutorsTarget =
-            math.max(math.min(maxNumExecutorsNeeded +
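The new target computation in the hunk above reduces to a few lines of arithmetic. The sketch below mirrors the formula from the diff as a standalone function (parameter names follow the diff, but the listener plumbing is elided and passed in as plain arguments):

```scala
// Standalone sketch of the maxNumExecutorsNeeded arithmetic from the
// diff above; listener state is passed in directly for illustration.
def maxNumExecutorsNeeded(
    pendingTasks: Int,
    runningTasks: Int,
    pendingSpeculativeTasks: Int,
    executorAllocationRatio: Double,
    tasksPerExecutorForFullParallelism: Int): Int = {
  val numRunningOrPendingTasks = pendingTasks + runningTasks
  val maxNeeded = math.ceil(numRunningOrPendingTasks * executorAllocationRatio /
    tasksPerExecutorForFullParallelism).toInt
  if (tasksPerExecutorForFullParallelism > 1 && maxNeeded == 1 &&
      pendingSpeculativeTasks > 0) {
    // One extra executor so a speculative copy need not run on the same
    // executor as the original task (locality requirement).
    maxNeeded + 1
  } else {
    maxNeeded
  }
}
```

With 12 running-or-pending tasks, a ratio of 1.0, and 4 cores per executor this yields 3 executors; with a single pending task plus a pending speculative copy it yields 2 rather than 1.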
[spark] branch master updated: [SPARK-30638][CORE] Add resources allocated to PluginContext
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3d2b8d8  [SPARK-30638][CORE] Add resources allocated to PluginContext
3d2b8d8 is described below

commit 3d2b8d8b13eff0faa02316542a343e7a64873b8a
Author: Thomas Graves
AuthorDate: Fri Jan 31 08:25:32 2020 -0600

[SPARK-30638][CORE] Add resources allocated to PluginContext

### What changes were proposed in this pull request?

Add the allocated resources as parameters to the PluginContext so that any plugins in the driver or executor can use this information to initialize devices or use it in some other useful manner.

### Why are the changes needed?

To allow users to initialize/track devices once at the executor level, before each task runs, in order to use them.

### Does this PR introduce any user-facing change?

Yes, to the people using the Executor/Driver plugin interface.

### How was this patch tested?

Unit tests and manually by writing a plugin that initialized GPUs using this interface.

Closes #27367 from tgravescs/pluginWithResources.

Lead-authored-by: Thomas Graves
Co-authored-by: Thomas Graves
Signed-off-by: Thomas Graves
---
 .../org/apache/spark/api/plugin/PluginContext.java |   5 +
 .../main/scala/org/apache/spark/SparkContext.scala |   2 +-
 .../executor/CoarseGrainedExecutorBackend.scala    |  10 +-
 .../scala/org/apache/spark/executor/Executor.scala |   7 +-
 .../spark/internal/plugin/PluginContainer.scala    |  36 +--
 .../spark/internal/plugin/PluginContextImpl.scala  |   6 +-
 .../scheduler/local/LocalSchedulerBackend.scala    |   5 +-
 .../org/apache/spark/executor/ExecutorSuite.scala  |  12 ++-
 .../internal/plugin/PluginContainerSuite.scala     | 109 +++--
 .../spark/executor/MesosExecutorBackend.scala      |   4 +-
 10 files changed, 167 insertions(+), 29 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/api/plugin/PluginContext.java b/core/src/main/java/org/apache/spark/api/plugin/PluginContext.java
index b9413cf..36d8275 100644
--- a/core/src/main/java/org/apache/spark/api/plugin/PluginContext.java
+++ b/core/src/main/java/org/apache/spark/api/plugin/PluginContext.java
@@ -18,11 +18,13 @@ package org.apache.spark.api.plugin;

 import java.io.IOException;
+import java.util.Map;

 import com.codahale.metrics.MetricRegistry;

 import org.apache.spark.SparkConf;
 import org.apache.spark.annotation.DeveloperApi;
+import org.apache.spark.resource.ResourceInformation;

 /**
  * :: DeveloperApi ::
@@ -54,6 +56,9 @@ public interface PluginContext {
   /** The host name which is being used by the Spark process for communication. */
   String hostname();

+  /** The custom resources (GPUs, FPGAs, etc) allocated to driver or executor. */
+  Map<String, ResourceInformation> resources();
+
   /**
    * Send a message to the plugin's driver-side component.
    *

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 3262631..6e0c7ac 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -542,7 +542,7 @@ class SparkContext(config: SparkConf) extends Logging {
       HeartbeatReceiver.ENDPOINT_NAME, new HeartbeatReceiver(this))

     // Initialize any plugins before the task scheduler is initialized.
-    _plugins = PluginContainer(this)
+    _plugins = PluginContainer(this, _resources.asJava)

     // Create and start the scheduler
     val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)

diff --git a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
index 511c63a..ce211ce 100644
--- a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
+++ b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
@@ -69,6 +69,8 @@ private[spark] class CoarseGrainedExecutorBackend(
   // to be changed so that we don't share the serializer instance across threads
   private[this] val ser: SerializerInstance = env.closureSerializer.newInstance()

+  private var _resources = Map.empty[String, ResourceInformation]
+
   /**
    * Map each taskId to the information about the resource allocated to it, Please refer to
    * [[ResourceInformation]] for specifics.
@@ -78,9 +80,8 @@ private[spark] class CoarseGrainedExecutorBackend(

   override def onStart(): Unit = {
     logInfo("Connecting to driver: " + driverUrl)
-    var resources = Map.empty[String, ResourceInformation]
     try {
-      resources = parseOrFindResources(resourcesFileOpt)
+      _resources = parseOrFindResources(resourcesFileOpt)
     }
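A plugin consuming the new `resources()` map might look like the following sketch. `ResourceInfo` and `Ctx` are hypothetical stand-ins for Spark's `ResourceInformation` and `PluginContext`, so the example stays self-contained:

```scala
// Hypothetical stand-in for org.apache.spark.resource.ResourceInformation:
// a resource name plus the addresses (e.g. GPU indices) allocated to this JVM.
final case class ResourceInfo(name: String, addresses: Array[String])

// Minimal stand-in for the slice of PluginContext used below.
trait Ctx {
  def resources: Map[String, ResourceInfo]
}

// A plugin could pin device initialization to exactly the GPU addresses
// it was allocated, instead of probing the whole machine.
def gpuAddresses(ctx: Ctx): Seq[String] =
  ctx.resources.get("gpu").map(_.addresses.toSeq).getOrElse(Nil)
```

In a real plugin the same lookup would run once in the executor-side `init`, before any task uses the devices, which is the use case the commit message describes.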
[spark] branch master updated (290a528 -> 6f4703e)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 290a528  [SPARK-30615][SQL] Introduce Analyzer rule for V2 AlterTable column change resolution
 add 6f4703e  [SPARK-30690][DOCS][BUILD] Add CalendarInterval into API documentation

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/spark/unsafe/types/CalendarInterval.java | 2 ++
 project/SparkBuild.scala                                     | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)
[spark] branch master updated (6fac411 -> 290a528)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 6fac411  [SPARK-29093][ML][PYSPARK][FOLLOW-UP] Remove duplicate setter
 add 290a528  [SPARK-30615][SQL] Introduce Analyzer rule for V2 AlterTable column change resolution

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 156 ++
 .../sql/catalyst/analysis/CheckAnalysis.scala      |  60 +-
 .../org/apache/spark/sql/types/StructType.scala    |  85 +---
 .../spark/sql/catalyst/analysis/AnalysisTest.scala |   9 +-
 .../CreateTablePartitioningValidationSuite.scala   |   4 +-
 .../spark/sql/connector/AlterTableTests.scala      |  71 ++-
 .../connector/V2CommandsCaseSensitivitySuite.scala | 227 +
 .../execution/command/PlanResolutionSuite.scala    |  26 ++-
 8 files changed, 583 insertions(+), 55 deletions(-)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/connector/V2CommandsCaseSensitivitySuite.scala