[spark] branch master updated (e9fd522 -> 28b8713)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from e9fd522  [SPARK-30689][CORE][FOLLOW-UP] Rename config name of discovery plugin
     add 28b8713  [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT

No new revisions were added by this update.

Summary of changes:
 R/pkg/DESCRIPTION                                       | 2 +-
 assembly/pom.xml                                        | 2 +-
 common/kvstore/pom.xml                                  | 2 +-
 common/network-common/pom.xml                           | 2 +-
 common/network-shuffle/pom.xml                          | 2 +-
 common/network-yarn/pom.xml                             | 2 +-
 common/sketch/pom.xml                                   | 2 +-
 common/tags/pom.xml                                     | 2 +-
 common/unsafe/pom.xml                                   | 2 +-
 core/pom.xml                                            | 2 +-
 docs/_config.yml                                        | 4 ++--
 examples/pom.xml                                        | 2 +-
 external/avro/pom.xml                                   | 2 +-
 external/docker-integration-tests/pom.xml               | 2 +-
 external/kafka-0-10-assembly/pom.xml                    | 2 +-
 external/kafka-0-10-sql/pom.xml                         | 2 +-
 external/kafka-0-10-token-provider/pom.xml              | 2 +-
 external/kafka-0-10/pom.xml                             | 2 +-
 external/kinesis-asl-assembly/pom.xml                   | 2 +-
 external/kinesis-asl/pom.xml                            | 2 +-
 external/spark-ganglia-lgpl/pom.xml                     | 2 +-
 graphx/pom.xml                                          | 2 +-
 hadoop-cloud/pom.xml                                    | 2 +-
 launcher/pom.xml                                        | 2 +-
 mllib-local/pom.xml                                     | 2 +-
 mllib/pom.xml                                           | 2 +-
 pom.xml                                                 | 2 +-
 project/MimaExcludes.scala                              | 5 +++++
 python/pyspark/version.py                               | 2 +-
 repl/pom.xml                                            | 2 +-
 resource-managers/kubernetes/core/pom.xml               | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml  | 2 +-
 resource-managers/mesos/pom.xml                         | 2 +-
 resource-managers/yarn/pom.xml                          | 2 +-
 sql/catalyst/pom.xml                                    | 2 +-
 sql/core/pom.xml                                        | 2 +-
 sql/hive-thriftserver/pom.xml                           | 2 +-
 sql/hive/pom.xml                                        | 2 +-
 streaming/pom.xml                                       | 2 +-
 tools/pom.xml                                           | 2 +-
 40 files changed, 45 insertions(+), 40 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-30689][CORE][FOLLOW-UP] Rename config name of discovery plugin
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 742e35f  [SPARK-30689][CORE][FOLLOW-UP] Rename config name of discovery plugin

742e35f is described below

commit 742e35f1d48c2523dda2ce21d73b7ab5ade20582
Author: yi.wu
AuthorDate: Wed Feb 26 11:55:05 2020 +0900

    [SPARK-30689][CORE][FOLLOW-UP] Rename config name of discovery plugin

    ### What changes were proposed in this pull request?

    Rename config `spark.resources.discovery.plugin` to `spark.resources.discoveryPlugin`.

    Also, as a minor side change: labeled `ResourceDiscoveryScriptPlugin` as `DeveloperApi`, since it's not for end users.

    ### Why are the changes needed?

    The discovery plugin doesn't need to reserve the "discovery" namespace here, and `discoveryPlugin` is more consistent with the interface name `ResourceDiscoveryPlugin`.

    ### Does this PR introduce any user-facing change?

    No, it's newly added in Spark 3.0.

    ### How was this patch tested?

    Pass Jenkins.

    Closes #27689 from Ngone51/spark_30689_followup.

    Authored-by: yi.wu
    Signed-off-by: HyukjinKwon
    (cherry picked from commit e9fd52282e4ed4831c5922348b0e1ee71e045b4b)
    Signed-off-by: HyukjinKwon
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala      | 2 +-
 .../scala/org/apache/spark/resource/ResourceDiscoveryScriptPlugin.scala | 2 ++
 docs/configuration.md                                                   | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 3f36e61..37ce178 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -55,7 +55,7 @@ package object config {
       .createOptional
 
   private[spark] val RESOURCES_DISCOVERY_PLUGIN =
-    ConfigBuilder("spark.resources.discovery.plugin")
+    ConfigBuilder("spark.resources.discoveryPlugin")
       .doc("Comma-separated list of class names implementing" +
         "org.apache.spark.api.resource.ResourceDiscoveryPlugin to load into the application." +
         "This is for advanced users to replace the resource discovery class with a " +
diff --git a/core/src/main/scala/org/apache/spark/resource/ResourceDiscoveryScriptPlugin.scala b/core/src/main/scala/org/apache/spark/resource/ResourceDiscoveryScriptPlugin.scala
index 2ac6d3c..7027d1e 100644
--- a/core/src/main/scala/org/apache/spark/resource/ResourceDiscoveryScriptPlugin.scala
+++ b/core/src/main/scala/org/apache/spark/resource/ResourceDiscoveryScriptPlugin.scala
@@ -21,6 +21,7 @@ import java.io.File
 import java.util.Optional
 
 import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.api.resource.ResourceDiscoveryPlugin
 import org.apache.spark.internal.Logging
 import org.apache.spark.util.Utils.executeAndGetOutput
@@ -32,6 +33,7 @@ import org.apache.spark.util.Utils.executeAndGetOutput
  * If the user specifies custom plugins, this is the last one to be executed and
  * throws if the resource isn't discovered.
  */
+@DeveloperApi
 class ResourceDiscoveryScriptPlugin extends ResourceDiscoveryPlugin with Logging {
   override def discoverResource(
       request: ResourceRequest,
diff --git a/docs/configuration.md b/docs/configuration.md
index 2421e00..469feed 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -244,7 +244,7 @@ of the most common options to set are:
 
-  spark.resources.discovery.plugin
+  spark.resources.discoveryPlugin
   org.apache.spark.resource.ResourceDiscoveryScriptPlugin
   Comma-separated list of class names implementing
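For illustration, a minimal sketch of opting into the renamed config from an application; `com.example.GpuDiscoveryPlugin` is a hypothetical class name used only for this example:

```scala
import org.apache.spark.SparkConf

// Hypothetical plugin: any implementation of
// org.apache.spark.api.resource.ResourceDiscoveryPlugin can be listed here.
val conf = new SparkConf()
  .set("spark.resources.discoveryPlugin", "com.example.GpuDiscoveryPlugin")
// Per the scaladoc in the diff above, ResourceDiscoveryScriptPlugin is always
// executed last and throws if the resource still isn't discovered.
```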
[spark] branch branch-3.0 updated: [SPARK-30662][ML][PYSPARK] Put back the API changes for HasBlockSize in ALS/MLP
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 84345c7  [SPARK-30662][ML][PYSPARK] Put back the API changes for HasBlockSize in ALS/MLP

84345c7 is described below

commit 84345c7e67c9dfd47ec76d5a3d2ad76b62f959b6
Author: Huaxin Gao
AuthorDate: Sun Feb 9 13:14:30 2020 +0800

    [SPARK-30662][ML][PYSPARK] Put back the API changes for HasBlockSize in ALS/MLP

    ### What changes were proposed in this pull request?

    Add `HasBlockSize` to the shared params in both Scala and Python. Make ALS/MLP extend `HasBlockSize`.

    ### Why are the changes needed?

    Add `HasBlockSize` in ALS, so the user can specify the blockSize. Make `HasBlockSize` a shared param so both ALS and MLP can use it.

    ### Does this PR introduce any user-facing change?

    Yes
    `ALS.setBlockSize/getBlockSize`
    `ALSModel.setBlockSize/getBlockSize`

    ### How was this patch tested?

    Manually tested. Also added doctests.

    Closes #27501 from huaxingao/spark_30662.

    Authored-by: Huaxin Gao
    Signed-off-by: zhengruifeng
---
 .../MultilayerPerceptronClassifier.scala           | 22 +--
 .../ml/param/shared/SharedParamsCodeGen.scala      |  6 ++-
 .../spark/ml/param/shared/sharedParams.scala       | 17 ++++
 .../org/apache/spark/ml/recommendation/ALS.scala   | 46 --
 python/pyspark/ml/classification.py                | 22 ---
 python/pyspark/ml/param/_shared_params_code_gen.py |  5 ++-
 python/pyspark/ml/param/shared.py                  | 17 ++++
 python/pyspark/ml/recommendation.py                | 29 +++---
 8 files changed, 109 insertions(+), 55 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala b/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
index c7a8237..6e8f92b 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
@@ -34,7 +34,7 @@ import org.apache.spark.util.VersionUtils.majorMinorVersion
 /** Params for Multilayer Perceptron. */
 private[classification] trait MultilayerPerceptronParams extends ProbabilisticClassifierParams
-  with HasSeed with HasMaxIter with HasTol with HasStepSize with HasSolver {
+  with HasSeed with HasMaxIter with HasTol with HasStepSize with HasSolver with HasBlockSize {
 
   import MultilayerPerceptronClassifier._
 
@@ -55,26 +55,6 @@ private[classification] trait MultilayerPerceptronParams extends ProbabilisticCl
   final def getLayers: Array[Int] = $(layers)
 
   /**
-   * Block size for stacking input data in matrices to speed up the computation.
-   * Data is stacked within partitions. If block size is more than remaining data in
-   * a partition then it is adjusted to the size of this data.
-   * Recommended size is between 10 and 1000.
-   * Default: 128
-   *
-   * @group expertParam
-   */
-  @Since("1.5.0")
-  final val blockSize: IntParam = new IntParam(this, "blockSize",
-    "Block size for stacking input data in matrices. Data is stacked within partitions." +
-      " If block size is more than remaining data in a partition then " +
-      "it is adjusted to the size of this data. Recommended size is between 10 and 1000",
-    ParamValidators.gt(0))
-
-  /** @group expertGetParam */
-  @Since("1.5.0")
-  final def getBlockSize: Int = $(blockSize)
-
-  /**
    * The solver algorithm for optimization.
    * Supported options: "gd" (minibatch gradient descent) or "l-bfgs".
    * Default: "l-bfgs"
diff --git a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
index 7ac680e..6194dfa 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
@@ -104,7 +104,11 @@ private[shared] object SharedParamsCodeGen {
         isValid = "ParamValidators.inArray(Array(\"euclidean\", \"cosine\"))"),
       ParamDesc[String]("validationIndicatorCol", "name of the column that indicates whether " +
         "each row is for training or for validation. False indicates training; true indicates " +
-        "validation.")
+        "validation."),
+      ParamDesc[Int]("blockSize", "block size for stacking input data in matrices. Data is " +
+        "stacked within partitions. If block size is more than remaining data in a partition " +
+        "then it is adjusted to the size of this data.",
+        isValid = "ParamValidators.gt(0)", isExpertParam = true)
     )

    val
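Since the patch restores `setBlockSize`/`getBlockSize` on ALS (as the PR description above notes), here is a brief sketch of the resulting API; the column names are illustrative:

```scala
import org.apache.spark.ml.recommendation.ALS

val als = new ALS()
  .setUserCol("user")
  .setItemCol("item")
  .setRatingCol("rating")
  .setBlockSize(2048)  // stack input data in blocks of up to 2048 rows per partition

println(als.getBlockSize)  // 2048
```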
[spark] branch master updated (9ea6c0a -> e9fd522)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9ea6c0a  [SPARK-30943][SS] Show "batch ID" in tool tip string for Structured Streaming UI graphs
     add e9fd522  [SPARK-30689][CORE][FOLLOW-UP] Rename config name of discovery plugin

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala      | 2 +-
 .../scala/org/apache/spark/resource/ResourceDiscoveryScriptPlugin.scala | 2 ++
 docs/configuration.md                                                   | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-30943][SS] Show "batch ID" in tool tip string for Structured Streaming UI graphs
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 5343059  [SPARK-30943][SS] Show "batch ID" in tool tip string for Structured Streaming UI graphs

5343059 is described below

commit 53430594587ad0134eff5cd2b5e06a7a3eec1b99
Author: Jungtaek Lim (HeartSaVioR)
AuthorDate: Tue Feb 25 15:29:36 2020 -0800

    [SPARK-30943][SS] Show "batch ID" in tool tip string for Structured Streaming UI graphs

    ### What changes were proposed in this pull request?

    This patch changes the tool tip string in Structured Streaming UI graphs to show the batch ID (and the timestamp as well) instead of only the timestamp, which was a key for DStream but is no longer a key for Structured Streaming.

    This patch also does some refactoring, as there were some spots of confusion between the js files for streaming and structured streaming. Note that this patch doesn't actually change the x axis, as once we change it we should decouple the graph logic between streaming and structured streaming. It won't change UX meaningfully, as on the x axis we only show min and max, for which we still want to know the "time" as well as the batch ID.

    ### Why are the changes needed?

    In Structured Streaming, everything is aligned to the "batch ID", while the UI only shows the timestamp - end users have to manually find and correlate the batch ID and the timestamp, which is clearly a huge pain.

    ### Does this PR introduce any user-facing change?

    No

    ### How was this patch tested?

    Manually tested. Screenshots:

    ![Screen Shot 2020-02-25 at 7 22 38 AM](https://user-images.githubusercontent.com/1317309/75197701-40b2ce80-57a2-11ea-9578-c2eb2d1091de.png)
    ![Screen Shot 2020-02-25 at 7 22 44 AM](https://user-images.githubusercontent.com/1317309/75197704-427c9200-57a2-11ea-9439-e0a8303d0860.png)
    ![Screen Shot 2020-02-25 at 7 22 58 AM](https://user-images.githubusercontent.com/1317309/75197706-43152880-57a2-11ea-9617-1276c3ba181e.png)
    ![Screen Shot 2020-02-25 at 7 23 04 AM](https://user-images.githubusercontent.com/1317309/75197708-43152880-57a2-11ea-9de2-7d37eaf88102.png)
    ![Screen Shot 2020-02-25 at 7 23 31 AM](https://user-images.githubusercontent.com/1317309/75197710-43adbf00-57a2-11ea-9ae4-4e292de39c36.png)

    Closes #27687 from HeartSaVioR/SPARK-30943.

    Authored-by: Jungtaek Lim (HeartSaVioR)
    Signed-off-by: Shixiong Zhu
    (cherry picked from commit 9ea6c0a8975a1277abba799b51aca4e2818c23e7)
    Signed-off-by: Shixiong Zhu
---
 .../org/apache/spark/ui/static/streaming-page.js   |  2 +-
 .../spark/ui/static/structured-streaming-page.js   |  4 +--
 .../ui/StreamingQueryStatisticsPage.scala          | 36 ++
 .../apache/spark/streaming/ui/StreamingPage.scala  | 13 +++-
 4 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js b/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
index 5b75bc3..ed3e65c3 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
@@ -171,7 +171,7 @@ function drawTimeline(id, data, minX, maxX, minY, maxY, unitY, batchInterval) {
         .attr("cy", function(d) { return y(d.y); })
         .attr("r", function(d) { return isFailedBatch(d.x) ? "2" : "3";})
         .on('mouseover', function(d) {
-          var tip = formatYValue(d.y) + " " + unitY + " at " + timeFormat[d.x];
+          var tip = formatYValue(d.y) + " " + unitY + " at " + timeTipStrings[d.x];
           showBootstrapTooltip(d3.select(this).node(), tip);
           // show the point
           d3.select(this)
diff --git a/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js b/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
index 70250fd..c92226b 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
@@ -106,12 +106,12 @@ function drawAreaStack(id, labels, values, minX, maxX, minY, maxY) {
         .on('mouseover', function(d) {
           var tip = '';
           var idx = 0;
-          var _values = timeToValues[d._x]
+          var _values = formattedTimeToValues[d._x];
           _values.forEach(function (k) {
             tip += labels[idx] + ': ' + k + ' ';
             idx += 1;
           });
-          tip += " at " + d._x
+          tip += " at " + formattedTimeTipStrings[d._x];
           showBootstrapTooltip(d3.select(this).node(), tip);
         })
[spark] branch master updated: [SPARK-30943][SS] Show "batch ID" in tool tip string for Structured Streaming UI graphs
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9ea6c0a  [SPARK-30943][SS] Show "batch ID" in tool tip string for Structured Streaming UI graphs

9ea6c0a is described below

commit 9ea6c0a8975a1277abba799b51aca4e2818c23e7
Author: Jungtaek Lim (HeartSaVioR)
AuthorDate: Tue Feb 25 15:29:36 2020 -0800

    [SPARK-30943][SS] Show "batch ID" in tool tip string for Structured Streaming UI graphs

    ### What changes were proposed in this pull request?

    This patch changes the tool tip string in Structured Streaming UI graphs to show the batch ID (and the timestamp as well) instead of only the timestamp, which was a key for DStream but is no longer a key for Structured Streaming.

    This patch also does some refactoring, as there were some spots of confusion between the js files for streaming and structured streaming. Note that this patch doesn't actually change the x axis, as once we change it we should decouple the graph logic between streaming and structured streaming. It won't change UX meaningfully, as on the x axis we only show min and max, for which we still want to know the "time" as well as the batch ID.

    ### Why are the changes needed?

    In Structured Streaming, everything is aligned to the "batch ID", while the UI only shows the timestamp - end users have to manually find and correlate the batch ID and the timestamp, which is clearly a huge pain.

    ### Does this PR introduce any user-facing change?

    No

    ### How was this patch tested?

    Manually tested. Screenshots:

    ![Screen Shot 2020-02-25 at 7 22 38 AM](https://user-images.githubusercontent.com/1317309/75197701-40b2ce80-57a2-11ea-9578-c2eb2d1091de.png)
    ![Screen Shot 2020-02-25 at 7 22 44 AM](https://user-images.githubusercontent.com/1317309/75197704-427c9200-57a2-11ea-9439-e0a8303d0860.png)
    ![Screen Shot 2020-02-25 at 7 22 58 AM](https://user-images.githubusercontent.com/1317309/75197706-43152880-57a2-11ea-9617-1276c3ba181e.png)
    ![Screen Shot 2020-02-25 at 7 23 04 AM](https://user-images.githubusercontent.com/1317309/75197708-43152880-57a2-11ea-9de2-7d37eaf88102.png)
    ![Screen Shot 2020-02-25 at 7 23 31 AM](https://user-images.githubusercontent.com/1317309/75197710-43adbf00-57a2-11ea-9ae4-4e292de39c36.png)

    Closes #27687 from HeartSaVioR/SPARK-30943.

    Authored-by: Jungtaek Lim (HeartSaVioR)
    Signed-off-by: Shixiong Zhu
---
 .../org/apache/spark/ui/static/streaming-page.js   |  2 +-
 .../spark/ui/static/structured-streaming-page.js   |  4 +--
 .../ui/StreamingQueryStatisticsPage.scala          | 36 ++
 .../apache/spark/streaming/ui/StreamingPage.scala  | 13 +++-
 4 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js b/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
index 5b75bc3..ed3e65c3 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
@@ -171,7 +171,7 @@ function drawTimeline(id, data, minX, maxX, minY, maxY, unitY, batchInterval) {
         .attr("cy", function(d) { return y(d.y); })
         .attr("r", function(d) { return isFailedBatch(d.x) ? "2" : "3";})
         .on('mouseover', function(d) {
-          var tip = formatYValue(d.y) + " " + unitY + " at " + timeFormat[d.x];
+          var tip = formatYValue(d.y) + " " + unitY + " at " + timeTipStrings[d.x];
           showBootstrapTooltip(d3.select(this).node(), tip);
           // show the point
           d3.select(this)
diff --git a/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js b/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
index 70250fd..c92226b 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
@@ -106,12 +106,12 @@ function drawAreaStack(id, labels, values, minX, maxX, minY, maxY) {
         .on('mouseover', function(d) {
           var tip = '';
           var idx = 0;
-          var _values = timeToValues[d._x]
+          var _values = formattedTimeToValues[d._x];
           _values.forEach(function (k) {
             tip += labels[idx] + ': ' + k + ' ';
             idx += 1;
           });
-          tip += " at " + d._x
+          tip += " at " + formattedTimeTipStrings[d._x];
           showBootstrapTooltip(d3.select(this).node(), tip);
         })
         .on('mouseout', function() {
diff --git
[spark] branch branch-3.0 updated: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 16c7668  [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md

16c7668 is described below

commit 16c76688640b662038737d9de66e541e8051b345
Author: Jungtaek Lim (HeartSaVioR)
AuthorDate: Tue Feb 25 15:17:16 2020 -0800

    [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md

    ### What changes were proposed in this pull request?

    This is a FOLLOW-UP PR for a review comment on #27208: https://github.com/apache/spark/pull/27208#pullrequestreview-347451714

    This PR documents the new feature `Eventlog Compaction` in a new section of `monitoring.md`, as it only has one configuration on the SHS side and it's hard to explain everything in the description of that single configuration.

    ### Why are the changes needed?

    Event log compaction lacks documentation for what it is and how it helps. This PR explains it.

    ### Does this PR introduce any user-facing change?

    No.

    ### How was this patch tested?

    Built docs via jekyll.

    Change on the new section:
    https://user-images.githubusercontent.com/1317309/74599587-eb9efa80-50c7-11ea-942c-f7744268e40b.png

    Change on the table:
    https://user-images.githubusercontent.com/1317309/73431190-2e9c6680-4383-11ea-8ce0-815f10917ddd.png

    Closes #27398 from HeartSaVioR/SPARK-30481-FOLLOWUP-document-new-feature.

    Authored-by: Jungtaek Lim (HeartSaVioR)
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 02f8165343fb5c6fc4e5a1874252abdfe886b5b2)
    Signed-off-by: Dongjoon Hyun
---
 docs/monitoring.md | 57 +-
 1 file changed, 44 insertions(+), 13 deletions(-)

diff --git a/docs/monitoring.md b/docs/monitoring.md
index c30aa99..4cba15b 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -95,6 +95,48 @@ The history server can be configured as follows:
 
+### Applying compaction on rolling event log files
+
+A long-running application (e.g. streaming) can bring a huge single event log file which may cost a lot to maintain and
+also requires a bunch of resource to replay per each update in Spark History Server.
+
+Enabling spark.eventLog.rolling.enabled and spark.eventLog.rolling.maxFileSize would
+let you have rolling event log files instead of single huge event log file which may help some scenarios on its own,
+but it still doesn't help you reducing the overall size of logs.
+
+Spark History Server can apply compaction on the rolling event log files to reduce the overall size of
+logs, via setting the configuration spark.history.fs.eventLog.rolling.maxFilesToRetain on the
+Spark History Server.
+
+Details will be described below, but please note in prior that compaction is LOSSY operation.
+Compaction will discard some events which will be no longer seen on UI - you may want to check which events will be discarded
+before enabling the option.
+
+When the compaction happens, the History Server lists all the available event log files for the application, and considers
+the event log files having less index than the file with smallest index which will be retained as target of compaction.
+For example, if the application A has 5 event log files and spark.history.fs.eventLog.rolling.maxFilesToRetain is set to 2, then first 3 log files will be selected to be compacted.
+
+Once it selects the target, it analyzes them to figure out which events can be excluded, and rewrites them
+into one compact file with discarding events which are decided to exclude.
+
+The compaction tries to exclude the events which point to the outdated data. As of now, below describes the candidates of events to be excluded:
+
+* Events for the job which is finished, and related stage/tasks events
+* Events for the executor which is terminated
+* Events for the SQL execution which is finished, and related job/stage/tasks events
+
+Once rewriting is done, original log files will be deleted, via best-effort manner. The History Server may not be able to delete
+the original log files, but it will not affect the operation of the History Server.
+
+Please note that Spark History Server may not compact the old event log files if figures out not a lot of space
+would be reduced during compaction. For streaming query we normally expect compaction
+will run as each micro-batch will trigger one or more jobs which will be finished shortly, but compaction won't run
+in many cases for batch query.
+
+Please also note that this is a new feature introduced in Spark 3.0, and may not be completely stable. Under some circumstances,
+the compaction may exclude more
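For illustration, a hedged sketch of wiring the three settings named in the documentation above together; the values are arbitrary examples. The first two belong in the application's configuration, the last in the History Server's spark-defaults.conf:

```scala
import org.apache.spark.SparkConf

// Application side: emit rolling event logs instead of one huge file.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.rolling.enabled", "true")
  .set("spark.eventLog.rolling.maxFileSize", "128m")

// History Server side (spark-defaults.conf), opting into lossy compaction:
//   spark.history.fs.eventLog.rolling.maxFilesToRetain   2
```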
[spark] branch master updated (8f247e5 -> 02f8165)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 8f247e5  [SPARK-30918][SQL] improve the splitting of skewed partitions
     add 02f8165  [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md

No new revisions were added by this update.

Summary of changes:
 docs/monitoring.md | 57 +-
 1 file changed, 44 insertions(+), 13 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-30918][SQL] improve the splitting of skewed partitions
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new b968cd3  [SPARK-30918][SQL] improve the splitting of skewed partitions

b968cd3 is described below

commit b968cd37796a5730fe5c2318d23a38416f550957
Author: Wenchen Fan
AuthorDate: Tue Feb 25 14:10:29 2020 -0800

    [SPARK-30918][SQL] improve the splitting of skewed partitions

    ### What changes were proposed in this pull request?

    Use the average size of the non-skewed partitions as the target size when splitting skewed partitions, instead of ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD.

    ### Why are the changes needed?

    The goal of skew join optimization is to make the data distribution more even. So it makes more sense to use the average size of the non-skewed partitions as the target size.

    ### Does this PR introduce any user-facing change?

    no

    ### How was this patch tested?

    existing tests

    Closes #27669 from cloud-fan/aqe.

    Authored-by: Wenchen Fan
    Signed-off-by: Xiao Li
    (cherry picked from commit 8f247e5d3682ad765bdbb9ea5a4315862c5a383c)
    Signed-off-by: Xiao Li
---
 .../org/apache/spark/sql/internal/SQLConf.scala    | 10 +---
 .../execution/adaptive/OptimizeSkewedJoin.scala    | 62 ++
 .../adaptive/AdaptiveQueryExecSuite.scala          |  4 +-
 3 files changed, 54 insertions(+), 22 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 674c6df..e6f7cfd 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -432,19 +432,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
-  val ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD =
-    buildConf("spark.sql.adaptive.skewedJoinOptimization.skewedPartitionSizeThreshold")
-      .doc("Configures the minimum size in bytes for a partition that is considered as a skewed " +
-        "partition in adaptive skewed join.")
-      .bytesConf(ByteUnit.BYTE)
-      .createWithDefaultString("64MB")
-
   val ADAPTIVE_EXECUTION_SKEWED_PARTITION_FACTOR =
     buildConf("spark.sql.adaptive.skewedJoinOptimization.skewedPartitionFactor")
       .doc("A partition is considered as a skewed partition if its size is larger than" +
         " this factor multiple the median partition size and also larger than " +
-        s" ${ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD.key}")
+        s" ${SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE.key}")
       .intConf
+      .checkValue(_ > 0, "The skew factor must be positive.")
       .createWithDefault(10)
 
   val NON_EMPTY_PARTITION_RATIO_FOR_BROADCAST_JOIN =
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
index 578d2d7..d3cb864 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
@@ -34,6 +34,30 @@ import org.apache.spark.sql.execution.exchange.{EnsureRequirements, ShuffleExcha
 import org.apache.spark.sql.execution.joins.SortMergeJoinExec
 import org.apache.spark.sql.internal.SQLConf
 
+/**
+ * A rule to optimize skewed joins to avoid straggler tasks whose share of data are significantly
+ * larger than those of the rest of the tasks.
+ *
+ * The general idea is to divide each skew partition into smaller partitions and replicate its
+ * matching partition on the other side of the join so that they can run in parallel tasks.
+ * Note that when matching partitions from the left side and the right side both have skew,
+ * it will become a cartesian product of splits from left and right joining together.
+ *
+ * For example, assume the Sort-Merge join has 4 partitions:
+ * left:  [L1, L2, L3, L4]
+ * right: [R1, R2, R3, R4]
+ *
+ * Let's say L2, L4 and R3, R4 are skewed, and each of them get split into 2 sub-partitions. This
+ * is scheduled to run 4 tasks at the beginning: (L1, R1), (L2, R2), (L3, R3), (L4, R4).
+ * This rule expands it to 9 tasks to increase parallelism:
+ * (L1, R1),
+ * (L2-1, R2), (L2-2, R2),
+ * (L3, R3-1), (L3, R3-2),
+ * (L4-1, R4-1), (L4-2, R4-1), (L4-1, R4-2), (L4-2, R4-2)
+ *
+ * Note that, when this rule is enabled, it also coalesces non-skewed partitions like
+ * `ReduceNumShufflePartitions` does.
+ */
 case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[SparkPlan] {
 
   private val ensureRequirements = EnsureRequirements(conf)
 
@@ -43,12 +67,12 @@ case class
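A minimal sketch of the splitting rule described in the comment above, assuming partition sizes are known in bytes; the function name and shape are illustrative, not Spark's internal API:

```scala
// Returns, for each shuffle partition, how many splits it should be divided
// into: a skewed partition is split so that each piece is roughly the average
// size of the non-skewed partitions; non-skewed partitions stay whole.
def planSplits(sizes: Seq[Long], isSkewed: Long => Boolean): Seq[Int] = {
  val nonSkewed = sizes.filterNot(isSkewed)
  val targetSize = math.max(1L, nonSkewed.sum / math.max(1, nonSkewed.size))
  sizes.map { s =>
    if (isSkewed(s)) math.ceil(s.toDouble / targetSize).toInt else 1
  }
}

// e.g. planSplits(Seq(10L, 300L, 20L), _ > 100) == Seq(1, 20, 1), target size 15
```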
[spark] branch master updated (e086a78 -> 8f247e5)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from e086a78  [MINOR][ML] ML cleanup
     add 8f247e5  [SPARK-30918][SQL] improve the splitting of skewed partitions

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala    | 10 +---
 .../execution/adaptive/OptimizeSkewedJoin.scala    | 62 ++
 .../adaptive/AdaptiveQueryExecSuite.scala          |  4 +-
 3 files changed, 54 insertions(+), 22 deletions(-)
[spark] branch master updated (c46c067 -> e086a78)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c46c067  [SPARK-30942] Fix the warning for requiring cores to be limiting resources
     add e086a78  [MINOR][ML] ML cleanup

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/ml/linalg/BLAS.scala       |  4 +---
 mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala        |  4 ++--
 .../scala/org/apache/spark/ml/attribute/AttributeGroup.scala   | 10 --
 .../org/apache/spark/ml/classification/FMClassifier.scala      |  1 -
 .../ml/classification/MultilayerPerceptronClassifier.scala     |  2 --
 .../spark/ml/classification/RandomForestClassifier.scala       |  2 +-
 .../spark/ml/evaluation/BinaryClassificationEvaluator.scala    |  6 --
 .../org/apache/spark/ml/evaluation/ClusteringEvaluator.scala   |  2 +-
 .../scala/org/apache/spark/ml/feature/MinMaxScaler.scala       |  2 --
 .../spark/ml/r/GeneralizedLinearRegressionWrapper.scala        |  1 -
 .../main/scala/org/apache/spark/ml/recommendation/ALS.scala    |  8
 .../scala/org/apache/spark/ml/regression/FMRegressor.scala     |  1 -
 .../org/apache/spark/ml/source/image/ImageFileFormat.scala     |  5 ++---
 .../scala/org/apache/spark/ml/tree/impl/RandomForest.scala     |  2 +-
 .../org/apache/spark/mllib/clustering/StreamingKMeans.scala    |  7 +++
 .../scala/org/apache/spark/mllib/feature/ChiSqSelector.scala   |  5 +++--
 .../org/apache/spark/mllib/feature/ElementwiseProduct.scala    |  5 +++--
 .../src/main/scala/org/apache/spark/mllib/feature/IDF.scala    | 12 +++-
 .../src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala    |  4 +---
 .../apache/spark/mllib/optimization/GradientDescent.scala      |  2 +-
 .../spark/mllib/stat/correlation/SpearmanCorrelation.scala     |  4 +---
 .../scala/org/apache/spark/mllib/tree/impurity/Entropy.scala   |  4 +++-
 .../apache/spark/mllib/tree/model/treeEnsembleModels.scala     |  2 +-
 .../org/apache/spark/mllib/util/LinearDataGenerator.scala      |  2 --
 24 files changed, 43 insertions(+), 54 deletions(-)
[spark] branch master updated: [SPARK-30942] Fix the warning for requiring cores to be limiting resources
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new c46c067  [SPARK-30942] Fix the warning for requiring cores to be limiting resources

c46c067 is described below

commit c46c067f39213df9b3ee5a51e7d7803b867a0d54
Author: Thomas Graves
AuthorDate: Tue Feb 25 10:55:56 2020 -0600

    [SPARK-30942] Fix the warning for requiring cores to be limiting resources

    ### What changes were proposed in this pull request?

    Fix the warning about limiting resources when we don't know the number of executor cores. The issue is that there are places in the Spark code that use cores/task cpus to calculate slots, and until the entire stage-level scheduling feature is in, we have to rely on the cores being the limiting resource. Change the check to only warn when custom resources are specified.

    ### Why are the changes needed?

    Fix the check and warn only when we should.

    ### Does this PR introduce any user-facing change?

    A warning is printed.

    ### How was this patch tested?

    Manually tested spark-shell with standalone mode, yarn, local mode.

    Closes #27686 from tgravescs/SPARK-30942.

    Authored-by: Thomas Graves
    Signed-off-by: Thomas Graves
---
 core/src/main/scala/org/apache/spark/SparkContext.scala            | 2 +-
 core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala | 7 +++----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index a47136e..f377f13 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -2798,7 +2798,7 @@ object SparkContext extends Logging {
         defaultProf.maxTasksPerExecutor(sc.conf) < cpuSlots) {
         throw new IllegalArgumentException("The number of slots on an executor has to be " +
           "limited by the number of cores, otherwise you waste resources and " +
-          "dynamic allocation doesn't work properly. Your configuration has " +
+          "some scheduling doesn't work properly. Your configuration has " +
           s"core/task cpu slots = ${cpuSlots} and " +
           s"${limitingResource} = " +
           s"${defaultProf.maxTasksPerExecutor(sc.conf)}. Please adjust your configuration " +
diff --git a/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala b/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala
index 2608ab9..5b2476c 100644
--- a/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala
+++ b/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala
@@ -168,7 +168,7 @@ class ResourceProfile(
           // limiting resource because the scheduler code uses that for slots
           throw new IllegalArgumentException("The number of slots on an executor has to be " +
             "limited by the number of cores, otherwise you waste resources and " +
-            "dynamic allocation doesn't work properly. Your configuration has " +
+            "some scheduling doesn't work properly. Your configuration has " +
             s"core/task cpu slots = ${taskLimit} and " +
             s"${execReq.resourceName} = ${numTasks}. " +
             "Please adjust your configuration so that all resources require same number " +
@@ -183,12 +183,11 @@ class ResourceProfile(
             "no corresponding task resource request was specified.")
         }
       }
-      if(!shouldCheckExecCores && Utils.isDynamicAllocationEnabled(sparkConf)) {
+      if(!shouldCheckExecCores && execResourceToCheck.nonEmpty) {
         // if we can't rely on the executor cores config throw a warning for user
         logWarning("Please ensure that the number of slots available on your " +
           "executors is limited by the number of cores to task cpus and not another " +
-          "custom resource. If cores is not the limiting resource then dynamic " +
-          "allocation will not work properly!")
+          "custom resource.")
       }
       if (taskResourcesToCheck.nonEmpty) {
         throw new SparkException("No executor resource configs were not specified for the " +
[spark] branch master updated (761209c -> ffc0935)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 761209c  [SPARK-30919][SQL] Make interval multiply and divide's overflow behavior consistent with other operations
     add ffc0935  [SPARK-30869][SQL] Convert dates to/from timestamps in microseconds precision

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/datetimeExpressions.scala | 10 ++--
 .../spark/sql/catalyst/util/DateFormatter.scala    |  7 +--
 .../spark/sql/catalyst/util/DateTimeUtils.scala    | 58 ++
 .../spark/sql/catalyst/util/IntervalUtils.scala    |  4 +-
 .../sql/catalyst/util/TimestampFormatter.scala     |  4 +-
 .../sql/catalyst/csv/UnivocityParserSuite.scala    |  4 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala |  4 +-
 .../expressions/DateExpressionsSuite.scala         | 26 +-
 .../optimizer/ComputeCurrentTimeSuite.scala        |  6 ++-
 .../sql/catalyst/util/DateTimeUtilsSuite.scala     | 29 +--
 .../sql/catalyst/util/IntervalUtilsSuite.scala     |  6 +--
 .../parquet/VectorizedColumnReader.java            |  4 +-
 .../datasources/binaryfile/BinaryFileFormat.scala  |  2 +-
 .../datasources/parquet/ParquetRowConverter.scala  |  2 +-
 .../datasources/parquet/ParquetWriteSupport.scala  |  2 +-
 .../streaming/EventTimeWatermarkExec.scala         |  4 +-
 .../spark/sql/execution/streaming/Triggers.scala   |  4 +-
 .../continuous/ContinuousRateStreamSource.scala    |  2 +-
 .../sources/RateStreamMicroBatchStream.scala       |  2 +-
 .../sources/TextSocketMicroBatchStream.scala       |  2 +-
 .../org/apache/spark/sql/DateFunctionsSuite.scala  |  6 +--
 .../spark/sql/StatisticsCollectionTestBase.scala   |  4 +-
 .../sql/execution/datasources/json/JsonSuite.scala |  2 +-
 .../apache/spark/sql/streaming/StreamSuite.scala   |  3 +-
 24 files changed, 98 insertions(+), 99 deletions(-)
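For context, Spark stores dates as days since the epoch and timestamps as microseconds since the epoch. A minimal, zone-agnostic sketch of the conversion this change standardizes; the real DateTimeUtils versions are time-zone aware, so treat this as illustration only:

```scala
val MICROS_PER_DAY: Long = 24L * 60 * 60 * 1000 * 1000

// Ignoring time zones for illustration only.
def daysToMicros(days: Int): Long = days * MICROS_PER_DAY
def microsToDays(micros: Long): Int = Math.floorDiv(micros, MICROS_PER_DAY).toInt

assert(microsToDays(daysToMicros(18317)) == 18317)  // 18317 days = 2020-02-25
```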
[spark] branch branch-3.0 updated: [SPARK-30919][SQL] Make interval multiply and divide's overflow behavior consistent with other operations
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 1d746eb  [SPARK-30919][SQL] Make interval multiply and divide's overflow behavior consistent with other operations

1d746eb is described below

commit 1d746eb0afddd3c2a4e1313dddf80ac0aec00a7a
Author: Kent Yao
AuthorDate: Tue Feb 25 22:19:24 2020 +0800

    [SPARK-30919][SQL] Make interval multiply and divide's overflow behavior consistent with other operations

    ### What changes were proposed in this pull request?

    The current behavior of interval multiply and divide follows the ANSI SQL standard on overflow: it is consistent with other operations when `spark.sql.ansi.enabled` is true, but not when `spark.sql.ansi.enabled` is false. When `spark.sql.ansi.enabled` is false, as the factor is a double value, we should use Java's rounding/truncation behavior for casting double to integrals. When divided by zero, the result is `null`. We also follow the natural rules for intervals as defined in the Gregorian calendar, so we do not add the month fraction to days but do add the days fraction to microseconds.

    ### Why are the changes needed?

    Make interval multiply and divide's overflow behavior consistent with other interval operations.

    ### Does this PR introduce any user-facing change?

    No, these are new features in 3.0.

    ### How was this patch tested?

    Added UTs.

    Closes #27672 from yaooqinn/SPARK-30919.

    Authored-by: Kent Yao
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 761209c1f2af513a9db2e08c5937531cff7aeeed)
    Signed-off-by: Wenchen Fan
---
 .../catalyst/expressions/intervalExpressions.scala | 34 +---
 .../spark/sql/catalyst/util/IntervalUtils.scala    | 45 +---
 .../expressions/IntervalExpressionsSuite.scala     | 37 -
 .../sql/catalyst/util/IntervalUtilsSuite.scala     | 61 ++
 .../test/resources/sql-tests/inputs/interval.sql   |  4 ++
 .../sql-tests/results/ansi/interval.sql.out        | 38 +-
 .../resources/sql-tests/results/interval.sql.out   | 49 +
 7 files changed, 210 insertions(+), 58 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala
index 831510e..c09350f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala
@@ -22,6 +22,7 @@ import java.util.Locale
 import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
 import org.apache.spark.sql.catalyst.util.IntervalUtils
 import org.apache.spark.sql.catalyst.util.IntervalUtils._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.CalendarInterval
 
@@ -112,13 +113,14 @@ object ExtractIntervalPart {
 
 abstract class IntervalNumOperation(
     interval: Expression,
-    num: Expression,
-    operation: (CalendarInterval, Double) => CalendarInterval,
-    operationName: String)
+    num: Expression)
   extends BinaryExpression with ImplicitCastInputTypes with Serializable {
 
   override def left: Expression = interval
   override def right: Expression = num
 
+  protected val operation: (CalendarInterval, Double) => CalendarInterval
+  protected def operationName: String
+
   override def inputTypes: Seq[AbstractDataType] = Seq(CalendarIntervalType, DoubleType)
   override def dataType: DataType = CalendarIntervalType
 
@@ -136,11 +138,29 @@ abstract class IntervalNumOperation(
   override def prettyName: String = operationName.stripSuffix("Exact") + "_interval"
 }
 
-case class MultiplyInterval(interval: Expression, num: Expression)
-  extends IntervalNumOperation(interval, num, multiplyExact, "multiplyExact")
+case class MultiplyInterval(
+    interval: Expression,
+    num: Expression,
+    checkOverflow: Boolean = SQLConf.get.ansiEnabled)
+  extends IntervalNumOperation(interval, num) {
+
+  override protected val operation: (CalendarInterval, Double) => CalendarInterval =
+    if (checkOverflow) multiplyExact else multiply
+
+  override protected def operationName: String = if (checkOverflow) "multiplyExact" else "multiply"
+}
+
+case class DivideInterval(
+    interval: Expression,
+    num: Expression,
+    checkOverflow: Boolean = SQLConf.get.ansiEnabled)
+  extends IntervalNumOperation(interval, num) {
+
+  override protected val operation: (CalendarInterval, Double) => CalendarInterval =
+    if (checkOverflow) divideExact else divide
 
-case class
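A hedged sketch of the non-ANSI multiply semantics described in the commit message above; `CalendarInterval` is modeled here as a plain tuple, and the exact rounding in Spark's IntervalUtils may differ in detail:

```scala
val MICROS_PER_DAY: Long = 24L * 60 * 60 * 1000 * 1000

// Non-ANSI multiply: truncate like a Java double-to-int cast (no overflow
// check), never carry the month fraction into days, but do carry the day
// fraction into microseconds, per the Gregorian-calendar rules cited above.
def multiplyInterval(months: Int, days: Int, micros: Long, num: Double): (Int, Int, Long) = {
  val totalMonths = months * num
  val totalDays = days * num
  val truncatedDays = totalDays.toInt
  val fractionMicros = ((totalDays - truncatedDays) * MICROS_PER_DAY).toLong
  (totalMonths.toInt, truncatedDays, (micros * num).toLong + fractionMicros)
}

multiplyInterval(1, 1, 0L, 1.5)  // (1 month, 1 day, 12 hours in microseconds)
```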
[spark] branch master updated (e45f2c7 -> 761209c)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from e45f2c7 [SPARK-28228][SQL][TESTS] Refactoring for nested CTE tests add 761209c [SPARK-30919][SQL] Make interval multiply and divide's overflow behavior consistent with other operations No new revisions were added by this update. Summary of changes: .../catalyst/expressions/intervalExpressions.scala | 34 +--- .../spark/sql/catalyst/util/IntervalUtils.scala| 45 +--- .../expressions/IntervalExpressionsSuite.scala | 37 - .../sql/catalyst/util/IntervalUtilsSuite.scala | 61 ++ .../test/resources/sql-tests/inputs/interval.sql | 4 ++ .../sql-tests/results/ansi/interval.sql.out| 38 +- .../resources/sql-tests/results/interval.sql.out | 49 + 7 files changed, 210 insertions(+), 58 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-28228][SQL][TESTS] Refactoring for nested CTE tests
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ff6662a [SPARK-28228][SQL][TESTS] Refactoring for nested CTE tests ff6662a is described below commit ff6662acc6ff1511cc1c6b3671c54156102b0aae Author: Yuanjian Li AuthorDate: Tue Feb 25 17:37:34 2020 +0900 [SPARK-28228][SQL][TESTS] Refactoring for nested CTE tests ### What changes were proposed in this pull request? Split the nested CTE cases into a single file `cte-nested.sql`, which will be reused in cte-legacy.sql and cte-nonlegacy.sql. ### Why are the changes needed? Make the cases easy to maintain. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UT. Closes #27667 from xuanyuanking/SPARK-28228-test. Authored-by: Yuanjian Li Signed-off-by: HyukjinKwon --- .../test/resources/sql-tests/inputs/cte-legacy.sql | 117 +- .../inputs/{cte-legacy.sql => cte-nested.sql} | 10 -- .../resources/sql-tests/inputs/cte-nonlegacy.sql | 2 +- .../src/test/resources/sql-tests/inputs/cte.sql| 106 .../resources/sql-tests/results/cte-legacy.sql.out | 42 + .../results/{cte.sql.out => cte-nested.sql.out}| 177 + .../sql-tests/results/cte-nonlegacy.sql.out| 177 + .../test/resources/sql-tests/results/cte.sql.out | 174 +--- 8 files changed, 7 insertions(+), 798 deletions(-) diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql b/sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql index d8754d3..29dee1a 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql @@ -1,115 +1,2 @@ -create temporary view t as select * from values 0, 1, 2 as t(id); -create temporary view t2 as select * from values 0, 1 as t(id); - --- CTE legacy substitution -SET spark.sql.legacy.ctePrecedencePolicy=legacy; - --- CTE in CTE definition -WITH t as ( - WITH t2 AS (SELECT 1) - SELECT * FROM t2 -) -SELECT * FROM t; - --- CTE in subquery -SELECT max(c) FROM ( - WITH t(c) AS (SELECT 1) - SELECT * FROM t -); - --- CTE in subquery expression -SELECT ( - WITH t AS (SELECT 1) - SELECT * FROM t -); - --- CTE in CTE definition shadows outer -WITH - t AS (SELECT 1), - t2 AS ( -WITH t AS (SELECT 2) -SELECT * FROM t - ) -SELECT * FROM t2; - --- CTE in CTE definition shadows outer 2 -WITH - t(c) AS (SELECT 1), - t2 AS ( -SELECT ( - SELECT max(c) FROM ( -WITH t(c) AS (SELECT 2) -SELECT * FROM t - ) -) - ) -SELECT * FROM t2; - --- CTE in CTE definition shadows outer 3 -WITH - t AS (SELECT 1), - t2 AS ( -WITH t AS (SELECT 2), -t2 AS ( - WITH t AS (SELECT 3) - SELECT * FROM t -) -SELECT * FROM t2 - ) -SELECT * FROM t2; - --- CTE in subquery shadows outer -WITH t(c) AS (SELECT 1) -SELECT max(c) FROM ( - WITH t(c) AS (SELECT 2) - SELECT * FROM t -); - --- CTE in subquery shadows outer 2 -WITH t(c) AS (SELECT 1) -SELECT sum(c) FROM ( - SELECT max(c) AS c FROM ( -WITH t(c) AS (SELECT 2) -SELECT * FROM t - ) -); - --- CTE in subquery shadows outer 3 -WITH t(c) AS (SELECT 1) -SELECT sum(c) FROM ( - WITH t(c) AS (SELECT 2) - SELECT max(c) AS c FROM ( -WITH t(c) AS (SELECT 3) -SELECT * FROM t - ) -); - --- CTE in subquery expression shadows outer -WITH t AS (SELECT 1) -SELECT ( - WITH t AS (SELECT 2) - SELECT * FROM t -); - --- CTE in subquery expression shadows outer 2 -WITH t AS (SELECT 1) -SELECT ( - SELECT ( -WITH t AS (SELECT 2) -SELECT * FROM t - ) -); - --- CTE 
in subquery expression shadows outer 3 -WITH t AS (SELECT 1) -SELECT ( - WITH t AS (SELECT 2) - SELECT ( -WITH t AS (SELECT 3) -SELECT * FROM t - ) -); - --- Clean up -DROP VIEW IF EXISTS t; -DROP VIEW IF EXISTS t2; +--SET spark.sql.legacy.ctePrecedencePolicy = legacy +--IMPORT cte-nested.sql diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql similarity index 86% copy from sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql copy to sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql index d8754d3..5e5e3a5 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql @@ -1,9 +1,3 @@ -create temporary view t as select * from values 0, 1, 2 as t(id); -create temporary view t2 as select * from values 0, 1 as t(id); - --- CTE legacy substitution -SET spark.sql.legacy.ctePrecedencePolicy=legacy; - -- CTE in CTE definition WITH t as ( WITH t2 AS (SELECT 1) @@ -109,7 +103,3 @@ SELECT ( SELECT * FROM t
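To see what the shared cte-nested.sql cases exercise, the sketch below runs one shadowing query under both precedence policies. It is illustrative only: the `corrected` value, and the assumption that cte-nonlegacy.sql selects it, are inferred from the `spark.sql.legacy.ctePrecedencePolicy` config shown in this diff, not spelled out in it.

```scala
import org.apache.spark.sql.SparkSession

object CtePrecedenceDemo extends App {
  val spark = SparkSession.builder().master("local[1]").appName("cte-demo").getOrCreate()

  // An inner CTE named t shadows the outer t; this is the core pattern in cte-nested.sql.
  val query =
    """WITH t AS (SELECT 1 AS c),
      |     t2 AS (
      |       WITH t AS (SELECT 2 AS c)
      |       SELECT * FROM t
      |     )
      |SELECT * FROM t2""".stripMargin

  // Presumably what cte-nonlegacy.sql runs: under the corrected policy the
  // innermost definition of t wins, so this returns 2.
  spark.conf.set("spark.sql.legacy.ctePrecedencePolicy", "corrected")
  spark.sql(query).show()

  // cte-legacy.sql prepends --SET spark.sql.legacy.ctePrecedencePolicy=legacy,
  // restoring the pre-3.0 resolution where the outer t wins: this returns 1.
  spark.conf.set("spark.sql.legacy.ctePrecedencePolicy", "legacy")
  spark.sql(query).show()
}
```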
[spark] branch master updated (f152d2a -> e45f2c7)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f152d2a [SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central add e45f2c7 [SPARK-28228][SQL][TESTS] Refactoring for nested CTE tests No new revisions were added by this update. Summary of changes: .../test/resources/sql-tests/inputs/cte-legacy.sql | 117 +- .../inputs/{cte-legacy.sql => cte-nested.sql} | 10 -- .../resources/sql-tests/inputs/cte-nonlegacy.sql | 2 +- .../src/test/resources/sql-tests/inputs/cte.sql| 106 .../resources/sql-tests/results/cte-legacy.sql.out | 42 + .../results/{cte.sql.out => cte-nested.sql.out}| 177 + .../sql-tests/results/cte-nonlegacy.sql.out| 177 + .../test/resources/sql-tests/results/cte.sql.out | 174 +--- 8 files changed, 7 insertions(+), 798 deletions(-) copy sql/core/src/test/resources/sql-tests/inputs/{cte-legacy.sql => cte-nested.sql} (86%) copy sql/core/src/test/resources/sql-tests/results/{cte.sql.out => cte-nested.sql.out} (59%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new b302caf [SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central b302caf is described below commit b302caf4d28752a2bf5537c69fd9cbdc8b703e8b Author: Josh Rosen AuthorDate: Tue Feb 25 17:04:13 2020 +0900 [SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central ### What changes were proposed in this pull request? This PR is a followup to #27307: per https://travis-ci.community/t/maven-builds-that-use-the-gcs-maven-central-mirror-should-update-their-paths/5926, the Google Cloud Storage mirror of Maven Central has updated its URLs: the new paths are updated more frequently. The new paths are listed on https://storage-download.googleapis.com/maven-central/index.html This patch updates our build files to use these new URLs. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing build + tests. Closes #27688 from JoshRosen/update-gcs-mirror-url. Authored-by: Josh Rosen Signed-off-by: HyukjinKwon --- pom.xml | 4 ++-- project/SparkBuild.scala | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/pom.xml b/pom.xml index 32b7bae..0741096 100644 --- a/pom.xml +++ b/pom.xml @@ -237,7 +237,7 @@ See https://storage-download.googleapis.com/maven-central/index.html --> GCS Maven Central mirror - https://maven-central.storage-download.googleapis.com/repos/central/data/ + https://maven-central.storage-download.googleapis.com/maven2/ true @@ -268,7 +268,7 @@ See https://storage-download.googleapis.com/maven-central/index.html --> GCS Maven Central mirror - https://maven-central.storage-download.googleapis.com/repos/central/data/ + https://maven-central.storage-download.googleapis.com/maven2/ true diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 4a3f8a5..3f85ac6 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -224,7 +224,7 @@ object SparkBuild extends PomBuild { resolvers := Seq( // Google Mirror of Maven Central, placed first so that it's used instead of flaky Maven Central. // See https://storage-download.googleapis.com/maven-central/index.html for more info. - "gcs-maven-central-mirror" at "https://maven-central.storage-download.googleapis.com/repos/central/data/", + "gcs-maven-central-mirror" at "https://maven-central.storage-download.googleapis.com/maven2/", DefaultMavenRepository, Resolver.mavenLocal, Resolver.file("local", file(Path.userHome.absolutePath + "/.ivy2/local"))(Resolver.ivyStylePatterns) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 2bbb995 [SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central 2bbb995 is described below commit 2bbb9958c3b017062cafca8c78fc9e6d6d33dbd7 Author: Josh Rosen AuthorDate: Tue Feb 25 17:04:13 2020 +0900 [SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central ### What changes were proposed in this pull request? This PR is a followup to #27307: per https://travis-ci.community/t/maven-builds-that-use-the-gcs-maven-central-mirror-should-update-their-paths/5926, the Google Cloud Storage mirror of Maven Central has updated its URLs: the new paths are updated more frequently. The new paths are listed on https://storage-download.googleapis.com/maven-central/index.html This patch updates our build files to use these new URLs. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing build + tests. Closes #27688 from JoshRosen/update-gcs-mirror-url. Authored-by: Josh Rosen Signed-off-by: HyukjinKwon (cherry picked from commit f152d2a0a80e2756dd620538a46b030dd5a6e630) Signed-off-by: HyukjinKwon --- pom.xml | 4 ++-- project/SparkBuild.scala | 2 +- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala| 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/pom.xml b/pom.xml index 925fa28..b3750e4 100644 --- a/pom.xml +++ b/pom.xml @@ -253,7 +253,7 @@ See https://storage-download.googleapis.com/maven-central/index.html --> GCS Maven Central mirror - https://maven-central.storage-download.googleapis.com/repos/central/data/ + https://maven-central.storage-download.googleapis.com/maven2/ true @@ -284,7 +284,7 @@ See https://storage-download.googleapis.com/maven-central/index.html --> GCS Maven Central mirror - https://maven-central.storage-download.googleapis.com/repos/central/data/ + https://maven-central.storage-download.googleapis.com/maven2/ true diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index a07823c..fcde1e9 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -226,7 +226,7 @@ object SparkBuild extends PomBuild { resolvers := Seq( // Google Mirror of Maven Central, placed first so that it's used instead of flaky Maven Central. // See https://storage-download.googleapis.com/maven-central/index.html for more info. 
- "gcs-maven-central-mirror" at "https://maven-central.storage-download.googleapis.com/repos/central/data/;, + "gcs-maven-central-mirror" at "https://maven-central.storage-download.googleapis.com/maven2/;, DefaultMavenRepository, Resolver.mavenLocal, Resolver.file("local", file(Path.userHome.absolutePath + "/.ivy2/local"))(Resolver.ivyStylePatterns) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 5297497..674c6df 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -2136,7 +2136,7 @@ object SQLConf { "if the default Maven Central repo is unreachable.") .stringConf .createWithDefault( - "https://maven-central.storage-download.googleapis.com/repos/central/data/;) +"https://maven-central.storage-download.googleapis.com/maven2/;) val LEGACY_FROM_DAYTIME_STRING = buildConf("spark.sql.legacy.fromDayTimeString.enabled") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (0fd4fa7 -> f152d2a)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 0fd4fa7 [SPARK-30885][SQL] V1 table name should be fully qualified if catalog name is provided add f152d2a [SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central No new revisions were added by this update. Summary of changes: pom.xml | 4 ++-- project/SparkBuild.scala | 2 +- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala| 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org