[spark] branch master updated (e9fd522 -> 28b8713)

2020-02-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e9fd522  [SPARK-30689][CORE][FOLLOW-UP] Rename config name of 
discovery plugin
 add 28b8713  [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT

No new revisions were added by this update.

Summary of changes:
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 project/MimaExcludes.scala | 5 +
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 40 files changed, 45 insertions(+), 40 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30689][CORE][FOLLOW-UP] Rename config name of discovery plugin

2020-02-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 742e35f  [SPARK-30689][CORE][FOLLOW-UP] Rename config name of 
discovery plugin
742e35f is described below

commit 742e35f1d48c2523dda2ce21d73b7ab5ade20582
Author: yi.wu 
AuthorDate: Wed Feb 26 11:55:05 2020 +0900

[SPARK-30689][CORE][FOLLOW-UP] Rename config name of discovery plugin

### What changes were proposed in this pull request?

Rename config `spark.resources.discovery.plugin` to 
`spark.resources.discoveryPlugin`.

Also, as a minor side change: labeled `ResourceDiscoveryScriptPlugin` as `DeveloperApi`, since it's not intended for end users.

### Why are the changes needed?

The discovery plugin doesn't need to reserve the "discovery" namespace here, and it's more consistent with the interface name `ResourceDiscoveryPlugin` if we use `discoveryPlugin` instead.
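
For illustration only (not part of this patch), a minimal sketch of how an application would reference the renamed key; the plugin class name below is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical plugin class: any implementation of
// org.apache.spark.api.resource.ResourceDiscoveryPlugin on the classpath would do.
val spark = SparkSession.builder()
  .appName("discovery-plugin-example")
  .config("spark.resources.discoveryPlugin", "com.example.MyResourceDiscoveryPlugin")
  .getOrCreate()
```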

### Does this PR introduce any user-facing change?

No, it's newly added in Spark 3.0.

### How was this patch tested?

Pass Jenkins.

Closes #27689 from Ngone51/spark_30689_followup.

Authored-by: yi.wu 
Signed-off-by: HyukjinKwon 
(cherry picked from commit e9fd52282e4ed4831c5922348b0e1ee71e045b4b)
Signed-off-by: HyukjinKwon 
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala  | 2 +-
 .../scala/org/apache/spark/resource/ResourceDiscoveryScriptPlugin.scala | 2 ++
 docs/configuration.md   | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala 
b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 3f36e61..37ce178 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -55,7 +55,7 @@ package object config {
   .createOptional
 
   private[spark] val RESOURCES_DISCOVERY_PLUGIN =
-ConfigBuilder("spark.resources.discovery.plugin")
+ConfigBuilder("spark.resources.discoveryPlugin")
   .doc("Comma-separated list of class names implementing" +
 "org.apache.spark.api.resource.ResourceDiscoveryPlugin to load into 
the application." +
 "This is for advanced users to replace the resource discovery class 
with a " +
diff --git 
a/core/src/main/scala/org/apache/spark/resource/ResourceDiscoveryScriptPlugin.scala
 
b/core/src/main/scala/org/apache/spark/resource/ResourceDiscoveryScriptPlugin.scala
index 2ac6d3c..7027d1e 100644
--- 
a/core/src/main/scala/org/apache/spark/resource/ResourceDiscoveryScriptPlugin.scala
+++ 
b/core/src/main/scala/org/apache/spark/resource/ResourceDiscoveryScriptPlugin.scala
@@ -21,6 +21,7 @@ import java.io.File
 import java.util.Optional
 
 import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.api.resource.ResourceDiscoveryPlugin
 import org.apache.spark.internal.Logging
 import org.apache.spark.util.Utils.executeAndGetOutput
@@ -32,6 +33,7 @@ import org.apache.spark.util.Utils.executeAndGetOutput
  * If the user specifies custom plugins, this is the last one to be executed 
and
  * throws if the resource isn't discovered.
  */
+@DeveloperApi
 class ResourceDiscoveryScriptPlugin extends ResourceDiscoveryPlugin with 
Logging {
   override def discoverResource(
   request: ResourceRequest,
diff --git a/docs/configuration.md b/docs/configuration.md
index 2421e00..469feed 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -244,7 +244,7 @@ of the most common options to set are:
   
 
 
- spark.resources.discovery.plugin
+ spark.resources.discoveryPlugin
   org.apache.spark.resource.ResourceDiscoveryScriptPlugin
   
 Comma-separated list of class names implementing


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30662][ML][PYSPARK] Put back the API changes for HasBlockSize in ALS/MLP

2020-02-25 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 84345c7  [SPARK-30662][ML][PYSPARK] Put back the API changes for 
HasBlockSize in ALS/MLP
84345c7 is described below

commit 84345c7e67c9dfd47ec76d5a3d2ad76b62f959b6
Author: Huaxin Gao 
AuthorDate: Sun Feb 9 13:14:30 2020 +0800

[SPARK-30662][ML][PYSPARK] Put back the API changes for HasBlockSize in 
ALS/MLP

### What changes were proposed in this pull request?
Add ```HasBlockSize``` in shared Params in both Scala and Python.
Make ALS/MLP extend ```HasBlockSize```

### Why are the changes needed?
Add ```HasBlockSize``` in ALS, so users can specify the blockSize.
Make ```HasBlockSize``` a shared param so both ALS and MLP can use it.

### Does this PR introduce any user-facing change?
Yes
```ALS.setBlockSize/getBlockSize```
```ALSModel.setBlockSize/getBlockSize```
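
As a rough usage sketch (not from the patch; the column names and block size value are illustrative):

```scala
import org.apache.spark.ml.recommendation.ALS

val als = new ALS()
  .setUserCol("userId")
  .setItemCol("movieId")
  .setRatingCol("rating")
  .setBlockSize(2048)        // new setter, backed by the shared HasBlockSize param

println(als.getBlockSize)    // corresponding getter
```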

### How was this patch tested?
Manually tested. Also added doctest.

Closes #27501 from huaxingao/spark_30662.

Authored-by: Huaxin Gao 
Signed-off-by: zhengruifeng 
---
 .../MultilayerPerceptronClassifier.scala   | 22 +--
 .../ml/param/shared/SharedParamsCodeGen.scala  |  6 ++-
 .../spark/ml/param/shared/sharedParams.scala   | 17 
 .../org/apache/spark/ml/recommendation/ALS.scala   | 46 --
 python/pyspark/ml/classification.py| 22 ---
 python/pyspark/ml/param/_shared_params_code_gen.py |  5 ++-
 python/pyspark/ml/param/shared.py  | 17 
 python/pyspark/ml/recommendation.py| 29 +++---
 8 files changed, 109 insertions(+), 55 deletions(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
index c7a8237..6e8f92b 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
@@ -34,7 +34,7 @@ import org.apache.spark.util.VersionUtils.majorMinorVersion
 
 /** Params for Multilayer Perceptron. */
 private[classification] trait MultilayerPerceptronParams extends 
ProbabilisticClassifierParams
-  with HasSeed with HasMaxIter with HasTol with HasStepSize with HasSolver {
+  with HasSeed with HasMaxIter with HasTol with HasStepSize with HasSolver 
with HasBlockSize {
 
   import MultilayerPerceptronClassifier._
 
@@ -55,26 +55,6 @@ private[classification] trait MultilayerPerceptronParams 
extends ProbabilisticCl
   final def getLayers: Array[Int] = $(layers)
 
   /**
-   * Block size for stacking input data in matrices to speed up the 
computation.
-   * Data is stacked within partitions. If block size is more than remaining 
data in
-   * a partition then it is adjusted to the size of this data.
-   * Recommended size is between 10 and 1000.
-   * Default: 128
-   *
-   * @group expertParam
-   */
-  @Since("1.5.0")
-  final val blockSize: IntParam = new IntParam(this, "blockSize",
-"Block size for stacking input data in matrices. Data is stacked within 
partitions." +
-  " If block size is more than remaining data in a partition then " +
-  "it is adjusted to the size of this data. Recommended size is between 10 
and 1000",
-ParamValidators.gt(0))
-
-  /** @group expertGetParam */
-  @Since("1.5.0")
-  final def getBlockSize: Int = $(blockSize)
-
-  /**
* The solver algorithm for optimization.
* Supported options: "gd" (minibatch gradient descent) or "l-bfgs".
* Default: "l-bfgs"
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
index 7ac680e..6194dfa 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
@@ -104,7 +104,11 @@ private[shared] object SharedParamsCodeGen {
 isValid = "ParamValidators.inArray(Array(\"euclidean\", \"cosine\"))"),
   ParamDesc[String]("validationIndicatorCol", "name of the column that 
indicates whether " +
 "each row is for training or for validation. False indicates training; 
true indicates " +
-"validation.")
+"validation."),
+  ParamDesc[Int]("blockSize", "block size for stacking input data in 
matrices. Data is " +
+"stacked within partitions. If block size is more than remaining data 
in a partition " +
+"then it is adjusted to the size of this data.",
+isValid = "ParamValidators.gt(0)", isExpertParam = true)
 )
 
 val 

[spark] branch master updated (9ea6c0a -> e9fd522)

2020-02-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9ea6c0a  [SPARK-30943][SS] Show "batch ID" in tool tip string for 
Structured Streaming UI graphs
 add e9fd522  [SPARK-30689][CORE][FOLLOW-UP] Rename config name of 
discovery plugin

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala  | 2 +-
 .../scala/org/apache/spark/resource/ResourceDiscoveryScriptPlugin.scala | 2 ++
 docs/configuration.md   | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30943][SS] Show "batch ID" in tool tip string for Structured Streaming UI graphs

2020-02-25 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5343059  [SPARK-30943][SS] Show "batch ID" in tool tip string for 
Structured Streaming UI graphs
5343059 is described below

commit 53430594587ad0134eff5cd2b5e06a7a3eec1b99
Author: Jungtaek Lim (HeartSaVioR) 
AuthorDate: Tue Feb 25 15:29:36 2020 -0800

[SPARK-30943][SS] Show "batch ID" in tool tip string for Structured 
Streaming UI graphs

### What changes were proposed in this pull request?

This patch changes the tool tip string in Structured Streaming UI graphs to 
show batch ID (and timestamp as well) instead of only showing timestamp, which 
was a key for DStream but no longer a key for Structured Streaming.

This patch also does some refactoring, as there were some points of confusion between the js files for streaming and structured streaming.

Note that this patch doesn't actually change the x axis, because changing it would require decoupling the graph logic between streaming and structured streaming. This doesn't change the UX meaningfully, since the x axis only shows the min and max, for which we still want to see the "time" as well as the batch ID.

### Why are the changes needed?

In Structured Streaming, everything is keyed by "batch ID", but the UI only shows the timestamp - end users have to manually find and correlate the batch ID with the timestamp, which is clearly a huge pain.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Manually tested. Screenshots:

![Screen Shot 2020-02-25 at 7 22 38 
AM](https://user-images.githubusercontent.com/1317309/75197701-40b2ce80-57a2-11ea-9578-c2eb2d1091de.png)
![Screen Shot 2020-02-25 at 7 22 44 
AM](https://user-images.githubusercontent.com/1317309/75197704-427c9200-57a2-11ea-9439-e0a8303d0860.png)
![Screen Shot 2020-02-25 at 7 22 58 
AM](https://user-images.githubusercontent.com/1317309/75197706-43152880-57a2-11ea-9617-1276c3ba181e.png)
![Screen Shot 2020-02-25 at 7 23 04 
AM](https://user-images.githubusercontent.com/1317309/75197708-43152880-57a2-11ea-9de2-7d37eaf88102.png)
![Screen Shot 2020-02-25 at 7 23 31 
AM](https://user-images.githubusercontent.com/1317309/75197710-43adbf00-57a2-11ea-9ae4-4e292de39c36.png)

Closes #27687 from HeartSaVioR/SPARK-30943.

Authored-by: Jungtaek Lim (HeartSaVioR) 
Signed-off-by: Shixiong Zhu 
(cherry picked from commit 9ea6c0a8975a1277abba799b51aca4e2818c23e7)
Signed-off-by: Shixiong Zhu 
---
 .../org/apache/spark/ui/static/streaming-page.js   |  2 +-
 .../spark/ui/static/structured-streaming-page.js   |  4 +--
 .../ui/StreamingQueryStatisticsPage.scala  | 36 ++
 .../apache/spark/streaming/ui/StreamingPage.scala  | 13 +++-
 4 files changed, 45 insertions(+), 10 deletions(-)

diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js 
b/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
index 5b75bc3..ed3e65c3 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
@@ -171,7 +171,7 @@ function drawTimeline(id, data, minX, maxX, minY, maxY, 
unitY, batchInterval) {
 .attr("cy", function(d) { return y(d.y); })
 .attr("r", function(d) { return isFailedBatch(d.x) ? "2" : "3";})
 .on('mouseover', function(d) {
-var tip = formatYValue(d.y) + " " + unitY + " at " + 
timeFormat[d.x];
+var tip = formatYValue(d.y) + " " + unitY + " at " + 
timeTipStrings[d.x];
 showBootstrapTooltip(d3.select(this).node(), tip);
 // show the point
 d3.select(this)
diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
 
b/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
index 70250fd..c92226b 100644
--- 
a/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
+++ 
b/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
@@ -106,12 +106,12 @@ function drawAreaStack(id, labels, values, minX, maxX, 
minY, maxY) {
 .on('mouseover', function(d) {
 var tip = '';
 var idx = 0;
-var _values = timeToValues[d._x]
+var _values = formattedTimeToValues[d._x];
 _values.forEach(function (k) {
 tip += labels[idx] + ': ' + k + '   ';
 idx += 1;
 });
-tip += " at " + d._x
+tip += " at " + formattedTimeTipStrings[d._x];
 showBootstrapTooltip(d3.select(this).node(), tip);
 })
 

[spark] branch master updated: [SPARK-30943][SS] Show "batch ID" in tool tip string for Structured Streaming UI graphs

2020-02-25 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9ea6c0a  [SPARK-30943][SS] Show "batch ID" in tool tip string for 
Structured Streaming UI graphs
9ea6c0a is described below

commit 9ea6c0a8975a1277abba799b51aca4e2818c23e7
Author: Jungtaek Lim (HeartSaVioR) 
AuthorDate: Tue Feb 25 15:29:36 2020 -0800

[SPARK-30943][SS] Show "batch ID" in tool tip string for Structured 
Streaming UI graphs

### What changes were proposed in this pull request?

This patch changes the tool tip string in Structured Streaming UI graphs to 
show batch ID (and timestamp as well) instead of only showing timestamp, which 
was a key for DStream but no longer a key for Structured Streaming.

This patch also does some refactoring, as there were some points of confusion between the js files for streaming and structured streaming.

Note that this patch doesn't actually change the x axis, because changing it would require decoupling the graph logic between streaming and structured streaming. This doesn't change the UX meaningfully, since the x axis only shows the min and max, for which we still want to see the "time" as well as the batch ID.

### Why are the changes needed?

In Structured Streaming, everything is keyed by "batch ID", but the UI only shows the timestamp - end users have to manually find and correlate the batch ID with the timestamp, which is clearly a huge pain.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Manually tested. Screenshots:

![Screen Shot 2020-02-25 at 7 22 38 
AM](https://user-images.githubusercontent.com/1317309/75197701-40b2ce80-57a2-11ea-9578-c2eb2d1091de.png)
![Screen Shot 2020-02-25 at 7 22 44 
AM](https://user-images.githubusercontent.com/1317309/75197704-427c9200-57a2-11ea-9439-e0a8303d0860.png)
![Screen Shot 2020-02-25 at 7 22 58 
AM](https://user-images.githubusercontent.com/1317309/75197706-43152880-57a2-11ea-9617-1276c3ba181e.png)
![Screen Shot 2020-02-25 at 7 23 04 
AM](https://user-images.githubusercontent.com/1317309/75197708-43152880-57a2-11ea-9de2-7d37eaf88102.png)
![Screen Shot 2020-02-25 at 7 23 31 
AM](https://user-images.githubusercontent.com/1317309/75197710-43adbf00-57a2-11ea-9ae4-4e292de39c36.png)

Closes #27687 from HeartSaVioR/SPARK-30943.

Authored-by: Jungtaek Lim (HeartSaVioR) 
Signed-off-by: Shixiong Zhu 
---
 .../org/apache/spark/ui/static/streaming-page.js   |  2 +-
 .../spark/ui/static/structured-streaming-page.js   |  4 +--
 .../ui/StreamingQueryStatisticsPage.scala  | 36 ++
 .../apache/spark/streaming/ui/StreamingPage.scala  | 13 +++-
 4 files changed, 45 insertions(+), 10 deletions(-)

diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js 
b/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
index 5b75bc3..ed3e65c3 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
@@ -171,7 +171,7 @@ function drawTimeline(id, data, minX, maxX, minY, maxY, 
unitY, batchInterval) {
 .attr("cy", function(d) { return y(d.y); })
 .attr("r", function(d) { return isFailedBatch(d.x) ? "2" : "3";})
 .on('mouseover', function(d) {
-var tip = formatYValue(d.y) + " " + unitY + " at " + 
timeFormat[d.x];
+var tip = formatYValue(d.y) + " " + unitY + " at " + 
timeTipStrings[d.x];
 showBootstrapTooltip(d3.select(this).node(), tip);
 // show the point
 d3.select(this)
diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
 
b/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
index 70250fd..c92226b 100644
--- 
a/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
+++ 
b/core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
@@ -106,12 +106,12 @@ function drawAreaStack(id, labels, values, minX, maxX, 
minY, maxY) {
 .on('mouseover', function(d) {
 var tip = '';
 var idx = 0;
-var _values = timeToValues[d._x]
+var _values = formattedTimeToValues[d._x];
 _values.forEach(function (k) {
 tip += labels[idx] + ': ' + k + '   ';
 idx += 1;
 });
-tip += " at " + d._x
+tip += " at " + formattedTimeTipStrings[d._x];
 showBootstrapTooltip(d3.select(this).node(), tip);
 })
 .on('mouseout',  function() {
diff --git 

[spark] branch branch-3.0 updated: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md

2020-02-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 16c7668  [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction 
into new section of monitoring.md
16c7668 is described below

commit 16c76688640b662038737d9de66e541e8051b345
Author: Jungtaek Lim (HeartSaVioR) 
AuthorDate: Tue Feb 25 15:17:16 2020 -0800

[SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new 
section of monitoring.md

### What changes were proposed in this pull request?

This is a FOLLOW-UP PR for review comment on #27208 : 
https://github.com/apache/spark/pull/27208#pullrequestreview-347451714

This PR documents the new `Eventlog Compaction` feature in a new section of `monitoring.md`, as the feature only has one configuration on the SHS side and it's hard to explain everything in the description of that single configuration.

### Why are the changes needed?

Event log compaction lacks documentation on what it is and how it helps. This PR explains it.
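
For context, a hypothetical sketch of the settings the new section describes (all values are illustrative):

```scala
import org.apache.spark.SparkConf

// Application side: write rolling event log files instead of one huge file.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.rolling.enabled", "true")
  .set("spark.eventLog.rolling.maxFileSize", "128m")

// History Server side (e.g. in its spark-defaults.conf): retain at most 2 recent
// log files per application and compact the older ones.
//   spark.history.fs.eventLog.rolling.maxFilesToRetain=2
```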

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Built docs via jekyll.

> change on the new section

https://user-images.githubusercontent.com/1317309/74599587-eb9efa80-50c7-11ea-942c-f7744268e40b.png

> change on the table

https://user-images.githubusercontent.com/1317309/73431190-2e9c6680-4383-11ea-8ce0-815f10917ddd.png

Closes #27398 from HeartSaVioR/SPARK-30481-FOLLOWUP-document-new-feature.

Authored-by: Jungtaek Lim (HeartSaVioR) 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 02f8165343fb5c6fc4e5a1874252abdfe886b5b2)
Signed-off-by: Dongjoon Hyun 
---
 docs/monitoring.md | 57 +-
 1 file changed, 44 insertions(+), 13 deletions(-)

diff --git a/docs/monitoring.md b/docs/monitoring.md
index c30aa99..4cba15b 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -95,6 +95,48 @@ The history server can be configured as follows:
   
 
 
+### Applying compaction on rolling event log files
+
+A long-running application (e.g. streaming) can bring a huge single event log 
file which may cost a lot to maintain and
+also requires a bunch of resource to replay per each update in Spark History 
Server.
+
+Enabling spark.eventLog.rolling.enabled and 
spark.eventLog.rolling.maxFileSize would
+let you have rolling event log files instead of single huge event log file 
which may help some scenarios on its own,
+but it still doesn't help you reducing the overall size of logs.
+
+Spark History Server can apply compaction on the rolling event log files to 
reduce the overall size of
+logs, via setting the configuration 
spark.history.fs.eventLog.rolling.maxFilesToRetain on the
+Spark History Server.
+
+Details will be described below, but please note in prior that compaction is 
LOSSY operation.
+Compaction will discard some events which will be no longer seen on UI - you 
may want to check which events will be discarded
+before enabling the option.
+
+When the compaction happens, the History Server lists all the available event 
log files for the application, and considers
+the event log files having less index than the file with smallest index which 
will be retained as target of compaction.
+For example, if the application A has 5 event log files and 
spark.history.fs.eventLog.rolling.maxFilesToRetain is set to 2, 
then first 3 log files will be selected to be compacted.
+
+Once it selects the target, it analyzes them to figure out which events can be 
excluded, and rewrites them
+into one compact file with discarding events which are decided to exclude.
+
+The compaction tries to exclude the events which point to the outdated data. 
As of now, below describes the candidates of events to be excluded:
+
+* Events for the job which is finished, and related stage/tasks events
+* Events for the executor which is terminated
+* Events for the SQL execution which is finished, and related job/stage/tasks 
events
+
+Once rewriting is done, original log files will be deleted, via best-effort 
manner. The History Server may not be able to delete
+the original log files, but it will not affect the operation of the History 
Server.
+
+Please note that Spark History Server may not compact the old event log files 
if figures out not a lot of space
+would be reduced during compaction. For streaming query we normally expect 
compaction
+will run as each micro-batch will trigger one or more jobs which will be 
finished shortly, but compaction won't run
+in many cases for batch query.
+
+Please also note that this is a new feature introduced in Spark 3.0, and may 
not be completely stable. Under some circumstances,
+the compaction may exclude more 

[spark] branch master updated (8f247e5 -> 02f8165)

2020-02-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8f247e5  [SPARK-30918][SQL] improve the splitting of skewed partitions
 add 02f8165  [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction 
into new section of monitoring.md

No new revisions were added by this update.

Summary of changes:
 docs/monitoring.md | 57 +-
 1 file changed, 44 insertions(+), 13 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30918][SQL] improve the splitting of skewed partitions

2020-02-25 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new b968cd3  [SPARK-30918][SQL] improve the splitting of skewed partitions
b968cd3 is described below

commit b968cd37796a5730fe5c2318d23a38416f550957
Author: Wenchen Fan 
AuthorDate: Tue Feb 25 14:10:29 2020 -0800

[SPARK-30918][SQL] improve the splitting of skewed partitions

### What changes were proposed in this pull request?

Use the average size of the non-skewed partitions as the target size when 
splitting skewed partitions, instead of 
ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD

### Why are the changes needed?

The goal of skew join optimization is to make the data distribution more even. So it makes more sense to use the average size of the non-skewed partitions as the target size.
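
As a rough sketch of how this rule is exercised (the keys come from the diff below; the value is only illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("skew-join-example")
  // Adaptive execution must be enabled for the adaptive rules, including OptimizeSkewedJoin.
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.sql.adaptive.skewedJoinOptimization.skewedPartitionFactor", "10")
  .getOrCreate()
```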

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #27669 from cloud-fan/aqe.

Authored-by: Wenchen Fan 
Signed-off-by: Xiao Li 
(cherry picked from commit 8f247e5d3682ad765bdbb9ea5a4315862c5a383c)
Signed-off-by: Xiao Li 
---
 .../org/apache/spark/sql/internal/SQLConf.scala| 10 +---
 .../execution/adaptive/OptimizeSkewedJoin.scala| 62 ++
 .../adaptive/AdaptiveQueryExecSuite.scala  |  4 +-
 3 files changed, 54 insertions(+), 22 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 674c6df..e6f7cfd 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -432,19 +432,13 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
-  val ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD =
-
buildConf("spark.sql.adaptive.skewedJoinOptimization.skewedPartitionSizeThreshold")
-  .doc("Configures the minimum size in bytes for a partition that is 
considered as a skewed " +
-"partition in adaptive skewed join.")
-  .bytesConf(ByteUnit.BYTE)
-  .createWithDefaultString("64MB")
-
   val ADAPTIVE_EXECUTION_SKEWED_PARTITION_FACTOR =
 
buildConf("spark.sql.adaptive.skewedJoinOptimization.skewedPartitionFactor")
   .doc("A partition is considered as a skewed partition if its size is 
larger than" +
 " this factor multiple the median partition size and also larger than 
" +
-s" ${ADAPTIVE_EXECUTION_SKEWED_PARTITION_SIZE_THRESHOLD.key}")
+s" ${SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE.key}")
   .intConf
+  .checkValue(_ > 0, "The skew factor must be positive.")
   .createWithDefault(10)
 
   val NON_EMPTY_PARTITION_RATIO_FOR_BROADCAST_JOIN =
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
index 578d2d7..d3cb864 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
@@ -34,6 +34,30 @@ import 
org.apache.spark.sql.execution.exchange.{EnsureRequirements, ShuffleExcha
 import org.apache.spark.sql.execution.joins.SortMergeJoinExec
 import org.apache.spark.sql.internal.SQLConf
 
+/**
+ * A rule to optimize skewed joins to avoid straggler tasks whose share of 
data are significantly
+ * larger than those of the rest of the tasks.
+ *
+ * The general idea is to divide each skew partition into smaller partitions 
and replicate its
+ * matching partition on the other side of the join so that they can run in 
parallel tasks.
+ * Note that when matching partitions from the left side and the right side 
both have skew,
+ * it will become a cartesian product of splits from left and right joining 
together.
+ *
+ * For example, assume the Sort-Merge join has 4 partitions:
+ * left:  [L1, L2, L3, L4]
+ * right: [R1, R2, R3, R4]
+ *
+ * Let's say L2, L4 and R3, R4 are skewed, and each of them get split into 2 
sub-partitions. This
+ * is scheduled to run 4 tasks at the beginning: (L1, R1), (L2, R2), (L3, R3), (L4, R4).
+ * This rule expands it to 9 tasks to increase parallelism:
+ * (L1, R1),
+ * (L2-1, R2), (L2-2, R2),
+ * (L3, R3-1), (L3, R3-2),
+ * (L4-1, R4-1), (L4-2, R4-1), (L4-1, R4-2), (L4-2, R4-2)
+ *
+ * Note that, when this rule is enabled, it also coalesces non-skewed 
partitions like
+ * `ReduceNumShufflePartitions` does.
+ */
 case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[SparkPlan] {
 
   private val ensureRequirements = EnsureRequirements(conf)
@@ -43,12 +67,12 @@ case class 

[spark] branch master updated (e086a78 -> 8f247e5)

2020-02-25 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e086a78  [MINOR][ML] ML cleanup
 add 8f247e5  [SPARK-30918][SQL] improve the splitting of skewed partitions

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala| 10 +---
 .../execution/adaptive/OptimizeSkewedJoin.scala| 62 ++
 .../adaptive/AdaptiveQueryExecSuite.scala  |  4 +-
 3 files changed, 54 insertions(+), 22 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c46c067 -> e086a78)

2020-02-25 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c46c067  [SPARK-30942] Fix the warning for requiring cores to be 
limiting resources
 add e086a78  [MINOR][ML] ML cleanup

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/ml/linalg/BLAS.scala |  4 +---
 mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala  |  4 ++--
 .../scala/org/apache/spark/ml/attribute/AttributeGroup.scala | 10 --
 .../org/apache/spark/ml/classification/FMClassifier.scala|  1 -
 .../ml/classification/MultilayerPerceptronClassifier.scala   |  2 --
 .../spark/ml/classification/RandomForestClassifier.scala |  2 +-
 .../spark/ml/evaluation/BinaryClassificationEvaluator.scala  |  6 --
 .../org/apache/spark/ml/evaluation/ClusteringEvaluator.scala |  2 +-
 .../scala/org/apache/spark/ml/feature/MinMaxScaler.scala |  2 --
 .../spark/ml/r/GeneralizedLinearRegressionWrapper.scala  |  1 -
 .../main/scala/org/apache/spark/ml/recommendation/ALS.scala  |  8 
 .../scala/org/apache/spark/ml/regression/FMRegressor.scala   |  1 -
 .../org/apache/spark/ml/source/image/ImageFileFormat.scala   |  5 ++---
 .../scala/org/apache/spark/ml/tree/impl/RandomForest.scala   |  2 +-
 .../org/apache/spark/mllib/clustering/StreamingKMeans.scala  |  7 +++
 .../scala/org/apache/spark/mllib/feature/ChiSqSelector.scala |  5 +++--
 .../org/apache/spark/mllib/feature/ElementwiseProduct.scala  |  5 +++--
 .../src/main/scala/org/apache/spark/mllib/feature/IDF.scala  | 12 +++-
 .../src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala  |  4 +---
 .../apache/spark/mllib/optimization/GradientDescent.scala|  2 +-
 .../spark/mllib/stat/correlation/SpearmanCorrelation.scala   |  4 +---
 .../scala/org/apache/spark/mllib/tree/impurity/Entropy.scala |  4 +++-
 .../apache/spark/mllib/tree/model/treeEnsembleModels.scala   |  2 +-
 .../org/apache/spark/mllib/util/LinearDataGenerator.scala|  2 --
 24 files changed, 43 insertions(+), 54 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-30942] Fix the warning for requiring cores to be limiting resources

2020-02-25 Thread tgraves
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c46c067  [SPARK-30942] Fix the warning for requiring cores to be 
limiting resources
c46c067 is described below

commit c46c067f39213df9b3ee5a51e7d7803b867a0d54
Author: Thomas Graves 
AuthorDate: Tue Feb 25 10:55:56 2020 -0600

[SPARK-30942] Fix the warning for requiring cores to be limiting resources

### What changes were proposed in this pull request?

Fix the warning about limiting resources when we don't know the number of executor cores. The issue is that there are places in the Spark code that use cores/task cpus to calculate slots, and until the entire Stage-level scheduling feature is in, we have to rely on the cores being the limiting resource.

Change the check to only warn when custom resources are specified.
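
For reference, a hypothetical configuration of the kind this slot accounting reasons about (all values illustrative): 8 cores with 1 CPU per task gives 8 core slots, while 2 GPUs with 1 GPU per task gives only 2 slots, so the custom resource rather than cores would be limiting:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.cores", "8")
  .set("spark.task.cpus", "1")                     // 8 / 1 = 8 core slots
  .set("spark.executor.resource.gpu.amount", "2")
  .set("spark.task.resource.gpu.amount", "1")      // 2 / 1 = 2 GPU slots
```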

### Why are the changes needed?

Fix the check so that we only warn when we should.

### Does this PR introduce any user-facing change?

A warning is printed

### How was this patch tested?

Manually tested spark-shell with standalone mode, YARN, and local mode.

Closes #27686 from tgravescs/SPARK-30942.

Authored-by: Thomas Graves 
Signed-off-by: Thomas Graves 
---
 core/src/main/scala/org/apache/spark/SparkContext.scala| 2 +-
 .../src/main/scala/org/apache/spark/resource/ResourceProfile.scala | 7 +++
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index a47136e..f377f13 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -2798,7 +2798,7 @@ object SparkContext extends Logging {
 defaultProf.maxTasksPerExecutor(sc.conf) < cpuSlots) {
 throw new IllegalArgumentException("The number of slots on an executor 
has to be " +
   "limited by the number of cores, otherwise you waste resources and " 
+
-  "dynamic allocation doesn't work properly. Your configuration has " +
+  "some scheduling doesn't work properly. Your configuration has " +
   s"core/task cpu slots = ${cpuSlots} and " +
   s"${limitingResource} = " +
   s"${defaultProf.maxTasksPerExecutor(sc.conf)}. Please adjust your 
configuration " +
diff --git 
a/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala 
b/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala
index 2608ab9..5b2476c 100644
--- a/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala
+++ b/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala
@@ -168,7 +168,7 @@ class ResourceProfile(
 // limiting resource because the scheduler code uses that for slots
 throw new IllegalArgumentException("The number of slots on an 
executor has to be " +
   "limited by the number of cores, otherwise you waste resources 
and " +
-  "dynamic allocation doesn't work properly. Your configuration 
has " +
+  "some scheduling doesn't work properly. Your configuration has " 
+
   s"core/task cpu slots = ${taskLimit} and " +
   s"${execReq.resourceName} = ${numTasks}. " +
   "Please adjust your configuration so that all resources require 
same number " +
@@ -183,12 +183,11 @@ class ResourceProfile(
   "no corresponding task resource request was specified.")
   }
 }
-if(!shouldCheckExecCores && Utils.isDynamicAllocationEnabled(sparkConf)) {
+if(!shouldCheckExecCores && execResourceToCheck.nonEmpty) {
   // if we can't rely on the executor cores config throw a warning for user
   logWarning("Please ensure that the number of slots available on your " +
 "executors is limited by the number of cores to task cpus and not 
another " +
-"custom resource. If cores is not the limiting resource then dynamic " 
+
-"allocation will not work properly!")
+"custom resource.")
 }
 if (taskResourcesToCheck.nonEmpty) {
   throw new SparkException("No executor resource configs were not 
specified for the " +


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (761209c -> ffc0935)

2020-02-25 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 761209c  [SPARK-30919][SQL] Make interval multiply and divide's 
overflow behavior consistent with other operations
 add ffc0935  [SPARK-30869][SQL] Convert dates to/from timestamps in 
microseconds precision

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/datetimeExpressions.scala | 10 ++--
 .../spark/sql/catalyst/util/DateFormatter.scala|  7 +--
 .../spark/sql/catalyst/util/DateTimeUtils.scala| 58 ++
 .../spark/sql/catalyst/util/IntervalUtils.scala|  4 +-
 .../sql/catalyst/util/TimestampFormatter.scala |  4 +-
 .../sql/catalyst/csv/UnivocityParserSuite.scala|  4 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala |  4 +-
 .../expressions/DateExpressionsSuite.scala | 26 +-
 .../optimizer/ComputeCurrentTimeSuite.scala|  6 ++-
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 29 +--
 .../sql/catalyst/util/IntervalUtilsSuite.scala |  6 +--
 .../parquet/VectorizedColumnReader.java|  4 +-
 .../datasources/binaryfile/BinaryFileFormat.scala  |  2 +-
 .../datasources/parquet/ParquetRowConverter.scala  |  2 +-
 .../datasources/parquet/ParquetWriteSupport.scala  |  2 +-
 .../streaming/EventTimeWatermarkExec.scala |  4 +-
 .../spark/sql/execution/streaming/Triggers.scala   |  4 +-
 .../continuous/ContinuousRateStreamSource.scala|  2 +-
 .../sources/RateStreamMicroBatchStream.scala   |  2 +-
 .../sources/TextSocketMicroBatchStream.scala   |  2 +-
 .../org/apache/spark/sql/DateFunctionsSuite.scala  |  6 +--
 .../spark/sql/StatisticsCollectionTestBase.scala   |  4 +-
 .../sql/execution/datasources/json/JsonSuite.scala |  2 +-
 .../apache/spark/sql/streaming/StreamSuite.scala   |  3 +-
 24 files changed, 98 insertions(+), 99 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30919][SQL] Make interval multiply and divide's overflow behavior consistent with other operations

2020-02-25 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 1d746eb  [SPARK-30919][SQL] Make interval multiply and divide's 
overflow behavior consistent with other operations
1d746eb is described below

commit 1d746eb0afddd3c2a4e1313dddf80ac0aec00a7a
Author: Kent Yao 
AuthorDate: Tue Feb 25 22:19:24 2020 +0800

[SPARK-30919][SQL] Make interval multiply and divide's overflow behavior 
consistent with other operations

### What changes were proposed in this pull request?

The current behavior of interval multiply and divide follows the ANSI SQL standard on overflow; it is compatible with other operations when `spark.sql.ansi.enabled` is true, but not compatible when `spark.sql.ansi.enabled` is false.

When `spark.sql.ansi.enabled` is false, since the factor is a double value, we use Java's rounding or truncation behavior for casting double to integrals. When dividing by zero, the result is `null`. We also follow the natural rules for intervals as defined in the Gregorian calendar, so we do not add the month fraction to days, but we do add the day fraction to microseconds.
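
A small sketch of the non-ANSI divide-by-zero behavior described above (the query text is illustrative, not taken from this patch):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("interval-ops-example").getOrCreate()

spark.conf.set("spark.sql.ansi.enabled", "false")
// Per the description above, dividing an interval by zero returns NULL
// instead of throwing when ANSI mode is off.
spark.sql("SELECT interval '2 seconds' / 0").show()
```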

### Why are the changes needed?

Make interval multiply and divide's overflow behavior consistent with other 
interval operations

### Does this PR introduce any user-facing change?

No, these are new features in 3.0.

### How was this patch tested?

Added unit tests.

Closes #27672 from yaooqinn/SPARK-30919.

Authored-by: Kent Yao 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 761209c1f2af513a9db2e08c5937531cff7aeeed)
Signed-off-by: Wenchen Fan 
---
 .../catalyst/expressions/intervalExpressions.scala | 34 +---
 .../spark/sql/catalyst/util/IntervalUtils.scala| 45 +---
 .../expressions/IntervalExpressionsSuite.scala | 37 -
 .../sql/catalyst/util/IntervalUtilsSuite.scala | 61 ++
 .../test/resources/sql-tests/inputs/interval.sql   |  4 ++
 .../sql-tests/results/ansi/interval.sql.out| 38 +-
 .../resources/sql-tests/results/interval.sql.out   | 49 +
 7 files changed, 210 insertions(+), 58 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala
index 831510e..c09350f 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala
@@ -22,6 +22,7 @@ import java.util.Locale
 import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
ExprCode}
 import org.apache.spark.sql.catalyst.util.IntervalUtils
 import org.apache.spark.sql.catalyst.util.IntervalUtils._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.CalendarInterval
 
@@ -112,13 +113,14 @@ object ExtractIntervalPart {
 
 abstract class IntervalNumOperation(
 interval: Expression,
-num: Expression,
-operation: (CalendarInterval, Double) => CalendarInterval,
-operationName: String)
+num: Expression)
   extends BinaryExpression with ImplicitCastInputTypes with Serializable {
   override def left: Expression = interval
   override def right: Expression = num
 
+  protected val operation: (CalendarInterval, Double) => CalendarInterval
+  protected def operationName: String
+
   override def inputTypes: Seq[AbstractDataType] = Seq(CalendarIntervalType, 
DoubleType)
   override def dataType: DataType = CalendarIntervalType
 
@@ -136,11 +138,29 @@ abstract class IntervalNumOperation(
   override def prettyName: String = operationName.stripSuffix("Exact") + 
"_interval"
 }
 
-case class MultiplyInterval(interval: Expression, num: Expression)
-  extends IntervalNumOperation(interval, num, multiplyExact, "multiplyExact")
+case class MultiplyInterval(
+interval: Expression,
+num: Expression,
+checkOverflow: Boolean = SQLConf.get.ansiEnabled)
+  extends IntervalNumOperation(interval, num) {
+
+  override protected val operation: (CalendarInterval, Double) => 
CalendarInterval =
+if (checkOverflow) multiplyExact else multiply
+
+  override protected def operationName: String = if (checkOverflow) 
"multiplyExact" else "multiply"
+}
+
+case class DivideInterval(
+interval: Expression,
+num: Expression,
+checkOverflow: Boolean = SQLConf.get.ansiEnabled)
+  extends IntervalNumOperation(interval, num) {
+
+  override protected val operation: (CalendarInterval, Double) => 
CalendarInterval =
+if (checkOverflow) divideExact else divide
 
-case class 

[spark] branch master updated (e45f2c7 -> 761209c)

2020-02-25 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e45f2c7  [SPARK-28228][SQL][TESTS] Refactoring for nested CTE tests
 add 761209c  [SPARK-30919][SQL] Make interval multiply and divide's 
overflow behavior consistent with other operations

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/intervalExpressions.scala | 34 +---
 .../spark/sql/catalyst/util/IntervalUtils.scala| 45 +---
 .../expressions/IntervalExpressionsSuite.scala | 37 -
 .../sql/catalyst/util/IntervalUtilsSuite.scala | 61 ++
 .../test/resources/sql-tests/inputs/interval.sql   |  4 ++
 .../sql-tests/results/ansi/interval.sql.out| 38 +-
 .../resources/sql-tests/results/interval.sql.out   | 49 +
 7 files changed, 210 insertions(+), 58 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-28228][SQL][TESTS] Refactoring for nested CTE tests

2020-02-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ff6662a  [SPARK-28228][SQL][TESTS] Refactoring for nested CTE tests
ff6662a is described below

commit ff6662acc6ff1511cc1c6b3671c54156102b0aae
Author: Yuanjian Li 
AuthorDate: Tue Feb 25 17:37:34 2020 +0900

[SPARK-28228][SQL][TESTS] Refactoring for nested CTE tests

### What changes were proposed in this pull request?
Split the nested CTE cases out into a single file, `cte-nested.sql`, which is then 
reused by `cte-legacy.sql` and `cte-nonlegacy.sql`.
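
One of the shared nested-CTE cases (taken verbatim from the `cte-nested.sql` diff 
below) can be exercised like this; the snippet is only an illustration and assumes 
an active `SparkSession` named `spark`:

val nested = spark.sql(
  """WITH t AS (
    |  WITH t2 AS (SELECT 1)
    |  SELECT * FROM t2
    |)
    |SELECT * FROM t""".stripMargin)
nested.show()  // prints a single row containing 1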

### Why are the changes needed?
Make the cases easy to maintain.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing unit tests.

Closes #27667 from xuanyuanking/SPARK-28228-test.

Authored-by: Yuanjian Li 
Signed-off-by: HyukjinKwon 
---
 .../test/resources/sql-tests/inputs/cte-legacy.sql | 117 +-
 .../inputs/{cte-legacy.sql => cte-nested.sql}  |  10 --
 .../resources/sql-tests/inputs/cte-nonlegacy.sql   |   2 +-
 .../src/test/resources/sql-tests/inputs/cte.sql| 106 
 .../resources/sql-tests/results/cte-legacy.sql.out |  42 +
 .../results/{cte.sql.out => cte-nested.sql.out}| 177 +
 .../sql-tests/results/cte-nonlegacy.sql.out| 177 +
 .../test/resources/sql-tests/results/cte.sql.out   | 174 +---
 8 files changed, 7 insertions(+), 798 deletions(-)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql 
b/sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql
index d8754d3..29dee1a 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql
@@ -1,115 +1,2 @@
-create temporary view t as select * from values 0, 1, 2 as t(id);
-create temporary view t2 as select * from values 0, 1 as t(id);
-
--- CTE legacy substitution
-SET spark.sql.legacy.ctePrecedencePolicy=legacy;
-
--- CTE in CTE definition
-WITH t as (
-  WITH t2 AS (SELECT 1)
-  SELECT * FROM t2
-)
-SELECT * FROM t;
-
--- CTE in subquery
-SELECT max(c) FROM (
-  WITH t(c) AS (SELECT 1)
-  SELECT * FROM t
-);
-
--- CTE in subquery expression
-SELECT (
-  WITH t AS (SELECT 1)
-  SELECT * FROM t
-);
-
--- CTE in CTE definition shadows outer
-WITH
-  t AS (SELECT 1),
-  t2 AS (
-WITH t AS (SELECT 2)
-SELECT * FROM t
-  )
-SELECT * FROM t2;
-
--- CTE in CTE definition shadows outer 2
-WITH
-  t(c) AS (SELECT 1),
-  t2 AS (
-SELECT (
-  SELECT max(c) FROM (
-WITH t(c) AS (SELECT 2)
-SELECT * FROM t
-  )
-)
-  )
-SELECT * FROM t2;
-
--- CTE in CTE definition shadows outer 3
-WITH
-  t AS (SELECT 1),
-  t2 AS (
-WITH t AS (SELECT 2),
-t2 AS (
-  WITH t AS (SELECT 3)
-  SELECT * FROM t
-)
-SELECT * FROM t2
-  )
-SELECT * FROM t2;
-
--- CTE in subquery shadows outer
-WITH t(c) AS (SELECT 1)
-SELECT max(c) FROM (
-  WITH t(c) AS (SELECT 2)
-  SELECT * FROM t
-);
-
--- CTE in subquery shadows outer 2
-WITH t(c) AS (SELECT 1)
-SELECT sum(c) FROM (
-  SELECT max(c) AS c FROM (
-WITH t(c) AS (SELECT 2)
-SELECT * FROM t
-  )
-);
-
--- CTE in subquery shadows outer 3
-WITH t(c) AS (SELECT 1)
-SELECT sum(c) FROM (
-  WITH t(c) AS (SELECT 2)
-  SELECT max(c) AS c FROM (
-WITH t(c) AS (SELECT 3)
-SELECT * FROM t
-  )
-);
-
--- CTE in subquery expression shadows outer
-WITH t AS (SELECT 1)
-SELECT (
-  WITH t AS (SELECT 2)
-  SELECT * FROM t
-);
-
--- CTE in subquery expression shadows outer 2
-WITH t AS (SELECT 1)
-SELECT (
-  SELECT (
-WITH t AS (SELECT 2)
-SELECT * FROM t
-  )
-);
-
--- CTE in subquery expression shadows outer 3
-WITH t AS (SELECT 1)
-SELECT (
-  WITH t AS (SELECT 2)
-  SELECT (
-WITH t AS (SELECT 3)
-SELECT * FROM t
-  )
-);
-
--- Clean up
-DROP VIEW IF EXISTS t;
-DROP VIEW IF EXISTS t2;
+--SET spark.sql.legacy.ctePrecedencePolicy = legacy
+--IMPORT cte-nested.sql
diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql 
b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
similarity index 86%
copy from sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql
copy to sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
index d8754d3..5e5e3a5 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/cte-legacy.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
@@ -1,9 +1,3 @@
-create temporary view t as select * from values 0, 1, 2 as t(id);
-create temporary view t2 as select * from values 0, 1 as t(id);
-
--- CTE legacy substitution
-SET spark.sql.legacy.ctePrecedencePolicy=legacy;
-
 -- CTE in CTE definition
 WITH t as (
   WITH t2 AS (SELECT 1)
@@ -109,7 +103,3 @@ SELECT (
 SELECT * FROM t
   

[spark] branch master updated (f152d2a -> e45f2c7)

2020-02-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f152d2a  [SPARK-30944][BUILD] Update URL for Google Cloud Storage 
mirror of Maven Central
 add e45f2c7  [SPARK-28228][SQL][TESTS] Refactoring for nested CTE tests

No new revisions were added by this update.

Summary of changes:
 .../test/resources/sql-tests/inputs/cte-legacy.sql | 117 +-
 .../inputs/{cte-legacy.sql => cte-nested.sql}  |  10 --
 .../resources/sql-tests/inputs/cte-nonlegacy.sql   |   2 +-
 .../src/test/resources/sql-tests/inputs/cte.sql| 106 
 .../resources/sql-tests/results/cte-legacy.sql.out |  42 +
 .../results/{cte.sql.out => cte-nested.sql.out}| 177 +
 .../sql-tests/results/cte-nonlegacy.sql.out| 177 +
 .../test/resources/sql-tests/results/cte.sql.out   | 174 +---
 8 files changed, 7 insertions(+), 798 deletions(-)
 copy sql/core/src/test/resources/sql-tests/inputs/{cte-legacy.sql => 
cte-nested.sql} (86%)
 copy sql/core/src/test/resources/sql-tests/results/{cte.sql.out => 
cte-nested.sql.out} (59%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central

2020-02-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new b302caf  [SPARK-30944][BUILD] Update URL for Google Cloud Storage 
mirror of Maven Central
b302caf is described below

commit b302caf4d28752a2bf5537c69fd9cbdc8b703e8b
Author: Josh Rosen 
AuthorDate: Tue Feb 25 17:04:13 2020 +0900

[SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven 
Central

This PR is a followup to #27307: per 
https://travis-ci.community/t/maven-builds-that-use-the-gcs-maven-central-mirror-should-update-their-paths/5926,
 the Google Cloud Storage mirror of Maven Central has updated its URLs: the new 
paths are updated more frequently. The new paths are listed on 
https://storage-download.googleapis.com/maven-central/index.html

This patch updates our build files to use these new URLs.
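
For reference, a downstream sbt build could point at the same mirror with a fragment 
like the one below; the resolver name is arbitrary and only the URL comes from this patch:

// build.sbt -- prefer the GCS mirror of Maven Central, then fall back to Maven Central itself
resolvers := Seq(
  "gcs-maven-central-mirror" at "https://maven-central.storage-download.googleapis.com/maven2/",
  Resolver.DefaultMavenRepository
)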

No.

Existing build + tests.

Closes #27688 from JoshRosen/update-gcs-mirror-url.

Authored-by: Josh Rosen 
Signed-off-by: HyukjinKwon 
---
 pom.xml  | 4 ++--
 project/SparkBuild.scala | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/pom.xml b/pom.xml
index 32b7bae..0741096 100644
--- a/pom.xml
+++ b/pom.xml
@@ -237,7 +237,7 @@
         See https://storage-download.googleapis.com/maven-central/index.html
       -->
       <name>GCS Maven Central mirror</name>
-      <url>https://maven-central.storage-download.googleapis.com/repos/central/data/</url>
+      <url>https://maven-central.storage-download.googleapis.com/maven2/</url>
       <releases>
         <enabled>true</enabled>
       </releases>
@@ -268,7 +268,7 @@
         See https://storage-download.googleapis.com/maven-central/index.html
       -->
       <name>GCS Maven Central mirror</name>
-      <url>https://maven-central.storage-download.googleapis.com/repos/central/data/</url>
+      <url>https://maven-central.storage-download.googleapis.com/maven2/</url>
       <releases>
         <enabled>true</enabled>
       </releases>
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 4a3f8a5..3f85ac6 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -224,7 +224,7 @@ object SparkBuild extends PomBuild {
 resolvers := Seq(
   // Google Mirror of Maven Central, placed first so that it's used 
instead of flaky Maven Central.
   // See https://storage-download.googleapis.com/maven-central/index.html 
for more info.
-  "gcs-maven-central-mirror" at 
"https://maven-central.storage-download.googleapis.com/repos/central/data/;,
+  "gcs-maven-central-mirror" at 
"https://maven-central.storage-download.googleapis.com/maven2/;,
   DefaultMavenRepository,
   Resolver.mavenLocal,
   Resolver.file("local", file(Path.userHome.absolutePath + 
"/.ivy2/local"))(Resolver.ivyStylePatterns)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven Central

2020-02-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 2bbb995  [SPARK-30944][BUILD] Update URL for Google Cloud Storage 
mirror of Maven Central
2bbb995 is described below

commit 2bbb9958c3b017062cafca8c78fc9e6d6d33dbd7
Author: Josh Rosen 
AuthorDate: Tue Feb 25 17:04:13 2020 +0900

[SPARK-30944][BUILD] Update URL for Google Cloud Storage mirror of Maven 
Central

### What changes were proposed in this pull request?

This PR is a followup to #27307: per 
https://travis-ci.community/t/maven-builds-that-use-the-gcs-maven-central-mirror-should-update-their-paths/5926,
 the Google Cloud Storage mirror of Maven Central has updated its URLs: the new 
paths are updated more frequently. The new paths are listed on 
https://storage-download.googleapis.com/maven-central/index.html

This patch updates our build files to use these new URLs.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing build + tests.

Closes #27688 from JoshRosen/update-gcs-mirror-url.

Authored-by: Josh Rosen 
Signed-off-by: HyukjinKwon 
(cherry picked from commit f152d2a0a80e2756dd620538a46b030dd5a6e630)
Signed-off-by: HyukjinKwon 
---
 pom.xml   | 4 ++--
 project/SparkBuild.scala  | 2 +-
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala| 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/pom.xml b/pom.xml
index 925fa28..b3750e4 100644
--- a/pom.xml
+++ b/pom.xml
@@ -253,7 +253,7 @@
         See https://storage-download.googleapis.com/maven-central/index.html
       -->
       <name>GCS Maven Central mirror</name>
-      <url>https://maven-central.storage-download.googleapis.com/repos/central/data/</url>
+      <url>https://maven-central.storage-download.googleapis.com/maven2/</url>
       <releases>
         <enabled>true</enabled>
       </releases>
@@ -284,7 +284,7 @@
         See https://storage-download.googleapis.com/maven-central/index.html
       -->
       <name>GCS Maven Central mirror</name>
-      <url>https://maven-central.storage-download.googleapis.com/repos/central/data/</url>
+      <url>https://maven-central.storage-download.googleapis.com/maven2/</url>
       <releases>
         <enabled>true</enabled>
       </releases>
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index a07823c..fcde1e9 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -226,7 +226,7 @@ object SparkBuild extends PomBuild {
 resolvers := Seq(
   // Google Mirror of Maven Central, placed first so that it's used 
instead of flaky Maven Central.
   // See https://storage-download.googleapis.com/maven-central/index.html 
for more info.
-  "gcs-maven-central-mirror" at 
"https://maven-central.storage-download.googleapis.com/repos/central/data/;,
+  "gcs-maven-central-mirror" at 
"https://maven-central.storage-download.googleapis.com/maven2/;,
   DefaultMavenRepository,
   Resolver.mavenLocal,
   Resolver.file("local", file(Path.userHome.absolutePath + 
"/.ivy2/local"))(Resolver.ivyStylePatterns)
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 5297497..674c6df 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -2136,7 +2136,7 @@ object SQLConf {
 "if the default Maven Central repo is unreachable.")
   .stringConf
   .createWithDefault(
-
"https://maven-central.storage-download.googleapis.com/repos/central/data/")
+"https://maven-central.storage-download.googleapis.com/maven2/")
 
   val LEGACY_FROM_DAYTIME_STRING =
 buildConf("spark.sql.legacy.fromDayTimeString.enabled")


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (0fd4fa7 -> f152d2a)

2020-02-25 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0fd4fa7  [SPARK-30885][SQL] V1 table name should be fully qualified if 
catalog name is provided
 add f152d2a  [SPARK-30944][BUILD] Update URL for Google Cloud Storage 
mirror of Maven Central

No new revisions were added by this update.

Summary of changes:
 pom.xml   | 4 ++--
 project/SparkBuild.scala  | 2 +-
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala| 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


