[spark] branch master updated (4952f1a -> f05560b)

2020-05-06 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4952f1a  [SPARK-31365][SQL] Enable nested predicate pushdown per data sources
 add f05560b  [SPARK-31127][ML] Implement abstract Selector

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/ml/feature/ANOVASelector.scala| 195 +++--
 .../apache/spark/ml/feature/ChiSqSelector.scala| 292 +++
 .../apache/spark/ml/feature/FValueSelector.scala   | 314 +++--
 .../{FValueSelector.scala => Selector.scala}   | 227 +++
 .../ml/feature/VarianceThresholdSelector.scala |  63 +
 .../apache/spark/ml/stat/SelectionTestResult.scala | 117 
 .../spark/ml/feature/ANOVASelectorSuite.scala  |  17 +-
 .../spark/ml/feature/ChiSqSelectorSuite.scala  |  12 +-
 .../spark/ml/feature/FValueSelectorSuite.scala |  20 +-
 .../feature/VarianceThresholdSelectorSuite.scala   |  11 +-
 project/MimaExcludes.scala |   9 +-
 11 files changed, 263 insertions(+), 1014 deletions(-)
 copy mllib/src/main/scala/org/apache/spark/ml/feature/{FValueSelector.scala => Selector.scala} (68%)
 delete mode 100644 mllib/src/main/scala/org/apache/spark/ml/stat/SelectionTestResult.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f05560b -> b16ea8e)

2020-05-06 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f05560b  [SPARK-31127][ML] Implement abstract Selector
 add b16ea8e  [SPARK-31650][SQL] Fix wrong UI in case of AdaptiveSparkPlanExec has unmanaged subqueries

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 41 --
 .../adaptive/AdaptiveQueryExecSuite.scala  |  7 ++--
 2 files changed, 36 insertions(+), 12 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31650][SQL] Fix wrong UI in case of AdaptiveSparkPlanExec has unmanaged subqueries

2020-05-06 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new c1b5a4f  [SPARK-31650][SQL] Fix wrong UI in case of AdaptiveSparkPlanExec has unmanaged subqueries
c1b5a4f is described below

commit c1b5a4f1877d057973cb0667cdbb7c27550033b8
Author: yi.wu 
AuthorDate: Wed May 6 12:52:53 2020 +

[SPARK-31650][SQL] Fix wrong UI in case of AdaptiveSparkPlanExec has unmanaged subqueries

### What changes were proposed in this pull request?

Make the non-subquery `AdaptiveSparkPlanExec` update the UI again after
execute/executeCollect/executeTake/executeTail if the `AdaptiveSparkPlanExec`
has subqueries that do not belong to any query stage.
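
The once-only behaviour relies on Scala's `lazy val` initialization semantics. A minimal, framework-free sketch of that idiom (not Spark's actual code; `updateUi` is a hypothetical stand-in for the SQL UI callback):

```scala
object LazyUnitIdiom {
  // Hypothetical stand-in for the SQL UI update callback.
  private def updateUi(): Unit = println("UI updated with the final plan")

  // A lazy val of type Unit runs its body at most once, on first access;
  // every later access is a no-op, so the update cannot fire twice.
  private lazy val finalPlanUpdate: Unit = updateUi()

  def executeCollect(): Array[Int] = {
    val result = Array(1, 2, 3) // stand-in for the real work
    finalPlanUpdate             // forces the one-time update
    result
  }

  def executeTake(n: Int): Array[Int] = {
    val result = Array.fill(n)(0)
    finalPlanUpdate             // already ran: no second update
    result
  }

  def main(args: Array[String]): Unit = {
    executeCollect()
    executeTake(2) // "UI updated..." is printed only once in total
  }
}
```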

### Why are the changes needed?

If there are subqueries that do not belong to any query stage of the main query,
the main query can obtain its final physical plan and update the UI before those
subqueries finish. As a result, the UI cannot reflect the changes from the
subqueries, e.g. new nodes generated by the subqueries.

Before:

https://user-images.githubusercontent.com/16397174/81149758-671a9480-8fb1-11ea-84c4-9a4520e2b08e.png

After:
https://user-images.githubusercontent.com/16397174/81149752-63870d80-8fb1-11ea-9852-f41e11afe216.png

### Does this PR introduce _any_ user-facing change?

No (the AQE feature hasn't been released).

### How was this patch tested?

Tested manually.

Closes #28460 from Ngone51/fix_aqe_ui.

Authored-by: yi.wu 
Signed-off-by: Wenchen Fan 
(cherry picked from commit b16ea8e1ab58bd24c50d31ce0dfc6c79c87fa3b2)
Signed-off-by: Wenchen Fan 
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 41 --
 .../adaptive/AdaptiveQueryExecSuite.scala  |  7 ++--
 2 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index f00dce2..cd6936b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -138,6 +138,13 @@ case class AdaptiveSparkPlanExec(
 executedPlan.resetMetrics()
   }
 
+  private def getExecutionId: Option[Long] = {
+// If the `QueryExecution` does not match the current execution ID, it means the execution ID
+// belongs to another (parent) query, and we should not call update UI in this query.
+Option(context.session.sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY))
+  .map(_.toLong).filter(SQLExecution.getQueryExecution(_) eq context.qe)
+  }
+
   private def getFinalPhysicalPlan(): SparkPlan = lock.synchronized {
 if (isFinalPlan) return currentPhysicalPlan
 
@@ -145,11 +152,7 @@ case class AdaptiveSparkPlanExec(
 // `plan.queryExecution.rdd`, we need to set active session here as new plan nodes can be
 // created in the middle of the execution.
 context.session.withActive {
-  // If the `QueryExecution` does not match the current execution ID, it means the execution ID
-  // belongs to another (parent) query, and we should not call update UI in this query.
-  val executionId =
-    Option(context.session.sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY))
-      .map(_.toLong).filter(SQLExecution.getQueryExecution(_) eq context.qe)
+  val executionId = getExecutionId
   var currentLogicalPlan = currentPhysicalPlan.logicalLink.get
   var result = createQueryStages(currentPhysicalPlan)
   val events = new LinkedBlockingQueue[StageMaterializationEvent]()
@@ -230,25 +233,43 @@ case class AdaptiveSparkPlanExec(
   currentPhysicalPlan = applyPhysicalRules(result.newPlan, queryStageOptimizerRules)
   isFinalPlan = true
   executionId.foreach(onUpdatePlan(_, Seq(currentPhysicalPlan)))
-  logOnLevel(s"Final plan: $currentPhysicalPlan")
   currentPhysicalPlan
 }
   }
 
+  // Use a lazy val to avoid this being called more than once.
+  @transient private lazy val finalPlanUpdate: Unit = {
+// Subqueries that don't belong to any query stage of the main query will execute after the
+// last UI update in `getFinalPhysicalPlan`, so we need to update UI here again to make sure
+// the newly generated nodes of those subqueries are updated.
+if (!isSubquery && currentPhysicalPlan.find(_.subqueries.nonEmpty).isDefined) {
+  getExecutionId.foreach(onUpdatePlan(_, Seq.empty))
+}
+logOnLevel(s"Final plan: $currentPhysicalPlan")
+  }
+
   override def executeCollect(): Array[InternalRow] = {
-getFinalPhysicalPlan().execute

[spark] branch master updated: [SPARK-31609][ML][PYSPARK] Add VarianceThresholdSelector to PySpark

2020-05-06 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 09ece50  [SPARK-31609][ML][PYSPARK] Add VarianceThresholdSelector to PySpark
09ece50 is described below

commit 09ece50799222d577009a2bbd480304d1ae1e14e
Author: Huaxin Gao 
AuthorDate: Wed May 6 09:11:03 2020 -0500

[SPARK-31609][ML][PYSPARK] Add VarianceThresholdSelector to PySpark

### What changes were proposed in this pull request?
Add VarianceThresholdSelector to PySpark

### Why are the changes needed?
parity between Scala and Python

### Does this PR introduce any user-facing change?
Yes.
VarianceThresholdSelector is added to PySpark

### How was this patch tested?
new doctest

Closes #28409 from huaxingao/variance_py.

Authored-by: Huaxin Gao 
Signed-off-by: Sean Owen 
---
 python/pyspark/ml/feature.py | 142 +++
 1 file changed, 142 insertions(+)
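
For reference, the new Python estimator mirrors the existing Scala `org.apache.spark.ml.feature.VarianceThresholdSelector`. A hedged Scala sketch of the equivalent usage, with the data and threshold borrowed from the doctest in the diff below (setter names follow the usual ML param convention and are an assumption; check them against the released API):

```scala
import org.apache.spark.ml.feature.VarianceThresholdSelector
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("vts-demo").getOrCreate()
import spark.implicits._

// Same toy data as the PySpark doctest: six rows of six features each.
val df = Seq(
  Vectors.dense(6.0, 7.0, 0.0, 7.0, 6.0, 0.0),
  Vectors.dense(0.0, 9.0, 6.0, 0.0, 5.0, 9.0),
  Vectors.dense(0.0, 9.0, 3.0, 0.0, 5.0, 5.0),
  Vectors.dense(0.0, 9.0, 8.0, 5.0, 6.0, 4.0),
  Vectors.dense(8.0, 9.0, 6.0, 5.0, 4.0, 4.0),
  Vectors.dense(8.0, 9.0, 6.0, 0.0, 0.0, 0.0)
).map(Tuple1.apply).toDF("features")

// Keep only the features whose sample variance exceeds the threshold.
val selector = new VarianceThresholdSelector()
  .setVarianceThreshold(8.2)
  .setFeaturesCol("features")
  .setOutputCol("selectedFeatures")

val model = selector.fit(df)
model.transform(df).select("selectedFeatures").show(truncate = false)
```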

diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index 6df2f74..7acf8ce 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -57,6 +57,7 @@ __all__ = ['Binarizer',
'StopWordsRemover',
'StringIndexer', 'StringIndexerModel',
'Tokenizer',
+   'VarianceThresholdSelector', 'VarianceThresholdSelectorModel',
'VectorAssembler',
'VectorIndexer', 'VectorIndexerModel',
'VectorSizeHint',
@@ -5381,6 +5382,147 @@ class VectorSizeHint(JavaTransformer, HasInputCol, HasHandleInvalid, JavaMLReada
 return self._set(handleInvalid=value)
 
 
+class _VarianceThresholdSelectorParams(HasFeaturesCol, HasOutputCol):
+"""
+Params for :py:class:`VarianceThresholdSelector` and
+:py:class:`VarianceThresholdSelectorModel`.
+
+.. versionadded:: 3.1.0
+"""
+
+varianceThreshold = Param(Params._dummy(), "varianceThreshold",
+  "Param for variance threshold. Features with a variance not " +
+  "greater than this threshold will be removed. The default value " +
+  "is 0.0.", typeConverter=TypeConverters.toFloat)
+
+@since("3.1.0")
+def getVarianceThreshold(self):
+"""
+Gets the value of varianceThreshold or its default value.
+"""
+return self.getOrDefault(self.varianceThreshold)
+
+
+@inherit_doc
+class VarianceThresholdSelector(JavaEstimator, _VarianceThresholdSelectorParams, JavaMLReadable,
+JavaMLWritable):
+"""
+Feature selector that removes all low-variance features. Features with a
+variance not greater than the threshold will be removed. The default is to keep
+all features with non-zero variance, i.e. remove the features that have the
+same value in all samples.
+
+>>> from pyspark.ml.linalg import Vectors
+>>> df = spark.createDataFrame(
+...[(Vectors.dense([6.0, 7.0, 0.0, 7.0, 6.0, 0.0]),),
+... (Vectors.dense([0.0, 9.0, 6.0, 0.0, 5.0, 9.0]),),
+... (Vectors.dense([0.0, 9.0, 3.0, 0.0, 5.0, 5.0]),),
+... (Vectors.dense([0.0, 9.0, 8.0, 5.0, 6.0, 4.0]),),
+... (Vectors.dense([8.0, 9.0, 6.0, 5.0, 4.0, 4.0]),),
+... (Vectors.dense([8.0, 9.0, 6.0, 0.0, 0.0, 0.0]),)],
+...["features"])
+>>> selector = VarianceThresholdSelector(varianceThreshold=8.2, outputCol="selectedFeatures")
+>>> model = selector.fit(df)
+>>> model.getFeaturesCol()
+'features'
+>>> model.setFeaturesCol("features")
+VarianceThresholdSelectorModel...
+>>> model.transform(df).head().selectedFeatures
+DenseVector([6.0, 7.0, 0.0])
+>>> model.selectedFeatures
+[0, 3, 5]
+>>> varianceThresholdSelectorPath = temp_path + "/variance-threshold-selector"
+>>> selector.save(varianceThresholdSelectorPath)
+>>> loadedSelector = VarianceThresholdSelector.load(varianceThresholdSelectorPath)
+>>> loadedSelector.getVarianceThreshold() == selector.getVarianceThreshold()
+True
+>>> modelPath = temp_path + "/variance-threshold-selector-model"
+>>> model.save(modelPath)
+>>> loadedModel = VarianceThresholdSelectorModel.load(modelPath)
+>>> loadedModel.selectedFeatures == model.selectedFeatures
+True
+
+.. versionadded:: 3.1.0
+"""
+
+@keyword_only
+def __init__(self, featuresCol="features", outputCol=None, varianceThreshold=0.0):
+"""
+__init__(self, featuresCol="features", outputCol=None, varianceThreshold=0.0)
+"""
+super(VarianceThresholdSelector, self).__init__()
+self._java_obj = self._new_java_obj(
+"org.apache.spark.ml.feature.VarianceThresholdSelector", self.uid)
+self._setDefault(varianceThreshold=0.0)
+   

[spark] branch master updated (b16ea8e -> 09ece50)

2020-05-06 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b16ea8e  [SPARK-31650][SQL] Fix wrong UI in case of AdaptiveSparkPlanExec has unmanaged subqueries
 add 09ece50  [SPARK-31609][ML][PYSPARK] Add VarianceThresholdSelector to PySpark

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/feature.py | 142 +++
 1 file changed, 142 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r39313 - /dev/spark/KEYS

2020-05-06 Thread holden
Author: holden
Date: Wed May  6 20:56:25 2020
New Revision: 39313

Log:
Update KEYS

Modified:
dev/spark/KEYS

Modified: dev/spark/KEYS
==
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Wed May  6 20:56:25 2020
@@ -1225,3 +1225,130 @@ HYoqwKL52HzG121lfWXhx5vNF4bg/fKrFEOy2Wp1
 NjpTIP+lOkfxRwUi
 =rggH
 -END PGP PUBLIC KEY BLOCK-
+
+pub   rsa4096 2018-07-01 [SC]
+  CA19E2CC201B6522D3A2A05DC41AB674424FC3FB
+uid   [ultimate] Holden Karau (CODE SIGNING KEY) 
+sub   rsa4096 2018-07-01 [E]
+
+-BEGIN PGP PUBLIC KEY BLOCK-
+
+mQINBFs5MrIBEAC1jGSgkZd+HjtAt5Mlh9LKOucaHRN9ItLxk4rrYklxt3YI20ft
+DIrLiAQCotiYYhmhBaXZRTfgGgusRa6MhdmX+5t/+yKZfoLqfeZeKawDsvKhGgod
+Yl5iCxB0OmwdZjjOaZPDW2zVlJkjuB24SnSNwsBnd0kkDAEoGMN3bUflvnzL2EJI
+zzvmR0HGK0aWRFzj7fvBu5sN4RM03ZN/3CCCVHlznH8ILvFDz8PnLf+nMhBFIQOo
+OBiiLq20Ag19vj1rXjfe3U0YT6L4+SbLAt6vde8YpXs/mpkWvVPH4OtEATKsLqvJ
+5pP8YGxOkPfls6lfziyLV/JMUO6f9BxqgXXWLPvjAuhRWKJ+yyFfWa4Ju8PTKEO6
+Z344tl+1FNNXNkBGTVN+k264lKe/He8ywYZKgtyF9iGvOf8HgUMsjg6DSFOMaZ9I
+oFfe1G1zYtDCwV4rkf+AS/SlxaMcQpr96wxhtnBL6zD92gK8Of1+EN4yrfVSp8Gj
+cnRC3jYC4iUtyWDZcUpHkSajxH2IbYQMty8fHhYLA6mq8HEhPEtsSE3eX1KQEZEX
+O+mcjHaC6XuvC7aQozNkYw19S6vK9l7Dwcq7+0WYpHFkQ5RgTmE7ns36/lq/4s/B
+9TTC9m8eXAjBtjNjlnax7qXBshru6dYt9kfI4PM6SV0WHZtSHldYkqy/GwARAQAB
+tDNIb2xkZW4gS2FyYXUgKENPREUgU0lHTklORyBLRVkpIDxob2xkZW5AYXBhY2hl
+Lm9yZz6JAk4EEwEKADgWIQTKGeLMIBtlItOioF3EGrZ0Qk/D+wUCWzkysgIbAwUL
+CQgHAgYVCgkICwIEFgIDAQIeAQIXgAAKCRDEGrZ0Qk/D+4pwEACUTTtP2xxp7oE/
+1V7ozsvrhs0TwaqVVpBnhgvtj8iW8towVRPiBw+f/6M0HxfVtTjcrqsRSE2E3Lw1
+voa7EBGmOr4y+36jKcnme9ImbTTb8bhu9rvEhqFsOhIECDa7qOpuLpYkywn+eHGi
+oz7JKFpI0JbAw8wxsYBRCwHyQ1F+nd6MoLeYHT9vKYfCUlE86ZWX35uqkxYcoza1
+g9XsqUQph6RSrPh0KprYBuvafwnJlixzCOoBi0++r09G0S8p6/PEbf0QcAFAxlbQ
+QcJBApifwo4YgE1vvupgP7OFPfkYQhy+Dxk62xwdrisChFA5xZfDZf0q+lNZQvbf
+iqXBGKG9GK8UI/hs/4CvT1JYKGBVWuxPrtGUp+6bBHgIBQvUlc/v+YOnjtAihTOD
+qWGwnKeJhs5gsCO9TRxEyfB5UZPe7z1zlAgUes5C570jhBz0MqweJ5Ahr7FowEg7
+tRuBtx57A9tBYmsCwyohO/XsLLpQ4YSwFkPPyF2EfzpCYA2Uxc07IeqvJYJSF3dN
+DeHSO+QgZOkxJaI79wDrUbYLlXPTIxdD7JC6sthuqO1NCN0WQ0T/6qHvRP1FS5bP
+IGadvPe7eoVFGg1xFSjyuc1cG7oh6kGYNEq9/88pk7Jw06iZLPGVMtdPBoy4cwLD
+66JBuB1jZZ65JqTTqjkLy6Oflz/7ZIkCMwQQAQoAHRYhBCZkGlKVVCadOX4DBR6B
+6wOfgCXPBQJbOTNhAAoJEB6B6wOfgCXPq+sQAK4En8juzzCZbdGaIGGxpyHNlHee
+C3v+9smZ0ueioszgmtDKIbq8Zkx/zcrcn+xrs6LPIFo8LKD6LcqiQfaXd8k3rgcL
+TBgslew7fs4Nj13JQdg3zo2iD/WDK5pwZdo/ypzZnPr8340SA8Gw5a8zSCYW6cdM
+Ae3f2wpkMQnjaSbbGCM7XeFDorumuCd9Fo3G+GF5qny9u/o2gVeqRdSyN3lo+/H7
+z15PNlBNeQG1bs5BkRawePGqfewqtkZC7Ql77cq/nY83eieQHx5AB1iTSqLfHMo2
+cmvFqOYGcsc9N/9eh1qQ4E0QN9PLgX7gVZldT4AbeTjqUg8N4i7duhyvVLbuPg4+
+lmyp7PzIQKbb7MKhVMDkGURBi4aGmn8eXHVckCUfHHCCAwFwuyS8K0L9Za64dE4B
+e1HiCr+75zaAaaEGHkECM4bzgePf1ZWnAcjN1UrQkO2IKeAa5GoGieIJ4/zjMFDs
+dof1dUvWPZXcCPRJoKAFPE8oerfL7j0xtBh/VaFvOWXJrU9XZ8oMvGxfRBzgfmWG
+WsGmAUzwCisnqKUHzAKm6t7qei/aUH4EG74XOIZdUAotcj6A8Dytfkl4ju6e6oCX
+4cSXIXJ2T2Q9c4NQZpynncynvVyimoPo3k4EIDwKv0pxtSFQCgMTSCQ17RIgAIs1
++ctRb9+MfIM62dPSiQIzBBIBCAAdFiEEhu25wzuFFyKOiKj5PkjAxu82K54FAl6z
+AwAACgkQPkjAxu82K55U2RAAquVJ0ytQGYEZFSqcFhc3mo+tLY3+pfkpHPjbiU/C
+vPNwdwmcfmiEXCpD2u3qNLJgjV+w5KzYOS+6W5tC3TrhuWCpz63EbZbL5VRHVbZY
+7US2SuWPv4utfHdqkm/itUNTrcK3fi8WFTpagtOK0xt1mQhhGNamZtCouq0ilY4i
+/d4KNsqZKjW2sKdClgzaSYyoBjUlCPbiVkDHedaYACcvcezulcqaW0FnSpkYpXzC
+MZsZjf3jYSGylDcufcNIVsA/0OcEHXlxupAtmeWYBxODb/mrwZ9Pc4o5E6gFYHnp
+WgkW23JxBa6vIkGYC7mIndMiI5iuNdXxXThOOQRkxe6sjkp2fe1ac/8XHzqkwdlD
+0G4nX+PcqbFo8+DEZXSsgEXgMUt7TKLYmw4OAny8HVYPXWqPInEByuRUaAD71wD9
+HVwwB+8dCf37ZUfYAW55Ls/QeysCwYZXFBpoc2H7q+9w+ArasGMHh08mVoV0IfjI
+hoAcDQuWJQlsKS0fvx5jY5LmG7eHnfoUaahrCbg21moVHnSQCBe8OCHvwdcHo8m+
+55Hiilo9OCFhiWEm0bAKFG2V8fjo6HeZez7+MZ2exBhR/RQlnqBY3PHjUoIwHeFc
+BRfSCSokMjwO4VkfM+tTglm/i7Y2/ZeGblY9e3iz/Xmqlo6iPUbxarmIm5O38VAD
+2G2JBDMEEAEIAB0WIQQ8pK7/4heko+cuNaNc7YuJamvfoAUCXAr95wAKCRBc7YuJ
+amvfoBP0H/97he9tSbpPyQ+JkfYBRFC2KcIqs6gvbaHAuAycwhVWRLySsI2msw0k
+TPIbxI9DCXvF9sUbmGWgkMqJ//MYzMhQgx5FWGRCL+4GaoEKPHOxJCrEHYyZA2mo
+pZFcXywtfsWcp5rqgtruZ7AEI9bKKauvwHuuZo1wrnuRzlZ6RqqsteSZpMRb7Ell
+/H/zi5r8V8OIYKxbgeVXZP9HdqWgplsii/KFipEfD1qLCVavZLcEsAFRCxx7vYJP
+9Z+ioTzbCXgoFZ2PqO6ez5aIc05MwKZFnYGFEaA4zzwA/Hb/kBqYYpNpScVcwQnl
+jbN3crvguxjdEGH8OZSFiffs7O2eLCn70KONfoeHlUMJ98Q4sSUy9Y7/wGjlFr5i
+inYnAOye41dA0JRY7mnLMXs8JfLUFI+RKaU+c5z1YN923pvYDHQCMCVeym41VNdt
+dv0IfJGZgjsgyOR4derM0rusvfthKjd7sZsOppsXWbhKsY8GsTuUnZ2ts7AEXWNw
+ZFd5ZN/bFQ19CXMeasiYDdk+9paEhamtyOhyYsjJJGRJSPf9WytTL1N6vxKH6N85
+FzWmVdCIxDNmnAJUfOaSJkY5BOcAHB8a9Ujx8DrEFfzotzj+hyKztHDFTPG1NRBR
+EhqkPFf/zFiI+Wuss6xKB99HbKkKLyZmaCvmUFaNtnB3R7XIV41Hb/oZSD4Rr8KV
+nbUU1c4A736n4RowD/3939AaPDY89Tb6SsUVW9lcoQWy68Egnvefvq651ZsW6a3G
+S5rkhv0T9T+udE27q9r81acIJ7caC+tSqdMXTcbtohiw/QF+QhxkTXtOQOIHzFnQ
+05hYhJTzsVUIII/MZfYx8cmF9dxdcneHrEZWLCQGag201mijw4PisPBqTLwMwWo1
+iMkW0WTpfKq0kA9f6FVt3D2iKvPxhS0ZnwRnjrKK1iQJ2yn88wQalPWCobUnsyHG
+oJ4aSrtuotKSjUdebwoxu1Qv8UHbC8/3+MeAaO49GaWktzklGRpGXj15H3CaMLgz
+y+NzTsoqWNEirXyKMSKIstrxhuJzG+4KEw62Dkss5+bFaT07Jc1gux5k7jEowtg8
+lzt9c3cAfk2

[spark] branch branch-2.4 updated: [SPARK-31653][BUILD] Setuptools is needed before installing any other python packages

2020-05-06 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new a00eddc  [SPARK-31653][BUILD] Setuptools is needed before installing any other python packages
a00eddc is described below

commit a00eddcffbe83f8c462eea550ea4e33250c26889
Author: Holden Karau 
AuthorDate: Wed May 6 14:56:42 2020 -0700

[SPARK-31653][BUILD] Setuptools is needed before installing any other python packages

### What changes were proposed in this pull request?
Allow the docker build to succeed

### Why are the changes needed?
The base packages depend on having setuptools installed now

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Ran the release script, pip installs succeeded

Closes #28467 from holdenk/SPARK-31653-setuptools-needs-to-be-isntalled-before-anything-else.

Authored-by: Holden Karau 
Signed-off-by: Holden Karau 
---
 dev/create-release/spark-rm/Dockerfile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile
index 992961f..192f456 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -61,11 +61,13 @@ RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates apt-tr
   # Install needed python packages. Use pip for installing packages (for consistency).
   $APT_INSTALL libpython2.7-dev libpython3-dev python-pip python3-pip && \
   pip install --upgrade pip && hash -r pip && \
+  pip install setuptools && \
   pip install $BASE_PIP_PKGS && \
   pip install $PIP_PKGS && \
   cd && \
   virtualenv -p python3 /opt/p35 && \
   . /opt/p35/bin/activate && \
+  pip install setuptools && \
   pip install $BASE_PIP_PKGS && \
   pip install $PIP_PKGS && \
   # Install R packages and dependencies used when building.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v2.4.6-rc1

2020-05-06 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to tag v2.4.6-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit a3cffc997035d11e1f6c092c1186e943f2f63544
Author: Holden Karau 
AuthorDate: Wed May 6 23:13:21 2020 +

Preparing Spark release v2.4.6-rc1
---
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 2 +-
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml| 2 +-
 external/flume-sink/pom.xml| 2 +-
 external/flume/pom.xml | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kafka-0-8-assembly/pom.xml| 2 +-
 external/kafka-0-8/pom.xml | 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 42 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 9aa868c..de59f40 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6-SNAPSHOT
+2.4.6
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 3593374..b3a2265 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6-SNAPSHOT
+2.4.6
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 0aadd49..d3db527 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6-SNAPSHOT
+2.4.6
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index d862cf8..34872fd 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6-SNAPSHOT
+2.4.6
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 011bb49..5059ee0 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6-SNAPSHOT
+2.4.6
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 02b9ce7..7d46020 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6-SNAPSHOT
+2.4.6
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 5b4c930..4b5cf16 100644
--- a/common/tags/pom.xml
+++ b/common/tags/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6-SNAPSHOT
+2.4.6
 ../../pom.xml
   
 
diff --git a/common/unsafe/pom.

[spark] tag v2.4.6-rc1 created (now a3cffc9)

2020-05-06 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a change to tag v2.4.6-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at a3cffc9  (commit)
This tag includes the following new commits:

 new a3cffc9  Preparing Spark release v2.4.6-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing development version 2.4.7-SNAPSHOT

2020-05-06 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 8504bdb945432479e1a23fccf2308580ba92ecc4
Author: Holden Karau 
AuthorDate: Wed May 6 23:13:27 2020 +

Preparing development version 2.4.7-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml| 2 +-
 external/flume-sink/pom.xml| 2 +-
 external/flume/pom.xml | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kafka-0-8-assembly/pom.xml| 2 +-
 external/kafka-0-8/pom.xml | 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index c913a38..b70014d 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.4.6
+Version: 2.4.7
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index de59f40..712cc7f 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6
+2.4.7-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index b3a2265..825d771 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6
+2.4.7-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index d3db527..9dd26b3 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6
+2.4.7-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 34872fd..386782b 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6
+2.4.7-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 5059ee0..8496a68 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.6
+2.4.7-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 7d46020..

[spark] branch branch-2.4 updated (a00eddc -> 8504bdb)

2020-05-06 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a00eddc  [SPARK-31653][BUILD] Setuptools is needed before installing any other python packages
 add a3cffc9  Preparing Spark release v2.4.6-rc1
 new 8504bdb  Preparing development version 2.4.7-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml| 2 +-
 external/flume-sink/pom.xml| 2 +-
 external/flume/pom.xml | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kafka-0-8-assembly/pom.xml| 2 +-
 external/kafka-0-8/pom.xml | 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 44 insertions(+), 44 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (09ece50 -> 5c5dd77)

2020-05-06 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 09ece50  [SPARK-31609][ML][PYSPARK] Add VarianceThresholdSelector to PySpark
 add 5c5dd77  [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala| 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

2020-05-06 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new f8a20c4  [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration
f8a20c4 is described below

commit f8a20c470bf115b0834970ce02eb2ec103e0f6df
Author: HyukjinKwon 
AuthorDate: Thu May 7 09:00:59 2020 +0900

[SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

### What changes were proposed in this pull request?

This PR proposes to deprecate the 'spark.sql.optimizer.metadataOnly' configuration and remove it in a future release.

### Why are the changes needed?

This optimization can cause a potential correctness issue; see also SPARK-26709.
It is also difficult to extend: essentially every available function would have to be
whitelisted, which adds maintenance overhead; see also SPARK-31590.

It seems better to let users inject the optimization themselves via `SparkSessionExtensions`
if they really need it, and to remove it on the Spark side.
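
For reference, a minimal sketch of the `SparkSessionExtensions` route the deprecation message points to. The class name and the no-op rule body are placeholders (Spark's built-in metadata-only rewrite is internal and not reusable as-is); `injectOptimizerRule` and the `spark.sql.extensions` config are the real extension points:

```scala
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Placeholder extension that registers one extra optimizer rule.
class MyMetadataOnlyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectOptimizerRule { session =>
      new Rule[LogicalPlan] {
        // A real implementation would rewrite eligible aggregates over
        // partition columns to read partition metadata instead of data files.
        override def apply(plan: LogicalPlan): LogicalPlan = plan
      }
    }
  }
}

// Enabled when the session is built, e.g.:
//   spark-submit --conf spark.sql.extensions=MyMetadataOnlyExtensions ...
```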

### Does this PR introduce _any_ user-facing change?

Yes, setting `spark.sql.optimizer.metadataOnly` will show a deprecation warning:

```scala
scala> spark.conf.unset("spark.sql.optimizer.metadataOnly")
```
```
20/05/06 12:57:23 WARN SQLConf: The SQL config 'spark.sql.optimizer.metadataOnly' has been
deprecated in Spark v3.0 and may be removed in the future. Avoid to depend on this optimization
to prevent a potential correctness issue. If you must use, use 'SparkSessionExtensions' instead to
inject it as a custom rule.
```
```scala
scala> spark.conf.set("spark.sql.optimizer.metadataOnly", "true")
```
```
20/05/06 12:57:44 WARN SQLConf: The SQL config 'spark.sql.optimizer.metadataOnly' has been
deprecated in Spark v3.0 and may be removed in the future. Avoid to depend on this optimization
to prevent a potential correctness issue. If you must use, use 'SparkSessionExtensions' instead to
inject it as a custom rule.
```

### How was this patch tested?

Manually tested.

Closes #28459 from HyukjinKwon/SPARK-31647.

Authored-by: HyukjinKwon 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 5c5dd77d6a29b014b3fe4b4015f5c7199650a378)
Signed-off-by: Takeshi Yamamuro 
---
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala| 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 51404a2..8d673c5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -844,8 +844,10 @@ object SQLConf {
 .doc("When true, enable the metadata-only query optimization that use the 
table's metadata " +
   "to produce the partition columns instead of table scans. It applies 
when all the columns " +
   "scanned are partition columns and the query has an aggregate operator 
that satisfies " +
-  "distinct semantics. By default the optimization is disabled, since it 
may return " +
-  "incorrect results when the files are empty.")
+  "distinct semantics. By default the optimization is disabled, and 
deprecated as of Spark " +
+  "3.0 since it may return incorrect results when the files are empty, see 
also SPARK-26709." +
+  "It will be removed in the future releases. If you must use, use 
'SparkSessionExtensions' " +
+  "instead to inject it as a custom rule.")
 .version("2.1.1")
 .booleanConf
 .createWithDefault(false)
@@ -2587,7 +2589,10 @@ object SQLConf {
   DeprecatedConfig(ARROW_FALLBACK_ENABLED.key, "3.0",
 s"Use '${ARROW_PYSPARK_FALLBACK_ENABLED.key}' instead of it."),
   DeprecatedConfig(SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE.key, "3.0",
-s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it.")
+s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it."),
+  DeprecatedConfig(OPTIMIZER_METADATA_ONLY.key, "3.0",
+"Avoid to depend on this optimization to prevent a potential 
correctness issue. " +
+  "If you must use, use 'SparkSessionExtensions' instead to inject it 
as a custom rule.")
 )
 
 Map(configs.map { cfg => cfg.key -> cfg } : _*)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r39315 - /dev/spark/v2.4.6-rc1-bin/

2020-05-06 Thread holden
Author: holden
Date: Thu May  7 00:15:41 2020
New Revision: 39315

Log:
Apache Spark v2.4.6-rc1

Added:
dev/spark/v2.4.6-rc1-bin/
dev/spark/v2.4.6-rc1-bin/SparkR_2.4.6.tar.gz   (with props)
dev/spark/v2.4.6-rc1-bin/SparkR_2.4.6.tar.gz.asc
dev/spark/v2.4.6-rc1-bin/SparkR_2.4.6.tar.gz.sha512
dev/spark/v2.4.6-rc1-bin/pyspark-2.4.6.tar.gz   (with props)
dev/spark/v2.4.6-rc1-bin/pyspark-2.4.6.tar.gz.asc
dev/spark/v2.4.6-rc1-bin/pyspark-2.4.6.tar.gz.sha512
dev/spark/v2.4.6-rc1-bin/spark-2.4.6-bin-hadoop2.6.tgz   (with props)
dev/spark/v2.4.6-rc1-bin/spark-2.4.6-bin-hadoop2.6.tgz.asc
dev/spark/v2.4.6-rc1-bin/spark-2.4.6-bin-hadoop2.6.tgz.sha512
dev/spark/v2.4.6-rc1-bin/spark-2.4.6-bin-hadoop2.7.tgz   (with props)
dev/spark/v2.4.6-rc1-bin/spark-2.4.6-bin-hadoop2.7.tgz.asc
dev/spark/v2.4.6-rc1-bin/spark-2.4.6-bin-hadoop2.7.tgz.sha512
    dev/spark/v2.4.6-rc1-bin/spark-2.4.6-bin-without-hadoop-scala-2.12.tgz   (with props)
dev/spark/v2.4.6-rc1-bin/spark-2.4.6-bin-without-hadoop-scala-2.12.tgz.asc

dev/spark/v2.4.6-rc1-bin/spark-2.4.6-bin-without-hadoop-scala-2.12.tgz.sha512
dev/spark/v2.4.6-rc1-bin/spark-2.4.6-bin-without-hadoop.tgz   (with props)
dev/spark/v2.4.6-rc1-bin/spark-2.4.6-bin-without-hadoop.tgz.asc
dev/spark/v2.4.6-rc1-bin/spark-2.4.6-bin-without-hadoop.tgz.sha512
dev/spark/v2.4.6-rc1-bin/spark-2.4.6.tgz   (with props)
dev/spark/v2.4.6-rc1-bin/spark-2.4.6.tgz.asc
dev/spark/v2.4.6-rc1-bin/spark-2.4.6.tgz.sha512

Added: dev/spark/v2.4.6-rc1-bin/SparkR_2.4.6.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v2.4.6-rc1-bin/SparkR_2.4.6.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v2.4.6-rc1-bin/SparkR_2.4.6.tar.gz.asc
==
--- dev/spark/v2.4.6-rc1-bin/SparkR_2.4.6.tar.gz.asc (added)
+++ dev/spark/v2.4.6-rc1-bin/SparkR_2.4.6.tar.gz.asc Thu May  7 00:15:41 2020
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+Version: GnuPG v1
+
+iQIcBAABAgAGBQJes02vAAoJEMQatnRCT8P7kXAQAKo3zVlmUkqu77cpPBx0gEvi
+i+00BJM4iTHIEUKXcaLc/g2EbRMOuMBZQazh2NAA9P3gg6zmgTKAvsfojHE9786G
+/4MUPcFVj+q6Brr/gqPcuRK3NMU7mWWUP1TpR2U4BDaOU9AwAvn6pOrYOMedWiCP
+bU9db8lEM6MVYwQ3VrX//DDeoYWBkYACdIdevhlMckemItbS7zr6JPyIFzPVWJqx
+83s/CR73GdTxAUctuj5pKx7OXKy0sSWmqDedOOZc3lduG6KMSU+CSUB58OFvBcwr
+Bpa98CmqA/HIT0vonplZk1S82nFa2m9+nE2UatMxnueDeV7A0oUIO/kVVlry07KZ
+5rRgCQaqt/YKBu2LFVNQuOyMGkP/yFC2mkjLMyOy9ZqDKJVllzacrwJuwZCpkTzx
+9ugAF31ZdtGgB9E2+MTzg4H2+twJkooVWNq6rX6NMPgfRfwD7zKex1tnt5rXwAFO
+fi9pZGv7jZQUBI5qIEqB8RPVb7t0QYxRJJ18a/VOnJnua+hhlxZ7kZKnc4SXF90d
+ciPFk9v7RT7fpLIVIgkOi5Qo/8AHJ4hVqKZ6ckLgTiFfnhRY/DckE++C2Ran+t6M
+OBXYICU53RTOspGVtEZity3jiV9gwS/qQxUr8KeR/TpzfWWs35jxwdEIsCwUHkcJ
+kiucRsXETRDCH8fo8JTr
+=AXpN
+-END PGP SIGNATURE-

Added: dev/spark/v2.4.6-rc1-bin/SparkR_2.4.6.tar.gz.sha512
==
--- dev/spark/v2.4.6-rc1-bin/SparkR_2.4.6.tar.gz.sha512 (added)
+++ dev/spark/v2.4.6-rc1-bin/SparkR_2.4.6.tar.gz.sha512 Thu May  7 00:15:41 2020
@@ -0,0 +1,3 @@
+SparkR_2.4.6.tar.gz: C3E07BF7 2AB04F65 1BD91C10 B996338D B205FCB7 C5E8E3D2
+ AED296D5 AD54BCA3 78F0B444 7960807E A593B589 F30B6B3E
+ A66123EE 2880662F 27FFC65B 15C2B6A0

Added: dev/spark/v2.4.6-rc1-bin/pyspark-2.4.6.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v2.4.6-rc1-bin/pyspark-2.4.6.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v2.4.6-rc1-bin/pyspark-2.4.6.tar.gz.asc
==
--- dev/spark/v2.4.6-rc1-bin/pyspark-2.4.6.tar.gz.asc (added)
+++ dev/spark/v2.4.6-rc1-bin/pyspark-2.4.6.tar.gz.asc Thu May  7 00:15:41 2020
@@ -0,0 +1,17 @@
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQIcBAABAgAGBQJes0qJAAoJEMQatnRCT8P7OgMP/jq3eYNdMlFnzbmSqkrw3w01
+GFwq+sBzkz3TAqBIYSkHzzQtB7TpjDGb2c4N4ZWzrhKnOa6TSg5PtTbMiYiysrMX
+7zSe2zxOP9cAa84ADsajvCOljS45YX3N0YmGiKnC5cfAOPlP0nm2sjBXZLatmGLa
+edjv7+fMNz/cZAYTsjE8AqvMKnCXT7l0N6uOXu9ob7gZpiMz/JzI6HrHxYQXsyUN
+t279WCCihLG0cuqePAG8EQQN9r+8TS6/t6RyQ670fQ9nCzNAvid9/t2ozESQJqDu
+kmdT9O/fYQXeBYYK0OLyH0t3tjI/I+8W84AR+ITsJpylsVAPJESHQG+h6aFMwcuW
+8IujUYyQP6opJdIKNQICYr9R14nFcQSP5d20JiYVHOwJa4G7Kyrq5as+X7zVMxa4
+sYWEbY7RrVXRtOivIjprN8NI5Mp3ejhiVdFZ70Vek1PaYDzC0cqH7wDomMHzwK2x
+Z4Hjl0Ez0y2Q4KvO2Od2RdmCu3fKVvg12uCFa+Y2zfjopno4Ke+3PiQFJpPE8KFQ
+yEyNNXIWjFuJNVT+14V7OM2cZ5mWgmQjKHyYaB3uKgrmXAz7DnBtTGOOqTc5NF6Y
+YipoCHFUDHa6wINrdyZg8mVeqSb68ld267Lfs9t9GDEF+y+lTHX6BSsWHhrQv6er
+3YOjNDdFYa3DAIPTiFo/
+=QZRk
+-----END PGP SIGNATURE-----

Added: dev/s

[spark] branch master updated (5c5dd77 -> 3d38bc2)

2020-05-06 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5c5dd77  [SPARK-31647][SQL] Deprecate 
'spark.sql.optimizer.metadataOnly' configuration
 add 3d38bc2  [SPARK-31361][SQL][FOLLOWUP] Use 
LEGACY_PARQUET_REBASE_DATETIME_IN_READ instead of avro config in ParquetIOSuite

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/parquet/ParquetIOSuite.scala  | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r39316 - in /dev/spark/v2.4.6-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark

2020-05-06 Thread holden
Author: holden
Date: Thu May  7 00:48:34 2020
New Revision: 39316

Log:
Apache Spark v2.4.6-rc1 docs


[This commit notification would consist of 1458 parts, 
which exceeds the limit of 50, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31361][SQL][FOLLOWUP] Use LEGACY_PARQUET_REBASE_DATETIME_IN_READ instead of avro config in ParquetIOSuite

2020-05-06 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 43d8d54  [SPARK-31361][SQL][FOLLOWUP] Use 
LEGACY_PARQUET_REBASE_DATETIME_IN_READ instead of avro config in ParquetIOSuite
43d8d54 is described below

commit 43d8d54e8beab25dbdf75ca93943f774b93297ea
Author: Max Gekk 
AuthorDate: Thu May 7 09:46:42 2020 +0900

[SPARK-31361][SQL][FOLLOWUP] Use LEGACY_PARQUET_REBASE_DATETIME_IN_READ 
instead of avro config in ParquetIOSuite

### What changes were proposed in this pull request?
Replace the Avro SQL config `LEGACY_AVRO_REBASE_DATETIME_IN_READ` with 
`LEGACY_PARQUET_REBASE_DATETIME_IN_READ` in `ParquetIOSuite`.

### Why are the changes needed?
The Avro config is not relevant to the Parquet tests.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running `ParquetIOSuite` via
```
./build/sbt "test:testOnly *ParquetIOSuite"
```

Closes #28461 from MaxGekk/fix-conf-in-ParquetIOSuite.

Authored-by: Max Gekk 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 3d38bc2605ab01d61127c09e1bf6ed6a6683ed3e)
Signed-off-by: HyukjinKwon 
---
 .../spark/sql/execution/datasources/parquet/ParquetIOSuite.scala  | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
index 239db7d..7f0a228 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
@@ -955,7 +955,7 @@ class ParquetIOSuite extends QueryTest with ParquetTest 
with SharedSparkSession
 // The file metadata indicates if it needs rebase or not, so we 
can always get the
 // correct result regardless of the "rebaseInRead" config.
 Seq(true, false).foreach { rebase =>
-  withSQLConf(SQLConf.LEGACY_AVRO_REBASE_DATETIME_IN_READ.key -> 
rebase.toString) {
+  withSQLConf(SQLConf.LEGACY_PARQUET_REBASE_DATETIME_IN_READ.key 
-> rebase.toString) {
 checkAnswer(spark.read.parquet(path), 
Row(Timestamp.valueOf(tsStr)))
   }
 }
@@ -984,7 +984,7 @@ class ParquetIOSuite extends QueryTest with ParquetTest 
with SharedSparkSession
   // The file metadata indicates if it needs rebase or not, so we can 
always get the correct
   // result regardless of the "rebaseInRead" config.
   Seq(true, false).foreach { rebase =>
-withSQLConf(SQLConf.LEGACY_AVRO_REBASE_DATETIME_IN_READ.key -> 
rebase.toString) {
+withSQLConf(SQLConf.LEGACY_PARQUET_REBASE_DATETIME_IN_READ.key -> 
rebase.toString) {
   checkAnswer(spark.read.parquet(path), 
Row(Date.valueOf("1001-01-01")))
 }
   }
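
For readers skimming the diff above, the change relies on the suite's `withSQLConf` test helper. The following is a minimal sketch of what such a helper does, assuming a `spark` session in scope (as with `SharedSparkSession`); it illustrates the pattern, not the suite's actual implementation:

```scala
// Sketch (assumption): set the given SQL confs, run the body, then restore the
// previous values so the temporary settings cannot leak into other tests.
def withSQLConf(pairs: (String, String)*)(body: => Unit): Unit = {
  val conf = spark.sessionState.conf
  val previous = pairs.map { case (key, _) => key -> conf.getConfString(key, null) }
  pairs.foreach { case (key, value) => conf.setConfString(key, value) }
  try body finally previous.foreach {
    case (key, null)  => conf.unsetConf(key)           // was unset before: clear it again
    case (key, value) => conf.setConfString(key, value)
  }
}
```

Restoring the old values in `finally` keeps the `Seq(true, false)` iterations above independent of each other.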


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (3d38bc2 -> 9bf7387)

2020-05-06 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3d38bc2  [SPARK-31361][SQL][FOLLOWUP] Use 
LEGACY_PARQUET_REBASE_DATETIME_IN_READ instead of avro config in ParquetIOSuite
 add 9bf7387  [SPARK-31365][SQL][FOLLOWUP] Refine config document for 
nested predicate pushdown

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31365][SQL][FOLLOWUP] Refine config document for nested predicate pushdown

2020-05-06 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new dc7324e  [SPARK-31365][SQL][FOLLOWUP] Refine config document for 
nested predicate pushdown
dc7324e is described below

commit dc7324e5e39783995b90e64d4737127c10a210cf
Author: Liang-Chi Hsieh 
AuthorDate: Thu May 7 09:57:08 2020 +0900

[SPARK-31365][SQL][FOLLOWUP] Refine config document for nested predicate 
pushdown

### What changes were proposed in this pull request?

This is a followup to address 
https://github.com/apache/spark/pull/28366#discussion_r420611872 by refining 
the SQL config document.

### Why are the changes needed?

Make it less confusing for developers.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Only doc change.

Closes #28468 from viirya/SPARK-31365-followup.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 9bf738724a3895551464d8ba0d455bc90868983f)
Signed-off-by: Takeshi Yamamuro 
---
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 8d673c5..6c18280 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -2070,7 +2070,8 @@ object SQLConf {
   .internal()
   .doc("A comma-separated list of data source short names or fully 
qualified data source " +
 "implementation class names for which Spark tries to push down 
predicates for nested " +
-"columns and/or names containing `dots` to data sources. Currently, 
Parquet implements " +
+"columns and/or names containing `dots` to data sources. This 
configuration is only " +
+"effective with file-based data source in DSv1. Currently, Parquet 
implements " +
 "both optimizations while ORC only supports predicates for names 
containing `dots`. The " +
 "other data sources don't support this feature yet. So the default 
value is 'parquet,orc'.")
   .version("3.0.0")
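
As a usage illustration of the documented list, a minimal sketch follows; the config key, path, and column names are assumptions made for the sketch, while the value format and the 'parquet,orc' default come from the doc text in the diff above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object NestedPushdownSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("nested-pushdown-sketch").getOrCreate()

    // Assumed key for the entry documented above; only sources named in the list
    // take part in nested-column / dotted-name predicate pushdown (DSv1 file sources).
    spark.conf.set("spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources", "parquet,orc")

    // With parquet listed, a filter on a nested field can be pushed into the file
    // scan instead of being evaluated after the rows have already been read.
    val events = spark.read.parquet("/tmp/events") // illustrative path
    events.filter(col("payload.userId") === 42).show()

    spark.stop()
  }
}
```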


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (9bf7387 -> 052ff49)

2020-05-06 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9bf7387  [SPARK-31365][SQL][FOLLOWUP] Refine config document for 
nested predicate pushdown
 add 052ff49  [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input 
vectors

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ml/impl/Utils.scala |  16 +
 .../ml/classification/LogisticRegression.scala | 812 -
 .../ml/optim/aggregator/HingeAggregator.scala  |  16 +-
 .../ml/optim/aggregator/LogisticAggregator.scala   | 246 ++-
 .../classification/LogisticRegressionSuite.scala   |  29 +
 .../ml/optim/aggregator/HingeAggregatorSuite.scala |   4 +-
 .../optim/aggregator/LogisticAggregatorSuite.scala |  69 +-
 python/pyspark/ml/classification.py|  26 +-
 8 files changed, 856 insertions(+), 362 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry

2020-05-06 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bd6b53c  [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc 
which throws 'address in use' BindException with retry
bd6b53c is described below

commit bd6b53cc0ba93f7f1ff8e00ccc366cd02a24d72a
Author: Kent Yao 
AuthorDate: Thu May 7 14:37:03 2020 +0900

[SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 
'address in use' BindException with retry

### What changes were proposed in this pull request?
The `Kafka*Suite`s are flaky because of the Hadoop MiniKdc issue - 
https://issues.apache.org/jira/browse/HADOOP-12656
> Looking at MiniKdc implementation, if port is 0, the constructor use 
ServerSocket to find an unused port, assign the port number to the member 
variable port and close the ServerSocket object; later, in initKDCServer(), 
instantiate a TcpTransport object and bind at that port.

> It appears that the port may be used in between, and then throw the 
exception.

Related test failures are suspected to share this cause, such as 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/

```scala
[info] org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED 
*** (15 seconds, 426 milliseconds)
[info]   java.net.BindException: Address already in use
[info]   at sun.nio.ch.Net.bind0(Native Method)
[info]   at sun.nio.ch.Net.bind(Net.java:433)
[info]   at sun.nio.ch.Net.bind(Net.java:425)
[info]   at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
[info]   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
[info]   at 
org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:198)
[info]   at 
org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:51)
[info]   at 
org.apache.mina.core.polling.AbstractPollingIoAcceptor.registerHandles(AbstractPollingIoAcceptor.java:547)
[info]   at 
org.apache.mina.core.polling.AbstractPollingIoAcceptor.access$400(AbstractPollingIoAcceptor.java:68)
[info]   at 
org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:422)
[info]   at 
org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
[info]   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:748)
```
After comparing the error stack trace with similar issues reported in 
different projects, such as
https://issues.apache.org/jira/browse/KAFKA-3453
https://issues.apache.org/jira/browse/HBASE-14734

we can be confident that they are caused by the same problem reported in 
HADOOP-12656.

In this PR, we apply the approach from HBase first, before we eventually drop 
Hadoop 2.7.x.

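Below is a minimal sketch of that retry idea, written against the standard Hadoop `MiniKdc` test API and the scalatest `Eventually`/`SpanSugar` helpers that the diff further down imports; the wrapper name and timings are assumptions for the sketch, not the exact code of this patch:

```scala
import java.io.File

import scala.util.control.NonFatal

import org.apache.hadoop.minikdc.MiniKdc
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

object KdcRetrySketch {
  // Retry KDC startup so that losing the port race (HADOOP-12656) costs a retry
  // instead of aborting the whole suite with "Address already in use".
  def startKdcWithRetry(workDir: File): MiniKdc = {
    var kdc: MiniKdc = null
    eventually(timeout(60.seconds), interval(1.second)) {
      // Port 0 makes MiniKdc probe a free port with a ServerSocket and close it;
      // another process can grab that port before initKDCServer() binds to it.
      val attempt = new MiniKdc(MiniKdc.createConf(), workDir)
      try {
        attempt.start()
        kdc = attempt
      } catch {
        case NonFatal(e) =>
          attempt.stop() // clean up before eventually() schedules the next attempt
          throw e
      }
    }
    kdc
  }
}
```
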
### Why are the changes needed?

fix test flakiness

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?

The test itself passes in Jenkins.

Closes #28442 from yaooqinn/SPARK-31631.

Authored-by: Kent Yao 
Signed-off-by: Takeshi Yamamuro 
---
 .../HadoopDelegationTokenManagerSuite.scala| 30 --
 .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 29 ++---
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala
 
b/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala
index 275bca3..fc28968 100644
--- 
a/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala
@@ -19,10 +19,14 @@ package org.apache.spark.deploy.security
 
 import java.security.PrivilegedExceptionAction
 
+import scala.util.control.NonFatal
+
 import org.apache.hadoop.conf.Configuration
 import 
org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION
 import org.apache.hadoop.minikdc.MiniKdc
 import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+import org.scalatest.concurrent.Eventually._
+import org.scalatest.time.SpanSugar._
 
 import org.apache.spark.{SparkConf, SparkFunSuite}
 import org.apache.spark.deploy.SparkHadoopUtil
@@ -88,8 +92,30 @@ class HadoopDelegationTokenManagerSuite extends 
SparkFunSuite {
   // krb5.conf. MiniKdc sets "java.security.krb5.conf" 

[spark] branch master updated (052ff49 -> bd6b53c)

2020-05-06 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 052ff49  [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input 
vectors
 add bd6b53c  [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc 
which throws 'address in use' BindException with retry

No new revisions were added by this update.

Summary of changes:
 .../HadoopDelegationTokenManagerSuite.scala| 30 --
 .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 29 ++---
 2 files changed, 54 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (bd6b53c -> b31ae7b)

2020-05-06 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bd6b53c  [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc 
which throws 'address in use' BindException with retry
 add b31ae7b  [SPARK-31615][SQL] Pretty string output for sql method of 
RuntimeReplaceable expressions

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/test_context.py   |   2 +-
 .../sql/catalyst/expressions/Expression.scala  |  14 ++
 .../catalyst/expressions/datetimeExpressions.scala |  31 ++-
 .../sql/catalyst/expressions/nullExpressions.scala |  10 +-
 .../catalyst/expressions/stringExpressions.scala   |   4 +-
 .../apache/spark/sql/catalyst/util/package.scala   |   2 +
 .../sql-functions/sql-expression-schema.md |  16 +-
 .../test/resources/sql-tests/inputs/extract.sql|   5 +
 .../sql-tests/results/ansi/datetime.sql.out|  64 +++---
 .../sql-tests/results/ansi/interval.sql.out|   8 +-
 .../sql-tests/results/csv-functions.sql.out|   2 +-
 .../resources/sql-tests/results/datetime.sql.out   |  64 +++---
 .../resources/sql-tests/results/extract.sql.out| 214 -
 .../sql-tests/results/group-by-filter.sql.out  |  12 +-
 .../resources/sql-tests/results/interval.sql.out   |  10 +-
 .../sql-tests/results/json-functions.sql.out   |   2 +-
 .../sql-tests/results/postgreSQL/text.sql.out  |   6 +-
 .../sql-tests/results/predicate-functions.sql.out  |  26 +--
 .../results/sql-compatibility-functions.sql.out|  18 +-
 .../sql-tests/results/string-functions.sql.out |   8 +-
 .../typeCoercion/native/dateTimeOperations.sql.out |   8 +-
 .../native/stringCastAndExpressions.sql.out|   4 +-
 .../scala/org/apache/spark/sql/ExplainSuite.scala  |   6 +-
 23 files changed, 292 insertions(+), 244 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org