spark git commit: [SPARK-20040][ML][PYTHON] pyspark wrapper for ChiSquareTest
Repository: spark
Updated Branches: refs/heads/master 7d432af8f -> a5c87707e

[SPARK-20040][ML][PYTHON] pyspark wrapper for ChiSquareTest

## What changes were proposed in this pull request?

A pyspark wrapper for spark.ml.stat.ChiSquareTest.

## How was this patch tested?

unit tests
doctests

Author: Bago Amirbekian

Closes #17421 from MrBago/chiSquareTestWrapper.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a5c87707
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a5c87707
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a5c87707

Branch: refs/heads/master
Commit: a5c87707eaec5cacdfb703eb396dfc264bc54cda
Parents: 7d432af
Author: Bago Amirbekian
Authored: Tue Mar 28 19:19:16 2017 -0700
Committer: Joseph K. Bradley
Committed: Tue Mar 28 19:19:16 2017 -0700

--
 dev/sparktestsupport/modules.py               |  1 +
 .../apache/spark/ml/stat/ChiSquareTest.scala  |  6 +-
 python/docs/pyspark.ml.rst                    |  8 ++
 python/pyspark/ml/stat.py                     | 93
 python/pyspark/ml/tests.py                    | 31 +--
 5 files changed, 127 insertions(+), 12 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/a5c87707/dev/sparktestsupport/modules.py
--
diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index eaf1f3a..246f518 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -431,6 +431,7 @@ pyspark_ml = Module(
         "pyspark.ml.linalg.__init__",
         "pyspark.ml.recommendation",
         "pyspark.ml.regression",
+        "pyspark.ml.stat",
         "pyspark.ml.tuning",
         "pyspark.ml.tests",
     ],

http://git-wip-us.apache.org/repos/asf/spark/blob/a5c87707/mllib/src/main/scala/org/apache/spark/ml/stat/ChiSquareTest.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/ml/stat/ChiSquareTest.scala b/mllib/src/main/scala/org/apache/spark/ml/stat/ChiSquareTest.scala
index 21eba9a..5b38ca7 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/stat/ChiSquareTest.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/stat/ChiSquareTest.scala
@@ -46,9 +46,9 @@ object ChiSquareTest {
     statistics: Vector)

   /**
-   * Conduct Pearson's independence test for every feature against the label across the input RDD.
-   * For each feature, the (feature, label) pairs are converted into a contingency matrix for which
-   * the Chi-squared statistic is computed. All label and feature values must be categorical.
+   * Conduct Pearson's independence test for every feature against the label. For each feature, the
+   * (feature, label) pairs are converted into a contingency matrix for which the Chi-squared
+   * statistic is computed. All label and feature values must be categorical.
    *
    * The null hypothesis is that the occurrence of the outcomes is statistically independent.
    *

http://git-wip-us.apache.org/repos/asf/spark/blob/a5c87707/python/docs/pyspark.ml.rst
--
diff --git a/python/docs/pyspark.ml.rst b/python/docs/pyspark.ml.rst
index a681834..930646d 100644
--- a/python/docs/pyspark.ml.rst
+++ b/python/docs/pyspark.ml.rst
@@ -65,6 +65,14 @@ pyspark.ml.regression module
     :undoc-members:
     :inherited-members:

+pyspark.ml.stat module
+----------------------
+
+.. automodule:: pyspark.ml.stat
+    :members:
+    :undoc-members:
+    :inherited-members:
+
 pyspark.ml.tuning module

http://git-wip-us.apache.org/repos/asf/spark/blob/a5c87707/python/pyspark/ml/stat.py
--
diff --git a/python/pyspark/ml/stat.py b/python/pyspark/ml/stat.py
new file mode 100644
index 000..db043ff
--- /dev/null
+++ b/python/pyspark/ml/stat.py
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
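For readers who want to try the new module: the Python wrapper mirrors the Scala `ChiSquareTest.test(dataset, featuresCol, labelCol)` API touched above. A minimal Scala sketch of the equivalent call (the toy data and a SparkSession named `spark` are assumptions for illustration, not part of the patch):

    import org.apache.spark.ml.linalg.{Vector, Vectors}
    import org.apache.spark.ml.stat.ChiSquareTest
    import spark.implicits._

    // Toy categorical data: (label, features) pairs.
    val df = Seq(
      (0.0, Vectors.dense(0.5, 10.0)),
      (0.0, Vectors.dense(1.5, 20.0)),
      (1.0, Vectors.dense(1.5, 30.0)),
      (1.0, Vectors.dense(3.5, 40.0))
    ).toDF("label", "features")

    // Returns a one-row DataFrame with pValues, degreesOfFreedom and statistics.
    val chi = ChiSquareTest.test(df, "features", "label").head
    println(s"pValues = ${chi.getAs[Vector](0)}")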
spark git commit: [SPARK-20043][ML] DecisionTreeModel: ImpurityCalculator builder fails for uppercase impurity type Gini
Repository: spark
Updated Branches: refs/heads/branch-2.1 4964dbedb -> 30954806f

[SPARK-20043][ML] DecisionTreeModel: ImpurityCalculator builder fails for uppercase impurity type Gini

Fix bug: DecisionTreeModel can't recognize impurity "Gini" when loading.

TODO:
+ [x] add unit test
+ [x] fix the bug

Author: 颜发才 (Yan Facai)

Closes #17407 from facaiy/BUG/decision_tree_loader_failer_with_Gini_impurity.

(cherry picked from commit 7d432af8f3c47973550ea253dae0c23cd2961bde)
Signed-off-by: Joseph K. Bradley

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/30954806
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/30954806
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/30954806

Branch: refs/heads/branch-2.1
Commit: 30954806f1be0dba63f0a608d824d7d811485801
Parents: 4964dbe
Author: 颜发才 (Yan Facai)
Authored: Tue Mar 28 16:14:01 2017 -0700
Committer: Joseph K. Bradley
Committed: Tue Mar 28 16:14:11 2017 -0700

--
 .../apache/spark/mllib/tree/impurity/Impurity.scala   |  2 +-
 .../classification/DecisionTreeClassifierSuite.scala  | 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/30954806/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala b/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala
index a5bdc2c..98a3021 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala
@@ -184,7 +184,7 @@ private[spark] object ImpurityCalculator {
    * the given stats.
    */
   def getCalculator(impurity: String, stats: Array[Double]): ImpurityCalculator = {
-    impurity match {
+    impurity.toLowerCase match {
       case "gini" => new GiniCalculator(stats)
       case "entropy" => new EntropyCalculator(stats)
       case "variance" => new VarianceCalculator(stats)

http://git-wip-us.apache.org/repos/asf/spark/blob/30954806/mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
--
diff --git a/mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
index c711e7f..692a172 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
@@ -383,6 +383,20 @@ class DecisionTreeClassifierSuite
     testEstimatorAndModelReadWrite(dt, continuousData, allParamSettings ++ Map("maxDepth" -> 0),
       checkModelData)
   }
+
+  test("SPARK-20043: " +
+    "ImpurityCalculator builder fails for uppercase impurity type Gini in model read/write") {
+    val rdd = TreeTests.getTreeReadWriteData(sc)
+    val data: DataFrame =
+      TreeTests.setMetadata(rdd, Map.empty[Int, Int], numClasses = 2)
+
+    val dt = new DecisionTreeClassifier()
+      .setImpurity("Gini")
+      .setMaxDepth(2)
+    val model = dt.fit(data)
+
+    testDefaultReadWrite(model)
+  }
 }

 private[ml] object DecisionTreeClassifierSuite extends SparkFunSuite {
spark git commit: [SPARK-20043][ML] DecisionTreeModel: ImpurityCalculator builder fails for uppercase impurity type Gini
Repository: spark
Updated Branches: refs/heads/master 92e385e0b -> 7d432af8f

[SPARK-20043][ML] DecisionTreeModel: ImpurityCalculator builder fails for uppercase impurity type Gini

Fix bug: DecisionTreeModel can't recognize impurity "Gini" when loading.

TODO:
+ [x] add unit test
+ [x] fix the bug

Author: 颜发才 (Yan Facai)

Closes #17407 from facaiy/BUG/decision_tree_loader_failer_with_Gini_impurity.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7d432af8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7d432af8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7d432af8

Branch: refs/heads/master
Commit: 7d432af8f3c47973550ea253dae0c23cd2961bde
Parents: 92e385e
Author: 颜发才 (Yan Facai)
Authored: Tue Mar 28 16:14:01 2017 -0700
Committer: Joseph K. Bradley
Committed: Tue Mar 28 16:14:01 2017 -0700

--
 .../apache/spark/mllib/tree/impurity/Impurity.scala   |  2 +-
 .../classification/DecisionTreeClassifierSuite.scala  | 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/7d432af8/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala b/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala
index a5bdc2c..98a3021 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala
@@ -184,7 +184,7 @@ private[spark] object ImpurityCalculator {
    * the given stats.
    */
   def getCalculator(impurity: String, stats: Array[Double]): ImpurityCalculator = {
-    impurity match {
+    impurity.toLowerCase match {
       case "gini" => new GiniCalculator(stats)
       case "entropy" => new EntropyCalculator(stats)
       case "variance" => new VarianceCalculator(stats)

http://git-wip-us.apache.org/repos/asf/spark/blob/7d432af8/mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
--
diff --git a/mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
index 10de503..964fcfb 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
@@ -385,6 +385,20 @@ class DecisionTreeClassifierSuite
     testEstimatorAndModelReadWrite(dt, continuousData, allParamSettings ++ Map("maxDepth" -> 0),
       allParamSettings ++ Map("maxDepth" -> 0), checkModelData)
   }
+
+  test("SPARK-20043: " +
+    "ImpurityCalculator builder fails for uppercase impurity type Gini in model read/write") {
+    val rdd = TreeTests.getTreeReadWriteData(sc)
+    val data: DataFrame =
+      TreeTests.setMetadata(rdd, Map.empty[Int, Int], numClasses = 2)
+
+    val dt = new DecisionTreeClassifier()
+      .setImpurity("Gini")
+      .setMaxDepth(2)
+    val model = dt.fit(data)
+
+    testDefaultReadWrite(model)
+  }
 }

 private[ml] object DecisionTreeClassifierSuite extends SparkFunSuite {
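The one-line fix above is simply case normalization before pattern matching. A self-contained sketch of the idea (simplified: the real builders take a stats array and return calculator objects):

    def impurityName(impurity: String): String = impurity.toLowerCase match {
      case "gini"     => "GiniCalculator"
      case "entropy"  => "EntropyCalculator"
      case "variance" => "VarianceCalculator"
      case other      => throw new IllegalArgumentException(s"Unsupported impurity: $other")
    }

    impurityName("Gini")  // previously failed on model load; now resolves to "GiniCalculator"

Persisted models store the impurity string as the user typed it, so the loader has to accept "Gini" as well as "gini".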
[1/2] spark git commit: Preparing Spark release v2.1.1-rc2
Repository: spark
Updated Branches: refs/heads/branch-2.1 e669dd7ea -> 4964dbedb

Preparing Spark release v2.1.1-rc2

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/02b165dc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/02b165dc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/02b165dc

Branch: refs/heads/branch-2.1
Commit: 02b165dcc2ee5245d1293a375a31660c9d4e1fa6
Parents: e669dd7
Author: Patrick Wendell
Authored: Tue Mar 28 14:29:03 2017 -0700
Committer: Patrick Wendell
Committed: Tue Mar 28 14:29:03 2017 -0700

--
 R/pkg/DESCRIPTION                          | 2 +-
 assembly/pom.xml                           | 2 +-
 common/network-common/pom.xml              | 2 +-
 common/network-shuffle/pom.xml             | 2 +-
 common/network-yarn/pom.xml                | 2 +-
 common/sketch/pom.xml                      | 2 +-
 common/tags/pom.xml                        | 2 +-
 common/unsafe/pom.xml                      | 2 +-
 core/pom.xml                               | 2 +-
 docs/_config.yml                           | 4 ++--
 examples/pom.xml                           | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml            | 2 +-
 external/flume-sink/pom.xml                | 2 +-
 external/flume/pom.xml                     | 2 +-
 external/java8-tests/pom.xml               | 2 +-
 external/kafka-0-10-assembly/pom.xml       | 2 +-
 external/kafka-0-10-sql/pom.xml            | 2 +-
 external/kafka-0-10/pom.xml                | 2 +-
 external/kafka-0-8-assembly/pom.xml        | 2 +-
 external/kafka-0-8/pom.xml                 | 2 +-
 external/kinesis-asl-assembly/pom.xml      | 2 +-
 external/kinesis-asl/pom.xml               | 2 +-
 external/spark-ganglia-lgpl/pom.xml        | 2 +-
 graphx/pom.xml                             | 2 +-
 launcher/pom.xml                           | 2 +-
 mesos/pom.xml                              | 2 +-
 mllib-local/pom.xml                        | 2 +-
 mllib/pom.xml                              | 2 +-
 pom.xml                                    | 2 +-
 python/pyspark/version.py                  | 2 +-
 repl/pom.xml                               | 2 +-
 sql/catalyst/pom.xml                       | 2 +-
 sql/core/pom.xml                           | 2 +-
 sql/hive-thriftserver/pom.xml              | 2 +-
 sql/hive/pom.xml                           | 2 +-
 streaming/pom.xml                          | 2 +-
 tools/pom.xml                              | 2 +-
 yarn/pom.xml                               | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/02b165dc/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 2d461ca..1ceda7b 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.1.2
+Version: 2.1.1
 Title: R Frontend for Apache Spark
 Description: The SparkR package provides an R Frontend for Apache Spark.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),

http://git-wip-us.apache.org/repos/asf/spark/blob/02b165dc/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 6e092ef..cc290c0 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   org.apache.spark
   spark-parent_2.11
-  2.1.2-SNAPSHOT
+  2.1.1
   ../pom.xml

http://git-wip-us.apache.org/repos/asf/spark/blob/02b165dc/common/network-common/pom.xml
--
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 77a4b64..ccf4b27 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   org.apache.spark
   spark-parent_2.11
-  2.1.2-SNAPSHOT
+  2.1.1
   ../../pom.xml

http://git-wip-us.apache.org/repos/asf/spark/blob/02b165dc/common/network-shuffle/pom.xml
--
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 1a2d85a..98a2324 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   org.apache.spark
   spark-parent_2.11
-  2.1.2-SNAPSHOT
+  2.1.1
   ../../pom.xml

http://git-wip-us.apache.org/repos/asf/spark/blob/02b165dc/common/network-yarn/pom.xml
--
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 7a57e89..dc1ad14 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@
[2/2] spark git commit: Preparing development version 2.1.2-SNAPSHOT
Preparing development version 2.1.2-SNAPSHOT

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4964dbed
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4964dbed
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4964dbed

Branch: refs/heads/branch-2.1
Commit: 4964dbedbdc2a127b2d5afb6ec11043a62a6e6f6
Parents: 02b165d
Author: Patrick Wendell
Authored: Tue Mar 28 14:29:08 2017 -0700
Committer: Patrick Wendell
Committed: Tue Mar 28 14:29:08 2017 -0700

--
 R/pkg/DESCRIPTION                          | 2 +-
 assembly/pom.xml                           | 2 +-
 common/network-common/pom.xml              | 2 +-
 common/network-shuffle/pom.xml             | 2 +-
 common/network-yarn/pom.xml                | 2 +-
 common/sketch/pom.xml                      | 2 +-
 common/tags/pom.xml                        | 2 +-
 common/unsafe/pom.xml                      | 2 +-
 core/pom.xml                               | 2 +-
 docs/_config.yml                           | 4 ++--
 examples/pom.xml                           | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml            | 2 +-
 external/flume-sink/pom.xml                | 2 +-
 external/flume/pom.xml                     | 2 +-
 external/java8-tests/pom.xml               | 2 +-
 external/kafka-0-10-assembly/pom.xml       | 2 +-
 external/kafka-0-10-sql/pom.xml            | 2 +-
 external/kafka-0-10/pom.xml                | 2 +-
 external/kafka-0-8-assembly/pom.xml        | 2 +-
 external/kafka-0-8/pom.xml                 | 2 +-
 external/kinesis-asl-assembly/pom.xml      | 2 +-
 external/kinesis-asl/pom.xml               | 2 +-
 external/spark-ganglia-lgpl/pom.xml        | 2 +-
 graphx/pom.xml                             | 2 +-
 launcher/pom.xml                           | 2 +-
 mesos/pom.xml                              | 2 +-
 mllib-local/pom.xml                        | 2 +-
 mllib/pom.xml                              | 2 +-
 pom.xml                                    | 2 +-
 python/pyspark/version.py                  | 2 +-
 repl/pom.xml                               | 2 +-
 sql/catalyst/pom.xml                       | 2 +-
 sql/core/pom.xml                           | 2 +-
 sql/hive-thriftserver/pom.xml              | 2 +-
 sql/hive/pom.xml                           | 2 +-
 streaming/pom.xml                          | 2 +-
 tools/pom.xml                              | 2 +-
 yarn/pom.xml                               | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/4964dbed/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 1ceda7b..2d461ca 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.1.1
+Version: 2.1.2
 Title: R Frontend for Apache Spark
 Description: The SparkR package provides an R Frontend for Apache Spark.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),

http://git-wip-us.apache.org/repos/asf/spark/blob/4964dbed/assembly/pom.xml
--
diff --git a/assembly/pom.xml b/assembly/pom.xml
index cc290c0..6e092ef 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   org.apache.spark
   spark-parent_2.11
-  2.1.1
+  2.1.2-SNAPSHOT
   ../pom.xml

http://git-wip-us.apache.org/repos/asf/spark/blob/4964dbed/common/network-common/pom.xml
--
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index ccf4b27..77a4b64 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   org.apache.spark
   spark-parent_2.11
-  2.1.1
+  2.1.2-SNAPSHOT
   ../../pom.xml

http://git-wip-us.apache.org/repos/asf/spark/blob/4964dbed/common/network-shuffle/pom.xml
--
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 98a2324..1a2d85a 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   org.apache.spark
   spark-parent_2.11
-  2.1.1
+  2.1.2-SNAPSHOT
   ../../pom.xml

http://git-wip-us.apache.org/repos/asf/spark/blob/4964dbed/common/network-yarn/pom.xml
--
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index dc1ad14..7a57e89 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   org.apache.spark
   spark-parent_2.11
-  2.1.1
+
[spark] Git Push Summary
Repository: spark
Updated Tags: refs/tags/v2.1.1-rc2 [created] 02b165dcc
spark git commit: [SPARK-14536][SQL][BACKPORT-2.1] fix to handle null value in array type column for postgres.
Repository: spark
Updated Branches: refs/heads/branch-2.1 fd2e40614 -> e669dd7ea

[SPARK-14536][SQL][BACKPORT-2.1] fix to handle null value in array type column for postgres.

## What changes were proposed in this pull request?

JDBC read fails with an NPE due to a missing null check for the array data type when the source table has null values in an array-typed column. For null values, ResultSet.getArray() returns null. This PR adds a null-safe check on the ResultSet.getArray() value before invoking methods on the Array object.

## How was this patch tested?

Updated the PostgresIntegration test suite to test null values. Ran docker integration tests on my laptop.

Author: sureshthalamati

Closes #17460 from sureshthalamati/jdbc_array_null_fix_spark_2.1-SPARK-14536.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e669dd7e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e669dd7e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e669dd7e

Branch: refs/heads/branch-2.1
Commit: e669dd7ea474f65fea0d5df011a333bda9de91b4
Parents: fd2e406
Author: sureshthalamati
Authored: Tue Mar 28 14:02:01 2017 -0700
Committer: Xiao Li
Committed: Tue Mar 28 14:02:01 2017 -0700

--
 .../spark/sql/jdbc/PostgresIntegrationSuite.scala    | 12 ++--
 .../sql/execution/datasources/jdbc/JdbcUtils.scala   |  6 +++---
 2 files changed, 13 insertions(+), 5 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/e669dd7e/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
--
diff --git a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
index c9325de..a1a065a 100644
--- a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
+++ b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
@@ -51,12 +51,17 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
       + "B'1000100101', E'xDEADBEEF', true, '172.16.0.42', '192.168.0.0/16', "
       + """'{1, 2}', '{"a", null, "b"}', '{0.11, 0.22}', '{0.11, 0.22}', 'd1', 1.01, 1)"""
     ).executeUpdate()
+    conn.prepareStatement("INSERT INTO bar VALUES (null, null, null, null, null, "
+      + "null, null, null, null, null, "
+      + "null, null, null, null, null, null, null)"
+    ).executeUpdate()
   }

   test("Type mapping for various types") {
     val df = sqlContext.read.jdbc(jdbcUrl, "bar", new Properties)
-    val rows = df.collect()
-    assert(rows.length == 1)
+    val rows = df.collect().sortBy(_.toString())
+    assert(rows.length == 2)
+    // Test the types, and values using the first row.
     val types = rows(0).toSeq.map(x => x.getClass)
     assert(types.length == 17)
     assert(classOf[String].isAssignableFrom(types(0)))
@@ -96,6 +101,9 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(rows(0).getString(14) == "d1")
     assert(rows(0).getFloat(15) == 1.01f)
     assert(rows(0).getShort(16) == 1)
+
+    // Test reading null values using the second row.
+    assert(0.until(16).forall(rows(1).isNullAt(_)))
   }

   test("Basic write test") {

http://git-wip-us.apache.org/repos/asf/spark/blob/e669dd7e/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
index 41edb65..81fdf69 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
@@ -423,9 +423,9 @@ object JdbcUtils extends Logging {
       }

       (rs: ResultSet, row: InternalRow, pos: Int) =>
-        val array = nullSafeConvert[Object](
-          rs.getArray(pos + 1).getArray,
-          array => new GenericArrayData(elementConversion.apply(array)))
+        val array = nullSafeConvert[java.sql.Array](
+          input = rs.getArray(pos + 1),
+          array => new GenericArrayData(elementConversion.apply(array.getArray)))
         row.update(pos, array)

     case _ => throw new IllegalArgumentException(s"Unsupported type ${dt.simpleString}")
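The underlying pattern: JDBC's `ResultSet.getArray` returns null for SQL NULL, so the handle must be null-checked before `.getArray` is invoked on it. A minimal sketch of that shape (the helper mirrors `nullSafeConvert` in JdbcUtils; the element conversion here is a stand-in):

    import java.sql.ResultSet

    // Apply `f` only when the column value is not SQL NULL.
    def nullSafeConvert[T](input: T, f: T => Any): Any =
      if (input == null) null else f(input)

    def readArrayColumn(rs: ResultSet, pos: Int): Any =
      nullSafeConvert[java.sql.Array](rs.getArray(pos + 1), a => a.getArray)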
spark git commit: [SPARK-20125][SQL] Dataset of type option of map does not work
Repository: spark
Updated Branches: refs/heads/branch-2.1 4bcb7d676 -> fd2e40614

[SPARK-20125][SQL] Dataset of type option of map does not work

When we build the deserializer expression for map type, we will use `StaticInvoke` to call `ArrayBasedMapData.toScalaMap`, and declare the return type as `scala.collection.immutable.Map`. If the map is inside an Option, we will wrap this `StaticInvoke` with `WrapOption`, which requires the input to be `scala.collection.Map`. Ideally this should be fine, as `scala.collection.immutable.Map` extends `scala.collection.Map`, but our `ObjectType` is too strict about this; this PR fixes it.

new regression test

Author: Wenchen Fan

Closes #17454 from cloud-fan/map.

(cherry picked from commit d4fac410e0554b7ccd44be44b7ce2fe07ed7f206)
Signed-off-by: Cheng Lian

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fd2e4061
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fd2e4061
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fd2e4061

Branch: refs/heads/branch-2.1
Commit: fd2e40614b511fb9ef3e52cc1351659fdbfd612a
Parents: 4bcb7d6
Author: Wenchen Fan
Authored: Tue Mar 28 11:47:43 2017 -0700
Committer: Cheng Lian
Committed: Tue Mar 28 12:36:27 2017 -0700

--
 .../src/main/scala/org/apache/spark/sql/types/ObjectType.scala  | 5 +
 .../src/test/scala/org/apache/spark/sql/DatasetSuite.scala      | 6 ++
 2 files changed, 11 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/fd2e4061/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ObjectType.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ObjectType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ObjectType.scala
index b18fba2..2d49fe0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ObjectType.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ObjectType.scala
@@ -44,4 +44,9 @@ case class ObjectType(cls: Class[_]) extends DataType {
   def asNullable: DataType = this

   override def simpleString: String = cls.getName
+
+  override def acceptsType(other: DataType): Boolean = other match {
+    case ObjectType(otherCls) => cls.isAssignableFrom(otherCls)
+    case _ => false
+  }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/fd2e4061/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
index 381652d..9cc49b6 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
@@ -1072,10 +1072,16 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
     val ds2 = Seq(WithMap("hi", Map(42L -> "foo"))).toDS
     checkDataset(ds2.map(t => t), WithMap("hi", Map(42L -> "foo")))
   }
+
+  test("SPARK-20125: option of map") {
+    val ds = Seq(WithMapInOption(Some(Map(1 -> 1)))).toDS()
+    checkDataset(ds, WithMapInOption(Some(Map(1 -> 1))))
+  }
 }

 case class WithImmutableMap(id: String, map_test: scala.collection.immutable.Map[Long, String])
 case class WithMap(id: String, map_test: scala.collection.Map[Long, String])
+case class WithMapInOption(m: Option[scala.collection.Map[Int, Int]])

 case class Generic[T](id: T, value: Double)
spark git commit: [SPARK-19868] conflict TasksetManager lead to spark stopped
Repository: spark
Updated Branches: refs/heads/master d4fac410e -> 92e385e0b

[SPARK-19868] conflict TasksetManager lead to spark stopped

## What changes were proposed in this pull request?

We must set the taskset to zombie before the DAGScheduler handles the taskEnded event. It's possible the taskEnded event will cause the DAGScheduler to launch a new stage attempt (this happens when map output data was lost), and if this happens before the taskSet has been set to zombie, it will appear that we have conflicting task sets.

Author: liujianhui

Closes #17208 from liujianhuiouc/spark-19868.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/92e385e0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/92e385e0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/92e385e0

Branch: refs/heads/master
Commit: 92e385e0b55d70a48411e90aa0f2ed141c4d07c8
Parents: d4fac41
Author: liujianhui
Authored: Tue Mar 28 12:13:45 2017 -0700
Committer: Kay Ousterhout
Committed: Tue Mar 28 12:13:45 2017 -0700

--
 .../apache/spark/scheduler/TaskSetManager.scala  | 15 ++-
 .../spark/scheduler/TaskSetManagerSuite.scala    | 27 +++-
 2 files changed, 34 insertions(+), 8 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/92e385e0/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
--
diff --git a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala b/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
index a177aab..a41b059 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
@@ -713,13 +713,7 @@ private[spark] class TaskSetManager(
       successfulTaskDurations.insert(info.duration)
     }
     removeRunningTask(tid)
-    // This method is called by "TaskSchedulerImpl.handleSuccessfulTask" which holds the
-    // "TaskSchedulerImpl" lock until exiting. To avoid the SPARK-7655 issue, we should not
-    // "deserialize" the value when holding a lock to avoid blocking other threads. So we call
-    // "result.value()" in "TaskResultGetter.enqueueSuccessfulTask" before reaching here.
-    // Note: "result.value()" only deserializes the value when it's called at the first time, so
-    // here "result.value()" just returns the value and won't block other threads.
-    sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
+
     // Kill any other attempts for the same task (since those are unnecessary now that one
     // attempt completed successfully).
     for (attemptInfo <- taskAttempts(index) if attemptInfo.running) {
@@ -746,6 +740,13 @@ private[spark] class TaskSetManager(
       logInfo("Ignoring task-finished event for " + info.id + " in stage " + taskSet.id +
         " because task " + index + " has already completed successfully")
     }
+    // This method is called by "TaskSchedulerImpl.handleSuccessfulTask" which holds the
+    // "TaskSchedulerImpl" lock until exiting. To avoid the SPARK-7655 issue, we should not
+    // "deserialize" the value when holding a lock to avoid blocking other threads. So we call
+    // "result.value()" in "TaskResultGetter.enqueueSuccessfulTask" before reaching here.
+    // Note: "result.value()" only deserializes the value when it's called at the first time, so
+    // here "result.value()" just returns the value and won't block other threads.
+    sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
     maybeFinishTaskSet()
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/92e385e0/core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
index 132caef..9ca6b8b 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
@@ -22,8 +22,10 @@ import java.util.{Properties, Random}

 import scala.collection.mutable
 import scala.collection.mutable.ArrayBuffer

-import org.mockito.Matchers.{anyInt, anyString}
+import org.mockito.Matchers.{any, anyInt, anyString}
 import org.mockito.Mockito.{mock, never, spy, verify, when}
+import org.mockito.invocation.InvocationOnMock
+import org.mockito.stubbing.Answer

 import org.apache.spark._
 import org.apache.spark.internal.config
@@ -1056,6 +1058,29 @@
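The fix is purely about ordering, which a stubbed outline makes easier to see (all helper names below are hypothetical stand-ins for the real TaskSetManager bookkeeping):

    def removeRunningTask(tid: Long): Unit = ()
    def killOtherAttempts(index: Int): Unit = ()  // the kill loop over taskAttempts(index)
    def recordSuccess(index: Int): Unit = ()      // successful(index) = true and zombie checks
    def notifyTaskEnded(index: Int): Unit = ()    // may synchronously resubmit a stage
    def maybeFinishTaskSet(): Unit = ()

    def handleSuccessfulTask(tid: Long, index: Int): Unit = {
      removeRunningTask(tid)
      killOtherAttempts(index)
      recordSuccess(index)
      // Only now notify the DAGScheduler: if taskEnded triggers a new stage
      // attempt, this task set is already marked finished/zombie, so the new
      // attempt no longer looks like a conflicting task set.
      notifyTaskEnded(index)
      maybeFinishTaskSet()
    }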
spark git commit: [SPARK-20125][SQL] Dataset of type option of map does not work
Repository: spark
Updated Branches: refs/heads/master 17eddb35a -> d4fac410e

[SPARK-20125][SQL] Dataset of type option of map does not work

## What changes were proposed in this pull request?

When we build the deserializer expression for map type, we will use `StaticInvoke` to call `ArrayBasedMapData.toScalaMap`, and declare the return type as `scala.collection.immutable.Map`. If the map is inside an Option, we will wrap this `StaticInvoke` with `WrapOption`, which requires the input to be `scala.collection.Map`. Ideally this should be fine, as `scala.collection.immutable.Map` extends `scala.collection.Map`, but our `ObjectType` is too strict about this; this PR fixes it.

## How was this patch tested?

new regression test

Author: Wenchen Fan

Closes #17454 from cloud-fan/map.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d4fac410
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d4fac410
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d4fac410

Branch: refs/heads/master
Commit: d4fac410e0554b7ccd44be44b7ce2fe07ed7f206
Parents: 17eddb3
Author: Wenchen Fan
Authored: Tue Mar 28 11:47:43 2017 -0700
Committer: Cheng Lian
Committed: Tue Mar 28 11:47:43 2017 -0700

--
 .../src/main/scala/org/apache/spark/sql/types/ObjectType.scala  | 5 +
 .../src/test/scala/org/apache/spark/sql/DatasetSuite.scala      | 6 ++
 2 files changed, 11 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/d4fac410/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ObjectType.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ObjectType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ObjectType.scala
index b18fba2..2d49fe0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ObjectType.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ObjectType.scala
@@ -44,4 +44,9 @@ case class ObjectType(cls: Class[_]) extends DataType {
   def asNullable: DataType = this

   override def simpleString: String = cls.getName
+
+  override def acceptsType(other: DataType): Boolean = other match {
+    case ObjectType(otherCls) => cls.isAssignableFrom(otherCls)
+    case _ => false
+  }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/d4fac410/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
index 6417e7a..68e071a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
@@ -1154,10 +1154,16 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
     assert(errMsg3.getMessage.startsWith("cannot have circular references in class, but got the " +
       "circular reference of class"))
   }
+
+  test("SPARK-20125: option of map") {
+    val ds = Seq(WithMapInOption(Some(Map(1 -> 1)))).toDS()
+    checkDataset(ds, WithMapInOption(Some(Map(1 -> 1))))
+  }
 }

 case class WithImmutableMap(id: String, map_test: scala.collection.immutable.Map[Long, String])
 case class WithMap(id: String, map_test: scala.collection.Map[Long, String])
+case class WithMapInOption(m: Option[scala.collection.Map[Int, Int]])

 case class Generic[T](id: T, value: Double)
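The `acceptsType` change swaps strict class equality for Java assignability, which is exactly what the Option-of-Map case needs. A quick stand-alone check (plain Scala, no Spark required):

    val expected = classOf[scala.collection.Map[_, _]]            // what WrapOption requires
    val produced = classOf[scala.collection.immutable.Map[_, _]]  // what toScalaMap declares

    expected.isAssignableFrom(produced)  // true: an immutable.Map is-a collection.Map
    produced.isAssignableFrom(expected)  // false: the reverse direction must not pass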
spark git commit: [SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of tokens in yarn client mode
Repository: spark
Updated Branches: refs/heads/branch-2.1 4056191d3 -> 4bcb7d676

[SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of tokens in yarn client mode

## What changes were proposed in this pull request?

In the current Spark on YARN code, we obtain tokens from the configured services but never add them to the current user's credentials, so all subsequent operations against these services still require a TGT rather than delegation tokens. This is unnecessary, since we already hold the tokens, and it also breaks the user-impersonation scenario, because the TGT is granted to the real user, not the proxy user. This change registers all obtained tokens with the current UGI, so that subsequent operations honor the tokens rather than the TGT, which also resolves the proxy-user issue above.

## How was this patch tested?

Verified locally in a secure cluster.

vanzin tgravescs mridulm dongjoon-hyun please help to review, thanks a lot.

Author: jerryshao

Closes #17335 from jerryshao/SPARK-19995.

(cherry picked from commit 17eddb35a280e77da7520343e0bf2a86b329ed62)
Signed-off-by: Marcelo Vanzin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4bcb7d67
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4bcb7d67
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4bcb7d67

Branch: refs/heads/branch-2.1
Commit: 4bcb7d676440dedff737a10c98c308d8f2ed1c96
Parents: 4056191
Author: jerryshao
Authored: Tue Mar 28 10:41:11 2017 -0700
Committer: Marcelo Vanzin
Committed: Tue Mar 28 10:41:28 2017 -0700

--
 yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala | 3 +++
 1 file changed, 3 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/4bcb7d67/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
--
diff --git a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 5280c42..1ba736b 100644
--- a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -403,6 +403,9 @@ private[spark] class Client(
     val nearestTimeOfNextRenewal = credentialManager.obtainCredentials(hadoopConf, credentials)

     if (credentials != null) {
+      // Add credentials to current user's UGI, so that following operations don't need to use the
+      // Kerberos tgt to get delegations again in the client side.
+      UserGroupInformation.getCurrentUser.addCredentials(credentials)
       logDebug(YarnSparkHadoopUtil.get.dumpTokens(credentials).mkString("\n"))
     }
spark git commit: [SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of tokens in yarn client mode
Repository: spark
Updated Branches: refs/heads/master f82461fc1 -> 17eddb35a

[SPARK-19995][YARN] Register tokens to current UGI to avoid re-issuing of tokens in yarn client mode

## What changes were proposed in this pull request?

In the current Spark on YARN code, we obtain tokens from the configured services but never add them to the current user's credentials, so all subsequent operations against these services still require a TGT rather than delegation tokens. This is unnecessary, since we already hold the tokens, and it also breaks the user-impersonation scenario, because the TGT is granted to the real user, not the proxy user. This change registers all obtained tokens with the current UGI, so that subsequent operations honor the tokens rather than the TGT, which also resolves the proxy-user issue above.

## How was this patch tested?

Verified locally in a secure cluster.

vanzin tgravescs mridulm dongjoon-hyun please help to review, thanks a lot.

Author: jerryshao

Closes #17335 from jerryshao/SPARK-19995.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/17eddb35
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/17eddb35
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/17eddb35

Branch: refs/heads/master
Commit: 17eddb35a280e77da7520343e0bf2a86b329ed62
Parents: f82461f
Author: jerryshao
Authored: Tue Mar 28 10:41:11 2017 -0700
Committer: Marcelo Vanzin
Committed: Tue Mar 28 10:41:11 2017 -0700

--
 .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala | 3 +++
 1 file changed, 3 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/17eddb35/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
--
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index ccb0f8f..3218d22 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -371,6 +371,9 @@ private[spark] class Client(
     val nearestTimeOfNextRenewal = credentialManager.obtainCredentials(hadoopConf, credentials)

     if (credentials != null) {
+      // Add credentials to current user's UGI, so that following operations don't need to use the
+      // Kerberos tgt to get delegations again in the client side.
+      UserGroupInformation.getCurrentUser.addCredentials(credentials)
       logDebug(YarnSparkHadoopUtil.get.dumpTokens(credentials).mkString("\n"))
     }
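For context, obtaining delegation tokens into a `Credentials` object has no effect on later Hadoop calls until the tokens are attached to the current UGI; that attachment is the whole fix. The shape of the call, with token acquisition elided:

    import org.apache.hadoop.security.{Credentials, UserGroupInformation}

    val credentials = new Credentials()
    // ... delegation tokens for HDFS, Hive, HBase, etc. are obtained into `credentials` ...
    UserGroupInformation.getCurrentUser.addCredentials(credentials)
    // Client-side calls to those services can now authenticate with the delegation
    // tokens instead of requiring the Kerberos TGT (which a proxy user lacks).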
spark git commit: [SPARK-20126][SQL] Remove HiveSessionState
Repository: spark
Updated Branches: refs/heads/master 4fcc214d9 -> f82461fc1

[SPARK-20126][SQL] Remove HiveSessionState

## What changes were proposed in this pull request?

Commit https://github.com/apache/spark/commit/ea361165e1ddce4d8aa0242ae3e878d7b39f1de2 moved most of the logic from the SessionState classes into an accompanying builder. This makes the existence of the `HiveSessionState` redundant. This PR removes the `HiveSessionState`.

## How was this patch tested?

Existing tests.

Author: Herman van Hovell

Closes #17457 from hvanhovell/SPARK-20126.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f82461fc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f82461fc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f82461fc

Branch: refs/heads/master
Commit: f82461fc1197f6055d9cf972d82260b178e10a7c
Parents: 4fcc214
Author: Herman van Hovell
Authored: Tue Mar 28 23:14:31 2017 +0800
Committer: Wenchen Fan
Committed: Tue Mar 28 23:14:31 2017 +0800

--
 .../spark/sql/execution/command/resources.scala  |   2 +-
 .../spark/sql/internal/SessionState.scala        |  47 +++---
 .../sql/internal/sessionStateBuilders.scala      |   8 +-
 .../sql/hive/thriftserver/SparkSQLEnv.scala      |  12 +-
 .../server/SparkSQLOperationManager.scala        |   6 +-
 .../hive/execution/HiveCompatibilitySuite.scala  |   2 +-
 .../org/apache/spark/sql/hive/HiveContext.scala  |   4 -
 .../spark/sql/hive/HiveMetastoreCatalog.scala    |   9 +-
 .../spark/sql/hive/HiveSessionState.scala        | 144 +++
 .../apache/spark/sql/hive/test/TestHive.scala    |  23 ++-
 .../sql/hive/HiveMetastoreCatalogSuite.scala     |   6 +-
 .../sql/hive/MetastoreDataSourcesSuite.scala     |   7 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala  |   6 +-
 .../apache/spark/sql/hive/parquetSuites.scala    |  21 +--
 14 files changed, 104 insertions(+), 193 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/f82461fc/sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala
index 20b0894..2e859cf 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala
@@ -37,7 +37,7 @@ case class AddJarCommand(path: String) extends RunnableCommand {
   }

   override def run(sparkSession: SparkSession): Seq[Row] = {
-    sparkSession.sessionState.addJar(path)
+    sparkSession.sessionState.resourceLoader.addJar(path)
     Seq(Row(0))
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/f82461fc/sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala
index b5b0bb0..c6241d9 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala
@@ -63,6 +63,7 @@ private[sql] class SessionState(
     val optimizer: Optimizer,
     val planner: SparkPlanner,
     val streamingQueryManager: StreamingQueryManager,
+    val resourceLoader: SessionResourceLoader,
     createQueryExecution: LogicalPlan => QueryExecution,
     createClone: (SparkSession, SessionState) => SessionState) {

@@ -106,27 +107,6 @@ private[sql] class SessionState(
   def refreshTable(tableName: String): Unit = {
     catalog.refreshTable(sqlParser.parseTableIdentifier(tableName))
   }
-
-  /**
-   * Add a jar path to [[SparkContext]] and the classloader.
-   *
-   * Note: this method seems not access any session state, but the subclass `HiveSessionState` needs
-   * to add the jar to its hive client for the current session. Hence, it still needs to be in
-   * [[SessionState]].
-   */
-  def addJar(path: String): Unit = {
-    sparkContext.addJar(path)
-    val uri = new Path(path).toUri
-    val jarURL = if (uri.getScheme == null) {
-      // `path` is a local file path without a URL scheme
-      new File(path).toURI.toURL
-    } else {
-      // `path` is a URL with a scheme
-      uri.toURL
-    }
-    sharedState.jarClassLoader.addURL(jarURL)
-    Thread.currentThread().setContextClassLoader(sharedState.jarClassLoader)
-  }
 }

 private[sql] object SessionState {
@@ -160,10 +140,10 @@ class SessionStateBuilder(
 * Session shared
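The `resourceLoader.addJar` call in `AddJarCommand` above shows the replacement mechanism: instead of a `HiveSessionState` subclass overriding `addJar`, the Hive-specific behavior hangs off a pluggable loader. A rough sketch of that shape (simplified; the real classes carry more wiring):

    class SessionResourceLoader {
      def addJar(path: String): Unit = {
        // base behavior: register the jar with the SparkContext and the
        // shared jar classloader
      }
    }

    class HiveSessionResourceLoader extends SessionResourceLoader {
      override def addJar(path: String): Unit = {
        // additionally register the jar with the session's Hive client
        super.addJar(path)
      }
    }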
spark git commit: [SPARK-20124][SQL] Join reorder should keep the same order of final project attributes
Repository: spark
Updated Branches: refs/heads/master 91559d277 -> 4fcc214d9

[SPARK-20124][SQL] Join reorder should keep the same order of final project attributes

## What changes were proposed in this pull request?

The join reorder algorithm should keep exactly the same order of output attributes in the top project. For example, if the user selects a, b, c, then after reordering we should output a, b, c in the order specified by the user, not b, a, c or some other order.

## How was this patch tested?

A new test case is added in `JoinReorderSuite`.

Author: wangzhenhua

Closes #17453 from wzhfy/keepOrderInProject.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4fcc214d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4fcc214d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4fcc214d

Branch: refs/heads/master
Commit: 4fcc214d9eb5e98b2eed3e28cc23b0c511cd9007
Parents: 91559d2
Author: wangzhenhua
Authored: Tue Mar 28 22:22:38 2017 +0800
Committer: Wenchen Fan
Committed: Tue Mar 28 22:22:38 2017 +0800

--
 .../optimizer/CostBasedJoinReorder.scala       | 24 +---
 .../catalyst/optimizer/JoinReorderSuite.scala  | 13 +++
 .../spark/sql/catalyst/plans/PlanTest.scala    |  4 ++--
 3 files changed, 31 insertions(+), 10 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/4fcc214d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
index fc37720..cbd5064 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
@@ -40,10 +40,10 @@ case class CostBasedJoinReorder(conf: SQLConf) extends Rule[LogicalPlan] with Pr
     val result = plan transformDown {
       // Start reordering with a joinable item, which is an InnerLike join with conditions.
       case j @ Join(_, _, _: InnerLike, Some(cond)) =>
-        reorder(j, j.outputSet)
+        reorder(j, j.output)
       case p @ Project(projectList, Join(_, _, _: InnerLike, Some(cond)))
         if projectList.forall(_.isInstanceOf[Attribute]) =>
-        reorder(p, p.outputSet)
+        reorder(p, p.output)
     }
     // After reordering is finished, convert OrderedJoin back to Join
     result transformDown {
@@ -52,7 +52,7 @@ case class CostBasedJoinReorder(conf: SQLConf) extends Rule[LogicalPlan] with Pr
     }
   }

-  private def reorder(plan: LogicalPlan, output: AttributeSet): LogicalPlan = {
+  private def reorder(plan: LogicalPlan, output: Seq[Attribute]): LogicalPlan = {
     val (items, conditions) = extractInnerJoins(plan)
     // TODO: Compute the set of star-joins and use them in the join enumeration
     // algorithm to prune un-optimal plan choices.
@@ -140,7 +140,7 @@ object JoinReorderDP extends PredicateHelper with Logging {
       conf: SQLConf,
       items: Seq[LogicalPlan],
       conditions: Set[Expression],
-      topOutput: AttributeSet): LogicalPlan = {
+      output: Seq[Attribute]): LogicalPlan = {

     val startTime = System.nanoTime()
     // Level i maintains all found plans for i + 1 items.
@@ -152,9 +152,10 @@ object JoinReorderDP extends PredicateHelper with Logging {
     // Build plans for next levels until the last level has only one plan. This plan contains
     // all items that can be joined, so there's no need to continue.
+    val topOutputSet = AttributeSet(output)
     while (foundPlans.size < items.length && foundPlans.last.size > 1) {
       // Build plans for the next level.
-      foundPlans += searchLevel(foundPlans, conf, conditions, topOutput)
+      foundPlans += searchLevel(foundPlans, conf, conditions, topOutputSet)
     }

     val durationInMs = (System.nanoTime() - startTime) / (1000 * 1000)
@@ -163,7 +164,14 @@ object JoinReorderDP extends PredicateHelper with Logging {
     // The last level must have one and only one plan, because all items are joinable.
     assert(foundPlans.size == items.length && foundPlans.last.size == 1)
-    foundPlans.last.head._2.plan
+    foundPlans.last.head._2.plan match {
+      case p @ Project(projectList, j: Join) if projectList != output =>
+        assert(topOutputSet == p.outputSet)
+        // Keep the same order of final output attributes.
+        p.copy(projectList = output)
+      case finalPlan =>
+        finalPlan
+
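The invariant behind the fix is simple: reordering may permute the output attributes but never changes their set, so the original order can always be reimposed at the top. In miniature:

    val userOrder    = Seq("a", "b", "c")  // SELECT a, b, c FROM ...
    val afterReorder = Seq("b", "a", "c")  // same attributes, join-friendly order

    assert(userOrder.toSet == afterReorder.toSet)  // set equality always holds
    val finalOutput = userOrder                    // so re-projecting into the user's order is safe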
spark git commit: [SPARK-20094][SQL] Preventing push down of IN subquery to Join operator
Repository: spark
Updated Branches: refs/heads/master a9abff281 -> 91559d277

[SPARK-20094][SQL] Preventing push down of IN subquery to Join operator

## What changes were proposed in this pull request?

TPCDS q45 fails because `ReorderJoin` collects all predicates and tries to put them into the join condition when creating an ordered join. If a predicate with an IN subquery (`ListQuery`) is in a join condition instead of a filter condition, `RewritePredicateSubquery.rewriteExistentialExpr` fails to convert the subquery to an `ExistenceJoin`, resulting in an error. We should prevent push down of IN subqueries to the Join operator.

## How was this patch tested?

Added a new test case in `FilterPushdownSuite`.

Author: wangzhenhua

Closes #17428 from wzhfy/noSubqueryInJoinCond.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/91559d27
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/91559d27
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/91559d27

Branch: refs/heads/master
Commit: 91559d277f42ee83b79f5d8eb7ba037cf5c108da
Parents: a9abff2
Author: wangzhenhua
Authored: Tue Mar 28 13:43:23 2017 +0200
Committer: Herman van Hovell
Committed: Tue Mar 28 13:43:23 2017 +0200

--
 .../sql/catalyst/expressions/predicates.scala  |  6 ++
 .../optimizer/FilterPushdownSuite.scala        | 20
 2 files changed, 26 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/91559d27/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
index e5d1a1e..1235204 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
@@ -90,6 +90,12 @@ trait PredicateHelper {
    * Returns true iff `expr` could be evaluated as a condition within join.
    */
   protected def canEvaluateWithinJoin(expr: Expression): Boolean = expr match {
+    case l: ListQuery =>
+      // A ListQuery defines the query which we want to search in an IN subquery expression.
+      // Currently the only way to evaluate an IN subquery is to convert it to a
+      // LeftSemi/LeftAnti/ExistenceJoin by `RewritePredicateSubquery` rule.
+      // It cannot be evaluated as part of a Join operator.
+      false
     case e: SubqueryExpression =>
       // non-correlated subquery will be replaced as literal
       e.children.isEmpty

http://git-wip-us.apache.org/repos/asf/spark/blob/91559d27/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
--
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
index 6feea40..d846786 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala
@@ -836,6 +836,26 @@ class FilterPushdownSuite extends PlanTest {
     comparePlans(optimized, answer)
   }

+  test("SPARK-20094: don't push predicate with IN subquery into join condition") {
+    val x = testRelation.subquery('x)
+    val z = testRelation.subquery('z)
+    val w = testRelation1.subquery('w)
+
+    val queryPlan = x
+      .join(z)
+      .where(("x.b".attr === "z.b".attr) &&
+        ("x.a".attr > 1 || "z.c".attr.in(ListQuery(w.select("w.d".attr)))))
+      .analyze
+
+    val expectedPlan = x
+      .join(z, Inner, Some("x.b".attr === "z.b".attr))
+      .where("x.a".attr > 1 || "z.c".attr.in(ListQuery(w.select("w.d".attr))))
+      .analyze
+
+    val optimized = Optimize.execute(queryPlan)
+    comparePlans(optimized, expectedPlan)
+  }
+
   test("Window: predicate push down -- basic") {
     val winExpr = windowExpr(count('b), windowSpec('a :: Nil, 'b.asc :: Nil, UnspecifiedFrame))
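The failing query shape, modeled on the new test case (table and column names come from that test; a SparkSession `spark` with temp views x, z and w is assumed):

    spark.sql(
      """SELECT *
        |FROM x JOIN z ON x.b = z.b
        |WHERE x.a > 1 OR z.c IN (SELECT d FROM w)
      """.stripMargin)

Because the IN-subquery disjunct mixes with a join predicate, ReorderJoin used to fold it into the join condition, where `RewritePredicateSubquery` can no longer rewrite the `ListQuery` into an ExistenceJoin; keeping it in the Filter above the join avoids the error.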
spark git commit: [SPARK-20119][TEST-MAVEN] Fix the test case fail in DataSourceScanExecRedactionSuite
Repository: spark
Updated Branches: refs/heads/master 6c70a38c2 -> a9abff281

[SPARK-20119][TEST-MAVEN] Fix the test case fail in DataSourceScanExecRedactionSuite

### What changes were proposed in this pull request?

Changed the pattern to match the first n characters in the location field so that the string truncation does not affect it.

### How was this patch tested?

N/A

Author: Xiao Li

Closes #17448 from gatorsmile/fixTestCAse.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a9abff28
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a9abff28
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a9abff28

Branch: refs/heads/master
Commit: a9abff281bcb15fdc9121c8bcb983a9d91cb
Parents: 6c70a38
Author: Xiao Li
Authored: Tue Mar 28 09:37:28 2017 +0200
Committer: Herman van Hovell
Committed: Tue Mar 28 09:37:28 2017 +0200

--
 .../spark/sql/execution/DataSourceScanExecRedactionSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/a9abff28/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
index 986fa87..05a2b2c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
@@ -31,7 +31,7 @@ class DataSourceScanExecRedactionSuite extends QueryTest with SharedSQLContext {

   override def beforeAll(): Unit = {
     sparkConf.set("spark.redaction.string.regex",
-      "spark-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
+      "file:/[\\w_]+")
     super.beforeAll()
   }
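The old pattern only redacts once a full spark-<uuid> directory name is visible, and the truncated strings in explain output can cut that UUID off mid-match; the new pattern needs only a short stable prefix. A quick comparison (the sample string is illustrative, not from the test):

    val oldRe = "spark-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}".r
    val newRe = "file:/[\\w_]+".r

    val truncated = "Location: InMemoryFileIndex[file:/private/var/spark-8c5e9f2b-41..."
    oldRe.findFirstIn(truncated)  // None: truncation cut the UUID, so nothing would be redacted
    newRe.findFirstIn(truncated)  // Some("file:/private"): the prefix still matches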