spark git commit: [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml
Repository: spark
Updated Branches: refs/heads/master 9ace2e5c8 -> e359d5dcf

[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml

jira: https://issues.apache.org/jira/browse/SPARK-11689

Add a simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include the example code in the user guide markdown. Check SPARK-11606 for instructions.

Author: Yuhao Yang

Closes #9722 from hhbyyh/ldaMLExample.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e359d5dc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e359d5dc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e359d5dc

Branch: refs/heads/master
Commit: e359d5dcf5bd300213054ebeae9fe75c4f7eb9e7
Parents: 9ace2e5
Author: Yuhao Yang
Authored: Fri Nov 20 09:57:09 2015 -0800
Committer: Xiangrui Meng
Committed: Fri Nov 20 09:57:09 2015 -0800

docs/ml-clustering.md | 30 +++ docs/ml-guide.md | 3 +- docs/mllib-guide.md | 1 + .../spark/examples/ml/JavaLDAExample.java | 94 .../apache/spark/examples/ml/LDAExample.scala | 77 5 files changed, 204 insertions(+), 1 deletion(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/e359d5dc/docs/ml-clustering.md -- diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md new file mode 100644 index 0000000..1743ef4 --- /dev/null +++ b/docs/ml-clustering.md @@ -0,0 +1,30 @@ +--- +layout: global +title: Clustering - ML +displayTitle: ML - Clustering +--- + +In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html). + +## Latent Dirichlet allocation (LDA) + +`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, +and generates an `LDAModel` as the base model. Expert users may cast a `LDAModel` generated by +`EMLDAOptimizer` to a `DistributedLDAModel` if needed. + + + +Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details. + + +{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %} + + + + +Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details. + +{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %} + + + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/e359d5dc/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index be18a05..6f35b30 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -40,6 +40,7 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g., provide class probabilities, and linear models provide model summaries. * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision Trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) @@ -950,4 +951,4 @@ model.transform(test) {% endhighlight %} - + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/e359d5dc/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 91e50cc..54e35fc 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -69,6 +69,7 @@ We list major functionality from both below, with links to detailed guides. concepts.
It also contains sections on using algorithms within the Pipelines API, for example: * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/e359d5dc/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java new file mode 100644 index 0000000..b3a7d2e --- /dev/null +++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file
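For readers following along, the new guide's workflow boils down to a minimal sketch like the following (the libsvm source, path, and the k/maxIter values are illustrative assumptions, not taken from the patch):

```scala
import org.apache.spark.ml.clustering.LDA

// Load a corpus as a DataFrame with a "features" vector column; the libsvm
// source and the path here are assumptions for illustration.
val dataset = sqlContext.read.format("libsvm")
  .load("data/mllib/sample_lda_libsvm_data.txt")

// Fit an LDA model; k (topic count) and maxIter are illustrative.
val lda = new LDA().setK(10).setMaxIter(10)
val model = lda.fit(dataset)

// Inspect the topics and the per-document topic distributions.
model.describeTopics(3).show(false)
model.transform(dataset).show(false)
```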
spark git commit: [SPARK-11852][ML] StandardScaler minor refactor
Repository: spark
Updated Branches: refs/heads/branch-1.6 eab90d3f3 -> b11aa1797

[SPARK-11852][ML] StandardScaler minor refactor

```withStd``` and ```withMean``` should be params of ```StandardScaler``` and ```StandardScalerModel```.

Author: Yanbo Liang

Closes #9839 from yanboliang/standardScaler-refactor.

(cherry picked from commit 9ace2e5c8d7fbd360a93bc5fc4eace64a697b44f)
Signed-off-by: Xiangrui Meng

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b11aa179
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b11aa179
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b11aa179

Branch: refs/heads/branch-1.6
Commit: b11aa1797c928f2cfaf1d8821eff4be4109ac41d
Parents: eab90d3
Author: Yanbo Liang
Authored: Fri Nov 20 09:55:53 2015 -0800
Committer: Xiangrui Meng
Committed: Fri Nov 20 09:56:02 2015 -0800

.../spark/ml/feature/StandardScaler.scala | 60 +--- .../spark/ml/feature/StandardScalerSuite.scala | 11 ++-- 2 files changed, 32 insertions(+), 39 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/b11aa179/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala index 6d54521..d76a9c6 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala @@ -36,20 +36,30 @@ import org.apache.spark.sql.types.{StructField, StructType} private[feature] trait StandardScalerParams extends Params with HasInputCol with HasOutputCol { /** - * Centers the data with mean before scaling. + * Whether to center the data with mean before scaling. * It will build a dense output, so this does not work on sparse input * and will raise an exception. * Default: false * @group param */ - val withMean: BooleanParam = new BooleanParam(this, "withMean", "Center data with mean") + val withMean: BooleanParam = new BooleanParam(this, "withMean", +"Whether to center data with mean") + + /** @group getParam */ + def getWithMean: Boolean = $(withMean) /** - * Scales the data to unit standard deviation. + * Whether to scale the data to unit standard deviation.
* Default: true * @group param */ - val withStd: BooleanParam = new BooleanParam(this, "withStd", "Scale to unit standard deviation") + val withStd: BooleanParam = new BooleanParam(this, "withStd", +"Whether to scale the data to unit standard deviation") + + /** @group getParam */ + def getWithStd: Boolean = $(withStd) + + setDefault(withMean -> false, withStd -> true) } /** @@ -63,8 +73,6 @@ class StandardScaler(override val uid: String) extends Estimator[StandardScalerModel] def this() = this(Identifiable.randomUID("stdScal")) - setDefault(withMean -> false, withStd -> true) - /** @group setParam */ def setInputCol(value: String): this.type = set(inputCol, value) @@ -82,7 +90,7 @@ class StandardScaler(override val uid: String) extends Estimator[StandardScalerModel] val input = dataset.select($(inputCol)).map { case Row(v: Vector) => v } val scaler = new feature.StandardScaler(withMean = $(withMean), withStd = $(withStd)) val scalerModel = scaler.fit(input) -copyValues(new StandardScalerModel(uid, scalerModel).setParent(this)) +copyValues(new StandardScalerModel(uid, scalerModel.std, scalerModel.mean).setParent(this)) } override def transformSchema(schema: StructType): StructType = { @@ -108,29 +116,19 @@ object StandardScaler extends DefaultParamsReadable[StandardScaler] { /** * :: Experimental :: * Model fitted by [[StandardScaler]]. + * + * @param std Standard deviation of the StandardScalerModel + * @param mean Mean of the StandardScalerModel */ @Experimental class StandardScalerModel private[ml] ( override val uid: String, -scaler: feature.StandardScalerModel) +val std: Vector, +val mean: Vector) extends Model[StandardScalerModel] with StandardScalerParams with MLWritable { import StandardScalerModel._ - /** Standard deviation of the StandardScalerModel */ - val std: Vector = scaler.std - - /** Mean of the StandardScalerModel */ - val mean: Vector = scaler.mean - - /** Whether to scale to unit standard deviation. */ - @Since("1.6.0") - def getWithStd: Boolean = scaler.withStd - - /** Whether to center data with mean. */ - @Since("1.6.0") - def getWithMean: Boolean = scaler.withMean - /** @group setParam */ def setInputCol(value:
spark git commit: [SPARK-11852][ML] StandardScaler minor refactor
Repository: spark
Updated Branches: refs/heads/master a66142dec -> 9ace2e5c8

[SPARK-11852][ML] StandardScaler minor refactor

```withStd``` and ```withMean``` should be params of ```StandardScaler``` and ```StandardScalerModel```.

Author: Yanbo Liang

Closes #9839 from yanboliang/standardScaler-refactor.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9ace2e5c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9ace2e5c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9ace2e5c

Branch: refs/heads/master
Commit: 9ace2e5c8d7fbd360a93bc5fc4eace64a697b44f
Parents: a66142d
Author: Yanbo Liang
Authored: Fri Nov 20 09:55:53 2015 -0800
Committer: Xiangrui Meng
Committed: Fri Nov 20 09:55:53 2015 -0800

.../spark/ml/feature/StandardScaler.scala | 60 +--- .../spark/ml/feature/StandardScalerSuite.scala | 11 ++-- 2 files changed, 32 insertions(+), 39 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/9ace2e5c/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala index 6d54521..d76a9c6 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala @@ -36,20 +36,30 @@ import org.apache.spark.sql.types.{StructField, StructType} private[feature] trait StandardScalerParams extends Params with HasInputCol with HasOutputCol { /** - * Centers the data with mean before scaling. + * Whether to center the data with mean before scaling. * It will build a dense output, so this does not work on sparse input * and will raise an exception. * Default: false * @group param */ - val withMean: BooleanParam = new BooleanParam(this, "withMean", "Center data with mean") + val withMean: BooleanParam = new BooleanParam(this, "withMean", +"Whether to center data with mean") + + /** @group getParam */ + def getWithMean: Boolean = $(withMean) /** - * Scales the data to unit standard deviation. + * Whether to scale the data to unit standard deviation.
* Default: true * @group param */ - val withStd: BooleanParam = new BooleanParam(this, "withStd", "Scale to unit standard deviation") + val withStd: BooleanParam = new BooleanParam(this, "withStd", +"Whether to scale the data to unit standard deviation") + + /** @group getParam */ + def getWithStd: Boolean = $(withStd) + + setDefault(withMean -> false, withStd -> true) } /** @@ -63,8 +73,6 @@ class StandardScaler(override val uid: String) extends Estimator[StandardScalerModel] def this() = this(Identifiable.randomUID("stdScal")) - setDefault(withMean -> false, withStd -> true) - /** @group setParam */ def setInputCol(value: String): this.type = set(inputCol, value) @@ -82,7 +90,7 @@ class StandardScaler(override val uid: String) extends Estimator[StandardScalerModel] val input = dataset.select($(inputCol)).map { case Row(v: Vector) => v } val scaler = new feature.StandardScaler(withMean = $(withMean), withStd = $(withStd)) val scalerModel = scaler.fit(input) -copyValues(new StandardScalerModel(uid, scalerModel).setParent(this)) +copyValues(new StandardScalerModel(uid, scalerModel.std, scalerModel.mean).setParent(this)) } override def transformSchema(schema: StructType): StructType = { @@ -108,29 +116,19 @@ object StandardScaler extends DefaultParamsReadable[StandardScaler] { /** * :: Experimental :: * Model fitted by [[StandardScaler]]. + * + * @param std Standard deviation of the StandardScalerModel + * @param mean Mean of the StandardScalerModel */ @Experimental class StandardScalerModel private[ml] ( override val uid: String, -scaler: feature.StandardScalerModel) +val std: Vector, +val mean: Vector) extends Model[StandardScalerModel] with StandardScalerParams with MLWritable { import StandardScalerModel._ - /** Standard deviation of the StandardScalerModel */ - val std: Vector = scaler.std - - /** Mean of the StandardScalerModel */ - val mean: Vector = scaler.mean - - /** Whether to scale to unit standard deviation. */ - @Since("1.6.0") - def getWithStd: Boolean = scaler.withStd - - /** Whether to center data with mean. */ - @Since("1.6.0") - def getWithMean: Boolean = scaler.withMean - /** @group setParam */ def setInputCol(value: String): this.type = set(inputCol, value) @@ -139,6 +137,7 @@ class StandardScalerModel private[ml] ( override def
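After this refactor, usage reads as follows; a minimal sketch, assuming a DataFrame `dataset` with a "features" vector column:

```scala
import org.apache.spark.ml.feature.StandardScaler

// withMean and withStd are now shared Params with defaults (false, true).
val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithMean(false)  // setting true densifies sparse input
  .setWithStd(true)

val scalerModel = scaler.fit(dataset)

// The fitted statistics are now plain constructor vals on the model,
// and the getters read the Params rather than the wrapped mllib model.
println(scalerModel.mean)
println(scalerModel.std)
println(scalerModel.getWithMean + " / " + scalerModel.getWithStd)

val scaled = scalerModel.transform(dataset)
```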
spark git commit: [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml
Repository: spark
Updated Branches: refs/heads/branch-1.6 b11aa1797 -> 92d3563fd

[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml

jira: https://issues.apache.org/jira/browse/SPARK-11689

Add a simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include the example code in the user guide markdown. Check SPARK-11606 for instructions.

Author: Yuhao Yang

Closes #9722 from hhbyyh/ldaMLExample.

(cherry picked from commit e359d5dcf5bd300213054ebeae9fe75c4f7eb9e7)
Signed-off-by: Xiangrui Meng

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/92d3563f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/92d3563f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/92d3563f

Branch: refs/heads/branch-1.6
Commit: 92d3563fd0cf0c3f4fe037b404d172125b24cf2f
Parents: b11aa17
Author: Yuhao Yang
Authored: Fri Nov 20 09:57:09 2015 -0800
Committer: Xiangrui Meng
Committed: Fri Nov 20 09:57:24 2015 -0800

docs/ml-clustering.md | 30 +++ docs/ml-guide.md | 3 +- docs/mllib-guide.md | 1 + .../spark/examples/ml/JavaLDAExample.java | 94 .../apache/spark/examples/ml/LDAExample.scala | 77 5 files changed, 204 insertions(+), 1 deletion(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/92d3563f/docs/ml-clustering.md -- diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md new file mode 100644 index 0000000..1743ef4 --- /dev/null +++ b/docs/ml-clustering.md @@ -0,0 +1,30 @@ +--- +layout: global +title: Clustering - ML +displayTitle: ML - Clustering +--- + +In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html). + +## Latent Dirichlet allocation (LDA) + +`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, +and generates an `LDAModel` as the base model. Expert users may cast a `LDAModel` generated by +`EMLDAOptimizer` to a `DistributedLDAModel` if needed. + + + +Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details. + + +{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %} + + + + +Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details. + +{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %} + + + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/92d3563f/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index be18a05..6f35b30 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -40,6 +40,7 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g., provide class probabilities, and linear models provide model summaries. * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision Trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) @@ -950,4 +951,4 @@ model.transform(test) {% endhighlight %} - + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/92d3563f/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 91e50cc..54e35fc 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -69,6 +69,7 @@ We list major functionality from both below, with links to detailed guides. concepts.
It also contains sections on using algorithms within the Pipelines API, for example: * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/92d3563f/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java new file mode 100644 index 0000000..b3a7d2e --- /dev/null +++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java @@ -0,0 +1,94 @@ +/* + *
spark git commit: [SPARK-11876][SQL] Support printSchema in DataSet API
Repository: spark
Updated Branches: refs/heads/master e359d5dcf -> bef361c58

[SPARK-11876][SQL] Support printSchema in DataSet API

DataSet APIs look great! However, I am lost when doing multi-level joins. For example,

```
val ds1 = Seq(("a", 1), ("b", 2)).toDS().as("a")
val ds2 = Seq(("a", 1), ("b", 2)).toDS().as("b")
val ds3 = Seq(("a", 1), ("b", 2)).toDS().as("c")
ds1.joinWith(ds2, $"a._2" === $"b._2").as("ab").joinWith(ds3, $"ab._1._2" === $"c._2").printSchema()
```

The printed schema is like

```
root
 |-- _1: struct (nullable = true)
 |    |-- _1: struct (nullable = true)
 |    |    |-- _1: string (nullable = true)
 |    |    |-- _2: integer (nullable = true)
 |    |-- _2: struct (nullable = true)
 |    |    |-- _1: string (nullable = true)
 |    |    |-- _2: integer (nullable = true)
 |-- _2: struct (nullable = true)
 |    |-- _1: string (nullable = true)
 |    |-- _2: integer (nullable = true)
```

Personally, I think we need the printSchema function. Sometimes, I do not know how to specify the column, especially when the data types are mixed. For example, if I want to write the following select for the above multi-level join, I have to know the schema:

```
newDS.select(expr("_1._2._2 + 1").as[Int]).collect()
```

marmbrus rxin cloud-fan Do you have the same feeling?

Author: gatorsmile

Closes #9855 from gatorsmile/printSchemaDataSet.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bef361c5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bef361c5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bef361c5

Branch: refs/heads/master
Commit: bef361c589c0a38740232fd8d0a45841e4fc969a
Parents: e359d5d
Author: gatorsmile
Authored: Fri Nov 20 11:20:47 2015 -0800
Committer: Michael Armbrust
Committed: Fri Nov 20 11:20:47 2015 -0800

.../src/main/scala/org/apache/spark/sql/DataFrame.scala | 9 - .../scala/org/apache/spark/sql/execution/Queryable.scala | 9 + 2 files changed, 9 insertions(+), 9 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/bef361c5/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala index 9835812..7abceca 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala @@ -300,15 +300,6 @@ class DataFrame private[sql]( def columns: Array[String] = schema.fields.map(_.name) /** - * Prints the schema to the console in a nice tree format. - * @group basic - * @since 1.3.0 - */ - // scalastyle:off println - def printSchema(): Unit = println(schema.treeString) - // scalastyle:on println - - /** * Returns true if the `collect` and `take` methods can be run locally * (without any Spark executors). * @group basic http://git-wip-us.apache.org/repos/asf/spark/blob/bef361c5/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala index e86a52c..321e2c7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala @@ -38,6 +38,15 @@ private[sql] trait Queryable { } /** + * Prints the schema to the console in a nice tree format.
+ * @group basic + * @since 1.3.0 + */ + // scalastyle:off println + def printSchema(): Unit = println(schema.treeString) + // scalastyle:on println + + /** * Prints the plans (logical and physical) to the console for debugging purposes. * @since 1.3.0 */
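In effect, moving the method to `Queryable` makes it available on `Dataset` as well as `DataFrame`; a minimal sketch, assuming the usual implicits are in scope:

```scala
import sqlContext.implicits._

// printSchema now lives on Queryable, so Dataset exposes it too.
val ds = Seq(("a", 1), ("b", 2)).toDS()
ds.printSchema()
// root
//  |-- _1: string (nullable = true)
//  |-- _2: integer (nullable = true)
```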
spark git commit: [SPARK-11819][SQL] nice error message for missing encoder
Repository: spark
Updated Branches: refs/heads/master 60bfb1133 -> 3b9d2a347

[SPARK-11819][SQL] nice error message for missing encoder

Before this PR, when users try to get an encoder for an un-supported class, they will only get a very simple error message like `Encoder for type xxx is not supported`. After this PR, the error message becomes more friendly, for example:

```
No Encoder found for abc.xyz.NonEncodable
- array element class: "abc.xyz.NonEncodable"
- field (class: "scala.Array", name: "arrayField")
- root class: "abc.xyz.AnotherClass"
```

Author: Wenchen Fan

Closes #9810 from cloud-fan/error-message.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3b9d2a34
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3b9d2a34
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3b9d2a34

Branch: refs/heads/master
Commit: 3b9d2a347f9c796b90852173d84189834e499e25
Parents: 60bfb11
Author: Wenchen Fan
Authored: Fri Nov 20 12:04:42 2015 -0800
Committer: Michael Armbrust
Committed: Fri Nov 20 12:04:42 2015 -0800

.../spark/sql/catalyst/ScalaReflection.scala | 90 +++- .../encoders/EncoderErrorMessageSuite.scala | 62 ++ 2 files changed, 129 insertions(+), 23 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/3b9d2a34/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 33ae700..918050b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -63,7 +63,7 @@ object ScalaReflection extends ScalaReflection { case t if t <:< definitions.BooleanTpe => BooleanType case t if t <:< localTypeOf[Array[Byte]] => BinaryType case _ => -val className: String = tpe.erasure.typeSymbol.asClass.fullName +val className = getClassNameFromType(tpe) className match { case "scala.Array" => val TypeRef(_, _, Seq(elementType)) = tpe @@ -320,9 +320,23 @@ object ScalaReflection extends ScalaReflection { } } - /** Returns expressions for extracting all the fields from the given type. */ + /** + * Returns expressions for extracting all the fields from the given type. + * + * If the given type is not supported, i.e. there is no encoder can be built for this type, + * an [[UnsupportedOperationException]] will be thrown with detailed error message to explain + * the type path walked so far and which class we are not supporting. + * There are 4 kinds of type path: + * * the root type: `root class: "abc.xyz.MyClass"` + * * the value type of [[Option]]: `option value class: "abc.xyz.MyClass"` + * * the element type of [[Array]] or [[Seq]]: `array element class: "abc.xyz.MyClass"` + * * the field of [[Product]]: `field (class: "abc.xyz.MyClass", name: "myField")` + */ def extractorsFor[T : TypeTag](inputObject: Expression): CreateNamedStruct = { -extractorFor(inputObject, localTypeOf[T]) match { +val tpe = localTypeOf[T] +val clsName = getClassNameFromType(tpe) +val walkedTypePath = s"""- root class: "${clsName}"""" :: Nil +extractorFor(inputObject, tpe, walkedTypePath) match { case s: CreateNamedStruct => s case other => CreateNamedStruct(expressions.Literal("value") :: other :: Nil) } @@ -331,7 +345,28 @@ object ScalaReflection extends ScalaReflection { /** Helper for extracting internal fields from a case class.
*/ private def extractorFor( inputObject: Expression, - tpe: `Type`): Expression = ScalaReflectionLock.synchronized { + tpe: `Type`, + walkedTypePath: Seq[String]): Expression = ScalaReflectionLock.synchronized { + +def toCatalystArray(input: Expression, elementType: `Type`): Expression = { + val externalDataType = dataTypeFor(elementType) + val Schema(catalystType, nullable) = silentSchemaFor(elementType) + if (isNativeType(catalystType)) { +NewInstance( + classOf[GenericArrayData], + input :: Nil, + dataType = ArrayType(catalystType, nullable)) + } else { +val clsName = getClassNameFromType(elementType) +val newPath = s"""- array element class: "$clsName"""" +: walkedTypePath +// `MapObjects` will run `extractorFor` lazily, we need to eagerly call `extractorFor` here +// to trigger the type check. +
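A sketch of the kind of user code that now produces the friendlier message; the class names mirror the illustrative ones from the commit message and are not real Spark types:

```scala
import sqlContext.implicits._

// A class for which no Catalyst encoder can be derived (hypothetical).
class NonEncodable(val x: Int)

// A case class that pulls the unsupported type in through an array field.
case class AnotherClass(arrayField: Array[NonEncodable])

// This now throws an UnsupportedOperationException carrying the type path
// ("array element class" / "field" / "root class") instead of a bare
// "Encoder for type ... is not supported".
val ds = Seq(AnotherClass(Array(new NonEncodable(1)))).toDS()
```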
spark git commit: [SPARK-11876][SQL] Support printSchema in DataSet API
Repository: spark
Updated Branches: refs/heads/branch-1.6 92d3563fd -> 3662b9f4c

[SPARK-11876][SQL] Support printSchema in DataSet API

DataSet APIs look great! However, I am lost when doing multi-level joins. For example,

```
val ds1 = Seq(("a", 1), ("b", 2)).toDS().as("a")
val ds2 = Seq(("a", 1), ("b", 2)).toDS().as("b")
val ds3 = Seq(("a", 1), ("b", 2)).toDS().as("c")
ds1.joinWith(ds2, $"a._2" === $"b._2").as("ab").joinWith(ds3, $"ab._1._2" === $"c._2").printSchema()
```

The printed schema is like

```
root
 |-- _1: struct (nullable = true)
 |    |-- _1: struct (nullable = true)
 |    |    |-- _1: string (nullable = true)
 |    |    |-- _2: integer (nullable = true)
 |    |-- _2: struct (nullable = true)
 |    |    |-- _1: string (nullable = true)
 |    |    |-- _2: integer (nullable = true)
 |-- _2: struct (nullable = true)
 |    |-- _1: string (nullable = true)
 |    |-- _2: integer (nullable = true)
```

Personally, I think we need the printSchema function. Sometimes, I do not know how to specify the column, especially when the data types are mixed. For example, if I want to write the following select for the above multi-level join, I have to know the schema:

```
newDS.select(expr("_1._2._2 + 1").as[Int]).collect()
```

marmbrus rxin cloud-fan Do you have the same feeling?

Author: gatorsmile

Closes #9855 from gatorsmile/printSchemaDataSet.

(cherry picked from commit bef361c589c0a38740232fd8d0a45841e4fc969a)
Signed-off-by: Michael Armbrust

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3662b9f4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3662b9f4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3662b9f4

Branch: refs/heads/branch-1.6
Commit: 3662b9f4c9956fe0299e917edf510ea01cc37791
Parents: 92d3563
Author: gatorsmile
Authored: Fri Nov 20 11:20:47 2015 -0800
Committer: Michael Armbrust
Committed: Fri Nov 20 11:21:03 2015 -0800

.../src/main/scala/org/apache/spark/sql/DataFrame.scala | 9 - .../scala/org/apache/spark/sql/execution/Queryable.scala | 9 + 2 files changed, 9 insertions(+), 9 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/3662b9f4/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala index 9835812..7abceca 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala @@ -300,15 +300,6 @@ class DataFrame private[sql]( def columns: Array[String] = schema.fields.map(_.name) /** - * Prints the schema to the console in a nice tree format. - * @group basic - * @since 1.3.0 - */ - // scalastyle:off println - def printSchema(): Unit = println(schema.treeString) - // scalastyle:on println - - /** * Returns true if the `collect` and `take` methods can be run locally * (without any Spark executors). * @group basic http://git-wip-us.apache.org/repos/asf/spark/blob/3662b9f4/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala index e86a52c..321e2c7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala @@ -38,6 +38,15 @@ private[sql] trait Queryable { } /** + * Prints the schema to the console in a nice tree format.
+ * @group basic + * @since 1.3.0 + */ + // scalastyle:off println + def printSchema(): Unit = println(schema.treeString) + // scalastyle:on println + + /** * Prints the plans (logical and physical) to the console for debugging purposes. * @since 1.3.0 */
spark git commit: [SPARK-11819][SQL] nice error message for missing encoder
Repository: spark
Updated Branches: refs/heads/branch-1.6 119f92b4e -> ff156a3a6

[SPARK-11819][SQL] nice error message for missing encoder

Before this PR, when users try to get an encoder for an un-supported class, they will only get a very simple error message like `Encoder for type xxx is not supported`. After this PR, the error message becomes more friendly, for example:

```
No Encoder found for abc.xyz.NonEncodable
- array element class: "abc.xyz.NonEncodable"
- field (class: "scala.Array", name: "arrayField")
- root class: "abc.xyz.AnotherClass"
```

Author: Wenchen Fan

Closes #9810 from cloud-fan/error-message.

(cherry picked from commit 3b9d2a347f9c796b90852173d84189834e499e25)
Signed-off-by: Michael Armbrust

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ff156a3a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ff156a3a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ff156a3a

Branch: refs/heads/branch-1.6
Commit: ff156a3a660e1730de220b404a61e1bda8b7682e
Parents: 119f92b
Author: Wenchen Fan
Authored: Fri Nov 20 12:04:42 2015 -0800
Committer: Michael Armbrust
Committed: Fri Nov 20 12:04:53 2015 -0800

.../spark/sql/catalyst/ScalaReflection.scala | 90 +++- .../encoders/EncoderErrorMessageSuite.scala | 62 ++ 2 files changed, 129 insertions(+), 23 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/ff156a3a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 33ae700..918050b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -63,7 +63,7 @@ object ScalaReflection extends ScalaReflection { case t if t <:< definitions.BooleanTpe => BooleanType case t if t <:< localTypeOf[Array[Byte]] => BinaryType case _ => -val className: String = tpe.erasure.typeSymbol.asClass.fullName +val className = getClassNameFromType(tpe) className match { case "scala.Array" => val TypeRef(_, _, Seq(elementType)) = tpe @@ -320,9 +320,23 @@ object ScalaReflection extends ScalaReflection { } } - /** Returns expressions for extracting all the fields from the given type. */ + /** + * Returns expressions for extracting all the fields from the given type. + * + * If the given type is not supported, i.e. there is no encoder can be built for this type, + * an [[UnsupportedOperationException]] will be thrown with detailed error message to explain + * the type path walked so far and which class we are not supporting.
+ * There are 4 kinds of type path: + * * the root type: `root class: "abc.xyz.MyClass"` + * * the value type of [[Option]]: `option value class: "abc.xyz.MyClass"` + * * the element type of [[Array]] or [[Seq]]: `array element class: "abc.xyz.MyClass"` + * * the field of [[Product]]: `field (class: "abc.xyz.MyClass", name: "myField")` + */ def extractorsFor[T : TypeTag](inputObject: Expression): CreateNamedStruct = { -extractorFor(inputObject, localTypeOf[T]) match { +val tpe = localTypeOf[T] +val clsName = getClassNameFromType(tpe) +val walkedTypePath = s"""- root class: "${clsName}"""" :: Nil +extractorFor(inputObject, tpe, walkedTypePath) match { case s: CreateNamedStruct => s case other => CreateNamedStruct(expressions.Literal("value") :: other :: Nil) } @@ -331,7 +345,28 @@ object ScalaReflection extends ScalaReflection { /** Helper for extracting internal fields from a case class. */ private def extractorFor( inputObject: Expression, - tpe: `Type`): Expression = ScalaReflectionLock.synchronized { + tpe: `Type`, + walkedTypePath: Seq[String]): Expression = ScalaReflectionLock.synchronized { + +def toCatalystArray(input: Expression, elementType: `Type`): Expression = { + val externalDataType = dataTypeFor(elementType) + val Schema(catalystType, nullable) = silentSchemaFor(elementType) + if (isNativeType(catalystType)) { +NewInstance( + classOf[GenericArrayData], + input :: Nil, + dataType = ArrayType(catalystType, nullable)) + } else { +val clsName = getClassNameFromType(elementType) +val newPath = s"""- array element class: "$clsName"""" +: walkedTypePath +//
spark git commit: [SPARK-11817][SQL] Truncating the fractional seconds to prevent inserting a NULL
Repository: spark
Updated Branches: refs/heads/branch-1.5 6fe1ce6ab -> 9a906c1c3

[SPARK-11817][SQL] Truncating the fractional seconds to prevent inserting a NULL

JIRA: https://issues.apache.org/jira/browse/SPARK-11817

Instead of returning None, we should truncate the fractional seconds to prevent inserting NULL.

Author: Liang-Chi Hsieh

Closes #9834 from viirya/truncate-fractional-sec.

(cherry picked from commit 60bfb113325c71491f8dcf98b6036b0caa2144fe)
Signed-off-by: Yin Huai

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9a906c1c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9a906c1c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9a906c1c

Branch: refs/heads/branch-1.5
Commit: 9a906c1c3a76097b72f0951a3730b669ac58e3c7
Parents: 6fe1ce6
Author: Liang-Chi Hsieh
Authored: Fri Nov 20 11:43:45 2015 -0800
Committer: Yin Huai
Committed: Fri Nov 20 11:44:32 2015 -0800

.../org/apache/spark/sql/catalyst/util/DateTimeUtils.scala | 5 + .../apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala | 8 2 files changed, 13 insertions(+)

http://git-wip-us.apache.org/repos/asf/spark/blob/9a906c1c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala index c6a2780..0fdcb62 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala @@ -326,6 +326,11 @@ object DateTimeUtils { return None } +// Instead of return None, we truncate the fractional seconds to prevent inserting NULL +if (segments(6) > 999999) { + segments(6) = segments(6).toString.take(6).toInt +} + if (segments(3) < 0 || segments(3) > 23 || segments(4) < 0 || segments(4) > 59 || segments(5) < 0 || segments(5) > 59 || segments(6) < 0 || segments(6) > 999999 || segments(7) < 0 || segments(7) > 23 || segments(8) < 0 || segments(8) > 59) { http://git-wip-us.apache.org/repos/asf/spark/blob/9a906c1c/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala index b35d400..753c7e9 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala @@ -300,6 +300,14 @@ class DateTimeUtilsSuite extends SparkFunSuite { UTF8String.fromString("2015-03-18T12:03.17-0:70")).isEmpty) assert(stringToTimestamp( UTF8String.fromString("2015-03-18T12:03.17-1:0:0")).isEmpty) + +// Truncating the fractional seconds +c = Calendar.getInstance(TimeZone.getTimeZone("GMT+00:00")) +c.set(2015, 2, 18, 12, 3, 17) +c.set(Calendar.MILLISECOND, 0) +assert(stringToTimestamp( + UTF8String.fromString("2015-03-18T12:03:17.123456789+0:00")).get === +c.getTimeInMillis * 1000 + 123456) } test("hours") {
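The behavior change in a minimal sketch; this calls the Catalyst-internal parser directly, purely to illustrate the truncation:

```scala
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.unsafe.types.UTF8String

// Nine fractional digits exceed microsecond precision. Previously this parse
// returned None (so the value was inserted as NULL); the fraction is now
// truncated to its first six digits instead.
val ts = DateTimeUtils.stringToTimestamp(
  UTF8String.fromString("2015-03-18T12:03:17.123456789+0:00"))
// ts: Some(<microseconds since epoch, ending in ...123456>) rather than None
```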
spark git commit: [SPARK-11817][SQL] Truncating the fractional seconds to prevent inserting a NULL
Repository: spark
Updated Branches: refs/heads/master bef361c58 -> 60bfb1133

[SPARK-11817][SQL] Truncating the fractional seconds to prevent inserting a NULL

JIRA: https://issues.apache.org/jira/browse/SPARK-11817

Instead of returning None, we should truncate the fractional seconds to prevent inserting NULL.

Author: Liang-Chi Hsieh

Closes #9834 from viirya/truncate-fractional-sec.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60bfb113
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60bfb113
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60bfb113

Branch: refs/heads/master
Commit: 60bfb113325c71491f8dcf98b6036b0caa2144fe
Parents: bef361c
Author: Liang-Chi Hsieh
Authored: Fri Nov 20 11:43:45 2015 -0800
Committer: Yin Huai
Committed: Fri Nov 20 11:43:45 2015 -0800

.../org/apache/spark/sql/catalyst/util/DateTimeUtils.scala | 5 + .../apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala | 8 2 files changed, 13 insertions(+)

http://git-wip-us.apache.org/repos/asf/spark/blob/60bfb113/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala index 17a5527..2b93882 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala @@ -327,6 +327,11 @@ object DateTimeUtils { return None } +// Instead of return None, we truncate the fractional seconds to prevent inserting NULL +if (segments(6) > 999999) { + segments(6) = segments(6).toString.take(6).toInt +} + if (segments(3) < 0 || segments(3) > 23 || segments(4) < 0 || segments(4) > 59 || segments(5) < 0 || segments(5) > 59 || segments(6) < 0 || segments(6) > 999999 || segments(7) < 0 || segments(7) > 23 || segments(8) < 0 || segments(8) > 59) { http://git-wip-us.apache.org/repos/asf/spark/blob/60bfb113/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala index faca128..0ce5a2f 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala @@ -343,6 +343,14 @@ class DateTimeUtilsSuite extends SparkFunSuite { UTF8String.fromString("2015-03-18T12:03.17-0:70")).isEmpty) assert(stringToTimestamp( UTF8String.fromString("2015-03-18T12:03.17-1:0:0")).isEmpty) + +// Truncating the fractional seconds +c = Calendar.getInstance(TimeZone.getTimeZone("GMT+00:00")) +c.set(2015, 2, 18, 12, 3, 17) +c.set(Calendar.MILLISECOND, 0) +assert(stringToTimestamp( + UTF8String.fromString("2015-03-18T12:03:17.123456789+0:00")).get === +c.getTimeInMillis * 1000 + 123456) } test("hours") {
spark git commit: [SPARK-11817][SQL] Truncating the fractional seconds to prevent inserting a NULL
Repository: spark
Updated Branches: refs/heads/branch-1.6 3662b9f4c -> 119f92b4e

[SPARK-11817][SQL] Truncating the fractional seconds to prevent inserting a NULL

JIRA: https://issues.apache.org/jira/browse/SPARK-11817

Instead of returning None, we should truncate the fractional seconds to prevent inserting NULL.

Author: Liang-Chi Hsieh

Closes #9834 from viirya/truncate-fractional-sec.

(cherry picked from commit 60bfb113325c71491f8dcf98b6036b0caa2144fe)
Signed-off-by: Yin Huai

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/119f92b4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/119f92b4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/119f92b4

Branch: refs/heads/branch-1.6
Commit: 119f92b4e6cbdf03c5a6a25f892a73d81b2a23d1
Parents: 3662b9f
Author: Liang-Chi Hsieh
Authored: Fri Nov 20 11:43:45 2015 -0800
Committer: Yin Huai
Committed: Fri Nov 20 11:43:54 2015 -0800

.../org/apache/spark/sql/catalyst/util/DateTimeUtils.scala | 5 + .../apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala | 8 2 files changed, 13 insertions(+)

http://git-wip-us.apache.org/repos/asf/spark/blob/119f92b4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala index 17a5527..2b93882 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala @@ -327,6 +327,11 @@ object DateTimeUtils { return None } +// Instead of return None, we truncate the fractional seconds to prevent inserting NULL +if (segments(6) > 999999) { + segments(6) = segments(6).toString.take(6).toInt +} + if (segments(3) < 0 || segments(3) > 23 || segments(4) < 0 || segments(4) > 59 || segments(5) < 0 || segments(5) > 59 || segments(6) < 0 || segments(6) > 999999 || segments(7) < 0 || segments(7) > 23 || segments(8) < 0 || segments(8) > 59) { http://git-wip-us.apache.org/repos/asf/spark/blob/119f92b4/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala index faca128..0ce5a2f 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala @@ -343,6 +343,14 @@ class DateTimeUtilsSuite extends SparkFunSuite { UTF8String.fromString("2015-03-18T12:03.17-0:70")).isEmpty) assert(stringToTimestamp( UTF8String.fromString("2015-03-18T12:03.17-1:0:0")).isEmpty) + +// Truncating the fractional seconds +c = Calendar.getInstance(TimeZone.getTimeZone("GMT+00:00")) +c.set(2015, 2, 18, 12, 3, 17) +c.set(Calendar.MILLISECOND, 0) +assert(stringToTimestamp( + UTF8String.fromString("2015-03-18T12:03:17.123456789+0:00")).get === +c.getTimeInMillis * 1000 + 123456) } test("hours") {
spark git commit: [SPARK-11724][SQL] Change casting between int and timestamp to consistently treat int in seconds.
Repository: spark
Updated Branches: refs/heads/master 652def318 -> 9ed4ad426

[SPARK-11724][SQL] Change casting between int and timestamp to consistently treat int in seconds.

Hive has since changed this behavior as well. https://issues.apache.org/jira/browse/HIVE-3454

Author: Nong Li
Author: Yin Huai

Closes #9685 from nongli/spark-11724.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9ed4ad42
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9ed4ad42
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9ed4ad42

Branch: refs/heads/master
Commit: 9ed4ad4265cf9d3135307eb62dae6de0b220fc21
Parents: 652def3
Author: Nong Li
Authored: Fri Nov 20 14:19:34 2015 -0800
Committer: Yin Huai
Committed: Fri Nov 20 14:19:34 2015 -0800

.../spark/sql/catalyst/expressions/Cast.scala | 6 ++-- .../sql/catalyst/expressions/CastSuite.scala | 16 + .../apache/spark/sql/DateFunctionsSuite.scala | 3 ++ ...l testing-0-237a6af90a857da1efcbe98f6bbbf9d6 | 1 + ...l testing-0-9a02bc7de09bcabcbd4c91f54a814c20 | 1 - ...mp cast #3-0-76ee270337f664b36cacfc6528ac109 | 1 - ...p cast #5-0-dbd7bcd167d322d6617b884c02c7f247 | 1 - ...p cast #7-0-1d70654217035f8ce5f64344f4c5a80f | 1 - .../sql/hive/execution/HiveQuerySuite.scala | 34 +--- 9 files changed, 39 insertions(+), 25 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/9ed4ad42/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala index 5564e24..533d17e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala @@ -204,8 +204,8 @@ case class Cast(child: Expression, dataType: DataType) if (d.isNaN || d.isInfinite) null else (d * 1000000L).toLong } - // converting milliseconds to us - private[this] def longToTimestamp(t: Long): Long = t * 1000L + // converting seconds to us + private[this] def longToTimestamp(t: Long): Long = t * 1000000L // converting us to seconds private[this] def timestampToLong(ts: Long): Long = math.floor(ts.toDouble / 1000000L).toLong // converting us to seconds in double @@ -647,7 +647,7 @@ case class Cast(child: Expression, dataType: DataType) private[this] def decimalToTimestampCode(d: String): String = s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(1000000L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 1000L" + private[this] def longToTimeStampCode(l: String): String = s"$l * 1000000L" private[this] def timestampToIntegerCode(ts: String): String = s"java.lang.Math.floor((double) $ts / 1000000L)" private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 1000000.0" http://git-wip-us.apache.org/repos/asf/spark/blob/9ed4ad42/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala index f4db4da..ab77a76 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala @@ -258,8 +258,8 @@ class CastSuite extends SparkFunSuite with
ExpressionEvalHelper { test("cast from int 2") { checkEvaluation(cast(1, LongType), 1.toLong) -checkEvaluation(cast(cast(1000, TimestampType), LongType), 1.toLong) -checkEvaluation(cast(cast(-1200, TimestampType), LongType), -2.toLong) +checkEvaluation(cast(cast(1000, TimestampType), LongType), 1000.toLong) +checkEvaluation(cast(cast(-1200, TimestampType), LongType), -1200.toLong) checkEvaluation(cast(123, DecimalType.USER_DEFAULT), Decimal(123)) checkEvaluation(cast(123, DecimalType(3, 0)), Decimal(123)) @@ -348,14 +348,14 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation( cast(cast(cast(cast(cast(cast("5", ByteType), TimestampType), DecimalType.SYSTEM_DEFAULT), LongType), StringType), ShortType), - 0.toShort) + 5.toShort) checkEvaluation( cast(cast(cast(cast(cast(cast("5", TimestampType),
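The user-visible effect, sketched through the SQL interface (the displayed timestamps assume a UTC session time zone):

```scala
// An integral value cast to TIMESTAMP is now interpreted as seconds since the
// epoch (previously milliseconds), matching Hive's fix in HIVE-3454.
sqlContext.sql("SELECT CAST(1000 AS TIMESTAMP)").show()
// 1970-01-01 00:16:40.0  (1,000 seconds), previously 1970-01-01 00:00:01.0

// The round trip back to an integral type is consistent again:
sqlContext.sql("SELECT CAST(CAST(1000 AS TIMESTAMP) AS BIGINT)").show()
// 1000
```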
spark git commit: [SPARK-11724][SQL] Change casting between int and timestamp to consistently treat int in seconds.
Repository: spark
Updated Branches: refs/heads/branch-1.6 6fc968754 -> 9c8e17984

[SPARK-11724][SQL] Change casting between int and timestamp to consistently treat int in seconds.

Hive has since changed this behavior as well. https://issues.apache.org/jira/browse/HIVE-3454

Author: Nong Li
Author: Yin Huai

Closes #9685 from nongli/spark-11724.

(cherry picked from commit 9ed4ad4265cf9d3135307eb62dae6de0b220fc21)
Signed-off-by: Yin Huai

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9c8e1798
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9c8e1798
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9c8e1798

Branch: refs/heads/branch-1.6
Commit: 9c8e17984d95a8d225525a592a921a5af81e4440
Parents: 6fc9687
Author: Nong Li
Authored: Fri Nov 20 14:19:34 2015 -0800
Committer: Yin Huai
Committed: Fri Nov 20 14:19:44 2015 -0800

.../spark/sql/catalyst/expressions/Cast.scala | 6 ++-- .../sql/catalyst/expressions/CastSuite.scala | 16 + .../apache/spark/sql/DateFunctionsSuite.scala | 3 ++ ...l testing-0-237a6af90a857da1efcbe98f6bbbf9d6 | 1 + ...l testing-0-9a02bc7de09bcabcbd4c91f54a814c20 | 1 - ...mp cast #3-0-76ee270337f664b36cacfc6528ac109 | 1 - ...p cast #5-0-dbd7bcd167d322d6617b884c02c7f247 | 1 - ...p cast #7-0-1d70654217035f8ce5f64344f4c5a80f | 1 - .../sql/hive/execution/HiveQuerySuite.scala | 34 +--- 9 files changed, 39 insertions(+), 25 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/9c8e1798/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala index 5564e24..533d17e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala @@ -204,8 +204,8 @@ case class Cast(child: Expression, dataType: DataType) if (d.isNaN || d.isInfinite) null else (d * 1000000L).toLong } - // converting milliseconds to us - private[this] def longToTimestamp(t: Long): Long = t * 1000L + // converting seconds to us + private[this] def longToTimestamp(t: Long): Long = t * 1000000L // converting us to seconds private[this] def timestampToLong(ts: Long): Long = math.floor(ts.toDouble / 1000000L).toLong // converting us to seconds in double @@ -647,7 +647,7 @@ case class Cast(child: Expression, dataType: DataType) private[this] def decimalToTimestampCode(d: String): String = s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(1000000L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 1000L" + private[this] def longToTimeStampCode(l: String): String = s"$l * 1000000L" private[this] def timestampToIntegerCode(ts: String): String = s"java.lang.Math.floor((double) $ts / 1000000L)" private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 1000000.0" http://git-wip-us.apache.org/repos/asf/spark/blob/9c8e1798/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala index f4db4da..ab77a76 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala +++
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala @@ -258,8 +258,8 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper { test("cast from int 2") { checkEvaluation(cast(1, LongType), 1.toLong) -checkEvaluation(cast(cast(1000, TimestampType), LongType), 1.toLong) -checkEvaluation(cast(cast(-1200, TimestampType), LongType), -2.toLong) +checkEvaluation(cast(cast(1000, TimestampType), LongType), 1000.toLong) +checkEvaluation(cast(cast(-1200, TimestampType), LongType), -1200.toLong) checkEvaluation(cast(123, DecimalType.USER_DEFAULT), Decimal(123)) checkEvaluation(cast(123, DecimalType(3, 0)), Decimal(123)) @@ -348,14 +348,14 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation( cast(cast(cast(cast(cast(cast("5", ByteType), TimestampType), DecimalType.SYSTEM_DEFAULT), LongType), StringType),
spark git commit: [SPARK-11650] Reduce RPC timeouts to speed up slow AkkaUtilsSuite test
Repository: spark
Updated Branches: refs/heads/branch-1.6 ff156a3a6 -> 6fc968754

[SPARK-11650] Reduce RPC timeouts to speed up slow AkkaUtilsSuite test

This patch reduces some RPC timeouts in order to speed up the slow "AkkaUtilsSuite.remote fetch ssl on - untrusted server" test, which used to take two minutes to run.

Author: Josh Rosen

Closes #9869 from JoshRosen/SPARK-11650.

(cherry picked from commit 652def318e47890bd0a0977dc982cc07f99fb06a)
Signed-off-by: Josh Rosen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6fc96875
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6fc96875
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6fc96875

Branch: refs/heads/branch-1.6
Commit: 6fc96875460d881f13ec3082c4a2b32144ea45e9
Parents: ff156a3
Author: Josh Rosen
Authored: Fri Nov 20 13:17:35 2015 -0800
Committer: Josh Rosen
Committed: Fri Nov 20 13:18:15 2015 -0800

core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/6fc96875/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala index 6160101..0af4b60 100644 --- a/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala +++ b/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala @@ -340,10 +340,11 @@ class AkkaUtilsSuite extends SparkFunSuite with LocalSparkContext with ResetSystemProperties new MapOutputTrackerMasterEndpoint(rpcEnv, masterTracker, conf)) val slaveConf = sparkSSLConfig() + .set("spark.rpc.askTimeout", "5s") + .set("spark.rpc.lookupTimeout", "5s") val securityManagerBad = new SecurityManager(slaveConf) val slaveRpcEnv = RpcEnv.create("spark-slave", hostname, 0, slaveConf, securityManagerBad) -val slaveTracker = new MapOutputTrackerWorker(conf) try { slaveRpcEnv.setupEndpointRef("spark", rpcEnv.address, MapOutputTracker.ENDPOINT_NAME) fail("should receive either ActorNotFound or TimeoutException")
spark git commit: [SPARK-11650] Reduce RPC timeouts to speed up slow AkkaUtilsSuite test
Repository: spark
Updated Branches: refs/heads/master 3b9d2a347 -> 652def318

[SPARK-11650] Reduce RPC timeouts to speed up slow AkkaUtilsSuite test

This patch reduces some RPC timeouts in order to speed up the slow "AkkaUtilsSuite.remote fetch ssl on - untrusted server" test, which used to take two minutes to run.

Author: Josh Rosen

Closes #9869 from JoshRosen/SPARK-11650.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/652def31
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/652def31
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/652def31

Branch: refs/heads/master
Commit: 652def318e47890bd0a0977dc982cc07f99fb06a
Parents: 3b9d2a3
Author: Josh Rosen
Authored: Fri Nov 20 13:17:35 2015 -0800
Committer: Josh Rosen
Committed: Fri Nov 20 13:17:35 2015 -0800

core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/652def31/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala index 6160101..0af4b60 100644 --- a/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala +++ b/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala @@ -340,10 +340,11 @@ class AkkaUtilsSuite extends SparkFunSuite with LocalSparkContext with ResetSystemProperties new MapOutputTrackerMasterEndpoint(rpcEnv, masterTracker, conf)) val slaveConf = sparkSSLConfig() + .set("spark.rpc.askTimeout", "5s") + .set("spark.rpc.lookupTimeout", "5s") val securityManagerBad = new SecurityManager(slaveConf) val slaveRpcEnv = RpcEnv.create("spark-slave", hostname, 0, slaveConf, securityManagerBad) -val slaveTracker = new MapOutputTrackerWorker(conf) try { slaveRpcEnv.setupEndpointRef("spark", rpcEnv.address, MapOutputTracker.ENDPOINT_NAME) fail("should receive either ActorNotFound or TimeoutException")
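The gist of the fix, as a minimal sketch of the tightened test configuration (the diff chains these onto `sparkSSLConfig()`; a bare `SparkConf` is used here for illustration):

```scala
import org.apache.spark.SparkConf

// Lowering the ask/lookup timeouts makes the intentionally failing SSL
// handshake in the test time out in seconds instead of the default minutes.
val slaveConf = new SparkConf()
  .set("spark.rpc.askTimeout", "5s")
  .set("spark.rpc.lookupTimeout", "5s")
```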
spark git commit: [SPARK-11890][SQL] Fix compilation for Scala 2.11
Repository: spark Updated Branches: refs/heads/branch-1.6 7e06d51d5 -> e0bb4e09c [SPARK-11890][SQL] Fix compilation for Scala 2.11 Author: Michael ArmbrustCloses #9871 from marmbrus/scala211-break. (cherry picked from commit 68ed046836975b492b594967256d3c7951b568a5) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e0bb4e09 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e0bb4e09 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e0bb4e09 Branch: refs/heads/branch-1.6 Commit: e0bb4e09c7b04bc8926a4c0658fc2c51db8fb04c Parents: 7e06d51 Author: Michael Armbrust Authored: Fri Nov 20 15:38:04 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 15:38:16 2015 -0800 -- .../scala/org/apache/spark/sql/catalyst/ScalaReflection.scala| 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e0bb4e09/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 918050b..4a4a62e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -670,14 +670,14 @@ trait ScalaReflection { * Unlike `schemaFor`, this method won't throw exception for un-supported type, it will return * `NullType` silently instead. */ - private def silentSchemaFor(tpe: `Type`): Schema = try { + protected def silentSchemaFor(tpe: `Type`): Schema = try { schemaFor(tpe) } catch { case _: UnsupportedOperationException => Schema(NullType, nullable = true) } /** Returns the full class name for a type. */ - private def getClassNameFromType(tpe: `Type`): String = { + protected def getClassNameFromType(tpe: `Type`): String = { tpe.erasure.typeSymbol.asClass.fullName } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-11890][SQL] Fix compilation for Scala 2.11
Repository: spark Updated Branches: refs/heads/master 968acf3bd -> 68ed04683 [SPARK-11890][SQL] Fix compilation for Scala 2.11 Author: Michael ArmbrustCloses #9871 from marmbrus/scala211-break. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/68ed0468 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/68ed0468 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/68ed0468 Branch: refs/heads/master Commit: 68ed046836975b492b594967256d3c7951b568a5 Parents: 968acf3 Author: Michael Armbrust Authored: Fri Nov 20 15:38:04 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 15:38:04 2015 -0800 -- .../scala/org/apache/spark/sql/catalyst/ScalaReflection.scala| 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/68ed0468/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 918050b..4a4a62e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -670,14 +670,14 @@ trait ScalaReflection { * Unlike `schemaFor`, this method won't throw exception for un-supported type, it will return * `NullType` silently instead. */ - private def silentSchemaFor(tpe: `Type`): Schema = try { + protected def silentSchemaFor(tpe: `Type`): Schema = try { schemaFor(tpe) } catch { case _: UnsupportedOperationException => Schema(NullType, nullable = true) } /** Returns the full class name for a type. */ - private def getClassNameFromType(tpe: `Type`): String = { + protected def getClassNameFromType(tpe: `Type`): String = { tpe.erasure.typeSymbol.asClass.fullName } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
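The diff only widens two members from `private` to `protected`; the commit message states nothing beyond "fix compilation", so the following is a hedged illustration of why such a change can matter, not a claim about the exact 2.11 failure. `protected` keeps the helpers hidden from outside callers while still letting subtypes of the trait, such as a companion object that extends it, reach them:

```
trait Helpers {
  protected def describe(tpe: Class[_]): String = tpe.getName
}

// The companion extends the trait, mirroring how `object ScalaReflection`
// mixes in the `ScalaReflection` trait. With `private` this call site would
// be illegal (private restricts access to the trait body itself); with
// `protected` it compiles cleanly.
object Helpers extends Helpers {
  def fullNameOf(tpe: Class[_]): String = describe(tpe)
}
```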
spark git commit: [SPARK-11877] Prevent agg. fallback conf. from leaking across test suites
Repository: spark Updated Branches: refs/heads/branch-1.6 1ce6394e3 -> eab90d3f3 [SPARK-11877] Prevent agg. fallback conf. from leaking across test suites This patch fixes an issue where the `spark.sql.TungstenAggregate.testFallbackStartsAt` SQLConf setting was not properly reset / cleared at the end of `TungstenAggregationQueryWithControlledFallbackSuite`. This ended up causing test failures in HiveCompatibilitySuite in Maven builds by causing spilling to occur way too frequently. This configuration leak was inadvertently introduced during test cleanup in #9618. Author: Josh RosenCloses #9857 from JoshRosen/clear-fallback-prop-in-test-teardown. (cherry picked from commit a66142decee48bf5689fb7f4f33646d7bb1ac08d) Signed-off-by: Josh Rosen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eab90d3f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eab90d3f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eab90d3f Branch: refs/heads/branch-1.6 Commit: eab90d3f37587c6a8df8c686be6d9cd5751f205a Parents: 1ce6394 Author: Josh Rosen Authored: Fri Nov 20 00:46:29 2015 -0800 Committer: Josh Rosen Committed: Fri Nov 20 00:46:42 2015 -0800 -- .../hive/execution/AggregationQuerySuite.scala | 44 ++-- 1 file changed, 21 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/eab90d3f/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala -- diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala index 6dde79f..39c0a2a 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala @@ -868,29 +868,27 @@ class TungstenAggregationQueryWithControlledFallbackSuite extends AggregationQue override protected def checkAnswer(actual: => DataFrame, expectedAnswer: Seq[Row]): Unit = { (0 to 2).foreach { fallbackStartsAt => - sqlContext.setConf( -"spark.sql.TungstenAggregate.testFallbackStartsAt", -fallbackStartsAt.toString) - - // Create a new df to make sure its physical operator picks up - // spark.sql.TungstenAggregate.testFallbackStartsAt. - // todo: remove it? - val newActual = DataFrame(sqlContext, actual.logicalPlan) - - QueryTest.checkAnswer(newActual, expectedAnswer) match { -case Some(errorMessage) => - val newErrorMessage = -s""" - |The following aggregation query failed when using TungstenAggregate with - |controlled fallback (it falls back to sort-based aggregation once it has processed - |$fallbackStartsAt input rows). The query is - |${actual.queryExecution} - | - |$errorMessage -""".stripMargin - - fail(newErrorMessage) -case None => + withSQLConf("spark.sql.TungstenAggregate.testFallbackStartsAt" -> fallbackStartsAt.toString) { +// Create a new df to make sure its physical operator picks up +// spark.sql.TungstenAggregate.testFallbackStartsAt. +// todo: remove it? +val newActual = DataFrame(sqlContext, actual.logicalPlan) + +QueryTest.checkAnswer(newActual, expectedAnswer) match { + case Some(errorMessage) => +val newErrorMessage = + s""" +|The following aggregation query failed when using TungstenAggregate with +|controlled fallback (it falls back to sort-based aggregation once it has processed +|$fallbackStartsAt input rows). 
The query is +|${actual.queryExecution} +| +|$errorMessage + """.stripMargin + +fail(newErrorMessage) + case None => +} } } }
spark git commit: [SPARK-11877] Prevent agg. fallback conf. from leaking across test suites
Repository: spark Updated Branches: refs/heads/master 3e1d120ce -> a66142dec [SPARK-11877] Prevent agg. fallback conf. from leaking across test suites This patch fixes an issue where the `spark.sql.TungstenAggregate.testFallbackStartsAt` SQLConf setting was not properly reset / cleared at the end of `TungstenAggregationQueryWithControlledFallbackSuite`. This ended up causing test failures in HiveCompatibilitySuite in Maven builds by causing spilling to occur way too frequently. This configuration leak was inadvertently introduced during test cleanup in #9618. Author: Josh RosenCloses #9857 from JoshRosen/clear-fallback-prop-in-test-teardown. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a66142de Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a66142de Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a66142de Branch: refs/heads/master Commit: a66142decee48bf5689fb7f4f33646d7bb1ac08d Parents: 3e1d120 Author: Josh Rosen Authored: Fri Nov 20 00:46:29 2015 -0800 Committer: Josh Rosen Committed: Fri Nov 20 00:46:29 2015 -0800 -- .../hive/execution/AggregationQuerySuite.scala | 44 ++-- 1 file changed, 21 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a66142de/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala -- diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala index 6dde79f..39c0a2a 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala @@ -868,29 +868,27 @@ class TungstenAggregationQueryWithControlledFallbackSuite extends AggregationQue override protected def checkAnswer(actual: => DataFrame, expectedAnswer: Seq[Row]): Unit = { (0 to 2).foreach { fallbackStartsAt => - sqlContext.setConf( -"spark.sql.TungstenAggregate.testFallbackStartsAt", -fallbackStartsAt.toString) - - // Create a new df to make sure its physical operator picks up - // spark.sql.TungstenAggregate.testFallbackStartsAt. - // todo: remove it? - val newActual = DataFrame(sqlContext, actual.logicalPlan) - - QueryTest.checkAnswer(newActual, expectedAnswer) match { -case Some(errorMessage) => - val newErrorMessage = -s""" - |The following aggregation query failed when using TungstenAggregate with - |controlled fallback (it falls back to sort-based aggregation once it has processed - |$fallbackStartsAt input rows). The query is - |${actual.queryExecution} - | - |$errorMessage -""".stripMargin - - fail(newErrorMessage) -case None => + withSQLConf("spark.sql.TungstenAggregate.testFallbackStartsAt" -> fallbackStartsAt.toString) { +// Create a new df to make sure its physical operator picks up +// spark.sql.TungstenAggregate.testFallbackStartsAt. +// todo: remove it? +val newActual = DataFrame(sqlContext, actual.logicalPlan) + +QueryTest.checkAnswer(newActual, expectedAnswer) match { + case Some(errorMessage) => +val newErrorMessage = + s""" +|The following aggregation query failed when using TungstenAggregate with +|controlled fallback (it falls back to sort-based aggregation once it has processed +|$fallbackStartsAt input rows). 
The query is +|${actual.queryExecution} +| +|$errorMessage + """.stripMargin + +fail(newErrorMessage) + case None => +} } } }
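Both branches get the same fix: the manual `setConf` call, which nothing ever undid, is replaced by the scoped `withSQLConf` helper. The essential mechanism is a loan pattern that restores the previous value in a `finally` block; a self-contained sketch (`conf` here is a plain map standing in for SQLConf):

```
import scala.collection.mutable

object WithConfSketch {
  private val conf = mutable.Map.empty[String, String] // stand-in for SQLConf

  // Set `key` for the duration of `body`, then restore the prior state even
  // if `body` throws -- so a failing test can no longer leak the setting
  // into suites that run afterwards.
  def withConf[T](key: String, value: String)(body: => T): T = {
    val previous = conf.get(key)
    conf(key) = value
    try body
    finally previous match {
      case Some(v) => conf(key) = v // restore the old value
      case None    => conf -= key   // or clear the key entirely
    }
  }
}
```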
spark git commit: Revert "[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml"
Repository: spark Updated Branches: refs/heads/master 47815878a -> a2dce22e0 Revert "[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml" This reverts commit e359d5dcf5bd300213054ebeae9fe75c4f7eb9e7. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a2dce22e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a2dce22e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a2dce22e Branch: refs/heads/master Commit: a2dce22e0a25922e2052318d32f32877b7c27ec2 Parents: 4781587 Author: Xiangrui MengAuthored: Fri Nov 20 16:51:47 2015 -0800 Committer: Xiangrui Meng Committed: Fri Nov 20 16:51:47 2015 -0800 -- docs/ml-clustering.md | 30 --- docs/ml-guide.md| 3 +- docs/mllib-guide.md | 1 - .../spark/examples/ml/JavaLDAExample.java | 94 .../apache/spark/examples/ml/LDAExample.scala | 77 5 files changed, 1 insertion(+), 204 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a2dce22e/docs/ml-clustering.md -- diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md deleted file mode 100644 index 1743ef4..000 --- a/docs/ml-clustering.md +++ /dev/null @@ -1,30 +0,0 @@ -layout: global -title: Clustering - ML -displayTitle: ML - Clustering - -In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html). - -## Latent Dirichlet allocation (LDA) - -`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, -and generates a `LDAModel` as the base models. Expert users may cast a `LDAModel` generated by -`EMLDAOptimizer` to a `DistributedLDAModel` if needed. - - - -Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details. - - -{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %} - - - - -Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details. - -{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %} - - - \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/a2dce22e/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index 6f35b30..be18a05 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -40,7 +40,6 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g., provide class probabilities, and linear models provide model summaries. * [Feature extraction, transformation, and selection](ml-features.html) -* [Clustering](ml-clustering.html) * [Decision Trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) @@ -951,4 +950,4 @@ model.transform(test) {% endhighlight %} - \ No newline at end of file + http://git-wip-us.apache.org/repos/asf/spark/blob/a2dce22e/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 54e35fc..91e50cc 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -69,7 +69,6 @@ We list major functionality from both below, with links to detailed guides. concepts. 
It also contains sections on using algorithms within the Pipelines API, for example: * [Feature extraction, transformation, and selection](ml-features.html) -* [Clustering](ml-clustering.html) * [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/a2dce22e/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java deleted file mode 100644 index b3a7d2e..000 --- a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java +++ /dev/null @@ -1,94 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the
spark git commit: Revert "[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml"
Repository: spark Updated Branches: refs/heads/branch-1.6 285e4017a -> 33d856df5 Revert "[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml" This reverts commit 92d3563fd0cf0c3f4fe037b404d172125b24cf2f. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/33d856df Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/33d856df Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/33d856df Branch: refs/heads/branch-1.6 Commit: 33d856df53689d7fd515a21ec4f34d1d5c74a958 Parents: 285e401 Author: Xiangrui MengAuthored: Fri Nov 20 16:52:20 2015 -0800 Committer: Xiangrui Meng Committed: Fri Nov 20 16:52:20 2015 -0800 -- docs/ml-clustering.md | 30 --- docs/ml-guide.md| 3 +- docs/mllib-guide.md | 1 - .../spark/examples/ml/JavaLDAExample.java | 94 .../apache/spark/examples/ml/LDAExample.scala | 77 5 files changed, 1 insertion(+), 204 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/33d856df/docs/ml-clustering.md -- diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md deleted file mode 100644 index 1743ef4..000 --- a/docs/ml-clustering.md +++ /dev/null @@ -1,30 +0,0 @@ -layout: global -title: Clustering - ML -displayTitle: ML - Clustering - -In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html). - -## Latent Dirichlet allocation (LDA) - -`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, -and generates a `LDAModel` as the base models. Expert users may cast a `LDAModel` generated by -`EMLDAOptimizer` to a `DistributedLDAModel` if needed. - - - -Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details. - - -{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %} - - - - -Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details. - -{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %} - - - \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/33d856df/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index 6f35b30..be18a05 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -40,7 +40,6 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g., provide class probabilities, and linear models provide model summaries. * [Feature extraction, transformation, and selection](ml-features.html) -* [Clustering](ml-clustering.html) * [Decision Trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) @@ -951,4 +950,4 @@ model.transform(test) {% endhighlight %} - \ No newline at end of file + http://git-wip-us.apache.org/repos/asf/spark/blob/33d856df/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 54e35fc..91e50cc 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -69,7 +69,6 @@ We list major functionality from both below, with links to detailed guides. concepts. 
It also contains sections on using algorithms within the Pipelines API, for example: * [Feature extraction, transformation, and selection](ml-features.html) -* [Clustering](ml-clustering.html) * [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/33d856df/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java deleted file mode 100644 index b3a7d2e..000 --- a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java +++ /dev/null @@ -1,94 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with
spark git commit: [SPARK-11716][SQL] UDFRegistration just drops the input type when re-creating the UserDefinedFunction
Repository: spark Updated Branches: refs/heads/branch-1.6 fbe6888cc -> b9b0e1747 [SPARK-11716][SQL] UDFRegistration just drops the input type when re-creating the UserDefinedFunction https://issues.apache.org/jira/browse/SPARK-11716 This is one is #9739 and a regression test. When commit it, please make sure the author is jbonofre. You can find the original PR at https://github.com/apache/spark/pull/9739 closes #9739 Author: Jean-Baptiste OnofréAuthor: Yin Huai Closes #9868 from yhuai/SPARK-11716. (cherry picked from commit 03ba56d78f50747710d01c27d409ba2be42ae557) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b9b0e174 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b9b0e174 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9b0e174 Branch: refs/heads/branch-1.6 Commit: b9b0e17473e98d3d19b88abaf5ffcfdd6a2a2ea8 Parents: fbe6888 Author: Jean-Baptiste Onofré Authored: Fri Nov 20 14:45:40 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 14:45:55 2015 -0800 -- .../org/apache/spark/sql/UDFRegistration.scala | 48 ++-- .../scala/org/apache/spark/sql/UDFSuite.scala | 15 ++ 2 files changed, 39 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b9b0e174/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala b/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala index fc4d093..051694c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala @@ -88,7 +88,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try($inputTypes).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) - UserDefinedFunction(func, dataType) + UserDefinedFunction(func, dataType, inputTypes) }""") } @@ -120,7 +120,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -133,7 +133,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -146,7 +146,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -159,7 +159,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: ScalaReflection.schemaFor[A3].dataType :: Nil).getOrElse(Nil) def builder(e: 
Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -172,7 +172,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: ScalaReflection.schemaFor[A3].dataType :: ScalaReflection.schemaFor[A4].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -185,7 +185,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes
spark git commit: [SPARK-11716][SQL] UDFRegistration just drops the input type when re-creating the UserDefinedFunction
Repository: spark Updated Branches: refs/heads/master 89fd9bd06 -> 03ba56d78 [SPARK-11716][SQL] UDFRegistration just drops the input type when re-creating the UserDefinedFunction https://issues.apache.org/jira/browse/SPARK-11716 This is one is #9739 and a regression test. When commit it, please make sure the author is jbonofre. You can find the original PR at https://github.com/apache/spark/pull/9739 closes #9739 Author: Jean-Baptiste OnofréAuthor: Yin Huai Closes #9868 from yhuai/SPARK-11716. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/03ba56d7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/03ba56d7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/03ba56d7 Branch: refs/heads/master Commit: 03ba56d78f50747710d01c27d409ba2be42ae557 Parents: 89fd9bd Author: Jean-Baptiste Onofré Authored: Fri Nov 20 14:45:40 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 14:45:40 2015 -0800 -- .../org/apache/spark/sql/UDFRegistration.scala | 48 ++-- .../scala/org/apache/spark/sql/UDFSuite.scala | 15 ++ 2 files changed, 39 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/03ba56d7/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala b/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala index fc4d093..051694c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala @@ -88,7 +88,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try($inputTypes).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) - UserDefinedFunction(func, dataType) + UserDefinedFunction(func, dataType, inputTypes) }""") } @@ -120,7 +120,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -133,7 +133,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -146,7 +146,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -159,7 +159,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: ScalaReflection.schemaFor[A3].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) 
-UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -172,7 +172,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: ScalaReflection.schemaFor[A3].dataType :: ScalaReflection.schemaFor[A4].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -185,7 +185,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: ScalaReflection.schemaFor[A3].dataType ::
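The mechanical change is the same at every arity: the `UserDefinedFunction` handed back to the caller is rebuilt with the `inputTypes` that were already computed for the registry builder, instead of dropping them. A reduced sketch of the bug's shape (`SketchUDF` and `SketchRegistry` are our stand-ins, not Spark classes):

```
import org.apache.spark.sql.types.DataType

case class SketchUDF(f: AnyRef, dataType: DataType, inputTypes: Seq[DataType])

object SketchRegistry {
  def register(f: AnyRef, dataType: DataType, inputTypes: Seq[DataType]): SketchUDF = {
    // ... the builder registered with the function registry uses `inputTypes` ...
    // before the fix: SketchUDF(f, dataType, Nil)  <- input types silently dropped
    SketchUDF(f, dataType, inputTypes)              // after: types threaded through
  }
}
```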
spark git commit: [SPARK-11756][SPARKR] Fix use of aliases - SparkR cannot output help information for SparkR:::summary correctly
Repository: spark Updated Branches: refs/heads/master 03ba56d78 -> a6239d587 [SPARK-11756][SPARKR] Fix use of aliases - SparkR can not output help information for SparkR:::summary correctly Fix use of aliases and changes uses of rdname and seealso `aliases` is the hint for `?` - it should not be linked to some other name - those should be seealso https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html Clean up usage on family, as multiple use of family with the same rdname is causing duplicated See Also html blocks (like http://spark.apache.org/docs/latest/api/R/count.html) Also changing some rdname for dplyr-like variant for better R user visibility in R doc, eg. rbind, summary, mutate, summarize shivaram yanboliang Author: felixcheungCloses #9750 from felixcheung/rdocaliases. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6239d58 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a6239d58 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a6239d58 Branch: refs/heads/master Commit: a6239d587c638691f52eca3eee905c53fbf35a12 Parents: 03ba56d Author: felixcheung Authored: Fri Nov 20 15:10:55 2015 -0800 Committer: Shivaram Venkataraman Committed: Fri Nov 20 15:10:55 2015 -0800 -- R/pkg/R/DataFrame.R | 96 R/pkg/R/broadcast.R | 1 - R/pkg/R/generics.R | 12 +++--- R/pkg/R/group.R | 12 +++--- 4 files changed, 37 insertions(+), 84 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a6239d58/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 06b0108..8a13e7a 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -254,7 +254,6 @@ setMethod("dtypes", #' @family DataFrame functions #' @rdname columns #' @name columns -#' @aliases names #' @export #' @examples #'\dontrun{ @@ -272,7 +271,6 @@ setMethod("columns", }) }) -#' @family DataFrame functions #' @rdname columns #' @name names setMethod("names", @@ -281,7 +279,6 @@ setMethod("names", columns(x) }) -#' @family DataFrame functions #' @rdname columns #' @name names<- setMethod("names<-", @@ -533,14 +530,8 @@ setMethod("distinct", dataFrame(sdf) }) -#' @title Distinct rows in a DataFrame -# -#' @description Returns a new DataFrame containing distinct rows in this DataFrame -#' -#' @family DataFrame functions -#' @rdname unique +#' @rdname distinct #' @name unique -#' @aliases distinct setMethod("unique", signature(x = "DataFrame"), function(x) { @@ -557,7 +548,7 @@ setMethod("unique", #' #' @family DataFrame functions #' @rdname sample -#' @aliases sample_frac +#' @name sample #' @export #' @examples #'\dontrun{ @@ -579,7 +570,6 @@ setMethod("sample", dataFrame(sdf) }) -#' @family DataFrame functions #' @rdname sample #' @name sample_frac setMethod("sample_frac", @@ -589,16 +579,15 @@ setMethod("sample_frac", sample(x, withReplacement, fraction) }) -#' Count +#' nrow #' #' Returns the number of rows in a DataFrame #' #' @param x A SparkSQL DataFrame #' #' @family DataFrame functions -#' @rdname count +#' @rdname nrow #' @name count -#' @aliases nrow #' @export #' @examples #'\dontrun{ @@ -614,14 +603,8 @@ setMethod("count", callJMethod(x@sdf, "count") }) -#' @title Number of rows for a DataFrame -#' @description Returns number of rows in a DataFrames -#' #' @name nrow -#' -#' @family DataFrame functions #' @rdname nrow -#' @aliases count setMethod("nrow", signature(x = "DataFrame"), function(x) { @@ -870,7 +853,6 @@ setMethod("toRDD", #' @param x a DataFrame #' @return a GroupedData #' @seealso GroupedData 
-#' @aliases group_by #' @family DataFrame functions #' @rdname groupBy #' @name groupBy @@ -896,7 +878,6 @@ setMethod("groupBy", groupedData(sgd) }) -#' @family DataFrame functions #' @rdname groupBy #' @name group_by setMethod("group_by", @@ -913,7 +894,6 @@ setMethod("group_by", #' @family DataFrame functions #' @rdname agg #' @name agg -#' @aliases summarize #' @export setMethod("agg", signature(x = "DataFrame"), @@ -921,7 +901,6 @@ setMethod("agg", agg(groupBy(x), ...) }) -#' @family DataFrame functions #' @rdname agg #' @name summarize setMethod("summarize", @@ -1092,7 +1071,6 @@ setMethod("[", signature(x = "DataFrame", i = "Column"), #' @family DataFrame functions #' @rdname subset #' @name subset -#' @aliases [ #' @family subsetting
[2/2] spark git commit: [SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md using include_example
[SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md using include_example Author: Vikas NelamangalaCloses #9689 from vikasnp/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed47b1e6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed47b1e6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed47b1e6 Branch: refs/heads/master Commit: ed47b1e660b830e2d4fac8d6df93f634b260393c Parents: 4b84c72 Author: Vikas Nelamangala Authored: Fri Nov 20 15:18:41 2015 -0800 Committer: Xiangrui Meng Committed: Fri Nov 20 15:18:41 2015 -0800 -- docs/mllib-evaluation-metrics.md| 940 +-- .../JavaBinaryClassificationMetricsExample.java | 113 +++ ...aMultiLabelClassificationMetricsExample.java | 80 ++ ...aMulticlassClassificationMetricsExample.java | 97 ++ .../mllib/JavaRankingMetricsExample.java| 176 .../mllib/JavaRegressionMetricsExample.java | 91 ++ .../binary_classification_metrics_example.py| 55 ++ .../python/mllib/multi_class_metrics_example.py | 69 ++ .../python/mllib/multi_label_metrics_example.py | 61 ++ .../python/mllib/ranking_metrics_example.py | 55 ++ .../python/mllib/regression_metrics_example.py | 59 ++ .../BinaryClassificationMetricsExample.scala| 103 ++ .../mllib/MultiLabelMetricsExample.scala| 69 ++ .../mllib/MulticlassMetricsExample.scala| 99 ++ .../examples/mllib/RankingMetricsExample.scala | 110 +++ .../mllib/RegressionMetricsExample.scala| 67 ++ 16 files changed, 1319 insertions(+), 925 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ed47b1e6/docs/mllib-evaluation-metrics.md -- diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md index f73eff6..6924037 100644 --- a/docs/mllib-evaluation-metrics.md +++ b/docs/mllib-evaluation-metrics.md @@ -104,214 +104,21 @@ data, and evaluate the performance of the algorithm by several binary evaluation Refer to the [`LogisticRegressionWithLBFGS` Scala docs](api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS) and [`BinaryClassificationMetrics` Scala docs](api/scala/index.html#org.apache.spark.mllib.evaluation.BinaryClassificationMetrics) for details on the API. 
-{% highlight scala %} -import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS -import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics -import org.apache.spark.mllib.regression.LabeledPoint -import org.apache.spark.mllib.util.MLUtils - -// Load training data in LIBSVM format -val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_binary_classification_data.txt") - -// Split data into training (60%) and test (40%) -val Array(training, test) = data.randomSplit(Array(0.6, 0.4), seed = 11L) -training.cache() - -// Run training algorithm to build the model -val model = new LogisticRegressionWithLBFGS() - .setNumClasses(2) - .run(training) - -// Clear the prediction threshold so the model will return probabilities -model.clearThreshold - -// Compute raw scores on the test set -val predictionAndLabels = test.map { case LabeledPoint(label, features) => - val prediction = model.predict(features) - (prediction, label) -} - -// Instantiate metrics object -val metrics = new BinaryClassificationMetrics(predictionAndLabels) - -// Precision by threshold -val precision = metrics.precisionByThreshold -precision.foreach { case (t, p) => -println(s"Threshold: $t, Precision: $p") -} - -// Recall by threshold -val recall = metrics.recallByThreshold -recall.foreach { case (t, r) => -println(s"Threshold: $t, Recall: $r") -} - -// Precision-Recall Curve -val PRC = metrics.pr - -// F-measure -val f1Score = metrics.fMeasureByThreshold -f1Score.foreach { case (t, f) => -println(s"Threshold: $t, F-score: $f, Beta = 1") -} - -val beta = 0.5 -val fScore = metrics.fMeasureByThreshold(beta) -f1Score.foreach { case (t, f) => -println(s"Threshold: $t, F-score: $f, Beta = 0.5") -} - -// AUPRC -val auPRC = metrics.areaUnderPR -println("Area under precision-recall curve = " + auPRC) - -// Compute thresholds used in ROC and PR curves -val thresholds = precision.map(_._1) - -// ROC Curve -val roc = metrics.roc - -// AUROC -val auROC = metrics.areaUnderROC -println("Area under ROC = " + auROC) - -{% endhighlight %} +{% include_example scala/org/apache/spark/examples/mllib/BinaryClassificationMetricsExample.scala %} Refer to the [`LogisticRegressionModel` Java docs](api/java/org/apache/spark/mllib/classification/LogisticRegressionModel.html) and
[1/2] spark git commit: [SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md using include_example
Repository: spark Updated Branches: refs/heads/master 4b84c72df -> ed47b1e66 http://git-wip-us.apache.org/repos/asf/spark/blob/ed47b1e6/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala new file mode 100644 index 000..4503c15 --- /dev/null +++ b/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// scalastyle:off println +package org.apache.spark.examples.mllib + +// $example on$ +import org.apache.spark.mllib.evaluation.MultilabelMetrics +import org.apache.spark.rdd.RDD +// $example off$ +import org.apache.spark.{SparkContext, SparkConf} + +object MultiLabelMetricsExample { + def main(args: Array[String]): Unit = { +val conf = new SparkConf().setAppName("MultiLabelMetricsExample") +val sc = new SparkContext(conf) +// $example on$ +val scoreAndLabels: RDD[(Array[Double], Array[Double])] = sc.parallelize( + Seq((Array(0.0, 1.0), Array(0.0, 2.0)), +(Array(0.0, 2.0), Array(0.0, 1.0)), +(Array.empty[Double], Array(0.0)), +(Array(2.0), Array(2.0)), +(Array(2.0, 0.0), Array(2.0, 0.0)), +(Array(0.0, 1.0, 2.0), Array(0.0, 1.0)), +(Array(1.0), Array(1.0, 2.0))), 2) + +// Instantiate metrics object +val metrics = new MultilabelMetrics(scoreAndLabels) + +// Summary stats +println(s"Recall = ${metrics.recall}") +println(s"Precision = ${metrics.precision}") +println(s"F1 measure = ${metrics.f1Measure}") +println(s"Accuracy = ${metrics.accuracy}") + +// Individual label stats +metrics.labels.foreach(label => + println(s"Class $label precision = ${metrics.precision(label)}")) +metrics.labels.foreach(label => println(s"Class $label recall = ${metrics.recall(label)}")) +metrics.labels.foreach(label => println(s"Class $label F1-score = ${metrics.f1Measure(label)}")) + +// Micro stats +println(s"Micro recall = ${metrics.microRecall}") +println(s"Micro precision = ${metrics.microPrecision}") +println(s"Micro F1 measure = ${metrics.microF1Measure}") + +// Hamming loss +println(s"Hamming loss = ${metrics.hammingLoss}") + +// Subset accuracy +println(s"Subset accuracy = ${metrics.subsetAccuracy}") +// $example off$ + } +} +// scalastyle:on println http://git-wip-us.apache.org/repos/asf/spark/blob/ed47b1e6/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala new file mode 100644 index 000..0904449 --- /dev/null +++ 
b/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// scalastyle:off println +package org.apache.spark.examples.mllib + +// $example on$ +import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS +import
[1/2] spark git commit: [SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md using include_example
Repository: spark Updated Branches: refs/heads/branch-1.6 0665fb5ea -> 1dde97176 http://git-wip-us.apache.org/repos/asf/spark/blob/1dde9717/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala new file mode 100644 index 000..4503c15 --- /dev/null +++ b/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// scalastyle:off println +package org.apache.spark.examples.mllib + +// $example on$ +import org.apache.spark.mllib.evaluation.MultilabelMetrics +import org.apache.spark.rdd.RDD +// $example off$ +import org.apache.spark.{SparkContext, SparkConf} + +object MultiLabelMetricsExample { + def main(args: Array[String]): Unit = { +val conf = new SparkConf().setAppName("MultiLabelMetricsExample") +val sc = new SparkContext(conf) +// $example on$ +val scoreAndLabels: RDD[(Array[Double], Array[Double])] = sc.parallelize( + Seq((Array(0.0, 1.0), Array(0.0, 2.0)), +(Array(0.0, 2.0), Array(0.0, 1.0)), +(Array.empty[Double], Array(0.0)), +(Array(2.0), Array(2.0)), +(Array(2.0, 0.0), Array(2.0, 0.0)), +(Array(0.0, 1.0, 2.0), Array(0.0, 1.0)), +(Array(1.0), Array(1.0, 2.0))), 2) + +// Instantiate metrics object +val metrics = new MultilabelMetrics(scoreAndLabels) + +// Summary stats +println(s"Recall = ${metrics.recall}") +println(s"Precision = ${metrics.precision}") +println(s"F1 measure = ${metrics.f1Measure}") +println(s"Accuracy = ${metrics.accuracy}") + +// Individual label stats +metrics.labels.foreach(label => + println(s"Class $label precision = ${metrics.precision(label)}")) +metrics.labels.foreach(label => println(s"Class $label recall = ${metrics.recall(label)}")) +metrics.labels.foreach(label => println(s"Class $label F1-score = ${metrics.f1Measure(label)}")) + +// Micro stats +println(s"Micro recall = ${metrics.microRecall}") +println(s"Micro precision = ${metrics.microPrecision}") +println(s"Micro F1 measure = ${metrics.microF1Measure}") + +// Hamming loss +println(s"Hamming loss = ${metrics.hammingLoss}") + +// Subset accuracy +println(s"Subset accuracy = ${metrics.subsetAccuracy}") +// $example off$ + } +} +// scalastyle:on println http://git-wip-us.apache.org/repos/asf/spark/blob/1dde9717/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala new file mode 100644 index 000..0904449 --- /dev/null 
+++ b/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// scalastyle:off println +package org.apache.spark.examples.mllib + +// $example on$ +import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS +import
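All of the new example files share one convention that makes `include_example` work: the guide pulls in only the region between the `$example on$` and `$example off$` marker comments, so context such as `SparkContext` setup never reaches the rendered page. A skeleton of that layout (the RDD contents are ours; the markers are exactly as in the diff):

```
import org.apache.spark.{SparkConf, SparkContext}

object DocExampleSkeleton {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("DocExampleSkeleton"))
    // $example on$
    val data = sc.parallelize(Seq(1.0, 2.0, 3.0))
    println(s"sum = ${data.sum()}") // only this region lands in the guide
    // $example off$
    sc.stop()
  }
}
```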
spark git commit: [SPARK-11889][SQL] Fix type inference for GroupedDataset.agg in REPL
Repository: spark Updated Branches: refs/heads/master 58b4e4f88 -> 968acf3bd [SPARK-11889][SQL] Fix type inference for GroupedDataset.agg in REPL In this PR I delete a method that breaks type inference for aggregators (only in the REPL) The error when this method is present is: ``` :38: error: missing parameter type for expanded function ((x$2) => x$2._2) ds.groupBy(_._1).agg(sum(_._2), sum(_._3)).collect() ``` Author: Michael ArmbrustCloses #9870 from marmbrus/dataset-repl-agg. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/968acf3b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/968acf3b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/968acf3b Branch: refs/heads/master Commit: 968acf3bd9a502fcad15df3e53e359695ae702cc Parents: 58b4e4f Author: Michael Armbrust Authored: Fri Nov 20 15:36:30 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 15:36:30 2015 -0800 -- .../scala/org/apache/spark/repl/ReplSuite.scala | 24 + .../org/apache/spark/sql/GroupedDataset.scala | 27 +++- .../org/apache/spark/sql/JavaDatasetSuite.java | 8 +++--- 3 files changed, 30 insertions(+), 29 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/968acf3b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala index 081aa03..cbcccb1 100644 --- a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala +++ b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -339,6 +339,30 @@ class ReplSuite extends SparkFunSuite { } } + test("Datasets agg type-inference") { +val output = runInterpreter("local", + """ +|import org.apache.spark.sql.functions._ +|import org.apache.spark.sql.Encoder +|import org.apache.spark.sql.expressions.Aggregator +|import org.apache.spark.sql.TypedColumn +|/** An `Aggregator` that adds up any numeric type returned by the given function. */ +|class SumOf[I, N : Numeric](f: I => N) extends Aggregator[I, N, N] with Serializable { +| val numeric = implicitly[Numeric[N]] +| override def zero: N = numeric.zero +| override def reduce(b: N, a: I): N = numeric.plus(b, f(a)) +| override def merge(b1: N,b2: N): N = numeric.plus(b1, b2) +| override def finish(reduction: N): N = reduction +|} +| +|def sum[I, N : Numeric : Encoder](f: I => N): TypedColumn[I, N] = new SumOf(f).toColumn +|val ds = Seq((1, 1, 2L), (1, 2, 3L), (1, 3, 4L), (2, 1, 5L)).toDS() +|ds.groupBy(_._1).agg(sum(_._2), sum(_._3)).collect() + """.stripMargin) +assertDoesNotContain("error:", output) +assertDoesNotContain("Exception", output) + } + test("collecting objects of class defined in repl") { val output = runInterpreter("local[2]", """ http://git-wip-us.apache.org/repos/asf/spark/blob/968acf3b/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala index 6de3dd6..263f049 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala @@ -146,31 +146,10 @@ class GroupedDataset[K, T] private[sql]( reduce(f.call _) } - /** - * Compute aggregates by specifying a series of aggregate columns, and return a [[DataFrame]]. 
- * We can call `as[T : Encoder]` to turn the returned [[DataFrame]] to [[Dataset]] again. - * - * The available aggregate methods are defined in [[org.apache.spark.sql.functions]]. - * - * {{{ - * // Selects the age of the oldest employee and the aggregate expense for each department - * - * // Scala: - * import org.apache.spark.sql.functions._ - * df.groupBy("department").agg(max("age"), sum("expense")) - * - * // Java: - * import static org.apache.spark.sql.functions.*; - * df.groupBy("department").agg(max("age"), sum("expense")); - * }}} - * - * We can also use `Aggregator.toColumn` to pass in typed aggregate functions. - * - * @since 1.6.0 - */ + // This is here to prevent us from adding overloads that would be ambiguous. @scala.annotation.varargs - def agg(expr: Column, exprs: Column*): DataFrame = -
spark git commit: [SPARK-11889][SQL] Fix type inference for GroupedDataset.agg in REPL
Repository: spark Updated Branches: refs/heads/branch-1.6 7437a7f5b -> 7e06d51d5 [SPARK-11889][SQL] Fix type inference for GroupedDataset.agg in REPL In this PR I delete a method that breaks type inference for aggregators (only in the REPL) The error when this method is present is: ``` :38: error: missing parameter type for expanded function ((x$2) => x$2._2) ds.groupBy(_._1).agg(sum(_._2), sum(_._3)).collect() ``` Author: Michael ArmbrustCloses #9870 from marmbrus/dataset-repl-agg. (cherry picked from commit 968acf3bd9a502fcad15df3e53e359695ae702cc) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7e06d51d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7e06d51d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7e06d51d Branch: refs/heads/branch-1.6 Commit: 7e06d51d5637d4f8e042a1a230ee48591d08236f Parents: 7437a7f Author: Michael Armbrust Authored: Fri Nov 20 15:36:30 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 15:36:39 2015 -0800 -- .../scala/org/apache/spark/repl/ReplSuite.scala | 24 + .../org/apache/spark/sql/GroupedDataset.scala | 27 +++- .../org/apache/spark/sql/JavaDatasetSuite.java | 8 +++--- 3 files changed, 30 insertions(+), 29 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7e06d51d/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala index 081aa03..cbcccb1 100644 --- a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala +++ b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -339,6 +339,30 @@ class ReplSuite extends SparkFunSuite { } } + test("Datasets agg type-inference") { +val output = runInterpreter("local", + """ +|import org.apache.spark.sql.functions._ +|import org.apache.spark.sql.Encoder +|import org.apache.spark.sql.expressions.Aggregator +|import org.apache.spark.sql.TypedColumn +|/** An `Aggregator` that adds up any numeric type returned by the given function. 
*/ +|class SumOf[I, N : Numeric](f: I => N) extends Aggregator[I, N, N] with Serializable { +| val numeric = implicitly[Numeric[N]] +| override def zero: N = numeric.zero +| override def reduce(b: N, a: I): N = numeric.plus(b, f(a)) +| override def merge(b1: N,b2: N): N = numeric.plus(b1, b2) +| override def finish(reduction: N): N = reduction +|} +| +|def sum[I, N : Numeric : Encoder](f: I => N): TypedColumn[I, N] = new SumOf(f).toColumn +|val ds = Seq((1, 1, 2L), (1, 2, 3L), (1, 3, 4L), (2, 1, 5L)).toDS() +|ds.groupBy(_._1).agg(sum(_._2), sum(_._3)).collect() + """.stripMargin) +assertDoesNotContain("error:", output) +assertDoesNotContain("Exception", output) + } + test("collecting objects of class defined in repl") { val output = runInterpreter("local[2]", """ http://git-wip-us.apache.org/repos/asf/spark/blob/7e06d51d/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala index 6de3dd6..263f049 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala @@ -146,31 +146,10 @@ class GroupedDataset[K, T] private[sql]( reduce(f.call _) } - /** - * Compute aggregates by specifying a series of aggregate columns, and return a [[DataFrame]]. - * We can call `as[T : Encoder]` to turn the returned [[DataFrame]] to [[Dataset]] again. - * - * The available aggregate methods are defined in [[org.apache.spark.sql.functions]]. - * - * {{{ - * // Selects the age of the oldest employee and the aggregate expense for each department - * - * // Scala: - * import org.apache.spark.sql.functions._ - * df.groupBy("department").agg(max("age"), sum("expense")) - * - * // Java: - * import static org.apache.spark.sql.functions.*; - * df.groupBy("department").agg(max("age"), sum("expense")); - * }}} - * - * We can also use `Aggregator.toColumn` to pass in typed aggregate functions. - * - * @since 1.6.0 - */ + // This is here to prevent us from adding
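The deleted method was an untyped `agg(Column, Column*)` overload; with it gone, overload resolution no longer has to choose between a typed and an untyped `agg` before it can type the lambda. A minimal, non-Spark illustration of that failure mode:

```
object AggOverloadSketch {
  def agg[T](f: ((Int, Long)) => T): T = f((1, 2L)) // typed variant, kept
  // def agg(cols: String*): Unit = ()              // competing overload, removed

  // With only one `agg` in scope the compiler knows the lambda's expected
  // type, so `_._2` infers as ((Int, Long)) => Long. Uncommenting the second
  // overload reproduces the "missing parameter type" error quoted above.
  val total: Long = agg(_._2)
}
```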
spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer
Repository: spark
Updated Branches: refs/heads/branch-1.6 9c8e17984 -> 0c23dd52d

[SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception happens, they just return None. This causes some weird NPEs and confuses people.

Author: Shixiong Zhu
Closes #9847 from zsxwing/pyspark-streaming-exception.

(cherry picked from commit be7a2cfd978143f6f265eca63e9e24f755bc9f22)
Signed-off-by: Tathagata Das

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0c23dd52
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0c23dd52
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0c23dd52

Branch: refs/heads/branch-1.6
Commit: 0c23dd52d64d4a3448fb7d21b0e40d13f885bcfa
Parents: 9c8e179
Author: Shixiong Zhu
Authored: Fri Nov 20 14:23:01 2015 -0800
Committer: Tathagata Das
Committed: Fri Nov 20 14:23:18 2015 -0800

--
 python/pyspark/streaming/tests.py | 16
 python/pyspark/streaming/util.py  |  3 +++
 2 files changed, 19 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/0c23dd52/python/pyspark/streaming/tests.py
--
diff --git a/python/pyspark/streaming/tests.py b/python/pyspark/streaming/tests.py
index 3403f6d..a0e0267 100644
--- a/python/pyspark/streaming/tests.py
+++ b/python/pyspark/streaming/tests.py
@@ -403,6 +403,22 @@ class BasicOperationTests(PySparkStreamingTestCase):
         expected = [[('k', v)] for v in expected]
         self._test_func(input, func, expected)
 
+    def test_failed_func(self):
+        input = [self.sc.parallelize([d], 1) for d in range(4)]
+        input_stream = self.ssc.queueStream(input)
+
+        def failed_func(i):
+            raise ValueError("failed")
+
+        input_stream.map(failed_func).pprint()
+        self.ssc.start()
+        try:
+            self.ssc.awaitTerminationOrTimeout(10)
+        except:
+            return
+
+        self.fail("a failed func should throw an error")
+
 
 class StreamingListenerTests(PySparkStreamingTestCase):

http://git-wip-us.apache.org/repos/asf/spark/blob/0c23dd52/python/pyspark/streaming/util.py
--
diff --git a/python/pyspark/streaming/util.py b/python/pyspark/streaming/util.py
index b20613b..767c732 100644
--- a/python/pyspark/streaming/util.py
+++ b/python/pyspark/streaming/util.py
@@ -64,6 +64,7 @@ class TransformFunction(object):
             return r._jrdd
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunction(%s)" % self.func
 
@@ -95,6 +96,7 @@ class TransformFunctionSerializer(object):
             return bytearray(self.serializer.dumps((func.func, func.deserializers)))
         except Exception:
             traceback.print_exc()
+            raise
 
     def loads(self, data):
         try:
@@ -102,6 +104,7 @@ class TransformFunctionSerializer(object):
             return TransformFunction(self.ctx, f, *deserializers)
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunctionSerializer(%s)" % self.serializer
spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer
Repository: spark
Updated Branches: refs/heads/branch-1.4 5118abb4e -> 94789f374

[SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception happens, they just return None. This causes some weird NPEs and confuses people.

Author: Shixiong Zhu
Closes #9847 from zsxwing/pyspark-streaming-exception.

(cherry picked from commit be7a2cfd978143f6f265eca63e9e24f755bc9f22)
Signed-off-by: Tathagata Das

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/94789f37
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/94789f37
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/94789f37

Branch: refs/heads/branch-1.4
Commit: 94789f37400ea534e051d1df19f3a567646979fd
Parents: 5118abb
Author: Shixiong Zhu
Authored: Fri Nov 20 14:23:01 2015 -0800
Committer: Tathagata Das
Committed: Fri Nov 20 14:23:57 2015 -0800

--
 python/pyspark/streaming/tests.py | 16
 python/pyspark/streaming/util.py  |  3 +++
 2 files changed, 19 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/94789f37/python/pyspark/streaming/tests.py
--
diff --git a/python/pyspark/streaming/tests.py b/python/pyspark/streaming/tests.py
index ab1cc3f..830fd9e 100644
--- a/python/pyspark/streaming/tests.py
+++ b/python/pyspark/streaming/tests.py
@@ -376,6 +376,22 @@ class BasicOperationTests(PySparkStreamingTestCase):
         expected = [[('k', v)] for v in expected]
         self._test_func(input, func, expected)
 
+    def test_failed_func(self):
+        input = [self.sc.parallelize([d], 1) for d in range(4)]
+        input_stream = self.ssc.queueStream(input)
+
+        def failed_func(i):
+            raise ValueError("failed")
+
+        input_stream.map(failed_func).pprint()
+        self.ssc.start()
+        try:
+            self.ssc.awaitTerminationOrTimeout(10)
+        except:
+            return
+
+        self.fail("a failed func should throw an error")
+
 
 class WindowFunctionTests(PySparkStreamingTestCase):

http://git-wip-us.apache.org/repos/asf/spark/blob/94789f37/python/pyspark/streaming/util.py
--
diff --git a/python/pyspark/streaming/util.py b/python/pyspark/streaming/util.py
index 34291f3..0a2773b 100644
--- a/python/pyspark/streaming/util.py
+++ b/python/pyspark/streaming/util.py
@@ -59,6 +59,7 @@ class TransformFunction(object):
             return r._jrdd
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunction(%s)" % self.func
 
@@ -90,6 +91,7 @@ class TransformFunctionSerializer(object):
             return bytearray(self.serializer.dumps((func.func, func.deserializers)))
         except Exception:
             traceback.print_exc()
+            raise
 
     def loads(self, data):
         try:
@@ -97,6 +99,7 @@ class TransformFunctionSerializer(object):
             return TransformFunction(self.ctx, f, *deserializers)
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunctionSerializer(%s)" % self.serializer
spark git commit: [SPARK-11887] Close PersistenceEngine at the end of PersistenceEngineSuite tests
Repository: spark Updated Branches: refs/heads/master be7a2cfd9 -> 89fd9bd06 [SPARK-11887] Close PersistenceEngine at the end of PersistenceEngineSuite tests In PersistenceEngineSuite, we do not call `close()` on the PersistenceEngine at the end of the test. For the ZooKeeperPersistenceEngine, this causes us to leak a ZooKeeper client, causing the logs of unrelated tests to be periodically spammed with connection error messages from that client: ``` 15/11/20 05:13:35.789 pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741) INFO ClientCnxn: Opening socket connection to server localhost/127.0.0.1:15741. Will not attempt to authenticate using SASL (unknown error) 15/11/20 05:13:35.790 pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741) WARN ClientCnxn: Session 0x15124ff48dd for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) ``` This patch fixes this by using a `finally` block. Author: Josh RosenCloses #9864 from JoshRosen/close-zookeeper-client-in-tests. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/89fd9bd0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/89fd9bd0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/89fd9bd0 Branch: refs/heads/master Commit: 89fd9bd06160fa89dedbf685bfe159ffe4a06ec6 Parents: be7a2cf Author: Josh Rosen Authored: Fri Nov 20 14:31:26 2015 -0800 Committer: Josh Rosen Committed: Fri Nov 20 14:31:26 2015 -0800 -- .../deploy/master/PersistenceEngineSuite.scala | 100 ++- 1 file changed, 52 insertions(+), 48 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/89fd9bd0/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala b/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala index 3477557..7a44728 100644 --- a/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala @@ -63,56 +63,60 @@ class PersistenceEngineSuite extends SparkFunSuite { conf: SparkConf, persistenceEngineCreator: Serializer => PersistenceEngine): Unit = { val serializer = new JavaSerializer(conf) val persistenceEngine = persistenceEngineCreator(serializer) -persistenceEngine.persist("test_1", "test_1_value") -assert(Seq("test_1_value") === persistenceEngine.read[String]("test_")) -persistenceEngine.persist("test_2", "test_2_value") -assert(Set("test_1_value", "test_2_value") === persistenceEngine.read[String]("test_").toSet) -persistenceEngine.unpersist("test_1") -assert(Seq("test_2_value") === persistenceEngine.read[String]("test_")) -persistenceEngine.unpersist("test_2") -assert(persistenceEngine.read[String]("test_").isEmpty) - -// Test deserializing objects that contain RpcEndpointRef -val testRpcEnv = RpcEnv.create("test", "localhost", 12345, conf, new SecurityManager(conf)) try { - // Create a real endpoint so that we can test RpcEndpointRef deserialization - val workerEndpoint = testRpcEnv.setupEndpoint("worker", new 
RpcEndpoint { -override val rpcEnv: RpcEnv = testRpcEnv - }) - - val workerToPersist = new WorkerInfo( -id = "test_worker", -host = "127.0.0.1", -port = 1, -cores = 0, -memory = 0, -endpoint = workerEndpoint, -webUiPort = 0, -publicAddress = "" - ) - - persistenceEngine.addWorker(workerToPersist) - - val (storedApps, storedDrivers, storedWorkers) = -persistenceEngine.readPersistedData(testRpcEnv) - - assert(storedApps.isEmpty) - assert(storedDrivers.isEmpty) - - // Check deserializing WorkerInfo - assert(storedWorkers.size == 1) - val recoveryWorkerInfo = storedWorkers.head - assert(workerToPersist.id === recoveryWorkerInfo.id) - assert(workerToPersist.host === recoveryWorkerInfo.host) - assert(workerToPersist.port === recoveryWorkerInfo.port) - assert(workerToPersist.cores === recoveryWorkerInfo.cores) -
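The quoted diff breaks off before the new `finally` block is visible, so here is the shape of the fix as a generic sketch (hypothetical helper name; in the suite the resource is the `PersistenceEngine` created just before the `try`):

```scala
// Loan pattern: guarantee cleanup even when the body throws.
def withResource[R <: AutoCloseable, T](open: => R)(body: R => T): T = {
  val resource = open
  try {
    body(resource)
  } finally {
    resource.close() // for ZooKeeperPersistenceEngine this releases the ZK client
  }
}
```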
spark git commit: [SPARK-11887] Close PersistenceEngine at the end of PersistenceEngineSuite tests
Repository: spark Updated Branches: refs/heads/branch-1.6 0c23dd52d -> fbe6888cc [SPARK-11887] Close PersistenceEngine at the end of PersistenceEngineSuite tests In PersistenceEngineSuite, we do not call `close()` on the PersistenceEngine at the end of the test. For the ZooKeeperPersistenceEngine, this causes us to leak a ZooKeeper client, causing the logs of unrelated tests to be periodically spammed with connection error messages from that client: ``` 15/11/20 05:13:35.789 pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741) INFO ClientCnxn: Opening socket connection to server localhost/127.0.0.1:15741. Will not attempt to authenticate using SASL (unknown error) 15/11/20 05:13:35.790 pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741) WARN ClientCnxn: Session 0x15124ff48dd for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) ``` This patch fixes this by using a `finally` block. Author: Josh RosenCloses #9864 from JoshRosen/close-zookeeper-client-in-tests. (cherry picked from commit 89fd9bd06160fa89dedbf685bfe159ffe4a06ec6) Signed-off-by: Josh Rosen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fbe6888c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fbe6888c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fbe6888c Branch: refs/heads/branch-1.6 Commit: fbe6888cc0c8a16531a4ba7ce5235b84474f1a7b Parents: 0c23dd5 Author: Josh Rosen Authored: Fri Nov 20 14:31:26 2015 -0800 Committer: Josh Rosen Committed: Fri Nov 20 14:31:38 2015 -0800 -- .../deploy/master/PersistenceEngineSuite.scala | 100 ++- 1 file changed, 52 insertions(+), 48 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fbe6888c/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala b/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala index 3477557..7a44728 100644 --- a/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala @@ -63,56 +63,60 @@ class PersistenceEngineSuite extends SparkFunSuite { conf: SparkConf, persistenceEngineCreator: Serializer => PersistenceEngine): Unit = { val serializer = new JavaSerializer(conf) val persistenceEngine = persistenceEngineCreator(serializer) -persistenceEngine.persist("test_1", "test_1_value") -assert(Seq("test_1_value") === persistenceEngine.read[String]("test_")) -persistenceEngine.persist("test_2", "test_2_value") -assert(Set("test_1_value", "test_2_value") === persistenceEngine.read[String]("test_").toSet) -persistenceEngine.unpersist("test_1") -assert(Seq("test_2_value") === persistenceEngine.read[String]("test_")) -persistenceEngine.unpersist("test_2") -assert(persistenceEngine.read[String]("test_").isEmpty) - -// Test deserializing objects that contain RpcEndpointRef -val testRpcEnv = RpcEnv.create("test", "localhost", 12345, conf, new SecurityManager(conf)) try { - // Create a real endpoint so that 
we can test RpcEndpointRef deserialization - val workerEndpoint = testRpcEnv.setupEndpoint("worker", new RpcEndpoint { -override val rpcEnv: RpcEnv = testRpcEnv - }) - - val workerToPersist = new WorkerInfo( -id = "test_worker", -host = "127.0.0.1", -port = 1, -cores = 0, -memory = 0, -endpoint = workerEndpoint, -webUiPort = 0, -publicAddress = "" - ) - - persistenceEngine.addWorker(workerToPersist) - - val (storedApps, storedDrivers, storedWorkers) = -persistenceEngine.readPersistedData(testRpcEnv) - - assert(storedApps.isEmpty) - assert(storedDrivers.isEmpty) - - // Check deserializing WorkerInfo - assert(storedWorkers.size == 1) - val recoveryWorkerInfo = storedWorkers.head - assert(workerToPersist.id === recoveryWorkerInfo.id) - assert(workerToPersist.host === recoveryWorkerInfo.host) -
spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer
Repository: spark
Updated Branches: refs/heads/branch-1.5 9a906c1c3 -> e9ae1fda9

[SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception happens, they just return None. This causes some weird NPEs and confuses people.

Author: Shixiong Zhu
Closes #9847 from zsxwing/pyspark-streaming-exception.

(cherry picked from commit be7a2cfd978143f6f265eca63e9e24f755bc9f22)
Signed-off-by: Tathagata Das

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e9ae1fda
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e9ae1fda
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e9ae1fda

Branch: refs/heads/branch-1.5
Commit: e9ae1fda9e6009cf95f9a98ba130297126155e06
Parents: 9a906c1
Author: Shixiong Zhu
Authored: Fri Nov 20 14:23:01 2015 -0800
Committer: Tathagata Das
Committed: Fri Nov 20 14:23:38 2015 -0800

--
 python/pyspark/streaming/tests.py | 16
 python/pyspark/streaming/util.py  |  3 +++
 2 files changed, 19 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/e9ae1fda/python/pyspark/streaming/tests.py
--
diff --git a/python/pyspark/streaming/tests.py b/python/pyspark/streaming/tests.py
index 41f94af..63a5fd0 100644
--- a/python/pyspark/streaming/tests.py
+++ b/python/pyspark/streaming/tests.py
@@ -391,6 +391,22 @@ class BasicOperationTests(PySparkStreamingTestCase):
         expected = [[('k', v)] for v in expected]
         self._test_func(input, func, expected)
 
+    def test_failed_func(self):
+        input = [self.sc.parallelize([d], 1) for d in range(4)]
+        input_stream = self.ssc.queueStream(input)
+
+        def failed_func(i):
+            raise ValueError("failed")
+
+        input_stream.map(failed_func).pprint()
+        self.ssc.start()
+        try:
+            self.ssc.awaitTerminationOrTimeout(10)
+        except:
+            return
+
+        self.fail("a failed func should throw an error")
+
 
 class WindowFunctionTests(PySparkStreamingTestCase):

http://git-wip-us.apache.org/repos/asf/spark/blob/e9ae1fda/python/pyspark/streaming/util.py
--
diff --git a/python/pyspark/streaming/util.py b/python/pyspark/streaming/util.py
index b20613b..767c732 100644
--- a/python/pyspark/streaming/util.py
+++ b/python/pyspark/streaming/util.py
@@ -64,6 +64,7 @@ class TransformFunction(object):
             return r._jrdd
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunction(%s)" % self.func
 
@@ -95,6 +96,7 @@ class TransformFunctionSerializer(object):
             return bytearray(self.serializer.dumps((func.func, func.deserializers)))
         except Exception:
             traceback.print_exc()
+            raise
 
     def loads(self, data):
         try:
@@ -102,6 +104,7 @@ class TransformFunctionSerializer(object):
             return TransformFunction(self.ctx, f, *deserializers)
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunctionSerializer(%s)" % self.serializer
spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer
Repository: spark
Updated Branches: refs/heads/master 9ed4ad426 -> be7a2cfd9

[SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception happens, they just return None. This causes some weird NPEs and confuses people.

Author: Shixiong Zhu
Closes #9847 from zsxwing/pyspark-streaming-exception.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/be7a2cfd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/be7a2cfd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/be7a2cfd

Branch: refs/heads/master
Commit: be7a2cfd978143f6f265eca63e9e24f755bc9f22
Parents: 9ed4ad4
Author: Shixiong Zhu
Authored: Fri Nov 20 14:23:01 2015 -0800
Committer: Tathagata Das
Committed: Fri Nov 20 14:23:01 2015 -0800

--
 python/pyspark/streaming/tests.py | 16
 python/pyspark/streaming/util.py  |  3 +++
 2 files changed, 19 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/be7a2cfd/python/pyspark/streaming/tests.py
--
diff --git a/python/pyspark/streaming/tests.py b/python/pyspark/streaming/tests.py
index 3403f6d..a0e0267 100644
--- a/python/pyspark/streaming/tests.py
+++ b/python/pyspark/streaming/tests.py
@@ -403,6 +403,22 @@ class BasicOperationTests(PySparkStreamingTestCase):
         expected = [[('k', v)] for v in expected]
         self._test_func(input, func, expected)
 
+    def test_failed_func(self):
+        input = [self.sc.parallelize([d], 1) for d in range(4)]
+        input_stream = self.ssc.queueStream(input)
+
+        def failed_func(i):
+            raise ValueError("failed")
+
+        input_stream.map(failed_func).pprint()
+        self.ssc.start()
+        try:
+            self.ssc.awaitTerminationOrTimeout(10)
+        except:
+            return
+
+        self.fail("a failed func should throw an error")
+
 
 class StreamingListenerTests(PySparkStreamingTestCase):

http://git-wip-us.apache.org/repos/asf/spark/blob/be7a2cfd/python/pyspark/streaming/util.py
--
diff --git a/python/pyspark/streaming/util.py b/python/pyspark/streaming/util.py
index b20613b..767c732 100644
--- a/python/pyspark/streaming/util.py
+++ b/python/pyspark/streaming/util.py
@@ -64,6 +64,7 @@ class TransformFunction(object):
             return r._jrdd
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunction(%s)" % self.func
 
@@ -95,6 +96,7 @@ class TransformFunctionSerializer(object):
             return bytearray(self.serializer.dumps((func.func, func.deserializers)))
         except Exception:
             traceback.print_exc()
+            raise
 
     def loads(self, data):
         try:
@@ -102,6 +104,7 @@ class TransformFunctionSerializer(object):
             return TransformFunction(self.ctx, f, *deserializers)
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunctionSerializer(%s)" % self.serializer
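The essence of the fix is the three added `raise` statements: log the traceback, but still propagate the failure instead of silently returning None. The same guard pattern in Scala, as an illustrative sketch only (hypothetical names; the actual change is Python-side):

```scala
import scala.util.control.NonFatal

object RethrowSketch {
  private def transform(x: Int): Int =
    if (x < 0) throw new IllegalArgumentException("failed") else x

  def guardedTransform(x: Int): Int =
    try {
      transform(x)
    } catch {
      case NonFatal(e) =>
        e.printStackTrace() // keep the diagnostic output...
        throw e             // ...but re-throw instead of silently yielding null/None
    }
}
```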
spark git commit: [SPARK-11756][SPARKR] Fix use of aliases - SparkR can not output help information for SparkR:::summary correctly
Repository: spark Updated Branches: refs/heads/branch-1.6 b9b0e1747 -> 11a11f0ff [SPARK-11756][SPARKR] Fix use of aliases - SparkR can not output help information for SparkR:::summary correctly Fix use of aliases and changes uses of rdname and seealso `aliases` is the hint for `?` - it should not be linked to some other name - those should be seealso https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html Clean up usage on family, as multiple use of family with the same rdname is causing duplicated See Also html blocks (like http://spark.apache.org/docs/latest/api/R/count.html) Also changing some rdname for dplyr-like variant for better R user visibility in R doc, eg. rbind, summary, mutate, summarize shivaram yanboliang Author: felixcheungCloses #9750 from felixcheung/rdocaliases. (cherry picked from commit a6239d587c638691f52eca3eee905c53fbf35a12) Signed-off-by: Shivaram Venkataraman Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/11a11f0f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/11a11f0f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/11a11f0f Branch: refs/heads/branch-1.6 Commit: 11a11f0ffcb6d7f6478239cfa3fb5d95877cddab Parents: b9b0e17 Author: felixcheung Authored: Fri Nov 20 15:10:55 2015 -0800 Committer: Shivaram Venkataraman Committed: Fri Nov 20 15:11:08 2015 -0800 -- R/pkg/R/DataFrame.R | 96 R/pkg/R/broadcast.R | 1 - R/pkg/R/generics.R | 12 +++--- R/pkg/R/group.R | 12 +++--- 4 files changed, 37 insertions(+), 84 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/11a11f0f/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 06b0108..8a13e7a 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -254,7 +254,6 @@ setMethod("dtypes", #' @family DataFrame functions #' @rdname columns #' @name columns -#' @aliases names #' @export #' @examples #'\dontrun{ @@ -272,7 +271,6 @@ setMethod("columns", }) }) -#' @family DataFrame functions #' @rdname columns #' @name names setMethod("names", @@ -281,7 +279,6 @@ setMethod("names", columns(x) }) -#' @family DataFrame functions #' @rdname columns #' @name names<- setMethod("names<-", @@ -533,14 +530,8 @@ setMethod("distinct", dataFrame(sdf) }) -#' @title Distinct rows in a DataFrame -# -#' @description Returns a new DataFrame containing distinct rows in this DataFrame -#' -#' @family DataFrame functions -#' @rdname unique +#' @rdname distinct #' @name unique -#' @aliases distinct setMethod("unique", signature(x = "DataFrame"), function(x) { @@ -557,7 +548,7 @@ setMethod("unique", #' #' @family DataFrame functions #' @rdname sample -#' @aliases sample_frac +#' @name sample #' @export #' @examples #'\dontrun{ @@ -579,7 +570,6 @@ setMethod("sample", dataFrame(sdf) }) -#' @family DataFrame functions #' @rdname sample #' @name sample_frac setMethod("sample_frac", @@ -589,16 +579,15 @@ setMethod("sample_frac", sample(x, withReplacement, fraction) }) -#' Count +#' nrow #' #' Returns the number of rows in a DataFrame #' #' @param x A SparkSQL DataFrame #' #' @family DataFrame functions -#' @rdname count +#' @rdname nrow #' @name count -#' @aliases nrow #' @export #' @examples #'\dontrun{ @@ -614,14 +603,8 @@ setMethod("count", callJMethod(x@sdf, "count") }) -#' @title Number of rows for a DataFrame -#' @description Returns number of rows in a DataFrames -#' #' @name nrow -#' -#' @family DataFrame functions #' @rdname nrow -#' @aliases count setMethod("nrow", signature(x = "DataFrame"), function(x) 
{ @@ -870,7 +853,6 @@ setMethod("toRDD", #' @param x a DataFrame #' @return a GroupedData #' @seealso GroupedData -#' @aliases group_by #' @family DataFrame functions #' @rdname groupBy #' @name groupBy @@ -896,7 +878,6 @@ setMethod("groupBy", groupedData(sgd) }) -#' @family DataFrame functions #' @rdname groupBy #' @name group_by setMethod("group_by", @@ -913,7 +894,6 @@ setMethod("group_by", #' @family DataFrame functions #' @rdname agg #' @name agg -#' @aliases summarize #' @export setMethod("agg", signature(x = "DataFrame"), @@ -921,7 +901,6 @@ setMethod("agg", agg(groupBy(x), ...) }) -#' @family DataFrame functions #' @rdname agg #' @name summarize setMethod("summarize", @@ -1092,7 +1071,6 @@ setMethod("[",
spark git commit: [SPARK-11636][SQL] Support classes defined in the REPL with Encoders
Repository: spark Updated Branches: refs/heads/branch-1.6 11a11f0ff -> 0665fb5ea [SPARK-11636][SQL] Support classes defined in the REPL with Encoders #theScaryParts (i.e. changes to the repl, executor classloaders and codegen)... Author: Michael ArmbrustAuthor: Yin Huai Closes #9825 from marmbrus/dataset-replClasses2. (cherry picked from commit 4b84c72dfbb9ddb415fee35f69305b5d7b280891) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0665fb5e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0665fb5e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0665fb5e Branch: refs/heads/branch-1.6 Commit: 0665fb5eae931ee93e320da9fedcfd6649ed004e Parents: 11a11f0 Author: Michael Armbrust Authored: Fri Nov 20 15:17:17 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 15:17:31 2015 -0800 -- .../org/apache/spark/repl/SparkIMain.scala | 14 .../scala/org/apache/spark/repl/ReplSuite.scala | 24 .../apache/spark/repl/ExecutorClassLoader.scala | 8 ++- .../expressions/codegen/CodeGenerator.scala | 4 ++-- 4 files changed, 43 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0665fb5e/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala index 4ee605f..829b122 100644 --- a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala @@ -1221,10 +1221,16 @@ import org.apache.spark.annotation.DeveloperApi ) } - val preamble = """ -|class %s extends Serializable { -| %s%s%s - """.stripMargin.format(lineRep.readName, envLines.map(" " + _ + ";\n").mkString, importsPreamble, indentCode(toCompute)) + val preamble = s""" +|class ${lineRep.readName} extends Serializable { +| ${envLines.map(" " + _ + ";\n").mkString} +| $importsPreamble +| +| // If we need to construct any objects defined in the REPL on an executor we will need +| // to pass the outer scope to the appropriate encoder. 
+| org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this) +| ${indentCode(toCompute)} + """.stripMargin val postamble = importsTrailer + "\n}" + "\n" + "object " + lineRep.readName + " {\n" + " val INSTANCE = new " + lineRep.readName + "();\n" + http://git-wip-us.apache.org/repos/asf/spark/blob/0665fb5e/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala index 5674dcd..081aa03 100644 --- a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala +++ b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -262,6 +262,9 @@ class ReplSuite extends SparkFunSuite { |import sqlContext.implicits._ |case class TestCaseClass(value: Int) |sc.parallelize(1 to 10).map(x => TestCaseClass(x)).toDF().collect() +| +|// Test Dataset Serialization in the REPL +|Seq(TestCaseClass(1)).toDS().collect() """.stripMargin) assertDoesNotContain("error:", output) assertDoesNotContain("Exception", output) @@ -278,6 +281,27 @@ class ReplSuite extends SparkFunSuite { assertDoesNotContain("java.lang.ClassNotFoundException", output) } + test("Datasets and encoders") { +val output = runInterpreter("local", + """ +|import org.apache.spark.sql.functions._ +|import org.apache.spark.sql.Encoder +|import org.apache.spark.sql.expressions.Aggregator +|import org.apache.spark.sql.TypedColumn +|val simpleSum = new Aggregator[Int, Int, Int] with Serializable { +| def zero: Int = 0 // The initial value. +| def reduce(b: Int, a: Int) = b + a// Add an element to the running total +| def merge(b1: Int, b2: Int) = b1 + b2 // Merge intermediate values. +| def finish(b: Int) = b// Return the final result. +|}.toColumn +| +|val ds = Seq(1, 2, 3, 4).toDS() +|ds.select(simpleSum).collect + """.stripMargin) +
spark git commit: [SPARK-11636][SQL] Support classes defined in the REPL with Encoders
Repository: spark Updated Branches: refs/heads/master a6239d587 -> 4b84c72df [SPARK-11636][SQL] Support classes defined in the REPL with Encoders #theScaryParts (i.e. changes to the repl, executor classloaders and codegen)... Author: Michael ArmbrustAuthor: Yin Huai Closes #9825 from marmbrus/dataset-replClasses2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4b84c72d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4b84c72d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4b84c72d Branch: refs/heads/master Commit: 4b84c72dfbb9ddb415fee35f69305b5d7b280891 Parents: a6239d5 Author: Michael Armbrust Authored: Fri Nov 20 15:17:17 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 15:17:17 2015 -0800 -- .../org/apache/spark/repl/SparkIMain.scala | 14 .../scala/org/apache/spark/repl/ReplSuite.scala | 24 .../apache/spark/repl/ExecutorClassLoader.scala | 8 ++- .../expressions/codegen/CodeGenerator.scala | 4 ++-- 4 files changed, 43 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4b84c72d/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala index 4ee605f..829b122 100644 --- a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala @@ -1221,10 +1221,16 @@ import org.apache.spark.annotation.DeveloperApi ) } - val preamble = """ -|class %s extends Serializable { -| %s%s%s - """.stripMargin.format(lineRep.readName, envLines.map(" " + _ + ";\n").mkString, importsPreamble, indentCode(toCompute)) + val preamble = s""" +|class ${lineRep.readName} extends Serializable { +| ${envLines.map(" " + _ + ";\n").mkString} +| $importsPreamble +| +| // If we need to construct any objects defined in the REPL on an executor we will need +| // to pass the outer scope to the appropriate encoder. 
+| org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this) +| ${indentCode(toCompute)} + """.stripMargin val postamble = importsTrailer + "\n}" + "\n" + "object " + lineRep.readName + " {\n" + " val INSTANCE = new " + lineRep.readName + "();\n" + http://git-wip-us.apache.org/repos/asf/spark/blob/4b84c72d/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala index 5674dcd..081aa03 100644 --- a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala +++ b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -262,6 +262,9 @@ class ReplSuite extends SparkFunSuite { |import sqlContext.implicits._ |case class TestCaseClass(value: Int) |sc.parallelize(1 to 10).map(x => TestCaseClass(x)).toDF().collect() +| +|// Test Dataset Serialization in the REPL +|Seq(TestCaseClass(1)).toDS().collect() """.stripMargin) assertDoesNotContain("error:", output) assertDoesNotContain("Exception", output) @@ -278,6 +281,27 @@ class ReplSuite extends SparkFunSuite { assertDoesNotContain("java.lang.ClassNotFoundException", output) } + test("Datasets and encoders") { +val output = runInterpreter("local", + """ +|import org.apache.spark.sql.functions._ +|import org.apache.spark.sql.Encoder +|import org.apache.spark.sql.expressions.Aggregator +|import org.apache.spark.sql.TypedColumn +|val simpleSum = new Aggregator[Int, Int, Int] with Serializable { +| def zero: Int = 0 // The initial value. +| def reduce(b: Int, a: Int) = b + a// Add an element to the running total +| def merge(b1: Int, b2: Int) = b1 + b2 // Merge intermediate values. +| def finish(b: Int) = b// Return the final result. +|}.toColumn +| +|val ds = Seq(1, 2, 3, 4).toDS() +|ds.select(simpleSum).collect + """.stripMargin) +assertDoesNotContain("error:", output) +assertDoesNotContain("Exception", output) + } + test("SPARK-2632 importing a method from non serializable class
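To make the mechanism concrete: every line typed into the REPL is compiled inside a generated wrapper class, so user classes are inner classes whose constructors need the enclosing wrapper instance. A rough sketch of what the amended preamble amounts to (simplified, hypothetical names; the real wrapper is emitted by `SparkIMain` as shown in the diff above):

```scala
class ReplLineWrapper extends Serializable {
  // Registered so that encoders can look up this outer instance when they need
  // to construct REPL-defined (inner) classes on executors.
  org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)

  // ... the user's code for this REPL line is compiled in here ...
  case class TestCaseClass(value: Int)
}
```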
spark git commit: [SPARK-11787][SPARK-11883][SQL][FOLLOW-UP] Cleanup for this patch.
Repository: spark Updated Branches: refs/heads/master ed47b1e66 -> 58b4e4f88 [SPARK-11787][SPARK-11883][SQL][FOLLOW-UP] Cleanup for this patch. This mainly moves SqlNewHadoopRDD to the sql package. There is some state that is shared between core and I've left that in core. This allows some other associated minor cleanup. Author: Nong LiCloses #9845 from nongli/spark-11787. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/58b4e4f8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/58b4e4f8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/58b4e4f8 Branch: refs/heads/master Commit: 58b4e4f88a330135c4cec04a30d24ef91bc61d91 Parents: ed47b1e Author: Nong Li Authored: Fri Nov 20 15:30:53 2015 -0800 Committer: Reynold Xin Committed: Fri Nov 20 15:30:53 2015 -0800 -- .../scala/org/apache/spark/rdd/HadoopRDD.scala | 6 +- .../org/apache/spark/rdd/SqlNewHadoopRDD.scala | 317 --- .../apache/spark/rdd/SqlNewHadoopRDDState.scala | 41 +++ .../sql/catalyst/expressions/UnsafeRow.java | 59 +++- .../catalyst/expressions/InputFileName.scala| 6 +- .../parquet/UnsafeRowParquetRecordReader.java | 14 + .../scala/org/apache/spark/sql/SQLConf.scala| 5 + .../execution/datasources/SqlNewHadoopRDD.scala | 299 + .../datasources/parquet/ParquetRelation.scala | 2 +- .../parquet/ParquetFilterSuite.scala| 43 +-- .../datasources/parquet/ParquetIOSuite.scala| 19 ++ 11 files changed, 453 insertions(+), 358 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/58b4e4f8/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala index 7db5834..f37c95b 100644 --- a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala @@ -215,8 +215,8 @@ class HadoopRDD[K, V]( // Sets the thread local variable for the file's name split.inputSplit.value match { -case fs: FileSplit => SqlNewHadoopRDD.setInputFileName(fs.getPath.toString) -case _ => SqlNewHadoopRDD.unsetInputFileName() +case fs: FileSplit => SqlNewHadoopRDDState.setInputFileName(fs.getPath.toString) +case _ => SqlNewHadoopRDDState.unsetInputFileName() } // Find a function that will return the FileSystem bytes read by this thread. Do this before @@ -256,7 +256,7 @@ class HadoopRDD[K, V]( override def close() { if (reader != null) { - SqlNewHadoopRDD.unsetInputFileName() + SqlNewHadoopRDDState.unsetInputFileName() // Close the reader and release it. Note: it's very important that we don't close the // reader more than once, since that exposes us to MAPREDUCE-5918 when running against // Hadoop 1.x and older Hadoop 2.x releases. That bug can lead to non-deterministic http://git-wip-us.apache.org/repos/asf/spark/blob/58b4e4f8/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala deleted file mode 100644 index 4d17633..000 --- a/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala +++ /dev/null @@ -1,317 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. 
- * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.rdd - -import java.text.SimpleDateFormat -import java.util.Date - -import org.apache.hadoop.conf.{Configurable, Configuration} -import org.apache.hadoop.io.Writable -import org.apache.hadoop.mapreduce._ -import org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit} -import org.apache.spark.broadcast.Broadcast -import
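The new `SqlNewHadoopRDDState` (41 lines per the diffstat, truncated in the quoted diff) carries the state that stays in core. A simplified sketch of such a thread-local holder, assuming plain `String` values (the real class may differ in representation and defaults):

```scala
package org.apache.spark.rdd

// Thread-local bookkeeping shared between core's HadoopRDD and SQL's
// SqlNewHadoopRDD: records which input file the current task is reading.
object SqlNewHadoopRDDState {
  private[this] val inputFileName = new ThreadLocal[String] {
    override def initialValue(): String = ""
  }

  def getInputFileName(): String = inputFileName.get()

  def setInputFileName(file: String): Unit = inputFileName.set(file)

  def unsetInputFileName(): Unit = inputFileName.remove()
}
```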
spark git commit: [SPARK-11787][SPARK-11883][SQL][FOLLOW-UP] Cleanup for this patch.
Repository: spark Updated Branches: refs/heads/branch-1.6 1dde97176 -> 7437a7f5b [SPARK-11787][SPARK-11883][SQL][FOLLOW-UP] Cleanup for this patch. This mainly moves SqlNewHadoopRDD to the sql package. There is some state that is shared between core and I've left that in core. This allows some other associated minor cleanup. Author: Nong LiCloses #9845 from nongli/spark-11787. (cherry picked from commit 58b4e4f88a330135c4cec04a30d24ef91bc61d91) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7437a7f5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7437a7f5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7437a7f5 Branch: refs/heads/branch-1.6 Commit: 7437a7f5bd06fd304265ab4e708a97fcd8492839 Parents: 1dde971 Author: Nong Li Authored: Fri Nov 20 15:30:53 2015 -0800 Committer: Reynold Xin Committed: Fri Nov 20 15:30:59 2015 -0800 -- .../scala/org/apache/spark/rdd/HadoopRDD.scala | 6 +- .../org/apache/spark/rdd/SqlNewHadoopRDD.scala | 317 --- .../apache/spark/rdd/SqlNewHadoopRDDState.scala | 41 +++ .../sql/catalyst/expressions/UnsafeRow.java | 59 +++- .../catalyst/expressions/InputFileName.scala| 6 +- .../parquet/UnsafeRowParquetRecordReader.java | 14 + .../scala/org/apache/spark/sql/SQLConf.scala| 5 + .../execution/datasources/SqlNewHadoopRDD.scala | 299 + .../datasources/parquet/ParquetRelation.scala | 2 +- .../parquet/ParquetFilterSuite.scala| 43 +-- .../datasources/parquet/ParquetIOSuite.scala| 19 ++ 11 files changed, 453 insertions(+), 358 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7437a7f5/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala index 7db5834..f37c95b 100644 --- a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala @@ -215,8 +215,8 @@ class HadoopRDD[K, V]( // Sets the thread local variable for the file's name split.inputSplit.value match { -case fs: FileSplit => SqlNewHadoopRDD.setInputFileName(fs.getPath.toString) -case _ => SqlNewHadoopRDD.unsetInputFileName() +case fs: FileSplit => SqlNewHadoopRDDState.setInputFileName(fs.getPath.toString) +case _ => SqlNewHadoopRDDState.unsetInputFileName() } // Find a function that will return the FileSystem bytes read by this thread. Do this before @@ -256,7 +256,7 @@ class HadoopRDD[K, V]( override def close() { if (reader != null) { - SqlNewHadoopRDD.unsetInputFileName() + SqlNewHadoopRDDState.unsetInputFileName() // Close the reader and release it. Note: it's very important that we don't close the // reader more than once, since that exposes us to MAPREDUCE-5918 when running against // Hadoop 1.x and older Hadoop 2.x releases. That bug can lead to non-deterministic http://git-wip-us.apache.org/repos/asf/spark/blob/7437a7f5/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala deleted file mode 100644 index 4d17633..000 --- a/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala +++ /dev/null @@ -1,317 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. 
- * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.rdd - -import java.text.SimpleDateFormat -import java.util.Date - -import org.apache.hadoop.conf.{Configurable, Configuration} -import org.apache.hadoop.io.Writable -import org.apache.hadoop.mapreduce._ -import
[1/2] spark git commit: Preparing Spark release v1.6.0-preview1
Repository: spark Updated Branches: refs/heads/branch-1.6 e0bb4e09c -> d409afdbc Preparing Spark release v1.6.0-preview1 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7582425d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7582425d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7582425d Branch: refs/heads/branch-1.6 Commit: 7582425d193d46c3f14b666b551dd42ff54d7ad7 Parents: e0bb4e0 Author: Patrick WendellAuthored: Fri Nov 20 15:43:02 2015 -0800 Committer: Patrick Wendell Committed: Fri Nov 20 15:43:02 2015 -0800 -- assembly/pom.xml| 2 +- bagel/pom.xml | 2 +- core/pom.xml| 2 +- docker-integration-tests/pom.xml| 2 +- examples/pom.xml| 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml | 2 +- external/kafka-assembly/pom.xml | 2 +- external/kafka/pom.xml | 2 +- external/mqtt-assembly/pom.xml | 2 +- external/mqtt/pom.xml | 2 +- external/twitter/pom.xml| 2 +- external/zeromq/pom.xml | 2 +- extras/java8-tests/pom.xml | 2 +- extras/kinesis-asl-assembly/pom.xml | 2 +- extras/kinesis-asl/pom.xml | 2 +- extras/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml | 2 +- launcher/pom.xml| 2 +- mllib/pom.xml | 2 +- network/common/pom.xml | 2 +- network/shuffle/pom.xml | 2 +- network/yarn/pom.xml| 2 +- pom.xml | 2 +- repl/pom.xml| 2 +- sql/catalyst/pom.xml| 2 +- sql/core/pom.xml| 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml| 2 +- streaming/pom.xml | 2 +- tags/pom.xml| 2 +- tools/pom.xml | 2 +- unsafe/pom.xml | 2 +- yarn/pom.xml| 2 +- 35 files changed, 35 insertions(+), 35 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 4b60ee0..fbabaa5 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0-SNAPSHOT +1.6.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/bagel/pom.xml -- diff --git a/bagel/pom.xml b/bagel/pom.xml index 672e946..1b3e417 100644 --- a/bagel/pom.xml +++ b/bagel/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0-SNAPSHOT +1.6.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/core/pom.xml -- diff --git a/core/pom.xml b/core/pom.xml index 37e3f16..d32b93b 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0-SNAPSHOT +1.6.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/docker-integration-tests/pom.xml -- diff --git a/docker-integration-tests/pom.xml b/docker-integration-tests/pom.xml index dee0c4a..ee9de91 100644 --- a/docker-integration-tests/pom.xml +++ b/docker-integration-tests/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.10 -1.6.0-SNAPSHOT +1.6.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/examples/pom.xml -- diff --git a/examples/pom.xml b/examples/pom.xml index f5ab2a7..37b15bb 100644 --- a/examples/pom.xml +++ b/examples/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0-SNAPSHOT +1.6.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/external/flume-assembly/pom.xml -- diff --git a/external/flume-assembly/pom.xml b/external/flume-assembly/pom.xml index dceedcf..295455a 100644 --- a/external/flume-assembly/pom.xml +++ b/external/flume-assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0-SNAPSHOT +1.6.0 ../../pom.xml 
http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/external/flume-sink/pom.xml
Git Push Summary
Repository: spark Updated Tags: refs/tags/v1.6.0-preview1 [created] 7582425d1
[2/2] spark git commit: Preparing development version 1.6.0-SNAPSHOT
Preparing development version 1.6.0-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d409afdb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d409afdb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d409afdb Branch: refs/heads/branch-1.6 Commit: d409afdbceb40ea90b1d20656e8ce79bff2ab71f Parents: 7582425 Author: Patrick WendellAuthored: Fri Nov 20 15:43:08 2015 -0800 Committer: Patrick Wendell Committed: Fri Nov 20 15:43:08 2015 -0800 -- assembly/pom.xml| 2 +- bagel/pom.xml | 2 +- core/pom.xml| 2 +- docker-integration-tests/pom.xml| 2 +- examples/pom.xml| 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml | 2 +- external/kafka-assembly/pom.xml | 2 +- external/kafka/pom.xml | 2 +- external/mqtt-assembly/pom.xml | 2 +- external/mqtt/pom.xml | 2 +- external/twitter/pom.xml| 2 +- external/zeromq/pom.xml | 2 +- extras/java8-tests/pom.xml | 2 +- extras/kinesis-asl-assembly/pom.xml | 2 +- extras/kinesis-asl/pom.xml | 2 +- extras/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml | 2 +- launcher/pom.xml| 2 +- mllib/pom.xml | 2 +- network/common/pom.xml | 2 +- network/shuffle/pom.xml | 2 +- network/yarn/pom.xml| 2 +- pom.xml | 2 +- repl/pom.xml| 2 +- sql/catalyst/pom.xml| 2 +- sql/core/pom.xml| 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml| 2 +- streaming/pom.xml | 2 +- tags/pom.xml| 2 +- tools/pom.xml | 2 +- unsafe/pom.xml | 2 +- yarn/pom.xml| 2 +- 35 files changed, 35 insertions(+), 35 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index fbabaa5..4b60ee0 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0 +1.6.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/bagel/pom.xml -- diff --git a/bagel/pom.xml b/bagel/pom.xml index 1b3e417..672e946 100644 --- a/bagel/pom.xml +++ b/bagel/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0 +1.6.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/core/pom.xml -- diff --git a/core/pom.xml b/core/pom.xml index d32b93b..37e3f16 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0 +1.6.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/docker-integration-tests/pom.xml -- diff --git a/docker-integration-tests/pom.xml b/docker-integration-tests/pom.xml index ee9de91..dee0c4a 100644 --- a/docker-integration-tests/pom.xml +++ b/docker-integration-tests/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.10 -1.6.0 +1.6.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/examples/pom.xml -- diff --git a/examples/pom.xml b/examples/pom.xml index 37b15bb..f5ab2a7 100644 --- a/examples/pom.xml +++ b/examples/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0 +1.6.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/external/flume-assembly/pom.xml -- diff --git a/external/flume-assembly/pom.xml b/external/flume-assembly/pom.xml index 295455a..dceedcf 100644 --- a/external/flume-assembly/pom.xml +++ b/external/flume-assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0 +1.6.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/external/flume-sink/pom.xml -- diff --git 
a/external/flume-sink/pom.xml
spark git commit: [HOTFIX] Fix Java Dataset Tests
Repository: spark
Updated Branches: refs/heads/master 68ed04683 -> 47815878a

[HOTFIX] Fix Java Dataset Tests

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47815878
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/47815878
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/47815878

Branch: refs/heads/master
Commit: 47815878ad5e47e89bfbd57acb848be2ce67a4a5
Parents: 68ed046
Author: Michael Armbrust
Authored: Fri Nov 20 16:02:03 2015 -0800
Committer: Michael Armbrust
Committed: Fri Nov 20 16:03:14 2015 -0800

--
 .../test/java/test/org/apache/spark/sql/JavaDatasetSuite.java | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/47815878/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
--
diff --git a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
index f7249b8..f32374b 100644
--- a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
+++ b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
@@ -409,8 +409,8 @@ public class JavaDatasetSuite implements Serializable {
       .as(Encoders.tuple(Encoders.STRING(), Encoders.INT()));
     Assert.assertEquals(
       Arrays.asList(
-        new Tuple4<>("a", 3, 3L, 2L),
-        new Tuple4<>("b", 3, 3L, 1L)),
+        new Tuple2<>("a", 3),
+        new Tuple2<>("b", 3)),
       agged2.collectAsList());
   }
spark git commit: [SPARK-11819][SQL][FOLLOW-UP] fix scala 2.11 build
Repository: spark
Updated Branches: refs/heads/master a2dce22e0 -> 7d3f922c4

[SPARK-11819][SQL][FOLLOW-UP] fix scala 2.11 build

It seems Scala 2.11 does not support defining private methods in `trait xxx` and then using them in `object xxx extends xxx`.

Author: Wenchen Fan
Closes #9879 from cloud-fan/follow.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7d3f922c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7d3f922c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7d3f922c

Branch: refs/heads/master
Commit: 7d3f922c4ba76c4193f98234ae662065c39cdfb1
Parents: a2dce22
Author: Wenchen Fan
Authored: Fri Nov 20 23:31:19 2015 -0800
Committer: Reynold Xin
Committed: Fri Nov 20 23:31:19 2015 -0800

--
 .../scala/org/apache/spark/sql/catalyst/ScalaReflection.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/7d3f922c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
index 4a4a62e..476bece 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
@@ -670,14 +670,14 @@ trait ScalaReflection {
    * Unlike `schemaFor`, this method won't throw exception for un-supported type, it will return
    * `NullType` silently instead.
    */
-  protected def silentSchemaFor(tpe: `Type`): Schema = try {
+  def silentSchemaFor(tpe: `Type`): Schema = try {
     schemaFor(tpe)
   } catch {
     case _: UnsupportedOperationException => Schema(NullType, nullable = true)
   }
 
   /** Returns the full class name for a type. */
-  protected def getClassNameFromType(tpe: `Type`): String = {
+  def getClassNameFromType(tpe: `Type`): String = {
     tpe.erasure.typeSymbol.asClass.fullName
   }