spark git commit: [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml
Repository: spark
Updated Branches: refs/heads/master 9ace2e5c8 -> e359d5dcf

[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml

jira: https://issues.apache.org/jira/browse/SPARK-11689

Add a simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include the example code in the user guide markdown. Check SPARK-11606 for instructions.

Author: Yuhao Yang

Closes #9722 from hhbyyh/ldaMLExample.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e359d5dc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e359d5dc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e359d5dc

Branch: refs/heads/master
Commit: e359d5dcf5bd300213054ebeae9fe75c4f7eb9e7
Parents: 9ace2e5
Author: Yuhao Yang
Authored: Fri Nov 20 09:57:09 2015 -0800
Committer: Xiangrui Meng
Committed: Fri Nov 20 09:57:09 2015 -0800

docs/ml-clustering.md | 30 +++ docs/ml-guide.md | 3 +- docs/mllib-guide.md | 1 + .../spark/examples/ml/JavaLDAExample.java | 94 .../apache/spark/examples/ml/LDAExample.scala | 77 5 files changed, 204 insertions(+), 1 deletion(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/e359d5dc/docs/ml-clustering.md -- diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md new file mode 100644 index 0000000..1743ef4 --- /dev/null +++ b/docs/ml-clustering.md @@ -0,0 +1,30 @@ +--- +layout: global +title: Clustering - ML +displayTitle: ML - Clustering +--- + +In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html). + +## Latent Dirichlet allocation (LDA) + +`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, +and generates an `LDAModel` as the base model. Expert users may cast a `LDAModel` generated by +`EMLDAOptimizer` to a `DistributedLDAModel` if needed. + + + +Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details. + + +{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %} + + + + +Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details. + +{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %} + + + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/e359d5dc/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index be18a05..6f35b30 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -40,6 +40,7 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g., provide class probabilities, and linear models provide model summaries. * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision Trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) @@ -950,4 +951,4 @@ model.transform(test) {% endhighlight %} - + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/e359d5dc/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 91e50cc..54e35fc 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -69,6 +69,7 @@ We list major functionality from both below, with links to detailed guides. concepts.
It also contains sections on using algorithms within the Pipelines API, for example: * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/e359d5dc/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java new file mode 100644 index 0000000..b3a7d2e --- /dev/null +++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file
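For readers following along, the new guide's workflow boils down to a minimal sketch like the following (the libsvm source, path, and the k/maxIter values are illustrative assumptions, not taken from the patch):

```scala
import org.apache.spark.ml.clustering.LDA

// Load a corpus as a DataFrame with a "features" vector column; the libsvm
// source and the path here are assumptions for illustration.
val dataset = sqlContext.read.format("libsvm")
  .load("data/mllib/sample_lda_libsvm_data.txt")

// Fit an LDA model; k (topic count) and maxIter are illustrative.
val lda = new LDA().setK(10).setMaxIter(10)
val model = lda.fit(dataset)

// Inspect the topics and the per-document topic distributions.
model.describeTopics(3).show(false)
model.transform(dataset).show(false)
```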
spark git commit: [SPARK-11852][ML] StandardScaler minor refactor
Repository: spark
Updated Branches: refs/heads/branch-1.6 eab90d3f3 -> b11aa1797

[SPARK-11852][ML] StandardScaler minor refactor

```withStd``` and ```withMean``` should be params of ```StandardScaler``` and ```StandardScalerModel```.

Author: Yanbo Liang

Closes #9839 from yanboliang/standardScaler-refactor.

(cherry picked from commit 9ace2e5c8d7fbd360a93bc5fc4eace64a697b44f)
Signed-off-by: Xiangrui Meng

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b11aa179
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b11aa179
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b11aa179

Branch: refs/heads/branch-1.6
Commit: b11aa1797c928f2cfaf1d8821eff4be4109ac41d
Parents: eab90d3
Author: Yanbo Liang
Authored: Fri Nov 20 09:55:53 2015 -0800
Committer: Xiangrui Meng
Committed: Fri Nov 20 09:56:02 2015 -0800

.../spark/ml/feature/StandardScaler.scala | 60 +--- .../spark/ml/feature/StandardScalerSuite.scala | 11 ++-- 2 files changed, 32 insertions(+), 39 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/b11aa179/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala index 6d54521..d76a9c6 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala @@ -36,20 +36,30 @@ import org.apache.spark.sql.types.{StructField, StructType} private[feature] trait StandardScalerParams extends Params with HasInputCol with HasOutputCol { /** - * Centers the data with mean before scaling. + * Whether to center the data with mean before scaling. * It will build a dense output, so this does not work on sparse input * and will raise an exception. * Default: false * @group param */ - val withMean: BooleanParam = new BooleanParam(this, "withMean", "Center data with mean") + val withMean: BooleanParam = new BooleanParam(this, "withMean", +"Whether to center data with mean") + + /** @group getParam */ + def getWithMean: Boolean = $(withMean) /** - * Scales the data to unit standard deviation. + * Whether to scale the data to unit standard deviation.
* Default: true * @group param */ - val withStd: BooleanParam = new BooleanParam(this, "withStd", "Scale to unit standard deviation") + val withStd: BooleanParam = new BooleanParam(this, "withStd", +"Whether to scale the data to unit standard deviation") + + /** @group getParam */ + def getWithStd: Boolean = $(withStd) + + setDefault(withMean -> false, withStd -> true) } /** @@ -63,8 +73,6 @@ class StandardScaler(override val uid: String) extends Estimator[StandardScalerModel] def this() = this(Identifiable.randomUID("stdScal")) - setDefault(withMean -> false, withStd -> true) - /** @group setParam */ def setInputCol(value: String): this.type = set(inputCol, value) @@ -82,7 +90,7 @@ class StandardScaler(override val uid: String) extends Estimator[StandardScalerModel] val input = dataset.select($(inputCol)).map { case Row(v: Vector) => v } val scaler = new feature.StandardScaler(withMean = $(withMean), withStd = $(withStd)) val scalerModel = scaler.fit(input) -copyValues(new StandardScalerModel(uid, scalerModel).setParent(this)) +copyValues(new StandardScalerModel(uid, scalerModel.std, scalerModel.mean).setParent(this)) } override def transformSchema(schema: StructType): StructType = { @@ -108,29 +116,19 @@ object StandardScaler extends DefaultParamsReadable[StandardScaler] { /** * :: Experimental :: * Model fitted by [[StandardScaler]]. + * + * @param std Standard deviation of the StandardScalerModel + * @param mean Mean of the StandardScalerModel */ @Experimental class StandardScalerModel private[ml] ( override val uid: String, -scaler: feature.StandardScalerModel) +val std: Vector, +val mean: Vector) extends Model[StandardScalerModel] with StandardScalerParams with MLWritable { import StandardScalerModel._ - /** Standard deviation of the StandardScalerModel */ - val std: Vector = scaler.std - - /** Mean of the StandardScalerModel */ - val mean: Vector = scaler.mean - - /** Whether to scale to unit standard deviation. */ - @Since("1.6.0") - def getWithStd: Boolean = scaler.withStd - - /** Whether to center data with mean. */ - @Since("1.6.0") - def getWithMean: Boolean = scaler.withMean - /** @group setParam */ def setInputCol(value:
spark git commit: [SPARK-11852][ML] StandardScaler minor refactor
Repository: spark
Updated Branches: refs/heads/master a66142dec -> 9ace2e5c8

[SPARK-11852][ML] StandardScaler minor refactor

```withStd``` and ```withMean``` should be params of ```StandardScaler``` and ```StandardScalerModel```.

Author: Yanbo Liang

Closes #9839 from yanboliang/standardScaler-refactor.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9ace2e5c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9ace2e5c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9ace2e5c

Branch: refs/heads/master
Commit: 9ace2e5c8d7fbd360a93bc5fc4eace64a697b44f
Parents: a66142d
Author: Yanbo Liang
Authored: Fri Nov 20 09:55:53 2015 -0800
Committer: Xiangrui Meng
Committed: Fri Nov 20 09:55:53 2015 -0800

.../spark/ml/feature/StandardScaler.scala | 60 +--- .../spark/ml/feature/StandardScalerSuite.scala | 11 ++-- 2 files changed, 32 insertions(+), 39 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/9ace2e5c/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala index 6d54521..d76a9c6 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala @@ -36,20 +36,30 @@ import org.apache.spark.sql.types.{StructField, StructType} private[feature] trait StandardScalerParams extends Params with HasInputCol with HasOutputCol { /** - * Centers the data with mean before scaling. + * Whether to center the data with mean before scaling. * It will build a dense output, so this does not work on sparse input * and will raise an exception. * Default: false * @group param */ - val withMean: BooleanParam = new BooleanParam(this, "withMean", "Center data with mean") + val withMean: BooleanParam = new BooleanParam(this, "withMean", +"Whether to center data with mean") + + /** @group getParam */ + def getWithMean: Boolean = $(withMean) /** - * Scales the data to unit standard deviation. + * Whether to scale the data to unit standard deviation.
* Default: true * @group param */ - val withStd: BooleanParam = new BooleanParam(this, "withStd", "Scale to unit standard deviation") + val withStd: BooleanParam = new BooleanParam(this, "withStd", +"Whether to scale the data to unit standard deviation") + + /** @group getParam */ + def getWithStd: Boolean = $(withStd) + + setDefault(withMean -> false, withStd -> true) } /** @@ -63,8 +73,6 @@ class StandardScaler(override val uid: String) extends Estimator[StandardScalerModel] def this() = this(Identifiable.randomUID("stdScal")) - setDefault(withMean -> false, withStd -> true) - /** @group setParam */ def setInputCol(value: String): this.type = set(inputCol, value) @@ -82,7 +90,7 @@ class StandardScaler(override val uid: String) extends Estimator[StandardScalerModel] val input = dataset.select($(inputCol)).map { case Row(v: Vector) => v } val scaler = new feature.StandardScaler(withMean = $(withMean), withStd = $(withStd)) val scalerModel = scaler.fit(input) -copyValues(new StandardScalerModel(uid, scalerModel).setParent(this)) +copyValues(new StandardScalerModel(uid, scalerModel.std, scalerModel.mean).setParent(this)) } override def transformSchema(schema: StructType): StructType = { @@ -108,29 +116,19 @@ object StandardScaler extends DefaultParamsReadable[StandardScaler] { /** * :: Experimental :: * Model fitted by [[StandardScaler]]. + * + * @param std Standard deviation of the StandardScalerModel + * @param mean Mean of the StandardScalerModel */ @Experimental class StandardScalerModel private[ml] ( override val uid: String, -scaler: feature.StandardScalerModel) +val std: Vector, +val mean: Vector) extends Model[StandardScalerModel] with StandardScalerParams with MLWritable { import StandardScalerModel._ - /** Standard deviation of the StandardScalerModel */ - val std: Vector = scaler.std - - /** Mean of the StandardScalerModel */ - val mean: Vector = scaler.mean - - /** Whether to scale to unit standard deviation. */ - @Since("1.6.0") - def getWithStd: Boolean = scaler.withStd - - /** Whether to center data with mean. */ - @Since("1.6.0") - def getWithMean: Boolean = scaler.withMean - /** @group setParam */ def setInputCol(value: String): this.type = set(inputCol, value) @@ -139,6 +137,7 @@ class StandardScalerModel private[ml] ( override def
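After this refactor, usage reads as follows; a minimal sketch, assuming a DataFrame `dataset` with a "features" vector column:

```scala
import org.apache.spark.ml.feature.StandardScaler

// withMean and withStd are now shared Params with defaults (false, true).
val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithMean(false)  // setting true densifies sparse input
  .setWithStd(true)

val scalerModel = scaler.fit(dataset)

// The fitted statistics are now plain constructor vals on the model,
// and the getters read the Params rather than the wrapped mllib model.
println(scalerModel.mean)
println(scalerModel.std)
println(scalerModel.getWithMean + " / " + scalerModel.getWithStd)

val scaled = scalerModel.transform(dataset)
```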
spark git commit: [SPARK-11689][ML] Add user guide and example code for LDA under spark.ml
Repository: spark
Updated Branches: refs/heads/branch-1.6 b11aa1797 -> 92d3563fd

[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml

jira: https://issues.apache.org/jira/browse/SPARK-11689

Add a simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include the example code in the user guide markdown. Check SPARK-11606 for instructions.

Author: Yuhao Yang

Closes #9722 from hhbyyh/ldaMLExample.

(cherry picked from commit e359d5dcf5bd300213054ebeae9fe75c4f7eb9e7)
Signed-off-by: Xiangrui Meng

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/92d3563f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/92d3563f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/92d3563f

Branch: refs/heads/branch-1.6
Commit: 92d3563fd0cf0c3f4fe037b404d172125b24cf2f
Parents: b11aa17
Author: Yuhao Yang
Authored: Fri Nov 20 09:57:09 2015 -0800
Committer: Xiangrui Meng
Committed: Fri Nov 20 09:57:24 2015 -0800

docs/ml-clustering.md | 30 +++ docs/ml-guide.md | 3 +- docs/mllib-guide.md | 1 + .../spark/examples/ml/JavaLDAExample.java | 94 .../apache/spark/examples/ml/LDAExample.scala | 77 5 files changed, 204 insertions(+), 1 deletion(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/92d3563f/docs/ml-clustering.md -- diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md new file mode 100644 index 0000000..1743ef4 --- /dev/null +++ b/docs/ml-clustering.md @@ -0,0 +1,30 @@ +--- +layout: global +title: Clustering - ML +displayTitle: ML - Clustering +--- + +In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html). + +## Latent Dirichlet allocation (LDA) + +`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, +and generates an `LDAModel` as the base model. Expert users may cast a `LDAModel` generated by +`EMLDAOptimizer` to a `DistributedLDAModel` if needed. + + + +Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details. + + +{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %} + + + + +Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details. + +{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %} + + + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/92d3563f/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index be18a05..6f35b30 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -40,6 +40,7 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g., provide class probabilities, and linear models provide model summaries. * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision Trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) @@ -950,4 +951,4 @@ model.transform(test) {% endhighlight %} - + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/92d3563f/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 91e50cc..54e35fc 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -69,6 +69,7 @@ We list major functionality from both below, with links to detailed guides. concepts.
It also contains sections on using algorithms within the Pipelines API, for example: * [Feature extraction, transformation, and selection](ml-features.html) +* [Clustering](ml-clustering.html) * [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/92d3563f/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java new file mode 100644 index 0000000..b3a7d2e --- /dev/null +++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java @@ -0,0 +1,94 @@ +/* + *
spark git commit: [SPARK-11876][SQL] Support printSchema in DataSet API
Repository: spark
Updated Branches: refs/heads/master e359d5dcf -> bef361c58

[SPARK-11876][SQL] Support printSchema in DataSet API

DataSet APIs look great! However, I am lost when doing multi-level joins. For example,

```
val ds1 = Seq(("a", 1), ("b", 2)).toDS().as("a")
val ds2 = Seq(("a", 1), ("b", 2)).toDS().as("b")
val ds3 = Seq(("a", 1), ("b", 2)).toDS().as("c")
ds1.joinWith(ds2, $"a._2" === $"b._2").as("ab").joinWith(ds3, $"ab._1._2" === $"c._2").printSchema()
```

The printed schema is like

```
root
 |-- _1: struct (nullable = true)
 |    |-- _1: struct (nullable = true)
 |    |    |-- _1: string (nullable = true)
 |    |    |-- _2: integer (nullable = true)
 |    |-- _2: struct (nullable = true)
 |    |    |-- _1: string (nullable = true)
 |    |    |-- _2: integer (nullable = true)
 |-- _2: struct (nullable = true)
 |    |-- _1: string (nullable = true)
 |    |-- _2: integer (nullable = true)
```

Personally, I think we need the printSchema function. Sometimes, I do not know how to specify the column, especially when the data types are mixed. For example, if I want to write the following select for the above multi-level join, I have to know the schema:

```
newDS.select(expr("_1._2._2 + 1").as[Int]).collect()
```

marmbrus rxin cloud-fan Do you have the same feeling?

Author: gatorsmile

Closes #9855 from gatorsmile/printSchemaDataSet.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bef361c5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bef361c5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bef361c5

Branch: refs/heads/master
Commit: bef361c589c0a38740232fd8d0a45841e4fc969a
Parents: e359d5d
Author: gatorsmile
Authored: Fri Nov 20 11:20:47 2015 -0800
Committer: Michael Armbrust
Committed: Fri Nov 20 11:20:47 2015 -0800

.../src/main/scala/org/apache/spark/sql/DataFrame.scala | 9 - .../scala/org/apache/spark/sql/execution/Queryable.scala | 9 + 2 files changed, 9 insertions(+), 9 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/bef361c5/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala index 9835812..7abceca 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala @@ -300,15 +300,6 @@ class DataFrame private[sql]( def columns: Array[String] = schema.fields.map(_.name) /** - * Prints the schema to the console in a nice tree format. - * @group basic - * @since 1.3.0 - */ - // scalastyle:off println - def printSchema(): Unit = println(schema.treeString) - // scalastyle:on println - - /** * Returns true if the `collect` and `take` methods can be run locally * (without any Spark executors). * @group basic http://git-wip-us.apache.org/repos/asf/spark/blob/bef361c5/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala index e86a52c..321e2c7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala @@ -38,6 +38,15 @@ private[sql] trait Queryable { } /** + * Prints the schema to the console in a nice tree format.
+ * @group basic + * @since 1.3.0 + */ + // scalastyle:off println + def printSchema(): Unit = println(schema.treeString) + // scalastyle:on println + + /** * Prints the plans (logical and physical) to the console for debugging purposes. * @since 1.3.0 */
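In effect, moving the method to `Queryable` makes it available on `Dataset` as well as `DataFrame`; a minimal sketch, assuming the usual implicits are in scope:

```scala
import sqlContext.implicits._

// printSchema now lives on Queryable, so Dataset exposes it too.
val ds = Seq(("a", 1), ("b", 2)).toDS()
ds.printSchema()
// root
//  |-- _1: string (nullable = true)
//  |-- _2: integer (nullable = true)
```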
spark git commit: [SPARK-11819][SQL] nice error message for missing encoder
Repository: spark
Updated Branches: refs/heads/master 60bfb1133 -> 3b9d2a347

[SPARK-11819][SQL] nice error message for missing encoder

Before this PR, when users try to get an encoder for an un-supported class, they will only get a very simple error message like `Encoder for type xxx is not supported`. After this PR, the error message becomes more friendly, for example:

```
No Encoder found for abc.xyz.NonEncodable
- array element class: "abc.xyz.NonEncodable"
- field (class: "scala.Array", name: "arrayField")
- root class: "abc.xyz.AnotherClass"
```

Author: Wenchen Fan

Closes #9810 from cloud-fan/error-message.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3b9d2a34
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3b9d2a34
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3b9d2a34

Branch: refs/heads/master
Commit: 3b9d2a347f9c796b90852173d84189834e499e25
Parents: 60bfb11
Author: Wenchen Fan
Authored: Fri Nov 20 12:04:42 2015 -0800
Committer: Michael Armbrust
Committed: Fri Nov 20 12:04:42 2015 -0800

.../spark/sql/catalyst/ScalaReflection.scala | 90 +++- .../encoders/EncoderErrorMessageSuite.scala | 62 ++ 2 files changed, 129 insertions(+), 23 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/3b9d2a34/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 33ae700..918050b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -63,7 +63,7 @@ object ScalaReflection extends ScalaReflection { case t if t <:< definitions.BooleanTpe => BooleanType case t if t <:< localTypeOf[Array[Byte]] => BinaryType case _ => -val className: String = tpe.erasure.typeSymbol.asClass.fullName +val className = getClassNameFromType(tpe) className match { case "scala.Array" => val TypeRef(_, _, Seq(elementType)) = tpe @@ -320,9 +320,23 @@ object ScalaReflection extends ScalaReflection { } } - /** Returns expressions for extracting all the fields from the given type. */ + /** + * Returns expressions for extracting all the fields from the given type. + * + * If the given type is not supported, i.e. there is no encoder can be built for this type, + * an [[UnsupportedOperationException]] will be thrown with detailed error message to explain + * the type path walked so far and which class we are not supporting. + * There are 4 kinds of type path: + * * the root type: `root class: "abc.xyz.MyClass"` + * * the value type of [[Option]]: `option value class: "abc.xyz.MyClass"` + * * the element type of [[Array]] or [[Seq]]: `array element class: "abc.xyz.MyClass"` + * * the field of [[Product]]: `field (class: "abc.xyz.MyClass", name: "myField")` + */ def extractorsFor[T : TypeTag](inputObject: Expression): CreateNamedStruct = { -extractorFor(inputObject, localTypeOf[T]) match { +val tpe = localTypeOf[T] +val clsName = getClassNameFromType(tpe) +val walkedTypePath = s"""- root class: "${clsName}"""" :: Nil +extractorFor(inputObject, tpe, walkedTypePath) match { case s: CreateNamedStruct => s case other => CreateNamedStruct(expressions.Literal("value") :: other :: Nil) } @@ -331,7 +345,28 @@ object ScalaReflection extends ScalaReflection { /** Helper for extracting internal fields from a case class.
*/ private def extractorFor( inputObject: Expression, - tpe: `Type`): Expression = ScalaReflectionLock.synchronized { + tpe: `Type`, + walkedTypePath: Seq[String]): Expression = ScalaReflectionLock.synchronized { + +def toCatalystArray(input: Expression, elementType: `Type`): Expression = { + val externalDataType = dataTypeFor(elementType) + val Schema(catalystType, nullable) = silentSchemaFor(elementType) + if (isNativeType(catalystType)) { +NewInstance( + classOf[GenericArrayData], + input :: Nil, + dataType = ArrayType(catalystType, nullable)) + } else { +val clsName = getClassNameFromType(elementType) +val newPath = s"""- array element class: "$clsName"""" +: walkedTypePath +// `MapObjects` will run `extractorFor` lazily, we need to eagerly call `extractorFor` here +// to trigger the type check. +
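A sketch of the kind of user code that now produces the friendlier message; the class names mirror the illustrative ones from the commit message and are not real Spark types:

```scala
import sqlContext.implicits._

// A class for which no Catalyst encoder can be derived (hypothetical).
class NonEncodable(val x: Int)

// A case class that pulls the unsupported type in through an array field.
case class AnotherClass(arrayField: Array[NonEncodable])

// This now throws an UnsupportedOperationException carrying the type path
// ("array element class" / "field" / "root class") instead of a bare
// "Encoder for type ... is not supported".
val ds = Seq(AnotherClass(Array(new NonEncodable(1)))).toDS()
```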
spark git commit: [SPARK-11876][SQL] Support printSchema in DataSet API
Repository: spark
Updated Branches: refs/heads/branch-1.6 92d3563fd -> 3662b9f4c

[SPARK-11876][SQL] Support printSchema in DataSet API

DataSet APIs look great! However, I am lost when doing multi-level joins. For example,

```
val ds1 = Seq(("a", 1), ("b", 2)).toDS().as("a")
val ds2 = Seq(("a", 1), ("b", 2)).toDS().as("b")
val ds3 = Seq(("a", 1), ("b", 2)).toDS().as("c")
ds1.joinWith(ds2, $"a._2" === $"b._2").as("ab").joinWith(ds3, $"ab._1._2" === $"c._2").printSchema()
```

The printed schema is like

```
root
 |-- _1: struct (nullable = true)
 |    |-- _1: struct (nullable = true)
 |    |    |-- _1: string (nullable = true)
 |    |    |-- _2: integer (nullable = true)
 |    |-- _2: struct (nullable = true)
 |    |    |-- _1: string (nullable = true)
 |    |    |-- _2: integer (nullable = true)
 |-- _2: struct (nullable = true)
 |    |-- _1: string (nullable = true)
 |    |-- _2: integer (nullable = true)
```

Personally, I think we need the printSchema function. Sometimes, I do not know how to specify the column, especially when the data types are mixed. For example, if I want to write the following select for the above multi-level join, I have to know the schema:

```
newDS.select(expr("_1._2._2 + 1").as[Int]).collect()
```

marmbrus rxin cloud-fan Do you have the same feeling?

Author: gatorsmile

Closes #9855 from gatorsmile/printSchemaDataSet.

(cherry picked from commit bef361c589c0a38740232fd8d0a45841e4fc969a)
Signed-off-by: Michael Armbrust

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3662b9f4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3662b9f4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3662b9f4

Branch: refs/heads/branch-1.6
Commit: 3662b9f4c9956fe0299e917edf510ea01cc37791
Parents: 92d3563
Author: gatorsmile
Authored: Fri Nov 20 11:20:47 2015 -0800
Committer: Michael Armbrust
Committed: Fri Nov 20 11:21:03 2015 -0800

.../src/main/scala/org/apache/spark/sql/DataFrame.scala | 9 - .../scala/org/apache/spark/sql/execution/Queryable.scala | 9 + 2 files changed, 9 insertions(+), 9 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/3662b9f4/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala index 9835812..7abceca 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala @@ -300,15 +300,6 @@ class DataFrame private[sql]( def columns: Array[String] = schema.fields.map(_.name) /** - * Prints the schema to the console in a nice tree format. - * @group basic - * @since 1.3.0 - */ - // scalastyle:off println - def printSchema(): Unit = println(schema.treeString) - // scalastyle:on println - - /** * Returns true if the `collect` and `take` methods can be run locally * (without any Spark executors). * @group basic http://git-wip-us.apache.org/repos/asf/spark/blob/3662b9f4/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala index e86a52c..321e2c7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/Queryable.scala @@ -38,6 +38,15 @@ private[sql] trait Queryable { } /** + * Prints the schema to the console in a nice tree format.
+ * @group basic + * @since 1.3.0 + */ + // scalastyle:off println + def printSchema(): Unit = println(schema.treeString) + // scalastyle:on println + + /** * Prints the plans (logical and physical) to the console for debugging purposes. * @since 1.3.0 */
spark git commit: [SPARK-11819][SQL] nice error message for missing encoder
Repository: spark
Updated Branches: refs/heads/branch-1.6 119f92b4e -> ff156a3a6

[SPARK-11819][SQL] nice error message for missing encoder

Before this PR, when users try to get an encoder for an un-supported class, they will only get a very simple error message like `Encoder for type xxx is not supported`. After this PR, the error message becomes more friendly, for example:

```
No Encoder found for abc.xyz.NonEncodable
- array element class: "abc.xyz.NonEncodable"
- field (class: "scala.Array", name: "arrayField")
- root class: "abc.xyz.AnotherClass"
```

Author: Wenchen Fan

Closes #9810 from cloud-fan/error-message.

(cherry picked from commit 3b9d2a347f9c796b90852173d84189834e499e25)
Signed-off-by: Michael Armbrust

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ff156a3a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ff156a3a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ff156a3a

Branch: refs/heads/branch-1.6
Commit: ff156a3a660e1730de220b404a61e1bda8b7682e
Parents: 119f92b
Author: Wenchen Fan
Authored: Fri Nov 20 12:04:42 2015 -0800
Committer: Michael Armbrust
Committed: Fri Nov 20 12:04:53 2015 -0800

.../spark/sql/catalyst/ScalaReflection.scala | 90 +++- .../encoders/EncoderErrorMessageSuite.scala | 62 ++ 2 files changed, 129 insertions(+), 23 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/ff156a3a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 33ae700..918050b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -63,7 +63,7 @@ object ScalaReflection extends ScalaReflection { case t if t <:< definitions.BooleanTpe => BooleanType case t if t <:< localTypeOf[Array[Byte]] => BinaryType case _ => -val className: String = tpe.erasure.typeSymbol.asClass.fullName +val className = getClassNameFromType(tpe) className match { case "scala.Array" => val TypeRef(_, _, Seq(elementType)) = tpe @@ -320,9 +320,23 @@ object ScalaReflection extends ScalaReflection { } } - /** Returns expressions for extracting all the fields from the given type. */ + /** + * Returns expressions for extracting all the fields from the given type. + * + * If the given type is not supported, i.e. there is no encoder can be built for this type, + * an [[UnsupportedOperationException]] will be thrown with detailed error message to explain + * the type path walked so far and which class we are not supporting.
+ * There are 4 kinds of type path: + * * the root type: `root class: "abc.xyz.MyClass"` + * * the value type of [[Option]]: `option value class: "abc.xyz.MyClass"` + * * the element type of [[Array]] or [[Seq]]: `array element class: "abc.xyz.MyClass"` + * * the field of [[Product]]: `field (class: "abc.xyz.MyClass", name: "myField")` + */ def extractorsFor[T : TypeTag](inputObject: Expression): CreateNamedStruct = { -extractorFor(inputObject, localTypeOf[T]) match { +val tpe = localTypeOf[T] +val clsName = getClassNameFromType(tpe) +val walkedTypePath = s"""- root class: "${clsName}"""" :: Nil +extractorFor(inputObject, tpe, walkedTypePath) match { case s: CreateNamedStruct => s case other => CreateNamedStruct(expressions.Literal("value") :: other :: Nil) } @@ -331,7 +345,28 @@ object ScalaReflection extends ScalaReflection { /** Helper for extracting internal fields from a case class. */ private def extractorFor( inputObject: Expression, - tpe: `Type`): Expression = ScalaReflectionLock.synchronized { + tpe: `Type`, + walkedTypePath: Seq[String]): Expression = ScalaReflectionLock.synchronized { + +def toCatalystArray(input: Expression, elementType: `Type`): Expression = { + val externalDataType = dataTypeFor(elementType) + val Schema(catalystType, nullable) = silentSchemaFor(elementType) + if (isNativeType(catalystType)) { +NewInstance( + classOf[GenericArrayData], + input :: Nil, + dataType = ArrayType(catalystType, nullable)) + } else { +val clsName = getClassNameFromType(elementType) +val newPath = s"""- array element class: "$clsName"""" +: walkedTypePath +//
spark git commit: [SPARK-11817][SQL] Truncating the fractional seconds to prevent inserting a NULL
Repository: spark
Updated Branches: refs/heads/branch-1.5 6fe1ce6ab -> 9a906c1c3

[SPARK-11817][SQL] Truncating the fractional seconds to prevent inserting a NULL

JIRA: https://issues.apache.org/jira/browse/SPARK-11817

Instead of returning None, we should truncate the fractional seconds to prevent inserting NULL.

Author: Liang-Chi Hsieh

Closes #9834 from viirya/truncate-fractional-sec.

(cherry picked from commit 60bfb113325c71491f8dcf98b6036b0caa2144fe)
Signed-off-by: Yin Huai

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9a906c1c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9a906c1c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9a906c1c

Branch: refs/heads/branch-1.5
Commit: 9a906c1c3a76097b72f0951a3730b669ac58e3c7
Parents: 6fe1ce6
Author: Liang-Chi Hsieh
Authored: Fri Nov 20 11:43:45 2015 -0800
Committer: Yin Huai
Committed: Fri Nov 20 11:44:32 2015 -0800

.../org/apache/spark/sql/catalyst/util/DateTimeUtils.scala | 5 + .../apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala | 8 2 files changed, 13 insertions(+)

http://git-wip-us.apache.org/repos/asf/spark/blob/9a906c1c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala index c6a2780..0fdcb62 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala @@ -326,6 +326,11 @@ object DateTimeUtils { return None } +// Instead of return None, we truncate the fractional seconds to prevent inserting NULL +if (segments(6) > 999999) { + segments(6) = segments(6).toString.take(6).toInt +} + if (segments(3) < 0 || segments(3) > 23 || segments(4) < 0 || segments(4) > 59 || segments(5) < 0 || segments(5) > 59 || segments(6) < 0 || segments(6) > 999999 || segments(7) < 0 || segments(7) > 23 || segments(8) < 0 || segments(8) > 59) { http://git-wip-us.apache.org/repos/asf/spark/blob/9a906c1c/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala index b35d400..753c7e9 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala @@ -300,6 +300,14 @@ class DateTimeUtilsSuite extends SparkFunSuite { UTF8String.fromString("2015-03-18T12:03.17-0:70")).isEmpty) assert(stringToTimestamp( UTF8String.fromString("2015-03-18T12:03.17-1:0:0")).isEmpty) + +// Truncating the fractional seconds +c = Calendar.getInstance(TimeZone.getTimeZone("GMT+00:00")) +c.set(2015, 2, 18, 12, 3, 17) +c.set(Calendar.MILLISECOND, 0) +assert(stringToTimestamp( + UTF8String.fromString("2015-03-18T12:03:17.123456789+0:00")).get === +c.getTimeInMillis * 1000 + 123456) } test("hours") {
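The behavior change in a minimal sketch; this calls the Catalyst-internal parser directly, purely to illustrate the truncation:

```scala
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.unsafe.types.UTF8String

// Nine fractional digits exceed microsecond precision. Previously this parse
// returned None (so the value was inserted as NULL); the fraction is now
// truncated to its first six digits instead.
val ts = DateTimeUtils.stringToTimestamp(
  UTF8String.fromString("2015-03-18T12:03:17.123456789+0:00"))
// ts: Some(<microseconds since epoch, ending in ...123456>) rather than None
```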
spark git commit: [SPARK-11817][SQL] Truncating the fractional seconds to prevent inserting a NULL
Repository: spark
Updated Branches: refs/heads/master bef361c58 -> 60bfb1133

[SPARK-11817][SQL] Truncating the fractional seconds to prevent inserting a NULL

JIRA: https://issues.apache.org/jira/browse/SPARK-11817

Instead of returning None, we should truncate the fractional seconds to prevent inserting NULL.

Author: Liang-Chi Hsieh

Closes #9834 from viirya/truncate-fractional-sec.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60bfb113
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60bfb113
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60bfb113

Branch: refs/heads/master
Commit: 60bfb113325c71491f8dcf98b6036b0caa2144fe
Parents: bef361c
Author: Liang-Chi Hsieh
Authored: Fri Nov 20 11:43:45 2015 -0800
Committer: Yin Huai
Committed: Fri Nov 20 11:43:45 2015 -0800

.../org/apache/spark/sql/catalyst/util/DateTimeUtils.scala | 5 + .../apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala | 8 2 files changed, 13 insertions(+)

http://git-wip-us.apache.org/repos/asf/spark/blob/60bfb113/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala index 17a5527..2b93882 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala @@ -327,6 +327,11 @@ object DateTimeUtils { return None } +// Instead of return None, we truncate the fractional seconds to prevent inserting NULL +if (segments(6) > 999999) { + segments(6) = segments(6).toString.take(6).toInt +} + if (segments(3) < 0 || segments(3) > 23 || segments(4) < 0 || segments(4) > 59 || segments(5) < 0 || segments(5) > 59 || segments(6) < 0 || segments(6) > 999999 || segments(7) < 0 || segments(7) > 23 || segments(8) < 0 || segments(8) > 59) { http://git-wip-us.apache.org/repos/asf/spark/blob/60bfb113/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala index faca128..0ce5a2f 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala @@ -343,6 +343,14 @@ class DateTimeUtilsSuite extends SparkFunSuite { UTF8String.fromString("2015-03-18T12:03.17-0:70")).isEmpty) assert(stringToTimestamp( UTF8String.fromString("2015-03-18T12:03.17-1:0:0")).isEmpty) + +// Truncating the fractional seconds +c = Calendar.getInstance(TimeZone.getTimeZone("GMT+00:00")) +c.set(2015, 2, 18, 12, 3, 17) +c.set(Calendar.MILLISECOND, 0) +assert(stringToTimestamp( + UTF8String.fromString("2015-03-18T12:03:17.123456789+0:00")).get === +c.getTimeInMillis * 1000 + 123456) } test("hours") {
spark git commit: [SPARK-11817][SQL] Truncating the fractional seconds to prevent inserting a NULL
Repository: spark
Updated Branches: refs/heads/branch-1.6 3662b9f4c -> 119f92b4e

[SPARK-11817][SQL] Truncating the fractional seconds to prevent inserting a NULL

JIRA: https://issues.apache.org/jira/browse/SPARK-11817

Instead of returning None, we should truncate the fractional seconds to prevent inserting NULL.

Author: Liang-Chi Hsieh

Closes #9834 from viirya/truncate-fractional-sec.

(cherry picked from commit 60bfb113325c71491f8dcf98b6036b0caa2144fe)
Signed-off-by: Yin Huai

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/119f92b4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/119f92b4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/119f92b4

Branch: refs/heads/branch-1.6
Commit: 119f92b4e6cbdf03c5a6a25f892a73d81b2a23d1
Parents: 3662b9f
Author: Liang-Chi Hsieh
Authored: Fri Nov 20 11:43:45 2015 -0800
Committer: Yin Huai
Committed: Fri Nov 20 11:43:54 2015 -0800

.../org/apache/spark/sql/catalyst/util/DateTimeUtils.scala | 5 + .../apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala | 8 2 files changed, 13 insertions(+)

http://git-wip-us.apache.org/repos/asf/spark/blob/119f92b4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala index 17a5527..2b93882 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala @@ -327,6 +327,11 @@ object DateTimeUtils { return None } +// Instead of return None, we truncate the fractional seconds to prevent inserting NULL +if (segments(6) > 999999) { + segments(6) = segments(6).toString.take(6).toInt +} + if (segments(3) < 0 || segments(3) > 23 || segments(4) < 0 || segments(4) > 59 || segments(5) < 0 || segments(5) > 59 || segments(6) < 0 || segments(6) > 999999 || segments(7) < 0 || segments(7) > 23 || segments(8) < 0 || segments(8) > 59) { http://git-wip-us.apache.org/repos/asf/spark/blob/119f92b4/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala index faca128..0ce5a2f 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala @@ -343,6 +343,14 @@ class DateTimeUtilsSuite extends SparkFunSuite { UTF8String.fromString("2015-03-18T12:03.17-0:70")).isEmpty) assert(stringToTimestamp( UTF8String.fromString("2015-03-18T12:03.17-1:0:0")).isEmpty) + +// Truncating the fractional seconds +c = Calendar.getInstance(TimeZone.getTimeZone("GMT+00:00")) +c.set(2015, 2, 18, 12, 3, 17) +c.set(Calendar.MILLISECOND, 0) +assert(stringToTimestamp( + UTF8String.fromString("2015-03-18T12:03:17.123456789+0:00")).get === +c.getTimeInMillis * 1000 + 123456) } test("hours") {
spark git commit: [SPARK-11724][SQL] Change casting between int and timestamp to consistently treat int in seconds.
Repository: spark
Updated Branches: refs/heads/master 652def318 -> 9ed4ad426

[SPARK-11724][SQL] Change casting between int and timestamp to consistently treat int in seconds.

Hive has since changed this behavior as well. https://issues.apache.org/jira/browse/HIVE-3454

Author: Nong Li
Author: Yin Huai

Closes #9685 from nongli/spark-11724.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9ed4ad42
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9ed4ad42
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9ed4ad42

Branch: refs/heads/master
Commit: 9ed4ad4265cf9d3135307eb62dae6de0b220fc21
Parents: 652def3
Author: Nong Li
Authored: Fri Nov 20 14:19:34 2015 -0800
Committer: Yin Huai
Committed: Fri Nov 20 14:19:34 2015 -0800

.../spark/sql/catalyst/expressions/Cast.scala | 6 ++-- .../sql/catalyst/expressions/CastSuite.scala | 16 + .../apache/spark/sql/DateFunctionsSuite.scala | 3 ++ ...l testing-0-237a6af90a857da1efcbe98f6bbbf9d6 | 1 + ...l testing-0-9a02bc7de09bcabcbd4c91f54a814c20 | 1 - ...mp cast #3-0-76ee270337f664b36cacfc6528ac109 | 1 - ...p cast #5-0-dbd7bcd167d322d6617b884c02c7f247 | 1 - ...p cast #7-0-1d70654217035f8ce5f64344f4c5a80f | 1 - .../sql/hive/execution/HiveQuerySuite.scala | 34 +--- 9 files changed, 39 insertions(+), 25 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/9ed4ad42/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala index 5564e24..533d17e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala @@ -204,8 +204,8 @@ case class Cast(child: Expression, dataType: DataType) if (d.isNaN || d.isInfinite) null else (d * 1000000L).toLong } - // converting milliseconds to us - private[this] def longToTimestamp(t: Long): Long = t * 1000L + // converting seconds to us + private[this] def longToTimestamp(t: Long): Long = t * 1000000L // converting us to seconds private[this] def timestampToLong(ts: Long): Long = math.floor(ts.toDouble / 1000000L).toLong // converting us to seconds in double @@ -647,7 +647,7 @@ case class Cast(child: Expression, dataType: DataType) private[this] def decimalToTimestampCode(d: String): String = s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(1000000L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 1000L" + private[this] def longToTimeStampCode(l: String): String = s"$l * 1000000L" private[this] def timestampToIntegerCode(ts: String): String = s"java.lang.Math.floor((double) $ts / 1000000L)" private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 1000000.0" http://git-wip-us.apache.org/repos/asf/spark/blob/9ed4ad42/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala index f4db4da..ab77a76 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala @@ -258,8 +258,8 @@ class CastSuite extends SparkFunSuite with
ExpressionEvalHelper { test("cast from int 2") { checkEvaluation(cast(1, LongType), 1.toLong) -checkEvaluation(cast(cast(1000, TimestampType), LongType), 1.toLong) -checkEvaluation(cast(cast(-1200, TimestampType), LongType), -2.toLong) +checkEvaluation(cast(cast(1000, TimestampType), LongType), 1000.toLong) +checkEvaluation(cast(cast(-1200, TimestampType), LongType), -1200.toLong) checkEvaluation(cast(123, DecimalType.USER_DEFAULT), Decimal(123)) checkEvaluation(cast(123, DecimalType(3, 0)), Decimal(123)) @@ -348,14 +348,14 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation( cast(cast(cast(cast(cast(cast("5", ByteType), TimestampType), DecimalType.SYSTEM_DEFAULT), LongType), StringType), ShortType), - 0.toShort) + 5.toShort) checkEvaluation( cast(cast(cast(cast(cast(cast("5", TimestampType),
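The user-visible effect, sketched through the SQL interface (the displayed timestamps assume a UTC session time zone):

```scala
// An integral value cast to TIMESTAMP is now interpreted as seconds since the
// epoch (previously milliseconds), matching Hive's fix in HIVE-3454.
sqlContext.sql("SELECT CAST(1000 AS TIMESTAMP)").show()
// 1970-01-01 00:16:40.0  (1,000 seconds), previously 1970-01-01 00:00:01.0

// The round trip back to an integral type is consistent again:
sqlContext.sql("SELECT CAST(CAST(1000 AS TIMESTAMP) AS BIGINT)").show()
// 1000
```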
spark git commit: [SPARK-11724][SQL] Change casting between int and timestamp to consistently treat int in seconds.
Repository: spark
Updated Branches: refs/heads/branch-1.6 6fc968754 -> 9c8e17984

[SPARK-11724][SQL] Change casting between int and timestamp to consistently treat int in seconds.

Hive has since changed this behavior as well. https://issues.apache.org/jira/browse/HIVE-3454

Author: Nong Li
Author: Yin Huai

Closes #9685 from nongli/spark-11724.

(cherry picked from commit 9ed4ad4265cf9d3135307eb62dae6de0b220fc21)
Signed-off-by: Yin Huai

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9c8e1798
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9c8e1798
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9c8e1798

Branch: refs/heads/branch-1.6
Commit: 9c8e17984d95a8d225525a592a921a5af81e4440
Parents: 6fc9687
Author: Nong Li
Authored: Fri Nov 20 14:19:34 2015 -0800
Committer: Yin Huai
Committed: Fri Nov 20 14:19:44 2015 -0800

.../spark/sql/catalyst/expressions/Cast.scala | 6 ++-- .../sql/catalyst/expressions/CastSuite.scala | 16 + .../apache/spark/sql/DateFunctionsSuite.scala | 3 ++ ...l testing-0-237a6af90a857da1efcbe98f6bbbf9d6 | 1 + ...l testing-0-9a02bc7de09bcabcbd4c91f54a814c20 | 1 - ...mp cast #3-0-76ee270337f664b36cacfc6528ac109 | 1 - ...p cast #5-0-dbd7bcd167d322d6617b884c02c7f247 | 1 - ...p cast #7-0-1d70654217035f8ce5f64344f4c5a80f | 1 - .../sql/hive/execution/HiveQuerySuite.scala | 34 +--- 9 files changed, 39 insertions(+), 25 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/9c8e1798/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala index 5564e24..533d17e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala @@ -204,8 +204,8 @@ case class Cast(child: Expression, dataType: DataType) if (d.isNaN || d.isInfinite) null else (d * 1000000L).toLong } - // converting milliseconds to us - private[this] def longToTimestamp(t: Long): Long = t * 1000L + // converting seconds to us + private[this] def longToTimestamp(t: Long): Long = t * 1000000L // converting us to seconds private[this] def timestampToLong(ts: Long): Long = math.floor(ts.toDouble / 1000000L).toLong // converting us to seconds in double @@ -647,7 +647,7 @@ case class Cast(child: Expression, dataType: DataType) private[this] def decimalToTimestampCode(d: String): String = s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(1000000L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 1000L" + private[this] def longToTimeStampCode(l: String): String = s"$l * 1000000L" private[this] def timestampToIntegerCode(ts: String): String = s"java.lang.Math.floor((double) $ts / 1000000L)" private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 1000000.0" http://git-wip-us.apache.org/repos/asf/spark/blob/9c8e1798/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala index f4db4da..ab77a76 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala +++
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala @@ -258,8 +258,8 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper { test("cast from int 2") { checkEvaluation(cast(1, LongType), 1.toLong) -checkEvaluation(cast(cast(1000, TimestampType), LongType), 1.toLong) -checkEvaluation(cast(cast(-1200, TimestampType), LongType), -2.toLong) +checkEvaluation(cast(cast(1000, TimestampType), LongType), 1000.toLong) +checkEvaluation(cast(cast(-1200, TimestampType), LongType), -1200.toLong) checkEvaluation(cast(123, DecimalType.USER_DEFAULT), Decimal(123)) checkEvaluation(cast(123, DecimalType(3, 0)), Decimal(123)) @@ -348,14 +348,14 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation( cast(cast(cast(cast(cast(cast("5", ByteType), TimestampType), DecimalType.SYSTEM_DEFAULT), LongType), StringType),
spark git commit: [SPARK-11650] Reduce RPC timeouts to speed up slow AkkaUtilsSuite test
Repository: spark
Updated Branches: refs/heads/branch-1.6 ff156a3a6 -> 6fc968754

[SPARK-11650] Reduce RPC timeouts to speed up slow AkkaUtilsSuite test

This patch reduces some RPC timeouts in order to speed up the slow "AkkaUtilsSuite.remote fetch ssl on - untrusted server" test, which used to take two minutes to run.

Author: Josh Rosen

Closes #9869 from JoshRosen/SPARK-11650.

(cherry picked from commit 652def318e47890bd0a0977dc982cc07f99fb06a)
Signed-off-by: Josh Rosen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6fc96875
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6fc96875
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6fc96875

Branch: refs/heads/branch-1.6
Commit: 6fc96875460d881f13ec3082c4a2b32144ea45e9
Parents: ff156a3
Author: Josh Rosen
Authored: Fri Nov 20 13:17:35 2015 -0800
Committer: Josh Rosen
Committed: Fri Nov 20 13:18:15 2015 -0800

core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/6fc96875/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala index 6160101..0af4b60 100644 --- a/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala +++ b/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala @@ -340,10 +340,11 @@ class AkkaUtilsSuite extends SparkFunSuite with LocalSparkContext with ResetSystemProperties new MapOutputTrackerMasterEndpoint(rpcEnv, masterTracker, conf)) val slaveConf = sparkSSLConfig() + .set("spark.rpc.askTimeout", "5s") + .set("spark.rpc.lookupTimeout", "5s") val securityManagerBad = new SecurityManager(slaveConf) val slaveRpcEnv = RpcEnv.create("spark-slave", hostname, 0, slaveConf, securityManagerBad) -val slaveTracker = new MapOutputTrackerWorker(conf) try { slaveRpcEnv.setupEndpointRef("spark", rpcEnv.address, MapOutputTracker.ENDPOINT_NAME) fail("should receive either ActorNotFound or TimeoutException")
spark git commit: [SPARK-11650] Reduce RPC timeouts to speed up slow AkkaUtilsSuite test
Repository: spark
Updated Branches: refs/heads/master 3b9d2a347 -> 652def318

[SPARK-11650] Reduce RPC timeouts to speed up slow AkkaUtilsSuite test

This patch reduces some RPC timeouts in order to speed up the slow "AkkaUtilsSuite.remote fetch ssl on - untrusted server" test, which used to take two minutes to run.

Author: Josh Rosen

Closes #9869 from JoshRosen/SPARK-11650.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/652def31
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/652def31
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/652def31

Branch: refs/heads/master
Commit: 652def318e47890bd0a0977dc982cc07f99fb06a
Parents: 3b9d2a3
Author: Josh Rosen
Authored: Fri Nov 20 13:17:35 2015 -0800
Committer: Josh Rosen
Committed: Fri Nov 20 13:17:35 2015 -0800

core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/652def31/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala b/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala index 6160101..0af4b60 100644 --- a/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala +++ b/core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala @@ -340,10 +340,11 @@ class AkkaUtilsSuite extends SparkFunSuite with LocalSparkContext with ResetSystemProperties new MapOutputTrackerMasterEndpoint(rpcEnv, masterTracker, conf)) val slaveConf = sparkSSLConfig() + .set("spark.rpc.askTimeout", "5s") + .set("spark.rpc.lookupTimeout", "5s") val securityManagerBad = new SecurityManager(slaveConf) val slaveRpcEnv = RpcEnv.create("spark-slave", hostname, 0, slaveConf, securityManagerBad) -val slaveTracker = new MapOutputTrackerWorker(conf) try { slaveRpcEnv.setupEndpointRef("spark", rpcEnv.address, MapOutputTracker.ENDPOINT_NAME) fail("should receive either ActorNotFound or TimeoutException")
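The gist of the fix, as a minimal sketch of the tightened test configuration (the diff chains these onto `sparkSSLConfig()`; a bare `SparkConf` is used here for illustration):

```scala
import org.apache.spark.SparkConf

// Lowering the ask/lookup timeouts makes the intentionally failing SSL
// handshake in the test time out in seconds instead of the default minutes.
val slaveConf = new SparkConf()
  .set("spark.rpc.askTimeout", "5s")
  .set("spark.rpc.lookupTimeout", "5s")
```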
spark git commit: [SPARK-11890][SQL] Fix compilation for Scala 2.11
Repository: spark Updated Branches: refs/heads/branch-1.6 7e06d51d5 -> e0bb4e09c [SPARK-11890][SQL] Fix compilation for Scala 2.11 Author: Michael ArmbrustCloses #9871 from marmbrus/scala211-break. (cherry picked from commit 68ed046836975b492b594967256d3c7951b568a5) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e0bb4e09 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e0bb4e09 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e0bb4e09 Branch: refs/heads/branch-1.6 Commit: e0bb4e09c7b04bc8926a4c0658fc2c51db8fb04c Parents: 7e06d51 Author: Michael Armbrust Authored: Fri Nov 20 15:38:04 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 15:38:16 2015 -0800 -- .../scala/org/apache/spark/sql/catalyst/ScalaReflection.scala| 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e0bb4e09/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 918050b..4a4a62e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -670,14 +670,14 @@ trait ScalaReflection { * Unlike `schemaFor`, this method won't throw exception for un-supported type, it will return * `NullType` silently instead. */ - private def silentSchemaFor(tpe: `Type`): Schema = try { + protected def silentSchemaFor(tpe: `Type`): Schema = try { schemaFor(tpe) } catch { case _: UnsupportedOperationException => Schema(NullType, nullable = true) } /** Returns the full class name for a type. */ - private def getClassNameFromType(tpe: `Type`): String = { + protected def getClassNameFromType(tpe: `Type`): String = { tpe.erasure.typeSymbol.asClass.fullName } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-11890][SQL] Fix compilation for Scala 2.11
Repository: spark Updated Branches: refs/heads/master 968acf3bd -> 68ed04683 [SPARK-11890][SQL] Fix compilation for Scala 2.11 Author: Michael ArmbrustCloses #9871 from marmbrus/scala211-break. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/68ed0468 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/68ed0468 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/68ed0468 Branch: refs/heads/master Commit: 68ed046836975b492b594967256d3c7951b568a5 Parents: 968acf3 Author: Michael Armbrust Authored: Fri Nov 20 15:38:04 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 15:38:04 2015 -0800 -- .../scala/org/apache/spark/sql/catalyst/ScalaReflection.scala| 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/68ed0468/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index 918050b..4a4a62e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -670,14 +670,14 @@ trait ScalaReflection { * Unlike `schemaFor`, this method won't throw exception for un-supported type, it will return * `NullType` silently instead. */ - private def silentSchemaFor(tpe: `Type`): Schema = try { + protected def silentSchemaFor(tpe: `Type`): Schema = try { schemaFor(tpe) } catch { case _: UnsupportedOperationException => Schema(NullType, nullable = true) } /** Returns the full class name for a type. */ - private def getClassNameFromType(tpe: `Type`): String = { + protected def getClassNameFromType(tpe: `Type`): String = { tpe.erasure.typeSymbol.asClass.fullName } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
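The diff only widens two members from `private` to `protected`; the commit message states nothing beyond "fix compilation", so the following is a hedged illustration of why such a change can matter, not a claim about the exact 2.11 failure. `protected` keeps the helpers hidden from outside callers while still letting subtypes of the trait, such as a companion object that extends it, reach them:

```
trait Helpers {
  protected def describe(tpe: Class[_]): String = tpe.getName
}

// The companion extends the trait, mirroring how `object ScalaReflection`
// mixes in the `ScalaReflection` trait. With `private` this call site would
// be illegal (private restricts access to the trait body itself); with
// `protected` it compiles cleanly.
object Helpers extends Helpers {
  def fullNameOf(tpe: Class[_]): String = describe(tpe)
}
```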
spark git commit: [SPARK-11877] Prevent agg. fallback conf. from leaking across test suites
Repository: spark Updated Branches: refs/heads/branch-1.6 1ce6394e3 -> eab90d3f3 [SPARK-11877] Prevent agg. fallback conf. from leaking across test suites This patch fixes an issue where the `spark.sql.TungstenAggregate.testFallbackStartsAt` SQLConf setting was not properly reset / cleared at the end of `TungstenAggregationQueryWithControlledFallbackSuite`. This ended up causing test failures in HiveCompatibilitySuite in Maven builds by causing spilling to occur way too frequently. This configuration leak was inadvertently introduced during test cleanup in #9618. Author: Josh RosenCloses #9857 from JoshRosen/clear-fallback-prop-in-test-teardown. (cherry picked from commit a66142decee48bf5689fb7f4f33646d7bb1ac08d) Signed-off-by: Josh Rosen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eab90d3f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eab90d3f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/eab90d3f Branch: refs/heads/branch-1.6 Commit: eab90d3f37587c6a8df8c686be6d9cd5751f205a Parents: 1ce6394 Author: Josh Rosen Authored: Fri Nov 20 00:46:29 2015 -0800 Committer: Josh Rosen Committed: Fri Nov 20 00:46:42 2015 -0800 -- .../hive/execution/AggregationQuerySuite.scala | 44 ++-- 1 file changed, 21 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/eab90d3f/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala -- diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala index 6dde79f..39c0a2a 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala @@ -868,29 +868,27 @@ class TungstenAggregationQueryWithControlledFallbackSuite extends AggregationQue override protected def checkAnswer(actual: => DataFrame, expectedAnswer: Seq[Row]): Unit = { (0 to 2).foreach { fallbackStartsAt => - sqlContext.setConf( -"spark.sql.TungstenAggregate.testFallbackStartsAt", -fallbackStartsAt.toString) - - // Create a new df to make sure its physical operator picks up - // spark.sql.TungstenAggregate.testFallbackStartsAt. - // todo: remove it? - val newActual = DataFrame(sqlContext, actual.logicalPlan) - - QueryTest.checkAnswer(newActual, expectedAnswer) match { -case Some(errorMessage) => - val newErrorMessage = -s""" - |The following aggregation query failed when using TungstenAggregate with - |controlled fallback (it falls back to sort-based aggregation once it has processed - |$fallbackStartsAt input rows). The query is - |${actual.queryExecution} - | - |$errorMessage -""".stripMargin - - fail(newErrorMessage) -case None => + withSQLConf("spark.sql.TungstenAggregate.testFallbackStartsAt" -> fallbackStartsAt.toString) { +// Create a new df to make sure its physical operator picks up +// spark.sql.TungstenAggregate.testFallbackStartsAt. +// todo: remove it? +val newActual = DataFrame(sqlContext, actual.logicalPlan) + +QueryTest.checkAnswer(newActual, expectedAnswer) match { + case Some(errorMessage) => +val newErrorMessage = + s""" +|The following aggregation query failed when using TungstenAggregate with +|controlled fallback (it falls back to sort-based aggregation once it has processed +|$fallbackStartsAt input rows). 
The query is +|${actual.queryExecution} +| +|$errorMessage + """.stripMargin + +fail(newErrorMessage) + case None => +} } } }
spark git commit: [SPARK-11877] Prevent agg. fallback conf. from leaking across test suites
Repository: spark Updated Branches: refs/heads/master 3e1d120ce -> a66142dec [SPARK-11877] Prevent agg. fallback conf. from leaking across test suites This patch fixes an issue where the `spark.sql.TungstenAggregate.testFallbackStartsAt` SQLConf setting was not properly reset / cleared at the end of `TungstenAggregationQueryWithControlledFallbackSuite`. This ended up causing test failures in HiveCompatibilitySuite in Maven builds by causing spilling to occur way too frequently. This configuration leak was inadvertently introduced during test cleanup in #9618. Author: Josh RosenCloses #9857 from JoshRosen/clear-fallback-prop-in-test-teardown. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a66142de Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a66142de Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a66142de Branch: refs/heads/master Commit: a66142decee48bf5689fb7f4f33646d7bb1ac08d Parents: 3e1d120 Author: Josh Rosen Authored: Fri Nov 20 00:46:29 2015 -0800 Committer: Josh Rosen Committed: Fri Nov 20 00:46:29 2015 -0800 -- .../hive/execution/AggregationQuerySuite.scala | 44 ++-- 1 file changed, 21 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a66142de/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala -- diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala index 6dde79f..39c0a2a 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala @@ -868,29 +868,27 @@ class TungstenAggregationQueryWithControlledFallbackSuite extends AggregationQue override protected def checkAnswer(actual: => DataFrame, expectedAnswer: Seq[Row]): Unit = { (0 to 2).foreach { fallbackStartsAt => - sqlContext.setConf( -"spark.sql.TungstenAggregate.testFallbackStartsAt", -fallbackStartsAt.toString) - - // Create a new df to make sure its physical operator picks up - // spark.sql.TungstenAggregate.testFallbackStartsAt. - // todo: remove it? - val newActual = DataFrame(sqlContext, actual.logicalPlan) - - QueryTest.checkAnswer(newActual, expectedAnswer) match { -case Some(errorMessage) => - val newErrorMessage = -s""" - |The following aggregation query failed when using TungstenAggregate with - |controlled fallback (it falls back to sort-based aggregation once it has processed - |$fallbackStartsAt input rows). The query is - |${actual.queryExecution} - | - |$errorMessage -""".stripMargin - - fail(newErrorMessage) -case None => + withSQLConf("spark.sql.TungstenAggregate.testFallbackStartsAt" -> fallbackStartsAt.toString) { +// Create a new df to make sure its physical operator picks up +// spark.sql.TungstenAggregate.testFallbackStartsAt. +// todo: remove it? +val newActual = DataFrame(sqlContext, actual.logicalPlan) + +QueryTest.checkAnswer(newActual, expectedAnswer) match { + case Some(errorMessage) => +val newErrorMessage = + s""" +|The following aggregation query failed when using TungstenAggregate with +|controlled fallback (it falls back to sort-based aggregation once it has processed +|$fallbackStartsAt input rows). 
The query is +|${actual.queryExecution} +| +|$errorMessage + """.stripMargin + +fail(newErrorMessage) + case None => +} } } }
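Both branches get the same fix: the manual `setConf` call, which nothing ever undid, is replaced by the scoped `withSQLConf` helper. The essential mechanism is a loan pattern that restores the previous value in a `finally` block; a self-contained sketch (`conf` here is a plain map standing in for SQLConf):

```
import scala.collection.mutable

object WithConfSketch {
  private val conf = mutable.Map.empty[String, String] // stand-in for SQLConf

  // Set `key` for the duration of `body`, then restore the prior state even
  // if `body` throws -- so a failing test can no longer leak the setting
  // into suites that run afterwards.
  def withConf[T](key: String, value: String)(body: => T): T = {
    val previous = conf.get(key)
    conf(key) = value
    try body
    finally previous match {
      case Some(v) => conf(key) = v // restore the old value
      case None    => conf -= key   // or clear the key entirely
    }
  }
}
```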
spark git commit: Revert "[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml"
Repository: spark Updated Branches: refs/heads/master 47815878a -> a2dce22e0 Revert "[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml" This reverts commit e359d5dcf5bd300213054ebeae9fe75c4f7eb9e7. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a2dce22e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a2dce22e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a2dce22e Branch: refs/heads/master Commit: a2dce22e0a25922e2052318d32f32877b7c27ec2 Parents: 4781587 Author: Xiangrui MengAuthored: Fri Nov 20 16:51:47 2015 -0800 Committer: Xiangrui Meng Committed: Fri Nov 20 16:51:47 2015 -0800 -- docs/ml-clustering.md | 30 --- docs/ml-guide.md| 3 +- docs/mllib-guide.md | 1 - .../spark/examples/ml/JavaLDAExample.java | 94 .../apache/spark/examples/ml/LDAExample.scala | 77 5 files changed, 1 insertion(+), 204 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a2dce22e/docs/ml-clustering.md -- diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md deleted file mode 100644 index 1743ef4..000 --- a/docs/ml-clustering.md +++ /dev/null @@ -1,30 +0,0 @@ -layout: global -title: Clustering - ML -displayTitle: ML - Clustering - -In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html). - -## Latent Dirichlet allocation (LDA) - -`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, -and generates a `LDAModel` as the base models. Expert users may cast a `LDAModel` generated by -`EMLDAOptimizer` to a `DistributedLDAModel` if needed. - - - -Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details. - - -{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %} - - - - -Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details. - -{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %} - - - \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/a2dce22e/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index 6f35b30..be18a05 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -40,7 +40,6 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g., provide class probabilities, and linear models provide model summaries. * [Feature extraction, transformation, and selection](ml-features.html) -* [Clustering](ml-clustering.html) * [Decision Trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) @@ -951,4 +950,4 @@ model.transform(test) {% endhighlight %} - \ No newline at end of file + http://git-wip-us.apache.org/repos/asf/spark/blob/a2dce22e/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 54e35fc..91e50cc 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -69,7 +69,6 @@ We list major functionality from both below, with links to detailed guides. concepts. 
It also contains sections on using algorithms within the Pipelines API, for example: * [Feature extraction, transformation, and selection](ml-features.html) -* [Clustering](ml-clustering.html) * [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/a2dce22e/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java deleted file mode 100644 index b3a7d2e..000 --- a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java +++ /dev/null @@ -1,94 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the
spark git commit: Revert "[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml"
Repository: spark Updated Branches: refs/heads/branch-1.6 285e4017a -> 33d856df5 Revert "[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml" This reverts commit 92d3563fd0cf0c3f4fe037b404d172125b24cf2f. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/33d856df Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/33d856df Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/33d856df Branch: refs/heads/branch-1.6 Commit: 33d856df53689d7fd515a21ec4f34d1d5c74a958 Parents: 285e401 Author: Xiangrui MengAuthored: Fri Nov 20 16:52:20 2015 -0800 Committer: Xiangrui Meng Committed: Fri Nov 20 16:52:20 2015 -0800 -- docs/ml-clustering.md | 30 --- docs/ml-guide.md| 3 +- docs/mllib-guide.md | 1 - .../spark/examples/ml/JavaLDAExample.java | 94 .../apache/spark/examples/ml/LDAExample.scala | 77 5 files changed, 1 insertion(+), 204 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/33d856df/docs/ml-clustering.md -- diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md deleted file mode 100644 index 1743ef4..000 --- a/docs/ml-clustering.md +++ /dev/null @@ -1,30 +0,0 @@ -layout: global -title: Clustering - ML -displayTitle: ML - Clustering - -In this section, we introduce the pipeline API for [clustering in mllib](mllib-clustering.html). - -## Latent Dirichlet allocation (LDA) - -`LDA` is implemented as an `Estimator` that supports both `EMLDAOptimizer` and `OnlineLDAOptimizer`, -and generates a `LDAModel` as the base models. Expert users may cast a `LDAModel` generated by -`EMLDAOptimizer` to a `DistributedLDAModel` if needed. - - - -Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.LDA) for more details. - - -{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %} - - - - -Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/LDA.html) for more details. - -{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %} - - - \ No newline at end of file http://git-wip-us.apache.org/repos/asf/spark/blob/33d856df/docs/ml-guide.md -- diff --git a/docs/ml-guide.md b/docs/ml-guide.md index 6f35b30..be18a05 100644 --- a/docs/ml-guide.md +++ b/docs/ml-guide.md @@ -40,7 +40,6 @@ Also, some algorithms have additional capabilities in the `spark.ml` API; e.g., provide class probabilities, and linear models provide model summaries. * [Feature extraction, transformation, and selection](ml-features.html) -* [Clustering](ml-clustering.html) * [Decision Trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) @@ -951,4 +950,4 @@ model.transform(test) {% endhighlight %} - \ No newline at end of file + http://git-wip-us.apache.org/repos/asf/spark/blob/33d856df/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 54e35fc..91e50cc 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -69,7 +69,6 @@ We list major functionality from both below, with links to detailed guides. concepts. 
It also contains sections on using algorithms within the Pipelines API, for example: * [Feature extraction, transformation, and selection](ml-features.html) -* [Clustering](ml-clustering.html) * [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) http://git-wip-us.apache.org/repos/asf/spark/blob/33d856df/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java -- diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java deleted file mode 100644 index b3a7d2e..000 --- a/examples/src/main/java/org/apache/spark/examples/ml/JavaLDAExample.java +++ /dev/null @@ -1,94 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with
spark git commit: [SPARK-11716][SQL] UDFRegistration just drops the input type when re-creating the UserDefinedFunction
Repository: spark Updated Branches: refs/heads/branch-1.6 fbe6888cc -> b9b0e1747 [SPARK-11716][SQL] UDFRegistration just drops the input type when re-creating the UserDefinedFunction https://issues.apache.org/jira/browse/SPARK-11716 This is one is #9739 and a regression test. When commit it, please make sure the author is jbonofre. You can find the original PR at https://github.com/apache/spark/pull/9739 closes #9739 Author: Jean-Baptiste OnofréAuthor: Yin Huai Closes #9868 from yhuai/SPARK-11716. (cherry picked from commit 03ba56d78f50747710d01c27d409ba2be42ae557) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b9b0e174 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b9b0e174 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9b0e174 Branch: refs/heads/branch-1.6 Commit: b9b0e17473e98d3d19b88abaf5ffcfdd6a2a2ea8 Parents: fbe6888 Author: Jean-Baptiste Onofré Authored: Fri Nov 20 14:45:40 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 14:45:55 2015 -0800 -- .../org/apache/spark/sql/UDFRegistration.scala | 48 ++-- .../scala/org/apache/spark/sql/UDFSuite.scala | 15 ++ 2 files changed, 39 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b9b0e174/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala b/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala index fc4d093..051694c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala @@ -88,7 +88,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try($inputTypes).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) - UserDefinedFunction(func, dataType) + UserDefinedFunction(func, dataType, inputTypes) }""") } @@ -120,7 +120,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -133,7 +133,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -146,7 +146,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -159,7 +159,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: ScalaReflection.schemaFor[A3].dataType :: Nil).getOrElse(Nil) def builder(e: 
Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -172,7 +172,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: ScalaReflection.schemaFor[A3].dataType :: ScalaReflection.schemaFor[A4].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -185,7 +185,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes
spark git commit: [SPARK-11716][SQL] UDFRegistration just drops the input type when re-creating the UserDefinedFunction
Repository: spark Updated Branches: refs/heads/master 89fd9bd06 -> 03ba56d78 [SPARK-11716][SQL] UDFRegistration just drops the input type when re-creating the UserDefinedFunction https://issues.apache.org/jira/browse/SPARK-11716 This is one is #9739 and a regression test. When commit it, please make sure the author is jbonofre. You can find the original PR at https://github.com/apache/spark/pull/9739 closes #9739 Author: Jean-Baptiste OnofréAuthor: Yin Huai Closes #9868 from yhuai/SPARK-11716. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/03ba56d7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/03ba56d7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/03ba56d7 Branch: refs/heads/master Commit: 03ba56d78f50747710d01c27d409ba2be42ae557 Parents: 89fd9bd Author: Jean-Baptiste Onofré Authored: Fri Nov 20 14:45:40 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 14:45:40 2015 -0800 -- .../org/apache/spark/sql/UDFRegistration.scala | 48 ++-- .../scala/org/apache/spark/sql/UDFSuite.scala | 15 ++ 2 files changed, 39 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/03ba56d7/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala b/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala index fc4d093..051694c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala @@ -88,7 +88,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try($inputTypes).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) - UserDefinedFunction(func, dataType) + UserDefinedFunction(func, dataType, inputTypes) }""") } @@ -120,7 +120,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -133,7 +133,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -146,7 +146,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -159,7 +159,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: ScalaReflection.schemaFor[A3].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) 
-UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -172,7 +172,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: ScalaReflection.schemaFor[A3].dataType :: ScalaReflection.schemaFor[A4].dataType :: Nil).getOrElse(Nil) def builder(e: Seq[Expression]) = ScalaUDF(func, dataType, e, inputTypes) functionRegistry.registerFunction(name, builder) -UserDefinedFunction(func, dataType) +UserDefinedFunction(func, dataType, inputTypes) } /** @@ -185,7 +185,7 @@ class UDFRegistration private[sql] (sqlContext: SQLContext) extends Logging { val inputTypes = Try(ScalaReflection.schemaFor[A1].dataType :: ScalaReflection.schemaFor[A2].dataType :: ScalaReflection.schemaFor[A3].dataType ::
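The mechanical change is the same at every arity: the `UserDefinedFunction` handed back to the caller is rebuilt with the `inputTypes` that were already computed for the registry builder, instead of dropping them. A reduced sketch of the bug's shape (`SketchUDF` and `SketchRegistry` are our stand-ins, not Spark classes):

```
import org.apache.spark.sql.types.DataType

case class SketchUDF(f: AnyRef, dataType: DataType, inputTypes: Seq[DataType])

object SketchRegistry {
  def register(f: AnyRef, dataType: DataType, inputTypes: Seq[DataType]): SketchUDF = {
    // ... the builder registered with the function registry uses `inputTypes` ...
    // before the fix: SketchUDF(f, dataType, Nil)  <- input types silently dropped
    SketchUDF(f, dataType, inputTypes)              // after: types threaded through
  }
}
```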
spark git commit: [SPARK-11756][SPARKR] Fix use of aliases - SparkR cannot output help information for SparkR:::summary correctly
Repository: spark Updated Branches: refs/heads/master 03ba56d78 -> a6239d587 [SPARK-11756][SPARKR] Fix use of aliases - SparkR can not output help information for SparkR:::summary correctly Fix use of aliases and changes uses of rdname and seealso `aliases` is the hint for `?` - it should not be linked to some other name - those should be seealso https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html Clean up usage on family, as multiple use of family with the same rdname is causing duplicated See Also html blocks (like http://spark.apache.org/docs/latest/api/R/count.html) Also changing some rdname for dplyr-like variant for better R user visibility in R doc, eg. rbind, summary, mutate, summarize shivaram yanboliang Author: felixcheungCloses #9750 from felixcheung/rdocaliases. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6239d58 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a6239d58 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a6239d58 Branch: refs/heads/master Commit: a6239d587c638691f52eca3eee905c53fbf35a12 Parents: 03ba56d Author: felixcheung Authored: Fri Nov 20 15:10:55 2015 -0800 Committer: Shivaram Venkataraman Committed: Fri Nov 20 15:10:55 2015 -0800 -- R/pkg/R/DataFrame.R | 96 R/pkg/R/broadcast.R | 1 - R/pkg/R/generics.R | 12 +++--- R/pkg/R/group.R | 12 +++--- 4 files changed, 37 insertions(+), 84 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a6239d58/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 06b0108..8a13e7a 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -254,7 +254,6 @@ setMethod("dtypes", #' @family DataFrame functions #' @rdname columns #' @name columns -#' @aliases names #' @export #' @examples #'\dontrun{ @@ -272,7 +271,6 @@ setMethod("columns", }) }) -#' @family DataFrame functions #' @rdname columns #' @name names setMethod("names", @@ -281,7 +279,6 @@ setMethod("names", columns(x) }) -#' @family DataFrame functions #' @rdname columns #' @name names<- setMethod("names<-", @@ -533,14 +530,8 @@ setMethod("distinct", dataFrame(sdf) }) -#' @title Distinct rows in a DataFrame -# -#' @description Returns a new DataFrame containing distinct rows in this DataFrame -#' -#' @family DataFrame functions -#' @rdname unique +#' @rdname distinct #' @name unique -#' @aliases distinct setMethod("unique", signature(x = "DataFrame"), function(x) { @@ -557,7 +548,7 @@ setMethod("unique", #' #' @family DataFrame functions #' @rdname sample -#' @aliases sample_frac +#' @name sample #' @export #' @examples #'\dontrun{ @@ -579,7 +570,6 @@ setMethod("sample", dataFrame(sdf) }) -#' @family DataFrame functions #' @rdname sample #' @name sample_frac setMethod("sample_frac", @@ -589,16 +579,15 @@ setMethod("sample_frac", sample(x, withReplacement, fraction) }) -#' Count +#' nrow #' #' Returns the number of rows in a DataFrame #' #' @param x A SparkSQL DataFrame #' #' @family DataFrame functions -#' @rdname count +#' @rdname nrow #' @name count -#' @aliases nrow #' @export #' @examples #'\dontrun{ @@ -614,14 +603,8 @@ setMethod("count", callJMethod(x@sdf, "count") }) -#' @title Number of rows for a DataFrame -#' @description Returns number of rows in a DataFrames -#' #' @name nrow -#' -#' @family DataFrame functions #' @rdname nrow -#' @aliases count setMethod("nrow", signature(x = "DataFrame"), function(x) { @@ -870,7 +853,6 @@ setMethod("toRDD", #' @param x a DataFrame #' @return a GroupedData #' @seealso GroupedData 
-#' @aliases group_by #' @family DataFrame functions #' @rdname groupBy #' @name groupBy @@ -896,7 +878,6 @@ setMethod("groupBy", groupedData(sgd) }) -#' @family DataFrame functions #' @rdname groupBy #' @name group_by setMethod("group_by", @@ -913,7 +894,6 @@ setMethod("group_by", #' @family DataFrame functions #' @rdname agg #' @name agg -#' @aliases summarize #' @export setMethod("agg", signature(x = "DataFrame"), @@ -921,7 +901,6 @@ setMethod("agg", agg(groupBy(x), ...) }) -#' @family DataFrame functions #' @rdname agg #' @name summarize setMethod("summarize", @@ -1092,7 +1071,6 @@ setMethod("[", signature(x = "DataFrame", i = "Column"), #' @family DataFrame functions #' @rdname subset #' @name subset -#' @aliases [ #' @family subsetting
[2/2] spark git commit: [SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md using include_example
[SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md using include_example Author: Vikas NelamangalaCloses #9689 from vikasnp/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed47b1e6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed47b1e6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed47b1e6 Branch: refs/heads/master Commit: ed47b1e660b830e2d4fac8d6df93f634b260393c Parents: 4b84c72 Author: Vikas Nelamangala Authored: Fri Nov 20 15:18:41 2015 -0800 Committer: Xiangrui Meng Committed: Fri Nov 20 15:18:41 2015 -0800 -- docs/mllib-evaluation-metrics.md| 940 +-- .../JavaBinaryClassificationMetricsExample.java | 113 +++ ...aMultiLabelClassificationMetricsExample.java | 80 ++ ...aMulticlassClassificationMetricsExample.java | 97 ++ .../mllib/JavaRankingMetricsExample.java| 176 .../mllib/JavaRegressionMetricsExample.java | 91 ++ .../binary_classification_metrics_example.py| 55 ++ .../python/mllib/multi_class_metrics_example.py | 69 ++ .../python/mllib/multi_label_metrics_example.py | 61 ++ .../python/mllib/ranking_metrics_example.py | 55 ++ .../python/mllib/regression_metrics_example.py | 59 ++ .../BinaryClassificationMetricsExample.scala| 103 ++ .../mllib/MultiLabelMetricsExample.scala| 69 ++ .../mllib/MulticlassMetricsExample.scala| 99 ++ .../examples/mllib/RankingMetricsExample.scala | 110 +++ .../mllib/RegressionMetricsExample.scala| 67 ++ 16 files changed, 1319 insertions(+), 925 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ed47b1e6/docs/mllib-evaluation-metrics.md -- diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md index f73eff6..6924037 100644 --- a/docs/mllib-evaluation-metrics.md +++ b/docs/mllib-evaluation-metrics.md @@ -104,214 +104,21 @@ data, and evaluate the performance of the algorithm by several binary evaluation Refer to the [`LogisticRegressionWithLBFGS` Scala docs](api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS) and [`BinaryClassificationMetrics` Scala docs](api/scala/index.html#org.apache.spark.mllib.evaluation.BinaryClassificationMetrics) for details on the API. 
-{% highlight scala %} -import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS -import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics -import org.apache.spark.mllib.regression.LabeledPoint -import org.apache.spark.mllib.util.MLUtils - -// Load training data in LIBSVM format -val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_binary_classification_data.txt") - -// Split data into training (60%) and test (40%) -val Array(training, test) = data.randomSplit(Array(0.6, 0.4), seed = 11L) -training.cache() - -// Run training algorithm to build the model -val model = new LogisticRegressionWithLBFGS() - .setNumClasses(2) - .run(training) - -// Clear the prediction threshold so the model will return probabilities -model.clearThreshold - -// Compute raw scores on the test set -val predictionAndLabels = test.map { case LabeledPoint(label, features) => - val prediction = model.predict(features) - (prediction, label) -} - -// Instantiate metrics object -val metrics = new BinaryClassificationMetrics(predictionAndLabels) - -// Precision by threshold -val precision = metrics.precisionByThreshold -precision.foreach { case (t, p) => -println(s"Threshold: $t, Precision: $p") -} - -// Recall by threshold -val recall = metrics.recallByThreshold -recall.foreach { case (t, r) => -println(s"Threshold: $t, Recall: $r") -} - -// Precision-Recall Curve -val PRC = metrics.pr - -// F-measure -val f1Score = metrics.fMeasureByThreshold -f1Score.foreach { case (t, f) => -println(s"Threshold: $t, F-score: $f, Beta = 1") -} - -val beta = 0.5 -val fScore = metrics.fMeasureByThreshold(beta) -f1Score.foreach { case (t, f) => -println(s"Threshold: $t, F-score: $f, Beta = 0.5") -} - -// AUPRC -val auPRC = metrics.areaUnderPR -println("Area under precision-recall curve = " + auPRC) - -// Compute thresholds used in ROC and PR curves -val thresholds = precision.map(_._1) - -// ROC Curve -val roc = metrics.roc - -// AUROC -val auROC = metrics.areaUnderROC -println("Area under ROC = " + auROC) - -{% endhighlight %} +{% include_example scala/org/apache/spark/examples/mllib/BinaryClassificationMetricsExample.scala %} Refer to the [`LogisticRegressionModel` Java docs](api/java/org/apache/spark/mllib/classification/LogisticRegressionModel.html) and
[1/2] spark git commit: [SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md using include_example
Repository: spark Updated Branches: refs/heads/master 4b84c72df -> ed47b1e66 http://git-wip-us.apache.org/repos/asf/spark/blob/ed47b1e6/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala new file mode 100644 index 000..4503c15 --- /dev/null +++ b/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// scalastyle:off println +package org.apache.spark.examples.mllib + +// $example on$ +import org.apache.spark.mllib.evaluation.MultilabelMetrics +import org.apache.spark.rdd.RDD +// $example off$ +import org.apache.spark.{SparkContext, SparkConf} + +object MultiLabelMetricsExample { + def main(args: Array[String]): Unit = { +val conf = new SparkConf().setAppName("MultiLabelMetricsExample") +val sc = new SparkContext(conf) +// $example on$ +val scoreAndLabels: RDD[(Array[Double], Array[Double])] = sc.parallelize( + Seq((Array(0.0, 1.0), Array(0.0, 2.0)), +(Array(0.0, 2.0), Array(0.0, 1.0)), +(Array.empty[Double], Array(0.0)), +(Array(2.0), Array(2.0)), +(Array(2.0, 0.0), Array(2.0, 0.0)), +(Array(0.0, 1.0, 2.0), Array(0.0, 1.0)), +(Array(1.0), Array(1.0, 2.0))), 2) + +// Instantiate metrics object +val metrics = new MultilabelMetrics(scoreAndLabels) + +// Summary stats +println(s"Recall = ${metrics.recall}") +println(s"Precision = ${metrics.precision}") +println(s"F1 measure = ${metrics.f1Measure}") +println(s"Accuracy = ${metrics.accuracy}") + +// Individual label stats +metrics.labels.foreach(label => + println(s"Class $label precision = ${metrics.precision(label)}")) +metrics.labels.foreach(label => println(s"Class $label recall = ${metrics.recall(label)}")) +metrics.labels.foreach(label => println(s"Class $label F1-score = ${metrics.f1Measure(label)}")) + +// Micro stats +println(s"Micro recall = ${metrics.microRecall}") +println(s"Micro precision = ${metrics.microPrecision}") +println(s"Micro F1 measure = ${metrics.microF1Measure}") + +// Hamming loss +println(s"Hamming loss = ${metrics.hammingLoss}") + +// Subset accuracy +println(s"Subset accuracy = ${metrics.subsetAccuracy}") +// $example off$ + } +} +// scalastyle:on println http://git-wip-us.apache.org/repos/asf/spark/blob/ed47b1e6/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala new file mode 100644 index 000..0904449 --- /dev/null +++ 
b/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// scalastyle:off println +package org.apache.spark.examples.mllib + +// $example on$ +import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS +import
[1/2] spark git commit: [SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md using include_example
Repository: spark Updated Branches: refs/heads/branch-1.6 0665fb5ea -> 1dde97176 http://git-wip-us.apache.org/repos/asf/spark/blob/1dde9717/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala new file mode 100644 index 000..4503c15 --- /dev/null +++ b/examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// scalastyle:off println +package org.apache.spark.examples.mllib + +// $example on$ +import org.apache.spark.mllib.evaluation.MultilabelMetrics +import org.apache.spark.rdd.RDD +// $example off$ +import org.apache.spark.{SparkContext, SparkConf} + +object MultiLabelMetricsExample { + def main(args: Array[String]): Unit = { +val conf = new SparkConf().setAppName("MultiLabelMetricsExample") +val sc = new SparkContext(conf) +// $example on$ +val scoreAndLabels: RDD[(Array[Double], Array[Double])] = sc.parallelize( + Seq((Array(0.0, 1.0), Array(0.0, 2.0)), +(Array(0.0, 2.0), Array(0.0, 1.0)), +(Array.empty[Double], Array(0.0)), +(Array(2.0), Array(2.0)), +(Array(2.0, 0.0), Array(2.0, 0.0)), +(Array(0.0, 1.0, 2.0), Array(0.0, 1.0)), +(Array(1.0), Array(1.0, 2.0))), 2) + +// Instantiate metrics object +val metrics = new MultilabelMetrics(scoreAndLabels) + +// Summary stats +println(s"Recall = ${metrics.recall}") +println(s"Precision = ${metrics.precision}") +println(s"F1 measure = ${metrics.f1Measure}") +println(s"Accuracy = ${metrics.accuracy}") + +// Individual label stats +metrics.labels.foreach(label => + println(s"Class $label precision = ${metrics.precision(label)}")) +metrics.labels.foreach(label => println(s"Class $label recall = ${metrics.recall(label)}")) +metrics.labels.foreach(label => println(s"Class $label F1-score = ${metrics.f1Measure(label)}")) + +// Micro stats +println(s"Micro recall = ${metrics.microRecall}") +println(s"Micro precision = ${metrics.microPrecision}") +println(s"Micro F1 measure = ${metrics.microF1Measure}") + +// Hamming loss +println(s"Hamming loss = ${metrics.hammingLoss}") + +// Subset accuracy +println(s"Subset accuracy = ${metrics.subsetAccuracy}") +// $example off$ + } +} +// scalastyle:on println http://git-wip-us.apache.org/repos/asf/spark/blob/1dde9717/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala -- diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala new file mode 100644 index 000..0904449 --- /dev/null 
+++ b/examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// scalastyle:off println +package org.apache.spark.examples.mllib + +// $example on$ +import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS +import
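All of the new example files share one convention that makes `include_example` work: the guide pulls in only the region between the `$example on$` and `$example off$` marker comments, so context such as `SparkContext` setup never reaches the rendered page. A skeleton of that layout (the RDD contents are ours; the markers are exactly as in the diff):

```
import org.apache.spark.{SparkConf, SparkContext}

object DocExampleSkeleton {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("DocExampleSkeleton"))
    // $example on$
    val data = sc.parallelize(Seq(1.0, 2.0, 3.0))
    println(s"sum = ${data.sum()}") // only this region lands in the guide
    // $example off$
    sc.stop()
  }
}
```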
spark git commit: [SPARK-11889][SQL] Fix type inference for GroupedDataset.agg in REPL
Repository: spark Updated Branches: refs/heads/master 58b4e4f88 -> 968acf3bd [SPARK-11889][SQL] Fix type inference for GroupedDataset.agg in REPL In this PR I delete a method that breaks type inference for aggregators (only in the REPL) The error when this method is present is: ``` :38: error: missing parameter type for expanded function ((x$2) => x$2._2) ds.groupBy(_._1).agg(sum(_._2), sum(_._3)).collect() ``` Author: Michael ArmbrustCloses #9870 from marmbrus/dataset-repl-agg. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/968acf3b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/968acf3b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/968acf3b Branch: refs/heads/master Commit: 968acf3bd9a502fcad15df3e53e359695ae702cc Parents: 58b4e4f Author: Michael Armbrust Authored: Fri Nov 20 15:36:30 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 15:36:30 2015 -0800 -- .../scala/org/apache/spark/repl/ReplSuite.scala | 24 + .../org/apache/spark/sql/GroupedDataset.scala | 27 +++- .../org/apache/spark/sql/JavaDatasetSuite.java | 8 +++--- 3 files changed, 30 insertions(+), 29 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/968acf3b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala index 081aa03..cbcccb1 100644 --- a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala +++ b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -339,6 +339,30 @@ class ReplSuite extends SparkFunSuite { } } + test("Datasets agg type-inference") { +val output = runInterpreter("local", + """ +|import org.apache.spark.sql.functions._ +|import org.apache.spark.sql.Encoder +|import org.apache.spark.sql.expressions.Aggregator +|import org.apache.spark.sql.TypedColumn +|/** An `Aggregator` that adds up any numeric type returned by the given function. */ +|class SumOf[I, N : Numeric](f: I => N) extends Aggregator[I, N, N] with Serializable { +| val numeric = implicitly[Numeric[N]] +| override def zero: N = numeric.zero +| override def reduce(b: N, a: I): N = numeric.plus(b, f(a)) +| override def merge(b1: N,b2: N): N = numeric.plus(b1, b2) +| override def finish(reduction: N): N = reduction +|} +| +|def sum[I, N : Numeric : Encoder](f: I => N): TypedColumn[I, N] = new SumOf(f).toColumn +|val ds = Seq((1, 1, 2L), (1, 2, 3L), (1, 3, 4L), (2, 1, 5L)).toDS() +|ds.groupBy(_._1).agg(sum(_._2), sum(_._3)).collect() + """.stripMargin) +assertDoesNotContain("error:", output) +assertDoesNotContain("Exception", output) + } + test("collecting objects of class defined in repl") { val output = runInterpreter("local[2]", """ http://git-wip-us.apache.org/repos/asf/spark/blob/968acf3b/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala index 6de3dd6..263f049 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala @@ -146,31 +146,10 @@ class GroupedDataset[K, T] private[sql]( reduce(f.call _) } - /** - * Compute aggregates by specifying a series of aggregate columns, and return a [[DataFrame]]. 
- * We can call `as[T : Encoder]` to turn the returned [[DataFrame]] to [[Dataset]] again. - * - * The available aggregate methods are defined in [[org.apache.spark.sql.functions]]. - * - * {{{ - * // Selects the age of the oldest employee and the aggregate expense for each department - * - * // Scala: - * import org.apache.spark.sql.functions._ - * df.groupBy("department").agg(max("age"), sum("expense")) - * - * // Java: - * import static org.apache.spark.sql.functions.*; - * df.groupBy("department").agg(max("age"), sum("expense")); - * }}} - * - * We can also use `Aggregator.toColumn` to pass in typed aggregate functions. - * - * @since 1.6.0 - */ + // This is here to prevent us from adding overloads that would be ambiguous. @scala.annotation.varargs - def agg(expr: Column, exprs: Column*): DataFrame = -
spark git commit: [SPARK-11889][SQL] Fix type inference for GroupedDataset.agg in REPL
Repository: spark Updated Branches: refs/heads/branch-1.6 7437a7f5b -> 7e06d51d5 [SPARK-11889][SQL] Fix type inference for GroupedDataset.agg in REPL In this PR I delete a method that breaks type inference for aggregators (only in the REPL) The error when this method is present is: ``` :38: error: missing parameter type for expanded function ((x$2) => x$2._2) ds.groupBy(_._1).agg(sum(_._2), sum(_._3)).collect() ``` Author: Michael ArmbrustCloses #9870 from marmbrus/dataset-repl-agg. (cherry picked from commit 968acf3bd9a502fcad15df3e53e359695ae702cc) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7e06d51d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7e06d51d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7e06d51d Branch: refs/heads/branch-1.6 Commit: 7e06d51d5637d4f8e042a1a230ee48591d08236f Parents: 7437a7f Author: Michael Armbrust Authored: Fri Nov 20 15:36:30 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 15:36:39 2015 -0800 -- .../scala/org/apache/spark/repl/ReplSuite.scala | 24 + .../org/apache/spark/sql/GroupedDataset.scala | 27 +++- .../org/apache/spark/sql/JavaDatasetSuite.java | 8 +++--- 3 files changed, 30 insertions(+), 29 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7e06d51d/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala index 081aa03..cbcccb1 100644 --- a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala +++ b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -339,6 +339,30 @@ class ReplSuite extends SparkFunSuite { } } + test("Datasets agg type-inference") { +val output = runInterpreter("local", + """ +|import org.apache.spark.sql.functions._ +|import org.apache.spark.sql.Encoder +|import org.apache.spark.sql.expressions.Aggregator +|import org.apache.spark.sql.TypedColumn +|/** An `Aggregator` that adds up any numeric type returned by the given function. 
*/ +|class SumOf[I, N : Numeric](f: I => N) extends Aggregator[I, N, N] with Serializable { +| val numeric = implicitly[Numeric[N]] +| override def zero: N = numeric.zero +| override def reduce(b: N, a: I): N = numeric.plus(b, f(a)) +| override def merge(b1: N,b2: N): N = numeric.plus(b1, b2) +| override def finish(reduction: N): N = reduction +|} +| +|def sum[I, N : Numeric : Encoder](f: I => N): TypedColumn[I, N] = new SumOf(f).toColumn +|val ds = Seq((1, 1, 2L), (1, 2, 3L), (1, 3, 4L), (2, 1, 5L)).toDS() +|ds.groupBy(_._1).agg(sum(_._2), sum(_._3)).collect() + """.stripMargin) +assertDoesNotContain("error:", output) +assertDoesNotContain("Exception", output) + } + test("collecting objects of class defined in repl") { val output = runInterpreter("local[2]", """ http://git-wip-us.apache.org/repos/asf/spark/blob/7e06d51d/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala index 6de3dd6..263f049 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala @@ -146,31 +146,10 @@ class GroupedDataset[K, T] private[sql]( reduce(f.call _) } - /** - * Compute aggregates by specifying a series of aggregate columns, and return a [[DataFrame]]. - * We can call `as[T : Encoder]` to turn the returned [[DataFrame]] to [[Dataset]] again. - * - * The available aggregate methods are defined in [[org.apache.spark.sql.functions]]. - * - * {{{ - * // Selects the age of the oldest employee and the aggregate expense for each department - * - * // Scala: - * import org.apache.spark.sql.functions._ - * df.groupBy("department").agg(max("age"), sum("expense")) - * - * // Java: - * import static org.apache.spark.sql.functions.*; - * df.groupBy("department").agg(max("age"), sum("expense")); - * }}} - * - * We can also use `Aggregator.toColumn` to pass in typed aggregate functions. - * - * @since 1.6.0 - */ + // This is here to prevent us from adding
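The deleted method was an untyped `agg(Column, Column*)` overload; with it gone, overload resolution no longer has to choose between a typed and an untyped `agg` before it can type the lambda. A minimal, non-Spark illustration of that failure mode:

```
object AggOverloadSketch {
  def agg[T](f: ((Int, Long)) => T): T = f((1, 2L)) // typed variant, kept
  // def agg(cols: String*): Unit = ()              // competing overload, removed

  // With only one `agg` in scope the compiler knows the lambda's expected
  // type, so `_._2` infers as ((Int, Long)) => Long. Uncommenting the second
  // overload reproduces the "missing parameter type" error quoted above.
  val total: Long = agg(_._2)
}
```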
spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer
Repository: spark
Updated Branches: refs/heads/branch-1.6 9c8e17984 -> 0c23dd52d

[SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception happens, they just return None. This causes some weird NPEs and confuses people.

Author: Shixiong Zhu
Closes #9847 from zsxwing/pyspark-streaming-exception.

(cherry picked from commit be7a2cfd978143f6f265eca63e9e24f755bc9f22)
Signed-off-by: Tathagata Das

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0c23dd52
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0c23dd52
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0c23dd52

Branch: refs/heads/branch-1.6
Commit: 0c23dd52d64d4a3448fb7d21b0e40d13f885bcfa
Parents: 9c8e179
Author: Shixiong Zhu
Authored: Fri Nov 20 14:23:01 2015 -0800
Committer: Tathagata Das
Committed: Fri Nov 20 14:23:18 2015 -0800

--
 python/pyspark/streaming/tests.py | 16
 python/pyspark/streaming/util.py  |  3 +++
 2 files changed, 19 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/0c23dd52/python/pyspark/streaming/tests.py
--
diff --git a/python/pyspark/streaming/tests.py b/python/pyspark/streaming/tests.py
index 3403f6d..a0e0267 100644
--- a/python/pyspark/streaming/tests.py
+++ b/python/pyspark/streaming/tests.py
@@ -403,6 +403,22 @@ class BasicOperationTests(PySparkStreamingTestCase):
         expected = [[('k', v)] for v in expected]
         self._test_func(input, func, expected)
 
+    def test_failed_func(self):
+        input = [self.sc.parallelize([d], 1) for d in range(4)]
+        input_stream = self.ssc.queueStream(input)
+
+        def failed_func(i):
+            raise ValueError("failed")
+
+        input_stream.map(failed_func).pprint()
+        self.ssc.start()
+        try:
+            self.ssc.awaitTerminationOrTimeout(10)
+        except:
+            return
+
+        self.fail("a failed func should throw an error")
+
 
 class StreamingListenerTests(PySparkStreamingTestCase):

http://git-wip-us.apache.org/repos/asf/spark/blob/0c23dd52/python/pyspark/streaming/util.py
--
diff --git a/python/pyspark/streaming/util.py b/python/pyspark/streaming/util.py
index b20613b..767c732 100644
--- a/python/pyspark/streaming/util.py
+++ b/python/pyspark/streaming/util.py
@@ -64,6 +64,7 @@ class TransformFunction(object):
             return r._jrdd
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunction(%s)" % self.func
 
@@ -95,6 +96,7 @@ class TransformFunctionSerializer(object):
             return bytearray(self.serializer.dumps((func.func, func.deserializers)))
         except Exception:
             traceback.print_exc()
+            raise
 
     def loads(self, data):
         try:
@@ -102,6 +104,7 @@ class TransformFunctionSerializer(object):
             return TransformFunction(self.ctx, f, *deserializers)
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunctionSerializer(%s)" % self.serializer
spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer
Repository: spark
Updated Branches: refs/heads/branch-1.4 5118abb4e -> 94789f374

[SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception happens, they just return None. This causes some weird NPEs and confuses people.

Author: Shixiong Zhu
Closes #9847 from zsxwing/pyspark-streaming-exception.

(cherry picked from commit be7a2cfd978143f6f265eca63e9e24f755bc9f22)
Signed-off-by: Tathagata Das

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/94789f37
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/94789f37
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/94789f37

Branch: refs/heads/branch-1.4
Commit: 94789f37400ea534e051d1df19f3a567646979fd
Parents: 5118abb
Author: Shixiong Zhu
Authored: Fri Nov 20 14:23:01 2015 -0800
Committer: Tathagata Das
Committed: Fri Nov 20 14:23:57 2015 -0800

--
 python/pyspark/streaming/tests.py | 16
 python/pyspark/streaming/util.py  |  3 +++
 2 files changed, 19 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/94789f37/python/pyspark/streaming/tests.py
--
diff --git a/python/pyspark/streaming/tests.py b/python/pyspark/streaming/tests.py
index ab1cc3f..830fd9e 100644
--- a/python/pyspark/streaming/tests.py
+++ b/python/pyspark/streaming/tests.py
@@ -376,6 +376,22 @@ class BasicOperationTests(PySparkStreamingTestCase):
         expected = [[('k', v)] for v in expected]
         self._test_func(input, func, expected)
 
+    def test_failed_func(self):
+        input = [self.sc.parallelize([d], 1) for d in range(4)]
+        input_stream = self.ssc.queueStream(input)
+
+        def failed_func(i):
+            raise ValueError("failed")
+
+        input_stream.map(failed_func).pprint()
+        self.ssc.start()
+        try:
+            self.ssc.awaitTerminationOrTimeout(10)
+        except:
+            return
+
+        self.fail("a failed func should throw an error")
+
 
 class WindowFunctionTests(PySparkStreamingTestCase):

http://git-wip-us.apache.org/repos/asf/spark/blob/94789f37/python/pyspark/streaming/util.py
--
diff --git a/python/pyspark/streaming/util.py b/python/pyspark/streaming/util.py
index 34291f3..0a2773b 100644
--- a/python/pyspark/streaming/util.py
+++ b/python/pyspark/streaming/util.py
@@ -59,6 +59,7 @@ class TransformFunction(object):
             return r._jrdd
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunction(%s)" % self.func
 
@@ -90,6 +91,7 @@ class TransformFunctionSerializer(object):
             return bytearray(self.serializer.dumps((func.func, func.deserializers)))
         except Exception:
             traceback.print_exc()
+            raise
 
     def loads(self, data):
         try:
@@ -97,6 +99,7 @@ class TransformFunctionSerializer(object):
             return TransformFunction(self.ctx, f, *deserializers)
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunctionSerializer(%s)" % self.serializer
spark git commit: [SPARK-11887] Close PersistenceEngine at the end of PersistenceEngineSuite tests
Repository: spark Updated Branches: refs/heads/master be7a2cfd9 -> 89fd9bd06 [SPARK-11887] Close PersistenceEngine at the end of PersistenceEngineSuite tests In PersistenceEngineSuite, we do not call `close()` on the PersistenceEngine at the end of the test. For the ZooKeeperPersistenceEngine, this causes us to leak a ZooKeeper client, causing the logs of unrelated tests to be periodically spammed with connection error messages from that client: ``` 15/11/20 05:13:35.789 pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741) INFO ClientCnxn: Opening socket connection to server localhost/127.0.0.1:15741. Will not attempt to authenticate using SASL (unknown error) 15/11/20 05:13:35.790 pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741) WARN ClientCnxn: Session 0x15124ff48dd for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) ``` This patch fixes this by using a `finally` block. Author: Josh RosenCloses #9864 from JoshRosen/close-zookeeper-client-in-tests. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/89fd9bd0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/89fd9bd0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/89fd9bd0 Branch: refs/heads/master Commit: 89fd9bd06160fa89dedbf685bfe159ffe4a06ec6 Parents: be7a2cf Author: Josh Rosen Authored: Fri Nov 20 14:31:26 2015 -0800 Committer: Josh Rosen Committed: Fri Nov 20 14:31:26 2015 -0800 -- .../deploy/master/PersistenceEngineSuite.scala | 100 ++- 1 file changed, 52 insertions(+), 48 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/89fd9bd0/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala b/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala index 3477557..7a44728 100644 --- a/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala @@ -63,56 +63,60 @@ class PersistenceEngineSuite extends SparkFunSuite { conf: SparkConf, persistenceEngineCreator: Serializer => PersistenceEngine): Unit = { val serializer = new JavaSerializer(conf) val persistenceEngine = persistenceEngineCreator(serializer) -persistenceEngine.persist("test_1", "test_1_value") -assert(Seq("test_1_value") === persistenceEngine.read[String]("test_")) -persistenceEngine.persist("test_2", "test_2_value") -assert(Set("test_1_value", "test_2_value") === persistenceEngine.read[String]("test_").toSet) -persistenceEngine.unpersist("test_1") -assert(Seq("test_2_value") === persistenceEngine.read[String]("test_")) -persistenceEngine.unpersist("test_2") -assert(persistenceEngine.read[String]("test_").isEmpty) - -// Test deserializing objects that contain RpcEndpointRef -val testRpcEnv = RpcEnv.create("test", "localhost", 12345, conf, new SecurityManager(conf)) try { - // Create a real endpoint so that we can test RpcEndpointRef deserialization - val workerEndpoint = testRpcEnv.setupEndpoint("worker", new 
RpcEndpoint { -override val rpcEnv: RpcEnv = testRpcEnv - }) - - val workerToPersist = new WorkerInfo( -id = "test_worker", -host = "127.0.0.1", -port = 1, -cores = 0, -memory = 0, -endpoint = workerEndpoint, -webUiPort = 0, -publicAddress = "" - ) - - persistenceEngine.addWorker(workerToPersist) - - val (storedApps, storedDrivers, storedWorkers) = -persistenceEngine.readPersistedData(testRpcEnv) - - assert(storedApps.isEmpty) - assert(storedDrivers.isEmpty) - - // Check deserializing WorkerInfo - assert(storedWorkers.size == 1) - val recoveryWorkerInfo = storedWorkers.head - assert(workerToPersist.id === recoveryWorkerInfo.id) - assert(workerToPersist.host === recoveryWorkerInfo.host) - assert(workerToPersist.port === recoveryWorkerInfo.port) - assert(workerToPersist.cores === recoveryWorkerInfo.cores) -
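The quoted diff breaks off before the new `finally` block is visible, so here is the shape of the fix as a generic sketch (hypothetical helper name; in the suite the resource is the `PersistenceEngine` created just before the `try`):

```scala
// Loan pattern: guarantee cleanup even when the body throws.
def withResource[R <: AutoCloseable, T](open: => R)(body: R => T): T = {
  val resource = open
  try {
    body(resource)
  } finally {
    resource.close() // for ZooKeeperPersistenceEngine this releases the ZK client
  }
}
```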
spark git commit: [SPARK-11887] Close PersistenceEngine at the end of PersistenceEngineSuite tests
Repository: spark Updated Branches: refs/heads/branch-1.6 0c23dd52d -> fbe6888cc [SPARK-11887] Close PersistenceEngine at the end of PersistenceEngineSuite tests In PersistenceEngineSuite, we do not call `close()` on the PersistenceEngine at the end of the test. For the ZooKeeperPersistenceEngine, this causes us to leak a ZooKeeper client, causing the logs of unrelated tests to be periodically spammed with connection error messages from that client: ``` 15/11/20 05:13:35.789 pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741) INFO ClientCnxn: Opening socket connection to server localhost/127.0.0.1:15741. Will not attempt to authenticate using SASL (unknown error) 15/11/20 05:13:35.790 pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741) WARN ClientCnxn: Session 0x15124ff48dd for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) ``` This patch fixes this by using a `finally` block. Author: Josh RosenCloses #9864 from JoshRosen/close-zookeeper-client-in-tests. (cherry picked from commit 89fd9bd06160fa89dedbf685bfe159ffe4a06ec6) Signed-off-by: Josh Rosen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fbe6888c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fbe6888c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fbe6888c Branch: refs/heads/branch-1.6 Commit: fbe6888cc0c8a16531a4ba7ce5235b84474f1a7b Parents: 0c23dd5 Author: Josh Rosen Authored: Fri Nov 20 14:31:26 2015 -0800 Committer: Josh Rosen Committed: Fri Nov 20 14:31:38 2015 -0800 -- .../deploy/master/PersistenceEngineSuite.scala | 100 ++- 1 file changed, 52 insertions(+), 48 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fbe6888c/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala -- diff --git a/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala b/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala index 3477557..7a44728 100644 --- a/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/master/PersistenceEngineSuite.scala @@ -63,56 +63,60 @@ class PersistenceEngineSuite extends SparkFunSuite { conf: SparkConf, persistenceEngineCreator: Serializer => PersistenceEngine): Unit = { val serializer = new JavaSerializer(conf) val persistenceEngine = persistenceEngineCreator(serializer) -persistenceEngine.persist("test_1", "test_1_value") -assert(Seq("test_1_value") === persistenceEngine.read[String]("test_")) -persistenceEngine.persist("test_2", "test_2_value") -assert(Set("test_1_value", "test_2_value") === persistenceEngine.read[String]("test_").toSet) -persistenceEngine.unpersist("test_1") -assert(Seq("test_2_value") === persistenceEngine.read[String]("test_")) -persistenceEngine.unpersist("test_2") -assert(persistenceEngine.read[String]("test_").isEmpty) - -// Test deserializing objects that contain RpcEndpointRef -val testRpcEnv = RpcEnv.create("test", "localhost", 12345, conf, new SecurityManager(conf)) try { - // Create a real endpoint so that 
we can test RpcEndpointRef deserialization - val workerEndpoint = testRpcEnv.setupEndpoint("worker", new RpcEndpoint { -override val rpcEnv: RpcEnv = testRpcEnv - }) - - val workerToPersist = new WorkerInfo( -id = "test_worker", -host = "127.0.0.1", -port = 1, -cores = 0, -memory = 0, -endpoint = workerEndpoint, -webUiPort = 0, -publicAddress = "" - ) - - persistenceEngine.addWorker(workerToPersist) - - val (storedApps, storedDrivers, storedWorkers) = -persistenceEngine.readPersistedData(testRpcEnv) - - assert(storedApps.isEmpty) - assert(storedDrivers.isEmpty) - - // Check deserializing WorkerInfo - assert(storedWorkers.size == 1) - val recoveryWorkerInfo = storedWorkers.head - assert(workerToPersist.id === recoveryWorkerInfo.id) - assert(workerToPersist.host === recoveryWorkerInfo.host) -
spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer
Repository: spark
Updated Branches: refs/heads/branch-1.5 9a906c1c3 -> e9ae1fda9

[SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception happens, they just return None. This causes some weird NPEs and confuses people.

Author: Shixiong Zhu
Closes #9847 from zsxwing/pyspark-streaming-exception.

(cherry picked from commit be7a2cfd978143f6f265eca63e9e24f755bc9f22)
Signed-off-by: Tathagata Das

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e9ae1fda
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e9ae1fda
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e9ae1fda

Branch: refs/heads/branch-1.5
Commit: e9ae1fda9e6009cf95f9a98ba130297126155e06
Parents: 9a906c1
Author: Shixiong Zhu
Authored: Fri Nov 20 14:23:01 2015 -0800
Committer: Tathagata Das
Committed: Fri Nov 20 14:23:38 2015 -0800

--
 python/pyspark/streaming/tests.py | 16
 python/pyspark/streaming/util.py  |  3 +++
 2 files changed, 19 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/e9ae1fda/python/pyspark/streaming/tests.py
--
diff --git a/python/pyspark/streaming/tests.py b/python/pyspark/streaming/tests.py
index 41f94af..63a5fd0 100644
--- a/python/pyspark/streaming/tests.py
+++ b/python/pyspark/streaming/tests.py
@@ -391,6 +391,22 @@ class BasicOperationTests(PySparkStreamingTestCase):
         expected = [[('k', v)] for v in expected]
         self._test_func(input, func, expected)
 
+    def test_failed_func(self):
+        input = [self.sc.parallelize([d], 1) for d in range(4)]
+        input_stream = self.ssc.queueStream(input)
+
+        def failed_func(i):
+            raise ValueError("failed")
+
+        input_stream.map(failed_func).pprint()
+        self.ssc.start()
+        try:
+            self.ssc.awaitTerminationOrTimeout(10)
+        except:
+            return
+
+        self.fail("a failed func should throw an error")
+
 
 class WindowFunctionTests(PySparkStreamingTestCase):

http://git-wip-us.apache.org/repos/asf/spark/blob/e9ae1fda/python/pyspark/streaming/util.py
--
diff --git a/python/pyspark/streaming/util.py b/python/pyspark/streaming/util.py
index b20613b..767c732 100644
--- a/python/pyspark/streaming/util.py
+++ b/python/pyspark/streaming/util.py
@@ -64,6 +64,7 @@ class TransformFunction(object):
             return r._jrdd
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunction(%s)" % self.func
 
@@ -95,6 +96,7 @@ class TransformFunctionSerializer(object):
             return bytearray(self.serializer.dumps((func.func, func.deserializers)))
         except Exception:
             traceback.print_exc()
+            raise
 
     def loads(self, data):
         try:
@@ -102,6 +104,7 @@ class TransformFunctionSerializer(object):
             return TransformFunction(self.ctx, f, *deserializers)
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunctionSerializer(%s)" % self.serializer
spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer
Repository: spark
Updated Branches: refs/heads/master 9ed4ad426 -> be7a2cfd9

[SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception happens, they just return None. This causes some weird NPEs and confuses people.

Author: Shixiong Zhu
Closes #9847 from zsxwing/pyspark-streaming-exception.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/be7a2cfd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/be7a2cfd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/be7a2cfd

Branch: refs/heads/master
Commit: be7a2cfd978143f6f265eca63e9e24f755bc9f22
Parents: 9ed4ad4
Author: Shixiong Zhu
Authored: Fri Nov 20 14:23:01 2015 -0800
Committer: Tathagata Das
Committed: Fri Nov 20 14:23:01 2015 -0800

--
 python/pyspark/streaming/tests.py | 16
 python/pyspark/streaming/util.py  |  3 +++
 2 files changed, 19 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/be7a2cfd/python/pyspark/streaming/tests.py
--
diff --git a/python/pyspark/streaming/tests.py b/python/pyspark/streaming/tests.py
index 3403f6d..a0e0267 100644
--- a/python/pyspark/streaming/tests.py
+++ b/python/pyspark/streaming/tests.py
@@ -403,6 +403,22 @@ class BasicOperationTests(PySparkStreamingTestCase):
         expected = [[('k', v)] for v in expected]
         self._test_func(input, func, expected)
 
+    def test_failed_func(self):
+        input = [self.sc.parallelize([d], 1) for d in range(4)]
+        input_stream = self.ssc.queueStream(input)
+
+        def failed_func(i):
+            raise ValueError("failed")
+
+        input_stream.map(failed_func).pprint()
+        self.ssc.start()
+        try:
+            self.ssc.awaitTerminationOrTimeout(10)
+        except:
+            return
+
+        self.fail("a failed func should throw an error")
+
 
 class StreamingListenerTests(PySparkStreamingTestCase):

http://git-wip-us.apache.org/repos/asf/spark/blob/be7a2cfd/python/pyspark/streaming/util.py
--
diff --git a/python/pyspark/streaming/util.py b/python/pyspark/streaming/util.py
index b20613b..767c732 100644
--- a/python/pyspark/streaming/util.py
+++ b/python/pyspark/streaming/util.py
@@ -64,6 +64,7 @@ class TransformFunction(object):
             return r._jrdd
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunction(%s)" % self.func
 
@@ -95,6 +96,7 @@ class TransformFunctionSerializer(object):
             return bytearray(self.serializer.dumps((func.func, func.deserializers)))
         except Exception:
             traceback.print_exc()
+            raise
 
     def loads(self, data):
         try:
@@ -102,6 +104,7 @@ class TransformFunctionSerializer(object):
             return TransformFunction(self.ctx, f, *deserializers)
         except Exception:
             traceback.print_exc()
+            raise
 
     def __repr__(self):
         return "TransformFunctionSerializer(%s)" % self.serializer
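The essence of the fix is the three added `raise` statements: log the traceback, but still propagate the failure instead of silently returning None. The same guard pattern in Scala, as an illustrative sketch only (hypothetical names; the actual change is Python-side):

```scala
import scala.util.control.NonFatal

object RethrowSketch {
  private def transform(x: Int): Int =
    if (x < 0) throw new IllegalArgumentException("failed") else x

  def guardedTransform(x: Int): Int =
    try {
      transform(x)
    } catch {
      case NonFatal(e) =>
        e.printStackTrace() // keep the diagnostic output...
        throw e             // ...but re-throw instead of silently yielding null/None
    }
}
```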
spark git commit: [SPARK-11756][SPARKR] Fix use of aliases - SparkR can not output help information for SparkR:::summary correctly
Repository: spark Updated Branches: refs/heads/branch-1.6 b9b0e1747 -> 11a11f0ff [SPARK-11756][SPARKR] Fix use of aliases - SparkR can not output help information for SparkR:::summary correctly Fix use of aliases and changes uses of rdname and seealso `aliases` is the hint for `?` - it should not be linked to some other name - those should be seealso https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html Clean up usage on family, as multiple use of family with the same rdname is causing duplicated See Also html blocks (like http://spark.apache.org/docs/latest/api/R/count.html) Also changing some rdname for dplyr-like variant for better R user visibility in R doc, eg. rbind, summary, mutate, summarize shivaram yanboliang Author: felixcheungCloses #9750 from felixcheung/rdocaliases. (cherry picked from commit a6239d587c638691f52eca3eee905c53fbf35a12) Signed-off-by: Shivaram Venkataraman Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/11a11f0f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/11a11f0f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/11a11f0f Branch: refs/heads/branch-1.6 Commit: 11a11f0ffcb6d7f6478239cfa3fb5d95877cddab Parents: b9b0e17 Author: felixcheung Authored: Fri Nov 20 15:10:55 2015 -0800 Committer: Shivaram Venkataraman Committed: Fri Nov 20 15:11:08 2015 -0800 -- R/pkg/R/DataFrame.R | 96 R/pkg/R/broadcast.R | 1 - R/pkg/R/generics.R | 12 +++--- R/pkg/R/group.R | 12 +++--- 4 files changed, 37 insertions(+), 84 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/11a11f0f/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 06b0108..8a13e7a 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -254,7 +254,6 @@ setMethod("dtypes", #' @family DataFrame functions #' @rdname columns #' @name columns -#' @aliases names #' @export #' @examples #'\dontrun{ @@ -272,7 +271,6 @@ setMethod("columns", }) }) -#' @family DataFrame functions #' @rdname columns #' @name names setMethod("names", @@ -281,7 +279,6 @@ setMethod("names", columns(x) }) -#' @family DataFrame functions #' @rdname columns #' @name names<- setMethod("names<-", @@ -533,14 +530,8 @@ setMethod("distinct", dataFrame(sdf) }) -#' @title Distinct rows in a DataFrame -# -#' @description Returns a new DataFrame containing distinct rows in this DataFrame -#' -#' @family DataFrame functions -#' @rdname unique +#' @rdname distinct #' @name unique -#' @aliases distinct setMethod("unique", signature(x = "DataFrame"), function(x) { @@ -557,7 +548,7 @@ setMethod("unique", #' #' @family DataFrame functions #' @rdname sample -#' @aliases sample_frac +#' @name sample #' @export #' @examples #'\dontrun{ @@ -579,7 +570,6 @@ setMethod("sample", dataFrame(sdf) }) -#' @family DataFrame functions #' @rdname sample #' @name sample_frac setMethod("sample_frac", @@ -589,16 +579,15 @@ setMethod("sample_frac", sample(x, withReplacement, fraction) }) -#' Count +#' nrow #' #' Returns the number of rows in a DataFrame #' #' @param x A SparkSQL DataFrame #' #' @family DataFrame functions -#' @rdname count +#' @rdname nrow #' @name count -#' @aliases nrow #' @export #' @examples #'\dontrun{ @@ -614,14 +603,8 @@ setMethod("count", callJMethod(x@sdf, "count") }) -#' @title Number of rows for a DataFrame -#' @description Returns number of rows in a DataFrames -#' #' @name nrow -#' -#' @family DataFrame functions #' @rdname nrow -#' @aliases count setMethod("nrow", signature(x = "DataFrame"), function(x) 
{ @@ -870,7 +853,6 @@ setMethod("toRDD", #' @param x a DataFrame #' @return a GroupedData #' @seealso GroupedData -#' @aliases group_by #' @family DataFrame functions #' @rdname groupBy #' @name groupBy @@ -896,7 +878,6 @@ setMethod("groupBy", groupedData(sgd) }) -#' @family DataFrame functions #' @rdname groupBy #' @name group_by setMethod("group_by", @@ -913,7 +894,6 @@ setMethod("group_by", #' @family DataFrame functions #' @rdname agg #' @name agg -#' @aliases summarize #' @export setMethod("agg", signature(x = "DataFrame"), @@ -921,7 +901,6 @@ setMethod("agg", agg(groupBy(x), ...) }) -#' @family DataFrame functions #' @rdname agg #' @name summarize setMethod("summarize", @@ -1092,7 +1071,6 @@ setMethod("[",
spark git commit: [SPARK-11636][SQL] Support classes defined in the REPL with Encoders
Repository: spark Updated Branches: refs/heads/branch-1.6 11a11f0ff -> 0665fb5ea [SPARK-11636][SQL] Support classes defined in the REPL with Encoders #theScaryParts (i.e. changes to the repl, executor classloaders and codegen)... Author: Michael ArmbrustAuthor: Yin Huai Closes #9825 from marmbrus/dataset-replClasses2. (cherry picked from commit 4b84c72dfbb9ddb415fee35f69305b5d7b280891) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0665fb5e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0665fb5e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0665fb5e Branch: refs/heads/branch-1.6 Commit: 0665fb5eae931ee93e320da9fedcfd6649ed004e Parents: 11a11f0 Author: Michael Armbrust Authored: Fri Nov 20 15:17:17 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 15:17:31 2015 -0800 -- .../org/apache/spark/repl/SparkIMain.scala | 14 .../scala/org/apache/spark/repl/ReplSuite.scala | 24 .../apache/spark/repl/ExecutorClassLoader.scala | 8 ++- .../expressions/codegen/CodeGenerator.scala | 4 ++-- 4 files changed, 43 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0665fb5e/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala index 4ee605f..829b122 100644 --- a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala @@ -1221,10 +1221,16 @@ import org.apache.spark.annotation.DeveloperApi ) } - val preamble = """ -|class %s extends Serializable { -| %s%s%s - """.stripMargin.format(lineRep.readName, envLines.map(" " + _ + ";\n").mkString, importsPreamble, indentCode(toCompute)) + val preamble = s""" +|class ${lineRep.readName} extends Serializable { +| ${envLines.map(" " + _ + ";\n").mkString} +| $importsPreamble +| +| // If we need to construct any objects defined in the REPL on an executor we will need +| // to pass the outer scope to the appropriate encoder. 
+| org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this) +| ${indentCode(toCompute)} + """.stripMargin val postamble = importsTrailer + "\n}" + "\n" + "object " + lineRep.readName + " {\n" + " val INSTANCE = new " + lineRep.readName + "();\n" + http://git-wip-us.apache.org/repos/asf/spark/blob/0665fb5e/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala index 5674dcd..081aa03 100644 --- a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala +++ b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -262,6 +262,9 @@ class ReplSuite extends SparkFunSuite { |import sqlContext.implicits._ |case class TestCaseClass(value: Int) |sc.parallelize(1 to 10).map(x => TestCaseClass(x)).toDF().collect() +| +|// Test Dataset Serialization in the REPL +|Seq(TestCaseClass(1)).toDS().collect() """.stripMargin) assertDoesNotContain("error:", output) assertDoesNotContain("Exception", output) @@ -278,6 +281,27 @@ class ReplSuite extends SparkFunSuite { assertDoesNotContain("java.lang.ClassNotFoundException", output) } + test("Datasets and encoders") { +val output = runInterpreter("local", + """ +|import org.apache.spark.sql.functions._ +|import org.apache.spark.sql.Encoder +|import org.apache.spark.sql.expressions.Aggregator +|import org.apache.spark.sql.TypedColumn +|val simpleSum = new Aggregator[Int, Int, Int] with Serializable { +| def zero: Int = 0 // The initial value. +| def reduce(b: Int, a: Int) = b + a// Add an element to the running total +| def merge(b1: Int, b2: Int) = b1 + b2 // Merge intermediate values. +| def finish(b: Int) = b// Return the final result. +|}.toColumn +| +|val ds = Seq(1, 2, 3, 4).toDS() +|ds.select(simpleSum).collect + """.stripMargin) +
spark git commit: [SPARK-11636][SQL] Support classes defined in the REPL with Encoders
Repository: spark Updated Branches: refs/heads/master a6239d587 -> 4b84c72df [SPARK-11636][SQL] Support classes defined in the REPL with Encoders #theScaryParts (i.e. changes to the repl, executor classloaders and codegen)... Author: Michael ArmbrustAuthor: Yin Huai Closes #9825 from marmbrus/dataset-replClasses2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4b84c72d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4b84c72d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4b84c72d Branch: refs/heads/master Commit: 4b84c72dfbb9ddb415fee35f69305b5d7b280891 Parents: a6239d5 Author: Michael Armbrust Authored: Fri Nov 20 15:17:17 2015 -0800 Committer: Michael Armbrust Committed: Fri Nov 20 15:17:17 2015 -0800 -- .../org/apache/spark/repl/SparkIMain.scala | 14 .../scala/org/apache/spark/repl/ReplSuite.scala | 24 .../apache/spark/repl/ExecutorClassLoader.scala | 8 ++- .../expressions/codegen/CodeGenerator.scala | 4 ++-- 4 files changed, 43 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4b84c72d/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala -- diff --git a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala index 4ee605f..829b122 100644 --- a/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala +++ b/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkIMain.scala @@ -1221,10 +1221,16 @@ import org.apache.spark.annotation.DeveloperApi ) } - val preamble = """ -|class %s extends Serializable { -| %s%s%s - """.stripMargin.format(lineRep.readName, envLines.map(" " + _ + ";\n").mkString, importsPreamble, indentCode(toCompute)) + val preamble = s""" +|class ${lineRep.readName} extends Serializable { +| ${envLines.map(" " + _ + ";\n").mkString} +| $importsPreamble +| +| // If we need to construct any objects defined in the REPL on an executor we will need +| // to pass the outer scope to the appropriate encoder. 
+| org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this) +| ${indentCode(toCompute)} + """.stripMargin val postamble = importsTrailer + "\n}" + "\n" + "object " + lineRep.readName + " {\n" + " val INSTANCE = new " + lineRep.readName + "();\n" + http://git-wip-us.apache.org/repos/asf/spark/blob/4b84c72d/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala -- diff --git a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala index 5674dcd..081aa03 100644 --- a/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala +++ b/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala @@ -262,6 +262,9 @@ class ReplSuite extends SparkFunSuite { |import sqlContext.implicits._ |case class TestCaseClass(value: Int) |sc.parallelize(1 to 10).map(x => TestCaseClass(x)).toDF().collect() +| +|// Test Dataset Serialization in the REPL +|Seq(TestCaseClass(1)).toDS().collect() """.stripMargin) assertDoesNotContain("error:", output) assertDoesNotContain("Exception", output) @@ -278,6 +281,27 @@ class ReplSuite extends SparkFunSuite { assertDoesNotContain("java.lang.ClassNotFoundException", output) } + test("Datasets and encoders") { +val output = runInterpreter("local", + """ +|import org.apache.spark.sql.functions._ +|import org.apache.spark.sql.Encoder +|import org.apache.spark.sql.expressions.Aggregator +|import org.apache.spark.sql.TypedColumn +|val simpleSum = new Aggregator[Int, Int, Int] with Serializable { +| def zero: Int = 0 // The initial value. +| def reduce(b: Int, a: Int) = b + a// Add an element to the running total +| def merge(b1: Int, b2: Int) = b1 + b2 // Merge intermediate values. +| def finish(b: Int) = b// Return the final result. +|}.toColumn +| +|val ds = Seq(1, 2, 3, 4).toDS() +|ds.select(simpleSum).collect + """.stripMargin) +assertDoesNotContain("error:", output) +assertDoesNotContain("Exception", output) + } + test("SPARK-2632 importing a method from non serializable class
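To make the mechanism concrete: every line typed into the REPL is compiled inside a generated wrapper class, so user classes are inner classes whose constructors need the enclosing wrapper instance. A rough sketch of what the amended preamble amounts to (simplified, hypothetical names; the real wrapper is emitted by `SparkIMain` as shown in the diff above):

```scala
class ReplLineWrapper extends Serializable {
  // Registered so that encoders can look up this outer instance when they need
  // to construct REPL-defined (inner) classes on executors.
  org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)

  // ... the user's code for this REPL line is compiled in here ...
  case class TestCaseClass(value: Int)
}
```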
spark git commit: [SPARK-11787][SPARK-11883][SQL][FOLLOW-UP] Cleanup for this patch.
Repository: spark Updated Branches: refs/heads/master ed47b1e66 -> 58b4e4f88 [SPARK-11787][SPARK-11883][SQL][FOLLOW-UP] Cleanup for this patch. This mainly moves SqlNewHadoopRDD to the sql package. There is some state that is shared between core and I've left that in core. This allows some other associated minor cleanup. Author: Nong LiCloses #9845 from nongli/spark-11787. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/58b4e4f8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/58b4e4f8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/58b4e4f8 Branch: refs/heads/master Commit: 58b4e4f88a330135c4cec04a30d24ef91bc61d91 Parents: ed47b1e Author: Nong Li Authored: Fri Nov 20 15:30:53 2015 -0800 Committer: Reynold Xin Committed: Fri Nov 20 15:30:53 2015 -0800 -- .../scala/org/apache/spark/rdd/HadoopRDD.scala | 6 +- .../org/apache/spark/rdd/SqlNewHadoopRDD.scala | 317 --- .../apache/spark/rdd/SqlNewHadoopRDDState.scala | 41 +++ .../sql/catalyst/expressions/UnsafeRow.java | 59 +++- .../catalyst/expressions/InputFileName.scala| 6 +- .../parquet/UnsafeRowParquetRecordReader.java | 14 + .../scala/org/apache/spark/sql/SQLConf.scala| 5 + .../execution/datasources/SqlNewHadoopRDD.scala | 299 + .../datasources/parquet/ParquetRelation.scala | 2 +- .../parquet/ParquetFilterSuite.scala| 43 +-- .../datasources/parquet/ParquetIOSuite.scala| 19 ++ 11 files changed, 453 insertions(+), 358 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/58b4e4f8/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala index 7db5834..f37c95b 100644 --- a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala @@ -215,8 +215,8 @@ class HadoopRDD[K, V]( // Sets the thread local variable for the file's name split.inputSplit.value match { -case fs: FileSplit => SqlNewHadoopRDD.setInputFileName(fs.getPath.toString) -case _ => SqlNewHadoopRDD.unsetInputFileName() +case fs: FileSplit => SqlNewHadoopRDDState.setInputFileName(fs.getPath.toString) +case _ => SqlNewHadoopRDDState.unsetInputFileName() } // Find a function that will return the FileSystem bytes read by this thread. Do this before @@ -256,7 +256,7 @@ class HadoopRDD[K, V]( override def close() { if (reader != null) { - SqlNewHadoopRDD.unsetInputFileName() + SqlNewHadoopRDDState.unsetInputFileName() // Close the reader and release it. Note: it's very important that we don't close the // reader more than once, since that exposes us to MAPREDUCE-5918 when running against // Hadoop 1.x and older Hadoop 2.x releases. That bug can lead to non-deterministic http://git-wip-us.apache.org/repos/asf/spark/blob/58b4e4f8/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala deleted file mode 100644 index 4d17633..000 --- a/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala +++ /dev/null @@ -1,317 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. 
- * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.rdd - -import java.text.SimpleDateFormat -import java.util.Date - -import org.apache.hadoop.conf.{Configurable, Configuration} -import org.apache.hadoop.io.Writable -import org.apache.hadoop.mapreduce._ -import org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit} -import org.apache.spark.broadcast.Broadcast -import
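The new `SqlNewHadoopRDDState` (41 lines per the diffstat, truncated in the quoted diff) carries the state that stays in core. A simplified sketch of such a thread-local holder, assuming plain `String` values (the real class may differ in representation and defaults):

```scala
package org.apache.spark.rdd

// Thread-local bookkeeping shared between core's HadoopRDD and SQL's
// SqlNewHadoopRDD: records which input file the current task is reading.
object SqlNewHadoopRDDState {
  private[this] val inputFileName = new ThreadLocal[String] {
    override def initialValue(): String = ""
  }

  def getInputFileName(): String = inputFileName.get()

  def setInputFileName(file: String): Unit = inputFileName.set(file)

  def unsetInputFileName(): Unit = inputFileName.remove()
}
```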
spark git commit: [SPARK-11787][SPARK-11883][SQL][FOLLOW-UP] Cleanup for this patch.
Repository: spark Updated Branches: refs/heads/branch-1.6 1dde97176 -> 7437a7f5b [SPARK-11787][SPARK-11883][SQL][FOLLOW-UP] Cleanup for this patch. This mainly moves SqlNewHadoopRDD to the sql package. There is some state that is shared between core and I've left that in core. This allows some other associated minor cleanup. Author: Nong LiCloses #9845 from nongli/spark-11787. (cherry picked from commit 58b4e4f88a330135c4cec04a30d24ef91bc61d91) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7437a7f5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7437a7f5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7437a7f5 Branch: refs/heads/branch-1.6 Commit: 7437a7f5bd06fd304265ab4e708a97fcd8492839 Parents: 1dde971 Author: Nong Li Authored: Fri Nov 20 15:30:53 2015 -0800 Committer: Reynold Xin Committed: Fri Nov 20 15:30:59 2015 -0800 -- .../scala/org/apache/spark/rdd/HadoopRDD.scala | 6 +- .../org/apache/spark/rdd/SqlNewHadoopRDD.scala | 317 --- .../apache/spark/rdd/SqlNewHadoopRDDState.scala | 41 +++ .../sql/catalyst/expressions/UnsafeRow.java | 59 +++- .../catalyst/expressions/InputFileName.scala| 6 +- .../parquet/UnsafeRowParquetRecordReader.java | 14 + .../scala/org/apache/spark/sql/SQLConf.scala| 5 + .../execution/datasources/SqlNewHadoopRDD.scala | 299 + .../datasources/parquet/ParquetRelation.scala | 2 +- .../parquet/ParquetFilterSuite.scala| 43 +-- .../datasources/parquet/ParquetIOSuite.scala| 19 ++ 11 files changed, 453 insertions(+), 358 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7437a7f5/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala index 7db5834..f37c95b 100644 --- a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala @@ -215,8 +215,8 @@ class HadoopRDD[K, V]( // Sets the thread local variable for the file's name split.inputSplit.value match { -case fs: FileSplit => SqlNewHadoopRDD.setInputFileName(fs.getPath.toString) -case _ => SqlNewHadoopRDD.unsetInputFileName() +case fs: FileSplit => SqlNewHadoopRDDState.setInputFileName(fs.getPath.toString) +case _ => SqlNewHadoopRDDState.unsetInputFileName() } // Find a function that will return the FileSystem bytes read by this thread. Do this before @@ -256,7 +256,7 @@ class HadoopRDD[K, V]( override def close() { if (reader != null) { - SqlNewHadoopRDD.unsetInputFileName() + SqlNewHadoopRDDState.unsetInputFileName() // Close the reader and release it. Note: it's very important that we don't close the // reader more than once, since that exposes us to MAPREDUCE-5918 when running against // Hadoop 1.x and older Hadoop 2.x releases. That bug can lead to non-deterministic http://git-wip-us.apache.org/repos/asf/spark/blob/7437a7f5/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala deleted file mode 100644 index 4d17633..000 --- a/core/src/main/scala/org/apache/spark/rdd/SqlNewHadoopRDD.scala +++ /dev/null @@ -1,317 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. 
- * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.rdd - -import java.text.SimpleDateFormat -import java.util.Date - -import org.apache.hadoop.conf.{Configurable, Configuration} -import org.apache.hadoop.io.Writable -import org.apache.hadoop.mapreduce._ -import
[1/2] spark git commit: Preparing Spark release v1.6.0-preview1
Repository: spark Updated Branches: refs/heads/branch-1.6 e0bb4e09c -> d409afdbc Preparing Spark release v1.6.0-preview1 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7582425d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7582425d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7582425d Branch: refs/heads/branch-1.6 Commit: 7582425d193d46c3f14b666b551dd42ff54d7ad7 Parents: e0bb4e0 Author: Patrick WendellAuthored: Fri Nov 20 15:43:02 2015 -0800 Committer: Patrick Wendell Committed: Fri Nov 20 15:43:02 2015 -0800 -- assembly/pom.xml| 2 +- bagel/pom.xml | 2 +- core/pom.xml| 2 +- docker-integration-tests/pom.xml| 2 +- examples/pom.xml| 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml | 2 +- external/kafka-assembly/pom.xml | 2 +- external/kafka/pom.xml | 2 +- external/mqtt-assembly/pom.xml | 2 +- external/mqtt/pom.xml | 2 +- external/twitter/pom.xml| 2 +- external/zeromq/pom.xml | 2 +- extras/java8-tests/pom.xml | 2 +- extras/kinesis-asl-assembly/pom.xml | 2 +- extras/kinesis-asl/pom.xml | 2 +- extras/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml | 2 +- launcher/pom.xml| 2 +- mllib/pom.xml | 2 +- network/common/pom.xml | 2 +- network/shuffle/pom.xml | 2 +- network/yarn/pom.xml| 2 +- pom.xml | 2 +- repl/pom.xml| 2 +- sql/catalyst/pom.xml| 2 +- sql/core/pom.xml| 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml| 2 +- streaming/pom.xml | 2 +- tags/pom.xml| 2 +- tools/pom.xml | 2 +- unsafe/pom.xml | 2 +- yarn/pom.xml| 2 +- 35 files changed, 35 insertions(+), 35 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 4b60ee0..fbabaa5 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0-SNAPSHOT +1.6.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/bagel/pom.xml -- diff --git a/bagel/pom.xml b/bagel/pom.xml index 672e946..1b3e417 100644 --- a/bagel/pom.xml +++ b/bagel/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0-SNAPSHOT +1.6.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/core/pom.xml -- diff --git a/core/pom.xml b/core/pom.xml index 37e3f16..d32b93b 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0-SNAPSHOT +1.6.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/docker-integration-tests/pom.xml -- diff --git a/docker-integration-tests/pom.xml b/docker-integration-tests/pom.xml index dee0c4a..ee9de91 100644 --- a/docker-integration-tests/pom.xml +++ b/docker-integration-tests/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.10 -1.6.0-SNAPSHOT +1.6.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/examples/pom.xml -- diff --git a/examples/pom.xml b/examples/pom.xml index f5ab2a7..37b15bb 100644 --- a/examples/pom.xml +++ b/examples/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0-SNAPSHOT +1.6.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/external/flume-assembly/pom.xml -- diff --git a/external/flume-assembly/pom.xml b/external/flume-assembly/pom.xml index dceedcf..295455a 100644 --- a/external/flume-assembly/pom.xml +++ b/external/flume-assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0-SNAPSHOT +1.6.0 ../../pom.xml 
http://git-wip-us.apache.org/repos/asf/spark/blob/7582425d/external/flume-sink/pom.xml
Git Push Summary
Repository: spark Updated Tags: refs/tags/v1.6.0-preview1 [created] 7582425d1
[2/2] spark git commit: Preparing development version 1.6.0-SNAPSHOT
Preparing development version 1.6.0-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d409afdb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d409afdb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d409afdb Branch: refs/heads/branch-1.6 Commit: d409afdbceb40ea90b1d20656e8ce79bff2ab71f Parents: 7582425 Author: Patrick WendellAuthored: Fri Nov 20 15:43:08 2015 -0800 Committer: Patrick Wendell Committed: Fri Nov 20 15:43:08 2015 -0800 -- assembly/pom.xml| 2 +- bagel/pom.xml | 2 +- core/pom.xml| 2 +- docker-integration-tests/pom.xml| 2 +- examples/pom.xml| 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml | 2 +- external/kafka-assembly/pom.xml | 2 +- external/kafka/pom.xml | 2 +- external/mqtt-assembly/pom.xml | 2 +- external/mqtt/pom.xml | 2 +- external/twitter/pom.xml| 2 +- external/zeromq/pom.xml | 2 +- extras/java8-tests/pom.xml | 2 +- extras/kinesis-asl-assembly/pom.xml | 2 +- extras/kinesis-asl/pom.xml | 2 +- extras/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml | 2 +- launcher/pom.xml| 2 +- mllib/pom.xml | 2 +- network/common/pom.xml | 2 +- network/shuffle/pom.xml | 2 +- network/yarn/pom.xml| 2 +- pom.xml | 2 +- repl/pom.xml| 2 +- sql/catalyst/pom.xml| 2 +- sql/core/pom.xml| 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml| 2 +- streaming/pom.xml | 2 +- tags/pom.xml| 2 +- tools/pom.xml | 2 +- unsafe/pom.xml | 2 +- yarn/pom.xml| 2 +- 35 files changed, 35 insertions(+), 35 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index fbabaa5..4b60ee0 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0 +1.6.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/bagel/pom.xml -- diff --git a/bagel/pom.xml b/bagel/pom.xml index 1b3e417..672e946 100644 --- a/bagel/pom.xml +++ b/bagel/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0 +1.6.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/core/pom.xml -- diff --git a/core/pom.xml b/core/pom.xml index d32b93b..37e3f16 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0 +1.6.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/docker-integration-tests/pom.xml -- diff --git a/docker-integration-tests/pom.xml b/docker-integration-tests/pom.xml index ee9de91..dee0c4a 100644 --- a/docker-integration-tests/pom.xml +++ b/docker-integration-tests/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.10 -1.6.0 +1.6.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/examples/pom.xml -- diff --git a/examples/pom.xml b/examples/pom.xml index 37b15bb..f5ab2a7 100644 --- a/examples/pom.xml +++ b/examples/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0 +1.6.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/external/flume-assembly/pom.xml -- diff --git a/external/flume-assembly/pom.xml b/external/flume-assembly/pom.xml index 295455a..dceedcf 100644 --- a/external/flume-assembly/pom.xml +++ b/external/flume-assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.10 -1.6.0 +1.6.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/d409afdb/external/flume-sink/pom.xml -- diff --git 
a/external/flume-sink/pom.xml
spark git commit: [HOTFIX] Fix Java Dataset Tests
Repository: spark
Updated Branches: refs/heads/master 68ed04683 -> 47815878a

[HOTFIX] Fix Java Dataset Tests

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47815878
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/47815878
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/47815878

Branch: refs/heads/master
Commit: 47815878ad5e47e89bfbd57acb848be2ce67a4a5
Parents: 68ed046
Author: Michael Armbrust
Authored: Fri Nov 20 16:02:03 2015 -0800
Committer: Michael Armbrust
Committed: Fri Nov 20 16:03:14 2015 -0800

--
 .../test/java/test/org/apache/spark/sql/JavaDatasetSuite.java | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/47815878/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
--
diff --git a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
index f7249b8..f32374b 100644
--- a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
+++ b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
@@ -409,8 +409,8 @@ public class JavaDatasetSuite implements Serializable {
       .as(Encoders.tuple(Encoders.STRING(), Encoders.INT()));
     Assert.assertEquals(
       Arrays.asList(
-        new Tuple4<>("a", 3, 3L, 2L),
-        new Tuple4<>("b", 3, 3L, 1L)),
+        new Tuple2<>("a", 3),
+        new Tuple2<>("b", 3)),
       agged2.collectAsList());
   }
spark git commit: [SPARK-11819][SQL][FOLLOW-UP] fix scala 2.11 build
Repository: spark
Updated Branches: refs/heads/master a2dce22e0 -> 7d3f922c4

[SPARK-11819][SQL][FOLLOW-UP] fix scala 2.11 build

It seems Scala 2.11 does not support defining private methods in `trait xxx` and then using them in `object xxx extends xxx`.

Author: Wenchen Fan
Closes #9879 from cloud-fan/follow.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7d3f922c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7d3f922c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7d3f922c

Branch: refs/heads/master
Commit: 7d3f922c4ba76c4193f98234ae662065c39cdfb1
Parents: a2dce22
Author: Wenchen Fan
Authored: Fri Nov 20 23:31:19 2015 -0800
Committer: Reynold Xin
Committed: Fri Nov 20 23:31:19 2015 -0800

--
 .../scala/org/apache/spark/sql/catalyst/ScalaReflection.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/7d3f922c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
index 4a4a62e..476bece 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
@@ -670,14 +670,14 @@ trait ScalaReflection {
    * Unlike `schemaFor`, this method won't throw exception for un-supported type, it will return
    * `NullType` silently instead.
    */
-  protected def silentSchemaFor(tpe: `Type`): Schema = try {
+  def silentSchemaFor(tpe: `Type`): Schema = try {
     schemaFor(tpe)
   } catch {
     case _: UnsupportedOperationException => Schema(NullType, nullable = true)
   }
 
   /** Returns the full class name for a type. */
-  protected def getClassNameFromType(tpe: `Type`): String = {
+  def getClassNameFromType(tpe: `Type`): String = {
     tpe.erasure.typeSymbol.asClass.fullName
   }