[GitHub] spark pull request #22391: [SPARK-25371][SQL][BACKPORT-2.3] struct() should ...
Github user mgaido91 closed the pull request at:

    https://github.com/apache/spark/pull/22391

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22391: [SPARK-25371][SQL][BACKPORT-2.3] struct() should ...
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22391#discussion_r216935856

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
    @@ -147,4 +147,12 @@ class VectorAssemblerSuite
           .filter(vectorUDF($"features") > 1)
           .count() == 1)
       }
    +
    +  test("SPARK-25371: VectorAssembler with empty inputCols") {
    +    val inputDF = Seq(
    +      (1, Vectors.dense(1.0, 2.0)), (2, Vectors.sparse(2, Array(1), Array(3.0)))).toDF("i", "v")
    --- End diff --

    Yes, because this suite has no class-level inputDF to reuse.
[GitHub] spark pull request #22391: [SPARK-25371][SQL][BACKPORT-2.3] struct() should ...
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22391#discussion_r216935736

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
    @@ -147,4 +147,12 @@ class VectorAssemblerSuite
           .filter(vectorUDF($"features") > 1)
           .count() == 1)
       }
    +
    +  test("SPARK-25371: VectorAssembler with empty inputCols") {
    +    val inputDF = Seq(
    +      (1, Vectors.dense(1.0, 2.0)), (2, Vectors.sparse(2, Array(1), Array(3.0)))).toDF("i", "v")
    +    val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
    +    val output = vectorAssembler.transform(inputDF)
    +    assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
    +  }
    --- End diff --

    Sure, will do. Thanks.
[GitHub] spark pull request #22391: [SPARK-25371][SQL][BACKPORT-2.3] struct() should ...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22391#discussion_r216793201

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
    @@ -147,4 +147,12 @@ class VectorAssemblerSuite
           .filter(vectorUDF($"features") > 1)
           .count() == 1)
       }
    +
    +  test("SPARK-25371: VectorAssembler with empty inputCols") {
    +    val inputDF = Seq(
    +      (1, Vectors.dense(1.0, 2.0)), (2, Vectors.sparse(2, Array(1), Array(3.0)))).toDF("i", "v")
    +    val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
    +    val output = vectorAssembler.transform(inputDF)
    +    assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
    +  }
    --- End diff --

    Since `inputDF` is not important here, can we minimize the change as follows? That would look more similar to the original one.

    ```scala
    test("SPARK-25371: VectorAssembler with empty inputCols") {
      val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
      val output = vectorAssembler.transform(Seq(1).toDF("x"))
      assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
    }
    ```
[GitHub] spark pull request #22391: [SPARK-25371][SQL][BACKPORT-2.3] struct() should ...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22391#discussion_r216786657

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
    @@ -147,4 +147,12 @@ class VectorAssemblerSuite
           .filter(vectorUDF($"features") > 1)
           .count() == 1)
       }
    +
    +  test("SPARK-25371: VectorAssembler with empty inputCols") {
    +    val inputDF = Seq(
    +      (1, Vectors.dense(1.0, 2.0)), (2, Vectors.sparse(2, Array(1), Array(3.0)))).toDF("i", "v")
    --- End diff --

    @mgaido91, is this the only difference from the original one?
[GitHub] spark pull request #22391: [SPARK-25371][SQL][BACKPORT-2.3] struct() should ...
GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/22391

    [SPARK-25371][SQL][BACKPORT-2.3] struct() should allow being called with 0 args

    ## What changes were proposed in this pull request?

    SPARK-21281 introduced a check that the inputs of `CreateStructLike` are non-empty. As a result, `struct()`, which was previously considered valid, now throws an exception. This behavior change was introduced in 2.3.0. It may break users' applications on upgrade, and it causes `VectorAssembler` to fail when an empty `inputCols` is defined.

    This PR removes the added check, making `struct()` valid again.

    ## How was this patch tested?

    Added a unit test (UT).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-25371_2.3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22391.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22391

----
commit 66b6bd5b7e1538d60915b62fa1155dcd86f3411a
Author: Marco Gaido
Date:   2018-09-11T06:16:56Z

    [SPARK-25371][SQL] struct() should allow being called with 0 args

    SPARK-21281 introduced a check for the inputs of `CreateStructLike` to be non-empty. This means that `struct()`, which was previously considered valid, now throws an Exception. This behavior change was introduced in 2.3.0. The change may break users' application on upgrade and it causes `VectorAssembler` to fail when an empty `inputCols` is defined.

    The PR removes the added check making `struct()` valid again.

    added UT

    Closes #22373 from mgaido91/SPARK-25371.

    Authored-by: Marco Gaido
    Signed-off-by: Wenchen Fan

----
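For readers who want to reproduce the behavior described in the PR, here is a minimal sketch of a zero-argument `struct()` call. It assumes Spark SQL is on the classpath. The object name `EmptyStructSketch`, the helper `emptyStructFieldCount`, the app name, and the column names `x` and `s` are all hypothetical and chosen only for illustration. Per the PR description, the `select` below is expected to throw on releases that still carry the SPARK-21281 check and to succeed once that check is removed. The sketch does not verify either claim against a specific release.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.types.StructType

// Hypothetical helper object, not part of Spark or of this PR.
object EmptyStructSketch {

  // Builds a one-row DataFrame, selects struct() with zero columns,
  // and returns the number of fields in the resulting struct type.
  def emptyStructFieldCount(spark: SparkSession): Int = {
    import spark.implicits._

    val df = Seq(1).toDF("x")
    // struct() takes a varargs list of columns, so an empty call compiles.
    // Whether it passes analysis depends on the Spark version (see the PR).
    val out = df.select(struct().as("s"))

    out.schema("s").dataType match {
      case st: StructType => st.fields.length
      case other => sys.error(s"expected a struct type, got $other")
    }
  }

  def main(args: Array[String]): Unit = {
    // Local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-struct-sketch")
      .getOrCreate()
    try {
      println(s"fields in struct(): ${emptyStructFieldCount(spark)}")
    } finally {
      spark.stop()
    }
  }
}
```

The `VectorAssembler` failure mentioned in the description is covered by the unit test quoted in the review comments, which sets an empty `inputCols`.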