[GitHub] spark pull request #22391: [SPARK-25371][SQL][BACKPORT-2.3] struct() should ...

2018-09-12 Thread mgaido91
Github user mgaido91 closed the pull request at:

https://github.com/apache/spark/pull/22391


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22391: [SPARK-25371][SQL][BACKPORT-2.3] struct() should ...

2018-09-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22391#discussion_r216935856
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -147,4 +147,12 @@ class VectorAssemblerSuite
   .filter(vectorUDF($"features") > 1)
   .count() == 1)
   }
+
+  test("SPARK-25371: VectorAssembler with empty inputCols") {
+    val inputDF = Seq(
+      (1, Vectors.dense(1.0, 2.0)), (2, Vectors.sparse(2, Array(1), Array(3.0)))).toDF("i", "v")
--- End diff --

Yes, because this suite has no `inputDF` shared across the whole class.





[GitHub] spark pull request #22391: [SPARK-25371][SQL][BACKPORT-2.3] struct() should ...

2018-09-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22391#discussion_r216935736
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -147,4 +147,12 @@ class VectorAssemblerSuite
   .filter(vectorUDF($"features") > 1)
   .count() == 1)
   }
+
+  test("SPARK-25371: VectorAssembler with empty inputCols") {
+    val inputDF = Seq(
+      (1, Vectors.dense(1.0, 2.0)), (2, Vectors.sparse(2, Array(1), Array(3.0)))).toDF("i", "v")
+    val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
+    val output = vectorAssembler.transform(inputDF)
+    assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
+  }
--- End diff --

Sure, will do. Thanks!





[GitHub] spark pull request #22391: [SPARK-25371][SQL][BACKPORT-2.3] struct() should ...

2018-09-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22391#discussion_r216793201
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -147,4 +147,12 @@ class VectorAssemblerSuite
   .filter(vectorUDF($"features") > 1)
   .count() == 1)
   }
+
+  test("SPARK-25371: VectorAssembler with empty inputCols") {
+    val inputDF = Seq(
+      (1, Vectors.dense(1.0, 2.0)), (2, Vectors.sparse(2, Array(1), Array(3.0)))).toDF("i", "v")
+    val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
+    val output = vectorAssembler.transform(inputDF)
+    assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
+  }
--- End diff --

Since `inputDF` is not important here, can we minimize the change as follows? This keeps the test closer to the original one.
```scala
  test("SPARK-25371: VectorAssembler with empty inputCols") {
    val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
    val output = vectorAssembler.transform(Seq(1).toDF("x"))
    assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
  }
```





[GitHub] spark pull request #22391: [SPARK-25371][SQL][BACKPORT-2.3] struct() should ...

2018-09-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22391#discussion_r216786657
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -147,4 +147,12 @@ class VectorAssemblerSuite
   .filter(vectorUDF($"features") > 1)
   .count() == 1)
   }
+
+  test("SPARK-25371: VectorAssembler with empty inputCols") {
+    val inputDF = Seq(
+      (1, Vectors.dense(1.0, 2.0)), (2, Vectors.sparse(2, Array(1), Array(3.0)))).toDF("i", "v")
--- End diff --

@mgaido91, is this the only difference from the original one?





[GitHub] spark pull request #22391: [SPARK-25371][SQL][BACKPORT-2.3] struct() should ...

2018-09-11 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/22391

[SPARK-25371][SQL][BACKPORT-2.3] struct() should allow being called with 0 args

## What changes were proposed in this pull request?

SPARK-21281 introduced a check requiring the inputs of `CreateStructLike` to be non-empty. As a result, `struct()`, which was previously valid, now throws an exception. This behavior change was introduced in 2.3.0. It may break users' applications on upgrade, and it causes `VectorAssembler` to fail when an empty `inputCols` is defined.

This PR removes that check, making `struct()` valid again.
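A minimal sketch of the restored behavior, assuming a Spark build that already contains the SPARK-25371 fix and a local master (`local[1]`). The object `EmptyStructSketch` and its `run` method are hypothetical names chosen for illustration. It selects `struct()` with no arguments and returns the column's data type together with the collected rows. On a build where the SPARK-21281 check is still present, the `select` call is expected to fail analysis instead, as described above.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.types.{DataType, StructType}

object EmptyStructSketch {
  /** Selects a zero-argument struct() and returns the column's data type and the collected rows. */
  def run(spark: SparkSession): (DataType, Array[Row]) = {
    // struct() with no columns resolves to functions.struct(cols: Column*) with an empty list;
    // the analyzer runs when the DataFrame is built, so a rejected call fails right here.
    val df = spark.range(1).select(struct().as("s"))
    (df.schema("s").dataType, df.collect())
  }

  def main(args: Array[String]): Unit = {
    // Local single-threaded session; master and app name are arbitrary for this sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-struct-sketch")
      .getOrCreate()
    try {
      val (dataType, rows) = run(spark)
      // With the check removed, the column should be a struct type with no fields.
      println(s"empty struct type: ${dataType == new StructType()}")
      println(s"rows: ${rows.mkString(", ")}")
    } finally {
      spark.stop()
    }
  }
}
```

Keeping the query in `run` separates it from session setup, so the same call can be reused from a test suite that already owns a shared session.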

## How was this patch tested?

added UT


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-25371_2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22391.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22391


commit 66b6bd5b7e1538d60915b62fa1155dcd86f3411a
Author: Marco Gaido 
Date:   2018-09-11T06:16:56Z

[SPARK-25371][SQL] struct() should allow being called with 0 args

SPARK-21281 introduced a check requiring the inputs of `CreateStructLike` to be non-empty. As a result, `struct()`, which was previously valid, now throws an exception. This behavior change was introduced in 2.3.0. It may break users' applications on upgrade, and it causes `VectorAssembler` to fail when an empty `inputCols` is defined.

This PR removes that check, making `struct()` valid again.

added UT

Closes #22373 from mgaido91/SPARK-25371.

Authored-by: Marco Gaido 
Signed-off-by: Wenchen Fan 



