Repository: spark
Updated Branches:
  refs/heads/master d9cf9c21f -> 4be360d4e


[SPARK-11902][ML] Unhandled case in VectorAssembler#transform

There is an unhandled case in the transform method of VectorAssembler when an
input column is not of one of the supported types: DoubleType, NumericType,
BooleanType, or VectorUDT.

So, if you try to transform a column of StringType you get a cryptic 
"scala.MatchError: StringType".

This PR fixes this by throwing a SparkException when the transform encounters
an unsupported column type.
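
The change can be illustrated with a minimal, dependency-free sketch. It is not
the actual VectorAssembler code: ColType, its case objects,
UnsupportedColumnException and AssemblerSketch are hypothetical stand-ins for
Spark's DataType hierarchy and SparkException, used only to show how a
catch-all case turns a scala.MatchError into a descriptive error.

```scala
// Hypothetical stand-ins for Spark SQL column types (not the real
// org.apache.spark.sql.types classes).
trait ColType
case object DoubleCol extends ColType
case object BooleanCol extends ColType
case object VectorCol extends ColType
case object StringCol extends ColType

// Hypothetical stand-in for org.apache.spark.SparkException.
final class UnsupportedColumnException(msg: String) extends Exception(msg)

object AssemblerSketch {
  // Before the fix: no default case, so an unsupported type such as
  // StringCol fails with scala.MatchError.
  def describeBefore(t: ColType): String = t match {
    case DoubleCol  => "numeric"
    case BooleanCol => "numeric"
    case VectorCol  => "vector"
  }

  // After the fix: a catch-all case reports the offending type by name.
  def describeAfter(t: ColType): String = t match {
    case DoubleCol  => "numeric"
    case BooleanCol => "numeric"
    case VectorCol  => "vector"
    case otherType =>
      throw new UnsupportedColumnException(
        s"VectorAssembler does not support the $otherType type")
  }
}
```

With these definitions, AssemblerSketch.describeBefore(StringCol) throws a
scala.MatchError, while AssemblerSketch.describeAfter(StringCol) throws the
custom exception with a message that names the rejected type.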

Author: BenFradet <benjamin.fra...@gmail.com>

Closes #9885 from BenFradet/SPARK-11902.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4be360d4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4be360d4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4be360d4

Branch: refs/heads/master
Commit: 4be360d4ee6cdb4d06306feca38ddef5212608cf
Parents: d9cf9c2
Author: BenFradet <benjamin.fra...@gmail.com>
Authored: Sun Nov 22 22:05:01 2015 -0800
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Sun Nov 22 22:05:01 2015 -0800

----------------------------------------------------------------------
 .../org/apache/spark/ml/feature/VectorAssembler.scala    |  2 ++
 .../apache/spark/ml/feature/VectorAssemblerSuite.scala   | 11 +++++++++++
 2 files changed, 13 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/4be360d4/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala
index 0feec05..801096f 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala
@@ -84,6 +84,8 @@ class VectorAssembler(override val uid: String)
             val numAttrs = group.numAttributes.getOrElse(first.getAs[Vector](index).size)
             Array.fill(numAttrs)(NumericAttribute.defaultAttr)
           }
+        case otherType =>
+          throw new SparkException(s"VectorAssembler does not support the $otherType type")
       }
     }
     val metadata = new AttributeGroup($(outputCol), attrs).toMetadata()

http://git-wip-us.apache.org/repos/asf/spark/blob/4be360d4/mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala
----------------------------------------------------------------------
diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala
index fb21ab6..9c1c00f 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala
@@ -69,6 +69,17 @@ class VectorAssemblerSuite
     }
   }
 
+  test("transform should throw an exception in case of unsupported type") {
+    val df = sqlContext.createDataFrame(Seq(("a", "b", "c"))).toDF("a", "b", "c")
+    val assembler = new VectorAssembler()
+      .setInputCols(Array("a", "b", "c"))
+      .setOutputCol("features")
+    val thrown = intercept[SparkException] {
+      assembler.transform(df)
+    }
+    assert(thrown.getMessage contains "VectorAssembler does not support the StringType type")
+  }
+
   test("ML attributes") {
     val browser = NominalAttribute.defaultAttr.withValues("chrome", "firefox", "safari")
     val hour = NumericAttribute.defaultAttr.withMin(0.0).withMax(24.0)

