Ah, I think this was supposed to be changed with SPARK-9062. Let me see about reopening SPARK-10835 and addressing it.
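
In the meantime, here is a rough, untested sketch of one possible workaround. The error shows Word2Vec requiring exactly ArrayType(StringType,true), while NGram declares its output as ArrayType(StringType, false). Assuming you apply the stages one at a time (as in the word2Vec.fit(grams) call below) rather than inside a single Pipeline, you could rebuild the DataFrame with the array column marked as containsNull = true before fitting. The names NullableArrayWorkaround, relaxArrayColumn and withNullableArray are made up for illustration and are not Spark APIs; `spark` is assumed to be your SparkSession.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

object NullableArrayWorkaround {
  // Hypothetical helper (not part of Spark): returns a copy of `schema` in which
  // the named array<string> column is declared with containsNull = true.
  // All other fields, and the column's own nullability and metadata, are kept.
  def relaxArrayColumn(schema: StructType, colName: String): StructType =
    StructType(schema.fields.map {
      case StructField(`colName`, ArrayType(StringType, _), nullable, metadata) =>
        StructField(colName, ArrayType(StringType, containsNull = true), nullable, metadata)
      case other => other
    })

  // Rebuilds the DataFrame over the same rows using the relaxed schema.
  def withNullableArray(spark: SparkSession, df: DataFrame, colName: String): DataFrame =
    spark.createDataFrame(df.rdd, relaxArrayColumn(df.schema, colName))
}

// Usage, assuming `grams` is the NGram output and `word2Vec` is configured as below:
//   val fixed = NullableArrayWorkaround.withNullableArray(spark, grams, "wordsGrams")
//   val model = word2Vec.fit(fixed)
```

This only changes the declared schema, not the data, and it round-trips through an RDD, so it is a stopgap rather than a fix.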

On Tue, Sep 20, 2016 at 3:24 PM, janardhan shetty <janardhan...@gmail.com> wrote:
> Is this a bug?
>
> On Sep 19, 2016 10:10 PM, "janardhan shetty" <janardhan...@gmail.com> wrote:
>>
>> Hi,
>>
>> I am hitting this issue.
>> https://issues.apache.org/jira/browse/SPARK-10835.
>>
>> Issue seems to be resolved but resurfacing in 2.0 ML. Any workaround is
>> appreciated ?
>>
>> Note:
>> Pipeline has Ngram before word2Vec.
>>
>> Error:
>> val word2Vec = new Word2Vec().setInputCol("wordsGrams").setOutputCol("features").setVectorSize(128).setMinCount(10)
>>
>> scala> word2Vec.fit(grams)
>> java.lang.IllegalArgumentException: requirement failed: Column wordsGrams
>> must be of type ArrayType(StringType,true) but was actually
>> ArrayType(StringType,false).
>>   at scala.Predef$.require(Predef.scala:224)
>>   at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
>>   at org.apache.spark.ml.feature.Word2VecBase$class.validateAndTransformSchema(Word2Vec.scala:111)
>>   at org.apache.spark.ml.feature.Word2Vec.validateAndTransformSchema(Word2Vec.scala:121)
>>   at org.apache.spark.ml.feature.Word2Vec.transformSchema(Word2Vec.scala:187)
>>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:70)
>>   at org.apache.spark.ml.feature.Word2Vec.fit(Word2Vec.scala:170)
>>
>>
>> Github code for Ngram:
>>
>>
>> override protected def validateInputType(inputType: DataType): Unit = {
>>   require(inputType.sameType(ArrayType(StringType)),
>>     s"Input type must be ArrayType(StringType) but got $inputType.")
>> }
>>
>> override protected def outputDataType: DataType = new ArrayType(StringType, false)
>> }
>>