Ah, I think that this was supposed to be changed with SPARK-9062. Let
me see about reopening SPARK-10835 and addressing it.
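
Until that is fixed, one possible workaround is to re-declare the NGram
output column as ArrayType(StringType, true) before handing it to Word2Vec.
What follows is an untested sketch, not an official fix:
relaxArrayNullability is a hypothetical helper name (it is not part of
Spark), and it assumes the SparkSession is available as `spark`, as it is
in spark-shell.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Hypothetical helper (not a Spark API): returns a DataFrame with the same rows
// as `df`, but with the named array<string> column declared containsNull = true.
def relaxArrayNullability(spark: SparkSession, df: DataFrame, colName: String): DataFrame = {
  val relaxedFields = df.schema.fields.map {
    // Only touch the requested column, and only if it is ArrayType(StringType, false).
    case StructField(name, ArrayType(StringType, false), nullable, metadata) if name == colName =>
      StructField(name, ArrayType(StringType, containsNull = true), nullable, metadata)
    case other => other
  }
  // Re-apply the relaxed schema to the existing rows; the data is not modified.
  spark.createDataFrame(df.rdd, StructType(relaxedFields))
}

// Usage, assuming `grams` and `word2Vec` as defined in the quoted message:
// val relaxed = relaxArrayNullability(spark, grams, "wordsGrams")
// val model = word2Vec.fit(relaxed)
```

Going through df.rdd costs an extra conversion, but the resulting schema
should satisfy the strict type check in Word2Vec's transformSchema.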

On Tue, Sep 20, 2016 at 3:24 PM, janardhan shetty
<janardhan...@gmail.com> wrote:
> Is this a bug?
>
> On Sep 19, 2016 10:10 PM, "janardhan shetty" <janardhan...@gmail.com> wrote:
>>
>> Hi,
>>
>> I am hitting this issue.
>> https://issues.apache.org/jira/browse/SPARK-10835.
>>
>> The issue is marked as resolved, but it seems to have resurfaced in 2.0 ML.
>> Any workaround would be appreciated.
>>
>> Note:
>> The pipeline has an NGram stage before Word2Vec.
>>
>> Error:
>> val word2Vec = new Word2Vec()
>>   .setInputCol("wordsGrams")
>>   .setOutputCol("features")
>>   .setVectorSize(128)
>>   .setMinCount(10)
>>
>> scala> word2Vec.fit(grams)
>> java.lang.IllegalArgumentException: requirement failed: Column wordsGrams
>> must be of type ArrayType(StringType,true) but was actually
>> ArrayType(StringType,false).
>>   at scala.Predef$.require(Predef.scala:224)
>>   at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
>>   at org.apache.spark.ml.feature.Word2VecBase$class.validateAndTransformSchema(Word2Vec.scala:111)
>>   at org.apache.spark.ml.feature.Word2Vec.validateAndTransformSchema(Word2Vec.scala:121)
>>   at org.apache.spark.ml.feature.Word2Vec.transformSchema(Word2Vec.scala:187)
>>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:70)
>>   at org.apache.spark.ml.feature.Word2Vec.fit(Word2Vec.scala:170)
>>
>>
>> GitHub code for NGram:
>>
>>   override protected def validateInputType(inputType: DataType): Unit = {
>>     require(inputType.sameType(ArrayType(StringType)),
>>       s"Input type must be ArrayType(StringType) but got $inputType.")
>>   }
>>
>>   override protected def outputDataType: DataType = new ArrayType(StringType, false)
>> }
>>
>

