Re: SPARK-10835 in 2.0

Sean Owen Tue, 20 Sep 2016 08:07:52 -0700

You can probably work around this with an identity transformation on the
column, which makes its type a nullable String array: ArrayType(StringType, true).
I'm not sure why Word2Vec must reject an array type with non-nullable
elements when it can obviously handle a nullable one, but the previous
discussion indicated that this also has to do with how UDFs work.
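
A minimal sketch of that identity transformation, in case it helps. It
assumes Spark 2.0 running locally; the toy input, the object name, the UDF
name asNullable, and the small vectorSize/minCount values are made up for
illustration and are not from the original pipeline. The idea is that a
Scala UDF returning Seq[String] gets its result type inferred as
ArrayType(StringType, true), so passing the column through it unchanged
should relax the element nullability that NGram declares.

```scala
import org.apache.spark.ml.feature.{NGram, Word2Vec}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object NullableArrayWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nullable-array-workaround")
      .getOrCreate()
    import spark.implicits._

    // Toy input (hypothetical): one array-of-strings column named "words".
    val words = Seq(
      Seq("a", "b", "c", "d"),
      Seq("b", "c", "d", "e")
    ).toDF("words")

    // NGram declares its output column as ArrayType(StringType, false).
    val grams = new NGram()
      .setN(2)
      .setInputCol("words")
      .setOutputCol("wordsGrams")
      .transform(words)

    // Identity UDF: returns its input unchanged. The inferred return type
    // for Seq[String] is an array whose elements are nullable.
    val asNullable = udf((xs: Seq[String]) => xs)
    val fixed = grams.withColumn("wordsGrams", asNullable(col("wordsGrams")))

    // The schema for wordsGrams should now show containsNull = true.
    fixed.printSchema()

    // Small settings so the toy vocabulary is not filtered out entirely.
    val word2Vec = new Word2Vec()
      .setInputCol("wordsGrams")
      .setOutputCol("features")
      .setVectorSize(8)
      .setMinCount(1)

    val model = word2Vec.fit(fixed)
    model.transform(fixed).show(truncate = false)

    spark.stop()
  }
}
```

In the spark-shell, only the two asNullable/fixed lines should be needed,
applied to the existing grams DataFrame before calling word2Vec.fit.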


On Tue, Sep 20, 2016 at 4:03 PM, janardhan shetty
<janardhan...@gmail.com> wrote:
> Hi Sean,
>
> Any suggestions for a workaround as of now?
>
> On Sep 20, 2016 7:46 AM, "janardhan shetty" <janardhan...@gmail.com> wrote:
>>
>> Thanks Sean.
>>
>> On Sep 20, 2016 7:45 AM, "Sean Owen" <so...@cloudera.com> wrote:
>>>
>>> Ah, I think that this was supposed to be changed with SPARK-9062. Let
>>> me see about reopening 10835 and addressing it.
>>>
>>> On Tue, Sep 20, 2016 at 3:24 PM, janardhan shetty
>>> <janardhan...@gmail.com> wrote:
>>> > Is this a bug?
>>> >
>>> > On Sep 19, 2016 10:10 PM, "janardhan shetty" <janardhan...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I am hitting this issue.
>>> >> https://issues.apache.org/jira/browse/SPARK-10835.
>>> >>
>>> >> The issue seems to be resolved but is resurfacing in 2.0 ML. Any
>>> >> workaround would be appreciated.
>>> >>
>>> >> Note:
>>> >> The pipeline has NGram before Word2Vec.
>>> >>
>>> >> Error:
>>> >> val word2Vec = new Word2Vec().setInputCol("wordsGrams").setOutputCol("features").setVectorSize(128).setMinCount(10)
>>> >>
>>> >> scala> word2Vec.fit(grams)
>>> >> java.lang.IllegalArgumentException: requirement failed: Column wordsGrams must be of type ArrayType(StringType,true) but was actually ArrayType(StringType,false).
>>> >>   at scala.Predef$.require(Predef.scala:224)
>>> >>   at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
>>> >>   at org.apache.spark.ml.feature.Word2VecBase$class.validateAndTransformSchema(Word2Vec.scala:111)
>>> >>   at org.apache.spark.ml.feature.Word2Vec.validateAndTransformSchema(Word2Vec.scala:121)
>>> >>   at org.apache.spark.ml.feature.Word2Vec.transformSchema(Word2Vec.scala:187)
>>> >>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:70)
>>> >>   at org.apache.spark.ml.feature.Word2Vec.fit(Word2Vec.scala:170)
>>> >>
>>> >>
>>> >> GitHub code for NGram:
>>> >>
>>> >>
>>> >>   override protected def validateInputType(inputType: DataType): Unit = {
>>> >>     require(inputType.sameType(ArrayType(StringType)),
>>> >>       s"Input type must be ArrayType(StringType) but got $inputType.")
>>> >>   }
>>> >>
>>> >>   override protected def outputDataType: DataType = new ArrayType(StringType, false)
>>> >> }
>>> >>
>>> >
