Seth Hendrickson created SPARK-19313:
----------------------------------------

             Summary: GaussianMixture throws cryptic error when number of features is too high
                 Key: SPARK-19313
                 URL: https://issues.apache.org/jira/browse/SPARK-19313
             Project: Spark
          Issue Type: Bug
          Components: ML, MLlib
            Reporter: Seth Hendrickson
            Priority: Minor


The following fails:

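{code}
    import org.apache.spark.ml.clustering.GaussianMixture
    import org.apache.spark.ml.linalg.Vectors
    // toDF requires spark.implicits._ (already in scope in spark-shell)

    // Two sparse vectors with 46400 features each
    val df = Seq(
      Vectors.sparse(46400, Array(0, 4), Array(3.0, 8.0)),
      Vectors.sparse(46400, Array(1, 5), Array(4.0, 9.0)))
      .map(Tuple1.apply).toDF("features")
    val gm = new GaussianMixture()
    gm.fit(df)
{code}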

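It fails because GMMs allocate an array of size {{numFeatures * numFeatures}}.
With 46400 features that product is 2,152,960,000, which exceeds
{{Int.MaxValue}} (2,147,483,647), so the size computation overflows. We should
limit the number of features appropriately and fail with a clear error message.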
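
To make the overflow concrete, here is a minimal sketch of the kind of guard
this could mean. The object {{FeatureLimitSketch}} and the names
{{MaxNumFeatures}} and {{covarianceArraySize}} are hypothetical and are not part
of Spark; the sketch only assumes that the array size is computed as an
{{Int}} product, as described above.

```scala
object FeatureLimitSketch {
  // Hypothetical constant: the largest d for which d * d still fits in an Int.
  val MaxNumFeatures: Int = math.floor(math.sqrt(Int.MaxValue.toDouble)).toInt

  // Hypothetical helper: validate numFeatures before computing the array size,
  // so the caller sees a readable message instead of an overflow side effect.
  def covarianceArraySize(numFeatures: Int): Int = {
    require(numFeatures > 0 && numFeatures <= MaxNumFeatures,
      s"GaussianMixture cannot handle more than $MaxNumFeatures features, " +
        s"but got $numFeatures.")
    numFeatures * numFeatures
  }

  def main(args: Array[String]): Unit = {
    // The Int product wraps around, while the Long product shows the true size.
    println(s"Int product:  ${46400 * 46400}")
    println(s"Long product: ${46400L * 46400L}")
    println(s"Limit:        $MaxNumFeatures")
    // Fails fast with an IllegalArgumentException and a readable message.
    covarianceArraySize(46400)
  }
}
```

Checking the input up front means the user sees the feature limit in the error
message rather than a failure from deep inside the fitting code.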


