[ https://issues.apache.org/jira/browse/SPARK-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326229#comment-14326229 ]
Travis Galoppo commented on SPARK-5016:
---------------------------------------

[~josephkb] My previous comment got me thinking about how to make the algorithm usable in higher dimensions. The underflow problem is caused by the addition of EPSILON to every likelihood value computed; this is done to avoid some numerical gotchas. But EPSILON is determined such that 1.0 + (EPSILON / 2) == 1.0, so it dominates the densities as the dimension increases.

We could derive a smaller epsilon value based on the maximum density that we expect to see, e.g., such that x + (EPSILON / 2) == x, where x = (2 * pi)^-(k/2), with k here being the number of dimensions. This is, of course, somewhat simplified because it assumes the covariance matrix has a determinant of 1, but it would lead to a lower epsilon value and likely extend the utility of the algorithm into higher dimensions, and likely make this ticket more relevant.

> GaussianMixtureEM should distribute matrix inverse for large numFeatures, k
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-5016
>                 URL: https://issues.apache.org/jira/browse/SPARK-5016
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.2.0
>            Reporter: Joseph K. Bradley
>
> If numFeatures or k are large, GMM EM should distribute the matrix inverse
> computation for Gaussian initialization.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
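
To make the epsilon argument in the comment above concrete, here is a minimal Python sketch rather than the MLlib code itself. It compares an epsilon found by halving from 1.0 with one found by halving from the expected peak density x = (2 * pi)^-(dim/2). The helper names `peak_density` and `epsilon_for` are hypothetical, and the sketch keeps the comment's simplifying assumption that the covariance matrix has determinant 1.

```python
import math
import sys


def peak_density(dim: int) -> float:
    """Peak value of a dim-dimensional Gaussian density, assuming det(cov) == 1."""
    return (2.0 * math.pi) ** (-dim / 2.0)


def epsilon_for(x: float) -> float:
    """Halve eps, starting from x, until x + eps / 2 == x in double precision."""
    eps = x
    while x + eps / 2.0 != x:
        eps /= 2.0
    return eps


if __name__ == "__main__":
    # Epsilon derived relative to 1.0, as described in the comment.
    fixed_eps = epsilon_for(1.0)
    for dim in (2, 10, 40, 100):
        x = peak_density(dim)
        # Epsilon derived relative to the largest density expected at this dim.
        scaled_eps = epsilon_for(x)
        print(f"dim={dim:3d}  peak={x:.3e}  "
              f"fixed_eps={fixed_eps:.3e}  scaled_eps={scaled_eps:.3e}")
```

Under these assumptions, the printout should show the 1.0-based epsilon overtaking the peak density somewhere around dim = 40, while the scaled epsilon stays many orders of magnitude below the peak at every dimension listed.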