[ https://issues.apache.org/jira/browse/SPARK-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15695481#comment-15695481 ]
Hao Ren commented on SPARK-18581:
---------------------------------

[~srowen] I have updated the description. The problem is that my covariance matrix is not invertible, since one of the features is zero for all data points.

> MultivariateGaussian does not check if covariance matrix is invertible
> ----------------------------------------------------------------------
>
>                 Key: SPARK-18581
>                 URL: https://issues.apache.org/jira/browse/SPARK-18581
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.2, 2.0.2
>            Reporter: Hao Ren
>
> When training a GaussianMixtureModel, I found some probabilities much larger
> than 1. This led me to find that the value returned by
> MultivariateGaussian.pdf can be as large as 10^5.
> After reviewing the code, I found that the problem lies in the computation of
> the determinant of the covariance matrix.
> The computation is simplified by using the pseudo-determinant of a positive
> definite matrix.
> In my case, one feature is 0 for all data points.
> As a result, the covariance matrix is not invertible <=> det(covariance
> matrix) = 0 => the pseudo-determinant will be very close to zero.
> Thus, log(pseudo-determinant) will be a large negative number, which finally
> makes logpdf very large, and pdf even larger (> 1).
> As said in the comments of MultivariateGaussian.scala,
> """
> Singular values are considered to be non-zero only if they exceed a tolerance
> based on machine precision.
> """
> But if a singular value is considered to be zero, the covariance matrix is
> not invertible, which contradicts the assumption that it should be
> invertible.
> So we should check whether any singular value is smaller than the tolerance
> before computing the pseudo-determinant.
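
To illustrate the effect described in the report, here is a minimal sketch in plain Python. It is not Spark's code. It is restricted to a diagonal covariance matrix, so the singular values are just the per-feature variances. The function names (`tolerance`, `log_pdf_diag`), the `strict` flag, and the exact tolerance formula (machine epsilon times the largest variance times the dimension) are assumptions made for illustration only.

```python
import math
import sys


def tolerance(variances):
    # Assumed tolerance based on machine precision (hypothetical formula):
    # epsilon * largest singular value * dimension.
    return sys.float_info.epsilon * max(variances) * len(variances)


def log_pdf_diag(x, mean, variances, strict=False):
    """Log-density of a Gaussian with diagonal covariance.

    Variances at or below the tolerance are treated as zero and dropped,
    i.e. a pseudo-determinant and pseudo-inverse are used.
    With strict=True, a dropped variance raises an error instead,
    which is the kind of check the report asks for.
    """
    tol = tolerance(variances)
    kept = [v > tol for v in variances]
    if strict and not all(kept):
        raise ValueError("covariance matrix is singular (not invertible)")

    # log of the pseudo-determinant: sum of logs of the kept variances
    log_pdet = sum(math.log(v) for v, k in zip(variances, kept) if k)
    # quadratic form using the pseudo-inverse: skip dropped directions
    quad = sum(
        (xi - mi) ** 2 / v
        for xi, mi, v, k in zip(x, mean, variances, kept)
        if k
    )
    dim = len(mean)
    return -0.5 * (dim * math.log(2.0 * math.pi) + log_pdet + quad)


origin = [0.0, 0.0]

# Second feature is almost constant: its variance is tiny but still above
# the tolerance, so it contributes log(1e-12) to the pseudo-determinant.
nearly_singular = math.exp(log_pdf_diag(origin, origin, [1.0, 1e-12]))

# Second feature is exactly constant: its variance is dropped silently.
singular = math.exp(log_pdf_diag(origin, origin, [1.0, 0.0]))
```

Under these assumptions, `nearly_singular` is roughly 1.6e5, the order of magnitude mentioned in the report, while `singular` is 1/(2π) ≈ 0.159 because the zero variance is ignored without any warning. Calling `log_pdf_diag(origin, origin, [1.0, 0.0], strict=True)` raises a `ValueError` instead.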