Hey,

sorry for the late reply. I cannot share the data, but the problem can be reproduced easily, as below. I also checked with sklearn and observed a similar behaviour, i.e. a positive per-sample average log-likelihood (http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture.score).

I don't think it is necessarily an issue with the implementation; maybe it is due to parameter identifiability or something similar?
As far as I can tell, the variances seem to be ok.
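
Just as a side note on why positive values are even possible: the log-likelihood of a continuous density is not bounded above by zero, since the density itself can exceed 1, e.g. for a Gaussian with a small enough variance. A quick check with scipy, only for illustration:

from scipy.stats import norm, multivariate_normal

# A narrow 1-D Gaussian has density > 1 near its mean, so its log-density is positive.
print(norm.logpdf(0.0, loc=0.0, scale=0.1))  # about 1.38

# The same happens in higher dimensions once the covariance is small enough.
print(multivariate_normal.logpdf([0.0, 0.0], mean=[0.0, 0.0], cov=0.01))  # about 2.77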

Thanks for looking into this.

Best,
Simon


import scipy
import sklearn.mixture
import pyspark.ml.clustering
from scipy.stats import multivariate_normal
from pyspark.ml.linalg import Vectors

# 100 samples from a 10-dimensional Gaussian with mean (1, ..., 1) and identity covariance.
scipy.random.seed(23)
X = multivariate_normal.rvs(mean=scipy.ones(10), size=100)

# The same data as a Spark DataFrame with a dummy integer label column.
dff = [(int(x[0]), Vectors.dense(x)) for x in X]
df = spark.createDataFrame(dff, schema=["label", "features"])

# Fit a 10-component mixture on shrinking prefixes of the data with both libraries
# and compare the reported log-likelihoods.
for i in [100, 90, 80, 70, 60, 50]:
    km = pyspark.ml.clustering.GaussianMixture(k=10, seed=23).fit(df.limit(i))
    sk_gmm = sklearn.mixture.GaussianMixture(10, random_state=23).fit(X[:i, :])
    print(df.limit(i).count(), X[:i, :].shape[0], km.summary.logLikelihood, sk_gmm.score(X[:i, :]))

100 100 368.37475644171036 -1.54949312502
90 90 1026.084529101155 1.16196607062
80 80 2245.427539835042 4.25769131857
70 70 1940.0122633489268 10.0949992881
60 60 2255.002313247103 14.0497823725
50 50 -140.82605873444814 21.2423016046
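
If I read the two APIs correctly (this is an assumption on my part), sklearn's score() returns the per-sample average log-likelihood while Spark's summary.logLikelihood is the total over all rows, so a like-for-like comparison would divide the Spark value by the number of rows, e.g. by replacing the print in the loop above with:

# Assuming summary.logLikelihood is the total log-likelihood over all rows,
# put both libraries on the per-sample scale before comparing.
print(i, km.summary.logLikelihood / i, sk_gmm.score(X[:i, :]))

Even on the per-sample scale the numbers above still don't match, so the scale difference alone doesn't seem to explain it.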
