Hi,

I asked a question here on Stack Overflow:

http://stackoverflow.com/questions/22665077/double-gaussian-fitting-with-scikit-learn

But there seems to be more activity on this board, so hopefully someone here can help?

In case someone can't access the above...

I am trying to fit two separate Gaussian curves to a double Gaussian distribution. I am using this answer (http://stackoverflow.com/a/19182915/2417180) to try and do this.

I have broken down my code into a minimum working example:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import mixture
    import matplotlib.pyplot
    import matplotlib.mlab

    samples = 100000
    data = np.zeros(samples)
    mu, sigma = 0.05, 0.015
    data[0:samples/2] = np.random.normal(mu, sigma, (samples/2))
    mu, sigma = 0.18, 0.01
    data[(samples/2):samples] = np.random.normal(mu, sigma, (samples/2))
    count, bins, ignored = plt.hist(data, 300, normed=True)

    clf = mixture.GMM(n_components=2, covariance_type='full')
    clf.fit(data)
    m1, m2 = clf.means_
    w1, w2 = clf.weights_
    c1, c2 = clf.covars_

    histdist = plt.hist(data, 300, normed=True)
    plotgauss1 = lambda x: plt.plot(x,w1*matplotlib.mlab.normpdf(x,m1,c1)[0], linewidth=1)
    plotgauss2 = lambda x: plt.plot(x,w2*matplotlib.mlab.normpdf(x,m2,c2)[0], linewidth=1)
    plotgauss1(histdist[1])
    plotgauss2(histdist[1])
    plt.show()

The problem I'm having is that the peaks (on the pdf plots) are of far too high a magnitude, and don't fit the data properly.

I've been through the sklearn.mixture.GMM documentation, and tried changing a few of the parameters, but I'm not having any luck. Which parameter should I be looking at to get the two curves to fit the histograms as per the linked solution?
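One likely cause, offered as a sketch rather than a confirmed diagnosis: `matplotlib.mlab.normpdf(x, mu, sigma)` takes a standard deviation as its third argument, while the fitted covariances are variances. Passing a variance of roughly 0.0002 where a standard deviation of roughly 0.015 is expected would make each peak far too tall and narrow. The example below rests on a few assumptions that are not part of the original post: it uses `GaussianMixture` and `covariances_` from newer scikit-learn releases in place of `GMM` and `covars_`, it replaces `normpdf` with a small hand-written helper (`normal_pdf`, a hypothetical name), and it compares peak heights numerically instead of plotting.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def normal_pdf(x, mean, std):
    """Normal density; `std` is a standard deviation, not a variance.

    Hypothetical helper standing in for matplotlib.mlab.normpdf.
    """
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


# Same two-component data as in the post, with a fixed seed.
rng = np.random.default_rng(0)
samples = 100000
data = np.concatenate([
    rng.normal(0.05, 0.015, samples // 2),
    rng.normal(0.18, 0.01, samples // 2),
])

# The estimator expects a 2-D array of shape (n_samples, n_features).
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(data.reshape(-1, 1))

# Sort components by mean so the printed order is stable.
order = np.argsort(gmm.means_.ravel())
means = gmm.means_.ravel()[order]
weights = gmm.weights_[order]
variances = gmm.covariances_.reshape(-1)[order]  # (2, 1, 1) -> (2,)
stds = np.sqrt(variances)                        # the conversion in question

# Histogram normalised to a density, for comparison with the curves.
density, edges = np.histogram(data, bins=300, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Weighted component curves evaluated at the bin centres.
curve1 = weights[0] * normal_pdf(centers, means[0], stds[0])
curve2 = weights[1] * normal_pdf(centers, means[1], stds[1])

# Peak height of the first component with the correct and the mistaken width.
correct_peak = weights[0] * normal_pdf(means[0], means[0], stds[0])
wrong_peak = weights[0] * normal_pdf(means[0], means[0], variances[0])

print("means:", means)
print("standard deviations:", stds)
print("weights:", weights)
print("peak using std:", correct_peak)
print("peak using variance as width:", wrong_peak)
print("tallest histogram bin:", density.max())
```

If this is indeed the cause, `curve1` and `curve2` plotted against `centers` should follow the two histogram humps, and the equivalent change in the original code would be to pass the square root of each covariance to `normpdf`.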
