Hi,

I asked this question on Stack Overflow:

http://stackoverflow.com/questions/22665077/double-gaussian-fitting-with-scikit-learn

But there seems to be more activity on this board, so hopefully someone here 
can help?

In case someone can't access the above...

I am trying to fit two separate Gaussian curves to a bimodal (double Gaussian) 
distribution, following this answer: http://stackoverflow.com/a/19182915/2417180. 
I have reduced my code to a minimal working example:

import numpy as np
import matplotlib.mlab
import matplotlib.pyplot as plt
from sklearn import mixture

samples = 100000
data = np.zeros(samples)

mu, sigma = 0.05, 0.015
data[0:samples // 2] = np.random.normal(mu, sigma, samples // 2)
mu, sigma = 0.18, 0.01
data[samples // 2:samples] = np.random.normal(mu, sigma, samples // 2)

count, bins, ignored = plt.hist(data, 300, normed=True)

clf = mixture.GMM(n_components=2, covariance_type='full')
clf.fit(data)

m1, m2 = clf.means_
w1, w2 = clf.weights_
c1, c2 = clf.covars_

histdist = plt.hist(data, 300, normed=True)
plotgauss1 = lambda x: plt.plot(x, w1*matplotlib.mlab.normpdf(x, m1, c1)[0],
                                linewidth=1)
plotgauss2 = lambda x: plt.plot(x, w2*matplotlib.mlab.normpdf(x, m2, c2)[0],
                                linewidth=1)

plotgauss1(histdist[1])
plotgauss2(histdist[1]) 

plt.show()

The problem I'm having is that the peaks of the plotted PDF curves are far too 
tall and do not match the histogram. I've been through the 
sklearn.mixture.GMM documentation and tried changing a few of the parameters, 
but without any luck.

Which parameter should I be looking at to get the two curves to fit the 
histograms as per the linked solution?
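
One thing that may be worth checking (a guess, not a confirmed diagnosis): 
normpdf(x, mu, sigma) takes a standard deviation as its third argument, while 
covars_ holds variances. For sigma well below 1 the variance is much smaller 
than sigma, which would make the plotted peaks far too tall. The sketch below 
illustrates the size of that effect using only NumPy. The helper normal_pdf is 
hypothetical (a stand-in for normpdf), and the weight of 0.5 is assumed from 
the equal split of the samples in the example above.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma**2); sigma is the standard deviation.

    Hypothetical stand-in for matplotlib.mlab.normpdf.
    """
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# First component from the example above; weight 0.5 is an assumption
# based on each component receiving half of the samples.
mu, sigma, weight = 0.05, 0.015, 0.5
variance = sigma ** 2

x = np.linspace(0.0, 0.25, 501)

# Peak height when the standard deviation is passed (intended usage).
peak_with_std = (weight * normal_pdf(x, mu, sigma)).max()

# Peak height when the variance is passed in place of the standard deviation.
peak_with_var = (weight * normal_pdf(x, mu, variance)).max()

print("peak using standard deviation:", peak_with_std)
print("peak using variance:          ", peak_with_var)
```

If this is the cause, passing np.sqrt(c1) and np.sqrt(c2) instead of c1 and c2 
in the plotting lambdas would be the change to try.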
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general