Hi,
From what I understand, mlab.normpdf expects the standard deviation, while
you're passing it the variance, which is the square of the standard
deviation. If you want to see what the fit looks like, I think it's much
better to just let sklearn do the work, e.g.
plt.plot(x, np.exp(clf.eval(x)[0]))
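To make the standard deviation vs. variance point concrete, here is a minimal sketch using only numpy. The helper normal_pdf is hypothetical (written out by hand for illustration, not part of matplotlib or sklearn), and it assumes the fitted covariance entry for 1-D data is a variance, as described above. It uses the first component's parameters from the original example.

```python
import numpy as np


def normal_pdf(x, mean, std):
    """Normal density; `std` is the standard deviation (hypothetical helper)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


# Parameters of the first component in the original example.
mean, std = 0.05, 0.015
variance = std ** 2  # assumption: this is what the fitted covariance holds

x = np.linspace(0.0, 0.1, 1001)

# Correct: take the square root so the function receives a standard deviation.
peak_correct = normal_pdf(x, mean, np.sqrt(variance)).max()

# Incorrect: the variance is passed where a standard deviation is expected.
peak_wrong = normal_pdf(x, mean, variance).max()

print(peak_correct, peak_wrong)
```

The mistaken call produces a peak that is too high by a factor of 1/std (about 67 here), which is consistent with the curves overshooting the histogram.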
Not sure if that's the entire problem, but that's definitely part of it,
Jake
Jake VanderPlas
Director of Research - Physical Sciences
eScience Institute, University of Washington
http://www.vanderplas.com
On Fri, Mar 28, 2014 at 2:25 AM, Balmer, Matthew
<[email protected]> wrote:
> Hi,
>
> I asked a question here on stackoverflow:
>
>
> http://stackoverflow.com/questions/22665077/double-gaussian-fitting-with-scikit-learn
>
> But there seems to be more activity on this board, so hopefully someone
> here can help?
>
> In case someone can't access the above...
>
> I am trying to fit two separate Gaussian curves to a double Gaussian
> distribution. I am using this (http://stackoverflow.com/a/19182915/2417180)
> answer to try and do this. I have broken down my code into a minimum
> working example:
>
> import numpy as np
> import matplotlib.pyplot as plt
> from sklearn import mixture
> import matplotlib.pyplot
> import matplotlib.mlab
>
> samples = 100000
> data = np.zeros(samples)
>
> mu, sigma = 0.05, 0.015
> data[0:samples/2] = np.random.normal(mu, sigma, (samples/2))
> mu, sigma = 0.18, 0.01
> data[(samples/2):samples] = np.random.normal(mu, sigma, (samples/2))
>
> count, bins, ignored = plt.hist(data, 300, normed=True)
>
> clf = mixture.GMM(n_components=2, covariance_type='full')
> clf.fit(data)
>
> m1, m2 = clf.means_
> w1, w2 = clf.weights_
> c1, c2 = clf.covars_
>
> histdist = plt.hist(data, 300, normed=True)
> plotgauss1 = lambda x: plt.plot(x,w1*matplotlib.mlab.normpdf(x,m1,c1)[0],
> linewidth=1)
> plotgauss2 = lambda x: plt.plot(x,w2*matplotlib.mlab.normpdf(x,m2,c2)[0],
> linewidth=1)
>
> plotgauss1(histdist[1])
> plotgauss2(histdist[1])
>
> plt.show()
>
> The problem I'm having is that the peaks (on the pdf plots) are of far too
> high a magnitude, and don't fit the data properly. I've been through the
> sklearn.mixture.GMM documentation, and tried changing a few of the
> parameters, but I'm not having any luck.
>
> Which parameter should I be looking at to get the two curves to fit the
> histograms as per the linked solution?
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>