Hi,

Thanks for the reply. 

I hadn't spotted that I had forgotten to square root the variance. I manually 
entered the standard deviations and this helped me narrow it down. The default 
min_covar was too high for my data; with it set low enough, the fit now works.
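
For anyone reading this with a newer scikit-learn, here is a minimal sketch of the two fixes together. It assumes the GaussianMixture class, which replaced GMM, and its reg_covar parameter, which plays roughly the role min_covar did. The seed, the sample sizes and reg_covar=1e-8 are illustrative choices, not values from this thread. The variances in the example data are about 2.25e-4 and 1e-4, so a regularisation term of comparable size would noticeably widen the fitted peaks.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data with the same two peaks as the example in this thread.
rng = np.random.default_rng(0)
half = 50000
data = np.concatenate([
    rng.normal(0.05, 0.015, half),
    rng.normal(0.18, 0.01, half),
])

# scikit-learn expects a 2-D array of shape (n_samples, n_features).
X = data.reshape(-1, 1)

# reg_covar is kept far below the data variances (assumed value, tune as needed).
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      reg_covar=1e-8, random_state=0)
gmm.fit(X)

means = gmm.means_.ravel()
weights = gmm.weights_
# covariances_ holds variances; take the square root to get standard deviations.
stds = np.sqrt(gmm.covariances_.reshape(-1))

# Component order is arbitrary, so sort by mean before comparing.
order = np.argsort(means)
print(means[order], stds[order], weights[order])
```

The fitted standard deviations, not the covariances, are what a normal pdf routine such as scipy.stats.norm.pdf expects as its scale argument.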

Matt.
________________________________________
From: Jacob Vanderplas [[email protected]]
Sent: 28 March 2014 14:27
To: [email protected]
Subject: Re: [Scikit-learn-general] Double Gaussian fitting

Hi,
From what I understand, mlab.normpdf expects the standard deviation, while 
you're passing it the variance, which is the square of the standard deviation. 
If you want to see what the fit looks like, I think it's much better to just 
let sklearn do the work, e.g.

    plt.plot(x, np.exp(clf.eval(x)[0]))
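
With a newer scikit-learn, the same idea might look like the sketch below. It assumes GaussianMixture in place of GMM and score_samples in place of eval; score_samples returns the log-density, hence the np.exp. It also uses scipy.stats.norm.pdf as a stand-in for mlab.normpdf to rebuild the individual components. The data, seed and grid are made up for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

# Illustrative data resembling the example in this thread.
rng = np.random.default_rng(1)
data = np.concatenate([
    rng.normal(0.05, 0.015, 20000),
    rng.normal(0.18, 0.01, 20000),
])
gmm = GaussianMixture(n_components=2, random_state=0).fit(data.reshape(-1, 1))

# Evaluate the fitted mixture density on a grid; score_samples gives log p(x).
x = np.linspace(0.0, 0.25, 501)
total_density = np.exp(gmm.score_samples(x.reshape(-1, 1)))

# Rebuild each weighted component by hand, using standard deviations.
stds = np.sqrt(gmm.covariances_.reshape(-1))
components = [
    w * norm.pdf(x, m, s)
    for w, m, s in zip(gmm.weights_, gmm.means_.ravel(), stds)
]

# The weighted components should add up to the density sklearn reports.
print(np.allclose(total_density, components[0] + components[1]))
```

Plotting total_density against x over a density-normalised histogram gives the overall fit; plotting each entry of components gives the two separate curves.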

Not sure if that's the entire problem, but that's definitely part of it,
   Jake

 Jake VanderPlas
 Director of Research – Physical Sciences
 eScience Institute, University of Washington
 http://www.vanderplas.com


On Fri, Mar 28, 2014 at 2:25 AM, Balmer, Matthew 
<[email protected]> wrote:
Hi,

I asked a question here on stackoverflow:

http://stackoverflow.com/questions/22665077/double-gaussian-fitting-with-scikit-learn

But there seems to be more activity on this board, so hopefully someone here 
can help?

In case someone can't access the above...

I am trying to fit two separate Gaussian curves to a double Gaussian 
distribution. I am using this (http://stackoverflow.com/a/19182915/2417180) 
answer to try and do this. I have broken down my code into a minimum working 
example:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture
import matplotlib.pyplot
import matplotlib.mlab

samples = 100000
data = np.zeros(samples)

mu, sigma = 0.05, 0.015
data[0:samples/2] = np.random.normal(mu, sigma, (samples/2))
mu, sigma = 0.18, 0.01
data[(samples/2):samples] = np.random.normal(mu, sigma, (samples/2))

count, bins, ignored = plt.hist(data, 300, normed=True)

clf = mixture.GMM(n_components=2, covariance_type='full')
clf.fit(data)

m1, m2 = clf.means_
w1, w2 = clf.weights_
c1, c2 = clf.covars_

histdist = plt.hist(data, 300, normed=True)
plotgauss1 = lambda x: plt.plot(x,w1*matplotlib.mlab.normpdf(x,m1,c1)[0], 
linewidth=1)
plotgauss2 = lambda x: plt.plot(x,w2*matplotlib.mlab.normpdf(x,m2,c2)[0], 
linewidth=1)

plotgauss1(histdist[1])
plotgauss2(histdist[1])

plt.show()

The problem I'm having is that the peaks (on the pdf plots) are of far too high 
a magnitude, and don't fit the data properly. I've been through the 
sklearn.mixture.GMM documentation, and tried changing a few of the parameters, 
but I'm not having any luck.
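
A quick check of how large the effect is when a variance is passed where a standard deviation is expected. This is a sketch using a hand-written normal pdf (the helper normal_pdf is hypothetical, not a library function) and the first peak's sigma = 0.015 from the code above. The peak of a normal pdf is 1 / (sigma * sqrt(2 * pi)), so using sigma squared in place of sigma scales the peak by 1 / sigma.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of a 1-D normal distribution; `std` is the standard deviation."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

sigma = 0.015          # standard deviation of the first peak in the example
variance = sigma ** 2  # 0.000225, the kind of value a fitted covariance holds

peak_correct = normal_pdf(0.05, 0.05, sigma)   # width given as a standard deviation
peak_wrong = normal_pdf(0.05, 0.05, variance)  # width mistakenly given as a variance

# The ratio equals sigma / variance, i.e. 1 / sigma.
ratio = peak_wrong / peak_correct
print(peak_correct, peak_wrong, ratio)
```

For this peak the curve comes out roughly 67 times too tall, and correspondingly far too narrow.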

Which parameter should I be looking at to get the two curves to fit the 
histograms as per the linked solution?
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general