Hi,
I've been experimenting with the variational clustering methods introduced in
the latest version of scikits-learn, and I'm having trouble getting these
models to fit properly. I've been testing on two small data sets: the
'Old Faithful' data set [1] and a 4-component data set from [2]. I'm using
the script given on the scikits website [3], with the example data replaced
by the data sets above.
Clustering using EM (with mixture.GMM) gives reasonably reliable results on
both data sets. However, when I use DPGMM and VBGMM, the clusters are heavily
biased towards 0 and often over-generalise. What is more concerning is that
the component weights don't appear to change during training. For example, a
2-component DPGMM/VBGMM will have weights = [0.5, 0.5], whereas the GMM will
have weights = [0.64, 0.36].
Both models behave like this with the default initialisation parameters, and
I have tried a range of alphas.
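To illustrate the behaviour I would expect from the weights, here is a minimal
sketch of plain EM on a two-component 1-D mixture, written with the Python
standard library only. It is not the scikits-learn code, and it is not the
variational method; the synthetic data (a 64%/36% split, chosen to echo the
GMM weights above), the function names and the initialisation are all
assumptions made for illustration. The point is only that weights initialised
at [0.5, 0.5] should move once the E- and M-steps run.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_weights(data, n_iter=50):
    """Run plain EM for a two-component 1-D Gaussian mixture.

    Returns the mixing weights after n_iter iterations. The initialisation
    (uniform weights, means at the data extremes) is an illustrative choice.
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsed component
    return weights


# Synthetic stand-in data (an assumption, not the Old Faithful set):
# 640 points around 0 and 360 points around 5, both with unit variance.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(640)]
data += [rng.gauss(5.0, 1.0) for _ in range(360)]

weights = em_weights(data)
print("weights after EM:", weights)
```

If the fitted weights stay exactly at their initial values on data like this,
that would suggest the weight update is not being applied at all, rather than
being a property of the data.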
I have a MATLAB implementation of variational Bayes EM (non-Dirichlet
process) which is able to cluster this data effectively.
Does anyone have experience with these models and might be able to shed some
light on the problems I am having? I can send a tar of the code/data I'm
using to anyone who is interested.
Thanks for such a useful toolkit!
Martin
[1] Old faithful dataset:
http://research.microsoft.com/en-us/um/people/cmbishop/prml/webdatasets/datasets.htm
[2] Figueiredo and Jain, Unsupervised Learning of Finite Mixture Models,
PAMI 2002
[3]
http://scikit-learn.sourceforge.net/stable/auto_examples/mixture/plot_gmm.html#example-mixture-plot-gmm-py
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general