Nothing works better than asking for help and then finding the answer by myself...

Page 47 of the technical report (tr504.pdf) deals exactly with the problem of big datasets.
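If I understand the report correctly, the idea is to run the expensive hierarchical initialization only on a random subset and then let EM use all the data. A minimal, untested sketch of what I plan to try, assuming the initialization = list(subset = ...) argument of Mclust and a vector x holding my values:

  library(mclust)

  set.seed(1)
  sub <- sample(seq_along(x), 10000)                  # indices of a random subset for initialization
  fit <- Mclust(x, initialization = list(subset = sub))
  summary(fit, parameters = TRUE)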

Also, I found that mclust is overkill for my problem: the optimal number of Gaussians it suggests is way too high. For example, for one dataset (downsampled to 1/10) it suggests 9 Gaussians, but the central 7 sum, to a good approximation, to a single Gaussian, so the dataset is better decomposed into only 3 Gaussians.
I admit I'm not rigorous at all...
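To be a bit more concrete, this is roughly what I mean by forcing fewer components (a rough sketch; x stands for the downsampled data):

  library(mclust)

  fit <- Mclust(x, G = 1:5)         # consider only 1 to 5 components instead of the default up to 9
  plot(fit, what = "BIC")           # BIC curve over the candidate numbers of components
  summary(fit, parameters = TRUE)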

Bye!
                  mario

Mario Valle wrote:
Hi all!

I have an ordered vector of values. The distribution of these values can be modeled as a mixture of Gaussians, so I'm using the package 'mclust' to get the Gaussians' parameters for this 1D distribution. It works very well, but for input sizes above 100,000 values it starts taking practically forever. Unfortunately my dataset has around 4.6M values...
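In case it matters, this is essentially all I'm doing (a simplified sketch; x is my vector of values, about 4.6M entries in the real case):

  library(mclust)

  fit <- Mclust(x)                  # 1D Gaussian mixture, number of components chosen by BIC
  summary(fit, parameters = TRUE)   # means, variances and mixing proportions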

My question: is it correct to subsample my dataset, taking every Nth value, to make mclust happy? Or do I have no alternative but to use the complete dataset?
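By "every Nth value" I mean something like the following (just to be explicit; N is a hypothetical subsampling step):

  N <- 10
  sub_regular <- x[seq(1, length(x), by = N)]       # every Nth value of the ordered vector
  sub_random  <- sort(sample(x, length(x) %/% N))   # or a random sample of the same size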

Excuse my profound ignorance and thanks for your help!
mario



--
Ing. Mario Valle
Data Analysis and Visualization Group            | http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS)      | Tel:  +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax:  +41 (91) 610.82.82

