Have you considered the situation of wanting to characterize the probability density of prevalence estimates based on a complex random sample of some large population?
No -- and I stand by my statement. The empirical distribution of the data themselves is the best "characterization" of the density. You and others are free to disagree.

-- Bert

On 8/7/07, Bert Gunter <[EMAIL PROTECTED]> wrote:
> Why would anyone want to fit a mixture of normals with 110 million observations? Any question about the distribution that you would care to ask can be answered directly from the data. Of course, any test of normality (or anything else) would be rejected.
>
> More to the point, the data are certainly not a random sample of anything. There will be all kinds of systematic, nonrandom structure in them. This is clearly a situation where the researcher needs to think more carefully about the substantive questions of interest and how the data may shed light on them, instead of arbitrarily and perhaps reflexively throwing some silly statistical methodology at them.
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
> -----Original Message-----
> From: Tim Victor
> Sent: Tuesday, August 07, 2007 3:02 PM
> To: r-help@stat.math.ethz.ch
> Subject: Re: [R] Mixture of Normals with Large Data
>
> I wasn't aware of this literature -- thanks for the references.
>
> On 8/5/07, RAVI VARADHAN <[EMAIL PROTECTED]> wrote:
> > Another possibility is to use "data squashing" methods. Relevant papers are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen (1999).
> >
> > Ravi.
> >
> > Ravi Varadhan, Ph.D.
> > Assistant Professor, Division of Geriatric Medicine and Gerontology
> > School of Medicine, Johns Hopkins University
> > Ph. (410) 502-2619, email: [EMAIL PROTECTED]
> >
> > ----- Original Message -----
> > From: "Charles C. Berry" <[EMAIL PROTECTED]>
> > Date: Saturday, August 4, 2007 8:01 pm
> > Subject: Re: [R] Mixture of Normals with Large Data
> > To: [EMAIL PROTECTED]
> > Cc: r-help@stat.math.ethz.ch
> >
> > > On Sat, 4 Aug 2007, Tim Victor wrote:
> > >
> > > > All:
> > > >
> > > > I am trying to fit a mixture of 2 normals with 110 million observations. I am running R 2.5.1 on a box with 1 GB of RAM running 32-bit Windows, and I continue to run out of memory. Does anyone have any suggestions?
> > > >
> > > > Thanks so much,
> > > > Tim
> > >
> > > If the first few million observations can be regarded as a simple random sample of the rest, then just use them. Or read in blocks of a convenient size and sample some observations from each block. You can repeat this process a few times to see if the results are sufficiently accurate.
> > >
> > > Otherwise, read in blocks of a convenient size (perhaps 1 million observations at a time), quantize the data to a manageable number of intervals -- maybe a few thousand -- and tabulate it. Add the counts over all the blocks.
> > >
> > > Then use mle() to fit a multinomial likelihood whose probabilities are the masses associated with each bin under a mixture-of-normals law.
> > >
> > > Chuck
> > >
> > > Charles C. Berry (858) 534-2098
> > > Dept of Family/Preventive Medicine, UC San Diego
> > > La Jolla, San Diego 92093-0901

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
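[Editor's note: Chuck's quantize-tabulate-then-fit suggestion is language-agnostic. Below is a minimal sketch of it in Python (since the thread concerns R, this is an illustration of the technique, not the poster's code): bin the data, keep only the counts, and maximize a multinomial log-likelihood whose cell probabilities are the bin masses under a two-component normal mixture. The simulated data, bin count, and starting values are all hypothetical choices for the example.]

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
# Hypothetical stand-in for the full data set; in the real problem these
# counts would be accumulated block by block from the 110 million values.
# True mixture here: 0.3 * N(-2, 1) + 0.7 * N(3, 1.5).
x = np.concatenate([rng.normal(-2.0, 1.0, 30_000),
                    rng.normal(3.0, 1.5, 70_000)])

# Quantize to a few thousand intervals and tabulate; only `edges` and
# `counts` need to stay in memory, not the raw observations.
edges = np.linspace(x.min(), x.max(), 2001)
counts, _ = np.histogram(x, bins=edges)

def negloglik(theta):
    """Multinomial negative log-likelihood of the bin counts under a
    two-component normal mixture (unconstrained parameterization)."""
    w = 1.0 / (1.0 + np.exp(-theta[0]))          # mixing weight via logit
    mu1, mu2 = theta[1], theta[2]
    s1, s2 = np.exp(theta[3]), np.exp(theta[4])  # sds via log transform
    # Bin probability = difference of the mixture CDF at the bin edges.
    cdf = w * norm.cdf(edges, mu1, s1) + (1.0 - w) * norm.cdf(edges, mu2, s2)
    p = np.clip(np.diff(cdf), 1e-300, None)      # guard log(0) in empty bins
    return -np.sum(counts * np.log(p))

start = np.array([0.0, -1.0, 1.0, 0.0, 0.0])     # arbitrary starting values
fit = minimize(negloglik, start, method="Nelder-Mead",
               options={"maxiter": 5000, "maxfev": 10000})
w_hat = 1.0 / (1.0 + np.exp(-fit.x[0]))
print(w_hat, fit.x[1], fit.x[2], np.exp(fit.x[3]), np.exp(fit.x[4]))
```

With a few thousand bins the optimization touches only the tabulated counts, so memory use is independent of the number of raw observations; an R version would follow the same shape, with the mixture bin masses supplied to mle() as Chuck describes.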