Re: [R] Percentiles for unequal probability sample
On Thu, Nov 21, 2013 at 8:35 AM, Trevor Walker wrote: > I often work with tree data that is sampled with probability proportional > to size, which presents a special challenge when describing the frequency > distribution. The survey package does lots of calculations (including quantiles) for unequal-probability samples. -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Percentiles for unequal probability sample
You could try: require(quantreg) qs <- rq(x ~ 1, weights = w, tau = 1:3/4) Roger Koenker rkoen...@illinois.edu On Nov 20, 2013, at 4:56 PM, David Winsemius wrote: > > On Nov 20, 2013, at 11:35 AM, Trevor Walker wrote: > >> I often work with tree data that is sampled with probability proportional >> to size, which presents a special challenge when describing the frequency >> distribution. For example, R functions like quantile() and fitdistr() >> expect each observation to have equal sample probability. As a workaround, >> I have been "exploding"/"mushrooming" my data based on the appropriate >> expansion factors. However, this can take a LONG TIME and I am reaching >> out for more efficient suggestions, particularly for the quantile() >> function. Example of my workaround: >> > > The 'Hmisc' package has a `wtd.quantile` function. I seem to remember that it > might have been borrowed from the quantreg package. > > >> # trees.df represents random sample with probability proportional to size >> (of diameter) using "basal area factor" of 20 >> trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2), >> TreesPerAcre=numeric(10)) >> trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576)# expansion >> factor for each observation >> >> # to obtain percentiles that are weighted by trees per acre, "explode" >> diameter data >> explodeFactor <- 10 # represents ten acres >> treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor )) >> explodedDiameters.df <- data.frame(Diameter=numeric(treeCount)) >> k=0 # initialize counter k >> for (i in 1:length(trees.df$Diameter)){ >> for (j in 1:round(trees.df$TreesPerAcre[i]*explodeFactor)){ >> k <- k +1 >> explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i] >> } >> } >> >> quantile(explodedDiameters.df$Diameter) # appropriate percentiles (for >> trees per acre) >> quantile(trees.df$Diameter) # percentiles biased upwards >> >> >> >> Trevor Walker >> > -- > > David Winsemius > Alameda, CA, USA > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Percentiles for unequal probability sample
On Nov 20, 2013, at 11:35 AM, Trevor Walker wrote: > I often work with tree data that is sampled with probability proportional > to size, which presents a special challenge when describing the frequency > distribution. For example, R functions like quantile() and fitdistr() > expect each observation to have equal sample probability. As a workaround, > I have been "exploding"/"mushrooming" my data based on the appropriate > expansion factors. However, this can take a LONG TIME and I am reaching > out for more efficient suggestions, particularly for the quantile() > function. Example of my workaround: > The 'Hmisc' package has a `wtd.quantile` function. I seem to remember that it might have been borrowed from the quantreg package. > # trees.df represents random sample with probability proportional to size > (of diameter) using "basal area factor" of 20 > trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2), > TreesPerAcre=numeric(10)) > trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576)# expansion > factor for each observation > > # to obtain percentiles that are weighted by trees per acre, "explode" > diameter data > explodeFactor <- 10 # represents ten acres > treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor )) > explodedDiameters.df <- data.frame(Diameter=numeric(treeCount)) > k=0 # initialize counter k > for (i in 1:length(trees.df$Diameter)){ > for (j in 1:round(trees.df$TreesPerAcre[i]*explodeFactor)){ >k <- k +1 >explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i] > } > } > > quantile(explodedDiameters.df$Diameter) # appropriate percentiles (for > trees per acre) > quantile(trees.df$Diameter) # percentiles biased upwards > > > > Trevor Walker > -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Percentiles for unequal probability sample
Rather than "exploding", I suggest you order your data according to tree diameter, then calculate the cumulative sum of the tree densities, and use linear interpolation to estimate the percentiles. For example ... library(plotrix) attach(trees.df) ord <- order(Diameter) CumDensOrdScaled <- rescale(cumsum(TreesPerAcre[ord]), 0:1) DiamOrd <- Diameter[ord] Quants <- c(0, 0.25, 0.5, 0.75, 1) Perc <- approx(x=CumDensScaled, y=DiamOrd, xout=Quants)$y plot(DiamOrd, CumDensScaled, type="o", lwd=2) abline(h=Quants, lty=2) abline(v=Perc, lty=2) Jean On Wed, Nov 20, 2013 at 1:35 PM, Trevor Walker wrote: > I often work with tree data that is sampled with probability proportional > to size, which presents a special challenge when describing the frequency > distribution. For example, R functions like quantile() and fitdistr() > expect each observation to have equal sample probability. As a workaround, > I have been "exploding"/"mushrooming" my data based on the appropriate > expansion factors. However, this can take a LONG TIME and I am reaching > out for more efficient suggestions, particularly for the quantile() > function. Example of my workaround: > > # trees.df represents random sample with probability proportional to size > (of diameter) using "basal area factor" of 20 > trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2), > TreesPerAcre=numeric(10)) > trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576)# expansion > factor for each observation > > # to obtain percentiles that are weighted by trees per acre, "explode" > diameter data > explodeFactor <- 10 # represents ten acres > treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor )) > explodedDiameters.df <- data.frame(Diameter=numeric(treeCount)) > k=0 # initialize counter k > for (i in 1:length(trees.df$Diameter)){ > for (j in 1:round(trees.df$TreesPerAcre[i]*explodeFactor)){ > k <- k +1 > explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i] >} > } > > quantile(explodedDiameters.df$Diameter) # appropriate percentiles (for > trees per acre) > quantile(trees.df$Diameter) # percentiles biased upwards > > > > Trevor Walker > 14906 McKemey Pl. > Charlotte, NC 28277 > Cell: (936)591-2130 > Email: trevordaviswal...@gmail.com > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.