Re: [R] Percentiles for unequal probability sample

2013-11-21 Thread Thomas Lumley
On Thu, Nov 21, 2013 at 8:35 AM, Trevor Walker
wrote:

> I often work with tree data that is sampled with probability proportional
> to size, which presents a special challenge when describing the frequency
> distribution.



The survey package does lots of calculations (including quantiles) for
unequal-probability samples.


-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Percentiles for unequal probability sample

2013-11-20 Thread Roger Koenker
You could try:

require(quantreg)
qs <- rq(x ~ 1, weights = w, tau = 1:3/4)


Roger Koenker
rkoen...@illinois.edu




On Nov 20, 2013, at 4:56 PM, David Winsemius wrote:

> 
> On Nov 20, 2013, at 11:35 AM, Trevor Walker wrote:
> 
>> I often work with tree data that is sampled with probability proportional
>> to size, which presents a special challenge when describing the frequency
>> distribution.  For example, R functions like quantile() and fitdistr()
>> expect each observation to have equal sample probability.  As a workaround,
>> I have been "exploding"/"mushrooming" my data based on the appropriate
>> expansion factors.  However, this can take a LONG TIME and I am reaching
>> out for more efficient suggestions, particularly for the quantile()
>> function.  Example of my workaround:
>> 
> 
> The 'Hmisc' package has a `wtd.quantile` function. I seem to remember that it 
> might have been borrowed from the quantreg package.
> 
> 
>> # trees.df represents random sample with probability proportional to size
>> (of diameter) using "basal area factor" of 20
>> trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2),
>> TreesPerAcre=numeric(10))
>> trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576)# expansion
>> factor for each observation
>> 
>> # to obtain percentiles that are weighted by trees per acre, "explode"
>> diameter data
>> explodeFactor <- 10 # represents ten acres
>> treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor ))
>> explodedDiameters.df <- data.frame(Diameter=numeric(treeCount))
>> k=0 # initialize counter k
>> for (i in 1:length(trees.df$Diameter)){
>> for (j in 1:round(trees.df$TreesPerAcre[i]*explodeFactor)){
>>   k <- k +1
>>   explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i]
>>  }
>> }
>> 
>> quantile(explodedDiameters.df$Diameter) # appropriate percentiles (for
>> trees per acre)
>> quantile(trees.df$Diameter) # percentiles biased upwards
>> 
>> 
>> 
>> Trevor Walker
>> 
> -- 
> 
> David Winsemius
> Alameda, CA, USA
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Percentiles for unequal probability sample

2013-11-20 Thread David Winsemius

On Nov 20, 2013, at 11:35 AM, Trevor Walker wrote:

> I often work with tree data that is sampled with probability proportional
> to size, which presents a special challenge when describing the frequency
> distribution.  For example, R functions like quantile() and fitdistr()
> expect each observation to have equal sample probability.  As a workaround,
> I have been "exploding"/"mushrooming" my data based on the appropriate
> expansion factors.  However, this can take a LONG TIME and I am reaching
> out for more efficient suggestions, particularly for the quantile()
> function.  Example of my workaround:
> 

The 'Hmisc' package has a `wtd.quantile` function. I seem to remember that it 
might have been borrowed from the quantreg package.


> # trees.df represents random sample with probability proportional to size
> (of diameter) using "basal area factor" of 20
> trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2),
> TreesPerAcre=numeric(10))
> trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576)# expansion
> factor for each observation
> 
> # to obtain percentiles that are weighted by trees per acre, "explode"
> diameter data
> explodeFactor <- 10 # represents ten acres
> treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor ))
> explodedDiameters.df <- data.frame(Diameter=numeric(treeCount))
> k=0 # initialize counter k
> for (i in 1:length(trees.df$Diameter)){
>  for (j in 1:round(trees.df$TreesPerAcre[i]*explodeFactor)){
>k <- k +1
>explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i]
>   }
> }
> 
> quantile(explodedDiameters.df$Diameter) # appropriate percentiles (for
> trees per acre)
> quantile(trees.df$Diameter) # percentiles biased upwards
> 
> 
> 
> Trevor Walker
> 
-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Percentiles for unequal probability sample

2013-11-20 Thread Adams, Jean
Rather than "exploding", I suggest you order your data according to tree
diameter, then calculate the cumulative sum of the tree densities, and use
linear interpolation to estimate the percentiles.  For example ...

library(plotrix)

attach(trees.df)

ord <- order(Diameter)
CumDensOrdScaled <- rescale(cumsum(TreesPerAcre[ord]), 0:1)
DiamOrd <- Diameter[ord]
Quants <- c(0, 0.25, 0.5, 0.75, 1)
Perc <- approx(x=CumDensScaled, y=DiamOrd, xout=Quants)$y

plot(DiamOrd, CumDensScaled, type="o", lwd=2)
abline(h=Quants, lty=2)
abline(v=Perc, lty=2)

Jean


On Wed, Nov 20, 2013 at 1:35 PM, Trevor Walker
wrote:

> I often work with tree data that is sampled with probability proportional
> to size, which presents a special challenge when describing the frequency
> distribution.  For example, R functions like quantile() and fitdistr()
> expect each observation to have equal sample probability.  As a workaround,
> I have been "exploding"/"mushrooming" my data based on the appropriate
> expansion factors.  However, this can take a LONG TIME and I am reaching
> out for more efficient suggestions, particularly for the quantile()
> function.  Example of my workaround:
>
> # trees.df represents random sample with probability proportional to size
> (of diameter) using "basal area factor" of 20
> trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2),
> TreesPerAcre=numeric(10))
> trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576)# expansion
> factor for each observation
>
> # to obtain percentiles that are weighted by trees per acre, "explode"
> diameter data
> explodeFactor <- 10 # represents ten acres
> treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor ))
> explodedDiameters.df <- data.frame(Diameter=numeric(treeCount))
> k=0 # initialize counter k
> for (i in 1:length(trees.df$Diameter)){
>   for (j in 1:round(trees.df$TreesPerAcre[i]*explodeFactor)){
> k <- k +1
> explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i]
>}
> }
>
> quantile(explodedDiameters.df$Diameter) # appropriate percentiles (for
> trees per acre)
> quantile(trees.df$Diameter) # percentiles biased upwards
>
>
>
> Trevor Walker
> 14906 McKemey Pl.
> Charlotte, NC 28277
> Cell: (936)591-2130
> Email: trevordaviswal...@gmail.com
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.