Rather than "exploding", I suggest you order your data according to tree
diameter, then calculate the cumulative sum of the tree densities, and use
linear interpolation to estimate the percentiles.  For example ...

library(plotrix)

attach(trees.df)

ord <- order(Diameter)
CumDensOrdScaled <- rescale(cumsum(TreesPerAcre[ord]), 0:1)
DiamOrd <- Diameter[ord]
Quants <- c(0, 0.25, 0.5, 0.75, 1)
Perc <- approx(x=CumDensScaled, y=DiamOrd, xout=Quants)$y

plot(DiamOrd, CumDensScaled, type="o", lwd=2)
abline(h=Quants, lty=2)
abline(v=Perc, lty=2)

Jean


On Wed, Nov 20, 2013 at 1:35 PM, Trevor Walker
<trevordaviswal...@gmail.com>wrote:

> I often work with tree data that is sampled with probability proportional
> to size, which presents a special challenge when describing the frequency
> distribution.  For example, R functions like quantile() and fitdistr()
> expect each observation to have equal sample probability.  As a workaround,
> I have been "exploding"/"mushrooming" my data based on the appropriate
> expansion factors.  However, this can take a LONG TIME and I am reaching
> out for more efficient suggestions, particularly for the quantile()
> function.  Example of my workaround:
>
> # trees.df represents random sample with probability proportional to size
> (of diameter) using "basal area factor" of 20
> trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2),
> TreesPerAcre=numeric(10))
> trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576)    # expansion
> factor for each observation
>
> # to obtain percentiles that are weighted by trees per acre, "explode"
> diameter data
> explodeFactor <- 10 # represents ten acres
> treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor ))
> explodedDiameters.df <- data.frame(Diameter=numeric(treeCount))
> k=0 # initialize counter k
> for (i in 1:length(trees.df$Diameter)){
>   for (j in 1:round(trees.df$TreesPerAcre[i]*explodeFactor)){
>     k <- k +1
>     explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i]
>    }
> }
>
> quantile(explodedDiameters.df$Diameter) # appropriate percentiles (for
> trees per acre)
> quantile(trees.df$Diameter)             # percentiles biased upwards
>
>
>
> Trevor Walker
> 14906 McKemey Pl.
> Charlotte, NC 28277
> Cell: (936)591-2130
> Email: trevordaviswal...@gmail.com
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to