On Apr 7, 2005 1:18 AM, Itay Furman <[EMAIL PROTECTED]> wrote:
>
> On Tue, 5 Apr 2005, Gabor Grothendieck wrote:
>
> > On Apr 5, 2005 6:59 PM, Itay Furman <[EMAIL PROTECTED]> wrote:
> >>
> >> Hi,
> >>
> >> I have a data set, the structure of which is something like this:
> >>
> >>> a <- rep(c("a", "b"), c(6,6))
> >>> x <- rep(c("x", "y", "z"), c(4,4,4))
> >>> df <- data.frame(a=a, x=x, r=rnorm(12))
> >>
> >> The true data set has >1 million rows. The factors "a" and "x"
> >> have about 70 levels each; combined together they subset 'df'
> >> into ~900 data frames.
> >> For each such subset I'd like to compute various statistics
> >> including quantiles, but I can't find an efficient way of
>
> [snip]
>
> >> I would like to end up with a data frame like this:
> >>
> >>   a x         0%        25%
> >> 1 a x -0.7727268  0.1693188
> >> 2 a y -0.3410671  0.1566322
> >> 3 b y -0.2914710 -0.2677410
> >> 4 b z -0.8502875 -0.6505710
>
> [snip]
>
> > One can use
> >
> >    do.call("rbind", by(df, list(a = a, x = x), f))
> >
> > where f is the appropriate function.
> >
> > In this case f can be described in terms of df.quantile which
> > is like quantile except it returns a one row data frame:
> >
> >    df.quantile <- function(x,p)
> >        as.data.frame(t(data.matrix(quantile(x, p))))
> >
> >    f <- function(df, p = c(0.25, 0.5))
> >        cbind(df[1,1:2], df.quantile(df[,"r"], p))
> >
>
> Thanks! Just what I wanted.
>
> A minor point is that for some reason the row numbers in the
> final data frame are not sequential (see below -- this is not a
> consequence of my changes).

These are the original row numbers of the first row of each
combination of a and x. They are carried over because f builds each
output row from df[1,1:2], which keeps its row name from df. If z is
the result of do.call you can always do this:

   row.names(z) <- 1:nrow(z)

if this is needed.
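
For completeness, here is a rough end-to-end sketch on the toy data from the original post, showing where the row names come from and how to reset them. The set.seed() call is my own addition (an assumption, only there to make the run reproducible); the rest is taken from the code quoted above.

```r
# Assumption: fixed seed, added only so the toy data are reproducible.
set.seed(1)

# Toy data from the original post.
a <- rep(c("a", "b"), c(6, 6))
x <- rep(c("x", "y", "z"), c(4, 4, 4))
df <- data.frame(a = a, x = x, r = rnorm(12))

# Like quantile(), but returns a one-row data frame.
df.quantile <- function(x, p)
    as.data.frame(t(data.matrix(quantile(x, p))))

# Per-subset summary: key columns from the first row, then the quantiles.
f <- function(df, p = c(0.25, 0.5))
    cbind(df[1, 1:2], df.quantile(df[, "r"], p))

# One row per non-empty (a, x) combination; empty combinations are dropped.
z <- do.call("rbind", by(df, list(a = a, x = x), f))

# Row names at this point are inherited from the first row of each subset.
print(row.names(z))

# Renumber sequentially.
row.names(z) <- 1:nrow(z)
print(z)
```

Assigning NULL instead, i.e. row.names(z) <- NULL, should also restore the default 1..n numbering.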