Thanks to Thomas Lumley, Sundar Dorai-Raj, and Don MacQueen for their suggestions. I need the INDICES as part of the output data.frame, which MacQueen's solution provided. I generalized his method as follows:

by.to.data.frame <-
function(x, INDICES, FUN){
# Split data.frame x on the columns x[, INDICES],
# apply FUN to each data.frame subset,
# and return the results as one data.frame
#
#   Internal functions
    get.Index <- function(x, INDICES){
#       Paste the INDICES columns into one key, e.g. "A1:B1"
        Ind <- as.character(x[, INDICES[1]])
        k <- length(INDICES)
        if(k > 1)
            Ind <- paste(Ind, get.Index(x, INDICES[-1]), sep=":")
        Ind
    }
    FUN2 <- function(data., INDICES, FUN){
#       One row: the INDICES values followed by the results of FUN
        vec <- FUN(data.)
        Vec <- matrix(vec, nrow=1)
        dimnames(Vec) <- list(NULL, names(vec))
#       drop=FALSE keeps a data.frame even when INDICES has length 1
        cbind(data.[1, INDICES, drop=FALSE], Vec)
    }
#   Combine INDICES
    Ind <- get.Index(x, INDICES)
#   Apply FUN2 to each subset:  do the work.
    Split <- split(x, Ind)
    byFits <- lapply(Split, FUN2, INDICES, FUN)
#   Convert to a data.frame
    do.call('rbind', byFits)
}
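A more compact variant of the same split/lapply/rbind idea is sketched here. The name by_to_df is hypothetical and this is not the function posted above. It builds the composite key with one vectorized paste() over the INDICES columns instead of recursion, and keeps the index columns as a one-row data.frame with drop = FALSE, so a single grouping column also works.

```r
# Hypothetical variant of by.to.data.frame(); a sketch, not the posted code.
by_to_df <- function(x, INDICES, FUN) {
  # Composite key such as "A1:B1", built in one paste() call over the columns
  key <- do.call(paste, c(x[INDICES], sep = ":"))
  pieces <- lapply(split(x, key), function(d) {
    vals <- FUN(d)
    # drop = FALSE keeps a data.frame even when INDICES has length 1
    cbind(d[1, INDICES, drop = FALSE],
          matrix(vals, nrow = 1, dimnames = list(NULL, names(vals))))
  })
  # Stack the one-row data.frames into a single data.frame
  do.call(rbind, pieces)
}
```

For example, by_to_df(by.df, c("A", "B"), function(data.) coef(lm(y ~ x, data.))) should return the same rows as by.to.data.frame() on the toy problem.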

Applying this to my toy problem produces the following:

> by.df <- data.frame(A=rep(c("A1", "A2"), each=3),
+  B=rep(c("B1", "B2"), each=3), x=1:6, y=rep(0:1, length=6))
>
> by.to.data.frame(by.df, c("A", "B"), function(data.)coef(lm(y~x, data.)))
       A  B (Intercept)             x
A1:B1 A1 B1   0.3333333 -1.517960e-16
A2:B2 A2 B2   0.6666667  3.282015e-16

Thanks for the assistance. I can now tackle the real problem that generated this question.

Best Wishes,
Spencer Graves
########################################
Don MacQueen wrote:
Since I don't have your by.df to test with I may not have it exactly right, but something along these lines should work:

byFits <- lapply(split(by.df,paste(by.df$A,by.df$B)),
                 FUN=function(data.) {
                    tmp <- coef(lm(y~x,data.))
                    data.frame(A=unique(data.$A),
                               B=unique(data.$B),
                               intercept=tmp[1],
                               slope=tmp[2])
                   })

byFitsDF <- do.call('rbind',byFits)

That's assuming I've got all the closing parentheses in the right places, since my email software (Eudora) doesn't do R syntax checking!

This approach can get rather slow if by.df is big, or when the computations in FUN are extensive (or both).

If by.df$A has mode character (as opposed to being a factor), then replacing A=unique(data.$A) with A=I(unique(data.$A)) might improve performance. You want to avoid character to factor conversions when using an approach like this.
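In the same spirit, one way to reduce the per-group overhead is to split the row numbers rather than the data.frame, collect each fit's coefficients as a plain numeric vector, and build a single data.frame at the end, so no data.frame construction or character-to-factor conversion happens inside the loop. This is a sketch under the assumption that every fit returns the same two coefficients (FUN.VALUE = numeric(2)); it has not been benchmarked here.

```r
by.df <- data.frame(A = rep(c("A1", "A2"), each = 3),
                    B = rep(c("B1", "B2"), each = 3),
                    x = 1:6, y = rep(0:1, length.out = 6),
                    stringsAsFactors = FALSE)

# Split row numbers, not the data.frame itself
key <- paste(by.df$A, by.df$B, sep = ":")
idx <- split(seq_len(nrow(by.df)), key)

# One numeric vector per group; vapply() stacks them as columns, t() flips
coefs <- t(vapply(idx,
                  function(i) coef(lm(y ~ x, data = by.df[i, ])),
                  FUN.VALUE = numeric(2)))

# Key columns come from the first row of each group, built once at the end
first <- vapply(idx, function(i) i[1], FUN.VALUE = integer(1))
byFitsDF <- data.frame(by.df[first, c("A", "B")], coefs,
                       check.names = FALSE, row.names = NULL,
                       stringsAsFactors = FALSE)
```

check.names = FALSE keeps the coefficient name "(Intercept)" as it is instead of converting it to ".Intercept.".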

-Don


At 2:54 PM -0700 6/5/03, Spencer Graves wrote:


Dear R-Help:

I want to (a) subset a data.frame by several columns, (b) fit a model to each subset, and (c) store a vector of results from the fit in the columns of a data.frame. In the past, I've used "for" loops to do this. Is there a way to use "by"?

Consider the following example:

 > byFits <- by(by.df, list(A=by.df$A, B=by.df$B),
+  function(data.)coef(lm(y~x, data.)))
 > byFits
A: A1
B: B1
  (Intercept)             x
 3.333333e-01 -1.517960e-16
------------------------------------------------------------
A: A2
B: B1
NULL
------------------------------------------------------------
A: A1
B: B2
NULL
------------------------------------------------------------
A: A2
B: B2
 (Intercept)            x
6.666667e-01 3.282015e-16
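A by() result like this is stored as a list-array whose dimnames are the grouping levels, with NULL in the empty cells. A sketch of flattening it into one data.frame, assuming the toy by.df from the follow-up at the top of this thread, enumerates the cells with expand.grid() (the first factor varies fastest, matching the array's storage order), drops the NULL cells, and row-binds the rest.

```r
by.df <- data.frame(A = rep(c("A1", "A2"), each = 3),
                    B = rep(c("B1", "B2"), each = 3),
                    x = 1:6, y = rep(0:1, length.out = 6),
                    stringsAsFactors = FALSE)

byFits <- by(by.df, list(A = by.df$A, B = by.df$B),
             function(data.) coef(lm(y ~ x, data.)))

# One row per cell of the by() array, in the same column-major order
cells <- expand.grid(dimnames(byFits), stringsAsFactors = FALSE)
fits  <- unclass(byFits)
keep  <- !vapply(fits, is.null, logical(1))   # skip empty combinations
byFitsDF <- data.frame(cells[keep, , drop = FALSE],
                       do.call(rbind, fits[keep]),
                       check.names = FALSE, row.names = NULL)
```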



#############################
Desired output:

data.frame(A=c("A1","A2"), B=c("B1", "B2"),
    .Intercept.=c(1/3, 2/3), x=c(-1.5e-16, 3.3e-16))

What's the simplest way to do this?
Thanks,
Spencer Graves

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help



