by.to.data.frame <-
function(x, INDICES, FUN){
# Split data.frame x on x[,INDICES]
# and lapply FUN to each data.frame subset,
# returning a data.frame
#
# Internal functions
  get.Index <- function(x, INDICES){
    Ind <- as.character(x[,INDICES[1]])
    k <- length(INDICES)
    if(k > 1)
      Ind <- paste(Ind, get.Index(x, INDICES[-1]), sep=":")
    Ind
  }
  FUN2 <- function(data., INDICES, FUN){
    vec <- FUN(data.)
    Vec <- matrix(vec, nrow=1)
    dimnames(Vec) <- list(NULL, names(vec))
    cbind(data.[1,INDICES], Vec)
  }
# Combine INDICES
  Ind <- get.Index(x, INDICES)
# Apply ...:  Do the work.
  Split <- split(x, Ind)
  byFits <- lapply(Split, FUN2, INDICES, FUN)
# Convert to a data.frame
  do.call('rbind',byFits)
}
Applying this to my toy problem produces the following:
> by.df <- data.frame(A=rep(c("A1", "A2"), each=3),
+  B=rep(c("B1", "B2"), each=3), x=1:6, y=rep(0:1, length=6))
>
> by.to.data.frame(by.df, c("A", "B"), function(data.)coef(lm(y~x, data.)))
       A  B (Intercept)             x
A1:B1 A1 B1   0.3333333 -1.517960e-16
A2:B2 A2 B2   0.6666667  3.282015e-16
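As a side note, the recursive get.Index helper above builds one "A1:B1"-style key per row. A minimal sketch of the same key built without recursion, assuming every column named in INDICES can be pasted as character; the name `key` is only illustrative and is not part of the original function:

```r
# Toy data from the post (same by.df as above)
by.df <- data.frame(A=rep(c("A1", "A2"), each=3),
                    B=rep(c("B1", "B2"), each=3),
                    x=1:6, y=rep(0:1, length=6))
INDICES <- c("A", "B")

# Paste the index columns together row by row, separated by ":".
# unname() keeps column names from being mistaken for paste() arguments.
key <- do.call(paste, c(unname(as.list(by.df[INDICES])), sep=":"))
key
```

The result can be passed to split() in place of Ind; whether it is clearer than the recursive version is a matter of taste.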
Thanks for the assistance. I can now tackle the real problem that generated this question.
Best Wishes,
Spencer Graves
########################################
Don MacQueen wrote:
Since I don't have your by.df to test with I may not have it exactly right, but something along these lines should work:
byFits <- lapply(split(by.df,paste(by.df$A,by.df$B)),
                 FUN=function(data.) {
                   tmp <- coef(lm(y~x,data.))
                   data.frame(A=unique(data.$A),
                              B=unique(data.$B),
                              intercept=tmp[1],
                              slope=tmp[2])
                 })
byFitsDF <- do.call('rbind',byFits)
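For reference, here is a sketch of the two steps above run end to end on the toy by.df from Spencer's reply. Nothing is added beyond that data definition; the fits are the same y~x regressions.

```r
# Toy data as defined in Spencer's reply
by.df <- data.frame(A=rep(c("A1", "A2"), each=3),
                    B=rep(c("B1", "B2"), each=3),
                    x=1:6, y=rep(0:1, length=6))

# One lm fit per (A, B) combination that actually occurs in the data;
# each fit is returned as a one-row data.frame
byFits <- lapply(split(by.df, paste(by.df$A, by.df$B)),
                 FUN=function(data.) {
                   tmp <- coef(lm(y~x, data.))
                   data.frame(A=unique(data.$A),
                              B=unique(data.$B),
                              intercept=tmp[1],
                              slope=tmp[2])
                 })

# Stack the one-row data.frames into a single data.frame
byFitsDF <- do.call('rbind', byFits)
byFitsDF
```

Because split() only creates groups for key values present in the data, the empty A2/B1 and A1/B2 cells that by() reports as NULL do not appear here.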
That's assuming I've got all the closing parentheses in the right places, since my email software (Eudora) doesn't do R syntax checking!
This approach can get rather slow if by.df is big, or when the computations in FUN are extensive (or both).
If by.df$A has mode character (as opposed to being a factor), then replacing A=unique(data.$A) with A=I(unique(data.$A)) might improve performance. You want to avoid character to factor conversions when using an approach like this.
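A small sketch of what I() changes, under one assumption that is mine and not from the message: stringsAsFactors=TRUE is set explicitly to reproduce the older default, since from R 4.0.0 onward data.frame() no longer converts character to factor unless asked. The names with.I and without.I are illustrative only.

```r
# A character (not factor) grouping column, as in the case described above
data. <- data.frame(A=c("A1", "A1", "A1"), x=1:3, stringsAsFactors=FALSE)

# With I(): the column is stored as-is and stays character
with.I <- data.frame(A=I(unique(data.$A)), stringsAsFactors=TRUE)

# Without I(): under the old default the column is converted to a factor
without.I <- data.frame(A=unique(data.$A), stringsAsFactors=TRUE)

is.character(with.I$A)   # TRUE: protected by I()
is.factor(without.I$A)   # TRUE: converted to factor
```

On R 4.0.0 or later with default settings the conversion does not happen in either case, so I() is mainly relevant on older versions.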
-Don
At 2:54 PM -0700 6/5/03, Spencer Graves wrote:
Dear R-Help:
I want to (a) subset a data.frame by several columns, (b) fit a model to each subset, and (c) store a vector of results from the fit in the columns of a data.frame. In the past, I've used "for" loops to do this. Is there a way to use "by"?
Consider the following example:
> byFits <- by(by.df, list(A=by.df$A, B=by.df$B),
+  function(data.)coef(lm(y~x, data.)))
> byFits
A: A1
B: B1
  (Intercept)             x
 3.333333e-01 -1.517960e-16
------------------------------------------------------------
A: A2
B: B1
NULL
------------------------------------------------------------
A: A1
B: B2
NULL
------------------------------------------------------------
A: A2
B: B2
 (Intercept)            x
6.666667e-01 3.282015e-16
#############################
Desired output:
data.frame(A=c("A1","A2"), B=c("B1", "B2"), .Intercept.=c(1/3, 2/3), x=c(-1.5e-16, 3.3e-16))
What's the simplest way to do this?
Thanks,
Spencer Graves
______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help