Re: [R] converting by to a data.frame?
Hi, Don:

Thanks for your suggestion to use do.call in my get.Index. I discovered that your version actually produces cosmetically different answers in R 1.6.2 and S-Plus 6.1 for Windows. Fortunately, in the context, this difference was unimportant. Since yours should also be faster, it is clearly superior.

To check my understanding, I generalized my toy example as follows:

> by.df <- data.frame(A=rep(c("A1", "A2"), each=3),
+     B=rep(c("B1", "B2"), each=3), C=rep(c("C1", "C2"), each=3),
+     x=1:6, y=rep(0:1, length=6))

With this, your get.Index produced the following in R 1.6.2:

> get.Index <- function(x, INDICES) do.call('paste', c(x[INDICES], sep=':'))
> get.Index(by.df, c("A", "B", "C"))
[1] "A1:B1:C1" "A1:B1:C1" "A1:B1:C1" "A2:B2:C2" "A2:B2:C2" "A2:B2:C2"

In S-Plus 6.1 for Windows, I got the following:

> get.Index <- function(x, INDICES) do.call("paste", c(x[INDICES], sep = ":"))
> get.Index(by.df, c("A", "B", "C"))
[1] "1:1:1" "1:1:1" "1:1:1" "2:2:2" "2:2:2" "2:2:2"

Fortunately, this difference is unimportant in this context, as by.to.data.frame produces the same answer in both cases. Moreover, your answer converts to a single call to paste, which means that it should be faster. For someone who understands do.call, your version is also easier to read.

Thanks again for your help.
Spencer Graves

Don MacQueen wrote:

> Glad to hear it was helpful.
>
> You can also use the do.call trick for the paste indices business. Try
>
>   get.Index <- function(x, INDICES) do.call('paste', c(x[INDICES], sep=':'))
>
> This works because a data frame is actually a list, albeit a special kind of list, and do.call() wants a list for its second arg.
>
> -Don
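The do.call trick described above can be checked on the toy data. This is only a minimal sketch: it assumes the grouping columns are character vectors (hence the explicit stringsAsFactors = FALSE), and the names cols, args, key.docall and key.manual are made up for illustration.

```r
# Toy data from the thread, with the grouping columns kept as character.
by.df <- data.frame(A = rep(c("A1", "A2"), each = 3),
                    B = rep(c("B1", "B2"), each = 3),
                    C = rep(c("C1", "C2"), each = 3),
                    x = 1:6, y = rep(0:1, length.out = 6),
                    stringsAsFactors = FALSE)

cols <- c("A", "B", "C")

# by.df[cols] is a list of three columns; c() appends the named sep
# element, so do.call() effectively evaluates paste(A, B, C, sep = ":").
args <- c(by.df[cols], sep = ":")
key.docall <- do.call("paste", args)

# The same call written out by hand, for comparison.
key.manual <- paste(by.df$A, by.df$B, by.df$C, sep = ":")

stopifnot(identical(key.docall, key.manual))
print(unique(key.docall))
```

If the columns are factors and the labels rather than the internal codes are wanted on every system, one option is to convert first, as the recursive get.Index did: do.call("paste", c(lapply(by.df[cols], as.character), sep = ":")).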
Re: [R] converting by to a data.frame?
Thanks to Thomas Lumley, Sundar Dorai-Raj, and Don MacQueen for their suggestions. I need the INDICES as part of the output data.frame, which MacQueen's solution provided. I generalized his method as follows:

by.to.data.frame <-
function(x, INDICES, FUN){
# Split data.frame x on x[,INDICES]
# and lapply FUN to each data.frame subset,
# returning a data.frame
#
# Internal functions
    get.Index <- function(x, INDICES){
        Ind <- as.character(x[,INDICES[1]])
        k <- length(INDICES)
        if(k > 1)
            Ind <- paste(Ind, get.Index(x, INDICES[-1]), sep=":")
        Ind
    }
    FUN2 <- function(data., INDICES, FUN){
        vec <- FUN(data.)
        Vec <- matrix(vec, nrow=1)
        dimnames(Vec) <- list(NULL, names(vec))
        cbind(data.[1,INDICES], Vec)
    }
# Combine INDICES
    Ind <- get.Index(x, INDICES)
# Apply ...: Do the work.
    Split <- split(x, Ind)
    byFits <- lapply(Split, FUN2, INDICES, FUN)
# Convert to a data.frame
    do.call('rbind',byFits)
}

Applying this to my toy problem produces the following:

> by.df <- data.frame(A=rep(c("A1", "A2"), each=3),
+     B=rep(c("B1", "B2"), each=3), x=1:6, y=rep(0:1, length=6))
> by.to.data.frame(by.df, c("A", "B"), function(data.) coef(lm(y~x, data.)))
       A  B (Intercept)             x
A1:B1 A1 B1       0.333 -1.517960e-16
A2:B2 A2 B2       0.667  3.282015e-16

Thanks for the assistance. I can now tackle the real problem that generated this question.

Best Wishes,
Spencer Graves

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] converting by to a data.frame?
Dear R-Help:

I want to (a) subset a data.frame by several columns, (b) fit a model to each subset, and (c) store a vector of results from the fit in the columns of a data.frame. In the past, I've used for loops to do this. Is there a way to use by?

Consider the following example:

> byFits <- by(by.df, list(A=by.df$A, B=by.df$B),
+     function(data.) coef(lm(y~x, data.)))
> byFits
A: A1
B: B1
  (Intercept)             x
     3.33e-01 -1.517960e-16

A: A2
B: B1
NULL

A: A1
B: B2
NULL

A: A2
B: B2
  (Intercept)             x
     6.67e-01  3.282015e-16

# Desired output:

data.frame(A=c("A1","A2"), B=c("B1", "B2"),
    .Intercept.=c(1/3, 2/3), x=c(-1.5e-16, 3.3e-16))

What's the simplest way to do this?

Thanks,
Spencer Graves
Re: [R] converting by to a data.frame?
On Thu, 5 Jun 2003, Spencer Graves wrote:

> What's the simplest way to do this?

do.call("rbind", byFits)

-thomas
Re: [R] converting by to a data.frame?
Hi, Thomas, et al.:

Thanks for the reply. Unfortunately, do.call strips off the subset identifiers, which I want to use for further modeling:

> do.call("rbind", byFits)
     (Intercept)              x
[1,]       0.333 -1.517960e-016
[2,]       0.667  3.282015e-016

The following does what I want using a for loop:

> by.df <- data.frame(A=rep(c("A1", "A2"), each=3),
+     B=rep(c("B1", "B2"), each=3), x=1:6, y=rep(0:1, length=6))
> by.lvls <- paste(as.character(by.df$A), as.character(by.df$B), sep=":")
> A.B <- unique(by.lvls)
> Fits <- data.frame(A.B = A.B, .Intercept.=rep(NA, length(A.B)),
+     x=rep(NA, length(A.B)))
> Fits$A <- substring(A.B, 1, regexpr(":", A.B)-1)
> Fits$B <- substring(A.B, regexpr(":", A.B)+1)
> for(i in 1:length(A.B))
+     Fits[i, 2:3] <- coef(lm(y~x, by.df[by.lvls==A.B[i],]))
> Fits
    A.B X.Intercept.             x  A  B
1 A1:B1        0.333 -1.517960e-16 A1 B1
2 A2:B2        0.667  3.282015e-16 A2 B2

I wondered if there was something easier. Thanks again for your reply.

Spencer Graves
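One way to keep the identifiers while still using by is to read them back from the dimnames of the by object. The following is only a sketch, not something proposed in the thread: it assumes the by result is a list array whose empty cells are NULL (as in the printout above), and the names cells, fits, keep and Fits are illustrative.

```r
by.df <- data.frame(A = rep(c("A1", "A2"), each = 3),
                    B = rep(c("B1", "B2"), each = 3),
                    x = 1:6, y = rep(0:1, length.out = 6))

byFits <- by(by.df, list(A = by.df$A, B = by.df$B),
             function(data.) coef(lm(y ~ x, data.)))

# One row per cell of the A x B table, with A varying fastest, which is
# the order in which the by object stores its cells.
cells <- expand.grid(dimnames(byFits))

# Drop the class so that indexing behaves like an ordinary list.
fits <- unclass(byFits)

# Combinations with no rows hold NULL; skip them.
keep <- !vapply(fits, is.null, logical(1))

# Bind the identifiers to the stacked coefficient vectors.
Fits <- cbind(cells[keep, , drop = FALSE], do.call("rbind", fits[keep]))
print(Fits)
```

The result is a data.frame with columns A, B, (Intercept) and x, one row per non-empty combination.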
Re: [R] converting by to a data.frame?
Since I don't have your by.df to test with I may not have it exactly right, but something along these lines should work:

byFits <- lapply(split(by.df, paste(by.df$A, by.df$B)),
                 FUN=function(data.) {
                     tmp <- coef(lm(y~x, data.))
                     data.frame(A=unique(data.$A),
                                B=unique(data.$B),
                                intercept=tmp[1],
                                slope=tmp[2])
                 })

byFitsDF <- do.call('rbind', byFits)

That's assuming I've got all the closing parentheses in the right places, since my email software (Eudora) doesn't do R syntax checking!

This approach can get rather slow if by.df is big, or when the computations in FUN are extensive (or both).

If by.df$A has mode character (as opposed to being a factor), then replacing A=unique(data.$A) with A=I(unique(data.$A)) might improve performance. You want to avoid character to factor conversions when using an approach like this.

-Don

--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
Re: [R] converting by to a data.frame?
Spencer,

Would sapply be better here?

R> by.df <- data.frame(A=rep(c("A1", "A2"), each=3),
R+     B=rep(c("B1", "B2"), each=3),
R+     x=1:6, y=rep(0:1, length=6))
R> t(sapply(split(by.df, do.call("paste", c(by.df[, 1:2], sep = ":"))),
R+     function(x) coef(lm(y ~ x, data = x))))
      (Intercept)             x
A1:B1       0.333 -1.517960e-16
A2:B2       0.667  3.282015e-16
R>

Sundar
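The sapply result keeps the identifiers only as combined row names. If separate A and B columns are needed, they could be split back out of those row names. This is a sketch under the assumption that no level contains the separator ":" itself; the names key, coefs, ids and Fits are illustrative.

```r
by.df <- data.frame(A = rep(c("A1", "A2"), each = 3),
                    B = rep(c("B1", "B2"), each = 3),
                    x = 1:6, y = rep(0:1, length.out = 6))

# Combined grouping key, built with the do.call/paste idiom.
key <- do.call("paste", c(by.df[, 1:2], sep = ":"))

# One row of coefficients per group; row names are the keys.
coefs <- t(sapply(split(by.df, key),
                  function(d) coef(lm(y ~ x, data = d))))

# Split each "A:B" row name back into its two parts.
ids <- do.call("rbind", strsplit(rownames(coefs), ":", fixed = TRUE))

# check.names = FALSE keeps the "(Intercept)" column name unchanged.
Fits <- data.frame(A = ids[, 1], B = ids[, 2], coefs, check.names = FALSE)
print(Fits)
```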