Re: [R] converting by to a data.frame?

2003-06-09 Thread Spencer Graves
Hi, Don:

	  Thanks for your suggestion to use do.call in my get.Index.  I
discovered that your version produces cosmetically different
answers in R 1.6.2 and S-Plus 6.1 for Windows.  Fortunately, the
difference is unimportant in this context.  Since yours is faster, it is
clearly superior.

	  To check my understanding, I generalized my toy example as follows:

> by.df <- data.frame(A=rep(c("A1", "A2"), each=3),
+  B=rep(c("B1", "B2"), each=3), C=rep(c("C1", "C2"), each=3),
+  x=1:6, y=rep(0:1, length=6))

	  With this, your get.Index produced the following in R 1.6.2:

> get.Index <- function(x, INDICES) do.call('paste',c(x[INDICES],sep=':'))
> get.Index(by.df, c("A", "B", "C"))
[1] "A1:B1:C1" "A1:B1:C1" "A1:B1:C1" "A2:B2:C2" "A2:B2:C2" "A2:B2:C2"

In S-Plus 6.1 for Windows, I got the following:

> get.Index <- function(x, INDICES)
do.call("paste", c(x[INDICES], sep = ":"))
> get.Index(by.df, c("A", "B", "C"))
[1] "1:1:1" "1:1:1" "1:1:1" "2:2:2" "2:2:2" "2:2:2"

	  Fortunately, this difference is unimportant in this context, as 
by.to.data.frame produces the same answer in both cases.  Moreover, 
your answer converts to a single call to paste, which means that it 
should be faster.  For someone who understands do.call, your version 
is also easier to read.
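
The S-Plus result above looks as if the factor columns were pasted as their integer codes rather than their labels. A minimal sketch of one way to make the key independent of that, assuming the index columns are factors: convert each column with as.character before the single call to paste. The name get.Index.chr is hypothetical, and this has only been reasoned through for R, not checked in S-Plus.

```r
# Toy data with explicit factor columns (an assumption for illustration).
by.df <- data.frame(A = factor(rep(c("A1", "A2"), each = 3)),
                    B = factor(rep(c("B1", "B2"), each = 3)),
                    C = factor(rep(c("C1", "C2"), each = 3)),
                    x = 1:6, y = rep(0:1, length.out = 6))

# Convert every index column to character first, then make one paste() call
# via do.call, so the labels (not the integer codes) end up in the key.
get.Index.chr <- function(x, INDICES)
  do.call("paste", c(lapply(x[INDICES], as.character), sep = ":"))

idx <- get.Index.chr(by.df, c("A", "B", "C"))
print(idx)
```

The extra lapply costs one pass over the index columns but still leaves a single vectorized paste.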

Thanks again for your help.
Spencer Graves
##
Don MacQueen wrote:
 Glad to hear it was helpful.

 You can also use the do.call trick for the paste indices business.

 Try
 get.Index <- function(x, INDICES)
   do.call('paste',c(x[INDICES],sep=':'))

 This works because a data frame is actually a list, albeit a special
 kind of list, and do.call() wants a list for its second arg.

 -Don
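
A minimal sketch of the point about data frames being lists, using a hypothetical two-column data frame df: c() applied to a data frame plus a named sep element yields an ordinary list, and do.call passes each element of that list to paste as a separate argument.

```r
# Hypothetical toy data frame; the names df and args are illustrative only.
df <- data.frame(A = c("A1", "A2"), B = c("B1", "B2"),
                 stringsAsFactors = FALSE)

# A data frame is a list of equal-length columns.
print(is.list(df))

# c() on a data frame returns a plain list; 'sep' becomes a named element.
args <- c(df, sep = ":")
print(names(args))

# do.call("paste", args) is then the same as paste(df$A, df$B, sep = ":").
via.do.call <- do.call("paste", args)
direct      <- paste(df$A, df$B, sep = ":")
print(identical(via.do.call, direct))
```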

Re: [R] converting by to a data.frame?

2003-06-08 Thread Spencer Graves
Thanks to Thomas Lumley, Sundar Dorai-Raj, and Don MacQueen for their
suggestions.  I need the INDICES as part of the output data.frame, which
MacQueen's solution provided.  I generalized his method as follows:

by.to.data.frame <-
function(x, INDICES, FUN){
# Split data.frame x on x[,INDICES]
# and lapply FUN to each data.frame subset,
# returning a data.frame
#
#  Internal functions
   get.Index <- function(x, INDICES){
     Ind <- as.character(x[,INDICES[1]])
     k <- length(INDICES)
     if(k > 1)
       Ind <- paste(Ind, get.Index(x, INDICES[-1]), sep=":")
     Ind
   }
   FUN2 <- function(data., INDICES, FUN){
     vec <- FUN(data.)
     Vec <- matrix(vec, nrow=1)
     dimnames(Vec) <- list(NULL, names(vec))
     cbind(data.[1,INDICES], Vec)
   }
#   Combine INDICES
   Ind <- get.Index(x, INDICES)
#   Apply ...:  Do the work.
   Split <- split(x, Ind)
   byFits <- lapply(Split, FUN2, INDICES, FUN)
#   Convert to a data.frame
   do.call('rbind',byFits)
}

Applying this to my toy problem produces the following:

> by.df <- data.frame(A=rep(c("A1", "A2"), each=3),
+  B=rep(c("B1", "B2"), each=3), x=1:6, y=rep(0:1, length=6))

> by.to.data.frame(by.df, c("A", "B"), function(data.)coef(lm(y~x, data.)))
       A  B (Intercept)             x
A1:B1 A1 B1       0.333 -1.517960e-16
A2:B2 A2 B2       0.667  3.282015e-16

Thanks for the assistance.  I can now tackle the real problem that 
generated this question.
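
A compact variant of the same split / lapply / rbind idea, offered only as a sketch: split() accepts a list of grouping columns directly, so the pasted key can be skipped, and drop = TRUE discards combinations that never occur. The object names pieces, rows and fits are illustrative, and the row labels come out as "A1.B1" rather than "A1:B1".

```r
by.df <- data.frame(A = rep(c("A1", "A2"), each = 3),
                    B = rep(c("B1", "B2"), each = 3),
                    x = 1:6, y = rep(0:1, length.out = 6))
INDICES <- c("A", "B")

# Split on the index columns themselves; drop = TRUE removes the empty
# A2/B1 and A1/B2 combinations.
pieces <- split(by.df, by.df[INDICES], drop = TRUE)

# One-row data frame per subset: the index columns plus the coefficients.
rows <- lapply(pieces, function(d)
  cbind(d[1, INDICES], t(coef(lm(y ~ x, data = d)))))

# Stack the one-row pieces into a single data frame.
fits <- do.call("rbind", rows)
print(fits)
```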

Best Wishes,
Spencer Graves

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help





[R] converting by to a data.frame?

2003-06-06 Thread Spencer Graves
Dear R-Help:

	  I want to (a) subset a data.frame by several columns, (b) fit a model
to each subset, and (c) store a vector of results from the fit in the
columns of a data.frame.  In the past, I've used "for" loops to do this.
Is there a way to use "by"?

	  Consider the following example:

> byFits <- by(by.df, list(A=by.df$A, B=by.df$B),
+  function(data.)coef(lm(y~x, data.)))
> byFits
A: A1
B: B1
 (Intercept)             x
    3.33e-01 -1.517960e-16

A: A2
B: B1
NULL

A: A1
B: B2
NULL

A: A2
B: B2
 (Intercept)            x
    6.67e-01 3.282015e-16


Desired output:

data.frame(A=c("A1","A2"), B=c("B1", "B2"),
  .Intercept.=c(1/3, 2/3), x=c(-1.5e-16, 3.3e-16))

What's the simplest way to do this?
Thanks,
Spencer Graves


Re: [R] converting by to a data.frame?

2003-06-06 Thread Thomas Lumley
On Thu, 5 Jun 2003, Spencer Graves wrote:

 What's the simplest way to do this?

do.call(rbind, byFits)


-thomas



Re: [R] converting by to a data.frame?

2003-06-06 Thread Spencer Graves
Hi, Thomas, et al.:

Thanks for the reply.  Unfortunately, do.call strips off the subset 
identifiers, which I want to use for further modeling:

> do.call(rbind, byFits)
     (Intercept)              x
[1,]       0.333 -1.517960e-016
[2,]       0.667  3.282015e-016

The following does what I want using a for loop:

> by.df <- data.frame(A=rep(c("A1", "A2"), each=3),
+  B=rep(c("B1", "B2"), each=3), x=1:6, y=rep(0:1, length=6))
> by.lvls <- paste(as.character(by.df$A), as.character(by.df$B), sep=":")
> A.B <- unique(by.lvls)
> Fits <- data.frame(A.B = A.B, .Intercept.=rep(NA, length(A.B)),
+  x=rep(NA, length(A.B)))
> Fits$A <- substring(A.B, 1, regexpr(":", A.B)-1)
> Fits$B <- substring(A.B, regexpr(":", A.B)+1)
> for(i in 1:length(A.B))
+  Fits[i, 2:3] <- coef(lm(y~x, by.df[by.lvls==A.B[i],]))
> Fits
    A.B X.Intercept.             x  A  B
1 A1:B1        0.333 -1.517960e-16 A1 B1
2 A2:B2        0.667  3.282015e-16 A2 B2

	  I wondered if there was something easier.
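
One possible route that stays with by, given as a sketch under assumptions rather than a tested recipe for the versions above: the object returned by by is a list array whose dimnames hold the group labels, so the identifiers can be rebuilt with expand.grid and lined up with the non-NULL cells. The names cells, ids, keep and fits are illustrative.

```r
by.df <- data.frame(A = rep(c("A1", "A2"), each = 3),
                    B = rep(c("B1", "B2"), each = 3),
                    x = 1:6, y = rep(0:1, length.out = 6))

byFits <- by(by.df, list(A = by.df$A, B = by.df$B),
             function(data.) coef(lm(y ~ x, data.)))

# Drop the "by" class to get a plain list array with dim and dimnames.
cells <- unclass(byFits)

# One row of labels per cell, in the same order as the array (A varies fastest).
ids <- expand.grid(dimnames(cells))

# Empty A/B combinations hold NULL; keep only the cells that were fitted.
keep <- !sapply(cells, is.null)

# Bind the labels to the stacked coefficient vectors.
fits <- cbind(ids[keep, ], do.call("rbind", cells[keep]))
print(fits)
```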

Thanks again for your reply.
Spencer Graves


Re: [R] converting by to a data.frame?

2003-06-06 Thread Don MacQueen
Since I don't have your by.df to test with I may not have it exactly 
right, but something along these lines should work:

byFits <- lapply(split(by.df,paste(by.df$A,by.df$B)),
                 FUN=function(data.) {
                   tmp <- coef(lm(y~x,data.))
                   data.frame(A=unique(data.$A),
                              B=unique(data.$B),
                              intercept=tmp[1],
                              slope=tmp[2])
                 })
byFitsDF <- do.call('rbind',byFits)

That's assuming I've got all the closing parentheses in the right 
places, since my email software (Eudora) doesn't do R syntax checking!

This approach can get rather slow if by.df is big, or when the 
computations in FUN are extensive (or both).

If by.df$A has mode character (as opposed to being a factor), then 
replacing A=unique(data.$A) with A=I(unique(data.$A)) might improve 
performance. You want to avoid character to factor conversions when 
using an approach like this.
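
A sketch of the I() variant described above, run on the toy data from this thread. The by.df definition is repeated here so the block is self-contained, and stringsAsFactors = FALSE is set explicitly as an assumption so the index columns start out as character. In R 4.0.0 and later data.frame() no longer converts strings to factors by default, so I() matters mainly on older versions.

```r
by.df <- data.frame(A = rep(c("A1", "A2"), each = 3),
                    B = rep(c("B1", "B2"), each = 3),
                    x = 1:6, y = rep(0:1, length.out = 6),
                    stringsAsFactors = FALSE)

byFits <- lapply(split(by.df, paste(by.df$A, by.df$B)),
                 FUN = function(data.) {
                   tmp <- coef(lm(y ~ x, data.))
                   # I() marks the columns "as is", so data.frame() leaves
                   # them as character instead of converting to factor.
                   data.frame(A = I(unique(data.$A)),
                              B = I(unique(data.$B)),
                              intercept = tmp[1],
                              slope = tmp[2])
                 })

# Stack the one-row data frames.
byFitsDF <- do.call("rbind", byFits)
print(byFitsDF)
print(is.factor(byFitsDF$A))
```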

-Don



--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA


Re: [R] converting by to a data.frame?

2003-06-06 Thread Sundar Dorai-Raj
Spencer,
  Would sapply be better here?
R> by.df <- data.frame(A=rep(c("A1", "A2"), each=3),
R+ B=rep(c("B1", "B2"), each=3),
R+ x=1:6, y=rep(0:1, length=6))
R> t(sapply(split(by.df, do.call("paste", c(by.df[, 1:2], sep = ":"))),
R+  function(x) coef(lm(y ~ x, data = x))))
      (Intercept)             x
A1:B1       0.333 -1.517960e-16
A2:B2       0.667  3.282015e-16

Sundar
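
The sapply result carries the subset labels only as row names. A minimal sketch of turning them back into A and B columns, assuming the ":" separator never occurs inside a level name; the names key, coefs, ids and fits are illustrative.

```r
by.df <- data.frame(A = rep(c("A1", "A2"), each = 3),
                    B = rep(c("B1", "B2"), each = 3),
                    x = 1:6, y = rep(0:1, length.out = 6))

# Pasted key and coefficient matrix, as in the sapply call above.
key   <- do.call("paste", c(by.df[, 1:2], sep = ":"))
coefs <- t(sapply(split(by.df, key),
                  function(d) coef(lm(y ~ x, data = d))))

# Split each row name at ":" and stack the pieces into a two-column matrix.
ids <- do.call("rbind", strsplit(rownames(coefs), ":", fixed = TRUE))
colnames(ids) <- c("A", "B")

# check.names = FALSE keeps the "(Intercept)" column name unchanged.
fits <- data.frame(ids, coefs, check.names = FALSE)
print(fits)
```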
