Re: [R] Repeated Aggregation with data.table

2012-09-21 Thread Elliot Joel Bernstein
I still haven't come up with a solution to the question below, and I have
another one. I frequently find myself in a situation where I have the list
of columns I want to aggregate over in the form of a vector of strings, and
I have to do something like the following:

dat[, list(mean.z = mean(z)),
    by = eval(parse(text = sprintf("list(%s)", paste(x, collapse = ","))))]

I think that's a pretty ugly solution (although it does work), but I
haven't come up with anything better. Any suggestions?
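
A possible simplification, assuming a reasonably recent data.table release: by= also accepts a character vector of column names, so the eval(parse(...)) step may not be needed. In the sketch below, grp.cols is a made-up name for the vector of strings (called x in the snippet above), chosen so that it cannot be confused with the column x. The sample data mirror the dat from the original post.

```r
library(data.table)

set.seed(42)
dat <- data.table(x = sample(3, 100, replace = TRUE),
                  y = sample(3, 100, replace = TRUE),
                  z = rnorm(100))

# Hypothetical name for the vector of grouping columns; it is deliberately
# not called `x`, so it is not mistaken for the column `x`.
grp.cols <- c("x", "y")

# by = takes the character vector of column names as is
res <- dat[, list(mean.z = mean(z)), by = grp.cols]
print(res)
```

The same call works unchanged for grp.cols <- "x" or any other subset of the column names.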

Thanks.

- Elliot


-- 
Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC
134 Mount Auburn Street | Cambridge, MA | 02138
Phone: (617) 503-4619 | Email: elliot.bernst...@fdopartners.com


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Repeated Aggregation with data.table

2012-09-11 Thread Elliot Joel Bernstein
I've been using this setup:

 flist <- expression( list(mean.z = mean(z), sd.z = sd(z)) )
 dat[ , eval(flist), list(x)]

It works great, but there's one small catch. If I do something like

 flist <- expression(list(x.per.y = sum(x) / sum(y)))
 dat[, eval(flist), list(y)]

it does the wrong thing, because sum(y) in each group is just the common
value, rather than that value times the length. Is there any way around
this? Obviously I could rewrite the expression if I know I'm going to be
grouping by y, but I'd like it to be generic.
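
One way around this, offered as a sketch rather than an established idiom: inside j, a column used in by= is reduced to its single group value, so the idea is to group by copies of the grouping columns and leave the originals at full length. The helper agg.by and the ".grp" suffix are hypothetical names, and the approach copies the table, which may matter for large data.

```r
library(data.table)

set.seed(1)
dat <- data.table(x = sample(3, 100, replace = TRUE),
                  y = sample(3, 100, replace = TRUE),
                  z = rnorm(100))

flist <- expression(list(x.per.y = sum(x) / sum(y)))

# Hypothetical helper: group by copies of the columns named in `cols`,
# so the original columns keep their full per-group length inside j.
agg.by <- function(dt, expr, cols) {
  tmp <- copy(dt)                       # leave the caller's table untouched
  grp <- paste0(cols, ".grp")           # temporary names, assumed unused
  tmp[, (grp) := .SD, .SDcols = cols]   # duplicate the grouping columns
  out <- tmp[, eval(expr), by = grp]    # group by the copies
  setnames(out, grp, cols)              # restore the original column names
  out
}

res.y  <- agg.by(dat, flist, "y")
res.xy <- agg.by(dat, flist, c("x", "y"))
```

Here agg.by(dat, flist, "y") divides by the sum of y over all rows in each group, whereas dat[, eval(flist), by = y] divides by the single group value.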

Thanks.

- Elliot




Re: [R] Repeated Aggregation with data.table

2012-08-08 Thread David Winsemius


On Aug 7, 2012, at 9:28 PM, arun wrote:


Hi,

Try this:

fun1 <- function(x, .expr) {
  .expr <- expression(list(mean.z = mean(z), sd.z = sd(z)))
  z1 <- eval(.expr)
}

# or
fun1 <- function(x, .expr) {
  .expr <- expression(list(mean.z = mean(z), sd.z = sd(z)))
  z1 <- .expr
}


dat[, eval(z1), list(x)]
dat[, eval(z1), list(y)]
dat[, eval(z1), list(x, y)]



I'm not seeing the connection between those functions and the  
data.table call. (Running that code produces an error on my machine.)  
If the goal is to have an expression result then just create it with  
expression(). In the example:


 flist <- expression( list(mean.z = mean(z), sd.z = sd(z)) )
 dat[ , eval(flist), list(x)]
   x      mean.z     sd.z
1: 2  0.04436034 1.039615
2: 3 -0.06354504 1.077686
3: 1 -0.08879671 1.066916

--
David.



David Winsemius, MD
Alameda, CA, USA



Re: [R] Repeated Aggregation with data.table

2012-08-08 Thread arun
Hi David,

Thanks for testing.

It's a bit strange. Yesterday the function was working perfectly; today it
is not working on my system. I'm not sure what happened.
A.K.





Re: [R] Repeated Aggregation with data.table

2012-08-07 Thread R. Michael Weylandt
On Tue, Aug 7, 2012 at 4:36 PM, Elliot Joel Bernstein
elliot.bernst...@fdopartners.com wrote:
 I have been using ddply to do aggregation, and I frequently define a
 single aggregation function that I use to aggregate over different
 groups. For example,

 require(plyr)

 dat <- data.frame(x = sample(3, 100, replace = TRUE),
                   y = sample(3, 100, replace = TRUE),
                   z = rnorm(100))

 f <- function(x) { data.frame(mean.z = mean(x$z), sd.z = sd(x$z)) }

 ddply(dat, "x", f)
 ddply(dat, "y", f)
 ddply(dat, c("x", "y"), f)

 I recently discovered the data.table package, which dramatically
 speeds up the aggregation:

 require(data.table)
 dat <- data.table(dat)

 dat[, list(mean.z = mean(z), sd.z = sd(z)), list(x)]
 dat[, list(mean.z = mean(z), sd.z = sd(z)), list(y)]
 dat[, list(mean.z = mean(z), sd.z = sd(z)), list(x,y)]

 But I can't figure out how to save the aggregation function
 list(mean.z = mean(z), sd.z = sd(z)) as a variable that I can reuse,
 similar to the function f above. Can someone please explain how to
 do that?

One exceptionally kludgy way:

zzz <- expression(list(mean.z = mean(z), sd.z = sd(z)))

dat[, eval(zzz), list(x,y)]

Michael
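
An alternative that avoids eval() altogether, as a sketch: j accepts any call that returns a named list, so the aggregation can be stored as an ordinary function, much like f in the ddply version. The name agg.z is hypothetical, and the sample data mirror the dat from the question.

```r
library(data.table)

set.seed(1)
dat <- data.table(x = sample(3, 100, replace = TRUE),
                  y = sample(3, 100, replace = TRUE),
                  z = rnorm(100))

# Hypothetical helper, analogous to the ddply function `f`: it takes the
# column as an argument and returns a named list, one element per output column.
agg.z <- function(z) list(mean.z = mean(z), sd.z = sd(z))

by.x  <- dat[, agg.z(z), by = x]
by.y  <- dat[, agg.z(z), by = y]
by.xy <- dat[, agg.z(z), by = list(x, y)]
```

The column to summarise is passed explicitly, so the same function could be reused on a differently named column, for example agg.z(w) for a column w.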


 Thanks.

 - Elliot



Re: [R] Repeated Aggregation with data.table

2012-08-07 Thread arun
Hi,

Try this:

fun1 <- function(x, .expr) {
  .expr <- expression(list(mean.z = mean(z), sd.z = sd(z)))
  z1 <- eval(.expr)
}

# or
fun1 <- function(x, .expr) {
  .expr <- expression(list(mean.z = mean(z), sd.z = sd(z)))
  z1 <- .expr
}


dat[, eval(z1), list(x)]
dat[, eval(z1), list(y)]
dat[, eval(z1), list(x, y)]

A.K.
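
A note on why the code above may fail in a fresh session: z1 is assigned only inside fun1, and fun1 is never called, so nothing named z1 exists at top level unless it was left over from earlier work. A minimal sketch of what seems to be the intended idea follows; make.agg is a hypothetical name, and the sample data mirror the dat from the original post.

```r
library(data.table)

set.seed(1)
dat <- data.table(x = sample(3, 100, replace = TRUE),
                  y = sample(3, 100, replace = TRUE),
                  z = rnorm(100))

# Hypothetical rewrite of fun1: build the unevaluated expression and
# return it, instead of assigning it to a variable local to the function.
make.agg <- function() {
  expression(list(mean.z = mean(z), sd.z = sd(z)))
}

z1 <- make.agg()   # z1 now exists in the calling environment

res.x  <- dat[, eval(z1), list(x)]
res.y  <- dat[, eval(z1), list(y)]
res.xy <- dat[, eval(z1), list(x, y)]
```

The expression is only evaluated inside dat[...], where z refers to the rows of each group.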


