Re: [R] Repeated Aggregation with data.table
I still haven't come up with a solution to the question below, and I have another one. I frequently find myself in a situation where I have the list of columns I want to aggregate over in the form of a vector of strings, and I have to do something like the following:

    dat[, list(mean.z = mean(z)), by = eval(parse(text = sprintf("list(%s)", paste(x, collapse = ","))))]

I think that's a pretty ugly solution (although it does work), but I haven't come up with anything better. Any suggestions?

Thanks.

- Elliot

On Tue, Sep 11, 2012 at 11:33 AM, Elliot Joel Bernstein <elliot.bernst...@fdopartners.com> wrote:

> I've been using this setup:
>
>     flist <- expression(list(mean.z = mean(z), sd.z = sd(z)))
>     dat[, eval(flist), list(x)]
>
> It works great, but there's one small catch. If I do something like
>
>     flist <- expression(list(x.per.y = sum(x) / sum(y)))
>     dat[, eval(flist), list(y)]
>
> it does the wrong thing, because sum(y) in each group is just the common value, rather than that value times the length of the group. Is there any way around this? Obviously I could rewrite the expression if I know I'm going to be grouping by y, but I'd like it to be generic.
>
> Thanks.
>
> - Elliot

--
Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC
134 Mount Auburn Street | Cambridge, MA | 02138
Phone: (617) 503-4619 | Email: elliot.bernst...@fdopartners.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
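For the question about grouping columns held in a vector of strings, the following is a minimal sketch rather than an answer given in the thread. It assumes a data.table release in which `by` accepts a character vector of column names, which is documented for current versions. The name `group.cols` and the seed are illustrative assumptions; the vector is deliberately not called `x`, because a name that matches a column of `dat` would be read as that column.

```r
library(data.table)

set.seed(42)  # arbitrary seed, only so the illustration is reproducible
dat <- data.table(x = sample(3, 100, replace = TRUE),
                  y = sample(3, 100, replace = TRUE),
                  z = rnorm(100))

# Hypothetical name for the vector of grouping columns. It must not clash
# with a column of dat, otherwise `by` would pick up the column instead.
group.cols <- c("x", "y")

# `by` takes the character vector of column names directly, so no
# eval(parse(...)) construction is needed.
res.chr <- dat[, list(mean.z = mean(z)), by = group.cols]

# The same aggregation with the grouping columns spelled out, for comparison.
res.lst <- dat[, list(mean.z = mean(z)), by = list(x, y)]
```

Both calls should produce the same groups in the same order, so the string-building step can be dropped whenever the column names are already available as a character vector.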
Re: [R] Repeated Aggregation with data.table
I've been using this setup:

    flist <- expression(list(mean.z = mean(z), sd.z = sd(z)))
    dat[, eval(flist), list(x)]

It works great, but there's one small catch. If I do something like

    flist <- expression(list(x.per.y = sum(x) / sum(y)))
    dat[, eval(flist), list(y)]

it does the wrong thing, because sum(y) in each group is just the common value, rather than that value times the length of the group. Is there any way around this? Obviously I could rewrite the expression if I know I'm going to be grouping by y, but I'd like it to be generic.

Thanks.

- Elliot
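One possible workaround for the sum(y) catch, offered as a sketch and not as something proposed in the thread: group on a copy of the grouping column, so that the original column stays at full length inside each group. The helper `agg_keep_by` and the column name `group.key` are hypothetical, the helper handles a single grouping column only, and it assumes the table has no existing column called `group.key`.

```r
library(data.table)

# Hypothetical helper (not from the thread): aggregate dt by the single column
# named in by.col while keeping that column at full length inside each group.
agg_keep_by <- function(dt, expr, by.col) {
  tmp <- copy(dt)                                   # leave the caller's table untouched
  set(tmp, j = "group.key", value = tmp[[by.col]])  # duplicate the grouping column
  out <- tmp[, eval(expr), by = group.key]          # group on the duplicate
  setnames(out, "group.key", by.col)                # restore the original column name
  out
}

# Small hand-checkable table: group y == 1 holds x = 1, 2 and group y == 2 holds x = 3, 4.
dat <- data.table(x = c(1, 2, 3, 4),
                  y = c(1, 1, 2, 2))

flist <- expression(list(x.per.y = sum(x) / sum(y)))

res <- agg_keep_by(dat, flist, "y")
# y == 1: (1 + 2) / (1 + 1) = 1.5
# y == 2: (3 + 4) / (2 + 2) = 1.75
```

The cost of this design is one extra copy of the table per call, which may matter for large data. The stored expression itself stays unchanged, which is the generic behaviour asked for above.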
Re: [R] Repeated Aggregation with data.table
On Aug 7, 2012, at 9:28 PM, arun wrote:

> Hi,
> Try this:
>
>     fun1 <- function(x, .expr) {
>       .expr <- expression(list(mean.z = mean(z), sd.z = sd(z)))
>       z1 <- eval(.expr)
>     }
>
>     # or
>
>     fun1 <- function(x, .expr) {
>       .expr <- expression(list(mean.z = mean(z), sd.z = sd(z)))
>       z1 <- .expr
>     }
>
>     dat[, eval(z1), list(x)]
>     dat[, eval(z1), list(y)]
>     dat[, eval(z1), list(x, y)]

I'm not seeing the connection between those functions and the data.table call. (Running that code produces an error on my machine.) If the goal is to have an expression result, then just create it with expression(). In the example:

    flist <- expression(list(mean.z = mean(z), sd.z = sd(z)))
    dat[, eval(flist), list(x)]

       x      mean.z     sd.z
    1: 2  0.04436034 1.039615
    2: 3 -0.06354504 1.077686
    3: 1 -0.08879671 1.066916

--
David.

David Winsemius, MD
Alameda, CA, USA
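The stored-expression approach can be sketched end to end as follows. This is an illustration under stated assumptions, not code from the thread: it uses `quote()` where the message above uses `expression()` (either form can be passed to `eval()` in `j`), and the seed and the result names `by.x`, `by.y`, `by.xy` and `inline.x` are made up for the example.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the illustration is reproducible
dat <- data.table(x = sample(3, 100, replace = TRUE),
                  y = sample(3, 100, replace = TRUE),
                  z = rnorm(100))

# Store the aggregation once as an unevaluated call.
flist <- quote(list(mean.z = mean(z), sd.z = sd(z)))

# Reuse it with different groupings; eval() hands the stored call to j.
by.x  <- dat[, eval(flist), by = x]
by.y  <- dat[, eval(flist), by = y]
by.xy <- dat[, eval(flist), by = list(x, y)]

# Writing the list out inline gives the same result as the stored version.
inline.x <- dat[, list(mean.z = mean(z), sd.z = sd(z)), by = x]
```

Keeping the call in a variable means the summary columns are defined in one place, so adding a statistic later only requires editing `flist`.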
Re: [R] Repeated Aggregation with data.table
Hi David,

Thanks for testing. It's a bit strange: yesterday the function was working perfectly, but today it is not working on my system. Not sure what happened.

A.K.

----- Original Message -----
From: David Winsemius <dwinsem...@comcast.net>
To: arun <smartpink...@yahoo.com>
Cc: Elliot Joel Bernstein <elliot.bernst...@fdopartners.com>; R help <r-help@r-project.org>
Sent: Wednesday, August 8, 2012 9:17 AM
Subject: Re: [R] Repeated Aggregation with data.table
Re: [R] Repeated Aggregation with data.table
On Tue, Aug 7, 2012 at 4:36 PM, Elliot Joel Bernstein <elliot.bernst...@fdopartners.com> wrote:

> I have been using ddply to do aggregation, and I frequently define a single aggregation function that I use to aggregate over different groups. For example,
>
>     require(plyr)
>     dat <- data.frame(x = sample(3, 100, replace = TRUE), y = sample(3, 100, replace = TRUE), z = rnorm(100))
>     f <- function(x) { data.frame(mean.z = mean(x$z), sd.z = sd(x$z)) }
>     ddply(dat, "x", f)
>     ddply(dat, "y", f)
>     ddply(dat, c("x", "y"), f)
>
> I recently discovered the data.table package, which dramatically speeds up the aggregation:
>
>     require(data.table)
>     dat <- data.table(dat)
>     dat[, list(mean.z = mean(z), sd.z = sd(z)), list(x)]
>     dat[, list(mean.z = mean(z), sd.z = sd(z)), list(y)]
>     dat[, list(mean.z = mean(z), sd.z = sd(z)), list(x, y)]
>
> But I can't figure out how to save the aggregation function list(mean.z = mean(z), sd.z = sd(z)) as a variable that I can reuse, similar to the function f above. Can someone please explain how to do that?

One exceptionally kludgy way:

    zzz <- expression(list(mean.z = mean(z), sd.z = sd(z)))
    dat[, eval(zzz), list(x, y)]

Michael

> Thanks.
>
> - Elliot
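An alternative that stays closer to the ddply pattern is an ordinary R function that returns a named list and is called in `j`. This is a sketch and not something suggested in the thread; the function name `f`, the seed, and the result names are assumptions for illustration. A plain function call in `j` may bypass some of data.table's internal optimizations, so it could be slower than the inline list on large data.

```r
library(data.table)

set.seed(7)  # arbitrary seed, only so the illustration is reproducible
dat <- data.table(x = sample(3, 100, replace = TRUE),
                  y = sample(3, 100, replace = TRUE),
                  z = rnorm(100))

# Reusable aggregation: takes the per-group vector and returns a named list,
# whose names become the column names of the result.
f <- function(z) list(mean.z = mean(z), sd.z = sd(z))

by.fun  <- dat[, f(z), by = list(x, y)]

# The inline version from the original post, for comparison.
by.expr <- dat[, list(mean.z = mean(z), sd.z = sd(z)), by = list(x, y)]
```

Compared with a stored expression, the function makes its inputs explicit: it only sees the vector passed to it, so it does not depend on which column names happen to be visible inside the group.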
Re: [R] Repeated Aggregation with data.table
Hi,

Try this:

    fun1 <- function(x, .expr) {
      .expr <- expression(list(mean.z = mean(z), sd.z = sd(z)))
      z1 <- eval(.expr)
    }

    # or

    fun1 <- function(x, .expr) {
      .expr <- expression(list(mean.z = mean(z), sd.z = sd(z)))
      z1 <- .expr
    }

    dat[, eval(z1), list(x)]
    dat[, eval(z1), list(y)]
    dat[, eval(z1), list(x, y)]

A.K.

----- Original Message -----
From: Elliot Joel Bernstein <elliot.bernst...@fdopartners.com>
To: r-help@r-project.org
Cc:
Sent: Tuesday, August 7, 2012 5:36 PM
Subject: [R] Repeated Aggregation with data.table

I have been using ddply to do aggregation, and I frequently define a single aggregation function that I use to aggregate over different groups. For example,

    require(plyr)
    dat <- data.frame(x = sample(3, 100, replace = TRUE), y = sample(3, 100, replace = TRUE), z = rnorm(100))
    f <- function(x) { data.frame(mean.z = mean(x$z), sd.z = sd(x$z)) }
    ddply(dat, "x", f)
    ddply(dat, "y", f)
    ddply(dat, c("x", "y"), f)

I recently discovered the data.table package, which dramatically speeds up the aggregation:

    require(data.table)
    dat <- data.table(dat)
    dat[, list(mean.z = mean(z), sd.z = sd(z)), list(x)]
    dat[, list(mean.z = mean(z), sd.z = sd(z)), list(y)]
    dat[, list(mean.z = mean(z), sd.z = sd(z)), list(x, y)]

But I can't figure out how to save the aggregation function list(mean.z = mean(z), sd.z = sd(z)) as a variable that I can reuse, similar to the function f above. Can someone please explain how to do that?

Thanks.

- Elliot