Thanks, everyone, for the help! Bill: This looks like a really great general solution, but unfortunately I get an error when I call sumUp:
Error in "[.data.frame"(dat, , key_list, drop = FALSE) : invalid subscript type I'm running 2.2.1 because that's the latest I can get on ubuntu...could that be the problem? On 12/12/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > Here is an elementary way of doing it: > > > dat > url time somethingirrelevant visits > 1 www.foo.com 1:00 xxx 100 > 2 www.foo.com 1:00 yyy 50 > 3 www.foo.com 2:00 xyz 25 > 4 www.bar.com 1:00 xxx 200 > 5 www.bar.com 1:00 zzz 200 > 6 www.foo.com 2:00 xxx 500 > > dat <- transform(dat, key = paste(url, time)) > > total_visits <- with(dat, tapply(visits, key, sum)) > > m <- match(names(total_visits), dat$key) > > tdat <- cbind(dat[m, c("url", "time")], total_visits) > > tdat > url time total_visits > 4 www.bar.com 1:00 400 > 1 www.foo.com 1:00 150 > 3 www.foo.com 2:00 525 > > > > This should not be too difficult to morph into a fairly general > function. Here's what I might do [warning: somewhat obscure code > follows] > > sumUp <- function(dat, key_list, sum_list) { > key <- with(dat, do.call("paste", dat[, key_list, drop = FALSE])) > totals <- as.matrix(sapply(dat[, sum_list, drop = FALSE], tapply, key, > sum)) > dimnames(totals)[[2]] <- paste("total", sum_list, sep = "_") > m <- match(dimnames(totals)[[1]], key) > cbind(dat[m, key_list, drop = FALSE], totals) > } > > check: > > > sumUp(dat, c("url", "time"), "visits") > url time total_visits > 4 www.bar.com 1:00 400 > 1 www.foo.com 1:00 150 > 3 www.foo.com 2:00 525 > > > sumUp(dat, "url", "visits") > url total_visits > 4 www.bar.com 400 > 1 www.foo.com 675 > > Question for the reader: why to you need 'drop = FALSE' (in three > places)? > > Bill Venables. > > -----Original Message----- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of George Nachman > Sent: Wednesday, 13 December 2006 9:35 AM > To: r-help@stat.math.ethz.ch > Subject: [R] How to sum one column in a data frame keyed on other > columns > > I have a data frame that looks like this: > > url time somethingirrelevant visits > www.foo.com 1:00 xxx 100 > www.foo.com 1:00 yyy 50 > www.foo.com 2:00 xyz 25 > www.bar.com 1:00 xxx 200 > www.bar.com 1:00 zzz 200 > www.foo.com 2:00 xxx 500 > > I'd like to write some code that takes this as input and outputs > something like this: > > url time total_vists > www.foo.com 1:00 150 > www.foo.com 2:00 525 > www.bar.com 1:00 400 > > In other words, I need to calculate the sum of visits for each unique > tuple of (url,time). > > I can do it with this code, but it's very slow, and doesn't seem like > the right approach: > > keys = list() > getkey = function(m,cols,index) { paste(m[index,cols],collapse=",") } > for (i in 1:nrow(data)) { keys[[getkey(data,1:2,i)]] = 0 } > for (i in 1:nrow(data)) { keys[[getkey(data,1:2,i)]] = > keys[[getkey(data,1:2,i)]] + data[i,4] } > > I'm sure there's a more functional-programming approach to this > problem! Any ideas? > > ______________________________________________ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.