On Tue, 2006-12-12 at 15:34 -0800, George Nachman wrote:
> I have a data frame that looks like this:
>
> url          time  somethingirrelevant  visits
> www.foo.com  1:00  xxx                  100
> www.foo.com  1:00  yyy                  50
> www.foo.com  2:00  xyz                  25
> www.bar.com  1:00  xxx                  200
> www.bar.com  1:00  zzz                  200
> www.foo.com  2:00  xxx                  500
>
> I'd like to write some code that takes this as input and outputs
> something like this:
>
> url          time  total_visits
> www.foo.com  1:00  150
> www.foo.com  2:00  525
> www.bar.com  1:00  400
>
> In other words, I need to calculate the sum of visits for each unique
> tuple of (url,time).
>
> I can do it with this code, but it's very slow, and doesn't seem like
> the right approach:
>
> keys = list()
> getkey = function(m,cols,index) { paste(m[index,cols],collapse=",") }
> for (i in 1:nrow(data)) { keys[[getkey(data,1:2,i)]] = 0 }
> for (i in 1:nrow(data)) { keys[[getkey(data,1:2,i)]] =
> keys[[getkey(data,1:2,i)]] + data[i,4] }
>
> I'm sure there's a more functional-programming approach to this
> problem! Any ideas?

See ?aggregate

If your dataframe is called 'DF':

> aggregate(DF$visits, list(DF$url, DF$time), sum)
      Group.1 Group.2   x
1 www.bar.com    1:00 400
2 www.foo.com    1:00 150
3 www.foo.com    2:00 525

HTH,

Marc Schwartz

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
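
For anyone who wants to run the suggestion end to end, here is a minimal sketch. It assumes the example data from the question is rebuilt by hand as 'DF'. The names given to the grouping list (url, time) and the output column name total_visits are choices made for this illustration, not part of the reply above. The formula call at the end is an alternative form of aggregate() that current versions of R accept; it should give the same three rows.

```r
# Rebuild the example data from the question (column names as posted).
DF <- data.frame(
  url = c("www.foo.com", "www.foo.com", "www.foo.com",
          "www.bar.com", "www.bar.com", "www.foo.com"),
  time = c("1:00", "1:00", "2:00", "1:00", "1:00", "2:00"),
  somethingirrelevant = c("xxx", "yyy", "xyz", "xxx", "zzz", "xxx"),
  visits = c(100, 50, 25, 200, 200, 500),
  stringsAsFactors = FALSE
)

# Same call as in the reply, but with a named grouping list so the
# result has url/time columns instead of Group.1/Group.2.
totals <- aggregate(DF$visits, list(url = DF$url, time = DF$time), sum)

# The summed column is called 'x' by default; rename it (name is our choice).
names(totals)[3] <- "total_visits"
print(totals)

# Alternative: the formula interface sums 'visits' within each
# (url, time) combination and keeps the original column names.
totals2 <- aggregate(visits ~ url + time, data = DF, FUN = sum)
print(totals2)
```

Either form replaces the two explicit loops with a single call, and the irrelevant column is simply left out of the grouping.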