... so...

#1 ... flexible syntax for split-apply-combine, not very efficient for large data
library(plyr)
ddply(Dat, c("A1", "A2"), function(DF) { data.frame(C1 = sum(DF$C1)) })

#2 ... compatible with large data on disk
library(sqldf)
sqldf("select A1, A2, sum(C1) as C1 from Dat group by A1, A2")

#3 ... better for large data in memory
library(data.table)
dtt <- data.table(Dat)
# speed for large data
setkeyv(dtt, c("A1", "A2"))
dtt[, list(C1 = sum(C1)), by = list(A1, A2)]

#4 ... package still under development, but potentially can support operations on data stored in memory or relational databases
library(dplyr)
Dat %>% group_by(A1, A2) %>% summarise(C1 = sum(C1))

---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnew...@dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On November 9, 2014 1:39:45 PM PST, William Dunlap <wdun...@tibco.com> wrote:
>> I tried with spilt() function. However it looks to me that, it can
>> split a data-frame w.r.t. only one column.
>
>(I assume you meant 'split', not 'spilt'.)
>
>You did not show what you tried, but the following splits Dat by its "A1"
>and "A2" columns (creating a list of data.frames):
>      split(Dat, f=Dat[,c("A1","A2")])
>
>aggregate(), in core R, combines the split and the lapply needed to
>calculate groupwise sums. E.g.,
>      aggregate(Dat$C1, by=Dat[,c("A1","A2")], FUN=sum)
>      aggregate(C1 ~ A1 + A2, data=Dat, FUN=sum)
>
>The plyr and dplyr packages have other ways to do this sort of thing.
>
>
>Bill Dunlap
>TIBCO Software
>wdunlap tibco.com
>
>On Sun, Nov 9, 2014 at 11:58 AM, Christofer Bogaso <bogaso.christo...@gmail.com> wrote:
>
>> Hi again,
>>
>> Let's say I have the following data frame:
>>
>> Dat <- structure(list(A1 = structure(c(3L, 3L, 1L, 3L, 3L, 3L, 3L, 2L,
>> 3L, 3L, 1L, 2L, 3L, 2L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L,
>> 3L, 2L, 3L, 1L, 1L, 3L), .Label = c("a", "b", "c"), class = "factor"),
>>     A2 = c(2, 3, 2, 1, 3, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2, 1,
>>     2, 1, 2, 1, 3, 3, 2, 1, 2, 3, 2, 2, 2), C1 = 1:30), .Names = c("A1",
>> "A2", "C1"), row.names = c(NA, -30L), class = "data.frame")
>>
>> Now my goal is:
>> 1. Find all possible unique combinations of column 'A1' & column 'A2'.
>> For example A1 = c, A2 = 2 is 1 unique combination.
>>
>> 2. For each such unique combination, calculate the sum of 'C1'.
>>
>> Is there any direct R function to achieve this in a faster way? I have a
>> very large data frame to handle with such a calculation.
>>
>> I tried with spilt() function. However it looks to me that, it can
>> split a data-frame w.r.t. only one column.
>>
>> Thanks for your suggestion
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
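
For anyone who wants to try the base R suggestions quoted above without installing any packages, here is a minimal sketch. It rebuilds the example data frame from the original question and checks that aggregate() and a two-column split() give the same group sums. The names agg, pieces, sums and key are only illustrative and do not appear in the thread; the drop = TRUE argument is my own addition so that A1/A2 combinations that never occur are left out.

```r
# Rebuild the example data frame from the original question.
Dat <- data.frame(
  A1 = factor(c(3L, 3L, 1L, 3L, 3L, 3L, 3L, 2L, 3L, 3L,
                1L, 2L, 3L, 2L, 1L, 1L, 3L, 3L, 2L, 3L,
                2L, 2L, 3L, 3L, 3L, 2L, 3L, 1L, 1L, 3L),
              levels = 1:3, labels = c("a", "b", "c")),
  A2 = c(2, 3, 2, 1, 3, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2,
         2, 1, 2, 1, 2, 1, 3, 3, 2, 1, 2, 3, 2, 2, 2),
  C1 = 1:30
)

# Approach A: formula interface to aggregate(), as suggested in the thread.
# Only A1/A2 combinations that occur in the data get a row.
agg <- aggregate(C1 ~ A1 + A2, data = Dat, FUN = sum)

# Approach B: split C1 by both columns at once, then sum each piece.
# Passing a list (here a data frame) as the grouping makes split() use the
# interaction of its columns; the pieces are named like "c.2".
pieces <- split(Dat$C1, Dat[, c("A1", "A2")], drop = TRUE)
sums <- sapply(pieces, sum)

# Line the two results up by group name and compare them.
key <- paste(agg$A1, agg$A2, sep = ".")
print(agg)
print(all(sums[key] == agg$C1))
```

This only demonstrates that the two base R routes agree on the small example; it says nothing about speed on a very large data frame, which is what the package-based alternatives at the top of this message are aimed at.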