Hi: The other day I ran 100K simulations, each of which returned a 20 x 4 data frame. I stored these in a list object. When attempting to rbind them into a single large data frame, my first thought was to try plyr:

library(plyr)
bigD <- ldply(L, rbind)   # where L is the list object

I quit at around a half hour. Ditto for do.call(rbind, L). [Sorry, I didn't time it - these are approximate times.] I then checked to see if the data.table package could do this, and lo and behold, I discovered the rbindlist() function. When applied to my list object, it ran correctly in under a second. Here's the actual example with some names changed to mask the application:

g <- gs[1:100000]   # gs is a list of lists
> length(g)
[1] 100000
> class(g)
[1] "list"
> dim(g[[1]])
[1] 20  4
> dim(g[[100000]])
[1] 20  4
> library(data.table)
> system.time(bigD <- rbindlist(g))
   user  system elapsed
   0.45    0.02    0.47
> dim(bigD)
[1] 2000000       4
> class(bigD)
[1] "data.table" "data.frame"

Dennis

On Tue, Feb 26, 2013 at 7:05 PM, David Kulp <dk...@fiksu.com> wrote:
> On Feb 26, 2013, at 9:33 PM, Anika Masters <anika.mast...@gmail.com> wrote:
>
>> Thanks Arun and David. Another issue I am running into are memory
>> issues when one of the data frames I'm trying to rbind to or merge
>> with are "very large". (This is a repetitive problem, as I am trying
>> to merge/rbind thousands of small dataframes into a single "very
>> large" dataframe.)
>>
>> I'm thinking of creating a function that creates an empty dataframe to
>> which I can add data, but will need to first determine and ensure that
>> each dataframe has the exact same columns, in the exact same
>> "location".
>>
>> Before I write any new code, is there any pre-existing functions or
>> code that might solve this problem of "merging small or medium sized
>> dataframes with a "very large" dataframe.)
>
> Consider plyr. Memory issues can be a problem, but it's a piece of
> cake to write a one liner that iterates over a list of data frames and
> returns them all rbind'd together. Or just: do.call(rbind,
> list.of.data.frames).
>
> If memory is a serious problem then I think it's best to write your
> own code that appends each row by index - which avoids copying entire
> data frames in memory.
>
>>
>> On Tue, Feb 26, 2013 at 2:00 PM, David L Carlson <dcarl...@tamu.edu> wrote:
>>> Clumsy but it doesn't require any packages:
>>>
>>> merge2 <- function(x, y) {
>>>   if(all(union(names(x), names(y)) == intersect(names(x), names(y)))){
>>>     rbind(x, y)
>>>   } else merge(x, y, all=TRUE)
>>> }
>>> merge2(df1, df2)
>>> df3 <- df1
>>> merge2(df1, df3)
>>>
>>> ----------------------------------------------
>>> David L Carlson
>>> Associate Professor of Anthropology
>>> Texas A&M University
>>> College Station, TX 77843-4352
>>>
>>>> -----Original Message-----
>>>> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of arun
>>>> Sent: Tuesday, February 26, 2013 1:14 PM
>>>> To: Anika Masters
>>>> Cc: R help
>>>> Subject: Re: [R] merging or joining 2 dataframes: merge, rbind.fill,
>>>> etc.?
>>>>
>>>> Hi,
>>>>
>>>> You could also try:
>>>> library(gtools)
>>>> smartbind(df2,df1)
>>>> #  a  b  d
>>>> #1 7 99 12
>>>> #2 7 99 12
>>>>
>>>> When df1!=df2
>>>> smartbind(df1,df2)
>>>> #   a  b  d  x  y  c
>>>> #1  7 99 12 NA NA NA
>>>> #2 NA 34 88 12 44 56
>>>> A.K.
>>>>
>>>> ----- Original Message -----
>>>> From: Anika Masters <anika.mast...@gmail.com>
>>>> To: r-help@r-project.org
>>>> Cc:
>>>> Sent: Tuesday, February 26, 2013 1:55 PM
>>>> Subject: [R] merging or joining 2 dataframes: merge, rbind.fill, etc.?
>>>>
>>>> #I want to "merge" or "join" 2 dataframes (df1 & df2) into a 3rd
>>>> (mydf). I want the 3rd dataframe to contain 1 row for each row in df1
>>>> & df2, and all the columns in both df1 & df2. The solution should
>>>> "work" even if the 2 dataframes are identical, and even if the 2
>>>> dataframes do not have the same column names. The rbind.fill function
>>>> seems to work.
>>>> For learning purposes, are there other "good" ways to
>>>> solve this problem, using merge or other functions other than
>>>> rbind.fill?
>>>>
>>>> #e.g. These 3 examples all seem to "work" correctly and as I hoped:
>>>>
>>>> df1 <- data.frame(matrix(data=c(7, 99, 12) , nrow=1 , dimnames =
>>>> list( NULL , c('a' , 'b' , 'd') ) ) )
>>>> df2 <- data.frame(matrix(data=c(88, 34, 12, 44, 56) , nrow=1 ,
>>>> dimnames = list( NULL , c('d' , 'b' , 'x' , 'y', 'c') ) ) )
>>>> mydf <- merge(df2, df1, all.y=T, all.x=T)
>>>> mydf
>>>>
>>>> #e.g. this works:
>>>> library(reshape)
>>>> mydf <- rbind.fill(df1, df2)
>>>> mydf
>>>>
>>>> #This works:
>>>> library(reshape)
>>>> mydf <- rbind.fill(df1, df2)
>>>> mydf
>>>>
>>>> #But this does not (the 2 dataframes are identical)
>>>> df1 <- data.frame(matrix(data=c(7, 99, 12) , nrow=1 , dimnames =
>>>> list( NULL , c('a' , 'b' , 'd') ) ) )
>>>> df2 <- df1
>>>> mydf <- merge(df2, df1, all.y=T, all.x=T)
>>>> mydf
>>>>
>>>> #Any way to get "merge" to work for this final example? Any other good
>>>> solutions?
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
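
P.S. For the original question in this thread (stacking data frames whose columns may or may not match), here is a minimal base-R sketch of the "pad the missing columns with NA, then rbind" idea. The function name bind_fill is hypothetical and not from any package; it is only meant to illustrate the approach that rbind.fill and smartbind take, and it makes no attempt at the speed of rbindlist(). It assumes every element of the list is a data frame with at least one row.

```r
# bind_fill: hypothetical helper, not part of any package.
# Pads each data frame with NA columns so that all share the same
# column set, puts the columns in a common order, then row-binds.
bind_fill <- function(dfs) {
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    # Add each missing column, filled with NA
    for (m in setdiff(all_cols, names(d))) {
      d[[m]] <- rep(NA, nrow(d))
    }
    # Reorder so every data frame has identical column positions
    d[all_cols]
  })
  do.call(rbind, padded)
}

# The two data frames from the original post
df1 <- data.frame(a = 7, b = 99, d = 12)
df2 <- data.frame(d = 88, b = 34, x = 12, y = 44, c = 56)

mydf <- bind_fill(list(df1, df2))   # different columns
same <- bind_fill(list(df1, df1))   # identical data frames
```

Unlike merge(), this never collapses identical rows, so the "two identical data frames" case keeps both rows. Because it still ends in do.call(rbind, ...), it should be expected to slow down badly on very long lists, as described at the top of this message.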