Re: [R] mergeing a large number of large .csvs

Benjamin Caldwell Sat, 03 Nov 2012 01:10:29 -0700

Jeff,
If you're willing to educate, I'd be happy to learn what wide vs long
format means. I'll give rbind a shot in the meantime.
Ben
On Nov 2, 2012 4:31 PM, "Jeff Newmiller" <jdnew...@dcn.davis.ca.us> wrote:


> I would first confirm that you need the data in wide format... many
> algorithms are more efficient in long format anyway, and rbind is way more
> efficient than merge.
>
> If you feel this is not negotiable, you may want to consider sqldf. Yes,
> you need to learn a bit of SQL, but it is very well integrated into R.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> Benjamin Caldwell <btcaldw...@berkeley.edu> wrote:
>
> >Dear R help;
> >I'm currently trying to combine a large number (about 30 x 30) of large
> >.csvs (each at least 10000 records). They are organized by plots, hence
> >30 x 30, with each group of csvs in a folder which corresponds to the
> >plot. The unmerged csvs all have the same number of columns (5). The fifth
> >column has a different name for each csv. The number of rows is different.
> >
> >The combined csvs are of course quite large, and the code I'm running is
> >quite slow. I'm currently running it on a computer with 10 GB RAM, an SSD,
> >and a quad-core 2.3 GHz processor; it has taken 8 hours and is only 75% of
> >the way through (it has been hung up on one of the largest data groupings
> >for an hour now, and is using 3.5 GB of RAM).
> >
> >I know that R isn't the most efficient way of doing this, but I'm not
> >familiar with SQL or C. I wonder if anyone has suggestions for a different
> >way to do this in the R environment. For instance, the key function now is
> >merge, but I haven't tried join from the plyr package or rbind from base.
> >I'm willing to provide a Dropbox link to a couple of these files if you'd
> >like to see the data. My code is as follows:
> >
> >
> ># multmerge is based on code by Tony Cookson:
> ># http://www.r-bloggers.com/merging-multiple-data-files-into-one-data-frame/
> ># The function takes a path. This path should be the name of a folder that
> ># contains all of the files you would like to read and merge together, and
> ># only those files.
> >
> >multmerge <- function(mypath) {
> >  filenames <- list.files(path = mypath, full.names = TRUE)
> >  # read every file in the folder into a list of data frames
> >  datalist <- try(lapply(filenames,
> >                         function(x) read.csv(file = x, header = TRUE)))
> >  # full outer merge of the whole list, two data frames at a time
> >  try(Reduce(function(x, y) merge(x, y, all = TRUE), datalist))
> >}
> >
> ># this function renames the columns using a fixed list and outputs a .csv
> >
> >merepk <- function(path, nf.name) {
> >  output <- multmerge(mypath = path)
> >  name <- c("x", "y", "z", "depth", "amplitude")
> >  try(names(output) <- name)
> >  write.csv(output, nf.name)
> >}
> >
> ># assumes all folders are in the same directory, with nothing else there
> >
> >merge.by.folder <- function(folderpath) {
> >  foldernames <- list.files(path = folderpath)
> >  setwd(folderpath)  # the output .csv files are written here
> >  for (i in seq_along(foldernames)) {
> >    path <- file.path(folderpath, foldernames[i])
> >    nf.name <- paste(foldernames[i], ".csv", sep = "")
> >    merepk(path, nf.name)
> >  }
> >}
> >
> >folderpath <- "yourpath"
> >
> >merge.by.folder(folderpath)
> >
> >
> >Thanks for looking, and happy Friday!
> >
> >
> >
> >*Ben Caldwell*
> >
> >PhD Candidate
> >University of California, Berkeley
> >
> >______________________________________________
> >R-help@r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
>
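On the wide vs. long question raised at the top of this message, a minimal sketch in base R may help. It is not code from either poster. The two small data frames stand in for two of the per-plot files, and the fifth-column names (amp_A, amp_B) and the helper to_long() are hypothetical names invented for the illustration.

```r
# Hypothetical stand-ins for two files from one plot: the first four columns
# match, the fifth column has a different name in each file.
a <- data.frame(x = 1:2, y = 3:4, z = 5:6, depth = c(10, 20),
                amp_A = c(100, 110))
b <- data.frame(x = 1:3, y = 3:5, z = 5:7, depth = c(10, 20, 30),
                amp_B = c(200, 210, 220))

# Wide format: merge() matches rows on the shared columns and keeps one
# value column per source file, filling gaps with NA.
wide <- merge(a, b, all = TRUE)

# Long format: give the fifth column a common name, record where each row
# came from in a separate column, then stack the rows with rbind().
to_long <- function(df) {
  src <- names(df)[5]          # remember the original fifth-column name
  names(df)[5] <- "amplitude"
  df$source <- src
  df
}
long <- do.call(rbind, lapply(list(a, b), to_long))

print(wide)
print(long)
```

In the wide result every additional file adds a column, and each merge has to match rows against everything merged so far. In the long result every additional file only adds rows, which is why stacking with rbind tends to be much cheaper than repeated merging.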

