Hi Pele,

On Wed, Sep 22, 2010 at 12:40 PM, Pele <drdi...@yahoo.com> wrote:
>
> Hi David - thanks for your suggestion, but I am trying to avoid doing any
> merging and sorting for this step because the real file I will be working
> with has about 20 million records. If I can get this loop or something
> similar to work will be good enough.
If that's the case, you might consider looking at the sqldf or data.table packages. data.table implements a data.frame-like object, and sqldf lets you run SQL queries against ordinary data.frames; both can do subsetting (and merging) rather quickly since they can use indexes over "keys" (columns) of the respective data.frame(s).

Subsetting "normal" data.frames in the scenario you describe involves a linear search, for every query, through the column(s) you are querying against, which can get slow as your data.frames get large.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
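
To illustrate the point about keyed subsetting versus a linear scan, here is a minimal sketch, assuming the data.table package is installed. The column names (id, value), the table size, and the ids being looked up are all made up for illustration and are not taken from the original data.

```r
library(data.table)

# Hypothetical data: 100,000 rows, each with a unique integer id.
set.seed(1)
n  <- 1e5
dt <- data.table(id = sample.int(n), value = rnorm(n))

# Sort the table by id and mark id as the key.
# Subsets on the key can then use binary search.
setkey(dt, id)

# Keyed lookup: fetch the rows for a few ids without scanning the column.
wanted <- c(42L, 1000L, 99999L)
hits   <- dt[.(wanted), nomatch = 0L]

# For comparison: the same rows from a plain data.frame,
# which tests every element of the id column.
df        <- as.data.frame(dt)
scan_hits <- df[df$id %in% wanted, ]
```

Both approaches return the same rows here; the difference should only become noticeable when the table is large and many lookups are made, since setkey() pays the sorting cost once up front.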