Don't see that as being a big problem. If your data grows then dplyr supports connections to external databases. Alternately if you just want a mean, most databases can do that directly in SQL.
On Tue, May 24, 2016 at 4:17 PM, Matthew <mccorm...@molbio.mgh.harvard.edu> wrote: > Thank you very much, Tom. > This gets me thinking in the right direction. > One thing I should have mentioned that I did not is that the number of > rows in the data frame will be a little over 40,000 rows. > > > On 5/24/2016 4:08 PM, Tom Wright wrote: > > Using dplyr > > $ library(dplyr) > $ x<-data.frame(Length=c(321,350,340,180,198), > ID=c(rep('A234',3),'B123','B225') ) > $ x %>% group_by(ID) %>% summarise(m=mean(Length)) > > > > On Tue, May 24, 2016 at 3:46 PM, Matthew <mccorm...@molbio.mgh.harvard.edu > > wrote: > >> I have a data frame with 10 columns. >> In the last column is an alphaneumaric identifier. >> For most rows, this alphaneumaric identifier is unique to the file, >> however some of these alphanemeric idenitifiers occur in duplicate, >> triplicate or more. When they do occur more than once they are in >> consecutive rows, so when there is a duplicate or triplicate or >> quadruplicate (let's call them multiplicates), they are in consecutive rows. >> >> In column 7 there is an integer number (may or may not be unique. does >> not matter). >> >> I want to identify each multiple entries (multiplicates) occurring in >> column 10 and then for each multiplicate calculate the mean of the integers >> column 7. >> >> As an example, I will show just two columns: >> Length Identifier >> 321 A234 >> 350 A234 >> 340 A234 >> 180 B123 >> 198 B225 >> >> What I want to do (in the above example) is collapse all the A234's and >> report the mean to get this: >> Length Identifier >> 337 A234 >> 180 B123 >> 198 B225 >> >> >> Matthew >> >> ______________________________________________ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.