> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Davis, Brian
> Sent: Wednesday, March 14, 2012 2:28 PM
> To: r-help@R-project.org
> Subject: [R] Needing a better solution to a lookup problem.
>
> I have a solution (actually a few) to this problem, but none are
> computationally efficient enough to be useful. I'm hoping someone can
> enlighten me to a better solution.
> ...
> I have a solution that works reasonably well on small sets, but my current
> data set is ~100K snp entries, and my regions table has ~200K entries. I have
> ~1500 files to go through
>
> I haven't found a good way to efficiently solve this problem. I've tried
> various versions of mapply/lapply, for loops, etc which get the answer for
> small sets but takes hours (per file) on my real data. Bioconductor seemed
> like the obvious place to look, but my GoogleFu must not be that great. I
> never found anything relevant.
>
> Any ideas or points to the right direction would be greatly appreciated.

Consider using a database. For instance, PostgreSQL can easily handle large amounts of data and can restrict a query to only those rows that fall within a given subset. It requires some database and SQL knowledge, but it will pay off. You can query the data directly from R using RODBC or a similar package. Solve the lookup problem in the database and use R for further analysis.

Mikhail
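
To make the database suggestion concrete, here is a minimal sketch of how the lookup could be written as a single SQL join. Several things in it are assumptions, since the details of the original problem are elided above: it assumes each SNP has a chromosome and a position, each region has a chromosome plus start and end coordinates, and the goal is to find which regions contain each SNP. The table names, column names, and the `snps_in_regions` helper are hypothetical. SQLite (through Python's standard `sqlite3` module) stands in for PostgreSQL only so that the example is self-contained; the `SELECT` statement itself is plain SQL that should carry over with little or no change.

```python
import sqlite3


def snps_in_regions(snps, regions):
    """Return (snp_id, region_id) pairs where the SNP lies inside the region.

    snps:    iterable of (snp_id, chrom, pos)                     -- hypothetical layout
    regions: iterable of (region_id, chrom, start_pos, end_pos)   -- hypothetical layout
    """
    con = sqlite3.connect(":memory:")  # in-memory stand-in for a real database server
    con.execute("CREATE TABLE snp (snp_id TEXT, chrom TEXT, pos INTEGER)")
    con.execute(
        "CREATE TABLE region "
        "(region_id TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER)"
    )
    con.executemany("INSERT INTO snp VALUES (?, ?, ?)", snps)
    con.executemany("INSERT INTO region VALUES (?, ?, ?, ?)", regions)

    # An index on the join columns gives the query planner the option of
    # avoiding a full scan of the region table for every SNP.
    con.execute("CREATE INDEX region_idx ON region (chrom, start_pos, end_pos)")

    # The whole lookup is one range join: same chromosome, position within bounds.
    rows = con.execute(
        """
        SELECT s.snp_id, r.region_id
        FROM snp AS s
        JOIN region AS r
          ON r.chrom = s.chrom
         AND s.pos BETWEEN r.start_pos AND r.end_pos
        ORDER BY s.snp_id, r.region_id
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    snps = [("rs1", "1", 150), ("rs2", "1", 500), ("rs3", "2", 150)]
    regions = [
        ("geneA", "1", 100, 200),
        ("geneB", "2", 100, 200),
        ("geneC", "1", 140, 160),
    ]
    for snp_id, region_id in snps_in_regions(snps, regions):
        print(snp_id, region_id)
```

In a real setup the tables would be loaded once into the database, and R would only send the `SELECT` (for example through RODBC) and receive the matched pairs as a data frame. Whether this is fast enough at ~100K by ~200K rows per file depends on the database and its indexing, so it is worth timing on one file first.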