> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Davis, Brian
> Sent: Wednesday, March 14, 2012 2:28 PM
> To: r-help@R-project.org
> Subject: [R] Needing a better solution to a lookup problem.
> 
> I have a solution (actually a few) to this problem, but none are
> computationally efficient enough to be useful.  I'm hoping someone can
> enlighten me to a better solution.
> ...
> I have a solution that works reasonably well on small sets, but my current
> data set is ~100K snp entries, and my regions table has ~200K entries. I
> have ~1500 files to go through.
> 
> I haven't found a good way to efficiently solve this problem.  I've tried
> various versions of mapply/lapply, for loops, etc., which get the answer
> for small sets but take hours (per file) on my real data.  Bioconductor
> seemed like the obvious place to look, but my GoogleFu must not be that
> great.  I never found anything relevant.
> 
> Any ideas or pointers in the right direction would be greatly appreciated.

Consider using a database. PostgreSQL, for instance, can easily handle large
amounts of data and can restrict a result to only those rows that fall within
a certain subset. It requires some database and SQL knowledge, but it will pay
off. You can also query the data from R directly, using RODBC or a similar
package. Solve the lookup in the database and use R for the further analysis.
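
As a rough illustration of that idea, here is a minimal sketch of a range
lookup done in SQL. The original problem description is elided above, so
everything here is an assumption: the table and column names (snp, region,
chrom, pos, start_pos, end_pos) are hypothetical, the task is assumed to be
"find every region that contains each SNP position", and an in-memory SQLite
database stands in for PostgreSQL only so the example is self-contained. The
same join would be sent from R through RODBC or a similar package.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real PostgreSQL server.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table of SNP positions, one table of regions.
conn.executescript("""
CREATE TABLE snp    (snp_id TEXT, chrom TEXT, pos INTEGER);
CREATE TABLE region (region_id TEXT, chrom TEXT,
                     start_pos INTEGER, end_pos INTEGER);
-- An index on the join columns lets the database avoid scanning
-- every region for every SNP.
CREATE INDEX region_idx ON region (chrom, start_pos, end_pos);
""")

# Toy data, invented purely for illustration.
conn.executemany(
    "INSERT INTO snp VALUES (?, ?, ?)",
    [("rs1", "chr1", 150), ("rs2", "chr1", 500), ("rs3", "chr2", 150)],
)
conn.executemany(
    "INSERT INTO region VALUES (?, ?, ?, ?)",
    [("A", "chr1", 100, 200), ("B", "chr1", 120, 300), ("C", "chr2", 400, 600)],
)

# Range join: pair each SNP with every region on the same chromosome
# whose [start_pos, end_pos] interval contains the SNP position.
rows = conn.execute("""
    SELECT s.snp_id, r.region_id
    FROM snp AS s
    JOIN region AS r
      ON r.chrom = s.chrom
     AND s.pos BETWEEN r.start_pos AND r.end_pos
    ORDER BY s.snp_id, r.region_id
""").fetchall()

print(rows)  # [('rs1', 'A'), ('rs1', 'B')]
conn.close()
```

With the real data, each of the ~1500 files would be loaded into the snp
table in turn, while the regions table is loaded and indexed only once.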

Mikhail

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
