On Thu, Jul 10, 2014 at 2:16 PM, Steve Lianoglou <[email protected]> wrote:
> Hi,
>
> On Thu, Jul 10, 2014 at 1:52 PM, Vincent Carey
> <[email protected]> wrote:
> > a new, more inclusive GWAS catalog is available (GRASP, from Andrew Johnson
> > at NHLBI), with 6 million records and voluminous metadata (though it seems
> > sparse and perhaps can be trimmed/reshaped)
> >
> > i made a GRanges and it takes 3 minutes to load. even after stripping all
> > the metadata, a GRanges with 6 million records takes 20 seconds to load.
> > that's probably acceptable, but a managed chromosome-specific distribution
> > might be closer to interactive availability.
> >
> > the metadata probably would be best kept in SQLite. it occurred to me to
> > consider an arrangement in which we have the GRanges managing the ranges
> > and a key to the database. range operations can engender queries to
> > retrieve metadata, metadata queries in the db can generate indices to
> > retrieve matching ranges.
> >
> > is anyone doing something along these lines?
>
> You might consider just stuffing it all in the database.
>
> SQLite supports RTrees, which is a spatial index, so you could in
> theory get the fast overlap stuff baked in w/o a need to have a
> parallel GRanges object to index into the database:
> http://www.sqlite.org/rtree.html
>
> Before the reboot of the GenomicFeatures package (we're talking around
> 2008/2009?) I was doing something like that for genomic annotations.
>
> The way that Hadley has abstracted db access in dplyr to make a
> database look like a data.frame and respond to all the "data
> manipulation verbs" in the same way gives me inspiration to believe
> that we can do the same and make the database look essentially like a
> GRanges / VRanges object and get cooking that way.
>

This would be useful and was part of the intent of DynamicGRanges in the
MutableRanges package (in svn for years but never released). A short-term
solution might be an indexed VCF. The parser in VariantAnnotation supports
multiple modes of restriction that should enable efficient loading.

Michael

> Hopefully this answer was at least minimally aligned in the direction
> of what you were asking ;-)
>
> -steve
>
> --
> Steve Lianoglou
> Computational Biologist
> Genentech
>
> _______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
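
For anyone wanting to try the SQLite R*Tree idea quoted above, here is a minimal sketch in Python using only the standard sqlite3 module. It is not code from GRASP or from any Bioconductor package. The table names (range_idx, meta), the column names, the integer chromosome codes and the toy records are all assumptions made for illustration. It uses the rtree_i32 variant so that genomic positions are stored as integers rather than 32-bit floats, and it assumes the local SQLite build has the R*Tree module compiled in.

```python
import sqlite3


def build_db(records):
    """Load (chrom_code, start, end, phenotype, pvalue) tuples into SQLite.

    Ranges go into an integer R*Tree (hypothetical table `range_idx`);
    metadata goes into an ordinary table (`meta`) sharing the same id.
    """
    con = sqlite3.connect(":memory:")
    # Two dimensions: chromosome code and position, each stored as [lo, hi].
    con.execute(
        "CREATE VIRTUAL TABLE range_idx USING rtree_i32("
        "id, chrom_lo, chrom_hi, pos_lo, pos_hi)"
    )
    con.execute(
        "CREATE TABLE meta (id INTEGER PRIMARY KEY, phenotype TEXT, pvalue REAL)"
    )
    for i, (chrom, start, end, phenotype, pvalue) in enumerate(records, start=1):
        con.execute(
            "INSERT INTO range_idx VALUES (?, ?, ?, ?, ?)",
            (i, chrom, chrom, start, end),
        )
        con.execute("INSERT INTO meta VALUES (?, ?, ?)", (i, phenotype, pvalue))
    con.commit()
    return con


def overlaps(con, chrom, start, end):
    """Range query first: return records overlapping [start, end] with metadata."""
    return con.execute(
        "SELECT r.id, r.pos_lo, r.pos_hi, m.phenotype, m.pvalue "
        "FROM range_idx AS r JOIN meta AS m ON m.id = r.id "
        "WHERE r.chrom_lo <= ? AND r.chrom_hi >= ? "
        "AND r.pos_lo <= ? AND r.pos_hi >= ? "
        "ORDER BY r.id",
        (chrom, chrom, end, start),
    ).fetchall()


def ranges_for_phenotype(con, phenotype):
    """Metadata query first: return (chrom, start, end) for matching records."""
    return con.execute(
        "SELECT r.chrom_lo, r.pos_lo, r.pos_hi "
        "FROM meta AS m JOIN range_idx AS r ON r.id = m.id "
        "WHERE m.phenotype = ? ORDER BY r.id",
        (phenotype,),
    ).fetchall()


# Toy records, invented for illustration only.
TOY = [
    (1, 100, 100, "height", 1e-8),
    (1, 5000, 5000, "bmi", 3e-9),
    (2, 150, 150, "height", 2e-10),
]

if __name__ == "__main__":
    con = build_db(TOY)
    print(overlaps(con, 1, 50, 200))
    print(ranges_for_phenotype(con, "height"))
```

The two query functions correspond to the two directions described in the original question: a range operation that pulls back metadata, and a metadata query that pulls back the matching ranges. Whether this scales acceptably to 6 million records is not something the sketch demonstrates.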
