On Thu, Jul 10, 2014 at 2:16 PM, Steve Lianoglou <[email protected]>
wrote:

> Hi,
>
> On Thu, Jul 10, 2014 at 1:52 PM, Vincent Carey
> <[email protected]> wrote:
> > A new, more inclusive GWAS catalog is available (GRASP, from Andrew
> > Johnson at NHLBI), with 6 million records and voluminous metadata
> > (though it seems sparse and could perhaps be trimmed/reshaped).
> >
> > I made a GRanges and it takes 3 minutes to load. Even after stripping
> > all the metadata, a GRanges with 6 million records takes 20 seconds to
> > load. That's probably acceptable, but a managed, chromosome-specific
> > distribution might come closer to interactive availability.
> >
> > The metadata would probably be best kept in SQLite. It occurred to me
> > to consider an arrangement in which the GRanges manages the ranges plus
> > a key into the database. Range operations could then trigger queries
> > that retrieve metadata, and metadata queries in the database could
> > generate indices that retrieve the matching ranges.
> >
> > Is anyone doing something along these lines?
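The arrangement Vincent describes can be sketched roughly as follows. This is a minimal Python illustration, not Bioconductor code: per-chromosome sorted lists searched with binary search stand in for a GRanges object, an in-memory SQLite table holds the metadata, and the class and method names (KeyedRanges, keys_overlapping, metadata_for, ranges_for_trait) and the single "trait" column are hypothetical.

```python
import bisect
import sqlite3


class KeyedRanges:
    """Hypothetical sketch: range coordinates live in memory, metadata lives
    in SQLite, and the two are linked by an integer key."""

    def __init__(self, records):
        # records: iterable of (key, chrom, start, end, trait); closed intervals.
        self.con = sqlite3.connect(":memory:")
        self.con.execute("CREATE TABLE meta (key INTEGER PRIMARY KEY, trait TEXT)")
        self.con.execute("CREATE INDEX meta_trait ON meta (trait)")
        self.ranges = {}  # key -> (chrom, start, end)
        by_chrom = {}     # chrom -> list of (start, end, key)
        for key, chrom, start, end, trait in records:
            self.ranges[key] = (chrom, start, end)
            by_chrom.setdefault(chrom, []).append((start, end, key))
            self.con.execute("INSERT INTO meta VALUES (?, ?)", (key, trait))
        self.con.commit()
        for rs in by_chrom.values():
            rs.sort()
        self.by_chrom = by_chrom
        self.starts = {c: [s for s, _, _ in rs] for c, rs in by_chrom.items()}
        # The widest interval bounds how far left of a query an overlap can start.
        self.max_width = max(
            (e - s for rs in by_chrom.values() for s, e, _ in rs), default=0
        )

    def keys_overlapping(self, chrom, qstart, qend):
        """Range operation -> keys, via binary search on sorted starts."""
        if chrom not in self.starts:
            return []
        starts, rs = self.starts[chrom], self.by_chrom[chrom]
        lo = bisect.bisect_left(starts, qstart - self.max_width)
        hi = bisect.bisect_right(starts, qend)
        return sorted(key for s, e, key in rs[lo:hi] if e >= qstart)

    def metadata_for(self, keys):
        """Keys -> metadata rows, via a SQL query."""
        if not keys:
            return []
        marks = ",".join("?" * len(keys))
        sql = f"SELECT key, trait FROM meta WHERE key IN ({marks}) ORDER BY key"
        return self.con.execute(sql, list(keys)).fetchall()

    def ranges_for_trait(self, trait):
        """Metadata query -> keys -> matching ranges held in memory."""
        rows = self.con.execute(
            "SELECT key FROM meta WHERE trait = ? ORDER BY key", (trait,)
        )
        return [(key, *self.ranges[key]) for (key,) in rows]
```

Only coordinates and keys are held in memory here; a real implementation would presumably use an actual GRanges and an on-disk SQLite file, and whether that loads faster than the full GRanges would need to be measured.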
>
> You might consider just stuffing it all in the database.
>
> SQLite supports R-trees, which are spatial indexes, so in theory you
> could get fast overlap queries baked in without needing a parallel
> GRanges object to index into the database:
> http://www.sqlite.org/rtree.html
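Steve's all-in-the-database suggestion can be sketched with Python's built-in sqlite3 module, assuming the linked SQLite library was compiled with the R*Tree module (common, but not guaranteed). The sketch uses rtree_i32 so base-pair positions are stored as exact 32-bit integers rather than rounded floats, encodes the chromosome as a second index dimension, and keeps metadata in an ordinary table joined on the shared id. The table and column names and the integer chromosome codes are hypothetical.

```python
import sqlite3


def build_catalog(records):
    """Load (id, chrom_code, start, end, trait, pvalue) tuples into SQLite.

    Positions go into an R*Tree index; metadata goes into an ordinary table
    that shares the same integer id.
    """
    records = list(records)
    con = sqlite3.connect(":memory:")
    # rtree_i32 keeps coordinates as exact 32-bit integers (plain rtree
    # rounds them to 32-bit floats). Columns: id, then min/max per dimension.
    con.execute(
        "CREATE VIRTUAL TABLE snp_index USING rtree_i32("
        "id, chrom_lo, chrom_hi, pos_lo, pos_hi)"
    )
    con.execute(
        "CREATE TABLE snp_meta (id INTEGER PRIMARY KEY, trait TEXT, pvalue REAL)"
    )
    # The chromosome is stored as a degenerate dimension (lo == hi).
    con.executemany(
        "INSERT INTO snp_index VALUES (?, ?, ?, ?, ?)",
        [(rid, chrom, chrom, start, end) for rid, chrom, start, end, _, _ in records],
    )
    con.executemany(
        "INSERT INTO snp_meta VALUES (?, ?, ?)",
        [(rid, trait, p) for rid, _, _, _, trait, p in records],
    )
    con.commit()
    return con


def overlaps(con, chrom, qstart, qend):
    """Return metadata for records overlapping chrom:qstart-qend (closed)."""
    sql = """
        SELECT m.id, m.trait, m.pvalue
        FROM snp_index AS i JOIN snp_meta AS m ON m.id = i.id
        WHERE i.chrom_lo <= ? AND i.chrom_hi >= ?
          AND i.pos_lo <= ? AND i.pos_hi >= ?
        ORDER BY m.id
    """
    return con.execute(sql, (chrom, chrom, qend, qstart)).fetchall()
```

Each overlap query is then a single SQL statement that returns ranges and metadata together; how this compares with loading a 6-million-record GRanges is something to benchmark rather than assume.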
>
> Before the reboot of the GenomicFeatures package (we're talking around
> 2008/2009?), I was doing something like that for genomic annotations.
>
> The way Hadley has abstracted database access in dplyr, making a
> database look like a data.frame that responds to all the "data
> manipulation verbs" in the same way, makes me believe we can do the
> same: make the database look essentially like a GRanges / VRanges
> object and get cooking that way.
>
>
This would be useful and was part of the intent of DynamicGRanges in the
MutableRanges package (in svn for years but never released). A short-term
solution might be an indexed VCF. The parser in VariantAnnotation supports
several modes of restriction (e.g., by genomic region, fields, or samples)
that should enable efficient loading.

Michael


> Hopefully this answer was at least minimally aligned in the direction
> of what you were asking ;-)
>
> -steve
>
> --
> Steve Lianoglou
> Computational Biologist
> Genentech
>
> _______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
