A long time ago I wrote this package on top of rhdf5, and I still use it often when I need to work with HDF5 files: https://github.com/benilton/rhdf5utils
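
For anyone who has not used rhdf5 directly, here is a minimal sketch of the kind of calls such a wrapper sits on. It uses only plain rhdf5 functions (h5createFile, h5createDataset, h5write, h5read), not the rhdf5utils API. The file name, the dataset name "assay", and the toy dimensions are made up for illustration.

```r
library(rhdf5)

# Hypothetical file and dataset names, toy dimensions (not the real 450k data)
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)

nr <- 1000L   # stand-in for probes
nc <- 20L     # stand-in for samples
h5createDataset(h5file, "assay", dims = c(nr, nc),
                storage.mode = "double", chunk = c(100L, nc), level = 6)

# Write the matrix one block of rows at a time, so the full matrix
# never has to exist in memory
for (start in seq(1L, nr, by = 100L)) {
  rows  <- start:(start + 99L)
  block <- matrix(as.numeric(rows), nrow = length(rows), ncol = nc)
  h5write(block, h5file, "assay", index = list(rows, NULL))
}

# Read back only the slice that is needed
slice <- h5read(h5file, "assay", index = list(1:5, 1:3))
dim(slice)
```

The point is only that reads and writes can be restricted to an index, which is what makes an on-disk assay workable. How that would be wired into assays() is a separate question.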

On Sat, Sep 19, 2015, 07:12 Morgan, Martin <martin.mor...@roswellpark.org> wrote:

> Two less baked ideas are
>
> https://github.com/PaulPyl/h5array
>
> which could be used in the assay() of SummarizedExperiment, and
>
> https://github.com/nhayden/h5robj
>
> which translates R objects to hdf5.
>
> Martin
>
> > -----Original Message-----
> > From: Bioc-devel [mailto:bioc-devel-boun...@r-project.org] On Behalf Of
> > Michael Lawrence
> > Sent: Friday, September 18, 2015 10:42 PM
> > To: Peter Haverty
> > Cc: Tim Triche, Jr.; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] SummarizedExperiment with alternate back end
> >
> > While it's useful (and often necessary) to store the big matrices out of
> > core, it would be convenient to store the metadata (the other components
> > of the object) along with the matrices. Something along the lines of
> > HDF5, but we would want to keep things abstract. Other options include
> > GDS (for genotypes), and of course most any database.
> >
> > On Fri, Sep 18, 2015 at 6:18 PM, Peter Haverty <haverty.pe...@gene.com>
> > wrote:
> >
> > > While we are on the topic, my GenoSet class will become a subclass of
> > > RangedSummarizedExperiment, rather than eSet, after this upcoming
> > > release. For this release both APIs work (colnames and sampleNames,
> > > etc.)
> > >
> > > I think the range-free SummarizedExperiment will be great. I've seen a
> > > lot of ExpressionSets with random, non-exprs stuff in the exprs slot
> > > for lack of something more appropriate.
> > >
> > > Pete
> > >
> > > ____________________
> > > Peter M. Haverty, Ph.D.
> > > Genentech, Inc.
> > > phave...@gene.com
> > >
> > > On Fri, Sep 18, 2015 at 6:09 PM, Ryan <r...@thompsonclan.org> wrote:
> > >
> > > > In the dev version, SummarizedExperiment has been split into
> > > > RangedSummarizedExperiment (equivalent to the current
> > > > SummarizedExperiment, with rowRanges) and SummarizedExperiment
> > > > (kind of like eSet, no rowRanges).
> > > > Given that eSet objects also support multiple assayData elements,
> > > > I believe the new SummarizedExperiment is pretty close to being
> > > > eSet with different method names. In fact, I wonder if eSet
> > > > could/should be reimplemented as a subclass of the new
> > > > SummarizedExperiment class.
> > > >
> > > > On 9/18/15 5:36 PM, Kasper Daniel Hansen wrote:
> > > >
> > > >> Interesting, thanks for the pointer.
> > > >>
> > > >> In light of the existing (and future) work on this, may I suggest
> > > >> an eSet-like class, but built using the technologies in
> > > >> SummarizedExperiment, i.e. a SummarizedExperiment without the
> > > >> rowRanges. I would very much like this for modern work using
> > > >> eSet-like containers. Not everything has ranges.
> > > >>
> > > >> Vince: I am not claiming that it is easy to work with; we have
> > > >> pains as well. But am I missing something or is the assay matrix
> > > >> only 2.3Gb?
> > > >>
> > > >> Best,
> > > >> Kasper
> > > >>
> > > >> On Fri, Sep 18, 2015 at 6:28 PM, Peter Haverty
> > > >> <haverty.pe...@gene.com> wrote:
> > > >>
> > > >>> Yes, bigmemoryExtras::BigMatrix and genoset::RleDataFrame() are
> > > >>> good tricks for reducing the size of your eSets and
> > > >>> SummarizedExperiments. Both object types can go into assayData
> > > >>> or assays. In fact, that's what they were designed for.
> > > >>>
> > > >>> At Genentech, we use these for our 2.5e6 x 1e3 rectangular data
> > > >>> from Illumina SNP arrays. We typically have ~6 such rectangular
> > > >>> objects in one eSet. With a mix of BigMatrix objects for point
> > > >>> estimates and RleDataFrames for segmented data, readRDS times are
> > > >>> quite reasonable.
> > > >>>
> > > >>> Pete
> > > >>>
> > > >>> ____________________
> > > >>> Peter M. Haverty, Ph.D.
> > > >>> Genentech, Inc.
> > > >>> phave...@gene.com
> > > >>>
> > > >>> On Fri, Sep 18, 2015 at 1:56 PM, Tim Triche, Jr.
> > > >>> <tim.tri...@gmail.com> wrote:
> > > >>>
> > > >>>> bigmemoryExtras (Peter Haverty's extensions to
> > > >>>> bigMemory/bigMatrix) can be handy for this, as it works well as
> > > >>>> a backend, especially if you go about splitting by chromosome as
> > > >>>> for CNV segmentation, DMR finding, etc. It's not as seamless as
> > > >>>> one might like, but it's the closest thing I've found.
> > > >>>>
> > > >>>> SciDb tries to implement a similar API, but for a distributed
> > > >>>> version of this where the data itself is in a columnar database
> > > >>>> and served on demand. I tried getting that up and running as a
> > > >>>> SummarizedExperiment backend, but did not succeed. I have
> > > >>>> previously shoveled all of the TCGA 450k data into one 7,000+
> > > >>>> column bigMatrix which serializes to about 14GB on disk.
> > > >>>>
> > > >>>> If you have any replicates in your 700+ samples, it's a good idea
> > > >>>> to keep their SNP calls in metadata(yourSE), although if you
> > > >>>> change names it needs to propagate into the dependent metadata.
> > > >>>> This is why I started monkeying around with linkedExperiments
> > > >>>> where those mappings are enforced; it's becoming more of an
> > > >>>> issue with the TARGET pediatric AML study, where there are
> > > >>>> numerous diagnosis-remission-relapse trios whose identity I wish
> > > >>>> to verify periodically. The SNPs on the 450k array are great for
> > > >>>> this purpose, but minfi doesn't really have a slot for them per
> > > >>>> se, so they live in metadata().
> > > >>>>
> > > >>>> --t
> > > >>>>
> > > >>>> On Fri, Sep 18, 2015 at 1:29 PM, Vincent Carey
> > > >>>> <st...@channing.harvard.edu> wrote:
> > > >>>>
> > > >>>>> i am dealing with ~700 450k arrays
> > > >>>>>
> > > >>>>> they are derived from one study, so it makes sense to think of
> > > >>>>> them holistically.
> > > >>>>>
> > > >>>>> both the load time and the memory consumption are not
> > > >>>>> satisfactory.
> > > >>>>>
> > > >>>>> has anyone worked on an object type that implements the
> > > >>>>> rangedSE API but has the assay data out of memory?
> > > >>>>>
> > > >>>>> > unix.time(load("wbmse.rda"))
> > > >>>>>    user  system elapsed
> > > >>>>>  30.131   2.396  61.036
> > > >>>>> > object.size(wbmse)
> > > >>>>> 124031032 bytes
> > > >>>>> > dim(wbmse)
> > > >>>>> [1] 485577    690
> > > >>>>> > object.size(assays(wbmse))
> > > >>>>> 2680430992 bytes
> > > >>>>>
> > > >>>>> _______________________________________________
> > > >>>>> Bioc-devel@r-project.org mailing list
> > > >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
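
On the size question raised in the thread: a quick back-of-the-envelope check, using only the dim() and object.size() numbers Vince posted, suggests the assays are close to what a single dense double matrix of those dimensions would take. The 8 bytes per value is an assumption (standard R doubles); I have not looked at the object itself.

```r
# Numbers copied from the quoted R session
n_probes  <- 485577
n_samples <- 690
reported  <- 2680430992          # object.size(assays(wbmse)), in bytes

# Assumption: one dense matrix of doubles, 8 bytes per value
bytes_dense <- n_probes * n_samples * 8
bytes_dense                      # 2680385040
bytes_dense / 2^30               # roughly 2.5 GiB

# What is left over after the raw values
reported - bytes_dense           # 45952
```

So nearly all of the footprint is the raw values, which is the part an on-disk backend would move out of memory.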