so I'm told: https://github.com/vjcitn/biocMultiAssay/blob/master/R/triche.R
Statistics is the grammar of science.
Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>

On Wed, Mar 4, 2015 at 9:01 AM, Robert Castelo <robert.cast...@upf.edu> wrote:

> some of the goals behind this discussion are IMO similar to the ones for
> biocMultiAssay:
>
> https://github.com/vjcitn/biocMultiAssay
>
> maybe Vince can confirm.
>
> robert.
>
> On 03/04/2015 05:16 PM, Tim Triche, Jr. wrote:
> > Oh, I don't disagree. Perhaps the two problems can be addressed
> > simultaneously by
> >
> > 1) deciding on what contracts a multi-assay container can/would demand to
> > be useful
> > 2) calling it something besides SummarizedExperiment, say,
> > ExperimentCollection
> >
> > Then the SE API could stay the same as it is (which is already very useful)
> > and progress could be sought in the offshoot (ExperimentCollection or
> > whatever) without breaking things that rely on SE.
> >
> > Just off the top of my head, a most generically useful container for DNA
> > methylation & CNV data (which can of course be called from the same assay)
> > is Kasper & JP's GenomicRatioSet, which already has some weird quirks for
> > eSet backwards compatibility. (e.g. sampleNames(x) works, but
> > sampleNames(x)<- does not work; pData(x) calls colData(x); fData(x) calls
> > rowData(x)) There are little niggles that I should probably just send in a
> > patch for, but a cleaner overall container would be better, if for no other
> > reason than the aforementioned ability to easily experiment with
> > imputation. An approach that I've been using is to stuff the SNPs, CNV (as
> > GRanges) and mRNA/miRNA (as a matrix) data into exptData(SE). This is...
> > somewhat less than optimal, especially when subsetting.
> >
> > But it does suggest that I could define a coercion from the current
> > rambling wreck into a nice clean new class/API (ExperimentCollection or
> > whatever) and I'll bet other package authors could, too.
> > The presence of a GRangesFrame would then be handy for returning a given
> > assay's results, so that the user could be blissfully ignorant of the
> > storage backing (ff, BigMatrix, Matrix, matrix, Rle, whatever) but not
> > lose the data management advantages of a SummarizedExperiment.
> >
> > JMHO
> >
> > Statistics is the grammar of science.
> > Karl Pearson<http://en.wikipedia.org/wiki/The_Grammar_of_Science>
> >
> > On Wed, Mar 4, 2015 at 6:40 AM, Vincent Carey<st...@channing.harvard.edu>
> > wrote:
> >
> >> I am a bit concerned about any major alterations to the
> >> SummarizedExperiment API. We have two papers and plenty of working code
> >> that use it in meaningful ways. Effort required to keep new formulations
> >> back-compatible as well as bug-free has to be weighed seriously.
> >>
> >> I agree that the name is not ideal. We are learning as we go.
> >>
> >> Seems to make sense to start with the contracts we want the instances of
> >> a class to satisfy. I have long felt that X[i, j] idiom is one users and
> >> developers should be comfortable with, even insist on, and for consistency
> >> with matrix operations idiom, it should work in a natural way for numeric
> >> indexing. This seems like an important constraint. subsetBy* is a useful
> >> idiom, but it is conceivable that we would adopt filter() for row-oriented
> >> selections and select() for column-oriented selections. Do we have to make
> >> any special design considerations to allow very smooth interoperation with
> >> out-of-memory resources for certain components for developers who want to
> >> allow this?
> >>
> >> We should have a reasonable way to get data on what is out there, what
> >> is used, how it is most effectively used. What's the SE API? Is it
> >> well-adapted to requirements of DESeq2? Other killer packages that
> >> use/don't use it? Even getting data on the formal API for a class is not
> >> all that familiar.
> >> And if folks are writing non-S4 interfaces (i.e., naked functions) we
> >> have no way of identifying them. See below for one way of discovering
> >> the API for SummarizedExperiment.
> >>
> >> In summary, I think we have to be careful about overdesigning too
> >> early. Getting clear on contracts seems the best way to ensure reuse,
> >> and we really want that so that reliability is continually assessed.
> >> My sense is that it is good to give developers something they'll gladly
> >> extend, not necessarily reuse directly. So we don't have to have broad
> >> consensus on class details, but on the minimal abstraction and on
> >> obligatory tests on its basic implementation.
> >>
> >> > methods(class="SummarizedExperiment") # perhaps an obsolete version of methods cataloguer by MTM
> >>
> >> DataFrame with 76 rows and 3 columns
> >>     generic       signature                                                                 package
> >>     <character>   <character>                                                               <character>
> >> 1   [             x="SummarizedExperiment", i="ANY", j="ANY", drop="ANY"                    base
> >> 2   [             x="SummarizedExperiment", i="ANY", j="missing", value="ANY"               base
> >> 3   [             x="SummarizedExperiment", i="ANY", j="missing"                            base
> >> 4   [<-           x="SummarizedExperiment", i="ANY", j="ANY", value="SummarizedExperiment"  base
> >> 5   assay         x="SummarizedExperiment", i="character"                                   GenomicRanges
> >> ... ...           ...                                                                       ...
> >> 72  updateObject  object="SummarizedExperiment"                                             BiocGenerics
> >> 73  values        x="SummarizedExperiment"                                                  S4Vectors
> >> 74  values<-      x="SummarizedExperiment"                                                  S4Vectors
> >> 75  width         x="SummarizedExperiment"                                                  BiocGenerics
> >> 76  width<-       x="SummarizedExperiment"                                                  BiocGenerics
> >>
> >> On Wed, Mar 4, 2015 at 8:32 AM, Hector Corrada Bravo<hcorr...@gmail.com>
> >> wrote:
> >>
> >>> May I advocate for 'IndexedDataFrame' or 'IndexedFrame'?
> >>> 'rowIndices' can return whatever makes sense (GRanges, or other data
> >>> structures -thinking taxonomy for metagenomics for example-).
> >>> GRangesFrame can inherit from this.
> >>>
> >>> On Wed, Mar 4, 2015 at 3:28 AM, Hervé Pagès<hpa...@fredhutch.org> wrote:
> >>>
> >>>> GRangesFrame is an interesting idea and I gave it some thoughts.
> >>>>
> >>>> There is this nice symmetry between GRanges and GRangesFrame:
> >>>>
> >>>> - GRanges = a naked GRanges + a DataFrame accessible via mcols()
> >>>>
> >>>> - GRangesFrame = a DataFrame + a naked GRanges accessible via
> >>>> some accessor (e.g. rowRanges())
> >>>>
> >>>> So GRanges and GRangesFrame are equivalent in terms of what they
> >>>> can hold, but different in terms of API: the former has the ranges
> >>>> API as primary API and the DataFrame API on its mcols() component,
> >>>> and the latter has the DataFrame API as primary API and the ranges
> >>>> API on its rowRanges() component. Nice switch!
> >>>>
> >>>> What does this API switch bring us? A GRangesFrame object is now
> >>>> an object that fully behaves like a DataFrame and people can also
> >>>> perform range-based operations on its rowRanges() component.
> >>>> Here is what I'm afraid is going to happen: people will also want
> >>>> to be able to perform range-based operations *directly* on
> >>>> these objects, i.e. without having to call rowRanges() first.
> >>>> So for example when they do subsetByOverlaps(), subsetting
> >>>> happens vertically. Also the Hits object returned by findOverlaps()
> >>>> would contain row indices. Problem with this is that these objects
> >>>> now start to suffer from the "dual personality syndrome". For
> >>>> example, it's not clear anymore what their length should be.
> >>>> Strictly speaking it should be their number of columns (that's
> >>>> what the length of a DataFrame is), but the ranges API that
> >>>> we're trying to put on them also makes them feel like vectors
> >>>> along the vertical dimension so it also feels that their length
> >>>> should be their number of rows. Same thing with 1D subsetting.
> >>>> Why does it subset the columns and not the rows? Most people
> >>>> are now confused.
> >>>>
> >>>> It's interesting to note that the same thing happens with GRanges
> >>>> objects, but in the opposite direction: people wish they could
> >>>> do DataFrame operations directly on them without calling mcols()
> >>>> first. But in order to preserve the good health of GRanges objects,
> >>>> we've not done that (except for $, a shortcut for mcols(x)$,
> >>>> the pressure was just too strong).
> >>>>
> >>>> H.
> >>>>
> >>>> On 03/03/2015 04:35 PM, Michael Lawrence wrote:
> >>>>
> >>>>> Should be possible for the annotations to be of any type, as long as they
> >>>>> satisfy a simple contract of NROW() and 2D "[". Then, you could have a
> >>>>> DataFrame, GRanges, or whatever in there. But it would be nice to have a
> >>>>> special class for the container with range information. The contract for
> >>>>> the range annotation would be to have a granges() method.
> >>>>>
> >>>>> I agree it would be nice if there was a way with the methods package to
> >>>>> easily assert such contracts. For example, one could define an interface
> >>>>> with a set of generics (and optionally the relevant position in the
> >>>>> generic signature). Then, once all of the methods have been assigned for a
> >>>>> particular class, it is made to inherit from that contract class. There
> >>>>> are lots of gotchas though. Not sure how useful it would be in practice.
> >>>>>
> >>>>> On Tue, Mar 3, 2015 at 4:07 PM, Peter Haverty<haverty.pe...@gene.com>
> >>>>> wrote:
> >>>>>
> >>>>>> There are some nice similarities in these new imaginary types. A
> >>>>>> "GRangesFrame" is a list of dimensionally identical things (columns) and
> >>>>>> some row meta-data (the GRanges). The SE-like object is similarly a list
> >>>>>> of dimensionally like things (matrices, RleDataFrames, BigMatrix objects,
> >>>>>> HDF5-backed things) with some row meta-data (a DataFrame or
> >>>>>> GRangesFrame).
> >>>>>> Elegant? Maybe they would actually be relatives in the class tree.
> >>>>>>
> >>>>>> I wonder if this kind of thing would be easier if we had Java-style
> >>>>>> Interfaces or duck-typing. The "x" slot of "y" holds something that
> >>>>>> implements this set of methods ...
> >>>>>>
> >>>>>> Oh, and kinda apropos, the genoset class will probably go away or become
> >>>>>> an extension to this new SE-like thing. The extra stuff that comes along
> >>>>>> with genoset will still be available.
> >>>>>>
> >>>>>> Pete
> >>>>>>
> >>>>>> ____________________
> >>>>>> Peter M. Haverty, Ph.D.
> >>>>>> Genentech, Inc.
> >>>>>> phave...@gene.com
> >>>>>>
> >>>>>> On Tue, Mar 3, 2015 at 3:42 PM, Tim Triche, Jr.<tim.tri...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> This.
> >>>>>>>
> >>>>>>> It would be damned near perfect as a return value for assays coming out
> >>>>>>> of an object that held several such assays at several time points in a
> >>>>>>> population, where there are both assay-wise and covariate-wise "holes"
> >>>>>>> that could nonetheless be usefully imputed across assays.
> >>>>>>>
> >>>>>>> Statistics is the grammar of science.
> >>>>>>> Karl Pearson<http://en.wikipedia.org/wiki/The_Grammar_of_Science>
> >>>>>>>
> >>>>>>> On Tue, Mar 3, 2015 at 3:25 PM, Peter Haverty<haverty.pe...@gene.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>>>> I still think GRanges should be a subclass of DataFrame,
> >>>>>>>>>> which would make this easy, but I don't seem to be winning that
> >>>>>>>>>> argument.
> >>>>>>>>>
> >>>>>>>>> Just impossible. As Michael mentioned back in November, they have
> >>>>>>>>> conflicting APIs.
> >>>>>>>>
> >>>>>>>> Maybe a new "GRangesFrame" that is a DataFrame and holds a GRanges
> >>>>>>>> (without mcols) as an index?
> >>>>>>>>
> >>>>>>>> [[alternative HTML version deleted]]
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> Bioc-devel@r-project.org mailing list
> >>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>
> >>>> --
> >>>> Hervé Pagès
> >>>>
> >>>> Program in Computational Biology
> >>>> Division of Public Health Sciences
> >>>> Fred Hutchinson Cancer Research Center
> >>>> 1100 Fairview Ave. N, M1-B514
> >>>> P.O. Box 19024
> >>>> Seattle, WA 98109-1024
> >>>>
> >>>> E-mail: hpa...@fredhutch.org
> >>>> Phone: (206) 667-5791
> >>>> Fax: (206) 667-1319
>
> --
> Robert Castelo, PhD
> Associate Professor
> Dept. of Experimental and Health Sciences
> Universitat Pompeu Fabra (UPF)
> Barcelona Biomedical Research Park (PRBB)
> Dr Aiguader 88
> E-08003 Barcelona, Spain
> telf: +34.933.160.514
> fax: +34.933.160.550

[[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel