Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet
On 11/26/2014 12:11 PM, Hervé Pagès wrote:

Hi guys, I like the idea of separating the row data from the row ranges. This could be formalized with 2 distinct accessors: rowData() and rowRanges(). The former would return a DataFrame, and the latter NULL or a range-based object (GRanges or GRangesList). I don't think there is the need for an emptyRanges class.

For the original question, I think the ability to store genomic coordinates as well as other 'S4Vector' classes is very helpful for advanced users, even if a little intimidating for novice users. Also, it's clear that SummarizedExperiment in its current form doesn't satisfy the common use case of identifiers without range information. I think it makes sense to enable something like what Hervé outlines above, where the rowData() are separated into range information and annotation information, and I'll move forward with that implementation over the next week or so.

Martin

H.

On 11/26/2014 11:40 AM, Hector Corrada Bravo wrote:

One thing that’s become apparent working on epivizr is that it may be useful to think about ‘rowData’ in a SummarizedExperiment as having two distinct components: row coordinates and row metadata. In the current class rowData is a ‘GenomicRanges’ which contains both coordinates (the ranges) and metadata (mcols(rowData)). In metagenomics (the other application my group works a lot with), we think of the taxonomy as providing coordinates. The distinction is worthwhile thinking about since there are certain operations we do on coordinates that we don’t do with metadata (and conversely). Thinking about it this way, the ‘ExpressionSet’ object would be data without coordinates. So, I would avoid making ‘GenomicRanges’ behave like ‘DataFrame’ since this distinction between coordinates and metadata is lost. The ‘emptyRanges’ proposal gets closer to this since this corresponds to ‘no coordinates’, but it may be worth thinking in the long term on making the coordinate/metadata distinction more general.
Hector

On Wed, Nov 26, 2014 at 12:38 PM, Tim Triche, Jr. tim.tri...@gmail.com wrote:

so as a simple experiment, I did the following:

library(GenomicRanges)
bar <- matrix(rnorm(100), ncol=10)
colnames(bar) <- as.character(1:10)
rownames(bar) <- letters[1:10]
foo <- SummarizedExperiment(assays=list(bar=bar))
rowData(foo)
## GRangesList object of length 10:
## $a
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames    ranges strand
##       <Rle> <IRanges>  <Rle>
##
## $b
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## $c
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## ...
## <7 more elements>
colData(foo)
## DataFrame with 10 rows and 0 columns

This got me to thinking, why not have an emptyRanges class, or else the ability to index a bunch of NULL ranges without a lot of hoohah? The defaults mostly do what they're supposed to; why not have a compact representation of empty rowData as for empty colData (i.e., a DataFrame with 0 columns)? Or is a GRangesList of empty GRanges as compact as it is practicable to get for this purpose? Just pondering what the lowest-impact solution to the problem at hand might be.

Statistics is the grammar of science. Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science

On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty haverty.pe...@gene.com wrote:

Hi all, I believe there is a strong need for an object that organizes a collection of rectangular data (matrices, etc.) with metadata on the rows and columns. Can SummarizedExperiment inherit from something simpler that has a DataFrame as rowData? (I believe GenomicRanges should inherit from DataTable, rather than Vector, and subset as x[i,j], but maybe that's getting a bit off topic.) I often see people stuffing arbitrary data into an ExpressionSet and calling one of the assays exprs as a work-around.

Regards, Pete

Peter M. Haverty, Ph.D. Genentech, Inc. 
phave...@gene.com On Wed, Nov 26, 2014 at 7:19 AM, Laurent Gatto lg...@cam.ac.uk wrote: On 26 November 2014 14:59, Wolfgang Huber wrote: A colleague and I are designing a package for quantitative proteomics data, and we are debating whether to base it on the SummarizedExperiment or the ExpressionSet class. There is no immediate use for the ranges aspect of SummarizedExperiment, so that would have to be carried around with NAs, and this is a parsimony argument for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is cleaner, its code more modern and more likely to be updated, and users of the Bioconductor project are likely to benefit from having to deal with a single interface that works the same or similarly across packages, rather than a variety of formats; which argues that new packages should converge towards SummarizedExperiment('s interface). Are there any pertinent insights from this group? Instead of ExpressionSet, you could use
[Bioc-devel] SummarizedExperiment vs ExpressionSet
A colleague and I are designing a package for quantitative proteomics data, and we are debating whether to base it on the SummarizedExperiment or the ExpressionSet class. There is no immediate use for the ranges aspect of SummarizedExperiment, so that would have to be carried around with NAs, and this is a parsimony argument for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is cleaner, its code more modern and more likely to be updated, and users of the Bioconductor project are likely to benefit from having to deal with a single interface that works the same or similarly across packages, rather than a variety of formats; which argues that new packages should converge towards SummarizedExperiment(’s interface). Are there any pertinent insights from this group? Thanks and best wishes Wolfgang ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet
On 26 November 2014 14:59, Wolfgang Huber wrote:

A colleague and I are designing a package for quantitative proteomics data, and we are debating whether to base it on the SummarizedExperiment or the ExpressionSet class. There is no immediate use for the ranges aspect of SummarizedExperiment, so that would have to be carried around with NAs, and this is a parsimony argument for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is cleaner, its code more modern and more likely to be updated, and users of the Bioconductor project are likely to benefit from having to deal with a single interface that works the same or similarly across packages, rather than a variety of formats; which argues that new packages should converge towards SummarizedExperiment(’s interface). Are there any pertinent insights from this group?

Instead of ExpressionSet, you could use MSnbase::MSnSet, which is essentially an ExpressionSet for quantitative proteomics (i.e., it has a MIAPE slot instead of MIAME, for example). Ideally, a SummarizedExperiment for proteomics would use peptide/protein ranges, which is in the pipeline, as far as I am concerned. When that becomes available, there should be infrastructure to coerce an MSnSet (and/or other relevant data) into a SummarizedExperiment.

Hope this helps.

Best wishes, Laurent

Thanks and best wishes Wolfgang

-- Laurent Gatto http://cpu.sysbiol.cam.ac.uk/
___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet
Hi all, I believe there is a strong need for an object that organizes a collection of rectangular data (matrices, etc.) with metadata on the rows and columns. Can SummarizedExperiment inherit from something simpler that has a DataFrame as rowData? (I believe GenomicRanges should inherit from DataTable, rather than Vector, and subset as x[i,j], but maybe that's getting a bit off topic.) I often see people stuffing arbitrary data into an ExpressionSet and calling one of the assays exprs as a work-around. Regards, Pete Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com On Wed, Nov 26, 2014 at 7:19 AM, Laurent Gatto lg...@cam.ac.uk wrote: On 26 November 2014 14:59, Wolfgang Huber wrote: A colleague and I are designing a package for quantitative proteomics data, and we are debating whether to base it on the SummarizedExperiment or the ExpressionSet class. There is no immediate use for the ranges aspect of SummarizedExperiment, so that would have to be carried around with NAs, and this is a parsimony argument for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is cleaner, its code more modern and more likely to be updated, and users of the Bioconductor project are likely to benefit from having to deal with a single interface that works the same or similarly across packages, rather than a variety of formats; which argues that new packages should converge towards SummarizedExperiment('s interface). Are there any pertinent insights from this group? Instead of ExpressionSet, you could use MSnbase::MSnSet, which is essentially an ExpressionSet for quantitative proteomics (i.e it has a MIAPE slot, instead of MIAME for example). Ideally, a SummarizedExperiment for proteomics would use peptide/protein ranges, which is in the pipeline, as far as I am concerned. When that becomes available, there should be infrastructure to coerce and MSnSet (and/or other relevant data) into an SummarizedExperiment. Hope this helps. 
Best wishes, Laurent Thanks and best wishes Wolfgang -- Laurent Gatto http://cpu.sysbiol.cam.ac.uk/ ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet
On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty haverty.pe...@gene.com wrote: Hi all, I believe there is a strong need for an object that organizes a collection of rectangular data (matrices, etc.) with metadata on the rows and columns. Can SummarizedExperiment inherit from something simpler that has a DataFrame as rowData? (I believe GenomicRanges should inherit from DataTable, rather than Vector, and subset as x[i,j], but maybe that's getting a bit off topic.) Have to disagree on that. A GRanges is a vector of ranges; a table is a list of vectors all of the same length. Different things. There was a lot of thought invested in that. But it does subset as x[i,j], so in theory SummarizedExperiment could be generalized to contain something with the contract of 2D extraction. I often see people stuffing arbitrary data into an ExpressionSet and calling one of the assays exprs as a work-around. Regards, Pete Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com On Wed, Nov 26, 2014 at 7:19 AM, Laurent Gatto lg...@cam.ac.uk wrote: On 26 November 2014 14:59, Wolfgang Huber wrote: A colleague and I are designing a package for quantitative proteomics data, and we are debating whether to base it on the SummarizedExperiment or the ExpressionSet class. There is no immediate use for the ranges aspect of SummarizedExperiment, so that would have to be carried around with NAs, and this is a parsimony argument for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is cleaner, its code more modern and more likely to be updated, and users of the Bioconductor project are likely to benefit from having to deal with a single interface that works the same or similarly across packages, rather than a variety of formats; which argues that new packages should converge towards SummarizedExperiment('s interface). Are there any pertinent insights from this group? 
Instead of ExpressionSet, you could use MSnbase::MSnSet, which is essentially an ExpressionSet for quantitative proteomics (i.e., it has a MIAPE slot instead of MIAME, for example). Ideally, a SummarizedExperiment for proteomics would use peptide/protein ranges, which is in the pipeline, as far as I am concerned. When that becomes available, there should be infrastructure to coerce an MSnSet (and/or other relevant data) into a SummarizedExperiment. Hope this helps. Best wishes, Laurent Thanks and best wishes Wolfgang -- Laurent Gatto http://cpu.sysbiol.cam.ac.uk/ ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
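Michael's distinction between a vector of ranges and a table can be illustrated with a small base-R sketch. The class ToyRanges below is hypothetical and is not how GenomicRanges is implemented; it only mimics the contract described in the reply, namely that the object's length is the number of ranges while x[i, j] picks ranges with i and metadata columns with j.

```r
library(methods)

# A table is a list of equal-length column vectors, so its length is the
# number of columns, not the number of rows.
tbl <- data.frame(score = c(0.1, 0.5, 0.9), gene = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# ToyRanges (hypothetical): a vector of ranges carrying per-range metadata.
setClass("ToyRanges",
         representation(start = "integer", end = "integer",
                        meta = "data.frame"))

# Vector semantics: the length is the number of ranges.
setMethod("length", "ToyRanges", function(x) length(x@start))

# Two-dimensional extraction: i selects ranges, j selects metadata columns.
setMethod("[", "ToyRanges", function(x, i, j, ..., drop = TRUE) {
  if (missing(i)) i <- seq_along(x@start)
  if (missing(j)) j <- seq_len(ncol(x@meta))
  new("ToyRanges", start = x@start[i], end = x@end[i],
      meta = x@meta[i, j, drop = FALSE])
})

r <- new("ToyRanges", start = c(1L, 10L, 20L), end = c(5L, 15L, 30L),
         meta = tbl)
sub <- r[2:3, "score"]

length(tbl)  # counts columns
length(r)    # counts ranges
length(sub)  # counts the ranges kept by i
```

The same three-row input gives different lengths depending on whether it is viewed as a table or as a vector of ranges, which is the sense in which the two are "different things" even though both support x[i, j].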
Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet
so as a simple experiment, I did the following:

library(GenomicRanges)
bar <- matrix(rnorm(100), ncol=10)
colnames(bar) <- as.character(1:10)
rownames(bar) <- letters[1:10]
foo <- SummarizedExperiment(assays=list(bar=bar))
rowData(foo)
## GRangesList object of length 10:
## $a
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames    ranges strand
##       <Rle> <IRanges>  <Rle>
##
## $b
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## $c
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## ...
## <7 more elements>
colData(foo)
## DataFrame with 10 rows and 0 columns

This got me to thinking, why not have an emptyRanges class, or else the ability to index a bunch of NULL ranges without a lot of hoohah? The defaults mostly do what they're supposed to; why not have a compact representation of empty rowData as for empty colData (i.e., a DataFrame with 0 columns)? Or is a GRangesList of empty GRanges as compact as it is practicable to get for this purpose? Just pondering what the lowest-impact solution to the problem at hand might be.

Statistics is the grammar of science. Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science

On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty haverty.pe...@gene.com wrote:

Hi all, I believe there is a strong need for an object that organizes a collection of rectangular data (matrices, etc.) with metadata on the rows and columns. Can SummarizedExperiment inherit from something simpler that has a DataFrame as rowData? (I believe GenomicRanges should inherit from DataTable, rather than Vector, and subset as x[i,j], but maybe that's getting a bit off topic.) I often see people stuffing arbitrary data into an ExpressionSet and calling one of the assays exprs as a work-around.

Regards, Pete

Peter M. Haverty, Ph.D. Genentech, Inc. 
phave...@gene.com On Wed, Nov 26, 2014 at 7:19 AM, Laurent Gatto lg...@cam.ac.uk wrote: On 26 November 2014 14:59, Wolfgang Huber wrote: A colleague and I are designing a package for quantitative proteomics data, and we are debating whether to base it on the SummarizedExperiment or the ExpressionSet class. There is no immediate use for the ranges aspect of SummarizedExperiment, so that would have to be carried around with NAs, and this is a parsimony argument for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is cleaner, its code more modern and more likely to be updated, and users of the Bioconductor project are likely to benefit from having to deal with a single interface that works the same or similarly across packages, rather than a variety of formats; which argues that new packages should converge towards SummarizedExperiment('s interface). Are there any pertinent insights from this group? Instead of ExpressionSet, you could use MSnbase::MSnSet, which is essentially an ExpressionSet for quantitative proteomics (i.e it has a MIAPE slot, instead of MIAME for example). Ideally, a SummarizedExperiment for proteomics would use peptide/protein ranges, which is in the pipeline, as far as I am concerned. When that becomes available, there should be infrastructure to coerce and MSnSet (and/or other relevant data) into an SummarizedExperiment. Hope this helps. Best wishes, Laurent Thanks and best wishes Wolfgang ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel -- Laurent Gatto http://cpu.sysbiol.cam.ac.uk/ ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel [[alternative HTML version deleted]] ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel [[alternative HTML version deleted]] ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet
GRangesList is very compact, so this would definitely get the job done. But having an empty range is not the same as a NA, nor does it mean that ranges are irrelevant. There are definitely times, especially as we extend beyond genomics, when we need something more general, as Pete suggests. As an aside I think there is an interesting structural relationship between something like an eSet and a pivot table in a spreadsheet, except an eSet has multiple measurement tables and the column/row annotations are not just for aggregation. If we start to think more broadly, we should consider such specializations and try to unify them into a single framework.

On Wed, Nov 26, 2014 at 9:37 AM, Tim Triche, Jr. tim.tri...@gmail.com wrote:

so as a simple experiment, I did the following:

library(GenomicRanges)
bar <- matrix(rnorm(100), ncol=10)
colnames(bar) <- as.character(1:10)
rownames(bar) <- letters[1:10]
foo <- SummarizedExperiment(assays=list(bar=bar))
rowData(foo)
## GRangesList object of length 10:
## $a
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames    ranges strand
##       <Rle> <IRanges>  <Rle>
##
## $b
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## $c
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## ...
## <7 more elements>
colData(foo)
## DataFrame with 10 rows and 0 columns

This got me to thinking, why not have an emptyRanges class, or else the ability to index a bunch of NULL ranges without a lot of hoohah? The defaults mostly do what they're supposed to; why not have a compact representation of empty rowData as for empty colData (i.e., a DataFrame with 0 columns)? Or is a GRangesList of empty GRanges as compact as it is practicable to get for this purpose? Just pondering what the lowest-impact solution to the problem at hand might be.

Statistics is the grammar of science. 
Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty haverty.pe...@gene.com wrote: Hi all, I believe there is a strong need for an object that organizes a collection of rectangular data (matrices, etc.) with metadata on the rows and columns. Can SummarizedExperiment inherit from something simpler that has a DataFrame as rowData? (I believe GenomicRanges should inherit from DataTable, rather than Vector, and subset as x[i,j], but maybe that's getting a bit off topic.) I often see people stuffing arbitrary data into an ExpressionSet and calling one of the assays exprs as a work-around. Regards, Pete Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com On Wed, Nov 26, 2014 at 7:19 AM, Laurent Gatto lg...@cam.ac.uk wrote: On 26 November 2014 14:59, Wolfgang Huber wrote: A colleague and I are designing a package for quantitative proteomics data, and we are debating whether to base it on the SummarizedExperiment or the ExpressionSet class. There is no immediate use for the ranges aspect of SummarizedExperiment, so that would have to be carried around with NAs, and this is a parsimony argument for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is cleaner, its code more modern and more likely to be updated, and users of the Bioconductor project are likely to benefit from having to deal with a single interface that works the same or similarly across packages, rather than a variety of formats; which argues that new packages should converge towards SummarizedExperiment('s interface). Are there any pertinent insights from this group? Instead of ExpressionSet, you could use MSnbase::MSnSet, which is essentially an ExpressionSet for quantitative proteomics (i.e it has a MIAPE slot, instead of MIAME for example). Ideally, a SummarizedExperiment for proteomics would use peptide/protein ranges, which is in the pipeline, as far as I am concerned. 
When that becomes available, there should be infrastructure to coerce and MSnSet (and/or other relevant data) into an SummarizedExperiment. Hope this helps. Best wishes, Laurent Thanks and best wishes Wolfgang ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel -- Laurent Gatto http://cpu.sysbiol.cam.ac.uk/ ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel [[alternative HTML version deleted]] ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel [[alternative HTML version deleted]] ___ Bioc-devel@r-project.org mailing
Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet
One thing that’s become apparent working on epivizr is that it may be useful to think about ‘rowData’ in a SummarizedExperiment as having two distinct components: row coordinates and row metadata. In the current class rowData is a ‘GenomicRanges’ which contains both coordinates (the ranges) and metadata (mcols(rowData)). In metagenomics (the other application my group works a lot with), we think of the taxonomy as providing coordinates. The distinction is worthwhile thinking about since there are certain operations we do on coordinates that we don’t do with metadata (and conversely). Thinking about it this way, the ‘ExpressionSet’ object would be data without coordinates. So, I would avoid making ‘GenomicRanges’ behave like ‘DataFrame’ since this distinction between coordinates and metadata is lost. The ‘emptyRanges’ proposal gets closer to this since this corresponds to ‘no coordinates’, but it may be worth thinking in the long term on making the coordinate/metadata distinction more general.

Hector

On Wed, Nov 26, 2014 at 12:38 PM, Tim Triche, Jr. tim.tri...@gmail.com wrote:

so as a simple experiment, I did the following:

library(GenomicRanges)
bar <- matrix(rnorm(100), ncol=10)
colnames(bar) <- as.character(1:10)
rownames(bar) <- letters[1:10]
foo <- SummarizedExperiment(assays=list(bar=bar))
rowData(foo)
## GRangesList object of length 10:
## $a
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames    ranges strand
##       <Rle> <IRanges>  <Rle>
##
## $b
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## $c
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## ...
## <7 more elements>
colData(foo)
## DataFrame with 10 rows and 0 columns

This got me to thinking, why not have an emptyRanges class, or else the ability to index a bunch of NULL ranges without a lot of hoohah? 
The defaults mostly do what they're supposed to; why not have a compact representation of empty rowData as for empty colData (i.e., a DataFrame with 0 rows)? Or is a GRangesList of empty GRanges as compact as it is practicable to get for this purpose? Just pondering what the lowest-impact solution to the problem at hand might be. Statistics is the grammar of science. Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty haverty.pe...@gene.com wrote: Hi all, I believe there is a strong need for an object that organizes a collection of rectangular data (matrices, etc.) with metadata on the rows and columns. Can SummarizedExperiment inherit from something simpler that has a DataFrame as rowData? (I believe GenomicRanges should inherit from DataTable, rather than Vector, and subset as x[i,j], but maybe that's getting a bit off topic.) I often see people stuffing arbitrary data into an ExpressionSet and calling one of the assays exprs as a work-around. Regards, Pete Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com On Wed, Nov 26, 2014 at 7:19 AM, Laurent Gatto lg...@cam.ac.uk wrote: On 26 November 2014 14:59, Wolfgang Huber wrote: A colleague and I are designing a package for quantitative proteomics data, and we are debating whether to base it on the SummarizedExperiment or the ExpressionSet class. There is no immediate use for the ranges aspect of SummarizedExperiment, so that would have to be carried around with NAs, and this is a parsimony argument for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is cleaner, its code more modern and more likely to be updated, and users of the Bioconductor project are likely to benefit from having to deal with a single interface that works the same or similarly across packages, rather than a variety of formats; which argues that new packages should converge towards SummarizedExperiment('s interface). 
Are there any pertinent insights from this group? Instead of ExpressionSet, you could use MSnbase::MSnSet, which is essentially an ExpressionSet for quantitative proteomics (i.e it has a MIAPE slot, instead of MIAME for example). Ideally, a SummarizedExperiment for proteomics would use peptide/protein ranges, which is in the pipeline, as far as I am concerned. When that becomes available, there should be infrastructure to coerce and MSnSet (and/or other relevant data) into an SummarizedExperiment. Hope this helps. Best wishes, Laurent Thanks and best wishes Wolfgang ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel -- Laurent Gatto http://cpu.sysbiol.cam.ac.uk/ ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel [[alternative HTML
Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet
Hi guys, I like the idea of separating the row data from the row ranges. This could be formalized with 2 distinct accessors: rowData() and rowRanges(). The former would return a DataFrame, and the latter NULL or a range-based object (GRanges or GRangesList). I don't think there is the need for an emptyRanges class. H. On 11/26/2014 11:40 AM, Hector Corrada Bravo wrote: One thing that’s become apparent working on epivizr is that it may be useful to think about ‘rowData’ in a SummarizedExperiment as having two distinct components: row coordinates and row metadata. In the current class rowData is a ‘GenomicRanges’ which contains both coordinates (the ranges) and metadata (mcols(rowData)). In metagenomics (the other application my group works a lot with), we think of the taxonomy as providing coordinates. The distinction is worthwhile thinking about since there are certain operations we do on coordinates that we don’t do with metadata (and conversely). Thinking about it this way, the ‘ExpressionSet’ object would be data without coordinates. So, I would avoid making ‘GenomicRanges’ behave like ‘DataFrame’ since this distinction between coordinates and metadata is lost. The ‘emptyRanges’ proposal gets closer to this since this corresponds to ‘no coordinates’, but it may be worth thinking in the long term on making the coordinate/metadata distinction more general. Hector On Wed, Nov 26, 2014 at 12:38 PM, Tim Triche, Jr. 
tim.tri...@gmail.com wrote:

so as a simple experiment, I did the following:

library(GenomicRanges)
bar <- matrix(rnorm(100), ncol=10)
colnames(bar) <- as.character(1:10)
rownames(bar) <- letters[1:10]
foo <- SummarizedExperiment(assays=list(bar=bar))
rowData(foo)
## GRangesList object of length 10:
## $a
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames    ranges strand
##       <Rle> <IRanges>  <Rle>
##
## $b
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## $c
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## ...
## <7 more elements>
colData(foo)
## DataFrame with 10 rows and 0 columns

This got me to thinking, why not have an emptyRanges class, or else the ability to index a bunch of NULL ranges without a lot of hoohah? The defaults mostly do what they're supposed to; why not have a compact representation of empty rowData as for empty colData (i.e., a DataFrame with 0 columns)? Or is a GRangesList of empty GRanges as compact as it is practicable to get for this purpose? Just pondering what the lowest-impact solution to the problem at hand might be.

Statistics is the grammar of science. Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science

On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty haverty.pe...@gene.com wrote:

Hi all, I believe there is a strong need for an object that organizes a collection of rectangular data (matrices, etc.) with metadata on the rows and columns. Can SummarizedExperiment inherit from something simpler that has a DataFrame as rowData? (I believe GenomicRanges should inherit from DataTable, rather than Vector, and subset as x[i,j], but maybe that's getting a bit off topic.) I often see people stuffing arbitrary data into an ExpressionSet and calling one of the assays exprs as a work-around.

Regards, Pete

Peter M. Haverty, Ph.D. Genentech, Inc. 
phave...@gene.com On Wed, Nov 26, 2014 at 7:19 AM, Laurent Gatto lg...@cam.ac.uk wrote: On 26 November 2014 14:59, Wolfgang Huber wrote: A colleague and I are designing a package for quantitative proteomics data, and we are debating whether to base it on the SummarizedExperiment or the ExpressionSet class. There is no immediate use for the ranges aspect of SummarizedExperiment, so that would have to be carried around with NAs, and this is a parsimony argument for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is cleaner, its code more modern and more likely to be updated, and users of the Bioconductor project are likely to benefit from having to deal with a single interface that works the same or similarly across packages, rather than a variety of formats; which argues that new packages should converge towards SummarizedExperiment('s interface). Are there any pertinent insights from this group? Instead of ExpressionSet, you could use MSnbase::MSnSet, which is essentially an ExpressionSet for quantitative proteomics (i.e it has a MIAPE slot, instead of MIAME for example). Ideally, a SummarizedExperiment for proteomics would use peptide/protein ranges, which is in the pipeline, as far as I am concerned. When that becomes available, there should be infrastructure to coerce and MSnSet (and/or other relevant data) into an SummarizedExperiment. Hope this helps. Best wishes, Laurent Thanks and best wishes Wolfgang ___ Bioc-devel@r-project.org mailing list
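The two-accessor idea proposed at the top of this message (rowData() returning a table, rowRanges() returning NULL or a range-based object) might look roughly like the base-R sketch below. Everything here is an assumption made for illustration: ToyExperiment and ToyRangesOrNULL are hypothetical names, a plain data.frame stands in for both DataFrame and GRanges, and the generics are defined locally rather than taken from any Bioconductor package.

```r
library(methods)

# Hypothetical stand-in for "NULL or a range-based object".
setClassUnion("ToyRangesOrNULL", c("data.frame", "NULL"))

# ToyExperiment (hypothetical): one assay matrix, row annotation, and
# optional row coordinates kept in a separate slot.
setClass("ToyExperiment",
         representation(assay = "matrix",
                        rowData = "data.frame",
                        rowRanges = "ToyRangesOrNULL"))

# Locally defined accessors mirroring the two proposed names.
setGeneric("rowData", function(x) standardGeneric("rowData"))
setGeneric("rowRanges", function(x) standardGeneric("rowRanges"))
setMethod("rowData", "ToyExperiment", function(x) x@rowData)
setMethod("rowRanges", "ToyExperiment", function(x) x@rowRanges)

quant <- matrix(c(1.2, 0.8, 2.5, 1.1, 0.9, 2.7), nrow = 3,
                dimnames = list(c("P1", "P2", "P3"), c("s1", "s2")))

# Proteomics-style object: identifiers only, no coordinates.
prot <- new("ToyExperiment", assay = quant,
            rowData = data.frame(accession = rownames(quant),
                                 row.names = rownames(quant),
                                 stringsAsFactors = FALSE),
            rowRanges = NULL)

# Genomics-style object: same annotation plus one coordinate row per feature.
genomic <- new("ToyExperiment", assay = quant,
               rowData = rowData(prot),
               rowRanges = data.frame(chrom = "chr1",
                                      start = c(1L, 100L, 500L),
                                      end = c(50L, 200L, 900L)))

is.null(rowRanges(prot))
nrow(rowRanges(genomic)) == nrow(rowData(genomic))
```

With this split, an object without coordinates needs no placeholder ranges: the annotation table is always present and the coordinate slot is simply NULL.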
Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet
OK, GRanges as vector that does overlap stuff makes sense, but I think putting a DataFrame of metadata on that confuses the purpose of the object. How about a GRangesTable that inherits from both GenomicRanges and DataTable? It would be a DataFrame with a fancy index. The DataFrame API would make stuff like colnames work (rather than needing colnames(mcols(x)) ). If this were used as the rowData for SummarizedExperiment, then a plain DataFrame could be made to work too. Pete Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com On Wed, Nov 26, 2014 at 9:33 AM, Michael Lawrence lawrence.mich...@gene.com wrote: On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty haverty.pe...@gene.com wrote: Hi all, I believe there is a strong need for an object that organizes a collection of rectangular data (matrices, etc.) with metadata on the rows and columns. Can SummarizedExperiment inherit from something simpler that has a DataFrame as rowData? (I believe GenomicRanges should inherit from DataTable, rather than Vector, and subset as x[i,j], but maybe that's getting a bit off topic.) Have to disagree on that. A GRanges is a vector of ranges; a table is a list of vectors all of the same length. Different things. There was a lot of thought invested in that. But it does subset as x[i,j], so in theory SummarizedExperiment could be generalized to contain something with the contract of 2D extraction. I often see people stuffing arbitrary data into an ExpressionSet and calling one of the assays exprs as a work-around. Regards, Pete Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com On Wed, Nov 26, 2014 at 7:19 AM, Laurent Gatto lg...@cam.ac.uk wrote: On 26 November 2014 14:59, Wolfgang Huber wrote: A colleague and I are designing a package for quantitative proteomics data, and we are debating whether to base it on the SummarizedExperiment or the ExpressionSet class. 
There is no immediate use for the ranges aspect of SummarizedExperiment, so that would have to be carried around with NAs, and this is a parsimony argument for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is cleaner, its code more modern and more likely to be updated, and users of the Bioconductor project are likely to benefit from having to deal with a single interface that works the same or similarly across packages, rather than a variety of formats; which argues that new packages should converge towards SummarizedExperiment('s interface). Are there any pertinent insights from this group? Instead of ExpressionSet, you could use MSnbase::MSnSet, which is essentially an ExpressionSet for quantitative proteomics (i.e it has a MIAPE slot, instead of MIAME for example). Ideally, a SummarizedExperiment for proteomics would use peptide/protein ranges, which is in the pipeline, as far as I am concerned. When that becomes available, there should be infrastructure to coerce and MSnSet (and/or other relevant data) into an SummarizedExperiment. Hope this helps. Best wishes, Laurent Thanks and best wishes Wolfgang ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel -- Laurent Gatto http://cpu.sysbiol.cam.ac.uk/ ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel [[alternative HTML version deleted]] ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel [[alternative HTML version deleted]] ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
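Pete's "DataFrame with a fancy index" suggestion can be sketched in base R as follows. This is only a toy under stated assumptions: ToyRangesTable is a hypothetical class, not the proposed GRangesTable, the start/end slots stand in for real ranges, and colnames is promoted to an S4 generic locally so that it works directly on the object instead of going through an mcols()-style accessor.

```r
library(methods)

# ToyRangesTable (hypothetical): a table of row metadata whose rows are
# indexed by ranges held in parallel start/end slots.
setClass("ToyRangesTable",
         representation(start = "integer", end = "integer",
                        table = "data.frame"))

# Promote base::colnames to a generic and answer from the table part.
setGeneric("colnames")
setMethod("colnames", "ToyRangesTable",
          function(x, do.NULL = TRUE, prefix = "col") colnames(x@table))

meta <- data.frame(score = c(0.3, 0.7), gene = c("a", "b"),
                   stringsAsFactors = FALSE)

rt <- new("ToyRangesTable", start = c(1L, 10L), end = c(5L, 15L),
          table = meta)

# The same call works on the range-indexed table and on a plain table.
colnames(rt)
colnames(meta)
```

Because both objects answer the same table-style call, code written against that interface would accept either one, which is the property Pete is after when he says a plain DataFrame could then serve as rowData too.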