Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet

2014-12-01 Thread Martin Morgan

On 11/26/2014 12:11 PM, Hervé Pagès wrote:

> Hi guys,
>
> I like the idea of separating the row data from the row ranges.
> This could be formalized with 2 distinct accessors: rowData() and
> rowRanges(). The former would return a DataFrame, and the latter
> NULL or a range-based object (GRanges or GRangesList).
> I don't think there is a need for an emptyRanges class.


For the original question, I think the ability to store genomic coordinates as 
well as other 'S4Vector' classes is very helpful for advanced users, even if a 
little intimidating for novice users.


Also, it's clear that SummarizedExperiment in its current form doesn't satisfy 
the common use case of identifiers without range information.


I think it makes sense to enable something like what Hervé outlines above, where the 
row data are separated into range information and annotation information, and 
I'll move forward with that implementation over the next week or so.


Martin



[Bioc-devel] SummarizedExperiment vs ExpressionSet

2014-11-26 Thread Wolfgang Huber
A colleague and I are designing a package for quantitative proteomics data, and 
we are debating whether to base it on the SummarizedExperiment or the 
ExpressionSet class. 

There is no immediate use for the ranges aspect of SummarizedExperiment, so 
that would have to be carried around with NAs, and this is a parsimony argument 
for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is 
cleaner, its code more modern and more likely to be updated, and users of the 
Bioconductor project are likely to benefit from having to deal with a single 
interface that works the same or similarly across packages, rather than a 
variety of formats; which argues that new packages should converge towards 
SummarizedExperiment(’s interface).

Are there any pertinent insights from this group?

Thanks and best wishes
Wolfgang

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet

2014-11-26 Thread Laurent Gatto

Instead of ExpressionSet, you could use MSnbase::MSnSet, which is
essentially an ExpressionSet for quantitative proteomics (i.e. it has a
MIAPE slot instead of MIAME, for example).

Ideally, a SummarizedExperiment for proteomics would use peptide/protein
ranges, which is in the pipeline, as far as I am concerned. When that
becomes available, there should be infrastructure to coerce an MSnSet
(and/or other relevant data) into a SummarizedExperiment.

Hope this helps.

Best wishes,

Laurent


-- 
Laurent Gatto
http://cpu.sysbiol.cam.ac.uk/



Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet

2014-11-26 Thread Peter Haverty
Hi all,

I believe there is a strong need for an object that organizes a collection
of rectangular data (matrices, etc.) with metadata on the rows and
columns.  Can SummarizedExperiment inherit from something simpler that has
a DataFrame as rowData?  (I believe GenomicRanges should inherit from
DataTable, rather than Vector, and subset as x[i,j], but maybe that's
getting a bit off topic.)  I often see people stuffing arbitrary data into
an ExpressionSet and calling one of the assays 'exprs' as a work-around.

Regards,

Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com
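A minimal base-R sketch of the kind of object Pete describes, assuming nothing from Bioconductor: the class name `RectangularSet` and its slots are hypothetical, and plain `data.frame`s stand in for `DataFrame`. It shows the one behaviour such a container must guarantee, namely that `x[i, j]` subsets every assay, the row annotation and the column annotation together.

```r
library(methods)

## Hypothetical container (not a Bioconductor class): a named list of
## assay matrices plus row and column annotation tables kept in register.
setClass("RectangularSet",
         slots = c(assays = "list", rowData = "data.frame",
                   colData = "data.frame"),
         validity = function(object) {
             for (a in object@assays) {
                 if (nrow(a) != nrow(object@rowData) ||
                     ncol(a) != nrow(object@colData))
                     return("assay dimensions must match rowData and colData")
             }
             TRUE
         })

## x[i, j] applies i to every assay and to rowData, and j to every
## assay and to colData, so all components stay aligned. 'drop' is ignored.
setMethod("[", "RectangularSet", function(x, i, j, ..., drop = TRUE) {
    if (missing(i)) i <- seq_len(nrow(x@rowData))
    if (missing(j)) j <- seq_len(nrow(x@colData))
    x@assays  <- lapply(x@assays, function(a) a[i, j, drop = FALSE])
    x@rowData <- x@rowData[i, , drop = FALSE]
    x@colData <- x@colData[j, , drop = FALSE]
    validObject(x)
    x
})

m <- matrix(1:6, nrow = 3, dimnames = list(letters[1:3], c("s1", "s2")))
rs <- new("RectangularSet",
          assays  = list(counts = m),
          rowData = data.frame(symbol = c("A", "B", "C"),
                               row.names = letters[1:3],
                               stringsAsFactors = FALSE),
          colData = data.frame(group = c("ctl", "trt"),
                               row.names = c("s1", "s2"),
                               stringsAsFactors = FALSE))

sub <- rs[2:3, "s2"]
sub@assays$counts   # 2 x 1 matrix: rows b, c of column s2
```

Because the row annotation here is an ordinary table, identifier-only data needs no placeholder ranges at all.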



Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet

2014-11-26 Thread Michael Lawrence
On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty haverty.pe...@gene.com
wrote:

 Hi all,

 I believe there is a strong need for an object that organizes a collection
 of rectangular data (matrices, etc.) with metadata on the rows and
 columns.  Can SummarizedExperiment inherit from something simpler that has
 a DataFrame as rowData?

  (I believe GenomicRanges should inherit from
 DataTable, rather than Vector, and subset as x[i,j], but maybe that's
 getting a bit off topic.)


Have to disagree on that. A GRanges is a vector of ranges; a table is a
list of vectors all of the same length. Different things. There was a lot
of thought invested in that. But it does subset as x[i,j], so in theory
SummarizedExperiment could be generalized to contain something with the
contract of 2D extraction.
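Michael's point is that the container only needs a row annotation honouring the contract of 2D extraction, not a particular class. A rough illustration in base R, with a hypothetical helper `subsetRows()`: the same code subsets a `data.frame` annotation and a `matrix` annotation, because both support `NROW()` and `x[i, , drop = FALSE]`.

```r
## Hypothetical helper (not an existing API): subsets an assay matrix and
## its row annotation together. The only thing it assumes about 'rowAnn'
## is the 2D-extraction contract: NROW(rowAnn) and rowAnn[i, , drop = FALSE].
subsetRows <- function(assay, rowAnn, i) {
    stopifnot(NROW(rowAnn) == nrow(assay))
    list(assay  = assay[i, , drop = FALSE],
         rowAnn = rowAnn[i, , drop = FALSE])
}

assay   <- matrix(1:12, nrow = 4)
df_ann  <- data.frame(id = paste0("p", 1:4), stringsAsFactors = FALSE)
mat_ann <- cbind(start = c(1L, 10L, 20L, 30L), end = c(5L, 15L, 25L, 35L))

r1 <- subsetRows(assay, df_ann,  c(2, 4))   # data.frame annotation
r2 <- subsetRows(assay, mat_ann, c(2, 4))   # matrix annotation
r1$rowAnn$id
r2$rowAnn[, "start"]
```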




Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet

2014-11-26 Thread Tim Triche, Jr.
so as a simple experiment, I did the following:

library(GenomicRanges)  # in 2014, SummarizedExperiment() was provided by GenomicRanges
bar <- matrix(rnorm(100), ncol=10)  # 10 x 10 numeric matrix
colnames(bar) <- as.character(1:10)
rownames(bar) <- letters[1:10]
foo <- SummarizedExperiment(assays=list(bar=bar))

rowData(foo)
## GRangesList object of length 10:
## $a
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames    ranges strand
##       <Rle> <IRanges>  <Rle>
##
## $b
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## $c
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## ...
## 7 more elements

colData(foo)
## DataFrame with 10 rows and 0 columns

This got me to thinking, why not have an emptyRanges class, or else the
ability to index a bunch of NULL ranges without a lot of hoohah?  The
defaults mostly do what they're supposed to; why not have a compact
representation of empty rowData as for empty colData (i.e., a DataFrame
with 0 columns)?  Or is a GRangesList of empty GRanges as compact as it is
practicable to get for this purpose?

Just pondering what the lowest-impact solution to the problem at hand might
be.


Statistics is the grammar of science.
Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science
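For comparison with Tim's question, a sketch of the compact empty case using a base-R `data.frame` (assumed here as a stand-in for `DataFrame`): a table with row names but zero columns has the same shape as the empty `colData(foo)` output above, and it still subsets by row.

```r
## One row per feature, no columns: the same shape as the
## "DataFrame with 10 rows and 0 columns" that colData(foo) returned.
emptyRowData <- data.frame(row.names = letters[1:10])
dim(emptyRowData)

## Row subsetting still works and keeps the feature names aligned.
sub <- emptyRowData[c(2, 4), , drop = FALSE]
rownames(sub)
```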



Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet

2014-11-26 Thread Michael Lawrence
GRangesList is very compact, so this would definitely get the job done. But
having an empty range is not the same as an NA, nor does it mean that ranges
are irrelevant. There are definitely times, especially as we extend
beyond genomics, when we need something more general, as Pete suggests.

As an aside I think there is an interesting structural relationship between
something like an eSet and a pivot table in a spreadsheet, except an eSet
has multiple measurement tables and the column/row annotations are not just
for aggregation. If we start to think more broadly, we should consider such
specializations and try to unify them into a single framework.




Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet

2014-11-26 Thread Hector Corrada Bravo
One thing that’s become apparent working on epivizr is that it may be useful to 
think about ‘rowData’ in a SummarizedExperiment as having two distinct 
components: row coordinates and row metadata. In the current class rowData is a 
‘GenomicRanges’ which contains both coordinates (the ranges) and metadata 
(mcols(rowData)). In metagenomics (the other application my group works a lot 
with), we think of the taxonomy as providing coordinates. The distinction is 
worth thinking about, since there are certain operations we do on 
coordinates that we don’t do with metadata (and conversely).

Thinking about it this way, the ‘ExpressionSet’ object would be data without 
coordinates. So, I would avoid making ‘GenomicRanges’ behave like ‘DataFrame’ 
since this distinction between coordinates and metadata is lost. The 
‘emptyRanges’ proposal gets closer to this since this corresponds to ‘no 
coordinates’, but in the long term it may be worth thinking about making the 
coordinate/metadata distinction more general.

Hector
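To illustrate Hector's distinction, a base-R sketch (a hypothetical layout, not epivizr or Bioconductor code) that stores coordinates and metadata as separate tables. Genomic-style start/end intervals are assumed for simplicity; a taxonomy-based coordinate system would need a different operation. The overlap query touches only the coordinates, and the metadata simply follows the selected rows.

```r
## Hypothetical layout: coordinates and metadata held separately,
## one row per feature in each.
coords <- data.frame(start = c(1L, 100L, 250L, 400L),
                     end   = c(50L, 180L, 300L, 420L))
meta   <- data.frame(gene  = c("g1", "g2", "g3", "g4"),
                     score = c(0.2, 1.5, 0.9, 3.1),
                     stringsAsFactors = FALSE)

## A coordinate-only operation: which features overlap [qstart, qend]?
## Closed-interval overlap test; the metadata table is never consulted.
overlapping <- function(coords, qstart, qend) {
    which(coords$start <= qend & coords$end >= qstart)
}

hits <- overlapping(coords, 150L, 260L)
meta[hits, ]   # metadata rows follow the coordinate query
```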


Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet

2014-11-26 Thread Hervé Pagès

Hi guys,

I like the idea of separating the row data from the row ranges.
This could be formalized with 2 distinct accessors: rowData() and
rowRanges(). The former would return a DataFrame, and the latter
NULL or a range-based object (GRanges or GRangesList).
I don't think there is a need for an emptyRanges class.

H.
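A toy version of Hervé's two-accessor proposal, assuming base R only: the class `ToySE` is hypothetical, and a `data.frame` of start/end positions stands in for a GRanges or GRangesList. `rowData()` always returns a table, while `rowRanges()` returns `NULL` when there are no coordinates, so no separate emptyRanges class is needed.

```r
library(methods)

## Stand-in for "NULL or a range-based object".
setClassUnion("OptionalRanges", c("data.frame", "NULL"))

setClass("ToySE",
         slots = c(assay = "matrix", rowData = "data.frame",
                   rowRanges = "OptionalRanges"))

## The two accessors from the proposal, defined here only for ToySE.
setGeneric("rowData", function(x) standardGeneric("rowData"))
setGeneric("rowRanges", function(x) standardGeneric("rowRanges"))
setMethod("rowData", "ToySE", function(x) x@rowData)
setMethod("rowRanges", "ToySE", function(x) x@rowRanges)

m <- matrix(1:4, nrow = 2, dimnames = list(c("P1", "P2"), c("s1", "s2")))

## Identifier-only case (e.g. proteins): annotation but no coordinates.
se <- new("ToySE", assay = m,
          rowData = data.frame(protein = c("P1", "P2"),
                               stringsAsFactors = FALSE),
          rowRanges = NULL)

## Same data with coordinates attached.
se2 <- new("ToySE", assay = m, rowData = rowData(se),
           rowRanges = data.frame(start = c(1L, 50L), end = c(40L, 90L)))

is.null(rowRanges(se))
nrow(rowRanges(se2))
```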

Re: [Bioc-devel] SummarizedExperiment vs ExpressionSet

2014-11-26 Thread Peter Haverty
OK, GRanges as a vector that does overlap stuff makes sense, but I think
putting a DataFrame of metadata on that confuses the purpose of the
object.  How about a GRangesTable that inherits from both GenomicRanges
and DataTable?  It would be a DataFrame with a fancy index.  The DataFrame
API would make stuff like colnames work (rather than needing
colnames(mcols(x))). If this were used as the rowData for
SummarizedExperiment, then a plain DataFrame could be made to work too.
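The Python sketch below makes the "DataFrame with a fancy index" idea concrete. It rests on several assumptions. RangesTable, colnames, overlaps() and rows() are hypothetical names. Ranges are treated as 1-based closed intervals. The overlap query is a plain linear scan rather than an indexed search. The point is that the ranges act as the row index while metadata column names are reached directly, with no mcols() step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RangesTable:
    """Hypothetical sketch of a table whose row index is a set of ranges.

    Each row is keyed by (seqname, start, end) as a 1-based closed
    interval; metadata columns live beside the index.
    """
    seqnames: List[str]
    starts: List[int]
    ends: List[int]
    columns: Dict[str, list] = field(default_factory=dict)

    def __post_init__(self) -> None:
        n = len(self.seqnames)
        if len(self.starts) != n or len(self.ends) != n:
            raise ValueError("index vectors must have equal length")
        if any(len(v) != n for v in self.columns.values()):
            raise ValueError("every column needs one value per range")

    @property
    def colnames(self) -> List[str]:
        # Column names come straight from the table, no mcols() indirection.
        return list(self.columns)

    def overlaps(self, seqname: str, start: int, end: int) -> List[int]:
        # Row positions whose range overlaps [start, end] on seqname.
        # Linear scan for clarity only.
        return [k for k in range(len(self.seqnames))
                if self.seqnames[k] == seqname
                and self.starts[k] <= end and self.ends[k] >= start]

    def rows(self, idx: List[int]) -> "RangesTable":
        # Subset the range index and every column together (x[i, ] on a table).
        def pick(v: list) -> list:
            return [v[k] for k in idx]
        return RangesTable(pick(self.seqnames), pick(self.starts), pick(self.ends),
                           {name: pick(col) for name, col in self.columns.items()})
```

Feeding the result of overlaps() into rows() gives "the annotation rows that overlap this region" in a single table.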

Pete


Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Wed, Nov 26, 2014 at 9:33 AM, Michael Lawrence lawrence.mich...@gene.com
 wrote:



 On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty haverty.pe...@gene.com
 wrote:

 Hi all,

 I believe there is a strong need for an object that organizes a collection
 of rectangular data (matrices, etc.) with metadata on the rows and
 columns.  Can SummarizedExperiment inherit from something simpler that has
 a DataFrame as rowData?

   (I believe GenomicRanges should inherit from
 DataTable, rather than Vector, and subset as x[i,j], but maybe that's
 getting a bit off topic.)


 Have to disagree on that. A GRanges is a vector of ranges; a table is a
 list of vectors all of the same length. Different things. There was a lot
 of thought invested in that. But it does subset as x[i,j], so in theory
 SummarizedExperiment could be generalized to contain something with the
 contract of 2D extraction.
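The distinction drawn here, a vector of ranges versus a list of equal-length vectors, can still share one interface. In the hypothetical Python sketch below, Extract2D, Table, Ranges and subset_rows are illustrative names only. Both classes honor [i, j] extraction, where i selects elements or rows and j selects metadata columns. A container that relies only on that contract can therefore hold either class as its row annotation.

```python
from typing import Dict, List, Optional, Protocol, Sequence, Tuple

Key = Tuple[Sequence[int], Optional[Sequence[str]]]


class Extract2D(Protocol):
    """Hypothetical contract: a length plus x[i, j] extraction."""

    def __len__(self) -> int: ...

    def __getitem__(self, key: Key) -> "Extract2D": ...


class Table:
    """A list of equal-length column vectors (assumes at least one column)."""

    def __init__(self, cols: Dict[str, list]):
        lengths = {len(v) for v in cols.values()}
        if len(lengths) > 1:
            raise ValueError("columns must have equal length")
        self.cols = cols
        self.n = lengths.pop() if lengths else 0

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, key: Key) -> "Table":
        # i picks rows; j picks columns by name (None keeps all columns).
        i, j = key
        names = list(self.cols) if j is None else list(j)
        return Table({name: [self.cols[name][k] for k in i] for name in names})


class Ranges:
    """A vector of (start, end) ranges whose metadata columns ride along."""

    def __init__(self, spans: List[Tuple[int, int]], mcols: Dict[str, list]):
        if any(len(v) != len(spans) for v in mcols.values()):
            raise ValueError("one metadata value per range is required")
        self.spans = spans
        self.mcols = mcols

    def __len__(self) -> int:
        return len(self.spans)

    def __getitem__(self, key: Key) -> "Ranges":
        # i picks ranges (the vector elements); j picks metadata columns.
        i, j = key
        names = list(self.mcols) if j is None else list(j)
        return Ranges([self.spans[k] for k in i],
                      {name: [self.mcols[name][k] for k in i] for name in names})


def subset_rows(row_annotation: Extract2D, i: Sequence[int]) -> Extract2D:
    # A container that relies only on the contract can hold either class.
    n = len(row_annotation)
    if any(not 0 <= k < n for k in i):
        raise IndexError("row index out of range")
    return row_annotation[i, None]
```

The two classes remain different things, as argued above. Only the extraction contract is shared.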


 I often see people stuffing arbitrary data into
 an ExpressionSet and naming one of the assays 'exprs' as a workaround.

 Regards,

 Pete

 
 Peter M. Haverty, Ph.D.
 Genentech, Inc.
 phave...@gene.com





