Re: [Bioc-devel] dimnames of multidimensional assays in SummarizedExperiment

2016-02-24 Thread Hervé Pagès

Hi Pete,

Sorry for the delay.

On 02/10/2016 12:33 PM, Peter Hickey wrote:

The assays slot in a SummarizedExperiment object supports elements
with up to 4 dimensions [*]

library(SummarizedExperiment)
makeSE <- function(n) {
  assay <- array(1:2^n,


What? `^` has precedence over `:` ? Amazing...


                 dim = rep(2, n),
                 dimnames = split(letters[1:(2 * n)], seq_len(n)))
  SummarizedExperiment(assay)
}
x <- makeSE(4)
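
On the precedence remark: R's operator table (see ?Syntax) places `^` above `:`, so `1:2^n` is parsed as `1:(2^n)`. A quick base-R check, where `n` is just an illustrative value:

```r
n <- 4

# `^` binds more tightly than `:`, so both expressions give the same sequence.
same_sequence <- identical(1:2^n, 1:(2^n))
same_sequence        # TRUE

# Forcing the other grouping gives a different, much shorter vector.
other_grouping <- (1:2)^n
length(other_grouping)   # 2
```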

However, the "higher-order" dimnames of the assays aren't preserved
when calling the `assays` or `assay` getters:


dimnames(assay(x, withDimnames = TRUE))

[[1]]
[1] "a" "e"

[[2]]
[1] "b" "f"

[[3]]
NULL

[[4]]
NULL

This is despite the data still being available in the assays slot:


dimnames(x@assays[[1]])

$`1`
[1] "a" "e"

$`2`
[1] "b" "f"

$`3`
[1] "c" "g"

$`4`
[1] "d" "h"

The following patch fixes this by only touching the rownames and
colnames and not touching the "higher-order" dimnames. Seem
reasonable?

Index: R/SummarizedExperiment-class.R
===================================================================
--- R/SummarizedExperiment-class.R (revision 113505)
+++ R/SummarizedExperiment-class.R (working copy)
@@ -174,7 +174,10 @@
 {
     assays <- as(x@assays, "SimpleList")
     if (withDimnames)
-        endoapply(assays, "dimnames<-", dimnames(x))
+        endoapply(assays, function(assay) {
+            dimnames(assay)[1:2] <- dimnames(x)
+            assay
+        })
     else
         assays
 })


Thanks for catching this and providing a patch. I just applied it (in
SummarizedExperiment 1.1.21) and added some tests to cover this.



[*] In fact, the assay elements can have more than 4 dimensions when
constructed, although subsetting with `[` isn't supported (possibly
things other than subsetting break as well in this case).

# No error
y <- makeSE(5)
y

# Error
y[1, ]

Perhaps there should be a check in the constructor that all assay
elements have < 5 dimensions?


The early checking/validation of an SE object is an interesting topic
that is open for discussion. This is something we've discussed
internally with Martin and it seems that there is some benefit in
not enforcing the full SummarizedExperiment API contract upfront.
For example, you could imagine that someone wants to stick a 5-D
assay in an SE object but doesn't have the need to subset the object.
Or that someone wants to stick a 2-D assay that doesn't even
support subsetting, or doesn't support cbind() or rbind() (which means
that then trying to subset the SE object or cbind() or rbind() it
with another SE object will fail).

It actually seems to be a good feature that people can stick almost
anything in the assays slot of an SE object, as long as the individual
assay objects support dim(), dimnames(), and dimnames<-. These are the
minimum requirements and they give you an SE object with minimal
capabilities. The full requirements (i.e. the above plus [, [<-, rbind,
cbind) give you an SE object with full capabilities.
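
A minimal sketch of an object that meets only the minimum requirements above, assuming a hypothetical S4 class `MinimalAssay` that wraps an ordinary array. The class name and its `data` slot are made up for illustration. The sketch does not show that SummarizedExperiment accepts such an object; it only shows the three methods being defined and called.

```r
library(methods)

# Hypothetical class, for illustration only: a thin wrapper around an array.
setClass("MinimalAssay", representation(data = "array"))

# The three methods named as minimum requirements.
setMethod("dim", "MinimalAssay", function(x) dim(x@data))
setMethod("dimnames", "MinimalAssay", function(x) dimnames(x@data))
setReplaceMethod("dimnames", "MinimalAssay", function(x, value) {
    dimnames(x@data) <- value
    x
})

m <- new("MinimalAssay", data = array(1:8, dim = c(2, 2, 2)))
dimnames(m) <- list(c("a", "b"), c("c", "d"), c("e", "f"))

dim(m)             # 2 2 2
dimnames(m)[[3]]   # "e" "f"
```

Because no `[`, `[<-`, rbind, or cbind methods are defined, an object like this would correspond to the "minimal capabilities" case.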

On-disk objects (e.g. in HDF5 format) are a good example of objects
that won't give you SE objects with full capabilities but enough
capabilities for some common workflows. FWIW I'm currently working on
an implementation of HDF5Matrix and HDF5Array objects that will support
dim(), dimnames(), dimnames<-, and [. They won't support [<-, rbind,
and cbind but these capabilities are not needed by popular workflows
like the DESeq2 vignette.

That being said, I don't know the reason for the current 4-dimension
limit on subsetting. It sounds rather arbitrary. Maybe we should just
support subsetting of assays with any number of dimensions.
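
As a rough sketch of what subsetting without a dimension limit could look like, the hypothetical helper `subset_rows_cols()` below (not part of SummarizedExperiment) subsets the first two dimensions of a base-R array with any number of dimensions and leaves the higher-order ones whole. It assumes an ordinary in-memory array.

```r
# Hypothetical helper, for illustration only.
subset_rows_cols <- function(a, i, j) {
    n_extra <- length(dim(a)) - 2L
    # TRUE selects everything along a dimension; drop = FALSE keeps all
    # dimensions even when a subscript has length 1.
    args <- c(list(a, i, j), rep(list(TRUE), n_extra), list(drop = FALSE))
    do.call(`[`, args)
}

a <- array(1:32, dim = rep(2, 5))
b <- subset_rows_cols(a, 1, 1:2)
dim(b)   # 1 2 2 2 2
```

Building the argument list with do.call() avoids hard-coding one branch per number of dimensions.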

Cheers,
H.



Cheers,
Pete

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



Re: [Bioc-devel] dimnames of multidimensional assays in SummarizedExperiment

2016-02-15 Thread Hervé Pagès

Hi Pete,

I'll look into this. Thanks!

H.






[Bioc-devel] dimnames of multidimensional assays in SummarizedExperiment

2016-02-10 Thread Peter Hickey
The assays slot in a SummarizedExperiment object supports elements
with up to 4 dimensions [*]

library(SummarizedExperiment)
makeSE <- function(n) {
  assay <- array(1:2^n,
                 dim = rep(2, n),
                 dimnames = split(letters[1:(2 * n)], seq_len(n)))
  SummarizedExperiment(assay)
}
x <- makeSE(4)

However, the "higher-order" dimnames of the assays aren't preserved
when calling the `assays` or `assay` getters:

> dimnames(assay(x, withDimnames = TRUE))
[[1]]
[1] "a" "e"

[[2]]
[1] "b" "f"

[[3]]
NULL

[[4]]
NULL

This is despite the data still being available in the assays slot:

> dimnames(x@assays[[1]])
$`1`
[1] "a" "e"

$`2`
[1] "b" "f"

$`3`
[1] "c" "g"

$`4`
[1] "d" "h"

The following patch fixes this by only touching the rownames and
colnames and not touching the "higher-order" dimnames. Seem
reasonable?

Index: R/SummarizedExperiment-class.R
===================================================================
--- R/SummarizedExperiment-class.R (revision 113505)
+++ R/SummarizedExperiment-class.R (working copy)
@@ -174,7 +174,10 @@
 {
     assays <- as(x@assays, "SimpleList")
     if (withDimnames)
-        endoapply(assays, "dimnames<-", dimnames(x))
+        endoapply(assays, function(assay) {
+            dimnames(assay)[1:2] <- dimnames(x)
+            assay
+        })
     else
         assays
 })
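
The effect of the `[1:2]` assignment can be seen on a plain base-R array, without the package. This is only a sketch: `a` and `new_names` are illustrative names, and `new_names` stands in for whatever `dimnames(x)` returns on the SE object.

```r
a <- array(1:16, dim = rep(2, 4),
           dimnames = list(c("a", "e"), c("b", "f"), c("c", "g"), c("d", "h")))
new_names <- list(c("r1", "r2"), c("s1", "s2"))

# Overwrite only the row and column names, as in the patch.
dimnames(a)[1:2] <- new_names

dimnames(a)[[1]]   # "r1" "r2"
dimnames(a)[[3]]   # "c" "g"  (higher-order dimnames are untouched)
```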

[*] In fact, the assay elements can have more than 4 dimensions when
constructed, although subsetting with `[` isn't supported (possibly
things other than subsetting break as well in this case).

# No error
y <- makeSE(5)
y

# Error
y[1, ]

Perhaps there should be a check in the constructor that all assay
elements have < 5 dimensions?
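
A sketch of what such a check might look like, written as a standalone base-R function. `check_assay_dims()` and its `max_dims` argument are hypothetical names used for illustration, not existing SummarizedExperiment code, and `assays` is assumed to be a plain list of arrays.

```r
# Hypothetical validation helper, for illustration only.
check_assay_dims <- function(assays, max_dims = 4L) {
    n_dims <- vapply(assays, function(a) length(dim(a)), integer(1))
    too_many <- n_dims > max_dims
    if (any(too_many))
        stop("assay element(s) with more than ", max_dims, " dimensions: ",
             paste(which(too_many), collapse = ", "))
    invisible(TRUE)
}

ok <- check_assay_dims(list(array(1:16, dim = rep(2, 4))))
bad <- tryCatch(check_assay_dims(list(array(1:32, dim = rep(2, 5)))),
                error = function(e) e)
inherits(bad, "error")   # TRUE
```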

Cheers,
Pete
