Hi Herve, I would have expected GRanges behaviour, where the metadata is affected by the replacement. For example, with GRanges objects, I often use the metadata to store statistics or descriptors relevant to each genomic interval, e.g., peak scores, log-fold changes and so on. If I did a replacement to change the genomic coordinates, I would expect that the associated metadata would come along for the ride. The same argument would be applied to other classes derived from Vector.
I should add that I encountered this problem in the context of doing subset replacement for metadata. To re-use your IRanges example: > require(IRanges) > ir <- IRanges(11:14, 20, names=letters[1:4]) > mcols(ir) <- DataFrame(stuff=1:4) > mcols(ir[1])$stuff <- 2 > mcols(ir)$stuff [1] 1 2 3 4 I would expect that the first entry of "stuff" would now be 2. But because metadata doesn't get replaced during subset replacement, it remains at 1. This is problematic, because if I wanted to update a subset of entries in the metadata, I would use something similar to the above replacement and expect it to change the existing object. Cheers, Aaron Hervé Pagès wrote: > Hi Aaron, > > On 11/15/2015 10:59 AM, Aaron Lun wrote: >> Hello all, >> >> I've encountered some unexpected behaviour with some of the base classes >> while developing stuff for genomic interactions. The first issue lies >> with the subset replacement in the Vector class. Let's say I make a >> derived class "foo" inheriting from Vector, as below: >> >> > require(S4Vectors) >> > setClass("foo", contains="Vector", slots=c(blah="integer")) >> > setMethod("parallelSlotNames", "foo", function(x) { >> > c("blah", callNextMethod()) >> > }) >> [1] "parallelSlotNames" >> > setMethod("c", "foo", function(x, ..., recursive=TRUE) { >> > new.blah <- do.call(c, lapply(list(x, ...), FUN=slot, >> name="blah")) >> > new.mcols <- do.call(rbind, lapply(list(x, ...), FUN=mcols)) >> > new("foo", blah=new.blah, metadata=metadata(x), >> > elementMetadata=new.mcols) >> > }) >> [1] "c" >> >> Construction gives what you'd expect: >> >> > a <- new("foo", blah=1:5, elementMetadata=DataFrame(stuff=1:5)) >> > a@blah >> [1] 1 2 3 4 5 >> > mcols(a)$stuff >> [1] 1 2 3 4 5 >> >> However, if I try to do subset replacement, I get this: >> >> > a[1] <- a[2] >> > a@blah >> [1] 2 2 3 4 5 >> > mcols(a)$stuff >> [1] 1 2 3 4 5 >> >> So, "blah" is replaced properly, but "elementMetadata" is not. This is >> attributable to a line in "replaceROWS" which preserves the mcols of the >> original object during replacement (also for "names"). Should this line >> be removed to give expected behaviour for the elementMetadata? > > For the treatment of the metadata columns we are usually mimicking > how names are treated in base R: > > > x <- c(a=1, b=2, c=3, d=4) > > x[1] <- x[2] > > x > a b c d > 2 2 3 4 > > The names are not affected. > > IRanges objects are following that model: > > > library(IRanges) > > ir <- IRanges(11:14, 20, names=letters[1:4]) > > mcols(ir) <- DataFrame(stuff=1:4) > > ir > IRanges of length 4 > start end width names > [1] 11 20 10 a > [2] 12 20 9 b > [3] 13 20 8 c > [4] 14 20 7 d > > mcols(ir) > DataFrame with 4 rows and 1 column > stuff > <integer> > 1 1 > 2 2 > 3 3 > 4 4 > > ir[1] <- ir[2] > > ir > IRanges of length 4 > start end width names > [1] 12 20 9 a > [2] 12 20 9 b > [3] 13 20 8 c > [4] 14 20 7 d > > mcols(ir) > DataFrame with 4 rows and 1 column > stuff > <integer> > 1 1 > 2 2 > 3 3 > 4 4 > > However it seems that GRanges objects are not: > > > gr <- GRanges("chr1", ir) > > gr > GRanges object with 4 ranges and 1 metadata column: > seqnames ranges strand | stuff > <Rle> <IRanges> <Rle> | <integer> > a chr1 [12, 20] * | 1 > b chr1 [12, 20] * | 2 > c chr1 [13, 20] * | 3 > d chr1 [14, 20] * | 4 > ------- > seqinfo: 1 sequence from an unspecified genome; no seqlengths > > gr[1] <- gr[2] > > gr > GRanges object with 4 ranges and 1 metadata column: > seqnames ranges strand | stuff > <Rle> <IRanges> <Rle> | <integer> > a chr1 [12, 20] * | 2 > b chr1 [12, 20] * | 2 > c chr1 [13, 20] * | 3 > d chr1 [14, 20] * | 4 > ------- > seqinfo: 1 sequence from an unspecified genome; no seqlengths > > So we have an inconsistency within our Vector-based classes. > We need to fix that. It seems that you would have expected the > metadata columns to be altered by [<-. Is this what most people feel? > >> >> The other issue is that r/cbind'ing doesn't seem to work properly for >> unnamed multi-matrix Assays objects. Consider: >> >> > require(SummarizedExperiment) >> > whee <- Assays(list(x1=matrix(1, 3, 4), x2=matrix(2, 3, 4))) >> > whee2 <- Assays(list(x1=matrix(3, 3, 4), x2=matrix(4, 3, 4))) >> > rbind(whee, whee2) >> Reference class object of class "ShallowSimpleListAssays" >> Field "data": >> List of length 2 >> names(2): x1 x2 >> > >> > names(whee) <- names(whee2) <- NULL >> > rbind(whee, whee2) >> Reference class object of class "ShallowSimpleListAssays" >> Field "data": >> List of length 1 >> >> So, unnaming and rbind'ing results in the loss of a matrix. This is the >> same issue I reported for unnamed multi-matrix assays when rbinding >> multiple SummarizedExperiment objects; I recall that being resolved by >> r/cbind'ing based on position. Should this be done here as well? If not, >> perhaps we should force people to name their assays. > > Maybe Martin or Val can chime in for this one. > > Thanks, > H. > >> >> Cheers, >> >> Aaron >> >> > sessionInfo() >> R Under development (unstable) (2015-10-30 r69588) >> Platform: x86_64-pc-linux-gnu (64-bit) >> Running under: CentOS release 6.4 (Final) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats4 stats graphics grDevices utils datasets >> [8] methods base >> >> other attached packages: >> [1] SummarizedExperiment_1.1.2 Biobase_2.31.0 >> [3] GenomicRanges_1.23.3 GenomeInfoDb_1.7.3 >> [5] IRanges_2.5.5 S4Vectors_0.9.8 >> [7] BiocGenerics_0.17.1 >> >> loaded via a namespace (and not attached): >> [1] zlibbioc_1.17.0 XVector_0.11.0 >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel > _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel