Some of the inconsistency emerges from wrappers that correspond to operations on the rowData. I think that's fine as long as it's obvious (as in the case of findOverlaps and isSNV). The head and tail functions are by convention row-based for rectangular objects. I agree though that if we keep 1D extraction then the behavior of length() should be changed.
On Fri, Mar 21, 2014 at 3:35 PM, Hervé Pagès <hpa...@fhcrc.org> wrote: > Hi Martin, > > > On 03/21/2014 01:45 PM, Martin Morgan wrote: > >> On 03/20/2014 05:20 PM, Hervé Pagès wrote: >> >>> Hi, >>> >>> On 03/19/2014 01:10 PM, Michael Lawrence wrote: >>> >>>> You can apparently use 1D extraction for VCF, which is a little >>>> surprising; >>>> I learned it from restrictToSNV. >>>> >>> >>> This is inherited from SummarizedExperiment: >>> >>> > example(SummarizedExperiment) >>> >>> > se1 >>> class: SummarizedExperiment >>> dim: 200 6 >>> exptData(0): >>> assays(1): counts >>> rownames: NULL >>> rowData metadata column names(0): >>> colnames(6): A B ... E F >>> colData names(1): Treatment >>> >>> > se1[1:4] >>> class: SummarizedExperiment >>> dim: 4 6 >>> exptData(0): >>> assays(1): counts >>> rownames: NULL >>> rowData metadata column names(0): >>> colnames(6): A B ... E F >>> colData names(1): Treatment >>> >>> To me that means that a SummarizedExperiment has a length >>> (conceptually), and that this length is the number of rows. >>> It would actually help if a "length" method was defined: >>> >>> > length(se1) >>> [1] 1 >>> >> >> I think of a SummarizedExperiment as fundamentally a matrix with row and >> column annotations. 'length' would then be prod(dim(se1)) >> > > But it's not defined as such either. > > Note that findOverlaps() on SummarizedExperiment objects returns a > Hits object with indices in the 1:nrow(query) or 1:nrow(subject) > range. I'd like to be able to say "in the seq_along(query) or > seq_along(subject) range" because that's what findOverlaps() does > on any other object defined in IRanges/GenomicRanges/GenomicAlignments. > But I can't because that would be inaccurate. > > However, it's conceptually true: I can use the indices in the Hits > object to do 1D extractions from the query or subject. This is good > and consistent with any other type of query or subject. > > > col- and >> rownames() are defined but names() is NULL. I guess 1-D sub-setting >> isn't matrix-like, but I don't think that removing this 'feature' simply >> for consistency sake is worth it; >> > > I was not suggesting that. > > > I guess the subsetting logical was >> copy/pasted from other code without enough thought. head(), tail() could >> be implemented if this were somehow useful (I usually use these for >> compact display, and that's irrelevant here...); >> > > I still find it sometimes useful to be able to do head() on a big > object when I just want to try things on a few elements first: > > > dim(vcf) > [1] 1000000 3 > > toy <- head(vcf) > rowData(toy) > assay(toy) > isSNV(toy) > findOverlaps(toy, exons) > > It's more convenient and much quicker than having to truncate the > individual results: > > head(rowData(vcf)) > head(assay(vcf)) > head(isSNV(vcf)) > head(findOverlaps(vcf, exons)) > > I guess what I'm trying to say is that while it helps thinking of > a SummarizedExperiment as fundamentally a matrix, there are already > enough differences with the matrix API to suggest that, unlike for a > matrix, the length of a SummarizedExperiment object is its nb of rows. > It's implicit in many ways and I think that formalizing it would help > rather than hurt. It will still be somewhat a surprise for the > end-user, but not a bigger surprise than the ones s/he gets right > now with seq_along(vcf), vcf[i], isSNV(vcf), findOverlaps(), head(), > etc.. And once s/he gets over it, there won't be anymore surprises: > all these things will be in agreement with length(vcf) and behave > as expected. > > Thanks, > H. > > > > rev() on a matrix >> doesn't do anything useful. >> >> Martin >> >> >>> That would automatically fix many convenience [ wrappers like head(), >>> tail(), rev(), etc... >>> >>> > head(se1) >>> class: SummarizedExperiment >>> dim: 1 6 >>> exptData(0): >>> assays(1): counts >>> rownames: NULL >>> rowData metadata column names(0): >>> colnames(6): A B ... E F >>> colData names(1): Treatment >>> >>> > rev(se1) >>> class: SummarizedExperiment >>> dim: 1 6 >>> exptData(0): >>> assays(1): counts >>> rownames: NULL >>> rowData metadata column names(0): >>> colnames(6): A B ... E F >>> colData names(1): Treatment >>> >>> Following that logic names(se1) also probably return colnames(se1). >>> >>> H. >>> >>> >>>> >>>> >>>> >>>> On Wed, Mar 19, 2014 at 1:07 PM, Vincent Carey >>>> <st...@channing.harvard.edu>wrote: >>>> >>>> >>>>> >>>>> >>>>> On Wed, Mar 19, 2014 at 4:00 PM, Michael Lawrence < >>>>> lawrence.mich...@gene.com> wrote: >>>>> >>>>> It would be nice to have functions like isSNV, isIndel, isDeletion, >>>>>> etc >>>>>> that at least provide precise definitions of the terminology. I've >>>>>> added >>>>>> these, but they're designed only for VRanges. Should work for >>>>>> ExpandedVCF. >>>>>> >>>>>> Also, it would be nice if restrictToSNV just assumed that alt(x) >>>>>> must be >>>>>> something with nchar() support (with special handling for any >>>>>> List), so >>>>>> that the 'character' vector of alt,VRanges would work immediately. >>>>>> Basically restrictToSNV should just be x[isSNV(x)]. Is there even a >>>>>> use-case for the restrictToSNV abstraction if we did that? >>>>>> >>>>>> >>>>>> for VCF instance it would be x[isSNV(x),] and indeed I think that >>>>> would be >>>>> sufficient. i like the idea of having this family of predicates for >>>>> variant classes to allow such selections >>>>> >>>>> >>>>> >>>>> Michael >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Mar 18, 2014 at 10:36 AM, Valerie Obenchain >>>>>> <voben...@fhcrc.org>wrote: >>>>>> >>>>>> Hi, >>>>>>> >>>>>>> I've added a restrictToSNV() function to VariantAnnotation >>>>>>> (1.9.46). The >>>>>>> return value is a subset VCF object containing SNVs only. The >>>>>>> function >>>>>>> operates on CollapsedVCF or ExapandedVCF and the alt(VCF) value >>>>>>> must be >>>>>>> nucleotides (i.e., no structural variants). >>>>>>> >>>>>>> A variant is considered a SNV if the nucleotide sequences in both >>>>>>> ref(vcf) and alt(x) are of length 1. I have a question about how >>>>>>> variants >>>>>>> with multiple 'ALT' values should be handled. >>>>>>> >>>>>>> Should we consider row 4 a SNV? One 'ALT' is length 1, the other >>>>>>> is not. >>>>>>> >>>>>>> ALT <- DNAStringSetList("A", c("TT"), c("G", "A"), c("TT", "C")) >>>>>>> REF <- DNAStringSet(c("G", c("AA"), "T", "G")) >>>>>>> >>>>>>> DataFrame(REF, ALT) >>>>>>>> >>>>>>>>> >>>>>>>>> DataFrame with 4 rows and 2 columns >>>>>>>> REF ALT >>>>>>>> <DNAStringSet> <DNAStringSetList> >>>>>>>> 1 G A >>>>>>>> 2 AA TT >>>>>>>> 3 T G,A >>>>>>>> 4 G TT,C >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> Thanks. >>>>>>> Valerie >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioc-devel@r-project.org mailing list >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioc-devel@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>> >>>> >>> >> >> > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel