Re: [Bioc-devel] restrictToSNV for VCF

Michael Lawrence Fri, 21 Mar 2014 15:54:06 -0700

Some of the inconsistency emerges from wrappers that correspond to
operations on the rowData. I think that's fine as long as it's obvious (as
in the case of findOverlaps and isSNV). The head and tail functions are by
convention row-based for rectangular objects. I agree though that if we
keep 1D extraction then the behavior of length() should be changed.



On Fri, Mar 21, 2014 at 3:35 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

> Hi Martin,
>
>
> On 03/21/2014 01:45 PM, Martin Morgan wrote:
>
>> On 03/20/2014 05:20 PM, Hervé Pagès wrote:
>>
>>> Hi,
>>>
>>> On 03/19/2014 01:10 PM, Michael Lawrence wrote:
>>>
>>>> You can apparently use 1D extraction for VCF, which is a little
>>>> surprising;
>>>> I learned it from restrictToSNV.
>>>>
>>>
>>> This is inherited from SummarizedExperiment:
>>>
>>>    > example(SummarizedExperiment)
>>>
>>>    > se1
>>>    class: SummarizedExperiment
>>>    dim: 200 6
>>>    exptData(0):
>>>    assays(1): counts
>>>    rownames: NULL
>>>    rowData metadata column names(0):
>>>    colnames(6): A B ... E F
>>>    colData names(1): Treatment
>>>
>>>    > se1[1:4]
>>>    class: SummarizedExperiment
>>>    dim: 4 6
>>>    exptData(0):
>>>    assays(1): counts
>>>    rownames: NULL
>>>    rowData metadata column names(0):
>>>    colnames(6): A B ... E F
>>>    colData names(1): Treatment
>>>
>>> To me that means that a SummarizedExperiment has a length
>>> (conceptually), and that this length is the number of rows.
>>> It would actually help if a "length" method was defined:
>>>
>>>    > length(se1)
>>>    [1] 1
>>>
>>
>> I think of a SummarizedExperiment as fundamentally a matrix with row and
>> column annotations. 'length' would then be prod(dim(se1))
>>
>
> But it's not defined as such either.
>
> Note that findOverlaps() on SummarizedExperiment objects returns a
> Hits object with indices in the 1:nrow(query) or 1:nrow(subject)
> range. I'd like to be able to say "in the seq_along(query) or
> seq_along(subject) range" because that's what findOverlaps() does
> on any other object defined in IRanges/GenomicRanges/GenomicAlignments.
> But I can't because that would be inaccurate.
>
> However, it's conceptually true: I can use the indices in the Hits
> object to do 1D extractions from the query or subject. This is good
> and consistent with any other type of query or subject.
>
>
>> col- and
>> rownames() are defined but names() is NULL. I guess 1-D sub-setting
>> isn't matrix-like, but I don't think that removing this 'feature' simply
>> for consistency's sake is worth it;
>>
>
> I was not suggesting that.
>
>
>> I guess the subsetting logic was
>> copy/pasted from other code without enough thought. head(), tail() could
>> be implemented if this were somehow useful (I usually use these for
>> compact display, and that's irrelevant here...);
>>
>
> I still find it sometimes useful to be able to do head() on a big
> object when I just want to try things on a few elements first:
>
>   > dim(vcf)
>   [1] 1000000       3
>
>   toy <- head(vcf)
>   rowData(toy)
>   assay(toy)
>   isSNV(toy)
>   findOverlaps(toy, exons)
>
> It's more convenient and much quicker than having to truncate the
> individual results:
>
>   head(rowData(vcf))
>   head(assay(vcf))
>   head(isSNV(vcf))
>   head(findOverlaps(vcf, exons))
>
> I guess what I'm trying to say is that while it helps thinking of
> a SummarizedExperiment as fundamentally a matrix, there are already
> enough differences with the matrix API to suggest that, unlike for a
> matrix, the length of a SummarizedExperiment object is its number of rows.
> It's implicit in many ways and I think that formalizing it would help
> rather than hurt. It will still be somewhat of a surprise for the
> end-user, but not a bigger surprise than the ones s/he gets right
> now with seq_along(vcf), vcf[i], isSNV(vcf), findOverlaps(), head(),
> etc. And once s/he gets over it, there won't be any more surprises:
> all these things will be in agreement with length(vcf) and behave
> as expected.
>
> Thanks,
> H.
>
>
>
>> rev() on a matrix
>> doesn't do anything useful.
>>
>> Martin
>>
>>
>>> That would automatically fix many convenience [ wrappers like head(),
>>> tail(), rev(), etc...
>>>
>>>    > head(se1)
>>>    class: SummarizedExperiment
>>>    dim: 1 6
>>>    exptData(0):
>>>    assays(1): counts
>>>    rownames: NULL
>>>    rowData metadata column names(0):
>>>    colnames(6): A B ... E F
>>>    colData names(1): Treatment
>>>
>>>    > rev(se1)
>>>    class: SummarizedExperiment
>>>    dim: 1 6
>>>    exptData(0):
>>>    assays(1): counts
>>>    rownames: NULL
>>>    rowData metadata column names(0):
>>>    colnames(6): A B ... E F
>>>    colData names(1): Treatment
>>>
>>> Following that logic names(se1) should probably also return colnames(se1).
>>>
>>> H.
>>>
>>>
>>>>
>>>>
>>>>
>>>> On Wed, Mar 19, 2014 at 1:07 PM, Vincent Carey
>>>> <st...@channing.harvard.edu> wrote:
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 19, 2014 at 4:00 PM, Michael Lawrence <
>>>>> lawrence.mich...@gene.com> wrote:
>>>>>
>>>>>> It would be nice to have functions like isSNV, isIndel, isDeletion, etc.
>>>>>> that at least provide precise definitions of the terminology. I've added
>>>>>> these, but they're designed only for VRanges. Should work for ExpandedVCF.
>>>>>>
>>>>>> Also, it would be nice if restrictToSNV just assumed that alt(x) must be
>>>>>> something with nchar() support (with special handling for any List), so
>>>>>> that the 'character' vector of alt,VRanges would work immediately.
>>>>>> Basically restrictToSNV should just be x[isSNV(x)]. Is there even a
>>>>>> use-case for the restrictToSNV abstraction if we did that?
>>>>>>
>>>>>>
>>>>> For a VCF instance it would be x[isSNV(x),], and indeed I think that
>>>>> would be sufficient. I like the idea of having this family of predicates
>>>>> for variant classes to allow such selections.
>>>>>
>>>>>
>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 18, 2014 at 10:36 AM, Valerie Obenchain
>>>>>> <voben...@fhcrc.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've added a restrictToSNV() function to VariantAnnotation (1.9.46). The
>>>>>>> return value is a subset VCF object containing SNVs only. The function
>>>>>>> operates on CollapsedVCF or ExpandedVCF and the alt(VCF) value must be
>>>>>>> nucleotides (i.e., no structural variants).
>>>>>>>
>>>>>>> A variant is considered a SNV if the nucleotide sequences in both
>>>>>>> ref(vcf) and alt(vcf) are of length 1. I have a question about how
>>>>>>> variants with multiple 'ALT' values should be handled.
>>>>>>>
>>>>>>> Should we consider row 4 a SNV? One 'ALT' is length 1, the other is not.
>>>>>>>
>>>>>>> ALT <- DNAStringSetList("A", c("TT"), c("G", "A"), c("TT", "C"))
>>>>>>> REF <- DNAStringSet(c("G", c("AA"), "T", "G"))
>>>>>>>
>>>>>>> > DataFrame(REF, ALT)
>>>>>>> DataFrame with 4 rows and 2 columns
>>>>>>>               REF                ALT
>>>>>>>    <DNAStringSet> <DNAStringSetList>
>>>>>>> 1              G                  A
>>>>>>> 2             AA                 TT
>>>>>>> 3              T                G,A
>>>>>>> 4              G               TT,C
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Valerie
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioc-devel@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
