Re: [Bioc-devel] cigarToRleList fails

Jesper Gådin Thu, 20 Feb 2014 14:14:34 -0800

Very true that it is quite difficult to find the documentation when one is
not aware of its existence :P


coverage() is fast and beautiful. Thanks!

/Jesper


On Wed, Feb 19, 2014 at 9:21 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

> Hi Jesper,
>
>
> On 02/19/2014 08:44 AM, Michael Lawrence wrote:
>
>> On Wed, Feb 19, 2014 at 8:39 AM, Jesper Gådin <jesper.ga...@gmail.com
>> >wrote:
>>
>>  Hi Michael,
>>>
>>> Herves suggestion will probably work for my use case, but if there are
>>> any
>>> smoother ways it would be preferable.
>>>
>>> The use case is as follows:
>>>
>>> 1) calculate strand specific coverage over a region from
>>> GAlignments object (or file)
>>>
>>> At the moment I read a file using readGAlignmentsFromBam() with tag="XS",
>>> then filter it on "flag" and "mapq". Then I subset the
>>> resulting GAlignments in a minus and a plus -strand object.
>>> Then I calculate coverage by my own coverage function which uses the
>>> cigar
>>> information in the GAlignments object. This function is the one using
>>> cigarToRleList() at the moment. As I understand the coverage() function
>>> from the GenomicAlignments/IRanges package does not take into account
>>> cigars, or does it?
>>>
>>>
>>>  It does take the coverage into account; specifically to exclude the
>> introns
>> from coverage. I think there's also an option to exclude deletions.
>>
>
> Unfortunately the man page is not easy to access (you need to do
> ?`coverage,GAlignments-method`), but it says:
>
>    The methods for GAlignments and GAlignmentPairs objects do:
>
>      coverage(grglist(x), ...)
>
> And if you do grglist() on a GAlignments or GAlignmentPairs objects, the
> ranges you get in the returned GRangesList object are calculated based
> on the CIGAR.
>
> Trust but verify. Here is how you can actually verify that coverage()
> does take the CIGAR into account:
>
>   library(RNAseqData.HNRNPC.bam.chr14)
>   gal <- readGAlignments(RNAseqData.HNRNPC.bam.chr14_BAMFILES[1])
>   cig_op_table <- cigarOpTable(cigar(gal))
>
> First we pick up an alignment with an N in its CIGAR and do coverage()
> on it:
>
>   > gal_with_N <- gal[which(cig_op_table[ , "N"] != 0)[1]]
>
>   > gal_with_N
>   GAlignments with 1 alignment and 0 metadata columns:
>         seqnames strand       cigar    qwidth     start       end     width
>            <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
>     [1]    chr14      + 55M2117N17M        72  19650072  19652260      2189
>              ngap
>         <integer>
>     [1]         1
>     ---
>     seqlengths:
>                     chr1                 chr10 ...                  chrY
>                249250621             135534747 ...              59373566
>
>   > coverage(gal_with_N)$chr14
>   integer-Rle of length 107349540 with 5 runs
>     Lengths: 19650071       55     2117       17 87697280
>     Values :        0        1        0        1        0
>
> Same thing with an alignment with an I in its CIGAR:
>
>   > gal_with_I <- gal[which(cig_op_table[ , "I"] != 0)[1]]
>
>   > gal_with_I
>   GAlignments with 1 alignment and 0 metadata columns:
>         seqnames strand       cigar    qwidth     start       end     width
>            <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
>     [1]    chr14      -     64M1I7M        72  19411677  19411747        71
>              ngap
>         <integer>
>     [1]         0
>     ---
>     seqlengths:
>                     chr1                 chr10 ...                  chrY
>                249250621             135534747 ...              59373566
>
>   > coverage(gal_with_I)$chr14
>   integer-Rle of length 107349540 with 3 runs
>     Lengths: 19411676       71 87937793
>     Values :        0        1        0
>
> Same thing with an alignment with a D in its CIGAR:
>
>   > gal_with_D <- gal[which(cig_op_table[ , "D"] != 0)[1]]
>
>   > gal_with_D
>   GAlignments with 1 alignment and 0 metadata columns:
>         seqnames strand       cigar    qwidth     start       end     width
>            <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
>     [1]    chr14      +    38M1D34M        72  19659063  19659135        73
>              ngap
>         <integer>
>     [1]         0
>     ---
>     seqlengths:
>                     chr1                 chr10 ...                  chrY
>                249250621             135534747 ...              59373566
>
>   > coverage(gal_with_D)$chr14
>   integer-Rle of length 107349540 with 3 runs
>     Lengths: 19659062       73 87690405
>     Values :        0        1        0
>
> Seeing is believing,
>
> Cheers,
> H.
>
>
>>
>>  I started to look at the applyPileups() in Rsamtools which I can get to
>>> calculate coverage using cigars, but not using the strand or flag
>>> information for filtering. That solution would start from a bam-file
>>> instead of a GAlignments object, and sure I can do the filtering outside
>>> R.
>>> But it would be very convenient to do it all from within R.
>>>
>>> If there are nice solutions starting from both a GAlignments and a
>>> bam-file it would be great! =)
>>>
>>> /Jesper
>>>
>>>
>>>
>>> On Tue, Feb 18, 2014 at 10:52 PM, Michael Lawrence <
>>> lawrence.mich...@gene.com> wrote:
>>>
>>>  Hi Jesper,
>>>>
>>>> Would you be willing to volunteer your use case? As Herve hinted,
>>>> cigarToRleList and friends are low-level helpers. There may be an easier
>>>> way to achieve what you want, or an opportunity to improve things.
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Mon, Feb 17, 2014 at 1:10 AM, Jesper Gådin <jesper.ga...@gmail.com
>>>> >wrote:
>>>>
>>>>  Hi,
>>>>>
>>>>> Have come across a cigar-vector that is problematic to process.
>>>>>
>>>>> #load package
>>>>>
>>>>>> library(GenomicAlignments)
>>>>>>
>>>>>
>>>>> #load data (see attached file)
>>>>>
>>>>>> load("2014-02-17-cigarExample.rdata")
>>>>>>
>>>>>
>>>>> #run function cigarToRleList
>>>>>
>>>>>> cigarRle <- cigarToRleList(cigarExample)
>>>>>>
>>>>> Error in .Call2("Rle_constructor", values, lengths, check, 0L, PACKAGE
>>>>> =
>>>>> "IRanges") :
>>>>>    integer overflow while summing elements in 'lengths'
>>>>>
>>>>>  sessionInfo()
>>>>>>
>>>>> R Under development (unstable) (2013-11-13 r64209)
>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>>>> [8] base
>>>>>
>>>>> other attached packages:
>>>>> [1] GenomicAlignments_0.99.18 Rsamtools_1.15.26
>>>>> [3] Biostrings_2.31.12        XVector_0.3.6
>>>>> [5] GenomicRanges_1.15.26     IRanges_1.21.23
>>>>> [7] BiocGenerics_0.9.3
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>   [1] BatchJobs_1.1-1135 BBmisc_1.5         BiocParallel_0.5.8
>>>>> bitops_1.0-6
>>>>>
>>>>>   [5] brew_1.0-6         codetools_0.2-8    DBI_0.2-7
>>>>>   digest_0.6.4
>>>>>
>>>>>   [9] fail_1.2           foreach_1.4.1      iterators_1.0.6    plyr_1.8
>>>>>
>>>>> [13] RSQLite_0.11.4     sendmailR_1.1-2    stats4_3.1.0
>>>>> tools_3.1.0
>>>>>
>>>>> [17] zlibbioc_1.9.0
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>         [[alternative HTML version deleted]]
>>
>>
>>
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] cigarToRleList fails

Reply via email to