Very true that it is quite difficult to find the documentation when one is not aware of its existence :P
coverage() is fast and beautiful. Thanks! /Jesper On Wed, Feb 19, 2014 at 9:21 PM, Hervé Pagès <hpa...@fhcrc.org> wrote: > Hi Jesper, > > > On 02/19/2014 08:44 AM, Michael Lawrence wrote: > >> On Wed, Feb 19, 2014 at 8:39 AM, Jesper Gådin <jesper.ga...@gmail.com >> >wrote: >> >> Hi Michael, >>> >>> Herves suggestion will probably work for my use case, but if there are >>> any >>> smoother ways it would be preferable. >>> >>> The use case is as follows: >>> >>> 1) calculate strand specific coverage over a region from >>> GAlignments object (or file) >>> >>> At the moment I read a file using readGAlignmentsFromBam() with tag="XS", >>> then filter it on "flag" and "mapq". Then I subset the >>> resulting GAlignments in a minus and a plus -strand object. >>> Then I calculate coverage by my own coverage function which uses the >>> cigar >>> information in the GAlignments object. This function is the one using >>> cigarToRleList() at the moment. As I understand the coverage() function >>> from the GenomicAlignments/IRanges package does not take into account >>> cigars, or does it? >>> >>> >>> It does take the coverage into account; specifically to exclude the >> introns >> from coverage. I think there's also an option to exclude deletions. >> > > Unfortunately the man page is not easy to access (you need to do > ?`coverage,GAlignments-method`), but it says: > > The methods for GAlignments and GAlignmentPairs objects do: > > coverage(grglist(x), ...) > > And if you do grglist() on a GAlignments or GAlignmentPairs objects, the > ranges you get in the returned GRangesList object are calculated based > on the CIGAR. > > Trust but verify. Here is how you can actually verify that coverage() > does take the CIGAR into account: > > library(RNAseqData.HNRNPC.bam.chr14) > gal <- readGAlignments(RNAseqData.HNRNPC.bam.chr14_BAMFILES[1]) > cig_op_table <- cigarOpTable(cigar(gal)) > > First we pick up an alignment with an N in its CIGAR and do coverage() > on it: > > > gal_with_N <- gal[which(cig_op_table[ , "N"] != 0)[1]] > > > gal_with_N > GAlignments with 1 alignment and 0 metadata columns: > seqnames strand cigar qwidth start end width > <Rle> <Rle> <character> <integer> <integer> <integer> <integer> > [1] chr14 + 55M2117N17M 72 19650072 19652260 2189 > ngap > <integer> > [1] 1 > --- > seqlengths: > chr1 chr10 ... chrY > 249250621 135534747 ... 59373566 > > > coverage(gal_with_N)$chr14 > integer-Rle of length 107349540 with 5 runs > Lengths: 19650071 55 2117 17 87697280 > Values : 0 1 0 1 0 > > Same thing with an alignment with an I in its CIGAR: > > > gal_with_I <- gal[which(cig_op_table[ , "I"] != 0)[1]] > > > gal_with_I > GAlignments with 1 alignment and 0 metadata columns: > seqnames strand cigar qwidth start end width > <Rle> <Rle> <character> <integer> <integer> <integer> <integer> > [1] chr14 - 64M1I7M 72 19411677 19411747 71 > ngap > <integer> > [1] 0 > --- > seqlengths: > chr1 chr10 ... chrY > 249250621 135534747 ... 59373566 > > > coverage(gal_with_I)$chr14 > integer-Rle of length 107349540 with 3 runs > Lengths: 19411676 71 87937793 > Values : 0 1 0 > > Same thing with an alignment with a D in its CIGAR: > > > gal_with_D <- gal[which(cig_op_table[ , "D"] != 0)[1]] > > > gal_with_D > GAlignments with 1 alignment and 0 metadata columns: > seqnames strand cigar qwidth start end width > <Rle> <Rle> <character> <integer> <integer> <integer> <integer> > [1] chr14 + 38M1D34M 72 19659063 19659135 73 > ngap > <integer> > [1] 0 > --- > seqlengths: > chr1 chr10 ... chrY > 249250621 135534747 ... 59373566 > > > coverage(gal_with_D)$chr14 > integer-Rle of length 107349540 with 3 runs > Lengths: 19659062 73 87690405 > Values : 0 1 0 > > Seeing is believing, > > Cheers, > H. > > >> >> I started to look at the applyPileups() in Rsamtools which I can get to >>> calculate coverage using cigars, but not using the strand or flag >>> information for filtering. That solution would start from a bam-file >>> instead of a GAlignments object, and sure I can do the filtering outside >>> R. >>> But it would be very convenient to do it all from within R. >>> >>> If there are nice solutions starting from both a GAlignments and a >>> bam-file it would be great! =) >>> >>> /Jesper >>> >>> >>> >>> On Tue, Feb 18, 2014 at 10:52 PM, Michael Lawrence < >>> lawrence.mich...@gene.com> wrote: >>> >>> Hi Jesper, >>>> >>>> Would you be willing to volunteer your use case? As Herve hinted, >>>> cigarToRleList and friends are low-level helpers. There may be an easier >>>> way to achieve what you want, or an opportunity to improve things. >>>> >>>> Michael >>>> >>>> >>>> On Mon, Feb 17, 2014 at 1:10 AM, Jesper Gådin <jesper.ga...@gmail.com >>>> >wrote: >>>> >>>> Hi, >>>>> >>>>> Have come across a cigar-vector that is problematic to process. >>>>> >>>>> #load package >>>>> >>>>>> library(GenomicAlignments) >>>>>> >>>>> >>>>> #load data (see attached file) >>>>> >>>>>> load("2014-02-17-cigarExample.rdata") >>>>>> >>>>> >>>>> #run function cigarToRleList >>>>> >>>>>> cigarRle <- cigarToRleList(cigarExample) >>>>>> >>>>> Error in .Call2("Rle_constructor", values, lengths, check, 0L, PACKAGE >>>>> = >>>>> "IRanges") : >>>>> integer overflow while summing elements in 'lengths' >>>>> >>>>> sessionInfo() >>>>>> >>>>> R Under development (unstable) (2013-11-13 r64209) >>>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>>>> >>>>> locale: >>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>> >>>>> attached base packages: >>>>> [1] parallel stats graphics grDevices utils datasets methods >>>>> [8] base >>>>> >>>>> other attached packages: >>>>> [1] GenomicAlignments_0.99.18 Rsamtools_1.15.26 >>>>> [3] Biostrings_2.31.12 XVector_0.3.6 >>>>> [5] GenomicRanges_1.15.26 IRanges_1.21.23 >>>>> [7] BiocGenerics_0.9.3 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] BatchJobs_1.1-1135 BBmisc_1.5 BiocParallel_0.5.8 >>>>> bitops_1.0-6 >>>>> >>>>> [5] brew_1.0-6 codetools_0.2-8 DBI_0.2-7 >>>>> digest_0.6.4 >>>>> >>>>> [9] fail_1.2 foreach_1.4.1 iterators_1.0.6 plyr_1.8 >>>>> >>>>> [13] RSQLite_0.11.4 sendmailR_1.1-2 stats4_3.1.0 >>>>> tools_3.1.0 >>>>> >>>>> [17] zlibbioc_1.9.0 >>>>> >>>>> _______________________________________________ >>>>> Bioc-devel@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>>> >>>>> >>>>> >>>> >>> >> [[alternative HTML version deleted]] >> >> >> >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> >> > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel