On 02/20/2014 02:55 PM, Martin Morgan wrote:
On 02/20/2014 02:32 PM, Hervé Pagès wrote:
Hi Jesper,

On 02/20/2014 02:13 PM, Jesper Gådin wrote:
Very true that it is quite difficult to find the documentation when one
is not aware of its existence :P

Yeah, this has been a source of frustration for many people. And
always a source of embarrassment (for us) when teaching our software
to beginners.

I've started to change this. In the upcoming version of BioC (2.14,
scheduled for mid-April), when you'll do ?coverage, you'll get to
choose between the 3 man pages that document coverage methods (there
is one in IRanges, one in GenomicRanges, and one in GenomicAlignments).

I want to generalize this to other generics that have methods spread
across several packages (e.g. findOverlaps, the intra- and inter-range
methods, etc...).

Having to choose between several man pages every time you do e.g.
?findOverlaps is a minor annoyance compared to not being able to
find the man page at all. (Note that if you already know where is
your favorite man page, you'll be able to direct access it with
e.g. ?GenomicRanges::findOverlaps). Nobody will ever need to use
the insane ?`findOverlaps,GenomicRanges,GenomicRanges-method` to

tab completion helps, so that you don't need to be totally insane, just
insane enough to know to start with

?"cover<tab>

and also insane enough to know which method you need to pick up amongst
the 30+ methods listed by ?"findOverlaps<tab>
Overwhelming!

H.


Martin

open that man page again. Ever! (it will still work though...)

Cheers,
H.


coverage() is fast and beautiful. Thanks!

/Jesper


On Wed, Feb 19, 2014 at 9:21 PM, Hervé Pagès <hpa...@fhcrc.org
<mailto:hpa...@fhcrc.org>> wrote:

    Hi Jesper,


    On 02/19/2014 08:44 AM, Michael Lawrence wrote:

        On Wed, Feb 19, 2014 at 8:39 AM, Jesper Gådin
        <jesper.ga...@gmail.com <mailto:jesper.ga...@gmail.com>>wrote:

            Hi Michael,

            Herves suggestion will probably work for my use case, but if
            there are any
            smoother ways it would be preferable.

            The use case is as follows:

            1) calculate strand specific coverage over a region from
            GAlignments object (or file)

            At the moment I read a file using readGAlignmentsFromBam()
            with tag="XS",
            then filter it on "flag" and "mapq". Then I subset the
            resulting GAlignments in a minus and a plus -strand object.
            Then I calculate coverage by my own coverage function which
            uses the cigar
            information in the GAlignments object. This function is the
            one using
            cigarToRleList() at the moment. As I understand the
            coverage() function
            from the GenomicAlignments/IRanges package does not take
            into account
            cigars, or does it?


        It does take the coverage into account; specifically to exclude
        the introns
        from coverage. I think there's also an option to exclude
deletions.


    Unfortunately the man page is not easy to access (you need to do
    ?`coverage,GAlignments-method`__), but it says:

        The methods for GAlignments and GAlignmentPairs objects do:

          coverage(grglist(x), ...)

    And if you do grglist() on a GAlignments or GAlignmentPairs
objects, the
    ranges you get in the returned GRangesList object are calculated
based
    on the CIGAR.

    Trust but verify. Here is how you can actually verify that
coverage()
    does take the CIGAR into account:

       library(RNAseqData.HNRNPC.bam.__chr14)
       gal <- readGAlignments(RNAseqData.__HNRNPC.bam.chr14_BAMFILES[1])
       cig_op_table <- cigarOpTable(cigar(gal))

    First we pick up an alignment with an N in its CIGAR and do
coverage()
    on it:

       > gal_with_N <- gal[which(cig_op_table[ , "N"] != 0)[1]]

       > gal_with_N
       GAlignments with 1 alignment and 0 metadata columns:
             seqnames strand       cigar    qwidth     start       end
       width
                <Rle>  <Rle> <character> <integer> <integer> <integer>
    <integer>
         [1]    chr14      + 55M2117N17M        72  19650072  19652260
        2189
                  ngap
             <integer>
         [1]         1
         ---
         seqlengths:
                         chr1                 chr10 ...
      chrY
                    249250621             135534747 ...
      59373566

       > coverage(gal_with_N)$chr14
       integer-Rle of length 107349540 with 5 runs
         Lengths: 19650071       55     2117       17 87697280
         Values :        0        1        0        1        0

    Same thing with an alignment with an I in its CIGAR:

       > gal_with_I <- gal[which(cig_op_table[ , "I"] != 0)[1]]

       > gal_with_I
       GAlignments with 1 alignment and 0 metadata columns:
             seqnames strand       cigar    qwidth     start       end
       width
                <Rle>  <Rle> <character> <integer> <integer> <integer>
    <integer>
         [1]    chr14      -     64M1I7M        72  19411677  19411747
          71
                  ngap
             <integer>
         [1]         0
         ---
         seqlengths:
                         chr1                 chr10 ...
      chrY
                    249250621             135534747 ...
      59373566

       > coverage(gal_with_I)$chr14
       integer-Rle of length 107349540 with 3 runs
         Lengths: 19411676 71 87937793 <tel:71%2087937793>
         Values :        0        1        0

    Same thing with an alignment with a D in its CIGAR:

       > gal_with_D <- gal[which(cig_op_table[ , "D"] != 0)[1]]

       > gal_with_D
       GAlignments with 1 alignment and 0 metadata columns:
             seqnames strand       cigar    qwidth     start       end
       width
                <Rle>  <Rle> <character> <integer> <integer> <integer>
    <integer>
         [1]    chr14      +    38M1D34M        72  19659063  19659135
          73
                  ngap
             <integer>
         [1]         0
         ---
         seqlengths:
                         chr1                 chr10 ...
      chrY
                    249250621             135534747 ...
      59373566

       > coverage(gal_with_D)$chr14
       integer-Rle of length 107349540 with 3 runs
         Lengths: 19659062       73 87690405
         Values :        0        1        0

    Seeing is believing,

    Cheers,
    H.



            I started to look at the applyPileups() in Rsamtools which I
            can get to
            calculate coverage using cigars, but not using the strand or
            flag
            information for filtering. That solution would start from a
            bam-file
            instead of a GAlignments object, and sure I can do the
            filtering outside R.
            But it would be very convenient to do it all from within R.

            If there are nice solutions starting from both a GAlignments
            and a
            bam-file it would be great! =)

            /Jesper



            On Tue, Feb 18, 2014 at 10:52 PM, Michael Lawrence <
            lawrence.mich...@gene.com
            <mailto:lawrence.mich...@gene.com>> wrote:

                Hi Jesper,

                Would you be willing to volunteer your use case? As
                Herve hinted,
                cigarToRleList and friends are low-level helpers. There
                may be an easier
                way to achieve what you want, or an opportunity to
                improve things.

                Michael


                On Mon, Feb 17, 2014 at 1:10 AM, Jesper Gådin
                <jesper.ga...@gmail.com
                <mailto:jesper.ga...@gmail.com>>wrote:

                    Hi,

                    Have come across a cigar-vector that is problematic
                    to process.

                    #load package

                        library(GenomicAlignments)


                    #load data (see attached file)

                        load("2014-02-17-cigarExample.__rdata")


                    #run function cigarToRleList

                        cigarRle <- cigarToRleList(cigarExample)

                    Error in .Call2("Rle_constructor", values, lengths,
                    check, 0L, PACKAGE =
                    "IRanges") :
                        integer overflow while summing elements in
'lengths'

                        sessionInfo()

                    R Under development (unstable) (2013-11-13 r64209)
                    Platform: x86_64-unknown-linux-gnu (64-bit)

                    locale:
                       [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
                       [3] LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
                       [5] LC_MONETARY=en_US.UTF-8
                      LC_MESSAGES=en_US.UTF-8
                       [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
                       [9] LC_ADDRESS=C               LC_TELEPHONE=C
                    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

                    attached base packages:
                    [1] parallel  stats     graphics  grDevices utils
                       datasets  methods
                    [8] base

                    other attached packages:
                    [1] GenomicAlignments_0.99.18 Rsamtools_1.15.26
                    [3] Biostrings_2.31.12        XVector_0.3.6
                    [5] GenomicRanges_1.15.26     IRanges_1.21.23
                    [7] BiocGenerics_0.9.3

                    loaded via a namespace (and not attached):
                       [1] BatchJobs_1.1-1135 BBmisc_1.5
                    BiocParallel_0.5.8
                    bitops_1.0-6

                       [5] brew_1.0-6         codetools_0.2-8
DBI_0.2-7
                       digest_0.6.4

                       [9] fail_1.2           foreach_1.4.1
                      iterators_1.0.6    plyr_1.8

                    [13] RSQLite_0.11.4     sendmailR_1.1-2
                      stats4_3.1.0       tools_3.1.0

                    [17] zlibbioc_1.9.0

                    _________________________________________________
                    Bioc-devel@r-project.org
                    <mailto:Bioc-devel@r-project.org> mailing list
                    https://stat.ethz.ch/mailman/__listinfo/bioc-devel
                    <https://stat.ethz.ch/mailman/listinfo/bioc-devel>





                 [[alternative HTML version deleted]]




        _________________________________________________
        Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
        mailing list
        https://stat.ethz.ch/mailman/__listinfo/bioc-devel
        <https://stat.ethz.ch/mailman/listinfo/bioc-devel>


    --
    Hervé Pagès

    Program in Computational Biology
    Division of Public Health Sciences
    Fred Hutchinson Cancer Research Center
    1100 Fairview Ave. N, M1-B514
    P.O. Box 19024
    Seattle, WA 98109-1024

    E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
    Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
    Fax: (206) 667-1319 <tel:%28206%29%20667-1319>






--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to