On 12/05/2013 06:37 AM, Taku Tokuyasu wrote:
Martin, Michael:

Thank you for the prompt replies!  Glad to know the functionality is available.
  The suggestions on alternatives are also quite helpful.  Amongst other
reasons, we wish to support htseq-count because it is part of a reference
RNA-seq protocol (http://www.ncbi.nlm.nih.gov/pubmed/23975260), and because I
haven't gotten around to comparing timing and functionality with
GenomicAlignments.  This is probably in a different thread, but e.g. is it known
that GenomicAlignments::summarizeOverlaps() produces identical results to
htseq-count with equivalent parameters?  What about timing?

Perhaps Valerie can chime in with specifics.

It is very hard to say that two algorithms are the 'same', in general the functionality of ht-seq and summarizeOverlaps is comparable; both support multiple counting modes (including in the case of summarizeOverlaps user-specified modes) but it would not be completely surprising if there were cases where the behaviour was different.

The parathyroidSE vignette

  
http://bioconductor.org/packages/release/data/experiment/html/parathyroidSE.html

details use of summarizeOverlaps for counting.

The Rsubread manuscript Table 1

  http://www.ncbi.nlm.nih.gov/pubmed/24227677

contains a performance comparison, although the memory consumption of of summarizeOverlaps as reported in that paper appears to rely on reading an entire file in to memory (files are iterated in chunks when processed in the way described by the parathyroidSE vignette; this should have minimal cost for overall speed but provide more reasonable memory management). Rsubread is not available on Windows.

Martin


_Taku



On Thu, Dec 5, 2013 at 5:14 AM, Michael Lawrence <[email protected]
<mailto:[email protected]>> wrote:

    It would be appreciated to know where the current counting methods fall
    short; i.e., why htseq-count is necessary.

    Thanks,
    Michael


    On Thu, Dec 5, 2013 at 12:30 AM, Martin Morgan <[email protected]
    <mailto:[email protected]>> wrote:

        On 12/04/2013 11:17 PM, Taku Tokuyasu wrote:

            Hello,

            We are trying to support NGS pipelines with SAM input to
            htseq-count, w/o
            requiring a samtools install.  Has Rsamtools implemented / planned
            any support
            for BAM to SAM conversion?  I notice this has been requested before
            (e.g. May
            2012,
            
http://comments.gmane.org/__gmane.science.biology.__informatics.conductor/41344
            
<http://comments.gmane.org/gmane.science.biology.informatics.conductor/41344>),
            so
            I was wondering if there were any updates on this.


        I added this to Rsamtools 1.15.14; this requires other package
        dependencies to be current. I had delayed adding this because it seems a
        step backward, to a relatively inefficient representation.

        Other counting alternatives not requiring coercion to SAM files are
        GenomicAlignments::__summarizeOverlaps (acting on BamFile / BamViews /
        GAlignments / GAlignmentPairs, counting bins as GRanges / GRanges list
        from TxDb or standard GTF etc files via rtracklayer::import) and
        Rsubread::featureCount (SAM or BAM files, built-in or GTF or SAF
        annotation files).

        Please let me know if there are issues with the implementation.

        Martin


            Regards,

            _Taku

            Taku Tokuyasu
            Computational Biology Core
            UCSF Helen Diller Family Comprehensive Cancer Center




        --
        Computational Biology / Fred Hutchinson Cancer Research Center
        1100 Fairview Ave. N.
        PO Box 19024 Seattle, WA 98109

        Location: Arnold Building M1 B861
        Phone: (206) 667-2793 <tel:%28206%29%20667-2793>

        _________________________________________________
        [email protected] <mailto:[email protected]> mailing list
        https://stat.ethz.ch/mailman/__listinfo/bioc-devel
        <https://stat.ethz.ch/mailman/listinfo/bioc-devel>





--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to