On 12/05/2013 06:37 AM, Taku Tokuyasu wrote:
Martin, Michael:
Thank you for the prompt replies! Glad to know the functionality is available.
The suggestions on alternatives are also quite helpful. Amongst other
reasons, we wish to support htseq-count because it is part of a reference
RNA-seq protocol (http://www.ncbi.nlm.nih.gov/pubmed/23975260), and because I
haven't gotten around to comparing timing and functionality with
GenomicAlignments. This is probably in a different thread, but e.g. is it known
that GenomicAlignments::summarizeOverlaps() produces identical results to
htseq-count with equivalent parameters? What about timing?
Perhaps Valerie can chime in with specifics.
It is very hard to say that two algorithms are the 'same', in general the
functionality of ht-seq and summarizeOverlaps is comparable; both support
multiple counting modes (including in the case of summarizeOverlaps
user-specified modes) but it would not be completely surprising if there were
cases where the behaviour was different.
The parathyroidSE vignette
http://bioconductor.org/packages/release/data/experiment/html/parathyroidSE.html
details use of summarizeOverlaps for counting.
The Rsubread manuscript Table 1
http://www.ncbi.nlm.nih.gov/pubmed/24227677
contains a performance comparison, although the memory consumption of of
summarizeOverlaps as reported in that paper appears to rely on reading an entire
file in to memory (files are iterated in chunks when processed in the way
described by the parathyroidSE vignette; this should have minimal cost for
overall speed but provide more reasonable memory management). Rsubread is not
available on Windows.
Martin
_Taku
On Thu, Dec 5, 2013 at 5:14 AM, Michael Lawrence <[email protected]
<mailto:[email protected]>> wrote:
It would be appreciated to know where the current counting methods fall
short; i.e., why htseq-count is necessary.
Thanks,
Michael
On Thu, Dec 5, 2013 at 12:30 AM, Martin Morgan <[email protected]
<mailto:[email protected]>> wrote:
On 12/04/2013 11:17 PM, Taku Tokuyasu wrote:
Hello,
We are trying to support NGS pipelines with SAM input to
htseq-count, w/o
requiring a samtools install. Has Rsamtools implemented / planned
any support
for BAM to SAM conversion? I notice this has been requested before
(e.g. May
2012,
http://comments.gmane.org/__gmane.science.biology.__informatics.conductor/41344
<http://comments.gmane.org/gmane.science.biology.informatics.conductor/41344>),
so
I was wondering if there were any updates on this.
I added this to Rsamtools 1.15.14; this requires other package
dependencies to be current. I had delayed adding this because it seems a
step backward, to a relatively inefficient representation.
Other counting alternatives not requiring coercion to SAM files are
GenomicAlignments::__summarizeOverlaps (acting on BamFile / BamViews /
GAlignments / GAlignmentPairs, counting bins as GRanges / GRanges list
from TxDb or standard GTF etc files via rtracklayer::import) and
Rsubread::featureCount (SAM or BAM files, built-in or GTF or SAF
annotation files).
Please let me know if there are issues with the implementation.
Martin
Regards,
_Taku
Taku Tokuyasu
Computational Biology Core
UCSF Helen Diller Family Comprehensive Cancer Center
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793 <tel:%28206%29%20667-2793>
_________________________________________________
[email protected] <mailto:[email protected]> mailing list
https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel