On 09/10/2013 12:22 AM, Elena Grassi wrote:
Ok, thanks - I misread the manual, I just thought that was a way to
filter out reads falling over certain regions.
After reading the alignments I use summarizeOverlaps to obtain counts
with the desired options. I decided to use scanBamParam(which=..)

You can use summarizeOverlaps with a 'BamFileList' created by something like

  myFiles = dir("/some/dir", pattern="bam$")
  bfl = BamFileList(myFiles, yieldSize=1000000)
  olaps = summarizeOverlaps(features, bfl)

see the example on the help page

  method?"summarizeOverlaps,GRanges,BamFileList"

(there's helpful tab completion after 'method?"sum<tab>'). If you've also said

  library(parallel)
  options(mc.cores=detectCores())  ## how many cores to use?

then the bam files are processed in parallel (choose yieldSize so that the different threads are not competing for memory).

because loading bams and counting overlaps is using a lot of RAM for
big bams and I found online various suggestions to use the

more generally a strategy is to use yieldSize and iterate through the file, like

  bf = BamFile(fl, yieldsSize=1000000)
  open(bf)
  repeat {
      curr = readBamGappedAlignments(bf) ## read up to yieldSize records
      if (length(curr) == 0)
          break ## done
      ## process chunk 'curr'
  }
  close(bf)

this is actually what summarizeOverlaps does when invoked on a BamFile (or BamFileList().

ScanBamParam to load one chr at a time...but as long as I would like
to be general I decided to extract all the seqlevels in the GRanges
object that I'm using and then I would like to iterate over those
extracting the reads that fall on different seqlevels.
I noticed that the counts where different with this approach with
respect to doing all the work together. I guess that I have to find
another way to split my alignments...but I'd really like to avoid
something like:
http://comments.gmane.org/gmane.comp.lang.r.sequencing/755

one small thing that came out of that thread was that

  as(seqinfo(BamFile("/some/file")), "GRanges")

gives a GRanges of all the sequence lengths.

Reducing my features would not work as long as I need to keep the
original number of the reads...I guess I will have to "flatten" all
the regions falling on a single chr of my Granges in a way that avoid
overlapping reads, that should do the trick (I hope...)...maybe I
could extract the smaller-bigger coords for a given chr and obtain
chr-based Granges from there.




Thank you very much for your help,
E.



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to