On Jun 1, 2009, at 11:16 , Hervé Pagès wrote:

Hi Paul,

Thanks for the feedback! I just set up a script that does something similar to your code and was able to trigger the spooky/infamous bug. The first time after 140 iterations (55 sec.), the second time after 73, etc... Seems like I always get it after a few dozens of iterations. This is great because
that's going to make troubleshooting this issue a little bit easier!

BTW, if you are interested in speeding up your code (and at the same time, you might get rid of the spooky bug), you can now take advantage of the fact that getSeq() is vectorized (starting with BSgenome >= 1.12.2). Given that there is also a vectorized version of countPattern() (it's the vcountPattern() function),
then you don't need the "for" loop anymore:

 myb.consen.count <- character(nrow(peaks))
 all.genomic <- DNAStringSet(getSeq(Mmusculus, peaks$chr,
                                    peaks$start, peaks$end))
 c.1 <- vcountPattern(myb.dna, all.genomic, fixed=FALSE)
 c.2 <- vcountPattern(myb.dna.comp, all.genomic, fixed=FALSE)
 c.3 <- vcountPattern(myb.dna.rev, all.genomic, fixed=FALSE)
 c.4 <- vcountPattern(myb.dna.comp.rev, all.genomic, fixed=FALSE)
 ## Now c.1, c.2, c.3 and c.4 are integer vectors of the same length
 ## as 'all.genomic'
 the.sum <- c.1 + c.2 + c.3 + c.4
the.consen.count <- paste("F:", c.1, "FC:", c.2, "FRev:", c.3, "FCRev:", c.4)
 myb.consen.count[the.sum > 0] <- the.consen.count[the.sum > 0]

This is thousands of times faster than calling getSeq() and countPattern() inside a "for" loop. Also, it doesn't seem to trigger the spooky/ infamous
bug :-)

In my experience, the bug seems to come if I do a DNAStringSet on a Views, many times. I had some old code that did it essentially once per gene (always lots of errors) and moved it to once per chromosome (ran without problems).

Kasper
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to