On Jun 1, 2009, at 11:16 , Hervé Pagès wrote:
Hi Paul,
Thanks for the feedback! I just set up a script that does something
similar
to your code and was able to trigger the spooky/infamous bug. The
first time
after 140 iterations (55 sec.), the second time after 73, etc...
Seems like
I always get it after a few dozens of iterations. This is great
because
that's going to make troubleshooting this issue a little bit easier!
BTW, if you are interested in speeding up your code (and at the same
time, you
might get rid of the spooky bug), you can now take advantage of the
fact that
getSeq() is vectorized (starting with BSgenome >= 1.12.2). Given
that there is
also a vectorized version of countPattern() (it's the
vcountPattern() function),
then you don't need the "for" loop anymore:
myb.consen.count <- character(nrow(peaks))
all.genomic <- DNAStringSet(getSeq(Mmusculus, peaks$chr,
peaks$start, peaks$end))
c.1 <- vcountPattern(myb.dna, all.genomic, fixed=FALSE)
c.2 <- vcountPattern(myb.dna.comp, all.genomic, fixed=FALSE)
c.3 <- vcountPattern(myb.dna.rev, all.genomic, fixed=FALSE)
c.4 <- vcountPattern(myb.dna.comp.rev, all.genomic, fixed=FALSE)
## Now c.1, c.2, c.3 and c.4 are integer vectors of the same length
## as 'all.genomic'
the.sum <- c.1 + c.2 + c.3 + c.4
the.consen.count <- paste("F:", c.1, "FC:", c.2, "FRev:", c.3,
"FCRev:", c.4)
myb.consen.count[the.sum > 0] <- the.consen.count[the.sum > 0]
This is thousands of times faster than calling getSeq() and
countPattern()
inside a "for" loop. Also, it doesn't seem to trigger the spooky/
infamous
bug :-)
In my experience, the bug seems to come if I do a DNAStringSet on a
Views, many times. I had some old code that did it essentially once
per gene (always lots of errors) and moved it to once per chromosome
(ran without problems).
Kasper
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing