Hi Herve, I stumbled again over the 'memory not mapped' issue in trimLRpatterns using updated versions of R and BioC-devel. I guess it does not hit people very often, but I would highly appreciate if it could be fixed.
Many thanks, Michael PS: I can reproduce the issue using the code below under: R version 3.2.0 (2015-04-16) Platform: x86_64-unknown-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago) locale: [1] C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] ShortRead_1.27.1 GenomicAlignments_1.5.1 Rsamtools_1.21.3 [4] GenomicRanges_1.21.5 GenomeInfoDb_1.5.1 BiocParallel_1.3.4 [7] Biostrings_2.37.0 XVector_0.9.0 IRanges_2.3.6 [10] S4Vectors_0.7.0 BiocGenerics_0.15.0 RColorBrewer_1.1-2 loaded via a namespace (and not attached): [1] lattice_0.20-31 bitops_1.0-6 grid_3.2.0 [4] futile.options_1.0.0 zlibbioc_1.15.0 hwriter_1.3.2 [7] latticeExtra_0.6-26 futile.logger_1.4.1 lambda.r_1.1.7 [10] Biobase_2.29.0 On 25.04.2014 13:11, Hervé Pagès wrote: > Hi Michael, > > Thanks for the report. I'll look into this. > > H. > > On 04/22/2014 08:29 AM, Michael Stadler wrote: >> Dear Herve, >> >> We are hitting a 'memory not mapped' problem when using trimLRpatterns >> as detailed below. I did not manage to reproduce it with few sequences, >> so I have to refer to a publicly available sequence file with many >> reads. Even then, it occasionally runs through without problems. >> >> Also, our use-case may not be typical and be part of the problem - maybe >> the solution will be to change our use of trimLRpatterns. >> >> Here is some code to illustrate/reproduce the problem: >> >> library(Biostrings) >> library(ShortRead) >> >> Rpat <- "TGGAATTCTCGGGTGCCAAGG" >> maxRmm <- rep(0:2, c(6,3,nchar(Rpat)-9)) >> >> fq1 <- DNAStringSet(c("AAAATGGAATTCTCGGGTGCCAAGG", >> "AAAATGGAATTCTCGGGTGCCAAGGTTTT")) >> >> # the second read is not trimmed because it runs through the adaptor >> trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm) >> # A DNAStringSet instance of length 2 >> # width seq >> #[1] 4 AAAA >> #[2] 28 AAAATGGAATTCTCGGGTGCCAAGGTTT >> >> # as a workaround, we pad the adaptor with Ns and >> # increase the mismatch tolerance >> numNs <- 90 >> maxRmm <- c(maxRmm, 1:numNs+max(maxRmm)) >> Rpat <- paste(c(Rpat, rep("N", numNs)), collapse="") >> >> # now, also the second read gets trimmed >> trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm) >> # A DNAStringSet instance of length 2 >> # width seq >> #[1] 4 AAAA >> #[2] 4 AAAA >> >> # to trigger the segmentation fault, many reads are needed >> download.file("ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz", >> >> "ERR000916_1.fastq.gz") >> fq2 <- readFastq("ERR000916_1.fastq.gz") >> >> fq3 <- trimLRPatterns(subject=fq2, Rpattern=Rpat, max.Rmismatch=maxRmm) >> # *** caught segfault *** >> #address 0x7f5109be4fed, cause 'memory not mapped' >> # >> #Traceback: >> # 1: .Call(.NAME, ..., PACKAGE = PACKAGE) >> # 2: .Call2("XStringSet_vmatch_pattern_at", pattern, subject, at, >> at.type, max.mismatch, min.mismatch, with.indels, fixed, ans.type, >> auto.reduce.pattern, PACKAGE = "Biostrings") >> # 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch, >> min.mismatch, with.indels, fixed, .to.ans.type(follow.index), >> auto.reduce.pattern) >> # 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject, >> ending.at = subject_width, max.mismatch = max.Rmismatch, >> with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE) >> # 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject, >> ending.at = subject_width, max.mismatch = max.Rmismatch, >> with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE) >> # 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels, >> Rfixed) >> # 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject, >> max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, >> Rfixed, ranges) >> # 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch, >> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges >> = TRUE) >> # 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch, >> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges >> = TRUE) >> #10: eval(expr, envir, enclos) >> #11: eval(call, sys.frame(sys.parent())) >> #12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch, >> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges = >> TRUE) >> #13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch = >> maxRmm, with.Rindels = FALSE) >> #14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch = >> maxRmm, with.Rindels = FALSE) >> >> The problem did not occur in R 3.0.3 with BioC 2.13. >> Do you have an idea what's wrong? >> >> Thanks for your help, >> Michael >> >> >> >> sessionInfo()R version 3.1.0 (2014-04-10) >> #Platform: x86_64-unknown-linux-gnu (64-bit) >> # >> #locale: >> #[1] C >> # >> #attached base packages: >> #[1] parallel stats graphics grDevices utils datasets methods >> #[8] base >> # >> #other attached packages: >> # [1] ShortRead_1.22.0 GenomicAlignments_1.0.0 BSgenome_1.32.0 >> >> # [4] Rsamtools_1.16.0 GenomicRanges_1.16.0 GenomeInfoDb_1.0.1 >> >> # [7] BiocParallel_0.6.0 Biostrings_2.32.0 XVector_0.4.0 >> >> #[10] IRanges_1.22.2 BiocGenerics_0.10.0 RColorBrewer_1.0-5 >> >> # >> #loaded via a namespace (and not attached): >> # [1] BBmisc_1.5 BatchJobs_1.2 Biobase_2.24.0 >> # [4] DBI_0.2-7 RSQLite_0.11.4 Rcpp_0.11.1 >> # [7] bitops_1.0-6 brew_1.0-6 codetools_0.2-8 >> #[10] compiler_3.1.0 digest_0.6.4 fail_1.2 >> #[13] foreach_1.4.2 grid_3.1.0 hwriter_1.3 >> #[16] iterators_1.0.7 lattice_0.20-29 latticeExtra_0.6-26 >> #[19] plyr_1.8.1 sendmailR_1.1-2 stats4_3.1.0 >> #[22] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0 >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> > _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel