Re: [Bioc-devel] 'memory not mapped' in trimLRpatterns
Hi Michael,

I finally got to fix this in Biostrings 2.36.1 (release) and 2.37.2 (devel). Both should become available via biocLite() on Friday around 11am (Seattle time).

Thanks for your patience.

Cheers,
H.

On 04/30/2015 10:50 AM, Hervé Pagès wrote:

Hi Michael,

Thanks for the reminder. I must confess this slipped off my radar. I'll look into it today and will let you know.

H.

On 04/30/2015 08:42 AM, Michael Stadler wrote:

Hi Herve,

I stumbled again over the 'memory not mapped' issue in trimLRPatterns using updated versions of R and BioC-devel. I guess it does not hit people very often, but I would highly appreciate it if it could be fixed.

Many thanks,
Michael

PS: I can reproduce the issue using the code below under:

R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] ShortRead_1.27.1        GenomicAlignments_1.5.1 Rsamtools_1.21.3
 [4] GenomicRanges_1.21.5    GenomeInfoDb_1.5.1      BiocParallel_1.3.4
 [7] Biostrings_2.37.0       XVector_0.9.0           IRanges_2.3.6
[10] S4Vectors_0.7.0         BiocGenerics_0.15.0     RColorBrewer_1.1-2

loaded via a namespace (and not attached):
 [1] lattice_0.20-31      bitops_1.0-6         grid_3.2.0
 [4] futile.options_1.0.0 zlibbioc_1.15.0      hwriter_1.3.2
 [7] latticeExtra_0.6-26  futile.logger_1.4.1  lambda.r_1.1.7
[10] Biobase_2.29.0
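To tell whether an installed Biostrings already carries the fix announced at the top of this message, the version can be compared against 2.36.1 (release) and 2.37.2 (devel). The following is only a sketch: has_trim_fix is a hypothetical helper, not part of Biostrings, and it assumes that any version from 2.37.0 onward belongs to the devel line or later.

```r
# Hypothetical helper (not part of Biostrings). Returns TRUE if a version
# string is at least the fixed version named in the announcement:
# 2.36.1 on the release branch, 2.37.2 on the devel branch.
# Assumption: versions >= 2.37.0 are on the devel line or later.
has_trim_fix <- function(version) {
  v <- package_version(version)
  if (v >= "2.37.0") v >= "2.37.2" else v >= "2.36.1"
}

has_trim_fix("2.37.0")  # the devel version listed in the 2015 sessionInfo
has_trim_fix("2.36.1")  # the fixed release version
```

With Biostrings installed, the same check would be has_trim_fix(as.character(packageVersion("Biostrings"))).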
Re: [Bioc-devel] 'memory not mapped' in trimLRpatterns
Hi Michael,

Thanks for the report. I'll look into this.

H.

On 04/22/2014 08:29 AM, Michael Stadler wrote:

Dear Herve,

We are hitting a 'memory not mapped' problem when using trimLRPatterns, as detailed below. I did not manage to reproduce it with few sequences, so I have to refer to a publicly available sequence file with many reads. Even then, it occasionally runs through without problems.

Also, our use case may not be typical and may be part of the problem; maybe the solution will be to change our use of trimLRPatterns.

Here is some code to illustrate/reproduce the problem:

library(Biostrings)
library(ShortRead)

Rpat <- "TGGAATTCTCGGGTGCCAAGG"
maxRmm <- rep(0:2, c(6,3,nchar(Rpat)-9))

fq1 <- DNAStringSet(c("TGGAATTCTCGGGTGCCAAGG", "TGGAATTCTCGGGTGCCAAGG"))

# the second read is not trimmed because it runs through the adaptor
trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
#  A DNAStringSet instance of length 2
#    width seq
#[1]     4
#[2]    28 TGGAATTCTCGGGTGCCAAGGTTT

# as a workaround, we pad the adaptor with Ns and
# increase the mismatch tolerance
numNs <- 90
maxRmm <- c(maxRmm, 1:numNs+max(maxRmm))
Rpat <- paste(c(Rpat, rep("N", numNs)), collapse="")

# now, also the second read gets trimmed
trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
#  A DNAStringSet instance of length 2
#    width seq
#[1]     4
#[2]     4

# to trigger the segmentation fault, many reads are needed
download.file("ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz", "ERR000916_1.fastq.gz")
fq2 <- readFastq("ERR000916_1.fastq.gz")

fq3 <- trimLRPatterns(subject=fq2, Rpattern=Rpat, max.Rmismatch=maxRmm)
# *** caught segfault ***
#address 0x7f5109be4fed, cause 'memory not mapped'
#
#Traceback:
# 1: .Call(.NAME, ..., PACKAGE = PACKAGE)
# 2: .Call2("XStringSet_vmatch_pattern_at", pattern, subject, at, at.type, max.mismatch, min.mismatch, with.indels, fixed, ans.type, auto.reduce.pattern, PACKAGE = "Biostrings")
# 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch, min.mismatch, with.indels, fixed, .to.ans.type(follow.index), auto.reduce.pattern)
# 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject, ending.at = subject_width, max.mismatch = max.Rmismatch, with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
# 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject, ending.at = subject_width, max.mismatch = max.Rmismatch, with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
# 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels, Rfixed)
# 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject, max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges)
# 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges = TRUE)
# 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges = TRUE)
#10: eval(expr, envir, enclos)
#11: eval(call, sys.frame(sys.parent()))
#12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges = TRUE)
#13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch = maxRmm, with.Rindels = FALSE)
#14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch = maxRmm, with.Rindels = FALSE)

The problem did not occur in R 3.0.3 with BioC 2.13. Do you have an idea what's wrong?

Thanks for your help,
Michael

sessionInfo()
#R version 3.1.0 (2014-04-10)
#Platform: x86_64-unknown-linux-gnu (64-bit)
#
#locale:
#[1] C
#
#attached base packages:
#[1] parallel  stats     graphics  grDevices utils     datasets  methods
#[8] base
#
#other attached packages:
# [1] ShortRead_1.22.0        GenomicAlignments_1.0.0 BSgenome_1.32.0
# [4] Rsamtools_1.16.0        GenomicRanges_1.16.0    GenomeInfoDb_1.0.1
# [7] BiocParallel_0.6.0      Biostrings_2.32.0       XVector_0.4.0
#[10] IRanges_1.22.2          BiocGenerics_0.10.0     RColorBrewer_1.0-5
#
#loaded via a namespace (and not attached):
# [1] BBmisc_1.5          BatchJobs_1.2       Biobase_2.24.0
# [4] DBI_0.2-7           RSQLite_0.11.4      Rcpp_0.11.1
# [7] bitops_1.0-6        brew_1.0-6          codetools_0.2-8
#[10] compiler_3.1.0      digest_0.6.4        fail_1.2
#[13] foreach_1.4.2       grid_3.1.0          hwriter_1.3
#[16] iterators_1.0.7     lattice_0.20-29     latticeExtra_0.6-26
#[19] plyr_1.8.1          sendmailR_1.1-2     stats4_3.1.0
#[22] stringr_0.6.2       tools_3.1.0         zlibbioc_1.10.0

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
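The padding workaround in the report above can be checked without any sequence data. The sketch below uses base R only and reuses the variable names from the report; the _padded suffixes are introduced here for illustration. It assumes, as I understand the trimLRPatterns documentation, that element i of max.Rmismatch is the number of mismatches tolerated when the first i letters of Rpattern overlap the end of a read, and that each padded N counts as one mismatch under the default fixed matching.

```r
# Base R only; no Biostrings needed. Names follow the report, except the
# *_padded variables, which are introduced here for illustration.
Rpat <- "TGGAATTCTCGGGTGCCAAGG"   # adaptor sequence from the report

# Tolerated mismatches per overlap length:
# 0 for overlaps of 1-6 letters, 1 for 7-9 letters, 2 for the rest.
maxRmm <- rep(0:2, c(6, 3, nchar(Rpat) - 9))

# Pad the adaptor with Ns and raise the tolerance by one per added N.
# Note that `:` binds tighter than `+`, so this is (1:numNs) + max(maxRmm).
numNs <- 90
maxRmm_padded <- c(maxRmm, 1:numNs + max(maxRmm))
Rpat_padded <- paste(c(Rpat, rep("N", numNs)), collapse = "")

# The tolerance vector needs one entry per pattern letter.
stopifnot(length(maxRmm) == nchar(Rpat))                # 21 entries
stopifnot(length(maxRmm_padded) == nchar(Rpat_padded))  # 111 entries

# The tolerance grows in step with the padding, so only the original
# allowance of 2 mismatches applies to the real adaptor letters.
range(maxRmm_padded[-seq_along(maxRmm)])                # 3 to 92
```

A length check like this is worth running before handing the padded pattern and tolerance vector to trimLRPatterns, since the two must stay in step whenever numNs changes.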