Dear Herve, We are hitting a 'memory not mapped' problem when using trimLRpatterns as detailed below. I did not manage to reproduce it with few sequences, so I have to refer to a publicly available sequence file with many reads. Even then, it occasionally runs through without problems.
Also, our use-case may not be typical and be part of the problem - maybe the solution will be to change our use of trimLRpatterns. Here is some code to illustrate/reproduce the problem: library(Biostrings) library(ShortRead) Rpat <- "TGGAATTCTCGGGTGCCAAGG" maxRmm <- rep(0:2, c(6,3,nchar(Rpat)-9)) fq1 <- DNAStringSet(c("AAAATGGAATTCTCGGGTGCCAAGG", "AAAATGGAATTCTCGGGTGCCAAGGTTTT")) # the second read is not trimmed because it runs through the adaptor trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm) # A DNAStringSet instance of length 2 # width seq #[1] 4 AAAA #[2] 28 AAAATGGAATTCTCGGGTGCCAAGGTTT # as a workaround, we pad the adaptor with Ns and # increase the mismatch tolerance numNs <- 90 maxRmm <- c(maxRmm, 1:numNs+max(maxRmm)) Rpat <- paste(c(Rpat, rep("N", numNs)), collapse="") # now, also the second read gets trimmed trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm) # A DNAStringSet instance of length 2 # width seq #[1] 4 AAAA #[2] 4 AAAA # to trigger the segmentation fault, many reads are needed download.file("ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz", "ERR000916_1.fastq.gz") fq2 <- readFastq("ERR000916_1.fastq.gz") fq3 <- trimLRPatterns(subject=fq2, Rpattern=Rpat, max.Rmismatch=maxRmm) # *** caught segfault *** #address 0x7f5109be4fed, cause 'memory not mapped' # #Traceback: # 1: .Call(.NAME, ..., PACKAGE = PACKAGE) # 2: .Call2("XStringSet_vmatch_pattern_at", pattern, subject, at, at.type, max.mismatch, min.mismatch, with.indels, fixed, ans.type, auto.reduce.pattern, PACKAGE = "Biostrings") # 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch, min.mismatch, with.indels, fixed, .to.ans.type(follow.index), auto.reduce.pattern) # 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject, ending.at = subject_width, max.mismatch = max.Rmismatch, with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE) # 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject, ending.at = subject_width, max.mismatch = max.Rmismatch, with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE) # 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels, Rfixed) # 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject, max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges) # 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges = TRUE) # 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges = TRUE) #10: eval(expr, envir, enclos) #11: eval(call, sys.frame(sys.parent())) #12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges = TRUE) #13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch = maxRmm, with.Rindels = FALSE) #14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch = maxRmm, with.Rindels = FALSE) The problem did not occur in R 3.0.3 with BioC 2.13. Do you have an idea what's wrong? Thanks for your help, Michael sessionInfo()R version 3.1.0 (2014-04-10) #Platform: x86_64-unknown-linux-gnu (64-bit) # #locale: #[1] C # #attached base packages: #[1] parallel stats graphics grDevices utils datasets methods #[8] base # #other attached packages: # [1] ShortRead_1.22.0 GenomicAlignments_1.0.0 BSgenome_1.32.0 # [4] Rsamtools_1.16.0 GenomicRanges_1.16.0 GenomeInfoDb_1.0.1 # [7] BiocParallel_0.6.0 Biostrings_2.32.0 XVector_0.4.0 #[10] IRanges_1.22.2 BiocGenerics_0.10.0 RColorBrewer_1.0-5 # #loaded via a namespace (and not attached): # [1] BBmisc_1.5 BatchJobs_1.2 Biobase_2.24.0 # [4] DBI_0.2-7 RSQLite_0.11.4 Rcpp_0.11.1 # [7] bitops_1.0-6 brew_1.0-6 codetools_0.2-8 #[10] compiler_3.1.0 digest_0.6.4 fail_1.2 #[13] foreach_1.4.2 grid_3.1.0 hwriter_1.3 #[16] iterators_1.0.7 lattice_0.20-29 latticeExtra_0.6-26 #[19] plyr_1.8.1 sendmailR_1.1-2 stats4_3.1.0 #[22] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0 _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel