Re: [Bioc-devel] Using BiocInstaller with R 3.4.0 beta
Dear Martin, I've now spend a day trying to resolve this. To make a long story short: I could get bowtie 1.1.1 (currently used in Rbowtie 1.15.1) to compile and work under macOS Sierra. I tried updating bowtie, but: - bowtie 1.2 has a bug that prevents it from using multiple threads, which I could reproduce both under Linux and macOS Sierra and which I could not resolve (see https://github.com/BenLangmead/bowtie/issues/41). - bowtie 1.1.2 crashes if compiled with clang (see https://github.com/BenLangmead/bowtie/issues/21), the only known workaround being to use a different toolchain, which I think is not an option for BioC/Rbowtie. I hope that resolves also the issue on El Capitan. Michael On 19.04.2017 13:47, Martin Morgan wrote: > On 04/19/2017 05:45 AM, Michael Stadler wrote: >> Dear BioC core, >> >> Thanks for the report, Herve. If I understand correctly, there is >> nothing I can do at this point to make QuasR green on windows, correct? >> >> I have another question regarding QuasR not building on veracruz2: The >> vignette does not build currently, reporting: >> Error on veracruz2.bioconductor.org processing sample >> /tmp/RtmpJBWrjI/chip_1_1.fq.bz2df0b6901ff33.fastq : 'asBam' internal: >> samtools invoked 'abort' ... >> >> Though it seems to build fine on other platforms, and there were no >> recent changes to the vignette. What would you or other suggest to do >> about that? > > The error is in createGenomicAlignmentsController after > > https://github.com/Bioconductor-mirror/QuasR/blob/cc374678033055f2bd4d105c502a426807223c1c/R/createAlignments-functions.R#L292 > > > it looks like the sam file is quite funky > > Browse[4]> options(nwarnings=1) > Browse[4]> xx = readLines(samFile) > There were 2339 warnings (use warnings() to see them) > Browse[4]> head(warnings(), 3) > Warning messages: > 1: In readLines(samFile) : line 7 appears to contain an embedded nul > 2: In readLines(samFile) : line 8 appears to contain an embedded nul > 3: In readLines(samFile) : line 9 appears to contain an embedded nul > Browse[4]> table(nzchar(xx)) > > FALSE TRUE > 2341 261 > Browse[4]> substring(head(xx, 10), 1, 70) > [1] "@HD\tVN:1.0\tSO:unsorted" > [2] "@SQ\tSN:chr1\tLN:4" > [3] "@SQ\tSN:chr2\tLN:1" > [4] "@SQ\tSN:chr3\tLN:45000" > [5] > "@PG\tID:Bowtie\tVN:1.1.1\tCL:\"/Library/Frameworks/R.framework/Versions/3." > > [6] "" > [7] "" > [8] "" > [9] "" > [10] "" > > The 'abort' from Rsamtools is > > Parse error at line 143: missing colon in auxiliary data > > It's not really clear whether R is being confused by the embedded nulls > or blank lines or other problem > > Browse[4]> xx[140 + 1:5] > [1] "" "" "" "" "" > Browse[4]> xx[nzhchar(xx)][140 + 1:5] > Error in nzhchar(xx) : could not find function "nzhchar" > Browse[4]> xx[nzchar(xx)][140 + 1:5] > [1] > "seq10137\t4\t*\t0\t0\t*\t*\t0\t0\tTCGTTATGGTTTCCGTTGCTGCCATCTCACAT\tB@BABCBBBABABA?A>8>A7:6=@>>:@BAA>1;B\tXM:i:0" > > [2] > "seq10138\t4\t*\t0\t0\t*\t*\t0\t0\tCTGGCGACTCCTTCGAGATGGACGCCGTTGGCGCTC\tABBB@CBBBA9BB@>'>9@AA=A?\tXM:i:0" > > [3] > "seq10139\t4\t*\t0\t0\t*\t*\t0\t0\tCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAAT\tBBCBBC=A@BAABBABBA=A??><\tXM:i:0" > > [4] > "seq10140\t4\t*\t0\t0\t*\t*\t0\t0\tGGTTGTCAGCGTCATAAGAGGACCTCCAAATG\t;;;AA=AA<AA<ABBB?=@>>=CCBB>9@@>B=BB=\tXM:i:0" > > [5] > "seq10141\t4\t*\t0\t0\t*\t*\t0\t0\tAACCCTAATGAGCTTAATCAAGATGATGCTCGTTAT\tBBAB@AAB@BBBA@B@ABAABBABAA@B?A?@\tXM:i:0" > > > So I guess it's in creation of the sam file -- Bowtie? > > Martin > >> >> Any suggestions are appreciated, >> Michael >> >> >> >> On 17.04.2017 02:08, Hervé Pagès wrote: >>> FWIW here are all the packages that are victim of this >>> installed.packages bug in today's build report: >>> >>> alpine >>> fCI >>> GenomicFeatures >>> QuasR >>> >>> We only see this error on tokay2 (Windows). >>> >>> H. >>> >>> >>> On 04/11/2017 04:21 PM, Gordon K Smyth wrote: >>>> I restarted my PC this morning and the problem disappeared. >>>> >>>> I probably should have tried that last night, but it was late ... >>>> >>>> Thanks >>>> Gordon >>>> >>>>> -Original M
Re: [Bioc-devel] Using BiocInstaller with R 3.4.0 beta
Dear Martin, Thank you for the quick response - you are right to suspect bowtie, it seems that version 1.1.1 which I am using in Rbowtie has an issue on newer macOS versions, and I am currently looking into wether an update to version 1.2 would fix this. A first issue that I stumbled into is that R CMD check now warns about the bowtie code using C++11 extensions. Following the manual, I have added the following to DESCRIPTION: SystemRequirements: GNU make, C++11 ("GNU make" was already there), but the field seems to be ignored by R CMD check and the warning persists. Is this something to worry about? I hope to get back with a solution for the bowtie issue soon. Michael On 19.04.2017 13:47, Martin Morgan wrote: > On 04/19/2017 05:45 AM, Michael Stadler wrote: >> Dear BioC core, >> >> Thanks for the report, Herve. If I understand correctly, there is >> nothing I can do at this point to make QuasR green on windows, correct? >> >> I have another question regarding QuasR not building on veracruz2: The >> vignette does not build currently, reporting: >> Error on veracruz2.bioconductor.org processing sample >> /tmp/RtmpJBWrjI/chip_1_1.fq.bz2df0b6901ff33.fastq : 'asBam' internal: >> samtools invoked 'abort' ... >> >> Though it seems to build fine on other platforms, and there were no >> recent changes to the vignette. What would you or other suggest to do >> about that? > > The error is in createGenomicAlignmentsController after > > https://github.com/Bioconductor-mirror/QuasR/blob/cc374678033055f2bd4d105c502a426807223c1c/R/createAlignments-functions.R#L292 > > > it looks like the sam file is quite funky > > Browse[4]> options(nwarnings=1) > Browse[4]> xx = readLines(samFile) > There were 2339 warnings (use warnings() to see them) > Browse[4]> head(warnings(), 3) > Warning messages: > 1: In readLines(samFile) : line 7 appears to contain an embedded nul > 2: In readLines(samFile) : line 8 appears to contain an embedded nul > 3: In readLines(samFile) : line 9 appears to contain an embedded nul > Browse[4]> table(nzchar(xx)) > > FALSE TRUE > 2341 261 > Browse[4]> substring(head(xx, 10), 1, 70) > [1] "@HD\tVN:1.0\tSO:unsorted" > [2] "@SQ\tSN:chr1\tLN:4" > [3] "@SQ\tSN:chr2\tLN:1" > [4] "@SQ\tSN:chr3\tLN:45000" > [5] > "@PG\tID:Bowtie\tVN:1.1.1\tCL:\"/Library/Frameworks/R.framework/Versions/3." > > [6] "" > [7] "" > [8] "" > [9] "" > [10] "" > > The 'abort' from Rsamtools is > > Parse error at line 143: missing colon in auxiliary data > > It's not really clear whether R is being confused by the embedded nulls > or blank lines or other problem > > Browse[4]> xx[140 + 1:5] > [1] "" "" "" "" "" > Browse[4]> xx[nzhchar(xx)][140 + 1:5] > Error in nzhchar(xx) : could not find function "nzhchar" > Browse[4]> xx[nzchar(xx)][140 + 1:5] > [1] > "seq10137\t4\t*\t0\t0\t*\t*\t0\t0\tTCGTTATGGTTTCCGTTGCTGCCATCTCACAT\tB@BABCBBBABABA?A>8>A7:6=@>>:@BAA>1;B\tXM:i:0" > > [2] > "seq10138\t4\t*\t0\t0\t*\t*\t0\t0\tCTGGCGACTCCTTCGAGATGGACGCCGTTGGCGCTC\tABBB@CBBBA9BB@>'>9@AA=A?\tXM:i:0" > > [3] > "seq10139\t4\t*\t0\t0\t*\t*\t0\t0\tCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAAT\tBBCBBC=A@BAABBABBA=A??><\tXM:i:0" > > [4] > "seq10140\t4\t*\t0\t0\t*\t*\t0\t0\tGGTTGTCAGCGTCATAAGAGGACCTCCAAATG\t;;;AA=AA<AA<ABBB?=@>>=CCBB>9@@>B=BB=\tXM:i:0" > > [5] > "seq10141\t4\t*\t0\t0\t*\t*\t0\t0\tAACCCTAATGAGCTTAATCAAGATGATGCTCGTTAT\tBBAB@AAB@BBBA@B@ABAABBABAA@B?A?@\tXM:i:0" > > > So I guess it's in creation of the sam file -- Bowtie? > > Martin > >> >> Any suggestions are appreciated, >> Michael >> >> >> >> On 17.04.2017 02:08, Hervé Pagès wrote: >>> FWIW here are all the packages that are victim of this >>> installed.packages bug in today's build report: >>> >>> alpine >>> fCI >>> GenomicFeatures >>> QuasR >>> >>> We only see this error on tokay2 (Windows). >>> >>> H. >>> >>> >>> On 04/11/2017 04:21 PM, Gordon K Smyth wrote: >>>> I restarted my PC this morning and the problem disappeared. >>>> >>>> I probably should have tried that last night, but it was late ... >>>> >>>> Thanks >>>> Gordon >>>> >>>>> -Original Message- >>
Re: [Bioc-devel] Using BiocInstaller with R 3.4.0 beta
Dear BioC core, Thanks for the report, Herve. If I understand correctly, there is nothing I can do at this point to make QuasR green on windows, correct? I have another question regarding QuasR not building on veracruz2: The vignette does not build currently, reporting: Error on veracruz2.bioconductor.org processing sample /tmp/RtmpJBWrjI/chip_1_1.fq.bz2df0b6901ff33.fastq : 'asBam' internal: samtools invoked 'abort' ... Though it seems to build fine on other platforms, and there were no recent changes to the vignette. What would you or other suggest to do about that? Any suggestions are appreciated, Michael On 17.04.2017 02:08, Hervé Pagès wrote: > FWIW here are all the packages that are victim of this > installed.packages bug in today's build report: > > alpine > fCI > GenomicFeatures > QuasR > > We only see this error on tokay2 (Windows). > > H. > > > On 04/11/2017 04:21 PM, Gordon K Smyth wrote: >> I restarted my PC this morning and the problem disappeared. >> >> I probably should have tried that last night, but it was late ... >> >> Thanks >> Gordon >> >>> -Original Message- >>> From: Martin Morgan [mailto:martin.mor...@roswellpark.org] >>> Sent: Tuesday, 11 April 2017 7:20 PM >>> To: Gordon K Smyth; bioc-devel@r-project.org >>> Subject: Re: [Bioc-devel] Using BiocInstaller with R 3.4.0 beta >>> >>> On 04/11/2017 05:01 AM, Gordon K Smyth wrote: The problem appears to be with installed.packages(). If I start a fresh R >>> 3.4.0beta session, then I can run installed.packages() once with >>> correct results, >>> but running it a second time gives the error message: > installed.packages() Error in if (file.exists(dest) && file.mtime(dest) > file.mtime(lib) && : missing value where TRUE/FALSE needed >>> >>> The test is in this code chunk, from utils/R/packages.R >>> >>> for(lib in lib.loc) { >>> if(noCache) { >>> ret0 <- .readPkgDesc(lib, fields) >>> if(length(ret0)) retval <- rbind(retval, ret0) >>> } else { >>> ## Previously used URLencode for e.g. Windows paths with >>> drives >>> ## This version works for very long file names. >>> base <- paste(c(lib, fields), collapse = ",") >>> ## add length and 64-bit CRC in hex (in theory, seems >>> ## it is actually 32-bit on some systems) >>> enc <- sprintf("%d_%s", nchar(base), .Call(C_crc64, base)) >>> dest <- file.path(tempdir(), paste0("libloc_", enc, >>> ".rds")) >>> if(file.exists(dest) && >>> file.mtime(dest) > file.mtime(lib) && >>> (val <- readRDS(dest))$base == base) >>> ## use the cache file >>> retval <- rbind(retval, val$value) >>> else { >>> ret0 <- .readPkgDesc(lib, fields) >>> if(length(ret0)) { >>> retval <- rbind(retval, ret0) >>> ## save the cache file >>> saveRDS(list(base = base, value = ret0), dest) >>> } >>> } >>> } >>> >>> >>> where 'lib' is one of .libPaths(), 'dest' is one of >>> >>>dir(tempdir(), pattern="libloc_", full=TRUE) >>> >>> and 'base' should be a character(1) >>> >>> I think the code chunk has tried to cache the packages installed in each >>> directory of .libPaths() (the saveRDS() line), and these are somehow >>> corrupted on the second time through (I guess evaluating the >>> readRDS()??). >>> >>> For instance I have two paths in .libPaths() and after the first >>> install.packages() I have >>> >>> > str(readRDS(dir(tempdir(), full=TRUE)[1])) >>> List of 2 >>> $ base : chr >>> "/home/mtmorgan/bin/R-3-4- >>> branch/library,Version,Priority,Depends,Imports,LinkingTo,Suggests,Enhances,Li >>> >>> cense,Li"| >>> __truncated__ >>> $ value: chr [1:29, 1:17] "base" "boot" "class" "cluster" ... >>> > str(readRDS(dir(tempdir(), full=TRUE)[2])) >>> List of 2 >>> $ base : chr >>> "/home/mtmorgan/R/x86_64-pc-linux-gnu-library/3.4-Bioc- >>> 3.5,Version,Priority,Depends,Imports,LinkingTo,Suggests,E"| >>> __truncated__ >>> $ value: chr [1:513, 1:17] "abind" "acepack" "aCGH" "ADaCGH2" ... >>> >>> I'm guessing that one of these files is corrupted somehow, but it's not >>> obvious how. Can you use options(error=recover) and find the values that >>> cause the conditional to fail? >>> >>> Martin >>> > -Original Message- > From: Gordon K Smyth > Sent: Tuesday, 11 April 2017 6:26 PM > To: bioc-devel@r-project.org > Subject: Using BiocInstaller with R 3.4.0 beta > > I thought I would test out R 3.4.0 beta (for Windows) but now I > can't use the > BiocInstaller package. Attempts to use biocLite() or biocValid() > lead to an >>> error > message as shown below. > > What have I overlooked? > > Thanks > Gordon
Re: [Bioc-devel] Question: set purification test for overlapped regions between three GRanges object
Dear Jurat, Maybe this would be better asked on support.bioconductor.org. I don't think I fully understand what you intend to do (a fully working example would help), but here are two ideas that could be useful: 1. Maybe a single findOverlaps() will be enough to find all you need: D <- c(a,b,c) ov <- findOverlaps(query=D, subject=D) By inspecting queryHits(ov) and subjectHits(ov), you can find out whether a particular regions originally came from a, b or c (e.g. anything in 1:length(a) will be regions from a). 2. Maybe you can use intersect() to find common indices, see ?intersect Hope that helps, Michael On 28.04.2016 18:20, Jurat Shayidin wrote: > Dear Mailing list: > > I got stuck with implementing wrapper function for my packages. when > three GRanges objects (e.g. a,b,c) are given to my function, I would > like to find overlapped regions from one to another in parallel, > where a as query, b,c are subjects respectively. Because of three > input was given, so I want to call findOverlaps function in the > context of changing parameter (query, subject will be switched in > each individual test). To be clarify my point, I could have this > workflow: > > *1st test*: ov_1 <- list(ov.1 <- findOverlaps(a, b), ov.2 <- > findOverlaps( a,c)) > > intermediate output of 1st test: a.1 <- list(a.sc, a.wd) ; b.1 <- > list(b.sc, b.wd) ; c.1 <- list(c.sc, c.wd) > > *2nd test*: ov_2 <- list(ov.1 <- findOverlaps(b,a), ov.2 <- > findOverlaps(b,c )) > > intermediate output of 2nd test: b.2 <- list(b.sc_, b.wd_) ; a.2 <- > list(a.sc_, a.wd_) ; c.2 <- list(c.sc_, c.wd_) > > *3rd test*: ov_3 <- list(ov.1 <- findOverlaps(c,a), ov.2 <- > findOverlaps(c,b )) > > intermediate output of 3rd test: c.3 <- list(b.sc__, b.wd__) ; a.3 > <- list(a.sc__, a.wd__) ; c.3 <- list(c.sc__, c.wd__) > > start 1st test-> read data - > find overlapped regions conditionally > in parallel -> filtering function with specific threshold value -> > chisq.test() - > get combined pvalue, and do further filtering > process - > save result in function' environment -> go to 2nd test -> > repeat workflow - - - -> go to 3rd test -> - - - - -> all test is > done, prepare to generate final output of each GRanges objects - > > package job is DONE ! > > In particular, in each individual test, a,b,c could contain 2 > different group of regions as intermediate output, but it is not > final step, I must go to 2nd test, 3rd test respectively. > > However, I have hard time to find efficient solution for this issue, > because each individual test, a,b,c contains different set of > genomic regions where each has 2 different group of regions as > intermediate output. My goal is to implement function for set > purification for intermediate output of each GRanges objects from 3 > different test. > > desired job that I want to implement is , for a.1 <- list(a.sc, a.wd) > , a.2 <- list(a.sc_, a.wd_) , a.3 <- list(a.sc__, a.wd__), > implement function to retrieve set of genomic regions that all > appeared in 1st, 2nd, 3rd test respectively. > > *Objective*: I want to retrieve the regions that all appeared in 1st, > 2nd, 3rd test. How can I efficiently solve this issues ?Is there any > one give me possible idea to solve this problem? Any possible > approach, IDEA, sketch solution, or existing bioconductor package are > highly appreciated. Thanks a lot > > Best regards: > ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] 'memory not mapped' in trimLRpatterns
Hi Herve, I stumbled again over the 'memory not mapped' issue in trimLRpatterns using updated versions of R and BioC-devel. I guess it does not hit people very often, but I would highly appreciate if it could be fixed. Many thanks, Michael PS: I can reproduce the issue using the code below under: R version 3.2.0 (2015-04-16) Platform: x86_64-unknown-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago) locale: [1] C attached base packages: [1] stats4parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] ShortRead_1.27.1GenomicAlignments_1.5.1 Rsamtools_1.21.3 [4] GenomicRanges_1.21.5GenomeInfoDb_1.5.1 BiocParallel_1.3.4 [7] Biostrings_2.37.0 XVector_0.9.0 IRanges_2.3.6 [10] S4Vectors_0.7.0 BiocGenerics_0.15.0 RColorBrewer_1.1-2 loaded via a namespace (and not attached): [1] lattice_0.20-31 bitops_1.0-6 grid_3.2.0 [4] futile.options_1.0.0 zlibbioc_1.15.0 hwriter_1.3.2 [7] latticeExtra_0.6-26 futile.logger_1.4.1 lambda.r_1.1.7 [10] Biobase_2.29.0 On 25.04.2014 13:11, Hervé Pagès wrote: Hi Michael, Thanks for the report. I'll look into this. H. On 04/22/2014 08:29 AM, Michael Stadler wrote: Dear Herve, We are hitting a 'memory not mapped' problem when using trimLRpatterns as detailed below. I did not manage to reproduce it with few sequences, so I have to refer to a publicly available sequence file with many reads. Even then, it occasionally runs through without problems. Also, our use-case may not be typical and be part of the problem - maybe the solution will be to change our use of trimLRpatterns. Here is some code to illustrate/reproduce the problem: library(Biostrings) library(ShortRead) Rpat - TGGAATTCTCGGGTGCCAAGG maxRmm - rep(0:2, c(6,3,nchar(Rpat)-9)) fq1 - DNAStringSet(c(TGGAATTCTCGGGTGCCAAGG, TGGAATTCTCGGGTGCCAAGG)) # the second read is not trimmed because it runs through the adaptor trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm) # A DNAStringSet instance of length 2 #width seq #[1] 4 #[2]28 TGGAATTCTCGGGTGCCAAGGTTT # as a workaround, we pad the adaptor with Ns and # increase the mismatch tolerance numNs - 90 maxRmm - c(maxRmm, 1:numNs+max(maxRmm)) Rpat - paste(c(Rpat, rep(N, numNs)), collapse=) # now, also the second read gets trimmed trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm) # A DNAStringSet instance of length 2 #width seq #[1] 4 #[2] 4 # to trigger the segmentation fault, many reads are needed download.file(ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz;, ERR000916_1.fastq.gz) fq2 - readFastq(ERR000916_1.fastq.gz) fq3 - trimLRPatterns(subject=fq2, Rpattern=Rpat, max.Rmismatch=maxRmm) # *** caught segfault *** #address 0x7f5109be4fed, cause 'memory not mapped' # #Traceback: # 1: .Call(.NAME, ..., PACKAGE = PACKAGE) # 2: .Call2(XStringSet_vmatch_pattern_at, pattern, subject, at, at.type, max.mismatch, min.mismatch, with.indels, fixed, ans.type, auto.reduce.pattern, PACKAGE = Biostrings) # 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch, min.mismatch, with.indels, fixed, .to.ans.type(follow.index), auto.reduce.pattern) # 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject, ending.at = subject_width, max.mismatch = max.Rmismatch, with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE) # 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject, ending.at = subject_width, max.mismatch = max.Rmismatch, with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE) # 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels, Rfixed) # 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject, max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges) # 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges = TRUE) # 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges = TRUE) #10: eval(expr, envir, enclos) #11: eval(call, sys.frame(sys.parent())) #12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges = TRUE) #13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch = maxRmm, with.Rindels = FALSE) #14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch = maxRmm, with.Rindels = FALSE) The problem did not occur in R 3.0.3 with BioC 2.13. Do you have an idea what's wrong? Thanks for your help, Michael sessionInfo()R version 3.1.0 (2014-04-10) #Platform: x86_64-unknown-linux-gnu (64-bit
Re: [Bioc-devel] differences between petty and perceval (OS X 10.6.8 build machines for release/devel)
Dear Dan, Martin and Nate, Thank you for looking into it. I guess that is pointing to a problem within bowtie. It looks like the EXC_BAD_ACCESS you see on petty in ebwt.h is not reproducible on the other Mac or Linux machines we tried. Is it possible to run valgrind on petty? That may confirm/rule out if the memory (de-)allocation issues reported on Linux are related. I would like to submit a bug-report to the bowtie developers, but am reluctant to do that without being able to reproduce the problem or test potential fixed. I would have the options to go through Rbowtie build cycles, but would have to rely on the assumption that petty will keep hitting this hickup even with modified bowtie code. The minor differences between bowtie 1.0.1 and bowtie 1.0.1-bug-312 argue against that. I am tempted to stay with the current situtation: - OS X before 10.9 needs to use Rbowtie = 1.4.4 (based on bowtie 1.0.1) - OS X 10.9 onwards and everything else uses Rbowtie = 1.4.5 (based on bowtie 1.0.1 /patched bugs-312). Thanks again for your efforts, Michael On 14.06.2014 01:31, Dan Tenenbaum wrote: Hi Michael, - Original Message - From: Michael Stadler michael.stad...@fmi.ch To: Dan Tenenbaum dtene...@fhcrc.org, bioc-devel@r-project.org Sent: Friday, June 13, 2014 12:32:52 AM Subject: differences between petty and perceval (OS X 10.6.8 build machines for release/devel) Hi Dan, I'm cc'ing the list; maybe somebody else has experienced differences between petty and perceval. Rbowtie release (1.4.5) is not building under OS X 10.6.8 (petty). Rbowtie release (1.4.5) and development (1.5.5) are virtually identical (only DESCRIPTION and NEWS differ). The development version builds without problems on perceval, but the release version fails on petty: http://bioconductor.org/checkResults/devel/bioc-LATEST/Rbowtie/perceval-buildsrc.html http://bioconductor.org/checkResults/release/bioc-LATEST/Rbowtie/petty-buildsrc.html The only difference I can make out from the node info pages is that perceval has an additional section on C++11 compiler that is lacking from petty's NodeInfo page. Unfortunately, I cannot reproduce the issue, both Rbowtie 1.4.5 and 1.5.5 build successfully under OS X 10.6.8 and 10.7.5 using llvm-gcc-4.2. Do you have any idea what else could be different between petty and perceval? Martin and Nate and I took a look at this. I managed to come up with a bowtie command line that would reliably reproduce the segfault on petty. Then we ran that under gdb (and turned off compiler optimizations) and came up with this, which may or may not help you: petty:vignettes biocbuild$ gdb --args '/Library/Frameworks/R.framework/Versions/3.1/Resources/library/Rbowtie/bowtie' -y -S -k 10 -m 10 -v 2 -r -p 4 --best --strata 'doit/refsIndex/index' 'doit/SpliceMapTemp_876c378e20ac/25mers.map' 'doit/SpliceMapTemp_876c378e20ac/25mers.map_unsorted' GNU gdb 6.3.50-20050815 (Apple version gdb-1708) (Mon Aug 15 16:03:10 UTC 2011) Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as x86_64-apple-darwin...Reading symbols for shared libraries ... done (gdb) run Starting program: /Library/Frameworks/R.framework/Versions/3.1/Resources/library/Rbowtie/bowtie -y -S -k 10 -m 10 -v 2 -r -p 4 --best --strata doit/refsIndex/index doit/SpliceMapTemp_876c378e20ac/25mers.map doit/SpliceMapTemp_876c378e20ac/25mers.map_unsorted Reading symbols for shared libraries ++. done Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_INVALID_ADDRESS at address: 0x23d0d92d [Switching to process 36144 thread 0x20f] 0x000478b1 in Ebwtseqan::Stringseqan::SimpleTypeunsigned char, seqan::_Dna, seqan::Allocvoid ::rowL (this=0xbfffda10, l=@0xa300e14) at ebwt.h:1816 1816return unpack_2b_from_8b(l.side(this-_ebwt)[l._by], l._bp); (gdb) l 1811inline int EbwtTStr::rowL(const SideLocus l) const { 1812// Extract and return appropriate bit-pair 1813#ifdef SIXTY4_FORMAT 1814return (((uint64_t*)l.side(this-_ebwt))[l._by 3] l._by 7) 2) + l._bp) 1)) 3; 1815#else 1816return unpack_2b_from_8b(l.side(this-_ebwt)[l._by], l._bp); 1817#endif 1818} 1819 1820/** (gdb) p this -_ebwt $1 = (uint8_t *) 0x4804a00 \b2 (gdb) p *this -_ebwt $2 = 8 '\b' (gdb) p l._by $3 = 45 (gdb) p l.side $4 = SideLocus::side(unsigned char const*) const (gdb) p l.side(this-_ebwt) $5 = (uint8_t *) 0x23d0d900 Address 0x23d0d900 out of bounds (gdb) p l.side(this-_ebwt)[l._by] Cannot access memory at address 0x23d0d92d (gdb) p this -_ebwt $6 = (uint8_t *) 0x4804a00 \b2 (gdb) Running
[Bioc-devel] differences between petty and perceval (OS X 10.6.8 build machines for release/devel)
Hi Dan, I'm cc'ing the list; maybe somebody else has experienced differences between petty and perceval. Rbowtie release (1.4.5) is not building under OS X 10.6.8 (petty). Rbowtie release (1.4.5) and development (1.5.5) are virtually identical (only DESCRIPTION and NEWS differ). The development version builds without problems on perceval, but the release version fails on petty: http://bioconductor.org/checkResults/devel/bioc-LATEST/Rbowtie/perceval-buildsrc.html http://bioconductor.org/checkResults/release/bioc-LATEST/Rbowtie/petty-buildsrc.html The only difference I can make out from the node info pages is that perceval has an additional section on C++11 compiler that is lacking from petty's NodeInfo page. Unfortunately, I cannot reproduce the issue, both Rbowtie 1.4.5 and 1.5.5 build successfully under OS X 10.6.8 and 10.7.5 using llvm-gcc-4.2. Do you have any idea what else could be different between petty and perceval? Thank you, Michael ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] differences between petty and perceval (OS X 10.6.8 build machines for release/devel)
When BioC 2.14 was released, it did work on OS X 10.6.8, but not on 10.9 and neither on some flavours of 32-bit Linux. The bowtie developers have released version 1.0.1 which addresses some of these issues. For the remaining one that still made it fail on OS X 10.9, I have been in contact with them and they provided a patch (see http://sourceforge.net/p/bowtie-bio/bugs/312/) which seems to fix it on the OS X machines that I have access to, but apparently not for petty. Going back to the version at release is not an option, since I rate support for OS X 10.9 more important, especially in the long run. Michael On 13.06.2014 15:19, Kasper Daniel Hansen wrote: That is pretty weird. Since this is a segmentation fault in Bowtie, perhaps there are resource demands which are only satisfied on perceval (tmp space, #cores). Presumably this worked when Bioc was released, so what have you changed in between? Best, Kasper On Fri, Jun 13, 2014 at 3:32 AM, Michael Stadler michael.stad...@fmi.ch mailto:michael.stad...@fmi.ch wrote: Hi Dan, I'm cc'ing the list; maybe somebody else has experienced differences between petty and perceval. Rbowtie release (1.4.5) is not building under OS X 10.6.8 (petty). Rbowtie release (1.4.5) and development (1.5.5) are virtually identical (only DESCRIPTION and NEWS differ). The development version builds without problems on perceval, but the release version fails on petty: http://bioconductor.org/checkResults/devel/bioc-LATEST/Rbowtie/perceval-buildsrc.html http://bioconductor.org/checkResults/release/bioc-LATEST/Rbowtie/petty-buildsrc.html The only difference I can make out from the node info pages is that perceval has an additional section on C++11 compiler that is lacking from petty's NodeInfo page. Unfortunately, I cannot reproduce the issue, both Rbowtie 1.4.5 and 1.5.5 build successfully under OS X 10.6.8 and 10.7.5 using llvm-gcc-4.2. Do you have any idea what else could be different between petty and perceval? Thank you, Michael ___ Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel -- Michael Stadler, PhD Head of Computational Biology Friedrich Miescher Institute Basel (Switzerland) Phone : +41 61 697 6492 Fax : +41 61 697 3976 Mail : michael.stad...@fmi.ch ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] [[- method dispatch and GenomicRanges
Hi Herve, I reinstalled AnnotationDbi 1.25.17, and now things work flawlessly. Thank you for your help! Michael On 09.04.2014 19:25, Hervé Pagès wrote: Hi Michael, Because of a recent change to IRanges, some information about the as.list method table is now out-of-sync in your installed AnnotationDbi (granted it was installed before the new IRanges propagated to you). Re-installing AnnotationDbi should clear this. I bumped AnnotationDbi version yesterday night (even though AnnotationDbi has not changed) to help clear this up for other users. The new version won't become available before tomorrow though. But, again, this new version (1.25.18) is similar to the current version (1.25.17), so you can re-install now with biocLite(AnnotationDbi). No need to wait until tomorrow. HTH, H. On 04/09/2014 12:23 AM, Michael Stadler wrote: Dear Bioc gurus, Since a few days, QuasR devel (1.3.13 and 1.3.14) is failing on some unit tests, which turns out to be reproducible independent of QuasR and might be related to R getting the wrong method for [[-. The following works fine... library(GenomicRanges) grl - GRangesList(r1=GRanges(chr1,IRanges(1,2)), r2=GRanges(chr2,IRanges(3,4))) grl[[1]] - GRanges(chr1,IRanges(10,11)) ...but this produces an error: library(GenomicRanges) library(GenomicFeatures) grl - GRangesList(r1=GRanges(chr1,IRanges(1,2)), r2=GRanges(chr2,IRanges(3,4))) grl[[1]] - GRanges(chr1,IRanges(10,11)) #Error in as.list(x, use.names = FALSE) : # could not find function .as.list.CompressedList My guess is that loading the GenomicFeatures library in addition to GenomicRanges influences the methods dispatch, but with my limited understanding of R and S4 classes I was not able to narrow it further down. I noticed that I am using a slightly older version of R (r65206) compared to the build system (r65358), however I also see the error with the most recent release candidate (r65385). Michael My session info: R version 3.1.0 alpha (2014-03-17 r65206) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicFeatures_1.15.15 AnnotationDbi_1.25.17 Biobase_2.23.6 [4] GenomicRanges_1.15.44 GenomeInfoDb_0.99.30IRanges_1.21.43 [7] BiocGenerics_0.9.3 RColorBrewer_1.0-5 loaded via a namespace (and not attached): [1] BBmisc_1.5BSgenome_1.31.13 [3] BatchJobs_1.2 BiocParallel_0.5.19 [5] Biostrings_2.31.21DBI_0.2-7 [7] GenomicAlignments_0.99.37 RCurl_1.95-4.1 [9] RSQLite_0.11.4Rcpp_0.11.1 [11] Rsamtools_1.15.41 XML_3.98-1.1 [13] XVector_0.3.7 biomaRt_2.19.3 [15] bitops_1.0-6 brew_1.0-6 [17] codetools_0.2-8 digest_0.6.4 [19] fail_1.2 foreach_1.4.1 [21] iterators_1.0.6 plyr_1.8.1 [23] rtracklayer_1.23.22 sendmailR_1.1-2 [25] stats4_3.1.0 stringr_0.6.2 [27] tools_3.1.0 zlibbioc_1.9.0 ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel -- Michael Stadler, PhD Head of Computational Biology Friedrich Miescher Institute Basel (Switzerland) Phone : +41 61 697 6492 Fax : +41 61 697 3976 Mail : michael.stad...@fmi.ch ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
[Bioc-devel] [[- method dispatch and GenomicRanges
Dear Bioc gurus, Since a few days, QuasR devel (1.3.13 and 1.3.14) is failing on some unit tests, which turns out to be reproducible independent of QuasR and might be related to R getting the wrong method for [[-. The following works fine... library(GenomicRanges) grl - GRangesList(r1=GRanges(chr1,IRanges(1,2)), r2=GRanges(chr2,IRanges(3,4))) grl[[1]] - GRanges(chr1,IRanges(10,11)) ...but this produces an error: library(GenomicRanges) library(GenomicFeatures) grl - GRangesList(r1=GRanges(chr1,IRanges(1,2)), r2=GRanges(chr2,IRanges(3,4))) grl[[1]] - GRanges(chr1,IRanges(10,11)) #Error in as.list(x, use.names = FALSE) : # could not find function .as.list.CompressedList My guess is that loading the GenomicFeatures library in addition to GenomicRanges influences the methods dispatch, but with my limited understanding of R and S4 classes I was not able to narrow it further down. I noticed that I am using a slightly older version of R (r65206) compared to the build system (r65358), however I also see the error with the most recent release candidate (r65385). Michael My session info: R version 3.1.0 alpha (2014-03-17 r65206) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicFeatures_1.15.15 AnnotationDbi_1.25.17 Biobase_2.23.6 [4] GenomicRanges_1.15.44 GenomeInfoDb_0.99.30IRanges_1.21.43 [7] BiocGenerics_0.9.3 RColorBrewer_1.0-5 loaded via a namespace (and not attached): [1] BBmisc_1.5BSgenome_1.31.13 [3] BatchJobs_1.2 BiocParallel_0.5.19 [5] Biostrings_2.31.21DBI_0.2-7 [7] GenomicAlignments_0.99.37 RCurl_1.95-4.1 [9] RSQLite_0.11.4Rcpp_0.11.1 [11] Rsamtools_1.15.41 XML_3.98-1.1 [13] XVector_0.3.7 biomaRt_2.19.3 [15] bitops_1.0-6 brew_1.0-6 [17] codetools_0.2-8 digest_0.6.4 [19] fail_1.2 foreach_1.4.1 [21] iterators_1.0.6 plyr_1.8.1 [23] rtracklayer_1.23.22 sendmailR_1.1-2 [25] stats4_3.1.0 stringr_0.6.2 [27] tools_3.1.0 zlibbioc_1.9.0 ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] [BioC] plot methylation figure for end analysis
I agree that the profile or metagene plot is useful and widely applied. Creating such a plot consists of several steps, including: - quantifying and accumulating signal around sites of interest - normalization - visualization The qProfile() function in QuasR was designed to do the quantification/accumulation, which is not straightforward to implement efficiently and could use a lot of memory. This leaves it to the user to choose an appropriate approach for the normalization (if at all necessary) and gives full flexibility for plotting. qProfile returns a list of vectors, so normalization and plotting are not very complex (as shown in the example mentioned by Tim, in the QuasR vignette in section 6.1.6 Create a genomic prole for a set of regions using qProfile). I would be interested to know your opinions - are these still too complex, so that a plot method would be of high value? Michael On 13.09.2013 07:21, bach le wrote: Many thanks. I've checked BSmooth but it seems not to be what I mentioned. As Tim said, this kind of figures are used quite often to illustrate the difference between groups of objects. Yet till now it seems that there has not been any offer from BioC (afa I may know). It would be great if someone can help to implement it. Cheers, On Fri, Sep 13, 2013 at 3:56 AM, Tim Triche, Jr. tim.tri...@gmail.comwrote: Bsmooth/bsseq (and QuasR, as well) are great for manipulating and analyzing BSseq data, but there's something in this email that I've never really found a great solution for, and that is so-called profile or metagene plots (the latter is a confusing bit of terminology, given that it is re-used for linear combinations of gene effects in expression studies). Suppose you wanted to look at the distribution of H3K4me3 or H3K27me3 ChIP-seq density relative to the start of (or exon/intron junctions within, or enhancers whose activity was linked to, ...) transcribed vs. silent genes across a genome. Suppose you also wanted to look at methylation proportions relative to the same positions (again, the idea here is to take a bunch of landmarks and plot a summary statistic describing the behavior of some measurement relative to those landmarks, across the genome). An example of such a plot is on page 23 of the QuasR user's manual. For clarity, here's a link: https://dl.dropboxusercontent.com/u/12962689/profile.png This seems like something that is so incredibly common it ought to be a generic function within BioC somewhere, but I've never found a really easy way to do it. If I want to plot a density matrix mat[x,y] for alhl x and all y, I can call image(mat) or persp(mat) and that's that. Is there something like this within biocGenerics of which I've remained completely ignorant, or... ? If it isn't a generic somewhere, I think I may have to implement one. On Thu, Sep 12, 2013 at 1:32 AM, Alex Gutteridge al...@ruggedtextile.com wrote: On 12.09.2013 08:25, Gia [guest] wrote: Hi, I would like to ask if anyone experienced with creating figures from BS-Seq methylation data for end analysis of genes (or any genomic features), like what was presented in the following link: http://www.nature.com/ng/**journal/v39/n1/fig_tab/ng1929_**F4.html http://www.nature.com/ng/journal/v39/n1/fig_tab/ng1929_F4.html It would be appreciate if you can show how to create this kind of figures using any Bioconductor packages. Cheers, -- output of sessionInfo(): None -- Sent via the guest posting facility at bioconductor.org. __**_ Bioconductor mailing list bioconduc...@r-project.org https://stat.ethz.ch/mailman/**listinfo/bioconductor https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.**science.biology.informatics.**conductor http://news.gmane.org/gmane.science.biology.informatics.conductor I've only skimmed through that paper but that figure isn't showing BS-Seq data as far as I can see - it's showing the results of an array / IP based approach to methylation quantitation. BSmooth is one of the best packages I've found for manipulating BS-Seq data. If you get your data in there you could derive analogous statistics (i.e. methylation levels) pretty easily. Then any of the base plotting functions will give you similar graphs. -- Alex Gutteridge __**_ Bioconductor mailing list bioconduc...@r-project.org https://stat.ethz.ch/mailman/**listinfo/bioconductor https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.** science.biology.informatics.**conductor http://news.gmane.org/gmane.science.biology.informatics.conductor -- *He that would live in peace and at ease, * *Must not speak all he knows, nor judge all he sees.* * * Benjamin Franklin, Poor Richard's Almanackhttp://archive.org/details/poorrichardsalma00franrich
Re: [Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation
Hi Valerie, Thank you for the quick fix - it works like a charm! Michael On 03/10/2013 02:45 AM, Valerie Obenchain wrote: Hi Michael, Thanks for reporting the bug and sending a reproducible example. Sorry it took a few days to get to this. A fix has been checked into versions 1.4.12 (release) and 1.5.44 (devel). Valerie On 03/07/13 05:43, Michael Stadler wrote: Here is the second attachment (variants.vcf) that was missing from the first message. Michael On 03/07/2013 02:40 PM, Michael Stadler wrote: Dear Valerie and colleagues, I have recently started using the VariantAnnotation package, which is a huge asset for much of my work with sequence variants - thank you! I have a question regarding the following example (it makes use of two small text files, annot.gtf and variants.vcf, attached to this message): library(VariantAnnotation) library(GenomicFeatures) library(BSgenome.Celegans.UCSC.ce6) # build a TranscriptDb from annot.gtf chrominfo - data.frame(chrom=as.character(seqnames(Celegans)), length=seqlengths(Celegans), is_circular=FALSE) txdb - makeTranscriptDbFromGFF(file=annot.gtf, format=gtf, chrominfo=chrominfo, dataSource=WormBase v. WS190, species=Caenorhabditis elegans) txdb #TranscriptDb object: #| Db type: TranscriptDb #| Supporting package: GenomicFeatures #| Data source: WormBase v. WS190 #| Organism: Caenorhabditis elegans #| miRBase build ID: NA #| transcript_nrow: 7 #| exon_nrow: 42 #| cds_nrow: 42 #| Db created by: GenomicFeatures package from Bioconductor #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013) #| GenomicFeatures version at creation time: 1.11.14 #| RSQLite version at creation time: 0.11.2 #| DBSCHEMAVERSION: 1.0 # read in sequence variants vcf - readVcf(variants.vcf, genome=ce6) dim(vcf) #[1] 3 1 # predict the impact on coding sequences aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans) length(aa) #[1] 7 # My question is related to the impact of variant chrV:15822727: # This SNV is overlapping two transcripts (same base, plus strand), # yet the reported VARCODON values are different (GAT - TAT/AAT): aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATTAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA # interestingly, when altering the annotation or the set of variants, # this contradictions disapears (GAT - AAT): vcf2 - vcf[-1] # remove the first SNV aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans) length(aa2) #[1] 5 aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATAAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA Do you know why this could be the case? Regards, Michael Here is my envirnoment: sessionInfo() R Under development (unstable) (2013-02-25 r62061) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets [7] methods base other attached packages: [1] BSgenome.Celegans.UCSC.ce6_1.3.19 [2] BSgenome_1.27.1 [3] GenomicFeatures_1.11.14 [4] AnnotationDbi_1.21.12 [5] Biobase_2.19.3 [6] VariantAnnotation_1.5.42 [7] Rsamtools_1.11.21 [8] Biostrings_2.27.11 [9] GenomicRanges_1.11.35 [10] IRanges_1.17.35 [11] BiocGenerics_0.5.6 [12] RColorBrewer_1.0-5 loaded via a namespace
[Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation
Dear Valerie and colleagues, I have recently started using the VariantAnnotation package, which is a huge asset for much of my work with sequence variants - thank you! I have a question regarding the following example (it makes use of two small text files, annot.gtf and variants.vcf, attached to this message): library(VariantAnnotation) library(GenomicFeatures) library(BSgenome.Celegans.UCSC.ce6) # build a TranscriptDb from annot.gtf chrominfo - data.frame(chrom=as.character(seqnames(Celegans)), length=seqlengths(Celegans), is_circular=FALSE) txdb - makeTranscriptDbFromGFF(file=annot.gtf, format=gtf, chrominfo=chrominfo, dataSource=WormBase v. WS190, species=Caenorhabditis elegans) txdb #TranscriptDb object: #| Db type: TranscriptDb #| Supporting package: GenomicFeatures #| Data source: WormBase v. WS190 #| Organism: Caenorhabditis elegans #| miRBase build ID: NA #| transcript_nrow: 7 #| exon_nrow: 42 #| cds_nrow: 42 #| Db created by: GenomicFeatures package from Bioconductor #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013) #| GenomicFeatures version at creation time: 1.11.14 #| RSQLite version at creation time: 0.11.2 #| DBSCHEMAVERSION: 1.0 # read in sequence variants vcf - readVcf(variants.vcf, genome=ce6) dim(vcf) #[1] 3 1 # predict the impact on coding sequences aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans) length(aa) #[1] 7 # My question is related to the impact of variant chrV:15822727: # This SNV is overlapping two transcripts (same base, plus strand), # yet the reported VARCODON values are different (GAT - TAT/AAT): aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATTAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA # interestingly, when altering the annotation or the set of variants, # this contradictions disapears (GAT - AAT): vcf2 - vcf[-1] # remove the first SNV aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans) length(aa2) #[1] 5 aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATAAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA Do you know why this could be the case? Regards, Michael Here is my envirnoment: sessionInfo() R Under development (unstable) (2013-02-25 r62061) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets [7] methods base other attached packages: [1] BSgenome.Celegans.UCSC.ce6_1.3.19 [2] BSgenome_1.27.1 [3] GenomicFeatures_1.11.14 [4] AnnotationDbi_1.21.12 [5] Biobase_2.19.3 [6] VariantAnnotation_1.5.42 [7] Rsamtools_1.11.21 [8] Biostrings_2.27.11 [9] GenomicRanges_1.11.35 [10] IRanges_1.17.35 [11] BiocGenerics_0.5.6 [12] RColorBrewer_1.0-5 loaded via a namespace (and not attached): [1] DBI_0.2-5 RCurl_1.95-4.1 RSQLite_0.11.2 [4] XML_3.95-0.1 biomaRt_2.15.0 bitops_1.0-5 [7] rtracklayer_1.19.9 stats4_3.0.0 tools_3.0.0 [10] zlibbioc_1.5.0 chrVCoding_transcript exon6512065 6512138 . - 2 gene_id WBGene1073; transcript_id F46E10.9.1; chrVCoding_transcript exon6512189 6512279 . - 0 gene_id WBGene1073; transcript_id F46E10.9.1; chrVCoding_transcript exon6512325 6512472 . - 1 gene_id WBGene1073; transcript_id F46E10.9.1; chrVCoding_transcript exon6512521 6512649 .
Re: [Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation
Here is the second attachment (variants.vcf) that was missing from the first message. Michael On 03/07/2013 02:40 PM, Michael Stadler wrote: Dear Valerie and colleagues, I have recently started using the VariantAnnotation package, which is a huge asset for much of my work with sequence variants - thank you! I have a question regarding the following example (it makes use of two small text files, annot.gtf and variants.vcf, attached to this message): library(VariantAnnotation) library(GenomicFeatures) library(BSgenome.Celegans.UCSC.ce6) # build a TranscriptDb from annot.gtf chrominfo - data.frame(chrom=as.character(seqnames(Celegans)), length=seqlengths(Celegans), is_circular=FALSE) txdb - makeTranscriptDbFromGFF(file=annot.gtf, format=gtf, chrominfo=chrominfo, dataSource=WormBase v. WS190, species=Caenorhabditis elegans) txdb #TranscriptDb object: #| Db type: TranscriptDb #| Supporting package: GenomicFeatures #| Data source: WormBase v. WS190 #| Organism: Caenorhabditis elegans #| miRBase build ID: NA #| transcript_nrow: 7 #| exon_nrow: 42 #| cds_nrow: 42 #| Db created by: GenomicFeatures package from Bioconductor #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013) #| GenomicFeatures version at creation time: 1.11.14 #| RSQLite version at creation time: 0.11.2 #| DBSCHEMAVERSION: 1.0 # read in sequence variants vcf - readVcf(variants.vcf, genome=ce6) dim(vcf) #[1] 3 1 # predict the impact on coding sequences aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans) length(aa) #[1] 7 # My question is related to the impact of variant chrV:15822727: # This SNV is overlapping two transcripts (same base, plus strand), # yet the reported VARCODON values are different (GAT - TAT/AAT): aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATTAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA # interestingly, when altering the annotation or the set of variants, # this contradictions disapears (GAT - AAT): vcf2 - vcf[-1] # remove the first SNV aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans) length(aa2) #[1] 5 aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATAAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA Do you know why this could be the case? Regards, Michael Here is my envirnoment: sessionInfo() R Under development (unstable) (2013-02-25 r62061) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets [7] methods base other attached packages: [1] BSgenome.Celegans.UCSC.ce6_1.3.19 [2] BSgenome_1.27.1 [3] GenomicFeatures_1.11.14 [4] AnnotationDbi_1.21.12 [5] Biobase_2.19.3 [6] VariantAnnotation_1.5.42 [7] Rsamtools_1.11.21 [8] Biostrings_2.27.11 [9] GenomicRanges_1.11.35 [10] IRanges_1.17.35 [11] BiocGenerics_0.5.6 [12] RColorBrewer_1.0-5 loaded via a namespace (and not attached): [1] DBI_0.2-5 RCurl_1.95-4.1 RSQLite_0.11.2 [4] XML_3.95-0.1 biomaRt_2.15.0 bitops_1.0-5 [7] rtracklayer_1.19.9 stats4_3.0.0 tools_3.0.0 [10] zlibbioc_1.5.0 ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel