Re: [Bioc-devel] Using BiocInstaller with R 3.4.0 beta

2017-04-20 Thread Michael Stadler
Dear Martin,

I've now spend a day trying to resolve this. To make a long story short:
I could get bowtie 1.1.1 (currently used in Rbowtie 1.15.1) to compile
and work under macOS Sierra.

I tried updating bowtie, but:
- bowtie 1.2 has a bug that prevents it from using multiple threads,
which I could reproduce both under Linux and macOS Sierra and which I
could not resolve (see https://github.com/BenLangmead/bowtie/issues/41).
- bowtie 1.1.2 crashes if compiled with clang (see
https://github.com/BenLangmead/bowtie/issues/21), the only known
workaround being to use a different toolchain, which I think is not an
option for BioC/Rbowtie.

I hope that resolves also the issue on El Capitan.

Michael

On 19.04.2017 13:47, Martin Morgan wrote:
> On 04/19/2017 05:45 AM, Michael Stadler wrote:
>> Dear BioC core,
>>
>> Thanks for the report, Herve. If I understand correctly, there is
>> nothing I can do at this point to make QuasR green on windows, correct?
>>
>> I have another question regarding QuasR not building on veracruz2: The
>> vignette does not build currently, reporting:
>> Error on veracruz2.bioconductor.org processing sample
>> /tmp/RtmpJBWrjI/chip_1_1.fq.bz2df0b6901ff33.fastq : 'asBam' internal:
>> samtools invoked 'abort' ...
>>
>> Though it seems to build fine on other platforms, and there were no
>> recent changes to the vignette. What would you or other suggest to do
>> about that?
> 
> The error is in createGenomicAlignmentsController after
> 
> https://github.com/Bioconductor-mirror/QuasR/blob/cc374678033055f2bd4d105c502a426807223c1c/R/createAlignments-functions.R#L292
> 
> 
> it looks like the sam file is quite funky
> 
> Browse[4]> options(nwarnings=1)
> Browse[4]> xx = readLines(samFile)
> There were 2339 warnings (use warnings() to see them)
> Browse[4]> head(warnings(), 3)
> Warning messages:
> 1: In readLines(samFile) : line 7 appears to contain an embedded nul
> 2: In readLines(samFile) : line 8 appears to contain an embedded nul
> 3: In readLines(samFile) : line 9 appears to contain an embedded nul
> Browse[4]> table(nzchar(xx))
> 
> FALSE  TRUE
>  2341   261
> Browse[4]> substring(head(xx, 10), 1, 70)
>  [1] "@HD\tVN:1.0\tSO:unsorted"
>  [2] "@SQ\tSN:chr1\tLN:4"
>  [3] "@SQ\tSN:chr2\tLN:1"
>  [4] "@SQ\tSN:chr3\tLN:45000"
> [5]
> "@PG\tID:Bowtie\tVN:1.1.1\tCL:\"/Library/Frameworks/R.framework/Versions/3."
> 
>  [6] ""
>  [7] ""
>  [8] ""
>  [9] ""
> [10] ""
> 
> The 'abort' from Rsamtools is
> 
>   Parse error at line 143: missing colon in auxiliary data
> 
> It's not really clear whether R is being confused by the embedded nulls
> or blank lines or other problem
> 
> Browse[4]> xx[140 + 1:5]
> [1] "" "" "" "" ""
> Browse[4]> xx[nzhchar(xx)][140 + 1:5]
> Error in nzhchar(xx) : could not find function "nzhchar"
> Browse[4]> xx[nzchar(xx)][140 + 1:5]
> [1]
> "seq10137\t4\t*\t0\t0\t*\t*\t0\t0\tTCGTTATGGTTTCCGTTGCTGCCATCTCACAT\tB@BABCBBBABABA?A>8>A7:6=@>>:@BAA>1;B\tXM:i:0"
> 
> [2]
> "seq10138\t4\t*\t0\t0\t*\t*\t0\t0\tCTGGCGACTCCTTCGAGATGGACGCCGTTGGCGCTC\tABBB@CBBBA9BB@>'>9@AA=A?\tXM:i:0"
> 
> [3]
> "seq10139\t4\t*\t0\t0\t*\t*\t0\t0\tCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAAT\tBBCBBC=A@BAABBABBA=A??><\tXM:i:0"
> 
> [4]
> "seq10140\t4\t*\t0\t0\t*\t*\t0\t0\tGGTTGTCAGCGTCATAAGAGGACCTCCAAATG\t;;;AA=AA<AA<ABBB?=@>>=CCBB>9@@>B=BB=\tXM:i:0"
> 
> [5]
> "seq10141\t4\t*\t0\t0\t*\t*\t0\t0\tAACCCTAATGAGCTTAATCAAGATGATGCTCGTTAT\tBBAB@AAB@BBBA@B@ABAABBABAA@B?A?@\tXM:i:0"
> 
> 
> So I guess it's in creation of the sam file -- Bowtie?
> 
> Martin
> 
>>
>> Any suggestions are appreciated,
>> Michael
>>
>>
>>
>> On 17.04.2017 02:08, Hervé Pagès wrote:
>>> FWIW here are all the packages that are victim of this
>>> installed.packages bug in today's build report:
>>>
>>>   alpine
>>>   fCI
>>>   GenomicFeatures
>>>   QuasR
>>>
>>> We only see this error on tokay2 (Windows).
>>>
>>> H.
>>>
>>>
>>> On 04/11/2017 04:21 PM, Gordon K Smyth wrote:
>>>> I restarted my PC this morning and the problem disappeared.
>>>>
>>>> I probably should have tried that last night, but it was late ...
>>>>
>>>> Thanks
>>>> Gordon
>>>>
>>>>> -Original M

Re: [Bioc-devel] Using BiocInstaller with R 3.4.0 beta

2017-04-19 Thread Michael Stadler
Dear Martin,

Thank you for the quick response - you are right to suspect bowtie, it
seems that version 1.1.1 which I am using in Rbowtie has an issue on
newer macOS versions, and I am currently looking into wether an update
to version 1.2 would fix this.

A first issue that I stumbled into is that R CMD check now warns about
the bowtie code using C++11 extensions. Following the manual, I have
added the following to DESCRIPTION:
  SystemRequirements: GNU make, C++11
("GNU make" was already there), but the field seems to be ignored by R
CMD check and the warning persists. Is this something to worry about?

I hope to get back with a solution for the bowtie issue soon.
Michael



On 19.04.2017 13:47, Martin Morgan wrote:
> On 04/19/2017 05:45 AM, Michael Stadler wrote:
>> Dear BioC core,
>>
>> Thanks for the report, Herve. If I understand correctly, there is
>> nothing I can do at this point to make QuasR green on windows, correct?
>>
>> I have another question regarding QuasR not building on veracruz2: The
>> vignette does not build currently, reporting:
>> Error on veracruz2.bioconductor.org processing sample
>> /tmp/RtmpJBWrjI/chip_1_1.fq.bz2df0b6901ff33.fastq : 'asBam' internal:
>> samtools invoked 'abort' ...
>>
>> Though it seems to build fine on other platforms, and there were no
>> recent changes to the vignette. What would you or other suggest to do
>> about that?
> 
> The error is in createGenomicAlignmentsController after
> 
> https://github.com/Bioconductor-mirror/QuasR/blob/cc374678033055f2bd4d105c502a426807223c1c/R/createAlignments-functions.R#L292
> 
> 
> it looks like the sam file is quite funky
> 
> Browse[4]> options(nwarnings=1)
> Browse[4]> xx = readLines(samFile)
> There were 2339 warnings (use warnings() to see them)
> Browse[4]> head(warnings(), 3)
> Warning messages:
> 1: In readLines(samFile) : line 7 appears to contain an embedded nul
> 2: In readLines(samFile) : line 8 appears to contain an embedded nul
> 3: In readLines(samFile) : line 9 appears to contain an embedded nul
> Browse[4]> table(nzchar(xx))
> 
> FALSE  TRUE
>  2341   261
> Browse[4]> substring(head(xx, 10), 1, 70)
>  [1] "@HD\tVN:1.0\tSO:unsorted"
>  [2] "@SQ\tSN:chr1\tLN:4"
>  [3] "@SQ\tSN:chr2\tLN:1"
>  [4] "@SQ\tSN:chr3\tLN:45000"
> [5]
> "@PG\tID:Bowtie\tVN:1.1.1\tCL:\"/Library/Frameworks/R.framework/Versions/3."
> 
>  [6] ""
>  [7] ""
>  [8] ""
>  [9] ""
> [10] ""
> 
> The 'abort' from Rsamtools is
> 
>   Parse error at line 143: missing colon in auxiliary data
> 
> It's not really clear whether R is being confused by the embedded nulls
> or blank lines or other problem
> 
> Browse[4]> xx[140 + 1:5]
> [1] "" "" "" "" ""
> Browse[4]> xx[nzhchar(xx)][140 + 1:5]
> Error in nzhchar(xx) : could not find function "nzhchar"
> Browse[4]> xx[nzchar(xx)][140 + 1:5]
> [1]
> "seq10137\t4\t*\t0\t0\t*\t*\t0\t0\tTCGTTATGGTTTCCGTTGCTGCCATCTCACAT\tB@BABCBBBABABA?A>8>A7:6=@>>:@BAA>1;B\tXM:i:0"
> 
> [2]
> "seq10138\t4\t*\t0\t0\t*\t*\t0\t0\tCTGGCGACTCCTTCGAGATGGACGCCGTTGGCGCTC\tABBB@CBBBA9BB@>'>9@AA=A?\tXM:i:0"
> 
> [3]
> "seq10139\t4\t*\t0\t0\t*\t*\t0\t0\tCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAAT\tBBCBBC=A@BAABBABBA=A??><\tXM:i:0"
> 
> [4]
> "seq10140\t4\t*\t0\t0\t*\t*\t0\t0\tGGTTGTCAGCGTCATAAGAGGACCTCCAAATG\t;;;AA=AA<AA<ABBB?=@>>=CCBB>9@@>B=BB=\tXM:i:0"
> 
> [5]
> "seq10141\t4\t*\t0\t0\t*\t*\t0\t0\tAACCCTAATGAGCTTAATCAAGATGATGCTCGTTAT\tBBAB@AAB@BBBA@B@ABAABBABAA@B?A?@\tXM:i:0"
> 
> 
> So I guess it's in creation of the sam file -- Bowtie?
> 
> Martin
> 
>>
>> Any suggestions are appreciated,
>> Michael
>>
>>
>>
>> On 17.04.2017 02:08, Hervé Pagès wrote:
>>> FWIW here are all the packages that are victim of this
>>> installed.packages bug in today's build report:
>>>
>>>   alpine
>>>   fCI
>>>   GenomicFeatures
>>>   QuasR
>>>
>>> We only see this error on tokay2 (Windows).
>>>
>>> H.
>>>
>>>
>>> On 04/11/2017 04:21 PM, Gordon K Smyth wrote:
>>>> I restarted my PC this morning and the problem disappeared.
>>>>
>>>> I probably should have tried that last night, but it was late ...
>>>>
>>>> Thanks
>>>> Gordon
>>>>
>>>>> -Original Message-
>>

Re: [Bioc-devel] Using BiocInstaller with R 3.4.0 beta

2017-04-19 Thread Michael Stadler
Dear BioC core,

Thanks for the report, Herve. If I understand correctly, there is
nothing I can do at this point to make QuasR green on windows, correct?

I have another question regarding QuasR not building on veracruz2: The
vignette does not build currently, reporting:
Error on veracruz2.bioconductor.org processing sample
/tmp/RtmpJBWrjI/chip_1_1.fq.bz2df0b6901ff33.fastq : 'asBam' internal:
samtools invoked 'abort' ...

Though it seems to build fine on other platforms, and there were no
recent changes to the vignette. What would you or other suggest to do
about that?

Any suggestions are appreciated,
Michael



On 17.04.2017 02:08, Hervé Pagès wrote:
> FWIW here are all the packages that are victim of this
> installed.packages bug in today's build report:
> 
>   alpine
>   fCI
>   GenomicFeatures
>   QuasR
> 
> We only see this error on tokay2 (Windows).
> 
> H.
> 
> 
> On 04/11/2017 04:21 PM, Gordon K Smyth wrote:
>> I restarted my PC this morning and the problem disappeared.
>>
>> I probably should have tried that last night, but it was late ...
>>
>> Thanks
>> Gordon
>>
>>> -Original Message-
>>> From: Martin Morgan [mailto:martin.mor...@roswellpark.org]
>>> Sent: Tuesday, 11 April 2017 7:20 PM
>>> To: Gordon K Smyth ; bioc-devel@r-project.org
>>> Subject: Re: [Bioc-devel] Using BiocInstaller with R 3.4.0 beta
>>>
>>> On 04/11/2017 05:01 AM, Gordon K Smyth wrote:
 The problem appears to be with installed.packages(). If I start a
 fresh R
>>> 3.4.0beta session, then I can run installed.packages() once with
>>> correct results,
>>> but running it a second time gives the error message:

> installed.packages()
 Error in if (file.exists(dest) && file.mtime(dest) > file.mtime(lib)
 &&  :
   missing value where TRUE/FALSE needed
>>>
>>> The test is in this code chunk, from utils/R/packages.R
>>>
>>>  for(lib in lib.loc) {
>>>  if(noCache) {
>>>  ret0 <- .readPkgDesc(lib, fields)
>>>  if(length(ret0)) retval <- rbind(retval, ret0)
>>>  } else {
>>>  ## Previously used URLencode for e.g. Windows paths with
>>> drives
>>>  ## This version works for very long file names.
>>>  base <- paste(c(lib, fields), collapse = ",")
>>>  ## add length and 64-bit CRC in hex (in theory, seems
>>>  ## it is actually 32-bit on some systems)
>>>  enc <- sprintf("%d_%s", nchar(base), .Call(C_crc64, base))
>>>  dest <- file.path(tempdir(), paste0("libloc_", enc,
>>> ".rds"))
>>>  if(file.exists(dest) &&
>>> file.mtime(dest) > file.mtime(lib) &&
>>> (val <- readRDS(dest))$base == base)
>>>  ## use the cache file
>>>  retval <- rbind(retval, val$value)
>>>  else {
>>>  ret0 <- .readPkgDesc(lib, fields)
>>>  if(length(ret0)) {
>>>  retval <- rbind(retval, ret0)
>>>  ## save the cache file
>>>  saveRDS(list(base = base, value = ret0), dest)
>>>  }
>>>  }
>>>  }
>>>
>>>
>>> where 'lib' is one of .libPaths(), 'dest' is one of
>>>
>>>dir(tempdir(), pattern="libloc_", full=TRUE)
>>>
>>> and 'base' should be a character(1)
>>>
>>> I think the code chunk has tried to cache the packages installed in each
>>> directory of .libPaths() (the saveRDS() line), and these are somehow
>>> corrupted on the second time through (I guess evaluating the
>>> readRDS()??).
>>>
>>> For instance I have two paths in .libPaths() and after the first
>>> install.packages() I have
>>>
>>>  > str(readRDS(dir(tempdir(), full=TRUE)[1]))
>>> List of 2
>>>   $ base : chr
>>> "/home/mtmorgan/bin/R-3-4-
>>> branch/library,Version,Priority,Depends,Imports,LinkingTo,Suggests,Enhances,Li
>>>
>>> cense,Li"|
>>> __truncated__
>>>   $ value: chr [1:29, 1:17] "base" "boot" "class" "cluster" ...
>>>  > str(readRDS(dir(tempdir(), full=TRUE)[2]))
>>> List of 2
>>>   $ base : chr
>>> "/home/mtmorgan/R/x86_64-pc-linux-gnu-library/3.4-Bioc-
>>> 3.5,Version,Priority,Depends,Imports,LinkingTo,Suggests,E"|
>>> __truncated__
>>>   $ value: chr [1:513, 1:17] "abind" "acepack" "aCGH" "ADaCGH2" ...
>>>
>>> I'm guessing that one of these files is corrupted somehow, but it's not
>>> obvious how. Can you use options(error=recover) and find the values that
>>> cause the conditional to fail?
>>>
>>> Martin
>>>


> -Original Message-
> From: Gordon K Smyth
> Sent: Tuesday, 11 April 2017 6:26 PM
> To: bioc-devel@r-project.org
> Subject: Using BiocInstaller with R 3.4.0 beta
>
> I thought I would test out R 3.4.0 beta (for Windows) but now I
> can't use the
> BiocInstaller package. Attempts to use biocLite() or biocValid()
> lead to an
>>> error
> message as shown below.
>
> What have I overlooked?
>
> Thanks
> Gordon

Re: [Bioc-devel] Question: set purification test for overlapped regions between three GRanges object

2016-04-29 Thread Michael Stadler
Dear Jurat,

Maybe this would be better asked on support.bioconductor.org.

I don't think I fully understand what you intend to do (a fully working
example would help), but here are two ideas that could be useful:

1. Maybe a single findOverlaps() will be enough to find all you need:

D <- c(a,b,c)
ov <- findOverlaps(query=D, subject=D)

By inspecting queryHits(ov) and subjectHits(ov), you can find out
whether a particular regions originally came from a, b or c (e.g.
anything in 1:length(a) will be regions from a).

2. Maybe you can use intersect() to find common indices, see ?intersect

Hope that helps,
Michael

On 28.04.2016 18:20, Jurat Shayidin wrote:
> Dear Mailing list:
> 
> I got stuck with implementing wrapper function for my packages. when
> three GRanges objects (e.g. a,b,c) are given to my function, I would
> like to find overlapped regions from one to another in parallel,
> where a as query, b,c are subjects respectively. Because of three
> input was given, so I want to call findOverlaps function in the
> context of changing parameter (query, subject will be switched in
> each individual test). To be clarify my point, I could have this
> workflow:
> 
> *1st test*: ov_1 <- list(ov.1 <- findOverlaps(a, b), ov.2 <-
> findOverlaps( a,c))
> 
> intermediate output of 1st test: a.1 <- list(a.sc, a.wd) ; b.1 <-
> list(b.sc, b.wd) ; c.1 <- list(c.sc, c.wd)
> 
> *2nd test*: ov_2 <- list(ov.1 <- findOverlaps(b,a), ov.2 <-
> findOverlaps(b,c ))
> 
> intermediate output of 2nd test: b.2 <- list(b.sc_, b.wd_) ; a.2 <-
> list(a.sc_, a.wd_) ;  c.2 <- list(c.sc_, c.wd_)
> 
> *3rd test*: ov_3 <- list(ov.1 <- findOverlaps(c,a), ov.2 <-
> findOverlaps(c,b ))
> 
> intermediate output of 3rd test: c.3 <- list(b.sc__, b.wd__) ; a.3
> <- list(a.sc__, a.wd__) ;  c.3 <- list(c.sc__, c.wd__)
> 
> start 1st test-> read data  - > find overlapped regions conditionally
> in parallel -> filtering function with specific threshold value  -> 
> chisq.test() - > get combined pvalue, and do further filtering
> process - > save result in function' environment -> go to 2nd test ->
> repeat workflow - - -   -> go to 3rd test -> - - - -   -> all test is
> done, prepare to generate final output of each GRanges objects - >
> package job is DONE !
> 
> In particular, in each individual test, a,b,c could contain 2
> different group of regions as intermediate output, but it is not
> final step, I must go to 2nd test, 3rd test respectively.
> 
> However, I have hard time to find efficient solution for this issue, 
> because each individual test, a,b,c contains different set of
> genomic regions where each has 2 different group of regions as
> intermediate output. My goal is to implement function for set
> purification for intermediate output of each GRanges objects from 3
> different test.
> 
> desired job that I want to implement is , for a.1 <- list(a.sc, a.wd)
> , a.2 <- list(a.sc_, a.wd_) ,  a.3 <- list(a.sc__, a.wd__),
> implement function to retrieve set of genomic regions that all
> appeared in 1st, 2nd, 3rd test respectively.
> 
> *Objective*: I want to retrieve the regions that all appeared in 1st,
> 2nd, 3rd test. How can I efficiently solve this issues ?Is there any
> one give me possible idea to solve this problem? Any possible
> approach, IDEA, sketch solution, or existing bioconductor package are
> highly appreciated. Thanks a lot
> 
> Best regards:
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] 'memory not mapped' in trimLRpatterns

2015-04-30 Thread Michael Stadler
Hi Herve,

I stumbled again over the 'memory not mapped' issue in trimLRpatterns
using updated versions of R and BioC-devel. I guess it does not hit
people very often, but I would highly appreciate if it could be fixed.

Many thanks,
Michael

PS: I can reproduce the issue using the code below under:
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

locale:
[1] C

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
 [1] ShortRead_1.27.1GenomicAlignments_1.5.1 Rsamtools_1.21.3
 [4] GenomicRanges_1.21.5GenomeInfoDb_1.5.1  BiocParallel_1.3.4
 [7] Biostrings_2.37.0   XVector_0.9.0   IRanges_2.3.6
[10] S4Vectors_0.7.0 BiocGenerics_0.15.0 RColorBrewer_1.1-2

loaded via a namespace (and not attached):
 [1] lattice_0.20-31  bitops_1.0-6 grid_3.2.0
 [4] futile.options_1.0.0 zlibbioc_1.15.0  hwriter_1.3.2
 [7] latticeExtra_0.6-26  futile.logger_1.4.1  lambda.r_1.1.7
[10] Biobase_2.29.0


On 25.04.2014 13:11, Hervé Pagès wrote:
 Hi Michael,
 
 Thanks for the report. I'll look into this.
 
 H.
 
 On 04/22/2014 08:29 AM, Michael Stadler wrote:
 Dear Herve,

 We are hitting a 'memory not mapped' problem when using trimLRpatterns
 as detailed below. I did not manage to reproduce it with few sequences,
 so I have to refer to a publicly available sequence file with many
 reads. Even then, it occasionally runs through without problems.

 Also, our use-case may not be typical and be part of the problem - maybe
 the solution will be to change our use of trimLRpatterns.

 Here is some code to illustrate/reproduce the problem:

 library(Biostrings)
 library(ShortRead)

 Rpat - TGGAATTCTCGGGTGCCAAGG
 maxRmm - rep(0:2, c(6,3,nchar(Rpat)-9))

 fq1 - DNAStringSet(c(TGGAATTCTCGGGTGCCAAGG,
TGGAATTCTCGGGTGCCAAGG))

 # the second read is not trimmed because it runs through the adaptor
 trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
 #  A DNAStringSet instance of length 2
 #width seq
 #[1] 4 
 #[2]28 TGGAATTCTCGGGTGCCAAGGTTT

 # as a workaround, we pad the adaptor with Ns and
 # increase the mismatch tolerance
 numNs - 90
 maxRmm - c(maxRmm, 1:numNs+max(maxRmm))
 Rpat - paste(c(Rpat, rep(N, numNs)), collapse=)

 # now, also the second read gets trimmed
 trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm)
 #  A DNAStringSet instance of length 2
 #width seq
 #[1] 4 
 #[2] 4 

 # to trigger the segmentation fault, many reads are needed
 download.file(ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz;,

 ERR000916_1.fastq.gz)
 fq2 - readFastq(ERR000916_1.fastq.gz)

 fq3 - trimLRPatterns(subject=fq2, Rpattern=Rpat, max.Rmismatch=maxRmm)
 # *** caught segfault ***
 #address 0x7f5109be4fed, cause 'memory not mapped'
 #
 #Traceback:
 # 1: .Call(.NAME, ..., PACKAGE = PACKAGE)
 # 2: .Call2(XStringSet_vmatch_pattern_at, pattern, subject, at,
 at.type, max.mismatch, min.mismatch, with.indels, fixed, ans.type,
 auto.reduce.pattern, PACKAGE = Biostrings)
 # 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch,
 min.mismatch, with.indels, fixed, .to.ans.type(follow.index),
 auto.reduce.pattern)
 # 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
   ending.at = subject_width, max.mismatch = max.Rmismatch,
 with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
 # 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
   ending.at = subject_width, max.mismatch = max.Rmismatch,
 with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
 # 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels,
   Rfixed)
 # 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject,
 max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed,
 Rfixed, ranges)
 # 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
 max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges
 = TRUE)
 # 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch,
 max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges
 = TRUE)
 #10: eval(expr, envir, enclos)
 #11: eval(call, sys.frame(sys.parent()))
 #12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch,
 max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, ranges =
 TRUE)
 #13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
 maxRmm, with.Rindels = FALSE)
 #14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch =
 maxRmm, with.Rindels = FALSE)

 The problem did not occur in R 3.0.3 with BioC 2.13.
 Do you have an idea what's wrong?

 Thanks for your help,
 Michael



 sessionInfo()R version 3.1.0 (2014-04-10)
 #Platform: x86_64-unknown-linux-gnu (64-bit

Re: [Bioc-devel] differences between petty and perceval (OS X 10.6.8 build machines for release/devel)

2014-06-16 Thread Michael Stadler
Dear Dan, Martin and Nate,

Thank you for looking into it. I guess that is pointing to a problem
within bowtie.

It looks like the EXC_BAD_ACCESS you see on petty in ebwt.h is not
reproducible on the other Mac or Linux machines we tried. Is it possible
to run valgrind on petty? That may confirm/rule out if the memory
(de-)allocation issues reported on Linux are related.

I would like to submit a bug-report to the bowtie developers, but am
reluctant to do that without being able to reproduce the problem or test
potential fixed. I would have the options to go through Rbowtie build
cycles, but would have to rely on the assumption that petty will keep
hitting this hickup even with modified bowtie code. The minor
differences between bowtie 1.0.1 and bowtie 1.0.1-bug-312 argue against
that.

I am tempted to stay with the current situtation:
  - OS X before 10.9 needs to use Rbowtie = 1.4.4
(based on bowtie 1.0.1)
  - OS X 10.9 onwards and everything else uses Rbowtie = 1.4.5
(based on bowtie 1.0.1 /patched bugs-312).

Thanks again for your efforts,
Michael


On 14.06.2014 01:31, Dan Tenenbaum wrote:
 Hi Michael,
 
 
 
 - Original Message -
 From: Michael Stadler michael.stad...@fmi.ch
 To: Dan Tenenbaum dtene...@fhcrc.org, bioc-devel@r-project.org
 Sent: Friday, June 13, 2014 12:32:52 AM
 Subject: differences between petty and perceval (OS X 10.6.8 build machines 
 for release/devel)

 Hi Dan,

 I'm cc'ing the list; maybe somebody else has experienced differences
 between petty and perceval.

 Rbowtie release (1.4.5) is not building under OS X 10.6.8 (petty).

 Rbowtie release (1.4.5) and development (1.5.5) are virtually
 identical
 (only DESCRIPTION and NEWS differ).

 The development version builds without problems on perceval, but the
 release version fails on petty:
 http://bioconductor.org/checkResults/devel/bioc-LATEST/Rbowtie/perceval-buildsrc.html
 http://bioconductor.org/checkResults/release/bioc-LATEST/Rbowtie/petty-buildsrc.html

 The only difference I can make out from the node info pages is that
 perceval has an additional section on C++11 compiler that is
 lacking
 from petty's NodeInfo page.

 Unfortunately, I cannot reproduce the issue, both Rbowtie 1.4.5 and
 1.5.5 build successfully under OS X 10.6.8 and 10.7.5 using
 llvm-gcc-4.2.

 Do you have any idea what else could be different between petty and
 perceval?
 
 Martin and Nate and I took a look at this. I managed to come up with a bowtie 
 command line that would reliably reproduce the segfault on petty.
 
 Then we ran that under gdb (and turned off compiler optimizations) and came 
 up with this, which may or may not help you:
 
 petty:vignettes biocbuild$ gdb --args 
 '/Library/Frameworks/R.framework/Versions/3.1/Resources/library/Rbowtie/bowtie'
  -y -S -k 10 -m 10 -v 2 -r -p 4 --best --strata 'doit/refsIndex/index' 
 'doit/SpliceMapTemp_876c378e20ac/25mers.map' 
 'doit/SpliceMapTemp_876c378e20ac/25mers.map_unsorted' 
 GNU gdb 6.3.50-20050815 (Apple version gdb-1708) (Mon Aug 15 16:03:10 UTC 
 2011)
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type show copying to see the conditions.
 There is absolutely no warranty for GDB.  Type show warranty for details.
 This GDB was configured as x86_64-apple-darwin...Reading symbols for shared 
 libraries ... done
 
 (gdb) run
 Starting program: 
 /Library/Frameworks/R.framework/Versions/3.1/Resources/library/Rbowtie/bowtie 
 -y -S -k 10 -m 10 -v 2 -r -p 4 --best --strata doit/refsIndex/index 
 doit/SpliceMapTemp_876c378e20ac/25mers.map 
 doit/SpliceMapTemp_876c378e20ac/25mers.map_unsorted
 Reading symbols for shared libraries ++. done
 
 Program received signal EXC_BAD_ACCESS, Could not access memory.
 Reason: KERN_INVALID_ADDRESS at address: 0x23d0d92d
 [Switching to process 36144 thread 0x20f]
 0x000478b1 in Ebwtseqan::Stringseqan::SimpleTypeunsigned char, 
 seqan::_Dna, seqan::Allocvoid  ::rowL (this=0xbfffda10, l=@0xa300e14) at 
 ebwt.h:1816
 1816return unpack_2b_from_8b(l.side(this-_ebwt)[l._by], l._bp);
 (gdb) l
 1811inline int EbwtTStr::rowL(const SideLocus l) const {
 1812// Extract and return appropriate bit-pair
 1813#ifdef SIXTY4_FORMAT
 1814return (((uint64_t*)l.side(this-_ebwt))[l._by  3]  l._by 
  7)  2) + l._bp)  1))  3;
 1815#else
 1816return unpack_2b_from_8b(l.side(this-_ebwt)[l._by], l._bp);
 1817#endif
 1818}
 1819
 1820/**
 (gdb) p this -_ebwt
 $1 = (uint8_t *) 0x4804a00 \b2
 (gdb) p *this -_ebwt
 $2 = 8 '\b'
 (gdb) p l._by
 $3 = 45
 (gdb) p l.side 
 $4 = SideLocus::side(unsigned char const*) const
 (gdb) p l.side(this-_ebwt)
 $5 = (uint8_t *) 0x23d0d900 Address 0x23d0d900 out of bounds
 (gdb) p l.side(this-_ebwt)[l._by]
 Cannot access memory at address 0x23d0d92d
 (gdb) p this -_ebwt
 $6 = (uint8_t *) 0x4804a00 \b2
 (gdb) 
 
 Running

[Bioc-devel] differences between petty and perceval (OS X 10.6.8 build machines for release/devel)

2014-06-13 Thread Michael Stadler
Hi Dan,

I'm cc'ing the list; maybe somebody else has experienced differences
between petty and perceval.

Rbowtie release (1.4.5) is not building under OS X 10.6.8 (petty).

Rbowtie release (1.4.5) and development (1.5.5) are virtually identical
(only DESCRIPTION and NEWS differ).

The development version builds without problems on perceval, but the
release version fails on petty:
http://bioconductor.org/checkResults/devel/bioc-LATEST/Rbowtie/perceval-buildsrc.html
http://bioconductor.org/checkResults/release/bioc-LATEST/Rbowtie/petty-buildsrc.html

The only difference I can make out from the node info pages is that
perceval has an additional section on C++11 compiler that is lacking
from petty's NodeInfo page.

Unfortunately, I cannot reproduce the issue, both Rbowtie 1.4.5 and
1.5.5 build successfully under OS X 10.6.8 and 10.7.5 using llvm-gcc-4.2.

Do you have any idea what else could be different between petty and
perceval?

Thank you,
Michael

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] differences between petty and perceval (OS X 10.6.8 build machines for release/devel)

2014-06-13 Thread Michael Stadler
When BioC 2.14 was released, it did work on OS X 10.6.8, but not on 10.9
and neither on some flavours of 32-bit Linux.

The bowtie developers have released version 1.0.1 which addresses some
of these issues. For the remaining one that still made it fail on OS X
10.9, I have been in contact with them and they provided a patch (see
http://sourceforge.net/p/bowtie-bio/bugs/312/) which seems to fix it on
the OS X machines that I have access to, but apparently not for petty.

Going back to the version at release is not an option, since I rate
support for OS X 10.9 more important, especially in the long run.

Michael


On 13.06.2014 15:19, Kasper Daniel Hansen wrote:
 That is pretty weird.  Since this is a segmentation fault in Bowtie,
 perhaps there are resource demands which are only satisfied on perceval
 (tmp space, #cores).
 
 Presumably this worked when Bioc was released, so what have you changed
 in between?
 
 Best,
 Kasper
 
 
 On Fri, Jun 13, 2014 at 3:32 AM, Michael Stadler michael.stad...@fmi.ch
 mailto:michael.stad...@fmi.ch wrote:
 
 Hi Dan,
 
 I'm cc'ing the list; maybe somebody else has experienced differences
 between petty and perceval.
 
 Rbowtie release (1.4.5) is not building under OS X 10.6.8 (petty).
 
 Rbowtie release (1.4.5) and development (1.5.5) are virtually identical
 (only DESCRIPTION and NEWS differ).
 
 The development version builds without problems on perceval, but the
 release version fails on petty:
 
 http://bioconductor.org/checkResults/devel/bioc-LATEST/Rbowtie/perceval-buildsrc.html
 
 http://bioconductor.org/checkResults/release/bioc-LATEST/Rbowtie/petty-buildsrc.html
 
 The only difference I can make out from the node info pages is that
 perceval has an additional section on C++11 compiler that is lacking
 from petty's NodeInfo page.
 
 Unfortunately, I cannot reproduce the issue, both Rbowtie 1.4.5 and
 1.5.5 build successfully under OS X 10.6.8 and 10.7.5 using
 llvm-gcc-4.2.
 
 Do you have any idea what else could be different between petty and
 perceval?
 
 Thank you,
 Michael
 
 ___
 Bioc-devel@r-project.org mailto:Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
 

-- 

Michael Stadler, PhD
Head of Computational Biology
Friedrich Miescher Institute
Basel (Switzerland)
Phone : +41 61 697 6492
Fax   : +41 61 697 3976
Mail  : michael.stad...@fmi.ch

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] [[- method dispatch and GenomicRanges

2014-04-10 Thread Michael Stadler
Hi Herve,

I reinstalled AnnotationDbi 1.25.17, and now things work flawlessly.

Thank you for your help!

Michael


On 09.04.2014 19:25, Hervé Pagès wrote:
 Hi Michael,
 
 Because of a recent change to IRanges, some information about the
 as.list method table is now out-of-sync in your installed
 AnnotationDbi (granted it was installed before the new IRanges
 propagated to you). Re-installing AnnotationDbi should clear this.
 
 I bumped AnnotationDbi version yesterday night (even though
 AnnotationDbi has not changed) to help clear this up for other
 users. The new version won't become available before tomorrow though.
 But, again, this new version (1.25.18) is similar to the current
 version (1.25.17), so you can re-install now with
 biocLite(AnnotationDbi). No need to wait until tomorrow.
 
 HTH,
 H.
 
 On 04/09/2014 12:23 AM, Michael Stadler wrote:
 Dear Bioc gurus,

 Since a few days, QuasR devel (1.3.13 and 1.3.14) is failing on some
 unit tests, which turns out to be reproducible independent of QuasR and
 might be related to R getting the wrong method for [[-.

 The following works fine...
 library(GenomicRanges)
 grl - GRangesList(r1=GRanges(chr1,IRanges(1,2)),
 r2=GRanges(chr2,IRanges(3,4)))
 grl[[1]] - GRanges(chr1,IRanges(10,11))


 ...but this produces an error:
 library(GenomicRanges)
 library(GenomicFeatures)
 grl - GRangesList(r1=GRanges(chr1,IRanges(1,2)),
 r2=GRanges(chr2,IRanges(3,4)))
 grl[[1]] - GRanges(chr1,IRanges(10,11))
 #Error in as.list(x, use.names = FALSE) :
 #  could not find function .as.list.CompressedList

 My guess is that loading the GenomicFeatures library in addition to
 GenomicRanges influences the methods dispatch, but with my limited
 understanding of R and S4 classes I was not able to narrow it further
 down.

 I noticed that I am using a slightly older version of R (r65206)
 compared to the build system (r65358), however I also see the error with
 the most recent release candidate (r65385).

 Michael


 My session info:
 R version 3.1.0 alpha (2014-03-17 r65206)
 Platform: x86_64-unknown-linux-gnu (64-bit)

 locale:
 [1] C

 attached base packages:
 [1] parallel  stats graphics  grDevices utils datasets  methods
 [8] base

 other attached packages:
 [1] GenomicFeatures_1.15.15 AnnotationDbi_1.25.17   Biobase_2.23.6
 [4] GenomicRanges_1.15.44   GenomeInfoDb_0.99.30IRanges_1.21.43
 [7] BiocGenerics_0.9.3  RColorBrewer_1.0-5

 loaded via a namespace (and not attached):
   [1] BBmisc_1.5BSgenome_1.31.13
   [3] BatchJobs_1.2 BiocParallel_0.5.19
   [5] Biostrings_2.31.21DBI_0.2-7
   [7] GenomicAlignments_0.99.37 RCurl_1.95-4.1
   [9] RSQLite_0.11.4Rcpp_0.11.1
 [11] Rsamtools_1.15.41 XML_3.98-1.1
 [13] XVector_0.3.7 biomaRt_2.19.3
 [15] bitops_1.0-6  brew_1.0-6
 [17] codetools_0.2-8   digest_0.6.4
 [19] fail_1.2  foreach_1.4.1
 [21] iterators_1.0.6   plyr_1.8.1
 [23] rtracklayer_1.23.22   sendmailR_1.1-2
 [25] stats4_3.1.0  stringr_0.6.2
 [27] tools_3.1.0   zlibbioc_1.9.0

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel

 

-- 

Michael Stadler, PhD
Head of Computational Biology
Friedrich Miescher Institute
Basel (Switzerland)
Phone : +41 61 697 6492
Fax   : +41 61 697 3976
Mail  : michael.stad...@fmi.ch

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] [[- method dispatch and GenomicRanges

2014-04-09 Thread Michael Stadler
Dear Bioc gurus,

Since a few days, QuasR devel (1.3.13 and 1.3.14) is failing on some
unit tests, which turns out to be reproducible independent of QuasR and
might be related to R getting the wrong method for [[-.

The following works fine...
library(GenomicRanges)
grl - GRangesList(r1=GRanges(chr1,IRanges(1,2)),
   r2=GRanges(chr2,IRanges(3,4)))
grl[[1]] - GRanges(chr1,IRanges(10,11))


...but this produces an error:
library(GenomicRanges)
library(GenomicFeatures)
grl - GRangesList(r1=GRanges(chr1,IRanges(1,2)),
   r2=GRanges(chr2,IRanges(3,4)))
grl[[1]] - GRanges(chr1,IRanges(10,11))
#Error in as.list(x, use.names = FALSE) :
#  could not find function .as.list.CompressedList

My guess is that loading the GenomicFeatures library in addition to
GenomicRanges influences the methods dispatch, but with my limited
understanding of R and S4 classes I was not able to narrow it further down.

I noticed that I am using a slightly older version of R (r65206)
compared to the build system (r65358), however I also see the error with
the most recent release candidate (r65385).

Michael


My session info:
R version 3.1.0 alpha (2014-03-17 r65206)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] GenomicFeatures_1.15.15 AnnotationDbi_1.25.17   Biobase_2.23.6
[4] GenomicRanges_1.15.44   GenomeInfoDb_0.99.30IRanges_1.21.43
[7] BiocGenerics_0.9.3  RColorBrewer_1.0-5

loaded via a namespace (and not attached):
 [1] BBmisc_1.5BSgenome_1.31.13
 [3] BatchJobs_1.2 BiocParallel_0.5.19
 [5] Biostrings_2.31.21DBI_0.2-7
 [7] GenomicAlignments_0.99.37 RCurl_1.95-4.1
 [9] RSQLite_0.11.4Rcpp_0.11.1
[11] Rsamtools_1.15.41 XML_3.98-1.1
[13] XVector_0.3.7 biomaRt_2.19.3
[15] bitops_1.0-6  brew_1.0-6
[17] codetools_0.2-8   digest_0.6.4
[19] fail_1.2  foreach_1.4.1
[21] iterators_1.0.6   plyr_1.8.1
[23] rtracklayer_1.23.22   sendmailR_1.1-2
[25] stats4_3.1.0  stringr_0.6.2
[27] tools_3.1.0   zlibbioc_1.9.0

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] [BioC] plot methylation figure for end analysis

2013-09-16 Thread Michael Stadler
I agree that the profile or metagene plot is useful and widely applied.

Creating such a plot consists of several steps, including:
- quantifying and accumulating signal around sites of interest
- normalization
- visualization

The qProfile() function in QuasR was designed to do the
quantification/accumulation, which is not straightforward to implement
efficiently and could use a lot of memory.

This leaves it to the user to choose an appropriate approach for the
normalization (if at all necessary) and gives full flexibility for plotting.

qProfile returns a list of vectors, so normalization and plotting are
not very complex (as shown in the example mentioned by Tim, in the QuasR
vignette in section 6.1.6 Create a genomic prole for a set of regions
using qProfile).

I would be interested to know your opinions - are these still too
complex, so that a plot method would be of high value?

Michael



On 13.09.2013 07:21, bach le wrote:
 Many thanks. I've checked BSmooth but it seems not to be what I mentioned.
 As Tim said, this kind of figures are used quite often to illustrate the
 difference between groups of objects. Yet till now it seems that there has
 not been any offer from BioC (afa I may know). It would be great if someone
 can help to implement it.
 Cheers,
 
 
 On Fri, Sep 13, 2013 at 3:56 AM, Tim Triche, Jr. tim.tri...@gmail.comwrote:
 
 Bsmooth/bsseq (and QuasR, as well) are great for manipulating and analyzing
 BSseq data, but there's something in this email that I've never really
 found a great solution for, and that is so-called profile or metagene
 plots (the latter is a confusing bit of terminology, given that it is
 re-used for linear combinations of gene effects in expression studies).

 Suppose you wanted to look at the distribution of H3K4me3 or H3K27me3
 ChIP-seq density relative to the start of (or exon/intron junctions within,
 or enhancers whose activity was linked to, ...) transcribed vs. silent
 genes across a genome.  Suppose you also wanted to look at methylation
 proportions relative to the same positions (again, the idea here is to take
 a bunch of landmarks and plot a summary statistic describing the behavior
 of some measurement relative to those landmarks, across the genome).  An
 example of such a plot is on page 23 of the QuasR user's manual.  For
 clarity, here's a link:

 https://dl.dropboxusercontent.com/u/12962689/profile.png

 This seems like something that is so incredibly common it ought to be a
 generic function within BioC somewhere, but I've never found a really easy
 way to do it.  If I want to plot a density matrix mat[x,y] for alhl x and
 all y, I can call image(mat) or persp(mat) and that's that.  Is there
 something like this within biocGenerics of which I've remained completely
 ignorant, or... ?

 If it isn't a generic somewhere, I think I may have to implement one.




 On Thu, Sep 12, 2013 at 1:32 AM, Alex Gutteridge al...@ruggedtextile.com
 wrote:

 On 12.09.2013 08:25, Gia [guest] wrote:

 Hi,
 I would like to ask if anyone experienced with creating figures from
 BS-Seq methylation data for end analysis of genes (or any genomic
 features), like what was presented in the following link:
 http://www.nature.com/ng/**journal/v39/n1/fig_tab/ng1929_**F4.html
 http://www.nature.com/ng/journal/v39/n1/fig_tab/ng1929_F4.html
 It would be appreciate if you can show how to create this kind of
 figures using any Bioconductor packages.
 Cheers,

  -- output of sessionInfo():

 None

 --
 Sent via the guest posting facility at bioconductor.org.

 __**_
 Bioconductor mailing list
 bioconduc...@r-project.org
 https://stat.ethz.ch/mailman/**listinfo/bioconductor
 https://stat.ethz.ch/mailman/listinfo/bioconductor
 Search the archives:
 http://news.gmane.org/gmane.**science.biology.informatics.**conductor
 http://news.gmane.org/gmane.science.biology.informatics.conductor


 I've only skimmed through that paper but that figure isn't showing BS-Seq
 data as far as I can see - it's showing the results of an array / IP
 based
 approach to methylation quantitation. BSmooth is one of the best packages
 I've found for manipulating BS-Seq data. If you get your data in there
 you
 could derive analogous statistics (i.e. methylation levels) pretty
 easily.
 Then any of the base plotting functions will give you similar graphs.

 --
 Alex Gutteridge


 __**_
 Bioconductor mailing list
 bioconduc...@r-project.org
 https://stat.ethz.ch/mailman/**listinfo/bioconductor
 https://stat.ethz.ch/mailman/listinfo/bioconductor
 Search the archives: http://news.gmane.org/gmane.**
 science.biology.informatics.**conductor
 http://news.gmane.org/gmane.science.biology.informatics.conductor




 --
 *He that would live in peace and at ease, *
 *Must not speak all he knows, nor judge all he sees.*
 *
 *
 Benjamin Franklin, Poor Richard's
 Almanackhttp://archive.org/details/poorrichardsalma00franrich

 

Re: [Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation

2013-03-12 Thread Michael Stadler
Hi Valerie,

Thank you for the quick fix - it works like a charm!

Michael

On 03/10/2013 02:45 AM, Valerie Obenchain wrote:
 Hi Michael,
 
 Thanks for reporting the bug and sending a reproducible example. Sorry
 it took a few days to get to this.
 
 A fix has been checked into versions 1.4.12 (release) and 1.5.44 (devel).
 
 Valerie
 
 
 On 03/07/13 05:43, Michael Stadler wrote:
 Here is the second attachment (variants.vcf) that was missing from the
 first message.

 Michael

 On 03/07/2013 02:40 PM, Michael Stadler wrote:
 Dear Valerie and colleagues,

 I have recently started using the VariantAnnotation package, which is a
 huge asset for much of my work with sequence variants - thank you!

 I have a question regarding the following example (it makes use of two
 small text files, annot.gtf and variants.vcf, attached to this
 message):

 library(VariantAnnotation)
 library(GenomicFeatures)
 library(BSgenome.Celegans.UCSC.ce6)

 # build a TranscriptDb from annot.gtf
 chrominfo - data.frame(chrom=as.character(seqnames(Celegans)),
  length=seqlengths(Celegans),
  is_circular=FALSE)
 txdb - makeTranscriptDbFromGFF(file=annot.gtf,
  format=gtf,
  chrominfo=chrominfo,
  dataSource=WormBase v. WS190,
  species=Caenorhabditis elegans)
 txdb
 #TranscriptDb object:
 #| Db type: TranscriptDb
 #| Supporting package: GenomicFeatures
 #| Data source: WormBase v. WS190
 #| Organism: Caenorhabditis elegans
 #| miRBase build ID: NA
 #| transcript_nrow: 7
 #| exon_nrow: 42
 #| cds_nrow: 42
 #| Db created by: GenomicFeatures package from Bioconductor
 #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013)
 #| GenomicFeatures version at creation time: 1.11.14
 #| RSQLite version at creation time: 0.11.2
 #| DBSCHEMAVERSION: 1.0

 # read in sequence variants
 vcf - readVcf(variants.vcf, genome=ce6)
 dim(vcf)
 #[1] 3 1

 # predict the impact on coding sequences
 aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans)
 length(aa)
 #[1] 7


 # My question is related to the impact of variant chrV:15822727:
 # This SNV is overlapping two transcripts (same base, plus strand),
 # yet the reported VARCODON values are different (GAT - TAT/AAT):
 aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]


 #GRanges with 2 ranges and 5 metadata columns:
 #seqnames   ranges strand |
 #   RleIRanges  Rle |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #   REFALT  varAllele
 #DNAStringSet DNAStringSetList DNAStringSet
 #  chrV:15822727  G  A  A
 #  chrV:15822727  G  A  A
 #  REFCODON   VARCODON
 #DNAStringSet DNAStringSet
 #  chrV:15822727GATTAT
 #  chrV:15822727GATAAT
 #  ---
 #  seqlengths:
 #   chrI chrV
 # NA   NA


 # interestingly, when altering the annotation or the set of variants,
 # this contradictions disapears (GAT - AAT):
 vcf2 - vcf[-1] # remove the first SNV
 aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans)
 length(aa2)
 #[1] 5
 aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]


 #GRanges with 2 ranges and 5 metadata columns:
 #seqnames   ranges strand |
 #   RleIRanges  Rle |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #   REFALT  varAllele
 #DNAStringSet DNAStringSetList DNAStringSet
 #  chrV:15822727  G  A  A
 #  chrV:15822727  G  A  A
 #  REFCODON   VARCODON
 #DNAStringSet DNAStringSet
 #  chrV:15822727GATAAT
 #  chrV:15822727GATAAT
 #  ---
 #  seqlengths:
 #   chrI chrV
 # NA   NA

 Do you know why this could be the case?
 Regards,
 Michael


 Here is my envirnoment:
 sessionInfo()
 R Under development (unstable) (2013-02-25 r62061)
 Platform: x86_64-unknown-linux-gnu (64-bit)

 locale:
 [1] C

 attached base packages:
 [1] parallel  stats graphics  grDevices utils datasets
 [7] methods   base

 other attached packages:
   [1] BSgenome.Celegans.UCSC.ce6_1.3.19
   [2] BSgenome_1.27.1
   [3] GenomicFeatures_1.11.14
   [4] AnnotationDbi_1.21.12
   [5] Biobase_2.19.3
   [6] VariantAnnotation_1.5.42
   [7] Rsamtools_1.11.21
   [8] Biostrings_2.27.11
   [9] GenomicRanges_1.11.35
 [10] IRanges_1.17.35
 [11] BiocGenerics_0.5.6
 [12] RColorBrewer_1.0-5

 loaded via a namespace

[Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation

2013-03-07 Thread Michael Stadler
Dear Valerie and colleagues,

I have recently started using the VariantAnnotation package, which is a
huge asset for much of my work with sequence variants - thank you!

I have a question regarding the following example (it makes use of two
small text files, annot.gtf and variants.vcf, attached to this message):

library(VariantAnnotation)
library(GenomicFeatures)
library(BSgenome.Celegans.UCSC.ce6)

# build a TranscriptDb from annot.gtf
chrominfo - data.frame(chrom=as.character(seqnames(Celegans)),
length=seqlengths(Celegans),
is_circular=FALSE)
txdb - makeTranscriptDbFromGFF(file=annot.gtf,
format=gtf,
chrominfo=chrominfo,
dataSource=WormBase v. WS190,
species=Caenorhabditis elegans)
txdb
#TranscriptDb object:
#| Db type: TranscriptDb
#| Supporting package: GenomicFeatures
#| Data source: WormBase v. WS190
#| Organism: Caenorhabditis elegans
#| miRBase build ID: NA
#| transcript_nrow: 7
#| exon_nrow: 42
#| cds_nrow: 42
#| Db created by: GenomicFeatures package from Bioconductor
#| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013)
#| GenomicFeatures version at creation time: 1.11.14
#| RSQLite version at creation time: 0.11.2
#| DBSCHEMAVERSION: 1.0

# read in sequence variants
vcf - readVcf(variants.vcf, genome=ce6)
dim(vcf)
#[1] 3 1

# predict the impact on coding sequences
aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans)
length(aa)
#[1] 7


# My question is related to the impact of variant chrV:15822727:
# This SNV is overlapping two transcripts (same base, plus strand),
# yet the reported VARCODON values are different (GAT - TAT/AAT):
aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]

#GRanges with 2 ranges and 5 metadata columns:
#seqnames   ranges strand |
#   RleIRanges  Rle |
#  chrV:15822727 chrV [15822727, 15822727]  + |
#  chrV:15822727 chrV [15822727, 15822727]  + |
#   REFALT  varAllele
#DNAStringSet DNAStringSetList DNAStringSet
#  chrV:15822727  G  A  A
#  chrV:15822727  G  A  A
#  REFCODON   VARCODON
#DNAStringSet DNAStringSet
#  chrV:15822727GATTAT
#  chrV:15822727GATAAT
#  ---
#  seqlengths:
#   chrI chrV
# NA   NA


# interestingly, when altering the annotation or the set of variants,
# this contradictions disapears (GAT - AAT):
vcf2 - vcf[-1] # remove the first SNV
aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans)
length(aa2)
#[1] 5
aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]

#GRanges with 2 ranges and 5 metadata columns:
#seqnames   ranges strand |
#   RleIRanges  Rle |
#  chrV:15822727 chrV [15822727, 15822727]  + |
#  chrV:15822727 chrV [15822727, 15822727]  + |
#   REFALT  varAllele
#DNAStringSet DNAStringSetList DNAStringSet
#  chrV:15822727  G  A  A
#  chrV:15822727  G  A  A
#  REFCODON   VARCODON
#DNAStringSet DNAStringSet
#  chrV:15822727GATAAT
#  chrV:15822727GATAAT
#  ---
#  seqlengths:
#   chrI chrV
# NA   NA

Do you know why this could be the case?
Regards,
Michael


Here is my envirnoment:
sessionInfo()
R Under development (unstable) (2013-02-25 r62061)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets
[7] methods   base

other attached packages:
 [1] BSgenome.Celegans.UCSC.ce6_1.3.19
 [2] BSgenome_1.27.1
 [3] GenomicFeatures_1.11.14
 [4] AnnotationDbi_1.21.12
 [5] Biobase_2.19.3
 [6] VariantAnnotation_1.5.42
 [7] Rsamtools_1.11.21
 [8] Biostrings_2.27.11
 [9] GenomicRanges_1.11.35
[10] IRanges_1.17.35
[11] BiocGenerics_0.5.6
[12] RColorBrewer_1.0-5

loaded via a namespace (and not attached):
 [1] DBI_0.2-5  RCurl_1.95-4.1 RSQLite_0.11.2
 [4] XML_3.95-0.1   biomaRt_2.15.0 bitops_1.0-5
 [7] rtracklayer_1.19.9 stats4_3.0.0   tools_3.0.0
[10] zlibbioc_1.5.0

chrVCoding_transcript   exon6512065 6512138 .   -   2   
gene_id WBGene1073; transcript_id F46E10.9.1;
chrVCoding_transcript   exon6512189 6512279 .   -   0   
gene_id WBGene1073; transcript_id F46E10.9.1;
chrVCoding_transcript   exon6512325 6512472 .   -   1   
gene_id WBGene1073; transcript_id F46E10.9.1;
chrVCoding_transcript   exon6512521 6512649 .

Re: [Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation

2013-03-07 Thread Michael Stadler
Here is the second attachment (variants.vcf) that was missing from the
first message.

Michael

On 03/07/2013 02:40 PM, Michael Stadler wrote:
 Dear Valerie and colleagues,
 
 I have recently started using the VariantAnnotation package, which is a
 huge asset for much of my work with sequence variants - thank you!
 
 I have a question regarding the following example (it makes use of two
 small text files, annot.gtf and variants.vcf, attached to this message):
 
 library(VariantAnnotation)
 library(GenomicFeatures)
 library(BSgenome.Celegans.UCSC.ce6)
 
 # build a TranscriptDb from annot.gtf
 chrominfo - data.frame(chrom=as.character(seqnames(Celegans)),
 length=seqlengths(Celegans),
 is_circular=FALSE)
 txdb - makeTranscriptDbFromGFF(file=annot.gtf,
 format=gtf,
 chrominfo=chrominfo,
 dataSource=WormBase v. WS190,
 species=Caenorhabditis elegans)
 txdb
 #TranscriptDb object:
 #| Db type: TranscriptDb
 #| Supporting package: GenomicFeatures
 #| Data source: WormBase v. WS190
 #| Organism: Caenorhabditis elegans
 #| miRBase build ID: NA
 #| transcript_nrow: 7
 #| exon_nrow: 42
 #| cds_nrow: 42
 #| Db created by: GenomicFeatures package from Bioconductor
 #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013)
 #| GenomicFeatures version at creation time: 1.11.14
 #| RSQLite version at creation time: 0.11.2
 #| DBSCHEMAVERSION: 1.0
 
 # read in sequence variants
 vcf - readVcf(variants.vcf, genome=ce6)
 dim(vcf)
 #[1] 3 1
 
 # predict the impact on coding sequences
 aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans)
 length(aa)
 #[1] 7
 
 
 # My question is related to the impact of variant chrV:15822727:
 # This SNV is overlapping two transcripts (same base, plus strand),
 # yet the reported VARCODON values are different (GAT - TAT/AAT):
 aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]
 
 #GRanges with 2 ranges and 5 metadata columns:
 #seqnames   ranges strand |
 #   RleIRanges  Rle |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #   REFALT  varAllele
 #DNAStringSet DNAStringSetList DNAStringSet
 #  chrV:15822727  G  A  A
 #  chrV:15822727  G  A  A
 #  REFCODON   VARCODON
 #DNAStringSet DNAStringSet
 #  chrV:15822727GATTAT
 #  chrV:15822727GATAAT
 #  ---
 #  seqlengths:
 #   chrI chrV
 # NA   NA
 
 
 # interestingly, when altering the annotation or the set of variants,
 # this contradictions disapears (GAT - AAT):
 vcf2 - vcf[-1] # remove the first SNV
 aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans)
 length(aa2)
 #[1] 5
 aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]
 
 #GRanges with 2 ranges and 5 metadata columns:
 #seqnames   ranges strand |
 #   RleIRanges  Rle |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #   REFALT  varAllele
 #DNAStringSet DNAStringSetList DNAStringSet
 #  chrV:15822727  G  A  A
 #  chrV:15822727  G  A  A
 #  REFCODON   VARCODON
 #DNAStringSet DNAStringSet
 #  chrV:15822727GATAAT
 #  chrV:15822727GATAAT
 #  ---
 #  seqlengths:
 #   chrI chrV
 # NA   NA
 
 Do you know why this could be the case?
 Regards,
 Michael
 
 
 Here is my envirnoment:
 sessionInfo()
 R Under development (unstable) (2013-02-25 r62061)
 Platform: x86_64-unknown-linux-gnu (64-bit)
 
 locale:
 [1] C
 
 attached base packages:
 [1] parallel  stats graphics  grDevices utils datasets
 [7] methods   base
 
 other attached packages:
  [1] BSgenome.Celegans.UCSC.ce6_1.3.19
  [2] BSgenome_1.27.1
  [3] GenomicFeatures_1.11.14
  [4] AnnotationDbi_1.21.12
  [5] Biobase_2.19.3
  [6] VariantAnnotation_1.5.42
  [7] Rsamtools_1.11.21
  [8] Biostrings_2.27.11
  [9] GenomicRanges_1.11.35
 [10] IRanges_1.17.35
 [11] BiocGenerics_0.5.6
 [12] RColorBrewer_1.0-5
 
 loaded via a namespace (and not attached):
  [1] DBI_0.2-5  RCurl_1.95-4.1 RSQLite_0.11.2
  [4] XML_3.95-0.1   biomaRt_2.15.0 bitops_1.0-5
  [7] rtracklayer_1.19.9 stats4_3.0.0   tools_3.0.0
 [10] zlibbioc_1.5.0
 
 
 
 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel