Re: [Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation

2013-03-12 Thread Michael Stadler
Hi Valerie,

Thank you for the quick fix - it works like a charm!

Michael

On 03/10/2013 02:45 AM, Valerie Obenchain wrote:
 Hi Michael,
 
 Thanks for reporting the bug and sending a reproducible example. Sorry
 it took a few days to get to this.
 
 A fix has been checked into versions 1.4.12 (release) and 1.5.44 (devel).
 
 Valerie
 
 
 On 03/07/13 05:43, Michael Stadler wrote:
 Here is the second attachment (variants.vcf) that was missing from the
 first message.

 Michael

 On 03/07/2013 02:40 PM, Michael Stadler wrote:
 Dear Valerie and colleagues,

 I have recently started using the VariantAnnotation package, which is a
 huge asset for much of my work with sequence variants - thank you!

 I have a question regarding the following example (it makes use of two
 small text files, annot.gtf and variants.vcf, attached to this
 message):

 library(VariantAnnotation)
 library(GenomicFeatures)
 library(BSgenome.Celegans.UCSC.ce6)

 # build a TranscriptDb from annot.gtf
 chrominfo - data.frame(chrom=as.character(seqnames(Celegans)),
  length=seqlengths(Celegans),
  is_circular=FALSE)
 txdb - makeTranscriptDbFromGFF(file=annot.gtf,
  format=gtf,
  chrominfo=chrominfo,
  dataSource=WormBase v. WS190,
  species=Caenorhabditis elegans)
 txdb
 #TranscriptDb object:
 #| Db type: TranscriptDb
 #| Supporting package: GenomicFeatures
 #| Data source: WormBase v. WS190
 #| Organism: Caenorhabditis elegans
 #| miRBase build ID: NA
 #| transcript_nrow: 7
 #| exon_nrow: 42
 #| cds_nrow: 42
 #| Db created by: GenomicFeatures package from Bioconductor
 #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013)
 #| GenomicFeatures version at creation time: 1.11.14
 #| RSQLite version at creation time: 0.11.2
 #| DBSCHEMAVERSION: 1.0

 # read in sequence variants
 vcf - readVcf(variants.vcf, genome=ce6)
 dim(vcf)
 #[1] 3 1

 # predict the impact on coding sequences
 aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans)
 length(aa)
 #[1] 7


 # My question is related to the impact of variant chrV:15822727:
 # This SNV is overlapping two transcripts (same base, plus strand),
 # yet the reported VARCODON values are different (GAT - TAT/AAT):
 aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]


 #GRanges with 2 ranges and 5 metadata columns:
 #seqnames   ranges strand |
 #   RleIRanges  Rle |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #   REFALT  varAllele
 #DNAStringSet DNAStringSetList DNAStringSet
 #  chrV:15822727  G  A  A
 #  chrV:15822727  G  A  A
 #  REFCODON   VARCODON
 #DNAStringSet DNAStringSet
 #  chrV:15822727GATTAT
 #  chrV:15822727GATAAT
 #  ---
 #  seqlengths:
 #   chrI chrV
 # NA   NA


 # interestingly, when altering the annotation or the set of variants,
 # this contradictions disapears (GAT - AAT):
 vcf2 - vcf[-1] # remove the first SNV
 aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans)
 length(aa2)
 #[1] 5
 aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]


 #GRanges with 2 ranges and 5 metadata columns:
 #seqnames   ranges strand |
 #   RleIRanges  Rle |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #   REFALT  varAllele
 #DNAStringSet DNAStringSetList DNAStringSet
 #  chrV:15822727  G  A  A
 #  chrV:15822727  G  A  A
 #  REFCODON   VARCODON
 #DNAStringSet DNAStringSet
 #  chrV:15822727GATAAT
 #  chrV:15822727GATAAT
 #  ---
 #  seqlengths:
 #   chrI chrV
 # NA   NA

 Do you know why this could be the case?
 Regards,
 Michael


 Here is my envirnoment:
 sessionInfo()
 R Under development (unstable) (2013-02-25 r62061)
 Platform: x86_64-unknown-linux-gnu (64-bit)

 locale:
 [1] C

 attached base packages:
 [1] parallel  stats graphics  grDevices utils datasets
 [7] methods   base

 other attached packages:
   [1] BSgenome.Celegans.UCSC.ce6_1.3.19
   [2] BSgenome_1.27.1
   [3] GenomicFeatures_1.11.14
   [4] AnnotationDbi_1.21.12
   [5] Biobase_2.19.3
   [6] VariantAnnotation_1.5.42
   [7] Rsamtools_1.11.21
   [8] Biostrings_2.27.11
   [9] GenomicRanges_1.11.35
 [10] IRanges_1.17.35
 [11] BiocGenerics_0.5.6
 [12] RColorBrewer_1.0-5

 loaded via a namespace (and 

Re: [Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation

2013-03-09 Thread Valerie Obenchain

Hi Michael,

Thanks for reporting the bug and sending a reproducible example. Sorry 
it took a few days to get to this.


A fix has been checked into versions 1.4.12 (release) and 1.5.44 (devel).

Valerie


On 03/07/13 05:43, Michael Stadler wrote:

Here is the second attachment (variants.vcf) that was missing from the
first message.

Michael

On 03/07/2013 02:40 PM, Michael Stadler wrote:

Dear Valerie and colleagues,

I have recently started using the VariantAnnotation package, which is a
huge asset for much of my work with sequence variants - thank you!

I have a question regarding the following example (it makes use of two
small text files, annot.gtf and variants.vcf, attached to this message):

library(VariantAnnotation)
library(GenomicFeatures)
library(BSgenome.Celegans.UCSC.ce6)

# build a TranscriptDb from annot.gtf
chrominfo - data.frame(chrom=as.character(seqnames(Celegans)),
 length=seqlengths(Celegans),
 is_circular=FALSE)
txdb - makeTranscriptDbFromGFF(file=annot.gtf,
 format=gtf,
 chrominfo=chrominfo,
 dataSource=WormBase v. WS190,
 species=Caenorhabditis elegans)
txdb
#TranscriptDb object:
#| Db type: TranscriptDb
#| Supporting package: GenomicFeatures
#| Data source: WormBase v. WS190
#| Organism: Caenorhabditis elegans
#| miRBase build ID: NA
#| transcript_nrow: 7
#| exon_nrow: 42
#| cds_nrow: 42
#| Db created by: GenomicFeatures package from Bioconductor
#| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013)
#| GenomicFeatures version at creation time: 1.11.14
#| RSQLite version at creation time: 0.11.2
#| DBSCHEMAVERSION: 1.0

# read in sequence variants
vcf - readVcf(variants.vcf, genome=ce6)
dim(vcf)
#[1] 3 1

# predict the impact on coding sequences
aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans)
length(aa)
#[1] 7


# My question is related to the impact of variant chrV:15822727:
# This SNV is overlapping two transcripts (same base, plus strand),
# yet the reported VARCODON values are different (GAT - TAT/AAT):
aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]

#GRanges with 2 ranges and 5 metadata columns:
#seqnames   ranges strand |
#   RleIRanges  Rle |
#  chrV:15822727 chrV [15822727, 15822727]  + |
#  chrV:15822727 chrV [15822727, 15822727]  + |
#   REFALT  varAllele
#DNAStringSet DNAStringSetList DNAStringSet
#  chrV:15822727  G  A  A
#  chrV:15822727  G  A  A
#  REFCODON   VARCODON
#DNAStringSet DNAStringSet
#  chrV:15822727GATTAT
#  chrV:15822727GATAAT
#  ---
#  seqlengths:
#   chrI chrV
# NA   NA


# interestingly, when altering the annotation or the set of variants,
# this contradictions disapears (GAT - AAT):
vcf2 - vcf[-1] # remove the first SNV
aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans)
length(aa2)
#[1] 5
aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]

#GRanges with 2 ranges and 5 metadata columns:
#seqnames   ranges strand |
#   RleIRanges  Rle |
#  chrV:15822727 chrV [15822727, 15822727]  + |
#  chrV:15822727 chrV [15822727, 15822727]  + |
#   REFALT  varAllele
#DNAStringSet DNAStringSetList DNAStringSet
#  chrV:15822727  G  A  A
#  chrV:15822727  G  A  A
#  REFCODON   VARCODON
#DNAStringSet DNAStringSet
#  chrV:15822727GATAAT
#  chrV:15822727GATAAT
#  ---
#  seqlengths:
#   chrI chrV
# NA   NA

Do you know why this could be the case?
Regards,
Michael


Here is my envirnoment:
sessionInfo()
R Under development (unstable) (2013-02-25 r62061)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets
[7] methods   base

other attached packages:
  [1] BSgenome.Celegans.UCSC.ce6_1.3.19
  [2] BSgenome_1.27.1
  [3] GenomicFeatures_1.11.14
  [4] AnnotationDbi_1.21.12
  [5] Biobase_2.19.3
  [6] VariantAnnotation_1.5.42
  [7] Rsamtools_1.11.21
  [8] Biostrings_2.27.11
  [9] GenomicRanges_1.11.35
[10] IRanges_1.17.35
[11] BiocGenerics_0.5.6
[12] RColorBrewer_1.0-5

loaded via a namespace (and not attached):
  [1] DBI_0.2-5  RCurl_1.95-4.1 RSQLite_0.11.2
  [4] XML_3.95-0.1   biomaRt_2.15.0 bitops_1.0-5
  [7] rtracklayer_1.19.9 stats4_3.0.0   tools_3.0.0
[10] zlibbioc_1.5.0




[Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation

2013-03-07 Thread Michael Stadler
Dear Valerie and colleagues,

I have recently started using the VariantAnnotation package, which is a
huge asset for much of my work with sequence variants - thank you!

I have a question regarding the following example (it makes use of two
small text files, annot.gtf and variants.vcf, attached to this message):

library(VariantAnnotation)
library(GenomicFeatures)
library(BSgenome.Celegans.UCSC.ce6)

# build a TranscriptDb from annot.gtf
chrominfo - data.frame(chrom=as.character(seqnames(Celegans)),
length=seqlengths(Celegans),
is_circular=FALSE)
txdb - makeTranscriptDbFromGFF(file=annot.gtf,
format=gtf,
chrominfo=chrominfo,
dataSource=WormBase v. WS190,
species=Caenorhabditis elegans)
txdb
#TranscriptDb object:
#| Db type: TranscriptDb
#| Supporting package: GenomicFeatures
#| Data source: WormBase v. WS190
#| Organism: Caenorhabditis elegans
#| miRBase build ID: NA
#| transcript_nrow: 7
#| exon_nrow: 42
#| cds_nrow: 42
#| Db created by: GenomicFeatures package from Bioconductor
#| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013)
#| GenomicFeatures version at creation time: 1.11.14
#| RSQLite version at creation time: 0.11.2
#| DBSCHEMAVERSION: 1.0

# read in sequence variants
vcf - readVcf(variants.vcf, genome=ce6)
dim(vcf)
#[1] 3 1

# predict the impact on coding sequences
aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans)
length(aa)
#[1] 7


# My question is related to the impact of variant chrV:15822727:
# This SNV is overlapping two transcripts (same base, plus strand),
# yet the reported VARCODON values are different (GAT - TAT/AAT):
aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]

#GRanges with 2 ranges and 5 metadata columns:
#seqnames   ranges strand |
#   RleIRanges  Rle |
#  chrV:15822727 chrV [15822727, 15822727]  + |
#  chrV:15822727 chrV [15822727, 15822727]  + |
#   REFALT  varAllele
#DNAStringSet DNAStringSetList DNAStringSet
#  chrV:15822727  G  A  A
#  chrV:15822727  G  A  A
#  REFCODON   VARCODON
#DNAStringSet DNAStringSet
#  chrV:15822727GATTAT
#  chrV:15822727GATAAT
#  ---
#  seqlengths:
#   chrI chrV
# NA   NA


# interestingly, when altering the annotation or the set of variants,
# this contradictions disapears (GAT - AAT):
vcf2 - vcf[-1] # remove the first SNV
aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans)
length(aa2)
#[1] 5
aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]

#GRanges with 2 ranges and 5 metadata columns:
#seqnames   ranges strand |
#   RleIRanges  Rle |
#  chrV:15822727 chrV [15822727, 15822727]  + |
#  chrV:15822727 chrV [15822727, 15822727]  + |
#   REFALT  varAllele
#DNAStringSet DNAStringSetList DNAStringSet
#  chrV:15822727  G  A  A
#  chrV:15822727  G  A  A
#  REFCODON   VARCODON
#DNAStringSet DNAStringSet
#  chrV:15822727GATAAT
#  chrV:15822727GATAAT
#  ---
#  seqlengths:
#   chrI chrV
# NA   NA

Do you know why this could be the case?
Regards,
Michael


Here is my envirnoment:
sessionInfo()
R Under development (unstable) (2013-02-25 r62061)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets
[7] methods   base

other attached packages:
 [1] BSgenome.Celegans.UCSC.ce6_1.3.19
 [2] BSgenome_1.27.1
 [3] GenomicFeatures_1.11.14
 [4] AnnotationDbi_1.21.12
 [5] Biobase_2.19.3
 [6] VariantAnnotation_1.5.42
 [7] Rsamtools_1.11.21
 [8] Biostrings_2.27.11
 [9] GenomicRanges_1.11.35
[10] IRanges_1.17.35
[11] BiocGenerics_0.5.6
[12] RColorBrewer_1.0-5

loaded via a namespace (and not attached):
 [1] DBI_0.2-5  RCurl_1.95-4.1 RSQLite_0.11.2
 [4] XML_3.95-0.1   biomaRt_2.15.0 bitops_1.0-5
 [7] rtracklayer_1.19.9 stats4_3.0.0   tools_3.0.0
[10] zlibbioc_1.5.0

chrVCoding_transcript   exon6512065 6512138 .   -   2   
gene_id WBGene1073; transcript_id F46E10.9.1;
chrVCoding_transcript   exon6512189 6512279 .   -   0   
gene_id WBGene1073; transcript_id F46E10.9.1;
chrVCoding_transcript   exon6512325 6512472 .   -   1   
gene_id WBGene1073; transcript_id F46E10.9.1;
chrVCoding_transcript   exon6512521 6512649 .

Re: [Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation

2013-03-07 Thread Michael Stadler
Here is the second attachment (variants.vcf) that was missing from the
first message.

Michael

On 03/07/2013 02:40 PM, Michael Stadler wrote:
 Dear Valerie and colleagues,
 
 I have recently started using the VariantAnnotation package, which is a
 huge asset for much of my work with sequence variants - thank you!
 
 I have a question regarding the following example (it makes use of two
 small text files, annot.gtf and variants.vcf, attached to this message):
 
 library(VariantAnnotation)
 library(GenomicFeatures)
 library(BSgenome.Celegans.UCSC.ce6)
 
 # build a TranscriptDb from annot.gtf
 chrominfo - data.frame(chrom=as.character(seqnames(Celegans)),
 length=seqlengths(Celegans),
 is_circular=FALSE)
 txdb - makeTranscriptDbFromGFF(file=annot.gtf,
 format=gtf,
 chrominfo=chrominfo,
 dataSource=WormBase v. WS190,
 species=Caenorhabditis elegans)
 txdb
 #TranscriptDb object:
 #| Db type: TranscriptDb
 #| Supporting package: GenomicFeatures
 #| Data source: WormBase v. WS190
 #| Organism: Caenorhabditis elegans
 #| miRBase build ID: NA
 #| transcript_nrow: 7
 #| exon_nrow: 42
 #| cds_nrow: 42
 #| Db created by: GenomicFeatures package from Bioconductor
 #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013)
 #| GenomicFeatures version at creation time: 1.11.14
 #| RSQLite version at creation time: 0.11.2
 #| DBSCHEMAVERSION: 1.0
 
 # read in sequence variants
 vcf - readVcf(variants.vcf, genome=ce6)
 dim(vcf)
 #[1] 3 1
 
 # predict the impact on coding sequences
 aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans)
 length(aa)
 #[1] 7
 
 
 # My question is related to the impact of variant chrV:15822727:
 # This SNV is overlapping two transcripts (same base, plus strand),
 # yet the reported VARCODON values are different (GAT - TAT/AAT):
 aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]
 
 #GRanges with 2 ranges and 5 metadata columns:
 #seqnames   ranges strand |
 #   RleIRanges  Rle |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #   REFALT  varAllele
 #DNAStringSet DNAStringSetList DNAStringSet
 #  chrV:15822727  G  A  A
 #  chrV:15822727  G  A  A
 #  REFCODON   VARCODON
 #DNAStringSet DNAStringSet
 #  chrV:15822727GATTAT
 #  chrV:15822727GATAAT
 #  ---
 #  seqlengths:
 #   chrI chrV
 # NA   NA
 
 
 # interestingly, when altering the annotation or the set of variants,
 # this contradictions disapears (GAT - AAT):
 vcf2 - vcf[-1] # remove the first SNV
 aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans)
 length(aa2)
 #[1] 5
 aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)]
 
 #GRanges with 2 ranges and 5 metadata columns:
 #seqnames   ranges strand |
 #   RleIRanges  Rle |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #  chrV:15822727 chrV [15822727, 15822727]  + |
 #   REFALT  varAllele
 #DNAStringSet DNAStringSetList DNAStringSet
 #  chrV:15822727  G  A  A
 #  chrV:15822727  G  A  A
 #  REFCODON   VARCODON
 #DNAStringSet DNAStringSet
 #  chrV:15822727GATAAT
 #  chrV:15822727GATAAT
 #  ---
 #  seqlengths:
 #   chrI chrV
 # NA   NA
 
 Do you know why this could be the case?
 Regards,
 Michael
 
 
 Here is my envirnoment:
 sessionInfo()
 R Under development (unstable) (2013-02-25 r62061)
 Platform: x86_64-unknown-linux-gnu (64-bit)
 
 locale:
 [1] C
 
 attached base packages:
 [1] parallel  stats graphics  grDevices utils datasets
 [7] methods   base
 
 other attached packages:
  [1] BSgenome.Celegans.UCSC.ce6_1.3.19
  [2] BSgenome_1.27.1
  [3] GenomicFeatures_1.11.14
  [4] AnnotationDbi_1.21.12
  [5] Biobase_2.19.3
  [6] VariantAnnotation_1.5.42
  [7] Rsamtools_1.11.21
  [8] Biostrings_2.27.11
  [9] GenomicRanges_1.11.35
 [10] IRanges_1.17.35
 [11] BiocGenerics_0.5.6
 [12] RColorBrewer_1.0-5
 
 loaded via a namespace (and not attached):
  [1] DBI_0.2-5  RCurl_1.95-4.1 RSQLite_0.11.2
  [4] XML_3.95-0.1   biomaRt_2.15.0 bitops_1.0-5
  [7] rtracklayer_1.19.9 stats4_3.0.0   tools_3.0.0
 [10] zlibbioc_1.5.0
 
 
 
 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel
 

--