Re: [Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation
Hi Valerie, Thank you for the quick fix - it works like a charm! Michael On 03/10/2013 02:45 AM, Valerie Obenchain wrote: Hi Michael, Thanks for reporting the bug and sending a reproducible example. Sorry it took a few days to get to this. A fix has been checked into versions 1.4.12 (release) and 1.5.44 (devel). Valerie On 03/07/13 05:43, Michael Stadler wrote: Here is the second attachment (variants.vcf) that was missing from the first message. Michael On 03/07/2013 02:40 PM, Michael Stadler wrote: Dear Valerie and colleagues, I have recently started using the VariantAnnotation package, which is a huge asset for much of my work with sequence variants - thank you! I have a question regarding the following example (it makes use of two small text files, annot.gtf and variants.vcf, attached to this message): library(VariantAnnotation) library(GenomicFeatures) library(BSgenome.Celegans.UCSC.ce6) # build a TranscriptDb from annot.gtf chrominfo - data.frame(chrom=as.character(seqnames(Celegans)), length=seqlengths(Celegans), is_circular=FALSE) txdb - makeTranscriptDbFromGFF(file=annot.gtf, format=gtf, chrominfo=chrominfo, dataSource=WormBase v. WS190, species=Caenorhabditis elegans) txdb #TranscriptDb object: #| Db type: TranscriptDb #| Supporting package: GenomicFeatures #| Data source: WormBase v. WS190 #| Organism: Caenorhabditis elegans #| miRBase build ID: NA #| transcript_nrow: 7 #| exon_nrow: 42 #| cds_nrow: 42 #| Db created by: GenomicFeatures package from Bioconductor #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013) #| GenomicFeatures version at creation time: 1.11.14 #| RSQLite version at creation time: 0.11.2 #| DBSCHEMAVERSION: 1.0 # read in sequence variants vcf - readVcf(variants.vcf, genome=ce6) dim(vcf) #[1] 3 1 # predict the impact on coding sequences aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans) length(aa) #[1] 7 # My question is related to the impact of variant chrV:15822727: # This SNV is overlapping two transcripts (same base, plus strand), # yet the reported VARCODON values are different (GAT - TAT/AAT): aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATTAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA # interestingly, when altering the annotation or the set of variants, # this contradictions disapears (GAT - AAT): vcf2 - vcf[-1] # remove the first SNV aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans) length(aa2) #[1] 5 aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATAAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA Do you know why this could be the case? Regards, Michael Here is my envirnoment: sessionInfo() R Under development (unstable) (2013-02-25 r62061) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets [7] methods base other attached packages: [1] BSgenome.Celegans.UCSC.ce6_1.3.19 [2] BSgenome_1.27.1 [3] GenomicFeatures_1.11.14 [4] AnnotationDbi_1.21.12 [5] Biobase_2.19.3 [6] VariantAnnotation_1.5.42 [7] Rsamtools_1.11.21 [8] Biostrings_2.27.11 [9] GenomicRanges_1.11.35 [10] IRanges_1.17.35 [11] BiocGenerics_0.5.6 [12] RColorBrewer_1.0-5 loaded via a namespace (and
Re: [Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation
Hi Michael, Thanks for reporting the bug and sending a reproducible example. Sorry it took a few days to get to this. A fix has been checked into versions 1.4.12 (release) and 1.5.44 (devel). Valerie On 03/07/13 05:43, Michael Stadler wrote: Here is the second attachment (variants.vcf) that was missing from the first message. Michael On 03/07/2013 02:40 PM, Michael Stadler wrote: Dear Valerie and colleagues, I have recently started using the VariantAnnotation package, which is a huge asset for much of my work with sequence variants - thank you! I have a question regarding the following example (it makes use of two small text files, annot.gtf and variants.vcf, attached to this message): library(VariantAnnotation) library(GenomicFeatures) library(BSgenome.Celegans.UCSC.ce6) # build a TranscriptDb from annot.gtf chrominfo - data.frame(chrom=as.character(seqnames(Celegans)), length=seqlengths(Celegans), is_circular=FALSE) txdb - makeTranscriptDbFromGFF(file=annot.gtf, format=gtf, chrominfo=chrominfo, dataSource=WormBase v. WS190, species=Caenorhabditis elegans) txdb #TranscriptDb object: #| Db type: TranscriptDb #| Supporting package: GenomicFeatures #| Data source: WormBase v. WS190 #| Organism: Caenorhabditis elegans #| miRBase build ID: NA #| transcript_nrow: 7 #| exon_nrow: 42 #| cds_nrow: 42 #| Db created by: GenomicFeatures package from Bioconductor #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013) #| GenomicFeatures version at creation time: 1.11.14 #| RSQLite version at creation time: 0.11.2 #| DBSCHEMAVERSION: 1.0 # read in sequence variants vcf - readVcf(variants.vcf, genome=ce6) dim(vcf) #[1] 3 1 # predict the impact on coding sequences aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans) length(aa) #[1] 7 # My question is related to the impact of variant chrV:15822727: # This SNV is overlapping two transcripts (same base, plus strand), # yet the reported VARCODON values are different (GAT - TAT/AAT): aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATTAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA # interestingly, when altering the annotation or the set of variants, # this contradictions disapears (GAT - AAT): vcf2 - vcf[-1] # remove the first SNV aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans) length(aa2) #[1] 5 aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATAAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA Do you know why this could be the case? Regards, Michael Here is my envirnoment: sessionInfo() R Under development (unstable) (2013-02-25 r62061) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets [7] methods base other attached packages: [1] BSgenome.Celegans.UCSC.ce6_1.3.19 [2] BSgenome_1.27.1 [3] GenomicFeatures_1.11.14 [4] AnnotationDbi_1.21.12 [5] Biobase_2.19.3 [6] VariantAnnotation_1.5.42 [7] Rsamtools_1.11.21 [8] Biostrings_2.27.11 [9] GenomicRanges_1.11.35 [10] IRanges_1.17.35 [11] BiocGenerics_0.5.6 [12] RColorBrewer_1.0-5 loaded via a namespace (and not attached): [1] DBI_0.2-5 RCurl_1.95-4.1 RSQLite_0.11.2 [4] XML_3.95-0.1 biomaRt_2.15.0 bitops_1.0-5 [7] rtracklayer_1.19.9 stats4_3.0.0 tools_3.0.0 [10] zlibbioc_1.5.0
[Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation
Dear Valerie and colleagues, I have recently started using the VariantAnnotation package, which is a huge asset for much of my work with sequence variants - thank you! I have a question regarding the following example (it makes use of two small text files, annot.gtf and variants.vcf, attached to this message): library(VariantAnnotation) library(GenomicFeatures) library(BSgenome.Celegans.UCSC.ce6) # build a TranscriptDb from annot.gtf chrominfo - data.frame(chrom=as.character(seqnames(Celegans)), length=seqlengths(Celegans), is_circular=FALSE) txdb - makeTranscriptDbFromGFF(file=annot.gtf, format=gtf, chrominfo=chrominfo, dataSource=WormBase v. WS190, species=Caenorhabditis elegans) txdb #TranscriptDb object: #| Db type: TranscriptDb #| Supporting package: GenomicFeatures #| Data source: WormBase v. WS190 #| Organism: Caenorhabditis elegans #| miRBase build ID: NA #| transcript_nrow: 7 #| exon_nrow: 42 #| cds_nrow: 42 #| Db created by: GenomicFeatures package from Bioconductor #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013) #| GenomicFeatures version at creation time: 1.11.14 #| RSQLite version at creation time: 0.11.2 #| DBSCHEMAVERSION: 1.0 # read in sequence variants vcf - readVcf(variants.vcf, genome=ce6) dim(vcf) #[1] 3 1 # predict the impact on coding sequences aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans) length(aa) #[1] 7 # My question is related to the impact of variant chrV:15822727: # This SNV is overlapping two transcripts (same base, plus strand), # yet the reported VARCODON values are different (GAT - TAT/AAT): aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATTAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA # interestingly, when altering the annotation or the set of variants, # this contradictions disapears (GAT - AAT): vcf2 - vcf[-1] # remove the first SNV aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans) length(aa2) #[1] 5 aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATAAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA Do you know why this could be the case? Regards, Michael Here is my envirnoment: sessionInfo() R Under development (unstable) (2013-02-25 r62061) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets [7] methods base other attached packages: [1] BSgenome.Celegans.UCSC.ce6_1.3.19 [2] BSgenome_1.27.1 [3] GenomicFeatures_1.11.14 [4] AnnotationDbi_1.21.12 [5] Biobase_2.19.3 [6] VariantAnnotation_1.5.42 [7] Rsamtools_1.11.21 [8] Biostrings_2.27.11 [9] GenomicRanges_1.11.35 [10] IRanges_1.17.35 [11] BiocGenerics_0.5.6 [12] RColorBrewer_1.0-5 loaded via a namespace (and not attached): [1] DBI_0.2-5 RCurl_1.95-4.1 RSQLite_0.11.2 [4] XML_3.95-0.1 biomaRt_2.15.0 bitops_1.0-5 [7] rtracklayer_1.19.9 stats4_3.0.0 tools_3.0.0 [10] zlibbioc_1.5.0 chrVCoding_transcript exon6512065 6512138 . - 2 gene_id WBGene1073; transcript_id F46E10.9.1; chrVCoding_transcript exon6512189 6512279 . - 0 gene_id WBGene1073; transcript_id F46E10.9.1; chrVCoding_transcript exon6512325 6512472 . - 1 gene_id WBGene1073; transcript_id F46E10.9.1; chrVCoding_transcript exon6512521 6512649 .
Re: [Bioc-devel] contradictions in the return value of predictCoding() from VariantAnnotation
Here is the second attachment (variants.vcf) that was missing from the first message. Michael On 03/07/2013 02:40 PM, Michael Stadler wrote: Dear Valerie and colleagues, I have recently started using the VariantAnnotation package, which is a huge asset for much of my work with sequence variants - thank you! I have a question regarding the following example (it makes use of two small text files, annot.gtf and variants.vcf, attached to this message): library(VariantAnnotation) library(GenomicFeatures) library(BSgenome.Celegans.UCSC.ce6) # build a TranscriptDb from annot.gtf chrominfo - data.frame(chrom=as.character(seqnames(Celegans)), length=seqlengths(Celegans), is_circular=FALSE) txdb - makeTranscriptDbFromGFF(file=annot.gtf, format=gtf, chrominfo=chrominfo, dataSource=WormBase v. WS190, species=Caenorhabditis elegans) txdb #TranscriptDb object: #| Db type: TranscriptDb #| Supporting package: GenomicFeatures #| Data source: WormBase v. WS190 #| Organism: Caenorhabditis elegans #| miRBase build ID: NA #| transcript_nrow: 7 #| exon_nrow: 42 #| cds_nrow: 42 #| Db created by: GenomicFeatures package from Bioconductor #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013) #| GenomicFeatures version at creation time: 1.11.14 #| RSQLite version at creation time: 0.11.2 #| DBSCHEMAVERSION: 1.0 # read in sequence variants vcf - readVcf(variants.vcf, genome=ce6) dim(vcf) #[1] 3 1 # predict the impact on coding sequences aa - predictCoding(query=vcf, subject=txdb, seqSource=Celegans) length(aa) #[1] 7 # My question is related to the impact of variant chrV:15822727: # This SNV is overlapping two transcripts (same base, plus strand), # yet the reported VARCODON values are different (GAT - TAT/AAT): aa[names(aa)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATTAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA # interestingly, when altering the annotation or the set of variants, # this contradictions disapears (GAT - AAT): vcf2 - vcf[-1] # remove the first SNV aa2 - predictCoding(query=vcf2, subject=txdb, seqSource=Celegans) length(aa2) #[1] 5 aa2[names(aa2)==chrV:15822727,c(REF,ALT,varAllele,REFCODON,VARCODON)] #GRanges with 2 ranges and 5 metadata columns: #seqnames ranges strand | # RleIRanges Rle | # chrV:15822727 chrV [15822727, 15822727] + | # chrV:15822727 chrV [15822727, 15822727] + | # REFALT varAllele #DNAStringSet DNAStringSetList DNAStringSet # chrV:15822727 G A A # chrV:15822727 G A A # REFCODON VARCODON #DNAStringSet DNAStringSet # chrV:15822727GATAAT # chrV:15822727GATAAT # --- # seqlengths: # chrI chrV # NA NA Do you know why this could be the case? Regards, Michael Here is my envirnoment: sessionInfo() R Under development (unstable) (2013-02-25 r62061) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets [7] methods base other attached packages: [1] BSgenome.Celegans.UCSC.ce6_1.3.19 [2] BSgenome_1.27.1 [3] GenomicFeatures_1.11.14 [4] AnnotationDbi_1.21.12 [5] Biobase_2.19.3 [6] VariantAnnotation_1.5.42 [7] Rsamtools_1.11.21 [8] Biostrings_2.27.11 [9] GenomicRanges_1.11.35 [10] IRanges_1.17.35 [11] BiocGenerics_0.5.6 [12] RColorBrewer_1.0-5 loaded via a namespace (and not attached): [1] DBI_0.2-5 RCurl_1.95-4.1 RSQLite_0.11.2 [4] XML_3.95-0.1 biomaRt_2.15.0 bitops_1.0-5 [7] rtracklayer_1.19.9 stats4_3.0.0 tools_3.0.0 [10] zlibbioc_1.5.0 ___ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel --