Hi Michael,

I noticed that when the tallyVariants function receives a 'which' argument (via BamTallyParam) that contains overlapping or duplicated regions, it returns duplicated rows.
(See below for an example.) It took me a little while to understand where the duplicates were coming from.

Would it be useful to 'reduce' the 'which' GRanges/RangesList object by default before tallying variants, to make sure each base is tallied only once?

Best,
Thomas

library(VariantTools)

## 'which' is a set of non-overlapping regions
tally.param <- TallyVariantsParam(gmapR::TP53Genome(),
                                  high_base_quality = 23L,
                                  which = gmapR::TP53Which())
bams <- LungCancerLines::LungCancerBamFiles()
raw.variants <- tallyVariants(bams$H1993, tally.param)
any(duplicated(raw.variants))
## FALSE

## 'which' is a set of duplicated regions
tally.param <- TallyVariantsParam(gmapR::TP53Genome(),
                                  high_base_quality = 23L,
                                  which = c(gmapR::TP53Which(), gmapR::TP53Which()))
raw.variants <- tallyVariants(bams$H1993, tally.param)
any(duplicated(raw.variants))
## TRUE
sort(raw.variants)[1:4]

### sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices
[6] utils     datasets  methods   base

other attached packages:
 [1] VariantTools_1.8.0       VariantAnnotation_1.12.9
 [3] Rsamtools_1.18.2         Biostrings_2.34.1
 [5] XVector_0.6.0            GenomicRanges_1.18.4
 [7] GenomeInfoDb_1.2.4       IRanges_2.0.1
 [9] S4Vectors_0.4.0          BiocGenerics_0.12.1
[11] BiocInstaller_1.16.1     roxygen2_4.1.0
[13] devtools_1.7.0

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.28.1    base64enc_0.1-2
 [3] BatchJobs_1.5           BBmisc_1.9
 [5] Biobase_2.26.0          BiocParallel_1.0.3
 [7] biomaRt_2.22.0          bitops_1.0-6
 [9] brew_1.0-6              BSgenome_1.34.1
[11] checkmate_1.5.1         codetools_0.2-10
[13] DBI_0.3.1               digest_0.6.8
[15] fail_1.2                foreach_1.4.2
[17] GenomicAlignments_1.2.1 GenomicFeatures_1.18.3
[19] gmapR_1.8.0             grid_3.1.2
[21] iterators_1.0.7         lattice_0.20-29
[23] LungCancerLines_0.3.1   Matrix_1.1-5
[25] Rcpp_0.11.4             RCurl_1.95-4.5
[27] RSQLite_1.0.0           rtracklayer_1.26.2
[29] sendmailR_1.2-1         stringr_0.6.2
[31] tools_3.1.2             XML_3.98-1.1
[33] zlibbioc_1.12.0

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
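The proposal above, reducing 'which' so that every base is tallied only once, can be illustrated without any Bioconductor packages. The following is a minimal base-R sketch that assumes regions are given as a data.frame with `seqnames`, `start` and `end` columns (1-based, closed intervals). `reduce_intervals()` and `visited_positions()` are hypothetical helpers written for this illustration. They are not part of VariantTools, gmapR or GenomicRanges, and they do not call tallyVariants. The sketch only mimics the effect of `reduce()` on a toy set of regions.

```r
## Hypothetical helper (not part of VariantTools or GenomicRanges):
## merge duplicated, overlapping or adjacent intervals on each sequence,
## roughly what reduce() does for ranges with its default min.gapwidth = 1L.
## Unlike reduce() on a GRanges, strand is ignored here for simplicity.
reduce_intervals <- function(df) {
  # df: data.frame with columns seqnames, start, end (1-based, closed)
  df$seqnames <- as.character(df$seqnames)
  if (nrow(df) == 0) return(df)
  df <- df[order(df$seqnames, df$start, df$end), , drop = FALSE]
  out <- list()
  for (chr in unique(df$seqnames)) {
    sub <- df[df$seqnames == chr, , drop = FALSE]
    cur_start <- sub$start[1]
    cur_end <- sub$end[1]
    for (i in seq_len(nrow(sub))[-1]) {
      if (sub$start[i] <= cur_end + 1) {
        # overlapping or adjacent: extend the current merged interval
        cur_end <- max(cur_end, sub$end[i])
      } else {
        # gap: emit the current merged interval and start a new one
        out[[length(out) + 1]] <- data.frame(seqnames = chr, start = cur_start,
                                             end = cur_end, stringsAsFactors = FALSE)
        cur_start <- sub$start[i]
        cur_end <- sub$end[i]
      }
    }
    out[[length(out) + 1]] <- data.frame(seqnames = chr, start = cur_start,
                                         end = cur_end, stringsAsFactors = FALSE)
  }
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

## Hypothetical helper: list every "seqnames:position" visited when each
## region is processed independently, as a stand-in for per-region tallying.
visited_positions <- function(df) {
  unlist(mapply(function(chr, s, e) paste(chr, s:e, sep = ":"),
                as.character(df$seqnames), df$start, df$end,
                SIMPLIFY = FALSE),
         use.names = FALSE)
}

## Toy 'which' made of the same region given twice
which_dup <- data.frame(seqnames = c("chr17", "chr17"),
                        start = c(100L, 100L), end = c(110L, 110L))
print(any(duplicated(visited_positions(which_dup))))  # TRUE: bases visited twice
which_red <- reduce_intervals(which_dup)
print(any(duplicated(visited_positions(which_red))))  # FALSE: each base once
```

With the real objects, the corresponding workaround would presumably be to pass `which = reduce(c(gmapR::TP53Which(), gmapR::TP53Which()))` to TallyVariantsParam() after `library(VariantTools)`, which attaches GenomicRanges. Whether tallyVariants then returns exactly the rows of the non-duplicated case has not been checked here.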