Hi Martin, there appears to be a minor bug in one of the qa() plots in the code;
ShortRead:::.plotReadOccurrences If the code chunk which generates the dataset for plotting the read occurences is run manually on an example dataset, the cumulative proportion of reads can be slightly greater than 1. It looks like a slight error in the denominator of the calculation. library(ShortRead) exptPath <- system.file("extdata", package = "ShortRead") sp <- SolexaPath(exptPath) qa <- qa(sp) df <- qa[["sequenceDistribution"]] tmp <- with(df, { nOccur <- tapply(nOccurrences, lane, c) cumulative <- tapply(nOccurrences * nReads, lane, function(elt) { cs <- cumsum(elt) (cs - cs[1] + 1)/max(1, diff(range(cs))) ## denominator under estimates by 1 }) lane <- tapply(lane, lane, c) data.frame(nOccurrences = unlist(nOccur), cumulative = unlist(cumulative), lane = unlist(lane)) }) print(tmp) nOccurrences cumulative lane s_2_export.txt1 1 0.0008347245 1 s_2_export.txt2 2 0.0091819699 1 s_2_export.txt3 3 0.0116861436 1 s_2_export.txt4 5 0.0158597663 1 s_2_export.txt5 10 0.0242070117 1 s_2_export.txt6 1 0.6435726210 1 s_2_export.txt7 2 0.6519198664 1 s_2_export.txt8 3 0.6544240401 1 s_2_export.txt9 9 0.6619365609 1 s_2_export.txt10 1 0.9991652755 1 s_2_export.txt11 2 1.0008347245 1 tmp[which(tmp$cumulative>1),] nOccurrences cumulative lane s_2_export.txt11 2 1.000835 1 tmp <- with(df, { nOccur <- tapply(nOccurrences, lane, c) cumulative <- tapply(nOccurrences * nReads, lane, function(elt) { cs <- cumsum(elt) (cs - cs[1] + 1)/max(1, (diff(range(cs))+1)) ## Add 1 to difference }) lane <- tapply(as.character(lane), lane, c) ## as.character to stop factor coercing data.frame(nOccurrences = unlist(nOccur), cumulative = unlist(cumulative), lane = unlist(lane), row.names=NULL) ## Removing row names }) print(tmp) nOccurrences cumulative lane 1 1 0.0008340284 s_2_export.txt 2 2 0.0091743119 s_2_export.txt 3 3 0.0116763970 s_2_export.txt 4 5 0.0158465388 s_2_export.txt 5 10 0.0241868224 s_2_export.txt 6 1 0.6430358632 s_2_export.txt 7 2 0.6513761468 s_2_export.txt 8 3 0.6538782319 s_2_export.txt 9 9 0.6613844871 s_2_export.txt 10 1 0.9983319433 s_2_export.txt 11 2 1.0000000000 s_2_export.txt One suggestion is that the first part of this code (or similar) would be useful in the help so example(qa) could be run and testing with reporoducible examples can be quicker. cheers, Marcus sessionInfo() R version 2.13.0 (2011-04-13) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_New Zealand.1252 LC_CTYPE=English_New Zealand.1252 [3] LC_MONETARY=English_New Zealand.1252 LC_NUMERIC=C [5] LC_TIME=English_New Zealand.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.10.3 Rsamtools_1.4.1 lattice_0.19-23 [4] Biostrings_2.20.1 GenomicRanges_1.4.3 IRanges_1.10.3 loaded via a namespace (and not attached): [1] Biobase_2.12.1 grid_2.13.0 hwriter_1.3 [[alternative HTML version deleted]] _______________________________________________ Bioc-sig-sequencing mailing list Bioc-sig-sequencing@r-project.org https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing