[Bioc-sig-seq] shortRead qa() minor bug

Marcus Davy Tue, 17 May 2011 17:26:49 -0700

Hi Martin,
there appears to be a minor bug in one of the qa() plots in the code;


ShortRead:::.plotReadOccurrences

If the code chunk which generates the dataset for plotting the read
occurences is run manually on an example dataset, the cumulative
proportion of reads can be slightly greater than 1. It looks like a slight
error in the denominator of the calculation.

library(ShortRead)

exptPath <- system.file("extdata", package = "ShortRead")
sp       <- SolexaPath(exptPath)
qa       <- qa(sp)

df       <- qa[["sequenceDistribution"]]

tmp <- with(df, {
        nOccur <- tapply(nOccurrences, lane, c)
        cumulative <- tapply(nOccurrences * nReads, lane, function(elt) {
            cs <- cumsum(elt)
            (cs - cs[1] + 1)/max(1, diff(range(cs))) ## denominator under
estimates by 1
        })
        lane <- tapply(lane, lane, c)
        data.frame(nOccurrences = unlist(nOccur), cumulative =
unlist(cumulative),
            lane = unlist(lane))
    })

print(tmp)
                 nOccurrences   cumulative lane
s_2_export.txt1             1 0.0008347245    1
s_2_export.txt2             2 0.0091819699    1
s_2_export.txt3             3 0.0116861436    1
s_2_export.txt4             5 0.0158597663    1
s_2_export.txt5            10 0.0242070117    1
s_2_export.txt6             1 0.6435726210    1
s_2_export.txt7             2 0.6519198664    1
s_2_export.txt8             3 0.6544240401    1
s_2_export.txt9             9 0.6619365609    1
s_2_export.txt10            1 0.9991652755    1
s_2_export.txt11            2 1.0008347245    1

tmp[which(tmp$cumulative>1),]
                 nOccurrences cumulative lane
s_2_export.txt11            2   1.000835    1

tmp <- with(df, {
        nOccur <- tapply(nOccurrences, lane, c)
        cumulative <- tapply(nOccurrences * nReads, lane, function(elt) {
            cs <- cumsum(elt)
            (cs - cs[1] + 1)/max(1, (diff(range(cs))+1))  ## Add 1 to
difference
        })
        lane <- tapply(as.character(lane), lane, c) ## as.character to stop
factor coercing
       data.frame(nOccurrences = unlist(nOccur), cumulative =
unlist(cumulative),
            lane = unlist(lane), row.names=NULL)  ## Removing row names
    })

print(tmp)
   nOccurrences   cumulative           lane
1             1 0.0008340284 s_2_export.txt
2             2 0.0091743119 s_2_export.txt
3             3 0.0116763970 s_2_export.txt
4             5 0.0158465388 s_2_export.txt
5            10 0.0241868224 s_2_export.txt
6             1 0.6430358632 s_2_export.txt
7             2 0.6513761468 s_2_export.txt
8             3 0.6538782319 s_2_export.txt
9             9 0.6613844871 s_2_export.txt
10            1 0.9983319433 s_2_export.txt
11            2 1.0000000000 s_2_export.txt



One suggestion is that the first part of this code (or similar) would be
useful  in the
help so example(qa) could be run and testing with reporoducible examples can
be quicker.

cheers,

Marcus


sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_New Zealand.1252  LC_CTYPE=English_New Zealand.1252

[3] LC_MONETARY=English_New Zealand.1252 LC_NUMERIC=C

[5] LC_TIME=English_New Zealand.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ShortRead_1.10.3    Rsamtools_1.4.1     lattice_0.19-23
[4] Biostrings_2.20.1   GenomicRanges_1.4.3 IRanges_1.10.3

loaded via a namespace (and not attached):
[1] Biobase_2.12.1 grid_2.13.0    hwriter_1.3

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

[Bioc-sig-seq] shortRead qa() minor bug

Reply via email to