Dear Jane,
in the case of genes (or better term: loci) with only one exon, DEXSeq's
testGeneForDEU does not make sense - it is looking for exons within the
locus that behave differently from the average of all the others: there
are no others! Hence the parameters are unidentifiable, and the software
should not even look at them. We'll robustify the software for this case
in future versions (note that the package is still actively being
developed).
The case of the NA values that are not explained by there only being one
exon is more interesting. Can you send us the code to reproduce your
problem (this also requires the data, which you can send offline, and in
which you can for instance anonymize the gene and sample annotations).
Best wishes
Wolfgang
Jul/28/11 4:23 PM, Jane Merlevede scripsit::
Hello,
I am using DEXseq for exon differential analysis. The analysis seems to work
well on genes with several exons, but I have a problem with genes with only
one exon.
Let me present you my dataset:
pData(ecs)
sizeFactor condition replicate type
Raman_1 0.9638822 nonvir 1 paired-end
HM1_1 1.4000715 vir 1 paired-end
Raman_2 1.0314049 nonvir 2 paired-end
Raman_3 1.0734133 nonvir 3 paired-end
HM1_2 0.7801269 vir 2 paired-end
HM1_3 0.9488409 vir 3 paired-end
table( table (geneIDs(ecs)))
1 2 3 4 5 6 7 8
3149 893 874 206 100 42 18 4
As you can see, I have 3149 genes (out of 5286) which have only 1 exon.
There are 9291 exons.
res1<- DEUresultTable(ecs)
summary(res1)
geneID exonID dispersion_CR_est
EHI_051420 : 8 E001 :5286 Min. :0.000e+00
EHI_070690 : 8 E002 :2137 1st Qu.:7.176e-03
EHI_169670 : 8 E003 :1244 Median :1.693e-02
EHI_175010+EHI_175000.1: 8 E004 : 370 Mean :5.546e-02
EHI_005010 : 7 E005 : 164 3rd Qu.:3.641e-02
EHI_042710 : 7 E006 : 64 Max. :1.300e+01
(Other) :9245 (Other): 26 NA's :3.153e+03
dispersion pvalue padjust
Min. :2.010e-02 Min. : 0.0000 Min. : 0.0000
1st Qu.:2.202e-02 1st Qu.: 0.1150 1st Qu.: 0.4597
Median :2.628e-02 Median : 0.4054 Median : 0.8086
Mean :3.229e+04 Mean : 0.4269 Mean : 0.6775
3rd Qu.:4.214e-02 3rd Qu.: 0.7133 3rd Qu.: 0.9510
Max. :1.000e+08 Max. : 1.0000 Max. : 1.0000
NA's :3153.0000 NA's :3153.0000
There are 3153 NA's values. When checking out what were those NA's values, I
realized that they were all the genes with only one exon (and only 4 more
genes in the other cases). I doubt that it is by chance... And they don't
have weird read counts. Check for example EHI_000010, EHI_000410,
EHI_000420, EHI_000430, EHI_000440:
geneCountTable(ecs)[1:21,]
Raman_1 HM1_1 Raman_2 Raman_3 HM1_2 HM1_3
EHI_000010 391 1364 464 449 457 560
EHI_000130 18806 8116 21890 24994 6509 14067
EHI_000240 24057 21825 15365 26903 21101 27841
EHI_000250 3374 2689 3938 4854 2310 3556
EHI_000290 458 3721 437 550 1716 1054
EHI_000410 896 1041 1086 978 905 745
EHI_000420 140 220 244 117 140 78
EHI_000430 583 2043 304 634 1398 1197
EHI_000440 3103 15309 5358 3619 9152 10392
EHI_000450 358 559 991 516 281 564
EHI_000460.1 357 118 449 425 43 746
EHI_000470.1 408 2657 297 397 946 909
EHI_000480 842 1702 858 778 841 847
EHI_000490 187 648 221 239 370 249
EHI_000500 329 506 454 402 244 555
EHI_000520 482 1615 883 440 925 751
EHI_000530 1794 1588 1751 1966 1099 1860
EHI_000540 2857 2504 3706 4152 1479 2547
EHI_000550 1967 3999 1241 1904 2505 2448
EHI_000560 152 434 239 165 273 175
EHI_000570 1209 2142 2183 1213 1905 2662
res1[1:20,]
geneID exonID dispersion_CR_est dispersion pvalue
EHI_000010:001 EHI_000010 E001 NA 0.02377712 NA
EHI_000130:001 EHI_000130 E001 0.074669451 0.07466945 0.48673539
EHI_000130:002 EHI_000130 E002 0.128972813 0.12897281 0.94279527
EHI_000130:003 EHI_000130 E003 0.019602219 0.02031871 0.84327914
EHI_000130:004 EHI_000130 E004 0.067337196 0.06733720 0.70429765
EHI_000240:001 EHI_000240 E001 0.043753997 0.04375400 0.74939865
EHI_000240:002 EHI_000240 E002 0.058562700 0.05856270 0.74939865
EHI_000250:001 EHI_000250 E001 0.031440630 0.03144063 0.70444841
EHI_000250:002 EHI_000250 E002 0.037457450 0.03745745 0.70444841
EHI_000290:001 EHI_000290 E001 0.021113888 0.02396819 0.05100089
EHI_000290:002 EHI_000290 E002 0.007626346 0.03471905 0.95676220
EHI_000290:003 EHI_000290 E003 0.036889359 0.03688936 0.03057067
EHI_000410:001 EHI_000410 E001 NA 0.02235347 NA
EHI_000420:001 EHI_000420 E001 NA 0.03395573 NA
EHI_000430:001 EHI_000430 E001 NA 0.02219522 NA
EHI_000440:001 EHI_000440 E001 NA 0.02037263 NA
EHI_000450:001 EHI_000450 E001 0.024495906 0.04492663 0.64717121
EHI_000450:002 EHI_000450 E002 0.020356260 0.02483641 0.64717121
EHI_000460.1:001 EHI_000460.1 E001 NA 0.02602179 NA
EHI_000470.1:001 EHI_000470.1 E001 NA 0.02254333 NA
padjust
EHI_000010:001 NA
EHI_000130:001 0.8639624
EHI_000130:002 0.9927663
EHI_000130:003 0.9756197
EHI_000130:004 0.9503086
EHI_000240:001 0.9611397
EHI_000240:002 0.9611397
EHI_000250:001 0.9503086
EHI_000250:002 0.9503086
EHI_000290:001 0.2890521
EHI_000290:002 0.9962043
EHI_000290:003 0.2087239
EHI_000410:001 NA
EHI_000420:001 NA
EHI_000430:001 NA
EHI_000440:001 NA
EHI_000450:001 0.9278993
EHI_000450:002 0.9278993
EHI_000460.1:001 NA
EHI_000470.1:001 NA
I checked the model in the documentation and I don't see why the test does
not give a result in this case. Has anyone get the same problem?
Thanks for your help,
Jane Merlevède
[[alternative HTML version deleted]]
_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing