Hi Davis/Gordon,

I am posting my question here again in the hope that you will see it. I tried edgeR and ran into a problem with the pseudocount totals for each library after normalization, which should be close to one another. The edgeR user's guide states several times that "the total counts in each library of the pseudocounts agrees well with the common library size" (pages 27 and 44), but my totals differ considerably between treatments, although the replicates within a treatment have very similar pseudocount totals. None of the normalization methods I tried (TMM, RLE and quantile) brought the totals to the common.lib.size for every treatment.

1) Did I miss anything during my run with edgeR? How can I make sure the normalization went well?
2) Does it matter whether the normalized library sizes of the conditions differ from the common.lib.size?
3) Is the result still meaningful even if the library sizes of the pseudocounts are different?
4) What could cause the library sizes of the pseudocounts to be so different?
5) Should I filter out the low read counts, as some other people do? After I removed the low counts (<=40 in >=6 out of 14 samples), the normalized library sizes became very similar.

I can feel my lack of mathematics for the packages. I attach part of my code here.

---------------------------------------------------------------------
d$samples$lib.size
#
"Zygote1",   21012147
"Zygote2",   19924212
"Octant1",    9660245
"Octant2",   26002900
"Globular1", 17139388
"Globular2",  7649319
"Heart1",    16430105
"Heart2",    20101956
"Torpedo1",  12920266
"Torpedo2",   6306742
"Bent1",     44241095
"Bent2",     20094409
"Mature1",   15166090
"Mature2",   23203758

d$common.lib.size
[1] 16554344.47

colSums(d$pseudo.alt)
#
Zygote1    21523774.62
Zygote2    21638415.63
Octant1    14533481.82
Octant2    12046955.46
Globular1  18920316.62
Globular2  18439528.30
Heart1     11754608.30
Heart2     12759230.11
Torpedo1   11248245.52
Torpedo2   11410667.92
Bent1      16101723.65
Bent2      17980670.24
Mature1    26785396.02
Mature2    27067289.80
#

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8    LC_PAPER=en_CA.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ALL_1.4.7      Biobase_2.12.1 limma_3.8.2    edgeR_2.2.5

loaded via a namespace (and not attached):
[1] tools_2.13.0
---------------------------------------------------------------------

Yifang

Yifang Tan
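
The mismatch described above can be quantified from the posted numbers alone. The following is a minimal base-R sketch, not part of the original run and not an edgeR function: it divides each library's pseudocount total by the common library size and averages the ratio per stage. The variable names (`pseudo.sums`, `ratio`, `stage.ratio`) are made up for illustration, and the stage label is assumed to be the sample name without its trailing replicate digit.

```r
# Values copied from the output posted above.
common.lib.size <- 16554344.47
pseudo.sums <- c(
  Zygote1   = 21523774.62, Zygote2   = 21638415.63,
  Octant1   = 14533481.82, Octant2   = 12046955.46,
  Globular1 = 18920316.62, Globular2 = 18439528.30,
  Heart1    = 11754608.30, Heart2    = 12759230.11,
  Torpedo1  = 11248245.52, Torpedo2  = 11410667.92,
  Bent1     = 16101723.65, Bent2     = 17980670.24,
  Mature1   = 26785396.02, Mature2   = 27067289.80
)

# Ratio of each library's pseudocount total to the common library size.
# A ratio of 1 would mean the two agree exactly.
ratio <- pseudo.sums / common.lib.size

# Assumption: the stage is the sample name minus the replicate digit (1 or 2).
stage <- sub("[12]$", "", names(pseudo.sums))
stage <- factor(stage, levels = unique(stage))  # keep the posted order

# Mean ratio per developmental stage.
stage.ratio <- tapply(ratio, stage, mean)

print(round(ratio, 2))
print(round(stage.ratio, 2))
```

A ratio near 1 means that library's pseudocount total matches the common library size. Ratios that stay close within a stage but differ across stages reproduce the pattern described in the post.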
_______________________________________________ Bioc-sig-sequencing mailing list Bioc-sig-sequencing@r-project.org https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing