Dear Analysis Gurus,

I am currently performing a gene expression analysis on a plant parasite. I 
have mapped Illumina read counts for various stages in this parasite's 
life cycle. Of interest for us in this analysis are genes that are 
differentially expressed across these life-cycle stages. To determine this, I 
have focused on two types of differential expression: "peaks" and "cliffs." 
"Peaks" occur when a gene is differentially expressed in one time sample 
(either higher or lower than in the remaining samples) and "cliffs" occur when 
a gene is differentially expressed between two groups of samples (for instance, 
higher expression in the first three samples than in the last three). 
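
As a rough illustration of the two patterns defined above, each peak or cliff can be written as a two-group labeling of the samples. This is only a sketch: the function names and the sample labels T1..T7 are hypothetical, and it assumes that cliffs split the samples along their time order, with at least two samples on each side so that a cliff is not just a peak at the first or last time point.

```python
# Hypothetical helpers; sample labels are placeholders, not the real stages.

def peak_groupings(samples):
    """One labeling per sample: that sample is 'case', all others 'control'."""
    return [
        {s: ("case" if s == target else "control") for s in samples}
        for target in samples
    ]


def cliff_groupings(samples):
    """One labeling per split point along the time order.

    The first k samples are 'case' and the rest 'control'. k runs from 2 to
    len(samples) - 2 so that both sides hold at least two samples
    (an assumption made here to keep cliffs distinct from peaks).
    """
    return [
        {s: ("case" if i < k else "control") for i, s in enumerate(samples)}
        for k in range(2, len(samples) - 1)
    ]


samples = ["T1", "T2", "T3", "T4", "T5", "T6", "T7"]
peaks = peak_groupings(samples)
cliffs = cliff_groupings(samples)

print(len(peaks), "peak labelings,", len(cliffs), "cliff labelings")
```

Each labeling could then be passed to the differential expression test as the group factor, one comparison per labeling.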

To determine these peaks and cliffs, I have been creating groups in which the 
desired peak/cliff is "case" and the remaining samples are "control." I then 
run the analysis with common and/or tagwise dispersion and extract those genes 
with an FDR of less than 0.1. So, my questions:

1.) How much filtering of data should I do? Right now I have a fair number of 
genes that are expressed in only 0, 1, 2, etc. samples. It seems logical to 
filter out genes that have no expression at all, but at what level should the 
filtering stop? Also, should the filtering differ depending on the analysis 
(peak or cliff)?

2.) When doing tagwise dispersion, what should I set my prior.n to (I currently 
have 7 samples)? Does it depend on the type of analysis?

3.) Should I investigate using a more advanced GLM-based analysis? Any advice 
on crafting a design for this?

4.) Any other ideas on analyses to perform on a set of time-series data with 
edgeR?
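
To make the filtering question in item 1 concrete, the sketch below tabulates how many genes are expressed in 0, 1, 2, ... samples and applies a simple "expressed in at least N samples" filter. Everything here is illustrative: the count matrix is made-up toy data, the function names are hypothetical, and the thresholds `min_count=1` and `min_samples=2` are placeholders rather than recommended values.

```python
from collections import Counter

# Toy count matrix (made-up numbers): gene -> read counts across 7 samples.
toy_counts = {
    "geneA": [0, 0, 0, 0, 0, 0, 0],
    "geneB": [5, 0, 0, 0, 0, 0, 0],
    "geneC": [12, 30, 0, 0, 0, 0, 0],
    "geneD": [40, 55, 61, 38, 47, 52, 44],
}


def samples_expressed(counts, min_count=1):
    """Number of samples in which a gene has at least min_count reads."""
    return sum(1 for c in counts if c >= min_count)


def tabulate_expression(count_matrix, min_count=1):
    """Map 'number of samples expressed in' -> number of genes."""
    return Counter(
        samples_expressed(counts, min_count) for counts in count_matrix.values()
    )


def filter_genes(count_matrix, min_samples, min_count=1):
    """Keep genes expressed in at least min_samples samples."""
    return {
        gene: counts
        for gene, counts in count_matrix.items()
        if samples_expressed(counts, min_count) >= min_samples
    }


table = tabulate_expression(toy_counts)
kept = filter_genes(toy_counts, min_samples=2)

for n_samples in sorted(table):
    print(n_samples, "samples:", table[n_samples], "gene(s)")
print("kept:", sorted(kept))
```

Note that a peak is by definition concentrated in a single sample, so a filter that requires expression in several samples would behave differently for peak and cliff analyses, which is the heart of the question.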

I greatly appreciate any help/advice and thank you in advance!
 
Mark J. Lawson, Ph.D.
Bioinformatics Research Scientist
Center for Public Health Genomics, UVA
mlaw...@virginia.edu

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
