Hi Lizex,

> I've started analyzing my RNA-Seq data for two time points: Day0 and Day4 for
> control and treated. I've done aligning the data to the reference genome
> using Tophat. I've removed duplicates from the data sets. Could somebody
> please tell me, how important is it to remove duplicates and how will it
> influence my results if I don't remove?

This depends on whether you removed duplicate reads from your FASTQ data, multi-mapping reads (either in Tophat or in a post-processing step), or both. In either case, removing reads will affect the quantitation output from Cufflinks and likely the transcript assemblies as well.

> I want to start with Cufflinks all the way through to Cuffdiff. Where do I
> start since there are just so many options (in the manual) to choose from?
> What do I look for?

Here's a tutorial that will help you get started with RNA-seq analysis:

http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise

Galaxy makes it easy to experiment with different parameter values, so you'll want to read the Cufflinks/Cuffcompare/Cuffdiff manual and adjust the parameters that are relevant to your data:

http://cufflinks.cbcb.umd.edu/manual.html

In general, RNA-seq studies look at (a) the transcripts assembled; (b) expression values; and (c) differential expression estimates.

Good luck,
J.
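
P.S. To make the duplicate-removal point concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions: the reads, gene names, and both helper functions are made up, and treating reads that share a chromosome, start, and strand as duplicates is a simplification, not how Tophat or Cufflinks work internally.

```python
from collections import Counter

# Hypothetical toy alignments: (gene, chromosome, start, strand).
# geneA has three reads piled up at the same position.
reads = [
    ("geneA", "chr1", 100, "+"),
    ("geneA", "chr1", 100, "+"),
    ("geneA", "chr1", 100, "+"),
    ("geneA", "chr1", 150, "+"),
    ("geneB", "chr2", 500, "-"),
    ("geneB", "chr2", 520, "-"),
]


def count_per_gene(alignments):
    """Count how many alignments fall on each gene."""
    return Counter(aln[0] for aln in alignments)


def remove_duplicates(alignments):
    """Keep the first read seen at each (chromosome, start, strand).

    Assumption for this sketch: position-based duplicates only.
    """
    seen = set()
    kept = []
    for aln in alignments:
        key = aln[1:]  # (chromosome, start, strand)
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


raw_counts = count_per_gene(reads)
dedup_counts = count_per_gene(remove_duplicates(reads))

print("raw:  ", dict(raw_counts))
print("dedup:", dict(dedup_counts))
```

In this toy data geneA has twice as many reads as geneB before duplicate removal and the same number afterwards, so the apparent relative expression changes. Whether the collapsed reads were PCR artifacts or real signal from a highly expressed gene is exactly what you need to consider for your own data.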