[galaxy-user] Should I use iGenomes verson of a reference GTF for Tophat?

2012-08-23 Thread Du, Jianguang
Dear All,

I am analysing RNA-seq datasets for differential splicing events between cell 
types. These are mouse cells. Jen suggested that I use the iGenomes version of 
the reference GTF to take full advantage of the options in Cuffdiff. My question 
is: should I use this iGenomes reference GTF when I run TopHat?

Thanks.

Jianguang
___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-user] Should I use iGenomes verson of a reference GTF for Tophat?

2012-08-23 Thread Du, Jianguang
Hi Jen,
Thanks for your help.
Do you mean that if I want to find novel isoforms/splicing, I need to select 
"No" under "Use Reference Annotation" when I run Cufflinks, and then use the 
iGenomes version of the reference GTF when I run Cuffmerge?

Based on your information and some protocols found online, my understanding is 
that: 
1) If I use the iGenomes version of the reference GTF, I only need to run 
Cuffmerge with the Cufflinks outputs, because the iGenomes reference GTF already 
contains attributes such as p_id and tss_id. The Cuffmerge output can then be 
used for Cuffdiff.
2) However, if I use the reference GTF from Ensembl/UCSC (rather than from 
iGenomes), I need to run Cuffcompare to create p_id and tss_id, which are 
required for Cuffdiff.
Am I right?

Another question: should I use the iGenomes version of the reference GTF when I 
run TopHat if I want to see novel isoforms/splicing?

Thanks.
Jianguang


From: Jennifer Jackson [j...@bx.psu.edu]
Sent: Thursday, August 23, 2012 11:46 AM
To: Du, Jianguang
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Should I use iGenomes verson of a reference GTF for 
Tophat?

Hello Jianguang,

When to start using the reference GTF file in the analysis depends on
whether you intend to do any discovery along with differential
expression testing. At the TopHat and Cufflinks steps, using a reference
GTF file can influence how datasets map and assemble. In general, if
your intention is discovery (e.g., working with novel isoforms that are
in your data but not in the reference), do not add the reference GTF
until the Cuffmerge step (which produces the input annotation GTF file
for Cuffdiff). But if you want to guide the analysis toward known
isoforms, then use the reference GTF at the TopHat and Cufflinks steps.

This is the process our RNA-seq example protocol follows:
http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise
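
The rule of thumb above can be written down as a small decision table. The
following is only a minimal sketch, not part of the protocol or of any tool:
the function name reference_gtf_plan is hypothetical, and it assumes that the
guided route also passes the reference GTF to Cuffmerge.

```python
# Pipeline steps, in run order (Cuffdiff then consumes the Cuffmerge output).
STEPS = ("TopHat", "Cufflinks", "Cuffmerge")


def reference_gtf_plan(discovery):
    """Return {step: bool} saying whether the reference GTF is supplied.

    discovery=True  -> hold the GTF back until Cuffmerge (novel isoforms wanted).
    discovery=False -> supply it at every step (guide toward known isoforms).
    Assumption: the guided route also gives the GTF to Cuffmerge.
    """
    if discovery:
        return {step: step == "Cuffmerge" for step in STEPS}
    return {step: True for step in STEPS}


if __name__ == "__main__":
    for discovery in (True, False):
        label = "discovery" if discovery else "guided"
        plan = reference_gtf_plan(discovery)
        used = [step for step in STEPS if plan[step]]
        print(f"{label}: reference GTF used at {', '.join(used)}")
```

Running a dataset through both plans and comparing the results is one way to
carry out the "few runs with different options" suggested later in this
message.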

For reference, there are other variations of this on the Cufflinks web
site, some of which never lead to Cuffdiff but may still be useful to
review. Please see the Cufflinks paper (linked from the right sidebar as
"Protocol") for many more options and discussion.
http://cufflinks.cbcb.umd.edu/tutorial.html
-- Common uses of the Cufflinks package

The end decision will be up to you, and a few runs with different
options may be a useful way to make the final call, but hopefully this
provides some resources to help you understand the options.

Jen
Galaxy team

--
Jennifer Jackson
http://galaxyproject.org


Re: [galaxy-user] Should I use iGenomes verson of a reference GTF for Tophat?

2012-08-23 Thread Jennifer Jackson

Hello Jianguang,

On 8/23/12 11:28 AM, Du, Jianguang wrote:

> Hi Jen,
> Thanks for your help.
> Do you mean that if I want to find novel isoforms/splicing, I need to select
> "No" under "Use Reference Annotation" when I run Cufflinks, and then use the
> iGenomes version of the reference GTF when I run Cuffmerge?

Yes, according to the tool documentation, this is the method.

> Based on your information and some protocols found online, my understanding is
> that:
> 1) If I use the iGenomes version of the reference GTF, I only need to run
> Cuffmerge with the Cufflinks outputs, because the iGenomes reference GTF
> already contains attributes such as p_id and tss_id. The Cuffmerge output can
> then be used for Cuffdiff.

Yes, this is the example protocol I shared.

> 2) However, if I use the reference GTF from Ensembl/UCSC (rather than from
> iGenomes), I need to run Cuffcompare to create p_id and tss_id, which are
> required for Cuffdiff.

This can be tricky; it depends on the order in which you run the tools with 
and without the GTF annotation. The protocol in #1 is recommended.

> Am I right?
>
> Another question: should I use the iGenomes version of the reference GTF when
> I run TopHat if I want to see novel isoforms/splicing?

Yes, this is what I intended to answer in my original reply; I apologize 
if that was not clear. The reference GTF can influence both mapping and 
assembly, that is, both TopHat and Cufflinks. The TopHat web site has more 
information on this parameter (see the link on the TopHat tool form). The 
tool authors can also be contacted about details that are not covered in 
the primary documentation: tophat.cuffli...@gmail.com

Others are welcome to add to the thread with their experiences if they 
have used a reference annotation GTF with TopHat (or chosen not to for a 
particular reason that they would like to share).
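
On the p_id / tss_id point above: one way to see whether a given GTF already
carries these attributes is to scan its ninth column. The following is a
minimal sketch under stated assumptions, not a Galaxy or Cufflinks feature:
the helper names are hypothetical, and the sample record and its IDs are made
up for illustration rather than copied from an iGenomes file.

```python
import re

# Attributes that the thread says Cuffdiff needs in its annotation GTF.
REQUIRED = ("p_id", "tss_id")

# Matches GTF attribute pairs such as: gene_id "GeneA";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def gtf_attributes(line):
    """Return the attribute dict of one GTF line ({} for comments/short lines)."""
    if line.startswith("#"):
        return {}
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return {}
    return dict(ATTR_RE.findall(fields[8]))


def missing_attributes(lines, required=REQUIRED):
    """Return the required attribute names that never appear in any line."""
    seen = set()
    for line in lines:
        seen.update(gtf_attributes(line).keys())
    return [name for name in required if name not in seen]


# Illustrative records only; the IDs below are invented.
sample = (
    'chr1\tunknown\texon\t100\t200\t.\t+\t.\t'
    'gene_id "GeneA"; transcript_id "TxA"; p_id "P1"; tss_id "TSS1";\n'
)
bare = 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "GeneA"; transcript_id "TxA";\n'

print(missing_attributes([sample]))  # nothing missing
print(missing_attributes([bare]))    # both attributes missing
```

For a real file, an open file handle can be passed in place of the list, for
example missing_attributes(open("genes.gtf")), where the file name is a
placeholder.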


Best,

Jen
Galaxy team



Re: [galaxy-user] Should I use iGenomes verson of a reference GTF for Tophat?

2012-08-23 Thread Du, Jianguang
Hi Jen,
I had a problem when I tried to run TopHat with the iGenomes reference GTF.
What I did:
1) Imported the iGenomes version of the mm9 genes.gtf via: Shared Data -> Data 
Libraries -> iGenomes -> click genes.gtf under mm9 -> click "Go" for "Import to 
current history". The genes.gtf appeared in the history and turned green.
2) Clicked "Tophat for Illumina: Find splice junctions using RNA-seq data" to 
open the Tophat for Illumina tool form (version 1.5.0).
3) Selected the dataset to be analysed under "RNA-Seq FASTQ file:".
4) Chose "Use one from the history" under "Will you select a reference genome 
from your history or use a built-in index?:".
The screen then refreshed and the box (pull-down menu) under "Select the 
reference genome:" became smaller. Nothing showed up in the pull-down menu 
(in fact, the menu could not be pulled down), so I could not input the iGenomes 
reference GTF. It looks like TopHat can only "Use a built-in index".
How can I solve this problem?
Thanks in advance.
Jianguang 



From: galaxy-user-boun...@lists.bx.psu.edu 
[galaxy-user-boun...@lists.bx.psu.edu] on behalf of Du, Jianguang 
[jia...@iupui.edu]
Sent: Thursday, August 23, 2012 4:01 PM
To: Jennifer Jackson
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Should I use iGenomes verson of a reference GTF for 
Tophat?

Hi Jen,
Thank you very much for your help.
Jianguang



Re: [galaxy-user] Should I use iGenomes verson of a reference GTF for Tophat?

2012-08-23 Thread Jennifer Jackson

Hello Jianguang,

Two different kinds of data are being mixed up: genome vs. annotation.

reference genome (format: FASTA) vs. reference annotation (format: GTF)

To map your sequences against the mm9 reference genome, choose the locally 
cached (built-in) index and select mm9 from the pull-down menu.

Then, optionally, if you want to guide the mapping with a reference 
annotation GTF file, that is what the genes.gtf file is for. The 
option is set on the TopHat form under:

TopHat settings to use: Full parameter list
Use Own Junctions: Yes
Use Gene Annotation Model: Yes
Gene Model Annotations: select the dataset with the GTF file
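
The genome/annotation distinction can also be checked from the first data line
of a file. The following is a minimal sketch, not something the TopHat form
does: the function name guess_reference_kind is hypothetical, and the
heuristic only looks at a single line.

```python
def guess_reference_kind(first_line):
    """Guess 'fasta', 'gtf', or 'unknown' from a file's first data line."""
    line = first_line.rstrip("\n")
    if line.startswith(">"):
        # FASTA records begin with a '>' header line (a reference genome).
        return "fasta"
    fields = line.split("\t")
    # GTF has 9 tab-separated columns; columns 4 and 5 are integer start/end.
    if len(fields) == 9 and fields[3].isdigit() and fields[4].isdigit():
        return "gtf"
    return "unknown"


# Illustrative lines only; the gene and transcript IDs are invented.
print(guess_reference_kind(">chr1\n"))
print(guess_reference_kind(
    'chr1\tunknown\texon\t100\t200\t.\t+\t.\tgene_id "GeneA"; transcript_id "TxA";\n'
))
```

A file that classifies as "gtf" belongs in the "Gene Model Annotations" field,
not in the reference genome selector.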

Best,

Jen
Galaxy team

