Hello Vincent,

For "percentage match", there are two interpretations.

1) what percent of your total data matches a particular taxonomic group

2) what percent (coverage/identity) of a query sequence matches a target "hit", leading to a taxonomic assignment


For #1:
The other tools in the "Metagenomic analyses" group all accept "Fetch taxonomic representation" output as input and produce various summary, graphical, and statistical information. Please give these tools a try.

On these tool forms, the "Fetch taxonomic representation" tool is referred to as "Taxonomy manipulation->Fetch Taxonomic Ranks", as I noted in my prior email. This is legacy naming from the prior publication, and we apologize if it caused confusion. It should probably be updated; I will bring it up with the team.

For #2:
I sent instructions to Scott Tighe this morning with one example of how to use individual tools to select, sort, group, and filter data.
http://lists.bx.psu.edu/pipermail/galaxy-user/2012-March/004349.html
While the details of your analysis may differ, the basic tool set will probably be the same for your project. Filtering data by alignment quality prior to "Fetch Taxonomic Representation" was also part of the Metagenomics example in the publication we shared. The idea is to start with the parsed BLAST output, generate statistics, filter and group data based on those results, then go forward with taxonomic assignments. There is no single automated tool that performs this whole process in one step.
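
For anyone who prefers to prototype the filter-then-group idea outside of Galaxy, here is a minimal Python sketch. It is only an illustration, not one of the Galaxy tools: it assumes the standard 12-column tabular BLAST layout (column 3 = percent identity, column 4 = alignment length, column 12 = bit score), and the function names and cutoff values (90% identity, 50 bases) are hypothetical choices that you would adjust for your own data.

```python
import csv
import io


def filter_hits(lines, min_identity=90.0, min_length=50):
    """Keep rows that pass both cutoffs.

    Assumes 12-column tabular BLAST output:
    column 3 = percent identity, column 4 = alignment length.
    """
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        identity = float(row[2])
        length = int(row[3])
        if identity >= min_identity and length >= min_length:
            kept.append(row)
    return kept


def best_hit_per_read(rows):
    """Group rows by query ID and keep the hit with the highest bit score."""
    best = {}
    for row in rows:
        read_id = row[0]
        if read_id not in best or float(row[11]) > float(best[read_id][11]):
            best[read_id] = row
    return best


# Made-up example rows, for illustration only.
example = (
    "read1\t111_2000\t98.5\t120\t2\t0\t1\t120\t5\t124\t1e-50\t200\n"
    "read1\t222_1500\t85.0\t60\t9\t0\t1\t60\t5\t64\t1e-10\t70\n"
    "read2\t333_900\t99.0\t30\t0\t0\t1\t30\t1\t30\t1e-08\t55\n"
)
rows = filter_hits(io.StringIO(example))
best = best_hit_per_read(rows)
```

Only the rows that survive the filter would then be passed on to the taxonomy step, so low-quality matches never reach the summary.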

Hopefully this helps to clear up the tool set,

Best,

Jen
Galaxy team

On 3/13/12 2:49 PM, Montoya, Vincent wrote:
Hello
I previously asked whether I could retrieve more information from "Fetch Taxonomic
Representation", because my summarized taxonomy has results for just about every organism
imaginable.  Hence the need to find out the percentage match for each of these results.  Currently,
the Megablast results give you alignment information, but "Fetch taxonomic
representation" gives you none and provides nothing that lets you match its output with the
Megablast results.
I appreciate the previous emails, but the comments and references do not 
address this problem.
Thanks
Vincent
________________________________________
From: John Major [john.e.major...@gmail.com]
Sent: Monday, March 12, 2012 11:28 AM
To: Jennifer Jackson
Cc: Montoya, Vincent; galaxy-u...@bx.psu.edu
Subject: Re: [galaxy-user] Metagenomics

A small warning re: the current cloud Blast+ config.

To properly use the metagenomic tools, if you use the blast+ galaxy tool, make 
sure to export in blast.XML, then you'll need a script to parse out the readID 
and the Hit_def (as the hit ID).  It appears that the 'Hit_def' field contains 
the correct key to the taxonomy database.  Specifically, the Hit_def field is 
in the format #_#, where the 'gi' id is the first #.  The tabular (normal and 
extended) data does not contain this info.
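
A rough Python sketch of the kind of parsing script described above, using only the standard library. The element names (Iteration, Iteration_query-def, Hit, Hit_def) are the ones used in BLAST XML output; the function name is hypothetical, and the code assumes, as described above, that Hit_def has the form #_# with the gi as the first number. Check a few records of your own output before relying on it.

```python
import xml.etree.ElementTree as ET


def read_to_gi(xml_text):
    """Return (read ID, gi) pairs from BLAST XML text.

    Assumes Hit_def looks like '<gi>_<number>', so the gi is the
    part before the first underscore.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for iteration in root.iter("Iteration"):
        read_id = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            hit_def = hit.findtext("Hit_def", default="")
            gi = hit_def.split("_", 1)[0]
            pairs.append((read_id, gi))
    return pairs


# Made-up, heavily trimmed record, for illustration only.
example = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>read1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>12345_2000</Hit_def>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

pairs = read_to_gi(example)
for read_id, gi in pairs:
    # Tab-separated output: read ID, then gi.
    print(f"{read_id}\t{gi}")
```

The two-column output could then be uploaded as a tabular dataset, with the gi column used as the taxonomy key.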

I noticed this after attempting to use the tabular data, and using a trimmed 
col[1] (supposed to be hit seqID), but my results always came back as a ranked 
list of the most sequenced genomes in nt.... basically  keying in randomly.

j

On Wed, Mar 7, 2012 at 4:16 PM, Jennifer Jackson <j...@bx.psu.edu> wrote:
Hi Vincent, Scott,

Filtering raw hits is an important part of a metagenomics analysis pipeline. 
Please see the methods described in the published metagenomics analysis paper 
associated with this tool set:

Kosakovsky Pond S, Wadhawan S, Chiaromonte F, Ananda G, Chung W, Taylor J, and Nekrutenko 
A. "Windshield splatter analysis with the Galaxy metagenomic pipeline". Genome 
Research. 2009 Nov; 19(11):2144-53.

http://www.ncbi.nlm.nih.gov/pubmed/19819906

Live supplemental data that can be imported and experimented with is available 
on the public instance, including raw data, working histories, and a tutorial 
that demonstrates step-by-step the exact methods used in the publication:
http://main.g2.bx.psu.edu/u/aun1/p/windshield-splatter
http://main.g2.bx.psu.edu/library ->  see "Windshield splatter"

Not all tools are available on the public main server, but a local or cloud 
instance could be used with wrapped tools from the Distribution or Tool Shed, 
as necessary. For example, BLAST is not available on the public instance, but 
is included in the distribution for use in local or cloud instances. 
http://getgalaxy.org

Hopefully you will both find this helpful,

Jen
Galaxy project




On 2/29/12 5:32 PM, Montoya, Vincent wrote:
Hello
I am a relatively new user on Galaxy and I had a question regarding "Fetching 
Taxonomic Information".  It is great that I can retrieve all of the hits for each 
sequence, but I cannot seem to find an option to also provide how accurate of a match it 
is to the given taxon.  For instance, a percentage match.  I can access this information 
in the original file and programmatically retrieve it but, it would be nice if it came in 
one package so that I can avoid those false hits that have a low percentage match.  Can 
you please provide me with instructions on how best to retrieve this information 
(hopefully in a single file)?
Thank you
Vincent
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

   http://lists.bx.psu.edu/

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support
