Re: [galaxy-user] extracting a subset of sequences from a very large fasta file(1.5 million)

Peter Cock Fri, 30 Nov 2012 08:29:31 -0800

On Fri, Nov 30, 2012 at 3:55 PM, Perumal Vijayan <[email protected]> wrote:
> I have successfully uploaded a large fasta file (2.5 million genomic
> sequence contigs) onto Galaxy server.  I wish to extract a subset of
> sequences from this file.  I have a list of the fasta headers.  Is there a
> way I can accomplish this on Galaxy?


Yes. If you are running your own Galaxy instance, you could use
one of these two tools available on the Galaxy Tool Shed:
http://toolshed.g2.bx.psu.edu/

 'seq_filter_by_id' - returns a filtered version of the sequence file
with only those entries on your list. It can output two files (the
sequences on the list and those not on the list), or just the
sequences not on the list.
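To illustrate the filtering idea outside Galaxy, here is a minimal plain-Python sketch. It is not the tool's actual code: it assumes the record ID is the first word after '>' on each header line, and the function names are hypothetical. It reads the FASTA file once, keeps only the ID set in memory, and routes each record to a "matched" or "unmatched" output while preserving file order.

```python
import io
from typing import Iterable, Iterator, List, Optional, Set, TextIO, Tuple


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header_line, sequence_lines) for each record in a FASTA stream."""
    header: Optional[str] = None
    seq_lines: List[str] = []
    for line in handle:
        if line.startswith(">"):
            if header is not None:
                yield header, seq_lines
            header, seq_lines = line, []
        elif header is not None:
            seq_lines.append(line)  # text before the first '>' is ignored
    if header is not None:
        yield header, seq_lines


def record_id(header: str) -> str:
    """Assumption: the record ID is the first whitespace-delimited word after '>'."""
    words = header[1:].split()
    return words[0] if words else ""


def filter_by_id(
    fasta: Iterable[str],
    wanted: Set[str],
    matched_out: Optional[TextIO],
    unmatched_out: Optional[TextIO] = None,
) -> Tuple[int, int]:
    """Stream records once, routing each to matched_out or unmatched_out.

    Records keep their original file order. Either output may be None to
    discard that half. Returns (matched_count, unmatched_count).
    """
    matched = unmatched = 0
    for header, seq_lines in read_fasta(fasta):
        if record_id(header) in wanted:  # set lookup is O(1) per record
            matched += 1
            target = matched_out
        else:
            unmatched += 1
            target = unmatched_out
        if target is not None:
            target.write(header)
            target.writelines(seq_lines)
    return matched, unmatched


if __name__ == "__main__":
    demo = io.StringIO(">a first\nACGT\n>b\nGGCC\nTTAA\n>c\nAAAA\n")
    hits, misses = io.StringIO(), io.StringIO()
    print(filter_by_id(demo, {"c", "a"}, hits, misses))  # (2, 1)
    print(hits.getvalue(), end="")
```

Because only the set of wanted IDs is held in memory, this approach should scale to files with millions of records; pass real file handles opened in text mode in place of the StringIO objects.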

 'seq_select_by_id' - like the above but indexes the sequence file
in order to extract the requested entries in the order given (rather
than the order in the sequence file).
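The indexing approach can be sketched the same way. Again this is a hypothetical illustration, not the tool's implementation: it assumes the ID is the first word after '>', records each record's byte offset and length in one pass, then seeks to each requested record so the output follows the order of the ID list. This sketch raises ValueError on duplicate IDs and KeyError for IDs missing from the file.

```python
import io
from typing import BinaryIO, Dict, Iterable, Optional, Tuple


def build_offset_index(handle: BinaryIO) -> Dict[str, Tuple[int, int]]:
    """Map each record ID to (byte_offset, byte_length) of its full record.

    Only offsets are kept in memory, not the sequences themselves.
    """
    index: Dict[str, Tuple[int, int]] = {}
    offset = 0
    start = 0
    current_id: Optional[str] = None
    handle.seek(0)
    for line in handle:
        if line.startswith(b">"):
            if current_id is not None:
                index[current_id] = (start, offset - start)
            words = line[1:].split()  # assumed ID: first word after '>'
            current_id = words[0].decode("utf-8") if words else ""
            if current_id in index:
                raise ValueError(f"duplicate record ID: {current_id!r}")
            start = offset
        offset += len(line)
    if current_id is not None:
        index[current_id] = (start, offset - start)
    return index


def select_by_id(handle: BinaryIO, ids: Iterable[str], out: BinaryIO) -> int:
    """Write the requested records to out in the order given in ids."""
    index = build_offset_index(handle)
    written = 0
    for seq_id in ids:
        start, length = index[seq_id]  # KeyError if the ID is not in the file
        handle.seek(start)
        chunk = handle.read(length)
        # Guard against a final record that lacks a trailing newline.
        out.write(chunk if chunk.endswith(b"\n") else chunk + b"\n")
        written += 1
    return written


if __name__ == "__main__":
    demo = io.BytesIO(b">a first\nACGT\n>b\nGGCC\nTTAA\n>c\nAAAA\n")
    out = io.BytesIO()
    select_by_id(demo, ["c", "a"], out)
    print(out.getvalue().decode(), end="")
```

The trade-off versus the streaming filter is random access: the file must be seekable (a real file opened in "rb" mode, not a pipe), but the output can follow any order you choose.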

Both of these tools work on FASTA, FASTQ and SFF files.

Peter
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/
