On Fri, Nov 30, 2012 at 3:55 PM, Perumal Vijayan <[email protected]> wrote:
> I have successfully uploaded a large fasta file (2.5 million genomic
> sequence contigs) onto Galaxy server. I wish to extract a subset of
> sequences from this file. I have a list of the fasta headers. Is there a
> way I can accomplish this on Galaxy?
Yes, if you are running your own Galaxy instance you could use one of
these two tools available on the Galaxy tool shed:
http://toolshed.g2.bx.psu.edu/

'seq_filter_by_id' - returns a filtered version of the sequence file
with only those entries on your list. This can output two files, those
on the list and those not on the list, or just the sequences not on
the list.

'seq_select_by_id' - like the above but indexes the sequence file in
order to extract the requested entries in the order given (rather than
the order in the sequence file).

Both of these tools work on FASTA, FASTQ and SFF files.

Peter
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/
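For anyone who wants to see the difference between the two behaviours outside Galaxy, here is a rough standalone Python sketch. It is not the code of either tool shed tool, only an illustration for FASTA input. The function names (parse_fasta, record_id, filter_by_id, select_by_id) are made up for this example, and it assumes the identifier is the first whitespace-delimited word of each header line. Silently skipping IDs that are not found is also an assumption of the sketch, not a statement about what the real tools do.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header line without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_id(header: str) -> str:
    """Assumption: the ID is the first word of the header line."""
    return header.split(None, 1)[0] if header.strip() else ""


def filter_by_id(records: Iterable[Record], wanted: Iterable[str]) -> List[Record]:
    """Keep listed records, preserving their order in the sequence file."""
    wanted_set = set(wanted)
    return [rec for rec in records if record_id(rec[0]) in wanted_set]


def select_by_id(records: Iterable[Record], wanted: Iterable[str]) -> List[Record]:
    """Index records by ID, then return them in the order of the ID list."""
    index: Dict[str, Record] = {}
    for rec in records:
        # Keep the first record seen for each ID.
        index.setdefault(record_id(rec[0]), rec)
    # Assumption of this sketch: IDs missing from the file are skipped.
    return [index[i] for i in wanted if i in index]


# Small in-memory example instead of a real file.
example_lines = ">c1 first\nACGT\nAC\n>c2\nGGGG\n>c3 third\nTT\n".splitlines()
example_records = list(parse_fasta(example_lines))
example_ids = ["c3", "c1"]

filtered = filter_by_id(example_records, example_ids)  # file order: c1, c3
selected = select_by_id(example_records, example_ids)  # list order: c3, c1
```

With 2.5 million contigs the in-memory dictionary in select_by_id may get large; a real tool would more likely index file offsets than hold every sequence in memory.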

