On Thu, Feb 3, 2011 at 11:54 AM, Peter Cock <[email protected]> wrote:
> Hi all,
>
> I'm currently working with some 454 data where the sample was
> amplified with selective primers, and therefore the reads need a
> little processing to remove the primer sequences before assembly
> or mapping (something that sff_extract cleverly spots and warns
> the user about when doing an SFF to FASTA/FASTQ conversion).
>
> The actual processing I want to do is very similar to spotting
> and removing barcodes or adapters - except that PCR primers
> are often degenerate, i.e. have an N in them representing the
> fact it is a pool of primers covering A, C, G and T at that point,
> and primers may come in pairs.
>
> Looking over the provided tools in Galaxy, the only relevant ones
> I saw are as follows:
>
> emboss_5/emboss_primersearch.xml - the text output does not
> look helpful for trimming my sequences - nothing else in Galaxy
> uses this format, does it?
>
> fastx_toolkit/fastx_barcode_splitter.xml - copes with 5' or 3'
> barcodes, but only handles fastqsolexa (discussed recently on the
> mailing list - I guess it could handle fastqsanger and fastqillumina
> as well), not FASTA or SFF. Also according to the FASTX docs for
> fastx_barcode_splitter.pl it requires non-ambiguous barcodes
> (i.e. ACGT only), so using it with ambiguous primers won't work:
> http://hannonlab.cshl.edu/fastx_toolkit/commandline.html
>
> I did look on the tool shed and noticed Edward Kirton has done
> some wrappers for the "Suite of Newbler tools", but his sfffile
> wrapper does not (yet) include support for splitting SFF files using
> Roche's MID barcodes.
>
> Are there any other relevant tools I have overlooked?

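To make the degenerate primer trimming described above concrete, here is
a minimal Python sketch. It is only an illustration, not an existing
Galaxy tool or part of FASTX: the function names (primer_to_regex,
reverse_complement, trim_primers) are made up for this example, it
assumes the forward primer sits at the very start of the read, and it
only does exact matching against the IUPAC ambiguity codes, so it would
not tolerate mismatches or the homopolymer errors typical of 454 reads.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

# Translation table giving the complement of each IUPAC code.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_to_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern string."""
    return "".join(IUPAC[base] for base in primer.upper())


def reverse_complement(primer):
    """Reverse complement a primer, keeping ambiguity codes consistent."""
    return primer.upper().translate(COMPLEMENT)[::-1]


def trim_primers(read, forward, reverse=None):
    """Hypothetical helper: strip a 5' forward primer, and optionally the
    reverse complement of a reverse primer from the 3' end.

    Returns the trimmed read, or None if the forward primer is not found
    at the start of the read.
    """
    seq = read.upper()
    match5 = re.match(primer_to_regex(forward), seq)
    if match5 is None:
        return None
    start, end = match5.end(), len(seq)
    if reverse is not None:
        # The reverse primer shows up reverse complemented at the 3' end.
        pattern3 = primer_to_regex(reverse_complement(reverse)) + "$"
        match3 = re.search(pattern3, seq[start:])
        if match3 is not None:
            end = start + match3.start()
    return read[start:end]
```

For example, trim_primers("ACGATTTTGGGG", "ACGNT") would strip the
five-base forward primer, with the N matching the A in the read. A real
tool would presumably also need to handle FASTQ qualities and SFF trim
coordinates, which this sketch ignores.
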
I forgot to mention fastx_toolkit/fastx_clipper.xml aka "Clip" which
does handle FASTA and FASTQ files, but apparently only deals with 3'
adapters (although perhaps the poorly documented -d switch is relevant
for a 5' adapter?), and appears to only handle one adapter sequence at
a time. The documentation doesn't mention what happens if you want to
use an ambiguous adapter sequence (e.g. with an N in it).

Peter
_______________________________________________
galaxy-dev mailing list
[email protected]
http://lists.bx.psu.edu/listinfo/galaxy-dev
