Aaron,
Please do check for contaminants. Our experience with service providers and
QC... I could probably write a book :(. The FastQC suite is a good place to
start (a Galaxy wrapper is also available for it). Even for 454 data, which has
neither fixed base positions nor fixed read lengths, it is quite informative
(k-mer overrepresentation and such).
In addition, check for contaminating sequences (e.g. E. coli or Mycoplasma
sequences, which you would not expect when sequencing human cells, but
experience says you had better check).
If I remember correctly, the MIRA documentation also has some information on
this kind of filtering prior to assembly.
Please keep us posted on your progress.
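
The contaminant check suggested above can be illustrated with a very rough
k-mer screen. This is only a sketch under stated assumptions, not how FastQC or
MIRA do it: the function names, the k-mer size and the toy sequences are all
made up for illustration, and a real screen would normally use a proper aligner
against E. coli or Mycoplasma reference genomes.

```python
# Hypothetical helper names; a toy k-mer screen, not a replacement for an aligner.

def kmers(seq, k):
    """Return the set of all k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of a DNA sequence (non-ACGT characters are kept as-is)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_index(contaminants, k=12):
    """Collect k-mers from both strands of every contaminant sequence.

    contaminants: dict mapping a name to a sequence string.
    """
    index = set()
    for seq in contaminants.values():
        index |= kmers(seq, k) | kmers(revcomp(seq), k)
    return index


def contaminant_fraction(read, index, k=12):
    """Fraction of the read's distinct k-mers that also occur in the index."""
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k
        return 0.0
    return len(read_kmers & index) / len(read_kmers)


if __name__ == "__main__":
    # Toy sequence standing in for a real contaminant reference (assumption).
    refs = {"toy_contaminant": "ATGACCATGATTACGGATTCACTGGCCGTCGTTTTACAACG"}
    index = build_index(refs, k=8)
    for read in ("CATGATTACGGATTCACTGGCCGTC", "TTTTTTTTTTTTTTTTTTTT"):
        print(read, contaminant_fraction(read, index, k=8))
```

Reads with a high fraction would be candidates for a closer look; where to put
the cut-off is a judgment call that depends on k and on the read length.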

@Peter; hope you manage to take a flight and join the conference. A pity I 
won't be there but it looks very promising...

Alex



From: Aaron Jex [mailto:a...@unimelb.edu.au]
Sent: Tuesday 24 May 2011 8:52
To: Bossers, Alex
Subject: RE: [galaxy-user] (no subject)

Hi Alex,

Thanks for the email.  I will have to have a closer read of the MIRA 
documentation I think.  I know that it definitely makes use of the quality data 
to some extent, but I hadn't considered whether it ignores low quality data or 
not (perhaps there's a threshold setting I could use - I'll check that).  I'm 
not too worried about adaptor sequence at the moment as these "should" be 
trimmed by our sequencing service, and I clip the ends on the reads when I 
extract the qual and fasta files from the original sff files anyways.

Best regards,
Aaron

Aaron Jex, BSc, PhD
Senior Research Officer,
Department of Veterinary Science,
The University of Melbourne,
250 Princes Highway,
Werribee, Victoria,
3030
tel: +61 3 9731 2294
From: Bossers, Alex [mailto:alex.boss...@wur.nl]
Sent: Tuesday, 24 May 2011 4:44 PM
To: Aaron Jex; galaxy-u...@bx.psu.edu
Subject: RE: [galaxy-user] (no subject)

Aaron,
As far as I remember, doesn't MIRA take the low/high quality bases into account
anyway? So there is no need to filter for quality there, right?
The only filtering needed is for contaminating sequences (incl. adapters and
such). You should check the MIRA website to be sure, though.

I have used the high-quality segments as in the metagenomics example, but
indeed you lose the exact quality info. Those bases are already above the
provided threshold, though (default: above 20 in the Sanger quality score range).

Alex


From: galaxy-user-boun...@lists.bx.psu.edu
[mailto:galaxy-user-boun...@lists.bx.psu.edu] On behalf of Aaron Jex
Sent: Tuesday 24 May 2011 1:40
To: galaxy-u...@bx.psu.edu
Subject: [galaxy-user] (no subject)

Hi,

Can't seem to find an answer to this on your wiki site and it's not in the 
tutorial.  I would like to filter my 454 reads for high quality regions, rename 
the resulting sequence fragments AND relink the new reads (fragments) to the 
original quality data so that I can take these filtered reads and assemble them
using MIRA. Is there a way to do this with Galaxy?  So basically all I want to 
do is take the new read fragments I get from converting the tabular file to the 
fasta file as shown in your metagenomics tutorial, and generate a corresponding 
qual file for these 'new' reads.
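
The request above comes down to carrying each fragment's quality values along
with its sequence when a read is cut into high-quality regions. A minimal
Python sketch of that idea follows. It is not a Galaxy tool or workflow: the
function names, the `_segN` naming scheme and the `min_len` parameter are
assumptions for illustration, and the threshold of 20 simply follows the
default mentioned elsewhere in this thread.

```python
import io


def high_quality_segments(read_id, seq, quals, min_qual=20, min_len=50):
    """Yield (new_id, sub_seq, sub_quals) for each run of bases with quality >= min_qual.

    Runs shorter than min_len are dropped. Fragments are named
    <read_id>_seg1, <read_id>_seg2, ... (a made-up naming scheme).
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ for " + read_id)
    start = None
    count = 0
    # The trailing -1 is a sentinel that closes a run reaching the end of the read.
    for i, q in enumerate(list(quals) + [-1]):
        if q >= min_qual:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                count += 1
                yield (f"{read_id}_seg{count}", seq[start:i], list(quals[start:i]))
            start = None


def write_fasta_qual(segments, fasta_handle, qual_handle):
    """Write matching FASTA and QUAL records so the two files stay in sync."""
    for new_id, sub_seq, sub_quals in segments:
        fasta_handle.write(f">{new_id}\n{sub_seq}\n")
        qual_handle.write(f">{new_id}\n{' '.join(map(str, sub_quals))}\n")


if __name__ == "__main__":
    # Toy read; real input would come from the fasta/qual files extracted from the sff.
    fasta, qual = io.StringIO(), io.StringIO()
    segs = high_quality_segments(
        "read1", "ACGTACGTAC", [30, 30, 30, 5, 30, 30, 30, 30, 2, 40],
        min_qual=20, min_len=3,
    )
    write_fasta_qual(segs, fasta, qual)
    print(fasta.getvalue())
    print(qual.getvalue())
```

Because the FASTA and QUAL records are written from the same tuple, the
fragment names and lengths in the two files cannot drift apart, which is what
an assembler reading paired fasta/qual input would need.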

Best regards,
Aaron

Aaron Jex, BSc, PhD
Senior Research Officer,
Department of Veterinary Science,
The University of Melbourne,
250 Princes Highway,
Werribee, Victoria,
3030
tel: +61 3 9731 2294
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/
