Re: [Mauve-users] Contigs

Aaron Darling Wed, 07 Feb 2007 19:40:48 -0800

Hi Ashlee!
response below...

Ashlee Earl wrote:
> Hi Aaron,
>
> I have 3 files each containing contigs from newly sequenced genomes of one
> bacterial species- 33-130 contigs/genome.  I ran progressive Mauve with
> these new sequences against our fully sequenced reference genome.  The
> results were not so straightforward (i.e., there should have been much more
> alignment) so I am wondering whether Mauve is capable of aligning fragmented
> sequences.  Also, I have noticed from my own small scale analyses that some
> of the contigs are in reverse orientation relative to our reference.  Could
> this be part of the problem?
>
> Thanks so much in advance for your advice.
>
> Best, Ashlee
>


Progressive Mauve should be able to handle alignment of 
partially-sequenced microbial genomes without trouble, although it may 
require some fine tuning of the parameters.
If the genomes are very closely related (>98% sequence identity), then 
the default minimum LCB score may be too low and misalignments in 
repetitive regions may arise.  For example, in a pairwise alignment of 
Shigella and E. coli, I found I had to set the minimum LCB score to 
around 1,000,000 to get a reasonable progressiveMauve alignment.  The 
scoring has changed between the original Mauve and progressiveMauve, so 
the scores/weights don't translate between methods.

If the genomes are more divergent, say between 60% and 80% identity in 
homologous regions, a poor alignment may indicate that Mauve can't find 
any alignment anchors.  It may help to set a lower Match seed weight.  
For a small microbial genome, it may be reasonable to use a seed weight 
as low as 11 or even 9.  If the genomes have < 50% nucleotide identity 
in homologous regions, then it's probably necessary to include a 
phylogenetic intermediate in order to get a Mauve alignment.

If a substantial amount of genome rearrangement has taken place, and 
"false" rearrangement exists due to incorrect contig order, then the 
default value for the minimum LCB score may be too high.  The default 
value gets calculated by a rather arcane formula and printed to the 
console log.  The default is probably in the neighborhood of 30,000 for 
bacterial genomes, so perhaps 15,000 would be a good place to start.
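
The rules of thumb above can be collected into a small helper.  This is 
only a sketch of the advice in this message, not anything Mauve does 
itself: the function names and the dict layout are made up for 
illustration, the identity estimate is assumed to come from your own 
analysis, and the --weight and --seed-weight option names are taken from 
the progressiveMauve command-line help as I recall it, so check them 
against your installed version before relying on them.

```python
def suggest_parameters(identity, contigs_misordered=False):
    """Turn the rules of thumb into starting values (hypothetical helper).

    identity: estimated nucleotide identity in homologous regions, 0.0-1.0.
    contigs_misordered: True if real rearrangement or wrong contig order
        is suspected.
    A value of None means "leave the program default alone".
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")

    advice = {"min_lcb_score": None, "seed_weight": None, "notes": []}

    if identity > 0.98:
        # Closely related genomes: default minimum LCB score may be too low.
        advice["min_lcb_score"] = 1_000_000
        advice["notes"].append("closely related: raise the minimum LCB score")
    elif identity < 0.50:
        advice["notes"].append("very divergent: add a phylogenetic intermediate")
    elif 0.60 <= identity <= 0.80:
        # Anchors may be hard to find: lower the match seed weight.
        advice["seed_weight"] = 11
        advice["notes"].append("divergent: try seed weight 11, or even 9")

    if contigs_misordered:
        if advice["min_lcb_score"] is None:
            # Roughly half of the ~30,000 default mentioned above.
            advice["min_lcb_score"] = 15_000
        else:
            advice["notes"].append(
                "conflicting advice: also try a lower score such as 15000")
    return advice


def to_args(advice):
    """Format the advice as command-line options (option names assumed)."""
    args = []
    if advice["min_lcb_score"] is not None:
        args.append(f"--weight={advice['min_lcb_score']}")
    if advice["seed_weight"] is not None:
        args.append(f"--seed-weight={advice['seed_weight']}")
    return args
```

For example, to_args(suggest_parameters(0.70, contigs_misordered=True)) 
gives a lowered minimum LCB score together with a lowered seed weight.  
Treat the numbers as starting points for tuning, not as final settings.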

It would also be a good idea to reorder the contigs to match the 
reference genome order as much as possible.  A Mauve alignment can be 
used as a guide for manually reordering contigs, but other programs have 
been designed specifically for the purpose of contig ordering.  
Projector2 comes to mind:
http://bioinformatics.biol.rug.nl/websoftware/projector2/projector2_start.php
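
If you do reorder by hand, the bookkeeping is simple enough to script.  
The sketch below is not how Projector2 works; it assumes you already 
have, from some earlier mapping step, a start position on the reference 
and a strand for each contig (the "placements" dict here is 
hypothetical).  Contigs are sorted by reference position, those on the 
minus strand are reverse-complemented, and unplaced contigs are kept at 
the end.

```python
# Complement table for DNA; N is left as N.
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def reorder_contigs(contigs, placements):
    """Order contigs by their position on the reference.

    contigs: dict mapping contig name -> sequence string.
    placements: dict mapping contig name -> (reference_start, strand),
        where strand is "+" or "-".  Contigs missing from this dict are
        treated as unplaced and appended unchanged.
    Returns a list of (name, sequence) tuples in the new order.
    """
    placed = sorted((name for name in contigs if name in placements),
                    key=lambda name: placements[name][0])
    unplaced = [name for name in contigs if name not in placements]

    ordered = []
    for name in placed:
        seq = contigs[name]
        if placements[name][1] == "-":
            # Flip contigs that map to the reverse strand of the reference.
            seq = reverse_complement(seq)
        ordered.append((name, seq))
    ordered.extend((name, contigs[name]) for name in unplaced)
    return ordered
```

The reordered list can then be written back out as a multi-FASTA file 
and used as input for a fresh alignment.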

If after some fine tuning it still seems like Mauve isn't aligning 
things that it should be, send me the data off-list and I'll do a closer 
investigation to be sure it's not the result of a software bug. 

Or maybe you've got a rather large pan-genome on your hands?

-Aaron



