Hi Ashlee! Response below...

Ashlee Earl wrote:
> Hi Aaron,
>
> I have 3 files each containing contigs from newly sequenced genomes of one
> bacterial species- 33-130 contigs/genome. I ran progressive Mauve with
> these new sequences against our fully sequenced reference genome. The
> results were not so straightforward (i.e., there should have been much more
> alignment) so I am wondering whether Mauve is capable of aligning fragmented
> sequences. Also, I have noticed from my own small scale analyses that some
> of the contigs are in reverse orientation relative to our reference. Could
> this be part of the problem?
>
> Thanks so much in advance for your advice.
>
> Best, Ashlee
>
progressiveMauve should be able to handle alignment of partially sequenced microbial genomes without trouble, although it may require some fine tuning of the parameters.

If the genomes are very closely related (>98% sequence identity), the default minimum LCB score may be too low, and misalignments may arise in repetitive regions. For example, in a pairwise alignment of Shigella and E. coli, I found I had to set the minimum LCB score to around 1,000,000 to get a reasonable progressiveMauve alignment. Note that the scoring has changed between the original Mauve and progressiveMauve, so scores/weights don't translate from one method to the other.

If the genomes are more divergent, say 60% to 80% identity in homologous regions, a poor alignment may indicate that Mauve can't find any alignment anchors. Setting a lower match seed weight may help; for a small microbial genome, a seed weight as low as 11 or even 9 may be reasonable. If the genomes have less than 50% nucleotide identity in homologous regions, it's probably necessary to include a phylogenetic intermediate in order to get a Mauve alignment.

If a substantial amount of genome rearrangement has taken place, and "false" rearrangement exists due to incorrect contig order, then the default minimum LCB score may be too high. The default value gets calculated by a rather arcane formula and is printed to the console log. It is probably in the neighborhood of 30,000 for bacterial genomes, so perhaps 15,000 would be a good place to start.

It would also be a good idea to reorder the contigs to match the reference genome order as closely as possible. A Mauve alignment can be used as a guide for manually reordering contigs, but other programs have been designed specifically for contig ordering.
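To make the manual reordering step concrete: once you know, for each contig, roughly where it lands on the reference and on which strand (for instance from reading a Mauve alignment by hand), putting the contigs in reference order and flipping the ones that match the opposite strand is just a sort plus a reverse complement. The Python below is a minimal sketch of that step under those assumptions. The `Placement` record, its fields, and the helper functions are hypothetical names for illustration, not part of Mauve, and the sketch does not work out the placements itself.

```python
from dataclasses import dataclass


# Hypothetical record describing where a contig matches the reference.
# The placement itself must come from elsewhere (e.g. reading a Mauve
# alignment by hand); this sketch does not compute it.
@dataclass
class Placement:
    name: str
    sequence: str
    ref_start: int  # position of the contig's match on the reference
    reverse: bool   # True if the contig matches the reference's opposite strand


# Complement table for A/C/G/T/N; any other character passes through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def order_contigs(placements):
    """Sort contigs by reference position, reverse-complementing
    those that match the opposite strand. Returns (name, sequence) pairs."""
    ordered = sorted(placements, key=lambda p: p.ref_start)
    return [
        (p.name, reverse_complement(p.sequence) if p.reverse else p.sequence)
        for p in ordered
    ]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping at `width`."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    contigs = [
        Placement("contig2", "AACCG", ref_start=5000, reverse=True),
        Placement("contig1", "GGTTA", ref_start=100, reverse=False),
    ]
    # prints contig1 (GGTTA) first, then contig2 reverse-complemented (CGGTT)
    print(to_fasta(order_contigs(contigs)), end="")
```

The reordered FASTA file could then be fed back into progressiveMauve in place of the original contig file.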
For dedicated contig ordering, Projector2 comes to mind:
http://bioinformatics.biol.rug.nl/websoftware/projector2/projector2_start.php

If after some fine tuning it still seems like Mauve isn't aligning things that it should, send me the data off-list and I'll take a closer look to be sure it's not the result of a software bug. Or maybe you've got a rather large pan-genome on your hands?

-Aaron

_______________________________________________
Mauve-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mauve-users
