Hi Aaron,
I have a question concerning the XMFA format.
I'm trying to extract the aligned sequences corresponding to matches (in
lower case) from this format, and I realize that insertions located
between two matches are also written in lower case. This makes parsing
really complicated, because it is not possible to identify matches
automatically from the XMFA file (the number of lower-case blocks is not
the same in all the aligned genomes). Don't you think that insertions
should be written in upper case? Or do you have any other suggestion?
I hope my question is clear, but I'm not sure...
Thanks,
Helene

Here is an example of an insertion located between two matches:

#FormatVersion Mauve1

> 1:1-104603 + AE017332_GR.fna
atttaatagcaaaaaAtaattactaattaaattattatgccAAACATATTATAAAAATTTtgctagcctcaaaagttaca
taaattacgatacttttttaaattaaataagattaattctaataaattgtataaaaaaccgagtttttcccgagtttccc
agtttgatgagataattttaaaaaagtgtccggtttcccgtgcttttttaagttttttgccaaaagttaattacctattt
ttttactgatttattaagatatcaaagtagaaatcatacctatttcaaatattgggatcactaggaacaccatcgcgatt
tttaaatatcatttgctgatctaaagtgttaaattcaaccctttgccttaatttttgcaacatcgttactaaaccaactt
cagttaattgactgtttgcaaataacttgttatcactaatatgtttgttgattttaaaaattatagtttttaaaataact
aatgagataaagcacaaaaatgtatgagctaggatatgctcatctattcttaaaaatacaggacgaatattcaataaacc
ttttagacttctaaaattagcttcaatatttcactgtttttggtatttttcaactatgtctaagacatttaaatttagta
tatttgtttcatagacatagtaaccatcaaattgtttgtccttgtcaattttactttgatctaattcaaatttcatgttt
gaaatttccctaaaatatttaggttttttaccaaacaatttgtttacttcaataaaaccatctttattttgttttttaat
aaaactttggatttgctcttcgcgagccttgctgtcttttattgctctttttttactgtaagtaataattcttcttctaa
tattttcagtgtatcttttatttttataagatgaataaaattcttcttttttatacttaaaatctgcatttacatcaaca
taatcgctaggatctaataaataatttttaaatttttgactgcctatctttgcacgataagaaataatgaaattatattc
ttttgattcaagaaatcgaatatttgcagcagtggacatgccacggtcagcaattattgtcatatttttgatattatatt
tagattcaatatctaatacaaaagggattagggtactagaatcggctgtatttcctttaaaaactttaatatgaaaagga
atgccatttttatcgcatgctaaagcaatgacaatttgatcttctttgaatttagcatctttagaatagccaggaattct
taatccatttctttcaaatgtctcaaaatagactgttgatgaatcaaaataaaattcattgtccctttttccaagctcgc
ttataaccattttgttaagactatctaaaagttgattttgtgattcatagacaagatctaagagtctataaaagctattt
ttggaagtatttatttgatttgagtaatcatcctttttatcaaaagcattaataatgctaccaggatcagtaattcgctt
tgaaattaagtagttaaaaatttgtctcatatttttatgtctacttttaggaagtgattcaaaaatattgtgcttttcaa
taagtttttcaattaattccccaccaacaaaaaccgaaccttcgattatggcggaatttttaatagaatcaagtaaagta
gtcttgattttatctttgtcatctaaatttgaaaataatttcaatttttctttgataatttgaatagcatttggattaat
tttttccaaagtttgcacatttcctatactaaatcaacgttttggagctttcccataaccttgtgttcagccaacatatt
tatatattttatcttttgaagatcctcaaacattaaacaaaatcaaattatgctttttcataatttataattataccata
aaattactttaaaattgtaattaattgtaattaataaaaattaaggttttacgggtaaaaaacaccgatttacggggatt
tctgcatattcatattcactaaaaagtgaatttttgccatcaaactgggaaactcaggaaataagattaattctaataaa
ttgtataaaaaaccgagtttttag-aaaaaaatgaaattaatttatctaaaaactcggttttcttaatttttcgctattt
ttttatagttttctttttttagtcatcttccttaattgctctttgttttacgatgacatcggaaatatttttgggtacaa
tttcatagtgattaaattgcatctgataggttccccggcctgaagtcattgaacgtaattgcgttgaatagccgaacatt
tctgcaagtgggacgtgtccgcgaattacatttgccccatcagaacgagtttcctgttcacgaactagaccacgacgacg
tgataaatctcccattacatctcctgcatattcagagggagcaaaaacagaaacatccataatcggctctaaaagtactg
ttccaacagcatctctggcgcgggaaagtgccttagatgctgcgattttaaaggccatttcagaagaatcgacttcatga
aaagaaccatcaaataaagttgcccttaaattaattaaaggatagccagcaagaattccggcttgcattttttcttctag
tcctttttgaattgatttgatatattcttttggaattttccccccaacaattttatcaataaaatcaaaaccttcttcag
gattaggttcaaatttaatccaaacatgaccgtattgcccacgcccaccggattgtttgatatattttccttcaacttca
gcgctttttgtaattgtctcgcgataagaaacctgcggttttcctacccgagcttgaacattaaactctcttttgagccg
atcaacaataatatcaaggtgcaattcgcccatcccggcaattattgtctgtccagtttcaatatcagttcaagttttga
aagttggatcttcatttgctaatttttggagggcagtagctaatttttcaatttcggcttttgaaaaaggttcaagtgac
tgggaaatcacaggttcaggaaaattcatcctttcaagaacaaaagtttttgctttttcagaaattaaagaatcaccagt
tgttgtatcttttagaccaacaaaggcaccaatatcaccagttcttacctcatcaatttcttcacgggaattagcatgca
tagctaaaatacggcctacacgttcttttttaccttttgttgagttaattatataagtacctttttttaaaactccagaa
taaacccggaaaaaagtaagcgatccaacaaaaggatcattcataattttaaaagcaagggcagaaaattcttgatcatc
actagcttcaattgtaatttcttcctcatcacgaaacgctttaattgggggaacatcaacaggggaaggtaaataatcga
taaccgcatcgatcatctttttaacacccttgtttttaaaggaagatccgcagacaacaggaaaaaaattacctgtaatt
gtcgcagcccgaattgcagcttttaattgttcaggtaaaatttccttttcttctagtaaattattaaaaatttcttcatc
atagtcagcaactgcttctgcaagtgcaagtcgcatttgacttgctttttcgaaaagatcttcaggaattggaatttcat
attcgatttcttctttttggccatcataattataagccttcatttcaacaagatcgataagcccactaaaatcggcttca
gcgccaatatttaactgaattgcaactgcattcccgtttaattttgtccgcactgattcaattgaagcttcaaaatttgc
acctgctttatccattttattgacataaacaattcttggaacgctataatttgttgcttgtcttcaaacagtttcagtct
gaggttcaaccccggattgggcatctaaaacagcaactgccccatctaaaacccgtaatgatctttcaacttcgacagtg
aaatcaacatggcctggggtatcgatgatgttaattctttttccttttcaaaaagcagttgttgctgccgaagttatcgt
aattccgcgttctttttcctgttccattcagtccatttggctaactccatcatgagtttcgccaattttatgaatttttc
ctgtatggaatagaattctttcagttgttgttgttttccctgcatcaatatgggccataattccaatattgcgataatct
tttagttcaaattttcgtgccataattctaaattcctaccatttaaagtgggcaaaagccctatttgcttctgccatttt
gtgggtatcttcttttttcttaaaggcccctccggttttattataagcatcaattatttcgtttgctaacttaactatca
ttgttttttcattccgtttacgggcaaaaagaattaatcagcgtagtgcgagagtttgttgtcgtttctgtcttacttcc
attggcacttgatagttcgttccgccaattcgacgagaacgaacttcggtaagcggtgttacatttttaaccgcctggcg
aaaaacttcaagtgcatctttttgtaacttttcttctactaatttaaatgctgaataaagaatgttttgagcggtggttt
ttttcccttctaacattgtgcaatttattgcttttgttatcagtttggaattaaaaactggatcggctaaaacattgcga
acgggagcctgtttttttcgtgacattttatctccttttt--aaaaatatgttgtaatttatagtatttttttatttttt

> 2:207-94741 + AE017243_GR.fna
atttaatagcaaaaa-taattattaattaaattattatgccAAGAATATTACAAAAATTTtgctagccggaaaagttaca
taaattacgatacttttttaaattaaataagattaattctaataaattgtataaaaaaccgagtttt-------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
---------------------tag-aaaaaaatgaaattaattcatctaaaaactcggttttcttaatttttcgatattt
ttttatagttttctttttttagtcatcttccttaattgctctttgttttacgatgacatcggaaatatttttgggtacaa
tttcatagtgattaaattgcatctgataggttccccggcctgaagtcattgaacgtaattgcgttgaatacccgaacatt
tctgcaagtgggacgtgtccgcgaattacatttgccccatcagaacgagtttcctgttcacgaactagaccacgacgacg
tgataaatctcccattacatctcctgcatattcagagggagcaaaaacagaaacatccataatcggctctaaaagtactg
ttccaacagcatctctggcgcgggaaagtgccttagatgctgcgattttaaaggccatttcagaagaatcgacttcatga
aaagaaccatcaaataaagttgcccttaaattaattaaaggatagccagcaagaattccggcttgcattttttcttctag
tcctttttgaattgatttgatatattcttttggaattttccccccaacaattttatcaataaaatcaaaaccttcttcag
gattaggttcaaatttaatccaaacatgaccgtactgcccacgaccaccggattgtttgatatattttccttcaacttca
gcgctttttgtaattgtctcgcgataagaaacctgcggttttcctacacgagcttgaacattaaactctcttttgagccg
atcaacaataatatcaaggtgcaattcgcccattccggcaattattgtctgtccggtttcaatatcagttcaagttttga
aagttggatcttcatttgctaatttttggagggcagtagctaatttttcaatttcggcttttgaaaaaggttcaagtgac
tgggaaatcacaggttcaggaaaattcatcctttcaagaacaaaagtttttgctttttcggaaattaaagaatcaccagt
tgttgtatcttttagaccaacaaaggcaccaatatcaccagttcttacctcatcaatttcttcacgggaattagcatgca
tagctaaaatacggcctacacgttcttttttgccttttgttgagttaattatataagtacctttttttaaaactccagaa
taaacccggaaaaaagtaagtgatccaacaaaaggatcattcataattttaaaagcaagggcagaaaattcttgatcatc
gctagcttcaattgtaatttcttcctcatcacgaaacgctttaattgggggaacatcaacaggggaaggtaaataatcga
taaccgcatcgatcatctttttaacacccttgtttttaaaggaagatccgcagacaacagggaaaaaattacctgtaatt
gttgcagcccgaattgcagcttttaattgttcaggtaaaatttccttttcttctaataaattattaaaaatttcttcatc
atagtcagcaactgcttctgcaagtgcaagtcgcatttgacttgctttttcgaaaagatcttcaggaattggaatttcat
attcgatttcttctttttggccatcataattataagccttcatttcaacaagatcgataagtccactaaaatcggcttca
gcgccaatatttaactgaattgcaactgcattcccgtttaattttgtccgcactgattcaattgaagcttcaaaatttgc
accagctttatccattttattgacataaacaattcttggaacgctataatttgttgcttgtcttcaaacagtttcagtct
gaggttcaaccccggattgggcatctaaaacagcaactgccccatctaaaacccgtaatgatctttcaacttcgacagtg
aaatcaacatggcctggtgtatcgataatgttaattctttttccttttcaaaaagcagttgttgctgccgaagttatcgt
aattccgcgttctttttcctgttccattcagtccatttggctaaccccatcatgagtttcgccaattttatgaatttttc
ctgtatggaatagaattctttcagttgttgttgttttccctgcatcaatatgggccataattccaatattgcgataatct
tttagttcaaattttcgtgccataattctaaattcctaccatttaaagtgggcaaaagccctatttgcttccgccatttt
gtgggtatcttcttttttcttaaaggcccctccggttttattataagcatcaattatttcgtttgctaacttaactatca
ttgttttttcattccgtttacgggcaaaaagaattaatcagcgcagtgcgagagtttgttgtcgtttctgtcttacctcc
attggcacttgatagttcgttccgccaattcgacgagaacgaacttcggtaagcggtgttacatttttaaccgcctggcg
aaaaacttcaagtgcatctttttgtaacttttcttctactaatttaaatgctgaataaagaatgttttgagcggtggttt
ttttcccttctaacattgtgcaatttattgcttttgttatcagtttggaattaaaaactggatcggctaaaacattgcga
acgggagcctgtttttttcgtgacattttatctcctttttT-aaaaatatgttgtaatttatagtattttttta--tttt

> 3:207-101485 + AE017244_GR.fna
atttaatagcaaaaaAtaattactaattaaattattatgccAAACATATTACAAAAATTTtgctagcctcaaaagttaca
taaattacgatacttttttaaattaaataagattaattctaatagattgtataaaaaaccgagtttt-------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
---------------------tagaaaaaaaatgaaattaatttatctaaaaactcggttttcttaatttttcgctattt
ttttatagttttctttttttagtcatcttccttaattgctctttgttttacgatgacatctgaaatatttttgggtacaa
tttcatagtgattaaattgcatctgataggttccccggcctgaagtcattgaacgtaattgcgttgaatacccgaacatt
tctgcaagtgggacgtgaccgcgaattacatttgccccatcagaacgagtttcctgttcacgaactagaccacgacgacg
tgataaatctcccattacatctcctgcatattcagagggagcaaaaacagaaacatccataatcggctctaaaagtactg
ttccaacagcatctctggcgcgggaaagtgccttagatgctgcgattttaaaggccatttcagaagaatcgacttcatga
aaagaaccatcaaataaagttgcccttaaattaattaaaggatagccagcaagaattccggcctgcattttttcttctag
tcctttttgaattgatttgatatattcttttggaattttccccccaacaattttatcaataaaatcaaaaccttcttcag
gattaggttcaaatttaatccaaacatgaccgtattgcccacgcccaccggattgtttgatatattttccttcaacttca
gcgctttttgtaattgtctcgcgataagaaacctgcggttttcctacacgagcttgaacattaaactctcttttgagccg
atcaacaataatatcaaggtgcaattcgcccatcccggcaattattgtctgtccagtttcaatatcagttcaagttttga
aagttggatcttcatttgctaatttttggagggcagtagctaatttttcaatttcggcttttgaaaaaggttcaagtgac
tgggaaatcacaggttcaggaaaattcatcctttcaagaacaaaagtttttgctttttcggaaattaaagaatcaccagt
tgttgtatcttttaggccaacaaaggcaccaatatcaccagttcttacctcatcaatttcttcacgggagttagcatgca
tagctaaaatccggcctacacgttcttttttaccttttgttgagttaattatataagtaccttttttcaaaactccagaa
taaacccggaaaaaagtaagcgatccaacaaaaggatcattcataattttaaaagcaagggcagaaaattcttgatcatc
actagcttcaattgtaatttcttcctcatcacgaaacgctttaattgggggaacatcaacaggggaaggtaaataatcga
taaccgcatcgatcatctttttaacacccttgtttttaaaggaagatccgcagacaacagggaaaaaattacctgtaatt
gttgcagcccgaattgcagcttttaattgttcaggtaaaatttccttttcttctagtaaattattaaaaatttcttcatc
atagtcagcaactgcttctgcaagtgcaagtcgcatttgacttgctttttcgaaaagatcttcaggaattggaatttcat
attcgatttcttctttttggccatcataattataagccttcatttcaacaagatcgataagtccactaaaatcggcttca
gcgccaatatttaactgaattgcaacagcattcccgtttaattttgtccgcactgattcaattgaagcttcaaaatttgc
acctgctttatccattttattgacataaacaattcttggaacgctataatttgttgcttgtcttcaaacagtttcagtct
gaggttcaaccccggattgggcatctaaaacagcaactgccccatctaaaacccgtaatgatctttcaacttcgacagtg
aaatcaacatggcctggggtatcgatgatgttaattctttttccttttcaaaaagcagttgttgctgccgaagttatcgt
aattccgcgttctttttcctgttccattcagtccatttggctaactccatcatgagtttcgccaattttatgaatttttc
ctgtatggaatagaattctttcagttgttgttgttttccctgcatcaatatgggccataattccaatattgcgataatct
tttagttcaaattttcgtgccataattctaaattcctaccatttaaagtgggcaaaagccctatttgcttccgccatttt
gtgggtatcttcttttttcttaaaggcccctccggttttattataagcatcaattatttcgtttgctaacttaactatca
ttgttttttcattccgtttacgggcaaaaagaattaatcagcgcagtgcgagagtttgttgtcgtttctgtcttacctcc
attggcacttgatagttagttccgccaattcgacgtgaacgaacttcggtaagcggtgttacatttttaaccgcctggcg
aaaaacttcaagtgcatctttttgtaacttttcttctactaatttaaatgctgaataaagaatgttttgagcggtggttt
ttttcccttctaacattgtgcaatttattgcttttgttatcagtttggaattaaaaactggatcggctaaaacattgcga
acgggagcctgtttttttcgtgacattttatctcctttttTTaaaaatatgttgtaatttatagtattttttta--tttt
=
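
One possible way around the case ambiguity, sketched below in Python, is to ignore upper/lower case entirely and classify alignment columns by their gaps instead: a column with a residue in every genome is treated as "core", and a column with a gap in at least one genome as "indel". This is only a sketch under stated assumptions, not how Mauve itself defines matches. The function names parse_xmfa and column_segments are hypothetical, and the header pattern is assumed to be "> seq:start-end strand name" as in the example above (it also tolerates a doubled ">" left by mail quoting).

```python
import re
from itertools import groupby

# Assumed header shape: "> 1:1-104603 + AE017332_GR.fna".
# A doubled "> >" (mail-quoting damage) is tolerated as well.
HEADER = re.compile(r"^(?:>\s*)+(\d+):(\d+)-(\d+)\s+([+-])\s*(.*)$")


def parse_xmfa(text):
    """Return a list of blocks; each block is a list of (header, sequence)."""
    blocks, block = [], []
    header, chunks = None, []

    def flush():
        # Store the sequence collected so far under its header.
        nonlocal header, chunks
        if header is not None:
            block.append((header, "".join(chunks)))
        header, chunks = None, []

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "#FormatVersion ..." comments
        if line.startswith("="):
            flush()  # "=" closes the current alignment block
            if block:
                blocks.append(block)
            block = []
        elif line.startswith(">"):
            m = HEADER.match(line)
            if m:  # bare ">" lines are silently skipped
                flush()
                header = {
                    "seq": int(m.group(1)),
                    "start": int(m.group(2)),
                    "end": int(m.group(3)),
                    "strand": m.group(4),
                    "name": m.group(5),
                }
        else:
            chunks.append(line)
    flush()
    if block:
        blocks.append(block)
    return blocks


def column_segments(seqs):
    """Split columns into runs of ('core' | 'indel', start, end), end exclusive."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    labels = [
        "indel" if any(s[i] == "-" for s in seqs) else "core"
        for i in range(length)
    ]
    segments, start = [], 0
    for label, run in groupby(labels):
        n = len(list(run))
        segments.append((label, start, start + n))
        start += n
    return segments


demo = """#FormatVersion Mauve1
> 1:1-12 + a.fna
acgtAAttgcat
> 2:5-12 + b.fna
acgt----gcat
=
"""
demo_blocks = parse_xmfa(demo)
demo_segments = column_segments([seq for _, seq in demo_blocks[0]])
print(demo_segments)  # [('core', 0, 4), ('indel', 4, 8), ('core', 8, 12)]
```

Because the segments are column ranges shared by all genomes, every genome yields the same number of blocks, which avoids the mismatch in lower-case block counts described above. Slicing each sequence with a "core" range gives the aligned residues for that segment.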



-- 

Helene Chiapello - Unite Mathematique, Informatique & Genome 
INRA - 78 352 Jouy-en-Josas -  FRANCE
tel : (33) (0)1 34 65 28 96, fax : (33) (0)1 34 65 29 01
mail : [EMAIL PROTECTED]     


_______________________________________________
Mauve-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mauve-users