Hi Aaron,

I have a question concerning the XMFA format. I'm trying to extract the aligned sequences corresponding to matches (in lower case) from this format, and I realized that insertions located between two matches are also written in lower case. This makes parsing really complicated, because matches cannot be identified automatically from the XMFA file (the number of lower-case blocks is not the same in all the aligned genomes). Don't you think that insertions should be written in upper case? Or do you have any other suggestion? I hope my question is clear, but I'm not sure...

Thanks,
Helene
Here is an example of an insertion lcated between two matches: #FormatVersion Mauve1 > > 1:1-104603 + AE017332_GR.fna > atttaatagcaaaaaAtaattactaattaaattattatgccAAACATATTATAAAAATTTtgctagcctcaaaagttaca taaattacgatacttttttaaattaaataagattaattctaataaattgtataaaaaaccgagtttttcccgagtttccc agtttgatgagataattttaaaaaagtgtccggtttcccgtgcttttttaagttttttgccaaaagttaattacctattt ttttactgatttattaagatatcaaagtagaaatcatacctatttcaaatattgggatcactaggaacaccatcgcgatt tttaaatatcatttgctgatctaaagtgttaaattcaaccctttgccttaatttttgcaacatcgttactaaaccaactt cagttaattgactgtttgcaaataacttgttatcactaatatgtttgttgattttaaaaattatagtttttaaaataact aatgagataaagcacaaaaatgtatgagctaggatatgctcatctattcttaaaaatacaggacgaatattcaataaacc ttttagacttctaaaattagcttcaatatttcactgtttttggtatttttcaactatgtctaagacatttaaatttagta tatttgtttcatagacatagtaaccatcaaattgtttgtccttgtcaattttactttgatctaattcaaatttcatgttt gaaatttccctaaaatatttaggttttttaccaaacaatttgtttacttcaataaaaccatctttattttgttttttaat aaaactttggatttgctcttcgcgagccttgctgtcttttattgctctttttttactgtaagtaataattcttcttctaa tattttcagtgtatcttttatttttataagatgaataaaattcttcttttttatacttaaaatctgcatttacatcaaca taatcgctaggatctaataaataatttttaaatttttgactgcctatctttgcacgataagaaataatgaaattatattc ttttgattcaagaaatcgaatatttgcagcagtggacatgccacggtcagcaattattgtcatatttttgatattatatt tagattcaatatctaatacaaaagggattagggtactagaatcggctgtatttcctttaaaaactttaatatgaaaagga atgccatttttatcgcatgctaaagcaatgacaatttgatcttctttgaatttagcatctttagaatagccaggaattct taatccatttctttcaaatgtctcaaaatagactgttgatgaatcaaaataaaattcattgtccctttttccaagctcgc ttataaccattttgttaagactatctaaaagttgattttgtgattcatagacaagatctaagagtctataaaagctattt ttggaagtatttatttgatttgagtaatcatcctttttatcaaaagcattaataatgctaccaggatcagtaattcgctt tgaaattaagtagttaaaaatttgtctcatatttttatgtctacttttaggaagtgattcaaaaatattgtgcttttcaa taagtttttcaattaattccccaccaacaaaaaccgaaccttcgattatggcggaatttttaatagaatcaagtaaagta gtcttgattttatctttgtcatctaaatttgaaaataatttcaatttttctttgataatttgaatagcatttggattaat tttttccaaagtttgcacatttcctatactaaatcaacgttttggagctttcccataaccttgtgttcagccaacatatt 
tatatattttatcttttgaagatcctcaaacattaaacaaaatcaaattatgctttttcataatttataattataccata aaattactttaaaattgtaattaattgtaattaataaaaattaaggttttacgggtaaaaaacaccgatttacggggatt tctgcatattcatattcactaaaaagtgaatttttgccatcaaactgggaaactcaggaaataagattaattctaataaa ttgtataaaaaaccgagtttttag-aaaaaaatgaaattaatttatctaaaaactcggttttcttaatttttcgctattt ttttatagttttctttttttagtcatcttccttaattgctctttgttttacgatgacatcggaaatatttttgggtacaa tttcatagtgattaaattgcatctgataggttccccggcctgaagtcattgaacgtaattgcgttgaatagccgaacatt tctgcaagtgggacgtgtccgcgaattacatttgccccatcagaacgagtttcctgttcacgaactagaccacgacgacg tgataaatctcccattacatctcctgcatattcagagggagcaaaaacagaaacatccataatcggctctaaaagtactg ttccaacagcatctctggcgcgggaaagtgccttagatgctgcgattttaaaggccatttcagaagaatcgacttcatga aaagaaccatcaaataaagttgcccttaaattaattaaaggatagccagcaagaattccggcttgcattttttcttctag tcctttttgaattgatttgatatattcttttggaattttccccccaacaattttatcaataaaatcaaaaccttcttcag gattaggttcaaatttaatccaaacatgaccgtattgcccacgcccaccggattgtttgatatattttccttcaacttca gcgctttttgtaattgtctcgcgataagaaacctgcggttttcctacccgagcttgaacattaaactctcttttgagccg atcaacaataatatcaaggtgcaattcgcccatcccggcaattattgtctgtccagtttcaatatcagttcaagttttga aagttggatcttcatttgctaatttttggagggcagtagctaatttttcaatttcggcttttgaaaaaggttcaagtgac tgggaaatcacaggttcaggaaaattcatcctttcaagaacaaaagtttttgctttttcagaaattaaagaatcaccagt tgttgtatcttttagaccaacaaaggcaccaatatcaccagttcttacctcatcaatttcttcacgggaattagcatgca tagctaaaatacggcctacacgttcttttttaccttttgttgagttaattatataagtacctttttttaaaactccagaa taaacccggaaaaaagtaagcgatccaacaaaaggatcattcataattttaaaagcaagggcagaaaattcttgatcatc actagcttcaattgtaatttcttcctcatcacgaaacgctttaattgggggaacatcaacaggggaaggtaaataatcga taaccgcatcgatcatctttttaacacccttgtttttaaaggaagatccgcagacaacaggaaaaaaattacctgtaatt gtcgcagcccgaattgcagcttttaattgttcaggtaaaatttccttttcttctagtaaattattaaaaatttcttcatc atagtcagcaactgcttctgcaagtgcaagtcgcatttgacttgctttttcgaaaagatcttcaggaattggaatttcat attcgatttcttctttttggccatcataattataagccttcatttcaacaagatcgataagcccactaaaatcggcttca 
gcgccaatatttaactgaattgcaactgcattcccgtttaattttgtccgcactgattcaattgaagcttcaaaatttgc acctgctttatccattttattgacataaacaattcttggaacgctataatttgttgcttgtcttcaaacagtttcagtct gaggttcaaccccggattgggcatctaaaacagcaactgccccatctaaaacccgtaatgatctttcaacttcgacagtg aaatcaacatggcctggggtatcgatgatgttaattctttttccttttcaaaaagcagttgttgctgccgaagttatcgt aattccgcgttctttttcctgttccattcagtccatttggctaactccatcatgagtttcgccaattttatgaatttttc ctgtatggaatagaattctttcagttgttgttgttttccctgcatcaatatgggccataattccaatattgcgataatct tttagttcaaattttcgtgccataattctaaattcctaccatttaaagtgggcaaaagccctatttgcttctgccatttt gtgggtatcttcttttttcttaaaggcccctccggttttattataagcatcaattatttcgtttgctaacttaactatca ttgttttttcattccgtttacgggcaaaaagaattaatcagcgtagtgcgagagtttgttgtcgtttctgtcttacttcc attggcacttgatagttcgttccgccaattcgacgagaacgaacttcggtaagcggtgttacatttttaaccgcctggcg aaaaacttcaagtgcatctttttgtaacttttcttctactaatttaaatgctgaataaagaatgttttgagcggtggttt ttttcccttctaacattgtgcaatttattgcttttgttatcagtttggaattaaaaactggatcggctaaaacattgcga acgggagcctgtttttttcgtgacattttatctccttttt--aaaaatatgttgtaatttatagtatttttttatttttt > > 2:207-94741 + AE017243_GR.fna > atttaatagcaaaaa-taattattaattaaattattatgccAAGAATATTACAAAAATTTtgctagccggaaaagttaca taaattacgatacttttttaaattaaataagattaattctaataaattgtataaaaaaccgagtttt------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- 
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- ---------------------tag-aaaaaaatgaaattaattcatctaaaaactcggttttcttaatttttcgatattt ttttatagttttctttttttagtcatcttccttaattgctctttgttttacgatgacatcggaaatatttttgggtacaa tttcatagtgattaaattgcatctgataggttccccggcctgaagtcattgaacgtaattgcgttgaatacccgaacatt tctgcaagtgggacgtgtccgcgaattacatttgccccatcagaacgagtttcctgttcacgaactagaccacgacgacg tgataaatctcccattacatctcctgcatattcagagggagcaaaaacagaaacatccataatcggctctaaaagtactg ttccaacagcatctctggcgcgggaaagtgccttagatgctgcgattttaaaggccatttcagaagaatcgacttcatga aaagaaccatcaaataaagttgcccttaaattaattaaaggatagccagcaagaattccggcttgcattttttcttctag tcctttttgaattgatttgatatattcttttggaattttccccccaacaattttatcaataaaatcaaaaccttcttcag gattaggttcaaatttaatccaaacatgaccgtactgcccacgaccaccggattgtttgatatattttccttcaacttca 
gcgctttttgtaattgtctcgcgataagaaacctgcggttttcctacacgagcttgaacattaaactctcttttgagccg atcaacaataatatcaaggtgcaattcgcccattccggcaattattgtctgtccggtttcaatatcagttcaagttttga aagttggatcttcatttgctaatttttggagggcagtagctaatttttcaatttcggcttttgaaaaaggttcaagtgac tgggaaatcacaggttcaggaaaattcatcctttcaagaacaaaagtttttgctttttcggaaattaaagaatcaccagt tgttgtatcttttagaccaacaaaggcaccaatatcaccagttcttacctcatcaatttcttcacgggaattagcatgca tagctaaaatacggcctacacgttcttttttgccttttgttgagttaattatataagtacctttttttaaaactccagaa taaacccggaaaaaagtaagtgatccaacaaaaggatcattcataattttaaaagcaagggcagaaaattcttgatcatc gctagcttcaattgtaatttcttcctcatcacgaaacgctttaattgggggaacatcaacaggggaaggtaaataatcga taaccgcatcgatcatctttttaacacccttgtttttaaaggaagatccgcagacaacagggaaaaaattacctgtaatt gttgcagcccgaattgcagcttttaattgttcaggtaaaatttccttttcttctaataaattattaaaaatttcttcatc atagtcagcaactgcttctgcaagtgcaagtcgcatttgacttgctttttcgaaaagatcttcaggaattggaatttcat attcgatttcttctttttggccatcataattataagccttcatttcaacaagatcgataagtccactaaaatcggcttca gcgccaatatttaactgaattgcaactgcattcccgtttaattttgtccgcactgattcaattgaagcttcaaaatttgc accagctttatccattttattgacataaacaattcttggaacgctataatttgttgcttgtcttcaaacagtttcagtct gaggttcaaccccggattgggcatctaaaacagcaactgccccatctaaaacccgtaatgatctttcaacttcgacagtg aaatcaacatggcctggtgtatcgataatgttaattctttttccttttcaaaaagcagttgttgctgccgaagttatcgt aattccgcgttctttttcctgttccattcagtccatttggctaaccccatcatgagtttcgccaattttatgaatttttc ctgtatggaatagaattctttcagttgttgttgttttccctgcatcaatatgggccataattccaatattgcgataatct tttagttcaaattttcgtgccataattctaaattcctaccatttaaagtgggcaaaagccctatttgcttccgccatttt gtgggtatcttcttttttcttaaaggcccctccggttttattataagcatcaattatttcgtttgctaacttaactatca ttgttttttcattccgtttacgggcaaaaagaattaatcagcgcagtgcgagagtttgttgtcgtttctgtcttacctcc attggcacttgatagttcgttccgccaattcgacgagaacgaacttcggtaagcggtgttacatttttaaccgcctggcg aaaaacttcaagtgcatctttttgtaacttttcttctactaatttaaatgctgaataaagaatgttttgagcggtggttt ttttcccttctaacattgtgcaatttattgcttttgttatcagtttggaattaaaaactggatcggctaaaacattgcga 
acgggagcctgtttttttcgtgacattttatctcctttttT-aaaaatatgttgtaatttatagtattttttta--tttt > > 3:207-101485 + AE017244_GR.fna > atttaatagcaaaaaAtaattactaattaaattattatgccAAACATATTACAAAAATTTtgctagcctcaaaagttaca taaattacgatacttttttaaattaaataagattaattctaatagattgtataaaaaaccgagtttt------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- 
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- ---------------------tagaaaaaaaatgaaattaatttatctaaaaactcggttttcttaatttttcgctattt ttttatagttttctttttttagtcatcttccttaattgctctttgttttacgatgacatctgaaatatttttgggtacaa tttcatagtgattaaattgcatctgataggttccccggcctgaagtcattgaacgtaattgcgttgaatacccgaacatt tctgcaagtgggacgtgaccgcgaattacatttgccccatcagaacgagtttcctgttcacgaactagaccacgacgacg tgataaatctcccattacatctcctgcatattcagagggagcaaaaacagaaacatccataatcggctctaaaagtactg ttccaacagcatctctggcgcgggaaagtgccttagatgctgcgattttaaaggccatttcagaagaatcgacttcatga aaagaaccatcaaataaagttgcccttaaattaattaaaggatagccagcaagaattccggcctgcattttttcttctag tcctttttgaattgatttgatatattcttttggaattttccccccaacaattttatcaataaaatcaaaaccttcttcag gattaggttcaaatttaatccaaacatgaccgtattgcccacgcccaccggattgtttgatatattttccttcaacttca gcgctttttgtaattgtctcgcgataagaaacctgcggttttcctacacgagcttgaacattaaactctcttttgagccg atcaacaataatatcaaggtgcaattcgcccatcccggcaattattgtctgtccagtttcaatatcagttcaagttttga aagttggatcttcatttgctaatttttggagggcagtagctaatttttcaatttcggcttttgaaaaaggttcaagtgac tgggaaatcacaggttcaggaaaattcatcctttcaagaacaaaagtttttgctttttcggaaattaaagaatcaccagt tgttgtatcttttaggccaacaaaggcaccaatatcaccagttcttacctcatcaatttcttcacgggagttagcatgca tagctaaaatccggcctacacgttcttttttaccttttgttgagttaattatataagtaccttttttcaaaactccagaa taaacccggaaaaaagtaagcgatccaacaaaaggatcattcataattttaaaagcaagggcagaaaattcttgatcatc actagcttcaattgtaatttcttcctcatcacgaaacgctttaattgggggaacatcaacaggggaaggtaaataatcga taaccgcatcgatcatctttttaacacccttgtttttaaaggaagatccgcagacaacagggaaaaaattacctgtaatt gttgcagcccgaattgcagcttttaattgttcaggtaaaatttccttttcttctagtaaattattaaaaatttcttcatc atagtcagcaactgcttctgcaagtgcaagtcgcatttgacttgctttttcgaaaagatcttcaggaattggaatttcat attcgatttcttctttttggccatcataattataagccttcatttcaacaagatcgataagtccactaaaatcggcttca 
gcgccaatatttaactgaattgcaacagcattcccgtttaattttgtccgcactgattcaattgaagcttcaaaatttgc acctgctttatccattttattgacataaacaattcttggaacgctataatttgttgcttgtcttcaaacagtttcagtct gaggttcaaccccggattgggcatctaaaacagcaactgccccatctaaaacccgtaatgatctttcaacttcgacagtg aaatcaacatggcctggggtatcgatgatgttaattctttttccttttcaaaaagcagttgttgctgccgaagttatcgt aattccgcgttctttttcctgttccattcagtccatttggctaactccatcatgagtttcgccaattttatgaatttttc ctgtatggaatagaattctttcagttgttgttgttttccctgcatcaatatgggccataattccaatattgcgataatct tttagttcaaattttcgtgccataattctaaattcctaccatttaaagtgggcaaaagccctatttgcttccgccatttt gtgggtatcttcttttttcttaaaggcccctccggttttattataagcatcaattatttcgtttgctaacttaactatca ttgttttttcattccgtttacgggcaaaaagaattaatcagcgcagtgcgagagtttgttgtcgtttctgtcttacctcc attggcacttgatagttagttccgccaattcgacgtgaacgaacttcggtaagcggtgttacatttttaaccgcctggcg aaaaacttcaagtgcatctttttgtaacttttcttctactaatttaaatgctgaataaagaatgttttgagcggtggttt ttttcccttctaacattgtgcaatttattgcttttgttatcagtttggaattaaaaactggatcggctaaaacattgcga acgggagcctgtttttttcgtgacattttatctcctttttTTaaaaatatgttgtaatttatagtattttttta--tttt =
--
Helene Chiapello - Unite Mathematique, Informatique & Genome INRA - 78 352 Jouy-en-Josas - FRANCE tel : (33) (0)1 34 65 28 96, fax : (33) (0)1 34 65 29 01 mail : [EMAIL PROTECTED]
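One possible workaround for the parsing problem described above, offered only as a rough sketch: instead of relying on letter case, read each alignment block column by column and keep the runs of columns in which no genome has a gap. In the example, the long insertion in genome 1 faces gaps in genomes 2 and 3, so it would fall outside every such run. The helper names `parse_xmfa` and `gap_free_runs` are hypothetical and not part of Mauve. The sketch assumes the usual XMFA layout (`#` metadata lines, `>` entry headers, `=` closing each block), and it does not recover the match boundaries Mauve used internally.

```python
def parse_xmfa(text):
    """Return a list of alignment blocks; each block is a list of (header, sequence)."""
    blocks, block = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' metadata lines
        if line.startswith(">"):
            # New entry: keep the header text and collect its sequence lines.
            block.append([line[1:].strip(), []])
        elif line.startswith("="):
            # '=' closes the current alignment block.
            if block:
                blocks.append([(h, "".join(parts)) for h, parts in block])
            block = []
        elif block:
            block[-1][1].append(line)
    if block:  # tolerate a file whose last block has no closing '='
        blocks.append([(h, "".join(parts)) for h, parts in block])
    return blocks


def gap_free_runs(seqs, gap="-"):
    """Return half-open (start, end) column intervals where no sequence has a gap."""
    runs, start = [], None
    length = min(len(s) for s in seqs)
    for col in range(length):
        if all(s[col] != gap for s in seqs):
            if start is None:
                start = col
        elif start is not None:
            runs.append((start, col))
            start = None
    if start is not None:
        runs.append((start, length))
    return runs


# Toy block (made-up data): genome 1 carries a lower-case insertion "ttac"
# that faces gaps in genome 2.
toy = """#FormatVersion Mauve1
> 1:1-12 + a.fna
acgtttacgtac
> 2:1-8 + b.fna
acgt----gtac
=
"""

for block in parse_xmfa(toy):
    runs = gap_free_runs([seq for _, seq in block])
    print(runs)  # [(0, 4), (8, 12)]
    for header, seq in block:
        print(header, [seq[a:b] for a, b in runs])
```

With this approach every genome yields the same number of segments per block, whatever the letter case. A column is kept only if all genomes in the block are present, so regions aligned in just a subset of the genomes are dropped; a per-pair variant would be needed to keep those.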
