Hello again,
I've done a little more work identifying this potential bug (prompted by
another user). It seems to arise from the ".bbcols" file, not
stripSubsetLCBs.
I could not find online documentation for the bbcols format, but my
impression is that it contains the same information as the backbone
file, just in a different format. For my alignment, the two do not agree.
For instance, the backbone file describes the first block of my
alignment like this:
1149292 1150071 29190 29969 29283 30062 0 0
1150072 1172432 29970 52341 30063 52427 1 22365
0 0 0 0 52428 52475 0 0
1172481 1172481 0 0 52476 52476 0 0
1172482 1172488 52342 52348 52477 52483 0 0
1172489 1176826 52349 56687 52484 56822 22366 26703
1176827 1176830 56688 56691 56823 56826 0 0
1176831 1176857 0 0 56827 56853 0 0
1176858 1176861 56692 56695 56854 56857 0 0
1176862 1189367 56696 69202 56858 69365 26704 35478
1189368 1193095 69203 72932 69366 73095 0 0
and bbcols describes it like this:
0 1 780 0 1 2
0 781 22372 0 1 2 3
0 23249 1 0 2
0 23250 7 0 1 2
0 23257 4339 0 1 2 3
0 27596 4 0 1 2
0 27600 1 0 2
0 27601 26 0 2
0 27627 4 0 1 2
0 27631 12508 0 1 2 3
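In case it helps anyone compare the two files, here is a minimal Python sketch that summarizes each row. It assumes a backbone row holds one (left, right) coordinate pair per genome, with "0 0" meaning the genome is absent. The bbcols layout used here (block index, start column, column count, then genome indexes) is only my guess from the rows above, not something I found documented. The function names are my own.

```python
def summarize_backbone_row(line):
    """Return (genomes present, per-genome segment length) for one backbone row.

    Assumes one (left, right) pair per genome; "0 0" means absent.
    abs() is used in case reverse-strand segments carry negative coordinates.
    """
    values = [int(v) for v in line.split()]
    pairs = list(zip(values[0::2], values[1::2]))
    present = [i for i, (left, right) in enumerate(pairs) if left != 0 or right != 0]
    lengths = {i: abs(abs(pairs[i][1]) - abs(pairs[i][0])) + 1 for i in present}
    return present, lengths


def parse_bbcols_row(line):
    """Split one bbcols row using the ASSUMED (undocumented) layout:
    block index, start column, column count, genome indexes."""
    values = [int(v) for v in line.split()]
    return {
        "block": values[0],
        "start": values[1],
        "length": values[2],
        "genomes": values[3:],
    }


if __name__ == "__main__":
    backbone_row = "1149292 1150071 29190 29969 29283 30062 0 0"
    bbcols_row = "0 1 780 0 1 2"
    print(summarize_backbone_row(backbone_row))
    print(parse_bbcols_row(bbcols_row))
```

For the first backbone row above this reports genomes 0, 1 and 2 at 780 bp each, which lines up with the first bbcols row if the assumed layout is right.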
I can provide the full alignment for anyone who wants to look more
closely. Basically, I have three complete genomes and one draft genome
(#3), which is responsible for all these gaps.
Any advice would be appreciated.
Thanks,
adam
On 10/9/2012 7:17 PM, Adam Retchless wrote:
Hello,
I am using Mauve as part of the ClonalFrame pipeline for bacterial
genomes. I am very excited about what I can do with these tools, but I'm
hitting a snag with the stripSubsetLCBs program (recently downloaded
from the "successful builds" webpage).
My understanding is that this program should trim the alignment file
down to just the core region (i.e. shared by all genomes). However, I have
noticed that many of the blocks in the alignment file have large indels
in them (>1 kb). I ran Mauve with the default settings, and my
understanding is that gaps larger than 20 bp would not be included in the
core region. At its worst, I have a block that is >5 kb, but only about
300 bp are aligned across all genomes.
I did a sliding-window analysis of a bunch of the blocks, and I think I
see what's going on.
These large gaps always seem to be at the end of the block. For example,
my original alignment file had one large block with gaps at each end,
and two internal regions with indels. This was split into three blocks
(removing the internal gaps) and the gap at the front was removed. The
gap at the back was NOT removed.
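For anyone who wants to repeat the check, the sliding-window analysis can be sketched roughly as below. This is not the exact script I ran; it assumes each block has already been read into equal-length aligned strings with "-" marking gaps, and the function name, window size and step are arbitrary choices.

```python
def gap_free_fraction(rows, window=100, step=50):
    """Return (window start, fraction of columns with no gap in any genome).

    rows: equal-length aligned sequences for one block, '-' marks a gap.
    """
    if not rows or not rows[0]:
        return []
    ncols = len(rows[0])
    # True where every genome has a base in that column
    full = [all(row[c] != "-" for row in rows) for c in range(ncols)]
    result = []
    for start in range(0, max(ncols - window, 0) + 1, step):
        chunk = full[start:start + window]
        result.append((start, sum(chunk) / len(chunk)))
    return result


if __name__ == "__main__":
    block = ["ACGTACGT", "ACGT----", "ACGTACGT"]
    for start, fraction in gap_free_fraction(block, window=4, step=4):
        print(start, fraction)
```

A run of windows with a fraction near zero at the end of a block is the pattern I am describing.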
Is this expected behavior? If not, is there anything I can do to fix this?
Thanks,
Adam
UC Berkeley
------------------------------------------------------------------------------
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
_______________________________________________
Mauve-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mauve-users