Hi Daniel, nice to hear from you. Replies below.

On Tue, 2015-06-30 at 16:46 +0200, Daniel Dörr wrote:
> Dear Aaron,
> 
> I recently started using progressiveMauve to align large eukaryotic genomes 
> and ran into some problems: 
> 
> 1) the studied genomes are repeat masked (i.e. contain long stretches of Ns). 
> When extracting homologous segments of the input genomes from the backbone 
> file I found that some are located in masked regions. Is there a way to 
> prevent Mauve from using masked regions in identifying homologous segments? 
> As far as I am aware, no such parameter exists for the incorporated muscle 
> sequence aligner. 

There is currently no good way to prevent this behavior. It is likely
happening because the flanking regions were identified as positionally
homologous and so were used as alignment anchors, and the masked regions
between them in the two genomes became aligned because they lie between
anchors. This happens because Ns are internally converted to A by the
aligner when it stores the sequence in a 2-bit-per-base encoding, so a
masked stretch looks like a run of As.
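
To illustrate the encoding point, here is a minimal Python sketch. The
code table and the function names are made up for this example and are
not the aligner's actual code; the only thing taken from the behavior
described above is that N ends up stored as A.

```python
# Hypothetical 2-bit code table; the aligner's real table may differ.
TWO_BIT = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"

def encode(seq):
    """Map bases to 2-bit codes; anything outside ACGT (e.g. N) falls back to A (code 0)."""
    return [TWO_BIT.get(base, 0) for base in seq.upper()]

def decode(codes):
    """Turn 2-bit codes back into bases. The original Ns cannot be recovered."""
    return "".join(DECODE[c] for c in codes)

masked = "ACGTNNNNNNACGT"
roundtrip = decode(encode(masked))  # the masked run comes back as As
```

With only four codes available there is no room for a fifth symbol, so
two masked stretches are indistinguishable from two runs of A once they
are stored.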

It might be possible to modify the homology HMM to include N as a
possible emission with probabilities reflecting a mixture of A,C,G,T and
so adjust the posterior probability of homology accordingly. This would
require some tinkering with the code.
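
A rough sketch of that idea in Python, only to show the arithmetic. The
two emission tables and their numbers are invented toy values, not the
parameters of the homology HMM in progressiveMauve, and averaging over
A, C, G, T is just one way to read "a mixture".

```python
from itertools import product

BASES = "ACGT"

# Toy emission tables over aligned base pairs (each sums to 1). Assumed values.
MATCH, MISMATCH = 0.15, 0.4 / 12
HOMOLOGOUS = {(a, b): MATCH if a == b else MISMATCH
              for a, b in product(BASES, BASES)}
UNRELATED = {(a, b): 1 / 16 for a, b in product(BASES, BASES)}

def pair_emission_with_n(emit, x, y):
    """Emission probability of the column (x, y), treating N as an
    equal-weight mixture of A, C, G and T."""
    xs = BASES if x == "N" else x
    ys = BASES if y == "N" else y
    probs = [emit[(a, b)] for a, b in product(xs, ys)]
    return sum(probs) / len(probs)

def likelihood_ratio(x, y):
    """How strongly one column favors the homologous state over the unrelated one."""
    return (pair_emission_with_n(HOMOLOGOUS, x, y)
            / pair_emission_with_n(UNRELATED, x, y))
```

With these toy numbers an A/A column favors homology, while an N/N or
A/N column gives a ratio of about 1, so masked columns would stop
pushing the posterior toward homology.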


> 2) I observe sometimes strange lines in the backbone file such as the 
> following:
> ___
> 7691835 7691966 -85715547 -85715547 0 0 0 0 349474437 349474583 -700243823 -700243822 0 0
> 8282300 8282275 0 0 0 0 0 0 0 0 0 0 0 0
> ___
> 
> Note that in the first line, the segments specified by columns [3,4] and [11, 
> 12] have lengths 0 and -1, respectively. Negative lengths mostly occur for 
> segments that are not homologous to segments in other genomes, as shown in 
> the second line (which makes me wonder why they are included in the backbone 
> file in the first place).

I've not seen this before, but yes, it does seem like a bug! As a
workaround, is it possible to ignore these segments in your downstream
processing until I can get a fix?
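
Something along these lines might do as a stopgap. It is a sketch under
several assumptions: that the backbone file is whitespace-separated with
one left/right column pair per genome, that 0 0 means the genome is
absent from that row, that negative coordinates only mark orientation,
and that length is counted as right minus left, as in your message. The
function names, the min_span parameter, and the first data row in the
example are made up.

```python
import io

def suspicious(left, right, min_span=1):
    """True if a present segment spans fewer than min_span bases (right minus left)."""
    if left == 0 and right == 0:
        return False  # genome absent from this row
    return abs(right) - abs(left) < min_span

def clean_backbone_rows(handle, min_span=1):
    """Yield coordinate rows in which no present segment is suspicious."""
    for line in handle:
        fields = line.split()
        if not fields or not fields[0].lstrip("-").isdigit():
            continue  # blank line or header
        coords = [int(f) for f in fields]
        pairs = zip(coords[0::2], coords[1::2])
        if not any(suspicious(l, r, min_span) for l, r in pairs):
            yield coords

example = io.StringIO(
    "seq0_leftend\tseq0_rightend\tseq1_leftend\tseq1_rightend\n"
    "100\t250\t-900\t-1050\n"                   # made-up row, kept
    "7691835\t7691966\t-85715547\t-85715547\n"  # zero-span segment, dropped
    "8282300\t8282275\t0\t0\n"                  # negative span, dropped
)
kept = list(clean_backbone_rows(example))
```

This drops the whole row when any segment in it looks wrong. If you
would rather keep the other segments of such a row, the same check could
be used to blank out only the offending pair instead.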

Best,
-Aaron

-- 
Aaron E. Darling, Ph.D.
Associate Professor, ithree institute
University of Technology Sydney
Australia

http://darlinglab.org
twitter: @koadman





_______________________________________________
Mauve-users mailing list
Mauve-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mauve-users
