Hi

Mauve concatenates chromosomes (contigs) prior to alignment.

How can the coordinates in a Mauve alignment of these concatenated sequences be
mapped back to the original individual sequences?

Ideally, I would like a final output of the alignment in Clustal (.aln) format
with the original coordinates and chromosome names.

Any idea how I can solve the problem?
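Here is a minimal Python sketch of the back-mapping. It assumes the contigs were joined end to end in the order they appear in the input multi-FASTA file, with no spacer characters between them, and that positions are 1-based. Both of these are assumptions, not confirmed Mauve behaviour. The function names are hypothetical, so check how your Mauve version reports coordinates before relying on this.

```python
from bisect import bisect_right
from itertools import accumulate

def fasta_lengths(lines):
    """Return (names, lengths) for each record in an iterable of FASTA lines."""
    names, lengths = [], []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Record name = first whitespace-delimited token of the header
            names.append(line[1:].split()[0])
            lengths.append(0)
        elif line:
            lengths[-1] += len(line)
    return names, lengths

def to_contig_coord(pos, names, lengths):
    """Map a 1-based position in the concatenated sequence to
    (contig name, 1-based position within that contig).

    Assumes contigs were concatenated end to end, in the given order,
    with no spacers (an assumption, not confirmed Mauve behaviour).
    """
    total = sum(lengths)
    if not 1 <= pos <= total:
        raise ValueError(f"position {pos} is outside 1..{total}")
    # 0-based start offset of each contig: [0, len0, len0 + len1, ...]
    offsets = [0] + list(accumulate(lengths))[:-1]
    # Pick the last contig whose start offset is <= pos - 1
    i = bisect_right(offsets, pos - 1) - 1
    return names[i], pos - offsets[i]
```

For example, take three contigs of lengths 100, 50 and 200. Position 151 of the concatenated sequence maps to position 1 of the third contig. You can obtain the names and lengths with `fasta_lengths(open(path))` on the same multi-FASTA file that was given to Mauve.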



With best regards,

Martin

--
Dr. Martin Münsterkötter
MIPS - Institute of Bioinformatics and Systems Biology
Helmholtz Zentrum München
German Research Center for Environmental Health (GmbH)
Ingolstädter Landstr. 1
85764 Neuherberg
Germany

http://www.helmholtz-muenchen.de/mips

Phone: +49-89-3187-3579
Fax: +49-89-3187-3585


________________________________________
From: Guy Plunkett III [[email protected]]
Sent: Friday, 7 September 2012 17:59
To: [email protected]
Subject: Re: [Mauve-users] problem with input files in mauve

I'm replying to Alina off-list, but I thought I'd pass on some more general
information to all.

(1) One issue she was having resulted from FASTA files in which the sequence was
not wrapped, i.e., each sequence in the file consisted of an identifier line
and a single sequence line, regardless of sequence length. While this is
technically a valid FASTA file, Mauve doesn't deal well with such files. I don't
know the maximum line length Mauve can handle, so I usually wrap sequences at
60 to 80 residues per line. Doing that with a file Alina sent me did the trick.

So how can this be accomplished?  Some folks will be proficient enough that 
they will just do it via command line text manipulation (regex, grep, sed, awk, 
etc.). Some folks will have a favorite text editor that readily allows such 
manipulations (BBEdit on Mac, for example). But the simplest solution for many 
folks will be a web service running Don Gilbert's venerable Readseq. Two such 
servers are <http://www.ebi.ac.uk/cgi-bin/readseq.cgi> and 
<http://iubio.bio.indiana.edu/cgi-bin/readseq.cgi>.

Just upload your file with the Choose File button, select Pearson|Fasta|fa as 
the output format under Options, and click on Submit. A file called 
"readseq.cgi" is downloaded to your computer. The file is wrapped at 60 
residues/line, and works with Mauve as is (although you might want to rename 
it).
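If you prefer to do this locally rather than through a web server, here is a minimal Python sketch of the same re-wrapping. It assumes a plain FASTA file whose header lines start with ">". The 60-residue default matches the Readseq output described above. The function names are hypothetical.

```python
def _chunks(seq, width):
    """Yield successive `width`-sized pieces of a sequence string."""
    for i in range(0, len(seq), width):
        yield seq[i:i + width]

def wrap_fasta(lines, width=60):
    """Yield FASTA lines with every sequence re-wrapped to `width` residues per line."""
    seq = []
    for line in lines:
        line = line.strip()  # also removes Windows-style \r\n endings
        if line.startswith(">"):
            # Emit the previous record's sequence before starting a new record
            yield from _chunks("".join(seq), width)
            seq = []
            yield line
        elif line:
            seq.append(line)
    yield from _chunks("".join(seq), width)  # sequence of the last record

def wrap_fasta_file(src_path, dst_path, width=60):
    """Re-wrap the FASTA file at src_path and write the result to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out_line in wrap_fasta(src, width):
            dst.write(out_line + "\n")
```

Usage would be along the lines of `wrap_fasta_file("unwrapped.fasta", "wrapped.fasta")`, where both filenames are placeholders.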

(2) The second issue was a draft genome in many contigs, where she received the
data as a directory full of individual files. For Mauve to treat those
sequences as a single genome, you need to convert the multiple single-sequence
files into a single multiple-sequence file. The only way I know to do this is
via the command line, but it is straightforward and works for both FASTA and
GenBank formats. Assuming you have all the .gbk sequences for a genome in the
directory "genomes", launch the Terminal (Mac) or Command Prompt (Windows) and
navigate to the directory that contains the "genomes" directory. Then type the
command to merge all the files in the genomes directory into a single new file
called all.gbk (or whatever name you want to use; this works for any file
extension).

On a Mac, type
        cat genomes/*.gbk > all.gbk

On Windows, type
        copy /a genomes\*.gbk all.gbk
(the /a is required to ensure that the resulting file is plain text)
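Here is a platform-independent alternative to `cat`/`copy`, sketched in Python. A plain byte-for-byte concatenation can run the last line of one file into the first line of the next, for example a GenBank "//" terminator with no trailing newline. This sketch avoids that by adding a newline after any file that does not already end with one. It assumes every matching file is a complete FASTA or GenBank file. Files are merged in sorted filename order, which may not be the contig order you want. The function names are hypothetical.

```python
from pathlib import Path

def join_records(texts):
    """Concatenate file contents, adding a newline after any text lacking one."""
    parts = []
    for text in texts:
        parts.append(text)
        if text and not text.endswith("\n"):
            parts.append("\n")
    return "".join(parts)

def merge_sequence_files(directory, pattern, out_path):
    """Merge all files in `directory` matching `pattern` (e.g. "*.gbk") into out_path.

    Files are read in sorted name order; the output file itself is skipped if it
    happens to match the pattern. Returns the number of files merged.
    """
    out = Path(out_path).resolve()
    files = [f for f in sorted(Path(directory).glob(pattern)) if f.resolve() != out]
    merged = join_records(f.read_text() for f in files)
    out.write_text(merged)
    return len(files)
```

With the directory layout above, the call would be `merge_sequence_files("genomes", "*.gbk", "all.gbk")`.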

Hope folks find this useful.

Cheers,
Guy

Dr. Guy Plunkett III
Senior Scientist, Genome Center of Wisconsin
Senior Scientist, DNASTAR
http://www.genome.wisc.edu/information/gplunkett.html


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Mauve-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mauve-users

Helmholtz Zentrum München
German Research Center for Environmental Health (GmbH)
Ingolstädter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Chair of the Supervisory Board: MinDir'in Bärbel Brumme-Bothe
Managing Directors: Prof. Dr. Günther Wess and Dr. Nikolaus Blum
Registry court: Amtsgericht München (Munich Local Court), HRB 6466
VAT ID: DE 129521671
