Hello.

I am trying to extract the protein translations from a GenBank file. I can get the protein sequences out, but I would also like to carry the information about each translation over from the file.

Example:

     CDS             complement(93..1919)
                     /translation="MTLENTSPNPSQISLNLSGGIALGAYMAGVCFELVRQARKDNSP
                     LLIDLITGASAGAMTGAITAYYLLNREISNTEYESQNILQRAWVEKADMKDIDTVFAI
                     EDYRQVLNNLFKSQNESLLSQKGIKNIANLITENTDQLKVHQPLALVMTVTNLQGLLV
                     /product="hypothetical protein"

I have found two programs that could solve this problem. coderet does give me the protein sequences, but the FASTA description lines of the proteins cannot easily be related back to the GenBank file.

unknown_pro_1
MTLENTSPNPSQISLNLSGGIALGAYMAGVCFELVRQARKDNSPLLIDLITGASAGAMTG
AITAYYLLNREISNTEYESQNILQRAWVEKADMKDIDTVFAIEDYRQVLNNLFKSQNESL
LSQKGIKNIANLITENTDQLKVHQPLALVMTVTNLQGLLV

extractfeat gives me sensible description lines, but so far I have not been able to make it output the protein sequence rather than the DNA sequence.

scaffold00002_93_1919 [CDS] Contig scaffold00002
atgaccctagaaaatacctctcccaatcctagtcaaatttccctaaatttgtcgggagga
attgccctcggagcttatatggctggggtgtgttttgaattagttagacaagccagaaaa
gacaattctcccctgttaattgatttgattaccggagcatctgctggggcgatgaccgga
....
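
One workaround I can imagine is translating the extractfeat DNA output myself. Below is a minimal sketch in Python, assuming the standard genetic code and assuming the sequence is already in the coding orientation (the atg at the start of the output above suggests it is). The function name translate is my own and not part of EMBOSS.

```python
# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding-strand DNA string, reading frame 1.

    Stop codons become '*', codons with ambiguous bases become 'X',
    and a trailing partial codon is ignored.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)
```

Applied to the first line of the extractfeat output above, this should reproduce the start of the annotated translation. It would not honour an alternative translation table if the file specifies one.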


So, are there any other programs, or options/switches for the ones I have mentioned, that I should be using?
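
To make the goal concrete, here is a rough sketch in Python of the output I am after: the /translation of each CDS, written as FASTA with a header built from the location and /product. It is only an illustration under assumptions: it expects the standard GenBank layout (feature keys indented 5 columns, qualifiers 21), it does not handle quoted values whose continuation lines start with "/", and the function names and the header format are my own choices, not anything EMBOSS produces.

```python
import re

QUALIFIER = re.compile(r"/(\w+)(?:=(.*))?$")


def parse_features(genbank_text):
    """Return [key, location, qualifiers] lists from the FEATURES table."""
    features = []
    in_table = False
    qual = None  # name of the qualifier currently being read
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        if not line.startswith(" "):
            break  # ORIGIN, CONTIG or "//" ends the feature table
        text = line.strip()
        if line[:5] == "     " and line[5] != " ":
            # New feature: key at column 6, location after it.
            key, _, location = text.partition(" ")
            features.append([key, location.strip(), {}])
            qual = None
        elif features:
            match = QUALIFIER.match(text)
            if match:
                qual = match.group(1)
                features[-1][2][qual] = match.group(2) or ""
            elif qual is None:
                features[-1][1] += text  # location wrapped over lines
            else:
                features[-1][2][qual] += " " + text  # wrapped value
    return features


def cds_fasta(genbank_text, seq_id):
    """Build FASTA text for every CDS that carries a /translation."""
    out = []
    for key, location, quals in parse_features(genbank_text):
        if key != "CDS" or "translation" not in quals:
            continue
        protein = quals["translation"].strip('"').replace(" ", "")
        product = quals.get("product", "").strip('"')
        out.append(f">{seq_id}_{location} [CDS] {product}")
        out.extend(protein[i:i + 60] for i in range(0, len(protein), 60))
    return "\n".join(out)
```

Called as cds_fasta(text, "scaffold00002") on the contents of the file, this would give description lines that contain the original location string, so each protein can be traced back to its feature.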


TIA,

Karin

--
Karin Lagesen
Post Doc
Centre of Ecological and Evolutionary Synthesis (CEES)
Department of Biology
University of Oslo
P.O. Box 1066 - Blindern
N-0316 Oslo
Norway
_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss