I have many notepad documents that all contain long chunks of genetic code. They look something like this:
atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag
tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa
agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt
ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa

Basically, I want to design a Python program that can open and read these documents. However, I want them read three bases at a time (to analyse them codon by codon) and I want to look up the value assigned to each codon. For example, if the three bases were UUU, the value assigned to that codon in the codon value table would be 0.296.

The program has to read the whole sequence three letters at a time, look up the value for every codon, multiply all those values together and raise the product to the power of 1 / (the length of the sequence in codons), where the length in codons is the length of the whole sequence divided by three.

To make things more complicated, the sequences in the files are in lowercase while the codon value table is in uppercase, so the sequences need to be converted to uppercase. The Ts in the DNA sequences also need to be changed to Us (again to match the codon value table). Finally, before the DNA sequences are read and analysed, I need to remove the first 50 codons (i.e. the first 150 letters) and the last 20 codons (the last 60 letters).

I've also been having problems ensuring the program reads ALL of the sequence three letters at a time. I've tried various ways of doing this but keep coming unstuck along the way. Has anyone got any suggestions for how they would tackle this problem?

Thanks for any help received!
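A likely reason a sequence is not read cleanly three letters at a time is that the files break the sequence up with spaces or newlines (as in the sample above), so fixed-size chunks straddle those breaks. A minimal sketch, assuming the files contain only sequence letters and whitespace: remove all whitespace first, then step through the string in strides of three. The function names `clean_sequence` and `split_codons` are hypothetical.

```python
def clean_sequence(raw: str) -> str:
    """Strip all whitespace (spaces, tabs, newlines), uppercase, and turn T into U."""
    return "".join(raw.split()).upper().replace("T", "U")


def split_codons(seq: str) -> list[str]:
    """Split seq into consecutive, non-overlapping 3-letter codons.

    Any 1 or 2 leftover letters at the end are dropped rather than
    being returned as a short 'codon'.
    """
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    raw = "atggctaaac tgacc\naag"
    rna = clean_sequence(raw)
    print(rna)                # AUGGCUAAACUGACCAAG
    print(split_codons(rna))  # ['AUG', 'GCU', 'AAA', 'CUG', 'ACC', 'AAG']
```

Because the stride is fixed at three and starts at index 0, every letter up to the last complete codon is visited exactly once.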
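The whole calculation can be sketched as follows. This is a hedged sketch, not a definitive implementation, and several parts are assumptions: `CODON_VALUES` is a hypothetical table holding only the one value quoted in the post (UUU = 0.296), so the rest must be filled in from the real codon value table; the exponent uses the number of codons left after trimming; a codon missing from the table raises `KeyError`; and `score_files` with its `*.txt` pattern is a hypothetical helper for running over a folder of files. It sums logarithms instead of multiplying directly, because the product of hundreds of values below 1 can underflow to 0.0 in floating point.

```python
import math
from pathlib import Path

# Hypothetical, incomplete table: only UUU = 0.296 comes from the post.
# Fill in the remaining codons from your own codon value table.
CODON_VALUES = {
    "UUU": 0.296,
}

TRIM_START_CODONS = 50  # drop the first 50 codons (150 letters)
TRIM_END_CODONS = 20    # drop the last 20 codons (60 letters)


def sequence_score(raw: str, table: dict[str, float] = CODON_VALUES) -> float:
    """Geometric mean of the per-codon values of the trimmed sequence."""
    # Remove whitespace first so the trim counts real bases only,
    # then match the table: uppercase and T -> U.
    seq = "".join(raw.split()).upper().replace("T", "U")
    seq = seq[TRIM_START_CODONS * 3 : len(seq) - TRIM_END_CODONS * 3]

    n_codons = len(seq) // 3  # a trailing partial codon is ignored
    if n_codons == 0:
        raise ValueError("sequence is too short after trimming")

    # (v1 * v2 * ... * vn) ** (1/n) == exp((ln v1 + ... + ln vn) / n),
    # but the log form cannot underflow to 0.0 for long sequences.
    log_sum = 0.0
    for i in range(0, n_codons * 3, 3):
        codon = seq[i:i + 3]
        log_sum += math.log(table[codon])  # KeyError if codon not in table
    return math.exp(log_sum / n_codons)


def score_files(folder: str, pattern: str = "*.txt") -> dict[str, float]:
    """Score every file in folder whose name matches pattern."""
    return {
        path.name: sequence_score(path.read_text())
        for path in sorted(Path(folder).glob(pattern))
    }
```

For example, `score_files("sequences")` would return a dict mapping each file name in a hypothetical `sequences` folder to its score. On Python 3.8 and later, `statistics.geometric_mean()` computes the same exp-of-mean-log quantity and could replace the loop once the codon values have been collected into a list.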