[Tutor] Character counting again, was Re: Tutor Digest, Vol 121, Issue 56
Jumana yousef wrote:

[Please don't reply to the digest. At the very least change the subject to its original text. Thank you.]

> just a reminder of my data:
> it consists of multiple sequences of DNA for which I need to count the bases (characters), calculate the percentage of C+G, and calculate the entropy.
> before each sequence there is a header or identifier (let's say ID)
> so it is like
> >ID 1…etc
> AAGGTAACCATATATACCGGG….etc (up to or even more than 3000 characters)
> >ID 2…etc
> AAATAAATTTATATATACGCGCGCATGG….. etc
> … etc
> I need the output to be like this:
>
> ID…1.. etc
> sequence length = a value
> C & G content: a value
> Entropy = a value
>
> ID…2.. etc
> sequence length = a value
> C & G content: a value
> Entropy = a value
> ….etc
>
> I wrote a program close to what Denis suggested; however, it works only if I have one sequence (one header and one sequence). I cannot modify it to work if I have several sequences (like above). I also get an incorrect value for entropy (H).
>
> #!/usr/bin/python

If you put the following into a function, say show_stats(seq)

>     print ' Sequence length : ', len(seq)
>     counters = {}
>     for char in seq:
>         char = char.strip()
>         if counters.has_key(char):
>             counters[char] += 1
>         else:
>             counters[char] = 1
>     c_g = 100*(counters['C']+counters['G'])/len(seq)
>     print ' The C & G content: ' '%.1f'% c_g, '%'
>     import math
>     all = len(seq)
>     Pa = (counters['A'])/all
>     Pc = counters['C']/all
>     Pg = counters['G']/all
>     Pt = counters['T']/all
>
>     H =-1*(Pa*math.log(Pa,2) + Pc*math.log(Pc,2) + Pg*math.log(Pg,2) + Pt*math.log(Pt,2))
>
>     print ' H = ' , H

you can invoke that function in and after the while loop like so:

> seq = ''
> while True:
>     try:
>         line = raw_input()
>         index = line.find('>')
>         if index > -1:
              if seq:
                  show_stats(seq)
                  seq = ""
>             print line
>         else:
>             line = line.rstrip()
>             line = line.upper()
>             seq = seq + line
>     except:
>         break

  if seq:
      show_stats(seq)

> I do not know why Pa, Pc, Pg, Pt give me a value of 0, although when I type counters['A'], counters['C'], counters['T'], counters['G'] or all I get values > 0.

When you divide an integer by an integer Python 2 gives you an integer by default:

>>> 1/3
0

You can avoid that by converting at least one operand to float

>>> float(1)/3
0.3333333333333333
>>> 1/float(3)
0.3333333333333333

or by putting the following magic import at the beginning of every module where you want float or "true" division rather than integer division:

>>> from __future__ import division
>>> 1/3
0.3333333333333333

> So please, how can I fix these calculations, and how do I modify this program to read each sequence, print the results, then read the second one and print the results, and so on?
>
> Many thanks for your help and support.

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
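The two fixes discussed in this message (true division, and flushing the collected sequence each time a new header appears) can be sketched together as follows. This is a minimal Python 3 sketch, not the code from the thread: the names sequence_stats and iter_records and the sample data are made up for illustration. It assumes headers start with ">" and computes the entropy over whatever characters actually occur in the sequence, so a base that never appears contributes nothing instead of triggering a log(0) error.

```python
import math
from collections import Counter


def sequence_stats(seq):
    """Return (length, GC percentage, Shannon entropy in bits) for one sequence."""
    length = len(seq)
    if length == 0:
        # Assumption: an empty record reports zeros instead of dividing by zero.
        return 0, 0.0, 0.0
    counts = Counter(seq)
    # In Python 3 the / operator is always true division, so no float() is needed.
    gc_percent = 100 * (counts["C"] + counts["G"]) / length
    # Only characters that occur are in counts, so log2 never sees 0.
    entropy = -sum((n / length) * math.log2(n / length) for n in counts.values())
    return length, gc_percent, entropy


def iter_records(lines):
    """Yield (header, sequence) pairs; a header is any line starting with '>'."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                # A new header ends the previous record.
                yield header, "".join(parts)
            header, parts = line, []
        else:
            parts.append(line.upper())
    if header is not None:
        # Do not forget the last record in the input.
        yield header, "".join(parts)


# Hypothetical sample data in the layout described in the question.
sample = """\
>ID 1
AAGGTAACC
ATATATACCGGG
>ID 2
ACGT
"""

for header, seq in iter_records(sample.splitlines()):
    length, gc_percent, entropy = sequence_stats(seq)
    print(header)
    print("sequence length =", length)
    print("C & G content: %.1f %%" % gc_percent)
    print("Entropy = %.3f" % entropy)
```

Keeping the record splitting separate from the statistics means each part can be checked on its own, for example by calling sequence_stats("ACGT") directly.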
Re: [Tutor] character counting
On 23Mar2014 17:28, Mustafa Musameh wrote:
> Hi;
> I have a file that looks like this:
> >title 1
> AAATTTGGGCCCATA...
> TTAACAAGTTAAAT
> >title 2
> AAATTTAAACCC
> ATATATATA
>
>
> I wrote the following to count the As, Cs, Gs and Ts for each title:
>
> import sys
>
> file = open('file.fna')
>
> data=file.readlines()
> for line in data:
>     line = line.rstrip()
>     if line.startswith('>') :
>         print line
>     if not line.startswith('>') :

You could just say "else" here instead of "if not".

>         seq = line.rstrip()
>         counters={}
>         for char in seq:
>             counters[char] = counters.get(char,0) + 1
>         Ks = counters.keys()
>         Ks.sort()
>         for k in Ks:
>             print sum(counters.itervalues())

This prints the same sum as many times as there are keys. Notice that your print statement has no mention of "k"?

You either want just the "print" with no loop over Ks, or you want the loop, with some expression inside which changes depending on the value of "k".

Your call, of course, depending on your desired result.

Cheers,
--
Cameron Simpson

"Don't you know the speed limit is 55 miles per hour???"
"Yeah, but I wasn't going to be out that long."
- Steven Wright
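The two alternatives described above can be sketched like this. It is a small Python 3 illustration rather than the poster's program: the single sample line is taken from the question, and sorted() replaces the Python 2 keys()/sort() pair.

```python
line = "AAATTTGGGCCCATA"  # one sequence line from the sample data

counters = {}
for char in line:
    counters[char] = counters.get(char, 0) + 1

# Option 1: a single total, printed once, with no loop over the keys.
total = sum(counters.values())
print(total)

# Option 2: one line per base; the loop body now depends on k.
for k in sorted(counters):
    print(k, counters[k])
```

Which option fits depends on whether the total length or the per-base breakdown is wanted.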
Re: [Tutor] character counting
On 23/03/14 06:28, Mustafa Musameh wrote:
> Hi;
> I have a file that looks like this:
> >title 1
> AAATTTGGGCCCATA...
> TTAACAAGTTAAAT…
> >title 2
> AAATTTAAACCC…
> ATATATATA…
> …
>
> I want to get the following output:
> >title 234
> >title 1 3453
> ….

Your example data and example output don't match - at least not in any way I can see. Can you provide sample input and output from that sample? That will help us understand exactly what you want.

It might be useful to break the code into functions so that you have one to read the lines and, if appropriate, call a second that analyzes a line, returning the counts. Then a third function can print the results in the format you want. An optional fourth function could assign the analysis results to the dictionary, but that's probably overkill.

You could even ignore the first one and just make it your main driver code, but the second and third would be helpful in testing and make the main code easier to read.

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos
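One possible shape for the function split suggested above, as a hedged Python 3 sketch. The function names (count_bases, format_result, analyze) are invented for illustration, and because the desired output in the question is unclear, the one-line-per-title format with A, C, G, T totals is only an assumption.

```python
def count_bases(line, counts=None):
    """Add the base counts of one sequence line to counts and return it."""
    if counts is None:
        counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for base in line.strip().upper():
        counts[base] = counts.get(base, 0) + 1
    return counts


def format_result(title, counts):
    """Format one title and its A, C, G, T totals on a single line."""
    return "%s %s" % (title, " ".join(str(counts[b]) for b in "ACGT"))


def analyze(lines):
    """Driver: group sequence lines under their title and collect the totals."""
    results = []
    title, counts = None, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                results.append((title, counts))
            # Start a fresh, zeroed counter for the new title.
            title, counts = line, count_bases("")
        elif title is not None:
            count_bases(line, counts)
    if title is not None:
        results.append((title, counts))
    return results


# Sample data from the question, with the trailing ellipses left out.
sample = [">title 1", "AAATTTGGGCCCATA", "TTAACAAGTTAAAT",
          ">title 2", "AAATTTAAACCC", "ATATATATA"]
for title, counts in analyze(sample):
    print(format_result(title, counts))
```

Because count_bases and format_result take plain arguments and return values, each can be tested in isolation before the driver is written.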
Re: [Tutor] character counting
On 03/23/2014 07:28 AM, Mustafa Musameh wrote:
> Hi;
> I have a file that looks like this:
> >title 1
> AAATTTGGGCCCATA...
> TTAACAAGTTAAAT…
> >title 2
> AAATTTAAACCC…
> ATATATATA…
> …
>
> I wrote the following to count the As, Cs, Gs and Ts for each title:
>
> import sys
>
> file = open('file.fna')
>
> data=file.readlines()
>
> for line in data:
>     line = line.rstrip()
>     if line.startswith('>') :
>         print line
>     if not line.startswith('>') :
>         seq = line.rstrip()
>         counters={}
>         for char in seq:
>             counters[char] = counters.get(char,0) + 1
>         Ks = counters.keys()
>         Ks.sort()
>         for k in Ks:
>             print sum(counters.itervalues())
>
> I want to get the following output:
> >title 234
> >title 1 3453
> ….
> but what I get is:
> >title 1
> 60
> 60
> 60
> 60
> …
> It seems to count each line separately and print the result. Can you help me please?
> Thanks

(Your code does not work at all, as is. Probably you did not just copy and paste a running program.)

You are not taking into account the fact that there is a small, predefined set of bases, which are the keys of the 'counters' dict. This would simplify your code: see the lines below marked with "***". Example (adapted to Python 3, and to read a string directly instead of a file):

    data = """\
    >title 1
    AAATTTGGGCCCATA
    TTAACAAGTTAAAT
    >title 2
    AAATTTAAACCC
    ATATATATA
    """

    for line in data.split("\n"):
        line = line.strip()
        if line == "":      # for last line, maybe others
            continue
        if line.startswith('>'):
            print(line)
            continue

        counters = {"A":0, "C":0, "G":0, "T":0}     # ***
        for base in line:
            counters[base] += 1
        bases = ["A","C","G","T"]                   # ***
        for base in bases:
            print(counters[base], end=" ")
        print()

==>

    >title 1
    5 3 3 4
    7 1 1 5
    >title 2
    6 3 0 3
    5 0 0 4

Is this what you want?

denis
[Tutor] character counting
Hi;
I have a file that looks like this:

>title 1
AAATTTGGGCCCATA...
TTAACAAGTTAAAT…
>title 2
AAATTTAAACCC…
ATATATATA…
…

I wrote the following to count the As, Cs, Gs and Ts for each title:

import sys

file = open('file.fna')

data=file.readlines()

for line in data:
    line = line.rstrip()
    if line.startswith('>') :
        print line
    if not line.startswith('>') :
        seq = line.rstrip()
        counters={}
        for char in seq:
            counters[char] = counters.get(char,0) + 1
        Ks = counters.keys()
        Ks.sort()
        for k in Ks:
            print sum(counters.itervalues())

I want to get the following output:

>title 234
>title 1 3453
….

but what I get is:

>title 1
60
60
60
60
…

It seems to count each line separately and print the result. Can you help me please?

Thanks