[Tutor] Character counting again, was Re: Tutor Digest, Vol 121, Issue 56

2014-03-24 Thread Peter Otten
Jumana yousef wrote:

[Please don't reply to the digest. At the very least change the subject to 
its original text. Thank you.]

> Just a reminder of my data:
> it consists of multiple DNA sequences. For each one I need to count the
> bases (characters), calculate the percentage of C+G, and calculate the
> entropy.
> Before each sequence there is a header or identifier (let's say ID),
> so it looks like this:
> >ID 1…etc
> AAGGTAACCATATATACCGGG….etc (up to or even more than 3000 characters)
> >ID 2…etc
> AAATAAATTTATATATACGCGCGCATGG….. etc
> … etc
> I need the output to look like this:
> > ID…1.. etc
> sequence length = a value
> C & G content: a value
> Entropy = a value
> > ID…2.. etc
> sequence length = a value
> C & G content: a value
> Entropy = a value
> ….etc
>
>
> I wrote a program close to what Denis suggested. However, it works only if
> I have one sequence (one header and one sequence); I cannot modify it to
> work with several sequences (like above). I also get an incorrect value
> for entropy (H).
> 
> #!/usr/bin/python

If you put the following into a function, say show_stats(seq)

> print ' Sequence length : ', len(seq)
> counters = {}
> for char in seq:
>     char = char.strip()
>     if counters.has_key(char):
>         counters[char] += 1
>     else:
>         counters[char] = 1
> c_g = 100*(counters['C']+counters['G'])/len(seq)
> print ' The C & G content: ' '%.1f'%  c_g, '%'
> import math
> all = len(seq)
> Pa = (counters['A'])/all
> Pc = counters['C']/all
> Pg = counters['G']/all
> Pt = counters['T']/all
>
> H =-1*(Pa*math.log(Pa,2) + Pc*math.log(Pc,2) + Pg*math.log(Pg,2) +
>        Pt*math.log(Pt,2))
>
> print ' H = ' , H

you can invoke that function in and after the while loop like so:

> seq = ''
> while True:
>     try:
>         line = raw_input()
>         index = line.find('>')
>         if index > -1:
              if seq:
                  show_stats(seq)
                  seq = ""
>             print line
>         else:
>             line = line.rstrip()
>             line = line.upper()
>             seq = seq + line
>     except:
>         break

if seq:
    show_stats(seq)
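
A minimal sketch of the same "flush the previous sequence when a new header
arrives" idea, written for Python 3 and reading from any iterable of lines
instead of raw_input(). The name iter_records and the sample data are made up
for illustration; they are not part of the original program.

```python
def iter_records(lines):
    """Yield (header, sequence) pairs from FASTA-like lines.

    Sequence lines that appear before the first header are discarded.
    """
    header = None
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header starts: hand out the record collected so far.
            if header is not None:
                yield header, "".join(parts)
            header = line
            parts = []
        else:
            parts.append(line.upper())
    # Do not forget the last record once the input is exhausted.
    if header is not None:
        yield header, "".join(parts)


# Hypothetical sample input, split over several lines per record.
sample = [">ID 1", "AAGGTAACC", "atatat", ">ID 2", "AAATAAAT"]
for header, seq in iter_records(sample):
    print(header)
    print(" Sequence length :", len(seq))
```

With a real file you could probably pass the open file object directly, since
iterating over it yields one line at a time.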

> I do not know why Pa, Pc, Pg, Pt give me a value of 0, although when I
> type counters['A'], counters['C'], counters['T'], counters['G'] or all I
> get values > 0.

When you divide an integer by an integer Python 2 gives you an integer by 
default:

>>> 1/3
0

You can avoid that by converting at least one operand to float:

>>> float(1)/3
0.3333333333333333
>>> 1/float(3)
0.3333333333333333

or by putting the following magic import at the beginning of every module 
where you want float or "true" division rather than integer division:

>>> from __future__ import division
>>> 1/3
0.3333333333333333
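
To tie the division fix back to the entropy calculation, here is a rough
sketch that converts the total to float once, so every ratio is a true
division on both Python 2.7 and Python 3. The function names gc_percent and
shannon_entropy are invented for this example. The sketch also loops only over
the symbols that actually occur, on the assumption that a sequence may lack
one of the four bases (the original formula would then fail on the missing
key).

```python
import math
from collections import Counter


def gc_percent(seq):
    """Percentage of C and G in seq; seq must not be empty."""
    counts = Counter(seq)
    total = float(len(seq))  # float operand forces true division
    return 100 * (counts["C"] + counts["G"]) / total


def shannon_entropy(seq):
    """Shannon entropy in bits per symbol; seq must not be empty."""
    counts = Counter(seq)
    total = float(len(seq))
    entropy = 0.0
    for n in counts.values():
        p = n / total  # p > 0 because only observed symbols are counted
        entropy -= p * math.log(p, 2)
    return entropy


print(gc_percent("AACG"))
print(shannon_entropy("ACGT"))
```

An empty sequence raises ZeroDivisionError here; whether to report that or
skip the record is a choice for the calling code.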

> So please, how can I fix these calculations, and how do I modify this program to
> read each sequence, print the results, then read the second one and print the
> results, and so on?
> 
> Many thanks for your help and support.

 



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] character counting

2014-03-23 Thread Cameron Simpson
On 23Mar2014 17:28, Mustafa Musameh  wrote:
> Hi;
> I have a file that looks like this:
> >title 1
> AAATTTGGGCCCATA...
> TTAACAAGTTAAAT
> >title 2
> AAATTTAAACCC
> ATATATATA
> 
> 
> I wrote the following to count the As, Cs, Gs and Ts for each title:
> 
> import sys
>
> file = open('file.fna')
>
> data=file.readlines()
> for line in data:
>     line = line.rstrip()
>     if line.startswith('>') :
>         print line
>     if not line.startswith('>') :

You could just say "else" here instead of "if not".

>         seq = line.rstrip()
>         counters={}
>         for char in seq:
>             counters[char] = counters.get(char,0) + 1
>         Ks = counters.keys()
>         Ks.sort()
>         for k in Ks:
>             print sum(counters.itervalues())

This prints the same sum as many times as there are keys.
Notice that your print statement has no mention of "k"?

You either want just the "print" with no loop over Ks or you want
the loop, with some expression inside which changes depending on
the value of "k". Your call, of course, depending on your desired
result.
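
The two choices might look roughly like this. It is only a sketch in Python 3
syntax, using one line of the sample data as a stand-in for "seq".

```python
# Count the characters of one sequence line, as in the quoted code.
counters = {}
for char in "AAATTTGGGCCCATA":
    counters[char] = counters.get(char, 0) + 1

# Choice 1: a single total, printed once, with no loop over the keys.
total = sum(counters.values())
print(total)

# Choice 2: keep the loop, and make the body depend on k.
for k in sorted(counters):
    print(k, counters[k])
```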

Cheers,
-- 
Cameron Simpson 

"Don't you know the speed limit is 55 miles per hour???"
"Yeah, but I wasn't going to be out that long."
- Steven Wright


Re: [Tutor] character counting

2014-03-23 Thread Alan Gauld

On 23/03/14 06:28, Mustafa Musameh wrote:

Hi;
I have a file that looks like this:
 >title 1
AAATTTGGGCCCATA...
TTAACAAGTTAAAT…
 >title 2
AAATTTAAACCC…
ATATATATA…
…





I want to get the following output:

 >title
234
 >title 1
3453
….


Your example data and example output don't match - at least
not in any way I can see.

Can you provide sample input and output from that sample?
That will help us understand exactly what you want.

It might be useful to break the code into functions so that
you have one to read the lines and if appropriate call a
second that analyzes a line returning the counts. Then
a third function can print the results in the format
you want. An optional fourth function could assign the
analysis results to the dictionary but that's probably
overkill.

You could even ignore the first one and just make
it your main driver code, but the second and third would
be helpful in testing and make the main code easier
to read.
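
One possible shape for that split, as a sketch only (Python 3). The names
count_bases, format_result and summarize are invented here, and it assumes the
wanted number per title is the total character count over all of its lines,
which the sample output does not make certain.

```python
from collections import Counter


def count_bases(line):
    """Analyze one sequence line and return its per-character counts."""
    return Counter(line.strip().upper())


def format_result(title, counts):
    """Format one title and its total character count for printing."""
    return "%s\n%d" % (title, sum(counts.values()))


def summarize(lines):
    """Driver: group lines by title and return one formatted result each."""
    results = []
    title = None
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                results.append(format_result(title, totals))
            title, totals = line, Counter()
        else:
            totals.update(count_bases(line))  # add this line's counts
    if title is not None:
        results.append(format_result(title, totals))
    return results


sample = [">title 1", "AAATTTGGGCCCATA", "TTAACAAGTTAAAT",
          ">title 2", "AAATTTAAACCC", "ATATATATA"]
print("\n".join(summarize(sample)))
```

Because count_bases and format_result take plain values and return plain
values, each can be tried at the interactive prompt without a data file.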


HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos



Re: [Tutor] character counting

2014-03-23 Thread spir

On 03/23/2014 07:28 AM, Mustafa Musameh wrote:

Hi;
I have a file that looks like this:
>title 1
AAATTTGGGCCCATA...
TTAACAAGTTAAAT…
>title 2
AAATTTAAACCC…
ATATATATA…
…

I wrote the following to count the As, Cs, Gs and Ts for each title:

import sys

file = open('file.fna')

data=file.readlines()
for line in data:
    line = line.rstrip()
    if line.startswith('>') :
        print line
    if not line.startswith('>') :
        seq = line.rstrip()
        counters={}
        for char in seq:
            counters[char] = counters.get(char,0) + 1
        Ks = counters.keys()
        Ks.sort()
        for k in Ks:
            print sum(counters.itervalues())

I want to get the following output:

>title
234
>title 1
3453
….
but what I get is:

>title 1
60
60
60
60
…
It seems it counts each line and prints the result.

Can you help me please?
Thanks


(Your code does not work at all as posted. You probably did not copy and paste
a running program.)


You are not taking into account the fact that there is a small, predefined set
of bases, which are the keys of the 'counters' dict. This would simplify your
code: see the lines marked "***" below. Example (adapted to Python 3, and
reading from a string directly instead of a file):


data = """\
>title 1
AAATTTGGGCCCATA
TTAACAAGTTAAAT
>title 2
AAATTTAAACCC
ATATATATA
"""

for line in data.split("\n"):
    line = line.strip()
    if line == "":  # for last line, maybe others
        continue
    if line.startswith('>'):
        print(line)
        continue

    counters = {"A":0, "C":0, "G":0, "T":0} # ***
    for base in line:
        counters[base] += 1
    bases = ["A","C","G","T"]   # ***
    for base in bases:
        print(counters[base], end=" ")
    print()
==>

>title 1
5 3 3 4
7 1 1 5
>title 2
6 3 0 3
5 0 0 4

Is this what you want?

denis


[Tutor] character counting

2014-03-23 Thread Mustafa Musameh
Hi;
I have a file that looks like this:
>title 1
AAATTTGGGCCCATA...
TTAACAAGTTAAAT…
>title 2
AAATTTAAACCC…
ATATATATA…
…

I wrote the following to count the As, Cs, Gs and Ts for each title:

import sys

file = open('file.fna')

data=file.readlines()
for line in data:
    line = line.rstrip()
    if line.startswith('>') :
        print line
    if not line.startswith('>') :
        seq = line.rstrip()
        counters={}
        for char in seq:
            counters[char] = counters.get(char,0) + 1
        Ks = counters.keys()
        Ks.sort()
        for k in Ks:
            print sum(counters.itervalues())

I want to get the following output:

>title
234
>title 1
3453
….
but what I get is:
>title 1
60
60
60
60
…
It seems it counts each line and prints the result.

Can you help me please?
Thanks
