Re: [Tutor] arrangement of datafile

2014-01-10 Thread Albert-Jan Roskam

Ok, it's clear already that the OP has a csv file so the following is 
OFF-TOPIC. I was reading Python Cookbook and I saw a recipe to read fixed width 
files using struct.unpack. Much shorter and faster (esp. if you use compiled 
structs) than indexing. I thought this is a pretty cool approach: 
http://code.activestate.com/recipes/65224-accessing-substrings/.
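For readers who haven't seen the technique, here is a minimal sketch of fixed-width parsing with a precompiled `struct.Struct`, in the spirit of the linked recipe (its actual code may differ). The column layout is a hypothetical assumption: a 4-character residue number, a 3-character residue name and one 10-character field, separated by single spaces. The file in this thread is whitespace-separated, not fixed-width.

```python
import struct

# Hypothetical fixed-width layout (an assumption, not from the thread):
# 4 chars residue number, 1 pad byte, 3 chars residue name,
# 1 pad byte, 10 chars for one "ATOM=value" field.
RECORD = struct.Struct("4sx3sx10s")  # compiled once, reused for every line


def parse_fixed(line):
    """Split one fixed-width text line into (number, name, field)."""
    # unpack() needs exactly RECORD.size bytes, so pad/truncate the line,
    # then encode it, because struct works on bytes in Python 3.
    raw = line.rstrip("\n").ljust(RECORD.size)[:RECORD.size].encode("ascii")
    number, name, field = RECORD.unpack(raw)
    return (int(number.decode("ascii")),
            name.decode("ascii").strip(),
            field.decode("ascii").strip())


if __name__ == "__main__":
    print(parse_fixed("   1 GLY HA2=3.7850\n"))
```

Precompiling the format with `struct.Struct` avoids re-parsing the format string on every call, which is presumably what "compiled structs" refers to. Plain slicing (`line[0:4]`, `line[5:8]`, ...) is the indexing approach it is compared with.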

regards,
Albert-Jan

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] arrangement of datafile

2014-01-10 Thread Amrita Kumari
Hi Peter,

Thank you very much for your kind help. I got the output the way I
wanted (as you also showed in your output). I really appreciate your
effort.

Thanks for your time.
Amrita


On Thu, Jan 9, 2014 at 8:41 PM, Peter Otten <__pete...@web.de> wrote:

> [snip]


Re: [Tutor] arrangement of datafile

2014-01-09 Thread Peter Otten
Amrita Kumari wrote:

> On 17th Dec. I posted a question about how to arrange a data file in a
> particular fashion, so that I have only the residue number and the
> chemical shift value of the atom, as:
> 1  H=nil
> 2  H=8.8500
> 3  H=8.7530
> 4  H=7.9100
> 5  H=7.4450
> 
> Peter replied to this mail, but since I hadn't subscribed to the
> tutor mailing list at that time, I didn't receive the reply. I
> apologize for my mistake. Today I checked his reply, and he asked me to
> do a few things:

I'm sorry, I'm currently lacking the patience to tune into your problem 
again, but maybe the script that I wrote (but did not post) back then is of 
help.

The data sample:

$ cat residues.txt
1 GLY HA2=3.7850 HA3=3.9130
2 SER H=8.8500 HA=4.3370 N=115.7570
3 LYS H=8.7530 HA=4.0340 HB2=1.8080 N=123.2380
4 LYS H=7.9100 HA=3.8620 HB2=1.7440 HG2=1.4410 N=117.9810
5 LYS H=7.4450 HA=4.0770 HB2=1.7650 HG2=1.4130 N=115.4790
6 LEU H=7.6870 HA=4.2100 HB2=1.3860 HB3=1.6050 HG=1.5130 HD11=0.7690 HD12=0.7690 HD13=0.7690 N=117.3260
7 PHE H=7.8190 HA=4.5540 HB2=3.1360 N=117.0800
8 PRO HD2=3.7450
9 GLN H=8.2350 HA=4.0120 HB2=2.1370 N=116.3660
10 ILE H=7.9790 HA=3.6970 HB=1.8800 HG21=0.8470 HG22=0.8470 HG23=0.8470 HG12=1.6010 HG13=2.1670 N=119.0300
11 ASN H=7.9470 HA=4.3690 HB3=2.5140 N=117.8620
12 PHE H=8.1910 HA=4.1920 HB2=3.1560 N=121.2640
13 LEU H=8.1330 HA=3.8170 HB3=1.7880 HG=1.5810 HD11=0.8620 HD12=0.8620 HD13=0.8620 N=119.1360

The script:

$ cat residues.py
from __future__ import print_function  # print() works on Python 2.7 and 3

def process(filename):
    residues = {}
    with open(filename) as infile:
        for line in infile:
            parts = line.split()  # split line at whitespace
            if not parts:
                continue  # skip blank lines
            residue = int(parts.pop(0))  # convert first item to integer
            if residue in residues:
                raise ValueError("duplicate residue {}".format(residue))
            parts.pop(0)  # discard second item (the residue name)

            # split remaining items at "=" and put them in a dict,
            # e. g. {"HA2": 3.7, "HA3": 3.9}
            pairs = (pair.split("=") for pair in parts)
            lookup = {atom: float(value) for atom, value in pairs}

            # put previous lookup dict in residues dict
            # e. g. {1: {"HA2": 3.7, "HA3": 3.9}}
            residues[residue] = lookup

    return residues

def show(residues):
    # all atom names that occur in at least one residue
    atoms = set().union(*(r.keys() for r in residues.values()))
    residues = sorted(residues.items())
    for atom in sorted(atoms):
        for residue, lookup in residues:
            print("{} {}={}".format(residue, atom, lookup.get(atom, "nil")))
        print()
        print("---")
        print()

if __name__ == "__main__":
    r = process("residues.txt")
    show(r)

Note that converting the values to float can be omitted if all you want to 
do is print them. Finally the output of the script:

$ python residues.py 
1 H=nil
2 H=8.85
3 H=8.753
4 H=7.91
5 H=7.445
6 H=7.687
7 H=7.819
8 H=nil
9 H=8.235
10 H=7.979
11 H=7.947
12 H=8.191
13 H=8.133

---

1 HA=nil
2 HA=4.337
3 HA=4.034
4 HA=3.862
5 HA=4.077
6 HA=4.21
7 HA=4.554
8 HA=nil
9 HA=4.012
10 HA=3.697
11 HA=4.369
12 HA=4.192
13 HA=3.817

---

[snip]
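Following Peter's note that the float conversion can be dropped when the values are only printed, here is a Python 3 sketch that keeps each value as the original string. The output then reproduces the file's formatting (e.g. `H=8.8500` rather than `H=8.85`), which matches the format Amrita originally asked for. The function names `process_lines` and `format_atom` are hypothetical, not Peter's, and the sketch reads from any iterable of lines instead of a file name.

```python
def process_lines(lines):
    """Build {residue: {atom: value_string}} from whitespace-separated lines."""
    residues = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        residue = int(parts[0])
        if residue in residues:
            raise ValueError("duplicate residue {}".format(residue))
        # parts[1] is the residue name (e.g. "GLY"); it is not needed here.
        # Keep each value as the original string instead of converting to float.
        residues[residue] = dict(item.split("=", 1) for item in parts[2:])
    return residues


def format_atom(residues, atom):
    """Return one 'number ATOM=value' line per residue, 'nil' if absent."""
    return ["{} {}={}".format(number, atom, shifts.get(atom, "nil"))
            for number, shifts in sorted(residues.items())]


if __name__ == "__main__":
    sample = [
        "1 GLY HA2=3.7850 HA3=3.9130\n",
        "2 SER H=8.8500 HA=4.3370 N=115.7570\n",
        "3 LYS H=8.7530 HA=4.0340 HB2=1.8080 N=123.2380\n",
    ]
    for row in format_atom(process_lines(sample), "H"):
        print(row)
```

With a real file, `process_lines(open("residues.txt"))` works the same way, because a file object yields its lines when iterated.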



Re: [Tutor] arrangement of datafile

2013-12-27 Thread Evans Anyokwu
One thing that I've noticed is that there is no fixed structure to your data:
some lines have missing *fields*, which makes using a regex out of the question.

Without seeing your code, I'd suggest saving the data as a delimiter-separated
value file and parsing it. Python has good CSV support.

Get this one sorted out first then we can move on to the nested list.
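Here is a sketch of the CSV route, assuming the file stays space-separated. The standard `csv` module can read it with `delimiter=" "` and `skipinitialspace=True`. A filter also drops any empty fields, so that runs of spaces or trailing spaces never produce empty columns. Each row has a different number of fields, so the sketch treats everything after the second column as `ATOM=value` items. The function name `read_shift_rows` is hypothetical.

```python
import csv
import io


def read_shift_rows(infile):
    """Yield (residue_number, residue_name, {atom: value}) for each row."""
    reader = csv.reader(infile, delimiter=" ", skipinitialspace=True)
    for row in reader:
        fields = [f for f in row if f]  # drop empty fields from extra spaces
        if not fields:
            continue  # blank line
        number, name, rest = int(fields[0]), fields[1], fields[2:]
        shifts = {atom: float(value)
                  for atom, value in (item.split("=", 1) for item in rest)}
        yield number, name, shifts


if __name__ == "__main__":
    # io.StringIO stands in for open("residues.txt", newline="")
    data = io.StringIO("1 GLY HA2=3.7850 HA3=3.9130\n"
                       "8 PRO  HD2=3.7450\n")
    for record in read_shift_rows(data):
        print(record)
```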

Good luck.
Evans


Re: [Tutor] arrangement of datafile

2013-12-27 Thread Andreas Perstinger
[Please don't top-post and trim the quoted message to the essential.
See http://www.catb.org/~esr/jargon/html/T/top-post.html ]

Amrita Kumari  wrote:
>My data file is something like this:
>
[SNIP]
>can you suggest me how to produce nested dicts like this:
[SNIP]

What's the current version of your program? Did you fix the
problem Dave pointed out?

Don't expect that we will write the program for you. Show us what you
have tried and where you are stuck and we will help you move on. And
always include the full traceback (error message) you get when you run
the program.

Bye, Andreas


Re: [Tutor] arrangement of datafile

2013-12-27 Thread Amrita Kumari
Hi,

My data file is something like this:

1 GLY HA2=3.7850 HA3=3.9130
2 SER H=8.8500 HA=4.3370 N=115.7570
3 LYS H=8.7530 HA=4.0340 HB2=1.8080 N=123.2380
4 LYS H=7.9100 HA=3.8620 HB2=1.7440 HG2=1.4410 N=117.9810
5 LYS H=7.4450 HA=4.0770 HB2=1.7650 HG2=1.4130 N=115.4790
6 LEU H=7.6870 HA=4.2100 HB2=1.3860 HB3=1.6050 HG=1.5130 HD11=0.7690 HD12=0.7690 HD13=0.7690 N=117.3260
7 PHE H=7.8190 HA=4.5540 HB2=3.1360 N=117.0800
8 PRO HD2=3.7450
9 GLN H=8.2350 HA=4.0120 HB2=2.1370 N=116.3660
10 ILE H=7.9790 HA=3.6970 HB=1.8800 HG21=0.8470 HG22=0.8470 HG23=0.8470 HG12=1.6010 HG13=2.1670 N=119.0300
11 ASN H=7.9470 HA=4.3690 HB3=2.5140 N=117.8620
12 PHE H=8.1910 HA=4.1920 HB2=3.1560 N=121.2640
13 LEU H=8.1330 HA=3.8170 HB3=1.7880 HG=1.5810 HD11=0.8620 HD12=0.8620 HD13=0.8620 N=119.1360

...

where the first column is the residue number, and I want to print each
atom's chemical shift value one by one along with the residue number. For
example, for atom HA2 it should be:

1 HA2=3.7850
2 HA2=nil
3 HA2=nil
...
13 HA2=nil

Similarly, for atom HA3 it should be:

1 HA3=3.9130
2 HA3=nil
3 HA3=nil
...
13 HA3=nil

while for atom H it should be:

1  H=nil
2  H=8.8500
3  H=8.7530
4  H=7.9100
5  H=7.4450

Can you suggest how to produce nested dicts like this:

{1: {'HA2': 3.785, 'HA3': 3.913},
2: {'H': 8.85, 'HA': 4.337, 'N': 115.757},
3: {'H': 8.753, 'HA': 4.034, 'HB2': 1.808, 'N': 123.238},
4: {'H': 7.91, 'HA': 3.862, 'HB2': 1.744, 'HG2': 1.441, 'N': 117.981},
5: {'H': 7.445, 'HA': 4.077, 'HB2': 1.765, 'HG2': 1.413, 'N': 115.479},
6: {'H': 7.687,
 'HA': 4.21,
 'HB2': 1.386,
 'HB3': 1.605,
 'HD11': 0.769,
 'HD12': 0.769,
 'HD13': 0.769,
 'HG': 1.513,
 'N': 117.326},
7: {'H': 7.819, 'HA': 4.554, 'HB2': 3.136, 'N': 117.08},
8: {'HD2': 3.745},
9: {'H': 8.235, 'HA': 4.012, 'HB2': 2.137, 'N': 116.366},
10: {'H': 7.979,
  'HA': 3.697,
  'HB': 1.88,
  'HG12': 1.601,
  'HG13': 2.167,
  'HG21': 0.847,
  'HG22': 0.847,
  'HG23': 0.847,
  'N': 119.03},
11: {'H': 7.947, 'HA': 4.369, 'HB3': 2.514, 'N': 117.862},
12: {'H': 8.191, 'HA': 4.192, 'HB2': 3.156, 'N': 121.264},
13: {'H': 8.133,
  'HA': 3.817,
  'HB3': 1.788,
  'HD11': 0.862,
  'HD12': 0.862,
  'HD13': 0.862,
  'HG': 1.581,
  'N': 119.136}}

Thanks,
Amrita



On Wed, Dec 25, 2013 at 7:28 PM, Dave Angel  wrote:

> [snip]


Re: [Tutor] arrangement of datafile

2013-12-25 Thread Dave Angel
On Wed, 25 Dec 2013 16:17:27 +0800, Amrita Kumari wrote:

> I tried these and here is the code:
>
> f=open('filename')
> lines=f.readlines()
> new=lines.split()

That line will throw an exception.

> number=int(new[0])
> mylist=[i.split('=')[0] for i in new]

> one thing I don't understand is why you asked to remove first two
> items from the list?

You don't show us the data file, but presumably he would ask that
because the first two lines held different formats of data. Like your
number= line was intended to fetch a count from only line zero?

> and is the above code alright?, it can produce
> output like the one you mentioned:
> {1: {'HA2': 3.785, 'HA3': 3.913},
>  2: {'H': 8.85, 'HA': 4.337, 'N': 115.757},

The code above won't produce a dict of dicts. It won't even get past 
the exception.  Please use copy/paste.


--
DaveA
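Dave's point can be checked directly. `f.readlines()` returns a list of strings, and lists have no `split()` method, so `lines.split()` raises `AttributeError`. Here is a minimal sketch of the difference. The sample lines come from the data posted in this thread, and the helper name `split_each` is hypothetical.

```python
lines = ["1 GLY HA2=3.7850 HA3=3.9130\n",
         "2 SER H=8.8500 HA=4.3370 N=115.7570\n"]  # what f.readlines() returns

try:
    lines.split()  # the failing line from the posted code
except AttributeError as exc:
    print("AttributeError:", exc)


def split_each(lines):
    """Split every line separately instead of the list as a whole."""
    return [line.split() for line in lines if line.strip()]


for parts in split_each(lines):
    # parts[0] is the residue number, parts[2:] are the ATOM=value items
    print(int(parts[0]), parts[2:])
```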



Re: [Tutor] arrangement of datafile

2013-12-17 Thread Peter Otten
Amrita Kumari wrote:

> Hi,
> 
> I am new to programming and want to try Python (which is
> simple and easy to learn) to solve a problem: I have
> various long files like this:
> 
> 1 GLY HA2=3.7850 HA3=3.9130
> 2 SER H=8.8500 HA=4.3370 N=115.7570
> 3 LYS H=8.7530 HA=4.0340 HB2=1.8080 N=123.2380
> 4 LYS H=7.9100 HA=3.8620 HB2=1.7440 HG2=1.4410 N=117.9810
> 5 LYS H=7.4450 HA=4.0770 HB2=1.7650 HG2=1.4130 N=115.4790
> 6 LEU H=7.6870 HA=4.2100 HB2=1.3860 HB3=1.6050 HG=1.5130 HD11=0.7690 HD12=0.7690 HD13=0.7690 N=117.3260
> 7 PHE H=7.8190 HA=4.5540 HB2=3.1360 N=117.0800
> 8 PRO HD2=3.7450
> 9 GLN H=8.2350 HA=4.0120 HB2=2.1370 N=116.3660
> 10 ILE H=7.9790 HA=3.6970 HB=1.8800 HG21=0.8470 HG22=0.8470 HG23=0.8470 HG12=1.6010 HG13=2.1670 N=119.0300
> 11 ASN H=7.9470 HA=4.3690 HB3=2.5140 N=117.8620
> 12 PHE H=8.1910 HA=4.1920 HB2=3.1560 N=121.2640
> 13 LEU H=8.1330 HA=3.8170 HB3=1.7880 HG=1.5810 HD11=0.8620 HD12=0.8620 HD13=0.8620 N=119.1360
> 
> ...
> 
> where the first column is the residue number. What I want is to print
> each atom's chemical shift value one by one along with the residue
> number. For example, for atom HA2 it should be:
> 
> 1 HA2=3.7850
> 2 HA2=nil
> 3 HA2=nil
> ...
> 13 HA2=nil
> 
> Similarly, for atom HA3 it should be:
> 
> 1 HA3=3.9130
> 2 HA3=nil
> 3 HA3=nil
> ...
> 13 HA3=nil
> 
> while for atom H it should be:
> 1  H=nil
> 2  H=8.8500
> 3  H=8.7530
> 4  H=7.9100
> 5  H=7.4450
> 
> But in some files the residue numbers are not continuous; some are
> missing in between. I want to write Python code to solve this problem but
> don't know how to split the data file and print the desired output. This
> is important in order to compare each atom's chemical shift value with
> the values generated by some other web-based tool. As the number of atoms
> differs from row to row, and the same atom appears at different positions
> in different residues, I don't know how to split them. Please help me
> solve this problem.

You tell us what you want, but you don't give us an idea what you can do and 
what problems you run into.

Can you read a file line by line?
Can you split the line into a list of strings at whitespace occurrences?
Can you extract the first item from the list and convert it to an int?
Can you remove the first two items from the list?
Can you split the items in the list at the "="?
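Each question in that list maps onto roughly one line of Python. Below is a sketch applied to a single line from the sample. In a real program the line would come from `for line in infile:` over an open file. The variable names are illustrative, not from the thread.

```python
line = "2 SER H=8.8500 HA=4.3370 N=115.7570\n"  # one line as read from the file

parts = line.split()                          # split at whitespace
residue = int(parts[0])                       # first item converted to an int
items = parts[2:]                             # drop the first two items
pairs = [item.split("=") for item in items]   # split each item at "="
shifts = {atom: float(value) for atom, value in pairs}

print(residue, shifts)
```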

Do what you can and come back here when you run into problems.
Once you have finished the above agenda you can put your data into
nested dicts (a dict of dicts) that look like this:

{1: {'HA2': 3.785, 'HA3': 3.913},
 2: {'H': 8.85, 'HA': 4.337, 'N': 115.757},
 3: {'H': 8.753, 'HA': 4.034, 'HB2': 1.808, 'N': 123.238},
 4: {'H': 7.91, 'HA': 3.862, 'HB2': 1.744, 'HG2': 1.441, 'N': 117.981},
 5: {'H': 7.445, 'HA': 4.077, 'HB2': 1.765, 'HG2': 1.413, 'N': 115.479},
 6: {'H': 7.687,
 'HA': 4.21,
 'HB2': 1.386,
 'HB3': 1.605,
 'HD11': 0.769,
 'HD12': 0.769,
 'HD13': 0.769,
 'HG': 1.513,
 'N': 117.326},
 7: {'H': 7.819, 'HA': 4.554, 'HB2': 3.136, 'N': 117.08},
 8: {'HD2': 3.745},
 9: {'H': 8.235, 'HA': 4.012, 'HB2': 2.137, 'N': 116.366},
 10: {'H': 7.979,
  'HA': 3.697,
  'HB': 1.88,
  'HG12': 1.601,
  'HG13': 2.167,
  'HG21': 0.847,
  'HG22': 0.847,
  'HG23': 0.847,
  'N': 119.03},
 11: {'H': 7.947, 'HA': 4.369, 'HB3': 2.514, 'N': 117.862},
 12: {'H': 8.191, 'HA': 4.192, 'HB2': 3.156, 'N': 121.264},
 13: {'H': 8.133,
  'HA': 3.817,
  'HB3': 1.788,
  'HD11': 0.862,
  'HD12': 0.862,
  'HD13': 0.862,
  'HG': 1.581,
  'N': 119.136}}

Once you are there we can help you print this out nicely. Below is a
spoiler ;)






















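from __future__ import print_function  # print() works on Python 2.7 and 3

def show(residues):
    # all atom names that occur in at least one residue
    atoms = set().union(*(r.keys() for r in residues.values()))
    residues = sorted(residues.items())
    for atom in sorted(atoms):
        for residue, lookup in residues:
            print("{} {}={}".format(residue, atom, lookup.get(atom, "nil")))
        print()
        print("---")
        print()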
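Peter's `show()` prints only the residue numbers that occur in the file. The original question mentions that in some files residue numbers are missing in between. If those gaps should also appear as `nil`, one option is to iterate over the full range from the smallest to the largest residue number. That is an assumption about the desired output, not something settled in the thread. Here is a Python 3 sketch; `gap_filled_lines` and `show_with_gaps` are hypothetical names.

```python
def gap_filled_lines(residues, atom):
    """Return 'number ATOM=value' for every number from min to max residue.

    Residue numbers absent from `residues` get 'nil', just like atoms
    that a residue does not have.
    """
    if not residues:
        return []
    first, last = min(residues), max(residues)
    return ["{} {}={}".format(n, atom, residues.get(n, {}).get(atom, "nil"))
            for n in range(first, last + 1)]


def show_with_gaps(residues):
    """Like show(), but residue numbers missing from the file print as nil."""
    atoms = set().union(*(r.keys() for r in residues.values()))
    for atom in sorted(atoms):
        for row in gap_filled_lines(residues, atom):
            print(row)
        print()
        print("---")
        print()


if __name__ == "__main__":
    show_with_gaps({1: {"HA2": 3.785, "HA3": 3.913}, 3: {"H": 8.753, "HA": 4.034}})
```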



