Re: [Tutor] arrangement of datafile
OK, it's already clear that the OP has a csv file, so the following is OFF-TOPIC.

I was reading the Python Cookbook and saw a recipe for reading fixed-width files with struct.unpack. It is much shorter and faster (especially if you use compiled structs) than indexing. I thought this was a pretty cool approach: http://code.activestate.com/recipes/65224-accessing-substrings/.

regards,
Albert-Jan

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
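
As a rough illustration of the struct-based approach mentioned above (this is not the recipe's own code), a fixed-width record could be split like this in Python 3. The column widths, the sample record, and the name `parse_fixed` are assumptions made up for this sketch; the OP's actual file is whitespace-delimited, not fixed-width.

```python
import struct

# Hypothetical layout: 4-char residue number, 1 pad byte,
# 3-char residue name, 1 pad byte, 8-char shift value.
RECORD = struct.Struct("4s1x3s1x8s")  # compiled once, reused for every line


def parse_fixed(record):
    """Split one fixed-width record (bytes) into typed fields."""
    number, name, shift = (field.decode("ascii")
                           for field in RECORD.unpack(record))
    return int(number), name.strip(), float(shift)


# Build a sample record with exactly the widths assumed above.
sample = "{:>4} {:<3} {:>8}".format(12, "LYS", "8.7530").encode("ascii")
print(parse_fixed(sample))
```

Compiling the format once with struct.Struct avoids re-parsing the format string for every record, which is presumably where the speed advantage of "compiled structs" comes from.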
Re: [Tutor] arrangement of datafile
Hi Peter,

Thank you very much for your kind help. I got the output the way I wanted (as you also showed in your output). I really appreciate your effort. Thanks for your time.

Amrita

On Thu, Jan 9, 2014 at 8:41 PM, Peter Otten <__pete...@web.de> wrote:
> [snip: full quote of Peter Otten's message of Jan 9, 2014]
Re: [Tutor] arrangement of datafile
Amrita Kumari wrote:

> On 17th Dec. I posted one question, how to arrange datafile in a
> particular fashion so that I can have only residue no. and chemical
> shift value of the atom as:
> 1 H=nil
> 2 H=8.8500
> 3 H=8.7530
> 4 H=7.9100
> 5 H=7.4450
>
> Peter has replied to this mail but since I had not subscribed to the
> tutor mailing list earlier I didn't receive the reply. I
> apologize for my mistake. Today I checked his reply and he asked me to
> do a few things:

I'm sorry, I'm currently lacking the patience to tune into your problem again, but maybe the script that I wrote (but did not post) back then is of help.

The data sample:

$ cat residues.txt
1 GLY HA2=3.7850 HA3=3.9130
2 SER H=8.8500 HA=4.3370 N=115.7570
3 LYS H=8.7530 HA=4.0340 HB2=1.8080 N=123.2380
4 LYS H=7.9100 HA=3.8620 HB2=1.7440 HG2=1.4410 N=117.9810
5 LYS H=7.4450 HA=4.0770 HB2=1.7650 HG2=1.4130 N=115.4790
6 LEU H=7.6870 HA=4.2100 HB2=1.3860 HB3=1.6050 HG=1.5130 HD11=0.7690 HD12=0.7690 HD13=0.7690 N=117.3260
7 PHE H=7.8190 HA=4.5540 HB2=3.1360 N=117.0800
8 PRO HD2=3.7450
9 GLN H=8.2350 HA=4.0120 HB2=2.1370 N=116.3660
10 ILE H=7.9790 HA=3.6970 HB=1.8800 HG21=0.8470 HG22=0.8470 HG23=0.8470 HG12=1.6010 HG13=2.1670 N=119.0300
11 ASN H=7.9470 HA=4.3690 HB3=2.5140 N=117.8620
12 PHE H=8.1910 HA=4.1920 HB2=3.1560 N=121.2640
13 LEU H=8.1330 HA=3.8170 HB3=1.7880 HG=1.5810 HD11=0.8620 HD12=0.8620 HD13=0.8620 N=119.1360

The script:

$ cat residues.py
def process(filename):
    residues = {}
    with open(filename) as infile:
        for line in infile:
            parts = line.split()         # split line at whitespace
            residue = int(parts.pop(0))  # convert first item to integer
            if residue in residues:
                raise ValueError("duplicate residue {}".format(residue))
            parts.pop(0)                 # discard second item

            # split remaining items at "=" and put them in a dict,
            # e. g. {"HA2": 3.7, "HA3": 3.9}
            pairs = (pair.split("=") for pair in parts)
            lookup = {atom: float(value) for atom, value in pairs}

            # put previous lookup dict in residues dict
            # e. g. {1: {"HA2": 3.7, "HA3": 3.9}}
            residues[residue] = lookup

    return residues

def show(residues):
    atoms = set().union(*(r.keys() for r in residues.values()))
    residues = sorted(residues.items())
    for atom in sorted(atoms):
        for residue, lookup in residues:
            print "{} {}={}".format(residue, atom, lookup.get(atom, "nil"))
        print
        print "---"
        print

if __name__ == "__main__":
    r = process("residues.txt")
    show(r)

Note that converting the values to float can be omitted if all you want to do is print them. Finally the output of the script:

$ python residues.py
1 H=nil
2 H=8.85
3 H=8.753
4 H=7.91
5 H=7.445
6 H=7.687
7 H=7.819
8 H=nil
9 H=8.235
10 H=7.979
11 H=7.947
12 H=8.191
13 H=8.133

---

1 HA=nil
2 HA=4.337
3 HA=4.034
4 HA=3.862
5 HA=4.077
6 HA=4.21
7 HA=4.554
8 HA=nil
9 HA=4.012
10 HA=3.697
11 HA=4.369
12 HA=4.192
13 HA=3.817

---

[snip]
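
The script above uses Python 2 print statements. A possible Python 3 adaptation is sketched below; it is not Peter's code. It also skips the float conversion, as the note about printing suggests, so values keep their original formatting such as 8.8500. The names `parse` and `report`, the blank-line check, and the in-memory sample used instead of residues.txt are assumptions for this sketch.

```python
import io

SAMPLE = """\
1 GLY HA2=3.7850 HA3=3.9130
2 SER H=8.8500 HA=4.3370 N=115.7570
8 PRO HD2=3.7450
"""


def parse(lines):
    """Map residue number -> {atom: shift, kept as the original string}."""
    residues = {}
    for line in lines:
        parts = line.split()
        if not parts:  # tolerate blank lines
            continue
        residue = int(parts[0])
        if residue in residues:
            raise ValueError("duplicate residue {}".format(residue))
        # parts[1] is the residue name; the rest are "atom=value" items
        residues[residue] = dict(item.split("=") for item in parts[2:])
    return residues


def report(residues, atom):
    """Return one 'residue atom=value' line per residue, 'nil' if absent."""
    return ["{} {}={}".format(number, atom, lookup.get(atom, "nil"))
            for number, lookup in sorted(residues.items())]


residues = parse(io.StringIO(SAMPLE))
for line in report(residues, "H"):
    print(line)
```

Because the residues are sorted by number rather than generated from a range, gaps in the residue numbering are handled without extra code.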
Re: [Tutor] arrangement of datafile
One thing I've noticed is that your data has no fixed structure: some rows are missing *fields*, which rules out the use of a regex. Without seeing your code, I'd suggest saving the data as a separated-value file and parsing that. Python has good csv support.

Get this sorted out first, then we can move on to the nested list.

Good luck.
Evans
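
One way to read the suggestion above: if the fields are separated by single spaces, the file is already a separated-value file that the csv module can read, and rows with different numbers of fields are not a problem for it. A minimal sketch, assuming a space delimiter and an in-memory sample (both are assumptions, not something stated in the thread; `read_rows` is a name made up here):

```python
import csv
import io

SAMPLE = "1 GLY HA2=3.7850 HA3=3.9130\n8 PRO HD2=3.7450\n"


def read_rows(stream):
    """Yield (residue number, residue name, ['atom=value', ...]) per row."""
    reader = csv.reader(stream, delimiter=" ", skipinitialspace=True)
    for row in reader:
        if row:  # skip empty lines
            yield int(row[0]), row[1], row[2:]


rows = list(read_rows(io.StringIO(SAMPLE)))
print(rows)
```

For data like this, str.split() does the same job with less setup; csv becomes more useful when fields can contain quoted delimiters.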
Re: [Tutor] arrangement of datafile
[Please don't top-post, and trim the quoted message to the essentials. See http://www.catb.org/~esr/jargon/html/T/top-post.html ]

Amrita Kumari wrote:
> My data file is something like this:
> [SNIP]
> can you suggest me how to produce nested dicts like this:
> [SNIP]

What's the current version of your program? Did you fix the problem Dave told you about?

Don't expect us to write the program for you. Show us what you have tried and where you are stuck, and we will help you move on. And always include the full traceback (error message) you get when you run the program.

Bye, Andreas
Re: [Tutor] arrangement of datafile
Hi,

My data file is something like this:

1 GLY HA2=3.7850 HA3=3.9130
2 SER H=8.8500 HA=4.3370 N=115.7570
3 LYS H=8.7530 HA=4.0340 HB2=1.8080 N=123.2380
4 LYS H=7.9100 HA=3.8620 HB2=1.7440 HG2=1.4410 N=117.9810
5 LYS H=7.4450 HA=4.0770 HB2=1.7650 HG2=1.4130 N=115.4790
6 LEU H=7.6870 HA=4.2100 HB2=1.3860 HB3=1.6050 HG=1.5130 HD11=0.7690 HD12=0.7690 HD13=0.7690 N=117.3260
7 PHE H=7.8190 HA=4.5540 HB2=3.1360 N=117.0800
8 PRO HD2=3.7450
9 GLN H=8.2350 HA=4.0120 HB2=2.1370 N=116.3660
10 ILE H=7.9790 HA=3.6970 HB=1.8800 HG21=0.8470 HG22=0.8470 HG23=0.8470 HG12=1.6010 HG13=2.1670 N=119.0300
11 ASN H=7.9470 HA=4.3690 HB3=2.5140 N=117.8620
12 PHE H=8.1910 HA=4.1920 HB2=3.1560 N=121.2640
13 LEU H=8.1330 HA=3.8170 HB3=1.7880 HG=1.5810 HD11=0.8620 HD12=0.8620 HD13=0.8620 N=119.1360
...

where the first column is the residue number. I want to print each atom's chemical shift value, one by one, along with the residue number. For example, for atom HA2 it should be:

1 HA2=3.7850
2 HA2=nil
3 HA2=nil
...
13 HA2=nil

Similarly, for atom HA3 it should be:

1 HA3=3.9130
2 HA3=nil
3 HA3=nil
...
13 HA3=nil

while for atom H it should be:

1 H=nil
2 H=8.8500
3 H=8.7530
4 H=7.9100
5 H=7.4450

Can you suggest how to produce nested dicts like this:

{1: {'HA2': 3.785, 'HA3': 3.913},
 2: {'H': 8.85, 'HA': 4.337, 'N': 115.757},
 3: {'H': 8.753, 'HA': 4.034, 'HB2': 1.808, 'N': 123.238},
 4: {'H': 7.91, 'HA': 3.862, 'HB2': 1.744, 'HG2': 1.441, 'N': 117.981},
 5: {'H': 7.445, 'HA': 4.077, 'HB2': 1.765, 'HG2': 1.413, 'N': 115.479},
 6: {'H': 7.687, 'HA': 4.21, 'HB2': 1.386, 'HB3': 1.605, 'HD11': 0.769, 'HD12': 0.769, 'HD13': 0.769, 'HG': 1.513, 'N': 117.326},
 7: {'H': 7.819, 'HA': 4.554, 'HB2': 3.136, 'N': 117.08},
 8: {'HD2': 3.745},
 9: {'H': 8.235, 'HA': 4.012, 'HB2': 2.137, 'N': 116.366},
 10: {'H': 7.979, 'HA': 3.697, 'HB': 1.88, 'HG12': 1.601, 'HG13': 2.167, 'HG21': 0.847, 'HG22': 0.847, 'HG23': 0.847, 'N': 119.03},
 11: {'H': 7.947, 'HA': 4.369, 'HB3': 2.514, 'N': 117.862},
 12: {'H': 8.191, 'HA': 4.192, 'HB2': 3.156, 'N': 121.264},
 13: {'H': 8.133, 'HA': 3.817, 'HB3': 1.788, 'HD11': 0.862, 'HD12': 0.862, 'HD13': 0.862, 'HG': 1.581, 'N': 119.136}}

Thanks,
Amrita

On Wed, Dec 25, 2013 at 7:28 PM, Dave Angel wrote:
> [snip: full quote of Dave Angel's message of Dec 25, 2013]
Re: [Tutor] arrangement of datafile
On Wed, 25 Dec 2013 16:17:27 +0800, Amrita Kumari wrote:
> I tried these and here is the code:
>
> f=open('filename')
> lines=f.readlines()
> new=lines.split()

That line will throw an exception.

> number=int(new[0])
> mylist=[i.split('=')[0] for i in new]
>
> one thing I don't understand is why you asked to remove first two
> items from the list?

You don't show us the data file, but presumably he would ask that because the first two lines held different formats of data. Like your number= line was intended to fetch a count from only line zero?

> and is the above code alright?, it can produce
> output like the one you mentioned:
> {1: {'HA2': 3.785, 'HA3': 3.913},
> 2: {'H': 8.85, 'HA': 4.337, 'N': 115.757},

The code above won't produce a dict of dicts. It won't even get past the exception. Please use copy/paste.

--
DaveA
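
To illustrate the point above: readlines() returns a list, and lists have no split() method, so the third line of the posted code raises AttributeError. A minimal Python 3 sketch, using an in-memory file and variable names chosen for illustration:

```python
import io

f = io.StringIO("1 GLY HA2=3.7850 HA3=3.9130\n2 SER H=8.8500 HA=4.3370\n")
lines = f.readlines()  # a list of strings, one per line

try:
    lines.split()  # what the posted code attempted
except AttributeError as exc:
    error = str(exc)
    print("failed:", error)

# split() belongs to str, so apply it to each line instead
rows = [line.split() for line in lines]
print(rows[0])
```

Looping over the lines (or over the file object itself) and splitting each one is the usual fix.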
Re: [Tutor] arrangement of datafile
Amrita Kumari wrote:

> Hi,
>
> I am new to programming and want to try Python (which is
> simple and easy to learn) to solve one problem, in which
> I have various long files like this:
>
> 1 GLY HA2=3.7850 HA3=3.9130
> 2 SER H=8.8500 HA=4.3370 N=115.7570
> 3 LYS H=8.7530 HA=4.0340 HB2=1.8080 N=123.2380
> 4 LYS H=7.9100 HA=3.8620 HB2=1.7440 HG2=1.4410 N=117.9810
> 5 LYS H=7.4450 HA=4.0770 HB2=1.7650 HG2=1.4130 N=115.4790
> 6 LEU H=7.6870 HA=4.2100 HB2=1.3860 HB3=1.6050 HG=1.5130 HD11=0.7690 HD12=0.7690 HD13=0.7690 N=117.3260
> 7 PHE H=7.8190 HA=4.5540 HB2=3.1360 N=117.0800
> 8 PRO HD2=3.7450
> 9 GLN H=8.2350 HA=4.0120 HB2=2.1370 N=116.3660
> 10 ILE H=7.9790 HA=3.6970 HB=1.8800 HG21=0.8470 HG22=0.8470 HG23=0.8470 HG12=1.6010 HG13=2.1670 N=119.0300
> 11 ASN H=7.9470 HA=4.3690 HB3=2.5140 N=117.8620
> 12 PHE H=8.1910 HA=4.1920 HB2=3.1560 N=121.2640
> 13 LEU H=8.1330 HA=3.8170 HB3=1.7880 HG=1.5810 HD11=0.8620 HD12=0.8620 HD13=0.8620 N=119.1360
> ...
>
> where the first column is the residue number. What I want is to print
> each atom's chemical shift value one by one along with the residue
> number. For example, for atom HA2 it should be:
>
> 1 HA2=3.7850
> 2 HA2=nil
> 3 HA2=nil
> ...
> 13 HA2=nil
>
> Similarly, for atom HA3 it should be:
>
> 1 HA3=3.9130
> 2 HA3=nil
> 3 HA3=nil
> ...
> 13 HA3=nil
>
> while for atom H it should be:
> 1 H=nil
> 2 H=8.8500
> 3 H=8.7530
> 4 H=7.9100
> 5 H=7.4450
>
> But in some files the residue numbers are not continuous; some are
> missing in between. I want to write Python code to solve this problem but
> don't know how to split the datafile and print the desired output. This
> problem is important in order to compare each atom's chemical shift value
> with some other web-based generated chemical shift value. As the number
> of atoms differs from row to row, and the same atom is at a different
> position in different residues, I don't know how to split them. Please
> help to solve this problem.
You tell us what you want, but you don't give us an idea what you can do and what problems you run into.

Can you read a file line by line?
Can you split the line into a list of strings at whitespace occurrences?
Can you extract the first item from the list and convert it to an int?
Can you remove the first two items from the list?
Can you split the items in the list at the "="?

Do what you can and come back here when you run into problems. Once you have finished the above agenda you can put your data into two nested dicts that look like this:

{1: {'HA2': 3.785, 'HA3': 3.913},
 2: {'H': 8.85, 'HA': 4.337, 'N': 115.757},
 3: {'H': 8.753, 'HA': 4.034, 'HB2': 1.808, 'N': 123.238},
 4: {'H': 7.91, 'HA': 3.862, 'HB2': 1.744, 'HG2': 1.441, 'N': 117.981},
 5: {'H': 7.445, 'HA': 4.077, 'HB2': 1.765, 'HG2': 1.413, 'N': 115.479},
 6: {'H': 7.687, 'HA': 4.21, 'HB2': 1.386, 'HB3': 1.605, 'HD11': 0.769, 'HD12': 0.769, 'HD13': 0.769, 'HG': 1.513, 'N': 117.326},
 7: {'H': 7.819, 'HA': 4.554, 'HB2': 3.136, 'N': 117.08},
 8: {'HD2': 3.745},
 9: {'H': 8.235, 'HA': 4.012, 'HB2': 2.137, 'N': 116.366},
 10: {'H': 7.979, 'HA': 3.697, 'HB': 1.88, 'HG12': 1.601, 'HG13': 2.167, 'HG21': 0.847, 'HG22': 0.847, 'HG23': 0.847, 'N': 119.03},
 11: {'H': 7.947, 'HA': 4.369, 'HB3': 2.514, 'N': 117.862},
 12: {'H': 8.191, 'HA': 4.192, 'HB2': 3.156, 'N': 121.264},
 13: {'H': 8.133, 'HA': 3.817, 'HB3': 1.788, 'HD11': 0.862, 'HD12': 0.862, 'HD13': 0.862, 'HG': 1.581, 'N': 119.136}}

Once you are there we can help you print this out nicely. Below's a spoiler ;)

def show(residues):
    atoms = set().union(*(r.keys() for r in residues.values()))
    residues = sorted(residues.items())
    for atom in sorted(atoms):
        for residue, lookup in residues:
            print "{} {}={}".format(residue, atom, lookup.get(atom, "nil"))
        print
        print "---"
        print
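
The five questions earlier in this message map onto roughly one line of code each. A possible Python 3 walk-through for a single line follows; it is a sketch, the variable names are illustrative, and the sample line is taken from the data in the question.

```python
line = "3 LYS H=8.7530 HA=4.0340 HB2=1.8080 N=123.2380\n"

parts = line.split()                          # split at whitespace
residue = int(parts[0])                       # first item as an int
items = parts[2:]                             # drop residue number and name
pairs = [item.split("=") for item in items]   # split each item at "="
lookup = {atom: float(value) for atom, value in pairs}

nested = {residue: lookup}                    # one entry of the dict of dicts
print(nested)
```

Reading the file line by line then amounts to wrapping these statements in a `for line in infile:` loop and storing each `lookup` under its residue number.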