Re: [Tutor] regex and parsing through a semi-csv file
Thank you Ramit. I updated my code since I am running 2.7.1+ on Ubuntu.

Best wishes,
Mina

On 11-10-25 08:02 AM, Prasad, Ramit wrote:
>>     f = open(args.fname, 'r')
>>     lines = f.readlines()
>>     f.close()
>
> If you are using Python 2.6+ you can use a context manager to automatically close the file. That way you never have to worry about closing any files!
>
>     with open(args.fname, 'r') as f:
>         lines = f.readlines()
>
> Ramit
>
> Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
> 712 Main Street | Houston, TX 77002
> work phone: 713 - 216 - 5423
>
> --
> This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email.

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
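As a small illustration of the context-manager suggestion above, the sketch below reads a file inside a `with` block and checks that the file is closed once the block ends. It is written for Python 3 (the thread itself uses 2.7); `read_lines` and the temporary demo file are invented for the example and stand in for `args.fname`.

```python
import os
import tempfile


def read_lines(fname):
    """Return the lines of fname; the with-block closes the file for us."""
    with open(fname, 'r') as f:
        lines = f.readlines()
    # Leaving the with-block closed the file, even if readlines() had raised.
    assert f.closed
    return lines


# Hypothetical demo file standing in for args.fname.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('AC,225,89\n3.6000e+03,1.6625e-07,2.4555e-09\n')

demo_lines = read_lines(tmp.name)
os.remove(tmp.name)
```

For large files, iterating over `f` directly inside the `with` block avoids holding every line in memory at once.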
Re: [Tutor] regex and parsing through a semi-csv file
Hi Marc,

Thank you. Following some of your suggestions, the rewrite below worked. I agree with your point on readability over complexity. By grace I meant not convoluted, or simpler. That's all. As a beginner who doesn't know all the existing functions, I sometimes end up re-inventing the wheel.

Cheers,
Mina

    isotope_name,isotope_A = args.isotope.split('-')
    print isotope_name, isotope_A

    found_isotope = False
    activity_time = []
    activity = []
    activity_err = []

    f = open(args.fname, 'r')
    lines = f.readlines()
    f.close()

    for i, line in enumerate(lines):
        line = line.strip()
        if isotope_name in line and isotope_A in line:
            found_isotope = True
            print 'found isotope'
            #print line
            lines = lines[i+1:]
            break

    for line in lines:
        line = line.strip()
        if not line[0].isdigit():
            break
        print 'found'
        words = line.split(',')
        activity_time.append(float(words[0]))
        activity.append(float(words[1]))
        activity_err.append(float(words[2]))

On 11-10-19 12:06 PM, Marc Tompkins wrote:
> On Wed, Oct 5, 2011 at 11:12 AM, Mina Nozar <noz...@triumf.ca> wrote:
>> Now, I would like to parse through this code and fill out 3 lists: 1) activity_time, 2) activity, 3) error, and plot the activities as a function of time using matplotlib. My question specifically is on how to parse through the lines containing the data (activity time, activity, error) for a given isotope, stopping before reaching the next isotope's info.
>
> Regular expressions certainly are terse, but (IMHO) they're really, really hard to debug and maintain; I find I have to get myself into a Zen state to even unpack them, and that just doesn't feel very Pythonic.
>
> Here's an approach I've used in similar situations (a file with arbitrary sequences of differently-formatted lines, where one line determines the "type" of the lines that follow):
>
> - create a couple of status variables: currentElement, currentIsotope
> - read each line and split it into a list, separating on the commas
> - look at the first item on the line: is it an element? (You could use a list of the 120 symbols, or you could just check to see if it's alphabetic...)
>   - if the first item is an element, then set currentElement and currentIsotope, move on to next line.
>   - if the first item is NOT an element, then this is a data line.
>     - if currentElement and currentIsotope match what the user asked for, add time, activity, and error to the appropriate lists
>     - if not, move on.
>
> This approach also works in the event that the data wasn't all collected in order - i.e. there might be data for Ag111 followed by U235 followed by Ag111 again.
>
>> Note that the size of the lists will change depending on the number of activities for a given run of the simulation so I don't want to hard code '13' as the number of lines to read in followed by the line containing isotope_name, etc.
>
> This should work for any number of lines or size of file, as long as the data lines are all formatted as you expect. Obviously a bit of error-trapping would be a good thing.
>
>> If there is a more graceful way of doing this, please let me know as well. I am new to python...
>
> For me, readability and maintainability trump "grace" every time. Nobody's handing out awards for elegance (outside of the classroom), but complexity gets punished (with bugs and wasted time.) More elegant solutions might also run faster, but remember that premature optimization is a Bad Thing.
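The status-variable approach described in the quoted steps could be sketched roughly as follows. This is one possible reading of those steps, not code posted in the thread: `collect_activities` is a hypothetical name, the sample rows are borrowed from the thread and rearranged so the requested isotope appears twice, and the code is written for Python 3. It also skips blank lines and `#` comments, which is an assumption about the input file, and it does no error-trapping on malformed lines.

```python
def collect_activities(lines, element, mass_number):
    """Gather (times, activities, errors) for one isotope.

    Header lines look like 'AG,111,47'; data lines hold three floats.
    """
    times, activities, errors = [], [], []
    current_element = current_isotope = None  # status variables
    for raw in lines:
        fields = [field.strip() for field in raw.split(',')]
        if not fields[0] or fields[0].startswith('#'):
            continue  # skip blank lines and comments (assumed harmless)
        if fields[0].isalpha():
            # Header line: remember which isotope the following rows belong to.
            current_element, current_isotope = fields[0], fields[1]
            continue
        # Data line: keep it only if it belongs to the requested isotope.
        if (current_element, current_isotope) == (element, mass_number):
            times.append(float(fields[0]))
            activities.append(float(fields[1]))
            errors.append(float(fields[2]))
    return times, activities, errors


# Rows taken from the thread, interleaved to show out-of-order data.
sample = """\
AG,111,47
3.6000e+03,1.7936e+07,3.1191e+05
AC,225,89
3.6000e+03,1.6625e-07,2.4555e-09
AG,111,47
8.6400e+04,7.9538e+08,1.3800e+07
""".splitlines()

times, activities, errors = collect_activities(sample, 'AG', '111')
```

Because the function never stops at the next header, it picks up both AG-111 segments, which is the out-of-order case mentioned above.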
Re: [Tutor] regex and parsing through a semi-csv file
Hello Wayne,

Thank you for your help and sorry for the delay in the response. I was caught up with other simulation jobs and didn't get around to testing what you suggested until yesterday.

On 11-10-05 01:24 PM, Wayne Werner wrote:
> On Wed, Oct 5, 2011 at 1:12 PM, Mina Nozar <noz...@triumf.ca> wrote:
>
> I just glanced through your email, but my initial thought would be to just use regex to collect the entire segment that you're looking for, and then string methods to split it up:
>
>     pat = re.compile('({name},{number}.*?)[A-Z]{{1,2}}'.format(name='AC', number='225'), re.DOTALL)
>
>     raw_data = re.search(pat, f.read())
>     if raw_data is None:
>         # we didn't find the isotope, so take appropriate actions, quit or tell the user
>     else:
>         raw_data = raw_data.string.strip().split('\n')
>
> Then it depends on how you want to process your data, but you could easily use list comprehensions/generator expressions. The most terse syntax I know of:
>
>     data = [[float(x) for x in d.split(',')] for d in raw_data if d[0].isdigit()]
>
> data will then contain a list of 3-element lists of floating point values.
> If you want to "rotate" the list, you can do data = list(zip(*data)). To illustrate:
>
>     >>> d = [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
>     >>> d = list(zip(*d))
>     >>> d
>     [('a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b'), ('c', 'c', 'c', 'c')]
>
> HTH,
> Wayne

I tried what you suggested above, but it doesn't work. The search doesn't start at the right place (the info following the isotope chosen by the user) and doesn't stop at the end of the info for that isotope. So basically it seems like data gets filled in with the data for all isotopes in a given file. I made a small test file to verify this.

Can you please explain what the statement assigned to pat actually does?

At the end, I should be getting three lists, one containing the times (column 1), one containing the activities (column 2), and one containing the error in activities (column 3) for a specific isotope requested.

Thank you and best wishes,
Mina

Here is what I tried:

    #! /usr/bin/env python

    import re
    import argparse

    parser = argparse.ArgumentParser(description='Plot activities for a given isotope')
    parser.add_argument('-f', action="store", dest="fname", help='The csv file name containing activities')
    parser.add_argument('-i', action="store", dest="isotope", help='Isotope to plot activities for, eg. U-238')
    args=parser.parse_args()

    print 'file name:', args.fname
    print 'isotope:', args.isotope

    isotope_name,isotope_A = args.isotope.split('-')
    print isotope_name, isotope_A

    f = open(args.fname, 'r')

    pat = re.compile('({name},{number}.*?)[A-Z]{{1,2}}'.format(name=isotope_name, number=isotope_A), re.DOTALL)
    result = re.search(pat, f.read())
    print result.string
    f.close()

    if result is None:
        exit(args.fname+' does not contain info on '+args.isotope)
    else:
        result = result.string.strip().split('\n')
        data = [[float(x) for x in d.split(',')] for d in result if d[0].isdigit()]
        data = list(zip(*data))
        for i in range(0, len(data)):
            print data[i]

Input file: test.csv

    # element, z, isotope, activity_time, activity, error
    AC,225,89
    3.6000e+03,1.6625e-07,2.4555e-09
    8.6400e+04,0.e+00,-1.1455e-23
    2.5920e+05,3.1615e-07,4.6695e-09
    8.6400e+05,3.6457e-05,5.3847e-07
    1.8000e+06,5.5137e-04,8.1437e-06
    AG,111,47
    3.6000e+03,1.7936e+07,3.1191e+05
    8.6400e+04,7.9538e+08,1.3800e+07
    2.5920e+05,2.2201e+09,3.8519e+07
    8.6400e+05,5.5546e+09,9.6372e+07
    1.8000e+06,7.8612e+09,1.3639e+08
    AG,112,47
    3.6000e+03,2.7591e+07,4.9498e+05
    8.6400e+04,3.8637e+09,6.9315e+07
    2.5920e+05,7.3492e+09,1.3184e+08
    8.6400e+05,8.2493e+09,1.4799e+08
    1.8000e+06,8.2528e+09,1.4806e+08

and here is what I get when I run the code:

    python ActivityPlots.py -f test.csv -i AG-111
    file name: test.csv
    isotope: AG-111
    AG 111
    # element, z, isotope, activity_time, activity, error
    AC,225,89
    3.6000e+03,1.6625e-07,2.4555e-09
    8.6400e+04,0.e+00,-1.1455e-23
    2.5920e+05,3.1615e-07,4.6695e-09
    8.6400e+05,3.6457e-05,5.3847e-07
    1.8000e+06,5.5137e-04,8.1437e-06
    AG,111,47
    3.6000e+03,1.7936e+07,3.1191e+05
    8.6400e+04,7.9538e+08,1.3800e+07
    2.5920e+05,2.2201e+09,3.8519e+07
    8.6400e+05,5.5546e+09,9.6372e+07
    1.8000e+06,7.8612e+09,1.3639e+08
    AG,112,47
    3.6000e+03,2.7591e+07,4.9498e+05
    8.6400e+04,3.8637e+09,6.9315e+07
    2.5920e+05,7.3492e+09,1.3184e+08
    8.6400e+05,8.2493e+09,1.4799e+08
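On the question of what `pat` does: after `.format()` the pattern reads `(AG,111.*?)[A-Z]{1,2}`. It matches the header text, then (because of `re.DOTALL`) lazily consumes everything, newlines included, up to the next one or two capital letters, which in this file is the start of the next isotope header. The match object's `.string` attribute, however, is the entire string that was searched, which would explain why every isotope shows up in the output; the captured segment is `.group(1)`. Below is a minimal sketch of that change for Python 3. Three things in it are my own assumptions rather than part of the suggestion in the thread: `re.escape` on the user input, `\Z` as an alternative terminator so the last isotope in the file is found too, and `row[:1]` so an empty row cannot raise. `extract_block` is a hypothetical helper name.

```python
import re


def extract_block(text, name, number):
    """Return [times, activities, errors] for one isotope, or None if absent."""
    pat = re.compile(
        r'({name},{number}.*?)(?:[A-Z]{{1,2}}|\Z)'.format(
            name=re.escape(name), number=re.escape(number)),
        re.DOTALL)
    match = pat.search(text)
    if match is None:
        return None
    # group(1) is only the captured segment; match.string is the full input.
    rows = match.group(1).strip().split('\n')
    data = [[float(x) for x in row.split(',')]
            for row in rows if row[:1].isdigit()]
    # "Rotate" rows into columns, as suggested with zip(*data).
    return [list(column) for column in zip(*data)]


# A shortened version of test.csv from the message above.
sample = (
    "AC,225,89\n"
    "3.6000e+03,1.6625e-07,2.4555e-09\n"
    "AG,111,47\n"
    "3.6000e+03,1.7936e+07,3.1191e+05\n"
    "8.6400e+04,7.9538e+08,1.3800e+07\n"
    "AG,112,47\n"
    "3.6000e+03,2.7591e+07,4.9498e+05\n"
)
ag111 = extract_block(sample, 'AG', '111')
ag112 = extract_block(sample, 'AG', '112')
```

The pattern still relies on data lines containing no capital letters, so a file that writes exponents as `E+03` would end the segment early.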
[Tutor] regex and parsing through a semi-csv file
Hi everyone,

I am post-processing the output of a simulation of activities for various radionuclides produced in a reaction at different times. I have already combined the information from 13 files (containing calculated activities and errors for 13 different times).

The format of this combined, semi-csv file is the following: a line with an element's name, its isotope number, and its atomic number, followed by 13 lines containing

    activation time 1, activation 1, error in activation time 1
    ...
    ...

Here is what the input file looks like for two isotopes:

    AC,225,89
    3.6000e+03,1.6625e-07,2.4555e-09
    8.6400e+04,0.e+00,-1.1455e-23
    2.5920e+05,3.1615e-07,4.6695e-09
    8.6400e+05,3.6457e-05,5.3847e-07
    1.8000e+06,5.5137e-04,8.1437e-06
    1.8036e+06,5.5047e-04,8.1304e-06
    1.8864e+06,5.3279e-04,7.8693e-06
    2.6640e+06,6.9672e-04,1.0291e-05
    4.3920e+06,3.2737e-03,4.8353e-05
    1.0440e+07,2.3830e-02,3.5197e-04
    2.7720e+07,9.2184e-02,1.3616e-03
    8.8200e+07,9.2184e-02,1.3616e-03
    1.7460e+08,6.7440e-01,9.9609e-03
    AG,111,47
    3.6000e+03,1.7936e+07,3.1191e+05
    8.6400e+04,7.9538e+08,1.3800e+07
    2.5920e+05,2.2201e+09,3.8519e+07
    8.6400e+05,5.5546e+09,9.6372e+07
    1.8000e+06,7.8612e+09,1.3639e+08
    1.8036e+06,7.8484e+09,1.3617e+08
    1.8864e+06,7.1836e+09,1.2464e+08
    2.6640e+06,3.1095e+09,5.3950e+07
    4.3920e+06,4.8368e+08,8.3918e+06
    1.0440e+07,7.1793e+05,1.2456e+04
    2.7720e+07,5.9531e-03,1.0329e-04
    8.8200e+07,5.9531e-03,1.0329e-04
    1.7460e+08,0.e+00,0.e+00

Now, I would like to parse through this code and fill out 3 lists: 1) activity_time, 2) activity, 3) error, and plot the activities as a function of time using matplotlib.

My question specifically is on how to parse through the lines containing the data (activity time, activity, error) for a given isotope, stopping before reaching the next isotope's info. The test I am trying in the following snippet is not working.

    found_isotope = False
    activity_time = []
    activity = []
    activity_err = []

    f = open(args.fname, 'r')

    for line in f.readlines():
        line = line.strip()
        if isotope_name in line and isotope_A in line:
            print isotope_name, isotope_A
            found_isotope = True
            continue

        if found_isotope:
            print line
            found = re.search(r'(\d+\.[eE][\+\-]\d+),(\d+\.[eE][\+\-]\d+),(\d+\.[eE][\+\-]\d+)', line, re.I)
            print found
            if found:
                print found.group(1), found.group(2), found.group(3)
                activity_time.append(found.group(1))
                activity.append(found.group(2))
                activity_err.append(found.group(3))
                continue
            else:
                break
    f.close()

If I run the code for isotope_name: AC and isotope_A: 225, I get the following:

    AC 225
    3.6000e+03,1.6625e-07,2.4555e-09
    None

Note that the size of the lists will change depending on the number of activities for a given run of the simulation so I don't want to hard code '13' as the number of lines to read in followed by the line containing isotope_name, etc.

If there is a more graceful way of doing this, please let me know as well. I am new to python...

Thank you very much,
Mina
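One likely reason the search above returns `None`: `\d+\.[eE]` requires the exponent marker to follow the decimal point directly, which fits `0.e+00` but not `3.6000e+03`, where digits sit between the dot and the `e`. The pattern also has no room for the leading minus sign in values such as `-1.1455e-23`. Here is a sketch of a more permissive pattern, assuming the three-column layout shown above and written for Python 3; `FLOAT`, `ROW`, and `parse_row` are names invented for this example.

```python
import re

# Optional sign, digits, a dot, optional fraction digits, then an exponent.
FLOAT = r'[+-]?\d+\.\d*[eE][+-]\d+'
ROW = re.compile(r'^({0}),({0}),({0})$'.format(FLOAT))


def parse_row(line):
    """Return (time, activity, error) as floats, or None for a non-data line."""
    found = ROW.match(line.strip())
    if found is None:
        return None
    return tuple(float(group) for group in found.groups())


row_a = parse_row('3.6000e+03,1.6625e-07,2.4555e-09')
row_b = parse_row('8.6400e+04,0.e+00,-1.1455e-23')
header = parse_row('AG,111,47')
```

Returning `None` for header lines gives the loop a natural stopping condition at the next isotope. Splitting on commas and calling `float()` on each field, as done elsewhere in this thread, reaches the same result with less machinery.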
Re: [Tutor] OptionParser
Thank you Prasad. I am using Python 2.7.1+.

You are right, looks like optparse is replaced by argparse. My problem was that I was checking output and not options.output.

cheers,
Mina

On 11-09-20 02:27 PM, Prasad, Ramit wrote:
>>     from optparse import OptionParser
>
> I am not sure what version of Python you are using, but from 2.7+ optparse is deprecated. You may want to use argparse instead if you can.
>
>> I don't really understand what dest and action in the arguments to parser.add_option mean.
>
> Here is your usage:
>
>     parser = OptionParser(usage="blahblahblah")
>     parser.add_option("-f", "--file", dest="filename")
>     (options, args) = parser.parse_args(['-f filename.csv']) # test in interpreter
>     options.filename
>     ' filename.csv' # Note the space before filename
>     print options.filename
>      filename.csv
>
> See the documentation: http://docs.python.org/library/optparse.html
>
> Ramit
>
> Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
> 712 Main Street | Houston, TX 77002
> work phone: 713 - 216 - 5423
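For comparison, here is a sketch of the same `-f`/`--file` option written with `argparse`, the module this message settles on. The option names and `dest` come from the snippet above; the argument list is test input, passed as separate items the way a shell would deliver them, so the stored value carries no leading space.

```python
import argparse

parser = argparse.ArgumentParser(description='argparse version of the -f/--file option')
# dest names the attribute on the returned namespace that holds the value.
parser.add_argument('-f', '--file', dest='filename')

# Flag and value as separate list items, as they would arrive from the shell.
options = parser.parse_args(['-f', 'filename.csv'])
```

Unlike `OptionParser.parse_args`, which returns an `(options, args)` pair, `ArgumentParser.parse_args` returns a single namespace object.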
[Tutor] OptionParser
Hello,

I am trying to use OptionParser (my first time) to set a variable (cvs_output). i.e. if --csv is given in the list of options, then cvs_output = True. Then I check,

    if cvs_output == True:
        [...]

I have the following so far but something is missing.

    from optparse import OptionParser
    usage = "usage: %prog [options]"
    parser = OptionParser(usage=usage)
    parser.add_option("-cvs", dest="output", default=True, help="outputs the csv file for plotting activites")

    python dUCx_ActivityPlots.py
    python dUCx_ActivityPlots.2.py -h
    Usage: dUCx_ActivityPlots.2.py [options]

    Options:
      -h, --help         show this help message and exit
      --cvs=OUTPUT_FLAG  outputs the csv file for plotting activities

I don't really understand what dest and action in the arguments to parser.add_option mean.

Any help is appreciated.
Mina
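On the `dest` and `action` question: `action` tells the parser what to do when it sees the option, and `dest` names the attribute on the returned options object where the result is stored. For an on/off switch, `action="store_true"` stores `True` when the flag is present and takes no value from the command line. A minimal sketch follows, with the spelling `--csv` and the attribute name `csv_output` chosen here for illustration (the post above spells them differently); it runs on Python 3 as well as 2.7.

```python
from optparse import OptionParser

parser = OptionParser(usage="usage: %prog [options]")
# action="store_true": the flag takes no value; its presence stores True.
# dest="csv_output": the result is read back as options.csv_output.
parser.add_option("--csv", action="store_true", dest="csv_output",
                  default=False,
                  help="output the csv file for plotting activities")

# Explicit argument lists stand in for the command line here.
with_flag, _ = parser.parse_args(["--csv"])
without_flag, _ = parser.parse_args([])
```

The value has to be read from the options object (`with_flag.csv_output` here), not from a bare variable with the same name as `dest`.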