Re: [Tutor] regex and parsing through a semi-csv file

2011-10-25 Thread Mina Nozar

Thank you Ramit.  I updated my code since I am running 2.7.1+ on Ubuntu.

Best wishes,
Mina

On 11-10-25 08:02 AM, Prasad, Ramit wrote:

f = open(args.fname, 'r')
lines = f.readlines()
f.close()


If you are using Python 2.6+ you can use a context manager to automatically close the file.
That way you never have to worry about closing any files!

with open(args.fname, 'r') as f:
    lines = f.readlines()
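
To make the point above concrete, here is a minimal sketch showing that the file is closed as soon as the `with` block ends, even when the block exits through an exception. The file name `example.csv` and its contents are made up for illustration and written to a temporary directory; nothing here comes from the original script.

```python
import os
import tempfile

# Hypothetical sample file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'example.csv')
with open(path, 'w') as f:
    f.write('AC,225,89\n3.6000e+03,1.6625e-07,2.4555e-09\n')

with open(path, 'r') as f:
    lines = f.readlines()
closed_after_block = f.closed  # the file was closed on leaving the block

# The file is also closed when the block is left through an exception.
try:
    with open(path, 'r') as g:
        raise ValueError('simulated failure while reading')
except ValueError:
    pass
closed_after_error = g.closed
```

Both flags end up true, so no explicit `f.close()` call is needed in either case.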

Ramit

Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423

--

This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor





Re: [Tutor] regex and parsing through a semi-csv file

2011-10-24 Thread Mina Nozar

Hi Marc,

Thank you.  Following some of your suggestions, the rewrite below worked.  I agree with your point on readability over
complexity.  By grace I meant not convoluted, or simpler.  That's all.  As a beginner, I don't know all the
existing functions, so I sometimes end up re-inventing the wheel.



Cheers,
Mina


isotope_name,isotope_A = args.isotope.split('-')
print isotope_name, isotope_A

found_isotope = False
activity_time = []
activity = []
activity_err = []


f = open(args.fname, 'r')
lines = f.readlines()
f.close()

# Find the header line of the requested isotope and keep only what follows it.
for i, line in enumerate(lines):
    line = line.strip()
    if isotope_name in line and isotope_A in line:
        found_isotope = True
        print 'found isotope'
        #print line
        lines = lines[i+1:]
        break

# Read data lines until the next header (or a blank line) is reached.
for line in lines:
    line = line.strip()
    if not line or not line[0].isdigit():
        break
    print 'found'
    words = line.split(',')
    activity_time.append(float(words[0]))
    activity.append(float(words[1]))
    activity_err.append(float(words[2]))

On 11-10-19 12:06 PM, Marc Tompkins wrote:

On Wed, Oct 5, 2011 at 11:12 AM, Mina Nozar <noz...@triumf.ca> wrote:

Now, I would like to parse through this file and fill out 3 lists: 1) activity_time, 2) activity, 3) error, and plot
the activities as a function of time using matplotlib.  My question specifically is on how to parse through the
lines containing the data (activity time, activity, error) for a given isotope, stopping before reaching the next
isotope's info.


Regular expressions certainly are terse, but (IMHO) they're really, really hard to debug and maintain; I find I have to
get myself into a Zen state to even unpack them, and that just doesn't feel very Pythonic.

Here's an approach I've used in similar situations (a file with arbitrary sequences of differently-formatted lines,
where one line determines the "type" of the lines that follow):
-  create a couple of status variables: currentElement, currentIsotope
-  read each line and split it into a list, separating on the commas
-  look at the first item on the line: is it an element?  (You could use a list of the known element symbols, or you
   could just check to see if it's alphabetic...)
   -  if the first item is an element, then set currentElement and currentIsotope, move on to next line.
   -  if the first item is NOT an element, then this is a data line.
      -  if currentElement and currentIsotope match what the user asked for,
         add time, activity, and error to the appropriate lists
      -  if not, move on.

This approach also works in the event that the data wasn't all collected in order - i.e. there might be data for Ag111
followed by U235 followed by Ag111 again.
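
The steps above can be sketched roughly as follows. This is only one possible reading of the approach, not code from the thread: the function name `read_isotope` and the inline sample data are made up for illustration, and a single `current` tuple stands in for the two status variables. It assumes every header line has at least two comma-separated fields and every data line has three.

```python
def read_isotope(lines, element, mass_number):
    """Collect the time, activity and error columns for one isotope.

    Header lines look like 'AG,111,47'; data lines hold three floats.
    """
    times, activities, errors = [], [], []
    current = None  # (element, mass number) of the block being read
    for raw in lines:
        fields = [item.strip() for item in raw.strip().split(',')]
        if not fields[0] or fields[0].startswith('#'):
            continue  # skip blank lines and comment lines
        if fields[0].isalpha():
            # Header line: remember which isotope the following rows belong to.
            current = (fields[0], fields[1])
        elif current == (element, mass_number):
            times.append(float(fields[0]))
            activities.append(float(fields[1]))
            errors.append(float(fields[2]))
    return times, activities, errors


# Made-up sample with the AC-225 data split into two separate blocks.
sample = """\
# element, z, isotope, activity_time, activity, error
AC,225,89
3.6000e+03,1.6625e-07,2.4555e-09
AG,111,47
3.6000e+03,1.7936e+07,3.1191e+05
8.6400e+04,7.9538e+08,1.3800e+07
AC,225,89
8.6400e+04,0.e+00,-1.1455e-23
""".splitlines()

times, activities, errors = read_isotope(sample, 'AG', '111')
ac_times, ac_activities, ac_errors = read_isotope(sample, 'AC', '225')
```

Because the function never stops at the first matching block, the two AC-225 blocks in the sample are merged into the same lists, which is the out-of-order case described above.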

Note that the size of the lists will change depending on the number of activities for a given run of the simulation
so I don't want to hard code '13' as the number of lines to read in followed by the line containing isotope_name, etc.


This should work for any number of lines or size of file, as long as the data lines are all formatted as you expect.
Obviously a bit of error-trapping would be a good thing.

If there is a more graceful way of doing this, please let me know as well.  I am new to python...

For me, readability and maintainability trump "grace" every time.  Nobody's handing out awards for elegance (outside of
the classroom), but complexity gets punished (with bugs and wasted time.)  More elegant solutions might also run faster,
but remember that premature optimization is a Bad Thing.





Re: [Tutor] regex and parsing through a semi-csv file

2011-10-19 Thread Mina Nozar

Hello Wayne,

Thank you for your help and sorry for the delay in the response.  I was caught up with other simulation jobs and didn't 
get around to testing what you suggested until yesterday.


On 11-10-05 01:24 PM, Wayne Werner wrote:

On Wed, Oct 5, 2011 at 1:12 PM, Mina Nozar <noz...@triumf.ca> wrote:

I just glanced through your email, but my initial thought would be to just use regex to collect the entire segment
that you're looking for, and then string methods to split it up:

pat = re.compile('({name},{number}.*?)[A-Z]{{1,2}}'.format(name='AC', number='225'), re.DOTALL)

raw_data = re.search(pat, f.read())
if raw_data is None:
    # we didn't find the isotope, so take appropriate actions, quit or tell the user
    pass
else:
    raw_data = raw_data.string.strip().split('\n')

Then it depends on how you want to process your data, but you could easily use list comprehensions/generator
expressions.

The most terse syntax I know of:

data = [[float(x) for x in d.split(',')] for d in raw_data if d[0].isdigit()]


> data will then contain a list of 3-element lists of floating point values.

> If you want to "rotate" the list, you can do data = list(zip(*data)). To 
illustrate:

> >>> d = [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
> >>> d = list(zip(*d))
> >>> d
> [('a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b'), ('c', 'c', 'c', 'c')]

HTH,
Wayne


I tried what you suggested above, but it doesn't work.  The search doesn't start at the right place (the info following
the isotope chosen by the user) and doesn't stop at the end of the info for that particular isotope.  So basically it
seems like data gets filled in with the data for all isotopes in a given file.  I made a small test file to verify this.


Can you please explain what the statement assigned to pat actually does?

At the end, I should be getting three lists, one containing the times (column 1), one containing the activities (column 
2), and one containing the error in activities (column 3) for a specific isotope requested.
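
A sketch of what the `pat` statement appears to do, and one likely reason the data for every isotope shows up. The doubled braces in the template survive `str.format` as literal braces, so for AG-111 the pattern becomes `(AG,111.*?)[A-Z]{1,2}`: the parentheses capture everything from the header up to, but not including, the next one or two capital letters, and `re.DOTALL` lets `.` run across newlines. The match object's `.string` attribute is the whole string that was searched, while `.group(1)` is only the captured segment, which would explain the output seen. The helper `isotope_block` and its lookahead pattern are an assumption added here, not something from the thread; it is meant to also handle the last isotope in a file, which has no capital letter after it.

```python
import re

text = """\
# element, z, isotope, activity_time, activity, error
AC,225,89
3.6000e+03,1.6625e-07,2.4555e-09
8.6400e+04,0.e+00,-1.1455e-23
AG,111,47
3.6000e+03,1.7936e+07,3.1191e+05
8.6400e+04,7.9538e+08,1.3800e+07
AG,112,47
3.6000e+03,2.7591e+07,4.9498e+05
"""

isotope_name, isotope_A = 'AG-111'.split('-')

# '{{1,2}}' becomes the regex quantifier '{1,2}' after str.format.
template = '({name},{number}.*?)[A-Z]{{1,2}}'
pat = re.compile(template.format(name=isotope_name, number=isotope_A), re.DOTALL)
result = pat.search(text)

whole_input = result.string   # the entire string that was searched
segment = result.group(1)     # only the text captured by the parentheses

# Keep the numeric rows of the segment and turn rows into three columns.
rows = [[float(x) for x in d.split(',')]
        for d in segment.strip().split('\n') if d[0].isdigit()]
activity_time, activity, activity_err = [list(col) for col in zip(*rows)]


def isotope_block(text, name, number):
    """Hypothetical helper: header plus data lines for one isotope, or None.

    The lookahead stops at the next line that starts with a capital letter
    or at the end of the input.
    """
    block = re.compile(
        r'^{0},{1},.*?(?=^[A-Z]|\Z)'.format(re.escape(name), re.escape(number)),
        re.DOTALL | re.MULTILINE)
    found = block.search(text)
    return found.group(0) if found else None


last_block = isotope_block(text, 'AG', '112')
missing = isotope_block(text, 'U', '238')
```

With this sample, `segment` holds only the AG-111 header and its two data lines, and the three lists contain one column each.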



Thank you and best wishes,
Mina

Here is what I tried:

#! /usr/bin/env python

import re
import argparse

parser = argparse.ArgumentParser(description='Plot activities for a given isotope')
parser.add_argument('-f', action="store", dest="fname",
                    help='The csv file name containing activities')
parser.add_argument('-i', action="store", dest="isotope",
                    help='Isotope to plot activities for, eg. U-238')
args=parser.parse_args()
print 'file name:', args.fname
print 'isotope:', args.isotope

isotope_name,isotope_A = args.isotope.split('-')
print isotope_name, isotope_A

f = open(args.fname, 'r')
pat = re.compile('({name},{number}.*?)[A-Z]{{1,2}}'.format(name=isotope_name, number=isotope_A),
                 re.DOTALL)
result = re.search(pat, f.read())
print result.string
f.close()

if result is None:
    exit(args.fname+' does not contain info on '+args.isotope)
else:
    result = result.string.strip().split('\n')
    data = [[float(x) for x in d.split(',')] for d in result if d[0].isdigit()]
    data = list(zip(*data))

for i in range(0, len(data)):
    print data[i]


Input file: test.csv

# element, z, isotope, activity_time, activity, error
AC,225,89
3.6000e+03,1.6625e-07,2.4555e-09
8.6400e+04,0.e+00,-1.1455e-23
2.5920e+05,3.1615e-07,4.6695e-09
8.6400e+05,3.6457e-05,5.3847e-07
1.8000e+06,5.5137e-04,8.1437e-06
AG,111,47
3.6000e+03,1.7936e+07,3.1191e+05
8.6400e+04,7.9538e+08,1.3800e+07
2.5920e+05,2.2201e+09,3.8519e+07
8.6400e+05,5.5546e+09,9.6372e+07
1.8000e+06,7.8612e+09,1.3639e+08
AG,112,47
3.6000e+03,2.7591e+07,4.9498e+05
8.6400e+04,3.8637e+09,6.9315e+07
2.5920e+05,7.3492e+09,1.3184e+08
8.6400e+05,8.2493e+09,1.4799e+08
1.8000e+06,8.2528e+09,1.4806e+08


and here is what I get when I run the code: python ActivityPlots.py -f test.csv -i AG-111

file name: test.csv
isotope: AG-111
AG 111
# element, z, isotope, activity_time, activity, error
AC,225,89
3.6000e+03,1.6625e-07,2.4555e-09
8.6400e+04,0.e+00,-1.1455e-23
2.5920e+05,3.1615e-07,4.6695e-09
8.6400e+05,3.6457e-05,5.3847e-07
1.8000e+06,5.5137e-04,8.1437e-06
AG,111,47
3.6000e+03,1.7936e+07,3.1191e+05
8.6400e+04,7.9538e+08,1.3800e+07
2.5920e+05,2.2201e+09,3.8519e+07
8.6400e+05,5.5546e+09,9.6372e+07
1.8000e+06,7.8612e+09,1.3639e+08
AG,112,47
3.6000e+03,2.7591e+07,4.9498e+05
8.6400e+04,3.8637e+09,6.9315e+07
2.5920e+05,7.3492e+09,1.3184e+08
8.6400e+05,8.2493e+09,1.4799e+08

[Tutor] regex and parsing through a semi-csv file

2011-10-05 Thread Mina Nozar

Hi everyone,

I am post-processing the output of a simulation of the activities of various radionuclides produced in a reaction,
calculated at different times.


I have already combined the information from 13 files (containing calculated activities and errors for 13 different 
times).  The format of this combined, semi-csv file is the following:


A line with an element's name, its isotope (mass) number, and its atomic number, followed by 13 lines, each containing
activity time, activity, error in activity
...
...

So here is what the input file looks like for two isotopes:

AC,225,89
3.6000e+03,1.6625e-07,2.4555e-09
8.6400e+04,0.e+00,-1.1455e-23
2.5920e+05,3.1615e-07,4.6695e-09
8.6400e+05,3.6457e-05,5.3847e-07
1.8000e+06,5.5137e-04,8.1437e-06
1.8036e+06,5.5047e-04,8.1304e-06
1.8864e+06,5.3279e-04,7.8693e-06
2.6640e+06,6.9672e-04,1.0291e-05
4.3920e+06,3.2737e-03,4.8353e-05
1.0440e+07,2.3830e-02,3.5197e-04
2.7720e+07,9.2184e-02,1.3616e-03
8.8200e+07,9.2184e-02,1.3616e-03
1.7460e+08,6.7440e-01,9.9609e-03
AG,111,47
3.6000e+03,1.7936e+07,3.1191e+05
8.6400e+04,7.9538e+08,1.3800e+07
2.5920e+05,2.2201e+09,3.8519e+07
8.6400e+05,5.5546e+09,9.6372e+07
1.8000e+06,7.8612e+09,1.3639e+08
1.8036e+06,7.8484e+09,1.3617e+08
1.8864e+06,7.1836e+09,1.2464e+08
2.6640e+06,3.1095e+09,5.3950e+07
4.3920e+06,4.8368e+08,8.3918e+06
1.0440e+07,7.1793e+05,1.2456e+04
2.7720e+07,5.9531e-03,1.0329e-04
8.8200e+07,5.9531e-03,1.0329e-04
1.7460e+08,0.e+00,0.e+00

Now, I would like to parse through this file and fill out 3 lists: 1) activity_time, 2) activity, 3) error, and plot the
activities as a function of time using matplotlib.  My question specifically is on how to parse through the lines
containing the data (activity time, activity, error) for a given isotope, stopping before reaching the next isotope's
info.  The test I am trying in the following snippet is not working.


found_isotope = False
activity_time = []
activity = []
activity_err = []

f = open(args.fname, 'r')
for line in f.readlines():
    line = line.strip()
    if isotope_name in line and isotope_A in line:
        print isotope_name, isotope_A
        found_isotope = True
        continue

    if found_isotope:
        print line
        found = re.search(r'(\d+\.[eE][\+\-]\d+),(\d+\.[eE][\+\-]\d+),(\d+\.[eE][\+\-]\d+)',
                          line, re.I)
        print found
        if found:
            print found.group(1), found.group(2), found.group(3)
            activity_time.append(found.group(1))
            activity.append(found.group(2))
            activity_err.append(found.group(3))
            continue
        else:
            break
f.close()

If I run the code for isotope_name: AC and isotope_A: 225, I get the following:
AC 225
3.6000e+03,1.6625e-07,2.4555e-09
None
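
A likely explanation for the `None`, offered as a sketch rather than a definitive diagnosis: each group in the pattern is `\d+\.[eE][\+\-]\d+`, which requires the exponent marker immediately after the decimal point. That fits `0.e+00` but not `3.6000e+03`, where digits follow the point. The `number` pattern below is one possible fix (an assumption, not from the thread) that allows optional digits after the point and an optional leading sign. It also shows the regex-free alternative of splitting on commas and calling `float()`.

```python
import re

line = '3.6000e+03,1.6625e-07,2.4555e-09'

# Group as written in the post: 'e' must come right after the dot.
original = r'(\d+\.[eE][\+\-]\d+)'
# Possible fix: optional sign, optional digits between the dot and the 'e'.
number = r'([+-]?\d+\.\d*[eE][+-]\d+)'

no_match = re.search(','.join([original] * 3), line)
found = re.search(','.join([number] * 3), line)
values = [float(group) for group in found.groups()]

# The same pattern also accepts the '0.e+00' and negative entries in the file.
odd = re.search(','.join([number] * 3), '8.6400e+04,0.e+00,-1.1455e-23')
odd_values = [float(group) for group in odd.groups()]

# Without a regex: split on the commas and let float() do the parsing.
split_values = [float(word) for word in line.split(',')]
```

Note that the matched groups are strings, so converting them with `float()` before appending is needed if the lists are going to be plotted.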


Note that the size of the lists will change depending on the number of activities for a given run of the simulation so I 
don't want to hard code '13' as the number of lines to read in followed by the line containing isotope_name, etc.


If there is a more graceful way of doing this, please let me know as well.  I am new to python...

Thank you very much,
Mina


Re: [Tutor] OptionParser

2011-09-20 Thread Mina Nozar

Thank you Prasad.

I am using Python 2.7.1+

You are right, looks like optparse is replaced by argparse.

My problem was that I was checking output and not options.output.
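
For comparison, a minimal argparse sketch of the same idea. The flag name `--csv` and the destination `csv_output` are chosen here for illustration only. As with optparse, the parsed value lives on the object returned by `parse_args()`, not in a bare variable.

```python
import argparse

parser = argparse.ArgumentParser(description='Plot activities')
# store_true: the attribute is False unless the flag is given.
parser.add_argument('--csv', action='store_true', dest='csv_output',
                    help='output the csv file for plotting activities')

with_flag = parser.parse_args(['--csv'])
without_flag = parser.parse_args([])

# Read the value from the returned namespace.
csv_output = with_flag.csv_output
```

Passing a list to `parse_args()` is only for testing; called with no argument it reads the real command line.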

cheers,
Mina

On 11-09-20 02:27 PM, Prasad, Ramit wrote:

from optparse import OptionParser

I am not sure what version of Python you are using, but as of 2.7 optparse is deprecated in favor of argparse.
You may want to use argparse if you can.


I don't really understand what dest and action in the arguments to 
parser.add_option mean.

Here is your usage:

parser = OptionParser(usage="blahblahblah")
parser.add_option("-f", "--file", dest="filename")



>>> (options, args) = parser.parse_args(['-f filename.csv']) # test in interpreter
>>> options.filename
' filename.csv' # Note the space before filename
>>> print options.filename
 filename.csv

See the documentation: http://docs.python.org/library/optparse.html
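
On the question of what `dest` and `action` mean, here is a small sketch. The `--csv` flag and the name `csv_output` are made up for illustration. `action` says what to do when the option is seen, and `dest` names the attribute on `options` where the result is stored.

```python
from optparse import OptionParser

parser = OptionParser(usage="usage: %prog [options]")
# action="store" (the default) expects a value and saves it under `dest`.
parser.add_option("-f", "--file", action="store", dest="filename")
# action="store_true" takes no value; it sets `dest` to True when present.
parser.add_option("--csv", action="store_true", dest="csv_output",
                  default=False, help="output the csv file")

# Passing a list instead of reading sys.argv makes this easy to try out.
options, args = parser.parse_args(["-f", "filename.csv", "--csv", "extra"])
defaults, no_args = parser.parse_args([])
```

Anything that is not an option or an option's value, such as `extra` here, ends up in the `args` list.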

Ramit







[Tutor] OptionParser

2011-09-19 Thread Mina Nozar

Hello,

I am trying to use OptionParser (my first time) to set a variable (cvs_output).  i.e. if --csv is given in the list of 
options, then cvs_output = True.

Then I check,
if cvs_output == True:
[...]


I have the following so far but something is missing.
from optparse import OptionParser
usage = "usage: %prog [options]"
parser = OptionParser(usage=usage)
parser.add_option("-cvs", dest="output", default=True, help="outputs the csv file for plotting activities")


python dUCx_ActivityPlots.py

python dUCx_ActivityPlots.2.py -h
Usage: dUCx_ActivityPlots.2.py [options]

Options:
  -h, --help show this help message and exit
  --cvs=OUTPUT_FLAG  outputs the csv file for plotting activities


I don't really understand what dest and action in the arguments to 
parser.add_option mean.
Any help is appreciated.
Mina