Re: writing results to array

2007-12-04 Thread Bevan Jenkins
Thank you all very much.

Firstly for providing an answer that does exactly what I require.  But
also for the hints on the naming conventions and the explanations of
how I was going wrong.

Thanks again,
b


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing results to array

2007-12-04 Thread Chris
On Dec 3, 10:45 pm, Bevan Jenkins <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have recently discovered the Python language and am having a lot of
> fun getting my head around the basics of it.
> However, I have run into a stumbling block that I have not been able
> to overcome, so I thought I would ask for help.
> 
> I am trying to import a text file that has the following format:
> 02/01/2000 @ 00:00:00   0.983896 Q10  T2
> 03/01/2000 @ 00:00:00   0.557377 Q10  T2
> 04/01/2000 @ 00:00:00   0.508871 Q10  T2
> 05/01/2000 @ 00:00:00   0.583196 Q10  T2
> 06/01/2000 @ 00:00:00   0.518281 Q10  T2
> when there is missing data:
> 12/09/2000 @ 00:00:00Q151 T2
> 13/09/2000 @ 00:00:00Q151 T2
>
> I have cobbled together some code which imports the data.  The next
> step is to create an array in which each column contains a year's worth
> of values.  Thus, if I have 6 years of data (2001-2006 inclusive),
> there will be six columns, with 365 rows (not all years have a full
> data set and may only have, say, 340 days of data).
> 
> In the code below,
> print answer[j,1] is giving me the right answer but I can't write it
> to an array.
> Any suggestions welcome.
>
> This is what I have:
> flow=[]
> flowdate=[]
> yeardate=[]
> uniqueyear=[]
> #flow_order=
> flow_rank=[]
> icount=[]
> p=[]
>
> filename=r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
> linesep ="\n"
>
> # read in whole file
> tempdata = open( filename).read()
> # break into lines
> tempdata = string.split( tempdata, linesep )
> # for each record, get the field values
> for i in range( len( tempdata)):
>     # split into the lines
>     fields = string.split( tempdata[i])
>     if len(fields)>5:
>         flowdate.append(fields[0])
>         list =string.split(fields[0],"/")
>         yeardate.append(list[2])
>         flow.append(float(fields[3]))
>         answer=column_stack((flowdate,flow))
>
> for rows in yeardate:
>     if rows not in uniqueyear:
>         uniqueyear.append(rows)
>
> #print answer[:,0]   #date
> flow_order=empty((0,0),dtype=float)
> #for yr in enumerate(uniqueyear):
> for iyr,yr in enumerate(uniqueyear):
>     for j, val, in enumerate (answer[:,0]):
>         flowyr=string.split(val,"/")
>         if int(flowyr[2])==int(yr):
>             print answer[j,1]
>             #flow_order =

Maybe you're looking for something more along the lines of:

fInput = open('tst.txt')
dictObj = {}
# { Year_Key: { DayKey: FloatValue } }
for each_line in fInput:
    line = each_line.split()
    if len(line) == 6:
        year = line[0].split('/')[-1]
        if year in dictObj:
            dictObj[year][line[0]] = float(line[3])
        else:
            dictObj[year] = {line[0]: float(line[3])}
fInput.close()
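For comparison, a minimal Python 3 sketch of the same year -> {date: flow} layout, assuming the six-field rows shown in the original post. It swaps the explicit "is the key there yet?" check for collections.defaultdict. The function name nest_by_year and the sample lines (including one made-up 2001 row, added only to show a second year) are illustrative, not from the thread.

```python
from collections import defaultdict

# Sample lines in the format from the original post; the 2001 row is made up
SAMPLE = """\
02/01/2000 @ 00:00:00   0.983896 Q10  T2
03/01/2000 @ 00:00:00   0.557377 Q10  T2
12/09/2000 @ 00:00:00Q151 T2
02/01/2001 @ 00:00:00   0.5 Q10  T2
"""

def nest_by_year(lines):
    """Return {year: {date: flow}}, skipping rows without a flow value."""
    by_year = defaultdict(dict)          # missing years start as an empty dict
    for raw in lines:
        fields = raw.split()
        if len(fields) == 6:             # complete rows have six fields
            date = fields[0]
            year = date.split('/')[-1]   # 'dd/mm/yyyy' -> 'yyyy'
            by_year[year][date] = float(fields[3])
    return dict(by_year)

nested = nest_by_year(SAMPLE.splitlines())
print(nested)
```

The missing-data row splits into only four fields, so it is silently skipped rather than raising an IndexError.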


Re: writing results to array

2007-12-03 Thread Matimus

I'm not sure what you mean by `write it to an array'. `answer' is an
array. Perhaps you could show an example that has the bad behavior you
are observing, or at least an example of what you expect to get.

Also, just a couple of pointers:

this:

> tempdata = open( filename).read()
> # break into lines
> tempdata = string.split( tempdata, linesep )
> # for each record, get the field values
> for i in range( len( tempdata)):
>     # split into the lines
>     fields = string.split( tempdata[i])

is better (and more usually) written in Python like this:

for line in open(filename):
    fields = line.split()
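A minimal Python 3 sketch of the same file-iteration idea, wrapped in a hypothetical read_records() generator. io.StringIO stands in for open(filename) so the example runs on its own; the sample rows are copied from the original post.

```python
import io

def read_records(fileobj):
    """Yield (date, flow) pairs from the complete rows of a flow file."""
    for line in fileobj:              # a file object yields one line at a time
        fields = line.split()         # split() also drops the trailing newline
        if len(fields) > 5:           # rows with missing data have fewer fields
            yield fields[0], float(fields[3])

# io.StringIO stands in for open(filename) so the sketch is self-contained
fake_file = io.StringIO(
    "02/01/2000 @ 00:00:00   0.983896 Q10  T2\n"
    "12/09/2000 @ 00:00:00Q151 T2\n"
    "03/01/2000 @ 00:00:00   0.557377 Q10  T2\n"
)
records = list(read_records(fake_file))
print(records)
```

Because the file is read lazily, this never holds the whole file in memory the way read() followed by split() does.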

Don't use the string module; use the methods of the string objects
themselves.
Don't use built-in type names as variable names, as seen on this line:
> list =string.split(fields[0],"/") # list is a built-in type
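To illustrate both tips, a small Python 3 sketch (in Python 3 the string-module functions such as string.split() no longer exist at all). The function shadowing_demo is a made-up name used only to show what goes wrong once `list' is rebound.

```python
date = "02/01/2000"

# Python 2's string.split(date, "/") becomes a method call on the string
parts = date.split("/")
year = parts[2]
print(parts, year)           # ['02', '01', '2000'] 2000

def shadowing_demo():
    # Binding the name "list" hides the built-in type inside this function
    list = date.split("/")
    try:
        return list(year)    # fails: "list" is now a list object, not the type
    except TypeError as exc:
        return "TypeError: %s" % exc

print(list(year))            # the built-in still works here: ['2', '0', '0', '0']
print(shadowing_demo())      # TypeError: 'list' object is not callable
```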

You only need to use enumerate if you actually want the index. If you
don't need the index, just iterate over the sequence, e.g. use this:

> for yr in uniqueyear:
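A short Python 3 sketch contrasting the two cases; the list of years is illustrative. Direct iteration is enough when only the values matter, while enumerate() earns its place when the position is used, for example as a column number per year.

```python
# Illustrative years, as strings (the way they come out of split("/"))
uniqueyear = ["2000", "2001", "2002"]

# No index needed: iterate over the sequence directly
seen = []
for yr in uniqueyear:
    seen.append(yr)

# Index needed (e.g. to choose a column number per year): use enumerate
column_of = {}
for iyr, yr in enumerate(uniqueyear):
    column_of[yr] = iyr

print(seen)        # ['2000', '2001', '2002']
print(column_of)   # {'2000': 0, '2001': 1, '2002': 2}
```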

You don't need to re-create the column stack each time you read a value
from the file; doing so is very inefficient. Build up the lists inside
the loop and call column_stack once, after the loop.

e.g. change this:

> for i in range( len( tempdata)):
>     # split into the lines
>     fields = string.split( tempdata[i])
>     if len(fields)>5:
>         flowdate.append(fields[0])
>         list =string.split(fields[0],"/")
>         yeardate.append(list[2])
>         flow.append(float(fields[3]))
>         answer=column_stack((flowdate,flow))

to this:

> for i in range( len( tempdata)):
>     # split into the lines
>     fields = string.split( tempdata[i])
>     if len(fields)>5:
>         flowdate.append(fields[0])
>         list =string.split(fields[0],"/")
>         yeardate.append(list[2])
>         flow.append(float(fields[3]))
> answer=column_stack((flowdate,flow))

or, with the other suggested changes:

> for line in open(filename):
>     # split the line into fields
>     fields = line.split()
>     if len(fields) > 5:
>         flowdate.append(fields[0])
>         year = fields[0].split("/")[2]
>         yeardate.append(year)
>         flow.append(float(fields[3]))
> answer=column_stack((flowdate,flow))

If I were doing this, though, I would use a dictionary (dict) where the
keys are the years and the values are lists of flows for each year.
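A minimal Python 3 sketch of that idea, assuming six-field rows as in the original post. dict.setdefault() collects the flows per year, and itertools.zip_longest() then lines the years up as columns (the layout asked for in the original post), padding shorter years with None. The names year2flows_from and to_columns, and the made-up 2001 sample row, are illustrative only.

```python
from itertools import zip_longest

def year2flows_from(lines):
    """Map each year (as a string) to the list of flows recorded in it."""
    year2flows = {}
    for line in lines:
        fields = line.split()
        if len(fields) > 5:                       # skip rows with missing data
            year = fields[0].split("/")[2]
            year2flows.setdefault(year, []).append(float(fields[3]))
    return year2flows

def to_columns(year2flows):
    """Return (years, rows): one column per year, short years padded with None."""
    years = sorted(year2flows)                    # 4-digit year strings sort correctly
    rows = list(zip_longest(*(year2flows[y] for y in years)))
    return years, rows

sample = [
    "02/01/2000 @ 00:00:00   0.983896 Q10  T2",
    "03/01/2000 @ 00:00:00   0.557377 Q10  T2",
    "12/09/2000 @ 00:00:00Q151 T2",
    "02/01/2001 @ 00:00:00   0.5 Q10  T2",       # made-up 2001 row
]
y2f = year2flows_from(sample)
years, rows = to_columns(y2f)
print(years)   # ['2000', '2001']
print(rows)    # [(0.983896, 0.5), (0.557377, None)]
```

Keeping the per-year lists in a dict means a year with only 340 days simply has a shorter list; the padding is only added at the point where a rectangular table is actually needed.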

Something like this:
[code]
filename=r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
year2flows = {}

fin = open(filename)
for line in fin:
    # split the line into fields
    fields = line.split()
    if len(fields)>5:
        dat