Re: processing input from multiple files

2010-10-15 Thread John Posner

On 10/15/2010 6:59 AM, Christopher Steele wrote:

> Thanks,
>
> The issue with the times is now sorted, however I'm running into a
> problem towards the end of the script:
>
>  File "sortoutsynop2.py", line 131, in <module>
> newline =
> message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+
> "002" +c+"-"+c+"-"+c+str(pressure)+c
>
> TypeError: cannot concatenate 'str' and 'list' objects
>
> I think I can see the issue here, but I'm not entirely sure how to get
> around it. Several of my variables change either from one file to the
> next or from each line. Time and pressure would be examples of both of
> these types. Yet others, such as message_type, are constant. As a
> result I have a mixture of both lists and strings. Should I then
> create a list of the constant values?


I suggest maintaining a list for each such variable, in order to keep 
your code simpler. It won't matter that some lists contain the same 
value over and over and over.


(There's a slight possibility it would matter if you're dealing with 
massive amounts of data. But that's the kind of problem that you don't 
need to solve until you encounter it.)
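
To make the one-list-per-variable idea concrete, here is a minimal sketch. It is written in Python 3 syntax (the script in this thread is Python 2), and the sample values and the helper name build_rows are invented for illustration only.

```python
# Hypothetical sample values; the real ones come from the input files.
message_types = ['ADPSFC', 'ADPSFC']   # constant, but repeated once per row
station_ids = ['03002', '03005']
pressures = [1012.3, 998.7]


def build_rows(message_types, station_ids, pressures, sep=','):
    """Join the i-th entry of every list into one separated row."""
    rows = []
    for mtype, sid, pres in zip(message_types, station_ids, pressures):
        rows.append(sep.join([mtype, str(sid), str(pres)]))
    return rows


rows = build_rows(message_types, station_ids, pressures)

# Mixing a plain string with a whole list is what raises the TypeError:
try:
    'ADPSFC' + ',' + pressures
except TypeError as exc:
    print('TypeError:', exc)
```

Because every variable is a list of the same length, zip() hands back one value from each list per row, and no string is ever added to a list.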


Some more notes below, interspersed with your code ...

> I'm a little confused, I'll send you the script that works for a
> single file


Yes! That's a much better approach: figure out how to handle one file, 
place the code inside a function that takes the filename as an argument, 
and call the function on each file in turn.
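
A rough sketch of that structure, in Python 3 syntax. The function names time_from_filename, process_file and process_all are made up for this example, and the per-line work is reduced to pairing each line with the time taken from the file name; the real decoding would go where the comment indicates.

```python
import os


def time_from_filename(fname):
    """Return the text between the first '_' and the next '.' of the name."""
    base = os.path.basename(fname)      # ignore any directory part
    return base.split('_')[1].split('.')[0]


def process_file(fname):
    """Return one (timestamp, line) pair per non-blank line of one file."""
    stamp = time_from_filename(fname)
    rows = []
    with open(fname) as infile:
        for line in infile:
            line = line.strip()
            if line:
                # The real script would decode the observation here.
                rows.append((stamp, line))
    return rows


def process_all(filenames):
    """Call process_file() on each file in turn and collect every row."""
    all_rows = []
    for fname in filenames:
        all_rows.extend(process_file(fname))
    return all_rows
```

The point of the design is that the timestamp is computed once per file, inside the function, so it can never get out of step with the lines it belongs to.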



> and I'll see if I can come up with a more logical way around it.
>
> #!/usr/bin/python
>
> import sys
> import os
> import re
>
> #foutname = 'test.txt'
> #filelist = os.system('ls
> fname = "datalist_201081813.txt"


There's a digit missing from the above filename.



> foutname1 = 'prestest.txt'
> foutname2 = 'temptest.txt'
> foutname3 = 'tempdtest.txt'
> foutname4 = 'wspeedtest.txt'
> foutname5 = 'winddtest.txt'
>
> time = fname.split('_')[1].split('.')[0]
> year = time[:4]
> month = time[4:6]
> day = time[6:8]
> hour = time[-2:]
>
> newtime = year+month+day+'_'+hour+''
> c = ','
> file1 = open(fname,"r")
>
>
> file2 = open("uk_stations.txt","r")
> stations = file2.readlines()
> ids=[]
> names=[]
> lats=[]
> lons=[]
> for item in stations:
>     item_list = item.strip().split(',')
>     ids.append(item_list[0])
>     names.append(item_list[1])
>     lats.append(item_list[2])
>     lons.append(item_list[3])
>
>
> st = file1.readlines()
> print st
> data=[item[:item.find(' 333 ')] for item in st]


I still think there's a problem in the above statement. In the data file 
you provided in a previous message, some lines lack the ' 333 ' 
substring. In such lines, the find() method will return -1, so the slice 
becomes item[:-1], which (I think) is not what you want. Ex:


>>> item = '1 2 333 4'
>>> item[:item.find(' 333 ')]
  '1 2'

>>> item = '1 2 4'
>>> item[:item.find(' 333 ')]
  '1 2 '

Note that the last digit, "4", gets dropped. I *think* you want 
something like this:


  data = []
  for item in st:
      posn = item.find(' 333 ')
      if posn != -1:
          data.append(item[:posn])
      else:
          data.append(...some other value...)
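
One possible shape for this, sketched in Python 3 syntax. The helper name strip_cloud_section is invented, and keeping the whole line when the marker is absent is only an assumption about what "some other value" should be.

```python
def strip_cloud_section(line, marker=' 333 '):
    """Return the part of line before marker, or the whole line if absent."""
    line = line.rstrip('\n')            # readlines() keeps the newline
    # partition() returns (line, '', '') when the marker is not found,
    # so there is no -1 to trip over.
    before, _sep, _after = line.partition(marker)
    return before


st = ['1 2 333 4\n', '1 2 4\n']         # made-up sample lines
data = [strip_cloud_section(item) for item in st]
```

Unlike the find()-based slice, this never drops the last character of a line that has no ' 333 ' in it.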



> #data=st[split:]
> print data
>
> pres_out = ''
> temp_out = ''
> dtemp_out = ''
> dir_out = ''
> speed_out = ''
>
> for line in data:
>     elements=line.split(' ')


Do you really want to specify a SPACE character argument to split()?

>>> 'aaa bbb    ccc'.split(' ')
  ['aaa', 'bbb', '', '', '', 'ccc']

>>> 'aaa bbb    ccc'.split()
  ['aaa', 'bbb', 'ccc']



>     station_id = elements[0]
>     try:
>         index = ids.index(station_id)
>         lat = lats[index]
>         lon = lons[index]
>         message_type = 'blah'
>     except:


It's bad form to use a "bare except", which defines a code block to be 
executed if *anything* goes wrong. You should specify what you're 
expecting to go wrong. Here that is ids.index() failing to find the 
station ID, which raises ValueError:


  except ValueError:
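
A small sketch of a lookup that names its exception, in Python 3 syntax. The station values and the helper name lookup_station are hypothetical. One detail worth checking against the documentation: list.index() signals a missing value with ValueError, so that is the exception caught here.

```python
ids = ['03002', '03005']        # hypothetical station IDs
lats = ['60.75', '60.14']       # hypothetical latitudes
lons = ['-0.85', '-1.18']       # hypothetical longitudes


def lookup_station(station_id):
    """Return (lat, lon) for station_id, or ('NaN', 'NaN') if unknown."""
    try:
        index = ids.index(station_id)   # raises ValueError when not found
    except ValueError:
        return 'NaN', 'NaN'
    return lats[index], lons[index]
```

Any other failure, such as a misspelled variable name, now surfaces as a traceback instead of being silently reported as a bad station ID.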


>         print 'Station ID',station_id,'not in list!'
>         lat = lon = 'NaN'
>         message_type = 'Bad_station_id'
>
>     try:
>         temp = [item for item in elements if item.startswith('1')][0]
>         temperature = float(temp[2:])/10
>         sign = temp[1]
>         if sign == 1:
>             temperature=-temperature
>     except:
>         temperature='NaN'


What are you expecting to go wrong (i.e. what exception might occur) in 
the above try/except code?
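
For what it's worth, here is one way the temperature block could name its exceptions, sketched in Python 3 syntax with an invented helper name, decode_temperature. The two exceptions listed are the ones this particular code can raise: IndexError when no group starts with '1', and ValueError when float() cannot convert the digits. The sketch also compares the sign character with the string '1'; in the quoted code, sign is a string, so the test "sign == 1" is never true and the temperature is never negated.

```python
def decode_temperature(elements):
    """Decode the first group starting with '1'; return 'NaN' if unusable."""
    try:
        temp = [item for item in elements if item.startswith('1')][0]
        temperature = float(temp[2:]) / 10
    except IndexError:      # no group starts with '1'
        return 'NaN'
    except ValueError:      # the digits could not be converted
        return 'NaN'
    if temp[1] == '1':      # temp[1] is a string: compare with '1', not 1
        temperature = -temperature
    return temperature
```

Note also that the dew-point block in the quoted script assigns to detemperature in its except branch, where dtemperature looks like the intended name.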




>     try:
>         dtemp = [item for item in elements if item.startswith('2')][0]
>         dtemperature = float(dtemp[2:])/10
>         sign = dtemp[1]
>         if sign == 1:
>             dtemperature=-dtemperature
>     except:
>         detemperature='NaN'
>     try:
>         press = [item for item in elements[2:] if item.startswith('4')][0]
>         if press[1]=='9':
>             pressure = float(press[1:])/10
>         else:
>             pressure = float(press[1:])/10+1000
>     except:
>         pressure = 'NaN'
>
>     try:
>         wind = elements[elements.index(temp)-1]
>         direction = float(wind[1:3])*10
>         speed = float(wind[3:])*0.51444
>     except:
>         direction=speed='NaN'
>
>
>
>     newline =
>     message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+'-'+c+'002'+c+'-'+c+'-9

Re: processing input from multiple files

2010-10-15 Thread Christopher Steele
Thanks,

The issue with the times is now sorted, however I'm running into a problem
towards the end of the script:

 File "sortoutsynop2.py", line 131, in <module>
newline =
message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "002"
+c+"-"+c+"-"+c+str(pressure)+c
TypeError: cannot concatenate 'str' and 'list' objects


I think I can see the issue here, but I'm not entirely sure how to get
around it. Several of my variables change either from one file to the next
or from each line. Time and pressure would be examples of both of these
types. Yet others, such as message_type, are constant. As a result I have a
mixture of both lists and strings. Should I then create a list of the
constant values? I'm a little confused, I'll send you the script that works
for a single file and I'll see if I can come up with a more logical way
around it.

#!/usr/bin/python

import sys
import os
import re

#foutname = 'test.txt'
#filelist = os.system('ls
fname = "datalist_201081813.txt"
foutname1 = 'prestest.txt'
foutname2 = 'temptest.txt'
foutname3 = 'tempdtest.txt'
foutname4 = 'wspeedtest.txt'
foutname5 = 'winddtest.txt'

time = fname.split('_')[1].split('.')[0]
year = time[:4]
month = time[4:6]
day = time[6:8]
hour = time[-2:]

newtime = year+month+day+'_'+hour+''
c = ','
file1 = open(fname,"r")


file2 = open("uk_stations.txt","r")
stations = file2.readlines()
ids=[]
names=[]
lats=[]
lons=[]
for item in stations:
    item_list = item.strip().split(',')
    ids.append(item_list[0])
    names.append(item_list[1])
    lats.append(item_list[2])
    lons.append(item_list[3])


st = file1.readlines()
print st
data=[item[:item.find(' 333 ')] for item in st]
#data=st[split:]
print data

pres_out = ''
temp_out = ''
dtemp_out = ''
dir_out = ''
speed_out = ''

for line in data:
    elements=line.split(' ')
    station_id = elements[0]
    try:
        index = ids.index(station_id)
        lat = lats[index]
        lon = lons[index]
        message_type = 'blah'
    except:
        print 'Station ID',station_id,'not in list!'
        lat = lon = 'NaN'
        message_type = 'Bad_station_id'

    try:
        temp = [item for item in elements if item.startswith('1')][0]
        temperature = float(temp[2:])/10
        sign = temp[1]
        if sign == 1:
            temperature=-temperature
    except:
        temperature='NaN'

    try:
        dtemp = [item for item in elements if item.startswith('2')][0]
        dtemperature = float(dtemp[2:])/10
        sign = dtemp[1]
        if sign == 1:
            dtemperature=-dtemperature
    except:
        detemperature='NaN'
    try:
        press = [item for item in elements[2:] if item.startswith('4')][0]
        if press[1]=='9':
            pressure = float(press[1:])/10
        else:
            pressure = float(press[1:])/10+1000
    except:
        pressure = 'NaN'

    try:
        wind = elements[elements.index(temp)-1]
        direction = float(wind[1:3])*10
        speed = float(wind[3:])*0.51444
    except:
        direction=speed='NaN'

    newline = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+'-'+c+'002'+c+'-'+c+'-'+c+str(pressure)+c
    print newline
    pres_out+=newline+'\n'

    newline2 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "011" +c+"-"+c+"-"+c+str(temperature)+c
    print newline2
    temp_out+=newline2+'\n'
    fout = open(foutname2,'w')
    fout.writelines(temp_out)
    fout.close()

    newline3 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "017" +c+"-"+c+"-"+c+str(dtemperature)+c
    print newline3
    dtemp_out+=newline3+'\n'
    fout = open(foutname3,'w')
    fout.writelines(dtemp_out)
    fout.close()

    newline4 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "031" +c+"-"+c+"-"+c+str(direction)+c
    print newline4
    dir_out+=newline4+'\n'
    fout = open(foutname4,'w')
    fout.writelines(dir_out)
    fout.close()

    newline5 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "032"+c+"-"+c+"-"+c+str(speed)+c
    print newline5
    speed_out+=newline5+'\n'


fout = open(foutname1,'w')
fout.writelines(pres_out)
fout.close()
fout = open(foutname2,'w')
fout.writelines(temp_out)
fout.close()
fout = open(foutname3,'w')
fout.writelines(dtemp_out)
fout.close()
fout = open(foutname4,'w')
fout.writelines(dir_out)
fout.close()
fout = open(foutname5,'w')
fout.writelines(speed_out)
fout.close()


cheers

Chris












On Thu, Oct 14, 2010 at 8:15 PM, John Posner  wrote:

> On 10/14/2010 10:44 AM, Christopher Steele wrote:
>
>> The issue is that I need to be able to both, split the names of the files
>> so that I can extract the relevant times, and open each individual file and
>> process each line individually. Once I have achieved this I need to append
>> the sorted files onto one another in one long file so that I can pass them
>> into a verification package. I've tried changing th

Re: processing input from multiple files

2010-10-14 Thread John Posner

On 10/14/2010 10:44 AM, Christopher Steele wrote:
> The issue is that I need to be able to both, split the names of the
> files so that I can extract the relevant times, and open each
> individual file and process each line individually. Once I have
> achieved this I need to append the sorted files onto one another in
> one long file so that I can pass them into a verification package.
> I've tried changing the name to textline and I get the same result


I'm very happy to hear that changing the name of a variable did not 
affect the way the program works! Anything else would be worrisome.




> - the sorted files overwrite one another.


Variable *time* names a list, with one member for each input file. But 
variable *newtime* names a scalar value, not a list. That looks like a 
problem to me. Either of the following changes might help:


Original:

  for x in time:
      hour= x[:2]
      print hour
      newtime = year+month+day+'_'+hour+'00'

Alternative #1:

  newtime = []
  for x in time:
      hour= x[:2]
      print hour
      newtime.append(year+month+day+'_'+hour+'00')

Alternative #2:

  newtime = [year + month + day + '_' + x[:2] + '00' for x in time]
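
Once newtime is a list, each file name has to be matched with its own entry. A minimal sketch in Python 3 syntax, reusing the file names and date values from the posted script; the loop body is only a placeholder.

```python
obs = ['siuk21d_0300P.DATA', 'siuk21d_0900P.DATA']
year, month, day = '2009', '09', '18'

# One time string per file, taken from the part after the underscore.
times = [name.split('_')[1].split('.')[0] for name in obs]
newtime = [year + month + day + '_' + t[:2] + '00' for t in times]

# zip() pairs the i-th file name with the i-th timestamp.
for fname, stamp in zip(obs, newtime):
    print(fname, stamp)
```

Using the list directly in a string concatenation, as in message_type+c+newtime, would fail, because newtime is no longer a single string.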


HTH,
John

--
http://mail.python.org/mailman/listinfo/python-list


Re: processing input from multiple files

2010-10-14 Thread John Posner

On 10/14/2010 6:08 AM, Christopher Steele wrote:

> Hi
>
> I've been trying to decode a series of observations from multiple files
> (each file is a different time) and put each type of observation into
> their own separate file. The script runs successfully for one file but
> whenever I try it for more they just overwrite each other.


fileinput.input() iterates over *lines* not entire *files*. So take a 
look at this location in the code:


  for file  in fileinput.input(obs):
      data=file[:file.find(' 333 ')]

Did you mean your iteration variable to be "file", implying that it will 
hold an entire file of input data?


If you meant the iteration variable to be named "textline" instead of 
"file", is it guaranteed that string ' 333 ' will occur in every such 
text line?
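
If the line-by-line stream is kept, the fileinput module can still report which file each line came from. A sketch in Python 3 syntax; the generator name lines_with_filenames is invented, and it uses the FileInput class directly rather than the module-level input() function.

```python
import fileinput


def lines_with_filenames(filenames):
    """Yield (filename, line) for every line of every file, in order."""
    # FileInput chains the files into one stream of lines; its
    # filename() method names the file the latest line was read from.
    # Pass a non-empty list: with no files it falls back to sys.argv/stdin.
    with fileinput.FileInput(files=filenames) as stream:
        for line in stream:
            yield stream.filename(), line.rstrip('\n')
```

With the file name available on every iteration, the time can be derived from it per line instead of being computed once before the loop.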



-John


Re: processing input from multiple files

2010-10-14 Thread Christopher Steele
The issue is that I need to be able to both, split the names of the files so
that I can extract the relevant times, and open each individual file and
process each line individually. Once I have achieved this I need to append
the sorted files onto one another in one long file so that I can pass them
into a verification package. I've tried changing the name to textline and I
get the same result - the sorted files overwrite one another.
The data are actually meteorological observations and I need to manipulate
them in order to test the performance of a model. The 333 denotes that cloud
observations are going to follow - something that is not always reported at
stations.

I hope this has helped

Chris


On Thu, Oct 14, 2010 at 3:16 PM, John Posner  wrote:

> On 10/14/2010 6:08 AM, Christopher Steele wrote:
>
>> Hi
>>
>> I've been trying to decode a series of observations from multiple files
>> (each file is a different time) and put each type of observation into
>> their own separate file. The script runs successfully for one file but
>> whenever I try it for more they just overwrite each other.
>>
>
> fileinput.input() iterates over *lines* not entire *files*. So take a look
> at this location in the code:
>
>
>  for file  in fileinput.input(obs):
>  data=file[:file.find(' 333 ')]
>
> Did you mean your iteration variable to be "file", implying that it will
> hold an entire file of input data?
>
> If you meant the iteration variable to be named "textline" instead of
> "file", is it guaranteed that string '  333  ' will occur in every such text
> line?
>
>
> -John
>


processing input from multiple files

2010-10-14 Thread Christopher Steele
Hi

I've been trying to decode a series of observations from multiple files
(each file is a different time) and put each type of observation into their
own separate file. The script runs successfully for one file but whenever I
try it for more they just overwrite each other. I'm new to python and I'm
not sure how to go about efficiently running through the process once and
then appending to the output file for all other input files. Has anyone done
something similar to this before?
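
On the appending part of the question, the usual distinction is between opening a file in 'w' mode, which discards whatever the file already held, and 'a' mode, which adds to the end. A minimal sketch in Python 3 syntax; the helper name write_rows is made up for illustration.

```python
def write_rows(path, rows, append=False):
    """Write rows to path, one per line; add to the end if append is true."""
    mode = 'a' if append else 'w'       # 'w' truncates, 'a' appends
    with open(path, mode) as fout:
        for row in rows:
            fout.write(row + '\n')
```

A script could call this with append=False for the first input file and append=True for the rest, or simply collect every row first and write each output file once at the end.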



If it helps, I'll also attach a sample of one of the input files


#!/usr/bin/python

import sys
import os
import re
import fileinput

#load in file list
#obs = os.system('ls s[i,m,n]uk[0,2,4][1,2,3]d_??00P.DATA')
obs = ['siuk21d_0300P.DATA', 'siuk21d_0900P.DATA']
print obs
#code for file type "datalist"
#fname = "datalist_201081813.txt"


#output files
foutname1 = 'prestest.txt'
foutname2 = 'temptest.txt'
foutname3 = 'tempdtest.txt'
foutname4 = 'wspeedtest.txt'
foutname5 = 'winddtest.txt'


#prepare times
time=[]
year="2009"
month="09"
day="18"
hour=[]

#outputs
pres_out = ''
temp_out = ''
dtemp_out = ''
dir_out = ''
speed_out = ''
x =''


#load in station file with lat/lons
file2 = open("uk_stations.txt","r")
stations = file2.readlines()
ids=[]
names=[]
lats=[]
lons=[]
for item in stations:
    item_list = item.strip().split(',')
    ids.append(item_list[0])
    names.append(item_list[1])
    lats.append(item_list[2])
    lons.append(item_list[3])

#create loop over file list
time= [item.split('_')[1].split('.')[0] for item in obs]
print time
for x in time:
    hour= x[:2]
    print hour
    newtime = year+month+day+'_'+hour+'00'
print newtime
for file  in fileinput.input(obs):
    data=file[:file.find(' 333 ')]
    #data=st[split:]
    print data
    elements=data.split(' ')
    print elements
    station_id = elements[0]
    try:
        index = ids.index(station_id)
        lat = lats[index]
        lon = lons[index]
        message_type = 'ADPSFC'
    except:
        print 'Station ID',station_id,'not in list!'
        lat = lon = 'NaN'
        message_type = 'Bad_station_id'
    try:
        temp = [item for item in elements if item.startswith('1')][0]
        temperature = float(temp[2:])/10
        sign = temp[1]
        if sign == 1:
            temperature=-temperature
    except:
        temperature='NaN'

    try:
        dtemp = [item for item in elements if item.startswith('2')][0]
        dtemperature = float(dtemp[2:])/10
        sign = dtemp[1]
        if sign == 1:
            dtemperature=-dtemperature
    except:
        detemperature='NaN'
    try:
        press = [item for item in elements[2:] if item.startswith('4')][0]
        if press[1]=='9':
            pressure = float(press[1:])/10
        else:
            pressure = float(press[1:])/10+1000
    except:
        pressure = 'NaN'

    try:
        wind = elements[elements.index(temp)-1]
        direction = float(wind[1:3])*10
        speed = float(wind[3:])*0.51444
    except:
        direction=speed='NaN'

    newline = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "002" +c+"-"+c+"-"+c+str(pressure)+c
    pres_out+=newline+'\n'

    newline2 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "011" +c+"-"+c+"-"+c+str(temperature)+c
    print newline2
    temp_out+=newline2+'\n'
    fout = open(foutname2,'w')
    fout.writelines(temp_out)
    fout.close()

    newline3 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "017" +c+"-"+c+"-"+c+str(dtemperature)+c
    print newline3
    dtemp_out+=newline3+'\n'
    fout = open(foutname3,'w')
    fout.writelines(dtemp_out)
    fout.close()

    newline4 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "031" +c+"-"+c+"-"+c+str(direction)+c
    print newline4
    dir_out+=newline4+'\n'
    fout = open(foutname4,'w')
    fout.writelines(dir_out)
    fout.close()

    newline5 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "032"+c+"-"+c+"-"+c+str(speed)+c
    print newline5
    speed_out+=newline5+'\n'


fout = open(foutname1,'w')
fout.writelines(pres_out)
fout.close()
fout = open(foutname2,'w')
fout.writelines(temp_out)
fout.close()
fout = open(foutname3,'w')
fout.writelines(dtemp_out)
fout.close()
fout = open(foutname4,'w')
fout.writelines(dir_out)
fout.close()
fout = open(foutname5,'w')
fout.writelines(speed_out)
fout.close()










cheers

Chris


siuk21d_0300P.DATA
Description: Binary data