Re: processing input from multiple files
On 10/15/2010 6:59 AM, Christopher Steele wrote:
> Thanks,
>
> The issue with the times is now sorted, however I'm running into a problem towards the end of the script:
>
>   File "sortoutsynop2.py", line 131, in <module>
>     newline = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "002" +c+"-"+c+"-"+c+str(pressure)+c
> TypeError: cannot concatenate 'str' and 'list' objects
>
> I think I can see the issue here, but I'm not entirely sure how to get around it. Several of my variables change either from one file to the next or from each line. Time and pressure would be examples of both of these types. Yet others, such as message_type, are constant. As a result I have a mixture of both lists and strings. Should I then create a list of the constant values?

I suggest maintaining a list for each such variable, in order to keep your code simpler. It won't matter that some lists contain the same value over and over and over. (There's a slight possibility it would matter if you're dealing with massive amounts of data. But that's the kind of problem that you don't need to solve until you encounter it.)

Some more notes below, interspersed with your code ...

> I'm a little confused, I'll send you the script that works for a single file

Yes! That's a much better approach: figure out how to handle one file, place the code inside a function that takes the filename as an argument, and call the function on each file in turn.

> and I'll see if I can come up with a more logical way around it.
>
> #!/usr/bin/python
>
> import sys
> import os
> import re
>
> #foutname = 'test.txt'
> #filelist = os.system('ls
> fname = "datalist_201081813.txt"

There's a digit missing from the above filename.
> foutname1 = 'prestest.txt'
> foutname2 = 'temptest.txt'
> foutname3 = 'tempdtest.txt'
> foutname4 = 'wspeedtest.txt'
> foutname5 = 'winddtest.txt'
>
> time = fname.split('_')[1].split('.')[0]
> year = time[:4]
> month = time[4:6]
> day = time[6:8]
> hour = time[-2:]
>
> newtime = year+month+day+'_'+hour+''
> c = ','
> file1 = open(fname,"r")
> file2 = open("uk_stations.txt","r")
>
> stations = file2.readlines()
> ids=[]
> names=[]
> lats=[]
> lons=[]
> for item in stations:
>     item_list = item.strip().split(',')
>     ids.append(item_list[0])
>     names.append(item_list[1])
>     lats.append(item_list[2])
>     lons.append(item_list[3])
>
> st = file1.readlines()
> print st
> data=[item[:item.find(' 333 ')] for item in st]

I still think there's a problem in the above statement. In the data file you provided in a previous message, some lines lack the ' 333 ' substring. In such lines, the find() method will return -1, which (I think) is not what you want. Ex:

    >>> item = '1 2 333 4'
    >>> item[:item.find(' 333 ')]
    '1 2'
    >>> item = '1 2 4'
    >>> item[:item.find(' 333 ')]
    '1 2 '

Note that the last digit, "4", gets dropped. I *think* you want something like this:

    data = []
    for item in st:
        posn = item.find(' 333 ')
        if posn != -1:
            data.append(item[:posn])
        else:
            data.append(...some other value...)

> #data=st[split:]
> print data
>
> pres_out = ''
> temp_out = ''
> dtemp_out = ''
> dir_out = ''
> speed_out = ''
>
> for line in data:
>     elements=line.split(' ')

Do you really want to specify a SPACE character argument to split()?

    >>> 'aaa bbb    ccc'.split(' ')
    ['aaa', 'bbb', '', '', '', 'ccc']
    >>> 'aaa bbb    ccc'.split()
    ['aaa', 'bbb', 'ccc']

>     station_id = elements[0]
>     try:
>         index = ids.index(station_id)
>         lat = lats[index]
>         lon = lons[index]
>         message_type = 'blah'
>     except:

It's bad form to use a "bare except", which defines a code block to be executed if *anything* goes wrong. You should specify what you're expecting to go wrong:

    except ValueError:

>         print 'Station ID',station_id,'not in list!'
>         lat = lon = 'NaN'
>         message_type = 'Bad_station_id'
>     try:
>         temp = [item for item in elements if item.startswith('1')][0]
>         temperature = float(temp[2:])/10
>         sign = temp[1]
>         if sign == 1:
>             temperature=-temperature
>     except:
>         temperature='NaN'

What are you expecting to go wrong (i.e. what exception might occur) in the above try/except code?

>     try:
>         dtemp = [item for item in elements if item.startswith('2')][0]
>         dtemperature = float(dtemp[2:])/10
>         sign = dtemp[1]
>         if sign == 1:
>             dtemperature=-dtemperature
>     except:
>         detemperature='NaN'
>     try:
>         press = [item for item in elements[2:] if item.startswith('4')][0]
>         if press[1]=='9':
>             pressure = float(press[1:])/10
>         else:
>             pressure = float(press[1:])/10+1000
>     except:
>         pressure = 'NaN'
>     try:
>         wind = elements[elements.index(temp)-1]
>         direction = float(wind[1:3])*10
>         speed = float(wind[3:])*0.51444
>     except:
>         direction=speed='NaN'
>     newline = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+'-'+c+'002'+c+'-'+c+'-9
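
A minimal sketch of two of the smaller points in the reply above, assuming the station table is held in parallel lists as in the quoted script. `list.index()` raises `ValueError` when the item is absent, so that is the exception a narrow handler would name, and `split()` with no argument collapses runs of whitespace. The helper name `lookup_station` and the sample station values are hypothetical, and the code is written for Python 3 rather than the Python 2 of the thread.

```python
def lookup_station(station_id, ids, lats, lons):
    """Return (lat, lon, message_type) for station_id.

    list.index() raises ValueError for a missing item, so only that
    exception is caught; any other error still surfaces.
    """
    try:
        index = ids.index(station_id)
    except ValueError:
        return 'NaN', 'NaN', 'Bad_station_id'
    return lats[index], lons[index], 'ADPSFC'


# Hypothetical station table, for illustration only.
ids = ['03002', '03005']
lats = ['60.75', '60.13']
lons = ['-0.85', '-1.18']

# split() without an argument ignores repeated spaces.
elements = '03002  41460   10094'.split()
known = lookup_station(elements[0], ids, lats, lons)
unknown = lookup_station('99999', ids, lats, lons)
```

Keeping the `try` block down to the one call that can fail makes it harder for an unrelated mistake, such as a misspelled variable name, to be silently turned into 'NaN'.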
Re: processing input from multiple files
Thanks,

The issue with the times is now sorted, however I'm running into a problem towards the end of the script:

  File "sortoutsynop2.py", line 131, in <module>
    newline = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "002" +c+"-"+c+"-"+c+str(pressure)+c
TypeError: cannot concatenate 'str' and 'list' objects

I think I can see the issue here, but I'm not entirely sure how to get around it. Several of my variables change either from one file to the next or from each line. Time and pressure would be examples of both of these types. Yet others, such as message_type, are constant. As a result I have a mixture of both lists and strings. Should I then create a list of the constant values?

I'm a little confused, I'll send you the script that works for a single file and I'll see if I can come up with a more logical way around it.

#!/usr/bin/python

import sys
import os
import re

#foutname = 'test.txt'
#filelist = os.system('ls
fname = "datalist_201081813.txt"
foutname1 = 'prestest.txt'
foutname2 = 'temptest.txt'
foutname3 = 'tempdtest.txt'
foutname4 = 'wspeedtest.txt'
foutname5 = 'winddtest.txt'

time = fname.split('_')[1].split('.')[0]
year = time[:4]
month = time[4:6]
day = time[6:8]
hour = time[-2:]

newtime = year+month+day+'_'+hour+''
c = ','
file1 = open(fname,"r")
file2 = open("uk_stations.txt","r")

stations = file2.readlines()
ids=[]
names=[]
lats=[]
lons=[]
for item in stations:
    item_list = item.strip().split(',')
    ids.append(item_list[0])
    names.append(item_list[1])
    lats.append(item_list[2])
    lons.append(item_list[3])

st = file1.readlines()
print st
data=[item[:item.find(' 333 ')] for item in st]
#data=st[split:]
print data

pres_out = ''
temp_out = ''
dtemp_out = ''
dir_out = ''
speed_out = ''

for line in data:
    elements=line.split(' ')
    station_id = elements[0]
    try:
        index = ids.index(station_id)
        lat = lats[index]
        lon = lons[index]
        message_type = 'blah'
    except:
        print 'Station ID',station_id,'not in list!'
        lat = lon = 'NaN'
        message_type = 'Bad_station_id'
    try:
        temp = [item for item in elements if item.startswith('1')][0]
        temperature = float(temp[2:])/10
        sign = temp[1]
        if sign == 1:
            temperature=-temperature
    except:
        temperature='NaN'
    try:
        dtemp = [item for item in elements if item.startswith('2')][0]
        dtemperature = float(dtemp[2:])/10
        sign = dtemp[1]
        if sign == 1:
            dtemperature=-dtemperature
    except:
        detemperature='NaN'
    try:
        press = [item for item in elements[2:] if item.startswith('4')][0]
        if press[1]=='9':
            pressure = float(press[1:])/10
        else:
            pressure = float(press[1:])/10+1000
    except:
        pressure = 'NaN'
    try:
        wind = elements[elements.index(temp)-1]
        direction = float(wind[1:3])*10
        speed = float(wind[3:])*0.51444
    except:
        direction=speed='NaN'

    newline = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+'-'+c+'002'+c+'-'+c+'-'+c+str(pressure)+c
    print newline
    pres_out+=newline+'\n'

    newline2 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "011" +c+"-"+c+"-"+c+str(temperature)+c
    print newline2
    temp_out+=newline2+'\n'
    fout = open(foutname2,'w')
    fout.writelines(temp_out)
    fout.close()

    newline3 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "017" +c+"-"+c+"-"+c+str(dtemperature)+c
    print newline3
    dtemp_out+=newline3+'\n'
    fout = open(foutname3,'w')
    fout.writelines(dtemp_out)
    fout.close()

    newline4 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "031" +c+"-"+c+"-"+c+str(direction)+c
    print newline4
    dir_out+=newline4+'\n'
    fout = open(foutname4,'w')
    fout.writelines(dir_out)
    fout.close()

    newline5 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "032"+c+"-"+c+"-"+c+str(speed)+c
    print newline5
    speed_out+=newline5+'\n'

fout = open(foutname1,'w')
fout.writelines(pres_out)
fout.close()
fout = open(foutname2,'w')
fout.writelines(temp_out)
fout.close()
fout = open(foutname3,'w')
fout.writelines(dtemp_out)
fout.close()
fout = open(foutname4,'w')
fout.writelines(dir_out)
fout.close()
fout = open(foutname5,'w')
fout.writelines(speed_out)
fout.close()

cheers

Chris
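
One likely reason the sign handling in the script above never takes effect: `temp[1]` is a one-character string, so `sign == 1` compares a string with an integer and is always false. (The script also assigns `detemperature` in one handler where `dtemperature` is evidently meant.) A hedged sketch of decoding one group, assuming, as the script does, that the second character is a sign flag and the remaining digits are tenths of a degree. `decode_signed_tenths` is a hypothetical helper name, and the code targets Python 3.

```python
def decode_signed_tenths(group):
    """Decode a group such as '10094' or '11021' into degrees.

    Assumed layout, taken from the script in this thread:
    group[0] is the group identifier, group[1] is a sign flag
    ('1' means negative), group[2:] is the value in tenths.
    Returns None when the group cannot be decoded.
    """
    try:
        value = float(group[2:]) / 10
    except ValueError:
        # Non-numeric payload, e.g. '1////' or an empty slice.
        return None
    if group[1:2] == '1':   # compare with the string '1', not the int 1
        value = -value
    return value
```

Returning `None` for an undecodable group leaves it to the caller to decide how a missing value is written out, for example as the string 'NaN' used in the script.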
Re: processing input from multiple files
On 10/14/2010 10:44 AM, Christopher Steele wrote:
> The issue is that I need to be able both to split the names of the files, so that I can extract the relevant times, and to open each individual file and process each line individually. Once I have achieved this I need to append the sorted files onto one another in one long file so that I can pass them into a verification package.
>
> I've tried changing the name to textline and I get the same result

I'm very happy to hear that changing the name of a variable did not affect the way the program works! Anything else would be worrisome.

> - the sorted files overwrite one another.

Variable *time* names a list, with one member for each input file. But variable *newtime* names a scalar value, not a list. That looks like a problem to me. Either of the following changes might help:

Original:

    for x in time:
        hour= x[:2]
        print hour
        newtime = year+month+day+'_'+hour+'00'

Alternative #1:

    newtime = []
    for x in time:
        hour= x[:2]
        print hour
        newtime.append(year+month+day+'_'+hour+'00')

Alternative #2:

    newtime = [year + month + day + '_' + x[:2] + '00' for x in time]

HTH,
John
-- http://mail.python.org/mailman/listinfo/python-list
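
Once `newtime` is a list, each entry still has to be matched with the file it belongs to. One way to sketch this, assuming the file names follow the `siuk21d_0300P.DATA` pattern from the original post and the date parts are fixed strings as in that script, is to build the stamp from each file name and pair the two with `zip()`. `stamp_for` is a hypothetical helper, and the code is written for Python 3.

```python
year, month, day = '2009', '09', '18'   # fixed date parts, as in the original post
obs = ['siuk21d_0300P.DATA', 'siuk21d_0900P.DATA']


def stamp_for(filename):
    """Build 'YYYYMMDD_HH00' from a name like 'siuk21d_0300P.DATA'."""
    hour = filename.split('_')[1][:2]
    return year + month + day + '_' + hour + '00'


newtime = [stamp_for(name) for name in obs]

# zip() keeps each file name next to its own time stamp.
pairs = list(zip(obs, newtime))
```

A loop such as `for name, stamp in pairs:` then always has a plain string in `stamp`, which avoids concatenating a list into an output line.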
Re: processing input from multiple files
On 10/14/2010 6:08 AM, Christopher Steele wrote:
> Hi
>
> I've been trying to decode a series of observations from multiple files (each file is a different time) and put each type of observation into its own separate file. The script runs successfully for one file but whenever I try it for more they just overwrite each other.

fileinput.input() iterates over *lines*, not entire *files*. So take a look at this location in the code:

    for file in fileinput.input(obs):
        data=file[:file.find(' 333 ')]

Did you mean your iteration variable to be "file", implying that it will hold an entire file of input data?

If you meant the iteration variable to be named "textline" instead of "file", is it guaranteed that string ' 333 ' will occur in every such text line?

-John
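
Because `fileinput.input()` yields lines, the loop body needs another way to learn which file a line came from. The standard `fileinput.filename()` function reports the file currently being read. A small sketch, written for Python 3; the temporary files and their contents are invented for the demonstration.

```python
import fileinput
import os
import tempfile

# Create two throwaway input files (contents are invented).
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [('a_0300P.DATA', '03002 41460\n03005 41560\n'),
                   ('a_0900P.DATA', '03002 41470\n')]:
    path = os.path.join(tmpdir, name)
    with open(path, 'w') as handle:
        handle.write(text)
    paths.append(path)

# Group the lines by the file they were read from.
lines_by_file = {}
for textline in fileinput.input(paths):
    source = os.path.basename(fileinput.filename())
    lines_by_file.setdefault(source, []).append(textline.strip())
fileinput.close()
```

With the source file name available on every iteration, the time stamp can be derived per line instead of being fixed before the loop starts.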
Re: processing input from multiple files
The issue is that I need to be able both to split the names of the files, so that I can extract the relevant times, and to open each individual file and process each line individually. Once I have achieved this I need to append the sorted files onto one another in one long file so that I can pass them into a verification package.

I've tried changing the name to textline and I get the same result - the sorted files overwrite one another.

The data are actually meteorological observations and I need to manipulate them in order to test the performance of a model. The 333 denotes that cloud observations are going to follow - something that is not always reported at stations.

I hope this has helped

Chris
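
Since the ' 333 ' marker is optional, the text before it can be taken with `str.partition()`, which returns the whole string unchanged when the separator is absent, so no character is lost on lines without cloud data. A short Python 3 sketch with invented sample lines; `before_cloud_section` is a hypothetical name.

```python
def before_cloud_section(textline):
    """Return the part of a report that precedes the ' 333 ' marker.

    str.partition() returns (head, sep, tail); when the separator is
    missing, head is the entire string, unlike slicing with find(),
    which would use -1 and drop the last character.
    """
    head, _sep, _tail = textline.strip().partition(' 333 ')
    return head


with_marker = before_cloud_section('03002 41460 10094 333 81/20\n')
without_marker = before_cloud_section('03005 41560 10101\n')
```

Stripping the line first also removes the trailing newline, which would otherwise end up inside the last element after splitting.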
processing input from multiple files
Hi

I've been trying to decode a series of observations from multiple files (each file is a different time) and put each type of observation into its own separate file. The script runs successfully for one file but whenever I try it for more they just overwrite each other. I'm new to Python and I'm not sure how to go about efficiently running through the process once and then appending to the output file for all other input files. Has anyone done something similar to this before? If it helps, I'll also attach a sample of one of the input files

#!/usr/bin/python

import sys
import os
import re
import fileinput

#load in file list
#obs = os.system('ls s[i,m,n]uk[0,2,4][1,2,3]d_??00P.DATA')
obs = ['siuk21d_0300P.DATA', 'siuk21d_0900P.DATA']
print obs

#code for file type "datalist"
#fname = "datalist_201081813.txt"

#output files
foutname1 = 'prestest.txt'
foutname2 = 'temptest.txt'
foutname3 = 'tempdtest.txt'
foutname4 = 'wspeedtest.txt'
foutname5 = 'winddtest.txt'

#prepare times
time=[]
year="2009"
month="09"
day="18"
hour=[]

#outputs
pres_out = ''
temp_out = ''
dtemp_out = ''
dir_out = ''
speed_out = ''
x =''

#load in station file with lat/lons
file2 = open("uk_stations.txt","r")
stations = file2.readlines()
ids=[]
names=[]
lats=[]
lons=[]
for item in stations:
    item_list = item.strip().split(',')
    ids.append(item_list[0])
    names.append(item_list[1])
    lats.append(item_list[2])
    lons.append(item_list[3])

#create loop over file list
time= [item.split('_')[1].split('.')[0] for item in obs]
print time
for x in time:
    hour= x[:2]
    print hour
    newtime = year+month+day+'_'+hour+'00'
    print newtime

for file in fileinput.input(obs):
    data=file[:file.find(' 333 ')]
    #data=st[split:]
    print data
    elements=data.split(' ')
    print elements
    station_id = elements[0]
    try:
        index = ids.index(station_id)
        lat = lats[index]
        lon = lons[index]
        message_type = 'ADPSFC'
    except:
        print 'Station ID',station_id,'not in list!'
        lat = lon = 'NaN'
        message_type = 'Bad_station_id'
    try:
        temp = [item for item in elements if item.startswith('1')][0]
        temperature = float(temp[2:])/10
        sign = temp[1]
        if sign == 1:
            temperature=-temperature
    except:
        temperature='NaN'
    try:
        dtemp = [item for item in elements if item.startswith('2')][0]
        dtemperature = float(dtemp[2:])/10
        sign = dtemp[1]
        if sign == 1:
            dtemperature=-dtemperature
    except:
        detemperature='NaN'
    try:
        press = [item for item in elements[2:] if item.startswith('4')][0]
        if press[1]=='9':
            pressure = float(press[1:])/10
        else:
            pressure = float(press[1:])/10+1000
    except:
        pressure = 'NaN'
    try:
        wind = elements[elements.index(temp)-1]
        direction = float(wind[1:3])*10
        speed = float(wind[3:])*0.51444
    except:
        direction=speed='NaN'

    newline = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "002" +c+"-"+c+"-"+c+str(pressure)+c
    pres_out+=newline+'\n'

    newline2 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "011" +c+"-"+c+"-"+c+str(temperature)+c
    print newline2
    temp_out+=newline2+'\n'
    fout = open(foutname2,'w')
    fout.writelines(temp_out)
    fout.close()

    newline3 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "017" +c+"-"+c+"-"+c+str(dtemperature)+c
    print newline3
    dtemp_out+=newline3+'\n'
    fout = open(foutname3,'w')
    fout.writelines(dtemp_out)
    fout.close()

    newline4 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "031" +c+"-"+c+"-"+c+str(direction)+c
    print newline4
    dir_out+=newline4+'\n'
    fout = open(foutname4,'w')
    fout.writelines(dir_out)
    fout.close()

    newline5 = message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-"+c+ "032"+c+"-"+c+"-"+c+str(speed)+c
    print newline5
    speed_out+=newline5+'\n'

fout = open(foutname1,'w')
fout.writelines(pres_out)
fout.close()
fout = open(foutname2,'w')
fout.writelines(temp_out)
fout.close()
fout = open(foutname3,'w')
fout.writelines(dtemp_out)
fout.close()
fout = open(foutname4,'w')
fout.writelines(dir_out)
fout.close()
fout = open(foutname5,'w')
fout.writelines(speed_out)
fout.close()

cheers

Chris

siuk21d_0300P.DATA
Description: Binary data
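
One way to avoid the overwriting described in this post, sketched under assumptions: wrap the per-file work in a function that returns its output lines, collect the lines from every file in one list, and open each output file only once after the loop (mode 'w' truncates the file every time it is opened). `process_file`, `process_all` and `decode_line` are hypothetical names, the decoding is reduced to a placeholder, and the input files are invented. The code is written for Python 3.

```python
import os
import tempfile


def decode_line(textline, stamp):
    """Placeholder for the real decoding: station id plus time stamp."""
    station_id = textline.split()[0]
    return station_id + ',' + stamp


def process_file(path, stamp):
    """Return one output line per non-empty input line of one file."""
    with open(path) as handle:
        return [decode_line(line, stamp) for line in handle if line.strip()]


def process_all(paths_and_stamps, out_path):
    """Process every input file, then write the combined output once."""
    out_lines = []
    for path, stamp in paths_and_stamps:
        out_lines.extend(process_file(path, stamp))
    with open(out_path, 'w') as fout:
        fout.write('\n'.join(out_lines) + '\n')
    return out_lines


# Demonstration with two invented input files.
tmpdir = tempfile.mkdtemp()
inputs = []
for name, stamp, text in [('x_0300P.DATA', '20090918_0300', '03002 41460\n'),
                          ('x_0900P.DATA', '20090918_0900', '03002 41470\n')]:
    path = os.path.join(tmpdir, name)
    with open(path, 'w') as handle:
        handle.write(text)
    inputs.append((path, stamp))

out_path = os.path.join(tmpdir, 'prestest.txt')
result = process_all(inputs, out_path)
```

Opening the output with mode 'a' inside the loop would also append rather than overwrite, but it keeps adding to the file on every rerun, so collecting first and writing once is usually easier to reason about.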