On Thu, 28 Apr 2005 23:34:31 GMT, Peter A. Schott <[EMAIL PROTECTED]> wrote:
>I've got a file that seems to come across more like a dictionary from what I >can >tell. Something like the following format: > >###,1,val_1,2,val_2,3,val_3,5,val_5,10,val_10 >###,1,val_1,2,val_2,3,val_3,5,val_5,11,val_11,25,val_25,967,val_967 > >In other words, different layouts (defined mostly by what is in val_1, val_2, >val_3). > >The ,#, fields indicate what "field" from our mainframe the corresponding value >represents. > >Is there a good way to parse this into a DB-type format where I only pull out >the values corresponding to the appropriate field numbers? Seems like >converting to a dictionary of some sort would be best, but I don't quite know >how I would go about that. > >In this case, the first field is a value unto itself - represents a "letter >type" that would be associated with the rest of the record. The fields are >either present or not, no placeholder if there's no value for e.g. Field #4. Here's a sketch, tested as you'll see, but totally devoid of the error-checking that would be necessary for any data coming from an MF. C:\junk>type schott.txt pers,1,xxx,2,yyy,3,zzz,100,SMITH,101,JOHN,102,ALOYSIUS,103,1969-12-31 addr,1,qqq,2,www,3,eee,200,"""THE LODGE"", 123 MAIN ST",205,WALLA WALLA,206,WA C:\junk>type schott.py import csv for row in csv.reader(open('schott.txt', 'rb')): rectype = row[0] recdict = {} for k in range(1, len(row), 2): recdict[int(row[k])] = row[k+1] print rectype, recdict C:\junk>python schott.py pers {1: 'xxx', 2: 'yyy', 3: 'zzz', 100: 'SMITH', 101: 'JOHN', 102: 'ALOYSIUS', 103: '1969-12-31'} addr {1: 'qqq', 2: 'www', 3: 'eee', 200: '"THE LODGE", 123 MAIN ST', 205: 'WALLA WALLA', 206: 'WA'} Hint: you'll probably go nuts if you don't implement some sort of naming convention instead of those numbers. One way would be like this: mf_surname = 100 mf_given_1 = 101 ... mf_state = 206 then you can refer to recdict[mf_state] instead of recdict[206]. Going upmarket a bit: Have a mf_map = {100: 'surname', 206: 'state', } # etc etc then you do class Record(object): pass # for each row: rec = Record() rec.rectype = row[0] for k in range(1, len(row), 2): setattr(rec, mf_map[int(row[k])], row[k+1]) Then you can refer to rec.state instead of recdict[mf_state] or recdict[206]. Further upmarket would involve gathering basic "type" information about the MF fields (free text, alpha code, identifier (e.g. SSN), money, quantity, date, etc etc) so that you can do validations and format conversions as appropriate. HTH, John -- http://mail.python.org/mailman/listinfo/python-list