On Oct 31, 12:48 pm, Tim Chase <python.l...@tim.thechases.com> wrote:
> > PRJ01001 4 00100END
> > PRJ01002 3 00110END
>
> > I would like to pick only some columns to a new file and put them to a
> > certain places (to match previous data) - definition file (def.csv)
> > could be something like this:
>
> > VARIABLE FIELDSTARTS FIELD SIZE NEW PLACE IN NEW DATA FILE
> > ProjID ; 1 ; 5 ; 1
> > CaseID ; 6 ; 3 ; 10
> > UselessV ; 10 ; 1 ;
> > Zipcode ; 12 ; 5 ; 15
>
> > So the new datafile should look like this:
>
> > PRJ01 001 00100END
> > PRJ01 002 00110END
>
> How flexible is the def.csv format? The difficulty I see with
> your def.csv format is that it leaves undefined gaps (presumably
> to be filled in with spaces) and that you also have a blank "new
> place in new file" value. If instead, you could specify the
> width to which you want to pad it and omit variables you don't
> want in the output, ordering the variables in the same order you
> want them in the output:
>
>    Variable; Start; Size; Width
>    ProjID; 1; 5; 10
>    CaseID; 6; 3; 10
>    Zipcode; 12; 5; 5
>    End; 16; 3; 3
>
> (note that I lazily use the same method to copy the END from the
> source to the destination, rather than coding specially for it)
> you could do something like this (untested)
>
>    import csv
>    f = file('def.csv', 'rb')
>    f.next() # discard the header row
>    r = csv.reader(f, delimiter=';')
>    fields = [
>      (varname, slice(int(start), int(start)+int(size)), width)
>      for varname, start, size, width
>      in r
>      ]
>    f.close()
>    out = file('out.txt', 'w')
>    try:
>      for row in file('data.txt'):
>        for varname, slc, width in fields:
>          out.write(row[slc].ljust(width))
>        out.write('\n')
>    finally:
>      out.close()
>
> Hope that's fairly easy to follow and makes sense. There might
> be some fence-posting errors (particularly your use of "1" as the
> initial offset, while python uses "0" as the initial offset for
> strings)
>
> If you can't modify the def.csv format, then things are a bit
> more complex and I'd almost be tempted to write a script to try
> and convert your existing def.csv format into something simpler
> to process like what I describe.
>
> -tkc
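A minimal Python 3 sketch of the width-based approach in the quoted message, assuming the proposed "Variable; Start; Size; Width" layout. It converts the 1-based Start column to a 0-based string offset (the fence-posting issue mentioned above), converts Width to an int before calling `str.ljust`, and strips the newline from each data row. The End row's start is set to 17 here, an adjustment so that it lines up with where END sits in the sample rows as quoted; `load_fields` and `reformat_line` are hypothetical names.

```python
import csv
import io

# Proposed layout; Start is 1-based, as in the OP's file. The End row
# starts at column 17 (an adjustment) to line up with "END" in the samples.
DEFS = """Variable; Start; Size; Width
ProjID; 1; 5; 10
CaseID; 6; 3; 10
Zipcode; 12; 5; 5
End; 17; 3; 3
"""


def load_fields(def_file):
    """Read (name, slice, width) tuples from an open definition file."""
    reader = csv.reader(def_file, delimiter=';', skipinitialspace=True)
    next(reader)  # discard the header row
    fields = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        name, start, size, width = (cell.strip() for cell in row)
        begin = int(start) - 1  # 1-based column -> 0-based string offset
        fields.append((name, slice(begin, begin + int(size)), int(width)))
    return fields


def reformat_line(line, fields):
    """Cut each field out of a fixed-width line and pad it to its width."""
    line = line.rstrip('\n')
    return ''.join(line[slc].ljust(width) for _name, slc, width in fields)


fields = load_fields(io.StringIO(DEFS))
for row in ["PRJ01001 4 00100END\n", "PRJ01002 3 00110END\n"]:
    print(reformat_line(row, fields))
```

For a real file, `load_fields(open('def.csv', newline=''))` can take the place of the `io.StringIO` wrapper; `newline=''` is what the csv module documentation recommends for files handed to `csv.reader`.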
To your point about the non-standard CSV format of the def.csv file, you
could use a regular expression instead of the csv module to handle it:

    import re

    # Split on ';' together with any whitespace around it.
    parse_columns = re.compile(r'\s*;\s*')

    with open('def.csv') as f:
        f.readline()  # discard the header row
        r = (parse_columns.split(line.strip()) for line in f if line.strip())
        fields = [
            # A blank width (e.g. UselessV) becomes 0.
            (varname, slice(int(start), int(start) + int(size)),
             int(width) if width else 0)
            for varname, start, size, width in r
        ]

which, given the OP's def.csv, produces for fields:

    [('ProjID', slice(1, 6, None), 1), ('CaseID', slice(6, 9, None), 10),
     ('UselessV', slice(10, 11, None), 0), ('Zipcode', slice(12, 17, None), 15)]

and that should work with the remainder of your original code, although
perhaps the OP wants something else to happen when the width is omitted
from the CSV...

Cheers - Chas

--
http://mail.python.org/mailman/listinfo/python-list
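The thread leaves open what to do with the OP's original layout, where the last column is a position rather than a width. One possible reading, not taken from either message, treats that column as the 1-based column where each field should start in the new file and drops rows where it is blank. This minimal Python 3 sketch reuses the same `\s*;\s*` split; the End row and the names `load_placements` and `place_fields` are hypothetical.

```python
import re

# Same separator pattern as in the reply: ';' plus surrounding whitespace.
parse_columns = re.compile(r'\s*;\s*')

# The OP's definition file, plus a hypothetical End row (not in the OP's
# file) so that the trailing END is copied as well.
OP_DEFS = """VARIABLE FIELDSTARTS FIELD SIZE NEW PLACE IN NEW DATA FILE
ProjID ; 1 ; 5 ; 1
CaseID ; 6 ; 3 ; 10
UselessV ; 10 ; 1 ;
Zipcode ; 12 ; 5 ; 15
End ; 17 ; 3 ; 20
"""


def load_placements(def_text):
    """Return (name, source_slice, target_offset) tuples, all 0-based.

    Assumes the last column is the 1-based column where the field should
    start in the new file; rows with a blank last column are dropped.
    """
    placements = []
    for line in def_text.splitlines()[1:]:  # [1:] skips the header row
        if not line.strip():
            continue
        name, start, size, target = parse_columns.split(line.strip())
        if not target:
            continue  # e.g. UselessV: no new place, so it is not copied
        begin = int(start) - 1
        placements.append((name, slice(begin, begin + int(size)), int(target) - 1))
    return placements


def place_fields(line, placements):
    """Write each field at its target offset; gaps are filled with spaces."""
    buf = []
    for _name, src, target in placements:
        value = line[src]
        end = target + len(value)
        if len(buf) < end:
            buf.extend(' ' * (end - len(buf)))  # pad up to the field's end
        buf[target:end] = value  # a later field overwrites any overlap
    return ''.join(buf)


placements = load_placements(OP_DEFS)
for row in ["PRJ01001 4 00100END", "PRJ01002 3 00110END"]:
    print(place_fields(row, placements))
```

This keeps the OP's def.csv unchanged; the trade-off is that layout is decided purely by position, so overlapping target ranges are not reported, and the later row simply wins.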