[EMAIL PROTECTED] writes: >Hi,
>Given a large ascii file (delimited or fixed width) with one ID field >and dimensions/measures fields, sorted by dimensions, I'd like to >"flatten" or "rollup" the file by creating new columns: one for each >combination of dimension level, and summing up measures over all >records for a given ID. >If the wheel has already been invented, great, please point me in the >right direction. If not, please share some pointers on how to think >about this problem in order to write efficient code. >Is a hash with dimension level combinations a good approach, with >values reset at each new ID level? >I know mysql, Oracle etc will do this , but they all have a cap on # >of columns allowed. SAS will allow unlimited columns, but I don't own >SAS. >Thanks. >ID,color,shape,msr1 >------------------------------ >001, blue, square, 4 >001, red , circle, 5 >001, red, circle, 6 >ID, blue_circle, blue_square, red_circle, red_square >-------------------------------------------------------------------------- >001,0,4,11,0 >002 ... Something like: import sys from sets import Set ids = {} keys = Set() for line in sys.stdin: ID,COL,SHAPE,VAL = [s.strip() for s in line.split(',')] ids.setdefault(ID,{}) key = '%s_%s'%(COL,SHAPE) ids[ID].setdefault(key,0) ids[ID][key] += int(VAL) keys.add(key) print 'id',',',','.join([str(key) for key in keys]) for id,cols in ids.items(): print id,',', ', '.join([str(cols.get(k,0)) for k in keys]) Doesn't keep all possible keys just those that are actually used. Needs to sort() things here and there. Incidentally I don't think you could do it in SQL at all in this way but you could do it in a more vertical fashion (eg 001, red, circle, 11 001, blue, square, 4 002, red, rhombus, 99) etc. Eddie -- http://mail.python.org/mailman/listinfo/python-list