[EMAIL PROTECTED] wrote: > Hi, > > Given a large ascii file (delimited or fixed width) with one ID field > and dimensions/measures fields, sorted by dimensions, I'd like to > "flatten" or "rollup" the file by creating new columns: one for each > combination of dimension level, and summing up measures over all > records for a given ID. > > If the wheel has already been invented, great, please point me in the > right direction. If not, please share some pointers on how to think > about this problem in order to write efficient code. > > Is a hash with dimension level combinations a good approach, with > values reset at each new ID level? > > I know mysql, Oracle etc will do this , but they all have a cap on # > of columns allowed. SAS will allow unlimited columns, but I don't own > SAS. > > Thanks. > > > ID,color,shape,msr1 > ------------------------------ > 001, blue, square, 4 > 001, red , circle, 5 > 001, red, circle, 6 > > > ID, blue_circle, blue_square, red_circle, red_square > -------------------------------------------------------------------------- > 001,0,4,11,0 > 002 ... > It seems a bit wrong-headed to force this problem to fit a solution where you define relations with a variable number of columns when the natural way to solve it would seem to be to sum the msr1 values for each unique combination of ID, color and shape. That's a pretty straightforward relational problem.
So, is there some reason the result *has* to have that variable number of columns? regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://del.icio.us/steve.holden Recent Ramblings http://holdenweb.blogspot.com -- http://mail.python.org/mailman/listinfo/python-list