aggregating a large sorted text file

adtvff Wed, 21 Mar 2007 04:21:03 -0800

Hi,

Given a large ascii file (delimited or fixed width) with one ID field
and dimensions/measures fields, sorted by dimensions, I'd like to
"flatten" or "rollup" the file by creating new columns: one for each
combination of dimension level, and summing up measures over all
records for a given ID.


If the wheel has already been invented, great, please point me in the
right direction. If not, please share some pointers on how to think
about this problem in order to write efficient code.

Is a hash with dimension level combinations a good approach, with
values reset at each new ID level?

I know mysql, Oracle etc will do this , but they all have a cap on #
of columns allowed. SAS will allow unlimited columns, but I don't own
SAS.

Thanks.


ID,color,shape,msr1
------------------------------
001, blue, square,  4
001, red , circle,    5
001, red,  circle,    6


ID, blue_circle, blue_square, red_circle, red_square
--------------------------------------------------------------------------
001,0,4,11,0
002 ...

-- 
http://mail.python.org/mailman/listinfo/python-list

flattening/rolling up/aggregating a large sorted text file

Reply via email to