Hi,

I am writing code to sort the columns of a matrix by each column's sum. The
dataset is huge (50k rows x 300k cols), so I need to read it line by line
and accumulate the sums to avoid running out of memory. But it runs very
slowly, and I don't know why; part of the code is below. I suspect the
array indexing, but I'm not sure. Can anyone point out what needs to be
modified to make it run fast? Thanks in advance!

...
from numpy import *
...

        currSum = zeros(self.componentcount)
        currRow = zeros(self.componentcount)
        for featureDict in self.featureDictList:
            currRow[:] = 0
            for components in self.componentdict1:
                # has_key() is Python 2 only; use the "in" operator
                if components in featureDict:
                    col = self.componentdict1[components]
                    currRow[col] = featureDict[components]
            currSum += currRow  # was "currSum = currSum + row"; "row" is undefined
...
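
The main cost in the loop above is touching a dense 300k-element scratch row
for every input record: `currRow[:] = 0` and `currSum += currRow` each sweep
the full width even when a record has only a handful of features, and the
inner loop iterates over every key in `componentdict1` rather than the
(presumably much smaller) feature dict. A sketch of the sparse alternative,
with hypothetical standalone names (`feature_dicts`, `col_index`, `n_cols`)
standing in for the instance attributes in the snippet:

```python
import numpy as np

def column_sums(feature_dicts, col_index, n_cols):
    """Accumulate per-column sums without a dense scratch row.

    feature_dicts: iterable of {component: value} dicts, one per row.
    col_index:     {component: column number} mapping.
    n_cols:        total number of columns.
    """
    sums = np.zeros(n_cols)
    for fd in feature_dicts:
        # Iterate over the (small) feature dict, not the full component
        # dictionary, and add straight into the running totals.
        for comp, value in fd.items():
            col = col_index.get(comp)
            if col is not None:
                sums[col] += value
    return sums
```

The sort order then comes from `np.argsort(column_sums(...))`. This does
O(features-per-row) work per record instead of O(total-columns), which is
where the speedup should come from under these assumptions.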

-- 
http://mail.python.org/mailman/listinfo/python-list