Hi, I am writing code to sort the columns of a dataset according to the sum of each column. The dataset is huge (50k rows x 300k cols), so I need to read it line by line and accumulate the sums to avoid running out of memory. But it runs very slowly, and I don't know why; part of the code is below. I suspect the array indexing, but I'm not sure. Can anyone point out what needs to be modified to make it run faster? Thanks in advance!
...
from numpy import *
...
currSum = zeros(self.componentcount)
currRow = zeros(self.componentcount)
for featureDict in self.featureDictList:
    currRow[:] = 0
    for components in self.componentdict1:
        if components in featureDict:    # has_key() is deprecated; 'in' does the same
            col = self.componentdict1[components]
            value = featureDict[components]
            currRow[col] = value
    currSum = currSum + currRow    # original read "row", presumably a typo for currRow
...
--
http://mail.python.org/mailman/listinfo/python-list
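One likely culprit is the inner loop: it walks all ~300k entries of componentdict1 for every one of the 50k rows, and it also rebuilds and re-adds a full 300k-element currRow per row. A minimal sketch of a faster approach, assuming each row's featureDict is sparse (far fewer keys than columns), iterates only over the row's own keys and accumulates straight into the sum vector. The names column_sums and component_index here are hypothetical stand-ins for the poster's structures:

```python
import numpy as np

def column_sums(feature_dicts, component_index, n_components):
    """Sum each column over a stream of sparse rows.

    feature_dicts   : iterable of {feature name: value} dicts, one per row
    component_index : dict mapping feature name -> column index
                      (plays the role of componentdict1 in the post)
    """
    sums = np.zeros(n_components)
    for feature_dict in feature_dicts:          # one row at a time, O(row size)
        for name, value in feature_dict.items():  # only the keys this row has
            col = component_index.get(name)
            if col is not None:
                sums[col] += value              # accumulate in place, no temp row
    return sums

# small usage example with made-up data
rows = [{"a": 1.0, "b": 2.0}, {"b": 3.0}]
index = {"a": 0, "b": 1, "c": 2}
print(column_sums(rows, index, 3))
```

This turns the per-row cost from O(number of columns) into O(number of nonzero features in the row), which is the difference that matters at 300k columns.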