On 2012-09-20 19:31, giuseppe.amatu...@gmail.com wrote:
Hi,
I have this script in Python that I need to apply to very large arrays (arrays
coming from satellite images).
The script works great, but I would like to speed up the process.
Most of the computation time is spent in the for loop.
Is there a way to improve that part?
Would it be better to use a dict instead of an np.ndarray for saving the results?
If so, how can I do the sum in a dict (like the corresponding
matrix[row_c,1] = matrix[row_c,1] + valuesRaster[row,col])?
And if a dict is the solution, is it faster?

Thanks
Giuseppe

import numpy as np
from time import time

# create the arrays

start = time()
# randint excludes the upper bound, so 101 and 11 keep the ranges 0..100 and 1..10.
valuesRaster = np.random.randint(0, 101, 100).reshape(10, 10)
valuesCategory = np.random.randint(1, 11, 100).reshape(10, 10)

elapsed = (time() - start)
print(elapsed , "create the data")

start = time()

categories = np.unique(valuesCategory)
matrix = np.c_[ categories , np.zeros(len(categories))]

elapsed = (time() - start)
print(elapsed, "create the matrix and append a column of zeros")

rows = 10
cols = 10

start = time()

for col in range(cols):
    for row in range(rows):
        # Linear search for the row of 'matrix' holding this pixel's category.
        for row_c in range(len(matrix)):
            if valuesCategory[row, col] == matrix[row_c, 0]:
                matrix[row_c, 1] += valuesRaster[row, col]
                break
elapsed = (time() - start)
print(elapsed , "loop in the  data ")

print (matrix)

If I understand the code correctly, 'matrix' contains the categories in
column 0 and the totals in column 1.

What you're doing is performing a linear search through the categories
for every pixel and then adding to the corresponding total.

Linear searches are slow because on average you have to search through
half of the list before you find a match, whereas a dict lookup takes
roughly constant time. Using a dict would be much faster (although you
should of course measure it!).
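
As a rough way to measure the difference, here is a minimal sketch that
times one lookup done both ways. The category list, the index_of dict and
the two helper functions are made up for illustration and are not part of
your script; the numbers you get will depend on your machine and on how
many categories you have.

```python
import timeit

# Hypothetical data: 1000 category labels and a dict mapping label -> position.
categories = list(range(1000))
index_of = {cat: i for i, cat in enumerate(categories)}

def linear_lookup(cat):
    # Scan the list until the category is found.
    for i, c in enumerate(categories):
        if c == cat:
            return i
    raise KeyError(cat)

def dict_lookup(cat):
    # Hash-based lookup.
    return index_of[cat]

# Both approaches must agree on the answer.
assert linear_lookup(750) == dict_lookup(750)

t_linear = timeit.timeit(lambda: linear_lookup(750), number=2000)
t_dict = timeit.timeit(lambda: dict_lookup(750), number=2000)
print(t_linear, "linear search")
print(t_dict, "dict lookup")
```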

Try something like this:

import numpy as np
from time import time

# Create the arrays.

start = time()

# randint excludes the upper bound, so 101 and 11 keep the ranges 0..100 and 1..10.
valuesRaster = np.random.randint(0, 101, 100).reshape(10, 10)
valuesCategory = np.random.randint(1, 11, 100).reshape(10, 10)

elapsed = time() - start
print(elapsed, "Create the data.")

start = time()

categories = np.unique(valuesCategory)
totals = dict.fromkeys(categories, 0)

elapsed = time() - start
print(elapsed, "Create the totals dict.")

rows = 10
cols = 10

start = time()

for col in range(cols):
    for row in range(rows):
        cat = valuesCategory[row, col]
        ras = valuesRaster[row, col]
        totals[cat] += ras

elapsed = time() - start
print(elapsed, "Loop in the data.")

print(totals)
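
For very large arrays the Python-level loop itself may still dominate, so
it could be worth trying to let NumPy do the summing. The following is only
a sketch of a different technique (np.unique plus np.bincount, not the dict
loop above), and it assumes the raster values are integers small enough that
the float sums bincount produces stay exact. The variable names are mine.

```python
import numpy as np

# Example data with the same shapes and ranges as in the script above.
rng = np.random.default_rng(0)
values_raster = rng.integers(0, 101, size=(10, 10))
values_category = rng.integers(1, 11, size=(10, 10))

# 'inverse' gives, for every pixel, the index of its category in 'categories'.
categories, inverse = np.unique(values_category, return_inverse=True)

# Sum the raster values per category index; bincount returns floats.
sums = np.bincount(inverse.ravel(), weights=values_raster.ravel())
sums = sums.astype(values_raster.dtype)

# Same result layout as the dict version: category -> total.
totals = dict(zip(categories.tolist(), sums.tolist()))
print(totals)
```

Again, measure it on your real data before deciding which version to keep.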
