Looking at the code, the arrays you are multiplying seem fairly small (300, 200) and you have 50 of them, so it might be the case that there is not enough computational work to compensate for the cost of forking new processes and communicating the results. Have you tried larger arrays, and more of them?
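A quick way to check is to time the serial loop against a process pool on the same workload. The sketch below uses shapes guessed from the thread (50 pairs of roughly 300x200 matrices) and a hypothetical `multiply` helper; the pool size of 4 is arbitrary.

```python
# Rough benchmark sketch: serial np.dot vs. multiprocessing.Pool on
# 50 pairs of smallish matrices (shapes guessed from the thread).
import time
import numpy as np
from multiprocessing import Pool


def multiply(pair):
    a, b = pair
    return np.dot(a, b)


def run(n_pairs=50, workers=4):
    rng = np.random.default_rng(0)
    pairs = [(rng.standard_normal((300, 200)),
              rng.standard_normal((200, 300))) for _ in range(n_pairs)]

    t0 = time.perf_counter()
    serial = [multiply(p) for p in pairs]
    t_serial = time.perf_counter() - t0

    with Pool(workers) as pool:
        t0 = time.perf_counter()
        parallel = pool.map(multiply, pairs)
        t_pool = time.perf_counter() - t0

    print("serial: %.4fs   pool: %.4fs" % (t_serial, t_pool))
    return serial, parallel


if __name__ == "__main__":
    run()
```

At these sizes the pool often loses outright, because pickling the matrices to and from the workers costs more than the multiplications themselves; the gap should close as the arrays grow.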
If you are on an Intel machine and you have the MKL libraries around, I would strongly recommend that you use its matrix multiplication routine if possible: MKL will do the parallelization for you. Well, any good BLAS implementation would do the same, you don't really need MKL. ATLAS and ACML would work too; it is just that MKL has been set up for us and it works well. To give an idea of the amount of tuning and optimization these libraries have undergone, a numpy.sum can be slower than a multiplication with a vector of all ones. So in the interest of speed, the longer you stay in the BLAS context the better.

--srean

On Fri, Jun 10, 2011 at 10:01 AM, Brandt Belson <bbel...@princeton.edu> wrote:
> Unfortunately I can't flatten the arrays. I'm writing a library where the
> user supplies an inner product function for two generic objects, and almost
> always the inner product function does large array multiplications at some
> point. The library doesn't get to know about the underlying arrays.
> Thanks,
> Brandt

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion