Travis Oliphant wrote:

>
>I suspect I know why, although the difference seems rather large.  
>
[snip]

>I'm surprised the overhead of adjusting pointers is so high, but then 
>again you are probably getting a lot of cache misses in the first case, 
>so there is more to it than that; the loops may run more slowly too.
>  
>

I'm personally bothered that this example runs so much more slowly; I 
don't think it should.  Perhaps it is unavoidable because of the 
memory-layout issues, but it is hard to believe that the overhead of 
calling into the loop and adjusting the pointers alone accounts for such 
a large difference.

But, that isn't the problem, here.  Notice the following:

import numpy as N   # the "N" alias used below

x3 = N.random.rand(39,2000)
x4 = N.random.rand(39,64,1)

%timeit z3 = x3[:,None,:] - x4

10 loops, best of 3: 76.4 ms per loop
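
One way to see where those cache misses come from is to look at the strides
each operand ends up with once it is broadcast to the output shape.  A minimal
sketch (assuming a NumPy recent enough to have broadcast_to, and reusing the
x3/x4 names from above):

import numpy as N

x3 = N.random.rand(39,2000)
x4 = N.random.rand(39,64,1)

# Shape of the broadcast result the ufunc machinery loops over: (39, 64, 2000)
bc = N.broadcast(x3[:,None,:], x4)
print(bc.shape)

# Effective strides of each operand over that shape.  A zero stride means the
# pointer never moves along that axis; a large stride means each step along
# that axis jumps far through memory and is likely to miss the cache.
for name, arr in (("x3[:,None,:]", x3[:,None,:]), ("x4", x4)):
    view = N.broadcast_to(arr, bc.shape)
    print(name, view.strides)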

Hmm... It looks like cache misses matter a lot more than making sure the 
inner loop runs over the dimension with the largest number of elements 
(which is how ufuncs currently decide which axis should be used as the 
1-d loop). 
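
A rough, machine-dependent way to check that it really is memory layout rather
than per-call overhead is to repeat the same subtraction with the same data but
with x3 stored in Fortran order, so that stepping along the long 2000-element
axis is no longer contiguous.  A sketch of that comparison (timings will vary):

import numpy as N
import timeit

x3 = N.random.rand(39,2000)
x4 = N.random.rand(39,64,1)

# Case 1: the long 2000-element axis of x3 is its last, contiguous axis.
t_contig = timeit.timeit(lambda: x3[:,None,:] - x4, number=10)

# Case 2: same values, but stored in Fortran order; stepping along the
# 2000-element axis now strides across memory, so the same arithmetic
# should incur many more cache misses.
x3_f = N.asfortranarray(x3)
t_strided = timeit.timeit(lambda: x3_f[:,None,:] - x4, number=10)

print(t_contig, t_strided)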

Perhaps those inner 1-d loops could be optimized (using prefetching or 
something similar) to reduce the number of cache misses in the inner 
computation, and the strategy of looping over the largest dimension 
(instead of the last dimension) should be reconsidered.

Ideas,

-Travis



