Sorry to be a bit slow getting to this -- but the increased user time
isn't too surprising to me. By far the slowest thing on a modern CPU
is memory access, and vectorized languages like PDL (and IDL and
Octave and ...) pessimize memory access, because each operation makes
its own pass over whole arrays. The non-optimized version may be
making a lot more system calls to allocate and shuffle memory, thereby
placing the relevant memory in the CPU's cache. The system time then
takes the hit for the RAM fetch, while the user process gets to take
advantage of the faster cache. The optimized version may be more
efficient in its use of system calls, thereby shifting the RAM fetches
into normal user time. That's not the only scenario that could cause
the effect, but it is a plausible one.

RAM latency is a big deal -- explicit convolveND sped up by about a
factor of 10 on my Pentium machine when I reversed the order of the
loops for kernel/image evaluation (the kernel is usually much smaller
than the image, so putting the kernel loop on the inside preserves
cache better).
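
The loop reordering described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: it is 2-D rather than N-D, it covers only the "valid" region, it leaves the kernel unflipped, and the function names are hypothetical rather than anything taken from convolveND itself. Both versions compute the same result; only the loop nesting differs.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch, not convolveND's code.
 * img is h x w, ker is kh x kw, out is (h-kh+1) x (w-kw+1), row-major.
 * The kernel is applied unflipped to keep the indexing simple. */

/* Kernel loops innermost: each output pixel is finished while the small
 * kernel and a small patch of the image are being reused. */
void convolve_kernel_inner(const double *img, size_t h, size_t w,
                           const double *ker, size_t kh, size_t kw,
                           double *out)
{
    assert(kh >= 1 && kw >= 1 && kh <= h && kw <= w);
    size_t oh = h - kh + 1, ow = w - kw + 1;
    for (size_t y = 0; y < oh; y++) {
        for (size_t x = 0; x < ow; x++) {
            double acc = 0.0;
            for (size_t ky = 0; ky < kh; ky++)
                for (size_t kx = 0; kx < kw; kx++)
                    acc += img[(y + ky) * w + (x + kx)] * ker[ky * kw + kx];
            out[y * ow + x] = acc;
        }
    }
}

/* Kernel loops outermost: the image and the output are both swept in
 * full once per kernel element. */
void convolve_kernel_outer(const double *img, size_t h, size_t w,
                           const double *ker, size_t kh, size_t kw,
                           double *out)
{
    assert(kh >= 1 && kw >= 1 && kh <= h && kw <= w);
    size_t oh = h - kh + 1, ow = w - kw + 1;
    for (size_t i = 0; i < oh * ow; i++)
        out[i] = 0.0;
    for (size_t ky = 0; ky < kh; ky++) {
        for (size_t kx = 0; kx < kw; kx++) {
            double kv = ker[ky * kw + kx];
            for (size_t y = 0; y < oh; y++)
                for (size_t x = 0; x < ow; x++)
                    out[y * ow + x] += img[(y + ky) * w + (x + kx)] * kv;
        }
    }
}
```

How much the ordering matters will depend on the image size, the kernel size, and the cache sizes of the machine, so any speedup should be measured rather than assumed.
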

Xavier, that effect is also probably why your C code is running so
much faster than the IDL and PDL (which, aside from IDL's sloooow
interpreter, should run at about the same speed -- both PDL and IDL
have reasonably tight generated code in the threadloops). You could
test that notion by running with a data set that is small enough to
fit entirely in your L2 cache. My belief is that, under those
conditions, the gap between C and PDL will shrink considerably in
proportion.
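
One way to picture the comparison is the C sketch below. It assumes a real-valued update z = z*z + c as a stand-in for the Mandelbrot step; the function names are hypothetical and are not taken from Xavier's attached code. The first function mimics the vectorized style, with one whole-array pass per operation and a temporary array. The second fuses the work into a single pass, as hand-written C typically would.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch only; names and the real-valued update are
 * assumptions made for this example. */

/* Vectorized style: each operation sweeps the whole array, so z and tmp
 * are each pulled through the memory hierarchy more than once. */
void step_vectorized(double *z, const double *c, double *tmp, size_t n)
{
    assert(z && c && tmp);
    for (size_t i = 0; i < n; i++)      /* pass 1: tmp = z * z */
        tmp[i] = z[i] * z[i];
    for (size_t i = 0; i < n; i++)      /* pass 2: z = tmp + c */
        z[i] = tmp[i] + c[i];
}

/* Fused style: one sweep, with each element loaded once and stored once. */
void step_fused(double *z, const double *c, size_t n)
{
    assert(z && c);
    for (size_t i = 0; i < n; i++)
        z[i] = z[i] * z[i] + c[i];
}
```

To try the experiment, one could time many repetitions of each function (for example with clock() from <time.h>), once with n small enough that z, c, and tmp all fit in the L2 cache and once with n far larger, and then compare the ratio of the two timings at each size.
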
Best,
Craig
On Mar 10, 2007, at 12:31 PM, Vanuxem Grégory wrote:
On Friday, 09 March 2007 at 07:41 +0000, Xavier Calbet wrote:
Hi Greg,
Thanks a lot for your answers. The key I was looking for was
mult and similar functions, which operate on piddles
without generating new ones. Your comments have been really
helpful on this.
I have run a version of mandel optimised along these lines.
I do get a performance increase and in fact the sys time
goes almost to zero (see figures below). Unfortunately
the user time goes up significantly.
Don't know, maybe a Perl issue?
Greg
secs              real     user      sys  total=user+sys
PDL            649.295  329.347  319.703         649.050
PDL optimised  485.754  484.821    0.664         485.485
Why is this happening? Am I doing something wrong here?
Is there a way to get the same user time as in the straight PDL
version (329 secs) and almost zero sys time?
Code attached.
Many thanks again,
Xavier
_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl