Sorry to be a bit slow getting to this -- but the increased user time isn't too surprising to me. By far the slowest thing on a modern CPU is memory access, and vectorized languages like PDL (and IDL and Octave and ...) pessimize for memory access. The non-optimized Perl may be making a lot more system calls to allocate and shuffle memory, thereby placing the relevant memory in the CPU's cache. The system then takes the hit for the RAM fetch, while the user process gets to take advantage of the higher-speed cache. The optimized version may be more efficient in its use of system calls, thereby transferring the RAM fetches to normal user time. That's not the only scenario that could cause the effect, but it is a plausible one.
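
For anyone who wants to watch the user/sys split from inside a process, here is a minimal Python sketch (not PDL). The helper names cpu_times, fresh_buffers and reused_buffer are made up for illustration. Whether the two workloads actually differ in sys time depends on the allocator and the OS, so treat it as a measuring harness rather than a demonstration of the effect.

```python
import os

def cpu_times(func, *args):
    """Run func(*args); return (user seconds, sys seconds, result) for the call."""
    before = os.times()
    result = func(*args)
    after = os.times()
    return after.user - before.user, after.system - before.system, result

def fresh_buffers(n, iterations):
    """Allocate a new n-byte buffer on every pass (hypothetical workload)."""
    src = bytearray([7]) * n
    checksum = 0
    for _ in range(iterations):
        buf = bytearray(src)   # new allocation plus copy each time
        checksum += buf[-1]
    return checksum

def reused_buffer(n, iterations):
    """Overwrite one preallocated n-byte buffer on every pass."""
    src = bytearray([7]) * n
    buf = bytearray(n)
    checksum = 0
    for _ in range(iterations):
        buf[:] = src           # same-size slice assignment into the existing buffer
        checksum += buf[-1]
    return checksum

if __name__ == "__main__":
    # Buffer size and pass count are arbitrary assumptions; adjust for your machine.
    for work in (fresh_buffers, reused_buffer):
        user, system, _ = cpu_times(work, 1 << 24, 50)
        print(f"{work.__name__}: user={user:.3f}s sys={system:.3f}s")
```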

RAM latency is a big deal -- explicit convolveND sped up by about a factor of 10 on my Pentium machine when I reversed the order of the loops for kernel/image evaluation (the kernel is usually much smaller than the image, so putting the kernel loop on the inside preserves cache better).
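
To make the loop reordering concrete, here is a 1-D Python sketch of the two orderings. It is not the convolveND source: the function names are hypothetical, the kernel is not flipped, and only the fully overlapping ("valid") region is computed. In pure Python the interpreter overhead swamps any cache effect, so this only shows the structure of the loops; the speedup I saw was in compiled code.

```python
def convolve_kernel_outer(image, kernel):
    """Kernel loop outside: the whole image is swept once per kernel element."""
    n, k = len(image), len(kernel)
    out = [0.0] * (n - k + 1)
    for j in range(k):                  # kernel loop (outer)
        for i in range(n - k + 1):      # image loop (inner): re-reads the image k times
            out[i] += image[i + j] * kernel[j]
    return out

def convolve_kernel_inner(image, kernel):
    """Kernel loop inside: each output pixel is finished while its neighbourhood is hot."""
    n, k = len(image), len(kernel)
    out = [0.0] * (n - k + 1)
    for i in range(n - k + 1):          # image loop (outer): one pass over the image
        acc = 0.0
        for j in range(k):              # kernel loop (inner): small, stays in cache
            acc += image[i + j] * kernel[j]
        out[i] = acc
    return out

if __name__ == "__main__":
    image = [1.0, 2.0, 3.0, 4.0, 5.0]
    kernel = [1.0, 0.0, -1.0]
    # Both orderings compute the same sums; only the memory access pattern differs.
    print(convolve_kernel_outer(image, kernel) == convolve_kernel_inner(image, kernel))
```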

Xavier, that effect is also probably why your C code is running so much faster than the IDL and PDL (which, aside from IDL's sloooow interpreter, should run at about the same speed -- both PDL and IDL have reasonably tight generated code in the threadloops). You could test that notion by running with a data set that is small enough to fit entirely in your Level II cache. My belief is that, under those conditions, the difference between C and PDL will become proportionally much less.
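
The shape of that experiment, sketched in Python rather than PDL or C: time the same memory-bound operation over working sets of different sizes and compare the cost per byte. The function name seconds_per_byte and the buffer sizes are assumptions; pick sizes that straddle the cache sizes of your own machine. With PDL or C you would wrap the real kernel in the same kind of harness.

```python
import time

def seconds_per_byte(size_bytes, total_bytes=1 << 27):
    """Copy a size_bytes buffer repeatedly; return average seconds per byte copied."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    # Keep the total traffic roughly constant so small and large sizes are comparable.
    repeats = max(1, total_bytes // size_bytes)
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src            # C-level copy into a preallocated destination
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * size_bytes)

if __name__ == "__main__":
    # Hypothetical sizes: one that should fit in cache, two that probably will not.
    for size in (64 * 1024, 1 << 20, 1 << 25):
        print(f"{size:>10d} bytes: {seconds_per_byte(size) * 1e9:.3f} ns/byte")
```

If the per-byte cost rises once the buffers no longer fit in cache, that supports the memory-latency explanation.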

Best,
Craig


On Mar 10, 2007, at 12:31 PM, Vanuxem Grégory wrote:

On Friday 09 March 2007 at 07:41 +0000, Xavier Calbet wrote:
  Hi Greg,

 Thanks a lot for your answers. The key I was looking for was
the mult function and others like it, which operate on piddles
without generating new ones. Your comments have really
been helpful on this.

  I have run a version of mandel optimised along these lines.
I do get a performance increase and in fact the sys time
goes almost to zero (see figures below). Unfortunately
the user time goes up significantly.

Don't know, maybe a Perl issue?

Greg

                 real (s)   user (s)   sys (s)   total = user + sys (s)
PDL               649.295    329.347   319.703                  649.050
PDL optimised     485.754    484.821     0.664                  485.485

  Why is this happening? Am I doing something wrong here?
Is there a way to get the same user time as in the straight PDL
version (329 secs) and almost zero sys time?
Code attached.

  Many thanks again,

  Xavier


        

        
                


_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl


