Re: [OMPI devel] poor btl sm latency

Jeffrey Squyres Thu, 15 Mar 2012 12:10:37 -0400

On Mar 15, 2012, at 8:06 AM, Matthias Jurenz wrote:

> We made a big step forward today!
> 
> The kernel we are using has a bug related to the shared L1 instruction cache in
> AMD Bulldozer processors:
> See 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=dfb09f9b7ab03fd367740e541a5caf830ed56726
>  
> and
> http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
> 
> Until the kernel is patched, we are disabling address-space layout randomization
> (ASLR), as described in the PDF above:
> 
>   $ sudo /sbin/sysctl -w kernel.randomize_va_space=0
> 
> With ASLR disabled, NetPIPE reports ~0.5us latency when the processes are bound
> so that they share L2/L1I caches (i.e. -bind-to-core).


This is good!  I love it when the bug is not our fault.  :-)
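
For anyone reproducing this, here is a rough sketch (not from the thread) of one
way to check which ASLR mode a node is running with before and after the sysctl
change. The function names aslr_mode and aslr_describe are made up for
illustration. The procfs path and the meaning of the values 0/1/2 are standard
Linux behavior.

```shell
# Print the raw ASLR setting. The optional first argument overrides the
# procfs path (handy for testing); the default is the real kernel entry.
aslr_mode() {
    cat "${1:-/proc/sys/kernel/randomize_va_space}"
}

# Map the numeric setting to a short description.
aslr_describe() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "partial randomization" ;;   # stack, mmap base, vDSO
        2) echo "full randomization" ;;      # additionally the heap (brk)
        *) echo "unknown" ;;
    esac
}

# Report the current state, but only where the procfs entry exists.
if [ -r /proc/sys/kernel/randomize_va_space ]; then
    mode=$(aslr_mode)
    echo "ASLR: $(aslr_describe "$mode") ($mode)"
fi
```

Note that a value set with "sysctl -w" does not survive a reboot. Once a patched
kernel is installed, the setting can be put back by hand with the same command
and the previous value (2 on most distributions, as far as I know).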

> However, when binding the processes for exclusive L2/L1I caches (i.e.
> -cpus-per-proc 2), we still get ~1.1us latency. I don't think that the upcoming
> kernel patch will help with this kind of process binding...

Does this kind of thing happen with Platform MPI, too?  I.e., is this another 
kernel issue, or an OMPI-specific issue?

-- 
Jeff Squyres
[email protected]
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
