Re: [OMPI devel] poor btl sm latency

Matthias Jurenz Fri, 16 Mar 2012 04:57:21 -0400

"Unfortunately" also Platform MPI benefits from disabled ASLR:

shared L2/L1I caches (core 0 and 1) and
enabled ASLR:
$ mpirun -np 2 taskset -c 0,1 ./NPmpi_pcmpi -u 1 -n 1000000
Now starting the main loop
  0:       1 bytes 1000000 times -->     17.07 Mbps in       0.45 usec


disabled ASLR:
$ mpirun -np 2 taskset -c 0,1 ./NPmpi_pcmpi -u 1 -n 1000000
Now starting the main loop
  0:       1 bytes 1000000 times -->     55.03 Mbps in       0.14 usec

exclusive L2/L1I caches (core 0 and 2) and
enabled ASLR:
$ mpirun -np 2 taskset -c 0,2 ./NPmpi_pcmpi -u 1 -n 1000000
Now starting the main loop
  0:       1 bytes 1000000 times -->     30.45 Mbps in       0.27 usec

disabled ASLR:
$ mpirun -np 2 taskset -c 0,2 ./NPmpi_pcmpi -u 1 -n 1000000
Now starting the main loop
  0:       1 bytes 1000000 times -->     31.25 Mbps in       0.28 usec

As expected, disabling the ASLR makes no difference when binding for exclusive 
caches.

Matthias

On Thursday 15 March 2012 17:10:33 Jeffrey Squyres wrote:
> On Mar 15, 2012, at 8:06 AM, Matthias Jurenz wrote:
> > We made a big step forward today!
> > 
> > The used Kernel has a bug regarding to the shared L1 instruction cache in
> > AMD Bulldozer processors:
> > See
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit
> > diff;h=dfb09f9b7ab03fd367740e541a5caf830ed56726 and
> > http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
> > 
> > Until the Kernel is patched we disable the address-space layout
> > randomization
> > 
> > (ASLR) as described in the above PDF:
> >   $ sudo /sbin/sysctl -w kernel.randomize_va_space=0
> > 
> > Therewith, NetPIPE results in ~0.5us latency when binding the processes
> > for L2/L1I cache sharing (i.e. -bind-to-core).
> 
> This is good!  I love it when the bug is not our fault.  :-)
> 
> > However, when binding the processes for exclusive L2/L1I caches (i.e.
> > -cpus- per-proc 2) we still get ~1.1us latency. I don't think that the
> > upcoming kernel patch will help for this kind of process binding...
> 
> Does this kind of thing happen with Platform MPI, too?  I.e., is this
> another kernel issue, or an OMPI-specific issue?

Re: [OMPI devel] poor btl sm latency

Reply via email to