"Unfortunately" also Platform MPI benefits from disabled ASLR: shared L2/L1I caches (core 0 and 1) and enabled ASLR: $ mpirun -np 2 taskset -c 0,1 ./NPmpi_pcmpi -u 1 -n 1000000 Now starting the main loop 0: 1 bytes 1000000 times --> 17.07 Mbps in 0.45 usec
disabled ASLR: $ mpirun -np 2 taskset -c 0,1 ./NPmpi_pcmpi -u 1 -n 1000000 Now starting the main loop 0: 1 bytes 1000000 times --> 55.03 Mbps in 0.14 usec exclusive L2/L1I caches (core 0 and 2) and enabled ASLR: $ mpirun -np 2 taskset -c 0,2 ./NPmpi_pcmpi -u 1 -n 1000000 Now starting the main loop 0: 1 bytes 1000000 times --> 30.45 Mbps in 0.27 usec disabled ASLR: $ mpirun -np 2 taskset -c 0,2 ./NPmpi_pcmpi -u 1 -n 1000000 Now starting the main loop 0: 1 bytes 1000000 times --> 31.25 Mbps in 0.28 usec As expected, disabling the ASLR makes no difference when binding for exclusive caches. Matthias On Thursday 15 March 2012 17:10:33 Jeffrey Squyres wrote: > On Mar 15, 2012, at 8:06 AM, Matthias Jurenz wrote: > > We made a big step forward today! > > > > The used Kernel has a bug regarding to the shared L1 instruction cache in > > AMD Bulldozer processors: > > See > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit > > diff;h=dfb09f9b7ab03fd367740e541a5caf830ed56726 and > > http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf > > > > Until the Kernel is patched we disable the address-space layout > > randomization > > > > (ASLR) as described in the above PDF: > > $ sudo /sbin/sysctl -w kernel.randomize_va_space=0 > > > > Therewith, NetPIPE results in ~0.5us latency when binding the processes > > for L2/L1I cache sharing (i.e. -bind-to-core). > > This is good! I love it when the bug is not our fault. :-) > > > However, when binding the processes for exclusive L2/L1I caches (i.e. > > -cpus- per-proc 2) we still get ~1.1us latency. I don't think that the > > upcoming kernel patch will help for this kind of process binding... > > Does this kind of thing happen with Platform MPI, too? I.e., is this > another kernel issue, or an OMPI-specific issue?