Hi, I'm reporting a performance regression (16% in message rate, 3% in latency) when using PSM that occurred between OMPI v1.6.5 and v1.8.1. I would guess it affects other networks too, but I haven't tested. The problem stems from the --enable-smp-locks and --enable-opal-multi-threads options.
--enable-smp-locks defaults to enabled and, on x86, causes a 'lock' prefix to be prepended to the ASM instructions used by the atomic primitives. Disabling it removes the 'lock' prefix. In OMPI 1.6.5, --enable-opal-multi-threads defaulted to disabled. When enabled, OPAL would be compiled with multithreading support, which included compiling in calls to the atomic primitives. Those atomic primitives, in turn, potentially use a lock prefix (controlled by --enable-smp-locks).

SVN r29891 on the trunk changed the above: --enable-opal-multi-threads was removed, and the CPP macros (#if OPAL_ENABLE_MULTI_THREADS) controlling various calls to the atomic primitives were removed, effectively changing the default behavior to multithreading ON for OPAL. This change was then carried to the v1.7 branch in r29944 (Fixes #3983).

We can use --disable-smp-locks to make the performance regression go away for the builds we ship, but we'd very much prefer if performance was good 'out of the box' for people who grab an OMPI tarball and use it with PSM. My question is, what's the best way to do that? It seems obvious to just make --disable-smp-locks the default, but I presume the change was made on purpose, so I'm looking for community feedback.

Thanks,
Andrew