Also, double check that you have an optimized build, not a debugging build.

SVN and HG checkouts default to debugging builds, which add a lot of latency.
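
One quick way to check (a sketch; the exact ompi_info output wording may vary by version):

$ ompi_info | grep -i debug

An optimized build should report internal debug support as "no".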


On Feb 13, 2012, at 10:22 AM, Ralph Castain wrote:

> A few thoughts:
> 
> 1. Bind to socket is broken in 1.5.4 - fixed in next release
> 
> 2. Add --report-bindings to the cmd line and see where it thinks the procs
> are bound (see the example after this list)
> 
> 3. Sounds like memory may not be local - might be worth checking mem binding.
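> 
> For example (a sketch, reusing the all2all command quoted below):
> 
> $ mpirun -np 2 --bind-to-core --report-bindings ./all2all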
> 
> Sent from my iPad
> 
> On Feb 13, 2012, at 7:07 AM, Matthias Jurenz <matthias.jur...@tu-dresden.de> 
> wrote:
> 
>> Hi Sylvain,
>> 
>> thanks for the quick response!
>> 
>> Here are some results with process binding enabled. I hope I used the
>> parameters correctly...
>> 
>> bind two ranks to one socket:
>> $ mpirun -np 2 --bind-to-core ./all2all
>> $ mpirun -np 2 -mca mpi_paffinity_alone 1 ./all2all
>> 
>> bind two ranks to two different sockets:
>> $ mpirun -np 2 --bind-to-socket ./all2all
>> 
>> All three runs resulted in similarly bad latencies (~1.4us).
>> :-(
>> 
>> 
>> Matthias
>> 
>> On Monday 13 February 2012 12:43:22 sylvain.jeau...@bull.net wrote:
>>> Hi Matthias,
>>> 
>>> You might want to play with process binding to see if your problem is
>>> related to bad memory affinity.
>>> 
>>> Try to launch pingpong on two CPUs of the same socket, then on different
>>> sockets (i.e. bind each process to a core, and try different
>>> configurations).
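>>> 
>>> A minimal sketch of the two configurations (assuming Open MPI 1.5's
>>> --bycore / --bysocket mapping options and a hypothetical ./pingpong
>>> binary):
>>> 
>>> # adjacent cores - usually the same socket
>>> $ mpirun -np 2 --bycore --bind-to-core ./pingpong
>>> # one core on each of two sockets
>>> $ mpirun -np 2 --bysocket --bind-to-core ./pingpong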
>>> 
>>> Sylvain
>>> 
>>> 
>>> 
>>> From:    Matthias Jurenz <matthias.jur...@tu-dresden.de>
>>> To:      Open MPI Developers <de...@open-mpi.org>
>>> Date:    13/02/2012 12:12
>>> Subject: [OMPI devel] poor btl sm latency
>>> Sent by: devel-boun...@open-mpi.org
>>> 
>>> 
>>> 
>>> Hello all,
>>> 
>>> on our new AMD cluster (AMD Opteron 6274, 2.2 GHz) we get very bad
>>> latencies (~1.5us) when performing 0-byte p2p communication on a single
>>> node using the Open MPI sm BTL. When using Platform MPI we get ~0.5us
>>> latencies, which is pretty good. The bandwidth results are similar for
>>> both MPI implementations (~3.3 GB/s) - this is okay.
>>> 
>>> One node has 64 cores and 64 GB RAM; it doesn't matter how many ranks are
>>> allocated by the application - we get similar results with different
>>> numbers of ranks.
>>> 
>>> We are using Open MPI 1.5.4, built with GCC 4.3.4 without any special
>>> configure options except the installation prefix and the location of the
>>> LSF installation.
>>> 
>>> As mentioned at http://www.open-mpi.org/faq/?category=sm, we tried to use
>>> /dev/shm instead of /tmp for the session directory, but it had no effect.
>>> Furthermore, we tried the current release candidate 1.5.5rc1 of Open MPI,
>>> which provides an option to use SysV shared memory (-mca shmem sysv) -
>>> this also results in similarly poor latencies.
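>>> 
>>> For reference, the command lines were along these lines (a sketch - the
>>> session directory move is assumed to go through the orte_tmpdir_base MCA
>>> parameter; the sysv run used 1.5.5rc1):
>>> 
>>> $ mpirun -np 2 -mca orte_tmpdir_base /dev/shm ./all2all
>>> $ mpirun -np 2 -mca shmem sysv ./all2all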
>>> 
>>> Do you have any idea? Please help!
>>> 
>>> Thanks,
>>> Matthias
>> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

