Also, double check that you have an optimized build, not a debugging build. SVN and HG checkouts default to debugging builds, which add a lot of latency.
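A quick way to check which kind of build you ended up with (assuming the ompi_info from the installation you're testing is first in your PATH) is:

$ ompi_info | grep -i debug

If the debug-support lines there say "yes", you have a debug build; reconfiguring with something like

$ ./configure --disable-debug --prefix=<your prefix> <your LSF options>

and rebuilding should give you an optimized build (the exact configure arguments beyond --disable-debug obviously depend on your setup). And per Ralph's suggestion below, adding --report-bindings to the test commands you already ran, e.g.

$ mpirun -np 2 --bind-to-core --report-bindings ./all2all

should confirm exactly where the procs are being bound.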
On Feb 13, 2012, at 10:22 AM, Ralph Castain wrote:

> Few thoughts
>
> 1. Bind to socket is broken in 1.5.4 - fixed in next release
>
> 2. Add --report-bindings to the cmd line and see where it thinks the procs are bound
>
> 3. Sounds like memory may not be local - might be worth checking mem binding.
>
> Sent from my iPad
>
> On Feb 13, 2012, at 7:07 AM, Matthias Jurenz <matthias.jur...@tu-dresden.de> wrote:
>
>> Hi Sylvain,
>>
>> thanks for the quick response!
>>
>> Here are some results with process binding enabled. I hope I used the
>> parameters correctly...
>>
>> Bind two ranks to one socket:
>> $ mpirun -np 2 --bind-to-core ./all2all
>> $ mpirun -np 2 -mca mpi_paffinity_alone 1 ./all2all
>>
>> Bind two ranks to two different sockets:
>> $ mpirun -np 2 --bind-to-socket ./all2all
>>
>> All three runs resulted in similarly bad latencies (~1.4us). :-(
>>
>> Matthias
>>
>> On Monday 13 February 2012 12:43:22 sylvain.jeau...@bull.net wrote:
>>> Hi Matthias,
>>>
>>> You might want to play with process binding to see if your problem is
>>> related to bad memory affinity.
>>>
>>> Try to launch the pingpong on two CPUs of the same socket, then on
>>> different sockets (i.e. bind each process to a core, and try different
>>> configurations).
>>>
>>> Sylvain
>>>
>>> From: Matthias Jurenz <matthias.jur...@tu-dresden.de>
>>> To: Open MPI Developers <de...@open-mpi.org>
>>> Date: 13/02/2012 12:12
>>> Subject: [OMPI devel] poor btl sm latency
>>> Sent by: devel-boun...@open-mpi.org
>>>
>>> Hello all,
>>>
>>> On our new AMD cluster (AMD Opteron 6274, 2.2 GHz) we get very bad
>>> latencies (~1.5us) when performing 0-byte p2p communication on a single
>>> node using the Open MPI sm BTL. When using Platform MPI we get ~0.5us
>>> latencies, which is pretty good. The bandwidth results are similar for
>>> both MPI implementations (~3.3 GB/s) - this is okay.
>>>
>>> One node has 64 cores and 64 GB RAM; it doesn't matter how many ranks
>>> are allocated by the application - we get similar results with
>>> different numbers of ranks.
>>>
>>> We are using Open MPI 1.5.4, built with gcc 4.3.4 without any special
>>> configure options except the installation prefix and the location of
>>> the LSF stuff.
>>>
>>> As mentioned at http://www.open-mpi.org/faq/?category=sm, we tried to
>>> use /dev/shm instead of /tmp for the session directory, but it had no
>>> effect. Furthermore, we tried the current release candidate 1.5.5rc1 of
>>> Open MPI, which provides an option to use SysV shared memory
>>> (-mca shmem sysv) - this also results in similarly poor latencies.
>>>
>>> Do you have any idea? Please help!
>>>
>>> Thanks,
>>> Matthias

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/