Not good:

/labhome/alexm/workspace/openmpi-1.6.1a1hge06c2f2a0859/inst/bin/mpirun \
    --host h-qa-017,h-qa-017,h-qa-017,h-qa-017,h-qa-018,h-qa-018,h-qa-018,h-qa-018 \
    -np 8 --bind-to-core -bynode -display-map \
    /usr/mpi/gcc/mlnx-openmpi-1.6rc4/tests/osu_benchmarks-3.1.1/osu_alltoall

 ========================   JOB MAP   ========================

 Data for node: h-qa-017               Num procs: 4
        Process OMPI jobid: [36855,1] Process rank: 0
        Process OMPI jobid: [36855,1] Process rank: 2
        Process OMPI jobid: [36855,1] Process rank: 4
        Process OMPI jobid: [36855,1] Process rank: 6

 Data for node: h-qa-018               Num procs: 4
        Process OMPI jobid: [36855,1] Process rank: 1
        Process OMPI jobid: [36855,1] Process rank: 3
        Process OMPI jobid: [36855,1] Process rank: 5
        Process OMPI jobid: [36855,1] Process rank: 7

 =============================================================
--------------------------------------------------------------------------
An invalid physical processor ID was returned when attempting to bind
an MPI process to a unique processor.

This usually means that you requested binding to more processors than
exist (e.g., trying to bind N MPI processes to M processors, where N >
M).  Double check that you have enough unique processors for all the
MPI processes that you are launching on this host.

$ hwloc-ls --of console
Machine (32GB)
  NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (20MB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
    PU L#0 (P#0)
    PU L#1 (P#2)
  NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (20MB) + L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
    PU L#2 (P#1)
    PU L#3 (P#3)
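
(Not from the original run -- just a minimal hwloc C sketch; the file name
and build line are assumptions made up for illustration.)  It prints the
core and PU counts that hwloc-ls summarizes above; if each node really has
only the two cores shown, then four --bind-to-core ranks per node would be
exactly the N > M case the error text describes.

/* count_pus.c: count cores and hardware threads (PUs) with the hwloc C API.
 * Assumed build line: gcc count_pus.c -o count_pus -lhwloc
 */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Cores vs. hardware threads (PUs) visible on this node. */
    int ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    int npus   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
    printf("cores=%d PUs=%d\n", ncores, npus);

    hwloc_topology_destroy(topo);
    return 0;
}

On the topology above this should print "cores=2 PUs=4", i.e. fewer cores
than the 4 ranks launched on each host.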


On Tue, May 29, 2012 at 11:00 PM, Jeff Squyres <jsquy...@cisco.com> wrote:

> Per ticket #3108, there were still some unfortunate bugs in the affinity
> code in 1.6.  :-(
>
> These have now been fixed.  ...but since this is the 2nd or 3rd time we have
> "fixed" the 1.5/1.6 series w.r.t. processor affinity, I'd really like
> people to test this stuff before it's committed and we ship 1.6.1.  I've
> put tarballs containing the fixes here:
>
>    http://www.open-mpi.org/~jsquyres/unofficial/
>
> Can you please try mpirun options like --bind-to-core and --bind-to-socket
> and ensure that they still work for you?  (even on machines with
> hyperthreading enabled, if you have access to such things)
>
> IBM: I'd particularly like to hear that we haven't made anything worse on
> POWER systems.  Thanks.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
