I still see an issue with the openib receive queue settings. Interestingly, it works if I pass the setting on the mpirun command line, e.g.

mpirun --mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32 --npernode 1 -np 2 ./lat

but if I add it to the ${HOME}/.openmpi/mca-params.conf file, e.g.

---snip---
 cat  ~/.openmpi/mca-params.conf
...
btl_openib_receive_queues = S,12288,128,64,32:S,65536,128,64,32
...
---snip---

I receive the following error message:
gabriel@crill:~> mpirun  --npernode 1 -np 2 ./lat
--------------------------------------------------------------------------
The Open MPI receive queue configuration for the OpenFabrics devices
on two nodes are incompatible, meaning that MPI processes on two
specific nodes were unable to communicate with each other.  This
generally happens when you are using OpenFabrics devices from
different vendors on the same network.  You should be able to use the
mca_btl_openib_receive_queues MCA parameter to set a uniform receive
queue configuration for all the devices in the MPI job, and therefore
be able to run successfully.

  Local host:       crill-003
  Local adapter:    mlx4_0 (vendor 0x2c9, part ID 26418)
  Local queues:     S,12288,128,64,32:S,65536,128,64,32

  Remote host:      crill-004
  Remote adapter:   (vendor 0x2c9, part ID 26418)
  Remote queues:    P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64
--------------------------------------------------------------------------
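
To make the mismatch explicit, here is a minimal sketch that splits the two queue strings from the message above and compares them. parse_queues is a hypothetical helper of my own, not part of Open MPI, and it only assumes the "type,number,number,...:type,..." layout visible in the output. It shows that the remote side reports four queues rather than the two I configured, which suggests the setting from the file is not being applied there.

```python
# Hypothetical helper (not part of Open MPI): split a receive_queues
# string into (type, [numeric fields]) tuples so two specs can be compared.
def parse_queues(spec):
    queues = []
    for entry in spec.split(":"):
        kind, *fields = entry.split(",")
        queues.append((kind, [int(f) for f in fields]))
    return queues

# Strings copied verbatim from the error message above.
local = parse_queues("S,12288,128,64,32:S,65536,128,64,32")
remote = parse_queues(
    "P,128,256,192,128:S,2048,1024,1008,64:"
    "S,12288,1024,1008,64:S,65536,1024,1008,64"
)

print("local queues: ", len(local))
print("remote queues:", len(remote))
print("identical:    ", local == remote)
```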

Does anybody have an idea what I should be looking for to fix this? I can confirm that the home file system is mounted correctly on all nodes (i.e. all processes can access the same mca-params.conf file) and that the nodes have identical IB hardware (contrary to what the error message suggests).


Thanks
Edgar

--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
