I have a system with InfiniPath HCAs, where I've historically tested
mtl:psm.
For some reason, that appears to have ceased working some time in the past
4 months.
However, this report is about something else.
I am using the current master tarball: openmpi-dev-1203-g171d674.tar.bz2

When I ran configure, verbs support was found even though it was not my
intent to use it.
So, I am running with an explicit btl list that omits verbs and am
disabling the broken mtl:psm and mtl:ofi as well.
However, I am getting complaints from some verbs-related code:

$ mpirun -mca btl sm,self,tcp -mca mtl ^psm,ofi -np 2 -host n15,n16 examples/ring_c
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for
/sys/class/infiniband_verbs/uverbs0
--------------------------------------------------------------------------
Fork support was requested but the library call ibv_fork_init() failed.

  Hostname:    n16
  Error (22):  Invalid argument
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Fork support was requested but the library call ibv_fork_init() failed.

  Hostname:    n15
  Error (22):  Invalid argument
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Fork support was requested but the library call ibv_fork_init() failed.

  Hostname:    n16
  Error (22):  Invalid argument
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Fork support was requested but the library call ibv_fork_init() failed.

  Hostname:    n15
  Error (22):  Invalid argument
--------------------------------------------------------------------------

Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
Process 1 exiting



There are at least THREE things "wrong" in my opinion.

The first is the following two lines:

libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for
/sys/class/infiniband_verbs/uverbs0

However, I can run ibv_devinfo (and see ACTIVE ports) on both of the
compute nodes.
So, these appear to me to be complaints about the login node (which is
simply not on the IB network).
I did not ask for ibv, and even if I had, the message about a non-IB login
node would be just an annoyance.

The second is the "ibv_fork_init()" message twice per host, again when I
have NOT requested btl:verbs.

The third is that I had to pass so many mca params just to get as far as
this!

I did find that adding "-mca oob tcp" eliminated all the messages.
So, I am assuming oob:ud is responsible for this mess.
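For reference, the complete set of options that gave me a clean run can be sketched as below. This only combines the options already shown above with the "-mca oob tcp" workaround; the MCA_ARGS variable name is just an illustration, and the hosts and example binary are the ones from my run.

```shell
# MCA selections from the command above, plus the "-mca oob tcp" workaround.
# MCA_ARGS is a hypothetical helper variable, not anything Open MPI defines.
MCA_ARGS="-mca btl sm,self,tcp -mca mtl ^psm,ofi -mca oob tcp"

# Print the full command line; drop the leading "echo" to actually launch it.
echo mpirun $MCA_ARGS -np 2 -host n15,n16 examples/ring_c
```

That is three separate -mca selections just to run a two-process ring test over TCP.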

This does not appear to be a very good default behavior.
+ I believe oob:ud should *silently* disqualify itself when the node
running "mpirun" is not on the IB network.
+ I don't know why/when the ibv_fork_init() messages came about but they
are quite annoying when I don't even intend to *use* ibv.

-Paul


-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
