I wonder if this is why we invented the "-1" default value for enabling verbs fork support -- because there are legitimate cases where ibv_fork_init() fails, and the user doesn't care. Hence, -1 allows it to fail and no one cares.
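
To make the "-1" idea concrete, here is a minimal sketch of a tri-state setting. It is not the actual Open MPI code: the names FORK_TRY / FORK_OFF / FORK_REQUIRE, setup_fork_support() and the fake_* helpers are all hypothetical, and the real ibv_fork_init() is replaced by a function pointer so the sketch builds without libibverbs. The only thing taken from libibverbs is the calling convention: ibv_fork_init() returns 0 on success or an errno value on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tri-state setting, modeled on the "-1" default described above. */
enum { FORK_TRY = -1, FORK_OFF = 0, FORK_REQUIRE = 1 };

/* Same shape as ibv_fork_init(): 0 on success, an errno value on failure. */
typedef int (*fork_init_fn)(void);

/*
 * Returns 0 if startup may continue, or the errno value if it must stop.
 * *enabled is set to 1 only when the init call actually succeeded.
 */
static int setup_fork_support(int want, fork_init_fn init, int *enabled)
{
    assert(init != NULL && enabled != NULL);
    *enabled = 0;

    if (want == FORK_OFF) {
        return 0;               /* never call the init function at all */
    }

    int rc = init();
    if (rc == 0) {
        *enabled = 1;
        return 0;
    }

    if (want < 0) {
        return 0;               /* "try" mode: failure is tolerated, no message */
    }

    /* "require" mode: the user asked for it explicitly, so complain. */
    fprintf(stderr, "fork support was requested but init failed (error %d)\n", rc);
    return rc;
}

/* Stand-ins for the real call: one fails with EINVAL, one succeeds. */
static int fake_fork_init_einval(void) { return EINVAL; }
static int fake_fork_init_ok(void)     { return 0; }
```

With this shape, setup_fork_support(FORK_TRY, fake_fork_init_einval, &enabled) returns 0 and leaves enabled at 0, while the same call with FORK_REQUIRE prints the complaint and returns EINVAL. Whether the "try" mode should be completely silent is an assumption of the sketch.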
Can you tell us why ibv_fork_init() would fail?

> On Mar 4, 2015, at 9:56 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> I have a system with InfiniPath HCAs, where I've historically tested mtl:psm.
> For some reason, that appears to have ceased working some time in the past 4
> months.
> However, this report is about something else.
> I am using the current master tarball: openmpi-dev-1203-g171d674.tar.bz2
>
> When I ran configure, verbs support was found even though it was not my
> intent to use it.
> So, I am running with an explicit btl list that omits verbs and am disabling
> the broken mtl:psm and mtl:ofi as well.
> However, I am getting complaints from some verbs-related code:
>
> $ mpirun -mca btl sm,self,tcp -mca mtl ^psm,ofi -np 2 -host n15,n16 examples/ring_c
> libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
> --------------------------------------------------------------------------
> Fork support was requested but the library call ibv_fork_init() failed.
>
> Hostname: n16
> Error (22): Invalid argument
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Fork support was requested but the library call ibv_fork_init() failed.
>
> Hostname: n15
> Error (22): Invalid argument
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Fork support was requested but the library call ibv_fork_init() failed.
>
> Hostname: n16
> Error (22): Invalid argument
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Fork support was requested but the library call ibv_fork_init() failed.
>
> Hostname: n15
> Error (22): Invalid argument
> --------------------------------------------------------------------------
> Process 0 sending 10 to 1, tag 201 (2 processes in ring)
> Process 0 sent to 1
> Process 0 decremented value: 9
> Process 0 decremented value: 8
> Process 0 decremented value: 7
> Process 0 decremented value: 6
> Process 0 decremented value: 5
> Process 0 decremented value: 4
> Process 0 decremented value: 3
> Process 0 decremented value: 2
> Process 0 decremented value: 1
> Process 0 decremented value: 0
> Process 0 exiting
> Process 1 exiting
>
>
> There are at least THREE things "wrong" in my opinion.
>
> The first is the following two lines:
> libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
> However, I can run ibv_devinfo (and see ACTIVE ports) on both of the compute
> nodes.
> So, these appear to me to be a complaint about the login node (which is
> simply not on the IB network).
> I did not ask for ibv, and even if I did the message about a non-IB login
> node is just an annoyance.
>
> The second is the "ibv_fork_init()" message twice per host, again when I have
> NOT requested btl:verbs.
>
> The third is that I had to pass so many mca params just to get as far as this!
>
> I did find that adding "-mca oob tcp" eliminated all the messages.
> So, I am assuming oob:ud is responsible for this mess.
>
> This does not appear to be a very good default behavior.
> + I believe oob:ud should *silently* disqualify itself when the node running
> "mpirun" is not on the IB network.
> + I don't know why/when the ibv_fork_init() messages came about but they are
> quite annoying when I don't even intend to *use* ibv.
>
> -Paul
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/03/17093.php

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/