I wonder if this is why we invented the "-1" default value for enabling verbs 
fork support -- because there are legitimate cases where ibv_fork_init() 
fails, and the user doesn't care.  Hence, -1 allows it to fail without 
aborting the job.
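
For reference, here is a rough sketch in C of the tri-state semantics I have in 
mind.  This is not the actual Open MPI code: setup_fork_support(), the FORK_* 
constants, and the function-pointer stand-in for ibv_fork_init() are all made 
up for illustration, and I am assuming that 0 means "don't try" and a positive 
value means "required".  The only thing taken from libibverbs is that 
ibv_fork_init() returns 0 on success and an errno value on failure.

```c
#include <stdio.h>

/* Hypothetical tri-state setting, modeled on the "-1" default. */
enum { FORK_TRY = -1, FORK_OFF = 0, FORK_REQUIRE = 1 };

/* Stand-in for ibv_fork_init(): returns 0 on success, errno value on failure. */
typedef int (*fork_init_fn)(void);

/*
 * Returns 0 if startup may continue, -1 if it must abort.
 * *enabled is set to 1 only when fork support was actually initialized.
 */
static int setup_fork_support(int want, fork_init_fn fork_init, int *enabled)
{
    *enabled = 0;

    if (want == FORK_OFF) {
        return 0;               /* user asked us not to try at all */
    }

    int rc = fork_init();
    if (rc == 0) {
        *enabled = 1;
        return 0;
    }

    if (want > 0) {
        /* Explicitly required: report the failure and abort. */
        fprintf(stderr, "fork support required, but init failed (error %d)\n", rc);
        return -1;
    }

    return 0;                   /* want < 0: failure is tolerated quietly */
}

/* Test doubles standing in for the real library call. */
static int always_fail(void) { return 22; }  /* EINVAL, as in the report below */
static int always_ok(void)   { return 0; }
```

With want == -1 and a failing init, this returns 0 and leaves *enabled at 0, 
which is the "it failed and nobody cares" case.  Only want > 0 turns the same 
failure into a hard error.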

Can you tell us why ibv_fork_init() would fail?



> On Mar 4, 2015, at 9:56 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
> I have a system with InfiniPath HCAs, where I've historically tested mtl:psm.
> For some reason, that appears to have ceased working some time in the past 4 
> months.
> However, this report is about something else.
> I am using the current master tarball: openmpi-dev-1203-g171d674.tar.bz2
> 
> When I ran configure, verbs support was found even though it was not my 
> intent to use it.
> So, I am running with an explicit btl list that omits verbs and am disabling 
> the broken mtl:psm and mtl:ofi as well.
> However, I am getting complaints from some verbs-related code:
> 
> $ mpirun -mca btl sm,self,tcp -mca mtl ^psm,ofi -np 2 -host n15,n16  
> examples/ring_c
> libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for 
> /sys/class/infiniband_verbs/uverbs0
> --------------------------------------------------------------------------
> Fork support was requested but the library call ibv_fork_init() failed.
> 
>   Hostname:    n16
>   Error (22):  Invalid argument
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Fork support was requested but the library call ibv_fork_init() failed.
> 
>   Hostname:    n15
>   Error (22):  Invalid argument
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Fork support was requested but the library call ibv_fork_init() failed.
> 
>   Hostname:    n16
>   Error (22):  Invalid argument
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Fork support was requested but the library call ibv_fork_init() failed.
> 
>   Hostname:    n15
>   Error (22):  Invalid argument
> --------------------------------------------------------------------------
> Process 0 sending 10 to 1, tag 201 (2 processes in ring)
> Process 0 sent to 1
> Process 0 decremented value: 9
> Process 0 decremented value: 8
> Process 0 decremented value: 7
> Process 0 decremented value: 6
> Process 0 decremented value: 5
> Process 0 decremented value: 4
> Process 0 decremented value: 3
> Process 0 decremented value: 2
> Process 0 decremented value: 1
> Process 0 decremented value: 0
> Process 0 exiting
> Process 1 exiting
> 
> 
> There are at least THREE things "wrong" in my opinion.
> 
> The first is the following two lines:
> libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for 
> /sys/class/infiniband_verbs/uverbs0
> However, I can run ibv_devinfo (and see ACTIVE ports) on both of the compute 
> nodes.
> So, these appear to me to be a complaint about the login node (which is 
> simply not on the IB network).
> I did not ask for ibv, and even if I did the message about a non-IB login 
> node is just an annoyance.
> 
> The second is the "ibv_fork_init()" message twice per host, again when I have 
> NOT requested btl:verbs. 
> 
> The third is that I had to pass so many mca params just to get as far as this!
> 
> I did find that adding "-mca oob tcp" eliminated all the messages.
> So, I am assuming oob:ud is responsible for this mess.
> 
> This does not appear to be a very good default behavior.
> + I believe oob:ud should *silently* disqualify itself when the node running 
> "mpirun" is not on the IB network.
> + I don't know why/when the ibv_fork_init() messages came about but they are 
> quite annoying when I don't even intend to *use* ibv.
> 
> -Paul
> 
> 
> -- 
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/03/17093.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
