I don't know much about PSM either, but shouldn't it be called only after the oob:ud code? If so, then ibv_fork_init() is being called from oob:ud early enough, so either there is another reason for the ibv_fork_init() failure (like you said), or the verb failed for the same reason that these warnings appeared:

libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0

Also, opal_common_verbs_want_fork_support is now set to -1 like you suggested so these warnings shouldn't appear anymore.

On Thu, Mar 5, 2015 at 4:51 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> On Mar 5, 2015, at 6:32 AM, Alina Sklarevich <ali...@dev.mellanox.co.il> wrote:
> >
> > If oob:ud was disabled then there was no call to ibv_fork_init()
> > anywhere else, right? If so, then this is why the messages went away.
>
> Right. That's why I'm saying it doesn't seem like a PSM problem.
>
> (I don't know much about PSM, but I don't think it uses verbs...?)
>
> > The calls to ibv_fork_init() from the opal common verbs were pushed to
> > the master. One of the places a call was set is oob:ud, but if there is a
> > call to memory registering verbs before this place, then the call to it in
> > oob:ud would result in a failure.
>
> Yes, I think that is the exact question: why are these messages showing up
> because of oob:ud? It seems like the call sequences to ibv_fork_init() are
> not as understood as we thought they were. :-( I.e., it was presupposed
> that oob_ud was the first entity to call any verbs code (and by your
> commits, is supposed to be calling the common verbs code to call
> ibv_fork_init() early enough such that it won't be a problem). But either
> that is not the case, or ibv_fork_init() is failing for some other reason.
>
> These are the things that need to be figured out.
>
> --
> Jeff Squyres
> jsquy...@cisco.com