Mike, Mike, I think in this case "version does not match" really means "not installed". I am pretty sure, as Dave G. and I discussed earlier in this thread, that those two lines appear because the head node has an HCA and kernel modules, but only the "devel" libraries are installed (because the HCA is not cabled to a switch). I can assure you that both verbs and PSM support work fine on/among the compute nodes (with no messages at all once oob:ud is disabled).
To confirm that the first two lines come from the head node, I just now tried running mpirun from a compute node instead (one not on the -host list), and the libibverbs warning lines are no longer present. The ibv_fork_init() failures *are* still present. So I am inclined to agree with Dave that these lines:

libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0

are an unavoidable product of orterun trying to initialize the verbs libraries on a head node that lacks the necessary userspace libraries. So I am *not* really concerned with making these go away.

-Paul

On Thu, Mar 5, 2015 at 8:56 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

> Paul,
> judging by:
>
> libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for
> /sys/class/infiniband_verbs/uverbs0
>
> it seems that ofed userspace libraries version does not match loaded ofed
> kernel driver version.
>
>
>
> On Thu, Mar 5, 2015 at 5:33 PM, Alina Sklarevich <
> ali...@dev.mellanox.co.il> wrote:
>
>> I don't know much about PSM either but shouldn't it be called only after
>> the oob:ud code?
>> If so, then ibv_fork_init() is being called from oob:ud early enough so
>> either there is another reason for ibv_fork_init() failure (like you said),
>> or the reason why this verb failed was the same reason why these warnings
>> appeared?
>> libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
>> libibverbs: Warning: no userspace device-specific driver found for
>> /sys/class/infiniband_verbs/uverbs0
>>
>> Also, opal_common_verbs_want_fork_support is now set to -1 like you
>> suggested so these warnings shouldn't appear anymore.
>>
>> On Thu, Mar 5, 2015 at 4:51 PM, Jeff Squyres (jsquyres) <
>> jsquy...@cisco.com> wrote:
>>
>>> On Mar 5, 2015, at 6:32 AM, Alina Sklarevich <ali...@dev.mellanox.co.il>
>>> wrote:
>>> >
>>> > If oob:ud was disabled then there was no call to ibv_fork_init()
>>> > anywhere else, right? If so, then this is why the messages went away.
>>>
>>> Right. That's why I'm saying it doesn't seem like a PSM problem.
>>>
>>> (I don't know much about PSM, but I don't think it uses verbs...?)
>>>
>>> > The calls to ibv_fork_init() from the opal common verbs were pushed to
>>> > the master. One of the places a call was set is oob:ud, but if there is a
>>> > call to memory registering verbs before this place, then the call to it in
>>> > oob:ud would result in a failure.
>>>
>>> Yes, I think that is the exact question: why are these messages showing
>>> up because of oob:ud? It seems like the call sequences to ibv_fork_init()
>>> are not as understood as we thought they were. :-( I.e., it was
>>> presupposed that oob_ud was the first entity to call any verbs code (and by
>>> your commits, is supposed to be calling the common verbs code to call
>>> ibv_fork_init() early enough such that it won't be a problem). But either
>>> that is not the case, or ibv_fork_init() is failing for some other reason.
>>>
>>> These are the things that need to be figured out.
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/03/17104.php
>>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/03/17106.php
>>
>
>
>
> --
>
> Kind Regards,
>
> M.
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/03/17107.php
>

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900