Mike,

I think in this case "version does not match" really means "not
installed".  As Dave G. and I discussed earlier in this thread, I am
fairly sure those two lines appear because the head node has an HCA and
kernel modules, but only the "devel" libraries are installed there
(because the HCA is not cabled to a switch).  I can assure you that both
verbs and PSM support work fine on/among the compute nodes (with no
messages at all once oob:ud is disabled).

To confirm that the first two lines come from the head node, I just
tried running mpirun from a compute node instead (one not on the -host
list), and the libibverbs warning lines are gone.  The ibv_fork_init()
failures *are* still present.
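For anyone following along, here is a toy C model of the ordering rule Alina and Jeff describe below: once any memory-registering verb has run, a later ibv_fork_init() call fails, and the want-fork-support setting decides whether that failure is fatal or silent.  Everything in it is hypothetical: the model_* names, the EINVAL return value, and the tri-state semantics, which I am assuming from the parameter's help text (negative = try but continue, 0 = don't try, positive = try and fail).  It sketches the logic only; it is not libibverbs or Open MPI code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT libibverbs code.  It encodes the ordering
 * rule discussed in this thread: once any memory has been registered,
 * a later attempt to enable fork support fails. */
static bool model_fork_enabled = false;
static bool model_memory_registered = false;

/* Stand-in for a memory-registering verb such as ibv_reg_mr(). */
void model_reg_mr(void)
{
    model_memory_registered = true;
}

/* Stand-in for ibv_fork_init(): 0 on success, an errno value on failure.
 * EINVAL is an assumed error code chosen for illustration. */
int model_fork_init(void)
{
    if (model_fork_enabled)
        return 0;               /* already enabled: nothing to do */
    if (model_memory_registered)
        return EINVAL;          /* too late: memory was registered first */
    model_fork_enabled = true;
    return 0;
}

/* Possible outcomes of a fork-support test under an assumed tri-state knob:
 * <0 = try but continue silently, 0 = skip, >0 = try and treat failure as fatal. */
enum fork_outcome {
    FORK_SKIPPED,
    FORK_ENABLED,
    FORK_UNAVAILABLE_SILENT,
    FORK_FATAL
};

enum fork_outcome model_fork_test(int want_fork_support)
{
    if (want_fork_support == 0)
        return FORK_SKIPPED;
    if (model_fork_init() == 0)
        return FORK_ENABLED;
    return want_fork_support > 0 ? FORK_FATAL : FORK_UNAVAILABLE_SILENT;
}

/* Reset the model so several scenarios can be replayed in one process. */
void model_reset(void)
{
    model_fork_enabled = false;
    model_memory_registered = false;
}
```

Replaying the suspected scenario: after model_reg_mr(), model_fork_test(-1) yields FORK_UNAVAILABLE_SILENT while model_fork_test(1) yields FORK_FATAL; with no prior registration, the same test yields FORK_ENABLED.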

So, I am inclined to agree with Dave that these lines:

libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for
/sys/class/infiniband_verbs/uverbs0

are an unavoidable side effect of orterun trying to initialize the verbs
library on a head node that lacks the necessary userspace libraries.  So,
I am *not* really concerned with making these go away.
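If it helps anyone triage similar reports, the two warnings really test two separate things: whether the kernel exposes a uverbs device, and whether userspace driver configuration exists.  Below is a minimal C sketch of that distinction.  classify_node() and its enum are hypothetical, and treating a non-empty /etc/libibverbs.d as "userspace driver installed" is an assumption based only on the warning text.

```c
#include <dirent.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical diagnostic helper (not part of Open MPI or libibverbs).
 * It separates the two conditions named in the warnings:
 *   - a kernel uverbs device exists (e.g. /sys/class/infiniband_verbs/uverbs0)
 *   - the userspace config directory (e.g. /etc/libibverbs.d) exists and
 *     is non-empty (assumed to mean a userspace driver is installed)
 * Paths are parameters so the logic can be exercised on any directories. */
enum node_verbs_state {
    NODE_NO_HCA,        /* no kernel uverbs device at all */
    NODE_KERNEL_ONLY,   /* kernel device present, no userspace driver config */
    NODE_VERBS_READY    /* both kernel device and userspace config present */
};

static bool path_exists(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0;
}

/* Count directory entries other than "." and ".."; -1 if it cannot be opened. */
static int count_entries(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            n++;
    }
    closedir(d);
    return n;
}

enum node_verbs_state classify_node(const char *uverbs_dev, const char *config_dir)
{
    if (!path_exists(uverbs_dev))
        return NODE_NO_HCA;
    if (count_entries(config_dir) <= 0)
        return NODE_KERNEL_ONLY;   /* missing or empty config directory */
    return NODE_VERBS_READY;
}
```

Given the two warnings, the head node would presumably classify as NODE_KERNEL_ONLY, while a cabled compute node with the full userspace stack would classify as NODE_VERBS_READY.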

-Paul

On Thu, Mar 5, 2015 at 8:56 AM, Mike Dubman <mi...@dev.mellanox.co.il>
wrote:

> Paul,
> judging by:
>
> libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for
> /sys/class/infiniband_verbs/uverbs0
>
> it seems that the OFED userspace library version does not match the loaded
> OFED kernel driver version.
>
>
>
> On Thu, Mar 5, 2015 at 5:33 PM, Alina Sklarevich <
> ali...@dev.mellanox.co.il> wrote:
>
>> I don't know much about PSM either, but shouldn't it be called only after
>> the oob:ud code?
>> If so, then ibv_fork_init() is being called from oob:ud early enough, so
>> either there is another reason for the ibv_fork_init() failure (as you
>> said), or this verb failed for the same reason that these warnings
>> appeared:
>> libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
>> libibverbs: Warning: no userspace device-specific driver found for
>> /sys/class/infiniband_verbs/uverbs0
>>
>> Also, opal_common_verbs_want_fork_support is now set to -1 as you
>> suggested, so these warnings shouldn't appear anymore.
>>
>> On Thu, Mar 5, 2015 at 4:51 PM, Jeff Squyres (jsquyres) <
>> jsquy...@cisco.com> wrote:
>>
>>> On Mar 5, 2015, at 6:32 AM, Alina Sklarevich <ali...@dev.mellanox.co.il>
>>> wrote:
>>> >
>>> > If oob:ud was disabled, then there was no call to ibv_fork_init()
>>> > anywhere else, right? If so, then this is why the messages went away.
>>>
>>> Right.  That's why I'm saying it doesn't seem like a PSM problem.
>>>
>>> (I don't know much about PSM, but I don't think it uses verbs...?)
>>>
>>> > The calls to ibv_fork_init() from the opal common verbs code were pushed
>>> > to master. One of the places the call was added is oob:ud, but if any
>>> > memory-registering verb is called before that point, then the call in
>>> > oob:ud will fail.
>>>
>>> Yes, I think that is the exact question: why are these messages showing
>>> up because of oob:ud?  It seems like the call sequences to ibv_fork_init()
>>> are not as understood as we thought they were.  :-(  I.e., it was
>>> presupposed that oob:ud was the first entity to call any verbs code (and by
>>> your commits, is supposed to be calling the common verbs code to call
>>> ibv_fork_init() early enough such that it won't be a problem).  But either
>>> that is not the case, or ibv_fork_init() is failing for some other reason.
>>>
>>> These are the things that need to be figured out.
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/03/17104.php
>>>
>>
>>
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/03/17106.php
>>
>
>
>
> --
>
> Kind Regards,
>
> M.
>
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/03/17107.php
>



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
