I did this in https://svn.open-mpi.org/trac/ompi/changeset/18279; the
message is now gone if IBCM is not installed on the host.
If you care: I actually used open() instead of stat(), because that
way I can also ensure that the current user is able to both read and
write to the device (which is also required).
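
For anyone curious what that kind of check looks like, here is a minimal sketch. It is not the code from the changeset: the helper name ucm_device_usable and its dev_dir parameter are hypothetical, and it only assumes the /dev/infiniband/ucmX naming mentioned in this thread.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not the code from the changeset: returns 1 if
 * <dev_dir>/ucm<index> can be opened for both reading and writing by
 * the current user, and 0 otherwise (missing node, bad permissions,
 * or a path too long for the buffer). */
int ucm_device_usable(const char *dev_dir, int index)
{
    char path[256];
    int n, fd;

    n = snprintf(path, sizeof(path), "%s/ucm%d", dev_dir, index);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        return 0;
    }

    /* stat() would only show that the node exists; open() with O_RDWR
     * also fails if we lack read or write permission on it. */
    fd = open(path, O_RDWR);
    if (fd < 0) {
        return 0;
    }
    close(fd);
    return 1;
}
```

A caller would presumably do something like ucm_device_usable("/dev/infiniband", 0) and skip IBCM quietly when it returns 0, rather than letting libibcm print its own error.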
On Apr 24, 2008, at 11:03 AM, Jeff Squyres (jsquyres) wrote:
...actually, thinking about this a bit more, it might be easy to try
to stat /dev/infiniband/ucmX before calling ib_cm_open_device. I'll
check into it this afternoon.
-jms
Sent from my PDA. No type good.
-----Original Message-----
From: Jeff Squyres (jsquyres)
Sent: Thursday, April 24, 2008 10:56 AM Eastern Standard Time
To: pa...@dev.mellanox.co.il
Cc: Open MPI Developers
Subject: Re: [OMPI devel] Merging in the CPC work
It's unavoidable in the current rev of libibcm :( - Sean Hefty tells
me that he'll remove that message in the next release.
For the time being, maybe the right solution in ompi is to not try
to use ibcm unless it's specifically requested. :(
-jms
-----Original Message-----
From: Pavel Shamis (Pasha) [mailto:pa...@dev.mellanox.co.il]
Sent: Thursday, April 24, 2008 10:52 AM Eastern Standard Time
To: Jeff Squyres (jsquyres)
Cc: Open MPI Developers
Subject: Re: [OMPI devel] Merging in the CPC work
The trivial tests pass, and now I'm running full testing.
In the NOT_XRC tests I got:
libibcm: unable to open /dev/infiniband/ucm0
libibcm: couldn't read ABI version
But the tests pass successfully, so as I understand it, OOB is being
used. Can we prevent the message somehow?
Jeff Squyres wrote:
> Thanks! That's a result of some [helpful] error messages and handling
> that I added yesterday when ibcm is not configured on the host.
>
> Fixed in r18273.
>
>
> On Apr 24, 2008, at 8:22 AM, Pavel Shamis (Pasha) wrote:
>
>> The patch below resolves the segfault :
>>
>> --- btl_openib_connect_ibcm.c.orig 2008-04-24 15:14:28.500676000 +0300
>> +++ btl_openib_connect_ibcm.c 2008-04-24 15:15:08.961168000 +0300
>> @@ -328,7 +328,7 @@
>> {
>> int rc;
>> modex_msg_t *msg;
>> - ibcm_module_t *m;
>> + ibcm_module_t *m = NULL;
>> opal_list_item_t *item;
>> ibcm_listen_cm_id_t *cmh;
>> ibcm_module_list_item_t *imli;
>>
>>
>> Jeff Squyres wrote:
>>> I had a linker error with the rdmacm library yesterday that I fixed
>>> later, sorry.
>>>
>>> Could you try it again? You'll need to svn up, re-autogen, etc. It
>>> should be obvious whether I fixed it -- even trivial apps will work
>>> or not work.
>>>
>>> Thanks.
>>>
>>>
>>> On Apr 24, 2008, at 6:24 AM, Gleb Natapov wrote:
>>>
>>>> On Thu, Apr 24, 2008 at 11:50:10AM +0300, Pavel Shamis (Pasha) wrote:
>>>>> Jeff,
>>>>> All my tests fail.
>>>>> XRC disabled tests failed with:
>>>>> mtt/installs/Zq_9/install/lib/openmpi/mca_btl_openib.so: undefined
>>>>> symbol: rdma_create_event_channel
>>>>> XRC enabled failed with segfault, I will take a look later today.
>>>> Well it is a little bit better for me. I compiled only OOB connection
>>>> manager and ompi passes simple testing.
>>>>
>>>>>
>>>>> Pasha
>>>>>
>>>>> Jeff Squyres wrote:
>>>>>> As we discussed yesterday, I have started the merge from the
>>>>>> /tmp-public/openib-cpc2 branch. "oob" is currently the default.
>>>>>>
>>>>>> Unfortunately, it caused quite a few conflicts when I merged with
>>>>>> the trunk, so I created a new temp branch and put all the work there:
>>>>>> /tmp-public/openib-cpc3.
>>>>>>
>>>>>> Could all the IB and iWARP vendors and any other interested parties
>>>>>> please try this branch before we bring it back to the trunk? Please
>>>>>> test all functionality that you care about -- XRC, etc. I'd like to
>>>>>> bring it back to the trunk COB Thursday. Please let me know if this
>>>>>> is too soon.
>>>>>>
>>>>>> You can force the selection of a different CPC with the
>>>>>> btl_openib_cpc_include MCA param:
>>>>>>
>>>>>> mpirun --mca btl_openib_cpc_include oob ...
>>>>>> mpirun --mca btl_openib_cpc_include xoob ...
>>>>>> mpirun --mca btl_openib_cpc_include rdma_cm ...
>>>>>> mpirun --mca btl_openib_cpc_include ibcm ...
>>>>>>
>>>>>> You might want to concentrate on testing oob and xoob to ensure that
>>>>>> we didn't cause any regressions. The ibcm and rdma_cm CPCs probably
>>>>>> still have some rough edges (and the IBCM package in OFED itself may
>>>>>> not be 100% -- that's one of the things we're evaluating. It's known
>>>>>> to not install properly on RHEL4U4, for example -- you have to
>>>>>> manually mknod and chmod a device in /dev/infiniband for every HCA in
>>>>>> the host).
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Pavel Shamis (Pasha)
>>>>> Mellanox Technologies
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>> --
>>>> Gleb.
>>>
>>>
>>
>>
>> --
>> Pavel Shamis (Pasha)
>> Mellanox Technologies
>>
>
>
--
Pavel Shamis (Pasha)
Mellanox Technologies
--
Jeff Squyres
Cisco Systems