...actually, thinking about this a bit more, it might be easy to try to stat 
/dev/infiniband/ucmX before calling ib_cm_open_device.  I'll check into it this 
afternoon.
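
A rough sketch of that check, under some assumptions: the helper names below (char_device_exists, ucm_device_present) are hypothetical and not part of libibcm or Open MPI, and the device path pattern is simply the one from the error message quoted further down. It only uses stat(2) to see whether the node exists and is a character device. It does not call ib_cm_open_device itself.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 if `path` exists and is a character device, 0 otherwise. */
int char_device_exists(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        return 0;   /* missing, or some path component is not accessible */
    }
    return S_ISCHR(st.st_mode) ? 1 : 0;
}

/* Hypothetical helper (name is an assumption): build
 * /dev/infiniband/ucm<index> and check that the node is there. */
int ucm_device_present(int index)
{
    char path[64];
    int n = snprintf(path, sizeof(path), "/dev/infiniband/ucm%d", index);

    if (n < 0 || (size_t)n >= sizeof(path)) {
        return 0;   /* formatting error or truncated path */
    }
    return char_device_exists(path);
}
```

The idea would be to skip the ibcm CPC when ucm_device_present() returns 0, so the library never gets the chance to print its warning. Note that stat only shows the node exists; opening it could still fail on permissions, so this narrows the problem rather than eliminating it.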

-jms
Sent from my PDA.  No type good.

 -----Original Message-----
From:   Jeff Squyres (jsquyres)
Sent:   Thursday, April 24, 2008 10:56 AM Eastern Standard Time
To:     pa...@dev.mellanox.co.il
Cc:     Open MPI Developers
Subject:        Re: [OMPI devel] Merging in the CPC work

It's unavoidable in the current rev of libibcm :( - Sean Hefty tells me that 
he'll remove that message in the next release.

For the time being, maybe the right solution in OMPI is to not try to use ibcm 
unless it's specifically requested.  :(

-jms
Sent from my PDA.  No type good.

 -----Original Message-----
From:   Pavel Shamis (Pasha) [mailto:pa...@dev.mellanox.co.il]
Sent:   Thursday, April 24, 2008 10:52 AM Eastern Standard Time
To:     Jeff Squyres (jsquyres)
Cc:     Open MPI Developers
Subject:        Re: [OMPI devel] Merging in the CPC work

The trivial tests pass, and I'm now running the full test suite.
In the NOT_XRC tests I got:

libibcm: unable to open /dev/infiniband/ucm0
libibcm: couldn't read ABI version

But the tests pass successfully, so as I understand it, it uses OOB. Can we 
prevent the message somehow?

Jeff Squyres wrote:
> Thanks!  That's a result of some [helpful] error messages and handling 
> that I added yesterday when ibcm is not configured on the host.
>
> Fixed in r18273.
>
>
> On Apr 24, 2008, at 8:22 AM, Pavel Shamis (Pasha) wrote:
>
>> The patch below resolves the segfault :
>>
>> --- btl_openib_connect_ibcm.c.orig      2008-04-24 15:14:28.500676000 +0300
>> +++ btl_openib_connect_ibcm.c   2008-04-24 15:15:08.961168000 +0300
>> @@ -328,7 +328,7 @@
>> {
>>    int rc;
>>    modex_msg_t *msg;
>> -    ibcm_module_t *m;
>> +    ibcm_module_t *m = NULL;
>>    opal_list_item_t *item;
>>    ibcm_listen_cm_id_t *cmh;
>>    ibcm_module_list_item_t *imli;
>>
>>
>> Jeff Squyres wrote:
>>> I had a linker error with the rdmacm library yesterday that I fixed 
>>> later, sorry.
>>>
>>> Could you try it again?  You'll need to svn up, re-autogen, etc.  It 
>>> should be obvious whether I fixed it -- even trivial apps will work 
>>> or not work.
>>>
>>> Thanks.
>>>
>>>
>>> On Apr 24, 2008, at 6:24 AM, Gleb Natapov wrote:
>>>
>>>> On Thu, Apr 24, 2008 at 11:50:10AM +0300, Pavel Shamis (Pasha) wrote:
>>>>> Jeff,
>>>>> All my tests fail.
>>>>> XRC disabled tests failed with:
>>>>> mtt/installs/Zq_9/install/lib/openmpi/mca_btl_openib.so: undefined
>>>>> symbol: rdma_create_event_channel
>>>>> XRC enabled failed with a segfault; I will take a look later today.
>>>> Well it is a little bit better for me. I compiled only OOB connection
>>>> manager and ompi passes simple testing.
>>>>
>>>>>
>>>>> Pasha
>>>>>
>>>>> Jeff Squyres wrote:
>>>>>> As we discussed yesterday, I have started the merge from the
>>>>>> /tmp-public/openib-cpc2 branch.  "oob" is currently the default.
>>>>>>
>>>>>> Unfortunately, it caused quite a few conflicts when I merged with the
>>>>>> trunk, so I created a new temp branch and put all the work there:
>>>>>> /tmp-public/openib-cpc3.
>>>>>>
>>>>>> Could all the IB and iWARP vendors and any other interested parties
>>>>>> please try this branch before we bring it back to the trunk?  Please
>>>>>> test all functionality that you care about -- XRC, etc.  I'd like to
>>>>>> bring it back to the trunk COB Thursday.  Please let me know if this
>>>>>> is too soon.
>>>>>>
>>>>>> You can force the selection of a different CPC with the
>>>>>> btl_openib_cpc_include MCA param:
>>>>>>
>>>>>>    mpirun --mca btl_openib_cpc_include oob ...
>>>>>>    mpirun --mca btl_openib_cpc_include xoob ...
>>>>>>    mpirun --mca btl_openib_cpc_include rdma_cm ...
>>>>>>    mpirun --mca btl_openib_cpc_include ibcm ...
>>>>>>
>>>>>> You might want to concentrate on testing oob and xoob to ensure that
>>>>>> we didn't cause any regressions.  The ibcm and rdma_cm CPCs probably
>>>>>> still have some rough edges (and the IBCM package in OFED itself may
>>>>>> not be 100% -- that's one of the things we're evaluating.  It's known
>>>>>> to not install properly on RHEL4U4, for example -- you have to
>>>>>> manually mknod and chmod a device in /dev/infiniband for every HCA in
>>>>>> the host).
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> Pavel Shamis (Pasha)
>>>>> Mellanox Technologies
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>> -- 
>>>>            Gleb.
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>>
>>
>>
>> -- 
>> Pavel Shamis (Pasha)
>> Mellanox Technologies
>>
>
>


-- 
Pavel Shamis (Pasha)
Mellanox Technologies
