Seems to be fixed.

On 7/14/08, Lenny Verkhovsky <lenny.verkhov...@gmail.com> wrote:
>
> ../configure --with-memory-manager=ptmalloc2 --with-openib
>
> I guess not. I always use the same configure line, and I only recently started
> to see this error.
>
> On 7/13/08, Jeff Squyres <jsquy...@cisco.com> wrote:
>>
>> I think you said opposite things: Lenny's command line did not
>> specifically ask for ibcm, but it was used anyway.  Lenny -- did you
>> explicitly request it somewhere else (e.g., env var or MCA param file)?
>>
>> I suspect that you did not; I suspect (without looking at the code again)
>> that ibcm tried to select itself and failed on the ibcm_listen() call, so it
>> fell back to oob.  This might have to be another workaround in OMPI, perhaps
>> something like this:
>>
>> if (ibcm_listen() fails)
>>   if (ibcm explicitly requested)
>>       print_warning()
>>   fail to use ibcm
>>
>> Has this been filed as a bug at openfabrics.org?  I don't think that I
>> filed it when Brad and I were testing on RoadRunner -- it would probably be
>> good if someone filed it.
>>
>>
>>
>> On Jul 13, 2008, at 8:56 AM, Lenny Verkhovsky wrote:
>>
>>  Pasha is right, I didn't disable it.
>>>
>>> On 7/13/08, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> wrote:
>>> Jeff Squyres wrote:
>>> Brad and I did some scale testing of IBCM and saw this error sometimes.
>>>  It seemed to happen with higher frequency when you increased the number of
>>> processes on a single node.
>>>
>>> I talked to Sean Hefty about it, but we never figured out a definitive
>>> cause or solution.  My best guess is that there is something wonky about
>>> multiple processes simultaneously interacting with the IBCM kernel driver
>>> from userspace; but I don't know jack about kernel stuff, so that's a total
>>> SWAG.
>>>
>>> Thanks for reminding me of this issue; I admit that I had forgotten about
>>> it.  :-(  Pasha -- should IBCM not be the default?
>>>
>>> It is not the default. I guess Lenny configured it explicitly, didn't he?
>>>
>>> Pasha.
>>>
>>>
>>>
>>>
>>>
>>> On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:
>>>
>>> Hi,
>>>
>>> I am getting this error sometimes.
>>>
>>> /home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile
>>> /home/USERS/lenny/TESTS/COMPILERS/hostfile
>>> /home/USERS/lenny/TESTS/COMPILERS/hello
>>> [witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query]
>>> failed to ib_cm_listen 10 times: rc=-1, errno=22
>>> Hello world! I'm 0 of 100 on witch2
>>>
>>>
>>> Best Regards
>>>
>>> Lenny.
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>>
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>>
>
>
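
For reference, the fallback Jeff sketches in pseudocode above (warn only if ibcm was explicitly requested, then decline ibcm so that oob gets used) might look roughly like the C below. This is only a minimal sketch: every name in it (cpc_t, listen_fn, select_cpc, the listen_ok/listen_fail stubs) is hypothetical and made up for illustration; none of them are Open MPI or OFED symbols, and the real listen call is replaced by a function pointer so the logic can be exercised without hardware.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names throughout; none of these are Open MPI symbols. */
typedef enum { CPC_OOB, CPC_IBCM } cpc_t;

/* Stand-in for the real listen call: returns 0 on success, -1 on failure. */
typedef int (*listen_fn)(void);

/* Stubs that simulate a working and a failing listen call. */
static int listen_ok(void)   { return 0; }
static int listen_fail(void) { return -1; }

/*
 * Try ibcm first. If the listen call fails, print a warning only when
 * the user explicitly asked for ibcm, then fall back to oob.
 * *warnings is incremented once per warning printed.
 */
static cpc_t select_cpc(listen_fn ibcm_listen, bool ibcm_requested,
                        int *warnings)
{
    if (ibcm_listen() != 0) {
        if (ibcm_requested) {
            fprintf(stderr,
                    "warning: ibcm was requested but listen failed; "
                    "falling back to oob\n");
            ++*warnings;
        }
        return CPC_OOB;   /* ibcm declines to be selected */
    }
    return CPC_IBCM;
}
```

Calling select_cpc(listen_fail, false, &n) returns CPC_OOB silently, which matches the case where ibcm merely tried to select itself; passing true for the second argument returns the same result but also prints the warning.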
