If you are going to look at it, I will not bother with this.

Rich


On 8/29/07 10:47 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:

> On Wed, Aug 29, 2007 at 10:46:06AM -0400, Richard Graham wrote:
>> Gleb,
>>   Are you looking at this ?
> Not today. And I need the code to reproduce the bug. Is this possible?
> 
>> 
>> Rich
>> 
>> 
>> On 8/29/07 9:56 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
>> 
>>> On Wed, Aug 29, 2007 at 04:48:07PM +0300, Gleb Natapov wrote:
>>>> Is this trunk or 1.2?
>>> Oops. I should read more carefully :) This is trunk.
>>> 
>>>> 
>>>> On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
>>>>> I have a program that does a simple bucket brigade of sends and receives,
>>>>> where rank 0 is the start and repeatedly sends to rank 1 until a certain
>>>>> amount of time has passed, and then it sends an "all done" packet.
>>>>> 
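>>>>> Roughly, the pattern is something like the following sketch (simplified;
>>>>> the real test runs for a fixed amount of time and then sends the "all
>>>>> done" packet, so the fixed message count here is only a stand-in and this
>>>>> is not the exact code):
>>>>>
>>>>>   #include <mpi.h>
>>>>>
>>>>>   #define NMSGS 100000              /* stand-in for the timed loop */
>>>>>
>>>>>   int main(int argc, char **argv)
>>>>>   {
>>>>>       int rank, size, i, buf = 0;
>>>>>
>>>>>       MPI_Init(&argc, &argv);
>>>>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>
>>>>>       for (i = 0; i < NMSGS; i++) {
>>>>>           if (0 == rank) {
>>>>>               /* head of the brigade: keep pushing messages downstream */
>>>>>               MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
>>>>>           } else {
>>>>>               /* middle/tail ranks: receive from the left neighbor and,
>>>>>                  if there is a right neighbor, forward to it */
>>>>>               MPI_Recv(&buf, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
>>>>>                        MPI_STATUS_IGNORE);
>>>>>               if (rank < size - 1) {
>>>>>                   MPI_Send(&buf, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
>>>>>               }
>>>>>           }
>>>>>       }
>>>>>
>>>>>       MPI_Finalize();
>>>>>       return 0;
>>>>>   }
>>>>>
>>>>> I run it over just the shared memory path, e.g.:
>>>>>
>>>>>   mpirun -np 3 --mca btl self,sm ./brigade
>>>>>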
>>>>> Running this with np=2 always works.  However, when I run with more than
>>>>> 2 processes using only the SM BTL, the program usually hangs, and one of
>>>>> the processes has a deep stack with many repetitions of the following
>>>>> three calls:
>>>>> 
>>>>>   [25] opal_progress(), line 187 in "opal_progress.c"
>>>>>   [26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
>>>>>   [27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
>>>>> 
>>>>> When stepping through the ompi_fifo_write_to_head routine, it looks like
>>>>> the fifo has overflowed.
>>>>> 
>>>>> I am wondering if what is happening is that rank 0 has sent a bunch of
>>>>> messages that have exhausted the resources, such that one of the middle
>>>>> ranks, which is in the middle of sending, cannot complete its send and
>>>>> therefore never gets to the point of trying to receive the messages from
>>>>> rank 0.
>>>>> 
>>>>> Is the above a possible scenario or are messages periodically bled off
>>>>> the SM BTL's fifos?
>>>>> 
>>>>> Note, I have seen np=3 pass sometimes, and I can get it to pass reliably
>>>>> if I raise the amount of shared memory used by the BTL.  This is using
>>>>> the trunk.
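>>>>> (Raising it means bumping the SM mpool size via an MCA parameter on the
>>>>> mpirun command line; ompi_info will list the exact parameter names for
>>>>> the sm mpool on a given build, e.g.:
>>>>>
>>>>>   ompi_info --param mpool sm
>>>>>
>>>>> and the chosen parameter can then be passed with --mca to mpirun.)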
>>>>> 
>>>>> 
>>>>> --td
>>>>> 
>>>>> 
>>>> 
>>>> --
>>>> Gleb.
>>> 
>>> --
>>> Gleb.
>> 
> 
> --
> Gleb.
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
