On Wed, Nov 07, 2007 at 01:16:04PM -0500, George Bosilca wrote:
>
> On Nov 7, 2007, at 12:51 PM, Jeff Squyres wrote:
>
>>> The same callback is called in both cases. In the case that you
>>> described, the callback is called just a little deeper into the
>>> recursion, whereas in the "normal case" it is called from the
>>> first level of the recursion. Or maybe I am missing something here ...
>>
>> Right -- it's not the callback that is the problem.  It's when the
>> recursion is unwound and further up the stack you now have a stale
>> request.
>
> That's exactly the point that I fail to see. If the request is freed in the
> PML callback, then it should get released in both cases, and therefore lead
> to problems all the time. Obviously that is not the case when we do not
> have this deep recursion going on.
>
> Moreover, the request management is based on reference counting. The PML
> level holds one reference and the MPI level holds another one. In fact, we
> cannot release a request until we explicitly call ompi_request_free on it.
> Where this call happens differs between the blocking and non-blocking
> calls. In the non-blocking case ompi_request_free gets called from the
> *_test (*_wait) functions, while in the blocking case it gets called
> directly from the MPI_Send function.
>
> Let me summarize: a request cannot reach a stale state without a call to
> ompi_request_free. This function is never called directly from the PML
> level. Therefore, the recursion depth should not have any impact on the
> state of the request!

I looked at the code one more time, and it now seems to me that George is
absolutely right. The scenario I described cannot happen because we call
ompi_request_free() at the top of the stack. I somehow had the
impression that we mark internal requests as freed before calling
send(). So I'll go ahead and implement the NOT_ON_WIRE extension when I
have time for it.
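To make George's argument concrete, here is a minimal toy model of the two-reference scheme, assuming a simplified request with one reference held by the PML and one by the MPI layer. The names (`toy_request_t`, `toy_pml_complete_cb`, `toy_request_free`, `toy_blocking_send`) are hypothetical stand-ins, not the real ompi_request_t or PML code. The completion callback drops only the PML reference, however deep in the progress recursion it fires. The MPI-level reference is dropped only by the top-level free, so the request is still valid while the recursion unwinds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified request (NOT the real ompi_request_t). */
typedef struct {
    int refcount;   /* one reference for the PML, one for the MPI layer */
    bool complete;  /* set by the PML completion callback */
    bool released;  /* set when the last reference is dropped */
} toy_request_t;

/* Drop one reference; the request is released only when none remain. */
static void toy_request_drop(toy_request_t *req) {
    if (--req->refcount == 0)
        req->released = true;
}

/* PML completion callback: marks completion and drops ONLY the PML ref. */
static void toy_pml_complete_cb(toy_request_t *req) {
    req->complete = true;
    toy_request_drop(req);
}

/* Simulated progress recursion: the callback fires `depth` levels down. */
static void toy_progress(toy_request_t *req, int depth) {
    if (depth == 0) {
        toy_pml_complete_cb(req);
        return;
    }
    toy_progress(req, depth - 1);
    /* While unwinding, the MPI-level reference keeps the request alive. */
    assert(!req->released);
}

/* Stand-in for ompi_request_free: drops the MPI-level reference. */
static void toy_request_free(toy_request_t *req) {
    toy_request_drop(req);
}

/* Blocking send: the free happens at the top of the stack, after progress.
 * (In the non-blocking case the same free would come from *_test/*_wait.)
 * Returns true if the request was completed but not yet released after
 * progress, and released only after the explicit free. */
bool toy_blocking_send(int recursion_depth) {
    toy_request_t req = { .refcount = 2, .complete = false, .released = false };
    toy_progress(&req, recursion_depth);
    bool alive_after_progress = req.complete && !req.released;
    toy_request_free(&req);
    return alive_after_progress && req.released;
}
```

In this model `toy_blocking_send(0)` and `toy_blocking_send(3)` both return true, which illustrates the point that recursion depth alone cannot make the request stale as long as the PML never calls the free itself.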

--
                        Gleb.
