On Wed, Nov 07, 2007 at 01:16:04PM -0500, George Bosilca wrote:
>
> On Nov 7, 2007, at 12:51 PM, Jeff Squyres wrote:
>
>>> The same callback is called in both cases. In the case that you
>>> described, the callback is called just a little bit deeper into the
>>> recursion, when in the "normal case" it will get called from the
>>> first level of the recursion. Or maybe I miss something here ...
>>
>> Right -- it's not the callback that is the problem. It's when the
>> recursion is unwound and further up the stack you now have a stale
>> request.
>
> That's exactly the point that I fail to see. If the request is freed in the
> PML callback, then it should get release in both cases, and therefore lead
> to problems all the time. Which, obviously, is not true when we do not have
> this deep recursion thing going on.
>
> Moreover, he request management is based on the reference count. The PML
> level have one ref count and the MPI level have another one. In fact, we
> cannot release a request until we explicitly call ompi_request_free on it.
> The place where this call happens is different between the blocking and non
> blocking calls. In the non blocking case the ompi_request_free get called
> from the *_test (*_wait) functions while in the blocking case it get called
> directly from the MPI_Send function.
>
> Let me summarize: a request cannot reach a stale state without a call to
> ompi_request_free. This function is never called directly from the PML
> level. Therefore, the recursion depth should not have any impact on the
> state of the request !
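
To make the reference-count argument quoted above concrete, here is a
minimal toy sketch in C. It is not Open MPI code: toy_request_t,
toy_release, toy_pml_callback, toy_progress and toy_blocking_send are all
made-up names, and the "one reference per level" model is only an
assumption taken from the description above. The only thing it tries to
show is that a callback firing deep in the recursion cannot leave a stale
request behind while the stack unwinds, as long as the last reference is
dropped at the top of the stack.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not Open MPI's real types. */
typedef struct {
    int  refcount;  /* assumed: one ref for the "PML" level, one for the "MPI" level */
    bool complete;  /* set by the completion callback */
    bool freed;     /* set when the last reference is dropped */
} toy_request_t;

static void toy_request_init(toy_request_t *req)
{
    req->refcount = 2;
    req->complete = false;
    req->freed    = false;
}

/* Drop one reference; the request goes stale only when the count hits zero. */
static void toy_release(toy_request_t *req)
{
    assert(req->refcount > 0);
    if (--req->refcount == 0) {
        req->freed = true;
    }
}

/* Stand-in for the PML completion callback: it marks the request
 * complete and drops the PML-level reference only. */
static void toy_pml_callback(toy_request_t *req)
{
    req->complete = true;
    toy_release(req);
}

/* Stand-in for recursive progress: the callback fires at the deepest
 * level. Returns true if the request was still valid at every level
 * while the recursion unwound. */
static bool toy_progress(toy_request_t *req, int depth)
{
    bool ok;

    if (depth <= 1) {
        toy_pml_callback(req);
        return !req->freed;
    }
    ok = toy_progress(req, depth - 1);
    return ok && !req->freed;  /* inspected again on the way back up */
}

/* Stand-in for a blocking send: the MPI-level reference is released
 * only here, at the top of the stack, after progress has returned. */
static bool toy_blocking_send(int depth)
{
    toy_request_t req;
    bool valid_during_unwind;

    toy_request_init(&req);
    valid_during_unwind = toy_progress(&req, depth);
    toy_release(&req);  /* models the explicit free at the MPI level */
    return valid_during_unwind && req.complete && req.freed;
}
```

In this model toy_blocking_send(1) and toy_blocking_send(50) both return
true: the request stays valid at every level of the unwind regardless of
depth, and it is freed only by the final release at the top.
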

I looked at the code one more time, and it now seems to me that George is
absolutely right. The scenario I described cannot happen because we call
ompi_request_free() at the top of the stack. I somehow had the impression
that we mark internal requests as freed before calling send(). So I'll go
and implement the NOT_ON_WIRE extension when I have time for it.

--
Gleb.