On 05/16/2011 10:56 AM, Steve Wise wrote:
On 05/16/2011 10:53 AM, Roland Dreier wrote:
On Mon, May 16, 2011 at 8:18 AM, Steve Wise<sw...@opengridcomputing.com>  wrote:
Roland, I need to recall this patch.  It appears to have some problem.
Good thing I didn't apply it yet.

I did think it should be possible to declare wait objects on the stack... the
origin of completions was to handle exactly that issue; the older semaphore
structure was vulnerable to the wakeup after free, but completions handle
that.  But I didn't look at the cxgb4 code at all...

I'm pretty sure the race exists as described in my commit comment. And I don't see how the wait object code could avoid this issue. Unless I'm just all wrong. :)



FYI:

So the crash I kept seeing was that my wake up path would end up accessing a freed wait object. I added a 100us udelay in this path just after setting the wait condition to true, but before calling wake_up(). With this delay, I hit the crash much more quickly. So in my mind I proved that the race exists. I then added this commit to get rid of on-stack wait objects, but left the 100us delay in my code and tested that the crash goes away. So I proved, in my mind, that 1) this was a real issue, and 2) my fix worked. I then ran some regression tests to convince myself the fix was stable before posting the patch on Friday. However our QA seems to be seeing some issues with this patch on various backports with older kernels, so I need to investigate this further. Thus I recalled the patch.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to