On 3/1/2017 4:51 PM, Christoph Hellwig wrote:

> On Wed, Mar 01, 2017 at 04:30:26PM +0200, Noa Osherovich wrote:
>> Analysis:
>> Since ib_comp_wq isn't single threaded, two work items can run in parallel
>> for the same CQ, executing __ib_process_cq.
> They shouldn't.  Each CQ has a single work_struct, and any given work_struct
> should only be executing on one worker at a time:
>
> "Note that the flag ``WQ_NON_REENTRANT`` no longer exists as all
> workqueues are now non-reentrant - any work item is guaranteed to be
> executed by at most one worker system-wide at any given time."
>
>> Since this function isn't thread safe and the wc array is shared, it causes
>> data corruption, which eventually crashes in the MAD layer due to a double
>> list_del of the same element.
> This should not be the case.  What kernel version are you testing and does
> it contain any patches touching core kernel code?

Thanks, Christoph, for the quick response.

Currently we see this only on old kernels. I'll investigate further and send
an update.


Reply via email to