William Kennington <w...@google.com> writes:

>> On Nov 4, 2017, at 2:14 AM, Michael Ellerman <m...@ellerman.id.au> wrote:
>> 
>> "William A. Kennington III" <w...@google.com> writes:
>> 
>>> The current code checks the completion map to look for the first token
>>> that is complete. In some cases, a completion can come in but the token
>>> can still be on lease to the caller processing the completion. If this
>>> completed but unreleased token is the first token found in the bitmap by
>>> another task trying to acquire a token, then the __test_and_set_bit
>>> call will fail since the token will still be on lease. The acquisition
>>> will then fail with an EBUSY.
>>> 
>>> This patch reorganizes the acquisition code to look at the
>>> opal_async_token_map for an unleased token. If the token has no lease it
>>> must have no outstanding completions so we should never see an EBUSY,
>>> unless we have leased out too many tokens. Since
>>> opal_async_get_token_interruptible is protected by a semaphore, we will
>>> practically never see EBUSY anymore.
>>> 
>>> Signed-off-by: William A. Kennington III <w...@google.com>
>>> ---
>>> arch/powerpc/platforms/powernv/opal-async.c | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> I think this is superseded by Cyril's rework (which he's finally
>> posted):
>> 
>>  http://patchwork.ozlabs.org/patch/833630/
>> 
>> If not please let us know.
>
> Yeah, I think Cyril’s rework fixes this. I wasn’t sure how long it
> would take for master to receive his changes so I figured we could use
> something in the interim to fix the locking failures. If his changes
> will be mailed into the next merge window then we should have the
> issue fixed in master. I understand that rework probably won’t make it
> into stable kernels? If not then we should probably send this along to
> stable kernel maintainers.

OK. I didn't realise the bug was sufficiently bad to need a backport
to stable.

To make a backport easier I've merged this patch first, and then Cyril's
on top of it (which essentially deletes this patch).
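
For anyone following along, here is a rough user-space sketch of the
race the commit message describes. It is not the code in opal-async.c:
the struct, the function names and MAX_TOKENS are made up for
illustration, and the real code does this under a spinlock using the
kernel bitmap helpers rather than open-coded loops.

```c
#include <assert.h>
#include <errno.h>

#define MAX_TOKENS 8u /* hypothetical; the real limit comes from firmware */

/* Simplified model: one bit per token, no locking. */
struct token_state {
	unsigned int complete_map; /* bit set: no completion outstanding */
	unsigned int lease_map;    /* bit set: token is leased to a caller */
};

static void model_init(struct token_state *s)
{
	s->complete_map = (1u << MAX_TOKENS) - 1; /* all tokens idle */
	s->lease_map = 0;                         /* nothing leased */
}

/* Old scheme: take the first completed token, then try to lease it. */
static int acquire_by_completion(struct token_state *s)
{
	for (unsigned int t = 0; t < MAX_TOKENS; t++) {
		if (!(s->complete_map & (1u << t)))
			continue;
		if (s->lease_map & (1u << t))
			return -EBUSY; /* completed, but the owner has not released it */
		s->lease_map |= 1u << t;
		s->complete_map &= ~(1u << t);
		return (int)t;
	}
	return -EBUSY; /* no completed token at all */
}

/* Patched scheme: take the first token that is not on lease. */
static int acquire_by_lease(struct token_state *s)
{
	for (unsigned int t = 0; t < MAX_TOKENS; t++) {
		if (s->lease_map & (1u << t))
			continue;
		s->lease_map |= 1u << t;
		s->complete_map &= ~(1u << t);
		return (int)t;
	}
	return -EBUSY; /* every token is leased out */
}

/* A completion arrives; the caller may still hold the lease. */
static void complete_token(struct token_state *s, int t)
{
	assert(t >= 0 && (unsigned int)t < MAX_TOKENS);
	s->complete_map |= 1u << t;
}

/* The caller is done processing and gives the token back. */
static void release_token(struct token_state *s, int t)
{
	assert(t >= 0 && (unsigned int)t < MAX_TOKENS);
	s->lease_map &= ~(1u << t);
}
```

In this model, leasing a token and then calling complete_token() on it
without release_token() makes acquire_by_completion() stop at that
token and return -EBUSY even though other tokens are free, while
acquire_by_lease() skips it and hands out the next unleased one.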

I assume you've tested this patch at least somewhat? :)

cheers
