William Kennington <w...@google.com> writes:
>> On Nov 4, 2017, at 2:14 AM, Michael Ellerman <m...@ellerman.id.au> wrote:
>>
>> "William A. Kennington III" <w...@google.com> writes:
>>
>>> The current code checks the completion map to look for the first token
>>> that is complete. In some cases, a completion can come in but the token
>>> can still be on lease to the caller processing the completion. If this
>>> completed but unreleased token is the first token found in the bitmap by
>>> another task trying to acquire a token, then the __test_and_set_bit
>>> call will fail since the token will still be on lease. The acquisition
>>> will then fail with an EBUSY.
>>>
>>> This patch reorganizes the acquisition code to look at the
>>> opal_async_token_map for an unleased token. If the token has no lease,
>>> it must have no outstanding completions, so we should never see an
>>> EBUSY unless we have leased out too many tokens. Since
>>> opal_async_get_token_interruptible is protected by a semaphore, we will
>>> practically never see EBUSY anymore.
>>>
>>> Signed-off-by: William A. Kennington III <w...@google.com>
>>> ---
>>> arch/powerpc/platforms/powernv/opal-async.c | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> I think this is superseded by Cyril's rework (which he's finally
>> posted):
>>
>> http://patchwork.ozlabs.org/patch/833630/
>>
>> If not, please let us know.
>
> Yeah, I think Cyril's rework fixes this. I wasn't sure how long it
> would take for master to receive his changes, so I figured we could use
> something in the interim to fix the locking failures. If his changes
> will be mailed into the next merge window, then we should have the
> issue fixed in master. I understand that rework probably won't make it
> into stable kernels? If not, then we should probably send this along to
> stable kernel maintainers.

OK. I didn't realise the bug was sufficiently bad to need a backport to
stable. To make a backport easier I've merged this patch first, and then
Cyril's on top of it (which essentially deletes this patch).

I assume you've tested this patch at least somewhat? :)

cheers