Hi David,

On 2021/8/13 1:45, David Teigland wrote:
> On Thu, Aug 12, 2021 at 01:44:53PM +0800, Gang He wrote:
>> In fact, I can reproduce this problem reliably.
>> I want to know whether this error is expected, since no extreme
>> stress test is running.
>> Second, how should we handle these error cases? Call dlm_lock()
>> again? The function may fail again, which could lead to a kernel
>> soft lockup after multiple retries.

> What's probably happening is that ocfs2 calls dlm_unlock(CANCEL) to cancel
> an in-progress dlm_lock() request.  Before the cancel completes (or the
> original request completes), ocfs2 calls dlm_lock() again on the same
> resource.  This dlm_lock() returns -EBUSY because the previous request has
> not completed, either normally or by cancellation.  This is expected.
Are these dlm_lock() and dlm_unlock() calls invoked on the same node, or on different nodes?


> A couple options to try: wait for the original request to complete
> (normally or by cancellation) before calling dlm_lock() again, or retry
> dlm_lock() on -EBUSY.
If I retry dlm_lock() repeatedly, I wonder whether this will lead to a kernel soft lockup or waste a lot of CPU time.
If dlm_lock() returns -EAGAIN, how should we handle that case?
Should we retry it repeatedly?
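
To make the retry concern concrete, here is a minimal user-space sketch of a
bounded retry that sleeps between attempts instead of spinning. It is only an
illustration under stated assumptions: fake_dlm_lock(), struct fake_lock,
backoff_sleep_ms() and lock_with_bounded_retry() are hypothetical names, and
the stub merely imitates a call that returns -EBUSY while an earlier request
is still pending. It is not the real dlm_lock() interface and not ocfs2 code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical stand-in for a lock request: it reports -EBUSY while an
 * earlier request on the same resource is still pending, then succeeds. */
struct fake_lock {
    int pending_polls;  /* calls left until the earlier request is done */
};

static int fake_dlm_lock(struct fake_lock *lk)
{
    if (lk->pending_polls > 0) {
        lk->pending_polls--;
        return -EBUSY;
    }
    return 0;
}

/* Sleep between attempts so the loop gives up the CPU instead of spinning. */
static void backoff_sleep_ms(long ms)
{
    struct timespec ts = {
        .tv_sec = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L,
    };
    nanosleep(&ts, NULL);
}

/* Retry only on -EBUSY, at most max_tries attempts, doubling the delay
 * from 1 ms up to 8 ms. Returns 0 on success or the last error code;
 * *tries_out receives the number of attempts made. */
static int lock_with_bounded_retry(struct fake_lock *lk, int max_tries,
                                   int *tries_out)
{
    long delay_ms = 1;
    int ret = -EBUSY;
    int tries = 0;

    assert(max_tries > 0);
    while (tries < max_tries) {
        ret = fake_dlm_lock(lk);
        tries++;
        if (ret != -EBUSY)
            break;              /* success, or an error we do not retry */
        if (tries < max_tries) {
            backoff_sleep_ms(delay_ms);
            if (delay_ms < 8)
                delay_ms *= 2;
        }
    }
    if (tries_out)
        *tries_out = tries;
    return ret;
}
```

The two points of the sketch are the cap on attempts, so that a request which
never completes ends in an error instead of an endless loop, and the sleep
between attempts, so that the retry path does not burn CPU. A caller would
invoke lock_with_bounded_retry(&lk, 10, &tries) and treat a returned -EBUSY as
"still busy after 10 attempts". In kernel code the sleep would have to be
replaced by whatever blocking primitive is safe in the calling context.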

Thanks
Gang


> Dave

