Hi,

I have had these patches lying around for a long time, and it is probably
time to bring them up. The series makes three changes to dlm recovery
handling:

1.

The dlm_lsop_recover_prep() callback should be called once after the
lockspace is stopped, and not again if the lockspace is already stopped
while recovery is running.

This changes the currently possible sequence:

dlm_lsop_recover_prep()
...
dlm_lsop_recover_prep()
dlm_lsop_recover_done()

so that only one prep call is possible:

dlm_lsop_recover_prep()
dlm_lsop_recover_done()
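
As a rough illustration of the intended behavior, here is a small
userspace model. It is only a sketch and not the actual fs/dlm code: the
model_* names and the "running" flag are made up for this example. The
prep callback fires only on the running -> stopped transition, so
stopping an already stopped lockspace does not notify the user again.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in fs/dlm. */
struct model_ls {
	bool running;    /* true while the lockspace is not stopped */
	int prep_calls;  /* how often the prep callback fired */
	int done_calls;  /* how often the done callback fired */
};

/* Stand-ins for the user's recover_prep/recover_done callbacks. */
static void model_recover_prep(struct model_ls *ls) { ls->prep_calls++; }
static void model_recover_done(struct model_ls *ls) { ls->done_calls++; }

/* Stop the lockspace; notify only on the running -> stopped edge. */
static void model_ls_stop(struct model_ls *ls)
{
	bool was_running = ls->running;

	ls->running = false;
	if (was_running)
		model_recover_prep(ls);
}

/* Recovery finished: mark the lockspace running again and notify. */
static void model_ls_recover_finish(struct model_ls *ls)
{
	ls->running = true;
	model_recover_done(ls);
}
```

Calling model_ls_stop() twice in a row and then
model_ls_recover_finish() gives exactly one prep call and one done call
in this model.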

2.

When a lockspace is created with new_lockspace(), we wait until the
members have been pinged successfully, and then new_lockspace() returns
to the caller. However, recovery might still be running at that point.
Almost all dlm users work around this by waiting for the
dlm_lsop_recover_done() callback to know that the lockspace can be used.
With this series, new_lockspace() waits until recovery is done. This
should be backwards compatible with existing dlm users, but they can
drop their own handling if they want.
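
The workaround pattern looks roughly like the following userspace model.
This is a sketch under assumptions, not code from an existing dlm user:
the model_* names are hypothetical, and pthread primitives stand in for
whatever waiting mechanism a kernel user would pick. The done callback
sets a flag and wakes waiters; the caller blocks on that flag after
lockspace creation has returned.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of the workaround; not a real dlm user. */
struct model_user {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool recover_done;  /* set once the done callback has fired */
};

static void model_user_init(struct model_user *u)
{
	pthread_mutex_init(&u->lock, NULL);
	pthread_cond_init(&u->cond, NULL);
	u->recover_done = false;
}

/* Stand-in for the user's recover_done callback. */
static void model_recover_done_cb(struct model_user *u)
{
	pthread_mutex_lock(&u->lock);
	u->recover_done = true;
	pthread_cond_broadcast(&u->cond);
	pthread_mutex_unlock(&u->lock);
}

/* Called after lockspace creation returned; blocks until the callback ran. */
static void model_wait_for_recovery(struct model_user *u)
{
	pthread_mutex_lock(&u->lock);
	while (!u->recover_done)
		pthread_cond_wait(&u->cond, &u->lock);
	pthread_mutex_unlock(&u->lock);
}
```

If lockspace creation itself only returns after recovery, a user can
drop this kind of wait, while users that keep it still work because the
flag is already set by the time they check it.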

3.

There are two ways recovery can be triggered. Either somebody called
new_lockspace(), which means a waiter waits until recovery is done, or it
is a completely asynchronous process, e.g. nodes joining or leaving the
lockspace. In the async case no caller waits for dlm recovery to finish,
so there is no error handling that reacts to possible recovery errors.
This patch series introduces a "best effort" approach to simply
retry/schedule() the recovery on error, in the hope that the error gets
resolved. If it is not resolved within 5 retries, panic() will fence the
node.
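
The retry idea can be sketched as a userspace model like the one below.
It is not the code from the patches: the model_* names are hypothetical,
and counting "5 retries" as five additional attempts after the first
failure is my assumption for this example. An attempt returning -EAGAIN
is retried; any other result is passed through; once the retries are
used up the caller has to escalate.

```c
#include <errno.h>

#define MODEL_MAX_RETRIES 5  /* assumed: retries after the first attempt */

/* Hypothetical recovery attempt: 0 on success, negative errno on error. */
typedef int (*model_recover_fn)(void *arg);

/*
 * Retry while the attempt reports -EAGAIN. Returns 0 on success, any
 * other error unchanged, or -EAGAIN once the retries are exhausted
 * (where the series would panic() so that the node gets fenced).
 */
static int model_recover_with_retry(model_recover_fn recover, void *arg)
{
	int retries = 0;
	int rv;

	for (;;) {
		rv = recover(arg);
		if (rv != -EAGAIN)
			return rv;       /* success or non-retryable error */
		if (retries == MODEL_MAX_RETRIES)
			return -EAGAIN;  /* exhausted, caller escalates */
		retries++;
		/* kernel code would yield here, e.g. via schedule() */
	}
}

/* Example attempt that fails a fixed number of times, then succeeds. */
struct model_flaky {
	int failures_left;
	int calls;
};

static int model_flaky_recover(void *arg)
{
	struct model_flaky *f = arg;

	f->calls++;
	if (f->failures_left > 0) {
		f->failures_left--;
		return -EAGAIN;
	}
	return 0;
}
```

With an attempt that fails three times, the loop succeeds on the fourth
call. With one that keeps failing, it gives up after the first attempt
plus five retries.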

- Alex

Alexander Aring (7):
  fs: dlm: add notes for recovery and membership handling
  fs: dlm: call dlm_lsop_recover_prep once
  fs: dlm: let new_lockspace() wait until recovery
  fs: dlm: handle recovery result outside of ls_recover
  fs: dlm: handle recovery -EAGAIN case as retry
  fs: dlm: change -EINVAL recovery error to -EAGAIN
  fs: dlm: add WARN_ON for non waiter case

 fs/dlm/dlm_internal.h |  4 +--
 fs/dlm/lock.c         |  5 +++-
 fs/dlm/lockspace.c    |  9 ++++---
 fs/dlm/member.c       | 30 +++++++++++-----------
 fs/dlm/recoverd.c     | 60 ++++++++++++++++++++++++++++++++++++++++---
 5 files changed, 82 insertions(+), 26 deletions(-)

-- 
2.31.1
