This patch mainly reverts what commit b92a4e3f86b1 ("fs: dlm: change
posix lock sigint handling") introduced, with two exceptions: it keeps
the check under ops_lock of whether op->done became true after the wait
was interrupted, and it changes the "no op" messages to debug printouts.

There are currently problems with cleaning up pending operations. The
main idea of commit b92a4e3f86b1 ("fs: dlm: change posix lock sigint
handling") was to wait for a reply and, if the wait was interrupted, to
run the cleanup routines, e.g. list_del() and do_unlock_close(), once
that reply arrived. This requires that an answer comes back in
dev_write() for every dlm op request. However, do_unlock_close() is not
handled on a per-request basis in the dlm user space software. It cleans
up everything that matches certain plock op fields, so we no longer get
a result back for every request. This leaves entries in the dlm plock
recv_list that will never be deleted.
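
The leak described above can be illustrated with a toy user-space
model. This is only a sketch, not kernel code: all sim_* names are
hypothetical, and it assumes that an unlock-close makes user space drop
every request of the matching owner without answering any of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_OPS 8

/* Hypothetical stand-in for one plock request; not the kernel's plock_op. */
struct sim_op {
	unsigned long long owner;
	bool on_recv_list;	/* kernel side still waits for a reply */
	bool reply_queued;	/* user space will still answer this request */
};

struct sim_state {
	struct sim_op ops[SIM_MAX_OPS];
	size_t count;
};

/* Send a request: the kernel queues it and user space plans to answer it. */
static void sim_send(struct sim_state *s, unsigned long long owner)
{
	assert(s->count < SIM_MAX_OPS);
	s->ops[s->count].owner = owner;
	s->ops[s->count].on_recv_list = true;
	s->ops[s->count].reply_queued = true;
	s->count++;
}

/*
 * Assumed user space behaviour on an unlock-close: every request of this
 * owner is dropped, and none of the dropped requests is ever answered.
 */
static void sim_userspace_close(struct sim_state *s, unsigned long long owner)
{
	for (size_t i = 0; i < s->count; i++)
		if (s->ops[i].owner == owner)
			s->ops[i].reply_queued = false;
}

/* Deliver all replies; in this model only a reply unlinks an op. */
static void sim_deliver_replies(struct sim_state *s)
{
	for (size_t i = 0; i < s->count; i++) {
		if (s->ops[i].reply_queued) {
			s->ops[i].reply_queued = false;
			s->ops[i].on_recv_list = false;
		}
	}
}

/* Count ops that are still linked although no reply will ever arrive. */
static size_t sim_leftovers(const struct sim_state *s)
{
	size_t n = 0;

	for (size_t i = 0; i < s->count; i++)
		if (s->ops[i].on_recv_list && !s->ops[i].reply_queued)
			n++;
	return n;
}
```

Sending two requests for one owner and one for another, closing the
first owner and then delivering all replies leaves two entries behind
in this model. That is the kind of leftover the "wait for a reply, then
clean up" scheme cannot remove.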

This was confirmed with a new debugfs entry that shows whether some
plock lists still have entries left when there is no longer any posix
lock activity, as checked with dlm_tool plocks $LS. In the specific
test case, the following command was executed on a gfs2 mountpoint:

stress-ng --fcntl 32

and the stress-ng program was killed after a certain time.

do_unlock_close() cleans up more than just one specific operation, and
by the time it runs the dlm operation has already been removed by
list_del(). This list_del() can operate on either send_list or
recv_list. If it hits recv_list, answers can still come back for an
ongoing operation, because do_unlock_close() is not synchronized with
the list_del(). This ends in a "no op ..." log_print(). To avoid
confusing the user about such messages, which seem to be there by
design, we move this logging to pr_debug(), as those are expected log
messages.
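
The two orderings of an interrupt and a reply can be sketched with a
small single-threaded model. It is an illustration under assumptions,
not the code in fs/dlm/plock.c: the sim_* names are hypothetical, the
list lookup is collapsed into an on_list flag, and the locking that
ops_lock provides in the kernel is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sim_result { SIM_COMPLETED, SIM_CANCELLED };

/* Hypothetical stand-in for a waiting op; not the kernel's plock_op. */
struct sim_wait_op {
	bool done;	/* a reply was matched to this op */
	bool on_list;	/* still linked on send_list or recv_list */
};

/*
 * Reply path. Returns false in the "no op" case, i.e. when the op was
 * already unlinked by an interrupted waiter. That case is expected,
 * which is why it is only worth a debug message.
 */
static bool sim_handle_reply(struct sim_wait_op *op)
{
	assert(op != NULL);
	if (!op->on_list)
		return false;
	op->on_list = false;
	op->done = true;
	return true;
}

/*
 * Interrupted waiter. In the kernel this check runs under ops_lock. If
 * the reply won the race, the interrupt is ignored. Otherwise the op is
 * unlinked right away, and the caller then issues the unlock-close and
 * frees the op.
 */
static enum sim_result sim_wait_interrupted(struct sim_wait_op *op)
{
	assert(op != NULL);
	if (op->done)
		return SIM_COMPLETED;
	op->on_list = false;
	return SIM_CANCELLED;
}
```

If the interrupt comes first, a later reply finds no op and is reported
as "no op". If the reply comes first, the waiter sees done set and
completes normally.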
Cc: [email protected]
Fixes: b92a4e3f86b1 ("fs: dlm: change posix lock sigint handling")
Signed-off-by: Alexander Aring <[email protected]>
---
fs/dlm/plock.c | 25 ++++++-------------------
1 file changed, 6 insertions(+), 19 deletions(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index ff364901f22b..fea2157fac5b 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -30,8 +30,6 @@ struct plock_async_data {
struct plock_op {
struct list_head list;
int done;
- /* if lock op got interrupted while waiting dlm_controld reply */
- bool sigint;
struct dlm_plock_info info;
/* if set indicates async handling */
struct plock_async_data *data;
@@ -167,12 +165,14 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
spin_unlock(&ops_lock);
goto do_lock_wait;
}
-
- op->sigint = true;
+ list_del(&op->list);
spin_unlock(&ops_lock);
+
log_debug(ls, "%s: wait interrupted %x %llx pid %d",
__func__, ls->ls_global_id,
(unsigned long long)number, op->info.pid);
+ do_unlock_close(&op->info);
+ dlm_release_plock_op(op);
goto out;
}
@@ -434,19 +434,6 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
if (iter->info.fsid == info.fsid &&
iter->info.number == info.number &&
iter->info.owner == info.owner) {
- if (iter->sigint) {
- list_del(&iter->list);
- spin_unlock(&ops_lock);
-
- pr_debug("%s: sigint cleanup %x %llx pid %d",
- __func__, iter->info.fsid,
- (unsigned long long)iter->info.number,
- iter->info.pid);
- do_unlock_close(&iter->info);
- memcpy(&iter->info, &info, sizeof(info));
- dlm_release_plock_op(iter);
- return count;
- }
list_del_init(&iter->list);
memcpy(&iter->info, &info, sizeof(info));
if (iter->data)
@@ -465,8 +452,8 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
else
wake_up(&recv_wq);
} else
- log_print("%s: no op %x %llx", __func__,
- info.fsid, (unsigned long long)info.number);
+ pr_debug("%s: no op %x %llx", __func__,
+ info.fsid, (unsigned long long)info.number);
return count;
}
--
2.31.1