Re: [Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-10-18 Thread Junxiao Bi
Hi Eric,

> 在 2016年10月19日,下午1:19,Eric Ren  写道:
> 
> Hi all!
> 
> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
> results in another deadlock as we have discussed in the recent thread:
>https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012454.html
> 
> Before this one, a similiar deadlock has been fixed by Junxiao:
>commit c25a1e0671fb ("ocfs2: fix posix_acl_create deadlock")
>commit 5ee0fbd50fdf ("ocfs2: revert using ocfs2_acl_chmod to avoid inode 
> cluster lock hang")
> 
> We are in the situation that we have to avoid recursive cluster locking, but
> there is no way to check if a cluster lock has been taken by a precess 
> already.
> 
> Mostly, we can avoid recursive locking by writing code carefully. However, as
> the deadlock issues have proved out, it's very hard to handle the routines
> that are called directly by vfs. For instance:
> 
>const struct inode_operations ocfs2_file_iops = {
>.permission = ocfs2_permission,
>.get_acl= ocfs2_iop_get_acl,
>.set_acl= ocfs2_iop_set_acl,
>};
> 
> 
> ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock().
> The problem is that the call chain of ocfs2_permission() includes *_acl().
> 
> Possibly, there are three solutions I can think of.  The first one is to
> implement the inode permission routine for ocfs2 itself, replacing the
> existing generic_permission(); this will bring lots of changes and
> involve too many trivial vfs functions into ocfs2 code. Frown on this.
> 
> The second one is, what I am trying now, to keep track of the processes who
> lock/unlock a cluster lock by the following draft patches. But, I quickly
> find out that a cluster locking which has been taken by processA can be 
> unlocked
> by processB. For example, systemfiles like journal: is locked during 
> mout, and
> unlocked during umount. 
I had ever implemented generic recursive locking support, please check the 
patch at https://oss.oracle.com/pipermail/ocfs2-devel/2015-December/011408.html 
 , the 
issue that locking and unlocking in different processes was considered. But it 
was rejected by Mark as recursive locking is not allowed in ocfs2/kernel . 
> 
> The thrid one is to revert that problematic commit! It looks like 
> get/set_acl()
> are always been called by other vfs callback like ocfs2_permission(). I think
> we can do this if it's true, right? Anyway, I'll try to work out if it's 
> true;-)
Not sure whether get/set_acl() will be called directly by vfs. Even not now, we 
can’t make sure that in the future. So revert it may be a little risky. But if 
refactor is complicated, then this maybe the only way we can do.

Thanks,
Junxiao.
> 
> Hope for your input to solve this problem;-)
> 
> Thanks,
> Eric
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

[Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking

2016-10-18 Thread Eric Ren
The deadlock issue happens when running discontiguous block
group testing on multiple nodes. The easier way to reproduce
is to do "chmod -R 777 /mnt/ocfs2" things like this on multiple
nodes at the same time by pssh.

This is indeed another deadlock caused by: commit 743b5f1434f5
("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). The reason
had been explained well by Tariq Saeed in this thread:

https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html

For this case, the ocfs2_inode_lock() is misused recursively as below:

do_sys_open
 do_filp_open
  path_openat
   may_open
inode_permission
 __inode_permission
  ocfs2_permission  <== ocfs2_inode_lock()
   generic_permission
get_acl
 ocfs2_iop_get_acl  <== ocfs2_inode_lock()
  ocfs2_inode_lock_full_nested <= deadlock if a remote EX request
comes between two ocfs2_inode_lock()

Fix by checking if the cluster lock has been acquired aready in the call-chain
path.

Fixes: commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Eric Ren 
---
 fs/ocfs2/acl.c | 39 +++
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..7e3544e 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,24 @@ int ocfs2_set_acl(handle_t *handle,
 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
struct buffer_head *bh = NULL;
+   struct ocfs2_holder *oh;
+   struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
int status = 0;
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
+   oh = ocfs2_is_locked_by_me(lockres);
+   if (!oh) {
+   status = ocfs2_inode_lock(inode, &bh, 1);
+   if (status < 0) {
+   if (status != -ENOENT)
+   mlog_errno(status);
+   return status;
+   }
}
+
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+
+   if (!oh)
+   ocfs2_inode_unlock(inode, 1);
brelse(bh);
return status;
 }
@@ -302,21 +310,28 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, 
int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
+   struct ocfs2_holder *oh;
+   struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres;
int ret;
 
osb = OCFS2_SB(inode->i_sb);
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, &di_bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
+
+   oh = ocfs2_is_locked_by_me(lockres);
+   if (!oh) {
+   ret = ocfs2_inode_lock(inode, &di_bh, 0);
+   if (ret < 0) {
+   if (ret != -ENOENT)
+   mlog_errno(ret);
+   return ERR_PTR(ret);
+   }
}
 
acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-   ocfs2_inode_unlock(inode, 0);
+   if (!oh)
+   ocfs2_inode_unlock(inode, 0);
brelse(di_bh);
return acl;
 }
-- 
2.6.6


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [DRAFT 1/2] ocfs2/dlmglue: keep track of the processes who take/put a cluster lock

2016-10-18 Thread Eric Ren
We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a
precess already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs. For instance:

const struct inode_operations ocfs2_file_iops = {
.permission = ocfs2_permission,
.get_acl= ocfs2_iop_get_acl,
.set_acl= ocfs2_iop_set_acl,
};

ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock().
The problem is that the call chain of ocfs2_permission() includes *_acl().

Possibly, there are two solutions I can think of. One way is to
implement the inode permission routine for ocfs2 itself, replacing the
existing generic_permission(); this will bring lots of changes and
involve too many trivial vfs functions into ocfs2 code.

Another way is to keep track of the processes who lock/unlock a cluster
lock. This patch provides ocfs2_is_locked_by_me() for process to check
if the cluster lock has been requested before. This is now only used for
avoiding recursive locking, though it also can help debug cluster locking
issue. Unfortunately, this may incur some performance lost.

Signed-off-by: Eric Ren 
---
 fs/ocfs2/dlmglue.c | 60 ++
 fs/ocfs2/dlmglue.h | 13 
 fs/ocfs2/ocfs2.h   |  1 +
 3 files changed, 74 insertions(+)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 83d576f..9f91884 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
init_waitqueue_head(&res->l_event);
INIT_LIST_HEAD(&res->l_blocked_list);
INIT_LIST_HEAD(&res->l_mask_waiters);
+   INIT_LIST_HEAD(&res->l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,48 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
 }
 
+static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_holder *oh = kmalloc(sizeof(struct ocfs2_holder), 
GFP_KERNEL);
+
+   INIT_LIST_HEAD(&oh->oh_list);
+   oh->oh_lockres = lockres;
+   oh->oh_owner_pid =  get_pid(task_pid(current));
+
+   spin_lock(&lockres->l_lock);
+   list_add_tail(&oh->oh_list, &lockres->l_holders);
+   spin_unlock(&lockres->l_lock);
+}
+
+static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_holder *oh)
+{
+   spin_lock(&lockres->l_lock);
+   list_del(&oh->oh_list);
+   spin_unlock(&lockres->l_lock);
+
+   put_pid(oh->oh_owner_pid);
+   kfree(oh);
+}
+
+struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_holder *oh;
+   struct pid *pid;
+
+   spin_lock(&lockres->l_lock);
+   pid = task_pid(current);
+   list_for_each_entry(oh, &lockres->l_holders, oh_list) {
+   if (oh->oh_owner_pid == pid)
+   goto out;
+   }
+   oh = NULL;
+out:
+   spin_unlock(&lockres->l_lock);
+
+   return oh;
+}
+
 static inline void ocfs2_inc_holders(struct ocfs2_lock_res *lockres,
 int level)
 {
@@ -1392,6 +1435,7 @@ static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
int noqueue_attempted = 0;
int dlm_locked = 0;
int kick_dc = 0;
+   struct ocfs2_holder *oh;
 
if (!(lockres->l_flags & OCFS2_LOCK_INITIALIZED)) {
mlog_errno(-EINVAL);
@@ -1403,6 +1447,14 @@ static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
if (lockres->l_ops->flags & LOCK_TYPE_USES_LVB)
lkm_flags |= DLM_LKF_VALBLK;
 
+   /* This block is just used to check recursive locking now */
+   oh = ocfs2_is_locked_by_me(lockres);
+   if (unlikely(oh))
+   mlog_bug_on_msg(1, "PID(%d) locks on lockres(%s) recursively\n",
+   pid_nr(oh->oh_owner_pid), lockres->l_name);
+   else
+   ocfs2_add_holder(lockres);
+
 again:
wait = 0;
 
@@ -1596,6 +1648,14 @@ static void __ocfs2_cluster_unlock(struct ocfs2_super 
*osb,
   unsigned long caller_ip)
 {
unsigned long flags;
+   struct ocfs2_holder *oh = ocfs2_is_locked_by_me(lockres);
+
+   /* This block is just used to check recursive locking now */
+   if (unlikely(!oh))
+   mlog_bug_on_msg(1, "PID(%d) unlock lockres(%s) unnecessarily\n",
+   pid_nr(task_pid(current)), lockres->l_name);
+   else
+   ocfs2_remove_holder(lockres, oh);
 
spin_lock_irqsave(&lockres->l_lock, flags);
ocfs2_dec_holders(lockres, level);
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index d293a22..3b1d4e7 100644
--- a/fs/ocfs2/dlmglue.h
+++ 

[Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?

2016-10-18 Thread Eric Ren
Hi all!

Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in another deadlock as we have discussed in the recent thread:
https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012454.html

Before this one, a similiar deadlock has been fixed by Junxiao:
commit c25a1e0671fb ("ocfs2: fix posix_acl_create deadlock")
commit 5ee0fbd50fdf ("ocfs2: revert using ocfs2_acl_chmod to avoid inode 
cluster lock hang")

We are in the situation that we have to avoid recursive cluster locking, but
there is no way to check if a cluster lock has been taken by a precess already.

Mostly, we can avoid recursive locking by writing code carefully. However, as
the deadlock issues have proved out, it's very hard to handle the routines
that are called directly by vfs. For instance:

const struct inode_operations ocfs2_file_iops = {
.permission = ocfs2_permission,
.get_acl= ocfs2_iop_get_acl,
.set_acl= ocfs2_iop_set_acl,
};


ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock().
The problem is that the call chain of ocfs2_permission() includes *_acl().

Possibly, there are three solutions I can think of.  The first one is to
implement the inode permission routine for ocfs2 itself, replacing the
existing generic_permission(); this will bring lots of changes and
involve too many trivial vfs functions into ocfs2 code. Frown on this.

The second one is, what I am trying now, to keep track of the processes who
lock/unlock a cluster lock by the following draft patches. But, I quickly
find out that a cluster locking which has been taken by processA can be unlocked
by processB. For example, systemfiles like journal: is locked during mout, 
and
unlocked during umount. 

The thrid one is to revert that problematic commit! It looks like get/set_acl()
are always been called by other vfs callback like ocfs2_permission(). I think
we can do this if it's true, right? Anyway, I'll try to work out if it's true;-)

Hope for your input to solve this problem;-)

Thanks,
Eric


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel