Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-12 Thread Junxiao Bi
On 01/05/2017 11:31 PM, Eric Ren wrote:
> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
> results in a deadlock, as the author "Tariq Saeed" realized shortly
> after the patch was merged. The discussion happened here
> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
> 
> The reason why taking the cluster inode lock at vfs entry points opens up
> a self-deadlock window is explained in the previous patch of this
> series.
> 
> So far, we have seen two different code paths that have this issue.
> 1. do_sys_open
>     may_open
>      inode_permission
>       ocfs2_permission
>        ocfs2_inode_lock() <=== take PR
>         generic_permission
>          get_acl
>           ocfs2_iop_get_acl
>            ocfs2_inode_lock() <=== take PR
> 2. fchmod|fchmodat
>     chmod_common
>      notify_change
>       ocfs2_setattr <=== take EX
>        posix_acl_chmod
>         get_acl
>          ocfs2_iop_get_acl <=== take PR
>         ocfs2_iop_set_acl <=== take EX
> 
> Fix them by applying the tracking logic (introduced in the previous
> patch) to the functions above: ocfs2_permission(),
> ocfs2_iop_[set|get]_acl(), and ocfs2_setattr().
> 
> Signed-off-by: Eric Ren 
> ---
>  fs/ocfs2/acl.c  | 39 ++-
>  fs/ocfs2/file.c | 44 ++--
>  2 files changed, 68 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
> index bed1fcb..c539890 100644
> --- a/fs/ocfs2/acl.c
> +++ b/fs/ocfs2/acl.c
> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  {
>   struct buffer_head *bh = NULL;
>   int status = 0;
> -
> - status = ocfs2_inode_lock(inode, &bh, 1);
> + int arg_flags = 0, has_locked;
> + struct ocfs2_holder oh;
> + struct ocfs2_lock_res *lockres;
> +
> + lockres = &OCFS2_I(inode)->ip_inode_lockres;
> + has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
> + if (has_locked)
> + arg_flags = OCFS2_META_LOCK_GETBH;
> + status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>   if (status < 0) {
>   if (status != -ENOENT)
>   mlog_errno(status);
>   return status;
>   }
> + if (!has_locked)
> + ocfs2_add_holder(lockres, &oh);
> +
>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
> - ocfs2_inode_unlock(inode, 1);
> +
> + if (!has_locked) {
> + ocfs2_remove_holder(lockres, &oh);
> + ocfs2_inode_unlock(inode, 1);
> + }
>   brelse(bh);
> +
>   return status;
>  }
>  
> @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
>   struct buffer_head *di_bh = NULL;
>   struct posix_acl *acl;
>   int ret;
> + int arg_flags = 0, has_locked;
> + struct ocfs2_holder oh;
> + struct ocfs2_lock_res *lockres;
>  
>   osb = OCFS2_SB(inode->i_sb);
>   if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>   return NULL;
> - ret = ocfs2_inode_lock(inode, &di_bh, 0);
> +
> + lockres = &OCFS2_I(inode)->ip_inode_lockres;
> + has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
> + if (has_locked)
> + arg_flags = OCFS2_META_LOCK_GETBH;
> + ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
>   if (ret < 0) {
>   if (ret != -ENOENT)
>   mlog_errno(ret);
>   return ERR_PTR(ret);
>   }
> + if (!has_locked)
> + ocfs2_add_holder(lockres, &oh);
>  
>   acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>  
> - ocfs2_inode_unlock(inode, 0);
> + if (!has_locked) {
> + ocfs2_remove_holder(lockres, &oh);
> + ocfs2_inode_unlock(inode, 0);
> + }
>   brelse(di_bh);
> +
>   return acl;
>  }
>  
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index c488965..62be75d 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1138,6 +1138,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>   handle_t *handle = NULL;
>   struct dquot *transfer_to[MAXQUOTAS] = { };
>   int qtype;
> + int arg_flags = 0, had_lock;
> + struct ocfs2_holder oh;
> + struct ocfs2_lock_res *lockres;
>  
>   trace_ocfs2_setattr(inode, dentry,
>   (unsigned long long)OCFS2_I(inode)->ip_blkno,
> @@ -1173,13 +1176,20 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>   }
>   }
>  
> - status = ocfs2_inode_lock(inode, &bh, 1);
> + lockres = &OCFS2_I(inode)->ip_inode_lockres;
> + had_lock = (ocfs2_is_locked_by_me(lockres) != NULL);

If had_lock==true here, isn't it a bug? I think we should BUG_ON in that
case, so that we can catch the bug the first time it happens.
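Something like this on top of your patch (just a sketch of the idea, not
tested):

	had_lock = (ocfs2_is_locked_by_me(lockres) != NULL);
	/*
	 * ocfs2_setattr() is not expected to run in a context that
	 * already holds the inode lock, so a holder found here could
	 * only have been leaked by an earlier operation.
	 */
	BUG_ON(had_lock);
	status = ocfs2_inode_lock_full(inode, &bh, 1, 0);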


> + if (had_lock)
> + arg_flags = OCFS2_META_LOCK_GETBH;
> + status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>   if (status < 0) {
>   

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-12 Thread Junxiao Bi
On 01/05/2017 11:31 PM, Eric Ren wrote:
> We are in the situation that we have to avoid recursive cluster locking,
> but there is no way to check if a cluster lock has been taken by a
> process already.
> 
> Mostly, we can avoid recursive locking by writing code carefully.
> However, we found that it's very hard to handle the routines that
> are invoked directly by vfs code. For instance:
> 
> const struct inode_operations ocfs2_file_iops = {
>     .permission = ocfs2_permission,
>     .get_acl    = ocfs2_iop_get_acl,
>     .set_acl    = ocfs2_iop_set_acl,
> };
> 
> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
> do_sys_open
>  may_open
>   inode_permission
>    ocfs2_permission
>     ocfs2_inode_lock() <=== first time
>      generic_permission
>       get_acl
>        ocfs2_iop_get_acl
>         ocfs2_inode_lock() <=== recursive one
> 
> A deadlock will occur if a remote EX request comes in between the two
> calls to ocfs2_inode_lock(). Briefly, the deadlock forms as follows:
> 
> On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in the
> BAST (ocfs2_generic_handle_bast) when a downconvert is started on behalf
> of the remote EX lock request. On the other hand, the recursive cluster
> lock (the second one) will be blocked in __ocfs2_cluster_lock() because
> of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why? Because
> there is no chance for the first cluster lock on this node to be
> unlocked - we have blocked ourselves in the code path.
> 
> The idea to fix this issue is mostly taken from the gfs2 code.
> 1. introduce a new field, struct ocfs2_lock_res.l_holders, to keep track
> of the pid of each process that has taken the cluster lock on this lock
> resource;
> 2. introduce a new flag for ocfs2_inode_lock_full(), OCFS2_META_LOCK_GETBH,
> which means: just get back the disk inode bh for us if we have already got
> the cluster lock;
> 3. export a helper, ocfs2_is_locked_by_me(), which checks whether we have
> already got the cluster lock in the upper code path.
> 
> The tracking logic should be used by some of ocfs2's vfs callbacks, to
> solve the recursive locking issue caused by the fact that vfs routines
> can call into each other.
> 
> The performance penalty of processing the holder list should only be seen
> in the few cases where the tracking logic is used, such as get/set acl.
> 
> You may ask: what if the first time we got a PR lock, and the second time
> we want an EX lock? Fortunately, as far as I can see, this case never
> happens in the real world (covering permission checks and
> (get|set)_(acl|attr)), and the gfs2 code makes the same assumption.
> 
> Signed-off-by: Eric Ren 
> ---
>  fs/ocfs2/dlmglue.c | 47 ---
>  fs/ocfs2/dlmglue.h | 18 ++
>  fs/ocfs2/ocfs2.h   |  1 +
>  3 files changed, 63 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 83d576f..500bda4 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
>   init_waitqueue_head(&res->l_event);
>   INIT_LIST_HEAD(&res->l_blocked_list);
>   INIT_LIST_HEAD(&res->l_mask_waiters);
> + INIT_LIST_HEAD(&res->l_holders);
>  }
>  
>  void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
> @@ -749,6 +750,45 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
>   res->l_flags = 0UL;
>  }
>  
> +inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
> +			     struct ocfs2_holder *oh)
> +{
> + INIT_LIST_HEAD(&oh->oh_list);
> + oh->oh_owner_pid = get_pid(task_pid(current));
The struct pid (oh->oh_owner_pid) looks overly complicated here; why not
use a task_struct pointer (current) or a pid_t (current->pid) directly?
Also, I don't see why the reference count needs to be taken here.
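Just a sketch of what I mean; since the holder lives on the caller's stack
and is removed before the function returns, the task cannot go away while
it is on the list, so no extra reference seems needed:

	/* in struct ocfs2_holder: */
	struct task_struct *oh_owner;	/* or: pid_t oh_owner_pid; */

	/* in ocfs2_add_holder(): */
	oh->oh_owner = current;

	/* in ocfs2_is_locked_by_me(): */
	list_for_each_entry(oh, &lockres->l_holders, oh_list) {
		if (oh->oh_owner == current)
			goto out;
	}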

> +
> + spin_lock(&lockres->l_lock);
> + list_add_tail(&oh->oh_list, &lockres->l_holders);
> + spin_unlock(&lockres->l_lock);
> +}
> +
> +inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
> +			     struct ocfs2_holder *oh)
> +{
> + spin_lock(&lockres->l_lock);
> + list_del(&oh->oh_list);
> + spin_unlock(&lockres->l_lock);
> +
> + put_pid(oh->oh_owner_pid);
Same as above.

> +}
> +
> +inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
Agree with Joseph: returning bool looks better. I don't see how the
pointer return value helps debugging, since it is never used.
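I.e. something like below (a quick sketch only):

	inline bool ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
	{
		struct ocfs2_holder *oh;
		struct pid *pid = task_pid(current);
		bool ret = false;

		/* look in the list of holders for one owned by the current task */
		spin_lock(&lockres->l_lock);
		list_for_each_entry(oh, &lockres->l_holders, oh_list) {
			if (oh->oh_owner_pid == pid) {
				ret = true;
				break;
			}
		}
		spin_unlock(&lockres->l_lock);

		return ret;
	}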


> +{
> + struct ocfs2_holder *oh;
> + struct pid *pid;
> +
> + /* look in the list of holders for one with the current task as owner */
> + spin_lock(&lockres->l_lock);
> + pid = task_pid(current);
> + list_for_each_entry(oh, &lockres->l_holders, oh_list) {
> + if (oh->oh_owner_pid == pid)
> + goto out;
> + }
> + oh = NULL;
> +out:
> + spin_unlock(&lockres->l_lock);
> + return oh;
> +}
> +
>  static inline void ocfs2_inc_holders(struct ocfs2_lock_res *lockres,
>int level)
>  {
> @@ -2333,8 +2373,9 @@ 

[Ocfs2-devel] [PATCH v3] ocfs2/journal: fix umount hang after flushing journal failure

2017-01-12 Thread Changwei Ge
Hi Joseph,

Do you think my last version of the patch to fix the umount hang after a
journal flushing failure is OK?

If so, I'd like to ask for Andrew's help to merge this patch into his test
tree.


Thanks,

Br.

Changwei



From 686b52ee2f06395c53e36e2c7515c276dc7541fb Mon Sep 17 00:00:00 2001
From: Changwei Ge 
Date: Wed, 11 Jan 2017 09:05:35 +0800
Subject: [PATCH] fix umount hang after journal flushing failure

Signed-off-by: Changwei Ge 
---
 fs/ocfs2/journal.c |   18 ++
 1 file changed, 18 insertions(+)

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index a244f14..5f3c862 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -2315,6 +2315,24 @@ static int ocfs2_commit_thread(void *arg)
 "commit_thread: %u transactions pending on "
 "shutdown\n",
 atomic_read(>j_num_trans));
+
+   if (status < 0) {
+   mlog(ML_ERROR, "journal is already abort
and cannot be "
+"flushed any more. So ignore
the pending "
+"transactions to avoid blocking
ocfs2 unmount.\n");
+   /*
+* This may a litte hacky, however, no
chance
+* for ocfs2/journal to decrease this
variable
+* thourgh commit-thread. I have to do so to
+* avoid umount hang after journal flushing
+* failure. Since jounral has been
marked ABORT
+* within jbd2_journal_flush, commit
cache will
+* never do any real work to flush
journal to
+* disk.Set it to ZERO so that umount will
+* continue during shutting down journal
+*/
+   atomic_set(>j_num_trans, 0);
+   }
}
}

--
1.7.9.5


Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-12 Thread Eric Ren
Hi Joseph,

On 01/09/2017 10:13 AM, Eric Ren wrote:
>>>> So you are trying to fix it by making phase3 finish without really doing
>>> Phase3 can go ahead because this node is already under protection of
>>> cluster lock.
>> You said it was blocked...
> Oh, sorry, I meant phase3 can go ahead if this patch set is applied;-)
>
>> "On the other hand, the recursive cluster lock (the second one) will be
>> blocked in __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED."
>>>> __ocfs2_cluster_lock, then Process B can continue either.
>>>> Let us bear in mind that phase1 and phase3 are in the same context and
>>>> executed in order. That's why I think there is no need to check if locked
>>>> by myself in phase1.
> Sorry, I still cannot see it. Without keeping track of the first cluster
> lock, how can we know if we are under a context that is already under the
> protection of a cluster lock? How can we handle the recursive locking
> (the second cluster lock) if we don't have this information?
>>>> If phase1 finds it is already locked by myself, that means the holder is
>>>> left by the last operation without decreasing the holder count. That's
>>>> why I think it is a bug instead of a recursive lock case.
> I think I got your point here. Do you mean that we should just add the
> lock holder at the first locking position, without checking before that?
> Unfortunately, it's tricky here to know exactly which ocfs2 routine will
> be the first vfs entry point: ocfs2_get_acl(), for example, can be both
> the first vfs entry point and the second vfs entry point after
> ocfs2_permission(), right?
>
> It would be a coding bug if the problem you are concerned about happened.
> I think we don't need to worry about this much, because the code logic
> here is quite simple;-)
Ping...

Did my last email clear up your doubts? If not, I really want to
understand your point.

If there are any problems, I will fix them in the next version;-)
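
In case it helps, this is the pattern every converted entry point in patch
2/2 follows (just a sketch of the idea, with error handling omitted):

	lockres = &OCFS2_I(inode)->ip_inode_lockres;
	had_lock = (ocfs2_is_locked_by_me(lockres) != NULL);
	if (had_lock)
		/* recursive case: lock is already held, only get the bh back */
		arg_flags = OCFS2_META_LOCK_GETBH;
	status = ocfs2_inode_lock_full(inode, &bh, ex, arg_flags);
	if (!had_lock)
		/* first lock in this context: record ourselves as the holder */
		ocfs2_add_holder(lockres, &oh);

	/* ... do the real work ... */

	if (!had_lock) {
		ocfs2_remove_holder(lockres, &oh);
		ocfs2_inode_unlock(inode, ex);
	}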

Thanks,
Eric

>
> Thanks for your patience!
> Eric
>

