Re: [Ocfs2-devel] [PATCH v2 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-15 Thread Eric Ren
Hi!

On 01/16/2017 02:58 PM, Junxiao Bi wrote:
> On 01/16/2017 02:42 PM, Eric Ren wrote:
>> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
>> results in a deadlock, as the author "Tariq Saeed" realized shortly
>> after the patch was merged. The discussion happened here
>> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>>
>> The reason why taking cluster inode lock at vfs entry points opens up
>> a self deadlock window, is explained in the previous patch of this
>> series.
>>
>> So far, we have seen two different code paths that have this issue.
>> 1. do_sys_open
>>     may_open
>>      inode_permission
>>       ocfs2_permission
>>        ocfs2_inode_lock() <=== take PR
>>         generic_permission
>>          get_acl
>>           ocfs2_iop_get_acl
>>            ocfs2_inode_lock() <=== take PR
>> 2. fchmod|fchmodat
>>     chmod_common
>>      notify_change
>>       ocfs2_setattr <=== take EX
>>        posix_acl_chmod
>>         get_acl
>>          ocfs2_iop_get_acl <=== take PR
>>         ocfs2_iop_set_acl <=== take EX
>>
>> Fixes them by adding the tracking logic (in the previous patch) for
>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>> ocfs2_setattr().
>>
>> Changes since v1:
>> 1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
>> process gets the cluster lock - suggested by: Joseph Qi
>> and Junxiao Bi.
>>
>> 2. Change "struct ocfs2_holder" to a more meaningful name
>> "ocfs2_lock_holder", suggested by: Junxiao Bi.
>>
>> 3. Add debugging output at ocfs2_setattr() and ocfs2_permission() to
>> catch exceptional cases, suggested by: Junxiao Bi .
>>
>> Signed-off-by: Eric Ren 
>> ---
>>   fs/ocfs2/acl.c  | 39 +
>>   fs/ocfs2/file.c | 76 
>> +
>>   2 files changed, 100 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>> index bed1fcb..3e47262 100644
>> --- a/fs/ocfs2/acl.c
>> +++ b/fs/ocfs2/acl.c
>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
>> posix_acl *acl, int type)
>>   {
>>  struct buffer_head *bh = NULL;
>>  int status = 0;
>> -
>> -status = ocfs2_inode_lock(inode, &bh, 1);
>> +int arg_flags = 0, has_locked;
>> +struct ocfs2_lock_holder oh;
>> +struct ocfs2_lock_res *lockres;
>> +
>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>> +has_locked = ocfs2_is_locked_by_me(lockres);
>> +if (has_locked)
>> +arg_flags = OCFS2_META_LOCK_GETBH;
>> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>  if (status < 0) {
>>  if (status != -ENOENT)
>>  mlog_errno(status);
>>  return status;
>>  }
>> +if (!has_locked)
>> +ocfs2_add_holder(lockres, &oh);
>> +
> The same code pattern shows up here and in *get_acl; can it be abstracted
> into one function?
> The same applies to *setattr and *permission. Sorry for not mentioning that
> in the last review.

Good idea! I will do it in the next version;-)

Thanks,
Eric
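
Picking up the suggestion quoted above, the repeated "check holder, lock, add holder" sequence could be folded into a pair of helpers. What follows is only a userspace sketch under stated assumptions: lock_tracked(), unlock_tracked() and struct lockres_stub are hypothetical names, and the two stub functions merely stand in for ocfs2_inode_lock_full() and ocfs2_inode_unlock(). It is not the code from the next version of the patch.

```c
#include <assert.h>

#define META_LOCK_GETBH 0x08    /* mirrors OCFS2_META_LOCK_GETBH from the patch */

/* Hypothetical stand-in for the inode lockres plus its holder tracking. */
struct lockres_stub {
    int held_by_me;     /* what ocfs2_is_locked_by_me() would report */
    int lock_calls;     /* real cluster lock acquisitions */
    int unlock_calls;   /* real cluster lock releases */
};

/* Stub for ocfs2_inode_lock_full(): with GETBH set, no new lock is taken. */
int lock_full_stub(struct lockres_stub *res, int ex, int arg_flags)
{
    (void)ex;
    if (!(arg_flags & META_LOCK_GETBH))
        res->lock_calls++;
    return 0;
}

/* Stub for ocfs2_inode_unlock(). */
void unlock_stub(struct lockres_stub *res, int ex)
{
    (void)ex;
    res->unlock_calls++;
}

/*
 * Returns a negative value on error, 1 if the caller already held the
 * lock (nothing to undo later), or 0 if the lock was newly taken.
 */
int lock_tracked(struct lockres_stub *res, int ex)
{
    int had_lock = res->held_by_me;
    int status = lock_full_stub(res, ex, had_lock ? META_LOCK_GETBH : 0);

    if (status < 0)
        return status;
    if (!had_lock)
        res->held_by_me = 1;    /* ocfs2_add_holder() in the patch */
    return had_lock;
}

/* Undo lock_tracked(); had_lock is the value lock_tracked() returned. */
void unlock_tracked(struct lockres_stub *res, int ex, int had_lock)
{
    assert(had_lock >= 0);
    if (!had_lock) {
        res->held_by_me = 0;    /* ocfs2_remove_holder() in the patch */
        unlock_stub(res, ex);
    }
}
```

With helpers of this shape, each of the four entry points would reduce to one call on entry and one on exit, and only the outermost caller would actually take and release the cluster lock.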

>
> Thanks,
> Junxiao.
>>  status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>> -ocfs2_inode_unlock(inode, 1);
>> +
>> +if (!has_locked) {
>> +ocfs2_remove_holder(lockres, &oh);
>> +ocfs2_inode_unlock(inode, 1);
>> +}
>>  brelse(bh);
>> +
>>  return status;
>>   }
>>   
>> @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
>> *inode, int type)
>>  struct buffer_head *di_bh = NULL;
>>  struct posix_acl *acl;
>>  int ret;
>> +int arg_flags = 0, has_locked;
>> +struct ocfs2_lock_holder oh;
>> +struct ocfs2_lock_res *lockres;
>>   
>>  osb = OCFS2_SB(inode->i_sb);
>>  if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>  return NULL;
>> -ret = ocfs2_inode_lock(inode, &di_bh, 0);
>> +
>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>> +has_locked = ocfs2_is_locked_by_me(lockres);
>> +if (has_locked)
>> +arg_flags = OCFS2_META_LOCK_GETBH;
>> +ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
>>  if (ret < 0) {
>>  if (ret != -ENOENT)
>>  mlog_errno(ret);
>>  return ERR_PTR(ret);
>>  }
>> +if (!has_locked)
>> +ocfs2_add_holder(lockres, &oh);
>>   
>>  acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>>   
>> -ocfs2_inode_unlock(inode, 0);
>> +if (!has_locked) {
>> +ocfs2_remove_holder(lockres, &oh);
>> +ocfs2_inode_unlock(inode, 0);
>> +}
>>  brelse(di_bh);
>> +
>>  return acl;
>>   }
>>   
>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>> index c488965..b620c25 100644
>> --- a/fs/ocfs2/file.c
>> +++ b/fs/ocfs2/file.c
>> @@ -1138,6 +1138,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
>> *attr)
>>  handle_t *handle = NULL;
>>  s

Re: [Ocfs2-devel] [PATCH v2 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-15 Thread Junxiao Bi
On 01/16/2017 02:42 PM, Eric Ren wrote:
> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
> results in a deadlock, as the author "Tariq Saeed" realized shortly
> after the patch was merged. The discussion happened here
> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
> 
> The reason why taking cluster inode lock at vfs entry points opens up
> a self deadlock window, is explained in the previous patch of this
> series.
> 
> So far, we have seen two different code paths that have this issue.
> 1. do_sys_open
>     may_open
>      inode_permission
>       ocfs2_permission
>        ocfs2_inode_lock() <=== take PR
>         generic_permission
>          get_acl
>           ocfs2_iop_get_acl
>            ocfs2_inode_lock() <=== take PR
> 2. fchmod|fchmodat
>     chmod_common
>      notify_change
>       ocfs2_setattr <=== take EX
>        posix_acl_chmod
>         get_acl
>          ocfs2_iop_get_acl <=== take PR
>         ocfs2_iop_set_acl <=== take EX
> 
> Fixes them by adding the tracking logic (in the previous patch) for
> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
> ocfs2_setattr().
> 
> Changes since v1:
> 1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
> process gets the cluster lock - suggested by: Joseph Qi 
> and Junxiao Bi .
> 
> 2. Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
> suggested by: Junxiao Bi .
> 
> 3. Add debugging output at ocfs2_setattr() and ocfs2_permission() to
> catch exceptional cases, suggested by: Junxiao Bi .
> 
> Signed-off-by: Eric Ren 
> ---
>  fs/ocfs2/acl.c  | 39 +
>  fs/ocfs2/file.c | 76 
> +
>  2 files changed, 100 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
> index bed1fcb..3e47262 100644
> --- a/fs/ocfs2/acl.c
> +++ b/fs/ocfs2/acl.c
> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
> posix_acl *acl, int type)
>  {
>   struct buffer_head *bh = NULL;
>   int status = 0;
> -
> - status = ocfs2_inode_lock(inode, &bh, 1);
> + int arg_flags = 0, has_locked;
> + struct ocfs2_lock_holder oh;
> + struct ocfs2_lock_res *lockres;
> +
> + lockres = &OCFS2_I(inode)->ip_inode_lockres;
> + has_locked = ocfs2_is_locked_by_me(lockres);
> + if (has_locked)
> + arg_flags = OCFS2_META_LOCK_GETBH;
> + status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>   if (status < 0) {
>   if (status != -ENOENT)
>   mlog_errno(status);
>   return status;
>   }
> + if (!has_locked)
> + ocfs2_add_holder(lockres, &oh);
> +
The same code pattern shows up here and in *get_acl; can it be abstracted
into one function?
The same applies to *setattr and *permission. Sorry for not mentioning that
in the last review.

Thanks,
Junxiao.
>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
> - ocfs2_inode_unlock(inode, 1);
> +
> + if (!has_locked) {
> + ocfs2_remove_holder(lockres, &oh);
> + ocfs2_inode_unlock(inode, 1);
> + }
>   brelse(bh);
> +
>   return status;
>  }
>  
> @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
> *inode, int type)
>   struct buffer_head *di_bh = NULL;
>   struct posix_acl *acl;
>   int ret;
> + int arg_flags = 0, has_locked;
> + struct ocfs2_lock_holder oh;
> + struct ocfs2_lock_res *lockres;
>  
>   osb = OCFS2_SB(inode->i_sb);
>   if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>   return NULL;
> - ret = ocfs2_inode_lock(inode, &di_bh, 0);
> +
> + lockres = &OCFS2_I(inode)->ip_inode_lockres;
> + has_locked = ocfs2_is_locked_by_me(lockres);
> + if (has_locked)
> + arg_flags = OCFS2_META_LOCK_GETBH;
> + ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
>   if (ret < 0) {
>   if (ret != -ENOENT)
>   mlog_errno(ret);
>   return ERR_PTR(ret);
>   }
> + if (!has_locked)
> + ocfs2_add_holder(lockres, &oh);
>  
>   acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>  
> - ocfs2_inode_unlock(inode, 0);
> + if (!has_locked) {
> + ocfs2_remove_holder(lockres, &oh);
> + ocfs2_inode_unlock(inode, 0);
> + }
>   brelse(di_bh);
> +
>   return acl;
>  }
>  
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index c488965..b620c25 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1138,6 +1138,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   handle_t *handle = NULL;
>   struct dquot *transfer_to[MAXQUOTAS] = { };
>   int qtype;
> + int arg_flags = 0, had_lock;
> + struct ocfs2_lock_holder oh;
> + struct ocfs2_lock_res *lockres;
>  
>   trace_ocfs2_setattr(inode, dentry,
>  

[Ocfs2-devel] [PATCH v2 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-15 Thread Eric Ren
We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has already been taken by
a process.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
    .permission = ocfs2_permission,
    .get_acl    = ocfs2_iop_get_acl,
    .set_acl    = ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
 may_open
  inode_permission
   ocfs2_permission
    ocfs2_inode_lock() <=== first time
     generic_permission
      get_acl
       ocfs2_iop_get_acl
        ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between the two
ocfs2_inode_lock() calls. Briefly, the deadlock is formed as follows:

On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in the
BAST (ocfs2_generic_handle_bast) when a downconvert is started on behalf
of the remote EX lock request. On the other hand, the recursive cluster
lock (the second one) will be blocked in __ocfs2_cluster_lock() because
of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why? Because
there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.
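
To make the cycle above concrete, here is a toy userspace model of a single lock resource. It is a simplified sketch, not dlmglue code: struct lockres_model and the model_*() functions are hypothetical names, "blocked" stands in for OCFS2_LOCK_BLOCKED, and a request that would sleep in the real code simply returns 0 here.

```c
#include <assert.h>

/* Toy model of one lock resource on the local node (hypothetical names). */
struct lockres_model {
    int holders;    /* local tasks currently holding the cluster lock */
    int blocked;    /* models OCFS2_LOCK_BLOCKED: a downconvert is pending */
};

/* A local request is granted only while no downconvert is pending. */
int model_lock(struct lockres_model *r)
{
    if (r->blocked)
        return 0;   /* the real code would sleep here */
    r->holders++;
    return 1;
}

/* Release one local hold on the lock. */
void model_unlock(struct lockres_model *r)
{
    assert(r->holders > 0);
    r->holders--;
}

/* A remote EX request arrives: the BAST marks the lockres as blocked. */
void model_remote_ex_request(struct lockres_model *r)
{
    r->blocked = 1;
}

/* The downconvert can only finish once every local holder has unlocked. */
int model_downconvert(struct lockres_model *r)
{
    if (r->holders > 0)
        return 0;
    r->blocked = 0;
    return 1;
}
```

In this model, once the first model_lock() has succeeded and model_remote_ex_request() has run, a second model_lock() returns 0 and model_downconvert() returns 0 as well. Neither side can make progress until the first holder unlocks, and the recursive caller never gets that far.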

The idea to fix this issue is mostly taken from gfs2 code.
1. introduce a new field: struct ocfs2_lock_res.l_holders, to
keep track of the pids of the processes that have taken the cluster lock
of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
it means just getting back the disk inode bh for us if we've already got
the cluster lock;
3. export a helper: ocfs2_is_locked_by_me(), used to check if we
have already got the cluster lock in the upper code path.
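
The three points above boil down to a per-lockres list of holders that can be searched for the current task. A minimal userspace sketch of that bookkeeping follows. The names struct holder, struct lock_res, add_holder(), remove_holder() and is_locked_by() are hypothetical, a plain int stands in for the owner's struct pid, and the l_lock spinlock is left out; the real helpers are in the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel structures. */
struct holder {
    struct holder *next;
    int owner;              /* stands in for the owner's struct pid */
};

struct lock_res {
    struct holder *holders; /* stands in for l_holders */
};

/* Record that `owner` now holds the cluster lock on `res`. */
void add_holder(struct lock_res *res, struct holder *h, int owner)
{
    assert(h != NULL);
    h->owner = owner;
    h->next = res->holders;
    res->holders = h;
}

/* Drop the record once the owner releases the lock. */
void remove_holder(struct lock_res *res, struct holder *h)
{
    struct holder **pp = &res->holders;

    while (*pp && *pp != h)
        pp = &(*pp)->next;
    if (*pp)
        *pp = h->next;
}

/* Return 1 if `owner` already holds the lock on `res`, else 0. */
int is_locked_by(const struct lock_res *res, int owner)
{
    const struct holder *h;

    for (h = res->holders; h; h = h->next)
        if (h->owner == owner)
            return 1;
    return 0;
}
```

The holder record lives on the caller's stack, as in the patch, so no allocation is needed on the lock path; the cost is that it must be removed before the function that declared it returns.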

The tracking logic should be used by some of the ocfs2 vfs callbacks,
to solve the recursive locking issue caused by the fact that vfs routines
can call into each other.

The performance penalty of processing the holder list should only be seen
in the few cases where the tracking logic is used, such as get/set acl.

You may ask: what if we got a PR lock the first time, and want an EX lock
the second time? Fortunately, this case never happens in the real world,
as far as I can see, including permission check and (get|set)_(acl|attr),
and the gfs2 code does the same.

Changes since v1:
1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

2. Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi .

3. Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren 
---
 fs/ocfs2/dlmglue.c | 48 +---
 fs/ocfs2/dlmglue.h | 18 ++
 fs/ocfs2/ocfs2.h   |  1 +
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..b045f02 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
     init_waitqueue_head(&res->l_event);
     INIT_LIST_HEAD(&res->l_blocked_list);
     INIT_LIST_HEAD(&res->l_mask_waiters);
+    INIT_LIST_HEAD(&res->l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,46 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
     res->l_flags = 0UL;
 }
 
+void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+                      struct ocfs2_lock_holder *oh)
+{
+    INIT_LIST_HEAD(&oh->oh_list);
+    oh->oh_owner_pid = get_pid(task_pid(current));
+
+    spin_lock(&lockres->l_lock);
+    list_add_tail(&oh->oh_list, &lockres->l_holders);
+    spin_unlock(&lockres->l_lock);
+}
+
+void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+                         struct ocfs2_lock_holder *oh)
+{
+    spin_lock(&lockres->l_lock);
+    list_del(&oh->oh_list);
+    spin_unlock(&lockres->l_lock);
+
+    put_pid(oh->oh_owner_pid);
+}
+
+int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+    struct ocfs2_lock_holder *oh;
+    struct pid *pid;
+
+    /* look in the list of holders for one with the current task as owner */
+    spin_lock(&lockres->l_lock);
+    pid = task_pid(current);
+    list_for_each_entry(oh, &lockres->l_holders, oh_list) {
+        if (oh->oh_owner_pid == pid) {
+            spin_unlock(&lockres->l_lock);
+            return 1;
+        }
+    }
+    spin_unlock(&lockres->l_lock);
+
+    return 0;
+}
+
 static inline void ocfs2_inc_holders(struct ocfs2_lock_res *lockres,
                                      int level)
 {
@@ -2333,8 +2374,9 @@ int ocfs2_inod

[Ocfs2-devel] [PATCH v2 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-15 Thread Eric Ren
Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking cluster inode lock at vfs entry points opens up
a self deadlock window, is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
    may_open
     inode_permission
      ocfs2_permission
       ocfs2_inode_lock() <=== take PR
        generic_permission
         get_acl
          ocfs2_iop_get_acl
           ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
    chmod_common
     notify_change
      ocfs2_setattr <=== take EX
       posix_acl_chmod
        get_acl
         ocfs2_iop_get_acl <=== take PR
        ocfs2_iop_set_acl <=== take EX

Fixes them by adding the tracking logic (in the previous patch) for
these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
ocfs2_setattr().

Changes since v1:
1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

2. Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi .

3. Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi .

Signed-off-by: Eric Ren 
---
 fs/ocfs2/acl.c  | 39 +
 fs/ocfs2/file.c | 76 +
 2 files changed, 100 insertions(+), 15 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..3e47262 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
     struct buffer_head *bh = NULL;
     int status = 0;
-
-    status = ocfs2_inode_lock(inode, &bh, 1);
+    int arg_flags = 0, has_locked;
+    struct ocfs2_lock_holder oh;
+    struct ocfs2_lock_res *lockres;
+
+    lockres = &OCFS2_I(inode)->ip_inode_lockres;
+    has_locked = ocfs2_is_locked_by_me(lockres);
+    if (has_locked)
+        arg_flags = OCFS2_META_LOCK_GETBH;
+    status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
     if (status < 0) {
         if (status != -ENOENT)
             mlog_errno(status);
         return status;
     }
+    if (!has_locked)
+        ocfs2_add_holder(lockres, &oh);
+
     status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-    ocfs2_inode_unlock(inode, 1);
+
+    if (!has_locked) {
+        ocfs2_remove_holder(lockres, &oh);
+        ocfs2_inode_unlock(inode, 1);
+    }
     brelse(bh);
+
     return status;
 }
 
@@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
     struct buffer_head *di_bh = NULL;
     struct posix_acl *acl;
     int ret;
+    int arg_flags = 0, has_locked;
+    struct ocfs2_lock_holder oh;
+    struct ocfs2_lock_res *lockres;
 
     osb = OCFS2_SB(inode->i_sb);
     if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
         return NULL;
-    ret = ocfs2_inode_lock(inode, &di_bh, 0);
+
+    lockres = &OCFS2_I(inode)->ip_inode_lockres;
+    has_locked = ocfs2_is_locked_by_me(lockres);
+    if (has_locked)
+        arg_flags = OCFS2_META_LOCK_GETBH;
+    ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
     if (ret < 0) {
         if (ret != -ENOENT)
             mlog_errno(ret);
         return ERR_PTR(ret);
     }
+    if (!has_locked)
+        ocfs2_add_holder(lockres, &oh);
 
     acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-    ocfs2_inode_unlock(inode, 0);
+    if (!has_locked) {
+        ocfs2_remove_holder(lockres, &oh);
+        ocfs2_inode_unlock(inode, 0);
+    }
     brelse(di_bh);
+
     return acl;
 }
 
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..b620c25 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int arg_flags = 0, had_lock;
+   struct ocfs2_lock_holder oh;
+   struct ocfs2_lock_res *lockres;
 
trace_ocfs2_setattr(inode, dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,13 +1176,41 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
}
}
 
-   status = ocfs2_inode_lock(inode, &bh, 1);
+   lockres = &OCFS2_I(inode)->ip_inode_lockres;
+   had_lock = ocfs2_is_locked_by_me(lockres);
+   if (had_lock) {
+   arg_flags = 

[Ocfs2-devel] [PATCH v2 0/2] fix deadlock caused by recursive cluster locking

2017-01-15 Thread Eric Ren
This is a formal patch set v2 to solve the deadlock issue on which I
previously started an RFC (draft patch); the discussion happened here:
[https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html]

Compared to the previous draft patch, this one is much simpler and neater.
It neither messes up the dlmglue core, nor has a performance penalty on
the whole cluster locking system. Instead, it is only used in places where
such recursive cluster locking may happen.
 
Changes since v1: 
1. Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .
 
2. Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi .
 
3. Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi .
 
4. Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .
 
Your comments and feedback are always welcome.

Eric Ren (2):
  ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
  ocfs2: fix deadlock issue when taking inode lock at vfs entry points

 fs/ocfs2/acl.c | 39 
 fs/ocfs2/dlmglue.c | 48 +++---
 fs/ocfs2/dlmglue.h | 18 +
 fs/ocfs2/file.c| 76 +++---
 fs/ocfs2/ocfs2.h   |  1 +
 5 files changed, 164 insertions(+), 18 deletions(-)

-- 
2.10.2


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-15 Thread Eric Ren
Hi Junxiao,
>> OK, good suggestion. Hrm, but in order to align with "ocfs2_inc_holders", I
>> think it's good to keep those function names as it is;-)
> that name is also not very clear. Maybe you can make another patch to
> clear it.

Maybe; the completeness of a name sometimes has to be traded off against
its length. One basic rule is whether the name may confuse the reader.
In this case, "ocfs2_inc_holders" in dlmglue.c sounds good to me and is
not ambiguous.

I want to go with it. Anyone who doesn't like the name can propose a patch
for it;-)

Thanks,
Eric

>
> Thanks,
> Junxiao.
>
>
>> Thanks for your review!
>> Eric
>>
>>> Thanks,
>>> Junxiao.
>>>
 +struct list_head oh_list;
 +struct pid *oh_owner_pid;
 +};
 +
/* ocfs2_inode_lock_full() 'arg_flags' flags */
/* don't wait on recovery. */
#define OCFS2_META_LOCK_RECOVERY (0x01)
 @@ -77,6 +82,8 @@ struct ocfs2_orphan_scan_lvb {
#define OCFS2_META_LOCK_NOQUEUE (0x02)
/* don't block waiting for the downconvert thread, instead return
 -EAGAIN */
#define OCFS2_LOCK_NONBLOCK (0x04)
 +/* just get back disk inode bh if we've got cluster lock. */
 +#define OCFS2_META_LOCK_GETBH (0x08)
  /* Locking subclasses of inode cluster lock */
enum {
 @@ -170,4 +177,15 @@ void ocfs2_put_dlm_debug(struct ocfs2_dlm_debug
 *dlm_debug);
  /* To set the locking protocol on module initialization */
void ocfs2_set_locking_protocol(void);
 +
 +/*
 + * Keep a list of processes who have interest in a lockres.
 + * Note: this is now only uesed for check recursive cluster lock.
 + */
 +inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
 + struct ocfs2_holder *oh);
 +inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
 + struct ocfs2_holder *oh);
 +inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct
 ocfs2_lock_res *lockres);
 +
#endif/* DLMGLUE_H */
 diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
 index 7e5958b..0c39d71 100644
 --- a/fs/ocfs2/ocfs2.h
 +++ b/fs/ocfs2/ocfs2.h
 @@ -172,6 +172,7 @@ struct ocfs2_lock_res {
  struct list_head l_blocked_list;
struct list_head l_mask_waiters;
 +struct list_head l_holders;
  unsigned long l_flags;
char l_name[OCFS2_LOCK_ID_MAX_LEN];

>




Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-15 Thread Eric Ren
On 01/16/2017 11:13 AM, Junxiao Bi wrote:
> On 01/16/2017 11:06 AM, Eric Ren wrote:
>> Hi Junxiao,
>>
>> On 01/16/2017 10:46 AM, Junxiao Bi wrote:
> If had_lock==true, it is a bug? I think we should BUG_ON for it, that
> can help us catch bug at the first time.
 Good idea! But I'm not sure if "ocfs2_setattr" is always the first one
 who takes the cluster lock.
 It's harder for me to name all the possible paths;-/
>>> The BUG_ON() can help catch the path where ocfs2_setattr is not the
>>> first one.
>> Yes, I understand. But, the problem is that the vfs entries calling
>> order is out of our control.
>> I don't want to place an assertion where I'm not 100% sure it's
>> absolutely right;-)
> If it is not the first one, is it another recursive locking bug? In this
> case, if you don't like BUG_ON(), you can dump the call trace and print
> some warning message.

Yes! I like this idea, will add it in next version, thanks!

Eric
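
For illustration, the difference between asserting and merely warning can be sketched in userspace as below. This is an assumption-laden sketch, not the patch: WARN_IF(), warn_on_report(), warn_count and entry_point_flags() are hypothetical names. In the kernel the equivalent would more likely be WARN_ON() or dump_stack() together with an mlog() message.

```c
#include <assert.h>
#include <stdio.h>

/* Number of warnings emitted so far, so callers can observe them. */
int warn_count;

/* Report an unexpected condition but let the caller carry on. */
int warn_on_report(int cond, const char *expr, const char *file, int line)
{
    assert(expr != NULL && file != NULL);
    if (cond) {
        warn_count++;
        fprintf(stderr, "WARNING: unexpected \"%s\" at %s:%d\n",
                expr, file, line);
    }
    return cond;
}

/* Hypothetical userspace analogue of a warn-and-continue check. */
#define WARN_IF(cond) warn_on_report(!!(cond), #cond, __FILE__, __LINE__)

/*
 * Hypothetical entry point: if the lock is unexpectedly already held,
 * warn and reuse it instead of crashing. Returns the arg_flags to use.
 */
int entry_point_flags(int had_lock)
{
    if (WARN_IF(had_lock))
        return 0x08;    /* mirrors OCFS2_META_LOCK_GETBH */
    return 0;
}
```

Unlike BUG_ON(), this keeps the node running when an unanticipated vfs call order shows up, while still leaving a trace that points at the offending path.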

>
> Thanks,
> Junxiao.
>> Thanks,
>> Eric
>>
>>> Thanks,
>>> Junxiao.
>>>
>> +if (had_lock)
>> +arg_flags = OCFS2_META_LOCK_GETBH;
>> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>> if (status < 0) {
>> if (status != -ENOENT)
>> mlog_errno(status);
>> goto bail_unlock_rw;
>> }
>> -inode_locked = 1;
>> +if (!had_lock) {
>> +ocfs2_add_holder(lockres, &oh);
>> +inode_locked = 1;
>> +}
>>   if (size_change) {
>> status = inode_newsize_ok(inode, attr->ia_size);
>> @@ -1260,7 +1270,8 @@ int ocfs2_setattr(struct dentry *dentry, struct
>> iattr *attr)
>> bail_commit:
>> ocfs2_commit_trans(osb, handle);
>> bail_unlock:
>> -if (status) {
>> +if (status && inode_locked) {
>> +ocfs2_remove_holder(lockres, &oh);
>> ocfs2_inode_unlock(inode, 1);
>> inode_locked = 0;
>> }
>> @@ -1278,8 +1289,10 @@ int ocfs2_setattr(struct dentry *dentry,
>> struct iattr *attr)
>> if (status < 0)
>> mlog_errno(status);
>> }
>> -if (inode_locked)
>> +if (inode_locked) {
>> +ocfs2_remove_holder(lockres, &oh);
>> ocfs2_inode_unlock(inode, 1);
>> +}
>>   brelse(bh);
>> return status;
>> @@ -1321,20 +1334,31 @@ int ocfs2_getattr(struct vfsmount *mnt,
>> int ocfs2_permission(struct inode *inode, int mask)
>> {
>> int ret;
>> +int has_locked;
>> +struct ocfs2_holder oh;
>> +struct ocfs2_lock_res *lockres;
>>   if (mask & MAY_NOT_BLOCK)
>> return -ECHILD;
>> -ret = ocfs2_inode_lock(inode, NULL, 0);
>> -if (ret) {
>> -if (ret != -ENOENT)
>> -mlog_errno(ret);
>> -goto out;
>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
> The same thing as ocfs2_setattr.
 OK. I will think over your suggestions!

 Thanks,
 Eric

> Thanks,
> Junxiao.
>> +if (!has_locked) {
>> +ret = ocfs2_inode_lock(inode, NULL, 0);
>> +if (ret) {
>> +if (ret != -ENOENT)
>> +mlog_errno(ret);
>> +goto out;
>> +}
>> +ocfs2_add_holder(lockres, &oh);
>> }
>>   ret = generic_permission(inode, mask);
>> -ocfs2_inode_unlock(inode, 0);
>> +if (!has_locked) {
>> +ocfs2_remove_holder(lockres, &oh);
>> +ocfs2_inode_unlock(inode, 0);
>> +}
>> out:
>> return ret;
>> }
>>
>




Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-15 Thread Junxiao Bi
On 01/16/2017 11:06 AM, Eric Ren wrote:
> Hi Junxiao,
> 
> On 01/16/2017 10:46 AM, Junxiao Bi wrote:
 If had_lock==true, it is a bug? I think we should BUG_ON for it, that
 can help us catch bug at the first time.
>>> Good idea! But I'm not sure if "ocfs2_setattr" is always the first one
>>> who takes the cluster lock.
>>> It's harder for me to name all the possible paths;-/
>> The BUG_ON() can help catch the path where ocfs2_setattr is not the
>> first one.
> Yes, I understand. But, the problem is that the vfs entries calling
> order is out of our control.
> I don't want to place an assertion where I'm not 100% sure it's
> absolutely right;-)
If it is not the first one, is it another recursive locking bug? In this
case, if you don't like BUG_ON(), you can dump the call trace and print
some warning message.

Thanks,
Junxiao.
> 
> Thanks,
> Eric
> 
>>
>> Thanks,
>> Junxiao.
>>

> +if (had_lock)
> +arg_flags = OCFS2_META_LOCK_GETBH;
> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>if (status < 0) {
>if (status != -ENOENT)
>mlog_errno(status);
>goto bail_unlock_rw;
>}
> -inode_locked = 1;
> +if (!had_lock) {
> +ocfs2_add_holder(lockres, &oh);
> +inode_locked = 1;
> +}
>  if (size_change) {
>status = inode_newsize_ok(inode, attr->ia_size);
> @@ -1260,7 +1270,8 @@ int ocfs2_setattr(struct dentry *dentry, struct
> iattr *attr)
>bail_commit:
>ocfs2_commit_trans(osb, handle);
>bail_unlock:
> -if (status) {
> +if (status && inode_locked) {
> +ocfs2_remove_holder(lockres, &oh);
>ocfs2_inode_unlock(inode, 1);
>inode_locked = 0;
>}
> @@ -1278,8 +1289,10 @@ int ocfs2_setattr(struct dentry *dentry,
> struct iattr *attr)
>if (status < 0)
>mlog_errno(status);
>}
> -if (inode_locked)
> +if (inode_locked) {
> +ocfs2_remove_holder(lockres, &oh);
>ocfs2_inode_unlock(inode, 1);
> +}
>  brelse(bh);
>return status;
> @@ -1321,20 +1334,31 @@ int ocfs2_getattr(struct vfsmount *mnt,
>int ocfs2_permission(struct inode *inode, int mask)
>{
>int ret;
> +int has_locked;
> +struct ocfs2_holder oh;
> +struct ocfs2_lock_res *lockres;
>  if (mask & MAY_NOT_BLOCK)
>return -ECHILD;
>-ret = ocfs2_inode_lock(inode, NULL, 0);
> -if (ret) {
> -if (ret != -ENOENT)
> -mlog_errno(ret);
> -goto out;
> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
 The same thing as ocfs2_setattr.
>>> OK. I will think over your suggestions!
>>>
>>> Thanks,
>>> Eric
>>>
 Thanks,
 Junxiao.
> +if (!has_locked) {
> +ret = ocfs2_inode_lock(inode, NULL, 0);
> +if (ret) {
> +if (ret != -ENOENT)
> +mlog_errno(ret);
> +goto out;
> +}
> +ocfs2_add_holder(lockres, &oh);
>}
>  ret = generic_permission(inode, mask);
>-ocfs2_inode_unlock(inode, 0);
> +if (!has_locked) {
> +ocfs2_remove_holder(lockres, &oh);
> +ocfs2_inode_unlock(inode, 0);
> +}
>out:
>return ret;
>}
>
>>
> 




Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-15 Thread Eric Ren
Hi Junxiao,

On 01/16/2017 10:46 AM, Junxiao Bi wrote:
>>> If had_lock==true, it is a bug? I think we should BUG_ON for it, that
>>> can help us catch bug at the first time.
>> Good idea! But I'm not sure if "ocfs2_setattr" is always the first one
>> who takes the cluster lock.
>> It's harder for me to name all the possible paths;-/
> The BUG_ON() can help catch the path where ocfs2_setattr is not the
> first one.
Yes, I understand. But the problem is that the calling order of the vfs
entry points is out of our control.
I don't want to place an assertion where I'm not 100% sure it's absolutely
right;-)

Thanks,
Eric

>
> Thanks,
> Junxiao.
>
>>>
 +if (had_lock)
 +arg_flags = OCFS2_META_LOCK_GETBH;
 +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
if (status < 0) {
if (status != -ENOENT)
mlog_errno(status);
goto bail_unlock_rw;
}
 -inode_locked = 1;
 +if (!had_lock) {
 +ocfs2_add_holder(lockres, &oh);
 +inode_locked = 1;
 +}
  if (size_change) {
status = inode_newsize_ok(inode, attr->ia_size);
 @@ -1260,7 +1270,8 @@ int ocfs2_setattr(struct dentry *dentry, struct
 iattr *attr)
bail_commit:
ocfs2_commit_trans(osb, handle);
bail_unlock:
 -if (status) {
 +if (status && inode_locked) {
 +ocfs2_remove_holder(lockres, &oh);
ocfs2_inode_unlock(inode, 1);
inode_locked = 0;
}
 @@ -1278,8 +1289,10 @@ int ocfs2_setattr(struct dentry *dentry,
 struct iattr *attr)
if (status < 0)
mlog_errno(status);
}
 -if (inode_locked)
 +if (inode_locked) {
 +ocfs2_remove_holder(lockres, &oh);
ocfs2_inode_unlock(inode, 1);
 +}
  brelse(bh);
return status;
 @@ -1321,20 +1334,31 @@ int ocfs2_getattr(struct vfsmount *mnt,
int ocfs2_permission(struct inode *inode, int mask)
{
int ret;
 +int has_locked;
 +struct ocfs2_holder oh;
 +struct ocfs2_lock_res *lockres;
  if (mask & MAY_NOT_BLOCK)
return -ECHILD;
-ret = ocfs2_inode_lock(inode, NULL, 0);
 -if (ret) {
 -if (ret != -ENOENT)
 -mlog_errno(ret);
 -goto out;
 +lockres = &OCFS2_I(inode)->ip_inode_lockres;
 +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>> The same thing as ocfs2_setattr.
>> OK. I will think over your suggestions!
>>
>> Thanks,
>> Eric
>>
>>> Thanks,
>>> Junxiao.
 +if (!has_locked) {
 +ret = ocfs2_inode_lock(inode, NULL, 0);
 +if (ret) {
 +if (ret != -ENOENT)
 +mlog_errno(ret);
 +goto out;
 +}
 +ocfs2_add_holder(lockres, &oh);
}
  ret = generic_permission(inode, mask);
-ocfs2_inode_unlock(inode, 0);
 +if (!has_locked) {
 +ocfs2_remove_holder(lockres, &oh);
 +ocfs2_inode_unlock(inode, 0);
 +}
out:
return ret;
}

>




Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-15 Thread Junxiao Bi
On 01/13/2017 02:19 PM, Eric Ren wrote:
> Hi!
> 
> On 01/13/2017 12:22 PM, Junxiao Bi wrote:
>> On 01/05/2017 11:31 PM, Eric Ren wrote:
>>> Commit 743b5f1434f5 ("ocfs2: take inode lock in
>>> ocfs2_iop_set/get_acl()")
>>> results in a deadlock, as the author "Tariq Saeed" realized shortly
>>> after the patch was merged. The discussion happened here
>>> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>>>
>>>
>>> The reason why taking cluster inode lock at vfs entry points opens up
>>> a self deadlock window, is explained in the previous patch of this
>>> series.
>>>
>>> So far, we have seen two different code paths that have this issue.
>>> 1. do_sys_open
>>>   may_open
>>>inode_permission
>>> ocfs2_permission
>>>  ocfs2_inode_lock() <=== take PR
>>>   generic_permission
>>>get_acl
>>> ocfs2_iop_get_acl
>>>  ocfs2_inode_lock() <=== take PR
>>> 2. fchmod|fchmodat
>>>  chmod_common
>>>   notify_change
>>>ocfs2_setattr <=== take EX
>>> posix_acl_chmod
>>>  get_acl
>>>   ocfs2_iop_get_acl <=== take PR
>>>  ocfs2_iop_set_acl <=== take EX
>>>
>>> Fix them by adding the tracking logic (from the previous patch) to
>>> the functions above: ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>> ocfs2_setattr().
>>>
>>> Signed-off-by: Eric Ren 
>>> ---
>>>   fs/ocfs2/acl.c  | 39 ++-
>>>   fs/ocfs2/file.c | 44 ++--
>>>   2 files changed, 68 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>> index bed1fcb..c539890 100644
>>> --- a/fs/ocfs2/acl.c
>>> +++ b/fs/ocfs2/acl.c
>>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode,
>>> struct posix_acl *acl, int type)
>>>   {
>>>   struct buffer_head *bh = NULL;
>>>   int status = 0;
>>> -
>>> -status = ocfs2_inode_lock(inode, &bh, 1);
>>> +int arg_flags = 0, has_locked;
>>> +struct ocfs2_holder oh;
>>> +struct ocfs2_lock_res *lockres;
>>> +
>>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>> +if (has_locked)
>>> +arg_flags = OCFS2_META_LOCK_GETBH;
>>> +status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>>   if (status < 0) {
>>>   if (status != -ENOENT)
>>>   mlog_errno(status);
>>>   return status;
>>>   }
>>> +if (!has_locked)
>>> +ocfs2_add_holder(lockres, &oh);
>>> +
>>>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>>> -ocfs2_inode_unlock(inode, 1);
>>> +
>>> +if (!has_locked) {
>>> +ocfs2_remove_holder(lockres, &oh);
>>> +ocfs2_inode_unlock(inode, 1);
>>> +}
>>>   brelse(bh);
>>> +
>>>   return status;
>>>   }
>>>   @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct
>>> inode *inode, int type)
>>>   struct buffer_head *di_bh = NULL;
>>>   struct posix_acl *acl;
>>>   int ret;
>>> +int arg_flags = 0, has_locked;
>>> +struct ocfs2_holder oh;
>>> +struct ocfs2_lock_res *lockres;
>>> osb = OCFS2_SB(inode->i_sb);
>>>   if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>>   return NULL;
>>> -ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>> +
>>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>> +if (has_locked)
>>> +arg_flags = OCFS2_META_LOCK_GETBH;
>>> +ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
>>>   if (ret < 0) {
>>>   if (ret != -ENOENT)
>>>   mlog_errno(ret);
>>>   return ERR_PTR(ret);
>>>   }
>>> +if (!has_locked)
>>> +ocfs2_add_holder(lockres, &oh);
>>> acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>>>   -ocfs2_inode_unlock(inode, 0);
>>> +if (!has_locked) {
>>> +ocfs2_remove_holder(lockres, &oh);
>>> +ocfs2_inode_unlock(inode, 0);
>>> +}
>>>   brelse(di_bh);
>>> +
>>>   return acl;
>>>   }
>>>   diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>>> index c488965..62be75d 100644
>>> --- a/fs/ocfs2/file.c
>>> +++ b/fs/ocfs2/file.c
>>> @@ -1138,6 +1138,9 @@ int ocfs2_setattr(struct dentry *dentry, struct
>>> iattr *attr)
>>>   handle_t *handle = NULL;
>>>   struct dquot *transfer_to[MAXQUOTAS] = { };
>>>   int qtype;
>>> +int arg_flags = 0, had_lock;
>>> +struct ocfs2_holder oh;
>>> +struct ocfs2_lock_res *lockres;
>>> trace_ocfs2_setattr(inode, dentry,
>>>   (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>> @@ -1173,13 +1176,20 @@ int ocfs2_setattr(struct dentry *dentry,
>>> struct iattr *attr)
>>>   }
>>>   }
>>>   -status = ocfs2_inode_lock(inode, &bh, 1);
>>> +lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>> +had_lock = (ocfs2_is_locked_by_me(lockres) != NULL);
>> I

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-15 Thread Junxiao Bi
On 01/13/2017 02:12 PM, Eric Ren wrote:
> Hi Junxiao!
> 
> On 01/13/2017 11:59 AM, Junxiao Bi wrote:
>> On 01/05/2017 11:31 PM, Eric Ren wrote:
>>> We have to avoid recursive cluster locking, but there is currently no
>>> way to check whether a cluster lock has already been taken by a
>>> process.
>>>
>>> Mostly, we can avoid recursive locking by writing code carefully.
>>> However, we found that it's very hard to handle the routines that
>>> are invoked directly by vfs code. For instance:
>>>
>>> const struct inode_operations ocfs2_file_iops = {
>>>  .permission = ocfs2_permission,
>>>  .get_acl= ocfs2_iop_get_acl,
>>>  .set_acl= ocfs2_iop_set_acl,
>>> };
>>>
>>> Both ocfs2_permission() and ocfs2_iop_get_acl() call
>>> ocfs2_inode_lock(PR):
>>> do_sys_open
>>>   may_open
>>>inode_permission
>>> ocfs2_permission
>>>  ocfs2_inode_lock() <=== first time
>>>   generic_permission
>>>get_acl
>>> ocfs2_iop_get_acl
>>> ocfs2_inode_lock() <=== recursive one
>>>
>>> A deadlock will occur if a remote EX request comes in between the two
>>> ocfs2_inode_lock() calls. Briefly, the deadlock forms as follows:
>>>
>>> On one hand, the OCFS2_LOCK_BLOCKED flag of this lockres is set in
>>> BAST (ocfs2_generic_handle_bast) when the downconvert is started
>>> on behalf of the remote EX lock request. On the other hand, the recursive
>>> cluster lock (the second one) will be blocked in
>>> __ocfs2_cluster_lock()
>>> because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
>>> Because there is no chance for the first cluster lock on this node to be
>>> unlocked - we block ourselves in the code path.
>>>
>>> The idea to fix this issue is mostly taken from gfs2 code.
>>> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
>>> keep track of the pids of the processes that have taken the cluster lock
>>> of this lock resource;
>>> 2. introduce a new flag for ocfs2_inode_lock_full:
>>> OCFS2_META_LOCK_GETBH;
>>> it means just getting back the disk inode bh if we already hold the
>>> cluster lock;
>>> 3. export a helper: ocfs2_is_locked_by_me(), used to check whether we
>>> already hold the cluster lock further up the code path.
>>>
>>> The tracking logic should be used by some of the ocfs2 vfs callbacks
>>> to solve the recursive locking issue caused by the fact that vfs
>>> routines
>>> can call into each other.
>>>
>>> The performance penalty of processing the holder list should only be
>>> seen
>>> in the few cases where the tracking logic is used, such as get/set acl.
>>>
>>> You may ask: what if we got a PR lock the first time, and the second time
>>> we want an EX lock? Fortunately, this case never happens in the real
>>> world,
>>> as far as I can see, including permission check,
>>> (get|set)_(acl|attr), and
>>> the gfs2 code does the same.
>>>
>>> Signed-off-by: Eric Ren 
>>> ---
>>>   fs/ocfs2/dlmglue.c | 47
>>> ---
>>>   fs/ocfs2/dlmglue.h | 18 ++
>>>   fs/ocfs2/ocfs2.h   |  1 +
>>>   3 files changed, 63 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>> index 83d576f..500bda4 100644
>>> --- a/fs/ocfs2/dlmglue.c
>>> +++ b/fs/ocfs2/dlmglue.c
>>> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct
>>> ocfs2_lock_res *res)
>>>   init_waitqueue_head(&res->l_event);
>>>   INIT_LIST_HEAD(&res->l_blocked_list);
>>>   INIT_LIST_HEAD(&res->l_mask_waiters);
>>> +INIT_LIST_HEAD(&res->l_holders);
>>>   }
>>> void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
>>> @@ -749,6 +750,45 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res
>>> *res)
>>>   res->l_flags = 0UL;
>>>   }
>>>   +inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
>>> +   struct ocfs2_holder *oh)
>>> +{
>>> +INIT_LIST_HEAD(&oh->oh_list);
>>> +oh->oh_owner_pid =  get_pid(task_pid(current));
>> struct pid (oh->oh_owner_pid) looks complicated here; why not use
>> task_struct (current) or pid_t (current->pid) directly? Also, I don't see
>> why the ref count needs to be considered.
> 
> This is learned from the gfs2 code, which has been tested in practice. So I think
> it's not bad
> to keep it ;-)
> 
>>
>>> +
>>> +spin_lock(&lockres->l_lock);
>>> +list_add_tail(&oh->oh_list, &lockres->l_holders);
>>> +spin_unlock(&lockres->l_lock);
>>> +}
>>> +
>>> +inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
>>> +   struct ocfs2_holder *oh)
>>> +{
>>> +spin_lock(&lockres->l_lock);
>>> +list_del(&oh->oh_list);
>>> +spin_unlock(&lockres->l_lock);
>>> +
>>> +put_pid(oh->oh_owner_pid);
>> Same as above.
>>
>>> +}
>>> +
>>> +inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct
>>> ocfs2_lock_res *lockres)
>> Agree with Joseph, returning bool looks better. I don't see how that helps
>> debugging, since the return value is not used.
>>
>>
>>> +{
>>> +struct ocfs2_holder *oh;
>>> +struct pid *pid;
>>> +
>>>
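
Editor's sketch: the holder-tracking idea described in the commit message above (a per-lockres list of holders plus a helper that answers "did I already take this lock?") can be illustrated in userspace C. This is a simplified model under stated assumptions, not the kernel implementation: the names holder, lock_res, add_holder, remove_holder and is_locked_by are hypothetical, an int stands in for the struct pid reference, a singly linked list stands in for the kernel list_head, and the l_lock spinlock taken by the real helpers is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ocfs2_holder. */
struct holder {
    struct holder *next;
    int owner_id;              /* stands in for oh_owner_pid */
};

/* Hypothetical stand-in for the l_holders field of a lock resource. */
struct lock_res {
    struct holder *holders;
};

/* Record that owner_id now holds the lock (order in the list is irrelevant). */
static void add_holder(struct lock_res *res, struct holder *h, int owner_id)
{
    assert(res != NULL && h != NULL);
    h->owner_id = owner_id;
    h->next = res->holders;
    res->holders = h;
}

/* Drop a previously added holder; does nothing if it is not on the list. */
static void remove_holder(struct lock_res *res, struct holder *h)
{
    struct holder **pp;

    assert(res != NULL && h != NULL);
    for (pp = &res->holders; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == h) {
            *pp = h->next;
            h->next = NULL;
            return;
        }
    }
}

/* Return true if owner_id is recorded as a holder of this lock resource. */
static bool is_locked_by(const struct lock_res *res, int owner_id)
{
    const struct holder *h;

    assert(res != NULL);
    for (h = res->holders; h != NULL; h = h->next) {
        if (h->owner_id == owner_id)
            return true;
    }
    return false;
}
```

A caller would place a struct holder on its own stack, add it right after taking the lock, and remove it right before unlocking, which mirrors how the patches in this thread use a local "oh" variable. The boolean return type follows the review suggestion quoted above.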

Re: [Ocfs2-devel] [PATCH v3] ocfs2/journal: fix umount hang after flushing journal failure

2017-01-15 Thread Joseph Qi


On 17/1/13 20:37, Eric Ren wrote:
> On 01/13/2017 10:52 AM, Changwei Ge wrote:
>> Hi Joseph,
>>
>> Do you think the last version of my patch to fix the umount hang after a
>> journal flushing failure is OK?
>>
>> If so, I'd like to ask for Andrew's help to merge this patch into his test
>> tree.
>>
>>
>> Thanks,
>>
>> Br.
>>
>> Changwei
>
> The message above should not appear in a formal patch. It should be 
> put in the cover letter if
> you want to say something to the other developers. See "git 
> format-patch --cover-letter".
>
>>
>>
>>
>>  From 686b52ee2f06395c53e36e2c7515c276dc7541fb Mon Sep 17 00:00:00 2001
>> From: Changwei Ge 
>> Date: Wed, 11 Jan 2017 09:05:35 +0800
>> Subject: [PATCH] fix umount hang after journal flushing failure
>
> A commit message is needed here! It should describe what the 
> problem is, how to reproduce it,
> and what your solution is, things like that.
>
>>
>> Signed-off-by: Changwei Ge 
>> ---
>>   fs/ocfs2/journal.c |   18 ++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
>> index a244f14..5f3c862 100644
>> --- a/fs/ocfs2/journal.c
>> +++ b/fs/ocfs2/journal.c
>> @@ -2315,6 +2315,24 @@ static int ocfs2_commit_thread(void *arg)
>>   "commit_thread: %u transactions pending 
>> on "
>>   "shutdown\n",
>> atomic_read(&journal->j_num_trans));
>> +
>> +   if (status < 0) {
>> +   mlog(ML_ERROR, "journal is already abort
>> and cannot be "
>> +"flushed any more. So ignore
>> the pending "
>> +"transactions to avoid blocking
>> ocfs2 unmount.\n");
>
> Can you find any example in the kernel source that prints a message 
> like that?!
>
> I saw Joseph showed you the right way in previous email:
> "
> if (status < 0) {
>  mlog(ML_ERROR, "journal is already abort and cannot be "
>  "flushed any more. So ignore the pending "
>  "transactions to avoid blocking ocfs2 unmount.\n");
> "
> So, please be careful and learn from the kernel source and from how 
> other developers do it in
> their patches. Otherwise, it wastes others' time on 
> such basic issues.
>
>> +   /*
>> +* This may be a little hacky; however, there is no chance
>> +* for ocfs2/journal to decrease this variable
>> +* through commit-thread. I have to do so to
>> +* avoid umount hang after journal flushing
>> +* failure. Since the journal has been marked ABORT
>> +* within jbd2_journal_flush, commit cache will
>> +* never do any real work to flush journal to
>> +* disk. Set it to ZERO so that umount will
>> +* continue during shutting down journal
>> +*/
>> + atomic_set(&journal->j_num_trans, 0);
> Doing it this way can corrupt data. Why not just crash the 
> kernel when jbd2 aborts,
> and let the other nodes do the journal recovery? That is the strength 
> of a cluster filesystem.
We shouldn't crash the kernel directly, since that would enlarge the impact
of the issue. For example, we may have multiple volumes mounted and only one
of them hits this error.
But I do agree with you that we have to let the other nodes know about the
abnormal exit and do the recovery, which ensures data
consistency.

Thanks,
Joseph
>
> Anyway, it's good to see you guys making contributions!
>
> Thanks,
> Eric
>
>
>> +   }
>>  }
>>  }
>>
>> -- 
>> 1.7.9.5
>>
>
>

