Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-22 Thread Mark Fasheh
On Mon, Dec 21, 2015 at 01:12:34PM +0800, Junxiao Bi wrote:
> Hi Mark,
> 
> On 12/19/2015 07:23 AM, Mark Fasheh wrote:
> > On Tue, Dec 15, 2015 at 09:43:48AM +0800, Junxiao Bi wrote:
> >> Hi Mark,
> >>
> >> On 12/15/2015 03:18 AM, Mark Fasheh wrote:
> >>> On Mon, Dec 14, 2015 at 02:03:17PM +0800, Junxiao Bi wrote:
> >>>>> Second, this issue can be reproduced in old Linux kernels (e.g. 
> >>>>> 3.16.7-24)? there should not be any regression issue? 
> >>>> Maybe just hard to reproduce, ocfs2 supports recursive locking.
> >>>
> >>> In what sense? The DLM might but the FS should never be making use of 
> >>> such a
> >>> mechanism (it would be for userspace users).
> >> See commit 743b5f1434f5 ("ocfs2: take inode lock in
> >> ocfs2_iop_set/get_acl()"), it used recursive locking and caused a
> >> deadlock, the call trace is in this patch's log.
> > 
> > Ahh ok so it's part of the buggy patch.
> > 
> > 
> >>> We really can't add recursive locks without this getting rejected 
> >>> upstream.
> >>> There's a whole slew of reasons why we don't like those in the kernel.
> >> Is there any harm to support this lock in kernel?
> > 
> > Yeah so you can google search on why recursive locks are considered harmful
> > by many programmers and in the Linux Kernel they are a big 'No No'. We used
> > to have one recursive lock (the 'big kernel lock') which took a large effort
> > to clean up.
> > 
> > Most objections are going to come down to the readability of the code and
> > the nasty bugs that can come about as a result. Here's a random blog post I
> > found explaining some of this:
> > 
> > http://blog.stephencleary.com/2013/04/recursive-re-entrant-locks.html
> Good doc. Thank you for sharing it. Learned more about recursive locking.

Np, glad it helped.


> But I am afraid the cluster lock suffers all the drawbacks mentioned in that
> doc, since it sticks to the node. P1 and P2 can hold one EX lock at the
> same time; is that a kind of recursive locking for a cluster lock?
> Maybe it is just bad naming of recursive locking. If two processes on
> one node are allowed to hold one EX lock at the same time, why is one
> process not allowed to lock one EX lock two times?

Yeah I think maybe it's 'bad naming' as you say? What dlmglue is doing we
call 'lock caching'.

It's possible to use the dlm without caching the locks but that means every
process wanting a cluster lock will have to make a dlm request. So instead,
we hold onto the lock until another node wants it. However, this effectively
makes each lock node-wide (as opposed to per-process which we would get
without the lock caching).

For local process locking, we use the VFS locks (inode mutex, etc).

So the cluster lock is never acquired more than once; it gets acquired
once and dlmglue caches access to it in a fair way. Hence it is not
being acquired 'recursively'.

Callers of dlmglue honor the usual locking rules - the same ones that we
honor with local locks. For example, they must be properly ordered with
respect to other locks.
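
To make the pattern concrete, here is a minimal sketch of how a dlmglue
caller typically stacks the two kinds of locks (assuming the usual
ocfs2_inode_lock()/ocfs2_inode_unlock() helpers; the function name is made
up and error handling is trimmed):

	static int example_update_inode(struct inode *inode)
	{
		struct buffer_head *di_bh = NULL;
		int status;

		/* Local, per-process exclusion: plain VFS locking. */
		mutex_lock(&inode->i_mutex);

		/*
		 * Node-wide exclusion: take the cluster lock EX exactly
		 * once. dlmglue keeps the DLM lock cached after we drop
		 * it, so later requests from this node are served locally
		 * until another node asks for the lock.
		 */
		status = ocfs2_inode_lock(inode, &di_bh, 1);
		if (status < 0)
			goto out_mutex;

		/* ... modify the inode and its metadata here ... */

		ocfs2_inode_unlock(inode, 1);
		brelse(di_bh);
	out_mutex:
		mutex_unlock(&inode->i_mutex);
		return status;
	}

Each lock is taken once, in a fixed order, and released in reverse order.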


As you point out, we could turn the dlmglue locking into a true recursive
locking scheme. But if we do that, then we are changing the design of
dlmglue and introducing recursive locking into a kernel which does not have
it. The 2nd point is self-explanatory. The reason I would also not want to
change the design of dlmglue is that I don't feel this bug requires such a
drastic measure.

Thanks,
--Mark

--
Mark Fasheh



Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-22 Thread Junxiao Bi
Hi Mark,

On 12/23/2015 06:12 AM, Mark Fasheh wrote:
> On Mon, Dec 21, 2015 at 01:12:34PM +0800, Junxiao Bi wrote:
>> Hi Mark,
>>
>> On 12/19/2015 07:23 AM, Mark Fasheh wrote:
>>> On Tue, Dec 15, 2015 at 09:43:48AM +0800, Junxiao Bi wrote:
>>>> Hi Mark,
>>>>
>>>> On 12/15/2015 03:18 AM, Mark Fasheh wrote:
>>>>> On Mon, Dec 14, 2015 at 02:03:17PM +0800, Junxiao Bi wrote:
>>>>>>> Second, this issue can be reproduced in old Linux kernels (e.g. 
>>>>>>> 3.16.7-24)? there should not be any regression issue? 
>>>>>> Maybe just hard to reproduce, ocfs2 supports recursive locking.
>>>>>
>>>>> In what sense? The DLM might but the FS should never be making use of 
>>>>> such a mechanism (it would be for userspace users).
>>>> See commit 743b5f1434f5 ("ocfs2: take inode lock in
>>>> ocfs2_iop_set/get_acl()"), it used recursive locking and caused a
>>>> deadlock, the call trace is in this patch's log.
>>>
>>> Ahh ok so it's part of the buggy patch.
>>>
>>>
>>>>> We really can't add recursive locks without this getting rejected 
>>>>> upstream.
>>>>> There's a whole slew of reasons why we don't like those in the kernel.
>>>> Is there any harm to support this lock in kernel?
>>>
>>> Yeah so you can google search on why recursive locks are considered harmful
>>> by many programmers and in the Linux Kernel they are a big 'No No'. We used
>>> to have one recursive lock (the 'big kernel lock') which took a large effort
>>> to clean up.
>>>
>>> Most objections are going to come down to the readability of the code and
>>> the nasty bugs that can come about as a result. Here's a random blog post I
>>> found explaining some of this:
>>>
>>> http://blog.stephencleary.com/2013/04/recursive-re-entrant-locks.html
>> Good doc. Thank you for sharing it. Learned more about recursive locking.
> 
> Np, glad it helped.
> 
> 
>> But I am afraid the cluster lock suffers all the drawbacks mentioned in that
>> doc, since it sticks to the node. P1 and P2 can hold one EX lock at the
>> same time; is that a kind of recursive locking for a cluster lock?
>> Maybe it is just bad naming of recursive locking. If two processes on
>> one node are allowed to hold one EX lock at the same time, why is one
>> process not allowed to lock one EX lock two times?
> 
> Yeah I think maybe it's 'bad naming' as you say? What dlmglue is doing we
> call 'lock caching'.
> 
> It's possible to use the dlm without caching the locks but that means every
> process wanting a cluster lock will have to make a dlm request. So instead,
> we hold onto the lock until another node wants it. However, this effectively
> makes each lock node-wide (as opposed to per-process which we would get
> without the lock caching).
Yes.
> 
> For local process locking, we use the VFS locks (inode mutex, etc).
Yes.
> 
> So the cluster lock is never acquired more than once; it gets acquired
> once and dlmglue caches access to it in a fair way. Hence it is not
> being acquired 'recursively'.
> 
> Callers of dlmglue honor the usual locking rules - the same ones that we
> honor with local locks. For example, they must be properly ordered with
> respect to other locks.
> 
> 
> As you point out, we could turn the dlmglue locking into a true recursive
> locking scheme. But if we do that, then we are changing the design of
> dlmglue and introducing recursive locking into a kernel which does not have
> it. 
Well, so this is just to keep the cluster locking behavior aligned with
local locks and to avoid introducing recursive locking into the kernel. But
without recursive locking, it's a little hard to fix this issue; we may
need to refactor ocfs2_mknod(). Do you have a suggestion about the fix?

> The 2nd point is self-explanatory. The reason I would also not want to
> change the design of dlmglue is that I don't feel this bug requires such a
> drastic measure.
Tariq just posted another patch to fix this issue (adding an
OCFS2_LOCK_IGNORE_BLOCKED arg_flags to ocfs2_cluster_lock() to prevent the
hang); do you agree with fixing it like that?

The way I am using is more general; it can avoid all possible recursive
locking deadlock issues.

Another side benefit is that, with this, we can know which lockres a
process is holding and which it is blocked on. This info can be exported to
debugfs, which makes it very useful for live-debugging a cluster hang.
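
To illustrate the idea (a sketch only - the struct name here is made up,
though l_owner_list and ocfs2_is_recursive_lock() are from my patch):

	struct ocfs2_lock_owner {
		struct list_head lo_list;	/* on lockres->l_owner_list */
		struct pid *lo_pid;		/* which process holds it */
		int lo_level;			/* DLM_LOCK_PR or DLM_LOCK_EX */
	};

	/* Sketch; assumes the lockres spinlock (l_lock) is held. */
	static int ocfs2_is_recursive_lock(struct ocfs2_lock_res *lockres)
	{
		struct ocfs2_lock_owner *lo;

		list_for_each_entry(lo, &lockres->l_owner_list, lo_list)
			if (lo->lo_pid == task_pid(current))
				return 1;	/* current already holds it */
		return 0;
	}

Walking l_owner_list from debugfs would then show, per lockres, exactly
which processes hold it and at which level.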

Thanks,
Junxiao.

> 
> Thanks,
>   --Mark
> 
> --
> Mark Fasheh
> 




Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-20 Thread Junxiao Bi
Hi Mark,

On 12/19/2015 07:23 AM, Mark Fasheh wrote:
> On Tue, Dec 15, 2015 at 09:43:48AM +0800, Junxiao Bi wrote:
>> Hi Mark,
>>
>> On 12/15/2015 03:18 AM, Mark Fasheh wrote:
>>> On Mon, Dec 14, 2015 at 02:03:17PM +0800, Junxiao Bi wrote:
>>>>> Second, this issue can be reproduced in old Linux kernels (e.g. 
>>>>> 3.16.7-24)? there should not be any regression issue? 
>>>> Maybe just hard to reproduce, ocfs2 supports recursive locking.
>>>
>>> In what sense? The DLM might but the FS should never be making use of such a
>>> mechanism (it would be for userspace users).
>> See commit 743b5f1434f5 ("ocfs2: take inode lock in
>> ocfs2_iop_set/get_acl()"), it used recursive locking and caused a
>> deadlock, the call trace is in this patch's log.
> 
> Ahh ok so it's part of the buggy patch.
> 
> 
>>> We really can't add recursive locks without this getting rejected upstream.
>>> There's a whole slew of reasons why we don't like those in the kernel.
>> Is there any harm to support this lock in kernel?
> 
> Yeah so you can google search on why recursive locks are considered harmful
> by many programmers and in the Linux Kernel they are a big 'No No'. We used
> to have one recursive lock (the 'big kernel lock') which took a large effort
> to clean up.
> 
> Most objections are going to come down to the readability of the code and
> the nasty bugs that can come about as a result. Here's a random blog post I
> found explaining some of this:
> 
> http://blog.stephencleary.com/2013/04/recursive-re-entrant-locks.html
Good doc. Thank you for sharing it. Learned more about recursive locking.

But I am afraid the cluster lock suffers all the drawbacks mentioned in that
doc, since it sticks to the node. P1 and P2 can hold one EX lock at the
same time; is that a kind of recursive locking for a cluster lock?
Maybe it is just bad naming of recursive locking. If two processes on
one node are allowed to hold one EX lock at the same time, why is one
process not allowed to lock one EX lock two times?

Thanks,
Junxiao.

> 
> I'm pretty sure Linus has a rant about it somewhere too you can find.
> 
> 
> But basically your approach to fixing situations like this is going to be
> refactoring the code in a readable manner such that the lock is only taken
> once per code path.
> 
> Hope that all helps,
>   --Mark
> 
> 
> --
> Mark Fasheh
> 




Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-18 Thread Mark Fasheh
On Tue, Dec 15, 2015 at 09:43:48AM +0800, Junxiao Bi wrote:
> Hi Mark,
> 
> On 12/15/2015 03:18 AM, Mark Fasheh wrote:
> > On Mon, Dec 14, 2015 at 02:03:17PM +0800, Junxiao Bi wrote:
> >>> Second, this issue can be reproduced in old Linux kernels (e.g. 
> >>> 3.16.7-24)? there should not be any regression issue? 
> >> Maybe just hard to reproduce, ocfs2 supports recursive locking.
> > 
> > In what sense? The DLM might but the FS should never be making use of such a
> > mechanism (it would be for userspace users).
> See commit 743b5f1434f5 ("ocfs2: take inode lock in
> ocfs2_iop_set/get_acl()"), it used recursive locking and caused a
> deadlock, the call trace is in this patch's log.

Ahh ok so it's part of the buggy patch.


> > We really can't add recursive locks without this getting rejected upstream.
> > There's a whole slew of reasons why we don't like those in the kernel.
> Is there any harm to support this lock in kernel?

Yeah so you can google search on why recursive locks are considered harmful
by many programmers and in the Linux Kernel they are a big 'No No'. We used
to have one recursive lock (the 'big kernel lock') which took a large effort
to clean up.

Most objections are going to come down to the readability of the code and
the nasty bugs that can come about as a result. Here's a random blog post I
found explaining some of this:

http://blog.stephencleary.com/2013/04/recursive-re-entrant-locks.html

I'm pretty sure Linus has a rant about it somewhere too you can find.


But basically your approach to fixing situations like this is going to be
refactoring the code in a readable manner such that the lock is only taken
once per code path.
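
As a sketch of that kind of refactor (hypothetical names; ocfs2 already
has ocfs2_get_acl_nolock() along these lines in fs/ocfs2/acl.c), the entry
point takes the cluster lock once and everything below it trusts the
caller:

	/* Inner helper: no cluster locking, caller holds the inode lock. */
	static struct posix_acl *example_get_acl_nolock(struct inode *inode,
							int type,
							struct buffer_head *di_bh)
	{
		/* ... read the ACL xattr via di_bh ... */
		return NULL;
	}

	/* Outer entry point: the only place on this path taking the lock. */
	static struct posix_acl *example_iop_get_acl(struct inode *inode,
						     int type)
	{
		struct buffer_head *di_bh = NULL;
		struct posix_acl *acl;
		int status;

		status = ocfs2_inode_lock(inode, &di_bh, 0);
		if (status < 0)
			return ERR_PTR(status);
		acl = example_get_acl_nolock(inode, type, di_bh);
		ocfs2_inode_unlock(inode, 0);
		brelse(di_bh);
		return acl;
	}

ocfs2_mknod() would call the _nolock variant directly, since it already
holds the locks it needs, so no path ever takes the same cluster lock
twice.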

Hope that all helps,
--Mark


--
Mark Fasheh



Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-14 Thread Junxiao Bi
Hi Mark,

On 12/15/2015 03:18 AM, Mark Fasheh wrote:
> On Mon, Dec 14, 2015 at 02:03:17PM +0800, Junxiao Bi wrote:
>>> Second, this issue can be reproduced in old Linux kernels (e.g. 3.16.7-24)? 
>>> there should not be any regression issue? 
>> Maybe just hard to reproduce, ocfs2 supports recursive locking.
> 
> In what sense? The DLM might but the FS should never be making use of such a
> mechanism (it would be for userspace users).
See commit 743b5f1434f5 ("ocfs2: take inode lock in
ocfs2_iop_set/get_acl()"), it used recursive locking and caused a
deadlock, the call trace is in this patch's log.
> 
> We really can't add recursive locks without this getting rejected upstream.
> There's a whole slew of reasons why we don't like those in the kernel.
Is there any harm to support this lock in kernel?

Thanks,
Junxiao.

>   --Mark
> 
> --
> Mark Fasheh
> 




Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-14 Thread Junxiao Bi
On 12/14/2015 04:44 PM, Eric Ren wrote:
> Hi Junxiao,
> 
> On Mon, Dec 14, 2015 at 09:57:38AM +0800, Junxiao Bi wrote: 
>> The following locking order can cause a deadlock.
>> Process A on Node X:            Process B on Node Y:
>> lock_XYZ(PR)
>>                                 lock_XYZ(EX)
>> lock_XYZ(PR) >>> blocked forever by OCFS2_LOCK_BLOCKED.
>>
>> Use ocfs2-test multiple reflink test can reproduce this on v4.3 kernel,
>> the whole cluster hung, and get the following call trace.
>>
>>  INFO: task multi_reflink_t:10118 blocked for more than 120 seconds.
>>Tainted: G   OE   4.3.0 #1
>>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>  multi_reflink_t D 88003e816980 0 10118  10117 0x0080
>>   880005b735f8 0082 81a25500 88003e75
>>   880005b735c8 8117992f ea929f80 88003e816980
>>   7fff  0001 ea929f80
>>  Call Trace:
>>   [] ? find_get_entry+0x2f/0xc0
>>   [] schedule+0x3e/0x80
>>   [] schedule_timeout+0x1c8/0x220
>>   [] ? ocfs2_inode_cache_unlock+0x14/0x20 [ocfs2]
>>   [] ? ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]
>>   [] ? ocfs2_buffer_cached+0x99/0x170 [ocfs2]
>>   [] ? ocfs2_inode_cache_unlock+0x14/0x20 [ocfs2]
>>   [] ? ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]
>>   [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>>   [] wait_for_completion+0xde/0x110
>>   [] ? try_to_wake_up+0x240/0x240
>>   [] __ocfs2_cluster_lock+0x20d/0x720 [ocfs2]
>>   [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>>   [] ocfs2_inode_lock_full_nested+0x181/0x400 [ocfs2]
>>   [] ? ocfs2_iop_get_acl+0x53/0x113 [ocfs2]
>>   [] ? igrab+0x42/0x70
>>   [] ocfs2_iop_get_acl+0x53/0x113 [ocfs2]
>>   [] get_acl+0x53/0x70
>>   [] posix_acl_create+0x73/0x130
>>   [] ocfs2_mknod+0x7cf/0x1140 [ocfs2]
>>   [] ocfs2_create+0x62/0x110 [ocfs2]
>>   [] ? __d_alloc+0x65/0x190
>>   [] ? __inode_permission+0x4e/0xd0
>>   [] vfs_create+0xd5/0x100
>>   [] ? lookup_real+0x1d/0x60
>>   [] lookup_open+0x173/0x1a0
>>   [] ? percpu_down_read+0x16/0x70
>>   [] do_last+0x31a/0x830
>>   [] ? __inode_permission+0x4e/0xd0
>>   [] ? inode_permission+0x18/0x50
>>   [] ? link_path_walk+0x290/0x550
>>   [] path_openat+0x7c/0x140
>>   [] do_filp_open+0x85/0xe0
>>   [] ? getname_flags+0x7f/0x1f0
>>   [] do_sys_open+0x11a/0x220
>>   [] ? syscall_trace_enter_phase1+0x15b/0x170
>>   [] SyS_open+0x1e/0x20
>>   [] entry_SYSCALL_64_fastpath+0x12/0x71
>>
>> commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
>> add a recursive locking to ocfs2_mknod() which exports this deadlock, but
>> indeed this is a common issue, it can be triggered in other place.
>>
>> Record processes who owned the cluster lock, allow recursive lock to go
> Recording processes hurts scalability, performance and readability heavily,
> right?
Do you mean that searching the lockres's l_owner_list at lock time hurts
the above?

>> can fix PR+PR, EX+EX, EX+PR deadlock. But it can't fix the PR+EX deadlock,
> Could you please describe why PR+EX is special?
Because on the master node, blocked node Y's lock request sits ahead of
node X's on the converting list. So even if we let the EX request from
node X go ahead, it is still a deadlock.
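
Roughly, from the master's point of view (illustrative only, my
understanding of the queues):

	lock XYZ on the master node:
	  granted:     node X @ PR        (process A's first lock)
	  converting:  node Y  -> EX      (queued first, waits for X's PR)
	               node X PR -> EX    (queued second, waits behind Y)

Node X cannot drop its PR (process A still holds it), so node Y's convert
never finishes; and because Y is queued ahead of X, node X's PR -> EX
convert cannot be granted either. That is why the second EX request has to
be non-blocking: it can only back off, not wait.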

Thanks,
Junxiao.

> 
> Thanks,
> Eric
>> to avoid this, the second EX locking must use non-block way.
>>
>> Signed-off-by: Junxiao Bi 
>> ---
>>  fs/ocfs2/dlmglue.c |  209 
>> 
>>  fs/ocfs2/ocfs2.h   |   15 
>>  2 files changed, 211 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>> index 1c91103..e46be46 100644
>> --- a/fs/ocfs2/dlmglue.c
>> +++ b/fs/ocfs2/dlmglue.c
>> @@ -33,6 +33,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #define MLOG_MASK_PREFIX ML_DLM_GLUE
>>  #include 
>> @@ -115,6 +116,7 @@ static int ocfs2_check_refcount_downconvert(struct 
>> ocfs2_lock_res *lockres,
>>  int new_level);
>>  static int ocfs2_refcount_convert_worker(struct ocfs2_lock_res *lockres,
>>   int blocking);
>> +static int ocfs2_is_recursive_lock(struct ocfs2_lock_res *lockres);
>>  
>>  #define mlog_meta_lvb(__level, __lockres) ocfs2_dump_meta_lvb_info(__level, 
>> __PRETTY_FUNCTION__, __LINE__, __lockres)
>>  
>> @@ -341,8 +343,9 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
>>   struct ocfs2_lock_res *lockres,
>>   int level,
>>   u32 dlm_flags);
>> -static inline int ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res 
>> *lockres,
>> - int wanted);
>> +static inline int
>> +ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res *lockres,
>> +   int wanted, int nonblock, int *canwait);
>>  

Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-14 Thread Eric Ren
Hi,

On Mon, Dec 14, 2015 at 05:02:26PM +0800, Junxiao Bi wrote: 
> On 12/14/2015 04:44 PM, Eric Ren wrote:
> > Hi Junxiao,
> > 
> > On Mon, Dec 14, 2015 at 09:57:38AM +0800, Junxiao Bi wrote: 
> >> The following locking order can cause a deadlock.
> >> Process A on Node X:            Process B on Node Y:
> >> lock_XYZ(PR)
> >>                                 lock_XYZ(EX)
> >> lock_XYZ(PR) >>> blocked forever by OCFS2_LOCK_BLOCKED.
> >>
> >> Use ocfs2-test multiple reflink test can reproduce this on v4.3 kernel,
> >> the whole cluster hung, and get the following call trace.
> >>
> >>  INFO: task multi_reflink_t:10118 blocked for more than 120 seconds.
> >>Tainted: G   OE   4.3.0 #1
> >>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>  multi_reflink_t D 88003e816980 0 10118  10117 0x0080
> >>   880005b735f8 0082 81a25500 88003e75
> >>   880005b735c8 8117992f ea929f80 88003e816980
> >>   7fff  0001 ea929f80
> >>  Call Trace:
> >>   [] ? find_get_entry+0x2f/0xc0
> >>   [] schedule+0x3e/0x80
> >>   [] schedule_timeout+0x1c8/0x220
> >>   [] ? ocfs2_inode_cache_unlock+0x14/0x20 [ocfs2]
> >>   [] ? ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]
> >>   [] ? ocfs2_buffer_cached+0x99/0x170 [ocfs2]
> >>   [] ? ocfs2_inode_cache_unlock+0x14/0x20 [ocfs2]
> >>   [] ? ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]
> >>   [] ? 
> >> __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
> >>   [] wait_for_completion+0xde/0x110
> >>   [] ? try_to_wake_up+0x240/0x240
> >>   [] __ocfs2_cluster_lock+0x20d/0x720 [ocfs2]
> >>   [] ? 
> >> __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
> >>   [] ocfs2_inode_lock_full_nested+0x181/0x400 [ocfs2]
> >>   [] ? ocfs2_iop_get_acl+0x53/0x113 [ocfs2]
> >>   [] ? igrab+0x42/0x70
> >>   [] ocfs2_iop_get_acl+0x53/0x113 [ocfs2]
> >>   [] get_acl+0x53/0x70
> >>   [] posix_acl_create+0x73/0x130
> >>   [] ocfs2_mknod+0x7cf/0x1140 [ocfs2]
> >>   [] ocfs2_create+0x62/0x110 [ocfs2]
> >>   [] ? __d_alloc+0x65/0x190
> >>   [] ? __inode_permission+0x4e/0xd0
> >>   [] vfs_create+0xd5/0x100
> >>   [] ? lookup_real+0x1d/0x60
> >>   [] lookup_open+0x173/0x1a0
> >>   [] ? percpu_down_read+0x16/0x70
> >>   [] do_last+0x31a/0x830
> >>   [] ? __inode_permission+0x4e/0xd0
> >>   [] ? inode_permission+0x18/0x50
> >>   [] ? link_path_walk+0x290/0x550
> >>   [] path_openat+0x7c/0x140
> >>   [] do_filp_open+0x85/0xe0
> >>   [] ? getname_flags+0x7f/0x1f0
> >>   [] do_sys_open+0x11a/0x220
> >>   [] ? syscall_trace_enter_phase1+0x15b/0x170
> >>   [] SyS_open+0x1e/0x20
> >>   [] entry_SYSCALL_64_fastpath+0x12/0x71
> >>
> >> commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
> >> add a recursive locking to ocfs2_mknod() which exports this deadlock, but
> >> indeed this is a common issue, it can be triggered in other place.
> >>
> >> Record processes who owned the cluster lock, allow recursive lock to go
> > Recording processes hurts scalability, performance and readability heavily,
> > right?
> Do you mean that searching the lockres's l_owner_list at lock time hurts
> the above?
Hm, yes.
> 
> >> can fix PR+PR, EX+EX, EX+PR deadlock. But it can't fix the PR+EX deadlock,
> > Could you please describe why PR+EX is special?
> Because on the master node, blocked node Y's lock request sits ahead of
> node X's on the converting list. So even if we let the EX request from
> node X go ahead, it is still a deadlock.
Got it, thanks.

Eric
> 
> Thanks,
> Junxiao.
> 
> > 
> > Thanks,
> > Eric
> >> to avoid this, the second EX locking must use non-block way.
> >>
> >> Signed-off-by: Junxiao Bi 
> >> ---
> >>  fs/ocfs2/dlmglue.c |  209 
> >> 
> >>  fs/ocfs2/ocfs2.h   |   15 
> >>  2 files changed, 211 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> >> index 1c91103..e46be46 100644
> >> --- a/fs/ocfs2/dlmglue.c
> >> +++ b/fs/ocfs2/dlmglue.c
> >> @@ -33,6 +33,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>  
> >>  #define MLOG_MASK_PREFIX ML_DLM_GLUE
> >>  #include 
> >> @@ -115,6 +116,7 @@ static int ocfs2_check_refcount_downconvert(struct 
> >> ocfs2_lock_res *lockres,
> >>int new_level);
> >>  static int ocfs2_refcount_convert_worker(struct ocfs2_lock_res *lockres,
> >> int blocking);
> >> +static int ocfs2_is_recursive_lock(struct ocfs2_lock_res *lockres);
> >>  
> >>  #define mlog_meta_lvb(__level, __lockres) 
> >> ocfs2_dump_meta_lvb_info(__level, __PRETTY_FUNCTION__, __LINE__, __lockres)
> >>  
> >> @@ -341,8 +343,9 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
> >> struct ocfs2_lock_res *lockres,
> >> int level,
> >> u32 dlm_flags);
> 

Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-14 Thread Junxiao Bi

> On Dec 14, 2015, at 16:57, Eric Ren  wrote:
> 
> Hi,
> 
> On Mon, Dec 14, 2015 at 02:03:17PM +0800, Junxiao Bi wrote: 
>> On 12/14/2015 01:39 PM, Gang He wrote:
>>> Hello Junxiao,
>>> 
>>> From the initial description, the second lock_XYZ(PR) should be blocked,
>>> since the DLM has a fair queue mechanism; otherwise, it looks like it
>>> would bring write lock starvation.
>> Should it be blocked? No, that is a deadlock. I don't think this recursive
>> locking is common, so there is no need to care about starvation here.
> "not common" is really good news. I think we should list recursive use
> cases first
As I said in a previous mail, this way is simple for developers; it is
usually hard to find the recursive use cases before seeing the deadlock
call trace.
> and try to decrease that use before messing up "__ocfs2_cluster_lock"
> further.
I don’t see this as a mess-up; I think recording which processes are using
the locks is very useful. I am going to add a blocker list to the lockres.
With this, for one process, we can see which locks it is holding and which
lock it is blocked on.
This can be exported to debugfs and is useful for debugging deadlock issues.

> As for this patch, the cost is too high :/
I don’t think so. The list will not be long, and searching it will be very
fast. Also please keep in mind that ocfs2_cluster_lock/unlock itself is
never the performance bottleneck; when you see a high locking delay, that
is because of io triggered by downconverts on other nodes, or because lock
races hurt the performance. A list search is just a cpu op; it is much
faster than io. I don’t see how it can hurt performance.

Thanks,
Junxiao.
> 
> Thanks,
> Eric
>> 
>>> Second, this issue can be reproduced in old Linux kernels (e.g. 3.16.7-24)? 
>>> there should not be any regression issue? 
>> Maybe just hard to reproduce, ocfs2 supports recursive locking.
>> 
>>> Finally, I really do not like nested lock usage; can we avoid this?
>> I don't see a good reason why this should be avoided. Without it,
>> developers need to pay more attention to avoid recursive locking, and that
>> is usually very hard before running a full test or a very detailed review.
>> 
>> Thanks,
>> Junxiao.
>>> 
>>> Thanks
>>> Gang
>>> 
>>> 
>> 
>> 

Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-14 Thread Mark Fasheh
On Mon, Dec 14, 2015 at 02:03:17PM +0800, Junxiao Bi wrote:
> > Second, this issue can be reproduced in old Linux kernels (e.g. 3.16.7-24)? 
> > there should not be any regression issue? 
> Maybe just hard to reproduce, ocfs2 supports recursive locking.

In what sense? The DLM might but the FS should never be making use of such a
mechanism (it would be for userspace users).

We really can't add recursive locks without this getting rejected upstream.
There's a whole slew of reasons why we don't like those in the kernel.
--Mark

--
Mark Fasheh



Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-14 Thread Eric Ren
Hi Junxiao,

On Mon, Dec 14, 2015 at 09:57:38AM +0800, Junxiao Bi wrote: 
> The following locking order can cause a deadlock.
> Process A on Node X:            Process B on Node Y:
> lock_XYZ(PR)
>                                 lock_XYZ(EX)
> lock_XYZ(PR) >>> blocked forever by OCFS2_LOCK_BLOCKED.
> 
> Use ocfs2-test multiple reflink test can reproduce this on v4.3 kernel,
> the whole cluster hung, and get the following call trace.
> 
>  INFO: task multi_reflink_t:10118 blocked for more than 120 seconds.
>Tainted: G   OE   4.3.0 #1
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  multi_reflink_t D 88003e816980 0 10118  10117 0x0080
>   880005b735f8 0082 81a25500 88003e75
>   880005b735c8 8117992f ea929f80 88003e816980
>   7fff  0001 ea929f80
>  Call Trace:
>   [] ? find_get_entry+0x2f/0xc0
>   [] schedule+0x3e/0x80
>   [] schedule_timeout+0x1c8/0x220
>   [] ? ocfs2_inode_cache_unlock+0x14/0x20 [ocfs2]
>   [] ? ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]
>   [] ? ocfs2_buffer_cached+0x99/0x170 [ocfs2]
>   [] ? ocfs2_inode_cache_unlock+0x14/0x20 [ocfs2]
>   [] ? ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]
>   [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>   [] wait_for_completion+0xde/0x110
>   [] ? try_to_wake_up+0x240/0x240
>   [] __ocfs2_cluster_lock+0x20d/0x720 [ocfs2]
>   [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>   [] ocfs2_inode_lock_full_nested+0x181/0x400 [ocfs2]
>   [] ? ocfs2_iop_get_acl+0x53/0x113 [ocfs2]
>   [] ? igrab+0x42/0x70
>   [] ocfs2_iop_get_acl+0x53/0x113 [ocfs2]
>   [] get_acl+0x53/0x70
>   [] posix_acl_create+0x73/0x130
>   [] ocfs2_mknod+0x7cf/0x1140 [ocfs2]
>   [] ocfs2_create+0x62/0x110 [ocfs2]
>   [] ? __d_alloc+0x65/0x190
>   [] ? __inode_permission+0x4e/0xd0
>   [] vfs_create+0xd5/0x100
>   [] ? lookup_real+0x1d/0x60
>   [] lookup_open+0x173/0x1a0
>   [] ? percpu_down_read+0x16/0x70
>   [] do_last+0x31a/0x830
>   [] ? __inode_permission+0x4e/0xd0
>   [] ? inode_permission+0x18/0x50
>   [] ? link_path_walk+0x290/0x550
>   [] path_openat+0x7c/0x140
>   [] do_filp_open+0x85/0xe0
>   [] ? getname_flags+0x7f/0x1f0
>   [] do_sys_open+0x11a/0x220
>   [] ? syscall_trace_enter_phase1+0x15b/0x170
>   [] SyS_open+0x1e/0x20
>   [] entry_SYSCALL_64_fastpath+0x12/0x71
> 
> commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
> add a recursive locking to ocfs2_mknod() which exports this deadlock, but
> indeed this is a common issue, it can be triggered in other place.
> 
> Record processes who owned the cluster lock, allow recursive lock to go
Recording processes hurts scalability, performance and readability heavily, right?
> can fix PR+PR, EX+EX, EX+PR deadlock. But it can't fix the PR+EX deadlock,
Could you please describe why PR+EX is special?

Thanks,
Eric
> to avoid this, the second EX locking must use non-block way.
> 
> Signed-off-by: Junxiao Bi 
> ---
>  fs/ocfs2/dlmglue.c |  209 
> 
>  fs/ocfs2/ocfs2.h   |   15 
>  2 files changed, 211 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 1c91103..e46be46 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define MLOG_MASK_PREFIX ML_DLM_GLUE
>  #include 
> @@ -115,6 +116,7 @@ static int ocfs2_check_refcount_downconvert(struct 
> ocfs2_lock_res *lockres,
>   int new_level);
>  static int ocfs2_refcount_convert_worker(struct ocfs2_lock_res *lockres,
>int blocking);
> +static int ocfs2_is_recursive_lock(struct ocfs2_lock_res *lockres);
>  
>  #define mlog_meta_lvb(__level, __lockres) ocfs2_dump_meta_lvb_info(__level, 
> __PRETTY_FUNCTION__, __LINE__, __lockres)
>  
> @@ -341,8 +343,9 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
>struct ocfs2_lock_res *lockres,
>int level,
>u32 dlm_flags);
> -static inline int ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res 
> *lockres,
> -  int wanted);
> +static inline int
> +ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res *lockres,
> +int wanted, int nonblock, int *canwait);
>  static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
>  struct ocfs2_lock_res *lockres,
>  int level, unsigned long caller_ip);
> @@ -531,6 +534,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
>   init_waitqueue_head(&res->l_event);
>   INIT_LIST_HEAD(&res->l_blocked_list);
>   INIT_LIST_HEAD(&res->l_mask_waiters);
> + 

Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-14 Thread Eric Ren
Hi,

On Mon, Dec 14, 2015 at 02:03:17PM +0800, Junxiao Bi wrote: 
> On 12/14/2015 01:39 PM, Gang He wrote:
> > Hello Junxiao,
> > 
> > From the initial description, the second lock_XYZ(PR) should be blocked,
> > since the DLM has a fair queue mechanism; otherwise, it looks like it
> > would bring write lock starvation.
> Should it be blocked? No, that is a deadlock. I don't think this recursive
> locking is common, so there is no need to care about starvation here.
"not common" is really good news. I think we should list recursive use
cases first and try to decrease that use before messing up 
"__ocfs2_cluster_lock"
further. As for this patch,  cost is too high :/

Thanks,
Eric
> 
> > Second, this issue can be reproduced in old Linux kernels (e.g. 3.16.7-24)? 
> > there should not be any regression issue? 
> Maybe just hard to reproduce, ocfs2 supports recursive locking.
> 
> > Finally, I really do not like nested lock usage; can we avoid this?
> I don't see a good reason why this should be avoided. Without it,
> developers need to pay more attention to avoid recursive locking, and that
> is usually very hard before running a full test or a very detailed review.
> 
> Thanks,
> Junxiao.
> > 
> > Thanks
> > Gang
> > 
> > 
> 
> 


Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-13 Thread Gang He
Hello Junxiao,

From the initial description, the second lock_XYZ(PR) should be blocked,
since the DLM has a fair queue mechanism; otherwise, it looks like it would
bring write lock starvation.
Second, this issue can be reproduced in old Linux kernels (e.g. 3.16.7-24)?
there should not be any regression issue?
Finally, I really do not like nested lock usage; can we avoid this?

Thanks
Gang


> The following locking order can cause a deadlock.
> Process A on Node X:            Process B on Node Y:
> lock_XYZ(PR)
>                                 lock_XYZ(EX)
> lock_XYZ(PR) >>> blocked forever by OCFS2_LOCK_BLOCKED.
> 
> Use ocfs2-test multiple reflink test can reproduce this on v4.3 kernel,
> the whole cluster hung, and get the following call trace.
> 
>  INFO: task multi_reflink_t:10118 blocked for more than 120 seconds.
>Tainted: G   OE   4.3.0 #1
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  multi_reflink_t D 88003e816980 0 10118  10117 0x0080
>   880005b735f8 0082 81a25500 88003e75
>   880005b735c8 8117992f ea929f80 88003e816980
>   7fff  0001 ea929f80
>  Call Trace:
>   [] ? find_get_entry+0x2f/0xc0
>   [] schedule+0x3e/0x80
>   [] schedule_timeout+0x1c8/0x220
>   [] ? ocfs2_inode_cache_unlock+0x14/0x20 [ocfs2]
>   [] ? ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]
>   [] ? ocfs2_buffer_cached+0x99/0x170 [ocfs2]
>   [] ? ocfs2_inode_cache_unlock+0x14/0x20 [ocfs2]
>   [] ? ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]
>   [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>   [] wait_for_completion+0xde/0x110
>   [] ? try_to_wake_up+0x240/0x240
>   [] __ocfs2_cluster_lock+0x20d/0x720 [ocfs2]
>   [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>   [] ocfs2_inode_lock_full_nested+0x181/0x400 [ocfs2]
>   [] ? ocfs2_iop_get_acl+0x53/0x113 [ocfs2]
>   [] ? igrab+0x42/0x70
>   [] ocfs2_iop_get_acl+0x53/0x113 [ocfs2]
>   [] get_acl+0x53/0x70
>   [] posix_acl_create+0x73/0x130
>   [] ocfs2_mknod+0x7cf/0x1140 [ocfs2]
>   [] ocfs2_create+0x62/0x110 [ocfs2]
>   [] ? __d_alloc+0x65/0x190
>   [] ? __inode_permission+0x4e/0xd0
>   [] vfs_create+0xd5/0x100
>   [] ? lookup_real+0x1d/0x60
>   [] lookup_open+0x173/0x1a0
>   [] ? percpu_down_read+0x16/0x70
>   [] do_last+0x31a/0x830
>   [] ? __inode_permission+0x4e/0xd0
>   [] ? inode_permission+0x18/0x50
>   [] ? link_path_walk+0x290/0x550
>   [] path_openat+0x7c/0x140
>   [] do_filp_open+0x85/0xe0
>   [] ? getname_flags+0x7f/0x1f0
>   [] do_sys_open+0x11a/0x220
>   [] ? syscall_trace_enter_phase1+0x15b/0x170
>   [] SyS_open+0x1e/0x20
>   [] entry_SYSCALL_64_fastpath+0x12/0x71
> 
> commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
> add a recursive locking to ocfs2_mknod() which exports this deadlock, but
> indeed this is a common issue, it can be triggered in other place.
> 
> Record processes who owned the cluster lock, allow recursive lock to go
> can fix PR+PR, EX+EX, EX+PR deadlock. But it can't fix the PR+EX deadlock,
> to avoid this, the second EX locking must use non-block way.
> 
> Signed-off-by: Junxiao Bi 
> ---
>  fs/ocfs2/dlmglue.c |  209 
> 
>  fs/ocfs2/ocfs2.h   |   15 
>  2 files changed, 211 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 1c91103..e46be46 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define MLOG_MASK_PREFIX ML_DLM_GLUE
>  #include 
> @@ -115,6 +116,7 @@ static int ocfs2_check_refcount_downconvert(struct 
> ocfs2_lock_res *lockres,
>   int new_level);
>  static int ocfs2_refcount_convert_worker(struct ocfs2_lock_res *lockres,
>int blocking);
> +static int ocfs2_is_recursive_lock(struct ocfs2_lock_res *lockres);
>  
>  #define mlog_meta_lvb(__level, __lockres) ocfs2_dump_meta_lvb_info(__level, 
> __PRETTY_FUNCTION__, __LINE__, __lockres)
>  
> @@ -341,8 +343,9 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
>struct ocfs2_lock_res *lockres,
>int level,
>u32 dlm_flags);
> -static inline int ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res 
> *lockres,
> -  int wanted);
> +static inline int
> +ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res *lockres,
> +int wanted, int nonblock, int *canwait);
>  static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
>  struct ocfs2_lock_res *lockres,
>  int level, unsigned long caller_ip);
> @@ -531,6 +534,7 @@ void 

[Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-13 Thread Junxiao Bi
The following locking order can cause a deadlock.
Process A on Node X:            Process B on Node Y:
lock_XYZ(PR)
                                lock_XYZ(EX)
lock_XYZ(PR) >>> blocked forever by OCFS2_LOCK_BLOCKED.

Running the ocfs2-test multiple reflink test can reproduce this on the v4.3
kernel: the whole cluster hangs, with the following call trace.

 INFO: task multi_reflink_t:10118 blocked for more than 120 seconds.
   Tainted: G   OE   4.3.0 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 multi_reflink_t D 88003e816980 0 10118  10117 0x0080
  880005b735f8 0082 81a25500 88003e75
  880005b735c8 8117992f ea929f80 88003e816980
  7fff  0001 ea929f80
 Call Trace:
  [] ? find_get_entry+0x2f/0xc0
  [] schedule+0x3e/0x80
  [] schedule_timeout+0x1c8/0x220
  [] ? ocfs2_inode_cache_unlock+0x14/0x20 [ocfs2]
  [] ? ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]
  [] ? ocfs2_buffer_cached+0x99/0x170 [ocfs2]
  [] ? ocfs2_inode_cache_unlock+0x14/0x20 [ocfs2]
  [] ? ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]
  [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
  [] wait_for_completion+0xde/0x110
  [] ? try_to_wake_up+0x240/0x240
  [] __ocfs2_cluster_lock+0x20d/0x720 [ocfs2]
  [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
  [] ocfs2_inode_lock_full_nested+0x181/0x400 [ocfs2]
  [] ? ocfs2_iop_get_acl+0x53/0x113 [ocfs2]
  [] ? igrab+0x42/0x70
  [] ocfs2_iop_get_acl+0x53/0x113 [ocfs2]
  [] get_acl+0x53/0x70
  [] posix_acl_create+0x73/0x130
  [] ocfs2_mknod+0x7cf/0x1140 [ocfs2]
  [] ocfs2_create+0x62/0x110 [ocfs2]
  [] ? __d_alloc+0x65/0x190
  [] ? __inode_permission+0x4e/0xd0
  [] vfs_create+0xd5/0x100
  [] ? lookup_real+0x1d/0x60
  [] lookup_open+0x173/0x1a0
  [] ? percpu_down_read+0x16/0x70
  [] do_last+0x31a/0x830
  [] ? __inode_permission+0x4e/0xd0
  [] ? inode_permission+0x18/0x50
  [] ? link_path_walk+0x290/0x550
  [] path_openat+0x7c/0x140
  [] do_filp_open+0x85/0xe0
  [] ? getname_flags+0x7f/0x1f0
  [] do_sys_open+0x11a/0x220
  [] ? syscall_trace_enter_phase1+0x15b/0x170
  [] SyS_open+0x1e/0x20
  [] entry_SYSCALL_64_fastpath+0x12/0x71

commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
added recursive locking to ocfs2_mknod() which exposes this deadlock, but
indeed this is a generic issue; it can be triggered in other places.

Recording the processes that own the cluster lock and allowing recursive
locks to go ahead can fix the PR+PR, EX+EX, and EX+PR deadlocks. But it
can't fix the PR+EX deadlock; to avoid that, the second EX locking must use
a non-blocking way.

Signed-off-by: Junxiao Bi 
---
 fs/ocfs2/dlmglue.c |  209 
 fs/ocfs2/ocfs2.h   |   15 
 2 files changed, 211 insertions(+), 13 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 1c91103..e46be46 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MLOG_MASK_PREFIX ML_DLM_GLUE
 #include 
@@ -115,6 +116,7 @@ static int ocfs2_check_refcount_downconvert(struct 
ocfs2_lock_res *lockres,
int new_level);
 static int ocfs2_refcount_convert_worker(struct ocfs2_lock_res *lockres,
 int blocking);
+static int ocfs2_is_recursive_lock(struct ocfs2_lock_res *lockres);
 
 #define mlog_meta_lvb(__level, __lockres) ocfs2_dump_meta_lvb_info(__level, 
__PRETTY_FUNCTION__, __LINE__, __lockres)
 
@@ -341,8 +343,9 @@ static int ocfs2_lock_create(struct ocfs2_super *osb,
 struct ocfs2_lock_res *lockres,
 int level,
 u32 dlm_flags);
-static inline int ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res 
*lockres,
-int wanted);
+static inline int
+ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res *lockres,
+  int wanted, int nonblock, int *canwait);
 static void __ocfs2_cluster_unlock(struct ocfs2_super *osb,
   struct ocfs2_lock_res *lockres,
   int level, unsigned long caller_ip);
@@ -531,6 +534,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
 	init_waitqueue_head(&res->l_event);
 	INIT_LIST_HEAD(&res->l_blocked_list);
 	INIT_LIST_HEAD(&res->l_mask_waiters);
+	INIT_LIST_HEAD(&res->l_owner_list);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -782,6 +786,10 @@ static inline void ocfs2_dec_holders(struct ocfs2_lock_res 
*lockres,
default:
BUG();
}
+
+   /* l_owner_list should be empty if no holders. */
+   BUG_ON(!lockres->l_ex_holders && !lockres->l_ro_holders \
+	    && !list_empty(&lockres->l_owner_list));
 }
 
 /* WARNING: This 

Re: [Ocfs2-devel] [PATCH] ocfs2: dlm: fix recursive locking deadlock

2015-12-13 Thread Junxiao Bi
On 12/14/2015 01:39 PM, Gang He wrote:
> Hello Junxiao,
> 
> From the initial description, the second lock_XYZ(PR) should be blocked,
> since the DLM has a fair queue mechanism; otherwise, it looks like it would
> bring write lock starvation.
Should it be blocked? No, that is a deadlock. I don't think this recursive
locking is common, so there is no need to care about starvation here.

> Second, this issue can be reproduced in old Linux kernels (e.g. 3.16.7-24)? 
> there should not be any regression issue? 
Maybe just hard to reproduce, ocfs2 supports recursive locking.

> Finally, I really do not like nested lock usage; can we avoid this?
I don't see a good reason why this should be avoided. Without it,
developers need to pay more attention to avoid recursive locking, and that
is usually very hard before running a full test or a very detailed review.
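
For the record, the recursive pattern that bit us here boils down to the
call chain from the patch's call trace (simplified sketch, not exact code):

	ocfs2_mknod()
		ocfs2_inode_lock(dir);          /* first, legitimate lock */
		posix_acl_create()
			ocfs2_iop_get_acl()
				ocfs2_inode_lock(dir);  /* same lock, same
							 * process; hangs once
							 * a remote EX is
							 * queued */

Nothing in the middle layer knows the lock is already held, which is
exactly the information the patch records.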

Thanks,
Junxiao.
> 
> Thanks
> Gang
> 
> 

