Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points
Hi Eric,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.10-rc2 next-20170106]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Eric-Ren/fix-deadlock-caused-by-recursive-cluster-locking/20170106-200837
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

   include/linux/compiler.h:253:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> fs/ocfs2/dlmglue.h:189:50: sparse: marked inline, but without a definition
   fs/ocfs2/dlmglue.h:185:29: sparse: marked inline, but without a definition
   fs/ocfs2/dlmglue.h:187:32: sparse: marked inline, but without a definition
>> fs/ocfs2/dlmglue.h:189:50: sparse: marked inline, but without a definition
   fs/ocfs2/dlmglue.h:185:29: sparse: marked inline, but without a definition
   fs/ocfs2/dlmglue.h:187:32: sparse: marked inline, but without a definition
--
   include/linux/compiler.h:253:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> fs/ocfs2/dlmglue.h:189:50: sparse: marked inline, but without a definition
   fs/ocfs2/dlmglue.h:185:29: sparse: marked inline, but without a definition
   fs/ocfs2/dlmglue.h:187:32: sparse: marked inline, but without a definition
   fs/ocfs2/dlmglue.h:187:32: sparse: marked inline, but without a definition
>> fs/ocfs2/dlmglue.h:189:50: sparse: marked inline, but without a definition
   fs/ocfs2/dlmglue.h:185:29: sparse: marked inline, but without a definition
   fs/ocfs2/dlmglue.h:187:32: sparse: marked inline, but without a definition

vim +189 fs/ocfs2/dlmglue.h

34d024f8 Mark Fasheh 2007-09-24  173  void ocfs2_wake_downconvert_thread(struct ocfs2_super *osb);
ccd979bd Mark Fasheh 2005-12-15  174
ccd979bd Mark Fasheh 2005-12-15  175  struct ocfs2_dlm_debug *ocfs2_new_dlm_debug(void);
ccd979bd Mark Fasheh 2005-12-15  176  void ocfs2_put_dlm_debug(struct ocfs2_dlm_debug *dlm_debug);
ccd979bd Mark Fasheh 2005-12-15  177
63e0c48a Joel Becker 2008-01-30  178  /* To set the locking protocol on module initialization */
63e0c48a Joel Becker 2008-01-30  179  void ocfs2_set_locking_protocol(void);
9fb5ed3a Eric Ren    2017-01-05  180
9fb5ed3a Eric Ren    2017-01-05  181  /*
9fb5ed3a Eric Ren    2017-01-05  182   * Keep a list of processes who have interest in a lockres.
9fb5ed3a Eric Ren    2017-01-05  183   * Note: this is now only uesed for check recursive cluster lock.
9fb5ed3a Eric Ren    2017-01-05  184   */
9fb5ed3a Eric Ren    2017-01-05  185  inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
9fb5ed3a Eric Ren    2017-01-05  186  			     struct ocfs2_holder *oh);
9fb5ed3a Eric Ren    2017-01-05  187  inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
9fb5ed3a Eric Ren    2017-01-05  188  				struct ocfs2_holder *oh);
9fb5ed3a Eric Ren    2017-01-05 @189  inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres);
9fb5ed3a Eric Ren    2017-01-05  190
ccd979bd Mark Fasheh 2005-12-15  191  #endif /* DLMGLUE_H */

:: The code at line 189 was first introduced by commit
:: 9fb5ed3abab2100ae8d99cee9b25fb92e3154224 ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

:: TO: Eric Ren
:: CC: 0day robot

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
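Note on the warning above: the report shows the three helpers declared `inline` in dlmglue.h with no body visible to files that include the header. Two common ways to avoid this class of warning are sketched below in plain user-space C. This is only an illustration under assumptions: `struct counter`, `counter_add()`, `counter_is_zero()` and `inline_patterns_demo()` are hypothetical names, not ocfs2 code, and the thread does not say which option the author eventually chose.

```c
#include <assert.h>

/* Hypothetical example type; this is not an ocfs2 structure. */
struct counter {
	int value;
};

/*
 * Pattern A: an ordinary prototype with no "inline" keyword. This is
 * what a header would carry when the body lives in a single .c file.
 */
int counter_add(struct counter *c, int n);

/*
 * Pattern B: a helper that is meant to be inlined is defined in full
 * as "static inline", so every file that includes the header sees
 * the function body.
 */
static inline int counter_is_zero(const struct counter *c)
{
	return c->value == 0;
}

/* Out-of-line definition matching the Pattern A prototype. */
int counter_add(struct counter *c, int n)
{
	c->value += n;
	return c->value;
}

/* Exercises both patterns; returns 0 when every check passes. */
int inline_patterns_demo(void)
{
	struct counter c = { 0 };

	assert(counter_is_zero(&c));
	assert(counter_add(&c, 3) == 3);
	assert(!counter_is_zero(&c));
	return 0;
}
```

Pattern A keeps the body in one translation unit; Pattern B suits small helpers that every includer should be able to inline.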
Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points
Hi Eric,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.10-rc2 next-20170106]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Eric-Ren/fix-deadlock-caused-by-recursive-cluster-locking/20170106-200837
config: ia64-allyesconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64

All errors (new ones prefixed by >>):

   In file included from fs/ocfs2/acl.c:31:0:
   fs/ocfs2/acl.c: In function 'ocfs2_iop_set_acl':
>> fs/ocfs2/dlmglue.h:189:29: error: inlining failed in call to always_inline 'ocfs2_is_locked_by_me': function body not available
    inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres);
                                ^
   fs/ocfs2/acl.c:292:16: note: called from here
     has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
                   ^~
   In file included from fs/ocfs2/acl.c:31:0:
>> fs/ocfs2/dlmglue.h:189:29: error: inlining failed in call to always_inline 'ocfs2_is_locked_by_me': function body not available
    inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres);
                                ^
   fs/ocfs2/acl.c:292:16: note: called from here
     has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
                   ^~
   In file included from fs/ocfs2/acl.c:31:0:
>> fs/ocfs2/dlmglue.h:185:13: error: inlining failed in call to always_inline 'ocfs2_add_holder': function body not available
    inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
                ^~~~
   fs/ocfs2/acl.c:302:3: note: called from here
      ocfs2_add_holder(lockres, &oh);
      ^~
   In file included from fs/ocfs2/acl.c:31:0:
>> fs/ocfs2/dlmglue.h:187:13: error: inlining failed in call to always_inline 'ocfs2_remove_holder': function body not available
    inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
                ^~~
   fs/ocfs2/acl.c:307:3: note: called from here
      ocfs2_remove_holder(lockres, &oh);
      ^
--
   In file included from arch/ia64/include/uapi/asm/intrinsics.h:21:0,
                    from arch/ia64/include/asm/intrinsics.h:10,
                    from arch/ia64/include/asm/bitops.h:18,
                    from include/linux/bitops.h:36,
                    from include/linux/kernel.h:10,
                    from include/linux/list.h:8,
                    from include/linux/wait.h:6,
                    from include/linux/fs.h:5,
                    from fs/ocfs2/file.c:27:
   fs/ocfs2/file.c: In function 'ocfs2_file_write_iter':
   arch/ia64/include/uapi/asm/cmpxchg.h:56:2: warning: value computed is not used [-Wunused-value]
     ((__typeof__(*(ptr))) __xchg((unsigned long) (x), (ptr), sizeof(*(ptr
     ~^~~~
   fs/ocfs2/file.c:2334:3: note: in expansion of macro 'xchg'
      xchg(&iocb->ki_complete, saved_ki_complete);
      ^~~~
   In file included from fs/ocfs2/file.c:49:0:
   fs/ocfs2/file.c: In function 'ocfs2_permission':
>> fs/ocfs2/dlmglue.h:189:29: error: inlining failed in call to always_inline 'ocfs2_is_locked_by_me': function body not available
    inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres);
                                ^
   fs/ocfs2/file.c:1345:16: note: called from here
     has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
                   ^~
   In file included from fs/ocfs2/file.c:49:0:
>> fs/ocfs2/dlmglue.h:189:29: error: inlining failed in call to always_inline 'ocfs2_is_locked_by_me': function body not available
    inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres);
                                ^
   fs/ocfs2/file.c:1345:16: note: called from here
     has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
                   ^~
   In file included from fs/ocfs2/file.c:49:0:
>> fs/ocfs2/dlmglue.h:185:13: error: inlining failed in call to always_inline 'ocfs2_add_holder': function body not available
    inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
                ^~~~
   fs/ocfs2/file.c:1
Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points
Hi!

On 01/06/2017 05:55 PM, Joseph Qi wrote:
> On 17/1/6 17:13, Eric Ren wrote:
>> Hi,
>>
>>>>>>>> Fixes them by adding the tracking logic (in the previous patch) for
>>>>>>>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>>>>>>> ocfs2_setattr().
>>>>>>> As described cases above, shall we just add the tracking logic only for
>>>>>>> set/get_acl()?
>>>>>>
>>>>>> The idea is to detect recursive locking on the running task stack. Take
>>>>>> case 1) for example if ocfs2_permission() is not changed:
>>>>>>
>>>>>> ocfs2_permission() <=== take PR, ocfs2_holder is not added
>>>>>>    ocfs2_iop_get_acl <=== still take PR, because there is no lock holder
>>>>>>                           on the tracking list
>>>>> I mean we have no need to check if locked by me, just do inode lock and
>>>>> add holder.
>>>>> This will make code more clean, IMO.
>>>> Oh, sorry, I get your point this time. I think we need to check it if there
>>>> are more than one processes that hold PR lock on the same resource. If I
>>>> don't understand you correctly, please tell me why you think it's not
>>>> necessary to check before getting lock?
>>> The code logic can only check if it is locked by myself. In the case
>> Why only...?
>>> described above, ocfs2_permission is the first entry to take inode lock.
>>> And even if check succeeds, it is a bug without unlock, but not the case
>>> of recursive lock.
>>
>> By checking succeeds, you mean it's locked by me, right? If so, this flag
>> "arg_flags = OCFS2_META_LOCK_GETBH"
>> will be passed down to ocfs2_inode_lock_full(), which gets back buffer head
>> of the disk inode for us if necessary, but doesn't take cluster locking
>> again. So, there is no need to unlock in such case.
> I am trying to state my point more clearly...

Thanks a lot!

> The issue case you are trying to fix is:
> Process A
> take inode lock (phase1)
> ...
> <<< race window (phase2, Process B)

The deadlock only happens if process B is on a remote node and requests an EX lock.

Quoting the commit message of patch [1/2]:

A deadlock will occur if a remote EX request comes in between two of
ocfs2_inode_lock(). Briefly describe how the deadlock is formed:

On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started
on behalf of the remote EX lock request. On the other hand, the recursive
cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
Because there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.
---

> ...
> take inode lock again (phase3)
>
> Deadlock happens because Process B in phase2 and Process A in phase3
> are waiting for each other.

It is the responsibility of a local lock (like i_mutex) to protect the critical
section from races among processes on the same node.

> So you are trying to fix it by making phase3 finish without really doing

Phase3 can go ahead because this node is already under the protection of the cluster lock.

> __ocfs2_cluster_lock, then Process B can continue either.
> Let us bear in mind that phase1 and phase3 are in the same context and
> executed in order. That's why I think there is no need to check if locked
> by myself in phase1.
> If phase1 finds it is already locked by myself, that means the holder
> is left by last operation without dec holder. That's why I think it is a bug
> instead of a recursive lock case.

Did I answer your question?

Thanks!
Eric

>
> Thanks,
> Joseph
>>
>> Thanks,
>> Eric
>>
>>>
>>> Thanks,
>>> Joseph
>>>> Thanks,
>>>> Eric
>>>>>
>>>>> Thanks,
>>>>> Joseph
>>>>>>
>>>>>> Thanks for your review;-)
>>>>>> Eric
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Joseph
>>>>>>>> Signed-off-by: Eric Ren
>>>>>>>> ---
>>>>>>>>  fs/ocfs2/acl.c  | 39 ++-
>>>>>>>>  fs/ocfs2/file.c | 44 ++--
>>>>>>>>  2 files changed, 68 insertions(+), 15 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>>>>>>> index bed1fcb..c539890 100644
>>>>>>>> --- a/fs/ocfs2/acl.c
>>>>>>>> +++ b/fs/ocfs2/acl.c
>>>>>>>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>>>>>>>>  {
>>>>>>>>  	struct buffer_head *bh = NULL;
>>>>>>>>  	int status = 0;
>>>>>>>> -
>>>>>>>> -	status = ocfs2_inode_lock(inode, &bh, 1);
>>>>>>>> +	int arg_flags = 0, has_locked;
>>>>>>>> +	struct ocfs2_holder oh;
>>>>>>>> +	struct ocfs2_lock_res *lockres;
>>>>>>>> +
>>>>>>>> +	lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>>>>> +	has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>>>>>> +	if (has_locked)
>>>>>>>> +		arg_flags = OCFS2_META_LOCK_GETBH;
>>>>>>>> +	status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>>>>>>>  	if (s
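Note on the exchange above: the phase1/phase3 flow can be modelled in user space to show why the nested entry point must not request the cluster lock a second time. This is a minimal single-task sketch under stated assumptions: every name here (`model_lockres`, `MODEL_GETBH`, `model_permission()`, `model_get_acl()`, `model_nested_lock_calls()`) is hypothetical and only stands in for the ocfs2 functions named in the thread, and there is no real DLM, BAST or remote node involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are ocfs2 APIs. */
struct model_lockres {
	int cluster_lock_calls;	/* times the "cluster lock" was really requested */
	int held;		/* current lock depth on this node */
	const void *holder;	/* task registered as holder, or NULL */
};

#define MODEL_GETBH 0x1		/* stands in for OCFS2_META_LOCK_GETBH */

static void model_inode_lock_full(struct model_lockres *res, int flags)
{
	if (flags & MODEL_GETBH)
		return;		/* caller already holds the lock: skip relocking */
	res->cluster_lock_calls++;
	res->held++;
}

static void model_inode_unlock(struct model_lockres *res)
{
	res->held--;
}

/* Inner entry point, playing the role of the get_acl step. */
static void model_get_acl(struct model_lockres *res, const void *task)
{
	int has_locked = (res->holder == task);

	model_inode_lock_full(res, has_locked ? MODEL_GETBH : 0);
	if (!has_locked)
		res->holder = task;

	/* ... the ACL would be read here, under the lock ... */

	if (!has_locked) {
		res->holder = NULL;
		model_inode_unlock(res);
	}
}

/* Outer entry point, playing the role of the permission check. */
static void model_permission(struct model_lockres *res, const void *task)
{
	int has_locked = (res->holder == task);

	model_inode_lock_full(res, has_locked ? MODEL_GETBH : 0);
	if (!has_locked)
		res->holder = task;

	model_get_acl(res, task);	/* nested call on the same task */

	if (!has_locked) {
		res->holder = NULL;
		model_inode_unlock(res);
	}
}

/* Returns how many real lock requests one nested call chain makes. */
int model_nested_lock_calls(void)
{
	struct model_lockres res = { 0, 0, NULL };
	int task;	/* its address serves as the task identity */

	model_permission(&res, &task);
	assert(res.held == 0 && res.holder == NULL);
	return res.cluster_lock_calls;
}
```

In this model the outer call takes the lock once and the inner call only observes the holder, so there is no second lock request that a remote EX request arriving in between could block.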
Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points
On 17/1/6 17:13, Eric Ren wrote:
> Hi,
>
>>>>>>>
>>>>>>> Fixes them by adding the tracking logic (in the previous patch) for
>>>>>>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>>>>>> ocfs2_setattr().
>>>>>> As described cases above, shall we just add the tracking logic
>>>>>> only for set/get_acl()?
>>>>>
>>>>> The idea is to detect recursive locking on the running task stack.
>>>>> Take case 1) for example if ocfs2_permission()
>>>>> is not changed:
>>>>>
>>>>> ocfs2_permission() <=== take PR, ocfs2_holder is not added
>>>>>    ocfs2_iop_get_acl <=== still take PR, because there is no lock
>>>>>                           holder on the tracking list
>>>> I mean we have no need to check if locked by me, just do inode lock and
>>>> add holder.
>>>> This will make code more clean, IMO.
>>> Oh, sorry, I get your point this time. I think we need to check it
>>> if there are more than one processes that hold
>>> PR lock on the same resource. If I don't understand you correctly,
>>> please tell me why you think it's not necessary
>>> to check before getting lock?
>> The code logic can only check if it is locked by myself. In the case
> Why only...?
>> described above, ocfs2_permission is the first entry to take inode lock.
>> And even if check succeeds, it is a bug without unlock, but not the case
>> of recursive lock.
>
> By checking succeeds, you mean it's locked by me, right? If so, this flag
> "arg_flags = OCFS2_META_LOCK_GETBH"
> will be passed down to ocfs2_inode_lock_full(), which gets back buffer
> head of
> the disk inode for us if necessary, but doesn't take cluster locking
> again. So, there is
> no need to unlock in such case.

I am trying to state my point more clearly...
The issue case you are trying to fix is:
Process A
take inode lock (phase1)
...
<<< race window (phase2, Process B)
...
take inode lock again (phase3)

Deadlock happens because Process B in phase2 and Process A in phase3
are waiting for each other.
So you are trying to fix it by making phase3 finish without really doing
__ocfs2_cluster_lock, then Process B can continue either.
Let us bear in mind that phase1 and phase3 are in the same context and
executed in order. That's why I think there is no need to check if locked
by myself in phase1.
If phase1 finds it is already locked by myself, that means the holder
is left by last operation without dec holder. That's why I think it is a bug
instead of a recursive lock case.

Thanks,
Joseph
>
> Thanks,
> Eric
>
>>
>> Thanks,
>> Joseph
>>>
>>> Thanks,
>>> Eric
>>>> Thanks,
>>>> Joseph
>>>>>
>>>>> Thanks for your review;-)
>>>>> Eric
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Joseph
>>>>>>>
>>>>>>> Signed-off-by: Eric Ren
>>>>>>> ---
>>>>>>>  fs/ocfs2/acl.c  | 39 ++-
>>>>>>>  fs/ocfs2/file.c | 44 ++--
>>>>>>>  2 files changed, 68 insertions(+), 15 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>>>>>> index bed1fcb..c539890 100644
>>>>>>> --- a/fs/ocfs2/acl.c
>>>>>>> +++ b/fs/ocfs2/acl.c
>>>>>>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>>>>>>>  {
>>>>>>>  	struct buffer_head *bh = NULL;
>>>>>>>  	int status = 0;
>>>>>>> -
>>>>>>> -	status = ocfs2_inode_lock(inode, &bh, 1);
>>>>>>> +	int arg_flags = 0, has_locked;
>>>>>>> +	struct ocfs2_holder oh;
>>>>>>> +	struct ocfs2_lock_res *lockres;
>>>>>>> +
>>>>>>> +	lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>>>> +	has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>>>>> +	if (has_locked)
>>>>>>> +		arg_flags = OCFS2_META_LOCK_GETBH;
>>>>>>> +	status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>>>>>>  	if (status < 0) {
>>>>>>>  		if (status != -ENOENT)
>>>>>>>  			mlog_errno(status);
>>>>>>>  		return status;
>>>>>>>  	}
>>>>>>> +	if (!has_locked)
>>>>>>> +		ocfs2_add_holder(lockres, &oh);
>>>>>>> +
>>>>>>>  	status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>>>>>>> -	ocfs2_inode_unlock(inode, 1);
>>>>>>> +
>>>>>>> +	if (!has_locked) {
>>>>>>> +		ocfs2_remove_holder(lockres, &oh);
>>>>>>> +		ocfs2_inode_unlock(inode, 1);
>>>>>>> +	}
>>>>>>>  	brelse(bh);
>>>>>>> +
>>>>>>>  	return status;
>>>>>>>  }
>>>>>>> @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
>>>>>>>  	struct buffer_head *di_bh = NULL;
>>>>>>>  	struct posix_acl *acl;
>>>>>>>  	int ret;
>>>>>>> +	int arg_flags = 0, has_locked;
>>>>>>> +	struct ocfs2_holder oh;
>>>>>>> +	struct ocfs2_lock_res *lockres;
>>>>>>>  	osb = OCFS2_SB(inode->i_sb);
>>>>>>>  	if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>>>>>>  		return NULL;
>>>>>>> -	ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>>>>> +
>>>>>>> +	lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>>>> +
Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points
Hi,

>>>>>>
>>>>>> Fixes them by adding the tracking logic (in the previous patch) for
>>>>>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>>>>> ocfs2_setattr().
>>>>> As described cases above, shall we just add the tracking logic only for
>>>>> set/get_acl()?
>>>>
>>>> The idea is to detect recursive locking on the running task stack. Take
>>>> case 1) for example if ocfs2_permission() is not changed:
>>>>
>>>> ocfs2_permission() <=== take PR, ocfs2_holder is not added
>>>>    ocfs2_iop_get_acl <=== still take PR, because there is no lock holder
>>>>                           on the tracking list
>>> I mean we have no need to check if locked by me, just do inode lock and add
>>> holder.
>>> This will make code more clean, IMO.
>> Oh, sorry, I get your point this time. I think we need to check it if there
>> are more than
>> one processes that hold
>> PR lock on the same resource. If I don't understand you correctly, please
>> tell me why
>> you think it's not necessary
>> to check before getting lock?
> The code logic can only check if it is locked by myself. In the case

Why only...?

> described above, ocfs2_permission is the first entry to take inode lock.
> And even if check succeeds, it is a bug without unlock, but not the case
> of recursive lock.

By checking succeeds, you mean it's locked by me, right? If so, this flag
"arg_flags = OCFS2_META_LOCK_GETBH"
will be passed down to ocfs2_inode_lock_full(), which gets back buffer head of
the disk inode for us if necessary, but doesn't take cluster locking again.
So, there is no need to unlock in such case.

Thanks,
Eric

>
> Thanks,
> Joseph
>>
>> Thanks,
>> Eric
>>>
>>> Thanks,
>>> Joseph
>>>> Thanks for your review;-)
>>>> Eric
>>>>>
>>>>> Thanks,
>>>>> Joseph
>>>>>>
>>>>>> Signed-off-by: Eric Ren
>>>>>> ---
>>>>>>  fs/ocfs2/acl.c  | 39 ++-
>>>>>>  fs/ocfs2/file.c | 44 ++--
>>>>>>  2 files changed, 68 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>>>>> index bed1fcb..c539890 100644
>>>>>> --- a/fs/ocfs2/acl.c
>>>>>> +++ b/fs/ocfs2/acl.c
>>>>>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>>>>>>  {
>>>>>>  	struct buffer_head *bh = NULL;
>>>>>>  	int status = 0;
>>>>>> -
>>>>>> -	status = ocfs2_inode_lock(inode, &bh, 1);
>>>>>> +	int arg_flags = 0, has_locked;
>>>>>> +	struct ocfs2_holder oh;
>>>>>> +	struct ocfs2_lock_res *lockres;
>>>>>> +
>>>>>> +	lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>>> +	has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>>>> +	if (has_locked)
>>>>>> +		arg_flags = OCFS2_META_LOCK_GETBH;
>>>>>> +	status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>>>>>  	if (status < 0) {
>>>>>>  		if (status != -ENOENT)
>>>>>>  			mlog_errno(status);
>>>>>>  		return status;
>>>>>>  	}
>>>>>> +	if (!has_locked)
>>>>>> +		ocfs2_add_holder(lockres, &oh);
>>>>>> +
>>>>>>  	status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>>>>>> -	ocfs2_inode_unlock(inode, 1);
>>>>>> +
>>>>>> +	if (!has_locked) {
>>>>>> +		ocfs2_remove_holder(lockres, &oh);
>>>>>> +		ocfs2_inode_unlock(inode, 1);
>>>>>> +	}
>>>>>>  	brelse(bh);
>>>>>> +
>>>>>>  	return status;
>>>>>>  }
>>>>>> @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
>>>>>>  	struct buffer_head *di_bh = NULL;
>>>>>>  	struct posix_acl *acl;
>>>>>>  	int ret;
>>>>>> +	int arg_flags = 0, has_locked;
>>>>>> +	struct ocfs2_holder oh;
>>>>>> +	struct ocfs2_lock_res *lockres;
>>>>>>  	osb = OCFS2_SB(inode->i_sb);
>>>>>>  	if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>>>>>  		return NULL;
>>>>>> -	ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>>>> +
>>>>>> +	lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>>> +	has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>>>> +	if (has_locked)
>>>>>> +		arg_flags = OCFS2_META_LOCK_GETBH;
>>>>>> +	ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
>>>>>>  	if (ret < 0) {
>>>>>>  		if (ret != -ENOENT)
>>>>>>  			mlog_errno(ret);
>>>>>>  		return ERR_PTR(ret);
>>>>>>  	}
>>>>>> +	if (!has_locked)
>>>>>> +		ocfs2_add_holder(lockres, &oh);
>>>>>>  	acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>>>>>> -	ocfs2_inode_unlock(inode, 0);
>>>>>> +	if (!has_locked) {
>>>>>> +		ocfs2_remove_holder(lockres, &oh);
>>>>>> +		ocfs2_inode_unlock(inode, 0);
>>>>>> +	}
>>>>>>  	brelse(di_bh);
>>>>>> +
>>>>>>  	return acl;
>>>>>>  }
>>>>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>>>>>> index c488965..62be75d 100644
>>>>>> --- a/fs/ocfs2/file.c
>>>>>> +++ b/fs/ocfs2/file.c
>>>>>> @@
Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points
On 17/1/6 16:21, Eric Ren wrote:
> On 01/06/2017 03:14 PM, Joseph Qi wrote:
>>
>>
>> On 17/1/6 14:56, Eric Ren wrote:
>>> On 01/06/2017 02:09 PM, Joseph Qi wrote:
>>>> Hi Eric,
>>>>
>>>>
>>>> On 17/1/5 23:31, Eric Ren wrote:
>>>>> Commit 743b5f1434f5 ("ocfs2: take inode lock in
>>>>> ocfs2_iop_set/get_acl()")
>>>>> results in a deadlock, as the author "Tariq Saeed" realized shortly
>>>>> after the patch was merged. The discussion happened here
>>>>> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>>>>>
>>>>> The reason why taking cluster inode lock at vfs entry points opens up
>>>>> a self deadlock window, is explained in the previous patch of this
>>>>> series.
>>>>>
>>>>> So far, we have seen two different code paths that have this issue.
>>>>> 1. do_sys_open
>>>>>      may_open
>>>>>       inode_permission
>>>>>        ocfs2_permission
>>>>>         ocfs2_inode_lock() <=== take PR
>>>>>          generic_permission
>>>>>           get_acl
>>>>>            ocfs2_iop_get_acl
>>>>>             ocfs2_inode_lock() <=== take PR
>>>>> 2. fchmod|fchmodat
>>>>>     chmod_common
>>>>>      notify_change
>>>>>       ocfs2_setattr <=== take EX
>>>>>        posix_acl_chmod
>>>>>         get_acl
>>>>>          ocfs2_iop_get_acl <=== take PR
>>>>>         ocfs2_iop_set_acl <=== take EX
>>>>>
>>>>> Fixes them by adding the tracking logic (in the previous patch) for
>>>>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>>>> ocfs2_setattr().
>>>> As described cases above, shall we just add the tracking logic only for
>>>> set/get_acl()?
>>>
>>> The idea is to detect recursive locking on the running task stack.
>>> Take case 1) for example if ocfs2_permission()
>>> is not changed:
>>>
>>> ocfs2_permission() <=== take PR, ocfs2_holder is not added
>>>    ocfs2_iop_get_acl <=== still take PR, because there is no lock
>>>                           holder on the tracking list
>> I mean we have no need to check if locked by me, just do inode lock
>> and add holder.
>> This will make code more clean, IMO.
> Oh, sorry, I get your point this time. I think we need to check it if
> there are more than one processes that hold
> PR lock on the same resource. If I don't understand you correctly,
> please tell me why you think it's not necessary
> to check before getting lock?

The code logic can only check if it is locked by myself. In the case
described above, ocfs2_permission is the first entry to take inode lock.
And even if check succeeds, it is a bug without unlock, but not the case
of recursive lock.

Thanks,
Joseph
>
> Thanks,
> Eric
>>
>> Thanks,
>> Joseph
>>>
>>> Thanks for your review;-)
>>> Eric
>>>
>>>> Thanks,
>>>> Joseph
>>>>>
>>>>> Signed-off-by: Eric Ren
>>>>> ---
>>>>>  fs/ocfs2/acl.c  | 39 ++-
>>>>>  fs/ocfs2/file.c | 44 ++--
>>>>>  2 files changed, 68 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>>>> index bed1fcb..c539890 100644
>>>>> --- a/fs/ocfs2/acl.c
>>>>> +++ b/fs/ocfs2/acl.c
>>>>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>>>>>  {
>>>>>  	struct buffer_head *bh = NULL;
>>>>>  	int status = 0;
>>>>> -
>>>>> -	status = ocfs2_inode_lock(inode, &bh, 1);
>>>>> +	int arg_flags = 0, has_locked;
>>>>> +	struct ocfs2_holder oh;
>>>>> +	struct ocfs2_lock_res *lockres;
>>>>> +
>>>>> +	lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>> +	has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>>> +	if (has_locked)
>>>>> +		arg_flags = OCFS2_META_LOCK_GETBH;
>>>>> +	status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>>>>  	if (status < 0) {
>>>>>  		if (status != -ENOENT)
>>>>>  			mlog_errno(status);
>>>>>  		return status;
>>>>>  	}
>>>>> +	if (!has_locked)
>>>>> +		ocfs2_add_holder(lockres, &oh);
>>>>> +
>>>>>  	status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>>>>> -	ocfs2_inode_unlock(inode, 1);
>>>>> +
>>>>> +	if (!has_locked) {
>>>>> +		ocfs2_remove_holder(lockres, &oh);
>>>>> +		ocfs2_inode_unlock(inode, 1);
>>>>> +	}
>>>>>  	brelse(bh);
>>>>> +
>>>>>  	return status;
>>>>>  }
>>>>> @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
>>>>>  	struct buffer_head *di_bh = NULL;
>>>>>  	struct posix_acl *acl;
>>>>>  	int ret;
>>>>> +	int arg_flags = 0, has_locked;
>>>>> +	struct ocfs2_holder oh;
>>>>> +	struct ocfs2_lock_res *lockres;
>>>>>  	osb = OCFS2_SB(inode->i_sb);
>>>>>  	if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>>>>  		return NULL;
>>>>> -	ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>>> +
>>>>> +	lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>>> +	has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>>> +	if (has_locked
Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points
On 01/06/2017 03:14 PM, Joseph Qi wrote:
>
>
> On 17/1/6 14:56, Eric Ren wrote:
>> On 01/06/2017 02:09 PM, Joseph Qi wrote:
>>> Hi Eric,
>>>
>>>
>>> On 17/1/5 23:31, Eric Ren wrote:
>>>> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
>>>> results in a deadlock, as the author "Tariq Saeed" realized shortly
>>>> after the patch was merged. The discussion happened here
>>>> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>>>>
>>>> The reason why taking cluster inode lock at vfs entry points opens up
>>>> a self deadlock window, is explained in the previous patch of this
>>>> series.
>>>>
>>>> So far, we have seen two different code paths that have this issue.
>>>> 1. do_sys_open
>>>>      may_open
>>>>       inode_permission
>>>>        ocfs2_permission
>>>>         ocfs2_inode_lock() <=== take PR
>>>>          generic_permission
>>>>           get_acl
>>>>            ocfs2_iop_get_acl
>>>>             ocfs2_inode_lock() <=== take PR
>>>> 2. fchmod|fchmodat
>>>>     chmod_common
>>>>      notify_change
>>>>       ocfs2_setattr <=== take EX
>>>>        posix_acl_chmod
>>>>         get_acl
>>>>          ocfs2_iop_get_acl <=== take PR
>>>>         ocfs2_iop_set_acl <=== take EX
>>>>
>>>> Fixes them by adding the tracking logic (in the previous patch) for
>>>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>>> ocfs2_setattr().
>>> As described cases above, shall we just add the tracking logic only for
>>> set/get_acl()?
>>
>> The idea is to detect recursive locking on the running task stack. Take case
>> 1) for example if ocfs2_permission() is not changed:
>>
>> ocfs2_permission() <=== take PR, ocfs2_holder is not added
>>    ocfs2_iop_get_acl <=== still take PR, because there is no lock holder on
>>                           the tracking list
> I mean we have no need to check if locked by me, just do inode lock and add
> holder.
> This will make code more clean, IMO.

Oh, sorry, I get your point this time. I think we need to check it if there are
more than one processes that hold PR lock on the same resource. If I don't
understand you correctly, please tell me why you think it's not necessary to
check before getting lock?

Thanks,
Eric

>
> Thanks,
> Joseph
>>
>> Thanks for your review;-)
>> Eric
>>
>>>
>>> Thanks,
>>> Joseph
>>>> Signed-off-by: Eric Ren
>>>> ---
>>>>  fs/ocfs2/acl.c  | 39 ++-
>>>>  fs/ocfs2/file.c | 44 ++--
>>>>  2 files changed, 68 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>>> index bed1fcb..c539890 100644
>>>> --- a/fs/ocfs2/acl.c
>>>> +++ b/fs/ocfs2/acl.c
>>>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>>>>  {
>>>>  	struct buffer_head *bh = NULL;
>>>>  	int status = 0;
>>>> -
>>>> -	status = ocfs2_inode_lock(inode, &bh, 1);
>>>> +	int arg_flags = 0, has_locked;
>>>> +	struct ocfs2_holder oh;
>>>> +	struct ocfs2_lock_res *lockres;
>>>> +
>>>> +	lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>> +	has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>> +	if (has_locked)
>>>> +		arg_flags = OCFS2_META_LOCK_GETBH;
>>>> +	status = ocfs2_inode_lock_full(inode, &bh, 1, arg_flags);
>>>>  	if (status < 0) {
>>>>  		if (status != -ENOENT)
>>>>  			mlog_errno(status);
>>>>  		return status;
>>>>  	}
>>>> +	if (!has_locked)
>>>> +		ocfs2_add_holder(lockres, &oh);
>>>> +
>>>>  	status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
>>>> -	ocfs2_inode_unlock(inode, 1);
>>>> +
>>>> +	if (!has_locked) {
>>>> +		ocfs2_remove_holder(lockres, &oh);
>>>> +		ocfs2_inode_unlock(inode, 1);
>>>> +	}
>>>>  	brelse(bh);
>>>> +
>>>>  	return status;
>>>>  }
>>>> @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type)
>>>>  	struct buffer_head *di_bh = NULL;
>>>>  	struct posix_acl *acl;
>>>>  	int ret;
>>>> +	int arg_flags = 0, has_locked;
>>>> +	struct ocfs2_holder oh;
>>>> +	struct ocfs2_lock_res *lockres;
>>>>  	osb = OCFS2_SB(inode->i_sb);
>>>>  	if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>>>  		return NULL;
>>>> -	ret = ocfs2_inode_lock(inode, &di_bh, 0);
>>>> +
>>>> +	lockres = &OCFS2_I(inode)->ip_inode_lockres;
>>>> +	has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>> +	if (has_locked)
>>>> +		arg_flags = OCFS2_META_LOCK_GETBH;
>>>> +	ret = ocfs2_inode_lock_full(inode, &di_bh, 0, arg_flags);
>>>>  	if (ret < 0) {
>>>>  		if (ret != -ENOENT)
>>>>  			mlog_errno(ret);
>>>>  		return ERR_PTR(ret);
>>>>  	}
>>>> +	if (!has_locked)
>>>> +		ocfs2_add_holder(lockres, &oh);
>>>>  	acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>>>> -	ocfs2_inode_unlock(i
Re: [Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
On 01/06/2017 03:24 PM, Joseph Qi wrote:
>
>
> On 17/1/6 15:03, Eric Ren wrote:
>> On 01/06/2017 02:07 PM, Joseph Qi wrote:
>>> Hi Eric,
>>>
>>>
>>> On 17/1/5 23:31, Eric Ren wrote:
>>>> We are in the situation that we have to avoid recursive cluster locking,
>>>> but there is no way to check if a cluster lock has been taken by a
>>>> process already.
>>>>
>>>> Mostly, we can avoid recursive locking by writing code carefully.
>>>> However, we found that it's very hard to handle the routines that
>>>> are invoked directly by vfs code. For instance:
>>>>
>>>> const struct inode_operations ocfs2_file_iops = {
>>>>     .permission     = ocfs2_permission,
>>>>     .get_acl        = ocfs2_iop_get_acl,
>>>>     .set_acl        = ocfs2_iop_set_acl,
>>>> };
>>>>
>>>> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
>>>> do_sys_open
>>>>  may_open
>>>>   inode_permission
>>>>    ocfs2_permission
>>>>     ocfs2_inode_lock() <=== first time
>>>>      generic_permission
>>>>       get_acl
>>>>        ocfs2_iop_get_acl
>>>>         ocfs2_inode_lock() <=== recursive one
>>>>
>>>> A deadlock will occur if a remote EX request comes in between two
>>>> of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:
>>>>
>>>> On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
>>>> BAST(ocfs2_generic_handle_bast) when downconvert is started
>>>> on behalf of the remote EX lock request. On the other hand, the recursive
>>>> cluster lock (the second one) will be blocked in __ocfs2_cluster_lock()
>>>> because of OCFS2_LOCK_BLOCKED. But the downconvert never completes. Why?
>>>> Because there is no chance for the first cluster lock on this node to be
>>>> unlocked - we block ourselves in the code path.
>>>>
>>>> The idea to fix this issue is mostly taken from gfs2 code.
>>>> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
>>>> keep track of the processes' pid who has taken the cluster lock
>>>> of this lock resource;
>>>> 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
>>>> it means just getting back disk inode bh for us if we've got cluster lock.
>>>> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
>>>> have got the cluster lock in the upper code path.
>>>>
>>>> The tracking logic should be used by some of the ocfs2 vfs's callbacks,
>>>> to solve the recursive locking issue caused by the fact that vfs routines
>>>> can call into each other.
>>>>
>>>> The performance penalty of processing the holder list should only be seen
>>>> at a few cases where the tracking logic is used, such as get/set acl.
>>>>
>>>> You may ask what if the first time we got a PR lock, and the second time
>>>> we want a EX lock? fortunately, this case never happens in the real world,
>>>> as far as I can see, including permission check, (get|set)_(acl|attr), and
>>>> the gfs2 code also do so.
>>>>
>>>> Signed-off-by: Eric Ren
>>>> ---
>>>>  fs/ocfs2/dlmglue.c | 47 ---
>>>>  fs/ocfs2/dlmglue.h | 18 ++
>>>>  fs/ocfs2/ocfs2.h   |  1 +
>>>>  3 files changed, 63 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>>> index 83d576f..500bda4 100644
>>>> --- a/fs/ocfs2/dlmglue.c
>>>> +++ b/fs/ocfs2/dlmglue.c
>>>> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
>>>>  	init_waitqueue_head(&res->l_event);
>>>>  	INIT_LIST_HEAD(&res->l_blocked_list);
>>>>  	INIT_LIST_HEAD(&res->l_mask_waiters);
>>>> +	INIT_LIST_HEAD(&res->l_holders);
>>>>  }
>>>>  void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
>>>> @@ -749,6 +750,45 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
>>>>  	res->l_flags = 0UL;
>>>>  }
>>>> +inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
>>>> +			     struct ocfs2_holder *oh)
>>>> +{
>>>> +	INIT_LIST_HEAD(&oh->oh_list);
>>>> +	oh->oh_owner_pid = get_pid(task_pid(current));
>>>> +
>>>> +	spin_lock(&lockres->l_lock);
>>>> +	list_add_tail(&oh->oh_list, &lockres->l_holders);
>>>> +	spin_unlock(&lockres->l_lock);
>>>> +}
>>>> +
>>>> +inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
>>>> +				struct ocfs2_holder *oh)
>>>> +{
>>>> +	spin_lock(&lockres->l_lock);
>>>> +	list_del(&oh->oh_list);
>>>> +	spin_unlock(&lockres->l_lock);
>>>> +
>>>> +	put_pid(oh->oh_owner_pid);
>>>> +}
>>>> +
>>>> +inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
>>>> +{
>>>> +	struct ocfs2_holder *oh;
>>>> +	struct pid *pid;
>>>> +
>>>> +	/* look in the list of holders for one with the current task as owner */
>>>> +	spin_lock(&lockres->l_lock);
>>>> +	pid = task_pid(current);
>>>> +	list_for_each_entry(oh, &lockres->l_holders, oh_list) {
>>>> +		if (oh->oh_owner_pid == pid)
>>>> +			goto out;
>>>> +	}
>>>> +	oh = NULL;
>>>> +out:
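Note on the helpers quoted above: the add/remove/lookup logic on the holder list can be exercised outside the kernel with a small circular doubly linked list. The sketch below is a user-space analogue under stated assumptions: the names (`holder`, `lockres`, `add_holder()`, `remove_holder()`, `is_locked_by()`, `holder_list_demo()`) are hypothetical, an `int` owner stands in for the task's `struct pid`, and the spin lock that the patch takes around each list operation is deliberately left out because the sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space analogue of the holder list; not kernel code. */
struct holder {
	int owner;			/* stands in for the owning task's pid */
	struct holder *prev, *next;
};

struct lockres {
	struct holder head;		/* list sentinel, playing the role of l_holders */
};

static void lockres_init(struct lockres *res)
{
	res->head.prev = res->head.next = &res->head;
}

/* Append a holder at the tail of the list. */
static void add_holder(struct lockres *res, struct holder *oh, int owner)
{
	oh->owner = owner;
	oh->prev = res->head.prev;
	oh->next = &res->head;
	res->head.prev->next = oh;
	res->head.prev = oh;
}

/* Unlink a holder from whichever list it is on. */
static void remove_holder(struct holder *oh)
{
	oh->prev->next = oh->next;
	oh->next->prev = oh->prev;
	oh->prev = oh->next = oh;
}

/* Return the holder entry owned by "owner", or NULL if there is none. */
static struct holder *is_locked_by(struct lockres *res, int owner)
{
	struct holder *oh;

	for (oh = res->head.next; oh != &res->head; oh = oh->next)
		if (oh->owner == owner)
			return oh;
	return NULL;
}

/* Walks through add, lookup and remove; returns 0 when all checks pass. */
int holder_list_demo(void)
{
	struct lockres res;
	struct holder a, b;

	lockres_init(&res);
	add_holder(&res, &a, 100);
	add_holder(&res, &b, 200);
	assert(is_locked_by(&res, 100) == &a);
	assert(is_locked_by(&res, 300) == NULL);

	remove_holder(&a);
	assert(is_locked_by(&res, 100) == NULL);
	assert(is_locked_by(&res, 200) == &b);
	return 0;
}
```

Because each holder entry lives on the caller's stack in the patch, the entry must be removed before the function that added it returns; the sketch follows the same add-then-remove discipline.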