Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
Hello Peter, On (08/30/17 10:47), Peter Zijlstra wrote: [..] > On Wed, Aug 30, 2017 at 10:42:07AM +0200, Peter Zijlstra wrote: > > > > So the overhead looks to be spread out over all sorts, which makes it > > harder to find and fix. > > > > stack unwinding is done lots and is fairly expensive, I've not yet > > checked if crossrelease does too much of that. > > Aah, we do an unconditional stack unwind for every __lock_acquire() now. > It keeps a trace in the xhlocks[]. > > Does the below cure most of that overhead? thanks. mostly yes. the kernel is not so dramatically slower now. it's still seems to be a bit slower, which is expected I suppose, but over all it's much better real1m35.209s user4m12.467s sys 0m49.457s // approx 10 seconds slower. -ss
RE: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
> -Original Message- > From: Peter Zijlstra [mailto:pet...@infradead.org] > Sent: Wednesday, August 30, 2017 5:48 PM > To: Sergey Senozhatsky > Cc: Byungchul Park; Bart Van Assche; linux-ker...@vger.kernel.org; linux- > bl...@vger.kernel.org; martin.peter...@oracle.com; ax...@kernel.dk; linux- > s...@vger.kernel.org; s...@canb.auug.org.au; linux-n...@vger.kernel.org; > kernel-t...@lge.com > Subject: Re: possible circular locking dependency detected [was: linux- > next: Tree for Aug 22] > > On Wed, Aug 30, 2017 at 10:42:07AM +0200, Peter Zijlstra wrote: > > > > So the overhead looks to be spread out over all sorts, which makes it > > harder to find and fix. > > > > stack unwinding is done lots and is fairly expensive, I've not yet > > checked if crossrelease does too much of that. > > Aah, we do an unconditional stack unwind for every __lock_acquire() now. > It keeps a trace in the xhlocks[]. Yeah.. I also think this is most significant.. > > Does the below cure most of that overhead? > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > index 44c8d0d17170..7b872036b72e 100644 > --- a/kernel/locking/lockdep.c > +++ b/kernel/locking/lockdep.c > @@ -4872,7 +4872,7 @@ static void add_xhlock(struct held_lock *hlock) > xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES; > xhlock->trace.entries = xhlock->trace_entries; > xhlock->trace.skip = 3; > - save_stack_trace(&xhlock->trace); > + /* save_stack_trace(&xhlock->trace); */ > } > > static inline int same_context_xhlock(struct hist_lock *xhlock)
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Wed, Aug 30, 2017 at 10:42:07AM +0200, Peter Zijlstra wrote: > > So the overhead looks to be spread out over all sorts, which makes it > harder to find and fix. > > stack unwinding is done lots and is fairly expensive, I've not yet > checked if crossrelease does too much of that. Aah, we do an unconditional stack unwind for every __lock_acquire() now. It keeps a trace in the xhlocks[]. Does the below cure most of that overhead? diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 44c8d0d17170..7b872036b72e 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -4872,7 +4872,7 @@ static void add_xhlock(struct held_lock *hlock) xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES; xhlock->trace.entries = xhlock->trace_entries; xhlock->trace.skip = 3; - save_stack_trace(&xhlock->trace); + /* save_stack_trace(&xhlock->trace); */ } static inline int same_context_xhlock(struct hist_lock *xhlock)
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Wed, Aug 30, 2017 at 03:15:11PM +0900, Sergey Senozhatsky wrote: > Hi, > > On (08/30/17 14:43), Byungchul Park wrote: > [..] > > > notably slower than earlier 4.13 linux-next. (e.g. scrolling in vim > > > is irritatingly slow) > > > > To Ingo, > > > > I cannot decide if we have to roll back CONFIG_LOCKDEP_CROSSRELEASE > > dependency on CONFIG_PROVE_LOCKING in Kconfig. With them enabled, > > lockdep detection becomes strong but has performance impact. But, > > it's anyway a debug option so IMHO we don't have to take case of the > > performance impact. Please let me know your decision. > > well, I expected it :) > > I've been running lockdep enabled kernels for years, and was OK with > the performance. but now it's just too much and I'm looking at disabling > lockdep. > > a more relevant test -- compilation of a relatively small project > > LOCKDEP -CROSSRELEASE -COMPLETIONS LOCKDEP +CROSSRELEASE +COMPLETIONS > >real1m23.722s real2m9.969s >user4m11.300s user4m15.458s >sys 0m49.386s sys 2m3.594s > > > you don't want to know how much time now it takes to recompile the > kernel ;) Right,.. so when I look at perf annotate for __lock_acquire and lock_release (the two most expensive lockdep functions in a kernel profile) I don't actually see much cross-release stuff. So the overhead looks to be spread out over all sorts, which makes it harder to find and fix. stack unwinding is done lots and is fairly expensive, I've not yet checked if crossrelease does too much of that. The below saved about 50% of my __lock_acquire() time, not sure it made a significant difference over all though. --- diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 44c8d0d17170..f8db1ead1c48 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -3386,7 +3386,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass, if (!class) return 0; } - atomic_inc((atomic_t *)&class->ops); + /* atomic_inc((atomic_t *)&class->ops); */ if (very_verbose(class)) { printk("\nacquire class [%p] %s", class->key, class->name); if (class->name_version > 1)
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
Hi, On (08/30/17 14:43), Byungchul Park wrote: [..] > > notably slower than earlier 4.13 linux-next. (e.g. scrolling in vim > > is irritatingly slow) > > To Ingo, > > I cannot decide if we have to roll back CONFIG_LOCKDEP_CROSSRELEASE > dependency on CONFIG_PROVE_LOCKING in Kconfig. With them enabled, > lockdep detection becomes strong but has performance impact. But, > it's anyway a debug option so IMHO we don't have to take case of the > performance impact. Please let me know your decision. well, I expected it :) I've been running lockdep enabled kernels for years, and was OK with the performance. but now it's just too much and I'm looking at disabling lockdep. a more relevant test -- compilation of a relatively small project LOCKDEP -CROSSRELEASE -COMPLETIONS LOCKDEP +CROSSRELEASE +COMPLETIONS real1m23.722s real2m9.969s user4m11.300s user4m15.458s sys 0m49.386s sys 2m3.594s you don't want to know how much time now it takes to recompile the kernel ;) -ss
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Wed, Aug 30, 2017 at 02:20:37PM +0900, Sergey Senozhatsky wrote: > Byungchul, a quick question. Hello Sergey, > have you measured the performance impact? somehow my linux-next is Yeah, it might have performance impact inevitably. > notably slower than earlier 4.13 linux-next. (e.g. scrolling in vim > is irritatingly slow) To Ingo, I cannot decide if we have to roll back CONFIG_LOCKDEP_CROSSRELEASE dependency on CONFIG_PROVE_LOCKING in Kconfig. With them enabled, lockdep detection becomes strong but has performance impact. But, it's anyway a debug option so IMHO we don't have to take case of the performance impact. Please let me know your decision. > `time dmesg' shows some difference, but probably that's not a good > test. > > !LOCKDEPLOCKDEP LOCKDEP -CROSSRELEASE -COMPLETIONS > real 0m0.661s 0m2.290s0m1.920s > user 0m0.010s 0m0.105s0m0.000s > sys 0m0.636s 0m2.224s0m1.888s > > anyone else "sees"/"can confirm" the slow down? > > > it gets back to "usual normal" when I disable CROSSRELEASE and COMPLETIONS. > > --- > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index b19c491cbc4e..cdc30ef81c5e 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -1091,8 +1091,6 @@ config PROVE_LOCKING > select DEBUG_MUTEXES > select DEBUG_RT_MUTEXES if RT_MUTEXES > select DEBUG_LOCK_ALLOC > - select LOCKDEP_CROSSRELEASE > - select LOCKDEP_COMPLETIONS > select TRACE_IRQFLAGS > default n > help > > --- > > -ss
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On (08/23/17 09:03), Byungchul Park wrote: [..] > > Byungchul, did you add the crosslock checks to lockdep? Can you have a look > > at > > the above report? That report namely doesn't make sense to me. > > The report is talking about the following lockup: > > A work in a worker A task work on exit to user > -- --- > mutex_lock(&bdev->bd_mutex) >mutext_lock(&bdev->bd_mutex) > blk_execute_rq() >wait_for_completion_io_timeout(&A) >complete(&A) > [..] > To Peterz, > > Anyway I wanted to avoid lockdep reports in the case using a timeout > interface. Do you think it's still worth reporting the kind of lockup? > I'm ok if you do. Byungchul, a quick question. have you measured the performance impact? somehow my linux-next is notably slower than earlier 4.13 linux-next. (e.g. scrolling in vim is irritatingly slow) `time dmesg' shows some difference, but probably that's not a good test. !LOCKDEPLOCKDEP LOCKDEP -CROSSRELEASE -COMPLETIONS real 0m0.661s 0m2.290s0m1.920s user 0m0.010s 0m0.105s0m0.000s sys 0m0.636s 0m2.224s0m1.888s anyone else "sees"/"can confirm" the slow down? it gets back to "usual normal" when I disable CROSSRELEASE and COMPLETIONS. --- diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index b19c491cbc4e..cdc30ef81c5e 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1091,8 +1091,6 @@ config PROVE_LOCKING select DEBUG_MUTEXES select DEBUG_RT_MUTEXES if RT_MUTEXES select DEBUG_LOCK_ALLOC - select LOCKDEP_CROSSRELEASE - select LOCKDEP_COMPLETIONS select TRACE_IRQFLAGS default n help --- -ss
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
Hi, On (08/24/17 12:39), Boqun Feng wrote: > On Wed, Aug 23, 2017 at 02:55:17PM +0900, Sergey Senozhatsky wrote: > > On (08/23/17 13:35), Boqun Feng wrote: > > > > KERN_CONT and "\n" should not be together. "\n" flushes the cont > > > > buffer immediately. > > > > > > > > > > Hmm.. Not quite familiar with printk() stuffs, but I could see several > > > usages of printk(KERN_CONT "...\n") in kernel. > > > > > > Did a bit research myself, and I now think the inappropriate use is to > > > use a KERN_CONT printk *after* another printk ending with a "\n". > > > > ah... I didn't check __print_lock_name(): it leaves unflushed cont buffer > > upon the return. sorry, your code is correct. > > > > So means printk(KERN_CON "..."); + printk(KERN_CONT "...\n") is a > correct usage, right? well, yes. with one precondition - there should be no printk-s from other CPUs/tasks in between printk(KERN_CON "..."); + printk(KERN_CONT "...\n") ^ here we can have a preliminary flush and broken cont line. but it's been this way forever. -ss
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Wed, Aug 23, 2017 at 02:55:17PM +0900, Sergey Senozhatsky wrote: > On (08/23/17 13:35), Boqun Feng wrote: > > > KERN_CONT and "\n" should not be together. "\n" flushes the cont > > > buffer immediately. > > > > > > > Hmm.. Not quite familiar with printk() stuffs, but I could see several > > usages of printk(KERN_CONT "...\n") in kernel. > > > > Did a bit research myself, and I now think the inappropriate use is to > > use a KERN_CONT printk *after* another printk ending with a "\n". > > ah... I didn't check __print_lock_name(): it leaves unflushed cont buffer > upon the return. sorry, your code is correct. > So means printk(KERN_CON "..."); + printk(KERN_CONT "...\n") is a correct usage, right? Thanks. Again, not familiar with printk stuff, glad you can help me go through this ;-) Regards, Boqun > -ss > > > > > printk("\n *** DEADLOCK ***\n\n"); > > > > + } else if (cross_lock(src->instance)) { > > > > + printk(" Possible unsafe locking scenario by > > > > crosslock:\n\n"); > > > > + printk(" CPU0CPU1\n"); > > > > + printk(" \n"); > > > > + printk(" lock("); > > > > + __print_lock_name(target); > > > > + printk(KERN_CONT ");\n"); > > > > + printk(" lock("); > > > > + __print_lock_name(source); > > > > + printk(KERN_CONT ");\n"); > > > > + printk(" lock("); > > > > + __print_lock_name(parent == source ? target : parent); > > > > + printk(KERN_CONT ");\n"); > > > > + printk(" unlock("); > > > > + __print_lock_name(source); > > > > + printk(KERN_CONT ");\n"); > > > > + printk("\n *** DEADLOCK ***\n\n"); > > > > } else { > > > > printk(" Possible unsafe locking scenario:\n\n"); > > > > printk(" CPU0CPU1\n"); > > > > -- > > > > 2.14.1 > > > > > > signature.asc Description: PGP signature
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Wed, Aug 23, 2017 at 09:03:04AM +0900, Byungchul Park wrote: > On Tue, Aug 22, 2017 at 09:43:56PM +, Bart Van Assche wrote: > The report is talking about the following lockup: > > A work in a worker A task work on exit to user > -- --- > mutex_lock(&bdev->bd_mutex) >mutext_lock(&bdev->bd_mutex) > blk_execute_rq() >wait_for_completion_io_timeout(&A) >complete(&A) > > Is this impossible? > > To Peterz, > > Anyway I wanted to avoid lockdep reports in the case using a timeout > interface. Do you think it's still worth reporting the kind of lockup? Yes, people might not have expected to hit the timeout on this. They might think timeout means a dead device or something like that. I'd like to heard from the block folks if this was constructed thus on purpose though.
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On (08/23/17 13:35), Boqun Feng wrote: > > KERN_CONT and "\n" should not be together. "\n" flushes the cont > > buffer immediately. > > > > Hmm.. Not quite familiar with printk() stuffs, but I could see several > usages of printk(KERN_CONT "...\n") in kernel. > > Did a bit research myself, and I now think the inappropriate use is to > use a KERN_CONT printk *after* another printk ending with a "\n". ah... I didn't check __print_lock_name(): it leaves unflushed cont buffer upon the return. sorry, your code is correct. -ss > > > printk("\n *** DEADLOCK ***\n\n"); > > > + } else if (cross_lock(src->instance)) { > > > + printk(" Possible unsafe locking scenario by crosslock:\n\n"); > > > + printk(" CPU0CPU1\n"); > > > + printk(" \n"); > > > + printk(" lock("); > > > + __print_lock_name(target); > > > + printk(KERN_CONT ");\n"); > > > + printk(" lock("); > > > + __print_lock_name(source); > > > + printk(KERN_CONT ");\n"); > > > + printk(" lock("); > > > + __print_lock_name(parent == source ? target : parent); > > > + printk(KERN_CONT ");\n"); > > > + printk(" unlock("); > > > + __print_lock_name(source); > > > + printk(KERN_CONT ");\n"); > > > + printk("\n *** DEADLOCK ***\n\n"); > > > } else { > > > printk(" Possible unsafe locking scenario:\n\n"); > > > printk(" CPU0CPU1\n"); > > > -- > > > 2.14.1 > > >
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On (08/23/17 13:35), Boqun Feng wrote: [..] > > > printk(KERN_CONT ");\n"); > > > > KERN_CONT and "\n" should not be together. "\n" flushes the cont > > buffer immediately. > > > > Hmm.. Not quite familiar with printk() stuffs, but I could see several > usages of printk(KERN_CONT "...\n") in kernel. > > Did a bit research myself, and I now think the inappropriate use is to > use a KERN_CONT printk *after* another printk ending with a "\n". Am I > missing some recent changes or rules of KERN_CONT? has been this way for quite some time (if not always). LOG_NEWLINE results in cont_flush(), which log_store() the content of KERN_CONT buffer. if we see that supplied message has no \n then we store it in a dedicated buffer (cont buffer) if (!(lflags & LOG_NEWLINE)) return cont_add(); return log_store(); we flush that buffer (move its content to the kernel log buffer) when we receive a message with a \n or when printk() from another task/context interrupts the current cont line and, thus, forces us to flush. -ss
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Wed, Aug 23, 2017 at 12:38:13PM +0800, Boqun Feng wrote: > From: Boqun Feng > Date: Wed, 23 Aug 2017 12:12:16 +0800 > Subject: [PATCH] lockdep: Print proper scenario if cross deadlock detected at > acquisition time > > For a potential deadlock about CROSSRELEASE as follow: > > P1 P2 > === = > lock(A) > lock(X) > lock(A) > commit(X) > > A: normal lock, X: cross lock > > , we could detect it at two places: > > 1. commit time: > > We have run P1 first, and have dependency A --> X in graph, and > then we run P2, and find the deadlock. > > 2. acquisition time: > > We have run P2 first, and have dependency A --> X, in X -> A > graph(because another P3 may run previously and is acquiring for ".. another P3 may have run previously and was holding .." ^ Additionally, not only P3 but also P2 like: lock(A) lock(X) lock(X) // I mean it's at _P2_ lock(A) commit(X) > lock X), and then we run P1 and find the deadlock. > > In current print_circular_lock_scenario(), for 1) we could print the > right scenario and note that's a deadlock related to CROSSRELEASE, > however for 2) we print the scenario as a normal lockdep deadlock. > > It's better to print a proper scenario related to CROSSRELEASE to help > users find their bugs more easily, so improve this. > > Signed-off-by: Boqun Feng > --- > kernel/locking/lockdep.c | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > index 642fb5362507..a3709e15f609 100644 > --- a/kernel/locking/lockdep.c > +++ b/kernel/locking/lockdep.c > @@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src, > __print_lock_name(target); > printk(KERN_CONT ");\n"); > printk("\n *** DEADLOCK ***\n\n"); > + } else if (cross_lock(src->instance)) { > + printk(" Possible unsafe locking scenario by crosslock:\n\n"); > + printk(" CPU0CPU1\n"); > + printk(" \n"); > + printk(" lock("); > + __print_lock_name(target); > + printk(KERN_CONT ");\n"); > + printk(" lock("); > + __print_lock_name(source); > + printk(KERN_CONT ");\n"); > + printk(" lock("); > + __print_lock_name(parent == source ? target : parent); > + printk(KERN_CONT ");\n"); > + printk(" unlock("); > + __print_lock_name(source); > + printk(KERN_CONT ");\n"); > + printk("\n *** DEADLOCK ***\n\n"); > } else { > printk(" Possible unsafe locking scenario:\n\n"); > printk(" CPU0CPU1\n"); I need time to be sure if it's correct.
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Wed, Aug 23, 2017 at 01:46:48PM +0900, Sergey Senozhatsky wrote: > On (08/23/17 12:38), Boqun Feng wrote: > [..] > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > > index 642fb5362507..a3709e15f609 100644 > > --- a/kernel/locking/lockdep.c > > +++ b/kernel/locking/lockdep.c > > @@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src, > > __print_lock_name(target); > > printk(KERN_CONT ");\n"); > > KERN_CONT and "\n" should not be together. "\n" flushes the cont > buffer immediately. > Hmm.. Not quite familiar with printk() stuffs, but I could see several usages of printk(KERN_CONT "...\n") in kernel. Did a bit research myself, and I now think the inappropriate use is to use a KERN_CONT printk *after* another printk ending with a "\n". Am I missing some recent changes or rules of KERN_CONT? Regards, Boqun > -ss > > > printk("\n *** DEADLOCK ***\n\n"); > > + } else if (cross_lock(src->instance)) { > > + printk(" Possible unsafe locking scenario by crosslock:\n\n"); > > + printk(" CPU0CPU1\n"); > > + printk(" \n"); > > + printk(" lock("); > > + __print_lock_name(target); > > + printk(KERN_CONT ");\n"); > > + printk(" lock("); > > + __print_lock_name(source); > > + printk(KERN_CONT ");\n"); > > + printk(" lock("); > > + __print_lock_name(parent == source ? target : parent); > > + printk(KERN_CONT ");\n"); > > + printk(" unlock("); > > + __print_lock_name(source); > > + printk(KERN_CONT ");\n"); > > + printk("\n *** DEADLOCK ***\n\n"); > > } else { > > printk(" Possible unsafe locking scenario:\n\n"); > > printk(" CPU0CPU1\n"); > > -- > > 2.14.1 > > signature.asc Description: PGP signature
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Wed, Aug 23, 2017 at 01:46:17PM +0900, Byungchul Park wrote: > On Wed, Aug 23, 2017 at 11:49:51AM +0800, Boqun Feng wrote: > > Hi Byungchul, > > > > On Wed, Aug 23, 2017 at 09:03:04AM +0900, Byungchul Park wrote: > > > On Tue, Aug 22, 2017 at 09:43:56PM +, Bart Van Assche wrote: > > > > On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote: > > > > > == > > > > > WARNING: possible circular locking dependency detected > > > > > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not > > > > > tainted > > > > > -- > > > > > fsck.ext4/148 is trying to acquire lock: > > > > > (&bdev->bd_mutex){+.+.}, at: [] > > > > > __blkdev_put+0x33/0x190 > > > > > > > > > > but now in release context of a crosslock acquired at the following: > > > > > ((complete)&wait#2){+.+.}, at: [] > > > > > blk_execute_rq+0xbb/0xda > > > > > > > > > > which lock already depends on the new lock. > > > > > > > > > I felt this message really misleading, because the deadlock is detected > > at the commit time of "((complete)&wait#2)" rather than the acquisition > > time of "(&bdev->bd_mutex)", so I made the following improvement. > > > > Thoughts? > > > > Regards, > > Boqun > > > > --->8 > > From: Boqun Feng > > Date: Wed, 23 Aug 2017 10:18:30 +0800 > > Subject: [PATCH] lockdep: Improve the readibility of crossrelease related > > splats > > > > When a crossrelease related deadlock is detected in a commit, the > > current implemention makes splats like: > > > > > fsck.ext4/148 is trying to acquire lock: > > > (&bdev->bd_mutex){+.+.}, at: [] __blkdev_put+0x33/0x190 > > > > > > but now in release context of a crosslock acquired at the following: > > > ((complete)&wait#2){+.+.}, at: [] > > > blk_execute_rq+0xbb/0xda > > > > > > which lock already depends on the new lock. > > > ... > > > > However, it could be misleading because the current task has got the > > lock already, and in fact the deadlock is detected when it is doing the > > commit of the crossrelease lock. So make the splats more accurate to > > describe the deadlock case. > > > > Signed-off-by: Boqun Feng > > --- > > kernel/locking/lockdep.c | 22 ++ > > 1 file changed, 14 insertions(+), 8 deletions(-) > > > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > > index 66011c9f5df3..642fb5362507 100644 > > --- a/kernel/locking/lockdep.c > > +++ b/kernel/locking/lockdep.c > > @@ -1195,17 +1195,23 @@ print_circular_bug_header(struct lock_list *entry, > > unsigned int depth, > > pr_warn("WARNING: possible circular locking dependency detected\n"); > > print_kernel_ident(); > > pr_warn("--\n"); > > - pr_warn("%s/%d is trying to acquire lock:\n", > > - curr->comm, task_pid_nr(curr)); > > - print_lock(check_src); > > > > - if (cross_lock(check_tgt->instance)) > > - pr_warn("\nbut now in release context of a crosslock acquired > > at the following:\n"); > > - else > > + if (cross_lock(check_tgt->instance)) { > > + pr_warn("%s/%d is committing a crossrelease lock:\n", > > + curr->comm, task_pid_nr(curr)); > > I think it would be better to print something in term of acquisition, > since the following print_lock() will print infromation of acquisition. > Well, that print_lock() will print the cross lock acquisition information at other contexts, but the current thread is doing the commit. So I think the information would be a little misleading. I will add "aacquired at" to indicate the lock information is for acquisition. > > + print_lock(check_tgt); > > + pr_warn("\n, with the following lock held:\n"); > > The lock does not have to be held at the commit. > Ah.. right. How about this: pr_warn("%s/%d is committing a crossrelease lock acquired at:\n", curr->comm, task_pid_nr(curr)); print_lock(check_tgt); pr_warn("\n, after having the following lock held at least once:\n"); Regards, Boqun > > + print_lock(check_src); > > + pr_warn("\non which lock the crossrelease lock already > > depends.\n\n"); > > + } else { > > + pr_warn("%s/%d is trying to acquire lock:\n", > > + curr->comm, task_pid_nr(curr)); > > + print_lock(check_src); > > pr_warn("\nbut task is already holding lock:\n"); > > + print_lock(check_tgt); > > + pr_warn("\nwhich lock already depends on the new lock.\n\n"); > > + } > > > > - print_lock(check_tgt); > > - pr_warn("\nwhich lock already depends on the new lock.\n\n"); > > pr_warn("\nthe existing dependency chain (in reverse order) is:\n"); > > > > print_circular_bug_entry(entry, depth); > > -- > > 2.14.1 signature.asc Description: PGP s
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On (08/23/17 12:38), Boqun Feng wrote: [..] > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > index 642fb5362507..a3709e15f609 100644 > --- a/kernel/locking/lockdep.c > +++ b/kernel/locking/lockdep.c > @@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src, > __print_lock_name(target); > printk(KERN_CONT ");\n"); KERN_CONT and "\n" should not be together. "\n" flushes the cont buffer immediately. -ss > printk("\n *** DEADLOCK ***\n\n"); > + } else if (cross_lock(src->instance)) { > + printk(" Possible unsafe locking scenario by crosslock:\n\n"); > + printk(" CPU0CPU1\n"); > + printk(" \n"); > + printk(" lock("); > + __print_lock_name(target); > + printk(KERN_CONT ");\n"); > + printk(" lock("); > + __print_lock_name(source); > + printk(KERN_CONT ");\n"); > + printk(" lock("); > + __print_lock_name(parent == source ? target : parent); > + printk(KERN_CONT ");\n"); > + printk(" unlock("); > + __print_lock_name(source); > + printk(KERN_CONT ");\n"); > + printk("\n *** DEADLOCK ***\n\n"); > } else { > printk(" Possible unsafe locking scenario:\n\n"); > printk(" CPU0CPU1\n"); > -- > 2.14.1 >
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Wed, Aug 23, 2017 at 11:49:51AM +0800, Boqun Feng wrote: > Hi Byungchul, > > On Wed, Aug 23, 2017 at 09:03:04AM +0900, Byungchul Park wrote: > > On Tue, Aug 22, 2017 at 09:43:56PM +, Bart Van Assche wrote: > > > On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote: > > > > == > > > > WARNING: possible circular locking dependency detected > > > > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted > > > > -- > > > > fsck.ext4/148 is trying to acquire lock: > > > > (&bdev->bd_mutex){+.+.}, at: [] > > > > __blkdev_put+0x33/0x190 > > > > > > > > but now in release context of a crosslock acquired at the following: > > > > ((complete)&wait#2){+.+.}, at: [] > > > > blk_execute_rq+0xbb/0xda > > > > > > > > which lock already depends on the new lock. > > > > > > I felt this message really misleading, because the deadlock is detected > at the commit time of "((complete)&wait#2)" rather than the acquisition > time of "(&bdev->bd_mutex)", so I made the following improvement. > > Thoughts? > > Regards, > Boqun > > --->8 > From: Boqun Feng > Date: Wed, 23 Aug 2017 10:18:30 +0800 > Subject: [PATCH] lockdep: Improve the readibility of crossrelease related > splats > > When a crossrelease related deadlock is detected in a commit, the > current implemention makes splats like: > > > fsck.ext4/148 is trying to acquire lock: > > (&bdev->bd_mutex){+.+.}, at: [] __blkdev_put+0x33/0x190 > > > > but now in release context of a crosslock acquired at the following: > > ((complete)&wait#2){+.+.}, at: [] > > blk_execute_rq+0xbb/0xda > > > > which lock already depends on the new lock. > > ... > > However, it could be misleading because the current task has got the > lock already, and in fact the deadlock is detected when it is doing the > commit of the crossrelease lock. So make the splats more accurate to > describe the deadlock case. > > Signed-off-by: Boqun Feng > --- > kernel/locking/lockdep.c | 22 ++ > 1 file changed, 14 insertions(+), 8 deletions(-) > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > index 66011c9f5df3..642fb5362507 100644 > --- a/kernel/locking/lockdep.c > +++ b/kernel/locking/lockdep.c > @@ -1195,17 +1195,23 @@ print_circular_bug_header(struct lock_list *entry, > unsigned int depth, > pr_warn("WARNING: possible circular locking dependency detected\n"); > print_kernel_ident(); > pr_warn("--\n"); > - pr_warn("%s/%d is trying to acquire lock:\n", > - curr->comm, task_pid_nr(curr)); > - print_lock(check_src); > > - if (cross_lock(check_tgt->instance)) > - pr_warn("\nbut now in release context of a crosslock acquired > at the following:\n"); > - else > + if (cross_lock(check_tgt->instance)) { > + pr_warn("%s/%d is committing a crossrelease lock:\n", > + curr->comm, task_pid_nr(curr)); I think it would be better to print something in term of acquisition, since the following print_lock() will print infromation of acquisition. > + print_lock(check_tgt); > + pr_warn("\n, with the following lock held:\n"); The lock does not have to be held at the commit. > + print_lock(check_src); > + pr_warn("\non which lock the crossrelease lock already > depends.\n\n"); > + } else { > + pr_warn("%s/%d is trying to acquire lock:\n", > + curr->comm, task_pid_nr(curr)); > + print_lock(check_src); > pr_warn("\nbut task is already holding lock:\n"); > + print_lock(check_tgt); > + pr_warn("\nwhich lock already depends on the new lock.\n\n"); > + } > > - print_lock(check_tgt); > - pr_warn("\nwhich lock already depends on the new lock.\n\n"); > pr_warn("\nthe existing dependency chain (in reverse order) is:\n"); > > print_circular_bug_entry(entry, depth); > -- > 2.14.1
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Wed, Aug 23, 2017 at 11:49:51AM +0800, Boqun Feng wrote: > Hi Byungchul, > > On Wed, Aug 23, 2017 at 09:03:04AM +0900, Byungchul Park wrote: > > On Tue, Aug 22, 2017 at 09:43:56PM +, Bart Van Assche wrote: > > > On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote: > > > > == > > > > WARNING: possible circular locking dependency detected > > > > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted > > > > -- > > > > fsck.ext4/148 is trying to acquire lock: > > > > (&bdev->bd_mutex){+.+.}, at: [] > > > > __blkdev_put+0x33/0x190 > > > > > > > > but now in release context of a crosslock acquired at the following: > > > > ((complete)&wait#2){+.+.}, at: [] > > > > blk_execute_rq+0xbb/0xda > > > > > > > > which lock already depends on the new lock. > > > > > > I felt this message really misleading, because the deadlock is detected > at the commit time of "((complete)&wait#2)" rather than the acquisition > time of "(&bdev->bd_mutex)", so I made the following improvement. > > Thoughts? > > Regards, > Boqun > While I'm on this one, I think we should also add a case in @check_src is a cross lock, i.e. we detect cross deadlock at the acquisition time of the cross lock. How about the following? Regards, Boqun --->8 From: Boqun Feng Date: Wed, 23 Aug 2017 12:12:16 +0800 Subject: [PATCH] lockdep: Print proper scenario if cross deadlock detected at acquisition time For a potential deadlock about CROSSRELEASE as follow: P1 P2 === = lock(A) lock(X) lock(A) commit(X) A: normal lock, X: cross lock , we could detect it at two places: 1. commit time: We have run P1 first, and have dependency A --> X in graph, and then we run P2, and find the deadlock. 2. acquisition time: We have run P2 first, and have dependency A --> X, in graph(because another P3 may run previously and is acquiring for lock X), and then we run P1 and find the deadlock. In current print_circular_lock_scenario(), for 1) we could print the right scenario and note that's a deadlock related to CROSSRELEASE, however for 2) we print the scenario as a normal lockdep deadlock. It's better to print a proper scenario related to CROSSRELEASE to help users find their bugs more easily, so improve this. Signed-off-by: Boqun Feng --- kernel/locking/lockdep.c | 17 + 1 file changed, 17 insertions(+) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 642fb5362507..a3709e15f609 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src, __print_lock_name(target); printk(KERN_CONT ");\n"); printk("\n *** DEADLOCK ***\n\n"); + } else if (cross_lock(src->instance)) { + printk(" Possible unsafe locking scenario by crosslock:\n\n"); + printk(" CPU0CPU1\n"); + printk(" \n"); + printk(" lock("); + __print_lock_name(target); + printk(KERN_CONT ");\n"); + printk(" lock("); + __print_lock_name(source); + printk(KERN_CONT ");\n"); + printk(" lock("); + __print_lock_name(parent == source ? target : parent); + printk(KERN_CONT ");\n"); + printk(" unlock("); + __print_lock_name(source); + printk(KERN_CONT ");\n"); + printk("\n *** DEADLOCK ***\n\n"); } else { printk(" Possible unsafe locking scenario:\n\n"); printk(" CPU0CPU1\n"); -- 2.14.1
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
Hi Byungchul, On Wed, Aug 23, 2017 at 09:03:04AM +0900, Byungchul Park wrote: > On Tue, Aug 22, 2017 at 09:43:56PM +, Bart Van Assche wrote: > > On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote: > > > == > > > WARNING: possible circular locking dependency detected > > > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted > > > -- > > > fsck.ext4/148 is trying to acquire lock: > > > (&bdev->bd_mutex){+.+.}, at: [] __blkdev_put+0x33/0x190 > > > > > > but now in release context of a crosslock acquired at the following: > > > ((complete)&wait#2){+.+.}, at: [] > > > blk_execute_rq+0xbb/0xda > > > > > > which lock already depends on the new lock. > > > I felt this message really misleading, because the deadlock is detected at the commit time of "((complete)&wait#2)" rather than the acquisition time of "(&bdev->bd_mutex)", so I made the following improvement. Thoughts? Regards, Boqun --->8 From: Boqun Feng Date: Wed, 23 Aug 2017 10:18:30 +0800 Subject: [PATCH] lockdep: Improve the readibility of crossrelease related splats When a crossrelease related deadlock is detected in a commit, the current implemention makes splats like: > fsck.ext4/148 is trying to acquire lock: > (&bdev->bd_mutex){+.+.}, at: [] __blkdev_put+0x33/0x190 > > but now in release context of a crosslock acquired at the following: > ((complete)&wait#2){+.+.}, at: [] blk_execute_rq+0xbb/0xda > > which lock already depends on the new lock. > ... However, it could be misleading because the current task has got the lock already, and in fact the deadlock is detected when it is doing the commit of the crossrelease lock. So make the splats more accurate to describe the deadlock case. Signed-off-by: Boqun Feng --- kernel/locking/lockdep.c | 22 ++ 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 66011c9f5df3..642fb5362507 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -1195,17 +1195,23 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth, pr_warn("WARNING: possible circular locking dependency detected\n"); print_kernel_ident(); pr_warn("--\n"); - pr_warn("%s/%d is trying to acquire lock:\n", - curr->comm, task_pid_nr(curr)); - print_lock(check_src); - if (cross_lock(check_tgt->instance)) - pr_warn("\nbut now in release context of a crosslock acquired at the following:\n"); - else + if (cross_lock(check_tgt->instance)) { + pr_warn("%s/%d is committing a crossrelease lock:\n", + curr->comm, task_pid_nr(curr)); + print_lock(check_tgt); + pr_warn("\n, with the following lock held:\n"); + print_lock(check_src); + pr_warn("\non which lock the crossrelease lock already depends.\n\n"); + } else { + pr_warn("%s/%d is trying to acquire lock:\n", + curr->comm, task_pid_nr(curr)); + print_lock(check_src); pr_warn("\nbut task is already holding lock:\n"); + print_lock(check_tgt); + pr_warn("\nwhich lock already depends on the new lock.\n\n"); + } - print_lock(check_tgt); - pr_warn("\nwhich lock already depends on the new lock.\n\n"); pr_warn("\nthe existing dependency chain (in reverse order) is:\n"); print_circular_bug_entry(entry, depth); -- 2.14.1
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Wed, Aug 23, 2017 at 11:36:49AM +0900, Sergey Senozhatsky wrote: > On (08/23/17 09:03), Byungchul Park wrote: > [..] > > aha, ok > > > The report is talking about the following lockup: > > > > A work in a worker A task work on exit to user > > -- --- > > mutex_lock(&bdev->bd_mutex) > >mutext_lock(&bdev->bd_mutex) > > blk_execute_rq() > >wait_for_completion_io_timeout(&A) > >complete(&A) > > > > Is this impossible? > > I was really confused how this "unlock" may lead to a deadlock Hi Sergey, Right. It should be enhanced. > > > > > other info that might help us debug this: > > > > Possible unsafe locking scenario by crosslock: > > > >CPU0CPU1 > > > > > > > > lock(&bdev->bd_mutex); > > > > lock((complete)&wait#2); > > > >lock(&bdev->bd_mutex); > > > >unlock((complete)&wait#2); > > > any chance the report can be improved? mention timeout, etc? > // well, if this functionality will stay. > > > p.s. > Bart Van Assche, thanks for Cc-ing Park Byungchul, I was really > sure I didn't enabled the cross-release, but apparently I was wrong: > CONFIG_LOCKDEP_CROSSRELEASE=y > CONFIG_LOCKDEP_COMPLETIONS=y > > -ss
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On (08/23/17 09:03), Byungchul Park wrote: [..] aha, ok > The report is talking about the following lockup: > > A work in a worker A task work on exit to user > -- --- > mutex_lock(&bdev->bd_mutex) >mutext_lock(&bdev->bd_mutex) > blk_execute_rq() >wait_for_completion_io_timeout(&A) >complete(&A) > > Is this impossible? I was really confused how this "unlock" may lead to a deadlock > > > other info that might help us debug this: > > > Possible unsafe locking scenario by crosslock: > > >CPU0CPU1 > > > > > > lock(&bdev->bd_mutex); > > > lock((complete)&wait#2); > > >lock(&bdev->bd_mutex); > > >unlock((complete)&wait#2); any chance the report can be improved? mention timeout, etc? // well, if this functionality will stay. p.s. Bart Van Assche, thanks for Cc-ing Park Byungchul, I was really sure I didn't enabled the cross-release, but apparently I was wrong: CONFIG_LOCKDEP_CROSSRELEASE=y CONFIG_LOCKDEP_COMPLETIONS=y -ss
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Tue, Aug 22, 2017 at 09:43:56PM +, Bart Van Assche wrote: > On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote: > > == > > WARNING: possible circular locking dependency detected > > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted > > -- > > fsck.ext4/148 is trying to acquire lock: > > (&bdev->bd_mutex){+.+.}, at: [] __blkdev_put+0x33/0x190 > > > > but now in release context of a crosslock acquired at the following: > > ((complete)&wait#2){+.+.}, at: [] > > blk_execute_rq+0xbb/0xda > > > > which lock already depends on the new lock. > > > > the existing dependency chain (in reverse order) is: > > > > -> #1 ((complete)&wait#2){+.+.}: > >lock_acquire+0x176/0x19e > >__wait_for_common+0x50/0x1e3 > >blk_execute_rq+0xbb/0xda > >scsi_execute+0xc3/0x17d [scsi_mod] > >sd_revalidate_disk+0x112/0x1549 [sd_mod] > >rescan_partitions+0x48/0x2c4 > >__blkdev_get+0x14b/0x37c > >blkdev_get+0x191/0x2c0 > >device_add_disk+0x2b4/0x3e5 > >sd_probe_async+0xf8/0x17e [sd_mod] > >async_run_entry_fn+0x34/0xe0 > >process_one_work+0x2af/0x4d1 > >worker_thread+0x19a/0x24f > >kthread+0x133/0x13b > >ret_from_fork+0x27/0x40 > > > > -> #0 (&bdev->bd_mutex){+.+.}: > >__blkdev_put+0x33/0x190 > >blkdev_close+0x24/0x27 > >__fput+0xee/0x18a > >task_work_run+0x79/0xa0 > >prepare_exit_to_usermode+0x9b/0xb5 > > > > other info that might help us debug this: > > Possible unsafe locking scenario by crosslock: > >CPU0CPU1 > > > > lock(&bdev->bd_mutex); > > lock((complete)&wait#2); > >lock(&bdev->bd_mutex); > >unlock((complete)&wait#2); > > > > *** DEADLOCK *** > > 4 locks held by fsck.ext4/148: > > #0: (&bdev->bd_mutex){+.+.}, at: [] > > __blkdev_put+0x33/0x190 > > #1: (rcu_read_lock){}, at: [] > > rcu_lock_acquire+0x0/0x20 > > #2: (&(&host->lock)->rlock){-.-.}, at: [] > > ata_scsi_queuecmd+0x23/0x74 [libata] > > #3: (&x->wait#14){-...}, at: [] complete+0x18/0x50 > > > > stack backtrace: > > CPU: 1 PID: 148 Comm: fsck.ext4 Not tainted > > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 > > Call Trace: > > dump_stack+0x67/0x8e > > print_circular_bug+0x2a1/0x2af > > ? zap_class+0xc5/0xc5 > > check_prev_add+0x76/0x20d > > ? __lock_acquire+0xc27/0xcc8 > > lock_commit_crosslock+0x327/0x35e > > complete+0x24/0x50 > > scsi_end_request+0x8d/0x176 [scsi_mod] > > scsi_io_completion+0x1be/0x423 [scsi_mod] > > __blk_mq_complete_request+0x112/0x131 > > ata_scsi_simulate+0x212/0x218 [libata] > > __ata_scsi_queuecmd+0x1be/0x1de [libata] > > ata_scsi_queuecmd+0x41/0x74 [libata] > > scsi_dispatch_cmd+0x194/0x2af [scsi_mod] > > scsi_queue_rq+0x1e0/0x26f [scsi_mod] > > blk_mq_dispatch_rq_list+0x193/0x2a7 > > ? _raw_spin_unlock+0x2e/0x40 > > blk_mq_sched_dispatch_requests+0x132/0x176 > > __blk_mq_run_hw_queue+0x59/0xc5 > > __blk_mq_delay_run_hw_queue+0x5f/0xc1 > > blk_mq_flush_plug_list+0xfc/0x10b > > blk_flush_plug_list+0xc6/0x1eb > > blk_finish_plug+0x25/0x32 > > generic_writepages+0x56/0x63 > > do_writepages+0x36/0x70 > > __filemap_fdatawrite_range+0x59/0x5f > > filemap_write_and_wait+0x19/0x4f > > __blkdev_put+0x5f/0x190 > > blkdev_close+0x24/0x27 > > __fput+0xee/0x18a > > task_work_run+0x79/0xa0 > > prepare_exit_to_usermode+0x9b/0xb5 > > entry_SYSCALL_64_fastpath+0xab/0xad > > Byungchul, did you add the crosslock checks to lockdep? Can you have a look at > the above report? That report namely doesn't make sense to me. The report is talking about the following lockup: A work in a worker A task work on exit to user -- --- mutex_lock(&bdev->bd_mutex) mutext_lock(&bdev->bd_mutex) blk_execute_rq() wait_for_completion_io_timeout(&A) complete(&A) Is this impossible? To Peterz, Anyway I wanted to avoid lockdep reports in the case using a timeout interface. Do you think it's still worth reporting the kind of lockup? I'm ok if you do.
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote: > == > WARNING: possible circular locking dependency detected > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted > -- > fsck.ext4/148 is trying to acquire lock: > (&bdev->bd_mutex){+.+.}, at: [] __blkdev_put+0x33/0x190 > > but now in release context of a crosslock acquired at the following: > ((complete)&wait#2){+.+.}, at: [] blk_execute_rq+0xbb/0xda > > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > > -> #1 ((complete)&wait#2){+.+.}: >lock_acquire+0x176/0x19e >__wait_for_common+0x50/0x1e3 >blk_execute_rq+0xbb/0xda >scsi_execute+0xc3/0x17d [scsi_mod] >sd_revalidate_disk+0x112/0x1549 [sd_mod] >rescan_partitions+0x48/0x2c4 >__blkdev_get+0x14b/0x37c >blkdev_get+0x191/0x2c0 >device_add_disk+0x2b4/0x3e5 >sd_probe_async+0xf8/0x17e [sd_mod] >async_run_entry_fn+0x34/0xe0 >process_one_work+0x2af/0x4d1 >worker_thread+0x19a/0x24f >kthread+0x133/0x13b >ret_from_fork+0x27/0x40 > > -> #0 (&bdev->bd_mutex){+.+.}: >__blkdev_put+0x33/0x190 >blkdev_close+0x24/0x27 >__fput+0xee/0x18a >task_work_run+0x79/0xa0 >prepare_exit_to_usermode+0x9b/0xb5 > > other info that might help us debug this: > Possible unsafe locking scenario by crosslock: >CPU0CPU1 > > lock(&bdev->bd_mutex); > lock((complete)&wait#2); >lock(&bdev->bd_mutex); >unlock((complete)&wait#2); > > *** DEADLOCK *** > 4 locks held by fsck.ext4/148: > #0: (&bdev->bd_mutex){+.+.}, at: [] > __blkdev_put+0x33/0x190 > #1: (rcu_read_lock){}, at: [] > rcu_lock_acquire+0x0/0x20 > #2: (&(&host->lock)->rlock){-.-.}, at: [] > ata_scsi_queuecmd+0x23/0x74 [libata] > #3: (&x->wait#14){-...}, at: [] complete+0x18/0x50 > > stack backtrace: > CPU: 1 PID: 148 Comm: fsck.ext4 Not tainted > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 > Call Trace: > dump_stack+0x67/0x8e > print_circular_bug+0x2a1/0x2af > ? zap_class+0xc5/0xc5 > check_prev_add+0x76/0x20d > ? __lock_acquire+0xc27/0xcc8 > lock_commit_crosslock+0x327/0x35e > complete+0x24/0x50 > scsi_end_request+0x8d/0x176 [scsi_mod] > scsi_io_completion+0x1be/0x423 [scsi_mod] > __blk_mq_complete_request+0x112/0x131 > ata_scsi_simulate+0x212/0x218 [libata] > __ata_scsi_queuecmd+0x1be/0x1de [libata] > ata_scsi_queuecmd+0x41/0x74 [libata] > scsi_dispatch_cmd+0x194/0x2af [scsi_mod] > scsi_queue_rq+0x1e0/0x26f [scsi_mod] > blk_mq_dispatch_rq_list+0x193/0x2a7 > ? _raw_spin_unlock+0x2e/0x40 > blk_mq_sched_dispatch_requests+0x132/0x176 > __blk_mq_run_hw_queue+0x59/0xc5 > __blk_mq_delay_run_hw_queue+0x5f/0xc1 > blk_mq_flush_plug_list+0xfc/0x10b > blk_flush_plug_list+0xc6/0x1eb > blk_finish_plug+0x25/0x32 > generic_writepages+0x56/0x63 > do_writepages+0x36/0x70 > __filemap_fdatawrite_range+0x59/0x5f > filemap_write_and_wait+0x19/0x4f > __blkdev_put+0x5f/0x190 > blkdev_close+0x24/0x27 > __fput+0xee/0x18a > task_work_run+0x79/0xa0 > prepare_exit_to_usermode+0x9b/0xb5 > entry_SYSCALL_64_fastpath+0xab/0xad Byungchul, did you add the crosslock checks to lockdep? Can you have a look at the above report? That report namely doesn't make sense to me. Bart.