On Tue, 2017-10-10 at 07:03 -0700, t...@kernel.org wrote:
> Hello, Trond.
>
> On Mon, Oct 09, 2017 at 06:32:13PM +0000, Trond Myklebust wrote:
> > On Mon, 2017-10-09 at 19:17 +0100, Lorenzo Pieralisi wrote:
> > > I have run into the lockdep warning below while running
> > > v4.14-rc3/rc4 on an ARM64 defconfig Juno dev board - reporting it
> > > to check whether it is a known/genuine issue.
> > >
> > > Please let me know if you need further debug data or need some
> > > specific tests.
> > >
> > > [    6.209384] ======================================================
> > > [    6.215569] WARNING: possible circular locking dependency detected
> > > [    6.221755] 4.14.0-rc4 #54 Not tainted
> > > [    6.225503] ------------------------------------------------------
> > > [    6.231689] kworker/4:0H/32 is trying to acquire lock:
> > > [    6.236830]  ((&task->u.tk_work)){+.+.}, at: [<ffff0000080e64cc>] process_one_work+0x1cc/0x3f0
> > > [    6.245472]
> > >                but task is already holding lock:
> > > [    6.251309]  ("xprtiod"){+.+.}, at: [<ffff0000080e64cc>] process_one_work+0x1cc/0x3f0
> > > [    6.259158]
> > >                which lock already depends on the new lock.
> > >
> > > [    6.267345]
> > >                the existing dependency chain (in reverse order) is:
> > ...
> > Adding Tejun and Lai, since this looks like a workqueue locking
> > issue.
>
> It looks a bit cryptic, but it's warning against the following case.
>
> 1. Memory pressure is high and the rescuer kicks in for the xprtiod
>    workqueue. There are no other kworkers serving the workqueue.
>
> 2. The rescuer runs the xprt_destroy path and ends up calling
>    cancel_work_sync() on a work item which is queued on xprtiod.
>
> 3. The work item is pending on the same workqueue and, assuming that
>    memory pressure doesn't let off (let's say reclaim is trying to
>    kick off nfs pages), the only way it can get executed is by the
>    rescuer - which is waiting for the work item. An A-B-A deadlock.
Hi Tejun,

Thanks for the explanation. What I'm not really understanding here,
though, is how the work item could be queued at all. We have a
wait_on_bit_lock() in xprt_destroy() that should mean the
xprt->task_cleanup work item has completed running, and that it cannot
be requeued.

Is there a possibility that the flush_queue() might be triggered
despite the work item not being queued?

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com
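Tejun's three-step scenario can be sketched as a userspace analogy: a
single-worker pool stands in for the lone rescuer, and a blocking wait
on a future stands in for cancel_work_sync(). This is plain Python, not
kernel code; all names here are illustrative, not kernel APIs.

```python
# Minimal sketch (assumption: Python 3 stdlib only) of the A-B-A
# deadlock shape: the only worker of a "workqueue" synchronously waits
# for another item queued on that same workqueue. The waited-for item
# can never run, because the sole worker is busy waiting for it.
from concurrent.futures import ThreadPoolExecutor, TimeoutError

wq = ThreadPoolExecutor(max_workers=1)  # one worker, like a lone rescuer

def cleanup():
    # Work item queued on the same "workqueue" as destroy().
    return "cleaned up"

def destroy():
    pending = wq.submit(cleanup)  # work item now pending on wq
    try:
        # Stand-in for cancel_work_sync(): wait for the pending item to
        # finish. It never starts, since the only worker is running us,
        # so without the timeout this wait would block forever.
        return pending.result(timeout=1)
    except TimeoutError:
        return "deadlock: cleanup never ran"

res = wq.submit(destroy).result()
print(res)
wq.shutdown(wait=False)
```

The real kernel avoids the unbounded hang differently (lockdep flags the
dependency cycle up front); the timeout here only exists so the sketch
terminates and makes the self-wait visible.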