Please reread what I said. There was no obvious circular dependency, because nfsiod and rpciod are separate workqueues, both created with WQ_MEM_RECLAIM. Dros' experience shows, however, that a call to rpc_shutdown_client in an nfsiod work item will deadlock with rpciod if the RPC task's work item has been assigned to the same CPU as the one running the rpc_shutdown_client work item.
I can't tell right now if that is intentional (in which case the WARN_ON in the rpc code is correct), or if it is a bug in the workqueue code. For now, we're assuming the former.
________________________________________
From: J. Bruce Fields [bfie...@fieldses.org]
Sent: Friday, December 21, 2012 6:26 PM
To: Myklebust, Trond
Cc: Dave Jones; Linux Kernel; linux-...@vger.kernel.org; Adamson, Dros
Subject: Re: nfsd oops on Linus' current tree.

On Fri, Dec 21, 2012 at 11:15:40PM +0000, Myklebust, Trond wrote:
> Apologies for top-posting. The SSD on my laptop died, and so I'm stuck using
> webmail for this account...

Fun! If that happens to me on this trip, I've got a week trying to hack
the kernel from my cell phone....

> Our experience with nfsiod is that the WQ_MEM_RECLAIM option still deadlocks
> despite the "rescuer thread". The CPU that is running the workqueue will
> deadlock with any rpciod task that is assigned to the same CPU. Interestingly
> enough, the WQ_UNBOUND option also appears able to deadlock in the same
> situation.
>
> Sorry, I have no explanation why...

As I said:

> there shouldn't be any deadlock as long as there's no circular
> dependency among the three.

There was a circular dependency (of rpciod on itself), so having a
dedicated rpciod rescuer thread wouldn't help--once the rescuer thread
is waiting for work queued to the same queue you're asking for trouble.

The last argument in

        alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 1);

ensures that it will never allow more than 1 piece of work to run per
CPU, so the deadlock should be pretty easy to hit.

And with UNBOUND that's only one piece of work globally, so yeah all you
need is an rpc at shutdown time and it should deadlock every time.

--b.
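
The self-dependency described in the quoted message (a work item waiting on work queued to the same queue, with only one item allowed to run at a time) can be illustrated with a rough userspace analogy. This is only a sketch, not kernel code: a Python thread pool with one worker stands in for a queue limited to one running item, the names run_demo, rpc_task and shutdown_work are hypothetical, and a timeout replaces what would be an indefinite wait. It does not model per-CPU pools or rescuer threads.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_demo(workers, wait_seconds=0.5):
    """Run a work item that waits on another item queued to the same pool.

    `workers` is an assumed stand-in for the queue's concurrency limit;
    it is an analogy only and not how the kernel workqueue is implemented.
    """
    pool = ThreadPoolExecutor(max_workers=workers)

    def rpc_task():
        # Hypothetical stand-in for the RPC task's work item.
        return "rpc done"

    def shutdown_work():
        # Hypothetical stand-in for the shutdown work item: it queues more
        # work to the SAME pool and then blocks until that work completes.
        inner = pool.submit(rpc_task)
        try:
            return inner.result(timeout=wait_seconds)
        except FutureTimeout:
            # With a single worker, rpc_task cannot start while we occupy
            # the only slot. Without the timeout this wait would never end.
            return "stuck"

    outer = pool.submit(shutdown_work)
    result = outer.result()
    pool.shutdown(wait=True)  # lets the queued rpc_task drain afterwards
    return result


if __name__ == "__main__":
    print("1 worker: ", run_demo(workers=1))
    print("2 workers:", run_demo(workers=2))
```

With one worker the waiting item never sees its dependency run until it gives up; with a second worker the dependency is picked up immediately. The point of the sketch is only that the limit of one running item, combined with a wait on the same queue, is enough to produce the hang.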