On Wed, Jun 04, 2025 at 06:53:44PM +0200, Danilo Krummrich wrote: > On Wed, Jun 04, 2025 at 09:45:00AM -0700, Matthew Brost wrote: > > On Wed, Jun 04, 2025 at 05:07:15PM +0200, Simona Vetter wrote: > > > We should definitely document this trick better though, I didn't find any > > > place where that was documented. > > > > This is a good idea. > > I think - and I also mentioned this a few times in the patch series that added > the workqueue support - we should also really document the pitfalls of this. > > If the scheduler shares a workqueue with the driver, the driver needs to take > special care when submitting work that it's not possible to prevent run_job > and > free_job work from running by doing this. > > For instance, if it's a single threaded workqueue and the driver submits work > that allocates with GFP_KERNEL, this is a deadlock condition. > > More generally, if the driver submits N work that, for instance allocates with > GFP_KERNEL, it's also a deadlock condition if N == max_active.
Can we prime lockdep on scheduler init? e.g. fs_reclaim_acquire(GFP_KERNEL); workqueue_lockdep_acquire(); workqueue_lockdep_release(); fs_reclaim_release(GFP_KERNEL); In addition to documentation, this would prevent workqueues from being used that allocate with GFP_KERNEL. Maybe we could use dma_fence_sigaling annotations instead of fs_reclaim_acquire, but at one point those gave Xe false lockdep positives so use fs_reclaim_acquire in similar cases. Maybe that has been fixed though. Matt