On 2021-08-16 9:38 a.m., Christian König wrote: > Am 13.08.21 um 12:29 schrieb Michel Dänzer: >> From: Michel Dänzer <mdaen...@redhat.com> >> >> schedule_delayed_work does not push back the work if it was already >> scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms >> after the first time GFXOFF was disabled and re-enabled, even if GFXOFF >> was disabled and re-enabled again during those 100 ms. >> >> This resulted in frame drops / stutter with the upcoming mutter 41 >> release on Navi 14, due to constantly enabling GFXOFF in the HW and >> disabling it again (for getting the GPU clock counter). >> >> To fix this, call cancel_delayed_work_sync when GFXOFF transitions from >> enabled to disabled. This makes sure the delayed work will be scheduled >> as intended in the reverse case. >> >> In order to avoid a deadlock, amdgpu_device_delay_enable_gfx_off needs >> to use mutex_trylock instead of mutex_lock. >> >> v2: >> * Use cancel_delayed_work_sync & mutex_trylock instead of >> mod_delayed_work. > > While this may work it still smells a little bit fishy. > > In general you have two common locking orders around work items, either > lock->work or work->lock. If you mix this as lock->work->lock like here > trouble is usually imminent. > > I think what we should do instead is to double check if taking the lock > inside the work item is necessary and instead making sure that the work is > sync canceled when we don't want it to run. In other words fully switching to > the lock->work approach.
Done in v3, thanks for the suggestion! -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer