> Another possibility is that test-only commits with the
> DRM_MODE_ATOMIC_TEST_ONLY flag (which can happen in parallel while the kernel
> is processing a "real" commit) accidentally have side effects on the current
> kernel state, resulting in the "real" commit failing to do something it
> should. There have been bugs like that in the amdgpu DM code before.
Some users reported that GPU resets on dGPUs happen far less often with legacy modesetting than with atomic modesetting, which led me to the same conclusion: the driver may be missing some locking. To test that theory, I recently gave some affected users a patch that locks KWin's commit thread(s) while the main thread does atomic tests, so that KWin never performs two atomic commits simultaneously. Testing on APUs showed that it did not help there, but since I haven't heard back from any dGPU users yet, it's still a possible factor.

- Xaver
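A minimal C++ sketch of the serialization idea follows. It is not the actual KWin patch. `FakeDrmDevice`, `SerializedCommitter`, `DemoResult` and `runDemo` are hypothetical names. The stub stands in for libdrm's `drmModeAtomicCommit()`, where a test-only commit would pass `DRM_MODE_ATOMIC_TEST_ONLY` in the flags. It only counts how many commits overlap in time. Routing both the commit thread's real commits and the main thread's test commits through one mutex keeps at most one commit in flight.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the kernel side of an atomic commit. A real
// implementation would call libdrm's drmModeAtomicCommit(), passing
// DRM_MODE_ATOMIC_TEST_ONLY for test commits; this stub only records how
// many commits overlap in time.
class FakeDrmDevice {
public:
    void atomicCommit(bool testOnly) {
        int now = ++inFlight_;
        // Track the highest number of commits seen in flight at once.
        int prev = maxInFlight_.load();
        while (now > prev && !maxInFlight_.compare_exchange_weak(prev, now)) {
        }
        // Pretend the kernel needs a little time to process the commit.
        std::this_thread::sleep_for(std::chrono::microseconds(50));
        if (testOnly) {
            ++testCommits_;
        } else {
            ++realCommits_;
        }
        --inFlight_;
    }

    int maxInFlight() const { return maxInFlight_.load(); }
    int testCommits() const { return testCommits_.load(); }
    int realCommits() const { return realCommits_.load(); }

private:
    std::atomic<int> inFlight_{0};
    std::atomic<int> maxInFlight_{0};
    std::atomic<int> testCommits_{0};
    std::atomic<int> realCommits_{0};
};

// Routes every commit, test-only or real, through one mutex so that no two
// atomic commits are ever issued concurrently from this process.
class SerializedCommitter {
public:
    explicit SerializedCommitter(FakeDrmDevice &dev) : dev_(dev) {}

    void commit(bool testOnly) {
        std::lock_guard<std::mutex> lock(mutex_);
        dev_.atomicCommit(testOnly);
    }

private:
    FakeDrmDevice &dev_;
    std::mutex mutex_;
};

struct DemoResult {
    int maxInFlight;
    int testCommits;
    int realCommits;
};

// A commit thread issues "real" commits while the main thread issues
// test-only commits, mirroring the situation described above.
DemoResult runDemo(int commitsPerThread) {
    FakeDrmDevice dev;
    SerializedCommitter committer(dev);
    std::thread commitThread([&] {
        for (int i = 0; i < commitsPerThread; ++i) committer.commit(false);
    });
    for (int i = 0; i < commitsPerThread; ++i) committer.commit(true);
    commitThread.join();
    return {dev.maxInFlight(), dev.testCommits(), dev.realCommits()};
}
```

With the mutex in place, `runDemo(100).maxInFlight` is always 1. If both threads called `dev.atomicCommit()` directly, the count would typically exceed 1. This serializes commits only within one process. It works around missing locking in the driver rather than fixing it.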
