Hi,

Since its introduction, the vc4 driver has scheduled its jobs and tracked the dependencies between them using its own internal job queue implementation. This internal implementation is based on job lists, wait queues and hand-rolled seqnos. Although job scheduling worked most of the time, many GPU hangs were reported in more GPU-intensive scenarios [1][2].

After investigating several GPU hangs, I noticed that job dependencies weren't being tracked correctly, which could lead to synchronization issues and GPU resets. Also, the GPU reset path had issues related to job resubmission.

Considering the many issues related to the internal job queue implementation, this series proposes switching to the DRM GPU scheduler, which is a well-established implementation used by multiple DRM drivers. This has many advantages:

1. Using common code: Instead of relying on a custom implementation, use a trusted common framework. This helps with maintainability of the vc4 driver and makes the code more readable.

2. Synchronization issues are gone: With this series, applications can work reliably on RPi 3. Many users reported that they weren't able to open applications like emulators on the device. Now, it's possible to play several retro games without issues.

3. GPU resets are recoverable: Even if a timeout happens, the GPU is able to recover successfully with minimal impact to the user.

4. PM actually works: Before this series, the GPU stayed active at all times. After this series, the GPU is able to autosuspend and resume when needed.

In order to improve reviewability of the patches, I introduced the new infrastructure piece by piece without actually plugging it in. The actual switchover only happens in the patch "drm/vc4: Switch to DRM GPU scheduler".

This series was mostly based on the design of the v3d driver, as the two drivers are very similar.

[1] https://github.com/raspberrypi/linux/issues/5780
[2] https://github.com/raspberrypi/linux/issues/3221

Best regards,
- Maíra

---
Maíra Canal (11):
      drm/vc4: Release runtime PM reference after binding V3D
      drm/vc4: Fix a memory leak in hang state error path
      drm/vc4: Move vc4_wait_bo_ioctl() to vc4_bo.c
      drm/vc4: Introduce vc4_job structures for DRM scheduler integration
      drm/vc4: Add DRM GPU scheduler infrastructure
      drm/vc4: Add new job submission implementation
      drm/vc4: Add per-file descriptor seqno tracking
      drm/vc4: Switch to DRM GPU scheduler
      drm/vc4: Use unique fence timeline names per queue
      drm/vc4: Get PM reference before register access
      drm/vc4: Use devm_request_irq() for automatic cleanup

 drivers/gpu/drm/vc4/Kconfig         |   1 +
 drivers/gpu/drm/vc4/Makefile        |   2 +
 drivers/gpu/drm/vc4/vc4_bo.c        |  33 ++
 drivers/gpu/drm/vc4/vc4_drv.c       |  15 +
 drivers/gpu/drm/vc4/vc4_drv.h       | 232 +++++---
 drivers/gpu/drm/vc4/vc4_fence.c     |  33 +-
 drivers/gpu/drm/vc4/vc4_gem.c       | 976 ++----------------------------------
 drivers/gpu/drm/vc4/vc4_irq.c       | 163 ++----
 drivers/gpu/drm/vc4/vc4_render_cl.c |  17 +-
 drivers/gpu/drm/vc4/vc4_sched.c     | 322 ++++++++++++
 drivers/gpu/drm/vc4/vc4_submit.c    | 569 +++++++++++++++++++++
 drivers/gpu/drm/vc4/vc4_v3d.c       |  25 +-
 drivers/gpu/drm/vc4/vc4_validate.c  |  21 +-
 13 files changed, 1208 insertions(+), 1201 deletions(-)
---
base-commit: 2bcbc706dfa02ae50118173a6f6d8a12e735480c
change-id: 20260121-vc4-drm-scheduler-03cd8670b3f6
