Hi,

Since its introduction, the vc4 driver has scheduled its jobs and tracked
the dependencies between them using its own internal job queue
implementation, which is based on job lists, wait queues and hand-rolled
seqnos. Although job scheduling worked most of the time, many GPU hangs
were reported in more GPU-intensive scenarios [1][2].

After investigating several GPU hangs, I noticed that job dependencies
weren't being tracked correctly, which could lead to synchronization
issues and GPU resets. Also, the GPU reset path had issues related to
job resubmission.

Considering the many issues related to the internal job queue
implementation, this series proposes switching to the DRM GPU scheduler,
which is a well-established implementation used by multiple DRM drivers.

This has many advantages:

1. Using common code: Instead of relying on a custom implementation, use
   a trusted common framework. This helps with maintainability of the
   vc4 driver. It also makes the code more readable.

2. Synchronization issues are gone: With this series, applications can
   work reliably on RPi 3. Many users reported that they weren't able to
   open applications like emulators on the device. Now, it's possible to
   play several retro games without issues.

3. GPU resets are recoverable: Even if a timeout happens, the GPU is able
   to recover successfully with minimal impact on the user.

4. PM actually works: Before this series, the GPU was active during the
   entire runtime. After this series, the GPU is able to autosuspend and
   resume when needed.

To make the patches easier to review, I introduced the new infrastructure
piece by piece without actually plugging it in. The actual switchover
only happens in the patch "drm/vc4: Switch to DRM GPU scheduler".

This series was mostly based on the design of the v3d driver as the two
drivers are very similar.

[1] https://github.com/raspberrypi/linux/issues/5780
[2] https://github.com/raspberrypi/linux/issues/3221

Best regards,
- Maíra

---
Maíra Canal (11):
      drm/vc4: Release runtime PM reference after binding V3D
      drm/vc4: Fix a memory leak in hang state error path
      drm/vc4: Move vc4_wait_bo_ioctl() to vc4_bo.c
      drm/vc4: Introduce vc4_job structures for DRM scheduler integration
      drm/vc4: Add DRM GPU scheduler infrastructure
      drm/vc4: Add new job submission implementation
      drm/vc4: Add per-file descriptor seqno tracking
      drm/vc4: Switch to DRM GPU scheduler
      drm/vc4: Use unique fence timeline names per queue
      drm/vc4: Get PM reference before register access
      drm/vc4: Use devm_request_irq() for automatic cleanup

 drivers/gpu/drm/vc4/Kconfig         |   1 +
 drivers/gpu/drm/vc4/Makefile        |   2 +
 drivers/gpu/drm/vc4/vc4_bo.c        |  33 ++
 drivers/gpu/drm/vc4/vc4_drv.c       |  15 +
 drivers/gpu/drm/vc4/vc4_drv.h       | 232 +++++----
 drivers/gpu/drm/vc4/vc4_fence.c     |  33 +-
 drivers/gpu/drm/vc4/vc4_gem.c       | 976 ++----------------------------------
 drivers/gpu/drm/vc4/vc4_irq.c       | 163 ++----
 drivers/gpu/drm/vc4/vc4_render_cl.c |  17 +-
 drivers/gpu/drm/vc4/vc4_sched.c     | 322 ++++++++++++
 drivers/gpu/drm/vc4/vc4_submit.c    | 569 +++++++++++++++++++++
 drivers/gpu/drm/vc4/vc4_v3d.c       |  25 +-
 drivers/gpu/drm/vc4/vc4_validate.c  |  21 +-
 13 files changed, 1208 insertions(+), 1201 deletions(-)
---
base-commit: 2bcbc706dfa02ae50118173a6f6d8a12e735480c
change-id: 20260121-vc4-drm-scheduler-03cd8670b3f6
