This is revision 4 implementing a GPU crash state for drm/msm
(https://patchwork.freedesktop.org/series/36097/). This version fixes some
identified issues and actually compiles on a modern kernel.

The goal is to store and provide enough information to debug software
and hardware issues on the Adreno hardware in a semi human-readable
format that can also be parsed by scripts.

THe full set of changes here capture basic information about the GPU, the
status and contents of the ringbuffers, a snapshot of the current register state
and the active buffers from the hanging submit.

The data is printed with devcoredump.  For example, after a hang you can get
the data from /sys/class/devcoredump/devcdX/data where X is a unique number.

You can see an example of the output for a simple invalid opcode error on the
db820c here: https://hastebin.com/ewamikoreh.cs

v5: Fix symbol error in i915_gpu_error.c thanks to 01 dot org bot. Added
open/release functions for the show debugfs file to get the state per Chris
Wilson. Slightly modified the register output format to be more YAML friendly
also per Chris.
v4: Add buffer dump for the active submit. Fix refcount issue with devcoredump.
Change header for a5xx registers to registers-hlsq because I'm told YAML
requires unique tags.
v3: Make recommended changes to ascii85 per Chris Wilson. Use devcoredump to
dump crash states as suggested by Bjorn Andersson and add a new drm_print
facility to facilitate that. Remove the now obsolete 'crash' debugfs node.
Add documentation for the crash dump output.
v2: Convert output to yaml, use ascii85 to dump ringbuffer contents.

Jordan Crouse (10):
  include: Move ascii85 functions from i915 to linux/ascii85.h
  drm: drm_printer: Add printer for devcoredump
  drm/msm/gpu: Capture the state of the GPU
  drm/msm/gpu: Convert the GPU show function to use the GPU state
  drm/msm/gpu: Rearrange the code that collects the task during a hang
  drm/msm/gpu: Capture the GPU state on a GPU hang
  drm/msm/adreno: Convert the show/crash file format
  drm/msm/adreno: Add ringbuffer data to the GPU state
  drm/msm/adreno: Add a5xx specific registers for the GPU state
  drm/msm/gpu: Add the buffer objects from the submit to the crash dump

 Documentation/gpu/drm-msm-crash-dump.txt |  46 +++++
 drivers/gpu/drm/drm_print.c              |  54 +++++
 drivers/gpu/drm/i915/i915_gpu_error.c    |  35 +---
 drivers/gpu/drm/msm/Kconfig              |   1 +
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c    |  30 +--
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c    |  22 ++-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c    | 242 +++++++++++++++++++++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 180 +++++++++++++++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |   7 +-
 drivers/gpu/drm/msm/msm_debugfs.c        |  93 ++++++++-
 drivers/gpu/drm/msm/msm_gpu.c            | 143 +++++++++++++-
 drivers/gpu/drm/msm/msm_gpu.h            |  67 ++++++-
 include/drm/drm_print.h                  |  27 +++
 include/linux/ascii85.h                  |  39 ++++
 14 files changed, 886 insertions(+), 100 deletions(-)
 create mode 100644 Documentation/gpu/drm-msm-crash-dump.txt
 create mode 100644 include/linux/ascii85.h

-- 
2.17.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to