On Mon, Oct 20, 2025 at 3:47 PM Rodrigo Siqueira <[email protected]> wrote:
>
> AMDGPU heavily relies on ring buffers to manage its components; as a
> result, it has an elaborate mechanism of operation with multiple details
> around it. This commit introduces new documentation on ring buffers,
> detailing their management and expanding the explanation of Enforce
> isolation. Finally, this commit also adds the documentation available in
> the amdgpu_ring.c file to it.
>
> Cc: Alex Deucher <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Timur Kristóf <[email protected]>
> Signed-off-by: Rodrigo Siqueira <[email protected]>
> ---
>  .../gpu/amdgpu/enforce_isolation.svg          |  654 +++++++
>  Documentation/gpu/amdgpu/gfx_pipeline_seq.svg |  413 +++++
>  Documentation/gpu/amdgpu/index.rst            |    1 +
>  .../gpu/amdgpu/no_enforce_isolation.svg       |  707 +++++++
>  Documentation/gpu/amdgpu/ring-buffer.rst      |   91 +
>  Documentation/gpu/amdgpu/ring_buffers.svg     | 1633 +++++++++++++++++
>  Documentation/gpu/amdgpu/userq.rst            |    2 +
>  7 files changed, 3501 insertions(+)
>  create mode 100644 Documentation/gpu/amdgpu/enforce_isolation.svg
>  create mode 100644 Documentation/gpu/amdgpu/gfx_pipeline_seq.svg
>  create mode 100644 Documentation/gpu/amdgpu/no_enforce_isolation.svg
>  create mode 100644 Documentation/gpu/amdgpu/ring-buffer.rst
>  create mode 100644 Documentation/gpu/amdgpu/ring_buffers.svg

Got a link to the images?

<snip>

> diff --git a/Documentation/gpu/amdgpu/ring-buffer.rst b/Documentation/gpu/amdgpu/ring-buffer.rst
> new file mode 100644
> index 000000000000..015d803541fe
> --- /dev/null
> +++ b/Documentation/gpu/amdgpu/ring-buffer.rst
> @@ -0,0 +1,91 @@
> +=============
> + Ring Buffer
> +=============
> +
> +To handle communication between user space and kernel space, amdgpu adopts a
> +strategy based on a ring buffer, which is used by GFX, Compute, SDMA, UVD, VCE,
> +VCN, VPE, KIQ, MES, UMSCH, and CPER.

To handle communication between user space and kernel space, AMD GPUs
use a ring buffer design to feed the engines (GFX, Compute, SDMA, UVD,
VCE, VCN, VPE, etc.).

> See the figure below that illustrates how
> +this communication works:
> +
> +.. kernel-figure:: ring_buffers.svg
> +
> +Ring buffers in the amdgpu work as a producer-consumer problem, where userspace

Ring buffers in the amdgpu work as a producer-consumer model, where software

> +acts as the producer, constantly filling the ring buffer with GPU commands to
> +be executed. Meanwhile, the GPU retrieves the information from the ring, parses
> +it, and distributes the specific set of instructions between the different
> +amdgpu blocks. Notice from the diagram that the ring has a Read Pointer (rptr)
> +that indicates where the GPU is currently reading, and there is a Write Pointer
> +(wptr) that indicates where the host has written the message. Note that every
> +time something is written to the ring, the wptr is incremented. Meanwhile, the
> +GPU constantly updates the rptr pointer, chasing the wptr until both pointers
> +are equal. The User space or the driver can utilize the rptr pointer to roughly
> +estimate how many instructions are still pending completion.

The ring has a Read Pointer (rptr), which indicates where the engine
is currently reading packets from the ring, and a Write Pointer
(wptr), which indicates how many packets software has added to the
ring.  When the rptr and wptr are equal, the ring is idle.  When
software adds packets to the ring, it updates the wptr, which causes
the engine to start fetching and processing packets.  As the engine
processes packets, the rptr gets updated until it catches up to the
wptr and they are equal again.
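
To make the wptr/rptr chase concrete, here's a rough C sketch of the
producer side (the names and layout are made up for illustration; this
is not the driver's actual API):

#include <stdint.h>
#include <stdbool.h>

/* Illustrative toy ring, not amdgpu's real structures. */
struct toy_ring {
        uint32_t *buf;      /* ring contents, size_dw dwords, power-of-two sized */
        uint32_t size_dw;
        uint32_t wptr;      /* next dword software will write */
        uint32_t rptr;      /* next dword the engine will read (the engine updates it) */
};

static void toy_ring_write(struct toy_ring *ring, uint32_t packet)
{
        ring->buf[ring->wptr & (ring->size_dw - 1)] = packet;
        ring->wptr++;       /* on real hardware the new wptr is then written to a
                             * doorbell/register so the engine starts fetching */
}

static bool toy_ring_idle(const struct toy_ring *ring)
{
        return ring->rptr == ring->wptr;    /* the engine has caught up */
}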

> +
> +Usually, ring buffers in the driver have a limited size (search for occurrences
> +of `amdgpu_ring_init()`). One of the reasons for the small ring buffer size is
> +that CP (Command Processor) is capable of following addresses inserted into the
> +ring; this is illustrated in the image by the reference to the IB (Indirect
> +Buffer). The IB gives userspace the possibility to have an area in memory that
> +CP can read and feed the hardware with extra instructions.
> +
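
In other words, the small ring mostly carries pointers to bigger
command buffers rather than the commands themselves. A minimal sketch
of the idea (illustrative names, not a real PM4 packet layout):

#include <stdint.h>

/* Illustrative only: a ring entry that points the CP at a larger
 * buffer elsewhere in memory instead of carrying the commands. */
struct toy_ib_packet {
        uint64_t gpu_addr;   /* GPU address of the indirect buffer (IB) */
        uint32_t size_dw;    /* number of command dwords the CP should fetch from it */
};

Since each job only costs the ring a few dwords like this, the ring
itself can stay small while the bulk of the commands live in the IBs.
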
> +All ASICs pre-GFX11 use what is called a kernel queue, which means
> +the ring is allocated in kernel space and has some restrictions, such as not
> +being able to be :ref:`preempted directly by the scheduler<amdgpu-mes>`. GFX11
> +and newer support kernel queues, but also provide a new mechanism named
> +:ref:`user queues<amdgpu-userq>`, where the queue is moved to the user space
> +and can be mapped and unmapped via the scheduler. In practice, both queues
> +insert user-space-generated GPU commands from different jobs into the requested
> +component ring.
> +
> +Enforce Isolation
> +=================
> +

Might want to cross-link to the other process isolation documentation
at Documentation/gpu/amdgpu/process-isolation.rst.

> +Before examining the Enforce Isolation mechanism in the ring buffer context, it
> +is helpful to briefly discuss how instructions from the ring buffer are
> +processed in the graphics pipeline. Let’s expand on this topic by checking the
> +diagram below that illustrates the graphics pipeline:
> +
> +.. kernel-figure:: gfx_pipeline_seq.svg
> +
> +In terms of executing instructions, the GFX pipeline follows the sequence:
> +Shader Export (SX), Geometry Engine (GE), Shader Process or Input (SPI), Scan
> +Converter (SC), Primitive Assembler (PA), and cache manipulation (which may
> +vary across ASICs). Another common way to describe the pipeline is to use Pixel
> +Shader (PS), raster, and Vertex Shader (VS) to symbolize the two shader stages.
> +Now, with this pipeline in mind, let's assume that Job B causes a hang issue,
> +but Job C's instruction might already be executing, leading developers to
> +incorrectly identify Job C as the problematic one. This problem can be
> +mitigated on multiple levels; the diagram below illustrates how to minimize
> +part of this problem:
> +
> +.. kernel-figure:: no_enforce_isolation.svg
> +
> +Note from the diagram that there is no guarantee of order or a clear separation
> +between instructions, which is not a problem most of the time, and is also good
> +for performance. Furthermore, notice some circles between jobs in the diagram
> +that represent a **fence wait** used to avoid overlapping work in the ring. At
> +the end of the fence, a cache flush occurs, ensuring that when the next job
> +starts, it begins in a clean state and, if issues arise, the developer can
> +pinpoint the problematic process more precisely.
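
A compact way to picture what sits between two jobs in the same ring
(purely illustrative names, not real PM4 opcodes):

/* Conceptual packet order between two jobs sharing a ring; the real
 * packets are engine- and ASIC-specific. */
enum toy_inter_job_packet {
        TOY_PKT_WAIT_FENCE,    /* wait for the previous job's fence to signal   */
        TOY_PKT_CACHE_FLUSH,   /* flush caches so the next job starts clean     */
        TOY_PKT_INDIRECT_BUF,  /* execute the next job's command buffer (IB)    */
        TOY_PKT_SIGNAL_FENCE,  /* signal this job's fence once it has completed */
};
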
> +
> +To increase the level of isolation between jobs, there is the "Enforce
> +Isolation" method described in the picture below:
> +
> +.. kernel-figure:: enforce_isolation.svg
> +
> +As shown in the diagram, enforcing isolation introduces ordering between
> +submissions, since the access to GFX/Compute/Other is serialized.   Notice that

Enforce isolation only affects GFX and compute.  It's probably easier
to think about it as a single-process-at-a-time mode for gfx/compute.

Alex

> +this approach has a significant performance impact, as it allows only one job
> +to submit commands at a time. However, this option can help pinpoint the job
> +that caused the problem. Although enforcing isolation improves the situation,
> +it does not fully resolve the issue of precisely pinpointing bad jobs, since
> +isolation might mask the problem. In summary, identifying which job caused the
> +issue may not be precise, but enforcing isolation might help with the
> +debugging.
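
If it helps, the serialization can be pictured roughly like this
(made-up names, a conceptual model rather than the driver's
implementation):

/* Conceptual model only: with enforce isolation an engine behaves like
 * a single-process-at-a-time resource. */
struct toy_engine {
        int owner_pid;    /* process currently allowed to submit, or -1 if none */
};

static void toy_isolated_submit(struct toy_engine *engine, int pid)
{
        if (engine->owner_pid != -1 && engine->owner_pid != pid) {
                /* wait for every fence of the previous owner's jobs to
                 * signal before handing the engine to another process */
        }
        engine->owner_pid = pid;
        /* now submit this process's job to the ring */
}
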
> +
> +Ring Operations
> +===============
> +
> +.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +   :internal:
> +
