Re: [Intel-gfx] [PATCH] drm/i915: Update workaround documentation

2022-11-15 Thread Matt Roper
On Tue, Nov 15, 2022 at 11:26:11AM -0800, Lucas De Marchi wrote:
> There were several updates in the driver on how the workarounds are
> handled since its documentation was written. Update the documentation to
> reflect the current reality.
> 
> v2:
>   - Remove footnote that was wrongly referenced, adding back the
> reference in the correct paragraph.
>   - Remove "Display workarounds" and just mention "display IP" under
> "Other" category since all of them are peppered around the driver.
> 
> Cc: Matt Roper 
> Signed-off-by: Lucas De Marchi 
> Acked-by: Balasubramani Vivekanandan  # 
> v1

Reviewed-by: Matt Roper 

> ---
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 80 +
>  1 file changed, 50 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index 213160f29ec3..290f9f91fdf4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -18,42 +18,62 @@
>  /**
>   * DOC: Hardware workarounds
>   *
> - * This file is intended as a central place to implement most [1]_ of the
> - * required workarounds for hardware to work as originally intended. They 
> fall
> - * in five basic categories depending on how/when they are applied:
> + * Hardware workarounds are register programming documented to be executed in
> + * the driver that fall outside of the normal programming sequences for a
> + * platform. There are some basic categories of workarounds, depending on
> + * how/when they are applied:
>   *
> - * - Workarounds that touch registers that are saved/restored to/from the HW
> - *   context image. The list is emitted (via Load Register Immediate 
> commands)
> - *   everytime a new context is created.
> - * - GT workarounds. The list of these WAs is applied whenever these 
> registers
> - *   revert to default values (on GPU reset, suspend/resume [2]_, etc..).
> - * - Display workarounds. The list is applied during display clock-gating
> - *   initialization.
> - * - Workarounds that whitelist a privileged register, so that UMDs can 
> manage
> - *   them directly. This is just a special case of a MMMIO workaround (as we
> - *   write the list of these to/be-whitelisted registers to some special HW
> - *   registers).
> - * - Workaround batchbuffers, that get executed automatically by the hardware
> - *   on every HW context restore.
> + * - Context workarounds: workarounds that touch registers that are
> + *   saved/restored to/from the HW context image. The list is emitted (via 
> Load
> + *   Register Immediate commands) once when initializing the device and 
> saved in
> + *   the default context. That default context is then used on every context
> + *   creation to have a "primed golden context", i.e. a context image that
> + *   already contains the changes needed to all the registers.
>   *
> - * .. [1] Please notice that there are other WAs that, due to their nature,
> - *cannot be applied from a central place. Those are peppered around the 
> rest
> - *of the code, as needed.
> + * - Engine workarounds: the list of these WAs is applied whenever the 
> specific
> + *   engine is reset. It's also possible that a set of engine classes share a
> + *   common power domain and they are reset together. This happens on some
> + *   platforms with render and compute engines. In this case (at least) one 
> of
> + *   them need to keeep the workaround programming: the approach taken in the
> + *   driver is to tie those workarounds to the first compute/render engine 
> that
> + *   is registered.  When executing with GuC submission, engine resets are
> + *   outside of kernel driver control, hence the list of registers involved 
> in
> + *   written once, on engine initialization, and then passed to GuC, that
> + *   saves/restores their values before/after the reset takes place. See
> + *   ``drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c`` for reference.
>   *
> - * .. [2] Technically, some registers are powercontext saved & restored, so 
> they
> - *survive a suspend/resume. In practice, writing them again is not too
> - *costly and simplifies things. We can revisit this in the future.
> + * - GT workarounds: the list of these WAs is applied whenever these 
> registers
> + *   revert to their default values: on GPU reset, suspend/resume [1]_, etc.
> + *
> + * - Register whitelist: some workarounds need to be implemented in 
> userspace,
> + *   but need to touch privileged registers. The whitelist in the kernel
> + *   instructs the hardware to allow the access to happen. From the kernel 
> side,
> + *   this is just a special case of a MMIO workaround (as we write the list 
> of
> + *   these to/be-whitelisted registers to some special HW registers).
> + *
> + * - Workaround batchbuffers: buffers that get executed automatically by the
> + *   hardware on every HW context restore. These buffers are created and
> + *   

Re: [Intel-gfx] [PATCH] drm/i915: Update workaround documentation

2022-11-15 Thread Lucas De Marchi

On Mon, Nov 14, 2022 at 01:04:30PM -0800, Matt Roper wrote:

On Mon, Nov 07, 2022 at 04:30:28PM -0800, Lucas De Marchi wrote:

There were several updates in the driver on how the workarounds are
handled since its documentation was written. Update the documentation to
reflect the current reality.

Signed-off-by: Lucas De Marchi 
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 87 +
 1 file changed, 56 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 3cdf5c24dbc5..0db3713c1beb 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -17,43 +17,68 @@
 /**
  * DOC: Hardware workarounds
  *
- * This file is intended as a central place to implement most [1]_ of the
- * required workarounds for hardware to work as originally intended. They fall
- * in five basic categories depending on how/when they are applied:
+ * This is intended as a central place to implement most [1]_ of the


Your footnotes don't hook up properly anymore.  The original code had
[1] and [2], but the new code hooks [1] to what used to be [2].


rewording this for next version



Since we moved this file under gt/ a while back, I wonder if we should
note somehow that sgunit/soc workarounds and display workarounds aren't
expected to be part of this file?


I'm trying to make this agnostic to "this file" since it doesn't look
correct when reading the html documentation. Next version I'm rewording
some of this to just reference "central place".




+ * required workarounds for hardware to work as originally intended. Hardware
+ * workarounds are register programming documented to be executed in the driver
+ * that fall outside of the normal programming sequences for a platform. There
+ * are some basic categories of workarounds, depending on how/when they are
+ * applied:
  *
- * - Workarounds that touch registers that are saved/restored to/from the HW
- *   context image. The list is emitted (via Load Register Immediate commands)
- *   everytime a new context is created.
- * - GT workarounds. The list of these WAs is applied whenever these registers
- *   revert to default values (on GPU reset, suspend/resume [2]_, etc..).
- * - Display workarounds. The list is applied during display clock-gating
- *   initialization.
- * - Workarounds that whitelist a privileged register, so that UMDs can manage
- *   them directly. This is just a special case of a MMMIO workaround (as we
- *   write the list of these to/be-whitelisted registers to some special HW
- *   registers).
- * - Workaround batchbuffers, that get executed automatically by the hardware
- *   on every HW context restore.
+ * - Context workarounds: workarounds that touch registers that are
+ *   saved/restored to/from the HW context image. The list is emitted (via Load
+ *   Register Immediate commands) once when initializing the device and saved 
in
+ *   the default context. That default context is then used on every context
+ *   creation to have a "primed golden context", i.e. a context image that
+ *   already contains the changes needed to all the registers.
  *
- * .. [1] Please notice that there are other WAs that, due to their nature,
- *cannot be applied from a central place. Those are peppered around the 
rest
- *of the code, as needed.
+ * - Engine workarounds: the list of these WAs is applied whenever the specific
+ *   engine is reset. It's also possible that a set of engine classes share a
+ *   common power domain and they are reset together. This happens on some
+ *   platforms with render and compute engines. In this case (at least) one of
+ *   them need to keeep the workaround programming: the approach taken in the
+ *   driver is to tie those workarounds to the first compute/render engine that
+ *   is registered.  When executing with GuC submission, engine resets are
+ *   outside of kernel driver control, hence the list of registers involved in
+ *   written once, on engine initialization, and then passed to GuC, that
+ *   saves/restores their values before/after the reset takes place. See
+ *   ``drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c`` for reference.
  *
- * .. [2] Technically, some registers are powercontext saved & restored, so 
they
- *survive a suspend/resume. In practice, writing them again is not too
- *costly and simplifies things. We can revisit this in the future.
+ * - GT workarounds: the list of these WAs is applied whenever these registers
+ *   revert to their default values: on GPU reset, suspend/resume, etc.


This is where the new [1] used to be referenced from.


  *
- * Layout
- * ~~
+ * - Register whitelist: some workarounds need to be implemented in userspace,
+ *   but need to touch privileged registers. The whitelist in the kernel
+ *   instructs the hardware to allow the access to happen. From the kernel 
side,
+ *   this is just a special case of a 

Re: [Intel-gfx] [PATCH] drm/i915: Update workaround documentation

2022-11-14 Thread Matt Roper
On Mon, Nov 07, 2022 at 04:30:28PM -0800, Lucas De Marchi wrote:
> There were several updates in the driver on how the workarounds are
> handled since its documentation was written. Update the documentation to
> reflect the current reality.
> 
> Signed-off-by: Lucas De Marchi 
> ---
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 87 +
>  1 file changed, 56 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index 3cdf5c24dbc5..0db3713c1beb 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -17,43 +17,68 @@
>  /**
>   * DOC: Hardware workarounds
>   *
> - * This file is intended as a central place to implement most [1]_ of the
> - * required workarounds for hardware to work as originally intended. They 
> fall
> - * in five basic categories depending on how/when they are applied:
> + * This is intended as a central place to implement most [1]_ of the

Your footnotes don't hook up properly anymore.  The original code had
[1] and [2], but the new code hooks [1] to what used to be [2].

Since we moved this file under gt/ a while back, I wonder if we should
note somehow that sgunit/soc workarounds and display workarounds aren't
expected to be part of this file?

> + * required workarounds for hardware to work as originally intended. Hardware
> + * workarounds are register programming documented to be executed in the 
> driver
> + * that fall outside of the normal programming sequences for a platform. 
> There
> + * are some basic categories of workarounds, depending on how/when they are
> + * applied:
>   *
> - * - Workarounds that touch registers that are saved/restored to/from the HW
> - *   context image. The list is emitted (via Load Register Immediate 
> commands)
> - *   everytime a new context is created.
> - * - GT workarounds. The list of these WAs is applied whenever these 
> registers
> - *   revert to default values (on GPU reset, suspend/resume [2]_, etc..).
> - * - Display workarounds. The list is applied during display clock-gating
> - *   initialization.
> - * - Workarounds that whitelist a privileged register, so that UMDs can 
> manage
> - *   them directly. This is just a special case of a MMMIO workaround (as we
> - *   write the list of these to/be-whitelisted registers to some special HW
> - *   registers).
> - * - Workaround batchbuffers, that get executed automatically by the hardware
> - *   on every HW context restore.
> + * - Context workarounds: workarounds that touch registers that are
> + *   saved/restored to/from the HW context image. The list is emitted (via 
> Load
> + *   Register Immediate commands) once when initializing the device and 
> saved in
> + *   the default context. That default context is then used on every context
> + *   creation to have a "primed golden context", i.e. a context image that
> + *   already contains the changes needed to all the registers.
>   *
> - * .. [1] Please notice that there are other WAs that, due to their nature,
> - *cannot be applied from a central place. Those are peppered around the 
> rest
> - *of the code, as needed.
> + * - Engine workarounds: the list of these WAs is applied whenever the 
> specific
> + *   engine is reset. It's also possible that a set of engine classes share a
> + *   common power domain and they are reset together. This happens on some
> + *   platforms with render and compute engines. In this case (at least) one 
> of
> + *   them need to keeep the workaround programming: the approach taken in the
> + *   driver is to tie those workarounds to the first compute/render engine 
> that
> + *   is registered.  When executing with GuC submission, engine resets are
> + *   outside of kernel driver control, hence the list of registers involved 
> in
> + *   written once, on engine initialization, and then passed to GuC, that
> + *   saves/restores their values before/after the reset takes place. See
> + *   ``drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c`` for reference.
>   *
> - * .. [2] Technically, some registers are powercontext saved & restored, so 
> they
> - *survive a suspend/resume. In practice, writing them again is not too
> - *costly and simplifies things. We can revisit this in the future.
> + * - GT workarounds: the list of these WAs is applied whenever these 
> registers
> + *   revert to their default values: on GPU reset, suspend/resume, etc.

This is where the new [1] used to be referenced from.

>   *
> - * Layout
> - * ~~
> + * - Register whitelist: some workarounds need to be implemented in 
> userspace,
> + *   but need to touch privileged registers. The whitelist in the kernel
> + *   instructs the hardware to allow the access to happen. From the kernel 
> side,
> + *   this is just a special case of a MMIO workaround (as we write the list 
> of
> + *   these to/be-whitelisted registers to some 

Re: [Intel-gfx] [PATCH] drm/i915: Update workaround documentation

2022-11-10 Thread Balasubramani Vivekanandan
On 07.11.2022 16:30, Lucas De Marchi wrote:
> There were several updates in the driver on how the workarounds are
> handled since its documentation was written. Update the documentation to
> reflect the current reality.
> 
> Signed-off-by: Lucas De Marchi 
> ---
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 87 +
>  1 file changed, 56 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index 3cdf5c24dbc5..0db3713c1beb 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -17,43 +17,68 @@
>  /**
>   * DOC: Hardware workarounds
>   *
> - * This file is intended as a central place to implement most [1]_ of the
> - * required workarounds for hardware to work as originally intended. They 
> fall
> - * in five basic categories depending on how/when they are applied:
> + * This is intended as a central place to implement most [1]_ of the
> + * required workarounds for hardware to work as originally intended. Hardware
> + * workarounds are register programming documented to be executed in the 
> driver
> + * that fall outside of the normal programming sequences for a platform. 
> There
> + * are some basic categories of workarounds, depending on how/when they are
> + * applied:
>   *
> - * - Workarounds that touch registers that are saved/restored to/from the HW
> - *   context image. The list is emitted (via Load Register Immediate 
> commands)
> - *   everytime a new context is created.
> - * - GT workarounds. The list of these WAs is applied whenever these 
> registers
> - *   revert to default values (on GPU reset, suspend/resume [2]_, etc..).
> - * - Display workarounds. The list is applied during display clock-gating
> - *   initialization.
> - * - Workarounds that whitelist a privileged register, so that UMDs can 
> manage
> - *   them directly. This is just a special case of a MMMIO workaround (as we
> - *   write the list of these to/be-whitelisted registers to some special HW
> - *   registers).
> - * - Workaround batchbuffers, that get executed automatically by the hardware
> - *   on every HW context restore.
> + * - Context workarounds: workarounds that touch registers that are
> + *   saved/restored to/from the HW context image. The list is emitted (via 
> Load
> + *   Register Immediate commands) once when initializing the device and 
> saved in
> + *   the default context. That default context is then used on every context
> + *   creation to have a "primed golden context", i.e. a context image that
> + *   already contains the changes needed to all the registers.
>   *
> - * .. [1] Please notice that there are other WAs that, due to their nature,
> - *cannot be applied from a central place. Those are peppered around the 
> rest
> - *of the code, as needed.
> + * - Engine workarounds: the list of these WAs is applied whenever the 
> specific
> + *   engine is reset. It's also possible that a set of engine classes share a
> + *   common power domain and they are reset together. This happens on some
> + *   platforms with render and compute engines. In this case (at least) one 
> of
> + *   them need to keeep the workaround programming: the approach taken in the
> + *   driver is to tie those workarounds to the first compute/render engine 
> that
> + *   is registered.  When executing with GuC submission, engine resets are
> + *   outside of kernel driver control, hence the list of registers involved 
> in
> + *   written once, on engine initialization, and then passed to GuC, that
> + *   saves/restores their values before/after the reset takes place. See
> + *   ``drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c`` for reference.
>   *
> - * .. [2] Technically, some registers are powercontext saved & restored, so 
> they
> - *survive a suspend/resume. In practice, writing them again is not too
> - *costly and simplifies things. We can revisit this in the future.
> + * - GT workarounds: the list of these WAs is applied whenever these 
> registers
> + *   revert to their default values: on GPU reset, suspend/resume, etc.
>   *
> - * Layout
> - * ~~
> + * - Register whitelist: some workarounds need to be implemented in 
> userspace,
> + *   but need to touch privileged registers. The whitelist in the kernel
> + *   instructs the hardware to allow the access to happen. From the kernel 
> side,
> + *   this is just a special case of a MMIO workaround (as we write the list 
> of
> + *   these to/be-whitelisted registers to some special HW registers).
>   *
> - * Keep things in this file ordered by WA type, as per the above (context, 
> GT,
> - * display, register whitelist, batchbuffer). Then, inside each type, keep 
> the
> - * following order:
> + * - Workaround batchbuffers: buffers that get executed automatically by the
> + *   hardware on every HW context restore. These buffers are created and
> + *   programmed