On 28.07.2022 04:42, john.c.harri...@intel.com wrote:
> From: John Harrison <john.c.harri...@intel.com>
> 
> When the KMD sends a CLIENT_RESET request to GuC (as part of the
> suspend sequence), GuC will mark the CTB buffer as 'UNUSED'. If the

hmm, GuC shouldn't do that on CLIENT_RESET. GuC shall only mark the CTB as
UNUSED when we explicitly disable it using CONTROL_CTB, as only then are the
CTB descriptors known to be valid.

> KMD then checked the CTB queue, it would see a non-zero status value
> and report the buffer as corrupted.
> 
> Technically, no G2H messages should be received once the CLIENT_RESET
> has been sent. However, if a context was outstanding on an engine then
> it would get reset and a reset notification would be sent. So, don't
> actually treat UNUSED as a catastrophic error. Just flag it up as
> unexpected and keep going.

we should have already marked locally that the CTB is disabled, either as
part of explicitly disabling the CTB with CONTROL_CTB, or implicitly due
to the issued CLIENT_RESET; in both cases we shouldn't try to read the CTB
any more, even if there are outstanding messages ...

is this due to a race with ct->enabled?
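For illustration only, a minimal single-threaded model of that ordering: the local "enabled" state is checked before the shared descriptor is ever touched. The names (struct model_ct, struct model_ctb_desc, model_ct_read_begin()) and the -ENODEV/-EPIPE return values are hypothetical and are not the i915 code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the descriptor shared with the firmware. */
struct model_ctb_desc {
	uint32_t head;
	uint32_t tail;
	uint32_t status;
};

/* Hypothetical stand-in for the driver-side CT state. */
struct model_ct {
	bool enabled;                /* local view: is the CTB still usable? */
	struct model_ctb_desc *desc; /* shared descriptor */
};

/*
 * Returns 0 if the buffer may be parsed, a negative errno otherwise.
 * Once the CTB is locally disabled (CONTROL_CTB or CLIENT_RESET), the
 * descriptor is not looked at at all, so its status is irrelevant.
 */
static int model_ct_read_begin(const struct model_ct *ct)
{
	if (!ct->enabled)
		return -ENODEV;

	/* Only a live CTB has a descriptor worth validating. */
	if (ct->desc->status)
		return -EPIPE;

	return 0;
}
```

The model deliberately ignores concurrency, so it does not capture the possible race with ct->enabled asked about above.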

> 
> Signed-off-by: John Harrison <john.c.harri...@intel.com>
> ---
>  .../i915/gt/uc/abi/guc_communication_ctb_abi.h |  8 +++++---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c      | 18 ++++++++++++++++--
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
> index df83c1cc7c7a6..28b8387f97b77 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
> @@ -37,6 +37,7 @@
>   *  |   |       |   - _`GUC_CTB_STATUS_OVERFLOW` = 1 (head/tail too large)     |
>   *  |   |       |   - _`GUC_CTB_STATUS_UNDERFLOW` = 2 (truncated message)      |
>   *  |   |       |   - _`GUC_CTB_STATUS_MISMATCH` = 4 (head/tail modified)      |
> + *  |   |       |   - _`GUC_CTB_STATUS_UNUSED` = 8 (CTB is not in use)         |
>   *  +---+-------+--------------------------------------------------------------+
>   *  |...|       | RESERVED = MBZ                                               |
>   *  +---+-------+--------------------------------------------------------------+
> @@ -49,9 +50,10 @@ struct guc_ct_buffer_desc {
>       u32 tail;
>       u32 status;
>  #define GUC_CTB_STATUS_NO_ERROR                              0
> -#define GUC_CTB_STATUS_OVERFLOW                              (1 << 0)
> -#define GUC_CTB_STATUS_UNDERFLOW                     (1 << 1)
> -#define GUC_CTB_STATUS_MISMATCH                              (1 << 2)
> +#define GUC_CTB_STATUS_OVERFLOW                              BIT(0)
> +#define GUC_CTB_STATUS_UNDERFLOW                     BIT(1)
> +#define GUC_CTB_STATUS_MISMATCH                              BIT(2)
> +#define GUC_CTB_STATUS_UNUSED                                BIT(3)

nit: our goal was to use plain C definitions in ABI headers as much as
possible, without introducing any dependency on external macros like BIT()
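A sketch of the plain-C variant, keeping the existing (1 << n) style and only adding UNUSED as bit 3. The _Static_assert lines are an illustrative addition (not part of the patch) that tie the values to the documented 1/2/4/8 table.

```c
/* Status bits spelled out without BIT(), matching the documented values. */
#define GUC_CTB_STATUS_NO_ERROR   0
#define GUC_CTB_STATUS_OVERFLOW   (1 << 0)
#define GUC_CTB_STATUS_UNDERFLOW  (1 << 1)
#define GUC_CTB_STATUS_MISMATCH   (1 << 2)
#define GUC_CTB_STATUS_UNUSED     (1 << 3)

/* Compile-time checks against the table in the kernel-doc comment. */
_Static_assert(GUC_CTB_STATUS_OVERFLOW == 1, "OVERFLOW must be 1");
_Static_assert(GUC_CTB_STATUS_UNDERFLOW == 2, "UNDERFLOW must be 2");
_Static_assert(GUC_CTB_STATUS_MISMATCH == 4, "MISMATCH must be 4");
_Static_assert(GUC_CTB_STATUS_UNUSED == 8, "UNUSED must be 8");
```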

>       u32 reserved[13];
>  } __packed;
>  static_assert(sizeof(struct guc_ct_buffer_desc) == 64);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index f01325cd1b625..11b5d4ddb19ce 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -816,8 +816,22 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>       if (unlikely(ctb->broken))
>               return -EPIPE;
>  
> -     if (unlikely(desc->status))
> -             goto corrupted;
> +     if (unlikely(desc->status)) {
> +             u32 status = desc->status;
> +
> +             if (status & GUC_CTB_STATUS_UNUSED) {
> +                     /*
> +                      * Potentially valid if a CLIENT_RESET request resulted in
> +                      * contexts/engines being reset. But should never happen as
> +                      * no contexts should be active when CLIENT_RESET is sent.
> +                      */
> +                     CT_ERROR(ct, "Unexpected G2H after GuC has stopped!\n");
> +                     status &= ~GUC_CTB_STATUS_UNUSED;

do you really want to continue reading messages from an already disabled CTB?
maybe instead of clearing the GUC_CTB_STATUS_UNUSED bit we should just return?
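Roughly, as a standalone sketch of that early-return variant (ct_status_check() and enum ct_status_action are hypothetical helpers, not existing i915 code), the status word would be classified like this:

```c
#include <stdint.h>

#define GUC_CTB_STATUS_UNUSED (1 << 3)

/* What the (hypothetical) caller should do with the CTB. */
enum ct_status_action {
	CT_STATUS_OK,        /* descriptor clean, go on parsing */
	CT_STATUS_STOPPED,   /* CTB no longer in use, return without reading */
	CT_STATUS_CORRUPTED, /* any other error bit, take the corrupted path */
};

static enum ct_status_action ct_status_check(uint32_t status)
{
	if (!status)
		return CT_STATUS_OK;

	/*
	 * Tested before the generic corruption case, so UNUSED wins even
	 * if other error bits are also set.
	 */
	if (status & GUC_CTB_STATUS_UNUSED)
		return CT_STATUS_STOPPED;

	return CT_STATUS_CORRUPTED;
}
```

This ordering matches simply replacing the bit-clearing with a return inside the existing `if (status & GUC_CTB_STATUS_UNUSED)` block.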

Michal

> +             }
> +
> +             if (status)
> +                     goto corrupted;
> +     }
>  
>       GEM_BUG_ON(head > size);
>  
