On Mon, Dec 20, 2021 at 04:52:19PM -0800, john.c.harri...@intel.com wrote:
> From: John Harrison <john.c.harri...@intel.com>
> 
> There is a known (but exceedingly unlikely) race condition where the
> asynchronous frequency management code could reduce the GT clock while
> a GuC reload is in progress (during a full GT reset). A fix is in
> progress but there are complex locking issues to be resolved. In the
> meantime bump the timeout to 500ms. Even at slowest clock, this
> should be sufficient. And in the working case, a larger timeout makes
> no difference.
> 
> Signed-off-by: John Harrison <john.c.harri...@intel.com>

Any idea of the ETA for the proper fix? Also if the proper fix makes the
locking more complicated I'm probably of the opinion we just live with a
longer timer as full GTs shouldn't really ever happen in practice and if
they take a longer time, so be it.

Anyways for this patch:
Reviewed-by: Matthew Brost <matthew.br...@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
> index 31420ce1ce6b..c03bde5ec61f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
> @@ -105,12 +105,21 @@ static int guc_wait_ucode(struct intel_uncore *uncore)
>       /*
>        * Wait for the GuC to start up.
>        * NB: Docs recommend not using the interrupt for completion.
> -      * Measurements indicate this should take no more than 20ms, so a
> +      * Measurements indicate this should take no more than 20ms
> +      * (assuming the GT clock is at maximum frequency). So, a
>        * timeout here indicates that the GuC has failed and is unusable.
>        * (Higher levels of the driver may decide to reset the GuC and
>        * attempt the ucode load again if this happens.)
> +      *
> +      * FIXME: There is a known (but exceedingly unlikely) race condition
> +      * where the asynchronous frequency management code could reduce
> +      * the GT clock while a GuC reload is in progress (during a full
> +      * GT reset). A fix is in progress but there are complex locking
> +      * issues to be resolved. In the meantime bump the timeout to
> +      * 500ms. Even at slowest clock, this should be sufficient. And
> +      * in the working case, a larger timeout makes no difference.
>        */
> -     ret = wait_for(guc_ready(uncore, &status), 100);
> +     ret = wait_for(guc_ready(uncore, &status), 500);
>       if (ret) {
>               struct drm_device *drm = &uncore->i915->drm;
>  
> -- 
> 2.25.1
> 

Reply via email to