Hi Matt,

> > We occasionally see the PCI device in a non-accessible state at the
> > point the driver is loaded.  When this happens, all BAR accesses will
> > read back as 0xFFFFFFFF.  Rather than reading registers and
> > misinterpreting their (invalid) values, let's specifically check for
> > 0xFFFFFFFF in a register that cannot have that value to see if the
> > device is accessible.
> > 
> > Signed-off-by: Matt Roper <matthew.d.ro...@intel.com>
> > Cc: Mika Kuoppala <mika.kuopp...@linux.intel.com>
> > Signed-off-by: Andi Shyti <andi.sh...@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_uncore.c | 35 +++++++++++++++++++++++++++++
> >  1 file changed, 35 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
> > b/drivers/gpu/drm/i915/intel_uncore.c
> > index e1e1f34490c8e..0b69081d6d285 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -2602,11 +2602,46 @@ static int uncore_forcewake_init(struct 
> > intel_uncore *uncore)
> >     return 0;
> >  }
> >  
> > +static int sanity_check_mmio_access(struct intel_uncore *uncore)
> > +{
> > +   struct drm_i915_private *i915 = uncore->i915;
> > +   int ret;
> > +
> > +   if (GRAPHICS_VER(i915) < 8)
> > +           return 0;
> > +
> > +   /*
> > +    * Sanitycheck that MMIO access to the device is working properly.  If
> > +    * the CPU is unable to communcate with a PCI device, BAR reads will
> > +    * return 0xFFFFFFFF.  Let's make sure the device isn't in this state
> > +    * before we start trying to access registers.
> > +    *
> > +    * We use the primary GT's forcewake register as our guinea pig since
> > +    * it's been around since HSW and it's a masked register so the upper
> > +    * 16 bits can never read back as 1's if device access is operating
> > +    * properly.
> > +    *
> > +    * If MMIO isn't working, we'll wait up to 2 seconds to see if it
> > +    * recovers, then give up.
> > +    */
> > +   ret = intel_wait_for_register_fw(uncore, FORCEWAKE_MT, 0, 0, 2000000);
> 
> It looks like you lost the check for 0xFFFFFFFF specifically.  In fact
> with a mask/value of 0, isn't this always going to just always pass
> immediately?

uh... yes... sorry, I just got confused and lost track of the
goal of the patch.

Sorry, then please ignore... I don't see then how
intel_wait_for_register_fw() can be used with a '!='.

Please, ignore this v2.

Thanks and sorry, again,
Andi

> We don't know what the value of this register will be (there may or may
> not be some bits set), but we need to make sure that it isn't 0xFFFFFFFF
> because that means we're not even truly accessing the register, just
> hitting a PCI BAR read failure.
> 
> 
> Matt
> 
> > +   if (ret == -ETIMEDOUT) {
> > +           drm_err(&i915->drm, "Device is non-operational; MMIO access 
> > returns 0xFFFFFFFF!\n");
> > +           return -EIO;
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> >  int intel_uncore_init_mmio(struct intel_uncore *uncore)
> >  {
> >     struct drm_i915_private *i915 = uncore->i915;
> >     int ret;
> >  
> > +   ret = sanity_check_mmio_access(uncore);
> > +   if (ret)
> > +           return ret;
> > +
> >     /*
> >      * The boot firmware initializes local memory and assesses its health.
> >      * If memory training fails, the punit will have been instructed to
> > -- 
> > 2.39.2
> > 
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation

Reply via email to