* Daniel Vetter <dan...@ffwll.ch> wrote:

> On Tue, Oct 31, 2017 at 10:50:06AM +0100, Ingo Molnar wrote:
> > 
> > * Hans de Goede <j.w.r.dego...@gmail.com> wrote:
> > 
> > > intel_uncore_forcewake_reset() does forcewake puts and gets as such
> > > we need to make sure that no-one tries to access the PUNIT->PMIC bus
> > > (on systems where this bus is shared) while it runs, otherwise bad
> > > things happen.
> > > 
> > > Normally this is taken care of by the i915_pmic_bus_access_notifier()
> > > which does an intel_uncore_forcewake_get(FORCEWAKE_ALL) when some other
> > > driver tries to access the PMIC bus, so that later forcewake gets are
> > > no-ops (for the duration of the bus access).
> > > 
> > > But intel_uncore_forcewake_reset gets called in 3 cases:
> > > 1) Before registering the pmic_bus_access_notifier
> > > 2) After unregistering the pmic_bus_access_notifier
> > > 3) To reset forcewake state on a GPU reset
> > > 
> > > In all 3 cases the i915_pmic_bus_access_notifier() protection is
> > > insufficient.
> > > 
> > > This commit fixes this race by calling iosf_mbi_punit_acquire() before
> > > calling intel_uncore_forcewake_reset(). In the case where it is called
> > > directly after unregistering the pmic_bus_access_notifier, we need to
> > > hold the punit-lock over both calls to avoid a race where
> > > intel_uncore_fw_release_timer() may execute between the 2 calls.
> > > 
> > > Signed-off-by: Hans de Goede <hdego...@redhat.com>
> > > Reviewed-by: Imre Deak <imre.d...@intel.com>
> > > ---
> > > Changes in v2:
> > > -Rebase on current (July 6th 2017) drm-next
> > > 
> > > Changes in v3:
> > > -Keep punit acquired / locked over the unregister + forcewake_reset
> > >  call combo to avoid a race hitting between the 2 calls
> > > -This requires modifying iosf_mbi_unregister_pmic_bus_access_notifier
> > >  to not take the lock itself, since we are the only users this is done
> > >  in this same commit
> > > 
> > > Changes in v4:
> > > -Fix missing rename in doc-comment
> > > -Add and use iosf_mbi_assert_punit_acquired() helper
> > > -Add missing acquire surrounding intel_uncore_forcewake_reset() inside
> > >  intel_uncore_check_forcewake_domains()
> > > -Add Imre's Reviewed-by
> > > 
> > > Changes in v5:
> > > -Separate out arch/x86 iosf_mbi changes into a separate patch
> > > ---
> > >  drivers/gpu/drm/i915/intel_uncore.c           | 17 +++++++++++++----
> > >  drivers/gpu/drm/i915/selftests/intel_uncore.c |  3 +++
> > >  2 files changed, 16 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
> > > b/drivers/gpu/drm/i915/intel_uncore.c
> > > index 8c2ce81f01c2..0da81faf3981 100644
> > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > @@ -229,6 +229,7 @@ intel_uncore_fw_release_timer(struct hrtimer *timer)
> > >   return HRTIMER_NORESTART;
> > >  }
> > >  
> > > +/* Note callers must have acquired the PUNIT->PMIC bus, before calling 
> > > this. */
> > >  static void intel_uncore_forcewake_reset(struct drm_i915_private 
> > > *dev_priv,
> > >                                    bool restore)
> > >  {
> > > @@ -237,6 +238,8 @@ static void intel_uncore_forcewake_reset(struct 
> > > drm_i915_private *dev_priv,
> > >   int retry_count = 100;
> > >   enum forcewake_domains fw, active_domains;
> > >  
> > > + iosf_mbi_assert_punit_acquired();
> > > +
> > >   /* Hold uncore.lock across reset to prevent any register access
> > >    * with forcewake not set correctly. Wait until all pending
> > >    * timers are run before holding.
> > > @@ -416,14 +419,18 @@ static void __intel_uncore_early_sanitize(struct 
> > > drm_i915_private *dev_priv,
> > >                              GT_FIFO_CTL_RC6_POLICY_STALL);
> > >   }
> > >  
> > > + iosf_mbi_punit_acquire();
> > >   intel_uncore_forcewake_reset(dev_priv, restore_forcewake);
> > > + iosf_mbi_punit_release();
> > >  }
> > >  
> > >  void intel_uncore_suspend(struct drm_i915_private *dev_priv)
> > >  {
> > > - iosf_mbi_unregister_pmic_bus_access_notifier(
> > > + iosf_mbi_punit_acquire();
> > > + iosf_mbi_unregister_pmic_bus_access_notifier_unlocked(
> > >           &dev_priv->uncore.pmic_bus_access_nb);
> > >   intel_uncore_forcewake_reset(dev_priv, false);
> > > + iosf_mbi_punit_release();
> > >  }
> > >  
> > >  void intel_uncore_resume_early(struct drm_i915_private *dev_priv)
> > > @@ -1315,12 +1322,14 @@ void intel_uncore_init(struct drm_i915_private 
> > > *dev_priv)
> > >  
> > >  void intel_uncore_fini(struct drm_i915_private *dev_priv)
> > >  {
> > > - iosf_mbi_unregister_pmic_bus_access_notifier(
> > > -         &dev_priv->uncore.pmic_bus_access_nb);
> > > -
> > >   /* Paranoia: make sure we have disabled everything before we exit. */
> > >   intel_uncore_sanitize(dev_priv);
> > > +
> > > + iosf_mbi_punit_acquire();
> > > + iosf_mbi_unregister_pmic_bus_access_notifier_unlocked(
> > > +         &dev_priv->uncore.pmic_bus_access_nb);
> > >   intel_uncore_forcewake_reset(dev_priv, false);
> > > + iosf_mbi_punit_release();
> > >  }
> > >  
> > >  static const struct reg_whitelist {
> > > diff --git a/drivers/gpu/drm/i915/selftests/intel_uncore.c 
> > > b/drivers/gpu/drm/i915/selftests/intel_uncore.c
> > > index 3cac22eb47ce..733d87fe7737 100644
> > > --- a/drivers/gpu/drm/i915/selftests/intel_uncore.c
> > > +++ b/drivers/gpu/drm/i915/selftests/intel_uncore.c
> > > @@ -148,7 +148,10 @@ static int 
> > > intel_uncore_check_forcewake_domains(struct drm_i915_private *dev_pri
> > >   for_each_set_bit(offset, valid, FW_RANGE) {
> > >           i915_reg_t reg = { offset };
> > >  
> > > +         iosf_mbi_punit_acquire();
> > >           intel_uncore_forcewake_reset(dev_priv, false);
> > > +         iosf_mbi_punit_release();
> > > +
> > >           check_for_unclaimed_mmio(dev_priv);
> > >  
> > >           (void)I915_READ(reg);
> > 
> > This patch looks like one massive layering violation. Why does the GPU code 
> > muck 
> > with the uncore hardware?
> 
> Because the hw is an even worse layering violation. There's way too much
> "magic" stuff going on in the background, which then sometimes doesn't
> work and we end up implementing hacks in drivers to paper over it.

Ok, fair enough I guess:

  Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks,

        Ingo
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to