On Thu, 2016-08-18 at 09:39 +0200, Maarten Lankhorst wrote: > Hey, > > Op 17-08-16 om 21:55 schreef Lyude: > > > > Since the watermark calculations for Skylake are still broken, we're apt > > to hitting underruns very easily under multi-monitor configurations. > > While it would be lovely if this was fixed, it's not. Another problem > > that's been coming from this however, is the mysterious issue of > > underruns causing full system hangs. An easy way to reproduce this with > > a skylake system: > > > > - Get a laptop with a skylake GPU, and hook up two external monitors to > >  it > > - Move the cursor from the built-in LCD to one of the external displays > >  as quickly as you can > > - You'll get a few pipe underruns, and eventually the entire system will > >  just freeze. > > > > After doing a lot of investigation and reading through the bspec, I > > found the existence of the SAGV, which is responsible for adjusting the > > system agent voltage and clock frequencies depending on how much power > > we need. According to the bspec: > > > > "The display engine access to system memory is blocked during the > >  adjustment time. SAGV defaults to enabled. Software must use the > >  GT-driver pcode mailbox to disable SAGV when the display engine is not > >  able to tolerate the blocking time." > > > > The rest of the bspec goes on to explain that software can simply leave > > the SAGV enabled, and disable it when we use interlaced pipes/have more > > then one pipe active. > > > > Sure enough, with this patchset the system hangs resulting from pipe > > underruns on Skylake have completely vanished on my T460s. Additionally, > > the bspec mentions turning off the SAGV with more then one pipe > > enabled > > as a workaround for display underruns. While this patch doesn't entirely > > fix that, it looks like it does improve the situation a little bit so > > it's likely this is going to be required to make watermarks on Skylake > > fully functional. > > > > This will still need additional work in the future: we shouldn't be > > enabling the SAGV if any of the currently enabled planes can't enable WM > > levels that introduce latencies >= 30 µs. > > > > Changes since v11: > >  - Add skl_can_enable_sagv() > >  - Make sure we don't enable SAGV when not all planes can enable > >    watermarks >= the SAGV engine block time. I was originally going to > >    save this for later, but I recently managed to run into a machine > >    that was having problems with a single pipe configuration + SAGV. > >  - Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit > >  - Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE > >  - Move printks outside of mutexes > >  - Don't print error messages twice > > Changes since v10: > >  - Apparently sandybridge_pcode_read actually writes values and reads > >    them back, despite it's misleading function name. This means we've > >    been doing this mostly wrong and have been writing garbage to the > >    SAGV control. Because of this, we no longer attempt to read the SAGV > >    status during initialization (since there are no helpers for this). > >  - mlankhorst noticed that this patch was breaking on some very early > >    pre-release Skylake machines, which apparently don't allow you to > >    disable the SAGV. To prevent machines from failing tests due to SAGV > >    errors, if the first time we try to control the SAGV results in the > >    mailbox indicating an invalid command, we just disable future attempts > >    to control the SAGV state by setting dev_priv->skl_sagv_status to > >    I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg. > >  - Move mutex_unlock() a little higher in skl_enable_sagv(). This > >    doesn't actually fix anything, but lets us release the lock a little > >    sooner since we're finished with it. > > Changes since v9: > >  - Only enable/disable sagv on Skylake > > Changes since v8: > >  - Add intel_state->modeset guard to the conditional for > >    skl_enable_sagv() > > Changes since v7: > >  - Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's > >    all we use it for anyway) > >  - Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification > >  - Fix a styling error that snuck past me > > Changes since v6: > >  - Protect skl_enable_sagv() with intel_state->modeset conditional in > >    intel_atomic_commit_tail() > > Changes since v5: > >  - Don't use is_power_of_2. Makes things confusing > >  - Don't use the old state to figure out whether or not to > >    enable/disable the sagv, use the new one > >  - Split the loop in skl_disable_sagv into it's own function > >  - Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail() > > Changes since v4: > >  - Use is_power_of_2 against active_crtcs to check whether we have > 1 > >    pipe enabled > >  - Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0 > >    enabled > >  - Call skl_sagv_enable/disable() from pre/post-plane updates > > Changes since v3: > >  - Use time_before() to compare timeout to jiffies > > Changes since v2: > >  - Really apply minor style nitpicks to patch this time > > Changes since v1: > >  - Added comments about this probably being one of the requirements to > >    fixing Skylake's watermark issues > >  - Minor style nitpicks from Matt Roper > >  - Disable these functions on Broxton, since it doesn't have an SAGV > > > > Signed-off-by: Lyude <cpaul at redhat.com> > > Cc: Matt Roper <matthew.d.roper at intel.com> > > Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com> > > Cc: Daniel Vetter <daniel.vetter at ffwll.ch> > > Cc: Ville Syrjälä <ville.syrjala at linux.intel.com> > > Cc: stable at vger.kernel.org > > --- > >  drivers/gpu/drm/i915/i915_drv.h      |   7 ++ > >  drivers/gpu/drm/i915/i915_reg.h      |   4 + > >  drivers/gpu/drm/i915/intel_display.c |  11 +++ > >  drivers/gpu/drm/i915/intel_drv.h     |   3 + > >  drivers/gpu/drm/i915/intel_pm.c      | 148 > > +++++++++++++++++++++++++++++++++++ > >  5 files changed, 173 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > > b/drivers/gpu/drm/i915/i915_drv.h > > index 35caa9b..f20530bb 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -1949,6 +1949,13 @@ struct drm_i915_private { > >  struct i915_suspend_saved_registers regfile; > >  struct vlv_s0ix_state vlv_s0ix_state; > >  > > + enum { > > + I915_SKL_SAGV_UNKNOWN = 0, > > + I915_SKL_SAGV_DISABLED, > > + I915_SKL_SAGV_ENABLED, > > + I915_SKL_SAGV_NOT_CONTROLLED > > + } skl_sagv_status; > > + > >  struct { > >  /* > >   * Raw watermark latency values: > > diff --git a/drivers/gpu/drm/i915/i915_reg.h > > b/drivers/gpu/drm/i915/i915_reg.h > > index 7419fbf..be82c49 100644 > > --- a/drivers/gpu/drm/i915/i915_reg.h > > +++ b/drivers/gpu/drm/i915/i915_reg.h > > @@ -7153,6 +7153,10 @@ enum { > >  #define   HSW_PCODE_DE_WRITE_FREQ_REQ 0x17 > >  #define   DISPLAY_IPS_CONTROL 0x19 > >  #define   HSW_PCODE_DYNAMIC_DUTY_CYCLE_CONTROL 0x1A > > +#define   GEN9_PCODE_SAGV_CONTROL 0x21 > > +#define     GEN9_SAGV_DISABLE 0x0 > > +#define     GEN9_SAGV_IS_DISABLED 0x1 > > +#define     GEN9_SAGV_ENABLE              0x3 > >  #define GEN6_PCODE_DATA _MMIO(0x138128) > >  #define   GEN6_PCODE_FREQ_IA_RATIO_SHIFT 8 > >  #define   GEN6_PCODE_FREQ_RING_RATIO_SHIFT 16 > > diff --git a/drivers/gpu/drm/i915/intel_display.c > > b/drivers/gpu/drm/i915/intel_display.c > > index 781d22e..ca4b83f 100644 > > --- a/drivers/gpu/drm/i915/intel_display.c > > +++ b/drivers/gpu/drm/i915/intel_display.c > > @@ -14156,6 +14156,13 @@ static void intel_atomic_commit_tail(struct > > drm_atomic_state *state) > >       intel_state->cdclk_pll_vco != dev_priv- > > >cdclk_pll.vco)) > >  dev_priv->display.modeset_commit_cdclk(state); > >  > > + /* > > +  * SKL workaround: bspec recommends we disable the SAGV > > when we > > +  * have more then one pipe enabled > > +  */ > > + if (IS_SKYLAKE(dev_priv) && !skl_can_enable_sagv(state)) > > + skl_disable_sagv(dev_priv); > > + > >  intel_modeset_verify_disabled(dev); > >  } > >  > > @@ -14229,6 +14236,10 @@ static void intel_atomic_commit_tail(struct > > drm_atomic_state *state) > >  intel_modeset_verify_crtc(crtc, old_crtc_state, crtc- > > >state); > >  } > >  > > + if (IS_SKYLAKE(dev_priv) && intel_state->modeset && > > +     skl_can_enable_sagv(state)) > > + skl_enable_sagv(dev_priv); > > + > >  drm_atomic_helper_commit_hw_done(state); > >  > >  if (intel_state->modeset) > > diff --git a/drivers/gpu/drm/i915/intel_drv.h > > b/drivers/gpu/drm/i915/intel_drv.h > > index 1c700b0..d203c77 100644 > > --- a/drivers/gpu/drm/i915/intel_drv.h > > +++ b/drivers/gpu/drm/i915/intel_drv.h > > @@ -1722,6 +1722,9 @@ void ilk_wm_get_hw_state(struct drm_device *dev); > >  void skl_wm_get_hw_state(struct drm_device *dev); > >  void skl_ddb_get_hw_state(struct drm_i915_private *dev_priv, > >    struct skl_ddb_allocation *ddb /* out */); > > +bool skl_can_enable_sagv(struct drm_atomic_state *state); > > +int skl_enable_sagv(struct drm_i915_private *dev_priv); > > +int skl_disable_sagv(struct drm_i915_private *dev_priv); > >  uint32_t ilk_pipe_pixel_rate(const struct intel_crtc_state *pipe_config); > >  bool ilk_disable_lp_wm(struct drm_device *dev); > >  int sanitize_rc6_option(struct drm_i915_private *dev_priv, int > > enable_rc6); > > diff --git a/drivers/gpu/drm/i915/intel_pm.c > > b/drivers/gpu/drm/i915/intel_pm.c > > index b4cf988..fed2bae8 100644 > > --- a/drivers/gpu/drm/i915/intel_pm.c > > +++ b/drivers/gpu/drm/i915/intel_pm.c > > @@ -2860,6 +2860,7 @@ bool ilk_disable_lp_wm(struct drm_device *dev) > >  > >  #define SKL_DDB_SIZE 896 /* in blocks */ > >  #define BXT_DDB_SIZE 512 > > +#define SKL_SAGV_BLOCK_TIME 30 /* µs */ > >  > >  /* > >  * Return the index of a plane in the SKL DDB and wm result > > arrays.  Primary > > @@ -2883,6 +2884,153 @@ skl_wm_plane_id(const struct intel_plane *plane) > >  } > >  } > >  > > +/* > > + * SAGV dynamically adjusts the system agent voltage and clock frequencies > > + * depending on power and performance requirements. The display engine > > access > > + * to system memory is blocked during the adjustment time. Because of the > > + * blocking time, having this enabled can cause full system hangs and/or > > pipe > > + * underruns if we don't meet all of the following requirements: > > + * > > + *  - <= 1 pipe enabled > > + *  - All planes can enable watermarks for latencies >= SAGV engine block > > time > > + *  - We're not using an interlaced display configuration > > + */ > > +int > > +skl_enable_sagv(struct drm_i915_private *dev_priv) > > +{ > > + int ret; > > + > > + if (dev_priv->skl_sagv_status == I915_SKL_SAGV_NOT_CONTROLLED || > > +     dev_priv->skl_sagv_status == I915_SKL_SAGV_ENABLED) > > + return 0; > > + > > + DRM_DEBUG_KMS("Enabling the SAGV\n"); > > + mutex_lock(&dev_priv->rps.hw_lock); > > + > > + ret = sandybridge_pcode_write(dev_priv, GEN9_PCODE_SAGV_CONTROL, > > +       GEN9_SAGV_ENABLE); > > + > > + /* We don't need to wait for the SAGV when enabling */ > > + mutex_unlock(&dev_priv->rps.hw_lock); > > + > > + /* > > +  * Some skl systems, pre-release machines in particular, > > +  * don't actually have an SAGV. > > +  */ > > + if (ret == -ENOSYS) { > > + DRM_DEBUG_DRIVER("No SAGV found on system, ignoring\n"); > > + dev_priv->skl_sagv_status = I915_SKL_SAGV_NOT_CONTROLLED; > > + return 0; > > + } else if (ret < 0) { > > + DRM_ERROR("Failed to enable the SAGV\n"); > > + return ret; > > + } > > + > > + dev_priv->skl_sagv_status = I915_SKL_SAGV_ENABLED; > > + return 0; > > +} > > + > > +static int > > +skl_do_sagv_disable(struct drm_i915_private *dev_priv) > > +{ > > + int ret; > > + uint32_t temp = GEN9_SAGV_DISABLE; > > + > > + ret = sandybridge_pcode_read(dev_priv, GEN9_PCODE_SAGV_CONTROL, > > +      &temp); > > + if (ret) > > + return ret; > > + else > > + return temp & GEN9_SAGV_IS_DISABLED; > > +} > > + > > +int > > +skl_disable_sagv(struct drm_i915_private *dev_priv) > > +{ > > + int ret, result; > > + > > + if (dev_priv->skl_sagv_status == I915_SKL_SAGV_NOT_CONTROLLED || > > +     dev_priv->skl_sagv_status == I915_SKL_SAGV_DISABLED) > > + return 0; > > + > > + DRM_DEBUG_KMS("Disabling the SAGV\n"); > > + mutex_lock(&dev_priv->rps.hw_lock); > > + > > + /* bspec says to keep retrying for at least 1 ms */ > > + ret = wait_for(result = skl_do_sagv_disable(dev_priv), 1); > > + mutex_unlock(&dev_priv->rps.hw_lock); > > + > > + if (ret == -ETIMEDOUT) { > > + DRM_ERROR("Request to disable SAGV timed out\n"); > > + return -ETIMEDOUT; > > + } > > + > > + /* > > +  * Some skl systems, pre-release machines in particular, > > +  * don't actually have an SAGV. > > +  */ > > + if (result == -ENOSYS) { > > + DRM_DEBUG_DRIVER("No SAGV found on system, ignoring\n"); > > + dev_priv->skl_sagv_status = I915_SKL_SAGV_NOT_CONTROLLED; > > + return 0; > > + } else if (result < 0) { > > + DRM_ERROR("Failed to disable the SAGV\n"); > > + return result; > > + } > > + > > + dev_priv->skl_sagv_status = I915_SKL_SAGV_DISABLED; > > + return 0; > > +} > > + > > +bool skl_can_enable_sagv(struct drm_atomic_state *state) > > +{ > > + struct drm_device *dev = state->dev; > > + struct drm_i915_private *dev_priv = to_i915(dev); > > + struct intel_atomic_state *intel_state = > > to_intel_atomic_state(state); > > + struct drm_crtc *crtc; > > + enum pipe pipe; > > + int level, plane; > > + > > + /* > > +  * SKL workaround: bspec recommends we disable the SAGV when we > > have > > +  * more then one pipe enabled > > +  * > > +  * If there are no active CRTCs, no additional checks need be > > performed > > +  */ > > + if (hweight32(intel_state->active_crtcs) == 0) > > + return true; > > + else if (hweight32(intel_state->active_crtcs) > 1) > > + return false; > > + > > + /* Since we're now guaranteed to only have one active CRTC... */ > > + pipe = ffs(intel_state->active_crtcs) - 1; > > + crtc = dev_priv->pipe_to_crtc_mapping[pipe]; > > + > > + if (crtc->state->mode.flags & DRM_MODE_FLAG_INTERLACE) > > + return false; > > + > > + for_each_plane(dev_priv, pipe, plane) { > > + /* Skip this plane if it's not enabled */ > > + if (intel_state->wm_results.plane[pipe][plane][0] == 0) > > + continue; > > + > > + /* Find the highest enabled wm level for this plane */ > > + for (level = ilk_wm_max_level(dev); > > +      intel_state->wm_results.plane[pipe][plane][level] == > > 0; > > +      --level); > > + > > + /* > > +  * If any of the planes on this pipe don't enable wm levels > > +  * that incur memory latencies higher then 30µs we can't > > enable > > +  * the SAGV > > +  */ > > + if (dev_priv->wm.skl_latency[level] < SKL_SAGV_BLOCK_TIME) > > + return false; > Shouldn't this check be >= BLOCK_TIME? > That's the requirement for the sagv but the conditional here is still correct.
WM0 - 2ms WM1 - 5ms WM2 - 10ms WM3 - 20ms WM4+- disabled (20ms < BLOCK_TIME) == true, which indicates we don't have any watermark levels with latency values >= 30ms. We can't enable so return false. WM0 - 2ms WM1 - 5ms WM2 - 10ms WM3 - 20ms WM4 - 33ms WM5 - 50ms WM6 - 70ms WM7 - 99ms (99ms < BLOCK_TIME) == false, and 99 >= BLOCK_TIME so we end up returning true to indicate it's safe to enable. > ~Maarten > _______________________________________________ > dri-devel mailing list > dri-devel at lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel