Daniel Vetter <dan...@ffwll.ch> writes:

> On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
>> The WaDisableLSQCROPERFforOCL workaround has the side effect of
>> disabling an L3SQ optimization that has huge performance implications
>> and is unlikely to be necessary for the correct functioning of usual
>> graphic workloads. Userspace is free to re-enable the workaround on
>> demand, and is generally in a better position to determine whether the
>> workaround is necessary than the DRM is (e.g. only during the
>> execution of compute kernels that rely on both L3 fences and HDC R/W
>> requests).
>>
>> The same workaround seems to apply to BDW (at least to production
>> stepping G1) and SKL as well (the internal workaround database claims
>> that it does for all steppings, while the BSpec workaround table only
>> mentions pre-production steppings), but the DRM doesn't do anything
>> beyond whitelisting the L3SQCREG4 register so userspace can enable it
>> when it sees fit. Do the same on KBL platforms.
>>
>> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
>> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
>> This is followed by a regression of 35% and 10% respectively for the
>> same benchmarks and platform caused by my recent patch series
>> switching userspace to use the dataport constant cache instead of the
>> sampler to implement uniform pull constant loads, which caused us to
>> hit more heavily the L3 cache (and on platforms other than KBL had the
>> opposite effect of improving performance of the same two benchmarks).
>> The overall effect on KBL of this change combined with the recent
>> userspace change is respectively 4.6% and 2.6%. SynMark2 OglShMapPcf
>> was affected by the constant cache changes (though it improved as it
>> did on other platforms rather than regressing), but is not
>> significantly affected by this patch (with statistical significance of
>> 5% and sample size 20).
>>
>> v2: Drop some more code to avoid unused variable warning.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
>> Signed-off-by: Francisco Jerez <curroje...@riseup.net>
>> Cc: Eero Tamminen <eero.t.tammi...@intel.com>
>> Cc: Jani Nikula <jani.nik...@intel.com>
>> Cc: Mika Kuoppala <mika.kuopp...@intel.com>
>> Cc: beig...@lists.freedesktop.org
>
> Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
> for compute kernels? Are the patches for mesa compute/beignet
> ready&reviewed?

This is an explicit setting on kbl/E0 only. So one could argue that,
unless they filter based on PCI IDs, things would already blow up
across the skl/kbl population if they forgot to set it.

The whitelisting is in place and looks sane, so this E0 exception
is a wart that got in by me reading the WA database slavishly without
thinking.

-Mika

> -Daniel
>
>> ---
>>  drivers/gpu/drm/i915/intel_lrc.c        | 10 ----------
>>  drivers/gpu/drm/i915/intel_ringbuffer.c |  8 --------
>>  2 files changed, 18 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index 6db246a..656e0a3 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -970,18 +970,8 @@ static inline int gen8_emit_flush_coherentl3_wa(struct
>> intel_engine_cs *engine,
>>                                                  uint32_t *batch,
>>                                                  uint32_t index)
>>  {
>> -       struct drm_i915_private *dev_priv = engine->i915;
>>         uint32_t l3sqc4_flush = (0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES);
>>
>> -       /*
>> -        * WaDisableLSQCROPERFforOCL:kbl
>> -        * This WA is implemented in skl_init_clock_gating() but since
>> -        * this batch updates GEN8_L3SQCREG4 with default value we need to
>> -        * set this bit here to retain the WA during flush.
>> -        */
>> -       if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
>> -               l3sqc4_flush |= GEN8_LQSC_RO_PERF_DIS;
>> -
>>         wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |
>>                                    MI_SRM_LRM_GLOBAL_GTT));
>>         wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 0971ac3..7cb2ab4 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -1095,14 +1095,6 @@ static int kbl_init_workarounds(struct
>> intel_engine_cs *engine)
>>         WA_SET_BIT_MASKED(HDC_CHICKEN0,
>>                           HDC_FENCE_DEST_SLM_DISABLE);
>>
>> -       /* GEN8_L3SQCREG4 has a dependency with WA batch so any new changes
>> -        * involving this register should also be added to WA batch as required.
>> -        */
>> -       if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
>> -               /* WaDisableLSQCROPERFforOCL:kbl */
>> -               I915_WRITE(GEN8_L3SQCREG4, I915_READ(GEN8_L3SQCREG4) |
>> -                          GEN8_LQSC_RO_PERF_DIS);
>> -
>>         /* WaToEnableHwFixForPushConstHWBug:kbl */
>>         if (IS_KBL_REVID(dev_priv, KBL_REVID_C0, REVID_FOREVER))
>>                 WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
>> --
>> 2.10.2
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
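
For anyone following the first hunk, here is a minimal standalone sketch of
how the value that the flush batch loads into GEN8_L3SQCREG4 differs with and
without the removed KBL branch. The base value 0x40400000 comes from the
quoted patch. The bit positions used for GEN8_LQSC_FLUSH_COHERENT_LINES
(bit 21) and GEN8_LQSC_RO_PERF_DIS (bit 27) are assumptions and should be
checked against i915_reg.h in the tree being patched. l3sqc4_flush_value()
and its keep_ro_perf_dis parameter are hypothetical names made up for
illustration, not driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED bit positions; verify against i915_reg.h before relying on them. */
#define GEN8_LQSC_FLUSH_COHERENT_LINES (UINT32_C(1) << 21)
#define GEN8_LQSC_RO_PERF_DIS          (UINT32_C(1) << 27)

/* Base value as it appears in the quoted hunk. */
#define L3SQC4_FLUSH_BASE UINT32_C(0x40400000)

/*
 * Hypothetical helper, not part of i915: returns the value the flush batch
 * would load into GEN8_L3SQCREG4.  keep_ro_perf_dis stands in for the
 * removed IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0) check.
 */
static uint32_t l3sqc4_flush_value(bool keep_ro_perf_dis)
{
	uint32_t v = L3SQC4_FLUSH_BASE | GEN8_LQSC_FLUSH_COHERENT_LINES;

	if (keep_ro_perf_dis)
		v |= GEN8_LQSC_RO_PERF_DIS;

	/* The coherent-line flush bit is set on both paths. */
	assert(v & GEN8_LQSC_FLUSH_COHERENT_LINES);
	return v;
}
```

With the patch applied, only the keep_ro_perf_dis == false path remains in
the kernel. Userspace that needs the workaround would set the bit itself
through the whitelisted register, as the commit message describes.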