[Intel-gfx] [PATCH] drm/i915: Avoid reading through unaccessible PTEs during an error event
We need to sanity check that the buffer is actually bound into the mappable range of the GTT prior to reading it back through the GTT with the CPU. Fortuitously, the only buffers we have been interested in so far are constrained to be in the mappable region in order to handle potential relocations. However, this can be relaxed in future and given that the purpose is to read back following an error we should be extra careful and not assume everything is safe. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- drivers/gpu/drm/i915/i915_irq.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 26ac3bf..9b38226 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -708,7 +708,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv, int page, page_count; u32 reloc_offset; - if (src == NULL || src-pages == NULL) + if (src == NULL || src-pages == NULL || !src-map_and_fenceable) return NULL; page_count = src-base.size / PAGE_SIZE; -- 1.7.7.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 00/11] aliasing ppgtt support v2
On Mon, 28 Nov 2011 21:35:27 +0100, Daniel Vetter daniel.vet...@ffwll.ch wrote: Hi all, Changes since the last submission: - fixed issues pointed out by Chris Wilson on irc. - fixed an oops on pre-snb, shame on me for that one. - added two new patches to only bind objects to the global gtt when required. - added a new patch so that userspace can find out whether ppgtt is on. This is required to use MI_STORE/LOAD commands correctly from userspace batchbuffers. Luckily no currently released userspace code depends on this, it's just prep work. Comments, test reports, reviews and flames highly welcome. The lazy-gtt is just missing a guard to ensure the buffer is bound in the global gtt before reading through those PTE (impacts other code to avoid allocating mappable GTT space). The beauty is that it forced me to grok the rest of the lazy-gtt, it's deceptively simple. Aside from the lazy-gtt sufficiently speeding up command submission and reopening an old wound that prevents me from usefully analysing performance, the series is tested-by myself. I've so far reviewed everything but the actual mechanics of the PDE, which is not saying much ;-) -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Avoid reading through unaccessible PTEs during an error event
On Tue, Nov 29, 2011 at 09:35:10AM +, Chris Wilson wrote: We need to sanity check that the buffer is actually bound into the mappable range of the GTT prior to reading it back through the GTT with the CPU. Fortuitously, the only buffers we have been interested in so far are constrained to be in the mappable region in order to handle potential relocations. However, this can be relaxed in future and given that the purpose is to read back following an error we should be extra careful and not assume everything is safe. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Reviewed-by: Daniel Vetter daniel.vet...@ffwll.ch For the lazy-gtt binding we need to also check whether the ptes are correct (they should because we pin buffers with relocations as mappable). I'll add that additional paranoia to my series. -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: capture error_state also for stuck rings
Since quite a while we also the basic output configuration in the error_state, so it should contain enough information to diagnose these MI_WAIT hangs. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_irq.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index f870958..8926bbe 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1708,6 +1708,7 @@ void i915_hangcheck_elapsed(unsigned long data) dev_priv-last_instdone1 == instdone1) { if (dev_priv-hangcheck_count++ 1) { DRM_ERROR(Hangcheck timer elapsed... GPU hung\n); + i915_handle_error(dev, true); if (!IS_GEN2(dev)) { /* Is the chip hanging on a WAIT_FOR_EVENT? @@ -1715,7 +1716,6 @@ void i915_hangcheck_elapsed(unsigned long data) * and break the hang. This should work on * all but the second generation chipsets. */ - if (kick_ring(dev_priv-ring[RCS])) goto repeat; @@ -1728,7 +1728,6 @@ void i915_hangcheck_elapsed(unsigned long data) goto repeat; } - i915_handle_error(dev, true); return; } } else { -- 1.7.7.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: bind objects to the global gtt only when needed
And track the existence of such a binding similar to the aliasing ppgtt case. Speeds up binding/unbinding in the common case where we only need a ppgtt binding (which is accessed in a cpu coherent fashion by the gpu) and no gloabl gtt binding (which needs uc writes for the ptes). This patch just puts the required tracking in place. v2: Check that global gtt mappings exist in the error_state capture code (they should because we pin all relevant things as mappable). Suggested by Chris Wilson. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_drv.h |1 + drivers/gpu/drm/i915/i915_gem.c | 12 ++-- drivers/gpu/drm/i915/i915_gem_gtt.c |4 drivers/gpu/drm/i915/i915_irq.c |3 +++ 4 files changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index e026299..8b5c016 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -857,6 +857,7 @@ struct drm_i915_gem_object { unsigned int cache_level:2; unsigned int has_aliasing_ppgtt_mapping:1; + unsigned int has_global_gtt_mapping:1; struct page **pages; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 234eae8..88e5c1b 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1224,6 +1224,9 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf) goto unlock; } + if (!obj-has_global_gtt_mapping) + i915_gem_gtt_bind_object(obj, obj-cache_level); + if (obj-tiling_mode == I915_TILING_NONE) ret = i915_gem_object_put_fence(obj); else @@ -2133,7 +2136,8 @@ i915_gem_object_unbind(struct drm_i915_gem_object *obj) trace_i915_gem_object_unbind(obj); - i915_gem_gtt_unbind_object(obj); + if (obj-has_global_gtt_mapping) + i915_gem_gtt_unbind_object(obj); if (obj-has_aliasing_ppgtt_mapping) { i915_ppgtt_unbind_object(dev_priv-mm.aliasing_ppgtt, obj); obj-has_aliasing_ppgtt_mapping = 0; @@ -2986,7 +2990,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj, return ret; } - i915_gem_gtt_bind_object(obj, cache_level); + if (obj-has_global_gtt_mapping) + i915_gem_gtt_bind_object(obj, cache_level); if (obj-has_aliasing_ppgtt_mapping) i915_ppgtt_bind_object(dev_priv-mm.aliasing_ppgtt, obj, cache_level); @@ -3369,6 +3374,9 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj, return ret; } + if (!obj-has_global_gtt_mapping map_and_fenceable) + i915_gem_gtt_bind_object(obj, obj-cache_level); + if (obj-pin_count++ == 0) { if (!obj-active) list_move_tail(obj-mm_list, diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index bfa500d..f437d4b 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -388,12 +388,16 @@ void i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj, obj-base.size PAGE_SHIFT, obj-pages, agp_type); + + obj-has_global_gtt_mapping = 1; } void i915_gem_gtt_unbind_object(struct drm_i915_gem_object *obj) { intel_gtt_clear_range(obj-gtt_space-start PAGE_SHIFT, obj-base.size PAGE_SHIFT); + + obj-has_global_gtt_mapping = 0; } void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 65dc543..20f6e5a 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -711,6 +711,9 @@ i915_error_object_create(struct drm_i915_private *dev_priv, if (src == NULL || src-pages == NULL || !src-map_and_fenceable) return NULL; + if (!src-has_global_gtt_mapping) + return NULL; + page_count = src-base.size / PAGE_SIZE; dst = kmalloc(sizeof(*dst) + page_count * sizeof(u32 *), GFP_ATOMIC); -- 1.7.7.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: capture error_state also for stuck rings
On Tue, 29 Nov 2011 12:16:34 +0100, Daniel Vetter daniel.vet...@ffwll.ch wrote: Since quite a while we also the basic output configuration in the error_state, so it should contain enough information to diagnose these MI_WAIT hangs. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch Reviewed-and-tested-by: Chris Wilson ch...@chris-wilson.co.uk I've often used this for debugging, so about time it went upstream. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 1/4 v2] drm/i915: fix ELD writing for SandyBridge
SandyBridge should be using the same register addresses as IvyBridge. Signed-off-by: Wu Fengguang fengguang...@intel.com --- drivers/gpu/drm/i915/intel_display.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) Sorry, this moves some necessary changes from patch 2/4 to here. --- linux.orig/drivers/gpu/drm/i915/intel_display.c 2011-11-28 15:33:04.0 +0800 +++ linux/drivers/gpu/drm/i915/intel_display.c 2011-11-29 19:51:28.0 +0800 @@ -5857,14 +5857,14 @@ static void ironlake_write_eld(struct dr int aud_cntl_st; int aud_cntrl_st2; - if (IS_IVYBRIDGE(connector-dev)) { - hdmiw_hdmiedid = GEN7_HDMIW_HDMIEDID_A; - aud_cntl_st = GEN7_AUD_CNTRL_ST_A; - aud_cntrl_st2 = GEN7_AUD_CNTRL_ST2; - } else { + if (HAS_PCH_IBX(connector-dev)) { hdmiw_hdmiedid = GEN5_HDMIW_HDMIEDID_A; aud_cntl_st = GEN5_AUD_CNTL_ST_A; aud_cntrl_st2 = GEN5_AUD_CNTL_ST2; + } else { + hdmiw_hdmiedid = GEN7_HDMIW_HDMIEDID_A; + aud_cntl_st = GEN7_AUD_CNTRL_ST_A; + aud_cntrl_st2 = GEN7_AUD_CNTRL_ST2; } i = to_intel_crtc(crtc)-pipe; ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 2/4 v2] drm/i915: rename audio ELD registers
Change the definitions from GEN5 to IBX as they aren't in the CPU and some SNB systems actually shipped with IBX chipsets (or, at least that's a supported configuration). The GEN7_* register addresses actually take effect since GEN6 and should be prefixed by CPT, the PCH code name. Suggested-by: Keith Packard kei...@keithp.com Signed-off-by: Wu Fengguang fengguang...@intel.com --- drivers/gpu/drm/i915/i915_reg.h | 22 +++--- drivers/gpu/drm/i915/intel_display.c | 22 +++--- 2 files changed, 22 insertions(+), 22 deletions(-) --- linux.orig/drivers/gpu/drm/i915/i915_reg.h 2011-11-29 19:50:27.0 +0800 +++ linux/drivers/gpu/drm/i915/i915_reg.h 2011-11-29 19:51:38.0 +0800 @@ -3534,17 +3534,17 @@ #define G4X_ELD_ACK(1 4) #define G4X_HDMIW_HDMIEDID 0x6210C -#define GEN5_HDMIW_HDMIEDID_A 0xE2050 -#define GEN5_AUD_CNTL_ST_A 0xE20B4 -#define GEN5_ELD_BUFFER_SIZE (0x1f 10) -#define GEN5_ELD_ADDRESS (0x1f 5) -#define GEN5_ELD_ACK (1 4) -#define GEN5_AUD_CNTL_ST2 0xE20C0 -#define GEN5_ELD_VALIDB(1 0) -#define GEN5_CP_READYB (1 1) +#define IBX_HDMIW_HDMIEDID_A 0xE2050 +#define IBX_AUD_CNTL_ST_A 0xE20B4 +#define IBX_ELD_BUFFER_SIZE(0x1f 10) +#define IBX_ELD_ADDRESS(0x1f 5) +#define IBX_ELD_ACK(1 4) +#define IBX_AUD_CNTL_ST2 0xE20C0 +#define IBX_ELD_VALIDB (1 0) +#define IBX_CP_READYB (1 1) -#define GEN7_HDMIW_HDMIEDID_A 0xE5050 -#define GEN7_AUD_CNTRL_ST_A0xE50B4 -#define GEN7_AUD_CNTRL_ST2 0xE50C0 +#define CPT_HDMIW_HDMIEDID_A 0xE5050 +#define CPT_AUD_CNTL_ST_A 0xE50B4 +#define CPT_AUD_CNTRL_ST2 0xE50C0 #endif /* _I915_REG_H_ */ --- linux.orig/drivers/gpu/drm/i915/intel_display.c 2011-11-29 19:51:28.0 +0800 +++ linux/drivers/gpu/drm/i915/intel_display.c 2011-11-29 19:52:01.0 +0800 @@ -5858,13 +5858,13 @@ static void ironlake_write_eld(struct dr int aud_cntrl_st2; if (HAS_PCH_IBX(connector-dev)) { - hdmiw_hdmiedid = GEN5_HDMIW_HDMIEDID_A; - aud_cntl_st = GEN5_AUD_CNTL_ST_A; - aud_cntrl_st2 = GEN5_AUD_CNTL_ST2; + hdmiw_hdmiedid = IBX_HDMIW_HDMIEDID_A; + aud_cntl_st = IBX_AUD_CNTL_ST_A; + aud_cntrl_st2 = IBX_AUD_CNTL_ST2; } else { - hdmiw_hdmiedid = GEN7_HDMIW_HDMIEDID_A; - aud_cntl_st = GEN7_AUD_CNTRL_ST_A; - aud_cntrl_st2 = GEN7_AUD_CNTRL_ST2; + hdmiw_hdmiedid = CPT_HDMIW_HDMIEDID_A; + aud_cntl_st = CPT_AUD_CNTL_ST_A; + aud_cntrl_st2 = CPT_AUD_CNTRL_ST2; } i = to_intel_crtc(crtc)-pipe; @@ -5878,12 +5878,12 @@ static void ironlake_write_eld(struct dr if (!i) { DRM_DEBUG_DRIVER(Audio directed to unknown port\n); /* operate blindly on all ports */ - eldv = GEN5_ELD_VALIDB; - eldv |= GEN5_ELD_VALIDB 4; - eldv |= GEN5_ELD_VALIDB 8; + eldv = IBX_ELD_VALIDB; + eldv |= IBX_ELD_VALIDB 4; + eldv |= IBX_ELD_VALIDB 8; } else { DRM_DEBUG_DRIVER(ELD on port %c\n, 'A' + i); - eldv = GEN5_ELD_VALIDB ((i - 1) * 4); + eldv = IBX_ELD_VALIDB ((i - 1) * 4); } i = I915_READ(aud_cntrl_st2); @@ -5899,7 +5899,7 @@ static void ironlake_write_eld(struct dr } i = I915_READ(aud_cntl_st); - i = ~GEN5_ELD_ADDRESS; + i = ~IBX_ELD_ADDRESS; I915_WRITE(aud_cntl_st, i); len = min_t(uint8_t, eld[2], 21); /* 84 bytes of hw ELD buffer */ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 1/2] drm/i915: enable deepest RC6 state
This should allow even more energy saving when GPU is not in use. According to the testing, this state results in around 0.1 - 0.4 W better power usage. No issues or regressions observed so far, but additional testing is certainly welcome. Signed-off-by: Eugeni Dodonov eugeni.dodo...@intel.com --- drivers/gpu/drm/i915/intel_display.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 981b1f1..d1e5726 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -7924,6 +7924,7 @@ void gen6_enable_rps(struct drm_i915_private *dev_priv) if (i915_enable_rc6) rc6_mask = GEN6_RC_CTL_RC6p_ENABLE | + GEN6_RC_CTL_RC6pp_ENABLE | GEN6_RC_CTL_RC6_ENABLE; I915_WRITE(GEN6_RC_CONTROL, -- 1.7.7.4 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 2/2] drm/i915: add debugfs interface to control rc6 and rps
This allows to enable/disable rps and rc6 from userspace. This is necessary for to have predictable results from hardware counters, and also to provide a finer granularity over power control from userspace. As an additional trick, we also change the value of i915_enable_rc6 by using this value, to allow the module to know the latest status of rc6 requested by user. Signed-off-by: Eugeni Dodonov eugeni.dodo...@intel.com --- drivers/gpu/drm/i915/i915_debugfs.c | 101 ++ drivers/gpu/drm/i915/intel_display.c |2 +- drivers/gpu/drm/i915/intel_drv.h |3 + 3 files changed, 105 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 4f40f1c..dffc998 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -1340,6 +1340,79 @@ static const struct file_operations i915_wedged_fops = { }; static int +i915_enable_rc6_open(struct inode *inode, + struct file *filp) +{ + filp-private_data = inode-i_private; + return 0; +} + +static ssize_t +i915_enable_rc6_read(struct file *filp, + char __user *ubuf, + size_t max, + loff_t *ppos) +{ + char buf[80]; + int len; + + len = snprintf(buf, sizeof(buf), + %d\n, i915_enable_rc6); + + if (len sizeof(buf)) + len = sizeof(buf); + + return simple_read_from_buffer(ubuf, max, ppos, buf, len); +} + +static ssize_t +i915_enable_rc6_write(struct file *filp, + const char __user *ubuf, + size_t cnt, + loff_t *ppos) +{ + struct drm_device *dev = filp-private_data; + struct drm_i915_private *dev_priv = dev-dev_private; + char buf[20]; + int val = -1; + + if (cnt 0) { + if (cnt sizeof(buf) - 1) + return -EINVAL; + + if (copy_from_user(buf, ubuf, cnt)) + return -EFAULT; + buf[cnt] = 0; + + val = simple_strtol(buf, NULL, 0); + } + + if (INTEL_INFO(dev)-gen 5) + return cnt; + + DRM_DEBUG_DRIVER(Manually setting rps and rc6 status to %d\n, val); + i915_enable_rc6 = val; + + if (val == 0) { + if (IS_IRONLAKE_M(dev)) + ironlake_disable_rc6(dev); + else { + gen6_disable_rps(dev); + gen6_update_ring_freq(dev_priv); + } + } else { + if (IS_IRONLAKE_M(dev)) + ironlake_disable_rc6(dev); + else { + gen6_enable_rps(dev_priv); + gen6_update_ring_freq(dev_priv); + } + } + + return cnt; +} + +static int i915_max_freq_open(struct inode *inode, struct file *filp) { @@ -1401,6 +1474,14 @@ i915_max_freq_write(struct file *filp, return cnt; } +static const struct file_operations i915_enable_rc6_fops = { + .owner = THIS_MODULE, + .open = i915_enable_rc6_open, + .read = i915_enable_rc6_read, + .write = i915_enable_rc6_write, + .llseek = default_llseek, +}; + static const struct file_operations i915_max_freq_fops = { .owner = THIS_MODULE, .open = i915_max_freq_open, @@ -1605,6 +1686,21 @@ static int i915_max_freq_create(struct dentry *root, struct drm_minor *minor) return drm_add_fake_info_node(minor, ent, i915_max_freq_fops); } +static int i915_enable_rc6_create(struct dentry *root, struct drm_minor *minor) +{ + struct drm_device *dev = minor-dev; + struct dentry *ent; + + ent = debugfs_create_file(i915_enable_rc6, + S_IRUGO | S_IWUSR, + root, dev, + i915_enable_rc6_fops); + if (IS_ERR(ent)) + return PTR_ERR(ent); + + return drm_add_fake_info_node(minor, ent, i915_enable_rc6_fops); +} + static int i915_cache_sharing_create(struct dentry *root, struct drm_minor *minor) { struct drm_device *dev = minor-dev; @@ -1676,6 +1772,9 @@ int i915_debugfs_init(struct drm_minor *minor) ret = i915_max_freq_create(minor-debugfs_root, minor); if (ret) return ret; + ret = i915_enable_rc6_create(minor-debugfs_root, minor); + if (ret) + return ret; ret = i915_cache_sharing_create(minor-debugfs_root, minor); if (ret) return ret; @@ -1695,6 +1794,8 @@ void i915_debugfs_cleanup(struct drm_minor *minor) 1, minor); drm_debugfs_remove_files((struct drm_info_list *) i915_max_freq_fops, 1, minor); + drm_debugfs_remove_files((struct drm_info_list *) i915_enable_rc6_fops, +
Re: [Intel-gfx] [PATCH 1/2] drm/i915: enable deepest RC6 state
On Tue, 29 Nov 2011 10:55:04 -0200, Eugeni Dodonov eugeni.dodo...@intel.com wrote: This should allow even more energy saving when GPU is not in use. According to the testing, this state results in around 0.1 - 0.4 W better power usage. No issues or regressions observed so far, but additional testing is certainly welcome. The docs I saw said not implemented; do not use. Do we have it on good authority that this is safe and useful to enable? And doesn't it require programming of more transition thresholds? -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/2] drm/i915: add debugfs interface to control rc6 and rps
On Tue, 29 Nov 2011 10:55:05 -0200, Eugeni Dodonov eugeni.dodo...@intel.com wrote: This allows to enable/disable rps and rc6 from userspace. This is necessary for to have predictable results from hardware counters, and also to provide a finer granularity over power control from userspace. Reading registers and counters is handled with i915_forcewake_user. So the question is what benefit does htis give over module reload. It looks far riskier... -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/2] drm/i915: add debugfs interface to control rc6 and rps
On Tue, Nov 29, 2011 at 01:05:03PM +, Chris Wilson wrote: On Tue, 29 Nov 2011 10:55:05 -0200, Eugeni Dodonov eugeni.dodo...@intel.com wrote: This allows to enable/disable rps and rc6 from userspace. This is necessary for to have predictable results from hardware counters, and also to provide a finer granularity over power control from userspace. Reading registers and counters is handled with i915_forcewake_user. So the question is what benefit does htis give over module reload. It looks far riskier... The observation architecture bspec says that we need to disable rc6, otherwise we'll get garbage in the gpu perf counters. But imo I think this should be handled in the kernel (we neeed to write tons of funny registers to set things up anyway), at least for the time-based perf counter sampling. For prototyping things with intel_reg_write disabling rc6 at module load time seems sufficient to me. -Daniel -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/2] drm/i915: add debugfs interface to control rc6 and rps
On Tue, Nov 29, 2011 at 12:12, Daniel Vetter dan...@ffwll.ch wrote: On Tue, Nov 29, 2011 at 01:05:03PM +, Chris Wilson wrote: On Tue, 29 Nov 2011 10:55:05 -0200, Eugeni Dodonov eugeni.dodo...@intel.com wrote: This allows to enable/disable rps and rc6 from userspace. This is necessary for to have predictable results from hardware counters, and also to provide a finer granularity over power control from userspace. Reading registers and counters is handled with i915_forcewake_user. So the question is what benefit does htis give over module reload. It looks far riskier... The observation architecture bspec says that we need to disable rc6, otherwise we'll get garbage in the gpu perf counters. But imo I think this should be handled in the kernel (we neeed to write tons of funny registers to set things up anyway), at least for the time-based perf counter sampling. For prototyping things with intel_reg_write disabling rc6 at module load time seems sufficient to me. The major idea behind this was to provide additional facilities for controlling power and performance-related details of the driver from userspace, without module reloading or reboot. Currently, both the GPU turbo and rc6/fbc operations are mostly autonomous, there is not much we can influence on their behavior without a reboot or module reload. So my idea was to have a way to toggle - and in some future, ideally, fine-tune their behavior from userspace during the execution. For GPU turbo, we can control it to some point from userspace via i915_max_freq. And with this patch, we could also control the rps/rc6 behavior to some point as well. For the perf counters, I thought on the following flow of execution with this patch: 1. prev_val=$(cat /sys/kernel/debug/dri/0/i915_enable_rc6) 2. echo 0 /sys/kernel/debug/dri/0/i915_enable_rc6 3. do the perf counters aquisition and testing 4. echo $(prev_val) /sys/kernel/debug/dri/0/i915_enable_rc6 Comments? -- Eugeni Dodonov http://eugeni.dodonov.net/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 00/15] [RFC] Forced throttling/scheduling
On Sat, Nov 19, 2011 at 00:24, Ben Widawsky b...@bwidawsk.net wrote: This code is currently living on my personal git repo: git://people.freedesktop.org/~bwidawsk/drm-intelhttp://people.freedesktop.org/%7Ebwidawsk/drm-intelforced_throttling Since these are RFC, I haven't spent too much time cleaning things up. Feel free to comment on typos, comments you feel are missing, etc. I also haven't spent any time running the standard kernel tests, kmemleak and such - I intend to do so while these are reviewed here. There are two main scheduler types implemented in this patch. What I refer to as a fairness scheduler, and a batch scheduler. The fairness scheduler is currently implemented on batch granularity but that is not guaranteed going forward. The batch scheduler is a way to set batch limits per pid. Refer to the relevant patch for more details. It is my opinion that the fairness scheduler isn't terribly interesting except to prevent badly written, or malicious apps. For example, a 3d app which queues up a ton of work but never calls glxSwapBuffer. The batch scheduler is also somewhat uninteresting as the values it uses require proper tuning and will vary from system to system, and even then depending on what's currently running. But like the fairness one, this too has its applications. Most comments and feedback are welcome. I am not that expert about how GPU scheduling works, but all the patches make sense to me and I haven't found anything apparently wrong. I have one feature request for patch 11, which is to list available schedulers via a debugfs entry. For example: # cat /sys/kernel/debug/dri/0/i915_available_schedulers none *fair batch or list them in the output of i915_scheduler, like: # cat /sys/kernel/debug/dri/0/i915_scheduler ... Scheduler: batch Available schedulers: none fair batch To see what schedulers are available for usage. But besides this, for the series: Tested-by: Eugeni Dodonov eugeni.dodo...@intel.com Reviewed-by: Eugeni Dodonov eugeni.dodo...@intel.com -- Eugeni Dodonov http://eugeni.dodonov.net/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/2] drm/i915: add debugfs interface to control rc6 and rps
On Tue, 29 Nov 2011 12:26:23 -0200, Eugeni Dodonov eug...@dodonov.net wrote: For the perf counters, I thought on the following flow of execution with this patch: 1. prev_val=$(cat /sys/kernel/debug/dri/0/i915_enable_rc6) 2. echo 0 /sys/kernel/debug/dri/0/i915_enable_rc6 3. do the perf counters aquisition and testing 4. echo $(prev_val) /sys/kernel/debug/dri/0/i915_enable_rc6 Comments? debugfs should never be ABI. Development and debug testing is one thing, but if we are serious about this, we need to start building an acceptable user interface elsewhere in /sys. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 2/2] drm/i915: properly clflush pwrites to phys objects
Usually results in (rare) cursor corruptions on platforms requiring physically addressed cursors. Note to the stable team: This requires the drm core patch drm: add helper to clflush a virtual address range which creates the helper used here. Tested-and-reported-by: Bruno Prémont bonb...@linux-vserver.org Cc: sta...@kernel.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35460 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=21442 Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_gem.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 39459d2..e395a7d 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4123,6 +4123,7 @@ i915_gem_phys_pwrite(struct drm_device *dev, return -EFAULT; } + drm_clflush_virt_range(vaddr, args-size); intel_gtt_chipset_flush(); return 0; } -- 1.7.7.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: Only clear the GPU domains upon a successful finish
By clearing the GPU read domains before waiting upon the buffer, we run the risk of the wait being interrupted and the domains prematurely cleared. The next time we attempt to wait upon the buffer (after userspace handles the signal), we believe that the buffer is idle and so skip the wait. There are a number of bugs across all generations which show signs of an overly haste reuse of active buffers. Such as: https://bugs.freedesktop.org/show_bug.cgi?id=29046 https://bugs.freedesktop.org/show_bug.cgi?id=35863 https://bugs.freedesktop.org/show_bug.cgi?id=38952 https://bugs.freedesktop.org/show_bug.cgi?id=40282 https://bugs.freedesktop.org/show_bug.cgi?id=41098 https://bugs.freedesktop.org/show_bug.cgi?id=41102 https://bugs.freedesktop.org/show_bug.cgi?id=41284 https://bugs.freedesktop.org/show_bug.cgi?id=42141 A couple of those pre-date i915_gem_object_finish_gpu(), so may be unrelated (such as a wild write from a userspace command buffer), but this does look like a convincing cause for most of those bugs. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: sta...@kernel.org --- drivers/gpu/drm/i915/i915_gem.c |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 8118942..593a64b 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3121,10 +3121,13 @@ i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj) return ret; } + ret = i915_gem_object_wait_rendering(obj); + if (ret) + return ret; + /* Ensure that we invalidate the GPU's caches and TLBs. */ obj-base.read_domains = ~I915_GEM_GPU_DOMAINS; - - return i915_gem_object_wait_rendering(obj); + return 0; } /** -- 1.7.7.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: Avoid using mappable space for relocation processing through the CPU
We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these caches, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 24 +--- 1 files changed, 17 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index f71f26e..9b8d543 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -265,6 +265,12 @@ eb_destroy(struct eb_objects *eb) kfree(eb); } +static inline int use_cpu_reloc(struct drm_i915_gem_object *obj) +{ + return (obj-base.write_domain == I915_GEM_DOMAIN_CPU || + obj-cache_level != I915_CACHE_NONE); +} + static int i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj, struct eb_objects *eb, @@ -351,7 +357,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj, } reloc-delta += target_offset; - if (obj-base.write_domain == I915_GEM_DOMAIN_CPU) { + if (use_cpu_reloc(obj)) { uint32_t page_offset = reloc-offset ~PAGE_MASK; char *vaddr; @@ -463,6 +469,13 @@ i915_gem_execbuffer_relocate(struct drm_device *dev, #define __EXEC_OBJECT_HAS_FENCE (131) static int +need_reloc_mappable(struct drm_i915_gem_object *obj) +{ + struct drm_i915_gem_exec_object2 *entry = obj-exec_entry; + return entry-relocation_count !use_cpu_reloc(obj); +} + +static int pin_and_fence_object(struct drm_i915_gem_object *obj, struct intel_ring_buffer *ring) { @@ -475,8 +488,7 @@ pin_and_fence_object(struct drm_i915_gem_object *obj, has_fenced_gpu_access entry-flags EXEC_OBJECT_NEEDS_FENCE obj-tiling_mode != I915_TILING_NONE; - need_mappable = - entry-relocation_count ? true : need_fence; + need_mappable = need_fence || need_reloc_mappable(obj); ret = i915_gem_object_pin(obj, entry-alignment, need_mappable); if (ret) @@ -532,8 +544,7 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring, has_fenced_gpu_access entry-flags EXEC_OBJECT_NEEDS_FENCE obj-tiling_mode != I915_TILING_NONE; - need_mappable = - entry-relocation_count ? true : need_fence; + need_mappable = need_fence || need_reloc_mappable(obj); if (need_mappable) list_move(obj-exec_list, ordered_objects); @@ -573,8 +584,7 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring, has_fenced_gpu_access entry-flags EXEC_OBJECT_NEEDS_FENCE obj-tiling_mode != I915_TILING_NONE; - need_mappable = - entry-relocation_count ? true : need_fence; + need_mappable = need_fence || need_reloc_mappable(obj); if ((entry-alignment obj-gtt_offset (entry-alignment - 1)) || (need_mappable !obj-map_and_fenceable)) -- 1.7.7.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Avoid using mappable space for relocation processing through the CPU
On Tue, Nov 29, 2011 at 03:12:40PM +, Chris Wilson wrote: We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these caches, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk The error_state capture currently relies on us pinning buffers as mappable when they contain relocations (and userspace always submitting a batchbuffers containing relocations). You break that guarantee without fixing up the error capture code. Otherwise I like this. -Daniel -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/2] drm/i915: properly clflush pwrites to phys objects
On Tue, 29 Nov 2011 16:09:29 +0100, Daniel Vetter daniel.vet...@ffwll.ch wrote: Usually results in (rare) cursor corruptions on platforms requiring physically addressed cursors. So the phys cursor pages are set to WC upon creation, are we just missing the mb()? Or more likely the CPUs don't have PAT and we are being lazy in not detecting the error. Anyway as this is reported to fix the issue on 8xx at least, overkill is fine. :) Reviewed-by: Chris Wilson ch...@chris-wilson.co.uk -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Only clear the GPU domains upon a successful finish
On Tue, Nov 29, 2011 at 03:12:16PM +, Chris Wilson wrote: By clearing the GPU read domains before waiting upon the buffer, we run the risk of the wait being interrupted and the domains prematurely cleared. The next time we attempt to wait upon the buffer (after userspace handles the signal), we believe that the buffer is idle and so skip the wait. There are a number of bugs across all generations which show signs of an overly haste reuse of active buffers. Such as: https://bugs.freedesktop.org/show_bug.cgi?id=29046 https://bugs.freedesktop.org/show_bug.cgi?id=35863 https://bugs.freedesktop.org/show_bug.cgi?id=38952 https://bugs.freedesktop.org/show_bug.cgi?id=40282 https://bugs.freedesktop.org/show_bug.cgi?id=41098 https://bugs.freedesktop.org/show_bug.cgi?id=41102 https://bugs.freedesktop.org/show_bug.cgi?id=41284 https://bugs.freedesktop.org/show_bug.cgi?id=42141 A couple of those pre-date i915_gem_object_finish_gpu(), so may be unrelated (such as a wild write from a userspace command buffer), but this does look like a convincing cause for most of those bugs. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: sta...@kernel.org Really nice catch! Reviewed-by: Daniel Vetter daniel.vet...@ffwll.ch -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Only clear the GPU domains upon a successful finish
On Tue, Nov 29, 2011 at 13:12, Chris Wilson ch...@chris-wilson.co.ukwrote: By clearing the GPU read domains before waiting upon the buffer, we run the risk of the wait being interrupted and the domains prematurely cleared. The next time we attempt to wait upon the buffer (after userspace handles the signal), we believe that the buffer is idle and so skip the wait. There are a number of bugs across all generations which show signs of an overly haste reuse of active buffers. Such as: https://bugs.freedesktop.org/show_bug.cgi?id=29046 https://bugs.freedesktop.org/show_bug.cgi?id=35863 https://bugs.freedesktop.org/show_bug.cgi?id=38952 https://bugs.freedesktop.org/show_bug.cgi?id=40282 https://bugs.freedesktop.org/show_bug.cgi?id=41098 https://bugs.freedesktop.org/show_bug.cgi?id=41102 https://bugs.freedesktop.org/show_bug.cgi?id=41284 https://bugs.freedesktop.org/show_bug.cgi?id=42141 A couple of those pre-date i915_gem_object_finish_gpu(), so may be unrelated (such as a wild write from a userspace command buffer), but this does look like a convincing cause for most of those bugs. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: sta...@kernel.org Reviewed-by: Eugeni Dodonov eugeni.dodo...@intel.com -- Eugeni Dodonov http://eugeni.dodonov.net/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: multithreaded forcewake is an ivb+ feature
On Mon, Nov 28, 2011 at 19:12, Daniel Vetter daniel.vet...@ffwll.ch wrote: Name the function accordingly. Suggested by Chris Wilson. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch Reviewed-by: Eugeni Dodonov eugeni.dodo...@intel.com Such patches are so hard to review. The only ones harder than those are the white space-related ones :). -- Eugeni Dodonov http://eugeni.dodonov.net/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/2] drm/i915: properly clflush pwrites to phys objects
On Tue, Nov 29, 2011 at 03:35:54PM +, Chris Wilson wrote: On Tue, 29 Nov 2011 16:09:29 +0100, Daniel Vetter daniel.vet...@ffwll.ch wrote: Usually results in (rare) cursor corruptions on platforms requiring physically addressed cursors. So the phys cursor pages are set to WC upon creation, are we just missing the mb()? Or more likely the CPUs don't have PAT and we are being lazy in not detecting the error. Yes, on reconsidering the tested-by is from a pentium m, which has working pat, and we do a wbinvd in the i8xx chipset flush, so I don't know anymore how this patch actually works. But it seems to indeed fix the issue for at least one reporter and cursor update is about as far away from a perf critical path as possible, so who cares about such minor quibbles, it works ;-) -Daniel -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 3/4] drm/i915: drop register special-casing in forcewake
On Mon, Nov 28, 2011 at 18:17, Daniel Vetter daniel.vet...@ffwll.ch wrote: We currently have 3 register for which we must not grab forcewake for: FORCEWAKE, FROCEWAKE_MT and ECOBUS. - FROCEWAKE is excluded in the NEEDS_FORCE_WAKE macro and accessed with _NOTRACE. - FROCEWAKE_MT is just accessed with _NOTRACE. s/FROCEWAKE/FORCEWAKE/g :) Besides this, Reviewed-by: Eugeni Dodonov eugeni.dodo...@intel.com for the series. -- Eugeni Dodonov http://eugeni.dodonov.net/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: capture error_state also for stuck rings
On Tue, Nov 29, 2011 at 09:42, Chris Wilson ch...@chris-wilson.co.ukwrote: On Tue, 29 Nov 2011 12:16:34 +0100, Daniel Vetter daniel.vet...@ffwll.ch wrote: Since quite a while we also the basic output configuration in the error_state, so it should contain enough information to diagnose these MI_WAIT hangs. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch Reviewed-and-tested-by: Chris Wilson ch...@chris-wilson.co.uk Reviewed-by: Eugeni Dodonov eugeni.dodo...@intel.com -- Eugeni Dodonov http://eugeni.dodonov.net/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/2] drm/i915: re-enable semaphores by default
On Tue, Nov 22, 2011 at 06:41, Andrew Lutomirski l...@mit.edu wrote: On Thu, Nov 17, 2011 at 1:22 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote: Oops, sorry for wasting your time, wrong branch. Can you try the for-poland one? And also please try what happens when enabling the iommu. for-poland with semaphores, rc6, and iommu on seems to work. Â I'm sending this email from that kernel right now :) Just to double-check: Can you please retest my latest ppgtt branch available at: http://cgit.freedesktop.org/~danvet/drm/log/?h=ppgtt If that works for you with DMAR and semaphores enabled, a tested-by is highly appreciated. Thanks, Daniel -- Daniel Vetter daniel.vet...@ffwll.ch - +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Avoid using mappable space for relocation processing through the CPU
On Tue, 29 Nov 2011 16:34:41 +0100, Daniel Vetter dan...@ffwll.ch wrote: On Tue, Nov 29, 2011 at 03:12:40PM +, Chris Wilson wrote: We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these caches, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk The error_state capture currently relies on us pinning buffers as mappable when they contain relocations (and userspace always submitting a batchbuffers containing relocations). You break that guarantee without fixing up the error capture code. Otherwise I like this. I may have sent that patch a little earlier. ;-) That particular patch stands by itself since we already do use the full GTT and so need defense against reading through unmappable PTEs because having some of our errors, you can't be paranoid enough. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Avoid using mappable space for relocation processing through the CPU
On Tue, Nov 29, 2011 at 04:48:15PM +, Chris Wilson wrote: On Tue, 29 Nov 2011 16:34:41 +0100, Daniel Vetter dan...@ffwll.ch wrote: On Tue, Nov 29, 2011 at 03:12:40PM +, Chris Wilson wrote: We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these caches, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk The error_state capture currently relies on us pinning buffers as mappable when they contain relocations (and userspace always submitting a batchbuffers containing relocations). You break that guarantee without fixing up the error capture code. Otherwise I like this. I may have sent that patch a little earlier. ;-) Yes, I know. My gripe is that this will reduce our chances of successfully capturing the error_state, because now we expect to hit that case in the error capture code whereas up to now it would have been a bug somewhere. So either - fixup the error_capture to fall back to cpu reads (needs the usual clflush dance if the object is not llc cached) - or drop the pin mappable change in this patch. -Daniel -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Avoid using mappable space for relocation processing through the CPU
On Tue, Nov 29, 2011 at 06:03:53PM +0100, Daniel Vetter wrote: On Tue, Nov 29, 2011 at 04:48:15PM +, Chris Wilson wrote: On Tue, 29 Nov 2011 16:34:41 +0100, Daniel Vetter dan...@ffwll.ch wrote: On Tue, Nov 29, 2011 at 03:12:40PM +, Chris Wilson wrote: We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these caches, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk The error_state capture currently relies on us pinning buffers as mappable when they contain relocations (and userspace always submitting a batchbuffers containing relocations). You break that guarantee without fixing up the error capture code. Otherwise I like this. I may have sent that patch a little earlier. ;-) Yes, I know. My gripe is that this will reduce our chances of successfully capturing the error_state, because now we expect to hit that case in the error capture code whereas up to now it would have been a bug somewhere. So either - fixup the error_capture to fall back to cpu reads (needs the usual clflush dance if the object is not llc cached) - or drop the pin mappable change in this patch. After some irc discussion with Dave Airlie I think a simple wmb() to flush the wc buffer might make more sense. I'll try to run that past testers and gather results. Will take at least a week to get anything conclusive. -Daniel -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Avoid using mappable space for relocation processing through the CPU
On Tue, 29 Nov 2011 18:03:53 +0100, Daniel Vetter dan...@ffwll.ch wrote: On Tue, Nov 29, 2011 at 04:48:15PM +, Chris Wilson wrote: On Tue, 29 Nov 2011 16:34:41 +0100, Daniel Vetter dan...@ffwll.ch wrote: On Tue, Nov 29, 2011 at 03:12:40PM +, Chris Wilson wrote: We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these caches, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk The error_state capture currently relies on us pinning buffers as mappable when they contain relocations (and userspace always submitting a batchbuffers containing relocations). You break that guarantee without fixing up the error capture code. Otherwise I like this. I may have sent that patch a little earlier. ;-) Yes, I know. My gripe is that this will reduce our chances of successfully capturing the error_state, because now we expect to hit that case in the error capture code whereas up to now it would have been a bug somewhere. So either - fixup the error_capture to fall back to cpu reads (needs the usual clflush dance if the object is not llc cached) - or drop the pin mappable change in this patch. Ah you forget, I volunteered you to do the error-state capture from a workqueue so that we could add further complexity... We would then be able to allocate enough memory to capture auxiliary buffers as well, etc. In the meantime, the paths that hit this code are during warmup (before any batches have been retired into the userspace bo cache), slow steady state behaviour (when the caches are being reaped and repopulated), and when thrashing the hardware. It also requires that the whole mappable range had already been allocated. Whilst not negligible, the risk imo is small and all will be solved with the next generation i915_capture_error(). -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: flush overlay regfile writes
Better be paranoid. The wmb should flush the wc writes, and the chipset_flush hopefully flushes any mch buffers. There've been a few overlay hangs I've never really diagnosed, unfortunately all the reporters disappeared. Maybe-related: https://bugs.freedesktop.org/show_bug.cgi?id=33309 Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/intel_overlay.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c index cdf17d4..f75e892 100644 --- a/drivers/gpu/drm/i915/intel_overlay.c +++ b/drivers/gpu/drm/i915/intel_overlay.c @@ -209,6 +209,9 @@ static void intel_overlay_unmap_regs(struct intel_overlay *overlay, { if (!OVERLAY_NEEDS_PHYSICAL(overlay-dev)) io_mapping_unmap(regs); + + wmb(); + intel_gtt_chipset_flush(); } static int intel_overlay_do_wait_request(struct intel_overlay *overlay, -- 1.7.7.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: flush overlay regfile writes
On Tue, 29 Nov 2011 18:32:18 +0100, Daniel Vetter daniel.vet...@ffwll.ch wrote: Better be paranoid. The wmb should flush the wc writes, and the chipset_flush hopefully flushes any mch buffers. There've been a few overlay hangs I've never really diagnosed, unfortunately all the reporters disappeared. One of the worries I've had in the past has been whether we need a wmb() inside intel_ring_begin() so that all the register writes (assuming we get WC registers one day) and the WC gtt writes are coherent with initiating work on the GPU. I attacked i915_gem_object_flush_gtt_write_domain() instead in the belief that would be sufficient. I think this demonstrates I was wrong. However, you could argue that the overlay code is circumventing the cache domain tracking of gem objects established for this very purpose. Though calling i915_gem_object_set_to_gtt_domain(obj,true) in map and i915_gem_object_flush_gtt_write_domain() in unmap might be overkill... -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
On Mon, 28 Nov 2011 21:35:30 +0100 Daniel Vetter daniel.vet...@ffwll.ch wrote: This just adds the setup and teardown code for the ppgtt PDE and the last-level pagetables, which are fixed for the entire lifetime, at least for the moment. v2: Kill the stray debug printk noted by and improve the pte definitions as suggested by Chris Wilson. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_dma.c | 14 +++- drivers/gpu/drm/i915/i915_drv.h | 18 + drivers/gpu/drm/i915/i915_gem_gtt.c | 133 +++ drivers/gpu/drm/i915/i915_reg.h | 18 + 4 files changed, 181 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index fd617fb..9d9a92c 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -1199,13 +1199,21 @@ static int i915_load_gem_init(struct drm_device *dev) * at the last page of the aperture. One page should be enough to * keep any prefetching inside of the aperture. */ - i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE); + i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE - 512*PAGE_SIZE); + + if (HAS_ALIASING_PPGTT(dev)) { + ret = i915_gem_init_aliasing_ppgtt(dev); + if (ret) + return ret; + } Not sure if you fixed this based on our IRC conversation. The size belongs in the HAS_ALIASING block. Also, I'd prefer NUM_PDEs or something similar. Finally, if you put the PDEs at the top, couldn't we get rid of the scratch page? Ben ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: Handle unmappable buffers during error state capture
As the buffer is not necessarily accessible through the GTT at the time of a GPU hang, and capturing some of its contents is far more valuable than skipping it, provide a clflushed fallback read path. We still prefer to read through the GTT as that is more consistent with the GPU access of the same buffer. So example it will demonstrate any errorneous tiling or swizzling of the command buffer as seen by the GPU. This becomes necessary with use of CPU relocations and lazy GTT binding, but could potentially happen anyway as a result of a pathological error. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: Daniel Vetter daniel.vet...@ffwll.ch --- We continued discussing how best to handle this in the light of the desirability of unmappable command buffers and decided that a CPU fallback path was necessary to maintain the utility of the i915_error_state and a prerequisite for LLC relocations. -Chris --- drivers/gpu/drm/i915/i915_irq.c | 28 +++- 1 files changed, 23 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index b40004b..08877a7 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -720,7 +720,6 @@ i915_error_object_create(struct drm_i915_private *dev_priv, reloc_offset = src-gtt_offset; for (page = 0; page page_count; page++) { unsigned long flags; - void __iomem *s; void *d; d = kmalloc(PAGE_SIZE, GFP_ATOMIC); @@ -728,10 +727,29 @@ i915_error_object_create(struct drm_i915_private *dev_priv, goto unwind; local_irq_save(flags); - s = io_mapping_map_atomic_wc(dev_priv-mm.gtt_mapping, -reloc_offset); - memcpy_fromio(d, s, PAGE_SIZE); - io_mapping_unmap_atomic(s); + if (reloc_offset dev_priv-mm.gtt_mappable_end) { + void __iomem *s; + + /* Simply ignore tiling or any overlapping fence. +* It's part of the error state, and this hopefully +* captures what the GPU read. +*/ + + s = io_mapping_map_atomic_wc(dev_priv-mm.gtt_mapping, +reloc_offset); + memcpy_fromio(d, s, PAGE_SIZE); + io_mapping_unmap_atomic(s); + } else { + void *s; + + drm_clflush_pages(src-pages[page], 1); + + s = kmap_atomic(src-pages[page]); + memcpy(d, s, PAGE_SIZE); + kunmap_atomic(s); + + drm_clflush_pages(src-pages[page], 1); + } local_irq_restore(flags); dst-pages[page] = d; -- 1.7.7.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
On Tue, Nov 29, 2011 at 10:50:56AM -0800, Ben Widawsky wrote: On Mon, 28 Nov 2011 21:35:30 +0100 Daniel Vetter daniel.vet...@ffwll.ch wrote: This just adds the setup and teardown code for the ppgtt PDE and the last-level pagetables, which are fixed for the entire lifetime, at least for the moment. v2: Kill the stray debug printk noted by and improve the pte definitions as suggested by Chris Wilson. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_dma.c | 14 +++- drivers/gpu/drm/i915/i915_drv.h | 18 + drivers/gpu/drm/i915/i915_gem_gtt.c | 133 +++ drivers/gpu/drm/i915/i915_reg.h | 18 + 4 files changed, 181 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index fd617fb..9d9a92c 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -1199,13 +1199,21 @@ static int i915_load_gem_init(struct drm_device *dev) * at the last page of the aperture. One page should be enough to * keep any prefetching inside of the aperture. */ - i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE); + i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE - 512*PAGE_SIZE); + + if (HAS_ALIASING_PPGTT(dev)) { + ret = i915_gem_init_aliasing_ppgtt(dev); + if (ret) + return ret; + } Not sure if you fixed this based on our IRC conversation. The size belongs in the HAS_ALIASING block. Also, I'd prefer NUM_PDEs or something similar. Finally, if you put the PDEs at the top, couldn't we get rid of the scratch page? Well, yet another reason this is about the wost patch series I've submitted in the last few months. You're totally right, I'll move this in. I'll keep the guard page though even for the ppgtt case, I feel safer with the more paranoid choice and it's just 4kb of almost 2GB of gtt. -Daniel -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: initialization/teardown for the aliasing ppgtt
This just adds the setup and teardown code for the ppgtt PDE and the last-level pagetables, which are fixed for the entire lifetime, at least for the moment. v2: Kill the stray debug printk noted by and improve the pte definitions as suggested by Chris Wilson. v3: Clean up the aperture stealing code as noted by Ben Widawsky. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_dma.c | 40 --- drivers/gpu/drm/i915/i915_drv.h | 18 + drivers/gpu/drm/i915/i915_gem_gtt.c | 133 +++ drivers/gpu/drm/i915/i915_reg.h | 18 + 4 files changed, 198 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index fd617fb..9c56284 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -1190,22 +1190,38 @@ static int i915_load_gem_init(struct drm_device *dev) /* Basic memrange allocator for stolen space */ drm_mm_init(dev_priv-mm.stolen, 0, prealloc_size); - /* Let GEM Manage all of the aperture. -* -* However, leave one page at the end still bound to the scratch page. -* There are a number of places where the hardware apparently -* prefetches past the end of the object, and we've seen multiple -* hangs with the GPU head pointer stuck in a batchbuffer bound -* at the last page of the aperture. One page should be enough to -* keep any prefetching inside of the aperture. -*/ - i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE); + if (HAS_ALIASING_PPGTT(dev)) { + /* PPGTT pdes are stolen from global gtt ptes, so shrink the +* aperture accordingly when using aliasing ppgtt. For paranoia +* keep the guard page in between. */ + i915_gem_do_init(dev, 0, mappable_size, +gtt_size - PAGE_SIZE +- I915_PPGTT_PD_ENTRIES*PAGE_SIZE); + + ret = i915_gem_init_aliasing_ppgtt(dev); + if (ret) + return ret; + } else { + /* Let GEM Manage all of the aperture. +* +* However, leave one page at the end still bound to the scratch +* page. There are a number of places where the hardware +* apparently prefetches past the end of the object, and we've +* seen multiple hangs with the GPU head pointer stuck in a +* batchbuffer bound at the last page of the aperture. One page +* should be enough to keep any prefetching inside of the +* aperture. +*/ + i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE); + } mutex_lock(dev-struct_mutex); ret = i915_gem_init_hw(dev); mutex_unlock(dev-struct_mutex); - if (ret) + if (ret) { + i915_gem_cleanup_aliasing_ppgtt(dev); return ret; + } /* Try to set up FBC with a reasonable compressed buffer size */ if (I915_HAS_FBC(dev) i915_powersave) { @@ -1292,6 +1308,7 @@ cleanup_gem: mutex_lock(dev-struct_mutex); i915_gem_cleanup_ringbuffer(dev); mutex_unlock(dev-struct_mutex); + i915_gem_cleanup_aliasing_ppgtt(dev); cleanup_vga_switcheroo: vga_switcheroo_unregister_client(dev-pdev); cleanup_vga_client: @@ -2179,6 +2196,7 @@ int i915_driver_unload(struct drm_device *dev) i915_gem_free_all_phys_object(dev); i915_gem_cleanup_ringbuffer(dev); mutex_unlock(dev-struct_mutex); + i915_gem_cleanup_aliasing_ppgtt(dev); if (I915_HAS_FBC(dev) i915_powersave) i915_cleanup_compression(dev); drm_mm_takedown(dev_priv-mm.stolen); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index bdbd6d8..8b6f1b7 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -254,6 +254,16 @@ struct intel_device_info { u8 has_blt_ring:1; }; +#define I915_PPGTT_PD_ENTRIES 512 +#define I915_PPGTT_PT_ENTRIES 1024 +struct i915_hw_ppgtt { + unsigned num_pd_entries; + struct page **pt_pages; + uint32_t pd_offset; + dma_addr_t *pt_dma_addr; + dma_addr_t scratch_page_dma_addr; +}; + enum no_fbc_reason { FBC_NO_OUTPUT, /* no outputs enabled to compress */ FBC_STOLEN_TOO_SMALL, /* not enough space to hold compressed buffers */ @@ -584,6 +594,9 @@ typedef struct drm_i915_private { struct io_mapping *gtt_mapping; int gtt_mtrr; + /** PPGTT used for aliasing the PPGTT with the GTT */ + struct i915_hw_ppgtt *aliasing_ppgtt; + struct shrinker inactive_shrinker;
Re: [Intel-gfx] [PATCH] drm/i915: Handle unmappable buffers during error state capture
On Tue, Nov 29, 2011 at 07:02:06PM +, Chris Wilson wrote: As the buffer is not necessarily accessible through the GTT at the time of a GPU hang, and capturing some of its contents is far more valuable than skipping it, provide a clflushed fallback read path. We still prefer to read through the GTT as that is more consistent with the GPU access of the same buffer. So example it will demonstrate any errorneous tiling or swizzling of the command buffer as seen by the GPU. This becomes necessary with use of CPU relocations and lazy GTT binding, but could potentially happen anyway as a result of a pathological error. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: Daniel Vetter daniel.vet...@ffwll.ch Reviewed-by: Daniel Vetter daniel.vet...@ffwll.ch -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Avoid using mappable space for relocation processing through the CPU
On Tue, Nov 29, 2011 at 03:12:40PM +, Chris Wilson wrote: We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these caches, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk With the improved error_state capture code this is Reviewed-by: Daniel Vetter daniel.vet...@ffwll.ch -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Handle unmappable buffers during error state capture
On Tue, Nov 29, 2011 at 17:02, Chris Wilson ch...@chris-wilson.co.ukwrote: As the buffer is not necessarily accessible through the GTT at the time of a GPU hang, and capturing some of its contents is far more valuable than skipping it, provide a clflushed fallback read path. We still prefer to read through the GTT as that is more consistent with the GPU access of the same buffer. So example it will demonstrate any errorneous tiling or swizzling of the command buffer as seen by the GPU. This becomes necessary with use of CPU relocations and lazy GTT binding, but could potentially happen anyway as a result of a pathological error. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: Daniel Vetter daniel.vet...@ffwll.ch Reviewed-by: Eugeni Dodonov eugeni.dodo...@intel.com -- Eugeni Dodonov http://eugeni.dodonov.net/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: initialization/teardown for the aliasing ppgtt
This just adds the setup and teardown code for the ppgtt PDE and the last-level pagetables, which are fixed for the entire lifetime, at least for the moment. v2: Kill the stray debug printk noted by and improve the pte definitions as suggested by Chris Wilson. v3: Clean up the aperture stealing code as noted by Ben Widawsky. v4: Paint the init code in a more pleasing colour as suggest by Chris Wilson. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_dma.c | 41 --- drivers/gpu/drm/i915/i915_drv.h | 18 + drivers/gpu/drm/i915/i915_gem_gtt.c | 133 +++ drivers/gpu/drm/i915/i915_reg.h | 18 + 4 files changed, 199 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index fd617fb..2cc0ba4 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -1190,22 +1190,39 @@ static int i915_load_gem_init(struct drm_device *dev) /* Basic memrange allocator for stolen space */ drm_mm_init(dev_priv-mm.stolen, 0, prealloc_size); - /* Let GEM Manage all of the aperture. -* -* However, leave one page at the end still bound to the scratch page. -* There are a number of places where the hardware apparently -* prefetches past the end of the object, and we've seen multiple -* hangs with the GPU head pointer stuck in a batchbuffer bound -* at the last page of the aperture. One page should be enough to -* keep any prefetching inside of the aperture. -*/ - i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE); + if (HAS_ALIASING_PPGTT(dev)) { + /* PPGTT pdes are stolen from global gtt ptes, so shrink the +* aperture accordingly when using aliasing ppgtt. */ + gtt_size -= I915_PPGTT_PD_ENTRIES*PAGE_SIZE; + /* For paranoia keep the guard page in between. */ + gtt_size -= PAGE_SIZE; + + i915_gem_do_init(dev, 0, mappable_size, gtt_size); + + ret = i915_gem_init_aliasing_ppgtt(dev); + if (ret) + return ret; + } else { + /* Let GEM Manage all of the aperture. +* +* However, leave one page at the end still bound to the scratch +* page. There are a number of places where the hardware +* apparently prefetches past the end of the object, and we've +* seen multiple hangs with the GPU head pointer stuck in a +* batchbuffer bound at the last page of the aperture. One page +* should be enough to keep any prefetching inside of the +* aperture. +*/ + i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE); + } mutex_lock(dev-struct_mutex); ret = i915_gem_init_hw(dev); mutex_unlock(dev-struct_mutex); - if (ret) + if (ret) { + i915_gem_cleanup_aliasing_ppgtt(dev); return ret; + } /* Try to set up FBC with a reasonable compressed buffer size */ if (I915_HAS_FBC(dev) i915_powersave) { @@ -1292,6 +1309,7 @@ cleanup_gem: mutex_lock(dev-struct_mutex); i915_gem_cleanup_ringbuffer(dev); mutex_unlock(dev-struct_mutex); + i915_gem_cleanup_aliasing_ppgtt(dev); cleanup_vga_switcheroo: vga_switcheroo_unregister_client(dev-pdev); cleanup_vga_client: @@ -2179,6 +2197,7 @@ int i915_driver_unload(struct drm_device *dev) i915_gem_free_all_phys_object(dev); i915_gem_cleanup_ringbuffer(dev); mutex_unlock(dev-struct_mutex); + i915_gem_cleanup_aliasing_ppgtt(dev); if (I915_HAS_FBC(dev) i915_powersave) i915_cleanup_compression(dev); drm_mm_takedown(dev_priv-mm.stolen); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index bdbd6d8..8b6f1b7 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -254,6 +254,16 @@ struct intel_device_info { u8 has_blt_ring:1; }; +#define I915_PPGTT_PD_ENTRIES 512 +#define I915_PPGTT_PT_ENTRIES 1024 +struct i915_hw_ppgtt { + unsigned num_pd_entries; + struct page **pt_pages; + uint32_t pd_offset; + dma_addr_t *pt_dma_addr; + dma_addr_t scratch_page_dma_addr; +}; + enum no_fbc_reason { FBC_NO_OUTPUT, /* no outputs enabled to compress */ FBC_STOLEN_TOO_SMALL, /* not enough space to hold compressed buffers */ @@ -584,6 +594,9 @@ typedef struct drm_i915_private { struct io_mapping *gtt_mapping; int gtt_mtrr; + /** PPGTT used for aliasing the PPGTT with the GTT */ + struct i915_hw_ppgtt *aliasing_ppgtt; +
[Intel-gfx] [PATCH] drm/i915: initialization/teardown for the aliasing ppgtt
This just adds the setup and teardown code for the ppgtt PDE and the last-level pagetables, which are fixed for the entire lifetime, at least for the moment. v2: Kill the stray debug printk noted by and improve the pte definitions as suggested by Chris Wilson. v3: Clean up the aperture stealing code as noted by Ben Widawsky. v4: Paint the init code in a more pleasing colour as suggest by Chris Wilson. v5: Explain the magic numbers noticed by Ben Widawsky. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_dma.c | 41 --- drivers/gpu/drm/i915/i915_drv.h | 18 + drivers/gpu/drm/i915/i915_gem_gtt.c | 139 +++ drivers/gpu/drm/i915/i915_reg.h | 18 + 4 files changed, 205 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index fd617fb..2cc0ba4 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -1190,22 +1190,39 @@ static int i915_load_gem_init(struct drm_device *dev) /* Basic memrange allocator for stolen space */ drm_mm_init(dev_priv-mm.stolen, 0, prealloc_size); - /* Let GEM Manage all of the aperture. -* -* However, leave one page at the end still bound to the scratch page. -* There are a number of places where the hardware apparently -* prefetches past the end of the object, and we've seen multiple -* hangs with the GPU head pointer stuck in a batchbuffer bound -* at the last page of the aperture. One page should be enough to -* keep any prefetching inside of the aperture. -*/ - i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE); + if (HAS_ALIASING_PPGTT(dev)) { + /* PPGTT pdes are stolen from global gtt ptes, so shrink the +* aperture accordingly when using aliasing ppgtt. */ + gtt_size -= I915_PPGTT_PD_ENTRIES*PAGE_SIZE; + /* For paranoia keep the guard page in between. */ + gtt_size -= PAGE_SIZE; + + i915_gem_do_init(dev, 0, mappable_size, gtt_size); + + ret = i915_gem_init_aliasing_ppgtt(dev); + if (ret) + return ret; + } else { + /* Let GEM Manage all of the aperture. +* +* However, leave one page at the end still bound to the scratch +* page. There are a number of places where the hardware +* apparently prefetches past the end of the object, and we've +* seen multiple hangs with the GPU head pointer stuck in a +* batchbuffer bound at the last page of the aperture. One page +* should be enough to keep any prefetching inside of the +* aperture. +*/ + i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE); + } mutex_lock(dev-struct_mutex); ret = i915_gem_init_hw(dev); mutex_unlock(dev-struct_mutex); - if (ret) + if (ret) { + i915_gem_cleanup_aliasing_ppgtt(dev); return ret; + } /* Try to set up FBC with a reasonable compressed buffer size */ if (I915_HAS_FBC(dev) i915_powersave) { @@ -1292,6 +1309,7 @@ cleanup_gem: mutex_lock(dev-struct_mutex); i915_gem_cleanup_ringbuffer(dev); mutex_unlock(dev-struct_mutex); + i915_gem_cleanup_aliasing_ppgtt(dev); cleanup_vga_switcheroo: vga_switcheroo_unregister_client(dev-pdev); cleanup_vga_client: @@ -2179,6 +2197,7 @@ int i915_driver_unload(struct drm_device *dev) i915_gem_free_all_phys_object(dev); i915_gem_cleanup_ringbuffer(dev); mutex_unlock(dev-struct_mutex); + i915_gem_cleanup_aliasing_ppgtt(dev); if (I915_HAS_FBC(dev) i915_powersave) i915_cleanup_compression(dev); drm_mm_takedown(dev_priv-mm.stolen); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index bdbd6d8..8b6f1b7 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -254,6 +254,16 @@ struct intel_device_info { u8 has_blt_ring:1; }; +#define I915_PPGTT_PD_ENTRIES 512 +#define I915_PPGTT_PT_ENTRIES 1024 +struct i915_hw_ppgtt { + unsigned num_pd_entries; + struct page **pt_pages; + uint32_t pd_offset; + dma_addr_t *pt_dma_addr; + dma_addr_t scratch_page_dma_addr; +}; + enum no_fbc_reason { FBC_NO_OUTPUT, /* no outputs enabled to compress */ FBC_STOLEN_TOO_SMALL, /* not enough space to hold compressed buffers */ @@ -584,6 +594,9 @@ typedef struct drm_i915_private { struct io_mapping *gtt_mapping; int gtt_mtrr; + /** PPGTT used for aliasing the PPGTT with the GTT */
Re: [Intel-gfx] [PATCH] drm/i915: Avoid using mappable space for relocation processing through the CPU
On Tue, Nov 29, 2011 at 03:12:40PM +, Chris Wilson wrote: We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these caches, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Reviewed-by: Ben Widawsky b...@bwidawsk.net ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/2] drm/i915: enable deepest RC6 state
On Tue, Nov 29, 2011 at 01:01:31PM +, Chris Wilson wrote: On Tue, 29 Nov 2011 10:55:04 -0200, Eugeni Dodonov eugeni.dodo...@intel.com wrote: This should allow even more energy saving when GPU is not in use. According to the testing, this state results in around 0.1 - 0.4 W better power usage. No issues or regressions observed so far, but additional testing is certainly welcome. The docs I saw said not implemented; do not use. Do we have it on good authority that this is safe and useful to enable? And doesn't it require programming of more transition thresholds? -Chris Yes, I think we do have to program more stuff to make this work. Perhaps the BIOS puts in decent values for these registers though? We would have to restore those on reset and resume I'd guess. If BIOS doesn't use anything there, you probably aren't even entering these states. Also, for posterity, there are 3 rc6 states, rc6, deep rc6, and deepest rc6. I think deepest rc6 was recommended to avoid (though I don't recall a specific root caused issue, just some data from Jesse, from the windows team that it didn't seem stable). And I think deepest rc6 is also referred to as rc7 sometimes. IIRC, Jesse had patches for deep rc6, but it was very minimal performance gain. Someone should check if the windows driver uses it. Ben ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: remove the i915_batchbuffer_info debugfs file
With the error_state facility in place, this has outlived it's usefulness. It also oopses with the lates llc-reloc patches because it directly access objects through the gtt without any checks. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_debugfs.c | 40 --- 1 files changed, 0 insertions(+), 40 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 919f072..84c7295 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -562,45 +562,6 @@ static int i915_hws_info(struct seq_file *m, void *data) return 0; } -static void i915_dump_object(struct seq_file *m, -struct io_mapping *mapping, -struct drm_i915_gem_object *obj) -{ - int page, page_count, i; - - page_count = obj-base.size / PAGE_SIZE; - for (page = 0; page page_count; page++) { - u32 *mem = io_mapping_map_wc(mapping, -obj-gtt_offset + page * PAGE_SIZE); - for (i = 0; i PAGE_SIZE; i += 4) - seq_printf(m, %08x : %08x\n, i, mem[i / 4]); - io_mapping_unmap(mem); - } -} - -static int i915_batchbuffer_info(struct seq_file *m, void *data) -{ - struct drm_info_node *node = (struct drm_info_node *) m-private; - struct drm_device *dev = node-minor-dev; - drm_i915_private_t *dev_priv = dev-dev_private; - struct drm_i915_gem_object *obj; - int ret; - - ret = mutex_lock_interruptible(dev-struct_mutex); - if (ret) - return ret; - - list_for_each_entry(obj, dev_priv-mm.active_list, mm_list) { - if (obj-base.read_domains I915_GEM_DOMAIN_COMMAND) { - seq_printf(m, --- gtt_offset = 0x%08x\n, obj-gtt_offset); - i915_dump_object(m, dev_priv-mm.gtt_mapping, obj); - } - } - - mutex_unlock(dev-struct_mutex); - return 0; -} - static int i915_ringbuffer_data(struct seq_file *m, void *data) { struct drm_info_node *node = (struct drm_info_node *) m-private; @@ -1823,7 +1784,6 @@ static struct drm_info_list i915_debugfs_list[] = { {i915_bsd_ringbuffer_info, i915_ringbuffer_info, 0, (void *)VCS}, {i915_blt_ringbuffer_data, i915_ringbuffer_data, 0, (void *)BCS}, {i915_blt_ringbuffer_info, i915_ringbuffer_info, 0, (void *)BCS}, - {i915_batchbuffers, i915_batchbuffer_info, 0}, {i915_error_state, i915_error_state, 0}, {i915_rstdby_delays, i915_rstdby_delays, 0}, {i915_cur_delayinfo, i915_cur_delayinfo, 0}, -- 1.7.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] Is MI_FLUSH_ENABLE bit 12?
Just reading through vol1c.4 of the bspec this evening and found something odd. Bit 11 of MI_MODE is Invalidate UHPTR enable. Bit 12 of MI_MODE is MI_FLUSH Enable And, yet, in i915_reg.h: #define MI_MODE 0x0209c # define VS_TIMER_DISPATCH (1 6) # define MI_FLUSH_ENABLE(1 11) Are we off-by-one on MI_FLUSH_ENABLE? Seems like this would cause serious problems... -- keith.pack...@intel.com pgp6gLFj3Dxwh.pgp Description: PGP signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
On Mon, Nov 28, 2011 at 09:35:30PM +0100, Daniel Vetter wrote: This just adds the setup and teardown code for the ppgtt PDE and the last-level pagetables, which are fixed for the entire lifetime, at least for the moment. v2: Kill the stray debug printk noted by and improve the pte definitions as suggested by Chris Wilson. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch Crap... one more point +/* PPGTT support for Sandybdrige/Gen6 and later */ +static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt, +unsigned first_entry, +unsigned num_entries) +{ + int i, j; + uint32_t *pt_vaddr; + uint32_t scratch_pte; + + scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt-scratch_page_dma_addr); + scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC; + + for (i = 0; i ppgtt-num_pd_entries; i++) { + pt_vaddr = kmap_atomic(ppgtt-pt_pages[i]); + + for (j = 0; j I915_PPGTT_PT_ENTRIES; j++) + pt_vaddr[j] = scratch_pte; + + kunmap_atomic(pt_vaddr); + } + +} Don't you want to clflush here (unless I missed it somewhere else). Ideally I think you'd also flush the TLB with a PIPE_CONTROL, but I guess so long as we have that bit set that always flushes we're okay on that one... Still feel we need a clflush though. Ben ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 04/11] drm/i915: ppgtt binding/unbinding support
On Mon, Nov 28, 2011 at 09:35:31PM +0100, Daniel Vetter wrote: This adds support to bind/unbind objects and wires it up. Objects are only put into the ppgtt when necessary, i.e. at execbuf time. Objects are still unconditionally put into the global gtt. v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind like for the global gtt function. Noticed by Chris Wilson. Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch --- diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index c918124..9c81cda 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -513,6 +513,7 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring, struct drm_file *file, struct list_head *objects) { + drm_i915_private_t *dev_priv = ring-dev-dev_private; struct drm_i915_gem_object *obj; int ret, retry; bool has_fenced_gpu_access = INTEL_INFO(ring-dev)-gen 4; @@ -621,6 +622,14 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring, } i915_gem_object_unpin(obj); + + /* ... and ensure ppgtt mapping exist if needed. */ + if (dev_priv-mm.aliasing_ppgtt !obj-has_aliasing_ppgtt_mapping) { + i915_ppgtt_bind_object(dev_priv-mm.aliasing_ppgtt, +obj, obj-cache_level); + + obj-has_aliasing_ppgtt_mapping = 1; + } } if (ret != -ENOSPC || retry 1) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index bd9b520..061ae12 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -34,22 +34,31 @@ static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt, unsigned first_entry, unsigned num_entries) { - int i, j; uint32_t *pt_vaddr; uint32_t scratch_pte; + unsigned act_pd = first_entry / I915_PPGTT_PT_ENTRIES; + unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES; + unsigned last_pte, i; scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt-scratch_page_dma_addr); scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC; - for (i = 0; i ppgtt-num_pd_entries; i++) { - pt_vaddr = kmap_atomic(ppgtt-pt_pages[i]); + while (num_entries) { + last_pte = first_pte + num_entries; + if (last_pte I915_PPGTT_PT_ENTRIES) + last_pte = I915_PPGTT_PT_ENTRIES; + + pt_vaddr = kmap_atomic(ppgtt-pt_pages[act_pd]); - for (j = 0; j I915_PPGTT_PT_ENTRIES; j++) - pt_vaddr[j] = scratch_pte; + for (i = first_pte; i last_pte; i++) + pt_vaddr[i] = scratch_pte; kunmap_atomic(pt_vaddr); - } + num_entries -= last_pte - first_pte; + first_pte = 0; + act_pd++; + } } int i915_gem_init_aliasing_ppgtt(struct drm_device *dev) @@ -162,6 +171,131 @@ void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev) kfree(ppgtt); } +static void i915_ppgtt_insert_sg_entries(struct i915_hw_ppgtt *ppgtt, + struct scatterlist *sg_list, + unsigned sg_len, + unsigned first_entry, + uint32_t pte_flags) +{ + uint32_t *pt_vaddr, pte; + unsigned act_pd = first_entry / I915_PPGTT_PT_ENTRIES; + unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES; + unsigned i, j, m, segment_len; + dma_addr_t page_addr; + struct scatterlist *sg; + + /* init sg walking */ + sg = sg_list; + i = 0; + segment_len = sg_dma_len(sg) PAGE_SHIFT; + m = 0; + + while (i sg_len) { + pt_vaddr = kmap_atomic(ppgtt-pt_pages[act_pd]); + + for (j = first_pte; j I915_PPGTT_PT_ENTRIES; j++) { + page_addr = sg_dma_address(sg) + (m PAGE_SHIFT); + pte = GEN6_PTE_ADDR_ENCODE(page_addr); + pt_vaddr[j] = pte | pte_flags; + + /* grab the next page */ + m++; + if (m == segment_len) { + sg = sg_next(sg); + i++; + if (i == sg_len) + break; + + segment_len = sg_dma_len(sg) PAGE_SHIFT; + m = 0; + } + } + + kunmap_atomic(pt_vaddr); +
Re: [Intel-gfx] [PATCH 1/2] drm/i915: enable deepest RC6 state
On Tue, Nov 29, 2011 at 19:42, Ben Widawsky b...@bwidawsk.net wrote: On Tue, Nov 29, 2011 at 01:01:31PM +, Chris Wilson wrote: On Tue, 29 Nov 2011 10:55:04 -0200, Eugeni Dodonov eugeni.dodo...@intel.com wrote: This should allow even more energy saving when GPU is not in use. According to the testing, this state results in around 0.1 - 0.4 W better power usage. No issues or regressions observed so far, but additional testing is certainly welcome. The docs I saw said not implemented; do not use. Do we have it on good authority that this is safe and useful to enable? And doesn't it require programming of more transition thresholds? -Chris Yes, I think we do have to program more stuff to make this work. Perhaps the BIOS puts in decent values for these registers though? We would have to restore those on reset and resume I'd guess. If BIOS doesn't use anything there, you probably aren't even entering these states. Also, for posterity, there are 3 rc6 states, rc6, deep rc6, and deepest rc6. I think deepest rc6 was recommended to avoid (though I don't recall a specific root caused issue, just some data from Jesse, from the windows team that it didn't seem stable). And I think deepest rc6 is also referred to as rc7 sometimes. We already setup the variables for both deep and deepest rc6 in our driver (GEN6_RC6p_* and GEN6_RC6pp_*), but we weren't using this additional state previously - if I understood the documentation and the code correctly, we do enable plain rc6 and deep rc6 currently. I haven't found any indications which would tell to avoid it in the latest docs, and I also haven't seen any regressions or issues with it being enabled on any of the machines, so I thought it would be worth trying that additional state as well. From the testing which QA did for this patch, looks like we save between 0.1 and 0.4W when compared to what we had with i915_enable_rc6=1. But in any case, this is all highly experimental and I'll do more testing with it :). -- Eugeni Dodonov http://eugeni.dodonov.net/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
On Mon, Nov 28, 2011 at 09:35:30PM +0100, Daniel Vetter wrote: This just adds the setup and teardown code for the ppgtt PDE and the last-level pagetables, which are fixed for the entire lifetime, at least for the moment. v2: Kill the stray debug printk noted by and improve the pte definitions as suggested by Chris Wilson. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch --- {...} +#define I915_PPGTT_PD_ENTRIES 512 +#define I915_PPGTT_PT_ENTRIES 1024 Not to fond that we have this redefined as GEN6_PTES_PER_PD (which makes more sense as far as naming goes IMO) {...} +#define GEN6_PTES_PER_PD 1024 Ben ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 01/11] agp/intel-gtt: export the scratch page dma address
On Mon, Nov 28, 2011 at 09:35:28PM +0100, Daniel Vetter wrote: To implement a PPGTT for drm/i915 that fully aliases the GTT, we also need to properly alias the scratch page. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch Reviewed-by: Ben Widawsky b...@bwidawsk.net ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 02/11] agp/intel-gtt: export the gtt pagetable iomapping
On Mon, Nov 28, 2011 at 09:35:29PM +0100, Daniel Vetter wrote: We need this because ppgtt page directory entries need to be in the global gtt pagetable. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch I think it makes sense to export the size of the GTT as well, but I'm good either way. Reviewed-by: Ben Widawsky b...@bwidawsk.net ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 05/11] drm/i915: ppgtt register definitions
On Mon, Nov 28, 2011 at 09:35:32PM +0100, Daniel Vetter wrote: Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_reg.h | 18 ++ 1 files changed, 18 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index e9f5dc8..7227446 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -86,6 +86,13 @@ #define GEN6_MBC_SNPCR_LOW (221) #define GEN6_MBC_SNPCR_MIN (321) /* only 1/16th of the cache is shared */ +#define GEN6_MBCTL 0x0907c +#define GEN6_MBCTL_ENABLE_BOOT_FETCH (1 4) +#define GEN6_MBCTL_CTX_FETCH_NEEDED(1 3) +#define GEN6_MBCTL_BME_UPDATE_ENABLE (1 2) +#define GEN6_MBCTL_MAE_UPDATE_ENABLE (1 1) +#define GEN6_MBCTL_BOOT_FETCH_MECH (1 0) + Maybe grep fail, but I don't see these registers used later in the series. #define GEN6_GDRST 0x941c #define GEN6_GRDOM_FULL (1 0) #define GEN6_GRDOM_RENDER (1 1) @@ -110,6 +117,16 @@ #define GEN6_PTES_PER_PD 1024 +#define RING_PP_DIR_BASE(ring) ((ring)-mmio_base+0x228) +#define RING_PP_DIR_BASE_READ(ring) ((ring)-mmio_base+0x518) +#define RING_PP_DIR_DCLV(ring) ((ring)-mmio_base+0x220) +#define PP_DIR_DCLV_2G 0x + +#define GAM_ECOCHK 0x4090 +#define ECOCHK_SNB_BIT (110) +#define ECOCHK_PPGTT_CACHE64B (0x33) +#define ECOCHK_PPGTT_CACHE4B (0x03) + /* VGA stuff */ #define VGA_ST01_MDA 0x3ba @@ -422,6 +439,7 @@ #define GFX_MODE 0x02520 #define GFX_MODE_GEN70x0229c +#define RING_MODE_GEN7(ring) ((ring)-mmio_base+0x29c) #define GFX_RUN_LIST_ENABLE(115) #define GFX_TLB_INVALIDATE_ALWAYS (113) #define GFX_SURFACE_FAULT_ENABLE (112) Ben ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: remove the i915_batchbuffer_info debugfs file
On Wed, 30 Nov 2011 00:17:45 +0100, Daniel Vetter daniel.vet...@ffwll.ch wrote: With the error_state facility in place, this has outlived it's usefulness. It also oopses with the lates llc-reloc patches because it directly access objects through the gtt without any checks. This code has been useless for some time. Reviewed-by: Chris Wilson ch...@chris-wilson.co.uk -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 06/11] drm/i915: ppgtt debugfs info
On Mon, Nov 28, 2011 at 09:35:33PM +0100, Daniel Vetter wrote: Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch I am fine with the patch, but I don't yet understand how useful this is. Reviewed-by: Ben Widawsky b...@bwidawsk.net ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 08/11] drm/i915: enable ppgtt
On Mon, Nov 28, 2011 at 10:24:52PM +0100, Daniel Vetter wrote: v2: Don't try to enable ppgtt on pre-snb. Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_drv.c |2 ++ drivers/gpu/drm/i915/i915_drv.h |1 + drivers/gpu/drm/i915/i915_gem.c | 39 +++ 3 files changed, 42 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index f12b43e..c7cf881 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -698,6 +698,8 @@ int i915_reset(struct drm_device *dev, u8 flags) if (HAS_BLT(dev)) dev_priv-ring[BCS].init(dev_priv-ring[BCS]); + i915_gem_init_ppgtt(dev); + mutex_unlock(dev-struct_mutex); drm_irq_uninstall(dev); drm_mode_config_reset(dev); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 5db3b84..0d228d2 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1228,6 +1228,7 @@ int __must_check i915_gem_object_set_domain(struct drm_i915_gem_object *obj, int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj); int __must_check i915_gem_init_hw(struct drm_device *dev); void i915_gem_init_swizzling(struct drm_device *dev); +void i915_gem_init_ppgtt(struct drm_device *dev); void i915_gem_cleanup_ringbuffer(struct drm_device *dev); void i915_gem_do_init(struct drm_device *dev, unsigned long start, diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 593aa60..e1d7852 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3772,6 +3772,43 @@ void i915_gem_init_swizzling(struct drm_device *dev) DISP_TILE_SURFACE_SWIZZLING); } + +void i915_gem_init_ppgtt(struct drm_device *dev) +{ + drm_i915_private_t *dev_priv = dev-dev_private; + uint32_t pd_offset; + struct intel_ring_buffer *ring; + int i; + + if (!HAS_ALIASING_PPGTT(dev)) + return; + + pd_offset = dev_priv-mm.aliasing_ppgtt-pd_offset; + pd_offset /= 64; /* in cachelines, */ + pd_offset = 16; + + if (INTEL_INFO(dev)-gen == 6) { + uint32_t ecochk = I915_READ(GAM_ECOCHK); + I915_WRITE(GAM_ECOCHK, ecochk | ECOCHK_SNB_BIT | +ECOCHK_PPGTT_CACHE64B); + I915_WRITE(GFX_MODE, GFX_MODE_ENABLE(GFX_PPGTT_ENABLE)); + } else if (INTEL_INFO(dev)-gen = 7) { + I915_WRITE(GAM_ECOCHK, ECOCHK_PPGTT_CACHE64B); + /* GFX_MODE is per-ring on gen7+ */ + } + + for (i = 0; i I915_NUM_RINGS; i++) { + ring = dev_priv-ring[i]; + + if (INTEL_INFO(dev)-gen = 7) + I915_WRITE(RING_MODE_GEN7(ring), +GFX_MODE_ENABLE(GFX_PPGTT_ENABLE)); + + I915_WRITE(RING_PP_DIR_DCLV(ring), PP_DIR_DCLV_2G); + I915_WRITE(RING_PP_DIR_BASE(ring), pd_offset); + } +} + int i915_gem_init_hw(struct drm_device *dev) { @@ -3798,6 +3835,8 @@ i915_gem_init_hw(struct drm_device *dev) dev_priv-next_seqno = 1; + i915_gem_init_ppgtt(dev); + return 0; cleanup_bsd_ring: Maybe through some DRM_INFOs in here for later debugging to see if we failed in setting up the page tables? Also, does this work across resume and reset? Ben ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 00/11] aliasing ppgtt support v2
On Mon, Nov 28, 2011 at 18:35, Daniel Vetter daniel.vet...@ffwll.ch wrote: Hi all, Changes since the last submission: - fixed issues pointed out by Chris Wilson on irc. - fixed an oops on pre-snb, shame on me for that one. - added two new patches to only bind objects to the global gtt when required. - added a new patch so that userspace can find out whether ppgtt is on. This is required to use MI_STORE/LOAD commands correctly from userspace batchbuffers. Luckily no currently released userspace code depends on this, it's just prep work. Comments, test reports, reviews and flames highly welcome. For the series (with the latest patches on your fd.o repo) Tested-by: Eugeni Dodonov eugeni.dodo...@intel.com I left it running on 2 SNBs and 1 IVB for couple of hours under different set of workloads, and haven't seen any issues so far. Also, after the latest round of fixes with Chris and Ben's comments, I guess I can also give a Reviewed-by: Eugeni Dodonov eugeni.dodo...@intel.com for the series, with one small bike shed comment for Patch 7 which I'll reply there. -- Eugeni Dodonov http://eugeni.dodonov.net/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 07/11] drm/i915: per-ring fault reg
On Mon, Nov 28, 2011 at 18:35, Daniel Vetter daniel.vet...@ffwll.ch wrote: +#define ERROR 0x40a0 You do not seem to use this, and it is the same value as ERROR_GEN6. Do we need it? Or I misread something? -- Eugeni Dodonov http://eugeni.dodonov.net/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: remove the i915_batchbuffer_info debugfs file
On Tue, Nov 29, 2011 at 21:17, Daniel Vetter daniel.vet...@ffwll.ch wrote: With the error_state facility in place, this has outlived it's usefulness. It also oopses with the lates llc-reloc patches because it directly access objects through the gtt without any checks. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch Yep, I think that error_state does the job just fine now. Reviewed-by: Eugeni Dodonov eugeni.dodo...@intel.com -- Eugeni Dodonov http://eugeni.dodonov.net/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Is MI_FLUSH_ENABLE bit 12?
On Mon, 28 Nov 2011 18:48:04 -0800, Keith Packard kei...@keithp.com wrote: Non-text part: multipart/mixed Non-text part: multipart/signed Just reading through vol1c.4 of the bspec this evening and found something odd. Bit 11 of MI_MODE is Invalidate UHPTR enable. Bit 12 of MI_MODE is MI_FLUSH Enable And, yet, in i915_reg.h: #define MI_MODE 0x0209c # define VS_TIMER_DISPATCH(1 6) # define MI_FLUSH_ENABLE (1 11) Are we off-by-one on MI_FLUSH_ENABLE? Seems like this would cause serious problems... I think we are. On the other hand, based on actual behavior plus reading of simulator, I believe that the bit does nothing, regardless. pgp8HsuP4gkZG.pgp Description: PGP signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Is MI_FLUSH_ENABLE bit 12?
On Tue, Nov 29, 2011 at 04:47:57PM -0800, Eric Anholt wrote: On Mon, 28 Nov 2011 18:48:04 -0800, Keith Packard kei...@keithp.com wrote: Non-text part: multipart/mixed Non-text part: multipart/signed Just reading through vol1c.4 of the bspec this evening and found something odd. Bit 11 of MI_MODE is Invalidate UHPTR enable. Bit 12 of MI_MODE is MI_FLUSH Enable And, yet, in i915_reg.h: #define MI_MODE 0x0209c # define VS_TIMER_DISPATCH (1 6) # define MI_FLUSH_ENABLE(1 11) Are we off-by-one on MI_FLUSH_ENABLE? Seems like this would cause serious problems... I think we are. On the other hand, based on actual behavior plus reading of simulator, I believe that the bit does nothing, regardless. I do not think so. We've (Chris, I, and perhaps Jesse?) been through this excercise at least twice before, and both times resulted in hangs when we switched to bit 12 on Ironlake, not sure about other platforms. Ben pgpyMIhTaEJkB.pgp Description: PGP signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: remove the i915_batchbuffer_info debugfs file
On Wed, Nov 30, 2011 at 12:17:45AM +0100, Daniel Vetter wrote: With the error_state facility in place, this has outlived it's usefulness. It also oopses with the lates llc-reloc patches because it directly access objects through the gtt without any checks. Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_debugfs.c | 40 --- 1 files changed, 0 insertions(+), 40 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 919f072..84c7295 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -562,45 +562,6 @@ static int i915_hws_info(struct seq_file *m, void *data) return 0; } -static void i915_dump_object(struct seq_file *m, - struct io_mapping *mapping, - struct drm_i915_gem_object *obj) -{ - int page, page_count, i; - - page_count = obj-base.size / PAGE_SIZE; - for (page = 0; page page_count; page++) { - u32 *mem = io_mapping_map_wc(mapping, - obj-gtt_offset + page * PAGE_SIZE); - for (i = 0; i PAGE_SIZE; i += 4) - seq_printf(m, %08x : %08x\n, i, mem[i / 4]); - io_mapping_unmap(mem); - } -} For some time I've wanted to turn this into a generic bo dumping debugfs entry. I'd be sorry to see it go, but we can always resurrect it later. The rest I don't care about. - -static int i915_batchbuffer_info(struct seq_file *m, void *data) -{ - struct drm_info_node *node = (struct drm_info_node *) m-private; - struct drm_device *dev = node-minor-dev; - drm_i915_private_t *dev_priv = dev-dev_private; - struct drm_i915_gem_object *obj; - int ret; - - ret = mutex_lock_interruptible(dev-struct_mutex); - if (ret) - return ret; - - list_for_each_entry(obj, dev_priv-mm.active_list, mm_list) { - if (obj-base.read_domains I915_GEM_DOMAIN_COMMAND) { - seq_printf(m, --- gtt_offset = 0x%08x\n, obj-gtt_offset); - i915_dump_object(m, dev_priv-mm.gtt_mapping, obj); - } - } - - mutex_unlock(dev-struct_mutex); - return 0; -} - static int i915_ringbuffer_data(struct seq_file *m, void *data) { struct drm_info_node *node = (struct drm_info_node *) m-private; @@ -1823,7 +1784,6 @@ static struct drm_info_list i915_debugfs_list[] = { {i915_bsd_ringbuffer_info, i915_ringbuffer_info, 0, (void *)VCS}, {i915_blt_ringbuffer_data, i915_ringbuffer_data, 0, (void *)BCS}, {i915_blt_ringbuffer_info, i915_ringbuffer_info, 0, (void *)BCS}, - {i915_batchbuffers, i915_batchbuffer_info, 0}, {i915_error_state, i915_error_state, 0}, {i915_rstdby_delays, i915_rstdby_delays, 0}, {i915_cur_delayinfo, i915_cur_delayinfo, 0}, -- 1.7.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 05/11] drm/i915: ppgtt register definitions
On Tue, Nov 29, 2011 at 03:57:41PM -0800, Ben Widawsky wrote: On Mon, Nov 28, 2011 at 09:35:32PM +0100, Daniel Vetter wrote: Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/i915/i915_reg.h | 18 ++ 1 files changed, 18 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index e9f5dc8..7227446 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -86,6 +86,13 @@ #define GEN6_MBC_SNPCR_LOW (221) #define GEN6_MBC_SNPCR_MIN (321) /* only 1/16th of the cache is shared */ +#define GEN6_MBCTL 0x0907c +#define GEN6_MBCTL_ENABLE_BOOT_FETCH (1 4) +#define GEN6_MBCTL_CTX_FETCH_NEEDED (1 3) +#define GEN6_MBCTL_BME_UPDATE_ENABLE (1 2) +#define GEN6_MBCTL_MAE_UPDATE_ENABLE (1 1) +#define GEN6_MBCTL_BOOT_FETCH_MECH (1 0) + Maybe grep fail, but I don't see these registers used later in the series. They're not used. But both public snb docs and Bspec contain notices that you need to frob the boot mode enable bit in here, which afaics doesn't exist under that exact name. Also, frobbing any of these seems to just reduce the expected lifetime of my gpus. But in case anyone wants to try things out, it's imo good to include the definitions (especially since public Docs don't explain anything about this reg than that warning about ppgtt). -Daniel -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx