Re: [Intel-gfx] [PATCH] drm/i915: Only force GGTT coherency w/a on required chipsets
Quoting Tvrtko Ursulin (2018-07-20 16:13:14) > > On 20/07/2018 11:19, Chris Wilson wrote: > > +/* > > + * Once upon a time we supposed that writes through the GGTT would be > > + * immediately in physical memory (once flushed out of the CPU path). > > However, > > + * on a few different processors and chipsets, this is not necessarily the > > case > > + * as the writes appear to be buffered internally. Thus a read of the > > backing > > + * storage (physical memory) via a different path (with different physical > > tags > > + * to the indirect write via the GGTT) will see stale values from before > > + * the GGTT write. Inside the kernel, we can for the most part keep track > > of > > + * the different read/write domains in use (e.g. set-domain), but the > > assumption > > + * of coherency is baked into the ABI, hence reporting its true state in > > this > > + * parameter. > > + * > > + * If set to true, writes via mmap_gtt are immediately visible following an > > + * lfence to flush the WCB. > > + * > > + * If set to false, writes via mmap_gtt are indeterminatnly delayed in an > > in > > + * internal buffer and are _not_ immediately visible to third parties > > accessing > > + * directly via mmap_cpu/mmap_wc. Use of mmap_gtt as part of an IPC > > + * communications channel when set to false is strongly disadvised. > > I would perhaps change the language from "set to true/false" to > "reported as true/false". s/set/reported/ and pushed. Thanks for the reviews, -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Only force GGTT coherency w/a on required chipsets
Quoting Tvrtko Ursulin (2018-07-20 16:27:20) > > On 20/07/2018 16:21, Chris Wilson wrote: > > My only plan is for igt to know when the tests are expected to fail. > > Real userspace should not be using GTT, it is slow (even slower than > > intended ;) and constrained, so subject to aperture thrashing, fencing > > is only usable for two out of the many tiling modes, etc, etc, etc. > > I'll take the view that get_param is tiny enough for this to be fine. > > Reviewed-by: Tvrtko Ursulin Also take a peek over at https://patchwork.freedesktop.org/patch/239949/ -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Only force GGTT coherency w/a on required chipsets
On 20/07/2018 16:21, Chris Wilson wrote: Quoting Tvrtko Ursulin (2018-07-20 16:13:14) On 20/07/2018 11:19, Chris Wilson wrote: Not all chipsets have an internal buffer delaying the visibility of writes via the GGTT being visible by other physical paths, but we use a very heavy workaround for all. We only need to apply that workarounds to the chipsets we know suffer from the delay and the resulting coherency issue. Similarly, the same inconsistent coherency fouls up our ABI promise that a write into a mmap_gtt is immediately visible to others. Since the HW has made that a lie, let userspace know when that contract is broken. (Not that userspace would want to use mmap_gtt on those chipsets for other performance reasons...) Testcase: igt/drv_selftest/live_coherency Testcase: igt/gem_mmap_gtt/coherency The list of platforms to mark with false was compiled from the test results? Yes. Coverage of older chipsets is less than ideal, as we have fewer of them being tested and the gmch wasn't tied to any processor. So whereas my PIIIm + i915gm passes, CI's P4 + i915g fails. That is odd. Hope the oddity is not hiding something but we'll see. But then before this patch workaround was applied everywhere - so if the test was failing even then - that means workaround wasn't sufficient? The test is being run in userspace; bypassing any domain control in the kernel, assuming the coherency model built into the ABI. Ah yes. +/* + * Once upon a time we supposed that writes through the GGTT would be + * immediately in physical memory (once flushed out of the CPU path). However, + * on a few different processors and chipsets, this is not necessarily the case + * as the writes appear to be buffered internally. Thus a read of the backing + * storage (physical memory) via a different path (with different physical tags + * to the indirect write via the GGTT) will see stale values from before + * the GGTT write. Inside the kernel, we can for the most part keep track of + * the different read/write domains in use (e.g. set-domain), but the assumption + * of coherency is baked into the ABI, hence reporting its true state in this + * parameter. + * + * If set to true, writes via mmap_gtt are immediately visible following an + * lfence to flush the WCB. + * + * If set to false, writes via mmap_gtt are indeterminatnly delayed in an in + * internal buffer and are _not_ immediately visible to third parties accessing + * directly via mmap_cpu/mmap_wc. Use of mmap_gtt as part of an IPC + * communications channel when set to false is strongly disadvised. I would perhaps change the language from "set to true/false" to "reported as true/false". + */ +#define I915_PARAM_MMAP_GTT_COHERENT 52 + typedef struct drm_i915_getparam { __s32 param; /* Looks fine to me in principle. Is there some userspace ready to start consuming it? My only plan is for igt to know when the tests are expected to fail. Real userspace should not be using GTT, it is slow (even slower than intended ;) and constrained, so subject to aperture thrashing, fencing is only usable for two out of the many tiling modes, etc, etc, etc. I'll take the view that get_param is tiny enough for this to be fine. Reviewed-by: Tvrtko Ursulin Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Only force GGTT coherency w/a on required chipsets
Quoting Tvrtko Ursulin (2018-07-20 16:13:14) > > On 20/07/2018 11:19, Chris Wilson wrote: > > Not all chipsets have an internal buffer delaying the visibility of > > writes via the GGTT being visible by other physical paths, but we use a > > very heavy workaround for all. We only need to apply that workarounds to > > the chipsets we know suffer from the delay and the resulting coherency > > issue. > > > > Similarly, the same inconsistent coherency fouls up our ABI promise that > > a write into a mmap_gtt is immediately visible to others. Since the HW > > has made that a lie, let userspace know when that contract is broken. > > (Not that userspace would want to use mmap_gtt on those chipsets for > > other performance reasons...) > > > > Testcase: igt/drv_selftest/live_coherency > > Testcase: igt/gem_mmap_gtt/coherency > > The list of platforms to mark with false was compiled from the test results? Yes. Coverage of older chipsets is less than ideal, as we have fewer of them being tested and the gmch wasn't tied to any processor. So whereas my PIIIm + i915gm passes, CI's P4 + i915g fails. That is odd. > But then before this patch workaround was applied everywhere - so if the > test was failing even then - that means workaround wasn't sufficient? The test is being run in userspace; bypassing any domain control in the kernel, assuming the coherency model built into the ABI. > > +/* > > + * Once upon a time we supposed that writes through the GGTT would be > > + * immediately in physical memory (once flushed out of the CPU path). > > However, > > + * on a few different processors and chipsets, this is not necessarily the > > case > > + * as the writes appear to be buffered internally. Thus a read of the > > backing > > + * storage (physical memory) via a different path (with different physical > > tags > > + * to the indirect write via the GGTT) will see stale values from before > > + * the GGTT write. Inside the kernel, we can for the most part keep track > > of > > + * the different read/write domains in use (e.g. set-domain), but the > > assumption > > + * of coherency is baked into the ABI, hence reporting its true state in > > this > > + * parameter. > > + * > > + * If set to true, writes via mmap_gtt are immediately visible following an > > + * lfence to flush the WCB. > > + * > > + * If set to false, writes via mmap_gtt are indeterminatnly delayed in an > > in > > + * internal buffer and are _not_ immediately visible to third parties > > accessing > > + * directly via mmap_cpu/mmap_wc. Use of mmap_gtt as part of an IPC > > + * communications channel when set to false is strongly disadvised. > > I would perhaps change the language from "set to true/false" to > "reported as true/false". > > > + */ > > +#define I915_PARAM_MMAP_GTT_COHERENT 52 > > + > > typedef struct drm_i915_getparam { > > __s32 param; > > /* > > > > Looks fine to me in principle. Is there some userspace ready to start > consuming it? My only plan is for igt to know when the tests are expected to fail. Real userspace should not be using GTT, it is slow (even slower than intended ;) and constrained, so subject to aperture thrashing, fencing is only usable for two out of the many tiling modes, etc, etc, etc. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Only force GGTT coherency w/a on required chipsets
On 20/07/2018 11:19, Chris Wilson wrote: Not all chipsets have an internal buffer delaying the visibility of writes via the GGTT being visible by other physical paths, but we use a very heavy workaround for all. We only need to apply that workarounds to the chipsets we know suffer from the delay and the resulting coherency issue. Similarly, the same inconsistent coherency fouls up our ABI promise that a write into a mmap_gtt is immediately visible to others. Since the HW has made that a lie, let userspace know when that contract is broken. (Not that userspace would want to use mmap_gtt on those chipsets for other performance reasons...) Testcase: igt/drv_selftest/live_coherency Testcase: igt/gem_mmap_gtt/coherency The list of platforms to mark with false was compiled from the test results? But then before this patch workaround was applied everywhere - so if the test was failing even then - that means workaround wasn't sufficient? Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100587 Signed-off-by: Chris Wilson Cc: Joonas Lahtinen --- Resend with --function-context -Chris --- drivers/gpu/drm/i915/i915_drv.c | 3 +++ drivers/gpu/drm/i915/i915_gem.c | 5 + drivers/gpu/drm/i915/i915_pci.c | 10 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + include/uapi/drm/i915_drm.h | 22 ++ 5 files changed, 41 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index f8cfd16be534..18a45e7a3d7c 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -302,152 +302,155 @@ static void intel_detect_pch(struct drm_i915_private *dev_priv) static int i915_getparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_i915_private *dev_priv = to_i915(dev); struct pci_dev *pdev = dev_priv->drm.pdev; drm_i915_getparam_t *param = data; int value; switch (param->param) { case I915_PARAM_IRQ_ACTIVE: case I915_PARAM_ALLOW_BATCHBUFFER: case I915_PARAM_LAST_DISPATCH: case I915_PARAM_HAS_EXEC_CONSTANTS: /* Reject all old ums/dri params. */ return -ENODEV; case I915_PARAM_CHIPSET_ID: value = pdev->device; break; case I915_PARAM_REVISION: value = pdev->revision; break; case I915_PARAM_NUM_FENCES_AVAIL: value = dev_priv->num_fence_regs; break; case I915_PARAM_HAS_OVERLAY: value = dev_priv->overlay ? 1 : 0; break; case I915_PARAM_HAS_BSD: value = !!dev_priv->engine[VCS]; break; case I915_PARAM_HAS_BLT: value = !!dev_priv->engine[BCS]; break; case I915_PARAM_HAS_VEBOX: value = !!dev_priv->engine[VECS]; break; case I915_PARAM_HAS_BSD2: value = !!dev_priv->engine[VCS2]; break; case I915_PARAM_HAS_LLC: value = HAS_LLC(dev_priv); break; case I915_PARAM_HAS_WT: value = HAS_WT(dev_priv); break; case I915_PARAM_HAS_ALIASING_PPGTT: value = USES_PPGTT(dev_priv); break; case I915_PARAM_HAS_SEMAPHORES: value = HAS_LEGACY_SEMAPHORES(dev_priv); break; case I915_PARAM_HAS_SECURE_BATCHES: value = capable(CAP_SYS_ADMIN); break; case I915_PARAM_CMD_PARSER_VERSION: value = i915_cmd_parser_get_version(dev_priv); break; case I915_PARAM_SUBSLICE_TOTAL: value = sseu_subslice_total(&INTEL_INFO(dev_priv)->sseu); if (!value) return -ENODEV; break; case I915_PARAM_EU_TOTAL: value = INTEL_INFO(dev_priv)->sseu.eu_total; if (!value) return -ENODEV; break; case I915_PARAM_HAS_GPU_RESET: value = i915_modparams.enable_hangcheck && intel_has_gpu_reset(dev_priv); if (value && intel_has_reset_engine(dev_priv)) value = 2; break; case I915_PARAM_HAS_RESOURCE_STREAMER: value = HAS_RESOURCE_STREAMER(dev_priv); break; case I915_PARAM_HAS_POOLED_EU: value = HAS_POOLED_EU(dev_priv); break; case I915_PARAM_MIN_EU_IN_POOL: value = INTEL_INFO(dev_priv)->sseu.min_eu_in_pool; break; case I915_PARAM_HUC_STATUS: value = intel_huc_check_status(&dev_priv->huc); if (value < 0)
Re: [Intel-gfx] [PATCH] drm/i915: Only force GGTT coherency w/a on required chipsets
Quoting Lis, Tomasz (2018-07-20 15:41:33) > > > On 2018-07-20 12:19, Chris Wilson wrote: > > Not all chipsets have an internal buffer delaying the visibility of > > writes via the GGTT being visible by other physical paths, but we use a > > very heavy workaround for all. We only need to apply that workarounds to > > the chipsets we know suffer from the delay and the resulting coherency > > issue. > > > > Similarly, the same inconsistent coherency fouls up our ABI promise that > > a write into a mmap_gtt is immediately visible to others. Since the HW > > has made that a lie, let userspace know when that contract is broken. > > (Not that userspace would want to use mmap_gtt on those chipsets for > > other performance reasons...) > > > > Testcase: igt/drv_selftest/live_coherency > > Testcase: igt/gem_mmap_gtt/coherency > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100587 > > Signed-off-by: Chris Wilson > > Cc: Joonas Lahtinen > Is there any mention of this bug/feature in bspec? Would be nice to have > a reference. I know that some HW bod has come across it, as they recommended the same for ordering mem access between us using the GTT for iomaps and GuC submission. All I know is the background chatter... We don't seem to have a named w/a; iirc it was just concluded that was the way it was meant to work ;) -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Only force GGTT coherency w/a on required chipsets
On 2018-07-20 12:19, Chris Wilson wrote: Not all chipsets have an internal buffer delaying the visibility of writes via the GGTT being visible by other physical paths, but we use a very heavy workaround for all. We only need to apply that workarounds to the chipsets we know suffer from the delay and the resulting coherency issue. Similarly, the same inconsistent coherency fouls up our ABI promise that a write into a mmap_gtt is immediately visible to others. Since the HW has made that a lie, let userspace know when that contract is broken. (Not that userspace would want to use mmap_gtt on those chipsets for other performance reasons...) Testcase: igt/drv_selftest/live_coherency Testcase: igt/gem_mmap_gtt/coherency Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100587 Signed-off-by: Chris Wilson Cc: Joonas Lahtinen Is there any mention of this bug/feature in bspec? Would be nice to have a reference. Reviewed-by: Tomasz Lis --- Resend with --function-context -Chris --- drivers/gpu/drm/i915/i915_drv.c | 3 +++ drivers/gpu/drm/i915/i915_gem.c | 5 + drivers/gpu/drm/i915/i915_pci.c | 10 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + include/uapi/drm/i915_drm.h | 22 ++ 5 files changed, 41 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index f8cfd16be534..18a45e7a3d7c 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -302,152 +302,155 @@ static void intel_detect_pch(struct drm_i915_private *dev_priv) static int i915_getparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_i915_private *dev_priv = to_i915(dev); struct pci_dev *pdev = dev_priv->drm.pdev; drm_i915_getparam_t *param = data; int value; switch (param->param) { case I915_PARAM_IRQ_ACTIVE: case I915_PARAM_ALLOW_BATCHBUFFER: case I915_PARAM_LAST_DISPATCH: case I915_PARAM_HAS_EXEC_CONSTANTS: /* Reject all old ums/dri params. */ return -ENODEV; case I915_PARAM_CHIPSET_ID: value = pdev->device; break; case I915_PARAM_REVISION: value = pdev->revision; break; case I915_PARAM_NUM_FENCES_AVAIL: value = dev_priv->num_fence_regs; break; case I915_PARAM_HAS_OVERLAY: value = dev_priv->overlay ? 1 : 0; break; case I915_PARAM_HAS_BSD: value = !!dev_priv->engine[VCS]; break; case I915_PARAM_HAS_BLT: value = !!dev_priv->engine[BCS]; break; case I915_PARAM_HAS_VEBOX: value = !!dev_priv->engine[VECS]; break; case I915_PARAM_HAS_BSD2: value = !!dev_priv->engine[VCS2]; break; case I915_PARAM_HAS_LLC: value = HAS_LLC(dev_priv); break; case I915_PARAM_HAS_WT: value = HAS_WT(dev_priv); break; case I915_PARAM_HAS_ALIASING_PPGTT: value = USES_PPGTT(dev_priv); break; case I915_PARAM_HAS_SEMAPHORES: value = HAS_LEGACY_SEMAPHORES(dev_priv); break; case I915_PARAM_HAS_SECURE_BATCHES: value = capable(CAP_SYS_ADMIN); break; case I915_PARAM_CMD_PARSER_VERSION: value = i915_cmd_parser_get_version(dev_priv); break; case I915_PARAM_SUBSLICE_TOTAL: value = sseu_subslice_total(&INTEL_INFO(dev_priv)->sseu); if (!value) return -ENODEV; break; case I915_PARAM_EU_TOTAL: value = INTEL_INFO(dev_priv)->sseu.eu_total; if (!value) return -ENODEV; break; case I915_PARAM_HAS_GPU_RESET: value = i915_modparams.enable_hangcheck && intel_has_gpu_reset(dev_priv); if (value && intel_has_reset_engine(dev_priv)) value = 2; break; case I915_PARAM_HAS_RESOURCE_STREAMER: value = HAS_RESOURCE_STREAMER(dev_priv); break; case I915_PARAM_HAS_POOLED_EU: value = HAS_POOLED_EU(dev_priv); break; case I915_PARAM_MIN_EU_IN_POOL: value = INTEL_INFO(dev_priv)->sseu.min_eu_in_pool; break; case I915_PARAM_HUC_STATUS: value = intel_huc_check_status(&dev_priv->huc); if (value < 0) return value; break; case I915_PARAM_MMAP_GTT_VERSION: /* Tho
[Intel-gfx] [PATCH] drm/i915: Only force GGTT coherency w/a on required chipsets
Not all chipsets have an internal buffer delaying the visibility of writes via the GGTT being visible by other physical paths, but we use a very heavy workaround for all. We only need to apply that workarounds to the chipsets we know suffer from the delay and the resulting coherency issue. Similarly, the same inconsistent coherency fouls up our ABI promise that a write into a mmap_gtt is immediately visible to others. Since the HW has made that a lie, let userspace know when that contract is broken. (Not that userspace would want to use mmap_gtt on those chipsets for other performance reasons...) Testcase: igt/drv_selftest/live_coherency Testcase: igt/gem_mmap_gtt/coherency Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100587 Signed-off-by: Chris Wilson Cc: Joonas Lahtinen --- Resend with --function-context -Chris --- drivers/gpu/drm/i915/i915_drv.c | 3 +++ drivers/gpu/drm/i915/i915_gem.c | 5 + drivers/gpu/drm/i915/i915_pci.c | 10 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + include/uapi/drm/i915_drm.h | 22 ++ 5 files changed, 41 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index f8cfd16be534..18a45e7a3d7c 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -302,152 +302,155 @@ static void intel_detect_pch(struct drm_i915_private *dev_priv) static int i915_getparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_i915_private *dev_priv = to_i915(dev); struct pci_dev *pdev = dev_priv->drm.pdev; drm_i915_getparam_t *param = data; int value; switch (param->param) { case I915_PARAM_IRQ_ACTIVE: case I915_PARAM_ALLOW_BATCHBUFFER: case I915_PARAM_LAST_DISPATCH: case I915_PARAM_HAS_EXEC_CONSTANTS: /* Reject all old ums/dri params. */ return -ENODEV; case I915_PARAM_CHIPSET_ID: value = pdev->device; break; case I915_PARAM_REVISION: value = pdev->revision; break; case I915_PARAM_NUM_FENCES_AVAIL: value = dev_priv->num_fence_regs; break; case I915_PARAM_HAS_OVERLAY: value = dev_priv->overlay ? 1 : 0; break; case I915_PARAM_HAS_BSD: value = !!dev_priv->engine[VCS]; break; case I915_PARAM_HAS_BLT: value = !!dev_priv->engine[BCS]; break; case I915_PARAM_HAS_VEBOX: value = !!dev_priv->engine[VECS]; break; case I915_PARAM_HAS_BSD2: value = !!dev_priv->engine[VCS2]; break; case I915_PARAM_HAS_LLC: value = HAS_LLC(dev_priv); break; case I915_PARAM_HAS_WT: value = HAS_WT(dev_priv); break; case I915_PARAM_HAS_ALIASING_PPGTT: value = USES_PPGTT(dev_priv); break; case I915_PARAM_HAS_SEMAPHORES: value = HAS_LEGACY_SEMAPHORES(dev_priv); break; case I915_PARAM_HAS_SECURE_BATCHES: value = capable(CAP_SYS_ADMIN); break; case I915_PARAM_CMD_PARSER_VERSION: value = i915_cmd_parser_get_version(dev_priv); break; case I915_PARAM_SUBSLICE_TOTAL: value = sseu_subslice_total(&INTEL_INFO(dev_priv)->sseu); if (!value) return -ENODEV; break; case I915_PARAM_EU_TOTAL: value = INTEL_INFO(dev_priv)->sseu.eu_total; if (!value) return -ENODEV; break; case I915_PARAM_HAS_GPU_RESET: value = i915_modparams.enable_hangcheck && intel_has_gpu_reset(dev_priv); if (value && intel_has_reset_engine(dev_priv)) value = 2; break; case I915_PARAM_HAS_RESOURCE_STREAMER: value = HAS_RESOURCE_STREAMER(dev_priv); break; case I915_PARAM_HAS_POOLED_EU: value = HAS_POOLED_EU(dev_priv); break; case I915_PARAM_MIN_EU_IN_POOL: value = INTEL_INFO(dev_priv)->sseu.min_eu_in_pool; break; case I915_PARAM_HUC_STATUS: value = intel_huc_check_status(&dev_priv->huc); if (value < 0) return value; break; case I915_PARAM_MMAP_GTT_VERSION: /* Though we've started our numbering from 1, and so class all * earlier versions as 0, in effect their value is undefined as * the ioctl
Re: [Intel-gfx] [PATCH] drm/i915: Only force GGTT coherency w/a on required chipsets
On Fri, Jul 20, 2018 at 08:59:31AM +0100, Chris Wilson wrote: > Not all chipsets have an internal buffer delaying the visibility of > writes via the GGTT being visible by other physical paths, but we use a > very heavy workaround for all. We only need to apply that workarounds to > the chipsets we know suffer from the delay and the resulting coherency > issue. > > Similarly, the same inconsistent coherency fouls up our ABI promise that > a write into a mmap_gtt is immediately visible to others. Since the HW > has made that a lie, let userspace know when that contract is broken. > (Not that userspace would want to use mmap_gtt on those chipsets for > other performance reasons...) > > Testcase: igt/drv_selftest/live_coherency > Testcase: igt/gem_mmap_gtt/coherency > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100587 > Signed-off-by: Chris Wilson > Cc: Joonas Lahtinen > --- > drivers/gpu/drm/i915/i915_drv.c | 3 +++ > drivers/gpu/drm/i915/i915_gem.c | 5 + > drivers/gpu/drm/i915/i915_pci.c | 10 ++ > drivers/gpu/drm/i915/intel_device_info.h | 1 + > include/uapi/drm/i915_drm.h | 22 ++ > 5 files changed, 41 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index f8cfd16be534..18a45e7a3d7c 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -441,6 +441,9 @@ static int i915_getparam_ioctl(struct drm_device *dev, > void *data, > case I915_PARAM_CS_TIMESTAMP_FREQUENCY: > value = 1000 * INTEL_INFO(dev_priv)->cs_timestamp_frequency_khz; > break; > + case I915_PARAM_MMAP_GTT_COHERENT: > + value = INTEL_INFO(dev_priv)->has_coherent_ggtt; > + break; > default: > DRM_DEBUG("Unknown parameter %d\n", param->param); > return -EINVAL; > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index fcc73a6ab503..8b52cb768a67 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -802,6 +802,11 @@ void i915_gem_flush_ggtt_writes(struct drm_i915_private > *dev_priv) >* that was!). >*/ > > + wmb(); > + > + if (INTEL_INFO(dev_priv)->has_coherent_ggtt) > + return; > + > i915_gem_chipset_flush(dev_priv); > > intel_runtime_pm_get(dev_priv); > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c > index 6a4d1388ad2d..e443fe44da3a 100644 > --- a/drivers/gpu/drm/i915/i915_pci.c > +++ b/drivers/gpu/drm/i915/i915_pci.c > @@ -74,6 +74,7 @@ ? These macros make me sad sometimes. Increase the context size a bit for this one? > .unfenced_needs_alignment = 1, \ > .ring_mask = RENDER_RING, \ > .has_snoop = true, \ > + .has_coherent_ggtt = false, \ > GEN_DEFAULT_PIPEOFFSETS, \ > GEN_DEFAULT_PAGE_SIZES, \ > CURSOR_OFFSETS > @@ -110,6 +111,7 @@ static const struct intel_device_info intel_i865g_info = { > .has_gmch_display = 1, \ > .ring_mask = RENDER_RING, \ > .has_snoop = true, \ > + .has_coherent_ggtt = true, \ > GEN_DEFAULT_PIPEOFFSETS, \ > GEN_DEFAULT_PAGE_SIZES, \ > CURSOR_OFFSETS > @@ -117,6 +119,7 @@ static const struct intel_device_info intel_i865g_info = { > static const struct intel_device_info intel_i915g_info = { > GEN3_FEATURES, > PLATFORM(INTEL_I915G), > + .has_coherent_ggtt = false, > .cursor_needs_physical = 1, > .has_overlay = 1, .overlay_needs_physical = 1, > .hws_needs_physical = 1, > @@ -178,6 +181,7 @@ static const struct intel_device_info intel_pineview_info > = { > .has_gmch_display = 1, \ > .ring_mask = RENDER_RING, \ > .has_snoop = true, \ > + .has_coherent_ggtt = true, \ > GEN_DEFAULT_PIPEOFFSETS, \ > GEN_DEFAULT_PAGE_SIZES, \ > CURSOR_OFFSETS > @@ -220,6 +224,7 @@ static const struct intel_device_info intel_gm45_info = { > .has_hotplug = 1, \ > .ring_mask = RENDER_RING | BSD_RING, \ > .has_snoop = true, \ > + .has_coherent_ggtt = true, \ > /* ilk does support rc6, but we do not implement [power] contexts */ \ > .has_rc6 = 0, \ > GEN_DEFAULT_PIPEOFFSETS, \ > @@ -243,6 +248,7 @@ static const struct intel_device_info > intel_ironlake_m_info = { > .has_hotplug = 1, \ > .has_fbc = 1, \ > .ring_mask = RENDER_RING | BSD_RING | BLT_RING, \ > + .has_coherent_ggtt = true, \ > .has_llc = 1, \ > .has_rc6 = 1, \ > .has_rc6p = 1, \ > @@ -287,6 +293,7 @@ static const struct intel_device_info > intel_sandybridge_m_gt2_info = { > .has_hotplug = 1, \ > .has_fbc = 1, \ > .ring_mask = RENDER_RING | BSD_RING | BLT_RING, \ > + .has_coherent_ggtt = true, \ > .has_llc = 1, \ > .has_rc6 = 1, \ > .has_rc6p = 1, \ > @@ -347,6 +354,7 @@ static const struct inte
[Intel-gfx] [PATCH] drm/i915: Only force GGTT coherency w/a on required chipsets
Not all chipsets have an internal buffer delaying the visibility of writes via the GGTT being visible by other physical paths, but we use a very heavy workaround for all. We only need to apply that workarounds to the chipsets we know suffer from the delay and the resulting coherency issue. Similarly, the same inconsistent coherency fouls up our ABI promise that a write into a mmap_gtt is immediately visible to others. Since the HW has made that a lie, let userspace know when that contract is broken. (Not that userspace would want to use mmap_gtt on those chipsets for other performance reasons...) Testcase: igt/drv_selftest/live_coherency Testcase: igt/gem_mmap_gtt/coherency Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100587 Signed-off-by: Chris Wilson Cc: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_drv.c | 3 +++ drivers/gpu/drm/i915/i915_gem.c | 5 + drivers/gpu/drm/i915/i915_pci.c | 10 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + include/uapi/drm/i915_drm.h | 22 ++ 5 files changed, 41 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index f8cfd16be534..18a45e7a3d7c 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -441,6 +441,9 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_CS_TIMESTAMP_FREQUENCY: value = 1000 * INTEL_INFO(dev_priv)->cs_timestamp_frequency_khz; break; + case I915_PARAM_MMAP_GTT_COHERENT: + value = INTEL_INFO(dev_priv)->has_coherent_ggtt; + break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index fcc73a6ab503..8b52cb768a67 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -802,6 +802,11 @@ void i915_gem_flush_ggtt_writes(struct drm_i915_private *dev_priv) * that was!). */ + wmb(); + + if (INTEL_INFO(dev_priv)->has_coherent_ggtt) + return; + i915_gem_chipset_flush(dev_priv); intel_runtime_pm_get(dev_priv); diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 6a4d1388ad2d..e443fe44da3a 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -74,6 +74,7 @@ .unfenced_needs_alignment = 1, \ .ring_mask = RENDER_RING, \ .has_snoop = true, \ + .has_coherent_ggtt = false, \ GEN_DEFAULT_PIPEOFFSETS, \ GEN_DEFAULT_PAGE_SIZES, \ CURSOR_OFFSETS @@ -110,6 +111,7 @@ static const struct intel_device_info intel_i865g_info = { .has_gmch_display = 1, \ .ring_mask = RENDER_RING, \ .has_snoop = true, \ + .has_coherent_ggtt = true, \ GEN_DEFAULT_PIPEOFFSETS, \ GEN_DEFAULT_PAGE_SIZES, \ CURSOR_OFFSETS @@ -117,6 +119,7 @@ static const struct intel_device_info intel_i865g_info = { static const struct intel_device_info intel_i915g_info = { GEN3_FEATURES, PLATFORM(INTEL_I915G), + .has_coherent_ggtt = false, .cursor_needs_physical = 1, .has_overlay = 1, .overlay_needs_physical = 1, .hws_needs_physical = 1, @@ -178,6 +181,7 @@ static const struct intel_device_info intel_pineview_info = { .has_gmch_display = 1, \ .ring_mask = RENDER_RING, \ .has_snoop = true, \ + .has_coherent_ggtt = true, \ GEN_DEFAULT_PIPEOFFSETS, \ GEN_DEFAULT_PAGE_SIZES, \ CURSOR_OFFSETS @@ -220,6 +224,7 @@ static const struct intel_device_info intel_gm45_info = { .has_hotplug = 1, \ .ring_mask = RENDER_RING | BSD_RING, \ .has_snoop = true, \ + .has_coherent_ggtt = true, \ /* ilk does support rc6, but we do not implement [power] contexts */ \ .has_rc6 = 0, \ GEN_DEFAULT_PIPEOFFSETS, \ @@ -243,6 +248,7 @@ static const struct intel_device_info intel_ironlake_m_info = { .has_hotplug = 1, \ .has_fbc = 1, \ .ring_mask = RENDER_RING | BSD_RING | BLT_RING, \ + .has_coherent_ggtt = true, \ .has_llc = 1, \ .has_rc6 = 1, \ .has_rc6p = 1, \ @@ -287,6 +293,7 @@ static const struct intel_device_info intel_sandybridge_m_gt2_info = { .has_hotplug = 1, \ .has_fbc = 1, \ .ring_mask = RENDER_RING | BSD_RING | BLT_RING, \ + .has_coherent_ggtt = true, \ .has_llc = 1, \ .has_rc6 = 1, \ .has_rc6p = 1, \ @@ -347,6 +354,7 @@ static const struct intel_device_info intel_valleyview_info = { .has_aliasing_ppgtt = 1, .has_full_ppgtt = 1, .has_snoop = true, + .has_coherent_ggtt = false, .ring_mask = RENDER_RING | BSD_RING | BLT_RING, .display_mmio_offset = VLV_DISPL