Re: [Intel-gfx] [PATCH] drm/i915: Always wake the device to flush the GTT
On Wed, Aug 30, 2017 at 04:13:41PM +0100, Chris Wilson wrote: > Quoting Daniel Vetter (2017-08-30 14:59:32) > > On Wed, Aug 30, 2017 at 01:56:40PM +0100, Chris Wilson wrote: > > > Quoting Daniel Vetter (2017-08-30 13:23:56) > > > > Or just the need to add a pile more tests to pm_rpm? > > > > > > No. It's just your regular combinatorial explosion. The approach I would > > > take here would be to register a sysenter callback that attempted to do a > > > rpm suspend (i.e. so ~every ioctl would start from idle, and controlled > > > via the faultinjection framework) and then run the minimal test set that > > > exercises all ioctl paths, and then expand to all driver branches. > > > > > > First we need coverage feedback. > > > > What I meant to imply: As long as any display is on we will never rpm > > suspend. Mostly this is the case for CI machines. > > > > The new testcases I've had in mind would explicitly dpms off the display > > before running a set of gem testcases. We don't want to do that everywhere > > though, because a dpms on/off is very costly. > > If no userspace is using the display and we remove fbcon, shouldn't the > kernel be disabling the outputs anyway? There's literally nothing there > to provide display continuity. Fastboot will take over boot-up state and keep it running until a kms client takes over. Once that happened, we'll have dropped the bios fb on the floor and will shut down all outputs on close (due to removal of the userspace fbs). Which means if you want to rely on that, then how your test behaves depends upon when we last had to reboot, and whether a kms test has run before you. That's really bad from a "too much noise in CI" pov (because with the full sharded run it's somewhat randomized each time). Also, we have fbcon tests in igt, so disabling fbcon isn't an option really. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Always wake the device to flush the GTT
Quoting Daniel Vetter (2017-08-30 14:59:32) > On Wed, Aug 30, 2017 at 01:56:40PM +0100, Chris Wilson wrote: > > Quoting Daniel Vetter (2017-08-30 13:23:56) > > > Or just the need to add a pile more tests to pm_rpm? > > > > No. It's just your regular combinatorial explosion. The approach I would > > take here would be to register a sysenter callback that attempted to do a > > rpm suspend (i.e. so ~every ioctl would start from idle, and controlled > > via the faultinjection framework) and then run the minimal test set that > > exercises all ioctl paths, and then expand to all driver branches. > > > > First we need coverage feedback. > > What I meant to imply: As long as any display is on we will never rpm > suspend. Mostly this is the case for CI machines. > > The new testcases I've had in mind would explicitly dpms off the display > before running a set of gem testcases. We don't want to do that everywhere > though, because a dpms on/off is very costly. If no userspace is using the display and we remove fbcon, shouldn't the kernel be disabling the outputs anyway? There's literally nothing there to provide display continuity. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Always wake the device to flush the GTT
On Wed, Aug 30, 2017 at 01:56:40PM +0100, Chris Wilson wrote: > Quoting Daniel Vetter (2017-08-30 13:23:56) > > Or just the need to add a pile more tests to pm_rpm? > > No. It's just your regular combinatorial explosion. The approach I would > take here would be to register a sysenter callback that attempted to do a > rpm suspend (i.e. so ~every ioctl would start from idle, and controlled > via the faultinjection framework) and then run the minimal test set that > exercises all ioctl paths, and then expand to all driver branches. > > First we need coverage feedback. What I meant to imply: As long as any display is on we will never rpm suspend. Mostly this is the case for CI machines. The new testcases I've had in mind would explicitly dpms off the display before running a set of gem testcases. We don't want to do that everywhere though, because a dpms on/off is very costly. I guess once we switch (eventually, hopefully) to a binary-at-time model for full igt CI we could amortize that over a pile of substests and do it almost everywhere. So I think adding a force rpm suspend won't help, because under normal CI runs we can't ever get there becaus of the active display. And that's why we're not hitting all these tons of problems anywhere else. This might also be good reason for more server chips, or at least fake server chips, where we disable the display entirely. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Always wake the device to flush the GTT
Quoting Chris Wilson (2017-08-30 13:56:40) > No. It's just your regular combinatorial explosion. The approach I would > take here would be to register a sysenter callback that attempted to do a > rpm suspend (i.e. so ~every ioctl would start from idle, and controlled > via the faultinjection framework) and then run the minimal test set that > exercises all ioctl paths, and then expand to all driver branches. The simpler option is to apply the faultinjection to the rpm_put. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Always wake the device to flush the GTT
Quoting Chris Wilson (2017-08-30 13:56:40) > Quoting Daniel Vetter (2017-08-30 13:23:56) > > Or just the need to add a pile more tests to pm_rpm? > > No. It's just your regular combinatorial explosion. The approach I would > take here would be to register a sysenter callback that attempted to do a > rpm suspend (i.e. so ~every ioctl would start from idle, and controlled > via the faultinjection framework) and then run the minimal test set that > exercises all ioctl paths, and then expand to all driver branches. We also need the corollary that without rpm it works. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Always wake the device to flush the GTT
Quoting Daniel Vetter (2017-08-30 13:23:56) > On Tue, Aug 29, 2017 at 03:59:36PM +0100, Chris Wilson wrote: > > Quoting Joonas Lahtinen (2017-08-29 15:54:06) > > > On Tue, 2017-08-29 at 11:33 +0100, Chris Wilson wrote: > > > > Since we hold the device wakeref when writing through the GTT (otherwise > > > > the writes would fail), we presumed that before the device sleeps those > > > > writes would naturally be flushed and that we wouldn't need our mmio > > > > read trick. However, that presumption seems false and a sleepy bxt seems > > > > to require us to always manually flush the GTT writes prior to direct > > > > access. > > > > > > > > Fixes: e2a2aa36a509 ("drm/i915: Check we have an wake device before > > > > flushing GTT writes") > > > > Signed-off-by: Chris Wilson > > > > Cc: Joonas Lahtinen > > > > > > Got any Bugzilla, Testcase, Tested-by? > > > > Original bugzilla hasn't been reopened, so I its looks like they were > > happy enough with the original patches that fixed the problem on my bxt. > > The testcase seems to be very system dependent, my suspicion is that it > > has to do with the wacky runtime pm exhibited by CI bxt. > > CI bxt doesn't have displays, which means we shut down a lot more when > it's running. Does this indicate a huge gem test gap where we should run > plenty of gem testcases with all the outputs shut down? This one is hard to tell since we are guessing at how the hw actually works. Strong PCI ordering it is not. If we take this example at face value, the key point of failure was rpm_get_if_in_use, so we could simply say that we need to ensure that all such branches are exercised, with varying amounts of stress since we are looking for a random hw delay. At the moment that boils down to the shrinker being avoiding unbinding anything whilst the device is idle, pushing us closer to oom (with kswapd hopefully riding to the rescue, and we all know how unreliable kswapd is). The wacky part of CI suspend seems to be that there's no reason for the device to wake at times, quite a few of the tests are just burning cycles without touching the hw and we still have the constant stream of suspend/resume. I'm very suspicious that we are waking up too often (and that it takes too long, about 28ms including the hpd of a headless machine). > Or just the need to add a pile more tests to pm_rpm? No. It's just your regular combinatorial explosion. The approach I would take here would be to register a sysenter callback that attempted to do a rpm suspend (i.e. so ~every ioctl would start from idle, and controlled via the faultinjection framework) and then run the minimal test set that exercises all ioctl paths, and then expand to all driver branches. First we need coverage feedback. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Always wake the device to flush the GTT
On Tue, Aug 29, 2017 at 03:59:36PM +0100, Chris Wilson wrote: > Quoting Joonas Lahtinen (2017-08-29 15:54:06) > > On Tue, 2017-08-29 at 11:33 +0100, Chris Wilson wrote: > > > Since we hold the device wakeref when writing through the GTT (otherwise > > > the writes would fail), we presumed that before the device sleeps those > > > writes would naturally be flushed and that we wouldn't need our mmio > > > read trick. However, that presumption seems false and a sleepy bxt seems > > > to require us to always manually flush the GTT writes prior to direct > > > access. > > > > > > Fixes: e2a2aa36a509 ("drm/i915: Check we have an wake device before > > > flushing GTT writes") > > > Signed-off-by: Chris Wilson > > > Cc: Joonas Lahtinen > > > > Got any Bugzilla, Testcase, Tested-by? > > Original bugzilla hasn't been reopened, so I its looks like they were > happy enough with the original patches that fixed the problem on my bxt. > The testcase seems to be very system dependent, my suspicion is that it > has to do with the wacky runtime pm exhibited by CI bxt. CI bxt doesn't have displays, which means we shut down a lot more when it's running. Does this indicate a huge gem test gap where we should run plenty of gem testcases with all the outputs shut down? Or just the need to add a pile more tests to pm_rpm? Would be good if testcase review is a part of review, and not just "code does what the commit message says" ... The latter should be the trivial-most part of review really. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Always wake the device to flush the GTT
Quoting Joonas Lahtinen (2017-08-29 15:54:06) > On Tue, 2017-08-29 at 11:33 +0100, Chris Wilson wrote: > > Since we hold the device wakeref when writing through the GTT (otherwise > > the writes would fail), we presumed that before the device sleeps those > > writes would naturally be flushed and that we wouldn't need our mmio > > read trick. However, that presumption seems false and a sleepy bxt seems > > to require us to always manually flush the GTT writes prior to direct > > access. > > > > Fixes: e2a2aa36a509 ("drm/i915: Check we have an wake device before > > flushing GTT writes") > > Signed-off-by: Chris Wilson > > Cc: Joonas Lahtinen > > Got any Bugzilla, Testcase, Tested-by? Original bugzilla hasn't been reopened, so I its looks like they were happy enough with the original patches that fixed the problem on my bxt. The testcase seems to be very system dependent, my suspicion is that it has to do with the wacky runtime pm exhibited by CI bxt. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Always wake the device to flush the GTT
On Tue, 2017-08-29 at 11:33 +0100, Chris Wilson wrote: > Since we hold the device wakeref when writing through the GTT (otherwise > the writes would fail), we presumed that before the device sleeps those > writes would naturally be flushed and that we wouldn't need our mmio > read trick. However, that presumption seems false and a sleepy bxt seems > to require us to always manually flush the GTT writes prior to direct > access. > > Fixes: e2a2aa36a509 ("drm/i915: Check we have an wake device before flushing > GTT writes") > Signed-off-by: Chris Wilson > Cc: Joonas Lahtinen Got any Bugzilla, Testcase, Tested-by? Does what the message describes. Reviewed-by: Joonas Lahtinen Regards, Joonas -- Joonas Lahtinen Open Source Technology Center Intel Corporation ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: Always wake the device to flush the GTT
Since we hold the device wakeref when writing through the GTT (otherwise the writes would fail), we presumed that before the device sleeps those writes would naturally be flushed and that we wouldn't need our mmio read trick. However, that presumption seems false and a sleepy bxt seems to require us to always manually flush the GTT writes prior to direct access. Fixes: e2a2aa36a509 ("drm/i915: Check we have an wake device before flushing GTT writes") Signed-off-by: Chris Wilson Cc: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_gem.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 43834dee4e8d..890fe2802973 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -695,12 +695,11 @@ flush_write_domain(struct drm_i915_gem_object *obj, unsigned int flush_domains) switch (obj->base.write_domain) { case I915_GEM_DOMAIN_GTT: if (INTEL_GEN(dev_priv) >= 6 && !HAS_LLC(dev_priv)) { - if (intel_runtime_pm_get_if_in_use(dev_priv)) { - spin_lock_irq(&dev_priv->uncore.lock); - POSTING_READ_FW(RING_ACTHD(dev_priv->engine[RCS]->mmio_base)); - spin_unlock_irq(&dev_priv->uncore.lock); - intel_runtime_pm_put(dev_priv); - } + intel_runtime_pm_get(dev_priv); + spin_lock_irq(&dev_priv->uncore.lock); + POSTING_READ_FW(RING_ACTHD(dev_priv->engine[RCS]->mmio_base)); + spin_unlock_irq(&dev_priv->uncore.lock); + intel_runtime_pm_put(dev_priv); } intel_fb_obj_flush(obj, -- 2.14.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: Always wake the device to flush the GTT
Experimental patch for bxt/gem_pwrite. --- drivers/gpu/drm/i915/i915_gem.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 37fbc64d9ffe..4f6af401320c 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -695,12 +695,11 @@ flush_write_domain(struct drm_i915_gem_object *obj, unsigned int flush_domains) switch (obj->base.write_domain) { case I915_GEM_DOMAIN_GTT: if (INTEL_GEN(dev_priv) >= 6 && !HAS_LLC(dev_priv)) { - if (intel_runtime_pm_get_if_in_use(dev_priv)) { - spin_lock_irq(&dev_priv->uncore.lock); - POSTING_READ_FW(RING_ACTHD(dev_priv->engine[RCS]->mmio_base)); - spin_unlock_irq(&dev_priv->uncore.lock); - intel_runtime_pm_put(dev_priv); - } + intel_runtime_pm_get(dev_priv); + spin_lock_irq(&dev_priv->uncore.lock); + POSTING_READ_FW(RING_ACTHD(dev_priv->engine[RCS]->mmio_base)); + spin_unlock_irq(&dev_priv->uncore.lock); + intel_runtime_pm_put(dev_priv); } intel_fb_obj_flush(obj, -- 2.14.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx