Re: [Intel-gfx] [PATCH 13/14] drm/i915/guc: Workaround the missing user interrupt after preemption

2017-10-20 Thread Daniele Ceraolo Spurio



On 20/10/17 10:22, Chris Wilson wrote:

Quoting Chris Wilson (2017-10-19 20:44:41)

Quoting Michał Winiarski (2017-10-19 19:36:18)

With GuC, we're scheduling tasklet on USER_INTERRUPT - since some work
has completed, perhaps we're able to submit some more work. We're doing
similar thing for preemption - after preemption has completed, it's time
to schedule the tasklet and submit more work (since the engine is now
idle). Unfortunately, we can hit the scenarios where the preemption is
done, but the interrupt is nowhere to be seen. To work around the
problem, let's use a delayed work that's kicking the tasklet if
preemption is done, and queueing itself otherwise.


I'm not buying it yet.


I think I know what it is. I think it is as simple as us turning off the
irq when there's no more requests being signaled on the engine.
-Chris


I can confirm this, I was just about to reply with the same findings ;)
By hackily adding USER_INTERRUPT to irq_keep_mask we've been able to run 
exec_whisper@fds-priority several time in a row with no problems.


Daniele
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 13/14] drm/i915/guc: Workaround the missing user interrupt after preemption

2017-10-20 Thread Chris Wilson
Quoting Chris Wilson (2017-10-19 20:44:41)
> Quoting Michał Winiarski (2017-10-19 19:36:18)
> > With GuC, we're scheduling tasklet on USER_INTERRUPT - since some work
> > has completed, perhaps we're able to submit some more work. We're doing
> > similar thing for preemption - after preemption has completed, it's time
> > to schedule the tasklet and submit more work (since the engine is now
> > idle). Unfortunately, we can hit the scenarios where the preemption is
> > done, but the interrupt is nowhere to be seen. To work around the
> > problem, let's use a delayed work that's kicking the tasklet if
> > preemption is done, and queueing itself otherwise.
> 
> I'm not buying it yet.

I think I know what it is. I think it is as simple as us turning off the
irq when there's no more requests being signaled on the engine.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 13/14] drm/i915/guc: Workaround the missing user interrupt after preemption

2017-10-20 Thread Chris Wilson
Quoting Michał Winiarski (2017-10-19 19:36:18)
> +#define GUC_LOST_IRQ_WORK_DELAY_MS 100
> +static void guc_lost_user_interrupt(struct work_struct *work)
> +{
> +   struct guc_preempt_work *preempt_work =
> +   container_of(to_delayed_work(work), typeof(*preempt_work),
> +lost_irq_work);
> +   struct intel_engine_cs *engine = preempt_work->engine;
> +   struct intel_guc *guc = >i915->guc;

Since you switched preempt_work to be stored within intel_guc,
you can use container_of to get to the guc and cut out the dance.
Here and inject_preempt_context.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 13/14] drm/i915/guc: Workaround the missing user interrupt after preemption

2017-10-19 Thread Chris Wilson
Quoting Michał Winiarski (2017-10-19 19:36:18)
> With GuC, we're scheduling tasklet on USER_INTERRUPT - since some work
> has completed, perhaps we're able to submit some more work. We're doing
> similar thing for preemption - after preemption has completed, it's time
> to schedule the tasklet and submit more work (since the engine is now
> idle). Unfortunately, we can hit the scenarios where the preemption is
> done, but the interrupt is nowhere to be seen. To work around the
> problem, let's use a delayed work that's kicking the tasklet if
> preemption is done, and queueing itself otherwise.

I'm not buying it yet. Missing USER_INTERRUPT would not be guc specific
(surely?) and so we would be getting similar missed-breadcrumb warnings
on skl+. We dedicate a lot of CI towards detecting those...

The first alternative that springs to mind is bad ordering between ggtt
write and MI_USER_INTERRUPT, but that too is common to execlists, and
also results in missed-breadcrumb warnings (which are notable by their
absence). Though having said that, the HWSP ordering issue with
intel_iommu is worrying me that we have similar issues with breadcrumbs
+ intel_iommu. Is iommu a factor?

In short, if this is a problem here, it should be a problem everywhere.
Right?
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 13/14] drm/i915/guc: Workaround the missing user interrupt after preemption

2017-10-19 Thread Michał Winiarski
With GuC, we're scheduling tasklet on USER_INTERRUPT - since some work
has completed, perhaps we're able to submit some more work. We're doing
similar thing for preemption - after preemption has completed, it's time
to schedule the tasklet and submit more work (since the engine is now
idle). Unfortunately, we can hit the scenarios where the preemption is
done, but the interrupt is nowhere to be seen. To work around the
problem, let's use a delayed work that's kicking the tasklet if
preemption is done, and queueing itself otherwise.

Testcase: igt/gem_exec_whisper/*-priority
Signed-off-by: Michał Winiarski 
Cc: Chris Wilson 
Cc: Jeff McGee 
Cc: John Harrison 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 36 +-
 drivers/gpu/drm/i915/intel_guc.h   |  1 +
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c 
b/drivers/gpu/drm/i915/i915_guc_submission.c
index a11ed4deff4b..dbb03b5481d2 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -574,6 +574,26 @@ static void flush_ggtt_writes(struct i915_vma *vma)
POSTING_READ_FW(GUC_STATUS);
 }
 
+#define GUC_LOST_IRQ_WORK_DELAY_MS 100
+static void guc_lost_user_interrupt(struct work_struct *work)
+{
+   struct guc_preempt_work *preempt_work =
+   container_of(to_delayed_work(work), typeof(*preempt_work),
+lost_irq_work);
+   struct intel_engine_cs *engine = preempt_work->engine;
+   struct intel_guc *guc = >i915->guc;
+   struct guc_shared_ctx_data *data = guc->shared_data_vaddr;
+   struct guc_ctx_report *report = 
>preempt_ctx_report[engine->guc_id];
+
+   if (report->report_return_status == INTEL_GUC_REPORT_STATUS_COMPLETE)
+   tasklet_schedule(>execlists.irq_tasklet);
+   else
+   queue_delayed_work(guc->preempt_wq,
+  _work->lost_irq_work,
+  
msecs_to_jiffies(GUC_LOST_IRQ_WORK_DELAY_MS));
+
+}
+
 #define GUC_PREEMPT_FINISHED 0x1
 #define GUC_PREEMPT_BREADCRUMB_DWORDS 0x8
 static void inject_preempt_context(struct work_struct *work)
@@ -629,7 +649,13 @@ static void inject_preempt_context(struct work_struct 
*work)
if (WARN_ON(intel_guc_send(guc, data, ARRAY_SIZE(data {
WRITE_ONCE(engine->execlists.preempt, false);
tasklet_schedule(>execlists.irq_tasklet);
+
+   return;
}
+
+   queue_delayed_work(engine->i915->guc.preempt_wq,
+  _work->lost_irq_work,
+  msecs_to_jiffies(GUC_LOST_IRQ_WORK_DELAY_MS));
 }
 
 /*
@@ -647,6 +673,10 @@ static void wait_for_guc_preempt_report(struct 
intel_engine_cs *engine)
struct guc_shared_ctx_data *data = guc->shared_data_vaddr;
struct guc_ctx_report *report = 
>preempt_ctx_report[engine->guc_id];
 
+   /* If we landed here, it means that we didn't lose an interrupt, and
+* we can safely cancel the worker */
+   cancel_delayed_work(>preempt_work[engine->id].lost_irq_work);
+
WARN_ON(wait_for_atomic(report->report_return_status ==
INTEL_GUC_REPORT_STATUS_COMPLETE,
GUC_PREEMPT_POSTPROCESS_DELAY_MS));
@@ -1229,6 +1259,8 @@ static int guc_preempt_work_create(struct intel_guc *guc)
for_each_engine(engine, dev_priv, id) {
guc->preempt_work[id].engine = engine;
INIT_WORK(>preempt_work[id].work, inject_preempt_context);
+   INIT_DELAYED_WORK(>preempt_work[id].lost_irq_work,
+ guc_lost_user_interrupt);
}
 
return 0;
@@ -1240,8 +1272,10 @@ static void guc_preempt_work_destroy(struct intel_guc 
*guc)
struct intel_engine_cs *engine;
enum intel_engine_id id;
 
-   for_each_engine(engine, dev_priv, id)
+   for_each_engine(engine, dev_priv, id) {
+   cancel_delayed_work_sync(>preempt_work[id].lost_irq_work);
cancel_work_sync(>preempt_work[id].work);
+   }
 
destroy_workqueue(guc->preempt_wq);
 }
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 7273a6be7dc1..0c9338b5c4b8 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -49,6 +49,7 @@ enum i915_guc_client_id {
 struct guc_preempt_work {
struct intel_engine_cs *engine;
struct work_struct work;
+   struct delayed_work lost_irq_work;
 };
 
 struct intel_guc {
-- 
2.13.6

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx