Re: [PATCH v2] drm/i915/guc: Enable PXP GuC autoteardown flow

2024-09-06 Thread Teres Alexis, Alan Previn
LGTM:

Reviewed-by: Alan Previn 

On Fri, 2024-09-06 at 10:40 -0700, john.c.harri...@intel.com wrote:
> From: Juston Li 
> 
> This feature flag enables GuC autoteardown which allows for a grace
> period before session teardown.
> 
> Also add a HAS_PXP() helper to share with the other place that wants
> to check.
> 
> Signed-off-by: Juston Li 
> Signed-off-by: John Harrison 
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c  | 8 
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 1 +
>  drivers/gpu/drm/i915/i915_drv.h | 3 +++
>  drivers/gpu/drm/i915/pxp/intel_pxp.c    | 2 +-
>  4 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index 097fc6bd1285e..5949ff0b0161f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -239,8 +239,16 @@ static u32 guc_ctl_debug_flags(struct intel_guc *guc)
>  
>  static u32 guc_ctl_feature_flags(struct intel_guc *guc)
>  {
> +   struct intel_gt *gt = guc_to_gt(guc);
> u32 flags = 0;
>  
> +   /*
> +    * Enable PXP GuC autoteardown flow.
> +    * NB: MTL does things differently.
> +    */
> +   if (HAS_PXP(gt->i915) && !IS_METEORLAKE(gt->i915))
> +   flags |= GUC_CTL_ENABLE_GUC_PXP_CTL;
> +
> if (!intel_guc_submission_is_used(guc))
> flags |= GUC_CTL_DISABLE_SCHEDULER;
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index 263c9c3f6a034..4ce6e2332a63f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -105,6 +105,7 @@
>  #define   GUC_WA_ENABLE_TSC_CHECK_ON_RC6   BIT(22)
>  
>  #define GUC_CTL_FEATURE2
> +#define   GUC_CTL_ENABLE_GUC_PXP_CTL   BIT(1)
>  #define   GUC_CTL_ENABLE_SLPC  BIT(2)
>  #define   GUC_CTL_DISABLE_SCHEDULERBIT(14)
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 39f6614a0a99a..faeba9732422f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -693,6 +693,9 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
>  
>  #define HAS_RPS(i915)  (INTEL_INFO(i915)->has_rps)
>  
> +#define HAS_PXP(i915) \
> +   (IS_ENABLED(CONFIG_DRM_I915_PXP) && INTEL_INFO(i915)->has_pxp)
> +
>  #define HAS_HECI_PXP(i915) \
> (INTEL_INFO(i915)->has_heci_pxp)
>  
> diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c 
> b/drivers/gpu/drm/i915/pxp/intel_pxp.c
> index 75278e78ca90e..5e0bf776aac0f 100644
> --- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
> @@ -170,7 +170,7 @@ static struct intel_gt 
> *find_gt_for_required_teelink(struct drm_i915_private *i9
>  
>  static struct intel_gt *find_gt_for_required_protected_content(struct 
> drm_i915_private *i915)
>  {
> -   if (!IS_ENABLED(CONFIG_DRM_I915_PXP) || !INTEL_INFO(i915)->has_pxp)
> +   if (!HAS_PXP(i915))
> return NULL;
>  
> /*



Re: [PATCH] drm/i915/guc: Correct capture of EIR register on hang

2024-02-27 Thread Teres Alexis, Alan Previn
On Fri, 2024-02-23 at 12:32 -0800, john.c.harri...@intel.com wrote:
> From: John Harrison 
alan:snip

> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> @@ -51,6 +51,7 @@
> { RING_ESR(0),  0,  0, "ESR" }, \
> { RING_DMA_FADD(0), 0,  0, "RING_DMA_FADD_LDW" },
> \
> { RING_DMA_FADD_UDW(0), 0,  0, "RING_DMA_FADD_UDW" },
> \
> +   { RING_EIR(0),  0,  0, "EIR" }, \
> { RING_IPEIR(0),    0,  0, "IPEIR" }, \
> { RING_IPEHR(0),    0,  0, "IPEHR" }, \
> { RING_INSTPS(0),   0,  0, "INSTPS" }, \
> @@ -80,9 +81,6 @@
> { GEN8_RING_PDP_LDW(0, 3),  0,  0, "PDP3_LDW" }, \
> { GEN8_RING_PDP_UDW(0, 3),  0,  0, "PDP3_UDW" }
>  
> -#define COMMON_BASE_HAS_EU \
> -   { EIR,  0,  0, "EIR" }
> -
alan:snip

alan: Thanks for catching this one.
Reviewed-by: Alan Previn 


Re: ✗ Fi.CI.IGT: failure for Resolve suspend-resume racing with GuC destroy-context-worker (rev13)

2024-01-04 Thread Teres Alexis, Alan Previn
On Thu, 2024-01-04 at 10:57 +, Patchwork wrote:
> Patch Details
> Series: Resolve suspend-resume racing with GuC destroy-context-worker (rev13)
> URL:https://patchwork.freedesktop.org/series/121916/
> State:  failure
> Details:
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121916v13/index.html
> CI Bug Log - changes from CI_DRM_14076_full -> Patchwork_121916v13_full
> Summary
> 
> FAILURE
alan:snip


> Here are the unknown changes that may have been introduced in 
> Patchwork_121916v13_full:
> 
> IGT changes
> Possible regressions
> 
>   *   igt@gem_eio@wait-wedge-immediate:
>  *   shard-mtlp: 
> PASS
>  -> 
> ABORT
> 
alan: from the code and dmesg, this is unrelated to guc context destruction 
flows.
Its reading an MCR register that times out. Additionally, i believe this error 
is occuring during post-reset-init flows.
So its definitely not doing any context destruction at this point (as reset 
would have happenned sooner).
> Known issues
> 



Re: [PATCH v9 0/2] Resolve suspend-resume racing with GuC destroy-context-worker

2024-01-02 Thread Teres Alexis, Alan Previn
On Wed, 2023-12-27 at 20:55 -0800, Teres Alexis, Alan Previn wrote:
> This series is the result of debugging issues root caused to
> races between the GuC's destroyed_worker_func being triggered
> vs repeating suspend-resume cycles with concurrent delayed
> fence signals for engine-freeing.
> 
alan:snip.

alan: I did not receive the CI-premerge email where the following was reported:
  IGT changes
  Possible regressions
  igt@i915_selftest@live@gt_pm:
  shard-rkl: PASS -> DMESG-FAIL
After going thru the error in dmesg and codes, i am confident this failure not
related to the series. This selftest calls rdmsrl functions (that doen't do
any requests / guc submissions) but gets a reply power of zero (the bug 
reported).
So this is unrelated.


Hi @"Vivi, Rodrigo" , just an FYI note that after the 
last
requested rebase, BAT passed twice in a row now so i am confident failures on 
rev7
and prior was unrelated and that this series is ready for merging. Thanks again
for all your help and patiences - this was a long one  :)


[PATCH v9 2/2] drm/i915/guc: Close deregister-context race against CT-loss

2023-12-29 Thread Alan Previn
If we are at the end of suspend or very early in resume
its possible an async fence signal (via rcu_call) is triggered
to free_engines which could lead us to the execution of
the context destruction worker (after a prior worker flush).

Thus, when suspending, insert rcu_barriers at the start
of i915_gem_suspend (part of driver's suspend prepare) and
again in i915_gem_suspend_late so that all such cases have
completed and context destruction list isn't missing anything.

In destroyed_worker_func, close the race against CT-loss
by checking that CT is enabled before calling into
deregister_destroyed_contexts.

Based on testing, guc_lrc_desc_unpin may still race and fail
as we traverse the GuC's context-destroy list because the
CT could be disabled right before calling GuC's CT send function.

We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring workloads that render
something onscreen is continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).

In such a case, we need to unroll the entire process because
guc-lrc-unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. Without the unroll, the taken wakeref is leaked and will
cascade into a kernel hang later at the tail end of suspend in
this function:

   intel_wakeref_wait_for_idle(>->wakeref)
   (called by) - intel_gt_pm_wait_for_idle
   (called by) - wait_for_suspend

Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_-
contexts if guc_lrc_desc_unpin fails due to CT send falure.
When unrolling, keep the context in the GuC's destroy-list so
it can get picked up on the next destroy worker invocation
(if suspend aborted) or get fully purged as part of a GuC
sanitization (end of suspend) or a reset flow.

Signed-off-by: Alan Previn 
Signed-off-by: Anshuman Gupta 
Tested-by: Mousumi Jana 
Acked-by: Daniele Ceraolo Spurio 
Reviewed-by: Rodrigo Vivi 
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 73 +--
 2 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 0d812f4d787d..3b27218aabe2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -28,6 +28,13 @@ void i915_gem_suspend(struct drm_i915_private *i915)
GEM_TRACE("%s\n", dev_name(i915->drm.dev));
 
intel_wakeref_auto(&i915->runtime_pm.userfault_wakeref, 0);
+   /*
+* On rare occasions, we've observed the fence completion triggers
+* free_engines asynchronously via rcu_call. Ensure those are done.
+* This path is only called on suspend, so it's an acceptable cost.
+*/
+   rcu_barrier();
+
flush_workqueue(i915->wq);
 
/*
@@ -160,6 +167,9 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
 * machine in an unusable condition.
 */
 
+   /* Like i915_gem_suspend, flush tasks staged from fence triggers */
+   rcu_barrier();
+
for_each_gt(gt, i915, i)
intel_gt_suspend_late(gt);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9c64ae0766cc..cae637fc3ead 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -236,6 +236,13 @@ set_context_destroyed(struct intel_context *ce)
ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline void
+clr_context_destroyed(struct intel_context *ce)
+{
+   lockdep_assert_held(&ce->guc_state.lock);
+   ce->guc_state.sched_state &= ~SCHED_STATE_DESTROYED;
+}
+
 static inline bool context_pending_disable(struct intel_context *ce)
 {
return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE;
@@ -613,6 +620,8 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
 u32 g2h_len_dw,
 bool loop)
 {
+   int ret;
+
/*
 * We always loop when a send requires a reply (i.e. g2h_len_dw > 0),
 * so we don't handle the case where we don't get a reply because we
@@ -623,7 +632,11 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
if (g2h_len_dw)
atomic_inc(&guc->outstanding_submission_g2h);
 
-   return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   ret = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   if (ret)
+   atomic_dec(&guc->outstanding_submission_g2h);
+
+   return ret;
 }
 
 int intel_gu

[PATCH v9 2/2] drm/i915/guc: Close deregister-context race against CT-loss

2023-12-27 Thread Alan Previn
If we are at the end of suspend or very early in resume
its possible an async fence signal (via rcu_call) is triggered
to free_engines which could lead us to the execution of
the context destruction worker (after a prior worker flush).

Thus, when suspending, insert rcu_barriers at the start
of i915_gem_suspend (part of driver's suspend prepare) and
again in i915_gem_suspend_late so that all such cases have
completed and context destruction list isn't missing anything.

In destroyed_worker_func, close the race against CT-loss
by checking that CT is enabled before calling into
deregister_destroyed_contexts.

Based on testing, guc_lrc_desc_unpin may still race and fail
as we traverse the GuC's context-destroy list because the
CT could be disabled right before calling GuC's CT send function.

We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring workloads that render
something onscreen is continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).

In such a case, we need to unroll the entire process because
guc-lrc-unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. Without the unroll, the taken wakeref is leaked and will
cascade into a kernel hang later at the tail end of suspend in
this function:

   intel_wakeref_wait_for_idle(>->wakeref)
   (called by) - intel_gt_pm_wait_for_idle
   (called by) - wait_for_suspend

Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_-
contexts if guc_lrc_desc_unpin fails due to CT send falure.
When unrolling, keep the context in the GuC's destroy-list so
it can get picked up on the next destroy worker invocation
(if suspend aborted) or get fully purged as part of a GuC
sanitization (end of suspend) or a reset flow.

Signed-off-by: Alan Previn 
Signed-off-by: Anshuman Gupta 
Tested-by: Mousumi Jana 
Acked-by: Daniele Ceraolo Spurio 
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 73 +--
 2 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 0d812f4d787d..3b27218aabe2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -28,6 +28,13 @@ void i915_gem_suspend(struct drm_i915_private *i915)
GEM_TRACE("%s\n", dev_name(i915->drm.dev));
 
intel_wakeref_auto(&i915->runtime_pm.userfault_wakeref, 0);
+   /*
+* On rare occasions, we've observed the fence completion triggers
+* free_engines asynchronously via rcu_call. Ensure those are done.
+* This path is only called on suspend, so it's an acceptable cost.
+*/
+   rcu_barrier();
+
flush_workqueue(i915->wq);
 
/*
@@ -160,6 +167,9 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
 * machine in an unusable condition.
 */
 
+   /* Like i915_gem_suspend, flush tasks staged from fence triggers */
+   rcu_barrier();
+
for_each_gt(gt, i915, i)
intel_gt_suspend_late(gt);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9c64ae0766cc..cae637fc3ead 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -236,6 +236,13 @@ set_context_destroyed(struct intel_context *ce)
ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline void
+clr_context_destroyed(struct intel_context *ce)
+{
+   lockdep_assert_held(&ce->guc_state.lock);
+   ce->guc_state.sched_state &= ~SCHED_STATE_DESTROYED;
+}
+
 static inline bool context_pending_disable(struct intel_context *ce)
 {
return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE;
@@ -613,6 +620,8 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
 u32 g2h_len_dw,
 bool loop)
 {
+   int ret;
+
/*
 * We always loop when a send requires a reply (i.e. g2h_len_dw > 0),
 * so we don't handle the case where we don't get a reply because we
@@ -623,7 +632,11 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
if (g2h_len_dw)
atomic_inc(&guc->outstanding_submission_g2h);
 
-   return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   ret = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   if (ret)
+   atomic_dec(&guc->outstanding_submission_g2h);
+
+   return ret;
 }
 
 int intel_guc_wa

[PATCH v9 1/2] drm/i915/guc: Flush context destruction worker at suspend

2023-12-27 Thread Alan Previn
When suspending, flush the context-guc-id
deregistration worker at the final stages of
intel_gt_suspend_late when we finally call gt_sanitize
that eventually leads down to __uc_sanitize so that
the deregistration worker doesn't fire off later as
we reset the GuC microcontroller.

Signed-off-by: Alan Previn 
Reviewed-by: Rodrigo Vivi 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a259f1118c5a..9c64ae0766cc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1613,6 +1613,11 @@ static void guc_flush_submissions(struct intel_guc *guc)
spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
+void intel_guc_submission_flush_work(struct intel_guc *guc)
+{
+   flush_work(&guc->submission_state.destroyed_worker);
+}
+
 static void guc_flush_destroyed_contexts(struct intel_guc *guc);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index c57b29cdb1a6..b6df75622d3b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -38,6 +38,8 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
   bool interruptible,
   long timeout);
 
+void intel_guc_submission_flush_work(struct intel_guc *guc);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
return guc->submission_supported;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 3872d309ed31..b8b09b1bee3e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -690,6 +690,8 @@ void intel_uc_suspend(struct intel_uc *uc)
return;
}
 
+   intel_guc_submission_flush_work(guc);
+
with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
err = intel_guc_suspend(guc);
if (err)
-- 
2.39.0



[PATCH v9 0/2] Resolve suspend-resume racing with GuC destroy-context-worker

2023-12-27 Thread Alan Previn
This series is the result of debugging issues root caused to
races between the GuC's destroyed_worker_func being triggered
vs repeating suspend-resume cycles with concurrent delayed
fence signals for engine-freeing.

The reproduction steps require that an app is launched right
before the start of the suspend cycle where it creates a
new gem context and submits a tiny workload that would
complete in the middle of the suspend cycle. However this
app uses dma-buffer sharing or dma-fence with non-GPU
objects or signals that eventually triggers a FENCE_FREE
via__i915_sw_fence_notify that connects to engines_notify ->
free_engines_rcu -> intel_context_put ->
kref_put(&ce->ref..) that queues the worker after the GuCs
CTB has been disabled (i.e. after i915-gem's suspend-late).

This sequence is a corner-case and required repeating this
app->suspend->resume cycle ~1500 times across 4 identical
systems to see it once. That said, based on above callstack,
it is clear that merely flushing the context destruction worker,
which is obviously missing and needed, isn't sufficient.

Because of that, this series adds additional patches besides
the obvious (Patch #1) flushing of the worker during the
suspend flows. It also includes (Patch #2) closing a race
between sending the context-deregistration H2G vs the CTB
getting disabled in the midst of it (by detecing the failure
and unrolling the guc-lrc-unpin flow) and adding an additional
rcu_barrier in the gem-suspend flow to purge outstanding
rcu defered tasks that may include context destruction.

This patch was tested and confirmed to be reliably working
after running ~1500 suspend resume cycles on 4 concurrent
machines.

Changes from prior revs:
   v8: - Rebase again to resolve conflicts (Rodrigo).
   v7: - Rebase on latest drm-tip.
   v6: - Dont hold the spinlock while calling deregister_context
 which can take a longer time. (Rodrigo)
   - Fix / improve of comments. (Rodrigo)
   - Release the ce->guc_state.lock before calling
 deregister_context and retake it if we fail. (John/Daniele).
   v5: - Remove Patch #3 which doesnt solve this exact bug
 but can be a separate patch(Tvrtko).
   v4: - In Patch #2, change the position of the calls into
 rcu_barrier based on latest testing data. (Alan/Anshuman).
   - In Patch #3, fix the timeout value selection for the
 final gt-pm idle-wait that was incorrectly using a 'ns'
 #define as a milisec timeout.
   v3: - In Patch #3, when deregister_context fails, instead
 of calling intel_gt_pm_put(that might sleep), call
 __intel_wakeref_put (without ASYNC flag) (Rodrigo/Anshuman).
   - In wait_for_suspend add an rcu_barrier before we
 proceed to wait for idle. (Anshuman)
   v2: - Patch #2 Restructure code in guc_lrc_desc_unpin so
 it's more readible to differentiate (1)direct guc-id
 cleanup ..vs (2) sending the H2G ctx-destroy action ..
 vs (3) the unrolling steps if the H2G fails.
   - Patch #2 Add a check to close the race sooner by checking
 for intel_guc_is_ready from destroyed_worker_func.
   - Patch #2 When guc_submission_send_busy_loop gets a
 failure from intel_guc_send_busy_loop, we need to undo
 i.e. decrement the outstanding_submission_g2h.
   - Patch #3 In wait_for_suspend, fix checking of return from
 intel_gt_pm_wait_timeout_for_idle to now use -ETIMEDOUT
 and add documentation for intel_wakeref_wait_for_idle.
 (Rodrigo).

Alan Previn (2):
  drm/i915/guc: Flush context destruction worker at suspend
  drm/i915/guc: Close deregister-context race against CT-loss

 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 78 +--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  2 +
 4 files changed, 87 insertions(+), 5 deletions(-)


base-commit: 56c351e9f8c3ca9389a924ff2a46b34b50fb37dd
-- 
2.39.0



Re: [PATCH v8 2/2] drm/i915/guc: Close deregister-context race against CT-loss

2023-12-27 Thread Teres Alexis, Alan Previn
On Tue, 2023-12-26 at 10:11 -0500, Vivi, Rodrigo wrote:
> On Wed, Dec 20, 2023 at 11:08:59PM +, Teres Alexis, Alan Previn wrote:
> > On Wed, 2023-12-13 at 16:23 -0500, Vivi, Rodrigo wrote:
alan:snip

> > 
> > 
> > alan: Thanks Rodrigo for the RB last week, just quick update:
> > 
> > I've cant reproduce the BAT failures that seem to be intermittent
> > on platform and test - however, a noticable number of failures
> > do keep occuring on i915_selftest @live @requests where the
> > last test leaked a wakeref and the failing test hangs waiting
> > for gt to idle before starting its test.
> > 
> > i have to debug this further although from code inspection
> > is unrelated to the patches in this series.
> > Hopefully its a different issue.
> 
> Yeap, likely not related. Anyway, I'm sorry for not merging
> this sooner. Could you please send a rebased version? This
> on is not applying cleanly anymore.

Hi Rodrigo, i will rebase it as soon as i do a bit more testing..
I realize i was using a slighlty older guc and with newer guc am
seeing all kinds of failures but trending as not an issue with the series.



Re: [PATCH v8 2/2] drm/i915/guc: Close deregister-context race against CT-loss

2023-12-20 Thread Teres Alexis, Alan Previn
On Wed, 2023-12-13 at 16:23 -0500, Vivi, Rodrigo wrote:
> On Tue, Dec 12, 2023 at 08:57:16AM -0800, Alan Previn wrote:
> > If we are at the end of suspend or very early in resume
> > its possible an async fence signal (via rcu_call) is triggered
> > to free_engines which could lead us to the execution of
> > the context destruction worker (after a prior worker flush).
alan:snip
> 
> > Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_-
> > contexts if guc_lrc_desc_unpin fails due to CT send falure.
> > When unrolling, keep the context in the GuC's destroy-list so
> > it can get picked up on the next destroy worker invocation
> > (if suspend aborted) or get fully purged as part of a GuC
> > sanitization (end of suspend) or a reset flow.
> > 
> > Signed-off-by: Alan Previn 
> > Signed-off-by: Anshuman Gupta 
> > Tested-by: Mousumi Jana 
> > Acked-by: Daniele Ceraolo Spurio 
> 
> Thanks for all the explanations, patience and great work!
> 
> Reviewed-by: Rodrigo Vivi 

alan: Thanks Rodrigo for the RB last week, just quick update:

I've cant reproduce the BAT failures that seem to be intermittent
on platform and test - however, a noticable number of failures
do keep occuring on i915_selftest @live @requests where the
last test leaked a wakeref and the failing test hangs waiting
for gt to idle before starting its test.

i have to debug this further although from code inspection
is unrelated to the patches in this series.
Hopefully its a different issue.


[PATCH v8 2/2] drm/i915/guc: Close deregister-context race against CT-loss

2023-12-12 Thread Alan Previn
If we are at the end of suspend or very early in resume
its possible an async fence signal (via rcu_call) is triggered
to free_engines which could lead us to the execution of
the context destruction worker (after a prior worker flush).

Thus, when suspending, insert rcu_barriers at the start
of i915_gem_suspend (part of driver's suspend prepare) and
again in i915_gem_suspend_late so that all such cases have
completed and context destruction list isn't missing anything.

In destroyed_worker_func, close the race against CT-loss
by checking that CT is enabled before calling into
deregister_destroyed_contexts.

Based on testing, guc_lrc_desc_unpin may still race and fail
as we traverse the GuC's context-destroy list because the
CT could be disabled right before calling GuC's CT send function.

We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring workloads that render
something onscreen is continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).

In such a case, we need to unroll the entire process because
guc-lrc-unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. Without the unroll, the taken wakeref is leaked and will
cascade into a kernel hang later at the tail end of suspend in
this function:

   intel_wakeref_wait_for_idle(>->wakeref)
   (called by) - intel_gt_pm_wait_for_idle
   (called by) - wait_for_suspend

Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_-
contexts if guc_lrc_desc_unpin fails due to CT send falure.
When unrolling, keep the context in the GuC's destroy-list so
it can get picked up on the next destroy worker invocation
(if suspend aborted) or get fully purged as part of a GuC
sanitization (end of suspend) or a reset flow.

Signed-off-by: Alan Previn 
Signed-off-by: Anshuman Gupta 
Tested-by: Mousumi Jana 
Acked-by: Daniele Ceraolo Spurio 
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 73 +--
 2 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 0d812f4d787d..3b27218aabe2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -28,6 +28,13 @@ void i915_gem_suspend(struct drm_i915_private *i915)
GEM_TRACE("%s\n", dev_name(i915->drm.dev));
 
intel_wakeref_auto(&i915->runtime_pm.userfault_wakeref, 0);
+   /*
+* On rare occasions, we've observed the fence completion triggers
+* free_engines asynchronously via rcu_call. Ensure those are done.
+* This path is only called on suspend, so it's an acceptable cost.
+*/
+   rcu_barrier();
+
flush_workqueue(i915->wq);
 
/*
@@ -160,6 +167,9 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
 * machine in an unusable condition.
 */
 
+   /* Like i915_gem_suspend, flush tasks staged from fence triggers */
+   rcu_barrier();
+
for_each_gt(gt, i915, i)
intel_gt_suspend_late(gt);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9c64ae0766cc..cae637fc3ead 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -236,6 +236,13 @@ set_context_destroyed(struct intel_context *ce)
ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline void
+clr_context_destroyed(struct intel_context *ce)
+{
+   lockdep_assert_held(&ce->guc_state.lock);
+   ce->guc_state.sched_state &= ~SCHED_STATE_DESTROYED;
+}
+
 static inline bool context_pending_disable(struct intel_context *ce)
 {
return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE;
@@ -613,6 +620,8 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
 u32 g2h_len_dw,
 bool loop)
 {
+   int ret;
+
/*
 * We always loop when a send requires a reply (i.e. g2h_len_dw > 0),
 * so we don't handle the case where we don't get a reply because we
@@ -623,7 +632,11 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
if (g2h_len_dw)
atomic_inc(&guc->outstanding_submission_g2h);
 
-   return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   ret = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   if (ret)
+   atomic_dec(&guc->outstanding_submission_g2h);
+
+   return ret;
 }
 
 int intel_guc_wa

[PATCH v8 1/2] drm/i915/guc: Flush context destruction worker at suspend

2023-12-12 Thread Alan Previn
When suspending, flush the context-guc-id
deregistration worker at the final stages of
intel_gt_suspend_late when we finally call gt_sanitize
that eventually leads down to __uc_sanitize so that
the deregistration worker doesn't fire off later as
we reset the GuC microcontroller.

Signed-off-by: Alan Previn 
Reviewed-by: Rodrigo Vivi 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a259f1118c5a..9c64ae0766cc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1613,6 +1613,11 @@ static void guc_flush_submissions(struct intel_guc *guc)
spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
+void intel_guc_submission_flush_work(struct intel_guc *guc)
+{
+   flush_work(&guc->submission_state.destroyed_worker);
+}
+
 static void guc_flush_destroyed_contexts(struct intel_guc *guc);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index c57b29cdb1a6..b6df75622d3b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -38,6 +38,8 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
   bool interruptible,
   long timeout);
 
+void intel_guc_submission_flush_work(struct intel_guc *guc);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
return guc->submission_supported;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 3872d309ed31..b8b09b1bee3e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -690,6 +690,8 @@ void intel_uc_suspend(struct intel_uc *uc)
return;
}
 
+   intel_guc_submission_flush_work(guc);
+
with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
err = intel_guc_suspend(guc);
if (err)
-- 
2.39.0



[PATCH v8 0/2] Resolve suspend-resume racing with GuC destroy-context-worker

2023-12-12 Thread Alan Previn
This series is the result of debugging issues root caused to
races between the GuC's destroyed_worker_func being triggered
vs repeating suspend-resume cycles with concurrent delayed
fence signals for engine-freeing.

The reproduction steps require that an app is launched right
before the start of the suspend cycle where it creates a
new gem context and submits a tiny workload that would
complete in the middle of the suspend cycle. However this
app uses dma-buffer sharing or dma-fence with non-GPU
objects or signals that eventually triggers a FENCE_FREE
via__i915_sw_fence_notify that connects to engines_notify ->
free_engines_rcu -> intel_context_put ->
kref_put(&ce->ref..) that queues the worker after the GuCs
CTB has been disabled (i.e. after i915-gem's suspend-late).

This sequence is a corner-case and required repeating this
app->suspend->resume cycle ~1500 times across 4 identical
systems to see it once. That said, based on above callstack,
it is clear that merely flushing the context destruction worker,
which is obviously missing and needed, isn't sufficient.

Because of that, this series adds additional patches besides
the obvious (Patch #1) flushing of the worker during the
suspend flows. It also includes (Patch #2) closing a race
between sending the context-deregistration H2G vs the CTB
getting disabled in the midst of it (by detecing the failure
and unrolling the guc-lrc-unpin flow) and adding an additional
rcu_barrier in the gem-suspend flow to purge outstanding
rcu defered tasks that may include context destruction.

This patch was tested and confirmed to be reliably working
after running ~1500 suspend resume cycles on 4 concurrent
machines.

Changes from prior revs:
   v7: - Rebase on latest drm-tip.
   v6: - Dont hold the spinlock while calling deregister_context
 which can take a longer time. (Rodrigo)
   - Fix / improve of comments. (Rodrigo)
   - Release the ce->guc_state.lock before calling
 deregister_context and retake it if we fail. (John/Daniele).
   v5: - Remove Patch #3 which doesnt solve this exact bug
 but can be a separate patch(Tvrtko).
   v4: - In Patch #2, change the position of the calls into
 rcu_barrier based on latest testing data. (Alan/Anshuman).
   - In Patch #3, fix the timeout value selection for the
 final gt-pm idle-wait that was incorrectly using a 'ns'
 #define as a milisec timeout.
   v3: - In Patch #3, when deregister_context fails, instead
 of calling intel_gt_pm_put(that might sleep), call
 __intel_wakeref_put (without ASYNC flag) (Rodrigo/Anshuman).
   - In wait_for_suspend add an rcu_barrier before we
 proceed to wait for idle. (Anshuman)
   v2: - Patch #2 Restructure code in guc_lrc_desc_unpin so
 it's more readible to differentiate (1)direct guc-id
 cleanup ..vs (2) sending the H2G ctx-destroy action ..
 vs (3) the unrolling steps if the H2G fails.
   - Patch #2 Add a check to close the race sooner by checking
 for intel_guc_is_ready from destroyed_worker_func.
   - Patch #2 When guc_submission_send_busy_loop gets a
 failure from intel_guc_send_busy_loop, we need to undo
 i.e. decrement the outstanding_submission_g2h.
   - Patch #3 In wait_for_suspend, fix checking of return from
 intel_gt_pm_wait_timeout_for_idle to now use -ETIMEDOUT
 and add documentation for intel_wakeref_wait_for_idle.
 (Rodrigo).

Alan Previn (2):
  drm/i915/guc: Flush context destruction worker at suspend
  drm/i915/guc: Close deregister-context race against CT-loss

 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 78 +--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  2 +
 4 files changed, 87 insertions(+), 5 deletions(-)


base-commit: b4182ec1538e8cebf630083ec4296bed0061d594
-- 
2.39.0



Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for Resolve suspend-resume racing with GuC destroy-context-worker (rev8)

2023-11-30 Thread Teres Alexis, Alan Previn
On Fri, 2023-12-01 at 02:20 +, Patchwork wrote:
> Patch Details
> Series: Resolve suspend-resume racing with GuC destroy-context-worker (rev8)
> URL:https://patchwork.freedesktop.org/series/121916/
> State:  failure
> Details:
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_121916v8/index.html

alan:snip
> Here are the unknown changes that may have been introduced in 
> Patchwork_121916v8:
> 
> IGT changes
> Possible regressions
> 
>   *   igt@i915_selftest@live@requests:
>  *   bat-mtlp-6: 
> PASS
>  -> 
> DMESG-FAIL
alan: inspecting the selftest code that is failing WRT changes in this targets, 
i really dont see any relation.
the series is changing how we handle context deregistratoin either thru the 
regular context-close or through the
flushing during suspend... however the selftest reporting a reset would be an 
issue where the context is not
closed (because it would be hanging) - also its a kernel context - which should 
not get closed. 
I will run another retest after i verify trybot sees no issue.


Re: [Intel-gfx] [PATCH v7 2/2] drm/i915/guc: Close deregister-context race against CT-loss

2023-11-30 Thread Teres Alexis, Alan Previn
> As far as i can tell, its only if we started resetting / wedging right after 
> this
> queued worker got started.

alan: hope Daniele can proof read my tracing and confirm if got it right.


Re: [Intel-gfx] [PATCH v7 2/2] drm/i915/guc: Close deregister-context race against CT-loss

2023-11-30 Thread Teres Alexis, Alan Previn
On Thu, 2023-11-30 at 16:18 -0500, Vivi, Rodrigo wrote:
> On Wed, Nov 29, 2023 at 04:20:13PM -0800, Alan Previn wrote:
alan:snip
> > +
> > if (unlikely(disabled)) {
> > release_guc_id(guc, ce);
> > __guc_context_destroy(ce);
> > -   return;
> > +   return 0;
> 
> is success the right return case here?
alan: yes: we may discover "disabled == true" if submission_disabled
found that gt-is-wedged. I dont believe such a case will happen as
part of flushing destroyed_worker_func during suspend but may occur
as part of regular runtime context closing that just happens to
get queued in the middle of a gt-reset/wedging process. In such a case,
the reset-prepare code will sanitize everytihng including cleaning up
the pending-destructoin-contexts-link-list. So its either we pick it
from here and dump it ... or reset picks it first and dumps it there
(where both dumpings only occur if guc got disabled first).


Supplemental: How regular context cleanup leads to the same path -->

i915_sw_fence_notify -> engines_notify -> free_engines_rcu ->
intel_context_put -> kref_put(&ce->ref..) -> ce->ops->destroy ->

(where ce->ops = engine->cops and engine->cops = guc_context_ops)
*and, guc_context_ops->destroy == guc_context_destroy so ->

ce->ops->destroy -> guc_context_destroy ->
queue_work(..&guc->submission_state.destroyed_worker);
-> [eventually] -> the same guc_lrc_unpin above

However with additional "if (!intel_guc_is_ready(guc))" in destroyed_worker_func
as part of this patch, hitting this "disabled==true" case will be even less 
likely.
As far as i can tell, its only if we started resetting / wedging right after 
this
queued worker got started. (i ran out of time to check if reset can ever occur
from within the same system_unbound_wq but then i recall we also have CI using
debugfs to force wedging for select (/all?) igt tests) so i suspect it can still
happen in parallel. NOTE: the original checking of the "is disabled" is not new
code - its the original code.

...alan

P.S.- oh man, that took a lot of code tracing as i can't remember these paths 
by heart.



Re: [Intel-gfx] [PATCH v2 1/1] drm/i915/pxp: Add missing tag for Wa_14019159160

2023-11-29 Thread Teres Alexis, Alan Previn
On Wed, 2023-11-29 at 13:13 -0800, Teres Alexis, Alan Previn wrote:
> On Mon, 2023-11-27 at 15:24 -0500, Vivi, Rodrigo wrote:
> > On Wed, Nov 22, 2023 at 12:30:03PM -0800, Alan Previn wrote:
> alan:snip
> alan: thanks for reviewing and apologize for replying to this late.
> 
> > >   /*
> > > -  * On MTL and newer platforms, protected contexts require setting
> > > -  * the LRC run-alone bit or else the encryption will not happen.
> > > +  * Wa_14019159160 - Case 2: mtl
> > > +  * On some platforms, protected contexts require setting
> > > +  * the LRC run-alone bit or else the encryption/decryption will not 
> > > happen.
> > > +  * NOTE: Case 2 only applies to PXP use-case of said workaround.
> > >*/
> > 
> > hmm, interesting enough, on the wa description I read that it is incomplete 
> > for MTL/ARL
> > and something about a fuse bit. We should probably chase for some 
> > clarification in the
> > HSD?!
> alan: yes, i went through the HSD description with the architect and we both 
> had agreed that
> that from the KMD's perspective. At that time, the checking of the fuse from 
> KMD's
> could be optimized out for Case-2-PXP: if the fuses was set a certain way, 
> KMD can skip setting
> the bit in lrc because hw will enforce runalone in pxp mode irrespective of 
> what KMD does but
> if fuse was was not set that way KMD needed to set the runalone in lrc. So 
> for case2, its simpler
> to just turn it on always when the context is protected. However, that said, 
> i realized the
> wording of the HSDs have changed since the original patch was implemented so 
> i will reclarify
> offline and reply back. NOTE: i believe John Harrison is working on case-3 
> through a
> separate series.
alan: based on conversations with the architect, a different WA number, 
14019725520,
has the details of case-2 that explains the relationship the new fuse.
However, as before it can be optimized as explained above where PXP context
need to always set this bit to ensure encryption/decryption engages.




[Intel-gfx] [PATCH v7 2/2] drm/i915/guc: Close deregister-context race against CT-loss

2023-11-29 Thread Alan Previn
If we are at the end of suspend or very early in resume
its possible an async fence signal (via rcu_call) is triggered
to free_engines which could lead us to the execution of
the context destruction worker (after a prior worker flush).

Thus, when suspending, insert rcu_barriers at the start
of i915_gem_suspend (part of driver's suspend prepare) and
again in i915_gem_suspend_late so that all such cases have
completed and context destruction list isn't missing anything.

In destroyed_worker_func, close the race against CT-loss
by checking that CT is enabled before calling into
deregister_destroyed_contexts.

Based on testing, guc_lrc_desc_unpin may still race and fail
as we traverse the GuC's context-destroy list because the
CT could be disabled right before calling GuC's CT send function.

We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring workloads that render
something onscreen is continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).

In such a case, we need to unroll the entire process because
guc-lrc-unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. Without the unroll, the taken wakeref is leaked and will
cascade into a kernel hang later at the tail end of suspend in
this function:

   intel_wakeref_wait_for_idle(>->wakeref)
   (called by) - intel_gt_pm_wait_for_idle
   (called by) - wait_for_suspend

Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_-
contexts if guc_lrc_desc_unpin fails due to CT send falure.
When unrolling, keep the context in the GuC's destroy-list so
it can get picked up on the next destroy worker invocation
(if suspend aborted) or get fully purged as part of a GuC
sanitization (end of suspend) or a reset flow.

Signed-off-by: Alan Previn 
Signed-off-by: Anshuman Gupta 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 73 +--
 2 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 0d812f4d787d..3b27218aabe2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -28,6 +28,13 @@ void i915_gem_suspend(struct drm_i915_private *i915)
GEM_TRACE("%s\n", dev_name(i915->drm.dev));
 
intel_wakeref_auto(&i915->runtime_pm.userfault_wakeref, 0);
+   /*
+* On rare occasions, we've observed the fence completion triggers
+* free_engines asynchronously via rcu_call. Ensure those are done.
+* This path is only called on suspend, so it's an acceptable cost.
+*/
+   rcu_barrier();
+
flush_workqueue(i915->wq);
 
/*
@@ -160,6 +167,9 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
 * machine in an unusable condition.
 */
 
+   /* Like i915_gem_suspend, flush tasks staged from fence triggers */
+   rcu_barrier();
+
for_each_gt(gt, i915, i)
intel_gt_suspend_late(gt);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 6e3fb2fcce4f..a7228bebec39 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -236,6 +236,13 @@ set_context_destroyed(struct intel_context *ce)
ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline void
+clr_context_destroyed(struct intel_context *ce)
+{
+   lockdep_assert_held(&ce->guc_state.lock);
+   ce->guc_state.sched_state &= ~SCHED_STATE_DESTROYED;
+}
+
 static inline bool context_pending_disable(struct intel_context *ce)
 {
return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE;
@@ -613,6 +620,8 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
 u32 g2h_len_dw,
 bool loop)
 {
+   int ret;
+
/*
 * We always loop when a send requires a reply (i.e. g2h_len_dw > 0),
 * so we don't handle the case where we don't get a reply because we
@@ -623,7 +632,11 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
if (g2h_len_dw)
atomic_inc(&guc->outstanding_submission_g2h);
 
-   return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   ret = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   if (ret)
+   atomic_dec(&guc->outstanding_submission_g2h);
+
+   return ret;
 }
 
 int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
@@ -3288,1

[Intel-gfx] [PATCH v7 0/2] Resolve suspend-resume racing with GuC destroy-context-worker

2023-11-29 Thread Alan Previn
This series is the result of debugging issues root caused to
races between the GuC's destroyed_worker_func being triggered
vs repeating suspend-resume cycles with concurrent delayed
fence signals for engine-freeing.

The reproduction steps require that an app is launched right
before the start of the suspend cycle where it creates a
new gem context and submits a tiny workload that would
complete in the middle of the suspend cycle. However this
app uses dma-buffer sharing or dma-fence with non-GPU
objects or signals that eventually triggers a FENCE_FREE
via__i915_sw_fence_notify that connects to engines_notify ->
free_engines_rcu -> intel_context_put ->
kref_put(&ce->ref..) that queues the worker after the GuCs
CTB has been disabled (i.e. after i915-gem's suspend-late).

This sequence is a corner-case and required repeating this
app->suspend->resume cycle ~1500 times across 4 identical
systems to see it once. That said, based on above callstack,
it is clear that merely flushing the context destruction worker,
which is obviously missing and needed, isn't sufficient.

Because of that, this series adds additional patches besides
the obvious (Patch #1) flushing of the worker during the
suspend flows. It also includes (Patch #2) closing a race
between sending the context-deregistration H2G vs the CTB
getting disabled in the midst of it (by detecing the failure
and unrolling the guc-lrc-unpin flow) and adding an additional
rcu_barrier in the gem-suspend flow to purge outstanding
rcu defered tasks that may include context destruction.

This patch was tested and confirmed to be reliably working
after running ~1500 suspend resume cycles on 4 concurrent
machines.

Changes from prior revs:
   v6: - Dont hold the spinlock while calling deregister_context
 which can take a longer time. (Rodrigo)
   - Fix / improve of comments. (Rodrigo)
   - Release the ce->guc_state.lock before calling
 deregister_context and retake it if we fail. (John/Daniele).
   v5: - Remove Patch #3 which doesnt solve this exact bug
 but can be a separate patch(Tvrtko).
   v4: - In Patch #2, change the position of the calls into
 rcu_barrier based on latest testing data. (Alan/Anshuman).
   - In Patch #3, fix the timeout value selection for the
 final gt-pm idle-wait that was incorrectly using a 'ns'
 #define as a milisec timeout.
   v3: - In Patch #3, when deregister_context fails, instead
 of calling intel_gt_pm_put(that might sleep), call
 __intel_wakeref_put (without ASYNC flag) (Rodrigo/Anshuman).
   - In wait_for_suspend add an rcu_barrier before we
 proceed to wait for idle. (Anshuman)
   v2: - Patch #2 Restructure code in guc_lrc_desc_unpin so
 it's more readible to differentiate (1)direct guc-id
 cleanup ..vs (2) sending the H2G ctx-destroy action ..
 vs (3) the unrolling steps if the H2G fails.
   - Patch #2 Add a check to close the race sooner by checking
 for intel_guc_is_ready from destroyed_worker_func.
   - Patch #2 When guc_submission_send_busy_loop gets a
 failure from intel_guc_send_busy_loop, we need to undo
 i.e. decrement the outstanding_submission_g2h.
   - Patch #3 In wait_for_suspend, fix checking of return from
 intel_gt_pm_wait_timeout_for_idle to now use -ETIMEDOUT
 and add documentation for intel_wakeref_wait_for_idle.
 (Rodrigo).

Alan Previn (2):
  drm/i915/guc: Flush context destruction worker at suspend
  drm/i915/guc: Close deregister-context race against CT-loss

 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 78 +--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  2 +
 4 files changed, 87 insertions(+), 5 deletions(-)


base-commit: 436cb0ff9f20fadc99ec3b70c4d2ac6cb2e4410a
-- 
2.39.0



[Intel-gfx] [PATCH v7 1/2] drm/i915/guc: Flush context destruction worker at suspend

2023-11-29 Thread Alan Previn
When suspending, flush the context-guc-id
deregistration worker at the final stages of
intel_gt_suspend_late when we finally call gt_sanitize
that eventually leads down to __uc_sanitize so that
the deregistration worker doesn't fire off later as
we reset the GuC microcontroller.

Signed-off-by: Alan Previn 
Reviewed-by: Rodrigo Vivi 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 04f8377fd7a3..6e3fb2fcce4f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1613,6 +1613,11 @@ static void guc_flush_submissions(struct intel_guc *guc)
spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
+void intel_guc_submission_flush_work(struct intel_guc *guc)
+{
+   flush_work(&guc->submission_state.destroyed_worker);
+}
+
 static void guc_flush_destroyed_contexts(struct intel_guc *guc);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index c57b29cdb1a6..b6df75622d3b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -38,6 +38,8 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
   bool interruptible,
   long timeout);
 
+void intel_guc_submission_flush_work(struct intel_guc *guc);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
return guc->submission_supported;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 3872d309ed31..b8b09b1bee3e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -690,6 +690,8 @@ void intel_uc_suspend(struct intel_uc *uc)
return;
}
 
+   intel_guc_submission_flush_work(guc);
+
with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
err = intel_guc_suspend(guc);
if (err)
-- 
2.39.0



Re: [Intel-gfx] [PATCH v2 1/1] drm/i915/pxp: Add missing tag for Wa_14019159160

2023-11-29 Thread Teres Alexis, Alan Previn
On Mon, 2023-11-27 at 15:24 -0500, Vivi, Rodrigo wrote:
> On Wed, Nov 22, 2023 at 12:30:03PM -0800, Alan Previn wrote:
alan:snip
alan: thanks for reviewing and apologize for replying to this late.

> > /*
> > -* On MTL and newer platforms, protected contexts require setting
> > -* the LRC run-alone bit or else the encryption will not happen.
> > +* Wa_14019159160 - Case 2: mtl
> > +* On some platforms, protected contexts require setting
> > +* the LRC run-alone bit or else the encryption/decryption will not 
> > happen.
> > +* NOTE: Case 2 only applies to PXP use-case of said workaround.
> >  */
> 
> hmm, interesting enough, on the wa description I read that it is incomplete 
> for MTL/ARL
> and something about a fuse bit. We should probably chase for some 
> clarification in the
> HSD?!
alan: yes, i went through the HSD description with the architect and we both 
had agreed that
that from the KMD's perspective. At that time, the checking of the fuse from 
KMD's
could be optimized out for Case-2-PXP: if the fuses was set a certain way, KMD 
can skip setting
the bit in lrc because hw will enforce runalone in pxp mode irrespective of 
what KMD does but
if fuse was was not set that way KMD needed to set the runalone in lrc. So for 
case2, its simpler
to just turn it on always when the context is protected. However, that said, i 
realized the
wording of the HSDs have changed since the original patch was implemented so i 
will reclarify
offline and reply back. NOTE: i believe John Harrison is working on case-3 
through a
separate series.

> 
> > -   if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 70) &&
> > -   (ce->engine->class == COMPUTE_CLASS || ce->engine->class == 
> > RENDER_CLASS)) {
> > +   if (IS_METEORLAKE(ce->engine->i915) && (ce->engine->class == 
> > COMPUTE_CLASS ||
> > +   ce->engine->class == 
> > RENDER_CLASS)) {
> 
> This check now excludes the ARL with the same IP, no?!
alan: yeah - re-reved this part just now.
> 
> > rcu_read_lock();
> > gem_ctx = rcu_dereference(ce->gem_context);
> > if (gem_ctx)
> > 
> > base-commit: 5429d55de723544dfc0630cf39d96392052b27a1
> > -- 
> > 2.39.0
> > 



Re: [Intel-gfx] [PATCH v5] drm/i915/pxp: Add drm_dbgs for critical PXP events.

2023-11-29 Thread Teres Alexis, Alan Previn
On Fri, 2023-11-24 at 08:30 +, Tvrtko Ursulin wrote:
> On 22/11/2023 19:15, Alan Previn wrote:
alan:snip
alan: thanks for reviewing.
> > if (iir & GEN12_DISPLAY_STATE_RESET_COMPLETE_INTERRUPT)
> > -   pxp->session_events |= PXP_TERMINATION_COMPLETE;
> > +   pxp->session_events |= PXP_TERMINATION_COMPLETE | 
> > PXP_EVENT_TYPE_IRQ;
> 
> This looked to be doing more than adding debug logs, but then no one is 
> actually consuming this new flag?!
alan: see below hunk, inside pxp_session_work the drm_dbg that was prints
the "events" that came from above. we need this debug message to uniquely
recognize that contexts pxp keys would get invalidated in response to
a KCR IRQ (as opposed to other known flows that are being reported by other
means like banning hanging contexts, or as part of suspend etc).
NOTE: pxp->session_events does get set in at lease one other path
outside the IRQ (which triggers teardown but not coming from KCR-hw)
This flag is solely for the debug message. Hope this works.

> 
> Regards,
> 
> Tvrtko
> 
> >   
> > if (pxp->session_events)
> > queue_work(system_unbound_wq, &pxp->session_work);
> > diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c 
> > b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
> > index 0a3e66b0265e..091c86e03d1a 100644
> > --- a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
> > +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
> > @@ -137,8 +137,10 @@ void intel_pxp_terminate(struct intel_pxp *pxp, bool 
> > post_invalidation_needs_res
> >   static void pxp_terminate_complete(struct intel_pxp *pxp)
> >   {
> > /* Re-create the arb session after teardown handle complete */
> > -   if (fetch_and_zero(&pxp->hw_state_invalidated))
> > +   if (fetch_and_zero(&pxp->hw_state_invalidated)) {
> > +   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: creating arb_session 
> > after invalidation");
> > pxp_create_arb_session(pxp);
> > +   }
> >   
> > complete_all(&pxp->termination);
> >   }
> > @@ -157,6 +159,8 @@ static void pxp_session_work(struct work_struct *work)
> > if (!events)
> > return;
> >   
> > +   drm_dbg(>->i915->drm, "PXP: processing event-flags 0x%08x", events);
> > +
> > if (events & PXP_INVAL_REQUIRED)
> > intel_pxp_invalidate(pxp);
> >   
> > diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_types.h 
> > b/drivers/gpu/drm/i915/pxp/intel_pxp_types.h
> > index 7e11fa8034b2..07864b584cf4 100644
> > --- a/drivers/gpu/drm/i915/pxp/intel_pxp_types.h
> > +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_types.h
> > @@ -124,6 +124,7 @@ struct intel_pxp {
> >   #define PXP_TERMINATION_REQUEST  BIT(0)
> >   #define PXP_TERMINATION_COMPLETE BIT(1)
> >   #define PXP_INVAL_REQUIRED   BIT(2)
> > +#define PXP_EVENT_TYPE_IRQ   BIT(3)
> >   };
> >   
> >   #endif /* __INTEL_PXP_TYPES_H__ */
> > 
> > base-commit: 5429d55de723544dfc0630cf39d96392052b27a1



Re: [Intel-gfx] [PATCH v6 2/2] drm/i915/guc: Close deregister-context race against CT-loss

2023-11-29 Thread Teres Alexis, Alan Previn
On Mon, 2023-11-27 at 16:51 -0500, Vivi, Rodrigo wrote:

alan: Firstly, thanks for taking the time to review this, knowing you have a 
lot on your plate right now.
> 
alan:snip
> > @@ -3301,19 +3315,38 @@ static inline void guc_lrc_desc_unpin(struct 
> > intel_context *ce)
> > /* Seal race with Reset */
> > spin_lock_irqsave(&ce->guc_state.lock, flags);
> > disabled = submission_disabled(guc);
> > -   if (likely(!disabled)) {
> > -   __intel_gt_pm_get(gt);
> > -   set_context_destroyed(ce);
> > -   clr_context_registered(ce);
> > -   }
> > -   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> 
> you are now holding this spin lock for too long...
alan: my intention was to ensure that an async H2G request to change this
gucid-context state wouldnt come in while we are in the middle of attempting
to send the context-destruction H2G and later realize it needed to be
unrolled (corner case race we are solving here as discovered on
customer platform). But after discussing with Daniele and John, they
agreed that we should move the unlock back to before the deregister and then
take the lock again inside of the unrolling code (if deregister_context 
fails).
alan:snip
> 
> > -   deregister_context(ce, ce->guc_id.id);
> > +   /* GuC is active, lets destroy this context,
> 
> for multi-line comments you need to start with '/*' only
> and start the real comment below, like:
alan:my bad. will fix.
> *
>  * GuC is active, ...
> > +* but at this point we can still be racing with
> > +* suspend, so we undo everything if the H2G fails
> > +*/
> > +
> > +   /* Change context state to destroyed and get gt-pm */
> > +   __intel_gt_pm_get(gt);
> > +   set_context_destroyed(ce);
> > +   clr_context_registered(ce);
> > +
> > +   ret = deregister_context(ce, ce->guc_id.id);
> > +   if (ret) {
> > +   /* Undo the state change and put gt-pm if that failed */
> > +   set_context_registered(ce);
> > +   clr_context_destroyed(ce);
> > +   /*
> > +* Dont use might_sleep / ASYNC verion of put because
> > +* CT loss in deregister_context could mean an ongoing
> > +* reset or suspend flow. Immediately put before the unlock
> > +*/
> > +   __intel_wakeref_put(>->wakeref, 0);
> 
> interesting enough you use the '__' version to bypass the might_sleep(),
> but also sending 0 as argument what might lead you in the mutex_lock inside
> the spin lock area, what is not allowed.
alan: so one thing to note, the motivation for an alternative unlock was only
driven by the fact we were holding that spinlock. However, as per the review
comment and response on the spin lock above, if we move the pm-put
outside the spin lock, we can call the intel_wakeref_put_async (which will not
actually trigger a delayed task becase this function (guc_lrc_desc_unpin) starts
with GEM_BUG_ON(!intel_gt_pm_is_awake(gt)); which means it would only decrement
the ref count.
alan:snip
> > +   spin_lock_irqsave(&guc->submission_state.lock, flags);
> > +   list_add_tail(&ce->destroyed_link,
> > + 
> > &guc->submission_state.destroyed_contexts);
> > +   spin_unlock_irqrestore(&guc->submission_state.lock, 
> > flags);
> > +   /* Bail now since the list might never be emptied if 
> > h2gs fail */
> 
> For this GuC interaction part I'd like to get an ack from John Harrison 
> please.
alan:okay - will do. I might alternatively tap on Daniele since he knows the 
guc just as well.
> 
alan:snip

> > @@ -3392,6 +3440,17 @@ static void destroyed_worker_func(struct work_struct 
> > *w)
> > struct intel_gt *gt = guc_to_gt(guc);
> > int tmp;
> >  
> > +   /*
> > +* In rare cases we can get here via async context-free fence-signals 
> > that
> > +* come very late in suspend flow or very early in resume flows. In 
> > these
> > +* cases, GuC won't be ready but just skipping it here is fine as these
> > +* pending-destroy-contexts get destroyed totally at GuC reset time at 
> > the
> > +* end of suspend.. OR.. this worker can be picked up later on the next
> > +* context destruction trigger after resume-completes
> > +*/
> > +   if (!intel_guc_is_ready(guc))
> > +   return;
> 
> is this racy?
alan: this is to reduce raciness. This worker function eventually calls
deregister_destroyed_context which calls the guc_lrc_desc_unpin that does
all the work we discussed above (taking locks, taking refs, sending h2g,
potentially unrolling). So this check an optimization to skip that without
going through the locking. So yes its a bit racy but its trying to reduce
raciness. NOTE1: Without this line of code, in theory everything would still
work fine, with the driver potentially experiencing more unroll cases
in those thousands of cycles. NOTE2: This check using intel_guc_is_ready
is done in many places in the driver knowing that it 

[Intel-gfx] [PATCH v4 1/1] drm/i915/pxp: Add missing tag for Wa_14019159160

2023-11-29 Thread Alan Previn
Add missing tag for "Wa_14019159160 - Case 2" (for existing
PXP code that ensures run alone mode bit is set to allow
PxP-decryption.

 v4: - Include IP_VER 12.71. (Matt Roper)
 v3: - Check targeted platforms using IP_VAL. (John Harrison)
 v2: - Fix WA id number (John Harrison).
 - Improve comments and code to be specific
   for the targeted platforms (John Harrison)

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 7c367ba8d9dc..d92e39bbb13d 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -863,10 +863,12 @@ static bool ctx_needs_runalone(const struct intel_context 
*ce)
bool ctx_is_protected = false;
 
/*
-* On MTL and newer platforms, protected contexts require setting
-* the LRC run-alone bit or else the encryption will not happen.
+* Wa_14019159160 - Case 2.
+* On some platforms, protected contexts require setting
+* the LRC run-alone bit or else the encryption/decryption will not 
happen.
+* NOTE: Case 2 only applies to PXP use-case of said workaround.
 */
-   if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 70) &&
+   if (IS_GFX_GT_IP_RANGE(ce->engine->gt, IP_VER(12, 70), IP_VER(12, 71)) 
&&
(ce->engine->class == COMPUTE_CLASS || ce->engine->class == 
RENDER_CLASS)) {
rcu_read_lock();
gem_ctx = rcu_dereference(ce->gem_context);

base-commit: 436cb0ff9f20fadc99ec3b70c4d2ac6cb2e4410a
-- 
2.39.0



Re: [Intel-gfx] [PATCH v3 1/1] drm/i915/pxp: Add missing tag for Wa_14019159160

2023-11-29 Thread Teres Alexis, Alan Previn
On Tue, 2023-11-28 at 10:03 -0800, Roper, Matthew D wrote:
> On Mon, Nov 27, 2023 at 12:11:50PM -0800, Alan Previn wrote:
> > Add missing tag for "Wa_14019159160 - Case 2" (for existing
> > PXP code that ensures run alone mode bit is set to allow
> > PxP-decryption.
alan:snip.
alan: thanks for reviewing.
> > -   if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 70) &&
> > +   if (GRAPHICS_VER_FULL(ce->engine->i915) == IP_VER(12, 70) &&
> 
> The workaround database lists this as being needed on both 12.70 and
> 12.71.  Should this be a
> 
> IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 71))
alan: sure - will do this.
> 
> check instead?
> 
> The workaround is also listed in the database as applying to DG2; is
> this "case 2" subset of the workaround not relevant to that platform?
alan: i dont believe so - not the pxp case 2.



[Intel-gfx] [PATCH v3 1/1] drm/i915/pxp: Add missing tag for Wa_14019159160

2023-11-27 Thread Alan Previn
Add missing tag for "Wa_14019159160 - Case 2" (for existing
PXP code that ensures run alone mode bit is set to allow
PxP-decryption.

 v3: - Check targeted platforms using IP_VAL. (John Harrison)
 v2: - Fix WA id number (John Harrison).
 - Improve comments and code to be specific
   for the targeted platforms (John Harrison)

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 7c367ba8d9dc..1152cf25d578 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -863,10 +863,12 @@ static bool ctx_needs_runalone(const struct intel_context 
*ce)
bool ctx_is_protected = false;
 
/*
-* On MTL and newer platforms, protected contexts require setting
-* the LRC run-alone bit or else the encryption will not happen.
+* Wa_14019159160 - Case 2: mtl
+* On some platforms, protected contexts require setting
+* the LRC run-alone bit or else the encryption/decryption will not 
happen.
+* NOTE: Case 2 only applies to PXP use-case of said workaround.
 */
-   if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 70) &&
+   if (GRAPHICS_VER_FULL(ce->engine->i915) == IP_VER(12, 70) &&
(ce->engine->class == COMPUTE_CLASS || ce->engine->class == 
RENDER_CLASS)) {
rcu_read_lock();
gem_ctx = rcu_dereference(ce->gem_context);

base-commit: 5429d55de723544dfc0630cf39d96392052b27a1
-- 
2.39.0



[Intel-gfx] [PATCH v2 1/1] drm/i915/pxp: Add missing tag for Wa_14019159160

2023-11-22 Thread Alan Previn
Add missing tag for "Wa_14019159160 - Case 2" (for existing
PXP code that ensures run alone mode bit is set to allow
PxP-decryption.

 v2: - Fix WA id number (John Harrison).
 - Improve comments and code to be specific
   for the targetted platforms (John Harrison)

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 7c367ba8d9dc..2959dfed2aa0 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -863,11 +863,13 @@ static bool ctx_needs_runalone(const struct intel_context 
*ce)
bool ctx_is_protected = false;
 
/*
-* On MTL and newer platforms, protected contexts require setting
-* the LRC run-alone bit or else the encryption will not happen.
+* Wa_14019159160 - Case 2: mtl
+* On some platforms, protected contexts require setting
+* the LRC run-alone bit or else the encryption/decryption will not 
happen.
+* NOTE: Case 2 only applies to PXP use-case of said workaround.
 */
-   if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 70) &&
-   (ce->engine->class == COMPUTE_CLASS || ce->engine->class == 
RENDER_CLASS)) {
+   if (IS_METEORLAKE(ce->engine->i915) && (ce->engine->class == 
COMPUTE_CLASS ||
+   ce->engine->class == 
RENDER_CLASS)) {
rcu_read_lock();
gem_ctx = rcu_dereference(ce->gem_context);
if (gem_ctx)

base-commit: 5429d55de723544dfc0630cf39d96392052b27a1
-- 
2.39.0



[Intel-gfx] [PATCH v5] drm/i915/pxp: Add drm_dbgs for critical PXP events.

2023-11-22 Thread Alan Previn
Debugging PXP issues can't even begin without understanding precedding
sequence of important events. Add drm_dbg into the most important PXP
events.

 v5 : - rebase.
 v4 : - rebase.
 v3 : - move gt_dbg to after mutex block in function
i915_gsc_proxy_component_bind. (Vivaik)
 v2 : - remove __func__ since drm_dbg covers that (Jani).
  - add timeout dbg of the restart from front-end (Alan).

Signed-off-by: Alan Previn 
Reviewed-by: Vivaik Balasubrawmanian 
---
 drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c |  2 ++
 drivers/gpu/drm/i915/pxp/intel_pxp.c | 15 ---
 drivers/gpu/drm/i915/pxp/intel_pxp_irq.c |  5 +++--
 drivers/gpu/drm/i915/pxp/intel_pxp_session.c |  6 +-
 drivers/gpu/drm/i915/pxp/intel_pxp_types.h   |  1 +
 5 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
index 5f138de3c14f..40817ebcca71 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
@@ -322,6 +322,7 @@ static int i915_gsc_proxy_component_bind(struct device 
*i915_kdev,
gsc->proxy.component = data;
gsc->proxy.component->mei_dev = mei_kdev;
mutex_unlock(&gsc->proxy.mutex);
+   gt_dbg(gt, "GSC proxy mei component bound\n");
 
return 0;
 }
@@ -342,6 +343,7 @@ static void i915_gsc_proxy_component_unbind(struct device 
*i915_kdev,
with_intel_runtime_pm(&i915->runtime_pm, wakeref)
intel_uncore_rmw(gt->uncore, HECI_H_CSR(MTL_GSC_HECI2_BASE),
 HECI_H_CSR_IE | HECI_H_CSR_RST, 0);
+   gt_dbg(gt, "GSC proxy mei component unbound\n");
 }
 
 static const struct component_ops i915_gsc_proxy_component_ops = {
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp.c
index dc327cf40b5a..e11f562b1876 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
@@ -303,6 +303,8 @@ static int __pxp_global_teardown_final(struct intel_pxp 
*pxp)
 
if (!pxp->arb_is_valid)
return 0;
+
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: teardown for suspend/fini");
/*
 * To ensure synchronous and coherent session teardown completion
 * in response to suspend or shutdown triggers, don't use a worker.
@@ -324,6 +326,8 @@ static int __pxp_global_teardown_restart(struct intel_pxp 
*pxp)
 
if (pxp->arb_is_valid)
return 0;
+
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: teardown for restart");
/*
 * The arb-session is currently inactive and we are doing a reset and 
restart
 * due to a runtime event. Use the worker that was designed for this.
@@ -332,8 +336,11 @@ static int __pxp_global_teardown_restart(struct intel_pxp 
*pxp)
 
timeout = intel_pxp_get_backend_timeout_ms(pxp);
 
-   if (!wait_for_completion_timeout(&pxp->termination, 
msecs_to_jiffies(timeout)))
+   if (!wait_for_completion_timeout(&pxp->termination, 
msecs_to_jiffies(timeout))) {
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: restart backend timed 
out (%d ms)",
+   timeout);
return -ETIMEDOUT;
+   }
 
return 0;
 }
@@ -414,10 +421,12 @@ int intel_pxp_start(struct intel_pxp *pxp)
int ret = 0;
 
ret = intel_pxp_get_readiness_status(pxp, PXP_READINESS_TIMEOUT);
-   if (ret < 0)
+   if (ret < 0) {
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: tried but not-avail 
(%d)", ret);
return ret;
-   else if (ret > 1)
+   } else if (ret > 1) {
return -EIO; /* per UAPI spec, user may retry later */
+   }
 
mutex_lock(&pxp->arb_mutex);
 
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
index 91e9622c07d0..d81750b9bdda 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
@@ -40,11 +40,12 @@ void intel_pxp_irq_handler(struct intel_pxp *pxp, u16 iir)
   GEN12_DISPLAY_APP_TERMINATED_PER_FW_REQ_INTERRUPT)) {
/* immediately mark PXP as inactive on termination */
intel_pxp_mark_termination_in_progress(pxp);
-   pxp->session_events |= PXP_TERMINATION_REQUEST | 
PXP_INVAL_REQUIRED;
+   pxp->session_events |= PXP_TERMINATION_REQUEST | 
PXP_INVAL_REQUIRED |
+  PXP_EVENT_TYPE_IRQ;
}
 
if (iir & GEN12_DISPLAY_STATE_RESET_COMPLETE_INTERRUPT)
-   pxp->session_events |= PXP_TERMINATION_COMPLETE;
+   pxp->session_events |= PXP_TERMINATION_COMPLETE | 
PXP_EVENT_TYPE_IRQ;
 

Re: [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915/pxp: Add drm_dbgs for critical PXP events. (rev5)

2023-11-22 Thread Teres Alexis, Alan Previn
On Wed, 2023-09-27 at 11:08 +, Patchwork wrote:
> Patch Details
> Series: drm/i915/pxp: Add drm_dbgs for critical PXP events. (rev5)
> URL:https://patchwork.freedesktop.org/series/123803/
alan:snip
> 
> Here are the unknown changes that may have been introduced in 
> Patchwork_123803v5_full:
> 
> IGT changes
> Possible regressions
> 
>   *   igt@perf_pmu@frequency@gt0:
>  *   shard-dg2: 
> PASS
>  -> 
> SKIP
alan: failure is verified as not related. but will post a rebase since its been 
a while.


[Intel-gfx] [PATCH 1/1] drm/i915/pxp: Add missing tag for Wa_14019159160

2023-11-21 Thread Alan Previn
Add missing tag for "Wa_14019159160 - Case 2" (for existing
PXP code that ensures run alone mode bit is set to allow
PxP-decryption.

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 7c367ba8d9dc..c758fe4906a5 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -863,8 +863,10 @@ static bool ctx_needs_runalone(const struct intel_context 
*ce)
bool ctx_is_protected = false;
 
/*
+* Wa_140191591606 - Case 2: mtl
 * On MTL and newer platforms, protected contexts require setting
-* the LRC run-alone bit or else the encryption will not happen.
+* the LRC run-alone bit or else the encryption/decryption will not 
happen.
+* NOTE: Case 2 only applies to PXP use-case of said workaround.
 */
if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 70) &&
(ce->engine->class == COMPUTE_CLASS || ce->engine->class == 
RENDER_CLASS)) {

base-commit: dbdb47c227dc21b7bf98ada039bf42aac4b58b8b
-- 
2.39.0



Re: [Intel-gfx] [PATCH v1 1/1] drm/i915/pxp: Bail early in pxp tee backend on first teardown error

2023-11-16 Thread Teres Alexis, Alan Previn
On Thu, 2023-11-16 at 15:20 -0800, Teres Alexis, Alan Previn wrote:
> For Gen12 when using mei-pxp tee backend tranport, if we are coming
> up from a cold boot or from a resume (not runtime resume), we can
> optionally quicken the very first session cleanup that would occur
> as part of starting up a default PXP session. Typically a cleanup
> from a cold-start is expected to be quick so we can use a shorter
> timeout and skip retries (when getting non-success on calling
> backend transport for intel_pxp_tee_end_arb_fw_session).
alan:snip

> @@ -387,10 +389,14 @@ void intel_pxp_tee_end_arb_fw_session(struct intel_pxp 
> *pxp, u32 session_id)
>   ret = intel_pxp_tee_io_message(pxp,
>  &msg_in, sizeof(msg_in),
>  &msg_out, sizeof(msg_out),
> -NULL);
> +NULL, pxp->hw_state_coldstart ?
> +PXP_TRANSPORT_TIMEOUT_FAST_MS : 
> PXP_TRANSPORT_TIMEOUT_MS);
>  
> - /* Cleanup coherency between GT and Firmware is critical, so try again 
> if it fails */
> - if ((ret || msg_out.header.status != 0x0) && ++trials < 3)
> + /*
> +  * Cleanup coherency between GT and Firmware is critical, so try again 
> if it
> +  * fails, unless we are performing a cold-start reset
> +  */
> + if ((ret || msg_out.header.status != 0x0) && !pxp->hw_state_coldstart 
> &&  ++trials < 3)
alan: Take note I am working offline with sister teams to perform some end to
end conformance testing with more comprehensive OS stacks before we can verify
that this optimization doesnt break any existing use-cases.


[Intel-gfx] [PATCH v1 1/1] drm/i915/pxp: Bail early in pxp tee backend on first teardown error

2023-11-16 Thread Alan Previn
For Gen12 when using mei-pxp tee backend tranport, if we are coming
up from a cold boot or from a resume (not runtime resume), we can
optionally quicken the very first session cleanup that would occur
as part of starting up a default PXP session. Typically a cleanup
from a cold-start is expected to be quick so we can use a shorter
timeout and skip retries (when getting non-success on calling
backend transport for intel_pxp_tee_end_arb_fw_session).

While we are touching this area of code, lets not update
pxp->platform_cfg_is_bad so its not done inside an "is_foo"
helper, move that to the helper's caller.

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/pxp/intel_pxp.c |  1 +
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c   |  3 ++-
 drivers/gpu/drm/i915/pxp/intel_pxp_pm.c  |  1 +
 drivers/gpu/drm/i915/pxp/intel_pxp_session.c |  1 +
 drivers/gpu/drm/i915/pxp/intel_pxp_tee.c | 23 +---
 drivers/gpu/drm/i915/pxp/intel_pxp_types.h   | 10 +
 6 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp.c
index dc327cf40b5a..b4de34e6ad01 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
@@ -140,6 +140,7 @@ static void pxp_init_full(struct intel_pxp *pxp)
if (ret)
return;
 
+   pxp->hw_state_coldstart = true;
if (HAS_ENGINE(pxp->ctrl_gt, GSC0))
ret = intel_pxp_gsccs_init(pxp);
else
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c
index 75df959b0aa0..94a26faac14f 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c
@@ -24,7 +24,6 @@ is_fw_err_platform_config(struct intel_pxp *pxp, u32 type)
case PXP_STATUS_ERROR_API_VERSION:
case PXP_STATUS_PLATFCONFIG_KF1_NOVERIF:
case PXP_STATUS_PLATFCONFIG_KF1_BAD:
-   pxp->platform_cfg_is_bad = true;
return true;
default:
break;
@@ -228,6 +227,7 @@ int intel_pxp_gsccs_create_session(struct intel_pxp *pxp,
drm_err(&i915->drm, "Failed to init session %d, ret=[%d]\n", 
arb_session_id, ret);
} else if (msg_out.header.status != 0) {
if (is_fw_err_platform_config(pxp, msg_out.header.status)) {
+   pxp->platform_cfg_is_bad = true;
drm_info_once(&i915->drm,
  "PXP init-session-%d failed due to 
BIOS/SOC:0x%08x:%s\n",
  arb_session_id, msg_out.header.status,
@@ -271,6 +271,7 @@ void intel_pxp_gsccs_end_arb_fw_session(struct intel_pxp 
*pxp, u32 session_id)
session_id, ret);
} else if (msg_out.header.status != 0) {
if (is_fw_err_platform_config(pxp, msg_out.header.status)) {
+   pxp->platform_cfg_is_bad = true;
drm_info_once(&i915->drm,
  "PXP inv-stream-key-%u failed due to 
BIOS/SOC :0x%08x:%s\n",
  session_id, msg_out.header.status,
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_pm.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_pm.c
index 6dfd24918953..fd53b4fd7bac 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_pm.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_pm.c
@@ -60,6 +60,7 @@ static void _pxp_resume(struct intel_pxp *pxp, bool 
take_wakeref)
 void intel_pxp_resume_complete(struct intel_pxp *pxp)
 {
_pxp_resume(pxp, true);
+   pxp->hw_state_coldstart = true;
 }
 
 void intel_pxp_runtime_resume(struct intel_pxp *pxp)
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
index 0a3e66b0265e..7979648c6a2c 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
@@ -114,6 +114,7 @@ static int pxp_terminate_arb_session_and_global(struct 
intel_pxp *pxp)
intel_pxp_gsccs_end_arb_fw_session(pxp, ARB_SESSION);
else
intel_pxp_tee_end_arb_fw_session(pxp, ARB_SESSION);
+   pxp->hw_state_coldstart = false;
 
return ret;
 }
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
index b00d6c280159..07696738d31b 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
@@ -22,6 +22,7 @@
 #include "intel_pxp_types.h"
 
 #define PXP_TRANSPORT_TIMEOUT_MS 5000 /* 5 sec */
+#define PXP_TRANSPORT_TIMEOUT_FAST_MS 1000 /* 1 sec */
 
 static bool
 is_fw_err_platform_config(struct intel_pxp *pxp, u32 type)
@@ -30,7 +31,6 @@ is_fw_err_platform_config(struct intel_pxp *pxp, u32 type)
case PXP_STATUS_ERROR_API_VERSION:
case 

Re: [Intel-gfx] [char-misc-next 3/4] mei: pxp: re-enable client on errors

2023-11-15 Thread Teres Alexis, Alan Previn
On Wed, 2023-11-15 at 13:31 +, Tvrtko Ursulin wrote:
> On 14/11/2023 15:31, Teres Alexis, Alan Previn wrote:
> > On Tue, 2023-11-14 at 16:00 +0200, Ville Syrjälä wrote:
> > > On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
> > > 
> 
> Regardless of the mei_pxp_send_message being temporarily broken, doesn't 
> Ville's logs suggest the PXP detection is altogether messed up? AFAIR 
> the plan was exactly to avoid stalls during Mesa init and new uapi was 
> added to achieve that. But it doesn't seem to be working?!
> 
> commit 3b918f4f0c8b5344af4058f1a12e2023363d0097
> Author: Alan Previn 
> Date:   Wed Aug 2 11:25:50 2023 -0700
> 
>  drm/i915/pxp: Optimize GET_PARAM:PXP_STATUS
> 
>  After recent discussions with Mesa folks, it was requested
>  that we optimize i915's GET_PARAM for the PXP_STATUS without
>  changing the UAPI spec.
> 
>  Add these additional optimizations:
> - If any PXP initializatoin flow failed, then ensure that
>   we catch it so that we can change the returned PXP_STATUS
>   from "2" (i.e. 'PXP is supported but not yet ready')
>   to "-ENODEV". This typically should not happen and if it
>   does, we have a platform configuration issue.
> - If a PXP arbitration session creation event failed
>   due to incorrect firmware version or blocking SOC fusing
>   or blocking BIOS configuration (platform reasons that won't
>   change if we retry), then reflect that blockage by also
>   returning -ENODEV in the GET_PARAM:PXP_STATUS.
> - GET_PARAM:PXP_STATUS should not wait at all if PXP is
>   supported but non-i915 dependencies (component-driver /
>   firmware) we are still pending to complete the init flows.
>   In this case, just return "2" immediately (i.e. 'PXP is
>   supported but not yet ready').
> 
> AFAIU is things failed there shouldn't be long waits, repeated/constant 
> ones even less so.
> 
alan: I agree - but I don't think MESA is detecting for PXP for above case...
which is designed to be quick if using the GET_PARAM IOCTL - i believe MESA
may actually be opting in to enforce PXP. When this happens we are required
to be stringent when managing the protected-hw-sessions and for certain PXP
operations we retry when a failure occurs - one in particular is the PXP hw
session teardown or reset. We are expected to retry up to 3 times to ensure
that the session is properly torn down (requirements from architecture)
unless the error returned from FW indicated that we dont have proper fusing or
security assets in the platform (i.e. lower level aspects of the platform won't
permit PXP support). All of this is already coded.

So we have two problems here:

1. The code for enforcing PXP when PXP is explicitly requested by UMD is
expecting the mei-component driver to follow the mei-component-interface spec
but a change was introduced by Alex that caused a bug in this lowest layer spec
(see: 
https://elixir.bootlin.com/linux/latest/source/drivers/misc/mei/pxp/mei_pxp.c#L25)
"Return: 0 on Success, <0 on Failure" for send and "Return: bytes sent on
Success, <0 on Failure" for receive - the change they made was returning
a positive number on send's success and i915 was checking the according to spec:

 ret = pxp_component->ops->send(pxp_component->tee_dev, msg_in, 
msg_in_size,
PXP_TRANSPORT_TIMEOUT_MS);
->   if (ret) {
 drm_err(&i915->drm, "Failed to send PXP TEE message\n");
 goto unlock;
 }

 ret = pxp_component->ops->recv(pxp_component->tee_dev, msg_out, 
msg_out_max_size,
PXP_TRANSPORT_TIMEOUT_MS);
 if (ret < 0) {
 drm_err(&i915->drm, "Failed to receive PXP TEE message\n");
 goto unlock;
 }

So i really cannot guarantee anything if the mei code itself is broken and not
complying to the spec we agreed on - i can change the above check for send
from "if (ret)" to "if (ret < 0)" but that would only be "working aroung"
the mei bug. As a side note, when we bail on a successful "send" thinking its
a failure, and try to "send" again, it triggers an link reset in the mei-hw
link because the receive was never picked up. IF i understand this correctly,
this is hw-fw design requirement, i.e. that the send-recv be an atomic FSM
sequence. That said, if the the send-failure was a real failure, the timeout
would have not been hit and the retry would have been much faster. As a side 
note,
Alex from mei team 

Re: [Intel-gfx] [PATCH v1 1/1] drm/i915/gt: Dont wait forever when idling in suspend

2023-11-14 Thread Teres Alexis, Alan Previn
On Tue, 2023-11-14 at 08:22 -0800, Teres Alexis, Alan Previn wrote:
> When suspending, add a timeout when calling
> intel_gt_pm_wait_for_idle else if we have a leaked
> wakeref (which would be indicative of a bug elsewhere
> in the driver), driver will at exit the suspend-resume
> cycle, after the kernel detects the held reference and
> prints a message to abort suspending instead of hanging
> in the kernel forever which then requires serial connection
> or ramoops dump to debug further.
NOTE: this patch originates from Patch#3 of this other series
https://patchwork.freedesktop.org/series/121916/ (rev 5 and prior)
and was decided to be moved out as its own patch since this
patch is trying to improve general debuggability as opposed
to resolving that bug being resolved in above series.
alan:snip

> +int intel_wakeref_wait_for_idle(struct intel_wakeref *wf, int timeout_ms)
>  {
> - int err;
> + int err = 0;
>  
>   might_sleep();
>  
> - err = wait_var_event_killable(&wf->wakeref,
> -   !intel_wakeref_is_active(wf));
> + if (!timeout_ms)
> + err = wait_var_event_killable(&wf->wakeref,
> +   !intel_wakeref_is_active(wf));
> + else if (wait_var_event_timeout(&wf->wakeref,
> + !intel_wakeref_is_active(wf),
> + msecs_to_jiffies(timeout_ms)) < 1)
> + err = -ETIMEDOUT;
> +
alan: paraphrasing feedback from Tvrtko on the originating series this patch:
it would be good idea to add error-injection into this timeout to ensure
we dont have any other subsytem that could inadvertently leak an rpm
wakeref (and catch such bugs in future pre-merge).




Re: [Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending

2023-11-14 Thread Teres Alexis, Alan Previn
On Tue, 2023-11-14 at 17:52 +, Tvrtko Ursulin wrote:
> On 14/11/2023 17:37, Teres Alexis, Alan Previn wrote:
> > On Tue, 2023-11-14 at 17:27 +, Tvrtko Ursulin wrote:
> > > On 13/11/2023 17:57, Teres Alexis, Alan Previn wrote:
> > > > On Wed, 2023-10-25 at 13:58 +0100, Tvrtko Ursulin wrote:
> > > > > On 04/10/2023 18:59, Teres Alexis, Alan Previn wrote:
> > > > > > On Thu, 2023-09-28 at 13:46 +0100, Tvrtko Ursulin wrote:
> > > > > > > On 27/09/2023 17:36, Teres Alexis, Alan Previn wrote:
> 
alan:snip

> > alan: So i did trace back the gt->wakeref before i posted these patches and
> > see that within these runtime get/put calls, i believe the first 'get' leads
> > to __intel_wakeref_get_first which calls intel_runtime_pm_get via rpm_get
> > helper and eventually executes a pm_runtime_get_sync(rpm->kdev); (hanging 
> > off
> > i915_device). (naturally there is a corresponding mirros for the 
> > '_put_last').
> > So this means the first-get and last-put lets the kernel know. Thats why 
> > when
> > i tested this patch, it did actually cause the suspend to abort from kernel 
> > side
> > and the kernel would print a message indicating i915 was the one that didnt
> > release all refs.
> 
> Ah that would be much better then.
> 
> Do you know if everything gets resumed/restored correctly in that case 
> or we would need some additional work to maybe early exit from callers 
> of wait_for_suspend()?
alan: So assuming we are still discussing about a "potentially new
future leaked-wakeref bug" (i.e. putting aside the fact that
Patch #1 + #2 resolves this specific series' bug), based on the
previous testing we did, after this timeout-bail trigger,
the suspend flow bails and gt/guc operation does actually continue
as normal. However, its been a long time since we tested this so
i am not sure of how accidental-new-future bugs might play to this
assumption especially if some other subsystem that leaked the rpm
wakref but that subsystem did NOT get reset like how GuC is reset
at the end of suspend.

> 
> What I would also ask is to see if something like injecting a probing 
> failure is feasible, so we can have this new timeout exit path 
> constantly/regularly tested in CI.
alan: Thats a good idea. In line with this, i would like to point out that
rev6 of this series has been posted but i removed this Patch #3. However i did
post this Patch #3 as a standalone patch here: 
https://patchwork.freedesktop.org/series/126414/
as i anticipate this patch will truly help with future issue debuggability.

That said, i shall post a review on that patch with your suggestion to add
an injected probe error for the suspend-resume flow and follow up on that one
separately.

> 
> Regards,
> 
> Tvrtko
> 
> > alan: Anyways, i have pulled this patch out of rev6 of this series and 
> > created a
> > separate standalone patch for this patch #3 that we review independently.
> > 



Re: [Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending

2023-11-14 Thread Teres Alexis, Alan Previn
On Tue, 2023-11-14 at 12:36 -0500, Vivi, Rodrigo wrote:
> On Tue, Nov 14, 2023 at 05:27:18PM +, Tvrtko Ursulin wrote:
> > 
> > On 13/11/2023 17:57, Teres Alexis, Alan Previn wrote:
> > > On Wed, 2023-10-25 at 13:58 +0100, Tvrtko Ursulin wrote:
> > > > On 04/10/2023 18:59, Teres Alexis, Alan Previn wrote:
> > > > > On Thu, 2023-09-28 at 13:46 +0100, Tvrtko Ursulin wrote:
> > > > > > On 27/09/2023 17:36, Teres Alexis, Alan Previn wrote:
> 
> That's a good point.
> Well, current code is already bad and buggy on suspend-resume. We could get
> suspend stuck forever without any clue of what happened.
> At least the proposed patch add some gt_warn. But, yes, the right thing
> to do should be entirely abort the suspend in case of timeout, besides the
> gt_warn.
alan: yes - thats was the whole idea for Patch #3. Only after putting such
code did we have much better debuggability on real world production platforms
+ config that may not have serial-port or ramoops-dump by default.

Btw, as per previous comments by Tvrtko - which i agreed with, I have
moved this one single patch into a separate patch on its own.
See here: https://patchwork.freedesktop.org/series/126414/
(I also maintained the RB from you Rodrigo because the patch did not changed).
And yes, the gt_warn is still in place :)


Re: [Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending

2023-11-14 Thread Teres Alexis, Alan Previn
On Tue, 2023-11-14 at 17:27 +, Tvrtko Ursulin wrote:
> On 13/11/2023 17:57, Teres Alexis, Alan Previn wrote:
> > On Wed, 2023-10-25 at 13:58 +0100, Tvrtko Ursulin wrote:
> > > On 04/10/2023 18:59, Teres Alexis, Alan Previn wrote:
> > > > On Thu, 2023-09-28 at 13:46 +0100, Tvrtko Ursulin wrote:
> > > > > On 27/09/2023 17:36, Teres Alexis, Alan Previn wrote:
alan: snip
> 
> > alan: I won't say its NOT fixing anything - i am saying it's not fixing
> > this specific bug where we have the outstanding G2H from a context 
> > destruction
> > operation that got dropped. I am okay with dropping this patch in the next 
> > rev
> > but shall post a separate stand alone version of Patch3 - because I believe
> > all other i915 subsystems that take runtime-pm's DO NOT wait forever when 
> > entering
> > suspend - but GT is doing this. This means if there ever was a bug 
> > introduced,
> > it would require serial port or ramoops collection to debug. So i think 
> > such a
> > patch, despite not fixing this specific bug will be very helpful for 
> > debuggability
> > of future issues. After all, its better to fail our suspend when we have a 
> > bug
> > rather than to hang the kernel forever.
> 
> Issue I have is that I am not seeing how it fails the suspend when 
> nothing is passed out from *void* wait_suspend(gt). To me it looks the 
> suspend just carries on. And if so, it sounds dangerous to allow it to 
> do that with any future/unknown bugs in the suspend sequence. Existing 
> timeout wedges the GPU (and all that entails). New code just says "meh 
> I'll just carry on regardless".

alan: So i did trace back the gt->wakeref before i posted these patches and
see that within these runtime get/put calls, i believe the first 'get' leads
to __intel_wakeref_get_first which calls intel_runtime_pm_get via rpm_get
helper and eventually executes a pm_runtime_get_sync(rpm->kdev); (hanging off
i915_device). (naturally there is a corresponding mirros for the '_put_last').
So this means the first-get and last-put lets the kernel know. Thats why when
i tested this patch, it did actually cause the suspend to abort from kernel side
and the kernel would print a message indicating i915 was the one that didnt
release all refs.

alan: Anyways, i have pulled this patch out of rev6 of this series and created a
separate standalone patch for this patch #3 that we review independently.



[Intel-gfx] [PATCH v1 1/1] drm/i915/gt: Dont wait forever when idling in suspend

2023-11-14 Thread Alan Previn
When suspending, add a timeout when calling
intel_gt_pm_wait_for_idle else if we have a leaked
wakeref (which would be indicative of a bug elsewhere
in the driver), driver will at exit the suspend-resume
cycle, after the kernel detects the held reference and
prints a message to abort suspending instead of hanging
in the kernel forever which then requires serial connection
or ramoops dump to debug further.

Signed-off-by: Alan Previn 
Reviewed-by: Rodrigo Vivi 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |  7 ++-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h |  7 ++-
 drivers/gpu/drm/i915/intel_wakeref.c  | 14 ++
 drivers/gpu/drm/i915/intel_wakeref.h  |  6 --
 5 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 40687806d22a..ffef963037f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -686,7 +686,7 @@ void intel_engines_release(struct intel_gt *gt)
if (!engine->release)
continue;
 
-   intel_wakeref_wait_for_idle(&engine->wakeref);
+   intel_wakeref_wait_for_idle(&engine->wakeref, 0);
GEM_BUG_ON(intel_engine_pm_is_awake(engine));
 
engine->release(engine);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index f5899d503e23..25cb39ba9fdf 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -306,6 +306,8 @@ int intel_gt_resume(struct intel_gt *gt)
 
 static void wait_for_suspend(struct intel_gt *gt)
 {
+   int final_timeout_ms = (I915_GT_SUSPEND_IDLE_TIMEOUT * 10);
+
if (!intel_gt_pm_is_awake(gt))
return;
 
@@ -318,7 +320,10 @@ static void wait_for_suspend(struct intel_gt *gt)
intel_gt_retire_requests(gt);
}
 
-   intel_gt_pm_wait_for_idle(gt);
+   /* we are suspending, so we shouldn't be waiting forever */
+   if (intel_gt_pm_wait_timeout_for_idle(gt, final_timeout_ms) == 
-ETIMEDOUT)
+   gt_warn(gt, "bailing from %s after %d milisec timeout\n",
+   __func__, final_timeout_ms);
 }
 
 void intel_gt_suspend_prepare(struct intel_gt *gt)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index b1eeb5b33918..1757ca4c3077 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -68,7 +68,12 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
-   return intel_wakeref_wait_for_idle(>->wakeref);
+   return intel_wakeref_wait_for_idle(>->wakeref, 0);
+}
+
+static inline int intel_gt_pm_wait_timeout_for_idle(struct intel_gt *gt, int 
timeout_ms)
+{
+   return intel_wakeref_wait_for_idle(>->wakeref, timeout_ms);
 }
 
 void intel_gt_pm_init_early(struct intel_gt *gt);
diff --git a/drivers/gpu/drm/i915/intel_wakeref.c 
b/drivers/gpu/drm/i915/intel_wakeref.c
index 623a69089386..f2611c65246b 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.c
+++ b/drivers/gpu/drm/i915/intel_wakeref.c
@@ -113,14 +113,20 @@ void __intel_wakeref_init(struct intel_wakeref *wf,
 "wakeref.work", &key->work, 0);
 }
 
-int intel_wakeref_wait_for_idle(struct intel_wakeref *wf)
+int intel_wakeref_wait_for_idle(struct intel_wakeref *wf, int timeout_ms)
 {
-   int err;
+   int err = 0;
 
might_sleep();
 
-   err = wait_var_event_killable(&wf->wakeref,
- !intel_wakeref_is_active(wf));
+   if (!timeout_ms)
+   err = wait_var_event_killable(&wf->wakeref,
+ !intel_wakeref_is_active(wf));
+   else if (wait_var_event_timeout(&wf->wakeref,
+   !intel_wakeref_is_active(wf),
+   msecs_to_jiffies(timeout_ms)) < 1)
+   err = -ETIMEDOUT;
+
if (err)
return err;
 
diff --git a/drivers/gpu/drm/i915/intel_wakeref.h 
b/drivers/gpu/drm/i915/intel_wakeref.h
index ec881b097368..302694a780d2 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.h
+++ b/drivers/gpu/drm/i915/intel_wakeref.h
@@ -251,15 +251,17 @@ __intel_wakeref_defer_park(struct intel_wakeref *wf)
 /**
  * intel_wakeref_wait_for_idle: Wait until the wakeref is idle
  * @wf: the wakeref
+ * @timeout_ms: Timeout in ms, 0 means never timeout.
  *
  * Wait for the earlier asynchronous release of the wakeref. Note
  * this will wait for any third party as well, so make sure you only wait
  * when you have control over the wakeref and trust no one else is 

[Intel-gfx] [PATCH v6 1/2] drm/i915/guc: Flush context destruction worker at suspend

2023-11-14 Thread Alan Previn
When suspending, flush the context-guc-id
deregistration worker at the final stages of
intel_gt_suspend_late when we finally call gt_sanitize
that eventually leads down to __uc_sanitize so that
the deregistration worker doesn't fire off later as
we reset the GuC microcontroller.

Signed-off-by: Alan Previn 
Reviewed-by: Rodrigo Vivi 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d37698bd6b91..9d1915482898 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1611,6 +1611,11 @@ static void guc_flush_submissions(struct intel_guc *guc)
spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
+void intel_guc_submission_flush_work(struct intel_guc *guc)
+{
+   flush_work(&guc->submission_state.destroyed_worker);
+}
+
 static void guc_flush_destroyed_contexts(struct intel_guc *guc);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index c57b29cdb1a6..b6df75622d3b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -38,6 +38,8 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
   bool interruptible,
   long timeout);
 
+void intel_guc_submission_flush_work(struct intel_guc *guc);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
return guc->submission_supported;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 27f6561dd731..cb74835b1a13 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -695,6 +695,8 @@ void intel_uc_suspend(struct intel_uc *uc)
return;
}
 
+   intel_guc_submission_flush_work(guc);
+
with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
err = intel_guc_suspend(guc);
if (err)
-- 
2.39.0



[Intel-gfx] [PATCH v6 2/2] drm/i915/guc: Close deregister-context race against CT-loss

2023-11-14 Thread Alan Previn
If we are at the end of suspend or very early in resume
its possible an async fence signal (via rcu_call) is triggered
to free_engines which could lead us to the execution of
the context destruction worker (after a prior worker flush).

Thus, when suspending, insert rcu_barriers at the start
of i915_gem_suspend (part of driver's suspend prepare) and
again in i915_gem_suspend_late so that all such cases have
completed and context destruction list isn't missing anything.

In destroyed_worker_func, close the race against CT-loss
by checking that CT is enabled before calling into
deregister_destroyed_contexts.

Based on testing, guc_lrc_desc_unpin may still race and fail
as we traverse the GuC's context-destroy list because the
CT could be disabled right before calling GuC's CT send function.

We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring workloads that render
something onscreen is continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).

In such a case, we need to unroll the entire process because
guc-lrc-unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. Without the unroll, the taken wakeref is leaked and will
cascade into a kernel hang later at the tail end of suspend in
this function:

   intel_wakeref_wait_for_idle(>->wakeref)
   (called by) - intel_gt_pm_wait_for_idle
   (called by) - wait_for_suspend

Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_-
contexts if guc_lrc_desc_unpin fails due to CT send falure.
When unrolling, keep the context in the GuC's destroy-list so
it can get picked up on the next destroy worker invocation
(if suspend aborted) or get fully purged as part of a GuC
sanitization (end of suspend) or a reset flow.

Signed-off-by: Alan Previn 
Signed-off-by: Anshuman Gupta 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 81 ---
 2 files changed, 80 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 0d812f4d787d..3b27218aabe2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -28,6 +28,13 @@ void i915_gem_suspend(struct drm_i915_private *i915)
GEM_TRACE("%s\n", dev_name(i915->drm.dev));
 
intel_wakeref_auto(&i915->runtime_pm.userfault_wakeref, 0);
+   /*
+* On rare occasions, we've observed the fence completion triggers
+* free_engines asynchronously via rcu_call. Ensure those are done.
+* This path is only called on suspend, so it's an acceptable cost.
+*/
+   rcu_barrier();
+
flush_workqueue(i915->wq);
 
/*
@@ -160,6 +167,9 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
 * machine in an unusable condition.
 */
 
+   /* Like i915_gem_suspend, flush tasks staged from fence triggers */
+   rcu_barrier();
+
for_each_gt(gt, i915, i)
intel_gt_suspend_late(gt);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9d1915482898..225747115f78 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -236,6 +236,13 @@ set_context_destroyed(struct intel_context *ce)
ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline void
+clr_context_destroyed(struct intel_context *ce)
+{
+   lockdep_assert_held(&ce->guc_state.lock);
+   ce->guc_state.sched_state &= ~SCHED_STATE_DESTROYED;
+}
+
 static inline bool context_pending_disable(struct intel_context *ce)
 {
return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE;
@@ -613,6 +620,8 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
 u32 g2h_len_dw,
 bool loop)
 {
+   int ret;
+
/*
 * We always loop when a send requires a reply (i.e. g2h_len_dw > 0),
 * so we don't handle the case where we don't get a reply because we
@@ -623,7 +632,11 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
if (g2h_len_dw)
atomic_inc(&guc->outstanding_submission_g2h);
 
-   return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   ret = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   if (ret)
+   atomic_dec(&guc->outstanding_submission_g2h);
+
+   return ret;
 }
 
 int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
@@ -3286,1

[Intel-gfx] [PATCH v6 0/2] Resolve suspend-resume racing with GuC destroy-context-worker

2023-11-14 Thread Alan Previn
This series is the result of debugging issues root caused to
races between the GuC's destroyed_worker_func being triggered
vs repeating suspend-resume cycles with concurrent delayed
fence signals for engine-freeing.

The reproduction steps require that an app is launched right
before the start of the suspend cycle where it creates a
new gem context and submits a tiny workload that would
complete in the middle of the suspend cycle. However this
app uses dma-buffer sharing or dma-fence with non-GPU
objects or signals that eventually triggers a FENCE_FREE
via__i915_sw_fence_notify that connects to engines_notify ->
free_engines_rcu -> intel_context_put ->
kref_put(&ce->ref..) that queues the worker after the GuCs
CTB has been disabled (i.e. after i915-gem's suspend-late).

This sequence is a corner-case and required repeating this
app->suspend->resume cycle ~1500 times across 4 identical
systems to see it once. That said, based on above callstack,
it is clear that merely flushing the context destruction worker,
which is obviously missing and needed, isn't sufficient.

Because of that, this series adds additional patches besides
the obvious (Patch #1) flushing of the worker during the
suspend flows. It also includes (Patch #2) closing a race
between sending the context-deregistration H2G vs the CTB
getting disabled in the midst of it (by detecing the failure
and unrolling the guc-lrc-unpin flow) and adding an additional
rcu_barrier in the gem-suspend flow to purge outstanding
rcu defered tasks that may include context destruction.

This patch was tested and confirmed to be reliably working
after running ~1500 suspend resume cycles on 4 concurrent
machines.

Changes from prior revs:
   v5: - Remove Patch #3 which doesnt solve this exact bug
 but can be a separate patch(Tvrtko).
   v4: - In Patch #2, change the position of the calls into
 rcu_barrier based on latest testing data. (Alan/Anshuman).
   - In Patch #3, fix the timeout value selection for the
 final gt-pm idle-wait that was incorrectly using a 'ns'
 #define as a milisec timeout.
   v3: - In Patch #3, when deregister_context fails, instead
 of calling intel_gt_pm_put(that might sleep), call
 __intel_wakeref_put (without ASYNC flag) (Rodrigo/Anshuman).
   - In wait_for_suspend add an rcu_barrier before we
 proceed to wait for idle. (Anshuman)
   v2: - Patch #2 Restructure code in guc_lrc_desc_unpin so
 it's more readible to differentiate (1)direct guc-id
 cleanup ..vs (2) sending the H2G ctx-destroy action ..
 vs (3) the unrolling steps if the H2G fails.
   - Patch #2 Add a check to close the race sooner by checking
 for intel_guc_is_ready from destroyed_worker_func.
   - Patch #2 When guc_submission_send_busy_loop gets a
 failure from intel_guc_send_busy_loop, we need to undo
 i.e. decrement the outstanding_submission_g2h.
   - Patch #3 In wait_for_suspend, fix checking of return from
 intel_gt_pm_wait_timeout_for_idle to now use -ETIMEDOUT
 and add documentation for intel_wakeref_wait_for_idle.
 (Rodrigo).

Alan Previn (2):
  drm/i915/guc: Flush context destruction worker at suspend
  drm/i915/guc: Close deregister-context race against CT-loss

 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 86 ---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  2 +
 4 files changed, 89 insertions(+), 11 deletions(-)


base-commit: 3d1e36691e73b3946b4a9ca8132a34f0319ff984
-- 
2.39.0



Re: [Intel-gfx] [char-misc-next 3/4] mei: pxp: re-enable client on errors

2023-11-14 Thread Teres Alexis, Alan Previn
On Tue, 2023-11-14 at 16:00 +0200, Ville Syrjälä wrote:
> On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
> > From: Alexander Usyskin 
> > 
> > Disable and enable mei-pxp client on errors to clean the internal state.
> 
> This broke i915 on my Alderlake-P laptop.
> 


Hi Alex, i just relooked at the series that got merged, and i noticed
that in patch #3 of the series, you had changed mei_pxp_send_message
to return bytes sent instead of zero on success. IIRC, we had
agreed to not effect the behavior of this component interface (other
than adding the timeout) - this was the intention of Patch #4 that i
was pushing for in order to spec the interface (which continues
to say zero on success). We should fix this to stay with the original
behavior - where mei-pxp should NOT send partial packets and
will only return zero in success case where success is sending of
the complete packets - so we don't need to get back the "bytes sent"
from mei_pxp_send_message. So i think this might be causing the problem.


Side note  to Ville:, are you enabling PXP kernel config by default in
all MESA contexts? I recall that MESA folks were running some CI testing
with enable pxp contexts, but didn't realize this is being enabled by
default in all contexts. Please be aware that enabling pxp-contexts
would temporarily disabled runtime-pm during that contexts lifetime.
Also pxp contexts will be forced to be irrecoverable if it ever hangs.
The former is a hardware architecture requirement but doesn't do anything
if you're enabling display (which I beleive also blocks in ADL). The
latter was a requirement to comply with Vulkan.

...alan




Re: [Intel-gfx] [PATCH v3] drm/i915: Skip pxp init if gt is wedged

2023-11-13 Thread Teres Alexis, Alan Previn
On Mon, 2023-11-13 at 14:49 -0800, Zhanjun Dong wrote:
> The gt wedged could be triggered by missing guc firmware file, HW not
> working, etc. Once triggered, it means all gt usage is dead, therefore we
> can't enable pxp under this fatal error condition.
> 
> 
alan:skip
alan: this looks good (as per our offline review/discussion),
we dont mess with the current driver startup sequence (i.e. pxp
failures can never pull down the driver probing and will not
generate warnings or errors). Also, if something does break for PXP,
we only do a drm_debug if the failure returned is not -ENODEV
(since -ENODEV can happen on the majority of cases with
legacy products or with non-PXP kernel configs):

Reviewed-by: Alan Previn 


Re: [Intel-gfx] [PATCH] drm/i915: Initialize residency registers earlier

2023-11-13 Thread Teres Alexis, Alan Previn
On Mon, 2023-10-30 at 16:45 -0700, Belgaumkar, Vinay wrote:
alan:skip
> +++ b/drivers/gpu/drm/i915/gt/intel_rc6.c
> @@ -608,11 +608,13 @@ void intel_rc6_init(struct intel_rc6 *rc6)
>   /* Disable runtime-pm until we can save the GPU state with rc6 pctx */
>   rpm_get(rc6);
>  
> - if (!rc6_supported(rc6))
> - return;
> -
>   rc6_res_reg_init(rc6);
>  
> + if (!rc6_supported(rc6)) {
> + rpm_put(rc6);
> + return;
> + }
> +
>   if (IS_CHERRYVIEW(i915))
>   err = chv_rc6_init(rc6);
>   else if (IS_VALLEYVIEW(i915))

alan: as far as the bug this patch is addressing  (i.e. ensuring that
intel_rc6_print_residency has valid rc6.res_reg values for correct dprc
debugfs output when rc6 is disabled) and release the rpm, this looks good
to me.

However, when looking at the other code flows around the intel_rc6_init/fini
and intel_rc6_enable/disable, i must point out that the calls to rpm_get
and rpm_put from these functions don't seem to be designed with proper
mirror-ing. For example during driver startup, intel_rc6_init (which is called
by intel_gt_pm_init) calls rpm_get at the start but doesn't call rpm_put
before it returns. But back up the callstack in intel_gt_init,
after intel_gt_pm_init, a couple of subsystems get intialized before 
intel_gt_resume
is called - which in turn calls intel_rc6_enable which does the rpm_put at its 
end.
However before that get and put, i see several error paths that trigger cleanups
(leading eventually to driver load failure), but i think some cases are 
completely
missing the put_rpm that intel_rc6_init took. Additionally, the function names 
of
rc6_init and __get_rc6 inside i915_pmu.c seems to be confusing although static.
I wish those were named pmu_rc6_init and __pmu_rc6_init and etc.

Anyways, as per offline conversation, we are not trying to solve every
bug and design gap in this patch but just one specific bug fix. So as
per the agreed condition that we create a separate internal issue
to address this "lack of a clean mirrored-function design of rpm_get/put
across the rc6 startup sequences", here is my rb:

Reviewed-by: Alan Previn 



Re: [Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending

2023-11-13 Thread Teres Alexis, Alan Previn
On Wed, 2023-10-25 at 13:58 +0100, Tvrtko Ursulin wrote:
> On 04/10/2023 18:59, Teres Alexis, Alan Previn wrote:
> > On Thu, 2023-09-28 at 13:46 +0100, Tvrtko Ursulin wrote:
> > > On 27/09/2023 17:36, Teres Alexis, Alan Previn wrote:
alan:snip
> > > It is not possible to wait for lost G2H in something like
> > > intel_uc_suspend() and simply declare "bad things happened" if it times
> > > out there, and forcibly clean it all up? (Which would include releasing
> > > all the abandoned pm refs, so this patch wouldn't be needed.)
> > > 
> > alan: I'm not sure if intel_uc_suspend should be held up by gt-level wakeref
> > check unless huc/guc/gsc-uc are the only ones ever taking a gt wakeref.
> > 
> > As we already know, what we do know from a uc-perspective:
> > -  ensure the outstanding guc related workers is flushed which we didnt 
> > before
> > (addressed by patch #1).
> > - any further late H2G-SchedDisable is not leaking wakerefs when calling H2G
> > and not realizing it failed (addressed by patch #2).
> > - (we already), "forcibly clean it all up" at the end of the 
> > intel_uc_suspend
> > when we do the guc reset and cleanup all guc-ids. (pre-existing upstream 
> > code)
> > - we previously didnt have a coherrent guarantee that "this is the end" 
> > i.e. no
> > more new request after intel_uc_suspend. I mean by code logic, we thought 
> > we did
> > (thats why intel_uc_suspend ends wth a guc reset), but we now know 
> > otherwise.
> > So we that fix by adding the additional rcu_barrier (also part of patch #2).
> 
> It is not clear to me from the above if that includes cleaning up the 
> outstanding CT replies or no. But anyway..
alan: Apologies, i should have made it clearer by citing the actual function
name calling out the steps of interest: So if you look at the function
"intel_gt_suspend_late", it calls "intel_uc_suspend" which in turn calls 
"intel_guc_suspend" which does a soft reset onto guc firmware - so after that
we can assume all outstanding G2Hs are done. Going back up the stack,
intel_gt_suspend_late finally calls gt_sanitize which calls 
"intel_uc_reset_prepare"
which calls "intel_guc_submission_reset_prepare" which calls
"scrub_guc_desc_for_outstanding_g2h" which does what we are looking for for all
types of outstanding G2H. As per prior review comments, besides closing the race
condition, we were missing an rcu_barrier (which caused new requests to come in 
way
late). So we have resolved both in Patch #2.

> > That said, patch-3 is NOT fixing a bug in guc -its about "if we ever have
> > a future racy gt-wakeref late-leak somewhere - no matter which subsystem
> > took it (guc is not the only subsystem taking gt wakerefs), we at
> > least don't hang forever in this code. Ofc, based on that, even without
> > patch-3 i am confident the issue is resolved anyway.
> > So we could just drop patch-3 is you prefer?
> 
> .. given this it does sound to me that if you are confident patch 3 
> isn't fixing anything today that it should be dropped.
alan: I won't say its NOT fixing anything - i am saying it's not fixing
this specific bug where we have the outstanding G2H from a context destruction
operation that got dropped. I am okay with dropping this patch in the next rev
but shall post a separate stand alone version of Patch3 - because I believe
all other i915 subsystems that take runtime-pm's DO NOT wait forever when 
entering
suspend - but GT is doing this. This means if there ever was a bug introduced,
it would require serial port or ramoops collection to debug. So i think such a
patch, despite not fixing this specific bug will be very helpful for 
debuggability
of future issues. After all, its better to fail our suspend when we have a bug
rather than to hang the kernel forever.





Re: [Intel-gfx] [PATCH] drm/i915: Skip pxp init if gt is wedged

2023-10-31 Thread Teres Alexis, Alan Previn
On Fri, 2023-10-27 at 10:13 +0300, Jani Nikula wrote:
> On Thu, 26 Oct 2023, Zhanjun Dong  wrote:
> 
alan:snip
> I'll note that nobody checks intel_pxp_init() return status, so this
> silently skips PXP.
> 
> BR,
> Jani.

alan:snip
> > +   if (intel_gt_is_wedged(gt))
> > +   return -ENODEV;
> > +

alan: wondering if we can add a drm_dbg in the caller of intel_pxp_init and
use a unique return value for the case of gt_is_wedged (for example: -ENXIO.).
As we know gt being wedged at startup basically means all gt usage is dead
and therefore we cant enable PXP (along with everything else that needs 
submission/
guc/ etc). With a drm-debug in the caller that prints that return value, it
helps us to differentiate between gt-is-wedged vs platform config doesnt support
PXP. However, this would mean new drm-debug 'noise' for platforms that i915 just
doesn't support PXP on at all which would be okay if dont use drm_warn or 
drm_err
and use 'softer' message like "PXP skipped with %d".

Please treat above comment as a "nit" - i.e. existing patch is good enough for 
me...
(after addressing Jani's request for more commit message info). ...alan




[Intel-gfx] [PATCH v5 2/3] drm/i915/guc: Close deregister-context race against CT-loss

2023-10-13 Thread Alan Previn
If we are at the end of suspend or very early in resume
its possible an async fence signal (via rcu_call) is triggered
to free_engines which could lead us to the execution of
the context destruction worker (after a prior worker flush).

Thus, when suspending, insert rcu_barriers at the start
of i915_gem_suspend (part of driver's suspend prepare) and
again in i915_gem_suspend_late so that all such cases have
completed and context destruction list isn't missing anything.

In destroyed_worker_func, close the race against CT-loss
by checking that CT is enabled before calling into
deregister_destroyed_contexts.

Based on testing, guc_lrc_desc_unpin may still race and fail
as we traverse the GuC's context-destroy list because the
CT could be disabled right before calling GuC's CT send function.

We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring workloads that render
something onscreen is continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).

In such a case, we need to unroll the entire process because
guc-lrc-unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. Without the unroll, the taken wakeref is leaked and will
cascade into a kernel hang later at the tail end of suspend in
this function:

   intel_wakeref_wait_for_idle(>->wakeref)
   (called by) - intel_gt_pm_wait_for_idle
   (called by) - wait_for_suspend

Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_-
contexts if guc_lrc_desc_unpin fails due to CT send falure.
When unrolling, keep the context in the GuC's destroy-list so
it can get picked up on the next destroy worker invocation
(if suspend aborted) or get fully purged as part of a GuC
sanitization (end of suspend) or a reset flow.

Signed-off-by: Alan Previn 
Signed-off-by: Anshuman Gupta 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 81 ---
 2 files changed, 80 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 0d812f4d787d..3b27218aabe2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -28,6 +28,13 @@ void i915_gem_suspend(struct drm_i915_private *i915)
GEM_TRACE("%s\n", dev_name(i915->drm.dev));
 
intel_wakeref_auto(&i915->runtime_pm.userfault_wakeref, 0);
+   /*
+* On rare occasions, we've observed the fence completion triggers
+* free_engines asynchronously via rcu_call. Ensure those are done.
+* This path is only called on suspend, so it's an acceptable cost.
+*/
+   rcu_barrier();
+
flush_workqueue(i915->wq);
 
/*
@@ -160,6 +167,9 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
 * machine in an unusable condition.
 */
 
+   /* Like i915_gem_suspend, flush tasks staged from fence triggers */
+   rcu_barrier();
+
for_each_gt(gt, i915, i)
intel_gt_suspend_late(gt);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a5b68f77e494..9806b33c8561 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -235,6 +235,13 @@ set_context_destroyed(struct intel_context *ce)
ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline void
+clr_context_destroyed(struct intel_context *ce)
+{
+   lockdep_assert_held(&ce->guc_state.lock);
+   ce->guc_state.sched_state &= ~SCHED_STATE_DESTROYED;
+}
+
 static inline bool context_pending_disable(struct intel_context *ce)
 {
return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE;
@@ -612,6 +619,8 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
 u32 g2h_len_dw,
 bool loop)
 {
+   int ret;
+
/*
 * We always loop when a send requires a reply (i.e. g2h_len_dw > 0),
 * so we don't handle the case where we don't get a reply because we
@@ -622,7 +631,11 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
if (g2h_len_dw)
atomic_inc(&guc->outstanding_submission_g2h);
 
-   return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   ret = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   if (ret)
+   atomic_dec(&guc->outstanding_submission_g2h);
+
+   return ret;
 }
 
 int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
@@ -3205,1

[Intel-gfx] [PATCH v5 0/3] Resolve suspend-resume racing with GuC destroy-context-worker

2023-10-13 Thread Alan Previn
This series is the result of debugging issues root caused to
races between the GuC's destroyed_worker_func being triggered
vs repeating suspend-resume cycles with concurrent delayed
fence signals for engine-freeing.

The reproduction steps require that an app is launched right
before the start of the suspend cycle where it creates a
new gem context and submits a tiny workload that would
complete in the middle of the suspend cycle. However this
app uses dma-buffer sharing or dma-fence with non-GPU
objects or signals that eventually triggers a FENCE_FREE
via__i915_sw_fence_notify that connects to engines_notify ->
free_engines_rcu -> intel_context_put ->
kref_put(&ce->ref..) that queues the worker after the GuCs
CTB has been disabled (i.e. after i915-gem's suspend-late).

This sequence is a corner-case and required repeating this
app->suspend->resume cycle ~1500 times across 4 identical
systems to see it once. That said, based on above callstack,
it is clear that merely flushing the context destruction worker,
which is obviously missing and needed, isn't sufficient.

Because of that, this series adds additional patches besides
the obvious (Patch #1) flushing of the worker during the
suspend flows. It also includes (Patch #2) closing a race
between sending the context-deregistration H2G vs the CTB
getting disabled in the midst of it (by detecing the failure
and unrolling the guc-lrc-unpin flow) and (Patch #32) not
infinitely waiting in intel_gt_pm_wait_timeout_for_idle
when in the suspend-flow.

NOTE: We are still observing one more wakeref leak from gt
but not necessarilty guc. We are still debugging this so
this series will very likely get rev'd up again.

Changes from prior revs:
   v4: - In Patch #2, change the position of the calls into
 rcu_barrier based on latest testing data. (Alan/Anshuman).
   - In Patch #3, fix the timeout value selection for the
 final gt-pm idle-wait that was incorrectly using a 'ns'
 #define as a milisec timeout.
   v3: - In Patch #3, when deregister_context fails, instead
 of calling intel_gt_pm_put(that might sleep), call
 __intel_wakeref_put (without ASYNC flag) (Rodrigo/Anshuman).
   - In wait_for_suspend add an rcu_barrier before we
 proceed to wait for idle. (Anshuman)
   v2: - Patch #2 Restructure code in guc_lrc_desc_unpin so
 it's more readible to differentiate (1)direct guc-id
 cleanup ..vs (2) sending the H2G ctx-destroy action ..
 vs (3) the unrolling steps if the H2G fails.
   - Patch #2 Add a check to close the race sooner by checking
 for intel_guc_is_ready from destroyed_worker_func.
   - Patch #2 When guc_submission_send_busy_loop gets a
 failure from intel_guc_send_busy_loop, we need to undo
 i.e. decrement the outstanding_submission_g2h.
   - Patch #3 In wait_for_suspend, fix checking of return from
 intel_gt_pm_wait_timeout_for_idle to now use -ETIMEDOUT
 and add documentation for intel_wakeref_wait_for_idle.
 (Rodrigo).

Alan Previn (3):
  drm/i915/guc: Flush context destruction worker at suspend
  drm/i915/guc: Close deregister-context race against CT-loss
  drm/i915/gt: Timeout when waiting for idle in suspending

 drivers/gpu/drm/i915/gem/i915_gem_pm.c| 10 +++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |  7 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h |  7 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 86 ---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  2 +
 drivers/gpu/drm/i915/intel_wakeref.c  | 14 ++-
 drivers/gpu/drm/i915/intel_wakeref.h  |  6 +-
 9 files changed, 116 insertions(+), 20 deletions(-)


base-commit: 13e6d61391ba3da0e26ce5c788d1fe26e036294c
-- 
2.39.0



[Intel-gfx] [PATCH v5 1/3] drm/i915/guc: Flush context destruction worker at suspend

2023-10-13 Thread Alan Previn
When suspending, flush the context-guc-id
deregistration worker at the final stages of
intel_gt_suspend_late when we finally call gt_sanitize
that eventually leads down to __uc_sanitize so that
the deregistration worker doesn't fire off later as
we reset the GuC microcontroller.

Signed-off-by: Alan Previn 
Reviewed-by: Rodrigo Vivi 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2cce5ec1ff00..a5b68f77e494 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1610,6 +1610,11 @@ static void guc_flush_submissions(struct intel_guc *guc)
spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
+void intel_guc_submission_flush_work(struct intel_guc *guc)
+{
+   flush_work(&guc->submission_state.destroyed_worker);
+}
+
 static void guc_flush_destroyed_contexts(struct intel_guc *guc);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index c57b29cdb1a6..b6df75622d3b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -38,6 +38,8 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
   bool interruptible,
   long timeout);
 
+void intel_guc_submission_flush_work(struct intel_guc *guc);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
return guc->submission_supported;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 98b103375b7a..eb3554cb5ea4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -693,6 +693,8 @@ void intel_uc_suspend(struct intel_uc *uc)
return;
}
 
+   intel_guc_submission_flush_work(guc);
+
with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
err = intel_guc_suspend(guc);
if (err)
-- 
2.39.0



[Intel-gfx] [PATCH v5 3/3] drm/i915/gt: Timeout when waiting for idle in suspending

2023-10-13 Thread Alan Previn
When suspending, add a timeout when calling
intel_gt_pm_wait_for_idle else if we have a lost
G2H event that holds a wakeref (which would be
indicative of a bug elsewhere in the driver),
driver will at least complete the suspend-resume
cycle, (albeit not hitting all the targets for
low power hw counters), instead of hanging in the kernel.

Signed-off-by: Alan Previn 
Reviewed-by: Rodrigo Vivi 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |  7 ++-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h |  7 ++-
 drivers/gpu/drm/i915/intel_wakeref.c  | 14 ++
 drivers/gpu/drm/i915/intel_wakeref.h  |  6 --
 5 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 179d9546865b..4b45a8f7fe5a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -686,7 +686,7 @@ void intel_engines_release(struct intel_gt *gt)
if (!engine->release)
continue;
 
-   intel_wakeref_wait_for_idle(&engine->wakeref);
+   intel_wakeref_wait_for_idle(&engine->wakeref, 0);
GEM_BUG_ON(intel_engine_pm_is_awake(engine));
 
engine->release(engine);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index f5899d503e23..25cb39ba9fdf 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -306,6 +306,8 @@ int intel_gt_resume(struct intel_gt *gt)
 
 static void wait_for_suspend(struct intel_gt *gt)
 {
+   int final_timeout_ms = (I915_GT_SUSPEND_IDLE_TIMEOUT * 10);
+
if (!intel_gt_pm_is_awake(gt))
return;
 
@@ -318,7 +320,10 @@ static void wait_for_suspend(struct intel_gt *gt)
intel_gt_retire_requests(gt);
}
 
-   intel_gt_pm_wait_for_idle(gt);
+   /* we are suspending, so we shouldn't be waiting forever */
+   if (intel_gt_pm_wait_timeout_for_idle(gt, final_timeout_ms) == 
-ETIMEDOUT)
+   gt_warn(gt, "bailing from %s after %d milisec timeout\n",
+   __func__, final_timeout_ms);
 }
 
 void intel_gt_suspend_prepare(struct intel_gt *gt)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index b1eeb5b33918..1757ca4c3077 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -68,7 +68,12 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
-   return intel_wakeref_wait_for_idle(>->wakeref);
+   return intel_wakeref_wait_for_idle(>->wakeref, 0);
+}
+
+static inline int intel_gt_pm_wait_timeout_for_idle(struct intel_gt *gt, int 
timeout_ms)
+{
+   return intel_wakeref_wait_for_idle(>->wakeref, timeout_ms);
 }
 
 void intel_gt_pm_init_early(struct intel_gt *gt);
diff --git a/drivers/gpu/drm/i915/intel_wakeref.c 
b/drivers/gpu/drm/i915/intel_wakeref.c
index 623a69089386..f2611c65246b 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.c
+++ b/drivers/gpu/drm/i915/intel_wakeref.c
@@ -113,14 +113,20 @@ void __intel_wakeref_init(struct intel_wakeref *wf,
 "wakeref.work", &key->work, 0);
 }
 
-int intel_wakeref_wait_for_idle(struct intel_wakeref *wf)
+int intel_wakeref_wait_for_idle(struct intel_wakeref *wf, int timeout_ms)
 {
-   int err;
+   int err = 0;
 
might_sleep();
 
-   err = wait_var_event_killable(&wf->wakeref,
- !intel_wakeref_is_active(wf));
+   if (!timeout_ms)
+   err = wait_var_event_killable(&wf->wakeref,
+ !intel_wakeref_is_active(wf));
+   else if (wait_var_event_timeout(&wf->wakeref,
+   !intel_wakeref_is_active(wf),
+   msecs_to_jiffies(timeout_ms)) < 1)
+   err = -ETIMEDOUT;
+
if (err)
return err;
 
diff --git a/drivers/gpu/drm/i915/intel_wakeref.h 
b/drivers/gpu/drm/i915/intel_wakeref.h
index ec881b097368..302694a780d2 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.h
+++ b/drivers/gpu/drm/i915/intel_wakeref.h
@@ -251,15 +251,17 @@ __intel_wakeref_defer_park(struct intel_wakeref *wf)
 /**
  * intel_wakeref_wait_for_idle: Wait until the wakeref is idle
  * @wf: the wakeref
+ * @timeout_ms: Timeout in ms, 0 means never timeout.
  *
  * Wait for the earlier asynchronous release of the wakeref. Note
  * this will wait for any third party as well, so make sure you only wait
  * when you have control over the wakeref and trust no one else is acquiring
  * it.
  *
- * Return: 0 on success, error code if killed.
+

Re: [Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending

2023-10-04 Thread Teres Alexis, Alan Previn
On Thu, 2023-09-28 at 13:46 +0100, Tvrtko Ursulin wrote:
> On 27/09/2023 17:36, Teres Alexis, Alan Previn wrote:
> > Thanks for taking the time to review this Tvrtko, replies inline below.
alan:snip

> > > 
> > > Main concern is that we need to be sure there are no possible
> > > ill-effects, like letting the GPU/GuC scribble on some memory we
> > > unmapped (or will unmap), having let the suspend continue after timing
> > > out, and not perhaps doing the forced wedge like wait_for_suspend() does
> > > on the existing timeout path.
> > alan: this will not happen because the held wakeref is never force-released
> > after the timeout - so what happens is the kernel would bail the suspend.
> 
> How does it know to fail the suspend when there is no error code 
> returned with this timeout? Maybe a stupid question.. my knowledge of 
> suspend-resume paths was not great even before I forgot it all.
alan:Tvrtko, you and I both sir. (apologies for the tardy response yet again 
busy week).
So i did trace back the gt->wakeref before i posted these patches and 
discovered that
runtime get/put call, i believe that the first 'get' leads to 
__intel_wakeref_get_first
which calls intel_runtime_pm_get via rpm_get helper and eventually executes a
pm_runtime_get_sync(rpm->kdev); (hanging off i915). (ofc, there is a 
corresponding
for '_put_last') - so non-first, non-last increases the counter for the gt...
but this last miss will mean kernel knows i915 hasnt 'put' everything.

alan:snip
> > 
> > Recap: so in both cases (original vs this patch), if we had a buggy 
> > gt-wakeref leak,
> > we dont get invalid guc-accesses, but without this patch, we wait forever,
> > and with this patch, we get some messages and eventually bail the suspend.
> 
> It is not possible to wait for lost G2H in something like 
> intel_uc_suspend() and simply declare "bad things happened" if it times 
> out there, and forcibly clean it all up? (Which would include releasing 
> all the abandoned pm refs, so this patch wouldn't be needed.)
> 
alan: I'm not sure if intel_uc_suspend should be held up by gt-level wakeref
check unless huc/guc/gsc-uc are the only ones ever taking a gt wakeref. 

As we already know, what we do know from a uc-perspective:
-  ensure the outstanding guc related workers is flushed which we didnt before
(addressed by patch #1).
- any further late H2G-SchedDisable is not leaking wakerefs when calling H2G
and not realizing it failed (addressed by patch #2).
- (we already), "forcibly clean it all up" at the end of the intel_uc_suspend
when we do the guc reset and cleanup all guc-ids. (pre-existing upstream code)
- we previously didnt have a coherrent guarantee that "this is the end" i.e. no
more new request after intel_uc_suspend. I mean by code logic, we thought we did
(thats why intel_uc_suspend ends wth a guc reset), but we now know otherwise.
So we that fix by adding the additional rcu_barrier (also part of patch #2).

That said, patch-3 is NOT fixing a bug in guc -its about "if we ever have
a future racy gt-wakeref late-leak somewhere - no matter which subsystem
took it (guc is not the only subsystem taking gt wakerefs), we at
least don't hang forever in this code. Ofc, based on that, even without
patch-3 i am confident the issue is resolved anyway.
So we could just drop patch-3 is you prefer?

...alan


Re: [Intel-gfx] [PATCH v4 2/3] drm/i915/guc: Close deregister-context race against CT-loss

2023-10-04 Thread Teres Alexis, Alan Previn
On Wed, 2023-10-04 at 06:34 +, Gupta, Anshuman wrote:
> 
> > -Original Message-
> > From: Teres Alexis, Alan Previn  > @@ -289,6 +289,13 @@ int intel_gt_resume(struct intel_gt *gt)
> > 
> >  static void wait_for_suspend(struct intel_gt *gt)  {
> > +   /*
> > +* On rare occasions, we've observed the fence completion trigger
> > +* free_engines asynchronously via rcu_call. Ensure those are done.
> > +* This path is only called on suspend, so it's an acceptable cost.
> > +*/
> > +   rcu_barrier();
> Let's add the barrier after the end of prepare suspend and at start of late 
> suspend.
> To make sure we don't have any async destroy from any user request or any 
> internal  kmd request during i915 suspend?
> Br,
> Anshuman Gupta.
alan: some thoughts: actuallly wait_fos_suspend is being called at from both 
intel_gt_suspend_prepare
and intel_gt_suspend_late. so putting the barrier in above location would get 
hit for both steps.
However, because wait_for_suspend may optionally wedge the system if 
outstanding requests were stuck
for more than I915_GT_SUSPEND_IDLE_TIMEOUT, wouldnt it be better to add the 
barrier before that
check (i.e. in above location) as opposed to after the return from 
wait_for_suspend at the end
of suspend_prepare - which would defeat the purpose of even checking that 
intel_gt_wait_for_idle
from within wait_for_suspend? (which is existing code). Perhaps we need to get 
one last test on this?



Re: [Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending

2023-09-27 Thread Teres Alexis, Alan Previn
Thanks for taking the time to review this Tvrtko, replies inline below.

On Wed, 2023-09-27 at 10:02 +0100, Tvrtko Ursulin wrote:
> On 26/09/2023 20:05, Alan Previn wrote:
> > When suspending, add a timeout when calling
> > intel_gt_pm_wait_for_idle else if we have a lost
> > G2H event that holds a wakeref (which would be
> > indicative of a bug elsewhere in the driver),
> > driver will at least complete the suspend-resume
> > cycle, (albeit not hitting all the targets for
> > low power hw counters), instead of hanging in the kernel.
alan:snip

> >   {
> > +   int timeout_ms = CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT ? : 1;
> 
> CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT is in ns so assigning it to _ms is 
> a bit to arbitrary.
> 
> Why not the existing I915_GT_SUSPEND_IDLE_TIMEOUT for instance?
alan: good catch, my bad. However I915_GT_SUSPEND_IDLE_TIMEOUT is only half a 
sec
which i feel is too quick. I guess i could create a new #define or a multiple
of I915_GT_SUSPEND_IDLE_TIMEOUT?

> > /*
> >  * On rare occasions, we've observed the fence completion trigger
> >  * free_engines asynchronously via rcu_call. Ensure those are done.
> > @@ -308,7 +309,10 @@ static void wait_for_suspend(struct intel_gt *gt)
> > intel_gt_retire_requests(gt);
> > }
> >   
> > -   intel_gt_pm_wait_for_idle(gt);
> > +   /* we are suspending, so we shouldn't be waiting forever */
> > +   if (intel_gt_pm_wait_timeout_for_idle(gt, timeout_ms) == -ETIMEDOUT)
> > +   gt_warn(gt, "bailing from %s after %d milisec timeout\n",
> > +   __func__, timeout_ms);
> 
> Does the timeout in intel_gt_pm_wait_timeout_for_idle always comes in 
> pair with the timeout first in intel_gt_wait_for_idle?
alan: Not necessarily, ... IIRC in nearly all cases, the first wait call
complete in time (i.e. no more ongoing work) and the second wait
does wait only if the last bit of work 'just-finished', then that second
wait may take a touch bit longer because of possible async gt-pm-put calls.
(which appear in several places in the driver as part of regular runtime
operation including request completion). NOTE: this not high end workloads
so the
> 
> Also, is the timeout here hit from the intel_gt_suspend_prepare, 
> intel_gt_suspend_late, or can be both?
> 
> Main concern is that we need to be sure there are no possible 
> ill-effects, like letting the GPU/GuC scribble on some memory we 
> unmapped (or will unmap), having let the suspend continue after timing 
> out, and not perhaps doing the forced wedge like wait_for_suspend() does 
> on the existing timeout path.
alan: this will not happen because the held wakeref is never force-released
after the timeout - so what happens is the kernel would bail the suspend.
> 
> Would it be possible to handle the lost G2H events directly in the 
> respective component instead of here? Like apply the timeout during the 
> step which explicitly idles the CT for suspend (presumably that 
> exists?), and so cleanup from there once declared a lost event.
alan: So yes, we don't solve the problem with this patch - Patch#2
is addressing the root cause - and verification is still ongoing - because its
hard to thoroughly test / reproduce. (i.e. Patch 2 could get re-rev'd).

Intent of this patch, is to be informed that gt-pm wait failed in prep for
suspending and kernel will eventually bail the suspend, that's all.
Because without this timed-out version of gt-pm-wait, if we did have a leaky
gt-wakeref, kernel will hang here forever and we will need to get serial
console or ramoops to eventually discover the kernel's cpu hung error with
call-stack pointing to this location. So the goal here is to help speed up
future debug process (if let's say some future code change with another
async gt-pm-put came along and caused new problems after Patch #2 resolved
this time).

Recap: so in both cases (original vs this patch), if we had a buggy gt-wakeref 
leak,
we dont get invalid guc-accesses, but without this patch, we wait forever,
and with this patch, we get some messages and eventually bail the suspend.

alan:snip



[Intel-gfx] [PATCH v4 3/3] drm/i915/gt: Timeout when waiting for idle in suspending

2023-09-26 Thread Alan Previn
When suspending, add a timeout when calling
intel_gt_pm_wait_for_idle else if we have a lost
G2H event that holds a wakeref (which would be
indicative of a bug elsewhere in the driver),
driver will at least complete the suspend-resume
cycle, (albeit not hitting all the targets for
low power hw counters), instead of hanging in the kernel.

Signed-off-by: Alan Previn 
Reviewed-by: Rodrigo Vivi 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |  6 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h |  7 ++-
 drivers/gpu/drm/i915/intel_wakeref.c  | 14 ++
 drivers/gpu/drm/i915/intel_wakeref.h  |  6 --
 5 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 84a75c95f3f7..9c6151b78e1d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -687,7 +687,7 @@ void intel_engines_release(struct intel_gt *gt)
if (!engine->release)
continue;
 
-   intel_wakeref_wait_for_idle(&engine->wakeref);
+   intel_wakeref_wait_for_idle(&engine->wakeref, 0);
GEM_BUG_ON(intel_engine_pm_is_awake(engine));
 
engine->release(engine);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index 59b5658a17fb..820217c06dc7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -289,6 +289,7 @@ int intel_gt_resume(struct intel_gt *gt)
 
 static void wait_for_suspend(struct intel_gt *gt)
 {
+   int timeout_ms = CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT ? : 1;
/*
 * On rare occasions, we've observed the fence completion trigger
 * free_engines asynchronously via rcu_call. Ensure those are done.
@@ -308,7 +309,10 @@ static void wait_for_suspend(struct intel_gt *gt)
intel_gt_retire_requests(gt);
}
 
-   intel_gt_pm_wait_for_idle(gt);
+   /* we are suspending, so we shouldn't be waiting forever */
+   if (intel_gt_pm_wait_timeout_for_idle(gt, timeout_ms) == -ETIMEDOUT)
+   gt_warn(gt, "bailing from %s after %d milisec timeout\n",
+   __func__, timeout_ms);
 }
 
 void intel_gt_suspend_prepare(struct intel_gt *gt)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index 6c9a46452364..5358acc2b5b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -68,7 +68,12 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
-   return intel_wakeref_wait_for_idle(>->wakeref);
+   return intel_wakeref_wait_for_idle(>->wakeref, 0);
+}
+
+static inline int intel_gt_pm_wait_timeout_for_idle(struct intel_gt *gt, int 
timeout_ms)
+{
+   return intel_wakeref_wait_for_idle(>->wakeref, timeout_ms);
 }
 
 void intel_gt_pm_init_early(struct intel_gt *gt);
diff --git a/drivers/gpu/drm/i915/intel_wakeref.c 
b/drivers/gpu/drm/i915/intel_wakeref.c
index 718f2f1b6174..383a37521415 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.c
+++ b/drivers/gpu/drm/i915/intel_wakeref.c
@@ -111,14 +111,20 @@ void __intel_wakeref_init(struct intel_wakeref *wf,
 "wakeref.work", &key->work, 0);
 }
 
-int intel_wakeref_wait_for_idle(struct intel_wakeref *wf)
+int intel_wakeref_wait_for_idle(struct intel_wakeref *wf, int timeout_ms)
 {
-   int err;
+   int err = 0;
 
might_sleep();
 
-   err = wait_var_event_killable(&wf->wakeref,
- !intel_wakeref_is_active(wf));
+   if (!timeout_ms)
+   err = wait_var_event_killable(&wf->wakeref,
+ !intel_wakeref_is_active(wf));
+   else if (wait_var_event_timeout(&wf->wakeref,
+   !intel_wakeref_is_active(wf),
+   msecs_to_jiffies(timeout_ms)) < 1)
+   err = -ETIMEDOUT;
+
if (err)
return err;
 
diff --git a/drivers/gpu/drm/i915/intel_wakeref.h 
b/drivers/gpu/drm/i915/intel_wakeref.h
index ec881b097368..302694a780d2 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.h
+++ b/drivers/gpu/drm/i915/intel_wakeref.h
@@ -251,15 +251,17 @@ __intel_wakeref_defer_park(struct intel_wakeref *wf)
 /**
  * intel_wakeref_wait_for_idle: Wait until the wakeref is idle
  * @wf: the wakeref
+ * @timeout_ms: Timeout in ms, 0 means never timeout.
  *
  * Wait for the earlier asynchronous release of the wakeref. Note
  * this will wait for any third party as well, so make sure you only wait
  * when you have control over the wakeref and trust no

[Intel-gfx] [PATCH v4 2/3] drm/i915/guc: Close deregister-context race against CT-loss

2023-09-26 Thread Alan Previn
If we are at the end of suspend or very early in resume
its possible an async fence signal (via rcu_call) is triggered
to free_engines which could lead us to the execution of
the context destruction worker (after a prior worker flush).

Thus, when suspending, insert an rcu_barrier at the start
of wait_for_suspend so that all such cases have completed
and context destruction list isn't missing anything.

In destroyed_worker_func, close the race against CT-loss
by checking that CT is enabled before calling into
deregister_destroyed_contexts.

Based on testing, guc_lrc_desc_unpin may still race and fail
as we traverse the GuC's context-destroy list because the
CT could be disabled right before calling GuC's CT send function.

We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring workloads that render
something onscreen is continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).

In such a case, we need to unroll the entire process because
guc-lrc-unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. Without the unroll, the taken wakeref is leaked and will
cascade into a kernel hang later at the tail end of suspend in
this function:

   intel_wakeref_wait_for_idle(>->wakeref)
   (called by) - intel_gt_pm_wait_for_idle
   (called by) - wait_for_suspend

Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_-
contexts if guc_lrc_desc_unpin fails due to CT send falure.
When unrolling, keep the context in the GuC's destroy-list so
it can get picked up on the next destroy worker invocation
(if suspend aborted) or get fully purged as part of a GuC
sanitization (end of suspend) or a reset flow.

Signed-off-by: Alan Previn 
Signed-off-by: Anshuman Gupta 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |  7 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 81 ---
 2 files changed, 77 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index 5a942af0a14e..59b5658a17fb 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -289,6 +289,13 @@ int intel_gt_resume(struct intel_gt *gt)
 
 static void wait_for_suspend(struct intel_gt *gt)
 {
+   /*
+* On rare occasions, we've observed the fence completion trigger
+* free_engines asynchronously via rcu_call. Ensure those are done.
+* This path is only called on suspend, so it's an acceptable cost.
+*/
+   rcu_barrier();
+
if (!intel_gt_pm_is_awake(gt))
return;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index fdd7179f502a..465baf7660d7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -235,6 +235,13 @@ set_context_destroyed(struct intel_context *ce)
ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline void
+clr_context_destroyed(struct intel_context *ce)
+{
+   lockdep_assert_held(&ce->guc_state.lock);
+   ce->guc_state.sched_state &= ~SCHED_STATE_DESTROYED;
+}
+
 static inline bool context_pending_disable(struct intel_context *ce)
 {
return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE;
@@ -612,6 +619,8 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
 u32 g2h_len_dw,
 bool loop)
 {
+   int ret;
+
/*
 * We always loop when a send requires a reply (i.e. g2h_len_dw > 0),
 * so we don't handle the case where we don't get a reply because we
@@ -622,7 +631,11 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
if (g2h_len_dw)
atomic_inc(&guc->outstanding_submission_g2h);
 
-   return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   ret = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   if (ret)
+   atomic_dec(&guc->outstanding_submission_g2h);
+
+   return ret;
 }
 
 int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
@@ -3205,12 +3218,13 @@ static void guc_context_close(struct intel_context *ce)
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 }
 
-static inline void guc_lrc_desc_unpin(struct intel_context *ce)
+static inline int guc_lrc_desc_unpin(struct intel_context *ce)
 {
struct intel_guc *guc = ce_to_guc(ce);
struct intel_gt *gt = guc_to_gt(guc);
unsigned long flags;
bool disabled;
+   int ret;
 
GEM_BUG_ON(!intel_gt_pm_is_awake(

[Intel-gfx] [PATCH v4 0/3] Resolve suspend-resume racing with GuC destroy-context-worker

2023-09-26 Thread Alan Previn
This series is the result of debugging issues root caused to
races between the GuC's destroyed_worker_func being triggered
vs repeating suspend-resume cycles with concurrent delayed
fence signals for engine-freeing.

The reproduction steps require that an app is launched right
before the start of the suspend cycle where it creates a
new gem context and submits a tiny workload that would
complete in the middle of the suspend cycle. However this
app uses dma-buffer sharing or dma-fence with non-GPU
objects or signals that eventually triggers a FENCE_FREE
via__i915_sw_fence_notify that connects to engines_notify ->
free_engines_rcu -> intel_context_put ->
kref_put(&ce->ref..) that queues the worker after the GuCs
CTB has been disabled (i.e. after i915-gem's suspend-late).

This sequence is a corner-case and required repeating this
app->suspend->resume cycle ~1500 times across 4 identical
systems to see it once. That said, based on above callstack,
it is clear that merely flushing the context destruction worker,
which is obviously missing and needed, isn't sufficient.

Because of that, this series adds additional patches besides
the obvious (Patch #1) flushing of the worker during the
suspend flows. It also includes (Patch #2) closing a race
between sending the context-deregistration H2G vs the CTB
getting disabled in the midst of it (by detecing the failure
and unrolling the guc-lrc-unpin flow) and (Patch #32) not
infinitely waiting in intel_gt_pm_wait_timeout_for_idle
when in the suspend-flow.

NOTE: We are still observing one more wakeref leak from gt
but not necessarilty guc. We are still debugging this so
this series will very likely get rev'd up again.

Changes from prior revs:
   v3: - In Patch #3, when deregister_context fails, instead
 of calling intel_gt_pm_put(that might sleep), call
 __intel_wakeref_put (without ASYNC flag) (Rodrigo/Anshuman).
   - In wait_for_suspend add an rcu_barrier before we
 proceed to wait for idle. (Anshuman)
   v2: - Patch #2 Restructure code in guc_lrc_desc_unpin so
 it's more readible to differentiate (1)direct guc-id
 cleanup ..vs (2) sending the H2G ctx-destroy action ..
 vs (3) the unrolling steps if the H2G fails.
   - Patch #2 Add a check to close the race sooner by checking
 for intel_guc_is_ready from destroyed_worker_func.
   - Patch #2 When guc_submission_send_busy_loop gets a
 failure from intel_guc_send_busy_loop, we need to undo
 i.e. decrement the outstanding_submission_g2h.
   - Patch #3 In wait_for_suspend, fix checking of return from
 intel_gt_pm_wait_timeout_for_idle to now use -ETIMEDOUT
 and add documentation for intel_wakeref_wait_for_idle.
 (Rodrigo).

Alan Previn (3):
  drm/i915/guc: Flush context destruction worker at suspend
  drm/i915/guc: Close deregister-context race against CT-loss
  drm/i915/gt: Timeout when waiting for idle in suspending

 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.c | 13 ++-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h |  7 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 86 ---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  2 +
 drivers/gpu/drm/i915/intel_wakeref.c  | 14 ++-
 drivers/gpu/drm/i915/intel_wakeref.h  |  6 +-
 8 files changed, 112 insertions(+), 20 deletions(-)


base-commit: a42554bf0755b80fdfb8e91ca35ae6835bb3534d
-- 
2.39.0



[Intel-gfx] [PATCH v4 1/3] drm/i915/guc: Flush context destruction worker at suspend

2023-09-26 Thread Alan Previn
When suspending, flush the context-guc-id
deregistration worker at the final stages of
intel_gt_suspend_late when we finally call gt_sanitize
that eventually leads down to __uc_sanitize so that
the deregistration worker doesn't fire off later as
we reset the GuC microcontroller.

Signed-off-by: Alan Previn 
Reviewed-by: Rodrigo Vivi 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ae3495a9c814..fdd7179f502a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1610,6 +1610,11 @@ static void guc_flush_submissions(struct intel_guc *guc)
spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
+void intel_guc_submission_flush_work(struct intel_guc *guc)
+{
+   flush_work(&guc->submission_state.destroyed_worker);
+}
+
 static void guc_flush_destroyed_contexts(struct intel_guc *guc);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index c57b29cdb1a6..b6df75622d3b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -38,6 +38,8 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
   bool interruptible,
   long timeout);
 
+void intel_guc_submission_flush_work(struct intel_guc *guc);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
return guc->submission_supported;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 98b103375b7a..eb3554cb5ea4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -693,6 +693,8 @@ void intel_uc_suspend(struct intel_uc *uc)
return;
}
 
+   intel_guc_submission_flush_work(guc);
+
with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
err = intel_guc_suspend(guc);
if (err)
-- 
2.39.0



Re: [Intel-gfx] [PATCH v3 2/3] drm/i915/guc: Close deregister-context race against CT-loss

2023-09-26 Thread Teres Alexis, Alan Previn
On Thu, 2023-09-14 at 11:34 -0400, Vivi, Rodrigo wrote:
> On Sat, Sep 09, 2023 at 08:58:45PM -0700, Alan Previn wrote:
alan:snip
> 
> > +   /* Change context state to destroyed and get gt-pm */
> > +   __intel_gt_pm_get(gt);
> > +   set_context_destroyed(ce);
> > +   clr_context_registered(ce);
> > +
> > +   ret = deregister_context(ce, ce->guc_id.id);
> > +   if (ret) {
> > +   /* Undo the state change and put gt-pm if that failed */
> > +   set_context_registered(ce);
> > +   clr_context_destroyed(ce);
> > +   intel_gt_pm_put(gt);
> 
> This is a might_sleep inside a spin_lock! Are you 100% confident no WARN
> was seeing during the tests indicated in the commit msg?
> 
> > +   }
> > +   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> > +   return 0;
> 
> If you are always returning 0, there's no pointing in s/void/int...
alan: Hi Rodrigo - i actually forget that i need the error value returned
so that _guc_context_destroy can put the context back into the
guc->submission_state.destroyed_contexts linked list. So i clearly missed 
returning
the error in the case deregister_context fails.
> 
> >  }
> >  
> >  static void __guc_context_destroy(struct intel_context *ce)
> > @@ -3268,7 +3296,22 @@ static void deregister_destroyed_contexts(struct 
> > intel_guc *guc)
> > if (!ce)
> > break;
> >  
> > -   guc_lrc_desc_unpin(ce);
> > +   if (guc_lrc_desc_unpin(ce)) {
> > +   /*
> > +* This means GuC's CT link severed mid-way which could 
> > happen
> > +* in suspend-resume corner cases. In this case, put the
> > +* context back into the destroyed_contexts list which 
> > will
> > +* get picked up on the next context deregistration 
> > event or
> > +* purged in a GuC sanitization event 
> > (reset/unload/wedged/...).
> > +*/
> > +   spin_lock_irqsave(&guc->submission_state.lock, flags);
> > +   list_add_tail(&ce->destroyed_link,
> > + 
> > &guc->submission_state.destroyed_contexts);
> > +   spin_unlock_irqrestore(&guc->submission_state.lock, 
> > flags);
alan:snip


Re: [Intel-gfx] [PATCH v3 2/3] drm/i915/guc: Close deregister-context race against CT-loss

2023-09-26 Thread Teres Alexis, Alan Previn
> 


> > alan:snip
> > > > @@ -3279,6 +3322,17 @@ static void destroyed_worker_func(struct
> > work_struct *w)
> > > > struct intel_gt *gt = guc_to_gt(guc);
> > > > int tmp;
> > > > 
> > > > +   /*
> > > > +* In rare cases we can get here via async context-free 
> > > > fence-signals
> > that
> > > > +* come very late in suspend flow or very early in resume 
> > > > flows. In
> > these
> > > > +* cases, GuC won't be ready but just skipping it here is fine 
> > > > as these
> > > > +* pending-destroy-contexts get destroyed totally at GuC reset 
> > > > time at
> > the
> > > > +* end of suspend.. OR.. this worker can be picked up later on 
> > > > the next
> > > > +* context destruction trigger after resume-completes
> > > 
> > > who is triggering the work queue again?
> > 
> > alan: short answer: we dont know - and still hunting this (getting closer 
> > now..
> > using task tgid str-name lookups).
> > in the few times I've seen it, the callstack I've seen looked like this:
> > 
> > [33763.582036] Call Trace:
> > [33763.582038]  
> > [33763.582040]  dump_stack_lvl+0x69/0x97 [33763.582054]
> > guc_context_destroy+0x1b5/0x1ec [33763.582067]
> > free_engines+0x52/0x70 [33763.582072]  rcu_do_batch+0x161/0x438
> > [33763.582084]  rcu_nocb_cb_kthread+0xda/0x2d0 [33763.582093]
> > kthread+0x13a/0x152 [33763.582102]  ?
> > rcu_nocb_gp_kthread+0x6a7/0x6a7 [33763.582107]  ? css_get+0x38/0x38
> > [33763.582118]  ret_from_fork+0x1f/0x30 [33763.582128]  

> Alan above trace is not due to missing GT wakeref, it is due to a 
> intel_context_put(),
> Which  called asynchronously by rcu_call(__free_engines), we need insert 
> rcu_barrier() to flush all
> rcu callback in late suspend.
> 
> Thanks,
> Anshuman.
> > 
Thanks Anshuman for following up with the ongoing debug. I shall re-rev 
accordingly.
...alan


Re: [Intel-gfx] [PATCH v3 2/3] drm/i915/guc: Close deregister-context race against CT-loss

2023-09-22 Thread Teres Alexis, Alan Previn
(cc Anshuman who is working directly with the taskforce debugging this)

Thanks again for taking the time to review this patch.
Apologies for the tardiness, rest assured debug is still ongoing.

As mentioned in prior comments, the signatures and frequency are
now different compared to without the patches of this series.
We are still hunting for data as we are suspecting a different wakeref
still being held with the same trigger event despite.

That said, we will continue to rebase / update this series but hold off on 
actual
merge until we can be sure we have all the issues resolved.

On Thu, 2023-09-14 at 11:34 -0400, Vivi, Rodrigo wrote:
> On Sat, Sep 09, 2023 at 08:58:45PM -0700, Alan Previn wrote:
> > If we are at the end of suspend or very early in resume
> > its possible an async fence signal could lead us to the
alan:snip


> > @@ -3188,19 +3202,33 @@ static inline void guc_lrc_desc_unpin(struct 
> > intel_context *ce)
> > /* Seal race with Reset */
> > spin_lock_irqsave(&ce->guc_state.lock, flags);
> > disabled = submission_disabled(guc);
> > 
alan:snip
> > +   /* Change context state to destroyed and get gt-pm */
> > +   __intel_gt_pm_get(gt);
> > +   set_context_destroyed(ce);
> > +   clr_context_registered(ce);
> > +
> > +   ret = deregister_context(ce, ce->guc_id.id);
> > +   if (ret) {
> > +   /* Undo the state change and put gt-pm if that failed */
> > +   set_context_registered(ce);
> > +   clr_context_destroyed(ce);
> > +   intel_gt_pm_put(gt);
> 
> This is a might_sleep inside a spin_lock! Are you 100% confident no WARN
> was seeing during the tests indicated in the commit msg?
alan: Good catch - i dont think we saw a WARN - I'll go back and check
with the task force - i shall rework this function to get that outside the lock.

> 
> > +   }
> > +   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> > +   return 0;
> 
> If you are always returning 0, there's no pointing in s/void/int...
Alan: agreed - will change to void.
> > 
> > 

alan:snip
> > @@ -3279,6 +3322,17 @@ static void destroyed_worker_func(struct work_struct 
> > *w)
> > struct intel_gt *gt = guc_to_gt(guc);
> > int tmp;
> >  
> > +   /*
> > +* In rare cases we can get here via async context-free fence-signals 
> > that
> > +* come very late in suspend flow or very early in resume flows. In 
> > these
> > +* cases, GuC won't be ready but just skipping it here is fine as these
> > +* pending-destroy-contexts get destroyed totally at GuC reset time at 
> > the
> > +* end of suspend.. OR.. this worker can be picked up later on the next
> > +* context destruction trigger after resume-completes
> 
> who is triggering the work queue again?

alan: short answer: we dont know - and still hunting this
(getting closer now.. using task tgid str-name lookups).
in the few times I've seen it, the callstack I've seen looked like this:

[33763.582036] Call Trace:
[33763.582038]  
[33763.582040]  dump_stack_lvl+0x69/0x97
[33763.582054]  guc_context_destroy+0x1b5/0x1ec
[33763.582067]  free_engines+0x52/0x70
[33763.582072]  rcu_do_batch+0x161/0x438
[33763.582084]  rcu_nocb_cb_kthread+0xda/0x2d0
[33763.582093]  kthread+0x13a/0x152
[33763.582102]  ? rcu_nocb_gp_kthread+0x6a7/0x6a7
[33763.582107]  ? css_get+0x38/0x38
[33763.582118]  ret_from_fork+0x1f/0x30
[33763.582128]  

I did add additional debug-msg for tracking and I recall seeing this sequence 
via independant callstacks in the big picture:
i915_sw_fence_complete > __i915_sw_fence_complete -> 
__i915_sw_fence_notify(fence, FENCE_FREE) -> <..delayed?..>
[ drm fence sync func ] <...> engines_notify > call_rcu(&engines>rcu, 
free_engines_rcu) <..delayed?..>
free_engines -> intel_context_put -> ... [kref-dec] --> 
guc_context_destroy

Unfortunately, we still don't know why this initial "i915_sw_fence_complete" is 
coming during suspend-late.
NOTE1: in the cover letter or prior comment, I hope i mentioned the 
reproduction steps where
it only occurs when having a workload that does network download that begins 
downloading just
before suspend is started but completes before suspend late. We are getting 
close to finding
this - taking time because of the reproduction steps.

Anshuman can chime in if he is seeing new signatures with different callstack /
events after this patch is applied.


> I mean, I'm wondering if it is necessary to re-schedule it from inside...
> but also wonder if this is safe anyway...

alan: so my thought process, also after consulting with John and Daniele, was:
since we have a link list to c

Re: [Intel-gfx] [PATCH v3 1/3] drm/i915/guc: Flush context destruction worker at suspend

2023-09-22 Thread Teres Alexis, Alan Previn
On Thu, 2023-09-14 at 11:35 -0400, Vivi, Rodrigo wrote:
> On Sat, Sep 09, 2023 at 08:58:44PM -0700, Alan Previn wrote:
> > When suspending, flush the context-guc-id
> > deregistration worker at the final stages of
> > intel_gt_suspend_late when we finally call gt_sanitize
> > that eventually leads down to __uc_sanitize so that
> > the deregistration worker doesn't fire off later as
> > we reset the GuC microcontroller.
> > 
> > Signed-off-by: Alan Previn 
> > Tested-by: Mousumi Jana 
> 
> Reviewed-by: Rodrigo Vivi 
alan:snip
alan: thanks for the RB Rodrigo.


[Intel-gfx] [PATCH v4] drm/i915/pxp: Add drm_dbgs for critical PXP events.

2023-09-19 Thread Alan Previn
Debugging PXP issues can't even begin without understanding precedding
sequence of important events. Add drm_dbg into the most important PXP
events.

 v3 : - move gt_dbg to after mutex block in function
i915_gsc_proxy_component_bind. (Vivaik)
 v2 : - remove __func__ since drm_dbg covers that (Jani).
  - add timeout dbg of the restart from front-end (Alan).

Signed-off-by: Alan Previn 
Reviewed-by: Vivaik Balasubrawmanian 
---
 drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c |  2 ++
 drivers/gpu/drm/i915/pxp/intel_pxp.c | 15 ---
 drivers/gpu/drm/i915/pxp/intel_pxp_irq.c |  5 +++--
 drivers/gpu/drm/i915/pxp/intel_pxp_session.c |  6 +-
 drivers/gpu/drm/i915/pxp/intel_pxp_types.h   |  1 +
 5 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
index 5f138de3c14f..40817ebcca71 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
@@ -322,6 +322,7 @@ static int i915_gsc_proxy_component_bind(struct device 
*i915_kdev,
gsc->proxy.component = data;
gsc->proxy.component->mei_dev = mei_kdev;
mutex_unlock(&gsc->proxy.mutex);
+   gt_dbg(gt, "GSC proxy mei component bound\n");
 
return 0;
 }
@@ -342,6 +343,7 @@ static void i915_gsc_proxy_component_unbind(struct device 
*i915_kdev,
with_intel_runtime_pm(&i915->runtime_pm, wakeref)
intel_uncore_rmw(gt->uncore, HECI_H_CSR(MTL_GSC_HECI2_BASE),
 HECI_H_CSR_IE | HECI_H_CSR_RST, 0);
+   gt_dbg(gt, "GSC proxy mei component unbound\n");
 }
 
 static const struct component_ops i915_gsc_proxy_component_ops = {
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp.c
index dc327cf40b5a..e11f562b1876 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
@@ -303,6 +303,8 @@ static int __pxp_global_teardown_final(struct intel_pxp 
*pxp)
 
if (!pxp->arb_is_valid)
return 0;
+
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: teardown for suspend/fini");
/*
 * To ensure synchronous and coherent session teardown completion
 * in response to suspend or shutdown triggers, don't use a worker.
@@ -324,6 +326,8 @@ static int __pxp_global_teardown_restart(struct intel_pxp 
*pxp)
 
if (pxp->arb_is_valid)
return 0;
+
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: teardown for restart");
/*
 * The arb-session is currently inactive and we are doing a reset and 
restart
 * due to a runtime event. Use the worker that was designed for this.
@@ -332,8 +336,11 @@ static int __pxp_global_teardown_restart(struct intel_pxp 
*pxp)
 
timeout = intel_pxp_get_backend_timeout_ms(pxp);
 
-   if (!wait_for_completion_timeout(&pxp->termination, 
msecs_to_jiffies(timeout)))
+   if (!wait_for_completion_timeout(&pxp->termination, 
msecs_to_jiffies(timeout))) {
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: restart backend timed 
out (%d ms)",
+   timeout);
return -ETIMEDOUT;
+   }
 
return 0;
 }
@@ -414,10 +421,12 @@ int intel_pxp_start(struct intel_pxp *pxp)
int ret = 0;
 
ret = intel_pxp_get_readiness_status(pxp, PXP_READINESS_TIMEOUT);
-   if (ret < 0)
+   if (ret < 0) {
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: tried but not-avail 
(%d)", ret);
return ret;
-   else if (ret > 1)
+   } else if (ret > 1) {
return -EIO; /* per UAPI spec, user may retry later */
+   }
 
mutex_lock(&pxp->arb_mutex);
 
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
index 91e9622c07d0..d81750b9bdda 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
@@ -40,11 +40,12 @@ void intel_pxp_irq_handler(struct intel_pxp *pxp, u16 iir)
   GEN12_DISPLAY_APP_TERMINATED_PER_FW_REQ_INTERRUPT)) {
/* immediately mark PXP as inactive on termination */
intel_pxp_mark_termination_in_progress(pxp);
-   pxp->session_events |= PXP_TERMINATION_REQUEST | 
PXP_INVAL_REQUIRED;
+   pxp->session_events |= PXP_TERMINATION_REQUEST | 
PXP_INVAL_REQUIRED |
+  PXP_EVENT_TYPE_IRQ;
}
 
if (iir & GEN12_DISPLAY_STATE_RESET_COMPLETE_INTERRUPT)
-   pxp->session_events |= PXP_TERMINATION_COMPLETE;
+   pxp->session_events |= PXP_TERMINATION_COMPLETE | 
PXP_EVENT_TYPE_IRQ;
 
if (pxp->session_events)

Re: [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/pxp: Add drm_dbgs for critical PXP events.

2023-09-19 Thread Teres Alexis, Alan Previn
On Sat, 2023-09-16 at 03:09 +, Patchwork wrote:
> 
alan:snip
> 2eab9e4e637a drm/i915/pxp: Add drm_dbgs for critical PXP events.
> -:7: WARNING:COMMIT_LOG_LONG_LINE: Prefer a maximum 75 chars per line 
> (possible unwrapped commit description?)
> #7: 
> sequence of important events. Add drm_dbg into the most important PXP events.
> 
> -:15: ERROR:BAD_SIGN_OFF: Unrecognized email address: 'Balasubrawmanian, 
> Vivaik '
> #15: 
> Reviewed-by: Balasubrawmanian, Vivaik 
> 
> -:96: WARNING:LONG_LINE: line length of 105 exceeds 100 columns
> #96: FILE: drivers/gpu/drm/i915/pxp/intel_pxp_irq.c:43:
> + pxp->session_events |= PXP_TERMINATION_REQUEST | 
> PXP_INVAL_REQUIRED | PXP_EVENT_TYPE_IRQ;

Will fix these before we merge.
...alan


Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/pxp/mtl: Update gsc-heci cmd submission to align with fw/hw spec

2023-09-18 Thread Teres Alexis, Alan Previn
On Sun, 2023-09-17 at 22:04 +, Patchwork wrote:
> Patch Details
> Series: drm/i915/pxp/mtl: Update gsc-heci cmd submission to align with fw/hw 
> spec
> URL:https://patchwork.freedesktop.org/series/123830/
> 

alan:snip
Below issue it unrelated since this series only changes code paths in MTL-PXP 
only and in pxp-specific lrc code. (double checked). will retrigger test since 
it was a BAT failure.

> Possible new issues
> 
> Here are the unknown changes that may have been introduced in 
> Patchwork_123830v1:
> 
> IGT changes
> Possible regressions
> 
>   *   igt@gem_exec_suspend@basic-s3@lmem0:
>  *   bat-dg2-8: NOTRUN -> 
> INCOMPLETE
> 



[Intel-gfx] [PATCH v7 2/3] drm/i915/pxp/mtl: Update pxp-firmware packet size

2023-09-17 Thread Alan Previn
Update the GSC-fw input/output HECI packet size to match
updated internal fw specs.

Signed-off-by: Alan Previn 
Reviewed-by: Vivaik Balasubrawmanian 
---
 drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h 
b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
index 0165d38fbead..329b4fcdc040 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
@@ -14,8 +14,8 @@
 #define PXP43_CMDID_NEW_HUC_AUTH 0x003F /* MTL+ */
 #define PXP43_CMDID_INIT_SESSION 0x0036
 
-/* PXP-Packet sizes for MTL's GSCCS-HECI instruction */
-#define PXP43_MAX_HECI_INOUT_SIZE (SZ_32K)
+/* PXP-Packet sizes for MTL's GSCCS-HECI instruction is spec'd at 65K before 
page alignment*/
+#define PXP43_MAX_HECI_INOUT_SIZE (PAGE_ALIGN(SZ_64K + SZ_1K))
 
 /* PXP-Packet size for MTL's NEW_HUC_AUTH instruction */
 #define PXP43_HUC_AUTH_INOUT_SIZE (SZ_4K)
-- 
2.39.0



[Intel-gfx] [PATCH v7 3/3] drm/i915/lrc: User PXP contexts requires runalone bit in lrc

2023-09-17 Thread Alan Previn
On Meteorlake onwards, HW specs require that all user contexts that
run on render or compute engines and require PXP must enforce
run-alone bit in lrc. Add this enforcement for protected contexts.

Signed-off-by: Alan Previn 
Reviewed-by: Vivaik Balasubrawmanian 
---
 drivers/gpu/drm/i915/gt/intel_engine_regs.h |  1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c | 23 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_regs.h 
b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
index 6b9d9f837669..fdd4ddd3a978 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
@@ -177,6 +177,7 @@
 #define   CTX_CTRL_RS_CTX_ENABLE   REG_BIT(1)
 #define  CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT  REG_BIT(2)
 #define  CTX_CTRL_INHIBIT_SYN_CTX_SWITCH   REG_BIT(3)
+#define  GEN12_CTX_CTRL_RUNALONE_MODE  REG_BIT(7)
 #define  GEN12_CTX_CTRL_OAR_CONTEXT_ENABLE REG_BIT(8)
 #define RING_CTX_SR_CTL(base)  _MMIO((base) + 0x244)
 #define RING_SEMA_WAIT_POLL(base)  _MMIO((base) + 0x24c)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 147b6f44ad56..eaf66d903166 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -845,6 +845,27 @@ lrc_setup_indirect_ctx(u32 *regs,
lrc_ring_indirect_offset_default(engine) << 6;
 }
 
+static bool ctx_needs_runalone(const struct intel_context *ce)
+{
+   struct i915_gem_context *gem_ctx;
+   bool ctx_is_protected = false;
+
+   /*
+* On MTL and newer platforms, protected contexts require setting
+* the LRC run-alone bit or else the encryption will not happen.
+*/
+   if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 70) &&
+   (ce->engine->class == COMPUTE_CLASS || ce->engine->class == 
RENDER_CLASS)) {
+   rcu_read_lock();
+   gem_ctx = rcu_dereference(ce->gem_context);
+   if (gem_ctx)
+   ctx_is_protected = gem_ctx->uses_protected_content;
+   rcu_read_unlock();
+   }
+
+   return ctx_is_protected;
+}
+
 static void init_common_regs(u32 * const regs,
 const struct intel_context *ce,
 const struct intel_engine_cs *engine,
@@ -860,6 +881,8 @@ static void init_common_regs(u32 * const regs,
if (GRAPHICS_VER(engine->i915) < 11)
ctl |= _MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT |
   CTX_CTRL_RS_CTX_ENABLE);
+   if (ctx_needs_runalone(ce))
+   ctl |= _MASKED_BIT_ENABLE(GEN12_CTX_CTRL_RUNALONE_MODE);
regs[CTX_CONTEXT_CONTROL] = ctl;
 
regs[CTX_TIMESTAMP] = ce->stats.runtime.last;
-- 
2.39.0



[Intel-gfx] [PATCH v7 0/3] drm/i915/pxp/mtl: Update gsc-heci cmd submission to align with fw/hw spec

2023-09-17 Thread Alan Previn
For MTL, update the GSC-HECI packet size and the max firmware
response timeout to match internal fw specs. Enforce setting
run-alone bit in LRC for protected contexts.

Changes from prio revs:
   v6: - fix build error with PXP config (Alan)
   v5: - PAGE_ALIGN typo fix (Alan).
   - Use macro for runalone bit (Vivaik)
   - Spec alignment with system overhead,
 increase fw timeout to 500ms (Alan)
   v4: - PAGE_ALIGN the max heci packet size (Alan).
   v3: - Patch #1. Only start counting the request completion
 timeout from after the request has started (Daniele).
   v2: - Patch #3: fix sparse warning reported by kernel test robot.
   v1: - N/A (Re-test)

Signed-off-by: Alan Previn 

Alan Previn (3):
  drm/i915/pxp/mtl: Update pxp-firmware response timeout
  drm/i915/pxp/mtl: Update pxp-firmware packet size
  drm/i915/lrc: User PXP contexts requires runalone bit in lrc

 drivers/gpu/drm/i915/gt/intel_engine_regs.h   |  1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c   | 23 +++
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c | 20 ++--
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h |  6 +
 .../drm/i915/pxp/intel_pxp_cmd_interface_43.h |  4 ++--
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c|  2 +-
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h| 10 
 7 files changed, 55 insertions(+), 11 deletions(-)


base-commit: dc4cd6e4e53d46211952fe7c0e408fce3e212993
-- 
2.39.0



[Intel-gfx] [PATCH v7 1/3] drm/i915/pxp/mtl: Update pxp-firmware response timeout

2023-09-17 Thread Alan Previn
Update the max GSC-fw response time to match updated internal
fw specs. Because this response time is an SLA on the firmware,
not inclusive of i915->GuC->HW handoff latency, when submitting
requests to the GSC fw via intel_gsc_uc_heci_cmd_submit helpers,
start the count after the request hits the GSC command streamer.
Also, move GSC_REPLY_LATENCY_MS definition from pxp header to
intel_gsc_uc_heci_cmd_submit.h since its for any GSC HECI packet.

Signed-off-by: Alan Previn 
Reviewed-by: Vivaik Balasubrawmanian 
---
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c | 20 +--
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h |  6 ++
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c|  2 +-
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h| 10 --
 4 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c
index 89ed5ee9cded..2fde5c360cff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c
@@ -81,8 +81,17 @@ int intel_gsc_uc_heci_cmd_submit_packet(struct intel_gsc_uc 
*gsc, u64 addr_in,
 
i915_request_add(rq);
 
-   if (!err && i915_request_wait(rq, 0, msecs_to_jiffies(500)) < 0)
-   err = -ETIME;
+   if (!err) {
+   /*
+* Start timeout for i915_request_wait only after considering 
one possible
+* pending GSC-HECI submission cycle on the other 
(non-privileged) path.
+*/
+   if (wait_for(i915_request_started(rq), 
GSC_HECI_REPLY_LATENCY_MS))
+   drm_dbg(&gsc_uc_to_gt(gsc)->i915->drm,
+   "Delay in gsc-heci-priv submission to 
gsccs-hw");
+   if (i915_request_wait(rq, 0, 
msecs_to_jiffies(GSC_HECI_REPLY_LATENCY_MS)) < 0)
+   err = -ETIME;
+   }
 
i915_request_put(rq);
 
@@ -186,6 +195,13 @@ intel_gsc_uc_heci_cmd_submit_nonpriv(struct intel_gsc_uc 
*gsc,
i915_request_add(rq);
 
if (!err) {
+   /*
+* Start timeout for i915_request_wait only after considering 
one possible
+* pending GSC-HECI submission cycle on the other (privileged) 
path.
+*/
+   if (wait_for(i915_request_started(rq), 
GSC_HECI_REPLY_LATENCY_MS))
+   drm_dbg(&gsc_uc_to_gt(gsc)->i915->drm,
+   "Delay in gsc-heci-non-priv submission to 
gsccs-hw");
if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
  msecs_to_jiffies(timeout_ms)) < 0)
err = -ETIME;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
index 09d3fbdad05a..c4308291c003 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
@@ -12,6 +12,12 @@ struct i915_vma;
 struct intel_context;
 struct intel_gsc_uc;
 
+#define GSC_HECI_REPLY_LATENCY_MS 500
+/*
+ * Max FW response time is 500ms, but this should be counted from the time the
+ * command has hit the GSC-CS hardware, not the preceding handoff to GuC CTB.
+ */
+
 struct intel_gsc_mtl_header {
u32 validity_marker;
 #define GSC_HECI_VALIDITY_MARKER 0xA578875A
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c
index b4ce7ca9b49d..27402ecf0457 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c
@@ -112,7 +112,7 @@ gsccs_send_message(struct intel_pxp *pxp,
 
ret = intel_gsc_uc_heci_cmd_submit_nonpriv(>->uc.gsc,
   exec_res->ce, &pkt, 
exec_res->bb_vaddr,
-  GSC_REPLY_LATENCY_MS);
+  GSC_HECI_REPLY_LATENCY_MS);
if (ret) {
drm_err(&i915->drm, "failed to send gsc PXP msg (%d)\n", ret);
goto unlock;
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h 
b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
index 298ad38e6c7d..9aae779c4da3 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
@@ -8,16 +8,14 @@
 
 #include 
 
+#include "gt/uc/intel_gsc_uc_heci_cmd_submit.h"
+
 struct intel_pxp;
 
-#define GSC_REPLY_LATENCY_MS 210
-/*
- * Max FW response time is 200ms, to which we add 10ms to account for overhead
- * such as request preparation, GuC submission to hw and pipeline completion 
times.
- */
 #define GSC_PENDING_RETRY_MAXCOUNT 40
 #define GSC_PENDING_RETRY_PAUSE_MS 50
-#define GSCFW_MAX_ROUND_TRIP_LATE

Re: [Intel-gfx] [PATCH v6 1/3] drm/i915/pxp/mtl: Update pxp-firmware response timeout

2023-09-17 Thread Teres Alexis, Alan Previn
On Sat, 2023-09-16 at 10:25 +0800, lkp wrote:
> Hi Alan,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on cf1e91e884bb1113c653e654e9de1754fc1d4488]
> 
> aAll errors (new ones prefixed by >>):
> 
> 
alan:snip
alan: missed building with PXP config after that last change below - will 
re-rev this.

>drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c: In function 
> 'gsccs_send_message':
> > > drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c:115:52: error: 
> > > 'GSC_REPLY_LATENCY_MS' undeclared (first use in this function); did you 
> > > mean 'GSC_HECI_REPLY_LATENCY_MS'?
>  115 |
> GSC_REPLY_LATENCY_MS);
>  |
> ^~~~
>  |
> GSC_HECI_REPLY_LATENCY_MS
>drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c:115:52: note: each undeclared 
> identifier is reported only once for each function it appears in
> 




Re: [Intel-gfx] [PATCH v3] drm/i915/pxp: Add drm_dbgs for critical PXP events.

2023-09-15 Thread Teres Alexis, Alan Previn
On Fri, 2023-09-15 at 13:15 -0700, Teres Alexis, Alan Previn wrote:
> Debugging PXP issues can't even begin without understanding precedding
> sequence of important events. Add drm_dbg into the most important PXP events.
> 
>  v3 : - move gt_dbg to after mutex block in function
> i915_gsc_proxy_component_bind. (Vivaik)
>  v2 : - remove __func__ since drm_dbg covers that (Jani).
>   - add timeout dbg of the restart from front-end (Alan).
> 
alan:snip

> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
> @@ -322,6 +322,7 @@ static int i915_gsc_proxy_component_bind(struct device 
> *i915_kdev,
>   gsc->proxy.component = data;
>   gsc->proxy.component->mei_dev = mei_kdev;
>   mutex_unlock(&gsc->proxy.mutex);
> + gt_dbg(gt, "GSC proxy mei component bound\n");

forgot to include RB from Vivaik, per condition of fixing above hunk, from rev2 
dri-devel 
https://lists.freedesktop.org/archives/dri-devel/2023-September/422858.html :

Reviewed-by: Balasubrawmanian, Vivaik  


[Intel-gfx] [PATCH v3] drm/i915/pxp: Add drm_dbgs for critical PXP events.

2023-09-15 Thread Alan Previn
Debugging PXP issues can't even begin without understanding precedding
sequence of important events. Add drm_dbg into the most important PXP events.

 v3 : - move gt_dbg to after mutex block in function
i915_gsc_proxy_component_bind. (Vivaik)
 v2 : - remove __func__ since drm_dbg covers that (Jani).
  - add timeout dbg of the restart from front-end (Alan).

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c |  2 ++
 drivers/gpu/drm/i915/pxp/intel_pxp.c | 15 ---
 drivers/gpu/drm/i915/pxp/intel_pxp_irq.c |  4 ++--
 drivers/gpu/drm/i915/pxp/intel_pxp_session.c |  6 +-
 drivers/gpu/drm/i915/pxp/intel_pxp_types.h   |  1 +
 5 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
index 5f138de3c14f..40817ebcca71 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
@@ -322,6 +322,7 @@ static int i915_gsc_proxy_component_bind(struct device 
*i915_kdev,
gsc->proxy.component = data;
gsc->proxy.component->mei_dev = mei_kdev;
mutex_unlock(&gsc->proxy.mutex);
+   gt_dbg(gt, "GSC proxy mei component bound\n");
 
return 0;
 }
@@ -342,6 +343,7 @@ static void i915_gsc_proxy_component_unbind(struct device 
*i915_kdev,
with_intel_runtime_pm(&i915->runtime_pm, wakeref)
intel_uncore_rmw(gt->uncore, HECI_H_CSR(MTL_GSC_HECI2_BASE),
 HECI_H_CSR_IE | HECI_H_CSR_RST, 0);
+   gt_dbg(gt, "GSC proxy mei component unbound\n");
 }
 
 static const struct component_ops i915_gsc_proxy_component_ops = {
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp.c
index dc327cf40b5a..e11f562b1876 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
@@ -303,6 +303,8 @@ static int __pxp_global_teardown_final(struct intel_pxp 
*pxp)
 
if (!pxp->arb_is_valid)
return 0;
+
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: teardown for suspend/fini");
/*
 * To ensure synchronous and coherent session teardown completion
 * in response to suspend or shutdown triggers, don't use a worker.
@@ -324,6 +326,8 @@ static int __pxp_global_teardown_restart(struct intel_pxp 
*pxp)
 
if (pxp->arb_is_valid)
return 0;
+
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: teardown for restart");
/*
 * The arb-session is currently inactive and we are doing a reset and 
restart
 * due to a runtime event. Use the worker that was designed for this.
@@ -332,8 +336,11 @@ static int __pxp_global_teardown_restart(struct intel_pxp 
*pxp)
 
timeout = intel_pxp_get_backend_timeout_ms(pxp);
 
-   if (!wait_for_completion_timeout(&pxp->termination, 
msecs_to_jiffies(timeout)))
+   if (!wait_for_completion_timeout(&pxp->termination, 
msecs_to_jiffies(timeout))) {
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: restart backend timed 
out (%d ms)",
+   timeout);
return -ETIMEDOUT;
+   }
 
return 0;
 }
@@ -414,10 +421,12 @@ int intel_pxp_start(struct intel_pxp *pxp)
int ret = 0;
 
ret = intel_pxp_get_readiness_status(pxp, PXP_READINESS_TIMEOUT);
-   if (ret < 0)
+   if (ret < 0) {
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: tried but not-avail 
(%d)", ret);
return ret;
-   else if (ret > 1)
+   } else if (ret > 1) {
return -EIO; /* per UAPI spec, user may retry later */
+   }
 
mutex_lock(&pxp->arb_mutex);
 
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
index 91e9622c07d0..0637b1d36356 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
@@ -40,11 +40,11 @@ void intel_pxp_irq_handler(struct intel_pxp *pxp, u16 iir)
   GEN12_DISPLAY_APP_TERMINATED_PER_FW_REQ_INTERRUPT)) {
/* immediately mark PXP as inactive on termination */
intel_pxp_mark_termination_in_progress(pxp);
-   pxp->session_events |= PXP_TERMINATION_REQUEST | 
PXP_INVAL_REQUIRED;
+   pxp->session_events |= PXP_TERMINATION_REQUEST | 
PXP_INVAL_REQUIRED | PXP_EVENT_TYPE_IRQ;
}
 
if (iir & GEN12_DISPLAY_STATE_RESET_COMPLETE_INTERRUPT)
-   pxp->session_events |= PXP_TERMINATION_COMPLETE;
+   pxp->session_events |= PXP_TERMINATION_COMPLETE | 
PXP_EVENT_TYPE_IRQ;
 
if (pxp->session_events)
queue_work(system_unbound_wq, &pxp->session_work);
diff --git a/d

[Intel-gfx] [PATCH v6 2/3] drm/i915/pxp/mtl: Update pxp-firmware packet size

2023-09-15 Thread Alan Previn
Update the GSC-fw input/output HECI packet size to match
updated internal fw specs.

Signed-off-by: Alan Previn 
Reviewed-by: Vivaik Balasubrawmanian 
---
 drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h 
b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
index 0165d38fbead..329b4fcdc040 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
@@ -14,8 +14,8 @@
 #define PXP43_CMDID_NEW_HUC_AUTH 0x003F /* MTL+ */
 #define PXP43_CMDID_INIT_SESSION 0x0036
 
-/* PXP-Packet sizes for MTL's GSCCS-HECI instruction */
-#define PXP43_MAX_HECI_INOUT_SIZE (SZ_32K)
+/* PXP-Packet sizes for MTL's GSCCS-HECI instruction is spec'd at 65K before 
page alignment*/
+#define PXP43_MAX_HECI_INOUT_SIZE (PAGE_ALIGN(SZ_64K + SZ_1K))
 
 /* PXP-Packet size for MTL's NEW_HUC_AUTH instruction */
 #define PXP43_HUC_AUTH_INOUT_SIZE (SZ_4K)
-- 
2.39.0



[Intel-gfx] [PATCH v6 1/3] drm/i915/pxp/mtl: Update pxp-firmware response timeout

2023-09-15 Thread Alan Previn
Update the max GSC-fw response time to match updated internal
fw specs. Because this response time is an SLA on the firmware,
not inclusive of i915->GuC->HW handoff latency, when submitting
requests to the GSC fw via intel_gsc_uc_heci_cmd_submit helpers,
start the count after the request hits the GSC command streamer.
Also, move GSC_REPLY_LATENCY_MS definition from pxp header to
intel_gsc_uc_heci_cmd_submit.h since its for any GSC HECI packet.

Signed-off-by: Alan Previn 
Reviewed-by: Vivaik Balasubrawmanian 
---
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c | 20 +--
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h |  6 ++
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h| 10 --
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c
index 89ed5ee9cded..2fde5c360cff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c
@@ -81,8 +81,17 @@ int intel_gsc_uc_heci_cmd_submit_packet(struct intel_gsc_uc 
*gsc, u64 addr_in,
 
i915_request_add(rq);
 
-   if (!err && i915_request_wait(rq, 0, msecs_to_jiffies(500)) < 0)
-   err = -ETIME;
+   if (!err) {
+   /*
+* Start timeout for i915_request_wait only after considering 
one possible
+* pending GSC-HECI submission cycle on the other 
(non-privileged) path.
+*/
+   if (wait_for(i915_request_started(rq), 
GSC_HECI_REPLY_LATENCY_MS))
+   drm_dbg(&gsc_uc_to_gt(gsc)->i915->drm,
+   "Delay in gsc-heci-priv submission to 
gsccs-hw");
+   if (i915_request_wait(rq, 0, 
msecs_to_jiffies(GSC_HECI_REPLY_LATENCY_MS)) < 0)
+   err = -ETIME;
+   }
 
i915_request_put(rq);
 
@@ -186,6 +195,13 @@ intel_gsc_uc_heci_cmd_submit_nonpriv(struct intel_gsc_uc 
*gsc,
i915_request_add(rq);
 
if (!err) {
+   /*
+* Start timeout for i915_request_wait only after considering 
one possible
+* pending GSC-HECI submission cycle on the other (privileged) 
path.
+*/
+   if (wait_for(i915_request_started(rq), 
GSC_HECI_REPLY_LATENCY_MS))
+   drm_dbg(&gsc_uc_to_gt(gsc)->i915->drm,
+   "Delay in gsc-heci-non-priv submission to 
gsccs-hw");
if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
  msecs_to_jiffies(timeout_ms)) < 0)
err = -ETIME;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
index 09d3fbdad05a..c4308291c003 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
@@ -12,6 +12,12 @@ struct i915_vma;
 struct intel_context;
 struct intel_gsc_uc;
 
+#define GSC_HECI_REPLY_LATENCY_MS 500
+/*
+ * Max FW response time is 500ms, but this should be counted from the time the
+ * command has hit the GSC-CS hardware, not the preceding handoff to GuC CTB.
+ */
+
 struct intel_gsc_mtl_header {
u32 validity_marker;
 #define GSC_HECI_VALIDITY_MARKER 0xA578875A
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h 
b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
index 298ad38e6c7d..9aae779c4da3 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
@@ -8,16 +8,14 @@
 
 #include 
 
+#include "gt/uc/intel_gsc_uc_heci_cmd_submit.h"
+
 struct intel_pxp;
 
-#define GSC_REPLY_LATENCY_MS 210
-/*
- * Max FW response time is 200ms, to which we add 10ms to account for overhead
- * such as request preparation, GuC submission to hw and pipeline completion 
times.
- */
 #define GSC_PENDING_RETRY_MAXCOUNT 40
 #define GSC_PENDING_RETRY_PAUSE_MS 50
-#define GSCFW_MAX_ROUND_TRIP_LATENCY_MS (GSC_PENDING_RETRY_MAXCOUNT * 
GSC_PENDING_RETRY_PAUSE_MS)
+#define GSCFW_MAX_ROUND_TRIP_LATENCY_MS (GSC_HECI_REPLY_LATENCY_MS + \
+(GSC_PENDING_RETRY_MAXCOUNT * 
GSC_PENDING_RETRY_PAUSE_MS))
 
 #ifdef CONFIG_DRM_I915_PXP
 void intel_pxp_gsccs_fini(struct intel_pxp *pxp);
-- 
2.39.0



[Intel-gfx] [PATCH v6 3/3] drm/i915/lrc: User PXP contexts requires runalone bit in lrc

2023-09-15 Thread Alan Previn
On Meteorlake onwards, HW specs require that all user contexts that
run on render or compute engines and require PXP must enforce
run-alone bit in lrc. Add this enforcement for protected contexts.

Signed-off-by: Alan Previn 
Reviewed-by: Vivaik Balasubrawmanian 
---
 drivers/gpu/drm/i915/gt/intel_engine_regs.h |  1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c | 23 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_regs.h 
b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
index 6b9d9f837669..fdd4ddd3a978 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
@@ -177,6 +177,7 @@
 #define   CTX_CTRL_RS_CTX_ENABLE   REG_BIT(1)
 #define  CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT  REG_BIT(2)
 #define  CTX_CTRL_INHIBIT_SYN_CTX_SWITCH   REG_BIT(3)
+#define  GEN12_CTX_CTRL_RUNALONE_MODE  REG_BIT(7)
 #define  GEN12_CTX_CTRL_OAR_CONTEXT_ENABLE REG_BIT(8)
 #define RING_CTX_SR_CTL(base)  _MMIO((base) + 0x244)
 #define RING_SEMA_WAIT_POLL(base)  _MMIO((base) + 0x24c)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 147b6f44ad56..eaf66d903166 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -845,6 +845,27 @@ lrc_setup_indirect_ctx(u32 *regs,
lrc_ring_indirect_offset_default(engine) << 6;
 }
 
+static bool ctx_needs_runalone(const struct intel_context *ce)
+{
+   struct i915_gem_context *gem_ctx;
+   bool ctx_is_protected = false;
+
+   /*
+* On MTL and newer platforms, protected contexts require setting
+* the LRC run-alone bit or else the encryption will not happen.
+*/
+   if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 70) &&
+   (ce->engine->class == COMPUTE_CLASS || ce->engine->class == 
RENDER_CLASS)) {
+   rcu_read_lock();
+   gem_ctx = rcu_dereference(ce->gem_context);
+   if (gem_ctx)
+   ctx_is_protected = gem_ctx->uses_protected_content;
+   rcu_read_unlock();
+   }
+
+   return ctx_is_protected;
+}
+
 static void init_common_regs(u32 * const regs,
 const struct intel_context *ce,
 const struct intel_engine_cs *engine,
@@ -860,6 +881,8 @@ static void init_common_regs(u32 * const regs,
if (GRAPHICS_VER(engine->i915) < 11)
ctl |= _MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT |
   CTX_CTRL_RS_CTX_ENABLE);
+   if (ctx_needs_runalone(ce))
+   ctl |= _MASKED_BIT_ENABLE(GEN12_CTX_CTRL_RUNALONE_MODE);
regs[CTX_CONTEXT_CONTROL] = ctl;
 
regs[CTX_TIMESTAMP] = ce->stats.runtime.last;
-- 
2.39.0



[Intel-gfx] [PATCH v6 0/3] drm/i915/pxp/mtl: Update gsc-heci cmd submission to align with fw/hw spec

2023-09-15 Thread Alan Previn
For MTL, update the GSC-HECI packet size and the max firmware
response timeout to match internal fw specs. Enforce setting
run-alone bit in LRC for protected contexts.

Changes from prio revs:
   v5: - PAGE_ALIGN typo fix (Alan).
   - Use macro for runalone bit (Vivaik)
   - Spec alignment with system overhead,
 increase fw timeout to 500ms (Alan)
   v4: - PAGE_ALIGN the max heci packet size (Alan).
   v3: - Patch #1. Only start counting the request completion
 timeout from after the request has started (Daniele).
   v2: - Patch #3: fix sparse warning reported by kernel test robot.
   v1: - N/A (Re-test)

Signed-off-by: Alan Previn 

Alan Previn (3):
  drm/i915/pxp/mtl: Update pxp-firmware response timeout
  drm/i915/pxp/mtl: Update pxp-firmware packet size
  drm/i915/lrc: User PXP contexts requires runalone bit in lrc

 drivers/gpu/drm/i915/gt/intel_engine_regs.h   |  1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c   | 23 +++
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c | 20 ++--
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h |  6 +
 .../drm/i915/pxp/intel_pxp_cmd_interface_43.h |  4 ++--
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h| 10 
 6 files changed, 54 insertions(+), 10 deletions(-)


base-commit: cf1e91e884bb1113c653e654e9de1754fc1d4488
-- 
2.39.0



[Intel-gfx] [PATCH v6 0/3] drm/i915/pxp/mtl: Update gsc-heci cmd submission to align with fw/hw spec

2023-09-15 Thread Alan Previn
For MTL, update the GSC-HECI packet size and the max firmware
response timeout to match internal fw specs. Enforce setting
run-alone bit in LRC for protected contexts.

Changes from prio revs:
   v5: - PAGE_ALIGN typo fix (Alan).
   - Use macro for runalone bit (Vivaik)
   - Spec alignment with system overhead,
 increase fw timeout to 500ms (Alan)
   v4: - PAGE_ALIGN the max heci packet size (Alan).
   v3: - Patch #1. Only start counting the request completion
 timeout from after the request has started (Daniele).
   v2: - Patch #3: fix sparse warning reported by kernel test robot.
   v1: - N/A (Re-test)

Signed-off-by: Alan Previn 

Alan Previn (3):
  drm/i915/pxp/mtl: Update pxp-firmware response timeout
  drm/i915/pxp/mtl: Update pxp-firmware packet size
  drm/i915/lrc: User PXP contexts requires runalone bit in lrc

 drivers/gpu/drm/i915/gt/intel_engine_regs.h   |  1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c   | 23 +++
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c | 20 ++--
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h |  6 +
 .../drm/i915/pxp/intel_pxp_cmd_interface_43.h |  4 ++--
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h| 10 
 6 files changed, 54 insertions(+), 10 deletions(-)


base-commit: cf1e91e884bb1113c653e654e9de1754fc1d4488
-- 
2.39.0



Re: [Intel-gfx] [PATCH v5 3/3] drm/i915/lrc: User PXP contexts requires runalone bit in lrc

2023-09-15 Thread Teres Alexis, Alan Previn
On Sat, 2023-09-09 at 15:38 -0700, Teres Alexis, Alan Previn wrote:
> On Meteorlake onwards, HW specs require that all user contexts that
> run on render or compute engines and require PXP must enforce
> run-alone bit in lrc. Add this enforcement for protected contexts.
alan:snip
> 
> Signed-off-by: Alan Previn 
> @@ -860,6 +881,8 @@ static void init_common_regs(u32 * const regs,
>   if (GRAPHICS_VER(engine->i915) < 11)
>   ctl |= _MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT |
>  CTX_CTRL_RS_CTX_ENABLE);
> + if (ctx_needs_runalone(ce))
> + ctl |= _MASKED_BIT_ENABLE(BIT(7));
>   regs[CTX_CONTEXT_CONTROL] = ctl;
>  
>   regs[CTX_TIMESTAMP] = ce->stats.runtime.last;
alan: to align intel-gfx ml, Vivaik reviewed this on dri-devel at 
https://lists.freedesktop.org/archives/dri-devel/2023-September/422875.html - 
snippet:
thus, will rerev this (with the others fixes in this series).

>> Can we please get the bit defined in intel_engine_regs.h with a define 
>> instead of a number identification?
>> 
>> Review completed conditional to the above fix.
>> 
>> Reviewed-by: Balasubrawmanian, Vivaik 
>>  
>> <mailto:vivaik.balasubrawmanian at intel.com>




Re: [Intel-gfx] [PATCH v5 2/3] drm/i915/pxp/mtl: Update pxp-firmware packet size

2023-09-15 Thread Teres Alexis, Alan Previn
On Sat, 2023-09-09 at 15:38 -0700, Teres Alexis, Alan Previn wrote:
> Update the GSC-fw input/output HECI packet size to match
> updated internal fw specs.
> 
> Signed-off-by: Alan Previn 
> ---
>  drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h 
> b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
> index 0165d38fbead..e017a7d952e9 100644
> --- a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
> @@ -14,8 +14,8 @@
>  #define PXP43_CMDID_NEW_HUC_AUTH 0x003F /* MTL+ */
>  #define PXP43_CMDID_INIT_SESSION 0x0036
>  
> -/* PXP-Packet sizes for MTL's GSCCS-HECI instruction */
> -#define PXP43_MAX_HECI_INOUT_SIZE (SZ_32K)
> +/* PXP-Packet sizes for MTL's GSCCS-HECI instruction is spec'd at 65K before 
> page alignment*/
> +#define PXP43_MAX_HECI_INOUT_SIZE (PAGE_ALIGNED(SZ_64K + SZ_1K))
>  
>  /* PXP-Packet size for MTL's NEW_HUC_AUTH instruction */
>  #define PXP43_HUC_AUTH_INOUT_SIZE (SZ_4K)
Vivaik replied with RB on dri-devel: 
https://lists.freedesktop.org/archives/dri-devel/2023-September/422862.htmll
we connected offline and agreed that his RB can remain standing on condition i 
fix the PAGE_ALIGNED -> PAGE_ALIGN fix.
Thanks Vivaik for reviewing.



Re: [Intel-gfx] [PATCH v5 1/3] drm/i915/pxp/mtl: Update pxp-firmware response timeout

2023-09-15 Thread Teres Alexis, Alan Previn
On Sat, 2023-09-09 at 15:38 -0700, Teres Alexis, Alan Previn wrote:
> Update the max GSC-fw response time to match updated internal
> fw specs. Because this response time is an SLA on the firmware,
> not inclusive of i915->GuC->HW handoff latency, when submitting
> requests to the GSC fw via intel_gsc_uc_heci_cmd_submit helpers,
alan:snip
Vivaik replied with RB on dri-devel: 
https://lists.freedesktop.org/archives/dri-devel/2023-September/422861.html
we connected offline and agreed that his RB can remain standing on condition i 
fix the PAGE_ALIGNED -> PAGE_ALIGN fix.
Thanks Vivaik for reviewing.


Re: [Intel-gfx] [PATCH v5 1/3] drm/i915/pxp/mtl: Update pxp-firmware response timeout

2023-09-15 Thread Teres Alexis, Alan Previn
On Sat, 2023-09-09 at 15:38 -0700, Teres Alexis, Alan Previn wrote:
> Update the max GSC-fw response time to match updated internal
> fw specs. Because this response time is an SLA on the firmware,
> not inclusive of i915->GuC->HW handoff latency, when submitting
> requests to the GSC fw via intel_gsc_uc_heci_cmd_submit helpers,
> start the count after the request hits the GSC command streamer.
> Also, move GSC_REPLY_LATENCY_MS definition from pxp header to
> intel_gsc_uc_heci_cmd_submit.h since its for any GSC HECI packet.
> 
> Signed-off-by: Alan Previn 
> ---
>  .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c | 20 +--
>  .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h |  6 ++
>  drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h| 11 ++
>  3 files changed, 31 insertions(+), 6 deletions(-)
alan: snip


> index 09d3fbdad05a..5ae5c5d9608b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
> @@ -12,6 +12,12 @@ struct i915_vma;
>  struct intel_context;
>  struct intel_gsc_uc;
>  
> +#define GSC_HECI_REPLY_LATENCY_MS 350
> +/*
> + * Max FW response time is 350ms, but this should be counted from the time 
> the
> + * command has hit the GSC-CS hardware, not the preceding handoff to GuC CTB.
> + */
alan: continue to face timeout issues - so increasing this to ~500 to absorb 
other hw/sw system latencies.
this also matches what the gsc-proxy code was doing - so i could use the same 
macro for that other code path.



Re: [Intel-gfx] [PATCH v5 2/3] drm/i915/pxp/mtl: Update pxp-firmware packet size

2023-09-15 Thread Teres Alexis, Alan Previn
On Sat, 2023-09-09 at 15:38 -0700, Teres Alexis, Alan Previn wrote:
> Update the GSC-fw input/output HECI packet size to match
> updated internal fw specs.
> 
> Signed-off-by: Alan Previn 
> 
alan:snip

> -/* PXP-Packet sizes for MTL's GSCCS-HECI instruction */
> -#define PXP43_MAX_HECI_INOUT_SIZE (SZ_32K)
> +/* PXP-Packet sizes for MTL's GSCCS-HECI instruction is spec'd at 65K before 
> page alignment*/
> +#define PXP43_MAX_HECI_INOUT_SIZE (PAGE_ALIGNED(SZ_64K + SZ_1K))
alan: silly ctrl-c/v bug on my part - should be PAGE_ALIGN, not ALIGNED
>  
>  /* PXP-Packet size for MTL's NEW_HUC_AUTH instruction */
>  #define PXP43_HUC_AUTH_INOUT_SIZE (SZ_4K)



[Intel-gfx] [PATCH v2 1/1] drm/i915/pxp: Add drm_dbgs for critical PXP events.

2023-09-13 Thread Alan Previn
Debugging PXP issues can't even begin without understanding precedding
sequence of events. Add drm_dbg into the most important PXP events.

 v2 : - remove __func__ since drm_dbg covers that (Jani).
  - add timeout of the restart from front-end (Alan).

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c |  2 ++
 drivers/gpu/drm/i915/pxp/intel_pxp.c | 15 ---
 drivers/gpu/drm/i915/pxp/intel_pxp_irq.c |  4 ++--
 drivers/gpu/drm/i915/pxp/intel_pxp_session.c |  6 +-
 drivers/gpu/drm/i915/pxp/intel_pxp_types.h   |  1 +
 5 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
index 5f138de3c14f..61216c4abaec 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
@@ -321,6 +321,7 @@ static int i915_gsc_proxy_component_bind(struct device 
*i915_kdev,
mutex_lock(&gsc->proxy.mutex);
gsc->proxy.component = data;
gsc->proxy.component->mei_dev = mei_kdev;
+   gt_dbg(gt, "GSC proxy mei component bound\n");
mutex_unlock(&gsc->proxy.mutex);
 
return 0;
@@ -342,6 +343,7 @@ static void i915_gsc_proxy_component_unbind(struct device 
*i915_kdev,
with_intel_runtime_pm(&i915->runtime_pm, wakeref)
intel_uncore_rmw(gt->uncore, HECI_H_CSR(MTL_GSC_HECI2_BASE),
 HECI_H_CSR_IE | HECI_H_CSR_RST, 0);
+   gt_dbg(gt, "GSC proxy mei component unbound\n");
 }
 
 static const struct component_ops i915_gsc_proxy_component_ops = {
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp.c
index dc327cf40b5a..e11f562b1876 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
@@ -303,6 +303,8 @@ static int __pxp_global_teardown_final(struct intel_pxp 
*pxp)
 
if (!pxp->arb_is_valid)
return 0;
+
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: teardown for suspend/fini");
/*
 * To ensure synchronous and coherent session teardown completion
 * in response to suspend or shutdown triggers, don't use a worker.
@@ -324,6 +326,8 @@ static int __pxp_global_teardown_restart(struct intel_pxp 
*pxp)
 
if (pxp->arb_is_valid)
return 0;
+
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: teardown for restart");
/*
 * The arb-session is currently inactive and we are doing a reset and 
restart
 * due to a runtime event. Use the worker that was designed for this.
@@ -332,8 +336,11 @@ static int __pxp_global_teardown_restart(struct intel_pxp 
*pxp)
 
timeout = intel_pxp_get_backend_timeout_ms(pxp);
 
-   if (!wait_for_completion_timeout(&pxp->termination, 
msecs_to_jiffies(timeout)))
+   if (!wait_for_completion_timeout(&pxp->termination, 
msecs_to_jiffies(timeout))) {
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: restart backend timed 
out (%d ms)",
+   timeout);
return -ETIMEDOUT;
+   }
 
return 0;
 }
@@ -414,10 +421,12 @@ int intel_pxp_start(struct intel_pxp *pxp)
int ret = 0;
 
ret = intel_pxp_get_readiness_status(pxp, PXP_READINESS_TIMEOUT);
-   if (ret < 0)
+   if (ret < 0) {
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: tried but not-avail 
(%d)", ret);
return ret;
-   else if (ret > 1)
+   } else if (ret > 1) {
return -EIO; /* per UAPI spec, user may retry later */
+   }
 
mutex_lock(&pxp->arb_mutex);
 
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
index 91e9622c07d0..0637b1d36356 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
@@ -40,11 +40,11 @@ void intel_pxp_irq_handler(struct intel_pxp *pxp, u16 iir)
   GEN12_DISPLAY_APP_TERMINATED_PER_FW_REQ_INTERRUPT)) {
/* immediately mark PXP as inactive on termination */
intel_pxp_mark_termination_in_progress(pxp);
-   pxp->session_events |= PXP_TERMINATION_REQUEST | 
PXP_INVAL_REQUIRED;
+   pxp->session_events |= PXP_TERMINATION_REQUEST | 
PXP_INVAL_REQUIRED | PXP_EVENT_TYPE_IRQ;
}
 
if (iir & GEN12_DISPLAY_STATE_RESET_COMPLETE_INTERRUPT)
-   pxp->session_events |= PXP_TERMINATION_COMPLETE;
+   pxp->session_events |= PXP_TERMINATION_COMPLETE | 
PXP_EVENT_TYPE_IRQ;
 
if (pxp->session_events)
queue_work(system_unbound_wq, &pxp->session_work);
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c 
b/drivers/gpu/drm/i915/pxp/inte

Re: [Intel-gfx] [PATCH v1 1/1] drm/i915/pxp: Add drm_dbgs for critical PXP events.

2023-09-13 Thread Teres Alexis, Alan Previn
On Mon, 2023-09-11 at 12:26 +0300, Jani Nikula wrote:
> On Wed, 06 Sep 2023, Alan Previn  wrote:
> > Debugging PXP issues can't even begin without understanding precedding
> > sequence of events. Add drm_dbg into the most important PXP events.
> > 
> > Signed-off-by: Alan Previn 
alan:snip

> 
> > +
> > +   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: %s invoked", __func__);
> 
> drm_dbg already covers __func__ (via __builtin_return_address(0) in
> __drm_dev_dbg), it's redundant.
> 
> Ditto for all added debugs below.

My bad - yup - will fix them.
Thanks for taking time to review this patch.
...alan
> 



Re: [Intel-gfx] [PATCH] i915/pmu: Move execlist stats initialization to execlist specific setup

2023-09-13 Thread Teres Alexis, Alan Previn
I went up the call stack to ensure the differences between the
old and new location isnt skipping over other functions that may reference
something engine related (that may also end up triggering stats variabls).

Without digging further, i see the old postion here:
i915_driver_probe -> i915_driver_mmio_probe -> intel_gt_init_mmio -> 
intel_engines_init_mmio -> intel_engine_setup
new postion here:
i915_driver_probe -> i915_gem_init -> (for_each_gt) intel_gt_init -> 
intel_engines_init -> setup (intel_execlists_submission_setup)

And between i915_driver_mmio_probe and i915_gem_init are only mem/ggtt, display 
and irq related init functions.
That said, LGTM (although you do need to address the BAT failure before 
merging):

Reviewed-by: Alan Previn 

On Tue, 2023-09-12 at 14:22 -0700, Umesh Nerlige Ramappa wrote:
> engine->stats is a union of execlist and guc stat objects. When execlist
> specific fields are initialized, the initial state of guc stats is
> affected. This results in bad busyness values when using GuC mode. Move
> the execlist initialization from common code to execlist specific code.
> 
> Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to 
> pmu")
> Signed-off-by: Umesh Nerlige Ramappa 
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c| 1 -
>  drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 2 ++
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index dfb69fc977a0..84a75c95f3f7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -558,7 +558,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
> intel_engine_id id,
>   DRIVER_CAPS(i915)->has_logical_contexts = true;
>  
>   ewma__engine_latency_init(&engine->latency);
> - seqcount_init(&engine->stats.execlists.lock);
>  
>   ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 4d05321dc5b5..e8f42ec6b1b4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3548,6 +3548,8 @@ int intel_execlists_submission_setup(struct 
> intel_engine_cs *engine)
>   logical_ring_default_vfuncs(engine);
>   logical_ring_default_irqs(engine);
>  
> + seqcount_init(&engine->stats.execlists.lock);
> +
>   if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)
>   rcs_submission_override(engine);
>  



[Intel-gfx] [PATCH v3 2/3] drm/i915/guc: Close deregister-context race against CT-loss

2023-09-09 Thread Alan Previn
If we are at the end of suspend or very early in resume
its possible an async fence signal could lead us to the
execution of the context destruction worker (after the
prior worker flush).

Even if checking that the CT is enabled before calling
destroyed_worker_func, guc_lrc_desc_unpin may still fail
because in corner cases, as we traverse the GuC's
context-destroy list, the CT could get disabled in the mid
of it right before calling the GuC's CT send function.

We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring workloads that render
something onscreen is continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).

In such a case, we need to unroll the unpin process because
guc-lrc-unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. That will cascade into a kernel hang later at the tail
end of suspend in this function:

   intel_wakeref_wait_for_idle(>->wakeref)
   (called by) - intel_gt_pm_wait_for_idle
   (called by) - wait_for_suspend

Doing this unroll and keeping the context in the GuC's
destroy-list will allow the context to get picked up on
the next destroy worker invocation or purged as part of a
major GuC sanitization or reset flow.

Signed-off-by: Alan Previn 
Tested-by: Mousumi Jana 
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 76 ---
 1 file changed, 65 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0ed44637bca0..db7df1217350 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -235,6 +235,13 @@ set_context_destroyed(struct intel_context *ce)
ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline void
+clr_context_destroyed(struct intel_context *ce)
+{
+   lockdep_assert_held(&ce->guc_state.lock);
+   ce->guc_state.sched_state &= ~SCHED_STATE_DESTROYED;
+}
+
 static inline bool context_pending_disable(struct intel_context *ce)
 {
return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE;
@@ -612,6 +619,8 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
 u32 g2h_len_dw,
 bool loop)
 {
+   int ret;
+
/*
 * We always loop when a send requires a reply (i.e. g2h_len_dw > 0),
 * so we don't handle the case where we don't get a reply because we
@@ -622,7 +631,11 @@ static int guc_submission_send_busy_loop(struct intel_guc 
*guc,
if (g2h_len_dw)
atomic_inc(&guc->outstanding_submission_g2h);
 
-   return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   ret = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+   if (ret)
+   atomic_dec(&guc->outstanding_submission_g2h);
+
+   return ret;
 }
 
 int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
@@ -3173,12 +3186,13 @@ static void guc_context_close(struct intel_context *ce)
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 }
 
-static inline void guc_lrc_desc_unpin(struct intel_context *ce)
+static inline int guc_lrc_desc_unpin(struct intel_context *ce)
 {
struct intel_guc *guc = ce_to_guc(ce);
struct intel_gt *gt = guc_to_gt(guc);
unsigned long flags;
bool disabled;
+   int ret;
 
GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
GEM_BUG_ON(!ctx_id_mapped(guc, ce->guc_id.id));
@@ -3188,19 +3202,33 @@ static inline void guc_lrc_desc_unpin(struct 
intel_context *ce)
/* Seal race with Reset */
spin_lock_irqsave(&ce->guc_state.lock, flags);
disabled = submission_disabled(guc);
-   if (likely(!disabled)) {
-   __intel_gt_pm_get(gt);
-   set_context_destroyed(ce);
-   clr_context_registered(ce);
-   }
-   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
if (unlikely(disabled)) {
+   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
release_guc_id(guc, ce);
__guc_context_destroy(ce);
-   return;
+   return 0;
}
 
-   deregister_context(ce, ce->guc_id.id);
+   /* GuC is active, lets destroy this context,
+* but at this point we can still be racing with
+* suspend, so we undo everything if the H2G fails
+*/
+
+   /* Change context state to destroyed and get gt-pm */
+   __intel_gt_pm_get(gt);
+   set_context_destroyed(ce);
+   clr_context_registered(ce);
+
+   ret = deregister_conte

[Intel-gfx] [PATCH v3 1/3] drm/i915/guc: Flush context destruction worker at suspend

2023-09-09 Thread Alan Previn
When suspending, flush the context-guc-id
deregistration worker at the final stages of
intel_gt_suspend_late when we finally call gt_sanitize
that eventually leads down to __uc_sanitize so that
the deregistration worker doesn't fire off later as
we reset the GuC microcontroller.

Signed-off-by: Alan Previn 
Tested-by: Mousumi Jana 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index cabdc645fcdd..0ed44637bca0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1578,6 +1578,11 @@ static void guc_flush_submissions(struct intel_guc *guc)
spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
+void intel_guc_submission_flush_work(struct intel_guc *guc)
+{
+   flush_work(&guc->submission_state.destroyed_worker);
+}
+
 static void guc_flush_destroyed_contexts(struct intel_guc *guc);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index c57b29cdb1a6..b6df75622d3b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -38,6 +38,8 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
   bool interruptible,
   long timeout);
 
+void intel_guc_submission_flush_work(struct intel_guc *guc);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
return guc->submission_supported;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 98b103375b7a..eb3554cb5ea4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -693,6 +693,8 @@ void intel_uc_suspend(struct intel_uc *uc)
return;
}
 
+   intel_guc_submission_flush_work(guc);
+
with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
err = intel_guc_suspend(guc);
if (err)
-- 
2.39.0



[Intel-gfx] [PATCH v3 3/3] drm/i915/gt: Timeout when waiting for idle in suspending

2023-09-09 Thread Alan Previn
When suspending, add a timeout when calling
intel_gt_pm_wait_for_idle else if we have a lost
G2H event that holds a wakeref (which would be
indicative of a bug elsewhere in the driver),
driver will at least complete the suspend-resume
cycle, (albeit not hitting all the targets for
low power hw counters), instead of hanging in the kernel.

Signed-off-by: Alan Previn 
Tested-by: Mousumi Jana 
Reviewed-by: Rodrigo Vivi 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |  7 ++-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h |  7 ++-
 drivers/gpu/drm/i915/intel_wakeref.c  | 14 ++
 drivers/gpu/drm/i915/intel_wakeref.h  |  6 --
 5 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index dfb69fc977a0..4811f3be0332 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -688,7 +688,7 @@ void intel_engines_release(struct intel_gt *gt)
if (!engine->release)
continue;
 
-   intel_wakeref_wait_for_idle(&engine->wakeref);
+   intel_wakeref_wait_for_idle(&engine->wakeref, 0);
GEM_BUG_ON(intel_engine_pm_is_awake(engine));
 
engine->release(engine);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index 5a942af0a14e..ca46aee72573 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -289,6 +289,8 @@ int intel_gt_resume(struct intel_gt *gt)
 
 static void wait_for_suspend(struct intel_gt *gt)
 {
+   int timeout_ms = CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT ? : 1;
+
if (!intel_gt_pm_is_awake(gt))
return;
 
@@ -301,7 +303,10 @@ static void wait_for_suspend(struct intel_gt *gt)
intel_gt_retire_requests(gt);
}
 
-   intel_gt_pm_wait_for_idle(gt);
+   /* we are suspending, so we shouldn't be waiting forever */
+   if (intel_gt_pm_wait_timeout_for_idle(gt, timeout_ms) == -ETIMEDOUT)
+   gt_warn(gt, "bailing from %s after %d milisec timeout\n",
+   __func__, timeout_ms);
 }
 
 void intel_gt_suspend_prepare(struct intel_gt *gt)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index 6c9a46452364..5358acc2b5b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -68,7 +68,12 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
-   return intel_wakeref_wait_for_idle(>->wakeref);
+   return intel_wakeref_wait_for_idle(>->wakeref, 0);
+}
+
+static inline int intel_gt_pm_wait_timeout_for_idle(struct intel_gt *gt, int 
timeout_ms)
+{
+   return intel_wakeref_wait_for_idle(>->wakeref, timeout_ms);
 }
 
 void intel_gt_pm_init_early(struct intel_gt *gt);
diff --git a/drivers/gpu/drm/i915/intel_wakeref.c 
b/drivers/gpu/drm/i915/intel_wakeref.c
index 718f2f1b6174..383a37521415 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.c
+++ b/drivers/gpu/drm/i915/intel_wakeref.c
@@ -111,14 +111,20 @@ void __intel_wakeref_init(struct intel_wakeref *wf,
 "wakeref.work", &key->work, 0);
 }
 
-int intel_wakeref_wait_for_idle(struct intel_wakeref *wf)
+int intel_wakeref_wait_for_idle(struct intel_wakeref *wf, int timeout_ms)
 {
-   int err;
+   int err = 0;
 
might_sleep();
 
-   err = wait_var_event_killable(&wf->wakeref,
- !intel_wakeref_is_active(wf));
+   if (!timeout_ms)
+   err = wait_var_event_killable(&wf->wakeref,
+ !intel_wakeref_is_active(wf));
+   else if (wait_var_event_timeout(&wf->wakeref,
+   !intel_wakeref_is_active(wf),
+   msecs_to_jiffies(timeout_ms)) < 1)
+   err = -ETIMEDOUT;
+
if (err)
return err;
 
diff --git a/drivers/gpu/drm/i915/intel_wakeref.h 
b/drivers/gpu/drm/i915/intel_wakeref.h
index ec881b097368..302694a780d2 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.h
+++ b/drivers/gpu/drm/i915/intel_wakeref.h
@@ -251,15 +251,17 @@ __intel_wakeref_defer_park(struct intel_wakeref *wf)
 /**
  * intel_wakeref_wait_for_idle: Wait until the wakeref is idle
  * @wf: the wakeref
+ * @timeout_ms: Timeout in ms, 0 means never timeout.
  *
  * Wait for the earlier asynchronous release of the wakeref. Note
  * this will wait for any third party as well, so make sure you only wait
  * when you have control over the wakeref and trust no one else is acquiring
  * it.
  *
- * Return: 0 on success, error code if killed.
+

[Intel-gfx] [PATCH v3 0/3] Resolve suspend-resume racing with GuC destroy-context-worker

2023-09-09 Thread Alan Previn
This series is the result of debugging issues root caused to
races between the GuC's destroyed_worker_func being triggered
vs repeating suspend-resume cycles with concurrent delayed
fence signals for engine-freeing.

The reproduction steps require that an app is launched right
before the start of the suspend cycle where it creates a
new gem context and submits a tiny workload that would
complete in the middle of the suspend cycle. However this
app uses dma-buffer sharing or dma-fence with non-GPU
objects or signals that eventually triggers a FENCE_FREE
via__i915_sw_fence_notify that connects to engines_notify ->
free_engines_rcu -> intel_context_put ->
kref_put(&ce->ref..) that queues the worker after the GuCs
CTB has been disabled (i.e. after i915-gem's suspend-late).

This sequence is a corner-case and required repeating this
app->suspend->resume cycle ~1500 times across 4 identical
systems to see it once. That said, based on above callstack,
it is clear that merely flushing the context destruction worker,
which is obviously missing and needed, isn't sufficient.

Because of that, this series adds additional patches besides
the obvious (Patch #1) flushing of the worker during the
suspend flows. It also includes (Patch #2) closing a race
between sending the context-deregistration H2G vs the CTB
getting disabled in the midst of it (by detecing the failure
and unrolling the guc-lrc-unpin flow) and (Patch #32) not
infinitely waiting in intel_gt_pm_wait_timeout_for_idle
when in the suspend-flow.

NOTE: We are still observing one more wakeref leak from gt
but not necessarilty guc. We are still debugging this so
this series will very likely get rev'd up again.

Changes from prior revs:
   v2: - Patch #2 Restructure code in guc_lrc_desc_unpin so
 it's more readible to differentiate (1)direct guc-id
 cleanup ..vs (2) sending the H2G ctx-destroy action ..
 vs (3) the unrolling steps if the H2G fails.
   - Patch #2 Add a check to close the race sooner by checking
 for intel_guc_is_ready from destroyed_worker_func.
   - Patch #2 When guc_submission_send_busy_loop gets a
 failure from intel_guc_send_busy_loop, we need to undo
 i.e. decrement the outstanding_submission_g2h.
   - Patch #3 In wait_for_suspend, fix checking of return from
 intel_gt_pm_wait_timeout_for_idle to now use -ETIMEDOUT
 and add documentation for intel_wakeref_wait_for_idle.
 (Rodrigo).

Alan Previn (3):
  drm/i915/guc: Flush context destruction worker at suspend
  drm/i915/guc: Close deregister-context race against CT-loss
  drm/i915/gt: Timeout when waiting for idle in suspending

 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |  7 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h |  7 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 81 ---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  2 +
 drivers/gpu/drm/i915/intel_wakeref.c  | 14 +++-
 drivers/gpu/drm/i915/intel_wakeref.h  |  6 +-
 8 files changed, 101 insertions(+), 20 deletions(-)


base-commit: f8d21cb17a99b75862196036bb4bb93ee9637b74
-- 
2.39.0



[Intel-gfx] [PATCH v5 3/3] drm/i915/lrc: User PXP contexts requires runalone bit in lrc

2023-09-09 Thread Alan Previn
On Meteorlake onwards, HW specs require that all user contexts that
run on render or compute engines and require PXP must enforce
run-alone bit in lrc. Add this enforcement for protected contexts.

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 967fe4d77a87..3df32177e49e 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -845,6 +845,27 @@ lrc_setup_indirect_ctx(u32 *regs,
lrc_ring_indirect_offset_default(engine) << 6;
 }
 
+static bool ctx_needs_runalone(const struct intel_context *ce)
+{
+   struct i915_gem_context *gem_ctx;
+   bool ctx_is_protected = false;
+
+   /*
+* On MTL and newer platforms, protected contexts require setting
+* the LRC run-alone bit or else the encryption will not happen.
+*/
+   if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 70) &&
+   (ce->engine->class == COMPUTE_CLASS || ce->engine->class == 
RENDER_CLASS)) {
+   rcu_read_lock();
+   gem_ctx = rcu_dereference(ce->gem_context);
+   if (gem_ctx)
+   ctx_is_protected = gem_ctx->uses_protected_content;
+   rcu_read_unlock();
+   }
+
+   return ctx_is_protected;
+}
+
 static void init_common_regs(u32 * const regs,
 const struct intel_context *ce,
 const struct intel_engine_cs *engine,
@@ -860,6 +881,8 @@ static void init_common_regs(u32 * const regs,
if (GRAPHICS_VER(engine->i915) < 11)
ctl |= _MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT |
   CTX_CTRL_RS_CTX_ENABLE);
+   if (ctx_needs_runalone(ce))
+   ctl |= _MASKED_BIT_ENABLE(BIT(7));
regs[CTX_CONTEXT_CONTROL] = ctl;
 
regs[CTX_TIMESTAMP] = ce->stats.runtime.last;
-- 
2.39.0



[Intel-gfx] [PATCH v5 1/3] drm/i915/pxp/mtl: Update pxp-firmware response timeout

2023-09-09 Thread Alan Previn
Update the max GSC-fw response time to match updated internal
fw specs. Because this response time is an SLA on the firmware,
not inclusive of i915->GuC->HW handoff latency, when submitting
requests to the GSC fw via intel_gsc_uc_heci_cmd_submit helpers,
start the count after the request hits the GSC command streamer.
Also, move GSC_REPLY_LATENCY_MS definition from pxp header to
intel_gsc_uc_heci_cmd_submit.h since its for any GSC HECI packet.

Signed-off-by: Alan Previn 
---
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c | 20 +--
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h |  6 ++
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h| 11 ++
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c
index 89ed5ee9cded..fe6a2f78cea0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c
@@ -81,8 +81,17 @@ int intel_gsc_uc_heci_cmd_submit_packet(struct intel_gsc_uc 
*gsc, u64 addr_in,
 
i915_request_add(rq);
 
-   if (!err && i915_request_wait(rq, 0, msecs_to_jiffies(500)) < 0)
-   err = -ETIME;
+   if (!err) {
+   /*
+* Start timeout for i915_request_wait only after considering 
one possible
+* pending GSC-HECI submission cycle on the other 
(non-privileged) path.
+*/
+   if (wait_for(i915_request_started(rq), 
GSC_HECI_REPLY_LATENCY_MS))
+   drm_dbg(&gsc_uc_to_gt(gsc)->i915->drm,
+   "Delay in gsc-heci-priv submission to 
gsccs-hw");
+   if (i915_request_wait(rq, 0, msecs_to_jiffies(500)) < 0)
+   err = -ETIME;
+   }
 
i915_request_put(rq);
 
@@ -186,6 +195,13 @@ intel_gsc_uc_heci_cmd_submit_nonpriv(struct intel_gsc_uc 
*gsc,
i915_request_add(rq);
 
if (!err) {
+   /*
+* Start timeout for i915_request_wait only after considering 
one possible
+* pending GSC-HECI submission cycle on the other (privileged) 
path.
+*/
+   if (wait_for(i915_request_started(rq), 
GSC_HECI_REPLY_LATENCY_MS))
+   drm_dbg(&gsc_uc_to_gt(gsc)->i915->drm,
+   "Delay in gsc-heci-non-priv submission to 
gsccs-hw");
if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
  msecs_to_jiffies(timeout_ms)) < 0)
err = -ETIME;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
index 09d3fbdad05a..5ae5c5d9608b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
@@ -12,6 +12,12 @@ struct i915_vma;
 struct intel_context;
 struct intel_gsc_uc;
 
+#define GSC_HECI_REPLY_LATENCY_MS 350
+/*
+ * Max FW response time is 350ms, but this should be counted from the time the
+ * command has hit the GSC-CS hardware, not the preceding handoff to GuC CTB.
+ */
+
 struct intel_gsc_mtl_header {
u32 validity_marker;
 #define GSC_HECI_VALIDITY_MARKER 0xA578875A
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h 
b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
index 298ad38e6c7d..a4f17b3ea286 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
@@ -8,16 +8,19 @@
 
 #include 
 
+#include "gt/uc/intel_gsc_uc_heci_cmd_submit.h"
+
 struct intel_pxp;
 
-#define GSC_REPLY_LATENCY_MS 210
+#define GSC_REPLY_LATENCY_MS GSC_HECI_REPLY_LATENCY_MS
 /*
- * Max FW response time is 200ms, to which we add 10ms to account for overhead
- * such as request preparation, GuC submission to hw and pipeline completion 
times.
+ * Max FW response time is 350ms, but this should be counted from the time the
+ * command has hit the GSC-CS hardware, not the preceding handoff to GuC CTB.
  */
 #define GSC_PENDING_RETRY_MAXCOUNT 40
 #define GSC_PENDING_RETRY_PAUSE_MS 50
-#define GSCFW_MAX_ROUND_TRIP_LATENCY_MS (GSC_PENDING_RETRY_MAXCOUNT * 
GSC_PENDING_RETRY_PAUSE_MS)
+#define GSCFW_MAX_ROUND_TRIP_LATENCY_MS (GSC_REPLY_LATENCY_MS + \
+(GSC_PENDING_RETRY_MAXCOUNT * 
GSC_PENDING_RETRY_PAUSE_MS))
 
 #ifdef CONFIG_DRM_I915_PXP
 void intel_pxp_gsccs_fini(struct intel_pxp *pxp);
-- 
2.39.0



[Intel-gfx] [PATCH v5 2/3] drm/i915/pxp/mtl: Update pxp-firmware packet size

2023-09-09 Thread Alan Previn
Update the GSC-fw input/output HECI packet size to match
updated internal fw specs.

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h 
b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
index 0165d38fbead..e017a7d952e9 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
@@ -14,8 +14,8 @@
 #define PXP43_CMDID_NEW_HUC_AUTH 0x003F /* MTL+ */
 #define PXP43_CMDID_INIT_SESSION 0x0036
 
-/* PXP-Packet sizes for MTL's GSCCS-HECI instruction */
-#define PXP43_MAX_HECI_INOUT_SIZE (SZ_32K)
+/* PXP-Packet sizes for MTL's GSCCS-HECI instruction is spec'd at 65K before 
page alignment*/
+#define PXP43_MAX_HECI_INOUT_SIZE (PAGE_ALIGNED(SZ_64K + SZ_1K))
 
 /* PXP-Packet size for MTL's NEW_HUC_AUTH instruction */
 #define PXP43_HUC_AUTH_INOUT_SIZE (SZ_4K)
-- 
2.39.0



[Intel-gfx] [PATCH v5 0/3] drm/i915/pxp/mtl: Update gsc-heci cmd submission to align with fw/hw spec

2023-09-09 Thread Alan Previn
For MTL, update the GSC-HECI packet size and the max firmware
response timeout to match internal fw specs. Enforce setting
run-alone bit in LRC for protected contexts.

Changes from prio revs:
   v4: - PAGE_ALIGN the max heci packet size (Alan).
   v3: - Patch #1. Only start counting the request completion
 timeout from after the request has started (Daniele).
   v2: - Patch #3: fix sparse warning reported by kernel test robot.
   v1: - N/A (Re-test)

Signed-off-by: Alan Previn 

Alan Previn (3):
  drm/i915/pxp/mtl: Update pxp-firmware response timeout
  drm/i915/pxp/mtl: Update pxp-firmware packet size
  drm/i915/lrc: User PXP contexts requires runalone bit in lrc

 drivers/gpu/drm/i915/gt/intel_lrc.c   | 23 +++
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c | 20 ++--
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h |  6 +
 .../drm/i915/pxp/intel_pxp_cmd_interface_43.h |  4 ++--
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h| 11 +
 5 files changed, 56 insertions(+), 8 deletions(-)


base-commit: f8d21cb17a99b75862196036bb4bb93ee9637b74
-- 
2.39.0



Re: [Intel-gfx] [PATCH v4 2/3] drm/i915/pxp/mtl: Update pxp-firmware packet size

2023-09-06 Thread Teres Alexis, Alan Previn
On Wed, 2023-09-06 at 17:15 -0700, Teres Alexis, Alan Previn wrote:
> Update the GSC-fw input/output HECI packet size to match
> updated internal fw specs.
alan:snip

> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
> @@ -14,8 +14,8 @@
> 


> +/* PXP-Packet sizes for MTL's GSCCS-HECI instruction is 65K*/
> +#define PXP43_MAX_HECI_INOUT_SIZE (SZ_64K + SZ_1K)
alan: my mistake - didnt fix this before posting - should have been 
PAGE_ALIGN(65k)




[Intel-gfx] [PATCH v1 1/1] drm/i915/pxp: Add drm_dbgs for critical PXP events.

2023-09-06 Thread Alan Previn
Debugging PXP issues can't even begin without understanding precedding
sequence of events. Add drm_dbg into the most important PXP events.

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c |  2 ++
 drivers/gpu/drm/i915/pxp/intel_pxp.c | 10 --
 drivers/gpu/drm/i915/pxp/intel_pxp_irq.c |  4 ++--
 drivers/gpu/drm/i915/pxp/intel_pxp_session.c |  6 +-
 drivers/gpu/drm/i915/pxp/intel_pxp_types.h   |  1 +
 5 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
index 5f138de3c14f..61216c4abaec 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.c
@@ -321,6 +321,7 @@ static int i915_gsc_proxy_component_bind(struct device 
*i915_kdev,
mutex_lock(&gsc->proxy.mutex);
gsc->proxy.component = data;
gsc->proxy.component->mei_dev = mei_kdev;
+   gt_dbg(gt, "GSC proxy mei component bound\n");
mutex_unlock(&gsc->proxy.mutex);
 
return 0;
@@ -342,6 +343,7 @@ static void i915_gsc_proxy_component_unbind(struct device 
*i915_kdev,
with_intel_runtime_pm(&i915->runtime_pm, wakeref)
intel_uncore_rmw(gt->uncore, HECI_H_CSR(MTL_GSC_HECI2_BASE),
 HECI_H_CSR_IE | HECI_H_CSR_RST, 0);
+   gt_dbg(gt, "GSC proxy mei component unbound\n");
 }
 
 static const struct component_ops i915_gsc_proxy_component_ops = {
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp.c
index dc327cf40b5a..d285f10bbacc 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
@@ -303,6 +303,8 @@ static int __pxp_global_teardown_final(struct intel_pxp 
*pxp)
 
if (!pxp->arb_is_valid)
return 0;
+
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: %s invoked", __func__);
/*
 * To ensure synchronous and coherent session teardown completion
 * in response to suspend or shutdown triggers, don't use a worker.
@@ -324,6 +326,8 @@ static int __pxp_global_teardown_restart(struct intel_pxp 
*pxp)
 
if (pxp->arb_is_valid)
return 0;
+
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: %s invoked", __func__);
/*
 * The arb-session is currently inactive and we are doing a reset and 
restart
 * due to a runtime event. Use the worker that was designed for this.
@@ -414,10 +418,12 @@ int intel_pxp_start(struct intel_pxp *pxp)
int ret = 0;
 
ret = intel_pxp_get_readiness_status(pxp, PXP_READINESS_TIMEOUT);
-   if (ret < 0)
+   if (ret < 0) {
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: explicit %s failed on 
readiness with %d", __func__, ret);
return ret;
-   else if (ret > 1)
+   } else if (ret > 1) {
return -EIO; /* per UAPI spec, user may retry later */
+   }
 
mutex_lock(&pxp->arb_mutex);
 
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
index 91e9622c07d0..0637b1d36356 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_irq.c
@@ -40,11 +40,11 @@ void intel_pxp_irq_handler(struct intel_pxp *pxp, u16 iir)
   GEN12_DISPLAY_APP_TERMINATED_PER_FW_REQ_INTERRUPT)) {
/* immediately mark PXP as inactive on termination */
intel_pxp_mark_termination_in_progress(pxp);
-   pxp->session_events |= PXP_TERMINATION_REQUEST | 
PXP_INVAL_REQUIRED;
+   pxp->session_events |= PXP_TERMINATION_REQUEST | 
PXP_INVAL_REQUIRED | PXP_EVENT_TYPE_IRQ;
}
 
if (iir & GEN12_DISPLAY_STATE_RESET_COMPLETE_INTERRUPT)
-   pxp->session_events |= PXP_TERMINATION_COMPLETE;
+   pxp->session_events |= PXP_TERMINATION_COMPLETE | 
PXP_EVENT_TYPE_IRQ;
 
if (pxp->session_events)
queue_work(system_unbound_wq, &pxp->session_work);
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
index 0a3e66b0265e..2041dd5221e7 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
@@ -137,8 +137,10 @@ void intel_pxp_terminate(struct intel_pxp *pxp, bool 
post_invalidation_needs_res
 static void pxp_terminate_complete(struct intel_pxp *pxp)
 {
/* Re-create the arb session after teardown handle complete */
-   if (fetch_and_zero(&pxp->hw_state_invalidated))
+   if (fetch_and_zero(&pxp->hw_state_invalidated)) {
+   drm_dbg(&pxp->ctrl_gt->i915->drm, "PXP: %s to create 
arb_session after invalidation", __func__);

[Intel-gfx] [PATCH v4 2/3] drm/i915/pxp/mtl: Update pxp-firmware packet size

2023-09-06 Thread Alan Previn
Update the GSC-fw input/output HECI packet size to match
updated internal fw specs.

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h 
b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
index 0165d38fbead..b2196b008f26 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
@@ -14,8 +14,8 @@
 #define PXP43_CMDID_NEW_HUC_AUTH 0x003F /* MTL+ */
 #define PXP43_CMDID_INIT_SESSION 0x0036
 
-/* PXP-Packet sizes for MTL's GSCCS-HECI instruction */
-#define PXP43_MAX_HECI_INOUT_SIZE (SZ_32K)
+/* PXP-Packet sizes for MTL's GSCCS-HECI instruction is 65K*/
+#define PXP43_MAX_HECI_INOUT_SIZE (SZ_64K + SZ_1K)
 
 /* PXP-Packet size for MTL's NEW_HUC_AUTH instruction */
 #define PXP43_HUC_AUTH_INOUT_SIZE (SZ_4K)
-- 
2.39.0



[Intel-gfx] [PATCH v4 1/3] drm/i915/pxp/mtl: Update pxp-firmware response timeout

2023-09-06 Thread Alan Previn
Update the max GSC-fw response time to match updated internal
fw specs. Because this response time is an SLA on the firmware,
not inclusive of i915->GuC->HW handoff latency, when submitting
requests to the GSC fw via intel_gsc_uc_heci_cmd_submit helpers,
start the count after the request hits the GSC command streamer.
Also, move GSC_REPLY_LATENCY_MS definition from pxp header to
intel_gsc_uc_heci_cmd_submit.h since its for any GSC HECI packet.

Signed-off-by: Alan Previn 
---
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c | 20 +--
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h |  6 ++
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h| 11 ++
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c
index 89ed5ee9cded..fe6a2f78cea0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c
@@ -81,8 +81,17 @@ int intel_gsc_uc_heci_cmd_submit_packet(struct intel_gsc_uc 
*gsc, u64 addr_in,
 
i915_request_add(rq);
 
-   if (!err && i915_request_wait(rq, 0, msecs_to_jiffies(500)) < 0)
-   err = -ETIME;
+   if (!err) {
+   /*
+* Start timeout for i915_request_wait only after considering 
one possible
+* pending GSC-HECI submission cycle on the other 
(non-privileged) path.
+*/
+   if (wait_for(i915_request_started(rq), 
GSC_HECI_REPLY_LATENCY_MS))
+   drm_dbg(&gsc_uc_to_gt(gsc)->i915->drm,
+   "Delay in gsc-heci-priv submission to 
gsccs-hw");
+   if (i915_request_wait(rq, 0, msecs_to_jiffies(500)) < 0)
+   err = -ETIME;
+   }
 
i915_request_put(rq);
 
@@ -186,6 +195,13 @@ intel_gsc_uc_heci_cmd_submit_nonpriv(struct intel_gsc_uc 
*gsc,
i915_request_add(rq);
 
if (!err) {
+   /*
+* Start timeout for i915_request_wait only after considering 
one possible
+* pending GSC-HECI submission cycle on the other (privileged) 
path.
+*/
+   if (wait_for(i915_request_started(rq), 
GSC_HECI_REPLY_LATENCY_MS))
+   drm_dbg(&gsc_uc_to_gt(gsc)->i915->drm,
+   "Delay in gsc-heci-non-priv submission to 
gsccs-hw");
if (i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE,
  msecs_to_jiffies(timeout_ms)) < 0)
err = -ETIME;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
index 09d3fbdad05a..5ae5c5d9608b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h
@@ -12,6 +12,12 @@ struct i915_vma;
 struct intel_context;
 struct intel_gsc_uc;
 
+#define GSC_HECI_REPLY_LATENCY_MS 350
+/*
+ * Max FW response time is 350ms, but this should be counted from the time the
+ * command has hit the GSC-CS hardware, not the preceding handoff to GuC CTB.
+ */
+
 struct intel_gsc_mtl_header {
u32 validity_marker;
 #define GSC_HECI_VALIDITY_MARKER 0xA578875A
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h 
b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
index 298ad38e6c7d..a4f17b3ea286 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h
@@ -8,16 +8,19 @@
 
 #include 
 
+#include "gt/uc/intel_gsc_uc_heci_cmd_submit.h"
+
 struct intel_pxp;
 
-#define GSC_REPLY_LATENCY_MS 210
+#define GSC_REPLY_LATENCY_MS GSC_HECI_REPLY_LATENCY_MS
 /*
- * Max FW response time is 200ms, to which we add 10ms to account for overhead
- * such as request preparation, GuC submission to hw and pipeline completion 
times.
+ * Max FW response time is 350ms, but this should be counted from the time the
+ * command has hit the GSC-CS hardware, not the preceding handoff to GuC CTB.
  */
 #define GSC_PENDING_RETRY_MAXCOUNT 40
 #define GSC_PENDING_RETRY_PAUSE_MS 50
-#define GSCFW_MAX_ROUND_TRIP_LATENCY_MS (GSC_PENDING_RETRY_MAXCOUNT * 
GSC_PENDING_RETRY_PAUSE_MS)
+#define GSCFW_MAX_ROUND_TRIP_LATENCY_MS (GSC_REPLY_LATENCY_MS + \
+(GSC_PENDING_RETRY_MAXCOUNT * 
GSC_PENDING_RETRY_PAUSE_MS))
 
 #ifdef CONFIG_DRM_I915_PXP
 void intel_pxp_gsccs_fini(struct intel_pxp *pxp);
-- 
2.39.0



[Intel-gfx] [PATCH v4 0/3] drm/i915/pxp/mtl: Update gsc-heci cmd submission to align with fw/hw spec

2023-09-06 Thread Alan Previn
For MTL, update the GSC-HECI packet size and the max firmware
response timeout to match internal fw specs. Enforce setting
run-alone bit in LRC for protected contexts.

Changes from prio revs:
   v3: - Patch #1. Only start counting the request completion
 timeout from after the request has started (Daniele).
   v2: - Patch #3: fix sparse warning reported by kernel test robot.
   v1: - N/A (Re-test)

Signed-off-by: Alan Previn 

Alan Previn (3):
  drm/i915/pxp/mtl: Update pxp-firmware response timeout
  drm/i915/pxp/mtl: Update pxp-firmware packet size
  drm/i915/lrc: User PXP contexts requires runalone bit in lrc

 drivers/gpu/drm/i915/gt/intel_lrc.c   | 23 +++
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c | 20 ++--
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h |  6 +
 .../drm/i915/pxp/intel_pxp_cmd_interface_43.h |  4 ++--
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h| 11 +
 5 files changed, 56 insertions(+), 8 deletions(-)


base-commit: 5008076127a9599704e98fb4de3761743d943dd0
-- 
2.39.0



[Intel-gfx] [PATCH v4 3/3] drm/i915/lrc: User PXP contexts requires runalone bit in lrc

2023-09-06 Thread Alan Previn
On Meteorlake onwards, HW specs require that all user contexts that
run on render or compute engines and require PXP must enforce
run-alone bit in lrc. Add this enforcement for protected contexts.

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 967fe4d77a87..3df32177e49e 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -845,6 +845,27 @@ lrc_setup_indirect_ctx(u32 *regs,
lrc_ring_indirect_offset_default(engine) << 6;
 }
 
+static bool ctx_needs_runalone(const struct intel_context *ce)
+{
+   struct i915_gem_context *gem_ctx;
+   bool ctx_is_protected = false;
+
+   /*
+* On MTL and newer platforms, protected contexts require setting
+* the LRC run-alone bit or else the encryption will not happen.
+*/
+   if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 70) &&
+   (ce->engine->class == COMPUTE_CLASS || ce->engine->class == 
RENDER_CLASS)) {
+   rcu_read_lock();
+   gem_ctx = rcu_dereference(ce->gem_context);
+   if (gem_ctx)
+   ctx_is_protected = gem_ctx->uses_protected_content;
+   rcu_read_unlock();
+   }
+
+   return ctx_is_protected;
+}
+
 static void init_common_regs(u32 * const regs,
 const struct intel_context *ce,
 const struct intel_engine_cs *engine,
@@ -860,6 +881,8 @@ static void init_common_regs(u32 * const regs,
if (GRAPHICS_VER(engine->i915) < 11)
ctl |= _MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT |
   CTX_CTRL_RS_CTX_ENABLE);
+   if (ctx_needs_runalone(ce))
+   ctl |= _MASKED_BIT_ENABLE(BIT(7));
regs[CTX_CONTEXT_CONTROL] = ctl;
 
regs[CTX_TIMESTAMP] = ce->stats.runtime.last;
-- 
2.39.0



[Intel-gfx] [PATCH v4 0/3] drm/i915/pxp/mtl: Update gsc-heci cmd submission to align with fw/hw spec

2023-09-06 Thread Alan Previn
For MTL, update the GSC-HECI packet size and the max firmware
response timeout to match internal fw specs. Enforce setting
run-alone bit in LRC for protected contexts.

Changes from prio revs:
   v3: - Patch #1. Only start counting the request completion
 timeout from after the request has started (Daniele).
   v2: - Patch #3: fix sparse warning reported by kernel test robot.
   v1: - N/A (Re-test)

Signed-off-by: Alan Previn 

Alan Previn (3):
  drm/i915/pxp/mtl: Update pxp-firmware response timeout
  drm/i915/pxp/mtl: Update pxp-firmware packet size
  drm/i915/lrc: User PXP contexts requires runalone bit in lrc

 drivers/gpu/drm/i915/gt/intel_lrc.c   | 23 +++
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.c | 20 ++--
 .../i915/gt/uc/intel_gsc_uc_heci_cmd_submit.h |  6 +
 .../drm/i915/pxp/intel_pxp_cmd_interface_43.h |  4 ++--
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h| 11 +
 5 files changed, 56 insertions(+), 8 deletions(-)


base-commit: 5008076127a9599704e98fb4de3761743d943dd0
-- 
2.39.0



Re: [Intel-gfx] [PATCH v2 2/3] drm/i915/guc: Close deregister-context race against CT-loss

2023-08-28 Thread Teres Alexis, Alan Previn
Additional update from the most recent testing.

When relying solely on guc_lrc_desc_unpin getting a failure from 
deregister_context
as a means for identifying that we are in the 
"deregister-context-vs-suspend-late" race,
it is too late a location to handle this safely. This is because one of the
first things destroyed_worker_func does it to take a gt pm wakeref - which 
triggers
the gt_unpark function that does a whole lot bunch of other flows including 
triggering more
workers and taking additional refs. That said, its best to not even call
deregister_destroyed_contexts from the worker when !intel_guc_is_ready 
(ct-is-disabled).

...alan


On Fri, 2023-08-25 at 11:54 -0700, Teres Alexis, Alan Previn wrote:
> just a follow up note-to-self:
> 
> On Tue, 2023-08-15 at 12:08 -0700, Teres Alexis, Alan Previn wrote:
> > On Tue, 2023-08-15 at 09:56 -0400, Vivi, Rodrigo wrote:
> > > On Mon, Aug 14, 2023 at 06:12:09PM -0700, Alan Previn wrote:
> > > > 
> [snip]
> 
> in guc_submission_send_busy_loop, we are incrementing the following
> that needs to be decremented if the function fails.
> 
> atomic_inc(&guc->outstanding_submission_g2h);
> 
> also, it seems that even with thie unroll design - we are still
> leaking a wakeref elsewhere. this is despite a cleaner redesign of
> flows in function "guc_lrc_desc_unpin"
> (discussed earlier that wasnt very readible).
> 
> will re-rev today but will probably need more follow ups
> tracking that one more leaking gt-wakeref (one in thousands-cycles)
> but at least now we are not hanging mid-suspend.. we bail from suspend
> with useful kernel messages.
> 
> 
> 
> 



  1   2   3   4   5   6   7   8   9   10   >