Re: [PATCH linux-next] drm: drop unneeded assignment in the gfx_v6_0_enable_mgcg()

2021-08-20 Thread Christophe JAILLET

Hi,

Le 21/08/2021 à 04:08, CGEL a écrit :

From: Luo penghao 

The first assignment is not used. In order to keep the code style
consistency of the whole file, the first 'data' assignment should be
deleted.

The clang_analyzer complains as follows:

drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c:2608:10: warning:
Although the value stored to 'offset' is used in the enclosing expression,
the value is never actually read from 'offset'.


Apparently clang only spotted one place, at line 2608.



Reported-by: Zeal Robot 
Signed-off-by: Luo penghao 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index 6a8dade..84a5f22 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -2605,7 +2605,7 @@ static void gfx_v6_0_enable_mgcg(struct amdgpu_device 
*adev, bool enable)
u32 data, orig, tmp = 0;
  
  	if (enable && (adev->cg_flags & AMD_CG_SUPPORT_GFX_MGCG)) {

-   orig = data = RREG32(mmCGTS_SM_CTRL_REG);
+   orig = RREG32(mmCGTS_SM_CTRL_REG);
data = 0x96940200;
if (orig != data)
WREG32(mmCGTS_SM_CTRL_REG, data);
@@ -2617,7 +2617,7 @@ static void gfx_v6_0_enable_mgcg(struct amdgpu_device 
*adev, bool enable)
WREG32(mmCP_MEM_SLP_CNTL, data);
}
  
-		orig = data = RREG32(mmRLC_CGTT_MGCG_OVERRIDE);

+   orig = RREG32(mmRLC_CGTT_MGCG_OVERRIDE);
data &= 0xffc0;

 ^^
but you also change here where it is used.


if (orig != data)
WREG32(mmRLC_CGTT_MGCG_OVERRIDE, data);



CJ


Re: [PATCH 19/27] drm/i915/guc: Move guc_blocked fence to struct guc_state

2021-08-20 Thread Daniele Ceraolo Spurio




On 8/18/2021 11:16 PM, Matthew Brost wrote:

Move guc_blocked fence to struct guc_state as the lock which protects
the fence lives there.

s/ce->guc_blocked/ce->guc_state.blocked_fence/g


Could also call it just ce->guc_state.blocked, blocked_fence sounds to 
me like the fence itself is blocked.


Reviewed-by: Daniele Ceraolo Spurio 

Daniele



Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/intel_context.c|  5 +++--
  drivers/gpu/drm/i915/gt/intel_context_types.h  |  5 ++---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c  | 18 +-
  3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 745e84c72c90..0e48939ec85f 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -405,8 +405,9 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
 * Initialize fence to be complete as this is expected to be complete
 * unless there is a pending schedule disable outstanding.
 */
-   i915_sw_fence_init(&ce->guc_blocked, sw_fence_dummy_notify);
-   i915_sw_fence_commit(&ce->guc_blocked);
+   i915_sw_fence_init(&ce->guc_state.blocked_fence,
+  sw_fence_dummy_notify);
+   i915_sw_fence_commit(&ce->guc_state.blocked_fence);
  
  	i915_active_init(&ce->active,

 __intel_context_active, __intel_context_retire, 0);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 3a73f3117873..c06171ee8792 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -167,6 +167,8 @@ struct intel_context {
 * fence related to GuC submission
 */
struct list_head fences;
+   /* GuC context blocked fence */
+   struct i915_sw_fence blocked_fence;
} guc_state;
  
  	struct {

@@ -190,9 +192,6 @@ struct intel_context {
 */
struct list_head guc_id_link;
  
-	/* GuC context blocked fence */

-   struct i915_sw_fence guc_blocked;
-
/*
 * GuC priority management
 */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index deb2e821e441..053f4485d6e9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1490,24 +1490,24 @@ static void guc_blocked_fence_complete(struct 
intel_context *ce)
  {
lockdep_assert_held(&ce->guc_state.lock);
  
-	if (!i915_sw_fence_done(&ce->guc_blocked))

-   i915_sw_fence_complete(&ce->guc_blocked);
+   if (!i915_sw_fence_done(&ce->guc_state.blocked_fence))
+   i915_sw_fence_complete(&ce->guc_state.blocked_fence);
  }
  
  static void guc_blocked_fence_reinit(struct intel_context *ce)

  {
lockdep_assert_held(&ce->guc_state.lock);
-   GEM_BUG_ON(!i915_sw_fence_done(&ce->guc_blocked));
+   GEM_BUG_ON(!i915_sw_fence_done(&ce->guc_state.blocked_fence));
  
  	/*

 * This fence is always complete unless a pending schedule disable is
 * outstanding. We arm the fence here and complete it when we receive
 * the pending schedule disable complete message.
 */
-   i915_sw_fence_fini(&ce->guc_blocked);
-   i915_sw_fence_reinit(&ce->guc_blocked);
-   i915_sw_fence_await(&ce->guc_blocked);
-   i915_sw_fence_commit(&ce->guc_blocked);
+   i915_sw_fence_fini(&ce->guc_state.blocked_fence);
+   i915_sw_fence_reinit(&ce->guc_state.blocked_fence);
+   i915_sw_fence_await(&ce->guc_state.blocked_fence);
+   i915_sw_fence_commit(&ce->guc_state.blocked_fence);
  }
  
  static u16 prep_context_pending_disable(struct intel_context *ce)

@@ -1547,7 +1547,7 @@ static struct i915_sw_fence *guc_context_block(struct 
intel_context *ce)
if (enabled)
clr_context_enabled(ce);
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-   return &ce->guc_blocked;
+   return &ce->guc_state.blocked_fence;
}
  
  	/*

@@ -1563,7 +1563,7 @@ static struct i915_sw_fence *guc_context_block(struct 
intel_context *ce)
with_intel_runtime_pm(runtime_pm, wakeref)
__guc_context_sched_disable(guc, ce, guc_id);
  
-	return &ce->guc_blocked;

+   return &ce->guc_state.blocked_fence;
  }
  
  static void guc_context_unblock(struct intel_context *ce)




Re: [PATCH 17/27] drm/i915/guc: Flush G2H work queue during reset

2021-08-20 Thread Daniele Ceraolo Spurio




On 8/18/2021 11:16 PM, Matthew Brost wrote:

It isn't safe to scrub for missing G2H or continue with the reset until
all G2H processing is complete. Flush the G2H work queue during reset to
ensure it is done running.


Might be worth moving this patch closer to "drm/i915/guc: Process all 
G2H message at once in work queue".



Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface")
Signed-off-by: Matthew Brost 
---
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c  | 18 ++
  1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 4cf5a565f08e..9a53bae367b1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -714,8 +714,6 @@ static void guc_flush_submissions(struct intel_guc *guc)
  
  void intel_guc_submission_reset_prepare(struct intel_guc *guc)

  {
-   int i;
-
if (unlikely(!guc_submission_initialized(guc))) {
/* Reset called during driver load? GuC not yet initialised! */
return;
@@ -731,20 +729,8 @@ void intel_guc_submission_reset_prepare(struct intel_guc 
*guc)
  
  	guc_flush_submissions(guc);
  
-	/*

-* Handle any outstanding G2Hs before reset. Call IRQ handler directly
-* each pass as interrupt have been disabled. We always scrub for
-* outstanding G2H as it is possible for outstanding_submission_g2h to
-* be incremented after the context state update.
-*/
-   for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); 
++i) {
-   intel_guc_to_host_event_handler(guc);
-#define wait_for_reset(guc, wait_var) \
-   intel_guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
-   do {
-   wait_for_reset(guc, &guc->outstanding_submission_g2h);
-   } while (!list_empty(&guc->ct.requests.incoming));
-   }
+   flush_work(&guc->ct.requests.worker);
+


We're now not waiting on the requests anymore, just ensuring that the 
processing of the ones we already received is done. Is this intended? We 
do still handle the remaining outstanding submission in the scrub so it's 
functionally correct, but the commit message doesn't state the change in 
waiting behavior, so I wanted to double check it was planned.


Daniele


scrub_guc_desc_for_outstanding_g2h(guc);
  }
  




Re: [PATCH 15/27] drm/i915/guc: Reset LRC descriptor if register returns -ENODEV

2021-08-20 Thread Daniele Ceraolo Spurio




On 8/18/2021 11:16 PM, Matthew Brost wrote:

Reset LRC descriptor if a context register returns -ENODEV as this means
we are mid-reset.

Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface")
Signed-off-by: Matthew Brost 


Reviewed-by: Daniele Ceraolo Spurio 

Daniele


---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index fa87470ea576..4cf5a565f08e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1407,10 +1407,12 @@ static int guc_lrc_desc_pin(struct intel_context *ce, 
bool loop)
} else {
with_intel_runtime_pm(runtime_pm, wakeref)
ret = register_context(ce, loop);
-   if (unlikely(ret == -EBUSY))
+   if (unlikely(ret == -EBUSY)) {
+   reset_lrc_desc(guc, desc_idx);
+   } else if (unlikely(ret == -ENODEV)) {
reset_lrc_desc(guc, desc_idx);
-   else if (unlikely(ret == -ENODEV))
ret = 0;/* Will get registered later */
+   }
}
  
  	return ret;




Re: [PATCH 13/27] drm/i915/guc: Take context ref when cancelling request

2021-08-20 Thread Daniele Ceraolo Spurio




On 8/18/2021 11:16 PM, Matthew Brost wrote:

A context can get destroyed after cancelling a request so take a
reference to context when cancelling a request.


What's the exact race? AFAICS __i915_request_skip does not have a 
context_put().


Daniele



Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index e0e85e4ad512..85f96d325048 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1620,8 +1620,10 @@ static void guc_context_cancel_request(struct 
intel_context *ce,
   struct i915_request *rq)
  {
if (i915_sw_fence_signaled(&rq->submit)) {
-   struct i915_sw_fence *fence = guc_context_block(ce);
+   struct i915_sw_fence *fence;
  
+		intel_context_get(ce);

+   fence = guc_context_block(ce);
i915_sw_fence_wait(fence);
if (!i915_request_completed(rq)) {
__i915_request_skip(rq);
@@ -1636,6 +1638,7 @@ static void guc_context_cancel_request(struct 
intel_context *ce,
flush_work(&ce_to_guc(ce)->ct.requests.worker);
  
  		guc_context_unblock(ce);

+   intel_context_put(ce);
}
  }
  




Re: [PATCH v3 15/15] drm/mediatek: add mediatek-drm of vdosys1 support for MT8195

2021-08-20 Thread Chun-Kuang Hu
Hi, Nancy:

Nancy.Lin  於 2021年8月18日 週三 下午5:18寫道:
>
> Add driver data of mt8195 vdosys1 to mediatek-drm and modify drm for
> multi-mmsys support. The two mmsys (vdosys0 and vdosys1) will bring
> up two drm drivers, only one drm driver register as the drm device.
> Each drm driver binds its own component. The first bind drm driver
> will allocate the drm device, and the last bind drm driver registers
> the drm device to drm core. Each crtc path is created with the
> corresponding drm driver data.

Separate this patch into two patches. One is to support two mmsys, and
another one is to support mt8195 vdosys1.

Regards,
Chun-Kuang.

>
> Signed-off-by: Nancy.Lin 
> ---
>  drivers/gpu/drm/mediatek/mtk_disp_merge.c   |   4 +
>  drivers/gpu/drm/mediatek/mtk_drm_crtc.c |  25 +-
>  drivers/gpu/drm/mediatek/mtk_drm_crtc.h |   3 +-
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c |  15 +
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h |   1 +
>  drivers/gpu/drm/mediatek/mtk_drm_drv.c  | 372 
>  drivers/gpu/drm/mediatek/mtk_drm_drv.h  |   8 +-
>  7 files changed, 354 insertions(+), 74 deletions(-)
>
>


[PATCH 18/27] drm/i915/guc: Update debugfs for GuC multi-lrc

2021-08-20 Thread Matthew Brost
Display the workqueue status in debugfs for GuC contexts that are in
parent-child relationship.

Signed-off-by: Matthew Brost 
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 51 ++-
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index e34e0ea9136a..07eee9a399c8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -3673,6 +3673,26 @@ static void guc_log_context_priority(struct drm_printer 
*p,
drm_printf(p, "\n");
 }
 
+
+static inline void guc_log_context(struct drm_printer *p,
+  struct intel_context *ce)
+{
+   drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id.id);
+   drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
+   drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
+  ce->ring->head,
+  ce->lrc_reg_state[CTX_RING_HEAD]);
+   drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
+  ce->ring->tail,
+  ce->lrc_reg_state[CTX_RING_TAIL]);
+   drm_printf(p, "\t\tContext Pin Count: %u\n",
+  atomic_read(&ce->pin_count));
+   drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
+  atomic_read(&ce->guc_id.ref));
+   drm_printf(p, "\t\tSchedule State: 0x%x\n\n",
+  ce->guc_state.sched_state);
+}
+
 void intel_guc_submission_print_context_info(struct intel_guc *guc,
 struct drm_printer *p)
 {
@@ -3682,22 +3702,25 @@ void intel_guc_submission_print_context_info(struct 
intel_guc *guc,
 
xa_lock_irqsave(&guc->context_lookup, flags);
xa_for_each(&guc->context_lookup, index, ce) {
-   drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id.id);
-   drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
-   drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
-  ce->ring->head,
-  ce->lrc_reg_state[CTX_RING_HEAD]);
-   drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
-  ce->ring->tail,
-  ce->lrc_reg_state[CTX_RING_TAIL]);
-   drm_printf(p, "\t\tContext Pin Count: %u\n",
-  atomic_read(&ce->pin_count));
-   drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
-  atomic_read(&ce->guc_id.ref));
-   drm_printf(p, "\t\tSchedule State: 0x%x\n\n",
-  ce->guc_state.sched_state);
+   GEM_BUG_ON(intel_context_is_child(ce));
 
+   guc_log_context(p, ce);
guc_log_context_priority(p, ce);
+
+   if (intel_context_is_parent(ce)) {
+   struct guc_process_desc *desc = __get_process_desc(ce);
+   struct intel_context *child;
+
+   drm_printf(p, "\t\tWQI Head: %u\n",
+  READ_ONCE(desc->head));
+   drm_printf(p, "\t\tWQI Tail: %u\n",
+  READ_ONCE(desc->tail));
+   drm_printf(p, "\t\tWQI Status: %u\n\n",
+  READ_ONCE(desc->wq_status));
+
+   for_each_child(ce, child)
+   guc_log_context(p, child);
+   }
}
xa_unlock_irqrestore(&guc->context_lookup, flags);
 }
-- 
2.32.0



[PATCH 20/27] drm/i915/guc: Connect UAPI to GuC multi-lrc interface

2021-08-20 Thread Matthew Brost
Introduce 'set parallel submit' extension to connect UAPI to GuC
multi-lrc interface. Kernel doc in new uAPI should explain it all.

IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: link to come

v2:
 (Daniel Vetter)
  - Add IGT link and placeholder for media UMD link

Cc: Tvrtko Ursulin 
Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 220 +-
 .../gpu/drm/i915/gem/i915_gem_context_types.h |   6 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
 drivers/gpu/drm/i915/gt/intel_engine.h|  12 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |   6 +-
 .../drm/i915/gt/intel_execlists_submission.c  |   6 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 114 -
 include/uapi/drm/i915_drm.h   | 128 ++
 9 files changed, 485 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index bcaaf514876b..de0fd145fb47 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -522,9 +522,149 @@ set_proto_ctx_engines_bond(struct i915_user_extension 
__user *base, void *data)
return 0;
 }
 
+static int
+set_proto_ctx_engines_parallel_submit(struct i915_user_extension __user *base,
+ void *data)
+{
+   struct i915_context_engines_parallel_submit __user *ext =
+   container_of_user(base, typeof(*ext), base);
+   const struct set_proto_ctx_engines *set = data;
+   struct drm_i915_private *i915 = set->i915;
+   u64 flags;
+   int err = 0, n, i, j;
+   u16 slot, width, num_siblings;
+   struct intel_engine_cs **siblings = NULL;
+   intel_engine_mask_t prev_mask;
+
+   /* Disabling for now */
+   return -ENODEV;
+
+   if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
+   return -ENODEV;
+
+   if (get_user(slot, &ext->engine_index))
+   return -EFAULT;
+
+   if (get_user(width, &ext->width))
+   return -EFAULT;
+
+   if (get_user(num_siblings, &ext->num_siblings))
+   return -EFAULT;
+
+   if (slot >= set->num_engines) {
+   drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
+   slot, set->num_engines);
+   return -EINVAL;
+   }
+
+   if (set->engines[slot].type != I915_GEM_ENGINE_TYPE_INVALID) {
+   drm_dbg(&i915->drm,
+   "Invalid placement[%d], already occupied\n", slot);
+   return -EINVAL;
+   }
+
+   if (get_user(flags, &ext->flags))
+   return -EFAULT;
+
+   if (flags) {
+   drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags);
+   return -EINVAL;
+   }
+
+   for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+   err = check_user_mbz(&ext->mbz64[n]);
+   if (err)
+   return err;
+   }
+
+   if (width < 2) {
+   drm_dbg(&i915->drm, "Width (%d) < 2\n", width);
+   return -EINVAL;
+   }
+
+   if (num_siblings < 1) {
+   drm_dbg(&i915->drm, "Number siblings (%d) < 1\n",
+   num_siblings);
+   return -EINVAL;
+   }
+
+   siblings = kmalloc_array(num_siblings * width,
+sizeof(*siblings),
+GFP_KERNEL);
+   if (!siblings)
+   return -ENOMEM;
+
+   /* Create contexts / engines */
+   for (i = 0; i < width; ++i) {
+   intel_engine_mask_t current_mask = 0;
+   struct i915_engine_class_instance prev_engine;
+
+   for (j = 0; j < num_siblings; ++j) {
+   struct i915_engine_class_instance ci;
+
+   n = i * num_siblings + j;
+   if (copy_from_user(&ci, &ext->engines[n], sizeof(ci))) {
+   err = -EFAULT;
+   goto out_err;
+   }
+
+   siblings[n] =
+   intel_engine_lookup_user(i915, ci.engine_class,
+ci.engine_instance);
+   if (!siblings[n]) {
+   drm_dbg(&i915->drm,
+   "Invalid sibling[%d]: { class:%d, 
inst:%d }\n",
+   n, ci.engine_class, ci.engine_instance);
+   err = -EINVAL;
+   goto out_err;
+   }
+
+   if (n) {
+   if (prev_engine.engine_class !=
+   ci.engine_class) {
+   drm_dbg(&i915->drm,
+

[PATCH 19/27] drm/i915: Fix bug in user proto-context creation that leaked contexts

2021-08-20 Thread Matthew Brost
Set number of engines before attempting to create contexts so the
function free_engines can clean up properly.

Fixes: d4433c7600f7 ("drm/i915/gem: Use the proto-context to handle create 
parameters (v5)")
Signed-off-by: Matthew Brost 
Cc: 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dbaeb924a437..bcaaf514876b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -944,6 +944,7 @@ static struct i915_gem_engines *user_engines(struct 
i915_gem_context *ctx,
unsigned int n;
 
e = alloc_engines(num_engines);
+   e->num_engines = num_engines;
for (n = 0; n < num_engines; n++) {
struct intel_context *ce;
int ret;
@@ -977,7 +978,6 @@ static struct i915_gem_engines *user_engines(struct 
i915_gem_context *ctx,
goto free_engines;
}
}
-   e->num_engines = num_engines;
 
return e;
 
-- 
2.32.0



[PATCH 22/27] drm/i915/guc: Add basic GuC multi-lrc selftest

2021-08-20 Thread Matthew Brost
Add very basic (single submission) multi-lrc selftest.

Signed-off-by: Matthew Brost 
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   1 +
 .../drm/i915/gt/uc/selftest_guc_multi_lrc.c   | 180 ++
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 3 files changed, 182 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2554d0eb4afd..91330525330d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -3924,4 +3924,5 @@ bool intel_guc_virtual_engine_has_heartbeat(const struct 
intel_engine_cs *ve)
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_guc.c"
+#include "selftest_guc_multi_lrc.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c 
b/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c
new file mode 100644
index ..dacfc5dfadd6
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc_multi_lrc.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include "selftests/igt_spinner.h"
+#include "selftests/igt_reset.h"
+#include "selftests/intel_scheduler_helpers.h"
+#include "gt/intel_engine_heartbeat.h"
+#include "gem/selftests/mock_context.h"
+
+static void logical_sort(struct intel_engine_cs **engines, int num_engines)
+{
+   struct intel_engine_cs *sorted[MAX_ENGINE_INSTANCE + 1];
+   int i, j;
+
+   for (i = 0; i < num_engines; ++i)
+   for (j = 0; j < MAX_ENGINE_INSTANCE + 1; ++j) {
+   if (engines[j]->logical_mask & BIT(i)) {
+   sorted[i] = engines[j];
+   break;
+   }
+   }
+
+   memcpy(*engines, *sorted,
+  sizeof(struct intel_engine_cs *) * num_engines);
+}
+
+static struct intel_context *
+multi_lrc_create_parent(struct intel_gt *gt, u8 class,
+   unsigned long flags)
+{
+   struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
+   struct intel_engine_cs *engine;
+   enum intel_engine_id id;
+   int i = 0;
+
+   for_each_engine(engine, gt, id) {
+   if (engine->class != class)
+   continue;
+
+   siblings[i++] = engine;
+   }
+
+   if (i <= 1)
+   return ERR_PTR(0);
+
+   logical_sort(siblings, i);
+
+   return intel_engine_create_parallel(siblings, 1, i);
+}
+
+static void multi_lrc_context_unpin(struct intel_context *ce)
+{
+   struct intel_context *child;
+
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+
+   for_each_child(ce, child)
+   intel_context_unpin(child);
+   intel_context_unpin(ce);
+}
+
+static void multi_lrc_context_put(struct intel_context *ce)
+{
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+
+   /*
+* Only the parent gets the creation ref put in the uAPI, the parent
+* itself is responsible for creation ref put on the children.
+*/
+   intel_context_put(ce);
+}
+
+static struct i915_request *
+multi_lrc_nop_request(struct intel_context *ce)
+{
+   struct intel_context *child;
+   struct i915_request *rq, *child_rq;
+   int i = 0;
+
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+
+   rq = intel_context_create_request(ce);
+   if (IS_ERR(rq))
+   return rq;
+
+   i915_request_get(rq);
+   i915_request_add(rq);
+
+   for_each_child(ce, child) {
+   child_rq = intel_context_create_request(child);
+   if (IS_ERR(child_rq))
+   goto child_error;
+
+   if (++i == ce->guc_number_children)
+   set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
+   &child_rq->fence.flags);
+   i915_request_add(child_rq);
+   }
+
+   return rq;
+
+child_error:
+   i915_request_put(rq);
+
+   return ERR_PTR(-ENOMEM);
+}
+
+static int __intel_guc_multi_lrc_basic(struct intel_gt *gt, unsigned int class)
+{
+   struct intel_context *parent;
+   struct i915_request *rq;
+   int ret;
+
+   parent = multi_lrc_create_parent(gt, class, 0);
+   if (IS_ERR(parent)) {
+   pr_err("Failed creating contexts: %ld", PTR_ERR(parent));
+   return PTR_ERR(parent);
+   } else if (!parent) {
+   pr_debug("Not enough engines in class: %d",
+VIDEO_DECODE_CLASS);
+   return 0;
+   }
+
+   rq = multi_lrc_nop_request(parent);
+   if (IS_ERR(rq)) {
+   ret = PTR_ERR(rq);
+   pr_err("Failed creating requests: %d", ret);
+   goto out;
+   }
+
+   ret = intel_selftest_wait_for_rq(rq);
+   if (ret)
+   pr_err("Failed waiting on reques

[PATCH 24/27] drm/i915: Multi-BB execbuf

2021-08-20 Thread Matthew Brost
Allow multiple batch buffers to be submitted in a single execbuf IOCTL
after a context has been configured with the 'set_parallel' extension.
The number of batches is implicit based on the context's configuration.

This is implemented with a series of loops. First a loop is used to find
all the batches, a loop to pin all the HW contexts, a loop to generate
all the requests, a loop to submit all the requests, a loop to commit
all the requests, and finally a loop to tie the requests to the VMAs
they touch.

A composite fence is also created for the generated requests to
return to the user and to stick in dma resv slots.

IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1
media UMD: link to come

Signed-off-by: Matthew Brost 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 765 --
 drivers/gpu/drm/i915/gt/intel_context.h   |   8 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  12 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   2 +
 drivers/gpu/drm/i915/i915_request.h   |   9 +
 drivers/gpu/drm/i915/i915_vma.c   |  21 +-
 drivers/gpu/drm/i915/i915_vma.h   |  13 +-
 7 files changed, 573 insertions(+), 257 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 8290bdadd167..481978974627 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -244,17 +244,23 @@ struct i915_execbuffer {
struct drm_i915_gem_exec_object2 *exec; /** ioctl execobj[] */
struct eb_vma *vma;
 
-   struct intel_engine_cs *engine; /** engine to queue the request to */
+   struct intel_gt *gt; /* gt for the execbuf */
struct intel_context *context; /* logical state for the request */
struct i915_gem_context *gem_context; /** caller's context */
 
-   struct i915_request *request; /** our request to build */
-   struct eb_vma *batch; /** identity of the batch obj/vma */
+   struct i915_request *requests[MAX_ENGINE_INSTANCE + 1]; /** our 
requests to build */
+   struct eb_vma *batches[MAX_ENGINE_INSTANCE + 1]; /** identity of the 
batch obj/vma */
struct i915_vma *trampoline; /** trampoline used for chaining */
 
+   /** used for excl fence in dma_resv objects when > 1 BB submitted */
+   struct dma_fence *composite_fence;
+
/** actual size of execobj[] as we may extend it for the cmdparser */
unsigned int buffer_count;
 
+   /* number of batches in execbuf IOCTL */
+   unsigned int num_batches;
+
/** list of vma not yet bound during reservation phase */
struct list_head unbound;
 
@@ -281,7 +287,7 @@ struct i915_execbuffer {
 
u64 invalid_flags; /** Set of execobj.flags that are invalid */
 
-   u64 batch_len; /** Length of batch within object */
+   u64 batch_len[MAX_ENGINE_INSTANCE + 1]; /** Length of batch within 
object */
u32 batch_start_offset; /** Location within object of batch */
u32 batch_flags; /** Flags composed for emit_bb_start() */
struct intel_gt_buffer_pool_node *batch_pool; /** pool node for batch 
buffer */
@@ -299,14 +305,13 @@ struct i915_execbuffer {
 };
 
 static int eb_parse(struct i915_execbuffer *eb);
-static struct i915_request *eb_pin_engine(struct i915_execbuffer *eb,
- bool throttle);
+static int eb_pin_engine(struct i915_execbuffer *eb, bool throttle);
 static void eb_unpin_engine(struct i915_execbuffer *eb);
 
 static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)
 {
-   return intel_engine_requires_cmd_parser(eb->engine) ||
-   (intel_engine_using_cmd_parser(eb->engine) &&
+   return intel_engine_requires_cmd_parser(eb->context->engine) ||
+   (intel_engine_using_cmd_parser(eb->context->engine) &&
 eb->args->batch_len);
 }
 
@@ -544,11 +549,21 @@ eb_validate_vma(struct i915_execbuffer *eb,
return 0;
 }
 
-static void
+static inline bool
+is_batch_buffer(struct i915_execbuffer *eb, unsigned int buffer_idx)
+{
+   return eb->args->flags & I915_EXEC_BATCH_FIRST ?
+   buffer_idx < eb->num_batches :
+   buffer_idx >= eb->args->buffer_count - eb->num_batches;
+}
+
+static int
 eb_add_vma(struct i915_execbuffer *eb,
-  unsigned int i, unsigned batch_idx,
+  unsigned int *current_batch,
+  unsigned int i,
   struct i915_vma *vma)
 {
+   struct drm_i915_private *i915 = eb->i915;
struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
struct eb_vma *ev = &eb->vma[i];
 
@@ -575,15 +590,41 @@ eb_add_vma(struct i915_execbuffer *eb,
 * Note that actual hangs have only been observed on gen7, but for
 * paranoia do it everywhere.
 */
-   if (i == batch_idx) {
+   if (is_batch_buffer(eb, i)) {
if (entry->relocation_cou

[PATCH 26/27] drm/i915: Enable multi-bb execbuf

2021-08-20 Thread Matthew Brost
Enable multi-bb execbuf by enabling the set_parallel extension.

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index de0fd145fb47..0aa095bed310 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -536,9 +536,6 @@ set_proto_ctx_engines_parallel_submit(struct 
i915_user_extension __user *base,
struct intel_engine_cs **siblings = NULL;
intel_engine_mask_t prev_mask;
 
-   /* Disabling for now */
-   return -ENODEV;
-
if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
return -ENODEV;
 
-- 
2.32.0



[PATCH 13/27] drm/i915/guc: Ensure GuC schedule operations do not operate on child contexts

2021-08-20 Thread Matthew Brost
In GuC parent-child contexts the parent context controls the scheduling,
ensure only the parent does the scheduling operations.

Signed-off-by: Matthew Brost 
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 24 ++-
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index dbcb9ab28a9a..00d54bb00bfb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -320,6 +320,12 @@ static void decr_context_committed_requests(struct 
intel_context *ce)
GEM_BUG_ON(ce->guc_state.number_committed_requests < 0);
 }
 
+static struct intel_context *
+request_to_scheduling_context(struct i915_request *rq)
+{
+   return intel_context_to_parent(rq->context);
+}
+
 static bool context_guc_id_invalid(struct intel_context *ce)
 {
return ce->guc_id.id == GUC_INVALID_LRC_ID;
@@ -1684,6 +1690,7 @@ static void __guc_context_sched_disable(struct intel_guc 
*guc,
 
GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
+   GEM_BUG_ON(intel_context_is_child(ce));
trace_intel_context_sched_disable(ce);
 
guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
@@ -1898,6 +1905,8 @@ static void guc_context_sched_disable(struct 
intel_context *ce)
u16 guc_id;
bool enabled;
 
+   GEM_BUG_ON(intel_context_is_child(ce));
+
if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
!lrc_desc_registered(guc, ce->guc_id.id)) {
spin_lock_irqsave(&ce->guc_state.lock, flags);
@@ -2286,6 +2295,8 @@ static void guc_signal_context_fence(struct intel_context 
*ce)
 {
unsigned long flags;
 
+   GEM_BUG_ON(intel_context_is_child(ce));
+
spin_lock_irqsave(&ce->guc_state.lock, flags);
clr_context_wait_for_deregister_to_register(ce);
__guc_signal_context_fence(ce);
@@ -2315,7 +2326,7 @@ static void guc_context_init(struct intel_context *ce)
 
 static int guc_request_alloc(struct i915_request *rq)
 {
-   struct intel_context *ce = rq->context;
+   struct intel_context *ce = request_to_scheduling_context(rq);
struct intel_guc *guc = ce_to_guc(ce);
unsigned long flags;
int ret;
@@ -2358,11 +2369,12 @@ static int guc_request_alloc(struct i915_request *rq)
 * exhausted and return -EAGAIN to the user indicating that they can try
 * again in the future.
 *
-* There is no need for a lock here as the timeline mutex ensures at
-* most one context can be executing this code path at once. The
-* guc_id_ref is incremented once for every request in flight and
-* decremented on each retire. When it is zero, a lock around the
-* increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
+* There is no need for a lock here as the timeline mutex (or
+* parallel_submit mutex in the case of multi-lrc) ensures at most one
+* context can be executing this code path at once. The guc_id_ref is
+* incremented once for every request in flight and decremented on each
+* retire. When it is zero, a lock around the increment (in pin_guc_id)
+* is needed to seal a race with unpin_guc_id.
 */
if (atomic_add_unless(&ce->guc_id.ref, 1, 0))
goto out;
-- 
2.32.0



[PATCH 25/27] drm/i915/guc: Handle errors in multi-lrc requests

2021-08-20 Thread Matthew Brost
If an error occurs in the front end when multi-lrc requests are getting
generated we need to skip these in the backend but we still need to
emit the breadcrumbs seqno. An issue arises because with multi-lrc
breadcrumbs there is a handshake between the parent and children to make
forward progress. If all the requests are not present this handshake
doesn't work. To work around this, if a multi-lrc request has an error we
skip the handshake but still emit the breadcrumbs seqno.

Signed-off-by: Matthew Brost 
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 61 ++-
 1 file changed, 58 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2ef38557b0f0..61e737fd1eee 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -3546,8 +3546,8 @@ static int 
emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
 }
 
 static u32 *
-emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
-u32 *cs)
+__emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+  u32 *cs)
 {
struct intel_context *ce = rq->context;
u8 i;
@@ -3575,6 +3575,41 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct 
i915_request *rq,
  get_children_go_addr(ce),
  0);
 
+   return cs;
+}
+
+/*
+ * If this true, a submission of multi-lrc requests had an error and the
+ * requests need to be skipped. The front end (execuf IOCTL) should've called
+ * i915_request_skip which squashes the BB but we still need to emit the fini
+ * breadrcrumbs seqno write. At this point we don't know how many of the
+ * requests in the multi-lrc submission were generated so we can't do the
+ * handshake between the parent and children (e.g. if 4 requests should be
+ * generated but 2nd hit an error only 1 would be seen by the GuC backend).
+ * Simply skip the handshake, but still emit the breadcrumbd seqno, if an error
+ * has occurred on any of the requests in submission / relationship.
+ */
+static inline bool skip_handshake(struct i915_request *rq)
+{
+   return test_bit(I915_FENCE_FLAG_SKIP_PARALLEL, &rq->fence.flags);
+}
+
+static u32 *
+emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+u32 *cs)
+{
+   struct intel_context *ce = rq->context;
+
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+
+   if (unlikely(skip_handshake(rq))) {
+   memset(cs, 0, sizeof(u32) *
+  (ce->engine->emit_fini_breadcrumb_dw - 6));
+   cs += ce->engine->emit_fini_breadcrumb_dw - 6;
+   } else {
+   cs = __emit_fini_breadcrumb_parent_no_preempt_mid_batch(rq, cs);
+   }
+
/* Emit fini breadcrumb */
cs = gen8_emit_ggtt_write(cs,
  rq->fence.seqno,
@@ -3591,7 +3626,8 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct 
i915_request *rq,
 }
 
 static u32 *
-emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 
*cs)
+__emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+ u32 *cs)
 {
struct intel_context *ce = rq->context;
 
@@ -3617,6 +3653,25 @@ emit_fini_breadcrumb_child_no_preempt_mid_batch(struct 
i915_request *rq, u32 *cs
*cs++ = get_children_go_addr(ce->parent);
*cs++ = 0;
 
+   return cs;
+}
+
+static u32 *
+emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+   u32 *cs)
+{
+   struct intel_context *ce = rq->context;
+
+   GEM_BUG_ON(!intel_context_is_child(ce));
+
+   if (unlikely(skip_handshake(rq))) {
+   memset(cs, 0, sizeof(u32) *
+  (ce->engine->emit_fini_breadcrumb_dw - 6));
+   cs += ce->engine->emit_fini_breadcrumb_dw - 6;
+   } else {
+   cs = __emit_fini_breadcrumb_child_no_preempt_mid_batch(rq, cs);
+   }
+
/* Emit fini breadcrumb */
cs = gen8_emit_ggtt_write(cs,
  rq->fence.seqno,
-- 
2.32.0



[PATCH 21/27] drm/i915/doc: Update parallel submit doc to point to i915_drm.h

2021-08-20 Thread Matthew Brost
Update parallel submit doc to point to i915_drm.h

Signed-off-by: Matthew Brost 
---
 Documentation/gpu/rfc/i915_parallel_execbuf.h | 122 --
 Documentation/gpu/rfc/i915_scheduler.rst  |   4 +-
 2 files changed, 2 insertions(+), 124 deletions(-)
 delete mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h

diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
b/Documentation/gpu/rfc/i915_parallel_execbuf.h
deleted file mode 100644
index 8cbe2c4e0172..
--- a/Documentation/gpu/rfc/i915_parallel_execbuf.h
+++ /dev/null
@@ -1,122 +0,0 @@
-/* SPDX-License-Identifier: MIT */
-/*
- * Copyright © 2021 Intel Corporation
- */
-
-#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
i915_context_engines_parallel_submit */
-
-/**
- * struct drm_i915_context_engines_parallel_submit - Configure engine for
- * parallel submission.
- *
- * Setup a slot in the context engine map to allow multiple BBs to be submitted
- * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the 
GPU
- * in parallel. Multiple hardware contexts are created internally in the i915
- * run these BBs. Once a slot is configured for N BBs only N BBs can be
- * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user
- * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
- * many BBs there are based on the slot's configuration. The N BBs are the last
- * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
- *
- * The default placement behavior is to create implicit bonds between each
- * context if each context maps to more than 1 physical engine (e.g. context is
- * a virtual engine). Also we only allow contexts of same engine class and 
these
- * contexts must be in logically contiguous order. Examples of the placement
- * behavior described below. Lastly, the default is to not allow BBs to
- * preempted mid BB rather insert coordinated preemption on all hardware
- * contexts between each set of BBs. Flags may be added in the future to change
- * both of these default behaviors.
- *
- * Returns -EINVAL if hardware context placement configuration is invalid or if
- * the placement configuration isn't supported on the platform / submission
- * interface.
- * Returns -ENODEV if extension isn't supported on the platform / submission
- * interface.
- *
- * .. code-block:: none
- *
- * Example 1 pseudo code:
- * CS[X] = generic engine of same class, logical instance X
- * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
- * set_engines(INVALID)
- * set_parallel(engine_index=0, width=2, num_siblings=1,
- *  engines=CS[0],CS[1])
- *
- * Results in the following valid placement:
- * CS[0], CS[1]
- *
- * Example 2 pseudo code:
- * CS[X] = generic engine of same class, logical instance X
- * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
- * set_engines(INVALID)
- * set_parallel(engine_index=0, width=2, num_siblings=2,
- *  engines=CS[0],CS[2],CS[1],CS[3])
- *
- * Results in the following valid placements:
- * CS[0], CS[1]
- * CS[2], CS[3]
- *
- * This can also be thought of as 2 virtual engines described by 2-D array
- * in the engines the field with bonds placed between each index of the
- * virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to
- * CS[3].
- * VE[0] = CS[0], CS[2]
- * VE[1] = CS[1], CS[3]
- *
- * Example 3 pseudo code:
- * CS[X] = generic engine of same class, logical instance X
- * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
- * set_engines(INVALID)
- * set_parallel(engine_index=0, width=2, num_siblings=2,
- *  engines=CS[0],CS[1],CS[1],CS[3])
- *
- * Results in the following valid and invalid placements:
- * CS[0], CS[1]
- * CS[1], CS[3] - Not logical contiguous, return -EINVAL
- */
-struct drm_i915_context_engines_parallel_submit {
-   /**
-* @base: base user extension.
-*/
-   struct i915_user_extension base;
-
-   /**
-* @engine_index: slot for parallel engine
-*/
-   __u16 engine_index;
-
-   /**
-* @width: number of contexts per parallel engine
-*/
-   __u16 width;
-
-   /**
-* @num_siblings: number of siblings per context
-*/
-   __u16 num_siblings;
-
-   /**
-* @mbz16: reserved for future use; must be zero
-*/
-   __u16 mbz16;
-
-   /**
-* @flags: all undefined flags must be zero, currently not defined flags
-*/
-   __u64 flags;
-
-   /**
-* @mbz64: reserved for future use; must be zero
-*/
-   __u64 mbz64[3];
-
-   /**
-* @engines: 2-d array of engine instances to configure parallel engine
-*
-* length = width (i) * num_siblings (j)
-* index = j + i * num_siblings
-*/
-   

[PATCH 23/27] drm/i915/guc: Implement no mid batch preemption for multi-lrc

2021-08-20 Thread Matthew Brost
For some users of multi-lrc, e.g. split frame, it isn't safe to preempt
mid BB. To safely enable preemption at the BB boundary, a handshake
between the parent and child is needed. This is implemented via custom
emit_bb_start & emit_fini_breadcrumb functions and enabled by
default if a context is configured by the set parallel extension.

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_context.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 283 +-
 4 files changed, 287 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 5615be32879c..2de62649e275 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -561,7 +561,7 @@ void intel_context_bind_parent_child(struct intel_context 
*parent,
GEM_BUG_ON(intel_context_is_child(child));
GEM_BUG_ON(intel_context_is_parent(child));
 
-   parent->guc_number_children++;
+   child->guc_child_index = parent->guc_number_children++;
list_add_tail(&child->guc_child_link,
  &parent->guc_child_list);
child->parent = parent;
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 713d85b0b364..727f91e7f7c2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -246,6 +246,9 @@ struct intel_context {
/** @guc_number_children: number of children if parent */
u8 guc_number_children;
 
+   /** @guc_child_index: index into guc_child_list if child */
+   u8 guc_child_index;
+
/**
 * @parent_page: page in context used by parent for work queue,
 * work queue descriptor
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 6cd26dc060d1..9f61cfa5566a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -188,7 +188,7 @@ struct guc_process_desc {
u32 wq_status;
u32 engine_presence;
u32 priority;
-   u32 reserved[30];
+   u32 reserved[36];
 } __packed;
 
 #define CONTEXT_REGISTRATION_FLAG_KMD  BIT(0)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 91330525330d..1a18f99bf12a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -11,6 +11,7 @@
 #include "gt/intel_context.h"
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_engine_heartbeat.h"
+#include "gt/intel_gpu_commands.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_irq.h"
 #include "gt/intel_gt_pm.h"
@@ -366,10 +367,14 @@ static struct i915_priolist *to_priolist(struct rb_node 
*rb)
 
 /*
  * When using multi-lrc submission an extra page in the context state is
- * reserved for the process descriptor and work queue.
+ * reserved for the process descriptor, work queue, and preempt BB boundary
 * handshake between the parent + children contexts.
  *
  * The layout of this page is below:
  * 0   guc_process_desc
+ * + sizeof(struct guc_process_desc)   child go
+ * + CACHELINE_BYTES   child join ...
+ * + CACHELINE_BYTES ...
  * ... unused
  * PAGE_SIZE / 2   work queue start
  * ... work queue
@@ -1785,6 +1790,30 @@ static int deregister_context(struct intel_context *ce, 
u32 guc_id, bool loop)
return __guc_action_deregister_context(guc, guc_id, loop);
 }
 
+static inline void clear_children_join_go_memory(struct intel_context *ce)
+{
+   u32 *mem = (u32 *)(__get_process_desc(ce) + 1);
+   u8 i;
+
+   for (i = 0; i < ce->guc_number_children + 1; ++i)
+   mem[i * (CACHELINE_BYTES / sizeof(u32))] = 0;
+}
+
+static inline u32 get_children_go_value(struct intel_context *ce)
+{
+   u32 *mem = (u32 *)(__get_process_desc(ce) + 1);
+
+   return mem[0];
+}
+
+static inline u32 get_children_join_value(struct intel_context *ce,
+ u8 child_index)
+{
+   u32 *mem = (u32 *)(__get_process_desc(ce) + 1);
+
+   return mem[(child_index + 1) * (CACHELINE_BYTES / sizeof(u32))];
+}
+
 static void guc_context_policy_init(struct intel_engine_cs *engine,
struct guc_lrc_desc *desc)
 {
@@ -1867,6 +1896,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, 
bool loop)
desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
guc_context_policy_init(engine, desc);
 

[PATCH 16/27] drm/i915/guc: Insert submit fences between requests in parent-child relationship

2021-08-20 Thread Matthew Brost
The GuC must receive requests in the order submitted for contexts in a
parent-child relationship to function correctly. To ensure this, insert
a submit fence between the current request and last request submitted
for requests / contexts in a parent child relationship. This is
conceptually similar to a single timeline.

Signed-off-by: Matthew Brost 
Cc: John Harrison 
---
 drivers/gpu/drm/i915/gt/intel_context.h   |   5 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   5 +-
 drivers/gpu/drm/i915/i915_request.c   | 120 ++
 4 files changed, 109 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index c2985822ab74..9dcc1b14697b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -75,6 +75,11 @@ intel_context_to_parent(struct intel_context *ce)
 }
 }
 
+static inline bool intel_context_is_parallel(struct intel_context *ce)
+{
+   return intel_context_is_child(ce) || intel_context_is_parent(ce);
+}
+
 void intel_context_bind_parent_child(struct intel_context *parent,
 struct intel_context *child);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6f567ebeb039..a63329520c35 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -246,6 +246,13 @@ struct intel_context {
 * work queue descriptor
 */
u8 parent_page;
+
+   /**
+* @last_rq: last request submitted on a parallel context, used
+* to insert submit fences between request in the parallel
+* context.
+*/
+   struct i915_request *last_rq;
};
 
 #ifdef CONFIG_DRM_I915_SELFTEST
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index b107ad095248..f0b60fecf253 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -672,8 +672,7 @@ static int rq_prio(const struct i915_request *rq)
 
 static bool is_multi_lrc_rq(struct i915_request *rq)
 {
-   return intel_context_is_child(rq->context) ||
-   intel_context_is_parent(rq->context);
+   return intel_context_is_parallel(rq->context);
 }
 
 static bool can_merge_rq(struct i915_request *rq,
@@ -2843,6 +2842,8 @@ static void guc_parent_context_unpin(struct intel_context 
*ce)
GEM_BUG_ON(!intel_context_is_parent(ce));
GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
 
+   if (ce->last_rq)
+   i915_request_put(ce->last_rq);
unpin_guc_id(guc, ce);
lrc_unpin(ce);
 }
diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index ce446716d092..2e51c8999088 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1546,36 +1546,62 @@ i915_request_await_object(struct i915_request *to,
return ret;
 }
 
+static inline bool is_parallel_rq(struct i915_request *rq)
+{
+   return intel_context_is_parallel(rq->context);
+}
+
+static inline struct intel_context *request_to_parent(struct i915_request *rq)
+{
+   return intel_context_to_parent(rq->context);
+}
+
 static struct i915_request *
-__i915_request_add_to_timeline(struct i915_request *rq)
+__i915_request_ensure_parallel_ordering(struct i915_request *rq,
+   struct intel_timeline *timeline)
 {
-   struct intel_timeline *timeline = i915_request_timeline(rq);
struct i915_request *prev;
 
-   /*
-* Dependency tracking and request ordering along the timeline
-* is special cased so that we can eliminate redundant ordering
-* operations while building the request (we know that the timeline
-* itself is ordered, and here we guarantee it).
-*
-* As we know we will need to emit tracking along the timeline,
-* we embed the hooks into our request struct -- at the cost of
-* having to have specialised no-allocation interfaces (which will
-* be beneficial elsewhere).
-*
-* A second benefit to open-coding i915_request_await_request is
-* that we can apply a slight variant of the rules specialised
-* for timelines that jump between engines (such as virtual engines).
-* If we consider the case of virtual engine, we must emit a dma-fence
-* to prevent scheduling of the second request until the first is
-* complete (to maximise our greedy late load balancing) and this
-* precludes optimising to use semaphores serialisation of a single
-* timeline across engines.
-*/
+   GEM_BUG_ON(!is_parallel_r

[PATCH 17/27] drm/i915/guc: Implement multi-lrc reset

2021-08-20 Thread Matthew Brost
Update context and full GPU reset to work with multi-lrc. The idea is
that the parent context tracks all the active requests inflight for itself
and its children. The parent context owns the reset, replaying / canceling
requests as needed.

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_context.c   | 11 ++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 63 +--
 2 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 00d1aee6d199..5615be32879c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -528,20 +528,21 @@ struct i915_request *intel_context_create_request(struct 
intel_context *ce)
 
 struct i915_request *intel_context_find_active_request(struct intel_context 
*ce)
 {
+   struct intel_context *parent = intel_context_to_parent(ce);
struct i915_request *rq, *active = NULL;
unsigned long flags;
 
GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
 
-   spin_lock_irqsave(&ce->guc_state.lock, flags);
-   list_for_each_entry_reverse(rq, &ce->guc_state.requests,
+   spin_lock_irqsave(&parent->guc_state.lock, flags);
+   list_for_each_entry_reverse(rq, &parent->guc_state.requests,
sched.link) {
-   if (i915_request_completed(rq))
+   if (i915_request_completed(rq) && rq->context == ce)
break;
 
-   active = rq;
+   active = (rq->context == ce) ? rq : active;
}
-   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+   spin_unlock_irqrestore(&parent->guc_state.lock, flags);
 
return active;
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index f0b60fecf253..e34e0ea9136a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -670,6 +670,11 @@ static int rq_prio(const struct i915_request *rq)
return rq->sched.attr.priority;
 }
 
+static inline bool is_multi_lrc(struct intel_context *ce)
+{
+   return intel_context_is_parallel(ce);
+}
+
 static bool is_multi_lrc_rq(struct i915_request *rq)
 {
return intel_context_is_parallel(rq->context);
@@ -1179,10 +1184,13 @@ __unwind_incomplete_requests(struct intel_context *ce)
 
 static void __guc_reset_context(struct intel_context *ce, bool stalled)
 {
+   bool local_stalled;
struct i915_request *rq;
unsigned long flags;
u32 head;
+   int i, number_children = ce->guc_number_children;
bool skip = false;
+   struct intel_context *parent = ce;
 
intel_context_get(ce);
 
@@ -1209,25 +1217,34 @@ static void __guc_reset_context(struct intel_context 
*ce, bool stalled)
if (unlikely(skip))
goto out_put;
 
-   rq = intel_context_find_active_request(ce);
-   if (!rq) {
-   head = ce->ring->tail;
-   stalled = false;
-   goto out_replay;
-   }
+   for (i = 0; i < number_children + 1; ++i) {
+   if (!intel_context_is_pinned(ce))
+   goto next_context;
+
+   local_stalled = false;
+   rq = intel_context_find_active_request(ce);
+   if (!rq) {
+   head = ce->ring->tail;
+   goto out_replay;
+   }
 
-   if (!i915_request_started(rq))
-   stalled = false;
+   GEM_BUG_ON(i915_active_is_idle(&ce->active));
+   head = intel_ring_wrap(ce->ring, rq->head);
 
-   GEM_BUG_ON(i915_active_is_idle(&ce->active));
-   head = intel_ring_wrap(ce->ring, rq->head);
-   __i915_request_reset(rq, stalled);
+   if (i915_request_started(rq))
+   local_stalled = true;
 
+   __i915_request_reset(rq, local_stalled && stalled);
 out_replay:
-   guc_reset_state(ce, head, stalled);
-   __unwind_incomplete_requests(ce);
+   guc_reset_state(ce, head, local_stalled && stalled);
+next_context:
+   if (i != number_children)
+   ce = list_next_entry(ce, guc_child_link);
+   }
+
+   __unwind_incomplete_requests(parent);
 out_put:
-   intel_context_put(ce);
+   intel_context_put(parent);
 }
 
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
@@ -1248,7 +1265,8 @@ void intel_guc_submission_reset(struct intel_guc *guc, 
bool stalled)
 
xa_unlock(&guc->context_lookup);
 
-   if (intel_context_is_pinned(ce))
+   if (intel_context_is_pinned(ce) &&
+   !intel_context_is_child(ce))
__guc_reset_context(ce, stalled);
 
intel_context_put(ce);
@@ -1340,7 +1358,8 @@ void intel_guc_submission_cancel_requests(struct

[PATCH 14/27] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids

2021-08-20 Thread Matthew Brost
Assign contexts in parent-child relationship consecutive guc_ids. This
is accomplished by partitioning guc_id space between ones that need to
be consecutive (1/16 available guc_ids) and ones that do not (15/16 of
available guc_ids). The consecutive search is implemented via the bitmap
API.

This is a precursor to the full GuC multi-lrc implementation but aligns
to how the GuC multi-lrc interface is defined - guc_ids must be consecutive
when using the GuC multi-lrc interface.

v2:
 (Daniel Vetter)
  - Explicitly state why we assign consecutive guc_ids

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h|   6 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 107 +-
 2 files changed, 86 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 023953e77553..3f95b1b4f15c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -61,9 +61,13 @@ struct intel_guc {
 */
spinlock_t lock;
/**
-* @guc_ids: used to allocate new guc_ids
+* @guc_ids: used to allocate new guc_ids, single-lrc
 */
struct ida guc_ids;
+   /**
+* @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc
+*/
+   unsigned long *guc_ids_bitmap;
/** @num_guc_ids: number of guc_ids that can be used */
u32 num_guc_ids;
/** @max_guc_ids: max number of guc_ids that can be used */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 00d54bb00bfb..e9dfd43d29a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -125,6 +125,18 @@ guc_create_virtual(struct intel_engine_cs **siblings, 
unsigned int count);
 
 #define GUC_REQUEST_SIZE 64 /* bytes */
 
+/*
+ * We reserve 1/16 of the guc_ids for multi-lrc as these need to be contiguous
+ * per the GuC submission interface. A different allocation algorithm is used
+ * (bitmap vs. ida) between multi-lrc and single-lrc hence the reason to
+ * partition the guc_id space. We believe the number of multi-lrc contexts in
+ * use should be low and 1/16 should be sufficient. Minimum of 32 guc_ids for
+ * multi-lrc.
+ */
+#define NUMBER_MULTI_LRC_GUC_ID(guc) \
+   ((guc)->submission_state.num_guc_ids / 16 > 32 ? \
+(guc)->submission_state.num_guc_ids / 16 : 32)
+
 /*
  * Below is a set of functions which control the GuC scheduling state which
  * require a lock.
@@ -1176,6 +1188,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts);
intel_gt_pm_unpark_work_init(&guc->submission_state.destroyed_worker,
 destroyed_worker_func);
+   guc->submission_state.guc_ids_bitmap =
+   bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
+   if (!guc->submission_state.guc_ids_bitmap)
+   return -ENOMEM;
 
return 0;
 }
@@ -1188,6 +1204,7 @@ void intel_guc_submission_fini(struct intel_guc *guc)
guc_lrc_desc_pool_destroy(guc);
guc_flush_destroyed_contexts(guc);
i915_sched_engine_put(guc->sched_engine);
+   bitmap_free(guc->submission_state.guc_ids_bitmap);
 }
 
 static void queue_request(struct i915_sched_engine *sched_engine,
@@ -1239,18 +1256,43 @@ static void guc_submit_request(struct i915_request *rq)
spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static int new_guc_id(struct intel_guc *guc)
+static int new_guc_id(struct intel_guc *guc, struct intel_context *ce)
 {
-   return ida_simple_get(&guc->submission_state.guc_ids, 0,
- guc->submission_state.num_guc_ids, GFP_KERNEL |
- __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+   int ret;
+
+   GEM_BUG_ON(intel_context_is_child(ce));
+
+   if (intel_context_is_parent(ce))
+   ret = 
bitmap_find_free_region(guc->submission_state.guc_ids_bitmap,
+ NUMBER_MULTI_LRC_GUC_ID(guc),
+ 
order_base_2(ce->guc_number_children
+  + 1));
+   else
+   ret = ida_simple_get(&guc->submission_state.guc_ids,
+NUMBER_MULTI_LRC_GUC_ID(guc),
+guc->submission_state.num_guc_ids,
+GFP_KERNEL | __GFP_RETRY_MAYFAIL |
+__GFP_NOWARN);
+   if (unlikely(ret < 0))
+   return ret;
+
+   ce->guc_id.id = ret;
+   return 0;
 }
 
 static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
 {

[PATCH 06/27] drm/i915/guc: Take engine PM when a context is pinned with GuC submission

2021-08-20 Thread Matthew Brost
Take a PM reference to prevent intel_gt_wait_for_idle from short
circuiting while scheduling of a user context could be enabled.

v2:
 (Daniel Vetter)
  - Add might_lock annotations to pin / unpin function

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_context.c   |  3 ++
 drivers/gpu/drm/i915/gt/intel_engine_pm.h | 15 
 drivers/gpu/drm/i915/gt/intel_gt_pm.h | 10 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +--
 drivers/gpu/drm/i915/intel_wakeref.h  | 12 +++
 5 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index c8595da64ad8..508cfe5770c0 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -240,6 +240,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
if (err)
goto err_post_unpin;
 
+   intel_engine_pm_might_get(ce->engine);
+
if (unlikely(intel_context_is_closed(ce))) {
err = -ENOENT;
goto err_unlock;
@@ -313,6 +315,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int 
sub)
return;
 
CE_TRACE(ce, "unpin\n");
+   intel_engine_pm_might_put(ce->engine);
ce->ops->unpin(ce);
ce->ops->post_unpin(ce);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
index 17a5028ea177..3fe2ae1bcc26 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
@@ -9,6 +9,7 @@
 #include "i915_request.h"
 #include "intel_engine_types.h"
 #include "intel_wakeref.h"
+#include "intel_gt_pm.h"
 
 static inline bool
 intel_engine_pm_is_awake(const struct intel_engine_cs *engine)
@@ -31,6 +32,13 @@ static inline bool intel_engine_pm_get_if_awake(struct 
intel_engine_cs *engine)
return intel_wakeref_get_if_active(&engine->wakeref);
 }
 
+static inline void intel_engine_pm_might_get(struct intel_engine_cs *engine)
+{
+   if (!intel_engine_is_virtual(engine))
+   intel_wakeref_might_get(&engine->wakeref);
+   intel_gt_pm_might_get(engine->gt);
+}
+
 static inline void intel_engine_pm_put(struct intel_engine_cs *engine)
 {
intel_wakeref_put(&engine->wakeref);
@@ -52,6 +60,13 @@ static inline void intel_engine_pm_flush(struct 
intel_engine_cs *engine)
intel_wakeref_unlock_wait(&engine->wakeref);
 }
 
+static inline void intel_engine_pm_might_put(struct intel_engine_cs *engine)
+{
+   if (!intel_engine_is_virtual(engine))
+   intel_wakeref_might_put(&engine->wakeref);
+   intel_gt_pm_might_put(engine->gt);
+}
+
 static inline struct i915_request *
 intel_engine_create_kernel_request(struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index a17bf0d4592b..3c173033ce23 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -31,6 +31,11 @@ static inline bool intel_gt_pm_get_if_awake(struct intel_gt 
*gt)
return intel_wakeref_get_if_active(>->wakeref);
 }
 
+static inline void intel_gt_pm_might_get(struct intel_gt *gt)
+{
+   intel_wakeref_might_get(>->wakeref);
+}
+
 static inline void intel_gt_pm_put(struct intel_gt *gt)
 {
intel_wakeref_put(>->wakeref);
@@ -41,6 +46,11 @@ static inline void intel_gt_pm_put_async(struct intel_gt *gt)
intel_wakeref_put_async(>->wakeref);
 }
 
+static inline void intel_gt_pm_might_put(struct intel_gt *gt)
+{
+   intel_wakeref_might_put(>->wakeref);
+}
+
 #define with_intel_gt_pm(gt, tmp) \
for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 intel_gt_pm_put(gt), tmp = 0)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index dbf919801de2..e0eed70f9b92 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1550,7 +1550,12 @@ static int guc_context_pre_pin(struct intel_context *ce,
 
 static int guc_context_pin(struct intel_context *ce, void *vaddr)
 {
-   return __guc_context_pin(ce, ce->engine, vaddr);
+   int ret = __guc_context_pin(ce, ce->engine, vaddr);
+
+   if (likely(!ret && !intel_context_is_barrier(ce)))
+   intel_engine_pm_get(ce->engine);
+
+   return ret;
 }
 
 static void guc_context_unpin(struct intel_context *ce)
@@ -1559,6 +1564,9 @@ static void guc_context_unpin(struct intel_context *ce)
 
unpin_guc_id(guc, ce);
lrc_unpin(ce);
+
+   if (likely(!intel_context_is_barrier(ce)))
+   intel_engine_pm_put_async(ce->engine);
 }
 
 static void guc_context_post_unpin(struct intel_context *ce)
@@ -2328,8 +2336,30 @@ static int guc_virtual_context_pre_pin(struct 
intel_context *ce,
 static int guc_virtual_context_pin(struct intel_context *ce, v

[PATCH 27/27] drm/i915/execlists: Weak parallel submission support for execlists

2021-08-20 Thread Matthew Brost
A weak implementation of parallel submission (multi-bb execbuf IOCTL) for
execlists. Basically doing as little as possible to support this
interface for execlists - basically just passing submit fences between
each request generated and virtual engines are not allowed. This is on
par with what is there for the existing (hopefully soon deprecated)
bonding interface.

We perma-pin these execlists contexts to align with GuC implementation.

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  9 ++-
 drivers/gpu/drm/i915/gt/intel_context.c   |  4 +-
 .../drm/i915/gt/intel_execlists_submission.c  | 57 ++-
 drivers/gpu/drm/i915/gt/intel_lrc.c   |  2 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  2 -
 5 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 0aa095bed310..cb6ce2ee1d8b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -536,9 +536,6 @@ set_proto_ctx_engines_parallel_submit(struct 
i915_user_extension __user *base,
struct intel_engine_cs **siblings = NULL;
intel_engine_mask_t prev_mask;
 
-   if (!(intel_uc_uses_guc_submission(&i915->gt.uc)))
-   return -ENODEV;
-
if (get_user(slot, &ext->engine_index))
return -EFAULT;
 
@@ -548,6 +545,12 @@ set_proto_ctx_engines_parallel_submit(struct 
i915_user_extension __user *base,
if (get_user(num_siblings, &ext->num_siblings))
return -EFAULT;
 
+   if (!intel_uc_uses_guc_submission(&i915->gt.uc) && num_siblings != 1) {
+   drm_dbg(&i915->drm, "Only 1 sibling (%d) supported in non-GuC 
mode\n",
+   num_siblings);
+   return -EINVAL;
+   }
+
if (slot >= set->num_engines) {
drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
slot, set->num_engines);
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 2de62649e275..b0f0cac6a151 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -79,7 +79,8 @@ static int intel_context_active_acquire(struct intel_context 
*ce)
 
__i915_active_acquire(&ce->active);
 
-   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
+   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine) ||
+   intel_context_is_parallel(ce))
return 0;
 
/* Preallocate tracking nodes */
@@ -554,7 +555,6 @@ void intel_context_bind_parent_child(struct intel_context 
*parent,
 * Callers responsibility to validate that this function is used
 * correctly but we use GEM_BUG_ON here ensure that they do.
 */
-   GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
GEM_BUG_ON(intel_context_is_pinned(parent));
GEM_BUG_ON(intel_context_is_child(parent));
GEM_BUG_ON(intel_context_is_pinned(child));
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index d1e2d6f8ff81..8875d85a1677 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -927,8 +927,7 @@ static void execlists_submit_ports(struct intel_engine_cs 
*engine)
 
 static bool ctx_single_port_submission(const struct intel_context *ce)
 {
-   return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
-   intel_context_force_single_submission(ce));
+   return intel_context_force_single_submission(ce);
 }
 
 static bool can_merge_ctx(const struct intel_context *prev,
@@ -2598,6 +2597,59 @@ static void execlists_context_cancel_request(struct 
intel_context *ce,
  current->comm);
 }
 
+static struct intel_context *
+execlists_create_parallel(struct intel_engine_cs **engines,
+ unsigned int num_siblings,
+ unsigned int width)
+{
+   struct intel_engine_cs **siblings = NULL;
+   struct intel_context *parent = NULL, *ce, *err;
+   int i, j;
+
+   GEM_BUG_ON(num_siblings != 1);
+
+   siblings = kmalloc_array(num_siblings,
+sizeof(*siblings),
+GFP_KERNEL);
+   if (!siblings)
+   return ERR_PTR(-ENOMEM);
+
+   for (i = 0; i < width; ++i) {
+   for (j = 0; j < num_siblings; ++j)
+   siblings[j] = engines[i * num_siblings + j];
+
+   ce = intel_context_create(siblings[0]);
+   if (!ce) {
+   err = ERR_PTR(-ENOMEM);
+   goto unwind;
+   }
+
+   if (i == 0) {
+   parent = ce;
+   } else {
+   in

[PATCH 04/27] drm/i915/guc: Take GT PM ref when deregistering context

2021-08-20 Thread Matthew Brost
Taking a PM reference to prevent intel_gt_wait_for_idle from short
circuiting while a deregister context H2G is in flight.

FIXME: Move locking / structure changes into different patch

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  13 +-
 drivers/gpu/drm/i915/gt/intel_engine_pm.h |   5 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.h |  13 ++
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h|  46 ++--
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c|  13 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 212 +++---
 8 files changed, 199 insertions(+), 106 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index adfe49b53b1b..c8595da64ad8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -399,6 +399,8 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
ce->guc_id.id = GUC_INVALID_LRC_ID;
INIT_LIST_HEAD(&ce->guc_id.link);
 
+   INIT_LIST_HEAD(&ce->destroyed_link);
+
/*
 * Initialize fence to be complete as this is expected to be complete
 * unless there is a pending schedule disable outstanding.
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 80bbdc7810f6..fd338a30617e 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -190,22 +190,29 @@ struct intel_context {
/**
 * @id: unique handle which is used to communicate information
 * with the GuC about this context, protected by
-* guc->contexts_lock
+* guc->submission_state.lock
 */
u16 id;
/**
 * @ref: the number of references to the guc_id, when
 * transitioning in and out of zero protected by
-* guc->contexts_lock
+* guc->submission_state.lock
 */
atomic_t ref;
/**
 * @link: in guc->guc_id_list when the guc_id has no refs but is
-* still valid, protected by guc->contexts_lock
+* still valid, protected by guc->submission_state.lock
 */
struct list_head link;
} guc_id;
 
+   /**
+* @destroyed_link: link in guc->submission_state.destroyed_contexts, in
+* list when context is pending to be destroyed (deregistered with the
+* GuC), protected by guc->submission_state.lock
+*/
+   struct list_head destroyed_link;
+
 #ifdef CONFIG_DRM_I915_SELFTEST
/**
 * @drop_schedule_enable: Force drop of schedule enable G2H for selftest
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
index 70ea46d6cfb0..17a5028ea177 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
@@ -16,6 +16,11 @@ intel_engine_pm_is_awake(const struct intel_engine_cs 
*engine)
return intel_wakeref_is_active(&engine->wakeref);
 }
 
+static inline void __intel_engine_pm_get(struct intel_engine_cs *engine)
+{
+   __intel_wakeref_get(&engine->wakeref);
+}
+
 static inline void intel_engine_pm_get(struct intel_engine_cs *engine)
 {
intel_wakeref_get(&engine->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index d0588d8aaa44..a17bf0d4592b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -41,6 +41,19 @@ static inline void intel_gt_pm_put_async(struct intel_gt *gt)
intel_wakeref_put_async(>->wakeref);
 }
 
+#define with_intel_gt_pm(gt, tmp) \
+   for (tmp = 1, intel_gt_pm_get(gt); tmp; \
+intel_gt_pm_put(gt), tmp = 0)
+#define with_intel_gt_pm_async(gt, tmp) \
+   for (tmp = 1, intel_gt_pm_get(gt); tmp; \
+intel_gt_pm_put_async(gt), tmp = 0)
+#define with_intel_gt_pm_if_awake(gt, tmp) \
+   for (tmp = intel_gt_pm_get_if_awake(gt); tmp; \
+intel_gt_pm_put(gt), tmp = 0)
+#define with_intel_gt_pm_if_awake_async(gt, tmp) \
+   for (tmp = intel_gt_pm_get_if_awake(gt); tmp; \
+intel_gt_pm_put_async(gt), tmp = 0)
+
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
return intel_wakeref_wait_for_idle(>->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h 
b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 8ff58aff..ba10bd374cee 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -142,6 +142,7 @@ enum intel_guc_action {
INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPO

[PATCH 01/27] drm/i915/guc: Squash Clean up GuC CI failures, simplify locking, and kernel DOC

2021-08-20 Thread Matthew Brost
https://patchwork.freedesktop.org/series/93704/

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_context.c   |  19 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  81 +-
 .../drm/i915/gt/intel_execlists_submission.c  |   4 -
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   6 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c|  29 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h|  19 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |   6 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 952 +++---
 drivers/gpu/drm/i915/gt/uc/selftest_guc.c | 126 +++
 drivers/gpu/drm/i915/i915_gpu_error.c |  39 +-
 drivers/gpu/drm/i915/i915_request.h   |  23 +-
 drivers/gpu/drm/i915/i915_trace.h |  12 +-
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 drivers/gpu/drm/i915/selftests/i915_request.c | 100 ++
 .../i915/selftests/intel_scheduler_helpers.c  |  12 +
 .../i915/selftests/intel_scheduler_helpers.h  |   2 +
 16 files changed, 965 insertions(+), 466 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc.c

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 745e84c72c90..adfe49b53b1b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -394,19 +394,18 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
 
spin_lock_init(&ce->guc_state.lock);
INIT_LIST_HEAD(&ce->guc_state.fences);
+   INIT_LIST_HEAD(&ce->guc_state.requests);
 
-   spin_lock_init(&ce->guc_active.lock);
-   INIT_LIST_HEAD(&ce->guc_active.requests);
-
-   ce->guc_id = GUC_INVALID_LRC_ID;
-   INIT_LIST_HEAD(&ce->guc_id_link);
+   ce->guc_id.id = GUC_INVALID_LRC_ID;
+   INIT_LIST_HEAD(&ce->guc_id.link);
 
/*
 * Initialize fence to be complete as this is expected to be complete
 * unless there is a pending schedule disable outstanding.
 */
-   i915_sw_fence_init(&ce->guc_blocked, sw_fence_dummy_notify);
-   i915_sw_fence_commit(&ce->guc_blocked);
+   i915_sw_fence_init(&ce->guc_state.blocked_fence,
+  sw_fence_dummy_notify);
+   i915_sw_fence_commit(&ce->guc_state.blocked_fence);
 
i915_active_init(&ce->active,
 __intel_context_active, __intel_context_retire, 0);
@@ -520,15 +519,15 @@ struct i915_request 
*intel_context_find_active_request(struct intel_context *ce)
 
GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
 
-   spin_lock_irqsave(&ce->guc_active.lock, flags);
-   list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+   spin_lock_irqsave(&ce->guc_state.lock, flags);
+   list_for_each_entry_reverse(rq, &ce->guc_state.requests,
sched.link) {
if (i915_request_completed(rq))
break;
 
active = rq;
}
-   spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+   spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
return active;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index e54351a170e2..80bbdc7810f6 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -112,6 +112,7 @@ struct intel_context {
 #define CONTEXT_FORCE_SINGLE_SUBMISSION7
 #define CONTEXT_NOPREEMPT  8
 #define CONTEXT_LRCA_DIRTY 9
+#define CONTEXT_GUC_INIT   10
 
struct {
u64 timeout_us;
@@ -155,49 +156,73 @@ struct intel_context {
u8 wa_bb_page; /* if set, page num reserved for context workarounds */
 
struct {
-   /** lock: protects everything in guc_state */
+   /** @lock: protects everything in guc_state */
spinlock_t lock;
/**
-* sched_state: scheduling state of this context using GuC
+* @sched_state: scheduling state of this context using GuC
 * submission
 */
-   u16 sched_state;
+   u32 sched_state;
/*
-* fences: maintains of list of requests that have a submit
-* fence related to GuC submission
+* @fences: maintains a list of requests are currently being
+* fenced until a GuC operation completes
 */
struct list_head fences;
+   /**
+* @blocked_fence: fence used to signal when the blocking of a
+* contexts submissions is complete.
+*/
+   struct i915_sw_fence blocked_fence;
+   /** @number_committed_requests: number of committed requests */
+   int number_committed_requests;
+   /** @requests: list of active re

[PATCH 03/27] drm/i915/guc: Connect the number of guc_ids to debugfs

2021-08-20 Thread Matthew Brost
For testing purposes it may make sense to reduce the number of guc_ids
available to be allocated. Add debugfs support for setting the number of
guc_ids.

Signed-off-by: Matthew Brost 
---
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c| 31 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index 887c8c8f35db..b88d343ee432 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -71,12 +71,43 @@ static bool intel_eval_slpc_support(void *data)
return intel_guc_slpc_is_used(guc);
 }
 
+static int guc_num_id_get(void *data, u64 *val)
+{
+   struct intel_guc *guc = data;
+
+   if (!intel_guc_submission_is_used(guc))
+   return -ENODEV;
+
+   *val = guc->num_guc_ids;
+
+   return 0;
+}
+
+static int guc_num_id_set(void *data, u64 val)
+{
+   struct intel_guc *guc = data;
+
+   if (!intel_guc_submission_is_used(guc))
+   return -ENODEV;
+
+   if (val > guc->max_guc_ids)
+   val = guc->max_guc_ids;
+   else if (val < 256)
+   val = 256;
+
+   guc->num_guc_ids = val;
+
+   return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(guc_num_id_fops, guc_num_id_get, guc_num_id_set, 
"%lld\n");
+
 void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
 {
static const struct debugfs_gt_file files[] = {
{ "guc_info", &guc_info_fops, NULL },
{ "guc_registered_contexts", &guc_registered_contexts_fops, 
NULL },
{ "guc_slpc_info", &guc_slpc_info_fops, 
&intel_eval_slpc_support},
+   { "guc_num_id", &guc_num_id_fops, NULL },
};
 
if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 8235e49bb347..68742b612692 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2716,7 +2716,8 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 
if (unlikely(desc_idx >= guc->max_guc_ids)) {
drm_err(&guc_to_gt(guc)->i915->drm,
-   "Invalid desc_idx %u", desc_idx);
+   "Invalid desc_idx %u, max %u",
+   desc_idx, guc->max_guc_ids);
return NULL;
}
 
-- 
2.32.0



[PATCH 02/27] drm/i915/guc: Allow flexible number of context ids

2021-08-20 Thread Matthew Brost
Number of available GuC context ids might be limited.
Stop referring to the macro in the code and use a variable instead.

Signed-off-by: Michal Wajdeczko 
Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h|  4 
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 15 +--
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 112dd29a63fe..6fd2719d1b75 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -60,6 +60,10 @@ struct intel_guc {
spinlock_t contexts_lock;
/** @guc_ids: used to allocate new guc_ids */
struct ida guc_ids;
+   /** @num_guc_ids: number of guc_ids that can be used */
+   u32 num_guc_ids;
+   /** @max_guc_ids: max number of guc_ids that can be used */
+   u32 max_guc_ids;
/**
 * @guc_id_list: list of intel_context with valid guc_ids but no refs
 */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 46158d996bf6..8235e49bb347 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -344,7 +344,7 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc 
*guc, u32 index)
 {
struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
 
-   GEM_BUG_ON(index >= GUC_MAX_LRC_DESCRIPTORS);
+   GEM_BUG_ON(index >= guc->max_guc_ids);
 
return &base[index];
 }
@@ -353,7 +353,7 @@ static struct intel_context *__get_context(struct intel_guc 
*guc, u32 id)
 {
struct intel_context *ce = xa_load(&guc->context_lookup, id);
 
-   GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
+   GEM_BUG_ON(id >= guc->max_guc_ids);
 
return ce;
 }
@@ -363,8 +363,7 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
u32 size;
int ret;
 
-   size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) *
- GUC_MAX_LRC_DESCRIPTORS);
+   size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) * guc->max_guc_ids);
ret = intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool,
 (void 
**)&guc->lrc_desc_pool_vaddr);
if (ret)
@@ -1193,7 +1192,7 @@ static void guc_submit_request(struct i915_request *rq)
 static int new_guc_id(struct intel_guc *guc)
 {
return ida_simple_get(&guc->guc_ids, 0,
- GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
+ guc->num_guc_ids, GFP_KERNEL |
  __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
 }
 
@@ -2704,6 +2703,8 @@ static bool __guc_submission_selected(struct intel_guc 
*guc)
 
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
+   guc->max_guc_ids = GUC_MAX_LRC_DESCRIPTORS;
+   guc->num_guc_ids = GUC_MAX_LRC_DESCRIPTORS;
guc->submission_supported = __guc_submission_supported(guc);
guc->submission_selected = __guc_submission_selected(guc);
 }
@@ -2713,7 +2714,7 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 {
struct intel_context *ce;
 
-   if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
+   if (unlikely(desc_idx >= guc->max_guc_ids)) {
drm_err(&guc_to_gt(guc)->i915->drm,
"Invalid desc_idx %u", desc_idx);
return NULL;
@@ -3063,6 +3064,8 @@ void intel_guc_submission_print_info(struct intel_guc 
*guc,
 
drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
   atomic_read(&guc->outstanding_submission_g2h));
+   drm_printf(p, "GuC Number GuC IDs: %u\n", guc->num_guc_ids);
+   drm_printf(p, "GuC Max GuC IDs: %u\n", guc->max_guc_ids);
drm_printf(p, "GuC tasklet count: %u\n\n",
   atomic_read(&sched_engine->tasklet.count));
 
-- 
2.32.0



[PATCH 07/27] drm/i915/guc: Don't call switch_to_kernel_context with GuC submission

2021-08-20 Thread Matthew Brost
Calling switch_to_kernel_context isn't needed if the engine PM reference
is taken while all contexts are pinned. By not calling
switch_to_kernel_context we save on issuing a request to the engine.

v2:
 (Daniel Vetter)
  - Add FIXME comment about pushing switch_to_kernel_context to backend

Signed-off-by: Matthew Brost 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/gt/intel_engine_pm.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 1f07ac4e0672..11fee66daf60 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -162,6 +162,15 @@ static bool switch_to_kernel_context(struct 
intel_engine_cs *engine)
unsigned long flags;
bool result = true;
 
+   /*
+* No need to switch_to_kernel_context if GuC submission
+*
+* FIXME: This execlists specific backend behavior in generic code, this
+* should be pushed to the backend.
+*/
+   if (intel_engine_uses_guc(engine))
+   return true;
+
/* GPU is pointing to the void, as good as in the kernel context. */
if (intel_gt_is_wedged(engine->gt))
return true;
-- 
2.32.0



[PATCH 10/27] drm/i915/guc: Introduce context parent-child relationship

2021-08-20 Thread Matthew Brost
Introduce context parent-child relationship. Once this relationship is
created all pinning / unpinning operations are directed to the parent
context. The parent context is responsible for pinning all of its
children and itself.

This is a precursor to the full GuC multi-lrc implementation but aligns
to how the GuC multi-lrc interface is defined - a single H2G is used to
register / deregister all of the contexts simultaneously.

Subsequent patches in the series will implement the pinning / unpinning
operations for parent / child contexts.

v2:
 (Daniel Vetter)
  - Add kernel doc, add wrapper to access parent to ensure safety

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_context.c   | 29 ++
 drivers/gpu/drm/i915/gt/intel_context.h   | 39 +++
 drivers/gpu/drm/i915/gt/intel_context_types.h | 23 +++
 3 files changed, 91 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 508cfe5770c0..00d1aee6d199 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -404,6 +404,8 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
 
INIT_LIST_HEAD(&ce->destroyed_link);
 
+   INIT_LIST_HEAD(&ce->guc_child_list);
+
/*
 * Initialize fence to be complete as this is expected to be complete
 * unless there is a pending schedule disable outstanding.
@@ -418,10 +420,17 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
 
 void intel_context_fini(struct intel_context *ce)
 {
+   struct intel_context *child, *next;
+
if (ce->timeline)
intel_timeline_put(ce->timeline);
i915_vm_put(ce->vm);
 
+   /* Need to put the creation ref for the children */
+   if (intel_context_is_parent(ce))
+   for_each_child_safe(ce, child, next)
+   intel_context_put(child);
+
mutex_destroy(&ce->pin_mutex);
i915_active_fini(&ce->active);
 }
@@ -537,6 +546,26 @@ struct i915_request 
*intel_context_find_active_request(struct intel_context *ce)
return active;
 }
 
+void intel_context_bind_parent_child(struct intel_context *parent,
+struct intel_context *child)
+{
+   /*
+* Callers responsibility to validate that this function is used
+* correctly but we use GEM_BUG_ON here ensure that they do.
+*/
+   GEM_BUG_ON(!intel_engine_uses_guc(parent->engine));
+   GEM_BUG_ON(intel_context_is_pinned(parent));
+   GEM_BUG_ON(intel_context_is_child(parent));
+   GEM_BUG_ON(intel_context_is_pinned(child));
+   GEM_BUG_ON(intel_context_is_child(child));
+   GEM_BUG_ON(intel_context_is_parent(child));
+
+   parent->guc_number_children++;
+   list_add_tail(&child->guc_child_link,
+ &parent->guc_child_list);
+   child->parent = parent;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index c41098950746..c2985822ab74 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -44,6 +44,45 @@ void intel_context_free(struct intel_context *ce);
 int intel_context_reconfigure_sseu(struct intel_context *ce,
   const struct intel_sseu sseu);
 
+static inline bool intel_context_is_child(struct intel_context *ce)
+{
+   return !!ce->parent;
+}
+
+static inline bool intel_context_is_parent(struct intel_context *ce)
+{
+   return !!ce->guc_number_children;
+}
+
+static inline bool intel_context_is_pinned(struct intel_context *ce);
+
+static inline struct intel_context *
+intel_context_to_parent(struct intel_context *ce)
+{
+if (intel_context_is_child(ce)) {
+   /*
+* The parent holds ref count to the child so it is always safe
+* for the parent to access the child, but the child has pointer
+* to the parent without a ref. To ensure this is safe the child
+* should only access the parent pointer while the parent is
+* pinned.
+*/
+GEM_BUG_ON(!intel_context_is_pinned(ce->parent));
+
+return ce->parent;
+} else {
+return ce;
+}
+}
+
+void intel_context_bind_parent_child(struct intel_context *parent,
+struct intel_context *child);
+
+#define for_each_child(parent, ce)\
+   list_for_each_entry(ce, &(parent)->guc_child_list, guc_child_link)
+#define for_each_child_safe(parent, ce, cn)\
+   list_for_each_entry_safe(ce, cn, &(parent)->guc_child_list, 
guc_child_link)
+
 /**
  * intel_context_lock_pinned - Stablises the 'pinned' status of the HW context
  * @ce - the context
diff --git a/dri

[PATCH 11/27] drm/i915/guc: Implement parallel context pin / unpin functions

2021-08-20 Thread Matthew Brost
Parallel contexts are perma-pinned by the upper layers which makes the
backend implementation rather simple. The parent pins the guc_id and
children increment the parent's pin count on pin to ensure all the
contexts are unpinned before we disable scheduling with the GuC / or
deregister the context.

v2:
 (Daniel Vetter)
  - Perma-pin parallel contexts

Signed-off-by: Matthew Brost 
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 70 +++
 1 file changed, 70 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ffafbac7335e..14b24298cdd7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2395,6 +2395,76 @@ static const struct intel_context_ops 
virtual_guc_context_ops = {
.get_sibling = guc_virtual_get_sibling,
 };
 
+/* Future patches will use this function */
+__maybe_unused
+static int guc_parent_context_pin(struct intel_context *ce, void *vaddr)
+{
+   struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+   struct intel_guc *guc = ce_to_guc(ce);
+   int ret;
+
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+   GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
+
+   ret = pin_guc_id(guc, ce);
+   if (unlikely(ret < 0))
+   return ret;
+
+   return __guc_context_pin(ce, engine, vaddr);
+}
+
+/* Future patches will use this function */
+__maybe_unused
+static int guc_child_context_pin(struct intel_context *ce, void *vaddr)
+{
+   struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+   GEM_BUG_ON(!intel_context_is_child(ce));
+   GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
+
+   __intel_context_pin(ce->parent);
+   return __guc_context_pin(ce, engine, vaddr);
+}
+
+/* Future patches will use this function */
+__maybe_unused
+static void guc_parent_context_unpin(struct intel_context *ce)
+{
+   struct intel_guc *guc = ce_to_guc(ce);
+
+   GEM_BUG_ON(context_enabled(ce));
+   GEM_BUG_ON(intel_context_is_barrier(ce));
+   GEM_BUG_ON(!intel_context_is_parent(ce));
+   GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
+
+   unpin_guc_id(guc, ce);
+   lrc_unpin(ce);
+}
+
+/* Future patches will use this function */
+__maybe_unused
+static void guc_child_context_unpin(struct intel_context *ce)
+{
+   GEM_BUG_ON(context_enabled(ce));
+   GEM_BUG_ON(intel_context_is_barrier(ce));
+   GEM_BUG_ON(!intel_context_is_child(ce));
+   GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
+
+   lrc_unpin(ce);
+}
+
+/* Future patches will use this function */
+__maybe_unused
+static void guc_child_context_post_unpin(struct intel_context *ce)
+{
+   GEM_BUG_ON(!intel_context_is_child(ce));
+   GEM_BUG_ON(!intel_context_is_pinned(ce->parent));
+   GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));
+
+   lrc_post_unpin(ce);
+   intel_context_unpin(ce->parent);
+}
+
 static bool
 guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b)
 {
-- 
2.32.0



[PATCH 05/27] drm/i915: Add GT PM unpark worker

2021-08-20 Thread Matthew Brost
Sometimes it is desirable to queue work up for later if the GT PM isn't
held and run that work on the next GT PM unpark.

Implemented with a list in the GT of all pending work, workqueues in
the list, a callback to add a workqueue to the list, and finally a
wakeref post_get callback that iterates / drains the list + queues the
workqueues.

First user of this is deregistration of GuC contexts.

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/Makefile |  1 +
 drivers/gpu/drm/i915/gt/intel_gt.c|  3 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |  8 
 .../gpu/drm/i915/gt/intel_gt_pm_unpark_work.c | 35 
 .../gpu/drm/i915/gt/intel_gt_pm_unpark_work.h | 40 +++
 drivers/gpu/drm/i915/gt/intel_gt_types.h  | 10 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h|  8 ++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 15 +--
 drivers/gpu/drm/i915/intel_wakeref.c  |  5 +++
 drivers/gpu/drm/i915/intel_wakeref.h  |  1 +
 10 files changed, 119 insertions(+), 7 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 642a5b5a1b81..579bdc069f25 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -103,6 +103,7 @@ gt-y += \
gt/intel_gt_clock_utils.o \
gt/intel_gt_irq.o \
gt/intel_gt_pm.o \
+   gt/intel_gt_pm_unpark_work.o \
gt/intel_gt_pm_irq.o \
gt/intel_gt_requests.o \
gt/intel_gtt.o \
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index 62d40c986642..7e690e74baa2 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -29,6 +29,9 @@ void intel_gt_init_early(struct intel_gt *gt, struct 
drm_i915_private *i915)
 
spin_lock_init(>->irq_lock);
 
+   spin_lock_init(>->pm_unpark_work_lock);
+   INIT_LIST_HEAD(>->pm_unpark_work_list);
+
INIT_LIST_HEAD(>->closed_vma);
spin_lock_init(>->closed_lock);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index dea8e2479897..564c11a3748b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -90,6 +90,13 @@ static int __gt_unpark(struct intel_wakeref *wf)
return 0;
 }
 
+static void __gt_unpark_work_queue(struct intel_wakeref *wf)
+{
+   struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
+
+   intel_gt_pm_unpark_work_queue(gt);
+}
+
 static int __gt_park(struct intel_wakeref *wf)
 {
struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
@@ -118,6 +125,7 @@ static int __gt_park(struct intel_wakeref *wf)
 
 static const struct intel_wakeref_ops wf_ops = {
.get = __gt_unpark,
+   .post_get = __gt_unpark_work_queue,
.put = __gt_park,
 };
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.c
new file mode 100644
index ..23162dbd0c35
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "intel_runtime_pm.h"
+#include "intel_gt_pm.h"
+
+void intel_gt_pm_unpark_work_queue(struct intel_gt *gt)
+{
+   struct intel_gt_pm_unpark_work *work, *next;
+   unsigned long flags;
+
+   spin_lock_irqsave(>->pm_unpark_work_lock, flags);
+   list_for_each_entry_safe(work, next,
+>->pm_unpark_work_list, link) {
+   list_del_init(&work->link);
+   queue_work(system_unbound_wq, &work->worker);
+   }
+   spin_unlock_irqrestore(>->pm_unpark_work_lock, flags);
+}
+
+void intel_gt_pm_unpark_work_add(struct intel_gt *gt,
+struct intel_gt_pm_unpark_work *work)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(>->pm_unpark_work_lock, flags);
+   if (intel_gt_pm_is_awake(gt))
+   queue_work(system_unbound_wq, &work->worker);
+   else if (list_empty(&work->link))
+   list_add_tail(&work->link, >->pm_unpark_work_list);
+   spin_unlock_irqrestore(>->pm_unpark_work_lock, flags);
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.h 
b/drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.h
new file mode 100644
index ..eaf1dc313aa2
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_unpark_work.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#ifndef INTEL_GT_PM_UNPARK_WORK_H
+#define INTEL_GT_PM_UNPARK_WORK_H
+
+#include 
+#include 
+
+struct intel_gt;
+
+/**
+ * struct intel_gt_pm_unpark_work - work to be scheduled when GT unparked
+ */
+struct intel_gt_pm_unpark_wor

[PATCH 15/27] drm/i915/guc: Implement multi-lrc submission

2021-08-20 Thread Matthew Brost
Implement multi-lrc submission via a single workqueue entry and single
H2G. The workqueue entry contains an updated tail value for each
request, of all the contexts in the multi-lrc submission, and updates
these values simultaneously. As such, the tasklet and bypass path have
been updated to coalesce requests into a single submission.

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.c|  21 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h|   8 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  24 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   6 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 312 +++---
 drivers/gpu/drm/i915/i915_request.h   |   8 +
 6 files changed, 317 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index fbfcae727d7f..879aef662b2e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -748,3 +748,24 @@ void intel_guc_load_status(struct intel_guc *guc, struct 
drm_printer *p)
}
}
 }
+
+void intel_guc_write_barrier(struct intel_guc *guc)
+{
+   struct intel_gt *gt = guc_to_gt(guc);
+
+   if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
+   GEM_BUG_ON(guc->send_regs.fw_domains);
+   /*
+* This register is used by the i915 and GuC for MMIO based
+* communication. Once we are in this code CTBs are the only
+* method the i915 uses to communicate with the GuC so it is
+* safe to write to this register (a value of 0 is NOP for MMIO
+* communication). If we ever start mixing CTBs and MMIOs a new
+* register will have to be chosen.
+*/
+   intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
+   } else {
+   /* wmb() sufficient for a barrier if in smem */
+   wmb();
+   }
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 3f95b1b4f15c..0ead2406d03c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -37,6 +37,12 @@ struct intel_guc {
/* Global engine used to submit requests to GuC */
struct i915_sched_engine *sched_engine;
struct i915_request *stalled_request;
+   enum {
+   STALL_NONE,
+   STALL_REGISTER_CONTEXT,
+   STALL_MOVE_LRC_TAIL,
+   STALL_ADD_REQUEST,
+   } submission_stall_reason;
 
/* intel_guc_recv interrupt related state */
spinlock_t irq_lock;
@@ -332,4 +338,6 @@ void intel_guc_submission_cancel_requests(struct intel_guc 
*guc);
 
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
+void intel_guc_write_barrier(struct intel_guc *guc);
+
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 20c710a74498..10d1878d2826 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -377,28 +377,6 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
return ++ct->requests.last_fence;
 }
 
-static void write_barrier(struct intel_guc_ct *ct)
-{
-   struct intel_guc *guc = ct_to_guc(ct);
-   struct intel_gt *gt = guc_to_gt(guc);
-
-   if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
-   GEM_BUG_ON(guc->send_regs.fw_domains);
-   /*
-* This register is used by the i915 and GuC for MMIO based
-* communication. Once we are in this code CTBs are the only
-* method the i915 uses to communicate with the GuC so it is
-* safe to write to this register (a value of 0 is NOP for MMIO
-* communication). If we ever start mixing CTBs and MMIOs a new
-* register will have to be chosen.
-*/
-   intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
-   } else {
-   /* wmb() sufficient for a barrier if in smem */
-   wmb();
-   }
-}
-
 static int ct_write(struct intel_guc_ct *ct,
const u32 *action,
u32 len /* in dwords */,
@@ -468,7 +446,7 @@ static int ct_write(struct intel_guc_ct *ct,
 * make sure H2G buffer update and LRC tail update (if this triggering a
 * submission) are visible before updating the descriptor tail
 */
-   write_barrier(ct);
+   intel_guc_write_barrier(ct_to_guc(ct));
 
/* update local copies */
ctb->tail = tail;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 0e600a3b8f1e..6cd26dc060d1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -65,12 +65,14 @@
 #de

[PATCH 12/27] drm/i915/guc: Add multi-lrc context registration

2021-08-20 Thread Matthew Brost
Add multi-lrc context registration H2G. In addition a workqueue and
process descriptor are setup during multi-lrc context registration as
these data structures are needed for multi-lrc submission.

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |  12 ++
 drivers/gpu/drm/i915/gt/intel_lrc.c   |   5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 109 +-
 4 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 0fafc178cf2c..6f567ebeb039 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -232,8 +232,20 @@ struct intel_context {
/** @parent: pointer to parent if child */
struct intel_context *parent;
 
+
+   /** @guc_wqi_head: head pointer in work queue */
+   u16 guc_wqi_head;
+   /** @guc_wqi_tail: tail pointer in work queue */
+   u16 guc_wqi_tail;
+
/** @guc_number_children: number of children if parent */
u8 guc_number_children;
+
+   /**
+* @parent_page: page in context used by parent for work queue,
+* work queue descriptor
+*/
+   u8 parent_page;
};
 
 #ifdef CONFIG_DRM_I915_SELFTEST
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index bb4af4977920..0ddbad4e062a 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -861,6 +861,11 @@ __lrc_alloc_state(struct intel_context *ce, struct 
intel_engine_cs *engine)
context_size += PAGE_SIZE;
}
 
+   if (intel_context_is_parent(ce)) {
+   ce->parent_page = context_size / PAGE_SIZE;
+   context_size += PAGE_SIZE;
+   }
+
obj = i915_gem_object_create_lmem(engine->i915, context_size, 0);
if (IS_ERR(obj))
obj = i915_gem_object_create_shmem(engine->i915, context_size);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index fa4be13c8854..0e600a3b8f1e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -52,7 +52,7 @@
 
 #define GUC_DOORBELL_INVALID   256
 
-#define GUC_WQ_SIZE(PAGE_SIZE * 2)
+#define GUC_WQ_SIZE(PAGE_SIZE / 2)
 
 /* Work queue item header definitions */
 #define WQ_STATUS_ACTIVE   1
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 14b24298cdd7..dbcb9ab28a9a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -340,6 +340,39 @@ static struct i915_priolist *to_priolist(struct rb_node 
*rb)
return rb_entry(rb, struct i915_priolist, node);
 }
 
+/*
+ * When using multi-lrc submission an extra page in the context state is
+ * reserved for the process descriptor and work queue.
+ *
+ * The layout of this page is below:
+ * 0   guc_process_desc
+ * ... unused
+ * PAGE_SIZE / 2   work queue start
+ * ... work queue
+ * PAGE_SIZE - 1   work queue end
+ */
+#define WQ_OFFSET  (PAGE_SIZE / 2)
+static u32 __get_process_desc_offset(struct intel_context *ce)
+{
+   GEM_BUG_ON(!ce->parent_page);
+
+   return ce->parent_page * PAGE_SIZE;
+}
+
+static u32 __get_wq_offset(struct intel_context *ce)
+{
+   return __get_process_desc_offset(ce) + WQ_OFFSET;
+}
+
+static struct guc_process_desc *
+__get_process_desc(struct intel_context *ce)
+{
+   return (struct guc_process_desc *)
+   (ce->lrc_reg_state +
+((__get_process_desc_offset(ce) -
+  LRC_STATE_OFFSET) / sizeof(u32)));
+}
+
 static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 {
struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
@@ -1342,6 +1375,30 @@ static void unpin_guc_id(struct intel_guc *guc, struct 
intel_context *ce)
spin_unlock_irqrestore(&guc->submission_state.lock, flags);
 }
 
+static int __guc_action_register_multi_lrc(struct intel_guc *guc,
+  struct intel_context *ce,
+  u32 guc_id,
+  u32 offset,
+  bool loop)
+{
+   struct intel_context *child;
+   u32 action[4 + MAX_ENGINE_INSTANCE];
+   int len = 0;
+
+   GEM_BUG_ON(ce->guc_number_children > MAX_ENGINE_INSTANCE);
+
+   action[len++] = 

[PATCH 09/27] drm/i915: Expose logical engine instance to user

2021-08-20 Thread Matthew Brost
Expose logical engine instance to user via query engine info IOCTL. This
is required for split-frame workloads as these need to be placed on
engines in a logically contiguous order. The logical mapping can change
based on fusing. Rather than having user have knowledge of the fusing we
simply just expose the logical mapping with the existing query engine
info IOCTL.

IGT: https://patchwork.freedesktop.org/patch/445637/?series=92854&rev=1
media UMD: link coming soon

v2:
 (Daniel Vetter)
  - Add IGT link, placeholder for media UMD

Cc: Tvrtko Ursulin 
Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/i915_query.c | 2 ++
 include/uapi/drm/i915_drm.h   | 8 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index e49da36c62fb..8a72923fbdba 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -124,7 +124,9 @@ query_engine_info(struct drm_i915_private *i915,
for_each_uabi_engine(engine, i915) {
info.engine.engine_class = engine->uabi_class;
info.engine.engine_instance = engine->uabi_instance;
+   info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;
info.capabilities = engine->uabi_capabilities;
+   info.logical_instance = ilog2(engine->logical_mask);
 
if (copy_to_user(info_ptr, &info, sizeof(info)))
return -EFAULT;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index bde5860b3686..b1248a67b4f8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2726,14 +2726,20 @@ struct drm_i915_engine_info {
 
/** @flags: Engine flags. */
__u64 flags;
+#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE  (1 << 0)
 
/** @capabilities: Capabilities of this engine. */
__u64 capabilities;
 #define I915_VIDEO_CLASS_CAPABILITY_HEVC   (1 << 0)
 #define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC(1 << 1)
 
+   /** @logical_instance: Logical instance of engine */
+   __u16 logical_instance;
+
/** @rsvd1: Reserved fields. */
-   __u64 rsvd1[4];
+   __u16 rsvd1[3];
+   /** @rsvd2: Reserved fields. */
+   __u64 rsvd2[3];
 };
 
 /**
-- 
2.32.0



[PATCH 08/27] drm/i915: Add logical engine mapping

2021-08-20 Thread Matthew Brost
Add logical engine mapping. This is required for split-frame, as
workloads need to be placed on engines in a logically contiguous manner.

v2:
 (Daniel Vetter)
  - Add kernel doc for new fields

Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 60 ---
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  5 ++
 .../drm/i915/gt/intel_execlists_submission.c  |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c|  2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 21 +--
 5 files changed, 60 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 0d9105a31d84..4d790f9a65dd 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -290,7 +290,8 @@ static void nop_irq_handler(struct intel_engine_cs *engine, 
u16 iir)
GEM_DEBUG_WARN_ON(iir);
 }
 
-static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
+static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id,
+ u8 logical_instance)
 {
const struct engine_info *info = &intel_engines[id];
struct drm_i915_private *i915 = gt->i915;
@@ -334,6 +335,7 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
intel_engine_id id)
 
engine->class = info->class;
engine->instance = info->instance;
+   engine->logical_mask = BIT(logical_instance);
__sprint_engine_name(engine);
 
engine->props.heartbeat_interval_ms =
@@ -572,6 +574,37 @@ static intel_engine_mask_t init_engine_mask(struct 
intel_gt *gt)
return info->engine_mask;
 }
 
+static void populate_logical_ids(struct intel_gt *gt, u8 *logical_ids,
+u8 class, const u8 *map, u8 num_instances)
+{
+   int i, j;
+   u8 current_logical_id = 0;
+
+   for (j = 0; j < num_instances; ++j) {
+   for (i = 0; i < ARRAY_SIZE(intel_engines); ++i) {
+   if (!HAS_ENGINE(gt, i) ||
+   intel_engines[i].class != class)
+   continue;
+
+   if (intel_engines[i].instance == map[j]) {
+   logical_ids[intel_engines[i].instance] =
+   current_logical_id++;
+   break;
+   }
+   }
+   }
+}
+
+static void setup_logical_ids(struct intel_gt *gt, u8 *logical_ids, u8 class)
+{
+   int i;
+   u8 map[MAX_ENGINE_INSTANCE + 1];
+
+   for (i = 0; i < MAX_ENGINE_INSTANCE + 1; ++i)
+   map[i] = i;
+   populate_logical_ids(gt, logical_ids, class, map, ARRAY_SIZE(map));
+}
+
 /**
  * intel_engines_init_mmio() - allocate and prepare the Engine Command 
Streamers
  * @gt: pointer to struct intel_gt
@@ -583,7 +616,8 @@ int intel_engines_init_mmio(struct intel_gt *gt)
struct drm_i915_private *i915 = gt->i915;
const unsigned int engine_mask = init_engine_mask(gt);
unsigned int mask = 0;
-   unsigned int i;
+   unsigned int i, class;
+   u8 logical_ids[MAX_ENGINE_INSTANCE + 1];
int err;
 
drm_WARN_ON(&i915->drm, engine_mask == 0);
@@ -593,15 +627,23 @@ int intel_engines_init_mmio(struct intel_gt *gt)
if (i915_inject_probe_failure(i915))
return -ENODEV;
 
-   for (i = 0; i < ARRAY_SIZE(intel_engines); i++) {
-   if (!HAS_ENGINE(gt, i))
-   continue;
+   for (class = 0; class < MAX_ENGINE_CLASS + 1; ++class) {
+   setup_logical_ids(gt, logical_ids, class);
 
-   err = intel_engine_setup(gt, i);
-   if (err)
-   goto cleanup;
+   for (i = 0; i < ARRAY_SIZE(intel_engines); ++i) {
+   u8 instance = intel_engines[i].instance;
+
+   if (intel_engines[i].class != class ||
+   !HAS_ENGINE(gt, i))
+   continue;
 
-   mask |= BIT(i);
+   err = intel_engine_setup(gt, i,
+logical_ids[instance]);
+   if (err)
+   goto cleanup;
+
+   mask |= BIT(i);
+   }
}
 
/*
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index ed91bcff20eb..fddf35546b58 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -266,6 +266,11 @@ struct intel_engine_cs {
unsigned int guc_id;
 
intel_engine_mask_t mask;
+   /**
+* @logical_mask: logical mask of engine, reported to user space via
+* query IOCTL and used to communicate with the GuC in logical space
+*/
+   intel_engine_mask_t logical_mas

[PATCH 00/27] Parallel submission aka multi-bb execbuf

2021-08-20 Thread Matthew Brost
As discussed in [1] we are introducing a new parallel submission uAPI
for the i915 which allows more than 1 BB to be submitted in an execbuf
IOCTL. This is the implementation for both GuC and execlists.

In addition to selftests in the series, an IGT is available implemented
in the first 4 patches [2].

Media UMD changes to land soon.

First patch in the series is a squashed patch of [3] and does not need
to be reviewed here.

The execbuf IOCTL changes have been done in a single large patch (#24)
as all the changes flow together and I believe a single patch will be
better if someone has to look up this change in the future. Can split into
a series of smaller patches if desired.

This code is available in a public [4] repo for UMD teams to test their
code on.

v2: Drop complicated state machine to block in kernel if no guc_ids
available, perma-pin parallel contexts, reworked execbuf IOCTL to be a
series of loops inside the IOCTL rather than 1 large one on the outside,
address Daniel Vetter's comments, rebase on [3]  

Signed-off-by: Matthew Brost 

[1] https://patchwork.freedesktop.org/series/92028/
[2] https://patchwork.freedesktop.org/series/93071/
[3] https://patchwork.freedesktop.org/series/93704/
[4] 
https://gitlab.freedesktop.org/mbrost/mbrost-drm-intel/-/tree/drm-intel-parallel

Matthew Brost (27):
  drm/i915/guc: Squash Clean up GuC CI failures, simplify locking, and
kernel DOC
  drm/i915/guc: Allow flexible number of context ids
  drm/i915/guc: Connect the number of guc_ids to debugfs
  drm/i915/guc: Take GT PM ref when deregistering context
  drm/i915: Add GT PM unpark worker
  drm/i915/guc: Take engine PM when a context is pinned with GuC
submission
  drm/i915/guc: Don't call switch_to_kernel_context with GuC submission
  drm/i915: Add logical engine mapping
  drm/i915: Expose logical engine instance to user
  drm/i915/guc: Introduce context parent-child relationship
  drm/i915/guc: Implement parallel context pin / unpin functions
  drm/i915/guc: Add multi-lrc context registration
  drm/i915/guc: Ensure GuC schedule operations do not operate on child
contexts
  drm/i915/guc: Assign contexts in parent-child relationship consecutive
guc_ids
  drm/i915/guc: Implement multi-lrc submission
  drm/i915/guc: Insert submit fences between requests in parent-child
relationship
  drm/i915/guc: Implement multi-lrc reset
  drm/i915/guc: Update debugfs for GuC multi-lrc
  drm/i915: Fix bug in user proto-context creation that leaked contexts
  drm/i915/guc: Connect UAPI to GuC multi-lrc interface
  drm/i915/doc: Update parallel submit doc to point to i915_drm.h
  drm/i915/guc: Add basic GuC multi-lrc selftest
  drm/i915/guc: Implement no mid batch preemption for multi-lrc
  drm/i915: Multi-BB execbuf
  drm/i915/guc: Handle errors in multi-lrc requests
  drm/i915: Enable multi-bb execbuf
  drm/i915/execlists: Weak parallel submission support for execlists

 Documentation/gpu/rfc/i915_parallel_execbuf.h |  122 -
 Documentation/gpu/rfc/i915_scheduler.rst  |4 +-
 drivers/gpu/drm/i915/Makefile |1 +
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  222 +-
 .../gpu/drm/i915/gem/i915_gem_context_types.h |6 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  765 --
 drivers/gpu/drm/i915/gt/intel_context.c   |   58 +-
 drivers/gpu/drm/i915/gt/intel_context.h   |   52 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  148 +-
 drivers/gpu/drm/i915/gt/intel_engine.h|   12 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |   66 +-
 drivers/gpu/drm/i915/gt/intel_engine_pm.c |9 +
 drivers/gpu/drm/i915/gt/intel_engine_pm.h |   20 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |5 +
 .../drm/i915/gt/intel_execlists_submission.c  |   68 +-
 drivers/gpu/drm/i915/gt/intel_gt.c|3 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.c |8 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.h |   23 +
 .../gpu/drm/i915/gt/intel_gt_pm_unpark_work.c |   35 +
 .../gpu/drm/i915/gt/intel_gt_pm_unpark_work.h |   40 +
 drivers/gpu/drm/i915/gt/intel_gt_types.h  |   10 +
 drivers/gpu/drm/i915/gt/intel_lrc.c   |7 +
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   12 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |6 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c|   29 +-
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c|   21 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h|   61 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c|2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |   30 +-
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c|   36 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   10 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 2312 +
 drivers/gpu/drm/i915/gt/uc/selftest_guc.c |  126 +
 .../drm/i915/gt/uc/selftest_guc_multi_lrc.c   |  180 ++
 drivers/gpu/drm/i915/i915_gpu_error.c |   39 +-
 drivers/gpu/drm/i915/i91

[PATCH 3/3] drm/panfrost: Clamp lock region to Bifrost minimum

2021-08-20 Thread Alyssa Rosenzweig
When locking a region, we currently clamp to a PAGE_SIZE as the minimum
lock region. While this is valid for Midgard, it is invalid for Bifrost,
where the minimum locking size is 8x larger than the 4k page size. Add a
hardware definition for the minimum lock region size (corresponding to
KBASE_LOCK_REGION_MIN_SIZE_LOG2 in kbase) and respect it.

Signed-off-by: Alyssa Rosenzweig 
Tested-by: Chris Morgan 
Cc: 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c  | 2 +-
 drivers/gpu/drm/panfrost/panfrost_regs.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 3a795273e505..dfe5f1d29763 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -66,7 +66,7 @@ static void lock_region(struct panfrost_device *pfdev, u32 
as_nr,
/* The size is encoded as ceil(log2) minus(1), which may be calculated
 * with fls. The size must be clamped to hardware bounds.
 */
-   size = max_t(u64, size, PAGE_SIZE);
+   size = max_t(u64, size, AS_LOCK_REGION_MIN_SIZE);
region_width = fls64(size - 1) - 1;
region |= region_width;
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 1940ff86e49a..6c5a11ef1ee8 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -316,6 +316,8 @@
 #define AS_FAULTSTATUS_ACCESS_TYPE_READ(0x2 << 8)
 #define AS_FAULTSTATUS_ACCESS_TYPE_WRITE   (0x3 << 8)
 
+#define AS_LOCK_REGION_MIN_SIZE (1ULL << 15)
+
 #define gpu_write(dev, reg, data) writel(data, dev->iomem + reg)
 #define gpu_read(dev, reg) readl(dev->iomem + reg)
 
-- 
2.30.2



[PATCH 2/3] drm/panfrost: Use u64 for size in lock_region

2021-08-20 Thread Alyssa Rosenzweig
Mali virtual addresses are 48-bit. Use a u64 instead of size_t to ensure
we can express the "lock everything" condition as ~0ULL without relying
on platform-specific behaviour.

Signed-off-by: Alyssa Rosenzweig 
Suggested-by: Rob Herring 
Tested-by: Chris Morgan 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index f6e02d0392f4..3a795273e505 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -58,7 +58,7 @@ static int write_cmd(struct panfrost_device *pfdev, u32 
as_nr, u32 cmd)
 }
 
 static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
-   u64 iova, size_t size)
+   u64 iova, u64 size)
 {
u8 region_width;
u64 region = iova & PAGE_MASK;
@@ -78,7 +78,7 @@ static void lock_region(struct panfrost_device *pfdev, u32 
as_nr,
 
 
 static int mmu_hw_do_operation_locked(struct panfrost_device *pfdev, int as_nr,
- u64 iova, size_t size, u32 op)
+ u64 iova, u64 size, u32 op)
 {
if (as_nr < 0)
return 0;
@@ -95,7 +95,7 @@ static int mmu_hw_do_operation_locked(struct panfrost_device 
*pfdev, int as_nr,
 
 static int mmu_hw_do_operation(struct panfrost_device *pfdev,
   struct panfrost_mmu *mmu,
-  u64 iova, size_t size, u32 op)
+  u64 iova, u64 size, u32 op)
 {
int ret;
 
@@ -112,7 +112,7 @@ static void panfrost_mmu_enable(struct panfrost_device 
*pfdev, struct panfrost_m
u64 transtab = cfg->arm_mali_lpae_cfg.transtab;
u64 memattr = cfg->arm_mali_lpae_cfg.memattr;
 
-   mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0UL, AS_COMMAND_FLUSH_MEM);
+   mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0ULL, 
AS_COMMAND_FLUSH_MEM);
 
mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), transtab & 0xUL);
mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), transtab >> 32);
@@ -128,7 +128,7 @@ static void panfrost_mmu_enable(struct panfrost_device 
*pfdev, struct panfrost_m
 
 static void panfrost_mmu_disable(struct panfrost_device *pfdev, u32 as_nr)
 {
-   mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0UL, AS_COMMAND_FLUSH_MEM);
+   mmu_hw_do_operation_locked(pfdev, as_nr, 0, ~0ULL, 
AS_COMMAND_FLUSH_MEM);
 
mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), 0);
mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), 0);
@@ -242,7 +242,7 @@ static size_t get_pgsize(u64 addr, size_t size)
 
 static void panfrost_mmu_flush_range(struct panfrost_device *pfdev,
 struct panfrost_mmu *mmu,
-u64 iova, size_t size)
+u64 iova, u64 size)
 {
if (mmu->as < 0)
return;
-- 
2.30.2



[PATCH 1/3] drm/panfrost: Simplify lock_region calculation

2021-08-20 Thread Alyssa Rosenzweig
In lock_region, simplify the calculation of the region_width parameter.
This field is the size, but encoded as ceil(log2(size)) - 1.
ceil(log2(size)) may be computed directly as fls(size - 1). However, we
want to use the 64-bit versions as the amount to lock can exceed
32-bits.

This avoids undefined behaviour when locking all memory (size ~0),
caught by UBSAN.

Signed-off-by: Alyssa Rosenzweig 
Reported-and-tested-by: Chris Morgan 
Cc: 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 0da5b3100ab1..f6e02d0392f4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -62,21 +62,12 @@ static void lock_region(struct panfrost_device *pfdev, u32 
as_nr,
 {
u8 region_width;
u64 region = iova & PAGE_MASK;
-   /*
-* fls returns:
-* 1 .. 32
-*
-* 10 + fls(num_pages)
-* results in the range (11 .. 42)
-*/
-
-   size = round_up(size, PAGE_SIZE);
 
-   region_width = 10 + fls(size >> PAGE_SHIFT);
-   if ((size >> PAGE_SHIFT) != (1ul << (region_width - 11))) {
-   /* not pow2, so must go up to the next pow2 */
-   region_width += 1;
-   }
+   /* The size is encoded as ceil(log2) minus(1), which may be calculated
+* with fls. The size must be clamped to hardware bounds.
+*/
+   size = max_t(u64, size, PAGE_SIZE);
+   region_width = fls64(size - 1) - 1;
region |= region_width;
 
/* Lock the region that needs to be updated */
-- 
2.30.2



[PATCH 0/3] drm/panfrost: Bug fixes for lock_region

2021-08-20 Thread Alyssa Rosenzweig
Chris Morgan reported UBSAN errors in panfrost and tracked them down to
the size computation in lock_region. This calculation is overcomplicated
(seemingly cargo culted from kbase) and can be simplified with kernel
helpers and some mathematical identities. The first patch in the series
rewrites the calculation in a form avoiding undefined behaviour; Chris
confirms it placates UBSAN.

While researching this function, I noticed a pair of other potential
bugs: Bifrost can lock more than 4GiB at a time, but must lock at least
32KiB at a time. The latter patches in the series handle these cases.

The size computation was unit-tested in userspace. Relevant code below,
just missing some copypaste definitions for fls64/clamp/etc:

#define MIN_LOCK (1ULL << 12)
#define MAX_LOCK (1ULL << 48)

struct {
uint64_t size;
uint8_t encoded;
} tests[] = {
/* Clamping */
{ 0, 11 },
{ 1, 11 },
{ 2, 11 },
{ 4095, 11 },
/* Power of two */
{ 4096, 11 },
/* Round up */
{ 4097, 12 },
{ 8192, 12 },
{ 16384, 13 },
{ 16385, 14 },
/* Maximum */
{ ~0ULL, 47 },
};

static uint8_t region_width(uint64_t size)
{
size = clamp(size, MIN_LOCK, MAX_LOCK);
return fls64(size - 1) - 1;
}

int main(int argc, char **argv)
{
for (unsigned i = 0; i < ARRAY_SIZE(tests); ++i) {
uint64_t test = tests[i].size;
uint8_t expected = tests[i].encoded;
uint8_t actual = region_width(test);

assert(expected == actual);
}
}

Alyssa Rosenzweig (3):
  drm/panfrost: Simplify lock_region calculation
  drm/panfrost: Use u64 for size in lock_region
  drm/panfrost: Clamp lock region to Bifrost minimum

 drivers/gpu/drm/panfrost/panfrost_mmu.c  | 31 +---
 drivers/gpu/drm/panfrost/panfrost_regs.h |  2 ++
 2 files changed, 13 insertions(+), 20 deletions(-)

-- 
2.30.2



Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device

2021-08-20 Thread Lyude Paul
On Fri, 2021-08-20 at 11:20 +, Lin, Wayne wrote:
> [Public]
> 
> > -Original Message-
> > From: Lyude Paul 
> > Sent: Thursday, August 19, 2021 2:59 AM
> > To: Lin, Wayne ; dri-devel@lists.freedesktop.org
> > Cc: Kazlauskas, Nicholas ; Wentland, Harry <
> > harry.wentl...@amd.com>; Zuo, Jerry
> > ; Wu, Hersen ; Juston Li <
> > juston...@intel.com>; Imre Deak ;
> > Ville Syrjälä ; Daniel Vetter <
> > daniel.vet...@ffwll.ch>; Sean Paul ; Maarten Lankhorst
> > ; Maxime Ripard ;
> > Thomas Zimmermann ;
> > David Airlie ; Daniel Vetter ; Deucher,
> > Alexander ; Siqueira,
> > Rodrigo ; Pillai, Aurabindo <
> > aurabindo.pil...@amd.com>; Eryk Brol ; Bas
> > Nieuwenhuizen ; Cornij, Nikola <
> > nikola.cor...@amd.com>; Jani Nikula ; Manasi
> > Navare ; Ankit Nautiyal <
> > ankit.k.nauti...@intel.com>; José Roberto de Souza ;
> > Sean Paul ; Ben Skeggs ; 
> > sta...@vger.kernel.org
> > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected
> > end device
> > 
> > On Wed, 2021-08-11 at 09:49 +, Lin, Wayne wrote:
> > > [Public]
> > > 
> > > > -Original Message-
> > > > From: Lyude Paul 
> > > > Sent: Wednesday, August 11, 2021 4:45 AM
> > > > To: Lin, Wayne ; dri-devel@lists.freedesktop.org
> > > > Cc: Kazlauskas, Nicholas ; Wentland,
> > > > Harry < harry.wentl...@amd.com>; Zuo, Jerry ; Wu,
> > > > Hersen ; Juston Li < juston...@intel.com>; Imre
> > > > Deak ; Ville Syrjälä
> > > > ; Daniel Vetter <
> > > > daniel.vet...@ffwll.ch>; Sean Paul ; Maarten
> > > > Lankhorst ; Maxime Ripard
> > > > ; Thomas Zimmermann ; David
> > > > Airlie ; Daniel Vetter ; Deucher,
> > > > Alexander ; Siqueira, Rodrigo
> > > > ; Pillai, Aurabindo <
> > > > aurabindo.pil...@amd.com>; Eryk Brol ; Bas
> > > > Nieuwenhuizen ; Cornij, Nikola <
> > > > nikola.cor...@amd.com>; Jani Nikula ; Manasi
> > > > Navare ; Ankit Nautiyal <
> > > > ankit.k.nauti...@intel.com>; José Roberto de Souza
> > > > ; Sean Paul ; Ben
> > > > Skeggs ; sta...@vger.kernel.org
> > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > connected end device
> > > > 
> > > > On Wed, 2021-08-04 at 07:13 +, Lin, Wayne wrote:
> > > > > [Public]
> > > > > 
> > > > > > -Original Message-
> > > > > > From: Lyude Paul 
> > > > > > Sent: Wednesday, August 4, 2021 8:09 AM
> > > > > > To: Lin, Wayne ;
> > > > > > dri-devel@lists.freedesktop.org
> > > > > > Cc: Kazlauskas, Nicholas ;
> > > > > > Wentland, Harry < harry.wentl...@amd.com>; Zuo, Jerry
> > > > > > ; Wu, Hersen ; Juston Li
> > > > > > < juston...@intel.com>; Imre Deak ; Ville
> > > > > > Syrjälä ; Wentland, Harry <
> > > > > > harry.wentl...@amd.com>; Daniel Vetter ;
> > > > > > Sean Paul ; Maarten Lankhorst <
> > > > > > maarten.lankho...@linux.intel.com>; Maxime Ripard
> > > > > > ; Thomas Zimmermann ;
> > > > > > David Airlie ; Daniel Vetter
> > > > > > ; Deucher, Alexander
> > > > > > ; Siqueira, Rodrigo <
> > > > > > rodrigo.sique...@amd.com>; Pillai, Aurabindo
> > > > > > ; Eryk Brol ; Bas
> > > > > > Nieuwenhuizen ; Cornij, Nikola
> > > > > > ; Jani Nikula ;
> > > > > > Manasi Navare ; Ankit Nautiyal
> > > > > > ; José Roberto de Souza
> > > > > > ; Sean Paul ; Ben
> > > > > > Skeggs ; sta...@vger.kernel.org
> > > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > > > connected end device
> > > > > > 
> > > > > > On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote:
> > > > > > > On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> > > > > > > > [Why]
> > > > > > > > Currently, we will create connectors for all output ports no
> > > > > > > > matter it's connected or not. However, in MST, we can only
> > > > > > > > determine whether an output port really stands for a
> > > > > > > > "connector"
> > > > > > > > till it is connected and check its peer device type as an
> > > > > > > > end device.
> > > > > > > 
> > > > > > > What is this commit trying to solve exactly? e.g. is AMD
> > > > > > > currently running into issues with there being too many DRM
> > > > > > > connectors or something like that?
> > > > > > > Ideally this is behavior I'd very much like us to keep as-is
> > > > > > > unless there's good reason to change it.
> > > > > Hi Lyude,
> > > > > Really appreciate for your time to elaborate in such detail. Thanks!
> > > > > 
> > > > > I come up with this commit because I observed something confusing
> > > > > when I was analyzing MST connectors' life cycle. Take the topology
> > > > > instance you mentioned below
> > > > > 
> > > > > Root MSTB -> Output_Port 1 -> MSTB 1.1 ->Output_Port 1(Connected
> > > > > w/
> > > > > display)
> > > > >     |
> > > > > -
> > > > > > Output_Port 2 (Disconnected)
> > > > >     -> Output_Port 2 -> MSTB 2.1 ->Output_Port 1
> > > > > (Disconnected)
> > > > > 
> > > > > -> Output_Port 2 (Disconnected) Which is exactly the topology of
> > > > > Startech DP 1-to-4 hub. There are 3 1-to-2 branch chips within
> > > > > this hub. With our MST imple

[PATCH 3/8] drm/etnaviv: stop abusing mmu_context as FE running marker

2021-08-20 Thread Lucas Stach
While the DMA frontend can only be active when the MMU context is set, the
reverse isn't necessarily true, as the frontend can be stopped while the
MMU state is kept. Stop treating mmu_context being set as a indication that
the frontend is running and instead add a explicit property.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 10 --
 drivers/gpu/drm/etnaviv/etnaviv_gpu.h |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index c1b9c5cbed11..325858cfc2c3 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -569,6 +569,8 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu)
/* We rely on the GPU running, so program the clock */
etnaviv_gpu_update_clock(gpu);
 
+   gpu->fe_running = false;
+
return 0;
 }
 
@@ -631,6 +633,8 @@ void etnaviv_gpu_start_fe(struct etnaviv_gpu *gpu, u32 
address, u16 prefetch)
  VIVS_MMUv2_SEC_COMMAND_CONTROL_ENABLE |
  VIVS_MMUv2_SEC_COMMAND_CONTROL_PREFETCH(prefetch));
}
+
+   gpu->fe_running = true;
 }
 
 static void etnaviv_gpu_start_fe_idleloop(struct etnaviv_gpu *gpu)
@@ -1364,7 +1368,7 @@ struct dma_fence *etnaviv_gpu_submit(struct 
etnaviv_gem_submit *submit)
goto out_unlock;
}
 
-   if (!gpu->mmu_context) {
+   if (!gpu->fe_running) {
gpu->mmu_context = 
etnaviv_iommu_context_get(submit->mmu_context);
etnaviv_gpu_start_fe_idleloop(gpu);
} else {
@@ -1573,7 +1577,7 @@ int etnaviv_gpu_wait_idle(struct etnaviv_gpu *gpu, 
unsigned int timeout_ms)
 
 static int etnaviv_gpu_hw_suspend(struct etnaviv_gpu *gpu)
 {
-   if (gpu->initialized && gpu->mmu_context) {
+   if (gpu->initialized && gpu->fe_running) {
/* Replace the last WAIT with END */
mutex_lock(&gpu->lock);
etnaviv_buffer_end(gpu);
@@ -1588,6 +1592,8 @@ static int etnaviv_gpu_hw_suspend(struct etnaviv_gpu *gpu)
 
etnaviv_iommu_context_put(gpu->mmu_context);
gpu->mmu_context = NULL;
+
+   gpu->fe_running = false;
}
 
gpu->exec_state = -1;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h 
b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
index 8ea48697d132..1c75c8ed5bce 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
@@ -101,6 +101,7 @@ struct etnaviv_gpu {
struct workqueue_struct *wq;
struct drm_gpu_scheduler sched;
bool initialized;
+   bool fe_running;
 
/* 'ring'-buffer: */
struct etnaviv_cmdbuf buffer;
-- 
2.30.2



[PATCH 4/8] drm/etnaviv: keep MMU context across runtime suspend/resume

2021-08-20 Thread Lucas Stach
The MMU state may be kept across a runtime suspend/resume cycle, as we
avoid a full hardware reset to keep the latency of the runtime PM small.

Don't pretend that the MMU state is lost in driver state. The MMU
context is pushed out when new HW jobs with a different context are
coming in. The only exception to this is when the GPU is unbound, in
which case we need to make sure to also free the last active context.

Cc: sta...@vger.kernel.org # 5.4
Reported-by: Michael Walle 
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 325858cfc2c3..973843c35fca 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1590,9 +1590,6 @@ static int etnaviv_gpu_hw_suspend(struct etnaviv_gpu *gpu)
 */
etnaviv_gpu_wait_idle(gpu, 100);
 
-   etnaviv_iommu_context_put(gpu->mmu_context);
-   gpu->mmu_context = NULL;
-
gpu->fe_running = false;
}
 
@@ -1741,6 +1738,9 @@ static void etnaviv_gpu_unbind(struct device *dev, struct 
device *master,
etnaviv_gpu_hw_suspend(gpu);
 #endif
 
+   if (gpu->mmu_context)
+   etnaviv_iommu_context_put(gpu->mmu_context);
+
if (gpu->initialized) {
etnaviv_cmdbuf_free(&gpu->buffer);
etnaviv_iommu_global_fini(gpu);
-- 
2.30.2



[PATCH 5/8] drm/etnaviv: exec and MMU state is lost when resetting the GPU

2021-08-20 Thread Lucas Stach
When the GPU is reset both the current exec state, as well as all MMU
state is lost. Move the driver side state tracking into the reset function
to keep hardware and software state from diverging.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 973843c35fca..9c710924df6b 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -570,6 +570,8 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu)
etnaviv_gpu_update_clock(gpu);
 
gpu->fe_running = false;
+   gpu->exec_state = -1;
+   gpu->mmu_context = NULL;
 
return 0;
 }
@@ -830,7 +832,6 @@ int etnaviv_gpu_init(struct etnaviv_gpu *gpu)
/* Now program the hardware */
mutex_lock(&gpu->lock);
etnaviv_gpu_hw_init(gpu);
-   gpu->exec_state = -1;
mutex_unlock(&gpu->lock);
 
pm_runtime_mark_last_busy(gpu->dev);
@@ -1055,8 +1056,6 @@ void etnaviv_gpu_recover_hang(struct etnaviv_gpu *gpu)
spin_unlock(&gpu->event_spinlock);
 
etnaviv_gpu_hw_init(gpu);
-   gpu->exec_state = -1;
-   gpu->mmu_context = NULL;
 
mutex_unlock(&gpu->lock);
pm_runtime_mark_last_busy(gpu->dev);
-- 
2.30.2



[PATCH 6/8] drm/etnaviv: fix MMU context leak on GPU reset

2021-08-20 Thread Lucas Stach
After a reset the GPU is no longer using the MMU context and may be
restarted with a different context. While the mmu_state property was
cleared, the context wasn't unreferenced, leading to a memory leak.

Cc: sta...@vger.kernel.org # 5.4
Reported-by: Michael Walle 
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 9c710924df6b..f420c4f14657 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -571,6 +571,8 @@ static int etnaviv_hw_reset(struct etnaviv_gpu *gpu)
 
gpu->fe_running = false;
gpu->exec_state = -1;
+   if (gpu->mmu_context)
+   etnaviv_iommu_context_put(gpu->mmu_context);
gpu->mmu_context = NULL;
 
return 0;
-- 
2.30.2



[PATCH 7/8] drm/etnaviv: reference MMU context when setting up hardware state

2021-08-20 Thread Lucas Stach
Move the refcount manipulation of the MMU context to the point where the
hardware state is programmed. At that point it is also known if a previous
MMU state is still there, or the state needs to be reprogrammed with a
potentially different context.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c  | 24 +++---
 drivers/gpu/drm/etnaviv/etnaviv_iommu.c|  4 
 drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c |  8 
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index f420c4f14657..1fa98ce870f7 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -641,17 +641,19 @@ void etnaviv_gpu_start_fe(struct etnaviv_gpu *gpu, u32 
address, u16 prefetch)
gpu->fe_running = true;
 }
 
-static void etnaviv_gpu_start_fe_idleloop(struct etnaviv_gpu *gpu)
+static void etnaviv_gpu_start_fe_idleloop(struct etnaviv_gpu *gpu,
+ struct etnaviv_iommu_context *context)
 {
-   u32 address = etnaviv_cmdbuf_get_va(&gpu->buffer,
-   &gpu->mmu_context->cmdbuf_mapping);
u16 prefetch;
+   u32 address;
 
/* setup the MMU */
-   etnaviv_iommu_restore(gpu, gpu->mmu_context);
+   etnaviv_iommu_restore(gpu, context);
 
/* Start command processor */
prefetch = etnaviv_buffer_init(gpu);
+   address = etnaviv_cmdbuf_get_va(&gpu->buffer,
+   &gpu->mmu_context->cmdbuf_mapping);
 
etnaviv_gpu_start_fe(gpu, address, prefetch);
 }
@@ -1369,14 +1371,12 @@ struct dma_fence *etnaviv_gpu_submit(struct 
etnaviv_gem_submit *submit)
goto out_unlock;
}
 
-   if (!gpu->fe_running) {
-   gpu->mmu_context = 
etnaviv_iommu_context_get(submit->mmu_context);
-   etnaviv_gpu_start_fe_idleloop(gpu);
-   } else {
-   if (submit->prev_mmu_context)
-   etnaviv_iommu_context_put(submit->prev_mmu_context);
-   submit->prev_mmu_context = 
etnaviv_iommu_context_get(gpu->mmu_context);
-   }
+   if (!gpu->fe_running)
+   etnaviv_gpu_start_fe_idleloop(gpu, submit->mmu_context);
+
+   if (submit->prev_mmu_context)
+   etnaviv_iommu_context_put(submit->prev_mmu_context);
+   submit->prev_mmu_context = etnaviv_iommu_context_get(gpu->mmu_context);
 
if (submit->nr_pmrs) {
gpu->event[event[1]].sync_point = 
&sync_point_perfmon_sample_pre;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_iommu.c 
b/drivers/gpu/drm/etnaviv/etnaviv_iommu.c
index 1a7c89a67bea..afe5dd6a9925 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_iommu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_iommu.c
@@ -92,6 +92,10 @@ static void etnaviv_iommuv1_restore(struct etnaviv_gpu *gpu,
struct etnaviv_iommuv1_context *v1_context = to_v1_context(context);
u32 pgtable;
 
+   if (gpu->mmu_context)
+   etnaviv_iommu_context_put(gpu->mmu_context);
+   gpu->mmu_context = etnaviv_iommu_context_get(context);
+
/* set base addresses */
gpu_write(gpu, VIVS_MC_MEMORY_BASE_ADDR_RA, 
context->global->memory_base);
gpu_write(gpu, VIVS_MC_MEMORY_BASE_ADDR_FE, 
context->global->memory_base);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c 
b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
index f8bf488e9d71..d664ae29ae20 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
@@ -172,6 +172,10 @@ static void etnaviv_iommuv2_restore_nonsec(struct 
etnaviv_gpu *gpu,
if (gpu_read(gpu, VIVS_MMUv2_CONTROL) & VIVS_MMUv2_CONTROL_ENABLE)
return;
 
+   if (gpu->mmu_context)
+   etnaviv_iommu_context_put(gpu->mmu_context);
+   gpu->mmu_context = etnaviv_iommu_context_get(context);
+
prefetch = etnaviv_buffer_config_mmuv2(gpu,
(u32)v2_context->mtlb_dma,
(u32)context->global->bad_page_dma);
@@ -192,6 +196,10 @@ static void etnaviv_iommuv2_restore_sec(struct etnaviv_gpu 
*gpu,
if (gpu_read(gpu, VIVS_MMUv2_SEC_CONTROL) & 
VIVS_MMUv2_SEC_CONTROL_ENABLE)
return;
 
+   if (gpu->mmu_context)
+   etnaviv_iommu_context_put(gpu->mmu_context);
+   gpu->mmu_context = etnaviv_iommu_context_get(context);
+
gpu_write(gpu, VIVS_MMUv2_PTA_ADDRESS_LOW,
  lower_32_bits(context->global->v2.pta_dma));
gpu_write(gpu, VIVS_MMUv2_PTA_ADDRESS_HIGH,
-- 
2.30.2



[PATCH 8/8] drm/etnaviv: add missing MMU context put when reaping MMU mapping

2021-08-20 Thread Lucas Stach
When we forcefully evict a mapping from the address space and thus the
MMU context, the MMU context is leaked, as the mapping no longer points to
it, so it doesn't get freed when the GEM object is destroyed. Add the
missing context put to fix the leak.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c 
b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
index dab1b58006d8..9fb1a2aadbcb 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
@@ -199,6 +199,7 @@ static int etnaviv_iommu_find_iova(struct 
etnaviv_iommu_context *context,
 */
list_for_each_entry_safe(m, n, &list, scan_node) {
etnaviv_iommu_remove_mapping(context, m);
+   etnaviv_iommu_context_put(m->context);
m->context = NULL;
list_del_init(&m->mmu_node);
list_del_init(&m->scan_node);
-- 
2.30.2



[PATCH 2/8] drm/etnaviv: put submit prev MMU context when it exists

2021-08-20 Thread Lucas Stach
The prev context is the MMU context at the time of the job
queueing in hardware. As a job might be queued multiple times
due to recovery after a GPU hang, we need to make sure to put
the stale prev MMU context from a prior queuing, to avoid the
reference and thus the MMU context leaking.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index c8b9b0cc4442..c1b9c5cbed11 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1368,6 +1368,8 @@ struct dma_fence *etnaviv_gpu_submit(struct 
etnaviv_gem_submit *submit)
gpu->mmu_context = 
etnaviv_iommu_context_get(submit->mmu_context);
etnaviv_gpu_start_fe_idleloop(gpu);
} else {
+   if (submit->prev_mmu_context)
+   etnaviv_iommu_context_put(submit->prev_mmu_context);
submit->prev_mmu_context = 
etnaviv_iommu_context_get(gpu->mmu_context);
}
 
-- 
2.30.2



[PATCH 1/8] drm/etnaviv: return context from etnaviv_iommu_context_get

2021-08-20 Thread Lucas Stach
Being able to have the refcount manipulation in an assignment makes
it much easier to parse the code.

Cc: sta...@vger.kernel.org # 5.4
Signed-off-by: Lucas Stach 
Tested-by: Michael Walle 
---
 drivers/gpu/drm/etnaviv/etnaviv_buffer.c | 3 +--
 drivers/gpu/drm/etnaviv/etnaviv_gem.c| 3 +--
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 3 +--
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c| 6 ++
 drivers/gpu/drm/etnaviv/etnaviv_mmu.h| 4 +++-
 5 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_buffer.c 
b/drivers/gpu/drm/etnaviv/etnaviv_buffer.c
index 76d38561c910..cf741c5c82d2 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_buffer.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_buffer.c
@@ -397,8 +397,7 @@ void etnaviv_buffer_queue(struct etnaviv_gpu *gpu, u32 
exec_state,
if (switch_mmu_context) {
struct etnaviv_iommu_context *old_context = 
gpu->mmu_context;
 
-   etnaviv_iommu_context_get(mmu_context);
-   gpu->mmu_context = mmu_context;
+   gpu->mmu_context = 
etnaviv_iommu_context_get(mmu_context);
etnaviv_iommu_context_put(old_context);
}
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index b8fa6ed3dd73..fb7a33b88fc0 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -303,8 +303,7 @@ struct etnaviv_vram_mapping *etnaviv_gem_mapping_get(
list_del(&mapping->obj_node);
}
 
-   etnaviv_iommu_context_get(mmu_context);
-   mapping->context = mmu_context;
+   mapping->context = etnaviv_iommu_context_get(mmu_context);
mapping->use = 1;
 
ret = etnaviv_iommu_map_gem(mmu_context, etnaviv_obj,
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 4dd7d9d541c0..486259e154af 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -532,8 +532,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void 
*data,
goto err_submit_objects;
 
submit->ctx = file->driver_priv;
-   etnaviv_iommu_context_get(submit->ctx->mmu);
-   submit->mmu_context = submit->ctx->mmu;
+   submit->mmu_context = etnaviv_iommu_context_get(submit->ctx->mmu);
submit->exec_state = args->exec_state;
submit->flags = args->flags;
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 4102bcea3341..c8b9b0cc4442 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1365,12 +1365,10 @@ struct dma_fence *etnaviv_gpu_submit(struct 
etnaviv_gem_submit *submit)
}
 
if (!gpu->mmu_context) {
-   etnaviv_iommu_context_get(submit->mmu_context);
-   gpu->mmu_context = submit->mmu_context;
+   gpu->mmu_context = 
etnaviv_iommu_context_get(submit->mmu_context);
etnaviv_gpu_start_fe_idleloop(gpu);
} else {
-   etnaviv_iommu_context_get(gpu->mmu_context);
-   submit->prev_mmu_context = gpu->mmu_context;
+   submit->prev_mmu_context = 
etnaviv_iommu_context_get(gpu->mmu_context);
}
 
if (submit->nr_pmrs) {
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.h 
b/drivers/gpu/drm/etnaviv/etnaviv_mmu.h
index d1d6902fd13b..e4a0b7d09c2e 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.h
@@ -105,9 +105,11 @@ void etnaviv_iommu_dump(struct etnaviv_iommu_context *ctx, 
void *buf);
 struct etnaviv_iommu_context *
 etnaviv_iommu_context_init(struct etnaviv_iommu_global *global,
   struct etnaviv_cmdbuf_suballoc *suballoc);
-static inline void etnaviv_iommu_context_get(struct etnaviv_iommu_context *ctx)
+static inline struct etnaviv_iommu_context *
+etnaviv_iommu_context_get(struct etnaviv_iommu_context *ctx)
 {
kref_get(&ctx->refcount);
+   return ctx;
 }
 void etnaviv_iommu_context_put(struct etnaviv_iommu_context *ctx);
 void etnaviv_iommu_restore(struct etnaviv_gpu *gpu,
-- 
2.30.2



Re: [git pull] drm fixes for 5.14-rc7

2021-08-20 Thread pr-tracker-bot
The pull request you sent on Fri, 20 Aug 2021 15:36:29 +1000:

> git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2021-08-20-3

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/8ba9fbe1e4b8a28050c283792344ee8b6bc3465c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: refactor the i915 GVT support

2021-08-20 Thread Luis Chamberlain
On Fri, Aug 20, 2021 at 04:17:24PM +0200, Christoph Hellwig wrote:
> On Thu, Aug 19, 2021 at 04:29:29PM +0800, Zhenyu Wang wrote:
> > I'm working on below patch to resolve this. But I met a weird issue in
> > case when building i915 as module and also kvmgt module, it caused
> > busy wait on request_module("kvmgt") when boot, it doesn't happen if
> > building i915 into kernel. I'm not sure what could be the reason?
> 
> Luis, do you know if there is a problem with a request_module from
> a driver ->probe routine that is probably called by a module_init
> function itself?

Generally no, but you can easily shoot yourself in the foot by creating
cross dependencies and not dealing with them properly. I'd make sure
to keep module initialization as simple as possible, and run whatever
takes more time asynchronously, then use a state machine to allow
you to verify where you are in the initialization phase or query it
or wait for a completion with a timeout.

It seems the code in question is getting some spring cleaning, and its
unclear where the code is I can inspect. If there's a tree somewhere I
can take a peek I'd be happy to review possible oddities that may stick
out.

My goto model for these sorts of problems is to abstract the issue
*outside* of the driver in question and implement new selftests to
try to reproduce. This serves two purposes, 1) helps with testing
2) may allow you to see the problem more clearly.

  Luis


Re: [PATCH 07/27] Revert "drm/i915/gt: Propagate change in error status to children on unhold"

2021-08-20 Thread Jason Ekstrand
On Thu, Aug 19, 2021 at 1:22 AM Matthew Brost  wrote:
>
> Propagating errors to dependent fences is wrong, don't do it. A selftest
> in the following exposed the propagating of an error to a dependent
> fence after an engine reset.

I feel like we could still have a bit of a better message.  Maybe
something like this:

Propagating errors to dependent fences is broken and can lead to
errors from one client ending up in another.  In 3761baae908a (Revert
"drm/i915: Propagate errors on awaiting already signaled fences"), we
attempted to get rid of fence error propagation but missed the case
added in 8e9f84cf5cac ("drm/i915/gt: Propagate change in error status
to children on unhold").  Revert that one too.  This error was found
by an up-and-coming selftest which .

Otherwise, looks good to me.

--Jason

>
> This reverts commit 8e9f84cf5cac248a1c6a5daa4942879c8b765058.
>
> v2:
>  (Daniel Vetter)
>   - Use revert
>
> References: 3761baae908a (Revert "drm/i915: Propagate errors on awaiting 
> already signaled fences")
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index de5f9c86b9a4..cafb0608ffb4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -2140,10 +2140,6 @@ static void __execlists_unhold(struct i915_request *rq)
> if (p->flags & I915_DEPENDENCY_WEAK)
> continue;
>
> -   /* Propagate any change in error status */
> -   if (rq->fence.error)
> -   i915_request_set_error_once(w, 
> rq->fence.error);
> -
> if (w->engine != rq->engine)
> continue;
>
> --
> 2.32.0
>


Re: [syzbot] WARNING in drm_gem_shmem_vm_open

2021-08-20 Thread Daniel Vetter
On Fri, Aug 20, 2021 at 9:23 PM Thomas Zimmermann  wrote:
> Hi
>
> Am 20.08.21 um 17:45 schrieb syzbot:
> > syzbot has bisected this issue to:
>
> Good bot!
>
> >
> > commit ea40d7857d5250e5400f38c69ef9e17321e9c4a2
> > Author: Daniel Vetter 
> > Date:   Fri Oct 9 23:21:56 2020 +
> >
> >  drm/vkms: fbdev emulation support
>
> Here's a guess.
>
> GEM SHMEM + fbdev emulation requires that
> (drm_mode_config.prefer_shadow_fbdev = true). Otherwise, deferred I/O
> and SHMEM conflict over the use of page flags IIRC.

But we should only set up defio if fb->dirty is set, which vkms
doesn't do. So there's something else going on? So there must be
something else funny going on here I think ... No idea what's going on
really.
-Daniel

>  From a quick grep, vkms doesn't set prefer_shadow_fbdev and an alarming
> amount of SHMEM-based drivers don't do either.
>
> Best regards
> Thomas
>
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=11c31d5530
> > start commit:   614cb2751d31 Merge tag 'trace-v5.14-rc6' of git://git.kern..
> > git tree:   upstream
> > final oops: https://syzkaller.appspot.com/x/report.txt?x=13c31d5530
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15c31d5530
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=96f0602203250753
> > dashboard link: https://syzkaller.appspot.com/bug?extid=91525b2bd4b5dff71619
> > syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=122bce0e30
> >
> > Reported-by: syzbot+91525b2bd4b5dff71...@syzkaller.appspotmail.com
> > Fixes: ea40d7857d52 ("drm/vkms: fbdev emulation support")
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [syzbot] WARNING in drm_gem_shmem_vm_open

2021-08-20 Thread Thomas Zimmermann

Hi

Am 20.08.21 um 17:45 schrieb syzbot:

syzbot has bisected this issue to:


Good bot!



commit ea40d7857d5250e5400f38c69ef9e17321e9c4a2
Author: Daniel Vetter 
Date:   Fri Oct 9 23:21:56 2020 +

 drm/vkms: fbdev emulation support


Here's a guess.

GEM SHMEM + fbdev emulation requires that 
(drm_mode_config.prefer_shadow_fbdev = true). Otherwise, deferred I/O 
and SHMEM conflict over the use of page flags IIRC.


From a quick grep, vkms doesn't set prefer_shadow_fbdev and an alarming 
amount of SHMEM-based drivers don't do either.


Best regards
Thomas



bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=11c31d5530
start commit:   614cb2751d31 Merge tag 'trace-v5.14-rc6' of git://git.kern..
git tree:   upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=13c31d5530
console output: https://syzkaller.appspot.com/x/log.txt?x=15c31d5530
kernel config:  https://syzkaller.appspot.com/x/.config?x=96f0602203250753
dashboard link: https://syzkaller.appspot.com/bug?extid=91525b2bd4b5dff71619
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=122bce0e30

Reported-by: syzbot+91525b2bd4b5dff71...@syzkaller.appspotmail.com
Fixes: ea40d7857d52 ("drm/vkms: fbdev emulation support")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer



OpenPGP_signature
Description: OpenPGP digital signature


Re: [Intel-gfx] [PATCH 10/27] drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered

2021-08-20 Thread Matthew Brost
On Fri, Aug 20, 2021 at 11:42:38AM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 8/18/2021 11:16 PM, Matthew Brost wrote:
> > When unblocking a context, do not enable scheduling if the context is
> > banned, guc_id invalid, or not registered.
> > 
> > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
> > Signed-off-by: Matthew Brost 
> > Cc: 
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index d61f906105ef..e53a4ef7d442 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1586,6 +1586,9 @@ static void guc_context_unblock(struct intel_context 
> > *ce)
> > spin_lock_irqsave(&ce->guc_state.lock, flags);
> > if (unlikely(submission_disabled(guc) ||
> > +intel_context_is_banned(ce) ||
> > +context_guc_id_invalid(ce) ||
> > +!lrc_desc_registered(guc, ce->guc_id) ||
> >  !intel_context_is_pinned(ce) ||
> >  context_pending_disable(ce) ||
> >  context_blocked(ce) > 1)) {
> 
> This is getting to a lot of conditions. Maybe we can simplify it a bit? E.g

Yea, this is some defensive programming to cover all the bases if another
async operation (ban, reset, unpin) happens when this op is in flight.
Probably some of the conditions are not needed but being extra safe
here.

> it should be possible to check context_blocked, context_banned and
> context_pending_disable as a single op:
> 
> #define SCHED_STATE_MULTI_BLOCKED_MASK \  /* 2 or more blocks */
>     (SCHED_STATE_BLOCKED_MASK & ~SCHED_STATE_BLOCKED)
> #define SCHED_STATE_NO_UNBLOCK \
>     SCHED_STATE_MULTI_BLOCKED_MASK | \
>     SCHED_STATE_PENDING_DISABLE | \
>     SCHED_STATE_BANNED

Good idea, let me move this to helper in the next spin.

Matt

> 
> Not a blocker.
> 
> Reviewed-by: Daniele Ceraolo Spurio 
> 
> Daniele
> 
> 


Re: [Intel-gfx] [PATCH 10/27] drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered

2021-08-20 Thread Daniele Ceraolo Spurio




On 8/18/2021 11:16 PM, Matthew Brost wrote:

When unblocking a context, do not enable scheduling if the context is
banned, guc_id invalid, or not registered.

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Matthew Brost 
Cc: 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d61f906105ef..e53a4ef7d442 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1586,6 +1586,9 @@ static void guc_context_unblock(struct intel_context *ce)
spin_lock_irqsave(&ce->guc_state.lock, flags);
  
  	if (unlikely(submission_disabled(guc) ||

+intel_context_is_banned(ce) ||
+context_guc_id_invalid(ce) ||
+!lrc_desc_registered(guc, ce->guc_id) ||
 !intel_context_is_pinned(ce) ||
 context_pending_disable(ce) ||
 context_blocked(ce) > 1)) {


This is getting to a lot of conditions. Maybe we can simplify it a bit? 
E.g it should be possible to check context_blocked, context_banned and 
context_pending_disable as a single op:


#define SCHED_STATE_MULTI_BLOCKED_MASK \  /* 2 or more blocks */
    (SCHED_STATE_BLOCKED_MASK & ~SCHED_STATE_BLOCKED)
#define SCHED_STATE_NO_UNBLOCK \
    SCHED_STATE_MULTI_BLOCKED_MASK | \
    SCHED_STATE_PENDING_DISABLE | \
    SCHED_STATE_BANNED

Not a blocker.

Reviewed-by: Daniele Ceraolo Spurio 

Daniele




Re: [Intel-gfx] [PATCH 09/27] drm/i915/guc: Kick tasklet after queuing a request

2021-08-20 Thread Matthew Brost
On Fri, Aug 20, 2021 at 11:31:56AM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 8/18/2021 11:16 PM, Matthew Brost wrote:
> > Kick tasklet after queuing a request so it is submitted in a timely manner.
> > 
> > Fixes: 3a4cdf1982f0 ("drm/i915/guc: Implement GuC context operations for 
> > new inteface")
> 
> Is this actually a bug or just a performance issue? in the latter case I
> don't think we need a fixes tag.
> 

Basically the tasklet won't get queued in certain situations until the
heartbeat ping. Didn't notice it as the tasklet is only used during flow
control or after a full GT reset which both are rather rare. We can
probably drop the fixes tag as GuC submission isn't on by default and it
still works without this fix.

> > Signed-off-by: Matthew Brost 
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 8f7a11e65ef5..d61f906105ef 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1050,6 +1050,7 @@ static inline void queue_request(struct 
> > i915_sched_engine *sched_engine,
> > list_add_tail(&rq->sched.link,
> >   i915_sched_lookup_priolist(sched_engine, prio));
> > set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +   tasklet_hi_schedule(&sched_engine->tasklet);
> 
> the caller of queue_request() already has a tasklet_hi_schedule in another
> branch of the if/else statement. Maybe we can have the caller own the kick
> to keep it in one place? Not a blocker.
>

I guess it could be:

bool kick = need_taklet()

if (kick)
queue_requst()
else
kick = bypass()
if (kick)
kick_tasklet()

Idk, it that is better? I'll think on this and decide before the next
post.

Matt

> Reviewed-by: Daniele Ceraolo Spurio 
> 
> Daniele
> 
> >   }
> >   static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> 


Re: [Intel-gfx] [PATCH 09/27] drm/i915/guc: Kick tasklet after queuing a request

2021-08-20 Thread Daniele Ceraolo Spurio




On 8/18/2021 11:16 PM, Matthew Brost wrote:

Kick tasklet after queuing a request so it is submitted in a timely manner.

Fixes: 3a4cdf1982f0 ("drm/i915/guc: Implement GuC context operations for new 
inteface")


Is this actually a bug or just a performance issue? in the latter case I 
don't think we need a fixes tag.



Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 8f7a11e65ef5..d61f906105ef 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1050,6 +1050,7 @@ static inline void queue_request(struct i915_sched_engine 
*sched_engine,
list_add_tail(&rq->sched.link,
  i915_sched_lookup_priolist(sched_engine, prio));
set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+   tasklet_hi_schedule(&sched_engine->tasklet);


the caller of queue_request() already has a tasklet_hi_schedule in 
another branch of the if/else statement. Maybe we can have the caller 
own the kick to keep it in one place? Not a blocker.


Reviewed-by: Daniele Ceraolo Spurio 

Daniele


  }
  
  static int guc_bypass_tasklet_submit(struct intel_guc *guc,




Re: [PATCH v3 1/3] dt-bindings: Add YAML bindings for NVDEC

2021-08-20 Thread Rob Herring
On Wed, Aug 18, 2021 at 8:18 AM Thierry Reding  wrote:
>
> On Wed, Aug 18, 2021 at 11:24:28AM +0300, Mikko Perttunen wrote:
> > On 8/18/21 12:20 AM, Rob Herring wrote:
> > > On Wed, Aug 11, 2021 at 01:50:28PM +0300, Mikko Perttunen wrote:
> > > > Add YAML device tree bindings for NVDEC, now in a more appropriate
> > > > place compared to the old textual Host1x bindings.
> > > >
> > > > Signed-off-by: Mikko Perttunen 
> > > > ---
> > > > v3:
> > > > * Drop host1x bindings
> > > > * Change read2 to read-1 in interconnect names
> > > > v2:
> > > > * Fix issues pointed out in v1
> > > > * Add T194 nvidia,instance property
> > > > ---
> > > >   .../gpu/host1x/nvidia,tegra210-nvdec.yaml | 109 ++
> > > >   MAINTAINERS   |   1 +
> > > >   2 files changed, 110 insertions(+)
> > > >   create mode 100644 
> > > > Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml
> > > >
> > > > diff --git 
> > > > a/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml
> > > >  
> > > > b/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml
> > > > new file mode 100644
> > > > index ..571849625da8
> > > > --- /dev/null
> > > > +++ 
> > > > b/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml
> > > > @@ -0,0 +1,109 @@
> > > > +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> > > > +%YAML 1.2
> > > > +---
> > > > +$id: 
> > > > "http://devicetree.org/schemas/gpu/host1x/nvidia,tegra210-nvdec.yaml#";
> > > > +$schema: "http://devicetree.org/meta-schemas/core.yaml#";
> > > > +
> > > > +title: Device tree binding for NVIDIA Tegra NVDEC
> > > > +
> > > > +description: |
> > > > +  NVDEC is the hardware video decoder present on NVIDIA Tegra210
> > > > +  and newer chips. It is located on the Host1x bus and typically
> > > > +  programmed through Host1x channels.
> > > > +
> > > > +maintainers:
> > > > +  - Thierry Reding 
> > > > +  - Mikko Perttunen 
> > > > +
> > > > +properties:
> > > > +  $nodename:
> > > > +pattern: "^nvdec@[0-9a-f]*$"
> > > > +
> > > > +  compatible:
> > > > +enum:
> > > > +  - nvidia,tegra210-nvdec
> > > > +  - nvidia,tegra186-nvdec
> > > > +  - nvidia,tegra194-nvdec
> > > > +
> > > > +  reg:
> > > > +maxItems: 1
> > > > +
> > > > +  clocks:
> > > > +maxItems: 1
> > > > +
> > > > +  clock-names:
> > > > +items:
> > > > +  - const: nvdec
> > > > +
> > > > +  resets:
> > > > +maxItems: 1
> > > > +
> > > > +  reset-names:
> > > > +items:
> > > > +  - const: nvdec
> > > > +
> > > > +  power-domains:
> > > > +maxItems: 1
> > > > +
> > > > +  iommus:
> > > > +maxItems: 1
> > > > +
> > > > +  interconnects:
> > > > +items:
> > > > +  - description: DMA read memory client
> > > > +  - description: DMA read 2 memory client
> > > > +  - description: DMA write memory client
> > > > +
> > > > +  interconnect-names:
> > > > +items:
> > > > +  - const: dma-mem
> > > > +  - const: read-1
> > > > +  - const: write
> > > > +
> > > > +required:
> > > > +  - compatible
> > > > +  - reg
> > > > +  - clocks
> > > > +  - clock-names
> > > > +  - resets
> > > > +  - reset-names
> > > > +  - power-domains
> > > > +
> > > > +if:
> > > > +  properties:
> > > > +compatible:
> > > > +  contains:
> > > > +const: nvidia,tegra194-host1x
> > >
> > > host1x? This will never be true as the schema is only applied to nodes
> > > with the nvdec compatible.
> >
> > Argh, it's a typo indeed. Should be nvidia,tegra194-nvdec.
> >
> > >
> > > > +then:
> > > > +  properties:
> > > > +nvidia,instance:
> > > > +  items:
> > > > +- description: 0 for NVDEC0, or 1 for NVDEC1
> > >
> > > What's this for? We generally don't do indices in DT.
> >
> > When programming the hardware through Host1x, we need to know the "class ID"
> > of the hardware, specific to each instance. So we need to know which
> > instance it is. Technically of course we could derive this from the MMIO
> > address but that seems more confusing.
> >
> > >
> > > > +
> > > > +additionalProperties: true
> > >
> > > This should be false or 'unevaluatedProperties: false'
> >
> > I tried that but it resulted in validation failures; please see the
> > discussion in v2.
>
> Rob mentioned that there is now support for unevaluatedProperties in
> dt-schema. I was able to test this, though with only limited success. I
> made the following changes on top of this patch:

Here's a branch that works with current jsonschema master:

https://github.com/robherring/dt-schema/tree/draft2020-12

> --- >8 ---
> diff --git 
> a/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml 
> b/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml
> index d2681c98db7e..0bdf05fc8fc7 100644
> --- a/Documentation/devicetree/bindings/gpu/host1x/nvidia,tegra210-nvdec.yaml
> +++ b/Documentation/devicetree/bindings/gpu/host1x/nvidia,teg

Re: [PATCH] drm/i915: Actually delete gpu reloc selftests

2021-08-20 Thread Daniel Vetter
On Fri, Aug 20, 2021 at 7:00 PM Rodrigo Vivi  wrote:
>
> On Fri, Aug 20, 2021 at 05:49:32PM +0200, Daniel Vetter wrote:
> > In
> >
> > commit 8e02cceb1f1f4f254625e5338dd997ff61ab40d7
> > Author: Daniel Vetter 
> > Date:   Tue Aug 3 14:48:33 2021 +0200
> >
> > drm/i915: delete gpu reloc code
>
> it would be better with dim cite format...
>
> do we need the Fixes: tag?

I did delete the selftest, I just forgot to delete the code. So no
Fixes: imo. I'll bikeshed the commit citation.

> anyway:
>
> Reviewed-by: Rodrigo Vivi 

Thanks for the review, will merge when CI approves too, one never knows.
-Daniel

>
>
> >
> > I deleted the gpu relocation code and the selftest include and
> > enabling, but accidentally forgot about the selftest source code.
> >
> > Fix this oversight.
> >
> > Signed-off-by: Daniel Vetter 
> > Cc: Jon Bloomfield 
> > Cc: Chris Wilson 
> > Cc: Maarten Lankhorst 
> > Cc: Daniel Vetter 
> > Cc: Joonas Lahtinen 
> > Cc: "Thomas Hellström" 
> > Cc: Matthew Auld 
> > Cc: Lionel Landwerlin 
> > Cc: Dave Airlie 
> > Cc: Jason Ekstrand 
> > ---
> >  .../i915/gem/selftests/i915_gem_execbuffer.c  | 190 --
> >  1 file changed, 190 deletions(-)
> >  delete mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
> >
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c 
> > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
> > deleted file mode 100644
> > index 16162fc2782d..
> > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
> > +++ /dev/null
> > @@ -1,190 +0,0 @@
> > -// SPDX-License-Identifier: MIT
> > -/*
> > - * Copyright © 2020 Intel Corporation
> > - */
> > -
> > -#include "i915_selftest.h"
> > -
> > -#include "gt/intel_engine_pm.h"
> > -#include "selftests/igt_flush_test.h"
> > -
> > -static u64 read_reloc(const u32 *map, int x, const u64 mask)
> > -{
> > - u64 reloc;
> > -
> > - memcpy(&reloc, &map[x], sizeof(reloc));
> > - return reloc & mask;
> > -}
> > -
> > -static int __igt_gpu_reloc(struct i915_execbuffer *eb,
> > -struct drm_i915_gem_object *obj)
> > -{
> > - const unsigned int offsets[] = { 8, 3, 0 };
> > - const u64 mask =
> > - GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0);
> > - const u32 *map = page_mask_bits(obj->mm.mapping);
> > - struct i915_request *rq;
> > - struct i915_vma *vma;
> > - int err;
> > - int i;
> > -
> > - vma = i915_vma_instance(obj, eb->context->vm, NULL);
> > - if (IS_ERR(vma))
> > - return PTR_ERR(vma);
> > -
> > - err = i915_gem_object_lock(obj, &eb->ww);
> > - if (err)
> > - return err;
> > -
> > - err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, PIN_USER | PIN_HIGH);
> > - if (err)
> > - return err;
> > -
> > - /* 8-Byte aligned */
> > - err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
> > - if (err <= 0)
> > - goto reloc_err;
> > -
> > - /* !8-Byte aligned */
> > - err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1);
> > - if (err <= 0)
> > - goto reloc_err;
> > -
> > - /* Skip to the end of the cmd page */
> > - i = PAGE_SIZE / sizeof(u32) - 1;
> > - i -= eb->reloc_cache.rq_size;
> > - memset32(eb->reloc_cache.rq_cmd + eb->reloc_cache.rq_size,
> > -  MI_NOOP, i);
> > - eb->reloc_cache.rq_size += i;
> > -
> > - /* Force next batch */
> > - err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2);
> > - if (err <= 0)
> > - goto reloc_err;
> > -
> > - GEM_BUG_ON(!eb->reloc_cache.rq);
> > - rq = i915_request_get(eb->reloc_cache.rq);
> > - reloc_gpu_flush(eb, &eb->reloc_cache);
> > - GEM_BUG_ON(eb->reloc_cache.rq);
> > -
> > - err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
> > - if (err) {
> > - intel_gt_set_wedged(eb->engine->gt);
> > - goto put_rq;
> > - }
> > -
> > - if (!i915_request_completed(rq)) {
> > - pr_err("%s: did not wait for relocations!\n", 
> > eb->engine->name);
> > - err = -EINVAL;
> > - goto put_rq;
> > - }
> > -
> > - for (i = 0; i < ARRAY_SIZE(offsets); i++) {
> > - u64 reloc = read_reloc(map, offsets[i], mask);
> > -
> > - if (reloc != i) {
> > - pr_err("%s[%d]: map[%d] %llx != %x\n",
> > -eb->engine->name, i, offsets[i], reloc, i);
> > - err = -EINVAL;
> > - }
> > - }
> > - if (err)
> > - igt_hexdump(map, 4096);
> > -
> > -put_rq:
> > - i915_request_put(rq);
> > -unpin_vma:
> > - i915_vma_unpin(vma);
> > - return err;
> > -
> > -reloc_err:
> > - if (!err)
> > - err = -EIO;
> > - goto unpin_vma;
> > -}
> > -
> > -static int igt_gpu_reloc(void *arg)
> > -{
> > - struct i915_execbuffer eb;
> > - struct d

[GIT PULL] drm-misc + drm-intel: Add support for out-of-band hotplug notification

2021-08-20 Thread Hans de Goede
Hello drm-misc and drm-intel maintainers,

My "Add support for out-of-band hotplug notification" patchset:
https://patchwork.freedesktop.org/series/93763/

Is ready for merging now, as discussed on IRC I based this series
on top drm-tip and when trying to apply the i915 parts on top
of drm-misc this fails due to conflict.

So as Jani suggested here is a pull-req for a topic-branch with the
entire set, minus the troublesome i915 bits. Once this has been merged
into both drm-misc-next and drm-intel-next I can push the 2 i915
patch do drm-intel-next on top of the merge.

Note there are also 2 drivers/usb/typec patches in here these
have Greg KH's Reviewed-by for merging through the drm tree,
Since this USB code does not change all that much. I also checked
and the drm-misc-next-2021-08-12 base of this tree contains the
same last commit to the modified file as usb-next.

Daniel Vetter mentioned on IRC that it might be better for you to simply
pick-up the series directly from patchwork, that is fine too in that
case don't forget to add:

Reviewed-by: Lyude Paul 

To the entire series (given in a reply to the cover-letter)

And:

Reviewed-by: Greg Kroah-Hartman 

To the usb/typec patches (patch 7/8), this was given in reply
to a previous posting of the series and I forgot to add this
in the resend.

Regards,

Hans


The following changes since commit c7782443a88926a4f938f0193041616328cf2db2:

  drm/bridge: ti-sn65dsi86: Avoid creating multiple connectors (2021-08-12 
09:56:09 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux.git 
drm-misc-intel-oob-hotplug-v1

for you to fetch changes up to 7f811394878535ed9a6849717de8c2959ae38899:

  usb: typec: altmodes/displayport: Notify drm subsys of hotplug events 
(2021-08-20 12:35:59 +0200)


Topic branch for drm-misc / drm-intel for OOB hotplug support for Type-C 
connectors


Hans de Goede (6):
  drm/connector: Give connector sysfs devices there own device_type
  drm/connector: Add a fwnode pointer to drm_connector and register with 
ACPI (v2)
  drm/connector: Add drm_connector_find_by_fwnode() function (v3)
  drm/connector: Add support for out-of-band hotplug notification (v3)
  usb: typec: altmodes/displayport: Make dp_altmode_notify() more generic
  usb: typec: altmodes/displayport: Notify drm subsys of hotplug events

 drivers/gpu/drm/drm_connector.c  | 79 +
 drivers/gpu/drm/drm_crtc_internal.h  |  2 +
 drivers/gpu/drm/drm_sysfs.c  | 87 +++-
 drivers/usb/typec/altmodes/Kconfig   |  1 +
 drivers/usb/typec/altmodes/displayport.c | 58 +
 include/drm/drm_connector.h  | 25 +
 6 files changed, 217 insertions(+), 35 deletions(-)



[GIT PULL] etnaviv-next for 5.15

2021-08-20 Thread Lucas Stach
Hi Dave, Daniel,

things are still slow in etnaviv land. Just one hardware support
addition for the GPU found on the NXP Layerscape LS1028A SoC from
Michael and the GEM mmap cleanup from Thomas.

Regards,
Lucas

The following changes since commit 8a02ea42bc1d4c448caf1bab0e05899dad503f74:

  Merge tag 'drm-intel-next-fixes-2021-06-29' of 
git://anongit.freedesktop.org/drm/drm-intel into drm-next (2021-06-30 15:42:05 
+1000)

are available in the Git repository at:

  https://git.pengutronix.de/git/lst/linux 
81fd23e2b3ccf71c807e671444e8accaba98ca53

for you to fetch changes up to 81fd23e2b3ccf71c807e671444e8accaba98ca53:

  drm/etnaviv: Implement mmap as GEM object function (2021-07-06 18:32:23 +0200)


Michael Walle (2):
  drm/etnaviv: add HWDB entry for GC7000 r6202
  drm/etnaviv: add clock gating workaround for GC7000 r6202

Thomas Zimmermann (1):
  drm/etnaviv: Implement mmap as GEM object function

 drivers/gpu/drm/etnaviv/etnaviv_drv.c   | 14 ++
 drivers/gpu/drm/etnaviv/etnaviv_drv.h   |  3 ---
 drivers/gpu/drm/etnaviv/etnaviv_gem.c   | 18 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 13 -
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c   |  6 ++
 drivers/gpu/drm/etnaviv/etnaviv_hwdb.c  | 31 
+++
 6 files changed, 44 insertions(+), 41 deletions(-)



Re: [PATCH v3] drm/i915/dp: Use max params for panels < eDP 1.4

2021-08-20 Thread Ville Syrjälä
On Fri, Aug 20, 2021 at 03:52:59PM +0800, Kai-Heng Feng wrote:
> Users reported that after commit 2bbd6dba84d4 ("drm/i915: Try to use
> fast+narrow link on eDP again and fall back to the old max strategy on
> failure"), the screen starts to have wobbly effect.
> 
> Commit a5c936add6a2 ("drm/i915/dp: Use slow and wide link training for
> everything") doesn't help either, that means the affected eDP 1.2 panels
> only work with max params.
> 
> So use max params for panels < eDP 1.4 as Windows does to solve the
> issue.
> 
> v3:
>  - Do the eDP rev check in intel_edp_init_dpcd()
> 
> v2:
>  - Check eDP 1.4 instead of DPCD 1.1 to apply max params
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3714
> Fixes: 2bbd6dba84d4 ("drm/i915: Try to use fast+narrow link on eDP again and 
> fall back to the old max strategy on failure")
> Fixes: a5c936add6a2 ("drm/i915/dp: Use slow and wide link training for 
> everything")
> Suggested-by: Ville Syrjälä 
> Signed-off-by: Kai-Heng Feng 

Slapped a cc:stable on it and pushed to drm-intel-next. Thanks.

> ---
>  drivers/gpu/drm/i915/display/intel_dp.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
> b/drivers/gpu/drm/i915/display/intel_dp.c
> index 75d4ebc669411..e0dbd35ae7bc0 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -2445,11 +2445,14 @@ intel_edp_init_dpcd(struct intel_dp *intel_dp)
>*/
>   if (drm_dp_dpcd_read(&intel_dp->aux, DP_EDP_DPCD_REV,
>intel_dp->edp_dpcd, sizeof(intel_dp->edp_dpcd)) ==
> -  sizeof(intel_dp->edp_dpcd))
> +  sizeof(intel_dp->edp_dpcd)) {
>   drm_dbg_kms(&dev_priv->drm, "eDP DPCD: %*ph\n",
>   (int)sizeof(intel_dp->edp_dpcd),
>   intel_dp->edp_dpcd);
>  
> + intel_dp->use_max_params = intel_dp->edp_dpcd[0] < DP_EDP_14;
> + }
> +
>   /*
>* This has to be called after intel_dp->edp_dpcd is filled, PSR checks
>* for SET_POWER_CAPABLE bit in intel_dp->edp_dpcd[1]
> -- 
> 2.32.0

-- 
Ville Syrjälä
Intel


[pull] amdgpu, amdkfd, radeon drm-next-5.15

2021-08-20 Thread Alex Deucher
Hi Dave, Daniel,

Updates for 5.15.  Mainly bug fixes and cleanups.

The following changes since commit 554594567b1fa3da74f88ec7b2dc83d000c58e98:

  drm/display: fix possible null-pointer dereference in dcn10_set_clock() 
(2021-08-11 17:19:54 -0400)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-next-5.15-2021-08-20

for you to fetch changes up to 90a9266269eb9f71af1f323c33e1dca53527bd22:

  drm/amdgpu: Cancel delayed work when GFXOFF is disabled (2021-08-20 12:09:44 
-0400)


amd-drm-next-5.15-2021-08-20:

amdgpu:
- embed hw fence into job
- Misc SMU fixes
- PSP TA code cleanup
- RAS fixes
- PWM fan speed fixes
- DC workqueue cleanups
- SR-IOV fixes
- gfxoff delayed work fix
- Pin domain check fix

amdkfd:
- SVM fixes

radeon:
- Code cleanup


Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.79

Aric Cyr (1):
  drm/amd/display: 3.2.149

Candice Li (3):
  drm/amd/amdgpu: consolidate PSP TA context
  drm/amd/amdgpu: remove unnecessary RAS context field
  drm/amd: consolidate TA shared memory structures

Christian König (1):
  drm/amdgpu: use the preferred pin domain after the check

Colin Ian King (1):
  drm/amd/pm: Fix spelling mistake "firwmare" -> "firmware"

Evan Quan (9):
  drm/amd/pm: correct the fan speed RPM setting
  drm/amd/pm: record the RPM and PWM based fan speed settings
  drm/amd/pm: correct the fan speed PWM retrieving
  drm/amd/pm: correct the fan speed RPM retrieving
  drm/amd/pm: drop the unnecessary intermediate percent-based transition
  drm/amd/pm: drop unnecessary manual mode check
  drm/amd/pm: correct the address of Arcturus fan related registers
  drm/amdgpu: disable BACO support for 699F:C7 polaris12 SKU temporarily
  drm/amd/pm: a quick fix for "divided by zero" error

Hawking Zhang (1):
  drm/amdgpu: increase max xgmi physical node for aldebaran

Jack Zhang (1):
  drm/amd/amdgpu embed hw_fence into amdgpu_job

Jake Wang (1):
  drm/amd/display: Ensure DCN save after VM setup

Jiange Zhao (1):
  drm/amdgpu: Add MB_REQ_MSG_READY_TO_RESET response when VF get FLR 
notification.

Jonathan Kim (1):
  drm/amdgpu: get extended xgmi topology data

Kenneth Feng (2):
  Revert "drm/amd/pm: fix workload mismatch on vega10"
  drm/amd/pm: change the workload type for some cards

Kevin Wang (5):
  drm/amd/pm: correct DPM_XGMI/VCN_DPM feature name
  drm/amd/pm: skip to load smu microcode on sriov for aldebaran
  drm/amd/pm: change return value in aldebaran_get_power_limit()
  drm/amd/pm: change smu msg's attribute to allow working under sriov
  drm/amd/pm: change pp_dpm_sclk/mclk/fclk attribute is RO for aldebaran

Lukas Bulwahn (1):
  drm: amdgpu: remove obsolete reference to config CHASH

Michel Dänzer (1):
  drm/amdgpu: Cancel delayed work when GFXOFF is disabled

Nathan Chancellor (1):
  drm/radeon: Add break to switch statement in 
radeonfb_create_pinned_object()

Nicholas Kazlauskas (3):
  drm/amd/display: Fix multi-display support for idle opt workqueue
  drm/amd/display: Use vblank control events for PSR enable/disable
  drm/amd/display: Guard vblank wq flush with DCN guards

Wayne Lin (1):
  drm/amd/display: Create dc_sink when EDID fail

Yifan Zhang (1):
  drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failure

YuBiao Wang (1):
  drm/amd/amdgpu:flush ttm delayed work before cancel_sync

Zhan Liu (1):
  drm/amd/display: Use DCN30 watermark calc for DCN301

Zhigang Luo (1):
  drm/amdgpu: correct MMSCH 1.0 version

 drivers/gpu/drm/Kconfig|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  28 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  86 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c|  37 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.c|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  39 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h|   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.c  |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c   |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 432 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h| 111 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_rap.c|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c|   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h

Re: [PATCH] drm/i915: Actually delete gpu reloc selftests

2021-08-20 Thread Rodrigo Vivi
On Fri, Aug 20, 2021 at 05:49:32PM +0200, Daniel Vetter wrote:
> In
> 
> commit 8e02cceb1f1f4f254625e5338dd997ff61ab40d7
> Author: Daniel Vetter 
> Date:   Tue Aug 3 14:48:33 2021 +0200
> 
> drm/i915: delete gpu reloc code

it would be better with dim cite format...

do we need the Fixes: tag?

anyway:

Reviewed-by: Rodrigo Vivi 


> 
> I deleted the gpu relocation code and the selftest include and
> enabling, but accidentally forgot about the selftest source code.
> 
> Fix this oversight.
> 
> Signed-off-by: Daniel Vetter 
> Cc: Jon Bloomfield 
> Cc: Chris Wilson 
> Cc: Maarten Lankhorst 
> Cc: Daniel Vetter 
> Cc: Joonas Lahtinen 
> Cc: "Thomas Hellström" 
> Cc: Matthew Auld 
> Cc: Lionel Landwerlin 
> Cc: Dave Airlie 
> Cc: Jason Ekstrand 
> ---
>  .../i915/gem/selftests/i915_gem_execbuffer.c  | 190 --
>  1 file changed, 190 deletions(-)
>  delete mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
> 
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
> deleted file mode 100644
> index 16162fc2782d..
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
> +++ /dev/null
> @@ -1,190 +0,0 @@
> -// SPDX-License-Identifier: MIT
> -/*
> - * Copyright © 2020 Intel Corporation
> - */
> -
> -#include "i915_selftest.h"
> -
> -#include "gt/intel_engine_pm.h"
> -#include "selftests/igt_flush_test.h"
> -
> -static u64 read_reloc(const u32 *map, int x, const u64 mask)
> -{
> - u64 reloc;
> -
> - memcpy(&reloc, &map[x], sizeof(reloc));
> - return reloc & mask;
> -}
> -
> -static int __igt_gpu_reloc(struct i915_execbuffer *eb,
> -struct drm_i915_gem_object *obj)
> -{
> - const unsigned int offsets[] = { 8, 3, 0 };
> - const u64 mask =
> - GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0);
> - const u32 *map = page_mask_bits(obj->mm.mapping);
> - struct i915_request *rq;
> - struct i915_vma *vma;
> - int err;
> - int i;
> -
> - vma = i915_vma_instance(obj, eb->context->vm, NULL);
> - if (IS_ERR(vma))
> - return PTR_ERR(vma);
> -
> - err = i915_gem_object_lock(obj, &eb->ww);
> - if (err)
> - return err;
> -
> - err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, PIN_USER | PIN_HIGH);
> - if (err)
> - return err;
> -
> - /* 8-Byte aligned */
> - err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
> - if (err <= 0)
> - goto reloc_err;
> -
> - /* !8-Byte aligned */
> - err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1);
> - if (err <= 0)
> - goto reloc_err;
> -
> - /* Skip to the end of the cmd page */
> - i = PAGE_SIZE / sizeof(u32) - 1;
> - i -= eb->reloc_cache.rq_size;
> - memset32(eb->reloc_cache.rq_cmd + eb->reloc_cache.rq_size,
> -  MI_NOOP, i);
> - eb->reloc_cache.rq_size += i;
> -
> - /* Force next batch */
> - err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2);
> - if (err <= 0)
> - goto reloc_err;
> -
> - GEM_BUG_ON(!eb->reloc_cache.rq);
> - rq = i915_request_get(eb->reloc_cache.rq);
> - reloc_gpu_flush(eb, &eb->reloc_cache);
> - GEM_BUG_ON(eb->reloc_cache.rq);
> -
> - err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
> - if (err) {
> - intel_gt_set_wedged(eb->engine->gt);
> - goto put_rq;
> - }
> -
> - if (!i915_request_completed(rq)) {
> - pr_err("%s: did not wait for relocations!\n", eb->engine->name);
> - err = -EINVAL;
> - goto put_rq;
> - }
> -
> - for (i = 0; i < ARRAY_SIZE(offsets); i++) {
> - u64 reloc = read_reloc(map, offsets[i], mask);
> -
> - if (reloc != i) {
> - pr_err("%s[%d]: map[%d] %llx != %x\n",
> -eb->engine->name, i, offsets[i], reloc, i);
> - err = -EINVAL;
> - }
> - }
> - if (err)
> - igt_hexdump(map, 4096);
> -
> -put_rq:
> - i915_request_put(rq);
> -unpin_vma:
> - i915_vma_unpin(vma);
> - return err;
> -
> -reloc_err:
> - if (!err)
> - err = -EIO;
> - goto unpin_vma;
> -}
> -
> -static int igt_gpu_reloc(void *arg)
> -{
> - struct i915_execbuffer eb;
> - struct drm_i915_gem_object *scratch;
> - int err = 0;
> - u32 *map;
> -
> - eb.i915 = arg;
> -
> - scratch = i915_gem_object_create_internal(eb.i915, 4096);
> - if (IS_ERR(scratch))
> - return PTR_ERR(scratch);
> -
> - map = i915_gem_object_pin_map_unlocked(scratch, I915_MAP_WC);
> - if (IS_ERR(map)) {
> - err = PTR_ERR(map);
> - goto err_scratch;
> - }
> -
> - intel_gt_pm_get(&eb.i915->gt);
> -
> - for_each_uabi_engine(eb.engine, eb.i915) {
> - if (intel_engine_requires_c

Re: [PATCH v2 0/8] drm/bridge: Make panel and bridge probe order consistent

2021-08-20 Thread Maxime Ripard
Hi Andrzej,

On Wed, Aug 04, 2021 at 04:09:38PM +0200, a.hajda wrote:
> Hi Maxime,
> 
> I have been busy with other tasks, and I did not follow the list last 
> time, so sorry for my late response.
> 
> On 28.07.2021 15:32, Maxime Ripard wrote:
> > Hi,
> > 
> > We've encountered an issue with the RaspberryPi DSI panel that prevented the
> > whole display driver from probing.
> > 
> > The issue is described in detail in the commit 7213246a803f ("drm/vc4: dsi:
> > Only register our component once a DSI device is attached"), but the basic 
> > idea
> > is that since the panel is probed through i2c, there's no synchronization
> > between its probe and the registration of the MIPI-DSI host it's attached 
> > to.
> > 
> > We initially moved the component framework registration to the MIPI-DSI Host
> > attach hook to make sure we register our component only when we have a DSI
> > device attached to our MIPI-DSI host, and then use lookup our DSI device in 
> > our
> > bind hook.
> > 
> > However, all the DSI bridges controlled through i2c are only registering 
> > their
> > associated DSI device in their bridge attach hook, meaning with our change
> 
> I guess this is incorrect. I have promoted several times the pattern 
> that device driver shouldn't expose its interfaces (for example 
> component_add, drm_panel_add, drm_bridge_add) until it gathers all 
> required dependencies. In this particular case bridges should defer 
> probe until DSI bus becomes available. I guess this way the patch you 
> reverts would work.
> 
> I advised few times this pattern in case of DSI hosts, apparently I 
> didn't notice the similar issue can appear in case of bridges. Or there 
> is something I have missed???
> 
> Anyway there are already eleven(?) bridge drivers using this pattern. I 
> wonder if fixing it would be difficult, or if it expose other issues???
> The patches should be quite straightforward - move 
> of_find_mipi_dsi_host_by_node and mipi_dsi_device_register_full to probe 
> time.

I gave this a try today, went back to the current upstream code and
found that indeed it works. I converted two bridges that works now. I'll
send a new version some time next week and will convert all the others
if we agree on the approach.

Thanks for the suggestion!

> Finally I think that if we will not fix these bridge drivers we will 
> encounter another set of issues with new platforms connecting "DSI host 
> drivers assuming this pattern" and "i2c/dsi device drivers assuming 
> pattern already present in the bridges".

Yeah, this is exactly the situation I'm in :)

Maxime


signature.asc
Description: PGP signature


Re: [PATCH] drm/i915/gt: Potential error pointer dereference in pinned_context()

2021-08-20 Thread Rodrigo Vivi
On Fri, Aug 13, 2021 at 04:01:06PM +0200, Thomas Hellström wrote:
> 
> On 8/13/21 1:36 PM, Dan Carpenter wrote:
> > If the intel_engine_create_pinned_context() function returns an error
> > pointer, then dereferencing "ce" will Oops.  Use "vm" instead of
> > "ce->vm".
> > 
> > Fixes: cf586021642d ("drm/i915/gt: Pipelined page migration")
> > Signed-off-by: Dan Carpenter 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_migrate.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c 
> > b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > index d0a7c934fd3b..1dac21aa7e5c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > @@ -177,7 +177,7 @@ static struct intel_context *pinned_context(struct 
> > intel_gt *gt)
> > ce = intel_engine_create_pinned_context(engine, vm, SZ_512K,
> > I915_GEM_HWS_MIGRATE,
> > &key, "migrate");
> > -   i915_vm_put(ce->vm);
> > +   i915_vm_put(vm);
> > return ce;
> >   }
> 
> Thanks,
> 
> Reviewed-by: Thomas Hellström 

And pushed to drm-intel-gt-next, thanks for the patch and review.

> 
> 


Re: [PATCH rdma-next v3 0/3] SG fix together with update to RDMA umem

2021-08-20 Thread Jason Gunthorpe
On Thu, Jul 29, 2021 at 12:39:10PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Changelog:
> v3:
>  * Rewrote to new API suggestion
>  * Split for more patches
> v2: https://lore.kernel.org/lkml/cover.1626605893.git.leo...@nvidia.com
>  * Changed implementation of first patch, based on our discussion with 
> Christoph.
>https://lore.kernel.org/lkml/ynwavtt0qmqdx...@infradead.org/
> v1: https://lore.kernel.org/lkml/cover.1624955710.git.leo...@nvidia.com/
>  * Fixed sg_page with a _dma_ API in the umem.c
> v0: https://lore.kernel.org/lkml/cover.1624361199.git.leo...@nvidia.com
> 
> 
> Maor Gottlieb (3):
>   lib/scatterlist: Provide a dedicated function to support table append
>   lib/scatterlist: Fix wrong update of orig_nents
>   RDMA: Use the sg_table directly and remove the opencoded version from
> umem

I'm going to send this into linux-next, last time that triggered some
bug reports.

But overall it looks okay, though some of the sg_append_table is bit
odd. Certainly using the sg_table throughout the RDMA code is big
improvement.

Lets see a v4, reviews/etc and I'll update it.

Jason


Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event

2021-08-20 Thread Jiri Kosina
On Fri, 20 Aug 2021, Kees Cook wrote:

> > > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > > > > field bounds checking for memset(), avoid intentionally writing across
> > > > > neighboring fields.
> > > > >
> > > > > Add struct_group() to mark region of struct kone_mouse_event that 
> > > > > should
> > > > > be initialized to zero.
> > > > >
> > > > > Cc: Stefan Achatz 
> > > > > Cc: Jiri Kosina 
> > > > > Cc: Benjamin Tissoires 
> > > > > Cc: linux-in...@vger.kernel.org
> > > > > Signed-off-by: Kees Cook 
> > > >
> > > > Applied, thank you Kees.
> > > >
> > > 
> > > Eek! No, this will break the build: struct_group() is not yet in the tree.
> > > I can carry this with an Ack, etc.
> > 
> > I was pretty sure I saw struct_group() already in linux-next, but that was 
> > apparently a vacation-induced brainfart, sorry. Dropping.
> 
> Oh, for these two patches, can I add your Acked-by while I carry them?

Yes, thanks, and sorry for the noise.

-- 
Jiri Kosina
SUSE Labs



Re: [BUG - BISECTED] display not detected anymore

2021-08-20 Thread Heiko Carstens
On Thu, Aug 19, 2021 at 09:07:26PM +0300, Ville Syrjälä wrote:
> > ef79d62b5ce5 ("drm/i915: Encapsulate dbuf state handling harder")
> > 
> > With that commit the display is not detected anymore, one commit
> > before that it still works. So this one seems to be broken.
> > 
> > Ville, Stanislav, any idea how to fix this?
> > 
> > commit ef79d62b5ce53851901d6c1d21b74cbb9e27219b
> > Author: Ville Syrjälä 
> > Date:   Fri Jan 22 22:56:32 2021 +0200
> > 
> > drm/i915: Encapsulate dbuf state handling harder
> 
> That has nothing to do with display detection, so very mysterious.
> 
> Please file a bug at https://gitlab.freedesktop.org/drm/intel/issues/new
> boot with drm.debug=0xe with both good and bad kernels and attach the
> dmesg from both to the bug.

Everything (hopefully) provided here:
https://gitlab.freedesktop.org/drm/intel/-/issues/4013

Please let me know if you need more, or if I can help otherwise to
resolve this.


Re: [PATCH v2 02/14] drm/arm/hdlcd: Convert to Linux IRQ interfaces

2021-08-20 Thread Liviu Dudau
On Tue, Aug 03, 2021 at 11:06:52AM +0200, Thomas Zimmermann wrote:
> Drop the DRM IRQ midlayer in favor of Linux IRQ interfaces. DRM's
> IRQ helpers are mostly useful for UMS drivers. Modern KMS drivers
> don't benefit from using it.
> 
> DRM IRQ callbacks are now being called directly or inlined.
> 
> Calls to platform_get_irq() can fail with a negative errno code.
> Abort initialization in this case. The DRM IRQ midlayer does not
> handle this case correctly.
> 
> v2:
>   * name struct drm_device variables 'drm' (Sam)
> 
> Signed-off-by: Thomas Zimmermann 
> Acked-by: Sam Ravnborg 

Sorry for the delayed response due to holidays.

Acked-by: Liviu Dudau 

Best regards,
Liviu

> ---
>  drivers/gpu/drm/arm/hdlcd_drv.c | 174 ++--
>  drivers/gpu/drm/arm/hdlcd_drv.h |   1 +
>  2 files changed, 97 insertions(+), 78 deletions(-)
> 
> diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c
> index 81ae92390736..479c2422a2e0 100644
> --- a/drivers/gpu/drm/arm/hdlcd_drv.c
> +++ b/drivers/gpu/drm/arm/hdlcd_drv.c
> @@ -29,7 +29,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -38,6 +37,94 @@
>  #include "hdlcd_drv.h"
>  #include "hdlcd_regs.h"
>  
> +static irqreturn_t hdlcd_irq(int irq, void *arg)
> +{
> + struct drm_device *drm = arg;
> + struct hdlcd_drm_private *hdlcd = drm->dev_private;
> + unsigned long irq_status;
> +
> + irq_status = hdlcd_read(hdlcd, HDLCD_REG_INT_STATUS);
> +
> +#ifdef CONFIG_DEBUG_FS
> + if (irq_status & HDLCD_INTERRUPT_UNDERRUN)
> + atomic_inc(&hdlcd->buffer_underrun_count);
> +
> + if (irq_status & HDLCD_INTERRUPT_DMA_END)
> + atomic_inc(&hdlcd->dma_end_count);
> +
> + if (irq_status & HDLCD_INTERRUPT_BUS_ERROR)
> + atomic_inc(&hdlcd->bus_error_count);
> +
> + if (irq_status & HDLCD_INTERRUPT_VSYNC)
> + atomic_inc(&hdlcd->vsync_count);
> +
> +#endif
> + if (irq_status & HDLCD_INTERRUPT_VSYNC)
> + drm_crtc_handle_vblank(&hdlcd->crtc);
> +
> + /* acknowledge interrupt(s) */
> + hdlcd_write(hdlcd, HDLCD_REG_INT_CLEAR, irq_status);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static void hdlcd_irq_preinstall(struct drm_device *drm)
> +{
> + struct hdlcd_drm_private *hdlcd = drm->dev_private;
> + /* Ensure interrupts are disabled */
> + hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, 0);
> + hdlcd_write(hdlcd, HDLCD_REG_INT_CLEAR, ~0);
> +}
> +
> +static void hdlcd_irq_postinstall(struct drm_device *drm)
> +{
> +#ifdef CONFIG_DEBUG_FS
> + struct hdlcd_drm_private *hdlcd = drm->dev_private;
> + unsigned long irq_mask = hdlcd_read(hdlcd, HDLCD_REG_INT_MASK);
> +
> + /* enable debug interrupts */
> + irq_mask |= HDLCD_DEBUG_INT_MASK;
> +
> + hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, irq_mask);
> +#endif
> +}
> +
> +static int hdlcd_irq_install(struct drm_device *drm, int irq)
> +{
> + int ret;
> +
> + if (irq == IRQ_NOTCONNECTED)
> + return -ENOTCONN;
> +
> + hdlcd_irq_preinstall(drm);
> +
> + ret = request_irq(irq, hdlcd_irq, 0, drm->driver->name, drm);
> + if (ret)
> + return ret;
> +
> + hdlcd_irq_postinstall(drm);
> +
> + return 0;
> +}
> +
> +static void hdlcd_irq_uninstall(struct drm_device *drm)
> +{
> + struct hdlcd_drm_private *hdlcd = drm->dev_private;
> + /* disable all the interrupts that we might have enabled */
> + unsigned long irq_mask = hdlcd_read(hdlcd, HDLCD_REG_INT_MASK);
> +
> +#ifdef CONFIG_DEBUG_FS
> + /* disable debug interrupts */
> + irq_mask &= ~HDLCD_DEBUG_INT_MASK;
> +#endif
> +
> + /* disable vsync interrupts */
> + irq_mask &= ~HDLCD_INTERRUPT_VSYNC;
> + hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, irq_mask);
> +
> + free_irq(hdlcd->irq, drm);
> +}
> +
>  static int hdlcd_load(struct drm_device *drm, unsigned long flags)
>  {
>   struct hdlcd_drm_private *hdlcd = drm->dev_private;
> @@ -90,7 +177,12 @@ static int hdlcd_load(struct drm_device *drm, unsigned 
> long flags)
>   goto setup_fail;
>   }
>  
> - ret = drm_irq_install(drm, platform_get_irq(pdev, 0));
> + ret = platform_get_irq(pdev, 0);
> + if (ret < 0)
> + goto irq_fail;
> + hdlcd->irq = ret;
> +
> + ret = hdlcd_irq_install(drm, hdlcd->irq);
>   if (ret < 0) {
>   DRM_ERROR("failed to install IRQ handler\n");
>   goto irq_fail;
> @@ -122,76 +214,6 @@ static void hdlcd_setup_mode_config(struct drm_device 
> *drm)
>   drm->mode_config.funcs = &hdlcd_mode_config_funcs;
>  }
>  
> -static irqreturn_t hdlcd_irq(int irq, void *arg)
> -{
> - struct drm_device *drm = arg;
> - struct hdlcd_drm_private *hdlcd = drm->dev_private;
> - unsigned long irq_status;
> -
> - irq_status = hdlcd_read(hdlcd, HDLCD_REG_INT_STATUS);
> -
> -#ifdef CONFIG_DEBUG_FS
> - if (irq_status & HDLCD_INTERRUPT_UNDERRUN)
> - 

Re: [PATCH rdma-next v3 2/3] lib/scatterlist: Fix wrong update of orig_nents

2021-08-20 Thread Jason Gunthorpe
On Fri, Aug 20, 2021 at 12:54:25PM -0300, Jason Gunthorpe wrote:
> On Thu, Jul 29, 2021 at 12:39:12PM +0300, Leon Romanovsky wrote:
> 
> > +/**
> > + * __sg_free_table - Free a previously mapped sg table
> > + * @table: The sg table header to use
> > + * @max_ents:  The maximum number of entries per single scatterlist
> > + * @total_ents:The total number of entries in the table
> > + * @nents_first_chunk: Number of entries int the (preallocated) first
> > + * scatterlist chunk, 0 means no such preallocated
> > + * first chunk
> > + * @free_fn:   Free function
> > + *
> > + *  Description:
> > + *Free an sg table previously allocated and setup with
> > + *__sg_alloc_table().  The @max_ents value must be identical to
> > + *that previously used with __sg_alloc_table().
> > + *
> > + **/
> > +void __sg_free_table(struct sg_table *table, unsigned int max_ents,
> > +unsigned int nents_first_chunk, sg_free_fn *free_fn)
> > +{
> > +   sg_free_table_entries(table, max_ents, nents_first_chunk, free_fn,
> > + table->orig_nents);
> > +}
> >  EXPORT_SYMBOL(__sg_free_table);
> 
> This is getting a bit indirect, there is only one caller of
> __sg_free_table() in sg_pool.c, so may as well just export
sg_free_table_entries and have it use that directly.

And further since sg_free_table_entries() doesn't actually use table->
except for the SGL it should probably be called sg_free_table_sgl()

Jason


Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021 at 05:27:35PM +0200, Jiri Kosina wrote:
> On Fri, 20 Aug 2021, Kees Cook wrote:
> 
> > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > > > field bounds checking for memset(), avoid intentionally writing across
> > > > neighboring fields.
> > > >
> > > > Add struct_group() to mark region of struct kone_mouse_event that should
> > > > be initialized to zero.
> > > >
> > > > Cc: Stefan Achatz 
> > > > Cc: Jiri Kosina 
> > > > Cc: Benjamin Tissoires 
> > > > Cc: linux-in...@vger.kernel.org
> > > > Signed-off-by: Kees Cook 
> > >
> > > Applied, thank you Kees.
> > >
> > 
> > Eek! No, this will break the build: struct_group() is not yet in the tree.
> > I can carry this with an Ack, etc.
> 
> I was pretty sure I saw struct_group() already in linux-next, but that was 
> apparently a vacation-induced brainfart, sorry. Dropping.

Oh, for these two patches, can I add your Acked-by while I carry them?

Thanks!

-- 
Kees Cook


Re: [PATCH v2 56/63] RDMA/mlx5: Use struct_group() to zero struct mlx5_ib_mr

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021 at 09:34:00AM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 19, 2021 at 11:14:37AM -0700, Kees Cook wrote:
> 
> > Which do you mean? When doing the conversions I tended to opt for
> > struct_group() since it provides more robust "intentionality". Strictly
> > speaking, the new memset helpers are doing field-spanning writes, but the
> > "clear to the end" pattern was so common it made sense to add the helpers,
> > as they're a bit less disruptive. It's totally up to you! :)
> 
> Well, of the patches you cc'd to me only this one used the struct
> group..

Understood. I've adjusted this for v3. Thanks!

-- 
Kees Cook


Re: [PATCH v2 57/63] powerpc/signal32: Use struct_group() to zero spe regs

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021 at 05:49:35PM +1000, Michael Ellerman wrote:
> Kees Cook  writes:
> > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > field bounds checking for memset(), avoid intentionally writing across
> > neighboring fields.
> >
> > Add a struct_group() for the spe registers so that memset() can correctly 
> > reason
> > about the size:
> >
> >In function 'fortify_memset_chk',
> >inlined from 'restore_user_regs.part.0' at 
> > arch/powerpc/kernel/signal_32.c:539:3:
> >>> include/linux/fortify-string.h:195:4: error: call to 
> >>> '__write_overflow_field' declared with attribute warning: detected write 
> >>> beyond size of field (1st parameter); maybe use struct_group()? 
> >>> [-Werror=attribute-warning]
> >  195 |__write_overflow_field();
> >  |^~~~
> >
> > Cc: Michael Ellerman 
> > Cc: Benjamin Herrenschmidt 
> > Cc: Paul Mackerras 
> > Cc: Christophe Leroy 
> > Cc: Sudeep Holla 
> > Cc: linuxppc-...@lists.ozlabs.org
> > Reported-by: kernel test robot 
> > Signed-off-by: Kees Cook 
> > ---
> >  arch/powerpc/include/asm/processor.h | 6 --
> >  arch/powerpc/kernel/signal_32.c  | 6 +++---
> >  2 files changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/processor.h 
> > b/arch/powerpc/include/asm/processor.h
> > index f348e564f7dd..05dc567cb9a8 100644
> > --- a/arch/powerpc/include/asm/processor.h
> > +++ b/arch/powerpc/include/asm/processor.h
> > @@ -191,8 +191,10 @@ struct thread_struct {
> > int used_vsr;   /* set if process has used VSX */
> >  #endif /* CONFIG_VSX */
> >  #ifdef CONFIG_SPE
> > -   unsigned long   evr[32];/* upper 32-bits of SPE regs */
> > -   u64 acc;/* Accumulator */
> > +   struct_group(spe,
> > +   unsigned long   evr[32];/* upper 32-bits of SPE regs */
> > +   u64 acc;/* Accumulator */
> > +   );
> > unsigned long   spefscr;/* SPE & eFP status */
> > unsigned long   spefscr_last;   /* SPEFSCR value on last prctl
> >call or trap return */
> > diff --git a/arch/powerpc/kernel/signal_32.c 
> > b/arch/powerpc/kernel/signal_32.c
> > index 0608581967f0..77b86caf5c51 100644
> > --- a/arch/powerpc/kernel/signal_32.c
> > +++ b/arch/powerpc/kernel/signal_32.c
> > @@ -532,11 +532,11 @@ static long restore_user_regs(struct pt_regs *regs,
> > regs_set_return_msr(regs, regs->msr & ~MSR_SPE);
> > if (msr & MSR_SPE) {
> > /* restore spe registers from the stack */
> > -   unsafe_copy_from_user(current->thread.evr, &sr->mc_vregs,
> > - ELF_NEVRREG * sizeof(u32), failed);
> > +   unsafe_copy_from_user(¤t->thread.spe, &sr->mc_vregs,
> > + sizeof(current->thread.spe), failed);
> 
> This makes me nervous, because the ABI is that we copy ELF_NEVRREG *
> sizeof(u32) bytes, not whatever sizeof(current->thread.spe) happens to
> be.
> 
> ie. if we use sizeof an inadvertent change to the fields in
> thread_struct could change how many bytes we copy out to userspace,
> which would be an ABI break.
> 
> And that's not that hard to do, because it's not at all obvious that the
> size and layout of fields in thread_struct affects the user ABI.
> 
> At the same time we don't want to copy the right number of bytes but
> the wrong content, so from that point of view using sizeof is good :)
> 
> The way we handle it in ptrace is to have BUILD_BUG_ON()s to verify that
> things match up, so maybe we should do that here too.
> 
> ie. add:
> 
>   BUILD_BUG_ON(sizeof(current->thread.spe) == ELF_NEVRREG * sizeof(u32));
> 
> Not sure if you are happy doing that as part of this patch. I can always
> do it later if not.

Sounds good to me; I did that in a few other cases in the series where
the relationships between things seemed tenuous. :) I'll add this (as
!=) in v3.

Thanks!

-- 
Kees Cook


Re: [PATCH rdma-next v3 2/3] lib/scatterlist: Fix wrong update of orig_nents

2021-08-20 Thread Jason Gunthorpe
On Thu, Jul 29, 2021 at 12:39:12PM +0300, Leon Romanovsky wrote:

> +/**
> + * __sg_free_table - Free a previously mapped sg table
> + * @table:   The sg table header to use
> + * @max_ents:The maximum number of entries per single scatterlist
> + * @total_ents:  The total number of entries in the table
> + * @nents_first_chunk: Number of entries int the (preallocated) first
> + * scatterlist chunk, 0 means no such preallocated
> + * first chunk
> + * @free_fn: Free function
> + *
> + *  Description:
> + *Free an sg table previously allocated and setup with
> + *__sg_alloc_table().  The @max_ents value must be identical to
> + *that previously used with __sg_alloc_table().
> + *
> + **/
> +void __sg_free_table(struct sg_table *table, unsigned int max_ents,
> +  unsigned int nents_first_chunk, sg_free_fn *free_fn)
> +{
> + sg_free_table_entries(table, max_ents, nents_first_chunk, free_fn,
> +   table->orig_nents);
> +}
>  EXPORT_SYMBOL(__sg_free_table);

This is getting a bit indirect, there is only one caller of
__sg_free_table() in sg_pool.c, so may as well just export
sg_free_table_entries and have it use that directly.

Jason


Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021 at 05:27:35PM +0200, Jiri Kosina wrote:
> On Fri, 20 Aug 2021, Kees Cook wrote:
> 
> > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > > > field bounds checking for memset(), avoid intentionally writing across
> > > > neighboring fields.
> > > >
> > > > Add struct_group() to mark region of struct kone_mouse_event that should
> > > > be initialized to zero.
> > > >
> > > > Cc: Stefan Achatz 
> > > > Cc: Jiri Kosina 
> > > > Cc: Benjamin Tissoires 
> > > > Cc: linux-in...@vger.kernel.org
> > > > Signed-off-by: Kees Cook 
> > >
> > > Applied, thank you Kees.
> > >
> > 
> > Eek! No, this will break the build: struct_group() is not yet in the tree.
> > I can carry this with an Ack, etc.
> 
> I was pretty sure I saw struct_group() already in linux-next, but that was 
> apparently a vacation-induced brainfart, sorry. Dropping.

Cool, no worries. Sorry for the confusion!

-- 
Kees Cook


[PATCH] drm/i915: Actually delete gpu reloc selftests

2021-08-20 Thread Daniel Vetter
In

commit 8e02cceb1f1f4f254625e5338dd997ff61ab40d7
Author: Daniel Vetter 
Date:   Tue Aug 3 14:48:33 2021 +0200

drm/i915: delete gpu reloc code

I deleted the gpu relocation code and the selftest include and
enabling, but accidentally forgot about the selftest source code.

Fix this oversight.

Signed-off-by: Daniel Vetter 
Cc: Jon Bloomfield 
Cc: Chris Wilson 
Cc: Maarten Lankhorst 
Cc: Daniel Vetter 
Cc: Joonas Lahtinen 
Cc: "Thomas Hellström" 
Cc: Matthew Auld 
Cc: Lionel Landwerlin 
Cc: Dave Airlie 
Cc: Jason Ekstrand 
---
 .../i915/gem/selftests/i915_gem_execbuffer.c  | 190 --
 1 file changed, 190 deletions(-)
 delete mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
deleted file mode 100644
index 16162fc2782d..
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ /dev/null
@@ -1,190 +0,0 @@
-// SPDX-License-Identifier: MIT
-/*
- * Copyright © 2020 Intel Corporation
- */
-
-#include "i915_selftest.h"
-
-#include "gt/intel_engine_pm.h"
-#include "selftests/igt_flush_test.h"
-
-static u64 read_reloc(const u32 *map, int x, const u64 mask)
-{
-   u64 reloc;
-
-   memcpy(&reloc, &map[x], sizeof(reloc));
-   return reloc & mask;
-}
-
-static int __igt_gpu_reloc(struct i915_execbuffer *eb,
-  struct drm_i915_gem_object *obj)
-{
-   const unsigned int offsets[] = { 8, 3, 0 };
-   const u64 mask =
-   GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0);
-   const u32 *map = page_mask_bits(obj->mm.mapping);
-   struct i915_request *rq;
-   struct i915_vma *vma;
-   int err;
-   int i;
-
-   vma = i915_vma_instance(obj, eb->context->vm, NULL);
-   if (IS_ERR(vma))
-   return PTR_ERR(vma);
-
-   err = i915_gem_object_lock(obj, &eb->ww);
-   if (err)
-   return err;
-
-   err = i915_vma_pin_ww(vma, &eb->ww, 0, 0, PIN_USER | PIN_HIGH);
-   if (err)
-   return err;
-
-   /* 8-Byte aligned */
-   err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
-   if (err <= 0)
-   goto reloc_err;
-
-   /* !8-Byte aligned */
-   err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1);
-   if (err <= 0)
-   goto reloc_err;
-
-   /* Skip to the end of the cmd page */
-   i = PAGE_SIZE / sizeof(u32) - 1;
-   i -= eb->reloc_cache.rq_size;
-   memset32(eb->reloc_cache.rq_cmd + eb->reloc_cache.rq_size,
-MI_NOOP, i);
-   eb->reloc_cache.rq_size += i;
-
-   /* Force next batch */
-   err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2);
-   if (err <= 0)
-   goto reloc_err;
-
-   GEM_BUG_ON(!eb->reloc_cache.rq);
-   rq = i915_request_get(eb->reloc_cache.rq);
-   reloc_gpu_flush(eb, &eb->reloc_cache);
-   GEM_BUG_ON(eb->reloc_cache.rq);
-
-   err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
-   if (err) {
-   intel_gt_set_wedged(eb->engine->gt);
-   goto put_rq;
-   }
-
-   if (!i915_request_completed(rq)) {
-   pr_err("%s: did not wait for relocations!\n", eb->engine->name);
-   err = -EINVAL;
-   goto put_rq;
-   }
-
-   for (i = 0; i < ARRAY_SIZE(offsets); i++) {
-   u64 reloc = read_reloc(map, offsets[i], mask);
-
-   if (reloc != i) {
-   pr_err("%s[%d]: map[%d] %llx != %x\n",
-  eb->engine->name, i, offsets[i], reloc, i);
-   err = -EINVAL;
-   }
-   }
-   if (err)
-   igt_hexdump(map, 4096);
-
-put_rq:
-   i915_request_put(rq);
-unpin_vma:
-   i915_vma_unpin(vma);
-   return err;
-
-reloc_err:
-   if (!err)
-   err = -EIO;
-   goto unpin_vma;
-}
-
-static int igt_gpu_reloc(void *arg)
-{
-   struct i915_execbuffer eb;
-   struct drm_i915_gem_object *scratch;
-   int err = 0;
-   u32 *map;
-
-   eb.i915 = arg;
-
-   scratch = i915_gem_object_create_internal(eb.i915, 4096);
-   if (IS_ERR(scratch))
-   return PTR_ERR(scratch);
-
-   map = i915_gem_object_pin_map_unlocked(scratch, I915_MAP_WC);
-   if (IS_ERR(map)) {
-   err = PTR_ERR(map);
-   goto err_scratch;
-   }
-
-   intel_gt_pm_get(&eb.i915->gt);
-
-   for_each_uabi_engine(eb.engine, eb.i915) {
-   if (intel_engine_requires_cmd_parser(eb.engine) ||
-   intel_engine_using_cmd_parser(eb.engine))
-   continue;
-
-   reloc_cache_init(&eb.reloc_cache, eb.i915);
-   memset(map, POISON_INUSE, 4096);
-
-   intel_engine_pm_get(eb.engine);
-   eb.context = intel_context_create(e

Re: [PATCH v2 22/63] HID: cp2112: Use struct_group() for memcpy() region

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021 at 03:01:43PM +0200, Jiri Kosina wrote:
> On Tue, 17 Aug 2021, Kees Cook wrote:
> 
> > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > field bounds checking for memcpy(), memmove(), and memset(), avoid
> > intentionally writing across neighboring fields.
> > 
> > Use struct_group() in struct cp2112_string_report around members report,
> > length, type, and string, so they can be referenced together. This will
> > allow memcpy() and sizeof() to more easily reason about sizes, improve
> > readability, and avoid future warnings about writing beyond the end of
> > report.
> > 
> > "pahole" shows no size nor member offset changes to struct
> > cp2112_string_report.  "objdump -d" shows no meaningful object
> > code changes (i.e. only source line number induced differences.)
> > 
> > Cc: Jiri Kosina 
> > Cc: Benjamin Tissoires 
> > Cc: linux-in...@vger.kernel.org
> > Signed-off-by: Kees Cook 
> 
> Applied, thanks.

I'm not sure if my other HTML email got through, but please don't apply
these to separate trees -- struct_group() is introduced as part of this
series.

-- 
Kees Cook


[PATCH V3 4/9] PCI/VGA: Remove empty vga_arb_device_card_gone()

2021-08-20 Thread Huacai Chen
From: Bjorn Helgaas 

vga_arb_device_card_gone() has always been empty.  Remove it.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index e4153ab70481..c984c76b3fd7 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -104,8 +104,6 @@ static int vga_str_to_iostate(char *buf, int str_size, int 
*io_state)
 /* this is only used a cookie - it should not be dereferenced */
 static struct pci_dev *vga_default;
 
-static void vga_arb_device_card_gone(struct pci_dev *pdev);
-
 /* Find somebody in our list */
 static struct vga_device *vgadev_find(struct pci_dev *pdev)
 {
@@ -741,10 +739,6 @@ static bool vga_arbiter_del_pci_device(struct pci_dev 
*pdev)
/* Remove entry from list */
list_del(&vgadev->list);
vga_count--;
-   /* Notify userland driver that the device is gone so it discards
-* it's copies of the pci_dev pointer
-*/
-   vga_arb_device_card_gone(pdev);
 
/* Wake up all possible waiters */
wake_up_all(&vga_wait_queue);
@@ -994,9 +988,7 @@ static ssize_t vga_arb_read(struct file *file, char __user 
*buf,
if (lbuf == NULL)
return -ENOMEM;
 
-   /* Shields against vga_arb_device_card_gone (pci_dev going
-* away), and allows access to vga list
-*/
+   /* Protects vga_list */
spin_lock_irqsave(&vga_lock, flags);
 
/* If we are targeting the default, use it */
@@ -1013,8 +1005,6 @@ static ssize_t vga_arb_read(struct file *file, char 
__user *buf,
/* Wow, it's not in the list, that shouldn't happen,
 * let's fix us up and return invalid card
 */
-   if (pdev == priv->target)
-   vga_arb_device_card_gone(pdev);
spin_unlock_irqrestore(&vga_lock, flags);
len = sprintf(lbuf, "invalid");
goto done;
@@ -1358,10 +1348,6 @@ static int vga_arb_release(struct inode *inode, struct 
file *file)
return 0;
 }
 
-static void vga_arb_device_card_gone(struct pci_dev *pdev)
-{
-}
-
 /*
  * callback any registered clients to let them know we have a
  * change in VGA cards
-- 
2.27.0



[PATCH V3 3/9] PCI/VGA: Use unsigned format string to print lock counts

2021-08-20 Thread Huacai Chen
From: Bjorn Helgaas 

In struct vga_device, io_lock_cnt and mem_lock_cnt are unsigned, but we
previously printed them with "%d", the signed decimal format.  Print them
with the unsigned format "%u" instead.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index 61b57abcb014..e4153ab70481 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -1022,7 +1022,7 @@ static ssize_t vga_arb_read(struct file *file, char 
__user *buf,
 
/* Fill the buffer with infos */
len = snprintf(lbuf, 1024,
-  "count:%d,PCI:%s,decodes=%s,owns=%s,locks=%s(%d:%d)\n",
+  "count:%d,PCI:%s,decodes=%s,owns=%s,locks=%s(%u:%u)\n",
   vga_decode_count, pci_name(pdev),
   vga_iostate_to_str(vgadev->decodes),
   vga_iostate_to_str(vgadev->owns),
-- 
2.27.0



[PATCH V3 6/9] PCI/VGA: Prefer vga_default_device()

2021-08-20 Thread Huacai Chen
Use the vga_default_device() interface consistently instead of directly
testing vga_default.  No functional change intended.

[bhelgaas: split to separate patch and extended]
Signed-off-by: Huacai Chen 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index 1f8fb37be5fa..a6a5864ff538 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -173,7 +173,7 @@ int vga_remove_vgacon(struct pci_dev *pdev)
 {
int ret = 0;
 
-   if (pdev != vga_default)
+   if (pdev != vga_default_device())
return 0;
vgaarb_info(&pdev->dev, "deactivate vga console\n");
 
@@ -707,7 +707,7 @@ static bool vga_arbiter_add_pci_device(struct pci_dev *pdev)
/* Deal with VGA default device. Use first enabled one
 * by default if arch doesn't have it's own hook
 */
-   if (vga_default == NULL &&
+   if (!vga_default_device() &&
((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) {
vgaarb_info(&pdev->dev, "setting as boot VGA device\n");
vga_set_default_device(pdev);
@@ -744,7 +744,7 @@ static bool vga_arbiter_del_pci_device(struct pci_dev *pdev)
goto bail;
}
 
-   if (vga_default == pdev)
+   if (vga_default_device() == pdev)
vga_set_default_device(NULL);
 
if (vgadev->decodes & (VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM))
-- 
2.27.0



[PATCH V3 1/9] PCI/VGA: Move vgaarb to drivers/pci

2021-08-20 Thread Huacai Chen
From: Bjorn Helgaas 

The VGA arbiter is really PCI-specific and doesn't depend on any GPU
things.  Move it to the PCI subsystem.

Signed-off-by: Bjorn Helgaas 
---
 drivers/gpu/vga/Kconfig   | 19 ---
 drivers/gpu/vga/Makefile  |  1 -
 drivers/pci/Kconfig   | 19 +++
 drivers/pci/Makefile  |  1 +
 drivers/{gpu/vga => pci}/vgaarb.c |  0
 5 files changed, 20 insertions(+), 20 deletions(-)
 rename drivers/{gpu/vga => pci}/vgaarb.c (100%)

diff --git a/drivers/gpu/vga/Kconfig b/drivers/gpu/vga/Kconfig
index 1ad4c4ef0b5e..eb8b14ab22c3 100644
--- a/drivers/gpu/vga/Kconfig
+++ b/drivers/gpu/vga/Kconfig
@@ -1,23 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-config VGA_ARB
-   bool "VGA Arbitration" if EXPERT
-   default y
-   depends on (PCI && !S390)
-   help
- Some "legacy" VGA devices implemented on PCI typically have the same
- hard-decoded addresses as they did on ISA. When multiple PCI devices
- are accessed at same time they need some kind of coordination. Please
- see Documentation/gpu/vgaarbiter.rst for more details. Select this to
- enable VGA arbiter.
-
-config VGA_ARB_MAX_GPUS
-   int "Maximum number of GPUs"
-   default 16
-   depends on VGA_ARB
-   help
- Reserves space in the kernel to maintain resource locking for
- multiple GPUS.  The overhead for each GPU is very small.
-
 config VGA_SWITCHEROO
bool "Laptop Hybrid Graphics - GPU switching support"
depends on X86
diff --git a/drivers/gpu/vga/Makefile b/drivers/gpu/vga/Makefile
index e92064442d60..9800620deda3 100644
--- a/drivers/gpu/vga/Makefile
+++ b/drivers/gpu/vga/Makefile
@@ -1,3 +1,2 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_VGA_ARB)  += vgaarb.o
 obj-$(CONFIG_VGA_SWITCHEROO) += vga_switcheroo.o
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 0c473d75e625..7c9e56d7b857 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -252,6 +252,25 @@ config PCIE_BUS_PEER2PEER
 
 endchoice
 
+config VGA_ARB
+   bool "VGA Arbitration" if EXPERT
+   default y
+   depends on (PCI && !S390)
+   help
+ Some "legacy" VGA devices implemented on PCI typically have the same
+ hard-decoded addresses as they did on ISA. When multiple PCI devices
+ are accessed at same time they need some kind of coordination. Please
+ see Documentation/gpu/vgaarbiter.rst for more details. Select this to
+ enable VGA arbiter.
+
+config VGA_ARB_MAX_GPUS
+   int "Maximum number of GPUs"
+   default 16
+   depends on VGA_ARB
+   help
+ Reserves space in the kernel to maintain resource locking for
+ multiple GPUS.  The overhead for each GPU is very small.
+
 source "drivers/pci/hotplug/Kconfig"
 source "drivers/pci/controller/Kconfig"
 source "drivers/pci/endpoint/Kconfig"
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index d62c4ac4ae1b..ebe720f69b15 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_PCI_PF_STUB) += pci-pf-stub.o
 obj-$(CONFIG_PCI_ECAM) += ecam.o
 obj-$(CONFIG_PCI_P2PDMA)   += p2pdma.o
 obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
+obj-$(CONFIG_VGA_ARB)  += vgaarb.o
 
 # Endpoint library must be initialized before its users
 obj-$(CONFIG_PCI_ENDPOINT) += endpoint/
diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/pci/vgaarb.c
similarity index 100%
rename from drivers/gpu/vga/vgaarb.c
rename to drivers/pci/vgaarb.c
-- 
2.27.0



[PATCH V3 0/9] PCI/VGA: Rework default VGA device selection

2021-08-20 Thread Huacai Chen
My original work is at [1].

Bjorn do some rework and extension in V2. It moves the VGA arbiter to
the PCI subsystem, fixes a few nits, and breaks a few pieces to make
the main patch a little smaller.

V3 rewrite the commit log of the last patch (which is also summarized
by Bjorn).

All comments welcome!

[1] 
https://lore.kernel.org/dri-devel/20210705100503.1120643-1-chenhua...@loongson.cn/

Bjorn Helgaas (4):
  PCI/VGA: Move vgaarb to drivers/pci
  PCI/VGA: Replace full MIT license text with SPDX identifier
  PCI/VGA: Use unsigned format string to print lock counts
  PCI/VGA: Remove empty vga_arb_device_card_gone()

Huacai Chen (5):
  PCI/VGA: Move vga_arb_integrated_gpu() earlier in file
  PCI/VGA: Prefer vga_default_device()
  PCI/VGA: Split out vga_arb_update_default_device()
  PCI/VGA: Log bridge control messages when adding devices
  PCI/VGA: Rework default VGA device selection

Signed-off-by: Huacai Chen 
Signed-off-by: Bjorn Helgaas  
---
 drivers/gpu/vga/Kconfig   |  19 ---
 drivers/gpu/vga/Makefile  |   1 -
 drivers/pci/Kconfig   |  19 +++
 drivers/pci/Makefile  |   1 +
 drivers/{gpu/vga => pci}/vgaarb.c | 269 --
 5 files changed, 126 insertions(+), 183 deletions(-)
 rename drivers/{gpu/vga => pci}/vgaarb.c (90%)
--
2.27.0



[PATCH V3 7/9] PCI/VGA: Split out vga_arb_update_default_device()

2021-08-20 Thread Huacai Chen
If there's no default VGA device, and we find a VGA device that owns the
legacy VGA resources, we make that device the default.  Split this logic
out from vga_arbiter_add_pci_device() into a new function,
vga_arb_update_default_device().

[bhelgaas: split another piece to separate patch]
Signed-off-by: Huacai Chen 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index a6a5864ff538..4cecb599f5ed 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -577,6 +577,21 @@ static bool vga_arb_integrated_gpu(struct device *dev)
 }
 #endif
 
+static void vga_arb_update_default_device(struct vga_device *vgadev)
+{
+   struct pci_dev *pdev = vgadev->pdev;
+
+   /*
+* If we don't have a default VGA device yet, and this device owns
+* the legacy VGA resources, make it the default.
+*/
+   if (!vga_default_device() &&
+   ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) {
+   vgaarb_info(&pdev->dev, "setting as boot VGA device\n");
+   vga_set_default_device(pdev);
+   }
+}
+
 /*
  * Rules for using a bridge to control a VGA descendant decoding: if a bridge
  * has only one VGA descendant then it can be used to control the VGA routing
@@ -704,15 +719,7 @@ static bool vga_arbiter_add_pci_device(struct pci_dev 
*pdev)
bus = bus->parent;
}
 
-   /* Deal with VGA default device. Use first enabled one
-* by default if arch doesn't have it's own hook
-*/
-   if (!vga_default_device() &&
-   ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) {
-   vgaarb_info(&pdev->dev, "setting as boot VGA device\n");
-   vga_set_default_device(pdev);
-   }
-
+   vga_arb_update_default_device(vgadev);
vga_arbiter_check_bridge_sharing(vgadev);
 
/* Add to the list */
-- 
2.27.0



[PATCH V3 8/9] PCI/VGA: Log bridge control messages when adding devices

2021-08-20 Thread Huacai Chen
Previously vga_arb_device_init() iterated through all VGA devices and
indicated whether legacy VGA routing to each could be controlled by an
upstream bridge.

But we determine that information in vga_arbiter_add_pci_device(), which we
call for every device, so we can log it there without iterating through the
VGA devices again.

Note that we call vga_arbiter_check_bridge_sharing() before adding the
device to vga_list, so we have to handle the very first device separately.

[bhelgaas: commit log, split another piece to separate patch, fix
list_empty() issue]
Link: https://lore.kernel.org/r/20210705100503.1120643-1-chenhua...@loongson.cn
Signed-off-by: Huacai Chen 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index 4cecb599f5ed..dd07b1c3205f 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -609,8 +609,10 @@ static void vga_arbiter_check_bridge_sharing(struct 
vga_device *vgadev)
 
vgadev->bridge_has_one_vga = true;
 
-   if (list_empty(&vga_list))
+   if (list_empty(&vga_list)) {
+   vgaarb_info(&vgadev->pdev->dev, "bridge control possible\n");
return;
+   }
 
/* okay iterate the new devices bridge hierarachy */
new_bus = vgadev->pdev->bus;
@@ -649,6 +651,11 @@ static void vga_arbiter_check_bridge_sharing(struct 
vga_device *vgadev)
}
new_bus = new_bus->parent;
}
+
+   if (vgadev->bridge_has_one_vga)
+   vgaarb_info(&vgadev->pdev->dev, "bridge control possible\n");
+   else
+   vgaarb_info(&vgadev->pdev->dev, "no bridge control possible\n");
 }
 
 /*
@@ -1527,7 +1534,6 @@ static int __init vga_arb_device_init(void)
 {
int rc;
struct pci_dev *pdev;
-   struct vga_device *vgadev;
 
rc = misc_register(&vga_arb_device);
if (rc < 0)
@@ -1543,15 +1549,6 @@ static int __init vga_arb_device_init(void)
   PCI_ANY_ID, pdev)) != NULL)
vga_arbiter_add_pci_device(pdev);
 
-   list_for_each_entry(vgadev, &vga_list, list) {
-   struct device *dev = &vgadev->pdev->dev;
-
-   if (vgadev->bridge_has_one_vga)
-   vgaarb_info(dev, "bridge control possible\n");
-   else
-   vgaarb_info(dev, "no bridge control possible\n");
-   }
-
vga_arb_select_default_device();
 
pr_info("loaded\n");
-- 
2.27.0



Re: [syzbot] WARNING in drm_gem_shmem_vm_open

2021-08-20 Thread syzbot
syzbot has bisected this issue to:

commit ea40d7857d5250e5400f38c69ef9e17321e9c4a2
Author: Daniel Vetter 
Date:   Fri Oct 9 23:21:56 2020 +

drm/vkms: fbdev emulation support

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=11c31d5530
start commit:   614cb2751d31 Merge tag 'trace-v5.14-rc6' of git://git.kern..
git tree:   upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=13c31d5530
console output: https://syzkaller.appspot.com/x/log.txt?x=15c31d5530
kernel config:  https://syzkaller.appspot.com/x/.config?x=96f0602203250753
dashboard link: https://syzkaller.appspot.com/bug?extid=91525b2bd4b5dff71619
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=122bce0e30

Reported-by: syzbot+91525b2bd4b5dff71...@syzkaller.appspotmail.com
Fixes: ea40d7857d52 ("drm/vkms: fbdev emulation support")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


[PATCH V3 9/9] PCI/VGA: Rework default VGA device selection

2021-08-20 Thread Huacai Chen
Current default VGA device selection fails in some cases:

  - On BMC system, the AST2500 bridge [1a03:1150] does not implement
PCI_BRIDGE_CTL_VGA [1].  This is perfectly legal but means the
legacy VGA resources won't reach downstream devices unless they're
included in the usual bridge windows.

  - vga_arb_select_default_device() will set a device below such a
bridge as the default VGA device as long as it has PCI_COMMAND_IO
and PCI_COMMAND_MEMORY enabled.

  - vga_arbiter_add_pci_device() is called for every VGA device,
either at boot-time or at hot-add time, and it will also set the
device as the default VGA device, but ONLY if all bridges leading
to it implement PCI_BRIDGE_CTL_VGA.

  - This difference between vga_arb_select_default_device() and
vga_arbiter_add_pci_device() means that a device below an AST2500
or similar bridge can only be set as the default if it is
enumerated before vga_arb_device_init().

  - On ACPI-based systems, PCI devices are enumerated by acpi_init(),
which runs before vga_arb_device_init().

  - On non-ACPI systems, like on MIPS system, they are enumerated by
pcibios_init(), which typically runs *after*
vga_arb_device_init().

So I made vga_arb_update_default_device() to replace the current
vga_arb_select_default_device(). It will be called from
vga_arbiter_add_pci_device() and sets the default device even if it
does not own the VGA resources, because an upstream bridge may not
implement PCI_BRIDGE_CTL_VGA. The default VGA device is updated if a
better one is found (a device with legacy resources enabled is better,
and a device that owns the firmware framebuffer is even better).
---
 drivers/pci/vgaarb.c | 158 ++-
 1 file changed, 66 insertions(+), 92 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index dd07b1c3205f..0b059a2fc749 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -580,16 +580,79 @@ static bool vga_arb_integrated_gpu(struct device *dev)
 static void vga_arb_update_default_device(struct vga_device *vgadev)
 {
struct pci_dev *pdev = vgadev->pdev;
+   struct device *dev = &pdev->dev;
+   struct vga_device *vgadev_default;
+#if defined(CONFIG_X86) || defined(CONFIG_IA64)
+   int i;
+   unsigned long flags;
+   u64 base = screen_info.lfb_base;
+   u64 size = screen_info.lfb_size;
+   u64 limit;
+   resource_size_t start, end;
+#endif
 
/*
 * If we don't have a default VGA device yet, and this device owns
 * the legacy VGA resources, make it the default.
 */
-   if (!vga_default_device() &&
-   ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK)) {
-   vgaarb_info(&pdev->dev, "setting as boot VGA device\n");
+   if (!vga_default_device()) {
+   if ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == 
VGA_RSRC_LEGACY_MASK)
+   vgaarb_info(dev, "setting as boot VGA device\n");
+   else
+   vgaarb_info(dev, "setting as boot device (VGA legacy 
resources not available)\n");
vga_set_default_device(pdev);
}
+
+   vgadev_default = vgadev_find(vga_default);
+
+   /* Overridden by a better device */
+   if (vgadev_default && ((vgadev_default->owns & VGA_RSRC_LEGACY_MASK) == 
0)
+   && ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == 
VGA_RSRC_LEGACY_MASK)) {
+   vgaarb_info(dev, "overriding boot VGA device\n");
+   vga_set_default_device(pdev);
+   }
+
+   if (vga_arb_integrated_gpu(dev)) {
+   vgaarb_info(dev, "overriding boot VGA device\n");
+   vga_set_default_device(pdev);
+   }
+
+#if defined(CONFIG_X86) || defined(CONFIG_IA64)
+   if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
+   base |= (u64)screen_info.ext_lfb_base << 32;
+
+   limit = base + size;
+
+   /*
+* Override vga_arbiter_add_pci_device()'s I/O based detection
+* as it may take the wrong device (e.g. on Apple system under
+* EFI).
+*
+* Select the device owning the boot framebuffer if there is
+* one.
+*/
+
+   /* Does firmware framebuffer belong to us? */
+   for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+   flags = pci_resource_flags(vgadev->pdev, i);
+
+   if ((flags & IORESOURCE_MEM) == 0)
+   continue;
+
+   start = pci_resource_start(vgadev->pdev, i);
+   end  = pci_resource_end(vgadev->pdev, i);
+
+   if (!start || !end)
+   continue;
+
+   if (base < start || limit >= end)
+   continue;
+
+   if (vgadev->pdev != vga_default_device())
+   vgaarb_info(dev, "overriding boot device\n");
+   vga_set_default_device(vgadev->pdev);
+   }
+#endif
 }
 

[PATCH V3 2/9] PCI/VGA: Replace full MIT license text with SPDX identifier

2021-08-20 Thread Huacai Chen
From: Bjorn Helgaas 

Per Documentation/process/license-rules.rst, the SPDX MIT identifier is
equivalent to including the entire MIT license text from
LICENSES/preferred/MIT.

Replace the MIT license text with the equivalent SPDX identifier.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 23 +--
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index 949fde433ea2..61b57abcb014 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -1,32 +1,11 @@
+// SPDX-License-Identifier: MIT
 /*
  * vgaarb.c: Implements the VGA arbitration. For details refer to
  * Documentation/gpu/vgaarbiter.rst
  *
- *
  * (C) Copyright 2005 Benjamin Herrenschmidt 
  * (C) Copyright 2007 Paulo R. Zanoni 
  * (C) Copyright 2007, 2009 Tiago Vignatti 
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS
- * IN THE SOFTWARE.
- *
  */
 
 #define pr_fmt(fmt) "vgaarb: " fmt
-- 
2.27.0



[PATCH V3 5/9] PCI/VGA: Move vga_arb_integrated_gpu() earlier in file

2021-08-20 Thread Huacai Chen
Move vga_arb_integrated_gpu() earlier in the file to prepare for a future patch.
No functional change intended.

[bhelgaas: split to separate patch]
Signed-off-by: Huacai Chen 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/vgaarb.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index c984c76b3fd7..1f8fb37be5fa 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -563,6 +563,20 @@ void vga_put(struct pci_dev *pdev, unsigned int rsrc)
 }
 EXPORT_SYMBOL(vga_put);
 
+#if defined(CONFIG_ACPI)
+static bool vga_arb_integrated_gpu(struct device *dev)
+{
+   struct acpi_device *adev = ACPI_COMPANION(dev);
+
+   return adev && !strcmp(acpi_device_hid(adev), ACPI_VIDEO_HID);
+}
+#else
+static bool vga_arb_integrated_gpu(struct device *dev)
+{
+   return false;
+}
+#endif
+
 /*
  * Rules for using a bridge to control a VGA descendant decoding: if a bridge
  * has only one VGA descendant then it can be used to control the VGA routing
@@ -1416,20 +1430,6 @@ static struct miscdevice vga_arb_device = {
MISC_DYNAMIC_MINOR, "vga_arbiter", &vga_arb_device_fops
 };
 
-#if defined(CONFIG_ACPI)
-static bool vga_arb_integrated_gpu(struct device *dev)
-{
-   struct acpi_device *adev = ACPI_COMPANION(dev);
-
-   return adev && !strcmp(acpi_device_hid(adev), ACPI_VIDEO_HID);
-}
-#else
-static bool vga_arb_integrated_gpu(struct device *dev)
-{
-   return false;
-}
-#endif
-
 static void __init vga_arb_select_default_device(void)
 {
struct pci_dev *pdev, *found = NULL;
-- 
2.27.0



Re: [PATCH v8 12/13] drm/mediatek: add MERGE support for mediatek-drm

2021-08-20 Thread Chun-Kuang Hu
Hi, Jason:

jason-jh.lin  於 2021年8月19日 週四 上午10:23寫道:
>
> Add MERGE engine file:
> MERGE module is used to merge two slice-per-line inputs
> into one side-by-side output.
>
> Signed-off-by: jason-jh.lin 
> ---
>  drivers/gpu/drm/mediatek/Makefile   |   1 +
>  drivers/gpu/drm/mediatek/mtk_disp_drv.h |   8 +
>  drivers/gpu/drm/mediatek/mtk_disp_merge.c   | 268 
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c |  16 ++
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h |   1 +
>  drivers/gpu/drm/mediatek/mtk_drm_drv.c  |   2 +
>  drivers/gpu/drm/mediatek/mtk_drm_drv.h  |   1 +
>  7 files changed, 297 insertions(+)
>  create mode 100644 drivers/gpu/drm/mediatek/mtk_disp_merge.c
>
> diff --git a/drivers/gpu/drm/mediatek/Makefile 
> b/drivers/gpu/drm/mediatek/Makefile
> index dc54a7a69005..538e0087a44c 100644
> --- a/drivers/gpu/drm/mediatek/Makefile
> +++ b/drivers/gpu/drm/mediatek/Makefile
> @@ -3,6 +3,7 @@
>  mediatek-drm-y := mtk_disp_ccorr.o \
>   mtk_disp_color.o \
>   mtk_disp_gamma.o \
> + mtk_disp_merge.o \
>   mtk_disp_ovl.o \
>   mtk_disp_rdma.o \
>   mtk_drm_crtc.o \
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_drv.h 
> b/drivers/gpu/drm/mediatek/mtk_disp_drv.h
> index cafd9df2d63b..f407cd9d873e 100644
> --- a/drivers/gpu/drm/mediatek/mtk_disp_drv.h
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_drv.h
> @@ -46,6 +46,14 @@ void mtk_gamma_set_common(void __iomem *regs, struct 
> drm_crtc_state *state);
>  void mtk_gamma_start(struct device *dev);
>  void mtk_gamma_stop(struct device *dev);
>
> +int mtk_merge_clk_enable(struct device *dev);
> +void mtk_merge_clk_disable(struct device *dev);
> +void mtk_merge_config(struct device *dev, unsigned int width,
> + unsigned int height, unsigned int vrefresh,
> + unsigned int bpc, struct cmdq_pkt *cmdq_pkt);
> +void mtk_merge_start(struct device *dev);
> +void mtk_merge_stop(struct device *dev);
> +
>  void mtk_ovl_bgclr_in_on(struct device *dev);
>  void mtk_ovl_bgclr_in_off(struct device *dev);
>  void mtk_ovl_bypass_shadow(struct device *dev);
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_merge.c 
> b/drivers/gpu/drm/mediatek/mtk_disp_merge.c
> new file mode 100644
> index ..ebcb646bde9c
> --- /dev/null
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_merge.c
> @@ -0,0 +1,268 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2021 MediaTek Inc.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "mtk_drm_ddp_comp.h"
> +#include "mtk_drm_drv.h"
> +#include "mtk_disp_drv.h"
> +
> +#define DISP_REG_MERGE_CTRL0x000
> +#define MERGE_EN   1
> +#define DISP_REG_MERGE_CFG_0   0x010
> +#define DISP_REG_MERGE_CFG_4   0x020
> +#define DISP_REG_MERGE_CFG_10  0x038
> +/* no swap */
> +#define SWAP_MODE  0
> +#define FLD_SWAP_MODE  GENMASK(4, 0)
> +#define DISP_REG_MERGE_CFG_12  0x040
> +#define CFG_10_10_1PI_2PO_BUF_MODE 6
> +#define CFG_10_10_2PI_2PO_BUF_MODE 8
> +#define FLD_CFG_MERGE_MODE GENMASK(4, 0)
> +#define DISP_REG_MERGE_CFG_24  0x070
> +#define DISP_REG_MERGE_CFG_25  0x074
> +#define DISP_REG_MERGE_CFG_36  0x0a0
> +#define ULTRA_EN   1

You could use FLD_ULTRA_EN for this.

> +#define PREULTRA_EN1
> +#define HALT_FOR_DVFS_EN   0

You could just not set this.

> +#define FLD_ULTRA_EN   GENMASK(0, 0)

#define FLD_ULTRA_EN BIT(0)

Regards,
Chun-Kuang.

> +#define FLD_PREULTRA_ENGENMASK(4, 4)
> +#define FLD_HALT_FOR_DVFS_EN   GENMASK(8, 8)
> +#define DISP_REG_MERGE_CFG_37  0x0a4
> +/* 0: Off, 1: SRAM0, 2: SRAM1, 3: SRAM0 + SRAM1 */
> +#define BUFFER_MODE3
> +#define FLD_BUFFER_MODEGENMASK(1, 0)
> +#define DISP_REG_MERGE_CFG_38  0x0a8
> +#define FLD_VDE_BLOCK_ULTRAGENMASK(0, 0)
> +#define FLD_VALID_TH_BLOCK_ULTRA   GENMASK(4, 4)
> +#define FLD_ULTRA_FIFO_VALID_THGENMASK(31, 16)
> +#define DISP_REG_MERGE_CFG_39  0x0ac
> +#define FLD_NVDE_FORCE_PREULTRAGENMASK(8, 8)
> +#define FLD_NVALID_TH_FORCE_PREULTRA   GENMASK(12, 12)
> +#define FLD_PREULTRA_FIFO_VALID_TH GENMASK(31, 16)


Re: [PATCH] drm/amd/pm: And destination bounds checking to struct copy

2021-08-20 Thread Alex Deucher
On Thu, Aug 19, 2021 at 4:14 PM Kees Cook  wrote:
>
> In preparation for FORTIFY_SOURCE performing compile-time and run-time
> field bounds checking for memcpy(), memmove(), and memset(), avoid
> intentionally writing across neighboring fields.
>
> The "Board Parameters" members of the structs:
> struct atom_smc_dpm_info_v4_5
> struct atom_smc_dpm_info_v4_6
> struct atom_smc_dpm_info_v4_7
> struct atom_smc_dpm_info_v4_10
> are written to the corresponding members of the corresponding PPTable_t
> variables, but they lack destination size bounds checking, which means
> the compiler cannot verify at compile time that this is an intended and
> safe memcpy().
>
> Since the header files are effectively immutable[1] and a struct_group()
> cannot be used, nor a common struct referenced by both sides of the
> memcpy() arguments, add a new helper, memcpy_trailing(), to perform the
> bounds checking at compile time. Replace the open-coded memcpy()s with
> memcpy_trailing() which includes enough context for the bounds checking.
>
> "objdump -d" shows no object code changes.
>
> [1] https://lore.kernel.org/lkml/e56aad3c-a06f-da07-f491-a894a570d...@amd.com
>
> Cc: Lijo Lazar 
> Cc: "Christian König" 
> Cc: "Pan, Xinhui" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Hawking Zhang 
> Cc: Feifei Xu 
> Cc: Likun Gao 
> Cc: Jiawei Gu 
> Cc: Evan Quan 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Kees Cook 
> Link: 
> https://lore.kernel.org/lkml/cadnq5_npb8uyvd+r4uhgf-w8-cqj3joodjvijr_y9w9wqj7...@mail.gmail.com
> ---
> Alex, I dropped your prior Acked-by, since the implementation is very
> different. If you're still happy with it, I can add it back. :)

This looks reasonable to me:
Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h   | 25 +++
>  .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c |  6 ++---
>  .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   |  8 +++---
>  .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c|  5 ++--
>  4 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 96e895d6be35..4605934a4fb7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1446,4 +1446,29 @@ static inline int amdgpu_in_reset(struct amdgpu_device 
> *adev)
>  {
> return atomic_read(&adev->in_gpu_reset);
>  }
> +
> +/**
> + * memcpy_trailing - Copy the end of one structure into the middle of another
> + *
> + * @dst: Pointer to destination struct
> + * @first_dst_member: The member name in @dst where the overwrite begins
> + * @last_dst_member: The member name in @dst where the overwrite ends after
> + * @src: Pointer to the source struct
> + * @first_src_member: The member name in @src where the copy begins
> + *
> + */
> +#define memcpy_trailing(dst, first_dst_member, last_dst_member,  
>  \
> +   src, first_src_member) \
> +({\
> +   size_t __src_offset = offsetof(typeof(*(src)), first_src_member);  \
> +   size_t __src_size = sizeof(*(src)) - __src_offset; \
> +   size_t __dst_offset = offsetof(typeof(*(dst)), first_dst_member);  \
> +   size_t __dst_size = offsetofend(typeof(*(dst)), last_dst_member) - \
> +   __dst_offset;  \
> +   BUILD_BUG_ON(__src_size != __dst_size);\
> +   __builtin_memcpy((u8 *)(dst) + __dst_offset,   \
> +(u8 *)(src) + __src_offset,   \
> +__dst_size);  \
> +})
> +
>  #endif
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> index 8ab58781ae13..1918e6232319 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> @@ -465,10 +465,8 @@ static int arcturus_append_powerplay_table(struct 
> smu_context *smu)
>
> if ((smc_dpm_table->table_header.format_revision == 4) &&
> (smc_dpm_table->table_header.content_revision == 6))
> -   memcpy(&smc_pptable->MaxVoltageStepGfx,
> -  &smc_dpm_table->maxvoltagestepgfx,
> -  sizeof(*smc_dpm_table) - offsetof(struct 
> atom_smc_dpm_info_v4_6, maxvoltagestepgfx));
> -
> +   memcpy_trailing(smc_pptable, MaxVoltageStepGfx, BoardReserved,
> +   smc_dpm_table, maxvoltagestepgfx);
> return 0;
>  }
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> index 2e5d3669652b..b738042e064d 100644
> --- a/drivers/gpu/drm/amd/pm/sws

Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event

2021-08-20 Thread Jiri Kosina
On Fri, 20 Aug 2021, Kees Cook wrote:

> > > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > > field bounds checking for memset(), avoid intentionally writing across
> > > neighboring fields.
> > >
> > > Add struct_group() to mark region of struct kone_mouse_event that should
> > > be initialized to zero.
> > >
> > > Cc: Stefan Achatz 
> > > Cc: Jiri Kosina 
> > > Cc: Benjamin Tissoires 
> > > Cc: linux-in...@vger.kernel.org
> > > Signed-off-by: Kees Cook 
> >
> > Applied, thank you Kees.
> >
> 
> Eek! No, this will break the build: struct_group() is not yet in the tree.
> I can carry this with an Ack, etc.

I was pretty sure I saw struct_group() already in linux-next, but that was 
apparently a vacation-induced brainfart, sorry. Dropping.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH v2 22/63] HID: cp2112: Use struct_group() for memcpy() region

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021, 6:01 AM Jiri Kosina  wrote:

> On Tue, 17 Aug 2021, Kees Cook wrote:
>
> > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > field bounds checking for memcpy(), memmove(), and memset(), avoid
> > intentionally writing across neighboring fields.
> >
> > Use struct_group() in struct cp2112_string_report around members report,
> > length, type, and string, so they can be referenced together. This will
> > allow memcpy() and sizeof() to more easily reason about sizes, improve
> > readability, and avoid future warnings about writing beyond the end of
> > report.
> >
> > "pahole" shows no size nor member offset changes to struct
> > cp2112_string_report.  "objdump -d" shows no meaningful object
> > code changes (i.e. only source line number induced differences.)
> >
> > Cc: Jiri Kosina 
> > Cc: Benjamin Tissoires 
> > Cc: linux-in...@vger.kernel.org
> > Signed-off-by: Kees Cook 
>
> Applied, thanks.
>

Same for this one: it's part of the larger series.

-Kees

>


Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event

2021-08-20 Thread Kees Cook
On Fri, Aug 20, 2021, 6:02 AM Jiri Kosina  wrote:

> On Tue, 17 Aug 2021, Kees Cook wrote:
>
> > In preparation for FORTIFY_SOURCE performing compile-time and run-time
> > field bounds checking for memset(), avoid intentionally writing across
> > neighboring fields.
> >
> > Add struct_group() to mark region of struct kone_mouse_event that should
> > be initialized to zero.
> >
> > Cc: Stefan Achatz 
> > Cc: Jiri Kosina 
> > Cc: Benjamin Tissoires 
> > Cc: linux-in...@vger.kernel.org
> > Signed-off-by: Kees Cook 
>
> Applied, thank you Kees.
>

Eek! No, this will break the build: struct_group() is not yet in the tree.
I can carry this with an Ack, etc.

-Kees


Re: [RFC] Make use of non-dynamic dmabuf in RDMA

2021-08-20 Thread Jason Gunthorpe
On Fri, Aug 20, 2021 at 03:58:33PM +0300, Gal Pressman wrote:

> Though it would've been nicer if we could agree on a solution that could work
> for more than 1-2 RDMA devices, using the existing tools the RDMA subsystem 
> has.

I don't think it can really be done, revoke is necessary, and isn't a
primitive we have today.

Revoke is sort of like rereg MR, but with a guaranteed no-change to
the lkey/rkey

Then there is the locking complexity of linking the mr creation and
destruction to the lifecycle of the pages, which is messy and maybe
not general. For instance mlx5 would call its revoke_mr, disconnect
the dmabuf then destroy the mkey - but this is only safe because mlx5
HW can handle concurrent revokes.

> That's why I tried to approach this by denying such attachments for non-ODP
> importers instead of exposing a "limited" dynamic importer.

That is fine if there is no revoke - once revoke exists we must have
driver and HW support.

Jason


Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-20 Thread Andrey Grodzovsky



On 2021-08-20 3:12 a.m., Liu, Monk wrote:

[AMD Official Use Only]

@Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian
  
Do you have any concern on the kthread_park() approach ?


Theoretically speaking, sched_main shall run there exclusively with job_timeout 
since they both touch jobs, and stopping the scheduler during job_timeout won't 
impact performance, since in that scenario
there was already something wrong/stuck on that ring/scheduler



Regarding the last paragraph, and specifically the claim that there was 
already something wrong if the TO handler
starts execution - I am not sure about this, and I wonder if we have a 
potential bug here: when we start the timeout timer in
drm_sched_job_begin we do it for each new incoming job. In a constant 
rapid stream of jobs, each new job coming in
will try to start the timer, but most of the time this operation just 
bails out because there is already a pending timer from one
of the previous jobs which cancels out any new ones [1]. So, when the TO 
handler does execute eventually, it's not
because something is wrong but simply because the TO has expired. If in this 
case the pending list is not empty, a false
TDR will be triggered. I think long ago we used a TO handler per job and 
not per scheduler; that would solve this problem
but hurt the serialization issue we are trying to solve. So I am not sure 
what to do.


[1] - 
https://elixir.bootlin.com/linux/v5.14-rc1/source/kernel/workqueue.c#L1665


Andrey



Thanks

--
Monk Liu | Cloud-GPU Core team
--

-Original Message-
From: Liu, Monk
Sent: Thursday, August 19, 2021 6:26 PM
To: Daniel Vetter ; Grodzovsky, Andrey 

Cc: Alex Deucher ; Chen, JingWen ; Maling list - 
DRI developers ; amd-gfx list ; 
Koenig, Christian 
Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

[AMD Official Use Only]

Hi Daniel


Why can't we stop the scheduler thread first, so that there's guaranteed no 
race? I've recently had a lot of discussions with panfrost folks about their 
reset that spawns across engines, and without stopping the scheduler thread 
first before you touch anything it's just plain impossible.

Yeah, we had this thought as well in our mind.

Our second approach is to call kthread_stop() in the job_timedout() routine so that the 
"bad" job is guaranteed to be used without the scheduler touching or freeing it. 
Please check this sample patch as well:

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..50a49cb 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
 sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
  
 /* Protects against concurrent deletion in drm_sched_get_cleanup_job */

+   kthread_park(sched->thread);
 spin_lock(&sched->job_list_lock);
 job = list_first_entry_or_null(&sched->pending_list,
struct drm_sched_job, list);
  
 if (job) {

-   /*
-* Remove the bad job so it cannot be freed by concurrent
-* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
-* is parked at which point it's safe.
-*/
-   list_del_init(&job->list);
 spin_unlock(&sched->job_list_lock);
  
 status = job->sched->ops->timedout_job(job);

@@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 } else {
 spin_unlock(&sched->job_list_lock);
 }
+   kthread_unpark(sched->thread);
  
 if (status != DRM_GPU_SCHED_STAT_ENODEV) {

 spin_lock(&sched->job_list_lock); @@ -393,20 +389,6 @@ void 
drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 kthread_park(sched->thread);
  
 /*

-* Reinsert back the bad job here - now it's safe as
-* drm_sched_get_cleanup_job cannot race against us and release the
-* bad job at this point - we parked (waited for) any in progress
-* (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-* now until the scheduler thread is unparked.
-*/
-   if (bad && bad->sched == sched)
-   /*
-* Add at the head of the queue to reflect it was the earliest
-* job extracted.
-*/
-   list_add(&bad->list, &sched->pending_list);
-
-   /*
  * Iterate the job list from later to  earlier one and either deactive
  * their HW callbacks or remove them from pending list if they already
  * signaled.


Thanks

--
Monk Liu | Cloud-GPU Core team
--

-Original Message-

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-20 Thread Andrey Grodzovsky

I believe we have some minor confusion here

On 2021-08-20 4:09 a.m., Jingwen Chen wrote:

Hi all,

I just submitted a v3 patch according to your opinion on using kthread_park
instead.

Thanks,
Jingwen
On Fri Aug 20, 2021 at 09:20:42AM +0200, Christian König wrote:

No, that perfectly works for me.

The problem we used to have with this approach was that we potentially have
multiple timeouts at the same time.

But when we serialize the timeout handling by using a single workqueue as
suggested by Daniel now as well then that isn't an issue any more.



While we do use a single work queue by default (system_wq) for this, we 
use different
work items, one per scheduler, which means they still run in parallel. I 
didn't see the original
mail by Daniel, but from what Christian mentioned I assume he suggested 
serializing all TO handlers
from all possible engines by either using a single work item for the TO 
handler or by using a single-threaded queue for all TO handlers.
So I believe it's premature to send the V3 patch without also switching all 
TDR handling to actual single-threaded
handling per entire ASIC — or, in the case of amdgpu, we actually need to 
consider XGMI hives, so it goes beyond a single

device.

Andrey




Regards,
Christian.

Am 20.08.21 um 09:12 schrieb Liu, Monk:

[AMD Official Use Only]

@Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian
Do you have any concern on the kthread_park() approach ?

Theoretically speaking sched_main shall run there exclusively with job_timeout 
since they both touches jobs, and stop scheduler during job_timeout won't 
impact performance since in that scenario
There was already something wrong/stuck on that ring/scheduler

Thanks

--
Monk Liu | Cloud-GPU Core team
--

-Original Message-
From: Liu, Monk
Sent: Thursday, August 19, 2021 6:26 PM
To: Daniel Vetter ; Grodzovsky, Andrey 

Cc: Alex Deucher ; Chen, JingWen ; Maling list - 
DRI developers ; amd-gfx list ; 
Koenig, Christian 
Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

[AMD Official Use Only]

Hi Daniel


Why can't we stop the scheduler thread first, so that there's guaranteed no 
race? I've recently had a lot of discussions with panfrost folks about their 
reset that spawns across engines, and without stopping the scheduler thread 
first before you touch anything it's just plain impossible.

Yeah, we had this thought as well in our mind.

Our second approach is to call kthread_stop() in the job_timedout() routine so that the 
"bad" job is guaranteed to be used without the scheduler touching or freeing it. 
Please check this sample patch as well:

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..50a49cb 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
  sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
  /* Protects against concurrent deletion in drm_sched_get_cleanup_job 
*/
+   kthread_park(sched->thread);
  spin_lock(&sched->job_list_lock);
  job = list_first_entry_or_null(&sched->pending_list,
 struct drm_sched_job, list);
  if (job) {
-   /*
-* Remove the bad job so it cannot be freed by concurrent
-* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
-* is parked at which point it's safe.
-*/
-   list_del_init(&job->list);
  spin_unlock(&sched->job_list_lock);
  status = job->sched->ops->timedout_job(job);
@@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
  } else {
  spin_unlock(&sched->job_list_lock);
  }
+   kthread_unpark(sched->thread);
  if (status != DRM_GPU_SCHED_STAT_ENODEV) {
  spin_lock(&sched->job_list_lock); @@ -393,20 +389,6 @@ void 
drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
  kthread_park(sched->thread);
  /*
-* Reinsert back the bad job here - now it's safe as
-* drm_sched_get_cleanup_job cannot race against us and release the
-* bad job at this point - we parked (waited for) any in progress
-* (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-* now until the scheduler thread is unparked.
-*/
-   if (bad && bad->sched == sched)
-   /*
-* Add at the head of the queue to reflect it was the earliest
-* job extracted.
-*/
-   list_add(&bad->list, &sched->pending_list);
-
-   /*
   * Iterate the job list from later to  earlier one and either deactive
   * their HW ca

Re: [PATCH v8 07/34] clk: tegra: Support runtime PM and power domain

2021-08-20 Thread Ulf Hansson
[...]

> >
> > I'm creating platform device for the clocks that require DVFS. These
> > clocks don't use regulator, they are attached to the CORE domain.
> > GENPD framework manages the performance state, aggregating perf votes
> > from each device, i.e. from each clock individually.
> >
> > You want to reinvent another layer of aggregation on top of GENPD.
> > This doesn't worth the effort, we won't get anything from it, it
> > should be a lot of extra complexity for nothing. We will also lose
> > from it because pm_genpd_summary won't show you a per-device info.
> >
> > domain  status  children
> >performance
> > /device runtime status
> > --
> > heg on  
> >100
> > /devices/soc0/5000.host1x   active  
> >100
> > /devices/soc0/5000.host1x/5414.gr2d suspended   
> >0
> > mpe off-0   
> >0
> > vdecoff-0   
> >0
> > /devices/soc0/6001a000.vde  suspended   
> >0
> > vencoff-0   
> >0
> > 3d1 off-0   
> >0
> > /devices/genpd:1:5418.gr3d  suspended   
> >0
> > 3d0 off-0   
> >0
> > /devices/genpd:0:5418.gr3d  suspended   
> >0
> > core-domain on  
> >100
> > 3d0, 3d1, venc, vdec, mpe, 
> > heg
> > /devices/soc0/7d00.usb  active  
> >100
> > /devices/soc0/78000400.mmc  active  
> >95
> > /devices/soc0/7000f400.memory-controllerunsupported 
> >100
> > /devices/soc0/7000a000.pwm  active  
> >100
> > /devices/soc0/60006000.clock/tegra_clk_pll_cactive  
> >100
> > /devices/soc0/60006000.clock/tegra_clk_pll_esuspended   
> >0
> > /devices/soc0/60006000.clock/tegra_clk_pll_mactive  
> >100
> > /devices/soc0/60006000.clock/tegra_clk_sclk active  
> >100
> >
>
> I suppose if there's really no good way of doing this other than
> providing a struct device, then so be it. I think the cleaned up sysfs
> shown in the summary above looks much better than what the original
> would've looked like.
>
> Perhaps an additional tweak to that would be to not create platform
> devices. Instead, just create struct device. Those really have
> everything you need (.of_node, and can be used with RPM and GENPD). As I
> mentioned earlier, platform device implies a CPU-memory-mapped bus,
> which this clearly isn't. It's kind of a separate "bus" if you want, so
> just using struct device directly seems more appropriate.

Just a heads up. If you don't use a platform device or have a driver
associated with it for probing, you need to manage the attachment to
genpd yourself. That means calling one of the dev_pm_domain_attach*()
APIs, but that's perfectly fine, of course.

>
> We did something similar for XUSB pads, see drivers/phy/tegra/xusb.[ch]
> for an example of how that was done. I think you can do something
> similar here.
>
> Thierry

Kind regards
Uffe


Re: [PATCH v2 55/63] HID: roccat: Use struct_group() to zero kone_mouse_event

2021-08-20 Thread Jiri Kosina
On Tue, 17 Aug 2021, Kees Cook wrote:

> In preparation for FORTIFY_SOURCE performing compile-time and run-time
> field bounds checking for memset(), avoid intentionally writing across
> neighboring fields.
> 
> Add struct_group() to mark region of struct kone_mouse_event that should
> be initialized to zero.
> 
> Cc: Stefan Achatz 
> Cc: Jiri Kosina 
> Cc: Benjamin Tissoires 
> Cc: linux-in...@vger.kernel.org
> Signed-off-by: Kees Cook 

Applied, thank you Kees.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH v2 22/63] HID: cp2112: Use struct_group() for memcpy() region

2021-08-20 Thread Jiri Kosina
On Tue, 17 Aug 2021, Kees Cook wrote:

> In preparation for FORTIFY_SOURCE performing compile-time and run-time
> field bounds checking for memcpy(), memmove(), and memset(), avoid
> intentionally writing across neighboring fields.
> 
> Use struct_group() in struct cp2112_string_report around members report,
> length, type, and string, so they can be referenced together. This will
> allow memcpy() and sizeof() to more easily reason about sizes, improve
> readability, and avoid future warnings about writing beyond the end of
> report.
> 
> "pahole" shows no size nor member offset changes to struct
> cp2112_string_report.  "objdump -d" shows no meaningful object
> code changes (i.e. only source line number induced differences.)
> 
> Cc: Jiri Kosina 
> Cc: Benjamin Tissoires 
> Cc: linux-in...@vger.kernel.org
> Signed-off-by: Kees Cook 

Applied, thanks.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH v8 01/34] opp: Add dev_pm_opp_sync() helper

2021-08-20 Thread Ulf Hansson
On Fri, 20 Aug 2021 at 07:18, Viresh Kumar  wrote:
>
> On 19-08-21, 16:55, Ulf Hansson wrote:
> > Right, that sounds reasonable.
> >
> > We already have pm_genpd_opp_to_performance_state() which translates
> > an OPP to a performance state. This function invokes the
> > ->opp_to_performance_state() for a genpd. Maybe we need to allow a
> > genpd to not have ->opp_to_performance_state() callback assigned
> > though, but continue up in the hierarchy to see if the parent has the
> > callback assigned, to make this work for Tegra?
> >
> > Perhaps we should add an API dev_pm_genpd_opp_to_performance_state(),
> > allowing us to pass the device instead of the genpd. But that's a
> > minor thing.
>
> I am not concerned a lot about how it gets implemented, and am not
> sure as well, as I haven't looked into these details for some time.
> Any reasonable thing will be accepted, as simple as that.
>
> > Finally, the precondition to use the above, is to first get a handle
> > to an OPP table. This is where I am struggling to find a generic
> > solution, because I guess that would be platform or even consumer
> > driver specific for how to do this. And at what point should we do
> > this?
>
> Hmm, I am not very clear with the whole picture at this point of time.
>
> Dmitry, can you try to frame a sequence of events/calls/etc that will
> define what kind of devices we are looking at here, and how this can
> be made to work ?
>
> > > > Viresh, please take a look at what I did in [1]. Maybe it could be done
> > > > in another way.
> > >
> > > I looked into this and looked like too much trouble. The
> > > implementation needs to be simple. I am not sure I understand all the
> > > problems you faced while doing that, would be better to start with a
> > > simpler implementation of get_performance_state() kind of API for
> > > genpd, after the domain is attached and its OPP table is initialized.
> > >
> > > Note, that the OPP table isn't required to be fully initialized for
> > > the device at this point, we can parse the DT as well if needed be.
> >
> > Sure, but as I indicated above, you need some kind of input data to
> > figure out what OPP table to pick, before you can translate that into
> > a performance state. Is that always the clock rate, for example?
>
> Eventually it can be clock, bandwidth, or pstate of another genpd, not
> sure what all we are looking for now. It should be just clock right
> now as far as I can imagine :)
>
> > Perhaps, we should start with adding a dev_pm_opp_get_from_rate() or
> > what do you think? Do you have other suggestions?
>
> We already have similar APIs, so that won't be a problem. We also have
> a mechanism inside the OPP core, frequency based, which is used to
> guess the current OPP. Maybe we can enhance and use that directly
> here.

After reading the last reply from Dmitry, I am starting to think that
the problem he is facing can be described and solved in a much easier
way.

If I am correct, it looks like we don't need to add APIs to get OPPs
for a clock rate or set initial performance state values according to
the HW in genpd.

See my other response to Dmitry, let's see where that leads us.

Kind regards
Uffe


  1   2   >