Re: [Intel-gfx] [PATCH v2] drm/i915: Fix a VMA UAF for multi-gt platform

2023-06-07 Thread Nirmoy Das



On 6/7/2023 8:20 AM, Andrzej Hajda wrote:



On 06.06.2023 22:27, Nirmoy Das wrote:

Ensure correct handling of closed VMAs on multi-gt platforms to prevent
Use-After-Free. Currently, when GT0 goes idle, closed VMAs that are
exclusively added to GT0's closed_vma link (gt->closed_vma) and
subsequently freed by i915_vma_parked(), which assumes the entire GPU is
idle. However, on platforms with multiple GTs, such as MTL, GT1 may
remain active while GT0 is idle. This causes GT0 to mistakenly consider
the closed VMAs in its closed_vma list as unnecessary, potentially
leading to Use-After-Free issues if a job for GT1 attempts to access a
freed VMA.

Although we do take a wakeref for GT0 but it happens later, after
evaluating VMAs. To mitigate this, it is necessary to hold a GT0 wakeref
early.

v2: Use gt id to detect multi-tile(Andi)
 Fix the incorrect error path.

Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
Cc: Thomas Hellström 
Cc: Chris Wilson 
Cc: Andi Shyti 
Cc: Andrzej Hajda 
Cc: Sushma Venkatesh Reddy 
Tested-by: Andi Shyti 
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c

index 3aeede6aee4d..c2a67435acfa 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2683,6 +2683,7 @@ static int
  eb_select_engine(struct i915_execbuffer *eb)
  {
  struct intel_context *ce, *child;
+    struct intel_gt *gt;
  unsigned int idx;
  int err;
  @@ -2706,10 +2707,16 @@ eb_select_engine(struct i915_execbuffer *eb)
  }
  }
  eb->num_batches = ce->parallel.number_children + 1;
+    gt = ce->engine->gt;
    for_each_child(ce, child)
  intel_context_get(child);
  intel_gt_pm_get(ce->engine->gt);


intel_gt_pm_get(gt)



+    /* Keep GT0 active on MTL so that i915_vma_parked() doesn't
+ * free VMAs while execbuf ioctl is validating VMAs.
+ */
+    if (gt->info.id)
+    intel_gt_pm_get(to_gt(gt->i915));
    if (!test_bit(CONTEXT_ALLOC_BIT, >flags)) {
  err = intel_context_alloc_state(ce);
@@ -2748,6 +2755,9 @@ eb_select_engine(struct i915_execbuffer *eb)
  return err;
    err:
+    if (gt->info.id)
+    intel_gt_pm_put(to_gt(gt->i915));
+
  intel_gt_pm_put(ce->engine->gt);


intel_gt_pm_put(gt)


Will change both.


Reviewed-by: Andrzej Hajda 



Thanks,

Nirmoy



Regards
Andrzej


  for_each_child(ce, child)
  intel_context_put(child);
@@ -2761,6 +2771,8 @@ eb_put_engine(struct i915_execbuffer *eb)
  struct intel_context *child;
    i915_vm_put(eb->context->vm);
+    if (eb->gt->info.id)
+    intel_gt_pm_put(to_gt(eb->gt->i915));
  intel_gt_pm_put(eb->gt);
  for_each_child(eb->context, child)
  intel_context_put(child);




Re: [Intel-gfx] [PATCH v2] drm/i915: Fix a VMA UAF for multi-gt platform

2023-06-07 Thread Nirmoy Das



On 6/6/2023 10:56 PM, Andi Shyti wrote:

Hi Nirmoy,

On Tue, Jun 06, 2023 at 10:27:55PM +0200, Nirmoy Das wrote:

Ensure correct handling of closed VMAs on multi-gt platforms to prevent
Use-After-Free. Currently, when GT0 goes idle, closed VMAs that are
exclusively added to GT0's closed_vma link (gt->closed_vma) and
subsequently freed by i915_vma_parked(), which assumes the entire GPU is
idle. However, on platforms with multiple GTs, such as MTL, GT1 may
remain active while GT0 is idle. This causes GT0 to mistakenly consider
the closed VMAs in its closed_vma list as unnecessary, potentially
leading to Use-After-Free issues if a job for GT1 attempts to access a
freed VMA.

Although we do take a wakeref for GT0 but it happens later, after
evaluating VMAs. To mitigate this, it is necessary to hold a GT0 wakeref
early.

v2: Use gt id to detect multi-tile(Andi)
 Fix the incorrect error path.

Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
Cc: Thomas Hellström 
Cc: Chris Wilson 
Cc: Andi Shyti 
Cc: Andrzej Hajda 
Cc: Sushma Venkatesh Reddy 
Tested-by: Andi Shyti 
Signed-off-by: Nirmoy Das 

I wonder if we need any Fixes tag here, maybe this:

Fixes: d93939730347 ("drm/i915: Remove the vma refcount")


No, vma refcount is not enough unfortunately. Issue is i915_vma_parked() 
expects only one GT.






---
  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 3aeede6aee4d..c2a67435acfa 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2683,6 +2683,7 @@ static int
  eb_select_engine(struct i915_execbuffer *eb)
  {
struct intel_context *ce, *child;
+   struct intel_gt *gt;
unsigned int idx;
int err;
  
@@ -2706,10 +2707,16 @@ eb_select_engine(struct i915_execbuffer *eb)

}
}
eb->num_batches = ce->parallel.number_children + 1;
+   gt = ce->engine->gt;
  
  	for_each_child(ce, child)

intel_context_get(child);
intel_gt_pm_get(ce->engine->gt);
+   /* Keep GT0 active on MTL so that i915_vma_parked() doesn't
+* free VMAs while execbuf ioctl is validating VMAs.
+*/
+   if (gt->info.id)
+   intel_gt_pm_get(to_gt(gt->i915));
  
  	if (!test_bit(CONTEXT_ALLOC_BIT, >flags)) {

err = intel_context_alloc_state(ce);
@@ -2748,6 +2755,9 @@ eb_select_engine(struct i915_execbuffer *eb)
return err;
  
  err:

+   if (gt->info.id)
+   intel_gt_pm_put(to_gt(gt->i915));
+
intel_gt_pm_put(ce->engine->gt);
for_each_child(ce, child)
intel_context_put(child);
@@ -2761,6 +2771,8 @@ eb_put_engine(struct i915_execbuffer *eb)
struct intel_context *child;
  
  	i915_vm_put(eb->context->vm);

+   if (eb->gt->info.id)
+   intel_gt_pm_put(to_gt(eb->gt->i915));
intel_gt_pm_put(eb->gt);

I would add a comment up here, just not to scare people when they
see this.



I can add a comment pairing comment mentioning

eb_select_engine().


Reviewed-by: Andi Shyti 



Thanks,

Nirmoy



Andi


for_each_child(eb->context, child)
intel_context_put(child);
--
2.39.0


Re: [Intel-gfx] [PATCH v2] drm/i915: Fix a VMA UAF for multi-gt platform

2023-06-07 Thread Andrzej Hajda




On 06.06.2023 22:27, Nirmoy Das wrote:

Ensure correct handling of closed VMAs on multi-gt platforms to prevent
Use-After-Free. Currently, when GT0 goes idle, closed VMAs that are
exclusively added to GT0's closed_vma link (gt->closed_vma) and
subsequently freed by i915_vma_parked(), which assumes the entire GPU is
idle. However, on platforms with multiple GTs, such as MTL, GT1 may
remain active while GT0 is idle. This causes GT0 to mistakenly consider
the closed VMAs in its closed_vma list as unnecessary, potentially
leading to Use-After-Free issues if a job for GT1 attempts to access a
freed VMA.

Although we do take a wakeref for GT0 but it happens later, after
evaluating VMAs. To mitigate this, it is necessary to hold a GT0 wakeref
early.

v2: Use gt id to detect multi-tile(Andi)
 Fix the incorrect error path.

Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
Cc: Thomas Hellström 
Cc: Chris Wilson 
Cc: Andi Shyti 
Cc: Andrzej Hajda 
Cc: Sushma Venkatesh Reddy 
Tested-by: Andi Shyti 
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 3aeede6aee4d..c2a67435acfa 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2683,6 +2683,7 @@ static int
  eb_select_engine(struct i915_execbuffer *eb)
  {
struct intel_context *ce, *child;
+   struct intel_gt *gt;
unsigned int idx;
int err;
  
@@ -2706,10 +2707,16 @@ eb_select_engine(struct i915_execbuffer *eb)

}
}
eb->num_batches = ce->parallel.number_children + 1;
+   gt = ce->engine->gt;
  
  	for_each_child(ce, child)

intel_context_get(child);
intel_gt_pm_get(ce->engine->gt);


intel_gt_pm_get(gt)



+   /* Keep GT0 active on MTL so that i915_vma_parked() doesn't
+* free VMAs while execbuf ioctl is validating VMAs.
+*/
+   if (gt->info.id)
+   intel_gt_pm_get(to_gt(gt->i915));
  
  	if (!test_bit(CONTEXT_ALLOC_BIT, >flags)) {

err = intel_context_alloc_state(ce);
@@ -2748,6 +2755,9 @@ eb_select_engine(struct i915_execbuffer *eb)
return err;
  
  err:

+   if (gt->info.id)
+   intel_gt_pm_put(to_gt(gt->i915));
+
intel_gt_pm_put(ce->engine->gt);


intel_gt_pm_put(gt)


Reviewed-by: Andrzej Hajda 

Regards
Andrzej


for_each_child(ce, child)
intel_context_put(child);
@@ -2761,6 +2771,8 @@ eb_put_engine(struct i915_execbuffer *eb)
struct intel_context *child;
  
  	i915_vm_put(eb->context->vm);

+   if (eb->gt->info.id)
+   intel_gt_pm_put(to_gt(eb->gt->i915));
intel_gt_pm_put(eb->gt);
for_each_child(eb->context, child)
intel_context_put(child);




Re: [Intel-gfx] [PATCH v2] drm/i915: Fix a VMA UAF for multi-gt platform

2023-06-06 Thread Andi Shyti
Hi Nirmoy,

On Tue, Jun 06, 2023 at 10:27:55PM +0200, Nirmoy Das wrote:
> Ensure correct handling of closed VMAs on multi-gt platforms to prevent
> Use-After-Free. Currently, when GT0 goes idle, closed VMAs that are
> exclusively added to GT0's closed_vma link (gt->closed_vma) and
> subsequently freed by i915_vma_parked(), which assumes the entire GPU is
> idle. However, on platforms with multiple GTs, such as MTL, GT1 may
> remain active while GT0 is idle. This causes GT0 to mistakenly consider
> the closed VMAs in its closed_vma list as unnecessary, potentially
> leading to Use-After-Free issues if a job for GT1 attempts to access a
> freed VMA.
> 
> Although we do take a wakeref for GT0 but it happens later, after
> evaluating VMAs. To mitigate this, it is necessary to hold a GT0 wakeref
> early.
> 
> v2: Use gt id to detect multi-tile(Andi)
> Fix the incorrect error path.
> 
> Cc: Joonas Lahtinen 
> Cc: Rodrigo Vivi 
> Cc: Tvrtko Ursulin 
> Cc: Thomas Hellström 
> Cc: Chris Wilson 
> Cc: Andi Shyti 
> Cc: Andrzej Hajda 
> Cc: Sushma Venkatesh Reddy 
> Tested-by: Andi Shyti 
> Signed-off-by: Nirmoy Das 

I wonder if we need any Fixes tag here, maybe this:

Fixes: d93939730347 ("drm/i915: Remove the vma refcount")

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 3aeede6aee4d..c2a67435acfa 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2683,6 +2683,7 @@ static int
>  eb_select_engine(struct i915_execbuffer *eb)
>  {
>   struct intel_context *ce, *child;
> + struct intel_gt *gt;
>   unsigned int idx;
>   int err;
>  
> @@ -2706,10 +2707,16 @@ eb_select_engine(struct i915_execbuffer *eb)
>   }
>   }
>   eb->num_batches = ce->parallel.number_children + 1;
> + gt = ce->engine->gt;
>  
>   for_each_child(ce, child)
>   intel_context_get(child);
>   intel_gt_pm_get(ce->engine->gt);
> + /* Keep GT0 active on MTL so that i915_vma_parked() doesn't
> +  * free VMAs while execbuf ioctl is validating VMAs.
> +  */
> + if (gt->info.id)
> + intel_gt_pm_get(to_gt(gt->i915));
>  
>   if (!test_bit(CONTEXT_ALLOC_BIT, >flags)) {
>   err = intel_context_alloc_state(ce);
> @@ -2748,6 +2755,9 @@ eb_select_engine(struct i915_execbuffer *eb)
>   return err;
>  
>  err:
> + if (gt->info.id)
> + intel_gt_pm_put(to_gt(gt->i915));
> +
>   intel_gt_pm_put(ce->engine->gt);
>   for_each_child(ce, child)
>   intel_context_put(child);
> @@ -2761,6 +2771,8 @@ eb_put_engine(struct i915_execbuffer *eb)
>   struct intel_context *child;
>  
>   i915_vm_put(eb->context->vm);
> + if (eb->gt->info.id)
> + intel_gt_pm_put(to_gt(eb->gt->i915));
>   intel_gt_pm_put(eb->gt);

I would add a comment up here, just not to scare people when they
see this.

Reviewed-by: Andi Shyti  

Andi

>   for_each_child(eb->context, child)
>   intel_context_put(child);
> -- 
> 2.39.0