Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
Hi Ville,

On 11/2/2022 6:55 PM, Ville Syrjälä wrote:
> On Mon, Oct 24, 2022 at 10:08:29AM +0200, Das, Nirmoy wrote:
>> On 10/21/2022 6:34 PM, Ville Syrjälä wrote:
>>> On Fri, Sep 23, 2022 at 09:35:14AM +0200, Nirmoy Das wrote:
>>>> i915_gem_drain_freed_objects() might not be enough to
>>>> free all the objects and RCU delayed work might get
>>>> scheduled after the i915 device struct gets freed.
>>>>
>>>> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
>>> shard-snb is still hitting the mm.shrink_count WARN reliably,
>>> and things go downhill after that.
>>
>> Looks better now again. Going to look into that.
>
> Looks to be still hitting it occasionally in module reload tests:
>
> https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7033/shard-snb5/igt@i915_module_l...@reload.html
> https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7035/shard-snb7/igt@perf_...@module-unload.html

There are no snb in RIl so I ran this test on tgl-u for 6+ hours without
any reproduction. Not sure why snb is so special here. Maybe we need your
previous patch as well?

I will be on vacation from next week so unfortunately I won't be able to
work on it for a few weeks.

Regards,
Nirmoy

>> Thanks,
>> Nirmoy
>>
>>>> Suggested-by: Chris Wilson
>>>> Acked-by: Tvrtko Ursulin
>>>> Signed-off-by: Nirmoy Das
>>>> ---
>>>>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>> index 88df9a35e0fe..7541028caebd 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>> @@ -1278,7 +1278,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>>>>
>>>>  void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>>>>  {
>>>> -	i915_gem_drain_freed_objects(dev_priv);
>>>> +	i915_gem_drain_workqueue(dev_priv);
>>>>  	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>>>>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>>>>  	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
>>>> --
>>>> 2.37.3
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
On Mon, Oct 24, 2022 at 10:08:29AM +0200, Das, Nirmoy wrote:
> On 10/21/2022 6:34 PM, Ville Syrjälä wrote:
> > On Fri, Sep 23, 2022 at 09:35:14AM +0200, Nirmoy Das wrote:
> >> i915_gem_drain_freed_objects() might not be enough to
> >> free all the objects and RCU delayed work might get
> >> scheduled after the i915 device struct gets freed.
> >>
> >> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
> > shard-snb is still hitting the mm.shrink_count WARN reliably,
> > and things go downhill after that.
>
> Looks better now again. Going to look into that.

Looks to be still hitting it occasionally in module reload tests:

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7033/shard-snb5/igt@i915_module_l...@reload.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7035/shard-snb7/igt@perf_...@module-unload.html

> Thanks,
> Nirmoy
>
> >> Suggested-by: Chris Wilson
> >> Acked-by: Tvrtko Ursulin
> >> Signed-off-by: Nirmoy Das
> >> ---
> >>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >> index 88df9a35e0fe..7541028caebd 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >> @@ -1278,7 +1278,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
> >>
> >>  void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
> >>  {
> >> -	i915_gem_drain_freed_objects(dev_priv);
> >> +	i915_gem_drain_workqueue(dev_priv);
> >>  	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
> >>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
> >>  	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
> >> --
> >> 2.37.3

-- 
Ville Syrjälä
Intel
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
On 10/21/2022 6:34 PM, Ville Syrjälä wrote:
> On Fri, Sep 23, 2022 at 09:35:14AM +0200, Nirmoy Das wrote:
>> i915_gem_drain_freed_objects() might not be enough to
>> free all the objects and RCU delayed work might get
>> scheduled after the i915 device struct gets freed.
>>
>> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
> shard-snb is still hitting the mm.shrink_count WARN reliably,
> and things go downhill after that.

Looks better now again. Going to look into that.

Thanks,
Nirmoy

>> Suggested-by: Chris Wilson
>> Acked-by: Tvrtko Ursulin
>> Signed-off-by: Nirmoy Das
>> ---
>>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 88df9a35e0fe..7541028caebd 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1278,7 +1278,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>>
>>  void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>>  {
>> -	i915_gem_drain_freed_objects(dev_priv);
>> +	i915_gem_drain_workqueue(dev_priv);
>>  	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>>  	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
>> --
>> 2.37.3
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
On Fri, Sep 23, 2022 at 09:35:14AM +0200, Nirmoy Das wrote:
> i915_gem_drain_freed_objects() might not be enough to
> free all the objects and RCU delayed work might get
> scheduled after the i915 device struct gets freed.
>
> Call i915_gem_drain_workqueue() to catch all RCU delayed work.

shard-snb is still hitting the mm.shrink_count WARN reliably,
and things go downhill after that.

> Suggested-by: Chris Wilson
> Acked-by: Tvrtko Ursulin
> Signed-off-by: Nirmoy Das
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 88df9a35e0fe..7541028caebd 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1278,7 +1278,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>
>  void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>  {
> -	i915_gem_drain_freed_objects(dev_priv);
> +	i915_gem_drain_workqueue(dev_priv);
>  	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>  	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
> --
> 2.37.3

-- 
Ville Syrjälä
Intel
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
On 9/29/2022 1:32 PM, Andi Shyti wrote:
> Hi Nirmoy,
>
> On Fri, Sep 23, 2022 at 09:35:14AM +0200, Nirmoy Das wrote:
>> i915_gem_drain_freed_objects() might not be enough to
>> free all the objects and RCU delayed work might get
>> scheduled after the i915 device struct gets freed.
>>
>> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
>>
>> Suggested-by: Chris Wilson
>> Acked-by: Tvrtko Ursulin
>> Signed-off-by: Nirmoy Das
>
> pushed to drm-intel-gt-next

Thanks, Andi!

> Thanks,
> Andi
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
Hi Nirmoy,

On Fri, Sep 23, 2022 at 09:35:14AM +0200, Nirmoy Das wrote:
> i915_gem_drain_freed_objects() might not be enough to
> free all the objects and RCU delayed work might get
> scheduled after the i915 device struct gets freed.
>
> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
>
> Suggested-by: Chris Wilson
> Acked-by: Tvrtko Ursulin
> Signed-off-by: Nirmoy Das

pushed to drm-intel-gt-next

Thanks,
Andi
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
On 23.09.2022 09:35, Nirmoy Das wrote:
> i915_gem_drain_freed_objects() might not be enough to
> free all the objects and RCU delayed work might get
> scheduled after the i915 device struct gets freed.
>
> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
>
> Suggested-by: Chris Wilson
> Acked-by: Tvrtko Ursulin
> Signed-off-by: Nirmoy Das

Reviewed-by: Andrzej Hajda

Regards
Andrzej

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 88df9a35e0fe..7541028caebd 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1278,7 +1278,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>
>  void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>  {
> -	i915_gem_drain_freed_objects(dev_priv);
> +	i915_gem_drain_workqueue(dev_priv);
>  	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>  	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
On 9/22/2022 2:28 PM, Tvrtko Ursulin wrote:
> On 22/09/2022 13:11, Das, Nirmoy wrote:
>> On 9/22/2022 11:37 AM, Tvrtko Ursulin wrote:
>>> On 21/09/2022 16:53, Das, Nirmoy wrote:
>>>> On 9/9/2022 10:55 AM, Tvrtko Ursulin wrote:
>>>>> On 08/09/2022 21:07, Nirmoy Das wrote:
>>>>>> i915_gem_drain_freed_objects() might not be enough to
>>>>>> free all the objects and RCU delayed work might get
>>>>>> scheduled after the i915 device struct gets freed.
>>>>>>
>>>>>> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
>>>>>>
>>>>>> Suggested-by: Chris Wilson
>>>>>> Signed-off-by: Nirmoy Das
>>>>>> ---
>>>>>>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>>>> index 0f49ec9d494a..e8a053eaaa89 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>>> @@ -1254,7 +1254,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>>>>>>
>>>>>>  void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>>>>>>  {
>>>>>> -	i915_gem_drain_freed_objects(dev_priv);
>>>>>> +	i915_gem_drain_workqueue(dev_priv);
>>>>>>  	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>>>>>>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>>>>>>  	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
>>>>> Help me spot the place where the RCU free worker schedules itself back
>>>>> to free more objects - if I got the rationale here right?
>>>> (Sorry for late reply, was on leave last week.)
>>>>
>>>> I had to clarify this with Chris. So when the driver frees an object, it
>>>> does dma_resv_fini(), which drops the reference on all the fences in it.
>>>> A fence might reference an object, and releasing that fence can in turn
>>>> trigger the release of another object.
>>> Hmm I couldn't find that in code but never mind. It's just a stronger
>>> version of the same flushing and it's not on a path where speed matters
>>> so feel free to go with it.
>> Can I get an Ack from you for this, Tvrtko?
> Sorry yes, forgot to be explicit.
>
> Acked-by: Tvrtko Ursulin

Thanks a lot. I will rebase and send again.

Nirmoy

> Regards,
> Tvrtko
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
On 22/09/2022 13:11, Das, Nirmoy wrote:
> On 9/22/2022 11:37 AM, Tvrtko Ursulin wrote:
>> On 21/09/2022 16:53, Das, Nirmoy wrote:
>>> On 9/9/2022 10:55 AM, Tvrtko Ursulin wrote:
>>>> On 08/09/2022 21:07, Nirmoy Das wrote:
>>>>> i915_gem_drain_freed_objects() might not be enough to
>>>>> free all the objects and RCU delayed work might get
>>>>> scheduled after the i915 device struct gets freed.
>>>>>
>>>>> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
>>>>>
>>>>> Suggested-by: Chris Wilson
>>>>> Signed-off-by: Nirmoy Das
>>>>> ---
>>>>>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>>> index 0f49ec9d494a..e8a053eaaa89 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>> @@ -1254,7 +1254,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>>>>>
>>>>>  void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>>>>>  {
>>>>> -	i915_gem_drain_freed_objects(dev_priv);
>>>>> +	i915_gem_drain_workqueue(dev_priv);
>>>>>  	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>>>>>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>>>>>  	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
>>>> Help me spot the place where the RCU free worker schedules itself back
>>>> to free more objects - if I got the rationale here right?
>>> (Sorry for late reply, was on leave last week.)
>>>
>>> I had to clarify this with Chris. So when the driver frees an object, it
>>> does dma_resv_fini(), which drops the reference on all the fences in it.
>>> A fence might reference an object, and releasing that fence can in turn
>>> trigger the release of another object.
>> Hmm I couldn't find that in code but never mind. It's just a stronger
>> version of the same flushing and it's not on a path where speed matters
>> so feel free to go with it.
> Can I get an Ack from you for this, Tvrtko?

Sorry yes, forgot to be explicit.

Acked-by: Tvrtko Ursulin

Regards,
Tvrtko
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
On 9/22/2022 11:37 AM, Tvrtko Ursulin wrote:
> On 21/09/2022 16:53, Das, Nirmoy wrote:
>> On 9/9/2022 10:55 AM, Tvrtko Ursulin wrote:
>>> On 08/09/2022 21:07, Nirmoy Das wrote:
>>>> i915_gem_drain_freed_objects() might not be enough to
>>>> free all the objects and RCU delayed work might get
>>>> scheduled after the i915 device struct gets freed.
>>>>
>>>> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
>>>>
>>>> Suggested-by: Chris Wilson
>>>> Signed-off-by: Nirmoy Das
>>>> ---
>>>>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>> index 0f49ec9d494a..e8a053eaaa89 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>> @@ -1254,7 +1254,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>>>>
>>>>  void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>>>>  {
>>>> -	i915_gem_drain_freed_objects(dev_priv);
>>>> +	i915_gem_drain_workqueue(dev_priv);
>>>>  	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>>>>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>>>>  	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
>>> Help me spot the place where the RCU free worker schedules itself back
>>> to free more objects - if I got the rationale here right?
>> (Sorry for late reply, was on leave last week.)
>>
>> I had to clarify this with Chris. So when the driver frees an object, it
>> does dma_resv_fini(), which drops the reference on all the fences in it.
>> A fence might reference an object, and releasing that fence can in turn
>> trigger the release of another object.
> Hmm I couldn't find that in code but never mind. It's just a stronger
> version of the same flushing and it's not on a path where speed matters
> so feel free to go with it.

Can I get an Ack from you for this, Tvrtko?

Thanks,
Nirmoy

Regards,
Tvrtko
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
On 21/09/2022 16:53, Das, Nirmoy wrote:
> On 9/9/2022 10:55 AM, Tvrtko Ursulin wrote:
>> On 08/09/2022 21:07, Nirmoy Das wrote:
>>> i915_gem_drain_freed_objects() might not be enough to
>>> free all the objects and RCU delayed work might get
>>> scheduled after the i915 device struct gets freed.
>>>
>>> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
>>>
>>> Suggested-by: Chris Wilson
>>> Signed-off-by: Nirmoy Das
>>> ---
>>>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 0f49ec9d494a..e8a053eaaa89 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -1254,7 +1254,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>>>
>>>  void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>>>  {
>>> -	i915_gem_drain_freed_objects(dev_priv);
>>> +	i915_gem_drain_workqueue(dev_priv);
>>>  	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>>>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>>>  	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
>> Help me spot the place where the RCU free worker schedules itself back
>> to free more objects - if I got the rationale here right?
> (Sorry for late reply, was on leave last week.)
>
> I had to clarify this with Chris. So when the driver frees an object, it
> does dma_resv_fini(), which drops the reference on all the fences in it.
> A fence might reference an object, and releasing that fence can in turn
> trigger the release of another object.

Hmm I couldn't find that in code but never mind. It's just a stronger
version of the same flushing and it's not on a path where speed matters
so feel free to go with it.

Regards,
Tvrtko
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
On 9/9/2022 10:55 AM, Tvrtko Ursulin wrote:
> On 08/09/2022 21:07, Nirmoy Das wrote:
>> i915_gem_drain_freed_objects() might not be enough to
>> free all the objects and RCU delayed work might get
>> scheduled after the i915 device struct gets freed.
>>
>> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
>>
>> Suggested-by: Chris Wilson
>> Signed-off-by: Nirmoy Das
>> ---
>>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 0f49ec9d494a..e8a053eaaa89 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1254,7 +1254,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>>
>>  void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>>  {
>> -	i915_gem_drain_freed_objects(dev_priv);
>> +	i915_gem_drain_workqueue(dev_priv);
>>  	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>>  	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
> Help me spot the place where the RCU free worker schedules itself back
> to free more objects - if I got the rationale here right?

(Sorry for late reply, was on leave last week.)

I had to clarify this with Chris. So when the driver frees an object, it
does dma_resv_fini(), which drops the reference on all the fences in it.
A fence might reference an object, and releasing that fence can in turn
trigger the release of another object.

Regards,
Nirmoy

> Regards,
> Tvrtko
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Fix a potential UAF at device unload
On 08/09/2022 21:07, Nirmoy Das wrote:
> i915_gem_drain_freed_objects() might not be enough to
> free all the objects and RCU delayed work might get
> scheduled after the i915 device struct gets freed.
>
> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
>
> Suggested-by: Chris Wilson
> Signed-off-by: Nirmoy Das
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0f49ec9d494a..e8a053eaaa89 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1254,7 +1254,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>
>  void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>  {
> -	i915_gem_drain_freed_objects(dev_priv);
> +	i915_gem_drain_workqueue(dev_priv);
>  	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>  	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);

Help me spot the place where the RCU free worker schedules itself back
to free more objects - if I got the rationale here right?

Regards,
Tvrtko