Re: [Intel-gfx] [PATCH] drm/i915: Fix context IDs not released on driver hot unbind

2019-04-04 Thread Jani Nikula
On Thu, 04 Apr 2019, Chris Wilson wrote:
> Quoting Janusz Krzysztofik (2019-04-04 11:50:14)
>> On Thu, 2019-04-04 at 11:43 +0100, Chris Wilson wrote:
>> > Quoting Janusz Krzysztofik (2019-04-04 11:40:24)
>> > > On Thu, 2019-04-04 at 11:28 +0100, Chris Wilson wrote:
>> > > > Quoting Janusz Krzysztofik (2019-04-04 11:24:45)
>> > > > > From: Janusz Krzysztofik 
>> > > > > 
>> > > > > If the driver gets unbound while a device is open, a kernel panic may
>> > > > > be forced if the list of allocated context IDs is not empty.
>> > > > >
>> > > > > When a device is open, the list may be non-empty because a context ID,
>> > > > > once allocated by the context ID allocator to a context associated with
>> > > > > that open file descriptor, is released only on device close.
>> > > > >
>> > > > > On the other hand, there is a need to release all allocated context IDs
>> > > > > and destroy the context ID allocator on driver unbind, even if a device
>> > > > > is open, in order to free the memory consumed and prevent memory leaks.
>> > > > > The purpose of the forced kernel panic was to protect the context ID
>> > > > > allocator from being silently destroyed if not all allocated IDs had
>> > > > > been released.
>> > > >
>> > > > Those open fds are still pointing into kernel memory where the driver
>> > > > used to be. The panic is entirely correct; we should not be unloading
>> > > > the module before those dangling pointers have been made safe.
>> > > >
>> > > > This is papering over the symptom. How is the module being unloaded with
>> > > > open fds?
>> > > 
>> > > A user can play with the driver unbind or device remove sysfs
>> > > interface.
>> > 
>> > Sure, but we must still follow all the steps before _unloading_ the
>> > module or else the user is left pointing into reused kernel memory.
>> 
>> I'm not talking about unloading the module; that is prevented by open
>> fds.  The driver still exists after being unbound from a device and may
>> just respond with -ENODEV.
>
> i915_gem_contexts_fini() *is* module unload.

Janusz, please describe what you're doing exactly.

BR,
Jani.



-- 
Jani Nikula, Intel Open Source Graphics Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Fix context IDs not released on driver hot unbind

2019-04-04 Thread Chris Wilson
Quoting Janusz Krzysztofik (2019-04-04 11:50:14)
> On Thu, 2019-04-04 at 11:43 +0100, Chris Wilson wrote:
> > Quoting Janusz Krzysztofik (2019-04-04 11:40:24)
> > > On Thu, 2019-04-04 at 11:28 +0100, Chris Wilson wrote:
> > > > Quoting Janusz Krzysztofik (2019-04-04 11:24:45)
> > > > > From: Janusz Krzysztofik 
> > > > > 
> > > > > > If the driver gets unbound while a device is open, a kernel panic may
> > > > > > be forced if the list of allocated context IDs is not empty.
> > > > > >
> > > > > > When a device is open, the list may be non-empty because a context ID,
> > > > > > once allocated by the context ID allocator to a context associated with
> > > > > > that open file descriptor, is released only on device close.
> > > > > >
> > > > > > On the other hand, there is a need to release all allocated context IDs
> > > > > > and destroy the context ID allocator on driver unbind, even if a device
> > > > > > is open, in order to free the memory consumed and prevent memory leaks.
> > > > > > The purpose of the forced kernel panic was to protect the context ID
> > > > > > allocator from being silently destroyed if not all allocated IDs had
> > > > > > been released.
> > > > >
> > > > > Those open fds are still pointing into kernel memory where the driver
> > > > > used to be. The panic is entirely correct; we should not be unloading
> > > > > the module before those dangling pointers have been made safe.
> > > > >
> > > > > This is papering over the symptom. How is the module being unloaded with
> > > > > open fds?
> > > 
> > > A user can play with the driver unbind or device remove sysfs
> > > interface.
> > 
> > Sure, but we must still follow all the steps before _unloading_ the
> > module or else the user is left pointing into reused kernel memory.
> 
> I'm not talking about unloading the module; that is prevented by open
> fds.  The driver still exists after being unbound from a device and may
> just respond with -ENODEV.

i915_gem_contexts_fini() *is* module unload.
-Chris

Re: [Intel-gfx] [PATCH] drm/i915: Fix context IDs not released on driver hot unbind

2019-04-04 Thread Janusz Krzysztofik
On Thu, 2019-04-04 at 11:43 +0100, Chris Wilson wrote:
> Quoting Janusz Krzysztofik (2019-04-04 11:40:24)
> > On Thu, 2019-04-04 at 11:28 +0100, Chris Wilson wrote:
> > > Quoting Janusz Krzysztofik (2019-04-04 11:24:45)
> > > > From: Janusz Krzysztofik 
> > > > 
> > > > > If the driver gets unbound while a device is open, a kernel panic may
> > > > > be forced if the list of allocated context IDs is not empty.
> > > > >
> > > > > When a device is open, the list may be non-empty because a context ID,
> > > > > once allocated by the context ID allocator to a context associated with
> > > > > that open file descriptor, is released only on device close.
> > > > >
> > > > > On the other hand, there is a need to release all allocated context IDs
> > > > > and destroy the context ID allocator on driver unbind, even if a device
> > > > > is open, in order to free the memory consumed and prevent memory leaks.
> > > > > The purpose of the forced kernel panic was to protect the context ID
> > > > > allocator from being silently destroyed if not all allocated IDs had
> > > > > been released.
> > > >
> > > > Those open fds are still pointing into kernel memory where the driver
> > > > used to be. The panic is entirely correct; we should not be unloading
> > > > the module before those dangling pointers have been made safe.
> > > >
> > > > This is papering over the symptom. How is the module being unloaded with
> > > > open fds?
> > 
> > A user can play with the driver unbind or device remove sysfs
> > interface.
> 
> Sure, but we must still follow all the steps before _unloading_ the
> module or else the user is left pointing into reused kernel memory.

I'm not talking about unloading the module; that is prevented by open
fds.  The driver still exists after being unbound from a device and may
just respond with -ENODEV.
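
A minimal userspace sketch of that lifetime, for illustration only. All
names here (model_dev, model_unbind and so on) are hypothetical and are
not i915 or DRM functions. It only models device state that outlives the
unbind until the last close, and that answers -ENODEV in between.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in i915. */
struct model_dev {
	int refcount;	/* one per open file, plus one for the binding */
	bool unbound;	/* set once the driver is unbound from the device */
};

static struct model_dev *model_bind(void)
{
	struct model_dev *dev = calloc(1, sizeof(*dev));

	if (dev)
		dev->refcount = 1;	/* reference held by the binding */
	return dev;
}

/* Drop one reference; returns true if this call freed the state. */
static bool model_put(struct model_dev *dev)
{
	if (--dev->refcount > 0)
		return false;
	free(dev);
	return true;
}

static int model_open(struct model_dev *dev)
{
	if (dev->unbound)
		return -ENODEV;
	dev->refcount++;
	return 0;
}

/* Any operation on an open file: refuse once the device is gone. */
static int model_ioctl(struct model_dev *dev)
{
	return dev->unbound ? -ENODEV : 0;
}

/* Unbind marks the device gone and drops only the binding's reference. */
static bool model_unbind(struct model_dev *dev)
{
	dev->unbound = true;
	return model_put(dev);
}

/* Close drops the file's reference; the last close frees the state. */
static bool model_close(struct model_dev *dev)
{
	return model_put(dev);
}
```

In this model, an unbind with one file still open frees nothing; the
state is released only by the final model_close().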

Janusz

> -Chris
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [PATCH] drm/i915: Fix context IDs not released on driver hot unbind

2019-04-04 Thread Chris Wilson
Quoting Janusz Krzysztofik (2019-04-04 11:40:24)
> On Thu, 2019-04-04 at 11:28 +0100, Chris Wilson wrote:
> > Quoting Janusz Krzysztofik (2019-04-04 11:24:45)
> > > From: Janusz Krzysztofik 
> > > 
> > > > If the driver gets unbound while a device is open, a kernel panic may
> > > > be forced if the list of allocated context IDs is not empty.
> > > >
> > > > When a device is open, the list may be non-empty because a context ID,
> > > > once allocated by the context ID allocator to a context associated with
> > > > that open file descriptor, is released only on device close.
> > > >
> > > > On the other hand, there is a need to release all allocated context IDs
> > > > and destroy the context ID allocator on driver unbind, even if a device
> > > > is open, in order to free the memory consumed and prevent memory leaks.
> > > > The purpose of the forced kernel panic was to protect the context ID
> > > > allocator from being silently destroyed if not all allocated IDs had
> > > > been released.
> > >
> > > Those open fds are still pointing into kernel memory where the driver
> > > used to be. The panic is entirely correct; we should not be unloading
> > > the module before those dangling pointers have been made safe.
> > >
> > > This is papering over the symptom. How is the module being unloaded with
> > > open fds?
> 
> A user can play with the driver unbind or device remove sysfs
> interface.

Sure, but we must still follow all the steps before _unloading_ the
module or else the user is left pointing into reused kernel memory.
-Chris

Re: [Intel-gfx] [PATCH] drm/i915: Fix context IDs not released on driver hot unbind

2019-04-04 Thread Janusz Krzysztofik
On Thu, 2019-04-04 at 11:28 +0100, Chris Wilson wrote:
> Quoting Janusz Krzysztofik (2019-04-04 11:24:45)
> > From: Janusz Krzysztofik 
> > 
> > > If the driver gets unbound while a device is open, a kernel panic may
> > > be forced if the list of allocated context IDs is not empty.
> > >
> > > When a device is open, the list may be non-empty because a context ID,
> > > once allocated by the context ID allocator to a context associated with
> > > that open file descriptor, is released only on device close.
> > >
> > > On the other hand, there is a need to release all allocated context IDs
> > > and destroy the context ID allocator on driver unbind, even if a device
> > > is open, in order to free the memory consumed and prevent memory leaks.
> > > The purpose of the forced kernel panic was to protect the context ID
> > > allocator from being silently destroyed if not all allocated IDs had
> > > been released.
> >
> > Those open fds are still pointing into kernel memory where the driver
> > used to be. The panic is entirely correct; we should not be unloading
> > the module before those dangling pointers have been made safe.
> >
> > This is papering over the symptom. How is the module being unloaded with
> > open fds?

A user can play with the driver unbind or device remove sysfs
interface.
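
A sketch of one way to reproduce that sequence from userspace: open a
device node, then write the device name to the driver's sysfs unbind
file while the descriptor is still open. The helper name
unbind_while_open() is hypothetical, and the paths and device name are
left to the caller because they depend on the system.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Open dev_node, then write dev_name to unbind_path while the descriptor
 * is still open. Returns the open fd on success, -1 on any failure.
 * All three arguments are supplied by the caller; nothing is hard-coded.
 */
static int unbind_while_open(const char *dev_node, const char *unbind_path,
			     const char *dev_name)
{
	int fd = open(dev_node, O_RDWR);

	if (fd < 0)
		return -1;

	int sysfs = open(unbind_path, O_WRONLY);

	if (sysfs < 0) {
		close(fd);
		return -1;
	}

	size_t len = strlen(dev_name);
	ssize_t written = write(sysfs, dev_name, len);

	close(sysfs);
	if (written != (ssize_t)len) {
		close(fd);
		return -1;
	}
	return fd;	/* caller closes this after the unbind */
}
```

On a typical system the arguments might be /dev/dri/card0,
/sys/bus/pci/drivers/i915/unbind and a PCI address such as 0000:00:02.0,
but those values are assumptions; check the actual node and address
first, and expect to need root.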

Thanks,
Janusz

> If all the fds have been closed, how have we failed to flush and
> retire all requests (thereby unpinning the contexts and all other
> pointers)?
> -Chris


Re: [Intel-gfx] [PATCH] drm/i915: Fix context IDs not released on driver hot unbind

2019-04-04 Thread Chris Wilson
Quoting Janusz Krzysztofik (2019-04-04 11:24:45)
> From: Janusz Krzysztofik 
> 
> If the driver gets unbound while a device is open, a kernel panic may
> be forced if the list of allocated context IDs is not empty.
>
> When a device is open, the list may be non-empty because a context ID,
> once allocated by the context ID allocator to a context associated with
> that open file descriptor, is released only on device close.
>
> On the other hand, there is a need to release all allocated context IDs
> and destroy the context ID allocator on driver unbind, even if a device
> is open, in order to free the memory consumed and prevent memory leaks.
> The purpose of the forced kernel panic was to protect the context ID
> allocator from being silently destroyed if not all allocated IDs had
> been released.

Those open fds are still pointing into kernel memory where the driver
used to be. The panic is entirely correct; we should not be unloading
the module before those dangling pointers have been made safe.

This is papering over the symptom. How is the module being unloaded with
open fds? If all the fds have been closed, how have we failed to flush
and retire all requests (thereby unpinning the contexts and all other
pointers)?
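
For illustration only, a userspace model of the kind of guard under
discussion. The names (id_alloc, id_alloc_fini and so on) are
hypothetical; this is not the kernel's IDA and not the i915 code. It
shows teardown refusing to proceed while any ID is still allocated,
where the driver would instead trip an assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a context ID allocator; not the kernel IDA. */
struct id_alloc {
	uint32_t in_use;	/* bit n set means ID n is allocated */
};

/* Returns the lowest free ID in 0..31, or -1 if all are taken. */
static int id_get(struct id_alloc *a)
{
	for (int id = 0; id < 32; id++) {
		if (!(a->in_use & (UINT32_C(1) << id))) {
			a->in_use |= UINT32_C(1) << id;
			return id;
		}
	}
	return -1;
}

static void id_put(struct id_alloc *a, int id)
{
	a->in_use &= ~(UINT32_C(1) << id);
}

static bool id_alloc_is_empty(const struct id_alloc *a)
{
	return a->in_use == 0;
}

/*
 * Teardown with a guard: return false rather than silently discard IDs
 * that a still-open file may be using. In the model this is a refusal;
 * a kernel assertion in the same place would panic instead.
 */
static bool id_alloc_fini(struct id_alloc *a)
{
	if (!id_alloc_is_empty(a))
		return false;
	a->in_use = 0;
	return true;
}
```

With one ID outstanding, id_alloc_fini() fails; once id_put() returns
that ID, it succeeds.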
-Chris