Re: SIGBUS on device disappearance (Re: Warnings in DRM code when removing/unbinding a driver)

2020-01-07 Thread Daniel Vetter
On Mon, Dec 23, 2019 at 11:00:15AM +0200, Pekka Paalanen wrote:
> On Thu, 19 Dec 2019 13:42:33 +0100
> Daniel Vetter  wrote:
> 
> > On Thu, Dec 19, 2019 at 12:32 PM Gerd Hoffmann  wrote:
> > >
> > > While being at it:  How would a driver cleanup properly cleanup gem
> > > objects created by userspace on hotunbind?  Specifically a gem object
> > > pinned to vram?  
> > 
> > Two things:
> > - the mmap needs to be torn down and replaced by something which will
> > sigbus. Probably should have that as a helper (plus vram fault code
> > should use drm_dev_enter/exit to plug races).
> 
> Hi,
> 
> I assume SIGBUS is the traditional way to say "oops, the memory you
> mmapped and tried to access no longer exists". Is there nothing
> else for this?
> 
> I'm asking, because SIGBUS is really hard to handle right in
> userspace. It can be caused by any number of wildly different
> reasons, yet being a signal means that a userspace process can only
> have a single global handler for it. That makes it almost
> impossible to use safely in libraries, because you would want to
> register independent handlers from multiple libraries in the same
> process. Some libraries may also be using threads.
> 
> How to handle a SIGBUS completely depends on what triggered it.
> Almost always userspace wants it to be a non-fatal error. A Wayland
> compositor can hit SIGBUS on accessing wl_shm-based client buffers
> (regular mmapped files), and then it just wants to continue with
> garbage data as if nothing happened and possibly send a protocol
> error to the client provoking it.

The drm drivers that you actually want to hotunplug (as opposed to unbinding
just for driver development) all use system memory/shmem, so they shouldn't
SIGBUS. At least I think so, I haven't tested anything. This covers udl and
the tiny displays behind an SPI bridge.

For PCI drivers, where the mmap often points at a PCI bridge, the mmio range
will be gone, so not SIGBUSing is going to be a tall order. Not
impossible, but before we enshrine this into uapi someone will have to do
some serious typing.

> I would also imagine that Mesa, when it starts looking into
> supporting GPU hotunplug, needs to handle vanished mmaps. I don't
> think Mesa can ever install signal handlers, because that would
> mess with the applications that may already be using SIGBUS for
> handling disappearing mmapped files. It needs to start returning
> errors via API calls. I cannot imagine a way to reliably prevent
> such SIGBUS either by e.g. ensuring Mesa gets notified of removal
> before it actually starts failing.

Mesa already blows up in all kinds of interesting ways when it gets an EIO
at execbuf. I think. Robust handling of gpu hotunplug for gl/vk contexts
is going to be more work on top (and mmap is probably the least issue
there, at least right now).

> For now, I'm just looking for a simple "yes" or "no" here for the
> something else. If it's "no" like I expect, creating something else
> is probably in the order of years to get into a usable state. Does
> anyone already have plans towards that?

I agree with you that SIGBUS for mmap of hotunplugged devices is
essentially unusable, because of sighandlers and everything else you point
out (it would make it impossible to have robust vk/gl contexts, at least
robust against hotunplug).

So in principle I'm open to having some other uapi for this, but it's going
to be a serious amount of work across the stack.

For display only udl-style devices otoh I think we should be mostly there,
+/- driver bugs as usual.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


SIGBUS on device disappearance (Re: Warnings in DRM code when removing/unbinding a driver)

2019-12-23 Thread Pekka Paalanen
On Thu, 19 Dec 2019 13:42:33 +0100
Daniel Vetter  wrote:

> On Thu, Dec 19, 2019 at 12:32 PM Gerd Hoffmann  wrote:
> >
> > While being at it:  How would a driver cleanup properly cleanup gem
> > objects created by userspace on hotunbind?  Specifically a gem object
> > pinned to vram?  
> 
> Two things:
> - the mmap needs to be torn down and replaced by something which will
> sigbus. Probably should have that as a helper (plus vram fault code
> should use drm_dev_enter/exit to plug races).

Hi,

I assume SIGBUS is the traditional way to say "oops, the memory you
mmapped and tried to access no longer exists". Is there nothing
else for this?

I'm asking, because SIGBUS is really hard to handle right in
userspace. It can be caused by any number of wildly different
reasons, yet being a signal means that a userspace process can only
have a single global handler for it. That makes it almost
impossible to use safely in libraries, because you would want to
register independent handlers from multiple libraries in the same
process. Some libraries may also be using threads.

How to handle a SIGBUS completely depends on what triggered it.
Almost always userspace wants it to be a non-fatal error. A Wayland
compositor can hit SIGBUS on accessing wl_shm-based client buffers
(regular mmapped files), and then it just wants to continue with
garbage data as if nothing happened and possibly send a protocol
error to the client provoking it.
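
To make the problem concrete, here is a minimal sketch of the usual
workaround: guard each access to the mapping with sigsetjmp() and jump out
of a process-wide SIGBUS handler. The names (guarded_read,
demo_vanished_mapping) are made up for illustration, and the sketch assumes
a single-threaded Linux process. A truncated regular file stands in for the
vanished backing storage. Note that the handler and the jump buffer are
global state, which is exactly what a library cannot safely own.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Process-wide state: a process has only one SIGBUS disposition. */
static sigjmp_buf bus_jmp;
static volatile sig_atomic_t bus_guard_active;

static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info; (void)ctx;
    if (bus_guard_active)
        siglongjmp(bus_jmp, 1);   /* unwind to the guarded access */
    /* Not our access: fall back to the default (fatal) behaviour. */
    signal(SIGBUS, SIG_DFL);
    raise(SIGBUS);
}

/* Returns 0 and stores the byte in *out, or -1 if the read raised SIGBUS. */
static int guarded_read(const volatile unsigned char *p, unsigned char *out)
{
    if (sigsetjmp(bus_jmp, 1) != 0) {   /* 1: restore the signal mask */
        bus_guard_active = 0;
        return -1;
    }
    bus_guard_active = 1;
    *out = *p;
    bus_guard_active = 0;
    return 0;
}

/* Returns 1 if the read worked before truncation and failed cleanly
 * afterwards, 0 if not, -2 on setup errors. */
int demo_vanished_mapping(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -2;

    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -2;
    unlink(path);

    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -2;
    }
    unsigned char *map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -2;
    }

    unsigned char byte = 0;
    int before = guarded_read(map, &byte);   /* file still backs the page */
    if (ftruncate(fd, 0) != 0) {             /* backing storage goes away */
        munmap(map, (size_t)page);
        close(fd);
        return -2;
    }
    int after = guarded_read(map, &byte);    /* access now faults */

    munmap(map, (size_t)page);
    close(fd);
    return (before == 0 && after == -1) ? 1 : 0;
}
```

This only works while nothing else in the process wants SIGBUS. As soon as
a second library installs its own handler with sigaction(), one of the two
silently stops working.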

I would also imagine that Mesa, when it starts looking into
supporting GPU hotunplug, needs to handle vanished mmaps. I don't
think Mesa can ever install signal handlers, because that would
mess with the applications that may already be using SIGBUS for
handling disappearing mmapped files. It needs to start returning
errors via API calls. I cannot imagine a way to reliably prevent
such SIGBUS either by e.g. ensuring Mesa gets notified of removal
before it actually starts failing.

For now, I'm just looking for a simple "yes" or "no" here for the
something else. If it's "no" like I expect, creating something else
is probably in the order of years to get into a usable state. Does
anyone already have plans towards that?


Thanks,
pq

