On Mon, May 11, 2020 at 2:34 PM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:
>
> Am 11.05.20 um 13:19 schrieb Daniel Vetter:
> > On Mon, May 11, 2020 at 11:54:33AM +0200, Daniel Vetter wrote:
> >> On Sat, May 09, 2020 at 02:51:44PM -0400, Andrey Grodzovsky wrote:
> >>> This RFC is more of a proof of concept than a fully working solution,
> >>> as there are a few unresolved issues we are hoping to get advice on
> >>> from people on the mailing list.
> >>> Until now, extracting a card, either physically (e.g. an eGPU with a
> >>> Thunderbolt connection) or by emulation through sysfs
> >>> (/sys/bus/pci/devices/device_id/remove), would cause random crashes in
> >>> user apps. These crashes were mostly due to an app that had mapped a
> >>> device-backed BO into its address space still trying to access the BO
> >>> while the backing device was gone.
> >>> To address this first problem, Christian suggested fixing the handling
> >>> of mapped memory in the clients when the device goes away by forcibly
> >>> unmapping all buffers the user processes hold, i.e. clearing their
> >>> respective VMAs that map the device BOs. Then, when a fault tries to
> >>> fill in the page tables again, we check in the fault handler whether
> >>> the device has been removed and, if so, return an error. This generates
> >>> a SIGBUS to the application, which can then cleanly terminate.
> >>> This was indeed done, but it in turn created a problem of kernel
> >>> OOPSes: while the app was terminating because of the SIGBUS, it would
> >>> trigger a use-after-free in the driver by accessing device structures
> >>> that had already been released by the PCI remove sequence.
> >>> We handled this by introducing a 'flush' sequence during device removal
> >>> where we wait for the drm file reference count to drop to 0, meaning
> >>> all user clients directly using this device have terminated.
> >>> With this I was able to cleanly emulate device unplug with X and
> >>> glxgears running, and later emulate plugging the device back in and
> >>> restarting X and glxgears.
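
A minimal sketch of that zap-and-SIGBUS approach described above; the
adev->removed flag and the helper names are assumptions here, not the
actual patches:

#include <drm/ttm/ttm_bo_api.h>
#include <linux/mm.h>

/* On unplug, drop every PTE pointing at device-backed BOs; the next
 * CPU access then goes through the fault handler below. */
static void amdgpu_zap_user_mappings(struct amdgpu_device *adev)
{
        unmap_mapping_range(adev->ddev->anon_inode->i_mapping, 0, 0, 1);
}

static vm_fault_t amdgpu_gem_vm_fault(struct vm_fault *vmf)
{
        struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
        struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);

        /* Device already gone: refuse to refill the page tables, so
         * the application gets a SIGBUS and terminates. */
        if (adev->removed)
                return VM_FAULT_SIGBUS;

        return ttm_bo_vm_fault(vmf);    /* normal TTM fault path */
}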
> >>>
> >>> But this use case is only part of the picture; as I see it, the full
> >>> set of use cases, and the questions they raise, is as follows.
> >>>
> >>> 1) Application accesses a BO by opening a drm file
> >>>     1.1) BO is mapped into the application's address space (BO is CPU
> >>> visible) - this one we have a solution for: invalidate the BO's CPU
> >>> mapping, causing SIGBUS and termination, and wait for the drm file
> >>> refcount to drop to 0 before releasing the device
> >>>     1.2) BO is not mapped into the application's address space (BO is
> >>> CPU invisible) - no solution yet, because how do we force the
> >>> application to terminate in this case?
> >>>
> >>> 2) Application accesses a BO by importing a DMA-BUF
> >>>     2.1) BO is mapped into the application's address space (BO is CPU
> >>> visible) - the solution is the same as 1.1, but instead of waiting for
> >>> the drm file release we wait for the imported dma-buf's file release
> >>>     2.2) BO is not mapped into the application's address space (BO is
> >>> CPU invisible) - our solution is to invalidate the GPUVM page tables
> >>> and destroy the backing storage for all exported BOs, which will in
> >>> turn cause VM faults in the importing device; then, when the importing
> >>> driver tries to re-attach the imported BO to update its mappings, we
> >>> return -ENODEV from the import hook, which hopefully will cause the
> >>> user app to terminate.
> >>>
> >>> 3) Application opens a drm file or imports a dma-buf and holds a
> >>> reference, but either never accesses any BO or stops accessing it once
> >>> the device was unplugged - how would we force this application to
> >>> terminate before proceeding with the device removal code? Otherwise the
> >>> wait in pci remove just hangs forever.
> >>>
> >>> The attached patches address 1.1, 2.1 and 2.2; for now only 1.1 is
> >>> fully tested and I am still testing the others, but I would be happy
> >>> for any advice on all the described use cases, and maybe some
> >>> alternative and better (more generic) approach, such as obtaining the
> >>> PIDs of the relevant processes through some reverse mapping from the
> >>> device file and exported dma-buf files and sending them SIGKILL -
> >>> would that make more sense, or is there any other method?
> >>>
> >>> Patches 1-3 address 1.1
> >>> Patch 4 addresses 2.1
> >>> Patches 5-6 address 2.2
> >>>
> >>> Reference: https://gitlab.freedesktop.org/drm/amd/-/issues/1081
> >> So we've been working on this problem for a few years already (but it's
> >> still not solved), I think you could have saved yourselves some typing.
> >>
> >> Bunch of things:
> >> - we can't wait for userspace in the hotunplug handlers, that might never
> >>    happen. The correct way is to untangle the lifetime of your hw driver
> >>    for a specific struct pci_dev from the drm_device lifetime.
> >>    Infrastructure is all there now, see drm_dev_get/put, drm_dev_unplug and
> >>    drm_dev_enter/exit. A bunch of usb/spi drivers use this 100% correctly
> >>    now, so there are examples. Plus kerneldoc explains stuff.
>
> That's exactly what we tried first and I expected that this is a
> necessity. Ok so back to the drawing board for this.
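
For reference, the generic shape of that drm_dev_unplug/drm_dev_enter
pattern (a sketch, not amdgpu code; touch_the_hardware() is just a
placeholder for real hw access):

#include <drm/drm_drv.h>
#include <linux/pci.h>

static void my_pci_remove(struct pci_dev *pdev)
{
        struct drm_device *drm = pci_get_drvdata(pdev);

        drm_dev_unplug(drm);    /* mark the device dead, don't wait for userspace */
        drm_dev_put(drm);       /* final free happens when the last reference drops */
}

static int my_hw_ioctl(struct drm_device *drm, void *data,
                       struct drm_file *file)
{
        int idx, ret;

        /* Bail out early if the device is already gone. */
        if (!drm_dev_enter(drm, &idx))
                return -ENODEV;

        ret = touch_the_hardware(drm, data);    /* placeholder hw access */

        drm_dev_exit(idx);
        return ret;
}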
>
> >>
> >> - for a big driver like amdgpu doing this split up is going to be
> >>    horrendously complex. I know, we've done it for i915, at least
> >>    partially. I strongly recommend that you use devm_ for managing
> >>    hw-related resources (iomap, irq, ...) as much as possible.
> >>
> >>    For drm_device resources (mostly structures and everything related to
> >>    that) we've just merged the drmm_ managed resources framework. There's
> >>    some more work to be done there for various kms objects, but you can at
> >>    least somewhat avoid tedious handrolling for everything internal
> >>    already.
> >>
> >>    Don't ever use devm_kzalloc and friends; I've looked at hundreds of uses
> >>    of this in drm, and they're all wrong.
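
A rough illustration of that split, as a sketch only (my_drm_driver and
my_state are made-up names, error paths abbreviated): hw resources hang
off the pci_dev via pcim_/devm_ and are released on unbind, while driver
structures hang off the drm_device via drmm_ and live until the last
drm_dev_put().

#include <drm/drm_drv.h>
#include <drm/drm_managed.h>
#include <linux/pci.h>

static int my_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        struct drm_device *drm;
        struct my_state *priv;          /* hypothetical driver state */
        void __iomem *mmio;
        int ret;

        ret = pcim_enable_device(pdev); /* hw resource: undone on pci unbind */
        if (ret)
                return ret;

        mmio = pcim_iomap(pdev, 0, 0);  /* hw resource: unmapped on pci unbind */
        if (!mmio)
                return -ENOMEM;

        drm = drm_dev_alloc(&my_drm_driver, &pdev->dev);
        if (IS_ERR(drm))
                return PTR_ERR(drm);

        /* sw state: freed together with the drm_device, not the pci_device */
        priv = drmm_kzalloc(drm, sizeof(*priv), GFP_KERNEL);
        if (!priv) {
                drm_dev_put(drm);
                return -ENOMEM;
        }

        pci_set_drvdata(pdev, drm);
        return drm_dev_register(drm, 0);
}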
> >>
> >> - dma-buf is hilarious (and atm unfixed), dma-fence is even worse. In
> >>    theory they're already refcounted and all and so should work; in
> >>    practice I think we need to refcount the underlying drm_device with
> >>    drm_dev_get/put to avoid the worst fall-out.
> > oh I forgot one, since it's new since the last time we've seriously
> > discussed this: p2p dma-buf
> >
> > But that /should/ be handleable with the move_notify callback. Assuming we
> > don't have any bugs anywhere, and the importer can indeed get rid of all
> > its mappings, always.
>
> Yeah, already noted that as well in the internal discussion.
>
> >
> > But for completeness probably need this one, just to keep it noted.
> > -Daniel
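
The shape of the dynamic-importer side that makes this work looks
roughly as below; only the dma_buf_attach_ops/move_notify hook is the
real API, the handler body and the my_* names are assumptions:

#include <linux/dma-buf.h>

/* The exporter is about to move or destroy the backing storage (e.g. on
 * hotunplug): the importer must drop every mapping it built from this
 * attachment before returning. */
static void my_move_notify(struct dma_buf_attachment *attach)
{
        struct my_bo *bo = attach->importer_priv;       /* hypothetical */

        my_invalidate_gpu_mappings(bo);                 /* hypothetical */
}

static const struct dma_buf_attach_ops my_importer_ops = {
        .allow_peer2peer = true,        /* opt in to p2p dma-buf */
        .move_notify = my_move_notify,
};

/* attach = dma_buf_dynamic_attach(dmabuf, dev, &my_importer_ops, bo); */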
> >
> >> - One unfortunate thing with drm_dev_unplug is that the driver core is
> >>    very opinionated and doesn't tell you whether it's a hotunplug or a
> >>    driver unload. In the former case trying to shut down hw just wastes
> >>    time (and might hit driver bugs); in the latter case driver engineers
> >>    very much expect everything to be shut down.
> >>
> >>    Right now you can only have one or the other, so this needs a module
> >>    option hack or similar (default to the correct hotunplug behaviour for
> >>    users).
> >>
> >> - SIGBUS is better than crashing the kernel, but it's not even close for
> >>    users. They still lose everything because everything crashes, since in
> >>    my experience, in practice, no one ever handles errors. There are a few
> >>    more things on top:
> >>
> >>    - sighandlers are global, which means only the app can use them. You can't
> >>      use them in e.g. mesa. They're also not composable, so if you have one
> >>      sighandler for gpu1 and a 2nd one for gpu2 (could be a different vendor)
> >>      it's all sadness. Hence "userspace will handle SIGBUS" won't work.
> >>
> >>    - worse, neither vk nor gl (to my knowledge) has a concept of events
> >>      for when the gpu died. The only stuff you have is things like
> >>      arb_robustness, which says a) everything continues as if nothing
> >>      happened b) there's a function where you can ask whether your gl
> >>      context and all the textures/buffers are toast.
> >>
> >>      I think that's about the only hotunplug application model we can
> >>      realistically expect applications to support. That means _all_ errors
> >>      need to be silently eaten by either mesa or the kernel. On i915 the
> >>      model (also for terminally wedged gpu hangs) is that all ioctls keep
> >>      working, mmaps keep working, and execbuf gives you an -EIO (which mesa
> >>      eats silently iirc for arb_robustness).
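
From the application side, that "ask whether the context is toast"
function is roughly the following (GL_ARB_robustness; this assumes the
context was created with a lose-context-on-reset notification strategy):

#include <GL/gl.h>
#include <GL/glext.h>
#include <GL/glx.h>

static PFNGLGETGRAPHICSRESETSTATUSARBPROC get_reset_status;

static void init_robustness(void)
{
        get_reset_status = (PFNGLGETGRAPHICSRESETSTATUSARBPROC)
                glXGetProcAddress((const GLubyte *)"glGetGraphicsResetStatusARB");
}

/* Poll e.g. once per frame: GL_NO_ERROR means keep rendering, anything
 * else means the context and everything in it is lost and must be
 * recreated by the app. */
static int context_is_toast(void)
{
        return get_reset_status && get_reset_status() != GL_NO_ERROR;
}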
> >>
> >>    Conclusion is that SIGBUS is imo a no-go, and the only option we have is
> >>    that a) mmaps fully keep working, doable for shmem, or b) we put some
> >>    fake memory in there (for vram or whatever), maybe even only a single
> >>    page for all fake memory.
>
> Ok, good to know.
>
> So to summarize: no application termination, but instead redirect all
> memory accesses to a dummy page.
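
A minimal sketch of that dummy-page redirect, assuming a PFNMAP-style BO
mapping and made-up helper names:

#include <drm/drm_drv.h>
#include <linux/mm.h>

static struct page *dummy_page;         /* allocated once, shared by all mappings */

static vm_fault_t my_bo_vm_fault(struct vm_fault *vmf)
{
        struct drm_device *drm = my_vma_to_drm(vmf->vma);       /* hypothetical */

        /* Device gone: back every faulting address with the same dummy
         * page so existing mmaps keep "working" instead of SIGBUSing. */
        if (drm_dev_is_unplugged(drm))
                return vmf_insert_pfn(vmf->vma, vmf->address,
                                      page_to_pfn(dummy_page));

        return my_real_fault(vmf);      /* normal path while the hw is present */
}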
>
> From the IOCTLs we return -ENODEV instead of -EIO. Is that a problem?

For atomic I think it'd be good if we're consistent across drivers.

For render ioctls all that matters is that you end up implementing
arb_robustness/vk device removal/whatever other custom interface HSA
has/... correctly. So that's entirely up to a discussion between amdgpu
and radeonsi folks, I'd say.
-Daniel

> Thanks for the comments,
> Christian.
>
> >>
> >> - you probably want arb_robustness and similar stuff in userspace as a
> >>    first step.
> >>
> >> tldr;
> >> - refcounting, not waiting for userspace
> >> - nothing can fail because userspace won't handle it
> >>
> >> That's at least my take on this mess, and what we've been pushing for over
> >> the past few years. For kms-only drm_driver we should have achieved that
> >> by now (plus/minus maybe some issues for dma-buf/fences, but kms-only
> >> dma-buf/fences are simple enough that maybe we don't go boom yet).
> >>
> >> For big gpus with rendering I think the best next step would be to type up
> >> a reasonable Grand Plan (into Documentation/gpu/todo.rst) with all the
> >> issues and likely solutions. And then bikeshed that, since the above is
> >> just my take on all this.
> >>
> >> Cheers, Daniel
> >>
> >>> Andrey Grodzovsky (6):
> >>>    drm/ttm: Add unampping of the entire device address space
> >>>    drm/amdgpu: Force unmap all user VMAs on device removal.
> >>>    drm/amdgpu: Wait for all user clients
> >>>    drm/amdgpu: Wait for all clients importing out dma-bufs.
> >>>    drm/ttm: Add destroy flag in TTM BO eviction interface
> >>>    drm/amdgpu: Use TTM MMs destroy interface
> >>>
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h         |  3 ++
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +-
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  7 +++-
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 27 ++++++++++++-
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c     | 22 ++++++++--
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |  9 +++++
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c     |  4 ++
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 17 +++++++-
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 +
> >>>   drivers/gpu/drm/nouveau/nouveau_drm.c       |  2 +-
> >>>   drivers/gpu/drm/qxl/qxl_object.c            |  4 +-
> >>>   drivers/gpu/drm/radeon/radeon_object.c      |  2 +-
> >>>   drivers/gpu/drm/ttm/ttm_bo.c                | 63 +++++++++++++++++++++--------
> >>>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c         |  6 +--
> >>>   include/drm/ttm/ttm_bo_api.h                |  2 +-
> >>>   include/drm/ttm/ttm_bo_driver.h             |  2 +
> >>>   16 files changed, 139 insertions(+), 34 deletions(-)
> >>>
> >>> --
> >>> 2.7.4
> >>>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> http://blog.ffwll.ch
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch