Re: [Nouveau] [PATCH v2 4/4] drm/nouveau: gpu lockup recovery
On Thu, Apr 26, 2012 at 05:32:29PM +1000, Ben Skeggs wrote:
> On Wed, 2012-04-25 at 23:20 +0200, Marcin Slusarz wrote:
> > Overall idea:
> > Detect lockups by watching for timeouts (vm flush / fence), return -EIOs,
> > handle them at ioctl level, reset the GPU and repeat last ioctl.
> >
> > GPU reset is done by doing suspend / resume cycle with few tweaks:
> > - CPU-only bo eviction
> > - ignoring vm flush / fence timeouts
> > - shortening waits
> Okay. I've thought about this a bit for a couple of days and think I'll
> be able to coherently share my thoughts on this issue now :)
>
> Firstly, while I agree that we need to become more resilient to errors,
> I don't think that following in the radeon/intel footsteps with
> something (imo, hackish) like this is the right choice for us
> necessarily.

This is not only the radeon/intel way. Windows, since Vista SP1, does the same -
see http://msdn.microsoft.com/en-us/windows/hardware/gg487368. It's funny how
similar it is to this patch (I hadn't seen this page earlier).

If you fear people will stop reporting bugs - don't. GPU reset is painfully slow
and can take up to 50 seconds (BO eviction is the most time-consuming part), so
people will be annoyed enough to report them. Currently, GPU lockups make users
so angry that they frequently switch to the blob without even thinking about
reporting anything.

> The *vast* majority of "lockups" we have are as a result of us badly
> mishandling exceptions reported to us by the GPU. There are a couple of
> exceptions, however, they're very rare..
> A very common example is where people gain DMA_PUSHERs for whatever
> reason, and things go haywire eventually.

Nope, I had tens of lockups during testing, and only once did I see a
DMA_PUSHER before detecting a GPU lockup.

> To handle a DMA_PUSHER
> sanely, generally you have to drop all pending commands for the channel
> (set GET=PUT, etc) and continue on. However, this leaves us with fences
> and semaphores unsignalled etc, causing issues further up the stack with
> perfectly good channels hanging on attempting to sync with the crashed
> channel etc.
>
> The next most common example I can think of is nv4x hardware, getting a
> LIMIT_COLOR/ZETA exception from PGRAPH, and then a hang. The solution
> is simple, learn how to handle the exception, log it, and PGRAPH
> survives.
>
> I strongly believe that if we focused our efforts on dealing with what
> the GPU reports to us a lot better, we'll find we really don't need such
> "lockup recovery".

While I agree we need to improve on error handling to make "lockup recovery"
unnecessary, the reality is that we can't predict everything, and the driver
needs to cope with its own bugs.

> I am, however, considering pulling the vm flush timeout error
> propagation and break-out-of-waits-on-signals that builds on it. As we
> really do need to become better at having killable processes if things
> go wrong :)

Good :)

Marcin
Re: [Nouveau] [PATCH v2 4/4] drm/nouveau: gpu lockup recovery
On Wed, 2012-04-25 at 23:20 +0200, Marcin Slusarz wrote:
> Overall idea:
> Detect lockups by watching for timeouts (vm flush / fence), return -EIOs,
> handle them at ioctl level, reset the GPU and repeat last ioctl.
>
> GPU reset is done by doing suspend / resume cycle with few tweaks:
> - CPU-only bo eviction
> - ignoring vm flush / fence timeouts
> - shortening waits

Okay. I've thought about this a bit for a couple of days and think I'll be
able to coherently share my thoughts on this issue now :)

Firstly, while I agree that we need to become more resilient to errors, I
don't think that following in the radeon/intel footsteps with something
(imo, hackish) like this is necessarily the right choice for us.

The *vast* majority of "lockups" we have are as a result of us badly
mishandling exceptions reported to us by the GPU. There are a couple of
exceptions, however, they're very rare.

A very common example is where people gain DMA_PUSHERs for whatever reason,
and things go haywire eventually. To handle a DMA_PUSHER sanely, generally
you have to drop all pending commands for the channel (set GET=PUT, etc)
and continue on. However, this leaves us with fences and semaphores
unsignalled etc, causing issues further up the stack with perfectly good
channels hanging on attempting to sync with the crashed channel.

The next most common example I can think of is nv4x hardware getting a
LIMIT_COLOR/ZETA exception from PGRAPH, and then a hang. The solution is
simple: learn how to handle the exception, log it, and PGRAPH survives.

I strongly believe that if we focused our efforts on dealing with what the
GPU reports to us a lot better, we'd find we really don't need such "lockup
recovery".

I am, however, considering pulling the vm flush timeout error propagation
and the break-out-of-waits-on-signals that builds on it, as we really do
need to become better at having killable processes if things go wrong :)

Ben.
> > Signed-off-by: Marcin Slusarz
> > ---
> [quoted patch snipped; the full diff appears in the original posting in this thread]
[PATCH v2 4/4] drm/nouveau: gpu lockup recovery
On Wed, Apr 25, 2012 at 11:20:36PM +0200, Marcin Slusarz wrote:
> Overall idea:
> Detect lockups by watching for timeouts (vm flush / fence), return -EIOs,
> handle them at ioctl level, reset the GPU and repeat last ioctl.
>
> GPU reset is done by doing suspend / resume cycle with few tweaks:
> - CPU-only bo eviction
> - ignoring vm flush / fence timeouts
> - shortening waits
>
> Signed-off-by: Marcin Slusarz
> ---

What changed from v1:
- moved ioctl locking from drm core to nouveau
- made down_reads interruptible
- fixed build bug on 32-bit systems
[PATCH v2 4/4] drm/nouveau: gpu lockup recovery
Overall idea:
Detect lockups by watching for timeouts (vm flush / fence), return -EIOs,
handle them at ioctl level, reset the GPU and repeat last ioctl.

GPU reset is done by doing suspend / resume cycle with few tweaks:
- CPU-only bo eviction
- ignoring vm flush / fence timeouts
- shortening waits

Signed-off-by: Marcin Slusarz
---
 drivers/gpu/drm/nouveau/Makefile           |    2 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c       |    2 +-
 drivers/gpu/drm/nouveau/nouveau_channel.c  |    5 +-
 drivers/gpu/drm/nouveau/nouveau_drv.c      |   56 ++-
 drivers/gpu/drm/nouveau/nouveau_drv.h      |   45 -
 drivers/gpu/drm/nouveau/nouveau_fence.c    |    7 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c      |   14 +++-
 drivers/gpu/drm/nouveau/nouveau_notifier.c |    3 +
 drivers/gpu/drm/nouveau/nouveau_object.c   |    6 +
 drivers/gpu/drm/nouveau/nouveau_reset.c    |  148 ++++
 drivers/gpu/drm/nouveau/nouveau_state.c    |    6 +
 drivers/gpu/drm/nouveau/nv50_graph.c       |   11 +-
 12 files changed, 290 insertions(+), 15 deletions(-)
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_reset.c

diff --git a/drivers/gpu/drm/nouveau/Makefile b/drivers/gpu/drm/nouveau/Makefile
index 03860f5..77d0c33 100644
--- a/drivers/gpu/drm/nouveau/Makefile
+++ b/drivers/gpu/drm/nouveau/Makefile
@@ -9,7 +9,7 @@ nouveau-y := nouveau_drv.o nouveau_state.o nouveau_channel.o nouveau_mem.o \
 	     nouveau_bo.o nouveau_fence.o nouveau_gem.o nouveau_ttm.o \
 	     nouveau_hw.o nouveau_calc.o nouveau_bios.o nouveau_i2c.o \
 	     nouveau_display.o nouveau_connector.o nouveau_fbcon.o \
-	     nouveau_hdmi.o nouveau_dp.o nouveau_ramht.o \
+	     nouveau_hdmi.o nouveau_dp.o nouveau_ramht.o nouveau_reset.o \
 	     nouveau_pm.o nouveau_volt.o nouveau_perf.o nouveau_temp.o \
 	     nouveau_mm.o nouveau_vm.o nouveau_mxm.o nouveau_gpio.o \
 	     nv04_timer.o \
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 5b0dc50..7de6cad 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -936,7 +936,7 @@ nouveau_bo_move(struct ttm_buffer_object *bo, bool evict, bool intr,
 	}
 
 	/* Software copy if the card isn't up and running yet. */
-	if (!dev_priv->channel) {
+	if (!dev_priv->channel || nouveau_gpu_reset_in_progress(dev_priv->dev)) {
 		ret = ttm_bo_move_memcpy(bo, evict, no_wait_reserve, no_wait_gpu, new_mem);
 		goto out;
 	}
diff --git a/drivers/gpu/drm/nouveau/nouveau_channel.c b/drivers/gpu/drm/nouveau/nouveau_channel.c
index 846afb0..c0fa5a7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_channel.c
+++ b/drivers/gpu/drm/nouveau/nouveau_channel.c
@@ -420,7 +420,7 @@ nouveau_ioctl_fifo_alloc(struct drm_device *dev, void *data,
 				    init->fb_ctxdma_handle,
 				    init->tt_ctxdma_handle);
 	if (ret)
-		return ret;
+		goto out;
 	init->channel  = chan->id;
 
 	if (nouveau_vram_pushbuf == 0) {
@@ -450,6 +450,9 @@ nouveau_ioctl_fifo_alloc(struct drm_device *dev, void *data,
 	if (ret == 0)
 		atomic_inc(&chan->users); /* userspace reference */
 	nouveau_channel_put(&chan);
+out:
+	if (ret == -EIO)
+		ret = nouveau_reset_device(dev);
 	return ret;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.c b/drivers/gpu/drm/nouveau/nouveau_drv.c
index 090fff6..261e1f5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.c
@@ -237,7 +237,7 @@ nouveau_pci_suspend(struct pci_dev *pdev, pm_message_t pm_state)
 		if (!dev_priv->eng[e])
 			continue;
 
-		ret = dev_priv->eng[e]->fini(dev, e, true);
+		ret = dev_priv->eng[e]->fini(dev, e, !nouveau_gpu_reset_in_progress(dev));
 		if (ret) {
 			NV_ERROR(dev, "... engine %d failed: %d\n", e, ret);
 			goto out_abort;
@@ -443,11 +443,63 @@ nouveau_pci_resume(struct pci_dev *pdev)
 	return 0;
 }
 
+void intr_rwsem_init(struct intr_rwsem *r)
+{
+	init_rwsem(&r->rwsem);
+	mutex_init(&r->mutex);
+}
+
+int intr_rwsem_down_read_interruptible(struct intr_rwsem *r)
+{
+	while (down_read_trylock(&r->rwsem) == 0) {
+		int ret = mutex_lock_interruptible(&r->mutex);
+		if (ret)
+			return ret;
+		mutex_unlock(&r->mutex);
+	}
+	return 0;
+}
+
+void intr_rwsem_up_read(struct intr_rwsem *r)
+{
+	up_read(&r->rwsem);
+}
+
+void intr_rwsem_down_write(struct intr_rwsem *r)
+{
+	mutex_lock(&r->mutex);
+	down_write(&r->rwsem);
+}
+
+void intr_rwsem_up_write(struct intr_rwsem *r)
+{
+	up_write(&r->rwsem);
+	mutex_unlock(&r->mutex);
+}
+
+static long nouveau_ioctl(struct file *