op 04-08-14 10:36, Christian K?nig schreef: > Hi Maarten, > > Sorry for the delay. I've got way to much todo recently. > > Am 01.08.2014 um 19:46 schrieb Maarten Lankhorst: >> >> On 01-08-14 18:35, Christian K?nig wrote: >>> Am 31.07.2014 um 17:33 schrieb Maarten Lankhorst: >>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com> >>>> --- >>>> V1 had a nasty bug breaking gpu lockup recovery. The fix is not >>>> allowing radeon_fence_driver_check_lockup to take exclusive_lock, >>>> and kill it during lockup recovery instead. >>> That looks like the delayed work starts running as soon as we submit a >>> fence, and not when it's needed for waiting. >>> >>> Since it's a backup for failing IRQs I would rather put it into >>> radeon_irq_kms.c and start/stop it when the IRQs are started/stoped. >> The delayed work is not just for failing irq's, it's also the handler that's >> used to detect lockups, which is why I trigger after processing fences, and >> reset the timer after processing. > > The idea was turning the delayed work on and off when we turn the irq on and > off as well, processing of the delayed work handler can still happen in > radeon_fence.c > >> >> Specifically what happened was this scenario: >> >> - lock up occurs >> - write lock taken by gpu_reset >> - delayed work runs, tries to acquire read lock, blocks >> - gpu_reset tries to cancel delayed work synchronously >> - has to wait for delayed work to finish -> deadlock > > Why do you want to disable the work item from the lockup handler in the first > place? > > Just take the exclusive lock in the work item, when it concurrently runs with > the lockup handler it will just block for the lockup handler to complete. With the delayed work radeon_fence_wait no longer handles unreliable interrupts itself, so it has to run from the lockup handler. But an alternative solution could be adding a radeon_fence_wait_timeout, ignore the timeout and check if fence is signaled on timeout. This would probably be a cleaner solution.
~Maarten