Hi Maarten,

try implementing this as a replacement for specifying RADEON_FENCE_JIFFIES_TIMEOUT
on the wait_event_* calls, and reset the timeout every time radeon_fence_process
is called, not only when one of the sequence numbers increments.
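To make the idea concrete, the wait side could then look roughly like this
(only a sketch, quoting radeon_fence_wait_seq from memory, so treat the exact
names as assumptions rather than a finished patch):

	/* Today: the waiter polls with a short timeout so it can detect
	 * missed interrupts and lockups itself. */
	r = wait_event_interruptible_timeout(rdev->fence_queue,
		(signaled = radeon_fence_any_seq_signaled(rdev, target_seq)),
		RADEON_FENCE_JIFFIES_TIMEOUT);

	/* With the delayed work re-running radeon_fence_process()
	 * periodically, the waiter can simply block until woken, and the
	 * lockup timeout gets reset on every radeon_fence_process() call
	 * rather than only when a sequence number advances. */
	r = wait_event_interruptible(rdev->fence_queue,
		(signaled = radeon_fence_any_seq_signaled(rdev, target_seq)));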
I don't have the time right now to look deeper into it or help with the
patch, but the general approach sounds valid to me.

Regards,
Christian.

Am 23.07.2014 16:05, schrieb Maarten Lankhorst:
> op 23-07-14 15:16, Maarten Lankhorst schreef:
>> op 23-07-14 14:36, Christian König schreef:
>>> Am 23.07.2014 12:52, schrieb Daniel Vetter:
>>>> On Wed, Jul 23, 2014 at 12:13 PM, Christian König
>>>> <christian.koenig at amd.com> wrote:
>>>>>> And the dma-buf would still have fences belonging to both drivers,
>>>>>> and it would still call from outside the driver.
>>>>> Calling from outside the driver is fine as long as the driver can do
>>>>> everything necessary to complete its work and isn't forced into any
>>>>> ugly hacks and things that are not 100% reliable.
>>>>>
>>>>> So I don't see much alternative to integrating recovery code for
>>>>> non-firing interrupts and some kind of lockup handling into the
>>>>> fence code as well.
>>>> That approach doesn't really work all that well, since every driver
>>>> has its own reset semantics. And we're trying to move away from
>>>> global reset to fine-grained reset. So stop-the-world reset is out of
>>>> fashion, at least for i915. As you said, reset is normal in GPUs and
>>>> we're trying to make reset less invasive. I really don't see a point
>>>> in imposing a reset scheme upon all drivers, and I think you have
>>>> about as much motivation to convert radeon to the scheme used by i915
>>>> as I'll have for converting to the one used by radeon. If it would
>>>> fit at all.
>>> Oh my! No, I didn't want to suggest any global reset infrastructure.
>>>
>>> My idea was more that the fence framework provides a
>>> fence->process_signaling callback that is periodically called after
>>> enable_signaling is called, to trigger manual signal processing in the
>>> driver.
>>>
>>> This would both be suitable as a fallback in case of non-working
>>> interrupts as well as a chance for any driver to do necessary lockup
>>> handling.
>> I managed to do it without needing it to be part of the interface. I'm
>> not sure whether radeon_fence_driver_recheck needs exclusive_lock, but
>> if so it's a small change.
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
>> index 7fbfd41479f1..51b646b9c8bb 100644
>> --- a/drivers/gpu/drm/radeon/radeon.h
>> +++ b/drivers/gpu/drm/radeon/radeon.h
>> @@ -345,6 +345,9 @@ struct radeon_fence_driver {
>>  	uint64_t		sync_seq[RADEON_NUM_RINGS];
>>  	atomic64_t		last_seq;
>>  	bool			initialized;
>> +	struct delayed_work	work;
>> +	struct radeon_device	*rdev;
>> +	unsigned		ring;
>>  };
>>
>>  struct radeon_fence_cb {
>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
>> index da83f36dd708..955c825946ad 100644
>> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>> @@ -231,6 +231,9 @@ static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
>>  		}
>>  	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>>
>> +	if (!wake && last_seq < last_emitted)
>> +		schedule_delayed_work(&rdev->fence_drv[ring].work, msecs_to_jiffies(10));
>> +
>>
> When trying this: if (seq < last_emitted) is probably a better check.
>
> ~Maarten
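For reference, the work handler itself isn't in the hunks above; a minimal
sketch of what radeon_fence_driver_recheck could look like, assuming it only
re-runs fence processing (the open question about exclusive_lock applies
here, and the setup location is an assumption):

	static void radeon_fence_driver_recheck(struct work_struct *work)
	{
		struct radeon_fence_driver *drv =
			container_of(work, struct radeon_fence_driver, work.work);

		/* Re-run signal processing; __radeon_fence_process() above
		 * re-arms this work itself while emitted fences are still
		 * outstanding, so no explicit rescheduling is needed here. */
		radeon_fence_process(drv->rdev, drv->ring);
	}

	/* And the matching setup, e.g. in radeon_fence_driver_init_ring(): */
	INIT_DELAYED_WORK(&rdev->fence_drv[ring].work, radeon_fence_driver_recheck);
	rdev->fence_drv[ring].rdev = rdev;
	rdev->fence_drv[ring].ring = ring;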