http://bugs.freedesktop.org/show_bug.cgi?id=14937
Summary: Intel i915 hard lockup when using vsync-synchronized swapping
Product: DRI
Version: unspecified
Platform: All
OS/Version: Linux (All)
Status: NEW
Keywords: patch
Severity: major
Priority: highest
Component: DRM modules
AssignedTo: dri-devel@lists.sourceforge.net
ReportedBy: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]

Created an attachment (id=15005) --> (http://bugs.freedesktop.org/attachment.cgi?id=15005)
standard signed-off kernel patch to fix intel driver lockup on buffer swap

Running an OpenGL application which attempts to synchronize buffer swapping to the vertical sync event eventually results in a hard lockup of the Linux kernel. When this happens, even the magic SysRq key shows no signs of life.

The lockup takes from 5 to 90 minutes of running to appear, and you need to be running dual-head cloned mode for it to happen at all. The root cause is not related to dual head, but I am guessing that the less predictable interrupt rate in that configuration exacerbates the underlying race condition and raises the probability of failure.

A standard signed-off kernel patch which fixes the problem is attached to this bug. I am not a git expert, so I apologize that I don't have a git changeset built against the drm git repository. (I did, however, check that this problem still appears to be present in the git repository.) It should be trivial to apply this patch in any case. This patch was built against the 2.6.24.3 vanilla kernel source tree.

Following are the comments from the attached patch:

<blockquote>
The i915_vblank_swap() function schedules an automatic buffer swap upon receipt of the vertical sync interrupt. Such an operation is lengthy, so it cannot happen in normal interrupt context; the DRM implements it by scheduling the work in a kernel softirq-scheduled tasklet.
In order for the buffer swap to work safely, the DRM's central lock must be taken, via a call to drm_lock_take() in drm_irq.c within the function drm_locked_tasklet_func(). The lock-taking logic uses a non-interrupt-blocking spinlock to implement the manipulations needed to take the lock. Note that a non-interrupt-blocking spinlock blocks kernel pre-emption and atomically sets a flag, but interrupts are still enabled. This semantic is safe if ALL attempts to use the spinlock happen only from process context.

However, this buffer swap happens from softirq context, which is really a form of interrupt context that WILL pre-empt execution even when normal thread pre-emption is otherwise disabled. Thus we have an unsafe situation: drm_locked_tasklet_func() can spin on a spinlock already taken by a thread in process context, and that thread will never get scheduled again because the softirq tasklet that interrupted it never returns. This wedges the kernel hard. The race window is very small, but it is a race nonetheless, with a very undesirable potential outcome.

To trigger this bug, run a dual-head cloned mode configuration which uses the i915 drm, then execute an OpenGL application which synchronizes buffer swaps against the vertical sync interrupt. In my testing, a lockup always results after running anywhere from 5 minutes to an hour and a half. I believe dual-head is needed to really trigger the problem because the vertical sync interrupt handling is then no longer predictable (the interrupts are sourced from two different heads running at different speeds). This raises the probability of the tasklet trying to run while the userspace DRI is doing things to the GPU (and manipulating the DRM lock).

The fix is to change the relevant spinlock semantics to the interrupt-blocking form. Thus all calls for this spinlock change from spin_lock() to spin_lock_irqsave(), and similarly calls to spin_unlock() change to spin_unlock_irqrestore().
After this change I am no longer able to trigger the lockup; the longest test run so far was 6 hours (test stopped after that point).

Signed-off-by: Mike Isely <[EMAIL PROTECTED]>
</blockquote>