I've been trying to track down a DRI client and server deadlock problem. I think I now understand the cause, and I'd appreciate it if others could confirm whether this is a bug or whether I have misunderstood something.
This is the scenario:

1) Client takes heavyweight lock via ioctl. The lock now has the DRM_LOCK_HELD bit or'ed in, so the high nibble is 0x8.

2) Server requests heavyweight lock on a different context via ioctl. The lock is held by the client, so the server is suspended pending release of the lock by the client. The DRM_LOCK_CONT flag is or'ed in, so the high nibble is now 0xC.

3) Client wants to take lightweight lock, and the client currently holds the lock. A CAS test is performed between the lock and (DRM_LOCK_HELD | context). The CAS test fails because, even though the context is the same, the high nibble now has both the DRM_LOCK_HELD and the DRM_LOCK_CONT flags or'ed into it. The test would have succeeded if the DRM_LOCK_CONT flag were not set. Because the test fails, the client does not believe it owns the lock (but it does!) and issues a heavyweight ioctl lock on the very context it already holds the lock on.

4) In the kernel driver, DRM(take_lock) discovers the lock is already held on that context by that process. It issues an ERROR message and returns 0. A zero return value indicates the lock cannot be taken, so the driver suspends the client until the lock is released. But it is this client that holds the lock, so the client and server are now both suspended, each waiting for a lock release that will never occur: a classic deadlock.

Assuming my analysis is correct, I see the following possible solutions:

1) Invoke CAS twice, once with (DRM_LOCK_HELD | context) and, if it fails, once again with (DRM_LOCK_HELD | DRM_LOCK_CONT | context).

2) Remove DRM_LOCK_CONT from the lock and put the flag elsewhere.

3) Have CAS (or better, a macro that wraps it) mask out bits not belonging to the test (at the moment that's just DRM_LOCK_CONT).

4) Have DRM(take_lock) return TRUE if the lock is already held. I think this is a bad choice because it violates the locking semantics of no nested heavyweight locks in the driver.
The client would continue to be confused about when to lock and unlock, so the client needs to be fixed no matter what.

Questions:

1) Does the analysis sound correct?

2) If so, which approach is preferred? I need to make a patch to fix this, so I might as well do it in a manner that keeps the upstream developers happy. My personal preference is solution #3.

John

_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
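
For anyone who wants to see step 3 of the scenario concretely, here is a minimal user-space sketch in C of the failing comparison and of the masked test from solution #3. It is a simplified model, not the actual DRM code: the flag values are assumed from the high-nibble values 0x8 and 0xC described in the scenario, C11 atomics stand in for the real CAS macro, and holds_lock_exact and holds_lock_masked are hypothetical helper names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values, chosen to match the high-nibble values 0x8 and 0xC
 * from the scenario; check them against the real DRM headers. */
#define DRM_LOCK_HELD 0x80000000u
#define DRM_LOCK_CONT 0x40000000u

/* Model of the current client test: succeeds only if the lock word is
 * exactly (DRM_LOCK_HELD | context). */
static bool holds_lock_exact(_Atomic uint32_t *lock, uint32_t context)
{
    uint32_t expected = DRM_LOCK_HELD | context;
    /* Swapping in the same value makes this a pure ownership test. */
    return atomic_compare_exchange_strong(lock, &expected, expected);
}

/* Sketch of solution #3: ignore DRM_LOCK_CONT when comparing. */
static bool holds_lock_masked(_Atomic uint32_t *lock, uint32_t context)
{
    /* A context id must not overlap the flag bits. */
    assert((context & (DRM_LOCK_HELD | DRM_LOCK_CONT)) == 0);
    uint32_t value = atomic_load(lock) & ~DRM_LOCK_CONT;
    return value == (DRM_LOCK_HELD | context);
}
```

With a lock word of (DRM_LOCK_HELD | context), both helpers report ownership. Once DRM_LOCK_CONT is or'ed in, only the masked version still does, which is the behaviour the client needs. Note that the masked check here is a plain load and compare, so a real fix would still have to fit it into the existing atomic lock and unlock paths.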