[Dri-devel] bug in light locks?

2003-10-07 Thread John Dennis
I've been trying to track down a DRI client and server deadlock
problem. I think I now understand the cause, and I'd appreciate it if
others could confirm whether this is a bug or a misunderstanding on my
part.

This is the scenario:

1) Client takes heavyweight lock via ioctl, lock now has DRM_LOCK_HELD
bit or'ed in, high nibble is now 0x8.

2) Server requests heavyweight lock on a different context via ioctl.
The lock is held by the client, so the server is suspended pending
release of the lock by the client. The DRM_LOCK_CONT flag is or'ed in,
high nibble is now 0xC.

3) Client wants to take lightweight lock, client currently holds
lock. A CAS test is performed between the lock and
(DRM_LOCK_HELD | context). The CAS test fails because, even though the
context is the same, the high nibble now has both the DRM_LOCK_HELD and
the DRM_LOCK_CONT flags or'ed into it. The test would have succeeded
if the DRM_LOCK_CONT flag was not set. Because the test fails, the
client does not believe it owns the lock (but it does!) and then
issues a heavyweight ioctl lock on the very context it already holds
the lock on.

4) In the kernel driver, DRM(take_lock) discovers the lock is already
held on that context by that process. It issues an ERROR message and
returns 0. A zero return value indicates the lock cannot be taken, so
the driver suspends the client to wait for the lock to be released.
But it is this client that holds the lock. Both the client and server
are now suspended, each waiting for a lock release that will never
occur: a classic deadlock.
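
To make steps 1-3 concrete, here is a minimal sketch that replays the
lock word values in user space. It is not driver code. The flag values
are assumptions chosen to match the high-nibble values 0x8 and 0xC
given above, and client_thinks_it_holds() and replay_scenario() are
hypothetical names standing in for the client-side CAS test.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values, chosen to match the high-nibble values 0x8 and
 * 0xC in the scenario; not copied from the driver headers. */
#define DRM_LOCK_HELD 0x80000000u
#define DRM_LOCK_CONT 0x40000000u

/* Hypothetical stand-in for the client-side CAS test: succeeds only if
 * the lock word equals exactly (DRM_LOCK_HELD | context). */
static bool client_thinks_it_holds(_Atomic uint32_t *lock, uint32_t context)
{
    uint32_t expected = DRM_LOCK_HELD | context;
    /* Swap the value for itself so a success leaves the word unchanged. */
    return atomic_compare_exchange_strong(lock, &expected, expected);
}

/* Replays steps 1-3 for one client context and reports the result of
 * the test before and after the server contends for the lock. */
static void replay_scenario(uint32_t context, bool *before, bool *after)
{
    _Atomic uint32_t lock = 0;

    /* A context id must not overlap the flag bits. */
    assert((context & (DRM_LOCK_HELD | DRM_LOCK_CONT)) == 0);

    atomic_store(&lock, DRM_LOCK_HELD | context);    /* step 1 */
    *before = client_thinks_it_holds(&lock, context);

    atomic_fetch_or(&lock, DRM_LOCK_CONT);           /* step 2 */
    *after = client_thinks_it_holds(&lock, context); /* step 3 */
}
```

With any valid context, the test passes before step 2 and fails after
it, although the owning context in the lock word never changed.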

Assuming my analysis is correct I see the following possible
solutions:

1) Invoke CAS twice, once with (DRM_LOCK_HELD | context) and if it
fails try once again with (DRM_LOCK_HELD | DRM_LOCK_CONT | context)

2) Remove the DRM_LOCK_CONT from the lock and put the flag elsewhere.

3) Have CAS (or better, a macro that wraps it) mask out bits not
belonging to the test (at the moment that's just DRM_LOCK_CONT).

4) Have DRM(take_lock) return TRUE if the lock is already held. I
think this is a bad choice because it violates the locking semantics
of no nested heavyweight locks in the driver. The client would
continue to be confused over when to lock and unlock, so the client
needs to be fixed no matter what.
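
As a rough illustration of solution 3, the sketch below masks out
DRM_LOCK_CONT before comparing. It is only one possible shape for such
a wrapper, not a proposed patch. The flag values are assumptions
matching the nibble values above, lock_held_by() and masked_cas() are
hypothetical names, and carrying the contention bit over into the new
value is a design assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values, matching high nibbles 0x8 and 0xC. */
#define DRM_LOCK_HELD 0x80000000u
#define DRM_LOCK_CONT 0x40000000u

/* Hypothetical helper: true if lock_value says `context` holds the
 * lock, whether or not the contention bit is set. */
static bool lock_held_by(uint32_t lock_value, uint32_t context)
{
    /* A context id must not overlap the flag bits. */
    assert((context & (DRM_LOCK_HELD | DRM_LOCK_CONT)) == 0);
    return (lock_value & ~DRM_LOCK_CONT) == (DRM_LOCK_HELD | context);
}

/* Hypothetical CAS wrapper: compares with DRM_LOCK_CONT masked out and,
 * on a match, stores new_val with the current contention bit kept. */
static bool masked_cas(_Atomic uint32_t *lock, uint32_t old_val,
                       uint32_t new_val)
{
    uint32_t cur = atomic_load(lock);

    while ((cur & ~DRM_LOCK_CONT) == old_val) {
        uint32_t desired = new_val | (cur & DRM_LOCK_CONT);
        if (atomic_compare_exchange_weak(lock, &cur, desired))
            return true;
        /* A failed exchange refreshed cur; the loop re-checks it. */
    }
    return false;
}
```

With the lock word at (DRM_LOCK_HELD | DRM_LOCK_CONT | context),
lock_held_by() reports the client as owner, and masked_cas() against
(DRM_LOCK_HELD | context) succeeds where the exact comparison fails.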

Questions:

1) Does the analysis sound correct?

2) If so, which approach is preferred? I need to make a patch to fix
this, so I might as well do it in a manner that keeps the upstream
developers happy. My personal preference is solution #3.

John



---
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [Dri-devel] bug in light locks?

2003-10-07 Thread Keith Whitwell
John Dennis wrote:
> I've been trying to track down a DRI client and server deadlock
> problem. I think I now know the problem, I'd appreciate it if others
> could confirm this is a bug or if I have a misunderstanding.
>
> This is the scenario:
>
> 1) Client takes heavyweight lock via ioctl, lock now has DRM_LOCK_HELD
> bit or'ed in, high nibble is now 0x8.
>
> 2) Server requests heavyweight lock on a different context via ioctl,
> lock is held by client, server is suspended pending release of lock by
> client. The DRM_LOCK_CONT flag is or'ed in, high nibble is now 0xC.
>
> 3) Client wants to take lightweight lock, client currently holds
> lock.

These locks don't support recursive locking -- if this situation arises, the
client is broken.

Keith


