-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Saturday 22 May 2004 16:04, Michel DÃnzer wrote:
> On Sat, 2004-05-22 at 14:04, Nicolai Haehnle wrote:
> > It seems to me as if DRM(unlock) in drm_drv.h unlocks without checking 
> > whether the caller actually holds the global lock. There is no 
> > LOCK_TEST_WITH_RETURN or similar, and the helper function lock_transfer 
has 
> > no check in it either.
> > Did I miss something, or is this intended behaviour? It certainly seems 
> > strange to me.
> 
> True. Note that the lock ioctls are only used on contention, but still.

Unless I'm mistaken, DRM(lock) is always called when a client wants the lock 
for the first time (or when it needs to re-grab after it lost the lock). 
This is necessary because the DRM makes sure that dev->lock.filp matches 
the "calling" file. Afterwards, the ioctls are only used on contention.
The entire locking can be subverted anyway, because part of the lock is in 
userspace. I believe the important thing is to make sure that the X server 
can force a return into a sane locking state.

> > Side question: Is killing the offending DRI client enough? When the 
process 
> > is killed, the /dev/drm fd is closed, which should automatically release 
> > the lock. On the other hand, I'm pretty sure that we can't just kill a 
> > process immediately (unfortunately, I'm not familiar with process 
handling 
> > in the kernel). What if, for some reason, the process is in a state 
where 
> > it can't be killed yet?
> 
> We're screwed? :)

Looks like it...

> This sounds like an idea for you to play with, but I'm afraid it won't
> be useful very often in my experience:
> 
>       * getting rid of the offending client doesn't help with a wedged
>         chip (some way to recover from that would be nice...)
>       * it doesn't help if the X server itself spins with the lock held

You were right, of course, while I show my lack of experience with driver 
writing. In my case I can get the X server's reset code to run, but some 
way through the reset the machine finally locks up completely (no more 
networking, no more disk I/O).

I'm curious though, how can a complete lockup like this be caused by the 
graphics card? My guess would be that it grabs the PCI/AGP bus forever for 
some reason (the dark side of bus mastering, so to speak). Is there 
anything else that could be the cause?

> > Side question #2: Is it safe to release the DRM lock in the watchdog? 
There 
> > might be races where the offending DRI client is currently executing a 
DRM 
> > ioctl when the watchdog fires.
> 
> Not sure, but this might not be a problem when just killing the
> offending process?

You're right.
On the other hand, it might sometimes be useful to be a little bit nicer to 
the offending process (see point 4 below).

I had a go at implementing my watchdog idea for Linux, see the attached 
patch. It basically works, but I couldn't test it on a system where the DRI 
actually works without locking up... *sigh*

Now for some notes:
1. This only affects the DRM for Linux. I don't have an installation of BSD, 
and while I know a little bit about the Linux kernel, I don't know anything 
about the BSD kernel(s).

2. The timeout cannot be configured yet. I didn't find "prior art" as to how 
something like it should be configured, so I'm open for input. For a Linux 
driver, adding to the /proc entries seems to be the logical way to go, but 
the DRI is very ioctl-centric. Maybe both?

3. Privileged processes may take the hardware lock for an infinite amount of 
time. This is necessary because the X server holds the lock when VT is 
switched away. 
Currently, "privileged" means capable(CAP_SYS_ADMIN). I would prefer if it 
meant "the multiplexing controller process", i.e. the one that 
authenticates other processes. Unfortunately, this distinction isn't made 
anywhere in the DRM as far as I can see. This means that runaway DRI 
clients owned by root aren't killed by the watchdog, either.

4. Keith mentioned single-stepping through a driver, and he does have a 
point. Unfortunately, I also believe that it's not that simple.
Suppose an application developer debugs a windowed OpenGL application, on 
the local machine, without a dual-head setup. It may sound like a naive 
thing to do, but this actually works on Windows (yes, Windows is *a lot* 
more stable than Linux/BSD in that respect).
Now suppose she's got a bug in her application (e.g. bad vertex array) that 
triggers a segmentation fault inside the GL driver, while the hardware lock 
is held. GDB will catch that signal, so the process won't die, which in 
turn means that the lock is not released. Thus the developer's machine 
locks up unless the watchdog kicks in (of course, the watchdog in its 
current form will also frustrate her to no end).

cu,
Nicolai

> 
> -- 
> Earthling Michel DÃnzer      |     Debian (powerpc), X and DRI developer
> Libre software enthusiast    |   http://svcs.affero.net/rm.php?r=daenzer
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAsHd5sxPozBga0lwRAl2UAJoD78zAW8HaXMpn5uDh/qK4jJ4tjQCgzuG2
iWiQ90y9AvVDqoNR+s1iqrY=
=9eqi
-----END PGP SIGNATURE-----
diff -ur -x CVS -x Makefile drm-base/linux/drm_drv.h drm/linux/drm_drv.h
--- drm-base/linux/drm_drv.h	2004-05-22 21:41:28.000000000 +0200
+++ drm/linux/drm_drv.h	2004-05-22 21:26:18.000000000 +0200
@@ -273,6 +273,8 @@
 MODULE_PARM( drm_opts, "s" );
 MODULE_LICENSE("GPL and additional rights");
 
+static void drm_lock_watchdog( unsigned long __data );
+
 static int DRM(setup)( drm_device_t *dev )
 {
 	int i;
@@ -282,6 +284,10 @@
 	atomic_set( &dev->vma_count, 0 );
 	dev->buf_use = 0;
 	atomic_set( &dev->buf_alloc, 0 );
+	
+	init_timer( &dev->lock.watchdog );
+	dev->lock.watchdog.data = (unsigned long) dev;
+	dev->lock.watchdog.function = drm_lock_watchdog;
 
 #if __HAVE_DMA
 	i = DRM(dma_setup)( dev );
@@ -415,6 +421,7 @@
 
 	down( &dev->struct_sem );
 	del_timer( &dev->timer );
+	del_timer_sync( &dev->lock.watchdog );
 
 	if ( dev->devname ) {
 		DRM(free)( dev->devname, strlen( dev->devname ) + 1,
@@ -545,6 +552,7 @@
 	if ( dev->lock.hw_lock ) {
 		dev->sigdata.lock = dev->lock.hw_lock = NULL; /* SHM removed */
 		dev->lock.filp = 0;
+		dev->lock.dontbreak = 1;
 		wake_up_interruptible( &dev->lock.lock_queue );
 	}
 	
@@ -928,6 +936,11 @@
 #if __HAVE_RELEASE
 		DRIVER_RELEASE();
 #endif
+		/* Avoid potential race where the watchdog callback is still
+		 * running when filp is freed.
+		 */
+		del_timer_sync( &dev->lock.watchdog );
+
 		DRM(lock_free)( dev, &dev->lock.hw_lock->lock,
 				_DRM_LOCKING_CONTEXT(dev->lock.hw_lock->lock) );
 
@@ -951,6 +964,7 @@
 			}
 			if ( DRM(lock_take)( &dev->lock.hw_lock->lock,
 					     DRM_KERNEL_CONTEXT ) ) {
+				dev->lock.dontbreak = 1;
 				dev->lock.filp	    = filp;
 				dev->lock.lock_time = jiffies;
                                 atomic_inc( &dev->counts[_DRM_STAT_LOCKS] );
@@ -1096,6 +1110,40 @@
 	return retcode;
 }
 
+
+/**
+ * Lock watchdog callback function.
+ *
+ * Whenever a privileged client must sleep on the lock waitqueue
+ * in the LOCK ioctl, the watchdog timer is started.
+ * When the UNLOCK ioctl is called, the timer is stopped.
+ *
+ * When the watchdog timer expires, the process holding the lock
+ * is killed. Privileged clients set lock.dontbreak and are exempt
+ * from this rule.
+ */
+static void drm_lock_watchdog( unsigned long __data )
+{
+	drm_device_t *dev = (drm_device_t *)__data;
+	drm_file_t *priv;
+	
+	if ( !dev->lock.filp ) {
+		DRM_DEBUG( "held by kernel\n" );
+		return;
+	}
+	
+	if ( dev->lock.dontbreak ) {
+		DRM_DEBUG( "privileged lock\n" );
+		return;
+	}
+	
+	priv = dev->lock.filp->private_data;
+	DRM_DEBUG( "Kill pid=%d\n", priv->pid );
+	
+	kill_proc( priv->pid, SIGKILL, 1 );
+}
+
+
 /** 
  * Lock ioctl.
  *
@@ -1115,6 +1163,7 @@
         DECLARE_WAITQUEUE( entry, current );
         drm_lock_t lock;
         int ret = 0;
+        int privileged = capable( CAP_SYS_ADMIN );
 #if __HAVE_MULTIPLE_DMA_QUEUES
 	drm_queue_t *q;
 #endif
@@ -1157,6 +1206,7 @@
                         }
                         if ( DRM(lock_take)( &dev->lock.hw_lock->lock,
 					     lock.context ) ) {
+                                dev->lock.dontbreak = privileged;
                                 dev->lock.filp      = filp;
                                 dev->lock.lock_time = jiffies;
                                 atomic_inc( &dev->counts[_DRM_STAT_LOCKS] );
@@ -1164,6 +1214,13 @@
                         }
 
                                 /* Contention */
+                        if ( privileged ) {
+                        	if ( !timer_pending( &dev->lock.watchdog ) ) {
+                        		DRM_DEBUG( "Starting lock watchdog\n" );
+                        		mod_timer( &dev->lock.watchdog, jiffies + 5 * HZ );
+                        	}
+                        }
+                        
                         schedule();
                         if ( signal_pending( current ) ) {
                                 ret = -ERESTARTSYS;
@@ -1243,8 +1300,12 @@
 		return -EINVAL;
 	}
 
+	DRM_DEBUG( "\n" );
+
 	atomic_inc( &dev->counts[_DRM_STAT_UNLOCKS] );
 
+	del_timer_sync( &dev->lock.watchdog );
+
 	/* __HAVE_KERNEL_CTX_SWITCH isn't used by any of the drm
 	 * modules in the DRI cvs tree, but it is required by the
 	 * Sparc driver.
diff -ur -x CVS -x Makefile drm-base/linux/drmP.h drm/linux/drmP.h
--- drm-base/linux/drmP.h	2004-05-22 21:41:28.000000000 +0200
+++ drm/linux/drmP.h	2004-05-22 19:52:41.000000000 +0200
@@ -574,6 +574,8 @@
 	struct file       *filp;	/**< File descr of lock holder (0=kernel) */
 	wait_queue_head_t lock_queue;	/**< Queue of blocked processes */
 	unsigned long	  lock_time;	/**< Time of last lock in jiffies */
+	struct timer_list watchdog;	/**< Watchdog timer to kill runaway processes */
+	int               dontbreak;	/**< Even watchdog honours the current lock */
 } drm_lock_data_t;
 
 /**

Reply via email to