-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Saturday 22 May 2004 16:04, Michel DÃnzer wrote: > On Sat, 2004-05-22 at 14:04, Nicolai Haehnle wrote: > > It seems to me as if DRM(unlock) in drm_drv.h unlocks without checking > > whether the caller actually holds the global lock. There is no > > LOCK_TEST_WITH_RETURN or similar, and the helper function lock_transfer has > > no check in it either. > > Did I miss something, or is this intended behaviour? It certainly seems > > strange to me. > > True. Note that the lock ioctls are only used on contention, but still.
Unless I'm mistaken, DRM(lock) is always called when a client wants the lock for the first time (or when it needs to re-grab after it lost the lock). This is necessary because the DRM makes sure that dev->lock.filp matches the "calling" file. Afterwards, the ioctls are only used on contention. The entire locking can be subverted anyway, because part of the lock is in userspace. I believe the important thing is to make sure that the X server can force a return into a sane locking state. > > Side question: Is killing the offending DRI client enough? When the process > > is killed, the /dev/drm fd is closed, which should automatically release > > the lock. On the other hand, I'm pretty sure that we can't just kill a > > process immediately (unfortunately, I'm not familiar with process handling > > in the kernel). What if, for some reason, the process is in a state where > > it can't be killed yet? > > We're screwed? :) Looks like it... > This sounds like an idea for you to play with, but I'm afraid it won't > be useful very often in my experience: > > * getting rid of the offending client doesn't help with a wedged > chip (some way to recover from that would be nice...) > * it doesn't help if the X server itself spins with the lock held You were right, of course, while I show my lack of experience with driver writing. In my case I can get the X server's reset code to run, but some way through the reset the machine finally locks up completely (no more networking, no more disk I/O). I'm curious though, how can a complete lockup like this be caused by the graphics card? My guess would be that it grabs the PCI/AGP bus forever for some reason (the dark side of bus mastering, so to speak). Is there anything else that could be the cause? > > Side question #2: Is it safe to release the DRM lock in the watchdog? There > > might be races where the offending DRI client is currently executing a DRM > > ioctl when the watchdog fires. > > Not sure, but this might not be a problem when just killing the > offending process? You're right. On the other hand, it might sometimes be useful to be a little bit nicer to the offending process (see point 4 below). I had a go at implementing my watchdog idea for Linux, see the attached patch. It basically works, but I couldn't test it on a system where the DRI actually works without locking up... *sigh* Now for some notes: 1. This only affects the DRM for Linux. I don't have an installation of BSD, and while I know a little bit about the Linux kernel, I don't know anything about the BSD kernel(s). 2. The timeout cannot be configured yet. I didn't find "prior art" as to how something like it should be configured, so I'm open for input. For a Linux driver, adding to the /proc entries seems to be the logical way to go, but the DRI is very ioctl-centric. Maybe both? 3. Privileged processes may take the hardware lock for an infinite amount of time. This is necessary because the X server holds the lock when VT is switched away. Currently, "privileged" means capable(CAP_SYS_ADMIN). I would prefer if it meant "the multiplexing controller process", i.e. the one that authenticates other processes. Unfortunately, this distinction isn't made anywhere in the DRM as far as I can see. This means that runaway DRI clients owned by root aren't killed by the watchdog, either. 4. Keith mentioned single-stepping through a driver, and he does have a point. Unfortunately, I also believe that it's not that simple. Suppose an application developer debugs a windowed OpenGL application, on the local machine, without a dual-head setup. It may sound like a naive thing to do, but this actually works on Windows (yes, Windows is *a lot* more stable than Linux/BSD in that respect). Now suppose she's got a bug in her application (e.g. bad vertex array) that triggers a segmentation fault inside the GL driver, while the hardware lock is held. GDB will catch that signal, so the process won't die, which in turn means that the lock is not released. Thus the developer's machine locks up unless the watchdog kicks in (of course, the watchdog in its current form will also frustrate her to no end). cu, Nicolai > > -- > Earthling Michel DÃnzer | Debian (powerpc), X and DRI developer > Libre software enthusiast | http://svcs.affero.net/rm.php?r=daenzer -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) iD8DBQFAsHd5sxPozBga0lwRAl2UAJoD78zAW8HaXMpn5uDh/qK4jJ4tjQCgzuG2 iWiQ90y9AvVDqoNR+s1iqrY= =9eqi -----END PGP SIGNATURE-----
diff -ur -x CVS -x Makefile drm-base/linux/drm_drv.h drm/linux/drm_drv.h --- drm-base/linux/drm_drv.h 2004-05-22 21:41:28.000000000 +0200 +++ drm/linux/drm_drv.h 2004-05-22 21:26:18.000000000 +0200 @@ -273,6 +273,8 @@ MODULE_PARM( drm_opts, "s" ); MODULE_LICENSE("GPL and additional rights"); +static void drm_lock_watchdog( unsigned long __data ); + static int DRM(setup)( drm_device_t *dev ) { int i; @@ -282,6 +284,10 @@ atomic_set( &dev->vma_count, 0 ); dev->buf_use = 0; atomic_set( &dev->buf_alloc, 0 ); + + init_timer( &dev->lock.watchdog ); + dev->lock.watchdog.data = (unsigned long) dev; + dev->lock.watchdog.function = drm_lock_watchdog; #if __HAVE_DMA i = DRM(dma_setup)( dev ); @@ -415,6 +421,7 @@ down( &dev->struct_sem ); del_timer( &dev->timer ); + del_timer_sync( &dev->lock.watchdog ); if ( dev->devname ) { DRM(free)( dev->devname, strlen( dev->devname ) + 1, @@ -545,6 +552,7 @@ if ( dev->lock.hw_lock ) { dev->sigdata.lock = dev->lock.hw_lock = NULL; /* SHM removed */ dev->lock.filp = 0; + dev->lock.dontbreak = 1; wake_up_interruptible( &dev->lock.lock_queue ); } @@ -928,6 +936,11 @@ #if __HAVE_RELEASE DRIVER_RELEASE(); #endif + /* Avoid potential race where the watchdog callback is still + * running when filp is freed. + */ + del_timer_sync( &dev->lock.watchdog ); + DRM(lock_free)( dev, &dev->lock.hw_lock->lock, _DRM_LOCKING_CONTEXT(dev->lock.hw_lock->lock) ); @@ -951,6 +964,7 @@ } if ( DRM(lock_take)( &dev->lock.hw_lock->lock, DRM_KERNEL_CONTEXT ) ) { + dev->lock.dontbreak = 1; dev->lock.filp = filp; dev->lock.lock_time = jiffies; atomic_inc( &dev->counts[_DRM_STAT_LOCKS] ); @@ -1096,6 +1110,40 @@ return retcode; } + +/** + * Lock watchdog callback function. + * + * Whenever a privileged client must sleep on the lock waitqueue + * in the LOCK ioctl, the watchdog timer is started. + * When the UNLOCK ioctl is called, the timer is stopped. + * + * When the watchdog timer expires, the process holding the lock + * is killed. Privileged clients set lock.dontbreak and are exempt + * from this rule. + */ +static void drm_lock_watchdog( unsigned long __data ) +{ + drm_device_t *dev = (drm_device_t *)__data; + drm_file_t *priv; + + if ( !dev->lock.filp ) { + DRM_DEBUG( "held by kernel\n" ); + return; + } + + if ( dev->lock.dontbreak ) { + DRM_DEBUG( "privileged lock\n" ); + return; + } + + priv = dev->lock.filp->private_data; + DRM_DEBUG( "Kill pid=%d\n", priv->pid ); + + kill_proc( priv->pid, SIGKILL, 1 ); +} + + /** * Lock ioctl. * @@ -1115,6 +1163,7 @@ DECLARE_WAITQUEUE( entry, current ); drm_lock_t lock; int ret = 0; + int privileged = capable( CAP_SYS_ADMIN ); #if __HAVE_MULTIPLE_DMA_QUEUES drm_queue_t *q; #endif @@ -1157,6 +1206,7 @@ } if ( DRM(lock_take)( &dev->lock.hw_lock->lock, lock.context ) ) { + dev->lock.dontbreak = privileged; dev->lock.filp = filp; dev->lock.lock_time = jiffies; atomic_inc( &dev->counts[_DRM_STAT_LOCKS] ); @@ -1164,6 +1214,13 @@ } /* Contention */ + if ( privileged ) { + if ( !timer_pending( &dev->lock.watchdog ) ) { + DRM_DEBUG( "Starting lock watchdog\n" ); + mod_timer( &dev->lock.watchdog, jiffies + 5 * HZ ); + } + } + schedule(); if ( signal_pending( current ) ) { ret = -ERESTARTSYS; @@ -1243,8 +1300,12 @@ return -EINVAL; } + DRM_DEBUG( "\n" ); + atomic_inc( &dev->counts[_DRM_STAT_UNLOCKS] ); + del_timer_sync( &dev->lock.watchdog ); + /* __HAVE_KERNEL_CTX_SWITCH isn't used by any of the drm * modules in the DRI cvs tree, but it is required by the * Sparc driver. diff -ur -x CVS -x Makefile drm-base/linux/drmP.h drm/linux/drmP.h --- drm-base/linux/drmP.h 2004-05-22 21:41:28.000000000 +0200 +++ drm/linux/drmP.h 2004-05-22 19:52:41.000000000 +0200 @@ -574,6 +574,8 @@ struct file *filp; /**< File descr of lock holder (0=kernel) */ wait_queue_head_t lock_queue; /**< Queue of blocked processes */ unsigned long lock_time; /**< Time of last lock in jiffies */ + struct timer_list watchdog; /**< Watchdog timer to kill runaway processes */ + int dontbreak; /**< Even watchdog honours the current lock */ } drm_lock_data_t; /**