Re: [patch] Re: Some questions regarding locks
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I've attached a new version of the patch. This should fix a minor bug: I put the call to init_timer() too late, which resulted in a kernel warning when the module was loaded/unloaded without actually being used. On Sunday 23 May 2004 14:37, Michel Dnzer wrote: 2. The timeout cannot be configured yet. I didn't find prior art as to how something like it should be configured, so I'm open for input. For a Linux driver, adding to the /proc entries seems to be the logical way to go, but the DRI is very ioctl-centric. Maybe both? What's the goal of making it configurable at all, to allow for driver debugging? Maybe that could be dealt with better, see below. This is actually a good point :) Is there a way to tell that a process is being debugged? If so, maybe it could be handled sanely by default? E.g., release the lock while the process is stopped? (That might wreak havoc once execution is resumed though) ... Could be possible, but it *is* bound to wreak havoc. Now that you talk about it... in the far future, it would be *very* useful if clients could deal with temporary loss of access to the DRM. I'm thinking of the recent discussions about the possible future of fbdev, DRI, etc. where all graphics access eventually goes through the DRM, or something similar. In that scenario, we need to have a way to establish a secure terminal that is safe against a) fake messages / dialogs created by DRI clients running in the background and b) screen scraping by background clients I don't see how this could be done without revoking authorization temporarily, including unmapping memory regions. Once DRI clients can deal with this, running them in a debugger should be a piece of cake, really :) cu, Nicolai -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) iD8DBQFAs6BxsxPozBga0lwRAglnAKC9ldd4n/KbE1cDSLPapkRHMx2O0QCZAdMC Ab4c9daD4WyRZWyGPRJmSyw= =yLV/ -END PGP SIGNATURE- diff -ur drm-base/linux/drm_drv.h drm/linux/drm_drv.h --- drm-base/linux/drm_drv.h 2004-05-22 21:41:28.0 +0200 +++ drm/linux/drm_drv.h 2004-05-25 19:51:21.0 +0200 @@ -273,6 +273,8 @@ MODULE_PARM( drm_opts, s ); MODULE_LICENSE(GPL and additional rights); +static void drm_lock_watchdog( unsigned long __data ); + static int DRM(setup)( drm_device_t *dev ) { int i; @@ -415,6 +417,7 @@ down( dev-struct_sem ); del_timer( dev-timer ); + del_timer_sync( dev-lock.watchdog ); if ( dev-devname ) { DRM(free)( dev-devname, strlen( dev-devname ) + 1, @@ -545,6 +548,7 @@ if ( dev-lock.hw_lock ) { dev-sigdata.lock = dev-lock.hw_lock = NULL; /* SHM removed */ dev-lock.filp = 0; + dev-lock.dontbreak = 1; wake_up_interruptible( dev-lock.lock_queue ); } @@ -581,6 +585,10 @@ sema_init( dev-struct_sem, 1 ); sema_init( dev-ctxlist_sem, 1 ); + init_timer( dev-lock.watchdog ); + dev-lock.watchdog.data = (unsigned long) dev; + dev-lock.watchdog.function = drm_lock_watchdog; + if ((dev-minor = DRM(stub_register)(DRIVER_NAME, DRM(fops),dev)) 0) { retcode = -EPERM; @@ -928,6 +936,11 @@ #if __HAVE_RELEASE DRIVER_RELEASE(); #endif + /* Avoid potential race where the watchdog callback is still + * running when filp is freed. + */ + del_timer_sync( dev-lock.watchdog ); + DRM(lock_free)( dev, dev-lock.hw_lock-lock, _DRM_LOCKING_CONTEXT(dev-lock.hw_lock-lock) ); @@ -951,6 +964,7 @@ } if ( DRM(lock_take)( dev-lock.hw_lock-lock, DRM_KERNEL_CONTEXT ) ) { +dev-lock.dontbreak = 1; dev-lock.filp = filp; dev-lock.lock_time = jiffies; atomic_inc( dev-counts[_DRM_STAT_LOCKS] ); @@ -1096,6 +1110,40 @@ return retcode; } + +/** + * Lock watchdog callback function. + * + * Whenever a privileged client must sleep on the lock waitqueue + * in the LOCK ioctl, the watchdog timer is started. + * When the UNLOCK ioctl is called, the timer is stopped. + * + * When the watchdog timer expires, the process holding the lock + * is killed. Privileged clients set lock.dontbreak and are exempt + * from this rule. + */ +static void drm_lock_watchdog( unsigned long __data ) +{ + drm_device_t *dev = (drm_device_t *)__data; + drm_file_t *priv; + + if ( !dev-lock.filp ) { + DRM_DEBUG( held by kernel\n ); + return; + } + + if ( dev-lock.dontbreak ) { + DRM_DEBUG( privileged lock\n ); + return; + } + + priv = dev-lock.filp-private_data; + DRM_DEBUG( Kill pid=%d\n, priv-pid ); + + kill_proc( priv-pid, SIGKILL, 1 ); +} + + /** * Lock ioctl. * @@ -1115,6 +1163,7 @@ DECLARE_WAITQUEUE( entry, current ); drm_lock_t lock; int ret = 0; +int privileged = capable( CAP_SYS_ADMIN ); #if __HAVE_MULTIPLE_DMA_QUEUES drm_queue_t *q; #endif @@ -1157,6 +1206,7 @@ } if ( DRM(lock_take)( dev-lock.hw_lock-lock, lock.context ) ) { +
[patch] Re: Some questions regarding locks
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Saturday 22 May 2004 16:04, Michel Dnzer wrote: On Sat, 2004-05-22 at 14:04, Nicolai Haehnle wrote: It seems to me as if DRM(unlock) in drm_drv.h unlocks without checking whether the caller actually holds the global lock. There is no LOCK_TEST_WITH_RETURN or similar, and the helper function lock_transfer has no check in it either. Did I miss something, or is this intended behaviour? It certainly seems strange to me. True. Note that the lock ioctls are only used on contention, but still. Unless I'm mistaken, DRM(lock) is always called when a client wants the lock for the first time (or when it needs to re-grab after it lost the lock). This is necessary because the DRM makes sure that dev-lock.filp matches the calling file. Afterwards, the ioctls are only used on contention. The entire locking can be subverted anyway, because part of the lock is in userspace. I believe the important thing is to make sure that the X server can force a return into a sane locking state. Side question: Is killing the offending DRI client enough? When the process is killed, the /dev/drm fd is closed, which should automatically release the lock. On the other hand, I'm pretty sure that we can't just kill a process immediately (unfortunately, I'm not familiar with process handling in the kernel). What if, for some reason, the process is in a state where it can't be killed yet? We're screwed? :) Looks like it... This sounds like an idea for you to play with, but I'm afraid it won't be useful very often in my experience: * getting rid of the offending client doesn't help with a wedged chip (some way to recover from that would be nice...) * it doesn't help if the X server itself spins with the lock held You were right, of course, while I show my lack of experience with driver writing. In my case I can get the X server's reset code to run, but some way through the reset the machine finally locks up completely (no more networking, no more disk I/O). I'm curious though, how can a complete lockup like this be caused by the graphics card? My guess would be that it grabs the PCI/AGP bus forever for some reason (the dark side of bus mastering, so to speak). Is there anything else that could be the cause? Side question #2: Is it safe to release the DRM lock in the watchdog? There might be races where the offending DRI client is currently executing a DRM ioctl when the watchdog fires. Not sure, but this might not be a problem when just killing the offending process? You're right. On the other hand, it might sometimes be useful to be a little bit nicer to the offending process (see point 4 below). I had a go at implementing my watchdog idea for Linux, see the attached patch. It basically works, but I couldn't test it on a system where the DRI actually works without locking up... *sigh* Now for some notes: 1. This only affects the DRM for Linux. I don't have an installation of BSD, and while I know a little bit about the Linux kernel, I don't know anything about the BSD kernel(s). 2. The timeout cannot be configured yet. I didn't find prior art as to how something like it should be configured, so I'm open for input. For a Linux driver, adding to the /proc entries seems to be the logical way to go, but the DRI is very ioctl-centric. Maybe both? 3. Privileged processes may take the hardware lock for an infinite amount of time. This is necessary because the X server holds the lock when VT is switched away. Currently, privileged means capable(CAP_SYS_ADMIN). I would prefer if it meant the multiplexing controller process, i.e. the one that authenticates other processes. Unfortunately, this distinction isn't made anywhere in the DRM as far as I can see. This means that runaway DRI clients owned by root aren't killed by the watchdog, either. 4. Keith mentioned single-stepping through a driver, and he does have a point. Unfortunately, I also believe that it's not that simple. Suppose an application developer debugs a windowed OpenGL application, on the local machine, without a dual-head setup. It may sound like a naive thing to do, but this actually works on Windows (yes, Windows is *a lot* more stable than Linux/BSD in that respect). Now suppose she's got a bug in her application (e.g. bad vertex array) that triggers a segmentation fault inside the GL driver, while the hardware lock is held. GDB will catch that signal, so the process won't die, which in turn means that the lock is not released. Thus the developer's machine locks up unless the watchdog kicks in (of course, the watchdog in its current form will also frustrate her to no end). cu, Nicolai -- Earthling Michel Dnzer | Debian (powerpc), X and DRI developer Libre software enthusiast| http://svcs.affero.net/rm.php?r=daenzer -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux)
Re: [patch] Re: Some questions regarding locks
On Sun, 2004-05-23 at 12:05, Nicolai Haehnle wrote: This sounds like an idea for you to play with, but I'm afraid it won't be useful very often in my experience: * getting rid of the offending client doesn't help with a wedged chip (some way to recover from that would be nice...) * it doesn't help if the X server itself spins with the lock held You were right, of course, while I show my lack of experience with driver writing. In my case I can get the X server's reset code to run, but some way through the reset the machine finally locks up completely (no more networking, no more disk I/O). I'm curious though, how can a complete lockup like this be caused by the graphics card? My guess would be that it grabs the PCI/AGP bus forever for some reason (the dark side of bus mastering, so to speak). Is there anything else that could be the cause? I don't really know, but I also guess it's PCI/AGP related. 2. The timeout cannot be configured yet. I didn't find prior art as to how something like it should be configured, so I'm open for input. For a Linux driver, adding to the /proc entries seems to be the logical way to go, but the DRI is very ioctl-centric. Maybe both? What's the goal of making it configurable at all, to allow for driver debugging? Maybe that could be dealt with better, see below. 3. Privileged processes may take the hardware lock for an infinite amount of time. This is necessary because the X server holds the lock when VT is switched away. Currently, privileged means capable(CAP_SYS_ADMIN). I would prefer if it meant the multiplexing controller process, i.e. the one that authenticates other processes. Unfortunately, this distinction isn't made anywhere in the DRM as far as I can see. Maybe one could recognize it by the DRM context handle, a bit hackish though. This means that runaway DRI clients owned by root aren't killed by the watchdog, either. root is generally allowed to shoot itself in the foot, so this isn't all that bad I guess. 4. Keith mentioned single-stepping through a driver, and he does have a point. Unfortunately, I also believe that it's not that simple. Suppose an application developer debugs a windowed OpenGL application, on the local machine, without a dual-head setup. It may sound like a naive thing to do, but this actually works on Windows (yes, Windows is *a lot* more stable than Linux/BSD in that respect). Now suppose she's got a bug in her application (e.g. bad vertex array) that triggers a segmentation fault inside the GL driver, while the hardware lock is held. GDB will catch that signal, so the process won't die, which in turn means that the lock is not released. Thus the developer's machine locks up unless the watchdog kicks in (of course, the watchdog in its current form will also frustrate her to no end). Is there a way to tell that a process is being debugged? If so, maybe it could be handled sanely by default? E.g., release the lock while the process is stopped? (That might wreak havoc once execution is resumed though) ... Another random thought I had was to introduce a Magic Sysrq key combo to release the lock / kill the process holding it / whatever, but I guess that wouldn't help Joe User with the casual hanging 3D client either. -- Earthling Michel Dnzer | Debian (powerpc), X and DRI developer Libre software enthusiast| http://svcs.affero.net/rm.php?r=daenzer --- This SF.Net email is sponsored by: Oracle 10g Get certified on the hottest thing ever to hit the market... Oracle 10g. Take an Oracle 10g class now, and we'll give you the exam FREE. http://ads.osdn.com/?ad_id=3149alloc_id=8166op=click -- ___ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel