Re: [patch] Re: Some questions regarding locks

2004-05-25 Thread Nicolai Haehnle
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I've attached a new version of the patch. This should fix a minor bug: I put 
the call to init_timer() too late, which resulted in a kernel warning when 
the module was loaded/unloaded without actually being used.

On Sunday 23 May 2004 14:37, Michel Dnzer wrote:
  2. The timeout cannot be configured yet. I didn't find prior art as to 
how 
  something like it should be configured, so I'm open for input. For a 
Linux 
  driver, adding to the /proc entries seems to be the logical way to go, 
but 
  the DRI is very ioctl-centric. Maybe both?
 
 What's the goal of making it configurable at all, to allow for driver
 debugging? Maybe that could be dealt with better, see below.

This is actually a good point :)
 
 Is there a way to tell that a process is being debugged? If so, maybe it
 could be handled sanely by default? E.g., release the lock while the
 process is stopped? (That might wreak havoc once execution is resumed
 though) ...

Could be possible, but it *is* bound to wreak havoc. Now that you talk about 
it... in the far future, it would be *very* useful if clients could deal 
with temporary loss of access to the DRM. I'm thinking of the recent 
discussions about the possible future of fbdev, DRI, etc. where all 
graphics access eventually goes through the DRM, or something similar.
In that scenario, we need to have a way to establish a secure terminal that 
is safe against
a) fake messages / dialogs created by DRI clients running in the background 
and
b) screen scraping by background clients

I don't see how this could be done without revoking authorization 
temporarily, including unmapping memory regions.
Once DRI clients can deal with this, running them in a debugger should be a 
piece of cake, really :)

cu,
Nicolai
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAs6BxsxPozBga0lwRAglnAKC9ldd4n/KbE1cDSLPapkRHMx2O0QCZAdMC
Ab4c9daD4WyRZWyGPRJmSyw=
=yLV/
-END PGP SIGNATURE-
diff -ur drm-base/linux/drm_drv.h drm/linux/drm_drv.h
--- drm-base/linux/drm_drv.h	2004-05-22 21:41:28.0 +0200
+++ drm/linux/drm_drv.h	2004-05-25 19:51:21.0 +0200
@@ -273,6 +273,8 @@
 MODULE_PARM( drm_opts, s );
 MODULE_LICENSE(GPL and additional rights);
 
+static void drm_lock_watchdog( unsigned long __data );
+
 static int DRM(setup)( drm_device_t *dev )
 {
 	int i;
@@ -415,6 +417,7 @@
 
 	down( dev-struct_sem );
 	del_timer( dev-timer );
+	del_timer_sync( dev-lock.watchdog );
 
 	if ( dev-devname ) {
 		DRM(free)( dev-devname, strlen( dev-devname ) + 1,
@@ -545,6 +548,7 @@
 	if ( dev-lock.hw_lock ) {
 		dev-sigdata.lock = dev-lock.hw_lock = NULL; /* SHM removed */
 		dev-lock.filp = 0;
+		dev-lock.dontbreak = 1;
 		wake_up_interruptible( dev-lock.lock_queue );
 	}
 	
@@ -581,6 +585,10 @@
 	sema_init( dev-struct_sem, 1 );
 	sema_init( dev-ctxlist_sem, 1 );
 
+	init_timer( dev-lock.watchdog );
+	dev-lock.watchdog.data = (unsigned long) dev;
+	dev-lock.watchdog.function = drm_lock_watchdog;
+
 	if ((dev-minor = DRM(stub_register)(DRIVER_NAME, DRM(fops),dev))  0)
 	{
 		retcode = -EPERM;
@@ -928,6 +936,11 @@
 #if __HAVE_RELEASE
 		DRIVER_RELEASE();
 #endif
+		/* Avoid potential race where the watchdog callback is still
+		 * running when filp is freed.
+		 */
+		del_timer_sync( dev-lock.watchdog );
+
 		DRM(lock_free)( dev, dev-lock.hw_lock-lock,
 _DRM_LOCKING_CONTEXT(dev-lock.hw_lock-lock) );
 
@@ -951,6 +964,7 @@
 			}
 			if ( DRM(lock_take)( dev-lock.hw_lock-lock,
 	 DRM_KERNEL_CONTEXT ) ) {
+dev-lock.dontbreak = 1;
 dev-lock.filp	= filp;
 dev-lock.lock_time = jiffies;
 atomic_inc( dev-counts[_DRM_STAT_LOCKS] );
@@ -1096,6 +1110,40 @@
 	return retcode;
 }
 
+
+/**
+ * Lock watchdog callback function.
+ *
+ * Whenever a privileged client must sleep on the lock waitqueue
+ * in the LOCK ioctl, the watchdog timer is started.
+ * When the UNLOCK ioctl is called, the timer is stopped.
+ *
+ * When the watchdog timer expires, the process holding the lock
+ * is killed. Privileged clients set lock.dontbreak and are exempt
+ * from this rule.
+ */
+static void drm_lock_watchdog( unsigned long __data )
+{
+	drm_device_t *dev = (drm_device_t *)__data;
+	drm_file_t *priv;
+	
+	if ( !dev-lock.filp ) {
+		DRM_DEBUG( held by kernel\n );
+		return;
+	}
+	
+	if ( dev-lock.dontbreak ) {
+		DRM_DEBUG( privileged lock\n );
+		return;
+	}
+	
+	priv = dev-lock.filp-private_data;
+	DRM_DEBUG( Kill pid=%d\n, priv-pid );
+	
+	kill_proc( priv-pid, SIGKILL, 1 );
+}
+
+
 /** 
  * Lock ioctl.
  *
@@ -1115,6 +1163,7 @@
 DECLARE_WAITQUEUE( entry, current );
 drm_lock_t lock;
 int ret = 0;
+int privileged = capable( CAP_SYS_ADMIN );
 #if __HAVE_MULTIPLE_DMA_QUEUES
 	drm_queue_t *q;
 #endif
@@ -1157,6 +1206,7 @@
 }
 if ( DRM(lock_take)( dev-lock.hw_lock-lock,
 	 lock.context ) ) {
+

[patch] Re: Some questions regarding locks

2004-05-23 Thread Nicolai Haehnle
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Saturday 22 May 2004 16:04, Michel Dnzer wrote:
 On Sat, 2004-05-22 at 14:04, Nicolai Haehnle wrote:
  It seems to me as if DRM(unlock) in drm_drv.h unlocks without checking 
  whether the caller actually holds the global lock. There is no 
  LOCK_TEST_WITH_RETURN or similar, and the helper function lock_transfer 
has 
  no check in it either.
  Did I miss something, or is this intended behaviour? It certainly seems 
  strange to me.
 
 True. Note that the lock ioctls are only used on contention, but still.

Unless I'm mistaken, DRM(lock) is always called when a client wants the lock 
for the first time (or when it needs to re-grab after it lost the lock). 
This is necessary because the DRM makes sure that dev-lock.filp matches 
the calling file. Afterwards, the ioctls are only used on contention.
The entire locking can be subverted anyway, because part of the lock is in 
userspace. I believe the important thing is to make sure that the X server 
can force a return into a sane locking state.

  Side question: Is killing the offending DRI client enough? When the 
process 
  is killed, the /dev/drm fd is closed, which should automatically release 
  the lock. On the other hand, I'm pretty sure that we can't just kill a 
  process immediately (unfortunately, I'm not familiar with process 
handling 
  in the kernel). What if, for some reason, the process is in a state 
where 
  it can't be killed yet?
 
 We're screwed? :)

Looks like it...

 This sounds like an idea for you to play with, but I'm afraid it won't
 be useful very often in my experience:
 
   * getting rid of the offending client doesn't help with a wedged
 chip (some way to recover from that would be nice...)
   * it doesn't help if the X server itself spins with the lock held

You were right, of course, while I show my lack of experience with driver 
writing. In my case I can get the X server's reset code to run, but some 
way through the reset the machine finally locks up completely (no more 
networking, no more disk I/O).

I'm curious though, how can a complete lockup like this be caused by the 
graphics card? My guess would be that it grabs the PCI/AGP bus forever for 
some reason (the dark side of bus mastering, so to speak). Is there 
anything else that could be the cause?

  Side question #2: Is it safe to release the DRM lock in the watchdog? 
There 
  might be races where the offending DRI client is currently executing a 
DRM 
  ioctl when the watchdog fires.
 
 Not sure, but this might not be a problem when just killing the
 offending process?

You're right.
On the other hand, it might sometimes be useful to be a little bit nicer to 
the offending process (see point 4 below).

I had a go at implementing my watchdog idea for Linux, see the attached 
patch. It basically works, but I couldn't test it on a system where the DRI 
actually works without locking up... *sigh*

Now for some notes:
1. This only affects the DRM for Linux. I don't have an installation of BSD, 
and while I know a little bit about the Linux kernel, I don't know anything 
about the BSD kernel(s).

2. The timeout cannot be configured yet. I didn't find prior art as to how 
something like it should be configured, so I'm open for input. For a Linux 
driver, adding to the /proc entries seems to be the logical way to go, but 
the DRI is very ioctl-centric. Maybe both?

3. Privileged processes may take the hardware lock for an infinite amount of 
time. This is necessary because the X server holds the lock when VT is 
switched away. 
Currently, privileged means capable(CAP_SYS_ADMIN). I would prefer if it 
meant the multiplexing controller process, i.e. the one that 
authenticates other processes. Unfortunately, this distinction isn't made 
anywhere in the DRM as far as I can see. This means that runaway DRI 
clients owned by root aren't killed by the watchdog, either.

4. Keith mentioned single-stepping through a driver, and he does have a 
point. Unfortunately, I also believe that it's not that simple.
Suppose an application developer debugs a windowed OpenGL application, on 
the local machine, without a dual-head setup. It may sound like a naive 
thing to do, but this actually works on Windows (yes, Windows is *a lot* 
more stable than Linux/BSD in that respect).
Now suppose she's got a bug in her application (e.g. bad vertex array) that 
triggers a segmentation fault inside the GL driver, while the hardware lock 
is held. GDB will catch that signal, so the process won't die, which in 
turn means that the lock is not released. Thus the developer's machine 
locks up unless the watchdog kicks in (of course, the watchdog in its 
current form will also frustrate her to no end).

cu,
Nicolai

 
 -- 
 Earthling Michel Dnzer  | Debian (powerpc), X and DRI developer
 Libre software enthusiast|   http://svcs.affero.net/rm.php?r=daenzer
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.4 (GNU/Linux)


Re: [patch] Re: Some questions regarding locks

2004-05-23 Thread Michel Dänzer
On Sun, 2004-05-23 at 12:05, Nicolai Haehnle wrote:
 
  This sounds like an idea for you to play with, but I'm afraid it won't
  be useful very often in my experience:
  
* getting rid of the offending client doesn't help with a wedged
  chip (some way to recover from that would be nice...)
* it doesn't help if the X server itself spins with the lock held
 
 You were right, of course, while I show my lack of experience with driver 
 writing. In my case I can get the X server's reset code to run, but some 
 way through the reset the machine finally locks up completely (no more 
 networking, no more disk I/O).
 
 I'm curious though, how can a complete lockup like this be caused by the 
 graphics card? My guess would be that it grabs the PCI/AGP bus forever for 
 some reason (the dark side of bus mastering, so to speak). Is there 
 anything else that could be the cause?

I don't really know, but I also guess it's PCI/AGP related.


 2. The timeout cannot be configured yet. I didn't find prior art as to how 
 something like it should be configured, so I'm open for input. For a Linux 
 driver, adding to the /proc entries seems to be the logical way to go, but 
 the DRI is very ioctl-centric. Maybe both?

What's the goal of making it configurable at all, to allow for driver
debugging? Maybe that could be dealt with better, see below.

 3. Privileged processes may take the hardware lock for an infinite amount of 
 time. This is necessary because the X server holds the lock when VT is 
 switched away. 
 Currently, privileged means capable(CAP_SYS_ADMIN). I would prefer if it 
 meant the multiplexing controller process, i.e. the one that 
 authenticates other processes. Unfortunately, this distinction isn't made 
 anywhere in the DRM as far as I can see. 

Maybe one could recognize it by the DRM context handle, a bit hackish
though.

 This means that runaway DRI clients owned by root aren't killed by the 
 watchdog, either.

root is generally allowed to shoot itself in the foot, so this isn't all
that bad I guess.


 4. Keith mentioned single-stepping through a driver, and he does have a 
 point. Unfortunately, I also believe that it's not that simple.
 Suppose an application developer debugs a windowed OpenGL application, on 
 the local machine, without a dual-head setup. It may sound like a naive 
 thing to do, but this actually works on Windows (yes, Windows is *a lot* 
 more stable than Linux/BSD in that respect).
 Now suppose she's got a bug in her application (e.g. bad vertex array) that 
 triggers a segmentation fault inside the GL driver, while the hardware lock 
 is held. GDB will catch that signal, so the process won't die, which in 
 turn means that the lock is not released. Thus the developer's machine 
 locks up unless the watchdog kicks in (of course, the watchdog in its 
 current form will also frustrate her to no end).

Is there a way to tell that a process is being debugged? If so, maybe it
could be handled sanely by default? E.g., release the lock while the
process is stopped? (That might wreak havoc once execution is resumed
though) ...


Another random thought I had was to introduce a Magic Sysrq key combo to
release the lock / kill the process holding it / whatever, but I guess
that wouldn't help Joe User with the casual hanging 3D client either.


-- 
Earthling Michel Dnzer  | Debian (powerpc), X and DRI developer
Libre software enthusiast|   http://svcs.affero.net/rm.php?r=daenzer



---
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g. 
Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id=3149alloc_id=8166op=click
--
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel