Hi all,

in this message
http://lists.gnu.org/archive/html/chicken-users/2008-08/msg00094.html
I posted a patch to scheduler.scm, which fixes a race condition wrt. bad
file descriptors in the waiting queue.
(Attached is a slightly brushed up version.)

I have not seen any replies to this message (which is sort of strange on
this list).  Did anyone consider to roll it in?  If not, why?  Anything
bad about it?

The race originally arose in a somewhat strange setup, which might not
be even repeatable for everyone, since it involved accidentally changing
the scheduling sequence (using more or fewer exception handlers), which
may have different results on different machines.

But the issue as such is easy to recreate, if you make a thread wait
asynchronously on any file descriptor and close that fd from a second
thread.  (Something, which may or may not be intentional, but easy to
do.)

thank you

/Jörg

BTW (& why I'm I so desperate about it): I just completed the port of
Askemos (www.askemos.org) to chicken.  This works pretty good on my
laptop now.  I intend to release it as soon as it's better tested.  For
that I need to roll the development environment out the several machines
and I'd really, really love to do so straight from chicken svn without
additional patches.  (And promise: Askemos will run without the patch;
but it handles lots of concurrent network connections.  It will surely
hit the race after a few minutes.)

Index: scheduler.scm
===================================================================
--- scheduler.scm	(Revision 11653)
+++ scheduler.scm	(Arbeitskopie)
@@ -36,7 +36,7 @@
 	##sys#update-thread-state-buffer ##sys#restore-thread-state-buffer
 	##sys#remove-from-ready-queue ##sys#unblock-threads-for-i/o ##sys#force-primordial
 	##sys#fdset-input-set ##sys#fdset-output-set ##sys#fdset-clear
-	##sys#fdset-select-timeout ##sys#fdset-restore
+	##sys#fdset-select-timeout ##sys#fdset-restore ##sys#handle-bad-fd!
 	##sys#clear-i/o-state-for-thread!) 
   (foreign-declare #<<EOF
 #ifdef HAVE_ERRNO_H
@@ -60,6 +60,7 @@
 # include <sys/types.h>
 # include <sys/time.h>
 # include <time.h>
+# include <sys/stat.h>
 static C_word C_msleep(C_word ms);
 C_word C_msleep(C_word ms) {
 #ifdef __CYGWIN__
@@ -341,6 +342,25 @@
   (##sys#setislot t 13 #f)
   (##sys#setslot t 11 (cons fd i/o)) )
 
+(define-foreign-variable error-bad-file int "(errno == EBADF)")
+
+(define (##sys#handle-bad-fd! e)
+  (let ((bad ((foreign-lambda*
+	       bool ((integer fd))
+	       "struct stat buf;"
+	       "int i = ( (fstat(fd, &buf) == -1 && errno == EBADF) ? 1 : 0);"
+	       "return(i);")
+	      (car e))))
+    (if (and bad (pair? (cdr e)))
+	(thread-signal!
+	 (cadr c)
+	 (##sys#make-structure
+	  'condition
+	  '(exn i/o) ;; better? '(exn i/o net)
+	  (list '(exn . message) "bad file descriptor"
+		'(exn . arguments) (car e)) )))
+    bad))
+
 (define (##sys#unblock-threads-for-i/o)
   (dbg "fd-list: " ##sys#fd-list)
   (let* ([to? (pair? ##sys#timeout-list)]
@@ -353,8 +373,23 @@
 		   (fxmax 0 (- tmo1 now)) )
 		 0) ) ] )		; otherwise immediate timeout.
     (dbg n " fds ready")
-    (cond [(eq? -1 n) 
-	   (##sys#force-primordial)]
+    (cond [(eq? -1 n)
+	   (cond
+	    (error-bad-file
+	     (set! ##sys#fd-list
+		   (let loop ((l ##sys#fd-list))
+		     (cond
+		      ((null? l) l)
+		      ((##sys#handle-bad-fd! (car l))
+		       (##sys#fdset-clear (caar l))
+		       ;; This is supposed to be a rare case, catch
+		       ;; them one by one.
+		       ;; (loop (cdr l))
+		       (cdr l))
+		      (else (cons (car l) (loop (cdr l)))))))
+	     (##sys#fdset-restore)
+	     (##sys#unblock-threads-for-i/o))
+	    (else (##sys#force-primordial))) ]
 	  [(fx> n 0)
 	   (set! ##sys#fd-list
 	     (let loop ([n n] [lst ##sys#fd-list])
_______________________________________________
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users

Reply via email to