Re: [Chicken-users] Race condition in scheduler.scm

2008-08-17 Thread Elf

On Sun, 17 Aug 2008, Jörg F. Wittenberger wrote:


Hi all,

in this message
http://lists.gnu.org/archive/html/chicken-users/2008-08/msg00094.html
I posted a patch to scheduler.scm, which fixes a race condition wrt. bad
file descriptors in the waiting queue.
(Attached is a slightly brushed up version.)

I have not seen any replies to this message (which is sort of strange on
this list).  Did anyone consider to roll it in?  If not, why?  Anything
bad about it?

The race originally arose in a somewhat strange setup, which might not
be even repeatable for everyone, since it involved accidentally changing
the scheduling sequence (using more or fewer exception handlers), which
may have different results on different machines.

But the issue as such is easy to recreate, if you make a thread wait
asynchronously on any file descriptor and close that fd from a second
thread.  (Something, which may or may not be intentional, but easy to
do.)



i'm not understanding why this would cause a 'race condition', nor why your 
patch is necessary.  the lines which you deleted from the scheduler (the -1

check and forced-primordial) accomplish the same thing without raising an
exception or causing modifications to the signal/condition/exception handler
in unsafe ways.

-elf
___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] Race condition in scheduler.scm

2008-08-17 Thread F. Wittenberger
Am Sonntag, den 17.08.2008, 08:54 +0900 schrieb Ivan Raikov:
   You should probably email Felix directly about that. I am not
 familiar with the threads library at all, so I am not qualified to
 evaluate this patch, and I don't know if many other Chicken users have
 had your level of experience with threads. At the moment, you might be
 the most experienced user of that library.

It already looks somewhat like that.  :-(

Am Sonntag, den 17.08.2008, 03:47 -0700 schrieb Elf:
 On Sun, 17 Aug 2008, Jörg F. Wittenberger wrote:
 
  Hi all,
 
  in this message
  http://lists.gnu.org/archive/html/chicken-users/2008-08/msg00094.html
  I posted a patch to scheduler.scm, which fixes a race condition wrt. bad
  file descriptors in the waiting queue.
  (Attached is a slightly brushed up version.)
 
  I have not seen any replies to this message (which is sort of strange on
  this list).  Did anyone consider to roll it in?  If not, why?  Anything
  bad about it?
 
  The race originally arose in a somewhat strange setup, which might not
  be even repeatable for everyone, since it involved accidentally changing
  the scheduling sequence (using more or fewer exception handlers), which
  may have different results on different machines.
 
  But the issue as such is easy to recreate, if you make a thread wait
  asynchronously on any file descriptor and close that fd from a second
  thread.  (Something, which may or may not be intentional, but easy to
  do.)
 
 
 i'm not understanding why this would cause a 'race condition', nor why your 
 patch is necessary.  the lines which you deleted from the scheduler (the -1
 check and forced-primordial) accomplish the same thing without raising an
 exception or causing modifications to the signal/condition/exception handler
 in unsafe ways.

Originally it came up as a race condition.  See my above referenced
message I.  Add/delete an exception handler from that code and - at
least on my machine - run into fake thread-join-timeout exceptions
endlessly.

And truly it is a race condition, since - depending on order of
scheduling - threads may or may not handle the case before the scheduler
switches to the primordial, possibly causing the harm.

Not, it's not true that forcing the primordial (and which is in some
unknown state at that time) will do the same.  The patch checks the
##sys#fd-list, removes bad fd's and retries.  Other -1 from select(2)
are still propagated to the primordial.  This should be understood as a
temporary bug compatibility since it's probably not much better, though
I haven#t been able to create a test case for those.

The patch I attached last time is actually broken (a typo) wrt.
signalling threads waiting for input.  Now, since I ran into that one
too, find a fixed patch attached.

Felix, could you please review the patch and integrate the fix.

Thanks a lot

/Jörg
Index: scheduler.scm
===
--- scheduler.scm	(Revision 11663)
+++ scheduler.scm	(Arbeitskopie)
@@ -36,7 +36,7 @@
 	##sys#update-thread-state-buffer ##sys#restore-thread-state-buffer
 	##sys#remove-from-ready-queue ##sys#unblock-threads-for-i/o ##sys#force-primordial
 	##sys#fdset-input-set ##sys#fdset-output-set ##sys#fdset-clear
-	##sys#fdset-select-timeout ##sys#fdset-restore
+	##sys#fdset-select-timeout ##sys#fdset-restore ##sys#handle-bad-fd!
 	##sys#clear-i/o-state-for-thread!) 
   (foreign-declare #EOF
 #ifdef HAVE_ERRNO_H
@@ -60,6 +60,7 @@
 # include sys/types.h
 # include sys/time.h
 # include time.h
+# include sys/stat.h
 static C_word C_msleep(C_word ms);
 C_word C_msleep(C_word ms) {
 #ifdef __CYGWIN__
@@ -341,6 +342,24 @@
   (##sys#setislot t 13 #f)
   (##sys#setslot t 11 (cons fd i/o)) )
 
+(define-foreign-variable error-bad-file int (errno == EBADF))
+
+(define (##sys#handle-bad-fd! e)
+  (dbg check bad e)
+  (let ((bad ((foreign-lambda*
+	   bool ((integer fd))
+	   struct stat buf;
+	   int i = ( (fstat(fd, buf) == -1  errno == EBADF) ? 1 : 0);
+	   return(i);)
+	  (car e
+(if (and bad (pair? (cdr e)))
+	(thread-signal! (cadr e) (##sys#make-structure
+	  'condition
+	  '(exn i/o) ;; better? '(exn i/o net)
+	  (list '(exn . message) bad file descriptor
+		'(exn . arguments) (car e)) )))
+bad))
+
 (define (##sys#unblock-threads-for-i/o)
   (dbg fd-list:  ##sys#fd-list)
   (let* ([to? (pair? ##sys#timeout-list)]
@@ -353,8 +372,23 @@
 		   (fxmax 0 (- tmo1 now)) )
 		 0) ) ] )		; otherwise immediate timeout.
 (dbg n  fds ready)
-(cond [(eq? -1 n) 
-	   (##sys#force-primordial)]
+(cond [(eq? -1 n)
+	   (cond
+	(error-bad-file
+	 (set! ##sys#fd-list
+		   (let loop ((l ##sys#fd-list))
+		 (cond
+		  ((null? l) l)
+		  ((##sys#handle-bad-fd! (car l))
+		   (##sys#fdset-clear (caar l))
+		   ;; This is supposed to be a rare case, catch
+		   ;; them one by one.
+		   ;; (loop (cdr l))
+		   (cdr l))
+		  (else (cons (car l) (loop (cdr l)))
+	 

[Chicken-users] Race condition in scheduler.scm

2008-08-16 Thread F. Wittenberger
Hi all,

in this message
http://lists.gnu.org/archive/html/chicken-users/2008-08/msg00094.html
I posted a patch to scheduler.scm, which fixes a race condition wrt. bad
file descriptors in the waiting queue.
(Attached is a slightly brushed up version.)

I have not seen any replies to this message (which is sort of strange on
this list).  Did anyone consider to roll it in?  If not, why?  Anything
bad about it?

The race originally arose in a somewhat strange setup, which might not
be even repeatable for everyone, since it involved accidentally changing
the scheduling sequence (using more or fewer exception handlers), which
may have different results on different machines.

But the issue as such is easy to recreate, if you make a thread wait
asynchronously on any file descriptor and close that fd from a second
thread.  (Something, which may or may not be intentional, but easy to
do.)

thank you

/Jörg

BTW ( why I'm I so desperate about it): I just completed the port of
Askemos (www.askemos.org) to chicken.  This works pretty good on my
laptop now.  I intend to release it as soon as it's better tested.  For
that I need to roll the development environment out the several machines
and I'd really, really love to do so straight from chicken svn without
additional patches.  (And promise: Askemos will run without the patch;
but it handles lots of concurrent network connections.  It will surely
hit the race after a few minutes.)

Index: scheduler.scm
===
--- scheduler.scm	(Revision 11653)
+++ scheduler.scm	(Arbeitskopie)
@@ -36,7 +36,7 @@
 	##sys#update-thread-state-buffer ##sys#restore-thread-state-buffer
 	##sys#remove-from-ready-queue ##sys#unblock-threads-for-i/o ##sys#force-primordial
 	##sys#fdset-input-set ##sys#fdset-output-set ##sys#fdset-clear
-	##sys#fdset-select-timeout ##sys#fdset-restore
+	##sys#fdset-select-timeout ##sys#fdset-restore ##sys#handle-bad-fd!
 	##sys#clear-i/o-state-for-thread!) 
   (foreign-declare #EOF
 #ifdef HAVE_ERRNO_H
@@ -60,6 +60,7 @@
 # include sys/types.h
 # include sys/time.h
 # include time.h
+# include sys/stat.h
 static C_word C_msleep(C_word ms);
 C_word C_msleep(C_word ms) {
 #ifdef __CYGWIN__
@@ -341,6 +342,25 @@
   (##sys#setislot t 13 #f)
   (##sys#setslot t 11 (cons fd i/o)) )
 
+(define-foreign-variable error-bad-file int (errno == EBADF))
+
+(define (##sys#handle-bad-fd! e)
+  (let ((bad ((foreign-lambda*
+	   bool ((integer fd))
+	   struct stat buf;
+	   int i = ( (fstat(fd, buf) == -1  errno == EBADF) ? 1 : 0);
+	   return(i);)
+	  (car e
+(if (and bad (pair? (cdr e)))
+	(thread-signal!
+	 (cadr c)
+	 (##sys#make-structure
+	  'condition
+	  '(exn i/o) ;; better? '(exn i/o net)
+	  (list '(exn . message) bad file descriptor
+		'(exn . arguments) (car e)) )))
+bad))
+
 (define (##sys#unblock-threads-for-i/o)
   (dbg fd-list:  ##sys#fd-list)
   (let* ([to? (pair? ##sys#timeout-list)]
@@ -353,8 +373,23 @@
 		   (fxmax 0 (- tmo1 now)) )
 		 0) ) ] )		; otherwise immediate timeout.
 (dbg n  fds ready)
-(cond [(eq? -1 n) 
-	   (##sys#force-primordial)]
+(cond [(eq? -1 n)
+	   (cond
+	(error-bad-file
+	 (set! ##sys#fd-list
+		   (let loop ((l ##sys#fd-list))
+		 (cond
+		  ((null? l) l)
+		  ((##sys#handle-bad-fd! (car l))
+		   (##sys#fdset-clear (caar l))
+		   ;; This is supposed to be a rare case, catch
+		   ;; them one by one.
+		   ;; (loop (cdr l))
+		   (cdr l))
+		  (else (cons (car l) (loop (cdr l)))
+	 (##sys#fdset-restore)
+	 (##sys#unblock-threads-for-i/o))
+	(else (##sys#force-primordial))) ]
 	  [(fx n 0)
 	   (set! ##sys#fd-list
 	 (let loop ([n n] [lst ##sys#fd-list])
___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users