Re: [Chicken-hackers] PATCH: Re: Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error
Attached is a version against master. It only appears to be correct :-/ I messed up the prefix, so srfi-18 did not load, and I really will not find the time to come back to the issue in a timely manner.

On Dec 3 2018, Jörg F. Wittenberger wrote:

Attached a patch against 4.13 master, still compiling.

On Nov 30 2018, megane wrote:

Hi,

Here's another version that crashes quickly with "very high probability".

    (cond-expand
     (chicken-5 (import (chicken base))
                (import (chicken time))
                (import srfi-18))
     (else (import chicken)
           (use srfi-18)))

    (define m (make-mutex))

    (print "@@ " (current-thread) " " "lock")
    (mutex-lock! m)

    (define t (current-milliseconds))
    (define (get-tosleep)
      (/ (floor (* 1000 (- (+ t .030) (current-milliseconds))))
         1000))

    (thread-start!
     (make-thread
      (lambda ()
        ;; (thread-sleep! .01)
        (print "@@ " (current-thread) " " "lock")
        (let lp ()
          (when (not (mutex-lock! m (get-tosleep)))
            (thread-yield!)
            (lp)))
        (print "@@ " (current-thread) " " "unlock")
        (mutex-unlock! m))))

    (print "@@ " (current-thread) " " "sleep")
    (thread-sleep! (get-tosleep))
    (print "@@ " (current-thread) " " "unlock")
    (mutex-unlock! m)

    (thread-yield!)
    (thread-sleep! .01)
    (print "All ok!!")

---

Typical output of a failing execution:

    $ stdbuf -oL -eL ./t |& cat -n
         1  @@ # lock
         2  #: locking #
         3  @@ # sleep
         4  # blocks for timeout 933.0
         5  scheduling, current: #, ready: (#)
         6  timeout: # -> 933.0 (now: 904)
         7  switching to #
         8  @@ # lock
         9  #: locking #
        10  # blocks for timeout 933.0
        11  # sleeping on mutex mutex0
        12  scheduling, current: #, ready: ()
        13  timeout: # -> 933.0 (now: 904)
        14  timeout: # -> 933.0 (now: 934)
        15  timeout expired for #
        16  unblocking: #
        17  timeout: # -> 933.0 (now: 934)
        18  timeout expired for #
        19  unblocking: #
        20  switching to #
        21  @@ # unlock
        22  #: unlocking mutex0
        23
        24  Error: (mutex-unlock) Internal scheduler error: unknown thread state
        25  #
        26  ready
        27
        28  Call history:
        29
        30  t.scm:27: chicken.base#print
        31  t.scm:28: get-tosleep
        32  t.scm:15: chicken.time#current-milliseconds
        33  t.scm:15: scheme#floor
        34  t.scm:15: scheme#/
        35  t.scm:28: srfi-18#thread-sleep!
        36  t.scm:29: srfi-18#current-thread
        37  t.scm:29: chicken.base#print
        38  t.scm:30: srfi-18#mutex-unlock!  <--

(There's an extra debug message on line 15. To get it, add

    (dbg "timeout expired for " tto)

in the true branch of

    (if (>= now tmo1) ; timeout reached?

in ##sys#schedule.)

---

The issue

mutex-unlock! assumes that a thread freed from the mutex's waiting list cannot be in the 'ready state. The output above shows how a thread waiting on a mutex can end up in the 'ready state:

line 2: The mutex is locked by the primordial thread (pt).
line 4: pt goes to sleep until 933.0.
line 7: As pt goes to sleep, thread1 is scheduled to run.
line 10: thread1 tries to lock the mutex, with a timeout that happens to expire at 933.0 as well.
lines 12-14: Both threads are asleep; time advances to 934.
lines 15-16: pt is put on the ready list.
lines 17-19: thread1 is put on the ready list.
line 20: pt starts running.
lines 21-22: pt executes mutex-unlock! while thread1 is ready to run.

---

A fix

Just allow the 'ready state for threads in mutex-unlock!
In the patch I arbitrarily call ##sys#schedule after removing a thread from the list, but I think doing nothing would work equally well.

Is this a correct fix? Sorry, I can't help with that one. Maybe there can be threads on the waiting list while the thread that gets removed is not going to lock the mutex. There are three threads in this scenario, A, B and C:

 * A locks the mutex
 * A sleeps until t
 * B tries to lock the mutex, with a timeout at t
 * C tries to lock the mutex
 * A and B are woken up at t
 * A unlocks the mutex, freeing B
 * B is scheduled to run, as per the patch
 * B notices the timeout, gives up and starts doing something else
 * Now thread C is waiting on the mutex, but no one is going to free it!

From 307e9d806f421bd13e4b6f30a8cdb86378b8c1dd Mon Sep 17 00:00:00 2001
From: Jörg F. Wittenberger
Date: Mon, 3 Dec 2018 22:22:05 +0100
Subject: [PATCH] Fix 1564 internal scheduler error.

---
 scheduler.scm | 79 +++
 1 file changed, 41 insertions(+), 38 deletions(-)

diff --git a/scheduler.scm b/scheduler.scm
index 238c348e..32c2743c 100644
--- a/scheduler.scm
+++ b/scheduler.scm
@@ -35,7 +35,7 @@
 ;; This isn't hidden ATM to allow set!ing it as a hook/workaround
 ; ##sys#force-primordial
 remove-from-ready-queue fdset-test create-fdset stderr
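megane's three-thread scenario can be replayed as a toy model in plain Scheme (no srfi-18 needed, it runs in csi). This is a hypothetical sketch, not CHICKEN's scheduler or srfi-18 code: the vector layouts, the helper names (mk-thread, lock-or-wait!, unlock!, run-woken!, stranded-waiters) and the assumption that unlock! leaves the mutex unowned for the woken waiter to retake are all made up for illustration. It shows that once B gives up after being freed, C is still on the wait list of a mutex that nobody holds, so no later unlock! will ever wake it.

```scheme
;; Toy model of the A/B/C scenario. Hypothetical: not CHICKEN's srfi-18
;; or scheduler.scm code, just enough bookkeeping to show the lost wakeup.

;; A thread is #(name state timed-out?); a mutex is #(owner waiters).
(define (mk-thread name) (vector name 'running #f))
(define (th-name th) (vector-ref th 0))
(define (th-state th) (vector-ref th 1))
(define (th-state-set! th s) (vector-set! th 1 s))
(define (th-timed-out? th) (vector-ref th 2))
(define (th-timed-out-set! th v) (vector-set! th 2 v))

(define (mk-mutex) (vector #f '()))
(define (mx-owner m) (vector-ref m 0))
(define (mx-owner-set! m th) (vector-set! m 0 th))
(define (mx-waiters m) (vector-ref m 1))
(define (mx-waiters-set! m ws) (vector-set! m 1 ws))

;; Take the mutex if it is free, otherwise join the FIFO wait list.
(define (lock-or-wait! m th)
  (if (mx-owner m)
      (begin
        (mx-waiters-set! m (append (mx-waiters m) (list th)))
        (th-state-set! th 'blocked))
      (mx-owner-set! m th)))

;; Assumed unlock behaviour: release the mutex, pop the first waiter and
;; make it ready (whether it was 'blocked or already 'ready). The woken
;; thread has to retake the mutex itself when it runs.
(define (unlock! m)
  (mx-owner-set! m #f)
  (let ((ws (mx-waiters m)))
    (if (null? ws)
        #f
        (let ((next (car ws)))
          (mx-waiters-set! m (cdr ws))
          (th-state-set! next 'ready)
          next))))

;; The woken thread runs: if its lock timeout already expired it gives
;; up (mutex-lock! would return #f), otherwise it takes the mutex.
(define (run-woken! m th)
  (if (th-timed-out? th)
      #f
      (begin (mx-owner-set! m th) #t)))

;; Waiters nobody will wake: the mutex is free, so no unlock! is coming.
(define (stranded-waiters m)
  (if (mx-owner m)
      '()
      (map th-name (mx-waiters m))))

(define m (mk-mutex))
(define a (mk-thread 'A))
(define b (mk-thread 'B))
(define c (mk-thread 'C))

(lock-or-wait! m a)          ; A locks the mutex, then sleeps until t
(lock-or-wait! m b)          ; B tries to lock it with a timeout at t
(lock-or-wait! m c)          ; C tries to lock it without a timeout
(th-state-set! b 'ready)     ; at t the scheduler readies A and B ...
(th-timed-out-set! b #t)     ; ... and B's lock timeout has expired
(define freed (unlock! m))   ; A unlocks the mutex and frees B
(define b-got-it (run-woken! m freed)) ; B gives up instead of locking
(display (stranded-waiters m)) ; prints (C)
(newline)
```

If unlock! instead handed ownership straight to B, the outcome in this model would be the same: B never unlocks, so C still waits forever.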
[Chicken-hackers] PATCH: Re: Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error
Attached a patch against 4.13 master, still compiling.

From b6837b2c94feb5f8348965f538b5a45bf01a7506 Mon Sep 17 00:00:00 2001
From: Jörg F. Wittenberger
Date: Mon, 3 Dec 2018 21:06:26 +0100
Subject: [PATCH] Fix 1564 internal scheduler error.
---
 scheduler.scm | 80 ++-
 1 file changed, 41 insertions(+), 39 deletions(-)

diff --git a/scheduler.scm b/scheduler.scm
index 0b292f7f..a1a03293 100644
--- a/scheduler.scm
+++ b/scheduler.scm
@@ -34,7 +34,7 @@
 ;; This isn't hidden ATM to allow set!ing it as a hook/workaround
 ; ##sys#force-primordial
 fdset-set fdset-test create-fdset stderr
-##sys#clear-i/o-state-for-thread! ##sys#abandon-mutexes)
+##sys#thread-clear-blocking-state! ##sys#abandon-mutexes)
 (not inline ##sys#interrupt-hook ##sys#force-primordial)
 (unsafe)
 (foreign-declare #<= now tmo1) ; timeout reached?
 (begin
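The crash in megane's trace comes down to which thread states mutex-unlock! accepts for the waiter it removes from the mutex. The following hypothetical sketch in plain Scheme (not the real srfi-18 code; the state symbols 'blocked and 'ready follow the trace and the error message, while the procedure names are made up) replays the trace. Both timeouts are at 933.0, the scheduler runs at 934 and readies both threads, and the unlock then finds thread1 in the 'ready state. The strict check reproduces the "unknown thread state" failure; the relaxed one, which corresponds to megane's suggested fix, accepts it.

```scheme
;; Hypothetical model, not CHICKEN's srfi-18/scheduler.scm code.
;; A thread is (name state deadline-in-ms).

;; Loosely mirrors the "(if (>= now tmo1) ; timeout reached?" branch:
;; every blocked thread whose deadline has passed is made ready.
(define (expire-timeouts threads now)
  (map (lambda (th)
         (if (and (eq? (cadr th) 'blocked) (>= now (list-ref th 2)))
             (list (car th) 'ready (list-ref th 2))
             th))
       threads))

;; Original assumption: a thread taken off the wait list is blocked.
(define (wake-waiter/strict state)
  (case state
    ((blocked) 'made-ready)
    (else (list 'error "Internal scheduler error: unknown thread state" state))))

;; Suggested fix: also accept a thread that its timeout already readied.
(define (wake-waiter/relaxed state)
  (case state
    ((blocked) 'made-ready)
    ((ready) 'already-ready)
    (else (list 'error "Internal scheduler error: unknown thread state" state))))

;; Trace lines 4 and 10: both threads block with a timeout at 933.0.
(define threads '((primordial blocked 933.0) (thread1 blocked 933.0)))
;; Trace lines 14-19: the scheduler runs at 934 and readies both.
(define after (expire-timeouts threads 934))
;; Trace lines 21-22: the primordial thread unlocks and pops thread1.
(define thread1-state (cadr (assq 'thread1 after)))

(display (wake-waiter/strict thread1-state))
(newline)
(display (wake-waiter/relaxed thread1-state)) ; prints already-ready
(newline)
```

The model only covers the state check. What should happen after the waiter is removed is left open in the thread: megane's patch calls ##sys#schedule at that point, and megane suspects doing nothing would work equally well.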