Re: [Chicken-hackers] PATCH: Re: Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error

2018-12-03 Thread Jörg F . Wittenberger

Attached a version against master.

This only appears to be correct. :-/ I messed up the prefix, so srfi-18
did not load, and I really will not find the time to come back to the
issue in a timely manner.


On Dec 3 2018, Jörg F. Wittenberger wrote:


Attached a patch against 4.13

master still compiling

On Nov 30 2018, megane wrote:


Hi,

Here's another version that crashes quickly with "very high
probability".

(cond-expand
  (chicken-5 (import (chicken base))
             (import (chicken time))
             (import srfi-18))
  (else (import chicken)
        (use srfi-18)))

(define m (make-mutex))

(print "@@ " (current-thread) " " "lock")
(mutex-lock! m)

(define t (current-milliseconds))
(define (get-tosleep)
  (/ (floor (* 1000 (- (+ t .030) (current-milliseconds)))) 1000))

(thread-start!
 (make-thread (lambda ()
                ;; (thread-sleep! .01)
                (print "@@ " (current-thread) " " "lock")
                (let lp ()
                  (when (not (mutex-lock! m (get-tosleep)))
                    (thread-yield!)
                    (lp)))
                (print "@@ " (current-thread) " " "unlock")
                (mutex-unlock! m))))
(print "@@ " (current-thread) " " "sleep")
(thread-sleep! (get-tosleep))
(print "@@ " (current-thread) " " "unlock")
(mutex-unlock! m)
(thread-yield!)
(thread-sleep! .01)
(print "All ok!!")

--- typical output of a failing execution:

$ stdbuf -oL -eL ./t |& cat -n
    1   @@ # lock
    2   #: locking #
    3   @@ # sleep
    4   # blocks for timeout 933.0
    5  scheduling, current: #, ready: (#)
    6   timeout: # -> 933.0 (now: 904)
    7   switching to #
    8   @@ # lock
    9   #: locking #
   10   # blocks for timeout 933.0
   11   # sleeping on mutex mutex0
   12  scheduling, current: #, ready: ()
   13   timeout: # -> 933.0 (now: 904)
   14   timeout: # -> 933.0 (now: 934)
   15   timeout expired for #
   16   unblocking: #
   17   timeout: # -> 933.0 (now: 934)
   18   timeout expired for #
   19   unblocking: #
   20   switching to #
   21   @@ # unlock
   22   #: unlocking mutex0
   23
   24   Error: (mutex-unlock) Internal scheduler error: unknown thread state
   25   #
   26   ready
   27
   28   Call history:
   29
   30   t.scm:27: chicken.base#print
   31   t.scm:28: get-tosleep
   32   t.scm:15: chicken.time#current-milliseconds
   33   t.scm:15: scheme#floor
   34   t.scm:15: scheme#/
   35   t.scm:28: srfi-18#thread-sleep!
   36   t.scm:29: srfi-18#current-thread
   37   t.scm:29: chicken.base#print
   38   t.scm:30: srfi-18#mutex-unlock! <--

(There's an extra debug message on line 15.
Add (dbg "timeout expired for " tto) in this true branch:

(if (>= now tmo1) ; timeout reached?

in ##sys#schedule)

--- The issue
mutex-unlock! assumes that a thread freed from the mutex's waiting list
cannot be in the 'ready state.

The output above shows how a thread waiting on a mutex can nevertheless
end up in the 'ready state.

line  2: The mutex is locked by primordial thread (pt)
line  4: The pt goes to sleep until 933.0
line  7: As the pt goes to sleep thread1 is scheduled to run
line 10: thread1 tries to lock the mutex, but sets a timeout that
happens to be at time 933.0

lines 12-14: Both threads asleep, time advances to 934
lines 15-16: pt gets put on the ready list
lines 17-19: thread1 gets put on the ready list
line 20: pt starts running
lines 21-22: pt executes mutex-unlock! while thread1 is ready to run
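
The sequence above can be replayed with a toy model. This is only a
sketch, not CHICKEN's scheduler: the pair-based thread representation,
the names make-toy-thread and expire-timeouts!, and the 'blocked
starting state are all made up for illustration. It assumes that the
timeout pass marks expired threads 'ready without touching the mutex's
waiting list, which is what the trace suggests.

```scheme
;; Toy model only: a thread is a mutable pair (name . state).
(define (make-toy-thread name state) (cons name state))
(define (toy-state th) (cdr th))
(define (toy-set-state! th s) (set-cdr! th s))

;; Timeout pass: every thread whose deadline has passed becomes 'ready.
;; Assumption: the pass does not look at any mutex waiting list.
(define (expire-timeouts! timeouts now)
  (for-each (lambda (entry)               ; entry = (deadline . thread)
              (if (>= now (car entry))
                  (toy-set-state! (cdr entry) 'ready)))
            timeouts))

(define pt      (make-toy-thread 'primordial 'blocked)) ; sleeps until 933
(define thread1 (make-toy-thread 'thread1 'blocked))    ; lock attempt, timeout 933
(define waiting (list thread1))                         ; waiting list of mutex0

;; Both deadlines are 933 and the clock reads 934 (trace lines 14-19).
(expire-timeouts! (list (cons 933 pt) (cons 933 thread1)) 934)

;; pt now runs mutex-unlock! and inspects the first waiter (lines 21-22).
(define first-waiter-state (toy-state (car waiting)))
(display first-waiter-state)   ; prints: ready
(newline)
```

In this model the waiter is still on the waiting list but already
'ready when the unlock happens, which is the combination the error
message complains about.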

--- A fix

Just allow the 'ready state for threads in mutex-unlock!

In the patch I arbitrarily call ##sys#schedule after removing a thread
from the list, but I think doing nothing would work equally well.
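
As a rough illustration of the idea (not the actual patch), the state
check can be pictured as a dispatch on the waiter's state. The
procedure unlock-decision, its allow-ready? flag and the result symbols
are hypothetical, and the state names other than 'ready are assumptions.

```scheme
;; Hypothetical stand-in for the state check made when a waiter is
;; taken off the mutex's waiting list. Returns 'accepted when the
;; state is tolerated, 'internal-error otherwise.
(define (unlock-decision waiter-state allow-ready?)
  (case waiter-state
    ((blocked sleeping) 'accepted)
    ((ready) (if allow-ready? 'accepted 'internal-error))
    (else 'internal-error)))

(display (unlock-decision 'ready #f)) (newline) ; check as it behaves in the trace
(display (unlock-decision 'ready #t)) (newline) ; with the 'ready state allowed
```

The sketch says nothing about what should happen to the mutex after a
'ready waiter is accepted; that is the open question in the rest of
this message.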

Is this a correct fix?
Sorry, I can't help with that one.

Maybe it's possible that there are threads on the waiting list, but the
thread that gets removed is not going to lock the mutex:

There are 3 threads in this scenario, A, B and C.

* A locks mutex
* A sleeps until t
* B tries to lock mutex until t
* C tries to lock mutex
* A and B are woken up at t
* A unlocks mutex, frees B
* B is scheduled to run as per the patch
* B finds out about the timeout, gives up and starts doing something else
* Now thread C is waiting on the mutex but no-one is going to free it!
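
The scenario can be walked through with a small model. Again this is a
sketch under assumptions, not srfi-18 code: toy-unlock! and toy-resume!
are invented names, unlock is assumed to wake only the first waiter,
and a woken thread whose timeout has fired is assumed to give up
without passing the mutex on.

```scheme
(define owner 'a)
(define waiting '(b c))   ; b waits with timeout t, c waits without one
(define timed-out '(b))   ; timeout t has already fired for b

;; Toy unlock: release the mutex and wake only the first waiter.
(define (toy-unlock!)
  (set! owner #f)
  (if (null? waiting)
      #f
      (let ((woken (car waiting)))
        (set! waiting (cdr waiting))
        woken)))

;; Toy resume: a woken thread whose timeout fired gives up;
;; any other woken thread takes the mutex.
(define (toy-resume! th)
  (if (memq th timed-out)
      'gave-up
      (begin (set! owner th) 'locked)))

(define woken (toy-unlock!))          ; a unlocks and frees b
(define outcome (toy-resume! woken))  ; b sees its timeout and gives up
(display (list 'owner owner 'waiting waiting 'outcome outcome))
(newline)
```

At the end of the run c is still on the waiting list and the model has
no remaining step that would ever wake it.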


From 307e9d806f421bd13e4b6f30a8cdb86378b8c1dd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=B6rg=20F=2E=20Wittenberger?=
 
Date: Mon, 3 Dec 2018 22:22:05 +0100
Subject: [PATCH] Fix 1564 internal scheduler error.

---
 scheduler.scm | 79 +++
 1 file changed, 41 insertions(+), 38 deletions(-)

diff --git a/scheduler.scm b/scheduler.scm
index 238c348e..32c2743c 100644
--- a/scheduler.scm
+++ b/scheduler.scm
@@ -35,7 +35,7 @@
 	;; This isn't hidden ATM to allow set!ing it as a hook/workaround
 	; ##sys#force-primordial
 	remove-from-ready-queue fdset-test create-fdset stderr 

[Chicken-hackers] PATCH: Re: Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error

2018-12-03 Thread Jörg F . Wittenberger

Attached a patch against 4.13

master still compiling


From b6837b2c94feb5f8348965f538b5a45bf01a7506 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=B6rg=20F=2E=20Wittenberger?=
 
Date: Mon, 3 Dec 2018 21:06:26 +0100
Subject: [PATCH] Fix 1564 internal scheduler error.

---
 scheduler.scm | 80 ++-
 1 file changed, 41 insertions(+), 39 deletions(-)

diff --git a/scheduler.scm b/scheduler.scm
index 0b292f7f..a1a03293 100644
--- a/scheduler.scm
+++ b/scheduler.scm
@@ -34,7 +34,7 @@
 	;; This isn't hidden ATM to allow set!ing it as a hook/workaround
 	; ##sys#force-primordial
 	fdset-set fdset-test create-fdset stderr
-	##sys#clear-i/o-state-for-thread! ##sys#abandon-mutexes) 
+	##sys#thread-clear-blocking-state! ##sys#abandon-mutexes)
   (not inline ##sys#interrupt-hook ##sys#force-primordial)
   (unsafe)
   (foreign-declare #<= now tmo1) ; timeout reached?
 			  (begin