Case 17373 had an erroneous failure from
ProcessTest>>testHighPriorityOverridesWaitTime
and several others seem also to have suffered...
* https://pharo.fogbugz.com/default.asp?17373
* https://pharo.fogbugz.com/default.asp?17496
* https://pharo.fogbugz.com/default.asp?17445
* https://pharo.fogbugz.com/default.asp?7444
These have comments that the problem can't be reproduced, which is
common of occasional race conditions. So rather than just try to
solve this on the issue tracker, I thought I'd share a tip how to
isolate and squash such intermittent failures, essentially by
stressing that single test a large number of times like this...
1 to: 100 do: [ :n | Transcript crShow: 'Try ' , n printString.
ProcessTest debug: #testHighPriorityOverridesWaitTime. ]
Since the debugger only focusses on a point in time snapshot of the
running system, some Transcript instrumentation was useful to overview
the dynamic behaviour, like this...
ProcessTest>>testHighPriorityOverridesWaitTime
| lowerPriorityWaitingLonger higherPriorityWaitingLess
nextReadyProcess |
lowerPriorityWaitingLonger := [ ] forkAt: 11
higherPriorityWaitingLess := [ ] forkAt: 12.
nextReadyProcess := Processor nextReadyProcess.
Transcript cr;
tab;show: (lowerPriorityWaitingLonger priority ->
lowerPriorityWaitingLonger identityHash); cr;
tab;show: (higherPriorityWaitingLess priority ->
higherPriorityWaitingLess identityHash); cr;
tab;show: (nextReadyProcess priority -> nextReadyProcess
identityHash).
self assert: nextReadyProcess equals: higherPriorityWaitingLess.
which produces...
Try 1
11->343137024
12->40659200
12->40659200
Try 2
11->461794304
12->374996992
12->40659200
Observe on the second try nextReadyProcess refers to same process as
from the first try. This can be cleaned up just prior to the asset
check by adding ...
lowerPriorityWaitingLonger terminate.
higherPriorityWaitingLess terminate.
which produces...
Try 1
11->1005688064
12->833273600
12->833273600
...
Try 100
11->764450816
12->807973888
12->807973888
However this is still a little fragile, since some process with
priority greater than the running test could slip in before the call
to nextReadyProcess, as can be seen by pushing the #to:do: loop out to
100000, which produces...
Try 32044
11->784361216
12->431928576
40->397533952
or wrapping it by [ ]forkAt:39, which produces...
Try 1
11->431897088
12->400972288
30->338363648
After several awkward failed attempts to fix this, it seemed best to
observe actual behaviour rather than peeking into the dynamic data
structures. Thus I propose...
ProcessTest>>testHighPriorityOverridesWaitTime
"The first process to resume will pass straight through the gate
while the other waits for the assert to whichRan."
| gate checkAssert priorityRan |
gate := Semaphore new signal.
checkAssert := Semaphore new.
[ gate wait. priorityRan := 11. checkAssert signal ] forkAt: 11.
[ gate wait. priorityRan := 12. checkAssert signal ] forkAt: 12.
checkAssert wait.
self assert: priorityRan=12 description: 'Second scheduled but
higher priority should run first'.
gate signal.
checkAssert wait.
self assert: whichRan=11 description: 'First scheduled but lower
priority should run after'.
https://pharo.fogbugz.com/default.asp?17501