Re: [Pharo-users] Pharo7, consistent image freeze/deadlock on snapshotAndQuit

Ben Coman Sat, 06 Oct 2018 22:25:54 -0700

>
> On 26 July 2018 at 00:46, Johan Brichau <jo...@inceptive.be> wrote:
> > Hi,
> >
> > I ran into a freeze issue when trying to save and/or quit an image after
> > loading Seaside3 and Zinc (not an infrequent combination :)
> >
> > The cause seems to be that terminating processes in a `shutDown` handler
> > (called on image shutdown / snapshot) leads to a deadlock (somewhere).
> > Both Comet and Zinc will send #terminate to a process when an image is
> > quit or saved. If I disable one of those, the deadlock does not occur.
> >
> > I managed to reconstruct the issue with a simple example, attached.
> >
> > The attached Freeze class creates two processes (that loop with a
> > Delay>>wait inside) in the `startUp` method (on image start).
> > These processes are terminated by the shutdown method (called on image
> > quit/save).
> > However, when you try to quit and/or save the image, you will experience
> > an image freeze.
> >
> > I’m not familiar with the process scheduler, or anything else in this
> > context… so help would be appreciated while I try to dive further into it :)
> >
> > To reproduce: just file in the attached file in a Pharo7 image and try
> > to quit.
> > For reference, System Reporter output for my configuration.
> >
> > Any ideas?
> >
> > Cheers
> > Johan
>


On Thu, 26 Jul 2018 at 19:06, Ben Coman <b...@openinworld.com> wrote:

> I don't have a total answer, but can confirm the problem and help
> characterize it.
> First, great test-snippet. Really nice and concise.
>
> On Windows it indeed freezes my Pharo 7 image when saving the image.
>
> It doesn't freeze Pharo 6.
>
> It doesn't freeze if the test-processes are not terminated.
>
> It doesn't freeze if the shutdown code runs at a lower priority than
> the test-processes.
> i.e. it works with these added lines...
>     shutDown: quitting
>     "+"    |restorePriority|
>     "+"    restorePriority := Processor activeProcess priority.
>     "+"    Processor activeProcess priority: 15.
>             Process1 ifNotNil:[ Process1 terminate ].
>             Process2 ifNotNil: [Process2 terminate ].
>     "+"   Processor activeProcess priority: restorePriority.
>             Process1 := Process2 := nil
>
>
> Does that help someone else identify the relevant changes between Pharo 6
> & 7 to explain the behaviour in more depth?
>
> cheers -ben
>

This was being worked on in
https://pharo.fogbugz.com/f/cases/22284/Freeze-during-startup
From recent discussion with Vincent, I believe I've got my head fully around
this...

IIUC part of the shutDown/startUp was being run at highestPriority. So your
case boiled down to...

  | processA processB |
  Delay delaySchedulerClass: DelaySpinScheduler. "default < build 1273"
  processA := [ (Delay forSeconds: 10) wait ] forkAt: 20 named: 'processA'.
  processB := [ (Delay forSeconds: 10) wait ] forkAt: 20 named: 'processB'.
  1 second wait.
  [
    processA terminate.
    processB terminate. "image locked here"
  ] forkAt: Processor highestPriority.

The freeze is due to the #terminate causing the curtailed block in
Delay>>wait to be run at highestPriority.

    Delay>>wait
        self schedule.
        [delaySemaphore wait] ifCurtailed: [self unschedule].

The first termination (Process A) fills the transfer variable
/finishedDelay/ in DelaySpinScheduler>>unschedule:
which is normally cleared by "timingSemaphore signal" waking up
#handleTimerEvent: running at highestPriority.
But with #unschedule: at highestPriority, co-operative scheduling within
priorities instead continues execution
with the second termination (Process B) which finds /finishedDelay/ still
filled and spins forever with no chance
of #handleTimerEvent: running to clear the transfer variable.

The following instrumentation helps observe this...

    DelayExperimentalSpinScheduler>>unschedule: aDelay
        finishedDelay == nil
            ifTrue: [
                finishedDelay := aDelay. "...and this assignment"
                timingSemaphore signal.
            ]
            ifFalse: [ |context|
                Transcript crShow: Processor activeProcess name, ' finishedDelay not nil'.
                context := thisContext.
                [ context = nil ] whileFalse: [
                    Transcript crShow: '  <-- ', context printString.
                    context := context sender ]].

Using "Delay delaySchedulerClass: DelayExperimentalSpinScheduler" for the
boiled-down case then produces...

processB finishedDelay not nil
  <-- DelayExperimentalSpinScheduler>>unschedule:
  <-- Delay>>unschedule
  <-- [ self unschedule ] in Delay>>wait
  <-- Context>>resume:through:
  <-- BlockClosure>>ifCurtailed:
  <-- Delay>>wait
  <-- [ (Delay forSeconds: 10) wait ] in UndefinedObject>>DoIt
  <-- BlockClosure>>on:do:
  <-- BlockClosure>>ensure:
  <-- [ self value. Processor terminateActive ] in BlockClosure>>newProcess


# POSSIBLE FIXES

1. Adding a "Processor yield" like this...
    DelaySpinScheduler>>unschedule: aDelay
        [ finishedDelay == nil
            ifTrue: [
                finishedDelay := aDelay.
                timingSemaphore signal.
                finishedDelay ifNotNil: [ Processor yield ].
                ^true.
            ].
          true.
        ] whileTrue.


2. Change the #signal semantics to immediately activate a signaled process
when signalled/signalling priorities are the same. Maybe like this...
    StackInterpreter>>resume: aProcess preemptedYieldingIf: yieldImplicitly
    -   "Make aProcess runnable and if its priority is higher than that of the current process, preempt the current process. "
    +   "Make aProcess runnable and if its priority is the same or higher than that of the current process, preempt the current process."
        | activeProc activePriority newPriority |
        <inline: false>
        activeProc := self activeProcess.
        activePriority := self quickFetchInteger: PriorityIndex ofObject: activeProc.
        newPriority := self quickFetchInteger: PriorityIndex ofObject: aProcess.
    -   newPriority <= activePriority ifTrue:
    +   newPriority <  activePriority ifTrue:
            [self putToSleep: aProcess yieldingIf: true.
            ^false].
        self putToSleep: activeProc yieldingIf: yieldImplicitly.
        self transferTo: aProcess.
        ^true


3. Switching to DelaySemaphoreScheduler which suspends the activeProcess
when the transfer variable is not empty.
This is default in the latest builds.
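
A minimal, untested sketch of option 3, assuming DelaySemaphoreScheduler is
present in your image and can be selected through the same
"Delay delaySchedulerClass:" setter used in the boiled-down case above.
If the explanation above holds, the second #terminate should return instead
of spinning.

  | processA processB |
  "Assumption: same setter as in the boiled-down case, different scheduler class"
  Delay delaySchedulerClass: DelaySemaphoreScheduler.
  processA := [ (Delay forSeconds: 10) wait ] forkAt: 20 named: 'processA'.
  processB := [ (Delay forSeconds: 10) wait ] forkAt: 20 named: 'processB'.
  1 second wait.
  [
    processA terminate.
    processB terminate. "expected to return here rather than lock the image"
    Transcript crShow: 'both processes terminated'.
  ] forkAt: Processor highestPriority.

Save your image before trying this, since a freeze here means killing the VM.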


@Vincent, revising what I said elsewhere that the delay scheduler must be
the only process running at highestPriority,
I've realised that was an assumption I'd held without detailed
consideration.  Perhaps that is over-constrained.
Now that I understand the cause of the freeze better, it only affects the
spin-scheduler, due to its busy-loop in methods that I'd assumed only ran
at user-priority.
So other processes running at highestPriority is probably okay, as long as
they have no busy loops.

cheers -ben
