On Wed, Aug 16, 2017 at 7:46 AM, Guillermo Polito <guillermopol...@gmail.com
> wrote:

>
> On Tue, Aug 15, 2017 at 3:25 PM, Ben Coman <b...@openinworld.com> wrote:
>
>> In case any of the shutdown/startup scripts use a delay, now or in the
>> future,
>> I'd first try at    highestPriority-1 to avoid influence on the
>> DelayScheduler.
>> but then Eliot's suggestion to valueUnpreemptively may avoid that anyway.
>>
>
> Why should a shutdown/startup use a delay?
>


> startup and shutdown should be fast and not be blocked...
>

okay.  My response was a just reflex to not assume what some third-party
developer might do.  I'm not familiar with the startup/shutdown and had not
considered it deeply.


> If there is a delay on client code, it should block a client thread, not a
> system thread...
>


> Moreover, I see a series of issues in having the delay process running in
> higher priority than the startup. If I'm wrong, please correct me because
> otherwise that means there is something I'm not getting.
>
> First, today the Delay scheduling process is being terminated on shutdown
> and being re-initialized on startup. This means that even if a
> shutdown/startup action tries to use a delay that will fail/block
> indefinitely?
>

That is not how I understand it works.  The DelayScheduler process is not
terminated, just suspended by DelayMicrosecondScheduler>>shutdown.
 #stopTimerEventLoop is not called during shutdown.


>
> Second, what happens with race conditions between the startup and the
> delay process? If the shutdown is in the middle of terminating the delay
> process and the delay process gets suddenly activated?
>
> stopTimerEventLoop
> "Stop the timer event loop"
>
> timerEventLoop ifNotNil: [ timerEventLoop terminate ].
> timerEventLoop := nil.
>

That is effectively only called when changing which between different
DelayScheduler-implementations.


> Maybe before terminating the timerEventLoop we need to suspend it? That
> will at least atomically (primitive) remove the process from the ready list
> and avoid it from being activated again, no?
>

I'm not completely happy with how the DelayScheduler shutdown happens now,
whether some case might cause timingSemaphore to be triggered to run, and
 how #save/#restoreResumptionTimes are called from low-priority code which
feels susceptible to race conditions (although I've not identified any).  A
while a go I was playing with the idea that to **ensure** the
timerEventLoop doesn't run, you add another semaphore in middle activated
by #shutDown,  - but it was near a Pharo 5 release so I didn't push it in,
and then failed to get back to it.  From memory something like...

DelayScheduler>>handleTimerEventLoop: microsecondNowTick
   ....
   suspendDelaysSemaphore ifNotNil: [
         self saveResumptionTimes.
         suspendDelaysSemaphore wait.
         suspendDelaysSemaphore := nil.
         self restoreResumptionTimes.
         ].

DelayScheduler>>shutDown
        suspendDelaysSemaphore := Semaphore new.

DelayScheduler>>startUp
        suspendDelaysSemaphore signal.

I feel a dedicated sempahore would be more robust to control of the
suspension of the DelayScehduler during shutdown/start, rather than leaving
it waiting on the timingSemaphore that several things interact with and
hoping none signal it.

Maybe it time for me to try for my first Pharo 7 contribution.

cheers -ben


> In any case, I see no good in letting a delay work on startup. That is far
> too low level and the system would be in a far too unstable state to run
> any code other than the startup itself.
>
>
>> btw, what happens if an error occurs inside valueUnpreemptively?
>> Does the normal priority debugger still get to run?
>>
>> cheers -ben
>>
>> On Mon, Aug 14, 2017 at 6:42 PM, Guillermo Polito <
>> guillermopol...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm proposing a kind-of critical change that I believe is very good for
>>> the health of the system: I want that the startup of the system runs in
>>> maximum priority and becomes non-interruptable.
>>>
>>> Right now, when you save your image, the shutdown and startup are run in
>>> the same priority than the process that triggered the save (usually the ui
>>> or the command line, priority 40). This can cause lots of problems and race
>>> conditions: processes with higher priorities can interrupt the
>>> shutdown/startup and try to do something while the system is unstable. As a
>>> side effect also, when you use extensively the command line, you start
>>> stacking startup contexts from old sessions:
>>>
>>> ...
>>> session 3 ctxt 4 <- This guy makes a save and a new session starts
>>> session 3 ctxt 3
>>> session 3 ctxt 2
>>> session 3 ctxt 1
>>> session 2 ctxt 4 <- This guy makes a save and a new session starts
>>> session 2 ctxt 3
>>> session 2 ctxt 2
>>> session 2 ctxt 1
>>> session 1 ctxt 4 <- This guy makes a save and a new session starts
>>> session 1 ctxt 3
>>> session 1 ctxt 2
>>> session 1 ctxt 1
>>>
>>> Old contexts are never collected, and the objects they referenced
>>> neither.
>>>
>>> To fix these two problems I propose to do every image save/session start
>>> in a new process in maximum priority. That way, other process should not be
>>> able to interrupt the startup process. Moreover, every session
>>> shutdown/startup should happen in a new clean process, to avoid the session
>>> stacking.
>>>
>>> For normal users, this should have no side effect at all. This change
>>> will have a good impact on people working on the debugger and the stack
>>> such as fueling-out the stack because they will have a cleaner stack.
>>>
>>> There is however a side-effect/design point to consider: startup actions
>>> should be quick to run. If a startup action requires to run a long-running
>>> action such as starting a server or managing a command line action, that
>>> should run in a separate process with lower priority (usually
>>> userPriority). In other words, the startup action should create a new
>>> process managing its action.
>>>
>>> If you want to review (and I'd be glad)
>>>
>>> Pull request: https://github.com/pharo-project/pharo/pull/198
>>> Fogbugz issue: https://pharo.fogbugz.com/f/cases/20309
>>> Current validation going on: https://ci.inria.fr/pharo-ci-j
>>> enkins2/job/Test%20pending%20pull%20request%20and%20branch%2
>>> 0Pipeline/view/change-requests/job/PR-198/
>>>
>>> Guille
>>>
>>> --
>>>
>>>
>>>
>>> Guille Polito
>>>
>>>
>>> Research Engineer
>>>
>>> French National Center for Scientific Research - *http://www.cnrs.fr*
>>> <http://www.cnrs.fr>
>>>
>>>
>>>
>>> *Web:* *http://guillep.github.io* <http://guillep.github.io>
>>>
>>> *Phone: *+33 06 52 70 66 13 <+33%206%2052%2070%2066%2013>
>>>
>>
>>
>
>
> --
>
>
>
> Guille Polito
>
>
> Research Engineer
>
> French National Center for Scientific Research - *http://www.cnrs.fr*
> <http://www.cnrs.fr>
>
>
>
> *Web:* *http://guillep.github.io* <http://guillep.github.io>
>
> *Phone: *+33 06 52 70 66 13 <+33%206%2052%2070%2066%2013>
>

Reply via email to