On Thu, May 19, 2016 at 8:49 AM, Martin McClure <mar...@hand2mouse.com> wrote:
> On 05/18/2016 03:17 PM, Martin McClure wrote:
>>
>> On 05/18/2016 08:49 AM, Mariano Martinez Peck wrote:
>>>
>>> Hi guys,
>>>
>>> I am seeing a problem in Pharo 5.0 regarding Delay >> wait. I cannot
>>> explain how this could happened but it does, and it happened to me a couple
>>> of times (but not fully reproducible).
>>>
>>
>> Hmm. The schedulerResumptionTime is, somehow, being (approximately)
>> doubled. It's not clear how that can happen, but I'll look a little more.
>>
>
> Mario, is there any chance that you might be saving the image during one of
> these Delays?
>
>
> This one smells like a race condition, and I think I see something that
> *might* explain it. But I don't have any more time to spend on this one, so
> I'll leave the rest to someone else. I hope this is helpful:
>
> The only way I immediately see for the schedulerResumptionTime to become
> approximately doubled is if the Delay's resumption time is adjusted by
> #restoreResumptionTimes without previously having been adjusted by
> #saveResumptionTimes.
>
> The only time either of those are sent, that I can see, is on saving the
> image. Both are normally sent, (save before the snapshot, restore
> afterwards), but there may be a hole there.
>
> #saveResumptionTimes is only sent (by this scheduler class) when the
> accessProtect semaphore is held, but #handleTimerEvent: is executed in the
> timing Process *without* the protection of accessProtect, in the case of the
> VM signaling the timingSemaphore. If the VM signals the timingSemaphore,
> #handleTimerEvent: could run in the middle of #saveResumptionTimes. If some
> Delay expires because of that timer event, our Delay could move from being
> the first suspended delay to being the active delay. If that happens after
> we've adjusted the active delay, but before we've processed the suspended
> delays, that Delay will not get adjusted, and will show the symptoms that
> Mariano is seeing.

A quick experiment to test this might be in shutDown/#startUp trying...
    [ self saveResumptionTimes ] valueAt: Processor timingPriority
    [ self resumeResumptionTimes ] valueAt: Processor timingPriority

>
> Also, I'm not sure how the Heap that holds the suspendedDelays will react to
> being modified in the middle of an enumeration. That might open a larger
> window for the problem.
>
> Regards,
>
> -Martin
>

Even if not directly related to Mariano's problem, I agree with your
general assessment.  I'm not comfortable with the way that
#save/#restoreResumptionTimes (which manipulate suspendedDelays) are
called from user priority code via #shutDown/#startUp.  Per the
original code**, accessProtect can't be used inside the timing
priority #handleTimerEvent since accessProtect is held by the user
priority #schedule when it uses "timingSemaphore signal" to invoke
invokes #handleTimerEvent.  accessProtect never protected
timingPriority manipulation of suspendedDelays by #handleTimerEvent,
nor expired delays waking up.  But ahhh... the disabling of
accessProtect previously prevented new delays being scheduled between
a #save and #restore. If a new delay is scheduled after the #save,
when it is #restore'd its resumptionTime would be wrong.

Waiting in the wings for Pharo 6 I have changes that should help:
* have #save/#restoreResumptionTimes *only* called from timing
priority event loop (i.e. #handleTimerEvent)
* shutDown/startUp suspends/resumes the timing priority event loop,
instead of trying to block signals to timingSemaphore

I haven't touched it for a few months so I'll need to chase it up to
provide a preview.


Mariano, can you try DelayMillisecondScheduler (which however is
missing some fixes for other issues).

cheers -ben

Reply via email to