On Thu, May 19, 2016 at 8:49 AM, Martin McClure <mar...@hand2mouse.com> wrote: > On 05/18/2016 03:17 PM, Martin McClure wrote: >> >> On 05/18/2016 08:49 AM, Mariano Martinez Peck wrote: >>> >>> Hi guys, >>> >>> I am seeing a problem in Pharo 5.0 regarding Delay >> wait. I cannot >>> explain how this could happened but it does, and it happened to me a couple >>> of times (but not fully reproducible). >>> >> >> Hmm. The schedulerResumptionTime is, somehow, being (approximately) >> doubled. It's not clear how that can happen, but I'll look a little more. >> > > Mario, is there any chance that you might be saving the image during one of > these Delays? > > > This one smells like a race condition, and I think I see something that > *might* explain it. But I don't have any more time to spend on this one, so > I'll leave the rest to someone else. I hope this is helpful: > > The only way I immediately see for the schedulerResumptionTime to become > approximately doubled is if the Delay's resumption time is adjusted by > #restoreResumptionTimes without previously having been adjusted by > #saveResumptionTimes. > > The only time either of those are sent, that I can see, is on saving the > image. Both are normally sent, (save before the snapshot, restore > afterwards), but there may be a hole there. > > #saveResumptionTimes is only sent (by this scheduler class) when the > accessProtect semaphore is held, but #handleTimerEvent: is executed in the > timing Process *without* the protection of accessProtect, in the case of the > VM signaling the timingSemaphore. If the VM signals the timingSemaphore, > #handleTimerEvent: could run in the middle of #saveResumptionTimes. If some > Delay expires because of that timer event, our Delay could move from being > the first suspended delay to being the active delay. If that happens after > we've adjusted the active delay, but before we've processed the suspended > delays, that Delay will not get adjusted, and will show the symptoms that > Mariano is seeing.
A quick experiment to test this might be in shutDown/#startUp trying... [ self saveResumptionTimes ] valueAt: Processor timingPriority [ self resumeResumptionTimes ] valueAt: Processor timingPriority > > Also, I'm not sure how the Heap that holds the suspendedDelays will react to > being modified in the middle of an enumeration. That might open a larger > window for the problem. > > Regards, > > -Martin > Even if not directly related to Mariano's problem, I agree with your general assessment. I'm not comfortable with the way that #save/#restoreResumptionTimes (which manipulate suspendedDelays) are called from user priority code via #shutDown/#startUp. Per the original code**, accessProtect can't be used inside the timing priority #handleTimerEvent since accessProtect is held by the user priority #schedule when it uses "timingSemaphore signal" to invoke invokes #handleTimerEvent. accessProtect never protected timingPriority manipulation of suspendedDelays by #handleTimerEvent, nor expired delays waking up. But ahhh... the disabling of accessProtect previously prevented new delays being scheduled between a #save and #restore. If a new delay is scheduled after the #save, when it is #restore'd its resumptionTime would be wrong. Waiting in the wings for Pharo 6 I have changes that should help: * have #save/#restoreResumptionTimes *only* called from timing priority event loop (i.e. #handleTimerEvent) * shutDown/startUp suspends/resumes the timing priority event loop, instead of trying to block signals to timingSemaphore I haven't touched it for a few months so I'll need to chase it up to provide a preview. Mariano, can you try DelayMillisecondScheduler (which however is missing some fixes for other issues). cheers -ben