On Wed, May 18, 2016 at 11:39 PM, Ben Coman <b...@openinworld.com> wrote:
> On Thu, May 19, 2016 at 8:49 AM, Martin McClure <mar...@hand2mouse.com> > wrote: > > On 05/18/2016 03:17 PM, Martin McClure wrote: > >> > >> On 05/18/2016 08:49 AM, Mariano Martinez Peck wrote: > >>> > >>> Hi guys, > >>> > >>> I am seeing a problem in Pharo 5.0 regarding Delay >> wait. I cannot > >>> explain how this could happened but it does, and it happened to me a > couple > >>> of times (but not fully reproducible). > >>> > >> > >> Hmm. The schedulerResumptionTime is, somehow, being (approximately) > >> doubled. It's not clear how that can happen, but I'll look a little > more. > >> > > > > Mario, is there any chance that you might be saving the image during one > of > > these Delays? > > > > > > This one smells like a race condition, and I think I see something that > > *might* explain it. But I don't have any more time to spend on this one, > so > > I'll leave the rest to someone else. I hope this is helpful: > > > > The only way I immediately see for the schedulerResumptionTime to become > > approximately doubled is if the Delay's resumption time is adjusted by > > #restoreResumptionTimes without previously having been adjusted by > > #saveResumptionTimes. > > > > The only time either of those are sent, that I can see, is on saving the > > image. Both are normally sent, (save before the snapshot, restore > > afterwards), but there may be a hole there. > > > > #saveResumptionTimes is only sent (by this scheduler class) when the > > accessProtect semaphore is held, but #handleTimerEvent: is executed in > the > > timing Process *without* the protection of accessProtect, in the case of > the > > VM signaling the timingSemaphore. If the VM signals the timingSemaphore, > > #handleTimerEvent: could run in the middle of #saveResumptionTimes. If > some > > Delay expires because of that timer event, our Delay could move from > being > > the first suspended delay to being the active delay. If that happens > after > > we've adjusted the active delay, but before we've processed the suspended > > delays, that Delay will not get adjusted, and will show the symptoms that > > Mariano is seeing. > > A quick experiment to test this might be in shutDown/#startUp trying... > [ self saveResumptionTimes ] valueAt: Processor timingPriority > [ self resumeResumptionTimes ] valueAt: Processor timingPriority > > > > > Also, I'm not sure how the Heap that holds the suspendedDelays will > react to > > being modified in the middle of an enumeration. That might open a larger > > window for the problem. > > > > Regards, > > > > -Martin > > > > Even if not directly related to Mariano's problem, I agree with your > general assessment. I'm not comfortable with the way that > #save/#restoreResumptionTimes (which manipulate suspendedDelays) are > called from user priority code via #shutDown/#startUp. Per the > original code**, accessProtect can't be used inside the timing > priority #handleTimerEvent since accessProtect is held by the user > priority #schedule when it uses "timingSemaphore signal" to invoke > invokes #handleTimerEvent. accessProtect never protected > timingPriority manipulation of suspendedDelays by #handleTimerEvent, > nor expired delays waking up. But ahhh... the disabling of > accessProtect previously prevented new delays being scheduled between > a #save and #restore. If a new delay is scheduled after the #save, > when it is #restore'd its resumptionTime would be wrong. > > Waiting in the wings for Pharo 6 I have changes that should help: > * have #save/#restoreResumptionTimes *only* called from timing > priority event loop (i.e. #handleTimerEvent) > * shutDown/startUp suspends/resumes the timing priority event loop, > instead of trying to block signals to timingSemaphore > > I haven't touched it for a few months so I'll need to chase it up to > provide a preview. > > Mariano, can you try DelayMillisecondScheduler (which however is > missing some fixes for other issues). > > Ben, Did you find a chance to do something else with this? I am still finding lookups every in a while even with the DelayMillisecondScheduler :( Thanks! -- Mariano http://marianopeck.wordpress.com