On Wed, Jul 13, 2016 at 12:30 AM, Mariano Martinez Peck <marianop...@gmail.com> wrote: > > > On Wed, May 18, 2016 at 11:39 PM, Ben Coman <b...@openinworld.com> wrote: >> >> On Thu, May 19, 2016 at 8:49 AM, Martin McClure <mar...@hand2mouse.com> >> wrote: >> > On 05/18/2016 03:17 PM, Martin McClure wrote: >> >> >> >> On 05/18/2016 08:49 AM, Mariano Martinez Peck wrote: >> >>> >> >>> Hi guys, >> >>> >> >>> I am seeing a problem in Pharo 5.0 regarding Delay >> wait. I cannot >> >>> explain how this could happened but it does, and it happened to me a >> >>> couple >> >>> of times (but not fully reproducible). >> >>> >> >> >> >> Hmm. The schedulerResumptionTime is, somehow, being (approximately) >> >> doubled. It's not clear how that can happen, but I'll look a little >> >> more. >> >> >> > >> > Mario, is there any chance that you might be saving the image during one >> > of >> > these Delays? >> > >> > >> > This one smells like a race condition, and I think I see something that >> > *might* explain it. But I don't have any more time to spend on this one, >> > so >> > I'll leave the rest to someone else. I hope this is helpful: >> > >> > The only way I immediately see for the schedulerResumptionTime to become >> > approximately doubled is if the Delay's resumption time is adjusted by >> > #restoreResumptionTimes without previously having been adjusted by >> > #saveResumptionTimes. >> > >> > The only time either of those are sent, that I can see, is on saving the >> > image. Both are normally sent, (save before the snapshot, restore >> > afterwards), but there may be a hole there. >> > >> > #saveResumptionTimes is only sent (by this scheduler class) when the >> > accessProtect semaphore is held, but #handleTimerEvent: is executed in >> > the >> > timing Process *without* the protection of accessProtect, in the case of >> > the >> > VM signaling the timingSemaphore. If the VM signals the timingSemaphore, >> > #handleTimerEvent: could run in the middle of #saveResumptionTimes. If >> > some >> > Delay expires because of that timer event, our Delay could move from >> > being >> > the first suspended delay to being the active delay. If that happens >> > after >> > we've adjusted the active delay, but before we've processed the >> > suspended >> > delays, that Delay will not get adjusted, and will show the symptoms >> > that >> > Mariano is seeing. >> >> A quick experiment to test this might be in shutDown/#startUp trying... >> [ self saveResumptionTimes ] valueAt: Processor timingPriority >> [ self resumeResumptionTimes ] valueAt: Processor timingPriority >> >> > >> > Also, I'm not sure how the Heap that holds the suspendedDelays will >> > react to >> > being modified in the middle of an enumeration. That might open a larger >> > window for the problem. >> > >> > Regards, >> > >> > -Martin >> > >> >> Even if not directly related to Mariano's problem, I agree with your >> general assessment. I'm not comfortable with the way that >> #save/#restoreResumptionTimes (which manipulate suspendedDelays) are >> called from user priority code via #shutDown/#startUp. Per the >> original code**, accessProtect can't be used inside the timing >> priority #handleTimerEvent since accessProtect is held by the user >> priority #schedule when it uses "timingSemaphore signal" to invoke >> invokes #handleTimerEvent. accessProtect never protected >> timingPriority manipulation of suspendedDelays by #handleTimerEvent, >> nor expired delays waking up. But ahhh... the disabling of >> accessProtect previously prevented new delays being scheduled between >> a #save and #restore. If a new delay is scheduled after the #save, >> when it is #restore'd its resumptionTime would be wrong. >> >> Waiting in the wings for Pharo 6 I have changes that should help: >> * have #save/#restoreResumptionTimes *only* called from timing >> priority event loop (i.e. #handleTimerEvent) >> * shutDown/startUp suspends/resumes the timing priority event loop, >> instead of trying to block signals to timingSemaphore >> >> I haven't touched it for a few months so I'll need to chase it up to >> provide a preview. >> >> >> >> Mariano, can you try DelayMillisecondScheduler (which however is >> missing some fixes for other issues). >> > > > Ben, > > Did you find a chance to do something else with this? I am still finding > lookups every in a while even with the DelayMillisecondScheduler :(
Hi Mariano, Sorry for the delay attending to this. I got a bit twisted up on whether I'm complicating things too much with my DelayScheduler refactoring for Pharo 6 (I should just put it out for early review), but then I had to get through a Netflix addiction to the Star Trek prequel Enterprise, and family time over my kids school holidays. I'm back looking at it now - and taking a step back to see what is the minimum change required to fix this. Do you have some test code or a test image I could run to reproduce the problem? Now I have one thing for you to try. In the move of DelaySchedulerXxx from milliseconds to microseconds, the code for dealing with clock wrap-around was removed, but I wonder if it was also covering clock jitter as a side effect. Could you try again the SpinScheduler but restore this code... >From handletTimerEvent... "Check for clock wrap-around." millisecondNowTick < activeDelayStartTime ifTrue: [ "clock wrapped" self saveResumptionTimes. self restoreResumptionTimes ]. activeDelayStartTime := millisecondNowTick. >From runTimerEventLoop... Time millisecondClockValue < millisecondNowTick ifTrue:[ timingSemaphore signal ]. "retry" >From startup... activeDelayStartTime := Time millisecondClockValue. cheers -ben