On Wednesday 14 January 2009 06:01:17 Marcelo Tosatti wrote: > On Tue, Jan 13, 2009 at 03:47:49PM +0800, Sheng Yang wrote: > > On Sunday 11 January 2009 12:55:58 Marcelo Tosatti wrote: > > > > The original apic_timer_intr_post() got: > > > if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec) > > > apic->timer.last_update = ktime_add_ns( > > > apic->timer.last_update, > > > apic->timer.period); > > > > I think the purpose is let guest see a reasonable TMCCT. Which means: > > > > 1. Timer start to fire at start_apic_timer(). last_update=now(). > > > > 2. Every one LAPIC timer interrupt was injected into the guest, so the > > time at least elapsed timer.period(the first alarm set at "period" > > later), then last_update is increased by period. Notice last_update's > > base is set before timer fire, so it's not indicated the next one, but > > *the time this interrupt should be injected*(time can be delayed)... So > > last_update = n*period + base. > > Right. > > > 3. If there is pending interrupt, last_update won't be updated, so we > > have > > > > this: > > > while (counter_passed > tmcct) > > > counter_passed -= tmcct; > > > tmcct -= counter_passed; > > > > (notice tmcct is TMICT here.) > > > > And the returned tmcct value indicated (counter_passed % tmict), a offset > > regardless of pending interrupt numbers. > > Right. The problem is that for accumulated interrupts, the guest will > receive the interrupt as fast as the qemu process can be scheduled (as > long as its not masked, of course). There could be higher priority > vectors in the mix, but thats not the common case. > > So calculating the offset using last interrupt injection is not very > reasonable in this case.
Um... I think it's not easy to find a complete reasonable result here, I haven't look into how OS us TMCCT, maybe it can help us to find a more reasonable result here. > > I think now the overflow seems OK, but I am not sure why last_update can > > be bigger than ktime_get()? Maybe due to vcpu migration? Suspect some > > racy or boundary condition existed... > > Yep, it seems to be some small inaccuracy in the accounting that > causes it. > > > And base on this, I don't think my quick fix is correct... > > > > > 3) And then there's interrupt reinjection. Once interrupts accumulate, > > > and we reinject, current count reads become completly bogus. This is > > > the reason for time drift on RHEL4-style kernels with "clock=tsc". The > > > algorithm calculates the interrupt handling delay by using the PIT > > > count (equivalent to APIC_TMCCT). But PIT count emulation uses the > > > hrtimer expiration time, which has nothing to do with the accumulated > > > interrupts. > > > > > > So what I propose is to first switch lapic current count emulation to a > > > straightforward scaled hrtimer_get_remaining equivalent. > > > > > > For the reinjection case, maintain an average of the delay between host > > > interrupt and interrupt injection (this can be generic, so both PIT > > > and LAPIC timer can use it). Return that scaled average on APIC_TMCCT > > > emulation whenever pending > 2. > > > > > > What you think? > > > > 1. If we simply use ktime_get() to update last_update, the interval can't > > be same between different last_update. I think we may got some problem > > here, but not for sure. For example, three delayed interrupt was injected > > one after one, last_update would show very little interval, and > > APIC_TMCCT may got trouble. > > I mean get rid of last_update and use hrtimer_get_remaining on > apic_get_tmcct for no pending interrupts. And estimated delay between > timer fire and injection if there are pending interrupts. > > > Maintain an average of the delay seems a little tricky here, and I am not > > sure if it would help the problem... Average is average at most, not the > > real affection at the time... > > Yes. Probably using the last delay between timer fire and injection is > more accurate than an average. > > > 2. For the current mechanism, the interval is the same, so last_update > > always equal to n*period + base. If there are more than one pending > > interrupts, TMCCT should also can return the relative correct value. > > > > So I am leaning toward to fix current problem, though I haven't find the > > root cause yet... > > I think it can be simpler, without the need to deal with overflow at > all. > > Understanding the present root cause is important though. Yes, but still don't have many clues now... I would look through the code and purpose some patches for Alex to test... -- regards Yang, Sheng -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
