Blue Swirl wrote:
> On Thu, May 27, 2010 at 7:08 PM, Jan Kiszka <jan.kis...@web.de> wrote:
>> Blue Swirl wrote:
>>> On Thu, May 27, 2010 at 6:31 PM, Jan Kiszka <jan.kis...@web.de> wrote:
>>>> Blue Swirl wrote:
>>>>> On Wed, May 26, 2010 at 11:26 PM, Paul Brook <p...@codesourcery.com> wrote:
>>>>>>> At the other extreme, would it be possible to make the educated
>>>>>>> guests aware of the virtualization also in the clock aspect:
>>>>>>> virtio-clock?
>>>>>> The guest doesn't even need to be aware of virtualization. It just
>>>>>> needs to be able to accommodate the lack of guaranteed realtime
>>>>>> behavior.
>>>>>>
>>>>>> The fundamental problem here is that some guest operating systems
>>>>>> assume that the hardware provides certain realtime guarantees with
>>>>>> respect to the execution of interrupt handlers. In particular, they
>>>>>> assume that the CPU will always be able to complete execution of the
>>>>>> timer IRQ handler before the periodic timer triggers again. In most
>>>>>> virtualized environments you have absolutely no guarantee of
>>>>>> realtime response.
>>>>>>
>>>>>> With Linux guests this was solved a long time ago by the
>>>>>> introduction of tickless kernels. These separate timekeeping from
>>>>>> wakeup events, so it doesn't matter if several wakeup triggers end
>>>>>> up getting merged (either at the hardware level or via top/bottom
>>>>>> half guest IRQ handlers).
>>>>>>
>>>>>> It's worth mentioning that this problem also occurs on real
>>>>>> hardware, typically due to lame hardware/drivers which end up
>>>>>> masking interrupts or otherwise stalling the CPU for long periods
>>>>>> of time.
>>>>>>
>>>>>> The PIT hack attempts to work around broken guests by adding
>>>>>> artificial latency to the timer event, ensuring that the guest
>>>>>> "sees" them all.
>>>>>> Unfortunately, guests vary on when it is safe for them to see the
>>>>>> next timer event, and trying to observe this behavior involves
>>>>>> potentially harmful heuristics and collusion between unrelated
>>>>>> devices (e.g. interrupt controller and timer).
>>>>>>
>>>>>> In some cases we don't even do that, and just reschedule the event
>>>>>> some arbitrarily small amount of time later. This assumes the guest
>>>>>> does useful work in that time. In a single-threaded environment
>>>>>> this is probably true - qemu got enough CPU to inject the first
>>>>>> interrupt, so it will probably manage to execute some guest code
>>>>>> before the end of its timeslice. In an environment where interrupt
>>>>>> processing/delivery and execution of the guest code happen in
>>>>>> different threads, this becomes increasingly likely to fail.
>>>>> So any voodoo around timer events is doomed to fail in some cases.
>>>>> What's the number of hacks that we want then? Is there any generic
>>>> The aim of this patch is to reduce the number of existing and
>>>> upcoming hacks. It may still require some refinements, but I think we
>>>> haven't found any smarter approach yet that fits existing use cases.
>>> I don't feel we have tried other possibilities hard enough.
>> Well, seeing prototypes wouldn't be bad, also to run real load against
>> them. But at least I'm currently clueless what to implement.
>
> Perhaps now is then not the time to rush to implement something, but
> to brainstorm for a clean solution.
And sometimes it can help to understand how ideas could be improved
further, or why others don't work at all.

>>>>> solution, like slowing down the guest system to the point where we
>>>>> can guarantee the interrupt rate vs. CPU execution speed?
>>>> That's generally a non-option in virtualized production environments.
>>>> Specifically, if the guest system lost interrupts due to host
>>>> overcommitment, you do not want it to slow down even further.
>>> I meant that the guest time could be scaled down, for example 2s in
>>> wall clock time would be presented to the guest as 1s.
>> But that is precisely what already happens when the guest loses timer
>> interrupts. There is no other time source for this kind of guest -
>> except, often, for some external events generated by systems which you
>> don't want to fall behind arbitrarily.
>>
>>> Then the number of CPU cycles between timer interrupts would increase
>>> and hopefully the guest can keep up. If the guest sleeps, the time
>>> base could be accelerated to catch up with the wall clock and then
>>> set back to the 1:1 rate.
>> Can't follow you ATM, sorry. What should be slowed down then? And how
>> precisely?
>
> I think vm_clock and everything that depends on vm_clock, also
> rtc_clock should be tied to vm_clock in this mode, not host_clock.

Let me check if I got this idea correctly: Instead of tuning just the
tick frequency of the affected timer device / sending its backlog in a
row, you rather want to tune vm_clock correspondingly? That might be a
way to abstract the required logic, which currently sits only in the
RTC, for use by other timer sources as well.

But just switching rtc_clock to vm_clock when the user wants host_clock
is obviously not an option. We would rather have to tune host_clock in
parallel.

Still, this does not answer:
- How do you want to detect lost timer ticks?
- What subsystem(s) keeps track of the backlog?
- And, depending on the above: how to detect at all that a specific IRQ
  is a timer tick?

Jan