On 2012-11-03 05:43, Satoru Moriya wrote:
> We have some plans to migrate old enterprise/control systems which
> require low latency (msec order) to kvm virtualized environment.
> In order to satisfy the requirements, this patch adds realtime option
> to qemu:
>
> -realtime maxprio=<prio>,policy=<pol>
>
> This option change the scheduling policy and priority to realtime one
> (only vcpu thread) as specified with argument and mlock all qemu and
> guest memory.
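
Just to spell out what the above amounts to, as I read it (a rough
sketch of the mechanism, not the actual patch code): the vcpu thread is
switched to the requested realtime policy/priority and all of qemu's
memory is pinned, roughly

#include <sched.h>
#include <string.h>
#include <sys/mman.h>
#include <pthread.h>

/* hypothetical helper, only to illustrate what the option does */
static int vcpu_set_realtime(int policy, int prio)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = prio;

    /* raise the calling (vcpu) thread to SCHED_FIFO/SCHED_RR at prio */
    if (pthread_setschedparam(pthread_self(), policy, &param) != 0) {
        return -1;
    }

    /* pin all current and future qemu/guest memory; this is
       process-wide, so it would really only be done once */
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}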
This patch breaks the win32 build. All the POSIX stuff has to be pushed
into os-posix.c, e.g. I'm introducing some os_prioritize() function for
that purpose, empty on win32.

Then another question is how to get the parameters around. I played
with many options, ending up so far with

/* called by os_prioritize */
void qemu_init_realtime(int rt_sched_policy, int max_sched_priority);

/* called by threaded subsystems */
bool qemu_realtime_is_enabled(void);
void qemu_realtime_get_parameters(int *policy, int *max_priority);

all hosted by qemu-thread-*.c (empty/aborting on win32). This allows
subsystems to be adjusted for realtime without pushing all the
parameters into global variables. (A rough sketch of how these helpers
could be wired up follows as a PS below.)

> Of course, we need much more improvements to keep latency low in qemu
> virtualized environment and this is a first step. OTOH, we can meet the
> requirement of our first migration project with this patch.
>
> These are basic performance test results:
>
> Host : 4 core, 4GB, 3.7.0-rc3
> Guest: 1 core, 512MB, 3.6.3-1.fc17
>
> Benchmark: cyclictest
> https://rt.wiki.kernel.org/index.php/Cyclictest
>
> Command:
> $ cyclictest -p 99 -n -m -q -l 100000
>
> Results:
> - no load (1:normal qemu, 2:realtime qemu)
> 1. T: 0 ( 544) P:99 I:1000 C:100000 Min: 11 Act: 32 Avg: 157 Max: 10029
> 2. T: 0 ( 449) P:99 I:1000 C:100000 Min: 16 Act: 30 Avg: 29 Max: 540
>
> - load (heavy network traffic) (3:normal qemu, 4: realtime qemu)
> 3. T: 0 (3455) P:99 I:1000 C:100000 Min: 10 Act: 38 Avg: 364 Max: 18394
> 4. T: 0 ( 493) P:99 I:1000 C:100000 Min: 12 Act: 21 Avg: 76 Max: 10796

What are the numbers of "chrt -f -p 99 <vcpu_tid>" compared to this?

My point is: this alone is not yet a good justification for the switch
and its current semantics. The approach of just raising the VCPU
priority is quite fragile without [V]CPU isolation. If you raise the
VCPU over its event threads, specifically the iothread, you risk
starvation, e.g. during boot (the BIOS will poll endlessly for the PIT
or the disk). Yes, there is /proc/sys/kernel/sched_rt_*, but that is
what you typically disable when doing realtime seriously, particularly
if your guest doesn't idle during operation.

The model I would propose for mainline first is different: maxprio goes
to the event threads, maxprio - 1 to all vcpus (which means that
maxprio must be > 1). This setup is less likely to starve and makes
more sense (interrupts must have higher priority than CPUs).

However, that's also not yet generic, as we will have scenarios where
only part of the event sources and VCPUs will be prioritized and the
rest shall remain low prio / SCHED_OTHER. Besides defining a way to
express such configurations, the problem is that they may not work
during guest boot. So some realtime profile switching concept may also
be needed. I haven't made up my mind on these issues yet. Not to speak
of the horrible mess of configuring a PREEMPT-RT host...

What is clear, though, is that we need a reference showcase for
realtime QEMU/KVM, one that is as easy to reproduce as possible,
doesn't depend on proprietary realtime guests, and clearly shows the
advantages of all the needed changes for a reasonable use case.

I'd like to discuss this at the RT-KVM BoF at the KVM Forum next week.
Will you and/or any of your colleagues be there?

Jan
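
PS: To make the helper idea above a bit more concrete, roughly what I
have in mind for qemu-thread-posix.c is something like the following
(untested sketch; the declarations are the ones proposed above, the
qemu_thread_apply_realtime() helper and its exact placement are only
illustrative, and the win32 variants would be empty or abort):

#include <stdbool.h>
#include <string.h>
#include <sched.h>
#include <pthread.h>

static bool realtime_enabled;
static int realtime_policy;
static int realtime_max_priority;

/* called once by os_prioritize after parsing -realtime */
void qemu_init_realtime(int rt_sched_policy, int max_sched_priority)
{
    realtime_policy = rt_sched_policy;
    realtime_max_priority = max_sched_priority;
    realtime_enabled = true;
}

bool qemu_realtime_is_enabled(void)
{
    return realtime_enabled;
}

void qemu_realtime_get_parameters(int *policy, int *max_priority)
{
    *policy = realtime_policy;
    *max_priority = realtime_max_priority;
}

A subsystem would then prioritize its own thread, e.g. an event thread
at maxprio and a vcpu thread at maxprio - 1, along the lines of

static void qemu_thread_apply_realtime(bool is_vcpu)
{
    struct sched_param param;
    int policy, maxprio;

    if (!qemu_realtime_is_enabled()) {
        return;
    }
    qemu_realtime_get_parameters(&policy, &maxprio);

    memset(&param, 0, sizeof(param));
    /* event threads get maxprio, vcpus maxprio - 1 (error handling
       omitted here) */
    param.sched_priority = is_vcpu ? maxprio - 1 : maxprio;
    pthread_setschedparam(pthread_self(), policy, &param);
}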