On Sun, 2009-04-12 at 02:59 -0400, Christoffer Dall wrote:
> Hi Hollis.
>
> We are about to begin integration between QEMU and our KVM module for
> ARM, but we have a few architectural questions, for which we would
> very much appreciate your view based on your experience with the PPC
> implementation.

Please also CC a mailing list or two so that more people can help you,
or correct or elaborate on my answers...

By the way, if you have working kernel and userspace code (even
incomplete), you should consider posting it now for feedback and even
attract other developers or users. Right now it doesn't look like
there's any KVM ARM development going on at all, so potential users
might look elsewhere and never come back.

> First, it's our impression that we only need to implement the MMU
> functionality in the KVM module and QEMU takes care of the rest.
> However, we can see that the decrementer unit is implemented in KVM
> and not in QEMU. Is it not possible to let a timer reside in QEMU and
> inject interrupt into the guest through KVM_INTERRUPT when needed?

It is possible. However, in PowerPC, the decrementer timer is part of
the core. I believe on other architectures it's common for it to be an
off-core device, so there it would make sense to use qemu's emulation.
x86 KVM does this with the PIT, for example.

(To complicate things slightly, x86 KVM emulates a few devices in the
kernel instead of using Qemu's emulation, in particular the PIT and the
APIC. They do this because those devices are accessed relatively
frequently, and they get better performance that way. This is not
entirely uncontroversial though.)

> Second, regarding interrupts in general, it seems that the QEMU
> architecture for delivering / receiving interrupts is generally
> adapted to work with KVM (through the kvm_arch_pre_run function) and
> thus from a high architectural point of view, this 'flow of things'
> does not need to be modified. Is this correct?

Right, you can see the interesting parts in kvm_cpu_exec(), which I
think is pretty easy to read. Basically qemu's KVM code polls qemu
state for pending interrupts. We could add a KVM hook into the qemu
"set irq" path, but instead we check for pending interrupts after the
fact. I think that's to minimize KVM-specific code changes to qemu.

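The poll-before-run pattern described above can be illustrated with a
minimal sketch. Every mock_* name below is hypothetical and stands in
for qemu's real structures and for the KVM_INTERRUPT ioctl; this is not
the actual kvm_cpu_exec() or kvm_arch_pre_run() code, only the shape of
the idea: device emulation records a pending interrupt in CPU state,
and the run loop checks that state just before entering the guest.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU emulator state (not qemu's types). */
struct mock_cpu {
    bool irq_pending;     /* set by device emulation, cleared on inject */
    int  irq_line;        /* which line was raised */
    int  injected_count;  /* how many interrupts reached the "guest" */
    int  last_injected;   /* last line handed to the "kernel" */
};

/* Device emulation's "set irq" path: only records state, no KVM call. */
static void mock_set_irq(struct mock_cpu *cpu, int line)
{
    cpu->irq_pending = true;
    cpu->irq_line = line;
}

/* Stand-in for issuing KVM_INTERRUPT on the vcpu file descriptor. */
static void mock_inject(struct mock_cpu *cpu, int line)
{
    cpu->injected_count++;
    cpu->last_injected = line;
}

/* Pre-run hook: poll emulator state and inject if something is pending.
 * Returns true if an interrupt was injected. */
static bool mock_pre_run(struct mock_cpu *cpu)
{
    if (!cpu->irq_pending)
        return false;
    mock_inject(cpu, cpu->irq_line);
    cpu->irq_pending = false;
    return true;
}

/* One iteration of the exec loop: poll, then (in real code) run the
 * guest and handle whatever exit reason comes back. */
static void mock_cpu_exec_once(struct mock_cpu *cpu)
{
    mock_pre_run(cpu);
    /* real code would enter the guest here and process the exit */
}
```

Note that mock_set_irq() never calls into the injection path itself;
the interrupt is only picked up on the next pass through the loop,
which is the "after the fact" check described above.
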
> As far as we have gathered interrupts are sent to QEMU by signalling
> the process, and on each trap to the host, KVM will detect a pending
> signal, resume the QEMU process, which will deal with the interrupt in
> the main_loop_wait(...) function. Is this far off track?

Yes, I think you've got the basic idea. Qemu sets its file descriptors
to generate signals when host data is available, so that's how the
signals are generated in the first place. (Of course, host data doesn't
*necessarily* cause guest interrupts.) So with -nographic, typing into
the qemu tty could cause qemu's UART emulation to raise an interrupt to
the guest indicating there's stuff to read from the virtual UART FIFO.

> Third, it seems that the PPC implementation only transfers a subset of
> the CPUPPCState fields to the kvm_regs struct and subsequently only a
> subset of the kvm_vcpu_arch struct. Is it possible in short to
> summarize how to determine what it is necessary to transfer?

The PowerPC implementation should transfer more state than it currently
does. What we do now is functional, but we completely omit FPU state
and a number of SPRs. If we want to do full-guest debugging with the
Qemu debugger, we'd need to fill in the rest. This is somewhat
complicated by the differing supervisor register set found in different
PowerPC implementations (e.g. 440 vs e500 vs 970).

-- 
Hollis Blanchard
IBM Linux Technology Center