Stephane Eranian wrote:
> Hi Avi,
>
>> Shobha Ranganathan wrote:
>>
>>> I am trying to capture in vmx.c the hardware
>>> performance counter (PMU) interrupt of a i386 Linux
>>> kernel running with perfmon on a Core 2 Duo machine
>>> running with kvm-15. host is running kvm with VT-x in
>>> x86-64 mode.
>>>
>>> The PMU interrupt is programmed in the APIC LVT entry
>>> (set to 0xee) by the guest OS.
>>>
>> On stock kvm, the guest os programs a virtual apic that lives in qemu,
>> not the real apic, so it would never cause any interrupt.  Are you
>> running with a modified kvm that allows the guest to touch the real apic?
>>
>
> The Performance counters (PMU) cannot be fully virtualized, they need to
> run on the actual MSR registers. The PMU interrupt is controlled by the
> local APIC. To get overflow-based sampling to work in a guest, we need to
> allow the PMU to interrupt. Supposing we have allowed wrmsr,rdmsr to the
> PMU registers, the guest perfmon will setup the virtual APIC and virtual
> IDT as it normally would on real HW. VT-x takes care of the IDT but not
> of the APIC. The guest never touches the real APIC, qemu handles this.
> However if the host kernel is running perfmon, it does already have the
> actual APIC programmed for the PMU.
>
> In this configuration, the host perfmon interrupt driver catches the PMU
> interrupt generated while running in non-root VMX mode. At that point, there
> is a VM-exit. I have now been able to track down the type of exit in this
> case. You have a VM-exit for an external interrupt, which is fine, however
> the intr_info (VM_EXIT_INTR_INFO) is 0x0, in other words, VT-x does not give
> you any good info as to why you exited. As soon as you leave the VM_RESUME
> code, you branch to the host perfmon interrupt handler.
>
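For reference, here is a minimal sketch of how the exit interruption information decodes, assuming the layout documented in the Intel SDM: vector in bits 7:0, interruption type in bits 10:8 (0 is an external interrupt, 2 is an nmi), and a valid flag in bit 31. The macro, struct and function names are made up for this sketch and are not kvm's. An intr_info of 0x0 has the valid bit clear, so the vector field tells you nothing:

```c
#include <stdint.h>

/* Hypothetical names for this sketch; bit positions follow the Intel SDM. */
#define EXIT_INTR_VECTOR_MASK  0x000000ffu  /* bits 7:0: vector */
#define EXIT_INTR_TYPE_MASK    0x00000700u  /* bits 10:8: interruption type */
#define EXIT_INTR_TYPE_SHIFT   8
#define EXIT_INTR_VALID_MASK   0x80000000u  /* bit 31: field is valid */

#define EXIT_INTR_TYPE_EXTERNAL 0u
#define EXIT_INTR_TYPE_NMI      2u

struct exit_intr {
    int      valid;   /* nonzero if the field carries information */
    unsigned type;    /* interruption type, meaningful only if valid */
    unsigned vector;  /* vector number, meaningful only if valid */
};

/* Split a raw VM-exit interruption information value into its fields. */
static struct exit_intr decode_exit_intr_info(uint32_t info)
{
    struct exit_intr d;

    d.valid  = (info & EXIT_INTR_VALID_MASK) != 0;
    d.type   = (info & EXIT_INTR_TYPE_MASK) >> EXIT_INTR_TYPE_SHIFT;
    d.vector = info & EXIT_INTR_VECTOR_MASK;
    return d;
}
```

With the 0xee vector from the original report, a valid external-interrupt exit would read back as 0x800000ee, while the 0x0 you observed decodes as "not valid".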
Actually it can be convinced to give the interrupt number.  Right now, we
program VT not to ack interrupts, so we don't know their number, and
they are dispatched by the processor as soon as we enable interrupts on
the host.

An alternative mechanism exists.  We can tell VT to ack the interrupt,
in which case the vector number becomes valid, but we need to dispatch
the interrupt ourselves using the 'int' instruction.

As I'd rather not do that, perhaps we can program the apic to issue an
nmi instead of an interrupt while in guest mode.  On receipt of nmi, we
can call the host perfmon handler directly to interpret the performance
counters.

> In any case, the current solution I have for this is sort of hybrid because
> you rely on the host APIC to be programmed correctly, and then you need
> communication between the host perfmon code and the KVM kernel code to be
> able to inject the PMU interrupt back into the guest. Another solution I have
> experimented with is for the host perfmon to notify the user level qemu APIC
> code (SIGIO) which then issues the right KVM_INTERRUPT ioctl(), but that is
> slow and has some race condition with the guest.
>

That looks promising.  The slowness can be addressed by (first) moving
to queued signals instead of delivered signals and (later) pushing the
apic emulation into the kernel.

VT also has a facility to swap msrs on entry to the guest and back.

> The timer interrupt, also normally controlled by the APIC, is managed
> differently and can be fully virtualized by qemu using Linux timers. The
> PMU cannot be virtualized that way.
>
> At this point, even if you had APIC emulation in KVM (kernel), I am not sure
> this would solve this issue. I think I can live with having back communication
> between the host perfmon and KVM.
>
> Any better ideas?
>

It really depends on what one wants to do with the performance monitor
on the guest:

- if it's just to shut up the nmi watchdog, we can report a cpu model
  that does not have the performance monitor (which would be a classic
  Pentium? or maybe a 486?)
- if we want something like the nmi watchdog to run, we can emulate all
  counters based on cpu cycles, even if they count branches or something
  else.  That gives an inaccurate but sort-of-working counter, which we
  can emulate using host timers.
- if we want real performance monitoring, we need to do the msr swap.
  That has the disadvantage of disabling perfmon on the host, and of
  being depressingly complex.

What application do you have in mind for kvm guest performance monitoring?

>>> Similarly, an IDT entry
>>> connects the interrupt vector to the interrupt
>>> handler.
>>> I am not able to catch, in kvm, the PMU interrupt
>>> happening in VMX non-root mode. It does not seem to
>>> appear in the VM-exit interruption information nor in
>>> the IDT-vectoring information.  It does not seem to
>>> be caught by any of the exit handlers yet the host PMU
>>> interrupt handler catches it which is not what we
>>> want.
>>>
>>> Any idea on what is going on with this interrupt?
>>>
>> It looks completely normal, assuming the host also programmed the timer
>> to the same vector.  Look in qemu/hw/apic.c to find your missing interrupt.
>>

-- 
error compiling committee.c: too many arguments to function
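
P.S. To make the second option above (cycle-based emulation driven by host timers) a bit more concrete, here is a rough sketch, not anything that exists in kvm. It assumes a 40-bit counter, a fixed cycle rate in kHz, and that the caller supplies elapsed host time in nanoseconds; the struct and function names are invented for illustration. The idea is to advance the emulated counter by elapsed time times the cycle rate, and to compute how long until it wraps so a host timer can be armed to inject the overflow interrupt:

```c
#include <stdint.h>

/* Hypothetical emulated counter: every event is treated as "cpu cycles". */
struct emu_counter {
    uint64_t value;       /* current emulated count */
    uint64_t width_mask;  /* e.g. (1ULL << 40) - 1 for a 40-bit counter */
    uint64_t cpu_khz;     /* assumed cycle rate, in kHz */
};

/* Cycles elapsed in 'ns' nanoseconds; fine for short intervals, since
 * ns * cpu_khz must fit in 64 bits. */
static uint64_t ns_to_cycles(uint64_t ns, uint64_t cpu_khz)
{
    return ns * cpu_khz / 1000000ULL;
}

/* Advance the counter by elapsed host time.  Returns 1 if it wrapped,
 * i.e. an overflow interrupt should be injected into the guest. */
static int emu_counter_advance(struct emu_counter *c, uint64_t elapsed_ns)
{
    uint64_t cycles = ns_to_cycles(elapsed_ns, c->cpu_khz);
    uint64_t room = c->width_mask - c->value;  /* cycles left before wrap */
    int overflowed = cycles > room;

    c->value = (c->value + cycles) & c->width_mask;
    return overflowed;
}

/* Nanoseconds until the counter wraps, rounded up: the delay to program
 * into a host timer.  Assumes a counter narrow enough (40 bits here)
 * that remaining * 1000000 does not overflow 64 bits. */
static uint64_t emu_counter_ns_to_overflow(const struct emu_counter *c)
{
    uint64_t remaining = c->width_mask - c->value + 1;

    return (remaining * 1000000ULL + c->cpu_khz - 1) / c->cpu_khz;
}
```

A guest that programs a sampling period of 1000 events on an assumed 2 GHz rate would get a timer about 500 ns out, regardless of which event it asked for, which is exactly the "inaccurate but sort-of-working" behaviour described above.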