Re: Some Code for Performance Profiling
2010/4/5 Avi Kivity:
> On 03/31/2010 07:53 PM, Jiaqing Du wrote:
>>
>> Hi,
>>
>> We have some code for performance profiling in KVM. It is the output
>> of a school project. Previous discussions on the KVM, Perfmon2, and Xen
>> mailing lists helped us a lot. The code is NOT in good shape and is
>> only meant to demonstrate the feasibility of doing performance
>> profiling in KVM. Feel free to use it if you want.
>>
>
> Performance monitoring is an important feature for kvm. Is there any chance
> you can work at getting it into good shape?

I have been following the discussions about PMU virtualization on the list
for a while. Exporting a proper interface, i.e., guest-visible MSRs and
supported events, to the guest across a large number of physical CPUs from
different vendors, families, and models is the major problem. KVM currently
also supports almost a dozen different types of virtual CPUs. I will think
about it and try to come up with something more general.

>
>> We categorize performance profiling in a virtualized environment into
>> two types: *guest-wide profiling* and *system-wide profiling*. For
>> guest-wide profiling, only the guest is profiled. KVM virtualizes the
>> PMU and the user runs a profiler directly in the guest. It requires no
>> modifications to the guest OS or to the profiler running in the guest.
>> For system-wide profiling, both KVM and the guest OS are profiled. The
>> results are similar to what XenOprof outputs. In this case, one
>> profiler runs in the host and one profiler runs in the guest. It still
>> requires no modifications to the guest or to the profiler running in it.
>>
>
> Can your implementation support both simultaneously?

What do you mean by "simultaneously"? With my implementation, you either do
guest-wide profiling or system-wide profiling. They are achieved through
different patches. Actually, the result of guest-wide profiling is a subset
of that of system-wide profiling.

>
>> For guest-wide profiling, there are two possible places to save and
>> restore the related MSRs. One is where the CPU switches between guest
>> mode and host mode. We call this *CPU-switch*. Profiling with this
>> enabled reflects how the guest behaves on the physical CPU, plus other
>> virtualized, not emulated, devices. The other place is where the CPU
>> switches between the KVM context and others. Here KVM context means
>> the CPU is executing guest code or KVM code, in both kernel space and
>> user space. We call this *domain-switch*. Profiling with this enabled
>> discloses how the guest behaves on both the physical CPU and KVM.
>> (Some emulated operations are really expensive in a virtualized
>> environment.)
>>
>
> Which method do you use? Or do you support both?

I posted two patches in my previous email. One is for CPU-switch and the
other is for domain-switch.

>
> Note disclosing host pmu data to the guest is sometimes a security issue.
>

For instance?

> --
> Do not meddle in the internals of kernels, for they are subtle and quick to
> panic.
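To make the CPU-switch variant discussed above concrete: the save/restore has
to sit as close to VM entry and VM exit as possible. The following is only a
minimal sketch of where such hooks could go, assuming hypothetical
pmu_load_guest_msrs()/pmu_restore_host_msrs() helpers built from
rdmsrl()/wrmsrl() like the ones in the posted patch; it mirrors the
2.6.30-era vcpu_enter_guest() signature quoted elsewhere in these threads and
is not the actual patch.

static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
	/* ... existing setup: inject pending events, disable interrupts ... */

	pmu_load_guest_msrs(vcpu);	/* hypothetical: save host PMU MSRs,
					   load the guest's counter state */

	kvm_x86_ops->run(vcpu, kvm_run);	/* VM entry; returns after VM exit */

	pmu_restore_host_msrs(vcpu);	/* hypothetical: save guest counters,
					   restore the host PMU MSRs */

	/* ... existing completion: handle the exit, re-enable interrupts ... */
	return 0;
}

With hooks at this level, the counters only run while the CPU is actually in
guest mode, which is what distinguishes CPU-switch from domain-switch.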
Some Code for Performance Profiling
Hi,

We have some code for performance profiling in KVM. It is the output of a
school project. Previous discussions on the KVM, Perfmon2, and Xen mailing
lists helped us a lot. The code is NOT in good shape and is only meant to
demonstrate the feasibility of doing performance profiling in KVM. Feel free
to use it if you want.

We categorize performance profiling in a virtualized environment into two
types: *guest-wide profiling* and *system-wide profiling*. For guest-wide
profiling, only the guest is profiled. KVM virtualizes the PMU and the user
runs a profiler directly in the guest. It requires no modifications to the
guest OS or to the profiler running in the guest. For system-wide profiling,
both KVM and the guest OS are profiled. The results are similar to what
XenOprof outputs. In this case, one profiler runs in the host and one
profiler runs in the guest. It still requires no modifications to the guest
or to the profiler running in it.

For guest-wide profiling, there are two possible places to save and restore
the related MSRs. One is where the CPU switches between guest mode and host
mode. We call this *CPU-switch*. Profiling with this enabled reflects how
the guest behaves on the physical CPU, plus other virtualized, not emulated,
devices. The other place is where the CPU switches between the KVM context
and others. Here KVM context means the CPU is executing guest code or KVM
code, in both kernel space and user space. We call this *domain-switch*.
Profiling with this enabled discloses how the guest behaves on both the
physical CPU and KVM. (Some emulated operations are really expensive in a
virtualized environment.)

More details can be found at http://jiaqing.org/download/profiling_kvm.tgz

== Guest-wide profiling with domain-switch, for Linux 2.6.32 ==

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d27d0a2..b749b5d 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -96,6 +96,7 @@ struct thread_info {
 #define TIF_DS_AREA_MSR		26	/* uses thread_struct.ds_area_msr */
 #define TIF_LAZY_MMU_UPDATES	27	/* task is updating the mmu lazily */
 #define TIF_SYSCALL_TRACEPOINT	28	/* syscall tracepoint instrumentation */
+#define TIF_VPMU_CTXSW		29	/* KVM thread tag */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
@@ -119,6 +120,7 @@ struct thread_info {
 #define _TIF_DS_AREA_MSR	(1 << TIF_DS_AREA_MSR)
 #define _TIF_LAZY_MMU_UPDATES	(1 << TIF_LAZY_MMU_UPDATES)
 #define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
+#define _TIF_VPMU_CTXSW		(1 << TIF_VPMU_CTXSW)
 
 /* work to do in syscall_trace_enter() */
 #define _TIF_WORK_SYSCALL_ENTRY	\
@@ -146,8 +148,9 @@ struct thread_info {
 
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW \
-	(_TIF_IO_BITMAP|_TIF_DEBUGCTLMSR|_TIF_DS_AREA_MSR|_TIF_NOTSC)
-
+	(_TIF_IO_BITMAP|_TIF_DEBUGCTLMSR|_TIF_DS_AREA_MSR|_TIF_NOTSC| \
+	 _TIF_VPMU_CTXSW)
+
 #define _TIF_WORK_CTXSW_PREV _TIF_WORK_CTXSW
 #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW|_TIF_DEBUG)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 5284cd2..d5269d8 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -178,6 +178,53 @@ int set_tsc_mode(unsigned int val)
 	return 0;
 }
 
+static const u32 vmx_pmu_msr_index[] = {
+	MSR_P6_EVNTSEL0, MSR_P6_EVNTSEL1, MSR_P6_PERFCTR0, MSR_P6_PERFCTR1,
+};
+#define NR_VMX_PMU_MSR ARRAY_SIZE(vmx_pmu_msr_index)
+static u64 vpmu_msr_list[NR_VMX_PMU_MSR];
+
+static void vpmu_load_msrs(u64 *msr_list)
+{
+	u64 *p = msr_list;
+	int i;
+
+	for (i = 0; i < NR_VMX_PMU_MSR; ++i) {
+		wrmsrl(vmx_pmu_msr_index[i], *p);
+		p++;
+	}
+}
+
+static void vpmu_save_msrs(u64 *msr_list)
+{
+	u64 *p = msr_list;
+	int i;
+
+	for (i = 0; i < NR_VMX_PMU_MSR; ++i) {
+		rdmsrl(vmx_pmu_msr_index[i], *p);
+		p++;
+	}
+}
+
+#define P6_EVENTSEL0_ENABLE (1 << 22)
+static void enable_perf(void)
+{
+	u64 val;
+
+	rdmsrl(MSR_P6_EVNTSEL0, val);
+	val |= P6_EVENTSEL0_ENABLE;
+	wrmsrl(MSR_P6_EVNTSEL0, val);
+}
+
+static void disable_perf(void)
+{
+	u64 val;
+
+	rdmsrl(MSR_P6_EVNTSEL0, val);
+	val &= ~P6_EVENTSEL0_ENABLE;
+	wrmsrl(MSR_P6_EVNTSEL0, val);
+}
+
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss)
 {
@@ -186,6 +233,21 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 	prev = &prev_p->thread;
 	next = &next_p->thread;
 
+	if (test_tsk_thread_flag(prev_p, TIF_VPMU_CTXSW) &&
+	    test_tsk_thread_flag(next_p, TIF_VPMU_CTXSW)) {
+		/* do nothing, still in KVM context */
+	} else {
+		if
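The archived message is cut off in the middle of the __switch_to_xtra() hunk
above. Purely as a hypothetical reconstruction, not the author's actual code,
the remainder of that branch would plausibly use the helpers defined earlier
in the patch along these lines: stop the counters and save the virtual PMU
MSRs when switching away from a TIF_VPMU_CTXSW (KVM) thread, and reload and
re-enable them when switching back to one.

		if (test_tsk_thread_flag(prev_p, TIF_VPMU_CTXSW)) {
			/* leaving KVM context: stop counting, save guest PMU MSRs */
			disable_perf();
			vpmu_save_msrs(vpmu_msr_list);
		}
		if (test_tsk_thread_flag(next_p, TIF_VPMU_CTXSW)) {
			/* entering KVM context: reload guest PMU MSRs, resume counting */
			vpmu_load_msrs(vpmu_msr_list);
			enable_perf();
		}
	}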
Re: MSRs load/store
Hi Avi,

I did not get your point. But if we want to multiplex some of the MSRs
between the VMM and the guest(s), it would be handy if the hardware provided
this feature: save the host's version and load the guest's version. Of
course, we can do this manually. I'm just wondering why this feature is
missing.

Thanks,
Jiaqing

2009/12/7 Avi Kivity:
> On 12/07/2009 05:07 PM, Jiaqing Du wrote:
>>
>> Hi List,
>>
>> My question is about the VM-Exit & VM-Entry controls for MSRs on Intel's
>> processors.
>>
>> For VM-Exit, a VMM can specify lists of MSRs to be stored and loaded
>> on VM exits. But for VM-Entry, a VMM can only specify a list of MSRs
>> to be loaded on VM entries. Why doesn't the processor have a feature
>> that stores MSRs before loading new ones on VM entries?
>>
>
> Presumably the host knows what values are in those MSRs, so it doesn't need
> to store them.
>
> --
> error compiling committee.c: too many arguments to function
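The "do this manually" option mentioned above amounts to emulating the
missing save-and-load step in software just before VM entry. A minimal
sketch, with names invented for illustration only (this is not KVM code):

struct msr_switch {
	u32 msr;
	u64 host_val;
	u64 guest_val;
};

static void msr_switch_to_guest(struct msr_switch *s, int count)
{
	int i;

	for (i = 0; i < count; i++) {
		rdmsrl(s[i].msr, s[i].host_val);	/* the "store" step done by software */
		wrmsrl(s[i].msr, s[i].guest_val);	/* the "load" step */
	}
}

static void msr_switch_to_host(struct msr_switch *s, int count)
{
	int i;

	for (i = 0; i < count; i++) {
		rdmsrl(s[i].msr, s[i].guest_val);
		wrmsrl(s[i].msr, s[i].host_val);
	}
}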
MSRs load/store
Hi List,

My question is about the VM-Exit & VM-Entry controls for MSRs on Intel's
processors.

For VM-Exit, a VMM can specify lists of MSRs to be stored and loaded on VM
exits. But for VM-Entry, a VMM can only specify a list of MSRs to be loaded
on VM entries. Why doesn't the processor have a feature that stores MSRs
before loading new ones on VM entries?

Thanks,
Jiaqing
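For reference, the facilities that do exist are the MSR areas pointed to by
the VMCS. The sketch below is illustrative rather than a complete patch: the
entry layout (32-bit MSR index, 32 reserved bits, 64-bit value) follows the
Intel SDM, and the VMCS field names and vmcs_write helpers are the ones used
inside KVM's vmx.c.

struct msr_autoload_entry {
	u32 index;
	u32 reserved;
	u64 value;
};

#define NR_AUTOLOAD_MSRS 2
static struct msr_autoload_entry guest_area[NR_AUTOLOAD_MSRS];
static struct msr_autoload_entry host_area[NR_AUTOLOAD_MSRS];

static void setup_msr_autoload(void)
{
	/* Guest values: stored here on VM-exit, loaded from here on VM-entry. */
	vmcs_write32(VM_EXIT_MSR_STORE_COUNT, NR_AUTOLOAD_MSRS);
	vmcs_write64(VM_EXIT_MSR_STORE_ADDR, __pa(guest_area));
	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, NR_AUTOLOAD_MSRS);
	vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(guest_area));

	/* Host values: loaded from here on VM-exit.  There is no
	 * "store on VM-entry" counterpart (the point of the question
	 * above); the host has to save its own values with RDMSR. */
	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, NR_AUTOLOAD_MSRS);
	vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(host_area));
}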
Re: NMI Injection to Guest
Hi Gleb,

Another problem, on AMD processors. After each vm-exit, I need to check
whether the vm-exit is due to an NMI. For vmx.c, I added the check in
vmx_complete_interrupts(). The code snippet is:

3539         if ((exit_intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR &&
3540             (exit_intr_info & INTR_INFO_VALID_MASK)) {
3541
3542                 printk(KERN_INFO "kvm-oprofile: vm exit due to NMI.\n");
3543
3544                 /* indicate vm-exit due to counter overflow */
3545                 vcpu->vm_exit_on_cntr_overflow = 1;
3546         }

This works on Intel chips. I did a similar check in svm_complete_interrupts():

2501 static void svm_complete_interrupts(struct vcpu_svm *svm)
2502 {
2503         u8 vector;
2504         int type;
2505         u32 exitintinfo = svm->vmcb->control.exit_int_info;
2506         struct kvm_vcpu *vcpu = &svm->vcpu;
2507
2508         if (svm->vcpu.arch.hflags & HF_IRET_MASK)
2509                 svm->vcpu.arch.hflags &= ~(HF_NMI_MASK | HF_IRET_MASK);
2510
2511         svm->vcpu.arch.nmi_injected = false;
2512         kvm_clear_exception_queue(&svm->vcpu);
2513         kvm_clear_interrupt_queue(&svm->vcpu);
2514
2515         if (!(exitintinfo & SVM_EXITINTINFO_VALID))
2516                 return;
2517
2518         vector = exitintinfo & SVM_EXITINTINFO_VEC_MASK;
2519         type = exitintinfo & SVM_EXITINTINFO_TYPE_MASK;
2520
2521         /* kvm-oprofile */
2522         if (type == SVM_EXITINTINFO_TYPE_NMI) {
2523
2524                 printk(KERN_INFO "kvm-oprofile: counter_overflowed & vm exit.\n");
2525                 vcpu->vm_exit_on_cntr_overflow = 1;
2526         }

However, this part (2522 to 2526) never gets executed. Using the qemu
monitor, I managed to inject NMIs into the guests, but this check, after a
vm-exit due to an NMI, never succeeds.

Thanks,
Jiaqing

2009/7/30 Jiaqing Du:
> Hi Gleb,
>
> My code works by setting "vcpu->arch.nmi_pending = 1;" inside
> vcpu_enter_guest().
>
> Thanks,
> Jiaqing
>
> 2009/7/27 Gleb Natapov:
>> On Sun, Jul 26, 2009 at 09:25:34PM +0200, Jiaqing Du wrote:
>>> Hi Gleb,
>>>
>>> Thanks for your reply.
>>>
>>> 2009/7/26 Gleb Natapov:
>>> > On Sat, Jul 25, 2009 at 10:46:39PM +0200, Jiaqing Du wrote:
>>> >> Hi list,
>>> >>
>>> >> I'm trying to extend OProfile to support guest profiling. One step of
>>> >> my work is to push an NMI to the guest(s) when a performance counter
>>> >> overflows. Please correct me if the following is not correct:
>>> >>
>>> >> counter overflow --> NMI to host --> VM exit --> "int $2" to handle
>>> >> NMI on host --> ... --> VM entry --> NMI to guest
>>> >>
>>> > Correct except the last step (--> NMI to guest). Host nmi is not
>>> > propagated to guests.
>>>
>>> Yes. I need to add some code to propagate host NMIs to guests.
>>> >
>>> >> On the path between VM-exit and VM-entry, I want to push an NMI to the
>>> >> guest. I tried to put the following code on the path, but never
>>> >> succeeded. Various weird things happened, such as KVM hangs, guest
>>> >> kernel oopses, and host hangs. I tried both the code in Linux 2.6.30
>>> >> and kvm-88.
>>> >>
>>> >> if (vmx_nmi_allowed()) { vmx_inject_nmi(); }
>>> >>
>>> >> Any suggestions? Where is the right place to push an NMI and what are
>>> >> the necessary checks?
>>> > Call kvm_inject_nmi(vcpu). And don't forget to vcpu_load(vcpu) before
>>> > doing it. See kvm_vcpu_ioctl_nmi().
>>>
>>> Based on the code in Linux 2.6.30, what kvm_inject_nmi(vcpu) does is
>>> just set vcpu->arch.nmi_pending to 1. kvm_vcpu_ioctl_nmi() puts
>>> vcpu_load() before the setting and vcpu_put() after it.
>>>
>>> I need to push a host NMI to the guest between a VM-exit and the VM-entry
>>> after it. The VM-exit is due to an NMI caused by performance counter
>>> overflow. The following code in vcpu_enter_guest(), which is
>>> surrounded by a vcpu_load() and vcpu_put(), checks this
>>> vcpu->arch.nmi_pending and other related flags to decide whether an
>>> NMI should be pushed to the guest.
>>>
>>> if (vcpu->arch.exception.pending)
>>>         __queue_exception(vcpu);
>>> else if (irqchip_in_kernel(vcpu->kvm))
>>>         kvm_x86_ops->inject_pending_irq(vcpu);
>>> else
>>>         kvm_x86_ops->inject_pending_vectors(vcpu, kvm_run);
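One possible explanation worth checking (a suggestion, not something
established in the thread): on SVM, exit_int_info only records an event that
was being delivered to the guest when the #VMEXIT occurred, so a host NMI
that interrupts guest execution is normally reported through the exit code
rather than through exit_int_info. A sketch of checking the exit code
instead, using the vm_exit_on_cntr_overflow field added by the posted
patches:

static void check_nmi_exit(struct vcpu_svm *svm)
{
	struct kvm_vcpu *vcpu = &svm->vcpu;

	/* A physical NMI intercepted while the guest runs shows up as
	 * the exit code itself, not in exit_int_info. */
	if (svm->vmcb->control.exit_code == SVM_EXIT_NMI) {
		printk(KERN_INFO "kvm-oprofile: vm exit due to NMI.\n");
		vcpu->vm_exit_on_cntr_overflow = 1;
	}
}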
Re: NMI Injection to Guest
Hi Gleb,

My code works by setting "vcpu->arch.nmi_pending = 1;" inside
vcpu_enter_guest().

Thanks,
Jiaqing

2009/7/27 Gleb Natapov:
> On Sun, Jul 26, 2009 at 09:25:34PM +0200, Jiaqing Du wrote:
>> Hi Gleb,
>>
>> Thanks for your reply.
>>
>> 2009/7/26 Gleb Natapov:
>> > On Sat, Jul 25, 2009 at 10:46:39PM +0200, Jiaqing Du wrote:
>> >> Hi list,
>> >>
>> >> I'm trying to extend OProfile to support guest profiling. One step of
>> >> my work is to push an NMI to the guest(s) when a performance counter
>> >> overflows. Please correct me if the following is not correct:
>> >>
>> >> counter overflow --> NMI to host --> VM exit --> "int $2" to handle
>> >> NMI on host --> ... --> VM entry --> NMI to guest
>> >>
>> > Correct except the last step (--> NMI to guest). Host nmi is not
>> > propagated to guests.
>>
>> Yes. I need to add some code to propagate host NMIs to guests.
>> >
>> >> On the path between VM-exit and VM-entry, I want to push an NMI to the
>> >> guest. I tried to put the following code on the path, but never
>> >> succeeded. Various weird things happened, such as KVM hangs, guest
>> >> kernel oopses, and host hangs. I tried both the code in Linux 2.6.30
>> >> and kvm-88.
>> >>
>> >> if (vmx_nmi_allowed()) { vmx_inject_nmi(); }
>> >>
>> >> Any suggestions? Where is the right place to push an NMI and what are
>> >> the necessary checks?
>> > Call kvm_inject_nmi(vcpu). And don't forget to vcpu_load(vcpu) before
>> > doing it. See kvm_vcpu_ioctl_nmi().
>>
>> Based on the code in Linux 2.6.30, what kvm_inject_nmi(vcpu) does is
>> just set vcpu->arch.nmi_pending to 1. kvm_vcpu_ioctl_nmi() puts
>> vcpu_load() before the setting and vcpu_put() after it.
>>
>> I need to push a host NMI to the guest between a VM-exit and the VM-entry
>> after it. The VM-exit is due to an NMI caused by performance counter
>> overflow. The following code in vcpu_enter_guest(), which is
>> surrounded by a vcpu_load() and vcpu_put(), checks this
>> vcpu->arch.nmi_pending and other related flags to decide whether an
>> NMI should be pushed to the guest.
>>
>> if (vcpu->arch.exception.pending)
>>         __queue_exception(vcpu);
>> else if (irqchip_in_kernel(vcpu->kvm))
>>         kvm_x86_ops->inject_pending_irq(vcpu);
>> else
>>         kvm_x86_ops->inject_pending_vectors(vcpu, kvm_run);
>>
>> What I did is given below:
>>
>> 3097 static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>> 3098 {
>>      ... ...
>> 3156         if (kvm_vm_exit_on_cnt_overflow) {
>> 3157                 vcpu->arch.nmi_pending = 1;
>> 3158         }
>> 3159
>> 3160         if (vcpu->arch.exception.pending)
>> 3161                 __queue_exception(vcpu);
>> 3162         else if (irqchip_in_kernel(vcpu->kvm))
>> 3163                 kvm_x86_ops->inject_pending_irq(vcpu);
>> 3164         else
>> 3165                 kvm_x86_ops->inject_pending_vectors(vcpu, kvm_run);
>>      ...
>> 3236 }
>>
>> In vcpu_enter_guest(), before this part of the code is reached,
>> vcpu->arch.nmi_pending is set to 1 if the VM-exit is due to a
>> performance counter overflow. Still, no NMIs are seen by the guests. I
>> also tried to put this "vcpu->arch.nmi_pending = 1;" somewhere else on
>> the path between a VM-exit and VM-entry, but it does not seem to work
>> either. Only vmx_inject_nmi() manages to push NMIs to guests, but
>> without the right sanity checks it causes various weird host and guest
>> behaviors.
>>
>> To inject NMIs on the path between a VM-exit and a VM-entry, what should
>> I try next?
>>
> If you set vcpu->arch.nmi_pending here, then vmx_inject_nmi() will be
> called inside kvm_x86_ops->inject_pending_irq(vcpu) (if there are no
> pending exceptions or interrupts at that moment), so if the NMI is not
> injected, either you have a bug somewhere (why is kvm_vm_exit_on_cnt_overflow
> global?) or your guest ignores NMIs. Does your guest react to an NMI if
> you send it via the qemu monitor (type "nmi 0" in the qemu monitor)?
>
> Post your code here, maybe I'll see something.
>
> --
> Gleb.
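Gleb's aside about kvm_vm_exit_on_cnt_overflow being global points at the
per-vcpu alternative. Elsewhere in this thread the code does add a per-vcpu
field, vcpu->vm_exit_on_cntr_overflow. A sketch of recording the overflow on
the vcpu that took the NMI exit and consuming it before injection (shown
against the 2.6.30-era vcpu_enter_guest(); the surrounding code is elided):

static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
	/* ... */
	if (vcpu->vm_exit_on_cntr_overflow) {
		vcpu->vm_exit_on_cntr_overflow = 0;	/* consume the flag */
		vcpu->arch.nmi_pending = 1;		/* queue an NMI for this guest */
	}

	if (vcpu->arch.exception.pending)
		__queue_exception(vcpu);
	else if (irqchip_in_kernel(vcpu->kvm))
		kvm_x86_ops->inject_pending_irq(vcpu);
	else
		kvm_x86_ops->inject_pending_vectors(vcpu, kvm_run);
	/* ... */
	return 0;
}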
Re: NMI Injection to Guest
Hi Gleb,

Thanks for your reply.

2009/7/26 Gleb Natapov:
> On Sat, Jul 25, 2009 at 10:46:39PM +0200, Jiaqing Du wrote:
>> Hi list,
>>
>> I'm trying to extend OProfile to support guest profiling. One step of
>> my work is to push an NMI to the guest(s) when a performance counter
>> overflows. Please correct me if the following is not correct:
>>
>> counter overflow --> NMI to host --> VM exit --> "int $2" to handle
>> NMI on host --> ... --> VM entry --> NMI to guest
>>
> Correct except the last step (--> NMI to guest). Host nmi is not
> propagated to guests.

Yes. I need to add some code to propagate host NMIs to guests.

>
>> On the path between VM-exit and VM-entry, I want to push an NMI to the
>> guest. I tried to put the following code on the path, but never
>> succeeded. Various weird things happened, such as KVM hangs, guest
>> kernel oopses, and host hangs. I tried both the code in Linux 2.6.30
>> and kvm-88.
>>
>> if (vmx_nmi_allowed()) { vmx_inject_nmi(); }
>>
>> Any suggestions? Where is the right place to push an NMI and what are
>> the necessary checks?
> Call kvm_inject_nmi(vcpu). And don't forget to vcpu_load(vcpu) before
> doing it. See kvm_vcpu_ioctl_nmi().

Based on the code in Linux 2.6.30, what kvm_inject_nmi(vcpu) does is just
set vcpu->arch.nmi_pending to 1. kvm_vcpu_ioctl_nmi() puts vcpu_load()
before the setting and vcpu_put() after it.

I need to push a host NMI to the guest between a VM-exit and the VM-entry
after it. The VM-exit is due to an NMI caused by performance counter
overflow. The following code in vcpu_enter_guest(), which is surrounded by
a vcpu_load() and vcpu_put(), checks this vcpu->arch.nmi_pending and other
related flags to decide whether an NMI should be pushed to the guest.

        if (vcpu->arch.exception.pending)
                __queue_exception(vcpu);
        else if (irqchip_in_kernel(vcpu->kvm))
                kvm_x86_ops->inject_pending_irq(vcpu);
        else
                kvm_x86_ops->inject_pending_vectors(vcpu, kvm_run);

What I did is given below:

3097 static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
3098 {
     ... ...
3156         if (kvm_vm_exit_on_cnt_overflow) {
3157                 vcpu->arch.nmi_pending = 1;
3158         }
3159
3160         if (vcpu->arch.exception.pending)
3161                 __queue_exception(vcpu);
3162         else if (irqchip_in_kernel(vcpu->kvm))
3163                 kvm_x86_ops->inject_pending_irq(vcpu);
3164         else
3165                 kvm_x86_ops->inject_pending_vectors(vcpu, kvm_run);
     ...
3236 }

In vcpu_enter_guest(), before this part of the code is reached,
vcpu->arch.nmi_pending is set to 1 if the VM-exit is due to a performance
counter overflow. Still, no NMIs are seen by the guests. I also tried to put
this "vcpu->arch.nmi_pending = 1;" somewhere else on the path between a
VM-exit and VM-entry, but it does not seem to work either. Only
vmx_inject_nmi() manages to push NMIs to guests, but without the right
sanity checks it causes various weird host and guest behaviors.

To inject NMIs on the path between a VM-exit and a VM-entry, what should I
try next?

>
> --
> Gleb.
>

Thanks,
Jiaqing
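For reference, the pattern Gleb points to, and which the text above
describes (kvm_vcpu_ioctl_nmi() wrapping kvm_inject_nmi() with
vcpu_load()/vcpu_put()), boils down to the following. This is a simplified
sketch with an invented function name, not the exact kernel source:

static int inject_guest_nmi(struct kvm_vcpu *vcpu)
{
	vcpu_load(vcpu);	/* take the vcpu so its state can be touched */
	kvm_inject_nmi(vcpu);	/* sets vcpu->arch.nmi_pending = 1 */
	vcpu_put(vcpu);

	return 0;
}

The actual delivery then happens on the next pass through
vcpu_enter_guest(), where the pending-NMI flag is checked before VM entry.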
NMI Injection to Guest
Hi list,

I'm trying to extend OProfile to support guest profiling. One step of my
work is to push an NMI to the guest(s) when a performance counter overflows.
Please correct me if the following is not correct:

counter overflow --> NMI to host --> VM exit --> "int $2" to handle
NMI on host --> ... --> VM entry --> NMI to guest

On the path between VM-exit and VM-entry, I want to push an NMI to the
guest. I tried to put the following code on the path, but never succeeded.
Various weird things happened, such as KVM hangs, guest kernel oopses, and
host hangs. I tried both the code in Linux 2.6.30 and kvm-88.

if (vmx_nmi_allowed()) { vmx_inject_nmi(); }

Any suggestions? Where is the right place to push an NMI and what are the
necessary checks?

Thanks,
Jiaqing
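For context on the "int $2" step in the flow above: on VMX, when a host NMI
arrives while the guest is running, the CPU exits with an NMI exit reason
and KVM re-raises the NMI for the host with an int $2 before interrupts are
re-enabled. The sketch below is modeled on the NMI check in that era's
vmx_complete_interrupts(); the forwarding of the NMI to the guest is the
hypothetical part, and a real implementation would first confirm that the
overflow belongs to the guest's profiling session.

static void handle_host_nmi_exit(struct kvm_vcpu *vcpu, u32 exit_intr_info)
{
	if ((exit_intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR &&
	    (exit_intr_info & INTR_INFO_VALID_MASK)) {
		asm("int $2");			/* run the host NMI handler now */
		vcpu->arch.nmi_pending = 1;	/* then queue an NMI for the guest */
	}
}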