Re: Some Code for Performance Profiling
On 04/07/2010 10:23 PM, Jiaqing Du wrote:
>> Can your implementation support both simultaneously?
>
> What do you mean "simultaneously"? With my implementation, you either
> do guest-wide profiling or system-wide profiling. They are achieved
> through different patches. Actually, the result of guest-wide
> profiling is a subset of system-wide profiling.

A guest admin monitors the performance of their guest via a vpmu.
Meanwhile the host admin monitors the performance of the host
(including all guests) using the host pmu. Given that the host pmu and
the vpmu may select different counters, it is difficult to support both
simultaneously.

>>> For guest-wide profiling, there are two possible places to save and
>>> restore the related MSRs. One is where the CPU switches between
>>> guest mode and host mode. We call this *CPU-switch*. Profiling with
>>> this enabled reflects how the guest behaves on the physical CPU,
>>> plus other virtualized, not emulated, devices. The other place is
>>> where the CPU switches between the KVM context and others. Here KVM
>>> context means the CPU is executing guest code or KVM code, both
>>> kernel space and user space. We call this *domain-switch*. Profiling
>>> with this enabled discloses how the guest behaves on both the
>>> physical CPU and KVM. (Some emulated operations are really expensive
>>> in a virtualized environment.)
>>
>> Which method do you use? Or do you support both?
>
> I posted two patches in my previous email. One is for CPU-switch, and
> the other is for domain-switch.

I see. I'm not sure which one is better!

>> Note disclosing host pmu data to the guest is sometimes a security
>> issue.
>
> For instance?

The standard example is hyperthreading, where the memory bus unit is
shared between two logical processors. A guest sampling a vcpu on one
thread can gain information about what is happening on the other: the
number of bus transactions the other thread has issued.
This can be used to establish a communication channel between two
guests that shouldn't be communicating, or to eavesdrop on another
guest. A similar problem happens with multicores.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Some Code for Performance Profiling
2010/4/5 Avi Kivity:
> On 03/31/2010 07:53 PM, Jiaqing Du wrote:
>>
>> Hi,
>>
>> We have some code for performance profiling in KVM. It is the output
>> of a school project. Previous discussions on the KVM, Perfmon2, and
>> Xen mailing lists helped us a lot. The code is NOT in good shape and
>> is only used to demonstrate the feasibility of doing performance
>> profiling in KVM. Feel free to use it if you want.
>
> Performance monitoring is an important feature for kvm. Is there any
> chance you can work at getting it into good shape?

I have been following the discussions about PMU virtualization on the
list for a while. Exporting a proper interface, i.e., guest-visible
MSRs and supported events, to the guest across a large number of
physical CPUs from different vendors, families, and models is the major
problem. KVM also currently supports almost a dozen different types of
virtual CPUs. I will think about it and try to come up with something
more general.

>> We categorize performance profiling in a virtualized environment into
>> two types: *guest-wide profiling* and *system-wide profiling*. For
>> guest-wide profiling, only the guest is profiled. KVM virtualizes the
>> PMU and the user runs a profiler directly in the guest. It requires
>> no modifications to the guest OS or the profiler running in the
>> guest. For system-wide profiling, both KVM and the guest OS are
>> profiled. The results are similar to what XenOprof outputs. In this
>> case, one profiler runs in the host and one runs in the guest. Still,
>> it requires no modifications to the guest or the profiler running in
>> it.
>
> Can your implementation support both simultaneously?

What do you mean "simultaneously"? With my implementation, you either
do guest-wide profiling or system-wide profiling. They are achieved
through different patches. Actually, the result of guest-wide profiling
is a subset of system-wide profiling.
>> For guest-wide profiling, there are two possible places to save and
>> restore the related MSRs. One is where the CPU switches between guest
>> mode and host mode. We call this *CPU-switch*. Profiling with this
>> enabled reflects how the guest behaves on the physical CPU, plus
>> other virtualized, not emulated, devices. The other place is where
>> the CPU switches between the KVM context and others. Here KVM context
>> means the CPU is executing guest code or KVM code, both kernel space
>> and user space. We call this *domain-switch*. Profiling with this
>> enabled discloses how the guest behaves on both the physical CPU and
>> KVM. (Some emulated operations are really expensive in a virtualized
>> environment.)
>
> Which method do you use? Or do you support both?

I posted two patches in my previous email. One is for CPU-switch, and
the other is for domain-switch.

> Note disclosing host pmu data to the guest is sometimes a security
> issue.

For instance?
Re: Some Code for Performance Profiling
On 03/31/2010 07:53 PM, Jiaqing Du wrote:
> Hi,
>
> We have some code for performance profiling in KVM. It is the output
> of a school project. Previous discussions on the KVM, Perfmon2, and
> Xen mailing lists helped us a lot. The code is NOT in good shape and
> is only used to demonstrate the feasibility of doing performance
> profiling in KVM. Feel free to use it if you want.

Performance monitoring is an important feature for kvm. Is there any
chance you can work at getting it into good shape?

> We categorize performance profiling in a virtualized environment into
> two types: *guest-wide profiling* and *system-wide profiling*. For
> guest-wide profiling, only the guest is profiled. KVM virtualizes the
> PMU and the user runs a profiler directly in the guest. It requires no
> modifications to the guest OS or the profiler running in the guest.
> For system-wide profiling, both KVM and the guest OS are profiled. The
> results are similar to what XenOprof outputs. In this case, one
> profiler runs in the host and one runs in the guest. Still, it
> requires no modifications to the guest or the profiler running in it.

Can your implementation support both simultaneously?

> For guest-wide profiling, there are two possible places to save and
> restore the related MSRs. One is where the CPU switches between guest
> mode and host mode. We call this *CPU-switch*. Profiling with this
> enabled reflects how the guest behaves on the physical CPU, plus other
> virtualized, not emulated, devices. The other place is where the CPU
> switches between the KVM context and others. Here KVM context means
> the CPU is executing guest code or KVM code, both kernel space and
> user space. We call this *domain-switch*. Profiling with this enabled
> discloses how the guest behaves on both the physical CPU and KVM.
> (Some emulated operations are really expensive in a virtualized
> environment.)

Which method do you use? Or do you support both?

Note disclosing host pmu data to the guest is sometimes a security
issue.
--
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.
Some Code for Performance Profiling
Hi,

We have some code for performance profiling in KVM. It is the output of
a school project. Previous discussions on the KVM, Perfmon2, and Xen
mailing lists helped us a lot. The code is NOT in good shape and is
only used to demonstrate the feasibility of doing performance profiling
in KVM. Feel free to use it if you want.

We categorize performance profiling in a virtualized environment into
two types: *guest-wide profiling* and *system-wide profiling*. For
guest-wide profiling, only the guest is profiled. KVM virtualizes the
PMU and the user runs a profiler directly in the guest. It requires no
modifications to the guest OS or the profiler running in the guest. For
system-wide profiling, both KVM and the guest OS are profiled. The
results are similar to what XenOprof outputs. In this case, one
profiler runs in the host and one runs in the guest. Still, it requires
no modifications to the guest or the profiler running in it.

For guest-wide profiling, there are two possible places to save and
restore the related MSRs. One is where the CPU switches between guest
mode and host mode. We call this *CPU-switch*. Profiling with this
enabled reflects how the guest behaves on the physical CPU, plus other
virtualized, not emulated, devices. The other place is where the CPU
switches between the KVM context and others. Here KVM context means the
CPU is executing guest code or KVM code, both kernel space and user
space. We call this *domain-switch*. Profiling with this enabled
discloses how the guest behaves on both the physical CPU and KVM. (Some
emulated operations are really expensive in a virtualized environment.)
More details can be found at http://jiaqing.org/download/profiling_kvm.tgz

==Guest-wide profiling with domain-switch, for Linux-2.6.32==

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d27d0a2..b749b5d 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -96,6 +96,7 @@ struct thread_info {
 #define TIF_DS_AREA_MSR		26	/* uses thread_struct.ds_area_msr */
 #define TIF_LAZY_MMU_UPDATES	27	/* task is updating the mmu lazily */
 #define TIF_SYSCALL_TRACEPOINT	28	/* syscall tracepoint instrumentation */
+#define TIF_VPMU_CTXSW		29	/* KVM thread tag */

 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
@@ -119,6 +120,7 @@ struct thread_info {
 #define _TIF_DS_AREA_MSR	(1 << TIF_DS_AREA_MSR)
 #define _TIF_LAZY_MMU_UPDATES	(1 << TIF_LAZY_MMU_UPDATES)
 #define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
+#define _TIF_VPMU_CTXSW		(1 << TIF_VPMU_CTXSW)

 /* work to do in syscall_trace_enter() */
 #define _TIF_WORK_SYSCALL_ENTRY	\
@@ -146,8 +148,9 @@ struct thread_info {
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW \
-	(_TIF_IO_BITMAP|_TIF_DEBUGCTLMSR|_TIF_DS_AREA_MSR|_TIF_NOTSC)
-
+	(_TIF_IO_BITMAP|_TIF_DEBUGCTLMSR|_TIF_DS_AREA_MSR|_TIF_NOTSC| \
+	 _TIF_VPMU_CTXSW)
+
 #define _TIF_WORK_CTXSW_PREV _TIF_WORK_CTXSW
 #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW|_TIF_DEBUG)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 5284cd2..d5269d8 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -178,6 +178,53 @@ int set_tsc_mode(unsigned int val)
 	return 0;
 }

+static const u32 vmx_pmu_msr_index[] = {
+	MSR_P6_EVNTSEL0, MSR_P6_EVNTSEL1, MSR_P6_PERFCTR0, MSR_P6_PERFCTR1,
+};
+#define NR_VMX_PMU_MSR ARRAY_SIZE(vmx_pmu_msr_index)
+static u64 vpmu_msr_list[NR_VMX_PMU_MSR];
+
+static void vpmu_load_msrs(u64 *msr_list)
+{
+	u64 *p = msr_list;
+	int i;
+
+	for (i = 0; i < NR_VMX_PMU_MSR; ++i) {
+		wrmsrl(vmx_pmu_msr_index[i], *p);
+		p++;
+	}
+}
+
+static void vpmu_save_msrs(u64 *msr_list)
+{
+	u64 *p = msr_list;
+	int i;
+
+	for (i = 0; i < NR_VMX_PMU_MSR; ++i) {
+		rdmsrl(vmx_pmu_msr_index[i], *p);
+		p++;
+	}
+}
+
+#define P6_EVENTSEL0_ENABLE	(1 << 22)
+static void enable_perf(void)
+{
+	u64 val;
+
+	rdmsrl(MSR_P6_EVNTSEL0, val);
+	val |= P6_EVENTSEL0_ENABLE;
+	wrmsrl(MSR_P6_EVNTSEL0, val);
+}
+
+static void disable_perf(void)
+{
+	u64 val;
+
+	rdmsrl(MSR_P6_EVNTSEL0, val);
+	val &= ~P6_EVENTSEL0_ENABLE;
+	wrmsrl(MSR_P6_EVNTSEL0, val);
+}
+
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss)
 {
@@ -186,6 +233,21 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 	prev = &prev_p->thread;
 	next = &next_p->thread;

+	if (test_tsk_thread_flag(prev_p, TIF_VPMU_CTXSW) &&
+	    test_tsk_thread_flag(next_p, TIF_VPMU_CTXSW)) {
+		/* do nothing, still in KVM context */
+	} else {
+		if