Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If a vmexit (L2->L0) occurs for reasons not emulated by L1, the preemption timer value should be saved on such exits. 2. Add support for the "Save VMX-preemption timer value" VM-Exit control to nVMX. With this patch, the nested VMX preemption timer feature is fully supported.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 arch/x86/kvm/vmx.c | 30 +-
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 57b4e12..9579409 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
 	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 				      VM_EXIT_LOAD_IA32_EFER);

In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its behavior properly in that case.

@@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		     (vmcs_config.pin_based_exec_ctrl |
 		      vmcs12->pin_based_vm_exec_control));
-	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
-		vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
-			     vmcs12->vmx_preemption_timer_value);
+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) {
+		if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
+			vmcs12->vmx_preemption_timer_value =
+				vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+		else
+			vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
+				     vmcs12->vmx_preemption_timer_value);
+	}

This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far.

 	/*
 	 * Whether page-faults are trapped is determined by a combination of
@@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 	 * bits are further modified by vmx_set_efer() below.
 	 */
-	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
+		vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl |
+			     VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
+	else
+		vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);

Let's prepare the value for VM_EXIT_CONTROLS in a local variable first, then write it to the vmcs.

 	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
 	 * emulated by vmx_set_efer(), below.
@@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	}

 	/*
+	 * If L2 supports PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support
+	 * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER.
+	 */
+	if ((vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) &&
+	    !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
+		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+		return 1;
+	}

Nope, the guest is free to run the preemption timer without saving on exits. It may have a valid use case for this, e.g. that it will always reprogram it on entry.

+
+	/*
 	 * We're finally done with prerequisite checking, and can start with
 	 * the nested entry.
 	 */

Jan
Re: [PATCH v3 0/6] KVM: nVMX: Enable unrestricted guest mode and fix some nEPT issues
On 2013-08-08 16:26, Jan Kiszka wrote: These patches apply on top of kvm.git queue.

Changes in v3:
- rebased over queue
- added "Do not set identity page map for L2"
- dropped "Fix guest CR3 read-back on VM-exit"

Jan Kiszka (6):
  KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0 in load_vmcs12_host_state
  KVM: nVMX: Do not set identity page map for L2
  KVM: nVMX: Load nEPT state after EFER
  KVM: nVMX: Implement support for EFER saving on VM-exit
  KVM: nVMX: Update mmu.base_role.nxe after EFER loading on VM-entry/exit
  KVM: nVMX: Enable unrestricted guest mode support

 arch/x86/kvm/vmx.c | 44 +++-
 1 files changed, 31 insertions(+), 13 deletions(-)

Ping for this series. It still applies, now to next, without conflicts.

Jan
Re: [PATCH 1/2] kvm: use anon_inode_getfd() with O_CLOEXEC flag
On 24/08/2013 22:14, Yann Droneaud wrote: KVM uses anon_inode_getfd() to allocate file descriptors as part of some of its ioctls. But those ioctls are lacking a flag argument allowing userspace to choose options for the newly opened file descriptor. In such case it's advised to use O_CLOEXEC by default so that userspace is allowed to choose, without race, if the file descriptor is going to be inherited across exec(). This patch sets the O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() to not leak file descriptors across exec().

Signed-off-by: Yann Droneaud ydrone...@opteya.com
Link: http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

---
 virt/kvm/kvm_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 89f74d1..d65cc0c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1896,7 +1896,7 @@ static struct file_operations kvm_vcpu_fops = {
  */
 static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 {
-	return anon_inode_getfd("kvm-vcpu", &kvm_vcpu_fops, vcpu, O_RDWR);
+	return anon_inode_getfd("kvm-vcpu", &kvm_vcpu_fops, vcpu, O_RDWR | O_CLOEXEC);
 }

 /*
@@ -2306,7 +2306,7 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
 		return ret;
 	}

-	ret = anon_inode_getfd(ops->name, &kvm_device_fops, dev, O_RDWR);
+	ret = anon_inode_getfd(ops->name, &kvm_device_fops, dev, O_RDWR | O_CLOEXEC);
 	if (ret < 0) {
 		ops->destroy(dev);
 		return ret;
 	}
@@ -2590,7 +2590,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 		return r;
 	}
 #endif
-	r = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
+	r = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC);
 	if (r < 0)
 		kvm_put_kvm(kvm);

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..9579409 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its behavior properly in that case. @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. 
The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. /* * Whether page-faults are trapped is determined by a combination of @@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER * bits are further modified by vmx_set_efer() below. */ - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + else + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); Let's prepare the value for VM_EXIT_CONTROLS in a local variable first, then write it to the vmcs. /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are * emulated by vmx_set_efer(), below. @@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) } /* + * If L2 support PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support + * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER. + */ + if ((vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + !(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { + nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD); + return 1; + } Nope, the guest is free to run the preemption timer without saving on exits. It may have a valid use case for this, e.g. that it will always reprogram it on entry. Here !(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) is used to detect if hardware support save preemption timer feature, which means if L2 supports pinbased vmx preemption timer, host must support save preemption timer feature. Though nested_vmx_exit_ctls_* is used for nested env, but it can also used to reflect the host's feature. 
Here is what I discussed with you yesterday; we could also read the feature via rdmsr here to avoid the confusion.

Arthur

+ + /* * We're finally done with prerequisite checking, and can start with * the nested entry. */

Jan
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-08-25 09:24, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..9579409 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its behavior properly in that case. @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. 
We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. /* * Whether page-faults are trapped is determined by a combination of @@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER * bits are further modified by vmx_set_efer() below. */ - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + else + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); Let's prepare the value for VM_EXIT_CONTROLS in a local variable first, then write it to the vmcs. /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are * emulated by vmx_set_efer(), below. @@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) } /* + * If L2 support PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support + * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER. + */ + if ((vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + !(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { + nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD); + return 1; + } Nope, the guest is free to run the preemption timer without saving on exits. It may have a valid use case for this, e.g. that it will always reprogram it on entry. Here !(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) is used to detect if hardware support save preemption timer feature, which means if L2 supports pinbased vmx preemption timer, host must support save preemption timer feature. Sorry, parsed the code incorrectly. Though nested_vmx_exit_ctls_* is used for nested env, but it can also used to reflect the host's feature. 
Here is what I discussed with you yesterday; we could also read the feature via rdmsr here to avoid the confusion. Yes. The point is that we will not even expose PIN_BASED_VMX_PREEMPTION_TIMER if VM_EXIT_SAVE_VMX_PREEMPTION_TIMER is missing. If the guest then requests the former, it simply sets an invalid pin-based control value which we already catch and report. So this hunk becomes redundant.
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
From: Jan Kiszka jan.kis...@web.de To: 李春奇 Arthur Chunqi Li yzt...@gmail.com, Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com Date: 25/08/2013 09:44 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. I think something else maybe be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or ept violations), then, we may spend some cycles in L0 handling these exits. Note L1 is not aware of these exits and from L1 perspective L2 was running on the CPU. 
That means that we may need to reduce these cycles spent at L0 from the preemption timer or emulate a preemption timer exit to force a transition to L1 instead of resuming L2.
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 3:28 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 09:24, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..9579409 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its behavior properly in that case. Besides, we need to test that in the absence of PIN_BASED_VMX_PREEMPTION_TIMER, we need to hide VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though this should not happen according to Intel SDM. 
@@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. /* * Whether page-faults are trapped is determined by a combination of @@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER * bits are further modified by vmx_set_efer() below. */ - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + else + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); Let's prepare the value for VM_EXIT_CONTROLS in a local variable first, then write it to the vmcs. /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are * emulated by vmx_set_efer(), below. @@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) } /* + * If L2 support PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support + * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER. 
+ */ + if ((vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) + !(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { + nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD); + return 1; + } Nope, the guest is free to run the preemption timer without saving on exits. It may have a valid use case for this, e.g. that it will always reprogram it on entry. Here !(nested_vmx_exit_ctls_high VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) is used to detect if hardware support save preemption timer feature, which means if L2 supports pinbased vmx preemption timer, host must support save preemption timer feature. Sorry, parsed the code incorrectly. Though nested_vmx_exit_ctls_* is used for nested env, but it can also used to reflect the host's feature. Here is what I discuss with you yesterday, and we can also get the feature via rdmsr here to avoid the confusion. Yes. The point is that we
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-08-25 09:37, Abel Gordon wrote: From: Jan Kiszka jan.kis...@web.de To: 李春奇 Arthur Chunqi Li yzt...@gmail.com, Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com Date: 25/08/2013 09:44 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. I think something else maybe be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or ept violations), then, we may spend some cycles in L0 handling these exits. Note L1 is not aware of these exits and from L1 perspective L2 was running on the CPU. 
That means that we may need to reduce these cycles spent at L0 from the preemption timer or emulate a preemption timer exit to force a transition to L1 instead of resuming L2. That's precisely what the logic I described should achieve: reload the value we saved on L2 exit on reentry.

Jan
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 3:37 PM, Abel Gordon ab...@il.ibm.com wrote: From: Jan Kiszka jan.kis...@web.de To: 李春奇 Arthur Chunqi Li yzt...@gmail.com, Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com Date: 25/08/2013 09:44 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12-pin_based_vm_exec_control)); - if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12-vmx_preemption_timer_value); + if (vmcs12-pin_based_vm_exec_control PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12-vm_exit_controls VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12-vmx_preemption_timer_value = +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12-vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. I think something else maybe be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or ept violations), then, we may spend some cycles in L0 handling these exits. Note L1 is not aware of these exits and from L1 perspective L2 was running on the CPU. 
That means that we may need to reduce these cycles spent at L0 from the preemption timer or emulate a preemption timer exit to force a transition to L1 instead of resuming L2. My solution is to enable the save-preemption-timer-value feature for L2 whenever L2 enables the VMX preemption timer. That way, external interrupts (or other exits) save the exact value in L2's vmcs, and resuming L2 reloads that value from L2's vmcs. Thus the cycles spent handling these vmexits in L0 will not affect L2's preemption timer value.

Arthur
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-08-25 09:37, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 3:28 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 09:24, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..9579409 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its behavior properly in that case. Besides, we need to test that in the absence of PIN_BASED_VMX_PREEMPTION_TIMER, we need to hide VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though this should not happen according to Intel SDM. If the SDM guarantees this for us, we don't need such a safety measure. Otherwise, it should be added, yes. Jan signature.asc Description: OpenPGP digital signature
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 3:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 09:37, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 3:28 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 09:24, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..9579409 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its behavior properly in that case. Besides, we need to test that in the absence of PIN_BASED_VMX_PREEMPTION_TIMER, we need to hide VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though this should not happen according to Intel SDM. If the SDM guarantees this for us, we don't need such a safety measure. Otherwise, it should be added, yes. The SDM has such description (see 26.2.1.2): If “activate VMX-preemption timer” VM-execution control is 0, the “save VMX-preemption timer value” VM-exit control must also be 0. 
It doesn't tell us whether these two flags are consistent when reading them from the related MSRs (IA32_VMX_PINBASED_CTLS and IA32_VMX_EXIT_CTLS). So I think the check is needed here.

Arthur

Jan
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:

From: Jan Kiszka jan.kis...@web.de
To: Abel Gordon/Haifa/IBM@IBMIL
Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
Date: 25/08/2013 10:43 AM
Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
Sent by: kvm-ow...@vger.kernel.org

On 2013-08-25 09:37, Abel Gordon wrote:

On 2013-08-24 20:44, root wrote:

This patch contains the following two changes:

1. Fix a bug in nested preemption timer support: if a vmexit from L2 to L0 occurs for a reason not emulated by L1, the preemption timer value should be saved on such exits.
2. Add support for the "save VMX-preemption timer value" VM-exit control to nVMX.

With this patch, nested VMX preemption timer features are fully supported.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---

@@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		(vmcs_config.pin_based_exec_ctrl |
 		 vmcs12->pin_based_vm_exec_control));
-	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
-		vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
-			     vmcs12->vmx_preemption_timer_value);
+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) {
+		if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
+			vmcs12->vmx_preemption_timer_value =
+				vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+		else
+			vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
+				     vmcs12->vmx_preemption_timer_value);
+	}

[Jan Kiszka:] This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far.

[Abel Gordon:] I think something else may be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or EPT violations), we may spend some cycles in L0 handling these exits. Note that L1 is not aware of these exits, and from L1's perspective L2 was running on the CPU. That means we may need to subtract the cycles spent in L0 from the preemption timer, or emulate a preemption timer exit to force a transition to L1 instead of resuming L2.

[Jan Kiszka:] That's precisely what the logic I described should achieve: reload the value we saved on L2 exit on reentry.

[Abel Gordon:] But don't you think we should also subtract the cycles spent in L0 from the preemption timer? I mean, if we spent X cycles in L0 handling an L2 exit which was not forwarded to L1, then, before we resume L2, the preemption timer should be (previous_value_on_exit - X). If (previous_value_on_exit - X) <= 0, then we should force (emulate) a preemption timer exit from L2 to L1.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
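Abel's proposed accounting can be sketched in C. This is a minimal illustration under stated assumptions: `adjust_preemption_timer` and both cycle counts are invented names for this sketch, not KVM code, and the cycle measurement itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the adjustment Abel proposes: before resuming L2 after an
 * exit that L0 handled without involving L1, subtract the X cycles
 * spent in L0 from the timer value saved on the L2 exit.
 *
 * Returns true if L2 may be resumed with *timer as the new preemption
 * timer value, or false if the timer has effectively expired while in
 * L0, in which case a preemption timer exit to L1 must be emulated.
 */
static bool adjust_preemption_timer(uint32_t saved_on_exit,
				    uint32_t cycles_in_l0,
				    uint32_t *timer)
{
	if (saved_on_exit <= cycles_in_l0)
		return false;	/* expired in L0: emulate L2->L1 exit */

	*timer = saved_on_exit - cycles_in_l0;
	return true;
}
```

Jan's counter-argument below is that the hardware-saved value already reflects wall-clock (TSC) time, so whether X must be subtracted explicitly depends on when the value was saved.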
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-08-25 09:50, Abel Gordon wrote:

But don't you think we should also subtract the cycles spent in L0 from the preemption timer? I mean, if we spent X cycles in L0 handling an L2 exit which was not forwarded to L1, then, before we resume L2, the preemption timer should be (previous_value_on_exit - X). If (previous_value_on_exit - X) <= 0, then we should force (emulate) a preemption timer exit from L2 to L1.

We ask the hardware to save the value of the preemption timer on L2 exit. This value will be exposed to L1 (if it asked for saving as well) and/or be written back to the hardware on L2 reentry (unless L1 had a chance to run and modified it). So the time spent in L0 is implicitly subtracted.

Jan

PS: You had kvm-owner in CC.
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 3:50 PM, Abel Gordon ab...@il.ibm.com wrote:

But don't you think we should also subtract the cycles spent in L0 from the preemption timer? I mean, if we spent X cycles in L0 handling an L2 exit which was not forwarded to L1, then, before we resume L2, the preemption timer should be (previous_value_on_exit - X). If (previous_value_on_exit - X) <= 0, then we should force (emulate) a preemption timer exit from L2 to L1.

Sorry, I previously misunderstood your comments. But why should we need to exclude cycles spent in L0 from the L2 preemption timer value? These cycles are not spent by L2, so they should not be charged to L2.

Arthur
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:55:24 AM:

From: Arthur Chunqi Li yzt...@gmail.com

But why should we need to exclude cycles spent in L0 from the L2 preemption timer value? These cycles are not spent by L2, so they should not be charged to L2.

L1 asked the hardware (emulated by L0) to run L2 and force an exit after Y cycles. Now, in practice, we may spend X cycles in L0 handling exits without switching to L1. That means that from L1's perspective, L2 was running during all these X cycles. L1 should assume that the instructions per cycle the CPU executed decreased, but the cycles were spent. That's why I believe you should take these X cycles into account.
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-08-25 10:04, Abel Gordon wrote:

L1 asked the hardware (emulated by L0) to run L2 and force an exit after Y cycles. Now, in practice, we may spend X cycles in L0 handling exits without switching to L1. That means that from L1's perspective, L2 was running during all these X cycles. L1 should assume that the instructions per cycle the CPU executed decreased, but the cycles were spent. That's why I believe you should take these X cycles into account.

Now I get it. There is likely some truth in this, as the reference clock for the preemption timer, the TSC, isn't stopped for L1/L2 while running in L0. And the SDM demands the countdown to be proportional to that clock.

Jan
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM:

From: Jan Kiszka jan.kis...@web.de

We ask the hardware to save the value of the preemption timer on L2 exit. This value will be exposed to L1 (if it asked for saving as well) and/or be written back to the hardware on L2 reentry (unless L1 had a chance to run and modified it). So the time spent in L0 is implicitly subtracted.

I think you are suggesting the following; please correct me if I am wrong:

1) L1 resumes L2 with the preemption timer enabled
2) L0 emulates the resume/launch
3) L2 runs for Y cycles until an external interrupt occurs (Y < preemption timer value specified by L1)
4) L0 saves the preemption timer value (original value - Y)
5) L0 spends X cycles handling the external interrupt
6) L0 resumes L2 with preemption timer = original value - Y

Note that in this case X is ignored. I was suggesting to do the following:

6) If original value - Y - X > 0, then L0 resumes L2 with preemption timer = original value - Y - X; else L0 emulates an L2->L1 preemption timer exit (resumes L1)
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-08-25 10:18, Abel Gordon wrote:

I think you are suggesting the following; please correct me if I am wrong:

1) L1 resumes L2 with the preemption timer enabled
2) L0 emulates the resume/launch
3) L2 runs for Y cycles until an external interrupt occurs (Y < preemption timer value specified by L1)
4) L0 saves the preemption timer value (original value - Y)
5) L0 spends X cycles handling the external interrupt
6) L0 resumes L2 with preemption timer = original value - Y

Note that in this case X is ignored.

Yes, but see my other reply.

I was suggesting to do the following:

6) If original value - Y - X > 0, then L0 resumes L2 with preemption timer = original value - Y - X; else L0 emulates an L2->L1 preemption timer exit (resumes L1)

Almost. 6) should be: if an exit to L1 occurred after the last L2 exit, set X to 0. Then load MAX(original value - Y - X, 0). The hardware will trigger the exit for us.

Jan
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-08-25 10:25, Jan Kiszka wrote:

Almost. 6) should be: if an exit to L1 occurred after the last L2 exit, set X to 0. Then load MAX(original value - Y - X, 0). The hardware will trigger the exit for us.

Hmm, no: if an exit to L1 occurred after the last L2 exit, load the value from vmcs12; else load MAX(original value - Y - X, 0).

Jan
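Jan's corrected rule can be sketched as a small C helper. The function name and parameters are invented for illustration; in particular, `saved_on_exit` stands for the hardware-saved value (original value - Y) and `cycles_in_l0` for X, and how either is measured is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the reload rule: on re-entry to L2, if L1 ran since the
 * last L2 exit, it may have reprogrammed the timer, so the vmcs12
 * value is loaded as-is.  Otherwise L0 reloads the value saved on the
 * last L2 exit minus the cycles spent in L0, clamped at 0 so that the
 * hardware itself triggers the preemption timer exit immediately.
 */
static uint32_t timer_value_for_l2_entry(bool l1_ran_since_l2_exit,
					 uint32_t vmcs12_timer_value,
					 uint32_t saved_on_exit,
					 uint32_t cycles_in_l0)
{
	if (l1_ran_since_l2_exit)
		return vmcs12_timer_value;

	/* MAX(saved_on_exit - cycles_in_l0, 0) without underflow */
	return saved_on_exit > cycles_in_l0 ?
		saved_on_exit - cycles_in_l0 : 0;
}
```

Clamping at 0 rather than emulating the exit in software matches Jan's point that loading 0 lets the hardware deliver the preemption timer exit on its own.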
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
Jan Kiszka jan.kis...@web.de wrote on 25/08/2013 11:27:22 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 11:27 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer On 2013-08-25 10:25, Jan Kiszka wrote: On 2013-08-25 10:18, Abel Gordon wrote: kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 10:54 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-25 09:50, Abel Gordon wrote: kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM: From: Jan Kiszka jan.kis...@web.de To: Abel Gordon/Haifa/IBM@IBMIL, Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com Date: 25/08/2013 10:43 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-25 09:37, Abel Gordon wrote: From: Jan Kiszka jan.kis...@web.de To: 李春奇 Arthur Chunqi Li yzt...@gmail.com, Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com Date: 25/08/2013 09:44 AM Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer Sent by: kvm-ow...@vger.kernel.org On 2013-08-24 20:44, root wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2-L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. 
Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs_config.pin_based_exec_ctrl | vmcs12->pin_based_vm_exec_control)); - if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, - vmcs12->vmx_preemption_timer_value); + if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) { + if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) + vmcs12->vmx_preemption_timer_value = + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + else + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, + vmcs12->vmx_preemption_timer_value); + } This is not correct. We still need to set the vmcs to vmx_preemption_timer_value. The difference is that, on exit from L2, vmx_preemption_timer_value has to be updated according to the saved hardware state. The corresponding code is missing in your patch so far. I think something else may be missing here: assuming L0 handles exits for L2 without involving L1 (e.g. external interrupts or EPT violations), then we may spend some cycles in L0 handling these exits. Note L1 is not aware of these exits, and from L1's perspective L2 was running on the CPU. That means we may need to subtract the cycles spent at L0 from the preemption timer or emulate a preemption timer exit to force a transition to L1 instead of resuming L2. That's precisely what the logic I described should achieve: reload the value we saved on L2 exit on reentry. But don't you think we should also subtract the cycles spent at L0 from the preemption timer? I mean, if we spent X cycles at L0 handling an L2 exit which was not forwarded to L1, then, before we resume L2, the preemption timer should be: (previous_value_on_exit - X). If (previous_value_on_exit - X) < 0, then we should force (emulate) a preemption timer exit between L2 and L1. We ask the hardware to save the value of the preemption timer on L2 exit.
This value will be exposed to L1 (if it asked for saving as well) and/or be written back to the hardware on L2 reentry (unless L1 had a chance to run and modified it). So the time spent in L0 is implicitly subtracted. I think you are suggesting the following, please correct me if I am wrong. 1) L1 resumes L2 with preemption timer enabled 2) L0 emulates the resume/launch 3) L2 runs for Y cycles until an external interrupt occurs (Y < preemption timer value specified by L1) 4) L0 saves the preemption timer (original value - Y) 5) L0 spends X cycles handling the external interrupt 6) L0 resumes L2 with preemption timer = original value - Y. Note that in this case X is ignored. Yes, but see my other reply. I sent my reply before I read yours, sorry.
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon ab...@il.ibm.com wrote: We ask the hardware to save the value of the preemption timer on L2 exit. This value will be exposed to L1 (if it asked for saving as well) and/or be written back to the hardware on L2 reentry (unless L1 had a chance to run and modified it). So the time spent in L0 is implicitly subtracted. I think you are suggesting the following, please correct me if I am wrong. 1) L1 resumes L2 with preemption timer enabled 2) L0 emulates the resume/launch 3) L2 runs for Y cycles until an external interrupt occurs (Y < preemption timer value specified by L1) 4) L0 saves the preemption timer (original value - Y) 5) L0 spends X cycles handling the external interrupt 6) L0 resumes L2 with preemption timer = original value - Y. Note that in this case X is ignored. I was suggesting to do the following: 6) If original value - Y - X > 0, then L0 resumes L2 with preemption timer = original value - Y - X; else L0 emulates a L2->L1 preemption timer exit (resumes L1). Yes, your description is right. But I'm also thinking about my previous consideration: why should we consider such X cycles as time L2 spent? For nested VMX, the external interrupt is not provided by L1; it is triggered from L0 and wants to cause a periodic exit to L1, L2 is
Re: Partial huge page backing with KVM/qemu
On Sat, Aug 24, 2013 at 12:32:07AM +, Chris Leduc wrote: Hi - In a KVM/qemu environment, is it possible for the host to back only a portion of the guest's memory with huge pages? In some situations it may not be desirable to back the entirety of a guest's memory with huge pages (as can be done via the libvirt memoryBacking option). What are those situations? What would be very useful is to request huge pages in the guest, either at boot time or dynamically, and have the host back them with physical huge pages, but not back the rest of the normal-page guest memory with huge pages from the host. The equivalent in Xen is setting allowsuperpage=1 on the hypervisor boot line. As far as I can tell this disables/enables use of huge pages by Xen VMs, not what you say you want. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-08-25 10:41, Arthur Chunqi Li wrote: I was suggesting to do the following: 6) If original value - Y - X > 0, then L0 resumes L2 with preemption timer = original value - Y - X; else L0 emulates a L2->L1 preemption timer exit (resumes L1). Yes, your description is right. But I'm also thinking about my previous consideration: why should we consider such X cycles as time L2 spent? For nested VMX, the external interrupt is not provided by L1; it is triggered from L0 and wants to cause a periodic exit to L1; L2 is affected accidentally. Since
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On Sun, Aug 25, 2013 at 4:53 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 10:41, Arthur Chunqi Li wrote: Yes, your description is right. But I'm also thinking about my previous consideration: why should we consider such X cycles as time L2 spent? For nested VMX, the external interrupt is not provided by L1; it is triggered from L0 and wants to
Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-08-25 11:07, Arthur Chunqi Li wrote: On Sun, Aug 25, 2013 at 4:53 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 10:41, Arthur Chunqi Li wrote: Yes, your description is right. But I'm also thinking about my previous consideration: why should we consider such X cycles as time L2 spent? For nested VMX, the external interrupt is not provided
Re: [PATCH v3 0/6] KVM: nVMX: Enable unrestricted guest mode and fix some nEPT issues
On 25/08/2013 08:46, Jan Kiszka wrote: On 2013-08-08 16:26, Jan Kiszka wrote: These patches apply on top of kvm.git queue. Changes in v3: - rebased over queue - added Do not set identity page map for L2 - dropped Fix guest CR3 read-back on VM-exit Jan Kiszka (6): KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0 in load_vmcs12_host_state KVM: nVMX: Do not set identity page map for L2 KVM: nVMX: Load nEPT state after EFER KVM: nVMX: Implement support for EFER saving on VM-exit KVM: nVMX: Update mmu.base_role.nxe after EFER loading on VM-entry/exit KVM: nVMX: Enable unrestricted guest mode support arch/x86/kvm/vmx.c | 44 +++- 1 files changed, 31 insertions(+), 13 deletions(-) Ping for this series. It still applies, now to next, without conflicts. I tested it and it works fine. It also looks good to me. However, I wanted to leave time to Gleb so that he could also review it. Paolo
Re: kernel 3.10.1 - NMI received for unknown reason
On Fri, Aug 09, 2013 at 09:14:13PM +0200, Stefan Pietsch wrote: On 04.08.2013 14:44, Gleb Natapov wrote: On Fri, Aug 02, 2013 at 08:24:38AM +0200, Stefan Pietsch wrote: On 31.07.2013 11:20, Gleb Natapov wrote: On Wed, Jul 31, 2013 at 11:10:01AM +0200, Stefan Pietsch wrote: On 30.07.2013 07:31, Gleb Natapov wrote: What happens if you run perf on your host (perf record -a)? Do you see the same NMI messages? It seems that perf record -a triggers some delayed NMI messages. They appear about 20 or 30 minutes after the command. This seems strange. Definitely strange. The KVM guest is not running in parallel, correct? 20, 30 minutes after perf stopped running, or is it running all of the time? No, the KVM guest is not running in parallel. But I'm not able to clearly reproduce the NMI messages with perf record. I start perf record -a and after some minutes I stop the recording. After that it seems the NMI messages appear within a random period of time. So, I cannot tell what triggers the messages. When you run KVM with the coreduo cpu model it emulates a PMU, which basically makes it a perf front end. If you can reproduce the messages with perf too, it probably means that the problem is not in KVM itself. If you disable the NMI watchdog in the guest, the messages may go away. Can you send your guest's dmesg when you boot it with the coreduo model? The NMI messages appear in the host only. The guest runs as usual. I understand that. But enabling the guest NMI watchdog is what makes KVM use the perf subsystem and likely causes these host messages. Try to disable the NMI watchdog in the guest and see what happens. -- Gleb.
Re: [PATCH 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()
On Fri, Aug 23, 2013 at 04:50:38PM +0800, Jason Wang wrote: On 08/20/2013 10:33 AM, Jason Wang wrote: On 08/16/2013 05:54 PM, Michael S. Tsirkin wrote: On Fri, Aug 16, 2013 at 01:16:26PM +0800, Jason Wang wrote: Switch to using vhost_add_used_and_signal_n() to avoid multiple calls to vhost_add_used_and_signal(). With the patch we will call it at most 2 times (considering done_idx wrap-around) compared to N times w/o this patch. Signed-off-by: Jason Wang jasow...@redhat.com So? Does this help performance then? Looks like it can, especially when the guest supports event index. When the guest enables tx interrupts, this can save us some unnecessary signals to the guest. I will do some tests. Have done some tests. I can see a 2% - 3% increase in both aggregate transaction rate and per-cpu transaction rate in TCP_RR and UDP_RR tests. I'm using ixgbe. W/o this patch, I can see more than 100 calls of vhost_add_used_and_signal() in one vhost_zerocopy_signal_used(). This is because ixgbe (and other modern ethernet drivers) tend to free old tx skbs in a loop during tx interrupts, and vhost tends to batch the adding of used entries and the signal in vhost_zerocopy_callback(). Switching to vhost_add_used_and_signal_n() means saving ~100 used-idx updates and memory barriers. Well, it's only an smp_wmb, so a nop on most architectures, so a 2% gain is surprising. I'm guessing the cache miss on the write is what's giving us a speedup here. I'll review the code, thanks. -- MST
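The batching argument can be illustrated with a toy used-ring model (hypothetical structures and names, not the real vhost internals): publishing N used entries one at a time updates the guest-visible index, with a write barrier, N times, while the batched `*_n` variant copies all entries first and updates the index once.

```c
/* Toy model of batched used-entry publication (illustrative only).
 * In real vhost code the barrier and index write are what gets saved
 * ~100 times per zerocopy-completion batch. */
#define RING_SIZE 256

struct used_ring {
    unsigned idx;                 /* guest-visible used index */
    unsigned elems[RING_SIZE];    /* completed descriptor heads */
};

static void add_used_n(struct used_ring *r, const unsigned *heads, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        r->elems[(r->idx + i) % RING_SIZE] = heads[i];
    /* a single smp_wmb() would go here in real code, then one index update
     * instead of n barrier+update pairs */
    r->idx += n;
}
```

Calling `add_used_n()` once with three heads leaves the ring in the same state as three single-entry calls would, but with one index update instead of three.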
Re: [PATCH 6/6] vhost_net: remove the max pending check
On Fri, Aug 23, 2013 at 04:55:49PM +0800, Jason Wang wrote: On 08/20/2013 10:48 AM, Jason Wang wrote: On 08/16/2013 06:02 PM, Michael S. Tsirkin wrote: On Fri, Aug 16, 2013 at 01:16:30PM +0800, Jason Wang wrote: We used to limit the max pending DMAs to prevent the guest from pinning too many pages. But this could be removed since: - We have the sk_wmem_alloc check in both tun/macvtap to do the same work - This max pending check was almost useless since it was only done when there were no new buffers coming from the guest. The guest can easily exceed the limitation. - We already check upend_idx != done_idx and switch to non-zerocopy then. So even if all vq->heads were used, we can still do the packet transmission. We can, but performance will suffer. The check was in fact only done when no new buffers were submitted from the guest. So if the guest keeps sending, the check won't be done. If we really want to do this, we should do it unconditionally. Anyway, I will do a test to see the result. There's a bug in PATCH 5/6: the check nvq->upend_idx != nvq->done_idx makes zerocopy always disabled, since we initialize both upend_idx and done_idx to zero. So I changed it to: (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx. But what I would really like to try is limiting ubuf_info to VHOST_MAX_PEND. I think this has a chance to improve performance since we'll be using less cache. Of course this means we must fix the code to really never submit more than VHOST_MAX_PEND requests. Want to try? With this change on top, I didn't see a performance difference w/ and w/o this patch. Did you try small message sizes btw (like 1K)? Or just the netperf default of 64K? -- MST
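The wraparound issue in that check can be sketched as follows (illustrative helper, not the final kernel code): with upend_idx == done_idx == 0 at init, the naive inequality test is false and would keep zerocopy disabled forever, while the wraparound-aware form only fires when the ring of pending ubuf_info slots is actually full.

```c
#include <stdbool.h>

#define UIO_MAXIOV 1024   /* same constant the discussion refers to */

/* True when adding one more pending entry would catch up with done_idx,
 * i.e. the ring of in-flight zerocopy slots is full. At init
 * (upend_idx == done_idx == 0) this is false, unlike the naive
 * "upend_idx != done_idx" check that the bug report describes. */
static bool pending_ring_full(unsigned upend_idx, unsigned done_idx)
{
    return (upend_idx + 1) % UIO_MAXIOV == done_idx;
}
```

So an empty ring at startup no longer trips the condition, and the full-ring case is still detected even when upend_idx has wrapped past UIO_MAXIOV while done_idx has not caught up.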
Re: [PATCH V12 3/5] kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration
On Wed, Aug 07, 2013 at 12:40:36AM +0530, Raghavendra K T wrote: On 08/07/2013 12:02 AM, Eric Northup wrote: If this is about migration correctness, could it get folded into the previous patch 2/5, so that there's not a broken commit which could hurt bisection? Yes, it could be. The only reason I kept it like that was that the author of the previous patch is different (Srivatsa), and I did not want to mix the sign-offs by merging this hunk as the patch series evolved. Gleb, Paolo, please let me know. Yes please, do so. -- Gleb.
Re: [Qemu-devel] -cpu host (was Re: KVM call minutes for 2013-08-06)
On Thu, Aug 08, 2013 at 05:55:09PM +0200, Andreas Färber wrote: Hi Peter, On 08.08.2013 14:51, Peter Maydell wrote: [I missed this KVM call but the stuff about -cpu host ties into an issue we've been grappling with for ARM KVM, so it seems a reasonable jumping-off-point.] On 6 August 2013 16:15, Juan Quintela quint...@redhat.com wrote: 2013-08-06 -- What does libvirt need/miss today? - how to handle machine types? creating them inside qemu? - qemu --cpu help only shows cpus, not what features qemu will use - qemu -cpu host: what does this exactly mean? kvm removes some flags. - Important to know if migration would work. - Machine types sometimes disable some feature, so cpu alone is not enough. - the kernel removes some features because it knows they can't be virtualised - qemu adds some others because it knows it doesn't need host support - and then lots of features in the middle So, coming at this from an ARM perspective: Should any target arch that supports KVM also support -cpu host? If so, what should it do? I think that depends on the target and whether/what is useful. Is there a description somewhere of what the x86 and PPC semantics of -cpu host are? I'm afraid our usual documentation will be reading the source code. ;) x86 was first to implement -cpu host and passed through pretty much all host features even if they would not work without additional support code. This is definitely not true. Only features that will work are passed through. Actually the definition of -cpu host for x86 can be: advertise any feature that can be supported on this host/qemu combo. I've seen a bunch of bugs where that leads to GMP and others breaking badly. Lately, in the case of the PMU, we've started to limit that. The problem with the PMU was that PMU capabilities were passed through even for non -cpu host. There was no problem with -cpu host. Alex proposed -cpu best, which was never merged to date.
It was similar to how ppc's -cpu host works: According to http://wiki.qemu.org/Features/CPUModels#-cpu_host_vs_-cpu_best it should select the predefined cpu model closest to the host one. Useful, but not the same as -cpu host. ppc matches the Processor Version Register (PVR) in kvm.c against its known models from cpu-models.c (strictly today, mask being discussed). The PVR can be read from userspace via mfpvr alias to mfspr (Move From Special Purpose Register; possibly emulated for userspace by kernel?). CPU features are all QEMU-driven AFAIU, through the CPU families in translate_init.c. Beware, everything is highly macro'fied in ppc code. -- Gleb.
Re: [Qemu-devel] -cpu host (was Re: KVM call minutes for 2013-08-06)
On Fri, Aug 09, 2013 at 02:53:30PM -0300, Eduardo Habkost wrote: On Thu, Aug 08, 2013 at 12:29:07PM -0700, Christoffer Dall wrote: On Thu, Aug 08, 2013 at 08:05:11PM +0100, Peter Maydell wrote: On 8 August 2013 19:39, Christoffer Dall christoffer.d...@linaro.org wrote: FWIW, from the kernel point of view I'd much prefer to return this is the type of VCPU that I prefer to emulate to user space on this current host than having QEMU come up with its own suggestion for CPU and asking the kernel for it. The reason is that it gives us slightly more freedom in how we choose to support a given host SoC in that we can say that we at least support core A on core B, so if user space can deal with a virtual core A, we should be good. Hmm, I'm not sure how useful a query support kind of API would be to QEMU. QEMU is basically going to have two use cases: (1) I want an A15 [ie -cpu cortex-a15] (2) give me whatever you have and I'll cope [ie -cpu host] so my thought was that we could just have the kernel support init.target = KVM_ARM_TARGET_HOST; memset(init.features, 0, sizeof(init.features)); ret = kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_INIT, init); (in the same way we currently ask for KVM_ARM_TARGET_CORTEX_A15). I guess we could have a return preferred target value VM ioctl, but it seems a bit pointless given that the only thing userspace is going to do with the return value is immediately feed it back to the kernel... My thinking was that the result of cpu = KVM_ARM_TARGET_HOST would be the same as x = kvm_get_target_host(), cpu = x, but at the same time letting QEMU know what it's dealing with. Perhaps QEMU doesn't need this for emulation, but isn't it useful for save/restore/migration/debugging scenarios? FWIW, this is how it works on x86: QEMU calls GET_SUPPORTED_CPUID and then uses the result on a SET_CPUID call. (Well, at least that's the general idea, but the details are a bit more complicated than that. 
e.g.: some features can be enabled only if some QEMU options are set as well, like tsc-deadline and x2apic, that require the in-kernel irqchip to be enabled.) Even if you don't really need this 2-step method today, doing this would allow QEMU to fiddle with some bits if necessary in the future. Without the 2-step method, how does QEMU ARM know that it can create the cpu type the user asked for? On x86, QEMU queries the kernel about which features can be supported and then ANDs that with the requested features (a feature can be requested implicitly, by specifying a cpu model name like Nehalem, or explicitly, by adding +feature_name to the -cpu parameter). -cpu host is just a by-product of this algorithm: it enables everything that is supported (supported & 1 instead of supported & requested). -- Gleb.
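The negotiation Gleb describes boils down to a bitwise AND of host capabilities and model requests. A minimal standalone sketch of that flow (the function names are hypothetical, not QEMU's actual code):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the x86 feature negotiation described above: the kernel
 * reports what it can support (GET_SUPPORTED_CPUID), and userspace ANDs
 * that with what the selected CPU model requests before SET_CPUID. */
static uint32_t filter_features(uint32_t supported, uint32_t requested)
{
    return supported & requested;
}

/* "-cpu host" degenerates to requesting everything: the result is simply
 * the full supported mask. */
static uint32_t cpu_host_features(uint32_t supported)
{
    return filter_features(supported, ~0u);
}
```

With supported = 0xF0F0 and requested = 0xFF00, only the intersection 0xF000 survives; -cpu host returns the supported mask unchanged.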
Re: [Qemu-devel] -cpu host (was Re: KVM call minutes for 2013-08-06)
On Thu, Aug 08, 2013 at 07:20:41PM +0100, Peter Maydell wrote: For ARM you can't get at feature info of the host from userspace (unless you want to get into parsing /proc/cpuinfo), so my current idea is to have KVM_ARM_VCPU_INIT support a target-cpu-type which means whatever host CPU is. Then when we've created the vcpu we can populate QEMU's idea of what the CPU features are by using the existing ioctls for reading the cp15 registers of the vcpu. Sounds sane to me iff those cp15 registers all work with KVM and don't need any additional KVM/QEMU/device code. Yes; KVM won't tell us about CP15 registers unless they are exposed to the guest VM (that is, we're querying the So why not implement something similar to x86 cpuid thing. Have separate ioctls to query what CP15 registers KVM can support and what is exposed to a guest? VCPU, not the host CPU). More generally, the cp15 tuple list code I landed a couple of months back makes the kernel the authoritative source for which cp15 registers exist and what their values are -- in -enable-kvm mode QEMU no longer cares about them (its own list of which registers exist for which CPU is used only for TCG). -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines
On Fri, Aug 09, 2013 at 07:30:13AM -0700, Christoffer Dall wrote: The KVM_HPAGE_ defines are a little artificial on ARM, since the huge page size is statically defined at compile time and there is only a single huge page size. Now when the main kvm code relying on these defines has been moved to the x86 specific part of the world, we can get rid of these. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- [Resending this because the KVM list filter caught the subject of the previous message due to the letter X] arch/arm/include/asm/kvm_host.h | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 7d22517..e45a74b 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -38,11 +38,6 @@ #define KVM_VCPU_MAX_FEATURES 1 -/* We don't currently support large pages. */ -#define KVM_HPAGE_GFN_SHIFT(x) 0 -#define KVM_NR_PAGE_SIZES 1 With huge page support you do have two different page sizes, but the only code that uses those defines currently is in the x86 shadow mmu code, so I am fine with moving gfn_to_index() into x86 specific code and getting rid of those defines (you will probably need them when you implement nested HYP mode :)). But can you do it as a separate patch series and remove the defines for all arches? -#define KVM_PAGES_PER_HPAGE(x) (1UL << 31) - #include <kvm/arm_vgic.h> struct kvm_vcpu; -- 1.7.10.4 ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm -- Gleb.
Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines
On 25 August 2013 15:05, Gleb Natapov g...@redhat.com wrote: (you will probably need them when you will implement nested HYP mode :)). Smiley noted, but this is pretty unlikely since it's not possible to lie to the guest about which mode it's in, so you can't make a guest think it's in Hyp mode. -- PMM
Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines
On Sun, Aug 25, 2013 at 03:29:17PM +0100, Peter Maydell wrote: On 25 August 2013 15:05, Gleb Natapov g...@redhat.com wrote: (you will probably need them when you will implement nested HYP mode :)). Smiley noted, but this is pretty unlikely since it's not possible to lie to the guest about which mode it's in, so you can't make a guest think it's in Hyp mode. I suspected this, but have forgotten most of what I read about Hyp mode by now. Need to refresh my memory ASAP. Is it impossible even with a lot of emulation? Can the guest detect that it is not in Hyp mode without trapping into the hypervisor? -- Gleb.
Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag
On 24.08.2013, at 21:14, Yann Droneaud wrote: KVM uses anon_inode_get() to allocate file descriptors as part of some of its ioctls. But those ioctls are lacking a flag argument allowing userspace to choose options for the newly opened file descriptor. In such a case it's advised to use O_CLOEXEC by default, so that userspace is allowed to choose, without race, if the file descriptor is going to be inherited across exec(). This patch sets the O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() to not leak file descriptors across exec(). Signed-off-by: Yann Droneaud ydrone...@opteya.com Link: http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com Reviewed-by: Alexander Graf ag...@suse.de Would it make sense to simply inherit the O_CLOEXEC flag from the parent kvm fd instead? That would give user space the power to keep fds across exec() if it wants to. Alex
Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines
On 25 August 2013 15:48, Gleb Natapov g...@redhat.com wrote: On Sun, Aug 25, 2013 at 03:29:17PM +0100, Peter Maydell wrote: Smiley noted, but this is pretty unlikely since it's not possible to lie to the guest about which mode it's in, so you can't make a guest think it's in Hyp mode. I suspected this, but forgot most that I read about Hyp mode by now. Need to refresh my memory ASAP. Is it impossible even with a lot of emulation? Can guest detect that it is not in a Hyp mode without trapping into hypervisor? Yes. The current mode is in the low bits of the CPSR, which is readable without causing a trap. This is just the most obvious roadblock; I bet there are more. If you really had to run Hyp mode code in a VM you would probably have to do it by having it all emulated via TCG. -- PMM
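The roadblock Peter describes is easy to demonstrate with plain bit arithmetic (a standalone sketch; the mode encodings below are the ARMv7 architectural values, not KVM code):

```c
#include <stdint.h>

/* ARMv7 CPSR.M (bits [4:0]) encodes the current processor mode. A guest
 * kernel can read CPSR with a plain MRS instruction, which does not trap
 * to the hypervisor, so KVM cannot hide the fact that the guest is not
 * really running in Hyp mode. */
#define CPSR_MODE_MASK 0x1Fu
#define CPSR_MODE_SVC  0x13u  /* Supervisor mode, where guest kernels run */
#define CPSR_MODE_HYP  0x1Au  /* Hyp mode */

static int cpsr_in_hyp(uint32_t cpsr)
{
    return (cpsr & CPSR_MODE_MASK) == CPSR_MODE_HYP;
}
```

A guest kernel running in SVC mode that checks its own CPSR this way immediately sees it is not in Hyp mode, with no opportunity for the hypervisor to intercept the read.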
Re: [PATCH 6/8] KVM: arm-vgic: Add vgic reg access from dev attr
On 23.08.2013, at 20:20, Christoffer Dall wrote: Add infrastructure to handle distributor and cpu interface register accesses through the KVM_{GET/SET}_DEVICE_ATTR interface by adding the KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS groups and defining the semantics of the attr field to be the MMIO offset as specified in the GICv2 specs. Missing register accesses or other changes in individual register access functions to support save/restore of the VGIC state is added in subsequent patches. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Documentation/virtual/kvm/devices/arm-vgic.txt | 35 ++ virt/kvm/arm/vgic.c| 143 2 files changed, 178 insertions(+) diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt b/Documentation/virtual/kvm/devices/arm-vgic.txt index c9febb2..1b68475 100644 --- a/Documentation/virtual/kvm/devices/arm-vgic.txt +++ b/Documentation/virtual/kvm/devices/arm-vgic.txt @@ -19,3 +19,38 @@ Groups: KVM_VGIC_V2_ADDR_TYPE_CPU (rw, 64-bit) Base address in the guest physical address space of the GIC virtual cpu interface register mappings. + + KVM_DEV_ARM_VGIC_GRP_DIST_REGS + Attributes: +The attr field of kvm_device_attr encodes two values: +bits: | 63 40 | 39 .. 32 | 31 0 | +values: |reserved | cpu id | offset | + +All distributor regs are (rw, 32-bit) + +The offset is relative to the Distributor base address as defined in the +GICv2 specs. Getting or setting such a register has the same effect as +reading or writing the register on the actual hardware from the cpu +specified with cpu id field. Note that most distributor fields are not +banked, but return the same value regardless of the cpu id used to access +the register. + Limitations: +- Priorities are not implemented, and registers are RAZ/WI + Errors: +- ENODEV: Getting or setting this register is not yet supported + + KVM_DEV_ARM_VGIC_GRP_CPU_REGS + Attributes: +The attr field of kvm_device_attr encodes two values: +bits: | 63 40 | 39 .. 
32 | 31 0 | +values: |reserved | cpu id | offset | + +All CPU regs are (rw, 32-bit) + +The offsetspecifies the offset from the CPU interface base address as offset specifies +defined in the GICv2 specs. Getting or setting such a register has the +same effect as reading or writing the register on the actual hardware. + Limitations: +- Priorities are not implemented, and registers are RAZ/WI + Errors: +- ENODEV: Getting or setting this register is not yet supported diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 629caeb..e31625c 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -591,11 +591,29 @@ static bool handle_mmio_sgi_reg(struct kvm_vcpu *vcpu, return false; } +static bool handle_mmio_sgi_clear(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + return false; +} + +static bool handle_mmio_sgi_set(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + return false; +} + /* * I would have liked to use the kvm_bus_io_*() API instead, but it * cannot cope with banked registers (only the VM pointer is passed * around, and we need the vcpu). One of these days, someone please * fix it! + * + * Note that the handle_mmio implementations should not use the phys_addr + * field from the kvm_exit_mmio struct as this will not have any sane values + * when used to save/restore state from user space. 
*/ struct mmio_range { phys_addr_t base; @@ -665,6 +683,16 @@ static const struct mmio_range vgic_dist_ranges[] = { .len= 4, .handle_mmio= handle_mmio_sgi_reg, }, + { + .base = GIC_DIST_SGI_CLEAR, + .len= VGIC_NR_SGIS, + .handle_mmio= handle_mmio_sgi_clear, + }, + { + .base = GIC_DIST_SGI_SET, + .len= VGIC_NR_SGIS, + .handle_mmio= handle_mmio_sgi_set, + }, {} }; @@ -1543,6 +1571,80 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) return r; } +static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, phys_addr_t offset) +{ + return true; +} + +static const struct mmio_range vgic_cpu_ranges[] = { + { + .base = GIC_CPU_CTRL, + .len= 12, + .handle_mmio= handle_cpu_mmio_misc, + }, + { + .base
Re: [PATCH 8/8] KVM: arm-vgic: Support CPU interface reg access
On 23.08.2013, at 20:20, Christoffer Dall wrote: Implement support for the CPU interface register access driven by MMIO address offsets from the CPU interface base address. Useful for user space to support save/restore of the VGIC state. This commit adds support only for the same logic as the current VGIC support, and no more. For example, the active priority registers are handled as RAZ/WI, just like setting priorities on the emulated distributor. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- virt/kvm/arm/vgic.c | 66 +++ 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index d44b5a1..257dbae 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1684,9 +1684,67 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { - return true; + struct vgic_cpu *vgic_cpu = vcpu-arch.vgic_cpu; + u32 reg, mask = 0, shift = 0; + bool updated = false; + + switch (offset ~0x3) { + case GIC_CPU_CTRL: + mask = GICH_VMCR_CTRL_MASK; + shift = GICH_VMCR_CTRL_SHIFT; + break; + case GIC_CPU_PRIMASK: + mask = GICH_VMCR_PRIMASK_MASK; + shift = GICH_VMCR_PRIMASK_SHIFT; + break; + case GIC_CPU_BINPOINT: + mask = GICH_VMCR_BINPOINT_MASK; + shift = GICH_VMCR_BINPOINT_SHIFT; + break; + case GIC_CPU_ALIAS_BINPOINT: + mask = GICH_VMCR_ALIAS_BINPOINT_MASK; + shift = GICH_VMCR_ALIAS_BINPOINT_SHIFT; + break; + } + + if (!mmio-is_write) { + reg = (vgic_cpu-vgic_vmcr mask) shift; + memcpy(mmio-data, reg, sizeof(reg)); + } else { + memcpy(reg, mmio-data, sizeof(reg)); + reg = (reg shift) mask; + if (reg != (vgic_cpu-vgic_vmcr mask)) + updated = true; + vgic_cpu-vgic_vmcr = ~mask; + vgic_cpu-vgic_vmcr |= reg; + } + return updated; +} + +static bool handle_mmio_abpr(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, phys_addr_t offset) +{ + return handle_cpu_mmio_misc(vcpu, mmio, 
GIC_CPU_ALIAS_BINPOINT); +} + +static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + u32 reg; + + if (mmio->is_write) + return false; + + reg = 0x0002043B; This wants a comment and probably also a #define :). Alex
Re: [PATCH 0/8] Support VGIC save/restore using device control API
On 23.08.2013, at 20:19, Christoffer Dall wrote: Implement save/restore of the VGIC state using the newer KVM Device Control API. This requries some number of changes to existing code in addition to actually supporting save/restore of the necessary state. The first patches (01-03) support creating the VGIC using the Device Control API. This change is necessary because there are no other suitable KVM APIs that we can leverage to access the VGIC state from user space and the device control API was crafted exactly for this purpose. Subsequent patches add the missing infrastructure and user space API pieces necessary to actually save and restore the VGIC state. The GIC v2.0 architecture specification already specifies registers that can be used to save and restore the complete VGIC state for suspend/resume purposes on real hardware, and we can resuse this interface for the VGIC. The API is therefore based on the memory-mapped register accesses defined in the specs. See the individual patches for details. The patches rely on a number of small fixes sent separately to actually work: git://git.linaro.org/people/cdall/linux-kvm-arm.git vgic-migrate-prereq This patch series based on the above can be cloned from: git://git.linaro.org/people/cdall/linux-kvm-arm.git vgic-migrate User space patches for QEMU will follow shortly. Tested on Versatile Express TC2. Looks quite straight forward and nice. Reviewed-by: Alexander Graf ag...@suse.de Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines
On 25.08.2013, at 16:18, Peter Maydell wrote: On 25 August 2013 15:48, Gleb Natapov g...@redhat.com wrote: On Sun, Aug 25, 2013 at 03:29:17PM +0100, Peter Maydell wrote: Smiley noted, but this is pretty unlikely since it's not possible to lie to the guest about which mode it's in, so you can't make a guest think it's in Hyp mode. I suspected this, but forgot most that I read about Hyp mode by now. Need to refresh my memory ASAP. Is it impossible even with a lot of emulation? Can guest detect that it is not in a Hyp mode without trapping into hypervisor? Yes. The current mode is in the the low bits of the CPSR, which is readable without causing a trap. This is just the most obvious roadblock; I bet there are more. If you really had to run Hyp mode code in a VM you probably have to do it by having it all emulated via TCG. Or in an in-kernel instruction emulator that we have lying around anyways. For kvm-in-kvm that should be good enough, as we only need to execute a few instructions in HYP mode. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] KVM: nVMX: Fully support of nested VMX preemption timer
This patch contains the following two changes:
1. Fix a bug in nested preemption timer support. If we exit from L2 to L0 for a reason that is not emulated by L1, the preemption timer value should be saved across such exits.
2. Add support for the "Save VMX-preemption timer value" VM-Exit control to nVMX.

With this patch, nested VMX preemption timer features are fully supported.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 arch/x86/kvm/vmx.c | 49 -
 1 file changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 57b4e12..6aa320e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
+		nested_vmx_exit_ctls_high &=
+			(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
+	if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
+		nested_vmx_pinbased_ctls_high &=
+			(~PIN_BASED_VMX_PREEMPTION_TIMER);
 	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 				      VM_EXIT_LOAD_IA32_EFER);
@@ -6706,6 +6713,22 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }
 
+static void nested_fix_preempt(struct kvm_vcpu *vcpu)
+{
+	u64 delta_guest_tsc;
+	u32 preempt_val, preempt_bit, delta_preempt_val;
+
+	preempt_bit = native_read_msr(MSR_IA32_VMX_MISC) & 0x1F;
+	delta_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu,
+			native_read_tsc()) - vcpu->arch.last_guest_tsc;
+	delta_preempt_val = delta_guest_tsc >> preempt_bit;
+	preempt_val = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+	if (preempt_val - delta_preempt_val < 0)
+		preempt_val = 0;
+	else
+		preempt_val -= delta_preempt_val;
+	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val);
+}
 /*
  * The guest has exited. See if we can fix it or if we need userspace
  * assistance.
@@ -6734,9 +6757,12 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	else
 		vmx->nested.nested_run_pending = 0;
 
-	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
-		nested_vmx_vmexit(vcpu);
-		return 1;
+	if (is_guest_mode(vcpu)) {
+		if (nested_vmx_exit_handled(vcpu)) {
+			nested_vmx_vmexit(vcpu);
+			return 1;
+		} else
+			nested_fix_preempt(vcpu);
 	}
 
 	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
@@ -7517,6 +7543,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exec_control;
+	u32 exit_control;
 
 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -7690,7 +7717,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 	 * bits are further modified by vmx_set_efer() below.
 	 */
-	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+	exit_control = vmcs_config.vmexit_ctrl;
+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
+		exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	vmcs_write32(VM_EXIT_CONTROLS, exit_control);
 
 	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
 	 * emulated by vmx_set_efer(), below.
@@ -8089,6 +8119,15 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vmcs12->guest_pending_dbg_exceptions =
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
 
+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) {
+		if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
+			vmcs12->vmx_preemption_timer_value =
+				vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+		else
+			vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
+				vmcs12->vmx_preemption_timer_value);
+	}
+
 	/*
 	 * In some cases (usually, nested EPT), L2 is allowed to change its
 	 * own CR3 without exiting. If it has changed it, we must keep it.
-- 
1.7.9.5
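The arithmetic in nested_fix_preempt() can be exercised in isolation. A hedged userspace sketch: the preemption timer ticks at the TSC rate divided by 2^N, where N is MSR_IA32_VMX_MISC bits [4:0]. Note that the patch's `preempt_val - delta_preempt_val < 0` test can never be true when both operands are u32 (unsigned subtraction wraps instead of going negative), so the saturation is better written as a comparison, as below:

```c
#include <stdint.h>

/* Sketch of the VMX-preemption-timer adjustment: on an L2->L0 exit that
 * is handled by L0, the timer must be decremented by the elapsed guest
 * TSC shifted right by rate_shift (MSR_IA32_VMX_MISC[4:0]). Saturate at
 * zero explicitly rather than testing "u32 - u32 < 0", which is always
 * false for unsigned operands. */
static uint32_t adjust_preempt_timer(uint32_t preempt_val,
                                     uint64_t delta_guest_tsc,
                                     unsigned rate_shift)
{
    uint64_t ticks = delta_guest_tsc >> rate_shift;

    if (ticks >= preempt_val)
        return 0;
    return preempt_val - (uint32_t)ticks;
}
```

For example, with rate_shift = 5 a delta of 64 TSC cycles consumes 2 timer ticks, so a timer value of 10 becomes 8; a timer value smaller than the consumed ticks saturates at 0 (and should cause an immediate timer exit on the next entry).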
Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines
On Sun, Aug 25, 2013 at 04:18:02PM +0100, Peter Maydell wrote: On 25 August 2013 15:48, Gleb Natapov g...@redhat.com wrote: On Sun, Aug 25, 2013 at 03:29:17PM +0100, Peter Maydell wrote: Smiley noted, but this is pretty unlikely since it's not possible to lie to the guest about which mode it's in, so you can't make a guest think it's in Hyp mode. I suspected this, but forgot most that I read about Hyp mode by now. Need to refresh my memory ASAP. Is it impossible even with a lot of emulation? Can guest detect that it is not in a Hyp mode without trapping into hypervisor? Yes. The current mode is in the the low bits of the CPSR, which is readable without causing a trap. This is just the most obvious roadblock; I bet there are more. If you really had to run Hyp mode code in a VM you probably have to do it by having it all emulated via TCG. Or through some highly paravirtualized nested virtualization solution if that is ever relevant. -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] tile: support KVM for tilegx
On 8/25/2013 7:39 AM, Gleb Natapov wrote: On Mon, Aug 12, 2013 at 04:24:11PM -0400, Chris Metcalf wrote: This change provides the initial framework support for KVM on tilegx. Basic virtual disk and networking is supported. This needs to be broken down into more reviewable patches. I already broke out one prerequisite patch that wasn't strictly KVM-related: https://lkml.org/lkml/2013/8/12/339 In addition, we've separately arranged to support booting our kernels in a way that is compatible with the Tilera booter running at the highest privilege level, which enables multiple kernel privilege levels: https://lkml.org/lkml/2013/5/2/468 How would you recommend further breaking down this patch? It's pretty much just the basic support for minimal KVM. I suppose I could break out all the I/O related stuff into a separate patch, though it wouldn't amount to much; perhaps the console could also be broken out separately. Any other suggestions? Also can you describe the implementation a little bit? Does the tile arch have a virtualization extension this implementation uses, or is it a trap-and-emulate approach? If the latter, does it run unmodified guest kernels? What userspace are you using with this implementation? We could do full virtualization via trap and emulate, but we've elected to do a para-virtualized approach. Userspace runs at PL (privilege level) 0, the guest kernel runs at PL1, and the host runs at PL2. We have available per-PL resources for various things, and take advantage of having two on-chip timers (for example) to handle timing for the host and guest kernels. We run the same userspace with either the host or the guest. -- Chris Metcalf, Tilera Corp. http://www.tilera.com
RE: Partial huge page backing with KVM/qemu
-Original Message- From: Gleb Natapov [mailto:g...@redhat.com] Sent: Sunday, August 25, 2013 1:52 AM To: Chris Leduc Cc: kvm@vger.kernel.org Subject: Re: Partial huge page backing with KVM/qemu On Sat, Aug 24, 2013 at 12:32:07AM +, Chris Leduc wrote: Hi - In a KVM/qemu environment is it possible for the host to back only a portion of the guests memory with huge pages? In some situations it may not be desirable to back the entirety of a guest's memory with huge pages (as can be done via libvirt memoryBacking option). What are those situations? For example to limit a guest with 64GB of total memory to use 4GB of huge pages for fast lookup memory. This takes advantage of the 4 TLB entries for 1G pages on a Sandy/Ivy Bridge processor to ensure a page walk is never necessary for this fast memory. An example is a high performance data plane application. The remainder of the less frequently accessed memory can be in normal pages. What would be very useful is to request huge pages in the guest, either at boot time or dynamically, and have the host back them with physical huge pages, but not back the rest of the normal page guest memory with huge pages from the host. The equivalent in Xen is setting allowsuperpage=1 on the hypervisor boot line. As far as I can tell this disables/enables use of huge pages by XEN vm, not something you say you want. The Xen documentation is not clear on this, but in practice this flag allows the host to back up guest huge page requests with physical huge pages. So the guest could for example add hugepages=N to its boot line and these pages would be backed in the host with corresponding physical huge pages. From experimentation with KVM, requesting hugepages at guest boot time (without memory backing enabled) will result in guest hugepages backed by host normal pages. 
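As a back-of-the-envelope check of the sizing argument above (pure arithmetic, not a KVM interface): with four 1 GiB TLB entries on Sandy/Ivy Bridge, exactly 4 GiB of hot data can be mapped walk-free.

```c
#include <stdint.h>

#define GiB (1ULL << 30)

/* Number of huge pages (and hence large-page TLB entries) needed to map
 * a region, rounding up. Four 1 GiB entries cover a 4 GiB fast-lookup
 * region with no page walk ever required for it. */
static uint64_t huge_pages_needed(uint64_t bytes, uint64_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```

Anything beyond the fourth 1 GiB page would start evicting entries, which is why the remaining, colder guest memory is better left on normal pages.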
[Bug 60782] [Nested] Windows XP Mode can not work
https://bugzilla.kernel.org/show_bug.cgi?id=60782 --- Comment #2 from Zhou, Chao chao.z...@intel.com --- (In reply to Paolo Bonzini from comment #1) Try -cpu host,+vmx instead. This issue still exists. -- You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH] powerpc/kvm: Handle the boundary condition correctly
On 23.08.2013, at 05:28, Benjamin Herrenschmidt wrote: On Fri, 2013-08-23 at 09:01 +0530, Aneesh Kumar K.V wrote: Alexander Graf ag...@suse.de writes: On 22.08.2013, at 12:37, Aneesh Kumar K.V wrote: From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Isn't this you? Yes. The patches are generated using git format-patch and sent by git send-email. That's how it always created patches for me. I am not sure if there is a config I can change to avoid having From: Don't bother, that's perfectly fine, and git am will do the right thing. It will, but it's an indicator that something in his git config is misconfigured. Usually when git sees Author == Sender it will omit the From: line. Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] powerpc/kvm: Handle the boundary condition correctly
On 23.08.2013, at 04:31, Aneesh Kumar K.V wrote:

> Alexander Graf ag...@suse.de writes:
>> On 22.08.2013, at 12:37, Aneesh Kumar K.V wrote:
>>> From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
>>
>> Isn't this you?
>
> Yes. The patches are generated using git format-patch and sent by git
> send-email. That's how it has always created patches for me. I am not
> sure if there is a config option I can change to avoid having the
> From: line.
>
>>> We should be able to copy up to count bytes
>>
>> Why?
>
> Without this we end up doing
>
>     struct kvm_get_htab_buf {
>             struct kvm_get_htab_header header;
>             /*
>              * Older kernel required one extra byte.
>              */
>             unsigned long hpte[3];
>     } hpte_buf;
>
> even though we are only looking for one hpte entry.
>
> http://mid.gmane.org/1376995766-16526-4-git-send-email-aneesh.ku...@linux.vnet.ibm.com

Ok, please give me an example with real numbers and why it breaks.

Alex

>> Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
>> ---
>>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> index 710d313..0ae6bb6 100644
>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> @@ -1362,7 +1362,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
>>  	lbuf = (unsigned long __user *)buf;
>>  
>>  	nb = 0;
>> -	while (nb + sizeof(hdr) + HPTE_SIZE < count) {
>> +	while (nb + sizeof(hdr) + HPTE_SIZE <= count) {
>>  		/* Initialize header */
>>  		hptr = (struct kvm_get_htab_header __user *)buf;
>>  		hdr.n_valid = 0;
>> @@ -1385,7 +1385,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
>>  		/* Grab a series of valid entries */
>>  		while (i < kvm->arch.hpt_npte &&
>>  		       hdr.n_valid < 0xffff &&
>> -		       nb + HPTE_SIZE < count &&
>> +		       nb + HPTE_SIZE <= count &&
>>  		       record_hpte(flags, hptp, hpte, revp, 1, first_pass)) {
>>  			/* valid entry, write it out */
>>  			++hdr.n_valid;
>> --
>> 1.8.1.2