Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Jan Kiszka
On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit controls
 to nVMX.
 
 With this patch, nested VMX preemption timer features are fully
 supported.
 
 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  arch/x86/kvm/vmx.c |   30 +-
  1 file changed, 25 insertions(+), 5 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 57b4e12..9579409 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
  #ifdef CONFIG_X86_64
   VM_EXIT_HOST_ADDR_SPACE_SIZE |
  #endif
 - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
 + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
 + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
   nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 VM_EXIT_LOAD_IA32_EFER);

In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide
PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its
behavior properly in that case.

  
 @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
   (vmcs_config.pin_based_exec_ctrl |
vmcs12-pin_based_vm_exec_control));
  
 - if (vmcs12-pin_based_vm_exec_control  PIN_BASED_VMX_PREEMPTION_TIMER)
 - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 -  vmcs12-vmx_preemption_timer_value);
 + if (vmcs12-pin_based_vm_exec_control  PIN_BASED_VMX_PREEMPTION_TIMER) 
 {
 + if (vmcs12-vm_exit_controls  
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 + vmcs12-vmx_preemption_timer_value =
 + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 + else
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 + vmcs12-vmx_preemption_timer_value);
 + }

This is not correct. We still need to set the vmcs to
vmx_preemption_timer_value. The difference is that, on exit from L2,
vmx_preemption_timer_value has to be updated according to the saved
hardware state. The corresponding code is missing in your patch so far.

  
   /*
* Whether page-faults are trapped is determined by a combination of
 @@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
* we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
* bits are further modified by vmx_set_efer() below.
*/
 - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
 + if (vmcs12-pin_based_vm_exec_control  PIN_BASED_VMX_PREEMPTION_TIMER)
 + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl |
 + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
 + else
 + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);

Let's prepare the value for VM_EXIT_CONTROLS in a local variable first,
then write it to the vmcs.

  
   /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
* emulated by vmx_set_efer(), below.
 @@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool 
 launch)
   }
  
   /*
 +  * If L2 support PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support
 +  * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER.
 +  */
 + if ((vmcs12-pin_based_vm_exec_control  
 PIN_BASED_VMX_PREEMPTION_TIMER) 
 + !(nested_vmx_exit_ctls_high  
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
 + nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
 + return 1;
 + }

Nope, the guest is free to run the preemption timer without saving on
exits. It may have a valid use case for this, e.g. that it will always
reprogram it on entry.

 +
 + /*
* We're finally done with prerequisite checking, and can start with
* the nested entry.
*/
 

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v3 0/6] KVM: nVMX: Enable unrestricted guest mode and fix some nEPT issues

2013-08-25 Thread Jan Kiszka
On 2013-08-08 16:26, Jan Kiszka wrote:
 These patches apply on top of kvm.git queue.
 
 Changes in v3:
  - rebased over queue
  - added Do not set identity page map for L2
  - dropped Fix guest CR3 read-back on VM-exit
 
 Jan Kiszka (6):
   KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0 in
 load_vmcs12_host_state
   KVM: nVMX: Do not set identity page map for L2
   KVM: nVMX: Load nEPT state after EFER
   KVM: nVMX: Implement support for EFER saving on VM-exit
   KVM: nVMX: Update mmu.base_role.nxe after EFER loading on
 VM-entry/exit
   KVM: nVMX: Enable unrestricted guest mode support
 
  arch/x86/kvm/vmx.c |   44 +++-
  1 files changed, 31 insertions(+), 13 deletions(-)
 

Ping for this series. It still applies, now to next, without conflicts.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] kvm: use anon_inode_getfd() with O_CLOEXEC flag

2013-08-25 Thread Paolo Bonzini
Il 24/08/2013 22:14, Yann Droneaud ha scritto:
 KVM uses anon_inode_get() to allocate file descriptors as part
 of some of its ioctls. But those ioctls are lacking a flag argument
 allowing userspace to choose options for the newly opened file descriptor.
 
 In such case it's advised to use O_CLOEXEC by default so that
 userspace is allowed to choose, without race, if the file descriptor
 is going to be inherited across exec().
 
 This patch set O_CLOEXEC flag on all file descriptors created
 with anon_inode_getfd() to not leak file descriptors across exec().
 
 Signed-off-by: Yann Droneaud ydrone...@opteya.com
 Link: http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

 ---
  virt/kvm/kvm_main.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 89f74d1..d65cc0c 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -1896,7 +1896,7 @@ static struct file_operations kvm_vcpu_fops = {
   */
  static int create_vcpu_fd(struct kvm_vcpu *vcpu)
  {
 - return anon_inode_getfd(kvm-vcpu, kvm_vcpu_fops, vcpu, O_RDWR);
 + return anon_inode_getfd(kvm-vcpu, kvm_vcpu_fops, vcpu, O_RDWR | 
 O_CLOEXEC);
  }
  
  /*
 @@ -2306,7 +2306,7 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
   return ret;
   }
  
 - ret = anon_inode_getfd(ops-name, kvm_device_fops, dev, O_RDWR);
 + ret = anon_inode_getfd(ops-name, kvm_device_fops, dev, O_RDWR | 
 O_CLOEXEC);
   if (ret  0) {
   ops-destroy(dev);
   return ret;
 @@ -2590,7 +2590,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
   return r;
   }
  #endif
 - r = anon_inode_getfd(kvm-vm, kvm_vm_fops, kvm, O_RDWR);
 + r = anon_inode_getfd(kvm-vm, kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC);
   if (r  0)
   kvm_put_kvm(kvm);
  
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Arthur Chunqi Li
On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  arch/x86/kvm/vmx.c |   30 +-
  1 file changed, 25 insertions(+), 5 deletions(-)

 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 57b4e12..9579409 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
  #ifdef CONFIG_X86_64
   VM_EXIT_HOST_ADDR_SPACE_SIZE |
  #endif
 - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
 + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
 + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
   nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 VM_EXIT_LOAD_IA32_EFER);

 In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide
 PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its
 behavior properly in that case.


 @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
   (vmcs_config.pin_based_exec_ctrl |
vmcs12-pin_based_vm_exec_control));

 - if (vmcs12-pin_based_vm_exec_control  PIN_BASED_VMX_PREEMPTION_TIMER)
 - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 -  vmcs12-vmx_preemption_timer_value);
 + if (vmcs12-pin_based_vm_exec_control  
 PIN_BASED_VMX_PREEMPTION_TIMER) {
 + if (vmcs12-vm_exit_controls  
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 + vmcs12-vmx_preemption_timer_value =
 + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 + else
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 + vmcs12-vmx_preemption_timer_value);
 + }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so far.


   /*
* Whether page-faults are trapped is determined by a combination of
 @@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
* we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
* bits are further modified by vmx_set_efer() below.
*/
 - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
 + if (vmcs12-pin_based_vm_exec_control  PIN_BASED_VMX_PREEMPTION_TIMER)
 + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl |
 + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
 + else
 + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);

 Let's prepare the value for VM_EXIT_CONTROLS in a local variable first,
 then write it to the vmcs.


   /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
* emulated by vmx_set_efer(), below.
 @@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool 
 launch)
   }

   /*
 +  * If L2 support PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support
 +  * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER.
 +  */
 + if ((vmcs12-pin_based_vm_exec_control  
 PIN_BASED_VMX_PREEMPTION_TIMER) 
 + !(nested_vmx_exit_ctls_high  
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
 + nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
 + return 1;
 + }

 Nope, the guest is free to run the preemption timer without saving on
 exits. It may have a valid use case for this, e.g. that it will always
 reprogram it on entry.
Here !(nested_vmx_exit_ctls_high 
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) is used to detect if hardware
support save preemption timer feature, which means if L2 supports
pinbased vmx preemption timer, host must support save preemption
timer feature. Though nested_vmx_exit_ctls_* is used for nested env,
but it can also used to reflect the host's feature. Here is what I
discuss with you yesterday, and we can also get the feature via
rdmsr here to avoid the confusion.

Arthur

 +
 + /*
* We're finally done with prerequisite checking, and can start with
* the nested entry.
*/


 Jan

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Jan Kiszka
On 2013-08-25 09:24, Arthur Chunqi Li wrote:
 On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  arch/x86/kvm/vmx.c |   30 +-
  1 file changed, 25 insertions(+), 5 deletions(-)

 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 57b4e12..9579409 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
  #ifdef CONFIG_X86_64
   VM_EXIT_HOST_ADDR_SPACE_SIZE |
  #endif
 - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
 + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
 + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
   nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 VM_EXIT_LOAD_IA32_EFER);

 In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide
 PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its
 behavior properly in that case.


 @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
   (vmcs_config.pin_based_exec_ctrl |
vmcs12-pin_based_vm_exec_control));

 - if (vmcs12-pin_based_vm_exec_control  
 PIN_BASED_VMX_PREEMPTION_TIMER)
 - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 -  vmcs12-vmx_preemption_timer_value);
 + if (vmcs12-pin_based_vm_exec_control  
 PIN_BASED_VMX_PREEMPTION_TIMER) {
 + if (vmcs12-vm_exit_controls  
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 + vmcs12-vmx_preemption_timer_value =
 + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 + else
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 + vmcs12-vmx_preemption_timer_value);
 + }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so far.


   /*
* Whether page-faults are trapped is determined by a combination of
 @@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
* we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
* bits are further modified by vmx_set_efer() below.
*/
 - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
 + if (vmcs12-pin_based_vm_exec_control  
 PIN_BASED_VMX_PREEMPTION_TIMER)
 + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl |
 + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
 + else
 + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);

 Let's prepare the value for VM_EXIT_CONTROLS in a local variable first,
 then write it to the vmcs.


   /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
* emulated by vmx_set_efer(), below.
 @@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, 
 bool launch)
   }

   /*
 +  * If L2 support PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support
 +  * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER.
 +  */
 + if ((vmcs12-pin_based_vm_exec_control  
 PIN_BASED_VMX_PREEMPTION_TIMER) 
 + !(nested_vmx_exit_ctls_high  
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
 + nested_vmx_failValid(vcpu, 
 VMXERR_ENTRY_INVALID_CONTROL_FIELD);
 + return 1;
 + }

 Nope, the guest is free to run the preemption timer without saving on
 exits. It may have a valid use case for this, e.g. that it will always
 reprogram it on entry.
 Here !(nested_vmx_exit_ctls_high 
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) is used to detect if hardware
 support save preemption timer feature, which means if L2 supports
 pinbased vmx preemption timer, host must support save preemption
 timer feature.

Sorry, parsed the code incorrectly.

 Though nested_vmx_exit_ctls_* is used for nested env,
 but it can also used to reflect the host's feature. Here is what I
 discuss with you yesterday, and we can also get the feature via
 rdmsr here to avoid the confusion.

Yes. The point is that we will not even expose
PIN_BASED_VMX_PREEMPTION_TIMER if VM_EXIT_SAVE_VMX_PREEMPTION_TIMER is
missing. If the guest then requests the former, it simply sets an
invalid pin-based control value which we already catch and report. So
this hunk becomes redundant.

Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Abel Gordon


 From: Jan Kiszka jan.kis...@web.de
 To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
 Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
 Date: 25/08/2013 09:44 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-24 20:44, root wrote:
  This patch contains the following two changes:
  1. Fix the bug in nested preemption timer support. If vmexit L2-L0
  with some reasons not emulated by L1, preemption timer value should
  be save in such exits.
  2. Add support of Save VMX-preemption timer value VM-Exit controls
  to nVMX.
 
  With this patch, nested VMX preemption timer features are fully
  supported.
 
  Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
  ---

 
  @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
 *vcpu, struct vmcs12 *vmcs12)
 (vmcs_config.pin_based_exec_ctrl |
  vmcs12-pin_based_vm_exec_control));
 
  -   if (vmcs12-pin_based_vm_exec_control 
PIN_BASED_VMX_PREEMPTION_TIMER)
  -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  -  vmcs12-vmx_preemption_timer_value);
  +   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER) {
  +  if (vmcs12-vm_exit_controls 
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
  + vmcs12-vmx_preemption_timer_value =
  +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
  +  else
  + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  +   vmcs12-vmx_preemption_timer_value);
  +   }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so far.

I think something else maybe be missing here: assuming L0 handles exits
for L2 without involving L1 (e.g. external interrupts or ept violations),
then, we may spend some cycles in L0 handling these exits. Note L1 is not
aware of these exits and from L1 perspective L2 was running on the CPU.
That means that we may need to reduce these cycles spent at
L0 from the preemtion timer or emulate a preemption timer exit to
force a transition to L1 instead of resuming L2.



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Arthur Chunqi Li
On Sun, Aug 25, 2013 at 3:28 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-25 09:24, Arthur Chunqi Li wrote:
 On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  arch/x86/kvm/vmx.c |   30 +-
  1 file changed, 25 insertions(+), 5 deletions(-)

 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 57b4e12..9579409 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
  #ifdef CONFIG_X86_64
   VM_EXIT_HOST_ADDR_SPACE_SIZE |
  #endif
 - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
 + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
 + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
   nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 VM_EXIT_LOAD_IA32_EFER);

 In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide
 PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its
 behavior properly in that case.
Besides, we need to test that in the absence of
PIN_BASED_VMX_PREEMPTION_TIMER, we need to hide
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though this should not happen
according to Intel SDM.


 @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
   (vmcs_config.pin_based_exec_ctrl |
vmcs12-pin_based_vm_exec_control));

 - if (vmcs12-pin_based_vm_exec_control  
 PIN_BASED_VMX_PREEMPTION_TIMER)
 - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 -  vmcs12-vmx_preemption_timer_value);
 + if (vmcs12-pin_based_vm_exec_control  
 PIN_BASED_VMX_PREEMPTION_TIMER) {
 + if (vmcs12-vm_exit_controls  
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 + vmcs12-vmx_preemption_timer_value =
 + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 + else
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 + vmcs12-vmx_preemption_timer_value);
 + }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so far.


   /*
* Whether page-faults are trapped is determined by a combination of
 @@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
* we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
* bits are further modified by vmx_set_efer() below.
*/
 - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
 + if (vmcs12-pin_based_vm_exec_control  
 PIN_BASED_VMX_PREEMPTION_TIMER)
 + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl |
 + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
 + else
 + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);

 Let's prepare the value for VM_EXIT_CONTROLS in a local variable first,
 then write it to the vmcs.


   /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
* emulated by vmx_set_efer(), below.
 @@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, 
 bool launch)
   }

   /*
 +  * If L2 support PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support
 +  * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER.
 +  */
 + if ((vmcs12-pin_based_vm_exec_control  
 PIN_BASED_VMX_PREEMPTION_TIMER) 
 + !(nested_vmx_exit_ctls_high  
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
 + nested_vmx_failValid(vcpu, 
 VMXERR_ENTRY_INVALID_CONTROL_FIELD);
 + return 1;
 + }

 Nope, the guest is free to run the preemption timer without saving on
 exits. It may have a valid use case for this, e.g. that it will always
 reprogram it on entry.
 Here !(nested_vmx_exit_ctls_high 
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) is used to detect if hardware
 support save preemption timer feature, which means if L2 supports
 pinbased vmx preemption timer, host must support save preemption
 timer feature.

 Sorry, parsed the code incorrectly.

 Though nested_vmx_exit_ctls_* is used for nested env,
 but it can also used to reflect the host's feature. Here is what I
 discuss with you yesterday, and we can also get the feature via
 rdmsr here to avoid the confusion.

 Yes. The point is that we 

Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Jan Kiszka
On 2013-08-25 09:37, Abel Gordon wrote:
 
 
 From: Jan Kiszka jan.kis...@web.de
 To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
 Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
 Date: 25/08/2013 09:44 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
 

 @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
 *vcpu, struct vmcs12 *vmcs12)
(vmcs_config.pin_based_exec_ctrl |
 vmcs12-pin_based_vm_exec_control));

 -   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER)
 -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 -  vmcs12-vmx_preemption_timer_value);
 +   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER) {
 +  if (vmcs12-vm_exit_controls 
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 + vmcs12-vmx_preemption_timer_value =
 +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 +  else
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 +   vmcs12-vmx_preemption_timer_value);
 +   }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so far.
 
 I think something else maybe be missing here: assuming L0 handles exits
 for L2 without involving L1 (e.g. external interrupts or ept violations),
 then, we may spend some cycles in L0 handling these exits. Note L1 is not
 aware of these exits and from L1 perspective L2 was running on the CPU.
 That means that we may need to reduce these cycles spent at
 L0 from the preemtion timer or emulate a preemption timer exit to
 force a transition to L1 instead of resuming L2.

That's precisely what the logic I described should achieve: reload the
value we saved on L2 exit on reentry.

Jan




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Arthur Chunqi Li
On Sun, Aug 25, 2013 at 3:37 PM, Abel Gordon ab...@il.ibm.com wrote:


 From: Jan Kiszka jan.kis...@web.de
 To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
 Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
 Date: 25/08/2013 09:44 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-24 20:44, root wrote:
  This patch contains the following two changes:
  1. Fix the bug in nested preemption timer support. If vmexit L2-L0
  with some reasons not emulated by L1, preemption timer value should
  be save in such exits.
  2. Add support of Save VMX-preemption timer value VM-Exit controls
  to nVMX.
 
  With this patch, nested VMX preemption timer features are fully
  supported.
 
  Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
  ---

 
  @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
 *vcpu, struct vmcs12 *vmcs12)
 (vmcs_config.pin_based_exec_ctrl |
  vmcs12-pin_based_vm_exec_control));
 
  -   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER)
  -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  -  vmcs12-vmx_preemption_timer_value);
  +   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER) {
  +  if (vmcs12-vm_exit_controls 
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
  + vmcs12-vmx_preemption_timer_value =
  +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
  +  else
  + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  +   vmcs12-vmx_preemption_timer_value);
  +   }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so far.

 I think something else maybe be missing here: assuming L0 handles exits
 for L2 without involving L1 (e.g. external interrupts or ept violations),
 then, we may spend some cycles in L0 handling these exits. Note L1 is not
 aware of these exits and from L1 perspective L2 was running on the CPU.
 That means that we may need to reduce these cycles spent at
 L0 from the preemtion timer or emulate a preemption timer exit to
 force a transition to L1 instead of resuming L2.
My solution is setting save preemption value feature of L2 if L2
sets vmx preemption timer feature, thus external interrupts (or
others) will save the exact value in L2's vmcs, and the resume of L2
will load the value in L2's vmcs. Thus cycles of handling these vmexit
in L0 will not affect L2's preemption value.

Arthur



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Jan Kiszka
On 2013-08-25 09:37, Arthur Chunqi Li wrote:
 On Sun, Aug 25, 2013 at 3:28 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-25 09:24, Arthur Chunqi Li wrote:
 On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  arch/x86/kvm/vmx.c |   30 +-
  1 file changed, 25 insertions(+), 5 deletions(-)

 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 57b4e12..9579409 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
  #ifdef CONFIG_X86_64
   VM_EXIT_HOST_ADDR_SPACE_SIZE |
  #endif
 - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
 + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
 + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
   nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 VM_EXIT_LOAD_IA32_EFER);

 In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide
 PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its
 behavior properly in that case.
 Besides, we need to test that in the absence of
 PIN_BASED_VMX_PREEMPTION_TIMER, we need to hide
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though this should not happen
 according to Intel SDM.

If the SDM guarantees this for us, we don't need such a safety measure.
Otherwise, it should be added, yes.

Jan




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Arthur Chunqi Li
On Sun, Aug 25, 2013 at 3:44 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-25 09:37, Arthur Chunqi Li wrote:
 On Sun, Aug 25, 2013 at 3:28 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-25 09:24, Arthur Chunqi Li wrote:
 On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  arch/x86/kvm/vmx.c |   30 +-
  1 file changed, 25 insertions(+), 5 deletions(-)

 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 57b4e12..9579409 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
  #ifdef CONFIG_X86_64
   VM_EXIT_HOST_ADDR_SPACE_SIZE |
  #endif
 - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
 + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
 + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
   nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 VM_EXIT_LOAD_IA32_EFER);

 In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide
 PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its
 behavior properly in that case.
 Besides, we need to test that in the absence of
 PIN_BASED_VMX_PREEMPTION_TIMER, we need to hide
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though this should not happen
 according to Intel SDM.

 If the SDM guarantees this for us, we don't need such a safety measure.
 Otherwise, it should be added, yes.
The SDM has such description (see 26.2.1.2):

If “activate VMX-preemption timer” VM-execution control is 0, the
“save VMX-preemption timer value” VM-exit
control must also be 0.

It doesn't tell us if these two flags are consistent when getting them
from related MSR (IA32_VMX_PINBASED_CTLS and IA32_VMX_EXIT_CTLS). So I
think the check is needed here.

Arthur

 Jan


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Abel Gordon


kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org,
 pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
 Date: 25/08/2013 10:43 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:37, Abel Gordon wrote:
 
 
  From: Jan Kiszka jan.kis...@web.de
  To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
  Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
  Date: 25/08/2013 09:44 AM
  Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
  timer
  Sent by: kvm-ow...@vger.kernel.org
 
  On 2013-08-24 20:44, root wrote:
  This patch contains the following two changes:
  1. Fix the bug in nested preemption timer support. If vmexit L2-L0
  with some reasons not emulated by L1, preemption timer value should
  be save in such exits.
  2. Add support of Save VMX-preemption timer value VM-Exit controls
  to nVMX.
 
  With this patch, nested VMX preemption timer features are fully
  supported.
 
  Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
  ---
 
 
  @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
  *vcpu, struct vmcs12 *vmcs12)
 (vmcs_config.pin_based_exec_ctrl |
  vmcs12-pin_based_vm_exec_control));
 
  -   if (vmcs12-pin_based_vm_exec_control 
  PIN_BASED_VMX_PREEMPTION_TIMER)
  -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  -  vmcs12-vmx_preemption_timer_value);
  +   if (vmcs12-pin_based_vm_exec_control 
  PIN_BASED_VMX_PREEMPTION_TIMER) {
  +  if (vmcs12-vm_exit_controls 
  VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
  + vmcs12-vmx_preemption_timer_value =
  +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
  +  else
  + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  +   vmcs12-vmx_preemption_timer_value);
  +   }
 
  This is not correct. We still need to set the vmcs to
  vmx_preemption_timer_value. The difference is that, on exit from L2,
  vmx_preemption_timer_value has to be updated according to the saved
  hardware state. The corresponding code is missing in your patch so
far.
 
  I think something else maybe be missing here: assuming L0 handles exits
  for L2 without involving L1 (e.g. external interrupts or ept
violations),
  then, we may spend some cycles in L0 handling these exits. Note L1 is
not
  aware of these exits and from L1 perspective L2 was running on the CPU.
  That means that we may need to reduce these cycles spent at
  L0 from the preemtion timer or emulate a preemption timer exit to
  force a transition to L1 instead of resuming L2.

 That's precisely what the logic I described should achieve: reload the
 value we saved on L2 exit on reentry.

But don't you think we should also reduce the cycles spent at L0 from the
preemption timer ? I mean, if we spent X cycles at L0 handling a L2 exit
which was not forwarded to L1, then, before we resume L2,
the preemption timer should be: (previous_value_on_exit - X).
If (previous_value_on_exit - X)  0, then we should force (emulate) a
preemption timer exit between L2 and L1.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Jan Kiszka
On 2013-08-25 09:50, Abel Gordon wrote:
 
 
 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:
 
 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org,
 pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
 Date: 25/08/2013 10:43 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:37, Abel Gordon wrote:


 From: Jan Kiszka jan.kis...@web.de
 To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
 Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
 Date: 25/08/2013 09:44 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---


 @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
 *vcpu, struct vmcs12 *vmcs12)
(vmcs_config.pin_based_exec_ctrl |
 vmcs12-pin_based_vm_exec_control));

 -   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER)
 -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 -  vmcs12-vmx_preemption_timer_value);
 +   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER) {
 +  if (vmcs12-vm_exit_controls 
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 + vmcs12-vmx_preemption_timer_value =
 +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 +  else
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 +   vmcs12-vmx_preemption_timer_value);
 +   }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so
 far.

 I think something else maybe be missing here: assuming L0 handles exits
 for L2 without involving L1 (e.g. external interrupts or ept
 violations),
 then, we may spend some cycles in L0 handling these exits. Note L1 is
 not
 aware of these exits and from L1 perspective L2 was running on the CPU.
 That means that we may need to reduce these cycles spent at
 L0 from the preemtion timer or emulate a preemption timer exit to
 force a transition to L1 instead of resuming L2.

 That's precisely what the logic I described should achieve: reload the
 value we saved on L2 exit on reentry.
 
 But don't you think we should also reduce the cycles spent at L0 from the
 preemption timer ? I mean, if we spent X cycles at L0 handling a L2 exit
 which was not forwarded to L1, then, before we resume L2,
 the preemption timer should be: (previous_value_on_exit - X).
 If (previous_value_on_exit - X)  0, then we should force (emulate) a
 preemption timer exit between L2 and L1.

We ask the hardware to save the value of the preemption on L2 exit. This
value will be exposed to L1 (if it asked for saving as well) and/or be
written back to the hardware on L2 reenty (unless L1 had a chance to run
and modified it). So the time spent in L0 is implicitly subtracted.

Jan

PS: You had kvm-owner in CC.



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Arthur Chunqi Li
On Sun, Aug 25, 2013 at 3:50 PM, Abel Gordon ab...@il.ibm.com wrote:


 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org,
 pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
 Date: 25/08/2013 10:43 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:37, Abel Gordon wrote:
 
 
  From: Jan Kiszka jan.kis...@web.de
  To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
  Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
  Date: 25/08/2013 09:44 AM
  Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
  timer
  Sent by: kvm-ow...@vger.kernel.org
 
  On 2013-08-24 20:44, root wrote:
  This patch contains the following two changes:
  1. Fix the bug in nested preemption timer support. If vmexit L2-L0
  with some reasons not emulated by L1, preemption timer value should
  be save in such exits.
  2. Add support of Save VMX-preemption timer value VM-Exit controls
  to nVMX.
 
  With this patch, nested VMX preemption timer features are fully
  supported.
 
  Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
  ---
 
 
  @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
  *vcpu, struct vmcs12 *vmcs12)
 (vmcs_config.pin_based_exec_ctrl |
  vmcs12-pin_based_vm_exec_control));
 
  -   if (vmcs12-pin_based_vm_exec_control 
  PIN_BASED_VMX_PREEMPTION_TIMER)
  -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  -  vmcs12-vmx_preemption_timer_value);
  +   if (vmcs12-pin_based_vm_exec_control 
  PIN_BASED_VMX_PREEMPTION_TIMER) {
  +  if (vmcs12-vm_exit_controls 
  VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
  + vmcs12-vmx_preemption_timer_value =
  +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
  +  else
  + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  +   vmcs12-vmx_preemption_timer_value);
  +   }
 
  This is not correct. We still need to set the vmcs to
  vmx_preemption_timer_value. The difference is that, on exit from L2,
  vmx_preemption_timer_value has to be updated according to the saved
  hardware state. The corresponding code is missing in your patch so
 far.
 
  I think something else maybe be missing here: assuming L0 handles exits
  for L2 without involving L1 (e.g. external interrupts or ept
 violations),
  then, we may spend some cycles in L0 handling these exits. Note L1 is
 not
  aware of these exits and from L1 perspective L2 was running on the CPU.
  That means that we may need to reduce these cycles spent at
  L0 from the preemtion timer or emulate a preemption timer exit to
  force a transition to L1 instead of resuming L2.

 That's precisely what the logic I described should achieve: reload the
 value we saved on L2 exit on reentry.

 But don't you think we should also reduce the cycles spent at L0 from the
 preemption timer ? I mean, if we spent X cycles at L0 handling a L2 exit
 which was not forwarded to L1, then, before we resume L2,
 the preemption timer should be: (previous_value_on_exit - X).
 If (previous_value_on_exit - X)  0, then we should force (emulate) a
 preemption timer exit between L2 and L1.
Sorry, I previously misunderstand your comments. But why should we
need to exclude cycles in L0 from L2 preemption value? These cycles
are not spent by L2 and it should not be on L2.

Arthur


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Abel Gordon


kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:55:24 AM:

 From: Arthur Chunqi Li yzt...@gmail.com
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: Jan Kiszka jan.kis...@web.de, Gleb Natapov g...@redhat.com,
 kvm kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, Paolo Bonzini
 pbonz...@redhat.com
 Date: 25/08/2013 10:55 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
timer
 Sent by: kvm-ow...@vger.kernel.org

 On Sun, Aug 25, 2013 at 3:50 PM, Abel Gordon ab...@il.ibm.com wrote:
 
 
  kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:
 
  From: Jan Kiszka jan.kis...@web.de
  To: Abel Gordon/Haifa/IBM@IBMIL,
  Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org,
  pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
  Date: 25/08/2013 10:43 AM
  Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
  timer
  Sent by: kvm-ow...@vger.kernel.org
 
  On 2013-08-25 09:37, Abel Gordon wrote:
  
  
   From: Jan Kiszka jan.kis...@web.de
   To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
   Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
   Date: 25/08/2013 09:44 AM
   Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
preemption
   timer
   Sent by: kvm-ow...@vger.kernel.org
  
   On 2013-08-24 20:44, root wrote:
   This patch contains the following two changes:
   1. Fix the bug in nested preemption timer support. If vmexit L2-
L0
   with some reasons not emulated by L1, preemption timer value
should
   be save in such exits.
   2. Add support of Save VMX-preemption timer value VM-Exit
controls
   to nVMX.
  
   With this patch, nested VMX preemption timer features are fully
   supported.
  
   Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
   ---
  
  
   @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
   *vcpu, struct vmcs12 *vmcs12)
  (vmcs_config.pin_based_exec_ctrl |
   vmcs12-pin_based_vm_exec_control));
  
   -   if (vmcs12-pin_based_vm_exec_control 
   PIN_BASED_VMX_PREEMPTION_TIMER)
   -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
   -  vmcs12-vmx_preemption_timer_value);
   +   if (vmcs12-pin_based_vm_exec_control 
   PIN_BASED_VMX_PREEMPTION_TIMER) {
   +  if (vmcs12-vm_exit_controls 
   VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
   + vmcs12-vmx_preemption_timer_value =
   +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
   +  else
   + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
   +   vmcs12-vmx_preemption_timer_value);
   +   }
  
   This is not correct. We still need to set the vmcs to
   vmx_preemption_timer_value. The difference is that, on exit from
L2,
   vmx_preemption_timer_value has to be updated according to the saved
   hardware state. The corresponding code is missing in your patch so
  far.
  
   I think something else maybe be missing here: assuming L0 handles
exits
   for L2 without involving L1 (e.g. external interrupts or ept
  violations),
   then, we may spend some cycles in L0 handling these exits. Note L1
is
  not
   aware of these exits and from L1 perspective L2 was running on the
CPU.
   That means that we may need to reduce these cycles spent at
   L0 from the preemtion timer or emulate a preemption timer exit to
   force a transition to L1 instead of resuming L2.
 
  That's precisely what the logic I described should achieve: reload the
  value we saved on L2 exit on reentry.
 
  But don't you think we should also reduce the cycles spent at L0 from
the
  preemption timer ? I mean, if we spent X cycles at L0 handling a L2
exit
  which was not forwarded to L1, then, before we resume L2,
  the preemption timer should be: (previous_value_on_exit - X).
  If (previous_value_on_exit - X)  0, then we should force (emulate) a
  preemption timer exit between L2 and L1.
 Sorry, I previously misunderstand your comments. But why should we
 need to exclude cycles in L0 from L2 preemption value? These cycles
 are not spent by L2 and it should not be on L2.

L1 asked the hardware (emulated by L0) to run L2 and force an exit
after Y cycles. Now, in practice, we may spend X cycles at L0 handling
exits without switching to L1. That means that from L1 perspective L2
was running all these X cycles. L1 should assume that the instructions per
cycle
the CPU executed decreased but the cycles were spent. That's why I believe
you should take in account these X cycles.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Jan Kiszka
On 2013-08-25 10:04, Abel Gordon wrote:
 
 
 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:55:24 AM:
 
 From: Arthur Chunqi Li yzt...@gmail.com
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: Jan Kiszka jan.kis...@web.de, Gleb Natapov g...@redhat.com,
 kvm kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, Paolo Bonzini
 pbonz...@redhat.com
 Date: 25/08/2013 10:55 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On Sun, Aug 25, 2013 at 3:50 PM, Abel Gordon ab...@il.ibm.com wrote:


 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org,
 pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
 Date: 25/08/2013 10:43 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:37, Abel Gordon wrote:


 From: Jan Kiszka jan.kis...@web.de
 To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
 Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
 Date: 25/08/2013 09:44 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
 preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-
 L0
 with some reasons not emulated by L1, preemption timer value
 should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit
 controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---


 @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
 *vcpu, struct vmcs12 *vmcs12)
(vmcs_config.pin_based_exec_ctrl |
 vmcs12-pin_based_vm_exec_control));

 -   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER)
 -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 -  vmcs12-vmx_preemption_timer_value);
 +   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER) {
 +  if (vmcs12-vm_exit_controls 
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 + vmcs12-vmx_preemption_timer_value =
 +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 +  else
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 +   vmcs12-vmx_preemption_timer_value);
 +   }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from
 L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so
 far.

 I think something else maybe be missing here: assuming L0 handles
 exits
 for L2 without involving L1 (e.g. external interrupts or ept
 violations),
 then, we may spend some cycles in L0 handling these exits. Note L1
 is
 not
 aware of these exits and from L1 perspective L2 was running on the
 CPU.
 That means that we may need to reduce these cycles spent at
 L0 from the preemtion timer or emulate a preemption timer exit to
 force a transition to L1 instead of resuming L2.

 That's precisely what the logic I described should achieve: reload the
 value we saved on L2 exit on reentry.

 But don't you think we should also reduce the cycles spent at L0 from
 the
 preemption timer ? I mean, if we spent X cycles at L0 handling a L2
 exit
 which was not forwarded to L1, then, before we resume L2,
 the preemption timer should be: (previous_value_on_exit - X).
 If (previous_value_on_exit - X)  0, then we should force (emulate) a
 preemption timer exit between L2 and L1.
 Sorry, I previously misunderstand your comments. But why should we
 need to exclude cycles in L0 from L2 preemption value? These cycles
 are not spent by L2 and it should not be on L2.
 
 L1 asked the hardware (emulated by L0) to run L2 and force an exit
 after Y cycles. Now, in practice, we may spend X cycles at L0 handling
 exits without switching to L1. That means that from L1 perspective L2
 was running all these X cycles. L1 should assume that the instructions per
 cycle
 the CPU executed decreased but the cycles were spent. That's why I believe
 you should take in account these X cycles.
 

Now I get it. There is likely some truth in this as the reference clock
for the preemption timer, the TSC, isn't stopped for L1/L2 while running
in L0. And the SDM demands the countdown to be proportional to that clock.

Jan




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Abel Gordon


kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com,
 李春奇 Arthur Chunqi Li  yzt...@gmail.com
 Date: 25/08/2013 10:54 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:50, Abel Gordon wrote:
 
 
  kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:
 
  From: Jan Kiszka jan.kis...@web.de
  To: Abel Gordon/Haifa/IBM@IBMIL,
  Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org,
  pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
  Date: 25/08/2013 10:43 AM
  Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
  timer
  Sent by: kvm-ow...@vger.kernel.org
 
  On 2013-08-25 09:37, Abel Gordon wrote:
 
 
  From: Jan Kiszka jan.kis...@web.de
  To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
  Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
  Date: 25/08/2013 09:44 AM
  Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
preemption
  timer
  Sent by: kvm-ow...@vger.kernel.org
 
  On 2013-08-24 20:44, root wrote:
  This patch contains the following two changes:
  1. Fix the bug in nested preemption timer support. If vmexit L2-L0
  with some reasons not emulated by L1, preemption timer value should
  be save in such exits.
  2. Add support of Save VMX-preemption timer value VM-Exit
controls
  to nVMX.
 
  With this patch, nested VMX preemption timer features are fully
  supported.
 
  Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
  ---
 
 
  @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
  *vcpu, struct vmcs12 *vmcs12)
 (vmcs_config.pin_based_exec_ctrl |
  vmcs12-pin_based_vm_exec_control));
 
  -   if (vmcs12-pin_based_vm_exec_control 
  PIN_BASED_VMX_PREEMPTION_TIMER)
  -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  -  vmcs12-vmx_preemption_timer_value);
  +   if (vmcs12-pin_based_vm_exec_control 
  PIN_BASED_VMX_PREEMPTION_TIMER) {
  +  if (vmcs12-vm_exit_controls 
  VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
  + vmcs12-vmx_preemption_timer_value =
  +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
  +  else
  + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  +   vmcs12-vmx_preemption_timer_value);
  +   }
 
  This is not correct. We still need to set the vmcs to
  vmx_preemption_timer_value. The difference is that, on exit from L2,
  vmx_preemption_timer_value has to be updated according to the saved
  hardware state. The corresponding code is missing in your patch so
  far.
 
  I think something else maybe be missing here: assuming L0 handles
exits
  for L2 without involving L1 (e.g. external interrupts or ept
  violations),
  then, we may spend some cycles in L0 handling these exits. Note L1 is
  not
  aware of these exits and from L1 perspective L2 was running on the
CPU.
  That means that we may need to reduce these cycles spent at
  L0 from the preemtion timer or emulate a preemption timer exit to
  force a transition to L1 instead of resuming L2.
 
  That's precisely what the logic I described should achieve: reload the
  value we saved on L2 exit on reentry.
 
  But don't you think we should also reduce the cycles spent at L0 from
the
  preemption timer ? I mean, if we spent X cycles at L0 handling a L2
exit
  which was not forwarded to L1, then, before we resume L2,
  the preemption timer should be: (previous_value_on_exit - X).
  If (previous_value_on_exit - X)  0, then we should force (emulate) a
  preemption timer exit between L2 and L1.

 We ask the hardware to save the value of the preemption on L2 exit. This
 value will be exposed to L1 (if it asked for saving as well) and/or be
 written back to the hardware on L2 reenty (unless L1 had a chance to run
 and modified it). So the time spent in L0 is implicitly subtracted.

I think you are suggesting the following, please correct me if I am wrong.
1) L1 resumes L2 with preemption timer enabled
2) L0 emulates the resume/launch
3) L2 runs for Y cycles until an external interrupt occurs (Y  preemption
timer specified by L1)
4) L0 saved the preemption timer (original value - Y)
5) L0 spends X cycles handling the external interrupt
6) L0 resumes L2 with preemption timer = original value - Y

Note that in this case X is ignored.

I was suggesting to do the following:
6) If original value - Y - X  0 then
 L0 resumes L2 with preemption timer = original value - Y - X
else
 L0 emulates a L2-L1 preemption timer exit (resumes L1)





--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Jan Kiszka
On 2013-08-25 10:18, Abel Gordon wrote:
 
 
 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM:
 
 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com,
 李春奇 Arthur Chunqi Li  yzt...@gmail.com
 Date: 25/08/2013 10:54 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:50, Abel Gordon wrote:


 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org,
 pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
 Date: 25/08/2013 10:43 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:37, Abel Gordon wrote:


 From: Jan Kiszka jan.kis...@web.de
 To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
 Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
 Date: 25/08/2013 09:44 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
 preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit
 controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---


 @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
 *vcpu, struct vmcs12 *vmcs12)
(vmcs_config.pin_based_exec_ctrl |
 vmcs12-pin_based_vm_exec_control));

 -   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER)
 -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 -  vmcs12-vmx_preemption_timer_value);
 +   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER) {
 +  if (vmcs12-vm_exit_controls 
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 + vmcs12-vmx_preemption_timer_value =
 +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 +  else
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 +   vmcs12-vmx_preemption_timer_value);
 +   }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so
 far.

 I think something else maybe be missing here: assuming L0 handles
 exits
 for L2 without involving L1 (e.g. external interrupts or ept
 violations),
 then, we may spend some cycles in L0 handling these exits. Note L1 is
 not
 aware of these exits and from L1 perspective L2 was running on the
 CPU.
 That means that we may need to reduce these cycles spent at
 L0 from the preemtion timer or emulate a preemption timer exit to
 force a transition to L1 instead of resuming L2.

 That's precisely what the logic I described should achieve: reload the
 value we saved on L2 exit on reentry.

 But don't you think we should also reduce the cycles spent at L0 from
 the
 preemption timer ? I mean, if we spent X cycles at L0 handling a L2
 exit
 which was not forwarded to L1, then, before we resume L2,
 the preemption timer should be: (previous_value_on_exit - X).
 If (previous_value_on_exit - X)  0, then we should force (emulate) a
 preemption timer exit between L2 and L1.

 We ask the hardware to save the value of the preemption on L2 exit. This
 value will be exposed to L1 (if it asked for saving as well) and/or be
 written back to the hardware on L2 reenty (unless L1 had a chance to run
 and modified it). So the time spent in L0 is implicitly subtracted.
 
 I think you are suggesting the following, please correct me if I am wrong.
 1) L1 resumes L2 with preemption timer enabled
 2) L0 emulates the resume/launch
 3) L2 runs for Y cycles until an external interrupt occurs (Y  preemption
 timer specified by L1)
 4) L0 saved the preemption timer (original value - Y)
 5) L0 spends X cycles handling the external interrupt
 6) L0 resumes L2 with preemption timer = original value - Y
 
 Note that in this case X is ignored.

Yes, but see my other reply.

 
 I was suggesting to do the following:
 6) If original value - Y - X  0 then
  L0 resumes L2 with preemption timer = original value - Y - X
 else
  L0 emulates a L2-L1 preemption timer exit (resumes L1)

Almost . 6) should be:
If exit to L1 occurred after last L2, set X to 0. Then load MAX(original
value - Y - X, 0).

The hardware will trigger the exit for us.

Jan




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Jan Kiszka
On 2013-08-25 10:25, Jan Kiszka wrote:
 On 2013-08-25 10:18, Abel Gordon wrote:


 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com,
 李春奇 Arthur Chunqi Li  yzt...@gmail.com
 Date: 25/08/2013 10:54 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:50, Abel Gordon wrote:


 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org,
 pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
 Date: 25/08/2013 10:43 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:37, Abel Gordon wrote:


 From: Jan Kiszka jan.kis...@web.de
 To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
 Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
 Date: 25/08/2013 09:44 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
 preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit
 controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---


 @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
 *vcpu, struct vmcs12 *vmcs12)
(vmcs_config.pin_based_exec_ctrl |
 vmcs12-pin_based_vm_exec_control));

 -   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER)
 -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 -  vmcs12-vmx_preemption_timer_value);
 +   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER) {
 +  if (vmcs12-vm_exit_controls 
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 + vmcs12-vmx_preemption_timer_value =
 +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 +  else
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 +   vmcs12-vmx_preemption_timer_value);
 +   }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so
 far.

 I think something else maybe be missing here: assuming L0 handles
 exits
 for L2 without involving L1 (e.g. external interrupts or ept
 violations),
 then, we may spend some cycles in L0 handling these exits. Note L1 is
 not
 aware of these exits and from L1 perspective L2 was running on the
 CPU.
 That means that we may need to reduce these cycles spent at
 L0 from the preemtion timer or emulate a preemption timer exit to
 force a transition to L1 instead of resuming L2.

 That's precisely what the logic I described should achieve: reload the
 value we saved on L2 exit on reentry.

 But don't you think we should also reduce the cycles spent at L0 from
 the
 preemption timer ? I mean, if we spent X cycles at L0 handling a L2
 exit
 which was not forwarded to L1, then, before we resume L2,
 the preemption timer should be: (previous_value_on_exit - X).
 If (previous_value_on_exit - X)  0, then we should force (emulate) a
 preemption timer exit between L2 and L1.

 We ask the hardware to save the value of the preemption on L2 exit. This
 value will be exposed to L1 (if it asked for saving as well) and/or be
 written back to the hardware on L2 reenty (unless L1 had a chance to run
 and modified it). So the time spent in L0 is implicitly subtracted.

 I think you are suggesting the following, please correct me if I am wrong.
 1) L1 resumes L2 with preemption timer enabled
 2) L0 emulates the resume/launch
 3) L2 runs for Y cycles until an external interrupt occurs (Y  preemption
 timer specified by L1)
 4) L0 saved the preemption timer (original value - Y)
 5) L0 spends X cycles handling the external interrupt
 6) L0 resumes L2 with preemption timer = original value - Y

 Note that in this case X is ignored.
 
 Yes, but see my other reply.
 

 I was suggesting to do the following:
 6) If original value - Y - X  0 then
  L0 resumes L2 with preemption timer = original value - Y - X
 else
  L0 emulates a L2-L1 preemption timer exit (resumes L1)
 
 Almost . 6) should be:
 If exit to L1 occurred after last L2, set X to 0. Then load MAX(original
 value - Y - X, 0).

Hmm, no:

If exit to L1 occurred after last L2, load value of vmcs12, else load
MAX(original
value - Y - X, 0).

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Abel Gordon


Jan Kiszka jan.kis...@web.de wrote on 25/08/2013 11:27:22 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com,
 李春奇 Arthur Chunqi Li yzt...@gmail.com
 Date: 25/08/2013 11:27 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
timer

 On 2013-08-25 10:25, Jan Kiszka wrote:
  On 2013-08-25 10:18, Abel Gordon wrote:
 
 
  kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM:
 
  From: Jan Kiszka jan.kis...@web.de
  To: Abel Gordon/Haifa/IBM@IBMIL,
  Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com,
  李春奇 Arthur Chunqi Li  yzt...@gmail.com
  Date: 25/08/2013 10:54 AM
  Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
preemption
  timer
  Sent by: kvm-ow...@vger.kernel.org
 
  On 2013-08-25 09:50, Abel Gordon wrote:
 
 
  kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:
 
  From: Jan Kiszka jan.kis...@web.de
  To: Abel Gordon/Haifa/IBM@IBMIL,
  Cc: g...@redhat.com, kvm@vger.kernel.org,
kvm-ow...@vger.kernel.org,
  pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
  Date: 25/08/2013 10:43 AM
  Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
preemption
  timer
  Sent by: kvm-ow...@vger.kernel.org
 
  On 2013-08-25 09:37, Abel Gordon wrote:
 
 
  From: Jan Kiszka jan.kis...@web.de
  To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
  Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
  Date: 25/08/2013 09:44 AM
  Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
  preemption
  timer
  Sent by: kvm-ow...@vger.kernel.org
 
  On 2013-08-24 20:44, root wrote:
  This patch contains the following two changes:
  1. Fix the bug in nested preemption timer support. If vmexit
L2-L0
  with some reasons not emulated by L1, preemption timer value
should
  be save in such exits.
  2. Add support of Save VMX-preemption timer value VM-Exit
  controls
  to nVMX.
 
  With this patch, nested VMX preemption timer features are fully
  supported.
 
  Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
  ---
 
 
  @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct
kvm_vcpu
  *vcpu, struct vmcs12 *vmcs12)
 (vmcs_config.pin_based_exec_ctrl |
  vmcs12-pin_based_vm_exec_control));
 
  -   if (vmcs12-pin_based_vm_exec_control 
  PIN_BASED_VMX_PREEMPTION_TIMER)
  -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  -  vmcs12-vmx_preemption_timer_value);
  +   if (vmcs12-pin_based_vm_exec_control 
  PIN_BASED_VMX_PREEMPTION_TIMER) {
  +  if (vmcs12-vm_exit_controls 
  VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
  + vmcs12-vmx_preemption_timer_value =
  +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
  +  else
  + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  +   vmcs12-vmx_preemption_timer_value);
  +   }
 
  This is not correct. We still need to set the vmcs to
  vmx_preemption_timer_value. The difference is that, on exit from
L2,
  vmx_preemption_timer_value has to be updated according to the
saved
  hardware state. The corresponding code is missing in your patch
so
  far.
 
  I think something else maybe be missing here: assuming L0 handles
  exits
  for L2 without involving L1 (e.g. external interrupts or ept
  violations),
  then, we may spend some cycles in L0 handling these exits. Note L1
is
  not
  aware of these exits and from L1 perspective L2 was running on the
  CPU.
  That means that we may need to reduce these cycles spent at
  L0 from the preemtion timer or emulate a preemption timer exit to
  force a transition to L1 instead of resuming L2.
 
  That's precisely what the logic I described should achieve: reload
the
  value we saved on L2 exit on reentry.
 
  But don't you think we should also reduce the cycles spent at L0
from
  the
  preemption timer ? I mean, if we spent X cycles at L0 handling a L2
  exit
  which was not forwarded to L1, then, before we resume L2,
  the preemption timer should be: (previous_value_on_exit - X).
  If (previous_value_on_exit - X)  0, then we should force
(emulate) a
  preemption timer exit between L2 and L1.
 
  We ask the hardware to save the value of the preemption on L2 exit.
This
  value will be exposed to L1 (if it asked for saving as well) and/or
be
  written back to the hardware on L2 reenty (unless L1 had a chance to
run
  and modified it). So the time spent in L0 is implicitly subtracted.
 
  I think you are suggesting the following, please correct me if I am
wrong.
  1) L1 resumes L2 with preemption timer enabled
  2) L0 emulates the resume/launch
  3) L2 runs for Y cycles until an external interrupt occurs (Y 
preemption
  timer specified by L1)
  4) L0 saved the preemption timer (original value - Y)
  5) L0 spends X cycles handling the external interrupt
  6) L0 resumes L2 with preemption timer = original value - Y
 
  Note that in this case X is ignored.
 
  Yes, but see my other reply.

I sent my reply before I read yours, sorry.

Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Arthur Chunqi Li
On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon ab...@il.ibm.com wrote:


 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com,
 李春奇 Arthur Chunqi Li  yzt...@gmail.com
 Date: 25/08/2013 10:54 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:50, Abel Gordon wrote:
 
 
  kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:
 
  From: Jan Kiszka jan.kis...@web.de
  To: Abel Gordon/Haifa/IBM@IBMIL,
  Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org,
  pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
  Date: 25/08/2013 10:43 AM
  Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
  timer
  Sent by: kvm-ow...@vger.kernel.org
 
  On 2013-08-25 09:37, Abel Gordon wrote:
 
 
  From: Jan Kiszka jan.kis...@web.de
  To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
  Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
  Date: 25/08/2013 09:44 AM
  Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
 preemption
  timer
  Sent by: kvm-ow...@vger.kernel.org
 
  On 2013-08-24 20:44, root wrote:
  This patch contains the following two changes:
  1. Fix the bug in nested preemption timer support. If vmexit L2-L0
  with some reasons not emulated by L1, preemption timer value should
  be save in such exits.
  2. Add support of Save VMX-preemption timer value VM-Exit
 controls
  to nVMX.
 
  With this patch, nested VMX preemption timer features are fully
  supported.
 
  Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
  ---
 
 
  @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
  *vcpu, struct vmcs12 *vmcs12)
 (vmcs_config.pin_based_exec_ctrl |
  vmcs12-pin_based_vm_exec_control));
 
  -   if (vmcs12-pin_based_vm_exec_control 
  PIN_BASED_VMX_PREEMPTION_TIMER)
  -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  -  vmcs12-vmx_preemption_timer_value);
  +   if (vmcs12-pin_based_vm_exec_control 
  PIN_BASED_VMX_PREEMPTION_TIMER) {
  +  if (vmcs12-vm_exit_controls 
  VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
  + vmcs12-vmx_preemption_timer_value =
  +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
  +  else
  + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
  +   vmcs12-vmx_preemption_timer_value);
  +   }
 
  This is not correct. We still need to set the vmcs to
  vmx_preemption_timer_value. The difference is that, on exit from L2,
  vmx_preemption_timer_value has to be updated according to the saved
  hardware state. The corresponding code is missing in your patch so
  far.
 
  I think something else maybe be missing here: assuming L0 handles
 exits
  for L2 without involving L1 (e.g. external interrupts or ept
  violations),
  then, we may spend some cycles in L0 handling these exits. Note L1 is
  not
  aware of these exits and from L1 perspective L2 was running on the
 CPU.
  That means that we may need to reduce these cycles spent at
  L0 from the preemtion timer or emulate a preemption timer exit to
  force a transition to L1 instead of resuming L2.
 
  That's precisely what the logic I described should achieve: reload the
  value we saved on L2 exit on reentry.
 
  But don't you think we should also reduce the cycles spent at L0 from
 the
  preemption timer ? I mean, if we spent X cycles at L0 handling a L2
 exit
  which was not forwarded to L1, then, before we resume L2,
  the preemption timer should be: (previous_value_on_exit - X).
  If (previous_value_on_exit - X)  0, then we should force (emulate) a
  preemption timer exit between L2 and L1.

 We ask the hardware to save the value of the preemption on L2 exit. This
 value will be exposed to L1 (if it asked for saving as well) and/or be
 written back to the hardware on L2 reenty (unless L1 had a chance to run
 and modified it). So the time spent in L0 is implicitly subtracted.

 I think you are suggesting the following, please correct me if I am wrong.
 1) L1 resumes L2 with preemption timer enabled
 2) L0 emulates the resume/launch
 3) L2 runs for Y cycles until an external interrupt occurs (Y  preemption
 timer specified by L1)
 4) L0 saved the preemption timer (original value - Y)
 5) L0 spends X cycles handling the external interrupt
 6) L0 resumes L2 with preemption timer = original value - Y

 Note that in this case X is ignored.

 I was suggesting to do the following:
 6) If original value - Y - X  0 then
  L0 resumes L2 with preemption timer = original value - Y - X
 else
  L0 emulates a L2-L1 preemption timer exit (resumes L1)
Yes, your description is right. But I'm also thinking about my
previous consideration, why should we consider such X cycles as what
L2 spent. For nested VMX. external interrupt is not provided by L1, it
is triggered from L0 and want to cause periodically exit to L1, L2 is

Re: Partial huge page backing with KVM/qemu

2013-08-25 Thread Gleb Natapov
On Sat, Aug 24, 2013 at 12:32:07AM +, Chris Leduc wrote:
 Hi - In a KVM/qemu environment is it possible for the host to back only a 
 portion of the guests memory with huge pages?  In some situations it may not 
 be desirable to back the entirety of a guest's memory with huge pages (as can 
 be done via libvirt memoryBacking option).
What are those situations?

 What would be very useful is to request huge pages in the guest, either at 
 boot time or dynamically, and have the host back them with physical huge 
 pages, but not back the rest of the normal page guest memory with huge pages 
 from the host.
 
 The equivalent in Xen is setting allowsuperpage=1 on the hypervisor boot line.
 
As far as I can tell this disables/enables use of huge pages by XEN vm,
not something you say you want.
 
--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Jan Kiszka
On 2013-08-25 10:41, Arthur Chunqi Li wrote:
 On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon ab...@il.ibm.com wrote:


 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com,
 李春奇 Arthur Chunqi Li  yzt...@gmail.com
 Date: 25/08/2013 10:54 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:50, Abel Gordon wrote:


 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org,
 pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
 Date: 25/08/2013 10:43 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:37, Abel Gordon wrote:


 From: Jan Kiszka jan.kis...@web.de
 To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
 Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
 Date: 25/08/2013 09:44 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
 preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit
 controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---


 @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
 *vcpu, struct vmcs12 *vmcs12)
(vmcs_config.pin_based_exec_ctrl |
 vmcs12-pin_based_vm_exec_control));

 -   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER)
 -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 -  vmcs12-vmx_preemption_timer_value);
 +   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER) {
 +  if (vmcs12-vm_exit_controls 
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 + vmcs12-vmx_preemption_timer_value =
 +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 +  else
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 +   vmcs12-vmx_preemption_timer_value);
 +   }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so
 far.

 I think something else maybe be missing here: assuming L0 handles
 exits
 for L2 without involving L1 (e.g. external interrupts or ept
 violations),
 then, we may spend some cycles in L0 handling these exits. Note L1 is
 not
 aware of these exits and from L1 perspective L2 was running on the
 CPU.
 That means that we may need to reduce these cycles spent at
 L0 from the preemtion timer or emulate a preemption timer exit to
 force a transition to L1 instead of resuming L2.

 That's precisely what the logic I described should achieve: reload the
 value we saved on L2 exit on reentry.

 But don't you think we should also reduce the cycles spent at L0 from
 the
 preemption timer ? I mean, if we spent X cycles at L0 handling a L2
 exit
 which was not forwarded to L1, then, before we resume L2,
 the preemption timer should be: (previous_value_on_exit - X).
 If (previous_value_on_exit - X)  0, then we should force (emulate) a
 preemption timer exit between L2 and L1.

 We ask the hardware to save the value of the preemption on L2 exit. This
 value will be exposed to L1 (if it asked for saving as well) and/or be
 written back to the hardware on L2 reenty (unless L1 had a chance to run
 and modified it). So the time spent in L0 is implicitly subtracted.

 I think you are suggesting the following, please correct me if I am wrong.
 1) L1 resumes L2 with preemption timer enabled
 2) L0 emulates the resume/launch
 3) L2 runs for Y cycles until an external interrupt occurs (Y  preemption
 timer specified by L1)
 4) L0 saved the preemption timer (original value - Y)
 5) L0 spends X cycles handling the external interrupt
 6) L0 resumes L2 with preemption timer = original value - Y

 Note that in this case X is ignored.

 I was suggesting to do the following:
 6) If original value - Y - X  0 then
  L0 resumes L2 with preemption timer = original value - Y - X
 else
  L0 emulates a L2-L1 preemption timer exit (resumes L1)
 Yes, your description is right. But I'm also thinking about my
 previous consideration, why should we consider such X cycles as what
 L2 spent. For nested VMX. external interrupt is not provided by L1, it
 is triggered from L0 and want to cause periodically exit to L1, L2 is
 accidentally injure actually. Since 

Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Arthur Chunqi Li
On Sun, Aug 25, 2013 at 4:53 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-25 10:41, Arthur Chunqi Li wrote:
 On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon ab...@il.ibm.com wrote:


 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com,
 李春奇 Arthur Chunqi Li  yzt...@gmail.com
 Date: 25/08/2013 10:54 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:50, Abel Gordon wrote:


 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org,
 pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
 Date: 25/08/2013 10:43 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:37, Abel Gordon wrote:


 From: Jan Kiszka jan.kis...@web.de
 To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
 Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
 Date: 25/08/2013 09:44 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
 preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit
 controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---


 @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
 *vcpu, struct vmcs12 *vmcs12)
(vmcs_config.pin_based_exec_ctrl |
 vmcs12-pin_based_vm_exec_control));

 -   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER)
 -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 -  vmcs12-vmx_preemption_timer_value);
 +   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER) {
 +  if (vmcs12-vm_exit_controls 
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 + vmcs12-vmx_preemption_timer_value =
 +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 +  else
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 +   vmcs12-vmx_preemption_timer_value);
 +   }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so
 far.

 I think something else maybe be missing here: assuming L0 handles
 exits
 for L2 without involving L1 (e.g. external interrupts or ept
 violations),
 then, we may spend some cycles in L0 handling these exits. Note L1 is
 not
 aware of these exits and from L1 perspective L2 was running on the
 CPU.
 That means that we may need to reduce these cycles spent at
 L0 from the preemtion timer or emulate a preemption timer exit to
 force a transition to L1 instead of resuming L2.

 That's precisely what the logic I described should achieve: reload the
 value we saved on L2 exit on reentry.

 But don't you think we should also reduce the cycles spent at L0 from
 the
 preemption timer ? I mean, if we spent X cycles at L0 handling a L2
 exit
 which was not forwarded to L1, then, before we resume L2,
 the preemption timer should be: (previous_value_on_exit - X).
 If (previous_value_on_exit - X)  0, then we should force (emulate) a
 preemption timer exit between L2 and L1.

 We ask the hardware to save the value of the preemption on L2 exit. This
 value will be exposed to L1 (if it asked for saving as well) and/or be
 written back to the hardware on L2 reenty (unless L1 had a chance to run
 and modified it). So the time spent in L0 is implicitly subtracted.

 I think you are suggesting the following, please correct me if I am wrong.
 1) L1 resumes L2 with preemption timer enabled
 2) L0 emulates the resume/launch
 3) L2 runs for Y cycles until an external interrupt occurs (Y  preemption
 timer specified by L1)
 4) L0 saved the preemption timer (original value - Y)
 5) L0 spends X cycles handling the external interrupt
 6) L0 resumes L2 with preemption timer = original value - Y

 Note that in this case X is ignored.

 I was suggesting to do the following:
 6) If original value - Y - X  0 then
  L0 resumes L2 with preemption timer = original value - Y - X
 else
  L0 emulates a L2-L1 preemption timer exit (resumes L1)
 Yes, your description is right. But I'm also thinking about my
 previous consideration, why should we consider such X cycles as what
 L2 spent. For nested VMX. external interrupt is not provided by L1, it
 is triggered from L0 and want to 

Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Jan Kiszka
On 2013-08-25 11:07, Arthur Chunqi Li wrote:
 On Sun, Aug 25, 2013 at 4:53 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-25 10:41, Arthur Chunqi Li wrote:
 On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon ab...@il.ibm.com wrote:


 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:54:13 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm kvm@vger.kernel.org, pbonz...@redhat.com,
 李春奇 Arthur Chunqi Li  yzt...@gmail.com
 Date: 25/08/2013 10:54 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:50, Abel Gordon wrote:


 kvm-ow...@vger.kernel.org wrote on 25/08/2013 10:43:12 AM:

 From: Jan Kiszka jan.kis...@web.de
 To: Abel Gordon/Haifa/IBM@IBMIL,
 Cc: g...@redhat.com, kvm@vger.kernel.org, kvm-ow...@vger.kernel.org,
 pbonz...@redhat.com, 李春奇 Arthur Chunqi Li yzt...@gmail.com
 Date: 25/08/2013 10:43 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-25 09:37, Abel Gordon wrote:


 From: Jan Kiszka jan.kis...@web.de
 To: 李春奇 Arthur Chunqi Li  yzt...@gmail.com,
 Cc: kvm@vger.kernel.org, g...@redhat.com, pbonz...@redhat.com
 Date: 25/08/2013 09:44 AM
 Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
 preemption
 timer
 Sent by: kvm-ow...@vger.kernel.org

 On 2013-08-24 20:44, root wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit
 controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---


 @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
 *vcpu, struct vmcs12 *vmcs12)
(vmcs_config.pin_based_exec_ctrl |
 vmcs12-pin_based_vm_exec_control));

 -   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER)
 -  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 -  vmcs12-vmx_preemption_timer_value);
 +   if (vmcs12-pin_based_vm_exec_control 
 PIN_BASED_VMX_PREEMPTION_TIMER) {
 +  if (vmcs12-vm_exit_controls 
 VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
 + vmcs12-vmx_preemption_timer_value =
 +vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 +  else
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
 +   vmcs12-vmx_preemption_timer_value);
 +   }

 This is not correct. We still need to set the vmcs to
 vmx_preemption_timer_value. The difference is that, on exit from L2,
 vmx_preemption_timer_value has to be updated according to the saved
 hardware state. The corresponding code is missing in your patch so
 far.

 I think something else maybe be missing here: assuming L0 handles
 exits
 for L2 without involving L1 (e.g. external interrupts or ept
 violations),
 then, we may spend some cycles in L0 handling these exits. Note L1 is
 not
 aware of these exits and from L1 perspective L2 was running on the
 CPU.
 That means that we may need to reduce these cycles spent at
 L0 from the preemtion timer or emulate a preemption timer exit to
 force a transition to L1 instead of resuming L2.

 That's precisely what the logic I described should achieve: reload the
 value we saved on L2 exit on reentry.

 But don't you think we should also reduce the cycles spent at L0 from
 the
 preemption timer ? I mean, if we spent X cycles at L0 handling a L2
 exit
 which was not forwarded to L1, then, before we resume L2,
 the preemption timer should be: (previous_value_on_exit - X).
 If (previous_value_on_exit - X)  0, then we should force (emulate) a
 preemption timer exit between L2 and L1.

 We ask the hardware to save the value of the preemption on L2 exit. This
 value will be exposed to L1 (if it asked for saving as well) and/or be
 written back to the hardware on L2 reenty (unless L1 had a chance to run
 and modified it). So the time spent in L0 is implicitly subtracted.

 I think you are suggesting the following, please correct me if I am wrong.
 1) L1 resumes L2 with preemption timer enabled
 2) L0 emulates the resume/launch
 3) L2 runs for Y cycles until an external interrupt occurs (Y  preemption
 timer specified by L1)
 4) L0 saved the preemption timer (original value - Y)
 5) L0 spends X cycles handling the external interrupt
 6) L0 resumes L2 with preemption timer = original value - Y

 Note that in this case X is ignored.

 I was suggesting to do the following:
 6) If original value - Y - X  0 then
  L0 resumes L2 with preemption timer = original value - Y - X
 else
  L0 emulates a L2-L1 preemption timer exit (resumes L1)
 Yes, your description is right. But I'm also thinking about my
 previous consideration, why should we consider such X cycles as what
 L2 spent. For nested VMX. external interrupt is not provided 

Re: [PATCH v3 0/6] KVM: nVMX: Enable unrestricted guest mode and fix some nEPT issues

2013-08-25 Thread Paolo Bonzini
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Il 25/08/2013 08:46, Jan Kiszka ha scritto:
 On 2013-08-08 16:26, Jan Kiszka wrote:
 These patches apply on top of kvm.git queue.
 
 Changes in v3: - rebased over queue - added Do not set identity
 page map for L2 - dropped Fix guest CR3 read-back on VM-exit
 
 Jan Kiszka (6): KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0
 in load_vmcs12_host_state KVM: nVMX: Do not set identity page map
 for L2 KVM: nVMX: Load nEPT state after EFER KVM: nVMX: Implement
 support for EFER saving on VM-exit KVM: nVMX: Update
 mmu.base_role.nxe after EFER loading on VM-entry/exit KVM: nVMX:
 Enable unrestricted guest mode support
 
 arch/x86/kvm/vmx.c |   44
 +++- 1 files changed, 31
 insertions(+), 13 deletions(-)
 
 
 Ping for this series. It still applies, now to next, without
 conflicts.

I tested it and it works fine.  It also looks good to me.  However, I
wanted to leave time to Gleb so that he could also review it.

Paolo
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSGdYBAAoJEBvWZb6bTYbyQgsP/ihhQga53141nXf9U70lLbbP
rv4hJCsqwHSOmjlEww0qUJOAE1nwlSYwowkucLKSVeuKy2/MXl+qwfxdLkFVqwgs
CvSadewf69BrYDkIs9gaxZ++CDKOWGppoKU2EyWLTibiloUti9kATXcxNGyQNHgY
7SrNrwzayUyp4KCm2iXcmag9U3kRscTHH2zxKFGjKtrvsf3yNjDHQJlXFIV0r1IB
HJCjm5kb8k1r9MGJye1nR4ZCyzqNtqyxGLaJV6W4cz5kJcNIStJuL18SGGzf77jb
I4DpDredzzuI22CsnJEicucNpq5i9C7tIxqo9q5zeNsg9gya+SdkHZEhOGompEP6
tDrd0Apn3Bbz48kLuGF6VvX3g4iLNop5qGXsZ66NxnQdtT9Yfx1jOQA2yqZGfcPi
hSNTHQBHlcvsXTO3n7lkCHd0inyxe+4zxF1hQBzYtT9gPp3KQRt4nd2ZqcAjVZrb
iEqTyXaiTi+5FDJXMziWNompQwWwJ8muMKNCd+Z7371TtVATmx7CEmGsKoB7buGb
sZiah2uQ0FBkD6v4wSL9VY9iTb+IRBEgAB/9viBiOcyxAngiAcTT8WB1snxlMUmj
5M31qPIWHN1qbi/JBE+/K3yFgAHJPjkfTsavCVqjXl4Su5bMmUOoCHE30Z6/CIQV
VWVaAJAymWyY+o+ON8HQ
=8y8m
-END PGP SIGNATURE-
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel 3.10.1 - NMI received for unknown reason

2013-08-25 Thread Gleb Natapov
On Fri, Aug 09, 2013 at 09:14:13PM +0200, Stefan Pietsch wrote:
 On 04.08.2013 14:44, Gleb Natapov wrote:
  On Fri, Aug 02, 2013 at 08:24:38AM +0200, Stefan Pietsch wrote:
  On 31.07.2013 11:20, Gleb Natapov wrote:
  On Wed, Jul 31, 2013 at 11:10:01AM +0200, Stefan Pietsch wrote:
  On 30.07.2013 07:31, Gleb Natapov wrote:
 
  What happen if you run perf on your host (perf record -a)?
  Do you see same NMI messages?
 
  It seems that perf record -a triggers some delayed NMI messages.
  They appear about 20 or 30 minutes after the command. This seems strange.
  Definitely strange. KVM guest is not running in parallel, correct? 20, 30
  minutes after perf stopped running or it is running all of the time?
 
  No, the KVM guest ist not running in parallel. But I'm not able to
  clearly reproduce the NMI messages with perf record.
  I start perf record -a and after some minutes I stop the recording.
 
  After that it seems NMI messages appear within a random period of time.
  So, I cannot tell what triggers the messages.
  When you run KVM with coreduo cpu model it emulates PMU which basically
  make is perf front end. If you can reproduce the messages with perf too
  it probably means that the problem is not in the KVM itself. If you
  disabled NMI watchdog in the guest the messages may go away.
  Can you send your guest's dmesg when you boot it with coreduo mode?
 
 
 The NMI messages appear in the host only. The guest runs as usual.
 
 
I understand that. But enabling guest nmi watchdog is what makes KVM to
use perf subsystem and likely causes this host messages. Try do disable
nmi watchdog in a guest and see what happens.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()

2013-08-25 Thread Michael S. Tsirkin
On Fri, Aug 23, 2013 at 04:50:38PM +0800, Jason Wang wrote:
 On 08/20/2013 10:33 AM, Jason Wang wrote:
  On 08/16/2013 05:54 PM, Michael S. Tsirkin wrote:
  On Fri, Aug 16, 2013 at 01:16:26PM +0800, Jason Wang wrote:
  Switch to use vhost_add_used_and_signal_n() to avoid multiple calls to
  vhost_add_used_and_signal(). With the patch we will call at most 2 times
  (consider done_idx warp around) compared to N times w/o this patch.
 
  Signed-off-by: Jason Wang jasow...@redhat.com
  So? Does this help performance then?
 
  Looks like it can especially when guest does support event index. When
  guest enable tx interrupt, this can saves us some unnecessary signal to
  guest. I will do some test.
 
 Have done some test. I can see 2% - 3% increasing in both aggregate
 transaction rate and per cpu transaction rate in TCP_RR and UDP_RR test.
 
 I'm using ixgbe. W/o this patch, I can see more than 100 calls of
 vhost_add_used_signal() in one vhost_zerocopy_signaled_used(). This is
 because ixgbe (and other modern ethernet driver) tends to free old tx
 skbs in a loop during tx interrupt, and vhost tend to batch the adding
 used and signal in vhost_zerocopy_callback(). Switching to use
 vhost_add_use_and_signal_n() means saving 100 times of used idx updating
 and memory barriers.

Well it's only smp_wmb so a nop on most architectures, so
a 2% gain is surprising.
I'm guessing the cache miss on the write is what's
giving us a speedup here.

I'll review the code, thanks.


-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] vhost_net: remove the max pending check

2013-08-25 Thread Michael S. Tsirkin
On Fri, Aug 23, 2013 at 04:55:49PM +0800, Jason Wang wrote:
 On 08/20/2013 10:48 AM, Jason Wang wrote:
  On 08/16/2013 06:02 PM, Michael S. Tsirkin wrote:
   On Fri, Aug 16, 2013 at 01:16:30PM +0800, Jason Wang wrote:
   We used to limit the max pending DMAs to prevent guest from pinning 
   too many
   pages. But this could be removed since:
  
   - We have the sk_wmem_alloc check in both tun/macvtap to do the same 
   work
   - This max pending check were almost useless since it was one done 
   when there's
 no new buffers coming from guest. Guest can easily exceeds the 
   limitation.
   - We've already check upend_idx != done_idx and switch to non zerocopy 
   then. So
 even if all vq-heads were used, we can still does the packet 
   transmission.
   We can but performance will suffer.
  The check were in fact only done when no new buffers submitted from
  guest. So if guest keep sending, the check won't be done.
 
  If we really want to do this, we should do it unconditionally. Anyway, I
  will do test to see the result.
 
 There's a bug in PATCH 5/6, the check:
 
 nvq-upend_idx != nvq-done_idx
 
 makes the zerocopy always been disabled since we initialize both
 upend_idx and done_idx to zero. So I change it to:
 
 (nvq-upend_idx + 1) % UIO_MAXIOV != nvq-done_idx.

But what I would really like to try is limit ubuf_info to VHOST_MAX_PEND.
I think this has a chance to improve performance since
we'll be using less cache.
Of course this means we must fix the code to really never submit
more than VHOST_MAX_PEND requests.

Want to try?
 
 With this change on top, I didn't see performance difference w/ and w/o
 this patch.

Did you try small message sizes btw (like 1K)? Or just netperf
default of 64K?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V12 3/5] kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration

2013-08-25 Thread Gleb Natapov
On Wed, Aug 07, 2013 at 12:40:36AM +0530, Raghavendra K T wrote:
 On 08/07/2013 12:02 AM, Eric Northup wrote:
 
 If this is about migration correctness, could it get folded into the
 previous patch 2/5, so that there's not a broken commit which could
 hurt bisection?
 
 Yes. It could be. Only reason I maintained like that was,
 original author in the previous patch is different (Srivatsa) and I did
 not want to merge this hunk when the patch series got evolved to mix
 the sign-offs.
 
 Gleb, Paolo please let me know.
 
Yes please, do so.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] -cpu host (was Re: KVM call minutes for 2013-08-06)

2013-08-25 Thread Gleb Natapov
On Thu, Aug 08, 2013 at 05:55:09PM +0200, Andreas Färber wrote:
 Hi Peter,
 
 Am 08.08.2013 14:51, schrieb Peter Maydell:
  [I missed this KVM call but the stuff about -cpu host ties into
  an issue we've been grappling with for ARM KVM, so it seems
  a reasonable jumping-off-point.]
  
  On 6 August 2013 16:15, Juan Quintela quint...@redhat.com wrote:
  2013-08-06
  --
 
  What libvirt needs/miss Today?
  - how to handle machine types? creating them inside qemu?
  - qemu --cpu help
only shows cpus,  not what features qemu will use
  - qemu -cpu host
what does this exactly means?  kvm removes same flags.
  - Important to know if migration would work.
  - Machine types sometimes disable some feature, so cpu alone is not
enough.
  
  - kernel removes some features because it knows it can't be virtualised
  - qemu adds some others because it knows it don't need host support
  - and then lots of features in the middle
  
  So, coming at this from an ARM perspective:
  Should any target arch that supports KVM also support -cpu host?
  If so, what should it do?
 
 I think that depends on the target and whether/what is useful.
 
  Is there a description somewhere of
  what the x86 and PPC semantics of -cpu host are?
 
 I'm afraid our usual documentation will be reading the source code. ;)
 
 x86 was first to implement -cpu host and passed through pretty much all
 host features even if they would not work without additional support
 code.
This is definitely not true. Only features that will work are passsed through.
Actually the definition of -cpu host for x86 can be: advertise any
feature that can be supported on this host/qemu combo.

   I've seen a bunch of bugs where that leads to GMP and others
 breaking badly. Lately in the case of PMU we've started to limit that.
The problem with PMU was that PMU capabilities was passed through even
for non -cpu host. There was no problem with -cpu host.

 Alex proposed -cpu best, which was never merged to date. It was similar
 to how ppc's -cpu host works:
According to http://wiki.qemu.org/Features/CPUModels#-cpu_host_vs_-cpu_best
it should select predefined cpu model closest to host one. Useful, bit
not the same as -cpu host.

 
 ppc matches the Processor Version Register (PVR) in kvm.c against its
 known models from cpu-models.c (strictly today, mask being discussed).
 The PVR can be read from userspace via mfpvr alias to mfspr (Move From
 Special Purpose Register; possibly emulated for userspace by kernel?).
 CPU features are all QEMU-driven AFAIU, through the CPU families in
 translate_init.c. Beware, everything is highly macro'fied in ppc code.
 

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] -cpu host (was Re: KVM call minutes for 2013-08-06)

2013-08-25 Thread Gleb Natapov
On Fri, Aug 09, 2013 at 02:53:30PM -0300, Eduardo Habkost wrote:
 On Thu, Aug 08, 2013 at 12:29:07PM -0700, Christoffer Dall wrote:
  On Thu, Aug 08, 2013 at 08:05:11PM +0100, Peter Maydell wrote:
   On 8 August 2013 19:39, Christoffer Dall christoffer.d...@linaro.org 
   wrote:
FWIW, from the kernel point of view I'd much prefer to return this is
the type of VCPU that I prefer to emulate to user space on this current
host than having QEMU come up with its own suggestion for CPU and asking
the kernel for it.  The reason is that it gives us slightly more freedom
in how we choose to support a given host SoC in that we can say that we
at least support core A on core B, so if user space can deal with a
virtual core A, we should be good.
   
   Hmm, I'm not sure how useful a query support kind of API would
   be to QEMU. QEMU is basically going to have two use cases:
   (1) I want an A15 [ie -cpu cortex-a15]
   (2) give me whatever you have and I'll cope [ie -cpu host]
   
   so my thought was that we could just have the kernel support
   init.target = KVM_ARM_TARGET_HOST;
   memset(init.features, 0, sizeof(init.features));
   ret = kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_INIT, init);
   
   (in the same way we currently ask for KVM_ARM_TARGET_CORTEX_A15).
   
   I guess we could have a return preferred target value
   VM ioctl, but it seems a bit pointless given that the
   only thing userspace is going to do with the return
   value is immediately feed it back to the kernel...
   
  My thinking was that the result of cpu = KVM_ARM_TARGET_HOST would be
  the same as x = kvm_get_target_host(), cpu = x, but at the same time
  letting QEMU know what it's dealing with.  Perhaps QEMU doesn't need
  this for emulation, but isn't it useful for
  save/restore/migration/debugging scenarios?
 
 FWIW, this is how it works on x86: QEMU calls GET_SUPPORTED_CPUID and
 then uses the result on a SET_CPUID call.
 
 (Well, at least that's the general idea, but the details are a bit more
 complicated than that. e.g.: some features can be enabled only if some
 QEMU options are set as well, like tsc-deadline and x2apic, that require
 the in-kernel irqchip to be enabled.)
 
 Even if you don't really need this 2-step method today, doing this would
 allow QEMU to fiddle with some bits if necessary in the future.
 
Without 2-step method how QEMU ARM knows that it can create cpu type
user asked for? On x86 QEMU queries kernel about what feature can be
supported and then and it with requested features (feature can be
requested implicitly, by specifying cpu model name like Neahalem, or
explicitly, by adding +feature_name on -cpu parameter). -cpu host is
just a by product of this algorithm, it says enables everything that is
supported (supported  1 instead of supported  requested).

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] -cpu host (was Re: KVM call minutes for 2013-08-06)

2013-08-25 Thread Gleb Natapov
On Thu, Aug 08, 2013 at 07:20:41PM +0100, Peter Maydell wrote:
  For ARM you can't get at feature info of the host from userspace
  (unless you want to get into parsing /proc/cpuinfo), so my current
  idea is to have KVM_ARM_VCPU_INIT support a target-cpu-type
  which means whatever host CPU is. Then when we've created the
  vcpu we can populate QEMU's idea of what the CPU features are
  by using the existing ioctls for reading the cp15 registers of
  the vcpu.
 
  Sounds sane to me iff those cp15 registers all work with KVM and don't
  need any additional KVM/QEMU/device code.
 
 Yes; KVM won't tell us about CP15 registers unless they
 are exposed to the guest VM (that is, we're querying the
So why not implement something similar to x86 cpuid thing. Have
separate ioctls to query what CP15 registers KVM can support and what
is exposed to a guest?

 VCPU, not the host CPU). More generally, the cp15 tuple
 list code I landed a couple of months back makes the kernel
 the authoritative source for which cp15 registers exist and
 what their values are -- in -enable-kvm mode QEMU no longer
 cares about them (its own list of which registers exist for
 which CPU is used only for TCG).
 
--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines

2013-08-25 Thread Gleb Natapov
On Fri, Aug 09, 2013 at 07:30:13AM -0700, Christoffer Dall wrote:
 The KVM_HPAGE_DEFINES are a little artificial on ARM, since the huge
 page size is statically defined at compile time and there is only a
 single huge page size.
 
 Now when the main kvm code relying on these defines has been moved to
 the x86 specific part of the world, we can get rid of these.
 
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 
 [Resending this because the KVM list filter caught the subject of the
 previous message due to the letter X]
 
  arch/arm/include/asm/kvm_host.h |5 -
  1 file changed, 5 deletions(-)
 
 diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
 index 7d22517..e45a74b 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -38,11 +38,6 @@
  
  #define KVM_VCPU_MAX_FEATURES 1
  
 -/* We don't currently support large pages. */
 -#define KVM_HPAGE_GFN_SHIFT(x)   0
 -#define KVM_NR_PAGE_SIZES1
With huge page support you do have two different page sizes, but the only
code that uses those defines currently is in the x86 shadow mmu code,
so I am fine with moving gfn_to_index() into x86 specific code and getting rid
of those defines (you will probably need them when you will implement
nested HYP mode :)). But can you do it as a separate patch series and
remove the defines for all arches?

 -#define KVM_PAGES_PER_HPAGE(x)   (1UL31)
 -
  #include kvm/arm_vgic.h
  
  struct kvm_vcpu;
 -- 
 1.7.10.4
 
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines

2013-08-25 Thread Peter Maydell
On 25 August 2013 15:05, Gleb Natapov g...@redhat.com wrote:
 (you will probably need them when you will implement
 nested HYP mode :)).

Smiley noted, but this is pretty unlikely since it's not possible
to lie to the guest about which mode it's in, so you can't make
a guest think it's in Hyp mode.

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines

2013-08-25 Thread Gleb Natapov
On Sun, Aug 25, 2013 at 03:29:17PM +0100, Peter Maydell wrote:
 On 25 August 2013 15:05, Gleb Natapov g...@redhat.com wrote:
  (you will probably need them when you will implement
  nested HYP mode :)).
 
 Smiley noted, but this is pretty unlikely since it's not possible
 to lie to the guest about which mode it's in, so you can't make
 a guest think it's in Hyp mode.
 
I suspected this, but forgot most that I read about Hyp mode by now.
Need to refresh my memory ASAP. Is it impossible even with a lot of
emulation? Can guest detect that it is not in a Hyp mode without
trapping into hypervisor?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag

2013-08-25 Thread Alexander Graf

On 24.08.2013, at 21:14, Yann Droneaud wrote:

 KVM uses anon_inode_get() to allocate file descriptors as part
 of some of its ioctls. But those ioctls are lacking a flag argument
 allowing userspace to choose options for the newly opened file descriptor.
 
 In such case it's advised to use O_CLOEXEC by default so that
 userspace is allowed to choose, without race, if the file descriptor
 is going to be inherited across exec().
 
 This patch set O_CLOEXEC flag on all file descriptors created
 with anon_inode_getfd() to not leak file descriptors across exec().
 
 Signed-off-by: Yann Droneaud ydrone...@opteya.com
 Link: http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com

Reviewed-by: Alexander Graf ag...@suse.de

Would it make sense to simply inherit the O_CLOEXEC flag from the parent kvm fd 
instead? That would give user space the power to keep fds across exec() if it 
wants to.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines

2013-08-25 Thread Peter Maydell
On 25 August 2013 15:48, Gleb Natapov g...@redhat.com wrote:
 On Sun, Aug 25, 2013 at 03:29:17PM +0100, Peter Maydell wrote:
 Smiley noted, but this is pretty unlikely since it's not possible
 to lie to the guest about which mode it's in, so you can't make
 a guest think it's in Hyp mode.

 I suspected this, but forgot most that I read about Hyp mode by now.
 Need to refresh my memory ASAP. Is it impossible even with a lot of
 emulation? Can guest detect that it is not in a Hyp mode without
 trapping into hypervisor?

Yes. The current mode is in the the low bits of the CPSR, which
is readable without causing a trap. This is just the most obvious
roadblock; I bet there are more. If you really had to run Hyp mode
code in a VM you probably have to do it by having it all emulated
via TCG.

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] KVM: arm-vgic: Add vgic reg access from dev attr

2013-08-25 Thread Alexander Graf

On 23.08.2013, at 20:20, Christoffer Dall wrote:

 Add infrastructure to handle distributor and cpu interface register
 accesses through the KVM_{GET/SET}_DEVICE_ATTR interface by adding the
 KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS groups
 and defining the semantics of the attr field to be the MMIO offset as
 specified in the GICv2 specs.
 
 Missing register accesses or other changes in individual register access
 functions to support save/restore of the VGIC state is added in
 subsequent patches.
 
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 Documentation/virtual/kvm/devices/arm-vgic.txt |   35 ++
 virt/kvm/arm/vgic.c|  143 
 2 files changed, 178 insertions(+)
 
 diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt 
 b/Documentation/virtual/kvm/devices/arm-vgic.txt
 index c9febb2..1b68475 100644
 --- a/Documentation/virtual/kvm/devices/arm-vgic.txt
 +++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
 @@ -19,3 +19,38 @@ Groups:
 KVM_VGIC_V2_ADDR_TYPE_CPU (rw, 64-bit)
   Base address in the guest physical address space of the GIC virtual cpu
   interface register mappings.
 +
 +  KVM_DEV_ARM_VGIC_GRP_DIST_REGS
 +  Attributes:
 +The attr field of kvm_device_attr encodes two values:
 +bits: | 63     40 | 39 ..  32  |  31   0 |
 +values:   |reserved   |   cpu id   |  offset |
 +
 +All distributor regs are (rw, 32-bit)
 +
 +The offset is relative to the Distributor base address as defined in 
 the
 +GICv2 specs.  Getting or setting such a register has the same effect as
 +reading or writing the register on the actual hardware from the cpu
 +specified with cpu id field.  Note that most distributor fields are not
 +banked, but return the same value regardless of the cpu id used to access
 +the register.
 +  Limitations:
 +- Priorities are not implemented, and registers are RAZ/WI
 +  Errors:
 +- ENODEV: Getting or setting this register is not yet supported
 +
 +  KVM_DEV_ARM_VGIC_GRP_CPU_REGS
 +  Attributes:
 +The attr field of kvm_device_attr encodes two values:
 +bits: | 63     40 | 39 ..  32  |  31   0 |
 +values:   |reserved   |   cpu id   |  offset |
 +
 +All CPU regs are (rw, 32-bit)
 +
 +The offsetspecifies the offset from the CPU interface base address as

offset specifies

 +defined in the GICv2 specs.  Getting or setting such a register has the
 +same effect as reading or writing the register on the actual hardware.
 +  Limitations:
 +- Priorities are not implemented, and registers are RAZ/WI
 +  Errors:
 +- ENODEV: Getting or setting this register is not yet supported
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index 629caeb..e31625c 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -591,11 +591,29 @@ static bool handle_mmio_sgi_reg(struct kvm_vcpu *vcpu,
   return false;
 }
 
 +static bool handle_mmio_sgi_clear(struct kvm_vcpu *vcpu,
 +   struct kvm_exit_mmio *mmio,
 +   phys_addr_t offset)
 +{
 + return false;
 +}
 +
 +static bool handle_mmio_sgi_set(struct kvm_vcpu *vcpu,
 + struct kvm_exit_mmio *mmio,
 + phys_addr_t offset)
 +{
 + return false;
 +}
 +
 /*
  * I would have liked to use the kvm_bus_io_*() API instead, but it
  * cannot cope with banked registers (only the VM pointer is passed
  * around, and we need the vcpu). One of these days, someone please
  * fix it!
 + *
 + * Note that the handle_mmio implementations should not use the phys_addr
 + * field from the kvm_exit_mmio struct as this will not have any sane values
 + * when used to save/restore state from user space.
  */
 struct mmio_range {
   phys_addr_t base;
 @@ -665,6 +683,16 @@ static const struct mmio_range vgic_dist_ranges[] = {
   .len= 4,
   .handle_mmio= handle_mmio_sgi_reg,
   },
 + {
 + .base   = GIC_DIST_SGI_CLEAR,
 + .len= VGIC_NR_SGIS,
 + .handle_mmio= handle_mmio_sgi_clear,
 + },
 + {
 + .base   = GIC_DIST_SGI_SET,
 + .len= VGIC_NR_SGIS,
 + .handle_mmio= handle_mmio_sgi_set,
 + },
   {}
 };
 
 @@ -1543,6 +1571,80 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, 
 u64 *addr, bool write)
   return r;
 }
 
 +static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu,
 +  struct kvm_exit_mmio *mmio, phys_addr_t offset)
 +{
 + return true;
 +}
 +
 +static const struct mmio_range vgic_cpu_ranges[] = {
 + {
 + .base   = GIC_CPU_CTRL,
 + .len= 12,
 + .handle_mmio= handle_cpu_mmio_misc,
 + },
 + {
 + .base

Re: [PATCH 8/8] KVM: arm-vgic: Support CPU interface reg access

2013-08-25 Thread Alexander Graf

On 23.08.2013, at 20:20, Christoffer Dall wrote:

 Implement support for the CPU interface register access driven by MMIO
 address offsets from the CPU interface base address.  Useful for user
 space to support save/restore of the VGIC state.
 
 This commit adds support only for the same logic as the current VGIC
 support, and no more.  For example, the active priority registers are
 handled as RAZ/WI, just like setting priorities on the emulated
 distributor.
 
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 virt/kvm/arm/vgic.c |   66 +++
 1 file changed, 62 insertions(+), 4 deletions(-)
 
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index d44b5a1..257dbae 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -1684,9 +1684,67 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, 
 u64 *addr, bool write)
 static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu,
struct kvm_exit_mmio *mmio, phys_addr_t offset)
 {
 - return true;
 + struct vgic_cpu *vgic_cpu = vcpu-arch.vgic_cpu;
 + u32 reg, mask = 0, shift = 0;
 + bool updated = false;
 +
 + switch (offset  ~0x3) {
 + case GIC_CPU_CTRL:
 + mask = GICH_VMCR_CTRL_MASK;
 + shift = GICH_VMCR_CTRL_SHIFT;
 + break;
 + case GIC_CPU_PRIMASK:
 + mask = GICH_VMCR_PRIMASK_MASK;
 + shift = GICH_VMCR_PRIMASK_SHIFT;
 + break;
 + case GIC_CPU_BINPOINT:
 + mask = GICH_VMCR_BINPOINT_MASK;
 + shift = GICH_VMCR_BINPOINT_SHIFT;
 + break;
 + case GIC_CPU_ALIAS_BINPOINT:
 + mask = GICH_VMCR_ALIAS_BINPOINT_MASK;
 + shift = GICH_VMCR_ALIAS_BINPOINT_SHIFT;
 + break;
 + }
 +
 + if (!mmio-is_write) {
 + reg = (vgic_cpu-vgic_vmcr  mask)  shift;
 + memcpy(mmio-data, reg, sizeof(reg));
 + } else {
 + memcpy(reg, mmio-data, sizeof(reg));
 + reg = (reg  shift)  mask;
 + if (reg != (vgic_cpu-vgic_vmcr  mask))
 + updated = true;
 + vgic_cpu-vgic_vmcr = ~mask;
 + vgic_cpu-vgic_vmcr |= reg;
 + }
 + return updated;
 +}
 +
 +static bool handle_mmio_abpr(struct kvm_vcpu *vcpu,
 +  struct kvm_exit_mmio *mmio, phys_addr_t offset)
 +{
 + return handle_cpu_mmio_misc(vcpu, mmio, GIC_CPU_ALIAS_BINPOINT);
 +}
 +
 +static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu,
 +   struct kvm_exit_mmio *mmio,
 +   phys_addr_t offset)
 +{
 + u32 reg;
 +
 + if (mmio-is_write)
 + return false;
 +
 + reg = 0x0002043B;

This wants a comment and probably also a #define :).


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] Support VGIC save/restore using device control API

2013-08-25 Thread Alexander Graf

On 23.08.2013, at 20:19, Christoffer Dall wrote:

 Implement save/restore of the VGIC state using the newer KVM Device
 Control API.  This requries some number of changes to existing code in
 addition to actually supporting save/restore of the necessary state.
 
 The first patches (01-03) support creating the VGIC using the Device
 Control API.  This change is necessary because there are no other
 suitable KVM APIs that we can leverage to access the VGIC state from
 user space and the device control API was crafted exactly for this
 purpose.
 
 Subsequent patches add the missing infrastructure and user space API
 pieces necessary to actually save and restore the VGIC state.  The GIC
 v2.0 architecture specification already specifies registers that can be
 used to save and restore the complete VGIC state for suspend/resume
 purposes on real hardware, and we can resuse this interface for the
 VGIC.  The API is therefore based on the memory-mapped register accesses
 defined in the specs.  See the individual patches for details.
 
 The patches rely on a number of small fixes sent separately to actually
 work:
 git://git.linaro.org/people/cdall/linux-kvm-arm.git vgic-migrate-prereq
 
 This patch series based on the above can be cloned from:
 git://git.linaro.org/people/cdall/linux-kvm-arm.git vgic-migrate
 
 User space patches for QEMU will follow shortly.  Tested on Versatile
 Express TC2.

Looks quite straight forward and nice.

Reviewed-by: Alexander Graf ag...@suse.de


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines

2013-08-25 Thread Alexander Graf

On 25.08.2013, at 16:18, Peter Maydell wrote:

 On 25 August 2013 15:48, Gleb Natapov g...@redhat.com wrote:
 On Sun, Aug 25, 2013 at 03:29:17PM +0100, Peter Maydell wrote:
 Smiley noted, but this is pretty unlikely since it's not possible
 to lie to the guest about which mode it's in, so you can't make
 a guest think it's in Hyp mode.
 
 I suspected this, but forgot most that I read about Hyp mode by now.
 Need to refresh my memory ASAP. Is it impossible even with a lot of
 emulation? Can guest detect that it is not in a Hyp mode without
 trapping into hypervisor?
 
 Yes. The current mode is in the the low bits of the CPSR, which
 is readable without causing a trap. This is just the most obvious
 roadblock; I bet there are more. If you really had to run Hyp mode
 code in a VM you probably have to do it by having it all emulated
 via TCG.

Or in an in-kernel instruction emulator that we have lying around anyways. For 
kvm-in-kvm that should be good enough, as we only need to execute a few 
instructions in HYP mode.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-25 Thread Arthur Chunqi Li
This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If vmexit L2-L0
with some reasons not emulated by L1, preemption timer value should
be save in such exits.
2. Add support of Save VMX-preemption timer value VM-Exit controls
to nVMX.

With this patch, nested VMX preemption timer features are fully
supported.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 arch/x86/kvm/vmx.c |   49 -
 1 file changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 57b4e12..6aa320e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-   VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+   VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+   VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+   if (!(nested_vmx_pinbased_ctls_high  PIN_BASED_VMX_PREEMPTION_TIMER))
+   nested_vmx_exit_ctls_high =
+   (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
+   if (!(nested_vmx_exit_ctls_high  VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
+   nested_vmx_pinbased_ctls_high =
+   (~PIN_BASED_VMX_PREEMPTION_TIMER);
nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
  VM_EXIT_LOAD_IA32_EFER);
 
@@ -6706,6 +6713,22 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 
*info1, u64 *info2)
*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }
 
+static void nested_fix_preempt(struct kvm_vcpu *vcpu)
+{
+   u64 delta_guest_tsc;
+   u32 preempt_val, preempt_bit, delta_preempt_val;
+
+   preempt_bit = native_read_msr(MSR_IA32_VMX_MISC)  0x1F;
+   delta_guest_tsc = kvm_x86_ops-read_l1_tsc(vcpu,
+   native_read_tsc()) - vcpu-arch.last_guest_tsc;
+   delta_preempt_val = delta_guest_tsc  preempt_bit;
+   preempt_val = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+   if (preempt_val - delta_preempt_val  0)
+   preempt_val = 0;
+   else
+   preempt_val -= delta_preempt_val;
+   vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val);
+}
 /*
  * The guest has exited.  See if we can fix it or if we need userspace
  * assistance.
@@ -6734,9 +6757,12 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
else
vmx-nested.nested_run_pending = 0;
 
-   if (is_guest_mode(vcpu)  nested_vmx_exit_handled(vcpu)) {
-   nested_vmx_vmexit(vcpu);
-   return 1;
+   if (is_guest_mode(vcpu)) {
+   if (nested_vmx_exit_handled(vcpu)) {
+   nested_vmx_vmexit(vcpu);
+   return 1;
+   } else
+   nested_fix_preempt(vcpu);
}
 
if (exit_reason  VMX_EXIT_REASONS_FAILED_VMENTRY) {
@@ -7517,6 +7543,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 exec_control;
+   u32 exit_control;
 
vmcs_write16(GUEST_ES_SELECTOR, vmcs12-guest_es_selector);
vmcs_write16(GUEST_CS_SELECTOR, vmcs12-guest_cs_selector);
@@ -7690,7 +7717,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 * bits are further modified by vmx_set_efer() below.
 */
-   vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+   exit_control = vmcs_config.vmexit_ctrl;
+   if (vmcs12-pin_based_vm_exec_control  PIN_BASED_VMX_PREEMPTION_TIMER)
+   exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+   vmcs_write32(VM_EXIT_CONTROLS, exit_control);
 
/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
 * emulated by vmx_set_efer(), below.
@@ -8089,6 +8119,15 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
vmcs12-guest_pending_dbg_exceptions =
vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
 
+   if (vmcs12-pin_based_vm_exec_control  PIN_BASED_VMX_PREEMPTION_TIMER) 
{
+   if (vmcs12-vm_exit_controls  
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
+   vmcs12-vmx_preemption_timer_value =
+   vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+   else
+   vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
+   vmcs12-vmx_preemption_timer_value);
+   }
+
/*
 * In some cases (usually, nested EPT), L2 is allowed to change its
 * own CR3 without exiting. If it has changed it, we must keep it.
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More 

Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines

2013-08-25 Thread Christoffer Dall
On Sun, Aug 25, 2013 at 04:18:02PM +0100, Peter Maydell wrote:
 On 25 August 2013 15:48, Gleb Natapov g...@redhat.com wrote:
  On Sun, Aug 25, 2013 at 03:29:17PM +0100, Peter Maydell wrote:
  Smiley noted, but this is pretty unlikely since it's not possible
  to lie to the guest about which mode it's in, so you can't make
  a guest think it's in Hyp mode.
 
  I suspected this, but forgot most that I read about Hyp mode by now.
  Need to refresh my memory ASAP. Is it impossible even with a lot of
  emulation? Can guest detect that it is not in a Hyp mode without
  trapping into hypervisor?
 
 Yes. The current mode is in the the low bits of the CPSR, which
 is readable without causing a trap. This is just the most obvious
 roadblock; I bet there are more. If you really had to run Hyp mode
 code in a VM you probably have to do it by having it all emulated
 via TCG.
 
Or through some highly paravirtualized nested virtualization solution if
that is ever relevant.

-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] tile: support KVM for tilegx

2013-08-25 Thread Chris Metcalf
On 8/25/2013 7:39 AM, Gleb Natapov wrote:
 On Mon, Aug 12, 2013 at 04:24:11PM -0400, Chris Metcalf wrote:
 This change provides the initial framework support for KVM on tilegx.
 Basic virtual disk and networking is supported.

 This needs to be broken down to more reviewable patches.

I already broke out one pre-requisite patch that wasn't strictly KVM-related:

https://lkml.org/lkml/2013/8/12/339

In addition, we've separately arranged to support booting our kernels in a way 
that is compatible with the Tilera booter running at the highest privilege 
level, which enables multiple kernel privilege levels:

https://lkml.org/lkml/2013/5/2/468

How would you recommend further breaking down this patch?  It's pretty much 
just the basic support for minimal KVM.  I suppose I could break out all the 
I/O related stuff into a separate patch, though it wouldn't amount to much; 
perhaps the console could also be broken out separately.  Any other suggestions?

 Also can you
 describe the implementation a little bit? Does tile arch has vitalization
 extension this implementation uses, or is it trap and emulate approach?
 If later does it run unmodified guest kernels? What userspace are you
 using with this implementation?

We could do full virtualization via trap and emulate, but we've elected to do a 
para-virtualized approach.  Userspace runs at PL (privilege level) 0, the guest 
kernel runs at PL1, and the host runs at PL2.  We have available per-PL 
resources for various things, and take advantage of having two on-chip timers 
(for example) to handle timing for the host and guest kernels.  We run the same 
userspace with either the host or the guest.

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Partial huge page backing with KVM/qemu

2013-08-25 Thread Chris Leduc


 -Original Message-
 From: Gleb Natapov [mailto:g...@redhat.com]
 Sent: Sunday, August 25, 2013 1:52 AM
 To: Chris Leduc
 Cc: kvm@vger.kernel.org
 Subject: Re: Partial huge page backing with KVM/qemu
 
 On Sat, Aug 24, 2013 at 12:32:07AM +, Chris Leduc wrote:
  Hi - In a KVM/qemu environment is it possible for the host to back only a
 portion of the guests memory with huge pages?  In some situations it may
 not be desirable to back the entirety of a guest's memory with huge pages
 (as can be done via libvirt memoryBacking option).
 What are those situations?
For example to limit a guest with 64GB of total memory to use 4GB of huge pages 
for fast lookup memory.  This takes advantage of the 4 TLB entries for 1G pages 
on a Sandy/Ivy Bridge processor to ensure a page walk is never necessary for 
this fast memory.  An example is a high performance data plane application.  
The remainder of the less frequently accessed memory can be in normal pages.

  What would be very useful is to request huge pages in the guest, either at
 boot time or dynamically, and have the host back them with physical huge
 pages, but not back the rest of the normal page guest memory with huge
 pages from the host.
 
  The equivalent in Xen is setting allowsuperpage=1 on the hypervisor boot
 line.
 
 As far as I can tell this disables/enables use of huge pages by XEN vm, not
 something you say you want.

The Xen documentation is not clear on this, but in practice this flag allows 
the host to back up guest huge page requests with physical huge pages.  So the 
guest could for example add hugepages=N to its boot line and these pages would 
be backed in the host with corresponding physical huge pages.

From experimentation with KVM, requesting hugepages at guest boot time 
(without memory backing enabled) will result in guest hugepages backed by host 
normal pages.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 60782] [Nested] Windows XP Mode can not work

2013-08-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60782

--- Comment #2 from Zhou, Chao chao.z...@intel.com ---
(In reply to Paolo Bonzini from comment #1)
 Try -cpu host,+vmx instead.

This issue still exists.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag

2013-08-25 Thread Alexander Graf

On 24.08.2013, at 21:14, Yann Droneaud wrote:

 KVM uses anon_inode_get() to allocate file descriptors as part
 of some of its ioctls. But those ioctls are lacking a flag argument
 allowing userspace to choose options for the newly opened file descriptor.
 
 In such case it's advised to use O_CLOEXEC by default so that
 userspace is allowed to choose, without race, if the file descriptor
 is going to be inherited across exec().
 
 This patch set O_CLOEXEC flag on all file descriptors created
 with anon_inode_getfd() to not leak file descriptors across exec().
 
 Signed-off-by: Yann Droneaud ydrone...@opteya.com
 Link: http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com

Reviewed-by: Alexander Graf ag...@suse.de

Would it make sense to simply inherit the O_CLOEXEC flag from the parent kvm fd 
instead? That would give user space the power to keep fds across exec() if it 
wants to.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc/kvm: Handle the boundary condition correctly

2013-08-25 Thread Alexander Graf

On 23.08.2013, at 05:28, Benjamin Herrenschmidt wrote:

 On Fri, 2013-08-23 at 09:01 +0530, Aneesh Kumar K.V wrote:
 Alexander Graf ag...@suse.de writes:
 
 On 22.08.2013, at 12:37, Aneesh Kumar K.V wrote:
 
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Isn't this you?
 
 Yes. The patches are generated using git format-patch and sent by
 git send-email. That's how it always created patches for me. I am not sure if
 there is a config I can change to avoid having From:
 
 Don't bother, that's perfectly fine, and git am will do the right thing.

It will, but it's an indicator that something in his git config is 
misconfigured. Usually when git sees Author == Sender it will omit the From: 
line.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc/kvm: Handle the boundary condition correctly

2013-08-25 Thread Alexander Graf

On 23.08.2013, at 04:31, Aneesh Kumar K.V wrote:

 Alexander Graf ag...@suse.de writes:
 
 On 22.08.2013, at 12:37, Aneesh Kumar K.V wrote:
 
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Isn't this you?
 
 Yes. The patches are generated using git format-patch and sent by
 git send-email. That's how it always created patches for me. I am not sure if
 there is a config I can change to avoid having From:
 
 
 
 We should be able to copy upto count bytes
 
 Why?
 
 
 Without this we end up doing
 
 +struct kvm_get_htab_buf {
 +struct kvm_get_htab_header header;
 +/*
 + * Older kernel required one extra byte.
 + */
 +unsigned long hpte[3];
 +} hpte_buf;
 
 
 even though we are only looking for one hpte entry.

Ok, please give me an example with real numbers and why it breaks.


Alex

 
 http://mid.gmane.org/1376995766-16526-4-git-send-email-aneesh.ku...@linux.vnet.ibm.com
 
 
 Alex
 
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
 b/arch/powerpc/kvm/book3s_64_mmu_hv.c
 index 710d313..0ae6bb6 100644
 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
 +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
 @@ -1362,7 +1362,7 @@ static ssize_t kvm_htab_read(struct file *file, char 
 __user *buf,
 lbuf = (unsigned long __user *)buf;
 
 nb = 0;
 -   while (nb + sizeof(hdr) + HPTE_SIZE  count) {
 +   while (nb + sizeof(hdr) + HPTE_SIZE = count) {
 /* Initialize header */
 hptr = (struct kvm_get_htab_header __user *)buf;
 hdr.n_valid = 0;
 @@ -1385,7 +1385,7 @@ static ssize_t kvm_htab_read(struct file *file, char 
 __user *buf,
 /* Grab a series of valid entries */
 while (i  kvm-arch.hpt_npte 
hdr.n_valid  0x 
 -  nb + HPTE_SIZE  count 
 +  nb + HPTE_SIZE = count 
record_hpte(flags, hptp, hpte, revp, 1, first_pass)) {
 /* valid entry, write it out */
 ++hdr.n_valid;
 -- 
 1.8.1.2
 
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm-ppc in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html