Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

2009-11-09 Thread Abel Gordon


Gleb Natapov g...@redhat.com wrote on 29/10/2009 19:31:05:


 Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

 Gleb Natapov

 to:

 Orit Wasserman

 29/10/2009 19:31

 Cc:

 Abel Gordon, aliguori, Ben-Ami Yassour1, kvm, mdday, Muli Ben-Yehuda

 On Wed, Oct 28, 2009 at 06:23:42PM +0200, Orit Wasserman wrote:
 
 
  Gleb Natapov g...@redhat.com wrote on 25/10/2009 11:44:31:
 
   From:
  
   Gleb Natapov g...@redhat.com
  
   To:
  
   Orit Wasserman/Haifa/i...@ibmil
  
   Cc:
  
   Abel Gordon/Haifa/i...@ibmil, aligu...@us.ibm.com, Ben-Ami Yassour1/
   Haifa/i...@ibmil, kvm@vger.kernel.org, md...@us.ibm.com, Muli Ben-
   Yehuda/Haifa/i...@ibmil
  
   Date:
  
   25/10/2009 11:44
  
   Subject:
  
   Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
  
   On Thu, Oct 22, 2009 at 05:46:16PM +0200, Orit Wasserman wrote:
   
   
Gleb Natapov g...@redhat.com wrote on 22/10/2009 11:04:58:
   
 From:

 Gleb Natapov g...@redhat.com

 To:

 Orit Wasserman/Haifa/i...@ibmil

 Cc:

 Abel Gordon/Haifa/i...@ibmil, aligu...@us.ibm.com, Ben-Ami
Yassour1/
 Haifa/i...@ibmil, kvm@vger.kernel.org, md...@us.ibm.com, Muli Ben-
 Yehuda/Haifa/i...@ibmil

 Date:

 22/10/2009 11:05

 Subject:

 Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and
vmresume

 On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:
@@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	int type;
 	bool idtv_info_valid;
 
-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-
 	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
 
+	if (vmx->nested.nested_mode)
+		return;
+
   Why return here? What does the function do that should not be done
   in nested mode?
  In nested mode L0 injects an interrupt into L2 only in one scenario:
  if there is an IDT_VALID event and L0 decides to run L2 again and not
  to switch back to L1.
  In all other cases the injection is handled by L1.
 This is exactly the kind of scenario that is handled by
 vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() store the
 pending event in an arch-agnostic way and re-injection is handled by
 x86.c. You bypass this logic by inserting a return here and introducing
 the nested_handle_valid_idt() function below.
The only location where we can truly know if we are switching to L1 is
vmx_vcpu_run, because enable_irq_window (which is called after handling
the exit) can decide to switch to L1 because of an interrupt.
   enable_irq_window() will be called after the L2 VMCS has been set up
   for event re-injection by the previous call to inject_pending_event().
   As far as I can see this should work for interrupt injection. For
   exceptions we should probably require the L2 guest to re-execute the
   faulting instruction for now, like svm does.
  The main issue is that L0 doesn't inject events into L2 but L1 does (we
  want to keep the nested hypervisor semantics as much as possible). Only
  if the event was caused by the fact that L2 is a nested guest and L1
  can't handle it will L0 re-inject an event into L2, for example an IDT
  event with a page fault that is caused by a missing entry in SPT02 (the
  shadow page table L0 creates for L2).
  In this case, when vmx_complete_interrupts is called, L0 doesn't know if
  the page fault should be handled by it or by L1 (that is decided later
  when handling the exit).
 So what? When it is decided that an L2 exit is needed, the pending event
 will be transferred into L2's idt_vectoring_info. Otherwise the event
 will be re-injected by the usual mechanism. BTW I don't see where your
 current code sets up L2's idt_vectoring_info if it is decided that L1
 should handle event re-injection.
Suppose we are executing an L2 guest and we get an exit. There are two
possible scenarios here:
A) The L2 exit will be handled by the L1 guest hypervisor. In this case,
when we switch to L1, the IDT vectoring info field is copied from vmcs(02)
to vmcs(12) in prepare_vmcs_12 (part of the nested_vmx_vmexit path). It is
then the responsibility of L1 to deal with the IDT and do the
corresponding logic.
B) The L2 exit will be handled only by L0. In this case we never switch to
L1. L0 handles the exit and resumes L2. Any pending event in the vmcs(02)
IDT vectoring info field is injected into L2 when L0 resumes it.
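The two scenarios can be sketched as follows. This is only an illustration with invented structure and helper names (fake_vmcs, forward_exit_to_l1, reinject_into_l2) standing in for the vmcs02/vmcs12 state of the patch, not the actual KVM code:

```c
#include <stdbool.h>
#include <stdint.h>

#define IDT_INFO_VALID (1u << 31)  /* valid bit of the IDT-vectoring info field */

struct fake_vmcs {
	uint32_t idt_vectoring_info;
};

/* Scenario A: the exit is forwarded to L1, so the vectoring info recorded
 * in vmcs02 is copied into vmcs12 and L1 decides what to do with it. */
static void forward_exit_to_l1(const struct fake_vmcs *vmcs02,
			       struct fake_vmcs *vmcs12)
{
	vmcs12->idt_vectoring_info = vmcs02->idt_vectoring_info;
}

/* Scenario B: L0 handles the exit itself and resumes L2; a valid pending
 * event in vmcs02 is re-injected into L2 on the next entry. */
static bool reinject_into_l2(const struct fake_vmcs *vmcs02,
			     uint32_t *entry_intr_info)
{
	if (!(vmcs02->idt_vectoring_info & IDT_INFO_VALID))
		return false;               /* nothing pending */
	*entry_intr_info = vmcs02->idt_vectoring_info;
	return true;
}
```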

KVM handles the IDT vectoring info at the end of vmx_vcpu_run, calling
vmx_complete_interrupts. The decision whether to switch to L1 is made at
the following points:
1) nested_vmx_check_exception (called from vmx_queue_exception)
2) nested_vmx_intr (called from vmx_interrupt_allowed and
enable_irq_window)
3) vmx_handle_exit

From the x86 perspective the flow looks as follows:
vcpu_enter_guest {
 1
 2
 run (includes vmx_complete_interrupts)
 3
}
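A minimal sketch of that ordering, with invented struct fields and simplified stand-ins for the real functions (only the early return in vmx_complete_interrupts mirrors the patch; everything else is schematic):

```c
#include <stdbool.h>

struct vcpu {
	bool nested_mode;      /* are we running an L2 guest? */
	bool switch_to_l1;     /* a decision to exit to L1 was taken */
	int  events_completed; /* times the arch-agnostic completion ran */
};

/* (1) nested_vmx_check_exception: may decide the exception belongs to L1 */
static void queue_exception(struct vcpu *v) { (void)v; }

/* (2) nested_vmx_intr: a pending interrupt for L1 forces a switch */
static void enable_irq_window(struct vcpu *v)
{
	if (v->nested_mode)
		v->switch_to_l1 = true;
}

/* run: vmx_complete_interrupts returns early in nested mode,
 * bypassing the arch-agnostic event bookkeeping */
static void vmx_complete_interrupts(struct vcpu *v)
{
	if (v->nested_mode)
		return;
	v->events_completed++;
}

/* (3) vmx_handle_exit: may also decide to switch to L1 */
static void handle_exit(struct vcpu *v) { (void)v; }

static void vcpu_enter_guest(struct vcpu *v)
{
	queue_exception(v);          /* 1 */
	enable_irq_window(v);        /* 2 */
	vmx_complete_interrupts(v);  /* run */
	handle_exit(v);              /* 3 */
}
```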


All these functions are called

Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

2009-10-29 Thread Gleb Natapov
On Wed, Oct 28, 2009 at 06:23:42PM +0200, Orit Wasserman wrote:
 
 
 Gleb Natapov g...@redhat.com wrote on 25/10/2009 11:44:31:
 
  On Thu, Oct 22, 2009 at 05:46:16PM +0200, Orit Wasserman wrote:
  
  
   Gleb Natapov g...@redhat.com wrote on 22/10/2009 11:04:58:
  
On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:
@@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	int type;
 	bool idtv_info_valid;
 
-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-
 	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
 
+	if (vmx->nested.nested_mode)
+		return;
+
   Why return here? What does the function do that should not be done in
   nested mode?
  In nested mode L0 injects an interrupt into L2 only in one scenario:
  if there is an IDT_VALID event and L0 decides to run L2 again and not
  to switch back to L1.
  In all other cases the injection is handled by L1.
 This is exactly the kind of scenario that is handled by
 vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() store the
 pending event in an arch-agnostic way and re-injection is handled by
 x86.c. You bypass this logic by inserting a return here and introducing
 the nested_handle_valid_idt() function below.
    The only location where we can truly know if we are switching to L1
    is vmx_vcpu_run, because enable_irq_window (which is called after
    handling the exit) can decide to switch to L1 because of an
    interrupt.
   enable_irq_window() will be called after the L2 VMCS has been set up
   for event re-injection by the previous call to inject_pending_event().
   As far as I can see this should work for interrupt injection. For
   exceptions we should probably require the L2 guest to re-execute the
   faulting instruction for now, like svm does.
  The main issue is that L0 doesn't inject events into L2 but L1 does (we
  want to keep the nested hypervisor semantics as much as possible). Only
  if the event was caused by the fact that L2 is a nested guest and L1
  can't handle it will L0 re-inject an event into L2, for example an IDT
  event with a page fault that is caused by a missing entry in SPT02 (the
  shadow page table L0 creates for L2).
  In this case, when vmx_complete_interrupts is called, L0 doesn't know if
  the page fault should be handled by it or by L1 (that is decided later
  when handling the exit).
 So what? When it is decided that an L2 exit is needed, the pending event
 will be transferred into L2's idt_vectoring_info. Otherwise the event
 will be re-injected by the usual mechanism. BTW I don't see where your
 current code sets up L2's idt_vectoring_info if it is decided that L1
 should handle event re-injection.

  In most other cases, L0 will switch to L1 and L1 will decide if there
  will be re-injection (depending on the L1 hypervisor logic) and update
  the L2 VMCS accordingly.
 
   To simplify our code, it was easier to bypass vmx_complete_interrupts
   when it is called (after running L2) and to add nested_handle_valid_idt
   just before running L2.
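A minimal sketch of what a nested_handle_valid_idt-style hook could do just before resuming L2. The struct and field names here are invented for illustration; the real patch reads the vectoring info saved at the last L2 exit and replays it through the VM-entry fields:

```c
#include <stdint.h>

#define IDT_INFO_VALID (1u << 31)  /* valid bit of the IDT-vectoring info field */

struct l2_state {
	uint32_t saved_idt_vectoring_info; /* captured at the last L2 exit */
	uint32_t vm_entry_intr_info;       /* what the next L2 entry will inject */
};

/* Returns 1 if a pending event was replayed into the entry fields. */
static int nested_handle_valid_idt(struct l2_state *s)
{
	if (!(s->saved_idt_vectoring_info & IDT_INFO_VALID))
		return 0;                          /* nothing pending */
	/* replay the interrupted event into L2 on re-entry */
	s->vm_entry_intr_info = s->saved_idt_vectoring_info;
	s->saved_idt_vectoring_info = 0;
	return 1;
}
```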
 
+	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
 	/* Handle machine checks before interrupts are enabled */
 	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
 	    || (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI
@@ -4747,6 +5064,60 @@ static void fixup_rmode_irq(struct vcpu_vmx *vmx)
 		| vmx->rmode.irq.vector;
 }
  
   +static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
   +{
  It seems by this function you are trying to bypass general event
  reinjection logic. Why?
 See above.
The logic implemented by this function is handled in x86.c in arch
agnostic way. Is there something wrong with this?
   See my comment before
  Sometimes it is wrong to re-inject events from L0 into L2 directly. If
  L2 was not able to handle an event because its IDT is not mapped by the
  L1 shadow page table, we should generate a PF vmexit with valid IDT
  vectoring info to L1 and let L1 handle the event re-injection.
 
  --
   Gleb.

--
Gleb.

Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

2009-10-28 Thread Orit Wasserman


Gleb Natapov g...@redhat.com wrote on 25/10/2009 11:44:31:

 On Thu, Oct 22, 2009 at 05:46:16PM +0200, Orit Wasserman wrote:
 
 
  Gleb Natapov g...@redhat.com wrote on 22/10/2009 11:04:58:
 
   On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:
@@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	int type;
 	bool idtv_info_valid;
 
-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-
 	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
 
+	if (vmx->nested.nested_mode)
+		return;
+
  Why return here? What does the function do that should not be done in
  nested mode?
 In nested mode L0 injects an interrupt into L2 only in one scenario:
 if there is an IDT_VALID event and L0 decides to run L2 again and not
 to switch back to L1.
 In all other cases the injection is handled by L1.
    This is exactly the kind of scenario that is handled by
    vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() store the
    pending event in an arch-agnostic way and re-injection is handled by
    x86.c. You bypass this logic by inserting a return here and
    introducing the nested_handle_valid_idt() function below.
   The only location where we can truly know if we are switching to L1 is
   vmx_vcpu_run, because enable_irq_window (which is called after
   handling the exit) can decide to switch to L1 because of an interrupt.
  enable_irq_window() will be called after the L2 VMCS has been set up
  for event re-injection by the previous call to inject_pending_event().
  As far as I can see this should work for interrupt injection. For
  exceptions we should probably require the L2 guest to re-execute the
  faulting instruction for now, like svm does.
The main issue is that L0 doesn't inject events into L2 but L1 does (we
want to keep the nested hypervisor semantics as much as possible). Only
if the event was caused by the fact that L2 is a nested guest and L1
can't handle it will L0 re-inject an event into L2, for example an IDT
event with a page fault that is caused by a missing entry in SPT02 (the
shadow page table L0 creates for L2).
In this case, when vmx_complete_interrupts is called, L0 doesn't know if
the page fault should be handled by it or by L1 (that is decided later
when handling the exit).
In most other cases, L0 will switch to L1 and L1 will decide if there
will be re-injection (depending on the L1 hypervisor logic) and update
the L2 VMCS accordingly.

  To simplify our code, it was easier to bypass vmx_complete_interrupts
  when it is called (after running L2) and to add nested_handle_valid_idt
  just before running L2.

+	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
 	/* Handle machine checks before interrupts are enabled */
 	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
 	    || (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI
@@ -4747,6 +5064,60 @@ static void fixup_rmode_irq(struct vcpu_vmx *vmx)
 		| vmx->rmode.irq.vector;
 }
 
  +static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
  +{
 It seems by this function you are trying to bypass general event
 reinjection logic. Why?
See above.
   The logic implemented by this function is handled in x86.c in arch
   agnostic way. Is there something wrong with this?
  See my comment before
 Sometimes it is wrong to re-inject events from L0 into L2 directly. If
 L2 was not able to handle an event because its IDT is not mapped by the
 L1 shadow page table, we should generate a PF vmexit with valid IDT
 vectoring info to L1 and let L1 handle the event re-injection.

 --
  Gleb.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

2009-10-25 Thread Gleb Natapov
On Thu, Oct 22, 2009 at 05:46:16PM +0200, Orit Wasserman wrote:
 
 
 Gleb Natapov g...@redhat.com wrote on 22/10/2009 11:04:58:
 
  On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:
@@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	int type;
 	bool idtv_info_valid;
 
-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-
 	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
 
+	if (vmx->nested.nested_mode)
+		return;
+
Why return here? What does the function do that should not be done in
nested mode?
   In nested mode L0 injects an interrupt into L2 only in one scenario:
   if there is an IDT_VALID event and L0 decides to run L2 again and not
   to switch back to L1.
   In all other cases the injection is handled by L1.
  This is exactly the kind of scenario that is handled by
  vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() store the
  pending event in an arch-agnostic way and re-injection is handled by
  x86.c. You bypass this logic by inserting a return here and introducing
  the nested_handle_valid_idt() function below.
 The only location where we can truly know if we are switching to L1 is
 vmx_vcpu_run, because enable_irq_window (which is called after handling
 the exit) can decide to switch to L1 because of an interrupt.
enable_irq_window() will be called after the L2 VMCS has been set up for
event re-injection by the previous call to inject_pending_event(). As
far as I can see this should work for interrupt injection. For
exceptions we should probably require the L2 guest to re-execute the
faulting instruction for now, like svm does.

 To simplify our code, it was easier to bypass vmx_complete_interrupts
 when it is called (after running L2) and to add nested_handle_valid_idt
 just before running L2.
   
+	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
 	/* Handle machine checks before interrupts are enabled */
 	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
 	    || (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI
@@ -4747,6 +5064,60 @@ static void fixup_rmode_irq(struct vcpu_vmx *vmx)
 		| vmx->rmode.irq.vector;
 }

 +static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
 +{
It seems by this function you are trying to bypass general event
reinjection logic. Why?
   See above.
  The logic implemented by this function is handled in x86.c in arch
  agnostic way. Is there something wrong with this?
 See my comment before
Sometimes it is wrong to re-inject events from L0 into L2 directly. If L2
was not able to handle an event because its IDT is not mapped by the L1
shadow page table, we should generate a PF vmexit with valid IDT
vectoring info to L1 and let L1 handle the event re-injection.
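That policy can be sketched as follows. All helper and field names here are invented for illustration (the real code works on the VMCS fields directly); the stubbed idt_mapped_by_l1 simply assumes the unmapped case:

```c
#include <stdbool.h>
#include <stdint.h>

#define EXIT_REASON_EXCEPTION_PF 0x0e  /* illustrative page-fault exit code */

struct nested_exit {
	uint32_t exit_reason;
	uint32_t idt_vectoring_info; /* the event L1 must re-deliver */
};

/* Stub: assume L1's shadow page table does not map L2's IDT. */
static bool idt_mapped_by_l1(uint64_t idt_gpa)
{
	(void)idt_gpa;
	return false;
}

/* Returns true if an exit to L1 was synthesized instead of injecting
 * the pending event directly into L2. */
static bool deliver_event_to_l2(uint32_t pending_event, uint64_t idt_gpa,
				struct nested_exit *exit_to_l1)
{
	if (!idt_mapped_by_l1(idt_gpa)) {
		/* generate a PF vmexit carrying the original event so L1
		 * can map the IDT and re-deliver it itself */
		exit_to_l1->exit_reason = EXIT_REASON_EXCEPTION_PF;
		exit_to_l1->idt_vectoring_info = pending_event;
		return true;
	}
	return false; /* event injected directly into L2 */
}
```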

--
Gleb.


Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

2009-10-22 Thread Gleb Natapov
On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:
@@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	int type;
 	bool idtv_info_valid;
 
-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-
 	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
 
+	if (vmx->nested.nested_mode)
+		return;
+
  Why return here? What does the function do that should not be done in
  nested mode?
 In nested mode L0 injects an interrupt into L2 only in one scenario:
 if there is an IDT_VALID event and L0 decides to run L2 again and not to
 switch back to L1.
 In all other cases the injection is handled by L1.
This is exactly the kind of scenario that is handled by
vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() store the
pending event in an arch-agnostic way and re-injection is handled by
x86.c. You bypass this logic by inserting a return here and introducing
the nested_handle_valid_idt() function below.

 
+	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
 	/* Handle machine checks before interrupts are enabled */
 	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
 	    || (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI
@@ -4747,6 +5064,60 @@ static void fixup_rmode_irq(struct vcpu_vmx *vmx)
 		| vmx->rmode.irq.vector;
 }
  
   +static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
   +{
  It seems by this function you are trying to bypass general event
  reinjection logic. Why?
 See above.
The logic implemented by this function is handled in x86.c in arch
agnostic way. Is there something wrong with this?

+	vmx->launched = vmx->nested.l2_state->launched;
+
   Can you explain why the ->launched logic is needed?
  It is possible that L1 called vmlaunch but we didn't actually run L2
  (for example, there was an interrupt and enable_irq_window switched
  back to L1 before running L2). L1 thinks the vmlaunch was successful
  and calls vmresume the next time, but KVM needs to call vmlaunch for
  L2.
 handle_vmlaunch() and handle_vmresume() are exactly the same. Why does
 KVM need to run one and not the other?
 
+static int nested_vmx_vmexit(struct kvm_vcpu *vcpu,
+			     bool is_interrupt)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int initial_pfu_active = vcpu->fpu_active;
+
+	if (!vmx->nested.nested_mode) {
+		printk(KERN_INFO "WARNING: %s called but not in nested mode\n",
+		       __func__);
+		return 0;
+	}
+
+	save_msrs(vmx->guest_msrs, vmx->save_nmsrs);
+
+	sync_cached_regs_to_vmcs(vcpu);
+
+	if (!nested_map_shadow_vmcs(vcpu)) {
+		printk(KERN_INFO "Error mapping shadow vmcs\n");
+		set_rflags_to_vmx_fail_valid(vcpu);
   An error during vmexit should set the abort flag, not change rflags.
  I think this is more a vmlaunch/vmresume error (it is in the code that
  switches back to L1).
 How is this a vmlaunch/vmresume error? This function is called to exit
 from the L2 guest while on the L2 vmcs. It is called asynchronously with
 respect to the L2 guest, and you modify the L2 guest's rflags register
 at an unpredictable place here.

--
Gleb.


Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

2009-10-22 Thread Avi Kivity

On 10/21/2009 04:43 PM, Orit Wasserman wrote:


It is possible that L1 called vmlaunch but we didn't actually run L2 (for
example, there was an interrupt and enable_irq_window switched back to L1
before running L2). L1 thinks the vmlaunch was successful and calls
vmresume the next time, but KVM needs to call vmlaunch for L2.


Is it really possible?  If vmlaunch is started, it should complete 
unconditionally for L1.  The irq window should be recalculated based on 
the guest vmcs exit on external interrupt and guest eflags/cr8 etc.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

2009-10-22 Thread Orit Wasserman


Avi Kivity a...@redhat.com wrote on 20/10/2009 06:56:39:

 From:

 Avi Kivity a...@redhat.com

 To:

 Orit Wasserman/Haifa/i...@ibmil

 Cc:

 kvm@vger.kernel.org, Ben-Ami Yassour1/Haifa/i...@ibmil, Abel Gordon/
 Haifa/i...@ibmil, Muli Ben-Yehuda/Haifa/i...@ibmil,
 aligu...@us.ibm.com, md...@us.ibm.com

 Date:

 20/10/2009 06:56

 Subject:

 Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

 On 10/15/2009 11:41 PM, or...@il.ibm.com wrote:
 From: Orit Wasserman or...@il.ibm.com

 ---
  arch/x86/kvm/vmx.c | 1173 ++--
  1 files changed, 1148 insertions(+), 25 deletions(-)
 
  diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
  index 6a4c252..e814029 100644
  --- a/arch/x86/kvm/vmx.c
  +++ b/arch/x86/kvm/vmx.c
@@ -209,6 +209,7 @@ struct __attribute__ ((__packed__)) level_state {
 	struct vmcs *vmcs;
 	int cpu;
 	int launched;
+	bool first_launch;
 };

 struct nested_vmx {
@@ -216,6 +217,12 @@ struct nested_vmx {
 	bool vmxon;
 	/* What is the location of the vmcs l1 keeps for l2? (in level1 gpa) */
 	u64 vmptr;
+	/* Are we running nested guest */
+	bool nested_mode;
+	/* L1 requested VMLAUNCH or VMRESUME but we didn't run L2 yet */
+	bool nested_run_pending;
+	/* flag indicating if there was a valid IDT after exiting from l2 */
+	bool nested_valid_idt;
 

 Did you mean valid_idt_vectoring_info?
yes.

 No need to prefix everything with nested_ inside nested_vmx.
OK.

+void prepare_vmcs_12(struct kvm_vcpu *vcpu)
+{
+	struct shadow_vmcs *l2_shadow_vmcs =
+		get_shadow_vmcs(vcpu);
+
+	l2_shadow_vmcs->guest_es_selector = vmcs_read16(GUEST_ES_SELECTOR);
+	l2_shadow_vmcs->guest_cs_selector = vmcs_read16(GUEST_CS_SELECTOR);
+	l2_shadow_vmcs->guest_ss_selector = vmcs_read16(GUEST_SS_SELECTOR);
+	l2_shadow_vmcs->guest_ds_selector = vmcs_read16(GUEST_DS_SELECTOR);
+	l2_shadow_vmcs->guest_fs_selector = vmcs_read16(GUEST_FS_SELECTOR);
+	l2_shadow_vmcs->guest_gs_selector = vmcs_read16(GUEST_GS_SELECTOR);
+	l2_shadow_vmcs->guest_ldtr_selector = vmcs_read16(GUEST_LDTR_SELECTOR);
+	l2_shadow_vmcs->guest_tr_selector = vmcs_read16(GUEST_TR_SELECTOR);
+
+	l2_shadow_vmcs->tsc_offset = vmcs_read64(TSC_OFFSET);
+	l2_shadow_vmcs->guest_physical_address =
+		vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+	l2_shadow_vmcs->vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER);
 

 Physical addresses need translation, no?
If you are referring to GUEST_PHYSICAL_ADDRESS, then there is no need for
translation for L1. It needs to stay an L2 physical address.

+	l2_shadow_vmcs->guest_cr0 = vmcs_readl(GUEST_CR0);
+
+	l2_shadow_vmcs->guest_cr4 = vmcs_readl(GUEST_CR4);
 

 We don't allow the guest to modify these, so no need to read them.  If
 you do, you need to remove the bits that we modify.
You are correct, for example CR0.TS; it will be fixed in the next set of
patches.

+
+int load_vmcs_common(struct shadow_vmcs *src)
+{
+
+	vmcs_write64(VMCS_LINK_POINTER, src->vmcs_link_pointer);
 

 Why load this?
At the moment it is not used; maybe it will be in the future.
I can add a check to see if it was changed.

+	vmcs_write64(GUEST_IA32_DEBUGCTL, src->guest_ia32_debugctl);
 

 I think some features there are dangerous.
I will look into it.

+	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, src->vm_entry_msr_load_count);
 

 Need to verify?  Also need to validate the loaded MSRs and run them
 through kvm_set_msr() instead of letting the cpu do it.
I will add the checks.

 --
 I have a truly marvellous patch that fixes the bug which this
 signature is too narrow to contain.




Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

2009-10-22 Thread Orit Wasserman


Gleb Natapov g...@redhat.com wrote on 22/10/2009 11:04:58:

 On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:
@@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	int type;
 	bool idtv_info_valid;
 
-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-
 	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
 
+	if (vmx->nested.nested_mode)
+		return;
+
   Why return here? What does the function do that should not be done in
   nested mode?
  In nested mode L0 injects an interrupt into L2 only in one scenario:
  if there is an IDT_VALID event and L0 decides to run L2 again and not
  to switch back to L1.
  In all other cases the injection is handled by L1.
 This is exactly the kind of scenario that is handled by
 vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() store the
 pending event in an arch-agnostic way and re-injection is handled by
 x86.c. You bypass this logic by inserting a return here and introducing
 the nested_handle_valid_idt() function below.
The only location where we can truly know if we are switching to L1 is
vmx_vcpu_run, because enable_irq_window (which is called after handling
the exit) can decide to switch to L1 because of an interrupt.
To simplify our code, it was easier to bypass vmx_complete_interrupts
when it is called (after running L2) and to add nested_handle_valid_idt
just before running L2.
  
+	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
 	/* Handle machine checks before interrupts are enabled */
 	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
 	    || (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI
@@ -4747,6 +5064,60 @@ static void fixup_rmode_irq(struct vcpu_vmx *vmx)
 		| vmx->rmode.irq.vector;
 }
   
+static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
+{
   It seems by this function you are trying to bypass general event
   reinjection logic. Why?
  See above.
 The logic implemented by this function is handled in x86.c in arch
 agnostic way. Is there something wrong with this?
See my comment before

+	vmx->launched = vmx->nested.l2_state->launched;
+
   Can you explain why the ->launched logic is needed?
  It is possible that L1 called vmlaunch but we didn't actually run L2
  (for example, there was an interrupt and enable_irq_window switched
  back to L1 before running L2). L1 thinks the vmlaunch was successful
  and calls vmresume the next time, but KVM needs to call vmlaunch for
  L2.
 handle_vmlaunch() and handle_vmresume() are exactly the same. Why does
 KVM need to run one and not the other?
Yes, they are very similar (almost the same code); the only difference is
the VMCLEAR check. We need to emulate the VMX hardware behavior for these
two instructions and check the VMCS12 launch state.
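The launch-state semantics being emulated here (per the VMX spec: VMLAUNCH requires a "clear" VMCS, VMRESUME a "launched" one, and VMCLEAR resets the state) can be sketched as a tiny state machine. The struct and handler names below are invented for illustration, not the patch's actual code:

```c
#include <stdbool.h>

enum launch_state { VMCS_STATE_CLEAR, VMCS_STATE_LAUNCHED };

struct vmcs12 {
	enum launch_state state;
};

/* VMLAUNCH: only legal on a VMCS in the "clear" state; marks it launched. */
static bool handle_vmlaunch(struct vmcs12 *v)
{
	if (v->state != VMCS_STATE_CLEAR)
		return false;            /* VMfail: VMLAUNCH with non-clear VMCS */
	v->state = VMCS_STATE_LAUNCHED;
	return true;
}

/* VMRESUME: only legal on a VMCS that has already been launched. */
static bool handle_vmresume(const struct vmcs12 *v)
{
	return v->state == VMCS_STATE_LAUNCHED; /* VMfail otherwise */
}

/* VMCLEAR: resets the launch state back to "clear". */
static void handle_vmclear(struct vmcs12 *v)
{
	v->state = VMCS_STATE_CLEAR;
}
```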

+static int nested_vmx_vmexit(struct kvm_vcpu *vcpu,
+			     bool is_interrupt)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int initial_pfu_active = vcpu->fpu_active;
+
+	if (!vmx->nested.nested_mode) {
+		printk(KERN_INFO "WARNING: %s called but not in nested mode\n",
+		       __func__);
+		return 0;
+	}
+
+	save_msrs(vmx->guest_msrs, vmx->save_nmsrs);
+
+	sync_cached_regs_to_vmcs(vcpu);
+
+	if (!nested_map_shadow_vmcs(vcpu)) {
+		printk(KERN_INFO "Error mapping shadow vmcs\n");
+		set_rflags_to_vmx_fail_valid(vcpu);
   An error during vmexit should set the abort flag, not change rflags.
  I think this is more a vmlaunch/vmresume error (it is in the code that
  switches back to L1).
 How is this a vmlaunch/vmresume error? This function is called to exit
 from the L2 guest while on the L2 vmcs. It is called asynchronously with
 respect to the L2 guest, and you modify the L2 guest's rflags register
 at an unpredictable place here.
OK.

 --
  Gleb.



Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

2009-10-19 Thread Gleb Natapov
On Thu, Oct 15, 2009 at 04:41:46PM +0200, or...@il.ibm.com wrote:
 From: Orit Wasserman or...@il.ibm.com
 
 ---
  arch/x86/kvm/vmx.c | 1173 ++--
  1 files changed, 1148 insertions(+), 25 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 6a4c252..e814029 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -209,6 +209,7 @@ struct __attribute__ ((__packed__)) level_state {
   struct vmcs *vmcs;
   int cpu;
   int launched;
 + bool first_launch;
  };
  
  struct nested_vmx {
 @@ -216,6 +217,12 @@ struct nested_vmx {
   bool vmxon;
   /* What is the location of the  vmcs l1 keeps for l2? (in level1 gpa) */
   u64 vmptr;
 + /* Are we running nested guest */
 + bool nested_mode;
 + /* L1 requested VMLAUNCH or VMRESUME but we didn't run L2 yet */
 + bool nested_run_pending;
 + /* flag indicating if there was a valid IDT after exiting from l2 */
 + bool nested_valid_idt;
   /*
* Level 2 state : includes vmcs,registers and
* a copy of vmcs12 for vmread/vmwrite
@@ -240,6 +247,10 @@ static inline int vmcs_field_length(unsigned long field)
 	return (VMCS_FIELD_LENGTH_MASK & field) >> 13;
 }
  
 +#define NESTED_VM_EXIT_CONTROLS_MASK (~(VM_EXIT_LOAD_IA32_PAT | \
 + VM_EXIT_SAVE_IA32_PAT))
 +#define NESTED_VM_ENTRY_CONTROLS_MASK (~(VM_ENTRY_LOAD_IA32_PAT | \
 +  VM_ENTRY_IA32E_MODE))
  struct vmcs {
   u32 revision_id;
   u32 abort;
@@ -303,6 +314,12 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 	return container_of(vcpu, struct vcpu_vmx, vcpu);
 }
  
+static inline struct shadow_vmcs *get_shadow_vmcs(struct kvm_vcpu *vcpu)
+{
+	WARN_ON(!to_vmx(vcpu)->nested.l2_state->shadow_vmcs);
+	return to_vmx(vcpu)->nested.l2_state->shadow_vmcs;
+}
 +
  #define SHADOW_VMCS_OFFSET(x) offsetof(struct shadow_vmcs, x)
  
  static unsigned short vmcs_field_to_offset_table[HOST_RIP+1] = {
 @@ -822,8 +839,16 @@ static struct kvm_vmx_segment_field {
  static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
  
  static int nested_vmx_check_permission(struct kvm_vcpu *vcpu);
 +static int nested_vmx_check_exception(struct vcpu_vmx *vmx, unsigned nr,
 +   bool has_error_code, u32 error_code);
 +static int nested_vmx_intr(struct kvm_vcpu *vcpu);
  static int create_l1_state(struct kvm_vcpu *vcpu);
  static int create_l2_state(struct kvm_vcpu *vcpu);
 +static int launch_guest(struct kvm_vcpu *vcpu);
 +static int nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu);
 +static int nested_vmx_exit_handled(struct kvm_vcpu *vcpu, bool kvm_override);
 +static int nested_vmx_vmexit(struct kvm_vcpu *vcpu,
 +  bool is_interrupt);
  
  /*
   * Keep MSR_K6_STAR at the end, as setup_msrs() will try to optimize it
 @@ -940,6 +965,18 @@ static inline bool cpu_has_vmx_ept_2m_page(void)
   return !!(vmx_capability.ept & VMX_EPT_2MB_PAGE_BIT);
  }
  
 +static inline int is_exception(u32 intr_info)
 +{
 + return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
 + == (INTR_TYPE_HARD_EXCEPTION | INTR_INFO_VALID_MASK);
 +}
 +
 +static inline int is_nmi(u32 intr_info)
 +{
 + return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
 + == (INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK);
 +}
 +
  static inline int cpu_has_vmx_invept_individual_addr(void)
  {
   return !!(vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT);
 @@ -990,6 +1027,51 @@ static inline bool report_flexpriority(void)
   return flexpriority_enabled;
  }
  
 +static inline int nested_cpu_has_vmx_tpr_shadow(struct  kvm_vcpu *vcpu)
 +{
 + return cpu_has_vmx_tpr_shadow() &&
 + get_shadow_vmcs(vcpu)->cpu_based_vm_exec_control &
 + CPU_BASED_TPR_SHADOW;
 +}
 +
 +static inline int nested_cpu_has_secondary_exec_ctrls(struct kvm_vcpu *vcpu)
 +{
 + return cpu_has_secondary_exec_ctrls() &&
 + get_shadow_vmcs(vcpu)->cpu_based_vm_exec_control &
 + CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
 +}
 +
 +static inline bool nested_vm_need_virtualize_apic_accesses(struct kvm_vcpu
 +*vcpu)
 +{
 + return get_shadow_vmcs(vcpu)->secondary_vm_exec_control &
 + SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 +}
 +
 +static inline int nested_cpu_has_vmx_ept(struct kvm_vcpu *vcpu)
 +{
 + return get_shadow_vmcs(vcpu)->
 + secondary_vm_exec_control & SECONDARY_EXEC_ENABLE_EPT;
 +}
 +
 +static inline int nested_cpu_has_vmx_vpid(struct kvm_vcpu *vcpu)
 +{
 + return get_shadow_vmcs(vcpu)->secondary_vm_exec_control &
 + SECONDARY_EXEC_ENABLE_VPID;
 +}
 +
 +static inline int nested_cpu_has_vmx_pat(struct kvm_vcpu *vcpu)
 +{
 + return get_shadow_vmcs(vcpu)->vm_entry_controls &
 + 

Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

2009-10-19 Thread Avi Kivity

On 10/15/2009 11:41 PM, or...@il.ibm.com wrote:

From: Orit Wasserman or...@il.ibm.com

---
  arch/x86/kvm/vmx.c | 1173 ++--
  1 files changed, 1148 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6a4c252..e814029 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -209,6 +209,7 @@ struct __attribute__ ((__packed__)) level_state {
struct vmcs *vmcs;
int cpu;
int launched;
+   bool first_launch;
  };

  struct nested_vmx {
@@ -216,6 +217,12 @@ struct nested_vmx {
bool vmxon;
/* What is the location of the  vmcs l1 keeps for l2? (in level1 gpa) */
u64 vmptr;
+   /* Are we running nested guest */
+   bool nested_mode;
+   /* L1 requested VMLAUNCH or VMRESUME but we didn't run L2 yet */
+   bool nested_run_pending;
+   /* flag indicating if there was a valid IDT after exiting from l2 */
+   bool nested_valid_idt;
   


Did you mean valid_idt_vectoring_info?

No need to prefix everything with nested_ inside nested_vmx.


+void prepare_vmcs_12(struct kvm_vcpu *vcpu)
+{
+   struct shadow_vmcs *l2_shadow_vmcs =
+   get_shadow_vmcs(vcpu);
+
+   l2_shadow_vmcs->guest_es_selector = vmcs_read16(GUEST_ES_SELECTOR);
+   l2_shadow_vmcs->guest_cs_selector = vmcs_read16(GUEST_CS_SELECTOR);
+   l2_shadow_vmcs->guest_ss_selector = vmcs_read16(GUEST_SS_SELECTOR);
+   l2_shadow_vmcs->guest_ds_selector = vmcs_read16(GUEST_DS_SELECTOR);
+   l2_shadow_vmcs->guest_fs_selector = vmcs_read16(GUEST_FS_SELECTOR);
+   l2_shadow_vmcs->guest_gs_selector = vmcs_read16(GUEST_GS_SELECTOR);
+   l2_shadow_vmcs->guest_ldtr_selector = vmcs_read16(GUEST_LDTR_SELECTOR);
+   l2_shadow_vmcs->guest_tr_selector = vmcs_read16(GUEST_TR_SELECTOR);
+
+   l2_shadow_vmcs->tsc_offset = vmcs_read64(TSC_OFFSET);
+   l2_shadow_vmcs->guest_physical_address =
+   vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+   l2_shadow_vmcs->vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER);
   


Physical addresses need translation,  no?
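(The raw GUEST_PHYSICAL_ADDRESS read after an L2 exit belongs to one level's guest-physical address space, while the shadow VMCS handed back to L1 needs the value in L1's space; copying it verbatim conflates the two. A toy model of the required translation, where `l2_gpa_to_l1_gpa` and the linear map are hypothetical stand-ins for walking the real nested page tables:)

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1ULL << PAGE_SHIFT) - 1)

/* Hypothetical map entry: one L2 guest-physical page backed by one L1 page. */
struct gpa_map_entry {
	uint64_t l2_page;	/* L2 guest-physical page number */
	uint64_t l1_page;	/* backing L1 guest-physical page number */
};

/*
 * Translate an L2 guest-physical address into L1's guest-physical space
 * by scanning a (toy) linear map; real code would walk L1's paging
 * structures instead.  Returns ~0ULL when the page is unmapped.
 */
static uint64_t l2_gpa_to_l1_gpa(const struct gpa_map_entry *map, size_t n,
				 uint64_t l2_gpa)
{
	uint64_t page = l2_gpa >> PAGE_SHIFT;
	size_t i;

	for (i = 0; i < n; i++)
		if (map[i].l2_page == page)
			return (map[i].l1_page << PAGE_SHIFT) |
			       (l2_gpa & PAGE_MASK);
	return ~0ULL;
}
```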


+   l2_shadow_vmcs->guest_cr0 = vmcs_readl(GUEST_CR0);
+
+   l2_shadow_vmcs->guest_cr4 = vmcs_readl(GUEST_CR4);
   


We don't allow the guest to modify these, so no need to read them.  If 
you do, you need to remove the bits that we modify.
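(Since the host intercepts CR0/CR4 writes, the live GUEST_CR0 mixes bits the guest genuinely owns with bits the host forced; the guest's view is reconstructed by taking host-owned bits from the read shadow instead. A minimal sketch of that combination, with `guest_owned` standing in for KVM's cr0_guest_owned_bits mask:)

```c
#include <stdint.h>

/*
 * Reconstruct the guest's view of CR0: bits the guest owns come from
 * the live hardware value, bits the host intercepts come from the read
 * shadow maintained for the guest.
 */
static uint64_t guest_view_of_cr0(uint64_t hw_cr0, uint64_t read_shadow,
				  uint64_t guest_owned)
{
	return (hw_cr0 & guest_owned) | (read_shadow & ~guest_owned);
}
```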



+
+int load_vmcs_common(struct shadow_vmcs *src)
+{
+
+   vmcs_write64(VMCS_LINK_POINTER, src->vmcs_link_pointer);
   


Why load this?


+   vmcs_write64(GUEST_IA32_DEBUGCTL, src->guest_ia32_debugctl);
   


I think some features there are dangerous.
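(One way to address that is to whitelist the IA32_DEBUGCTL bits the host is prepared to virtualize and drop everything else before loading the guest value. A sketch under that assumption; the LBR | BTF whitelist here is illustrative, not necessarily what KVM actually permits:)

```c
#include <stdint.h>

#define DEBUGCTLMSR_LBR 0x1ULL	/* last-branch-record tracing */
#define DEBUGCTLMSR_BTF 0x2ULL	/* single-step on taken branches */

/*
 * Sanitize a guest-supplied IA32_DEBUGCTL value: keep only whitelisted
 * bits and silently drop the rest, so the guest cannot enable debug
 * features the host does not want to expose.
 */
static uint64_t sanitize_debugctl(uint64_t guest_val)
{
	const uint64_t allowed = DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF;

	return guest_val & allowed;
}
```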


+   vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, src->vm_entry_msr_load_count);
   


Need to verify?  Also need to validate the loaded MSRs and run them 
through kvm_set_msr() instead of letting the cpu do it.
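(That is, instead of handing the guest's VM-entry MSR-load area to the CPU, the host walks the list itself and pushes each entry through its own checked setter, so a bad entry fails cleanly. A sketch where `set_msr` is a hypothetical stand-in for kvm_set_msr():)

```c
#include <stdint.h>
#include <stddef.h>

/* Layout of one VM-entry MSR-load area entry: index, reserved, value. */
struct msr_entry {
	uint32_t index;
	uint32_t reserved;
	uint64_t value;
};

/*
 * Apply a guest-supplied MSR-load list through a host-controlled setter
 * rather than letting the CPU consume the raw area.  `set_msr` returns
 * 0 on success.  Returns the 1-based index of the first entry that
 * failed, or 0 if every entry was applied.
 */
static unsigned load_msr_list(const struct msr_entry *list, size_t count,
			      int (*set_msr)(uint32_t index, uint64_t value))
{
	size_t i;

	for (i = 0; i < count; i++)
		if (set_msr(list[i].index, list[i].value) != 0)
			return (unsigned)(i + 1);
	return 0;
}
```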


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html