Re: [PATCH] KVM: ARM: enable Cortex A7 hosts
On Wed, Sep 25, 2013 at 02:32:31PM +0900, Simon Horman wrote:
> On Thu, Sep 19, 2013 at 02:01:48PM +0200, Ulrich Hecht wrote:
> > KVM runs fine on Cortex A7 cores, so they should be enabled.
> > Tested on an APE6EVM board (r8a73a4 SoC).
> >
> > Signed-off-by: Ulrich Hecht <ulrich.he...@gmail.com>
>
> Hi Ulrich,
>
> I'm not entirely sure, but it seems to me that you should expand the CC
> list of this patch somehow, as it doesn't seem to have got any attention
> in the week since you sent it. Adding kvm...@lists.cs.columbia.edu would
> be a good start.
>
>   # ./scripts/get_maintainer.pl -f arch/arm/kvm/guest.c
>   Christoffer Dall <christoffer.d...@linaro.org> (supporter:KERNEL VIRTUAL MA...)
>   Gleb Natapov <g...@redhat.com> (supporter:KERNEL VIRTUAL MA...)
>   Paolo Bonzini <pbonz...@redhat.com> (supporter:KERNEL VIRTUAL MA...)
>   Russell King <li...@arm.linux.org.uk> (maintainer:ARM PORT)
>   kvm...@lists.cs.columbia.edu (open list:KERNEL VIRTUAL MA...)
>   kvm@vger.kernel.org (open list:KERNEL VIRTUAL MA...)
>   linux-arm-ker...@lists.infradead.org (moderated list:ARM PORT)
>   linux-ker...@vger.kernel.org (open list)
>
> > ---
> >  arch/arm/kvm/guest.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> > index 152d036..05c62d5 100644
> > --- a/arch/arm/kvm/guest.c
> > +++ b/arch/arm/kvm/guest.c
> > @@ -192,6 +192,8 @@ int __attribute_const__ kvm_target_cpu(void)
> >  	switch (part_number) {
> >  	case ARM_CPU_PART_CORTEX_A15:
> >  		return KVM_ARM_TARGET_CORTEX_A15;
> > +	case ARM_CPU_PART_CORTEX_A7:
> > +		return KVM_ARM_TARGET_CORTEX_A15;
> >  	default:
> >  		return -EINVAL;
> >  	}
> > --
> > 1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

--
	Gleb.
[PATCH] KVM: VMX: do not check bit 12 of EPT violation exit qualification when undefined
Bit 12 is undefined in any of the following cases:
- If the "NMI exiting" VM-execution control is 1 and the "virtual NMIs"
  VM-execution control is 0.
- If the VM exit sets the valid bit in the IDT-vectoring information field.

Signed-off-by: Gleb Natapov <g...@redhat.com>

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a1216de..0e06c1c4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5345,7 +5345,9 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	 * There are errata that may cause this bit to not be set:
 	 * AAK134, BY25.
 	 */
-	if (exit_qualification & INTR_INFO_UNBLOCK_NMI)
+	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+	    cpu_has_virtual_nmis() &&
+	    (exit_qualification & INTR_INFO_UNBLOCK_NMI))
 		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
 			      GUEST_INTR_STATE_NMI);

 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
--
	Gleb.
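For readers without the SDM at hand, the rule the commit message cites can be restated as a small predicate. This is a sketch with invented names, not KVM code; it only encodes the two "undefined" cases listed above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not from the patch): returns true when bit 12
 * ("NMI unblocking due to IRET") of the EPT-violation exit qualification
 * carries a defined value, per the two cases in the commit message. */
static bool exit_qual_bit12_defined(bool nmi_exiting_ctrl,
                                    bool virtual_nmis_ctrl,
                                    bool idt_vectoring_valid)
{
    /* Case 1: "NMI exiting" is 1 while "virtual NMIs" is 0. */
    if (nmi_exiting_ctrl && !virtual_nmis_ctrl)
        return false;
    /* Case 2: the VM exit set the valid bit in IDT-vectoring info. */
    if (idt_vectoring_valid)
        return false;
    return true;
}
```

The guard the patch adds mirrors this predicate: the bit is consulted only when `cpu_has_virtual_nmis()` holds and the IDT-vectoring information field is not valid.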
Yield CPU on poll for serial ports
Hi,

Is it possible to implement a "Yield CPU on poll" option similar to VMware's:
http://www.vmware.com/support/ws45/doc/devices_serial_ws.html#1023869 ?

I use KVM for Windows kernel development, and it would be great not to have
100% CPU load during a debugging-over-serial-line session.

Thanks,
Dmitry
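For illustration only: the VMware option the poster refers to boils down to backing off when the polled serial port stays idle, instead of spinning at 100% CPU. The sketch below is an invented userspace analogy (names and constants are assumptions, not QEMU/KVM code) computing a capped exponential backoff from the number of consecutive empty polls:

```c
#include <assert.h>

/* Illustrative sketch only -- not QEMU/KVM code.  Returns how long to
 * sleep before the next probe of an idle serial port: nothing while data
 * is flowing, then an exponentially growing, capped backoff. */
static long poll_backoff_ns(unsigned idle_polls)
{
    const long base_ns = 50000;     /* 50 us after the first empty poll */
    const long cap_ns  = 5000000;   /* never sleep longer than 5 ms */
    long ns;

    if (idle_polls == 0)
        return 0;                   /* data just seen: keep polling hard */
    if (idle_polls > 8)
        idle_polls = 8;             /* bound the shift */
    ns = base_ns << (idle_polls - 1);
    return ns < cap_ns ? ns : cap_ns;
}
```

A poll loop would sleep for `poll_backoff_ns(idle)` nanoseconds between probes (e.g. via `nanosleep`), resetting `idle` to zero whenever a byte arrives, so latency stays low under load while an idle debug session stops burning a core.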
Re: [GIT PULL] KVM/ARM Fixes for 3.12-rc2
On 24/09/2013 21:49, Christoffer Dall wrote:
> The following changes since commit 62d228b8c676232eca579f91cc0782b060a59097:
>
>   Merge branch 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm (2013-09-17 22:20:30 -0400)
>
> are available in the git repository at:
>
>   git://git.linaro.org/people/cdall/linux-kvm-arm.git tags/kvm-arm-fixes-3.12-rc2
>
> for you to fetch changes up to ac570e0493815e0b41681c89cb50d66421429d27:
>
>   ARM: kvm: rename cpu_reset to avoid name clash (2013-09-24 11:15:05 -0700)
>
> ----------------------------------------------------------------
> KVM/ARM Fixes for 3.12-rc2
>
> Fixes compilation error with KVM/ARM enabled.
>
> ----------------------------------------------------------------
> Olof Johansson (1):
>       ARM: kvm: rename cpu_reset to avoid name clash
>
>  arch/arm/kvm/reset.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Pulled, thanks.

Paolo
Re: [PATCH] KVM: VMX: do not check bit 12 of EPT violation exit qualification when undefined
On 25/09/2013 09:58, Gleb Natapov wrote:
> Bit 12 is undefined in any of the following cases:
> - If the "NMI exiting" VM-execution control is 1 and the "virtual NMIs"
>   VM-execution control is 0.
> - If the VM exit sets the valid bit in the IDT-vectoring information field.
>
> Signed-off-by: Gleb Natapov <g...@redhat.com>
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a1216de..0e06c1c4 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5345,7 +5345,9 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  	 * There are errata that may cause this bit to not be set:
>  	 * AAK134, BY25.
>  	 */
> -	if (exit_qualification & INTR_INFO_UNBLOCK_NMI)
> +	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
> +	    cpu_has_virtual_nmis() &&
> +	    (exit_qualification & INTR_INFO_UNBLOCK_NMI))
>  		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
>  			      GUEST_INTR_STATE_NMI);
>
>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);

Applied to kvm/master, thanks.

Paolo
[PATCH 2/4] KVM: nVMX: Do not put exception that caused vmexit to IDT_VECTORING_INFO
If an exception causes a vmexit directly, it should not be reported in
IDT_VECTORING_INFO during the exit. For that we need to be able to
distinguish between an exception that is injected into the nested VM and
one that is reinjected because its delivery failed. Fortunately we already
have a mechanism to do so for nested SVM, so here we just use the correct
function to requeue exceptions, and make sure that a reinjected exception
is not moved to IDT_VECTORING_INFO during vmexit emulation and not
re-checked for interception during delivery.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/vmx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7eb0512..663bc5e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1921,7 +1921,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;

-	if (nr == PF_VECTOR && is_guest_mode(vcpu) &&
+	if (!reinject && nr == PF_VECTOR && is_guest_mode(vcpu) &&
 	    !vmx->nested.nested_run_pending && nested_pf_handled(vcpu))
 		return;

@@ -7053,9 +7053,9 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
 	case INTR_TYPE_HARD_EXCEPTION:
 		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
 			u32 err = vmcs_read32(error_code_field);
-			kvm_queue_exception_e(vcpu, vector, err);
+			kvm_requeue_exception_e(vcpu, vector, err);
 		} else
-			kvm_queue_exception(vcpu, vector);
+			kvm_requeue_exception(vcpu, vector);
 		break;
 	case INTR_TYPE_SOFT_INTR:
 		vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field);
@@ -8013,7 +8013,7 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 	u32 idt_vectoring;
 	unsigned int nr;

-	if (vcpu->arch.exception.pending) {
+	if (vcpu->arch.exception.pending && vcpu->arch.exception.reinject) {
 		nr = vcpu->arch.exception.nr;
 		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
--
1.8.4.rc3
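The distinction the patch leans on can be sketched as a tiny predicate (invented names, not the actual KVM structures): on vmexit emulation, only an exception whose delivery had already started and failed (one queued via the "requeue" path) belongs in the vectoring-info field; a fresh exception that itself caused the vmexit does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the state the patch distinguishes. */
struct pending_exception {
    bool pending;
    bool reinject;   /* set by the requeue path, not by a fresh queue */
};

/* Should this exception be saved into vmcs12's IDT_VECTORING_INFO when
 * emulating a vmexit?  Only if it was mid-delivery (reinjected). */
static bool goes_to_idt_vectoring_info(const struct pending_exception *e)
{
    return e->pending && e->reinject;
}
```

This mirrors the `vcpu->arch.exception.pending && vcpu->arch.exception.reinject` test the patch adds to vmcs12_save_pending_event().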
[PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
All exceptions should be checked for intercept during delivery to L2, but
we check only #PF currently. Drop nested_run_pending while we are at it,
since an exception cannot be injected during vmentry anyway.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/vmx.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 663bc5e..5bfa09d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1902,12 +1902,11 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
  * a #PF exception (this is the only case in which KVM injects a #PF when L2
  * is running).
  */
-static int nested_pf_handled(struct kvm_vcpu *vcpu)
+static int nested_ex_handled(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

-	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */
-	if (!(vmcs12->exception_bitmap & (1u << PF_VECTOR)))
+	if (!(vmcs12->exception_bitmap & (1u << nr)))
 		return 0;

 	nested_vmx_vmexit(vcpu);
@@ -1921,8 +1920,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;

-	if (!reinject && nr == PF_VECTOR && is_guest_mode(vcpu) &&
-	    !vmx->nested.nested_run_pending && nested_pf_handled(vcpu))
+	if (!reinject && is_guest_mode(vcpu) && nested_ex_handled(vcpu, nr))
 		return;

 	if (has_error_code) {
--
1.8.4.rc3
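The check generalized here is simple enough to state on its own. A sketch of the core test (the real nested_ex_handled() also performs the vmexit as a side effect): L1 intercepts exception vector `nr` iff bit `nr` is set in vmcs12's exception bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UD_VECTOR  6
#define PF_VECTOR 14

/* Sketch of the per-vector intercept test that the patch extends from
 * #PF-only to all exception vectors. */
static bool l1_intercepts_exception(uint32_t exception_bitmap, unsigned nr)
{
    return (exception_bitmap & (1u << nr)) != 0;
}
```

Before the patch, only `PF_VECTOR` was ever tested against the bitmap; afterwards the same bit test applies to whatever vector is being queued.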
[PATCH 0/4] Fix shadow-on-shadow nested VMX
This series fixes shadow-on-shadow nested VMX (at least for me).

Gleb Natapov (4):
  KVM: nVMX: Amend nested_run_pending logic
  KVM: nVMX: Do not put exception that caused vmexit to IDT_VECTORING_INFO
  KVM: nVMX: Check all exceptions for intercept during delivery to L2
  KVM: nVMX: Do not generate #DF if #PF happens during exception delivery
    into L2

 arch/x86/kvm/vmx.c | 60 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 38 insertions(+), 22 deletions(-)

--
1.8.4.rc3
[PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
If #PF happens during delivery of an exception into L2, and L1 also does
not have the page mapped in its shadow page table, then L0 needs to
generate a vmexit to L1 with the original event in IDT_VECTORING_INFO, but
the current code combines both exceptions and generates #DF instead. Fix
that by providing an nVMX-specific function to handle page faults during
page table walks that handles this case correctly.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/vmx.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5bfa09d..07c36fd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7520,6 +7520,20 @@ static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu = &vcpu->arch.mmu;
 }

+static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
+		struct x86_exception *fault)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+	WARN_ON(!is_guest_mode(vcpu));
+
+	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */
+	if (vmcs12->exception_bitmap & (1u << PF_VECTOR))
+		nested_vmx_vmexit(vcpu);
+	else
+		kvm_inject_page_fault(vcpu, fault);
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function merges it
@@ -7773,6 +7787,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
 	kvm_mmu_reset_context(vcpu);

+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
+
 	/*
 	 * L1 may access the L2's PDPTR, so save them to construct vmcs12
 	 */
@@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	kvm_set_cr3(vcpu, vmcs12->host_cr3);
 	kvm_mmu_reset_context(vcpu);

+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
+
 	if (enable_vpid) {
 		/*
 		 * Trivially support vpid by letting L2s share their parent
--
1.8.4.rc3
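Why the old code produced #DF: when a new exception is raised while a previous one is still being delivered, KVM merges them following the SDM's double-fault classification. A simplified sketch of that rule (invented helper names; benign/contributory classes per the SDM, with #PF as its own class) shows why #PF during delivery of another #PF escalates:

```c
#include <assert.h>
#include <stdbool.h>

#define DE_VECTOR  0
#define DB_VECTOR  1
#define TS_VECTOR 10
#define GP_VECTOR 13
#define PF_VECTOR 14

/* Contributory class: #DE, #TS, #NP, #SS, #GP (vectors 0 and 10-13). */
static bool is_contributory(unsigned nr)
{
    return nr == DE_VECTOR || (nr >= TS_VECTOR && nr <= GP_VECTOR);
}

/* Simplified SDM rule: delivering 'second' while 'first' was in flight
 * escalates to #DF for contributory + contributory, or for #PF followed
 * by anything non-benign.  This is the merge the patch avoids performing
 * when L1 intercepts the #PF. */
static bool escalates_to_df(unsigned first, unsigned second)
{
    if (first == PF_VECTOR &&
        (second == PF_VECTOR || is_contributory(second)))
        return true;
    return is_contributory(first) && is_contributory(second);
}
```

The common failing case in the commit message is `escalates_to_df(PF_VECTOR, PF_VECTOR)`: a #PF being delivered into L2 faults again on L1's shadow page tables, and the unfixed code merged the pair into #DF instead of exiting to L1.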
[PATCH 1/4] KVM: nVMX: Amend nested_run_pending logic
An EXIT_REASON_VMLAUNCH/EXIT_REASON_VMRESUME exit does not mean that the
nested VM will actually run during the next entry. Move the setting of
nested_run_pending closer to the vmentry emulation code, and move its
clearing close to the vmexit, to minimize the amount of code that will
erroneously run with nested_run_pending set.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/vmx.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e6e8fbc..7eb0512 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6742,20 +6742,6 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	if (vmx->emulation_required)
 		return handle_invalid_guest_state(vcpu);

-	/*
-	 * the KVM_REQ_EVENT optimization bit is only on for one entry, and if
-	 * we did not inject a still-pending event to L1 now because of
-	 * nested_run_pending, we need to re-enable this bit.
-	 */
-	if (vmx->nested.nested_run_pending)
-		kvm_make_request(KVM_REQ_EVENT, vcpu);
-
-	if (!is_guest_mode(vcpu) && (exit_reason == EXIT_REASON_VMLAUNCH ||
-	    exit_reason == EXIT_REASON_VMRESUME))
-		vmx->nested.nested_run_pending = 1;
-	else
-		vmx->nested.nested_run_pending = 0;
-
 	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
 		nested_vmx_vmexit(vcpu);
 		return 1;
@@ -7290,6 +7276,16 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
 	trace_kvm_exit(vmx->exit_reason, vcpu, KVM_ISA_VMX);

+	/*
+	 * the KVM_REQ_EVENT optimization bit is only on for one entry, and if
+	 * we did not inject a still-pending event to L1 now because of
+	 * nested_run_pending, we need to re-enable this bit.
+	 */
+	if (vmx->nested.nested_run_pending)
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+
+	vmx->nested.nested_run_pending = 0;
+
 	vmx_complete_atomic_exit(vmx);
 	vmx_recover_nmi_blocking(vmx);
 	vmx_complete_interrupts(vmx);
@@ -7948,6 +7944,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)

 	enter_guest_mode(vcpu);

+	vmx->nested.nested_run_pending = 1;
+
 	vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);

 	cpu = get_cpu();
--
1.8.4.rc3
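The lifecycle the patch establishes can be modeled abstractly (invented names, not the real KVM structures): the flag is set when vmentry to L2 is emulated, and cleared right after the hardware vmexit, re-arming the deferred event request if one was held back.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the narrowed nested_run_pending window. */
struct nested_model {
    bool run_pending;
    bool event_request;   /* stands in for KVM_REQ_EVENT */
};

/* Set at vmentry emulation (the nested_vmx_run() side of the patch). */
static void model_nested_vmx_run(struct nested_model *m)
{
    m->run_pending = true;
}

/* Cleared immediately after the vmexit (the vmx_vcpu_run() side):
 * if an event injection was deferred because of the flag, re-enable
 * the one-shot event request before dropping the flag. */
static void model_vcpu_run_exit(struct nested_model *m)
{
    if (m->run_pending)
        m->event_request = true;
    m->run_pending = false;
}
```

Between those two points no other exit-handling code runs with the flag set, which is exactly the property the commit message asks for.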
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
On 25/09/2013 11:51, Gleb Natapov wrote:
> All exceptions should be checked for intercept during delivery to L2, but
> we check only #PF currently. Drop nested_run_pending while we are at it,
> since an exception cannot be injected during vmentry anyway.
>
> Signed-off-by: Gleb Natapov <g...@redhat.com>
> [...]
> @@ -1921,8 +1920,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	u32 intr_info = nr | INTR_INFO_VALID_MASK;
>
> -	if (!reinject && nr == PF_VECTOR && is_guest_mode(vcpu) &&
> -	    !vmx->nested.nested_run_pending && nested_pf_handled(vcpu))
> +	if (!reinject && is_guest_mode(vcpu) && nested_ex_handled(vcpu, nr))

The code is now pretty similar to what svm.c does. Do we want to move the
is_guest_mode(vcpu) check into nested_ex_handled, too? (Or vice versa,
take it out in svm.c.)

Perhaps you could also name the function nested_vmx_check_exception.

Paolo

>  		return;
>
>  	if (has_error_code) {
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
On Wed, Sep 25, 2013 at 12:38:20PM +0200, Paolo Bonzini wrote:
> On 25/09/2013 11:51, Gleb Natapov wrote:
> > [...]
> > -	if (!reinject && nr == PF_VECTOR && is_guest_mode(vcpu) &&
> > -	    !vmx->nested.nested_run_pending && nested_pf_handled(vcpu))
> > +	if (!reinject && is_guest_mode(vcpu) && nested_ex_handled(vcpu, nr))
>
> The code is now pretty similar to what svm.c does. Do we want to move the
> is_guest_mode(vcpu) check into nested_ex_handled, too? (Or vice versa,
> take it out in svm.c.)
>
> Perhaps you could also name the function nested_vmx_check_exception.
>
I want to try to move the logic into common code eventually. I do not mind
renaming for now, but it will have to wait for the next week :)

--
	Gleb.
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
On 25/09/2013 11:51, Gleb Natapov wrote:
> If #PF happens during delivery of an exception into L2, and L1 also does
> not have the page mapped in its shadow page table, then L0 needs to
> generate a vmexit to L1 with the original event in IDT_VECTORING_INFO, but
> the current code combines both exceptions and generates #DF instead. Fix
> that by providing an nVMX-specific function to handle page faults during
> page table walks that handles this case correctly.
>
> Signed-off-by: Gleb Natapov <g...@redhat.com>
> [...]
> @@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>  	kvm_set_cr3(vcpu, vmcs12->host_cr3);
>  	kvm_mmu_reset_context(vcpu);
>
> +	if (!enable_ept)
> +		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;

This is strictly speaking not needed, because kvm_mmu_reset_context takes
care of it.

But I wonder if it is cleaner to not touch the struct here, and instead add
a new member to kvm_x86_ops---used directly in init_kvm_softmmu like
kvm_x86_ops->set_cr3. The new member can do something like

	if (is_guest_mode(vcpu)) {
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
			nested_vmx_vmexit(vcpu);
			return;
		}
	}
	kvm_inject_page_fault(vcpu, fault);

Marcelo, Jan, what do you think?

Alex (or Gleb :)), do you have any idea why SVM does not need this?

Paolo

> +
>  	if (enable_vpid) {
>  		/*
>  		 * Trivially support vpid by letting L2s share their parent
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
On 25/09/2013 13:00, Gleb Natapov wrote:
> > The code is now pretty similar to what svm.c does. Do we want to move the
> > is_guest_mode(vcpu) check into nested_ex_handled, too? (Or vice versa,
> > take it out in svm.c.)
> >
> > Perhaps you could also name the function nested_vmx_check_exception.
>
> I want to try to move the logic into common code eventually. I do not mind
> renaming for now, but it will have to wait for the next week :)

I can rename while applying the patch. Making more logic common to vmx and
svm can wait.

Paolo
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
On Wed, Sep 25, 2013 at 01:24:49PM +0200, Paolo Bonzini wrote:
> On 25/09/2013 11:51, Gleb Natapov wrote:
> > [...]
> > @@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
> >  	kvm_set_cr3(vcpu, vmcs12->host_cr3);
> >  	kvm_mmu_reset_context(vcpu);
> >
> > +	if (!enable_ept)
> > +		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
>
> This is strictly speaking not needed, because kvm_mmu_reset_context takes
> care of it.
>
Yeah, but better to make it explicit; it does not hurt and makes it more
clear what is going on. Or at least add a comment above
kvm_mmu_reset_context() about this side effect.

> But I wonder if it is cleaner to not touch the struct here, and instead add
> a new member to kvm_x86_ops---used directly in init_kvm_softmmu like
> kvm_x86_ops->set_cr3. The new member can do something like
>
> 	if (is_guest_mode(vcpu)) {
> 		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> 		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
> 			nested_vmx_vmexit(vcpu);
> 			return;
> 		}
> 	}
> 	kvm_inject_page_fault(vcpu, fault);
>
I do not quite understand what you mean here. inject_page_fault() is called
from the depths of page table walking. How will the code know not to call
the new member in some circumstances?

> Marcelo, Jan, what do you think?
>
> Alex (or Gleb :)), do you have any idea why SVM does not need this?
>
It's probably needed there too. At least I fail to see why it does not.
Without that patch the guest actually boots (most of the time), but
sometimes random processes crash with a double fault exception.

--
	Gleb.
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
On Wed, Sep 25, 2013 at 01:25:39PM +0200, Paolo Bonzini wrote:
> On 25/09/2013 13:00, Gleb Natapov wrote:
> > I want to try to move the logic into common code eventually. I do not mind
> > renaming for now, but it will have to wait for the next week :)
>
> I can rename while applying the patch. Making more logic common to vmx and
> svm can wait.
>
Yes, of course; logic unification is not for the immediate future.

--
	Gleb.
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
On 25/09/2013 13:51, Gleb Natapov wrote:
> On Wed, Sep 25, 2013 at 01:24:49PM +0200, Paolo Bonzini wrote:
> > This is strictly speaking not needed, because kvm_mmu_reset_context takes
> > care of it.
>
> Yeah, but better to make it explicit; it does not hurt and makes it more
> clear what is going on. Or at least add a comment above
> kvm_mmu_reset_context() about this side effect.

Yes, I agree the code is cleaner like you wrote it.

> > But I wonder if it is cleaner to not touch the struct here, and instead
> > add a new member to kvm_x86_ops---used directly in init_kvm_softmmu like
> > kvm_x86_ops->set_cr3.
>
> I do not quite understand what you mean here. inject_page_fault() is called
> from the depths of page table walking. How will the code know not to call
> the new member in some circumstances?

IIUC the new function is called if and only if is_guest_mode(vcpu) &&
!enable_ept. So what I'm suggesting is something like this:

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -735,6 +735,8 @@ struct kvm_x86_ops {
 	void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool host);

 	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
+	void (*inject_softmmu_page_fault)(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault);

 	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3805,7 +3805,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu->set_cr3           = kvm_x86_ops->set_cr3;
 	vcpu->arch.walk_mmu->get_cr3           = get_cr3;
 	vcpu->arch.walk_mmu->get_pdptr         = kvm_pdptr_read;
-	vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
+	vcpu->arch.walk_mmu->inject_page_fault = kvm_x86_ops->inject_softmmu_page_fault;

 	return r;
 }
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7499,6 +7499,20 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }

+static void vmx_inject_softmmu_page_fault(struct kvm_vcpu *vcpu,
+		struct x86_exception *fault)
+{
+	if (is_guest_mode(vcpu)) {
+		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
+			nested_vmx_vmexit(vcpu);
+			return;
+		}
+	}
+
+	kvm_inject_page_fault(vcpu, fault);
+}
+
 /* Callbacks for nested_ept_init_mmu_context: */

 static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
@@ -8490,6 +8504,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.read_l1_tsc = vmx_read_l1_tsc,

 	.set_tdp_cr3 = vmx_set_cr3,
+	.inject_softmmu_page_fault = vmx_inject_softmmu_page_fault,

 	.check_intercept = vmx_check_intercept,
 	.handle_external_intr = vmx_handle_external_intr,
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4347,6 +4347,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.read_l1_tsc = svm_read_l1_tsc,

 	.set_tdp_cr3 = set_tdp_cr3,
+	.inject_softmmu_page_fault = kvm_inject_page_fault,	/* FIXME */

 	.check_intercept = svm_check_intercept,
 	.handle_external_intr = svm_handle_external_intr,

> > Alex (or Gleb :)), do you have any idea why SVM does not need this?
>
> It's probably needed there too. At least I fail to see why it does not.
> Without that patch the guest actually boots (most of the time), but
> sometimes random processes crash with a double fault exception.

Sounds indeed like the same bug.

Paolo
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
On Wed, Sep 25, 2013 at 02:08:09PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 13:51, Gleb Natapov ha scritto:
On Wed, Sep 25, 2013 at 01:24:49PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 11:51, Gleb Natapov ha scritto:

@@ -7773,6 +7787,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
 	kvm_mmu_reset_context(vcpu);
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
+
 	/*
 	 * L1 may access the L2's PDPTR, so save them to construct vmcs12
 	 */
@@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	kvm_set_cr3(vcpu, vmcs12->host_cr3);
 	kvm_mmu_reset_context(vcpu);
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;

This is strictly speaking not needed, because kvm_mmu_reset_context takes care of it.

Yeah, but better to make it explicit; it does not hurt and makes it clearer what is going on. Or at least add a comment above kvm_mmu_reset_context() about this side effect.

Yes, I agree the code is cleaner the way you wrote it. But I wonder if it is cleaner to not touch the struct here, and instead add a new member to kvm_x86_ops---used directly in init_kvm_softmmu like kvm_x86_ops->set_cr3. The new member can do something like

	if (is_guest_mode(vcpu)) {
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
			nested_vmx_vmexit(vcpu);
			return;
		}
	}
	kvm_inject_page_fault(vcpu, fault);

I do not quite understand what you mean here. inject_page_fault() is called from the depths of page table walking. How would the code know not to call the new member in some circumstances?

IIUC the new function is called if and only if is_guest_mode(vcpu) && !enable_ept. So what I'm suggesting is something like this:

Ah, I see, so you propose to check for guest mode and enable_ept in the function instead of switching to another function, but switching to another function is how the code was designed to work. Nested NPT/EPT provide their own function too, but there is nothing that stops you from checking which MMU you are in inside the function itself.

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -735,6 +735,8 @@ struct kvm_x86_ops {
 	void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool host);
 	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
+	void (*inject_softmmu_page_fault)(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault);
 	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3805,7 +3805,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu->set_cr3           = kvm_x86_ops->set_cr3;
 	vcpu->arch.walk_mmu->get_cr3           = get_cr3;
 	vcpu->arch.walk_mmu->get_pdptr         = kvm_pdptr_read;
-	vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
+	vcpu->arch.walk_mmu->inject_page_fault = kvm_x86_ops->inject_softmmu_page_fault;
 	return r;
 }
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7499,6 +7499,20 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }
+static void vmx_inject_softmmu_page_fault(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault)
+{
+	if (is_guest_mode(vcpu)) {

	is_guest_mode(vcpu) && !enable_ept

+		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
+			nested_vmx_vmexit(vcpu);
+			return;
+		}
+	}
+
+	kvm_inject_page_fault(vcpu, fault);
+}
+
 /* Callbacks for nested_ept_init_mmu_context: */
 static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
@@ -8490,6 +8504,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.read_l1_tsc = vmx_read_l1_tsc,
 	.set_tdp_cr3 = vmx_set_cr3,
+	.inject_nested_tdp_pagefault = vmx_set_cr3,
 	.check_intercept = vmx_check_intercept,
 	.handle_external_intr = vmx_handle_external_intr,
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4347,6 +4347,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.read_l1_tsc = svm_read_l1_tsc,
 	.set_tdp_cr3 = set_tdp_cr3,
+	.inject_nested_tdp_pagefault = kvm_inject_page_fault, /*FIXME*/
 	.check_intercept = svm_check_intercept,
 	.handle_external_intr = svm_handle_external_intr,

Alex (or Gleb :)), do you have any idea why SVM does not need this?

It's probably needed there too; at least I fail to see why it would not be. Without that patch guest is actually booting (most of the times),
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
Il 25/09/2013 14:21, Gleb Natapov ha scritto:
On Wed, Sep 25, 2013 at 02:08:09PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 13:51, Gleb Natapov ha scritto:
On Wed, Sep 25, 2013 at 01:24:49PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 11:51, Gleb Natapov ha scritto:

@@ -7773,6 +7787,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
 	kvm_mmu_reset_context(vcpu);
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
+
 	/*
 	 * L1 may access the L2's PDPTR, so save them to construct vmcs12
 	 */
@@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	kvm_set_cr3(vcpu, vmcs12->host_cr3);
 	kvm_mmu_reset_context(vcpu);
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;

This is strictly speaking not needed, because kvm_mmu_reset_context takes care of it.

Yeah, but better to make it explicit; it does not hurt and makes it clearer what is going on. Or at least add a comment above kvm_mmu_reset_context() about this side effect.

Yes, I agree the code is cleaner the way you wrote it. But I wonder if it is cleaner to not touch the struct here, and instead add a new member to kvm_x86_ops---used directly in init_kvm_softmmu like kvm_x86_ops->set_cr3. The new member can do something like

	if (is_guest_mode(vcpu)) {
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
			nested_vmx_vmexit(vcpu);
			return;
		}
	}
	kvm_inject_page_fault(vcpu, fault);

I do not quite understand what you mean here. inject_page_fault() is called from the depths of page table walking. How would the code know not to call the new member in some circumstances?

IIUC the new function is called if and only if is_guest_mode(vcpu) && !enable_ept. So what I'm suggesting is something like this:

Ah, I see, so you propose to check for guest mode and enable_ept in the function instead of switching to another function, but switching to another function is how the code was designed to work.

You do not need to check enable_ept if I understand the code correctly, because the new function is specifically called in init_kvm_softmmu, i.e. not for nested_mmu and not for tdp_enabled. I'm asking because I didn't find any other place that modifies function pointers this way after kvm_mmu_reset_context.

Nested NPT/EPT provide their own function too, but there is nothing that stops you from checking which MMU you are in inside the function itself.

The difference is that NPT/EPT use a completely different paging mode for nested and non-nested (non-nested uses direct mapping, nested uses shadow mapping). Shadow paging is really the same thing for nested and non-nested; you just have to do the injection the right way.

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -735,6 +735,8 @@ struct kvm_x86_ops {
 	void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool host);
 	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
+	void (*inject_softmmu_page_fault)(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault);
 	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3805,7 +3805,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu->set_cr3           = kvm_x86_ops->set_cr3;
 	vcpu->arch.walk_mmu->get_cr3           = get_cr3;
 	vcpu->arch.walk_mmu->get_pdptr         = kvm_pdptr_read;
-	vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
+	vcpu->arch.walk_mmu->inject_page_fault = kvm_x86_ops->inject_softmmu_page_fault;
 	return r;
 }
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7499,6 +7499,20 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }
+static void vmx_inject_softmmu_page_fault(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault)
+{
+	if (is_guest_mode(vcpu)) {

	is_guest_mode(vcpu) && !enable_ept

You don't really need to check for enable_ept (perhaps WARN_ON(enable_ept) instead), because the function is not used always, only in init_kvm_softmmu.

I described what I saw with VMX, I am not saying the same happens with SVM :) I just do not see why it should not, and the non-fatality of the bug can explain why it was missed.

Interesting, I got something completely different. The guest just got stuck before even getting to the GRUB prompt. I'm trying your patches now...

Paolo
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
On Wed, Sep 25, 2013 at 03:26:56PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 14:21, Gleb Natapov ha scritto:
On Wed, Sep 25, 2013 at 02:08:09PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 13:51, Gleb Natapov ha scritto:
On Wed, Sep 25, 2013 at 01:24:49PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 11:51, Gleb Natapov ha scritto:

@@ -7773,6 +7787,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
 	kvm_mmu_reset_context(vcpu);
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
+
 	/*
 	 * L1 may access the L2's PDPTR, so save them to construct vmcs12
 	 */
@@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	kvm_set_cr3(vcpu, vmcs12->host_cr3);
 	kvm_mmu_reset_context(vcpu);
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;

This is strictly speaking not needed, because kvm_mmu_reset_context takes care of it.

Yeah, but better to make it explicit; it does not hurt and makes it clearer what is going on. Or at least add a comment above kvm_mmu_reset_context() about this side effect.

Yes, I agree the code is cleaner the way you wrote it. But I wonder if it is cleaner to not touch the struct here, and instead add a new member to kvm_x86_ops---used directly in init_kvm_softmmu like kvm_x86_ops->set_cr3. The new member can do something like

	if (is_guest_mode(vcpu)) {
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
			nested_vmx_vmexit(vcpu);
			return;
		}
	}
	kvm_inject_page_fault(vcpu, fault);

I do not quite understand what you mean here. inject_page_fault() is called from the depths of page table walking. How would the code know not to call the new member in some circumstances?

IIUC the new function is called if and only if is_guest_mode(vcpu) && !enable_ept. So what I'm suggesting is something like this:

Ah, I see, so you propose to check for guest mode and enable_ept in the function instead of switching to another function, but switching to another function is how the code was designed to work.

You do not need to check enable_ept if I understand the code correctly, because the new function is specifically called in init_kvm_softmmu, i.e. not for nested_mmu and not for tdp_enabled.

Ah, true. With EPT the shadow page code will not run, so the function will not be called.

I'm asking because I didn't find any other place that modifies function pointers this way after kvm_mmu_reset_context.

nested_ept_init_mmu_context() modifies the pointer. Not after kvm_mmu_reset_context(), but it does not matter much. The canonical way to do what I did here would be to create a special mmu context to handle shadow-on-shadow, but the only difference between it and the regular shadow one would be this pointer, so it looked like overkill. Just changing the pointer does the trick.

Nested NPT/EPT provide their own function too, but there is nothing that stops you from checking which MMU you are in inside the function itself.

The difference is that NPT/EPT use a completely different paging mode for nested and non-nested (non-nested uses direct mapping, nested uses shadow mapping). Shadow paging is really the same thing for nested and non-nested; you just have to do the injection the right way.

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -735,6 +735,8 @@ struct kvm_x86_ops {
 	void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool host);
 	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
+	void (*inject_softmmu_page_fault)(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault);
 	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3805,7 +3805,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu->set_cr3           = kvm_x86_ops->set_cr3;
 	vcpu->arch.walk_mmu->get_cr3           = get_cr3;
 	vcpu->arch.walk_mmu->get_pdptr         = kvm_pdptr_read;
-	vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
+	vcpu->arch.walk_mmu->inject_page_fault = kvm_x86_ops->inject_softmmu_page_fault;
 	return r;
 }
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7499,6 +7499,20 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }
+static void vmx_inject_softmmu_page_fault(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault)
+{
+	if (is_guest_mode(vcpu)) {

	is_guest_mode(vcpu) && !enable_ept

You don't really need to check for enable_ept (perhaps
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:

This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If a vmexit from L2 to L0 occurs for a reason not emulated by L1, the preemption timer value should be saved on such exits.
2. Add support for the "Save VMX-preemption timer value" VM-Exit control to nVMX.

With this patch, nested VMX preemption timer features are fully supported.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
ChangeLog to v4:
	Format changes and remove a flag in nested_vmx.
 arch/x86/include/uapi/asm/msr-index.h |  1 +
 arch/x86/kvm/vmx.c                    | 44 ++++++++++++++++++++++++++++++++--
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index bb04650..b93e09a 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -536,6 +536,7 @@
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
+#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE   0x1F

 /* AMD-V MSRs */
 #define MSR_VM_CR                       0xc0010114
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1f1da43..e1fa13a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2204,7 +2204,13 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
+	    !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
+		nested_vmx_exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+		nested_vmx_pinbased_ctls_high &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
+	}
 	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 				      VM_EXIT_LOAD_IA32_EFER);
@@ -6707,6 +6713,27 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }
+static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
+{
+	u64 delta_tsc_l1;
+	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
+
+	if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control &
+	      PIN_BASED_VMX_PREEMPTION_TIMER))
+		return;
+	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
+			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
+	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+	delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc())
+		- vcpu->arch.last_guest_tsc;
+	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
+	if (preempt_val_l2 <= preempt_val_l1)
+		preempt_val_l2 = 0;
+	else
+		preempt_val_l2 -= preempt_val_l1;
+	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
+}
+
 /*
  * The guest has exited.  See if we can fix it or if we need userspace
  * assistance.
@@ -7131,6 +7158,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	atomic_switch_perf_msrs(vmx);
 	debugctlmsr = get_debugctlmsr();
+	if (is_guest_mode(vcpu) && !(vmx->nested.nested_run_pending))
+		nested_adjust_preemption_timer(vcpu);
 	vmx->__launched = vmx->loaded_vmcs->launched;
 	asm(
 		/* Store host registers */
@@ -7518,6 +7547,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exec_control;
+	u32 exit_control;

 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -7691,7 +7721,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 	 * bits are further modified by vmx_set_efer() below.
 	 */
-	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+	exit_control = vmcs_config.vmexit_ctrl;
+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
+		exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	vmcs_write32(VM_EXIT_CONTROLS, exit_control);

 	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
 	 * emulated by vmx_set_efer(), below.
@@ -8090,6 +8123,13 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vmcs12->guest_pending_dbg_exceptions =
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
+	if ((vmcs12->pin_based_vm_exec_control &
+	     PIN_BASED_VMX_PREEMPTION_TIMER) &&
+	    (vmcs12->vm_exit_controls &
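The arithmetic in nested_adjust_preemption_timer() above can be sketched in isolation: the preemption timer counts down once every 2^scale TSC cycles (scale comes from MSR_IA32_VMX_MISC bits 4:0), so the L1 TSC cycles that elapsed while L0 handled the exit are converted to timer ticks and subtracted from L2's saved timer value, saturating at zero. A minimal model of that calculation (plain Python; the function name and signature are mine, not KVM's):

```python
def adjust_preemption_timer(saved_timer_l2: int,
                            delta_tsc_l1: int,
                            preempt_scale: int) -> int:
    """Recompute L2's VMX preemption timer after time spent in L0.

    saved_timer_l2: timer value saved at the last exit from L2
    delta_tsc_l1:   guest (L1) TSC cycles that elapsed meanwhile
    preempt_scale:  X from MSR_IA32_VMX_MISC[4:0]; the timer ticks
                    down once every 2**X TSC cycles
    """
    # Convert elapsed TSC cycles into preemption-timer ticks.
    elapsed_ticks = delta_tsc_l1 >> preempt_scale
    if saved_timer_l2 <= elapsed_ticks:
        return 0  # timer has already expired; it should fire immediately
    return saved_timer_l2 - elapsed_ticks

# 64 TSC cycles at scale 5 is 2 ticks, so a saved value of 100 becomes 98.
print(adjust_preemption_timer(100, 64, 5))  # -> 98
```

This mirrors the saturating subtraction in the patch (`preempt_val_l2 = 0` when the elapsed ticks meet or exceed the saved value), which is what prevents the timer from wrapping around to a huge value.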
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
Il 25/09/2013 11:51, Gleb Natapov ha scritto:

All exceptions should be checked for intercept during delivery to L2, but we check only #PF currently. Drop nested_run_pending while we are at it, since an exception cannot be injected during vmentry anyway.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/vmx.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 663bc5e..5bfa09d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1902,12 +1902,11 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
  * a #PF exception (this is the only case in which KVM injects a #PF when L2
  * is running).
  */
-static int nested_pf_handled(struct kvm_vcpu *vcpu)
+static int nested_ex_handled(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */

Just one more question, why drop this comment? It doesn't seem incorrect in the particular case where nr == PF_VECTOR.

-	if (!(vmcs12->exception_bitmap & (1u << PF_VECTOR)))
+	if (!(vmcs12->exception_bitmap & (1u << nr)))
 		return 0;
 	nested_vmx_vmexit(vcpu);
@@ -1921,8 +1920,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
-	if (!reinject && nr == PF_VECTOR && is_guest_mode(vcpu) &&
-	    !vmx->nested.nested_run_pending && nested_pf_handled(vcpu))
+	if (!reinject && is_guest_mode(vcpu) && nested_ex_handled(vcpu, nr))
 		return;
 	if (has_error_code) {
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
Il 25/09/2013 15:36, Gleb Natapov ha scritto:

I'm asking because I didn't find any other place that modifies function pointers this way after kvm_mmu_reset_context.

nested_ept_init_mmu_context() modifies the pointer. Not after kvm_mmu_reset_context(), but it does not matter much. The canonical way to do what I did here would be to create a special mmu context to handle shadow-on-shadow, but the only difference between it and the regular shadow one would be this pointer, so it looked like overkill. Just changing the pointer does the trick.

Ok, I think it's possible to clean up the code a bit, but I can do that on top of your patches.

Paolo
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
On Wed, Sep 25, 2013 at 04:00:16PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 11:51, Gleb Natapov ha scritto:

All exceptions should be checked for intercept during delivery to L2, but we check only #PF currently. Drop nested_run_pending while we are at it, since an exception cannot be injected during vmentry anyway.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/vmx.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 663bc5e..5bfa09d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1902,12 +1902,11 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
  * a #PF exception (this is the only case in which KVM injects a #PF when L2
  * is running).
  */
-static int nested_pf_handled(struct kvm_vcpu *vcpu)
+static int nested_ex_handled(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */

Just one more question, why drop this comment? It doesn't seem incorrect in the particular case where nr == PF_VECTOR.

I moved it to vmx_inject_page_fault_nested().

-	if (!(vmcs12->exception_bitmap & (1u << PF_VECTOR)))
+	if (!(vmcs12->exception_bitmap & (1u << nr)))
 		return 0;
 	nested_vmx_vmexit(vcpu);
@@ -1921,8 +1920,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
-	if (!reinject && nr == PF_VECTOR && is_guest_mode(vcpu) &&
-	    !vmx->nested.nested_run_pending && nested_pf_handled(vcpu))
+	if (!reinject && is_guest_mode(vcpu) && nested_ex_handled(vcpu, nr))
 		return;
 	if (has_error_code) {
--
Gleb.
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
Il 25/09/2013 16:19, Gleb Natapov ha scritto:

-static int nested_pf_handled(struct kvm_vcpu *vcpu)
+static int nested_ex_handled(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */

Just one more question, why drop this comment? It doesn't seem incorrect in the particular case where nr == PF_VECTOR.

I moved it to vmx_inject_page_fault_nested().

Ah, so if you test it there, you shouldn't get here unless the PFEC_MASK/MATCH matches. In the case of EPT, you run with L1 PFEC_MASK/MATCH so you do not need any check.

Paolo
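For readers unfamiliar with the PFEC_MATCH/PFEC_MASK mechanism being discussed: VMX does not decide #PF interception from the exception bitmap alone. A page fault causes a VM exit iff bit 14 (PF) of the exception bitmap equals the truth value of (error_code & PFEC_MASK) == PFEC_MATCH, which lets a hypervisor intercept only faults with certain error codes. A simplified model of that rule (a sketch of the architectural behavior, not KVM code; the function name is mine):

```python
PF_VECTOR = 14

def pf_causes_vmexit(exception_bitmap: int, pfec_mask: int,
                     pfec_match: int, error_code: int) -> bool:
    """Model the VMX decision of whether a guest #PF traps to the hypervisor.

    A page fault causes a VM exit iff EB[14] equals the truth value of
    (error_code & PFEC_MASK) == PFEC_MATCH.  With MASK = MATCH = 0 the
    comparison is always true, so only EB[14] matters -- the simple case
    the EB.PF-only check in nested_ex_handled() covers.
    """
    eb_pf = bool(exception_bitmap & (1 << PF_VECTOR))
    pfec_hit = (error_code & pfec_mask) == pfec_match
    return eb_pf == pfec_hit

# With MASK = MATCH = 0, the exception-bitmap bit alone decides:
print(pf_causes_vmexit(1 << PF_VECTOR, 0, 0, 0x2))  # -> True
```

Note the equality (rather than AND) means a hypervisor can also clear EB[14] and use MASK/MATCH to intercept only the *non*-matching error codes.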
Re: How to share filesystem
On Tuesday, September 24, 2013 3:38:39 AM CDT, Ross Boylan wrote:

I would like to have access to the same file system from the host and the guest. Can anyone recommend the best way to do this, considering ease of use, safety (concurrent access from guest and host does not corrupt) and performance? For example, I would like to restore files from backup using the host, but write to filesystems used by the guest.

I have previously used kvm mostly with disks that are based on LVM logical volumes, e.g. -hda /dev/turtle/Squeeze00. Since the LVs are virtual disks, I can't just mount them in the host AFAIK. Among the alternatives I can think of are using NFS and using NBD. Maybe there's some kind of loopback device I could use on the disk image to access it from the host.

I would suggest NFS/CIFS generally speaking. 9p/virtio is an option, but NFS and CIFS are going to be far more stable, tested, and fast.

Host: Debian GNU/Linux wheezy, amd64 architecture, qemu-kvm 1.1.2
Guest: Debian GNU/Linux lenny i386.
Host processor is a recent i5 with good virtualization (flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms)

Thanks.
Ross Boylan
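On the "loopback device" idea raised above: if the guest disk is an LVM logical volume with a partition table inside it, kpartx can map the partitions on the host so they can be mounted directly. This only works safely while the guest is powered off; it is a sketch using the LV name from the example above, not a tested recipe:

```shell
# WARNING: only do this while the guest is shut down.  Mounting the same
# filesystem from host and guest simultaneously WILL corrupt it; for
# concurrent access use NFS/CIFS as suggested above.
kpartx -av /dev/turtle/Squeeze00        # add mappings, e.g. /dev/mapper/turtle-Squeeze00p1
mount /dev/mapper/turtle-Squeeze00p1 /mnt/guestroot
# ... restore files from backup into /mnt/guestroot ...
umount /mnt/guestroot
kpartx -dv /dev/turtle/Squeeze00        # delete the partition mappings again
```

The exact /dev/mapper names depend on the volume group and partition layout; check the kpartx -av output before mounting.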
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
On Wed, Sep 25, 2013 at 04:22:48PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 16:19, Gleb Natapov ha scritto:

-static int nested_pf_handled(struct kvm_vcpu *vcpu)
+static int nested_ex_handled(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */

Just one more question, why drop this comment? It doesn't seem incorrect in the particular case where nr == PF_VECTOR.

I moved it to vmx_inject_page_fault_nested().

Ah, so if you test it there, you shouldn't get here unless the PFEC_MASK/MATCH matches.

Yes, a #PF will never reach here in the case of shadow-on-shadow.

--
Gleb.
Re: [PATCH v2] arm32: kvm: rename CONFIG_KVM_ARM_MAX_VCPUS
On Tue, Sep 24, 2013 at 04:09:28PM +0200, Andrew Jones wrote:

Drop the _ARM_ part of the name. We can then introduce a config option like this to aarch64 and other arches using the same name - allowing grep to show them all. Also update the help text to describe the option more completely.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
v2: reword help text some more
---
 arch/arm/include/asm/kvm_host.h | 4 ++--
 arch/arm/kvm/Kconfig            | 9 +++++----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 7d22517d80711..c614d3eb176c6 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -25,8 +25,8 @@
 #include <asm/fpstate.h>
 #include <kvm/arm_arch_timer.h>
-#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
-#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
+#if defined(CONFIG_KVM_MAX_VCPUS)
+#define KVM_MAX_VCPUS CONFIG_KVM_MAX_VCPUS
 #else
 #define KVM_MAX_VCPUS 0
 #endif
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index ebf5015508b52..8e56ccf45edce 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -40,16 +40,17 @@ config KVM_ARM_HOST
 	---help---
 	  Provides host support for ARM processors.
-config KVM_ARM_MAX_VCPUS
+config KVM_MAX_VCPUS
 	int "Number maximum supported virtual CPUs per VM"
 	depends on KVM_ARM_HOST
 	default 4
 	help
-	  Static number of max supported virtual CPUs per VM.
-	  If you choose a high number, the vcpu structures will be quite
-	  large, so only choose a reasonable number that you expect to
-	  actually use.
+	  The default is set to the highest number of vcpus that
+	  current hardware supports. Choosing a lower number decreases
+	  the size of the VM data structure. This number may also be
+	  increased.

Maybe I'm being forgetful, but what do you mean by "This number may also be increased"?

-Christoffer
Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency
On Sun, 2013-09-15 at 22:40 +1000, Alexey Kardashevskiy wrote: On 09/14/2013 02:25 AM, Alex Williamson wrote: On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote: On 09/13/2013 07:23 AM, Alex Williamson wrote: So far we've succeeded at making KVM and VFIO mostly unaware of each other, but there's an important point where that breaks down. Intel VT-d hardware may or may not support snoop control. When snoop control is available, intel-iommu promotes No-Snoop transactions on PCIe to be cache coherent. That allows KVM to handle things like the x86 WBINVD opcode as a nop. When the hardware does not support this, KVM must implement a hardware-visible WBINVD for the guest. We could simply let userspace tell KVM how to handle WBINVD, but it's privileged for a reason. Allowing an arbitrary user to enable physical WBINVD gives them more access to the hardware. Previously, this has only been enabled for guests supporting legacy PCI device assignment. In such cases it's necessary for proper guest execution. We therefore create a new KVM-VFIO virtual device. The user can add and remove VFIO groups to this device via file descriptors. KVM makes use of the VFIO external user interface to validate that the user has access to physical hardware and gets the coherency state of the IOMMU from VFIO. This provides equivalent functionality to legacy KVM assignment, while keeping (nearly) all the bits isolated. The one intrusion is the resulting flag indicating the coherency state. For this RFC it's placed on the x86 kvm_arch struct, however I know POWER has interest in using the VFIO external user interface, and I'm hoping we can share a common KVM-VFIO device. Perhaps they care about No-Snoop handling as well or the code can be #ifdef'd. POWER does not support this cache-non-coherent stuff at all (at least book3s, i.e. server; not sure about others).
Then it's easy for your IOMMU API interface to return always cache coherent or never cache coherent or whatever ;) Regarding reusing this device with external API for POWER - I posted a patch which introduces a KVM device to link KVM with IOMMU but besides the list of groups registered in KVM, it also provides the way to find a group by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So in my case the kvm_vfio_group struct needs LIOBN and it would be nice to have window_size there too (for a quick boundary check). I am not sure we want to mix everything here. It is in [PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel handling if you are interested (kvmppc_spapr_tce_iommu_device). Yes, I stole the code to get the vfio symbols from your code. The convergence I was hoping to achieve is that KVM doesn't really want to know about VFIO and vice versa. We can therefore at least limit the intrusion by sharing a common device. Obviously for you it will need some extra interfaces to associate an LIOBN to a group, but we keep both the kernel and userspace cleaner by avoiding duplication where we can. Is this really not extensible to your usage? Well, I do not have a good taste for this kind of stuff so I cannot tell for sure. I can reuse this device and hack it to do whatever I need but how? There are 2 things I need from the KVM device: 1. associate an IOMMU group with a LIOBN. Does QEMU know this? We could set another attribute to specify the LIOBN for a group. 2. get a pointer to an IOMMU group by LIOBN (used to get ppc's IOMMU table pointer, which is IOMMU data of an IOMMU group, so we could take a shortcut here). Once we have a VFIO device with a VFIO group added and the LIOBN attribute set, isn't this just a matter of some access code? There are 3 possible interfaces to use: A. set/get attributes B. ioctl C. additional API. I think we need to differentiate user interfaces from kernel interfaces.
Within the kernel, we make no guarantees about interfaces and we can change them to meet our needs. That includes the vfio external user interface. For userspace, we need to be careful not to break things. I suggest we use the set/get/has attribute interface that's already part of KVM for the user interface. What I posted a week ago uses A for 1 and C for 2. Now we move this to virt/kvm/vfio.c. I don't care where it lives, other than we both have a need for it, so it seems like the core of it should not live in architecture specific directories. A for 1 is more or less ok but how exactly? Yet another attribute? Platform specific bus ID? In your patch attr->addr is not really an address (to be accessed with get_user()) but just an fd. Is that a problem? For 2 - there are already A and B interfaces so we do not want C, right? What kind of a good device has a backdoor? :) But A and B do not have access control to prevent the user space from receiving an IOMMU group pointer (which it would not be able to use anyway
Re: [PATCH 6/8] KVM: arm-vgic: Add vgic reg access from dev attr
On Sun, Aug 25, 2013 at 04:21:58PM +0100, Alexander Graf wrote: On 23.08.2013, at 20:20, Christoffer Dall wrote: Add infrastructure to handle distributor and cpu interface register accesses through the KVM_{GET/SET}_DEVICE_ATTR interface by adding the KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS groups and defining the semantics of the attr field to be the MMIO offset as specified in the GICv2 specs. Missing register accesses or other changes in individual register access functions to support save/restore of the VGIC state is added in subsequent patches. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Documentation/virtual/kvm/devices/arm-vgic.txt | 35 ++ virt/kvm/arm/vgic.c | 143 ++ 2 files changed, 178 insertions(+) diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt b/Documentation/virtual/kvm/devices/arm-vgic.txt index c9febb2..1b68475 100644 --- a/Documentation/virtual/kvm/devices/arm-vgic.txt +++ b/Documentation/virtual/kvm/devices/arm-vgic.txt @@ -19,3 +19,38 @@ Groups: KVM_VGIC_V2_ADDR_TYPE_CPU (rw, 64-bit) Base address in the guest physical address space of the GIC virtual cpu interface register mappings. + + KVM_DEV_ARM_VGIC_GRP_DIST_REGS + Attributes: +The attr field of kvm_device_attr encodes two values: +bits: | 63 .. 40 | 39 .. 32 | 31 .. 0 | +values: | reserved | cpu id | offset | + +All distributor regs are (rw, 32-bit) + +The offset is relative to the Distributor base address as defined in the +GICv2 specs. Getting or setting such a register has the same effect as +reading or writing the register on the actual hardware from the cpu +specified with cpu id field. Note that most distributor fields are not +banked, but return the same value regardless of the cpu id used to access +the register. 
+ Limitations: +- Priorities are not implemented, and registers are RAZ/WI + Errors: +- ENODEV: Getting or setting this register is not yet supported + + KVM_DEV_ARM_VGIC_GRP_CPU_REGS + Attributes: +The attr field of kvm_device_attr encodes two values: +bits: | 63 .. 40 | 39 .. 32 | 31 .. 0 | +values: | reserved | cpu id | offset | + +All CPU regs are (rw, 32-bit) + +The offset specifies the offset from the CPU interface base address as +defined in the GICv2 specs. Getting or setting such a register has the +same effect as reading or writing the register on the actual hardware. + Limitations: +- Priorities are not implemented, and registers are RAZ/WI + Errors: +- ENODEV: Getting or setting this register is not yet supported diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 629caeb..e31625c 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -591,11 +591,29 @@ static bool handle_mmio_sgi_reg(struct kvm_vcpu *vcpu, return false; } +static bool handle_mmio_sgi_clear(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + return false; +} + +static bool handle_mmio_sgi_set(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + return false; +} + /* * I would have liked to use the kvm_bus_io_*() API instead, but it * cannot cope with banked registers (only the VM pointer is passed * around, and we need the vcpu). One of these days, someone please * fix it! + * + * Note that the handle_mmio implementations should not use the phys_addr + * field from the kvm_exit_mmio struct as this will not have any sane values + * when used to save/restore state from user space. 
*/ struct mmio_range { phys_addr_t base; @@ -665,6 +683,16 @@ static const struct mmio_range vgic_dist_ranges[] = { .len = 4, .handle_mmio = handle_mmio_sgi_reg, }, + { + .base = GIC_DIST_SGI_CLEAR, + .len = VGIC_NR_SGIS, + .handle_mmio = handle_mmio_sgi_clear, + }, + { + .base = GIC_DIST_SGI_SET, + .len = VGIC_NR_SGIS, + .handle_mmio = handle_mmio_sgi_set, + }, {} }; @@ -1543,6 +1571,80 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) return r; } +static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu, +struct kvm_exit_mmio *mmio, phys_addr_t offset) +{ + return true; +} + +static const struct mmio_range vgic_cpu_ranges[] = { + { + .base
Re: [PATCH 8/8] KVM: arm-vgic: Support CPU interface reg access
On Sun, Aug 25, 2013 at 04:24:20PM +0100, Alexander Graf wrote: On 23.08.2013, at 20:20, Christoffer Dall wrote: Implement support for the CPU interface register access driven by MMIO address offsets from the CPU interface base address. Useful for user space to support save/restore of the VGIC state. This commit adds support only for the same logic as the current VGIC support, and no more. For example, the active priority registers are handled as RAZ/WI, just like setting priorities on the emulated distributor. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- virt/kvm/arm/vgic.c | 66 +++ 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index d44b5a1..257dbae 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1684,9 +1684,67 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { - return true; + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + u32 reg, mask = 0, shift = 0; + bool updated = false; + + switch (offset & ~0x3) { + case GIC_CPU_CTRL: + mask = GICH_VMCR_CTRL_MASK; + shift = GICH_VMCR_CTRL_SHIFT; + break; + case GIC_CPU_PRIMASK: + mask = GICH_VMCR_PRIMASK_MASK; + shift = GICH_VMCR_PRIMASK_SHIFT; + break; + case GIC_CPU_BINPOINT: + mask = GICH_VMCR_BINPOINT_MASK; + shift = GICH_VMCR_BINPOINT_SHIFT; + break; + case GIC_CPU_ALIAS_BINPOINT: + mask = GICH_VMCR_ALIAS_BINPOINT_MASK; + shift = GICH_VMCR_ALIAS_BINPOINT_SHIFT; + break; + } + + if (!mmio->is_write) { + reg = (vgic_cpu->vgic_vmcr & mask) >> shift; + memcpy(mmio->data, &reg, sizeof(reg)); + } else { + memcpy(&reg, mmio->data, sizeof(reg)); + reg = (reg << shift) & mask; + if (reg != (vgic_cpu->vgic_vmcr & mask)) + updated = true; + vgic_cpu->vgic_vmcr &= ~mask; + vgic_cpu->vgic_vmcr |= reg; + } + return updated; +} + +static bool handle_mmio_abpr(struct kvm_vcpu *vcpu, +struct kvm_exit_mmio *mmio, phys_addr_t 
offset) +{ + return handle_cpu_mmio_misc(vcpu, mmio, GIC_CPU_ALIAS_BINPOINT); +} + +static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + u32 reg; + + if (mmio->is_write) + return false; + + reg = 0x0002043B; This wants a comment and probably also a #define :). Marc, where does the 0x4b0 product id code come from for the distributor IIDR? Would this be satisfying? diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 5214424..558be38 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -71,6 +71,9 @@ #define VGIC_ADDR_UNDEF (-1) #define IS_VGIC_ADDR_UNDEF(_x) ((_x) == VGIC_ADDR_UNDEF) +#define GIC_PRODUCT_ID 0x4b0 +#define ARM_JEP106_IMPLEMENTER 0x43b + /* Physical address of vgic virtual cpu interface */ static phys_addr_t vgic_vcpu_base; @@ -331,7 +334,7 @@ static bool handle_mmio_misc(struct kvm_vcpu *vcpu, break; case 8: /* IIDR */ - reg = 0x4B00043B; + reg = (GIC_PRODUCT_ID << 20) | ARM_JEP106_IMPLEMENTER; vgic_reg_access(mmio, &reg, word_offset, ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); break; @@ -1734,7 +1737,7 @@ static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, if (mmio->is_write) return false; - reg = 0x0002043B; + reg = (GIC_PRODUCT_ID << 20) | (0x2 << 16) | ARM_JEP106_IMPLEMENTER; mmio_data_write(mmio, ~0, reg); return false; } Thanks, -Christoffer
[Bug 61971] Migration problem of qemu-kvm guests
https://bugzilla.kernel.org/show_bug.cgi?id=61971 Magnus Boden mag...@boden.cx changed: What | Removed | Added Kernel Version | 3.10.5 | 3.11.1 --- Comment #1 from Magnus Boden mag...@boden.cx --- Hello, Since the kernel I used was a debian backport I compiled my own kernel from kernel.org 3.11.1 and it has the same problem. Regards Magnus -- You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH 8/8] KVM: arm-vgic: Support CPU interface reg access
On 25.09.2013, at 23:30, Christoffer Dall wrote: On Sun, Aug 25, 2013 at 04:24:20PM +0100, Alexander Graf wrote: On 23.08.2013, at 20:20, Christoffer Dall wrote: Implement support for the CPU interface register access driven by MMIO address offsets from the CPU interface base address. Useful for user space to support save/restore of the VGIC state. This commit adds support only for the same logic as the current VGIC support, and no more. For example, the active priority registers are handled as RAZ/WI, just like setting priorities on the emulated distributor. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- virt/kvm/arm/vgic.c | 66 +++ 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index d44b5a1..257dbae 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1684,9 +1684,67 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { - return true; + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + u32 reg, mask = 0, shift = 0; + bool updated = false; + + switch (offset & ~0x3) { + case GIC_CPU_CTRL: + mask = GICH_VMCR_CTRL_MASK; + shift = GICH_VMCR_CTRL_SHIFT; + break; + case GIC_CPU_PRIMASK: + mask = GICH_VMCR_PRIMASK_MASK; + shift = GICH_VMCR_PRIMASK_SHIFT; + break; + case GIC_CPU_BINPOINT: + mask = GICH_VMCR_BINPOINT_MASK; + shift = GICH_VMCR_BINPOINT_SHIFT; + break; + case GIC_CPU_ALIAS_BINPOINT: + mask = GICH_VMCR_ALIAS_BINPOINT_MASK; + shift = GICH_VMCR_ALIAS_BINPOINT_SHIFT; + break; + } + + if (!mmio->is_write) { + reg = (vgic_cpu->vgic_vmcr & mask) >> shift; + memcpy(mmio->data, &reg, sizeof(reg)); + } else { + memcpy(&reg, mmio->data, sizeof(reg)); + reg = (reg << shift) & mask; + if (reg != (vgic_cpu->vgic_vmcr & mask)) + updated = true; + vgic_cpu->vgic_vmcr &= ~mask; + vgic_cpu->vgic_vmcr |= reg; + } + return updated; +} + +static bool handle_mmio_abpr(struct kvm_vcpu 
*vcpu, +struct kvm_exit_mmio *mmio, phys_addr_t offset) +{ + return handle_cpu_mmio_misc(vcpu, mmio, GIC_CPU_ALIAS_BINPOINT); +} + +static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + u32 reg; + + if (mmio->is_write) + return false; + + reg = 0x0002043B; This wants a comment and probably also a #define :). Marc, where does the 0x4b0 product id code come from for the distributor IIDR? Would this be satisfying? diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 5214424..558be38 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -71,6 +71,9 @@ #define VGIC_ADDR_UNDEF (-1) #define IS_VGIC_ADDR_UNDEF(_x) ((_x) == VGIC_ADDR_UNDEF) +#define GIC_PRODUCT_ID 0x4b0 This is a specific GIC version. PL390 for example is 0x3b0: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0416b/Beiggeba.html That should be reflected in the #define. If it means GICv2 then it should be GICV2_PRODUCT_ID for example. +#define ARM_JEP106_IMPLEMENTER 0x43b I think naming this JEP106_IMPLEMENTER_ARM makes it more obvious that we're talking about the implementer code for ARM. Or maybe just only IMPLEMENTER_ARM. + /* Physical address of vgic virtual cpu interface */ static phys_addr_t vgic_vcpu_base; @@ -331,7 +334,7 @@ static bool handle_mmio_misc(struct kvm_vcpu *vcpu, break; case 8: /* IIDR */ - reg = 0x4B00043B; + reg = (GIC_PRODUCT_ID << 20) | ARM_JEP106_IMPLEMENTER; vgic_reg_access(mmio, &reg, word_offset, ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); break; @@ -1734,7 +1737,7 @@ static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, if (mmio->is_write) return false; - reg = 0x0002043B; + reg = (GIC_PRODUCT_ID << 20) | (0x2 << 16) | ARM_JEP106_IMPLEMENTER; What is the 0x2 here? 
Alex mmio_data_write(mmio, ~0, reg); return false; } Thanks, -Christoffer
Re: [PATCH 00/18] KVM: PPC: Fixes for PR and preparation for POWER8
On 20.09.2013, at 06:52, Paul Mackerras wrote: This patch series contains updated versions of patches that have been posted before, plus one new compilation fix (for PR KVM without CONFIG_ALTIVEC), plus a patch to allow the guest VRSAVE register to be accessed with the ONE_REG interface on Book E. The first few patches are preparation for POWER8 support. Following that there are several patches that improve PR KVM's MMU emulation and prepare for being able to compile both HV and PR KVM in the one kernel. The series stops short of allowing them to coexist, though, since the details of how that should best be done are still being discussed. Please apply. Thanks, applied all (with minor cosmetic fixes) to kvm-ppc-queue. Alex
Re: [PATCH 8/8] KVM: arm-vgic: Support CPU interface reg access
On Thu, Sep 26, 2013 at 12:37:03AM +0200, Alexander Graf wrote: On 25.09.2013, at 23:30, Christoffer Dall wrote: On Sun, Aug 25, 2013 at 04:24:20PM +0100, Alexander Graf wrote: On 23.08.2013, at 20:20, Christoffer Dall wrote: Implement support for the CPU interface register access driven by MMIO address offsets from the CPU interface base address. Useful for user space to support save/restore of the VGIC state. This commit adds support only for the same logic as the current VGIC support, and no more. For example, the active priority registers are handled as RAZ/WI, just like setting priorities on the emulated distributor. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- virt/kvm/arm/vgic.c | 66 +++ 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index d44b5a1..257dbae 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1684,9 +1684,67 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { - return true; + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + u32 reg, mask = 0, shift = 0; + bool updated = false; + + switch (offset & ~0x3) { + case GIC_CPU_CTRL: + mask = GICH_VMCR_CTRL_MASK; + shift = GICH_VMCR_CTRL_SHIFT; + break; + case GIC_CPU_PRIMASK: + mask = GICH_VMCR_PRIMASK_MASK; + shift = GICH_VMCR_PRIMASK_SHIFT; + break; + case GIC_CPU_BINPOINT: + mask = GICH_VMCR_BINPOINT_MASK; + shift = GICH_VMCR_BINPOINT_SHIFT; + break; + case GIC_CPU_ALIAS_BINPOINT: + mask = GICH_VMCR_ALIAS_BINPOINT_MASK; + shift = GICH_VMCR_ALIAS_BINPOINT_SHIFT; + break; + } + + if (!mmio->is_write) { + reg = (vgic_cpu->vgic_vmcr & mask) >> shift; + memcpy(mmio->data, &reg, sizeof(reg)); + } else { + memcpy(&reg, mmio->data, sizeof(reg)); + reg = (reg << shift) & mask; + if (reg != (vgic_cpu->vgic_vmcr & mask)) + updated = true; + vgic_cpu->vgic_vmcr &= ~mask; + vgic_cpu->vgic_vmcr |= reg; + } + return 
updated; +} + +static bool handle_mmio_abpr(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, phys_addr_t offset) +{ + return handle_cpu_mmio_misc(vcpu, mmio, GIC_CPU_ALIAS_BINPOINT); +} + +static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + u32 reg; + + if (mmio-is_write) + return false; + + reg = 0x0002043B; This wants a comment and probably also a #define :). Marc, where does the 0x4b0 product id code come from for the distributor IIDR? Would this be satisfying? diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 5214424..558be38 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -71,6 +71,9 @@ #define VGIC_ADDR_UNDEF (-1) #define IS_VGIC_ADDR_UNDEF(_x) ((_x) == VGIC_ADDR_UNDEF) +#define GIC_PRODUCT_ID 0x4b0 This is a specific GIC version. PL390 for example is 0x3b0: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0416b/Beiggeba.html That should be reflected in the #define. If it means GICv2 then it should be GICV2_PRODUCT_ID for example. I know what field in the register it is thanks :) But I couldn't find 0x4b0 anywhere in the docs, so I'm asking Marc where he got it from. I don't believe it means GICv2, but a specific implementation of a GICv2, and once I have more info I can change the define name, I suspect this is potentially something made-up to indicate that this is the KVM VGIC... +#define ARM_JEP106_IMPLEMENTER 0x43b I think naming this JEP106_IMPLEMENTER_ARM makes it more obvious that we're talking about the implementer code for ARM. Or maybe just only IMPLEMENTER_ARM. 
ok + /* Physical address of vgic virtual cpu interface */ static phys_addr_t vgic_vcpu_base; @@ -331,7 +334,7 @@ static bool handle_mmio_misc(struct kvm_vcpu *vcpu, break; case 8: /* IIDR */ - reg = 0x4B00043B; + reg = (GIC_PRODUCT_ID << 20) | ARM_JEP106_IMPLEMENTER; vgic_reg_access(mmio, &reg, word_offset, ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); break; @@ -1734,7 +1737,7 @@ static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, if (mmio->is_write) return false; - reg = 0x0002043B; + reg = (GIC_PRODUCT_ID << 20) | (0x2 << 16) | ARM_JEP106_IMPLEMENTER; What is the 0x2 here? GICv2, and now
Re: [PATCH 8/8] KVM: arm-vgic: Support CPU interface reg access
On 26.09.2013, at 03:36, Alexander Graf wrote: On 26.09.2013, at 03:15, Alexander Graf wrote: On 26.09.2013, at 02:54, Christoffer Dall wrote: On Thu, Sep 26, 2013 at 12:37:03AM +0200, Alexander Graf wrote: On 25.09.2013, at 23:30, Christoffer Dall wrote: On Sun, Aug 25, 2013 at 04:24:20PM +0100, Alexander Graf wrote: On 23.08.2013, at 20:20, Christoffer Dall wrote: Implement support for the CPU interface register access driven by MMIO address offsets from the CPU interface base address. Useful for user space to support save/restore of the VGIC state. This commit adds support only for the same logic as the current VGIC support, and no more. For example, the active priority registers are handled as RAZ/WI, just like setting priorities on the emulated distributor. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- virt/kvm/arm/vgic.c | 66 +++ 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index d44b5a1..257dbae 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1684,9 +1684,67 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { - return true; + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + u32 reg, mask = 0, shift = 0; + bool updated = false; + + switch (offset & ~0x3) { + case GIC_CPU_CTRL: + mask = GICH_VMCR_CTRL_MASK; + shift = GICH_VMCR_CTRL_SHIFT; + break; + case GIC_CPU_PRIMASK: + mask = GICH_VMCR_PRIMASK_MASK; + shift = GICH_VMCR_PRIMASK_SHIFT; + break; + case GIC_CPU_BINPOINT: + mask = GICH_VMCR_BINPOINT_MASK; + shift = GICH_VMCR_BINPOINT_SHIFT; + break; + case GIC_CPU_ALIAS_BINPOINT: + mask = GICH_VMCR_ALIAS_BINPOINT_MASK; + shift = GICH_VMCR_ALIAS_BINPOINT_SHIFT; + break; + } + + if (!mmio->is_write) { + reg = (vgic_cpu->vgic_vmcr & mask) >> shift; + memcpy(mmio->data, &reg, sizeof(reg)); + } else { + memcpy(&reg, mmio->data, sizeof(reg)); + reg = (reg << shift) & mask; + if (reg != (vgic_cpu->vgic_vmcr & mask)) + updated = true; + vgic_cpu->vgic_vmcr &= ~mask; + vgic_cpu->vgic_vmcr |= reg; + } + return updated; +} + +static bool handle_mmio_abpr(struct kvm_vcpu *vcpu, +struct kvm_exit_mmio *mmio, phys_addr_t offset) +{ + return handle_cpu_mmio_misc(vcpu, mmio, GIC_CPU_ALIAS_BINPOINT); +} + +static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + u32 reg; + + if (mmio->is_write) + return false; + + reg = 0x0002043B; This wants a comment and probably also a #define :). Marc, where does the 0x4b0 product id code come from for the distributor IIDR? Would this be satisfying? ^ diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 5214424..558be38 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -71,6 +71,9 @@ #define VGIC_ADDR_UNDEF (-1) #define IS_VGIC_ADDR_UNDEF(_x) ((_x) == VGIC_ADDR_UNDEF) +#define GIC_PRODUCT_ID 0x4b0 This is a specific GIC version. PL390 for example is 0x3b0: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0416b/Beiggeba.html That should be reflected in the #define. If it means GICv2 then it should be GICV2_PRODUCT_ID for example. I know what field in the register it is thanks :) But I couldn't find 0x4b0 anywhere in the docs, so I'm asking Marc where he got it from. I don't believe it means GICv2, but a Ah, ok. Then the answer to your question above is a simple no as the name doesn't really tell us everything we want to know yet :). specific implementation of a GICv2, and once I have more info I can change the define name, I suspect this is potentially something made-up to indicate that this is the KVM VGIC... Hrm, makes sense. So that also explains why there's a special version field. It doesn't explain why it only gets set in one of the IIDR variants though. Is this on purpose? From what I can tell, the CPU and Distributor interfaces both should return the same number here. Hrm. Curious. 
According to http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438i/BABGBHBG.html the proper values for IIDR on an A15 are: GICD_IIDR 0x043B GICC_IIDR 0x0002043B So what do the fields mean in each
Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
On 09/23/2013 03:16 PM, Michael S. Tsirkin wrote: On Thu, Sep 05, 2013 at 10:54:44AM +0800, Jason Wang wrote: On 09/04/2013 07:59 PM, Michael S. Tsirkin wrote: On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote: Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if upend_idx != done_idx we still set zcopy_used to true and rollback this choice later. This could be avoided by determining zerocopy once by checking all conditions at one time before. Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/vhost/net.c | 47 --- 1 files changed, 20 insertions(+), 27 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 8a6dd0d..3f89dea 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net) iov_length(nvq->hdr, s), hdr_size); break; } - zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN || - nvq->upend_idx != nvq->done_idx); + + zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN && + (nvq->upend_idx + 1) % UIO_MAXIOV != + nvq->done_idx Thinking about this, this looks strange. The original idea was that once we start doing zcopy, we keep using the heads ring even for short packets until no zcopy is outstanding. What's the reason for keep using the heads ring? To keep completions in order. Ok, I will do some test to see the impact. What's the logic behind (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx here? Because we initialize both upend_idx and done_idx to zero, so upend_idx != done_idx could not be used to check whether or not the heads ring were full. But what does ring full have to do with zerocopy use? It was used to forbid the zerocopy when heads ring are full, but since we have the limitation now, it could be removed.
Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency
On 09/26/2013 06:19 AM, Alex Williamson wrote: On Sun, 2013-09-15 at 22:40 +1000, Alexey Kardashevskiy wrote: On 09/14/2013 02:25 AM, Alex Williamson wrote: On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote: On 09/13/2013 07:23 AM, Alex Williamson wrote: So far we've succeeded at making KVM and VFIO mostly unaware of each other, but there's an important point where that breaks down. Intel VT-d hardware may or may not support snoop control. When snoop control is available, intel-iommu promotes No-Snoop transactions on PCIe to be cache coherent. That allows KVM to handle things like the x86 WBINVD opcode as a nop. When the hardware does not support this, KVM must implement a hardware visible WBINVD for the guest. We could simply let userspace tell KVM how to handle WBINVD, but it's privileged for a reason. Allowing an arbitrary user to enable physical WBINVD gives them more access to the hardware. Previously, this has only been enabled for guests supporting legacy PCI device assignment. In such cases it's necessary for proper guest execution. We therefore create a new KVM-VFIO virtual device. The user can add and remove VFIO groups to this device via file descriptors. KVM makes use of the VFIO external user interface to validate that the user has access to physical hardware and gets the coherency state of the IOMMU from VFIO. This provides equivalent functionality to legacy KVM assignment, while keeping (nearly) all the bits isolated. The one intrusion is the resulting flag indicating the coherency state. For this RFC it's placed on the x86 kvm_arch struct, however I know POWER has interest in using the VFIO external user interface, and I'm hoping we can share a common KVM-VFIO device. Perhaps they care about No-Snoop handling as well or the code can be #ifdef'd. POWER does not support (at least book3s - server, not sure about others) this cache-non-coherent stuff at all. 
Then it's easy for your IOMMU API interface to return always cache coherent or never cache coherent or whatever ;) Regarding reusing this device with external API for POWER - I posted a patch which introduces a KVM device to link KVM with IOMMU but besides the list of groups registered in KVM, it also provides the way to find a group by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So in my case the kvm_vfio_group struct needs LIOBN and it would be nice to have window_size there too (for a quick boundary check). I am not sure we want to mix everything here. It is in [PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel handling if you are interested (kvmppc_spapr_tce_iommu_device). Yes, I stole the code to get the vfio symbols from your code. The convergence I was hoping to achieve is that KVM doesn't really want to know about VFIO and vice versa. We can therefore at least limit the intrusion by sharing a common device. Obviously for you it will need some extra interfaces to associate an LIOBN to a group, but we keep both the kernel and userspace cleaner by avoiding duplication where we can. Is this really not extensible to your usage? Well, I do not have a good taste for this kind of stuff so I cannot tell for sure. I can reuse this device and hack it to do whatever I need but how? There are 2 things I need from the KVM device: 1. associate IOMMU group with LIOBN Does QEMU know this? We could set another attribute to specify the LIOBN for a group. QEMU knows as QEMU decides on the LIOBN number. And yes, we could do that. 2. get a pointer to an IOMMU group by LIOBN (used to get ppc's IOMMU table pointer which is an IOMMU data of an IOMMU group so we could take a shortcut here). Once we have a VFIO device with a VFIO group added and the LIOBN attribute set, isn't this just a matter of some access code? The lookup function will be called from what we call a realmode on PPC64, i.e. when MMU is off and kvm.ko code is not available. 
So we will either have to put this lookup function to a separate virt/kvm/vfio_rm.c or compile the whole thing into the kernel image but this is not a big issue anyway. You can have a look for example at book3s_64_vio_hv.o vs. book3s_64_vio.o to get a better picture if you like.

There are 3 possible interfaces to use:
A. set/get attributes
B. ioctl
C. additional API

I think we need to differentiate user interfaces from kernel interfaces. Within the kernel, we make no guarantees about interfaces and we can change them to meet our needs. That includes the vfio external user interface. For userspace, we need to be careful not to break things. I suggest we use the set/get/has attribute interface that's already part of KVM for the user interface.

What I posted a week ago uses A for 1 and C for 2. Now we move this to virt/kvm/vfio.c. I don't care where it lives, other than we both have a need for it, so it seems like the core of it should not live in architecture specific
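For reference, the set/get/has attribute interface suggested above is driven from userspace through KVM's device-attribute ioctls. A sketch of what adding a group might look like (the struct mirrors the uapi layout of struct kvm_device_attr; the KVM_DEV_VFIO_* constant values and the helper are hypothetical placeholders, since the attribute numbering was still under discussion at this point):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the field layout of KVM's uapi struct kvm_device_attr. */
struct kvm_device_attr {
    uint32_t flags;
    uint32_t group;  /* attribute group, e.g. a VFIO-group class */
    uint64_t attr;   /* attribute within the group, e.g. ADD/DEL */
    uint64_t addr;   /* userspace pointer, e.g. to a VFIO group fd */
};

#define KVM_DEV_VFIO_GROUP      1  /* hypothetical value */
#define KVM_DEV_VFIO_GROUP_ADD  1  /* hypothetical value */

/* Userspace would fill this in and issue KVM_SET_DEVICE_ATTR on the
 * device fd returned by KVM_CREATE_DEVICE for the KVM-VFIO device. */
static struct kvm_device_attr make_group_add_attr(int *group_fd)
{
    struct kvm_device_attr a = {
        .flags = 0,
        .group = KVM_DEV_VFIO_GROUP,
        .attr  = KVM_DEV_VFIO_GROUP_ADD,
        .addr  = (uint64_t)(uintptr_t)group_fd,
    };
    return a;
}
```

A POWER-specific attribute to bind an LIOBN to an already-added group would slot into the same scheme as another (group, attr) pair.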
Re: [PATCH] powerpc/kvm: Handle the boundary condition correctly
Hi Alex, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com writes: Ok, please give me an example with real numbers and why it breaks. http://mid.gmane.org/1376995766-16526-4-git-send-email-aneesh.ku...@linux.vnet.ibm.com Didn't quite get what you are looking for. As explained before, we now need to pass an array with array size 3 even though we know we need to read only 2 entries, because the kernel doesn't loop correctly. But we need to do that regardless, because newer QEMU needs to be able to run on older kernels, no? Yes. So userspace will have to pass an array of size 3. But that should not prevent us from fixing this, right? Do we still want this patch or should I drop this? -aneesh -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/18] KVM: PPC: Fixes for PR and preparation for POWER8
On 20.09.2013, at 06:52, Paul Mackerras wrote: This patch series contains updated versions of patches that have been posted before, plus one new compilation fix (for PR KVM without CONFIG_ALTIVEC), plus a patch to allow the guest VRSAVE register to be accessed with the ONE_REG interface on Book E. The first few patches are preparation for POWER8 support. Following that there are several patches that improve PR KVM's MMU emulation and prepare for being able to compile both HV and PR KVM in the one kernel. The series stops short of allowing them to coexist, though, since the details of how that should best be done are still being discussed. Please apply. Thanks, applied all (with minor cosmetic fixes) to kvm-ppc-queue. Alex
Re: [PATCH 1/2] KVM: Implement default IRQ routing
On Mon, Sep 23, 2013 at 09:34:01PM +0300, Gleb Natapov wrote: On Mon, Sep 23, 2013 at 09:24:14PM +1000, Paul Mackerras wrote: On Sun, Sep 22, 2013 at 03:32:53PM +0300, Gleb Natapov wrote: On Tue, Sep 17, 2013 at 07:18:40PM +1000, Paul Mackerras wrote:

This implements a simple way to express the case of IRQ routing where there is one in-kernel PIC and the system interrupts (GSIs) are routed 1-1 to the corresponding pins of the PIC. This is expressed by having kvm->irq_routing == NULL with a skeleton irq routing entry in the new kvm->default_irq_route field. This provides a convenient way to provide backwards compatibility when adding IRQ routing capability to an existing in-kernel PIC, such as the XICS emulation on powerpc.

Why not create simple 1-1 irq routing table? It will take a little bit more memory, but there will be no need for kvm->irq_routing == NULL special handling.

The short answer is that userspace wants to use interrupt source numbers (i.e. pin numbers for the inputs to the emulated XICS) that are scattered throughout a large space, since that mirrors what real hardware does. More specifically, hardware divides up the interrupt source number into two fields, each of typically 12 bits, where the more significant field identifies an interrupt source unit (ISU) and the less significant field identifies an interrupt within the ISU. Each PCI host bridge would have an ISU, for example, and there can be ISUs associated with other things that attach directly to the interconnect fabric (coprocessors, cluster interconnects, etc.). Today, QEMU creates a virtual ISU numbered 1 for the emulated PCI host bridge, which means for example that virtio devices get interrupt pin numbers starting at 4096. So, I could have increased KVM_IRQCHIP_NUM_PINS to some multiple of 4096, say 16384, which would allow for 3 ISUs.
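The two-field split described above can be illustrated with the 12-bit widths from the text (the helper names are hypothetical; only the field layout comes from the discussion):

```c
#include <assert.h>
#include <stdint.h>

#define ISU_SHIFT 12                      /* low 12 bits: source within the ISU */
#define SRC_MASK  ((1u << ISU_SHIFT) - 1)

/* Which interrupt source unit an interrupt source number belongs to. */
static uint32_t irq_isu(uint32_t irq) { return irq >> ISU_SHIFT; }

/* Which source within that ISU. */
static uint32_t irq_src(uint32_t irq) { return irq & SRC_MASK; }
```

This is why QEMU's virtio interrupts start at pin 4096: they are source 0 of the virtual ISU numbered 1, i.e. (1 << 12) + 0.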
But that would bloat out struct kvm_irq_routing_table to over 64kB, and if I wanted 1-1 mappings between GSI and pins for all of them, the routing table would be over 960kB.

Yes, this is not an option. GSI is just a cookie for anything but x86 non-MSI interrupts. So the way to use the irq routing table to deliver XICS irqs is to register a GSI->XICS irq mapping and by triggering GSI, which is just an arbitrary number, userspace tells the kernel that the XICS irq that was registered for that GSI should be injected.

Yes, that's fine as far as it goes, but the trouble is that the existing data structures (specifically the chip[][] array in struct kvm_irq_routing_table) don't handle well the case where the pin numbers are large and/or sparse. In other words, using a small compact set of GSI numbers wouldn't help, because it's not the GSI -> pin mapping that is the problem, it is the reverse pin -> GSI mapping. There is a compatibility concern too -- if I want existing userspace to run, I would have to create 1-1 default mappings for at least the first (say) 4200 pins or so, which would use up about 294kB. That really doesn't seem worth it compared to just using the null routing table pointer to indicate an unlimited 1-1 mapping.

Let me check that I understand you correctly. Existing userspace already uses XICS irq directly and now you want to switch this interface to use irq routing. New userspace will register a GSI->XICS irq mapping as described above; this is what [2/2] does. Is this correct?

I don't particularly want irq routing, but it appears to be unavoidable if one wants IRQFD, which I do want. There is no particular advantage to userspace in using a GSI -> XICS irq mapping. If userspace did set one up it would be the identity mapping.

Paul.
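The size figures quoted in this exchange can be reproduced with back-of-envelope arithmetic. In the sketch below, the chip[][] slot size is the 4 bytes of an int; the per-entry size of ~60 bytes is an assumption standing in for struct kvm_kernel_irq_routing_entry plus its list linkage, chosen to match the quoted 960kB:

```c
#include <assert.h>

enum {
    NUM_PINS   = 16384,  /* KVM_IRQCHIP_NUM_PINS raised to cover 3 ISUs */
    CHIP_SLOT  = 4,      /* one int per pin in the chip[][] array */
    ENTRY_SIZE = 60,     /* assumed per-entry footprint, roughly 48-64 bytes */
};

/* Size of one irqchip's row of the chip[][] pin -> GSI array. */
static unsigned long chip_array_bytes(void)
{
    return (unsigned long)NUM_PINS * CHIP_SLOT;
}

/* Size of 1-1 routing entries for every pin. */
static unsigned long routing_entries_bytes(void)
{
    return (unsigned long)NUM_PINS * ENTRY_SIZE;
}
```

One irqchip row alone is 64kB, so the whole struct is "over 64kB", and full 1-1 entries land at 960kB, which is why Paul prefers a null routing-table pointer meaning "unlimited identity mapping" over materializing the table.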