Re: [PATCHv2/RFC] kvm/irqchip: Speed up KVM_SET_GSI_ROUTING

2014-01-17 Thread Christian Borntraeger
On 16/01/14 19:56, Michael S. Tsirkin wrote:
 On Thu, Jan 16, 2014 at 02:07:19PM +0100, Paolo Bonzini wrote:
 Il 16/01/2014 14:06, Christian Borntraeger ha scritto:
 Will you edit the patch description or shall I resend the patch?

 I can edit the commit message.

 Paolo
 
 I think we really need to see the effect adding srcu has on interrupt
 injection.

Michael, 
do you have a quick way to check if srcu has a noticeable impact on int
injection on your systems? I am happy with either v2 or v3 of the patch,
but srcu_synchronize_expedited seems to have less latency impact on the 
full system than rcu_synchronize_expedited. This might give Paolo a hint
which of the patches is the right way to go.

Christian

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2/RFC] kvm/irqchip: Speed up KVM_SET_GSI_ROUTING

2014-01-17 Thread Paolo Bonzini
Il 17/01/2014 09:29, Christian Borntraeger ha scritto:
 Michael, 
 do you have a quick way to check if srcu has a noticeable impact on int
 injection on your systems? I am happy with either v2 or v3 of the patch,
 but srcu_synchronize_expedited seems to have less latency impact on the 
 full system than rcu_synchronize_expedited. This might give Paolo a hint
 which of the patches is the right way to go.

For 3.14, I'll definitely pick v3.

Paolo


Re: [RFC PATCH v5] add support for Hyper-V reference time counter

2014-01-17 Thread Paolo Bonzini
Il 16/01/2014 23:23, Marcelo Tosatti ha scritto:
 Comment 2)
 
 Is there any specification related to the initial value of the clock
 after it is enabled ?

The clock counts since the VM was started.

Paolo


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-17 Thread Jan Beulich
 On 16.01.14 at 17:04, Julian Stecklina jstec...@os.inf.tu-dresden.de 
 wrote:
 On 01/16/2014 04:04 PM, Jan Beulich wrote:
 I don't think so - this would only be an issue if the conditions used
 | instead of ||. || implies a sequence point between evaluating the
 left and right sides, and the standard says: The presence of a
 sequence point between the evaluation of expressions A and B
 implies that every value computation and side effect associated
 with A is sequenced before every value computation and side
 effect associated with B.
 
 This only applies to single-threaded code. Multithreaded code must be
 data-race free for that to be true. See
 
 https://lwn.net/Articles/508991/ 
 
 And even if there was a problem (i.e. my interpretation of the
 above being incorrect), I don't think you'd need ACCESS_ONCE()
 here: The same local variable can't have two different values in
 two different use sites when there was no intermediate
 assignment to it.
 
 Same comment as above.

One half of this doesn't apply here, due to the explicit barriers
that are there. The half about converting local variable accesses
back to memory reads (i.e. eliding the local variable), however,
is only a theoretical issue afaict: If a compiler really did this, I
think there'd be far more places where this would hurt.

Jan



Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-17 Thread Julian Stecklina
On 01/17/2014 10:41 AM, Jan Beulich wrote:
 The half about converting local variable accesses
 back to memory reads (i.e. eliding the local variable), however,
 is only a theoretical issue afaict: If a compiler really did this, I
 think there'd be far more places where this would hurt.

It happens rarely, but it does happen. Not fixing those issues is
inviting trouble with new compiler generations. And these issues are
terribly hard to debug.
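[Editor's note: the pvclock-style pattern under debate can be sketched in userspace C as follows — a minimal sketch with the kernel's ACCESS_ONCE macro inlined, not the actual pvclock code.]

```c
#include <assert.h>

/* Kernel-style ACCESS_ONCE: the volatile-qualified cast forces the
 * compiler to emit exactly one load per use, instead of either caching
 * the value in a register or re-reading memory at each use site. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static unsigned version; /* imagine the hypervisor updates this */

/* Snapshot the version once, read the clock fields, snapshot again;
 * a concurrent update shows up as v1 != v2 and the caller retries. */
static int read_is_stable(void)
{
    unsigned v1 = ACCESS_ONCE(version);
    /* ... clock fields would be read between the two snapshots ... */
    unsigned v2 = ACCESS_ONCE(version);
    return v1 == v2;
}
```

Without ACCESS_ONCE, the compiler is in principle free to elide v1 and v2 and re-read `version` at each use, which is exactly the class of miscompilation being debated above.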

Julian





Re: [RFC PATCH v5] add support for Hyper-V reference time counter

2014-01-17 Thread Vadim Rozenfeld
On Thu, 2014-01-16 at 20:23 -0200, Marcelo Tosatti wrote:
 On Thu, Jan 16, 2014 at 08:18:37PM +1100, Vadim Rozenfeld wrote:
  Signed-off: Peter Lieven p...@kamp.de
  Signed-off: Gleb Natapov
  Signed-off: Vadim Rozenfeld vroze...@redhat.com
   
  After some consideration I decided to submit only Hyper-V reference
  counters support this time. I will submit iTSC support as a separate
  patch as soon as it is ready. 
  
  v1 -> v2
  1. mark TSC page dirty as suggested by 
  Eric Northup digitale...@google.com and Gleb
  2. disable local irq when calling get_kernel_ns, 
  as it was done by Peter Lieven p...@amp.de
  3. move check for TSC page enable from second patch
  to this one.
  
  v3 -> v4
  Get rid of ref counter offset.
  
  v4 -> v5
  replace __copy_to_user with kvm_write_guest
  when updating iTSC page.
  
  ---
   arch/x86/include/asm/kvm_host.h|  1 +
   arch/x86/include/uapi/asm/hyperv.h | 13 +
   arch/x86/kvm/x86.c | 28 +++-
   include/uapi/linux/kvm.h   |  1 +
   4 files changed, 42 insertions(+), 1 deletion(-)
  
  diff --git a/arch/x86/include/asm/kvm_host.h 
  b/arch/x86/include/asm/kvm_host.h
  index ae5d783..33fef07 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -605,6 +605,7 @@ struct kvm_arch {
  /* fields used by HYPER-V emulation */
  u64 hv_guest_os_id;
  u64 hv_hypercall;
  +   u64 hv_tsc_page;
   
  #ifdef CONFIG_KVM_MMU_AUDIT
  int audit_point;
  diff --git a/arch/x86/include/uapi/asm/hyperv.h 
  b/arch/x86/include/uapi/asm/hyperv.h
  index b8f1c01..462efe7 100644
  --- a/arch/x86/include/uapi/asm/hyperv.h
  +++ b/arch/x86/include/uapi/asm/hyperv.h
  @@ -28,6 +28,9 @@
   /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
   #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1 << 1)
   
  +/* A partition's reference time stamp counter (TSC) page */
  +#define HV_X64_MSR_REFERENCE_TSC   0x40000021
  +
   /*
* There is a single feature flag that signifies the presence of the MSR
* that can be used to retrieve both the local APIC Timer frequency as
  @@ -198,6 +201,9 @@
   #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK   \
   (~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
   
  +#define HV_X64_MSR_TSC_REFERENCE_ENABLE 0x00000001
  +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12
  +
   #define HV_PROCESSOR_POWER_STATE_C0 0
   #define HV_PROCESSOR_POWER_STATE_C1 1
   #define HV_PROCESSOR_POWER_STATE_C2 2
  @@ -210,4 +216,11 @@
   #define HV_STATUS_INVALID_ALIGNMENT 4
   #define HV_STATUS_INSUFFICIENT_BUFFERS 19
   
  +typedef struct _HV_REFERENCE_TSC_PAGE {
  +   __u32 tsc_sequence;
  +   __u32 res1;
  +   __u64 tsc_scale;
  +   __s64 tsc_offset;
  +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
  +
   #endif
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index 5d004da..8e685b8 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -836,11 +836,12 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
* kvm-specific. Those are put in the beginning of the list.
*/
   
  -#define KVM_SAVE_MSRS_BEGIN 10
  +#define KVM_SAVE_MSRS_BEGIN 12
   static u32 msrs_to_save[] = {
  MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
  MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
  HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
  +   HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_REFERENCE_TSC,
  HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
  MSR_KVM_PV_EOI_EN,
  MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
  @@ -1826,6 +1827,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
  switch (msr) {
  case HV_X64_MSR_GUEST_OS_ID:
  case HV_X64_MSR_HYPERCALL:
  +   case HV_X64_MSR_REFERENCE_TSC:
  +   case HV_X64_MSR_TIME_REF_COUNT:
  r = true;
  break;
  }
  @@ -1867,6 +1870,20 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, 
  u32 msr, u64 data)
  kvm->arch.hv_hypercall = data;
  break;
  }
  +   case HV_X64_MSR_REFERENCE_TSC: {
  +   u64 gfn;
  +   HV_REFERENCE_TSC_PAGE tsc_ref;
  +   memset(&tsc_ref, 0, sizeof(tsc_ref));
  +   kvm->arch.hv_tsc_page = data;
 
 Comment 1)
 
 Is there a reason (that is compliance with spec) to maintain
 value, for HV_X64_MSR_REFERENCE_TSC wrmsr operation, in case
 HV_X64_MSR_TSC_REFERENCE_ENABLE is not set?
 
Windows seems to retrieve the HV_X64_MSR_REFERENCE_TSC value only once,
at boot-up: it checks the HV_X64_MSR_TSC_REFERENCE_ENABLE bit and, if
this bit was not set, allocates one page, maps it into the system space,
and writes the page address to the HV_X64_MSR_REFERENCE_TSC MSR. Windows
keeps the TSC page address value in the HvlReferenceTscPage variable and
uses it every time it needs to read the TSC page content.
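[Editor's note: for context, the guest-side consumption of this page per the published Hyper-V reference-time algorithm looks roughly like the hedged sketch below — this is illustrative, not code from the patch.]

```c
#include <stdint.h>

/* Field layout mirrors the patch's HV_REFERENCE_TSC_PAGE */
struct hv_ref_tsc_page {
    uint32_t tsc_sequence;
    uint32_t res1;
    uint64_t tsc_scale;   /* 64.64 fixed-point multiplier */
    int64_t  tsc_offset;
};

/* Reference time = ((tsc * tsc_scale) >> 64) + tsc_offset, retried if
 * the sequence changed mid-read. In the real protocol a sequence of 0
 * means "fall back to the HV_X64_MSR_TIME_REF_COUNT MSR"; that path is
 * elided here. */
static uint64_t hv_reference_time(const volatile struct hv_ref_tsc_page *p,
                                  uint64_t tsc)
{
    uint32_t seq;
    uint64_t t;
    do {
        seq = p->tsc_sequence;
        t = (uint64_t)(((unsigned __int128)tsc * p->tsc_scale) >> 64)
            + (uint64_t)p->tsc_offset;
    } while (seq != p->tsc_sequence);
    return t;
}

/* Demo with made-up numbers: a scale of 2^63 halves the TSC value. */
static uint64_t hv_reference_time_demo(void)
{
    struct hv_ref_tsc_page pg = {
        .tsc_sequence = 1, .res1 = 0,
        .tsc_scale = 1ull << 63, .tsc_offset = 3,
    };
    return hv_reference_time(&pg, 10); /* (10 * 2^63) >> 64 + 3 = 8 */
}
```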

Re: nested EPT

2014-01-17 Thread Kashyap Chamarthy
On Fri, Jan 17, 2014 at 2:51 AM, duy hai nguyen hain...@gmail.com wrote:
 Now I can run an L2 guest (nested guest)  using the kvm kernel module
 of kernel 3.12

 However, I am facing a new problem when trying to build and use kvm
 kernel module from git://git.kiszka.org/kvm-kmod.git: L1 (nested
 hypervisor) cannot boot L2  and the graphic console of virt-manager
 hangs displaying 'Booting from Hard Disk...'. L1 still runs fine.

 Loading kvm_intel with 'emulate_invalid_guest_state=0' in L0 does not
 solve the problem. I have also tried with different kernel versions:
 3.12.0, 3.12.8 and 3.13.0 without success.

 Can you give me some suggestions?

Maybe you can try without graphical managers and enable a serial
console by adding 'console=ttyS0' to the kernel command line of the L2
guest, so you can see where it gets stuck.

/kashyap


Re: nested EPT

2014-01-17 Thread Jan Kiszka
On 2014-01-17 12:29, Kashyap Chamarthy wrote:
 On Fri, Jan 17, 2014 at 2:51 AM, duy hai nguyen hain...@gmail.com wrote:
 Now I can run an L2 guest (nested guest)  using the kvm kernel module
 of kernel 3.12

 However, I am facing a new problem when trying to build and use kvm
 kernel module from git://git.kiszka.org/kvm-kmod.git: L1 (nested
 hypervisor) cannot boot L2  and the graphic console of virt-manager
 hangs displaying 'Booting from Hard Disk...'. L1 still runs fine.

 Loading kvm_intel with 'emulate_invalid_guest_state=0' in L0 does not
 solve the problem. I have also tried with different kernel versions:
 3.12.0, 3.12.8 and 3.13.0 without success.

 Can you give me some suggestions?
 
 Maybe you can try without graphical managers and enable a serial
 console by adding 'console=ttyS0' to the kernel command line of the L2
 guest, so you can see where it gets stuck.

Tracing can also be helpful, both in L1 and L0:

http://www.linux-kvm.org/page/Tracing

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


[PATCH v2 0/2] MIPS: KVM: fixes for KVM on ProAptiv cores

2014-01-17 Thread James Hogan
ProAptiv support includes support for EHINV (TLB invalidation) and FTLB
(large fixed page size TLBs), both of which cause problems when combined
with KVM. These two patches fix those problems.

These are based on John Crispin's mips-next-3.14 branch where ProAptiv
support is applied. Please consider applying these for v3.14 too.

v2:
- Rewrite patch 2 commit message to be a bit clearer and more explicit
  (on John Crispin's suggestion).

Cc: Ralf Baechle r...@linux-mips.org
Cc: John Crispin blo...@openwrt.org
Cc: linux-m...@linux-mips.org
Cc: Gleb Natapov g...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: kvm@vger.kernel.org
Cc: Markos Chandras markos.chand...@imgtec.com
Cc: Sanjay Lal sanj...@kymasys.com

James Hogan (2):
  MIPS: KVM: use common EHINV aware UNIQUE_ENTRYHI
  MIPS: KVM: remove shadow_tlb code

 arch/mips/include/asm/kvm_host.h |   7 --
 arch/mips/kvm/kvm_mips.c |   1 -
 arch/mips/kvm/kvm_tlb.c  | 134 +--
 3 files changed, 1 insertion(+), 141 deletions(-)

-- 
1.8.1.2




[PATCH v2 1/2] MIPS: KVM: use common EHINV aware UNIQUE_ENTRYHI

2014-01-17 Thread James Hogan
When KVM is enabled and TLB invalidation is supported,
kvm_mips_flush_host_tlb() can cause a machine check exception due to
multiple matching TLB entries. This can occur on shutdown even when KVM
hasn't been actively used.

Commit adb78de9eae8 (MIPS: mm: Move UNIQUE_ENTRYHI macro to a header
file) created a common UNIQUE_ENTRYHI in asm/tlb.h but it didn't update
the copy of UNIQUE_ENTRYHI in kvm_tlb.c to use it.

Commit 36b175451399 (MIPS: tlb: Set the EHINV bit for TLBINVF cores when
invalidating the TLB) later added TLB invalidation (EHINV) support to
the common UNIQUE_ENTRYHI.

Therefore make kvm_tlb.c use the EHINV aware UNIQUE_ENTRYHI
implementation in asm/tlb.h too.
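[Editor's note: the shape of the EHINV-aware macro can be sketched as below; the constants are illustrative placeholders, not the exact asm/tlb.h definition.]

```c
#include <stdint.h>

/* Hypothetical constants for illustration only */
#define CKSEG0             0xffffffff80000000ull
#define PAGE_SHIFT         12
#define MIPS_ENTRYHI_EHINV (1ull << 10)
#define cpu_has_tlbinv     1

/* Sketch of an EHINV-aware UNIQUE_ENTRYHI: each index still maps to a
 * distinct unmapped EntryHi value, and on cores with TLB invalidation
 * the EHINV bit marks the entry architecturally invalid outright, so
 * two flushed entries can never both match and trigger the machine
 * check described above. */
#define UNIQUE_ENTRYHI(idx)                                     \
    ((CKSEG0 + ((uint64_t)(idx) << (PAGE_SHIFT + 1))) |         \
     (cpu_has_tlbinv ? MIPS_ENTRYHI_EHINV : 0))
```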

Signed-off-by: James Hogan james.ho...@imgtec.com
Cc: Ralf Baechle r...@linux-mips.org
Cc: John Crispin blo...@openwrt.org
Cc: linux-m...@linux-mips.org
Cc: Gleb Natapov g...@redhat.com
Cc: kvm@vger.kernel.org
Cc: Sanjay Lal sanj...@kymasys.com
Reviewed-by: Markos Chandras markos.chand...@imgtec.com
Acked-by: Paolo Bonzini pbonz...@redhat.com
---
This is based on John Crispin's mips-next-3.14 branch.

I do not object to it being squashed into commit adb78de9eae8 (MIPS: mm:
Move UNIQUE_ENTRYHI macro to a header file) since that commit hasn't
reached mainline yet.
---
 arch/mips/kvm/kvm_tlb.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/mips/kvm/kvm_tlb.c b/arch/mips/kvm/kvm_tlb.c
index c777dd36d4a8..52083ea7fddd 100644
--- a/arch/mips/kvm/kvm_tlb.c
+++ b/arch/mips/kvm/kvm_tlb.c
@@ -25,6 +25,7 @@
 #include <asm/mmu_context.h>
 #include <asm/pgtable.h>
 #include <asm/cacheflush.h>
+#include <asm/tlb.h>
 
 #undef CONFIG_MIPS_MT
 #include <asm/r4kcache.h>
@@ -35,9 +36,6 @@
 
 #define PRIx64 "llx"
 
-/* Use VZ EntryHi.EHINV to invalidate TLB entries */
-#define UNIQUE_ENTRYHI(idx) (CKSEG0 + ((idx) << (PAGE_SHIFT + 1)))
-
 atomic_t kvm_mips_instance;
 EXPORT_SYMBOL(kvm_mips_instance);
 
-- 
1.8.1.2




[PATCH v2 2/2] MIPS: KVM: remove shadow_tlb code

2014-01-17 Thread James Hogan
The kvm_mips_init_shadow_tlb() function is called from
kvm_arch_vcpu_init() and initialises entries 0 to
current_cpu_data.tlbsize-1 of the virtual cpu's shadow_tlb[64] array.

However newer cores with FTLBs can have a tlbsize > 64, for example the
ProAptiv I'm testing on has a total tlbsize of 576. This causes
kvm_mips_init_shadow_tlb() to overflow the shadow_tlb[64] array and
overwrite the comparecount_timer among other things, causing a lock up
when starting a KVM guest.

Aside from kvm_mips_init_shadow_tlb() which only initialises it, the
shadow_tlb[64] array is only actually used by the following functions:
 - kvm_shadow_tlb_put() & kvm_shadow_tlb_load()
 These are never called. The only call sites are #if 0'd out.
 - kvm_mips_dump_shadow_tlbs()
 This is never called.

It was originally added for trap & emulate, but turned out to be
unnecessary so it was disabled.

So instead of fixing the shadow_tlb initialisation code, let's just
remove the shadow_tlb[64] array and the above functions entirely. The
only functional change here is the removal of broken shadow_tlb
initialisation. The rest just deletes dead code.
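[Editor's note: to make the arithmetic of the overflow concrete — a hypothetical helper, not kernel code.]

```c
#define KVM_MIPS_GUEST_TLB_SIZE 64  /* length of the shadow_tlb array */

/* Number of TLB entries the old init loop would write past the end of
 * shadow_tlb[] for a given hardware tlbsize. Anything non-zero
 * tramples the neighbouring vcpu fields (e.g. comparecount_timer). */
static unsigned int shadow_tlb_overrun(unsigned int tlbsize)
{
    return tlbsize > KVM_MIPS_GUEST_TLB_SIZE
               ? tlbsize - KVM_MIPS_GUEST_TLB_SIZE
               : 0;
}
```

For the ProAptiv's tlbsize of 576 that is 512 out-of-bounds entries per vcpu.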

Signed-off-by: James Hogan james.ho...@imgtec.com
Cc: Ralf Baechle r...@linux-mips.org
Cc: John Crispin blo...@openwrt.org
Cc: linux-m...@linux-mips.org
Cc: Gleb Natapov g...@redhat.com
Cc: kvm@vger.kernel.org
Cc: Sanjay Lal sanj...@kymasys.com
Acked-by: Paolo Bonzini pbonz...@redhat.com
---
This is based on John Crispin's mips-next-3.14 branch where FTLB support
is applied.

v2:
- Rewrite commit message to be a bit clearer and more explicit (on John
  Crispin's suggestion).
---
 arch/mips/include/asm/kvm_host.h |   7 ---
 arch/mips/kvm/kvm_mips.c |   1 -
 arch/mips/kvm/kvm_tlb.c  | 130 ---
 3 files changed, 138 deletions(-)

diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 32966969f2f9..a995fce87791 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -391,9 +391,6 @@ struct kvm_vcpu_arch {
uint32_t guest_kernel_asid[NR_CPUS];
struct mm_struct guest_kernel_mm, guest_user_mm;
 
-   struct kvm_mips_tlb shadow_tlb[NR_CPUS][KVM_MIPS_GUEST_TLB_SIZE];
-
-
struct hrtimer comparecount_timer;
 
int last_sched_cpu;
@@ -529,7 +526,6 @@ extern enum emulation_result 
kvm_mips_handle_tlbmod(unsigned long cause,
 
 extern void kvm_mips_dump_host_tlbs(void);
 extern void kvm_mips_dump_guest_tlbs(struct kvm_vcpu *vcpu);
-extern void kvm_mips_dump_shadow_tlbs(struct kvm_vcpu *vcpu);
 extern void kvm_mips_flush_host_tlb(int skip_kseg0);
 extern int kvm_mips_host_tlb_inv(struct kvm_vcpu *vcpu, unsigned long entryhi);
 extern int kvm_mips_host_tlb_inv_index(struct kvm_vcpu *vcpu, int index);
@@ -541,10 +537,7 @@ extern unsigned long 
kvm_mips_translate_guest_kseg0_to_hpa(struct kvm_vcpu *vcpu
   unsigned long gva);
 extern void kvm_get_new_mmu_context(struct mm_struct *mm, unsigned long cpu,
struct kvm_vcpu *vcpu);
-extern void kvm_shadow_tlb_put(struct kvm_vcpu *vcpu);
-extern void kvm_shadow_tlb_load(struct kvm_vcpu *vcpu);
 extern void kvm_local_flush_tlb_all(void);
-extern void kvm_mips_init_shadow_tlb(struct kvm_vcpu *vcpu);
 extern void kvm_mips_alloc_new_mmu_context(struct kvm_vcpu *vcpu);
 extern void kvm_mips_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 extern void kvm_mips_vcpu_put(struct kvm_vcpu *vcpu);
diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
index 73b34827826c..da5186fbd77a 100644
--- a/arch/mips/kvm/kvm_mips.c
+++ b/arch/mips/kvm/kvm_mips.c
@@ -1001,7 +1001,6 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
	hrtimer_init(&vcpu->arch.comparecount_timer, CLOCK_MONOTONIC,
		     HRTIMER_MODE_REL);
	vcpu->arch.comparecount_timer.function = kvm_mips_comparecount_wakeup;
-	kvm_mips_init_shadow_tlb(vcpu);
return 0;
 }
 
diff --git a/arch/mips/kvm/kvm_tlb.c b/arch/mips/kvm/kvm_tlb.c
index 52083ea7fddd..68e6563915cd 100644
--- a/arch/mips/kvm/kvm_tlb.c
+++ b/arch/mips/kvm/kvm_tlb.c
@@ -145,30 +145,6 @@ void kvm_mips_dump_guest_tlbs(struct kvm_vcpu *vcpu)
}
 }
 
-void kvm_mips_dump_shadow_tlbs(struct kvm_vcpu *vcpu)
-{
-	int i;
-	volatile struct kvm_mips_tlb tlb;
-
-	printk("Shadow TLBs:\n");
-	for (i = 0; i < KVM_MIPS_GUEST_TLB_SIZE; i++) {
-		tlb = vcpu->arch.shadow_tlb[smp_processor_id()][i];
-		printk("TLB%c%3d Hi 0x%08lx ",
-		       (tlb.tlb_lo0 | tlb.tlb_lo1) & MIPS3_PG_V ? ' ' : '*',
-		       i, tlb.tlb_hi);
-		printk("Lo0=0x%09" PRIx64 " %c%c attr %lx ",
-		       (uint64_t) mips3_tlbpfn_to_paddr(tlb.tlb_lo0),
-		       (tlb.tlb_lo0 & MIPS3_PG_D) ? 'D' : ' ',
-		       (tlb.tlb_lo0 & MIPS3_PG_G) ? 'G' : ' ',
-		       (tlb.tlb_lo0 >> 3) & 7);
-   

Re: [RFC PATCH v5] add support for Hyper-V reference time counter

2014-01-17 Thread Marcelo Tosatti
On Fri, Jan 17, 2014 at 10:06:00PM +1100, Vadim Rozenfeld wrote:
 On Thu, 2014-01-16 at 20:23 -0200, Marcelo Tosatti wrote:
  On Thu, Jan 16, 2014 at 08:18:37PM +1100, Vadim Rozenfeld wrote:
   Signed-off: Peter Lieven p...@kamp.de
   Signed-off: Gleb Natapov
   Signed-off: Vadim Rozenfeld vroze...@redhat.com

   [...]
  
  Comment 1)
  
  Is there a reason (that is compliance with spec) to maintain
  value, for HV_X64_MSR_REFERENCE_TSC wrmsr operation, in case
  HV_X64_MSR_TSC_REFERENCE_ENABLE is not set?
  
 Windows seems to retrieve the HV_X64_MSR_REFERENCE_TSC value only once,
 at boot-up: it checks the HV_X64_MSR_TSC_REFERENCE_ENABLE bit and, if
 this bit was not set, allocates one page, maps it into the system space,
 and writes the page address to the HV_X64_MSR_REFERENCE_TSC MSR. Windows
 keeps the TSC page address value in the HvlReferenceTscPage variable and
 uses it every time it needs to read the TSC page content.

Re: [RFC PATCH v5] add support for Hyper-V reference time counter

2014-01-17 Thread Paolo Bonzini
Il 17/01/2014 14:18, Marcelo Tosatti ha scritto:
 On Fri, Jan 17, 2014 at 10:06:00PM +1100, Vadim Rozenfeld wrote:
 On Thu, 2014-01-16 at 20:23 -0200, Marcelo Tosatti wrote:
 On Thu, Jan 16, 2014 at 08:18:37PM +1100, Vadim Rozenfeld wrote:
 Signed-off: Peter Lieven p...@kamp.de
 Signed-off: Gleb Natapov
 Signed-off: Vadim Rozenfeld vroze...@redhat.com
  
 [...]

 Comment 1)

 Is there a reason (that is compliance with spec) to maintain
 value, for HV_X64_MSR_REFERENCE_TSC wrmsr operation, in case
 HV_X64_MSR_TSC_REFERENCE_ENABLE is not set?
  
 Windows seems to retrieve the HV_X64_MSR_REFERENCE_TSC value only once,
 at boot-up: it checks the HV_X64_MSR_TSC_REFERENCE_ENABLE bit and, if
 this bit was not set, allocates one page, maps it into the system space,
 and writes the page address to the HV_X64_MSR_REFERENCE_TSC MSR. Windows
 keeps the TSC page address value in the HvlReferenceTscPage variable and
 uses it every time it needs to read the TSC page content.
 
 Ok then it has to be saved/restored 

[PULL 1/3] KVM: s390: enable Transactional Execution

2014-01-17 Thread Christian Borntraeger
From: Michael Mueller m...@linux.vnet.ibm.com

This patch enables transactional execution for KVM guests
on s390 systems zec12 or later.

We rework the allocation of the page containing the sie_block
to also back the Interception Transaction Diagnostic Block.
If available, the TE facilities will be enabled.

Setting bits 73 and 50 in the vfacilities bitmask reveals the HW
facilities Transactional Memory and Constrained Transactional
Memory respectively to the KVM guest.

Furthermore, the patch restores the Program-Interruption TDB
from the Interception TDB in case a program interception has
occurred and the ITDB has a valid format.
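[Editor's note: the facility-bit arithmetic behind "bits 73 and 50" can be sketched as below — a userspace analog of the kernel's test_vfacility, using the s390 MSB-first facility-bit numbering.]

```c
#include <stdint.h>

/* s390 facility bits are numbered from the most significant bit of the
 * first doubleword: facility n lives in word n/64, bit 63 - (n % 64).
 * Facilities 50 and 73 are pre-set here for the demo; note that
 * 1ull << (63 - 50) is 0x2000, matching the mask change in the patch
 * below. */
static const uint64_t vfacilities[2] = {
    1ull << (63 - 50),        /* facility 50: constrained TX */
    1ull << (63 - (73 - 64)), /* facility 73: transactional execution */
};

static int test_vfacility(unsigned int nr)
{
    return (vfacilities[nr / 64] >> (63 - (nr % 64))) & 1;
}
```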

Signed-off-by: Michael Mueller m...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/include/asm/kvm_host.h |   15 ++-
 arch/s390/kvm/intercept.c|   11 +++
 arch/s390/kvm/kvm-s390.c |   17 +++--
 arch/s390/kvm/kvm-s390.h |6 ++
 4 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index d5bc375..eef3dd3 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -106,9 +106,22 @@ struct kvm_s390_sie_block {
__u64   gbea;   /* 0x0180 */
	__u8	reserved188[24];	/* 0x0188 */
__u32   fac;/* 0x01a0 */
-   __u8reserved1a4[92];/* 0x01a4 */
+	__u8	reserved1a4[68];	/* 0x01a4 */
+   __u64   itdba;  /* 0x01e8 */
+	__u8	reserved1f0[16];	/* 0x01f0 */
 } __attribute__((packed));
 
+struct kvm_s390_itdb {
+	__u8	data[256];
+} __packed;
+
+struct sie_page {
+   struct kvm_s390_sie_block sie_block;
+   __u8 reserved200[1024]; /* 0x0200 */
+   struct kvm_s390_itdb itdb;  /* 0x0600 */
+   __u8 reserved700[2304]; /* 0x0700 */
+} __packed;
+
 struct kvm_vcpu_stat {
u32 exit_userspace;
u32 exit_null;
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 5ddbbde..eeb1ac7 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -112,6 +112,17 @@ static int handle_instruction(struct kvm_vcpu *vcpu)
 static int handle_prog(struct kvm_vcpu *vcpu)
 {
	vcpu->stat.exit_program_interruption++;
+
+	/* Restore ITDB to Program-Interruption TDB in guest memory */
+	if (IS_TE_ENABLED(vcpu) &&
+	    !(current->thread.per_flags & PER_FLAG_NO_TE) &&
+	    IS_ITDB_VALID(vcpu)) {
+		copy_to_guest(vcpu, TDB_ADDR, vcpu->arch.sie_block->itdba,
+			      sizeof(struct kvm_s390_itdb));
+		memset((void *) vcpu->arch.sie_block->itdba, 0,
+		       sizeof(struct kvm_s390_itdb));
+	}
+
	trace_kvm_s390_intercept_prog(vcpu, vcpu->arch.sie_block->iprcc);
	return kvm_s390_inject_program_int(vcpu, vcpu->arch.sie_block->iprcc);
 }
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 1bb1dda..0084c2c2 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -395,6 +395,9 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
CPUSTAT_STOPPED |
CPUSTAT_GED);
	vcpu->arch.sie_block->ecb   = 6;
+	if (test_vfacility(50) && test_vfacility(73))
+		vcpu->arch.sie_block->ecb |= 0x10;
+
	vcpu->arch.sie_block->ecb2  = 8;
	vcpu->arch.sie_block->eca   = 0xC1002001U;
	vcpu->arch.sie_block->fac   = (int) (long) vfacilities;
@@ -411,6 +414,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 {
struct kvm_vcpu *vcpu;
+   struct sie_page *sie_page;
int rc = -EINVAL;
 
	if (id >= KVM_MAX_VCPUS)
@@ -422,12 +426,13 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
if (!vcpu)
goto out;
 
-	vcpu->arch.sie_block = (struct kvm_s390_sie_block *)
-		get_zeroed_page(GFP_KERNEL);
-
-	if (!vcpu->arch.sie_block)
+	sie_page = (struct sie_page *) get_zeroed_page(GFP_KERNEL);
+	if (!sie_page)
 		goto out_free_cpu;
 
+	vcpu->arch.sie_block = &sie_page->sie_block;
+	vcpu->arch.sie_block->itdba = (unsigned long) &sie_page->itdb;
+
 	vcpu->arch.sie_block->icpua = id;
 	if (!kvm_is_ucontrol(kvm)) {
 		if (!kvm->arch.sca) {
@@ -1178,8 +1183,8 @@ static int __init kvm_s390_init(void)
return -ENOMEM;
}
memcpy(vfacilities, S390_lowcore.stfle_fac_list, 16);
-	vfacilities[0] &= 0xff82fff3f47cUL;
-	vfacilities[1] &= 0x001cUL;
+	vfacilities[0] &= 0xff82fff3f47c2000UL;
+	vfacilities[1] &= 0x005cUL;
return 0;
 }
 
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 

[PULL 0/3] KVM: s390: patches for kvm-next

2014-01-17 Thread Christian Borntraeger
Paolo,

the following changes since commit 26a865f4aa8e66a6d94958de7656f7f1b03c6c56:

  KVM: VMX: fix use after free of vmx-loaded_vmcs (2014-01-08 19:14:08 -0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-20140117

for you to fetch changes up to 19e4735bd7f02bd38db43a8521377b35f236b3b6:

  KVM: s390: virtio-ccw: Handle command rejects. (2014-01-17 13:12:22 +0100)


This deals with two guest features that need enablement in the kvm host:
- transactional execution
- lpp sampling support

In addition there is also a fix to the virtio-ccw guest driver; this will
enable future features.


Cornelia Huck (1):
  KVM: s390: virtio-ccw: Handle command rejects.

Michael Mueller (1):
  KVM: s390: enable Transactional Execution

Thomas Huth (1):
  KVM: s390: Enable the LPP facility for guests

 arch/s390/include/asm/kvm_host.h |   15 ++-
 arch/s390/kvm/intercept.c|   11 +++
 arch/s390/kvm/kvm-s390.c |   17 +++--
 arch/s390/kvm/kvm-s390.h |6 ++
 drivers/s390/kvm/virtio_ccw.c|   11 +--
 5 files changed, 51 insertions(+), 9 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 3/3] KVM: s390: virtio-ccw: Handle command rejects.

2014-01-17 Thread Christian Borntraeger
From: Cornelia Huck cornelia.h...@de.ibm.com

A command reject for a ccw may happen if we run on a host not supporting
a certain feature. We want to be able to handle this as special case of
command failure, so let's split this off from the generic -EIO error code.

Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 drivers/s390/kvm/virtio_ccw.c |   11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index d629717..0fc5848 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -642,8 +642,15 @@ static void virtio_ccw_int_handler(struct ccw_device *cdev,
 (SCSW_STCTL_ALERT_STATUS | SCSW_STCTL_STATUS_PEND))) {
/* OK */
}
-	if (irb_is_error(irb))
-		vcdev->err = -EIO; /* XXX - use real error */
+	if (irb_is_error(irb)) {
+		/* Command reject? */
+		if ((scsw_dstat(&irb->scsw) & DEV_STAT_UNIT_CHECK) &&
+		    (irb->ecw[0] & SNS0_CMD_REJECT))
+			vcdev->err = -EOPNOTSUPP;
+		else
+			/* Map everything else to -EIO. */
+			vcdev->err = -EIO;
+	}
 	if (vcdev->curr_io & activity) {
switch (activity) {
case VIRTIO_CCW_DOING_READ_FEAT:
-- 
1.7.9.5



[PULL 2/3] KVM: s390: Enable the LPP facility for guests

2014-01-17 Thread Christian Borntraeger
From: Thomas Huth th...@linux.vnet.ibm.com

The Load-Program-Parameter Facility is available for guests without
any further ado, so we should indicate its availability by setting
facility bit 40 if it is supported by the host.

Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com
Reviewed-by: Michael Mueller m...@linux.vnet.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/kvm-s390.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 0084c2c2..597114b 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1183,7 +1183,7 @@ static int __init kvm_s390_init(void)
return -ENOMEM;
}
memcpy(vfacilities, S390_lowcore.stfle_fac_list, 16);
-	vfacilities[0] &= 0xff82fff3f47c2000UL;
+	vfacilities[0] &= 0xff82fff3f4fc2000UL;
 	vfacilities[1] &= 0x005cUL;
return 0;
 }
-- 
1.7.9.5



Re: [PATCHv3] kvm/irqchip: Speed up KVM_SET_GSI_ROUTING

2014-01-17 Thread Paolo Bonzini
On 16/01/2014 21:22, Michael S. Tsirkin wrote:
 [PATCHv3] kvm/irqchip: Speed up KVM_SET_GSI_ROUTING

 When starting lots of dataplane devices the bootup takes very long
 on my s390 system (prototype irqfd code). With larger setups we are even
 able to trigger some timeouts in some components.
 Turns out that the KVM_SET_GSI_ROUTING ioctl takes very
 long (strace claims up to 0.1 sec) when having multiple CPUs.
 This is caused by the synchronize_rcu and the HZ=100 of s390.
 
 Let's use the expedited variant to speed things up, as suggested by
 Michael S. Tsirkin.

 Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
 ---
  virt/kvm/irqchip.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
 index 20dc9e4..dbcfde7 100644
 --- a/virt/kvm/irqchip.c
 +++ b/virt/kvm/irqchip.c
 @@ -226,7 +226,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
  kvm_irq_routing_update(kvm, new);
 	mutex_unlock(&kvm->irq_lock);
  
 -synchronize_rcu();
 +synchronize_rcu_expedited();
  
  new = old;
  r = 0;
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

Well... I love to contradict myself, so: no way this can be accepted
this close to the end of the merge window.  :(

synchronize_rcu_expedited() forces a context switch on all CPUs, even
those that are not running KVM.  Thus, this patch might help a guest DoS
its host by changing the IRQ routing tables in a loop.

So this will have to wait for 3.15.  We have ~2 months to do performance
measurements on the v2 patch.  Sorry.

Thanks,

Paolo


[RFC PATCH 1/3] arm64: KVM: force cache clean on page fault when caches are off

2014-01-17 Thread Marc Zyngier
In order for the guest with caches off to observe data written
contained in a given page, we need to make sure that page is
committed to memory, and not just hanging in the cache (as
guest accesses are completely bypassing the cache until it
decides to enable it).

For this purpose, hook into the coherent_icache_guest_page
function and flush the region if the guest SCTLR_EL1
register doesn't show the MMU  and caches as being enabled.
The function also get renamed to coherent_cache_guest_page.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Reviewed-by: Catalin Marinas catalin.mari...@arm.com
---
 arch/arm/include/asm/kvm_mmu.h   |  4 ++--
 arch/arm/kvm/mmu.c   |  4 ++--
 arch/arm64/include/asm/kvm_mmu.h | 11 +++
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 77de4a4..f997b9e 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -116,8 +116,8 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
 
 struct kvm;
 
-static inline void coherent_icache_guest_page(struct kvm *kvm, hva_t hva,
- unsigned long size)
+static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
+unsigned long size)
 {
/*
 * If we are going to insert an instruction page and the icache is
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 5809069..415fd63 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -713,7 +713,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_set_s2pmd_writable(new_pmd);
kvm_set_pfn_dirty(pfn);
}
-		coherent_icache_guest_page(kvm, hva & PMD_MASK, PMD_SIZE);
+		coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE);
ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, new_pmd);
} else {
pte_t new_pte = pfn_pte(pfn, PAGE_S2);
@@ -721,7 +721,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_set_s2pte_writable(new_pte);
kvm_set_pfn_dirty(pfn);
}
-   coherent_icache_guest_page(kvm, hva, PAGE_SIZE);
+   coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
ret = stage2_set_pte(kvm, memcache, fault_ipa, new_pte, false);
}
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 680f74e..2232dd0 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -106,7 +106,6 @@ static inline bool kvm_is_write_fault(unsigned long esr)
return true;
 }
 
-static inline void kvm_clean_dcache_area(void *addr, size_t size) {}
 static inline void kvm_clean_pgd(pgd_t *pgd) {}
 static inline void kvm_clean_pmd_entry(pmd_t *pmd) {}
 static inline void kvm_clean_pte(pte_t *pte) {}
@@ -124,9 +123,14 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
 
 struct kvm;
 
-static inline void coherent_icache_guest_page(struct kvm *kvm, hva_t hva,
- unsigned long size)
+#define kvm_flush_dcache_to_poc(a,l)   __flush_dcache_area((a), (l))
+
+static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
+unsigned long size)
 {
+	if ((vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) != 0b101)
+   kvm_flush_dcache_to_poc((void *)hva, size);
+
if (!icache_is_aliasing()) {/* PIPT */
flush_icache_range(hva, hva + size);
} else if (!icache_is_aivivt()) {   /* non ASID-tagged VIVT */
@@ -135,7 +139,6 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, hva_t hva,
}
 }
 
-#define kvm_flush_dcache_to_poc(a,l)   __flush_dcache_area((a), (l))
 
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
-- 
1.8.3.4



[RFC PATCH 2/3] arm64: KVM: trap VM system registers until MMU and caches are ON

2014-01-17 Thread Marc Zyngier
In order to be able to detect the point where the guest enables
its MMU and caches, trap all the VM related system registers.

Once we see the guest enabling both the MMU and the caches, we
can go back to a saner mode of operation, which is to leave these
registers in complete control of the guest.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Reviewed-by: Catalin Marinas catalin.mari...@arm.com
---
 arch/arm64/include/asm/kvm_arm.h |  3 ++-
 arch/arm64/kvm/sys_regs.c| 58 
 2 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index c98ef47..fd0a651 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -62,6 +62,7 @@
  * RW: 64bit by default, can be overriden for 32bit VMs
  * TAC:Trap ACTLR
  * TSC:Trap SMC
+ * TVM:Trap VM ops (until M+C set in SCTLR_EL1)
  * TSW:Trap cache operations by set/way
  * TWE:Trap WFE
  * TWI:Trap WFI
@@ -74,7 +75,7 @@
  * SWIO:   Turn set/way invalidates into set/way clean+invalidate
  */
 #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
-HCR_BSU_IS | HCR_FB | HCR_TAC | \
+HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
 HCR_AMO | HCR_IMO | HCR_FMO | \
 HCR_SWIO | HCR_TIDCP | HCR_RW)
 #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 02e9d09..5e92b9e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -121,6 +121,42 @@ done:
 }
 
 /*
+ * Generic accessor for VM registers. Only called as long as HCR_TVM
+ * is set.
+ */
+static bool access_vm_reg(struct kvm_vcpu *vcpu,
+ const struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+	BUG_ON(!p->is_write);
+
+	vcpu_sys_reg(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
+   return true;
+}
+
+/*
+ * SCTLR_EL1 accessor. Only called as long as HCR_TVM is set.  If the
+ * guest enables the MMU, we stop trapping the VM sys_regs and leave
+ * it in complete control of the caches.
+ */
+static bool access_sctlr_el1(struct kvm_vcpu *vcpu,
+const struct sys_reg_params *p,
+const struct sys_reg_desc *r)
+{
+   unsigned long val;
+
+	BUG_ON(!p->is_write);
+
+	val = *vcpu_reg(vcpu, p->Rt);
+	vcpu_sys_reg(vcpu, r->reg) = val;
+
+	if ((val & (0b101)) == 0b101)	/* MMU+Caches enabled? */
+		vcpu->arch.hcr_el2 &= ~HCR_TVM;
+
+   return true;
+}
+
+/*
  * We could trap ID_DFR0 and tell the guest we don't support performance
  * monitoring.  Unfortunately the patch to make the kernel check ID_DFR0 was
  * NAKed, so it will read the PMCR anyway.
@@ -185,32 +221,32 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  NULL, reset_mpidr, MPIDR_EL1 },
/* SCTLR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0001), CRm(0b), Op2(0b000),
- NULL, reset_val, SCTLR_EL1, 0x00C50078 },
+ access_sctlr_el1, reset_val, SCTLR_EL1, 0x00C50078 },
/* CPACR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0001), CRm(0b), Op2(0b010),
  NULL, reset_val, CPACR_EL1, 0 },
/* TTBR0_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0010), CRm(0b), Op2(0b000),
- NULL, reset_unknown, TTBR0_EL1 },
+ access_vm_reg, reset_unknown, TTBR0_EL1 },
/* TTBR1_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0010), CRm(0b), Op2(0b001),
- NULL, reset_unknown, TTBR1_EL1 },
+ access_vm_reg, reset_unknown, TTBR1_EL1 },
/* TCR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0010), CRm(0b), Op2(0b010),
- NULL, reset_val, TCR_EL1, 0 },
+ access_vm_reg, reset_val, TCR_EL1, 0 },
 
/* AFSR0_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0001), Op2(0b000),
- NULL, reset_unknown, AFSR0_EL1 },
+ access_vm_reg, reset_unknown, AFSR0_EL1 },
/* AFSR1_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0001), Op2(0b001),
- NULL, reset_unknown, AFSR1_EL1 },
+ access_vm_reg, reset_unknown, AFSR1_EL1 },
/* ESR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0010), Op2(0b000),
- NULL, reset_unknown, ESR_EL1 },
+ access_vm_reg, reset_unknown, ESR_EL1 },
/* FAR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0110), CRm(0b), Op2(0b000),
- NULL, reset_unknown, FAR_EL1 },
+ access_vm_reg, reset_unknown, FAR_EL1 },
/* PAR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b0111), CRm(0b0100), Op2(0b000),
  NULL, reset_unknown, PAR_EL1 },
@@ -224,17 +260,17 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
   

[RFC PATCH 3/3] arm64: KVM: flush VM pages before letting the guest enable caches

2014-01-17 Thread Marc Zyngier
When the guest runs with caches disabled (like in an early boot
sequence, for example), all the writes are directly going to RAM,
bypassing the caches altogether.

Once the MMU and caches are enabled, whatever sits in the cache
becomes suddenly visible, which isn't what the guest expects.

A way to avoid this potential disaster is to invalidate the cache
when the MMU is being turned on. For this, we hook into the SCTLR_EL1
trapping code, and scan the stage-2 page tables, invalidating the
pages/sections that have already been mapped in.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Reviewed-by: Catalin Marinas catalin.mari...@arm.com
---
 arch/arm/kvm/mmu.c   | 72 
 arch/arm64/include/asm/kvm_mmu.h |  1 +
 arch/arm64/kvm/sys_regs.c|  5 ++-
 3 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 415fd63..704c939 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -187,6 +187,78 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
}
 }
 
+void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
+  unsigned long addr, unsigned long end)
+{
+   pte_t *pte;
+
+   pte = pte_offset_kernel(pmd, addr);
+   do {
+   if (!pte_none(*pte)) {
+			hva_t hva = gfn_to_hva(kvm, addr >> PAGE_SHIFT);
+   kvm_flush_dcache_to_poc((void*)hva, PAGE_SIZE);
+   }
+   } while(pte++, addr += PAGE_SIZE, addr != end);
+}
+
+void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
+  unsigned long addr, unsigned long end)
+{
+   pmd_t *pmd;
+   unsigned long next;
+
+   pmd = pmd_offset(pud, addr);
+   do {
+   next = pmd_addr_end(addr, end);
+   if (!pmd_none(*pmd)) {
+   if (kvm_pmd_huge(*pmd)) {
+				hva_t hva = gfn_to_hva(kvm, addr >> PAGE_SHIFT);
+   kvm_flush_dcache_to_poc((void*)hva, PMD_SIZE);
+   } else {
+   stage2_flush_ptes(kvm, pmd, addr, next);
+   }
+   }
+   } while(pmd++, addr = next, addr != end);
+}
+
+void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
+  unsigned long addr, unsigned long end)
+{
+   pud_t *pud;
+   unsigned long next;
+
+   pud = pud_offset(pgd, addr);
+   do {
+   next = pud_addr_end(addr, end);
+   if (!pud_none(*pud)) {
+   if (pud_huge(*pud)) {
+				hva_t hva = gfn_to_hva(kvm, addr >> PAGE_SHIFT);
+   kvm_flush_dcache_to_poc((void*)hva, PUD_SIZE);
+   } else {
+   stage2_flush_pmds(kvm, pud, addr, next);
+   }
+   }
+   } while(pud++, addr = next, addr != end);
+}
+
+void stage2_flush_vm(struct kvm *kvm)
+{
+   unsigned long long addr = 0;
+   unsigned long end = KVM_PHYS_SIZE;
+   unsigned long next;
+   pgd_t *pgd;
+
+	spin_lock(&kvm->mmu_lock);
+
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		stage2_flush_puds(kvm, pgd, addr, next);
+	} while(pgd++, addr = next, addr != end);
+
+	spin_unlock(&kvm->mmu_lock);
+}
+
 /**
  * free_boot_hyp_pgd - free HYP boot page tables
  *
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 2232dd0..b7b2ca3 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -139,6 +139,7 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
}
 }
 
+void stage2_flush_vm(struct kvm *kvm);
 
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 5e92b9e..32e440f 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -27,6 +27,7 @@
 #include <asm/kvm_host.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_coproc.h>
+#include <asm/kvm_mmu.h>
 #include <asm/cacheflush.h>
 #include <asm/cputype.h>
 #include <trace/events/kvm.h>
@@ -150,8 +151,10 @@ static bool access_sctlr_el1(struct kvm_vcpu *vcpu,
val = *vcpu_reg(vcpu, p-Rt);
vcpu_sys_reg(vcpu, r-reg) = val;
 
-	if ((val & (0b101)) == 0b101)	/* MMU+Caches enabled? */
+	if ((val & (0b101)) == 0b101) {	/* MMU+Caches enabled? */
 		vcpu->arch.hcr_el2 &= ~HCR_TVM;
+		stage2_flush_vm(vcpu->kvm);
+	}
 
return true;
 }
-- 
1.8.3.4



[RFC PATCH 0/3] arm64: KVM: host cache maintenance when guest caches are off

2014-01-17 Thread Marc Zyngier
When we run a guest with cache disabled, we don't flush the cache to
the Point of Coherency, hence possibly missing bits of data that have
been written in the cache, but have not yet reached memory.

We also have the opposite issue: when a guest enables its cache,
whatever sits in the cache is suddenly going to become visible,
shadowing whatever the guest has written into RAM.

There are several approaches to these issues:
- Using the DC bit when caches are off: this breaks guests assuming
  caches off while doing DMA operations. Bootloaders, for example.
  It also breaks the I-D coherency.
- Fetch the memory attributes on translation fault, and flush the
  cache while handling the fault. This relies on using the PAR_EL1
  register to obtain the Stage-1 memory attributes, and tends to be
  slow.
- Detecting the translation faults occurring with MMU off (and
  performing a cache clean), and trapping SCTLR_EL1 to detect the
  moment when the guest is turning its caches on (and performing a
  cache invalidation). Trapping of SCTLR_EL1 is then disabled to
  ensure the best performance.

This patch series implements the last solution, only on arm64 for the
time being (I'll add the necessary ARMv7 bits once we reach an
agreement on arm64).

Marc Zyngier (3):
  arm64: KVM: force cache clean on page fault when caches are off
  arm64: KVM: trap VM system registers until MMU and caches are ON
  arm64: KVM: flush VM pages before letting the guest enable caches

 arch/arm/include/asm/kvm_mmu.h   |  4 +--
 arch/arm/kvm/mmu.c   | 76 ++--
 arch/arm64/include/asm/kvm_arm.h |  3 +-
 arch/arm64/include/asm/kvm_mmu.h | 12 ---
 arch/arm64/kvm/sys_regs.c| 61 ++--
 5 files changed, 136 insertions(+), 20 deletions(-)

-- 
1.8.3.4



Re: [PATCHv3] kvm/irqchip: Speed up KVM_SET_GSI_ROUTING

2014-01-17 Thread Christian Borntraeger
On 17/01/14 15:03, Paolo Bonzini wrote:
 On 16/01/2014 21:22, Michael S. Tsirkin wrote:
 [PATCHv3] kvm/irqchip: Speed up KVM_SET_GSI_ROUTING

 When starting lots of dataplane devices the bootup takes very long
 on my s390 system (prototype irqfd code). With larger setups we are even
 able to trigger some timeouts in some components.
 Turns out that the KVM_SET_GSI_ROUTING ioctl takes very
 long (strace claims up to 0.1 sec) when having multiple CPUs.
 This is caused by the synchronize_rcu and the HZ=100 of s390.
 
 Let's use the expedited variant to speed things up, as suggested by
 Michael S. Tsirkin.

 Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
 ---
  virt/kvm/irqchip.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
 index 20dc9e4..dbcfde7 100644
 --- a/virt/kvm/irqchip.c
 +++ b/virt/kvm/irqchip.c
 @@ -226,7 +226,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
 kvm_irq_routing_update(kvm, new);
 	mutex_unlock(&kvm->irq_lock);
  
 -   synchronize_rcu();
 +   synchronize_rcu_expedited();
  
 new = old;
 r = 0;
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

 
 Well... I love to contradict myself, so: no way this can be accepted
 this close to the end of the merge window.  :(
 
 synchronize_rcu_expedited() forces a context switch on all CPUs, even
 those that are not running KVM.  Thus, this patch might help a guest DoS
 its host by changing the IRQ routing tables in a loop.
 
 So this will have to wait for 3.15.  We have ~2 months to do performance
 measurements on the v2 patch.  Sorry.

Any chance that you or Michael can give some performance feedback on v2? All
my lab systems are s390 and not x86...




Re: [PATCHv3] kvm/irqchip: Speed up KVM_SET_GSI_ROUTING

2014-01-17 Thread Paolo Bonzini
On 17/01/2014 16:03, Christian Borntraeger wrote:
  Well... I love to contradict myself, so: no way this can be accepted
  this close to the end of the merge window.  :(
  
  synchronize_rcu_expedited() forces a context switch on all CPUs, even
  those that are not running KVM.  Thus, this patch might help a guest DoS
  its host by changing the IRQ routing tables in a loop.
  
  So this will have to wait for 3.15.  We have ~2 months to do performance
  measurements on the v2 patch.  Sorry.
 
 Any chance that you or Michael can give some performance feedback on v2? All
 my lab systems are s390 and not x86...

Yes, we will help.

Paolo


[PATCH v2 7/9] vfio: Use pci_enable_msi_range() and pci_enable_msix_range()

2014-01-17 Thread Alexander Gordeev
As a result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() and pci_enable_msix_range()
interfaces.

Signed-off-by: Alexander Gordeev agord...@redhat.com
Acked-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/pci/vfio_pci_intrs.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 641bc87..4a9db1d 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -482,15 +482,19 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
 		for (i = 0; i < nvec; i++)
 			vdev->msix[i].entry = i;
 
-		ret = pci_enable_msix(pdev, vdev->msix, nvec);
-		if (ret) {
+		ret = pci_enable_msix_range(pdev, vdev->msix, 1, nvec);
+		if (ret < nvec) {
+			if (ret > 0)
+				pci_disable_msix(pdev);
 			kfree(vdev->msix);
 			kfree(vdev->ctx);
 			return ret;
 		}
 	} else {
-		ret = pci_enable_msi_block(pdev, nvec);
-		if (ret) {
+		ret = pci_enable_msi_range(pdev, 1, nvec);
+		if (ret < nvec) {
+			if (ret > 0)
+				pci_disable_msi(pdev);
 			kfree(vdev->ctx);
 			return ret;
 		}
-- 
1.7.7.6



[PATCH v2 4/9] ipr: Use pci_enable_msi_range() and pci_enable_msix_range()

2014-01-17 Thread Alexander Gordeev
As a result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() and pci_enable_msix_range()
interfaces.

Signed-off-by: Alexander Gordeev agord...@redhat.com
---
 drivers/scsi/ipr.c |   47 ++-
 1 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index fb57e21..3841298 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -9245,47 +9245,36 @@ ipr_get_chip_info(const struct pci_device_id *dev_id)
 static int ipr_enable_msix(struct ipr_ioa_cfg *ioa_cfg)
 {
struct msix_entry entries[IPR_MAX_MSIX_VECTORS];
-   int i, err, vectors;
+   int i, vectors;
 
 	for (i = 0; i < ARRAY_SIZE(entries); ++i)
 		entries[i].entry = i;
 
-	vectors = ipr_number_of_msix;
+	vectors = pci_enable_msix_range(ioa_cfg->pdev, entries,
+					1, ipr_number_of_msix);
+	if (vectors < 0)
+		return vectors;
 
-	while ((err = pci_enable_msix(ioa_cfg->pdev, entries, vectors)) > 0)
-		vectors = err;
+	for (i = 0; i < vectors; i++)
+		ioa_cfg->vectors_info[i].vec = entries[i].vector;
+	ioa_cfg->nvectors = vectors;
 
-	if (err < 0)
-		return err;
-
-	if (!err) {
-		for (i = 0; i < vectors; i++)
-			ioa_cfg->vectors_info[i].vec = entries[i].vector;
-		ioa_cfg->nvectors = vectors;
-   }
-
-   return err;
+   return 0;
 }
 
 static int ipr_enable_msi(struct ipr_ioa_cfg *ioa_cfg)
 {
-   int i, err, vectors;
+   int i, vectors;
 
-   vectors = ipr_number_of_msix;
+	vectors = pci_enable_msi_range(ioa_cfg->pdev, 1, ipr_number_of_msix);
+	if (vectors < 0)
+		return vectors;
 
-	while ((err = pci_enable_msi_block(ioa_cfg->pdev, vectors)) > 0)
-		vectors = err;
+	for (i = 0; i < vectors; i++)
+		ioa_cfg->vectors_info[i].vec = ioa_cfg->pdev->irq + i;
+	ioa_cfg->nvectors = vectors;
 
-	if (err < 0)
-		return err;
-
-	if (!err) {
-		for (i = 0; i < vectors; i++)
-			ioa_cfg->vectors_info[i].vec = ioa_cfg->pdev->irq + i;
-		ioa_cfg->nvectors = vectors;
-   }
-
-   return err;
+   return 0;
 }
 
 static void name_msi_vectors(struct ipr_ioa_cfg *ioa_cfg)
@@ -9350,7 +9339,7 @@ static irqreturn_t ipr_test_intr(int irq, void *devp)
  * ipr_test_msi - Test for Message Signaled Interrupt (MSI) support.
  * @pdev:  PCI device struct
  *
- * Description: The return value from pci_enable_msi() can not always be
+ * Description: The return value from pci_enable_msi_range() can not always be
  * trusted.  This routine sets up and initiates a test interrupt to determine
  * if the interrupt is received via the ipr_test_intr() service routine.
  * If the tests fails, the driver will fall back to LSI.
-- 
1.7.7.6



[PATCH v2 3/9] ipr: Get rid of superfluous call to pci_disable_msi/msix()

2014-01-17 Thread Alexander Gordeev
There is no need to call pci_disable_msi() or pci_disable_msix()
in case the call to pci_enable_msi() or pci_enable_msix() failed.

Signed-off-by: Alexander Gordeev agord...@redhat.com
---
 drivers/scsi/ipr.c |8 ++--
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 36ac1c3..fb57e21 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -9255,10 +9255,8 @@ static int ipr_enable_msix(struct ipr_ioa_cfg *ioa_cfg)
 	while ((err = pci_enable_msix(ioa_cfg->pdev, entries, vectors)) > 0)
 		vectors = err;
 
-	if (err < 0) {
-		pci_disable_msix(ioa_cfg->pdev);
+	if (err < 0)
 		return err;
-	}
 
 	if (!err) {
 		for (i = 0; i < vectors; i++)
@@ -9278,10 +9276,8 @@ static int ipr_enable_msi(struct ipr_ioa_cfg *ioa_cfg)
 	while ((err = pci_enable_msi_block(ioa_cfg->pdev, vectors)) > 0)
 		vectors = err;
 
-	if (err < 0) {
-		pci_disable_msi(ioa_cfg->pdev);
+	if (err < 0)
 		return err;
-	}
 
 	if (!err) {
 		for (i = 0; i < vectors; i++)
-- 
1.7.7.6



[PATCH v2 0/9] Phase out pci_enable_msi_block()

2014-01-17 Thread Alexander Gordeev
This series is against next branch in Bjorn's repo:
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git

Changes from v1 to v2:
  - added a regression fix "ahci: Fix broken fallback to single
    MSI mode" as patch 1/9;
  - the series is reordered to move the regression fix in front;
  - at Bjorn's request pci_enable_msi() is un-deprecated;
  - as a result, pci_enable_msi_range(pdev, 1, 1) styled calls were
    rolled back to pci_enable_msi(pdev);
  - nvme bug fix moved out as a separate patch 5/9 "nvme: Fix
    invalid call to irq_set_affinity_hint()";
  - patches changelog elaborated a bit;

Bjorn,

As the release is supposedly this weekend, do you prefer
the patches to go to your tree or to individual trees after
the release?

Thanks!

Alexander Gordeev (9):
  ahci: Fix broken fallback to single MSI mode
  ahci: Use pci_enable_msi_range()
  ipr: Get rid of superfluous call to pci_disable_msi/msix()
  ipr: Use pci_enable_msi_range() and pci_enable_msix_range()
  nvme: Fix invalid call to irq_set_affinity_hint()
  nvme: Use pci_enable_msi_range() and pci_enable_msix_range()
  vfio: Use pci_enable_msi_range() and pci_enable_msix_range()
  ath10k: Use pci_enable_msi_range()
  wil6210: Use pci_enable_msi_range()

 drivers/ata/ahci.c  |   18 +-
 drivers/block/nvme-core.c   |   33 -
 drivers/net/wireless/ath/ath10k/pci.c   |   20 +-
 drivers/net/wireless/ath/wil6210/pcie_bus.c |   36 ++-
 drivers/scsi/ipr.c  |   51 +-
 drivers/vfio/pci/vfio_pci_intrs.c   |   12 --
 6 files changed, 72 insertions(+), 98 deletions(-)

-- 
1.7.7.6



KVM and variable-endianness guest CPUs

2014-01-17 Thread Peter Maydell
[This seemed like a good jumping off point for this question.]

On 16 January 2014 17:51, Alexander Graf ag...@suse.de wrote:
 Am 16.01.2014 um 18:41 schrieb Peter Maydell peter.mayd...@linaro.org:
 Also see my remarks on the previous patch series suggesting
 that we should look at this in a more holistic way than
 just randomly fixing small bits of things. A good place
to start would be "what should the semantics of stl_p()
be for a QEMU where the CPU is currently operating with
a reversed endianness to the TARGET_WORDS_BIGENDIAN
setting?".

 That'd open a giant can of worms that I'd rather not open.

Yeah, but you kind of have to open that can, because stl_p()
is used in the code path for KVM MMIO accesses to devices.

Specifically, the KVM API says "here's a uint8_t[] byte
array and a length", and the current QEMU code treats that
as "this is a byte array written as if the guest CPU
(a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its
I/O access to this buffer rather than to the device".

The KVM API docs don't actually specify the endianness
semantics of the byte array, but I think that that really
needs to be nailed down. I can think of a couple of options:
 * always LE
 * always BE
   [these first two are non-starters because they would
   break either x86 or PPC existing code]
 * always the endianness the guest is at the time
 * always some arbitrary endianness based purely on the
   endianness the KVM implementation used historically
 * always the endianness of the host QEMU binary
 * something else?

Any preferences? Current QEMU code basically assumes
always the endianness of TARGET_WORDS_BIGENDIAN,
which is pretty random.

thanks
-- PMM


Re: nested EPT

2014-01-17 Thread duy hai nguyen
Thank you very much.

I can launch L2 from L1 by directly using qemu-system-x86_64
name_of_image. L2 still hangs if I launch it using the 'virsh' command;
libvirt shows this log:

warning : virAuditSend:135 : Failed to send audit message virt=kvm
resrc=net reason=start vm=L2
uuid=e9549443-e63f-31b5-0692-1396736d06b4 old-net=?
new-net=52:54:00:75:c1:5b: Operation not permitted

I am using libvirt 1.1.1. Is it the above warning that causes the problem?

Best,
Hai

On Fri, Jan 17, 2014 at 6:40 AM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2014-01-17 12:29, Kashyap Chamarthy wrote:
 On Fri, Jan 17, 2014 at 2:51 AM, duy hai nguyen hain...@gmail.com wrote:
 Now I can run an L2 guest (nested guest)  using the kvm kernel module
 of kernel 3.12

 However, I am facing a new problem when trying to build and use kvm
 kernel module from git://git.kiszka.org/kvm-kmod.git: L1 (nested
 hypervisor) cannot boot L2  and the graphic console of virt-manager
 hangs displaying 'Booting from Hard Disk...'. L1 still runs fine.

 Loading kvm_intel with 'emulate_invalid_guest_state=0' in L0 does not
 solve the problem. I have also tried with different kernel versions:
 3.12.0, 3.12.8 and 3.13.0 without success.

 Can you give me some suggestions?

 Maybe you can try without graphical managers and enable serial console
 ('console=ttyS0') to your Kernel command-line of L2 guest, so you can
 see where it's stuck.

 Tracing can also be helpful, both in L1 and L0:

 http://www.linux-kvm.org/page/Tracing

 Jan

 --
 Siemens AG, Corporate Technology, CT RTC ITP SES-DE
 Corporate Competence Center Embedded Linux


Re: KVM and variable-endianness guest CPUs

2014-01-17 Thread Peter Maydell
On 17 January 2014 17:53, Peter Maydell peter.mayd...@linaro.org wrote:
 Specifically, the KVM API says "here's a uint8_t[] byte
 array and a length", and the current QEMU code treats that
 as "this is a byte array written as if the guest CPU
 (a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its
 I/O access to this buffer rather than to the device".

 The KVM API docs don't actually specify the endianness
 semantics of the byte array, but I think that that really
 needs to be nailed down. I can think of a couple of options:
  * always LE
  * always BE
[these first two are non-starters because they would
break either x86 or PPC existing code]
  * always the endianness the guest is at the time
  * always some arbitrary endianness based purely on the
endianness the KVM implementation used historically
  * always the endianness of the host QEMU binary
  * something else?

 Any preferences? Current QEMU code basically assumes
 always the endianness of TARGET_WORDS_BIGENDIAN,
 which is pretty random.

Having thought a little more about this, my opinion is:

 * we should specify that the byte order of the mmio.data
   array is host kernel endianness (ie same endianness
   as the QEMU process itself) [this is what it actually
   is, I think, for all the cases that work today]
 * we should fix the code path in QEMU for handling
   mmio.data which currently has the implicit assumption
   that when using KVM TARGET_WORDS_BIGENDIAN is the same
   as the QEMU host process endianness (because it's using
   load/store functions which swap if TARGET_WORDS_BIGENDIAN
   is different from HOST_WORDS_BIGENDIAN)
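To illustrate the convention in the first bullet: if the kernel fills mmio.data with a plain host-order store, userspace recovers the value with a plain host-order load on any host, with no per-architecture swapping on either side. A sketch with hypothetical helper names (this is not QEMU/KVM code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Kernel side under the "host kernel endianness" convention: store the
 * guest's access value into the byte array exactly as a host store
 * would lay it out in memory. */
static void kvm_fill_mmio_host_order(uint8_t *data, uint32_t value, size_t len)
{
    memcpy(data, &value, len);   /* host byte order, like a host store */
}

/* Userspace side: a host-order load round-trips the value regardless
 * of whether the host is little- or big-endian. */
static uint32_t qemu_read_mmio_host_order(const uint8_t *data, size_t len)
{
    uint32_t value = 0;

    memcpy(&value, data, len);
    return value;
}
```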

thanks
-- PMM


[PULL 1/7] vfio: Destroy memory regions

2014-01-17 Thread Alex Williamson
Somehow this has been lurking for a while; we remove our subregions
from the base BAR and VGA region mappings, but we don't destroy them,
creating a leak and more serious problems when we try to migrate after
removing these devices.  Add the trivial bit of final cleanup to
remove these entirely.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/misc/vfio.c |4 
 1 file changed, 4 insertions(+)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 9aecaa8..ec9f41b 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -1968,6 +1968,7 @@ static void vfio_vga_quirk_teardown(VFIODevice *vdev)
     while (!QLIST_EMPTY(&vdev->vga.region[i].quirks)) {
         VFIOQuirk *quirk = QLIST_FIRST(&vdev->vga.region[i].quirks);
         memory_region_del_subregion(&vdev->vga.region[i].mem, &quirk->mem);
+        memory_region_destroy(&quirk->mem);
         QLIST_REMOVE(quirk, next);
         g_free(quirk);
     }
@@ -1990,6 +1991,7 @@ static void vfio_bar_quirk_teardown(VFIODevice *vdev, int nr)
     while (!QLIST_EMPTY(&bar->quirks)) {
         VFIOQuirk *quirk = QLIST_FIRST(&bar->quirks);
         memory_region_del_subregion(&bar->mem, &quirk->mem);
+        memory_region_destroy(&quirk->mem);
         QLIST_REMOVE(quirk, next);
         g_free(quirk);
     }
@@ -2412,10 +2414,12 @@ static void vfio_unmap_bar(VFIODevice *vdev, int nr)
 
     memory_region_del_subregion(&bar->mem, &bar->mmap_mem);
     munmap(bar->mmap, memory_region_size(&bar->mmap_mem));
+    memory_region_destroy(&bar->mmap_mem);
 
     if (vdev->msix && vdev->msix->table_bar == nr) {
         memory_region_del_subregion(&bar->mem, &vdev->msix->mmap_mem);
         munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem));
+        memory_region_destroy(&vdev->msix->mmap_mem);
     }
 
     memory_region_destroy(&bar->mem);



[PULL 2/7] vfio: warn if host device rom can't be read

2014-01-17 Thread Alex Williamson
From: Bandan Das b...@redhat.com

If the device rom can't be read, report an error to the
user. This alerts the user that the device has a bad
state that is causing rom read failure or option rom
loading has been disabled from the device boot menu
(among other reasons).

Signed-off-by: Bandan Das b...@redhat.com
Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/misc/vfio.c |7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index ec9f41b..ef615fc 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -1125,6 +1125,13 @@ static void vfio_pci_load_rom(VFIODevice *vdev)
     vdev->rom_offset = reg_info.offset;
 
     if (!vdev->rom_size) {
+        error_report("vfio-pci: Cannot read device rom at "
+                     "%04x:%02x:%02x.%x\n",
+                     vdev->host.domain, vdev->host.bus, vdev->host.slot,
+                     vdev->host.function);
+        error_printf("Device option ROM contents are probably invalid "
+                     "(check dmesg).\nSkip option ROM probe with rombar=0, "
+                     "or load from file with romfile=\n");
         return;
     }
 



[PULL 0/7] vfio pull request

2014-01-17 Thread Alex Williamson
Hi Anthony,

The following changes since commit 1cf892ca2689c84960b4ce4d2723b6bee453711c:

  SPARC: Fix LEON3 power down instruction (2014-01-15 15:37:33 +1000)

are available in the git repository at:

  git://github.com/awilliam/qemu-vfio.git tags/vfio-pci-for-qemu-20140117.0

for you to fetch changes up to 8d7b5a1da0e06aa7addd7f084d9ec9d433c4bafb:

  vfio: fix mapping of MSIX bar (2014-01-17 11:12:56 -0700)


vfio-pci updates include:
 - Destroy MemoryRegions on device teardown
 - Print warnings around PCI option ROM failures
 - Skip bogus mappings from 64bit BAR sizing
 - Act on DMA mapping failures
 - Fix alignment to avoid MSI-X table mapping


Alex Williamson (3):
  vfio: Destroy memory regions
  vfio: Filter out bogus mappings
  vfio-pci: Fail initfn on DMA mapping errors

Alexey Kardashevskiy (2):
  kvm: initialize qemu_host_page_size
  vfio: fix mapping of MSIX bar

Bandan Das (2):
  vfio: warn if host device rom can't be read
  vfio: Do not reattempt a failed rom read

 hw/misc/vfio.c  | 76 ++---
 include/exec/exec-all.h |  1 +
 kvm-all.c   |  1 +
 translate-all.c | 14 +
 4 files changed, 76 insertions(+), 16 deletions(-)


[PULL 3/7] vfio: Do not reattempt a failed rom read

2014-01-17 Thread Alex Williamson
From: Bandan Das b...@redhat.com

During lazy rom loading, if rom read fails, and the
guest attempts a read again, vfio will again attempt it.
Add a boolean to prevent this. There could be a case where
a failed rom read might succeed the next time because of
a device reset or such, but it's best to exclude unpredictable
behavior.

Signed-off-by: Bandan Das b...@redhat.com
Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/misc/vfio.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index ef615fc..30b1a78 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -191,6 +191,7 @@ typedef struct VFIODevice {
     bool has_flr;
     bool has_pm_reset;
     bool needs_reset;
+    bool rom_read_failed;
 } VFIODevice;
 
 typedef struct VFIOGroup {
@@ -1125,6 +1126,7 @@ static void vfio_pci_load_rom(VFIODevice *vdev)
     vdev->rom_offset = reg_info.offset;
 
     if (!vdev->rom_size) {
+        vdev->rom_read_failed = true;
         error_report("vfio-pci: Cannot read device rom at "
                      "%04x:%02x:%02x.%x\n",
                      vdev->host.domain, vdev->host.bus, vdev->host.slot,
@@ -1163,6 +1165,9 @@ static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
     /* Load the ROM lazily when the guest tries to read it */
     if (unlikely(!vdev->rom)) {
         vfio_pci_load_rom(vdev);
+        if (unlikely(!vdev->rom && !vdev->rom_read_failed)) {
+            vfio_pci_load_rom(vdev);
+        }
     }
 
     memcpy(&val, vdev->rom + addr,
@@ -1230,6 +1235,7 @@ static void vfio_pci_size_rom(VFIODevice *vdev)
                      PCI_BASE_ADDRESS_SPACE_MEMORY, &vdev->pdev.rom);
 
     vdev->pdev.has_rom = true;
+    vdev->rom_read_failed = false;
 }
 
 static void vfio_vga_write(void *opaque, hwaddr addr,



[PULL 4/7] vfio: Filter out bogus mappings

2014-01-17 Thread Alex Williamson
Since 57271d63 we now see spurious mappings with the upper bits set
if 64bit PCI BARs are sized while enabled.  The guest writes a mask
of 0xffffffff to the lower BAR to size it, then restores it, then
writes the same mask to the upper BAR, resulting in a spurious BAR
mapping into the last 4G of the 64bit address space.  Most
architectures do not support or make use of the full 64bit address
space for PCI BARs, so we filter out mappings with the high bit set.
Long term, we probably need to think about vfio telling us the
address width limitations of the IOMMU.
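The sizing sequence and the filter can be sketched numerically (an illustrative model with made-up addresses, not the actual QEMU code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* While the guest sizes the upper dword of an enabled 64-bit BAR, the
 * register momentarily holds 0xffffffff in its high half combined with
 * the restored low half, i.e. an address in the last 4G of the 64-bit
 * address space. */
static uint64_t bar_during_upper_sizing(uint32_t lower_restored)
{
    return ((uint64_t)0xffffffffULL << 32) | lower_restored;
}

/* The filter this patch adds: skip sections whose address has the high
 * bit set, since they are never accessed by the CPU and may exceed the
 * IOMMU's address width. */
static bool vfio_would_skip(uint64_t offset_within_address_space)
{
    return (offset_within_address_space & (1ULL << 63)) != 0;
}
```

The transient address produced during sizing trips the filter; the BAR's real address, once restored, does not.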

Signed-off-by: Alex Williamson alex.william...@redhat.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
---
 hw/misc/vfio.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 30b1a78..d304213 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -2156,7 +2156,14 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
 
 static bool vfio_listener_skipped_section(MemoryRegionSection *section)
 {
-    return !memory_region_is_ram(section->mr);
+    return !memory_region_is_ram(section->mr) ||
+           /*
+            * Sizing an enabled 64-bit BAR can cause spurious mappings to
+            * addresses in the upper part of the 64-bit address space.  These
+            * are never accessed by the CPU and beyond the address width of
+            * some IOMMU hardware.  TODO: VFIO should tell us the IOMMU width.
+            */
+           section->offset_within_address_space & (1ULL << 63);
 }
 
 static void vfio_listener_region_add(MemoryListener *listener,



[PULL 5/7] vfio-pci: Fail initfn on DMA mapping errors

2014-01-17 Thread Alex Williamson
The vfio-pci initfn will currently succeed even if DMA mappings fail.
A typical reason for failure is if the user does not have sufficient
privilege to lock all the memory for the guest.  In this case, the
device gets attached, but can only access a portion of guest memory
and is extremely unlikely to work.

DMA mappings are done via a MemoryListener, which provides no direct
error return path.  We therefore stuff the errno into our container
structure and check for error after registration completes.  We can
also test for mapping errors during runtime, but our only option for
resolution at that point is to kill the guest with a hw_error.
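The pattern — callbacks with no error return stashing the first errno for a post-registration check — can be modeled in miniature (names are illustrative, not QEMU's):

```c
#include <assert.h>
#include <stddef.h>

struct toy_container {
    int error;          /* first mapping errno, 0 if none */
    int initialized;    /* set once the post-registration check passed */
};

/* Listener callback: it cannot return an error, so it records only the
 * first failure seen before initialization completes. */
static void toy_region_add(struct toy_container *c, int map_ret)
{
    if (map_ret < 0 && !c->initialized && !c->error)
        c->error = map_ret;
}

/* "Registration" replays the callback for all existing regions; the
 * caller then checks for a stashed error and can fail gracefully
 * instead of attaching a half-mapped device. */
static int toy_register_listener(struct toy_container *c,
                                 const int *map_results, size_t n)
{
    for (size_t i = 0; i < n; i++)
        toy_region_add(c, map_results[i]);
    if (c->error)
        return c->error;
    c->initialized = 1;
    return 0;
}
```

After `initialized` is set, a real implementation would take the fatal path instead of stashing, which is exactly the hw_error branch in the patch.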

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/misc/vfio.c |   44 ++--
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index d304213..432547c 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -135,12 +135,18 @@ enum {
 
 struct VFIOGroup;
 
+typedef struct VFIOType1 {
+MemoryListener listener;
+int error;
+bool initialized;
+} VFIOType1;
+
 typedef struct VFIOContainer {
 int fd; /* /dev/vfio/vfio, empowered by the attached groups */
 struct {
 /* enable abstraction to support various iommu backends */
 union {
-MemoryListener listener; /* Used by type1 iommu */
+VFIOType1 type1;
 };
 void (*release)(struct VFIOContainer *);
 } iommu_data;
@@ -2170,7 +2176,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.listener);
+                                            iommu_data.type1.listener);
     hwaddr iova, end;
     void *vaddr;
     int ret;
@@ -2212,6 +2218,19 @@ static void vfio_listener_region_add(MemoryListener *listener,
         error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
                      "0x%"HWADDR_PRIx", %p) = %d (%m)",
                      container, iova, end - iova, vaddr, ret);
+
+        /*
+         * On the initfn path, store the first error in the container so we
+         * can gracefully fail.  Runtime, there's not much we can do other
+         * than throw a hardware error.
+         */
+        if (!container->iommu_data.type1.initialized) {
+            if (!container->iommu_data.type1.error) {
+                container->iommu_data.type1.error = ret;
+            }
+        } else {
+            hw_error("vfio: DMA mapping failed, unable to continue\n");
+        }
 }
 }
 
@@ -2219,7 +2238,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.listener);
+                                            iommu_data.type1.listener);
     hwaddr iova, end;
     int ret;
 
@@ -2264,7 +2283,7 @@ static MemoryListener vfio_memory_listener = {
 
 static void vfio_listener_release(VFIOContainer *container)
 {
-    memory_listener_unregister(&container->iommu_data.listener);
+    memory_listener_unregister(&container->iommu_data.type1.listener);
 }
 
 /*
@@ -3236,10 +3255,23 @@ static int vfio_connect_container(VFIOGroup *group)
             return -errno;
         }
 
-        container->iommu_data.listener = vfio_memory_listener;
+        container->iommu_data.type1.listener = vfio_memory_listener;
         container->iommu_data.release = vfio_listener_release;
 
-        memory_listener_register(&container->iommu_data.listener, &address_space_memory);
+        memory_listener_register(&container->iommu_data.type1.listener,
+                                 &address_space_memory);
+
+        if (container->iommu_data.type1.error) {
+            ret = container->iommu_data.type1.error;
+            vfio_listener_release(container);
+            g_free(container);
+            close(fd);
+            error_report("vfio: memory listener initialization failed for container");
+            return ret;
+        }
+
+        container->iommu_data.type1.initialized = true;
+
 } else {
 error_report(vfio: No available IOMMU models);
 g_free(container);



[PULL 6/7] kvm: initialize qemu_host_page_size

2014-01-17 Thread Alex Williamson
From: Alexey Kardashevskiy a...@ozlabs.ru

There is a HOST_PAGE_ALIGN macro which makes sense for KVM accelerator
but it uses qemu_host_page_size/qemu_host_page_mask which initialized
for TCG only.

This moves qemu_host_page_size/qemu_host_page_mask initialization from
TCG's page_init() and adds a call for it from kvm_init().

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Acked-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 include/exec/exec-all.h |1 +
 kvm-all.c   |1 +
 translate-all.c |   14 --
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index ea90b64..3b03cbf 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -81,6 +81,7 @@ void cpu_gen_init(void);
 int cpu_gen_code(CPUArchState *env, struct TranslationBlock *tb,
  int *gen_code_size_ptr);
 bool cpu_restore_state(CPUArchState *env, uintptr_t searched_pc);
+void page_size_init(void);
 
 void QEMU_NORETURN cpu_resume_from_signal(CPUArchState *env1, void *puc);
 void QEMU_NORETURN cpu_io_recompile(CPUArchState *env, uintptr_t retaddr);
diff --git a/kvm-all.c b/kvm-all.c
index 0bfb060..edf2365 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1360,6 +1360,7 @@ int kvm_init(void)
  * page size for the system though.
  */
     assert(TARGET_PAGE_SIZE <= getpagesize());
+page_size_init();
 
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     QTAILQ_INIT(&s->kvm_sw_breakpoints);
diff --git a/translate-all.c b/translate-all.c
index 105c25a..543e1ff 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -289,17 +289,15 @@ static inline void map_exec(void *addr, long size)
 }
 #endif
 
-static void page_init(void)
+void page_size_init(void)
 {
     /* NOTE: we can always suppose that qemu_host_page_size >=
        TARGET_PAGE_SIZE */
 #ifdef _WIN32
-    {
-        SYSTEM_INFO system_info;
+    SYSTEM_INFO system_info;
 
-        GetSystemInfo(&system_info);
-        qemu_real_host_page_size = system_info.dwPageSize;
-    }
+    GetSystemInfo(&system_info);
+    qemu_real_host_page_size = system_info.dwPageSize;
 #else
 qemu_real_host_page_size = getpagesize();
 #endif
@@ -310,7 +308,11 @@ static void page_init(void)
 qemu_host_page_size = TARGET_PAGE_SIZE;
 }
 qemu_host_page_mask = ~(qemu_host_page_size - 1);
+}
 
+static void page_init(void)
+{
+page_size_init();
 #if defined(CONFIG_BSD)  defined(CONFIG_USER_ONLY)
 {
 #ifdef HAVE_KINFO_GETVMMAP



[PULL 7/7] vfio: fix mapping of MSIX bar

2014-01-17 Thread Alex Williamson
From: Alexey Kardashevskiy a...@ozlabs.ru

VFIO virtualizes the MSIX table for the guest but does not map the
part of a BAR which contains the MSIX table. Since vfio_mmap_bar()
mmaps chunks before and after the MSIX table, they have to be aligned
to the host page size, which may be TARGET_PAGE_MASK (4K) or 64K in
the case of PPC64.

This fixes boundaries calculations to use the real host page size.

Without the patch, the chunk before MSIX table may overlap with the MSIX
table and mmap will fail in the host kernel. The result will be serious
slowdown as the whole BAR will be emulated by QEMU.
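The boundary math the patch fixes can be sketched as a standalone model parameterized by page size, rather than QEMU's qemu_host_page_mask / HOST_PAGE_ALIGN globals:

```c
#include <assert.h>
#include <stdint.h>

static uint64_t host_page_align(uint64_t x, uint64_t pagesz)
{
    return (x + pagesz - 1) & ~(pagesz - 1);
}

/* Size of the direct-mapped chunk below the MSIX table. */
static uint64_t chunk_before_table(uint64_t table_offset, uint64_t pagesz)
{
    return table_offset & ~(pagesz - 1);
}

/* Start of the direct-mapped chunk above the MSIX table. */
static uint64_t chunk_after_table(uint64_t table_offset, uint64_t entries,
                                  uint64_t entry_size, uint64_t pagesz)
{
    return host_page_align(table_offset + entries * entry_size, pagesz);
}
```

With a table at offset 0x2000, a 4K host can still direct-map the first 0x2000 bytes, but a 64K host cannot map anything below the table; rounding with the 4K target page size on a 64K host, as the old code effectively did, produces a chunk that overlaps the table and makes the host mmap fail.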

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/misc/vfio.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 432547c..8a1f1a1 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -2544,7 +2544,7 @@ static void vfio_map_bar(VFIODevice *vdev, int nr)
  * potentially insert a direct-mapped subregion before and after it.
  */
 if (vdev-msix  vdev-msix-table_bar == nr) {
-size = vdev-msix-table_offset  TARGET_PAGE_MASK;
+size = vdev-msix-table_offset  qemu_host_page_mask;
 }
 
 strncat(name,  mmap, sizeof(name) - strlen(name) - 1);
@@ -2556,8 +2556,8 @@ static void vfio_map_bar(VFIODevice *vdev, int nr)
 if (vdev-msix  vdev-msix-table_bar == nr) {
 unsigned start;
 
-start = TARGET_PAGE_ALIGN(vdev-msix-table_offset +
-  (vdev-msix-entries * PCI_MSIX_ENTRY_SIZE));
+start = HOST_PAGE_ALIGN(vdev-msix-table_offset +
+(vdev-msix-entries * PCI_MSIX_ENTRY_SIZE));
 
 size = start  bar-size ? bar-size - start : 0;
 strncat(name,  msix-hi, sizeof(name) - strlen(name) - 1);



[PATCH] KVM: SVM: fix NMI window after iret

2014-01-17 Thread Radim Krčmář
We should open NMI window right after an iret, but SVM exits before it.
We wanted to single step using the trap flag and then open it.
(or we could emulate the iret instead)
We don't do it since commit 3842d135ff2 (likely), because the iret exit
handler does not request an event, so NMI window remains closed until
the next exit.

Fix this by making KVM_REQ_EVENT request in the iret handler.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 (btw. kvm-unit-tests weren't executed on SVM since Nov 2010, at least)

 arch/x86/kvm/svm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c7168a5..b5a735b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2829,6 +2829,7 @@ static int iret_interception(struct vcpu_svm *svm)
 	clr_intercept(svm, INTERCEPT_IRET);
 	svm->vcpu.arch.hflags |= HF_IRET_MASK;
 	svm->nmi_iret_rip = kvm_rip_read(&svm->vcpu);
+	kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
return 1;
 }
 
-- 
1.8.5.2



Re: [PATCH] KVM: SVM: fix NMI window after iret

2014-01-17 Thread Greg KH
On Fri, Jan 17, 2014 at 08:52:42PM +0100, Radim Krčmář wrote:
 We should open NMI window right after an iret, but SVM exits before it.
 We wanted to single step using the trap flag and then open it.
 (or we could emulate the iret instead)
 We don't do it since commit 3842d135ff2 (likely), because the iret exit
 handler does not request an event, so NMI window remains closed until
 the next exit.
 
 Fix this by making KVM_REQ_EVENT request in the iret handler.
 
 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
  (btw. kvm-unit-tests weren't executed on SVM since Nov 2010, at least)
 
  arch/x86/kvm/svm.c | 1 +
  1 file changed, 1 insertion(+)


formletter

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

/formletter


Re: [PATCH] KVM: SVM: fix NMI window after iret

2014-01-17 Thread Radim Krčmář
2014-01-17 12:18-0800, Greg KH:
 On Fri, Jan 17, 2014 at 08:52:42PM +0100, Radim Krčmář wrote:
  We should open NMI window right after an iret, but SVM exits before it.
  We wanted to single step using the trap flag and then open it.
  (or we could emulate the iret instead)
  We don't do it since commit 3842d135ff2 (likely), because the iret exit
  handler does not request an event, so NMI window remains closed until
  the next exit.
  
  Fix this by making KVM_REQ_EVENT request in the iret handler.
  
  Signed-off-by: Radim Krčmář rkrc...@redhat.com
  ---
   (btw. kvm-unit-tests weren't executed on SVM since Nov 2010, at least)
  
   arch/x86/kvm/svm.c | 1 +
   1 file changed, 1 insertion(+)
 
 
 formletter
 
 This is not the correct way to submit patches for inclusion in the
 stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
 for how to do this properly.
 
 /formletter

Welp, at the last second, I decided it is not that critical to have it
in stable and forgot to clean the git-send-email command line too.

Please ignore this patch in stable.


Re: [PATCH v2 0/9] Phase out pci_enable_msi_block()

2014-01-17 Thread Bjorn Helgaas
On Fri, Jan 17, 2014 at 9:02 AM, Alexander Gordeev agord...@redhat.com wrote:
 This series is against next branch in Bjorn's repo:
 git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git

 Changes from v1 to v2:
   - added a regression fix ahci: Fix broken fallback to single
 MSI mode as patch 1/9;
   - the series is reordered to move the regression fix in front;
   - at Bjorn's request pci_enable_msi() is un-deprecated;
   - as result, pci_enable_msi_range(pdev, 1, 1) styled calls
 rolled back to pci_enable_msi(pdev);
   - nvme bug fix moved out as a separate patch 5/9 nvme: Fix
 invalid call to irq_set_affinity_hint()
   - patches changelog elaborated a bit;

 Bjorn,

 As the release is supposedly this weekend, do you prefer
 the patches to go to your tree or to individual trees after
 the release?

I'd be happy to merge them, except for the fact that they probably
wouldn't have any time in -next before I ask Linus to pull them.  So
how about if we wait until after the release, ask the area maintainers
to take them, and if they don't take them, I'll put them in my tree
for v3.15?

Bjorn


Re: [RFC PATCH v5] add support for Hyper-V reference time counter

2014-01-17 Thread Vadim Rozenfeld
On Fri, 2014-01-17 at 14:25 +0100, Paolo Bonzini wrote:
 Il 17/01/2014 14:18, Marcelo Tosatti ha scritto:
  On Fri, Jan 17, 2014 at 10:06:00PM +1100, Vadim Rozenfeld wrote:
  On Thu, 2014-01-16 at 20:23 -0200, Marcelo Tosatti wrote:
  On Thu, Jan 16, 2014 at 08:18:37PM +1100, Vadim Rozenfeld wrote:
  Signed-off: Peter Lieven p...@kamp.de
  Signed-off: Gleb Natapov
  Signed-off: Vadim Rozenfeld vroze...@redhat.com
   
  After some consideration I decided to submit only Hyper-V reference
  counters support this time. I will submit iTSC support as a separate
  patch as soon as it is ready. 
 
  v1 - v2
  1. mark TSC page dirty as suggested by 
  Eric Northup digitale...@google.com and Gleb
  2. disable local irq when calling get_kernel_ns, 
  as it was done by Peter Lieven p...@amp.de
  3. move check for TSC page enable from second patch
  to this one.
 
  v3 - v4
  Get rid of ref counter offset.
 
  v4 - v5
  replace __copy_to_user with kvm_write_guest
  when updating the iTSC page.
 
  ---
   arch/x86/include/asm/kvm_host.h|  1 +
   arch/x86/include/uapi/asm/hyperv.h | 13 +
   arch/x86/kvm/x86.c | 28 +++-
   include/uapi/linux/kvm.h   |  1 +
   4 files changed, 42 insertions(+), 1 deletion(-)
 
  diff --git a/arch/x86/include/asm/kvm_host.h 
  b/arch/x86/include/asm/kvm_host.h
  index ae5d783..33fef07 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -605,6 +605,7 @@ struct kvm_arch {
   /* fields used by HYPER-V emulation */
   u64 hv_guest_os_id;
   u64 hv_hypercall;
  +u64 hv_tsc_page;
   
   #ifdef CONFIG_KVM_MMU_AUDIT
   int audit_point;
  diff --git a/arch/x86/include/uapi/asm/hyperv.h 
  b/arch/x86/include/uapi/asm/hyperv.h
  index b8f1c01..462efe7 100644
  --- a/arch/x86/include/uapi/asm/hyperv.h
  +++ b/arch/x86/include/uapi/asm/hyperv.h
  @@ -28,6 +28,9 @@
   /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
    #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1 << 1)
   
  +/* A partition's reference time stamp counter (TSC) page */
  +#define HV_X64_MSR_REFERENCE_TSC0x4021
  +
   /*
* There is a single feature flag that signifies the presence of the MSR
* that can be used to retrieve both the local APIC Timer frequency as
  @@ -198,6 +201,9 @@
    #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \
        (~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
   
  +#define HV_X64_MSR_TSC_REFERENCE_ENABLE 0x0001
  +#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT  12
  +
   #define HV_PROCESSOR_POWER_STATE_C0 0
   #define HV_PROCESSOR_POWER_STATE_C1 1
   #define HV_PROCESSOR_POWER_STATE_C2 2
  @@ -210,4 +216,11 @@
   #define HV_STATUS_INVALID_ALIGNMENT 4
   #define HV_STATUS_INSUFFICIENT_BUFFERS  19
   
  +typedef struct _HV_REFERENCE_TSC_PAGE {
  +__u32 tsc_sequence;
  +__u32 res1;
  +__u64 tsc_scale;
  +__s64 tsc_offset;
  +} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
  +
   #endif
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index 5d004da..8e685b8 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -836,11 +836,12 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
* kvm-specific. Those are put in the beginning of the list.
*/
   
  -#define KVM_SAVE_MSRS_BEGIN 10
  +#define KVM_SAVE_MSRS_BEGIN 12
   static u32 msrs_to_save[] = {
   MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
   MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
   HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
  +HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_REFERENCE_TSC,
   HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, 
  MSR_KVM_STEAL_TIME,
   MSR_KVM_PV_EOI_EN,
   MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, 
  MSR_IA32_SYSENTER_EIP,
  @@ -1826,6 +1827,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
   switch (msr) {
   case HV_X64_MSR_GUEST_OS_ID:
   case HV_X64_MSR_HYPERCALL:
  +case HV_X64_MSR_REFERENCE_TSC:
  +case HV_X64_MSR_TIME_REF_COUNT:
   r = true;
   break;
   }
@@ -1867,6 +1870,20 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
    kvm->arch.hv_hypercall = data;
   break;
   }
+	case HV_X64_MSR_REFERENCE_TSC: {
+		u64 gfn;
+		HV_REFERENCE_TSC_PAGE tsc_ref;
+		memset(&tsc_ref, 0, sizeof(tsc_ref));
+		kvm->arch.hv_tsc_page = data;
 
  Comment 1)
 
  Is there a reason (that is, spec compliance) to maintain the
  value for a HV_X64_MSR_REFERENCE_TSC wrmsr operation in case
  HV_X64_MSR_TSC_REFERENCE_ENABLE is not set?
   
  Windows seems to be retrieving HV_X64_MSR_REFERENCE_TSC 

Re: KVM: MMU: handle invalid root_hpa at __direct_map

2014-01-17 Thread Marcelo Tosatti
On Fri, Jan 17, 2014 at 01:17:21AM +0200, Rom Freiman wrote:
 Hi everybody,
 Marcelo, your suggestion above should work together with the same
 patch applied to __direct_map.
 
 After running some test on different kernels, the !VALID_PAGE happens
 as following:
 
 Linux localhost.localdomain 3.12.5-200.fc19.x86_64 #1 SMP Tue Dec 17
 22:21:14 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux:
 INVALID PAGE kvm/arch/x86/kvm/paging_tmpl.h:FNAME(fetch):573: 
 
 Linux localhost.localdomain
 3.11.8-200.strato0002.fc19.strato.44671928e2e2.x86_64 #1 SMP Sun Jan 5
 18:30:38 IST 2014 x86_64 x86_64 x86_64 GNU/Linux:
 INVALID PAGE arch/x86/kvm/mmu.c:__direct_map:2695: 
 
 So you probably should add both patches to kvm.
 
 Here is my complete patch for 3.11, if you like:
 
 diff --git a/kvm/arch/x86/kvm/mmu.c b/kvm/arch/x86/kvm/mmu.c
 index 9e9285a..c38e480 100644
 --- a/kvm/arch/x86/kvm/mmu.c
 +++ b/kvm/arch/x86/kvm/mmu.c
 @@ -2691,6 +2691,11 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 int emulate = 0;
 gfn_t pseudo_gfn;
 
 +   if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) {
 +   pgprintk(KERN_WARNING "invalid page access %s:%s:%d: %llx\n", __FILE__, __func__, __LINE__, vcpu->arch.mmu.root_hpa);
 +   return 0;
 +   }
 +
 +
 for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 if (iterator.level == level) {
 mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
 @@ -2861,6 +2866,11 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 bool ret = false;
 u64 spte = 0ull;
 
 +   if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) {
 +   pgprintk(KERN_WARNING "invalid page access %s:%s:%d: %llx\n", __FILE__, __func__, __LINE__, vcpu->arch.mmu.root_hpa);
 +   return false;
 +   }
 +
 +
 if (!page_fault_can_be_fast(vcpu, error_code))
 return false;
 
 @@ -3255,6 +3265,11 @@ static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
 struct kvm_shadow_walk_iterator iterator;
 u64 spte = 0ull;
 
 +   if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) {
 +   pgprintk(KERN_WARNING "invalid page access %s:%s:%d: %llx\n", __FILE__, __func__, __LINE__, vcpu->arch.mmu.root_hpa);
 +   return spte;
 +   }
 +
 +
 walk_shadow_page_lockless_begin(vcpu);
 for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
 if (!is_shadow_present_pte(spte))
 @@ -4525,6 +4540,11 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
 u64 spte;
 int nr_sptes = 0;
 
 +   if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) {
 +   pgprintk(KERN_WARNING "invalid page access %s:%s:%d: %llx\n", __FILE__, __func__, __LINE__, vcpu->arch.mmu.root_hpa);
 +   return nr_sptes;
 +   }
 +
 +
 walk_shadow_page_lockless_begin(vcpu);
 for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
 sptes[iterator.level-1] = spte;
 
 diff --git a/kvm/arch/x86/kvm/paging_tmpl.h b/kvm/arch/x86/kvm/paging_tmpl.h
 index 7769699..202a1fc 100644
 --- a/kvm/arch/x86/kvm/paging_tmpl.h
 +++ b/kvm/arch/x86/kvm/paging_tmpl.h
 @@ -423,6 +423,11 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 if (FNAME(gpte_changed)(vcpu, gw, top_level))
 goto out_gpte_changed;
 
 +   if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) {
 +   pgprintk(KERN_WARNING "invalid page access %s:%s:%d: %llx\n", __FILE__, __func__, __LINE__, vcpu->arch.mmu.root_hpa);
 +   goto out_gpte_changed;
 +   }
 +
 +
 for (shadow_walk_init(it, vcpu, addr);
  shadow_walk_okay(it) && it.level < gw->level;
  shadow_walk_next(it)) {
 @@ -674,6 +679,12 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
  */
 mmu_topup_memory_caches(vcpu);
 
 +   if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) {
 +   pgprintk(KERN_WARNING "invalid page access %s:%s:%d: %llx\n", __FILE__, __func__, __LINE__, vcpu->arch.mmu.root_hpa);
 +   WARN_ON(1);
 +   return;
 +   }
 
 
 Another issue I'm struggling with is the next bug:
 https://bugzilla.redhat.com/show_bug.cgi?id=1052861
 
 The only thing I'm trying to do over there is to run virt-v2v in a nested
 environment. Maybe you have an idea?
 It's also related to memory pressure on L2. Of course L0 does not
 crash (using it with the above patch) but L2 is crashing during the
 conversion process.

No idea. I'd start from the double fault.

Have you tried tracing KVM exception injection in both L0 and L1
and correlating the events injected by KVM with the ones you see?

http://www.linux-kvm.org/page/Tracing

http://www.linux-kvm.org/page/Perf_events#Tracing_events


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to 

[PATCH] kvm: make KVM_MMU_AUDIT help text more readable

2014-01-17 Thread Randy Dunlap
From: Randy Dunlap rdun...@infradead.org

Make KVM_MMU_AUDIT kconfig help text readable and collapse
two spaces between words down to one space.

Signed-off-by: Randy Dunlap rdun...@infradead.org
---
 arch/x86/kvm/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- lnx-313-rc8.orig/arch/x86/kvm/Kconfig
+++ lnx-313-rc8/arch/x86/kvm/Kconfig
@@ -65,7 +65,7 @@ config KVM_MMU_AUDIT
 depends on KVM && TRACEPOINTS
---help---
 This option adds a R/W kVM module parameter 'mmu_audit', which allows
-audit  KVM MMU at runtime.
+auditing of KVM MMU events at runtime.
 
 config KVM_DEVICE_ASSIGNMENT
 bool "KVM legacy PCI device assignment support"


Re: KVM and variable-endianness guest CPUs

2014-01-17 Thread Christoffer Dall
On Fri, Jan 17, 2014 at 06:52:57PM +, Peter Maydell wrote:
 On 17 January 2014 17:53, Peter Maydell peter.mayd...@linaro.org wrote:
  Specifically, the KVM API says "here's a uint8_t[] byte
  array and a length", and the current QEMU code treats that
  as "this is a byte array written as if the guest CPU
  (a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its
  I/O access to this buffer rather than to the device".
 
  The KVM API docs don't actually specify the endianness
  semantics of the byte array, but I think that that really
  needs to be nailed down. I can think of a couple of options:
   * always LE
   * always BE
 [these first two are non-starters because they would
 break either x86 or PPC existing code]
   * always the endianness the guest is at the time
   * always some arbitrary endianness based purely on the
 endianness the KVM implementation used historically
   * always the endianness of the host QEMU binary
   * something else?
 
  Any preferences? Current QEMU code basically assumes
  always the endianness of TARGET_WORDS_BIGENDIAN,
  which is pretty random.
 
 Having thought a little more about this, my opinion is:
 
  * we should specify that the byte order of the mmio.data
array is host kernel endianness (ie same endianness
as the QEMU process itself) [this is what it actually
is, I think, for all the cases that work today]

I completely agree. Given that it's too late to settle on always LE or
always BE, I think the natural choice is something that allows a user to
cast the byte array to an appropriate pointer type and dereference it.

And I think we need to amend the KVM API docs to specify this.

-- 
Christoffer


Re: [PATCH v2 0/9] Phase out pci_enable_msi_block()

2014-01-17 Thread Alexander Gordeev
On Fri, Jan 17, 2014 at 02:00:32PM -0700, Bjorn Helgaas wrote:
  As the release is supposedly this weekend, do you prefer
  the patches to go to your tree or to individual trees after
  the release?
 
 I'd be happy to merge them, except for the fact that they probably
 wouldn't have any time in -next before I ask Linus to pull them.  So
 how about if we wait until after the release, ask the area maintainers
 to take them, and if they don't take them, I'll put them in my tree
 for v3.15?

Patch 11 depends on patches 1-10, so I am not sure how best to handle it.
Whatever works for you ;)

I am only concerned with a regression fix, "ahci: Fix broken fallback to
single MSI mode", which would be nice to have in 3.14. But it seems pretty
much too late.

 Bjorn

Thanks!

-- 
Regards,
Alexander Gordeev
agord...@redhat.com


Re: KVM and variable-endianness guest CPUs

2014-01-17 Thread Alexander Graf


 On 18.01.2014 at 05:24, Christoffer Dall christoffer.d...@linaro.org wrote:
 
 On Fri, Jan 17, 2014 at 06:52:57PM +, Peter Maydell wrote:
 On 17 January 2014 17:53, Peter Maydell peter.mayd...@linaro.org wrote:
 Specifically, the KVM API says "here's a uint8_t[] byte
 array and a length", and the current QEMU code treats that
 as "this is a byte array written as if the guest CPU
 (a) were in TARGET_WORDS_BIGENDIAN order and (b) wrote its
 I/O access to this buffer rather than to the device".
 
 The KVM API docs don't actually specify the endianness
 semantics of the byte array, but I think that that really
 needs to be nailed down. I can think of a couple of options:
 * always LE
 * always BE
   [these first two are non-starters because they would
   break either x86 or PPC existing code]
 * always the endianness the guest is at the time
 * always some arbitrary endianness based purely on the
   endianness the KVM implementation used historically
 * always the endianness of the host QEMU binary
 * something else?
 
 Any preferences? Current QEMU code basically assumes
 always the endianness of TARGET_WORDS_BIGENDIAN,
 which is pretty random.
 
 Having thought a little more about this, my opinion is:
 
 * we should specify that the byte order of the mmio.data
   array is host kernel endianness (ie same endianness
   as the QEMU process itself) [this is what it actually
   is, I think, for all the cases that work today]
 
 I completely agree, given that it's too late to be set on always LE/BE,
 I think the natural choice is something that allows a user to cast the
 byte array to an appropriate pointer type and dereference it.
 
 And I think we need to amend the KVM API docs to specify this.

I don't see the problem. For ppc we always do MMIO emulation as if the CPU
were big endian. We've had an is_bigendian variable for that since the very
first versions.


Alex

 
 -- 
 Christoffer