Re: [PATCH kernel 9/9] KVM: PPC: Add support for multiple-TCE hcalls

2015-12-21 Thread Alexey Kardashevskiy

On 12/08/2015 04:48 PM, David Gibson wrote:

On Tue, Sep 15, 2015 at 08:49:39PM +1000, Alexey Kardashevskiy wrote:

This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
devices or emulated PCI.  These calls allow adding multiple entries
(up to 512) into the TCE table in one call which saves time on
transition between kernel and user space.

This implements the KVM_CAP_PPC_MULTITCE capability. When present,
the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
If they can not be handled by the kernel, they are passed on to
the user space. The user space still has to have an implementation
for these.

Both HV and PR-syle KVM are supported.

Signed-off-by: Alexey Kardashevskiy 
---
  Documentation/virtual/kvm/api.txt   |  25 ++
  arch/powerpc/include/asm/kvm_ppc.h  |  12 +++
  arch/powerpc/kvm/book3s_64_vio.c| 111 +++-
  arch/powerpc/kvm/book3s_64_vio_hv.c | 145 ++--
  arch/powerpc/kvm/book3s_hv.c|  26 +-
  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
  arch/powerpc/kvm/book3s_pr_papr.c   |  35 
  arch/powerpc/kvm/powerpc.c  |   3 +
  8 files changed, 350 insertions(+), 13 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index d86d831..593c62a 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3019,6 +3019,31 @@ Returns: 0 on success, -1 on error

  Queues an SMI on the thread's vcpu.

+4.97 KVM_CAP_PPC_MULTITCE
+
+Capability: KVM_CAP_PPC_MULTITCE
+Architectures: ppc
+Type: vm
+
+This capability means the kernel is capable of handling hypercalls
+H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
+space. This significantly accelerates DMA operations for PPC KVM guests.
+User space should expect that its handlers for these hypercalls
+are not going to be called if user space previously registered LIOBN
+in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
+
+In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+The hypercalls mentioned above may or may not be processed successfully
+in the kernel based fast path. If they can not be handled by the kernel,
+they will get passed on to user space. So user space still has to have
+an implementation for these despite the in kernel acceleration.
+
+This capability is always enabled.
+
  5. The kvm_run structure
  

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index fcde896..e5b968e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);

  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
+extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
+   struct kvm_vcpu *vcpu, unsigned long liobn);
  extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
unsigned long ioba, unsigned long npages);
  extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
unsigned long tce);
+extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+   unsigned long *ua, unsigned long **prmap);
+extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
+   unsigned long idx, unsigned long tce);
  extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 unsigned long ioba, unsigned long tce);
+extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+   unsigned long liobn, unsigned long ioba,
+   unsigned long tce_list, unsigned long npages);
+extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+   unsigned long liobn, unsigned long ioba,
+   unsigned long tce_value, unsigned long npages);
  extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 unsigned long ioba);
  extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index e347856..d3fc732 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -14,6 +14,7 @@
   *
   * Copyright 2010 Paul Mackerras, IBM Corp. 
   * Copyright 2011 David Gibson, IBM Corporation 
+ * Copyright 2013 Alexey Kardashevskiy, IBM Corporation 
   */

  #include 
@@ -37,8 +38,7 @@
  #include 
  #include 
  #include 
-
-#define TCES_PER_PAGE  (PAGE_SIZE / sizeof(u64))
+#include 

  static long kvmppc_stt_npages(unsign

Re: [PATCH kernel 7/9] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers

2015-12-21 Thread Alexey Kardashevskiy

On 12/08/2015 04:27 PM, David Gibson wrote:

On Tue, Sep 15, 2015 at 08:49:37PM +1000, Alexey Kardashevskiy wrote:

Upcoming multi-tce support (H_PUT_TCE_INDIRECT/H_STUFF_TCE hypercalls)
will validate TCE (not to have unexpected bits) and IO address
(to be within the DMA window boundaries).

This introduces helpers to validate TCE and IO address.

Signed-off-by: Alexey Kardashevskiy 
---
  arch/powerpc/include/asm/kvm_ppc.h  |  4 ++
  arch/powerpc/kvm/book3s_64_vio_hv.c | 89 -
  2 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index c6ef05b..fcde896 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,6 +166,10 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);

  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
+extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+   unsigned long ioba, unsigned long npages);
+extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
+   unsigned long tce);
  extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 unsigned long ioba, unsigned long tce);
  extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 6cf1ab3..f0fd84c 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -36,6 +36,7 @@
  #include 
  #include 
  #include 
+#include 

  #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64))

@@ -64,7 +65,7 @@ static struct kvmppc_spapr_tce_table 
*kvmppc_find_table(struct kvm_vcpu *vcpu,
   * WARNING: This will be called in real-mode on HV KVM and virtual
   *  mode on PR KVM
   */
-static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
unsigned long ioba, unsigned long npages)
  {
unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
@@ -76,6 +77,79 @@ static long kvmppc_ioba_validate(struct 
kvmppc_spapr_tce_table *stt,

return H_SUCCESS;
  }
+EXPORT_SYMBOL_GPL(kvmppc_ioba_validate);


Why does it need to be exported - the new users will still be in the
KVM module, won't they?



book3s_64_vio_hv.c contains realmode code and is always compiled into 
vmlinux and the helper is meant to be called from book3s_64_vio.c which may 
compile as a module.






+
+/*
+ * Validates TCE address.
+ * At the moment flags and page mask are validated.
+ * As the host kernel does not access those addresses (just puts them
+ * to the table and user space is supposed to process them), we can skip
+ * checking other things (such as TCE is a guest RAM address or the page
+ * was actually allocated).


Hmm.  These comments apply given that the only current user of this is
the kvm acceleration of userspace TCE tables, but the name suggests it
would validate any TCE, including in kernel ones for which this would
be unsafe.



The function has the "kvmppc_" prefix and the file besides in 
arch/powerpc/kvm, so to my taste it is self-explanatory that it only 
handles TCEs from KVM guests (not even from a random user-space tool), no?





+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *  mode on PR KVM
+ */
+long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
+{
+   unsigned long mask = ((1ULL << IOMMU_PAGE_SHIFT_4K) - 1) &
+   ~(TCE_PCI_WRITE | TCE_PCI_READ);
+
+   if (tce & mask)
+   return H_PARAMETER;
+
+   return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
+
+/* Note on the use of page_address() in real mode,
+ *
+ * It is safe to use page_address() in real mode on ppc64 because
+ * page_address() is always defined as lowmem_page_address()
+ * which returns __va(PFN_PHYS(page_to_pfn(page))) which is arithmetial
+ * operation and does not access page struct.
+ *
+ * Theoretically page_address() could be defined different
+ * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL
+ * should be enabled.
+ * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
+ * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
+ * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
+ * is not expected to be enabled on ppc32, page_address()
+ * is safe for ppc32 as well.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *  mode on PR KVM
+ */
+static u64 *kvmppc_page_address(struct page *page)
+{
+#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
+#error TODO: fix to avoid page_address() here
+#endif
+   return (u64 *) page_address(page);
+}
+
+/*
+ * Handles TCE requests for emulated devices.
+ * Puts guest TCE values to the 

RE: [RFC PATCH 2/5] KVM: add KVM_EXIT_MSR exit reason and capability.

2015-12-21 Thread Pavel Fedin
 Hello!

> commit a684d520ed62cf0db4495e5197d5bf722e4f8109
> Author: Peter Hornyack 
> Date:   Fri Dec 18 14:44:04 2015 -0800
> 
> KVM: add capabilities and exit reasons for MSRs.
> 
> Define KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE, and
> KVM_EXIT_MSR_AFTER_WRITE, new exit reasons for accesses to MSRs that kvm
> does not handle or that user space needs to be notified about. Define the
> KVM_CAP_MSR_EXITS, KVM_CAP_ENABLE_MSR_EXITS, and KVM_CAP_DISABLE_MSR_EXITS
> capabilities to control these new exits for a VM.
> 
> diff --git a/Documentation/virtual/kvm/api.txt
> b/Documentation/virtual/kvm/api.txt
> index 053f613fc9a9..3bba3248df3d 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3359,6 +3359,43 @@ Hyper-V SynIC state change. Notification is
> used to remap SynIC
>  event/message pages and to enable/disable SynIC messages/events processing
>  in userspace.
> 
> + /*
> + * KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE,
> + * KVM_EXIT_MSR_AFTER_WRITE
> + */
> + struct {
> + __u32 index;/* i.e. ecx; out */
> + __u64 data; /* out (wrmsr) / in (rdmsr) */
> +#define KVM_EXIT_MSR_COMPLETION_FAILED 1
> + __u64 type; /* out */
> +#define KVM_EXIT_MSR_UNHANDLED 0
> +#define KVM_EXIT_MSR_HANDLED   1
> + __u8 handled;   /* in */
> + } msr;
> +
> +If exit_reason is KVM_EXIT_MSR_READ or KVM_EXIT_MSR_WRITE, then the vcpu has
> +executed a rdmsr or wrmsr instruction which could not be satisfied by kvm. 
> The
> +msr struct is used for both output to and input from user space. index is the
> +target MSR number held in ecx; user space must not modify this field. data

 In 'index', you meant?
 I would enlarge it to __u64 and use generalized encoding, the same as for 
KVM_SET_ONE_REG ioctl. I already wrote about it.
 And i would use simply "REG" instead of "MSR" denotion. Because on different 
architectures they can have different names (e. g. on ARM32 they are called 
"coprocessor registers" and on ARM64 these are "system registers"), however the 
common thing between them is that it is some special CPU register, access to 
which can be trapped and emulated. Therefore KVM_EXIT_REG_xxx.

> +holds the payload from a wrmsr or must be filled in with a payload on a 
> rdmsr.
> +For a normal exit, type will be 0.
> +
> +On the return path into kvm, user space should set handled to
> +KVM_EXIT_MSR_HANDLED if it successfully handled the MSR access. Otherwise,
> +handled should be set to KVM_EXIT_MSR_UNHANDLED, which will cause a general
> +protection fault to be injected into the vcpu. If an error occurs during the
> +return into kvm, the vcpu will not be run and another exit will be generated
> +with type set to KVM_EXIT_MSR_COMPLETION_FAILED.
> +
> +If exit_reason is KVM_EXIT_MSR_AFTER_WRITE, then the vcpu has executed a 
> wrmsr
> +instruction which is handled by kvm but which user space may need to be
> +notified about. index and data are set as described above; the value of type
> +depends on the MSR that was written. handled is ignored on reentry into kvm.

1. Is there any real need to distinguish between KVM_EXIT_MSR_WRITE and 
KVM_EXIT_MSR_AFTER_WRITE ? IMHO from userland's point of view these are the 
same.
2. Why do WRITE and READ have to be different exit codes? We could use 
something like "u8 is_write" in our structure, this would be more in line with 
PIO and MMIO handling.

> +
> +KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE, and KVM_EXIT_MSR_AFTER_WRITE can only
> +occur when KVM_CAP_MSR_EXITS is present and KVM_CAP_ENABLE_MSR_EXITS has been
> +set. A detailed description of these capabilities is below.
> +
>   /* Fix the size of the union. */
>   char padding[256];
>   };
> @@ -3697,6 +3734,26 @@ a KVM_EXIT_IOAPIC_EOI vmexit will be reported
> to userspace.
>  Fails if VCPU has already been created, or if the irqchip is already in the
>  kernel (i.e. KVM_CREATE_IRQCHIP has already been called).
> 
> +7.6 KVM_CAP_ENABLE_MSR_EXITS, KVM_CAP_DISABLE_MSR_EXITS
> +
> +Architectures: x86 (vmx-only)
> +Parameters: none
> +Returns: 0 on success, -1 on error
> +
> +These capabilities enable and disable exits to user space for certain guest 
> MSR
> +accesses. These capabilities are only available if KVM_CHECK_EXTENSION
> +indicates that KVM_CAP_MSR_EXITS is present.
> +
> +When enabled, kvm will exit to user space when the guest reads
> +an MSR that kvm does not handle (KVM_EXIT_MSR_READ), writes an MSR that kvm
> +does not handle (KVM_EXIT_MSR_WRITE), or writes an MSR that kvm handles but 
> for
> +which user space should be notified (KVM_EXIT_MSR_AFTER_WRITE).
> +
> +These exits are currently only implemented for vmx. Also, note that if the 
> kvm
> +module's ignore_msrs flag is set then KVM_EXIT_MSR_READ and 
> KVM_EXIT_MSR_WRITE
> +will not be generated, and unhandled MSR accesses will simply be ignored and
> +the guest re-entered immediately.
> +
> 
>  8. Other capabilities.
>  --
> @@ -3726,3 +3783,11 @@ In order to use SynIC, it ha

RE: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-21 Thread Wu, Feng
Hi Radim/Paolo,

> -Original Message-
> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> Sent: Tuesday, December 22, 2015 3:14 PM
> To: Wu, Feng ; pbonz...@redhat.com;
> rkrc...@redhat.com
> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org; Jiang Liu
> (jiang@linux.intel.com) 
> Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-
> priority interrupts
> 
> On 2015/12/22 14:59, Wu, Feng wrote:
> >
> >
> >> -Original Message-
> >> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> >> Sent: Tuesday, December 22, 2015 2:49 PM
> >> To: Wu, Feng ; pbonz...@redhat.com;
> >> rkrc...@redhat.com
> >> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org; Jiang Liu
> >> (jiang@linux.intel.com) 
> >> Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-
> >> priority interrupts
> >>
> >>
> >> On 2015/12/16 9:37, Feng Wu wrote:
> >>> Use vector-hashing to deliver lowest-priority interrupts, As an
> >>> example, modern Intel CPUs in server platform use this method to
> >>> handle lowest-priority interrupts.
> >>>
> >>> Signed-off-by: Feng Wu 
> >>> ---
> >>>  arch/x86/kvm/irq_comm.c | 27 ++-
> >>>  arch/x86/kvm/lapic.c| 57
> >> -
> >>>  arch/x86/kvm/lapic.h|  2 ++
> >>>  arch/x86/kvm/x86.c  |  9 
> >>>  arch/x86/kvm/x86.h  |  1 +
> >>>  5 files changed, 81 insertions(+), 15 deletions(-)
> >>>
> >>>  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct
> kvm_lapic
>  *src,
> >>>   struct kvm_lapic_irq *irq, int *r, unsigned long
> >> *dest_map)
> >>>  {
> >>> @@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct
> kvm
> >> *kvm, struct kvm_lapic *src,
> >>>   dst = map->logical_map[cid];
> >>>
> >>>   if (kvm_lowest_prio_delivery(irq)) {
> >>> - int l = -1;
> >>> - for_each_set_bit(i, &bitmap, 16) {
> >>> - if (!dst[i])
> >>> - continue;
> >>> - if (l < 0)
> >>> - l = i;
> >>> - else if (kvm_apic_compare_prio(dst[i]-
> >vcpu,
> >> dst[l]->vcpu) < 0)
> >>> - l = i;
> >>> + if (!kvm_vector_hashing_enabled()) {
> >>> + int l = -1;
> >>> + for_each_set_bit(i, &bitmap, 16) {
> >>> + if (!dst[i])
> >>> + continue;
> >>> + if (l < 0)
> >>> + l = i;
> >>> + else if
> (kvm_apic_compare_prio(dst[i]-
> >>> vcpu, dst[l]->vcpu) < 0)
> >>> + l = i;
> >>> + }
> >>> + bitmap = (l >= 0) ? 1 << l : 0;
> >>> + } else {
> >>> + int idx = 0;
> >>> + unsigned int dest_vcpus = 0;
> >>> +
> >>> + for_each_set_bit(i, &bitmap, 16) {
> >>> + if (!dst[i]
> >> && !kvm_lapic_enabled(dst[i]->vcpu)) {
> >>
> >> It should be or(||) not and (&&).
> >
> > Oh, you are right! My negligence! Thanks for pointing this out, Yang!
> 
>  btw, i think the kvm_lapic_enabled check is wrong here? Why need it here?
> >>>
> >>> If the lapic is not enabled, I think we cannot recognize it as a 
> >>> candidate, can
> >> we?
> >>> Maybe Radim can confirm this, Radim, what is your option?
> >>
> >> Lapic can be disable by hw or sw. Here we only need to check the hw is
> >> enough which is already covered while injecting the interrupt into
> >> guest. I remember we(Glab, Macelo and me) have discussed it several ago,
> >> but i cannot find the mail thread.
> >
> > But if the lapic is disabled by software, we cannot still inject interrupts 
> > to
> > it, can we?
> 
> Yes, We cannot inject the normal interrupt. But this already covered by
> current logic and add a check here seems meaningless. Conversely, it may
> do bad thing..
> 

Let's wait for Radim/Paolo's opinions about this.

Thanks,
Feng
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-21 Thread Yang Zhang

On 2015/12/22 14:59, Wu, Feng wrote:




-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Tuesday, December 22, 2015 2:49 PM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org; Jiang Liu
(jiang@linux.intel.com) 
Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-
priority interrupts



On 2015/12/16 9:37, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts, As an
example, modern Intel CPUs in server platform use this method to
handle lowest-priority interrupts.

Signed-off-by: Feng Wu 
---
 arch/x86/kvm/irq_comm.c | 27 ++-
 arch/x86/kvm/lapic.c| 57

-

 arch/x86/kvm/lapic.h|  2 ++
 arch/x86/kvm/x86.c  |  9 
 arch/x86/kvm/x86.h  |  1 +
 5 files changed, 81 insertions(+), 15 deletions(-)

 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic

*src,

struct kvm_lapic_irq *irq, int *r, unsigned long

*dest_map)

 {
@@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm

*kvm, struct kvm_lapic *src,

dst = map->logical_map[cid];

if (kvm_lowest_prio_delivery(irq)) {
-   int l = -1;
-   for_each_set_bit(i, &bitmap, 16) {
-   if (!dst[i])
-   continue;
-   if (l < 0)
-   l = i;
-   else if (kvm_apic_compare_prio(dst[i]->vcpu,

dst[l]->vcpu) < 0)

-   l = i;
+   if (!kvm_vector_hashing_enabled()) {
+   int l = -1;
+   for_each_set_bit(i, &bitmap, 16) {
+   if (!dst[i])
+   continue;
+   if (l < 0)
+   l = i;
+   else if (kvm_apic_compare_prio(dst[i]-
vcpu, dst[l]->vcpu) < 0)
+   l = i;
+   }
+   bitmap = (l >= 0) ? 1 << l : 0;
+   } else {
+   int idx = 0;
+   unsigned int dest_vcpus = 0;
+
+   for_each_set_bit(i, &bitmap, 16) {
+   if (!dst[i]

&& !kvm_lapic_enabled(dst[i]->vcpu)) {

It should be or(||) not and (&&).


Oh, you are right! My negligence! Thanks for pointing this out, Yang!


btw, i think the kvm_lapic_enabled check is wrong here? Why need it here?


If the lapic is not enabled, I think we cannot recognize it as a candidate, can

we?

Maybe Radim can confirm this, Radim, what is your option?


Lapic can be disable by hw or sw. Here we only need to check the hw is
enough which is already covered while injecting the interrupt into
guest. I remember we(Glab, Macelo and me) have discussed it several ago,
but i cannot find the mail thread.


But if the lapic is disabled by software, we cannot still inject interrupts to
it, can we?


Yes, We cannot inject the normal interrupt. But this already covered by 
current logic and add a check here seems meaningless. Conversely, it may 
do bad thing..


--
best regards
yang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-21 Thread Wu, Feng


> -Original Message-
> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> Sent: Tuesday, December 22, 2015 2:49 PM
> To: Wu, Feng ; pbonz...@redhat.com;
> rkrc...@redhat.com
> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org; Jiang Liu
> (jiang@linux.intel.com) 
> Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-
> priority interrupts
> 
> 
>  On 2015/12/16 9:37, Feng Wu wrote:
> > Use vector-hashing to deliver lowest-priority interrupts, As an
> > example, modern Intel CPUs in server platform use this method to
> > handle lowest-priority interrupts.
> >
> > Signed-off-by: Feng Wu 
> > ---
> > arch/x86/kvm/irq_comm.c | 27 ++-
> > arch/x86/kvm/lapic.c| 57
>  -
> > arch/x86/kvm/lapic.h|  2 ++
> > arch/x86/kvm/x86.c  |  9 
> > arch/x86/kvm/x86.h  |  1 +
> > 5 files changed, 81 insertions(+), 15 deletions(-)
> >
> > bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic
> >> *src,
> > struct kvm_lapic_irq *irq, int *r, unsigned long
> *dest_map)
> > {
> > @@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm
>  *kvm, struct kvm_lapic *src,
> > dst = map->logical_map[cid];
> >
> > if (kvm_lowest_prio_delivery(irq)) {
> > -   int l = -1;
> > -   for_each_set_bit(i, &bitmap, 16) {
> > -   if (!dst[i])
> > -   continue;
> > -   if (l < 0)
> > -   l = i;
> > -   else if 
> > (kvm_apic_compare_prio(dst[i]->vcpu,
>  dst[l]->vcpu) < 0)
> > -   l = i;
> > +   if (!kvm_vector_hashing_enabled()) {
> > +   int l = -1;
> > +   for_each_set_bit(i, &bitmap, 16) {
> > +   if (!dst[i])
> > +   continue;
> > +   if (l < 0)
> > +   l = i;
> > +   else if 
> > (kvm_apic_compare_prio(dst[i]-
> > vcpu, dst[l]->vcpu) < 0)
> > +   l = i;
> > +   }
> > +   bitmap = (l >= 0) ? 1 << l : 0;
> > +   } else {
> > +   int idx = 0;
> > +   unsigned int dest_vcpus = 0;
> > +
> > +   for_each_set_bit(i, &bitmap, 16) {
> > +   if (!dst[i]
>  && !kvm_lapic_enabled(dst[i]->vcpu)) {
> 
>  It should be or(||) not and (&&).
> >>>
> >>> Oh, you are right! My negligence! Thanks for pointing this out, Yang!
> >>
> >> btw, i think the kvm_lapic_enabled check is wrong here? Why need it here?
> >
> > If the lapic is not enabled, I think we cannot recognize it as a candidate, 
> > can
> we?
> > Maybe Radim can confirm this, Radim, what is your option?
> 
> Lapic can be disable by hw or sw. Here we only need to check the hw is
> enough which is already covered while injecting the interrupt into
> guest. I remember we(Glab, Macelo and me) have discussed it several ago,
> but i cannot find the mail thread.

But if the lapic is disabled by software, we cannot still inject interrupts to
it, can we?

Thanks,
Feng
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-21 Thread Yang Zhang

On 2015/12/22 12:37, Wu, Feng wrote:




-Original Message-
From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
ow...@vger.kernel.org] On Behalf Of Yang Zhang
Sent: Monday, December 21, 2015 10:06 AM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-
priority interrupts

On 2015/12/21 9:50, Wu, Feng wrote:




-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Monday, December 21, 2015 9:46 AM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-
priority interrupts

On 2015/12/16 9:37, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts, As an
example, modern Intel CPUs in server platform use this method to
handle lowest-priority interrupts.

Signed-off-by: Feng Wu 
---
arch/x86/kvm/irq_comm.c | 27 ++-
arch/x86/kvm/lapic.c| 57

-

arch/x86/kvm/lapic.h|  2 ++
arch/x86/kvm/x86.c  |  9 
arch/x86/kvm/x86.h  |  1 +
5 files changed, 81 insertions(+), 15 deletions(-)

bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic

*src,

struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
{
@@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm

*kvm, struct kvm_lapic *src,

dst = map->logical_map[cid];

if (kvm_lowest_prio_delivery(irq)) {
-   int l = -1;
-   for_each_set_bit(i, &bitmap, 16) {
-   if (!dst[i])
-   continue;
-   if (l < 0)
-   l = i;
-   else if (kvm_apic_compare_prio(dst[i]->vcpu,

dst[l]->vcpu) < 0)

-   l = i;
+   if (!kvm_vector_hashing_enabled()) {
+   int l = -1;
+   for_each_set_bit(i, &bitmap, 16) {
+   if (!dst[i])
+   continue;
+   if (l < 0)
+   l = i;
+   else if (kvm_apic_compare_prio(dst[i]-
vcpu, dst[l]->vcpu) < 0)
+   l = i;
+   }
+   bitmap = (l >= 0) ? 1 << l : 0;
+   } else {
+   int idx = 0;
+   unsigned int dest_vcpus = 0;
+
+   for_each_set_bit(i, &bitmap, 16) {
+   if (!dst[i]

&& !kvm_lapic_enabled(dst[i]->vcpu)) {

It should be or(||) not and (&&).


Oh, you are right! My negligence! Thanks for pointing this out, Yang!


btw, i think the kvm_lapic_enabled check is wrong here? Why need it here?


If the lapic is not enabled, I think we cannot recognize it as a candidate, can 
we?
Maybe Radim can confirm this, Radim, what is your option?


Lapic can be disable by hw or sw. Here we only need to check the hw is 
enough which is already covered while injecting the interrupt into 
guest. I remember we(Glab, Macelo and me) have discussed it several ago, 
but i cannot find the mail thread.




Thanks,
Feng




--
best regards
yang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-12-21 Thread Yang Zhang

On 2015/12/22 12:36, Wu, Feng wrote:




-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Monday, December 21, 2015 10:01 AM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
posted-interrupts

On 2015/12/21 9:55, Wu, Feng wrote:




-Original Message-
From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
ow...@vger.kernel.org] On Behalf Of Yang Zhang
Sent: Monday, December 21, 2015 9:50 AM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
posted-interrupts

On 2015/12/16 9:37, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts for
VT-d posted-interrupts.

Signed-off-by: Feng Wu 
---
arch/x86/kvm/lapic.c | 67



arch/x86/kvm/lapic.h |  2 ++
arch/x86/kvm/vmx.c   | 12 --
3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e29001f..d4f2c8f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -854,6 +854,73 @@ out:
}

/*
+ * This routine handles lowest-priority interrupts using vector-hashing
+ * mechanism. As an example, modern Intel CPUs use this method to

handle

+ * lowest-priority interrupts.
+ *
+ * Here is the details about the vector-hashing mechanism:
+ * 1. For lowest-priority interrupts, store all the possible destination
+ *vCPUs in an array.
+ * 2. Use "guest vector % max number of destination vCPUs" to find the

right

+ *destination vCPU in the array for the lowest-priority interrupt.
+ */
+struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
+ struct kvm_lapic_irq *irq)
+{
+   struct kvm_apic_map *map;
+   struct kvm_vcpu *vcpu = NULL;
+
+   if (irq->shorthand)
+   return NULL;
+
+   rcu_read_lock();
+   map = rcu_dereference(kvm->arch.apic_map);
+
+   if (!map)
+   goto out;
+
+   if ((irq->dest_mode != APIC_DEST_PHYSICAL) &&
+   kvm_lowest_prio_delivery(irq)) {
+   u16 cid;
+   int i, idx = 0;
+   unsigned long bitmap = 1;
+   unsigned int dest_vcpus = 0;
+   struct kvm_lapic **dst = NULL;
+
+
+   if (!kvm_apic_logical_map_valid(map))
+   goto out;
+
+   apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
+
+   if (cid >= ARRAY_SIZE(map->logical_map))
+   goto out;
+
+   dst = map->logical_map[cid];
+
+   for_each_set_bit(i, &bitmap, 16) {
+   if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {
+   clear_bit(i, &bitmap);
+   continue;
+   }
+   }
+
+   dest_vcpus = hweight16(bitmap);
+
+   if (dest_vcpus != 0) {
+   idx = kvm_vector_2_index(irq->vector, dest_vcpus,
+&bitmap, 16);
+   vcpu = dst[idx-1]->vcpu;
+   }
+   }
+
+out:
+   rcu_read_unlock();
+   return vcpu;
+}
+EXPORT_SYMBOL_GPL(kvm_intr_vector_hashing_dest);
+
+/*
 * Add a pending IRQ into lapic.
 * Return 1 if successfully added and 0 if discarded.
 */
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6890ef0..52bffce 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -172,4 +172,6 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm,

struct kvm_lapic_irq *irq,

struct kvm_vcpu **dest_vcpu);
int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
   const unsigned long *bitmap, u32 bitmap_size);
+struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
+ struct kvm_lapic_irq *irq);
#endif
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5eb56ed..3f89189 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm

*kvm,

unsigned int host_irq,

 */

kvm_set_msi_irq(e, &irq);
-   if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
-   continue;
+
+   if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
+   if (!kvm_vector_hashing_enabled() ||
+   irq.delivery_mode !=

APIC_DM_LOWEST)

+   continue;
+
+   vcpu = kvm_intr_vector_hashing_dest(kvm, &irq);
+   if (!vcpu)
+   continue;
+   

RE: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-21 Thread Wu, Feng


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Yang Zhang
> Sent: Monday, December 21, 2015 10:06 AM
> To: Wu, Feng ; pbonz...@redhat.com;
> rkrc...@redhat.com
> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-
> priority interrupts
> 
> On 2015/12/21 9:50, Wu, Feng wrote:
> >
> >
> >> -Original Message-
> >> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> >> Sent: Monday, December 21, 2015 9:46 AM
> >> To: Wu, Feng ; pbonz...@redhat.com;
> >> rkrc...@redhat.com
> >> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> >> Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-
> >> priority interrupts
> >>
> >> On 2015/12/16 9:37, Feng Wu wrote:
> >>> Use vector-hashing to deliver lowest-priority interrupts, As an
> >>> example, modern Intel CPUs in server platform use this method to
> >>> handle lowest-priority interrupts.
> >>>
> >>> Signed-off-by: Feng Wu 
> >>> ---
> >>>arch/x86/kvm/irq_comm.c | 27 ++-
> >>>arch/x86/kvm/lapic.c| 57
> >> -
> >>>arch/x86/kvm/lapic.h|  2 ++
> >>>arch/x86/kvm/x86.c  |  9 
> >>>arch/x86/kvm/x86.h  |  1 +
> >>>5 files changed, 81 insertions(+), 15 deletions(-)
> >>>
> >>>bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic
> *src,
> >>>   struct kvm_lapic_irq *irq, int *r, unsigned long 
> >>> *dest_map)
> >>>{
> >>> @@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm
> >> *kvm, struct kvm_lapic *src,
> >>>   dst = map->logical_map[cid];
> >>>
> >>>   if (kvm_lowest_prio_delivery(irq)) {
> >>> - int l = -1;
> >>> - for_each_set_bit(i, &bitmap, 16) {
> >>> - if (!dst[i])
> >>> - continue;
> >>> - if (l < 0)
> >>> - l = i;
> >>> - else if (kvm_apic_compare_prio(dst[i]->vcpu,
> >> dst[l]->vcpu) < 0)
> >>> - l = i;
> >>> + if (!kvm_vector_hashing_enabled()) {
> >>> + int l = -1;
> >>> + for_each_set_bit(i, &bitmap, 16) {
> >>> + if (!dst[i])
> >>> + continue;
> >>> + if (l < 0)
> >>> + l = i;
> >>> + else if (kvm_apic_compare_prio(dst[i]-
> >>> vcpu, dst[l]->vcpu) < 0)
> >>> + l = i;
> >>> + }
> >>> + bitmap = (l >= 0) ? 1 << l : 0;
> >>> + } else {
> >>> + int idx = 0;
> >>> + unsigned int dest_vcpus = 0;
> >>> +
> >>> + for_each_set_bit(i, &bitmap, 16) {
> >>> + if (!dst[i]
> >> && !kvm_lapic_enabled(dst[i]->vcpu)) {
> >>
> >> It should be or(||) not and (&&).
> >
> > Oh, you are right! My negligence! Thanks for pointing this out, Yang!
> 
> btw, i think the kvm_lapic_enabled check is wrong here? Why need it here?

If the lapic is not enabled, I think we cannot recognize it as a candidate, can 
we?
Maybe Radim can confirm this, Radim, what is your option?

Thanks,
Feng
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-12-21 Thread Wu, Feng


> -Original Message-
> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> Sent: Monday, December 21, 2015 10:01 AM
> To: Wu, Feng ; pbonz...@redhat.com;
> rkrc...@redhat.com
> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
> posted-interrupts
> 
> On 2015/12/21 9:55, Wu, Feng wrote:
> >
> >
> >> -Original Message-
> >> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> >> ow...@vger.kernel.org] On Behalf Of Yang Zhang
> >> Sent: Monday, December 21, 2015 9:50 AM
> >> To: Wu, Feng ; pbonz...@redhat.com;
> >> rkrc...@redhat.com
> >> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> >> Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
> >> posted-interrupts
> >>
> >> On 2015/12/16 9:37, Feng Wu wrote:
> >>> Use vector-hashing to deliver lowest-priority interrupts for
> >>> VT-d posted-interrupts.
> >>>
> >>> Signed-off-by: Feng Wu 
> >>> ---
> >>>arch/x86/kvm/lapic.c | 67
> >> 
> >>>arch/x86/kvm/lapic.h |  2 ++
> >>>arch/x86/kvm/vmx.c   | 12 --
> >>>3 files changed, 79 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> >>> index e29001f..d4f2c8f 100644
> >>> --- a/arch/x86/kvm/lapic.c
> >>> +++ b/arch/x86/kvm/lapic.c
> >>> @@ -854,6 +854,73 @@ out:
> >>>}
> >>>
> >>>/*
> >>> + * This routine handles lowest-priority interrupts using vector-hashing
> >>> + * mechanism. As an example, modern Intel CPUs use this method to
> handle
> >>> + * lowest-priority interrupts.
> >>> + *
> >>> + * Here is the details about the vector-hashing mechanism:
> >>> + * 1. For lowest-priority interrupts, store all the possible destination
> >>> + *vCPUs in an array.
> >>> + * 2. Use "guest vector % max number of destination vCPUs" to find the
> right
> >>> + *destination vCPU in the array for the lowest-priority interrupt.
> >>> + */
> >>> +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
> >>> +   struct kvm_lapic_irq *irq)
> >>> +{
> >>> + struct kvm_apic_map *map;
> >>> + struct kvm_vcpu *vcpu = NULL;
> >>> +
> >>> + if (irq->shorthand)
> >>> + return NULL;
> >>> +
> >>> + rcu_read_lock();
> >>> + map = rcu_dereference(kvm->arch.apic_map);
> >>> +
> >>> + if (!map)
> >>> + goto out;
> >>> +
> >>> + if ((irq->dest_mode != APIC_DEST_PHYSICAL) &&
> >>> + kvm_lowest_prio_delivery(irq)) {
> >>> + u16 cid;
> >>> + int i, idx = 0;
> >>> + unsigned long bitmap = 1;
> >>> + unsigned int dest_vcpus = 0;
> >>> + struct kvm_lapic **dst = NULL;
> >>> +
> >>> +
> >>> + if (!kvm_apic_logical_map_valid(map))
> >>> + goto out;
> >>> +
> >>> + apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
> >>> +
> >>> + if (cid >= ARRAY_SIZE(map->logical_map))
> >>> + goto out;
> >>> +
> >>> + dst = map->logical_map[cid];
> >>> +
> >>> + for_each_set_bit(i, &bitmap, 16) {
> >>> + if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {
> >>> + clear_bit(i, &bitmap);
> >>> + continue;
> >>> + }
> >>> + }
> >>> +
> >>> + dest_vcpus = hweight16(bitmap);
> >>> +
> >>> + if (dest_vcpus != 0) {
> >>> + idx = kvm_vector_2_index(irq->vector, dest_vcpus,
> >>> +  &bitmap, 16);
> >>> + vcpu = dst[idx-1]->vcpu;
> >>> + }
> >>> + }
> >>> +
> >>> +out:
> >>> + rcu_read_unlock();
> >>> + return vcpu;
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(kvm_intr_vector_hashing_dest);
> >>> +
> >>> +/*
> >>> * Add a pending IRQ into lapic.
> >>> * Return 1 if successfully added and 0 if discarded.
> >>> */
> >>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> >>> index 6890ef0..52bffce 100644
> >>> --- a/arch/x86/kvm/lapic.h
> >>> +++ b/arch/x86/kvm/lapic.h
> >>> @@ -172,4 +172,6 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm,
> >> struct kvm_lapic_irq *irq,
> >>>   struct kvm_vcpu **dest_vcpu);
> >>>int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
> >>>  const unsigned long *bitmap, u32 bitmap_size);
> >>> +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
> >>> +   struct kvm_lapic_irq *irq);
> >>>#endif
> >>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>> index 5eb56ed..3f89189 100644
> >>> --- a/arch/x86/kvm/vmx.c
> >>> +++ b/arch/x86/kvm/vmx.c
> >>> @@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm
> *kvm,
> >> unsigned int host_irq,
> >>>*/
> >>>
> >>>   kvm_set_msi_irq(e, &irq);
> >>> - if (!kvm_intr_is_single_vcpu(kvm, &ir

Re: RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-21 Thread Xulei (Stone)
Hi, Kevin,
Can you tell how to reset/reboot this VM, if it goes to the handle_hwpic1()
on its booting procedure? I mean, usually, SeaBIOS would not go to 
handle_hwpic routine. But in my test case, SeaBIOS calls handle_hwpic when
KVM injects a #UD expcetion (not irq) and  SeaBIOS will loop to handle this
if KVM persistently injects exception.
 
Now, i just wish to reset/reboot this VM if it is fall into handle_hwpic. I
tried follwing patch and it seems not work. What can i do to force 
reset/reboot? 

@@ -7,10 +7,11 @@
  #include "biosvar.h" // SET_IVT
  #include "config.h" // CONFIG_*
  #include "output.h" // dprintf
  #include "pic.h" // pic_*
+ #include "hw/ps2port.h"
 
 u16
 pic_irqmask_read(void)
 {
 if (!CONFIG_HARDWARE_IRQ)
@@ -103,10 +104,11 @@ pic_isr2_read(void)
 void VISIBLE16
 handle_hwpic1(struct bregs *regs)
 {
 dprintf(DEBUG_ISR_hwpic1, "handle_hwpic1 irq=%x\n", pic_isr1_read());
 pic_eoi1();
+i8042_reboot();
 }

useful information:
kmod ftrace:
<...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 
8306
<...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real)
<...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0)
<...>-31509 [035] 154753.180077: kvm_entry: vcpu 0

bad SeaBIOS log:
[2015-12-17 12:37:30] In 32bit resume
[2015-12-17 12:37:30] =Attempting a hard reboot
[2015-12-17 12:37:30] SeaBIOS (version 
rel-1.8.1-0-g4adadbd-20151217_104405-linux-emBwNn)
[2015-12-17 12:37:30] No Xen hypervisor found.
[2015-12-17 12:37:30] Running on QEMU (i440fx)
[2015-12-17 12:37:30] Running on KVM
[2015-12-17 12:37:30] RamSize: 0x8000 [cmos]
[2015-12-17 12:37:30] Relocating init from 0x000db230 to 0x7ffad360 (size 76768)
[2015-12-17 12:37:30] Found QEMU fw_cfg
[2015-12-17 12:37:30] RamBlock: addr 0x len 0x8000 
[e820]
[2015-12-17 12:37:30] Moving pm_base to 0x600
[2015-12-17 12:37:30] boot order:
[2015-12-17 12:37:30] 1: /pci@i0cf8/ide@1,1/drive@0/disk@0
[2015-12-17 12:37:30] 2: HALT
[2015-12-17 12:37:30] maininit
[2015-12-17 12:37:30] platform_hardware_setup
[2015-12-17 12:37:30] init pic
[2015-12-17 12:37:30] pic_setup
[2015-12-17 12:37:30] pic_reset
[2015-12-17 12:37:30] enable_hwirq
[2015-12-17 12:37:30] CPU Mhz=3304
[2015-12-17 12:37:30] enable_hwirq
[2015-12-17 12:37:30] enable_hwirq
[2015-12-17 12:37:30] === PCI bus & bridge init ===
[2015-12-17 12:37:30] PCI: pci_bios_init_bus_rec bus = 0x0
[2015-12-17 12:37:30] === PCI device probing ===
[2015-12-17 12:37:30] Found 6 PCI devices (max PCI bus is 00)
[2015-12-17 12:37:30] === PCI new allocation pass #1 ===
[2015-12-17 12:37:30] PCI: check devices
[2015-12-17 12:37:30] === PCI new allocation pass #2 ===
[2015-12-17 12:37:30] PCI: IO: c000 - c02f
[2015-12-17 12:37:30] PCI: 32: 8000 - fec0
[2015-12-17 12:37:30] PCI: map device bdf=00:01.2  bar 4, addr c000, size 
0020 [io]
[2015-12-17 12:37:30] PCI: map device bdf=00:01.1  bar 4, addr c020, size 
0010 [io]
[2015-12-17 12:37:30] PCI: map device bdf=00:02.0  bar 6, addr febe, size 
0001 [mem]
[2015-12-17 12:37:30] PCI: map device bdf=00:02.0  bar 1, addr febf, size 
1000 [mem]
[2015-12-17 12:37:30] PCI: map device bdf=00:02.0  bar 0, addr fc00, size 
0200 [prefmem]
[2015-12-17 12:37:30] PCI: init bdf=00:00.0 id=8086:1237
[2015-12-17 12:37:30] PCI: init bdf=00:01.0 id=8086:7000
[2015-12-17 12:37:30] PIIX3/PIIX4 init: elcr=00 0c
[2015-12-17 12:37:30] PCI: init bdf=00:01.1 id=8086:7010
[2015-12-17 12:37:30] PCI: init bdf=00:01.2 id=8086:7020
[2015-12-17 12:37:30] PCI: init bdf=00:01.3 id=8086:7113
[2015-12-17 12:37:30] Using pmtimer, ioport 0x608
[2015-12-17 12:37:30] PCI: init bdf=00:02.0 id=1013:00b8
[2015-12-17 12:37:30] PCI: Using 00:02.0 for primary VGA
[2015-12-17 12:37:30] handle_hshamanpnd:dl leae_p_sismcmp_p:i: d a=ap3  
 <<=== everytime stuck, AP setup log seems abnormal!
[2015-12-17 12:37:30] ièf[cf_^ifd_=f3
[2015-12-17 12:37:30] èf[f^f_f]fÃÍ^XË<90>Found 4 cpu(s) max supported 4 cpu(s)
[2015-12-17 12:37:30] Copying PIR from 0x7ffbea18 to 0x000f5700
[2015-12-17 12:37:30] Copying MPTABLE from 0x6e30/7ffa42c0 to 0x000f55e0
[2015-12-17 12:37:30] Copying SMBIOS entry point from 0x6e11 to 0x000f55c0
[2015-12-17 12:37:31] Scan for VGA option rom
[2015-12-17 12:37:31] Running option rom at c000:0003
[2015-12-17 12:37:31] Start SeaVGABIOS (version 
rel-1.8.1-0-g4adadbd-20150316_085902-nilsson.home.kraxel.org)
[2015-12-17 12:37:31] enter vga_post:
[2015-12-17 12:37:31]a=0010  b=  c=  d= ds= 
es=f000 ss=
[2015-12-17 12:37:31]   si= di=57e0 bp= sp=6dbe cs=f000 
ip=d1fb  f=
[2015-12-17 12:37:31] cirrus init
[2015-12-17 12:37:31] cirrus init 2
[2015-12-17 12:37:31] Attempting to allocate VGA stack via pmm call to 
f000:d2a0   <<== here stuck, loop handle PIC irq0
[2015-12-17 12:37:35] handle_hwpic1 irq=0
[2015-12-17 12:37:35] handle_hwpic1 irq=0
[

RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-21 Thread Gonglei (Arei)
> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Tuesday, December 22, 2015 2:47 AM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote:
> > When the gurb of OS is booting, then the softirq and C function
> > send_disk_op() may use extra stack of SeaBIOS. If we inject a NMI,
> > romlayout.S: irqentry_extrastack is invoked, and the extra stack will
> > be used again. And the stack of first calling will be broken, so that the
> SeaBIOS stuck.
> >
> > You can easily reproduce the problem.
> >
> > 1. start on guest
> > 2. reset the guest
> > 3. inject a NMI when the guest show the grub surface 4. then the guest
> > stuck
> 
> Does the SeaBIOS patch below help?  

Sorry, it doesn't work. What's worse is we cannot stop SeaBIOS stuck by
Setting "CONFIG_ENTRY_EXTRASTACK=n" after applying this patch. 


> I'm not familiar with how to "inject a
> NMI" - can you describe the process in more detail?
> 

1. Qemu Command line:

#: /home/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -smp 8 
-name suse -vnc 0.0.0.0:10 \
-device virtio-scsi-pci,id=scsi0 -drive 
file=/home/suse11_sp3_32_2,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native
 \
-device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
-chardev file,id=seabios,path=/home/seabios.log -device 
isa-debugcon,iobase=0x402,chardev=seabios \
-monitor stdio -qmp unix:/tmp/qmp,server,nowait 

2. Inject a NMI by QMP:

#: /home/qemu/scripts/qmp # ./qmp-shell /tmp/qmp
Welcome to the QMP low-level shell!
Connected to QEMU 2.5.0

(QEMU) system_reset
{"return": {}}
(QEMU) inject-nmi  
{"return": {}}
(QEMU) inject-nmi
{"return": {}}


Regards,
-Gonglei

> -Kevin
> 
> 
> --- a/src/romlayout.S
> +++ b/src/romlayout.S
> @@ -548,7 +548,9 @@ entry_post:
>  ENTRY_INTO32 _cfunc32flat_handle_post   // Normal entry point
> 
>  ORG 0xe2c3
> -IRQ_ENTRY 02
> +.global entry_02
> +entry_02:
> +ENTRY handle_02  // NMI handler does not switch onto extra
> +stack
> 
>  ORG 0xe3fe
>  .global entry_13_official
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 01/14] qapi: Rename (one) qjson.h to qobject-json.h

2015-12-21 Thread Eric Blake
We have two different JSON visitors in the tree; and having both
named 'qjson.h' can cause include confusion.  Rename the qapi
version.

Why did I pick that one?  A later patch plans on deleting the
top-level qjson.c once we have a native JSON output visitor; we
could have renamed that one for less overall churn.  On the other
hand, all of the QObject subtypes have their own qFOO.c file, but
qjson.c makes it sound like we have a QTYPE_JSON subclass of
QObject; the new name of qobject-json makes it obvious that the
file is used for conversions between QObject and JSON, and not a
QObject subtype.

Kill trailing whitespace in the renamed tests/check-qobject-json.c
to keep checkpatch.pl happy.

Signed-off-by: Eric Blake 
Reviewed-by: Paolo Bonzini 

---
v2: retitle, enhance commit message, rebase to master
---
 MAINTAINERS   |  2 +-
 balloon.c |  2 +-
 block.c   |  2 +-
 block/archipelago.c   |  2 +-
 block/nbd.c   |  2 +-
 block/quorum.c|  2 +-
 blockjob.c|  2 +-
 hw/core/qdev.c|  2 +-
 hw/misc/pvpanic.c |  2 +-
 hw/net/virtio-net.c   |  2 +-
 include/qapi/qmp/{qjson.h => qobject-json.h}  |  0
 include/qapi/qmp/types.h  |  2 +-
 monitor.c |  2 +-
 qapi/qmp-event.c  |  2 +-
 qemu-img.c|  2 +-
 qga/main.c|  2 +-
 qobject/Makefile.objs |  3 ++-
 qobject/{qjson.c => qobject-json.c}   |  2 +-
 target-s390x/kvm.c|  2 +-
 tests/.gitignore  |  2 +-
 tests/Makefile|  8 
 tests/{check-qjson.c => check-qobject-json.c} | 14 +++---
 tests/libqtest.c  |  2 +-
 ui/spice-core.c   |  2 +-
 vl.c  |  2 +-
 25 files changed, 34 insertions(+), 33 deletions(-)
 rename include/qapi/qmp/{qjson.h => qobject-json.h} (100%)
 rename qobject/{qjson.c => qobject-json.c} (99%)
 rename tests/{check-qjson.c => check-qobject-json.c} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 55a0fd8..81fd039 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1155,7 +1155,7 @@ X: include/qapi/qmp/dispatch.h
 F: tests/check-qdict.c
 F: tests/check-qfloat.c
 F: tests/check-qint.c
-F: tests/check-qjson.c
+F: tests/check-qobject-json.c
 F: tests/check-qlist.c
 F: tests/check-qstring.c
 T: git git://repo.or.cz/qemu/qmp-unstable.git queue/qmp
diff --git a/balloon.c b/balloon.c
index 0f45d1b..5983b4f 100644
--- a/balloon.c
+++ b/balloon.c
@@ -31,7 +31,7 @@
 #include "trace.h"
 #include "qmp-commands.h"
 #include "qapi/qmp/qerror.h"
-#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qobject-json.h"

 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
diff --git a/block.c b/block.c
index 411edbf..0b7eb00 100644
--- a/block.c
+++ b/block.c
@@ -30,7 +30,7 @@
 #include "qemu/module.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qbool.h"
-#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qobject-json.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/sysemu.h"
 #include "qemu/notify.h"
diff --git a/block/archipelago.c b/block/archipelago.c
index 855655c..80a1bb5 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -56,7 +56,7 @@
 #include "qemu/thread.h"
 #include "qapi/qmp/qint.h"
 #include "qapi/qmp/qstring.h"
-#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qobject-json.h"
 #include "qemu/atomic.h"

 #include 
diff --git a/block/nbd.c b/block/nbd.c
index 416f42b..ef53083 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -32,7 +32,7 @@
 #include "qemu/module.h"
 #include "qemu/sockets.h"
 #include "qapi/qmp/qdict.h"
-#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qobject-json.h"
 #include "qapi/qmp/qint.h"
 #include "qapi/qmp/qstring.h"

diff --git a/block/quorum.c b/block/quorum.c
index 6793f12..a64b40d 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -18,7 +18,7 @@
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qint.h"
-#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qobject-json.h"
 #include "qapi/qmp/qlist.h"
 #include "qapi/qmp/qstring.h"
 #include "qapi-event.h"
diff --git a/blockjob.c b/blockjob.c
index 80adb9d..84361f7 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -31,7 +31,7 @@
 #include "block/block_int.h"
 #include "sysemu/block-backend.h"
 #include "qapi/qmp/qerror.h"
-#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qobject-json.h"
 #include "qemu/coroutine.h"
 #include "qmp-commands.h"
 #include "qemu/timer.h"
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 3447f93..149028d 100644
--

Re: kvmclock doesn't work, help?

2015-12-21 Thread Andy Lutomirski
On Fri, Dec 18, 2015 at 1:49 PM, Marcelo Tosatti  wrote:
> On Fri, Dec 18, 2015 at 12:25:11PM -0800, Andy Lutomirski wrote:
>> [cc: John Stultz -- maybe you have ideas on how this should best
>> integrate with the core code]
>>
>> On Fri, Dec 18, 2015 at 11:45 AM, Marcelo Tosatti  
>> wrote:

>> > Can you write an actual proposal (with details) that accomodates the
>> > issue described at "Assuming a stable TSC across physical CPUS, and a
>> > stable TSC" ?
>> >
>> > Yes it would be nicer, the IPIs (to stop the vcpus) are problematic for
>> > realtime guests.
>>
>> This shouldn't require many details, and I don't think there's an ABI
>> change.  The rules are:
>>
>> When the overall system timebase changes (e.g. when the selected
>> clocksource changes or when update_pvclock_gtod is called), the KVM
>> host would:
>>
>> optionally: preempt_disable();  /* for performance */
>>
>> for all vms {
>>
>>   for all registered pvti structures {
>> pvti->version++;  /* should be odd now */
>>   }
>
> pvti is userspace data, so you have to pin it before?

Yes.

Fortunately, most systems probably only have one page of pvti
structures, I think (unless there are a ton of vcpus), so the
performance impact should be negligible.

>
>>   /* Note: right now, any vcpu that tries to access pvti will start
>> infinite looping.  We should add cpu_relax() to the guests. */
>>
>>   for all registered pvti structures {
>> update everything except pvti->version;
>>   }
>>
>>   for all registered pvti structures {
>> pvti->version++;  /* should be even now */
>>   }
>>
>>   cond_resched();
>> }
>>
>> Is this enough detail?  This should work with all existing guests,
>> too, unless there's a buggy guest out there that actually fails to
>> double-check version.
>
> What is the advantage of this over the brute force method, given
> that guests will busy spin?
>
> (busy spin is equally problematic as IPI for realtime guests).

I disagree.  It's never been safe to call clock_gettime from an RT
task and expect a guarantee of real-time performance.  We could fix
that, but it's not even safe on non-KVM.

Sending an IPI *always* stalls the task.  Taking a lock (which is
effectively what this is doing) only stalls the tasks that contend for
the lock, which, most of the time, means that nothing stalls.

Also, if the host disables preemption or otherwise boosts its priority
while version is odd, then the actual stall will be very short, in
contrast to an IPI-induced stall, which will be much, much longer.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 9/9] KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection

2015-12-21 Thread Suresh E. Warrier
Redirecting the wakeup of a VCPU from the H_IPI hypercall to
a core running in the host is usually a good idea, most workloads
seemed to benefit. However, in one heavily interrupt-driven SMT1
workload, some regression was observed. This patch adds a kvm_hv
module parameter called h_ipi_redirect to control this feature.

The default value for this tunable is 1 - that is enable the feature.

Signed-off-by: Suresh Warrier 
---
Resending the updated patch with the updated diff since an 
earlier patch (patch 8/9) had to be resent to fix a build
break.

 arch/powerpc/include/asm/kvm_ppc.h   |  1 +
 arch/powerpc/kvm/book3s_hv.c | 11 +++
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  5 -
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 1b93519..29d1442 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -448,6 +448,7 @@ extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 
icpval);
 extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu);
 extern void kvmppc_xics_ipi_action(void);
+extern int h_ipi_redirect;
 #else
 static inline void kvmppc_alloc_host_rm_ops(void) {};
 static inline void kvmppc_free_host_rm_ops(void) {};
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d6280ed..182ec84 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -81,6 +81,17 @@ static int target_smt_mode;
 module_param(target_smt_mode, int, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)");
 
+#ifdef CONFIG_KVM_XICS
+static struct kernel_param_ops module_param_ops = {
+   .set = param_set_int,
+   .get = param_get_int,
+};
+
+module_param_cb(h_ipi_redirect, &module_param_ops, &h_ipi_redirect,
+   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core");
+#endif
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index e673fb9..980d8a6 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -24,6 +24,9 @@
 
 #define DEBUG_PASSUP
 
+int h_ipi_redirect = 1;
+EXPORT_SYMBOL(h_ipi_redirect);
+
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp 
*icp,
u32 new_irq);
 
@@ -148,7 +151,7 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
cpu = vcpu->arch.thread_cpu;
if (cpu < 0 || cpu >= nr_cpu_ids) {
hcore = -1;
-   if (kvmppc_host_rm_ops_hv)
+   if (kvmppc_host_rm_ops_hv && h_ipi_redirect)
hcore = find_available_hostcore(XICS_RM_KICK_VCPU);
if (hcore != -1) {
icp_send_hcore_msg(hcore, vcpu);
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 8/9] KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU

2015-12-21 Thread Suresh E. Warrier
This patch adds support to real-mode KVM to search for a core
running in the host partition and send it an IPI message with
VCPU to be woken. This avoids having to switch to the host
partition to complete an H_IPI hypercall when the VCPU which
is the target of the the H_IPI is not loaded (is not running
in the guest).

The patch also includes the support in the IPI handler running
in the host to do the wakeup by calling kvmppc_xics_ipi_action
for the PPC_MSG_RM_HOST_ACTION message.

When a guest is being destroyed, we need to ensure that there
are no pending IPIs waiting to wake up a VCPU before we free
the VCPUs of the guest. This is accomplished by:
- Forces a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs
  before freeing any VCPUs in kvm_arch_destroy_vm().
- Any PPC_MSG_RM_HOST_ACTION messages must be executed first
  before any other PPC_MSG_CALL_FUNCTION messages.

Signed-off-by: Suresh Warrier 
---
Fixed build break for CONFIG_SMP=n (thanks to Mike Ellerman for
pointing that out).

 arch/powerpc/kernel/smp.c| 11 +
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 92 ++--
 arch/powerpc/kvm/powerpc.c   | 10 
 3 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index e222efc..cb8be5d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -257,6 +257,17 @@ irqreturn_t smp_ipi_demux(void)
 
do {
all = xchg(&info->messages, 0);
+#if defined(CONFIG_KVM_XICS) && defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+   /*
+* Must check for PPC_MSG_RM_HOST_ACTION messages
+* before PPC_MSG_CALL_FUNCTION messages because when
+* a VM is destroyed, we call kick_all_cpus_sync()
+* to ensure that any pending PPC_MSG_RM_HOST_ACTION
+* messages have completed before we free any VCPUs.
+*/
+   if (all & IPI_MESSAGE(PPC_MSG_RM_HOST_ACTION))
+   kvmppc_xics_ipi_action();
+#endif
if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNCTION))
generic_smp_call_function_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 43ffbfe..e673fb9 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -51,11 +51,84 @@ static void ics_rm_check_resend(struct kvmppc_xics *xics,
 
 /* -- ICP routines -- */
 
+#ifdef CONFIG_SMP
+static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu)
+{
+   int hcpu;
+
+   hcpu = hcore << threads_shift;
+   kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu;
+   smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION);
+   icp_native_cause_ipi_rm(hcpu);
+}
+#else
+static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu) { }
+#endif
+
+/*
+ * We start the search from our current CPU Id in the core map
+ * and go in a circle until we get back to our ID looking for a
+ * core that is running in host context and that hasn't already
+ * been targeted for another rm_host_ops.
+ *
+ * In the future, could consider using a fairer algorithm (one
+ * that distributes the IPIs better)
+ *
+ * Returns -1, if no CPU could be found in the host
+ * Else, returns a CPU Id which has been reserved for use
+ */
+static inline int grab_next_hostcore(int start,
+   struct kvmppc_host_rm_core *rm_core, int max, int action)
+{
+   bool success;
+   int core;
+   union kvmppc_rm_state old, new;
+
+   for (core = start + 1; core < max; core++)  {
+   old = new = READ_ONCE(rm_core[core].rm_state);
+
+   if (!old.in_host || old.rm_action)
+   continue;
+
+   /* Try to grab this host core if not taken already. */
+   new.rm_action = action;
+
+   success = cmpxchg64(&rm_core[core].rm_state.raw,
+   old.raw, new.raw) == old.raw;
+   if (success) {
+   /*
+* Make sure that the store to the rm_action is made
+* visible before we return to caller (and the
+* subsequent store to rm_data) to synchronize with
+* the IPI handler.
+*/
+   smp_wmb();
+   return core;
+   }
+   }
+
+   return -1;
+}
+
+static inline int find_available_hostcore(int action)
+{
+   int core;
+   int my_core = smp_processor_id() >> threads_shift;
+   struct kvmppc_host_rm_core *rm_core = kvmppc_host_rm_ops_hv->rm_core;
+
+   core = grab_next_hostcore(my_core, rm_core, cpu_nr_cores(), action);
+   if (core == -1)
+   core = grab_next_hostcore(core, rm_core, my

Re: [kvm-unit-tests PATCH 3/3] add timeout support

2015-12-21 Thread Andrew Jones
On Mon, Dec 21, 2015 at 06:04:20PM +0100, Radim Krčmář wrote:
> 2015-12-17 14:10-0600, Andrew Jones:
> > Signed-off-by: Andrew Jones 
> > ---
> > diff --git a/arm/run b/arm/run
> > @@ -75,10 +75,14 @@ chr_testdev+=' -device virtconsole,chardev=ctd -chardev 
> > testdev,id=ctd'
> >  M+=",accel=$ACCEL"
> >  command="$qemu $M -cpu $processor $chr_testdev"
> >  command+=" -display none -serial stdio -kernel"
> > -echo $command "$@"
> > +
> > +if [ "$TIMEOUT" ]; then
> > +   timeout_cmd="timeout --foreground $TIMEOUT"
> 
> (command="timeout --foreground $TIMEOUT $command" # to keep the clutter
>  down.)
> 
> > +fi
> 
> (We paste it three times, so I'd rather see this abstracted in a
>  "scripts/" library.)

Sounds good

> 
> > diff --git a/run_tests.sh b/run_tests.sh
> > @@ -21,6 +21,7 @@ function run()
> > +local timeout="${9:-$TIMEOUT}"
> > diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
> > @@ -97,8 +98,12 @@ if [ "\$QEMU" ]; then
> > +if [ "$timeout" ]; then
> > +   timeout_cmd='timeout --foreground $timeout'
> 
> Both would be nicer if they took the TIMEOUT variable as an override.

Everything already takes TIMEOUT as an override, i.e.

TIMEOUT=3 ./run_tests.sh

and

TIMEOUT=3 arm/run arm/test.flat

will both already set a timeout for any test that didn't have a timeout
set in unittests.cfg, or wasn't run with run()/unittests.cfg. Or, did
you mean that you'd prefer TIMEOUT to override the timeout in
unittests.cfg? That does make some sense, in the case the one in the
config is longer than desired, however it also makes sense the way I
have it now when the one in the config is shorter than TIMEOUT (the
fallback default). I think I like it this way better.

> We already don't do that for accel and the patch seems ok in other
> regards,

Hmm, for accel I see a need for a patch allowing us to do

ACCEL=?? ./run_tests.sh

as I already have for TIMEOUT. Also, for both I should add a
mkstandalone patch allowing

TIMEOUT=? ACCEL=? make standalone

Thanks,
drew

> 
> Reviewed-by: Radim Krčmář 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 1/3] run_tests.sh: reduce return code ambiguity

2015-12-21 Thread Andrew Jones
On Mon, Dec 21, 2015 at 05:31:24PM +0100, Radim Krčmář wrote:
> 2015-12-17 14:10-0600, Andrew Jones:
> > qemu/unittest exit codes are convoluted, causing codes 0 and 1
> > to be ambiguous. Here are the possible meanings
> > 
> >  .-.
> >  || 0 | 1  |
> >  |-|
> >  | QEMU   | did something successfully,   | FAILURE|
> >  || but probably didn't run the   ||
> >  || unittest, OR caught SIGINT,   ||
> >  || SIGHUP, or SIGTERM||
> >  |-|
> >  | unittest   | for some reason exited using  | SUCCESS|
> >  || ACPI/PSCI, not with debug-exit||
> >  .-.
> > 
> > As we can see above, an exit code of zero is even ambiguous for each
> > row, i.e. QEMU could exit with zero because it successfully completed,
> > or because it caught a signal. unittest could exit with zero because
> > it successfully powered-off, or because for some odd reason it powered-
> > off instead of calling debug-exit.
> > 
> > And, the most fun is that exit-code == 1 means QEMU failed, but the
> > unittest succeeded.
> > 
> > This patch attempts to reduce that ambiguity, by also looking at stderr.
> 
> Nice.
> 
> > With it, we have
> > 
> >  0  - unexpected exit from qemu, or the unittest not using debug-exit.
> >   Consider it a FAILURE
> >  1  - unittest SUCCESS
> >  < 128  - something failed (could be the unittest, qemu, or a run script)
> >   Check the logs.
> >  >= 128 - signal (signum = code - 128)
> 
> I think this heuristic should be applied to {arm,x86}/run.
> run_tests.sh would inherit it and we would finally get reasonable exit
> values everywhere.

Good idea. We can add the table to a scripts/functions.bash function and
use it everywhere.

> 
> The resulting table would look like this:
> 
>  0 = unit-test passed
>  77= unit-test skipped (not implemented yet)
>  124   = unit-test timeouted (implemented in [3/3])
>  127   = qemu returned 0 (debug-exit probably wasn't called)

We already use 127 for abort(), called from a unit test, see
lib/abort.c. I guess we can use 126 for "debug-exit probably wasn't
called". We should also add a (unit test called abort) message for
127.

>  > 128 = exited because of signal $? - 128
>  * = unit-test failed
> 
> (Signal 0 is not used, so we could map 128 to mean "debug-exit probably
>  wasn't called", but others might not understand our signal convention.

I think we want 128 to be the beginning of signal space, which goes all
the way up to 255, in order to allow exit code masking to work.

>  Anyway, it'd be best for us to start at 200, for `case $? in 2??)` ...)

Start what at 200? I think we have everything covered above. The mapping
looks like this

0   = success
1-63= unit test failure code
64-127  = test suite failure code
128-255 = signal

which sounds good to me.

> 
> > Signed-off-by: Andrew Jones 
> > ---
> > diff --git a/run_tests.sh b/run_tests.sh
> > @@ -54,10 +55,32 @@ function run()
> >  
> >  # extra_params in the config file may contain backticks that need to be
> >  # expanded, so use eval to start qemu
> > -eval $cmdline >> test.log
> > +errlog=$(mktemp)
> > +eval $cmdline >> test.log 2> $errlog
> | [...]
> |  cat $errlog >> test.log
> 
> This assumes that stderr is always after stdout,

True. I'm not sure that matters when the unit test, which only uses stdout
will always output stuff serially with qemu, which could output a mix. But
your version below is fine by me if we want to pick up the need for the
pipe and tee.

> 
>   eval $cmdline 2>&1 >> test.log | tee $errlog >> test.log
> 
> has a chance to print lines in wrong order too, but I think it's going
> to be closer to the original.

I'll play with it and send a v2 soon.

Thanks,
drew

> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch

2015-12-21 Thread Mario Smarduch


On 12/18/2015 11:45 PM, Christoffer Dall wrote:
> On Fri, Dec 18, 2015 at 05:17:00PM -0800, Mario Smarduch wrote:
>> On 12/18/2015 5:54 AM, Christoffer Dall wrote:
>>> On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
 This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 
 register.
 On vcpu_load for 32 bit guests enable FP access, and enable fp/simd
 trapping for 32 and 64 bit guests. On first fp/simd access trap to handler 
 to save host and restore guest context, and clear trapping bits to enable 
 vcpu 
 lazy mode. On vcpu_put if trap bits are clear save guest and restore host 
 context and also save 32 bit guest fpexc register.

 Signed-off-by: Mario Smarduch 
 ---
  arch/arm/include/asm/kvm_emulate.h   |   5 ++
  arch/arm/include/asm/kvm_host.h  |   2 +
  arch/arm/kvm/arm.c   |  20 +--
  arch/arm64/include/asm/kvm_asm.h |   2 +
  arch/arm64/include/asm/kvm_emulate.h |  15 +++--
  arch/arm64/include/asm/kvm_host.h|  16 +-
  arch/arm64/kernel/asm-offsets.c  |   1 +
  arch/arm64/kvm/Makefile  |   3 +-
  arch/arm64/kvm/fpsimd_switch.S   |  38 
  arch/arm64/kvm/hyp.S | 108 
 +--
  arch/arm64/kvm/hyp_head.S|  48 
  11 files changed, 181 insertions(+), 77 deletions(-)
  create mode 100644 arch/arm64/kvm/fpsimd_switch.S
  create mode 100644 arch/arm64/kvm/hyp_head.S

 diff --git a/arch/arm/include/asm/kvm_emulate.h 
 b/arch/arm/include/asm/kvm_emulate.h
 index 3de11a2..13feed5 100644
 --- a/arch/arm/include/asm/kvm_emulate.h
 +++ b/arch/arm/include/asm/kvm_emulate.h
 @@ -243,6 +243,11 @@ static inline unsigned long 
 vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
}
  }
  
 +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
 +{
 +  return true;
 +}
 +
  #ifdef CONFIG_VFPv3
  /* Called from vcpu_load - save fpexc and enable guest access to fp/simd 
 unit */
  static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
 diff --git a/arch/arm/include/asm/kvm_host.h 
 b/arch/arm/include/asm/kvm_host.h
 index ecc883a..720ae51 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
  
  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 +
 +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
  
  static inline void kvm_arch_hardware_disable(void) {}
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 1de07ab..dd59f8a 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int 
 cpu)
  
kvm_arm_set_running_vcpu(vcpu);
  
 -  /*  Save and enable FPEXC before we load guest context */
 -  kvm_enable_vcpu_fpexc(vcpu);
 +  /*
 +   * For 32bit guest executing on arm64, enable fp/simd access in
 +   * EL2. On arm32 save host fpexc and then enable fp/simd access.
 +   */
 +  if (kvm_guest_vcpu_is_32bit(vcpu))
 +  kvm_enable_vcpu_fpexc(vcpu);
  
/* reset hyp cptr register to trap on tracing and vfp/simd access*/
vcpu_reset_cptr(vcpu);
 @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int 
 cpu)
  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
  {
/* If the fp/simd registers are dirty save guest, restore host. */
 -  if (kvm_vcpu_vfp_isdirty(vcpu))
 +  if (kvm_vcpu_vfp_isdirty(vcpu)) {
kvm_restore_host_vfp_state(vcpu);
  
 -  /* Restore host FPEXC trashed in vcpu_load */
 +  /*
 +   * For 32bit guest on arm64 save the guest fpexc register
 +   * in EL2 mode.
 +   */
 +  if (kvm_guest_vcpu_is_32bit(vcpu))
 +  kvm_save_guest_vcpu_fpexc(vcpu);
 +  }
 +
 +  /* For arm32 restore host FPEXC trashed in vcpu_load. */
kvm_restore_host_fpexc(vcpu);
  
/*
 diff --git a/arch/arm64/include/asm/kvm_asm.h 
 b/arch/arm64/include/asm/kvm_asm.h
 index 5e37710..d53d069 100644
 --- a/arch/arm64/include/asm/kvm_asm.h
 +++ b/arch/arm64/include/asm/kvm_asm.h
 @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
  extern void __kvm_flush_vm_context(void);
  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 +extern void __kvm_vcpu_enable_fpexc32(void);
 +extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *v

Re: [RFC PATCH 2/5] KVM: add KVM_EXIT_MSR exit reason and capability.

2015-12-21 Thread Peter Hornyack
On Fri, Dec 18, 2015 at 1:25 PM, Paolo Bonzini  wrote:
>
>
> On 18/08/2015 20:46, Peter Hornyack wrote:
>> Define KVM_EXIT_MSR, a new exit reason for accesses to MSRs that kvm
>> does not handle. Define KVM_CAP_UNHANDLED_MSR_EXITS, a vm-wide
>> capability that guards the new exit reason and which can be enabled via
>> the KVM_ENABLE_CAP ioctl.
>>
>> Signed-off-by: Peter Hornyack 
>> ---
>>  Documentation/virtual/kvm/api.txt | 48 
>> +++
>>  include/uapi/linux/kvm.h  | 14 
>>  2 files changed, 62 insertions(+)
>
> Let's instead add:
>
> - KVM_CAP_MSR_EXITS
>
> - KVM_CAP_ENABLE_MSR_EXIT
>
> - KVM_CAP_DISABLE_MSR_EXIT
>
> and 3 kinds of exits: KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE,
> KVM_EXIT_MSR_AFTER_WRITE.  The first two use four fields (type, msr,
> data, handled) and #GP if handled=0 is zero on the next entry.  The last
> one uses the first three fields and can be used for HyperV.
>
> Paolo

Paolo, I've included an updated version of this patch below. Does this
match the API that you had in mind? If so, I'll continue adjusting the
rest of the code and will send the entire new patchset.

This updated version of the API seems more cumbersome to me (three new
capabilities, three exit reasons when just one or two might suffice)
than the original interface I used, but maybe you have a good reason
for that. Also, will it be ok to have just one capability to enable
all three kinds of exits, or will KVM_EXIT_MSR_AFTER_WRITE want a
separate capability?


commit a684d520ed62cf0db4495e5197d5bf722e4f8109
Author: Peter Hornyack 
Date:   Fri Dec 18 14:44:04 2015 -0800

KVM: add capabilities and exit reasons for MSRs.

Define KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE, and
KVM_EXIT_MSR_AFTER_WRITE, new exit reasons for accesses to MSRs that kvm
does not handle or that user space needs to be notified about. Define the
KVM_CAP_MSR_EXITS, KVM_CAP_ENABLE_MSR_EXITS, and KVM_CAP_DISABLE_MSR_EXITS
capabilities to control these new exits for a VM.

diff --git a/Documentation/virtual/kvm/api.txt
b/Documentation/virtual/kvm/api.txt
index 053f613fc9a9..3bba3248df3d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3359,6 +3359,43 @@ Hyper-V SynIC state change. Notification is
used to remap SynIC
 event/message pages and to enable/disable SynIC messages/events processing
 in userspace.

+ /*
+ * KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE,
+ * KVM_EXIT_MSR_AFTER_WRITE
+ */
+ struct {
+ __u32 index;/* i.e. ecx; out */
+ __u64 data; /* out (wrmsr) / in (rdmsr) */
+#define KVM_EXIT_MSR_COMPLETION_FAILED 1
+ __u64 type; /* out */
+#define KVM_EXIT_MSR_UNHANDLED 0
+#define KVM_EXIT_MSR_HANDLED   1
+ __u8 handled;   /* in */
+ } msr;
+
+If exit_reason is KVM_EXIT_MSR_READ or KVM_EXIT_MSR_WRITE, then the vcpu has
+executed a rdmsr or wrmsr instruction which could not be satisfied by kvm. The
+msr struct is used for both output to and input from user space. index is the
+target MSR number held in ecx; user space must not modify this field. data
+holds the payload from a wrmsr or must be filled in with a payload on a rdmsr.
+For a normal exit, type will be 0.
+
+On the return path into kvm, user space should set handled to
+KVM_EXIT_MSR_HANDLED if it successfully handled the MSR access. Otherwise,
+handled should be set to KVM_EXIT_MSR_UNHANDLED, which will cause a general
+protection fault to be injected into the vcpu. If an error occurs during the
+return into kvm, the vcpu will not be run and another exit will be generated
+with type set to KVM_EXIT_MSR_COMPLETION_FAILED.
+
+If exit_reason is KVM_EXIT_MSR_AFTER_WRITE, then the vcpu has executed a wrmsr
+instruction which is handled by kvm but which user space may need to be
+notified about. index and data are set as described above; the value of type
+depends on the MSR that was written. handled is ignored on reentry into kvm.
+
+KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE, and KVM_EXIT_MSR_AFTER_WRITE can only
+occur when KVM_CAP_MSR_EXITS is present and KVM_CAP_ENABLE_MSR_EXITS has been
+set. A detailed description of these capabilities is below.
+
  /* Fix the size of the union. */
  char padding[256];
  };
@@ -3697,6 +3734,26 @@ a KVM_EXIT_IOAPIC_EOI vmexit will be reported
to userspace.
 Fails if VCPU has already been created, or if the irqchip is already in the
 kernel (i.e. KVM_CREATE_IRQCHIP has already been called).

+7.6 KVM_CAP_ENABLE_MSR_EXITS, KVM_CAP_DISABLE_MSR_EXITS
+
+Architectures: x86 (vmx-only)
+Parameters: none
+Returns: 0 on success, -1 on error
+
+These capabilities enable and disable exits to user space for certain guest MSR
+accesses. These capabilities are only available if KVM_CHECK_EXTENSION
+indicates that KVM_CAP_MSR_EXITS is present.
+
+When enabled, kvm will exit to user space when the guest reads
+an MSR that kvm does not handle (KVM_EXIT_MSR_READ), writes an MSR that kvm
+does not handle (KVM_EXIT_MSR_WRITE), or writes an MSR that kvm hand

Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-21 Thread Kevin O'Connor
On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote:
> When the gurb of OS is booting, then the softirq and C function send_disk_op()
> may use extra stack of SeaBIOS. If we inject a NMI, romlayout.S: 
> irqentry_extrastack
> is invoked, and the extra stack will be used again. And the stack of first 
> calling
> will be broken, so that the SeaBIOS stuck. 
> 
> You can easily reproduce the problem.
> 
> 1. start on guest
> 2. reset the guest
> 3. inject a NMI when the guest show the grub surface
> 4. then the guest stuck

Does the SeaBIOS patch below help?  I'm not familiar with how to
"inject a NMI" - can you describe the process in more detail?

-Kevin


--- a/src/romlayout.S
+++ b/src/romlayout.S
@@ -548,7 +548,9 @@ entry_post:
 ENTRY_INTO32 _cfunc32flat_handle_post   // Normal entry point
 
 ORG 0xe2c3
-IRQ_ENTRY 02
+.global entry_02
+entry_02:
+ENTRY handle_02  // NMI handler does not switch onto extra stack
 
 ORG 0xe3fe
 .global entry_13_official
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 3/3] add timeout support

2015-12-21 Thread Radim Krčmář
2015-12-17 14:10-0600, Andrew Jones:
> Signed-off-by: Andrew Jones 
> ---
> diff --git a/arm/run b/arm/run
> @@ -75,10 +75,14 @@ chr_testdev+=' -device virtconsole,chardev=ctd -chardev 
> testdev,id=ctd'
>  M+=",accel=$ACCEL"
>  command="$qemu $M -cpu $processor $chr_testdev"
>  command+=" -display none -serial stdio -kernel"
> -echo $command "$@"
> +
> +if [ "$TIMEOUT" ]; then
> + timeout_cmd="timeout --foreground $TIMEOUT"

(command="timeout --foreground $TIMEOUT $command" # to keep the clutter
 down.)

> +fi

(We paste it three times, so I'd rather see this abstracted in a
 "scripts/" library.)

> diff --git a/run_tests.sh b/run_tests.sh
> @@ -21,6 +21,7 @@ function run()
> +local timeout="${9:-$TIMEOUT}"
> diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
> @@ -97,8 +98,12 @@ if [ "\$QEMU" ]; then
> +if [ "$timeout" ]; then
> + timeout_cmd='timeout --foreground $timeout'

Both would be nicer if they took the TIMEOUT variable as an override.
We already don't do that for accel and the patch seems ok in other
regards,

Reviewed-by: Radim Krčmář 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 1/3] run_tests.sh: reduce return code ambiguity

2015-12-21 Thread Radim Krčmář
2015-12-17 14:10-0600, Andrew Jones:
> qemu/unittest exit codes are convoluted, causing codes 0 and 1
> to be ambiguous. Here are the possible meanings
> 
>  .-.
>  || 0 | 1  |
>  |-|
>  | QEMU   | did something successfully,   | FAILURE|
>  || but probably didn't run the   ||
>  || unittest, OR caught SIGINT,   ||
>  || SIGHUP, or SIGTERM||
>  |-|
>  | unittest   | for some reason exited using  | SUCCESS|
>  || ACPI/PSCI, not with debug-exit||
>  .-.
> 
> As we can see above, an exit code of zero is even ambiguous for each
> row, i.e. QEMU could exit with zero because it successfully completed,
> or because it caught a signal. unittest could exit with zero because
> it successfully powered-off, or because for some odd reason it powered-
> off instead of calling debug-exit.
> 
> And, the most fun is that exit-code == 1 means QEMU failed, but the
> unittest succeeded.
> 
> This patch attempts to reduce that ambiguity, by also looking at stderr.

Nice.

> With it, we have
> 
>  0  - unexpected exit from qemu, or the unittest not using debug-exit.
>   Consider it a FAILURE
>  1  - unittest SUCCESS
>  < 128  - something failed (could be the unittest, qemu, or a run script)
>   Check the logs.
>  >= 128 - signal (signum = code - 128)

I think this heuristic should be applied to {arm,x86}/run.
run_tests.sh would inherit it and we would finally get reasonable exit
values everywhere.

The resulting table would look like this:

 0 = unit-test passed
 77= unit-test skipped (not implemented yet)
 124   = unit-test timeouted (implemented in [3/3])
 127   = qemu returned 0 (debug-exit probably wasn't called)
 > 128 = exited because of signal $? - 128
 * = unit-test failed

(Signal 0 is not used, so we could map 128 to mean "debug-exit probably
 wasn't called", but others might not understand our signal convention.
 Anyway, it'd be best for us to start at 200, for `case $? in 2??)` ...)

> Signed-off-by: Andrew Jones 
> ---
> diff --git a/run_tests.sh b/run_tests.sh
> @@ -54,10 +55,32 @@ function run()
>  
>  # extra_params in the config file may contain backticks that need to be
>  # expanded, so use eval to start qemu
> -eval $cmdline >> test.log
> +errlog=$(mktemp)
> +eval $cmdline >> test.log 2> $errlog
| [...]
|  cat $errlog >> test.log

This assumes that stderr is always after stdout,

  eval $cmdline 2>&1 >> test.log | tee $errlog >> test.log

has a chance to print lines in wrong order too, but I think it's going
to be closer to the original.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit

2015-12-21 Thread Pavel Fedin
 Hello!

> >   It depends. Can i read about these hypercalls somewhere? Is there any 
> > documentation?
> I don't know about a documentation, but you can look at the code of
> Hyper-V hypercall handling inside KVM:
> 
> https://github.com/torvalds/linux/blob/master/arch/x86/kvm/hyperv.c#L346

 Aha, i see, so vmmcall CPU instruction is employed. Well, i believe this very 
well fits into the sematics of KVM_EXIT_HYPERCALL,
because it's a true hypercall.

> The code simply decodes hypercall parameters from vcpu registers then
> handle hypercall code in switch and encode return code inside vcpu
> registers. Probably encode and decode of hypercall parameters/return
> code can be done in QEMU so we need only some exit with parameter that
> this is Hyper-V hypercall and probably KVM_EXIT_HYPERCALL is good for it.

 Or you could even reuse the whole structure, it has all you need:

__u64 nr;   /* Reserved for x86, other 
architectures can use it, for example ARM "hvc #nr" */
__u64 args[6];  /* rax, rbx, rcx, rdx, rdi, rsi */
__u64 ret;
__u32 longmode; /* longmode; other architectures (like 
ARM64) can also make sense of it */

 Or you could put in struct kvm_regs instead of args and ret, and allow the 
userspace to manipulate it.

> But KVM_EXIT_HYPERCALL is not used inside KVM/QEMU so requires
> implementation.

 I guess your hypercalls to be introduced using KVM_EXIT_HYPERV are also not 
used inside qemu so require implementation :)

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 0/5] implement vNVDIMM

2015-12-21 Thread Xiao Guangrong



On 12/10/2015 11:11 AM, Xiao Guangrong wrote:


New version, new week, and unfortunate new ping... :(


Ping again to see what happened...


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit

2015-12-21 Thread Andrey Smetanin



On 12/21/2015 04:28 PM, Pavel Fedin wrote:

  Hello!


Yes, we can use  KVM_EXIT_REG_IO/MSR_IO for Hyper-V SynIC MSRS's changes
and can even use only one MSR value . So union inside struct
kvm_hyperv_exit is excessive.

But we still need Vcpu exit to handle VMBus hypercalls by QEMU to
emulate VMBus devices inside QEMU.

And currently we are going to extend struct kvm_hyperv_exit
to store Hyper-V VMBus hypercall parameters.


  Hm... Hypercalls, you say?
  We already have KVM_EXIT_HYPERCALL. Documentation says it's currently unused. 
Is it a leftover from ia64 KVM? Could we reuse it for
the purpose?


but could we replace Hyper-V VMBus hypercall and it's parameters
by KVM_EXIT_REG_IO/MSR_IO too?


  It depends. Can i read about these hypercalls somewhere? Is there any 
documentation?
I don't know about a documentation, but you can look at the code of 
Hyper-V hypercall handling inside KVM:


https://github.com/torvalds/linux/blob/master/arch/x86/kvm/hyperv.c#L346

The code simply decodes hypercall parameters from vcpu registers then 
handle hypercall code in switch and encode return code inside vcpu 
registers. Probably encode and decode of hypercall parameters/return 
code can be done in QEMU so we need only some exit with parameter that 
this is Hyper-V hypercall and probably KVM_EXIT_HYPERCALL is good for it.


But KVM_EXIT_HYPERCALL is not used inside KVM/QEMU so requires
implementation.



Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit

2015-12-21 Thread Pavel Fedin
 Hello!

> Yes, we can use  KVM_EXIT_REG_IO/MSR_IO for Hyper-V SynIC MSRS's changes
> and can even use only one MSR value . So union inside struct
> kvm_hyperv_exit is excessive.
> 
> But we still need Vcpu exit to handle VMBus hypercalls by QEMU to
> emulate VMBus devices inside QEMU.
> 
> And currently we are going to extend struct kvm_hyperv_exit
> to store Hyper-V VMBus hypercall parameters.

 Hm... Hypercalls, you say?
 We already have KVM_EXIT_HYPERCALL. Documentation says it's currently unused. 
Is it a leftover from ia64 KVM? Could we reuse it for
the purpose?

> but could we replace Hyper-V VMBus hypercall and it's parameters
> by KVM_EXIT_REG_IO/MSR_IO too?

 It depends. Can i read about these hypercalls somewhere? Is there any 
documentation?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit

2015-12-21 Thread Andrey Smetanin



On 12/18/2015 09:39 PM, Roman Kagan wrote:

On Fri, Dec 18, 2015 at 10:10:11AM -0800, Peter Hornyack wrote:

On Fri, Dec 18, 2015 at 8:01 AM, Paolo Bonzini  wrote:

On 18/12/2015 16:19, Pavel Fedin wrote:

As far as i understand this code, KVM_EXIT_HYPERV is called when one
of three MSRs are accessed. But, shouldn't we have implemented
instead something more generic, like KVM_EXIT_REG_IO, which would
work similar to KVM_EXIT_PIO or KVM_EXIT_MMIO, but carry register
code and value?


Yes, we considered that.  There were actually patches for this as well.
  However, in this case the register is still emulated in the kernel, and
userspace just gets informed of the new value.


On brief inspection of Andrey's patch (I have not been following
closely) it looks like the kvm_hyperv_exit struct that's returned to
userspace contains more data (control, evt_page, and msg_page fields)
than simply the value of the MSR, so would the desired SynIC exit fit
into a general-purpose exit for MSR emulation?


Frankly I'm struggling trying to recall why we implemented it this way.
Actually all three fields are the values of respective MSRs and I don't
see any necessity to pass all three at the same time when any of them
gets updated.  The patch for QEMU adds an exit handler which processes
the fields individually, so I have a strong suspicion that union was
meant here rather than struct.

I hope Andrey will help to shed some light on that when he's back in the
office on Monday; meanwhile I think this peculiarity can be ignored.

Hello!

We have implemented Hyper-V related Vcpu exit not only for Hyper-V SynIC 
MSR's changes but also to provide future interface to transfer guest 
VMBus hypercalls parameters into QEMU.


Yes, we can use  KVM_EXIT_REG_IO/MSR_IO for Hyper-V SynIC MSRS's changes 
and can even use only one MSR value . So union inside struct 
kvm_hyperv_exit is excessive.


But we still need Vcpu exit to handle VMBus hypercalls by QEMU to 
emulate VMBus devices inside QEMU.


And currently we are going to extend struct kvm_hyperv_exit
to store Hyper-V VMBus hypercall parameters.

SynIC MSR's changes could be replaced by KVM_EXIT_REG_IO/MSR_IO
but could we replace Hyper-V VMBus hypercall and it's parameters
by KVM_EXIT_REG_IO/MSR_IO too?



Roman.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmclock doesn't work, help?

2015-12-21 Thread Marcelo Tosatti
On Fri, Dec 18, 2015 at 12:25:11PM -0800, Andy Lutomirski wrote:
> [cc: John Stultz -- maybe you have ideas on how this should best
> integrate with the core code]
> 
> On Fri, Dec 18, 2015 at 11:45 AM, Marcelo Tosatti  wrote:
> > On Fri, Dec 18, 2015 at 11:27:13AM -0800, Andy Lutomirski wrote:
> >> On Fri, Dec 18, 2015 at 3:47 AM, Marcelo Tosatti  
> >> wrote:
> >> > On Thu, Dec 17, 2015 at 05:12:59PM -0800, Andy Lutomirski wrote:
> >> >> On Thu, Dec 17, 2015 at 11:08 AM, Marcelo Tosatti  
> >> >> wrote:
> >> >> > On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
> >> >> >> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti 
> >> >> >>  wrote:
> >> >> >> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
> >> >> >> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski 
> >> >> >> >>  wrote:
> >> >> >> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini 
> >> >> >> >> >  wrote:
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
> >> >> >> >> >>> > RAW TSC NTP corrected TSC
> >> >> >> >> >>> > t0  10  10
> >> >> >> >> >>> > t1  20  19.99
> >> >> >> >> >>> > t2  30  29.98
> >> >> >> >> >>> > t3  40  39.97
> >> >> >> >> >>> > t4  50  49.96
> >> >> >
> >> >> > (1)
> >> >> >
> >> >> >> >> >>> >
> >> >> >> >> >>> > ...
> >> >> >> >> >>> >
> >> >> >> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> >> >> >> >> >>> > you can see what will happen.
> >> >> >> >> >>>
> >> >> >> >> >>> Sure, but why would you ever switch from one to the other?
> >> >> >> >> >>
> >> >> >> >> >> The guest uses the raw TSC and systemtime = 0 until suspend.  
> >> >> >> >> >> After
> >> >> >> >> >> resume, the TSC certainly increases at the same rate as 
> >> >> >> >> >> before, but the
> >> >> >> >> >> raw TSC restarted counting from 0 and systemtime has increased 
> >> >> >> >> >> slower
> >> >> >> >> >> than the guest kvmclock.
> >> >> >> >> >
> >> >> >> >> > Wait, are we talking about the host's NTP or the guest's NTP?
> >> >> >> >> >
> >> >> >> >> > If it's the host's, then wouldn't systemtime be reset after 
> >> >> >> >> > resume to
> >> >> >> >> > the NTP corrected value?  If so, the guest wouldn't see time go
> >> >> >> >> > backwards.
> >> >> >> >> >
> >> >> >> >> > If it's the guest's, then the guest's NTP correction is applied 
> >> >> >> >> > on top
> >> >> >> >> > of kvmclock, and this shouldn't matter.
> >> >> >> >> >
> >> >> >> >> > I still feel like I'm missing something very basic here.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> OK, I think I get it.
> >> >> >> >>
> >> >> >> >> Marcelo, I thought that kvmclock was supposed to propagate the 
> >> >> >> >> host's
> >> >> >> >> correction to the guest.  If it did, indeed, propagate the 
> >> >> >> >> correction
> >> >> >> >> then, after resume, the host's new system_time would match the 
> >> >> >> >> guest's
> >> >> >> >> idea of it (after accounting for the guest's long nap), and I 
> >> >> >> >> don't
> >> >> >> >> think there would be a problem.
> >> >> >> >> That being said, I can't find the code in the masterclock stuff 
> >> >> >> >> that
> >> >> >> >> would actually do this.
> >> >> >> >
> >> >> >> > Guest clock is maintained by guest timekeeping code, which does:
> >> >> >> >
> >> >> >> > timer_interrupt()
> >> >> >> > offset = read clocksource since last timer interrupt
> >> >> >> > accumulate_to_systemclock(offset)
> >> >> >> >
> >> >> >> > The frequency correction of NTP in the host can be applied to
> >> >> >> > kvmclock, which will be visible to the guest
> >> >> >> > at "read clocksource since last timer interrupt"
> >> >> >> > (kvmclock_clocksource_read function).
> >> >> >>
> >> >> >> pvclock_clocksource_read?  That seems to do the same thing as all the
> >> >> >> other clocksource access functions.
> >> >> >>
> >> >> >> >
> >> >> >> > This does not mean that the NTP correction in the host is 
> >> >> >> > propagated
> >> >> >> > to the guests system clock directly.
> >> >> >> >
> >> >> >> > (For example, the guest can run NTP which is free to do further
> >> >> >> > adjustments at "accumulate_to_systemclock(offset)" time).
> >> >> >>
> >> >> >> Of course.  But I expected that, in the absence of NTP on the guest,
> >> >> >> that the guest would track the host's *corrected* time.
> >> >> >>
> >> >> >> >
> >> >> >> >> If, on the other hand, the host's NTP correction is not supposed 
> >> >> >> >> to
> >> >> >> >> propagate to the guest,
> >> >> >> >
> >> >> >> > This is optional. There is a module option to control this, in 
> >> >> >> > fact.
> >> >> >> >
> >> >> >> > Its nice to have, because then you can execute a guest without NTP
> >> >> >> > (say without network connection), and have a kvmclock (kvmclock is 
> >> >> >> > a
> >> >> >> > clocksource, not a guest system clock) which is NTP corrected.
> >> >>

RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-21 Thread Gonglei (Arei)
Dear Kevin,

> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Sunday, December 20, 2015 10:33 PM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Sun, Dec 20, 2015 at 09:49:54AM +, Gonglei (Arei) wrote:
> > > From: Kevin O'Connor [mailto:ke...@koconnor.net]
> > > Sent: Saturday, December 19, 2015 11:12 PM
> > > On Sat, Dec 19, 2015 at 12:03:15PM +, Gonglei (Arei) wrote:
> > > > Maybe the root cause is not NMI but INTR, so yield() can open hardware
> > > interrupt,
> > > > And then execute interrupt handler, but the interrupt handler make the
> > > SeaBIOS
> > > > stack broken, so that the BSP can't execute the instruction and occur
> > > exception,
> > > > VM_EXIT to Kmod, which is an infinite loop. But I don't have any proofs
> except
> > > > the surface phenomenon.
> > >
> > > I can't see any reason why allowing interrupts at this location would
> > > be a problem.
> > >
> > Does it have any relationship with *extra stack* of SeaBIOS?
> 
> None that I can see.  Also, the kvm trace seems to show the code
> trying to execute at rip=0x03 - that will crash long before the extra
> stack is used.
> 
When the gurb of OS is booting, then the softirq and C function send_disk_op()
may use extra stack of SeaBIOS. If we inject a NMI, romlayout.S: 
irqentry_extrastack
is invoked, and the extra stack will be used again. And the stack of first 
calling
will be broken, so that the SeaBIOS stuck. 

You can easily reproduce the problem.

1. start on guest
2. reset the guest
3. inject a NMI when the guest show the grub surface
4. then the guest stuck

If we disabled extra stack by setting

 CONFIG_ENTRY_EXTRASTACK=n

Then the problem is gone.

Besides, I have another thought:

Is it possible when one cpu is using the extra stack, but other cpus (APs)
still be waked up by hardware interrupt after yield() or br->flags = F_IF 
and used the extra stack again?


Regards,
-Gonglei

> > > > Kevin, can we drop yield() in smp_setup() ?
> > >
> > > It's possible to eliminate this instance of yield, but I think it
> > > would just push the crash to the next time interrupts are enabled.
> > >
> > Perhaps. I'm not sure.
> >
> > > > Is it really useful and allowable for SeaBIOS? Maybe for other
> components?
> > > > I'm not sure. Because we found that when SeaBIOS is booting, if we 
> > > > inject
> a
> > > > NMI by QMP, the guest will *stuck*. And the kvm tracing log is the same
> with
> > > > the current problem.
> > >
> > > If you apply the patches you had to prevent that NMI crash problem,
> > > does it also prevent the above crash?
> > >
> > Yes, but we cannot prevent the NMI injection (though I'll submit some
> patches to
> > forbid users' NMI injection after NMI_EN disabled by RTC bit7 of port 0x70).
> >
> 
> -Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html