Re: [PATCH kernel 9/9] KVM: PPC: Add support for multiple-TCE hcalls
On 12/08/2015 04:48 PM, David Gibson wrote:
On Tue, Sep 15, 2015 at 08:49:39PM +1000, Alexey Kardashevskiy wrote:

This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
devices or emulated PCI. These calls allow adding multiple entries (up to
512) into the TCE table in one call, which saves time on transitions
between kernel and user space.

This implements the KVM_CAP_PPC_MULTITCE capability. When present, the
kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE. If they
cannot be handled by the kernel, they are passed on to user space. User
space still has to have an implementation for these.

Both HV and PR-style KVM are supported.

Signed-off-by: Alexey Kardashevskiy
---
 Documentation/virtual/kvm/api.txt       |  25 ++
 arch/powerpc/include/asm/kvm_ppc.h      |  12 +++
 arch/powerpc/kvm/book3s_64_vio.c        | 111 +++-
 arch/powerpc/kvm/book3s_64_vio_hv.c     | 145 ++--
 arch/powerpc/kvm/book3s_hv.c            |  26 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
 arch/powerpc/kvm/book3s_pr_papr.c       |  35 
 arch/powerpc/kvm/powerpc.c              |   3 +
 8 files changed, 350 insertions(+), 13 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index d86d831..593c62a 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3019,6 +3019,31 @@ Returns: 0 on success, -1 on error

 Queues an SMI on the thread's vcpu.

+4.97 KVM_CAP_PPC_MULTITCE
+
+Capability: KVM_CAP_PPC_MULTITCE
+Architectures: ppc
+Type: vm
+
+This capability means the kernel is capable of handling hypercalls
+H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
+space. This significantly accelerates DMA operations for PPC KVM guests.
+User space should expect that its handlers for these hypercalls
+are not going to be called if user space previously registered LIOBN
+in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
+
+In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
+user space might have to advertise it for the guest. For example,
+an IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+The hypercalls mentioned above may or may not be processed successfully
+in the kernel-based fast path. If they cannot be handled by the kernel,
+they will get passed on to user space. So user space still has to have
+an implementation for these despite the in-kernel acceleration.
+
+This capability is always enabled.
+
 5. The kvm_run structure

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fcde896..e5b968e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce *args);
+extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
+		struct kvm_vcpu *vcpu, unsigned long liobn);
 extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 		unsigned long ioba, unsigned long npages);
 extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
 		unsigned long tce);
+extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+		unsigned long *ua, unsigned long **prmap);
+extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
+		unsigned long idx, unsigned long tce);
 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		unsigned long ioba, unsigned long tce);
+extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_list, unsigned long npages);
+extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_value, unsigned long npages);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		unsigned long ioba);
 extern struct page *kvm_alloc_hpt(unsigned long nr_pages);

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index e347856..d3fc732 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -14,6 +14,7 @@
  *
  * Copyright 2010 Paul Mackerras, IBM Corp.
  * Copyright 2011 David Gibson, IBM Corporation
+ * Copyright 2013 Alexey Kardashevskiy, IBM Corporation
  */

 #include
@@ -37,8 +38,7 @@
 #include
 #include
 #include
-
-#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
+#include

 static long kvmppc_stt_npages(unsign
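As a hedged illustration of why the multi-TCE path above helps, the following stand-alone C sketch models a TCE table as a plain array and applies up to 512 entries per "hypercall", mirroring the H_PUT_TCE_INDIRECT semantics described in the commit message. The names (`h_put_tce_indirect_model`, `tce_table_model`) and the error value are invented for illustration; this is not the kernel code.

```c
#include <stdint.h>
#include <stddef.h>

#define H_SUCCESS      0UL
#define H_PARAMETER    (-4UL)   /* illustrative error code, not the PAPR value */
#define TCES_PER_CALL  512      /* per-call limit stated in the commit message */
#define TCE_PAGE_SHIFT 12       /* 4K IOMMU pages, as in the patch */

/* Model of a guest-visible TCE table: one 64-bit entry per 4K IO page. */
struct tce_table_model {
    uint64_t *entries;
    size_t    nentries;
};

/*
 * Model of H_PUT_TCE_INDIRECT: copy up to 512 TCEs from a guest-provided
 * list into the table in one "hypercall", instead of one call per entry.
 */
unsigned long h_put_tce_indirect_model(struct tce_table_model *tbl,
                                       unsigned long ioba,
                                       const uint64_t *tce_list,
                                       unsigned long npages)
{
    unsigned long idx = ioba >> TCE_PAGE_SHIFT;

    if (npages > TCES_PER_CALL)
        return H_PARAMETER;
    if (idx + npages > tbl->nentries)
        return H_PARAMETER;          /* outside the DMA window */

    for (unsigned long i = 0; i < npages; i++)
        tbl->entries[idx + i] = tce_list[i];

    return H_SUCCESS;
}
```

With the per-entry H_PUT_TCE, mapping 512 pages costs 512 guest/host transitions; the indirect form performs the same table update in one, which is the acceleration the capability advertises.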
Re: [PATCH kernel 7/9] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
On 12/08/2015 04:27 PM, David Gibson wrote:
On Tue, Sep 15, 2015 at 08:49:37PM +1000, Alexey Kardashevskiy wrote:

Upcoming multi-tce support (H_PUT_TCE_INDIRECT/H_STUFF_TCE hypercalls)
will validate the TCE (not to have unexpected bits) and the IO address
(to be within the DMA window boundaries). This introduces helpers to
validate a TCE and an IO address.

Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/include/asm/kvm_ppc.h  |  4 ++
 arch/powerpc/kvm/book3s_64_vio_hv.c | 89 -
 2 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c6ef05b..fcde896 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,6 +166,10 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce *args);
+extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+		unsigned long ioba, unsigned long npages);
+extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
+		unsigned long tce);
 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		unsigned long ioba, unsigned long tce);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,

diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 6cf1ab3..f0fd84c 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include

 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))

@@ -64,7 +65,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
  * WARNING: This will be called in real-mode on HV KVM and virtual
  * mode on PR KVM
  */
-static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 		unsigned long ioba, unsigned long npages)
 {
	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
@@ -76,6 +77,79 @@ static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
	return H_SUCCESS;
 }
+EXPORT_SYMBOL_GPL(kvmppc_ioba_validate);

Why does it need to be exported - the new users will still be in the KVM
module, won't they?

book3s_64_vio_hv.c contains real-mode code and is always compiled into
vmlinux, while the helper is meant to be called from book3s_64_vio.c,
which may be compiled as a module.

+
+/*
+ * Validates TCE address.
+ * At the moment flags and page mask are validated.
+ * As the host kernel does not access those addresses (just puts them
+ * to the table and user space is supposed to process them), we can skip
+ * checking other things (such as TCE is a guest RAM address or the page
+ * was actually allocated).

Hmm. These comments apply given that the only current user of this is the
kvm acceleration of userspace TCE tables, but the name suggests it would
validate any TCE, including in-kernel ones for which this would be unsafe.

The function has the "kvmppc_" prefix and the file lives in
arch/powerpc/kvm, so to my taste it is self-explanatory that it only
handles TCEs from KVM guests (not even from a random user-space tool), no?

+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ * mode on PR KVM
+ */
+long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
+{
+	unsigned long mask = ((1ULL << IOMMU_PAGE_SHIFT_4K) - 1) &
+			~(TCE_PCI_WRITE | TCE_PCI_READ);
+
+	if (tce & mask)
+		return H_PARAMETER;
+
+	return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
+
+/* Note on the use of page_address() in real mode,
+ *
+ * It is safe to use page_address() in real mode on ppc64 because
+ * page_address() is always defined as lowmem_page_address()
+ * which returns __va(PFN_PHYS(page_to_pfn(page))), which is an
+ * arithmetic operation and does not access the page struct.
+ *
+ * Theoretically page_address() could be defined differently, but
+ * then either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL would have
+ * to be enabled.
+ * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
+ * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
+ * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
+ * is not expected to be enabled on ppc32, page_address()
+ * is safe for ppc32 as well.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ * mode on PR KVM
+ */
+static u64 *kvmppc_page_address(struct page *page)
+{
+#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
+#error TODO: fix to avoid page_address() here
+#endif
+	return (u64 *) page_address(page);
+}
+
+/*
+ * Handles TCE requests for emulated devices.
+ * Puts guest TCE values to the
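To make the check that kvmppc_tce_validate performs concrete, here is a hedged, stand-alone C version of the same mask logic: a TCE is accepted only if, apart from the two low permission bits, no bits inside the 4K page offset are set. The permission-bit values and the error constant are assumptions for illustration, not the kernel's definitions.

```c
#include <stdint.h>

#define TCE_PCI_READ  0x1UL   /* permission bits; exact values assumed here */
#define TCE_PCI_WRITE 0x2UL
#define IOMMU_PAGE_SHIFT_4K 12

#define H_SUCCESS   0L
#define H_PARAMETER (-4L)     /* illustrative error value */

/* Reject any TCE with stray bits inside the 4K page offset,
 * ignoring the two low read/write permission bits. */
long tce_validate_model(uint64_t tce)
{
    uint64_t mask = ((1ULL << IOMMU_PAGE_SHIFT_4K) - 1) &
                    ~(TCE_PCI_WRITE | TCE_PCI_READ);

    if (tce & mask)
        return H_PARAMETER;
    return H_SUCCESS;
}
```

A page-aligned address with only read/write bits set passes; anything else in the low 12 bits fails, which matches the "flags and page mask" comment in the quoted patch.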
RE: [RFC PATCH 2/5] KVM: add KVM_EXIT_MSR exit reason and capability.
Hello!

> commit a684d520ed62cf0db4495e5197d5bf722e4f8109
> Author: Peter Hornyack
> Date: Fri Dec 18 14:44:04 2015 -0800
>
> KVM: add capabilities and exit reasons for MSRs.
>
> Define KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE, and
> KVM_EXIT_MSR_AFTER_WRITE, new exit reasons for accesses to MSRs that kvm
> does not handle or that user space needs to be notified about. Define the
> KVM_CAP_MSR_EXITS, KVM_CAP_ENABLE_MSR_EXITS, and KVM_CAP_DISABLE_MSR_EXITS
> capabilities to control these new exits for a VM.
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 053f613fc9a9..3bba3248df3d 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3359,6 +3359,43 @@ Hyper-V SynIC state change. Notification is used to remap SynIC
>  event/message pages and to enable/disable SynIC messages/events processing
>  in userspace.
>
> +		/* KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE,
> +		 * KVM_EXIT_MSR_AFTER_WRITE */
> +		struct {
> +			__u32 index;	/* i.e. ecx; out */
> +			__u64 data;	/* out (wrmsr) / in (rdmsr) */
> +#define KVM_EXIT_MSR_COMPLETION_FAILED	1
> +			__u64 type;	/* out */
> +#define KVM_EXIT_MSR_UNHANDLED	0
> +#define KVM_EXIT_MSR_HANDLED	1
> +			__u8 handled;	/* in */
> +		} msr;
> +
> +If exit_reason is KVM_EXIT_MSR_READ or KVM_EXIT_MSR_WRITE, then the vcpu has
> +executed a rdmsr or wrmsr instruction which could not be satisfied by kvm. The
> +msr struct is used for both output to and input from user space. index is the
> +target MSR number held in ecx; user space must not modify this field. data

Held in "index", you mean?

I would enlarge it to __u64 and use a generalized encoding, the same as for
the KVM_SET_ONE_REG ioctl. I already wrote about it. And I would simply use
"REG" instead of "MSR" in the naming, because on different architectures
these registers have different names (e.g. on ARM32 they are called
"coprocessor registers" and on ARM64 these are "system registers"); the
common thing between them is that each is some special CPU register, access
to which can be trapped and emulated. Therefore KVM_EXIT_REG_xxx.

> +holds the payload from a wrmsr or must be filled in with a payload on a rdmsr.
> +For a normal exit, type will be 0.
> +
> +On the return path into kvm, user space should set handled to
> +KVM_EXIT_MSR_HANDLED if it successfully handled the MSR access. Otherwise,
> +handled should be set to KVM_EXIT_MSR_UNHANDLED, which will cause a general
> +protection fault to be injected into the vcpu. If an error occurs during the
> +return into kvm, the vcpu will not be run and another exit will be generated
> +with type set to KVM_EXIT_MSR_COMPLETION_FAILED.
> +
> +If exit_reason is KVM_EXIT_MSR_AFTER_WRITE, then the vcpu has executed a wrmsr
> +instruction which is handled by kvm but which user space may need to be
> +notified about. index and data are set as described above; the value of type
> +depends on the MSR that was written. handled is ignored on reentry into kvm.

1. Is there any real need to distinguish between KVM_EXIT_MSR_WRITE and
KVM_EXIT_MSR_AFTER_WRITE? IMHO from userland's point of view these are the
same.

2. Why do WRITE and READ have to be different exit codes? We could use
something like "u8 is_write" in our structure; this would be more in line
with PIO and MMIO handling.

> +
> +KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE, and KVM_EXIT_MSR_AFTER_WRITE can only
> +occur when KVM_CAP_MSR_EXITS is present and KVM_CAP_ENABLE_MSR_EXITS has been
> +set. A detailed description of these capabilities is below.
> +
> 		/* Fix the size of the union. */
> 		char padding[256];
> 	};
> @@ -3697,6 +3734,26 @@ a KVM_EXIT_IOAPIC_EOI vmexit will be reported to userspace.
>
>  Fails if VCPU has already been created, or if the irqchip is already in the
>  kernel (i.e. KVM_CREATE_IRQCHIP has already been called).
>
> +7.6 KVM_CAP_ENABLE_MSR_EXITS, KVM_CAP_DISABLE_MSR_EXITS
> +
> +Architectures: x86 (vmx-only)
> +Parameters: none
> +Returns: 0 on success, -1 on error
> +
> +These capabilities enable and disable exits to user space for certain guest MSR
> +accesses. These capabilities are only available if KVM_CHECK_EXTENSION
> +indicates that KVM_CAP_MSR_EXITS is present.
> +
> +When enabled, kvm will exit to user space when the guest reads
> +an MSR that kvm does not handle (KVM_EXIT_MSR_READ), writes an MSR that kvm
> +does not handle (KVM_EXIT_MSR_WRITE), or writes an MSR that kvm handles but for
> +which user space should be notified (KVM_EXIT_MSR_AFTER_WRITE).
> +
> +These exits are currently only implemented for vmx. Also, note that if the kvm
> +module's ignore_msrs flag is set then KVM_EXIT_MSR_READ and KVM_EXIT_MSR_WRITE
> +will not be generated, and unhandled MSR accesses will simply be ignored and
> +the guest re-entered immediately.
> +
>
> 8. Other capabilities.
> --
> @@ -3726,3 +3783,11 @@ In order to use SynIC, it ha
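To make the proposed handled/unhandled protocol concrete, here is a hedged C sketch of the decision logic a user-space VMM could run on the `msr` struct after a KVM_EXIT_MSR_READ. The struct layout mirrors the quoted proposal; the MSR number and the handler name are invented for illustration, and the real loop would of course sit around the KVM_RUN ioctl.

```c
#include <stdint.h>

/* Mirrors the proposed kvm_run.msr layout from the quoted patch. */
struct kvm_msr_exit {
    uint32_t index;   /* MSR number, out */
    uint64_t data;    /* out (wrmsr) / in (rdmsr) */
    uint64_t type;    /* out */
    uint8_t  handled; /* in */
};

#define KVM_EXIT_MSR_UNHANDLED 0
#define KVM_EXIT_MSR_HANDLED   1

#define MSR_HYPOTHETICAL_FEATURE 0x40000000u  /* invented MSR number */

/* User-space policy: satisfy reads of one known MSR, refuse the rest.
 * Per the proposal, leaving handled = UNHANDLED makes kvm inject #GP. */
void handle_msr_read(struct kvm_msr_exit *msr)
{
    if (msr->index == MSR_HYPOTHETICAL_FEATURE) {
        msr->data = 0x1;                        /* emulated register value */
        msr->handled = KVM_EXIT_MSR_HANDLED;
    } else {
        msr->handled = KVM_EXIT_MSR_UNHANDLED;  /* kvm injects #GP */
    }
}
```

This also shows why the reviewers' `is_write` suggestion is attractive: a single dispatch point could branch on one flag instead of three exit reasons, as PIO and MMIO exits already do.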
RE: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts
Hi Radim/Paolo,

> -Original Message-
> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> Sent: Tuesday, December 22, 2015 3:14 PM
> To: Wu, Feng ; pbonz...@redhat.com; rkrc...@redhat.com
> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org; Jiang Liu (jiang@linux.intel.com)
> Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts
>
> On 2015/12/22 14:59, Wu, Feng wrote:
> >
> >> -Original Message-
> >> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> >> Sent: Tuesday, December 22, 2015 2:49 PM
> >> To: Wu, Feng ; pbonz...@redhat.com; rkrc...@redhat.com
> >> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org; Jiang Liu (jiang@linux.intel.com)
> >> Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts
> >>
> >> On 2015/12/16 9:37, Feng Wu wrote:
> >>> Use vector-hashing to deliver lowest-priority interrupts. As an
> >>> example, modern Intel CPUs in server platforms use this method to
> >>> handle lowest-priority interrupts.
> >>>
> >>> Signed-off-by: Feng Wu
> >>> ---
> >>>  arch/x86/kvm/irq_comm.c | 27 ++-
> >>>  arch/x86/kvm/lapic.c    | 57 -
> >>>  arch/x86/kvm/lapic.h    |  2 ++
> >>>  arch/x86/kvm/x86.c      |  9 
> >>>  arch/x86/kvm/x86.h      |  1 +
> >>>  5 files changed, 81 insertions(+), 15 deletions(-)
> >>>
> >>>  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
> >>>  		struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
> >>>  {
> >>> @@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
> >>>  		dst = map->logical_map[cid];
> >>>
> >>>  		if (kvm_lowest_prio_delivery(irq)) {
> >>> -			int l = -1;
> >>> -			for_each_set_bit(i, &bitmap, 16) {
> >>> -				if (!dst[i])
> >>> -					continue;
> >>> -				if (l < 0)
> >>> -					l = i;
> >>> -				else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0)
> >>> -					l = i;
> >>> +			if (!kvm_vector_hashing_enabled()) {
> >>> +				int l = -1;
> >>> +				for_each_set_bit(i, &bitmap, 16) {
> >>> +					if (!dst[i])
> >>> +						continue;
> >>> +					if (l < 0)
> >>> +						l = i;
> >>> +					else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0)
> >>> +						l = i;
> >>> +				}
> >>> +				bitmap = (l >= 0) ? 1 << l : 0;
> >>> +			} else {
> >>> +				int idx = 0;
> >>> +				unsigned int dest_vcpus = 0;
> >>> +
> >>> +				for_each_set_bit(i, &bitmap, 16) {
> >>> +					if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {
> >>
> >> It should be or (||), not and (&&).
> >
> > Oh, you are right! My negligence! Thanks for pointing this out, Yang!
> >
> > btw, I think the kvm_lapic_enabled check is wrong here? Why is it needed here?
> >
> >>> If the lapic is not enabled, I think we cannot recognize it as a
> >>> candidate, can we?
> >>> Maybe Radim can confirm this. Radim, what is your opinion?
> >>
> >> Lapic can be disabled by hw or sw. Here we only need to check the hw,
> >> which is enough and is already covered while injecting the interrupt into
> >> guest. I remember we (Gleb, Marcelo and me) discussed it a while ago,
> >> but I cannot find the mail thread.
> >
> > But if the lapic is disabled by software, we still cannot inject interrupts
> > to it, can we?
>
> Yes, we cannot inject the normal interrupt. But this is already covered by
> current logic and adding a check here seems meaningless. Conversely, it may
> do a bad thing..
> Let's wait for Radim/Paolo's opinions about this.

Thanks, Feng

-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts
On 2015/12/22 14:59, Wu, Feng wrote:

-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Tuesday, December 22, 2015 2:49 PM
To: Wu, Feng ; pbonz...@redhat.com; rkrc...@redhat.com
Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org; Jiang Liu (jiang@linux.intel.com)
Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

On 2015/12/16 9:37, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts. As an example,
modern Intel CPUs in server platforms use this method to handle
lowest-priority interrupts.

Signed-off-by: Feng Wu
---
 arch/x86/kvm/irq_comm.c | 27 ++-
 arch/x86/kvm/lapic.c    | 57 -
 arch/x86/kvm/lapic.h    |  2 ++
 arch/x86/kvm/x86.c      |  9 
 arch/x86/kvm/x86.h      |  1 +
 5 files changed, 81 insertions(+), 15 deletions(-)

bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
		struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
{
@@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
	dst = map->logical_map[cid];

	if (kvm_lowest_prio_delivery(irq)) {
-		int l = -1;
-		for_each_set_bit(i, &bitmap, 16) {
-			if (!dst[i])
-				continue;
-			if (l < 0)
-				l = i;
-			else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0)
-				l = i;
+		if (!kvm_vector_hashing_enabled()) {
+			int l = -1;
+			for_each_set_bit(i, &bitmap, 16) {
+				if (!dst[i])
+					continue;
+				if (l < 0)
+					l = i;
+				else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0)
+					l = i;
+			}
+			bitmap = (l >= 0) ? 1 << l : 0;
+		} else {
+			int idx = 0;
+			unsigned int dest_vcpus = 0;
+
+			for_each_set_bit(i, &bitmap, 16) {
+				if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {

It should be or (||), not and (&&).

Oh, you are right! My negligence! Thanks for pointing this out, Yang!

btw, I think the kvm_lapic_enabled check is wrong here? Why is it needed here?

If the lapic is not enabled, I think we cannot recognize it as a candidate,
can we? Maybe Radim can confirm this. Radim, what is your opinion?

Lapic can be disabled by hw or sw. Here we only need to check the hw, which
is enough and is already covered while injecting the interrupt into guest.
I remember we (Gleb, Marcelo and me) discussed it a while ago, but I cannot
find the mail thread.

But if the lapic is disabled by software, we still cannot inject interrupts
to it, can we?

Yes, we cannot inject the normal interrupt. But this is already covered by
current logic and adding a check here seems meaningless. Conversely, it may
do a bad thing..

-- 
best regards
yang
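The selection rule under discussion in this thread — pick the "guest vector % number of usable destinations"-th candidate — can be sketched as a pure function. This is a hedged model of what kvm_vector_2_index does over the 16-slot logical map, with the &&/|| condition already folded into a single "usable" test as Yang's review suggests; the function name and signature are invented for illustration.

```c
#include <stdint.h>

/* Pick a destination slot by vector hashing: among the candidate slots in
 * 'bitmap' (16 logical destinations), drop slots that are absent or whose
 * lapic is not enabled, then take the (vector % count)-th survivor.
 * Returns the chosen slot index, or -1 if no candidate survives. */
int vector_hash_pick(uint32_t vector, uint16_t bitmap,
                     const int usable[16] /* 1 = present and lapic enabled */)
{
    int candidates[16];
    int n = 0;

    for (int i = 0; i < 16; i++) {
        if (!(bitmap & (1u << i)))
            continue;
        if (!usable[i])   /* the "|| !kvm_lapic_enabled()" case, folded */
            continue;
        candidates[n++] = i;
    }

    return n ? candidates[vector % n] : -1;
}
```

For example, with candidates in slots 0, 1 and 3, vector 0x20 hashes to 32 % 3 = 2, i.e. the third survivor (slot 3) — consecutive vectors spread across the destination set, which is the behavior the commit message attributes to modern Intel server CPUs.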
RE: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts
> -Original Message-
> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> Sent: Tuesday, December 22, 2015 2:49 PM
> To: Wu, Feng ; pbonz...@redhat.com; rkrc...@redhat.com
> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org; Jiang Liu (jiang@linux.intel.com)
> Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts
>
> On 2015/12/16 9:37, Feng Wu wrote:
> > Use vector-hashing to deliver lowest-priority interrupts. As an
> > example, modern Intel CPUs in server platforms use this method to
> > handle lowest-priority interrupts.
> >
> > Signed-off-by: Feng Wu
> > ---
> >  arch/x86/kvm/irq_comm.c | 27 ++-
> >  arch/x86/kvm/lapic.c    | 57 -
> >  arch/x86/kvm/lapic.h    |  2 ++
> >  arch/x86/kvm/x86.c      |  9 
> >  arch/x86/kvm/x86.h      |  1 +
> >  5 files changed, 81 insertions(+), 15 deletions(-)
> >
> >  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
> >  		struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
> >  {
> > @@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
> >  		dst = map->logical_map[cid];
> >
> >  		if (kvm_lowest_prio_delivery(irq)) {
> > -			int l = -1;
> > -			for_each_set_bit(i, &bitmap, 16) {
> > -				if (!dst[i])
> > -					continue;
> > -				if (l < 0)
> > -					l = i;
> > -				else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0)
> > -					l = i;
> > +			if (!kvm_vector_hashing_enabled()) {
> > +				int l = -1;
> > +				for_each_set_bit(i, &bitmap, 16) {
> > +					if (!dst[i])
> > +						continue;
> > +					if (l < 0)
> > +						l = i;
> > +					else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0)
> > +						l = i;
> > +				}
> > +				bitmap = (l >= 0) ? 1 << l : 0;
> > +			} else {
> > +				int idx = 0;
> > +				unsigned int dest_vcpus = 0;
> > +
> > +				for_each_set_bit(i, &bitmap, 16) {
> > +					if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {
>
> It should be or (||), not and (&&).
>
> >>> Oh, you are right! My negligence! Thanks for pointing this out, Yang!
>
> >> btw, I think the kvm_lapic_enabled check is wrong here? Why is it needed here?

If the lapic is not enabled, I think we cannot recognize it as a candidate,
can we?
Maybe Radim can confirm this. Radim, what is your opinion?

> Lapic can be disabled by hw or sw. Here we only need to check the hw,
> which is enough and is already covered while injecting the interrupt into
> guest. I remember we (Gleb, Marcelo and me) discussed it a while ago,
> but I cannot find the mail thread.

But if the lapic is disabled by software, we still cannot inject interrupts to
it, can we?

Thanks, Feng
Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts
On 2015/12/22 12:37, Wu, Feng wrote:

-Original Message-
From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Yang Zhang
Sent: Monday, December 21, 2015 10:06 AM
To: Wu, Feng ; pbonz...@redhat.com; rkrc...@redhat.com
Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

On 2015/12/21 9:50, Wu, Feng wrote:

-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Monday, December 21, 2015 9:46 AM
To: Wu, Feng ; pbonz...@redhat.com; rkrc...@redhat.com
Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

On 2015/12/16 9:37, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts. As an example,
modern Intel CPUs in server platforms use this method to handle
lowest-priority interrupts.

Signed-off-by: Feng Wu
---
 arch/x86/kvm/irq_comm.c | 27 ++-
 arch/x86/kvm/lapic.c    | 57 -
 arch/x86/kvm/lapic.h    |  2 ++
 arch/x86/kvm/x86.c      |  9 
 arch/x86/kvm/x86.h      |  1 +
 5 files changed, 81 insertions(+), 15 deletions(-)

bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
		struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
{
@@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
	dst = map->logical_map[cid];

	if (kvm_lowest_prio_delivery(irq)) {
-		int l = -1;
-		for_each_set_bit(i, &bitmap, 16) {
-			if (!dst[i])
-				continue;
-			if (l < 0)
-				l = i;
-			else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0)
-				l = i;
+		if (!kvm_vector_hashing_enabled()) {
+			int l = -1;
+			for_each_set_bit(i, &bitmap, 16) {
+				if (!dst[i])
+					continue;
+				if (l < 0)
+					l = i;
+				else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0)
+					l = i;
+			}
+			bitmap = (l >= 0) ? 1 << l : 0;
+		} else {
+			int idx = 0;
+			unsigned int dest_vcpus = 0;
+
+			for_each_set_bit(i, &bitmap, 16) {
+				if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {

It should be or (||), not and (&&).

Oh, you are right! My negligence! Thanks for pointing this out, Yang!

btw, I think the kvm_lapic_enabled check is wrong here? Why is it needed here?

If the lapic is not enabled, I think we cannot recognize it as a candidate,
can we? Maybe Radim can confirm this. Radim, what is your opinion?

Lapic can be disabled by hw or sw. Here we only need to check the hw, which
is enough and is already covered while injecting the interrupt into guest.
I remember we (Gleb, Marcelo and me) discussed it a while ago, but I cannot
find the mail thread.

Thanks, Feng

-- 
best regards
yang
Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts
On 2015/12/22 12:36, Wu, Feng wrote:

-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Monday, December 21, 2015 10:01 AM
To: Wu, Feng ; pbonz...@redhat.com; rkrc...@redhat.com
Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

On 2015/12/21 9:55, Wu, Feng wrote:

-Original Message-
From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Yang Zhang
Sent: Monday, December 21, 2015 9:50 AM
To: Wu, Feng ; pbonz...@redhat.com; rkrc...@redhat.com
Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

On 2015/12/16 9:37, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts for VT-d
posted-interrupts.

Signed-off-by: Feng Wu
---
 arch/x86/kvm/lapic.c | 67 
 arch/x86/kvm/lapic.h |  2 ++
 arch/x86/kvm/vmx.c   | 12 --
 3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e29001f..d4f2c8f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -854,6 +854,73 @@ out:
 }

 /*
+ * This routine handles lowest-priority interrupts using the vector-hashing
+ * mechanism. As an example, modern Intel CPUs use this method to handle
+ * lowest-priority interrupts.
+ *
+ * Here are the details about the vector-hashing mechanism:
+ * 1. For lowest-priority interrupts, store all the possible destination
+ *    vCPUs in an array.
+ * 2. Use "guest vector % max number of destination vCPUs" to find the right
+ *    destination vCPU in the array for the lowest-priority interrupt.
+ */
+struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
+					struct kvm_lapic_irq *irq)
+{
+	struct kvm_apic_map *map;
+	struct kvm_vcpu *vcpu = NULL;
+
+	if (irq->shorthand)
+		return NULL;
+
+	rcu_read_lock();
+	map = rcu_dereference(kvm->arch.apic_map);
+
+	if (!map)
+		goto out;
+
+	if ((irq->dest_mode != APIC_DEST_PHYSICAL) &&
+			kvm_lowest_prio_delivery(irq)) {
+		u16 cid;
+		int i, idx = 0;
+		unsigned long bitmap = 1;
+		unsigned int dest_vcpus = 0;
+		struct kvm_lapic **dst = NULL;
+
+
+		if (!kvm_apic_logical_map_valid(map))
+			goto out;
+
+		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
+
+		if (cid >= ARRAY_SIZE(map->logical_map))
+			goto out;
+
+		dst = map->logical_map[cid];
+
+		for_each_set_bit(i, &bitmap, 16) {
+			if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {
+				clear_bit(i, &bitmap);
+				continue;
+			}
+		}
+
+		dest_vcpus = hweight16(bitmap);
+
+		if (dest_vcpus != 0) {
+			idx = kvm_vector_2_index(irq->vector, dest_vcpus,
+						&bitmap, 16);
+			vcpu = dst[idx-1]->vcpu;
+		}
+	}
+
+out:
+	rcu_read_unlock();
+	return vcpu;
+}
+EXPORT_SYMBOL_GPL(kvm_intr_vector_hashing_dest);
+
+/*
 * Add a pending IRQ into lapic.
 * Return 1 if successfully added and 0 if discarded.
 */

diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6890ef0..52bffce 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -172,4 +172,6 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			struct kvm_vcpu **dest_vcpu);
 int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
 			const unsigned long *bitmap, u32 bitmap_size);
+struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
+					struct kvm_lapic_irq *irq);
 #endif

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5eb56ed..3f89189 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
 		 */
 		kvm_set_msi_irq(e, &irq);
-		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
-			continue;
+
+		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
+			if (!kvm_vector_hashing_enabled() ||
+				irq.delivery_mode != APIC_DM_LOWEST)
+				continue;
+
+			vcpu = kvm_intr_vector_hashing_dest(kvm, &irq);
+			if (!vcpu)
+				continue;
+
RE: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts
> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Yang Zhang
> Sent: Monday, December 21, 2015 10:06 AM
> To: Wu, Feng ; pbonz...@redhat.com; rkrc...@redhat.com
> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts
>
> On 2015/12/21 9:50, Wu, Feng wrote:
> >
> >> -Original Message-
> >> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> >> Sent: Monday, December 21, 2015 9:46 AM
> >> To: Wu, Feng ; pbonz...@redhat.com; rkrc...@redhat.com
> >> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> >> Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts
> >>
> >> On 2015/12/16 9:37, Feng Wu wrote:
> >>> Use vector-hashing to deliver lowest-priority interrupts. As an
> >>> example, modern Intel CPUs in server platforms use this method to
> >>> handle lowest-priority interrupts.
> >>>
> >>> Signed-off-by: Feng Wu
> >>> ---
> >>>  arch/x86/kvm/irq_comm.c | 27 ++-
> >>>  arch/x86/kvm/lapic.c    | 57 -
> >>>  arch/x86/kvm/lapic.h    |  2 ++
> >>>  arch/x86/kvm/x86.c      |  9 
> >>>  arch/x86/kvm/x86.h      |  1 +
> >>>  5 files changed, 81 insertions(+), 15 deletions(-)
> >>>
> >>>  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
> >>>  		struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
> >>>  {
> >>> @@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
> >>>  		dst = map->logical_map[cid];
> >>>
> >>>  		if (kvm_lowest_prio_delivery(irq)) {
> >>> -			int l = -1;
> >>> -			for_each_set_bit(i, &bitmap, 16) {
> >>> -				if (!dst[i])
> >>> -					continue;
> >>> -				if (l < 0)
> >>> -					l = i;
> >>> -				else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0)
> >>> -					l = i;
> >>> +			if (!kvm_vector_hashing_enabled()) {
> >>> +				int l = -1;
> >>> +				for_each_set_bit(i, &bitmap, 16) {
> >>> +					if (!dst[i])
> >>> +						continue;
> >>> +					if (l < 0)
> >>> +						l = i;
> >>> +					else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0)
> >>> +						l = i;
> >>> +				}
> >>> +				bitmap = (l >= 0) ? 1 << l : 0;
> >>> +			} else {
> >>> +				int idx = 0;
> >>> +				unsigned int dest_vcpus = 0;
> >>> +
> >>> +				for_each_set_bit(i, &bitmap, 16) {
> >>> +					if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {
> >>
> >> It should be or (||), not and (&&).
> >
> > Oh, you are right! My negligence! Thanks for pointing this out, Yang!
>
> btw, I think the kvm_lapic_enabled check is wrong here? Why is it needed here?

If the lapic is not enabled, I think we cannot recognize it as a candidate,
can we? Maybe Radim can confirm this. Radim, what is your opinion?

Thanks, Feng
RE: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts
> -Original Message- > From: Yang Zhang [mailto:yang.zhang...@gmail.com] > Sent: Monday, December 21, 2015 10:01 AM > To: Wu, Feng ; pbonz...@redhat.com; > rkrc...@redhat.com > Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org > Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d > posted-interrupts > > On 2015/12/21 9:55, Wu, Feng wrote: > > > > > >> -Original Message- > >> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel- > >> ow...@vger.kernel.org] On Behalf Of Yang Zhang > >> Sent: Monday, December 21, 2015 9:50 AM > >> To: Wu, Feng ; pbonz...@redhat.com; > >> rkrc...@redhat.com > >> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org > >> Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d > >> posted-interrupts > >> > >> On 2015/12/16 9:37, Feng Wu wrote: > >>> Use vector-hashing to deliver lowest-priority interrupts for > >>> VT-d posted-interrupts. > >>> > >>> Signed-off-by: Feng Wu > >>> --- > >>>arch/x86/kvm/lapic.c | 67 > >> > >>>arch/x86/kvm/lapic.h | 2 ++ > >>>arch/x86/kvm/vmx.c | 12 -- > >>>3 files changed, 79 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > >>> index e29001f..d4f2c8f 100644 > >>> --- a/arch/x86/kvm/lapic.c > >>> +++ b/arch/x86/kvm/lapic.c > >>> @@ -854,6 +854,73 @@ out: > >>>} > >>> > >>>/* > >>> + * This routine handles lowest-priority interrupts using vector-hashing > >>> + * mechanism. As an example, modern Intel CPUs use this method to > handle > >>> + * lowest-priority interrupts. > >>> + * > >>> + * Here is the details about the vector-hashing mechanism: > >>> + * 1. For lowest-priority interrupts, store all the possible destination > >>> + *vCPUs in an array. > >>> + * 2. Use "guest vector % max number of destination vCPUs" to find the > right > >>> + *destination vCPU in the array for the lowest-priority interrupt. 
> >>> + */ > >>> +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm, > >>> + struct kvm_lapic_irq *irq) > >>> +{ > >>> + struct kvm_apic_map *map; > >>> + struct kvm_vcpu *vcpu = NULL; > >>> + > >>> + if (irq->shorthand) > >>> + return NULL; > >>> + > >>> + rcu_read_lock(); > >>> + map = rcu_dereference(kvm->arch.apic_map); > >>> + > >>> + if (!map) > >>> + goto out; > >>> + > >>> + if ((irq->dest_mode != APIC_DEST_PHYSICAL) && > >>> + kvm_lowest_prio_delivery(irq)) { > >>> + u16 cid; > >>> + int i, idx = 0; > >>> + unsigned long bitmap = 1; > >>> + unsigned int dest_vcpus = 0; > >>> + struct kvm_lapic **dst = NULL; > >>> + > >>> + > >>> + if (!kvm_apic_logical_map_valid(map)) > >>> + goto out; > >>> + > >>> + apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap); > >>> + > >>> + if (cid >= ARRAY_SIZE(map->logical_map)) > >>> + goto out; > >>> + > >>> + dst = map->logical_map[cid]; > >>> + > >>> + for_each_set_bit(i, &bitmap, 16) { > >>> + if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) { > >>> + clear_bit(i, &bitmap); > >>> + continue; > >>> + } > >>> + } > >>> + > >>> + dest_vcpus = hweight16(bitmap); > >>> + > >>> + if (dest_vcpus != 0) { > >>> + idx = kvm_vector_2_index(irq->vector, dest_vcpus, > >>> + &bitmap, 16); > >>> + vcpu = dst[idx-1]->vcpu; > >>> + } > >>> + } > >>> + > >>> +out: > >>> + rcu_read_unlock(); > >>> + return vcpu; > >>> +} > >>> +EXPORT_SYMBOL_GPL(kvm_intr_vector_hashing_dest); > >>> + > >>> +/* > >>> * Add a pending IRQ into lapic. > >>> * Return 1 if successfully added and 0 if discarded. 
> >>> */ > >>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > >>> index 6890ef0..52bffce 100644 > >>> --- a/arch/x86/kvm/lapic.h > >>> +++ b/arch/x86/kvm/lapic.h > >>> @@ -172,4 +172,6 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, > >> struct kvm_lapic_irq *irq, > >>> struct kvm_vcpu **dest_vcpu); > >>>int kvm_vector_2_index(u32 vector, u32 dest_vcpus, > >>> const unsigned long *bitmap, u32 bitmap_size); > >>> +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm, > >>> + struct kvm_lapic_irq *irq); > >>>#endif > >>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >>> index 5eb56ed..3f89189 100644 > >>> --- a/arch/x86/kvm/vmx.c > >>> +++ b/arch/x86/kvm/vmx.c > >>> @@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm > *kvm, > >> unsigned int host_irq, > >>>*/ > >>> > >>> kvm_set_msi_irq(e, &irq); > >>> - if (!kvm_intr_is_single_vcpu(kvm, &ir
Re: RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform
Hi, Kevin, Can you tell how to reset/reboot this VM if it goes into handle_hwpic1() during its boot procedure? I mean, usually, SeaBIOS would not enter the handle_hwpic routine. But in my test case, SeaBIOS calls handle_hwpic when KVM injects a #UD exception (not an irq), and SeaBIOS will loop handling this if KVM persistently injects the exception. Now, I just wish to reset/reboot this VM if it falls into handle_hwpic. I tried the following patch and it seems not to work. What can I do to force a reset/reboot? @@ -7,10 +7,11 @@ #include "biosvar.h" // SET_IVT #include "config.h" // CONFIG_* #include "output.h" // dprintf #include "pic.h" // pic_* + #include "hw/ps2port.h" u16 pic_irqmask_read(void) { if (!CONFIG_HARDWARE_IRQ) @@ -103,10 +104,11 @@ pic_isr2_read(void) void VISIBLE16 handle_hwpic1(struct bregs *regs) { dprintf(DEBUG_ISR_hwpic1, "handle_hwpic1 irq=%x\n", pic_isr1_read()); pic_eoi1(); +i8042_reboot(); } useful information: kmod ftrace: <...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 8306 <...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real) <...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0) <...>-31509 [035] 154753.180077: kvm_entry: vcpu 0 bad SeaBIOS log: [2015-12-17 12:37:30] In 32bit resume [2015-12-17 12:37:30] =Attempting a hard reboot [2015-12-17 12:37:30] SeaBIOS (version rel-1.8.1-0-g4adadbd-20151217_104405-linux-emBwNn) [2015-12-17 12:37:30] No Xen hypervisor found. 
[2015-12-17 12:37:30] Running on QEMU (i440fx) [2015-12-17 12:37:30] Running on KVM [2015-12-17 12:37:30] RamSize: 0x8000 [cmos] [2015-12-17 12:37:30] Relocating init from 0x000db230 to 0x7ffad360 (size 76768) [2015-12-17 12:37:30] Found QEMU fw_cfg [2015-12-17 12:37:30] RamBlock: addr 0x len 0x8000 [e820] [2015-12-17 12:37:30] Moving pm_base to 0x600 [2015-12-17 12:37:30] boot order: [2015-12-17 12:37:30] 1: /pci@i0cf8/ide@1,1/drive@0/disk@0 [2015-12-17 12:37:30] 2: HALT [2015-12-17 12:37:30] maininit [2015-12-17 12:37:30] platform_hardware_setup [2015-12-17 12:37:30] init pic [2015-12-17 12:37:30] pic_setup [2015-12-17 12:37:30] pic_reset [2015-12-17 12:37:30] enable_hwirq [2015-12-17 12:37:30] CPU Mhz=3304 [2015-12-17 12:37:30] enable_hwirq [2015-12-17 12:37:30] enable_hwirq [2015-12-17 12:37:30] === PCI bus & bridge init === [2015-12-17 12:37:30] PCI: pci_bios_init_bus_rec bus = 0x0 [2015-12-17 12:37:30] === PCI device probing === [2015-12-17 12:37:30] Found 6 PCI devices (max PCI bus is 00) [2015-12-17 12:37:30] === PCI new allocation pass #1 === [2015-12-17 12:37:30] PCI: check devices [2015-12-17 12:37:30] === PCI new allocation pass #2 === [2015-12-17 12:37:30] PCI: IO: c000 - c02f [2015-12-17 12:37:30] PCI: 32: 8000 - fec0 [2015-12-17 12:37:30] PCI: map device bdf=00:01.2 bar 4, addr c000, size 0020 [io] [2015-12-17 12:37:30] PCI: map device bdf=00:01.1 bar 4, addr c020, size 0010 [io] [2015-12-17 12:37:30] PCI: map device bdf=00:02.0 bar 6, addr febe, size 0001 [mem] [2015-12-17 12:37:30] PCI: map device bdf=00:02.0 bar 1, addr febf, size 1000 [mem] [2015-12-17 12:37:30] PCI: map device bdf=00:02.0 bar 0, addr fc00, size 0200 [prefmem] [2015-12-17 12:37:30] PCI: init bdf=00:00.0 id=8086:1237 [2015-12-17 12:37:30] PCI: init bdf=00:01.0 id=8086:7000 [2015-12-17 12:37:30] PIIX3/PIIX4 init: elcr=00 0c [2015-12-17 12:37:30] PCI: init bdf=00:01.1 id=8086:7010 [2015-12-17 12:37:30] PCI: init bdf=00:01.2 id=8086:7020 [2015-12-17 12:37:30] PCI: init bdf=00:01.3 
id=8086:7113 [2015-12-17 12:37:30] Using pmtimer, ioport 0x608 [2015-12-17 12:37:30] PCI: init bdf=00:02.0 id=1013:00b8 [2015-12-17 12:37:30] PCI: Using 00:02.0 for primary VGA [2015-12-17 12:37:30] handle_hshamanpnd:dl leae_p_sismcmp_p:i: d a=ap3 <<=== everytime stuck, AP setup log seems abnormal! [2015-12-17 12:37:30] ièf[cf_^ifd_=f3 [2015-12-17 12:37:30] èf[f^f_f]fÃÍ^XË<90>Found 4 cpu(s) max supported 4 cpu(s) [2015-12-17 12:37:30] Copying PIR from 0x7ffbea18 to 0x000f5700 [2015-12-17 12:37:30] Copying MPTABLE from 0x6e30/7ffa42c0 to 0x000f55e0 [2015-12-17 12:37:30] Copying SMBIOS entry point from 0x6e11 to 0x000f55c0 [2015-12-17 12:37:31] Scan for VGA option rom [2015-12-17 12:37:31] Running option rom at c000:0003 [2015-12-17 12:37:31] Start SeaVGABIOS (version rel-1.8.1-0-g4adadbd-20150316_085902-nilsson.home.kraxel.org) [2015-12-17 12:37:31] enter vga_post: [2015-12-17 12:37:31]a=0010 b= c= d= ds= es=f000 ss= [2015-12-17 12:37:31] si= di=57e0 bp= sp=6dbe cs=f000 ip=d1fb f= [2015-12-17 12:37:31] cirrus init [2015-12-17 12:37:31] cirrus init 2 [2015-12-17 12:37:31] Attempting to allocate VGA stack via pmm call to f000:d2a0 <<== here stuck, loop handle PIC irq0 [2015-12-17 12:37:35] handle_hwpic1 irq=0 [2015-12-17 12:37:35] handle_hwpic1 irq=0 [
RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform
> -Original Message- > From: Kevin O'Connor [mailto:ke...@koconnor.net] > Sent: Tuesday, December 22, 2015 2:47 AM > To: Gonglei (Arei) > Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org; > Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar > Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy > problem on qemu-kvm platform > > On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote: > > When the grub of the OS is booting, the softirq and the C function > > send_disk_op() may use the extra stack of SeaBIOS. If we inject an NMI, > > romlayout.S: irqentry_extrastack is invoked, and the extra stack will > > be used again. And the stack of the first call will be broken, so that > SeaBIOS gets stuck. > > > > You can easily reproduce the problem. > > > > 1. start a guest > > 2. reset the guest > > 3. inject an NMI when the guest shows the grub screen 4. then the guest > > gets stuck > > Does the SeaBIOS patch below help? Sorry, it doesn't work. What's worse, we cannot stop SeaBIOS from getting stuck by setting "CONFIG_ENTRY_EXTRASTACK=n" after applying this patch. > I'm not familiar with how to "inject a > NMI" - can you describe the process in more detail? > 1. Qemu Command line: #: /home/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -smp 8 -name suse -vnc 0.0.0.0:10 \ -device virtio-scsi-pci,id=scsi0 -drive file=/home/suse11_sp3_32_2,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native \ -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \ -chardev file,id=seabios,path=/home/seabios.log -device isa-debugcon,iobase=0x402,chardev=seabios \ -monitor stdio -qmp unix:/tmp/qmp,server,nowait 2. Inject a NMI by QMP: #: /home/qemu/scripts/qmp # ./qmp-shell /tmp/qmp Welcome to the QMP low-level shell! 
Connected to QEMU 2.5.0 (QEMU) system_reset {"return": {}} (QEMU) inject-nmi {"return": {}} (QEMU) inject-nmi {"return": {}} Regards, -Gonglei > -Kevin > > > --- a/src/romlayout.S > +++ b/src/romlayout.S > @@ -548,7 +548,9 @@ entry_post: > ENTRY_INTO32 _cfunc32flat_handle_post // Normal entry point > > ORG 0xe2c3 > -IRQ_ENTRY 02 > +.global entry_02 > +entry_02: > +ENTRY handle_02 // NMI handler does not switch onto extra > +stack > > ORG 0xe3fe > .global entry_13_official -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
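For anyone else trying to reproduce this, the inject-nmi sequence above can also be driven without qmp-shell. The sketch below builds the QMP messages by hand; it assumes the -qmp unix:/tmp/qmp,server,nowait socket from the command line above, and send_session is an illustrative helper (no running QEMU is needed just to construct the messages).

```python
import json
import socket


def qmp_messages(commands):
    """Serialize a QMP session: capabilities negotiation first, then
    the requested commands (e.g. "system_reset", "inject-nmi")."""
    msgs = [{"execute": "qmp_capabilities"}]
    msgs += [{"execute": cmd} for cmd in commands]
    return [json.dumps(m) for m in msgs]


def send_session(path, commands):
    """Send the session over the QMP unix socket (path is e.g. /tmp/qmp)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.recv(4096)  # discard the greeting banner
        for msg in qmp_messages(commands):
            sock.sendall(msg.encode() + b"\r\n")
            sock.recv(4096)  # discard the {"return": {}} acknowledgements


if __name__ == "__main__":
    # Same sequence as the qmp-shell transcript above:
    print(qmp_messages(["system_reset", "inject-nmi", "inject-nmi"]))
```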
[PATCH v2 01/14] qapi: Rename (one) qjson.h to qobject-json.h
We have two different JSON visitors in the tree; and having both named 'qjson.h' can cause include confusion. Rename the qapi version. Why did I pick that one? A later patch plans on deleting the top-level qjson.c once we have a native JSON output visitor; we could have renamed that one for less overall churn. On the other hand, all of the QObject subtypes have their own qFOO.c file, but qjson.c makes it sound like we have a QTYPE_JSON subclass of QObject; the new name of qobject-json makes it obvious that the file is used for conversions between QObject and JSON, and not a QObject subtype. Kill trailing whitespace in the renamed tests/check-qobject-json.c to keep checkpatch.pl happy. Signed-off-by: Eric Blake Reviewed-by: Paolo Bonzini --- v2: retitle, enhance commit message, rebase to master --- MAINTAINERS | 2 +- balloon.c | 2 +- block.c | 2 +- block/archipelago.c | 2 +- block/nbd.c | 2 +- block/quorum.c| 2 +- blockjob.c| 2 +- hw/core/qdev.c| 2 +- hw/misc/pvpanic.c | 2 +- hw/net/virtio-net.c | 2 +- include/qapi/qmp/{qjson.h => qobject-json.h} | 0 include/qapi/qmp/types.h | 2 +- monitor.c | 2 +- qapi/qmp-event.c | 2 +- qemu-img.c| 2 +- qga/main.c| 2 +- qobject/Makefile.objs | 3 ++- qobject/{qjson.c => qobject-json.c} | 2 +- target-s390x/kvm.c| 2 +- tests/.gitignore | 2 +- tests/Makefile| 8 tests/{check-qjson.c => check-qobject-json.c} | 14 +++--- tests/libqtest.c | 2 +- ui/spice-core.c | 2 +- vl.c | 2 +- 25 files changed, 34 insertions(+), 33 deletions(-) rename include/qapi/qmp/{qjson.h => qobject-json.h} (100%) rename qobject/{qjson.c => qobject-json.c} (99%) rename tests/{check-qjson.c => check-qobject-json.c} (99%) diff --git a/MAINTAINERS b/MAINTAINERS index 55a0fd8..81fd039 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1155,7 +1155,7 @@ X: include/qapi/qmp/dispatch.h F: tests/check-qdict.c F: tests/check-qfloat.c F: tests/check-qint.c -F: tests/check-qjson.c +F: tests/check-qobject-json.c F: tests/check-qlist.c F: tests/check-qstring.c T: git 
git://repo.or.cz/qemu/qmp-unstable.git queue/qmp diff --git a/balloon.c b/balloon.c index 0f45d1b..5983b4f 100644 --- a/balloon.c +++ b/balloon.c @@ -31,7 +31,7 @@ #include "trace.h" #include "qmp-commands.h" #include "qapi/qmp/qerror.h" -#include "qapi/qmp/qjson.h" +#include "qapi/qmp/qobject-json.h" static QEMUBalloonEvent *balloon_event_fn; static QEMUBalloonStatus *balloon_stat_fn; diff --git a/block.c b/block.c index 411edbf..0b7eb00 100644 --- a/block.c +++ b/block.c @@ -30,7 +30,7 @@ #include "qemu/module.h" #include "qapi/qmp/qerror.h" #include "qapi/qmp/qbool.h" -#include "qapi/qmp/qjson.h" +#include "qapi/qmp/qobject-json.h" #include "sysemu/block-backend.h" #include "sysemu/sysemu.h" #include "qemu/notify.h" diff --git a/block/archipelago.c b/block/archipelago.c index 855655c..80a1bb5 100644 --- a/block/archipelago.c +++ b/block/archipelago.c @@ -56,7 +56,7 @@ #include "qemu/thread.h" #include "qapi/qmp/qint.h" #include "qapi/qmp/qstring.h" -#include "qapi/qmp/qjson.h" +#include "qapi/qmp/qobject-json.h" #include "qemu/atomic.h" #include diff --git a/block/nbd.c b/block/nbd.c index 416f42b..ef53083 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -32,7 +32,7 @@ #include "qemu/module.h" #include "qemu/sockets.h" #include "qapi/qmp/qdict.h" -#include "qapi/qmp/qjson.h" +#include "qapi/qmp/qobject-json.h" #include "qapi/qmp/qint.h" #include "qapi/qmp/qstring.h" diff --git a/block/quorum.c b/block/quorum.c index 6793f12..a64b40d 100644 --- a/block/quorum.c +++ b/block/quorum.c @@ -18,7 +18,7 @@ #include "qapi/qmp/qdict.h" #include "qapi/qmp/qerror.h" #include "qapi/qmp/qint.h" -#include "qapi/qmp/qjson.h" +#include "qapi/qmp/qobject-json.h" #include "qapi/qmp/qlist.h" #include "qapi/qmp/qstring.h" #include "qapi-event.h" diff --git a/blockjob.c b/blockjob.c index 80adb9d..84361f7 100644 --- a/blockjob.c +++ b/blockjob.c @@ -31,7 +31,7 @@ #include "block/block_int.h" #include "sysemu/block-backend.h" #include "qapi/qmp/qerror.h" -#include "qapi/qmp/qjson.h" 
+#include "qapi/qmp/qobject-json.h" #include "qemu/coroutine.h" #include "qmp-commands.h" #include "qemu/timer.h" diff --git a/hw/core/qdev.c b/hw/core/qdev.c index 3447f93..149028d 100644 --
Re: kvmclock doesn't work, help?
On Fri, Dec 18, 2015 at 1:49 PM, Marcelo Tosatti wrote: > On Fri, Dec 18, 2015 at 12:25:11PM -0800, Andy Lutomirski wrote: >> [cc: John Stultz -- maybe you have ideas on how this should best >> integrate with the core code] >> >> On Fri, Dec 18, 2015 at 11:45 AM, Marcelo Tosatti >> wrote: >> > Can you write an actual proposal (with details) that accomodates the >> > issue described at "Assuming a stable TSC across physical CPUS, and a >> > stable TSC" ? >> > >> > Yes it would be nicer, the IPIs (to stop the vcpus) are problematic for >> > realtime guests. >> >> This shouldn't require many details, and I don't think there's an ABI >> change. The rules are: >> >> When the overall system timebase changes (e.g. when the selected >> clocksource changes or when update_pvclock_gtod is called), the KVM >> host would: >> >> optionally: preempt_disable(); /* for performance */ >> >> for all vms { >> >> for all registered pvti structures { >> pvti->version++; /* should be odd now */ >> } > > pvti is userspace data, so you have to pin it before? Yes. Fortunately, most systems probably only have one page of pvti structures, I think (unless there are a ton of vcpus), so the performance impact should be negligible. > >> /* Note: right now, any vcpu that tries to access pvti will start >> infinite looping. We should add cpu_relax() to the guests. */ >> >> for all registered pvti structures { >> update everything except pvti->version; >> } >> >> for all registered pvti structures { >> pvti->version++; /* should be even now */ >> } >> >> cond_resched(); >> } >> >> Is this enough detail? This should work with all existing guests, >> too, unless there's a buggy guest out there that actually fails to >> double-check version. > > What is the advantage of this over the brute force method, given > that guests will busy spin? > > (busy spin is equally problematic as IPI for realtime guests). I disagree. 
It's never been safe to call clock_gettime from an RT task and expect a guarantee of real-time performance. We could fix that, but it's not even safe on non-KVM. Sending an IPI *always* stalls the task. Taking a lock (which is effectively what this is doing) only stalls the tasks that contend for the lock, which, most of the time, means that nothing stalls. Also, if the host disables preemption or otherwise boosts its priority while version is odd, then the actual stall will be very short, in contrast to an IPI-induced stall, which will be much, much longer. --Andy -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
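The update protocol Andy describes above is essentially a seqlock: the host bumps version to odd, rewrites the fields, then bumps it to even, while the guest rereads until it observes a stable even version. A minimal single-threaded Python model of that invariant (the real structures are the per-vCPU pvclock_vcpu_time_info pages; the names here are illustrative):

```python
class Pvti:
    """Toy model of a pvclock time info structure."""
    def __init__(self):
        self.version = 0            # even = stable, odd = update in flight
        self.tsc_to_system_mul = 0
        self.system_time = 0


def host_update(pvti, mul, system_time):
    """Host side: version goes odd, fields are rewritten, version goes even."""
    pvti.version += 1               # odd now: readers start retrying
    assert pvti.version % 2 == 1
    pvti.tsc_to_system_mul = mul
    pvti.system_time = system_time
    pvti.version += 1               # even now: the snapshot is consistent


def guest_read(pvti, max_retries=100):
    """Guest side: loop until version is even and unchanged across the
    read (this retry loop is where a real guest should cpu_relax())."""
    for _ in range(max_retries):
        v1 = pvti.version
        if v1 % 2 == 1:
            continue                # update in flight, spin
        snapshot = (pvti.tsc_to_system_mul, pvti.system_time)
        if pvti.version == v1:
            return snapshot
    raise RuntimeError("no stable snapshot")
```

In a real SMP setting the two version increments need the memory barriers the pvclock ABI already specifies; the sketch only shows the control flow that makes readers "lock out" while version is odd.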
Re: [PATCH v3 9/9] KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection
Redirecting the wakeup of a VCPU from the H_IPI hypercall to a core running in the host is usually a good idea, most workloads seemed to benefit. However, in one heavily interrupt-driven SMT1 workload, some regression was observed. This patch adds a kvm_hv module parameter called h_ipi_redirect to control this feature. The default value for this tunable is 1 - that is enable the feature. Signed-off-by: Suresh Warrier --- Resending the updated patch with the updated diff since an earlier patch (patch 8/9) had to be resent to fix a build break. arch/powerpc/include/asm/kvm_ppc.h | 1 + arch/powerpc/kvm/book3s_hv.c | 11 +++ arch/powerpc/kvm/book3s_hv_rm_xics.c | 5 - 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 1b93519..29d1442 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -448,6 +448,7 @@ extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval); extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev, struct kvm_vcpu *vcpu, u32 cpu); extern void kvmppc_xics_ipi_action(void); +extern int h_ipi_redirect; #else static inline void kvmppc_alloc_host_rm_ops(void) {}; static inline void kvmppc_free_host_rm_ops(void) {}; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index d6280ed..182ec84 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -81,6 +81,17 @@ static int target_smt_mode; module_param(target_smt_mode, int, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)"); +#ifdef CONFIG_KVM_XICS +static struct kernel_param_ops module_param_ops = { + .set = param_set_int, + .get = param_get_int, +}; + +module_param_cb(h_ipi_redirect, &module_param_ops, &h_ipi_redirect, + S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core"); +#endif + static void kvmppc_end_cede(struct kvm_vcpu *vcpu); static 
int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index e673fb9..980d8a6 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -24,6 +24,9 @@ #define DEBUG_PASSUP +int h_ipi_redirect = 1; +EXPORT_SYMBOL(h_ipi_redirect); + static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, u32 new_irq); @@ -148,7 +151,7 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu, cpu = vcpu->arch.thread_cpu; if (cpu < 0 || cpu >= nr_cpu_ids) { hcore = -1; - if (kvmppc_host_rm_ops_hv) + if (kvmppc_host_rm_ops_hv && h_ipi_redirect) hcore = find_available_hostcore(XICS_RM_KICK_VCPU); if (hcore != -1) { icp_send_hcore_msg(hcore, vcpu); -- 1.8.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
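The decision this tunable gates can be modeled simply: with h_ipi_redirect enabled (and the real-mode ops set up), the H_IPI handler searches for a free host core and hands it the wakeup; otherwise it falls back to exiting to the host. A rough Python sketch — the circular scan mirrors the grab_next_hostcore() search in patch 8/9, but the data layout and the plain-assignment "reservation" (a cmpxchg in the real code) are simplifications:

```python
def find_available_hostcore(rm_cores, my_core):
    """Scan circularly from my_core+1 for a core that is running in host
    context and not already targeted by another real-mode action.
    Returns the reserved core index, or -1 if none is available."""
    n = len(rm_cores)
    for off in range(1, n + 1):
        core = (my_core + off) % n
        state = rm_cores[core]
        if state["in_host"] and state["rm_action"] is None:
            state["rm_action"] = "KICK_VCPU"   # reserve (cmpxchg in real code)
            return core
    return -1


def wake_vcpu(rm_cores, my_core, h_ipi_redirect):
    """Return how the H_IPI wakeup would be delivered."""
    if h_ipi_redirect:
        core = find_available_hostcore(rm_cores, my_core)
        if core != -1:
            return ("ipi-to-host-core", core)
    return ("exit-to-host", None)
```

With the tunable off (h_ipi_redirect=0), the search is skipped entirely and every such wakeup takes the exit-to-host path, which is the pre-patch behaviour.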
Re: [PATCH v3 8/9] KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU
This patch adds support to real-mode KVM to search for a core running in the host partition and send it an IPI message with VCPU to be woken. This avoids having to switch to the host partition to complete an H_IPI hypercall when the VCPU which is the target of the H_IPI is not loaded (is not running in the guest). The patch also includes the support in the IPI handler running in the host to do the wakeup by calling kvmppc_xics_ipi_action for the PPC_MSG_RM_HOST_ACTION message. When a guest is being destroyed, we need to ensure that there are no pending IPIs waiting to wake up a VCPU before we free the VCPUs of the guest. This is accomplished by: - Forcing a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs before freeing any VCPUs in kvm_arch_destroy_vm(). - Executing any PPC_MSG_RM_HOST_ACTION messages before any other PPC_MSG_CALL_FUNCTION messages. Signed-off-by: Suresh Warrier --- Fixed build break for CONFIG_SMP=n (thanks to Mike Ellerman for pointing that out). arch/powerpc/kernel/smp.c| 11 + arch/powerpc/kvm/book3s_hv_rm_xics.c | 92 ++-- arch/powerpc/kvm/powerpc.c | 10 3 files changed, 110 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index e222efc..cb8be5d 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -257,6 +257,17 @@ irqreturn_t smp_ipi_demux(void) do { all = xchg(&info->messages, 0); +#if defined(CONFIG_KVM_XICS) && defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) + /* +* Must check for PPC_MSG_RM_HOST_ACTION messages +* before PPC_MSG_CALL_FUNCTION messages because when +* a VM is destroyed, we call kick_all_cpus_sync() +* to ensure that any pending PPC_MSG_RM_HOST_ACTION +* messages have completed before we free any VCPUs. 
+*/ + if (all & IPI_MESSAGE(PPC_MSG_RM_HOST_ACTION)) + kvmppc_xics_ipi_action(); +#endif if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNCTION)) generic_smp_call_function_interrupt(); if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE)) diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c index 43ffbfe..e673fb9 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c @@ -51,11 +51,84 @@ static void ics_rm_check_resend(struct kvmppc_xics *xics, /* -- ICP routines -- */ +#ifdef CONFIG_SMP +static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu) +{ + int hcpu; + + hcpu = hcore << threads_shift; + kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu; + smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION); + icp_native_cause_ipi_rm(hcpu); +} +#else +static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu) { } +#endif + +/* + * We start the search from our current CPU Id in the core map + * and go in a circle until we get back to our ID looking for a + * core that is running in host context and that hasn't already + * been targeted for another rm_host_ops. + * + * In the future, could consider using a fairer algorithm (one + * that distributes the IPIs better) + * + * Returns -1, if no CPU could be found in the host + * Else, returns a CPU Id which has been reserved for use + */ +static inline int grab_next_hostcore(int start, + struct kvmppc_host_rm_core *rm_core, int max, int action) +{ + bool success; + int core; + union kvmppc_rm_state old, new; + + for (core = start + 1; core < max; core++) { + old = new = READ_ONCE(rm_core[core].rm_state); + + if (!old.in_host || old.rm_action) + continue; + + /* Try to grab this host core if not taken already. 
*/ + new.rm_action = action; + + success = cmpxchg64(&rm_core[core].rm_state.raw, + old.raw, new.raw) == old.raw; + if (success) { + /* +* Make sure that the store to the rm_action is made +* visible before we return to caller (and the +* subsequent store to rm_data) to synchronize with +* the IPI handler. +*/ + smp_wmb(); + return core; + } + } + + return -1; +} + +static inline int find_available_hostcore(int action) +{ + int core; + int my_core = smp_processor_id() >> threads_shift; + struct kvmppc_host_rm_core *rm_core = kvmppc_host_rm_ops_hv->rm_core; + + core = grab_next_hostcore(my_core, rm_core, cpu_nr_cores(), action); + if (core == -1) + core = grab_next_hostcore(core, rm_core, my
Re: [kvm-unit-tests PATCH 3/3] add timeout support
On Mon, Dec 21, 2015 at 06:04:20PM +0100, Radim Krčmář wrote: > 2015-12-17 14:10-0600, Andrew Jones: > > Signed-off-by: Andrew Jones > > --- > > diff --git a/arm/run b/arm/run > > @@ -75,10 +75,14 @@ chr_testdev+=' -device virtconsole,chardev=ctd -chardev > > testdev,id=ctd' > > M+=",accel=$ACCEL" > > command="$qemu $M -cpu $processor $chr_testdev" > > command+=" -display none -serial stdio -kernel" > > -echo $command "$@" > > + > > +if [ "$TIMEOUT" ]; then > > + timeout_cmd="timeout --foreground $TIMEOUT" > > (command="timeout --foreground $TIMEOUT $command" # to keep the clutter > down.) > > > +fi > > (We paste it three times, so I'd rather see this abstracted in a > "scripts/" library.) Sounds good > > > diff --git a/run_tests.sh b/run_tests.sh > > @@ -21,6 +21,7 @@ function run() > > +local timeout="${9:-$TIMEOUT}" > > diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh > > @@ -97,8 +98,12 @@ if [ "\$QEMU" ]; then > > +if [ "$timeout" ]; then > > + timeout_cmd='timeout --foreground $timeout' > > Both would be nicer if they took the TIMEOUT variable as an override. Everything already takes TIMEOUT as an override, i.e. TIMEOUT=3 ./run_tests.sh and TIMEOUT=3 arm/run arm/test.flat will both already set a timeout for any test that didn't have a timeout set in unittests.cfg, or wasn't run with run()/unittests.cfg. Or, did you mean that you'd prefer TIMEOUT to override the timeout in unittests.cfg? That does make some sense, in the case the one in the config is longer than desired, however it also makes sense the way I have it now when the one in the config is shorter than TIMEOUT (the fallback default). I think I like it this way better. > We already don't do that for accel and the patch seems ok in other > regards, Hmm, for accel I see a need for a patch allowing us to do ACCEL=?? ./run_tests.sh as I already have for TIMEOUT. Also, for both I should add a mkstandalone patch allowing TIMEOUT=? ACCEL=? 
make standalone Thanks, drew > > Reviewed-by: Radim Krčmář > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
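For reference, the run-script change discussed above boils down to wrapping the qemu invocation in coreutils timeout(1), which exits with 124 when the deadline expires. The same behaviour can be sketched with Python's subprocess module (the commands run below are illustrative):

```python
import subprocess
import sys

TIMEOUT_RC = 124  # what coreutils `timeout` returns on expiry


def run_with_timeout(argv, timeout):
    """Run argv, imitating `timeout --foreground N cmd ...`: return the
    command's own exit code, or 124 if the deadline expires first (the
    child is killed on expiry)."""
    try:
        return subprocess.run(argv, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return TIMEOUT_RC


if __name__ == "__main__":
    # A command that finishes in time keeps its own exit code.
    print(run_with_timeout([sys.executable, "-c", "pass"], 5))  # prints 0
```

This is also why a TIMEOUT fallback composes cleanly with the exit-code discussion in the other thread: 124 stays reserved for "test timed out" and never collides with a unit test's own failure codes.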
Re: [kvm-unit-tests PATCH 1/3] run_tests.sh: reduce return code ambiguity
On Mon, Dec 21, 2015 at 05:31:24PM +0100, Radim Krčmář wrote: > 2015-12-17 14:10-0600, Andrew Jones: > > qemu/unittest exit codes are convoluted, causing codes 0 and 1 > > to be ambiguous. Here are the possible meanings > > > > .-. > > || 0 | 1 | > > |-| > > | QEMU | did something successfully, | FAILURE| > > || but probably didn't run the || > > || unittest, OR caught SIGINT, || > > || SIGHUP, or SIGTERM|| > > |-| > > | unittest | for some reason exited using | SUCCESS| > > || ACPI/PSCI, not with debug-exit|| > > .-. > > > > As we can see above, an exit code of zero is even ambiguous for each > > row, i.e. QEMU could exit with zero because it successfully completed, > > or because it caught a signal. unittest could exit with zero because > > it successfully powered-off, or because for some odd reason it powered- > > off instead of calling debug-exit. > > > > And, the most fun is that exit-code == 1 means QEMU failed, but the > > unittest succeeded. > > > > This patch attempts to reduce that ambiguity, by also looking at stderr. > > Nice. > > > With it, we have > > > > 0 - unexpected exit from qemu, or the unittest not using debug-exit. > > Consider it a FAILURE > > 1 - unittest SUCCESS > > < 128 - something failed (could be the unittest, qemu, or a run script) > > Check the logs. > > >= 128 - signal (signum = code - 128) > > I think this heuristic should be applied to {arm,x86}/run. > run_tests.sh would inherit it and we would finally get reasonable exit > values everywhere. Good idea. We can add the table to a scripts/functions.bash function and use it everywhere. > > The resulting table would look like this: > > 0 = unit-test passed > 77= unit-test skipped (not implemented yet) > 124 = unit-test timeouted (implemented in [3/3]) > 127 = qemu returned 0 (debug-exit probably wasn't called) We already use 127 for abort(), called from a unit test, see lib/abort.c. I guess we can use 126 for "debug-exit probably wasn't called". 
We should also add a (unit test called abort) message for 127. > > 128 = exited because of signal $? - 128 > * = unit-test failed > > (Signal 0 is not used, so we could map 128 to mean "debug-exit probably > wasn't called", but others might not understand our signal convention. I think we want 128 to be the beginning of signal space, which goes all the way up to 255, in order to allow exit code masking to work. > Anyway, it'd be best for us to start at 200, for `case $? in 2??)` ...) Start what at 200? I think we have everything covered above. The mapping looks like this 0 = success 1-63= unit test failure code 64-127 = test suite failure code 128-255 = signal which sounds good to me. > > > Signed-off-by: Andrew Jones > > --- > > diff --git a/run_tests.sh b/run_tests.sh > > @@ -54,10 +55,32 @@ function run() > > > > # extra_params in the config file may contain backticks that need to be > > # expanded, so use eval to start qemu > > -eval $cmdline >> test.log > > +errlog=$(mktemp) > > +eval $cmdline >> test.log 2> $errlog > | [...] > | cat $errlog >> test.log > > This assumes that stderr is always after stdout, True. I'm not sure that matters when the unit test, which only uses stdout will always output stuff serially with qemu, which could output a mix. But your version below is fine by me if we want to pick up the need for the pipe and tee. > > eval $cmdline 2>&1 >> test.log | tee $errlog >> test.log > > has a chance to print lines in wrong order too, but I think it's going > to be closer to the original. I'll play with it and send a v2 soon. Thanks, drew > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
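The convention converged on above — 0 = success, 1–63 = unit-test failure, 64–127 = test-suite failure (with 77 = skip, 124 = timeout, 127 = abort carved out), 128–255 = 128 + signal number — is easy to express as a small classifier. A sketch of that mapping (function name is illustrative):

```python
def classify_exit(code):
    """Map a run-script exit code to a human-readable status, following
    the convention discussed above (77/124/127 carry special meanings)."""
    if code == 0:
        return "pass"
    if code == 77:
        return "skip"
    if code == 124:
        return "timeout"
    if code == 127:
        return "abort"
    if 1 <= code <= 63:
        return "unit-test failure (code %d)" % code
    if 64 <= code <= 127:
        return "test-suite failure (code %d)" % code
    if 128 <= code <= 255:
        return "killed by signal %d" % (code - 128)
    return "unknown"
```

Keeping all of 128–255 for signals (rather than reusing 128 for "debug-exit not called") is what makes `case $? in 1??)`-style masking in the shell wrappers work without special cases.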
Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch
On 12/18/2015 11:45 PM, Christoffer Dall wrote: > On Fri, Dec 18, 2015 at 05:17:00PM -0800, Mario Smarduch wrote: >> On 12/18/2015 5:54 AM, Christoffer Dall wrote: >>> On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote: This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 register. On vcpu_load for 32 bit guests enable FP access, and enable fp/simd trapping for 32 and 64 bit guests. On first fp/simd access trap to handler to save host and restore guest context, and clear trapping bits to enable vcpu lazy mode. On vcpu_put if trap bits are clear save guest and restore host context and also save 32 bit guest fpexc register. Signed-off-by: Mario Smarduch --- arch/arm/include/asm/kvm_emulate.h | 5 ++ arch/arm/include/asm/kvm_host.h | 2 + arch/arm/kvm/arm.c | 20 +-- arch/arm64/include/asm/kvm_asm.h | 2 + arch/arm64/include/asm/kvm_emulate.h | 15 +++-- arch/arm64/include/asm/kvm_host.h| 16 +- arch/arm64/kernel/asm-offsets.c | 1 + arch/arm64/kvm/Makefile | 3 +- arch/arm64/kvm/fpsimd_switch.S | 38 arch/arm64/kvm/hyp.S | 108 +-- arch/arm64/kvm/hyp_head.S| 48 11 files changed, 181 insertions(+), 77 deletions(-) create mode 100644 arch/arm64/kvm/fpsimd_switch.S create mode 100644 arch/arm64/kvm/hyp_head.S diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h index 3de11a2..13feed5 100644 --- a/arch/arm/include/asm/kvm_emulate.h +++ b/arch/arm/include/asm/kvm_emulate.h @@ -243,6 +243,11 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu, } } +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu) +{ + return true; +} + #ifdef CONFIG_VFPv3 /* Called from vcpu_load - save fpexc and enable guest access to fp/simd unit */ static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index ecc883a..720ae51 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -227,6 +227,8 
@@ int kvm_perf_teardown(void); void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot); struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr); + +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {} void kvm_restore_host_vfp_state(struct kvm_vcpu *); static inline void kvm_arch_hardware_disable(void) {} diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 1de07ab..dd59f8a 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) kvm_arm_set_running_vcpu(vcpu); - /* Save and enable FPEXC before we load guest context */ - kvm_enable_vcpu_fpexc(vcpu); + /* + * For 32bit guest executing on arm64, enable fp/simd access in + * EL2. On arm32 save host fpexc and then enable fp/simd access. + */ + if (kvm_guest_vcpu_is_32bit(vcpu)) + kvm_enable_vcpu_fpexc(vcpu); /* reset hyp cptr register to trap on tracing and vfp/simd access*/ vcpu_reset_cptr(vcpu); @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) { /* If the fp/simd registers are dirty save guest, restore host. */ - if (kvm_vcpu_vfp_isdirty(vcpu)) + if (kvm_vcpu_vfp_isdirty(vcpu)) { kvm_restore_host_vfp_state(vcpu); - /* Restore host FPEXC trashed in vcpu_load */ + /* + * For 32bit guest on arm64 save the guest fpexc register + * in EL2 mode. + */ + if (kvm_guest_vcpu_is_32bit(vcpu)) + kvm_save_guest_vcpu_fpexc(vcpu); + } + + /* For arm32 restore host FPEXC trashed in vcpu_load. 
*/ kvm_restore_host_fpexc(vcpu); /* diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index 5e37710..d53d069 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[]; extern void __kvm_flush_vm_context(void); extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa); extern void __kvm_tlb_flush_vmid(struct kvm *kvm); +extern void __kvm_vcpu_enable_fpexc32(void); +extern void __kvm_vcpu_save_fpexc32(struct kvm_vcpu *v
Re: [RFC PATCH 2/5] KVM: add KVM_EXIT_MSR exit reason and capability.
On Fri, Dec 18, 2015 at 1:25 PM, Paolo Bonzini wrote: > > > On 18/08/2015 20:46, Peter Hornyack wrote: >> Define KVM_EXIT_MSR, a new exit reason for accesses to MSRs that kvm >> does not handle. Define KVM_CAP_UNHANDLED_MSR_EXITS, a vm-wide >> capability that guards the new exit reason and which can be enabled via >> the KVM_ENABLE_CAP ioctl. >> >> Signed-off-by: Peter Hornyack >> --- >> Documentation/virtual/kvm/api.txt | 48 >> +++ >> include/uapi/linux/kvm.h | 14 >> 2 files changed, 62 insertions(+) > > Let's instead add: > > - KVM_CAP_MSR_EXITS > > - KVM_CAP_ENABLE_MSR_EXIT > > - KVM_CAP_DISABLE_MSR_EXIT > > and 3 kinds of exits: KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE, > KVM_EXIT_MSR_AFTER_WRITE. The first two use four fields (type, msr, > data, handled) and #GP if handled=0 is zero on the next entry. The last > one uses the first three fields and can be used for HyperV. > > Paolo Paolo, I've included an updated version of this patch below. Does this match the API that you had in mind? If so, I'll continue adjusting the rest of the code and will send the entire new patchset. This updated version of the API seems more cumbersome to me (three new capabilities, three exit reasons when just one or two might suffice) than the original interface I used, but maybe you have a good reason for that. Also, will it be ok to have just one capability to enable all three kinds of exits, or will KVM_EXIT_MSR_AFTER_WRITE want a separate capability? commit a684d520ed62cf0db4495e5197d5bf722e4f8109 Author: Peter Hornyack Date: Fri Dec 18 14:44:04 2015 -0800 KVM: add capabilities and exit reasons for MSRs. Define KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE, and KVM_EXIT_MSR_AFTER_WRITE, new exit reasons for accesses to MSRs that kvm does not handle or that user space needs to be notified about. Define the KVM_CAP_MSR_EXITS, KVM_CAP_ENABLE_MSR_EXITS, and KVM_CAP_DISABLE_MSR_EXITS capabilities to control these new exits for a VM. 
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 053f613fc9a9..3bba3248df3d 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3359,6 +3359,43 @@ Hyper-V SynIC state change. Notification is used to remap SynIC event/message pages and to enable/disable SynIC messages/events processing in userspace. + /* + * KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE, + * KVM_EXIT_MSR_AFTER_WRITE + */ + struct { + __u32 index;/* i.e. ecx; out */ + __u64 data; /* out (wrmsr) / in (rdmsr) */ +#define KVM_EXIT_MSR_COMPLETION_FAILED 1 + __u64 type; /* out */ +#define KVM_EXIT_MSR_UNHANDLED 0 +#define KVM_EXIT_MSR_HANDLED 1 + __u8 handled; /* in */ + } msr; + +If exit_reason is KVM_EXIT_MSR_READ or KVM_EXIT_MSR_WRITE, then the vcpu has +executed a rdmsr or wrmsr instruction which could not be satisfied by kvm. The +msr struct is used for both output to and input from user space. index is the +target MSR number held in ecx; user space must not modify this field. data +holds the payload from a wrmsr or must be filled in with a payload on a rdmsr. +For a normal exit, type will be 0. + +On the return path into kvm, user space should set handled to +KVM_EXIT_MSR_HANDLED if it successfully handled the MSR access. Otherwise, +handled should be set to KVM_EXIT_MSR_UNHANDLED, which will cause a general +protection fault to be injected into the vcpu. If an error occurs during the +return into kvm, the vcpu will not be run and another exit will be generated +with type set to KVM_EXIT_MSR_COMPLETION_FAILED. + +If exit_reason is KVM_EXIT_MSR_AFTER_WRITE, then the vcpu has executed a wrmsr +instruction which is handled by kvm but which user space may need to be +notified about. index and data are set as described above; the value of type +depends on the MSR that was written. handled is ignored on reentry into kvm. 
+ +KVM_EXIT_MSR_READ, KVM_EXIT_MSR_WRITE, and KVM_EXIT_MSR_AFTER_WRITE can only +occur when KVM_CAP_MSR_EXITS is present and KVM_CAP_ENABLE_MSR_EXITS has been +set. A detailed description of these capabilities is below. + /* Fix the size of the union. */ char padding[256]; }; @@ -3697,6 +3734,26 @@ a KVM_EXIT_IOAPIC_EOI vmexit will be reported to userspace. Fails if VCPU has already been created, or if the irqchip is already in the kernel (i.e. KVM_CREATE_IRQCHIP has already been called). +7.6 KVM_CAP_ENABLE_MSR_EXITS, KVM_CAP_DISABLE_MSR_EXITS + +Architectures: x86 (vmx-only) +Parameters: none +Returns: 0 on success, -1 on error + +These capabilities enable and disable exits to user space for certain guest MSR +accesses. These capabilities are only available if KVM_CHECK_EXTENSION +indicates that KVM_CAP_MSR_EXITS is present. + +When enabled, kvm will exit to user space when the guest reads +an MSR that kvm does not handle (KVM_EXIT_MSR_READ), writes an MSR that kvm +does not handle (KVM_EXIT_MSR_WRITE), or writes an MSR that kvm hand
Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform
On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote:
> When the grub of the OS is booting, the softirq and the C function
> send_disk_op() may use the extra stack of SeaBIOS. If we inject an NMI,
> romlayout.S:irqentry_extrastack is invoked, and the extra stack will be
> used again. The stack of the first call will be broken, so that SeaBIOS
> gets stuck.
>
> You can easily reproduce the problem:
>
> 1. start a guest
> 2. reset the guest
> 3. inject an NMI when the guest shows the grub screen
> 4. then the guest is stuck

Does the SeaBIOS patch below help?

I'm not familiar with how to "inject a NMI" - can you describe the
process in more detail?

-Kevin

--- a/src/romlayout.S
+++ b/src/romlayout.S
@@ -548,7 +548,9 @@ entry_post:
         ENTRY_INTO32 _cfunc32flat_handle_post // Normal entry point

         ORG 0xe2c3
-        IRQ_ENTRY 02
+        .global entry_02
+entry_02:
+        ENTRY handle_02 // NMI handler does not switch onto extra stack

         ORG 0xe3fe
         .global entry_13_official
Re: [kvm-unit-tests PATCH 3/3] add timeout support
2015-12-17 14:10-0600, Andrew Jones:
> Signed-off-by: Andrew Jones
> ---
> diff --git a/arm/run b/arm/run
> @@ -75,10 +75,14 @@ chr_testdev+=' -device virtconsole,chardev=ctd -chardev testdev,id=ctd'
>  M+=",accel=$ACCEL"
>  command="$qemu $M -cpu $processor $chr_testdev"
>  command+=" -display none -serial stdio -kernel"
> -echo $command "$@"
> +
> +if [ "$TIMEOUT" ]; then
> +	timeout_cmd="timeout --foreground $TIMEOUT"

(command="timeout --foreground $TIMEOUT $command"  # to keep the clutter
down.)

> +fi

(We paste it three times, so I'd rather see this abstracted in a
"scripts/" library.)

> diff --git a/run_tests.sh b/run_tests.sh
> @@ -21,6 +21,7 @@ function run()
> +    local timeout="${9:-$TIMEOUT}"
> diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
> @@ -97,8 +98,12 @@ if [ "\$QEMU" ]; then
> +if [ "$timeout" ]; then
> +	timeout_cmd='timeout --foreground $timeout'

Both would be nicer if they took the TIMEOUT variable as an override.

We already don't do that for accel and the patch seems ok in other
regards,

Reviewed-by: Radim Krčmář
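The pattern under review -- conditionally wrapping the qemu invocation in `timeout --foreground` -- can be exercised standalone; here `sleep 5` merely stands in for the real `$qemu $M -cpu $processor ...` command line:

```shell
# Build the command prefix only when TIMEOUT is set, as in the patch.
TIMEOUT=1
timeout_cmd=
if [ "$TIMEOUT" ]; then
    timeout_cmd="timeout --foreground $TIMEOUT"
fi

# Long-running stand-in for the qemu command; killed after $TIMEOUT seconds.
$timeout_cmd sleep 5
rc=$?
echo "exit code: $rc"   # 124 is GNU timeout's "command timed out" status
```

With `TIMEOUT` unset or empty, `$timeout_cmd` expands to nothing and the command runs unwrapped, which is why exit code 124 can be reserved to mean "unit-test timed out".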
Re: [kvm-unit-tests PATCH 1/3] run_tests.sh: reduce return code ambiguity
2015-12-17 14:10-0600, Andrew Jones:
> qemu/unittest exit codes are convoluted, causing codes 0 and 1
> to be ambiguous. Here are the possible meanings
>
>  .----------------------------------------------------------.
>  |          |               0               |       1       |
>  |----------------------------------------------------------|
>  | QEMU     | did something successfully,   | FAILURE       |
>  |          | but probably didn't run the   |               |
>  |          | unittest, OR caught SIGINT,   |               |
>  |          | SIGHUP, or SIGTERM            |               |
>  |----------------------------------------------------------|
>  | unittest | for some reason exited using  | SUCCESS       |
>  |          | ACPI/PSCI, not with debug-exit|               |
>  '----------------------------------------------------------'
>
> As we can see above, an exit code of zero is even ambiguous for each
> row, i.e. QEMU could exit with zero because it successfully completed,
> or because it caught a signal. unittest could exit with zero because
> it successfully powered-off, or because for some odd reason it powered-
> off instead of calling debug-exit.
>
> And, the most fun is that exit-code == 1 means QEMU failed, but the
> unittest succeeded.
>
> This patch attempts to reduce that ambiguity, by also looking at stderr.

Nice.

> With it, we have
>
>   0      - unexpected exit from qemu, or the unittest not using
>            debug-exit. Consider it a FAILURE
>   1      - unittest SUCCESS
>   < 128  - something failed (could be the unittest, qemu, or a run
>            script). Check the logs.
>   >= 128 - signal (signum = code - 128)

I think this heuristic should be applied to {arm,x86}/run.
run_tests.sh would inherit it and we would finally get reasonable exit
values everywhere.

The resulting table would look like this:

  0     = unit-test passed
  77    = unit-test skipped (not implemented yet)
  124   = unit-test timeouted (implemented in [3/3])
  127   = qemu returned 0 (debug-exit probably wasn't called)
  > 128 = exited because of signal $? - 128
  *     = unit-test failed

(Signal 0 is not used, so we could map 128 to mean "debug-exit probably
wasn't called", but others might not understand our signal convention.
Anyway, it'd be best for us to start at 200, for `case $? in 2??)` ...)
> Signed-off-by: Andrew Jones
> ---
> diff --git a/run_tests.sh b/run_tests.sh
> @@ -54,10 +55,32 @@ function run()
>
>     # extra_params in the config file may contain backticks that need to be
>     # expanded, so use eval to start qemu
> -    eval $cmdline >> test.log
> +    errlog=$(mktemp)
> +    eval $cmdline >> test.log 2> $errlog
| [...]
| cat $errlog >> test.log

This assumes that stderr is always after stdout,

  eval $cmdline 2>&1 >> test.log | tee $errlog >> test.log

has a chance to print lines in wrong order too, but I think it's going
to be closer to the original.
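Radim's one-liner relies on the order in which the shell processes redirections: the pipe is created first, `2>&1` then points stderr at the current stdout (the pipe), and only afterwards does `>> test.log` move stdout to the log -- so the pipe carries stderr alone. A minimal demonstration of that mechanism (the file names here are placeholders, not the real test.log):

```shell
log=$(mktemp); errlog=$(mktemp)

# stdout goes straight to $log; stderr travels down the pipe, where tee
# both captures it in $errlog and appends it to $log.
{ echo out; echo err >&2; } 2>&1 >> "$log" | tee "$errlog" >> "$log"

cat "$errlog"   # prints "err" -- only the stderr line reached the pipe
```

Both writers append to `$log` concurrently, so the relative order of the stdout and stderr lines in the combined log is not guaranteed -- exactly the "wrong order" caveat raised above.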
RE: [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit
Hello!

> > It depends. Can I read about these hypercalls somewhere? Is there any
> > documentation?
>
> I don't know about a documentation, but you can look at the code of
> Hyper-V hypercall handling inside KVM:
>
> https://github.com/torvalds/linux/blob/master/arch/x86/kvm/hyperv.c#L346

Aha, I see, so the vmmcall CPU instruction is employed. Well, I believe
this fits very well into the semantics of KVM_EXIT_HYPERCALL, because
it's a true hypercall.

> The code simply decodes hypercall parameters from vcpu registers, then
> handles the hypercall code in a switch and encodes the return code in
> vcpu registers. Probably the encoding and decoding of hypercall
> parameters/return code can be done in QEMU, so we need only some exit
> with a parameter saying that this is a Hyper-V hypercall, and probably
> KVM_EXIT_HYPERCALL is good for it.

Or you could even reuse the whole structure, it has all you need:

	__u64 nr;       /* Reserved for x86; other architectures can use it,
	                   for example ARM "hvc #nr" */
	__u64 args[6];  /* rax, rbx, rcx, rdx, rdi, rsi */
	__u64 ret;
	__u32 longmode; /* longmode; other architectures (like ARM64) can
	                   also make sense of it */

Or you could put it in struct kvm_regs instead of args and ret, and
allow the userspace to manipulate it.

> But KVM_EXIT_HYPERCALL is not used inside KVM/QEMU so requires
> implementation.

I guess your hypercalls to be introduced using KVM_EXIT_HYPERV are also
not used inside qemu, so they require implementation :)

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [PATCH v9 0/5] implement vNVDIMM
On 12/10/2015 11:11 AM, Xiao Guangrong wrote:
> New version, new week, and unfortunate new ping... :(

Ping again to see what happened...
Re: [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit
On 12/21/2015 04:28 PM, Pavel Fedin wrote: Hello! Yes, we can use KVM_EXIT_REG_IO/MSR_IO for Hyper-V SynIC MSRS's changes and can even use only one MSR value . So union inside struct kvm_hyperv_exit is excessive. But we still need Vcpu exit to handle VMBus hypercalls by QEMU to emulate VMBus devices inside QEMU. And currently we are going to extend struct kvm_hyperv_exit to store Hyper-V VMBus hypercall parameters. Hm... Hypercalls, you say? We already have KVM_EXIT_HYPERCALL. Documentation says it's currently unused. Is it a leftover from ia64 KVM? Could we reuse it for the purpose? but could we replace Hyper-V VMBus hypercall and it's parameters by KVM_EXIT_REG_IO/MSR_IO too? It depends. Can i read about these hypercalls somewhere? Is there any documentation? I don't know about a documentation, but you can look at the code of Hyper-V hypercall handling inside KVM: https://github.com/torvalds/linux/blob/master/arch/x86/kvm/hyperv.c#L346 The code simply decodes hypercall parameters from vcpu registers then handle hypercall code in switch and encode return code inside vcpu registers. Probably encode and decode of hypercall parameters/return code can be done in QEMU so we need only some exit with parameter that this is Hyper-V hypercall and probably KVM_EXIT_HYPERCALL is good for it. But KVM_EXIT_HYPERCALL is not used inside KVM/QEMU so requires implementation. Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit
Hello!

> Yes, we can use KVM_EXIT_REG_IO/MSR_IO for Hyper-V SynIC MSR changes,
> and can even use only one MSR value. So the union inside struct
> kvm_hyperv_exit is excessive.
>
> But we still need a vcpu exit to handle VMBus hypercalls by QEMU to
> emulate VMBus devices inside QEMU.
>
> And currently we are going to extend struct kvm_hyperv_exit
> to store Hyper-V VMBus hypercall parameters.

Hm... Hypercalls, you say? We already have KVM_EXIT_HYPERCALL.
Documentation says it's currently unused. Is it a leftover from ia64
KVM? Could we reuse it for the purpose?

> but could we replace the Hyper-V VMBus hypercall and its parameters
> by KVM_EXIT_REG_IO/MSR_IO too?

It depends. Can I read about these hypercalls somewhere? Is there any
documentation?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit
On 12/18/2015 09:39 PM, Roman Kagan wrote: On Fri, Dec 18, 2015 at 10:10:11AM -0800, Peter Hornyack wrote: On Fri, Dec 18, 2015 at 8:01 AM, Paolo Bonzini wrote: On 18/12/2015 16:19, Pavel Fedin wrote: As far as i understand this code, KVM_EXIT_HYPERV is called when one of three MSRs are accessed. But, shouldn't we have implemented instead something more generic, like KVM_EXIT_REG_IO, which would work similar to KVM_EXIT_PIO or KVM_EXIT_MMIO, but carry register code and value? Yes, we considered that. There were actually patches for this as well. However, in this case the register is still emulated in the kernel, and userspace just gets informed of the new value. On brief inspection of Andrey's patch (I have not been following closely) it looks like the kvm_hyperv_exit struct that's returned to userspace contains more data (control, evt_page, and msg_page fields) than simply the value of the MSR, so would the desired SynIC exit fit into a general-purpose exit for MSR emulation? Frankly I'm struggling trying to recall why we implemented it this way. Actually all three fields are the values of respective MSRs and I don't see any necessity to pass all three at the same time when any of them gets updated. The patch for QEMU adds an exit handler which processes the fields individually, so I have a strong suspicion that union was meant here rather than struct. I hope Andrey will help to shed some light on that when he's back in the office on Monday; meanwhile I think this peculiarity can be ignored. Hello! We have implemented Hyper-V related Vcpu exit not only for Hyper-V SynIC MSR's changes but also to provide future interface to transfer guest VMBus hypercalls parameters into QEMU. Yes, we can use KVM_EXIT_REG_IO/MSR_IO for Hyper-V SynIC MSRS's changes and can even use only one MSR value . So union inside struct kvm_hyperv_exit is excessive. But we still need Vcpu exit to handle VMBus hypercalls by QEMU to emulate VMBus devices inside QEMU. 
And currently we are going to extend struct kvm_hyperv_exit to store
Hyper-V VMBus hypercall parameters.

SynIC MSR changes could be replaced by KVM_EXIT_REG_IO/MSR_IO, but
could we replace the Hyper-V VMBus hypercall and its parameters by
KVM_EXIT_REG_IO/MSR_IO too?

Roman.
Re: kvmclock doesn't work, help?
On Fri, Dec 18, 2015 at 12:25:11PM -0800, Andy Lutomirski wrote: > [cc: John Stultz -- maybe you have ideas on how this should best > integrate with the core code] > > On Fri, Dec 18, 2015 at 11:45 AM, Marcelo Tosatti wrote: > > On Fri, Dec 18, 2015 at 11:27:13AM -0800, Andy Lutomirski wrote: > >> On Fri, Dec 18, 2015 at 3:47 AM, Marcelo Tosatti > >> wrote: > >> > On Thu, Dec 17, 2015 at 05:12:59PM -0800, Andy Lutomirski wrote: > >> >> On Thu, Dec 17, 2015 at 11:08 AM, Marcelo Tosatti > >> >> wrote: > >> >> > On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote: > >> >> >> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti > >> >> >> wrote: > >> >> >> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote: > >> >> >> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski > >> >> >> >> wrote: > >> >> >> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini > >> >> >> >> > wrote: > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote: > >> >> >> >> >>> > RAW TSC NTP corrected TSC > >> >> >> >> >>> > t0 10 10 > >> >> >> >> >>> > t1 20 19.99 > >> >> >> >> >>> > t2 30 29.98 > >> >> >> >> >>> > t3 40 39.97 > >> >> >> >> >>> > t4 50 49.96 > >> >> > > >> >> > (1) > >> >> > > >> >> >> >> >>> > > >> >> >> >> >>> > ... > >> >> >> >> >>> > > >> >> >> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC, > >> >> >> >> >>> > you can see what will happen. > >> >> >> >> >>> > >> >> >> >> >>> Sure, but why would you ever switch from one to the other? > >> >> >> >> >> > >> >> >> >> >> The guest uses the raw TSC and systemtime = 0 until suspend. > >> >> >> >> >> After > >> >> >> >> >> resume, the TSC certainly increases at the same rate as > >> >> >> >> >> before, but the > >> >> >> >> >> raw TSC restarted counting from 0 and systemtime has increased > >> >> >> >> >> slower > >> >> >> >> >> than the guest kvmclock. > >> >> >> >> > > >> >> >> >> > Wait, are we talking about the host's NTP or the guest's NTP? 
> >> >> >> >> > > >> >> >> >> > If it's the host's, then wouldn't systemtime be reset after > >> >> >> >> > resume to > >> >> >> >> > the NTP corrected value? If so, the guest wouldn't see time go > >> >> >> >> > backwards. > >> >> >> >> > > >> >> >> >> > If it's the guest's, then the guest's NTP correction is applied > >> >> >> >> > on top > >> >> >> >> > of kvmclock, and this shouldn't matter. > >> >> >> >> > > >> >> >> >> > I still feel like I'm missing something very basic here. > >> >> >> >> > > >> >> >> >> > >> >> >> >> OK, I think I get it. > >> >> >> >> > >> >> >> >> Marcelo, I thought that kvmclock was supposed to propagate the > >> >> >> >> host's > >> >> >> >> correction to the guest. If it did, indeed, propagate the > >> >> >> >> correction > >> >> >> >> then, after resume, the host's new system_time would match the > >> >> >> >> guest's > >> >> >> >> idea of it (after accounting for the guest's long nap), and I > >> >> >> >> don't > >> >> >> >> think there would be a problem. > >> >> >> >> That being said, I can't find the code in the masterclock stuff > >> >> >> >> that > >> >> >> >> would actually do this. > >> >> >> > > >> >> >> > Guest clock is maintained by guest timekeeping code, which does: > >> >> >> > > >> >> >> > timer_interrupt() > >> >> >> > offset = read clocksource since last timer interrupt > >> >> >> > accumulate_to_systemclock(offset) > >> >> >> > > >> >> >> > The frequency correction of NTP in the host can be applied to > >> >> >> > kvmclock, which will be visible to the guest > >> >> >> > at "read clocksource since last timer interrupt" > >> >> >> > (kvmclock_clocksource_read function). > >> >> >> > >> >> >> pvclock_clocksource_read? That seems to do the same thing as all the > >> >> >> other clocksource access functions. > >> >> >> > >> >> >> > > >> >> >> > This does not mean that the NTP correction in the host is > >> >> >> > propagated > >> >> >> > to the guests system clock directly. 
> >> >> >> > > >> >> >> > (For example, the guest can run NTP which is free to do further > >> >> >> > adjustments at "accumulate_to_systemclock(offset)" time). > >> >> >> > >> >> >> Of course. But I expected that, in the absence of NTP on the guest, > >> >> >> that the guest would track the host's *corrected* time. > >> >> >> > >> >> >> > > >> >> >> >> If, on the other hand, the host's NTP correction is not supposed > >> >> >> >> to > >> >> >> >> propagate to the guest, > >> >> >> > > >> >> >> > This is optional. There is a module option to control this, in > >> >> >> > fact. > >> >> >> > > >> >> >> > Its nice to have, because then you can execute a guest without NTP > >> >> >> > (say without network connection), and have a kvmclock (kvmclock is > >> >> >> > a > >> >> >> > clocksource, not a guest system clock) which is NTP corrected. > >> >>
RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform
Dear Kevin,

> -----Original Message-----
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Sunday, December 20, 2015 10:33 PM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
>
> On Sun, Dec 20, 2015 at 09:49:54AM +, Gonglei (Arei) wrote:
> > > From: Kevin O'Connor [mailto:ke...@koconnor.net]
> > > Sent: Saturday, December 19, 2015 11:12 PM
> > > On Sat, Dec 19, 2015 at 12:03:15PM +, Gonglei (Arei) wrote:
> > > > Maybe the root cause is not NMI but INTR, so yield() can open
> > > > hardware interrupts, and then execute the interrupt handler, but
> > > > the interrupt handler makes the SeaBIOS stack broken, so that the
> > > > BSP can't execute the instruction and an exception occurs, VM_EXIT
> > > > to Kmod, which is an infinite loop. But I don't have any proofs
> > > > except the surface phenomenon.
> > >
> > > I can't see any reason why allowing interrupts at this location would
> > > be a problem.
> > >
> > Does it have any relationship with the *extra stack* of SeaBIOS?
>
> None that I can see. Also, the kvm trace seems to show the code
> trying to execute at rip=0x03 - that will crash long before the extra
> stack is used.
>
When the grub of the OS is booting, the softirq and the C function
send_disk_op() may use the extra stack of SeaBIOS. If we inject an NMI,
romlayout.S:irqentry_extrastack is invoked, and the extra stack will be
used again. The stack of the first call will be broken, so that SeaBIOS
gets stuck.

You can easily reproduce the problem:

1. start a guest
2. reset the guest
3. inject an NMI when the guest shows the grub screen
4. then the guest is stuck

If we disable the extra stack by setting CONFIG_ENTRY_EXTRASTACK=n,
then the problem is gone.
Besides, I have another thought: is it possible that while one cpu is
using the extra stack, other cpus (APs) are still woken up by a hardware
interrupt after yield() or br->flags = F_IF, and use the extra stack
again?

Regards,
-Gonglei

> > > > Kevin, can we drop yield() in smp_setup() ?
> > >
> > > It's possible to eliminate this instance of yield, but I think it
> > > would just push the crash to the next time interrupts are enabled.
> > >
> > Perhaps. I'm not sure.
>
> > > > Is it really useful and allowable for SeaBIOS? Maybe for other
> > > > components? I'm not sure. Because we found that when SeaBIOS is
> > > > booting, if we inject a NMI by QMP, the guest will *stuck*. And
> > > > the kvm tracing log is the same with the current problem.
> > >
> > > If you apply the patches you had to prevent that NMI crash problem,
> > > does it also prevent the above crash?
> > >
> > Yes, but we cannot prevent the NMI injection (though I'll submit some
> > patches to forbid users' NMI injection after NMI_EN disabled by RTC
> > bit7 of port 0x70).
>
> -Kevin