Re: [question] incremental backup a running vm
On 2015-01-26 19:29:03, Paolo Bonzini wrote: On 26/01/2015 12:13, Zhang Haoyu wrote: Thanks, Paolo, but too many internal snapshots have already been saved by customers, so switching to the external snapshot mechanism would have a significant impact on subsequent upgrades. In that case, patches are welcome. :) Another problem: drive_backup only implements a one-time backup, but I want a backup mechanism like VMware's VDP. The initial backup of a virtual machine takes comparatively more time, because all of the data for that virtual machine is being backed up. Subsequent backups of the same virtual machine take less time, because a changed-block-tracking (log-dirty) mechanism is used to back up only the dirty data. Even if the VM is shut down after the initial backup is done, subsequent backups still copy only the changed data. As mentioned before, patches for this are on the list. I see, thanks, Paolo. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] vhost-scsi: introduce an ioctl to get the minimum tpgt
From: Gonglei arei.gong...@huawei.com

In order to support assigning a boot order to a vhost-scsi device, we need to expose the tpgt to user level (such as QEMU). At present, only the minimum tpgt can boot.

Signed-off-by: Gonglei arei.gong...@huawei.com
Signed-off-by: Bo Su su...@huawei.com
---
 drivers/vhost/scsi.c       | 41 +
 include/uapi/linux/vhost.h |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index d695b16..12e79b9 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1522,6 +1522,38 @@ err_dev:
 	return ret;
 }
 
+static int vhost_scsi_get_first_tpgt(
+	struct vhost_scsi *vs,
+	struct vhost_scsi_target *t)
+{
+	struct tcm_vhost_tpg *tv_tpg;
+	struct tcm_vhost_tport *tv_tport;
+	int tpgt = -1;
+
+	mutex_lock(&tcm_vhost_mutex);
+	mutex_lock(&vs->dev.mutex);
+
+	list_for_each_entry(tv_tpg, &tcm_vhost_list, tv_tpg_list) {
+		tv_tport = tv_tpg->tport;
+
+		if (!strcmp(tv_tport->tport_name, t->vhost_wwpn)) {
+			if (tpgt < 0)
+				tpgt = tv_tpg->tport_tpgt;
+			else if (tpgt > tv_tpg->tport_tpgt)
+				tpgt = tv_tpg->tport_tpgt;
+		}
+	}
+
+	mutex_unlock(&vs->dev.mutex);
+	mutex_unlock(&tcm_vhost_mutex);
+
+	if (tpgt < 0)
+		return -ENXIO;
+
+	t->vhost_tpgt = tpgt;
+	return 0;
+}
+
 static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
 {
 	struct vhost_virtqueue *vq;
@@ -1657,6 +1689,15 @@ vhost_scsi_ioctl(struct file *f,
 		if (put_user(events_missed, eventsp))
 			return -EFAULT;
 		return 0;
+	case VHOST_SCSI_GET_TPGT:
+		if (copy_from_user(&backend, argp, sizeof(backend)))
+			return -EFAULT;
+		r = vhost_scsi_get_first_tpgt(vs, &backend);
+		if (r < 0)
+			return r;
+		if (copy_to_user(argp, &backend, sizeof(backend)))
+			return -EFAULT;
+		return 0;
 	case VHOST_GET_FEATURES:
 		features = VHOST_SCSI_FEATURES;
 		if (copy_to_user(featurep, &features, sizeof features))
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index bb6a5b4..5d350f7 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -155,4 +155,6 @@ struct vhost_scsi_target {
 #define VHOST_SCSI_SET_EVENTS_MISSED _IOW(VHOST_VIRTIO, 0x43, __u32)
 #define VHOST_SCSI_GET_EVENTS_MISSED _IOW(VHOST_VIRTIO, 0x44, __u32)
+#define VHOST_SCSI_GET_TPGT _IOW(VHOST_VIRTIO, 0x45, struct vhost_scsi_target)
+
 #endif
-- 
1.7.12.4
Re: the number of PCI pass-through devices limit?
On Mon, 2015-01-26 at 16:46 +0800, Xuekun Hu wrote: Hi, All Is there a limit on the number of PCI pass-through devices in KVM? For the legacy PCI device assignment or the VFIO pass-through method? There's no enforced limit, but the usable limit is related to the number of KVM memory slots available. Each PCI BAR uses a memory slot (sometimes two). If memory slots are exhausted, the VM aborts. Thanks, Alex
Re: [PATCH v3 0/3] arm/arm64: KVM: Random selection of cache related fixes
On Wed, Jan 21, 2015 at 06:39:45PM +, Marc Zyngier wrote: This small series fixes a number of issues that Christoffer and I have been trying to nail down for a while, having to do with the host dying under load (swapping), and also with the way we deal with caches in general (and with set/way operations in particular):

- The first one changes the way we handle cache ops by set/way, basically turning them into VA ops for the whole memory. This allows platforms with system caches to boot a 32bit zImage, for example.
- The second one fixes a corner case that could happen if the guest used an uncached mapping (or had its caches off) while the host was swapping it out (and using a cache-coherent IO subsystem).
- Finally, the last one fixes this stability issue when the host was swapping, by using a kernel mapping for cache maintenance instead of the userspace one.

With these patches (and both the TLB invalidation and HCR fixes that are on their way to mainline), the APM platform seems much more robust than it previously was. Fingers crossed. The first round of review has generated a lot of traffic about ASID-tagged icache management for guests, but I've decided not to address this issue as part of this series. The code is broken already, and there isn't any virtualization-capable, ASID-tagged icache core in the wild, AFAIK. I'll try to revisit this in another series, once I have wrapped my head around it (or someone beats me to it). Based on 3.19-rc5, tested on Juno, X-Gene, TC-2 and Cubietruck. Also at git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/mm-fixes-3.19

* From v2: [2]
- Reworked the algorithm that tracks the state of the guest's caches, as there were some cases I hadn't anticipated. In the end, the algorithm is simpler.
* From v1: [1]
- Dropped Steve's patch after discussion with Andrea
- Refactor set/way support to avoid code duplication, better comments
- Much improved comments in patch #2, courtesy of Christoffer

[1]: http://www.spinics.net/lists/kvm-arm/msg13008.html
[2]: http://www.spinics.net/lists/kvm-arm/msg13161.html

Marc Zyngier (3):
  arm/arm64: KVM: Use set/way op trapping to track the state of the caches
  arm/arm64: KVM: Invalidate data cache on unmap
  arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault

 arch/arm/include/asm/kvm_emulate.h   |  10 +++
 arch/arm/include/asm/kvm_host.h      |   3 -
 arch/arm/include/asm/kvm_mmu.h       |  77 +---
 arch/arm/kvm/arm.c                   |  10 ---
 arch/arm/kvm/coproc.c                |  64 +++---
 arch/arm/kvm/coproc_a15.c            |   2 +-
 arch/arm/kvm/coproc_a7.c             |   2 +-
 arch/arm/kvm/mmu.c                   | 164 ++-
 arch/arm/kvm/trace.h                 |  39 +
 arch/arm64/include/asm/kvm_emulate.h |  10 +++
 arch/arm64/include/asm/kvm_host.h    |   3 -
 arch/arm64/include/asm/kvm_mmu.h     |  34 ++--
 arch/arm64/kvm/sys_regs.c            |  75 +++-
 13 files changed, 321 insertions(+), 172 deletions(-)

-- 
2.1.4

___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Hi Marc, checkpatch found some whitespace issues (not just the false alarms that trace.h files generate). Also a loosing vs. losing typo in 2/3's commit message. Thanks, Drew (trivial comments) Jones
vhost-scsi support for ANY_LAYOUT
Hi MST & Paolo, So I'm currently working on vhost-scsi support for ANY_LAYOUT, and wanted to verify some assumptions based upon your earlier emails..

*) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that virtio-scsi request + response headers will (always..?) be within a single iovec.

*) When ANY_LAYOUT is negotiated by vhost-scsi, it's expected that virtio-scsi request + response headers may be (but not always..?) combined with data-out + data-in payloads into a single iovec.

*) When ANY_LAYOUT + T10_PI is negotiated by vhost-scsi, it's expected that PI and data payloads for data-out + data-in may be (but not always..?) within the same iovec. Consequently, both headers + PI + data payloads may also be within a single iovec.

*) Is it still safe to use 'out' + 'in' values from vhost_get_vq_desc() in order to determine the data_direction...? If not, what's the preferred way of determining this information for get_user_pages_fast() permission bits and target_submit_cmd_map_sgls()..?

Also, what is required on the QEMU side in order to start generating ANY_LAYOUT style iovecs to verify the WIP changes..? I see hw/scsi/virtio-scsi.c has been converted to accept any_layout=1, but AFAICT the changes were only related to code not shared with hw/scsi/vhost-scsi.c. Thank you, --nab
[PATCH V2 2/2] KVM: PPC: BOOK3S: HV: Use unlock variant with memory barrier
We switch to the unlock variant with memory barriers in the error path and also in code paths where we had an implicit dependency on previous functions calling lwsync/ptesync. In most cases we don't really need an explicit barrier, but using this variant makes sure we don't make mistakes later with code movement. We also document why the non-barrier variant is ok in performance-critical paths.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
Changes from V1:
* Rebase to latest upstream

 arch/powerpc/kvm/book3s_64_mmu_hv.c | 10 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 15 ++-
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 551dabb9551b..0fd91f54d1a7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -639,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	return ret;
 
  out_unlock:
-	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+	unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	preempt_enable();
 	goto out_put;
 }
@@ -767,8 +767,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			note_hpte_modification(kvm, &rev[i]);
 		}
 	}
+	unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	unlock_rmap(rmapp);
-	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	}
 	return 0;
 }
@@ -854,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 			ret = 1;
 		}
-		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+		unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -971,7 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		/* Now check and modify the HPTE */
 		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-			__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+			unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 			continue;
 		}
@@ -994,7 +994,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		}
 		v &= ~HPTE_V_ABSENT;
 		v |= HPTE_V_VALID;
-		__unlock_hpte(hptep, v);
+		unlock_hpte(hptep, v);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9123132b3053..2e45bd57d4e8 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -268,6 +268,9 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 			pte = be64_to_cpu(hpte[0]);
 			if (!(pte & (HPTE_V_VALID | HPTE_V_ABSENT)))
 				break;
+			/*
+			 * Data dependency will avoid re-ordering
+			 */
 			__unlock_hpte(hpte, pte);
 			hpte += 2;
 		}
@@ -286,7 +289,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 				cpu_relax();
 			pte = be64_to_cpu(hpte[0]);
 			if (pte & (HPTE_V_VALID | HPTE_V_ABSENT)) {
-				__unlock_hpte(hpte, pte);
+				unlock_hpte(hpte, pte);
 				return H_PTEG_FULL;
 			}
 		}
@@ -406,7 +409,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
 	    ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
-		__unlock_hpte(hpte, pte);
+		unlock_hpte(hpte, pte);
 		return H_NOT_FOUND;
 	}
@@ -542,7 +545,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 					be64_to_cpu(hp[0]), be64_to_cpu(hp[1]));
 			rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
 			args[j] |= rcbits << (56 - 5);
-			__unlock_hpte(hp, 0);
+			unlock_hpte(hp, 0);
 		}
 	}
@@ -568,7 +571,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	pte = be64_to_cpu(hpte[0]);
 	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
-		__unlock_hpte(hpte, pte);
+		unlock_hpte(hpte, pte);
 		return H_NOT_FOUND;
 	}
@@ -748,7 +751,9
Re: [RFC v4 1/2] x86/xen: add xen_is_preemptible_hypercall()
On Thu, Jan 22, 2015 at 05:40:45PM -0800, Andy Lutomirski wrote: On Thu, Jan 22, 2015 at 4:29 PM, Luis R. Rodriguez mcg...@do-not-panic.com wrote: From: Luis R. Rodriguez mcg...@suse.com On kernels with voluntary or no preemption we can run into situations where a hypercall issued through userspace will linger around as it addresses sub-operations in kernel context (multicalls). Such operations can trigger soft lockup detection. Looks reasonable. I'll add a reviewed-by... Luis
[PATCH V2 1/2] KVM: PPC: BOOK3S: HV: Add helpers for lock/unlock hpte
This patch adds helper routines for locking and unlocking HPTEs, and uses them in the rest of the code. We don't change any locking rules in this patch. In the next patch we switch some of the unlock usages to the API with a barrier, and also document the usage without barriers.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
Changes from V1:
* Rebase to latest upstream

 arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 25 ++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 25 +
 3 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2d81e202bdcc..0789a0f50969 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -85,6 +85,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits)
 	return old == 0;
 }
 
+static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
+/* Without barrier */
+static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
 static inline int __hpte_actual_psize(unsigned int lp, int psize)
 {
 	int i, shift;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 534acb3c6c3d..551dabb9551b 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -338,9 +338,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	gr = kvm->arch.revmap[index].guest_rpte;
 
-	/* Unlock the HPTE */
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(v);
+	unlock_hpte(hptep, v);
 	preempt_enable();
 
 	gpte->eaddr = eaddr;
@@ -469,8 +467,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	hpte[1] = be64_to_cpu(hptep[1]);
 	hpte[2] = r = rev->guest_rpte;
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(hpte[0]);
+	unlock_hpte(hptep, hpte[0]);
 	preempt_enable();
 
 	if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
@@ -621,7 +618,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	hptep[1] = cpu_to_be64(r);
 	eieio();
-	hptep[0] = cpu_to_be64(hpte[0]);
+	__unlock_hpte(hptep, hpte[0]);
 	asm volatile("ptesync" : : : "memory");
 	preempt_enable();
 	if (page && hpte_is_writable(r))
@@ -642,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	return ret;
 
  out_unlock:
-	hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	preempt_enable();
 	goto out_put;
 }
@@ -771,7 +768,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		}
 	}
 	unlock_rmap(rmapp);
-	hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	}
 	return 0;
 }
@@ -857,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 			ret = 1;
 		}
-		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -974,8 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		/* Now check and modify the HPTE */
 		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-			/* unlock and continue */
-			hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+			__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 			continue;
 		}
@@ -996,9 +992,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 				npages_dirty = n;
 			eieio();
 		}
-		v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
+		v &= ~HPTE_V_ABSENT;
 		v |= HPTE_V_VALID;
-		hptep[0] = cpu_to_be64(v);
+		__unlock_hpte(hptep, v);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -1218,8 +1214,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 			r &= ~HPTE_GR_MODIFIED;
 			revp->guest_rpte = r;
 		}
-		asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
-		hptp[0] =
[PATCH v5 2/2] x86/xen: allow privcmd hypercalls to be preempted on 64-bit
From: Luis R. Rodriguez mcg...@suse.com

Xen has support for splitting heavy work into a series of hypercalls, called multicalls, and preempting them through what Xen calls continuation [0]. Despite this, without CONFIG_PREEMPT preemption won't happen, and without preemption a system can become pretty useless under heavy-handed hypercalls. Such is the case, for example, when creating a 50 GiB HVM guest; we can get soft lockups [1] with:

kernel: [ 802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]

The soft lockup triggers on the TASK_UNINTERRUPTIBLE hanger check (default 120 seconds); on the Xen side in this particular case this happens when the following Xen hypervisor code path is used:

xc_domain_set_pod_target() -> do_memory_op() -> arch_memory_op() -> p2m_pod_set_mem_target() -- long delay (real or emulated) --

This happens in arch_memory_op() on the XENMEM_set_pod_target memory op, even though arch_memory_op() can handle continuation via hypercall_create_continuation(), for example. Machines with over 50 GiB of memory are in high demand and hard to come by, so to help replicate this sort of issue, long delays on select hypercalls have been emulated in order to be able to test this on smaller machines [2]. On one hand this issue can be considered expected, given that CONFIG_PREEMPT=n is used; however, we have precedent for forced voluntary preemption practices in the kernel even for CONFIG_PREEMPT=n, through the usage of cond_resched() sprinkled in many places. To address this issue with Xen hypercalls, though, we need to find a way to aid the scheduler in the middle of hypercalls. We are motivated to address this issue on CONFIG_PREEMPT=n because otherwise the system becomes rather unresponsive for long periods of time; in the worst case (at least currently, only by emulating long delays on select disk-I/O-bound hypercalls) this can lead to filesystem corruption if the delay happens, for example, on SCHEDOP_remote_shutdown (when we call 'xl domain shutdown').
We can address this problem by checking, on return from the timer interrupt, whether we should schedule if the timer interrupted the middle of a hypercall. We want to be careful not to always force voluntary preemption though, so to do this we only selectively enable preemption on very specific xen hypercalls. This enables hypercall preemption by selectively forcing checks for voluntary preemption only on ioctl-initiated private hypercalls, where we know some folks have run into the reported issues [1].

[0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
[1] https://bugzilla.novell.com/show_bug.cgi?id=861093
[2] http://ftp.suse.com/pub/people/mcgrof/xen/emulate-long-xen-hypercalls.patch

Based on original work by: David Vrabel david.vra...@citrix.com
Suggested-by: Andy Lutomirski l...@amacapital.net
Cc: Andy Lutomirski l...@amacapital.net
Cc: Borislav Petkov b...@suse.de
Cc: David Vrabel david.vra...@citrix.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: x...@kernel.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Masami Hiramatsu masami.hiramatsu...@hitachi.com
Cc: Jan Beulich jbeul...@suse.com
Cc: linux-ker...@vger.kernel.org
Reviewed-by: Andy Lutomirski l...@amacapital.net
Signed-off-by: Luis R. Rodriguez mcg...@suse.com
---
 arch/x86/kernel/entry_64.S       |  2 ++
 drivers/xen/events/events_base.c | 14 ++
 include/xen/events.h             |  1 +
 3 files changed, 17 insertions(+)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 9ebaf63..ee28733 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1198,6 +1198,8 @@ ENTRY(xen_do_hypervisor_callback)   # do_hypervisor_callback(struct *pt_regs)
 	popq %rsp
 	CFI_DEF_CFA_REGISTER rsp
 	decl PER_CPU_VAR(irq_count)
+	movq %rsp, %rdi  /* pass pt_regs as first argument */
+	call xen_end_upcall
 	jmp  error_exit
 	CFI_ENDPROC
 END(xen_do_hypervisor_callback)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index b4bca2d..bf207f2 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -32,6 +32,8 @@
 #include <linux/slab.h>
 #include <linux/irqnr.h>
 #include <linux/pci.h>
+#include <linux/sched.h>
+#include <linux/kprobes.h>
 
 #ifdef CONFIG_X86
 #include <asm/desc.h>
@@ -1243,6 +1245,18 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
 	set_irq_regs(old_regs);
 }
 
+/*
+ * Some hypercalls issued by the toolstack can take many 10s of
+ * seconds. Allow tasks running hypercalls via the privcmd driver to be
+ * voluntarily preempted even if full kernel preemption is disabled.
+ */
+void xen_end_upcall(struct pt_regs *regs)
+{
+	if (xen_is_preemptible_hypercall(regs))
+		_cond_resched();
+}
+NOKPROBE_SYMBOL(xen_end_upcall);
+
 void
[PATCH v5 0/2] x86/xen: add xen hypercall preemption
From: Luis R. Rodriguez mcg...@suse.com

This v5 nukes tracing, as David said it was useless; it also only adds support for 64-bit, as that's the only thing I can test, and slightly modifies the in-code documentation as to why we want this. The nokprobe thing is left in place, as I haven't heard confirmation that it's kosher to remove it.

Luis R. Rodriguez (2):
  x86/xen: add xen_is_preemptible_hypercall()
  x86/xen: allow privcmd hypercalls to be preempted on 64-bit

 arch/arm/include/asm/xen/hypercall.h |  5 +
 arch/x86/include/asm/xen/hypercall.h | 20 
 arch/x86/kernel/entry_64.S           |  2 ++
 arch/x86/xen/enlighten.c             |  7 +++
 arch/x86/xen/xen-head.S              | 18 +-
 drivers/xen/events/events_base.c     | 14 ++
 include/xen/events.h                 |  1 +
 7 files changed, 66 insertions(+), 1 deletion(-)

-- 
2.1.1
[PATCH v5 1/2] xen: add xen_is_preemptible_hypercall()
From: Luis R. Rodriguez mcg...@suse.com On kernels with voluntary or no preemption we can run into situations where a hypercall issued through userspace will linger around as it addresses sub-operatiosn in kernel context (multicalls). Such operations can trigger soft lockup detection. We want to address a way to let the kernel voluntarily preempt such calls even on non preempt kernels, to address this we first need to distinguish which hypercalls fall under this category. This implements xen_is_preemptible_hypercall() which lets us do just that by adding a secondary hypercall page, calls made via the new page may be preempted. This will only be used on x86 for now, on arm we just have a stub to always return false for now. Andrew had originally submitted a version of this work [0]. [0] http://lists.xen.org/archives/html/xen-devel/2014-02/msg01056.html Based on original work by: Andrew Cooper andrew.coop...@citrix.com Cc: Andy Lutomirski l...@amacapital.net Cc: Borislav Petkov b...@suse.de Cc: David Vrabel david.vra...@citrix.com Cc: Thomas Gleixner t...@linutronix.de Cc: Ingo Molnar mi...@redhat.com Cc: H. Peter Anvin h...@zytor.com Cc: x...@kernel.org Cc: Steven Rostedt rost...@goodmis.org Cc: Masami Hiramatsu masami.hiramatsu...@hitachi.com Cc: Jan Beulich jbeul...@suse.com Cc: linux-ker...@vger.kernel.org Reviewed-by: Andy Lutomirski l...@amacapital.net Signed-off-by: Luis R. 
Rodriguez mcg...@suse.com
---
 arch/arm/include/asm/xen/hypercall.h |  5 +
 arch/x86/include/asm/xen/hypercall.h | 20
 arch/x86/xen/enlighten.c             |  7 +++
 arch/x86/xen/xen-head.S              | 18 +-
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/xen/hypercall.h b/arch/arm/include/asm/xen/hypercall.h
index 712b50e..4fc8395 100644
--- a/arch/arm/include/asm/xen/hypercall.h
+++ b/arch/arm/include/asm/xen/hypercall.h
@@ -74,4 +74,9 @@ MULTI_mmu_update(struct multicall_entry *mcl, struct mmu_update *req,
 	BUG();
 }
 
+static inline bool xen_is_preemptible_hypercall(struct pt_regs *regs)
+{
+	return false;
+}
+
 #endif /* _ASM_ARM_XEN_HYPERCALL_H */
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index ca08a27..221008e 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -84,6 +84,22 @@
 extern struct { char _entry[32]; } hypercall_page[];
 
+#ifndef CONFIG_PREEMPT
+extern struct { char _entry[32]; } preemptible_hypercall_page[];
+
+static inline bool xen_is_preemptible_hypercall(struct pt_regs *regs)
+{
+	return !user_mode_vm(regs) &&
+	       regs->ip >= (unsigned long)preemptible_hypercall_page &&
+	       regs->ip < (unsigned long)preemptible_hypercall_page + PAGE_SIZE;
+}
+#else
+static inline bool xen_is_preemptible_hypercall(struct pt_regs *regs)
+{
+	return false;
+}
+#endif
+
 #define __HYPERCALL		"call hypercall_page+%c[offset]"
 #define __HYPERCALL_ENTRY(x)						\
 	[offset] "i" (__HYPERVISOR_##x * sizeof(hypercall_page[0]))
@@ -215,7 +231,11 @@ privcmd_call(unsigned call,
 	asm volatile("call *%[call]"
 		     : __HYPERCALL_5PARAM
+#ifndef CONFIG_PREEMPT
+		     : [call] "a" (&preemptible_hypercall_page[call])
+#else
 		     : [call] "a" (&hypercall_page[call])
+#endif
 		     : __HYPERCALL_CLOBBER5);
 
 	return (long)__res;
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 6bf3a13..9c01b48 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -84,6 +84,9 @@
 #include "multicalls.h"
 EXPORT_SYMBOL_GPL(hypercall_page);
+#ifndef CONFIG_PREEMPT
+EXPORT_SYMBOL_GPL(preemptible_hypercall_page);
+#endif
 
 /*
  * Pointer to the xen_vcpu_info structure or
@@ -1531,6 +1534,10 @@ asmlinkage __visible void __init xen_start_kernel(void)
 #endif
 	xen_setup_machphys_mapping();
 
+#ifndef CONFIG_PREEMPT
+	copy_page(preemptible_hypercall_page, hypercall_page);
+#endif
+
 	/* Install Xen paravirt ops */
 	pv_info = xen_info;
 	pv_init_ops = xen_init_ops;
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 674b2225..6e6a9517 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -85,9 +85,18 @@ ENTRY(xen_pvh_early_cpu_init)
 	.pushsection .text
 	.balign PAGE_SIZE
 ENTRY(hypercall_page)
+
+#ifdef CONFIG_PREEMPT
+# define PREEMPT_HYPERCALL_ENTRY(x)
+#else
+# define PREEMPT_HYPERCALL_ENTRY(x) \
+	.global xen_hypercall_##x ## _p ASM_NL \
+	.set preemptible_xen_hypercall_##x, xen_hypercall_##x + PAGE_SIZE ASM_NL
+#endif
 #define NEXT_HYPERCALL(x) \
 	ENTRY(xen_hypercall_##x) \
-	.skip 32
+	.skip 32 ASM_NL \
+	PREEMPT_HYPERCALL_ENTRY(x)
 
 NEXT_HYPERCALL(set_trap_table)
 NEXT_HYPERCALL(mmu_update)
@@ -138,6 +147,13 @@ NEXT_HYPERCALL(arch_4)
 NEXT_HYPERCALL(arch_5)
Re: Submit your Google Summer of Code project ideas and volunteer to mentor
On Fri, 01/23 17:21, Stefan Hajnoczi wrote: Dear libvirt, KVM, and QEMU contributors, The Google Summer of Code season begins soon and it's time to collect our thoughts for mentoring students this summer working full-time on libvirt, KVM, and QEMU. What is GSoC? Google Summer of Code 2015 (GSoC) funds students to work on open source projects for 12 weeks over the summer. Open source organizations apply to participate and those accepted receive funding for one or more students. We now need to collect a list of project ideas on our wiki. We also need mentors to volunteer. http://qemu-project.org/Google_Summer_of_Code_2015 Project ideas Please post project ideas on the wiki page below. Project ideas should be suitable as a 12-week project that a student fluent in C/Python/etc can complete. No prior knowledge of QEMU/KVM/libvirt internals can be assumed. http://qemu-project.org/Google_Summer_of_Code_2015 Mentors Please add your name to project ideas you are willing to mentor. In order to mentor you must be an established contributor (regularly contribute patches). You must be willing to spend about 5 hours per week from May 25 to August 21. I have CCed the 8 most active committers since QEMU 2.1.0 as well as the previous libvirt and KVM mentors but everyone is invited. Official timeline: https://www.google-melange.com/gsoc/events/google/gsoc20145 s/20145/2015/ Thank you for organizing it! Fam
Re: Supporting guest OS callchain (perf -g) on KVM
On Thu, Jan 22, 2015 at 03:29:10PM +0200, Elazar Leibovich wrote: When perf runs on a regular Linux system, it can collect the current stack traces (kernel or user) for each sample. This is a very important feature, and is utilized by some visualization tools (see, e.g., Brendan's post[0]). As far as I understand, it is not currently implemented in perf [1]. While providing a cross-platform, safe solution that works every time is a challenge, I think we can give a reasonable solution for Linux guests only. I think we can, in a portable way, across multiple Linux versions, do the following: 1) Find out at which stack the guest kernel is. 2) Find out the kernel's text address. 3) Scan the stack up to its maximal size. 4) Record all addresses found in the kernel text segment. This is more or less what the kernel does when recording its own stack traces[2]. An alternative, more general design is recording all integers that look like addresses from the location of RIP to the start of the physical page. This would give you a slightly trimmed stack trace, but it is pretty safe (you'll never get a segfault, as RIP must be in a valid page), and should work across many different guest OSes. I have little experience in the internals of perf or KVM, and would be happy to receive any feedback about implementing guest OS callchains for KVM. [0] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html [1] https://github.com/torvalds/linux/blob/master/arch/x86/kernel/cpu/perf_event.c#L1968 /* TODO: We don't support guest os callchain now */ [2] https://github.com/torvalds/linux/blob/master/arch/x86/kernel/dumpstack_64.c#L188 I'm not familiar with perf(1) internals but I guess a starting point is the perf-record(1) userspace call stack code, which collects call stacks for userspace processes. KVM guests are similar. An i386 32-bit guest on an x86_64 host is an interesting case: the host must be aware of the different calling conventions. CCing people who have been involved in perf-kvm(1).
Stefan
Re: [question] incremental backup a running vm
On 26/01/2015 02:07, Zhang Haoyu wrote: Hi, Kashyap I've tried ‘drive_backup’ via QMP, but the snapshots were not backed up to the destination; I think the reason is that backup_run() only copies the guest data regarding the qcow2 image. Yes, that's the case. QEMU still cannot access internal snapshots while the file is open. External snapshots are opened read-only, and can be copied with cp while QEMU is running. Paolo
the number of PCI pass-through devices limit?
Hi, All Is there a limit on the number of PCI pass-through devices in KVM? Either for the legacy PCI device assignment or the VFIO pass-through method? Many thanks. Thx, Xuekun
Re: [Xen-devel] [RFC v4 2/2] x86/xen: allow privcmd hypercalls to be preempted
On 23/01/15 18:58, Luis R. Rodriguez wrote: It's not just hypercalls though, this is all about the interactions with multicalls, no? No. This applies to any preemptible hypercall, and the toolstack doesn't use multicalls for most of its work. David
Re: [question] incremental backup a running vm
On 26/01/2015 12:13, Zhang Haoyu wrote: Thanks, Paolo, but too many internal snapshots were saved by customers; switching to the external snapshot mechanism has a significant impact on subsequent upgrades. In that case, patches are welcome. :) Another problem: drive_backup just implements a one-time backup, but I want a VMware VDP-like backup mechanism. The initial backup of a virtual machine takes comparatively more time, because all of the data for that virtual machine is being backed up. Subsequent backups of the same virtual machine take less time, because a changed block tracking (log dirty) mechanism is used to back up only the dirty data. After the initial backup is done, even if the VM is shut down, subsequent backups still only copy the changed data. As mentioned before, patches for this are on the list. Paolo
Re: [PATCH 11/11] kvmtool: add command line parameter to instantiate a vGICv3
Hi Will, On 26/01/15 11:30, Will Deacon wrote: On Fri, Jan 23, 2015 at 04:35:10PM +0000, Andre Przywara wrote: Add the command line parameter --gicv3 to request GICv3 emulation in the kernel. Connect that to the already existing GICv3 code. Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 tools/kvm/arm/aarch64/arm-cpu.c                    |  5 -
 .../kvm/arm/aarch64/include/kvm/kvm-config-arch.h  |  4 +++-
 tools/kvm/arm/gic.c                                | 14 ++
 tools/kvm/arm/include/arm-common/kvm-config-arch.h |  1 +
 tools/kvm/arm/kvm-cpu.c                            |  2 +-
 tools/kvm/arm/kvm.c                                |  3 ++-
 6 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/arm/aarch64/arm-cpu.c b/tools/kvm/arm/aarch64/arm-cpu.c
index a70d6bb..46d6d6a 100644
--- a/tools/kvm/arm/aarch64/arm-cpu.c
+++ b/tools/kvm/arm/aarch64/arm-cpu.c
@@ -12,7 +12,10 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle)
 {
 	int timer_interrupts[4] = {13, 14, 11, 10};
-	gic__generate_fdt_nodes(fdt, gic_phandle, KVM_DEV_TYPE_ARM_VGIC_V2);
+	gic__generate_fdt_nodes(fdt, gic_phandle,
+				kvm->cfg.arch.gicv3 ?
+				KVM_DEV_TYPE_ARM_VGIC_V3 :
+				KVM_DEV_TYPE_ARM_VGIC_V2);
 	timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
 }
diff --git a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
index 89860ae..106e52f 100644
--- a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
+++ b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
@@ -3,7 +3,9 @@
 #define ARM_OPT_ARCH_RUN(cfg)						\
 	OPT_BOOLEAN('\0', "aarch32", &(cfg)->aarch32_guest,		\
-			"Run AArch32 guest"),
+			"Run AArch32 guest"),				\
+	OPT_BOOLEAN('\0', "gicv3", &(cfg)->gicv3,			\
+			"Use a GICv3 interrupt controller in the guest"),

On a GICv3-capable system, why would I *not* want to enable this option? In other words, could we make this the default behaviour on systems that support it, and if you need an override then it should be something like --force-gicv2.

Well, you could have a guest kernel < 3.17, which does not have GICv3 support.
In general I consider GICv2 better tested, so I reckon that people will only want to use GICv3 emulation if there is a need for it (a non-compat GICv3 host or more than 8 VCPUs in the guest). That will probably change over time, but for the time being I'd rather keep the default at GICv2 emulation. Having said that, there could be a fallback in case GICv2 emulation is not available. Let me take a look at that. Also, thinking about the future (ITS emulation), I find that I'd like to replace this option with something more generic like --irqchip=. Cheers, Andre.
Re: [Xen-devel] [RFC v4 2/2] x86/xen: allow privcmd hypercalls to be preempted
On 23.01.15 at 19:58, mcg...@suse.com wrote: On Fri, Jan 23, 2015 at 11:45:06AM +0000, David Vrabel wrote: On 23/01/15 00:29, Luis R. Rodriguez wrote:
@@ -1243,6 +1247,25 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
 	set_irq_regs(old_regs);
 }
 
+/*
+ * CONFIG_PREEMPT=n kernels can end up triggering the softlock
+ * TASK_UNINTERRUPTIBLE hanger check (default 120 seconds)
+ * when certain multicalls are used [0] on large systems; in
+ * that case we need a way to voluntarily preempt. This is
+ * only an issue on CONFIG_PREEMPT=n kernels.

Rewrite this comment as: "Some hypercalls issued by the toolstack can take many 10s of ..."

It's not just hypercalls though, this is all about the interactions with multicalls, no?

multicalls are just a special case of hypercalls. Jan
Re: [question] incremental backup a running vm
On 2015-01-26 17:29:43, Paolo Bonzini wrote: On 26/01/2015 02:07, Zhang Haoyu wrote: Hi, Kashyap I've tried ‘drive_backup’ via QMP, but the snapshots were not backed up to the destination; I think the reason is that backup_run() only copies the guest data regarding the qcow2 image. Yes, that's the case. QEMU still cannot access internal snapshots while the file is open. External snapshots are opened read-only, and can be copied with cp while QEMU is running. Thanks, Paolo, but too many internal snapshots were saved by customers; switching to the external snapshot mechanism has a significant impact on subsequent upgrades. Another problem: drive_backup just implements a one-time backup, but I want a VMware VDP-like backup mechanism. The initial backup of a virtual machine takes comparatively more time, because all of the data for that virtual machine is being backed up. Subsequent backups of the same virtual machine take less time, because a changed block tracking (log dirty) mechanism is used to back up only the dirty data. After the initial backup is done, even if the VM is shut down, subsequent backups still only copy the changed data. Thanks, Zhang Haoyu
Re: [PATCH 0/7] KVM: x86: Emulator fixes
On 26/01/2015 08:32, Nadav Amit wrote: Sorry for sending patches at the last minute. There is nothing critical in this patch-set, yet you may want to incorporate something in 3.20 - specifically 5 (small define mistakes) or 7 (which is somewhat affected by recent changes). Thanks for reviewing the patches. I'll apply all of them for 3.20, since Linus expects no merge window for 2 weeks. Paolo
Re: [PATCH 03/11] kvmtool: AArch{32,64}: use KVM_CREATE_DEVICE & co to instanciate the GIC
On Fri, Jan 23, 2015 at 04:35:02PM +0000, Andre Przywara wrote: From: Marc Zyngier marc.zyng...@arm.com As of 3.14, KVM/arm supports the creation/configuration of the GIC through a more generic device API, which is now the preferred way to do so. Plumb the new API in, and allow the old code to be used as a fallback. [Andre: Rename some functions on the way to differentiate between creation and initialisation more clearly.] Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 tools/kvm/arm/gic.c                    | 60
 tools/kvm/arm/include/arm-common/gic.h |  2 +-
 tools/kvm/arm/kvm.c                    |  6 ++--
 3 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/tools/kvm/arm/gic.c b/tools/kvm/arm/gic.c
index 5d8cbe6..ce5f7fa 100644
--- a/tools/kvm/arm/gic.c
+++ b/tools/kvm/arm/gic.c
@@ -7,7 +7,41 @@
 #include <linux/byteorder.h>
 #include <linux/kvm.h>
 
-int gic__init_irqchip(struct kvm *kvm)
+static int gic_fd = -1;
+
+static int gic__create_device(struct kvm *kvm)
+{
+	int err;
+	u64 cpu_if_addr = ARM_GIC_CPUI_BASE;
+	u64 dist_addr = ARM_GIC_DIST_BASE;
+	struct kvm_create_device gic_device = {
+		.type = KVM_DEV_TYPE_ARM_VGIC_V2,
+	};
+	struct kvm_device_attr cpu_if_attr = {
+		.group = KVM_DEV_ARM_VGIC_GRP_ADDR,
+		.attr  = KVM_VGIC_V2_ADDR_TYPE_CPU,
+		.addr  = (u64)(unsigned long)&cpu_if_addr,
+	};
+	struct kvm_device_attr dist_attr = {
+		.group = KVM_DEV_ARM_VGIC_GRP_ADDR,
+		.attr  = KVM_VGIC_V2_ADDR_TYPE_DIST,
+		.addr  = (u64)(unsigned long)&dist_addr,
+	};
+
+	err = ioctl(kvm->vm_fd, KVM_CREATE_DEVICE, &gic_device);
+	if (err)
+		return err;
+
+	gic_fd = gic_device.fd;
+
+	err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &cpu_if_attr);
+	if (err)
+		return err;
+
+	return ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &dist_attr);
+}
+
+static int gic__create_irqchip(struct kvm *kvm)
 {
 	int err;
 	struct kvm_arm_device_addr gic_addr[] = {
@@ -23,12 +57,6 @@ int gic__init_irqchip(struct kvm *kvm)
 		}
 	};
 
-	if (kvm->nrcpus > GIC_MAX_CPUS) {
-		pr_warning("%d CPUS greater than maximum of %d -- truncating\n",
-			   kvm->nrcpus, GIC_MAX_CPUS);
-		kvm->nrcpus = GIC_MAX_CPUS;
-	}
-
 	err = ioctl(kvm->vm_fd, KVM_CREATE_IRQCHIP);
 	if (err)
 		return err;
@@ -41,6 +69,24 @@ int gic__init_irqchip(struct kvm *kvm)
 	return err;
 }
 
+int gic__create(struct kvm *kvm)
+{
+	int err;
+
+	if (kvm->nrcpus > GIC_MAX_CPUS) {
+		pr_warning("%d CPUS greater than maximum of %d -- truncating\n",
+			   kvm->nrcpus, GIC_MAX_CPUS);
+		kvm->nrcpus = GIC_MAX_CPUS;
+	}
+
+	/* Try the new way first, and fallback on legacy method otherwise */
+	err = gic__create_device(kvm);
+	if (err)
+		err = gic__create_irqchip(kvm);

This fallback doesn't look safe to me:

- gic_fd might remain initialised

- What does the kernel vgic driver do if you've already done a successful KVM_CREATE_DEVICE and then try to use the legacy method?

Will
Re: [PATCH 11/11] kvmtool: add command line parameter to instantiate a vGICv3
On Fri, Jan 23, 2015 at 04:35:10PM +0000, Andre Przywara wrote: Add the command line parameter --gicv3 to request GICv3 emulation in the kernel. Connect that to the already existing GICv3 code. Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 tools/kvm/arm/aarch64/arm-cpu.c                    |  5 -
 .../kvm/arm/aarch64/include/kvm/kvm-config-arch.h  |  4 +++-
 tools/kvm/arm/gic.c                                | 14 ++
 tools/kvm/arm/include/arm-common/kvm-config-arch.h |  1 +
 tools/kvm/arm/kvm-cpu.c                            |  2 +-
 tools/kvm/arm/kvm.c                                |  3 ++-
 6 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/arm/aarch64/arm-cpu.c b/tools/kvm/arm/aarch64/arm-cpu.c
index a70d6bb..46d6d6a 100644
--- a/tools/kvm/arm/aarch64/arm-cpu.c
+++ b/tools/kvm/arm/aarch64/arm-cpu.c
@@ -12,7 +12,10 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle)
 {
 	int timer_interrupts[4] = {13, 14, 11, 10};
-	gic__generate_fdt_nodes(fdt, gic_phandle, KVM_DEV_TYPE_ARM_VGIC_V2);
+	gic__generate_fdt_nodes(fdt, gic_phandle,
+				kvm->cfg.arch.gicv3 ?
+				KVM_DEV_TYPE_ARM_VGIC_V3 :
+				KVM_DEV_TYPE_ARM_VGIC_V2);
 	timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
 }
diff --git a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
index 89860ae..106e52f 100644
--- a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
+++ b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
@@ -3,7 +3,9 @@
 #define ARM_OPT_ARCH_RUN(cfg)						\
 	OPT_BOOLEAN('\0', "aarch32", &(cfg)->aarch32_guest,		\
-			"Run AArch32 guest"),
+			"Run AArch32 guest"),				\
+	OPT_BOOLEAN('\0', "gicv3", &(cfg)->gicv3,			\
+			"Use a GICv3 interrupt controller in the guest"),

On a GICv3-capable system, why would I *not* want to enable this option? In other words, could we make this the default behaviour on systems that support it, and if you need an override then it should be something like --force-gicv2. Or am I missing a key piece of the puzzle?
Will
Re: [PATCH 11/11] kvmtool: add command line parameter to instantiate a vGICv3
On 26/01/15 11:43, Andre Przywara wrote: Hi Will, On 26/01/15 11:30, Will Deacon wrote: On Fri, Jan 23, 2015 at 04:35:10PM +0000, Andre Przywara wrote: Add the command line parameter --gicv3 to request GICv3 emulation in the kernel. Connect that to the already existing GICv3 code. Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 tools/kvm/arm/aarch64/arm-cpu.c                    |  5 -
 .../kvm/arm/aarch64/include/kvm/kvm-config-arch.h  |  4 +++-
 tools/kvm/arm/gic.c                                | 14 ++
 tools/kvm/arm/include/arm-common/kvm-config-arch.h |  1 +
 tools/kvm/arm/kvm-cpu.c                            |  2 +-
 tools/kvm/arm/kvm.c                                |  3 ++-
 6 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/arm/aarch64/arm-cpu.c b/tools/kvm/arm/aarch64/arm-cpu.c
index a70d6bb..46d6d6a 100644
--- a/tools/kvm/arm/aarch64/arm-cpu.c
+++ b/tools/kvm/arm/aarch64/arm-cpu.c
@@ -12,7 +12,10 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle)
 {
 	int timer_interrupts[4] = {13, 14, 11, 10};
-	gic__generate_fdt_nodes(fdt, gic_phandle, KVM_DEV_TYPE_ARM_VGIC_V2);
+	gic__generate_fdt_nodes(fdt, gic_phandle,
+				kvm->cfg.arch.gicv3 ?
+				KVM_DEV_TYPE_ARM_VGIC_V3 :
+				KVM_DEV_TYPE_ARM_VGIC_V2);
 	timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
 }
diff --git a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
index 89860ae..106e52f 100644
--- a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
+++ b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h
@@ -3,7 +3,9 @@
 #define ARM_OPT_ARCH_RUN(cfg)						\
 	OPT_BOOLEAN('\0', "aarch32", &(cfg)->aarch32_guest,		\
-			"Run AArch32 guest"),
+			"Run AArch32 guest"),				\
+	OPT_BOOLEAN('\0', "gicv3", &(cfg)->gicv3,			\
+			"Use a GICv3 interrupt controller in the guest"),

On a GICv3-capable system, why would I *not* want to enable this option? In other words, could we make this the default behaviour on systems that support it, and if you need an override then it should be something like --force-gicv2.
Well, you could have a guest kernel < 3.17, which does not have GICv3 support. In general I consider GICv2 better tested, so I reckon that people will only want to use GICv3 emulation if there is a need for it (a non-compat GICv3 host or more than 8 VCPUs in the guest). That will probably change over time, but for the time being I'd rather keep the default at GICv2 emulation. I think there is slightly more to it. You want the same command-line options to give you the same result on different platforms (provided that the HW is available, see below). Changing the default depending on the platform you're on is not very good for reproducibility. Having said that, there could be a fallback in case GICv2 emulation is not available. Let me take a look at that. You could try and pick a GICv3 emulation if v2 is not available, and probably print a warning in that case. Also thinking about the future (ITS emulation), I find that I'd like to replace this option with something more generic like --irqchip=. That's an orthogonal issue, but yes, this is probably better. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH 03/11] kvmtool: AArch{32,64}: use KVM_CREATE_DEVICE & co to instanciate the GIC
On 26/01/15 11:26, Will Deacon wrote: On Fri, Jan 23, 2015 at 04:35:02PM +0000, Andre Przywara wrote: From: Marc Zyngier marc.zyng...@arm.com As of 3.14, KVM/arm supports the creation/configuration of the GIC through a more generic device API, which is now the preferred way to do so. Plumb the new API in, and allow the old code to be used as a fallback. [Andre: Rename some functions on the way to differentiate between creation and initialisation more clearly.] Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 tools/kvm/arm/gic.c                    | 60
 tools/kvm/arm/include/arm-common/gic.h |  2 +-
 tools/kvm/arm/kvm.c                    |  6 ++--
 3 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/tools/kvm/arm/gic.c b/tools/kvm/arm/gic.c
index 5d8cbe6..ce5f7fa 100644
--- a/tools/kvm/arm/gic.c
+++ b/tools/kvm/arm/gic.c
@@ -7,7 +7,41 @@
 #include <linux/byteorder.h>
 #include <linux/kvm.h>
 
-int gic__init_irqchip(struct kvm *kvm)
+static int gic_fd = -1;
+
+static int gic__create_device(struct kvm *kvm)
+{
+	int err;
+	u64 cpu_if_addr = ARM_GIC_CPUI_BASE;
+	u64 dist_addr = ARM_GIC_DIST_BASE;
+	struct kvm_create_device gic_device = {
+		.type = KVM_DEV_TYPE_ARM_VGIC_V2,
+	};
+	struct kvm_device_attr cpu_if_attr = {
+		.group = KVM_DEV_ARM_VGIC_GRP_ADDR,
+		.attr  = KVM_VGIC_V2_ADDR_TYPE_CPU,
+		.addr  = (u64)(unsigned long)&cpu_if_addr,
+	};
+	struct kvm_device_attr dist_attr = {
+		.group = KVM_DEV_ARM_VGIC_GRP_ADDR,
+		.attr  = KVM_VGIC_V2_ADDR_TYPE_DIST,
+		.addr  = (u64)(unsigned long)&dist_addr,
+	};
+
+	err = ioctl(kvm->vm_fd, KVM_CREATE_DEVICE, &gic_device);
+	if (err)
+		return err;
+
+	gic_fd = gic_device.fd;
+
+	err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &cpu_if_attr);
+	if (err)
+		return err;
+
+	return ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &dist_attr);
+}
+
+static int gic__create_irqchip(struct kvm *kvm)
 {
 	int err;
 	struct kvm_arm_device_addr gic_addr[] = {
@@ -23,12 +57,6 @@ int gic__init_irqchip(struct kvm *kvm)
 		}
 	};
 
-	if (kvm->nrcpus > GIC_MAX_CPUS) {
-		pr_warning("%d CPUS greater than maximum of %d -- truncating\n",
-			   kvm->nrcpus, GIC_MAX_CPUS);
-		kvm->nrcpus = GIC_MAX_CPUS;
-	}
-
 	err = ioctl(kvm->vm_fd, KVM_CREATE_IRQCHIP);
 	if (err)
 		return err;
@@ -41,6 +69,24 @@ int gic__init_irqchip(struct kvm *kvm)
 	return err;
 }
 
+int gic__create(struct kvm *kvm)
+{
+	int err;
+
+	if (kvm->nrcpus > GIC_MAX_CPUS) {
+		pr_warning("%d CPUS greater than maximum of %d -- truncating\n",
+			   kvm->nrcpus, GIC_MAX_CPUS);
+		kvm->nrcpus = GIC_MAX_CPUS;
+	}
+
+	/* Try the new way first, and fallback on legacy method otherwise */
+	err = gic__create_device(kvm);
+	if (err)
+		err = gic__create_irqchip(kvm);

This fallback doesn't look safe to me:

- gic_fd might remain initialised

- What does the kernel vgic driver do if you've already done a successful KVM_CREATE_DEVICE and then try to use the legacy method?

Good point. I think we need to clean up the device by closing the fd (and resetting the variable to -1) in case any of the subsequent ioctls return with an error (e.g. due to unaligned addresses). I have to check what happens in the kernel in that case, though. Cheers, Andre.
Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches
On Wed, Jan 21, 2015 at 06:39:46PM +0000, Marc Zyngier wrote: Trying to emulate the behaviour of set/way cache ops is fairly pointless, as there are too many ways we can end up missing stuff. Also, there are some system caches out there that simply ignore set/way operations. So instead of trying to implement them, let's convert it to VA ops, and use them as a way to re-enable the trapping of VM ops. That way, we can detect the point when the MMU/caches are turned off, and do a full VM flush (which is what the guest was trying to do anyway). This allows a 32bit zImage to boot on the APM thingy, and will probably help bootloaders in general. Signed-off-by: Marc Zyngier marc.zyng...@arm.com

This had some conflicts with dirty page logging. I fixed it up here, and also removed some trailing white space and mixed spaces/tabs that patch complained about: http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes

---
 arch/arm/include/asm/kvm_emulate.h   | 10 +
 arch/arm/include/asm/kvm_host.h      |  3 --
 arch/arm/include/asm/kvm_mmu.h       |  3 +-
 arch/arm/kvm/arm.c                   | 10 -
 arch/arm/kvm/coproc.c                | 64 ++
 arch/arm/kvm/coproc_a15.c            |  2 +-
 arch/arm/kvm/coproc_a7.c             |  2 +-
 arch/arm/kvm/mmu.c                   | 70 -
 arch/arm/kvm/trace.h                 | 39 +++
 arch/arm64/include/asm/kvm_emulate.h | 10 +
 arch/arm64/include/asm/kvm_host.h    |  3 --
 arch/arm64/include/asm/kvm_mmu.h     |  3 +-
 arch/arm64/kvm/sys_regs.c            | 75 +---
 13 files changed, 155 insertions(+), 139 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 66ce176..7b01523 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 	vcpu->arch.hcr = HCR_GUEST_MASK;
 }
 
+static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.hcr;
+}
+
+static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
+{
+	vcpu->arch.hcr = hcr;
+}
+
 static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
 {
 	return 1;
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 254e065..04b4ea0 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -125,9 +125,6 @@ struct kvm_vcpu_arch {
 	 * Anything that is not used directly from assembly code goes
 	 * here.
 	 */
-	/* dcache set/way operation pending */
-	int last_pcpu;
-	cpumask_t require_dcache_flush;
 
 	/* Don't run the guest on this vcpu */
 	bool pause;
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 63e0ecc..286644c 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
 
 #define kvm_virt_to_phys(x)		virt_to_idmap((unsigned long)(x))
 
-void stage2_flush_vm(struct kvm *kvm);
+void kvm_set_way_flush(struct kvm_vcpu *vcpu);
+void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
 #endif	/* !__ASSEMBLY__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 2d6d910..0b0d58a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	vcpu->cpu = cpu;
 	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
 
-	/*
-	 * Check whether this vcpu requires the cache to be flushed on
-	 * this physical CPU. This is a consequence of doing dcache
-	 * operations by set/way on this vcpu. We do it here to be in
-	 * a non-preemptible section.
-	 */
-	if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
-		flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
-
 	kvm_arm_set_running_vcpu(vcpu);
 }
 
@@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
-		vcpu->arch.last_pcpu = smp_processor_id();
 		kvm_guest_exit();
 		trace_kvm_exit(*vcpu_pc(vcpu));
 		/*
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 7928dbd..0afcc00 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -189,82 +189,40 @@ static bool access_l2ectlr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
-/* See note at ARM ARM B1.14.4 */
+/*
+ * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
+ */
 static bool access_dcsw(struct