[RFC] How to improve KVM VM resource assignment and per-VM process/thread scheduling
Hi, all. Currently the KVM hypervisor has many features that depend on standard Linux APIs, such as vcpupin, mempin, and process pinning. But in real production environments we need automated resource assignment and/or scheduling. Is there any plan to implement it?

Resource assignment requirements, for example:
- which CPUs are eligible for the VMs;
- if a CPU is eligible, whether it is in use;
- if it is in use, whether it is dedicated to one VM or can be shared by many VMs;
- for shared CPUs, a configurable oversubscription ratio and usage-ratio information.

The same applies to memory and I/O device assignment requirements.

Per-VM process/thread scheduling requirements, for example: on the hypervisor side, VMs using vhost-net and virtio-scsi devices have QEMU I/O threads, vhost-net threads, OVS threads, and host NIC interrupt context (hirq/softirq). You should place these threads on the same NUMA node to get the best performance. Another important point: you should balance these threads' CPU load across all the NUMA node's cores to avoid imbalance in the resources available to the vCPUs.

Thanks.
Peter Huang
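The manual placement described above ultimately rests on Linux's CPU-affinity API; an automated policy daemon would issue the same calls per vCPU or I/O thread. A minimal sketch — the helper name and CPU list are illustrative, not an existing KVM or libvirt interface:

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Sketch: pin the calling thread to an explicit set of host CPUs,
 * as a vcpupin-style policy tool would do for vCPU, vhost-net and
 * QEMU I/O threads. The CPU list is a stand-in for "the cores of
 * one NUMA node".
 */
static int pin_self_to_cpus(const int *cpus, int ncpus)
{
	cpu_set_t set;
	int i;

	CPU_ZERO(&set);
	for (i = 0; i < ncpus; i++)
		CPU_SET(cpus[i], &set);

	/* pid 0 == calling thread; other threads can be pinned by tid */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

An automated scheduler would combine this with NUMA topology discovery (e.g. from sysfs) to keep a VM's vCPU and backend threads on the same node.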
Re: [PATCH v3 2/4] live migration support for initial write protect of VM
On 04/24/2014 09:39 AM, Steve Capper wrote: > On Wed, Apr 23, 2014 at 12:18:07AM +0100, Mario Smarduch wrote: >> >> >> Support for live migration initial write protect. >> - moved write protect to architecture memory region prepare function. This >> way you can fail, abort migration without keep track of migration status. >> - Above also allows to generalize read dirty log function with x86 >> - Added stage2_mark_pte_ro() >> - optimized initial write protect, skip upper table lookups >> - added stage2pmd_addr_end() to do generic 4 level table walk >> - changed kvm_flush_remote_tlbs() to weak function > > Hello Mario, > I've taken a quick look at this and have a few suggestions below. > (I'm not a KVM expert, but took a look at the memory manipulation). Hi Steve, your suggestions are very helpful, my response inline. Thanks. Mario > > Future versions of this series could probably benefit from being sent > to lakml too? > > Cheers, > -- > Steve > >> >> Signed-off-by: Mario Smarduch >> --- >> arch/arm/include/asm/kvm_host.h |8 ++ >> arch/arm/kvm/arm.c |3 + >> arch/arm/kvm/mmu.c | 163 >> +++ >> virt/kvm/kvm_main.c |5 +- >> 4 files changed, 178 insertions(+), 1 deletion(-) >> >> diff --git a/arch/arm/include/asm/kvm_host.h >> b/arch/arm/include/asm/kvm_host.h >> index 1e739f9..9f827c8 100644 >> --- a/arch/arm/include/asm/kvm_host.h >> +++ b/arch/arm/include/asm/kvm_host.h >> @@ -67,6 +67,12 @@ struct kvm_arch { >> >> /* Interrupt controller */ >> struct vgic_distvgic; >> + >> + /* Marks start of migration, used to handle 2nd stage page faults >> +* during migration, prevent installing huge pages and split huge >> pages >> +* to small pages. 
>> +*/ >> + int migration_in_progress; >> }; >> >> #define KVM_NR_MEM_OBJS 40 >> @@ -230,4 +236,6 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, >> u64 value); >> >> void kvm_tlb_flush_vmid(struct kvm *kvm); >> >> +int kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot); >> + >> #endif /* __ARM_KVM_HOST_H__ */ >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c >> index 9a4bc10..b916478 100644 >> --- a/arch/arm/kvm/arm.c >> +++ b/arch/arm/kvm/arm.c >> @@ -233,6 +233,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, >>struct kvm_userspace_memory_region *mem, >>enum kvm_mr_change change) >> { >> + /* Request for migration issued by user, write protect memory slot */ >> + if ((change != KVM_MR_DELETE) && (mem->flags & >> KVM_MEM_LOG_DIRTY_PAGES)) >> + return kvm_mmu_slot_remove_write_access(kvm, mem->slot); >> return 0; >> } >> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 7ab77f3..4d029a6 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -31,6 +31,11 @@ >> >> #include "trace.h" >> >> +#define stage2pud_addr_end(addr, end) \ >> +({ u64 __boundary = ((addr) + PUD_SIZE) & PUD_MASK;\ >> + (__boundary - 1 < (end) - 1) ? __boundary : (end); \ >> +}) > > A matter of personal preference: can this be a static inline function > instead? That way you could avoid ambiguity with the parameter types. > (not an issue here, but this has bitten me in the past). Yes good point, will change. > >> + >> extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[]; >> >> static pgd_t *boot_hyp_pgd; >> @@ -569,6 +574,15 @@ static int stage2_set_pte(struct kvm *kvm, struct >> kvm_mmu_memory_cache *cache, >> return 0; >> } >> >> +/* Write protect page */ >> +static void stage2_mark_pte_ro(pte_t *pte) >> +{ >> + pte_t new_pte; >> + >> + new_pte = pfn_pte(pte_pfn(*pte), PAGE_S2); >> + *pte = new_pte; >> +} > > This isn't making the pte read only. 
> It's nuking all the flags from the pte and replacing them with factory > settings. (In this case the PAGE_S2 pgprot). > If we had other attributes that we later wish to retain this could be > easily overlooked. Perhaps a new name for the function? Yes that's pretty bad, I'll clear the write protect bit only. > >> + >> /** >> * kvm_phys_addr_ioremap - map a device range to guest IPA >> * >> @@ -649,6 +663,155 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, >> phys_addr_t *ipap) >> return false; >> } >> >> +/** >> + * split_pmd - splits huge pages to small pages, required to keep a dirty >> log of >> + * smaller memory granules, otherwise huge pages would need to be >> + * migrated. Practically an idle system has problems migrating with >> + * huge pages. Called during WP of entire VM address space, done >> + * initially when migration thread issues the KVM_MEM_LOG_DIRTY_PAGES >> + * ioctl. >> + * The mmu_lock is held during splitting. >> + * >> + * @kvm:The KVM p
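For reference, the static-inline variant Steve suggests could look roughly like this; the PUD geometry constants below are illustrative stand-ins for the kernel's, and the `- 1` on both sides of the comparison keeps it correct when `end` sits at the very top of the address space:

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Stand-in values: a 1GiB PUD, as with 4K pages and a pud-level entry */
#define PUD_SHIFT	30
#define PUD_SIZE	((phys_addr_t)1 << PUD_SHIFT)
#define PUD_MASK	(~(PUD_SIZE - 1))

/* Typed replacement for the stage2pud_addr_end() macro */
static inline phys_addr_t stage2pud_addr_end(phys_addr_t addr, phys_addr_t end)
{
	phys_addr_t boundary = (addr + PUD_SIZE) & PUD_MASK;

	/* subtracting 1 keeps the comparison valid if end wraps to 0 */
	return (boundary - 1 < end - 1) ? boundary : end;
}
```

The inline function gives the parameters real types, so passing a mistyped argument is a compile-time diagnostic rather than a silent conversion inside the macro.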
Re: [patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed
On Fri, Apr 25, 2014 at 12:57:48AM +0200, Paolo Bonzini wrote: > On 24/04/2014 22:57, Eduardo Habkost wrote: > >On Thu, Apr 24, 2014 at 04:42:33PM -0400, Paolo Bonzini wrote: > >>On 22/04/2014 21:14, Eduardo Habkost wrote: > >>>Not for "-cpu host". If somebody needs migration to work, they shouldn't > >>>be using "-cpu host" anyway (I don't know if you have seen the other > >>>comments in my message?). > >> > >>I'm not entirely sure. If you have hosts with exactly identical > >>chipsets, "-cpu host" migration will in all likelihood work. > >>Marcelo's approach is safer. > > > >If that didn't break other use cases, I would agree. > > > >But "-cpu host" today covers two use cases: 1) enabling everything that > >can be enabled, even if it breaks migration; 2) enabling all stuff that > >can be safely enabled without breaking migration. > > What does it enable *now* that breaks migration? Every single feature it enables can break it. It breaks if you upgrade to a QEMU version with new feature words. It breaks if you upgrade to a kernel which supports new features. A feature that doesn't let you upgrade the kernel isn't a feature I expect users to be relying upon. libvirt even blocks migration if "-cpu host" is in use. > > >Now we can't do both at the same time[1]. > > > >(1) is important for management software; > >(2) works only if you are lucky. > > Or if you plan ahead. With additional logic even invariant TSC in > principle can be made to work across migration if the host clocks are > synchronized well enough (PTP accuracy is in the 100-1000 TSC ticks > range). Yes, it is possible in the future. But we never planned for it, so "-cpu host" never supported migration. > > >Why would it make sense to break (1) to try to make (2) work?
> > > >[1] I would even argue that we never did both at the same time. "-cpu > >host" depends on host hardware capabilities, host kernel capabilities, > >and host QEMU version (we never took care of keeping guest ABI with > >"-cpu host"). If migration did work, it was never supposed to. > > I think this is where I disagree. Migration of the PMU is one thing > that obviously was done with "-cpu host" in mind. We may try to make a reliable implementation of use case (2) some day, yes. But the choice I see right now is between trying not to break a feature that was never declared to exist, or breaking an existing interface that is required to solve existing bugs between libvirt and QEMU. -- Eduardo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed
On 24/04/2014 22:57, Eduardo Habkost wrote: On Thu, Apr 24, 2014 at 04:42:33PM -0400, Paolo Bonzini wrote: On 22/04/2014 21:14, Eduardo Habkost wrote: Not for "-cpu host". If somebody needs migration to work, they shouldn't be using "-cpu host" anyway (I don't know if you have seen the other comments in my message?). I'm not entirely sure. If you have hosts with exactly identical chipsets, "-cpu host" migration will in all likelihood work. Marcelo's approach is safer. If that didn't break other use cases, I would agree. But "-cpu host" today covers two use cases: 1) enabling everything that can be enabled, even if it breaks migration; 2) enabling all stuff that can be safely enabled without breaking migration. What does it enable *now* that breaks migration? Now we can't do both at the same time[1]. (1) is important for management software; (2) works only if you are lucky. Or if you plan ahead. With additional logic even invariant TSC in principle can be made to work across migration if the host clocks are synchronized well enough (PTP accuracy is in the 100-1000 TSC ticks range). Why would it make sense to break (1) to try to make (2) work? [1] I would even argue that we never did both at the same time. "-cpu host" depends on host hardware capabilities, host kernel capabilities, and host QEMU version (we never took care of keeping guest ABI with "-cpu host"). If migration did work, it was never supposed to. I think this is where I disagree. Migration of the PMU is one thing that obviously was done with "-cpu host" in mind. Paolo
Re: [PATCH v2] kvm: Use pci_enable_msix_exact() instead of pci_enable_msix()
> > >>> So, do I have to pull something (which I'd rather not, since pulling > > >>> the wrong thing in a submaintainer tree will make Linus angry), or > > >>> should I do it in the next merge window after pci_enable_msix_exact > > >>> gets in? > > >So it is already in. > > > > It is not, because maintainer branches are not rebased. KVM > > development is based on 3.14-rc1, and will not get that commit until > > the first 3.15 pull request is sent to Linus. > > > > No big deal, I'll include this patch in a second 3.15 pull request. > > Hi Paolo, > > I believe it is safe to pull it now? Yup, vacation got in the way of doing this during the merge window but I can safely send this for -rc next week. It was on my todo list. Paolo
Re: [patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed
On Thu, Apr 24, 2014 at 04:42:33PM -0400, Paolo Bonzini wrote: > On 22/04/2014 21:14, Eduardo Habkost wrote: > >Not for "-cpu host". If somebody needs migration to work, they shouldn't > >be using "-cpu host" anyway (I don't know if you have seen the other > >comments in my message?). > > I'm not entirely sure. If you have hosts with exactly identical > chipsets, "-cpu host" migration will in all likelihood work. > Marcelo's approach is safer. If that didn't break other use cases, I would agree. But "-cpu host" today covers two use cases: 1) enabling everything that can be enabled, even if it breaks migration; 2) enabling all stuff that can be safely enabled without breaking migration. Now we can't do both at the same time[1]. (1) is important for management software; (2) works only if you are lucky. Why would it make sense to break (1) to try to make (2) work? [1] I would even argue that we never did both at the same time. "-cpu host" depends on host hardware capabilities, host kernel capabilities, and host QEMU version (we never took care of keeping guest ABI with "-cpu host"). If migration did work, it was never supposed to. -- Eduardo
Re: [patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed
On 22/04/2014 21:14, Eduardo Habkost wrote: Not for "-cpu host". If somebody needs migration to work, they shouldn't be using "-cpu host" anyway (I don't know if you have seen the other comments in my message?). I'm not entirely sure. If you have hosts with exactly identical chipsets, "-cpu host" migration will in all likelihood work. Marcelo's approach is safer. Paolo
Re: target-i386: block migration and savevm if invariant tsc is exposed (v3)
On Wed, Apr 23, 2014 at 06:04:45PM -0300, Marcelo Tosatti wrote: > > Invariant TSC documentation mentions that "invariant TSC will run at a > constant rate in all ACPI P-, C-, and T-states". > > This is not the case if migration to a host with different TSC frequency > is allowed, or if savevm is performed. So block migration/savevm. > > Signed-off-by: Marcelo Tosatti > [...] > @@ -702,6 +706,16 @@ int kvm_arch_init_vcpu(CPUState *cs) >!!(c->ecx & CPUID_EXT_SMX); > } > > +c = cpuid_find_entry(&cpuid_data.cpuid, 0x8007, 0); > +if (c && (c->edx & 1<<8) && invtsc_mig_blocker == NULL) { > +/* for migration */ > +error_set(&invtsc_mig_blocker, > + QERR_DEVICE_FEATURE_BLOCKS_MIGRATION, "invtsc", "cpu"); > +migrate_add_blocker(invtsc_mig_blocker); > +/* for savevm */ > +vmstate_x86_cpu.unmigratable = 1; Did you ensure this will always happen before vmstate_register() is called for vmstate_x86_cpu? I believe kvm_arch_init_vcpu() is called a long long time after device_set_realized() (which is where vmstate_register() is called for DeviceState objects). -- Eduardo
[PATCH v3 0/9] kvmtool: handle guests of a different endianness
This patch series adds some infrastructure to kvmtool to allow a BE guest to use virtio-mmio on a LE host, provided that the architecture actually supports such madness. Not all the backends have been converted, only those I actually cared about. Converting them is pretty easy though, and will be done if the method is deemed acceptable. This has been tested on both arm and arm64 (I use this on a daily basis to test BE code). The corresponding kernel changes have all been merged. Also available at: git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/kvmtool-be-on-le From v2 (never posted): - Fixed tons of bugs (config space) - Fixed TAP networking From v1: - Gave up on the virtio extension after the push back from the PPC guys. Instead, we snapshot the endianness of the vcpu when it tries to reset the device. A bit ugly, but doesn't require any change on the kernel side. Marc Zyngier (9): kvmtool: pass trapped vcpu to MMIO accessors kvmtool: virt_queue configuration based on endianness kvmtool: sample CPU endianness on virtio-mmio device reset kvmtool: add queue endianness initializer kvmtool: convert console backend to support bi-endianness kvmtool: convert 9p backend to support bi-endianness kvmtool: convert blk backend to support bi-endianness kvmtool: convert net backend to support bi-endianness kvmtool: virtio: enable arm/arm64 support for bi-endianness tools/kvm/arm/aarch32/kvm-cpu.c | 14 tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h | 2 + tools/kvm/arm/aarch64/kvm-cpu.c | 25 tools/kvm/arm/include/arm-common/kvm-arch.h | 2 + tools/kvm/arm/include/arm-common/kvm-cpu-arch.h | 4 +- tools/kvm/arm/kvm-cpu.c | 10 +-- tools/kvm/hw/pci-shmem.c | 2 +- tools/kvm/include/kvm/kvm-cpu.h | 1 + tools/kvm/include/kvm/kvm.h | 4 +- tools/kvm/include/kvm/virtio.h | 82 +++- tools/kvm/kvm-cpu.c | 10 ++- tools/kvm/mmio.c | 11 ++-- tools/kvm/pci.c | 3 +- tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h | 2 +- tools/kvm/powerpc/kvm-cpu.c | 4 +-
tools/kvm/powerpc/spapr_pci.h| 6 +- tools/kvm/virtio/9p.c| 3 + tools/kvm/virtio/blk.c | 31 +++-- tools/kvm/virtio/console.c | 8 ++- tools/kvm/virtio/core.c | 59 + tools/kvm/virtio/mmio.c | 21 -- tools/kvm/virtio/net.c | 45 +++-- tools/kvm/virtio/pci.c | 6 +- tools/kvm/x86/include/kvm/kvm-cpu-arch.h | 4 +- 24 files changed, 284 insertions(+), 75 deletions(-) -- 1.8.3.4
[PATCH v3 5/9] kvmtool: convert console backend to support bi-endianness
Configure the queues to follow the guest endianness, and make sure the configuration space is doing the same. Signed-off-by: Marc Zyngier --- tools/kvm/virtio/console.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/kvm/virtio/console.c b/tools/kvm/virtio/console.c index 0474e2b..384eac1 100644 --- a/tools/kvm/virtio/console.c +++ b/tools/kvm/virtio/console.c @@ -131,7 +131,12 @@ static u32 get_host_features(struct kvm *kvm, void *dev) static void set_guest_features(struct kvm *kvm, void *dev, u32 features) { - /* Unused */ + struct con_dev *cdev = dev; + struct virtio_console_config *conf = &cdev->config; + + conf->cols = virtio_host_to_guest_u16(&cdev->vdev, conf->cols); + conf->rows = virtio_host_to_guest_u16(&cdev->vdev, conf->rows); + conf->max_nr_ports = virtio_host_to_guest_u32(&cdev->vdev, conf->max_nr_ports); } static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align, @@ -149,6 +154,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align, p = virtio_get_vq(kvm, queue->pfn, page_size); vring_init(&queue->vring, VIRTIO_CONSOLE_QUEUE_SIZE, p, align); + virtio_init_device_vq(&cdev.vdev, queue); if (vq == VIRTIO_CONSOLE_TX_QUEUE) { thread_pool__init_job(&cdev.jobs[vq], kvm, virtio_console_handle_callback, queue); -- 1.8.3.4
[PATCH v3 9/9] kvmtool: virtio: enable arm/arm64 support for bi-endianness
Implement the kvm_cpu__get_endianness call for both AArch32 and AArch64, and advertise the bi-endianness support. Signed-off-by: Marc Zyngier --- tools/kvm/arm/aarch32/kvm-cpu.c | 14 + tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h | 2 ++ tools/kvm/arm/aarch64/kvm-cpu.c | 25 tools/kvm/arm/include/arm-common/kvm-arch.h | 2 ++ 4 files changed, 43 insertions(+) diff --git a/tools/kvm/arm/aarch32/kvm-cpu.c b/tools/kvm/arm/aarch32/kvm-cpu.c index bd71037..464b473 100644 --- a/tools/kvm/arm/aarch32/kvm-cpu.c +++ b/tools/kvm/arm/aarch32/kvm-cpu.c @@ -1,5 +1,6 @@ #include "kvm/kvm-cpu.h" #include "kvm/kvm.h" +#include "kvm/virtio.h" #include @@ -76,6 +77,19 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu) die_perror("KVM_SET_ONE_REG failed (pc)"); } +int kvm_cpu__get_endianness(struct kvm_cpu *vcpu) +{ + struct kvm_one_reg reg; + u32 data; + + reg.id = ARM_CORE_REG(usr_regs.ARM_cpsr); + reg.addr = (u64)(unsigned long)&data; + if (ioctl(vcpu->vcpu_fd, KVM_GET_ONE_REG, &reg) < 0) + die("KVM_GET_ONE_REG failed (cpsr)"); + + return (data & PSR_E_BIT) ?
VIRTIO_ENDIAN_BE : VIRTIO_ENDIAN_LE; +} + void kvm_cpu__show_code(struct kvm_cpu *vcpu) { struct kvm_one_reg reg; diff --git a/tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h b/tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h index 7d70c3b..ed7da45 100644 --- a/tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h +++ b/tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h @@ -13,5 +13,7 @@ #define ARM_MPIDR_HWID_BITMASK 0xFF00FFUL #define ARM_CPU_ID 3, 0, 0, 0 #define ARM_CPU_ID_MPIDR 5 +#define ARM_CPU_CTRL 3, 0, 1, 0 +#define ARM_CPU_CTRL_SCTLR 0 #endif /* KVM__KVM_CPU_ARCH_H */ diff --git a/tools/kvm/arm/aarch64/kvm-cpu.c b/tools/kvm/arm/aarch64/kvm-cpu.c index 059e42c..b3ce2c8 100644 --- a/tools/kvm/arm/aarch64/kvm-cpu.c +++ b/tools/kvm/arm/aarch64/kvm-cpu.c @@ -1,12 +1,16 @@ #include "kvm/kvm-cpu.h" #include "kvm/kvm.h" +#include "kvm/virtio.h" #include #define COMPAT_PSR_F_BIT 0x0040 #define COMPAT_PSR_I_BIT 0x0080 +#define COMPAT_PSR_E_BIT 0x0200 #define COMPAT_PSR_MODE_SVC0x0013 + +#define SCTLR_EL1_EE_MASK (1 << 25) + #define ARM64_CORE_REG(x) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \ KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x)) @@ -133,6 +137,27 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu) return reset_vcpu_aarch64(vcpu); } +int kvm_cpu__get_endianness(struct kvm_cpu *vcpu) +{ + struct kvm_one_reg reg; + u64 data; + + reg.id = ARM64_CORE_REG(regs.pstate); + reg.addr = (u64)&data; + if (ioctl(vcpu->vcpu_fd, KVM_GET_ONE_REG, &reg) < 0) + die("KVM_GET_ONE_REG failed (spsr[EL1])"); + + if (data & PSR_MODE32_BIT) + return (data & COMPAT_PSR_E_BIT) ? VIRTIO_ENDIAN_BE : VIRTIO_ENDIAN_LE; + + reg.id = ARM64_SYS_REG(ARM_CPU_CTRL, ARM_CPU_CTRL_SCTLR); /* SCTLR_EL1 */ + reg.addr = (u64)&data; + if (ioctl(vcpu->vcpu_fd, KVM_GET_ONE_REG, &reg) < 0) + die("KVM_GET_ONE_REG failed (SCTLR_EL1)"); + + return (data & SCTLR_EL1_EE_MASK) ?
VIRTIO_ENDIAN_BE : VIRTIO_ENDIAN_LE; +} + void kvm_cpu__show_code(struct kvm_cpu *vcpu) { struct kvm_one_reg reg; diff --git a/tools/kvm/arm/include/arm-common/kvm-arch.h b/tools/kvm/arm/include/arm-common/kvm-arch.h index b6c4bf8..5d2fab2 100644 --- a/tools/kvm/arm/include/arm-common/kvm-arch.h +++ b/tools/kvm/arm/include/arm-common/kvm-arch.h @@ -35,6 +35,8 @@ #define VIRTIO_DEFAULT_TRANS(kvm) \ ((kvm)->cfg.arch.virtio_trans_pci ? VIRTIO_PCI : VIRTIO_MMIO) +#define VIRTIO_RING_ENDIAN (VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE) + static inline bool arm_addr_in_ioport_region(u64 phys_addr) { u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE; -- 1.8.3.4
[PATCH v3 3/9] kvmtool: sample CPU endianness on virtio-mmio device reset
Save the CPU endianness when the device is reset. It is widely assumed that the guest won't change its endianness after, or at least not without resetting the device first. A default implementation of the endianness sampling just returns the default "host endianness" value so that unsuspecting architectures are not affected. Signed-off-by: Marc Zyngier --- tools/kvm/include/kvm/kvm-cpu.h | 1 + tools/kvm/include/kvm/virtio.h | 1 + tools/kvm/kvm-cpu.c | 6 ++ tools/kvm/virtio/mmio.c | 3 +++ 4 files changed, 11 insertions(+) diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h index 0ece28c..aa0cb54 100644 --- a/tools/kvm/include/kvm/kvm-cpu.h +++ b/tools/kvm/include/kvm/kvm-cpu.h @@ -15,6 +15,7 @@ void kvm_cpu__run(struct kvm_cpu *vcpu); void kvm_cpu__reboot(struct kvm *kvm); int kvm_cpu__start(struct kvm_cpu *cpu); bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu); +int kvm_cpu__get_endianness(struct kvm_cpu *vcpu); int kvm_cpu__get_debug_fd(void); void kvm_cpu__set_debug_fd(int fd); diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h index f6bddd9..1180a3e 100644 --- a/tools/kvm/include/kvm/virtio.h +++ b/tools/kvm/include/kvm/virtio.h @@ -132,6 +132,7 @@ struct virtio_device { booluse_vhost; void*virtio; struct virtio_ops *ops; + u16 endian; }; struct virtio_ops { diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c index 5c70b00..9575b32 100644 --- a/tools/kvm/kvm-cpu.c +++ b/tools/kvm/kvm-cpu.c @@ -3,6 +3,7 @@ #include "kvm/symbol.h" #include "kvm/util.h" #include "kvm/kvm.h" +#include "kvm/virtio.h" #include #include @@ -14,6 +15,11 @@ extern __thread struct kvm_cpu *current_kvm_cpu; +int __attribute__((weak)) kvm_cpu__get_endianness(struct kvm_cpu *vcpu) +{ + return VIRTIO_ENDIAN_HOST; +} + void kvm_cpu__enable_singlestep(struct kvm_cpu *vcpu) { struct kvm_guest_debug debug = { diff --git a/tools/kvm/virtio/mmio.c b/tools/kvm/virtio/mmio.c index 9d385e2..3a2bd62 100644 --- a/tools/kvm/virtio/mmio.c +++ 
b/tools/kvm/virtio/mmio.c @@ -4,6 +4,7 @@ #include "kvm/ioport.h" #include "kvm/virtio.h" #include "kvm/kvm.h" +#include "kvm/kvm-cpu.h" #include "kvm/irq.h" #include "kvm/fdt.h" @@ -159,6 +160,8 @@ static void virtio_mmio_config_out(struct kvm_cpu *vcpu, break; case VIRTIO_MMIO_STATUS: vmmio->hdr.status = ioport__read32(data); + if (!vmmio->hdr.status) /* Sample endianness on reset */ + vdev->endian = kvm_cpu__get_endianness(vcpu); if (vdev->ops->notify_status) vdev->ops->notify_status(kvm, vmmio->dev, vmmio->hdr.status); break; -- 1.8.3.4
[PATCH v3 4/9] kvmtool: add queue endianness initializer
Add a utility function that transfers the endianness sampled at device reset time to a queue being set up. Signed-off-by: Marc Zyngier --- tools/kvm/include/kvm/virtio.h | 7 +++ 1 file changed, 7 insertions(+) diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h index 1180a3e..8a9eab5 100644 --- a/tools/kvm/include/kvm/virtio.h +++ b/tools/kvm/include/kvm/virtio.h @@ -28,6 +28,7 @@ struct virt_queue { It's where we assume the next request index is at. */ u16 last_avail_idx; u16 last_used_signalled; + u16 endian; }; /* @@ -165,4 +166,10 @@ static inline void *virtio_get_vq(struct kvm *kvm, u32 pfn, u32 page_size) return guest_flat_to_host(kvm, (u64)pfn * page_size); } +static inline void virtio_init_device_vq(struct virtio_device *vdev, +struct virt_queue *vq) +{ + vq->endian = vdev->endian; +} + #endif /* KVM__VIRTIO_H */ -- 1.8.3.4
[PATCH v3 7/9] kvmtool: convert blk backend to support bi-endianness
Configure the queues to follow the guest endianness, and make sure the configuration space is doing the same. Signed-off-by: Marc Zyngier --- tools/kvm/virtio/blk.c | 31 +-- 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c index 4bed3a9..edfa8e6 100644 --- a/tools/kvm/virtio/blk.c +++ b/tools/kvm/virtio/blk.c @@ -77,13 +77,15 @@ void virtio_blk_complete(void *param, long len) bdev->vdev.ops->signal_vq(req->kvm, &bdev->vdev, queueid); } -static void virtio_blk_do_io_request(struct kvm *kvm, struct blk_dev_req *req) +static void virtio_blk_do_io_request(struct kvm *kvm, struct virt_queue *vq, struct blk_dev_req *req) { struct virtio_blk_outhdr *req_hdr; ssize_t block_cnt; struct blk_dev *bdev; struct iovec *iov; u16 out, in; + u32 type; + u64 sector; block_cnt = -1; bdev= req->bdev; @@ -92,13 +94,16 @@ static void virtio_blk_do_io_request(struct kvm *kvm, struct blk_dev_req *req) in = req->in; req_hdr = iov[0].iov_base; - switch (req_hdr->type) { + type = virtio_guest_to_host_u32(vq, req_hdr->type); + sector = virtio_guest_to_host_u64(vq, req_hdr->sector); + + switch (type) { case VIRTIO_BLK_T_IN: - block_cnt = disk_image__read(bdev->disk, req_hdr->sector, + block_cnt = disk_image__read(bdev->disk, sector, iov + 1, in + out - 2, req); break; case VIRTIO_BLK_T_OUT: - block_cnt = disk_image__write(bdev->disk, req_hdr->sector, + block_cnt = disk_image__write(bdev->disk, sector, iov + 1, in + out - 2, req); break; case VIRTIO_BLK_T_FLUSH: @@ -112,7 +117,7 @@ static void virtio_blk_do_io_request(struct kvm *kvm, struct blk_dev_req *req) virtio_blk_complete(req, block_cnt); break; default: - pr_warning("request type %d", req_hdr->type); + pr_warning("request type %d", type); block_cnt = -1; break; } @@ -130,7 +135,7 @@ static void virtio_blk_do_io(struct kvm *kvm, struct virt_queue *vq, struct blk_ &req->in, head, kvm); req->vq = vq; - virtio_blk_do_io_request(kvm, req); + virtio_blk_do_io_request(kvm, vq, req); 
} } @@ -152,8 +157,21 @@ static u32 get_host_features(struct kvm *kvm, void *dev) static void set_guest_features(struct kvm *kvm, void *dev, u32 features) { struct blk_dev *bdev = dev; + struct virtio_blk_config *conf = &bdev->blk_config; + struct virtio_blk_geometry *geo = &conf->geometry; bdev->features = features; + + conf->capacity = virtio_host_to_guest_u64(&bdev->vdev, conf->capacity); + conf->size_max = virtio_host_to_guest_u32(&bdev->vdev, conf->size_max); + conf->seg_max = virtio_host_to_guest_u32(&bdev->vdev, conf->seg_max); + + /* Geometry */ + geo->cylinders = virtio_host_to_guest_u16(&bdev->vdev, geo->cylinders); + + conf->blk_size = virtio_host_to_guest_u32(&bdev->vdev, conf->blk_size); + conf->min_io_size = virtio_host_to_guest_u16(&bdev->vdev, conf->min_io_size); + conf->opt_io_size = virtio_host_to_guest_u32(&bdev->vdev, conf->opt_io_size); } static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align, @@ -170,6 +188,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align, p = virtio_get_vq(kvm, queue->pfn, page_size); vring_init(&queue->vring, VIRTIO_BLK_QUEUE_SIZE, p, align); + virtio_init_device_vq(&bdev->vdev, queue); return 0; } -- 1.8.3.4
[PATCH v3 8/9] kvmtool: convert net backend to support bi-endianness
Configure the queues to follow the guest endianness, and make sure the configuration space is doing the same. Extra care is taken for the handling of the virtio_net_hdr structures on both the TX and RX ends. Signed-off-by: Marc Zyngier --- tools/kvm/virtio/net.c | 45 - 1 file changed, 40 insertions(+), 5 deletions(-) diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c index dbb4431..363ec73 100644 --- a/tools/kvm/virtio/net.c +++ b/tools/kvm/virtio/net.c @@ -73,6 +73,24 @@ static bool has_virtio_feature(struct net_dev *ndev, u32 feature) return ndev->features & (1 << feature); } +static void virtio_net_fix_tx_hdr(struct virtio_net_hdr *hdr, struct net_dev *ndev) +{ + hdr->hdr_len= virtio_guest_to_host_u16(&ndev->vdev, hdr->hdr_len); + hdr->gso_size = virtio_guest_to_host_u16(&ndev->vdev, hdr->gso_size); + hdr->csum_start = virtio_guest_to_host_u16(&ndev->vdev, hdr->csum_start); + hdr->csum_offset= virtio_guest_to_host_u16(&ndev->vdev, hdr->csum_offset); +} + +static void virtio_net_fix_rx_hdr(struct virtio_net_hdr_mrg_rxbuf *hdr, struct net_dev *ndev) +{ + hdr->hdr.hdr_len= virtio_host_to_guest_u16(&ndev->vdev, hdr->hdr.hdr_len); + hdr->hdr.gso_size = virtio_host_to_guest_u16(&ndev->vdev, hdr->hdr.gso_size); + hdr->hdr.csum_start = virtio_host_to_guest_u16(&ndev->vdev, hdr->hdr.csum_start); + hdr->hdr.csum_offset= virtio_host_to_guest_u16(&ndev->vdev, hdr->hdr.csum_offset); + if (has_virtio_feature(ndev, VIRTIO_NET_F_MRG_RXBUF)) + hdr->num_buffers= virtio_host_to_guest_u16(&ndev->vdev, hdr->num_buffers); +} + static void *virtio_net_rx_thread(void *p) { struct iovec iov[VIRTIO_NET_QUEUE_SIZE]; @@ -106,6 +124,7 @@ static void *virtio_net_rx_thread(void *p) .iov_len = sizeof(buffer), }; struct virtio_net_hdr_mrg_rxbuf *hdr; + int i; len = ndev->ops->rx(&dummy_iov, 1, ndev); if (len < 0) { @@ -114,16 +133,20 @@ static void *virtio_net_rx_thread(void *p) goto out_err; } - copied = 0; + copied = i = 0; head = virt_queue__get_iov(vq, iov, &out, &in, kvm); - hdr 
= (void *)iov[0].iov_base; + hdr = iov[0].iov_base; while (copied < len) { size_t iovsize = min_t(size_t, len - copied, iov_size(iov, in)); memcpy_toiovec(iov, buffer + copied, iovsize); copied += iovsize; - if (has_virtio_feature(ndev, VIRTIO_NET_F_MRG_RXBUF)) - hdr->num_buffers++; + if (i++ == 0) + virtio_net_fix_rx_hdr(hdr, ndev); + if (has_virtio_feature(ndev, VIRTIO_NET_F_MRG_RXBUF)) { + u16 num_buffers = virtio_guest_to_host_u16(vq, hdr->num_buffers); + hdr->num_buffers = virtio_host_to_guest_u16(vq, num_buffers + 1); + } virt_queue__set_used_elem(vq, head, iovsize); if (copied == len) break; @@ -170,11 +193,14 @@ static void *virtio_net_tx_thread(void *p) mutex_unlock(&ndev->io_lock[id]); while (virt_queue__available(vq)) { + struct virtio_net_hdr *hdr; head = virt_queue__get_iov(vq, iov, &out, &in, kvm); + hdr = iov[0].iov_base; + virtio_net_fix_tx_hdr(hdr, ndev); len = ndev->ops->tx(iov, out, ndev); if (len < 0) { pr_warning("%s: tx on vq %u failed (%d)\n", - __func__, id, len); + __func__, id, errno); goto out_err; } @@ -415,9 +441,14 @@ static int virtio_net__vhost_set_features(struct net_dev *ndev) static void set_guest_features(struct kvm *kvm, void *dev, u32 features) { struct net_dev *ndev = dev; + struct virtio_net_config *conf = &ndev->config; ndev->features = features; + conf->status = virtio_host_to_guest_u16(&ndev->vdev, conf->status); + conf->max_virtqueue_pairs = virtio_host_to_guest_u16(&ndev->vdev, + conf->max_virtqueue_pairs); + if (ndev->mode
[PATCH v3 6/9] kvmtool: convert 9p backend to support bi-endianness
Configure the queues to follow the guest endianness, and make sure the configuration space is doing the same. Signed-off-by: Marc Zyngier --- tools/kvm/virtio/9p.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/kvm/virtio/9p.c b/tools/kvm/virtio/9p.c index 847eddb..9073a1e 100644 --- a/tools/kvm/virtio/9p.c +++ b/tools/kvm/virtio/9p.c @@ -1252,8 +1252,10 @@ static u32 get_host_features(struct kvm *kvm, void *dev) static void set_guest_features(struct kvm *kvm, void *dev, u32 features) { struct p9_dev *p9dev = dev; + struct virtio_9p_config *conf = p9dev->config; p9dev->features = features; + conf->tag_len = virtio_host_to_guest_u16(&p9dev->vdev, conf->tag_len); } static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align, @@ -1272,6 +1274,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align, job = &p9dev->jobs[vq]; vring_init(&queue->vring, VIRTQUEUE_NUM, p, align); + virtio_init_device_vq(&p9dev->vdev, queue); *job= (struct p9_dev_job) { .vq = queue, -- 1.8.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/9] kvmtool: virt_queue configuration based on endianness
Define a simple infrastructure to configure a virt_queue depending on the guest endianness, as reported by the feature flags. At this stage, the endianness is always the host's. Wrap all accesses to virt_queue data structures shared between host and guest with byte swapping helpers. Should the architecture only support one endianness, these helpers are reduced to the identity function. Signed-off-by: Marc Zyngier --- tools/kvm/include/kvm/virtio.h | 74 -- tools/kvm/virtio/core.c| 59 +++-- 2 files changed, 105 insertions(+), 28 deletions(-) diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h index 820b94a..f6bddd9 100644 --- a/tools/kvm/include/kvm/virtio.h +++ b/tools/kvm/include/kvm/virtio.h @@ -1,6 +1,8 @@ #ifndef KVM__VIRTIO_H #define KVM__VIRTIO_H +#include + #include #include @@ -15,6 +17,10 @@ #define VIRTIO_PCI_O_CONFIG0 #define VIRTIO_PCI_O_MSIX 1 +#define VIRTIO_ENDIAN_HOST 0 +#define VIRTIO_ENDIAN_LE (1 << 0) +#define VIRTIO_ENDIAN_BE (1 << 1) + struct virt_queue { struct vringvring; u32 pfn; @@ -24,9 +30,71 @@ struct virt_queue { u16 last_used_signalled; }; +/* + * The default policy is not to cope with the guest endianness. + * It also helps not breaking archs that do not care about supporting + * such a configuration. + */ +#ifndef VIRTIO_RING_ENDIAN +#define VIRTIO_RING_ENDIAN VIRTIO_ENDIAN_HOST +#endif + +#if (VIRTIO_RING_ENDIAN & (VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)) + +static inline __u16 __virtio_g2h_u16(u16 endian, __u16 val) +{ + return (endian == VIRTIO_ENDIAN_LE) ? le16toh(val) : be16toh(val); +} + +static inline __u16 __virtio_h2g_u16(u16 endian, __u16 val) +{ + return (endian == VIRTIO_ENDIAN_LE) ? htole16(val) : htobe16(val); +} + +static inline __u32 __virtio_g2h_u32(u16 endian, __u32 val) +{ + return (endian == VIRTIO_ENDIAN_LE) ? le32toh(val) : be32toh(val); +} + +static inline __u32 __virtio_h2g_u32(u16 endian, __u32 val) +{ + return (endian == VIRTIO_ENDIAN_LE) ? 
htole32(val) : htobe32(val); +} + +static inline __u64 __virtio_g2h_u64(u16 endian, __u64 val) +{ + return (endian == VIRTIO_ENDIAN_LE) ? le64toh(val) : be64toh(val); +} + +static inline __u64 __virtio_h2g_u64(u16 endian, __u64 val) +{ + return (endian == VIRTIO_ENDIAN_LE) ? htole64(val) : htobe64(val); +} + +#define virtio_guest_to_host_u16(x, v) __virtio_g2h_u16((x)->endian, (v)) +#define virtio_host_to_guest_u16(x, v) __virtio_h2g_u16((x)->endian, (v)) +#define virtio_guest_to_host_u32(x, v) __virtio_g2h_u32((x)->endian, (v)) +#define virtio_host_to_guest_u32(x, v) __virtio_h2g_u32((x)->endian, (v)) +#define virtio_guest_to_host_u64(x, v) __virtio_g2h_u64((x)->endian, (v)) +#define virtio_host_to_guest_u64(x, v) __virtio_h2g_u64((x)->endian, (v)) + +#else + +#define virtio_guest_to_host_u16(x, v) (v) +#define virtio_host_to_guest_u16(x, v) (v) +#define virtio_guest_to_host_u32(x, v) (v) +#define virtio_host_to_guest_u32(x, v) (v) +#define virtio_guest_to_host_u64(x, v) (v) +#define virtio_host_to_guest_u64(x, v) (v) + +#endif + static inline u16 virt_queue__pop(struct virt_queue *queue) { - return queue->vring.avail->ring[queue->last_avail_idx++ % queue->vring.num]; + __u16 guest_idx; + + guest_idx = queue->vring.avail->ring[queue->last_avail_idx++ % queue->vring.num]; + return virtio_guest_to_host_u16(queue, guest_idx); } static inline struct vring_desc *virt_queue__get_desc(struct virt_queue *queue, u16 desc_ndx) @@ -39,8 +107,8 @@ static inline bool virt_queue__available(struct virt_queue *vq) if (!vq->vring.avail) return 0; - vring_avail_event(&vq->vring) = vq->last_avail_idx; - return vq->vring.avail->idx != vq->last_avail_idx; + vring_avail_event(&vq->vring) = virtio_host_to_guest_u16(vq, vq->last_avail_idx); + return virtio_guest_to_host_u16(vq, vq->vring.avail->idx) != vq->last_avail_idx; } struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len); diff --git a/tools/kvm/virtio/core.c b/tools/kvm/virtio/core.c index 
2dfb828..9ae7887 100644 --- a/tools/kvm/virtio/core.c +++ b/tools/kvm/virtio/core.c @@ -15,10 +15,11 @@ struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len) { struct vring_used_elem *used_elem; + u16 idx = virtio_guest_to_host_u16(queue, queue->vring.used->idx); - used_elem = &queue->vring.used->ring[queue->vring.used->idx % queue->vring.num]; - used_elem->id = head; - used_elem->len = len; + used_elem = &queue->vring.used->ring[idx % queue->vring.num]; + used_elem->id = virtio_host_to_guest_u32(queue, head); + used_elem->len = virtio_host_to_guest_u32(queue, len); /* * Use wmb to assure that used e
[PATCH v3 1/9] kvmtool: pass trapped vcpu to MMIO accessors
In order to be able to find out about the endianness of a virtual CPU, it is necessary to pass a pointer to the kvm_cpu structure down to the MMIO accessors. This patch just pushes such pointer as far as required for the MMIO accessors to have a play with the vcpu. Signed-off-by: Marc Zyngier --- tools/kvm/arm/include/arm-common/kvm-cpu-arch.h | 4 ++-- tools/kvm/arm/kvm-cpu.c | 10 +- tools/kvm/hw/pci-shmem.c| 2 +- tools/kvm/include/kvm/kvm.h | 4 ++-- tools/kvm/kvm-cpu.c | 4 ++-- tools/kvm/mmio.c| 11 ++- tools/kvm/pci.c | 3 ++- tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h| 2 +- tools/kvm/powerpc/kvm-cpu.c | 4 ++-- tools/kvm/powerpc/spapr_pci.h | 6 +++--- tools/kvm/virtio/mmio.c | 18 +++--- tools/kvm/virtio/pci.c | 6 -- tools/kvm/x86/include/kvm/kvm-cpu-arch.h| 4 ++-- 13 files changed, 43 insertions(+), 35 deletions(-) diff --git a/tools/kvm/arm/include/arm-common/kvm-cpu-arch.h b/tools/kvm/arm/include/arm-common/kvm-cpu-arch.h index bef1761..355a02d 100644 --- a/tools/kvm/arm/include/arm-common/kvm-cpu-arch.h +++ b/tools/kvm/arm/include/arm-common/kvm-cpu-arch.h @@ -42,8 +42,8 @@ static inline bool kvm_cpu__emulate_io(struct kvm *kvm, u16 port, void *data, return false; } -bool kvm_cpu__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len, - u8 is_write); +bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, + u32 len, u8 is_write); unsigned long kvm_cpu__get_vcpu_mpidr(struct kvm_cpu *vcpu); diff --git a/tools/kvm/arm/kvm-cpu.c b/tools/kvm/arm/kvm-cpu.c index 9c9616f..53afa35 100644 --- a/tools/kvm/arm/kvm-cpu.c +++ b/tools/kvm/arm/kvm-cpu.c @@ -98,17 +98,17 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu) return false; } -bool kvm_cpu__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len, - u8 is_write) +bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, + u32 len, u8 is_write) { if (arm_addr_in_virtio_mmio_region(phys_addr)) { - return kvm__emulate_mmio(kvm, phys_addr, data, len, is_write); + 
return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write); } else if (arm_addr_in_ioport_region(phys_addr)) { int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN; u16 port = (phys_addr - KVM_IOPORT_AREA) & USHRT_MAX; - return kvm__emulate_io(kvm, port, data, direction, len, 1); + return kvm__emulate_io(vcpu->kvm, port, data, direction, len, 1); } else if (arm_addr_in_pci_region(phys_addr)) { - return kvm__emulate_mmio(kvm, phys_addr, data, len, is_write); + return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write); } return false; diff --git a/tools/kvm/hw/pci-shmem.c b/tools/kvm/hw/pci-shmem.c index 34de747..4b837eb 100644 --- a/tools/kvm/hw/pci-shmem.c +++ b/tools/kvm/hw/pci-shmem.c @@ -105,7 +105,7 @@ static struct ioport_operations shmem_pci__io_ops = { .io_out = shmem_pci__io_out, }; -static void callback_mmio_msix(u64 addr, u8 *data, u32 len, u8 is_write, void *ptr) +static void callback_mmio_msix(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr) { void *mem; diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h index d05b936..f1b71a0 100644 --- a/tools/kvm/include/kvm/kvm.h +++ b/tools/kvm/include/kvm/kvm.h @@ -84,10 +84,10 @@ int kvm_timer__exit(struct kvm *kvm); void kvm__irq_line(struct kvm *kvm, int irq, int level); void kvm__irq_trigger(struct kvm *kvm, int irq); bool kvm__emulate_io(struct kvm *kvm, u16 port, void *data, int direction, int size, u32 count); -bool kvm__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len, u8 is_write); +bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write); int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr); int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce, - void (*mmio_fn)(u64 addr, u8 *data, u32 len, u8 is_write, void *ptr), + void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr), void *ptr); bool 
kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr); void kvm__pause(struct kvm *kvm); diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c index be05c49..5c70b00 100644 --- a/tools/kvm/kvm-cpu.c +++ b/tools/kvm/kvm-cpu.c @@ -54,7 +54,7 @@ static void kvm_cpu__handle_coalesced_mmio(struct kvm
[PATCH 0/4] Random kvmtool fixes
This small series addresses a number of issues that have been pestering me for a while. Nothing major though: The first two patches simply ensure that we can always use THP if they have been enabled on the host. The third one fixes an annoying issue when --tty is used. The fourth patch allows me to use TAP interfaces *and* run kvmtool as a non-privileged user (tunctl is your BFF). The whole series applies on top of kvmtool/next as of yesterday. Thanks, M. Marc Zyngier (4): kvmtool: ARM: force alignment of memory for THP kvmtool: ARM: pass MADV_HUGEPAGE to madvise kvmtool: Fix handling of POLLHUP when --tty is used kvmtool: allow the TAP interface to be specified on the command line tools/kvm/arm/kvm.c| 10 ++ tools/kvm/include/kvm/virtio-net.h | 1 + tools/kvm/term.c | 4 +++- tools/kvm/virtio/net.c | 21 ++--- 4 files changed, 24 insertions(+), 12 deletions(-) -- 1.8.3.4
[PATCH 3/4] kvmtool: Fix handling of POLLHUP when --tty is used
The --tty option allows the redirection of a console (serial or virtio) to a pseudo-terminal. As long as the slave port of this pseudo-terminal is not opened by another process, a poll() call on the master port will return POLLHUP in the .event field. This confuses the virtio console code, as term_readable() returns a positive value, indicating that something is available, while the call to term_getc_iov will fail. The fix is to check for the presence of the POLLIN flag in the .event field. Note that this is only a partial fix, as kvmtool will still consume vast amounts of CPU resource by spinning like crazy until the slave port is actually opened. Signed-off-by: Marc Zyngier --- tools/kvm/term.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/kvm/term.c b/tools/kvm/term.c index 5c3e543..214f5e2 100644 --- a/tools/kvm/term.c +++ b/tools/kvm/term.c @@ -89,8 +89,10 @@ bool term_readable(int term) .events = POLLIN, .revents = 0, }; + int err; - return poll(&pollfd, 1, 0) > 0; + err = poll(&pollfd, 1, 0); + return (err > 0 && (pollfd.revents & POLLIN)); } static void *term_poll_thread_loop(void *param) -- 1.8.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] kvmtool: ARM: force alignment of memory for THP
Use of THP requires that the VMA containing the guest memory is 2MB aligned. Unfortunately, nothing in kvmtool ensures that the memory is actually aligned, making the use of THP very unlikely. Just follow what we're already doing for virtio, and expand our forced alignment to 2M. * without this patch: root@muffin-man:~# for i in $(seq 1 5); do ./hackbench 50 process 1000; done Running with 50*40 (== 2000) tasks. Time: 113.600 Running with 50*40 (== 2000) tasks. Time: 108.650 Running with 50*40 (== 2000) tasks. Time: 110.753 Running with 50*40 (== 2000) tasks. Time: 116.992 Running with 50*40 (== 2000) tasks. Time: 117.317 * with this patch: root@muffin-man:~# for i in $(seq 1 5); do ./hackbench 50 process 1000; done Running with 50*40 (== 2000) tasks. Time: 97.613 Running with 50*40 (== 2000) tasks. Time: 96.111 Running with 50*40 (== 2000) tasks. Time: 97.090 Running with 50*40 (== 2000) tasks. Time: 100.820 Running with 50*40 (== 2000) tasks. Time: 100.298 Acked-by: Will Deacon Signed-off-by: Marc Zyngier --- tools/kvm/arm/kvm.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/kvm/arm/kvm.c b/tools/kvm/arm/kvm.c index 008b7fe..d0d64ff 100644 --- a/tools/kvm/arm/kvm.c +++ b/tools/kvm/arm/kvm.c @@ -61,11 +61,13 @@ void kvm__arch_set_cmdline(char *cmdline, bool video) void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size) { /* -* Allocate guest memory. We must align out buffer to 64K to +* Allocate guest memory. We must align our buffer to 64K to * correlate with the maximum guest page size for virtio-mmio. +* If using THP, then our minimal alignment becomes 2M. +* 2M trumps 64K, so let's go with that. 
*/ kvm->ram_size = min(ram_size, (u64)ARM_MAX_MEMORY(kvm)); - kvm->arch.ram_alloc_size = kvm->ram_size + SZ_64K; + kvm->arch.ram_alloc_size = kvm->ram_size + SZ_2M; kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs(kvm, hugetlbfs_path, kvm->arch.ram_alloc_size); @@ -74,7 +76,7 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size) kvm->arch.ram_alloc_size, errno); kvm->ram_start = (void *)ALIGN((unsigned long)kvm->arch.ram_alloc_start, - SZ_64K); + SZ_2M); madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size, MADV_MERGEABLE); -- 1.8.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] kvmtool: ARM: pass MADV_HUGEPAGE to madvise
If the host kernel is configured with CONFIG_TRANSPARENT_HUGEPAGE_MADVISE, it is important to madvise(MADV_HUGEPAGE) the memory region. Otherwise, the guest won't benefit from using THP. Acked-by: Will Deacon Signed-off-by: Marc Zyngier --- tools/kvm/arm/kvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/kvm/arm/kvm.c b/tools/kvm/arm/kvm.c index d0d64ff..58ad9fa 100644 --- a/tools/kvm/arm/kvm.c +++ b/tools/kvm/arm/kvm.c @@ -79,7 +79,7 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size) SZ_2M); madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size, - MADV_MERGEABLE); + MADV_MERGEABLE | MADV_HUGEPAGE); /* Initialise the virtual GIC. */ if (gic__init_irqchip(kvm)) -- 1.8.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] kvmtool: allow the TAP interface to be specified on the command line
In order to overcome the fact that a TAP interface can only be created by root, allow the use of an interface that has already been created, configured, made persistent and owned by a specific user/group (such as done with tunctl). In this case, any kind of configuration can be skipped (IP, up and running mode), and the TAP is assumed to be ready for use. This is done by introducing the "tapif" option, as used here: --network trans=mmio,mode=tap,tapif=blah where "blah" is a TAP interface. This allow the creation/configuration of the interface to be controlled by root, and lkvm to be run as a normal user. Signed-off-by: Marc Zyngier --- tools/kvm/include/kvm/virtio-net.h | 1 + tools/kvm/virtio/net.c | 21 ++--- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/tools/kvm/include/kvm/virtio-net.h b/tools/kvm/include/kvm/virtio-net.h index 0f4d1e5..f435cc3 100644 --- a/tools/kvm/include/kvm/virtio-net.h +++ b/tools/kvm/include/kvm/virtio-net.h @@ -10,6 +10,7 @@ struct virtio_net_params { const char *host_ip; const char *script; const char *trans; + const char *tapif; char guest_mac[6]; char host_mac[6]; struct kvm *kvm; diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c index dbb4431..82dbb88 100644 --- a/tools/kvm/virtio/net.c +++ b/tools/kvm/virtio/net.c @@ -257,6 +257,7 @@ static bool virtio_net__tap_init(struct net_dev *ndev) struct sockaddr_in sin = {0}; struct ifreq ifr; const struct virtio_net_params *params = ndev->params; + bool skipconf = !!params->tapif; /* Did the user already gave us the FD? */ if (params->fd) { @@ -272,6 +273,8 @@ static bool virtio_net__tap_init(struct net_dev *ndev) memset(&ifr, 0, sizeof(ifr)); ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR; + if (params->tapif) + strncpy(ifr.ifr_name, params->tapif, sizeof(ifr.ifr_name)); if (ioctl(ndev->tap_fd, TUNSETIFF, &ifr) < 0) { pr_warning("Config tap device error. 
Are you root?"); goto fail; @@ -308,7 +311,7 @@ static bool virtio_net__tap_init(struct net_dev *ndev) goto fail; } } - } else { + } else if (!skipconf) { memset(&ifr, 0, sizeof(ifr)); strncpy(ifr.ifr_name, ndev->tap_name, sizeof(ndev->tap_name)); sin.sin_addr.s_addr = inet_addr(params->host_ip); @@ -320,12 +323,14 @@ static bool virtio_net__tap_init(struct net_dev *ndev) } } - memset(&ifr, 0, sizeof(ifr)); - strncpy(ifr.ifr_name, ndev->tap_name, sizeof(ndev->tap_name)); - ioctl(sock, SIOCGIFFLAGS, &ifr); - ifr.ifr_flags |= IFF_UP | IFF_RUNNING; - if (ioctl(sock, SIOCSIFFLAGS, &ifr) < 0) - pr_warning("Could not bring tap device up"); + if (!skipconf) { + memset(&ifr, 0, sizeof(ifr)); + strncpy(ifr.ifr_name, ndev->tap_name, sizeof(ndev->tap_name)); + ioctl(sock, SIOCGIFFLAGS, &ifr); + ifr.ifr_flags |= IFF_UP | IFF_RUNNING; + if (ioctl(sock, SIOCSIFFLAGS, &ifr) < 0) + pr_warning("Could not bring tap device up"); + } close(sock); @@ -650,6 +655,8 @@ static int set_net_param(struct kvm *kvm, struct virtio_net_params *p, p->host_ip = strdup(val); } else if (strcmp(param, "trans") == 0) { p->trans = strdup(val); + } else if (strcmp(param, "tapif") == 0) { + p->tapif = strdup(val); } else if (strcmp(param, "vhost") == 0) { p->vhost = atoi(val); } else if (strcmp(param, "fd") == 0) { -- 1.8.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/4] live migration support for initial write protect of VM
On Wed, Apr 23, 2014 at 12:18:07AM +0100, Mario Smarduch wrote: > > > Support for live migration initial write protect. > - moved write protect to architecture memory region prepare function. This > way you can fail, abort migration without keep track of migration status. > - Above also allows to generalize read dirty log function with x86 > - Added stage2_mark_pte_ro() > - optimized initial write protect, skip upper table lookups > - added stage2pmd_addr_end() to do generic 4 level table walk > - changed kvm_flush_remote_tlbs() to weak function Hello Mario, I've taken a quick look at this and have a few suggestions below. (I'm not a KVM expert, but took a look at the memory manipulation). Future versions of this series could probably benefit from being sent to lakml too? Cheers, -- Steve > > Signed-off-by: Mario Smarduch > --- > arch/arm/include/asm/kvm_host.h |8 ++ > arch/arm/kvm/arm.c |3 + > arch/arm/kvm/mmu.c | 163 > +++ > virt/kvm/kvm_main.c |5 +- > 4 files changed, 178 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index 1e739f9..9f827c8 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -67,6 +67,12 @@ struct kvm_arch { > > /* Interrupt controller */ > struct vgic_distvgic; > + > + /* Marks start of migration, used to handle 2nd stage page faults > +* during migration, prevent installing huge pages and split huge > pages > +* to small pages. 
> +*/ > + int migration_in_progress; > }; > > #define KVM_NR_MEM_OBJS 40 > @@ -230,4 +236,6 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, > u64 value); > > void kvm_tlb_flush_vmid(struct kvm *kvm); > > +int kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot); > + > #endif /* __ARM_KVM_HOST_H__ */ > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > index 9a4bc10..b916478 100644 > --- a/arch/arm/kvm/arm.c > +++ b/arch/arm/kvm/arm.c > @@ -233,6 +233,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, >struct kvm_userspace_memory_region *mem, >enum kvm_mr_change change) > { > + /* Request for migration issued by user, write protect memory slot */ > + if ((change != KVM_MR_DELETE) && (mem->flags & > KVM_MEM_LOG_DIRTY_PAGES)) > + return kvm_mmu_slot_remove_write_access(kvm, mem->slot); > return 0; > } > > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > index 7ab77f3..4d029a6 100644 > --- a/arch/arm/kvm/mmu.c > +++ b/arch/arm/kvm/mmu.c > @@ -31,6 +31,11 @@ > > #include "trace.h" > > +#define stage2pud_addr_end(addr, end) \ > +({ u64 __boundary = ((addr) + PUD_SIZE) & PUD_MASK;\ > + (__boundary - 1 < (end) - 1) ? __boundary : (end); \ > +}) A matter of personal preference: can this be a static inline function instead? That way you could avoid ambiguity with the parameter types. (not an issue here, but this has bitten me in the past). > + > extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[]; > > static pgd_t *boot_hyp_pgd; > @@ -569,6 +574,15 @@ static int stage2_set_pte(struct kvm *kvm, struct > kvm_mmu_memory_cache *cache, > return 0; > } > > +/* Write protect page */ > +static void stage2_mark_pte_ro(pte_t *pte) > +{ > + pte_t new_pte; > + > + new_pte = pfn_pte(pte_pfn(*pte), PAGE_S2); > + *pte = new_pte; > +} This isn't making the pte read only. It's nuking all the flags from the pte and replacing them with factory settings. (In this case the PAGE_S2 pgprot). 
If we had other attributes that we later wish to retain this could be easily overlooked. Perhaps a new name for the function? > + > /** > * kvm_phys_addr_ioremap - map a device range to guest IPA > * > @@ -649,6 +663,155 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, > phys_addr_t *ipap) > return false; > } > > +/** > + * split_pmd - splits huge pages to small pages, required to keep a dirty > log of > + * smaller memory granules, otherwise huge pages would need to be > + * migrated. Practically an idle system has problems migrating with > + * huge pages. Called during WP of entire VM address space, done > + * initially when migration thread isses the KVM_MEM_LOG_DIRTY_PAGES > + * ioctl. > + * The mmu_lock is held during splitting. > + * > + * @kvm:The KVM pointer > + * @pmd:Pmd to 2nd stage huge page > + * @addr: ` Guest Physical Address Nitpick: typo ` > + */ > +int split_pmd(struct kvm *kvm, pmd_t *pmd, u64 addr) Maybe worth renaming to something like kvm_split_pmd to avoid future namespace collisions (either compiler or cscope/ctags)? It should also probably be static? > +{ > + struct page *page; > + pfn_t pf
Re: [PATCH v5 0/5] KVM: x86: flush tlb out of mmu-lock after write protection
On Thu, Apr 17, 2014 at 05:06:11PM +0800, Xiao Guangrong wrote: > Since Marcelo has agreed the comments improving in the off-line mail, i > consider this is his Ack. :) Please let me know If i misunderstood it. > > This patchset is splited from my previous patchset: > [PATCH v3 00/15] KVM: MMU: locklessly write-protect > that can be found at: > https://lkml.org/lkml/2013/10/23/265 Applied, thanks.
Re: [PATCH 0/2] KVM: async_pf: use_mm/mm_users fixes
On 04/24, Christian Borntraeger wrote: > > On 21/04/14 15:25, Oleg Nesterov wrote: > > Hello. > > > > Completely untested and I know nothing about kvm ;) Please review. > > > > But use_mm() really looks misleading, and the usage of mm_users looks > > "obviously wrong". I already sent this change while we were discussing > > vmacache, but it was ignored. Since then kvm_async_page_present_sync() > > was added into async_pf_execute() into async_pf_execute(), but it seems > > to me that use_mm() is still unnecessary. > > > > Oleg. > > > > virt/kvm/async_pf.c | 10 -- > > 1 files changed, 4 insertions(+), 6 deletions(-) > > > > I gave both patches some testing on s390, seems fine. I think patch2 really > does fix a bug. So if Paolo, Marcelo, Gleb agree (maybe do a test on x86 for > async_pf) both patches are good to go. Given that somebody tests this on x86: > > Acked-by: Christian Borntraeger Thanks! I think x86 should be fine, it doesn't select CONFIG_KVM_ASYNC_PF_SYNC and get_user_pages() is certainly fine without use_mm(). And I still think it should do get_user_pages(tsk => NULL) but this is minor. Oleg. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] KVM: async_pf: use_mm/mm_users fixes
On 21/04/14 15:25, Oleg Nesterov wrote: > Hello. > > Completely untested and I know nothing about kvm ;) Please review. > > But use_mm() really looks misleading, and the usage of mm_users looks > "obviously wrong". I already sent this change while we were discussing > vmacache, but it was ignored. Since then kvm_async_page_present_sync() > was added into async_pf_execute() into async_pf_execute(), but it seems > to me that use_mm() is still unnecessary. > > Oleg. > > virt/kvm/async_pf.c | 10 -- > 1 files changed, 4 insertions(+), 6 deletions(-) > I gave both patches some testing on s390, seems fine. I think patch2 really does fix a bug. So if Paolo, Marcelo, Gleb agree (maybe do a test on x86 for async_pf) both patches are good to go. Given that somebody tests this on x86: Acked-by: Christian Borntraeger
[PATCH 01/13] KVM: PPC: Book3S PR: Implement LPCR ONE_REG
To control whether we should inject interrupts in little or big endian mode, user space sets the LPCR.ILE bit accordingly via ONE_REG. Let's implement it, so we are able to trigger interrupts in LE mode. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h | 1 + arch/powerpc/kvm/book3s_64_mmu.c | 8 +++- arch/powerpc/kvm/book3s_pr.c | 6 ++ 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index bb1e38a..27b1041 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -106,6 +106,7 @@ struct kvmppc_vcpu_book3s { #endif int hpte_cache_count; spinlock_t mmu_lock; + ulong lpcr; }; #define CONTEXT_HOST 0 diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c index 83da1f8..4a77725 100644 --- a/arch/powerpc/kvm/book3s_64_mmu.c +++ b/arch/powerpc/kvm/book3s_64_mmu.c @@ -38,7 +38,13 @@ static void kvmppc_mmu_book3s_64_reset_msr(struct kvm_vcpu *vcpu) { - kvmppc_set_msr(vcpu, MSR_SF); + struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu); + ulong new_msr = MSR_SF; + + if (vcpu_book3s->lpcr & LPCR_ILE) + new_msr |= MSR_LE; + + kvmppc_set_msr(vcpu, new_msr); } static struct kvmppc_slb *kvmppc_mmu_book3s_64_find_slbe( diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index c5c052a..9189ac5 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -1110,6 +1110,9 @@ static int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, u64 id, case KVM_REG_PPC_HIOR: *val = get_reg_val(id, to_book3s(vcpu)->hior); break; + case KVM_REG_PPC_LPCR: + *val = get_reg_val(id, to_book3s(vcpu)->lpcr); + break; default: r = -EINVAL; break; @@ -1128,6 +1131,9 @@ static int kvmppc_set_one_reg_pr(struct kvm_vcpu *vcpu, u64 id, to_book3s(vcpu)->hior = set_reg_val(id, *val); to_book3s(vcpu)->hior_explicit = true; break; + case KVM_REG_PPC_LPCR: + to_book3s(vcpu)->lpcr = set_reg_val(id, 
*val) & LPCR_ILE; + break; default: r = -EINVAL; break; -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
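The reset path this patch changes is small enough to model in plain C. The sketch below mirrors the patched kvmppc_mmu_book3s_64_reset_msr() logic: the MSR installed when an interrupt is delivered inherits LE from LPCR.ILE. The bit definitions here are illustrative assumptions, not copied from the kernel headers.

```c
#include <stdint.h>

/* Bit values modeled after the kernel's definitions; treat them as
 * illustrative assumptions rather than authoritative encodings. */
#define MSR_SF   (1ULL << 63)  /* 64-bit mode */
#define MSR_LE   (1ULL << 0)   /* little-endian */
#define LPCR_ILE (1ULL << 25)  /* interrupt little-endian */

/* Mirror of the patched reset path: the MSR used for interrupt
 * delivery picks up MSR_LE when user space has set LPCR.ILE. */
uint64_t reset_msr(uint64_t lpcr)
{
    uint64_t new_msr = MSR_SF;

    if (lpcr & LPCR_ILE)
        new_msr |= MSR_LE;
    return new_msr;
}
```

With ILE clear the guest keeps taking big-endian interrupts, so old guests are unaffected.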
[PATCH 13/13] KVM: PPC: Book3S: Move little endian conflict to HV KVM
With the previous patches applied, we can now successfully use PR KVM on little endian hosts which means we can now allow users to select it. However, HV KVM still needs some work, so let's keep the kconfig conflict on that one. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 141b202..d6a53b9 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -6,7 +6,6 @@ source "virt/kvm/Kconfig" menuconfig VIRTUALIZATION bool "Virtualization" - depends on !CPU_LITTLE_ENDIAN ---help--- Say Y here to get to see options for using your Linux host to run other operating systems inside virtual machines (guests). @@ -76,6 +75,7 @@ config KVM_BOOK3S_64 config KVM_BOOK3S_64_HV tristate "KVM support for POWER7 and PPC970 using hypervisor mode in host" depends on KVM_BOOK3S_64 + depends on !CPU_LITTLE_ENDIAN select KVM_BOOK3S_HV_POSSIBLE select MMU_NOTIFIER select CMA -- 1.8.1.4
[PATCH 00/13] PPC: KVM: Enable PR KVM on ppc64le
During the enablement of ppc64le KVM has been kept unfixed. This patch set is the initial attempt to make all of KVM work on ppc64le hosts. It starts the effort by bringing PR KVM over. With this patch set I am successfully able to run book3s_32 (BE) and book3s_64 (BE, LE) guests on a host ppc64le system. Please bear in mind that this patch set does *not* implement POWER8 support, so if you're running on a POWER8 host you definitely want to pass in -cpu POWER7 and cross your fingers that the guest doesn't trigger a facility unavailable interrupt which we don't trap on yet. Alex Alexander Graf (13): KVM: PPC: Book3S PR: Implement LPCR ONE_REG KVM: PPC: Book3S: PR: Fix C/R bit setting KVM: PPC: Book3S_32: PR: Access HTAB in big endian KVM: PPC: Book3S_64 PR: Access HTAB in big endian KVM: PPC: Book3S_64 PR: Access shadow slb in big endian KVM: PPC: Book3S PR: Give guest control over MSR_LE KVM: PPC: Book3S PR: Default to big endian guest KVM: PPC: Book3S PR: PAPR: Access HTAB in big endian KVM: PPC: Book3S PR: PAPR: Access RTAS in big endian KVM: PPC: PR: Fill pvinfo hcall instructions in big endian KVM: PPC: Make shared struct aka magic page guest endian KVM: PPC: Book3S PR: Do dcbz32 patching with big endian instructions KVM: PPC: Book3S: Move little endian conflict to HV KVM arch/powerpc/include/asm/kvm_book3s.h| 4 +- arch/powerpc/include/asm/kvm_host.h | 3 + arch/powerpc/include/asm/kvm_ppc.h | 80 ++- arch/powerpc/kernel/asm-offsets.c| 2 + arch/powerpc/kvm/Kconfig | 2 +- arch/powerpc/kvm/book3s.c| 72 ++-- arch/powerpc/kvm/book3s_32_mmu.c | 41 +++- arch/powerpc/kvm/book3s_32_mmu_host.c| 4 +- arch/powerpc/kvm/book3s_64_mmu.c | 42 +++- arch/powerpc/kvm/book3s_64_mmu_host.c| 4 +- arch/powerpc/kvm/book3s_64_slb.S | 33 +- arch/powerpc/kvm/book3s_emulate.c| 28 arch/powerpc/kvm/book3s_hv.c | 11 arch/powerpc/kvm/book3s_interrupts.S | 23 ++- arch/powerpc/kvm/book3s_paired_singles.c | 16 +++-- arch/powerpc/kvm/book3s_pr.c | 109 +++ arch/powerpc/kvm/book3s_pr_papr.c| 16 
+++-- arch/powerpc/kvm/book3s_rtas.c | 29 arch/powerpc/kvm/emulate.c | 24 +++ arch/powerpc/kvm/powerpc.c | 50 +++--- arch/powerpc/kvm/trace_pr.h | 2 +- 21 files changed, 410 insertions(+), 185 deletions(-) -- 1.8.1.4
[PATCH 06/13] KVM: PPC: Book3S PR: Give guest control over MSR_LE
When we calculate the actual MSR that the guest is running with when in guest context, we take a few MSR bits from the MSR the guest thinks it's using. Add MSR_LE to these bits, so the guest gets full control over its own endianness setting. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_pr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index 9189ac5..8076543 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -249,7 +249,7 @@ static void kvmppc_recalc_shadow_msr(struct kvm_vcpu *vcpu) ulong smsr = vcpu->arch.shared->msr; /* Guest MSR values */ - smsr &= MSR_FE0 | MSR_FE1 | MSR_SF | MSR_SE | MSR_BE; + smsr &= MSR_FE0 | MSR_FE1 | MSR_SF | MSR_SE | MSR_BE | MSR_LE; /* Process MSR values */ smsr |= MSR_ME | MSR_RI | MSR_IR | MSR_DR | MSR_PR | MSR_EE; /* External providers the guest reserved */ -- 1.8.1.4
[PATCH 03/13] KVM: PPC: Book3S_32: PR: Access HTAB in big endian
The HTAB is always big endian. We access the guest's HTAB using copy_from/to_user, but don't yet take care of the fact that we might be running on an LE host. Wrap all accesses to the guest HTAB with big endian accessors. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_32_mmu.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 60fc3f4..0e42b16 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -208,6 +208,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr, u32 sre; hva_t ptegp; u32 pteg[16]; + u32 pte0, pte1; u32 ptem = 0; int i; int found = 0; @@ -233,11 +234,13 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr, } for (i=0; i<16; i+=2) { - if (ptem == pteg[i]) { + pte0 = be32_to_cpu(pteg[i]); + pte1 = be32_to_cpu(pteg[i + 1]); + if (ptem == pte0) { u8 pp; - pte->raddr = (pteg[i+1] & ~(0xFFFULL)) | (eaddr & 0xFFF); - pp = pteg[i+1] & 3; + pte->raddr = (pte1 & ~(0xFFFULL)) | (eaddr & 0xFFF); + pp = pte1 & 3; if ((sr_kp(sre) && (vcpu->arch.shared->msr & MSR_PR)) || (sr_ks(sre) && !(vcpu->arch.shared->msr & MSR_PR))) @@ -260,7 +263,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr, } dprintk_pte("MMU: Found PTE -> %x %x - %x\n", - pteg[i], pteg[i+1], pp); + pte0, pte1, pp); found = 1; break; } @@ -269,7 +272,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr, /* Update PTE C and A bits, so the guest's swapper knows we used the page */ if (found) { - u32 pte_r = pteg[i+1]; + u32 pte_r = pte1; char __user *addr = (char __user *) (ptegp + (i+1) * sizeof(u32)); /* @@ -296,7 +299,8 @@ no_page_found: to_book3s(vcpu)->sdr1, ptegp); for (i=0; i<16; i+=2) { dprintk_pte(" %02d: 0x%x - 0x%x (0x%x)\n", - i, pteg[i], pteg[i+1], ptem); + i, be32_to_cpu(pteg[i]), + be32_to_cpu(pteg[i+1]), ptem); } } -- 1.8.1.4 -- To 
[PATCH 10/13] KVM: PPC: PR: Fill pvinfo hcall instructions in big endian
We expose a blob of hypercall instructions to user space that it gives to the guest via device tree again. That blob should contain a stream of instructions necessary to do a hypercall in big endian, as it just gets passed into the guest and old guests use them straight away. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/powerpc.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 3cf541a..a9bd0ff 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -1015,10 +1015,10 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo) u32 inst_nop = 0x6000; #ifdef CONFIG_KVM_BOOKE_HV u32 inst_sc1 = 0x4422; - pvinfo->hcall[0] = inst_sc1; - pvinfo->hcall[1] = inst_nop; - pvinfo->hcall[2] = inst_nop; - pvinfo->hcall[3] = inst_nop; + pvinfo->hcall[0] = cpu_to_be32(inst_sc1); + pvinfo->hcall[1] = cpu_to_be32(inst_nop); + pvinfo->hcall[2] = cpu_to_be32(inst_nop); + pvinfo->hcall[3] = cpu_to_be32(inst_nop); #else u32 inst_lis = 0x3c00; u32 inst_ori = 0x6000; @@ -1034,10 +1034,10 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo) *sc *nop */ - pvinfo->hcall[0] = inst_lis | ((KVM_SC_MAGIC_R0 >> 16) & inst_imm_mask); - pvinfo->hcall[1] = inst_ori | (KVM_SC_MAGIC_R0 & inst_imm_mask); - pvinfo->hcall[2] = inst_sc; - pvinfo->hcall[3] = inst_nop; + pvinfo->hcall[0] = cpu_to_be32(inst_lis | ((KVM_SC_MAGIC_R0 >> 16) & inst_imm_mask)); + pvinfo->hcall[1] = cpu_to_be32(inst_ori | (KVM_SC_MAGIC_R0 & inst_imm_mask)); + pvinfo->hcall[2] = cpu_to_be32(inst_sc); + pvinfo->hcall[3] = cpu_to_be32(inst_nop); #endif pvinfo->flags = KVM_PPC_PVINFO_FLAGS_EV_IDLE; -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/13] KVM: PPC: Book3S_64 PR: Access HTAB in big endian
The HTAB is always big endian. We access the guest's HTAB using copy_from/to_user, but don't yet take care of the fact that we might be running on an LE host. Wrap all accesses to the guest HTAB with big endian accessors. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_64_mmu.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c index e9854e7..158fb22 100644 --- a/arch/powerpc/kvm/book3s_64_mmu.c +++ b/arch/powerpc/kvm/book3s_64_mmu.c @@ -281,12 +281,15 @@ do_second: key = 4; for (i=0; i<16; i+=2) { + u64 pte0 = be64_to_cpu(pteg[i]); + u64 pte1 = be64_to_cpu(pteg[i + 1]); + /* Check all relevant fields of 1st dword */ - if ((pteg[i] & v_mask) == v_val) { + if ((pte0 & v_mask) == v_val) { /* If large page bit is set, check pgsize encoding */ if (slbe->large && (vcpu->arch.hflags & BOOK3S_HFLAG_MULTI_PGSIZE)) { - pgsize = decode_pagesize(slbe, pteg[i+1]); + pgsize = decode_pagesize(slbe, pte1); if (pgsize < 0) continue; } @@ -303,8 +306,8 @@ do_second: goto do_second; } - v = pteg[i]; - r = pteg[i+1]; + v = be64_to_cpu(pteg[i]); + r = be64_to_cpu(pteg[i+1]); pp = (r & HPTE_R_PP) | key; if (r & HPTE_R_PP0) pp |= 8; -- 1.8.1.4
[PATCH 05/13] KVM: PPC: Book3S_64 PR: Access shadow slb in big endian
The "shadow SLB" in the PACA is shared with the hypervisor, so it has to be big endian. We access the shadow SLB during world switch, so let's make sure we access it in big endian even when we're on a little endian host. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_64_slb.S | 33 - 1 file changed, 16 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S index 4f12e8f..596140e 100644 --- a/arch/powerpc/kvm/book3s_64_slb.S +++ b/arch/powerpc/kvm/book3s_64_slb.S @@ -17,29 +17,28 @@ * Authors: Alexander Graf */ -#ifdef __LITTLE_ENDIAN__ -#error Need to fix SLB shadow accesses in little endian mode -#endif - #define SHADOW_SLB_ESID(num) (SLBSHADOW_SAVEAREA + (num * 0x10)) #define SHADOW_SLB_VSID(num) (SLBSHADOW_SAVEAREA + (num * 0x10) + 0x8) #define UNBOLT_SLB_ENTRY(num) \ - ld r9, SHADOW_SLB_ESID(num)(r12); \ - /* Invalid? Skip. */; \ - rldicl. r0, r9, 37, 63; \ - beq slb_entry_skip_ ## num; \ - xoris r9, r9, SLB_ESID_V@h; \ - std r9, SHADOW_SLB_ESID(num)(r12); \ + li r11, SHADOW_SLB_ESID(num); \ + LDX_BE r9, r12, r11; \ + /* Invalid? Skip. */; \ + rldicl. r0, r9, 37, 63; \ + beq slb_entry_skip_ ## num; \ + xoris r9, r9, SLB_ESID_V@h; \ + STDX_BE r9, r12, r11; \ slb_entry_skip_ ## num: #define REBOLT_SLB_ENTRY(num) \ - ld r10, SHADOW_SLB_ESID(num)(r11); \ - cmpdi r10, 0; \ - beq slb_exit_skip_ ## num; \ - orisr10, r10, SLB_ESID_V@h; \ - ld r9, SHADOW_SLB_VSID(num)(r11); \ - slbmte r9, r10; \ - std r10, SHADOW_SLB_ESID(num)(r11); \ + li r8, SHADOW_SLB_ESID(num); \ + li r7, SHADOW_SLB_VSID(num); \ + LDX_BE r10, r11, r8; \ + cmpdi r10, 0; \ + beq slb_exit_skip_ ## num; \ + orisr10, r10, SLB_ESID_V@h; \ + LDX_BE r9, r11, r7;\ + slbmte r9, r10;\ + STDX_BE r10, r11, r8; \ slb_exit_skip_ ## num: /** -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/13] KVM: PPC: Book3S PR: PAPR: Access RTAS in big endian
When the guest does an RTAS hypercall it keeps all RTAS variables inside a big endian data structure. To make sure we don't have to bother about endianness inside the actual RTAS handlers, let's just convert the whole structure to host endian before we call our RTAS handlers and back to big endian when we return to the guest. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_rtas.c | 29 + 1 file changed, 29 insertions(+) diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c index 7a05315..edb14ba 100644 --- a/arch/powerpc/kvm/book3s_rtas.c +++ b/arch/powerpc/kvm/book3s_rtas.c @@ -205,6 +205,32 @@ int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp) return rc; } +static void kvmppc_rtas_swap_endian_in(struct rtas_args *args) +{ +#ifdef __LITTLE_ENDIAN__ + int i; + + args->token = be32_to_cpu(args->token); + args->nargs = be32_to_cpu(args->nargs); + args->nret = be32_to_cpu(args->nret); + for (i = 0; i < args->nargs; i++) + args->args[i] = be32_to_cpu(args->args[i]); +#endif +} + +static void kvmppc_rtas_swap_endian_out(struct rtas_args *args) +{ +#ifdef __LITTLE_ENDIAN__ + int i; + + for (i = 0; i < args->nret; i++) + args->args[i] = cpu_to_be32(args->args[i]); + args->token = cpu_to_be32(args->token); + args->nargs = cpu_to_be32(args->nargs); + args->nret = cpu_to_be32(args->nret); +#endif +} + int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu) { struct rtas_token_definition *d; @@ -223,6 +249,8 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu) if (rc) goto fail; + kvmppc_rtas_swap_endian_in(&args); + /* * args->rets is a pointer into args->args. 
Now that we've * copied args we need to fix it up to point into our copy, @@ -247,6 +275,7 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu) if (rc == 0) { args.rets = orig_rets; + kvmppc_rtas_swap_endian_out(&args); rc = kvm_write_guest(vcpu->kvm, args_phys, &args, sizeof(args)); if (rc) goto fail; -- 1.8.1.4
[PATCH 02/13] KVM: PPC: Book3S: PR: Fix C/R bit setting
Commit 9308ab8e2d made C/R HTAB updates go byte-wise into the target HTAB. However, it didn't update the guest's copy of the HTAB, but instead the host local copy of it. Write to the guest's HTAB instead. Signed-off-by: Alexander Graf CC: Paul Mackerras --- arch/powerpc/kvm/book3s_32_mmu.c | 2 +- arch/powerpc/kvm/book3s_64_mmu.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 76a64ce..60fc3f4 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -270,7 +270,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr, page */ if (found) { u32 pte_r = pteg[i+1]; - char __user *addr = (char __user *) &pteg[i+1]; + char __user *addr = (char __user *) (ptegp + (i+1) * sizeof(u32)); /* * Use single-byte writes to update the HPTE, to diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c index 4a77725..e9854e7 100644 --- a/arch/powerpc/kvm/book3s_64_mmu.c +++ b/arch/powerpc/kvm/book3s_64_mmu.c @@ -348,14 +348,14 @@ do_second: * non-PAPR platforms such as mac99, and this is * what real hardware does. */ - char __user *addr = (char __user *) &pteg[i+1]; +char __user *addr = (char __user *) (ptegp + (i + 1) * sizeof(u64)); r |= HPTE_R_R; put_user(r >> 8, addr + 6); } if (iswrite && gpte->may_write && !(r & HPTE_R_C)) { /* Set the dirty flag */ /* Use a single byte write */ - char __user *addr = (char __user *) &pteg[i+1]; +char __user *addr = (char __user *) (ptegp + (i + 1) * sizeof(u64)); r |= HPTE_R_C; put_user(r, addr + 7); } -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
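The essence of the fix is an addressing one: the single-byte C/R update must target the guest HTAB at ptegp plus the entry's offset, not the host-local pteg[] copy that was read out with copy_from_user. A minimal userspace model of the two address computations; the buffer layout is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Before the fix: the update address was derived from the local copy,
 * so the write never reached the guest's HTAB. */
uint8_t *wrong_addr(uint32_t *pteg, int i)
{
    return (uint8_t *)&pteg[i + 1];   /* host-local array: update is lost */
}

/* After the fix: the address is computed from the HTAB base (ptegp)
 * plus the same entry offset, landing in guest-visible memory. */
uint8_t *right_addr(uint8_t *guest_mem, uintptr_t ptegp, int i)
{
    return guest_mem + ptegp + (i + 1) * sizeof(uint32_t);
}
```

Writing through the first pointer mutates only a stack copy; writing through the second is visible in the modeled guest memory.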
[PATCH 08/13] KVM: PPC: Book3S PR: PAPR: Access HTAB in big endian
The HTAB on PPC is always in big endian. When we access it via hypercalls on behalf of the guest and we're running on a little endian host, we need to make sure we swap the bits accordingly. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_pr_papr.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c index 5efa97b..255e5b1 100644 --- a/arch/powerpc/kvm/book3s_pr_papr.c +++ b/arch/powerpc/kvm/book3s_pr_papr.c @@ -57,7 +57,7 @@ static int kvmppc_h_pr_enter(struct kvm_vcpu *vcpu) for (i = 0; ; ++i) { if (i == 8) goto done; - if ((*hpte & HPTE_V_VALID) == 0) + if ((be64_to_cpu(*hpte) & HPTE_V_VALID) == 0) break; hpte += 2; } @@ -67,8 +67,8 @@ static int kvmppc_h_pr_enter(struct kvm_vcpu *vcpu) goto done; } - hpte[0] = kvmppc_get_gpr(vcpu, 6); - hpte[1] = kvmppc_get_gpr(vcpu, 7); + hpte[0] = cpu_to_be64(kvmppc_get_gpr(vcpu, 6)); + hpte[1] = cpu_to_be64(kvmppc_get_gpr(vcpu, 7)); pteg_addr += i * HPTE_SIZE; copy_to_user((void __user *)pteg_addr, hpte, HPTE_SIZE); kvmppc_set_gpr(vcpu, 4, pte_index | i); @@ -93,6 +93,8 @@ static int kvmppc_h_pr_remove(struct kvm_vcpu *vcpu) pteg = get_pteg_addr(vcpu, pte_index); mutex_lock(&vcpu->kvm->arch.hpt_mutex); copy_from_user(pte, (void __user *)pteg, sizeof(pte)); + pte[0] = be64_to_cpu(pte[0]); + pte[1] = be64_to_cpu(pte[1]); ret = H_NOT_FOUND; if ((pte[0] & HPTE_V_VALID) == 0 || @@ -169,6 +171,8 @@ static int kvmppc_h_pr_bulk_remove(struct kvm_vcpu *vcpu) pteg = get_pteg_addr(vcpu, tsh & H_BULK_REMOVE_PTEX); copy_from_user(pte, (void __user *)pteg, sizeof(pte)); + pte[0] = be64_to_cpu(pte[0]); + pte[1] = be64_to_cpu(pte[1]); /* tsl = AVPN */ flags = (tsh & H_BULK_REMOVE_FLAGS) >> 26; @@ -207,6 +211,8 @@ static int kvmppc_h_pr_protect(struct kvm_vcpu *vcpu) pteg = get_pteg_addr(vcpu, pte_index); mutex_lock(&vcpu->kvm->arch.hpt_mutex); copy_from_user(pte, (void __user *)pteg, sizeof(pte)); + pte[0] = be64_to_cpu(pte[0]); + pte[1] = 
be64_to_cpu(pte[1]); ret = H_NOT_FOUND; if ((pte[0] & HPTE_V_VALID) == 0 || @@ -225,6 +231,8 @@ static int kvmppc_h_pr_protect(struct kvm_vcpu *vcpu) rb = compute_tlbie_rb(v, r, pte_index); vcpu->arch.mmu.tlbie(vcpu, rb, rb & 1 ? true : false); + pte[0] = cpu_to_be64(pte[0]); + pte[1] = cpu_to_be64(pte[1]); copy_to_user((void __user *)pteg, pte, sizeof(pte)); ret = H_SUCCESS; -- 1.8.1.4
[PATCH 11/13] KVM: PPC: Make shared struct aka magic page guest endian
The shared (magic) page is a data structure that contains often used supervisor privileged SPRs accessible via memory to the user to reduce the number of exits we have to take to read/write them. When we actually share this structure with the guest we have to maintain it in guest endianness, because some of the patch tricks only work with native endian load/store operations. Since we only share the structure with either host or guest in little endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv. For booke, the shared struct stays big endian. For book3s_64 hv we maintain the struct in host native endian, since it never gets shared with the guest. For book3s_64 pr we introduce a variable that tells us which endianness the shared struct is in and route every access to it through helper inline functions that evaluate this variable. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h| 3 +- arch/powerpc/include/asm/kvm_host.h | 3 + arch/powerpc/include/asm/kvm_ppc.h | 80 ++- arch/powerpc/kernel/asm-offsets.c| 2 + arch/powerpc/kvm/book3s.c| 72 arch/powerpc/kvm/book3s_32_mmu.c | 21 +++ arch/powerpc/kvm/book3s_32_mmu_host.c| 4 +- arch/powerpc/kvm/book3s_64_mmu.c | 19 --- arch/powerpc/kvm/book3s_64_mmu_host.c| 4 +- arch/powerpc/kvm/book3s_emulate.c| 28 +- arch/powerpc/kvm/book3s_hv.c | 11 arch/powerpc/kvm/book3s_interrupts.S | 23 +++- arch/powerpc/kvm/book3s_paired_singles.c | 16 +++--- arch/powerpc/kvm/book3s_pr.c | 95 +++- arch/powerpc/kvm/book3s_pr_papr.c| 2 +- arch/powerpc/kvm/emulate.c | 24 arch/powerpc/kvm/powerpc.c | 34 +++- arch/powerpc/kvm/trace_pr.h | 2 +- 18 files changed, 306 insertions(+), 137 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 27b1041..ca3b8f1 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -269,9 +269,10 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu) return vcpu->arch.pc; } 
+static u64 kvmppc_get_msr(struct kvm_vcpu *vcpu); static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu) { - return (vcpu->arch.shared->msr & MSR_LE) != (MSR_KERNEL & MSR_LE); + return (kvmppc_get_msr(vcpu) & MSR_LE) != (MSR_KERNEL & MSR_LE); } static inline u32 kvmppc_get_last_inst_internal(struct kvm_vcpu *vcpu, ulong pc) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 1eaea2d..3fffb2e 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -622,6 +622,9 @@ struct kvm_vcpu_arch { wait_queue_head_t cpu_run; struct kvm_vcpu_arch_shared *shared; +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE) + bool shared_big_endian; +#endif unsigned long magic_page_pa; /* phys addr to map the magic page to */ unsigned long magic_page_ea; /* effect. addr to map the magic page to */ diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 4096f16..4a7cc45 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -449,6 +449,84 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pfn) } /* + * Shared struct helpers. The shared struct can be little or big endian, + * depending on the guest endianness. So expose helpers to all of them. 
+ */ +static inline bool kvmppc_shared_big_endian(struct kvm_vcpu *vcpu) +{ +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE) + /* Only Book3S_64 PR supports bi-endian for now */ + return vcpu->arch.shared_big_endian; +#elif defined(CONFIG_PPC_BOOK3S_64) && defined(__LITTLE_ENDIAN__) + /* Book3s_64 HV on little endian is always little endian */ + return false; +#else + return true; +#endif +} + +#define SHARED_WRAPPER_GET(reg, size) \ +static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu) \ +{ \ + if (kvmppc_shared_big_endian(vcpu)) \ + return be##size##_to_cpu(vcpu->arch.shared->reg);\ + else\ + return le##size##_to_cpu(vcpu->arch.shared->reg);\ +} \ + +#define SHARED_WRAPPER_SET(reg, size) \ +static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val) \ +{ \ + if (kvmppc_shared_big_endi
[PATCH 12/13] KVM: PPC: Book3S PR: Do dcbz32 patching with big endian instructions
When the host CPU we're running on doesn't support dcbz32 itself, but the guest wants to have dcbz only clear 32 bytes of data, we loop through every executable mapped page to search for dcbz instructions and patch them with a special privileged instruction that we emulate as dcbz32. The only guests that want to see dcbz act as 32byte are book3s_32 guests, so we don't have to worry about little endian instruction ordering. So let's just always search for big endian dcbz instructions, also when we're on a little endian host. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_32_mmu.c | 2 +- arch/powerpc/kvm/book3s_pr.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 628d90e..93503bb 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -131,7 +131,7 @@ static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvm_vcpu *vcpu, pteg = (vcpu_book3s->sdr1 & 0x) | hash; dprintk("MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n", - kvmppc_get_pc(&vcpu_book3s->vcpu), eaddr, vcpu_book3s->sdr1, pteg, + kvmppc_get_pc(vcpu), eaddr, vcpu_book3s->sdr1, pteg, sr_vsid(sre)); r = gfn_to_hva(vcpu->kvm, pteg >> PAGE_SHIFT); diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index a3d705e..96dbb5f 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -428,8 +428,8 @@ static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte) /* patch dcbz into reserved instruction, so we trap */ for (i=hpage_offset; i < hpage_offset + (HW_PAGE_SIZE / 4); i++) - if ((page[i] & 0xff0007ff) == INS_DCBZ) - page[i] &= 0xfff7; + if ((be32_to_cpu(page[i]) & 0xff0007ff) == INS_DCBZ) + page[i] &= cpu_to_be32(0xfff7); kunmap_atomic(page); put_page(hpage); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at 
http://vger.kernel.org/majordomo-info.html
[PATCH 07/13] KVM: PPC: Book3S PR: Default to big endian guest
The default MSR when user space does not define anything should be identical on little and big endian hosts, so remove MSR_LE from it. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_pr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index 8076543..1644d17 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -1193,7 +1193,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_pr(struct kvm *kvm, kvmppc_set_pvr_pr(vcpu, vcpu->arch.pvr); vcpu->arch.slb_nr = 64; - vcpu->arch.shadow_msr = MSR_USER64; + vcpu->arch.shadow_msr = MSR_USER64 & ~MSR_LE; err = kvmppc_mmu_init(vcpu); if (err < 0) -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging
Hi Rik, On Tue, Apr 22, 2014 at 10:40:17AM -0400, Rik van Riel wrote: > On 04/22/2014 07:57 AM, Christian Borntraeger wrote: > > On 22/04/14 12:55, Christian Borntraeger wrote: > >> While preparing/testing some KVM on s390 patches for the next merge window > >> (target is kvm/next which is based on 3.15-rc1) I faced a very severe > >> performance hickup on guest paging (all anonymous memory). > >> > >> All memory bound guests are in "D" state now and the system is barely > >> unusable. > >> > >> Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d > >> "mm: vmscan: do not swap anon pages just because free+file is low" makes > >> the problem go away. > >> > >> According to /proc/vmstat the system is now in direct reclaim almost all > >> the time for every page fault (more than 10x more direct reclaims than > >> kswap reclaims) > >> With the patch being reverted everything is fine again. > >> > >> Any ideas? > > > > Here is an idea to tackle my problem and the original problem: > > > > reverting 0bf1457f0cfca7bc026a82323ad34bcf58ad035d + checking against low, > > also seems to make my system usable. > > > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1923,7 +1923,7 @@ static void get_scan_count(struct lruvec *lruvec, > > struct scan_control *sc, > > */ > > if (global_reclaim(sc)) { > > free = zone_page_state(zone, NR_FREE_PAGES); > > - if (unlikely(file + free <= high_wmark_pages(zone))) { > > + if (unlikely(file + free <= low_wmark_pages(zone))) { > > scan_balance = SCAN_ANON; > > goto out; > > } > > > > Looks reasonable to me. Johannes? I went with a full revert to be on the safe side. Since kswapd's goal is the high watermark, I kind of liked the idea that we start swapping once the file pages alone are not enough anymore to restore the wmark. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
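The heuristic being debated reduces to a single comparison; which watermark it compares against decides how aggressively reclaim is forced onto anonymous pages. A trivial model of that predicate:

```c
#include <stdbool.h>

/* Model of the get_scan_count() check under discussion: if reclaimable
 * file pages plus free pages cannot restore a zone watermark, balance
 * reclaim entirely onto anonymous pages (SCAN_ANON). The two proposed
 * variants differ only in the watermark passed in. */
bool scan_anon_only(unsigned long file, unsigned long free,
                    unsigned long wmark)
{
    return file + free <= wmark;
}
```

With, say, file = 80 and free = 40, a high watermark of 200 forces anon scanning while a low watermark of 100 does not, which is the behavioral gap between the reverted check and the proposed low_wmark_pages() variant.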
RE: Change dev->needed_headroom of virtio_net to make Patch "virtio-net: put virtio net header inline with data" work better
Thank you very much! I will post a patch tomorrow. --Best wishes! //Zhang Jie -Original Message- From: Michael S. Tsirkin [mailto:m...@redhat.com] Sent: Thursday, April 24, 2014 7:45 PM To: Zhangjie (HZ) Cc: jasow...@redhat.com; Qinchuanyu; Liuyongan; kvm@vger.kernel.org; net...@vger.kernel.org Subject: Re: Change dev->needed_headroom of virtio_net to make Patch"virtio-net: put virtio net header inline with data"work better On Thu, Apr 24, 2014 at 10:19:58AM +, Zhangjie (HZ) wrote: > Hi! > > Patch “virtio-net: put virtio net header inline with data” , has a > notable improvement for TCP packages. > > But UDP packages from virtio_net nic, do not have enough head room. I > wonder if we can set dev->needed_headroom > > to the size of virtio net header, so as to put the header in. By doing > this, udp get about 5% improvement in bandwidth. Sounds like a reasonable thing to do. Want to post the patch so people can try it out? > > > -- > > Thanks, > > //Zhang Jie > >
Re: Change dev->needed_headroom of virtio_net to make Patch "virtio-net: put virtio net header inline with data" work better
On Thu, Apr 24, 2014 at 10:19:58AM +, Zhangjie (HZ) wrote: > Hi! > > Patch “virtio-net: put virtio net header inline with data” , has a notable > improvement for TCP packages. > > But UDP packages from virtio_net nic, do not have enough head room. I wonder > if > we can set dev->needed_headroom > > to the size of virtio net header, so as to put the header in. By doing this, > udp get about 5% improvement in bandwidth. Sounds like a reasonable thing to do. Want to post the patch so people can try it out? > > > -- > > Thanks, > > //Zhang Jie > > >
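The suggested change helps because advertising needed_headroom lets the virtio-net header be written immediately in front of the payload instead of forcing a copy or reallocation. A userspace model of that fast/slow path split follows; the header struct here is a stand-in, not the real struct virtio_net_hdr layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the virtio-net header (layout is an assumption). */
struct vnet_hdr {
    uint8_t  flags, gso_type;
    uint16_t hdr_len, gso_size, csum_start, csum_offset;
};

/* Userspace model of skb headroom: reserved bytes before the payload. */
struct pkt {
    uint8_t buf[256];
    size_t  headroom;   /* bytes reserved before the payload start */
    size_t  len;        /* current packet length */
};

/* Push the header into the reserved headroom, as the inline-header
 * path does. Returns a pointer to the header, or NULL when there is
 * not enough headroom and a copy/expand would be required instead. */
uint8_t *push_hdr(struct pkt *p)
{
    if (p->headroom < sizeof(struct vnet_hdr))
        return NULL;                     /* slow path: reallocation */
    p->headroom -= sizeof(struct vnet_hdr);
    p->len      += sizeof(struct vnet_hdr);
    return p->buf + p->headroom;         /* header now precedes payload */
}
```

TCP packets typically arrive with spare headroom already, which matches the observation that only UDP was missing out on the inline-header win.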
[PATCH 3.11 078/182] MIPS: KVM: Pass reserved instruction exceptions to guest
3.11.10.9 -stable review patch. If anyone has any objections, please let me know. -- From: James Hogan commit 15505679362270d02c449626385cb74af8905514 upstream. Previously a reserved instruction exception while in guest code would cause a KVM internal error if kvm_mips_handle_ri() didn't recognise the instruction (including a RDHWR from an unrecognised hardware register). However the guest OS should really have the opportunity to catch the exception so that it can take the appropriate actions such as sending a SIGILL to the guest user process or emulating the instruction itself. Therefore in these cases emulate a guest RI exception and only return EMULATE_FAIL if that fails, being careful to revert the PC first in case the exception occurred in a branch delay slot in which case the PC will already point to the branch target. Also turn the printk messages relating to these cases into kvm_debug messages so that they aren't usually visible. This allows crashme to run in the guest without killing the entire VM. 
Signed-off-by: James Hogan Cc: Ralf Baechle Cc: Gleb Natapov Cc: Paolo Bonzini Cc: Sanjay Lal Cc: linux-m...@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Luis Henriques --- arch/mips/kvm/kvm_mips_emul.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c index 4b6274b..e75ef82 100644 --- a/arch/mips/kvm/kvm_mips_emul.c +++ b/arch/mips/kvm/kvm_mips_emul.c @@ -1571,17 +1571,17 @@ kvm_mips_handle_ri(unsigned long cause, uint32_t *opc, arch->gprs[rt] = kvm_read_c0_guest_userlocal(cop0); #else /* UserLocal not implemented */ - er = kvm_mips_emulate_ri_exc(cause, opc, run, vcpu); + er = EMULATE_FAIL; #endif break; default: - printk("RDHWR not supported\n"); + kvm_debug("RDHWR %#x not supported @ %p\n", rd, opc); er = EMULATE_FAIL; break; } } else { - printk("Emulate RI not supported @ %p: %#x\n", opc, inst); + kvm_debug("Emulate RI not supported @ %p: %#x\n", opc, inst); er = EMULATE_FAIL; } @@ -1590,6 +1590,7 @@ kvm_mips_handle_ri(unsigned long cause, uint32_t *opc, */ if (er == EMULATE_FAIL) { vcpu->arch.pc = curr_pc; + er = kvm_mips_emulate_ri_exc(cause, opc, run, vcpu); } return er; } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
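The control-flow change can be modeled compactly: attempt emulation, and on EMULATE_FAIL restore the pre-fault PC (which may have been advanced past a branch-delay slot) and queue a guest RI exception instead of failing the whole VM. A hedged sketch; the types and names below are invented for illustration.

```c
#include <stdint.h>

enum emul { EMULATE_DONE, EMULATE_FAIL };

struct vcpu {
    unsigned long pc;
    int guest_ri_pending;   /* RI exception queued for the guest */
};

/* Decode callback: decides whether the instruction can be emulated. */
typedef enum emul (*handler_t)(struct vcpu *v, uint32_t inst);

static enum emul emulate_none(struct vcpu *v, uint32_t inst)
{
    (void)v; (void)inst;
    return EMULATE_FAIL;            /* unrecognized instruction */
}

static enum emul emulate_ok(struct vcpu *v, uint32_t inst)
{
    (void)inst;
    v->pc += 4;                     /* emulated: advance past it */
    return EMULATE_DONE;
}

/* Mirrors the reworked flow: on failure, roll the PC back to the
 * faulting instruction and deliver RI to the guest rather than
 * reporting a KVM internal error. */
enum emul handle_ri(struct vcpu *v, uint32_t inst, unsigned long curr_pc,
                    handler_t try_emulate)
{
    enum emul er = try_emulate(v, inst);

    if (er == EMULATE_FAIL) {
        v->pc = curr_pc;            /* undo any PC advance */
        v->guest_ri_pending = 1;    /* guest handles it (e.g. SIGILL) */
        er = EMULATE_DONE;
    }
    return er;
}
```

This is why crashme can now run inside the guest: unknown opcodes become ordinary guest exceptions instead of VM-fatal errors.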