Re: [PATCH] arm64: KVM: Do not inject a 64bit fault for a 32bit guest
On Thu, Aug 27, 2015 at 03:05:47PM +0100, Marc Zyngier wrote: When injecting a fault into a 32bit guest, it seems rather idiotic to also inject a 64bit fault that is only going to corrupt the guest state, and lead to a situation where we restore an illegal context. Just fix the stupid bug that has been there from day 1. Cc: sta...@vger.kernel.org Reported-by: Russell King li...@arm.linux.org.uk s/linux/rmk+kernel/ please Tested-by: Russell King rmk+ker...@arm.linux.org.uk Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- Will: Paolo being on holiday, do you mind merging this one via your tree? I don't think the commit message does this bug justice. The implication is it's just a guest issue. It isn't, the bug appears to take out the host kernel in a truely spectacular way. http://www.arm.linux.org.uk/developer/build/result.php?type=bootidx=4871 Tested here, the fix stops the host kernel exploding. The crashed kvm instance can be stopped and a proper kernel can then be booted in a new guest instance. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/4] irqchip: GICv2/v3: Add support for irq_vcpu_affinity
Hi Marc, I tested the series on Calxeda Midway with VFIO use case. Also reviewed it again without finding anything new. Tested-by: Eric Auger eric.au...@linaro.org Reviewed-by: Eric Auger eric.au...@linaro.org Best Regards Eric On 08/26/2015 06:00 PM, Marc Zyngier wrote: The GICv2 and GICv3 architectures allow an active physical interrupt to be forwarded to a guest, and the guest to indirectly perform the deactivation of the interrupt by performing an EOI on the virtual interrupt (see for example the GICv2 spec, 3.2.1). This allows some substantial performance improvement for level triggered interrupts that otherwise have to be masked/unmasked in VFIO, not to mention the required trap back to KVM when the guest performs an EOI. To enable this, the GICs need to be switched to a different EOImode, where a taken interrupt can be left active (which prevents the same interrupt from being taken again), while other interrupts are still being processed normally. We also use the new irq_set_vcpu_affinity hook that was introduced for Intel's Posted Interrupts to determine whether or not to perform the deactivation at EOI-time. As all of this only makes sense when the kernel can behave as a hypervisor, we only enable this mode on detecting that the kernel was actually booted in HYP mode, and that the GIC supports this feature. This series is a complete rework of a RFC I sent over a year ago: http://lists.infradead.org/pipermail/linux-arm-kernel/2014-June/266328.html Since then, a lot has been either merged (the irqchip_state) or reworked (my active-timer series: http://www.spinics.net/lists/kvm/msg118768.html), and this implements the last few bits for Eric Auger's series to finally make it into the kernel: https://lkml.org/lkml/2015/7/2/268 https://lkml.org/lkml/2015/7/6/291 With all these patches combined, physical interrupt routing from the kernel into a VM becomes possible. Note that the implementation makes use of the static_key mechanism, which is undergoing an extensive rework in 4.3. I intend to convert this code once both are in mainline. This has been tested on Juno (GICv2) and FastModel (GICv3). A branch is available at: git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/gic-irq-vcpu-affinity-v4 * From v3: - Use separate irq_chip structures, leading to much nicer code (tglx) - Dropped Eric's Tested/Reviewed-by as there is significant changes * From v2: - Another small fix from Eric - Some commit message cleanups * From v1: - Fixes after review from Eric - Got rid of the cascaded GICv2 hack (it was broken anyway) - Folded the LPI deactivation patch (it makes more sense as part of the main one. - Some clarifying comments about the deactivate on mask - I haven't retained Eric's Reviewed/Tested-by, as the code as significantly changed on GICv2 Marc Zyngier (4): irqchip: GICv3: Convert to EOImode == 1 irqchip: GICv3: Don't deactivate interrupts forwarded to a guest irqchip: GIC: Convert to EOImode == 1 irqchip: GIC: Don't deactivate interrupts forwarded to a guest drivers/irqchip/irq-gic-v3.c | 70 +-- drivers/irqchip/irq-gic.c | 111 - include/linux/irqchip/arm-gic-v3.h | 9 +++ include/linux/irqchip/arm-gic.h| 4 ++ 4 files changed, 188 insertions(+), 6 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] iommu/s390: add iommu api for s390 pci devices
This adds an IOMMU API implementation for s390 PCI devices. Reviewed-by: Sebastian Ott seb...@linux.vnet.ibm.com Signed-off-by: Gerald Schaefer gerald.schae...@de.ibm.com --- MAINTAINERS | 7 + arch/s390/Kconfig | 1 + arch/s390/include/asm/pci.h | 4 + arch/s390/include/asm/pci_dma.h | 5 +- arch/s390/pci/pci_dma.c | 37 +++-- drivers/iommu/Kconfig | 7 + drivers/iommu/Makefile | 1 + drivers/iommu/s390-iommu.c | 337 8 files changed, 386 insertions(+), 13 deletions(-) create mode 100644 drivers/iommu/s390-iommu.c diff --git a/MAINTAINERS b/MAINTAINERS index 23da5da..8845410 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8937,6 +8937,13 @@ F: drivers/s390/net/*iucv* F: include/net/iucv/ F: net/iucv/ +S390 IOMMU (PCI) +M: Gerald Schaefer gerald.schae...@de.ibm.com +L: linux-s...@vger.kernel.org +W: http://www.ibm.com/developerworks/linux/linux390/ +S: Supported +F: drivers/iommu/s390-iommu.c + S3C24XX SD/MMC Driver M: Ben Dooks ben-li...@fluff.org L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers) diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 1d57000..74db332 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -582,6 +582,7 @@ menuconfig PCI bool PCI support select HAVE_DMA_ATTRS select PCI_MSI + select IOMMU_SUPPORT help Enable PCI support. diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index 34d9603..c873e68 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -62,6 +62,8 @@ struct zpci_bar_struct { u8 size; /* order 2 exponent */ }; +struct s390_domain; + /* Private data per function */ struct zpci_dev { struct pci_dev *pdev; @@ -118,6 +120,8 @@ struct zpci_dev { struct dentry *debugfs_dev; struct dentry *debugfs_perf; + + struct s390_domain *s390_domain; /* s390 IOMMU domain data */ }; static inline bool zdev_enabled(struct zpci_dev *zdev) diff --git a/arch/s390/include/asm/pci_dma.h b/arch/s390/include/asm/pci_dma.h index 30b4c17..7a7abf1 100644 --- a/arch/s390/include/asm/pci_dma.h +++ b/arch/s390/include/asm/pci_dma.h @@ -192,5 +192,8 @@ static inline unsigned long *get_st_pto(unsigned long entry) /* Prototypes */ int zpci_dma_init_device(struct zpci_dev *); void zpci_dma_exit_device(struct zpci_dev *); - +void dma_free_seg_table(unsigned long); +unsigned long *dma_alloc_cpu_table(void); +void dma_cleanup_tables(unsigned long *); +void dma_update_cpu_trans(unsigned long *, void *, dma_addr_t, int); #endif diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c index 37505b8..37d10f7 100644 --- a/arch/s390/pci/pci_dma.c +++ b/arch/s390/pci/pci_dma.c @@ -24,7 +24,7 @@ static int zpci_refresh_global(struct zpci_dev *zdev) zdev-iommu_pages * PAGE_SIZE); } -static unsigned long *dma_alloc_cpu_table(void) +unsigned long *dma_alloc_cpu_table(void) { unsigned long *table, *entry; @@ -114,12 +114,12 @@ static unsigned long *dma_walk_cpu_trans(unsigned long *rto, dma_addr_t dma_addr return pto[px]; } -static void dma_update_cpu_trans(struct zpci_dev *zdev, void *page_addr, -dma_addr_t dma_addr, int flags) +void dma_update_cpu_trans(unsigned long *dma_table, void *page_addr, + dma_addr_t dma_addr, int flags) { unsigned long *entry; - entry = dma_walk_cpu_trans(zdev-dma_table, dma_addr); + entry = dma_walk_cpu_trans(dma_table, dma_addr); if (!entry) { WARN_ON_ONCE(1); return; @@ -156,7 +156,8 @@ static int dma_update_trans(struct zpci_dev *zdev, unsigned long pa, goto no_refresh; for (i = 0; i nr_pages; i++) { - dma_update_cpu_trans(zdev, page_addr, dma_addr, flags); + dma_update_cpu_trans(zdev-dma_table, page_addr, dma_addr, +flags); page_addr += PAGE_SIZE; dma_addr += PAGE_SIZE; } @@ -181,7 +182,7 @@ no_refresh: return rc; } -static void dma_free_seg_table(unsigned long entry) +void dma_free_seg_table(unsigned long entry) { unsigned long *sto = get_rt_sto(entry); int sx; @@ -193,21 +194,18 @@ static void dma_free_seg_table(unsigned long entry) dma_free_cpu_table(sto); } -static void dma_cleanup_tables(struct zpci_dev *zdev) +void dma_cleanup_tables(unsigned long *table) { - unsigned long *table; int rtx; - if (!zdev || !zdev-dma_table) + if (!table) return; - table = zdev-dma_table; for (rtx = 0; rtx ZPCI_TABLE_ENTRIES; rtx++) if (reg_entry_isvalid(table[rtx]))
Re: [PATCH v4 0/4] irqchip: GICv2/v3: Add support for irq_vcpu_affinity
Hi Eric, On 27/08/15 14:03, Eric Auger wrote: Hi Marc, I tested the series on Calxeda Midway with VFIO use case. Also reviewed it again without finding anything new. Tested-by: Eric Auger eric.au...@linaro.org Reviewed-by: Eric Auger eric.au...@linaro.org Thanks a lot Eric, much appreciated! M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: arm/arm64: BUG FIX: Do not inject spurious interrupts
On Thu, Aug 27, 2015 at 02:51:22PM +0300, Pavel Fedin wrote: Commit 71760950bf3dc796e5e53ea3300dec724a09f593 (arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn) introduced vgic_queue_irq_to_lr() function with additional vgic_dist_irq_is_pending() check before setting LR_STATE_PENDING bit. In some cases it started causing the following situation if the userland quickly drops the IRQ back to inactive state for some reason: 1. Userland injects an IRQ with level == 1, this ends up in vgic_update_irq_pending(), which in turn calls vgic_dist_irq_set_pending() for this IRQ. 2. vCPU gets kicked. But kernel does not manage to reschedule it quickly (!!!) 3. Userland quickly resets the IRQ to level == 0. vgic_update_irq_pending() in this case will call vgic_dist_irq_clear_pending() and reset the pending flag. 4. vCPU finally wakes up. It succesfully rolls through through __kvm_vgic_flush_hwstate(), which populates vGIC registers. However, since neither pending nor active flags are now set for this IRQ, vgic_queue_irq_to_lr() does not set any state bits on this LR at all. Since this is level-sensitive IRQ, we end up in LR containing only LR_EOI_INT bit, causing unnecessary immediate exit from the guest. This patch fixes the problem by adding forgotten vgic_cpu_irq_clear(). This causes the IRQ not to be included into any lists, if it has been picked up after getting dropped to inactive level. Since this is a level-sensitive IRQ, this is correct behavior. The bug was caught on ARM64 kernel v4.1.6, running qemu virt guest, where it was caused by emulated pl011. It's a bit weird to just sned this as a new patch without replying to my mail from yesterday with feedback, explaining changes from what I did etc. Anyway. Signed-off-by: Pavel Fedin p.fe...@samsung.com --- virt/kvm/arm/vgic.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 34dad3c..bf155e3 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -,7 +,8 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq, kvm_debug(Set active, clear distributor: 0x%x\n, vlr.state); vgic_irq_clear_active(vcpu, irq); vgic_update_state(vcpu-kvm); - } else if (vgic_dist_irq_is_pending(vcpu, irq)) { + } else { + WARN_ON(!vgic_dist_irq_is_pending(vcpu, irq)); vlr.state |= LR_STATE_PENDING; kvm_debug(Set pending: 0x%x\n, vlr.state); } @@ -1567,8 +1568,10 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid, } else { if (level_triggered) { vgic_dist_irq_clear_level(vcpu, irq_num); - if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) + if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) { vgic_dist_irq_clear_pending(vcpu, irq_num); + vgic_cpu_irq_clear(vcpu, irq_num); I think you're missing a potential change to the irq_pending_on_cpu field here, which you have to compute by calling vgic_update_state() like we do elsewhere when we change status bits (note that this is different from the incorrect approach I suggested yesterday where we always just clear the bit for that vcpu). + } } ret = false; -- 2.4.4 -Christoffer -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 10/15] KVM: arm64: add data structures to model ITS interrupt translation
Hi Andre, On 08/25/2015 01:15 PM, Andre Przywara wrote: Hi Eric, On 13/08/15 16:46, Eric Auger wrote: On 07/10/2015 04:21 PM, Andre Przywara wrote: The GICv3 Interrupt Translation Service (ITS) uses tables in memory to allow a sophisticated interrupt routing. It features device tables, an interrupt table per device and a table connecting collections to actual CPUs (aka. redistributors in the GICv3 lingo). Since the interrupt numbers for the LPIs are allocated quite sparsely and the range can be quite huge (8192 LPIs being the minimum), using bitmaps or arrays for storing information is a waste of memory. We use linked lists instead, which we iterate linearily. This works very well with the actual number of LPIs/MSIs in the guest being quite low. Should the number of LPIs exceed the number where iterating through lists seems acceptable, we can later revisit this and use more efficient data structures. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- include/kvm/arm_vgic.h | 3 +++ virt/kvm/arm/its-emul.c | 48 2 files changed, 51 insertions(+) diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index b432055..1648668 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -25,6 +25,7 @@ #include linux/spinlock.h #include linux/types.h #include kvm/iodev.h +#include linux/list.h #define VGIC_NR_IRQS_LEGACY256 #define VGIC_NR_SGIS 16 @@ -162,6 +163,8 @@ struct vgic_its { u64 cbaser; int creadr; int cwriter; + struct list_headdevice_list; + struct list_headcollection_list; }; struct vgic_dist { diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c index b498f06..7f217fa 100644 --- a/virt/kvm/arm/its-emul.c +++ b/virt/kvm/arm/its-emul.c @@ -21,6 +21,7 @@ #include linux/kvm.h #include linux/kvm_host.h #include linux/interrupt.h +#include linux/list.h #include linux/irqchip/arm-gic-v3.h #include kvm/arm_vgic.h @@ -32,6 +33,25 @@ #include vgic.h #include its-emul.h +struct its_device { + struct list_head dev_list; + struct list_head itt; + u32 device_id; +}; + +struct its_collection { + struct list_head coll_list; + u32 collection_id; + u32 target_addr; +}; + +struct its_itte { + struct list_head itte_list; + struct its_collection *collection; + u32 lpi; + u32 event_id; +}; + #define BASER_BASE_ADDRESS(x) ((x) 0xf000ULL) /* The distributor lock is held by the VGIC MMIO handler. */ @@ -311,6 +331,9 @@ int vits_init(struct kvm *kvm) spin_lock_init(its-lock); + INIT_LIST_HEAD(its-device_list); + INIT_LIST_HEAD(its-collection_list); + its-enabled = false; return -ENXIO; @@ -320,11 +343,36 @@ void vits_destroy(struct kvm *kvm) { struct vgic_dist *dist = kvm-arch.vgic; struct vgic_its *its = dist-its; + struct its_device *dev; + struct its_itte *itte; + struct list_head *dev_cur, *dev_temp; + struct list_head *cur, *temp; if (!vgic_has_its(kvm)) return; + if (!its-device_list.next) Why not using list_empty? But I think I would simply remove this since the empty case if handle below... list_empty() requires the list to be initialized before. This check here is to detect that map_resources was never called (this is only done on the first VCPU run) and thus device_list is basically still all zeroes. If we abort the guest without ever running a VCPU (for instance because some initialization failed), we call vits_destroy() anyway (because this is called when tearing down the VGIC device). So the check is here to detect early that vits_destroy() has been called without the ITS ever been fully initialized. This fixed a real bug when the guest start was aborted before the ITS was ever used. I will add a comment to make this clear. OK. My next question is why don't we call vits_init in vgic_init after dist-nr_cpus? I had in mind map_resources was linked to the setting of vgic_dist_base vgic_cpu_base. Here you do not depend on those addresses? Do I miss smthg? + return; + + spin_lock(its-lock); + list_for_each_safe(dev_cur, dev_temp, its-device_list) { + dev = container_of(dev_cur, struct its_device, dev_list); isn't the usage of list_for_each_entry_safe more synthetic here? Sorry my point was about the _ENTRY_ variant that should avoid to use container_of, isn't it? Eric If I got this correctly, we need the _safe variant if we want to remove the list item within the loop. Or am I missing something here? Cheers, Andre. + list_for_each_safe(cur, temp, dev-itt) { + itte = (container_of(cur, struct its_itte, itte_list)); same Eric + list_del(cur); + kfree(itte); + } +
[PATCH] arm64: KVM: Do not inject a 64bit fault for a 32bit guest
When injecting a fault into a 32bit guest, it seems rather idiotic to also inject a 64bit fault that is only going to corrupt the guest state, and lead to a situation where we restore an illegal context. Just fix the stupid bug that has been there from day 1. Cc: sta...@vger.kernel.org Reported-by: Russell King li...@arm.linux.org.uk Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- Will: Paolo being on holiday, do you mind merging this one via your tree? Thanks, M. arch/arm64/kvm/inject_fault.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c index f02530e..85c5715 100644 --- a/arch/arm64/kvm/inject_fault.c +++ b/arch/arm64/kvm/inject_fault.c @@ -168,8 +168,8 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr) { if (!(vcpu-arch.hcr_el2 HCR_RW)) inject_abt32(vcpu, false, addr); - - inject_abt64(vcpu, false, addr); + else + inject_abt64(vcpu, false, addr); } /** @@ -184,8 +184,8 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr) { if (!(vcpu-arch.hcr_el2 HCR_RW)) inject_abt32(vcpu, true, addr); - - inject_abt64(vcpu, true, addr); + else + inject_abt64(vcpu, true, addr); } /** @@ -198,6 +198,6 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu) { if (!(vcpu-arch.hcr_el2 HCR_RW)) inject_undef32(vcpu); - - inject_undef64(vcpu); + else + inject_undef64(vcpu); } -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: A question about vring operation
On Thu, Aug 27, 2015 at 01:18:47AM +, Wang, Wei W wrote: Hi all, I have a question about the vring_avail: It only includes an idx (equivalent to the ring tail), which is used by the frontend (virtio_net) to fill bufs. The backend (e.g. vhost_net) maintains the ring head (last_avail_idx) by itself. The frontend checks if the ring is full or empty via a counter (vq-num_free). My question is why can’t we include the ring head in the vring_avail struct, so that the vq-num_free is not needed, and the backend can directly use it without maintaining its own copy? Thanks, Wei I'm not sure I understand your proposal, and what it would accomplish. Write a patch, that'll make it easier to discuss. Also copy all relevant mailing lists, not just kvm. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: add multiple times opening support to a virtserialport
On 07/24/2015 08:00 AM, Matt Ma wrote: Hi all, Linaro has developed the foundation for the new Android Emulator code base based on a fairly recent upstream QEMU code base, when we re-based the code, we updated the device model to be more virtio based (for example the drives are now virtio block devices). The aim of this is to minimise the delta between upstream and the Android specific changes to QEMU. One Android emulator specific feature is the AndroidPipe. AndroidPipe is a communication channel between the guest system and the emulator itself. Guest side device node can be opened by multi processes at the same time with different service name. It has a de-multiplexer on the QEMU side to figure out which service the guest actually wanted, so the first write after opening device node is the service name guest wanted, after QEMU backend receive this service name, create a corresponding communication channel, initialize related component, such as file descriptor which connect to the host socket serve. So each opening in guest will create a separated communication channel. We can create a separate device for each service type, however some services, such as the OpenGL emulation, need to have multiple open channels at a time. This is currently not possible using the virtserialport which can only be opened once. Current virtserialport can not be opened by multiple processes at the same time. I know virtserialport has provided buffers beforehand to cache data from host to guest, so even there is no guest read, data can still be transported from host to guest kernel, when there is guest read request, just copy cached data to user space. We are not sure clearly whether virtio can support multi-open-per-device semantics or not, followings are just our initial ideas about adding multi-open-per-device feature to a port: * when there is a open request on a port, kernel will allocate a portclient with new id and __wait_queue_head to track this request * save this portclient in file-private_data * guest kernel pass this portclient info to QEMU and notify that the port has been opened * QEMU backend will create a clientinfo struct to track this communication channel, initialize related component * we may change the kernel side strategy of allocating receiving buffers in advance to a new strategy, that is when there is a read request: - allocate a port_buffer, put user space buffer address to port_buffer.buf, share memory to avoid memcpy - put both portclient id(or portclient addrss) and port_buffer.buf to virtqueue, that is the length of buffers chain is 2 - kick to notify QEMU backend to consume read buffer - QEMU backend read portclient info firstly to find the correct clientinfo, then read host data directly into virtqueue buffer to avoid memcpy - guest kernel will wait(similarly in block mode, because the user space address has been put into virtqueue) until QEMU backend has consumed buffer(all data/part data/nothing have been sent to host side) - if nothing has been read from host and file descriptor is in block mode, read request will wait through __wait_queue_head until host side is readable * above read logic may change the current behavior of transferring data to guest kernel even without guest user read * when there is a write request: - allocate a port_buffer, put user space buffer address to port_buffer.buf, share memory to avoid memcpy - put both portclient id(or portclient addrss) and port_buffer.buf to virtqueue, the length of buffers chain is 2 - kick to notify QEMU backend to consume write buffer - QEMU backend read portclient info firstly to find the correct clientinfo, then write the virtqueue buffer content to host side as current logic - guest kernel will wait(similarly in block mode, because the user space address has been put into virtqueue) until QEMU backend has consumed buffer(all data/part data/nothing have been receive from host side) - if nothing has been sent out and file descriptor is in block mode, write request will wait through __wait_queue_head until host side is writable We obviously don't want to regress existing virtio behaviour and performance and welcome the communities expertise to point out anything we may have missed out before we get to far down implementing our initial proof-of-concept. Would virtio-vsock be interesting for your purposes? http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf (Video doesn't seem to be up yet, but should probably be available eventually at the following link) https://www.youtube.com/playlist?list=PLW3ep1uCIRfyLNSu708gWG7uvqlolk0ep Regards, Christopher Covington -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line
[GIT PULL] Late arm64 KVM fix for 4.2
Hi Linus, I appreciate that it's extremely late in the cycle, but we've uncovered a nasty bug in the arm64 KVM code which allows a badly behaved 32-bit guest to bring down the host. The fix is simple (it's what I believe we call a brown paper bag bug) and I don't think it makes sense to sit on this, particularly as Russell ended up triggering this rather than just somebody noticing a potential problem by inspection. Usually arm64 KVM changes would go via Paolo's tree, but he's on holiday at the moment and the deal is that anything urgent gets shuffled via the arch trees, so here it is. Please pull. Cheers, Will ---8 The following changes since commit c13dcf9f2d6f5f06ef1bf79ec456df614c5e058b: Linux 4.2-rc8 (2015-08-23 20:52:59 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-fixes for you to fetch changes up to 126c69a0bd0e441bf6766a5d9bf20de011be9f68: arm64: KVM: Fix host crash when injecting a fault into a 32bit guest (2015-08-27 16:16:55 +0100) Urgent arm64 KVM fix for 4.2: Fix arm64 KVM issue when injecting an abort into a 32-bit guest, which would lead to an illegal exception return at EL2 and a subsequent host crash. Marc Zyngier (1): arm64: KVM: Fix host crash when injecting a fault into a 32bit guest arch/arm64/kvm/inject_fault.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] arm64: KVM: Do not inject a 64bit fault for a 32bit guest
When injecting a fault into a misbehaving 32bit guest, it seems rather idiotic to also inject a 64bit fault that is only going to corrupt the guest state. This leads to a situation where we perform an illegal exception return at EL2 causing the host to crash instead of killing the guest. Just fix the stupid bug that has been there from day 1. Cc: sta...@vger.kernel.org Reported-by: Russell King rmk+ker...@arm.linux.org.uk Tested-by: Russell King rmk+ker...@arm.linux.org.uk Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- arch/arm64/kvm/inject_fault.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c index f02530e..85c5715 100644 --- a/arch/arm64/kvm/inject_fault.c +++ b/arch/arm64/kvm/inject_fault.c @@ -168,8 +168,8 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr) { if (!(vcpu-arch.hcr_el2 HCR_RW)) inject_abt32(vcpu, false, addr); - - inject_abt64(vcpu, false, addr); + else + inject_abt64(vcpu, false, addr); } /** @@ -184,8 +184,8 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr) { if (!(vcpu-arch.hcr_el2 HCR_RW)) inject_abt32(vcpu, true, addr); - - inject_abt64(vcpu, true, addr); + else + inject_abt64(vcpu, true, addr); } /** @@ -198,6 +198,6 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu) { if (!(vcpu-arch.hcr_el2 HCR_RW)) inject_undef32(vcpu); - - inject_undef64(vcpu); + else + inject_undef64(vcpu); } -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/3] KVM: dynamic halt_poll_ns adjustment
On Thu, Aug 27, 2015 at 2:59 AM, Wanpeng Li wanpeng...@hotmail.com wrote: Hi David, On 8/26/15 1:19 AM, David Matlack wrote: Thanks for writing v2, Wanpeng. On Mon, Aug 24, 2015 at 11:35 PM, Wanpeng Li wanpeng...@hotmail.com wrote: There is a downside of halt_poll_ns since poll is still happen for idle VCPU which can waste cpu usage. This patch adds the ability to adjust halt_poll_ns dynamically. What testing have you done with these patches? Do you know if this removes the overhead of polling in idle VCPUs? Do we lose any of the performance from always polling? There are two new kernel parameters for changing the halt_poll_ns: halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter, halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally rounded down to a closest multiple of halt_poll_ns_grow. The shrink/grow matrix is suggested by David: if (poll successfully for interrupt): stay the same else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow The way you implemented this wasn't what I expected. I thought you would time the whole function (kvm_vcpu_block). But I like your approach better. It's simpler and [by inspection] does what we want. I see there is more idle vCPUs overhead w/ this method even more than always halt-poll, so I bring back grow vcpu-halt_poll_ns when interrupt arrives and shrinks when idle VCPU is detected. The perfomance looks good in v4. Why did this patch have a worse idle overhead than always poll? Regards, Wanpeng Li halt_poll_ns_shrink/ | halt_poll_ns_grow| grow halt_poll_ns| shrink halt_poll_ns -+--+--- 1 | = halt_poll_ns | = 0 halt_poll_ns | *= halt_poll_ns_grow | /= halt_poll_ns_shrink otherwise| += halt_poll_ns_grow | -= halt_poll_ns_shrink I was curious why you went with this approach rather than just the middle row, or just the last row. Do you think we'll want the extra flexibility? Signed-off-by: Wanpeng Li wanpeng...@hotmail.com --- virt/kvm/kvm_main.c | 65 - 1 file changed, 64 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 93db833..2a4962b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -66,9 +66,26 @@ MODULE_AUTHOR(Qumranet); MODULE_LICENSE(GPL); -static unsigned int halt_poll_ns; +#define KVM_HALT_POLL_NS 50 +#define KVM_HALT_POLL_NS_GROW 2 +#define KVM_HALT_POLL_NS_SHRINK 0 +#define KVM_HALT_POLL_NS_MAX 200 The macros are not necessary. Also, hard coding the numbers in the param definitions will make reading the comments above them easier. + +static unsigned int halt_poll_ns = KVM_HALT_POLL_NS; module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR); +/* Default doubles per-vcpu halt_poll_ns. */ +static unsigned int halt_poll_ns_grow = KVM_HALT_POLL_NS_GROW; +module_param(halt_poll_ns_grow, int, S_IRUGO); + +/* Default resets per-vcpu halt_poll_ns . */ +static unsigned int halt_poll_ns_shrink = KVM_HALT_POLL_NS_SHRINK; +module_param(halt_poll_ns_shrink, int, S_IRUGO); + +/* halt polling only reduces halt latency by 10-15 us, 2ms is enough */ Ah, I misspoke before. I was thinking about round-trip latency. The latency of a single halt is reduced by about 5-7 us. +static unsigned int halt_poll_ns_max = KVM_HALT_POLL_NS_MAX; +module_param(halt_poll_ns_max, int, S_IRUGO); We can remove halt_poll_ns_max. vcpu-halt_poll_ns can always start at zero and grow from there. Then we just need one module param to keep vcpu-halt_poll_ns from growing too large. [ It would make more sense to remove halt_poll_ns and keep halt_poll_ns_max, but since halt_poll_ns already exists in upstream kernels, we probably can't remove it. ] + /* * Ordering of locks: * @@ -1907,6 +1924,48 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn) } EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty); +static unsigned int __grow_halt_poll_ns(unsigned int val) +{ + if (halt_poll_ns_grow 1) + return halt_poll_ns; + + val = min(val, halt_poll_ns_max); + + if (val == 0) + return halt_poll_ns; + + if (halt_poll_ns_grow halt_poll_ns) + val *= halt_poll_ns_grow; + else + val += halt_poll_ns_grow; + + return val; +} + +static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum) minimum never gets used. +{ + if (modifier 1) + return 0; + + if (modifier halt_poll_ns) + val /= modifier; + else + val -= modifier; + + return val; +} + +static void grow_halt_poll_ns(struct kvm_vcpu *vcpu) These
Re: add multiple times opening support to a virtserialport
On Thu, Aug 27, 2015 at 10:23:38AM -0400, Christopher Covington wrote: On 07/24/2015 08:00 AM, Matt Ma wrote: Hi all, Linaro has developed the foundation for the new Android Emulator code base based on a fairly recent upstream QEMU code base, when we re-based the code, we updated the device model to be more virtio based (for example the drives are now virtio block devices). The aim of this is to minimise the delta between upstream and the Android specific changes to QEMU. One Android emulator specific feature is the AndroidPipe. AndroidPipe is a communication channel between the guest system and the emulator itself. Guest side device node can be opened by multi processes at the same time with different service name. It has a de-multiplexer on the QEMU side to figure out which service the guest actually wanted, so the first write after opening device node is the service name guest wanted, after QEMU backend receive this service name, create a corresponding communication channel, initialize related component, such as file descriptor which connect to the host socket serve. So each opening in guest will create a separated communication channel. We can create a separate device for each service type, however some services, such as the OpenGL emulation, need to have multiple open channels at a time. This is currently not possible using the virtserialport which can only be opened once. Current virtserialport can not be opened by multiple processes at the same time. I know virtserialport has provided buffers beforehand to cache data from host to guest, so even there is no guest read, data can still be transported from host to guest kernel, when there is guest read request, just copy cached data to user space. We are not sure clearly whether virtio can support multi-open-per-device semantics or not, followings are just our initial ideas about adding multi-open-per-device feature to a port: * when there is a open request on a port, kernel will allocate a portclient with new id and __wait_queue_head to track this request * save this portclient in file-private_data * guest kernel pass this portclient info to QEMU and notify that the port has been opened * QEMU backend will create a clientinfo struct to track this communication channel, initialize related component * we may change the kernel side strategy of allocating receiving buffers in advance to a new strategy, that is when there is a read request: - allocate a port_buffer, put user space buffer address to port_buffer.buf, share memory to avoid memcpy - put both portclient id(or portclient addrss) and port_buffer.buf to virtqueue, that is the length of buffers chain is 2 - kick to notify QEMU backend to consume read buffer - QEMU backend read portclient info firstly to find the correct clientinfo, then read host data directly into virtqueue buffer to avoid memcpy - guest kernel will wait(similarly in block mode, because the user space address has been put into virtqueue) until QEMU backend has consumed buffer(all data/part data/nothing have been sent to host side) - if nothing has been read from host and file descriptor is in block mode, read request will wait through __wait_queue_head until host side is readable * above read logic may change the current behavior of transferring data to guest kernel even without guest user read * when there is a write request: - allocate a port_buffer, put user space buffer address to port_buffer.buf, share memory to avoid memcpy - put both portclient id(or portclient addrss) and port_buffer.buf to virtqueue, the length of buffers chain is 2 - kick to notify QEMU backend to consume write buffer - QEMU backend read portclient info firstly to find the correct clientinfo, then write the virtqueue buffer content to host side as current logic - guest kernel will wait(similarly in block mode, because the user space address has been put into virtqueue) until QEMU backend has consumed buffer(all data/part data/nothing have been receive from host side) - if nothing has been sent out and file descriptor is in block mode, write request will wait through __wait_queue_head until host side is writable We obviously don't want to regress existing virtio behaviour and performance and welcome the communities expertise to point out anything we may have missed out before we get to far down implementing our initial proof-of-concept. Hi Chris, Would virtio-vsock be interesting for your purposes? http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf (Video doesn't seem to be up yet, but should probably be available eventually at the following link) https://www.youtube.com/playlist?list=PLW3ep1uCIRfyLNSu708gWG7uvqlolk0ep Thanks for looking at this lengthy mail. Yes, we are
Re: [PATCH v2 2/3] KVM: dynamic halt_poll_ns adjustment
On 8/28/15 12:25 AM, David Matlack wrote: On Thu, Aug 27, 2015 at 2:59 AM, Wanpeng Li wanpeng...@hotmail.com wrote: Hi David, On 8/26/15 1:19 AM, David Matlack wrote: Thanks for writing v2, Wanpeng. On Mon, Aug 24, 2015 at 11:35 PM, Wanpeng Li wanpeng...@hotmail.com wrote: There is a downside of halt_poll_ns since poll is still happen for idle VCPU which can waste cpu usage. This patch adds the ability to adjust halt_poll_ns dynamically. What testing have you done with these patches? Do you know if this removes the overhead of polling in idle VCPUs? Do we lose any of the performance from always polling? There are two new kernel parameters for changing the halt_poll_ns: halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter, halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally rounded down to a closest multiple of halt_poll_ns_grow. The shrink/grow matrix is suggested by David: if (poll successfully for interrupt): stay the same else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow The way you implemented this wasn't what I expected. I thought you would time the whole function (kvm_vcpu_block). But I like your approach better. It's simpler and [by inspection] does what we want. I see there is more idle vCPUs overhead w/ this method even more than always halt-poll, so I bring back grow vcpu-halt_poll_ns when interrupt arrives and shrinks when idle VCPU is detected. The perfomance looks good in v4. Why did this patch have a worse idle overhead than always poll? I’m not sure. I make a mistake when I report the kernelbuild test, the perfomance is also worse than always poll w/ your method. I think your method didn't grow halt_poll_ns according to if interrupt arrives. Regards, Wanpeng Li Regards, Wanpeng Li halt_poll_ns_shrink/ | halt_poll_ns_grow| grow halt_poll_ns| shrink halt_poll_ns -+--+--- 1 | = halt_poll_ns | = 0 halt_poll_ns | *= halt_poll_ns_grow | /= halt_poll_ns_shrink otherwise| += halt_poll_ns_grow | -= halt_poll_ns_shrink I was curious why you went with this approach rather than just the middle row, or just the last row. Do you think we'll want the extra flexibility? Signed-off-by: Wanpeng Li wanpeng...@hotmail.com --- virt/kvm/kvm_main.c | 65 - 1 file changed, 64 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 93db833..2a4962b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -66,9 +66,26 @@ MODULE_AUTHOR(Qumranet); MODULE_LICENSE(GPL); -static unsigned int halt_poll_ns; +#define KVM_HALT_POLL_NS 50 +#define KVM_HALT_POLL_NS_GROW 2 +#define KVM_HALT_POLL_NS_SHRINK 0 +#define KVM_HALT_POLL_NS_MAX 200 The macros are not necessary. Also, hard coding the numbers in the param definitions will make reading the comments above them easier. + +static unsigned int halt_poll_ns = KVM_HALT_POLL_NS; module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR); +/* Default doubles per-vcpu halt_poll_ns. */ +static unsigned int halt_poll_ns_grow = KVM_HALT_POLL_NS_GROW; +module_param(halt_poll_ns_grow, int, S_IRUGO); + +/* Default resets per-vcpu halt_poll_ns . */ +static unsigned int halt_poll_ns_shrink = KVM_HALT_POLL_NS_SHRINK; +module_param(halt_poll_ns_shrink, int, S_IRUGO); + +/* halt polling only reduces halt latency by 10-15 us, 2ms is enough */ Ah, I misspoke before. I was thinking about round-trip latency. The latency of a single halt is reduced by about 5-7 us. +static unsigned int halt_poll_ns_max = KVM_HALT_POLL_NS_MAX; +module_param(halt_poll_ns_max, int, S_IRUGO); We can remove halt_poll_ns_max. vcpu-halt_poll_ns can always start at zero and grow from there. Then we just need one module param to keep vcpu-halt_poll_ns from growing too large. [ It would make more sense to remove halt_poll_ns and keep halt_poll_ns_max, but since halt_poll_ns already exists in upstream kernels, we probably can't remove it. ] + /* * Ordering of locks: * @@ -1907,6 +1924,48 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn) } EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty); +static unsigned int __grow_halt_poll_ns(unsigned int val) +{ + if (halt_poll_ns_grow 1) + return halt_poll_ns; + + val = min(val, halt_poll_ns_max); + + if (val == 0) + return halt_poll_ns; + + if (halt_poll_ns_grow halt_poll_ns) + val *= halt_poll_ns_grow; + else + val += halt_poll_ns_grow; + + return val; +} + +static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum) minimum never gets used. +{ + if (modifier 1) + return 0; + + if (modifier
Re: [Qemu-devel] add multiple times opening support to a virtserialport
Hello Christoffer, On Fri, Aug 28, 2015 at 2:30 AM, Christoffer Dall christoffer.d...@linaro.org wrote: On Thu, Aug 27, 2015 at 10:23:38AM -0400, Christopher Covington wrote: On 07/24/2015 08:00 AM, Matt Ma wrote: Hi all, Linaro has developed the foundation for the new Android Emulator code base based on a fairly recent upstream QEMU code base, when we re-based the code, we updated the device model to be more virtio based (for example the drives are now virtio block devices). The aim of this is to minimise the delta between upstream and the Android specific changes to QEMU. One Android emulator specific feature is the AndroidPipe. AndroidPipe is a communication channel between the guest system and the emulator itself. Guest side device node can be opened by multi processes at the same time with different service name. It has a de-multiplexer on the QEMU side to figure out which service the guest actually wanted, so the first write after opening device node is the service name guest wanted, after QEMU backend receive this service name, create a corresponding communication channel, initialize related component, such as file descriptor which connect to the host socket serve. So each opening in guest will create a separated communication channel. We can create a separate device for each service type, however some services, such as the OpenGL emulation, need to have multiple open channels at a time. This is currently not possible using the virtserialport which can only be opened once. Current virtserialport can not be opened by multiple processes at the same time. I know virtserialport has provided buffers beforehand to cache data from host to guest, so even there is no guest read, data can still be transported from host to guest kernel, when there is guest read request, just copy cached data to user space. We are not sure clearly whether virtio can support multi-open-per-device semantics or not, followings are just our initial ideas about adding multi-open-per-device feature to a port: * when there is a open request on a port, kernel will allocate a portclient with new id and __wait_queue_head to track this request * save this portclient in file-private_data * guest kernel pass this portclient info to QEMU and notify that the port has been opened * QEMU backend will create a clientinfo struct to track this communication channel, initialize related component * we may change the kernel side strategy of allocating receiving buffers in advance to a new strategy, that is when there is a read request: - allocate a port_buffer, put user space buffer address to port_buffer.buf, share memory to avoid memcpy - put both portclient id(or portclient addrss) and port_buffer.buf to virtqueue, that is the length of buffers chain is 2 - kick to notify QEMU backend to consume read buffer - QEMU backend read portclient info firstly to find the correct clientinfo, then read host data directly into virtqueue buffer to avoid memcpy - guest kernel will wait(similarly in block mode, because the user space address has been put into virtqueue) until QEMU backend has consumed buffer(all data/part data/nothing have been sent to host side) - if nothing has been read from host and file descriptor is in block mode, read request will wait through __wait_queue_head until host side is readable * above read logic may change the current behavior of transferring data to guest kernel even without guest user read * when there is a write request: - allocate a port_buffer, put user space buffer address to port_buffer.buf, share memory to avoid memcpy - put both portclient id(or portclient addrss) and port_buffer.buf to virtqueue, the length of buffers chain is 2 - kick to notify QEMU backend to consume write buffer - QEMU backend read portclient info firstly to find the correct clientinfo, then write the virtqueue buffer content to host side as current logic - guest kernel will wait(similarly in block mode, because the user space address has been put into virtqueue) until QEMU backend has consumed buffer(all data/part data/nothing have been receive from host side) - if nothing has been sent out and file descriptor is in block mode, write request will wait through __wait_queue_head until host side is writable We obviously don't want to regress existing virtio behaviour and performance and welcome the communities expertise to point out anything we may have missed out before we get to far down implementing our initial proof-of-concept. Hi Chris, Would virtio-vsock be interesting for your purposes? http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf (Video doesn't seem to be up yet, but should probably be available eventually at the following link)
RE: A question about vring operation
On 24/07/2015 21:36, Michael S. Tsirkin wrote: On Thu, Aug 27, 2015 at 01:18:47AM +, Wang, Wei W wrote: Hi all, I have a question about the vring_avail: It only includes an idx (equivalent to the ring tail), which is used by the frontend (virtio_net) to fill bufs. The backend (e.g. vhost_net) maintains the ring head (last_avail_idx) by itself. The frontend checks if the ring is full or empty via a counter (vq-num_free). My question is why can’t we include the ring head in the vring_avail struct, so that the vq-num_free is not needed, and the backend can directly use it without maintaining its own copy? Thanks, Wei I'm not sure I understand your proposal, and what it would accomplish. Write a patch, that'll make it easier to discuss. Also copy all relevant mailing lists, not just kvm. Thanks Michael. I haven’t got the patch ready yet. I am just wondering if it is better to have last_avail_idx in vhost_virtqueue moved into vring_avail. In other regular ring operations, we usually use *head and *tail to judge if the ring is full or empty, but it seems vring_avail does not use this method in virtio_net.c (it uses a counter). Best, Wei
[PATCH v4 0/3] KVM: Dynamic Halt-Polling
v3 - v4: * bring back grow vcpu-halt_poll_ns when interrupt arrives and shrinks when idle VCPU is detected v2 - v3: * grow/shrink vcpu-halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink * drop the macros and hard coding the numbers in the param definitions * update the comments 5-7 us * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time, vcpu-halt_poll_ns start at zero * drop the wrappers * move the grow/shrink logic before out: w/ if (waited) v1 - v2: * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of the module parameter * use the shrink/grow matrix which is suggested by David * set halt_poll_ns_max to 2ms There is a downside of halt_poll_ns since poll is still happen for idle VCPU which can waste cpu usage. This patchset add the ability to adjust halt_poll_ns dynamically, grows halt_poll_ns if an interrupt arrives and shrinks halt_poll_ns when idle VCPU is detected. There are two new kernel parameters for changing the halt_poll_ns: halt_poll_ns_grow and halt_poll_ns_shrink. Test w/ high cpu overcommit ratio, pin vCPUs, and the halt_poll_ns of halt-poll is the default 50ns, the max halt_poll_ns of dynamic halt-poll is 2ms. Then watch the %C0 in the dump of Powertop tool. The test method is almost from David. +-++---+ | || | | w/o halt-poll | w/ halt-poll | dynamic halt-poll | +-++---+ | || | |~0.9%|~1.8% | ~1.2% | +-++---+ The always halt-poll will increase ~0.9% cpu usage for idle vCPUs and the dynamic halt-poll drop it to ~0.3% which means that reduce the 67% overhead introduced by always halt-poll. Wanpeng Li (3): KVM: make halt_poll_ns per-VCPU KVM: dynamic halt_poll_ns adjustment KVM: trace kvm_halt_poll_ns grow/shrink include/linux/kvm_host.h | 1 + include/trace/events/kvm.h | 30 virt/kvm/kvm_main.c| 50 +++--- 3 files changed, 78 insertions(+), 3 deletions(-) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 3/3] KVM: trace kvm_halt_poll_ns grow/shrink
Tracepoint for dynamic halt_pool_ns, fired on every potential change. Signed-off-by: Wanpeng Li wanpeng...@hotmail.com --- include/trace/events/kvm.h | 30 ++ virt/kvm/kvm_main.c| 8 ++-- 2 files changed, 36 insertions(+), 2 deletions(-) diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h index a44062d..75ddf80 100644 --- a/include/trace/events/kvm.h +++ b/include/trace/events/kvm.h @@ -356,6 +356,36 @@ TRACE_EVENT( __entry-address) ); +TRACE_EVENT(kvm_halt_poll_ns, + TP_PROTO(bool grow, unsigned int vcpu_id, int new, int old), + TP_ARGS(grow, vcpu_id, new, old), + + TP_STRUCT__entry( + __field(bool, grow) + __field(unsigned int, vcpu_id) + __field(int, new) + __field(int, old) + ), + + TP_fast_assign( + __entry-grow = grow; + __entry-vcpu_id= vcpu_id; + __entry-new= new; + __entry-old= old; + ), + + TP_printk(vcpu %u: halt_pool_ns %d (%s %d), + __entry-vcpu_id, + __entry-new, + __entry-grow ? grow : shrink, + __entry-old) +); + +#define trace_kvm_halt_poll_ns_grow(vcpu_id, new, old) \ + trace_kvm_halt_poll_ns(true, vcpu_id, new, old) +#define trace_kvm_halt_poll_ns_shrink(vcpu_id, new, old) \ + trace_kvm_halt_poll_ns(false, vcpu_id, new, old) + #endif #endif /* _TRACE_KVM_MAIN_H */ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d63790d..a3aa23f0 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1918,8 +1918,9 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty); static void grow_halt_poll_ns(struct kvm_vcpu *vcpu) { - int val = vcpu-halt_poll_ns; + int old, val; + old = val = vcpu-halt_poll_ns; /* 500us step */ if (val == 0 halt_poll_ns_grow) val = 50; @@ -1927,18 +1928,21 @@ static void grow_halt_poll_ns(struct kvm_vcpu *vcpu) val *= halt_poll_ns_grow; vcpu-halt_poll_ns = val; + trace_kvm_halt_poll_ns_grow(vcpu-vcpu_id, val, old); } static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu) { - int val = vcpu-halt_poll_ns; + int old, val; + old = val = vcpu-halt_poll_ns; if (halt_poll_ns_shrink == 0) val = 0; else val /= halt_poll_ns_shrink; vcpu-halt_poll_ns = val; + trace_kvm_halt_poll_ns_shrink(vcpu-vcpu_id, val, old); } static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 1/3] KVM: make halt_poll_ns per-VCPU
Change halt_poll_ns into per-VCPU variable, seeded from module parameter, to allow greater flexibility. Signed-off-by: Wanpeng Li wanpeng...@hotmail.com --- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 5 +++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 81089cf..1bef9e2 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -242,6 +242,7 @@ struct kvm_vcpu { int sigset_active; sigset_t sigset; struct kvm_vcpu_stat stat; + unsigned int halt_poll_ns; #ifdef CONFIG_HAS_IOMEM int mmio_needed; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d8db2f8f..c06e57c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -217,6 +217,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) vcpu-kvm = kvm; vcpu-vcpu_id = id; vcpu-pid = NULL; + vcpu-halt_poll_ns = 0; init_waitqueue_head(vcpu-wq); kvm_async_pf_vcpu_init(vcpu); @@ -1930,8 +1931,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) bool waited = false; start = cur = ktime_get(); - if (halt_poll_ns) { - ktime_t stop = ktime_add_ns(ktime_get(), halt_poll_ns); + if (vcpu-halt_poll_ns) { + ktime_t stop = ktime_add_ns(ktime_get(), vcpu-halt_poll_ns); do { /* -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 0/3] KVM: Dynamic Halt-Polling
v3 - v4: * bring back grow vcpu-halt_poll_ns when interrupt arrives and shrinks when idle VCPU is detected v2 - v3: * grow/shrink vcpu-halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink * drop the macros and hard coding the numbers in the param definitions * update the comments 5-7 us * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time, vcpu-halt_poll_ns start at zero * drop the wrappers * move the grow/shrink logic before out: w/ if (waited) v1 - v2: * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of the module parameter * use the shrink/grow matrix which is suggested by David * set halt_poll_ns_max to 2ms There is a downside of halt_poll_ns since poll is still happen for idle VCPU which can waste cpu usage. This patchset add the ability to adjust halt_poll_ns dynamically, grows halt_poll_ns if an interrupt arrives and shrinks halt_poll_ns when idle VCPU is detected. There are two new kernel parameters for changing the halt_poll_ns: halt_poll_ns_grow and halt_poll_ns_shrink. Test w/ high cpu overcommit ratio, pin vCPUs, and the halt_poll_ns of halt-poll is the default 50ns, the max halt_poll_ns of dynamic halt-poll is 2ms. Then watch the %C0 in the dump of Powertop tool. The test method is almost from David. +-++---+ | || | | w/o halt-poll | w/ halt-poll | dynamic halt-poll | +-++---+ | || | |~0.9%|~1.8% | ~1.2% | +-++---+ The always halt-poll will increase ~0.9% cpu usage for idle vCPUs and the dynamic halt-poll drop it to ~0.3% which means that reduce the 67% overhead introduced by always halt-poll. Wanpeng Li (3): KVM: make halt_poll_ns per-VCPU KVM: dynamic halt_poll_ns adjustment KVM: trace kvm_halt_poll_ns grow/shrink include/linux/kvm_host.h | 1 + include/trace/events/kvm.h | 30 virt/kvm/kvm_main.c| 50 +++--- 3 files changed, 78 insertions(+), 3 deletions(-) -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 2/3] KVM: dynamic halt_poll_ns adjustment
There is a downside of halt_poll_ns since poll is still happen for idle VCPU which can waste cpu usage. This patchset add the ability to adjust halt_poll_ns dynamically, grows halt_poll_ns if an interrupt arrives and shrinks halt_poll_ns when idle VCPU is detected. There are two new kernel parameters for changing the halt_poll_ns: halt_poll_ns_grow and halt_poll_ns_shrink. Test w/ high cpu overcommit ratio, pin vCPUs, and the halt_poll_ns of halt-poll is the default 50ns, the max halt_poll_ns of dynamic halt-poll is 2ms. Then watch the %C0 in the dump of Powertop tool. The test method is almost from David. +-++---+ | || | | w/o halt-poll | w/ halt-poll | dynamic halt-poll | +-++---+ | || | |~0.9%|~1.8% | ~1.2% | +-++---+ The always halt-poll will increase ~0.9% cpu usage for idle vCPUs and the dynamic halt-poll drop it to ~0.3% which means that reduce the 67% overhead introduced by always halt-poll. Signed-off-by: Wanpeng Li wanpeng...@hotmail.com --- virt/kvm/kvm_main.c | 41 - 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index c06e57c..d63790d 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -66,9 +66,18 @@ MODULE_AUTHOR(Qumranet); MODULE_LICENSE(GPL); -static unsigned int halt_poll_ns; +/* halt polling only reduces halt latency by 5-7 us, 2ms is enough */ +static unsigned int halt_poll_ns = 200; module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR); +/* Default doubles per-vcpu halt_poll_ns. */ +static unsigned int halt_poll_ns_grow = 2; +module_param(halt_poll_ns_grow, int, S_IRUGO); + +/* Default resets per-vcpu halt_poll_ns . */ +static unsigned int halt_poll_ns_shrink; +module_param(halt_poll_ns_shrink, int, S_IRUGO); + /* * Ordering of locks: * @@ -1907,6 +1916,31 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn) } EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty); +static void grow_halt_poll_ns(struct kvm_vcpu *vcpu) +{ + int val = vcpu-halt_poll_ns; + + /* 500us step */ + if (val == 0 halt_poll_ns_grow) + val = 50; + else + val *= halt_poll_ns_grow; + + vcpu-halt_poll_ns = val; +} + +static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu) +{ + int val = vcpu-halt_poll_ns; + + if (halt_poll_ns_shrink == 0) + val = 0; + else + val /= halt_poll_ns_shrink; + + vcpu-halt_poll_ns = val; +} + static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu) { if (kvm_arch_vcpu_runnable(vcpu)) { @@ -1961,6 +1995,11 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) cur = ktime_get(); out: + if (waited vcpu-halt_poll_ns 0) + shrink_halt_poll_ns(vcpu); + else if (vcpu-halt_poll_ns halt_poll_ns) + grow_halt_poll_ns(vcpu); + trace_kvm_vcpu_wakeup(ktime_to_ns(cur) - ktime_to_ns(start), waited); } EXPORT_SYMBOL_GPL(kvm_vcpu_block); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/3] KVM: dynamic halt_poll_ns adjustment
Hi David, On 8/26/15 1:19 AM, David Matlack wrote: Thanks for writing v2, Wanpeng. On Mon, Aug 24, 2015 at 11:35 PM, Wanpeng Li wanpeng...@hotmail.com wrote: There is a downside of halt_poll_ns since poll is still happen for idle VCPU which can waste cpu usage. This patch adds the ability to adjust halt_poll_ns dynamically. What testing have you done with these patches? Do you know if this removes the overhead of polling in idle VCPUs? Do we lose any of the performance from always polling? There are two new kernel parameters for changing the halt_poll_ns: halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter, halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally rounded down to a closest multiple of halt_poll_ns_grow. The shrink/grow matrix is suggested by David: if (poll successfully for interrupt): stay the same else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow The way you implemented this wasn't what I expected. I thought you would time the whole function (kvm_vcpu_block). But I like your approach better. It's simpler and [by inspection] does what we want. I see there is more idle vCPUs overhead w/ this method even more than always halt-poll, so I bring back grow vcpu-halt_poll_ns when interrupt arrives and shrinks when idle VCPU is detected. The perfomance looks good in v4. Regards, Wanpeng Li halt_poll_ns_shrink/ | halt_poll_ns_grow| grow halt_poll_ns| shrink halt_poll_ns -+--+--- 1 | = halt_poll_ns | = 0 halt_poll_ns | *= halt_poll_ns_grow | /= halt_poll_ns_shrink otherwise| += halt_poll_ns_grow | -= halt_poll_ns_shrink I was curious why you went with this approach rather than just the middle row, or just the last row. Do you think we'll want the extra flexibility? Signed-off-by: Wanpeng Li wanpeng...@hotmail.com --- virt/kvm/kvm_main.c | 65 - 1 file changed, 64 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 93db833..2a4962b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -66,9 +66,26 @@ MODULE_AUTHOR(Qumranet); MODULE_LICENSE(GPL); -static unsigned int halt_poll_ns; +#define KVM_HALT_POLL_NS 50 +#define KVM_HALT_POLL_NS_GROW 2 +#define KVM_HALT_POLL_NS_SHRINK 0 +#define KVM_HALT_POLL_NS_MAX 200 The macros are not necessary. Also, hard coding the numbers in the param definitions will make reading the comments above them easier. + +static unsigned int halt_poll_ns = KVM_HALT_POLL_NS; module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR); +/* Default doubles per-vcpu halt_poll_ns. */ +static unsigned int halt_poll_ns_grow = KVM_HALT_POLL_NS_GROW; +module_param(halt_poll_ns_grow, int, S_IRUGO); + +/* Default resets per-vcpu halt_poll_ns . */ +static unsigned int halt_poll_ns_shrink = KVM_HALT_POLL_NS_SHRINK; +module_param(halt_poll_ns_shrink, int, S_IRUGO); + +/* halt polling only reduces halt latency by 10-15 us, 2ms is enough */ Ah, I misspoke before. I was thinking about round-trip latency. The latency of a single halt is reduced by about 5-7 us. +static unsigned int halt_poll_ns_max = KVM_HALT_POLL_NS_MAX; +module_param(halt_poll_ns_max, int, S_IRUGO); We can remove halt_poll_ns_max. vcpu-halt_poll_ns can always start at zero and grow from there. Then we just need one module param to keep vcpu-halt_poll_ns from growing too large. [ It would make more sense to remove halt_poll_ns and keep halt_poll_ns_max, but since halt_poll_ns already exists in upstream kernels, we probably can't remove it. ] + /* * Ordering of locks: * @@ -1907,6 +1924,48 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn) } EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty); +static unsigned int __grow_halt_poll_ns(unsigned int val) +{ + if (halt_poll_ns_grow 1) + return halt_poll_ns; + + val = min(val, halt_poll_ns_max); + + if (val == 0) + return halt_poll_ns; + + if (halt_poll_ns_grow halt_poll_ns) + val *= halt_poll_ns_grow; + else + val += halt_poll_ns_grow; + + return val; +} + +static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum) minimum never gets used. +{ + if (modifier 1) + return 0; + + if (modifier halt_poll_ns) + val /= modifier; + else + val -= modifier; + + return val; +} + +static void grow_halt_poll_ns(struct kvm_vcpu *vcpu) These wrappers aren't necessary. +{ + vcpu-halt_poll_ns = __grow_halt_poll_ns(vcpu-halt_poll_ns); +} + +static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu) +{ + vcpu-halt_poll_ns = __shrink_halt_poll_ns(vcpu-halt_poll_ns, +
[PATCH] KVM: arm/arm64: BUG FIX: Do not inject spurious interrupts
Commit 71760950bf3dc796e5e53ea3300dec724a09f593 (arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn) introduced vgic_queue_irq_to_lr() function with additional vgic_dist_irq_is_pending() check before setting LR_STATE_PENDING bit. In some cases it started causing the following situation if the userland quickly drops the IRQ back to inactive state for some reason: 1. Userland injects an IRQ with level == 1, this ends up in vgic_update_irq_pending(), which in turn calls vgic_dist_irq_set_pending() for this IRQ. 2. vCPU gets kicked. But kernel does not manage to reschedule it quickly (!!!) 3. Userland quickly resets the IRQ to level == 0. vgic_update_irq_pending() in this case will call vgic_dist_irq_clear_pending() and reset the pending flag. 4. vCPU finally wakes up. It succesfully rolls through through __kvm_vgic_flush_hwstate(), which populates vGIC registers. However, since neither pending nor active flags are now set for this IRQ, vgic_queue_irq_to_lr() does not set any state bits on this LR at all. Since this is level-sensitive IRQ, we end up in LR containing only LR_EOI_INT bit, causing unnecessary immediate exit from the guest. This patch fixes the problem by adding forgotten vgic_cpu_irq_clear(). This causes the IRQ not to be included into any lists, if it has been picked up after getting dropped to inactive level. Since this is a level-sensitive IRQ, this is correct behavior. The bug was caught on ARM64 kernel v4.1.6, running qemu virt guest, where it was caused by emulated pl011. Signed-off-by: Pavel Fedin p.fe...@samsung.com --- virt/kvm/arm/vgic.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 34dad3c..bf155e3 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -,7 +,8 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq, kvm_debug(Set active, clear distributor: 0x%x\n, vlr.state); vgic_irq_clear_active(vcpu, irq); vgic_update_state(vcpu-kvm); - } else if (vgic_dist_irq_is_pending(vcpu, irq)) { + } else { + WARN_ON(!vgic_dist_irq_is_pending(vcpu, irq)); vlr.state |= LR_STATE_PENDING; kvm_debug(Set pending: 0x%x\n, vlr.state); } @@ -1567,8 +1568,10 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid, } else { if (level_triggered) { vgic_dist_irq_clear_level(vcpu, irq_num); - if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) + if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) { vgic_dist_irq_clear_pending(vcpu, irq_num); + vgic_cpu_irq_clear(vcpu, irq_num); + } } ret = false; -- 2.4.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] pci-test: allow bar 4k
vmexit using test assumes BAR size is exactly 4K. Remove this restriction, allow any BAR size. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- x86/vmexit.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/x86/vmexit.c b/x86/vmexit.c index 7e9af15..3ff7fbf 100644 --- a/x86/vmexit.c +++ b/x86/vmexit.c @@ -407,7 +407,7 @@ bool test_wanted(struct test *test, char *wanted[], int nwanted) int main(int ac, char **av) { int i; - unsigned long membar = 0, base, offset; + unsigned long membar = 0, base, offset, size; void *m; pcidevaddr_t pcidev; @@ -425,12 +425,18 @@ int main(int ac, char **av) continue; } if (pci_bar_is_memory(pcidev, i)) { + int p; + + size = pci_bar_size(pcidev, i); membar = pci_bar_addr(pcidev, i); - base = membar ~4095; + base = membar ~(size - 1); offset = membar - base; - m = alloc_vpages(1); - - install_page((void *)read_cr3(), base, m); + m = alloc_vpages(size / 4096); + + for (p = 0; p size; p += 4096) { + install_page((void *)read_cr3(), base + p, + m + p); + } pci_test.memaddr = m + offset; } else { pci_test.iobar = pci_bar_addr(pcidev, i); -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Correct type based on checkpatch.pl
Signed-off-by: Minjune Kim infinite.minjune@gmail.com --- arch/arm/kvm/interrupts.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S index 900ef6d..30bded6 100644 --- a/arch/arm/kvm/interrupts.S +++ b/arch/arm/kvm/interrupts.S @@ -353,7 +353,7 @@ hyp_dabt: .align hyp_hvc: /* -* Getting here is either becuase of a trap from a guest or from calling +* Getting here is either because of a trap from a guest or from calling * HVC from the host kernel, which means switch to Hyp mode. */ push{r0, r1, r2} -- 2.5.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] pci: add bar sizing
Will be used for pci-testdev. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- lib/x86/pci.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/lib/x86/pci.c b/lib/x86/pci.c index 231668a..5116dac 100644 --- a/lib/x86/pci.c +++ b/lib/x86/pci.c @@ -19,6 +19,13 @@ static uint32_t pci_config_read(pcidevaddr_t dev, uint8_t reg) return inl(0xCFC); } +static uint32_t pci_config_write(pcidevaddr_t dev, uint8_t reg, uint32_t val) +{ +uint32_t index = reg | (dev 8) | (0x1 31); +outl(0xCF8, index); +outl(0xCFC, val); +} + /* Scan bus look for a specific device. Only bus 0 scanned for now. */ pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id) { @@ -42,6 +49,22 @@ unsigned long pci_bar_addr(pcidevaddr_t dev, int bar_num) } } +unsigned long pci_bar_size(pcidevaddr_t dev, int bar_num) +{ +uint32_t bar = pci_config_read(dev, PCI_BASE_ADDRESS_0 + bar_num * 4); +uint32_t mask; + +pci_config_write(dev, PCI_BASE_ADDRESS_0 + bar_num * 4, 0x); +mask = pci_config_read(dev, PCI_BASE_ADDRESS_0 + bar_num * 4); +pci_config_write(dev, PCI_BASE_ADDRESS_0 + bar_num * 4, bar); + +if (bar PCI_BASE_ADDRESS_SPACE_IO) { +return ~(mask PCI_BASE_ADDRESS_IO_MASK) + 1; +} else { +return ~(mask PCI_BASE_ADDRESS_MEM_MASK) + 1; +} +} + bool pci_bar_is_memory(pcidevaddr_t dev, int bar_num) { uint32_t bar = pci_config_read(dev, PCI_BASE_ADDRESS_0 + bar_num * 4); -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM: Security Policy
Hello kvm mailing list, I assume, this is a rather uncommon mailing list post since it is not directly related to the usage or development of KVM. Instead, the following is the case: I am a student of computer science and am currently working on my masters thesis. The work in progress topic is Mining vulnerability databases for information on hypervisor vulnerabilities: Analyses and Predictions. In the context of this research work i am analyzing various security related aspects regarding different hypervisors including KVM (A simple example contained in my analysis is the discovery process of security vulnerabilities and how the total number of disclosed vulnerabilities developes over time). The reason i am writing this post to the public mailing list is, that i am looking for someone who might be willing to support me during my work with (for example) information and/or personal experience. Or simply said: May i post questions and ask for help explaining my findings from time to time or is this too much off-topic for this mailing list? For now the question would be, whether there is some kind of a formal documentation of the vulnerability disclosure process or a security policy specific for KVM? If someone has any information regarding this, feel free to contact me directly through my personal mail address. Any help and information will be greatly appreciated! If this post is misplaced at this mailing list maybe someone could point me at the right place. Kind regards and thank you in advance, Stefan Geißler -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html