[Bug 50921] kvm hangs booting Windows 2000
https://bugzilla.kernel.org/show_bug.cgi?id=50921 --- Comment #21 from Gleb g...@redhat.com 2013-02-03 08:43:20 --- It is queued for 3.8. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 50921] kvm hangs booting Windows 2000
https://bugzilla.kernel.org/show_bug.cgi?id=50921 --- Comment #22 from Gleb g...@redhat.com 2013-02-03 08:48:16 --- (In reply to comment #21) It is queued for 3.8. Sorry, for 3.9 -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM: VMX: disable SMEP feature when guest is in non-paging mode
On Fri, Feb 01, 2013 at 08:30:07AM -, Xu wrote: SMEP is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging mode with HAP. Not always, only if unrestricted mode is disabled, since vm86 mode, that is used otherwise, requires paging. To emulate this behavior, SMEP needs to be manually disabled when guest switches to non-paging mode. We met an issue that, SMP Linux guest with recent kernel (enable SMEP support, for example, 3.5.3) would crash with triple fault if setting unrestricted_guest=0. This is because KVM uses an identity mapping page table to emulate the non-paging mode, where the page table is set with USER flag. If SMEP is still enabled in this case, guest will meet unhandlable page fault and then crash. Signed-off-by: Dongxiao Xu dongxiao...@intel.com Signed-off-by: Xiantao Zhang xiantao.zh...@intel.com Reviewed-by: Gleb Natapov g...@redhat.com But HAP is XEN terminology AFAIK. KVM speak is tdp (two dimensional paging). If would be nice to change it in the commit message and the comment before committing. --- arch/x86/kvm/vmx.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 9120ae1..e82f20d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3155,6 +3155,14 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) if (!is_paging(vcpu)) { hw_cr4 = ~X86_CR4_PAE; hw_cr4 |= X86_CR4_PSE; + /* + * SMEP is disabled if CPU is in non-paging mode in + * hardware. However KVM always uses paging mode to + * emulate guest non-paging mode with HAP. + * To emulate this behavior, SMEP needs to be manually + * disabled when guest switches to non-paging mode. + */ + hw_cr4 = ~X86_CR4_SMEP; } else if (!(cr4 X86_CR4_PAE)) { hw_cr4 = ~X86_CR4_PAE; } -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/3] VFIO: Wrapper for getting reference to vfio_device from device
- Added vfio_device_get_from_dev() as wrapper to get reference to vfio_device from struct device. - Added vfio_device_data() as a wrapper to get device_data from vfio_device. Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com --- drivers/vfio/vfio.c | 41 - include/linux/vfio.h | 3 +++ 2 files changed, 35 insertions(+), 9 deletions(-) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 12c264d..f0a78a2 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -407,12 +407,13 @@ static void vfio_device_release(struct kref *kref) } /* Device reference always implies a group reference */ -static void vfio_device_put(struct vfio_device *device) +void vfio_device_put(struct vfio_device *device) { struct vfio_group *group = device-group; kref_put_mutex(device-kref, vfio_device_release, group-device_lock); vfio_group_put(group); } +EXPORT_SYMBOL_GPL(vfio_device_put); static void vfio_device_get(struct vfio_device *device) { @@ -642,8 +643,12 @@ int vfio_add_group_dev(struct device *dev, } EXPORT_SYMBOL_GPL(vfio_add_group_dev); -/* Test whether a struct device is present in our tracking */ -static bool vfio_dev_present(struct device *dev) +/** + * This does a get on the vfio_device from device. + * Callers of this function will have to call vfio_put_device() to + * remove the reference. + */ +struct vfio_device *vfio_device_get_from_dev(struct device *dev) { struct iommu_group *iommu_group; struct vfio_group *group; @@ -651,25 +656,43 @@ static bool vfio_dev_present(struct device *dev) iommu_group = iommu_group_get(dev); if (!iommu_group) - return false; + return NULL; group = vfio_group_get_from_iommu(iommu_group); if (!group) { iommu_group_put(iommu_group); - return false; + return NULL; } device = vfio_group_get_device(group, dev); if (!device) { vfio_group_put(group); iommu_group_put(iommu_group); - return false; + return NULL; } - - vfio_device_put(device); vfio_group_put(group); iommu_group_put(iommu_group); - return true; + return device; +} +EXPORT_SYMBOL_GPL(vfio_device_get_from_dev); + +void *vfio_device_data(struct vfio_device *device) +{ + return device-device_data; +} +EXPORT_SYMBOL_GPL(vfio_device_data); + +/* Test whether a struct device is present in our tracking */ +static bool vfio_dev_present(struct device *dev) +{ + struct vfio_device *device; + + device = vfio_device_get_from_dev(dev); + if (device) { + vfio_device_put(device); + return true; + } else + return false; } /* diff --git a/include/linux/vfio.h b/include/linux/vfio.h index ab9e862..ac8d488 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -45,6 +45,9 @@ extern int vfio_add_group_dev(struct device *dev, void *device_data); extern void *vfio_del_group_dev(struct device *dev); +extern struct vfio_device *vfio_device_get_from_dev(struct device *dev); +extern void vfio_device_put(struct vfio_device *device); +extern void *vfio_device_data(struct vfio_device *device); /** * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks -- 1.7.11.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/3] AER-KVM: Error containment of VFIO devices assigned to KVM guests
Add support for error containment when a VFIO device assigned to a KVM guest encounters an error. This is for PCIe devices/drivers that support AER functionality. When the host OS is notified of an error in a device either through the firmware first approach or through an interrupt handled by the AER root port driver, the error handler registered by the vfio-pci driver gets invoked. The qemu process is signaled through an eventfd registered per VFIO device by the qemu process. In the eventfd handler, qemu decides on what action to take. In this implementation, guest is brought down to contain the error. v3: - Removed PCI_AER* flags from device info ioctl. - Incorporated feedback v2: - Rebased to latest upstream stable bits - Changed the new ioctl to be part of VFIO_SET_IRQs ioctl - Added a new patch to get/put reference to a vfio device from struct device - Incorporated all other feedback. --- Vijay Mohan Pandarathil(3): [PATCH 1/3] VFIO: Wrapper for getting reference to vfio_device from device [PATCH 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER [PATCH 3/3] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices Kernel files changed drivers/vfio/vfio.c | 41 - include/linux/vfio.h | 3 +++ 2 files changed, 35 insertions(+), 9 deletions(-) drivers/vfio/pci/vfio_pci.c | 43 - drivers/vfio/pci/vfio_pci_intrs.c | 30 ++ drivers/vfio/pci/vfio_pci_private.h | 1 + include/uapi/linux/vfio.h | 1 + 4 files changed, 74 insertions(+), 1 deletion(-) Qemu files changed hw/vfio_pci.c | 105 + linux-headers/linux/vfio.h | 1 + 2 files changed, 106 insertions(+) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 3/3] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices
- Create eventfd per vfio device assigned to a guest and register an event handler - This fd is passed to the vfio_pci driver through the SET_IRQ ioctl - When the device encounters an error, the eventfd is signalled and the qemu eventfd handler gets invoked. - In the handler decide what action to take. Current action taken is to terminate the guest. Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com --- hw/vfio_pci.c | 105 + linux-headers/linux/vfio.h | 1 + 2 files changed, 106 insertions(+) diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index c51ae67..4e2f768 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -130,6 +130,8 @@ typedef struct VFIODevice { QLIST_ENTRY(VFIODevice) next; struct VFIOGroup *group; bool reset_works; +EventNotifier err_notifier; +bool pci_aer; } VFIODevice; typedef struct VFIOGroup { @@ -1922,6 +1924,106 @@ static void vfio_put_device(VFIODevice *vdev) } } +static void vfio_err_notifier_handler(void *opaque) +{ +VFIODevice *vdev = opaque; + +if (!event_notifier_test_and_clear(vdev-err_notifier)) { +return; +} + +/* + * TBD. Retrieve the error details and decide what action + * needs to be taken. One of the actions could be to pass + * the error to the guest and have the guest driver recover + * from the error. This requires that PCIe capabilities be + * exposed to the guest. At present, we just terminate the + * guest to contain the error. + */ + +error_report(%s (%04x:%02x:%02x.%x) +Unrecoverable error detected... Terminating guest\n, +__func__, vdev-host.domain, vdev-host.bus, +vdev-host.slot, vdev-host.function); + +hw_error((%04x:%02x:%02x.%x) Unrecoverable device error\n, +vdev-host.domain, vdev-host.bus, +vdev-host.slot, vdev-host.function); + +return; +} + +static void vfio_register_err_notifier(VFIODevice *vdev) +{ +int ret; +int argsz; +struct vfio_irq_set *irq_set; +int32_t *pfd; + +if (event_notifier_init(vdev-err_notifier, 0)) { +error_report(vfio: Warning: Unable to init event notifier for error detection\n); +return; +} + +argsz = sizeof(*irq_set) + sizeof(*pfd); + +irq_set = g_malloc0(argsz); +irq_set-argsz = argsz; +irq_set-flags = VFIO_IRQ_SET_DATA_EVENTFD | + VFIO_IRQ_SET_ACTION_TRIGGER; +irq_set-index = VFIO_PCI_ERR_IRQ_INDEX; +irq_set-start = 0; +irq_set-count = 1; +pfd = (int32_t *)irq_set-data; + +*pfd = event_notifier_get_fd(vdev-err_notifier); +qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev); + +ret = ioctl(vdev-fd, VFIO_DEVICE_SET_IRQS, irq_set); +if (ret) { +DPRINTF(vfio: Error notification not supported for the device\n); +qemu_set_fd_handler(*pfd, NULL, NULL, vdev); +event_notifier_cleanup(vdev-err_notifier); +g_free(irq_set); +return; +} +g_free(irq_set); +vdev-pci_aer = 1; +return; +} +static void vfio_unregister_err_notifier(VFIODevice *vdev) +{ +int argsz; +struct vfio_irq_set *irq_set; +int32_t *pfd; +int ret; + +if (!vdev-pci_aer) { +return; +} + +argsz = sizeof(*irq_set) + sizeof(*pfd); + +irq_set = g_malloc0(argsz); +irq_set-argsz = argsz; +irq_set-flags = VFIO_IRQ_SET_DATA_EVENTFD | + VFIO_IRQ_SET_ACTION_TRIGGER; +irq_set-index = VFIO_PCI_ERR_IRQ_INDEX; +irq_set-start = 0; +irq_set-count = 1; +pfd = (int32_t *)irq_set-data; +*pfd = -1; + +ret = ioctl(vdev-fd, VFIO_DEVICE_SET_IRQS, irq_set); +if (ret) { +DPRINTF(vfio: Failed to de-assign error fd: %d\n, ret); +} +g_free(irq_set); +qemu_set_fd_handler(event_notifier_get_fd(vdev-err_notifier), +NULL, NULL, vdev); +event_notifier_cleanup(vdev-err_notifier); +return; +} static int vfio_initfn(PCIDevice *pdev) { VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev); @@ -2032,6 +2134,8 @@ static int vfio_initfn(PCIDevice *pdev) } } +vfio_register_err_notifier(vdev); + return 0; out_teardown: @@ -2049,6 +2153,7 @@ static void vfio_exitfn(PCIDevice *pdev) VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev); VFIOGroup *group = vdev-group; +vfio_unregister_err_notifier(vdev); pci_device_set_intx_routing_notifier(vdev-pdev, NULL); vfio_disable_interrupts(vdev); if (vdev-intx.mmap_timer) { diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h index f787b72..6b20849 100644 --- a/linux-headers/linux/vfio.h +++ b/linux-headers/linux/vfio.h @@ -310,6 +310,7 @@ enum { VFIO_PCI_INTX_IRQ_INDEX, VFIO_PCI_MSI_IRQ_INDEX, VFIO_PCI_MSIX_IRQ_INDEX, + VFIO_PCI_ERR_IRQ_INDEX, VFIO_PCI_NUM_IRQS };
Re: [PATCH v3 0/3] AER-KVM: Error containment of VFIO devices assigned to KVM guests
Please use git send-email --thread --no-chain-reply-to when sending patch series. On Sun, Feb 03, 2013 at 02:10:08PM +, Pandarathil, Vijaymohan R wrote: Add support for error containment when a VFIO device assigned to a KVM guest encounters an error. This is for PCIe devices/drivers that support AER functionality. When the host OS is notified of an error in a device either through the firmware first approach or through an interrupt handled by the AER root port driver, the error handler registered by the vfio-pci driver gets invoked. The qemu process is signaled through an eventfd registered per VFIO device by the qemu process. In the eventfd handler, qemu decides on what action to take. In this implementation, guest is brought down to contain the error. v3: - Removed PCI_AER* flags from device info ioctl. - Incorporated feedback v2: - Rebased to latest upstream stable bits - Changed the new ioctl to be part of VFIO_SET_IRQs ioctl - Added a new patch to get/put reference to a vfio device from struct device - Incorporated all other feedback. --- Vijay Mohan Pandarathil(3): [PATCH 1/3] VFIO: Wrapper for getting reference to vfio_device from device [PATCH 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER [PATCH 3/3] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices Kernel files changed drivers/vfio/vfio.c | 41 - include/linux/vfio.h | 3 +++ 2 files changed, 35 insertions(+), 9 deletions(-) drivers/vfio/pci/vfio_pci.c | 43 - drivers/vfio/pci/vfio_pci_intrs.c | 30 ++ drivers/vfio/pci/vfio_pci_private.h | 1 + include/uapi/linux/vfio.h | 1 + 4 files changed, 74 insertions(+), 1 deletion(-) Qemu files changed hw/vfio_pci.c | 105 + linux-headers/linux/vfio.h | 1 + 2 files changed, 106 insertions(+) -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER
- New VFIO_SET_IRQ ioctl option to pass the eventfd that is signaled when an error occurs in the vfio_pci_device - Register pci_error_handler for the vfio_pci driver - When the device encounters an error, the error handler registered by the vfio_pci driver gets invoked by the AER infrastructure - In the error handler, signal the eventfd registered for the device. - This results in the qemu eventfd handler getting invoked and appropriate action taken for the guest. Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com --- drivers/vfio/pci/vfio_pci.c | 43 - drivers/vfio/pci/vfio_pci_intrs.c | 30 ++ drivers/vfio/pci/vfio_pci_private.h | 1 + include/uapi/linux/vfio.h | 1 + 4 files changed, 74 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index b28e66c..818b1ed 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -196,7 +196,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type) return (flags PCI_MSIX_FLAGS_QSIZE) + 1; } - } + } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) + if (pci_is_pcie(vdev-pdev)) + return 1; return 0; } @@ -302,6 +304,16 @@ static long vfio_pci_ioctl(void *device_data, if (info.argsz minsz || info.index = VFIO_PCI_NUM_IRQS) return -EINVAL; + switch (info.index) { + case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX: + break; + case VFIO_PCI_ERR_IRQ_INDEX: + if (pci_is_pcie(vdev-pdev)) + break; + default: + return -EINVAL; + } + info.flags = VFIO_IRQ_INFO_EVENTFD; info.count = vfio_pci_get_irq_count(vdev, info.index); @@ -538,11 +550,40 @@ static void vfio_pci_remove(struct pci_dev *pdev) kfree(vdev); } +static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, + pci_channel_state_t state) +{ + struct vfio_pci_device *vpdev; + void *vdev; + + vdev = vfio_device_get_from_dev(pdev-dev); + if (vdev == NULL) + return PCI_ERS_RESULT_DISCONNECT; + + vpdev = vfio_device_data(vdev); + if (vpdev == NULL) { + vfio_device_put(vdev); + return PCI_ERS_RESULT_DISCONNECT; + } + + if (vpdev-err_trigger) + eventfd_signal(vpdev-err_trigger, 1); + + vfio_device_put(vdev); + + return PCI_ERS_RESULT_CAN_RECOVER; +} + +static struct pci_error_handlers vfio_err_handlers = { + .error_detected = vfio_pci_aer_err_detected, +}; + static struct pci_driver vfio_pci_driver = { .name = vfio-pci, .id_table = NULL, /* only dynamic ids */ .probe = vfio_pci_probe, .remove = vfio_pci_remove, + .err_handler= vfio_err_handlers, }; static void __exit vfio_pci_cleanup(void) diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 3639371..83035b1 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -745,6 +745,29 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_device *vdev, return 0; } +static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev, + unsigned index, unsigned start, + unsigned count, uint32_t flags, void *data) +{ + int32_t fd = *(int32_t *)data; + + if ((index != VFIO_PCI_ERR_IRQ_INDEX) || + !(flags VFIO_IRQ_SET_DATA_EVENTFD)) + return -EINVAL; + + if (fd == -1) { + if (vdev-err_trigger) + eventfd_ctx_put(vdev-err_trigger); + vdev-err_trigger = NULL; + return 0; + } else if (fd = 0) { + vdev-err_trigger = eventfd_ctx_fdget(fd); + if (IS_ERR(vdev-err_trigger)) + return PTR_ERR(vdev-err_trigger); + return 0; + } else + return -EINVAL; +} int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags, unsigned index, unsigned start, unsigned count, void *data) @@ -779,6 +802,13 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags, break; } break; + case VFIO_PCI_ERR_IRQ_INDEX: + switch (flags VFIO_IRQ_SET_ACTION_TYPE_MASK) { + case VFIO_IRQ_SET_ACTION_TRIGGER: + if
[PATCH] KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
Signed-off-by: Gleb Natapov g...@redhat.com diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 694586c..5c9dbad 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -105,7 +105,12 @@ { EXIT_REASON_APIC_ACCESS, APIC_ACCESS }, \ { EXIT_REASON_EPT_VIOLATION, EPT_VIOLATION }, \ { EXIT_REASON_EPT_MISCONFIG, EPT_MISCONFIG }, \ - { EXIT_REASON_WBINVD,WBINVD } + { EXIT_REASON_WBINVD,WBINVD }, \ + { EXIT_REASON_APIC_WRITE,APIC_WRITE }, \ + { EXIT_REASON_EOI_INDUCED, EOI_INDUCED }, \ + { EXIT_REASON_INVALID_STATE, INVALID_STATE }, \ + { EXIT_REASON_INVD, INVD }, \ + { EXIT_REASON_INVPCID, INVPCID } #ifdef __KERNEL__ -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER
On Sun, Feb 3, 2013 at 2:10 PM, Pandarathil, Vijaymohan R vijaymohan.pandarat...@hp.com wrote: - New VFIO_SET_IRQ ioctl option to pass the eventfd that is signaled when an error occurs in the vfio_pci_device - Register pci_error_handler for the vfio_pci driver - When the device encounters an error, the error handler registered by the vfio_pci driver gets invoked by the AER infrastructure - In the error handler, signal the eventfd registered for the device. - This results in the qemu eventfd handler getting invoked and appropriate action taken for the guest. Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com --- drivers/vfio/pci/vfio_pci.c | 43 - drivers/vfio/pci/vfio_pci_intrs.c | 30 ++ drivers/vfio/pci/vfio_pci_private.h | 1 + include/uapi/linux/vfio.h | 1 + 4 files changed, 74 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index b28e66c..818b1ed 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -196,7 +196,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type) return (flags PCI_MSIX_FLAGS_QSIZE) + 1; } - } + } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) + if (pci_is_pcie(vdev-pdev)) + return 1; return 0; } @@ -302,6 +304,16 @@ static long vfio_pci_ioctl(void *device_data, if (info.argsz minsz || info.index = VFIO_PCI_NUM_IRQS) return -EINVAL; + switch (info.index) { + case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX: + break; + case VFIO_PCI_ERR_IRQ_INDEX: + if (pci_is_pcie(vdev-pdev)) + break; I don't know what is the policy in Linux kernel for this, but I'd add a comment about fall through here. + default: + return -EINVAL; + } + info.flags = VFIO_IRQ_INFO_EVENTFD; info.count = vfio_pci_get_irq_count(vdev, info.index); @@ -538,11 +550,40 @@ static void vfio_pci_remove(struct pci_dev *pdev) kfree(vdev); } +static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, + pci_channel_state_t state) +{ + struct vfio_pci_device *vpdev; + void *vdev; + + vdev = vfio_device_get_from_dev(pdev-dev); + if (vdev == NULL) + return PCI_ERS_RESULT_DISCONNECT; + + vpdev = vfio_device_data(vdev); + if (vpdev == NULL) { + vfio_device_put(vdev); + return PCI_ERS_RESULT_DISCONNECT; + } + + if (vpdev-err_trigger) + eventfd_signal(vpdev-err_trigger, 1); + + vfio_device_put(vdev); + + return PCI_ERS_RESULT_CAN_RECOVER; +} + +static struct pci_error_handlers vfio_err_handlers = { + .error_detected = vfio_pci_aer_err_detected, +}; + static struct pci_driver vfio_pci_driver = { .name = vfio-pci, .id_table = NULL, /* only dynamic ids */ .probe = vfio_pci_probe, .remove = vfio_pci_remove, + .err_handler= vfio_err_handlers, }; static void __exit vfio_pci_cleanup(void) diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 3639371..83035b1 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -745,6 +745,29 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_device *vdev, return 0; } +static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev, + unsigned index, unsigned start, + unsigned count, uint32_t flags, void *data) +{ + int32_t fd = *(int32_t *)data; + + if ((index != VFIO_PCI_ERR_IRQ_INDEX) || + !(flags VFIO_IRQ_SET_DATA_EVENTFD)) + return -EINVAL; + + if (fd == -1) { + if (vdev-err_trigger) + eventfd_ctx_put(vdev-err_trigger); + vdev-err_trigger = NULL; + return 0; + } else if (fd = 0) { + vdev-err_trigger = eventfd_ctx_fdget(fd); + if (IS_ERR(vdev-err_trigger)) + return PTR_ERR(vdev-err_trigger); + return 0; + } else + return -EINVAL; +} int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags, unsigned index, unsigned start, unsigned count, void *data) @@ -779,6 +802,13 @@ int
Re: [PATCH v3 3/3] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices
On Sun, Feb 3, 2013 at 2:10 PM, Pandarathil, Vijaymohan R vijaymohan.pandarat...@hp.com wrote: - Create eventfd per vfio device assigned to a guest and register an event handler - This fd is passed to the vfio_pci driver through the SET_IRQ ioctl - When the device encounters an error, the eventfd is signalled and the qemu eventfd handler gets invoked. - In the handler decide what action to take. Current action taken is to terminate the guest. Usually this is not OK, but I guess this is not guest triggerable. Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com --- hw/vfio_pci.c | 105 + linux-headers/linux/vfio.h | 1 + 2 files changed, 106 insertions(+) diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index c51ae67..4e2f768 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -130,6 +130,8 @@ typedef struct VFIODevice { QLIST_ENTRY(VFIODevice) next; struct VFIOGroup *group; bool reset_works; +EventNotifier err_notifier; +bool pci_aer; } VFIODevice; typedef struct VFIOGroup { @@ -1922,6 +1924,106 @@ static void vfio_put_device(VFIODevice *vdev) } } +static void vfio_err_notifier_handler(void *opaque) +{ +VFIODevice *vdev = opaque; + +if (!event_notifier_test_and_clear(vdev-err_notifier)) { +return; +} + +/* + * TBD. Retrieve the error details and decide what action + * needs to be taken. One of the actions could be to pass + * the error to the guest and have the guest driver recover + * from the error. This requires that PCIe capabilities be + * exposed to the guest. At present, we just terminate the + * guest to contain the error. + */ + +error_report(%s (%04x:%02x:%02x.%x) +Unrecoverable error detected... Terminating guest\n, +__func__, vdev-host.domain, vdev-host.bus, +vdev-host.slot, vdev-host.function); + +hw_error((%04x:%02x:%02x.%x) Unrecoverable device error\n, +vdev-host.domain, vdev-host.bus, +vdev-host.slot, vdev-host.function); + +return; Useless, please remove. +} + +static void vfio_register_err_notifier(VFIODevice *vdev) +{ +int ret; +int argsz; +struct vfio_irq_set *irq_set; +int32_t *pfd; + +if (event_notifier_init(vdev-err_notifier, 0)) { +error_report(vfio: Warning: Unable to init event notifier for error detection\n); +return; +} + +argsz = sizeof(*irq_set) + sizeof(*pfd); + +irq_set = g_malloc0(argsz); +irq_set-argsz = argsz; +irq_set-flags = VFIO_IRQ_SET_DATA_EVENTFD | + VFIO_IRQ_SET_ACTION_TRIGGER; +irq_set-index = VFIO_PCI_ERR_IRQ_INDEX; +irq_set-start = 0; +irq_set-count = 1; +pfd = (int32_t *)irq_set-data; + +*pfd = event_notifier_get_fd(vdev-err_notifier); +qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev); + +ret = ioctl(vdev-fd, VFIO_DEVICE_SET_IRQS, irq_set); +if (ret) { +DPRINTF(vfio: Error notification not supported for the device\n); +qemu_set_fd_handler(*pfd, NULL, NULL, vdev); +event_notifier_cleanup(vdev-err_notifier); +g_free(irq_set); +return; +} +g_free(irq_set); +vdev-pci_aer = 1; +return; Ditto. +} +static void vfio_unregister_err_notifier(VFIODevice *vdev) +{ +int argsz; +struct vfio_irq_set *irq_set; +int32_t *pfd; +int ret; + +if (!vdev-pci_aer) { +return; +} + +argsz = sizeof(*irq_set) + sizeof(*pfd); + +irq_set = g_malloc0(argsz); +irq_set-argsz = argsz; +irq_set-flags = VFIO_IRQ_SET_DATA_EVENTFD | + VFIO_IRQ_SET_ACTION_TRIGGER; +irq_set-index = VFIO_PCI_ERR_IRQ_INDEX; +irq_set-start = 0; +irq_set-count = 1; +pfd = (int32_t *)irq_set-data; +*pfd = -1; + +ret = ioctl(vdev-fd, VFIO_DEVICE_SET_IRQS, irq_set); +if (ret) { +DPRINTF(vfio: Failed to de-assign error fd: %d\n, ret); +} +g_free(irq_set); +qemu_set_fd_handler(event_notifier_get_fd(vdev-err_notifier), +NULL, NULL, vdev); +event_notifier_cleanup(vdev-err_notifier); +return; Ditto. +} static int vfio_initfn(PCIDevice *pdev) { VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev); @@ -2032,6 +2134,8 @@ static int vfio_initfn(PCIDevice *pdev) } } +vfio_register_err_notifier(vdev); + return 0; out_teardown: @@ -2049,6 +2153,7 @@ static void vfio_exitfn(PCIDevice *pdev) VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev); VFIOGroup *group = vdev-group; +vfio_unregister_err_notifier(vdev); pci_device_set_intx_routing_notifier(vdev-pdev, NULL); vfio_disable_interrupts(vdev); if (vdev-intx.mmap_timer) {
Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting
On Thu, Jan 31, 2013 at 03:55:56PM +0200, Gleb Natapov wrote: On Thu, Jan 31, 2013 at 11:44:43AM -0200, Marcelo Tosatti wrote: On Thu, Jan 31, 2013 at 03:38:37PM +0200, Gleb Natapov wrote: On Thu, Jan 31, 2013 at 11:32:45AM -0200, Marcelo Tosatti wrote: On Thu, Jan 31, 2013 at 11:43:48AM +0200, Gleb Natapov wrote: On Wed, Jan 30, 2013 at 09:03:11PM -0200, Marcelo Tosatti wrote: Posted interrupt patch: 2) Must move IN_GUEST_MODE assignment after local_irq_disable, in vcpu_enter_guest function. Otherwise: cpu0vcpu1-cpu1 vcpu-mode = IN_GUEST_MODE if IN_GUEST_MODE == true send IPI local_irq_disable PIR not transferred to VIRR, misses interrupt. cpu0 will set KVM_REQ_EVENT, so vmentry will be aborted after local_irq_disable() by -requests check. Yes, but you don't want KVM_REQ_EVENT+kick. It defeats the purpose of posted interrupts. You want if vcpu in guest mode send posted interrupt IPI else KVM_REQ_EVENT+kick I am thinking: set KVM_REQ_EVENT if pi is enabled send posted interrupt IPI else kick KVM_REQ_EVENT must be after sending posted interrupt IPI. Otherwise on the vcpu entry side test_and_clear(KVM_REQ_EVENT) { No bits set in PIR } It should be after updating PIR, but before sending posted interrupt IPI. Otherwise: cpu0 cpu1/vcpu KVM_REQ_EVENT is not set set pir send IPI irq_disable() -request is empty. set KVM_REQ_EVENT That's the same sequence as with IRR update, KVM_REQ_EVENT and kick today. Can only send IPI if vcpu-mode == IN_GUEST_MODE, which must be set after interrupt flag is cleared as noted. Also KVM_REQ_EVENT is processed outside of interrupt disabled region today. Or maybe i don't get what you say... write a complete description? What about item 4 below? That's for Yang to answer :) If more than one interrupt is generated with the same vector number, the local APIC can set the bit for the vector both in the IRR and ISR. This means that for the Pentium 4 and Intel Xeon processors, the IRR and ISR can queue two interrupts for each interrupt vector: one in the IRR and one in the ISR. Any additional interrupts issued for the same interrupt vector are collapsed into the single bit in IRR Which would mean KVM emulation differs... Eventually 3 interrupts can be queued: one in IRR, one in ISR, one in PIR. Any example how software relies on such two-interrupts-queued-in-IRR/ISR behaviour? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/8] Added ONE_REG interface for debug instruction
On Wed, Jan 16, 2013 at 01:54:41PM +0530, Bharat Bhushan wrote: This patch adds the one_reg interface to get the special instruction to be used for setting software breakpoint from userspace. Since this presumably is constant for any given platform, wouldn't a capability be more appropriate? Paul. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
How to limit upload bandwidth for a guest server?
Hello all: I have a CentOS server using KVM to host guest servers. I am trying to limit the bandwidth usable by a guest server. I tried to use tc, but that is only limiting the download bandwidth to a server. It does not seem to filter packets uploaded by the server. Is there a tool to limit upload traffic for a guest server? Thanks, Neil -- Neil Aggarwal, (972)834-1565, http://UnmeteredVPS.net/centos Virtual private server with CentOS 6 preinstalled Unmetered bandwidth = no overage charges -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] KVM: VMX: disable SMEP feature when guest is in non-paging mode
Changes from v1 to v2: - Modify commit message and comments according to Gleb's suggestions. SMEP is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging mode with TDP. To emulate this behavior, SMEP needs to be manually disabled when guest switches to non-paging mode. We met an issue that, SMP Linux guest with recent kernel (enable SMEP support, for example, 3.5.3) would crash with triple fault if setting unrestricted_guest=0. This is because KVM uses an identity mapping page table to emulate the non-paging mode, where the page table is set with USER flag. If SMEP is still enabled in this case, guest will meet unhandlable page fault and then crash. Signed-off-by: Dongxiao Xu dongxiao...@intel.com Signed-off-by: Xiantao Zhang xiantao.zh...@intel.com --- arch/x86/kvm/vmx.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 9120ae1..70393e2 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3155,6 +3155,14 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) if (!is_paging(vcpu)) { hw_cr4 = ~X86_CR4_PAE; hw_cr4 |= X86_CR4_PSE; + /* +* SMEP is disabled if CPU is in non-paging mode in +* hardware. However KVM always uses paging mode to +* emulate guest non-paging mode with TDP. +* To emulate this behavior, SMEP needs to be manually +* disabled when guest switches to non-paging mode. +*/ + hw_cr4 = ~X86_CR4_SMEP; } else if (!(cr4 X86_CR4_PAE)) { hw_cr4 = ~X86_CR4_PAE; } -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: KVM: VMX: disable SMEP feature when guest is in non-paging mode
-Original Message- From: Gleb Natapov [mailto:g...@redhat.com] Sent: Sunday, February 03, 2013 9:57 PM To: Xu, Dongxiao Cc: kvm@vger.kernel.org Subject: Re: KVM: VMX: disable SMEP feature when guest is in non-paging mode On Fri, Feb 01, 2013 at 08:30:07AM -, Xu wrote: SMEP is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging mode with HAP. Not always, only if unrestricted mode is disabled, since vm86 mode, that is used otherwise, requires paging. To emulate this behavior, SMEP needs to be manually disabled when guest switches to non-paging mode. We met an issue that, SMP Linux guest with recent kernel (enable SMEP support, for example, 3.5.3) would crash with triple fault if setting unrestricted_guest=0. This is because KVM uses an identity mapping page table to emulate the non-paging mode, where the page table is set with USER flag. If SMEP is still enabled in this case, guest will meet unhandlable page fault and then crash. Signed-off-by: Dongxiao Xu dongxiao...@intel.com Signed-off-by: Xiantao Zhang xiantao.zh...@intel.com Reviewed-by: Gleb Natapov g...@redhat.com But HAP is XEN terminology AFAIK. KVM speak is tdp (two dimensional paging). If would be nice to change it in the commit message and the comment before committing. Thanks for reminding this. I've sent out the v2 patch to address this issue. Thanks, Dongxiao --- arch/x86/kvm/vmx.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 9120ae1..e82f20d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3155,6 +3155,14 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) if (!is_paging(vcpu)) { hw_cr4 = ~X86_CR4_PAE; hw_cr4 |= X86_CR4_PSE; + /* +* SMEP is disabled if CPU is in non-paging mode in +* hardware. However KVM always uses paging mode to +* emulate guest non-paging mode with HAP. +* To emulate this behavior, SMEP needs to be manually +* disabled when guest switches to non-paging mode. +*/ + hw_cr4 = ~X86_CR4_SMEP; } else if (!(cr4 X86_CR4_PAE)) { hw_cr4 = ~X86_CR4_PAE; } -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 8/8] KVM:PPC:booke: Allow debug interrupt injection to guest
-Original Message- From: Wood Scott-B07421 Sent: Saturday, February 02, 2013 4:09 AM To: Alexander Graf Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; kvm@vger.kernel.org Subject: Re: [PATCH 8/8] KVM:PPC:booke: Allow debug interrupt injection to guest On 01/31/2013 06:11:32 PM, Alexander Graf wrote: On 31.01.2013, at 23:40, Scott Wood wrote: On 01/31/2013 01:20:39 PM, Alexander Graf wrote: On 31.01.2013, at 20:05, Alexander Graf wrote: On 31.01.2013, at 19:54, Scott Wood wrote: On 01/31/2013 12:52:41 PM, Alexander Graf wrote: On 31.01.2013, at 19:43, Scott Wood wrote: On 01/31/2013 12:21:07 PM, Alexander Graf wrote: How about something like this? Then both targets at least suck as much :). I'm not sure that should be the goal... Thanks to e500mc's awful hardware design, we don't know who sets the MSR_DE bit. Once we forced it onto the guest, we have no change to know whether the guest also set it or not. We could only guess. MSRP[DEP] can prevent the guest from modifying MSR[DE] -- but we still need to set it in the first place. According to ISA V2.06B, the hypervisor should set DBCR0[EDM] to let the guest know that the debug resources are not available, and that the value of MSR[DE] is not specified and not modifiable. So what would the guest do then to tell the hypervisor that it actually wants to know about debug events? The guest is out of luck, just as if a JTAG were in use. Hrm. Can we somehow generalize this out of luck behavior? Every time we would set or clear an MSR bit in shadow_msr on e500v2, we would instead set or clear it in the real MSR. That way only e500mc is out of luck, but the code would still be shared. I don't follow. e500v2 is just as out-of-luck. The mechanism simply does not support sharing debug resources. For e500v2 we have 2 fields * MSR as the guest sees it * MSR as we execute when the guest runs Since we know the MSR when the guest sees it, we can decide what to do when we get an unhandled debug interrupt. That's not the same thing as making the real MSR[DE] show up in the guest MSR[DE]. There are other problems with sharing -- what happens when both host and guest try to write to a particular IAC or DAC? Also, performance would be pretty awful if the guest has e.g. single stepping in DBCR0 enabled but MSR[DE]=0, and the host doesn't care about single stepping (but does want debugging enabled in general). What do you mean by the real MSR? The real MSR is shadow_msr, and MSR_DE must always be set there if the host is debugging the guest. As for reflecting it into the guest MSR, we could, but I don't really see the point. We're never going to actually send a debug exception to the guest when the host owns the debug resources. Why not? That's the whole point of jumping through user space. That's still needed for software breakpoints, which don't rely on the debug resources. 1) guest exits with debug interrupt 2) QEMU gets a debug exit 3) QEMU checks in its list whether it belongs to its own debug points 4) if not, it reinjects the interrupt into the guest Step 4 is pretty difficult to do when we don't know whether the guest is actually capable of handling debug interrupts at that moment. Software breakpoints take a Program interrupt rather than a Debug interrupt, unless MSR[DE]=1 and DBCR0[TRAP]=1. If the guest does not own debug resources we should always send it to the Program interrupt, so MSR[DE] doesn't matter. The = ~MSR_DE line is pointless on bookehv, and makes it harder to read. I had to stare at it a while before noticing that you initially set is_debug from the guest MSR and that you'd never really clear MSR_DE here on bookehv. Well, I'm mostly bouncing ideas here to find a way to express what we're trying to say in a way that someone who hasn't read this email thread would still understand what's going on :). I think it's already straightforward enough if you accept that shared debug resources aren't supported, and that we are either in a mode where the real MSR[DE] reflects the guest MSR[DE], or a mode where the real MSR[DE] is always on in guest mode and the guest MSR[DE] is irrelevant. How about this version? diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 38a62ef..9929c41 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -133,6 +133,28 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu) #endif } +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu) { #ifndef +CONFIG_KVM_BOOKE_HV + /* Synchronize guest's desire to get debug interrupts into shadow MSR */ + vcpu-arch.shadow_msr = ~MSR_DE; + vcpu-arch.shadow_msr |= vcpu-arch.shared-msr MSR_DE; #endif + + /* Force enable debug
One reg interface for Timer register
Hi Alex/Scott, Below is my understanding about the ONE_REG interface requirement for timer registers. Define the below 2 ONE_REG interface for TSR access: KVM_REG_SET_TSR, // Set the specified bits in TSR KVM_REG_CLEAR_TSR, // Clear the specified bits in TSR QEMU will use the above ioctl call to selectively set/clear bits of TSR. We do not need the similar interface for TCR as there is no race issue with TCR. So for TCR QEMU will keep on using the SREGS interface. Thanks -Bharat -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/8] Added ONE_REG interface for debug instruction
On Wed, Jan 16, 2013 at 01:54:41PM +0530, Bharat Bhushan wrote: This patch adds the one_reg interface to get the special instruction to be used for setting software breakpoint from userspace. Since this presumably is constant for any given platform, wouldn't a capability be more appropriate? Paul. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 8/8] KVM:PPC:booke: Allow debug interrupt injection to guest
-Original Message- From: Wood Scott-B07421 Sent: Saturday, February 02, 2013 4:09 AM To: Alexander Graf Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: Re: [PATCH 8/8] KVM:PPC:booke: Allow debug interrupt injection to guest On 01/31/2013 06:11:32 PM, Alexander Graf wrote: On 31.01.2013, at 23:40, Scott Wood wrote: On 01/31/2013 01:20:39 PM, Alexander Graf wrote: On 31.01.2013, at 20:05, Alexander Graf wrote: On 31.01.2013, at 19:54, Scott Wood wrote: On 01/31/2013 12:52:41 PM, Alexander Graf wrote: On 31.01.2013, at 19:43, Scott Wood wrote: On 01/31/2013 12:21:07 PM, Alexander Graf wrote: How about something like this? Then both targets at least suck as much :). I'm not sure that should be the goal... Thanks to e500mc's awful hardware design, we don't know who sets the MSR_DE bit. Once we forced it onto the guest, we have no change to know whether the guest also set it or not. We could only guess. MSRP[DEP] can prevent the guest from modifying MSR[DE] -- but we still need to set it in the first place. According to ISA V2.06B, the hypervisor should set DBCR0[EDM] to let the guest know that the debug resources are not available, and that the value of MSR[DE] is not specified and not modifiable. So what would the guest do then to tell the hypervisor that it actually wants to know about debug events? The guest is out of luck, just as if a JTAG were in use. Hrm. Can we somehow generalize this out of luck behavior? Every time we would set or clear an MSR bit in shadow_msr on e500v2, we would instead set or clear it in the real MSR. That way only e500mc is out of luck, but the code would still be shared. I don't follow. e500v2 is just as out-of-luck. The mechanism simply does not support sharing debug resources. For e500v2 we have 2 fields * MSR as the guest sees it * MSR as we execute when the guest runs Since we know the MSR when the guest sees it, we can decide what to do when we get an unhandled debug interrupt. That's not the same thing as making the real MSR[DE] show up in the guest MSR[DE]. There are other problems with sharing -- what happens when both host and guest try to write to a particular IAC or DAC? Also, performance would be pretty awful if the guest has e.g. single stepping in DBCR0 enabled but MSR[DE]=0, and the host doesn't care about single stepping (but does want debugging enabled in general). What do you mean by the real MSR? The real MSR is shadow_msr, and MSR_DE must always be set there if the host is debugging the guest. As for reflecting it into the guest MSR, we could, but I don't really see the point. We're never going to actually send a debug exception to the guest when the host owns the debug resources. Why not? That's the whole point of jumping through user space. That's still needed for software breakpoints, which don't rely on the debug resources. 1) guest exits with debug interrupt 2) QEMU gets a debug exit 3) QEMU checks in its list whether it belongs to its own debug points 4) if not, it reinjects the interrupt into the guest Step 4 is pretty difficult to do when we don't know whether the guest is actually capable of handling debug interrupts at that moment. Software breakpoints take a Program interrupt rather than a Debug interrupt, unless MSR[DE]=1 and DBCR0[TRAP]=1. If the guest does not own debug resources we should always send it to the Program interrupt, so MSR[DE] doesn't matter. The = ~MSR_DE line is pointless on bookehv, and makes it harder to read. I had to stare at it a while before noticing that you initially set is_debug from the guest MSR and that you'd never really clear MSR_DE here on bookehv. Well, I'm mostly bouncing ideas here to find a way to express what we're trying to say in a way that someone who hasn't read this email thread would still understand what's going on :). I think it's already straightforward enough if you accept that shared debug resources aren't supported, and that we are either in a mode where the real MSR[DE] reflects the guest MSR[DE], or a mode where the real MSR[DE] is always on in guest mode and the guest MSR[DE] is irrelevant. How about this version? diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 38a62ef..9929c41 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -133,6 +133,28 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu) #endif } +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu) { #ifndef +CONFIG_KVM_BOOKE_HV + /* Synchronize guest's desire to get debug interrupts into shadow MSR */ + vcpu-arch.shadow_msr = ~MSR_DE; + vcpu-arch.shadow_msr |= vcpu-arch.shared-msr MSR_DE; #endif + + /* Force enable debug
One reg interface for Timer register
Hi Alex/Scott, Below is my understanding about the ONE_REG interface requirement for timer registers. Define the below 2 ONE_REG interface for TSR access: KVM_REG_SET_TSR, // Set the specified bits in TSR KVM_REG_CLEAR_TSR, // Clear the specified bits in TSR QEMU will use the above ioctl call to selectively set/clear bits of TSR. We do not need the similar interface for TCR as there is no race issue with TCR. So for TCR QEMU will keep on using the SREGS interface. Thanks -Bharat -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html