[Bug 50921] kvm hangs booting Windows 2000

2013-02-03 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=50921





--- Comment #21 from Gleb g...@redhat.com  2013-02-03 08:43:20 ---
It is queued for 3.8.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 50921] kvm hangs booting Windows 2000

2013-02-03 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=50921





--- Comment #22 from Gleb g...@redhat.com  2013-02-03 08:48:16 ---
(In reply to comment #21)
 It is queued for 3.8.

Sorry, for 3.9

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM: VMX: disable SMEP feature when guest is in non-paging mode

2013-02-03 Thread Gleb Natapov
On Fri, Feb 01, 2013 at 08:30:07AM -, Xu wrote:
 SMEP is disabled if CPU is in non-paging mode in hardware.
 However KVM always uses paging mode to emulate guest non-paging
 mode with HAP.
Not always, only if unrestricted mode is disabled, since vm86 mode, that
is used otherwise, requires paging.

To emulate this behavior, SMEP needs to be manually
 disabled when guest switches to non-paging mode.
 
 We met an issue that, SMP Linux guest with recent kernel (enable
 SMEP support, for example, 3.5.3) would crash with triple fault if
 setting unrestricted_guest=0. This is because KVM uses an identity
 mapping page table to emulate the non-paging mode, where the page
 table is set with USER flag. If SMEP is still enabled in this case,
 guest will meet unhandlable page fault and then crash.
 
 Signed-off-by: Dongxiao Xu dongxiao...@intel.com
 Signed-off-by: Xiantao Zhang xiantao.zh...@intel.com
Reviewed-by: Gleb Natapov g...@redhat.com

But HAP is XEN terminology AFAIK. KVM speak is tdp (two dimensional
paging). If would be nice to change it in the commit message and the
comment before committing.

 
 ---
 arch/x86/kvm/vmx.c |8 
  1 files changed, 8 insertions(+), 0 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 9120ae1..e82f20d 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -3155,6 +3155,14 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned 
 long cr4)
   if (!is_paging(vcpu)) {
   hw_cr4 = ~X86_CR4_PAE;
   hw_cr4 |= X86_CR4_PSE;
 + /*
 +  * SMEP is disabled if CPU is in non-paging mode in
 +  * hardware. However KVM always uses paging mode to
 +  * emulate guest non-paging mode with HAP.
 +  * To emulate this behavior, SMEP needs to be manually
 +  * disabled when guest switches to non-paging mode.
 +  */
 + hw_cr4 = ~X86_CR4_SMEP;
   } else if (!(cr4  X86_CR4_PAE)) {
   hw_cr4 = ~X86_CR4_PAE;
   }

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/3] VFIO: Wrapper for getting reference to vfio_device from device

2013-02-03 Thread Pandarathil, Vijaymohan R
- Added vfio_device_get_from_dev() as wrapper to get
  reference to vfio_device from struct device.

- Added vfio_device_data() as a wrapper to get device_data from
  vfio_device.

Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com
---
 drivers/vfio/vfio.c  | 41 -
 include/linux/vfio.h |  3 +++
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 12c264d..f0a78a2 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -407,12 +407,13 @@ static void vfio_device_release(struct kref *kref)
 }
 
 /* Device reference always implies a group reference */
-static void vfio_device_put(struct vfio_device *device)
+void vfio_device_put(struct vfio_device *device)
 {
struct vfio_group *group = device-group;
kref_put_mutex(device-kref, vfio_device_release, group-device_lock);
vfio_group_put(group);
 }
+EXPORT_SYMBOL_GPL(vfio_device_put);
 
 static void vfio_device_get(struct vfio_device *device)
 {
@@ -642,8 +643,12 @@ int vfio_add_group_dev(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(vfio_add_group_dev);
 
-/* Test whether a struct device is present in our tracking */
-static bool vfio_dev_present(struct device *dev)
+/**
+ * This does a get on the vfio_device from device.
+ * Callers of this function will have to call vfio_put_device() to
+ * remove the reference.
+ */
+struct vfio_device *vfio_device_get_from_dev(struct device *dev)
 {
struct iommu_group *iommu_group;
struct vfio_group *group;
@@ -651,25 +656,43 @@ static bool vfio_dev_present(struct device *dev)
 
iommu_group = iommu_group_get(dev);
if (!iommu_group)
-   return false;
+   return NULL;
 
group = vfio_group_get_from_iommu(iommu_group);
if (!group) {
iommu_group_put(iommu_group);
-   return false;
+   return NULL;
}
 
device = vfio_group_get_device(group, dev);
if (!device) {
vfio_group_put(group);
iommu_group_put(iommu_group);
-   return false;
+   return NULL;
}
-
-   vfio_device_put(device);
vfio_group_put(group);
iommu_group_put(iommu_group);
-   return true;
+   return device;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
+
+void *vfio_device_data(struct vfio_device *device)
+{
+   return device-device_data;
+}
+EXPORT_SYMBOL_GPL(vfio_device_data);
+
+/* Test whether a struct device is present in our tracking */
+static bool vfio_dev_present(struct device *dev)
+{
+   struct vfio_device *device;
+
+   device = vfio_device_get_from_dev(dev);
+   if (device) {
+   vfio_device_put(device);
+   return true;
+   } else
+   return false;
 }
 
 /*
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ab9e862..ac8d488 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -45,6 +45,9 @@ extern int vfio_add_group_dev(struct device *dev,
  void *device_data);
 
 extern void *vfio_del_group_dev(struct device *dev);
+extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
+extern void vfio_device_put(struct vfio_device *device);
+extern void *vfio_device_data(struct vfio_device *device);
 
 /**
  * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks
-- 
1.7.11.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/3] AER-KVM: Error containment of VFIO devices assigned to KVM guests

2013-02-03 Thread Pandarathil, Vijaymohan R

Add support for error containment when a VFIO device assigned to a KVM
guest encounters an error. This is for PCIe devices/drivers that support AER
functionality. When the host OS is notified of an error in a device either
through the firmware first approach or through an interrupt handled by the AER
root port driver, the error handler registered by the vfio-pci driver gets
invoked. The qemu process is signaled through an eventfd registered per
VFIO device by the qemu process. In the eventfd handler, qemu decides on
what action to take. In this implementation, guest is brought down to
contain the error.


v3:
 - Removed PCI_AER* flags from device info ioctl.
 - Incorporated feedback
v2:
 - Rebased to latest upstream stable bits
 - Changed the new ioctl to be part of VFIO_SET_IRQs ioctl
 - Added a new patch to get/put reference to a vfio device from struct device
 - Incorporated all other feedback.

---

Vijay Mohan Pandarathil(3):

[PATCH 1/3] VFIO: Wrapper for getting reference to vfio_device from device 
[PATCH 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER
[PATCH 3/3] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices

Kernel files changed

 drivers/vfio/vfio.c  | 41 -
 include/linux/vfio.h |  3 +++
 2 files changed, 35 insertions(+), 9 deletions(-)

 drivers/vfio/pci/vfio_pci.c | 43 -
 drivers/vfio/pci/vfio_pci_intrs.c   | 30 ++
 drivers/vfio/pci/vfio_pci_private.h |  1 +
 include/uapi/linux/vfio.h   |  1 +
 4 files changed, 74 insertions(+), 1 deletion(-)

Qemu files changed

 hw/vfio_pci.c  | 105 +
 linux-headers/linux/vfio.h |   1 +
 2 files changed, 106 insertions(+)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/3] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices

2013-02-03 Thread Pandarathil, Vijaymohan R
- Create eventfd per vfio device assigned to a guest and register an
  event handler

- This fd is passed to the vfio_pci driver through the SET_IRQ ioctl

- When the device encounters an error, the eventfd is signalled
  and the qemu eventfd handler gets invoked.

- In the handler decide what action to take. Current action taken
  is to terminate the guest.

Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com
---
 hw/vfio_pci.c  | 105 +
 linux-headers/linux/vfio.h |   1 +
 2 files changed, 106 insertions(+)

diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
index c51ae67..4e2f768 100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -130,6 +130,8 @@ typedef struct VFIODevice {
 QLIST_ENTRY(VFIODevice) next;
 struct VFIOGroup *group;
 bool reset_works;
+EventNotifier err_notifier;
+bool pci_aer;
 } VFIODevice;
 
 typedef struct VFIOGroup {
@@ -1922,6 +1924,106 @@ static void vfio_put_device(VFIODevice *vdev)
 }
 }
 
+static void vfio_err_notifier_handler(void *opaque)
+{
+VFIODevice *vdev = opaque;
+
+if (!event_notifier_test_and_clear(vdev-err_notifier)) {
+return;
+}
+
+/*
+ * TBD. Retrieve the error details and decide what action
+ * needs to be taken. One of the actions could be to pass
+ * the error to the guest and have the guest driver recover
+ * from the error. This requires that PCIe capabilities be
+ * exposed to the guest. At present, we just terminate the
+ * guest to contain the error.
+ */
+
+error_report(%s (%04x:%02x:%02x.%x)
+Unrecoverable error detected... Terminating guest\n,
+__func__, vdev-host.domain, vdev-host.bus,
+vdev-host.slot, vdev-host.function);
+
+hw_error((%04x:%02x:%02x.%x) Unrecoverable device error\n,
+vdev-host.domain, vdev-host.bus,
+vdev-host.slot, vdev-host.function);
+
+return;
+}
+
+static void vfio_register_err_notifier(VFIODevice *vdev)
+{
+int ret;
+int argsz;
+struct vfio_irq_set *irq_set;
+int32_t *pfd;
+
+if (event_notifier_init(vdev-err_notifier, 0)) {
+error_report(vfio: Warning: Unable to init event notifier for error 
detection\n);
+return;
+}
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set-argsz = argsz;
+irq_set-flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set-index = VFIO_PCI_ERR_IRQ_INDEX;
+irq_set-start = 0;
+irq_set-count = 1;
+pfd = (int32_t *)irq_set-data;
+
+*pfd = event_notifier_get_fd(vdev-err_notifier);
+qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev);
+
+ret = ioctl(vdev-fd, VFIO_DEVICE_SET_IRQS, irq_set);
+if (ret) {
+DPRINTF(vfio: Error notification not supported for the device\n);
+qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
+event_notifier_cleanup(vdev-err_notifier);
+g_free(irq_set);
+return;
+}
+g_free(irq_set);
+vdev-pci_aer = 1;
+return;
+}
+static void vfio_unregister_err_notifier(VFIODevice *vdev)
+{
+int argsz;
+struct vfio_irq_set *irq_set;
+int32_t *pfd;
+int ret;
+
+if (!vdev-pci_aer) {
+return;
+}
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set-argsz = argsz;
+irq_set-flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set-index = VFIO_PCI_ERR_IRQ_INDEX;
+irq_set-start = 0;
+irq_set-count = 1;
+pfd = (int32_t *)irq_set-data;
+*pfd = -1;
+
+ret = ioctl(vdev-fd, VFIO_DEVICE_SET_IRQS, irq_set);
+if (ret) {
+DPRINTF(vfio: Failed to de-assign error fd: %d\n, ret);
+}
+g_free(irq_set);
+qemu_set_fd_handler(event_notifier_get_fd(vdev-err_notifier),
+NULL, NULL, vdev);
+event_notifier_cleanup(vdev-err_notifier);
+return;
+}
 static int vfio_initfn(PCIDevice *pdev)
 {
 VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
@@ -2032,6 +2134,8 @@ static int vfio_initfn(PCIDevice *pdev)
 }
 }
 
+vfio_register_err_notifier(vdev);
+
 return 0;
 
 out_teardown:
@@ -2049,6 +2153,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
 VFIOGroup *group = vdev-group;
 
+vfio_unregister_err_notifier(vdev);
 pci_device_set_intx_routing_notifier(vdev-pdev, NULL);
 vfio_disable_interrupts(vdev);
 if (vdev-intx.mmap_timer) {
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index f787b72..6b20849 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -310,6 +310,7 @@ enum {
VFIO_PCI_INTX_IRQ_INDEX,
VFIO_PCI_MSI_IRQ_INDEX,
VFIO_PCI_MSIX_IRQ_INDEX,
+   VFIO_PCI_ERR_IRQ_INDEX,
VFIO_PCI_NUM_IRQS
 };

Re: [PATCH v3 0/3] AER-KVM: Error containment of VFIO devices assigned to KVM guests

2013-02-03 Thread Gleb Natapov
Please use git send-email --thread --no-chain-reply-to when sending
patch series.

On Sun, Feb 03, 2013 at 02:10:08PM +, Pandarathil, Vijaymohan R wrote:
 
 Add support for error containment when a VFIO device assigned to a KVM
 guest encounters an error. This is for PCIe devices/drivers that support AER
 functionality. When the host OS is notified of an error in a device either
 through the firmware first approach or through an interrupt handled by the AER
 root port driver, the error handler registered by the vfio-pci driver gets
 invoked. The qemu process is signaled through an eventfd registered per
 VFIO device by the qemu process. In the eventfd handler, qemu decides on
 what action to take. In this implementation, guest is brought down to
 contain the error.
 
 
 v3:
  - Removed PCI_AER* flags from device info ioctl.
  - Incorporated feedback
 v2:
  - Rebased to latest upstream stable bits
  - Changed the new ioctl to be part of VFIO_SET_IRQs ioctl
  - Added a new patch to get/put reference to a vfio device from struct device
  - Incorporated all other feedback.
 
 ---
 
 Vijay Mohan Pandarathil(3):
 
 [PATCH 1/3] VFIO: Wrapper for getting reference to vfio_device from device 
 [PATCH 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER
 [PATCH 3/3] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices
 
 Kernel files changed
 
  drivers/vfio/vfio.c  | 41 -
  include/linux/vfio.h |  3 +++
  2 files changed, 35 insertions(+), 9 deletions(-)
 
  drivers/vfio/pci/vfio_pci.c | 43 
 -
  drivers/vfio/pci/vfio_pci_intrs.c   | 30 ++
  drivers/vfio/pci/vfio_pci_private.h |  1 +
  include/uapi/linux/vfio.h   |  1 +
  4 files changed, 74 insertions(+), 1 deletion(-)
 
 Qemu files changed
 
  hw/vfio_pci.c  | 105 
 +
  linux-headers/linux/vfio.h |   1 +
  2 files changed, 106 insertions(+)

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER

2013-02-03 Thread Pandarathil, Vijaymohan R
- New VFIO_SET_IRQ ioctl option to pass the eventfd that is signaled 
when
  an error occurs in the vfio_pci_device

- Register pci_error_handler for the vfio_pci driver

- When the device encounters an error, the error handler registered by
  the vfio_pci driver gets invoked by the AER infrastructure

- In the error handler, signal the eventfd registered for the device.

- This results in the qemu eventfd handler getting invoked and
  appropriate action taken for the guest.

Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com
---
 drivers/vfio/pci/vfio_pci.c | 43 -
 drivers/vfio/pci/vfio_pci_intrs.c   | 30 ++
 drivers/vfio/pci/vfio_pci_private.h |  1 +
 include/uapi/linux/vfio.h   |  1 +
 4 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index b28e66c..818b1ed 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -196,7 +196,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device 
*vdev, int irq_type)
 
return (flags  PCI_MSIX_FLAGS_QSIZE) + 1;
}
-   }
+   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX)
+   if (pci_is_pcie(vdev-pdev))
+   return 1;
 
return 0;
 }
@@ -302,6 +304,16 @@ static long vfio_pci_ioctl(void *device_data,
if (info.argsz  minsz || info.index = VFIO_PCI_NUM_IRQS)
return -EINVAL;
 
+   switch (info.index) {
+   case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
+   break;
+   case VFIO_PCI_ERR_IRQ_INDEX:
+   if (pci_is_pcie(vdev-pdev))
+   break;
+   default:
+   return -EINVAL;
+   }
+
info.flags = VFIO_IRQ_INFO_EVENTFD;
 
info.count = vfio_pci_get_irq_count(vdev, info.index);
@@ -538,11 +550,40 @@ static void vfio_pci_remove(struct pci_dev *pdev)
kfree(vdev);
 }
 
+static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+   struct vfio_pci_device *vpdev;
+   void *vdev;
+
+   vdev = vfio_device_get_from_dev(pdev-dev);
+   if (vdev == NULL)
+   return PCI_ERS_RESULT_DISCONNECT;
+
+   vpdev = vfio_device_data(vdev);
+   if (vpdev == NULL) {
+   vfio_device_put(vdev);
+   return PCI_ERS_RESULT_DISCONNECT;
+   }
+
+   if (vpdev-err_trigger)
+   eventfd_signal(vpdev-err_trigger, 1);
+
+   vfio_device_put(vdev);
+
+   return PCI_ERS_RESULT_CAN_RECOVER;
+}
+
+static struct pci_error_handlers vfio_err_handlers = {
+   .error_detected = vfio_pci_aer_err_detected,
+};
+
 static struct pci_driver vfio_pci_driver = {
.name   = vfio-pci,
.id_table   = NULL, /* only dynamic ids */
.probe  = vfio_pci_probe,
.remove = vfio_pci_remove,
+   .err_handler= vfio_err_handlers,
 };
 
 static void __exit vfio_pci_cleanup(void)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index 3639371..83035b1 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -745,6 +745,29 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_device 
*vdev,
return 0;
 }
 
+static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
+   unsigned index, unsigned start,
+   unsigned count, uint32_t flags, void *data)
+{
+   int32_t fd = *(int32_t *)data;
+
+   if ((index != VFIO_PCI_ERR_IRQ_INDEX) ||
+   !(flags  VFIO_IRQ_SET_DATA_EVENTFD))
+   return -EINVAL;
+
+   if (fd == -1) {
+   if (vdev-err_trigger)
+   eventfd_ctx_put(vdev-err_trigger);
+   vdev-err_trigger = NULL;
+   return 0;
+   } else if (fd = 0) {
+   vdev-err_trigger = eventfd_ctx_fdget(fd);
+   if (IS_ERR(vdev-err_trigger))
+   return PTR_ERR(vdev-err_trigger);
+   return 0;
+   } else
+   return -EINVAL;
+}
 int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
unsigned index, unsigned start, unsigned count,
void *data)
@@ -779,6 +802,13 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, 
uint32_t flags,
break;
}
break;
+   case VFIO_PCI_ERR_IRQ_INDEX:
+   switch (flags  VFIO_IRQ_SET_ACTION_TYPE_MASK) {
+   case VFIO_IRQ_SET_ACTION_TRIGGER:
+   if 

[PATCH] KVM: VMX: add missing exit names to VMX_EXIT_REASONS array

2013-02-03 Thread Gleb Natapov
Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 694586c..5c9dbad 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -105,7 +105,12 @@
{ EXIT_REASON_APIC_ACCESS,   APIC_ACCESS }, \
{ EXIT_REASON_EPT_VIOLATION, EPT_VIOLATION }, \
{ EXIT_REASON_EPT_MISCONFIG, EPT_MISCONFIG }, \
-   { EXIT_REASON_WBINVD,WBINVD }
+   { EXIT_REASON_WBINVD,WBINVD }, \
+   { EXIT_REASON_APIC_WRITE,APIC_WRITE }, \
+   { EXIT_REASON_EOI_INDUCED,   EOI_INDUCED }, \
+   { EXIT_REASON_INVALID_STATE, INVALID_STATE }, \
+   { EXIT_REASON_INVD,  INVD }, \
+   { EXIT_REASON_INVPCID,   INVPCID }
 
 #ifdef __KERNEL__
 
--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER

2013-02-03 Thread Blue Swirl
On Sun, Feb 3, 2013 at 2:10 PM, Pandarathil, Vijaymohan R
vijaymohan.pandarat...@hp.com wrote:
 - New VFIO_SET_IRQ ioctl option to pass the eventfd that is signaled 
 when
   an error occurs in the vfio_pci_device

 - Register pci_error_handler for the vfio_pci driver

 - When the device encounters an error, the error handler registered by
   the vfio_pci driver gets invoked by the AER infrastructure

 - In the error handler, signal the eventfd registered for the device.

 - This results in the qemu eventfd handler getting invoked and
   appropriate action taken for the guest.

 Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com
 ---
  drivers/vfio/pci/vfio_pci.c | 43 
 -
  drivers/vfio/pci/vfio_pci_intrs.c   | 30 ++
  drivers/vfio/pci/vfio_pci_private.h |  1 +
  include/uapi/linux/vfio.h   |  1 +
  4 files changed, 74 insertions(+), 1 deletion(-)

 diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
 index b28e66c..818b1ed 100644
 --- a/drivers/vfio/pci/vfio_pci.c
 +++ b/drivers/vfio/pci/vfio_pci.c
 @@ -196,7 +196,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device 
 *vdev, int irq_type)

 return (flags  PCI_MSIX_FLAGS_QSIZE) + 1;
 }
 -   }
 +   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX)
 +   if (pci_is_pcie(vdev-pdev))
 +   return 1;

 return 0;
  }
 @@ -302,6 +304,16 @@ static long vfio_pci_ioctl(void *device_data,
 if (info.argsz  minsz || info.index = VFIO_PCI_NUM_IRQS)
 return -EINVAL;

 +   switch (info.index) {
 +   case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
 +   break;
 +   case VFIO_PCI_ERR_IRQ_INDEX:
 +   if (pci_is_pcie(vdev-pdev))
 +   break;

I don't know what is the policy in Linux kernel for this, but I'd add
a comment about fall through here.

 +   default:
 +   return -EINVAL;
 +   }
 +
 info.flags = VFIO_IRQ_INFO_EVENTFD;

 info.count = vfio_pci_get_irq_count(vdev, info.index);
 @@ -538,11 +550,40 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 kfree(vdev);
  }

 +static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 + pci_channel_state_t state)
 +{
 +   struct vfio_pci_device *vpdev;
 +   void *vdev;
 +
 +   vdev = vfio_device_get_from_dev(pdev-dev);
 +   if (vdev == NULL)
 +   return PCI_ERS_RESULT_DISCONNECT;
 +
 +   vpdev = vfio_device_data(vdev);
 +   if (vpdev == NULL) {
 +   vfio_device_put(vdev);
 +   return PCI_ERS_RESULT_DISCONNECT;
 +   }
 +
 +   if (vpdev-err_trigger)
 +   eventfd_signal(vpdev-err_trigger, 1);
 +
 +   vfio_device_put(vdev);
 +
 +   return PCI_ERS_RESULT_CAN_RECOVER;
 +}
 +
 +static struct pci_error_handlers vfio_err_handlers = {
 +   .error_detected = vfio_pci_aer_err_detected,
 +};
 +
  static struct pci_driver vfio_pci_driver = {
 .name   = vfio-pci,
 .id_table   = NULL, /* only dynamic ids */
 .probe  = vfio_pci_probe,
 .remove = vfio_pci_remove,
 +   .err_handler= vfio_err_handlers,
  };

  static void __exit vfio_pci_cleanup(void)
 diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
 b/drivers/vfio/pci/vfio_pci_intrs.c
 index 3639371..83035b1 100644
 --- a/drivers/vfio/pci/vfio_pci_intrs.c
 +++ b/drivers/vfio/pci/vfio_pci_intrs.c
 @@ -745,6 +745,29 @@ static int vfio_pci_set_msi_trigger(struct 
 vfio_pci_device *vdev,
 return 0;
  }

 +static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
 +   unsigned index, unsigned start,
 +   unsigned count, uint32_t flags, void 
 *data)
 +{
 +   int32_t fd = *(int32_t *)data;
 +
 +   if ((index != VFIO_PCI_ERR_IRQ_INDEX) ||
 +   !(flags  VFIO_IRQ_SET_DATA_EVENTFD))
 +   return -EINVAL;
 +
 +   if (fd == -1) {
 +   if (vdev-err_trigger)
 +   eventfd_ctx_put(vdev-err_trigger);
 +   vdev-err_trigger = NULL;
 +   return 0;
 +   } else if (fd = 0) {
 +   vdev-err_trigger = eventfd_ctx_fdget(fd);
 +   if (IS_ERR(vdev-err_trigger))
 +   return PTR_ERR(vdev-err_trigger);
 +   return 0;
 +   } else
 +   return -EINVAL;
 +}
  int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
 unsigned index, unsigned start, unsigned count,
 void *data)
 @@ -779,6 +802,13 @@ int 

Re: [PATCH v3 3/3] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices

2013-02-03 Thread Blue Swirl
On Sun, Feb 3, 2013 at 2:10 PM, Pandarathil, Vijaymohan R
vijaymohan.pandarat...@hp.com wrote:
 - Create eventfd per vfio device assigned to a guest and register an
   event handler

 - This fd is passed to the vfio_pci driver through the SET_IRQ ioctl

 - When the device encounters an error, the eventfd is signalled
   and the qemu eventfd handler gets invoked.

 - In the handler decide what action to take. Current action taken
   is to terminate the guest.

Usually this is not OK, but I guess this is not guest triggerable.


 Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com
 ---
  hw/vfio_pci.c  | 105 
 +
  linux-headers/linux/vfio.h |   1 +
  2 files changed, 106 insertions(+)

 diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
 index c51ae67..4e2f768 100644
 --- a/hw/vfio_pci.c
 +++ b/hw/vfio_pci.c
 @@ -130,6 +130,8 @@ typedef struct VFIODevice {
  QLIST_ENTRY(VFIODevice) next;
  struct VFIOGroup *group;
  bool reset_works;
 +EventNotifier err_notifier;
 +bool pci_aer;
  } VFIODevice;

  typedef struct VFIOGroup {
 @@ -1922,6 +1924,106 @@ static void vfio_put_device(VFIODevice *vdev)
  }
  }

 +static void vfio_err_notifier_handler(void *opaque)
 +{
 +VFIODevice *vdev = opaque;
 +
 +if (!event_notifier_test_and_clear(vdev-err_notifier)) {
 +return;
 +}
 +
 +/*
 + * TBD. Retrieve the error details and decide what action
 + * needs to be taken. One of the actions could be to pass
 + * the error to the guest and have the guest driver recover
 + * from the error. This requires that PCIe capabilities be
 + * exposed to the guest. At present, we just terminate the
 + * guest to contain the error.
 + */
 +
 +error_report(%s (%04x:%02x:%02x.%x)
 +Unrecoverable error detected... Terminating guest\n,
 +__func__, vdev-host.domain, vdev-host.bus,
 +vdev-host.slot, vdev-host.function);
 +
 +hw_error((%04x:%02x:%02x.%x) Unrecoverable device error\n,
 +vdev-host.domain, vdev-host.bus,
 +vdev-host.slot, vdev-host.function);
 +
 +return;

Useless, please remove.

 +}
 +
 +static void vfio_register_err_notifier(VFIODevice *vdev)
 +{
 +int ret;
 +int argsz;
 +struct vfio_irq_set *irq_set;
 +int32_t *pfd;
 +
 +if (event_notifier_init(vdev-err_notifier, 0)) {
 +error_report(vfio: Warning: Unable to init event notifier for error 
 detection\n);
 +return;
 +}
 +
 +argsz = sizeof(*irq_set) + sizeof(*pfd);
 +
 +irq_set = g_malloc0(argsz);
 +irq_set-argsz = argsz;
 +irq_set-flags = VFIO_IRQ_SET_DATA_EVENTFD |
 + VFIO_IRQ_SET_ACTION_TRIGGER;
 +irq_set-index = VFIO_PCI_ERR_IRQ_INDEX;
 +irq_set-start = 0;
 +irq_set-count = 1;
 +pfd = (int32_t *)irq_set-data;
 +
 +*pfd = event_notifier_get_fd(vdev-err_notifier);
 +qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev);
 +
 +ret = ioctl(vdev-fd, VFIO_DEVICE_SET_IRQS, irq_set);
 +if (ret) {
 +DPRINTF(vfio: Error notification not supported for the device\n);
 +qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
 +event_notifier_cleanup(vdev-err_notifier);
 +g_free(irq_set);
 +return;
 +}
 +g_free(irq_set);
 +vdev-pci_aer = 1;
 +return;

Ditto.

 +}
 +static void vfio_unregister_err_notifier(VFIODevice *vdev)
 +{
 +int argsz;
 +struct vfio_irq_set *irq_set;
 +int32_t *pfd;
 +int ret;
 +
 +if (!vdev-pci_aer) {
 +return;
 +}
 +
 +argsz = sizeof(*irq_set) + sizeof(*pfd);
 +
 +irq_set = g_malloc0(argsz);
 +irq_set-argsz = argsz;
 +irq_set-flags = VFIO_IRQ_SET_DATA_EVENTFD |
 + VFIO_IRQ_SET_ACTION_TRIGGER;
 +irq_set-index = VFIO_PCI_ERR_IRQ_INDEX;
 +irq_set-start = 0;
 +irq_set-count = 1;
 +pfd = (int32_t *)irq_set-data;
 +*pfd = -1;
 +
 +ret = ioctl(vdev-fd, VFIO_DEVICE_SET_IRQS, irq_set);
 +if (ret) {
 +DPRINTF(vfio: Failed to de-assign error fd: %d\n, ret);
 +}
 +g_free(irq_set);
 +qemu_set_fd_handler(event_notifier_get_fd(vdev-err_notifier),
 +NULL, NULL, vdev);
 +event_notifier_cleanup(vdev-err_notifier);
 +return;

Ditto.

 +}
  static int vfio_initfn(PCIDevice *pdev)
  {
  VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
 @@ -2032,6 +2134,8 @@ static int vfio_initfn(PCIDevice *pdev)
  }
  }

 +vfio_register_err_notifier(vdev);
 +
  return 0;

  out_teardown:
 @@ -2049,6 +2153,7 @@ static void vfio_exitfn(PCIDevice *pdev)
  VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
  VFIOGroup *group = vdev-group;

 +vfio_unregister_err_notifier(vdev);
  pci_device_set_intx_routing_notifier(vdev-pdev, NULL);
  vfio_disable_interrupts(vdev);
  if (vdev-intx.mmap_timer) {
 

Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting

2013-02-03 Thread Marcelo Tosatti
On Thu, Jan 31, 2013 at 03:55:56PM +0200, Gleb Natapov wrote:
 On Thu, Jan 31, 2013 at 11:44:43AM -0200, Marcelo Tosatti wrote:
  On Thu, Jan 31, 2013 at 03:38:37PM +0200, Gleb Natapov wrote:
   On Thu, Jan 31, 2013 at 11:32:45AM -0200, Marcelo Tosatti wrote:
On Thu, Jan 31, 2013 at 11:43:48AM +0200, Gleb Natapov wrote:
 On Wed, Jan 30, 2013 at 09:03:11PM -0200, Marcelo Tosatti wrote:
  Posted interrupt patch:
  2) Must move IN_GUEST_MODE assignment after local_irq_disable, in
  vcpu_enter_guest function. Otherwise:
  
  cpu0vcpu1-cpu1
  
  vcpu-mode = IN_GUEST_MODE
  
  if IN_GUEST_MODE == true
  send IPI
  local_irq_disable
  
  PIR not transferred to VIRR, misses interrupt.
  
 cpu0 will set KVM_REQ_EVENT, so vmentry will be aborted after
 local_irq_disable() by -requests check.

Yes, but you don't want KVM_REQ_EVENT+kick. It defeats the purpose 
of posted interrupts. You want

if vcpu in guest mode
send posted interrupt IPI
else
KVM_REQ_EVENT+kick

   I am thinking:
   
set KVM_REQ_EVENT
if pi is enabled
send posted interrupt IPI
else
kick
  
  KVM_REQ_EVENT must be after sending posted interrupt IPI. Otherwise on
  the vcpu entry side
  
  test_and_clear(KVM_REQ_EVENT) {
  No bits set in PIR
 }
  
 It should be after updating PIR, but before sending posted interrupt
 IPI. Otherwise:
 
  cpu0 cpu1/vcpu
 
   KVM_REQ_EVENT is not set
 set pir 
 send IPI
   irq_disable()
   -request is empty.
 set KVM_REQ_EVENT
 
 That's the same sequence as with IRR update, KVM_REQ_EVENT and kick
 today.

Can only send IPI if vcpu-mode == IN_GUEST_MODE, which must be set
after interrupt flag is cleared as noted.

Also KVM_REQ_EVENT is processed outside of interrupt disabled region today.

Or maybe i don't get what you say... write a complete
description?

  What about item 4 below?
  
 That's for Yang to answer :)

If more than one interrupt is generated with the same vector number,
the local APIC can set the bit for the vector both in the IRR and ISR.
This means that for the Pentium 4 and Intel Xeon processors, the IRR
and ISR can queue two interrupts for each interrupt vector: one in the
IRR and one in the ISR. Any additional interrupts issued for the same
interrupt vector are collapsed into the single bit in IRR

Which would mean KVM emulation differs... Eventually 3 interrupts can
be queued: one in IRR, one in ISR, one in PIR.

Any example how software relies on such two-interrupts-queued-in-IRR/ISR 
behaviour?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/8] Added ONE_REG interface for debug instruction

2013-02-03 Thread Paul Mackerras
On Wed, Jan 16, 2013 at 01:54:41PM +0530, Bharat Bhushan wrote:
 This patch adds the one_reg interface to get the special instruction
 to be used for setting software breakpoint from userspace.

Since this presumably is constant for any given platform, wouldn't
a capability be more appropriate?

Paul.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


How to limit upload bandwidth for a guest server?

2013-02-03 Thread Neil Aggarwal
Hello all:

I have a CentOS server using KVM to host guest servers.
I am trying to limit the bandwidth usable by a guest server.

I tried to use tc, but that is only limiting the download bandwidth
to a server.  It does not seem to filter packets uploaded by the
server.

Is there a tool to limit upload traffic for a guest server?

Thanks,
  Neil

--
Neil Aggarwal, (972)834-1565, http://UnmeteredVPS.net/centos
Virtual private server with CentOS 6 preinstalled
Unmetered bandwidth = no overage charges


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] KVM: VMX: disable SMEP feature when guest is in non-paging mode

2013-02-03 Thread Dongxiao Xu
Changes from v1 to v2:
 - Modify commit message and comments according to Gleb's suggestions.

SMEP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMEP needs to be manually
disabled when guest switches to non-paging mode.

We met an issue that, SMP Linux guest with recent kernel (enable
SMEP support, for example, 3.5.3) would crash with triple fault if
setting unrestricted_guest=0. This is because KVM uses an identity
mapping page table to emulate the non-paging mode, where the page
table is set with USER flag. If SMEP is still enabled in this case,
guest will meet unhandlable page fault and then crash.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
Signed-off-by: Xiantao Zhang xiantao.zh...@intel.com
---
 arch/x86/kvm/vmx.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9120ae1..70393e2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3155,6 +3155,14 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned 
long cr4)
if (!is_paging(vcpu)) {
hw_cr4 = ~X86_CR4_PAE;
hw_cr4 |= X86_CR4_PSE;
+   /*
+* SMEP is disabled if CPU is in non-paging mode in
+* hardware. However KVM always uses paging mode to
+* emulate guest non-paging mode with TDP.
+* To emulate this behavior, SMEP needs to be manually
+* disabled when guest switches to non-paging mode.
+*/
+   hw_cr4 = ~X86_CR4_SMEP;
} else if (!(cr4  X86_CR4_PAE)) {
hw_cr4 = ~X86_CR4_PAE;
}
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: KVM: VMX: disable SMEP feature when guest is in non-paging mode

2013-02-03 Thread Xu, Dongxiao
 -Original Message-
 From: Gleb Natapov [mailto:g...@redhat.com]
 Sent: Sunday, February 03, 2013 9:57 PM
 To: Xu, Dongxiao
 Cc: kvm@vger.kernel.org
 Subject: Re: KVM: VMX: disable SMEP feature when guest is in non-paging
 mode
 
 On Fri, Feb 01, 2013 at 08:30:07AM -, Xu wrote:
  SMEP is disabled if CPU is in non-paging mode in hardware.
  However KVM always uses paging mode to emulate guest non-paging
  mode with HAP.
 Not always, only if unrestricted mode is disabled, since vm86 mode, that
 is used otherwise, requires paging.
 
 To emulate this behavior, SMEP needs to be manually
  disabled when guest switches to non-paging mode.
 
  We met an issue that, SMP Linux guest with recent kernel (enable
  SMEP support, for example, 3.5.3) would crash with triple fault if
  setting unrestricted_guest=0. This is because KVM uses an identity
  mapping page table to emulate the non-paging mode, where the page
  table is set with USER flag. If SMEP is still enabled in this case,
  guest will meet unhandlable page fault and then crash.
 
  Signed-off-by: Dongxiao Xu dongxiao...@intel.com
  Signed-off-by: Xiantao Zhang xiantao.zh...@intel.com
 Reviewed-by: Gleb Natapov g...@redhat.com
 
 But HAP is XEN terminology AFAIK. KVM speak is tdp (two dimensional
 paging). If would be nice to change it in the commit message and the
 comment before committing.

Thanks for reminding this. I've sent out the v2 patch to address this issue.

Thanks,
Dongxiao

 
 
  ---
  arch/x86/kvm/vmx.c |8 
   1 files changed, 8 insertions(+), 0 deletions(-)
 
  diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
  index 9120ae1..e82f20d 100644
  --- a/arch/x86/kvm/vmx.c
  +++ b/arch/x86/kvm/vmx.c
  @@ -3155,6 +3155,14 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu,
 unsigned long cr4)
  if (!is_paging(vcpu)) {
  hw_cr4 = ~X86_CR4_PAE;
  hw_cr4 |= X86_CR4_PSE;
  +   /*
  +* SMEP is disabled if CPU is in non-paging mode in
  +* hardware. However KVM always uses paging mode to
  +* emulate guest non-paging mode with HAP.
  +* To emulate this behavior, SMEP needs to be manually
  +* disabled when guest switches to non-paging mode.
  +*/
  +   hw_cr4 = ~X86_CR4_SMEP;
  } else if (!(cr4  X86_CR4_PAE)) {
  hw_cr4 = ~X86_CR4_PAE;
  }
 
 --
   Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 8/8] KVM:PPC:booke: Allow debug interrupt injection to guest

2013-02-03 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Saturday, February 02, 2013 4:09 AM
 To: Alexander Graf
 Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; kvm@vger.kernel.org
 Subject: Re: [PATCH 8/8] KVM:PPC:booke: Allow debug interrupt injection to 
 guest
 
 On 01/31/2013 06:11:32 PM, Alexander Graf wrote:
 
  On 31.01.2013, at 23:40, Scott Wood wrote:
 
   On 01/31/2013 01:20:39 PM, Alexander Graf wrote:
   On 31.01.2013, at 20:05, Alexander Graf wrote:
   
On 31.01.2013, at 19:54, Scott Wood wrote:
   
On 01/31/2013 12:52:41 PM, Alexander Graf wrote:
On 31.01.2013, at 19:43, Scott Wood wrote:
On 01/31/2013 12:21:07 PM, Alexander Graf wrote:
How about something like this? Then both targets at least
  suck as much :).
   
I'm not sure that should be the goal...
   
Thanks to e500mc's awful hardware design, we don't know who
  sets the MSR_DE bit. Once we forced it onto the guest, we have no
  change to know whether the guest also set it or not. We could only
  guess.
   
MSRP[DEP] can prevent the guest from modifying MSR[DE] -- but
  we still need to set it in the first place.
   
According to ISA V2.06B, the hypervisor should set DBCR0[EDM]
  to let the guest know that the debug resources are not available, and
  that the value of MSR[DE] is not specified and not modifiable.
So what would the guest do then to tell the hypervisor that it
  actually wants to know about debug events?
   
The guest is out of luck, just as if a JTAG were in use.
   
Hrm.
   
Can we somehow generalize this out of luck behavior?
   
Every time we would set or clear an MSR bit in shadow_msr on
  e500v2, we would instead set or clear it in the real MSR. That way
  only e500mc is out of luck, but the code would still be shared.
  
   I don't follow.  e500v2 is just as out-of-luck.  The mechanism
  simply does not support sharing debug resources.
 
  For e500v2 we have 2 fields
 
* MSR as the guest sees it
* MSR as we execute when the guest runs
 
  Since we know the MSR when the guest sees it, we can decide what to do
  when we get an unhandled debug interrupt.
 
 That's not the same thing as making the real MSR[DE] show up in the guest
 MSR[DE].
 
 There are other problems with sharing -- what happens when both host and guest
 try to write to a particular IAC or DAC?
 
 Also, performance would be pretty awful if the guest has e.g. single stepping 
 in
 DBCR0 enabled but MSR[DE]=0, and the host doesn't care about single stepping
 (but does want debugging enabled in general).
 
   What do you mean by the real MSR?  The real MSR is shadow_msr,
  and MSR_DE must always be set there if the host is debugging the
  guest.  As for reflecting it into the guest MSR, we could, but I don't
  really see the point.  We're never going to actually send a debug
  exception to the guest when the host owns the debug resources.
 
  Why not? That's the whole point of jumping through user space.
 
 That's still needed for software breakpoints, which don't rely on the debug
 resources.
 
1) guest exits with debug interrupt
2) QEMU gets a debug exit
3) QEMU checks in its list whether it belongs to its own debug
  points
4) if not, it reinjects the interrupt into the guest
 
  Step 4 is pretty difficult to do when we don't know whether the guest
  is actually capable of handling debug interrupts at that moment.
 
 Software breakpoints take a Program interrupt rather than a Debug interrupt,
 unless MSR[DE]=1 and DBCR0[TRAP]=1.  If the guest does not own debug resources
 we should always send it to the Program interrupt, so MSR[DE] doesn't matter.
 
   The = ~MSR_DE line is pointless on bookehv, and makes it harder
  to read.  I had to stare at it a while before noticing that you
  initially set is_debug from the guest MSR and that you'd never really
  clear MSR_DE here on bookehv.
 
  Well, I'm mostly bouncing ideas here to find a way to express what
  we're trying to say in a way that someone who hasn't read this email
  thread would still understand what's going on :).
 
 I think it's already straightforward enough if you accept that shared debug
 resources aren't supported, and that we are either in a mode where the real
 MSR[DE] reflects the guest MSR[DE], or a mode where the real MSR[DE] is always
 on in guest mode and the guest MSR[DE] is irrelevant.
 
  How about this version?
 
 
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index
  38a62ef..9929c41 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -133,6 +133,28 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu
  *vcpu)
   #endif
   }
 
  +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu) { #ifndef
  +CONFIG_KVM_BOOKE_HV
  +   /* Synchronize guest's desire to get debug interrupts into
  shadow MSR */
  +   vcpu-arch.shadow_msr = ~MSR_DE;
  +   vcpu-arch.shadow_msr |= vcpu-arch.shared-msr  MSR_DE; #endif
  +
  +   /* Force enable debug 

One reg interface for Timer register

2013-02-03 Thread Bhushan Bharat-R65777
Hi Alex/Scott,

Below is my understanding about the ONE_REG interface requirement for timer 
registers.

Define the below 2 ONE_REG interface for TSR access:
KVM_REG_SET_TSR,  // Set the specified bits in TSR
KVM_REG_CLEAR_TSR, // Clear the specified bits in TSR

QEMU will use the above ioctl call to selectively set/clear bits of TSR.
We do not need the similar interface for TCR as there is no race issue with 
TCR. So for TCR QEMU will keep on using the SREGS interface.

Thanks
-Bharat


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/8] Added ONE_REG interface for debug instruction

2013-02-03 Thread Paul Mackerras
On Wed, Jan 16, 2013 at 01:54:41PM +0530, Bharat Bhushan wrote:
 This patch adds the one_reg interface to get the special instruction
 to be used for setting software breakpoint from userspace.

Since this presumably is constant for any given platform, wouldn't
a capability be more appropriate?

Paul.
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 8/8] KVM:PPC:booke: Allow debug interrupt injection to guest

2013-02-03 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Saturday, February 02, 2013 4:09 AM
 To: Alexander Graf
 Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org
 Subject: Re: [PATCH 8/8] KVM:PPC:booke: Allow debug interrupt injection to 
 guest
 
 On 01/31/2013 06:11:32 PM, Alexander Graf wrote:
 
  On 31.01.2013, at 23:40, Scott Wood wrote:
 
   On 01/31/2013 01:20:39 PM, Alexander Graf wrote:
   On 31.01.2013, at 20:05, Alexander Graf wrote:
   
On 31.01.2013, at 19:54, Scott Wood wrote:
   
On 01/31/2013 12:52:41 PM, Alexander Graf wrote:
On 31.01.2013, at 19:43, Scott Wood wrote:
On 01/31/2013 12:21:07 PM, Alexander Graf wrote:
How about something like this? Then both targets at least
  suck as much :).
   
I'm not sure that should be the goal...
   
Thanks to e500mc's awful hardware design, we don't know who
  sets the MSR_DE bit. Once we forced it onto the guest, we have no
  change to know whether the guest also set it or not. We could only
  guess.
   
MSRP[DEP] can prevent the guest from modifying MSR[DE] -- but
  we still need to set it in the first place.
   
According to ISA V2.06B, the hypervisor should set DBCR0[EDM]
  to let the guest know that the debug resources are not available, and
  that the value of MSR[DE] is not specified and not modifiable.
So what would the guest do then to tell the hypervisor that it
  actually wants to know about debug events?
   
The guest is out of luck, just as if a JTAG were in use.
   
Hrm.
   
Can we somehow generalize this out of luck behavior?
   
Every time we would set or clear an MSR bit in shadow_msr on
  e500v2, we would instead set or clear it in the real MSR. That way
  only e500mc is out of luck, but the code would still be shared.
  
   I don't follow.  e500v2 is just as out-of-luck.  The mechanism
  simply does not support sharing debug resources.
 
  For e500v2 we have 2 fields
 
* MSR as the guest sees it
* MSR as we execute when the guest runs
 
  Since we know the MSR when the guest sees it, we can decide what to do
  when we get an unhandled debug interrupt.
 
 That's not the same thing as making the real MSR[DE] show up in the guest
 MSR[DE].
 
 There are other problems with sharing -- what happens when both host and guest
 try to write to a particular IAC or DAC?
 
 Also, performance would be pretty awful if the guest has e.g. single stepping 
 in
 DBCR0 enabled but MSR[DE]=0, and the host doesn't care about single stepping
 (but does want debugging enabled in general).
 
   What do you mean by the real MSR?  The real MSR is shadow_msr,
  and MSR_DE must always be set there if the host is debugging the
  guest.  As for reflecting it into the guest MSR, we could, but I don't
  really see the point.  We're never going to actually send a debug
  exception to the guest when the host owns the debug resources.
 
  Why not? That's the whole point of jumping through user space.
 
 That's still needed for software breakpoints, which don't rely on the debug
 resources.
 
1) guest exits with debug interrupt
2) QEMU gets a debug exit
3) QEMU checks in its list whether it belongs to its own debug
  points
4) if not, it reinjects the interrupt into the guest
 
  Step 4 is pretty difficult to do when we don't know whether the guest
  is actually capable of handling debug interrupts at that moment.
 
 Software breakpoints take a Program interrupt rather than a Debug interrupt,
 unless MSR[DE]=1 and DBCR0[TRAP]=1.  If the guest does not own debug resources
 we should always send it to the Program interrupt, so MSR[DE] doesn't matter.
 
   The = ~MSR_DE line is pointless on bookehv, and makes it harder
  to read.  I had to stare at it a while before noticing that you
  initially set is_debug from the guest MSR and that you'd never really
  clear MSR_DE here on bookehv.
 
  Well, I'm mostly bouncing ideas here to find a way to express what
  we're trying to say in a way that someone who hasn't read this email
  thread would still understand what's going on :).
 
 I think it's already straightforward enough if you accept that shared debug
 resources aren't supported, and that we are either in a mode where the real
 MSR[DE] reflects the guest MSR[DE], or a mode where the real MSR[DE] is always
 on in guest mode and the guest MSR[DE] is irrelevant.
 
  How about this version?
 
 
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index
  38a62ef..9929c41 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -133,6 +133,28 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu
  *vcpu)
   #endif
   }
 
  +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu) { #ifndef
  +CONFIG_KVM_BOOKE_HV
  +   /* Synchronize guest's desire to get debug interrupts into
  shadow MSR */
  +   vcpu-arch.shadow_msr = ~MSR_DE;
  +   vcpu-arch.shadow_msr |= vcpu-arch.shared-msr  MSR_DE; #endif
  +
  +   /* Force enable debug 

One reg interface for Timer register

2013-02-03 Thread Bhushan Bharat-R65777
Hi Alex/Scott,

Below is my understanding about the ONE_REG interface requirement for timer 
registers.

Define the below 2 ONE_REG interface for TSR access:
KVM_REG_SET_TSR,  // Set the specified bits in TSR
KVM_REG_CLEAR_TSR, // Clear the specified bits in TSR

QEMU will use the above ioctl call to selectively set/clear bits of TSR.
We do not need the similar interface for TCR as there is no race issue with 
TCR. So for TCR QEMU will keep on using the SREGS interface.

Thanks
-Bharat


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html