Re: [PATCH 3/3] ALSA: intel8x0: Check return value before assigning

2013-01-28 Thread Takashi Iwai
At Sun, 27 Jan 2013 17:40:02 +0100,
Emil Goode wrote:
> 
> The first patch in this series changes the return type of function
> kvm_para_available to bool. In the function snd_intel8x0_inside_vm
> we now need to check it's return value before assigning a value to
> the result variable.
> 
> Signed-off-by: Emil Goode 
> ---
>  sound/pci/intel8x0.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c
> index 3b9be75..fa9337e 100644
> --- a/sound/pci/intel8x0.c
> +++ b/sound/pci/intel8x0.c
> @@ -2982,7 +2982,7 @@ static int snd_intel8x0_inside_vm(struct pci_dev *pci)
>   }
>  
>   /* detect KVM and Parallels virtual environments */
> - result = kvm_para_available();
> + result = kvm_para_available() ? 1 : 0;

This is superfluous.  The conversion from bool to int works implicitly
like that.


Takashi
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] KVM: set_memory_region: Identify the requested change explicitly

2013-01-28 Thread Takuya Yoshikawa
KVM_SET_USER_MEMORY_REGION forces __kvm_set_memory_region() to identify
what kind of change is being requested by checking the arguments.  The
current code does this checking at various points in code and each
condition being used there is not easy to understand at first glance.

This patch consolidates these checks and introduces an enum to name the
possible changes to clean up the code.

Although this does not introduce any functional changes, there is one
change which optimizes the code a bit: if we have nothing to change, the
new code returns 0 immediately.

Note that the return value for this case cannot be changed since QEMU
relies on it: we noticed this when we changed it to -EINVAL and got a
section mismatch error at the final stage of live migration.

Signed-off-by: Takuya Yoshikawa 
---
v2: updated iommu related parts

 virt/kvm/kvm_main.c |   64 +++
 1 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3fec2cd..8ea447b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -714,6 +714,24 @@ static struct kvm_memslots *install_new_memslots(struct 
kvm *kvm,
 }
 
 /*
+ * KVM_SET_USER_MEMORY_REGION ioctl allows the following operations:
+ * - create a new memory slot
+ * - delete an existing memory slot
+ * - modify an existing memory slot
+ *   -- move it in the guest physical memory space
+ *   -- just change its flags
+ *
+ * Since flags can be changed by some of these operations, the following
+ * differentiation is the best we can do for __kvm_set_memory_region():
+ */
+enum kvm_mr_change {
+   KVM_MR_CREATE,
+   KVM_MR_DELETE,
+   KVM_MR_MOVE,
+   KVM_MR_FLAGS_ONLY,
+};
+
+/*
  * Allocate some memory and give it an address in the guest physical address
  * space.
  *
@@ -732,6 +750,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
struct kvm_memory_slot old, new;
struct kvm_memslots *slots = NULL, *old_memslots;
bool old_iommu_mapped;
+   enum kvm_mr_change change;
 
r = check_memory_region_flags(mem);
if (r)
@@ -775,17 +794,30 @@ int __kvm_set_memory_region(struct kvm *kvm,
 
old_iommu_mapped = old.npages;
 
-   /*
-* Disallow changing a memory slot's size or changing anything about
-* zero sized slots that doesn't involve making them non-zero.
-*/
r = -EINVAL;
-   if (npages && old.npages && npages != old.npages)
-   goto out;
-   if (!npages && !old.npages)
+   if (npages) {
+   if (!old.npages)
+   change = KVM_MR_CREATE;
+   else { /* Modify an existing slot. */
+   if ((mem->userspace_addr != old.userspace_addr) ||
+   (npages != old.npages))
+   goto out;
+
+   if (base_gfn != old.base_gfn)
+   change = KVM_MR_MOVE;
+   else if (new.flags != old.flags)
+   change = KVM_MR_FLAGS_ONLY;
+   else { /* Nothing to change. */
+   r = 0;
+   goto out;
+   }
+   }
+   } else if (old.npages) {
+   change = KVM_MR_DELETE;
+   } else /* Modify a non-existent slot: disallowed. */
goto out;
 
-   if ((npages && !old.npages) || (base_gfn != old.base_gfn)) {
+   if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
/* Check for overlaps */
r = -EEXIST;
kvm_for_each_memslot(slot, kvm->memslots) {
@@ -803,20 +835,12 @@ int __kvm_set_memory_region(struct kvm *kvm,
new.dirty_bitmap = NULL;
 
r = -ENOMEM;
-
-   /*
-* Allocate if a slot is being created.  If modifying a slot,
-* the userspace_addr cannot change.
-*/
-   if (!old.npages) {
+   if (change == KVM_MR_CREATE) {
new.user_alloc = user_alloc;
new.userspace_addr = mem->userspace_addr;
 
if (kvm_arch_create_memslot(&new, npages))
goto out_free;
-   } else if (npages && mem->userspace_addr != old.userspace_addr) {
-   r = -EINVAL;
-   goto out_free;
}
 
/* Allocate page dirty bitmap if needed */
@@ -825,7 +849,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
goto out_free;
}
 
-   if (!npages || base_gfn != old.base_gfn) {
+   if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
r = -ENOMEM;
slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
GFP_KERNEL);
@@ -876,7 +900,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 * slots (size changes, userspace addr changes) is disallowed above,
 * so any 

Re: [PATCH] vhost-net: fall back to vmalloc if high-order allocation fails

2013-01-28 Thread Romain Francoise
David Miller  writes:

> I'm not going to apply this vmalloc patch, because if I apply it the
> fundamental problem here just gets swept under the carpet even longer.

No problem, I'll keep this as a local change until vhost-net's allocation
strategy gains some sanity.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/3] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices

2013-01-28 Thread Pandarathil, Vijaymohan R

- Create eventfd per vfio device assigned to a guest and register an
  event handler

- This fd is passed to the vfio_pci driver through the SET_IRQ ioctl

- When the device encounters an error, the eventfd is signalled
  and the qemu eventfd handler gets invoked.

- In the handler decide what action to take. Current action taken
  is to terminate the guest.

Signed-off-by: Vijay Mohan Pandarathil 
---
 hw/vfio_pci.c  | 106 +
 linux-headers/linux/vfio.h |   3 ++
 2 files changed, 109 insertions(+)

diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
index c51ae67..c4a6aec 100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -130,6 +130,8 @@ typedef struct VFIODevice {
 QLIST_ENTRY(VFIODevice) next;
 struct VFIOGroup *group;
 bool reset_works;
+bool pci_aer_notify_cap;
+EventNotifier err_notifier;
 } VFIODevice;
 
 typedef struct VFIOGroup {
@@ -1837,6 +1839,10 @@ static int vfio_get_device(VFIOGroup *group, const char 
*name, VFIODevice *vdev)
 error_report("Warning, device %s does not support reset\n", name);
 }
 
+vdev->pci_aer_notify_cap = !!(dev_info.flags &
+(VFIO_DEVICE_FLAGS_PCI_AER |
+VFIO_DEVICE_FLAGS_PCI_AER_NOTIFY));
+
 if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
 error_report("vfio: unexpected number of io regions %u\n",
  dev_info.num_regions);
@@ -1922,6 +1928,103 @@ static void vfio_put_device(VFIODevice *vdev)
 }
 }
 
+static void vfio_err_notifier_handler(void *opaque)
+{
+VFIODevice *vdev = opaque;
+
+if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
+return;
+}
+
+/*
+ * TBD. Retrieve the error details and decide what action
+ * needs to be taken. One of the actions could be to pass
+ * the error to the guest and have the guest driver recover
+ * from the error. This requires that PCIe capabilities be
+ * exposed to the guest. At present, we just terminate the
+ * guest to contain the error.
+ */
+
+error_report("%s (%04x:%02x:%02x.%x)"
+"Unrecoverable error detected... Terminating guest\n",
+__func__, vdev->host.domain, vdev->host.bus,
+vdev->host.slot, vdev->host.function);
+
+hw_error("(%04x:%02x:%02x.%x) Unrecoverable device error\n",
+vdev->host.domain, vdev->host.bus,
+vdev->host.slot, vdev->host.function);
+
+return;
+}
+
+static void vfio_register_err_notifier(VFIODevice *vdev)
+{
+int ret;
+int argsz;
+struct vfio_irq_set *irq_set;
+int32_t *pfd;
+
+if (!(vdev->pci_aer_notify_cap)) {
+DPRINTF("vfio: Error notification not supported for the device\n");
+return;
+}
+if (event_notifier_init(&vdev->err_notifier, 0)) {
+error_report("vfio: Warning: Unable to init event notifier for error 
detection\n");
+return;
+}
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = VFIO_PCI_ERR_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *)&irq_set->data;
+
+*pfd = event_notifier_get_fd(&vdev->err_notifier);
+qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev);
+
+ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+if (ret) {
+error_report("vfio: Warning: Failed to setup error fd: %d\n", ret);
+qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
+event_notifier_cleanup(&vdev->err_notifier);
+}
+g_free(irq_set);
+return;
+}
+static void vfio_unregister_err_notifier(VFIODevice *vdev)
+{
+int argsz;
+struct vfio_irq_set *irq_set;
+int32_t *pfd;
+int ret;
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = VFIO_PCI_ERR_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *)&irq_set->data;
+*pfd = -1;
+
+ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+if (ret) {
+DPRINTF("vfio: Failed to de-assign error fd: %d\n", ret);
+}
+g_free(irq_set);
+qemu_set_fd_handler(event_notifier_get_fd(&vdev->err_notifier),
+NULL, NULL, vdev);
+event_notifier_cleanup(&vdev->err_notifier);
+return;
+}
 static int vfio_initfn(PCIDevice *pdev)
 {
 VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
@@ -2032,6 +2135,8 @@ static int vfio_initfn(PCIDevice *pdev)
 }
 }
 
+vfio_register_err_notifier(vdev);
+
 return 0;
 
 out_teardown:
@@ -2049,6 +2154,7 @@ static void vfio_exitfn(PCIDevice *pdev)

[PATCH v2 0/3] AER-KVM: Error containment of VFIO devices assigned to KVM guests

2013-01-28 Thread Pandarathil, Vijaymohan R
Add support for error containment when a VFIO device assigned to a KVM
guest encounters an error. This is for PCIe devices/drivers that support AER
functionality. When the host OS is notified of an error in a device either
through the firmware first approach or through an interrupt handled by the AER
root port driver, the error handler registered by the vfio-pci driver gets
invoked. The qemu process is signaled through an eventfd registered per
VFIO device by the qemu process. In the eventfd handler, qemu decides on
what action to take. In this implementation, guest is brought down to
contain the error.

v2:
 - Rebased to latest upstream stable bits
 - Changed the new ioctl to be part of VFIO_SET_IRQs ioctl
 - Added a new patch to get/put reference to a vfio device from struct device
 - Incorporated all other feedback.

---
Vijay Mohan Pandarathil(3):

[PATCH 1/3] VFIO: Wrappers for getting/putting reference to vfio_device
[PATCH 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER
[PATCH 3/3] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices

Kernel files changed

 drivers/vfio/vfio.c  | 47 +--
 include/linux/vfio.h |  3 +++
 2 files changed, 44 insertions(+), 6 deletions(-)

 drivers/vfio/pci/vfio_pci.c | 44 -
 drivers/vfio/pci/vfio_pci_intrs.c   | 32 +++
 drivers/vfio/pci/vfio_pci_private.h |  1 +
 include/uapi/linux/vfio.h   |  3 +++
 4 files changed, 79 insertions(+), 1 deletion(-)


Qemu files changed

 hw/vfio_pci.c  | 106 +
 linux-headers/linux/vfio.h |   3 ++
 2 files changed, 109 insertions(+)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] VFIO: Wrappers for getting/putting reference to vfio_device

2013-01-28 Thread Pandarathil, Vijaymohan R

- Added vfio_device_get_from_vdev(), vfio_device_put_vdev()
  as wrappers to get/put reference to vfio_device from struct device.

- Added vfio_device_data() as a wrapper to get device_data from
  vfio_device.

Signed-off-by: Vijay Mohan Pandarathil 
---
 drivers/vfio/vfio.c  | 47 +--
 include/linux/vfio.h |  3 +++
 2 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 12c264d..c2ff1b2 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -642,8 +642,13 @@ int vfio_add_group_dev(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(vfio_add_group_dev);
 
-/* Test whether a struct device is present in our tracking */
-static bool vfio_dev_present(struct device *dev)
+/**
+ * This does a get on the corresponding iommu_group,
+ * vfio_group and the vfio_device. Callers of this
+ * function will hae to call vfio_put_vdev() to
+ * remove the reference to all objects.
+ */
+void *vfio_device_get_from_dev(struct device *dev)
 {
struct iommu_group *iommu_group;
struct vfio_group *group;
@@ -651,25 +656,55 @@ static bool vfio_dev_present(struct device *dev)
 
iommu_group = iommu_group_get(dev);
if (!iommu_group)
-   return false;
+   return NULL;
 
group = vfio_group_get_from_iommu(iommu_group);
if (!group) {
iommu_group_put(iommu_group);
-   return false;
+   return NULL;
}
 
device = vfio_group_get_device(group, dev);
if (!device) {
vfio_group_put(group);
iommu_group_put(iommu_group);
-   return false;
+   return NULL;
}
+   return device;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
+
+void *vfio_device_data(void *data)
+{
+   struct vfio_device *device = data;
+   return device->device_data;
+}
+EXPORT_SYMBOL_GPL(vfio_device_data);
+
+void vfio_device_put_vdev(void *data)
+{
+   struct vfio_device *device = data;
+   struct vfio_group *group = device->group;
+   struct iommu_group *iommu_group = group->iommu_group;
 
vfio_device_put(device);
vfio_group_put(group);
iommu_group_put(iommu_group);
-   return true;
+   return;
+}
+EXPORT_SYMBOL_GPL(vfio_device_put_vdev);
+
+/* Test whether a struct device is present in our tracking */
+static bool vfio_dev_present(struct device *dev)
+{
+   struct vfio_device *device;
+
+   device = vfio_device_get_from_dev(dev);
+   if (device) {
+   vfio_device_put_vdev(device);
+   return true;
+   } else
+   return false;
 }
 
 /*
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ab9e862..e550c09 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -45,6 +45,9 @@ extern int vfio_add_group_dev(struct device *dev,
  void *device_data);
 
 extern void *vfio_del_group_dev(struct device *dev);
+extern void *vfio_device_get_from_dev(struct device *dev);
+extern void vfio_device_put_vdev(void *device);
+extern void *vfio_device_data(void *device);
 
 /**
  * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks
-- 
1.7.11.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER

2013-01-28 Thread Pandarathil, Vijaymohan R
 
- New VFIO_SET_IRQ ioctl option to pass the eventfd that is signalled 
when
  an error occurs in the vfio_pci_device

- Register pci_error_handler for the vfio_pci driver

- When the device encounters an error, the error handler registered by
  the vfio_pci driver gets invoked by the AER infrastructure

- In the error handler, signal the eventfd registered for the device.

- This results in the qemu eventfd handler getting invoked and
  appropriate action taken for the guest.

Signed-off-by: Vijay Mohan Pandarathil 
---
 drivers/vfio/pci/vfio_pci.c | 44 -
 drivers/vfio/pci/vfio_pci_intrs.c   | 32 +++
 drivers/vfio/pci/vfio_pci_private.h |  1 +
 include/uapi/linux/vfio.h   |  3 +++
 4 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index b28e66c..ff2a078 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -196,7 +196,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device 
*vdev, int irq_type)
 
return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
}
-   }
+   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX)
+   if (pci_is_pcie(vdev->pdev))
+   return 1;
 
return 0;
 }
@@ -223,9 +225,18 @@ static long vfio_pci_ioctl(void *device_data,
if (vdev->reset_works)
info.flags |= VFIO_DEVICE_FLAGS_RESET;
 
+   if (pci_is_pcie(vdev->pdev)) {
+   info.flags |= VFIO_DEVICE_FLAGS_PCI_AER;
+   info.flags |= VFIO_DEVICE_FLAGS_PCI_AER_NOTIFY;
+   }
+
info.num_regions = VFIO_PCI_NUM_REGIONS;
info.num_irqs = VFIO_PCI_NUM_IRQS;
 
+   /* Expose only implemented IRQs */
+   if (!(info.flags & VFIO_DEVICE_FLAGS_PCI_AER_NOTIFY))
+   info.num_irqs--;
+
return copy_to_user((void __user *)arg, &info, minsz);
 
} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
@@ -302,6 +313,10 @@ static long vfio_pci_ioctl(void *device_data,
if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
return -EINVAL;
 
+   if ((info.index == VFIO_PCI_ERR_IRQ_INDEX) &&
+!pci_is_pcie(vdev->pdev))
+   return -EINVAL;
+
info.flags = VFIO_IRQ_INFO_EVENTFD;
 
info.count = vfio_pci_get_irq_count(vdev, info.index);
@@ -538,11 +553,38 @@ static void vfio_pci_remove(struct pci_dev *pdev)
kfree(vdev);
 }
 
+static pci_ers_result_t vfio_err_detected(struct pci_dev *pdev,
+   pci_channel_state_t state)
+{
+   struct vfio_pci_device *vpdev;
+   void *vdev;
+
+   vdev = vfio_device_get_from_dev(&pdev->dev);
+   if (vdev == NULL)
+   return PCI_ERS_RESULT_DISCONNECT;
+
+   vpdev = vfio_device_data(vdev);
+   if (vpdev == NULL)
+   return PCI_ERS_RESULT_DISCONNECT;
+
+   if (vpdev->err_trigger)
+   eventfd_signal(vpdev->err_trigger, 1);
+
+   vfio_device_put_vdev(vdev);
+
+   return PCI_ERS_RESULT_CAN_RECOVER;
+}
+
+static const struct pci_error_handlers vfio_err_handlers = {
+   .error_detected = vfio_err_detected,
+};
+
 static struct pci_driver vfio_pci_driver = {
.name   = "vfio-pci",
.id_table   = NULL, /* only dynamic ids */
.probe  = vfio_pci_probe,
.remove = vfio_pci_remove,
+   .err_handler= &vfio_err_handlers,
 };
 
 static void __exit vfio_pci_cleanup(void)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index 3639371..f003e08 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -745,6 +745,31 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_device 
*vdev,
return 0;
 }
 
+static int vfio_pci_set_err_eventfd(struct vfio_pci_device *vdev,
+   unsigned index, unsigned start,
+   unsigned count, uint32_t flags, void *data)
+{
+   if ((index == VFIO_PCI_ERR_IRQ_INDEX) &&
+   (flags & VFIO_IRQ_SET_DATA_EVENTFD)   &&
+   pci_is_pcie(vdev->pdev)) {
+
+   int32_t fd = *(int32_t *)data;
+
+   if (fd == -1) {
+   if (vdev->err_trigger)
+   eventfd_ctx_put(vdev->err_trigger);
+   vdev->err_trigger = NULL;
+   return 0;
+   } else if (fd >= 0) {
+   vdev->err_trigger = eventfd_ctx_fdget(fd);
+   if (IS_ERR(vdev->err_trigger))
+   return PTR_ERR(vdev->err_trigger);
+   return 0;
+

Re: [PATCH v3] KVM: VMX: enable acknowledge interupt on vmexit

2013-01-28 Thread Gleb Natapov
On Mon, Jan 28, 2013 at 08:54:07AM +0800, Yang Zhang wrote:
> From: Yang Zhang 
> 
> The "acknowledge interrupt on exit" feature controls processor behavior
> for external interrupt acknowledgement. When this control is set, the
> processor acknowledges the interrupt controller to acquire the
> interrupt vector on VM exit.
> 
> After enabling this feature, an interrupt which arrived when target cpu is
> running in vmx non-root mode will be handled by vmx handler instead of handler
> in idt. Currently, vmx handler only fakes an interrupt stack and jump to idt
> table to let real handler to handle it. Further, we will recognize the 
> interrupt
> and only delivery the interrupt which not belong to current vcpu through idt 
> table.
> The interrupt which belonged to current vcpu will be handled inside vmx 
> handler.
> This will reduce the interrupt handle cost of KVM.
> 
> Refer to Intel SDM volum 3, chapter 33.2.
> 
> Signed-off-by: Yang Zhang 
> ---
>  arch/x86/include/asm/kvm_host.h |2 +
>  arch/x86/kvm/svm.c  |6 
>  arch/x86/kvm/vmx.c  |   61 --
>  arch/x86/kvm/x86.c  |3 +-
>  4 files changed, 67 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 77d56a4..07daf10 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -340,6 +340,7 @@ struct kvm_vcpu_arch {
>   unsigned long cr8;
>   u32 hflags;
>   u64 efer;
> + unsigned long host_idt_base;
Should be in vmx.

>   u64 apic_base;
>   struct kvm_lapic *apic;/* kernel irqchip context */
>   unsigned long apic_attention;
> @@ -725,6 +726,7 @@ struct kvm_x86_ops {
>   int (*check_intercept)(struct kvm_vcpu *vcpu,
>  struct x86_instruction_info *info,
>  enum x86_intercept_stage stage);
> + void (*handle_external_intr)(struct kvm_vcpu *vcpu);
>  };
>  
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index d29d3cd..c283185 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -4227,6 +4227,11 @@ out:
>   return ret;
>  }
>  
> +static void svm_handle_external_intr(struct kvm_vcpu *vcpu)
> +{
> + local_irq_enable();
> +}
> +
>  static struct kvm_x86_ops svm_x86_ops = {
>   .cpu_has_kvm_support = has_svm,
>   .disabled_by_bios = is_disabled,
> @@ -4318,6 +4323,7 @@ static struct kvm_x86_ops svm_x86_ops = {
>   .set_tdp_cr3 = set_tdp_cr3,
>  
>   .check_intercept = svm_check_intercept,
> + .handle_external_intr = svm_handle_external_intr,
>  };
>  
>  static int __init svm_init(void)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 02eeba8..243ce45 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2565,7 +2565,8 @@ static __init int setup_vmcs_config(struct vmcs_config 
> *vmcs_conf)
>  #ifdef CONFIG_X86_64
>   min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>  #endif
> - opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT;
> + opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT |
> + VM_EXIT_ACK_INTR_ON_EXIT;
>   if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
>   &_vmexit_control) < 0)
>   return -EIO;
> @@ -3742,7 +3743,7 @@ static void vmx_disable_intercept_for_msr(u32 msr, bool 
> longmode_only)
>   * Note that host-state that does change is set elsewhere. E.g., host-state
>   * that is set differently for each CPU is set in vmx_vcpu_load(), not here.
>   */
> -static void vmx_set_constant_host_state(void)
> +static void vmx_set_constant_host_state(struct kvm_vcpu *vcpu)
>  {
>   u32 low32, high32;
>   unsigned long tmpl;
> @@ -3770,6 +3771,7 @@ static void vmx_set_constant_host_state(void)
>  
>   native_store_idt(&dt);
>   vmcs_writel(HOST_IDTR_BASE, dt.address);   /* 22.2.4 */
> + vcpu->arch.host_idt_base = dt.address;
>  
>   vmcs_writel(HOST_RIP, vmx_return); /* 22.2.5 */
>  
> @@ -3884,7 +3886,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>  
>   vmcs_write16(HOST_FS_SELECTOR, 0);/* 22.2.4 */
>   vmcs_write16(HOST_GS_SELECTOR, 0);/* 22.2.4 */
> - vmx_set_constant_host_state();
> + vmx_set_constant_host_state(&vmx->vcpu);
>  #ifdef CONFIG_X86_64
>   rdmsrl(MSR_FS_BASE, a);
>   vmcs_writel(HOST_FS_BASE, a); /* 22.2.4 */
> @@ -6094,6 +6096,56 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx 
> *vmx)
>   }
>  }
>  
> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
> +{
> + u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> + if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
> + == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
> + unsigned int vector;
> + unsigned long entry;
> + gate_desc *desc;
> +
> +

Re: [PATCH 0/2] kvm: IOMMU read-only mapping support

2013-01-28 Thread Gleb Natapov
On Fri, Jan 25, 2013 at 11:28:40AM +0800, Xiao Guangrong wrote:
> On 01/25/2013 09:17 AM, Takuya Yoshikawa wrote:
> > On Thu, 24 Jan 2013 15:03:57 -0700
> > Alex Williamson  wrote:
> > 
> >> A couple patches to make KVM IOMMU support honor read-only mappings.
> >> This causes an un-map, re-map when the read-only flag changes and
> >> makes use of it when setting IOMMU attributes.  Thanks,
> > 
> > Looks good to me.
> > 
> > I think I can naturally update my patch after this gets merged.
> > 
> 
> Please wait.
> 
> The commit c972f3b1 changed the write-protect behaviour - it does
> wirte-protection only when dirty flag is set.
> [ I did not see this commit when we discussed the problem before. ]
> 
> Further more, i notice that write-protect is not enough, when do sync
> shadow page:
> 
> FNAME(sync_page):
> 
>   host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;
> 
>   set_spte(vcpu, &sp->spt[i], pte_access,
>PT_PAGE_TABLE_LEVEL, gfn,
>spte_to_pfn(sp->spt[i]), true, false,
>host_writable);
> 
> It sets spte based on the old value that means the readonly flag check
> is missed. We need to call kvm_arch_flush_shadow_all under this case.
Why not just disallow changing memory region KVM_MEM_READONLY flag
without deleting the region?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM call agenda for 2013-01-29

2013-01-28 Thread Juan Quintela


Hi

Please send in any agenda topics you are interested in.

Later, Juan.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] trace-cmd: add kvm_mmu_prepare_zap_page even and fix kvm_mmu_get_page event output in kvm plugin

2013-01-28 Thread Gleb Natapov
Ping.

On Thu, Dec 27, 2012 at 01:34:15PM +0200, Gleb Natapov wrote:
> kvm_mmu_zap_page event was renamed to kvm_mmu_prepare_zap_page. Add new
> even, but leave the old one to parse older traces.  Print out "created"
> field for kvm_mmu_get_page event.
> 
> Signed-off-by: Gleb Natapov 
> diff --git a/plugin_kvm.c b/plugin_kvm.c
> index 55812ef..9b376d8 100644
> --- a/plugin_kvm.c
> +++ b/plugin_kvm.c
> @@ -382,7 +382,7 @@ static int kvm_mmu_print_role(struct trace_seq *s, struct 
> pevent_record *record,
>   } else
>   trace_seq_printf(s, "WORD: %08x", role.word);
>  
> - pevent_print_num_field(s, " root %u",  event,
> + pevent_print_num_field(s, " root %u ",  event,
>  "root_count", record, 1);
>  
>   if (pevent_get_field_val(s, event, "unsync", record, &val, 1) < 0)
> @@ -397,6 +397,11 @@ static int kvm_mmu_get_page_handler(struct trace_seq *s, 
> struct pevent_record *r
>  {
>   unsigned long long val;
>  
> + if (pevent_get_field_val(s, event, "created", record, &val, 1) < 0)
> + return -1;
> +
> + trace_seq_printf(s, "%s ", val ? "new" : "existing");
> +
>   if (pevent_get_field_val(s, event, "gfn", record, &val, 1) < 0)
>   return -1;
>  
> @@ -433,5 +438,9 @@ int PEVENT_PLUGIN_LOADER(struct pevent *pevent)
>   pevent_register_event_handler(pevent, -1, "kvmmmu", "kvm_mmu_zap_page",
> kvm_mmu_print_role, NULL);
>  
> + pevent_register_event_handler(pevent, -1, "kvmmmu",
> + "kvm_mmu_prepare_zap_page", kvm_mmu_print_role,
> + NULL);
> +
>   return 0;
>  }
> --
>   Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3] KVM: VMX: enable acknowledge interupt on vmexit

2013-01-28 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-01-28:
> On Mon, Jan 28, 2013 at 08:54:07AM +0800, Yang Zhang wrote:
>> From: Yang Zhang 
>> 
>> The "acknowledge interrupt on exit" feature controls processor behavior
>> for external interrupt acknowledgement. When this control is set, the
>> processor acknowledges the interrupt controller to acquire the
>> interrupt vector on VM exit.
>> 
>> After enabling this feature, an interrupt which arrived when target cpu
>> is running in vmx non-root mode will be handled by vmx handler instead
>> of handler in idt. Currently, vmx handler only fakes an interrupt stack
>> and jump to idt table to let real handler to handle it. Further, we
>> will recognize the interrupt and only delivery the interrupt which not
>> belong to current vcpu through idt table. The interrupt which belonged
>> to current vcpu will be handled inside vmx handler. This will reduce
>> the interrupt handle cost of KVM.
>> 
>> Refer to Intel SDM volum 3, chapter 33.2.
>> 
>> Signed-off-by: Yang Zhang 
>> ---
>>  arch/x86/include/asm/kvm_host.h |2 + arch/x86/kvm/svm.c   
>>|6  arch/x86/kvm/vmx.c  |   61
>>  -- arch/x86/kvm/x86.c 
>>  |3 +- 4 files changed, 67 insertions(+), 5 deletions(-)
>> diff --git a/arch/x86/include/asm/kvm_host.h
>> b/arch/x86/include/asm/kvm_host.h index 77d56a4..07daf10 100644 ---
>> a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -340,6 +340,7 @@ struct kvm_vcpu_arch {
>>  unsigned long cr8;
>>  u32 hflags;
>>  u64 efer;
>> +unsigned long host_idt_base;
> Should be in vmx.
Sure.

>>  u64 apic_base;  struct kvm_lapic *apic;/* kernel irqchip context
>>  */  unsigned long apic_attention; @@ -725,6 +726,7 @@ struct
>>  kvm_x86_ops {   int (*check_intercept)(struct kvm_vcpu *vcpu,   
>>   
>>  struct x86_instruction_info *info, enum 
>> x86_intercept_stage
>>  stage); +   void (*handle_external_intr)(struct kvm_vcpu *vcpu); };
>>  
>>  struct kvm_arch_async_pf {
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index d29d3cd..c283185 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -4227,6 +4227,11 @@ out:
>>  return ret;
>>  }
>> +static void svm_handle_external_intr(struct kvm_vcpu *vcpu)
>> +{
>> +local_irq_enable();
>> +}
>> +
>>  static struct kvm_x86_ops svm_x86_ops = {   .cpu_has_kvm_support =
>>  has_svm,.disabled_by_bios = is_disabled, @@ -4318,6 +4323,7 @@
>>  static struct kvm_x86_ops svm_x86_ops = {   .set_tdp_cr3 = set_tdp_cr3,
>>  
>>  .check_intercept = svm_check_intercept, +   .handle_external_intr =
>>  svm_handle_external_intr, };
>>  
>>  static int __init svm_init(void)
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 02eeba8..243ce45 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2565,7 +2565,8 @@ static __init int setup_vmcs_config(struct
> vmcs_config *vmcs_conf)
>>  #ifdef CONFIG_X86_64
>>  min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>>  #endif
>> -opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT;
>> +opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT |
>> +VM_EXIT_ACK_INTR_ON_EXIT;
>>  if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
>>  &_vmexit_control) < 0)
>>  return -EIO;
>> @@ -3742,7 +3743,7 @@ static void vmx_disable_intercept_for_msr(u32 msr,
> bool longmode_only)
>>   * Note that host-state that does change is set elsewhere. E.g., host-state
>>   * that is set differently for each CPU is set in vmx_vcpu_load(), not here.
>>   */
>> -static void vmx_set_constant_host_state(void)
>> +static void vmx_set_constant_host_state(struct kvm_vcpu *vcpu)
>>  {
>>  u32 low32, high32;
>>  unsigned long tmpl;
>> @@ -3770,6 +3771,7 @@ static void vmx_set_constant_host_state(void)
>> 
>>  native_store_idt(&dt);
>>  vmcs_writel(HOST_IDTR_BASE, dt.address);   /* 22.2.4 */
>> +vcpu->arch.host_idt_base = dt.address;
>> 
>>  vmcs_writel(HOST_RIP, vmx_return); /* 22.2.5 */
>> @@ -3884,7 +3886,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>> 
>>  vmcs_write16(HOST_FS_SELECTOR, 0);/* 22.2.4 */
>>  vmcs_write16(HOST_GS_SELECTOR, 0);/* 22.2.4 */
>> -vmx_set_constant_host_state();
>> +vmx_set_constant_host_state(&vmx->vcpu);
>>  #ifdef CONFIG_X86_64
>>  rdmsrl(MSR_FS_BASE, a);
>>  vmcs_writel(HOST_FS_BASE, a); /* 22.2.4 */
>> @@ -6094,6 +6096,56 @@ static void vmx_complete_atomic_exit(struct
> vcpu_vmx *vmx)
>>  }
>>  }
>> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu) +{ +u32
>> exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); +   if ((exit_intr_info
>> & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK)) +   
>> ==
>> (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) { + unsigned int 
>> vector;
>> +unsig

Re: [PATCH v2] KVM: set_memory_region: Identify the requested change explicitly

2013-01-28 Thread Takuya Yoshikawa
On Mon, 28 Jan 2013 18:19:45 +0900
Takuya Yoshikawa  wrote:

> @@ -876,7 +900,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>* slots (size changes, userspace addr changes) is disallowed above,
>* so any other attribute changes getting here can be skipped.
>*/
> - if (npages) {
> + if (!(change == KVM_MR_DELETE)) {
>   if (old_iommu_mapped &&
>   ((new.flags ^ old.flags) & KVM_MEM_READONLY)) {
>   kvm_iommu_unmap_pages(kvm, &old);

I somehow wrote strange code.
This should have been if (A != B) instead of !(A == B).

Will update later.

Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/8] KVM: MMU: Fix and clean up for_each_gfn_* macros

2013-01-28 Thread Gleb Natapov
On Wed, Jan 23, 2013 at 07:13:21PM +0900, Takuya Yoshikawa wrote:
> The expression (sp)->gfn should not be expanded using @gfn.
> 
> Although no user of these macros passes a string other than gfn now,
> this should be fixed before anyone sees strange errors.
> 
> Also, the counter-intuitive conditions, which had been used before these
> macros were introduced to avoid extra indentations, should not be used.
> 
Not sure what do you mean here. This counter-intuitive conditions are
used so that if "else" follows the macro it will not be interpreted as
belonging to the hidden if(). Like in the following code:

 if (x)
for_each_gfn_sp()
 else
   do_y();

You do not expect do_y() to be called for each sp->gfn != gfn.

> Note: ignored the following checkpatch report:
>   ERROR: Macros with complex values should be enclosed in parenthesis
> 
> Signed-off-by: Takuya Yoshikawa 
> ---
>  arch/x86/kvm/mmu.c |   18 --
>  1 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9f628f7..376cec8 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1658,16 +1658,14 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, 
> struct kvm_mmu_page *sp,
>  static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>   struct list_head *invalid_list);
>  
> -#define for_each_gfn_sp(kvm, sp, gfn, pos)   \
> -  hlist_for_each_entry(sp, pos,  
> \
> -   &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)
> \
> - if ((sp)->gfn != (gfn)) {} else
> -
> -#define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos)\
> -  hlist_for_each_entry(sp, pos,  
> \
> -   &(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link)
> \
> - if ((sp)->gfn != (gfn) || (sp)->role.direct ||  \
> - (sp)->role.invalid) {} else
> +#define for_each_gfn_sp(_kvm, _sp, _gfn, _pos)   
> \
> + hlist_for_each_entry(_sp, _pos, \
> +   &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \
> + if ((_sp)->gfn == (_gfn))
> +
> +#define for_each_gfn_indirect_valid_sp(_kvm, _sp, _gfn, _pos)
> \
> + for_each_gfn_sp(_kvm, _sp, _gfn, _pos)  \
> + if (!(_sp)->role.direct && !(_sp)->role.invalid)
>  
>  /* @sp->gfn should be write-protected at the call site */
>  static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> -- 
> 1.7.5.4

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] kvm: IOMMU read-only mapping support

2013-01-28 Thread Takuya Yoshikawa
On Mon, 28 Jan 2013 12:59:03 +0200
Gleb Natapov  wrote:

> > It sets spte based on the old value that means the readonly flag check
> > is missed. We need to call kvm_arch_flush_shadow_all under this case.
> Why not just disallow changing memory region KVM_MEM_READONLY flag
> without deleting the region?

Sounds good.

If you prefer, I can fold the required change into my patch.

Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/8] KVM: MMU: Fix and clean up for_each_gfn_* macros

2013-01-28 Thread Takuya Yoshikawa
On Mon, 28 Jan 2013 14:24:25 +0200
Gleb Natapov  wrote:

> On Wed, Jan 23, 2013 at 07:13:21PM +0900, Takuya Yoshikawa wrote:
> > The expression (sp)->gfn should not be expanded using @gfn.
> > 
> > Although no user of these macros passes a string other than gfn now,
> > this should be fixed before anyone sees strange errors.
> > 
> > Also, the counter-intuitive conditions, which had been used before these
> > macros were introduced to avoid extra indentations, should not be used.
> > 
> Not sure what do you mean here. This counter-intuitive conditions are
> used so that if "else" follows the macro it will not be interpreted as
> belonging to the hidden if(). Like in the following code:
> 
>  if (x)
> for_each_gfn_sp()
>  else
>do_y();
> 
> You do not expect do_y() to be called for each sp->gfn != gfn.

I could not think of this case.

Will fix not to change the current conditions.

Thanks,
Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Can KVM notify VM to execute some operations?

2013-01-28 Thread 独孤败
I know that the VM can use the VMCALL instruction to notify the KVM to
do some operations.
My question is that if the KVM want to notify the VM to execute some
operations, which instrction should be use?
Or is there any other method to realize this?

--
Best Regards!
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] vhost-scsi: Add support for host virtualized target

2013-01-28 Thread Paolo Bonzini
Il 21/01/2013 09:50, Michael S. Tsirkin ha scritto:
>> Unfortunately, I've not been able to get back to the conversion
>> requested by Paolo for a standalone vhost-scsi PCI device.
> 
> It doesn't have to be a standalone device. A vhost=on frontend
> option is also OK I think. Paolo, any objections?

Sorry, I missed this message.

I asked for a standalone device because the configuration mechanism
(configfs vs. command-line) and the feature set are completely
different.  Unlike virtio-net, it's not possible to switch one to the
other at run time.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for 2013-01-29

2013-01-28 Thread Andreas Färber
Am 28.01.2013 11:59, schrieb Juan Quintela:
> Please send in any agenda topics you are interested in.

Buildbot situation:
* Trees are not being added/updated in a timely fashion
* Insufficient build test coverage of trees (ppc, s390x, MinGW, BSD)

http://buildbot.b1-systems.de/qemu/builders

Stefan H. has access to some of the build slaves but not to the main
buildbot server for configuration, it seems.

If Daniel does not have sufficient time to administer it, can we maybe
have that set up on qemu.org instead, with more than one person that has
access to it?

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] vhost-scsi: Add support for host virtualized target

2013-01-28 Thread Michael S. Tsirkin
On Mon, Jan 28, 2013 at 02:01:41PM +0100, Paolo Bonzini wrote:
> Il 21/01/2013 09:50, Michael S. Tsirkin ha scritto:
> >> Unfortunately, I've not been able to get back to the conversion
> >> requested by Paolo for a standalone vhost-scsi PCI device.
> > 
> > It doesn't have to be a standalone device. A vhost=on frontend
> > option is also OK I think. Paolo, any objections?
> 
> Sorry, I missed this message.
> 
> I asked for a standalone device because the configuration mechanism
> (configfs vs. command-line) and the feature set are completely
> different.  Unlike virtio-net, it's not possible to switch one to the
> other at run time.
> 
> Paolo

Exactly the same applies to any other frontend option.
For example if you have two qemu instances with
different num_queues values you can not migrate one
to the other.
So in this sense it is not different from any other
frontend option, right?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] vhost-scsi: Add support for host virtualized target

2013-01-28 Thread Paolo Bonzini
Il 28/01/2013 14:11, Michael S. Tsirkin ha scritto:
> > I asked for a standalone device because the configuration mechanism
> > (configfs vs. command-line) and the feature set are completely
> > different.  Unlike virtio-net, it's not possible to switch one to the
> > other at run time.
> 
> Exactly the same applies to any other frontend option.
> For example if you have two qemu instances with
> different num_queues values you can not migrate one
> to the other.
> So in this sense it is not different from any other
> frontend option, right?

Indeed, in this sense it is not.

Actually in this case migrating one to the other could succeed, and make
all disks disappear on the destination (because of the different
configuration mechanism).  That however could be overcome with vhost=on
registering a migration blocker.

I won't really block the patch with the vhost=on/off frontend option if
it is properly done (e.g. the QEMU SCSI bus should not be created for
vhost=on) and minimally invasive to the non-vhost code.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call agenda for 2013-01-29

2013-01-28 Thread Stefan Hajnoczi
On Mon, Jan 28, 2013 at 11:59:40AM +0100, Juan Quintela wrote:
> Please send in any agenda topics you are interested in.

Replacing select(2) so that we will not hit the 1024 fd_set limit in the
future.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] vhost-scsi: Add support for host virtualized target

2013-01-28 Thread Michael S. Tsirkin
On Mon, Jan 28, 2013 at 02:29:23PM +0100, Paolo Bonzini wrote:
> Il 28/01/2013 14:11, Michael S. Tsirkin ha scritto:
> > > I asked for a standalone device because the configuration mechanism
> > > (configfs vs. command-line) and the feature set are completely
> > > different.  Unlike virtio-net, it's not possible to switch one to the
> > > other at run time.
> > 
> > Exactly the same applies to any other frontend option.
> > For example if you have two qemu instances with
> > different num_queues values you can not migrate one
> > to the other.
> > So in this sense it is not different from any other
> > frontend option, right?
> 
> Indeed, in this sense it is not.
> 
> Actually in this case migrating one to the other could succeed, and make
> all disks disappear on the destination (because of the different
> configuration mechanism).  That however could be overcome with vhost=on
> registering a migration blocker.

Or better add a subsection if vhost is set: vhost=on to vhost=on
can migrate, right?

> I won't really block the patch with the vhost=on/off frontend option if
> it is properly done (e.g. the QEMU SCSI bus should not be created for
> vhost=on) and minimally invasive to the non-vhost code.
> 
> Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] vhost-scsi: Add support for host virtualized target

2013-01-28 Thread Paolo Bonzini
Il 28/01/2013 14:36, Michael S. Tsirkin ha scritto:
> On Mon, Jan 28, 2013 at 02:29:23PM +0100, Paolo Bonzini wrote:
>> Il 28/01/2013 14:11, Michael S. Tsirkin ha scritto:
 I asked for a standalone device because the configuration mechanism
 (configfs vs. command-line) and the feature set are completely
 different.  Unlike virtio-net, it's not possible to switch one to the
 other at run time.
>>>
>>> Exactly the same applies to any other frontend option.
>>> For example if you have two qemu instances with
>>> different num_queues values you can not migrate one
>>> to the other.
>>> So in this sense it is not different from any other
>>> frontend option, right?
>>
>> Indeed, in this sense it is not.
>>
>> Actually in this case migrating one to the other could succeed, and make
>> all disks disappear on the destination (because of the different
>> configuration mechanism).  That however could be overcome with vhost=on
>> registering a migration blocker.
> 
> Or better add a subsection if vhost is set: vhost=on to vhost=on
> can migrate, right?

I think it's not yet supported by the kernel.  You have no guarantee
that I/O is quiescent at the time the VM starts on the destination.
You'd need a ioctl to do the equivalent of bdrv_drain_all().

Once you have that, a subsection would do the job, yes.

Paolo

>> I won't really block the patch with the vhost=on/off frontend option if
>> it is properly done (e.g. the QEMU SCSI bus should not be created for
>> vhost=on) and minimally invasive to the non-vhost code.
>>
>> Paolo

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] vhost-scsi: Add support for host virtualized target

2013-01-28 Thread Michael S. Tsirkin
On Mon, Jan 28, 2013 at 02:33:44PM +0100, Paolo Bonzini wrote:
> Il 28/01/2013 14:36, Michael S. Tsirkin ha scritto:
> > On Mon, Jan 28, 2013 at 02:29:23PM +0100, Paolo Bonzini wrote:
> >> Il 28/01/2013 14:11, Michael S. Tsirkin ha scritto:
>  I asked for a standalone device because the configuration mechanism
>  (configfs vs. command-line) and the feature set are completely
>  different.  Unlike virtio-net, it's not possible to switch one to the
>  other at run time.
> >>>
> >>> Exactly the same applies to any other frontend option.
> >>> For example if you have two qemu instances with
> >>> different num_queues values you can not migrate one
> >>> to the other.
> >>> So in this sense it is not different from any other
> >>> frontend option, right?
> >>
> >> Indeed, in this sense it is not.
> >>
> >> Actually in this case migrating one to the other could succeed, and make
> >> all disks disappear on the destination (because of the different
> >> configuration mechanism).  That however could be overcome with vhost=on
> >> registering a migration blocker.
> > 
> > Or better add a subsection if vhost is set: vhost=on to vhost=on
> > can migrate, right?
> 
> I think it's not yet supported by the kernel.  You have no guarantee
> that I/O is quiescent at the time the VM starts on the destination.
> You'd need a ioctl to do the equivalent of bdrv_drain_all().
> 
> Once you have that, a subsection would do the job, yes.
> 
> Paolo

OK once that's in it would be easy to probe for.

> >> I won't really block the patch with the vhost=on/off frontend option if
> >> it is properly done (e.g. the QEMU SCSI bus should not be created for
> >> vhost=on) and minimally invasive to the non-vhost code.
> >>
> >> Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH for-1.4 v2] target-i386: kvm: prevent buffer overflow if -cpu foo, [x]level is too big

2013-01-28 Thread Andreas Färber
Am 28.01.2013 12:49, schrieb Igor Mammedov:
> Stack corruption may occur if too big 'level' or 'xlevel' values passed
> on command line with KVM enabled, due to limited size of cpuid_data
> in kvm_arch_init_vcpu().
> 
> reproduces with:
>  qemu -enable-kvm -cpu qemu64,level=4294967295
> or
>  qemu -enable-kvm -cpu qemu64,xlevel=4294967295
> 
> Check if there is space in cpuid_data before passing it to cpu_x86_cpuid()
> or abort() if there is not space.
> 
> Signed-off-by: Igor Mammedov 

Reviewed-by: Andreas Färber 

CC'ing Gleb and KVM list.

Andreas

> ---
>   * v2:
> * use macro instead of const int max_cpuid_entries to fix build breakage
>   in C99 mode. Suggested-By: Laszlo Ersek 
> * compare with array index instead of address of the last element
>   Sugested-By: Marcelo Tosatti 
> 
> ---
>  target-i386/kvm.c |   25 -
>  1 files changed, 24 insertions(+), 1 deletions(-)
> 
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 3acff40..4ecb728 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -411,11 +411,12 @@ static void cpu_update_state(void *opaque, int running, 
> RunState state)
>  }
>  }
>  
> +#define KVM_MAX_CPUID_ENTRIES  100
>  int kvm_arch_init_vcpu(CPUState *cs)
>  {
>  struct {
>  struct kvm_cpuid2 cpuid;
> -struct kvm_cpuid_entry2 entries[100];
> +struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
>  } QEMU_PACKED cpuid_data;
>  X86CPU *cpu = X86_CPU(cs);
>  CPUX86State *env = &cpu->env;
> @@ -502,6 +503,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
>  
>  for (i = 0; i <= limit; i++) {
> +if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> +fprintf(stderr, "unsupported level value: 0x%x\n", limit);
> +abort();
> +}
>  c = &cpuid_data.entries[cpuid_i++];
>  
>  switch (i) {
> @@ -516,6 +521,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  times = c->eax & 0xff;
>  
>  for (j = 1; j < times; ++j) {
> +if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> +fprintf(stderr, "cpuid_data is full, no space for "
> +"cpuid(eax:2):eax & 0xf = 0x%x\n", times);
> +abort();
> +}
>  c = &cpuid_data.entries[cpuid_i++];
>  c->function = i;
>  c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC;
> @@ -544,6 +554,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  if (i == 0xd && c->eax == 0) {
>  continue;
>  }
> +if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> +fprintf(stderr, "cpuid_data is full, no space for "
> +"cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
> +abort();
> +}
>  c = &cpuid_data.entries[cpuid_i++];
>  }
>  break;
> @@ -557,6 +572,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  cpu_x86_cpuid(env, 0x8000, 0, &limit, &unused, &unused, &unused);
>  
>  for (i = 0x8000; i <= limit; i++) {
> +if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> +fprintf(stderr, "unsupported xlevel value: 0x%x\n", limit);
> +abort();
> +}
>  c = &cpuid_data.entries[cpuid_i++];
>  
>  c->function = i;
> @@ -569,6 +588,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  cpu_x86_cpuid(env, 0xC000, 0, &limit, &unused, &unused, &unused);
>  
>  for (i = 0xC000; i <= limit; i++) {
> +if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> +fprintf(stderr, "unsupported xlevel2 value: 0x%x\n", limit);
> +abort();
> +}
>  c = &cpuid_data.entries[cpuid_i++];
>  
>  c->function = i;
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH qom-cpu for-1.4?] kvm: Pass CPUState to kvm_on_sigbus_vcpu()

2013-01-28 Thread Gleb Natapov
On Fri, Jan 25, 2013 at 04:55:29PM +0100, Andreas Färber wrote:
> Since commit 20d695a9254c1b086a456d3b79a3c311236643ba (kvm: Pass
> CPUState to kvm_arch_*) CPUArchState is no longer needed.
> 
> Allows to change qemu_kvm_eat_signals() argument as well.
> 
> Signed-off-by: Andreas Färber 
Reviewed-by: Gleb Natapov 

> ---
>  Extracted from my qom-cpu-8 queue.
> 
>  cpus.c   |8 
>  include/sysemu/kvm.h |2 +-
>  kvm-all.c|3 +--
>  kvm-stub.c   |2 +-
>  4 Dateien geändert, 7 Zeilen hinzugefügt(+), 8 Zeilen entfernt(-)
> 
> diff --git a/cpus.c b/cpus.c
> index a4390c3..41779eb 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -517,7 +517,7 @@ static void qemu_init_sigbus(void)
>  prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
>  }
>  
> -static void qemu_kvm_eat_signals(CPUArchState *env)
> +static void qemu_kvm_eat_signals(CPUState *cpu)
>  {
>  struct timespec ts = { 0, 0 };
>  siginfo_t siginfo;
> @@ -538,7 +538,7 @@ static void qemu_kvm_eat_signals(CPUArchState *env)
>  
>  switch (r) {
>  case SIGBUS:
> -if (kvm_on_sigbus_vcpu(env, siginfo.si_code, siginfo.si_addr)) {
> +if (kvm_on_sigbus_vcpu(cpu, siginfo.si_code, siginfo.si_addr)) {
>  sigbus_reraise();
>  }
>  break;
> @@ -560,7 +560,7 @@ static void qemu_init_sigbus(void)
>  {
>  }
>  
> -static void qemu_kvm_eat_signals(CPUArchState *env)
> +static void qemu_kvm_eat_signals(CPUState *cpu)
>  {
>  }
>  #endif /* !CONFIG_LINUX */
> @@ -727,7 +727,7 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
>  qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
>  }
>  
> -qemu_kvm_eat_signals(env);
> +qemu_kvm_eat_signals(cpu);
>  qemu_wait_io_event_common(cpu);
>  }
>  
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 384ee66..6e6dfb3 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -159,7 +159,7 @@ int kvm_update_guest_debug(CPUArchState *env, unsigned 
> long reinject_trap);
>  int kvm_set_signal_mask(CPUArchState *env, const sigset_t *sigset);
>  #endif
>  
> -int kvm_on_sigbus_vcpu(CPUArchState *env, int code, void *addr);
> +int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
>  int kvm_on_sigbus(int code, void *addr);
>  
>  /* internal API */
> diff --git a/kvm-all.c b/kvm-all.c
> index 363a358..04ec2d5 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -2026,9 +2026,8 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, 
> uint16_t val, bool assign)
>  return 0;
>  }
>  
> -int kvm_on_sigbus_vcpu(CPUArchState *env, int code, void *addr)
> +int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr)
>  {
> -CPUState *cpu = ENV_GET_CPU(env);
>  return kvm_arch_on_sigbus_vcpu(cpu, code, addr);
>  }
>  
> diff --git a/kvm-stub.c b/kvm-stub.c
> index 47f8dca..760aadc 100644
> --- a/kvm-stub.c
> +++ b/kvm-stub.c
> @@ -112,7 +112,7 @@ int kvm_set_ioeventfd_mmio(int fd, uint32_t adr, uint32_t 
> val, bool assign, uint
>  return -ENOSYS;
>  }
>  
> -int kvm_on_sigbus_vcpu(CPUArchState *env, int code, void *addr)
> +int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr)
>  {
>  return 1;
>  }
> -- 
> 1.7.10.4

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: windows 2008 guest causing rcu_shed to emit NMI

2013-01-28 Thread Andrey Korolyov
On Mon, Jan 28, 2013 at 3:14 AM, Marcelo Tosatti  wrote:
> On Mon, Jan 28, 2013 at 12:04:50AM +0300, Andrey Korolyov wrote:
>> On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti  
>> wrote:
>> > On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote:
>> >> On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti  
>> >> wrote:
>> >> > On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
>> >> >> Thank you Marcelo,
>> >> >>
>> >> >> Host node locking up sometimes later than yesterday, bur problem still
>> >> >> here, please see attached dmesg. Stuck process looks like
>> >> >> root 19251  0.0  0.0 228476 12488 ?D14:42   0:00
>> >> >> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
>> >> >> virtio-blk-pci,? -device
>> >> >>
>> >> >> on fourth vm by count.
>> >> >>
>> >> >> Should I try upstream kernel instead of applying patch to the latest
>> >> >> 3.4 or it is useless?
>> >> >
>> >> > If you can upgrade to an upstream kernel, please do that.
>> >> >
>> >>
>> >> With vanilla 3.7.4 there is almost no changes, and NMI started firing
>> >> again. External symptoms looks like following: starting from some
>> >> count, may be third or sixth vm, qemu-kvm process allocating its
>> >> memory very slowly and by jumps, 20M-200M-700M-1.6G in minutes. Patch
>> >> helps, of course - on both patched 3.4 and vanilla 3.7 I`m able to
>> >> kill stuck kvm processes and node returned back to the normal, when on
>> >> 3.2 sending SIGKILL to the process causing zombies and hanged ``ps''
>> >> output (problem and workaround when no scheduler involved described
>> >> here http://www.spinics.net/lists/kvm/msg84799.html).
>> >
>> > Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module 
>> > parameter.
>> >
>>
>> Hi Marcelo,
>>
>> thanks, this parameter helped to increase number of working VMs in a
>> half of order of magnitude, from 3-4 to 10-15. Very high SY load, 10
>> to 15 percents, persists on such numbers for a long time, where linux
>> guests in same configuration do not jump over one percent even under
>> stress bench. After I disabled HT, crash happens only in long runs and
>> now it is kernel panic :)
>> Stair-like memory allocation behaviour disappeared, but other symptom
>> leading to the crash which I have not counted previously, persists: if
>> VM count is ``enough'' for crash, some qemu processes starting to eat
>> one core, and they`ll panic system after run in tens of minutes in
>> such state or if I try to attach debugger to one of them. If needed, I
>> can log entire crash output via netconsole, now I have some tail,
>> almost the same every time:
>> http://xdel.ru/downloads/btwin.png
>
> Yes, please log entire crash output, thanks.
>

Here please, 3.7.4-vanilla, 16 vms, ple_gap=0:

http://xdel.ru/downloads/oops-default-kvmintel.txt
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call agenda for 2013-01-29

2013-01-28 Thread Anthony Liguori
Juan Quintela  writes:

> Hi
>
> Please send in any agenda topics you are interested in.

 - Outstanding virtio work for 1.4
   - Multiqueue virtio-net (Amos/Michael)
   - Refactorings (Fred/Peter)
   - virtio-ccw (Cornelia/Alex)

We need to work out the ordering here and what's reasonable to merge
over the next week.

Regards,

Anthony Liguori

>
> Later, Juan.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


QEMU buildbot maintenance state (was: Re: [Qemu-devel] KVM call agenda for 2013-01-29)

2013-01-28 Thread Daniel Gollub
Hi Andreas,

thanks for bringing this topic up.

January was quite busy month for me so far I could only handle minor changes
on the buildbot. I hesitate to make larger changes to avoid I break other
stuff. So I just queued them up once I have more time (beginning in February
or so)

I bounced now all open request from you and the suggestion of Gerd (with
regards to append the log on the mail) to Christian Berendt - who is also
maintaining the buildbot on buildbot.b1-systems.de

On Monday, January 28, 2013 02:02:27 PM Andreas Färber wrote:
> Am 28.01.2013 11:59, schrieb Juan Quintela:
> > Please send in any agenda topics you are interested in.
>
> Buildbot situation:
> * Trees are not being added/updated in a timely fashion

Primary reason why I hesitated to add new trees in the last weeks is pretty
simple:
 - we run out of buildslave power
 - lack of time to setup new buildslaves


If anyone has some spare time and could provide us for a longer period
buildslaves please drop me and Christian a mail for further instruction.



> * Insufficient build test coverage of trees (ppc, s390x, MinGW, BSD)
>
> http://buildbot.b1-systems.de/qemu/builders


With regards to the build test coverage for all trees we would have to
refactor the configuration a lot to make it more easier to add and maintain it
in an easy way.

If you have some time left feel free to come up with a patch for that.
This is on my list - but it requires more time then I have available right now
to make it "right".


(The failing macosx buildslave is known issue on the MacOSX site. The
maintainer has also to investigate that at some later point due to lack of
sufficient time.)


>
> Stefan H. has access to some of the build slaves but not to the main
> buildbot server for configuration, it seems.

JFYI, the main buildbot configuration which controls everything (beside
buildslave credentials) is accessible to everyone:
http://people.b1-systems.de/~gollub/buildbot/

If you are familiar with buildbot feel free to incorporate your suggested
changes directly on a copy and send me or Christian the diff so we just have
to review and apply it.

>
> If Daniel does not have sufficient time to administer it, can we maybe
> have that set up on qemu.org instead, with more than one person that has
> access to it?

JFYI, I just requested if I am allowed to grant Stefan root access to our box.
I would not mind to give him access - but need to check back with our IT
first.

Best Regards,
Daniel

--
Daniel Gollub
Linux Consultant & Developer
Tel.: +49-160 47 73 970
Mail: gol...@b1-systems.de

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537

signature.asc
Description: This is a digitally signed message part.


Re: [PATCHv2] trace-cmd: add kvm_mmu_prepare_zap_page even and fix kvm_mmu_get_page event output in kvm plugin

2013-01-28 Thread Steven Rostedt
On Mon, 2013-01-28 at 13:12 +0200, Gleb Natapov wrote:
> Ping.

Please send email (especially patches) to my rost...@goodmis.org
account. I don't always read my RH email, except to skim it for RH
related work.

I may miss a patch in my goodmis account, but I will periodically purge
it (go through each email one by one and put them into special folders
or delete them). A patch missed will eventually be hit there. But that's
not the same for my RH account. A patch missed there, will never be seen
again.

Note, I'm still in November of my "purge" so a December patch will
probably take a week or two to see now ;-)

Anyway, I'll take a look at this patch and let you know.

Thanks,

-- Steve

> 
> On Thu, Dec 27, 2012 at 01:34:15PM +0200, Gleb Natapov wrote:
> > kvm_mmu_zap_page event was renamed to kvm_mmu_prepare_zap_page. Add new
> > even, but leave the old one to parse older traces.  Print out "created"
> > field for kvm_mmu_get_page event.
> > 
> > Signed-off-by: Gleb Natapov 
> > diff --git a/plugin_kvm.c b/plugin_kvm.c
> > index 55812ef..9b376d8 100644
> > --- a/plugin_kvm.c
> > +++ b/plugin_kvm.c
> > @@ -382,7 +382,7 @@ static int kvm_mmu_print_role(struct trace_seq *s, 
> > struct pevent_record *record,
> > } else
> > trace_seq_printf(s, "WORD: %08x", role.word);
> >  
> > -   pevent_print_num_field(s, " root %u",  event,
> > +   pevent_print_num_field(s, " root %u ",  event,
> >"root_count", record, 1);
> >  
> > if (pevent_get_field_val(s, event, "unsync", record, &val, 1) < 0)
> > @@ -397,6 +397,11 @@ static int kvm_mmu_get_page_handler(struct trace_seq 
> > *s, struct pevent_record *r
> >  {
> > unsigned long long val;
> >  
> > +   if (pevent_get_field_val(s, event, "created", record, &val, 1) < 0)
> > +   return -1;
> > +
> > +   trace_seq_printf(s, "%s ", val ? "new" : "existing");
> > +
> > if (pevent_get_field_val(s, event, "gfn", record, &val, 1) < 0)
> > return -1;
> >  
> > @@ -433,5 +438,9 @@ int PEVENT_PLUGIN_LOADER(struct pevent *pevent)
> > pevent_register_event_handler(pevent, -1, "kvmmmu", "kvm_mmu_zap_page",
> >   kvm_mmu_print_role, NULL);
> >  
> > +   pevent_register_event_handler(pevent, -1, "kvmmmu",
> > +   "kvm_mmu_prepare_zap_page", kvm_mmu_print_role,
> > +   NULL);
> > +
> > return 0;
> >  }
> > --
> > Gleb.
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
>   Gleb.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: QEMU buildbot maintenance state

2013-01-28 Thread Andreas Färber
Hi Daniel,

Am 28.01.2013 15:29, schrieb Daniel Gollub:
> thanks for bringing this topic up.
> 
> January was quite busy month for me so far I could only handle minor changes 
> on the buildbot. I hesitate to make larger changes to avoid I break other 
> stuff. So I just queued them up once I have more time (beginning in February 
> or so)
[snip]

Some kind of reply indicating so would've been nice, you were already
CC'ed last year a few times. ;)

Anyway, my point is that we are approaching Hard Freeze and seeing
weekly pulls with recurring breakages on one or another platform (pci on
ppc, s390x on mingw32). Some people have been assuming that if they
don't get mails from the build bots then their tree is okay to merge.

Our shortened release cycle leads to two weeks of Soft Freeze where
everyone pushes their queues before Feb 1 and most breakages occur.
During Hard Freeze we can still try to fix anything the bots report, but
catching errors before they land in master would be better:

At QEMU Summit Anthony requested that each submaintainer's tree be added
to the buildbot system; if there are as you now indicate build power
shortages, I suggest you specify what exactly is needed, maybe someone
in the community can help out. To me it was not clear at least. The
revived ppc slave is hopefully not running into a bottleneck yet. :)

I'll take a look at the config later.

Thanks,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] trace-cmd: add kvm_mmu_prepare_zap_page even and fix kvm_mmu_get_page event output in kvm plugin

2013-01-28 Thread Steven Rostedt
On Mon, 2013-01-28 at 13:12 +0200, Gleb Natapov wrote:
> Ping.
> 
> On Thu, Dec 27, 2012 at 01:34:15PM +0200, Gleb Natapov wrote:
> > kvm_mmu_zap_page event was renamed to kvm_mmu_prepare_zap_page. Add new
> > even, but leave the old one to parse older traces.  Print out "created"
> > field for kvm_mmu_get_page event.
> > 
> > Signed-off-by: Gleb Natapov 
> > diff --git a/plugin_kvm.c b/plugin_kvm.c
> > index 55812ef..9b376d8 100644
> > --- a/plugin_kvm.c
> > +++ b/plugin_kvm.c
> > @@ -382,7 +382,7 @@ static int kvm_mmu_print_role(struct trace_seq *s, 
> > struct pevent_record *record,
> > } else
> > trace_seq_printf(s, "WORD: %08x", role.word);
> >  
> > -   pevent_print_num_field(s, " root %u",  event,
> > +   pevent_print_num_field(s, " root %u ",  event,
> >"root_count", record, 1);
> >  
> > if (pevent_get_field_val(s, event, "unsync", record, &val, 1) < 0)
> > @@ -397,6 +397,11 @@ static int kvm_mmu_get_page_handler(struct trace_seq 
> > *s, struct pevent_record *r
> >  {
> > unsigned long long val;
> >  
> > +   if (pevent_get_field_val(s, event, "created", record, &val, 1) < 0)

Is "created" a new field? Or is it something that has always been there
but never displayed?

If it is new, then instead of returning '-1' if it's not found, could
you just ignore it. I don't want old kernels to start breaking on new
trace-cmd plugins.

Thanks,

-- Steve

> > +   return -1;
> > +
> > +   trace_seq_printf(s, "%s ", val ? "new" : "existing");
> > +
> > if (pevent_get_field_val(s, event, "gfn", record, &val, 1) < 0)
> > return -1;
> >  
> > @@ -433,5 +438,9 @@ int PEVENT_PLUGIN_LOADER(struct pevent *pevent)
> > pevent_register_event_handler(pevent, -1, "kvmmmu", "kvm_mmu_zap_page",
> >   kvm_mmu_print_role, NULL);
> >  
> > +   pevent_register_event_handler(pevent, -1, "kvmmmu",
> > +   "kvm_mmu_prepare_zap_page", kvm_mmu_print_role,
> > +   NULL);
> > +
> > return 0;
> >  }
> > --
> > Gleb.
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
>   Gleb.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Fix irqfd resampler list walk

2013-01-28 Thread Alex Williamson
On Mon, 2012-12-10 at 18:16 -0200, Marcelo Tosatti wrote:
> On Thu, Dec 06, 2012 at 02:44:59PM -0700, Alex Williamson wrote:
> > Typo for the next pointer means we're walking random data here.
> > 
> > Signed-off-by: Alex Williamson 
> > Cc: sta...@vger.kernel.org [3.7]
> > ---
> > 
> > Not sure if this will make 3.7, so preemptively adding the stable flag
> > 
> >  virt/kvm/eventfd.c |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Applied, thanks.
> 

Hi Marcelo,

This didn't seem to make it into 3.7.0 or any stable 3.7.  Can we
promote it for stable?  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH qom-cpu for-1.4?] kvm: Pass CPUState to kvm_on_sigbus_vcpu()

2013-01-28 Thread Andreas Färber
Am 28.01.2013 14:50, schrieb Gleb Natapov:
> On Fri, Jan 25, 2013 at 04:55:29PM +0100, Andreas Färber wrote:
>> Since commit 20d695a9254c1b086a456d3b79a3c311236643ba (kvm: Pass
>> CPUState to kvm_arch_*) CPUArchState is no longer needed.
>>
>> Allows to change qemu_kvm_eat_signals() argument as well.
>>
>> Signed-off-by: Andreas Färber 
> Reviewed-by: Gleb Natapov 

Thanks, applied to qom-cpu:
https://github.com/afaerber/qemu-cpu/commits/qom-cpu

Background was:
https://lists.nongnu.org/archive/html/qemu-devel/2013-01/msg03087.html

<<<
[...] qemu_init_vcpu() still operates on CPUArchState and thus cannot be
moved into CPUClass yet. The reason is that
cpus.c:qemu_kvm_cpu_thread_fn sets cpu_single_env, and I do not see a
solution for that - suggestions or patches welcome.

However, I see that kvm-all.c:kvm_on_sigbus_vcpu() can be switched to
CPUState now, so that cpus.c:qemu_kvm_eat_signals() can be changed to
CPUState, used from cpus.c:qemu_kvm_wait_io_event().
But cpus.c:cpu_thread_is_idle() still uses env->halted, which is blocked
by the search for an acceptable solution to flush the TLB at CPUState
level (exec.c:cpu_common_post_load()).
>>>

A less elegant but working solution is on my qom-cpu-8 branch (based off
qom-cpu-next): I introduced a void *env_ptr CPUState field. While
potentially opening a can of worms I wanted to avoid, it allows us to
defer finding a solution to the target_ulong-dependent TLB some more.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] kvm: IOMMU read-only mapping support

2013-01-28 Thread Alex Williamson
On Mon, 2013-01-28 at 21:25 +0900, Takuya Yoshikawa wrote:
> On Mon, 28 Jan 2013 12:59:03 +0200
> Gleb Natapov  wrote:
> 
> > > It sets spte based on the old value that means the readonly flag check
> > > is missed. We need to call kvm_arch_flush_shadow_all under this case.
> > Why not just disallow changing memory region KVM_MEM_READONLY flag
> > without deleting the region?
> 
> Sounds good.
> 
> If you prefer, I can fold the required change into my patch.

That would seem to make my patch 1/2 unnecessary.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/37] kvm: Create kvm_arch_vcpu_id() function

2013-01-28 Thread Andreas Färber
From: Eduardo Habkost 

This will allow each architecture to define how the VCPU ID is set on
the KVM_CREATE_VCPU ioctl call.

Signed-off-by: Eduardo Habkost 
Acked-by: Gleb Natapov 
Signed-off-by: Andreas Färber 
---
 include/sysemu/kvm.h |3 +++
 kvm-all.c|2 +-
 target-i386/kvm.c|5 +
 target-ppc/kvm.c |5 +
 target-s390x/kvm.c   |5 +
 5 Dateien geändert, 19 Zeilen hinzugefügt(+), 1 Zeile entfernt(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 22acf91..384ee66 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -196,6 +196,9 @@ int kvm_arch_init(KVMState *s);
 
 int kvm_arch_init_vcpu(CPUState *cpu);
 
+/* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
+unsigned long kvm_arch_vcpu_id(CPUState *cpu);
+
 void kvm_arch_reset_vcpu(CPUState *cpu);
 
 int kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
diff --git a/kvm-all.c b/kvm-all.c
index 6278d61..363a358 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -222,7 +222,7 @@ int kvm_init_vcpu(CPUState *cpu)
 
 DPRINTF("kvm_init_vcpu\n");
 
-ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, cpu->cpu_index);
+ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)kvm_arch_vcpu_id(cpu));
 if (ret < 0) {
 DPRINTF("kvm_create_vcpu failed\n");
 goto err;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 3acff40..5f3f789 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -411,6 +411,11 @@ static void cpu_update_state(void *opaque, int running, 
RunState state)
 }
 }
 
+unsigned long kvm_arch_vcpu_id(CPUState *cpu)
+{
+return cpu->cpu_index;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 struct {
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 2f4f068..2c64c63 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -384,6 +384,11 @@ static inline void kvm_fixup_page_sizes(PowerPCCPU *cpu)
 
 #endif /* !defined (TARGET_PPC64) */
 
+unsigned long kvm_arch_vcpu_id(CPUState *cpu)
+{
+return cpu->cpu_index;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 PowerPCCPU *cpu = POWERPC_CPU(cs);
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index add6a58..99deddf 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -76,6 +76,11 @@ int kvm_arch_init(KVMState *s)
 return 0;
 }
 
+unsigned long kvm_arch_vcpu_id(CPUState *cpu)
+{
+return cpu->cpu_index;
+}
+
 int kvm_arch_init_vcpu(CPUState *cpu)
 {
 int ret = 0;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/37] target-i386: kvm: Set vcpu_id to APIC ID instead of CPU index

2013-01-28 Thread Andreas Färber
From: Eduardo Habkost 

The CPU ID in KVM is supposed to be the APIC ID, so change the
KVM_CREATE_VCPU call to match it. The current behavior didn't break
anything yet because today the APIC ID is assumed to be equal to the CPU
index, but this won't be true in the future.

Signed-off-by: Eduardo Habkost 
Reviewed-by: Marcelo Tosatti 
Acked-by: Gleb Natapov 
Signed-off-by: Andreas Färber 
---
 target-i386/kvm.c |5 +++--
 1 Datei geändert, 3 Zeilen hinzugefügt(+), 2 Zeilen entfernt(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5f3f789..c440809 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -411,9 +411,10 @@ static void cpu_update_state(void *opaque, int running, 
RunState state)
 }
 }
 
-unsigned long kvm_arch_vcpu_id(CPUState *cpu)
+unsigned long kvm_arch_vcpu_id(CPUState *cs)
 {
-return cpu->cpu_index;
+X86CPU *cpu = X86_CPU(cs);
+return cpu->env.cpuid_apic_id;
 }
 
 int kvm_arch_init_vcpu(CPUState *cs)
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 37/37] kvm: Pass CPUState to kvm_on_sigbus_vcpu()

2013-01-28 Thread Andreas Färber
Since commit 20d695a9254c1b086a456d3b79a3c311236643ba (kvm: Pass
CPUState to kvm_arch_*) CPUArchState is no longer needed.

Allows to change qemu_kvm_eat_signals() argument as well.

Signed-off-by: Andreas Färber 
Reviewed-by: Gleb Natapov 
---
 cpus.c   |8 
 include/sysemu/kvm.h |2 +-
 kvm-all.c|3 +--
 kvm-stub.c   |2 +-
 4 Dateien geändert, 7 Zeilen hinzugefügt(+), 8 Zeilen entfernt(-)

diff --git a/cpus.c b/cpus.c
index a4390c3..41779eb 100644
--- a/cpus.c
+++ b/cpus.c
@@ -517,7 +517,7 @@ static void qemu_init_sigbus(void)
 prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
 }
 
-static void qemu_kvm_eat_signals(CPUArchState *env)
+static void qemu_kvm_eat_signals(CPUState *cpu)
 {
 struct timespec ts = { 0, 0 };
 siginfo_t siginfo;
@@ -538,7 +538,7 @@ static void qemu_kvm_eat_signals(CPUArchState *env)
 
 switch (r) {
 case SIGBUS:
-if (kvm_on_sigbus_vcpu(env, siginfo.si_code, siginfo.si_addr)) {
+if (kvm_on_sigbus_vcpu(cpu, siginfo.si_code, siginfo.si_addr)) {
 sigbus_reraise();
 }
 break;
@@ -560,7 +560,7 @@ static void qemu_init_sigbus(void)
 {
 }
 
-static void qemu_kvm_eat_signals(CPUArchState *env)
+static void qemu_kvm_eat_signals(CPUState *cpu)
 {
 }
 #endif /* !CONFIG_LINUX */
@@ -727,7 +727,7 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
 qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
 }
 
-qemu_kvm_eat_signals(env);
+qemu_kvm_eat_signals(cpu);
 qemu_wait_io_event_common(cpu);
 }
 
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 384ee66..6e6dfb3 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -159,7 +159,7 @@ int kvm_update_guest_debug(CPUArchState *env, unsigned long 
reinject_trap);
 int kvm_set_signal_mask(CPUArchState *env, const sigset_t *sigset);
 #endif
 
-int kvm_on_sigbus_vcpu(CPUArchState *env, int code, void *addr);
+int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
 int kvm_on_sigbus(int code, void *addr);
 
 /* internal API */
diff --git a/kvm-all.c b/kvm-all.c
index 363a358..04ec2d5 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2026,9 +2026,8 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, 
uint16_t val, bool assign)
 return 0;
 }
 
-int kvm_on_sigbus_vcpu(CPUArchState *env, int code, void *addr)
+int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr)
 {
-CPUState *cpu = ENV_GET_CPU(env);
 return kvm_arch_on_sigbus_vcpu(cpu, code, addr);
 }
 
diff --git a/kvm-stub.c b/kvm-stub.c
index 47f8dca..760aadc 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -112,7 +112,7 @@ int kvm_set_ioeventfd_mmio(int fd, uint32_t adr, uint32_t 
val, bool assign, uint
 return -ENOSYS;
 }
 
-int kvm_on_sigbus_vcpu(CPUArchState *env, int code, void *addr)
+int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr)
 {
 return 1;
 }
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] trace-cmd: add kvm_mmu_prepare_zap_page even and fix kvm_mmu_get_page event output in kvm plugin

2013-01-28 Thread Gleb Natapov
On Mon, Jan 28, 2013 at 10:23:50AM -0500, Steven Rostedt wrote:
> On Mon, 2013-01-28 at 13:12 +0200, Gleb Natapov wrote:
> > Ping.
> > 
> > On Thu, Dec 27, 2012 at 01:34:15PM +0200, Gleb Natapov wrote:
> > > kvm_mmu_zap_page event was renamed to kvm_mmu_prepare_zap_page. Add new
> > > even, but leave the old one to parse older traces.  Print out "created"
> > > field for kvm_mmu_get_page event.
> > > 
> > > Signed-off-by: Gleb Natapov 
> > > diff --git a/plugin_kvm.c b/plugin_kvm.c
> > > index 55812ef..9b376d8 100644
> > > --- a/plugin_kvm.c
> > > +++ b/plugin_kvm.c
> > > @@ -382,7 +382,7 @@ static int kvm_mmu_print_role(struct trace_seq *s, 
> > > struct pevent_record *record,
> > >   } else
> > >   trace_seq_printf(s, "WORD: %08x", role.word);
> > >  
> > > - pevent_print_num_field(s, " root %u",  event,
> > > + pevent_print_num_field(s, " root %u ",  event,
> > >  "root_count", record, 1);
> > >  
> > >   if (pevent_get_field_val(s, event, "unsync", record, &val, 1) < 0)
> > > @@ -397,6 +397,11 @@ static int kvm_mmu_get_page_handler(struct trace_seq 
> > > *s, struct pevent_record *r
> > >  {
> > >   unsigned long long val;
> > >  
> > > + if (pevent_get_field_val(s, event, "created", record, &val, 1) < 0)
> 
> Is "created" a new field? Or is it something that has always been there
> but never displayed?
> 
Always been there, but for some reason, was not displayed.

> If it is new, then instead of returning '-1' if it's not found, could
> you just ignore it. I don't want old kernels to start breaking on new
> trace-cmd plugins.
> 
> Thanks,
> 
> -- Steve
> 
> > > + return -1;
> > > +
> > > + trace_seq_printf(s, "%s ", val ? "new" : "existing");
> > > +
> > >   if (pevent_get_field_val(s, event, "gfn", record, &val, 1) < 0)
> > >   return -1;
> > >  
> > > @@ -433,5 +438,9 @@ int PEVENT_PLUGIN_LOADER(struct pevent *pevent)
> > >   pevent_register_event_handler(pevent, -1, "kvmmmu", "kvm_mmu_zap_page",
> > > kvm_mmu_print_role, NULL);
> > >  
> > > + pevent_register_event_handler(pevent, -1, "kvmmmu",
> > > + "kvm_mmu_prepare_zap_page", kvm_mmu_print_role,
> > > + NULL);
> > > +
> > >   return 0;
> > >  }
> > > --
> > >   Gleb.
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > the body of a message to majord...@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > --
> > Gleb.
> 

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vfio-pci: Enable PCIe extended config space

2013-01-28 Thread Michael S. Tsirkin
On Wed, Jan 23, 2013 at 05:46:13PM -0700, Alex Williamson wrote:
> We don't know pre-init time whether the device we're exposing is PCIe
> or legacy PCI.  We could ask for it to be specified via a device
> option, but that seems like too much to ask of the user.  Instead we
> can assume everything will be PCIe, which makes PCI-core allocate
> enough config space.  Removing the flag during init leaves the space
> allocated, but allows legacy PCI devices to report the real device
> config space size to rest of Qemu.
> 
> Signed-off-by: Alex Williamson 

It's a bit of a hack but I don't have a better idea.
Acked-by: Michael S. Tsirkin 


> ---
>  hw/vfio_pci.c |4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
> index c51ae67..66537b7 100644
> --- a/hw/vfio_pci.c
> +++ b/hw/vfio_pci.c
> @@ -1899,6 +1899,9 @@ static int vfio_get_device(VFIOGroup *group, const char 
> *name, VFIODevice *vdev)
>  (unsigned long)reg_info.flags);
>  
>  vdev->config_size = reg_info.size;
> +if (vdev->config_size == PCI_CONFIG_SPACE_SIZE) {
> +vdev->pdev.cap_present &= ~QEMU_PCI_CAP_EXPRESS;
> +}
>  vdev->config_offset = reg_info.offset;
>  
>  error:
> @@ -2121,6 +2124,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, 
> void *data)
>  pdc->exit = vfio_exitfn;
>  pdc->config_read = vfio_pci_read_config;
>  pdc->config_write = vfio_pci_write_config;
> +pdc->is_express = 1; /* We might be */
>  }
>  
>  static const TypeInfo vfio_pci_dev_info = {
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] trace-cmd: add kvm_mmu_prepare_zap_page even and fix kvm_mmu_get_page event output in kvm plugin

2013-01-28 Thread Steven Rostedt
On Mon, 2013-01-28 at 19:00 +0200, Gleb Natapov wrote:

> > Is "created" a new field? Or is it something that has always been there
> > but never displayed?
> > 
> Always been there, but for some reason, was not displayed.

OK, I'll apply it then. Thanks!

-- Steve

> 
> > If it is new, then instead of returning '-1' if it's not found, could
> > you just ignore it. I don't want old kernels to start breaking on new
> > trace-cmd plugins.
> > 
> >

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] VFIO: Wrappers for getting/putting reference to vfio_device

2013-01-28 Thread Alex Williamson
On Mon, 2013-01-28 at 09:54 +, Pandarathil, Vijaymohan R wrote:
>   - Added vfio_device_get_from_vdev(), vfio_device_put_vdev()
>   as wrappers to get/put reference to vfio_device from struct device.
> 
>   - Added vfio_device_data() as a wrapper to get device_data from
>   vfio_device.
> 
> Signed-off-by: Vijay Mohan Pandarathil 
> ---
>  drivers/vfio/vfio.c  | 47 +--
>  include/linux/vfio.h |  3 +++
>  2 files changed, 44 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 12c264d..c2ff1b2 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -642,8 +642,13 @@ int vfio_add_group_dev(struct device *dev,
>  }
>  EXPORT_SYMBOL_GPL(vfio_add_group_dev);
>  
> -/* Test whether a struct device is present in our tracking */
> -static bool vfio_dev_present(struct device *dev)
> +/**
> + * This does a get on the corresponding iommu_group,
> + * vfio_group and the vfio_device. Callers of this
> + * function will hae to call vfio_put_vdev() to

s/hae/have/

Note that your commit log and the patch don't agree on names.
vfio_device_get_from_dev vs vfio_device_get_from_vdev.  vfio_put_vdev vs
vfio_device_put_vdev.  Personally I think they should be:

vfio_device_get_from_dev
vfio_device_put

> + * remove the reference to all objects.

Why do we need to hold references to all the objects?  Holding a
reference to a vfio_device should implicitly hold the vfio_group, which
implicitly holds the iommu_group.  We have to get each to get to the
vfio_device, but once we hold of that reference I believe we can let the
others go.  Is this not the case?

> + */
> +void *vfio_device_get_from_dev(struct device *dev)

Return should be struct vfio_device*.  It doesn't have to be void to be
opaque.

>  {
>   struct iommu_group *iommu_group;
>   struct vfio_group *group;
> @@ -651,25 +656,55 @@ static bool vfio_dev_present(struct device *dev)
>  
>   iommu_group = iommu_group_get(dev);
>   if (!iommu_group)
> - return false;
> + return NULL;
>  
>   group = vfio_group_get_from_iommu(iommu_group);
>   if (!group) {
>   iommu_group_put(iommu_group);
> - return false;
> + return NULL;
>   }
>  
>   device = vfio_group_get_device(group, dev);
>   if (!device) {
>   vfio_group_put(group);
>   iommu_group_put(iommu_group);
> - return false;
> + return NULL;
>   }
> + return device;
> +}
> +EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
> +
> +void *vfio_device_data(void *data)
> +{

Why wouldn't this take struct vfio_device*?  We're ignoring free type
checking errors even though the user should be treating the device as
opaque.

> + struct vfio_device *device = data;
> + return device->device_data;
> +}
> +EXPORT_SYMBOL_GPL(vfio_device_data);
> +
> +void vfio_device_put_vdev(void *data)
> +{

This also should take a struct vfio_device* and be called
vfio_device_put().  If we fix the above extra reference holding then
it's just the existingvfio_device_put, which just needs to be exported.

> + struct vfio_device *device = data;
> + struct vfio_group *group = device->group;
> + struct iommu_group *iommu_group = group->iommu_group;
>  
>   vfio_device_put(device);
>   vfio_group_put(group);
>   iommu_group_put(iommu_group);
> - return true;
> + return;

Unnecessary explicit return.  Thanks,

Alex

> +}
> +EXPORT_SYMBOL_GPL(vfio_device_put_vdev);
> +
> +/* Test whether a struct device is present in our tracking */
> +static bool vfio_dev_present(struct device *dev)
> +{
> + struct vfio_device *device;
> +
> + device = vfio_device_get_from_dev(dev);
> + if (device) {
> + vfio_device_put_vdev(device);
> + return true;
> + } else
> + return false;
>  }
>  
>  /*
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index ab9e862..e550c09 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -45,6 +45,9 @@ extern int vfio_add_group_dev(struct device *dev,
> void *device_data);
>  
>  extern void *vfio_del_group_dev(struct device *dev);
> +extern void *vfio_device_get_from_dev(struct device *dev);
> +extern void vfio_device_put_vdev(void *device);
> +extern void *vfio_device_data(void *device);
>  
>  /**
>   * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER

2013-01-28 Thread Alex Williamson
On Mon, 2013-01-28 at 09:54 +, Pandarathil, Vijaymohan R wrote:
>   - New VFIO_SET_IRQ ioctl option to pass the eventfd that is signalled 
> when
>   an error occurs in the vfio_pci_device
> 
>   - Register pci_error_handler for the vfio_pci driver
> 
>   - When the device encounters an error, the error handler registered by
>   the vfio_pci driver gets invoked by the AER infrastructure
> 
>   - In the error handler, signal the eventfd registered for the device.
> 
>   - This results in the qemu eventfd handler getting invoked and
>   appropriate action taken for the guest.
> 
> Signed-off-by: Vijay Mohan Pandarathil 
> ---
>  drivers/vfio/pci/vfio_pci.c | 44 
> -
>  drivers/vfio/pci/vfio_pci_intrs.c   | 32 +++
>  drivers/vfio/pci/vfio_pci_private.h |  1 +
>  include/uapi/linux/vfio.h   |  3 +++
>  4 files changed, 79 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index b28e66c..ff2a078 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -196,7 +196,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device 
> *vdev, int irq_type)
>  
>   return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>   }
> - }
> + } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX)
> + if (pci_is_pcie(vdev->pdev))
> + return 1;
>  
>   return 0;
>  }
> @@ -223,9 +225,18 @@ static long vfio_pci_ioctl(void *device_data,
>   if (vdev->reset_works)
>   info.flags |= VFIO_DEVICE_FLAGS_RESET;
>  
> + if (pci_is_pcie(vdev->pdev)) {
> + info.flags |= VFIO_DEVICE_FLAGS_PCI_AER;
> + info.flags |= VFIO_DEVICE_FLAGS_PCI_AER_NOTIFY;

Not sure this second flag should be AER specific or if it's even needed,
see below for more comments on this.

> + }
> +
>   info.num_regions = VFIO_PCI_NUM_REGIONS;
>   info.num_irqs = VFIO_PCI_NUM_IRQS;
>  
> + /* Expose only implemented IRQs */
> + if (!(info.flags & VFIO_DEVICE_FLAGS_PCI_AER_NOTIFY))
> + info.num_irqs--;

I'm having second thoughts on this, see further below.

> +
>   return copy_to_user((void __user *)arg, &info, minsz);
>  
>   } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
> @@ -302,6 +313,10 @@ static long vfio_pci_ioctl(void *device_data,
>   if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
>   return -EINVAL;
>  
> + if ((info.index == VFIO_PCI_ERR_IRQ_INDEX) &&
> +  !pci_is_pcie(vdev->pdev))
> + return -EINVAL;
> +

Perhaps we could incorporate the index test above this too?

switch (info.index) {
case VFIO_PCI_INTX_IRQ_INDEX: ... VFIO_PCI_MSIX_IRQ_INDEX:
break;
case VFIO_PCI_ERR_IRQ_INDEX:
if (pci_is_pcie(vdev->pdev))
break;
default:
return -EINVAL;
}

This is more similar to how I've re-written the same for the proposed
VGA/legacy I/O support.

>   info.flags = VFIO_IRQ_INFO_EVENTFD;
>  
>   info.count = vfio_pci_get_irq_count(vdev, info.index);
> @@ -538,11 +553,38 @@ static void vfio_pci_remove(struct pci_dev *pdev)
>   kfree(vdev);
>  }
>  
> +static pci_ers_result_t vfio_err_detected(struct pci_dev *pdev,
> + pci_channel_state_t state)

This is actually AER specific, right?  So perhaps it should be
vfio_pci_aer_err_detected?

Also, please follow existing whitespace usage throughout, tabs followed
by spaces to align function parameter wrap.

> +{
> + struct vfio_pci_device *vpdev;
> + void *vdev;

struct vfio_device *vdev;

> +
> + vdev = vfio_device_get_from_dev(&pdev->dev);
> + if (vdev == NULL)
> + return PCI_ERS_RESULT_DISCONNECT;
> +
> + vpdev = vfio_device_data(vdev);
> + if (vpdev == NULL)
> + return PCI_ERS_RESULT_DISCONNECT;
> +
> + if (vpdev->err_trigger)
> + eventfd_signal(vpdev->err_trigger, 1);
> +
> + vfio_device_put_vdev(vdev);
> +
> + return PCI_ERS_RESULT_CAN_RECOVER;
> +}
> +
> +static const struct pci_error_handlers vfio_err_handlers = {
> + .error_detected = vfio_err_detected,
> +};
> +
>  static struct pci_driver vfio_pci_driver = {
>   .name   = "vfio-pci",
>   .id_table   = NULL, /* only dynamic ids */
>   .probe  = vfio_pci_probe,
>   .remove = vfio_pci_remove,
> + .err_handler= &vfio_err_handlers,
>  };
>  
>  static void __exit vfio_pci_cleanup(void)
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
> b/drivers/vfio/pci/vfio_pci_intrs.c
> index 3639371..f003e08 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -745,6 +745,31 @@ static int vfio_pci_se

Re: [PATCH] kvm: Fix irqfd resampler list walk

2013-01-28 Thread Marcelo Tosatti
On Mon, Jan 28, 2013 at 08:25:00AM -0700, Alex Williamson wrote:
> On Mon, 2012-12-10 at 18:16 -0200, Marcelo Tosatti wrote:
> > On Thu, Dec 06, 2012 at 02:44:59PM -0700, Alex Williamson wrote:
> > > Typo for the next pointer means we're walking random data here.
> > > 
> > > Signed-off-by: Alex Williamson 
> > > Cc: sta...@vger.kernel.org [3.7]
> > > ---
> > > 
> > > Not sure if this will make 3.7, so preemptively adding the stable flag
> > > 
> > >  virt/kvm/eventfd.c |2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > Applied, thanks.
> > 
> 
> Hi Marcelo,
> 
> This didn't seem to make it into 3.7.0 or any stable 3.7.  Can we
> promote it for stable?  Thanks,
> 
> Alex

Yes, please send the patch with the backport you're interested in being
included to sta...@kernel.org.

http://www.kernel.org/doc/Documentation/stable_kernel_rules.txt


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of Fault Tolerance feature?

2013-01-28 Thread Brian Jackson
On Mon, 21 Jan 2013 14:24:12 +0200
Andres Toomsalu  wrote:

> Hi,
> 
> Could anyone shed a light what happened to Kemari project and are
> there any upcoming development planned in order to provide continous
> non-blocking VM checkpointing and VM HA with state replication?
> 
> Kind regards,

The project hasn't been actively developed in years and there has been
no public information about it in probably longer. So state is
"unknown".
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: windows 2008 guest causing rcu_shed to emit NMI

2013-01-28 Thread Andrey Korolyov
On Mon, Jan 28, 2013 at 5:56 PM, Andrey Korolyov  wrote:
> On Mon, Jan 28, 2013 at 3:14 AM, Marcelo Tosatti  wrote:
>> On Mon, Jan 28, 2013 at 12:04:50AM +0300, Andrey Korolyov wrote:
>>> On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti  
>>> wrote:
>>> > On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote:
>>> >> On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti  
>>> >> wrote:
>>> >> > On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
>>> >> >> Thank you Marcelo,
>>> >> >>
>>> >> >> Host node locking up sometimes later than yesterday, bur problem still
>>> >> >> here, please see attached dmesg. Stuck process looks like
>>> >> >> root 19251  0.0  0.0 228476 12488 ?D14:42   0:00
>>> >> >> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
>>> >> >> virtio-blk-pci,? -device
>>> >> >>
>>> >> >> on fourth vm by count.
>>> >> >>
>>> >> >> Should I try upstream kernel instead of applying patch to the latest
>>> >> >> 3.4 or it is useless?
>>> >> >
>>> >> > If you can upgrade to an upstream kernel, please do that.
>>> >> >
>>> >>
>>> >> With vanilla 3.7.4 there is almost no changes, and NMI started firing
>>> >> again. External symptoms looks like following: starting from some
>>> >> count, may be third or sixth vm, qemu-kvm process allocating its
>>> >> memory very slowly and by jumps, 20M-200M-700M-1.6G in minutes. Patch
>>> >> helps, of course - on both patched 3.4 and vanilla 3.7 I`m able to
>>> >> kill stuck kvm processes and node returned back to the normal, when on
>>> >> 3.2 sending SIGKILL to the process causing zombies and hanged ``ps''
>>> >> output (problem and workaround when no scheduler involved described
>>> >> here http://www.spinics.net/lists/kvm/msg84799.html).
>>> >
>>> > Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module 
>>> > parameter.
>>> >
>>>
>>> Hi Marcelo,
>>>
>>> thanks, this parameter helped to increase number of working VMs in a
>>> half of order of magnitude, from 3-4 to 10-15. Very high SY load, 10
>>> to 15 percents, persists on such numbers for a long time, where linux
>>> guests in same configuration do not jump over one percent even under
>>> stress bench. After I disabled HT, crash happens only in long runs and
>>> now it is kernel panic :)
>>> Stair-like memory allocation behaviour disappeared, but other symptom
>>> leading to the crash which I have not counted previously, persists: if
>>> VM count is ``enough'' for crash, some qemu processes starting to eat
>>> one core, and they`ll panic system after run in tens of minutes in
>>> such state or if I try to attach debugger to one of them. If needed, I
>>> can log entire crash output via netconsole, now I have some tail,
>>> almost the same every time:
>>> http://xdel.ru/downloads/btwin.png
>>
>> Yes, please log entire crash output, thanks.
>>
>
> Here please, 3.7.4-vanilla, 16 vms, ple_gap=0:
>
> http://xdel.ru/downloads/oops-default-kvmintel.txt

Just an update: I was able to reproduce that on pure linux VMs using
qemu-1.3.0 and ``stress'' benchmark running on them - panic occurs at
start of vm(with count ten working machines at the moment). Qemu-1.1.2
generally is not able to reproduce that, but host node with older
version crashing on less amount of Windows VMs(three to six instead
ten to fifteen) than with 1.3, please see trace below:

http://xdel.ru/downloads/oops-old-qemu.txt
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3] KVM: VMX: enable acknowledge interupt on vmexit

2013-01-28 Thread Zhang, Yang Z
Gleb Natapov wrote on 2013-01-28:
> On Mon, Jan 28, 2013 at 08:54:07AM +0800, Yang Zhang wrote:
>> From: Yang Zhang 
>> 
>> The "acknowledge interrupt on exit" feature controls processor behavior
>> for external interrupt acknowledgement. When this control is set, the
>> processor acknowledges the interrupt controller to acquire the
>> interrupt vector on VM exit.
>> 
>> After enabling this feature, an interrupt which arrived when target cpu
>> is running in vmx non-root mode will be handled by vmx handler instead
>> of handler in idt. Currently, vmx handler only fakes an interrupt stack
>> and jump to idt table to let real handler to handle it. Further, we
>> will recognize the interrupt and only delivery the interrupt which not
>> belong to current vcpu through idt table. The interrupt which belonged
>> to current vcpu will be handled inside vmx handler. This will reduce
>> the interrupt handle cost of KVM.
>> 
>> Refer to Intel SDM volum 3, chapter 33.2.
>> 
>> Signed-off-by: Yang Zhang 
>> ---
>>  arch/x86/include/asm/kvm_host.h |2 + arch/x86/kvm/svm.c   
>>|6  arch/x86/kvm/vmx.c  |   61
>>  -- arch/x86/kvm/x86.c 
>>  |3 +- 4 files changed, 67 insertions(+), 5 deletions(-)
>> diff --git a/arch/x86/include/asm/kvm_host.h
>> b/arch/x86/include/asm/kvm_host.h index 77d56a4..07daf10 100644 ---
>> a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -340,6 +340,7 @@ struct kvm_vcpu_arch {
>>  unsigned long cr8;
>>  u32 hflags;
>>  u64 efer;
>> +unsigned long host_idt_base;
> Should be in vmx.
> 
>>  u64 apic_base;  struct kvm_lapic *apic;/* kernel irqchip context
>>  */  unsigned long apic_attention; @@ -725,6 +726,7 @@ struct
>>  kvm_x86_ops {   int (*check_intercept)(struct kvm_vcpu *vcpu,   
>>   
>>  struct x86_instruction_info *info, enum 
>> x86_intercept_stage
>>  stage); +   void (*handle_external_intr)(struct kvm_vcpu *vcpu); };
>>  
>>  struct kvm_arch_async_pf {
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index d29d3cd..c283185 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -4227,6 +4227,11 @@ out:
>>  return ret;
>>  }
>> +static void svm_handle_external_intr(struct kvm_vcpu *vcpu)
>> +{
>> +local_irq_enable();
>> +}
>> +
>>  static struct kvm_x86_ops svm_x86_ops = {   .cpu_has_kvm_support =
>>  has_svm,.disabled_by_bios = is_disabled, @@ -4318,6 +4323,7 @@
>>  static struct kvm_x86_ops svm_x86_ops = {   .set_tdp_cr3 = set_tdp_cr3,
>>  
>>  .check_intercept = svm_check_intercept, +   .handle_external_intr =
>>  svm_handle_external_intr, };
>>  
>>  static int __init svm_init(void)
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 02eeba8..243ce45 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2565,7 +2565,8 @@ static __init int setup_vmcs_config(struct
> vmcs_config *vmcs_conf)
>>  #ifdef CONFIG_X86_64
>>  min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>>  #endif
>> -opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT;
>> +opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT |
>> +VM_EXIT_ACK_INTR_ON_EXIT;
>>  if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
>>  &_vmexit_control) < 0)
>>  return -EIO;
>> @@ -3742,7 +3743,7 @@ static void vmx_disable_intercept_for_msr(u32 msr,
> bool longmode_only)
>>   * Note that host-state that does change is set elsewhere. E.g., host-state
>>   * that is set differently for each CPU is set in vmx_vcpu_load(), not here.
>>   */
>> -static void vmx_set_constant_host_state(void)
>> +static void vmx_set_constant_host_state(struct kvm_vcpu *vcpu)
>>  {
>>  u32 low32, high32;
>>  unsigned long tmpl;
>> @@ -3770,6 +3771,7 @@ static void vmx_set_constant_host_state(void)
>> 
>>  native_store_idt(&dt);
>>  vmcs_writel(HOST_IDTR_BASE, dt.address);   /* 22.2.4 */
>> +vcpu->arch.host_idt_base = dt.address;
>> 
>>  vmcs_writel(HOST_RIP, vmx_return); /* 22.2.5 */
>> @@ -3884,7 +3886,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>> 
>>  vmcs_write16(HOST_FS_SELECTOR, 0);/* 22.2.4 */
>>  vmcs_write16(HOST_GS_SELECTOR, 0);/* 22.2.4 */
>> -vmx_set_constant_host_state();
>> +vmx_set_constant_host_state(&vmx->vcpu);
>>  #ifdef CONFIG_X86_64
>>  rdmsrl(MSR_FS_BASE, a);
>>  vmcs_writel(HOST_FS_BASE, a); /* 22.2.4 */
>> @@ -6094,6 +6096,56 @@ static void vmx_complete_atomic_exit(struct
> vcpu_vmx *vmx)
>>  }
>>  }
>> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu) +{ +u32
>> exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); +   if ((exit_intr_info
>> & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK)) +   
>> ==
>> (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) { + unsigned int 
>> vector;
>> +unsigned 

Re: [PATCH v13 0/3] x86, apicv: Add APIC virtualization support

2013-01-28 Thread Marcelo Tosatti
On Fri, Jan 25, 2013 at 10:18:48AM +0800, Yang Zhang wrote:
> From: Yang Zhang 
> 
> APIC virtualization is a new feature which can eliminate most of VM exit
> when vcpu handle a interrupt:
> 
> APIC register virtualization:
> APIC read access doesn't cause APIC-access VM exits.
> APIC write becomes trap-like.
> 
> Virtual interrupt delivery:
 Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
> manually, which is fully taken care of by the hardware.
> 
> The first patch adds APIC register virtualization supporting.
> The second patch adds virtual x2apic mode supporting since it is required by 
> APICv when guest uses msr based way to access APIC.
> The third patch adds virtual interrupt delivery supporting.
> 
> Please refer to Intel SDM volume 3, chapter 29 for more details.
> Changes v12 to v13:
>  * Remove unnecessary check when set virtualized apic access
>  * Use vm_need_tpr_shadow() instead read vmcs to check tpr.
>  * Check irqchip_in_kernel when set msr bitmap.
>  * Correct comment format
>  * Remove unnecessary callback when set eoi exit bitmap.
>  * Disable vid when irqchip is in userspace.
>  * Rename vmx_vcpu_has_apicv to vmx_vm_has_apicv.
>  * Rebased on top of KVM upstream
> 
> Changes v11 to v12:
>  * Check irqchip in kernel when enabling apicv, if using userspace irq chip,
>apicv cannot be used and must be disabled.
>  * Rename some fucntion to more descriptive name.
>  * Move ioapic entry pase logic to lapic.c
>  * Rebased on top of KVM upstream
> 
> Changes v10 to v11:
>  * Use two new msr bitmaps for guest that enabling x2apic mode:
>Since msr bitmap is shared by all guests, it will break guest that
>not using x2apic when updating the global msr bitmap. To solve this,
>we use two new msr bitmap for guest which using x2apic.
> 
> Changes v9 to v10:
>  * Enable virtualize x2apic mode when guest is using x2apic and apicv:
>There is no point to enable x2apic mode when apicv is disabled.
>  * Grep ioapic_lock when traversing ioapic entry to set eoi exit bitmap
>  * Rebased on top of KVM upstream
> 
> Changes v8 to v9:
>  * Update eoi exit bitmap by vcpu itself.
>  * Enable virtualize x2apic mode when guest is using x2apic.
>  * Rebase on top of KVM upstream
> 
> Changes v7 to v8:
>  * According Marcelo's suggestion, add comments for irr_pending and isr_count,
>since the two valiables have different meaning when using apicv.
>  * Set highest bit in vISR to SVI after migation.
>  * Use spinlock to access eoi exit bitmap synchronously.
>  * Enable virtualize x2apic mode when guest is using x2apic
>  * Rebased on top of KVM upstream.
>  
> 
> Yang Zhang (3):
>   x86, apicv: add APICv register virtualization support
>   x86, apicv: add virtual x2apic support
>   x86, apicv: add virtual interrupt delivery support
> 
>  arch/ia64/kvm/lapic.h   |6 +
>  arch/x86/include/asm/kvm_host.h |6 +
>  arch/x86/include/asm/vmx.h  |   14 ++
>  arch/x86/kvm/irq.c  |   56 ++-
>  arch/x86/kvm/lapic.c|  140 +
>  arch/x86/kvm/lapic.h|   34 
>  arch/x86/kvm/svm.c  |   24 +++
>  arch/x86/kvm/vmx.c  |  327 
> ---
>  arch/x86/kvm/x86.c  |   23 +++-
>  include/linux/kvm_host.h|3 +
>  virt/kvm/ioapic.c   |   39 +
>  virt/kvm/ioapic.h   |4 +
>  virt/kvm/irq_comm.c |   25 +++
>  virt/kvm/kvm_main.c |5 +
>  14 files changed, 647 insertions(+), 59 deletions(-)

Reviewed-by: Marcelo Tosatti 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 03/12] KVM: MMU: simplify mmu_set_spte

2013-01-28 Thread Marcelo Tosatti
On Wed, Jan 23, 2013 at 06:05:29PM +0800, Xiao Guangrong wrote:
> In order to detecting spte remapping, we can simply check whether the
> spte has already been pointing to the pfn even if the spte is not the
> last spte, for middle spte is pointing to the kernel pfn which can not
> be mapped to userspace

This check is detecting spte overwrite, when a large spte is replaced by 
pointer to spte table.

Can't see why check for different pfn is safe: only 'int level' can
differ, and pfn be equivalent, for example.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 06/12] KVM: MMU: introduce a static table to map guest access to spte access

2013-01-28 Thread Marcelo Tosatti
On Fri, Jan 25, 2013 at 10:46:31AM +0800, Xiao Guangrong wrote:
> On 01/25/2013 08:15 AM, Marcelo Tosatti wrote:
> > On Wed, Jan 23, 2013 at 06:07:20PM +0800, Xiao Guangrong wrote:
> >> It makes set_spte more clean and reduces branch prediction
> >>
> >> Signed-off-by: Xiao Guangrong 
> >> ---
> >>  arch/x86/kvm/mmu.c |   37 ++---
> >>  1 files changed, 26 insertions(+), 11 deletions(-)
> > 
> > Don't see set_spte as being a performance problem?
> > IMO the current code is quite simple.
> 
> Yes, this is not a performance problem.
> 
> I just dislike this many continuous "if"-s in the function:
> 
> if (xxx)
>   xxx
> if (xxx)
>   xxx
> 
> 
> Totally, it has 7 "if"-s before this patch.
> 
> Okay, if you think this is unnecessary, i will drop this patch. :)

Yes, please (unless you can show set_spte is a performance problem).


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/11] ksm: NUMA trees and page migration

2013-01-28 Thread Hugh Dickins
On Mon, 28 Jan 2013, Andrew Morton wrote:
> On Fri, 25 Jan 2013 17:53:10 -0800 (PST)
> Hugh Dickins  wrote:
> 
> > Here's a KSM series
> 
> Sanity check: do you have a feeling for how useful KSM is? 
> Performance/space improvements for typical (or atypical) workloads? 
> Are people using it?  Successfully?
> 
> IOW, is it justifying itself?

I have no idea!  To me it's simply a technical challenge - and I agree
with your implication that that's not a good enough justification.

I've added Marcelo and Gleb and the KVM list to the Cc:
my understanding is that it's the KVM guys who really appreciate KSM.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] KVM: set_memory_region: Identify the requested change explicitly

2013-01-28 Thread Takuya Yoshikawa
KVM_SET_USER_MEMORY_REGION forces __kvm_set_memory_region() to identify
what kind of change is being requested by checking the arguments.  The
current code does this checking at various points in code and each
condition being used there is not easy to understand at first glance.

This patch consolidates these checks and introduces an enum to name the
possible changes to clean up the code.

Although this does not introduce any functional changes, there is one
change which optimizes the code a bit: if we have nothing to change, the
new code returns 0 immediately.

Note that the return value for this case cannot be changed since QEMU
relies on it: we noticed this when we changed it to -EINVAL and got a
section mismatch error at the final stage of live migration.

Signed-off-by: Takuya Yoshikawa 
---
v2: updated iommu related parts
v3: converted !(A == B) to A != B
Note: based on this patch, disallowing RO change will become trivial:
check such changes when we identify KVM_MR_FLAGS_ONLY.

 virt/kvm/kvm_main.c |   64 +++
 1 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3fec2cd..c07be48 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -714,6 +714,24 @@ static struct kvm_memslots *install_new_memslots(struct 
kvm *kvm,
 }
 
 /*
+ * KVM_SET_USER_MEMORY_REGION ioctl allows the following operations:
+ * - create a new memory slot
+ * - delete an existing memory slot
+ * - modify an existing memory slot
+ *   -- move it in the guest physical memory space
+ *   -- just change its flags
+ *
+ * Since flags can be changed by some of these operations, the following
+ * differentiation is the best we can do for __kvm_set_memory_region():
+ */
+enum kvm_mr_change {
+   KVM_MR_CREATE,
+   KVM_MR_DELETE,
+   KVM_MR_MOVE,
+   KVM_MR_FLAGS_ONLY,
+};
+
+/*
  * Allocate some memory and give it an address in the guest physical address
  * space.
  *
@@ -732,6 +750,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
struct kvm_memory_slot old, new;
struct kvm_memslots *slots = NULL, *old_memslots;
bool old_iommu_mapped;
+   enum kvm_mr_change change;
 
r = check_memory_region_flags(mem);
if (r)
@@ -775,17 +794,30 @@ int __kvm_set_memory_region(struct kvm *kvm,
 
old_iommu_mapped = old.npages;
 
-   /*
-* Disallow changing a memory slot's size or changing anything about
-* zero sized slots that doesn't involve making them non-zero.
-*/
r = -EINVAL;
-   if (npages && old.npages && npages != old.npages)
-   goto out;
-   if (!npages && !old.npages)
+   if (npages) {
+   if (!old.npages)
+   change = KVM_MR_CREATE;
+   else { /* Modify an existing slot. */
+   if ((mem->userspace_addr != old.userspace_addr) ||
+   (npages != old.npages))
+   goto out;
+
+   if (base_gfn != old.base_gfn)
+   change = KVM_MR_MOVE;
+   else if (new.flags != old.flags)
+   change = KVM_MR_FLAGS_ONLY;
+   else { /* Nothing to change. */
+   r = 0;
+   goto out;
+   }
+   }
+   } else if (old.npages) {
+   change = KVM_MR_DELETE;
+   } else /* Modify a non-existent slot: disallowed. */
goto out;
 
-   if ((npages && !old.npages) || (base_gfn != old.base_gfn)) {
+   if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
/* Check for overlaps */
r = -EEXIST;
kvm_for_each_memslot(slot, kvm->memslots) {
@@ -803,20 +835,12 @@ int __kvm_set_memory_region(struct kvm *kvm,
new.dirty_bitmap = NULL;
 
r = -ENOMEM;
-
-   /*
-* Allocate if a slot is being created.  If modifying a slot,
-* the userspace_addr cannot change.
-*/
-   if (!old.npages) {
+   if (change == KVM_MR_CREATE) {
new.user_alloc = user_alloc;
new.userspace_addr = mem->userspace_addr;
 
if (kvm_arch_create_memslot(&new, npages))
goto out_free;
-   } else if (npages && mem->userspace_addr != old.userspace_addr) {
-   r = -EINVAL;
-   goto out_free;
}
 
/* Allocate page dirty bitmap if needed */
@@ -825,7 +849,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
goto out_free;
}
 
-   if (!npages || base_gfn != old.base_gfn) {
+   if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
r = -ENOMEM;
slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
GFP_KERNEL

Re: [PATCH 0/2] kvm: IOMMU read-only mapping support

2013-01-28 Thread Takuya Yoshikawa
On Mon, 28 Jan 2013 08:36:56 -0700
Alex Williamson  wrote:

> On Mon, 2013-01-28 at 21:25 +0900, Takuya Yoshikawa wrote:
> > On Mon, 28 Jan 2013 12:59:03 +0200
> > Gleb Natapov  wrote:
> > 
> > > > It sets spte based on the old value that means the readonly flag check
> > > > is missed. We need to call kvm_arch_flush_shadow_all under this case.
> > > Why not just disallow changing memory region KVM_MEM_READONLY flag
> > > without deleting the region?
> > 
> > Sounds good.
> > 
> > If you prefer, I can fold the required change into my patch.
> 
> That would seem to make my patch 1/2 unnecessary.  Thanks,

I've decided not to fold that change into my
  KVM: set_memory_region: Identify the requested change explicitly
since this is a functional change, see v3 I posted today.

Since QEMU is not the only hypervisor using KVM, we should make the
change with a proper subject informing every user of the new restriction.

If the maintainers prefer to remove Alex's patch 1/2 first and then
rebase my v3 patch, I can do so.

Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 03/12] KVM: MMU: simplify mmu_set_spte

2013-01-28 Thread Xiao Guangrong
On 01/29/2013 08:21 AM, Marcelo Tosatti wrote:
> On Wed, Jan 23, 2013 at 06:05:29PM +0800, Xiao Guangrong wrote:
>> In order to detecting spte remapping, we can simply check whether the
>> spte has already been pointing to the pfn even if the spte is not the
>> last spte, for middle spte is pointing to the kernel pfn which can not
>> be mapped to userspace
> 
> This check is detecting spte overwrite, when a large spte is replaced by 
> pointer to spte table.
> 
> Can't see why check for different pfn is safe: only 'int level' can
> differ, and pfn be equivalent, for example.

The 'u64 *sptep' must on the "int level" we want to set, that means:
 page_header(__pa(sptep)).role.level == "int level".


We discussed this before :), the discussion can be found at:
http://marc.info/?l=kvm&m=135345057329427&w=2.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 01/12] KVM: MMU: lazily drop large spte

2013-01-28 Thread Xiao Guangrong
On 01/27/2013 08:06 PM, Gleb Natapov wrote:
> On Wed, Jan 23, 2013 at 06:04:17PM +0800, Xiao Guangrong wrote:
>> Do not drop large spte until it can be insteaded by small pages so that
>> the guest can happliy read memory through it
>>
>> The idea is from Avi:
>> | As I mentioned before, write-protecting a large spte is a good idea,
>> | since it moves some work from protect-time to fault-time, so it reduces
>> | jitter.  This removes the need for the return value.
>>
>> Signed-off-by: Xiao Guangrong 
>> ---
>>  arch/x86/kvm/mmu.c |   21 ++---
>>  1 files changed, 6 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 9f628f7..0f90269 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1105,7 +1105,7 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 
>> *sptep)
>>
>>  /*
>>   * Write-protect on the specified @sptep, @pt_protect indicates whether
>> - * spte writ-protection is caused by protecting shadow page table.
>> + * spte write-protection is caused by protecting shadow page table.
>>   * @flush indicates whether tlb need be flushed.
>>   *
>>   * Note: write protection is difference between drity logging and spte
>> @@ -1114,31 +1114,23 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, 
>> u64 *sptep)
>>   *   its dirty bitmap is properly set.
>>   * - for spte protection, the spte can be writable only after unsync-ing
>>   *   shadow page.
>> - *
>> - * Return true if the spte is dropped.
>>   */
>> -static bool
>> +static void
>>  spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool 
>> pt_protect)
> Since return value is not longer used make the function return true if flush 
> is needed
> instead of returning it via pointer to a variable.

Right, i forgot to check it, will update it in the next version. Thanks for 
your pointing
it out.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 10/12] KVM: MMU: unify the code of walking pte list

2013-01-28 Thread Xiao Guangrong
On 01/27/2013 09:28 PM, Gleb Natapov wrote:

>>
>> @@ -1256,19 +1243,18 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, 
>> unsigned long *rmapp,
>>
>>  if (pte_write(*ptep)) {
>>  drop_spte(kvm, sptep);
>> -sptep = rmap_get_first(*rmapp, &iter);
>> -} else {
>> -new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
>> -new_spte |= (u64)new_pfn << PAGE_SHIFT;
>> +goto restart;
> I do not like this "goto restart" pattern throughout this patch. Follow up
> patch gets rid of most of them, so they are tolerable as a temporary solution
> here, but this one stays. What about moving pte_write(*ptep) outside
> for_each_spte_in_rmap() loop like that:
> 
> if (pte_write(*ptep))
>   need_flush = kvm_unmap_rmapp();
> else
>   for_each_spte_in_rmap() {
>   }

Nice. Will do.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] kvm: IOMMU read-only mapping support

2013-01-28 Thread Xiao Guangrong
On 01/28/2013 06:59 PM, Gleb Natapov wrote:
> On Fri, Jan 25, 2013 at 11:28:40AM +0800, Xiao Guangrong wrote:
>> On 01/25/2013 09:17 AM, Takuya Yoshikawa wrote:
>>> On Thu, 24 Jan 2013 15:03:57 -0700
>>> Alex Williamson  wrote:
>>>
 A couple patches to make KVM IOMMU support honor read-only mappings.
 This causes an un-map, re-map when the read-only flag changes and
 makes use of it when setting IOMMU attributes.  Thanks,
>>>
>>> Looks good to me.
>>>
>>> I think I can naturally update my patch after this gets merged.
>>>
>>
>> Please wait.
>>
>> The commit c972f3b1 changed the write-protect behaviour - it does
>> wirte-protection only when dirty flag is set.
>> [ I did not see this commit when we discussed the problem before. ]
>>
>> Further more, i notice that write-protect is not enough, when do sync
>> shadow page:
>>
>> FNAME(sync_page):
>>
>>  host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;
>>
>>  set_spte(vcpu, &sp->spt[i], pte_access,
>>   PT_PAGE_TABLE_LEVEL, gfn,
>>   spte_to_pfn(sp->spt[i]), true, false,
>>   host_writable);
>>
>> It sets spte based on the old value that means the readonly flag check
>> is missed. We need to call kvm_arch_flush_shadow_all under this case.
> Why not just disallow changing memory region KVM_MEM_READONLY flag
> without deleting the region?

It will introduce some restriction when VM-sharing-mem is being implemented,
but we need to do some optimization for it, at least, properly write-protect
readonly pages (fix sync_page()) instead of zap_all_page.

So, i guess we can do the simple fix first.




--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH V2 00/20] Multiqueue virtio-net

2013-01-28 Thread Wanlong Gao
On 01/28/2013 12:24 PM, Jason Wang wrote:
> On 01/28/2013 11:27 AM, Wanlong Gao wrote:
>> On 01/25/2013 06:35 PM, Jason Wang wrote:
>>> Hello all:
>>>
>>> This seires is an update of last version of multiqueue virtio-net support.
>>>
>>> This series tries to brings multiqueue support to virtio-net through a
>>> multiqueue support tap backend and multiple vhost threads.
>>>
>>> To support this, multiqueue nic support were added to qemu. This is done by
>>> introducing an array of NetClientStates in NICState, and make each pair of 
>>> peers
>>> to be an queue of the nic. This is done in patch 1-7.
>>>
>>> Tap were also converted to be able to create a multiple queue
>>> backend. Currently, only linux support this by issuing TUNSETIFF N times 
>>> with
>>> the same device name to create N queues. Each fd returned by TUNSETIFF were 
>>> a
>>> queue supported by kernel. Three new command lines were introduced, "queues"
>>> were used to tell how many queues will be created by qemu; "fds" were used 
>>> to
>>> pass multiple pre-created tap file descriptors to qemu; "vhostfds" were 
>>> used to
>>> pass multiple pre-created vhost descriptors to qemu. This is done in patch 
>>> 8-13.
>>>
>>> A method of deleting a queue and queue_index were also introduce for virtio,
>>> this is done in patch 14-15.
>>>
>>> Vhost were also changed to support multiqueue by introducing a start vq 
>>> index
>>> which tracks the first virtqueue that will be used by vhost instead of the
>>> assumption that the vhost always use virtqueue from index 0. This is done in
>>> patch 16.
>>>
>>> The last part is the multiqueue userspace changes, this is done in patch 
>>> 17-20.
>>>
>>> With this changes, user could start a multiqueue virtio-net device through
>>>
>>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
>>> virtio-net-pci,netdev=hn0
>>>
>>> Management tools such as libvirt can pass multiple pre-created fds/vhostfds 
>>> through
>>>
>>> ./qemu -netdev tap,id=hn0,fds=X:Y,vhostfds=M:N -device 
>>> virtio-net-pci,netdev=hn0
>>>
>>> No git tree this round since github is unavailable in China...
>> I saw that github had already been opened again. I can use it.
> 
> Thanks for reminding, I've pushed the new bits to
> git://github.com/jasowang/qemu.git.

I got host kernel oops here using your qemu tree and 3.8-rc5 kernel on host,

[31499.754779] BUG: unable to handle kernel NULL pointer dereference at 
  (null)
[31499.757098] IP: [] _raw_spin_lock_irqsave+0x1f/0x40
[31499.758304] PGD 0 
[31499.759498] Oops: 0002 [#1] SMP 
[31499.760704] Modules linked in: tcp_lp fuse xt_CHECKSUM lockd ipt_MASQUERADE 
sunrpc bnep bluetooth rfkill bridge stp llc iptable_nat nf_nat_ipv4 nf_nat 
iptable_mangle nf_conntr
ack_ipv4 nf_defrag_ipv4 nf_conntrack snd_hda_codec_realtek snd_hda_intel 
snd_hda_codec vhost_net tun snd_hwdep macvtap snd_seq macvlan coretemp 
kvm_intel snd_seq_device kvm snd_p
cm crc32c_intel r8169 snd_page_alloc snd_timer ghash_clmulni_intel snd mei 
iTCO_wdt mii microcode iTCO_vendor_support uinput serio_raw wmi i2c_i801 
lpc_ich soundcore pcspkr mfd_c
ore i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: 
ip6t_REJECT]
[31499.766412] CPU 2 
[31499.766426] Pid: 18742, comm: vhost-18728 Not tainted 3.8.0-rc5 #1 LENOVO 
QiTianM4300/To be filled by O.E.M.
[31499.769340] RIP: 0010:[]  [] 
_raw_spin_lock_irqsave+0x1f/0x40
[31499.770861] RSP: 0018:8801b2f9dd08  EFLAGS: 00010086
[31499.772380] RAX: 0286 RBX:  RCX: 
[31499.773916] RDX: 0100 RSI: 0286 RDI: 
[31499.775394] RBP: 8801b2f9dd08 R08: 880132ed4368 R09: 
[31499.776923] R10: 0001 R11: 0001 R12: 880132ed8590
[31499.778466] R13: 880232a6c290 R14: 880132ed42b0 R15: 880132ed0078
[31499.780012] FS:  () GS:88023fb0() 
knlGS:
[31499.781574] CS:  0010 DS:  ES:  CR0: 80050033
[31499.783126] CR2:  CR3: 000132d9c000 CR4: 000427e0
[31499.784696] DR0:  DR1:  DR2: 
[31499.786267] DR3:  DR6: 0ff0 DR7: 0400
[31499.787822] Process vhost-18728 (pid: 18742, threadinfo 8801b2f9c000, 
task 880036959740)
[31499.788821] Stack:
[31499.790392]  8801b2f9dd38 81082534  
0001
[31499.792029]  880132ed 880232a6c290 8801b2f9dd48 
a023fab6
[31499.793677]  8801b2f9de28 a0242f64 8801b2f9ddb8 
8109e0e0
[31499.795332] Call Trace:
[31499.796974]  [] remove_wait_queue+0x24/0x50
[31499.798641]  [] vhost_poll_stop+0x16/0x20 [vhost_net]
[31499.800313]  [] handle_tx+0x4c4/0x680 [vhost_net]
[31499.801995]  [] ? idle_balance+0x1b0/0x2f0
[31499.803685]  [] handle_tx_kick+0x15/0x20 [vhost_net]
[31499.805128]  [] vhost_worker+0xed/0x190 [vhost_net]
[31499.806842]  [] ? vhost_work_flus

Re: [Qemu-devel] [PATCH V2 00/20] Multiqueue virtio-net

2013-01-28 Thread Jason Wang
On 01/29/2013 01:36 PM, Wanlong Gao wrote:
> On 01/28/2013 12:24 PM, Jason Wang wrote:
>> On 01/28/2013 11:27 AM, Wanlong Gao wrote:
>>> On 01/25/2013 06:35 PM, Jason Wang wrote:
 Hello all:

 This seires is an update of last version of multiqueue virtio-net support.

 This series tries to brings multiqueue support to virtio-net through a
 multiqueue support tap backend and multiple vhost threads.

 To support this, multiqueue nic support were added to qemu. This is done by
 introducing an array of NetClientStates in NICState, and make each pair of 
 peers
 to be an queue of the nic. This is done in patch 1-7.

 Tap were also converted to be able to create a multiple queue
 backend. Currently, only linux support this by issuing TUNSETIFF N times 
 with
 the same device name to create N queues. Each fd returned by TUNSETIFF 
 were a
 queue supported by kernel. Three new command lines were introduced, 
 "queues"
 were used to tell how many queues will be created by qemu; "fds" were used 
 to
 pass multiple pre-created tap file descriptors to qemu; "vhostfds" were 
 used to
 pass multiple pre-created vhost descriptors to qemu. This is done in patch 
 8-13.

 A method of deleting a queue and queue_index were also introduce for 
 virtio,
 this is done in patch 14-15.

 Vhost were also changed to support multiqueue by introducing a start vq 
 index
 which tracks the first virtqueue that will be used by vhost instead of the
 assumption that the vhost always use virtqueue from index 0. This is done 
 in
 patch 16.

 The last part is the multiqueue userspace changes, this is done in patch 
 17-20.

 With this changes, user could start a multiqueue virtio-net device through

 ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
 virtio-net-pci,netdev=hn0

 Management tools such as libvirt can pass multiple pre-created 
 fds/vhostfds through

 ./qemu -netdev tap,id=hn0,fds=X:Y,vhostfds=M:N -device 
 virtio-net-pci,netdev=hn0

 No git tree this round since github is unavailable in China...
>>> I saw that github had already been opened again. I can use it.
>> Thanks for reminding, I've pushed the new bits to
>> git://github.com/jasowang/qemu.git.
> I got host kernel oops here using your qemu tree and 3.8-rc5 kernel on host,
>
> [31499.754779] BUG: unable to handle kernel NULL pointer dereference at   
> (null)
> [31499.757098] IP: [] _raw_spin_lock_irqsave+0x1f/0x40
> [31499.758304] PGD 0 
> [31499.759498] Oops: 0002 [#1] SMP 
> [31499.760704] Modules linked in: tcp_lp fuse xt_CHECKSUM lockd 
> ipt_MASQUERADE sunrpc bnep bluetooth rfkill bridge stp llc iptable_nat 
> nf_nat_ipv4 nf_nat iptable_mangle nf_conntr
> ack_ipv4 nf_defrag_ipv4 nf_conntrack snd_hda_codec_realtek snd_hda_intel 
> snd_hda_codec vhost_net tun snd_hwdep macvtap snd_seq macvlan coretemp 
> kvm_intel snd_seq_device kvm snd_p
> cm crc32c_intel r8169 snd_page_alloc snd_timer ghash_clmulni_intel snd mei 
> iTCO_wdt mii microcode iTCO_vendor_support uinput serio_raw wmi i2c_i801 
> lpc_ich soundcore pcspkr mfd_c
> ore i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: 
> ip6t_REJECT]
> [31499.766412] CPU 2 
> [31499.766426] Pid: 18742, comm: vhost-18728 Not tainted 3.8.0-rc5 #1 LENOVO 
> QiTianM4300/To be filled by O.E.M.
> [31499.769340] RIP: 0010:[]  [] 
> _raw_spin_lock_irqsave+0x1f/0x40
> [31499.770861] RSP: 0018:8801b2f9dd08  EFLAGS: 00010086
> [31499.772380] RAX: 0286 RBX:  RCX: 
> 
> [31499.773916] RDX: 0100 RSI: 0286 RDI: 
> 
> [31499.775394] RBP: 8801b2f9dd08 R08: 880132ed4368 R09: 
> 
> [31499.776923] R10: 0001 R11: 0001 R12: 
> 880132ed8590
> [31499.778466] R13: 880232a6c290 R14: 880132ed42b0 R15: 
> 880132ed0078
> [31499.780012] FS:  () GS:88023fb0() 
> knlGS:
> [31499.781574] CS:  0010 DS:  ES:  CR0: 80050033
> [31499.783126] CR2:  CR3: 000132d9c000 CR4: 
> 000427e0
> [31499.784696] DR0:  DR1:  DR2: 
> 
> [31499.786267] DR3:  DR6: 0ff0 DR7: 
> 0400
> [31499.787822] Process vhost-18728 (pid: 18742, threadinfo 8801b2f9c000, 
> task 880036959740)
> [31499.788821] Stack:
> [31499.790392]  8801b2f9dd38 81082534  
> 0001
> [31499.792029]  880132ed 880232a6c290 8801b2f9dd48 
> a023fab6
> [31499.793677]  8801b2f9de28 a0242f64 8801b2f9ddb8 
> 8109e0e0
> [31499.795332] Call Trace:
> [31499.796974]  [] remove_wait_queue+0x24/0x50
> [31499.798641]  [] vhost_poll_stop+0x16/0x20 [vhost_net]
> [31499.800313]

Re: [PATCH v3] KVM: VMX: enable acknowledge interupt on vmexit

2013-01-28 Thread Gleb Natapov
On Mon, Jan 28, 2013 at 11:57:09PM +, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2013-01-28:
> > On Mon, Jan 28, 2013 at 08:54:07AM +0800, Yang Zhang wrote:
> >> From: Yang Zhang 
> >> 
> >> The "acknowledge interrupt on exit" feature controls processor behavior
> >> for external interrupt acknowledgement. When this control is set, the
> >> processor acknowledges the interrupt controller to acquire the
> >> interrupt vector on VM exit.
> >> 
> >> After enabling this feature, an interrupt which arrived when target cpu
> >> is running in vmx non-root mode will be handled by vmx handler instead
> >> of handler in idt. Currently, vmx handler only fakes an interrupt stack
> >> and jump to idt table to let real handler to handle it. Further, we
> >> will recognize the interrupt and only delivery the interrupt which not
> >> belong to current vcpu through idt table. The interrupt which belonged
> >> to current vcpu will be handled inside vmx handler. This will reduce
> >> the interrupt handle cost of KVM.
> >> 
> >> Refer to Intel SDM volum 3, chapter 33.2.
> >> 
> >> Signed-off-by: Yang Zhang 
> >> ---
> >>  arch/x86/include/asm/kvm_host.h |2 + arch/x86/kvm/svm.c   
> >>|6  arch/x86/kvm/vmx.c  |   61
> >>  -- arch/x86/kvm/x86.c 
> >>  |3 +- 4 files changed, 67 insertions(+), 5 deletions(-)
> >> diff --git a/arch/x86/include/asm/kvm_host.h
> >> b/arch/x86/include/asm/kvm_host.h index 77d56a4..07daf10 100644 ---
> >> a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -340,6 +340,7 @@ struct kvm_vcpu_arch {
> >>unsigned long cr8;
> >>u32 hflags;
> >>u64 efer;
> >> +  unsigned long host_idt_base;
> > Should be in vmx.
> > 
> >>u64 apic_base;  struct kvm_lapic *apic;/* kernel irqchip context
> >>  */unsigned long apic_attention; @@ -725,6 +726,7 @@ struct
> >>  kvm_x86_ops { int (*check_intercept)(struct kvm_vcpu *vcpu,   
> >>   
> >>  struct x86_instruction_info *info,   enum 
> >> x86_intercept_stage
> >>  stage); + void (*handle_external_intr)(struct kvm_vcpu *vcpu); };
> >>  
> >>  struct kvm_arch_async_pf {
> >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> index d29d3cd..c283185 100644
> >> --- a/arch/x86/kvm/svm.c
> >> +++ b/arch/x86/kvm/svm.c
> >> @@ -4227,6 +4227,11 @@ out:
> >>return ret;
> >>  }
> >> +static void svm_handle_external_intr(struct kvm_vcpu *vcpu)
> >> +{
> >> +  local_irq_enable();
> >> +}
> >> +
> >>  static struct kvm_x86_ops svm_x86_ops = { .cpu_has_kvm_support =
> >>  has_svm,  .disabled_by_bios = is_disabled, @@ -4318,6 +4323,7 @@
> >>  static struct kvm_x86_ops svm_x86_ops = { .set_tdp_cr3 = 
> >> set_tdp_cr3,
> >>  
> >>.check_intercept = svm_check_intercept, +   .handle_external_intr =
> >>  svm_handle_external_intr, };
> >>  
> >>  static int __init svm_init(void)
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 02eeba8..243ce45 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -2565,7 +2565,8 @@ static __init int setup_vmcs_config(struct
> > vmcs_config *vmcs_conf)
> >>  #ifdef CONFIG_X86_64
> >>min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
> >>  #endif
> >> -  opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT;
> >> +  opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT |
> >> +  VM_EXIT_ACK_INTR_ON_EXIT;
> >>if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
> >>&_vmexit_control) < 0)
> >>return -EIO;
> >> @@ -3742,7 +3743,7 @@ static void vmx_disable_intercept_for_msr(u32 msr,
> > bool longmode_only)
> >>   * Note that host-state that does change is set elsewhere. E.g., 
> >> host-state
> >>   * that is set differently for each CPU is set in vmx_vcpu_load(), not 
> >> here.
> >>   */
> >> -static void vmx_set_constant_host_state(void)
> >> +static void vmx_set_constant_host_state(struct kvm_vcpu *vcpu)
> >>  {
> >>u32 low32, high32;
> >>unsigned long tmpl;
> >> @@ -3770,6 +3771,7 @@ static void vmx_set_constant_host_state(void)
> >> 
> >>native_store_idt(&dt);
> >>vmcs_writel(HOST_IDTR_BASE, dt.address);   /* 22.2.4 */
> >> +  vcpu->arch.host_idt_base = dt.address;
> >> 
> >>vmcs_writel(HOST_RIP, vmx_return); /* 22.2.5 */
> >> @@ -3884,7 +3886,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
> >> 
> >>vmcs_write16(HOST_FS_SELECTOR, 0);/* 22.2.4 */
> >>vmcs_write16(HOST_GS_SELECTOR, 0);/* 22.2.4 */
> >> -  vmx_set_constant_host_state();
> >> +  vmx_set_constant_host_state(&vmx->vcpu);
> >>  #ifdef CONFIG_X86_64
> >>rdmsrl(MSR_FS_BASE, a);
> >>vmcs_writel(HOST_FS_BASE, a); /* 22.2.4 */
> >> @@ -6094,6 +6096,56 @@ static void vmx_complete_atomic_exit(struct
> > vcpu_vmx *vmx)
> >>}
> >>  }
> >> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu) +{ +  u32
> >> exit_intr

Re: [PATCH 0/2] kvm: IOMMU read-only mapping support

2013-01-28 Thread Gleb Natapov
On Tue, Jan 29, 2013 at 11:06:43AM +0800, Xiao Guangrong wrote:
> On 01/28/2013 06:59 PM, Gleb Natapov wrote:
> > On Fri, Jan 25, 2013 at 11:28:40AM +0800, Xiao Guangrong wrote:
> >> On 01/25/2013 09:17 AM, Takuya Yoshikawa wrote:
> >>> On Thu, 24 Jan 2013 15:03:57 -0700
> >>> Alex Williamson  wrote:
> >>>
>  A couple patches to make KVM IOMMU support honor read-only mappings.
>  This causes an un-map, re-map when the read-only flag changes and
>  makes use of it when setting IOMMU attributes.  Thanks,
> >>>
> >>> Looks good to me.
> >>>
> >>> I think I can naturally update my patch after this gets merged.
> >>>
> >>
> >> Please wait.
> >>
> >> The commit c972f3b1 changed the write-protect behaviour - it does
> >> wirte-protection only when dirty flag is set.
> >> [ I did not see this commit when we discussed the problem before. ]
> >>
> >> Further more, i notice that write-protect is not enough, when do sync
> >> shadow page:
> >>
> >> FNAME(sync_page):
> >>
> >>host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;
> >>
> >>set_spte(vcpu, &sp->spt[i], pte_access,
> >> PT_PAGE_TABLE_LEVEL, gfn,
> >> spte_to_pfn(sp->spt[i]), true, false,
> >> host_writable);
> >>
> >> It sets spte based on the old value that means the readonly flag check
> >> is missed. We need to call kvm_arch_flush_shadow_all under this case.
> > Why not just disallow changing memory region KVM_MEM_READONLY flag
> > without deleting the region?
> 
> It will introduce some restriction when VM-sharing-mem is being implemented,
> but we need to do some optimization for it, at least, properly write-protect
> readonly pages (fix sync_page()) instead of zap_all_page.
> 
What is VM-sharing-mem?

> So, i guess we can do the simple fix first.
> 
By simple fix you mean calling kvm_arch_flush_shadow_all() on READONLY
flag change?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] kvm: IOMMU read-only mapping support

2013-01-28 Thread Xiao Guangrong
On 01/29/2013 02:50 PM, Gleb Natapov wrote:
> On Tue, Jan 29, 2013 at 11:06:43AM +0800, Xiao Guangrong wrote:
>> On 01/28/2013 06:59 PM, Gleb Natapov wrote:
>>> On Fri, Jan 25, 2013 at 11:28:40AM +0800, Xiao Guangrong wrote:
 On 01/25/2013 09:17 AM, Takuya Yoshikawa wrote:
> On Thu, 24 Jan 2013 15:03:57 -0700
> Alex Williamson  wrote:
>
>> A couple patches to make KVM IOMMU support honor read-only mappings.
>> This causes an un-map, re-map when the read-only flag changes and
>> makes use of it when setting IOMMU attributes.  Thanks,
>
> Looks good to me.
>
> I think I can naturally update my patch after this gets merged.
>

 Please wait.

 The commit c972f3b1 changed the write-protect behaviour - it does
 wirte-protection only when dirty flag is set.
 [ I did not see this commit when we discussed the problem before. ]

 Further more, i notice that write-protect is not enough, when do sync
 shadow page:

 FNAME(sync_page):

host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;

set_spte(vcpu, &sp->spt[i], pte_access,
 PT_PAGE_TABLE_LEVEL, gfn,
 spte_to_pfn(sp->spt[i]), true, false,
 host_writable);

 It sets spte based on the old value that means the readonly flag check
 is missed. We need to call kvm_arch_flush_shadow_all under this case.
>>> Why not just disallow changing memory region KVM_MEM_READONLY flag
>>> without deleting the region?
>>
>> It will introduce some restriction when VM-sharing-mem is being implemented,
>> but we need to do some optimization for it, at least, properly write-protect
>> readonly pages (fix sync_page()) instead of zap_all_page.
>>
> What is VM-sharing-mem?

Sharing memory between different guests.

> 
>> So, i guess we can do the simple fix first.
>>
> By simple fix you mean calling kvm_arch_flush_shadow_all() on READONLY
> flag change?

Simply disallow READONLY flag changing.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html