Network performance data

2013-06-27 Thread Bill Rich
Hello All,

I've run into a problem with getting network performance data on
Windows VMs running on KVM. When I check the network data in the
Windows task manager on the VM, it remains at zero, even if large
amounts of data are being transferred. This has been tested on Windows
Server 2008 R2 using the standard Windows driver and the e1000 NIC. I
searched the web and the bug reports specifically, but I didn't find
this issue mentioned. Is this expected behavior, or is there something
I can do to fix it?


Below is the info on the hypervisor the VM is running on:

OS: CentOS release 6.4
Kernel: 2.6.32-358.11.1.el6.x86_64
qemu-kvm: 0.12.1.2-3.209.el6.4.x86_64


And the VM details:

<domain type='kvm'>
  <name>i-3-13643-VM</name>
  <uuid>716c272a-8822-3f46-abf3-c65dd177389e</uuid>
  <description>Windows Server 2008 R2 (64-bit)</description>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.2.0'>hvm</type>
    <boot dev='cdrom'/>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/libvirt/images/369c3379-ad6b-4828-8b6e-b6fede0f60b7'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw' cache='none'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='06:da:2a:00:00:0b'/>
      <source bridge='br-public'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </memballoon>
  </devices>
</domain>


Thanks,
Bill Rich


Re: [GIT PULL] KVM/ARM queue for 3.11

2013-06-27 Thread Gleb Natapov
On Wed, Jun 26, 2013 at 12:05:54PM -0700, Christoffer Dall wrote:
 Hi Gleb and Paolo,
 
 The following changes since commit 87d41fb4da6467622b7a87fd6afe8071abab6dae:
 
   KVM: s390: Fixed priority of execution in STSI (2013-06-20 23:33:01 +0200)
 
 are available in the git repository at:
 
   git://git.linaro.org/people/cdall/linux-kvm-arm.git tags/kvm-arm-3.11
 
 for you to fetch changes up to 8bd4ffd6b3a98f00267051dc095076ea2ff06ea8:
 
   ARM: kvm: don't include drivers/virtio/Kconfig (2013-06-26 10:50:06 -0700)
 
 
Pulled, thanks.

 Thanks,
 -Christoffer
 
 
 Anup Patel (1):
  ARM: KVM: Allow host virt timer irq to be different from guest timer virt irq
 
 Arnd Bergmann (1):
   ARM: kvm: don't include drivers/virtio/Kconfig
 
 Christoffer Dall (1):
   Update MAINTAINERS: KVM/ARM work now funded by Linaro
 
 Dave P Martin (1):
   ARM: KVM: Don't handle PSCI calls via SMC
 
 Geoff Levand (1):
   arm/kvm: Cleanup KVM_ARM_MAX_VCPUS logic
 
 Marc Zyngier (7):
   ARM: KVM: remove dead prototype for __kvm_tlb_flush_vmid
   ARM: KVM: use phys_addr_t instead of unsigned long long for HYP PGDs
   ARM: KVM: don't special case PC when doing an MMIO
   ARM: KVM: get rid of S2_PGD_SIZE
   ARM: KVM: perform save/restore of PAR
   ARM: KVM: add missing dsb before invalidating Stage-2 TLBs
   ARM: KVM: clear exclusive monitor on all exception returns
 
  MAINTAINERS|  4 ++--
  arch/arm/include/asm/kvm_arm.h |  1 -
  arch/arm/include/asm/kvm_asm.h | 24 
  arch/arm/include/asm/kvm_emulate.h |  5 -
  arch/arm/include/asm/kvm_host.h|  9 +++--
  arch/arm/kvm/Kconfig   |  8 +++-
  arch/arm/kvm/arm.c |  8 
  arch/arm/kvm/coproc.c  |  4 
  arch/arm/kvm/handle_exit.c |  3 ---
  arch/arm/kvm/interrupts.S  | 16 +++-
  arch/arm/kvm/interrupts_head.S | 10 --
  arch/arm/kvm/mmio.c|  6 --
  arch/arm/kvm/mmu.c |  3 ---
  arch/arm/kvm/psci.c|  2 +-
  arch/arm/kvm/reset.c   | 12 
  include/kvm/arm_arch_timer.h   |  4 
  virt/kvm/arm/arch_timer.c  | 29 -
  17 files changed, 92 insertions(+), 56 deletions(-)

--
Gleb.


Re: [PATCH 3/8] vfio: add external user support

2013-06-27 Thread Stephen Rothwell
Hi Alexey,

On Thu, 27 Jun 2013 15:02:31 +1000 Alexey Kardashevskiy a...@ozlabs.ru wrote:

 index c488da5..54192b2 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -1370,6 +1370,59 @@ static const struct file_operations vfio_device_fops = {
  };
  
  /**
 + * External user API, exported by symbols to be linked dynamically.
 + */
 +
 +/* Allows an external user (for example, KVM) to lock an IOMMU group */
 +static int vfio_group_add_external_user(struct file *filep)
 +{
 +	struct vfio_group *group = filep->private_data;
 +
 +	if (filep->f_op != &vfio_group_fops)
 +		return -EINVAL;
 +
 +	if (!atomic_inc_not_zero(&group->container_users))
 + return -EINVAL;
 +
 + return 0;
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_add_external_user);

You cannot EXPORT a static symbol ... The same applies throughout the rest
of the file.
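For reference, the eventual fix (v2 of this patch, later in this digest) simply
drops the static qualifier so the definition has external linkage, with the
prototype added to include/linux/vfio.h:

	/* drivers/vfio/vfio.c: non-static, so EXPORT_SYMBOL_GPL can link it */
	int vfio_group_add_external_user(struct file *filep)
	{
		...
	}
	EXPORT_SYMBOL_GPL(vfio_group_add_external_user);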

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




Re: [PATCH 3/8] vfio: add external user support

2013-06-27 Thread Stephen Rothwell
Hi Alexey,

On Thu, 27 Jun 2013 15:02:31 +1000 Alexey Kardashevskiy a...@ozlabs.ru wrote:

 +/* Allows an external user (for example, KVM) to unlock an IOMMU group */
 +static void vfio_group_del_external_user(struct file *filep)
 +{
 +	struct vfio_group *group = filep->private_data;
 +
 +	BUG_ON(filep->f_op != &vfio_group_fops);

We usually reserve BUG_ON for situations where there is no way to
continue running or continuing will corrupt the running kernel.  Maybe
WARN_ON() and return?
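A sketch of that suggestion (this is what v2 of the patch ends up doing):

	if (WARN_ON(filep->f_op != &vfio_group_fops))
		return;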

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




[patch] vfio/type1: fix a leak on error path

2013-06-27 Thread Dan Carpenter
If vfio_unmap_unpin() returns an error then we leak split.  I've moved
the allocation later in the function to fix this.

Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 98231d1..657f6a8 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -367,7 +367,6 @@ static int vfio_remove_dma_overlap(struct vfio_iommu *iommu, dma_addr_t start,
   size_t *size, struct vfio_dma *dma)
 {
size_t offset, overlap, tmp;
-   struct vfio_dma *split;
int ret;
 
if (!*size)
@@ -435,21 +434,13 @@ static int vfio_remove_dma_overlap(struct vfio_iommu *iommu, dma_addr_t start,
return 0;
}
 
-   /* Split existing */
-   split = kzalloc(sizeof(*split), GFP_KERNEL);
-   if (!split)
-   return -ENOMEM;
-
	offset = start - dma->iova;
 
ret = vfio_unmap_unpin(iommu, dma, start, size);
if (ret)
return ret;
-
-   if (!*size) {
-   kfree(split);
+   if (!*size)
return -EINVAL;
-   }
 
	tmp = dma->size;
 
@@ -458,13 +449,19 @@ static int vfio_remove_dma_overlap(struct vfio_iommu *iommu, dma_addr_t start,
 
 	/* Insert new for remainder, assuming it didn't all get unmapped */
 	if (likely(offset + *size < tmp)) {
+		struct vfio_dma *split;
+
+		/* Split existing */
+		split = kzalloc(sizeof(*split), GFP_KERNEL);
+		if (!split)
+			return -ENOMEM;
+
 		split->size = tmp - offset - *size;
 		split->iova = dma->iova + offset + *size;
 		split->vaddr = dma->vaddr + offset + *size;
 		split->prot = dma->prot;
 		vfio_insert_dma(iommu, split);
-	} else
-		kfree(split);
+	}
 
return 0;
 }
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] vfio: add external user support

2013-06-27 Thread Alexey Kardashevskiy
VFIO is designed to be used via ioctls on file descriptors
returned by VFIO.

However in some situations support for an external user is required.
The first user is KVM on PPC64 (the SPAPR TCE protocol), which is going to
use the existing VFIO groups for exclusive access in real/virtual mode
in the host kernel, to avoid passing map/unmap requests to user space,
which would make things pretty slow.

The proposed protocol includes:

1. do normal VFIO init stuff such as opening a new container, attaching
group(s) to it, and setting an IOMMU driver for the container. When an IOMMU
is set for a container, all groups in it are considered ready to use by
an external user.

2. pass a fd of the group we want to accelerate to KVM. KVM calls
vfio_group_iommu_id_from_file() to verify that the group is initialized
and an IOMMU is set for it. The current TCE IOMMU driver marks the whole
IOMMU table as busy when an IOMMU is set for a container, which prevents
other DMA users from allocating from it, so it is safe to pass the group
to user space.

3. KVM increases the container users counter via
vfio_group_add_external_user(). This prevents the VFIO group from
being disposed prior to exiting KVM.

4. When KVM is finished and doing cleanup, it releases the group file
and decrements the container users counter. Everything gets released.

5. KVM also keeps the group file, as otherwise its fd might have been
closed by the time KVM finishes, and the vfio_group_del_external_user()
call would no longer be possible.

The "vfio: Limit group opens" patch is also required for consistency.
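To make the flow concrete, here is a minimal consumer-side sketch of steps
2-5; the helper names and the fget()/fput() bookkeeping are illustrative
assumptions, not part of this patch:

	/* Hypothetical KVM-side usage of the external user API (sketch). */
	#include <linux/file.h>
	#include <linux/vfio.h>

	static struct file *group_filep;	/* step 5: keep the group file pinned */

	static int hypothetical_acquire_group(int group_fd, int *iommu_id)
	{
		int ret;

		group_filep = fget(group_fd);
		if (!group_filep)
			return -EBADF;

		*iommu_id = vfio_group_iommu_id_from_file(group_filep);	/* step 2 */
		if (*iommu_id < 0) {
			ret = *iommu_id;
			goto err;
		}

		ret = vfio_group_add_external_user(group_filep);	/* step 3 */
		if (ret)
			goto err;

		return 0;
	err:
		fput(group_filep);
		return ret;
	}

	static void hypothetical_release_group(void)
	{
		vfio_group_del_external_user(group_filep);	/* step 4 */
		fput(group_filep);	/* now the group may be disposed */
	}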

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---

v1->v2: added definitions to vfio.h :)
It should not have compiled without them, but it did. Hm.

---
 drivers/vfio/vfio.c  |   54 ++
 include/linux/vfio.h |7 +++
 2 files changed, 61 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index c488da5..40875d2 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1370,6 +1370,60 @@ static const struct file_operations vfio_device_fops = {
 };
 
 /**
+ * External user API, exported by symbols to be linked dynamically.
+ */
+
+/* Allows an external user (for example, KVM) to lock an IOMMU group */
+int vfio_group_add_external_user(struct file *filep)
+{
+	struct vfio_group *group = filep->private_data;
+
+	if (filep->f_op != &vfio_group_fops)
+		return -EINVAL;
+
+	if (!atomic_inc_not_zero(&group->container_users))
+   return -EINVAL;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_group_add_external_user);
+
+/* Allows an external user (for example, KVM) to unlock an IOMMU group */
+void vfio_group_del_external_user(struct file *filep)
+{
+	struct vfio_group *group = filep->private_data;
+
+	if (WARN_ON(filep->f_op != &vfio_group_fops))
+   return;
+
+   vfio_group_try_dissolve_container(group);
+}
+EXPORT_SYMBOL_GPL(vfio_group_del_external_user);
+
+/*
+ * Checks if a group for the specified file can be used by
+ * an external user and returns the IOMMU ID if external use is possible.
+ */
+int vfio_group_iommu_id_from_file(struct file *filep)
+{
+   int ret;
+	struct vfio_group *group = filep->private_data;
+
+	if (WARN_ON(filep->f_op != &vfio_group_fops))
+		return -EINVAL;
+
+	if (0 == atomic_read(&group->container_users) ||
+			!group->container->iommu_driver ||
+			!vfio_group_viable(group))
+		return -EINVAL;
+
+	ret = iommu_group_id(group->iommu_group);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_group_iommu_id_from_file);
+
+/**
  * Module/class support
  */
 static char *vfio_devnode(struct device *dev, umode_t *mode)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ac8d488..7ee6575 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -90,4 +90,11 @@ extern void vfio_unregister_iommu_driver(
TYPE tmp;   \
offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); }) \
 
+/*
+ * External user API
+ */
+int vfio_group_add_external_user(struct file *filep);
+void vfio_group_del_external_user(struct file *filep);
+int vfio_group_iommu_id_from_file(struct file *filep);
+
 #endif /* VFIO_H */
-- 
1.7.10.4



Re: [RFC PATCH] uio: uio_pci_generic: Add support for MSI interrupts

2013-06-27 Thread Michael S. Tsirkin
On Wed, Jun 26, 2013 at 03:30:23PM -0700, Guenter Roeck wrote:
 Enable support for MSI interrupts if the device supports it.
 Since MSI interrupts are edge triggered, it is no longer necessary to
 disable interrupts in the kernel and re-enable them from user-space.
 Instead, clearing the interrupt condition in the user space application
 automatically re-enables the interrupt.
 
 Signed-off-by: Guenter Roeck li...@roeck-us.net
 ---
 An open question is whether we can just do this unconditionally,
 or if there should be some flag to enable it. A module parameter, maybe?

NACK

UIO is for devices that don't do memory writes.
Anything that can do writes must be protected by an IOMMU
and/or have a secure kernel driver, not a UIO stub.

MSI is done by memory writes so if userspace
controls the device it can trick it to write
anywhere in memory.


  Documentation/DocBook/uio-howto.tmpl |   23 ---
  drivers/uio/uio_pci_generic.c|   15 ---
  2 files changed, 32 insertions(+), 6 deletions(-)
 
 diff --git a/Documentation/DocBook/uio-howto.tmpl b/Documentation/DocBook/uio-howto.tmpl
 index 9561815..69b54e0 100644
 --- a/Documentation/DocBook/uio-howto.tmpl
 +++ b/Documentation/DocBook/uio-howto.tmpl
@@ -46,6 +46,12 @@ GPL version 2.
 
 <revhistory>
 	<revision>
+	<revnumber>0.10</revnumber>
+	<date>2013-06-26</date>
+	<authorinitials>gr</authorinitials>
+	<revremark>Added MSI support to uio_pci_generic.</revremark>
+	</revision>
+	<revision>
 	<revnumber>0.9</revnumber>
 	<date>2009-07-16</date>
 	<authorinitials>mst</authorinitials>
@@ -935,15 +941,26 @@ and look in the output for failure reasons
 <sect1 id="uio_pci_generic_internals">
 <title>Things to know about uio_pci_generic</title>
 	<para>
-Interrupts are handled using the Interrupt Disable bit in the PCI command
+Interrupts are handled either as MSI interrupts (if the device supports it) or
+as legacy INTx interrupts.
+	</para>
+	<para>
+uio_pci_generic automatically configures a device to use MSI interrupts
+if the device supports it. If an MSI interrupt is received, the user space
+driver is notified. Since MSI interrupts are edge sensitive, the user space
+driver needs to clear the interrupt condition in the device before blocking
+and waiting for more interrupts.
+	</para>
+	<para>
+Legacy interrupts are handled using the Interrupt Disable bit in the PCI command
 register and Interrupt Status bit in the PCI status register.  All devices
 compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should
 support these bits.  uio_pci_generic detects this support, and won't bind to
 devices which do not support the Interrupt Disable Bit in the command register.
 	</para>
 	<para>
-On each interrupt, uio_pci_generic sets the Interrupt Disable bit.
-This prevents the device from generating further interrupts
+If legacy interrupts are used, uio_pci_generic sets the Interrupt Disable bit on
+each interrupt. This prevents the device from generating further interrupts
 until the bit is cleared. The userspace driver should clear this
 bit before blocking and waiting for more interrupts.
 	</para>
 diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
 index 14aa10c..3366fdb 100644
 --- a/drivers/uio/uio_pci_generic.c
 +++ b/drivers/uio/uio_pci_generic.c
 @@ -32,6 +32,7 @@
  struct uio_pci_generic_dev {
   struct uio_info info;
   struct pci_dev *pdev;
 + bool have_msi;
  };
  
  static inline struct uio_pci_generic_dev *
@@ -46,7 +47,7 @@ static irqreturn_t irqhandler(int irq, struct uio_info *info)
  {
   struct uio_pci_generic_dev *gdev = to_uio_pci_generic_dev(info);
  
-	if (!pci_check_and_mask_intx(gdev->pdev))
+	if (!gdev->have_msi && !pci_check_and_mask_intx(gdev->pdev))
   return IRQ_NONE;
  
   /* UIO core will signal the user process. */
 @@ -58,6 +59,7 @@ static int probe(struct pci_dev *pdev,
  {
   struct uio_pci_generic_dev *gdev;
   int err;
 + bool have_msi = false;
  
   err = pci_enable_device(pdev);
   if (err) {
 @@ -73,7 +75,9 @@ static int probe(struct pci_dev *pdev,
   return -ENODEV;
   }
  
 - if (!pci_intx_mask_supported(pdev)) {
 + if (!pci_enable_msi(pdev)) {
 + have_msi = true;
 + } else if (!pci_intx_mask_supported(pdev)) {
   err = -ENODEV;
   goto err_verify;
   }
 @@ -84,10 +88,11 @@ static int probe(struct pci_dev *pdev,
   goto err_alloc;
   }
  
+	gdev->have_msi = have_msi;
 	gdev->info.name = "uio_pci_generic";
 	gdev->info.version = DRIVER_VERSION;
 	gdev->info.irq = pdev->irq;
-	gdev->info.irq_flags = IRQF_SHARED;
+	gdev->info.irq_flags = have_msi ? 0 : IRQF_SHARED;
 	gdev->info.handler = irqhandler;
 	gdev->pdev = pdev;
  
 @@ -99,6 +104,8 @@ static int probe(struct pci_dev *pdev,
  err_register:
   kfree(gdev);
  

[RFC 1/5] VSOCK: Introduce vsock_find_unbound_socket and vsock_bind_dgram_generic

2013-06-27 Thread Asias He
Signed-off-by: Asias He as...@redhat.com
---
 net/vmw_vsock/af_vsock.c | 70 
 net/vmw_vsock/af_vsock.h |  2 ++
 2 files changed, 72 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 593071d..bc76ddb 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -225,6 +225,17 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
return NULL;
 }
 
+static struct sock *__vsock_find_unbound_socket(struct sockaddr_vm *addr)
+{
+   struct vsock_sock *vsk;
+
+	list_for_each_entry(vsk, &vsock_unbound_sockets, bound_table)
+		if (addr->svm_port == vsk->local_addr.svm_port)
+   return sk_vsock(vsk);
+
+   return NULL;
+}
+
 static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
  struct sockaddr_vm *dst)
 {
@@ -300,6 +311,21 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
 }
 EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
 
+struct sock *vsock_find_unbound_socket(struct sockaddr_vm *addr)
+{
+   struct sock *sk;
+
+	spin_lock_bh(&vsock_table_lock);
+	sk = __vsock_find_unbound_socket(addr);
+	if (sk)
+		sock_hold(sk);
+
+	spin_unlock_bh(&vsock_table_lock);
+
+   return sk;
+}
+EXPORT_SYMBOL_GPL(vsock_find_unbound_socket);
+
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 struct sockaddr_vm *dst)
 {
@@ -534,6 +560,50 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
return 0;
 }
 
+int vsock_bind_dgram_generic(struct vsock_sock *vsk, struct sockaddr_vm *addr)
+{
+	static u32 port = LAST_RESERVED_PORT + 1;
+	struct sockaddr_vm new_addr;
+
+	vsock_addr_init(&new_addr, addr->svm_cid, addr->svm_port);
+
+	if (addr->svm_port == VMADDR_PORT_ANY) {
+		bool found = false;
+		unsigned int i;
+
+		for (i = 0; i < MAX_PORT_RETRIES; i++) {
+			if (port <= LAST_RESERVED_PORT)
+				port = LAST_RESERVED_PORT + 1;
+
+			new_addr.svm_port = port++;
+
+			if (!__vsock_find_unbound_socket(&new_addr)) {
+				found = true;
+				break;
+			}
+		}
+
+		if (!found)
+			return -EADDRNOTAVAIL;
+	} else {
+		/* If port is in reserved range, ensure caller
+		 * has necessary privileges.
+		 */
+		if (addr->svm_port <= LAST_RESERVED_PORT &&
+		    !capable(CAP_NET_BIND_SERVICE)) {
+			return -EACCES;
+		}
+
+		if (__vsock_find_unbound_socket(&new_addr))
+			return -EADDRINUSE;
+	}
+
+	vsock_addr_init(&vsk->local_addr, new_addr.svm_cid, new_addr.svm_port);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_bind_dgram_generic);
+
 static int __vsock_bind_dgram(struct vsock_sock *vsk,
  struct sockaddr_vm *addr)
 {
diff --git a/net/vmw_vsock/af_vsock.h b/net/vmw_vsock/af_vsock.h
index 7d64d36..88f559a 100644
--- a/net/vmw_vsock/af_vsock.h
+++ b/net/vmw_vsock/af_vsock.h
@@ -168,8 +168,10 @@ void vsock_insert_connected(struct vsock_sock *vsk);
 void vsock_remove_bound(struct vsock_sock *vsk);
 void vsock_remove_connected(struct vsock_sock *vsk);
 struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
+struct sock *vsock_find_unbound_socket(struct sockaddr_vm *addr);
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 struct sockaddr_vm *dst);
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
+int vsock_bind_dgram_generic(struct vsock_sock *vsk, struct sockaddr_vm *addr);
 
 #endif /* __AF_VSOCK_H__ */
-- 
1.8.1.4



[RFC 2/5] VSOCK: Introduce virtio-vsock-common.ko

2013-06-27 Thread Asias He
This module contains the common code and header files for the following
virtio-vsock and virtio-vhost kernel modules.

Signed-off-by: Asias He as...@redhat.com
---
 include/linux/virtio_vsock.h| 200 +++
 include/uapi/linux/virtio_ids.h |   1 +
 include/uapi/linux/virtio_vsock.h   |  70 +++
 net/vmw_vsock/virtio_transport_common.c | 992 
 4 files changed, 1263 insertions(+)
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport_common.c

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
new file mode 100644
index 000..cd8ed95
--- /dev/null
+++ b/include/linux/virtio_vsock.h
@@ -0,0 +1,200 @@
+/*
+ * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
+ * anyone can use the definitions to implement compatible drivers/servers:
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS 
IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Copyright (C) Red Hat, Inc., 2013
+ * Copyright (C) Asias He as...@redhat.com, 2013
+ */
+
+#ifndef _LINUX_VIRTIO_VSOCK_H
+#define _LINUX_VIRTIO_VSOCK_H
+
+#include <uapi/linux/virtio_vsock.h>
+#include <linux/socket.h>
+#include <net/sock.h>
+
+#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE  128
+#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE   (1024 * 4)
+#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE  (1024 * 64)
+
+struct vsock_transport_recv_notify_data;
+struct vsock_transport_send_notify_data;
+struct sockaddr_vm;
+struct vsock_sock;
+
+enum {
+   VSOCK_VQ_CTRL   = 0,
+   VSOCK_VQ_RX = 1, /* for host to guest data */
+   VSOCK_VQ_TX = 2, /* for guest to host data */
+   VSOCK_VQ_MAX= 3,
+};
+
+/* virtio transport socket state */
+struct virtio_transport {
+   struct virtio_transport_pkt_ops *ops;
+   struct vsock_sock *vsk;
+
+   u64 buf_size;
+   u64 buf_size_min;
+   u64 buf_size_max;
+
+   struct mutex tx_lock;
+   struct mutex rx_lock;
+
+   struct list_head rx_queue;
+   u64 rx_bytes;
+
+	/* Protected by trans->tx_lock */
+   u64 tx_cnt;
+   u64 buf_alloc;
+   u64 peer_fwd_cnt;
+   u64 peer_buf_alloc;
+	/* Protected by trans->rx_lock */
+   u64 fwd_cnt;
+};
+
+struct virtio_vsock_pkt {
+   struct virtio_vsock_hdr hdr;
+   struct virtio_transport *trans;
+   struct work_struct work;
+   struct list_head list;
+   void *buf;
+   u32 len;
+   u32 off;
+};
+
+struct virtio_vsock_pkt_info {
+   struct sockaddr_vm *src;
+   struct sockaddr_vm *dst;
+   struct iovec *iov;
+   u32 len;
+   u8 type;
+   u8 op;
+   u8 shut;
+};
+
+struct virtio_transport_pkt_ops {
+   int (*send_pkt)(struct vsock_sock *vsk,
+   struct virtio_vsock_pkt_info *info);
+};
+
+void virtio_vsock_dumppkt(const char *func,
+ const struct virtio_vsock_pkt *pkt);
+
+struct sock *
+virtio_transport_get_pending(struct sock *listener,
+struct virtio_vsock_pkt *pkt);
+struct virtio_vsock_pkt *
+virtio_transport_alloc_pkt(struct vsock_sock *vsk,
+  struct virtio_vsock_pkt_info *info,
+  size_t len,
+  u32 src_cid,
+  u32 src_port,
+  u32 dst_cid,
+  u32 

[RFC 3/5] VSOCK: Introduce virtio-vsock.ko

2013-06-27 Thread Asias He
VM sockets virtio transport implementation. This module runs in guest
kernel.

Signed-off-by: Asias He as...@redhat.com
---
 net/vmw_vsock/virtio_transport.c | 424 +++
 1 file changed, 424 insertions(+)
 create mode 100644 net/vmw_vsock/virtio_transport.c

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
new file mode 100644
index 000..f4323aa
--- /dev/null
+++ b/net/vmw_vsock/virtio_transport.c
@@ -0,0 +1,424 @@
+/*
+ * virtio transport for vsock
+ *
+ * Copyright (C) 2013 Red Hat, Inc.
+ * Author: Asias He as...@redhat.com
+ *
+ * Some of the code is taken from Gerd Hoffmann kra...@redhat.com's
+ * early virtio-vsock proof-of-concept bits.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include <linux/spinlock.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/virtio.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_vsock.h>
+#include <net/sock.h>
+#include <linux/mutex.h>
+#include "af_vsock.h"
+
+static struct workqueue_struct *virtio_vsock_workqueue;
+static struct virtio_vsock *the_virtio_vsock;
+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock);
+
+struct virtio_vsock {
+   /* Virtio device */
+   struct virtio_device *vdev;
+   /* Virtio virtqueue */
+   struct virtqueue *vqs[VSOCK_VQ_MAX];
+   /* Wait queue for send pkt */
+   wait_queue_head_t queue_wait;
+   /* Work item to send pkt */
+   struct work_struct tx_work;
+   /* Work item to recv pkt */
+   struct work_struct rx_work;
+   /* Mutex to protect send pkt*/
+   struct mutex tx_lock;
+   /* Mutex to protect recv pkt*/
+   struct mutex rx_lock;
+   /* Number of recv buffers */
+   int rx_buf_nr;
+   /* Number of max recv buffers */
+   int rx_buf_max_nr;
+   /* Guest context id, just like guest ip address */
+   u32 guest_cid;
+};
+
+static struct virtio_vsock *virtio_vsock_get(void)
+{
+   return the_virtio_vsock;
+}
+
+static u32 virtio_transport_get_local_cid(void)
+{
+   struct virtio_vsock *vsock = virtio_vsock_get();
+
+	return vsock->guest_cid;
+}
+
+static int
+virtio_transport_send_pkt(struct vsock_sock *vsk,
+ struct virtio_vsock_pkt_info *info)
+{
+   u32 src_cid, src_port, dst_cid, dst_port;
+   int ret, in_sg = 0, out_sg = 0;
+   struct virtio_transport *trans;
+   struct virtio_vsock_pkt *pkt;
+   struct virtio_vsock *vsock;
+   struct scatterlist sg[2];
+   struct virtqueue *vq;
+   DEFINE_WAIT(wait);
+   u64 credit;
+
+   vsock = virtio_vsock_get();
+   if (!vsock)
+   return -ENODEV;
+
+	src_cid = virtio_transport_get_local_cid();
+	src_port = vsk->local_addr.svm_port;
+	dst_cid = vsk->remote_addr.svm_cid;
+	dst_port = vsk->remote_addr.svm_port;
+
+	trans = vsk->trans;
+	vq = vsock->vqs[VSOCK_VQ_TX];
+
+	if (info->type == SOCK_STREAM) {
+		credit = virtio_transport_get_credit(trans);
+		if (info->len > credit)
+			info->len = credit;
+	}
+	if (info->len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
+		info->len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+	/* Do not send zero length OP_RW pkt */
+	if (info->len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
+		return info->len;
+
+	pkt = virtio_transport_alloc_pkt(vsk, info, info->len,
+					 src_cid, src_port,
+					 dst_cid, dst_port);
+	if (!pkt)
+		return -ENOMEM;
+
+	pr_debug("%s: info->len = %d\n", __func__, info->len);
+
+	/* Will be released in virtio_transport_send_pkt_work */
+	sock_hold(&trans->vsk->sk);
+	virtio_transport_inc_tx_pkt(pkt);
+
+	/* Put pkt in the virtqueue */
+	sg_init_table(sg, ARRAY_SIZE(sg));
+	sg_set_buf(&sg[out_sg++], &pkt->hdr, sizeof(pkt->hdr));
+	if (info->iov && info->len > 0)
+		sg_set_buf(&sg[out_sg++], pkt->buf, pkt->len);
+
+	mutex_lock(&vsock->tx_lock);
+	while ((ret = virtqueue_add_buf(vq, sg, out_sg, in_sg, pkt,
+					GFP_KERNEL)) < 0) {
+		prepare_to_wait_exclusive(&vsock->queue_wait, &wait,
+					  TASK_UNINTERRUPTIBLE);
+
+		mutex_unlock(&vsock->tx_lock);
+		schedule();
+		mutex_lock(&vsock->tx_lock);
+
+		finish_wait(&vsock->queue_wait, &wait);
+	}
+	virtqueue_kick(vq);
+	mutex_unlock(&vsock->tx_lock);
+
+	return info->len;
+}
+
+static struct virtio_transport_pkt_ops virtio_ops = {
+   .send_pkt = virtio_transport_send_pkt,
+};
+
+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
+{
+   int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+   struct virtio_vsock_pkt *pkt;
+   struct scatterlist sg[2];
+   

[RFC 5/5] VSOCK: Add Makefile and Kconfig

2013-06-27 Thread Asias He
Enable virtio-vsock and vhost-vsock.

Signed-off-by: Asias He as...@redhat.com
---
 drivers/vhost/Kconfig   |  4 
 drivers/vhost/Kconfig.vsock |  7 +++
 drivers/vhost/Makefile  |  5 +
 net/vmw_vsock/Kconfig   | 18 ++
 net/vmw_vsock/Makefile  |  4 
 5 files changed, 38 insertions(+)
 create mode 100644 drivers/vhost/Kconfig.vsock

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 017a1e8..169fb19 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -32,3 +32,7 @@ config VHOST
---help---
  This option is selected by any driver which needs to access
  the core of vhost.
+
+if STAGING
+source "drivers/vhost/Kconfig.vsock"
+endif
diff --git a/drivers/vhost/Kconfig.vsock b/drivers/vhost/Kconfig.vsock
new file mode 100644
index 000..3491865
--- /dev/null
+++ b/drivers/vhost/Kconfig.vsock
@@ -0,0 +1,7 @@
+config VHOST_VSOCK
+	tristate "vhost virtio-vsock driver"
+	depends on VSOCKETS && EVENTFD
+	select VIRTIO_VSOCKETS_COMMON
+	default n
+	---help---
+	Say M here to enable the vhost-vsock for virtio-vsock guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index e0441c3..ddf87cb 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -4,5 +4,10 @@ vhost_net-y := net.o
 obj-$(CONFIG_VHOST_SCSI) += vhost_scsi.o
 vhost_scsi-y := scsi.o
 
+obj-$(CONFIG_VHOST_VSOCK) += vhost_vsock.o
+vhost_vsock-y := vsock.o
+#CFLAGS_vsock.o := -DDEBUG
+
 obj-$(CONFIG_VHOST_RING) += vringh.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index b5fa7e4..c2b6d6f 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -26,3 +26,21 @@ config VMWARE_VMCI_VSOCKETS
 
  To compile this driver as a module, choose M here: the module
  will be called vmw_vsock_vmci_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS
+	tristate "virtio transport for Virtual Sockets"
+	depends on VSOCKETS && VIRTIO
+	select VIRTIO_VSOCKETS_COMMON
+   help
+ This module implements a virtio transport for Virtual Sockets.
+
+ Enable this transport if your Virtual Machine runs on Qemu/KVM.
+
+ To compile this driver as a module, choose M here: the module
+ will be called virtio_vsock_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS_COMMON
+   tristate
+   ---help---
+ This option is selected by any driver which needs to access
+ the virtio_vsock.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 2ce52d7..bc37e59 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -1,5 +1,9 @@
 obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS) += virtio_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += virtio_transport_common.o
+#CFLAGS_virtio_transport.o := -DDEBUG
+#CFLAGS_virtio_transport_common.o := -DDEBUG
 
 vsock-y += af_vsock.o vsock_addr.o
 
-- 
1.8.1.4



[RFC 4/5] VSOCK: Introduce vhost-vsock.ko

2013-06-27 Thread Asias He
VM sockets vhost transport implementation. This module runs in host
kernel.

Signed-off-by: Asias He as...@redhat.com
---
 drivers/vhost/vsock.c | 534 ++
 drivers/vhost/vsock.h |   4 +
 2 files changed, 538 insertions(+)
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
new file mode 100644
index 000..cb54090
--- /dev/null
+++ b/drivers/vhost/vsock.c
@@ -0,0 +1,534 @@
+/*
+ * vhost transport for vsock
+ *
+ * Copyright (C) 2013 Red Hat, Inc.
+ * Author: Asias He as...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <net/sock.h>
+#include <linux/virtio_vsock.h>
+#include <linux/vhost.h>
+
+#include "../../../net/vmw_vsock/af_vsock.h"
+#include "vhost.h"
+#include "vsock.h"
+
+#define VHOST_VSOCK_DEFAULT_HOST_CID	2
+
+static int vhost_transport_socket_init(struct vsock_sock *vsk,
+  struct vsock_sock *psk);
+
+enum {
+   VHOST_VSOCK_FEATURES = VHOST_FEATURES,
+};
+
+/* Used to track all the vhost_vsock instances on the system. */
+static LIST_HEAD(vhost_vsock_list);
+static DEFINE_MUTEX(vhost_vsock_mutex);
+
+struct vhost_vsock_virtqueue {
+   struct vhost_virtqueue vq;
+};
+
+struct vhost_vsock {
+   /* Vhost device */
+   struct vhost_dev dev;
+   /* Vhost vsock virtqueue*/
+   struct vhost_vsock_virtqueue vqs[VSOCK_VQ_MAX];
+   /* Link to global vhost_vsock_list*/
+   struct list_head list;
+   /* Head for pkt from host to guest */
+   struct list_head send_pkt_list;
+   /* Work item to send pkt */
+   struct vhost_work send_pkt_work;
+	/* Guest context id this vhost_vsock instance handles */
+   u32 guest_cid;
+};
+
+static u32 vhost_transport_get_local_cid(void)
+{
+   u32 cid = VHOST_VSOCK_DEFAULT_HOST_CID;
+   return cid;
+}
+
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+{
+   struct vhost_vsock *vsock;
+
+	mutex_lock(&vhost_vsock_mutex);
+	list_for_each_entry(vsock, &vhost_vsock_list, list) {
+		if (vsock->guest_cid == guest_cid) {
+			mutex_unlock(&vhost_vsock_mutex);
+			return vsock;
+		}
+	}
+	mutex_unlock(&vhost_vsock_mutex);
+
+   return NULL;
+}
+
+static void
+vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
+   struct vhost_virtqueue *vq)
+{
+   struct virtio_vsock_pkt *pkt;
+   unsigned out, in;
+   struct sock *sk;
+   int head, ret;
+
+	mutex_lock(&vq->mutex);
+	vhost_disable_notify(&vsock->dev, vq);
+	for (;;) {
+		if (list_empty(&vsock->send_pkt_list)) {
+			vhost_enable_notify(&vsock->dev, vq);
+			break;
+		}
+
+		head = vhost_get_vq_desc(&vsock->dev, vq, vq->iov,
+					 ARRAY_SIZE(vq->iov), &out, &in,
+					 NULL, NULL);
+		pr_debug("%s: head = %d\n", __func__, head);
+		if (head < 0)
+			break;
+
+		if (head == vq->num) {
+			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
+				vhost_disable_notify(&vsock->dev, vq);
+				continue;
+			}
+			break;
+		}
+
+		pkt = list_first_entry(&vsock->send_pkt_list,
+				       struct virtio_vsock_pkt, list);
+		list_del_init(&pkt->list);
+
+		/* FIXME: no assumption of frame layout */
+		ret = __copy_to_user(vq->iov[0].iov_base, &pkt->hdr,
+				     sizeof(pkt->hdr));
+		if (ret) {
+			virtio_transport_free_pkt(pkt);
+			vq_err(vq, "Faulted on copying pkt hdr\n");
+			break;
+		}
+		if (pkt->buf && pkt->len > 0) {
+			ret = __copy_to_user(vq->iov[1].iov_base, pkt->buf,
+					     pkt->len);
+			if (ret) {
+				virtio_transport_free_pkt(pkt);
+				vq_err(vq, "Faulted on copying pkt buf\n");
+				break;
+			}
+		}
+
+		vhost_add_used(vq, head, pkt->len);
+
+		virtio_transport_dec_tx_pkt(pkt);
+
+		sk = sk_vsock(pkt->trans->vsk);
+		/* Release refcnt taken in vhost_transport_send_pkt */
+		sock_put(sk);
+
+		virtio_transport_free_pkt(pkt);
+	}
+	vhost_signal(&vsock->dev, vq);
+	mutex_unlock(&vq->mutex);
+}
+
+static void vhost_transport_send_pkt_work(struct 

Re: i/o threads

2013-06-27 Thread Stefan Hajnoczi
On Wed, Jun 26, 2013 at 03:53:21PM +0200, folkert wrote:
 I noticed that on my 3 VMs running server there are 10-20 threads
 doing i/o. As the VMs are running on HDDs and not SSDs I think that is
 counterproductive: won't these threads make the HDDs seek back and forth
 constantly?

The worker threads are doing preadv()/pwritev()/fdatasync().  It's up to
the host kernel to schedule that I/O efficiently.

Exposing more I/O to the host gives it a chance to merge or reorder I/O
for optimal performance, so it's a good thing.

On the other hand, if QEMU only did 1 or 2 I/O requests at a time then
the host kernel could do nothing to improve the I/O pattern and the
disks would indeed seek back and forth constantly.

Stefan


Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-27 Thread Gleb Natapov
On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
 This patch tries to introduce a very simple and scalable way to invalidate
 all mmio sptes - it need not walk any shadow pages or hold mmu-lock
 
 KVM maintains a global mmio valid generation-number which is stored in
 kvm->memslots.generation, and every mmio spte stores the current global
 generation-number into its available bits when it is created
 
 When KVM needs to zap all mmio sptes, it simply increases the global
 generation-number. When guests do mmio access, KVM intercepts an MMIO #PF,
 then walks the shadow page table and gets the mmio spte. If the
 generation-number on the spte does not equal the global generation-number,
 it will go to the normal #PF handler to update the mmio spte
 
 Since 19 bits are used to store the generation-number on an mmio spte, we
 zap all mmio sptes when the number wraps around
 
So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
fails too, but I haven't checked what happens exactly.

 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/include/asm/kvm_host.h |  2 +-
  arch/x86/kvm/mmu.c  | 54 
 +++--
  arch/x86/kvm/mmu.h  |  5 +++-
  arch/x86/kvm/paging_tmpl.h  |  7 --
  arch/x86/kvm/vmx.c  |  4 +++
  arch/x86/kvm/x86.c  |  3 +--
  6 files changed, 61 insertions(+), 14 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index 1f98c1b..90d05ed 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -773,7 +773,7 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask);
  void kvm_mmu_zap_all(struct kvm *kvm);
 -void kvm_mmu_zap_mmio_sptes(struct kvm *kvm);
 +void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int 
 kvm_nr_mmu_pages);
  
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 044d8c0..bdc95bc 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -205,9 +205,11 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
  #define MMIO_SPTE_GEN_LOW_SHIFT  3
  #define MMIO_SPTE_GEN_HIGH_SHIFT 52
  
+#define MMIO_GEN_SHIFT		19
 #define MMIO_GEN_LOW_SHIFT	9
 #define MMIO_GEN_LOW_MASK	((1 << MMIO_GEN_LOW_SHIFT) - 1)
-#define MMIO_MAX_GEN		((1 << 19) - 1)
+#define MMIO_GEN_MASK		((1 << MMIO_GEN_SHIFT) - 1)
+#define MMIO_MAX_GEN		((1 << MMIO_GEN_SHIFT) - 1)
  
  static u64 generation_mmio_spte_mask(unsigned int gen)
  {
 @@ -231,17 +233,23 @@ static unsigned int get_mmio_spte_generation(u64 spte)
   return gen;
  }
  
 +static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
 +{
+	return kvm_memslots(kvm)->generation & MMIO_GEN_MASK;
 +}
 +
  static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
  unsigned access)
  {
 	struct kvm_mmu_page *sp = page_header(__pa(sptep));
-	u64 mask = generation_mmio_spte_mask(0);
+	unsigned int gen = kvm_current_mmio_generation(kvm);
+	u64 mask = generation_mmio_spte_mask(gen);
 
 	access &= ACC_WRITE_MASK | ACC_USER_MASK;
 	mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
 	sp->mmio_cached = true;
 
-	trace_mark_mmio_spte(sptep, gfn, access, 0);
+	trace_mark_mmio_spte(sptep, gfn, access, gen);
 	mmu_spte_set(sptep, mask);
  }
  
 @@ -271,6 +279,12 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, 
 gfn_t gfn,
   return false;
  }
  
 +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
 +{
 + return likely(get_mmio_spte_generation(spte) ==
 + kvm_current_mmio_generation(kvm));
 +}
 +
  static inline u64 rsvd_bits(int s, int e)
  {
 	return ((1ULL << (e - s + 1)) - 1) << s;
@@ -3235,6 +3249,9 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 	gfn_t gfn = get_mmio_spte_gfn(spte);
 	unsigned access = get_mmio_spte_access(spte);
 
+	if (!check_mmio_spte(vcpu->kvm, spte))
+		return RET_MMIO_PF_INVALID;
+
   if (direct)
   addr = 0;
  
@@ -3276,8 +3293,12 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 
 	pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
 
-	if (unlikely(error_code & PFERR_RSVD_MASK))
-		return handle_mmio_page_fault(vcpu, gva, error_code, true);
+	if (unlikely(error_code & PFERR_RSVD_MASK)) {
+		r = handle_mmio_page_fault(vcpu, gva, error_code, true);
+
+		if (likely(r != RET_MMIO_PF_INVALID))
+			return r;
 +

Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-27 Thread Gleb Natapov
On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
 On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
  This patch tries to introduce a very simple and scalable way to invalidate
  all mmio sptes - it need not walk any shadow pages or hold mmu-lock
  
  KVM maintains a global mmio valid generation-number which is stored in
  kvm->memslots.generation, and every mmio spte stores the current global
  generation-number into its available bits when it is created
  
  When KVM needs to zap all mmio sptes, it simply increases the global
  generation-number. When guests do mmio access, KVM intercepts an MMIO #PF,
  then walks the shadow page table and gets the mmio spte. If the
  generation-number on the spte does not equal the global generation-number,
  it will go to the normal #PF handler to update the mmio spte
  
  Since 19 bits are used to store the generation-number on an mmio spte, we
  zap all mmio sptes when the number wraps around
  
 So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
 fails too, but I haven't checked what happens exactly.
 
Something is wrong with the gfn calculation during mmio:

qemu-system-x86-17003 [000]  3962.625103: handle_mmio_page_fault: addr:c00ba6c0 gfn 1ba access a92
qemu-system-x86-17003 [000]  3962.774862: handle_mmio_page_fault: addr:b170 gfn 10fee00 access a92

--
Gleb.


Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-27 Thread Gleb Natapov
On Thu, Jun 27, 2013 at 12:01:10PM +0300, Gleb Natapov wrote:
 On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
  On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
   This patch tries to introduce a very simple and scalable way to invalidate
   all mmio sptes - it need not walk any shadow pages or hold mmu-lock
   
   KVM maintains a global mmio valid generation-number which is stored in
   kvm->memslots.generation, and every mmio spte stores the current global
   generation-number into its available bits when it is created
   
   When KVM needs to zap all mmio sptes, it simply increases the global
   generation-number. When guests do mmio access, KVM intercepts an MMIO #PF,
   then walks the shadow page table and gets the mmio spte. If the
   generation-number on the spte does not equal the global generation-number,
   it will go to the normal #PF handler to update the mmio spte
   
   Since 19 bits are used to store the generation-number on an mmio spte, we
   zap all mmio sptes when the number wraps around
   
  So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
  fails too, but I haven't checked what happens exactly.
  
 Something is wrong with the gfn calculation during mmio:
 
 qemu-system-x86-17003 [000]  3962.625103: handle_mmio_page_fault: addr:c00ba6c0 gfn 1ba access a92
 qemu-system-x86-17003 [000]  3962.774862: handle_mmio_page_fault: addr:b170 gfn 10fee00 access a92
 
Hmm, so I wonder why get_mmio_spte_gfn() does not clear gen bits.

--
Gleb.


Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-27 Thread Gleb Natapov
On Thu, Jun 27, 2013 at 12:14:24PM +0300, Gleb Natapov wrote:
 On Thu, Jun 27, 2013 at 12:01:10PM +0300, Gleb Natapov wrote:
  On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
   On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
This patch tries to introduce a very simple and scalable way to invalidate
all mmio sptes - it need not walk any shadow pages or hold mmu-lock

KVM maintains a global mmio valid generation-number which is stored in
kvm->memslots.generation, and every mmio spte stores the current global
generation-number into its available bits when it is created

When KVM needs to zap all mmio sptes, it simply increases the global
generation-number. When guests do mmio access, KVM intercepts an MMIO #PF,
then walks the shadow page table and gets the mmio spte. If the
generation-number on the spte does not equal the global generation-number,
it will go to the normal #PF handler to update the mmio spte

Since 19 bits are used to store the generation-number on an mmio spte, we
zap all mmio sptes when the number wraps around

   So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
   fails too, but I haven't checked what happens exactly.
   
  Something is wrong with the gfn calculation during mmio:
  
  qemu-system-x86-17003 [000]  3962.625103: handle_mmio_page_fault: addr:c00ba6c0 gfn 1ba access a92
  qemu-system-x86-17003 [000]  3962.774862: handle_mmio_page_fault: addr:b170 gfn 10fee00 access a92
  
 Hmm, so I wonder why get_mmio_spte_gfn() does not clear gen bits.
 
Hmm, something like patch below fixes it. Will test more.


diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1fd2c05..aec9c05 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -260,7 +260,8 @@ static bool is_mmio_spte(u64 spte)
 
 static gfn_t get_mmio_spte_gfn(u64 spte)
 {
-	return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
+	u64 mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;
+	return (spte & ~mask) >> PAGE_SHIFT;
 }
 
 static unsigned get_mmio_spte_access(u64 spte)
--
Gleb.


Re: [nVMX w/ Haswell] KVM unit-tests in L1 - eventinj test fails trying to send NMI

2013-06-27 Thread Kashyap Chamarthy
On Wed, Jun 26, 2013 at 10:49 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-06-26 10:03, Kashyap Chamarthy wrote:
 Thanks for the note, it's very helpful! This test actually fails on
 older CPUs as well, and I can finally reproduce the issue that Jay also
 reported. I'm not able to cure it by going back to 3b656cf764^,

 Ok, you tried w/o this commit..
 
 commit 3b656cf764cbc43d3efb9bf5f45c618d4cf0989f
 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Sun Apr 14 12:12:45 2013 +0200

 KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1

 Check if the interrupt or NMI window exit is for L1 by testing if it has
 the corresponding controls enabled. This is required when we allow
 direct injection from L0 to L2
 

 I first tried by reverting to the commit before this one, just like Jay
 reported for https://bugzilla.kernel.org/show_bug.cgi?id=58941. But this
 just varied the error (kvm reports an internal error), didn't solve the
 issue. Now I simply reverted the commit on top of next, but without an
 effect. Looks like those problems are not directly related.

 Kashyap, you can do us a favor and try to find out if there was a commit
 in the recent history (roughly before I started to hack on nVMX this
 year) where these test cases succeeded.

Ok, will try this week by trial and error. I'm currently heads down on
a couple of other things.

By "these test cases succeeded" I take it that the NMI tests succeeded too.

Thanks.


Re: [PATCH 3/8] vfio: add external user support

2013-06-27 Thread Benjamin Herrenschmidt
On Thu, 2013-06-27 at 16:59 +1000, Stephen Rothwell wrote:
  +/* Allows an external user (for example, KVM) to unlock an IOMMU group */
  +static void vfio_group_del_external_user(struct file *filep)
  +{
  +	struct vfio_group *group = filep->private_data;
  +
  +	BUG_ON(filep->f_op != &vfio_group_fops);
 
 We usually reserve BUG_ON for situations where there is no way to
 continue running or continuing will corrupt the running kernel.  Maybe
 WARN_ON() and return?

Not even that. This is a user space provided fd, we shouldn't oops the
kernel because we passed a wrong argument, just return -EINVAL or
something like that (add a return code).

Ben.





Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-27 Thread Xiao Guangrong
On 06/27/2013 05:21 PM, Gleb Natapov wrote:
 On Thu, Jun 27, 2013 at 12:14:24PM +0300, Gleb Natapov wrote:
 On Thu, Jun 27, 2013 at 12:01:10PM +0300, Gleb Natapov wrote:
 On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
 On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
 This patch tries to introduce a very simple and scalable way to invalidate
 all mmio sptes - it need not walk any shadow pages or hold mmu-lock

 KVM maintains a global mmio valid generation-number which is stored in
 kvm->memslots.generation, and every mmio spte stores the current global
 generation-number into its available bits when it is created

 When KVM needs to zap all mmio sptes, it simply increases the global
 generation-number. When guests do mmio access, KVM intercepts an MMIO #PF,
 then walks the shadow page table and gets the mmio spte. If the
 generation-number on the spte does not equal the global generation-number,
 it will go to the normal #PF handler to update the mmio spte

 Since 19 bits are used to store the generation-number on an mmio spte, we
 zap all mmio sptes when the number wraps around

 So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
 fails too, but I haven't checked what happens exactly.

 Something is wrong with the gfn calculation during mmio:

 qemu-system-x86-17003 [000]  3962.625103: handle_mmio_page_fault: addr:c00ba6c0 gfn 1ba access a92
 qemu-system-x86-17003 [000]  3962.774862: handle_mmio_page_fault: addr:b170 gfn 10fee00 access a92

 Hmm, so I wonder why get_mmio_spte_gfn() does not clear gen bits.

 Hmm, something like patch below fixes it. Will test more.
 
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 1fd2c05..aec9c05 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -260,7 +260,8 @@ static bool is_mmio_spte(u64 spte)
 
  static gfn_t get_mmio_spte_gfn(u64 spte)
  {
 -	return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
 +	u64 mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;
 +	return (spte & ~mask) >> PAGE_SHIFT;
  }

Looks nice.

Gleb, thank you very much for investigating the bug and fixing my mistake.
I will be more careful in further development.



Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-27 Thread Gleb Natapov
On Thu, Jun 27, 2013 at 05:50:08PM +0800, Xiao Guangrong wrote:
 On 06/27/2013 05:21 PM, Gleb Natapov wrote:
  On Thu, Jun 27, 2013 at 12:14:24PM +0300, Gleb Natapov wrote:
  On Thu, Jun 27, 2013 at 12:01:10PM +0300, Gleb Natapov wrote:
  On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
  On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
  This patch tries to introduce a very simple and scalable way to invalidate
  all mmio sptes - it need not walk any shadow pages or hold mmu-lock
 
  KVM maintains a global mmio valid generation-number which is stored in
  kvm->memslots.generation, and every mmio spte stores the current global
  generation-number into its available bits when it is created
 
  When KVM needs to zap all mmio sptes, it simply increases the global
  generation-number. When guests do mmio access, KVM intercepts an MMIO #PF,
  then walks the shadow page table and gets the mmio spte. If the
  generation-number on the spte does not equal the global generation-number,
  it will go to the normal #PF handler to update the mmio spte
 
  Since 19 bits are used to store the generation-number on an mmio spte, we
  zap all mmio sptes when the number wraps around
 
  So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
  fails too, but I haven't checked what happens exactly.
 
  Something is wrong with the gfn calculation during mmio:
 
  qemu-system-x86-17003 [000]  3962.625103: handle_mmio_page_fault: addr:c00ba6c0 gfn 1ba access a92
  qemu-system-x86-17003 [000]  3962.774862: handle_mmio_page_fault: addr:b170 gfn 10fee00 access a92
 
  Hmm, so I wonder why get_mmio_spte_gfn() does not clear gen bits.
 
  Hmm, something like patch below fixes it. Will test more.
  
  
  diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
  index 1fd2c05..aec9c05 100644
  --- a/arch/x86/kvm/mmu.c
  +++ b/arch/x86/kvm/mmu.c
  @@ -260,7 +260,8 @@ static bool is_mmio_spte(u64 spte)
  
   static gfn_t get_mmio_spte_gfn(u64 spte)
   {
  -	return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
  +	u64 mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;
  +	return (spte & ~mask) >> PAGE_SHIFT;
   }
 
 Looks nice.
 
The question is whether get_mmio_spte_access() needs the same treatment.

--
Gleb.


Re: [PATCH-next v2] kvm: don't try to take mmu_lock while holding the main raw kvm_lock

2013-06-27 Thread Paolo Bonzini
On 27/06/2013 04:56, Paul Gortmaker wrote:
 On 26/06/2013 20:11, Paul Gortmaker wrote:
  	spin_unlock(&kvm->mmu_lock);
  +	kvm_put_kvm(kvm);
  	srcu_read_unlock(&kvm->srcu, idx);

  
  kvm_put_kvm needs to go last.  I can fix when applying, but I'll wait
  for Gleb to take a look too.
 I'm curious why you would say that -- since the way I sent it has the
 lock teardown be symmetrical and opposite to the build-up - e.g.
 
   idx = srcu_read_lock(&kvm->srcu);
 
 [...]
 
 + kvm_get_kvm(kvm);
 
 [...]
   spin_lock(&kvm->mmu_lock);
  
 [...]
 
  unlock:
   spin_unlock(&kvm->mmu_lock);
  + kvm_put_kvm(kvm);
   srcu_read_unlock(&kvm->srcu, idx);
  
 You'd originally said to put the kvm_get_kvm where it currently is;
 perhaps instead we want the get/put to encompass the whole 
 srcu_read locked section?

The put really needs to be the last thing you do, as the data structure
can be destroyed before it returns.  Where you put kvm_get_kvm doesn't
really matter, since you're protected by the kvm lock.  So, moving the
kvm_get_kvm before would also work---I didn't really mean that
kvm_get_kvm has to be literally just before the raw_spin_unlock.

However, I actually like having the get_kvm right there, because it
makes it explicit that you are using reference counting as a substitute
for holding the lock.  I find it quite idiomatic, and in some sense the
lock/unlock is still symmetric: the kvm_put_kvm goes exactly where you'd
have unlocked the kvm_lock.

Paolo
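
In outline, the ordering Paolo describes looks like this (a sketch using the
names from the patch, not the final code):

    kvm_get_kvm(kvm);               /* anywhere while kvm_lock is held */
    raw_spin_unlock(&kvm_lock);

    idx = srcu_read_lock(&kvm->srcu);
    spin_lock(&kvm->mmu_lock);
    /* ... mmu work ... */
    spin_unlock(&kvm->mmu_lock);
    srcu_read_unlock(&kvm->srcu, idx);

    kvm_put_kvm(kvm);               /* last: the final put may free kvm,
                                       including kvm->srcu */

The reference pins the structure across the section, so the put must follow
every access to kvm, srcu included.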


Re: [RFC 0/5] Introduce VM Sockets virtio transport

2013-06-27 Thread Michael S. Tsirkin
On Thu, Jun 27, 2013 at 03:59:59PM +0800, Asias He wrote:
 Hello guys,
 
 In commit d021c344051af91 (VSOCK: Introduce VM Sockets), VMware added VM
 Sockets support. VM Sockets allows communication between virtual
 machines and the hypervisor. VM Sockets is able to use different
 hypervisor-neutral transports to transfer data. Currently, only the VMware
 VMCI transport is supported. 
 
 This series introduces virtio transport for VM Sockets.
 
 Any comments are appreciated! Thanks!
 
 Code:
 =
 1) kernel bits
git://github.com/asias/linux.git vsock
 
 2) userspace bits:
git://github.com/asias/linux-kvm.git vsock
 
 Howto:
 =
 Make sure you have these kernel options:
 
   CONFIG_VSOCKETS=y
   CONFIG_VIRTIO_VSOCKETS=y
   CONFIG_VIRTIO_VSOCKETS_COMMON=y
   CONFIG_VHOST_VSOCK=m
 
 $ git clone git://github.com/asias/linux-kvm.git
 $ cd linux-kvm/tools/kvm
 $ git checkout -b vsock origin/vsock
 $ make
 $ modprobe vhost_vsock
 $ ./lkvm run -d os.img -k bzImage --vsock guest_cid
 
 Test:
 =
 I hacked busybox's http server and wget to run over vsock. Start http
 server in host and guest, download a 512MB file in guest and host
 simultaneously 6000 times. Managed to run the http stress test.
 
 Also, I wrote a small libvsock.so to play the LD_PRELOAD trick and
 managed to make sshd and ssh work over virtio-vsock without modifying
 the source code.
 
 Draft VM Sockets Virtio Device spec:
 =
 Appendix K: VM Sockets Device
 
 The virtio VM sockets device is a virtio transport device for VM Sockets. VM
 Sockets allows communication between virtual machines and the hypervisor.
 
 Configuration:
 
 Subsystem Device ID 13
 
 Virtqueues:
 0:controlq; 1:receiveq0; 2:transmitq0 ... 2N+1:receiveqN; 2N+2:transmitqN
 
 Feature bits:
 Currently, no feature bits are defined.
 
 Device configuration layout:
 
 Two configuration fields are currently defined.
 
struct virtio_vsock_config {

which fields are RW,RO,WO?

__u32 guest_cid;

Given that cid is like an IP address, 32 bit seems too
limiting. I would go for a 64 bit one or maybe even 128 bit,
so that e.g. GUIDs can be used there.


__u32 max_virtqueue_pairs;

I'd make this little endian.

} __packed;


 
 The guest_cid field specifies the guest context id, which is like the guest IP
 address. The max_virtqueue_pairs field specifies the maximum number of receive
 and transmit virtqueue pairs (receiveq0 ...  receiveqN and transmitq0 ...
 transmitqN respectively; N = max_virtqueue_pairs - 1 ) that can be configured.
 The driver is free to use only one virtqueue pair, or it can use more to
 achieve better performance.

Don't we need a field for driver to specify the # of VQs?

I note packets have no sequence numbers.
This means that a given stream should only use
a single VQ in each direction, correct?
Maybe make this explicit.

 
 Device Initialization:
 The initialization routine should discover the device's virtqueues.
 
 Device Operation:
 Packets are transmitted by placing them in the transmitq0..transmitqN, and
 buffers for incoming packets are placed in the receiveq0..receiveqN. In each
 case, the packet itself is preceded by a header:
 
struct virtio_vsock_hdr {

Let's make header explicitly little endian and avoid the
heartburn we have with many other transports.

__u32   src_cid;
__u32   src_port;
__u32   dst_cid;
__u32   dst_port;

Ports are 32 bit? I guess most applications can't work with 16 bit.

Also, why put cid's in all packets? They are only needed
when you create a connection, no? Afterwards port numbers
can be used.

__u32   len;
__u8type;
__u8op;
__u8shut;

Please add padding to align all fields naturally.

__u64   fwd_cnt;
__u64   buf_alloc;

Is a 64 bit counter really needed? 64 bit math
has portability limitations and performance overhead on many
architectures.

} __packed;

Packing produces terrible code in many compilers.
Please avoid packed structures on the data path; instead,
pad structures explicitly to align all fields naturally.

 
 src_cid and dst_cid: specify the source and destination context id.
 src_port and dst_port: specify the source and destination port.
 len: specifies the size of the data payload, it could be zero if no data
 payload is transferred.
 type: specifies the type of the packet, it can be SOCK_STREAM or SOCK_DGRAM.
 op: specifies the operation of the packet, it is defined as follows.
 
enum {
VIRTIO_VSOCK_OP_INVALID = 0,
VIRTIO_VSOCK_OP_REQUEST = 1,
VIRTIO_VSOCK_OP_NEGOTIATE = 2,
VIRTIO_VSOCK_OP_OFFER = 3,
VIRTIO_VSOCK_OP_ATTACH = 4,
VIRTIO_VSOCK_OP_RW = 5,
VIRTIO_VSOCK_OP_CREDIT = 6,
VIRTIO_VSOCK_OP_RST = 7,
VIRTIO_VSOCK_OP_SHUTDOWN = 8,
};
 
 shut: 
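
Pulling the review comments together, a hypothetical revision of the header
might look like the following sketch - 64-bit CIDs, explicit little-endian
fields, natural alignment instead of __packed, and 32-bit credit counters.
This is illustrative only, not a layout posted in the series:

    struct virtio_vsock_hdr {
    	__le64	src_cid;	/* 64-bit CIDs leave room for GUID-like ids */
    	__le64	dst_cid;
    	__le32	src_port;
    	__le32	dst_port;
    	__le32	len;
    	__u8	type;
    	__u8	op;
    	__u8	shut;
    	__u8	pad;		/* explicit padding: 40 bytes total, every
    				   field naturally aligned, no __packed */
    	__le32	fwd_cnt;
    	__le32	buf_alloc;
    };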

Re: [RFC 2/5] VSOCK: Introduce virtio-vsock-common.ko

2013-06-27 Thread Michael S. Tsirkin
On Thu, Jun 27, 2013 at 04:00:01PM +0800, Asias He wrote:
 This module contains the common code and header files for the following
 virtio-vsock and virtio-vhost kernel modules.
 
 Signed-off-by: Asias He as...@redhat.com
 ---
  include/linux/virtio_vsock.h| 200 +++
  include/uapi/linux/virtio_ids.h |   1 +
  include/uapi/linux/virtio_vsock.h   |  70 +++
  net/vmw_vsock/virtio_transport_common.c | 992 
 
  4 files changed, 1263 insertions(+)
  create mode 100644 include/linux/virtio_vsock.h
  create mode 100644 include/uapi/linux/virtio_vsock.h
  create mode 100644 net/vmw_vsock/virtio_transport_common.c
 
 diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
 new file mode 100644
 index 000..cd8ed95
 --- /dev/null
 +++ b/include/linux/virtio_vsock.h
 @@ -0,0 +1,200 @@
 +/*
 + * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
 + * anyone can use the definitions to implement compatible drivers/servers:
 + *
 + *
 + * Redistribution and use in source and binary forms, with or without
 + * modification, are permitted provided that the following conditions
 + * are met:
 + * 1. Redistributions of source code must retain the above copyright
 + *notice, this list of conditions and the following disclaimer.
 + * 2. Redistributions in binary form must reproduce the above copyright
 + *notice, this list of conditions and the following disclaimer in the
 + *documentation and/or other materials provided with the distribution.
 + * 3. Neither the name of IBM nor the names of its contributors
 + *may be used to endorse or promote products derived from this software
 + *without specific prior written permission.
 + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS 
 IS''
 + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
 + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 + * SUCH DAMAGE.
 + *
 + * Copyright (C) Red Hat, Inc., 2013
 + * Copyright (C) Asias He as...@redhat.com, 2013
 + */
 +
 +#ifndef _LINUX_VIRTIO_VSOCK_H
 +#define _LINUX_VIRTIO_VSOCK_H
 +
 +#include <uapi/linux/virtio_vsock.h>
 +#include <linux/socket.h>
 +#include <net/sock.h>
 +
 +#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
 +#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
 +#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE	(1024 * 256)
 +#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
 +#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
 +
 +struct vsock_transport_recv_notify_data;
 +struct vsock_transport_send_notify_data;
 +struct sockaddr_vm;
 +struct vsock_sock;
 +
 +enum {
 + VSOCK_VQ_CTRL   = 0,
 + VSOCK_VQ_RX = 1, /* for host to guest data */
 + VSOCK_VQ_TX = 2, /* for guest to host data */
 + VSOCK_VQ_MAX= 3,
 +};
 +
 +/* virtio transport socket state */
 +struct virtio_transport {
 + struct virtio_transport_pkt_ops *ops;
 + struct vsock_sock *vsk;
 +
 + u64 buf_size;
 + u64 buf_size_min;
 + u64 buf_size_max;
 +
 + struct mutex tx_lock;
 + struct mutex rx_lock;
 +
 + struct list_head rx_queue;
 + u64 rx_bytes;
 +
 + /* Protected by trans->tx_lock */
 + u64 tx_cnt;
 + u64 buf_alloc;
 + u64 peer_fwd_cnt;
 + u64 peer_buf_alloc;
 + /* Protected by trans->rx_lock */
 + u64 fwd_cnt;
 +};
 +
 +struct virtio_vsock_pkt {
 + struct virtio_vsock_hdr hdr;
 + struct virtio_transport *trans;
 + struct work_struct work;
 + struct list_head list;
 + void *buf;
 + u32 len;
 + u32 off;
 +};
 +
 +struct virtio_vsock_pkt_info {
 + struct sockaddr_vm *src;
 + struct sockaddr_vm *dst;
 + struct iovec *iov;
 + u32 len;
 + u8 type;
 + u8 op;
 + u8 shut;
 +};
 +
 +struct virtio_transport_pkt_ops {
 + int (*send_pkt)(struct vsock_sock *vsk,
 + struct virtio_vsock_pkt_info *info);
 +};
 +
 +void virtio_vsock_dumppkt(const char *func,
 +   const struct virtio_vsock_pkt *pkt);
 +
 +struct sock *
 +virtio_transport_get_pending(struct sock *listener,
 +  struct virtio_vsock_pkt *pkt);
 +struct virtio_vsock_pkt *
 +virtio_transport_alloc_pkt(struct vsock_sock *vsk,
 +struct virtio_vsock_pkt_info *info,
 +size_t len,
 +u32 

Re: [RFC 4/5] VSOCK: Introduce vhost-vsock.ko

2013-06-27 Thread Michael S. Tsirkin
On Thu, Jun 27, 2013 at 04:00:03PM +0800, Asias He wrote:
 VM sockets vhost transport implementation. This module runs in host
 kernel.
 
 Signed-off-by: Asias He as...@redhat.com

Has any thought been given to how this affects migration?
I don't see any API for an application to
move to a different host and reconnect to a running
vsock in guest.

I think we could merge without this, there are more
pressing issues, but it's probably a requirement
if you want this to replace e.g. serial in many
scenarios.

 ---
  drivers/vhost/vsock.c | 534 
 ++
  drivers/vhost/vsock.h |   4 +
  2 files changed, 538 insertions(+)
  create mode 100644 drivers/vhost/vsock.c
  create mode 100644 drivers/vhost/vsock.h
 
 diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
 new file mode 100644
 index 000..cb54090
 --- /dev/null
 +++ b/drivers/vhost/vsock.c
 @@ -0,0 +1,534 @@
 +/*
 + * vhost transport for vsock
 + *
 + * Copyright (C) 2013 Red Hat, Inc.
 + * Author: Asias He as...@redhat.com
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.
 + */
 +#include <linux/miscdevice.h>
 +#include <linux/module.h>
 +#include <linux/mutex.h>
 +#include <net/sock.h>
 +#include <linux/virtio_vsock.h>
 +#include <linux/vhost.h>
 +
 +#include "../../../net/vmw_vsock/af_vsock.h"

Send patch to move this to include/linux ?

 +#include "vhost.h"
 +#include "vsock.h"
 +
 +#define VHOST_VSOCK_DEFAULT_HOST_CID 2;

Sure you want that ; there? This can result in strange code, e.g.

int a = VHOST_VSOCK_DEFAULT_HOST_CID + 1;
sets a to 2.
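
Spelled out, the expansion hazard looks like this (a contrived illustration):

    #define VHOST_VSOCK_DEFAULT_HOST_CID 2;	/* note the stray ';' */

    void demo(void)
    {
    	int a = VHOST_VSOCK_DEFAULT_HOST_CID + 1;
    	/* expands to:  int a = 2; + 1;
    	 * "+ 1" becomes a separate, discarded statement,
    	 * so a ends up as 2, not 3 */
    }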

 +
 +static int vhost_transport_socket_init(struct vsock_sock *vsk,
 +struct vsock_sock *psk);
 +
 +enum {
 + VHOST_VSOCK_FEATURES = VHOST_FEATURES,
 +};
 +
 +/* Used to track all the vhost_vsock instacne on the system. */

typo

 +static LIST_HEAD(vhost_vsock_list);
 +static DEFINE_MUTEX(vhost_vsock_mutex);
 +
 +struct vhost_vsock_virtqueue {
 + struct vhost_virtqueue vq;
 +};
 +
 +struct vhost_vsock {
 + /* Vhost device */
 + struct vhost_dev dev;
 + /* Vhost vsock virtqueue*/
 + struct vhost_vsock_virtqueue vqs[VSOCK_VQ_MAX];
 + /* Link to global vhost_vsock_list*/
 + struct list_head list;
 + /* Head for pkt from host to guest */
 + struct list_head send_pkt_list;
 + /* Work item to send pkt */
 + struct vhost_work send_pkt_work;
 + /* Guest context id this vhost_vsock instance handles */
 + u32 guest_cid;
 +};
 +
 +static u32 vhost_transport_get_local_cid(void)
 +{
 + u32 cid = VHOST_VSOCK_DEFAULT_HOST_CID;
 + return cid;
 +}
 +

Interesting. So all hosts in fact have the same CID?

 +static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
 +{
 + struct vhost_vsock *vsock;
 +
 + mutex_lock(&vhost_vsock_mutex);
 + list_for_each_entry(vsock, &vhost_vsock_list, list) {
 + if (vsock->guest_cid == guest_cid) {
 + mutex_unlock(&vhost_vsock_mutex);
 + return vsock;
 + }
 + }
 + mutex_unlock(&vhost_vsock_mutex);
 +
 + return NULL;
 +}
 +
 +static void
 +vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 + struct vhost_virtqueue *vq)
 +{
 + struct virtio_vsock_pkt *pkt;
 + unsigned out, in;
 + struct sock *sk;
 + int head, ret;
 +
 + mutex_lock(&vq->mutex);
 + vhost_disable_notify(&vsock->dev, vq);
 + for (;;) {
 + if (list_empty(&vsock->send_pkt_list)) {
 + vhost_enable_notify(&vsock->dev, vq);
 + break;
 + }
 +
 + head = vhost_get_vq_desc(&vsock->dev, vq, vq->iov,
 + ARRAY_SIZE(vq->iov), &out, &in,
 + NULL, NULL);
 + pr_debug("%s: head = %d\n", __func__, head);
 + if (head < 0)
 + break;
 +
 + if (head == vq->num) {
 + if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
 + vhost_disable_notify(&vsock->dev, vq);
 + continue;
 + }
 + break;
 + }
 +
 + pkt = list_first_entry(&vsock->send_pkt_list,
 +struct virtio_vsock_pkt, list);
 + list_del_init(&pkt->list);
 +
 + /* FIXME: no assumption of frame layout */

Pls fix. memcpy_from_iovec is not harder.

 + ret = __copy_to_user(vq->iov[0].iov_base, &pkt->hdr,
 +  sizeof(pkt->hdr));
 + if (ret) {
 + virtio_transport_free_pkt(pkt);
 + vq_err(vq, "Faulted on copying pkt hdr\n");
 + break;
 + }
 + if (pkt->buf && pkt->len > 0) {
 + ret = __copy_to_user(vq->iov[1].iov_base, pkt->buf,
 + pkt->len);
 +   
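
One way to drop the layout assumption, sketched with the 3.10-era iovec
helper memcpy_toiovec(), which copies to the user buffers and advances the
iovec in place as it goes. This is an illustration of the review comment,
not the posted code:

    ret = memcpy_toiovec(vq->iov, (unsigned char *)&pkt->hdr,
    			 sizeof(pkt->hdr));
    if (!ret && pkt->buf && pkt->len > 0)
    	ret = memcpy_toiovec(vq->iov, pkt->buf, pkt->len);
    if (ret) {
    	virtio_transport_free_pkt(pkt);
    	vq_err(vq, "Faulted on copying pkt\n");
    	break;
    }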

Re: [PATCH 3/8] vfio: add external user support

2013-06-27 Thread Alexey Kardashevskiy
On 06/27/2013 07:42 PM, Benjamin Herrenschmidt wrote:
 On Thu, 2013-06-27 at 16:59 +1000, Stephen Rothwell wrote:
 +/* Allows an external user (for example, KVM) to unlock an IOMMU
 group */
 +static void vfio_group_del_external_user(struct file *filep)
 +{
 + struct vfio_group *group = filep->private_data;
 +
 + BUG_ON(filep->f_op != &vfio_group_fops);

 We usually reserve BUG_ON for situations where there is no way to
 continue running or continuing will corrupt the running kernel.  Maybe
 WARN_ON() and return?
 
 Not even that. This is a user space provided fd, we shouldn't oops the
 kernel because we passed a wrong argument, just return -EINVAL or
 something like that (add a return code).

I'll change to WARN_ON but...
This is going to be called on KVM exit on a file pointer previously
verified for correctness. If it is a wrong file*, then something went
terribly wrong.


-- 
Alexey
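
A sketch of the defensive variant being suggested (illustrative, not the
posted patch):

    /* Fail soft on a bogus file instead of oopsing the kernel. */
    static int vfio_group_del_external_user(struct file *filep)
    {
    	struct vfio_group *group = filep->private_data;

    	if (WARN_ON(filep->f_op != &vfio_group_fops))
    		return -EINVAL;

    	/* ... drop the reference taken in
    	 * vfio_group_add_external_user() ... */
    	return 0;
    }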


Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-27 Thread Xiao Guangrong
On 06/27/2013 06:19 PM, Gleb Natapov wrote:
 On Thu, Jun 27, 2013 at 05:50:08PM +0800, Xiao Guangrong wrote:
 On 06/27/2013 05:21 PM, Gleb Natapov wrote:
 On Thu, Jun 27, 2013 at 12:14:24PM +0300, Gleb Natapov wrote:
 On Thu, Jun 27, 2013 at 12:01:10PM +0300, Gleb Natapov wrote:
 On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
 On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
 This patch tries to introduce a very simple and scalable way to invalidate
 all mmio sptes - it need not walk any shadow pages or hold mmu-lock

 KVM maintains a global mmio valid generation-number which is stored in
 kvm->memslots.generation, and every mmio spte stores the current global
 generation-number in its available bits when it is created

 When KVM needs to zap all mmio sptes, it simply increases the global
 generation-number. When a guest does mmio access, KVM intercepts the MMIO #PF,
 walks the shadow page table and gets the mmio spte. If the generation-number
 on the spte does not equal the global generation-number, it goes to the
 normal #PF handler to update the mmio spte

 Since 19 bits are used to store the generation-number on an mmio spte, we
 zap all mmio sptes when the number wraps around

 So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
 fails too, but I haven't checked what happens exactly.

 Something wrong with gfn calculation during mmio:

 qemu-system-x86-17003 [000]  3962.625103: handle_mmio_page_fault: 
 addr:c00ba6c0 gfn 1ba access a92
 qemu-system-x86-17003 [000]  3962.774862: handle_mmio_page_fault: 
 addr:b170 gfn 10fee00 access a92

 Hmm, so I wonder why get_mmio_spte_gfn() does not clear the gen bits.

 Hmm, something like patch below fixes it. Will test more.


 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 1fd2c05..aec9c05 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -260,7 +260,8 @@ static bool is_mmio_spte(u64 spte)

  static gfn_t get_mmio_spte_gfn(u64 spte)
  {
 -   return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
 +   u64 mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;
 +   return (spte & ~mask) >> PAGE_SHIFT;
  }

 Looks nice.

 The question is if get_mmio_spte_access() needs the same treatment?

It works okay since the access only uses bit 1 and bit 2 (and in the direct mmu
case, only the gfn is used). But I am happy to make the same change in
get_mmio_spte_access() to make the code clearer.




Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-27 Thread David Gibson
On Sun, Jun 23, 2013 at 10:41:24PM -0600, Alex Williamson wrote:
 On Mon, 2013-06-24 at 13:52 +1000, David Gibson wrote:
  On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote:
   On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote:
 On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
  On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
   On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
   Just out of curiosity - would not get_file() and fput_atomic() 
   on a
   group's
   file* do the right job instead of 
   vfio_group_add_external_user() and
   vfio_group_del_external_user()?
  
   I was thinking that too.  Grabbing a file reference would 
   certainly be
   the usual way of handling this sort of thing.
   
   But that wouldn't prevent the group ownership from being returned to
   the kernel or another user, would it?
  
  
  Holding the file pointer does not let the group->container_users
  counter go
  to zero
 
 How so?  Holding the file pointer means the file won't go away, which
 means the group release function won't be called.  That means the 
 group
 won't go away, but that doesn't mean it's attached to an IOMMU.  A 
 user
 could call UNSET_CONTAINER.

Uhh... *thinks*.  Ah, I see.

I think the interface should not take the group fd, but the container
fd.  Holding a reference to *that* would keep the necessary things
around.  But more to the point, it's the right thing semantically:

The container is essentially the handle on a host iommu address space,
and so that's what should be bound by the KVM call to a particular
guest iommu address space.  e.g. it would make no sense to bind two
different groups to different guest iommu address spaces, if they were
in the same container - the guest thinks they are different spaces,
but if they're in the same container they must be the same space.
   
   While the container is the gateway to the iommu, what empowers the
   container to maintain an iommu is the group.  What happens to a
   container when all the groups are disconnected or closed?  Groups are
   the unit that indicates hardware access, not containers.  Thanks,
  
  Uh... huh?  I'm really not sure what you're getting at.
  
  The operation we're doing for KVM here is binding a guest iommu
  address space to a particular host iommu address space.  Why would we
  not want to use the obvious handle on the host iommu address space,
  which is the container fd?
 
 AIUI, the request isn't for an interface through which to do iommu
 mappings.  The request is for an interface to show that the user has
 sufficient privileges to do mappings.  Groups are what gives the user
 that ability.  The iommu is also possibly associated with multiple iommu
 groups and I believe what is being asked for here is a way to hold and
 lock a single iommu group with iommu protection.
 
 From a practical point of view, the iommu interface is de-privileged
 once the groups are disconnected or closed.  Holding a reference count
 on the iommu fd won't prevent that.  That means we'd have to use a
 notifier to have KVM stop the side-channel iommu access.  Meanwhile
 holding the file descriptor for the group and adding an interface that
 bumps use counter allows KVM to lock itself in, just as if it had a
 device opened itself.  Thanks,

Ah, good point.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH-next] kvm: don't try to take mmu_lock while holding the main raw kvm_lock

2013-06-27 Thread Gleb Natapov
On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
 In commit e935b8372cf8 (KVM: Convert kvm_lock to raw_spinlock),
I am copying Jan, the author of the patch. Commit message says:
Code under this lock requires non-preemptibility, but which code
exactly is this? Is this still true?

 the kvm_lock was made a raw lock.  However, the kvm mmu_shrink()
 function tries to grab the (non-raw) mmu_lock within the scope of
 the raw locked kvm_lock being held.  This leads to the following:
 
 BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
 in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
 Preemption disabled at:[a0376eac] mmu_shrink+0x5c/0x1b0 [kvm]
 
 Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
 Call Trace:
  [8106f2ad] __might_sleep+0xfd/0x160
  [817d8d64] rt_spin_lock+0x24/0x50
  [a0376f3c] mmu_shrink+0xec/0x1b0 [kvm]
  [8111455d] shrink_slab+0x17d/0x3a0
  [81151f00] ? mem_cgroup_iter+0x130/0x260
  [8111824a] balance_pgdat+0x54a/0x730
  [8111fe47] ? set_pgdat_percpu_threshold+0xa7/0xd0
  [811185bf] kswapd+0x18f/0x490
  [81070961] ? get_parent_ip+0x11/0x50
  [81061970] ? __init_waitqueue_head+0x50/0x50
  [81118430] ? balance_pgdat+0x730/0x730
  [81060d2b] kthread+0xdb/0xe0
  [8106e122] ? finish_task_switch+0x52/0x100
  [817e1e94] kernel_thread_helper+0x4/0x10
  [81060c50] ? __init_kthread_worker+0x
 
 Since we only use the lock for protecting the vm_list, once we've
 found the instance we want, we can shuffle it to the end of the
 list and then drop the kvm_lock before taking the mmu_lock.  We
 can do this because after the mmu operations are completed, we
 break -- i.e. we don't continue list processing, so it doesn't
 matter if the list changed around us.
 
 Signed-off-by: Paul Gortmaker paul.gortma...@windriver.com
 ---
 
 [Note1: do double check that this solution makes sense for the
  mainline kernel; consider this an RFC patch that does want a
  review from people in the know.]
 
 [Note2: you'll need to be running a preempt-rt kernel to actually
  see this.  Also note that the above patch is against linux-next.
  Alternate solutions welcome ; this seemed to me the obvious fix.]
 
  arch/x86/kvm/mmu.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 748e0d8..db93a70 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -4322,6 +4322,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct 
 shrink_control *sc)
  {
   struct kvm *kvm;
   int nr_to_scan = sc->nr_to_scan;
 + int found = 0;
   unsigned long freed = 0;
  
   raw_spin_lock(&kvm_lock);
 @@ -4349,6 +4350,12 @@ mmu_shrink_scan(struct shrinker *shrink, struct 
 shrink_control *sc)
   continue;
  
   idx = srcu_read_lock(&kvm->srcu);
 +
 + list_move_tail(&kvm->vm_list, &vm_list);
 + found = 1;
 + /* We can't be holding a raw lock and take non-raw mmu_lock */
 + raw_spin_unlock(&kvm_lock);
 +
   spin_lock(&kvm->mmu_lock);
  
   if (kvm_has_zapped_obsolete_pages(kvm)) {
 @@ -4370,11 +4377,12 @@ unlock:
* per-vm shrinkers cry out
* sadness comes quickly
*/
 - list_move_tail(&kvm->vm_list, &vm_list);
   break;
   }
  
 - raw_spin_unlock(&kvm_lock);
 + if (!found)
 + raw_spin_unlock(&kvm_lock);
 +
   return freed;
  
  }
 -- 
 1.8.1.2

--
Gleb.
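
In outline, the locking dance the patch performs (a sketch with the shrinker
bookkeeping elided; interesting() stands in for the real checks):

    raw_spin_lock(&kvm_lock);
    list_for_each_entry(kvm, &vm_list, vm_list) {
    	if (!interesting(kvm))		/* placeholder for the real tests */
    		continue;

    	list_move_tail(&kvm->vm_list, &vm_list);
    	raw_spin_unlock(&kvm_lock);	/* drop the raw lock first ... */

    	spin_lock(&kvm->mmu_lock);	/* ... then take the (sleeping
    					   on -rt) mmu_lock */
    	/* ... zap obsolete pages ... */
    	spin_unlock(&kvm->mmu_lock);
    	return freed;			/* the loop is never re-entered */
    }
    raw_spin_unlock(&kvm_lock);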


Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-27 Thread Gleb Natapov
On Thu, Jun 27, 2013 at 07:05:20PM +0800, Xiao Guangrong wrote:
 On 06/27/2013 06:19 PM, Gleb Natapov wrote:
  On Thu, Jun 27, 2013 at 05:50:08PM +0800, Xiao Guangrong wrote:
  On 06/27/2013 05:21 PM, Gleb Natapov wrote:
  On Thu, Jun 27, 2013 at 12:14:24PM +0300, Gleb Natapov wrote:
  On Thu, Jun 27, 2013 at 12:01:10PM +0300, Gleb Natapov wrote:
  On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
  On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
  This patch tries to introduce a very simple and scalable way to invalidate
  all mmio sptes - it need not walk any shadow pages or hold mmu-lock

  KVM maintains a global mmio valid generation-number which is stored in
  kvm->memslots.generation, and every mmio spte stores the current global
  generation-number in its available bits when it is created

  When KVM needs to zap all mmio sptes, it simply increases the global
  generation-number. When a guest does mmio access, KVM intercepts the MMIO #PF,
  walks the shadow page table and gets the mmio spte. If the generation-number
  on the spte does not equal the global generation-number, it goes to the
  normal #PF handler to update the mmio spte

  Since 19 bits are used to store the generation-number on an mmio spte, we
  zap all mmio sptes when the number wraps around
 
  So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
  fails too, but I haven't checked what happens exactly.
 
  Something wrong with gfn calculation during mmio:
 
  qemu-system-x86-17003 [000]  3962.625103: handle_mmio_page_fault: 
  addr:c00ba6c0 gfn 1ba access a92
  qemu-system-x86-17003 [000]  3962.774862: handle_mmio_page_fault: 
  addr:b170 gfn 10fee00 access a92
 
  Hmm, so I wonder why get_mmio_spte_gfn() does not clear the gen bits.
 
  Hmm, something like patch below fixes it. Will test more.
 
 
  diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
  index 1fd2c05..aec9c05 100644
  --- a/arch/x86/kvm/mmu.c
  +++ b/arch/x86/kvm/mmu.c
  @@ -260,7 +260,8 @@ static bool is_mmio_spte(u64 spte)
 
   static gfn_t get_mmio_spte_gfn(u64 spte)
   {
  - return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
  + u64 mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;
  + return (spte & ~mask) >> PAGE_SHIFT;
   }
 
  Looks nice.
 
  The question is if get_mmio_spte_access() needs the same treatment?
 
 It works okay since the access only uses bit 1 and bit 2 (and in the direct mmu
 case, only the gfn is used). But I am happy to make the same change in
 get_mmio_spte_access() to make the code clearer.
 
It will fix the output of handle_mmio_page_fault at least. Currently we have
access a92 there.

--
Gleb.
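
The symmetric change being discussed would presumably mirror the gfn fix
(a sketch, assuming the same mask applies to the access bits):

    static unsigned get_mmio_spte_access(u64 spte)
    {
    	u64 mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;

    	/* keep only the low, non-generation bits of the spte */
    	return (spte & ~mask) & ~PAGE_MASK;
    }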


Re: [PATCH-next] kvm: don't try to take mmu_lock while holding the main raw kvm_lock

2013-06-27 Thread Paolo Bonzini
On 27/06/2013 13:09, Gleb Natapov wrote:
 On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
 In commit e935b8372cf8 (KVM: Convert kvm_lock to raw_spinlock),
 I am copying Jan, the author of the patch. Commit message says:
 Code under this lock requires non-preemptibility, but which code
 exactly is this? Is this still true?

hardware_enable_nolock/hardware_disable_nolock does.

Paolo

 the kvm_lock was made a raw lock.  However, the kvm mmu_shrink()
 function tries to grab the (non-raw) mmu_lock within the scope of
 the raw locked kvm_lock being held.  This leads to the following:

 BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
 in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
 Preemption disabled at:[a0376eac] mmu_shrink+0x5c/0x1b0 [kvm]

 Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
 Call Trace:
  [8106f2ad] __might_sleep+0xfd/0x160
  [817d8d64] rt_spin_lock+0x24/0x50
  [a0376f3c] mmu_shrink+0xec/0x1b0 [kvm]
  [8111455d] shrink_slab+0x17d/0x3a0
  [81151f00] ? mem_cgroup_iter+0x130/0x260
  [8111824a] balance_pgdat+0x54a/0x730
  [8111fe47] ? set_pgdat_percpu_threshold+0xa7/0xd0
  [811185bf] kswapd+0x18f/0x490
  [81070961] ? get_parent_ip+0x11/0x50
  [81061970] ? __init_waitqueue_head+0x50/0x50
  [81118430] ? balance_pgdat+0x730/0x730
  [81060d2b] kthread+0xdb/0xe0
  [8106e122] ? finish_task_switch+0x52/0x100
  [817e1e94] kernel_thread_helper+0x4/0x10
  [81060c50] ? __init_kthread_worker+0x

 Since we only use the lock for protecting the vm_list, once we've
 found the instance we want, we can shuffle it to the end of the
 list and then drop the kvm_lock before taking the mmu_lock.  We
 can do this because after the mmu operations are completed, we
 break -- i.e. we don't continue list processing, so it doesn't
 matter if the list changed around us.

 Signed-off-by: Paul Gortmaker paul.gortma...@windriver.com
 ---

 [Note1: do double check that this solution makes sense for the
  mainline kernel; consider this an RFC patch that does want a
  review from people in the know.]

 [Note2: you'll need to be running a preempt-rt kernel to actually
  see this.  Also note that the above patch is against linux-next.
  Alternate solutions welcome ; this seemed to me the obvious fix.]

  arch/x86/kvm/mmu.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 748e0d8..db93a70 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -4322,6 +4322,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct 
 shrink_control *sc)
  {
  struct kvm *kvm;
   int nr_to_scan = sc->nr_to_scan;
 +int found = 0;
  unsigned long freed = 0;
  
   raw_spin_lock(&kvm_lock);
 @@ -4349,6 +4350,12 @@ mmu_shrink_scan(struct shrinker *shrink, struct 
 shrink_control *sc)
  continue;
  
  idx = srcu_read_lock(&kvm->srcu);
 +
 +list_move_tail(&kvm->vm_list, &vm_list);
 +found = 1;
 +/* We can't be holding a raw lock and take non-raw mmu_lock */
 +raw_spin_unlock(&kvm_lock);
 +
  spin_lock(&kvm->mmu_lock);
  
  if (kvm_has_zapped_obsolete_pages(kvm)) {
 @@ -4370,11 +4377,12 @@ unlock:
   * per-vm shrinkers cry out
   * sadness comes quickly
   */
 -list_move_tail(&kvm->vm_list, &vm_list);
  break;
  }
  
 -raw_spin_unlock(&kvm_lock);
 +if (!found)
 +raw_spin_unlock(&kvm_lock);
 +
  return freed;
  
  }
 -- 
 1.8.1.2
 
 --
   Gleb.
 



Re: [PATCH-next] kvm: don't try to take mmu_lock while holding the main raw kvm_lock

2013-06-27 Thread Gleb Natapov
On Thu, Jun 27, 2013 at 01:38:29PM +0200, Paolo Bonzini wrote:
 On 27/06/2013 13:09, Gleb Natapov wrote:
  On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
  In commit e935b8372cf8 (KVM: Convert kvm_lock to raw_spinlock),
  I am copying Jan, the author of the patch. Commit message says:
  Code under this lock requires non-preemptibility, but which code
  exactly is this? Is this still true?
 
 hardware_enable_nolock/hardware_disable_nolock does.
 
I suspected this would be the answer and prepared another question :)
At a glance, kvm_lock is used to protect those just to avoid creating a
separate lock, so why not create a raw one to protect them and change
kvm_lock back to non-raw. Admittedly I haven't looked too closely into
this yet.

 Paolo
 
  the kvm_lock was made a raw lock.  However, the kvm mmu_shrink()
  function tries to grab the (non-raw) mmu_lock within the scope of
  the raw locked kvm_lock being held.  This leads to the following:
 
  BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
  in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
  Preemption disabled at:[a0376eac] mmu_shrink+0x5c/0x1b0 [kvm]
 
  Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
  Call Trace:
   [8106f2ad] __might_sleep+0xfd/0x160
   [817d8d64] rt_spin_lock+0x24/0x50
   [a0376f3c] mmu_shrink+0xec/0x1b0 [kvm]
   [8111455d] shrink_slab+0x17d/0x3a0
   [81151f00] ? mem_cgroup_iter+0x130/0x260
   [8111824a] balance_pgdat+0x54a/0x730
   [8111fe47] ? set_pgdat_percpu_threshold+0xa7/0xd0
   [811185bf] kswapd+0x18f/0x490
   [81070961] ? get_parent_ip+0x11/0x50
   [81061970] ? __init_waitqueue_head+0x50/0x50
   [81118430] ? balance_pgdat+0x730/0x730
   [81060d2b] kthread+0xdb/0xe0
   [8106e122] ? finish_task_switch+0x52/0x100
   [817e1e94] kernel_thread_helper+0x4/0x10
   [81060c50] ? __init_kthread_worker+0x
 
  Since we only use the lock for protecting the vm_list, once we've
  found the instance we want, we can shuffle it to the end of the
  list and then drop the kvm_lock before taking the mmu_lock.  We
  can do this because after the mmu operations are completed, we
  break -- i.e. we don't continue list processing, so it doesn't
  matter if the list changed around us.
 
  Signed-off-by: Paul Gortmaker paul.gortma...@windriver.com
  ---
 
  [Note1: do double check that this solution makes sense for the
   mainline kernel; consider this an RFC patch that does want a
   review from people in the know.]
 
  [Note2: you'll need to be running a preempt-rt kernel to actually
   see this.  Also note that the above patch is against linux-next.
   Alternate solutions welcome ; this seemed to me the obvious fix.]
 
   arch/x86/kvm/mmu.c | 12 ++--
   1 file changed, 10 insertions(+), 2 deletions(-)
 
  diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
  index 748e0d8..db93a70 100644
  --- a/arch/x86/kvm/mmu.c
  +++ b/arch/x86/kvm/mmu.c
  @@ -4322,6 +4322,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct 
  shrink_control *sc)
   {
 struct kvm *kvm;
  int nr_to_scan = sc->nr_to_scan;
  +  int found = 0;
 unsigned long freed = 0;
   
  raw_spin_lock(&kvm_lock);
  @@ -4349,6 +4350,12 @@ mmu_shrink_scan(struct shrinker *shrink, struct 
  shrink_control *sc)
 continue;
   
  idx = srcu_read_lock(&kvm->srcu);
  +
  +  list_move_tail(&kvm->vm_list, &vm_list);
  +  found = 1;
  +  /* We can't be holding a raw lock and take non-raw mmu_lock */
  +  raw_spin_unlock(&kvm_lock);
  +
  spin_lock(&kvm->mmu_lock);
   
 if (kvm_has_zapped_obsolete_pages(kvm)) {
  @@ -4370,11 +4377,12 @@ unlock:
  * per-vm shrinkers cry out
  * sadness comes quickly
  */
  -  list_move_tail(&kvm->vm_list, &vm_list);
 break;
 }
   
  -  raw_spin_unlock(&kvm_lock);
  +  if (!found)
  +  raw_spin_unlock(&kvm_lock);
  +
 return freed;
   
   }
  -- 
  1.8.1.2
  
  --
  Gleb.
  

--
Gleb.


Re: [PATCH RFC V10 0/18] Paravirtualized ticket spinlocks

2013-06-27 Thread Raghavendra K T

On 06/26/2013 02:03 PM, Raghavendra K T wrote:

On 06/24/2013 06:47 PM, Andrew Jones wrote:

On Mon, Jun 24, 2013 at 06:10:14PM +0530, Raghavendra K T wrote:


Results:
===
base = 3.10-rc2 kernel
patched = base + this series

The test was on 32 core (model: Intel(R) Xeon(R) CPU X7560) HT disabled
with a 32 vcpu KVM guest and 8GB RAM.


Have you ever tried to get results with HT enabled?



+------+------------+------------+------------+------------+--------------+
                   ebizzy (records/sec) higher is better
+------+------------+------------+------------+------------+--------------+
              base        stdev      patched        stdev   %improvement
+------+------------+------------+------------+------------+--------------+
 1x      5574.9000     237.4997    5618.0000      94.0366        0.77311
 2x      2741.5000     561.3090    3332.0000     102.4738       21.53930
 3x      2146.2500     216.7718    2302.0000      76.3870        7.27237
 4x      1663.0000     141.9235    1753.7500      83.5220        5.45701
+------+------------+------------+------------+------------+--------------+


This looks good. Are your ebizzy results consistent run to run
though?


+------+------------+------------+------------+------------+--------------+
                     dbench (Throughput) higher is better
+------+------------+------------+------------+------------+--------------+
              base        stdev      patched        stdev   %improvement
+------+------------+------------+------------+------------+--------------+
 1x     14111.5600     754.4525   14645.9900     114.3087        3.78718
 2x      2481.6270      71.2665    2667.1280      73.8193        7.47498
 3x      1510.2483      31.8634    1503.8792      36.0777       -0.42173
 4x      1029.4875      16.9166    1039.7069      43.8840        0.99267
+------+------------+------------+------------+------------+--------------+


Hmm, I wonder what 2.5x looks like. Also, the 3% improvement with
no overcommit is interesting. What's happening there? It makes
me wonder what < 1x looks like.



Hi Andrew,

I tried a 2.5x case of sorts, where I used 3 guests with 27 vcpus each on the
32 core (HT disabled) machine, and here is the output. Almost no gain there.

              throughput avg       stdev
base:         1768.7458 MB/sec     54.044221
patched:      1772.5617 MB/sec     41.227689
gain %:       0.226

I have yet to try the HT enabled cases, which would give 0.5x to 2x
performance results.



I have the results of the HT enabled case now.
config: 64 cpus total (HT on), 32 vcpu guests.
I am seeing some inconsistency in ebizzy results in this case (maybe Drew
tried with HT on and observed the same in his ebizzy runs).


patched-nople and base performance at 1.5x and 2x has also been a little
inconsistent for dbench. Overall I see the pvspinlock + PLE on case as more
stable, and overall pvspinlock performance seems to be very impressive in
the HT enabled case.


patched = pvspinv10_hton
+------+------------+------------+------------+------------+--------------+
                                  ebizzy
+------+------------+------------+------------+------------+--------------+
              base        stdev      patched        stdev   %improvement
+------+------------+------------+------------+------------+--------------+
 0.5x    6925.3000      74.4342    7317.0000      86.3018        5.65607
 1.0x    2379.8000     405.3519    3427.0000     574.8789       44.00370
 1.5x    1850.8333      97.8114    2733.4167     459.8016       47.68573
 2.0x    1477.6250     105.2411    2525.2500      97.5921       70.89925
+------+------------+------------+------------+------------+--------------+
+------+------------+------------+------------+------------+--------------+
                                  dbench
+------+------------+------------+------------+------------+--------------+
              base        stdev      patched        stdev   %improvement
+------+------------+------------+------------+------------+--------------+
 0.5x    9045.9950     463.1447   16482.7200      57.6017       82.21014
 1.0x    6251.1680     543.8219   11212.7600     380.7542       79.37064
 1.5x    3095.7475     231.1567    4308.8583     266.5873       39.18636
 2.0x    1219.1200      75.4294    1979.6750     134.6934       62.38557
+------+------------+------------+------------+------------+--------------+

patched = pvspinv10_hton_nople
+------+------------+------------+------------+------------+--------------+
                                  ebizzy
+------+------------+------------+------------+------------+--------------+
              base        stdev      patched        stdev   %improvement
+------+------------+------------+------------+------------+--------------+
 0.5x    6925.3000      74.4342    7473.8000     224.6344        7.92023
 1.0x    2379.8000     405.3519    6176.2000     417.1133      159.52601
 1.5x    1850.8333      97.8114    2214.1667     515.6875       19.63080
 2.0x    1477.6250     105.2411     758.0000     108.8131      -48.70146
+------+------------+------------+------------+------------+--------------+
+------+------------+------------+------------+------------+--------------+
                                  dbench
+------+------------+------------+------------+------------+--------------+
              base        stdev      patched        stdev   %improvement
+------+------------+------------+------------+------------+--------------+
 0.5x 9045.9950   463.1447  

Re: [PATCH-next] kvm: don't try to take mmu_lock while holding the main raw kvm_lock

2013-06-27 Thread Paolo Bonzini
On 27/06/2013 13:43, Gleb Natapov wrote:
   I am copying Jan, the author of the patch. Commit message says:
   Code under this lock requires non-preemptibility, but which code
   exactly is this? Is this still true?
  
  hardware_enable_nolock/hardware_disable_nolock does.
  
 I suspected this would be the answer and prepared another question :)
 At a glance, kvm_lock is used to protect those just to avoid creating a
 separate lock, so why not create a raw one to protect them and change
 kvm_lock back to non-raw. Admittedly I haven't looked too closely into
 this yet.

I was wondering the same, but I think it's fine.  There's just a handful
of uses outside virt/kvm/kvm_main.c.

Paolo
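
In sketch form, the split being floated (an assumption about a possible
follow-up, not an actual patch):

    /* Hypothetical: a small raw lock just for the hardware enable path,
     * letting kvm_lock revert to a plain spinlock around vm_list. */
    static DEFINE_RAW_SPINLOCK(hardware_enable_lock);
    static DEFINE_SPINLOCK(kvm_lock);

    static void hardware_enable(void *junk)
    {
    	raw_spin_lock(&hardware_enable_lock);
    	hardware_enable_nolock(junk);
    	raw_spin_unlock(&hardware_enable_lock);
    }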


Re: [PATCH-next] kvm: don't try to take mmu_lock while holding the main raw kvm_lock

2013-06-27 Thread Jan Kiszka
On 2013-06-27 13:38, Paolo Bonzini wrote:
 On 27/06/2013 13:09, Gleb Natapov wrote:
 On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
 In commit e935b8372cf8 (KVM: Convert kvm_lock to raw_spinlock),
 I am copying Jan, the author of the patch. Commit message says:
 Code under this lock requires non-preemptibility, but which code
 exactly is this? Is this still true?
 
 hardware_enable_nolock/hardware_disable_nolock does.

IIRC, also the loop in kvmclock_cpufreq_notifier needs it because it
reads the processor ID of the caller. That implies the caller cannot be
preempted, but these days a migration lock should be fine as well.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH-next] kvm: don't try to take mmu_lock while holding the main raw kvm_lock

2013-06-27 Thread Gleb Natapov
On Thu, Jun 27, 2013 at 02:16:07PM +0200, Jan Kiszka wrote:
 On 2013-06-27 13:38, Paolo Bonzini wrote:
  On 27/06/2013 13:09, Gleb Natapov wrote:
  On Tue, Jun 25, 2013 at 06:34:03PM -0400, Paul Gortmaker wrote:
  In commit e935b8372cf8 (KVM: Convert kvm_lock to raw_spinlock),
  I am copying Jan, the author of the patch. Commit message says:
  Code under this lock requires non-preemptibility, but which code
  exactly is this? Is this still true?
  
  hardware_enable_nolock/hardware_disable_nolock does.
 
 IIRC, also the loop in kvmclock_cpufreq_notifier needs it because it
 reads the processor ID of the caller. That implies the caller cannot be
 preempted, but theses days a migration lock should be fine as well.
 
OK, adding Marcelo to the party. This code is called from a cpufreq
notifier. I would expect it to be called from a context that
prevents migration to another CPU.

--
Gleb.


Re: [PATCH-next] kvm: don't try to take mmu_lock while holding the main raw kvm_lock

2013-06-27 Thread Paolo Bonzini
On 27/06/2013 14:32, Gleb Natapov wrote:
   In commit e935b8372cf8 (KVM: Convert kvm_lock to raw_spinlock),
   I am copying Jan, the author of the patch. Commit message says:
   Code under this lock requires non-preemptibility, but which code
   exactly is this? Is this still true?
   
   hardware_enable_nolock/hardware_disable_nolock does.
  
  IIRC, also the loop in kvmclock_cpufreq_notifier needs it because it
  reads the processor ID of the caller. That implies the caller cannot be
  preempted, but these days a migration lock should be fine as well.
  
 OK, adding Marcelo to the party. This code is called from cpufreq
 notifier. I would expect that it will be called from the context that
 prevents migration to another cpu.

No, the CPU is in freq->cpu and may not even be the CPU that changed
frequency.

But even then I'm not sure the loop needs to be non-preemptible.  If it
were, the smp_call_function_single just before/after the loop would have
to be non-preemptible as well.  So it is just an optimization and it can
use raw_smp_processor_id() instead.

Paolo
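
Under that reading, the loop body could be relaxed along these lines (a
sketch; the surrounding kvmclock_cpufreq_notifier code is abridged):

    kvm_for_each_vcpu(i, vcpu, kvm) {
    	if (vcpu->cpu != freq->cpu)
    		continue;
    	kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
    	/* only an IPI optimization, so the non-debug
    	 * raw_smp_processor_id() would do even with preemption on */
    	if (vcpu->cpu != raw_smp_processor_id())
    		send_ipi = 1;
    }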


Re: [PATCH-next] kvm: don't try to take mmu_lock while holding the main raw kvm_lock

2013-06-27 Thread Paolo Bonzini
On 27/06/2013 15:00, Paolo Bonzini wrote:
 On 27/06/2013 14:32, Gleb Natapov wrote:
 In commit e935b8372cf8 (KVM: Convert kvm_lock to raw_spinlock),
 I am copying Jan, the author of the patch. Commit message says:
 Code under this lock requires non-preemptibility, but which code
 exactly is this? Is this still true?

 hardware_enable_nolock/hardware_disable_nolock does.

 IIRC, also the loop in kvmclock_cpufreq_notifier needs it because it
 reads the processor ID of the caller. That implies the caller cannot be
  preempted, but these days a migration lock should be fine as well.

 OK, adding Marcelo to the party. This code is called from cpufreq
 notifier. I would expect that it will be called from the context that
 prevents migration to another cpu.
 
 No, the CPU is in freq->cpu and may not even be the CPU that changed
 frequency.

Try again: No, the CPU is in freq->cpu and smp_processor_id() may not
even be the CPU that changed frequency.  It probably makes more sense now.

Paolo

 But even then I'm not sure the loop needs to be non-preemptible.  If it
 were, the smp_call_function_single just before/after the loop would have
 to be non-preemptable as well.  So it is just an optimization and it can
 use raw_smp_processor_id() instead.
 
 Paolo



Re: [PATCH 1/3] KVM: VMX: Use proper types to access const arrays

2013-06-27 Thread Paolo Bonzini
On 26/06/2013 20:36, Mathias Krause wrote:
 Use a const pointer type instead of casting away the const qualifier
 from const arrays. Keep the pointer array on the stack, nonetheless.
 Making it static just increases the object size.
 
 Signed-off-by: Mathias Krause mini...@googlemail.com
 ---
  arch/x86/kvm/vmx.c |   15 +++
  1 file changed, 7 insertions(+), 8 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 260a919..7393164 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -5956,8 +5956,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
   unsigned long field;
   u64 field_value;
   struct vmcs *shadow_vmcs = vmx-nested.current_shadow_vmcs;
 - unsigned long *fields = (unsigned long *)shadow_read_write_fields;
 - int num_fields = max_shadow_read_write_fields;
 + const unsigned long *fields = shadow_read_write_fields;
 + const int num_fields = max_shadow_read_write_fields;
  
   vmcs_load(shadow_vmcs);
  
 @@ -5986,12 +5986,11 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx 
 *vmx)
  
  static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
  {
 - unsigned long *fields[] = {
 - (unsigned long *)shadow_read_write_fields,
 - (unsigned long *)shadow_read_only_fields
 + const unsigned long *fields[] = {
 + shadow_read_write_fields,
 + shadow_read_only_fields
   };
 - int num_lists =  ARRAY_SIZE(fields);
 - int max_fields[] = {
 + const int max_fields[] = {
   max_shadow_read_write_fields,
   max_shadow_read_only_fields
   };
 @@ -6002,7 +6001,7 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
  
   vmcs_load(shadow_vmcs);
  
 - for (q = 0; q < num_lists; q++) {
 + for (q = 0; q < ARRAY_SIZE(fields); q++) {
   for (i = 0; i < max_fields[q]; i++) {
   field = fields[q][i];
   vmcs12_read_any(&vmx->vcpu, field, &field_value);
 

The const int is not particularly useful, but doesn't hurt either.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com


Re: [PATCH 2/3] KVM: VMX: Use size_t to store sizeof() values

2013-06-27 Thread Paolo Bonzini
On 26/06/2013 20:36, Mathias Krause wrote:
 The type for storing values of the sizeof operator should be size_t.
 No semantic changes, only type correctness.
 
 Signed-off-by: Mathias Krause mini...@googlemail.com
 ---
  arch/x86/kvm/vmx.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 7393164..cd9090f 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -3909,7 +3909,7 @@ static void free_vpid(struct vcpu_vmx *vmx)
  static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
   u32 msr, int type)
  {
 - int f = sizeof(unsigned long);
 + const size_t f = sizeof(unsigned long);
  
   if (!cpu_has_vmx_msr_bitmap())
   return;
 @@ -3944,7 +3944,7 @@ static void __vmx_disable_intercept_for_msr(unsigned 
 long *msr_bitmap,
  static void __vmx_enable_intercept_for_msr(unsigned long *msr_bitmap,
   u32 msr, int type)
  {
 - int f = sizeof(unsigned long);
 + const size_t f = sizeof(unsigned long);
  
   if (!cpu_has_vmx_msr_bitmap())
   return;
 

Both the const and the change seem like useless churn.  It is only
used to adjust a pointer by a given number of bytes.

Paolo


Re: [PATCH 3/3] KVM: x86: Drop useless cast

2013-06-27 Thread Paolo Bonzini
On 26/06/2013 20:36, Mathias Krause wrote:
 Void pointers don't need casting, drop it.
 
 Signed-off-by: Mathias Krause mini...@googlemail.com
 ---
  arch/x86/kvm/x86.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index e8ba99c..472350c 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -5300,7 +5300,7 @@ static struct notifier_block pvclock_gtod_notifier = {
  int kvm_arch_init(void *opaque)
  {
   int r;
 - struct kvm_x86_ops *ops = (struct kvm_x86_ops *)opaque;
 + struct kvm_x86_ops *ops = opaque;
  
   if (kvm_x86_ops) {
   printk(KERN_ERR "kvm: already loaded the other module\n");
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com


Re: [PATCH 1/3] KVM: VMX: Use proper types to access const arrays

2013-06-27 Thread Mathias Krause
On 27 June 2013 15:33, Paolo Bonzini pbonz...@redhat.com wrote:
 On 26/06/2013 20:36, Mathias Krause wrote:
 Use a const pointer type instead of casting away the const qualifier
 from const arrays. Keep the pointer array on the stack, nonetheless.
 Making it static just increases the object size.

 Signed-off-by: Mathias Krause mini...@googlemail.com
 ---
  arch/x86/kvm/vmx.c |   15 +++
  1 file changed, 7 insertions(+), 8 deletions(-)

 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 260a919..7393164 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -5956,8 +5956,8 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
   unsigned long field;
   u64 field_value;
   struct vmcs *shadow_vmcs = vmx-nested.current_shadow_vmcs;
 - unsigned long *fields = (unsigned long *)shadow_read_write_fields;
 - int num_fields = max_shadow_read_write_fields;
 + const unsigned long *fields = shadow_read_write_fields;
 + const int num_fields = max_shadow_read_write_fields;

   vmcs_load(shadow_vmcs);

 @@ -5986,12 +5986,11 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx 
 *vmx)

  static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
  {
 - unsigned long *fields[] = {
 - (unsigned long *)shadow_read_write_fields,
 - (unsigned long *)shadow_read_only_fields
 + const unsigned long *fields[] = {
 + shadow_read_write_fields,
 + shadow_read_only_fields
   };
 - int num_lists =  ARRAY_SIZE(fields);
 - int max_fields[] = {
 + const int max_fields[] = {
   max_shadow_read_write_fields,
   max_shadow_read_only_fields
   };
 @@ -6002,7 +6001,7 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)

   vmcs_load(shadow_vmcs);

 - for (q = 0; q < num_lists; q++) {
 + for (q = 0; q < ARRAY_SIZE(fields); q++) {
   for (i = 0; i < max_fields[q]; i++) {
   field = fields[q][i];
   vmcs12_read_any(&vmx->vcpu, field, &field_value);


 The const int is not particularly useful, but doesn't hurt either.

It's more of a hint for the compiler to take the values verbatim
instead of allocating stack space for them. But it'll probably already
do it even without that hint.


 Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Mathias


Re: [Qemu-devel] x86-64 apic panic on shutdown on 1.4.93.

2013-06-27 Thread Luiz Capitulino
On Wed, 26 Jun 2013 00:52:31 -0500
Rob Landley r...@landley.net wrote:

 I intermittently get this from current kernels running under currentish  
 qemu-git. Look familiar to anybody?

Which kernel do you run in the host? Is the guest doing anything
special?

 
 reboot: machine restart
 general protection fault: fff2 [#1]
 CPU: 0 PID: 44 Comm: oneit Not tainted 3.10.0-rc7+ #3
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 task: 8800068fd500 ti: 880006a26000 task.ti: 880006a26000
 RIP: 0010:[81014441]  [81014441]  
 lapic_shutdown+0x32/0x34
 RSP: 0018:880006a27e28  EFLAGS: 0202
 RAX: 2193fbf9078bfbf9 RBX: 0202 RCX: 
 RDX: 81015f71 RSI: 00ff RDI: 00f0
 RBP: fee1dead R08: 0400 R09: 
 R10:  R11: 88000699f500 R12: 
 R13: 01234567 R14: 0004 R15: 00423872
 FS:  () GS:81308000()  
 knlGS:
 CS:  0010 DS:  ES:  CR0: 8005003b
 CR2: 00657ad0 CR3: 0697c000 CR4: 06b0
 DR0:  DR1:  DR2: 
 DR3:  DR6:  DR7: 
 Stack:
   28121969 81013bda 03f9 81013dc5
    8102ad4b  
      
 Call Trace:
   [81013bda] ? native_machine_shutdown+0x6/0x1a
   [81013dc5] ? native_machine_restart+0x1d/0x31
   [8102ad4b] ? SyS_reboot+0x126/0x15b
   [810374bc] ? schedule_tail+0x1e/0x44
   [8122f57f] ? ret_from_fork+0xf/0xb0
   [8122f690] ? system_call_fastpath+0x16/0x1b
 Code: 53 f6 c4 02 75 1b 31 c0 83 3d af 42 50 00 00 74 0c 31 c0 83 3d b4  
 42 50 00 00 0f 94 c0 85 c0 74 0a 9c 5b fa e8 88 ff ff ff 53 9d 5b c3  
 50 e8 11 ec 00 00 e8 d6 2f ff ff 48 8b 05 43 4b 32 00 bf
 RIP  [81014441] lapic_shutdown+0x32/0x34
   RSP 880006a27e28
 ---[ end trace dd3c376274d1a087 ]---



Re: [PATCH 1/3] KVM: VMX: Use proper types to access const arrays

2013-06-27 Thread Paolo Bonzini
On 27/06/2013 15:42, Mathias Krause wrote:
  The const int is not particularly useful, but doesn't hurt either.
 
 It's more of a hint for the compiler to take the values verbatim
 instead of allocating stack space for them. But it'll probably already
 do it even without that hint.

It won't change anything really.  Maybe for const int foo[], but I'm
not even sure about that and it depends a lot on the circumstances (loop
unrolling, inlining, whether you ever store foo in a variable or pass
it as a parameter, ...).  The compiler may leave it on the stack anyway.

Paolo


Re: [RFC PATCH] uio: uio_pci_generic: Add support for MSI interrupts

2013-06-27 Thread Guenter Roeck
On Thu, Jun 27, 2013 at 10:45:01AM +0300, Michael S. Tsirkin wrote:
 On Wed, Jun 26, 2013 at 03:30:23PM -0700, Guenter Roeck wrote:
  Enable support for MSI interrupts if the device supports it.
  Since MSI interrupts are edge triggered, it is no longer necessary to
  disable interrupts in the kernel and re-enable them from user-space.
  Instead, clearing the interrupt condition in the user space application
  automatically re-enables the interrupt.
  
  Signed-off-by: Guenter Roeck li...@roeck-us.net
  ---
 An open question is whether we can just do this unconditionally
 or whether there should be some flag to enable it. A module parameter, maybe?
 
 NACK
 
 UIO is for devices that don't do memory writes.
 Anything that can do writes must be protected by an IOMMU
 and/or have a secure kernel driver, not a UIO stub.
 
 MSI is done by memory writes, so if userspace
 controls the device it can trick it into writing
 anywhere in memory.
 
Interesting. Thanks for letting me know.

Guenter


Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-27 Thread Stefan Pietsch
On 26.06.2013 12:47, Gleb Natapov wrote:
 On Mon, Jun 24, 2013 at 10:42:57PM +0200, Stefan Pietsch wrote:
 On 24.06.2013 14:30, Gleb Natapov wrote:
 On Mon, Jun 24, 2013 at 01:59:34PM +0200, Stefan Pietsch wrote:
 As soon as I remove kvmvapic.bin the virtual machine boots with
 qemu-kvm 1.5.0. I just verified this with Linux kernel 3.10.0-rc5.
 emulate_invalid_guest_state=0 or emulate_invalid_guest_state=1 make
 no difference.

 Please send your patches.
 Here it is, run with it and kvmvapic.bin present. See what is printed in
 dmesg after the failure.


 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index f4a5b3f..65488a4 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -3385,6 +3385,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
  {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 ar;
 +	unsigned long rip;
  
 	if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) {
 		*var = vmx->rmode.segs[seg];
 @@ -3408,6 +3409,9 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
 	var->db = (ar >> 14) & 1;
 	var->g = (ar >> 15) & 1;
 	var->unusable = (ar >> 16) & 1;
 +	rip = kvm_rip_read(vcpu);
 +	if ((rip == 0xc101611c || rip == 0xc101611a) && seg == VCPU_SREG_FS)
 +		printk("base=%p limit=%p selector=%x ar=%x\n", var->base,
 		       var->limit, var->selector, ar);
  }
  
  static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)


 Booting kernel Linux 3.10-rc5 with your patch applied produces these
 messages in dmesg when starting a virtual machine:

 emulate_invalid_guest_state=0
 [  118.732151] base= limit=  (null) selector=ffff ar=0
 [  118.732341] base= limit=  (null) selector=ffff ar=0

 I've butchered printk format, but it gives me the idea of what is going
 on anyway. Can you try the patch below with
 emulate_invalid_guest_state=0|1?
 
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index f4a5b3f..eb062ce 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -3395,19 +3395,20 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
 		var->selector = vmx_read_guest_seg_selector(vmx, seg);
 		return;
 	}
 +
 	var->base = vmx_read_guest_seg_base(vmx, seg);
 	var->limit = vmx_read_guest_seg_limit(vmx, seg);
 	var->selector = vmx_read_guest_seg_selector(vmx, seg);
 	ar = vmx_read_guest_seg_ar(vmx, seg);
 +	var->unusable = (ar >> 16) & 1;
 	var->type = ar & 15;
 	var->s = (ar >> 4) & 1;
 	var->dpl = (ar >> 5) & 3;
 -	var->present = (ar >> 7) & 1;
 +	var->present = !var->unusable;
 	var->avl = (ar >> 12) & 1;
 	var->l = (ar >> 13) & 1;
 	var->db = (ar >> 14) & 1;
 	var->g = (ar >> 15) & 1;
 -	var->unusable = (ar >> 16) & 1;
  }
  
  static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)


Kernel 3.10-rc5 with your latest patch applied can successfully boot the
virtual machine with emulate_invalid_guest_state 0 or 1.


Re: [patch] vfio/type1: fix a leak on error path

2013-06-27 Thread Alex Williamson
On Thu, 2013-06-27 at 10:07 +0300, Dan Carpenter wrote:
 If vfio_unmap_unpin() returns an error then we leak split.  I've moved
 the allocation later in the function to fix this.

Thanks for spotting this and for the patch.  The allocation is done
early because if we get an allocation failure later, we lose track of
iommu mappings, which leads to other leaks.  I think it's best to fix
the exit path, but we can improve the comment and fix an inconsistent
return value at the same time.  Thanks,

Alex


vfio/type1: Fix leak on error path

We also don't handle unpinning zero pages as an error on other exits,
so we can fix that inconsistency by rolling in the next conditional
return.

Reported-by: Dan Carpenter dan.carpen...@oracle.com
Signed-off-by: Alex Williamson alex.william...@redhat.com

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 98231d1..a9807de 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -436,6 +436,12 @@ static int vfio_remove_dma_overlap(struct vfio_iommu *iommu, dma_addr_t start,
 	}
 
 	/* Split existing */
+
+	/*
+	 * Allocate our tracking structure early even though it may not
+	 * be used.  An allocation failure later loses track of pages and
+	 * is more difficult to unwind.
+	 */
 	split = kzalloc(sizeof(*split), GFP_KERNEL);
 	if (!split)
 		return -ENOMEM;
@@ -443,12 +449,9 @@ static int vfio_remove_dma_overlap(struct vfio_iommu *iommu, dma_addr_t start,
 	offset = start - dma->iova;
 
 	ret = vfio_unmap_unpin(iommu, dma, start, size);
-	if (ret)
-		return ret;
-
-	if (!*size) {
+	if (ret || !*size) {
 		kfree(split);
-		return -EINVAL;
+		return ret;
 	}
 
 	tmp = dma->size;




Re: [PATCH v2] vfio: add external user support

2013-06-27 Thread Alex Williamson
On Thu, 2013-06-27 at 17:14 +1000, Alexey Kardashevskiy wrote:
 VFIO is designed to be used via ioctls on file descriptors
 returned by VFIO.
 
 However in some situations support for an external user is required.
 The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to
 use the existing VFIO groups for exclusive access in real/virtual mode
 in the host kernel to avoid passing map/unmap requests to the user
 space, which would make things pretty slow.
 
 The proposed protocol includes:
 
 1. do normal VFIO init stuff such as opening a new container, attaching
 group(s) to it, setting an IOMMU driver for a container. When IOMMU is
 set for a container, all groups in it are considered ready to use by
 an external user.
 
 2. pass a fd of the group we want to accelerate to KVM. KVM calls
 vfio_group_iommu_id_from_file() to verify if the group is initialized
 and IOMMU is set for it. The current TCE IOMMU driver marks the whole
 IOMMU table as busy when IOMMU is set for a container, which prevents
 other DMA users from allocating from it, so it is safe to pass the group
 to the user space.
 
 3. KVM increases the container users counter via
 vfio_group_add_external_user(). This prevents the VFIO group from
 being disposed prior to exiting KVM.
 
 4. When KVM is finished and doing cleanup, it releases the group file
 and decrements the container users counter. Everything gets released.
 
 5. KVM also keeps the group file as otherwise its fd might have been
 closed at the moment of KVM finish so vfio_group_del_external_user()
 call will not be possible.

This is the wrong order in my mind.  An external user has no business
checking or maintaining any state of a group until it calls
add_external_user().  Only after that call is successful can the user
assume the filep to group relationship is static and get the iommu_id.
Any use of the external user API should start with add and end with
del.
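
As a sketch, the calling pattern being asked for is the following (editor's illustration; vfio_group_external_user_iommu_id() is the renamed helper proposed further down, not an existing export):

/* Hypothetical external user of the VFIO group API (sketch only). */
static int external_use_of_group(struct file *filep)
{
	int iommu_id, ret;

	/* Start: pin the group state; assume nothing before this. */
	ret = vfio_group_add_external_user(filep);
	if (ret)
		return ret;

	/* Only now is the filep-to-group relationship static. */
	iommu_id = vfio_group_external_user_iommu_id(filep);
	if (iommu_id < 0) {
		vfio_group_del_external_user(filep);
		return iommu_id;
	}

	/* ... use the group via iommu_id ... */

	/* End: release the group. */
	vfio_group_del_external_user(filep);
	return 0;
}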

 The "vfio: Limit group opens" patch is also required for consistency.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 
 v1-v2: added definitions to vfio.h :)
 Should not compile but compiled. Hm.
 
 ---
  drivers/vfio/vfio.c  |   54 
 ++
  include/linux/vfio.h |7 +++
  2 files changed, 61 insertions(+)
 
 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index c488da5..40875d2 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -1370,6 +1370,60 @@ static const struct file_operations vfio_device_fops = {
  };
  
  /**
 + * External user API, exported by symbols to be linked dynamically.
 + */
 +
 +/* Allows an external user (for example, KVM) to lock an IOMMU group */
 +int vfio_group_add_external_user(struct file *filep)
 +{
 +	struct vfio_group *group = filep->private_data;
 +
 +	if (filep->f_op != &vfio_group_fops)
 +		return -EINVAL;
 +
 +	if (!atomic_inc_not_zero(&group->container_users))
 +		return -EINVAL;

This is the place where I was suggesting we need tests to match
get_device_fd.  It's not clear what the external user is holding if the
group has no iommu or is not viable here.


if (!group->container->iommu_driver || !vfio_group_viable(group)) {
	vfio_group_try_dissolve_container(group);
	return -EINVAL;
}

 +
 + return 0;
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_add_external_user);
 +
 +/* Allows an external user (for example, KVM) to unlock an IOMMU group */
 +void vfio_group_del_external_user(struct file *filep)
 +{
 +	struct vfio_group *group = filep->private_data;
 +
 +	if (WARN_ON(filep->f_op != &vfio_group_fops))
 +		return;

How about we make this return int so we can return 0/-EINVAL and the
caller can decide the severity of the response?

 +
 + vfio_group_try_dissolve_container(group);
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_del_external_user);
 +
 +/*
 + * Checks if a group for the specified file can be used by
 + * an external user and returns the IOMMU ID if external use is possible.
 + */
 +int vfio_group_iommu_id_from_file(struct file *filep)

Let's name this in a way that makes it clear that it's part of the
external_user API.  vfio_group_external_user_iommu_id?

 +{
 + int ret;
 +	struct vfio_group *group = filep->private_data;
 +
 +	if (WARN_ON(filep->f_op != &vfio_group_fops))
 +		return -EINVAL;

This one probably doesn't deserve a WARN_ON either, let the caller
blowup if it wants.

 +
 +	if (0 == atomic_read(&group->container_users) ||
 +			!group->container->iommu_driver ||
 +			!vfio_group_viable(group))
 +		return -EINVAL;

The above test just becomes a weak test that the caller is  correctly
using the API since we should be enforcing these tests when the external
user is added.  It doesn't hurt to leave them, but it's not very
convincing that the caller is the one holding anything.

 +	ret = iommu_group_id(group->iommu_group);

The 'ret' variable isn't needed.
 

Re: [Qemu-devel] [PATCH qom-cpu v2 20/29] kvm: Change kvm_remove_all_breakpoints() argument to CPUState

2013-06-27 Thread Andreas Färber
On 17.06.2013 18:17, Paolo Bonzini wrote:
 On 16/06/2013 17:57, Andreas Färber wrote:
 Signed-off-by: Andreas Färber afaer...@suse.de
 ---
  gdbstub.c| 2 +-
  include/sysemu/kvm.h | 2 +-
  kvm-all.c| 5 ++---
  kvm-stub.c   | 2 +-
  4 files changed, 5 insertions(+), 6 deletions(-)

 diff --git a/gdbstub.c b/gdbstub.c
 index 3101a43..9e7f7a1 100644
 --- a/gdbstub.c
 +++ b/gdbstub.c
 @@ -2019,7 +2019,7 @@ static void gdb_breakpoint_remove_all(void)
  CPUArchState *env;
  
  if (kvm_enabled()) {
 -        kvm_remove_all_breakpoints(gdbserver_state->c_cpu);
 +        kvm_remove_all_breakpoints(ENV_GET_CPU(gdbserver_state->c_cpu));
 
 Planning to make gdbserver_state take a CPUState, too?

Yes, I'm still working on that: The qom-cpu-11 series is already about
as large as this one and still has only two out of three CPUArchState
fields converted. ;)

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: [RFC PATCH] uio: uio_pci_generic: Add support for MSI interrupts

2013-06-27 Thread Guenter Roeck
On Thu, Jun 27, 2013 at 10:45:01AM +0300, Michael S. Tsirkin wrote:
 On Wed, Jun 26, 2013 at 03:30:23PM -0700, Guenter Roeck wrote:
  Enable support for MSI interrupts if the device supports it.
  Since MSI interrupts are edge triggered, it is no longer necessary to
  disable interrupts in the kernel and re-enable them from user-space.
  Instead, clearing the interrupt condition in the user space application
  automatically re-enables the interrupt.
  
  Signed-off-by: Guenter Roeck li...@roeck-us.net
  ---
  An open question is whether we can just do this unconditionally
  or whether there should be some flag to enable it. A module parameter, maybe?
 
 NACK
 
 UIO is for devices that don't do memory writes.
 Anything that can do writes must be protected by an IOMMU
 and/or have a secure kernel driver, not a UIO stub.
 
  MSI is done by memory writes, so if userspace
  controls the device it can trick it into writing
  anywhere in memory.
 
Just out of curiosity: since MSI support is mandatory for all PCIe devices,
isn't that possible anyway, even if MSI is not enabled by the kernel?
All one would need to do is to enable MSI from user space; after all,
the chip configuration space is writable.
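
For illustration, a root process can already flip the MSI enable bit through the sysfs config-space file, with no driver involved (editor's sketch; the device path is made up, error handling is minimal, and a little-endian host is assumed):

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/bus/pci/devices/0000:00:03.0/config";
	uint8_t pos, hdr[2];
	uint16_t ctrl;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return 1;

	/* Walk the capability list (head pointer at offset 0x34). */
	pread(fd, &pos, 1, 0x34);
	pos &= ~3;
	while (pos) {
		pread(fd, hdr, 2, pos);	/* hdr[0] = cap ID, hdr[1] = next */
		if (hdr[0] == 0x05)	/* PCI_CAP_ID_MSI */
			break;
		pos = hdr[1] & ~3;
	}
	if (!pos)
		return 1;

	/* Message control is at offset 2 in the capability; bit 0 is
	 * MSI enable.  A real attacker would also program the message
	 * address/data registers at pos + 4 first. */
	pread(fd, &ctrl, 2, pos + 2);
	ctrl |= 1;
	pwrite(fd, &ctrl, 2, pos + 2);
	close(fd);
	return 0;
}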

Thanks,
Guenter


Re: [PATCH 8/8] KVM: PPC: Add hugepage support for IOMMU in-kernel handling

2013-06-27 Thread Scott Wood

On 06/27/2013 12:02:36 AM, Alexey Kardashevskiy wrote:

+/*
+ * The KVM guest can be backed with 16MB pages.
+ * In this case, we cannot do page counting from the real mode
+ * as the compound pages are used - they are linked in a list
+ * with pointers as virtual addresses which are inaccessible
+ * in real mode.
+ *
+ * The code below keeps a 16MB pages list and uses page struct
+ * in real mode if it is already locked in RAM and inserted into
+ * the list or switches to the virtual mode where it can be
+ * handled in a usual manner.
+ */
+#define KVMPPC_HUGEPAGE_HASH(gpa)	hash_32(gpa >> 24, 32)
+
+struct kvmppc_iommu_hugepage {
+   struct hlist_node hash_node;
+   unsigned long gpa;  /* Guest physical address */
+   unsigned long hpa;  /* Host physical address */
+	struct page *page;	/* page struct of the very first subpage */
+	unsigned long size;	/* Huge page size (always 16MB at the moment) */
+};


Shouldn't this be namespaced to something like book3s or spapr?

-Scott


Re: [RFC 0/5] Introduce VM Sockets virtio transport

2013-06-27 Thread Sasha Levin

Hi Asias,

Looks nice! Some comments inline below (I've removed anything that mst already
commented on).

On 06/27/2013 03:59 AM, Asias He wrote:

Hello guys,

In commit d021c344051af91 (VSOCK: Introduce VM Sockets), VMware added VM
Sockets support. VM Sockets allows communication between virtual
machines and the hypervisor. VM Sockets is able to use different
hypervisor-neutral transports to transfer data. Currently, only the VMware
VMCI transport is supported.

This series introduces virtio transport for VM Sockets.

Any comments are appreciated! Thanks!

Code:
=
1) kernel bits
git://github.com/asias/linux.git vsock

2) userspace bits:
git://github.com/asias/linux-kvm.git vsock

Howto:
=
Make sure you have these kernel options:

   CONFIG_VSOCKETS=y
   CONFIG_VIRTIO_VSOCKETS=y
   CONFIG_VIRTIO_VSOCKETS_COMMON=y
   CONFIG_VHOST_VSOCK=m

$ git clone git://github.com/asias/linux-kvm.git
$ cd linux-kvm/tools/kvm
$ git checkout -b vsock origin/vsock
$ make
$ modprobe vhost_vsock
$ ./lkvm run -d os.img -k bzImage --vsock guest_cid

Test:
=
I hacked busybox's http server and wget to run over vsock. Start http
server in host and guest, download a 512MB file in guest and host
simultaneously 6000 times. Managed to run the http stress test.

Also, I wrote a small libvsock.so to play the LD_PRELOAD trick and
managed to make sshd and ssh work over virtio-vsock without modifying
the source code.
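
Such LD_PRELOAD shims typically interpose socket() so AF_INET requests become AF_VSOCK ones; a minimal sketch of the idea (editor's illustration, not Asias's actual libvsock.so, which must also rewrite the sockaddr in connect()/bind()):

/* shim.c: gcc -shared -fPIC -o libvsockshim.so shim.c -ldl */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/socket.h>

#ifndef AF_VSOCK
#define AF_VSOCK 40	/* from linux/socket.h on vsock-enabled kernels */
#endif

int socket(int domain, int type, int protocol)
{
	static int (*real_socket)(int, int, int);

	if (!real_socket)
		real_socket = (int (*)(int, int, int))dlsym(RTLD_NEXT, "socket");

	/* Redirect TCP sockets to vsock stream sockets. */
	if (domain == AF_INET && type == SOCK_STREAM)
		return real_socket(AF_VSOCK, SOCK_STREAM, 0);

	return real_socket(domain, type, protocol);
}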


Why did it require hacking in the first place? Does running with kvmtool
and just doing regular networking over virtio-net running on top of vsock
achieve the same goal?


Draft VM Sockets Virtio Device spec:
=
Appendix K: VM Sockets Device

The virtio VM sockets device is a virtio transport device for VM Sockets. VM
Sockets allows communication between virtual machines and the hypervisor.

Configuration:

Subsystem Device ID 13

Virtqueues:
 0:controlq; 1:receiveq0; 2:transmitq0 ... 2N+1:receiveqN; 2N+2:transmitqN


controlq is defined but not used; is there something in mind for it? If not,
does it make sense keeping it here? We can always re-add it to the end just
like in virtio-net.


Feature bits:
 Currently, no feature bits are defined.

Device configuration layout:

Two configuration fields are currently defined.

struct virtio_vsock_config {
__u32 guest_cid;
__u32 max_virtqueue_pairs;
} __packed;

The guest_cid field specifies the guest context id, which is like the guest IP
address. The max_virtqueue_pairs field specifies the maximum number of receive
and transmit virtqueue pairs (receiveq0 ...  receiveqN and transmitq0 ...
transmitqN respectively; N = max_virtqueue_pairs - 1 ) that can be configured.
The driver is free to use only one virtqueue pair, or it can use more to
achieve better performance.


How does the driver tell the device how many vqs it's planning on actually
using? Or is it assumed that all of them are in use?



Device Initialization:
The initialization routine should discover the device's virtqueues.

Device Operation:
Packets are transmitted by placing them in the transmitq0..transmitqN, and
buffers for incoming packets are placed in the receiveq0..receiveqN. In each
case, the packet itself is preceded by a header:

struct virtio_vsock_hdr {
__u32   src_cid;
__u32   src_port;
__u32   dst_cid;
__u32   dst_port;
__u32   len;
__u8type;
__u8op;
__u8shut;
__u64   fwd_cnt;
__u64   buf_alloc;
} __packed;

src_cid and dst_cid: specify the source and destination context id.
src_port and dst_port: specify the source and destination port.
len: specifies the size of the data payload, it could be zero if no data
payload is transferred.
type: specifies the type of the packet, it can be SOCK_STREAM or SOCK_DGRAM.
op: specifies the operation of the packet, it is defined as follows.

enum {
VIRTIO_VSOCK_OP_INVALID = 0,
VIRTIO_VSOCK_OP_REQUEST = 1,
VIRTIO_VSOCK_OP_NEGOTIATE = 2,
VIRTIO_VSOCK_OP_OFFER = 3,
VIRTIO_VSOCK_OP_ATTACH = 4,
VIRTIO_VSOCK_OP_RW = 5,
VIRTIO_VSOCK_OP_CREDIT = 6,
VIRTIO_VSOCK_OP_RST = 7,
VIRTIO_VSOCK_OP_SHUTDOWN = 8,
};

shut: specifies the shutdown mode when the socket is being shut down. 1 is for
receive shutdown, 2 is for transmit shutdown, 3 is for both receive and transmit
shutdown.
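
The modes combine like a two-bit flag set; the draft assigns only values, not names, so the identifiers below are hypothetical:

enum {
	VIRTIO_VSOCK_SHUT_RCV  = 1,	/* no more data will be received */
	VIRTIO_VSOCK_SHUT_SEND = 2,	/* no more data will be sent */
	VIRTIO_VSOCK_SHUT_BOTH = 3,	/* RCV | SEND */
};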
fwd_cnt: specifies the number of bytes the receiver has forwarded to 
userspace.


For the previous packet? For the entire session? Reading ahead makes it clear,
but it's worth mentioning the context here just to make it easy for implementers.


buf_alloc: specifies the size of the receiver's receive buffer in bytes.


Virtio VM socket connection creation:
1) Client 

Re: Network performance data

2013-06-27 Thread Brian Jackson

On Thursday, June 27, 2013 1:09:37 AM CDT, Bill Rich wrote:

Hello All,

I've run into a problem with getting network performance data on
Windows VMs running on KVM. When I check the network data in the
Windows task manager on the VM, it remains at zero, even if large
amounts of data are being transferred. This has been tested on Windows
Server 2008r2 using the standard Windows driver and the e1000 nic. I
searched the web and the bug reports specifically, but I didn't find
this issue mentioned. Is this expected behavior, or is there something
I can do to fix it?


Personally, I'd try a newer version of Qemu. There have been lots of fixes 
since 0.12 was released (almost 4 years ago). Barring that, you might want to 
seek support from your distribution.





Below is the info on the hypervisor the VM is running on:

OS: CentOS release 6.4
Kernel: 2.6.32-358.11.1.el6.x86_64
qemu-kvm: 0.12.1.2-3.209.el6.4.x86_64



P.S. In the future, it's sufficient to send the command line options used 
instead of the XML config file from libvirt.


Re: Migration route from Parallels on Mac for Windows images?

2013-06-27 Thread Brian Jackson

On Wednesday, June 26, 2013 8:25:54 PM CDT, Ken Roberts wrote:
Sorry for the user query but I'm not finding expertise on the 
Linux mailing lists I belong to.  The web site says one-off user 
questions are OK.


I have a few VM images on Parallels 8 for Mac. I want them to 
be on KVM/Linux.


Some of the images are Linux, but the critical ones are a few 
types of Windows.  I don't want to trash my licenses.


I noticed that kvm-img has a parallels format option, and it 
seems to work while the conversion is going on.  I've tried 
kvm-img to convert to qcow2 and to raw; in both cases the image 
converts but the disk is not bootable.  The only file the 
kvm-img doesn't immediately fail on is the one that contains the 
data.



More details on not bootable would be nice. Do you get a blue screen? Seabios 
screen? You may need to prep the image before you convert it (google mergeide).





The best answer to my problem is to find out how to make the disk bootable.

The next best answer is to find out if there is a reliable 
migration path, even if it means going to VMware first.


Also, if VMware is a necessary intermediate point, it would 
help to know which VMware format to use for best results.


I'm not a KVM expert, I've made some VMs on LVM and installed 
Linux on them with bridged networking, that's about the extent 
of it.  For the record that was insanely simple.


Thanks.





Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-27 Thread Gleb Natapov
On Thu, Jun 27, 2013 at 04:09:50PM +0200, Stefan Pietsch wrote:
 On 26.06.2013 12:47, Gleb Natapov wrote:
  On Mon, Jun 24, 2013 at 10:42:57PM +0200, Stefan Pietsch wrote:
  On 24.06.2013 14:30, Gleb Natapov wrote:
  On Mon, Jun 24, 2013 at 01:59:34PM +0200, Stefan Pietsch wrote:
  As soon as I remove kvmvapic.bin the virtual machine boots with
  qemu-kvm 1.5.0. I just verified this with Linux kernel 3.10.0-rc5.
  emulate_invalid_guest_state=0 or emulate_invalid_guest_state=1 make
  no difference.
 
  Please send your patches.
  Here it is, run with it and kvmvapic.bin present. See what is printed in
  dmesg after the failure.
 
 
  diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
  index f4a5b3f..65488a4 100644
  --- a/arch/x86/kvm/vmx.c
  +++ b/arch/x86/kvm/vmx.c
  @@ -3385,6 +3385,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
   {
  	struct vcpu_vmx *vmx = to_vmx(vcpu);
  	u32 ar;
  +	unsigned long rip;
   
  	if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) {
  		*var = vmx->rmode.segs[seg];
  @@ -3408,6 +3409,9 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
  	var->db = (ar >> 14) & 1;
  	var->g = (ar >> 15) & 1;
  	var->unusable = (ar >> 16) & 1;
  +	rip = kvm_rip_read(vcpu);
  +	if ((rip == 0xc101611c || rip == 0xc101611a) && seg == VCPU_SREG_FS)
  +		printk("base=%p limit=%p selector=%x ar=%x\n", var->base,
  		       var->limit, var->selector, ar);
   }
   
   static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
 
 
  Booting kernel Linux 3.10-rc5 with your patch applied produces these
  messages in dmesg when starting a virtual machine:
 
  emulate_invalid_guest_state=0
  [  118.732151] base= limit=  (null) selector=ffff ar=0
  [  118.732341] base= limit=  (null) selector=ffff ar=0
 
  I've butchered printk format, but it gives me the idea of what is going
  on anyway. Can you try the patch below with
  emulate_invalid_guest_state=0|1?
  
  
  diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
  index f4a5b3f..eb062ce 100644
  --- a/arch/x86/kvm/vmx.c
  +++ b/arch/x86/kvm/vmx.c
  @@ -3395,19 +3395,20 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
  		var->selector = vmx_read_guest_seg_selector(vmx, seg);
  		return;
  	}
  +
  	var->base = vmx_read_guest_seg_base(vmx, seg);
  	var->limit = vmx_read_guest_seg_limit(vmx, seg);
  	var->selector = vmx_read_guest_seg_selector(vmx, seg);
  	ar = vmx_read_guest_seg_ar(vmx, seg);
  +	var->unusable = (ar >> 16) & 1;
  	var->type = ar & 15;
  	var->s = (ar >> 4) & 1;
  	var->dpl = (ar >> 5) & 3;
  -	var->present = (ar >> 7) & 1;
  +	var->present = !var->unusable;
  	var->avl = (ar >> 12) & 1;
  	var->l = (ar >> 13) & 1;
  	var->db = (ar >> 14) & 1;
  	var->g = (ar >> 15) & 1;
  -	var->unusable = (ar >> 16) & 1;
   }
   
   static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
 
 
 Kernel 3.10-rc5 with your latest patch applied can successfully boot the
 virtual machine with emulate_invalid_guest_state 0 or 1.
Excellent. I will send the patch to the list. Thank you for your help
in tracking it down.

--
Gleb.


[PATCH v3 41/45] powerpc: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-27 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline when running in atomic context.
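
The conversion applied throughout the series follows one pattern, roughly (editor's sketch; get/put_online_cpus_atomic() is introduced by this series, it is not an existing upstream interface):

static void kick_all_other_cpus(void)
{
	int cpu, me;

	me = get_online_cpus_atomic();	/* was: me = get_cpu(); */

	/* CPUs cannot complete going offline until the matching put. */
	for_each_online_cpu(cpu)
		if (cpu != me)
			smp_send_reschedule(cpu);

	put_online_cpus_atomic();	/* was: put_cpu(); */
}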

Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Gleb Natapov g...@redhat.com
Cc: Alexander Graf ag...@suse.de
Cc: Rob Herring rob.herr...@calxeda.com
Cc: Grant Likely grant.lik...@secretlab.ca
Cc: Kumar Gala ga...@kernel.crashing.org
Cc: Zhao Chenhui chenhui.z...@freescale.com
Cc: linuxppc-...@lists.ozlabs.org
Cc: kvm@vger.kernel.org
Cc: kvm-...@vger.kernel.org
Cc: oprofile-l...@lists.sf.net
Cc: cbe-oss-...@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---

 arch/powerpc/kernel/irq.c  |7 ++-
 arch/powerpc/kernel/machine_kexec_64.c |4 ++--
 arch/powerpc/kernel/smp.c  |2 ++
 arch/powerpc/kvm/book3s_hv.c   |5 +++--
 arch/powerpc/mm/mmu_context_nohash.c   |3 +++
 arch/powerpc/oprofile/cell/spu_profiler.c  |3 +++
 arch/powerpc/oprofile/cell/spu_task_sync.c |4 
 arch/powerpc/oprofile/op_model_cell.c  |6 ++
 8 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index ca39bac..41e9961 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -45,6 +45,7 @@
 #include <linux/irq.h>
 #include <linux/seq_file.h>
 #include <linux/cpumask.h>
+#include <linux/cpu.h>
 #include <linux/profile.h>
 #include <linux/bitops.h>
 #include <linux/list.h>
@@ -410,7 +411,10 @@ void migrate_irqs(void)
unsigned int irq;
static int warned;
cpumask_var_t mask;
-   const struct cpumask *map = cpu_online_mask;
+   const struct cpumask *map;
+
+   get_online_cpus_atomic();
+   map = cpu_online_mask;
 
	alloc_cpumask_var(&mask, GFP_ATOMIC);
 
@@ -436,6 +440,7 @@ void migrate_irqs(void)
}
 
free_cpumask_var(mask);
+   put_online_cpus_atomic();
 
local_irq_enable();
mdelay(1);
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 611acdf..38f6d75 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -187,7 +187,7 @@ static void kexec_prepare_cpus_wait(int wait_state)
int my_cpu, i, notified=-1;
 
hw_breakpoint_disable();
-   my_cpu = get_cpu();
+   my_cpu = get_online_cpus_atomic();
/* Make sure each CPU has at least made it to the state we need.
 *
 * FIXME: There is a (slim) chance of a problem if not all of the CPUs
@@ -266,7 +266,7 @@ static void kexec_prepare_cpus(void)
 */
kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
 
-   put_cpu();
+   put_online_cpus_atomic();
 }
 
 #else /* ! SMP */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ee7ac5e..2123bec 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -277,9 +277,11 @@ void smp_send_debugger_break(void)
if (unlikely(!smp_ops))
return;
 
+   get_online_cpus_atomic();
for_each_online_cpu(cpu)
if (cpu != me)
do_message_pass(cpu, PPC_MSG_DEBUGGER_BREAK);
+   put_online_cpus_atomic();
 }
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2efa9dd..9d8a973 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -28,6 +28,7 @@
 #include <linux/fs.h>
 #include <linux/anon_inodes.h>
 #include <linux/cpumask.h>
+#include <linux/cpu.h>
 #include <linux/spinlock.h>
 #include <linux/page-flags.h>
 #include <linux/srcu.h>
@@ -78,7 +79,7 @@ void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
		++vcpu->stat.halt_wakeup;
}
 
-   me = get_cpu();
+   me = get_online_cpus_atomic();
 
/* CPU points to the first thread of the core */
	if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
@@ -88,7 +89,7 @@ void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
else if (cpu_online(cpu))
smp_send_reschedule(cpu);
}
-   put_cpu();
+   put_online_cpus_atomic();
 }
 
 /*
diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index e779642..c7bdcb4 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -194,6 +194,8 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
unsigned int i, id, cpu = smp_processor_id();
unsigned long *map;
 
+   get_online_cpus_atomic();
+
/* No lockless fast path .. yet */
	raw_spin_lock(&context_lock);
 
@@ -280,6 +282,7 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
 	pr_hardcont(" -> %d\n", id);
 	set_context(id, next->pgd);

[PATCH v3 29/45] KVM: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-27 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline when running in atomic context.

Acked-by: Paolo Bonzini pbonz...@redhat.com
Cc: Gleb Natapov g...@redhat.com
Cc: kvm@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---

 virt/kvm/kvm_main.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 302681c..5bbfa30 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -174,7 +174,7 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
 
 	zalloc_cpumask_var(&cpus, GFP_ATOMIC);
 
-	me = get_cpu();
+	me = get_online_cpus_atomic();
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		kvm_make_request(req, vcpu);
 		cpu = vcpu->cpu;
@@ -192,7 +192,7 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
 		smp_call_function_many(cpus, ack_flush, NULL, 1);
 	else
 		called = false;
-	put_cpu();
+	put_online_cpus_atomic();
 	free_cpumask_var(cpus);
 	return called;
 }
@@ -1707,11 +1707,11 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 		++vcpu->stat.halt_wakeup;
 	}
 
-	me = get_cpu();
+	me = get_online_cpus_atomic();
 	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
 		if (kvm_arch_vcpu_should_kick(vcpu))
 			smp_send_reschedule(cpu);
-	put_cpu();
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
 #endif /* !CONFIG_S390 */





Re: [PATCH v2] vfio: add external user support

2013-06-27 Thread Alexey Kardashevskiy
On 06/28/2013 01:44 AM, Alex Williamson wrote:
 On Thu, 2013-06-27 at 17:14 +1000, Alexey Kardashevskiy wrote:
 VFIO is designed to be used via ioctls on file descriptors
 returned by VFIO.

 However in some situations support for an external user is required.
 The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to
 use the existing VFIO groups for exclusive access in real/virtual mode
 in the host kernel to avoid passing map/unmap requests to the user
 space, which would make things pretty slow.

 The proposed protocol includes:

 1. do normal VFIO init stuff such as opening a new container, attaching
 group(s) to it, setting an IOMMU driver for a container. When IOMMU is
 set for a container, all groups in it are considered ready to use by
 an external user.

 2. pass a fd of the group we want to accelerate to KVM. KVM calls
 vfio_group_iommu_id_from_file() to verify if the group is initialized
 and IOMMU is set for it. The current TCE IOMMU driver marks the whole
 IOMMU table as busy when IOMMU is set for a container, which prevents
 other DMA users from allocating from it, so it is safe to pass the group
 to the user space.

 3. KVM increases the container users counter via
 vfio_group_add_external_user(). This prevents the VFIO group from
 being disposed prior to exiting KVM.

 4. When KVM is finished and doing cleanup, it releases the group file
 and decrements the container users counter. Everything gets released.

 5. KVM also keeps the group file as otherwise its fd might have been
 closed at the moment of KVM finish so vfio_group_del_external_user()
 call will not be possible.
 
 This is the wrong order in my mind.  An external user has no business
 checking or maintaining any state of a group until it calls
 add_external_user().  Only after that call is successful can the user
 assume the filep to group relationship is static and get the iommu_id.
 Any use of the external user API should start with add and end with
 del.

Yes, this is what I actually do, just wrong commit message, will fix.

 
  The "vfio: Limit group opens" patch is also required for consistency.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---

 v1-v2: added definitions to vfio.h :)
 Should not compile but compiled. Hm.

 ---
  drivers/vfio/vfio.c  |   54 
 ++
  include/linux/vfio.h |7 +++
  2 files changed, 61 insertions(+)

 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index c488da5..40875d2 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -1370,6 +1370,60 @@ static const struct file_operations vfio_device_fops = {
  };
  
  /**
 + * External user API, exported by symbols to be linked dynamically.
 + */
 +
 +/* Allows an external user (for example, KVM) to lock an IOMMU group */
 +int vfio_group_add_external_user(struct file *filep)
 +{
 +	struct vfio_group *group = filep->private_data;
 +
 +	if (filep->f_op != &vfio_group_fops)
 +		return -EINVAL;
 +
 +	if (!atomic_inc_not_zero(&group->container_users))
 +		return -EINVAL;
 
 This is the place where I was suggesting we need tests to match
 get_device_fd.  It's not clear what the external user is holding if the
 group has no iommu or is not viable here.


In my mind this test must include a test for the iommu id, so I would merge it
with vfio_group_iommu_id_from_file(). Till I check the iommu id, I still cannot
use this group, so where to put the check for iommu/viable does not really
matter (for me).

 
 
 if (!group->container->iommu_driver || !vfio_group_viable(group)) {
   vfio_group_try_dissolve_container(group);
   return -EINVAL;
 }
 
 +
 +return 0;
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_add_external_user);
 +
 +/* Allows an external user (for example, KVM) to unlock an IOMMU group */
 +void vfio_group_del_external_user(struct file *filep)
 +{
 +	struct vfio_group *group = filep->private_data;
 +
 +	if (WARN_ON(filep->f_op != &vfio_group_fops))
 +		return;
 
 How about we make this return int so we can return 0/-EINVAL and the
 caller can decide the severity of the response?

And what can the caller possibly do on !0?


-- 
Alexey


Re: [PATCH v2] vfio: add external user support

2013-06-27 Thread Alex Williamson
On Fri, 2013-06-28 at 08:57 +1000, Alexey Kardashevskiy wrote:
 On 06/28/2013 01:44 AM, Alex Williamson wrote:
  On Thu, 2013-06-27 at 17:14 +1000, Alexey Kardashevskiy wrote:
  VFIO is designed to be used via ioctls on file descriptors
  returned by VFIO.
 
  However in some situations support for an external user is required.
  The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to
  use the existing VFIO groups for exclusive access in real/virtual mode
  in the host kernel to avoid passing map/unmap requests to the user
  space, which would make things pretty slow.
 
  The proposed protocol includes:
 
  1. do normal VFIO init stuff such as opening a new container, attaching
  group(s) to it, setting an IOMMU driver for a container. When IOMMU is
  set for a container, all groups in it are considered ready to use by
  an external user.
 
  2. pass a fd of the group we want to accelerate to KVM. KVM calls
  vfio_group_iommu_id_from_file() to verify if the group is initialized
  and IOMMU is set for it. The current TCE IOMMU driver marks the whole
  IOMMU table as busy when IOMMU is set for a container, which prevents
  other DMA users from allocating from it, so it is safe to pass the group
  to the user space.
 
  3. KVM increases the container users counter via
  vfio_group_add_external_user(). This prevents the VFIO group from
  being disposed prior to exiting KVM.
 
  4. When KVM is finished and doing cleanup, it releases the group file
  and decrements the container users counter. Everything gets released.
 
  5. KVM also keeps the group file as otherwise its fd might have been
  closed at the moment of KVM finish so vfio_group_del_external_user()
  call will not be possible.
  
  This is the wrong order in my mind.  An external user has no business
  checking or maintaining any state of a group until it calls
  add_external_user().  Only after that call is successful can the user
  assume the filep to group relationship is static and get the iommu_id.
  Any use of the external user API should start with add and end with
  del.
 
 Yes, this is what I actually do, just wrong commit message, will fix.
 
  
   The "vfio: Limit group opens" patch is also required for consistency.
 
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
  ---
 
  v1-v2: added definitions to vfio.h :)
  Should not compile but compiled. Hm.
 
  ---
   drivers/vfio/vfio.c  |   54 
  ++
   include/linux/vfio.h |7 +++
   2 files changed, 61 insertions(+)
 
  diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
  index c488da5..40875d2 100644
  --- a/drivers/vfio/vfio.c
  +++ b/drivers/vfio/vfio.c
  @@ -1370,6 +1370,60 @@ static const struct file_operations vfio_device_fops = {
   };
   
   /**
  + * External user API, exported by symbols to be linked dynamically.
  + */
  +
  +/* Allows an external user (for example, KVM) to lock an IOMMU group */
  +int vfio_group_add_external_user(struct file *filep)
  +{
  +	struct vfio_group *group = filep->private_data;
  +
  +	if (filep->f_op != &vfio_group_fops)
  +		return -EINVAL;
  +
  +	if (!atomic_inc_not_zero(&group->container_users))
  +		return -EINVAL;
  
  This is the place where I was suggesting we need tests to match
  get_device_fd.  It's not clear what the external user is holding if the
  group has no iommu or is not viable here.
 
 
 In my mind this test must include a test for the iommu id, so I would merge it
 with vfio_group_iommu_id_from_file().

I'm not sure what that means.

 Till I check the iommu id, I still cannot
 use this group, so where to put the check for iommu/viable does not really
 matter (for me).

The difference is that getting the group id may just be the first of
several external user API interfaces.  The idea of external user
interface is that from add-del the group is maintained in the same
state as if a device was opened.  If we disassemble that so that add
sets up some stuff and getting the group id does a little more, what
happens if we start adding more external user API callbacks?  A user of
the interface shouldn't need to know the internals to know which
interface allows what aspect of use.  Besides, I don't want to have to
worry about managing another state slightly different from that used by
the device fd.

  
  
  if (!group->container->iommu_driver || !vfio_group_viable(group)) {
  vfio_group_try_dissolve_container(group);
  return -EINVAL;
  }
  
  +
  +  return 0;
  +}
  +EXPORT_SYMBOL_GPL(vfio_group_add_external_user);
  +
  +/* Allows an external user (for example, KVM) to unlock an IOMMU group */
  +void vfio_group_del_external_user(struct file *filep)
  +{
  +	struct vfio_group *group = filep->private_data;
  +
  +	if (WARN_ON(filep->f_op != &vfio_group_fops))
  +		return;
  
  How about we make this return int so we can return 0/-EINVAL and the
  caller can decide the severity of the response?
 
 And what can the caller possibly do on !0?


Re: Migration route from Parallels on Mac for Windows images?

2013-06-27 Thread Ken Roberts


--
Ken Roberts
k...@9ci.com
ken.roberts163 @ skype
605-222-5758 @ cell


On Jun 27, 2013, at 2:33 PM, Brian Jackson i...@theiggy.com wrote:

 On Wednesday, June 26, 2013 8:25:54 PM CDT, Ken Roberts wrote:
 Sorry for the user query but I'm not finding expertise on the Linux mailing 
 lists I belong to.  The web site says one-off user questions are OK.
 
 I have a few VM images on Parallels 8 for Mac. I want them to be on 
 KVM/Linux.
 
 Some of the images are Linux, but the critical ones are a few types of 
 Windows.  I don't want to trash my licenses.
 
 I noticed that kvm-img has a parallels format option, and it seems to work 
 while the conversion is going on.  I've tried kvm-img to convert to qcow2 
 and to raw, both cases the image converts but the disk is not bootable.  The 
 only file the kvm-img doesn't immediately fail on is the one that contains 
 the data.
 
 
 More details on not bootable would be nice. Do you get a blue screen? 
 Seabios screen? You may need to prep the image before you convert it (google 
 mergeide).


Not sure if you support screenshots on the list, so I'm typing it below the 
post.

The only configured device is the hard disk; it is INSTANTLY showing "Boot
failed: not a bootable disk".

Thanks.



Black screen from virt-manager:


Booting from Hard Disk…
Boot failed: not a bootable disk

Booting from ROM…
iPXE (PCI 00:03.0) starting execution…ok
iPXE initialising devices…ok


iPXE 1.0.0+ -- Open Source Network Boot Firmware -- http://ipxe.org
Features: HTTP HTTPS iSCSI DNS TFTP AoE bzImage ELF MBOOT PXE PXEXT Menu

net0: xx:xx:xx:xx:xx:xx using rtl8139 on PCI00:03.0 (open)
  [Link:up, TX:0 TXE:0 RX:0 RXE:0]
DHCP (net0 xx:xx:xx:xx:xx:xx)….. ok
net0: 192.168.1.140/255.255.255.0 gw 192.168.1.1
Next server: 192.168.1.1
Nothing to boot: No such file or directory (http://ipxe.org/2d03e13b)
No more network devices

Booting from Floppy…
Boot failed: could not read the boot disk

No bootable device.


Re: [PATCH v2] vfio: add external user support

2013-06-27 Thread Alexey Kardashevskiy
On 06/28/2013 10:41 AM, Alex Williamson wrote:
 On Fri, 2013-06-28 at 08:57 +1000, Alexey Kardashevskiy wrote:
 On 06/28/2013 01:44 AM, Alex Williamson wrote:
 On Thu, 2013-06-27 at 17:14 +1000, Alexey Kardashevskiy wrote:
 VFIO is designed to be used via ioctls on file descriptors
 returned by VFIO.

 However in some situations support for an external user is required.
 The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to
 use the existing VFIO groups for exclusive access in real/virtual mode
 in the host kernel to avoid passing map/unmap requests to the user
 space, which would make things pretty slow.

 The proposed protocol includes:

 1. do normal VFIO init stuff such as opening a new container, attaching
 group(s) to it, setting an IOMMU driver for a container. When IOMMU is
 set for a container, all groups in it are considered ready to use by
 an external user.

 2. pass a fd of the group we want to accelerate to KVM. KVM calls
 vfio_group_iommu_id_from_file() to verify if the group is initialized
 and IOMMU is set for it. The current TCE IOMMU driver marks the whole
 IOMMU table as busy when IOMMU is set for a container, which prevents
 other DMA users from allocating from it, so it is safe to pass the group
 to the user space.

 3. KVM increases the container users counter via
 vfio_group_add_external_user(). This prevents the VFIO group from
 being disposed prior to exiting KVM.

 4. When KVM is finished and doing cleanup, it releases the group file
 and decrements the container users counter. Everything gets released.

 5. KVM also keeps the group file as otherwise its fd might have been
 closed at the moment of KVM finish so vfio_group_del_external_user()
 call will not be possible.

 This is the wrong order in my mind.  An external user has no business
 checking or maintaining any state of a group until it calls
 add_external_user().  Only after that call is successful can the user
 assume the filep to group relationship is static and get the iommu_id.
 Any use of the external user API should start with add and end with
 del.

 Yes, this is what I actually do, just wrong commit message, will fix.


  The "vfio: Limit group opens" patch is also required for consistency.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---

 v1-v2: added definitions to vfio.h :)
 Should not compile but compiled. Hm.

 ---
  drivers/vfio/vfio.c  |   54 
 ++
  include/linux/vfio.h |7 +++
  2 files changed, 61 insertions(+)

 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index c488da5..40875d2 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -1370,6 +1370,60 @@ static const struct file_operations vfio_device_fops = {
  };
  
  /**
 + * External user API, exported by symbols to be linked dynamically.
 + */
 +
 +/* Allows an external user (for example, KVM) to lock an IOMMU group */
 +int vfio_group_add_external_user(struct file *filep)
 +{
 +	struct vfio_group *group = filep->private_data;
 +
 +	if (filep->f_op != &vfio_group_fops)
 +		return -EINVAL;
 +
 +	if (!atomic_inc_not_zero(&group->container_users))
 +		return -EINVAL;

 This is the place where I was suggesting we need tests to match
 get_device_fd.  It's not clear what the external user is holding if the
 group has no iommu or is not viable here.


 In my mind this test must include a test for the iommu id, so I would merge it
 with vfio_group_iommu_id_from_file().
 
 I'm not sure what that means.

Sorry. Still a mess in my head :( I'll try to explain.

vfio_group_add_external_user() should tell if the group is viable and has
an iommu (doesn't the latter include the check for viable?).

vfio_group_iommu_id_from_file() tells the group id, which has to be compared
by KVM with what KVM got from userspace, and KVM should reject if the
group id is wrong.

So there are 3 checks. KVM can continue if all three passed.

 Till I check the iommu id, I still cannot
 use this group, so where to put the check for iommu/viable does not really
 matter (for me).
 
 The difference is that getting the group id may just be the first of
 several external user API interfaces.  The idea of external user
 interface is that from add-del the group is maintained in the same
 state as if a device was opened.

Good point.

 If we disassemble that so that add
 sets up some stuff and getting the group id does a little more, what
 happens if we start adding more external user API callbacks?  A user of
 the interface shouldn't need to know the internals to know which
 interface allows what aspect of use.  Besides, I don't want to have to
 worry about managing another state slightly different from that used by
 the device fd.





 if (!group->container->iommu_driver || !vfio_group_viable(group)) {
 vfio_group_try_dissolve_container(group);
 return -EINVAL;
 }

 +
 +  return 0;
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_add_external_user);
 +
 +/* Allows an external user (for example, KVM) to 

Re: [RFC 0/5] Introduce VM Sockets virtio transport

2013-06-27 Thread Andy King
Hi Michael,

 __u32 guest_cid;
 
 Given that cid is like an IP address, 32 bit seems too
 limiting. I would go for a 64 bit one or maybe even 128 bit,
 so that e.g. GUIDs can be used there.

That's likely based on what vSockets uses, which is in turn based on
what the VMCI device exposes (which is what vSockets was originally
built on), so unfortunately it's too late to extend that type.
However, that still allows just under 2^32 VMs per host (there are
three reserved values).

 __u32   dst_port;
 
 Ports are 32 bit? I guess most applications can't work with 16 bit.

As with the cid, the width of the port type comes from vSockets,
which is what this plugs into.

 Also, why put cid's in all packets? They are only needed
 when you create a connection, no? Afterwards port numbers
 can be used.

The cid is present in DGRAM packets and STREAM _control_ packets
(connection handshake, signal read/write and so forth).  I don't
think the intent here is for it to be in STREAM _data_ packets,
but Asias can clarify.

  Virtio VM socket connection creation:
  1) Client sends VIRTIO_VSOCK_OP_REQUEST to server
  2) Server reponses with VIRTIO_VSOCK_OP_NEGOTIATE to client
  3) Client sends VIRTIO_VSOCK_OP_OFFER to server
  4) Server responses with VIRTIO_VSOCK_OP_ATTACH to client
 
 What's the reason for a 4 stage setup? Most transports
 make do with 3.

I'm guessing that's also based on the original vSockets/VMCI
implementation, where the NEGOTIATE/OFFER stages are used to
negotiate the underlying transport buffer size (in VMCI, the
queuepair that underlies a STREAM socket).  The virtio
transport can probably use 3.

Thanks!
- Andy


Re: [PATCH v2] vfio: add external user support

2013-06-27 Thread Alex Williamson
On Fri, 2013-06-28 at 11:38 +1000, Alexey Kardashevskiy wrote:
 On 06/28/2013 10:41 AM, Alex Williamson wrote:
  On Fri, 2013-06-28 at 08:57 +1000, Alexey Kardashevskiy wrote:
  On 06/28/2013 01:44 AM, Alex Williamson wrote:
  On Thu, 2013-06-27 at 17:14 +1000, Alexey Kardashevskiy wrote:
  VFIO is designed to be used via ioctls on file descriptors
  returned by VFIO.
 
  However in some situations support for an external user is required.
  The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to
  use the existing VFIO groups for exclusive access in real/virtual mode
  in the host kernel to avoid passing map/unmap requests to the user
  space, which would make things pretty slow.
 
  The proposed protocol includes:
 
  1. do normal VFIO init stuff such as opening a new container, attaching
  group(s) to it, setting an IOMMU driver for a container. When IOMMU is
  set for a container, all groups in it are considered ready to use by
  an external user.
 
  2. pass a fd of the group we want to accelerate to KVM. KVM calls
  vfio_group_iommu_id_from_file() to verify if the group is initialized
  and IOMMU is set for it. The current TCE IOMMU driver marks the whole
  IOMMU table as busy when IOMMU is set for a container, which prevents
  other DMA users from allocating from it, so it is safe to pass the group
  to the user space.
 
  3. KVM increases the container users counter via
  vfio_group_add_external_user(). This prevents the VFIO group from
  being disposed prior to exiting KVM.
 
  4. When KVM is finished and doing cleanup, it releases the group file
  and decrements the container users counter. Everything gets released.
 
  5. KVM also keeps the group file as otherwise its fd might have been
  closed at the moment of KVM finish so vfio_group_del_external_user()
  call will not be possible.
 
  This is the wrong order in my mind.  An external user has no business
  checking or maintaining any state of a group until it calls
  add_external_user().  Only after that call is successful can the user
  assume the filep to group relationship is static and get the iommu_id.
  Any use of the external user API should start with add and end with
  del.
 
  Yes, this is what I actually do, just wrong commit message, will fix.
 
 
   The "vfio: Limit group opens" patch is also required for consistency.
 
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
  ---
 
  v1-v2: added definitions to vfio.h :)
  Should not compile but compiled. Hm.
 
  ---
   drivers/vfio/vfio.c  |   54 
  ++
   include/linux/vfio.h |7 +++
   2 files changed, 61 insertions(+)
 
  diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
  index c488da5..40875d2 100644
  --- a/drivers/vfio/vfio.c
  +++ b/drivers/vfio/vfio.c
  @@ -1370,6 +1370,60 @@ static const struct file_operations vfio_device_fops = {
   };
   
   /**
  + * External user API, exported by symbols to be linked dynamically.
  + */
  +
  +/* Allows an external user (for example, KVM) to lock an IOMMU group */
  +int vfio_group_add_external_user(struct file *filep)
  +{
  +	struct vfio_group *group = filep->private_data;
  +
  +	if (filep->f_op != &vfio_group_fops)
  +		return -EINVAL;
  +
  +	if (!atomic_inc_not_zero(&group->container_users))
  +		return -EINVAL;
 
  This is the place where I was suggesting we need tests to match
  get_device_fd.  It's not clear what the external user is holding if the
  group has no iommu or is not viable here.
 
 
  In my mind this test must include a test for the iommu id, so I would merge it
  with vfio_group_iommu_id_from_file().
  
  I'm not sure what that means.
 
 Sorry. Still a mess in my head :( I'll to explain.
 
 vfio_group_add_external_user() should tell if the group is viable and has
 an iommu

Agreed

 (doesn't the latter include the check for viable?).

Mostly paranoia

 vfio_group_iommu_id_from_file() tells the group id, which has to be compared
 by KVM with what KVM got from userspace, and KVM should reject if the
 group id is wrong.
 
 So there are 3 checks. KVM can continue if all three passed.

That's KVM's business, but what does it prove for userspace to give KVM
both a vfio group file descriptor and a group id?  It seems redundant
since the group id from vfio needs to take precedence.  More paranoia?

  Till I check the iommu id, I still cannot
  use this group, so where to put the check for iommu/viable does not really
  matter (for me).
  
  The difference is that getting the group id may just be the first of
  several external user API interfaces.  The idea of external user
  interface is that from add-del the group is maintained in the same
  state as if a device was opened.
 
 Good point.
 
  If we disassemble that so that add
  sets up some stuff and getting the group id does a little more, what
  happens if we start adding more external user API callbacks?  A user of
  the interface shouldn't need to know the 

Re: [RFC 4/5] VSOCK: Introduce vhost-vsock.ko

2013-06-27 Thread Andy King
Hi Michael,

  +   u32 cid = VHOST_VSOCK_DEFAULT_HOST_CID;
  +   return cid;
  +}
  +
 
 Interesting. So all hosts in fact have the same CID?

Host here means the thing _below_ the VM.  Any process running on
the host OS can be addressed with cid 2.  Each VM gets its own cid.
So communication is always between VM x - host 2.  That makes for
easy lookup on the VM's part.  (Note that we further distinguish in
the VMCI transport between the hypervisor, specifically the VM's own
VMX, which is on cid 0, and the host on cid 2.)
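
For reference, these well-known addresses correspond to the VMADDR_CID_* constants in the vSockets UAPI header, so a guest process dials the host roughly like this (editor's sketch; the port number is made up and error handling is trimmed):

#include <sys/socket.h>
#include <linux/vm_sockets.h>	/* sockaddr_vm, VMADDR_CID_HOST (2) */

static int dial_host(unsigned int port)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid    = VMADDR_CID_HOST,	/* the OS below the VM */
		.svm_port   = port,
	};
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return -1;
	return fd;
}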

Thanks!
- Andy


watchdog: print stolen time increment at softlockup detection

2013-06-27 Thread Marcelo Tosatti

One possibility for a softlockup report in a Linux VM, is that the host
system is overcommitted to the point where the watchdog task is unable
to make progress (unable to touch the watchdog).

Maintain the increment in stolen time for the period of 
softlockup threshold detection (20 seconds by default), 
and report this increment in the softlockup message.

Overcommitment is then indicated by a large stolen time increment,
accounting for more than, or for a significant percentage of, the
softlockup threshold.
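
(For example: with the default 20 second threshold and SOFT_INTRS_PER_PERIOD
of 5 below, the timer samples steal time every 4 seconds, so the reported
value aggregates the last five increments.)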

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 05039e3..ed09d58 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -34,6 +34,8 @@ int __read_mostly watchdog_thresh = 10;
 static int __read_mostly watchdog_disabled;
 static u64 __read_mostly sample_period;
 
+#define SOFT_INTRS_PER_PERIOD 5
+
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
 static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
@@ -127,9 +129,51 @@ static void set_sample_period(void)
 * and hard thresholds) to increment before the
 * hardlockup detector generates a warning
 */
-   sample_period = get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5);
+   sample_period = get_softlockup_thresh() *
+   ((u64)NSEC_PER_SEC / SOFT_INTRS_PER_PERIOD);
 }
 
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+struct steal_clock_record {
+   u64 prev_stolen_time;
+   u64 stolen_time[SOFT_INTRS_PER_PERIOD];
+   int idx;
+};
+
+static DEFINE_PER_CPU(struct steal_clock_record, steal_record);
+static void record_steal_time(void)
+{
+   struct steal_clock_record *r;
+   int cpu = smp_processor_id();
+   u64 steal_time;
+   r = &per_cpu(steal_record, cpu);
+
+   steal_time = paravirt_steal_clock(cpu);
+   r->stolen_time[r->idx] = steal_time - r->prev_stolen_time;
+   r->idx++;
+   if (r->idx == SOFT_INTRS_PER_PERIOD)
+   r->idx = 0;
+   r->prev_stolen_time = steal_time;
+}
+
+static unsigned int get_accumulated_steal(int cpu)
+{
+   int idx;
+   u64 t = 0;
+   struct steal_clock_record *r = &per_cpu(steal_record, cpu);
+
+   for (idx = 0; idx < SOFT_INTRS_PER_PERIOD; idx++)
+   t += r->stolen_time[idx];
+
+   do_div(t, 1000000);	/* ns -> ms, matching the %ums printk below */
+
+   return t;
+}
+
+#else
+static void record_steal_time(void) { return; }
+#endif
+
 /* Commands for resetting the watchdog */
 static void __touch_watchdog(void)
 {
@@ -271,6 +315,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
/* kick the hardlockup detector */
watchdog_interrupt_count();
 
+   /* record steal time */
+   record_steal_time();
+
/* kick the softlockup detector */
wake_up_process(__this_cpu_read(softlockup_watchdog));
 
@@ -316,6 +363,10 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
smp_processor_id(), duration,
current->comm, task_pid_nr(current));
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+   printk(KERN_EMERG "soft lockup stolen time = %ums\n",
+   get_accumulated_steal(smp_processor_id()));
+#endif
print_modules();
print_irqtrace_events(current);
if (regs)


Re: [PATCH v2] vfio: add external user support

2013-06-27 Thread Alexey Kardashevskiy
On 06/28/2013 12:37 PM, Alex Williamson wrote:
 On Fri, 2013-06-28 at 11:38 +1000, Alexey Kardashevskiy wrote:
 On 06/28/2013 10:41 AM, Alex Williamson wrote:
 On Fri, 2013-06-28 at 08:57 +1000, Alexey Kardashevskiy wrote:
 On 06/28/2013 01:44 AM, Alex Williamson wrote:
 On Thu, 2013-06-27 at 17:14 +1000, Alexey Kardashevskiy wrote:
 VFIO is designed to be used via ioctls on file descriptors
 returned by VFIO.

 However in some situations support for an external user is required.
 The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to
 use the existing VFIO groups for exclusive access in real/virtual mode
 in the host kernel to avoid passing map/unmap requests to the user
 space, which would make things pretty slow.

 The proposed protocol includes:

 1. do normal VFIO init stuff such as opening a new container, attaching
 group(s) to it, setting an IOMMU driver for a container. When IOMMU is
 set for a container, all groups in it are considered ready to use by
 an external user.

 2. pass a fd of the group we want to accelerate to KVM. KVM calls
 vfio_group_iommu_id_from_file() to verify if the group is initialized
 and IOMMU is set for it. The current TCE IOMMU driver marks the whole
 IOMMU table as busy when IOMMU is set for a container, which prevents
 other DMA users from allocating from it, so it is safe to pass the group
 to user space.

 3. KVM increases the container users counter via
 vfio_group_add_external_user(). This prevents the VFIO group from
 being disposed prior to exiting KVM.

 4. When KVM is finished and doing cleanup, it releases the group file
 and decrements the container users counter. Everything gets released.

 5. KVM also keeps the group file open, as otherwise its fd might have been
 closed by the time KVM finishes, and the vfio_group_del_external_user()
 call would not be possible.

 This is the wrong order in my mind.  An external user has no business
 checking or maintaining any state of a group until it calls
 add_external_user().  Only after that call is successful can the user
 assume the filep to group relationship is static and get the iommu_id.
 Any use of the external user API should start with add and end with
 del.
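
As a sketch, a hypothetical external user following that order would do
(helper names as proposed in this patch, error handling abbreviated):

	struct file *filep = fget(group_fd);	/* keep the file (step 5) */
	int iommu_id;

	if (vfio_group_add_external_user(filep) < 0) {	/* start with add */
		fput(filep);
		return -EINVAL;
	}
	iommu_id = vfio_group_iommu_id_from_file(filep);
	/* ... use the group ... */
	vfio_group_del_external_user(filep);	/* end with del */
	fput(filep);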

 Yes, this is what I actually do, just wrong commit message, will fix.


 The "vfio: Limit group opens" patch is also required for consistency.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---

 v1->v2: added definitions to vfio.h :)
 Should not compile but compiled. Hm.

 ---
  drivers/vfio/vfio.c  |   54 ++
  include/linux/vfio.h |7 +++
  2 files changed, 61 insertions(+)

 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index c488da5..40875d2 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -1370,6 +1370,60 @@ static const struct file_operations vfio_device_fops = {
  };
  
  /**
 + * External user API, exported by symbols to be linked dynamically.
 + */
 +
 +/* Allows an external user (for example, KVM) to lock an IOMMU group */
 +int vfio_group_add_external_user(struct file *filep)
 +{
 +struct vfio_group *group = filep->private_data;
 +
 +if (filep->f_op != &vfio_group_fops)
 +return -EINVAL;
 +
 +if (!atomic_inc_not_zero(&group->container_users))
 +return -EINVAL;

 This is the place where I was suggesting we need tests to match
 get_device_fd.  It's not clear what the external user is holding if the
 group has no iommu or is not viable here.


 In my mind this test must include a test for the iommu id, so I would merge
 it with vfio_group_iommu_id_from_file().

 I'm not sure what that means.

 Sorry. Still a mess in my head :( I'll try to explain.

 vfio_group_add_external_user() should tell if the group is viable and has
 iommu
 
 Agreed
 
 (does not the latter include a check for viability?).
 
 Mostly paranoia
 
 vfio_group_iommu_id_from_file() tells the group id which has to be compared
 by KVM with what KVM got from the userspace and KVM should reject if the
 group id is wrong.

 So there are 3 checks. KVM can continue if all three passed.
 
 That's KVM's business, but what does it prove for userspace to give KVM
 both a vfio group file descriptor and a group id?  It seems redundant
 since the group id from vfio needs to take precedence.  More paranoia?

Yep, that's her :) Without this check, the user space is allowed to mix up
PHB ID (liobn) and IOMMU group. It has the right to do so and it should not
break anything, but it is nice to check, no?



 Till I check the iommu id, I still cannot
 use this group, so where to put the check for iommu/viable does not really
 matter (for me).

 The difference is that getting the group id may just be the first of
 several external user API interfaces.  The idea of external user
 interface is that from add->del the group is maintained in the same
 state as if a device was opened.

 Good point.

 If we disassemble that so that add
 sets up some stuff and getting the group id does a 

Re: [RFC 0/5] Introduce VM Sockets virtio transport

2013-06-27 Thread Asias He
On Thu, Jun 27, 2013 at 07:25:40PM -0700, Andy King wrote:
 Hi Michael,
 
  __u32 guest_cid;
  
  Given that cid is like an IP address, 32 bit seems too
  limiting. I would go for a 64 bit one or maybe even 128 bit,
  so that e.g. GUIDs can be used there.
 
 That's likely based on what vSockets uses, which is in turn based on
 what the VMCI device exposes (which is what vSockets was originally
 built on), so unfortunately it's too late to extend that type.
 However, that still allows just under 2^32 VMs per host (there are
 three reserved values).

Yes, 32 bit cid and port are defined by vSockets; we cannot go to 64 bit
for the virtio transport.

  __u32   dst_port;
  
  Ports are 32 bit? I guess most applications can't work with 16 bit.
 
 As with the cid, the width of the port type comes from vSockets,
 which is what this plugs into.

Yes.

  Also, why put cid's in all packets? They are only needed
  when you create a connection, no? Afterwards port numbers
  can be used.
 
 The cid is present in DGRAM packets and STREAM _control_ packets
 (connection handshake, signal read/write and so forth).  I don't
 think the intent here is for it to be in STREAM _data_ packets,
 but Asias can clarify.

Virtio transport stream data packets are a bit different from how the VMCI
transport handles them. In VMCI, a dedicated queue pair is created for each
stream. In virtio, all the streams share a single virtqueue pair.

On the recv path, we need the cid and port information from the packet
header to figure out which socket is responsible for the packet.
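
A sketch of that demux (the header field names are illustrative, not the
exact RFC layout; the lookup reuses the existing af_vsock helpers):

	struct virtio_vsock_hdr_sketch {
		__u32 src_cid, dst_cid;
		__u32 src_port, dst_port;
		/* op, flags, len, ... */
	};

	static struct sock *demux(struct virtio_vsock_hdr_sketch *hdr)
	{
		struct sockaddr_vm src, dst;

		vsock_addr_init(&src, hdr->src_cid, hdr->src_port);
		vsock_addr_init(&dst, hdr->dst_cid, hdr->dst_port);
		return vsock_find_connected_socket(&src, &dst);
	}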

   Virtio VM socket connection creation:
   1) Client sends VIRTIO_VSOCK_OP_REQUEST to server
   2) Server responds with VIRTIO_VSOCK_OP_NEGOTIATE to client
   3) Client sends VIRTIO_VSOCK_OP_OFFER to server
   4) Server responds with VIRTIO_VSOCK_OP_ATTACH to client
  
  What's the reason for a 4 stage setup? Most transports
  make do with 3.
 
 I'm guessing that's also based on the original vSockets/VMCI
 implementation, where the NEGOTIATE/OFFER stages are used to
 negotiate the underlying transport buffer size (in VMCI, the
 queuepair that underlies a STREAM socket).  The virtio
 transport can probably use 3.

Right, I wanted to follow how VMCI transport does the connection setup.

We can drop the VIRTIO_VSOCK_OP_ATTACH stage, and move the client into the
SS_CONNECTED state once we get the VIRTIO_VSOCK_OP_NEGOTIATE pkt.
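
The resulting 3-stage setup would then be (a sketch of the proposed change,
not what the current RFC code does):

	client                                 server
	  | --- VIRTIO_VSOCK_OP_REQUEST --->     |
	  | <-- VIRTIO_VSOCK_OP_NEGOTIATE --     |  client enters SS_CONNECTED
	  | --- VIRTIO_VSOCK_OP_OFFER ----->     |  server completes the setup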

 Thanks!
 - Andy

-- 
Asias


Re: [PATCH 3/8] vfio: add external user support

2013-06-27 Thread Stephen Rothwell
Hi Alexey,

On Thu, 27 Jun 2013 15:02:31 +1000 Alexey Kardashevskiy a...@ozlabs.ru wrote:

 index c488da5..54192b2 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -1370,6 +1370,59 @@ static const struct file_operations vfio_device_fops = {
  };
  
  /**
 + * External user API, exported by symbols to be linked dynamically.
 + */
 +
 +/* Allows an external user (for example, KVM) to lock an IOMMU group */
 +static int vfio_group_add_external_user(struct file *filep)
 +{
 + struct vfio_group *group = filep->private_data;
 +
 + if (filep->f_op != &vfio_group_fops)
 + return -EINVAL;
 +
 + if (!atomic_inc_not_zero(&group->container_users))
 + return -EINVAL;
 +
 + return 0;
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_add_external_user);

You cannot EXPORT a static symbol ... The same through the rest of the
file.
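
The usual shape is to drop the static qualifier on anything exported, e.g.
(a sketch of the fix, not the final patch):

	/* external linkage is required for EXPORT_SYMBOL_GPL */
	int vfio_group_add_external_user(struct file *filep)
	{
		/* ... */
	}
	EXPORT_SYMBOL_GPL(vfio_group_add_external_user);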

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au




Re: [PATCH 3/8] vfio: add external user support

2013-06-27 Thread Stephen Rothwell
Hi Alexey,

On Thu, 27 Jun 2013 15:02:31 +1000 Alexey Kardashevskiy a...@ozlabs.ru wrote:

 +/* Allows an external user (for example, KVM) to unlock an IOMMU group */
 +static void vfio_group_del_external_user(struct file *filep)
 +{
 + struct vfio_group *group = filep->private_data;
 +
 + BUG_ON(filep->f_op != &vfio_group_fops);

We usually reserve BUG_ON for situations where there is no way to
continue running or continuing will corrupt the running kernel.  Maybe
WARN_ON() and return?

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au




Re: [PATCH 3/8] vfio: add external user support

2013-06-27 Thread Benjamin Herrenschmidt
On Thu, 2013-06-27 at 16:59 +1000, Stephen Rothwell wrote:
  +/* Allows an external user (for example, KVM) to unlock an IOMMU group */
  +static void vfio_group_del_external_user(struct file *filep)
  +{
  + struct vfio_group *group = filep->private_data;
  +
  + BUG_ON(filep->f_op != &vfio_group_fops);
 
 We usually reserve BUG_ON for situations where there is no way to
 continue running or continuing will corrupt the running kernel.  Maybe
 WARN_ON() and return?

Not even that. This is a user space provided fd, we shouldn't oops the
kernel because we passed a wrong argument, just return -EINVAL or
something like that (add a return code).
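
Something like this sketch (body abbreviated; the reference drop has to
match whatever add_external_user took):

	int vfio_group_del_external_user(struct file *filep)
	{
		struct vfio_group *group = filep->private_data;

		if (WARN_ON(filep->f_op != &vfio_group_fops))
			return -EINVAL;

		/* ... drop the container_users reference taken in add ... */
		return 0;
	}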

Ben.





Re: [PATCH 3/8] vfio: add external user support

2013-06-27 Thread Alexey Kardashevskiy
On 06/27/2013 07:42 PM, Benjamin Herrenschmidt wrote:
 On Thu, 2013-06-27 at 16:59 +1000, Stephen Rothwell wrote:
 +/* Allows an external user (for example, KVM) to unlock an IOMMU group */
 +static void vfio_group_del_external_user(struct file *filep)
 +{
 + struct vfio_group *group = filep->private_data;
 +
 + BUG_ON(filep->f_op != &vfio_group_fops);

 We usually reserve BUG_ON for situations where there is no way to
 continue running or continuing will corrupt the running kernel.  Maybe
 WARN_ON() and return?
 
 Not even that. This is a user space provided fd, we shouldn't oops the
 kernel because we passed a wrong argument, just return -EINVAL or
 something like that (add a return code).

I'll change to WARN_ON but...
This is going to be called on KVM exit on a file pointer previously
verified for correctness. If it is a wrong file*, then something went
terribly wrong.


-- 
Alexey


Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-27 Thread David Gibson
On Sun, Jun 23, 2013 at 10:41:24PM -0600, Alex Williamson wrote:
 On Mon, 2013-06-24 at 13:52 +1000, David Gibson wrote:
  On Sat, Jun 22, 2013 at 08:28:06AM -0600, Alex Williamson wrote:
   On Sat, 2013-06-22 at 22:03 +1000, David Gibson wrote:
On Thu, Jun 20, 2013 at 08:55:13AM -0600, Alex Williamson wrote:
 On Thu, 2013-06-20 at 18:48 +1000, Alexey Kardashevskiy wrote:
  On 06/20/2013 05:47 PM, Benjamin Herrenschmidt wrote:
   On Thu, 2013-06-20 at 15:28 +1000, David Gibson wrote:
    Just out of curiosity - would not get_file() and fput_atomic() on a
    group's file* do the right job instead of vfio_group_add_external_user()
    and vfio_group_del_external_user()?
  
    I was thinking that too.  Grabbing a file reference would certainly be
    the usual way of handling this sort of thing.
   
    But that wouldn't prevent the group ownership from being returned to
    the kernel or another user, would it?
  
  
   Holding the file pointer does not let the group->container_users counter
   go to zero
 
 How so?  Holding the file pointer means the file won't go away, which
 means the group release function won't be called.  That means the group
 won't go away, but that doesn't mean it's attached to an IOMMU.  A user
 could call UNSET_CONTAINER.

Uhh... *thinks*.  Ah, I see.

I think the interface should not take the group fd, but the container
fd.  Holding a reference to *that* would keep the necessary things
around.  But more to the point, it's the right thing semantically:

The container is essentially the handle on a host iommu address space,
and so that's what should be bound by the KVM call to a particular
guest iommu address space.  e.g. it would make no sense to bind two
different groups to different guest iommu address spaces, if they were
in the same container - the guest thinks they are different spaces,
but if they're in the same container they must be the same space.
   
   While the container is the gateway to the iommu, what empowers the
   container to maintain an iommu is the group.  What happens to a
   container when all the groups are disconnected or closed?  Groups are
   the unit that indicates hardware access, not containers.  Thanks,
  
  Uh... huh?  I'm really not sure what you're getting at.
  
  The operation we're doing for KVM here is binding a guest iommu
  address space to a particular host iommu address space.  Why would we
  not want to use the obvious handle on the host iommu address space,
  which is the container fd?
 
 AIUI, the request isn't for an interface through which to do iommu
 mappings.  The request is for an interface to show that the user has
 sufficient privileges to do mappings.  Groups are what gives the user
 that ability.  The iommu is also possibly associated with multiple iommu
 groups and I believe what is being asked for here is a way to hold and
 lock a single iommu group with iommu protection.
 
 From a practical point of view, the iommu interface is de-privileged
 once the groups are disconnected or closed.  Holding a reference count
 on the iommu fd won't prevent that.  That means we'd have to use a
 notifier to have KVM stop the side-channel iommu access.  Meanwhile
 holding the file descriptor for the group and adding an interface that
 bumps a use counter allows KVM to lock itself in, just as if it had a
 device opened itself.  Thanks,

Ah, good point.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 8/8] KVM: PPC: Add hugepage support for IOMMU in-kernel handling

2013-06-27 Thread Scott Wood

On 06/27/2013 12:02:36 AM, Alexey Kardashevskiy wrote:

+/*
+ * The KVM guest can be backed with 16MB pages.
+ * In this case, we cannot do page counting from the real mode
+ * as the compound pages are used - they are linked in a list
+ * with pointers as virtual addresses which are inaccessible
+ * in real mode.
+ *
+ * The code below keeps a 16MB pages list and uses page struct
+ * in real mode if it is already locked in RAM and inserted into
+ * the list or switches to the virtual mode where it can be
+ * handled in a usual manner.
+ */
+#define KVMPPC_HUGEPAGE_HASH(gpa)  hash_32(gpa >> 24, 32)
+
+struct kvmppc_iommu_hugepage {
+   struct hlist_node hash_node;
+   unsigned long gpa;  /* Guest physical address */
+   unsigned long hpa;  /* Host physical address */
+	struct page *page;	/* page struct of the very first subpage */
+	unsigned long size;	/* Huge page size (always 16MB at the moment) */

+};


Shouldn't this be namespaced to something like book3s or spapr?
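
(For reference, a real-mode lookup over such a list might look like the
sketch below; the per-VM table name kvm->arch.hugepage_hash is an
assumption, not taken from the quoted patch:)

	struct kvmppc_iommu_hugepage *hp;
	unsigned int key = KVMPPC_HUGEPAGE_HASH(gpa);

	hlist_for_each_entry(hp, &kvm->arch.hugepage_hash[key], hash_node)
		if (hp->gpa == (gpa & ~(hp->size - 1)))	/* same 16MB page? */
			return hp->hpa | (gpa & (hp->size - 1));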

-Scott


[PATCH v3 41/45] powerpc: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-06-27 Thread Srivatsa S. Bhat
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
offline when running in atomic context.
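
The usage pattern mirrors get_cpu()/put_cpu(), but the critical section is
also protected against CPU offline; a sketch (the API itself is introduced
earlier in this series, so this assumes those patches):

	int cpu;

	cpu = get_online_cpus_atomic();	/* returns this CPU's id */
	/* ... work that must not race with a CPU going offline ... */
	put_online_cpus_atomic();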

Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: Gleb Natapov g...@redhat.com
Cc: Alexander Graf ag...@suse.de
Cc: Rob Herring rob.herr...@calxeda.com
Cc: Grant Likely grant.lik...@secretlab.ca
Cc: Kumar Gala ga...@kernel.crashing.org
Cc: Zhao Chenhui chenhui.z...@freescale.com
Cc: linuxppc-...@lists.ozlabs.org
Cc: k...@vger.kernel.org
Cc: kvm-ppc@vger.kernel.org
Cc: oprofile-l...@lists.sf.net
Cc: cbe-oss-...@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---

 arch/powerpc/kernel/irq.c  |7 ++-
 arch/powerpc/kernel/machine_kexec_64.c |4 ++--
 arch/powerpc/kernel/smp.c  |2 ++
 arch/powerpc/kvm/book3s_hv.c   |5 +++--
 arch/powerpc/mm/mmu_context_nohash.c   |3 +++
 arch/powerpc/oprofile/cell/spu_profiler.c  |3 +++
 arch/powerpc/oprofile/cell/spu_task_sync.c |4 
 arch/powerpc/oprofile/op_model_cell.c  |6 ++
 8 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index ca39bac..41e9961 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -45,6 +45,7 @@
 #include <linux/irq.h>
 #include <linux/seq_file.h>
 #include <linux/cpumask.h>
+#include <linux/cpu.h>
 #include <linux/profile.h>
 #include <linux/bitops.h>
 #include <linux/list.h>
@@ -410,7 +411,10 @@ void migrate_irqs(void)
unsigned int irq;
static int warned;
cpumask_var_t mask;
-   const struct cpumask *map = cpu_online_mask;
+   const struct cpumask *map;
+
+   get_online_cpus_atomic();
+   map = cpu_online_mask;
 
alloc_cpumask_var(&mask, GFP_ATOMIC);
 
@@ -436,6 +440,7 @@ void migrate_irqs(void)
}
 
free_cpumask_var(mask);
+   put_online_cpus_atomic();
 
local_irq_enable();
mdelay(1);
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 611acdf..38f6d75 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -187,7 +187,7 @@ static void kexec_prepare_cpus_wait(int wait_state)
int my_cpu, i, notified=-1;
 
hw_breakpoint_disable();
-   my_cpu = get_cpu();
+   my_cpu = get_online_cpus_atomic();
/* Make sure each CPU has at least made it to the state we need.
 *
 * FIXME: There is a (slim) chance of a problem if not all of the CPUs
@@ -266,7 +266,7 @@ static void kexec_prepare_cpus(void)
 */
kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
 
-   put_cpu();
+   put_online_cpus_atomic();
 }
 
 #else /* ! SMP */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ee7ac5e..2123bec 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -277,9 +277,11 @@ void smp_send_debugger_break(void)
if (unlikely(!smp_ops))
return;
 
+   get_online_cpus_atomic();
for_each_online_cpu(cpu)
if (cpu != me)
do_message_pass(cpu, PPC_MSG_DEBUGGER_BREAK);
+   put_online_cpus_atomic();
 }
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2efa9dd..9d8a973 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -28,6 +28,7 @@
 #include <linux/fs.h>
 #include <linux/anon_inodes.h>
 #include <linux/cpumask.h>
+#include <linux/cpu.h>
 #include <linux/spinlock.h>
 #include <linux/page-flags.h>
 #include <linux/srcu.h>
@@ -78,7 +79,7 @@ void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_wakeup;
}
 
-   me = get_cpu();
+   me = get_online_cpus_atomic();
 
/* CPU points to the first thread of the core */
if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
@@ -88,7 +89,7 @@ void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
else if (cpu_online(cpu))
smp_send_reschedule(cpu);
}
-   put_cpu();
+   put_online_cpus_atomic();
 }
 
 /*
diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c
index e779642..c7bdcb4 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -194,6 +194,8 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
unsigned int i, id, cpu = smp_processor_id();
unsigned long *map;
 
+   get_online_cpus_atomic();
+
/* No lockless fast path .. yet */
raw_spin_lock(&context_lock);
 
@@ -280,6 +282,7 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
pr_hardcont(" -> %d\n", id);
set_context(id, next->pgd);
