Re: [RFC PATCH 09/11] kvm: simplify processor compat check

2013-09-29 Thread Gleb Natapov
On Sat, Sep 28, 2013 at 09:06:47PM +0530, Aneesh Kumar K.V wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
 
  Il 27/09/2013 15:13, Aneesh Kumar K.V ha scritto:
  Alexander Graf ag...@suse.de writes:
  
  On 27.09.2013, at 12:03, Aneesh Kumar K.V wrote:
 
  From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
  Missing patch description.
 
  Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
  I fail to see how this really simplifies things, but at the end of the
  day it's Gleb's and Paolo's call.
  
  will do. It avoid calling 
  
 for_each_online_cpu(cpu) {
 smp_call_function_single() 
  
  on multiple architecture.
 
  I agree with Alex.
 
  The current code is not specially awesome; having
  kvm_arch_check_processor_compat take an int* disguised as a void* is a
  bit ugly indeed.
 
  However, the API makes sense and tells you that it is being passed as a
  callback (to smp_call_function_single in this case).
 
 But whether to check on all cpus or not is arch dependent right?.
 IIUC only x86 and ppc64 need to do that. Also on ppc64 it really
 depends on whether HV or PR. We need to check on all cpus only if it is
 HV. 
 
 
  You are making the API more complicated to use on the arch layer
  (because arch maintainers now have to think do I need to check this on
  all online CPUs?) and making the leaf POWER code less legible because
  it still has the weird void()(void *) calling convention.
 
 
 IIUC what we wanted to check is to find out whether kvm can run on this
 system. That is really an arch specific check. So for core kvm the call
 should be a simple 
 
 if (kvm_arch_check_process_compat()  0)
 error;
We have that already, just return error from kvm_arch_hardware_setup. This
is specific processor compatibility check and you are arguing that the
processor check should be part of kvm_arch_hardware_setup().

 
 Now how each arch figure out whether kvm can run on this system should
 be arch specific. For x86 we do need to check all the cpus. On ppc64 for
 HV we need to. For other archs we always allow kvm. 
 
This is really a sanity check. Theoretically on x86 we also should
not need to check all cpus since SMP configuration with different cpu
models is not supported by the architecture (AFAIK), but bugs happen
(BIOS bugs may cause difference in capabilities for instance). So some
arches opted out from this sanity check for now and this is their choice,
but the code makes it explicit what are we checking here.

 
  If anything, you could change kvm_arch_check_processor_compat to return
  an int and accept no argument, and introduce a wrapper that kvm_init
  passes to smp_call_function_single.
 
 What i am suggesting in the patch is to avoid calling
 smp_call_function_single from kvm_init and let arch decide whether to
 check on all cpus or not.
 
 -aneesh

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time

2013-09-29 Thread Jason Wang
On 09/26/2013 12:30 PM, Jason Wang wrote:
 On 09/23/2013 03:16 PM, Michael S. Tsirkin wrote:
  On Thu, Sep 05, 2013 at 10:54:44AM +0800, Jason Wang wrote:
   On 09/04/2013 07:59 PM, Michael S. Tsirkin wrote:
On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote:
Currently, even if the packet length is smaller than 
VHOST_GOODCOPY_LEN, if
upend_idx != done_idx we still set zcopy_used to true and 
rollback this choice
later. This could be avoided by determining zerocopy once by 
checking all
conditions at one time before.
   
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/vhost/net.c |   47 
---
 1 files changed, 20 insertions(+), 27 deletions(-)
   
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8a6dd0d..3f89dea 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net 
*net)
iov_length(nvq-hdr, s), 
hdr_size);
 break;
 }
-zcopy_used = zcopy  (len = 
VHOST_GOODCOPY_LEN ||
-   nvq-upend_idx != 
nvq-done_idx);
+
+zcopy_used = zcopy  len = VHOST_GOODCOPY_LEN
+(nvq-upend_idx + 1) % 
UIO_MAXIOV !=
+  nvq-done_idx
Thinking about this, this looks strange.
The original idea was that once we start doing zcopy, we keep
using the heads ring even for short packets until no zcopy is 
outstanding.
   
   What's the reason for keep using the heads ring?
  To keep completions in order.
 Ok, I will do some test to see the impact.

Since the our of order completion will happen when switching between
zero copy and non zero copy. I test this by using two sessions of
netperf in burst mode, one with 1 byte TCP_RR another with 512 bytes of
TCP_RR. There's no difference with the patch applied.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: Implement default IRQ routing

2013-09-29 Thread Gleb Natapov
On Thu, Sep 26, 2013 at 10:00:59AM +1000, Paul Mackerras wrote:
 On Mon, Sep 23, 2013 at 09:34:01PM +0300, Gleb Natapov wrote:
  On Mon, Sep 23, 2013 at 09:24:14PM +1000, Paul Mackerras wrote:
   On Sun, Sep 22, 2013 at 03:32:53PM +0300, Gleb Natapov wrote:
On Tue, Sep 17, 2013 at 07:18:40PM +1000, Paul Mackerras wrote:
 This implements a simple way to express the case of IRQ routing where
 there is one in-kernel PIC and the system interrupts (GSIs) are routed
 1-1 to the corresponding pins of the PIC.  This is expressed by having
 kvm-irq_routing == NULL with a skeleton irq routing entry in the new
 kvm-default_irq_route field.
 
 This provides a convenient way to provide backwards compatibility when
 adding IRQ routing capability to an existing in-kernel PIC, such as 
 the
 XICS emulation on powerpc.
 
Why not create simple 1-1 irq routing table? It will take a little bit
more memory, but there will be no need for kvm-irq_routing == NULL
special handling.
   
   The short answer is that userspace wants to use interrupt source
   numbers (i.e. pin numbers for the inputs to the emulated XICS) that
   are scattered throughout a large space, since that mirrors what real
   hardware does.  More specifically, hardware divides up the interrupt
   source number into two fields, each of typically 12 bits, where the
   more significant field identifies an interrupt source unit (ISU) and
   the less significant field identifies an interrupt within the ISU.
   Each PCI host bridge would have an ISU, for example, and there can be
   ISUs associated with other things that attach directly to the
   interconnect fabric (coprocessors, cluster interconnects, etc.).
   
   Today, QEMU creates a virtual ISU numbered 1 for the emulated PCI host
   bridge, which means for example that virtio devices get interrupt pin
   numbers starting at 4096.
   
   So, I could have increased KVM_IRQCHIP_NUM_PINS to some multiple of
   4096, say 16384, which would allow for 3 ISUs.  But that would bloat
   out struct kvm_irq_routing_table to over 64kB, and if I wanted 1-1
   mappings between GSI and pins for all of them, the routing table would
   be over 960kB.
   
  Yes, this is not an option. GSI is just a cookie for anything but x86
  non MSI interrupts. So the way to use irq routing table to deliver XICS irqs
  is to register GSI-XICS irq mapping and by triggering GSI, which is
  just an arbitrary number, userspace tells kernel that XICS irq, that was
  registered for that GSI, should be injected.
 
 Yes, that's fine as far as it goes, but the trouble is that the
 existing data structures (specifically the chip[][] array in struct
 kvm_irq_routing_table) don't handle well the case where the pin
 numbers are large and/or sparse.
 
 In other words, using a small compact set of GSI numbers wouldn't
 help, because it's not the GSI - pin mapping that is the problem, it
 is the reverse pin - GSI mapping.
 
That's internal implementation detail that can be redesigned if needed,
we can let architecture define how irq is mapped back to a GSI (cookie).
Buts since you mentioned chip[][] here I have a question about patch 2: in
kvm_set_routing_entry() you check irq against KVM_IRQCHIP_NUM_PINS which
is defined to be 256 for powerpc, but according to what you described
above about 12 bit ISU/irq devision this is not enough to inject any
interrupt with ISU 1. You would have had at least 256 irqs in each ISU
if you reused irqchip to specify ISU, but you didn't do it. Why not extend
a union in kvm_irq_routing_entry with xics specific structure that will
specify ISU/irq withing ISU?

Who irq numbers inside ISU are assigned?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-29 Thread Gleb Natapov
On Thu, Sep 26, 2013 at 07:47:33PM +0200, Paolo Bonzini wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Il 26/09/2013 19:19, Jan Kiszka ha scritto:
  On 2013-09-26 17:04, Paolo Bonzini wrote:
  Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
  This patch contains the following two changes:
  1. Fix the bug in nested preemption timer support. If vmexit L2-L0
  with some reasons not emulated by L1, preemption timer value should
  be save in such exits.
  2. Add support of Save VMX-preemption timer value VM-Exit controls
  to nVMX.
 
  With this patch, nested VMX preemption timer features are fully
  supported.
 
  Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
  ---
  ChangeLog to v4:
Format changes and remove a flag in nested_vmx.
   arch/x86/include/uapi/asm/msr-index.h |1 +
   arch/x86/kvm/vmx.c|   44 
  +++--
   2 files changed, 43 insertions(+), 2 deletions(-)
 
  Hi all,
 
  the test fails for me if the preemption timer value is set to a value
  that is above ~2000 (which means ~65000 TSC cycles on this machine).
  The preemption timer seems to count faster than what is expected, for
  example only up to 4 million cycles if you set it to one million.
  So, I am leaving the patch out of kvm/queue for now, until I can
  test it on more processors.
  
  So this behaviour is a regression of this patch (and your own version as
  well)?
 
 Without this patch Arthur's preemption timer test doesn't work at all.
 
Have you ruled out test bugs?

I see things like:
+   int i, j;
...

+   // Consume enough time to let L2-L0-L2 occurs
+   for(i = 0; i  10; i++)
+   for (j = 0; j  1; j++);

which can be optimized out. Or use of rdtsc() which can be problematic
if host tsc is not synchronised.


--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-29 Thread Gleb Natapov
On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
 So far we've succeeded at making KVM and VFIO mostly unaware of each
 other, but there's any important point where that breaks down.  Intel
 VT-d hardware may or may not support snoop control.  When snoop
 control is available, intel-iommu promotes No-Snoop transactions on
 PCIe to be cache coherent.  That allows KVM to handle things like the
 x86 WBINVD opcode as a nop.  When the hardware does not support this,
 KVM must implement a hardware visible WBINVD for the guest.
 
 We could simply let userspace tell KVM how to handle WBINVD, but it's
 privileged for a reason.  Allowing an arbitrary user to enable
 physical WBINVD gives them a more access to the hardware.  Previously,
 this has only been enabled for guests supporting legacy PCI device
 assignment.  In such cases it's necessary for proper guest execution.
 We therefore create a new KVM-VFIO virtual device.  The user can add
 and remove VFIO groups to this device via file descriptors.  KVM
 makes use of the VFIO external user interface to validate that the
 user has access to physical hardware and gets the coherency state of
 the IOMMU from VFIO.  This provides equivalent functionality to
 legacy KVM assignment, while keeping (nearly) all the bits isolated.
 
Looks good overall to me, one things though: to use legacy device
assignment one needs root permission, so only root user can enable
WBINVD emulation. Who does this permission checking here? Is only root
allowed to create non coherent group with vfio?

 The one intrusion is the resulting flag indicating the coherency
 state.  For this RFC it's placed on the x86 kvm_arch struct, however
 I know POWER has interest in using the VFIO external user interface,
 and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
 care about No-Snoop handling as well or the code can be #ifdef'd.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
  Documentation/virtual/kvm/devices/vfio.txt |   22 +++
  arch/x86/include/asm/kvm_host.h|1 
  arch/x86/kvm/Makefile  |2 
  arch/x86/kvm/vmx.c |5 -
  arch/x86/kvm/x86.c |5 -
  include/linux/kvm_host.h   |1 
  include/uapi/linux/kvm.h   |4 
  virt/kvm/kvm_main.c|3 
  virt/kvm/vfio.c|  237 
 
  9 files changed, 275 insertions(+), 5 deletions(-)
  create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
  create mode 100644 virt/kvm/vfio.c
 
 diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
 b/Documentation/virtual/kvm/devices/vfio.txt
 new file mode 100644
 index 000..831e6a6
 --- /dev/null
 +++ b/Documentation/virtual/kvm/devices/vfio.txt
 @@ -0,0 +1,22 @@
 +VFIO virtual device
 +===
 +
 +Device types supported:
 +  KVM_DEV_TYPE_VFIO
 +
 +Only one VFIO instance may be created per VM.  The created device
 +tracks VFIO groups in use by the VM and features of those groups
 +important to the correctness and acceleration of the VM.  As groups
 +are enabled and disabled for use by the VM, KVM should be updated
 +about their presence.  When registered with KVM, a reference to the
 +VFIO-group is held by KVM.
 +
 +Groups:
 +  KVM_DEV_VFIO_ADD_GROUP
 +  KVM_DEV_VFIO_DEL_GROUP
 +
 +Each takes a int32_t file descriptor for kvm_device_attr.addr and
 +does not support any group device kvm_device_attr.attr.
 +
 +RFC - Should we use Group KVM_DEV_VFIO_GROUP with Attributes
 +  KVM_DEV_VFIO_GROUP_ADD  DEL?
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index c76ff74..5b9350d 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -588,6 +588,7 @@ struct kvm_arch {
  
   spinlock_t pvclock_gtod_sync_lock;
   bool use_master_clock;
 + bool vfio_noncoherent;
   u64 master_kernel_ns;
   cycle_t master_cycle_now;
  
 diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
 index bf4fb04..25d22b2 100644
 --- a/arch/x86/kvm/Makefile
 +++ b/arch/x86/kvm/Makefile
 @@ -9,7 +9,7 @@ KVM := ../../../virt/kvm
  
  kvm-y+= $(KVM)/kvm_main.o $(KVM)/ioapic.o \
   $(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \
 - $(KVM)/eventfd.o $(KVM)/irqchip.o
 + $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
  kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)  += $(KVM)/assigned-dev.o $(KVM)/iommu.o
  kvm-$(CONFIG_KVM_ASYNC_PF)   += $(KVM)/async_pf.o
  
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 1f1da43..94f7786 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -7395,8 +7395,9 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t 
 gfn, bool is_mmio)
*/
   if (is_mmio)
   ret = MTRR_TYPE_UNCACHABLE  VMX_EPT_MT_EPTE_SHIFT;
 - else if 

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-29 Thread Alex Williamson
On Sun, 2013-09-29 at 16:16 +0300, Gleb Natapov wrote:
 On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
  So far we've succeeded at making KVM and VFIO mostly unaware of each
  other, but there's any important point where that breaks down.  Intel
  VT-d hardware may or may not support snoop control.  When snoop
  control is available, intel-iommu promotes No-Snoop transactions on
  PCIe to be cache coherent.  That allows KVM to handle things like the
  x86 WBINVD opcode as a nop.  When the hardware does not support this,
  KVM must implement a hardware visible WBINVD for the guest.
  
  We could simply let userspace tell KVM how to handle WBINVD, but it's
  privileged for a reason.  Allowing an arbitrary user to enable
  physical WBINVD gives them a more access to the hardware.  Previously,
  this has only been enabled for guests supporting legacy PCI device
  assignment.  In such cases it's necessary for proper guest execution.
  We therefore create a new KVM-VFIO virtual device.  The user can add
  and remove VFIO groups to this device via file descriptors.  KVM
  makes use of the VFIO external user interface to validate that the
  user has access to physical hardware and gets the coherency state of
  the IOMMU from VFIO.  This provides equivalent functionality to
  legacy KVM assignment, while keeping (nearly) all the bits isolated.
  
 Looks good overall to me, one things though: to use legacy device
 assignment one needs root permission, so only root user can enable
 WBINVD emulation.

That's not entirely accurate, legacy device assignment can be used by a
non-root user, libvirt does this all the time.  The part that requires
root access is opening the pci-sysfs config file, the rest can be
managed via file permissions on the remaining sysfs files.

  Who does this permission checking here? Is only root
 allowed to create non coherent group with vfio?

With vfio the user is granted permission by giving them access to the
vfio group file (/dev/vfio/$GROUP) and binding all the devices in the
group to vfio.  That enables the user to create a container (~iommu
domain) with the group attached to it.  Only then will the vfio external
user interface provide a reference to the group and enable this wbinvd
support.  So, wbinvd emulation should only be available to a user that
own a vfio group and has it configured for use with this interface.
Thanks,

Alex

  The one intrusion is the resulting flag indicating the coherency
  state.  For this RFC it's placed on the x86 kvm_arch struct, however
  I know POWER has interest in using the VFIO external user interface,
  and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
  care about No-Snoop handling as well or the code can be #ifdef'd.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
   Documentation/virtual/kvm/devices/vfio.txt |   22 +++
   arch/x86/include/asm/kvm_host.h|1 
   arch/x86/kvm/Makefile  |2 
   arch/x86/kvm/vmx.c |5 -
   arch/x86/kvm/x86.c |5 -
   include/linux/kvm_host.h   |1 
   include/uapi/linux/kvm.h   |4 
   virt/kvm/kvm_main.c|3 
   virt/kvm/vfio.c|  237 
  
   9 files changed, 275 insertions(+), 5 deletions(-)
   create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
   create mode 100644 virt/kvm/vfio.c
  
  diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
  b/Documentation/virtual/kvm/devices/vfio.txt
  new file mode 100644
  index 000..831e6a6
  --- /dev/null
  +++ b/Documentation/virtual/kvm/devices/vfio.txt
  @@ -0,0 +1,22 @@
  +VFIO virtual device
  +===
  +
  +Device types supported:
  +  KVM_DEV_TYPE_VFIO
  +
  +Only one VFIO instance may be created per VM.  The created device
  +tracks VFIO groups in use by the VM and features of those groups
  +important to the correctness and acceleration of the VM.  As groups
  +are enabled and disabled for use by the VM, KVM should be updated
  +about their presence.  When registered with KVM, a reference to the
  +VFIO-group is held by KVM.
  +
  +Groups:
  +  KVM_DEV_VFIO_ADD_GROUP
  +  KVM_DEV_VFIO_DEL_GROUP
  +
  +Each takes a int32_t file descriptor for kvm_device_attr.addr and
  +does not support any group device kvm_device_attr.attr.
  +
  +RFC - Should we use Group KVM_DEV_VFIO_GROUP with Attributes
  +  KVM_DEV_VFIO_GROUP_ADD  DEL?
  diff --git a/arch/x86/include/asm/kvm_host.h 
  b/arch/x86/include/asm/kvm_host.h
  index c76ff74..5b9350d 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -588,6 +588,7 @@ struct kvm_arch {
   
  spinlock_t pvclock_gtod_sync_lock;
  bool use_master_clock;
  +   bool vfio_noncoherent;
  u64 master_kernel_ns;
  cycle_t master_cycle_now;
   
  diff --git a/arch/x86/kvm/Makefile 

Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-29 Thread Gleb Natapov
On Sun, Sep 29, 2013 at 07:52:28AM -0600, Alex Williamson wrote:
 On Sun, 2013-09-29 at 16:16 +0300, Gleb Natapov wrote:
  On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
   So far we've succeeded at making KVM and VFIO mostly unaware of each
   other, but there's any important point where that breaks down.  Intel
   VT-d hardware may or may not support snoop control.  When snoop
   control is available, intel-iommu promotes No-Snoop transactions on
   PCIe to be cache coherent.  That allows KVM to handle things like the
   x86 WBINVD opcode as a nop.  When the hardware does not support this,
   KVM must implement a hardware visible WBINVD for the guest.
   
   We could simply let userspace tell KVM how to handle WBINVD, but it's
   privileged for a reason.  Allowing an arbitrary user to enable
   physical WBINVD gives them a more access to the hardware.  Previously,
   this has only been enabled for guests supporting legacy PCI device
   assignment.  In such cases it's necessary for proper guest execution.
   We therefore create a new KVM-VFIO virtual device.  The user can add
   and remove VFIO groups to this device via file descriptors.  KVM
   makes use of the VFIO external user interface to validate that the
   user has access to physical hardware and gets the coherency state of
   the IOMMU from VFIO.  This provides equivalent functionality to
   legacy KVM assignment, while keeping (nearly) all the bits isolated.
   
  Looks good overall to me, one things though: to use legacy device
  assignment one needs root permission, so only root user can enable
  WBINVD emulation.
 
 That's not entirely accurate, legacy device assignment can be used by a
 non-root user, libvirt does this all the time.  The part that requires
 root access is opening the pci-sysfs config file, the rest can be
 managed via file permissions on the remaining sysfs files.
 
So how libvirt manages to do that as non-root user if pci-sysfs config
file needs root permission. I didn't mean to say that legacy code
checks for root explicitly, what I meant is that at some point root
permission is needed.

   Who does this permission checking here? Is only root
  allowed to create non coherent group with vfio?
 
 With vfio the user is granted permission by giving them access to the
 vfio group file (/dev/vfio/$GROUP) and binding all the devices in the
 group to vfio.  That enables the user to create a container (~iommu
 domain) with the group attached to it.  Only then will the vfio external
 user interface provide a reference to the group and enable this wbinvd
 support.  So, wbinvd emulation should only be available to a user that
 own a vfio group and has it configured for use with this interface.
What is the default permission of /dev/vfio/$GROUP?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 09/11] kvm: simplify processor compat check

2013-09-29 Thread Aneesh Kumar K.V
Gleb Natapov g...@redhat.com writes:

 On Sat, Sep 28, 2013 at 09:06:47PM +0530, Aneesh Kumar K.V wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
 
  Il 27/09/2013 15:13, Aneesh Kumar K.V ha scritto:
  Alexander Graf ag...@suse.de writes:
  
  On 27.09.2013, at 12:03, Aneesh Kumar K.V wrote:
 
  From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
  Missing patch description.
 
  Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
  I fail to see how this really simplifies things, but at the end of the
  day it's Gleb's and Paolo's call.
  
  will do. It avoid calling 
  
for_each_online_cpu(cpu) {
smp_call_function_single() 
  
  on multiple architecture.
 
  I agree with Alex.
 
  The current code is not specially awesome; having
  kvm_arch_check_processor_compat take an int* disguised as a void* is a
  bit ugly indeed.
 
  However, the API makes sense and tells you that it is being passed as a
  callback (to smp_call_function_single in this case).
 
 But whether to check on all cpus or not is arch dependent right?.
 IIUC only x86 and ppc64 need to do that. Also on ppc64 it really
 depends on whether HV or PR. We need to check on all cpus only if it is
 HV. 
 
 
  You are making the API more complicated to use on the arch layer
  (because arch maintainers now have to think do I need to check this on
  all online CPUs?) and making the leaf POWER code less legible because
  it still has the weird void()(void *) calling convention.
 
 
 IIUC what we wanted to check is to find out whether kvm can run on this
 system. That is really an arch specific check. So for core kvm the call
 should be a simple 
 
 if (kvm_arch_check_process_compat()  0)
 error;
 We have that already, just return error from kvm_arch_hardware_setup. This
 is specific processor compatibility check and you are arguing that the
 processor check should be part of kvm_arch_hardware_setup().


What about the success case ?. ie, on arch like arm we do

void kvm_arch_check_processor_compat(void *rtn)
{
*(int *)rtn = 0;
}

for_each_online_cpu(cpu) {
smp_call_function_single(cpu,
kvm_arch_check_processor_compat,
r, 1);
if (r  0)
goto out_free_1;
}

There is no need to do that for loop for arm. 

The only reason I wanted this patch in the series is to make
kvm_arch_check_processor_compat take additional argument opaque. 
I am dropping that requirement in the last patch. Considering
that we have objection to this one, I will drop this patch in
the next posting by rearranging the patches.

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


qemu, numa: non-contiguous cpusets

2013-09-29 Thread Borislav Petkov
Btw,

while I got your attention, on a not-really related topic: how do we
feel about adding support for specifying a non-contiguous set of cpus
for a numa node in qemu with the -numa option? I.e., like this, for
example:

x86_64-softmmu/qemu-system-x86_64 -smp 8 -numa node,nodeid=0,cpus=0\;2\;4-5 
-numa node,nodeid=1,cpus=1\;3\;6-7

The ';' needs to be escaped from the shell but I'm open for better
suggestions.

Here's a diff:

---
diff --git a/vl.c b/vl.c
index 4e709d5c1c20..82a6c8451fb0 100644
--- a/vl.c
+++ b/vl.c
@@ -1261,9 +1261,27 @@ char *get_boot_devices_list(size_t *size)
 return list;
 }
 
+static int __numa_set_cpus(unsigned long *map, int start, int end)
+{
+if (end = MAX_CPUMASK_BITS) {
+end = MAX_CPUMASK_BITS - 1;
+fprintf(stderr,
+qemu: NUMA: A max of %d VCPUs are supported\n,
+ MAX_CPUMASK_BITS);
+return -EINVAL;
+}
+
+if (end  start) {
+return -EINVAL;
+}
+
+bitmap_set(map, start, end-start+1);
+return 0;
+}
+
 static void numa_node_parse_cpus(int nodenr, const char *cpus)
 {
-char *endptr;
+char *endptr, *ptr = (char *)cpus;
 unsigned long long value, endvalue;
 
 /* Empty CPU range strings will be considered valid, they will simply
@@ -1273,7 +1291,8 @@ static void numa_node_parse_cpus(int nodenr, const char 
*cpus)
 return;
 }
 
-if (parse_uint(cpus, value, endptr, 10)  0) {
+fromthetop:
+if (parse_uint(ptr, value, endptr, 10)  0) {
 goto error;
 }
 if (*endptr == '-') {
@@ -1282,22 +1301,22 @@ static void numa_node_parse_cpus(int nodenr, const char 
*cpus)
 }
 } else if (*endptr == '\0') {
 endvalue = value;
-} else {
-goto error;
-}
+} else if (*endptr == ';') {
+   if (__numa_set_cpus(node_cpumask[nodenr], value, value)  0) {
+   goto error;
+   }
+   endptr++;
+if (*endptr == '\0')
+   return;
 
-if (endvalue = MAX_CPUMASK_BITS) {
-endvalue = MAX_CPUMASK_BITS - 1;
-fprintf(stderr,
-qemu: NUMA: A max of %d VCPUs are supported\n,
- MAX_CPUMASK_BITS);
-}
+   ptr = endptr;
 
-if (endvalue  value) {
+   goto fromthetop;
+} else {
 goto error;
 }
 
-bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
+__numa_set_cpus(node_cpumask[nodenr], value, endvalue);
 return;
 
 error:
--

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 09/11] kvm: simplify processor compat check

2013-09-29 Thread Gleb Natapov
On Sun, Sep 29, 2013 at 08:35:16PM +0530, Aneesh Kumar K.V wrote:
 Gleb Natapov g...@redhat.com writes:
 
  On Sat, Sep 28, 2013 at 09:06:47PM +0530, Aneesh Kumar K.V wrote:
  Paolo Bonzini pbonz...@redhat.com writes:
  
   Il 27/09/2013 15:13, Aneesh Kumar K.V ha scritto:
   Alexander Graf ag...@suse.de writes:
   
   On 27.09.2013, at 12:03, Aneesh Kumar K.V wrote:
  
   From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  
   Missing patch description.
  
   Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  
   I fail to see how this really simplifies things, but at the end of the
   day it's Gleb's and Paolo's call.
   
   will do. It avoid calling 
   
   for_each_online_cpu(cpu) {
   smp_call_function_single() 
   
   on multiple architecture.
  
   I agree with Alex.
  
   The current code is not specially awesome; having
   kvm_arch_check_processor_compat take an int* disguised as a void* is a
   bit ugly indeed.
  
   However, the API makes sense and tells you that it is being passed as a
   callback (to smp_call_function_single in this case).
  
  But whether to check on all cpus or not is arch dependent right?.
  IIUC only x86 and ppc64 need to do that. Also on ppc64 it really
  depends on whether HV or PR. We need to check on all cpus only if it is
  HV. 
  
  
   You are making the API more complicated to use on the arch layer
   (because arch maintainers now have to think do I need to check this on
   all online CPUs?) and making the leaf POWER code less legible because
   it still has the weird void()(void *) calling convention.
  
  
  IIUC what we wanted to check is to find out whether kvm can run on this
  system. That is really an arch specific check. So for core kvm the call
  should be a simple 
  
  if (kvm_arch_check_process_compat()  0)
  error;
  We have that already, just return error from kvm_arch_hardware_setup. This
  is specific processor compatibility check and you are arguing that the
  processor check should be part of kvm_arch_hardware_setup().
 
 
 What about the success case ?. ie, on arch like arm we do
 
 void kvm_arch_check_processor_compat(void *rtn)
 {
   *(int *)rtn = 0;
 }
 
 for_each_online_cpu(cpu) {
As I said they opted out from doing the check. They may reconsider after
first bad HW will be discovered.

   smp_call_function_single(cpu,
   kvm_arch_check_processor_compat,
   r, 1);
   if (r  0)
   goto out_free_1;
 }
 
 There is no need to do that for loop for arm. 
It's done once during module initialisation. Why is this a big deal?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-29 Thread Alex Williamson
On Sun, 2013-09-29 at 17:44 +0300, Gleb Natapov wrote:
 On Sun, Sep 29, 2013 at 07:52:28AM -0600, Alex Williamson wrote:
  On Sun, 2013-09-29 at 16:16 +0300, Gleb Natapov wrote:
   On Thu, Sep 12, 2013 at 03:23:15PM -0600, Alex Williamson wrote:
So far we've succeeded at making KVM and VFIO mostly unaware of each
other, but there's any important point where that breaks down.  Intel
VT-d hardware may or may not support snoop control.  When snoop
control is available, intel-iommu promotes No-Snoop transactions on
PCIe to be cache coherent.  That allows KVM to handle things like the
x86 WBINVD opcode as a nop.  When the hardware does not support this,
KVM must implement a hardware visible WBINVD for the guest.

We could simply let userspace tell KVM how to handle WBINVD, but it's
privileged for a reason.  Allowing an arbitrary user to enable
physical WBINVD gives them a more access to the hardware.  Previously,
this has only been enabled for guests supporting legacy PCI device
assignment.  In such cases it's necessary for proper guest execution.
We therefore create a new KVM-VFIO virtual device.  The user can add
and remove VFIO groups to this device via file descriptors.  KVM
makes use of the VFIO external user interface to validate that the
user has access to physical hardware and gets the coherency state of
the IOMMU from VFIO.  This provides equivalent functionality to
legacy KVM assignment, while keeping (nearly) all the bits isolated.

   Looks good overall to me, one things though: to use legacy device
   assignment one needs root permission, so only root user can enable
   WBINVD emulation.
  
  That's not entirely accurate, legacy device assignment can be used by a
  non-root user, libvirt does this all the time.  The part that requires
  root access is opening the pci-sysfs config file, the rest can be
  managed via file permissions on the remaining sysfs files.
  
 So how libvirt manages to do that as non-root user if pci-sysfs config
 file needs root permission. I didn't mean to say that legacy code
 checks for root explicitly, what I meant is that at some point root
 permission is needed.

Yes, libvirt needs admin permission for legacy to bind to pci-stub,
change permission on sysfs files and pass an opened pci config sysfs
file descriptor.  For vfio libvirt needs admin permission to bind to
vfio-pci and change group file permission.  From that perspective the
admin requirement is similar.

Who does this permission checking here? Is only root
   allowed to create non coherent group with vfio?
  
  With vfio the user is granted permission by giving them access to the
  vfio group file (/dev/vfio/$GROUP) and binding all the devices in the
  group to vfio.  That enables the user to create a container (~iommu
  domain) with the group attached to it.  Only then will the vfio external
  user interface provide a reference to the group and enable this wbinvd
  support.  So, wbinvd emulation should only be available to a user that
  own a vfio group and has it configured for use with this interface.
 What is the default permission of /dev/vfio/$GROUP?

It's 600.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-29 Thread Jan Kiszka
On 2013-09-27 08:37, Jan Kiszka wrote:
 On 2013-09-26 22:44, Paolo Bonzini wrote:
 Il 26/09/2013 19:47, Paolo Bonzini ha scritto:

 If I only apply this hunk, which disables the preemption timer while
 in L1:

 @@ -8396,6 +8375,8 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)

 load_vmcs12_host_state(vcpu, vmcs12);

 +   vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, 
 vmx_pin_based_exec_ctrl(vmx));
 +
 /* Update TSC_OFFSET if TSC was changed while L2 ran */
 vmcs_write64(TSC_OFFSET, vmx-nested.vmcs01_tsc_offset);

 then the testcase works for somewhat larger values of the preemption timer
 (up to ~150 TSC cycles), but then fails.
 
 Err, does this mean we run L1 with PIN_BASED_VM_EXEC_CONTROL of L2?
 Ouch. Should be fixed independently.

No, it doesn't mean this. L1 and L2 run on different VMCS, thus should
be able to set their PIN_BASED_VM_EXEC_CONTROL independently. I have no
clue ATM what that hunk can make a difference for you. Will have a
closer look.

BTW, aren't many VMCS fields written redundantly in prepare_vmcs02? I
mean, those that weren't changed by L1 in the shadow VMCS or require
updates for other reasons. There seems to be some room for saving a few
cycles.

Jan




signature.asc
Description: OpenPGP digital signature


Re: KVM call for agenda for 2013-10-01

2013-09-29 Thread Michael S. Tsirkin
On Tue, Sep 24, 2013 at 04:09:56PM +0200, Juan Quintela wrote:
 
 Hi
 
 Please, send any topic that you are interested in covering.
 
 Last week I forgot to send the call for topics.  We still have a topic there.
 
 Thanks, Juan.
 
 Agenda so far:
 - Talk about qemu reverse executing (1st description was done this week)
   How to handle IO when we want to do reverse execution.
   How this relate to Kemari needs?
   And to icount changes?

- acpi generation patches - ok to merge?
This should be a short item, maybe we
can start with it?


 Call details:
 
 10:00 AM to 11:00 AM EDT
 Every two weeks
 
 If you need phone number details,  contact me privately.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 09/11] kvm: simplify processor compat check

2013-09-29 Thread Gleb Natapov
On Sat, Sep 28, 2013 at 09:06:47PM +0530, Aneesh Kumar K.V wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
 
  Il 27/09/2013 15:13, Aneesh Kumar K.V ha scritto:
  Alexander Graf ag...@suse.de writes:
  
  On 27.09.2013, at 12:03, Aneesh Kumar K.V wrote:
 
  From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
  Missing patch description.
 
  Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
  I fail to see how this really simplifies things, but at the end of the
  day it's Gleb's and Paolo's call.
  
  will do. It avoid calling 
  
 for_each_online_cpu(cpu) {
 smp_call_function_single() 
  
  on multiple architecture.
 
  I agree with Alex.
 
  The current code is not specially awesome; having
  kvm_arch_check_processor_compat take an int* disguised as a void* is a
  bit ugly indeed.
 
  However, the API makes sense and tells you that it is being passed as a
  callback (to smp_call_function_single in this case).
 
 But whether to check on all cpus or not is arch dependent right?.
 IIUC only x86 and ppc64 need to do that. Also on ppc64 it really
 depends on whether HV or PR. We need to check on all cpus only if it is
 HV. 
 
 
  You are making the API more complicated to use on the arch layer
  (because arch maintainers now have to think do I need to check this on
  all online CPUs?) and making the leaf POWER code less legible because
  it still has the weird void()(void *) calling convention.
 
 
 IIUC what we wanted to check is to find out whether kvm can run on this
 system. That is really an arch specific check. So for core kvm the call
 should be a simple 
 
 if (kvm_arch_check_process_compat()  0)
 error;
We have that already, just return error from kvm_arch_hardware_setup. This
is specific processor compatibility check and you are arguing that the
processor check should be part of kvm_arch_hardware_setup().

 
 Now how each arch figure out whether kvm can run on this system should
 be arch specific. For x86 we do need to check all the cpus. On ppc64 for
 HV we need to. For other archs we always allow kvm. 
 
This is really a sanity check. Theoretically on x86 we also should
not need to check all cpus since SMP configuration with different cpu
models is not supported by the architecture (AFAIK), but bugs happen
(BIOS bugs may cause difference in capabilities for instance). So some
arches opted out from this sanity check for now and this is their choice,
but the code makes it explicit what are we checking here.

 
  If anything, you could change kvm_arch_check_processor_compat to return
  an int and accept no argument, and introduce a wrapper that kvm_init
  passes to smp_call_function_single.
 
 What i am suggesting in the patch is to avoid calling
 smp_call_function_single from kvm_init and let arch decide whether to
 check on all cpus or not.
 
 -aneesh

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: Implement default IRQ routing

2013-09-29 Thread Gleb Natapov
On Thu, Sep 26, 2013 at 10:00:59AM +1000, Paul Mackerras wrote:
 On Mon, Sep 23, 2013 at 09:34:01PM +0300, Gleb Natapov wrote:
  On Mon, Sep 23, 2013 at 09:24:14PM +1000, Paul Mackerras wrote:
   On Sun, Sep 22, 2013 at 03:32:53PM +0300, Gleb Natapov wrote:
On Tue, Sep 17, 2013 at 07:18:40PM +1000, Paul Mackerras wrote:
 This implements a simple way to express the case of IRQ routing where
 there is one in-kernel PIC and the system interrupts (GSIs) are routed
 1-1 to the corresponding pins of the PIC.  This is expressed by having
 kvm-irq_routing == NULL with a skeleton irq routing entry in the new
 kvm-default_irq_route field.
 
 This provides a convenient way to provide backwards compatibility when
 adding IRQ routing capability to an existing in-kernel PIC, such as 
 the
 XICS emulation on powerpc.
 
Why not create simple 1-1 irq routing table? It will take a little bit
more memory, but there will be no need for kvm-irq_routing == NULL
special handling.
   
   The short answer is that userspace wants to use interrupt source
   numbers (i.e. pin numbers for the inputs to the emulated XICS) that
   are scattered throughout a large space, since that mirrors what real
   hardware does.  More specifically, hardware divides up the interrupt
   source number into two fields, each of typically 12 bits, where the
   more significant field identifies an interrupt source unit (ISU) and
   the less significant field identifies an interrupt within the ISU.
   Each PCI host bridge would have an ISU, for example, and there can be
   ISUs associated with other things that attach directly to the
   interconnect fabric (coprocessors, cluster interconnects, etc.).
   
   Today, QEMU creates a virtual ISU numbered 1 for the emulated PCI host
   bridge, which means for example that virtio devices get interrupt pin
   numbers starting at 4096.
   
   So, I could have increased KVM_IRQCHIP_NUM_PINS to some multiple of
   4096, say 16384, which would allow for 3 ISUs.  But that would bloat
   out struct kvm_irq_routing_table to over 64kB, and if I wanted 1-1
   mappings between GSI and pins for all of them, the routing table would
   be over 960kB.
   
  Yes, this is not an option. GSI is just a cookie for anything but x86
  non MSI interrupts. So the way to use irq routing table to deliver XICS irqs
  is to register GSI-XICS irq mapping and by triggering GSI, which is
  just an arbitrary number, userspace tells kernel that XICS irq, that was
  registered for that GSI, should be injected.
 
 Yes, that's fine as far as it goes, but the trouble is that the
 existing data structures (specifically the chip[][] array in struct
 kvm_irq_routing_table) don't handle well the case where the pin
 numbers are large and/or sparse.
 
 In other words, using a small compact set of GSI numbers wouldn't
 help, because it's not the GSI - pin mapping that is the problem, it
 is the reverse pin - GSI mapping.
 
That's internal implementation detail that can be redesigned if needed,
we can let architecture define how irq is mapped back to a GSI (cookie).
Buts since you mentioned chip[][] here I have a question about patch 2: in
kvm_set_routing_entry() you check irq against KVM_IRQCHIP_NUM_PINS which
is defined to be 256 for powerpc, but according to what you described
above about 12 bit ISU/irq devision this is not enough to inject any
interrupt with ISU 1. You would have had at least 256 irqs in each ISU
if you reused irqchip to specify ISU, but you didn't do it. Why not extend
a union in kvm_irq_routing_entry with xics specific structure that will
specify ISU/irq withing ISU?

Who irq numbers inside ISU are assigned?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 09/11] kvm: simplify processor compat check

2013-09-29 Thread Aneesh Kumar K.V
Gleb Natapov g...@redhat.com writes:

 On Sat, Sep 28, 2013 at 09:06:47PM +0530, Aneesh Kumar K.V wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
 
  Il 27/09/2013 15:13, Aneesh Kumar K.V ha scritto:
  Alexander Graf ag...@suse.de writes:
  
  On 27.09.2013, at 12:03, Aneesh Kumar K.V wrote:
 
  From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
  Missing patch description.
 
  Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
  I fail to see how this really simplifies things, but at the end of the
  day it's Gleb's and Paolo's call.
  
  will do. It avoid calling 
  
for_each_online_cpu(cpu) {
smp_call_function_single() 
  
  on multiple architecture.
 
  I agree with Alex.
 
  The current code is not specially awesome; having
  kvm_arch_check_processor_compat take an int* disguised as a void* is a
  bit ugly indeed.
 
  However, the API makes sense and tells you that it is being passed as a
  callback (to smp_call_function_single in this case).
 
 But whether to check on all cpus or not is arch dependent right?.
 IIUC only x86 and ppc64 need to do that. Also on ppc64 it really
 depends on whether HV or PR. We need to check on all cpus only if it is
 HV. 
 
 
  You are making the API more complicated to use on the arch layer
  (because arch maintainers now have to think do I need to check this on
  all online CPUs?) and making the leaf POWER code less legible because
  it still has the weird void()(void *) calling convention.
 
 
 IIUC what we wanted to check is to find out whether kvm can run on this
 system. That is really an arch specific check. So for core kvm the call
 should be a simple 
 
 if (kvm_arch_check_process_compat()  0)
 error;
 We have that already, just return error from kvm_arch_hardware_setup. This
 is specific processor compatibility check and you are arguing that the
 processor check should be part of kvm_arch_hardware_setup().


What about the success case ?. ie, on arch like arm we do

void kvm_arch_check_processor_compat(void *rtn)
{
*(int *)rtn = 0;
}

for_each_online_cpu(cpu) {
smp_call_function_single(cpu,
kvm_arch_check_processor_compat,
r, 1);
if (r  0)
goto out_free_1;
}

There is no need to do that for loop for arm. 

The only reason I wanted this patch in the series is to make
kvm_arch_check_processor_compat take additional argument opaque. 
I am dropping that requirement in the last patch. Considering
that we have objection to this one, I will drop this patch in
the next posting by rearranging the patches.

-aneesh

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 09/11] kvm: simplify processor compat check

2013-09-29 Thread Gleb Natapov
On Sun, Sep 29, 2013 at 08:35:16PM +0530, Aneesh Kumar K.V wrote:
 Gleb Natapov g...@redhat.com writes:
 
  On Sat, Sep 28, 2013 at 09:06:47PM +0530, Aneesh Kumar K.V wrote:
  Paolo Bonzini pbonz...@redhat.com writes:
  
   Il 27/09/2013 15:13, Aneesh Kumar K.V ha scritto:
   Alexander Graf ag...@suse.de writes:
   
   On 27.09.2013, at 12:03, Aneesh Kumar K.V wrote:
  
   From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  
   Missing patch description.
  
   Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  
   I fail to see how this really simplifies things, but at the end of the
   day it's Gleb's and Paolo's call.
   
   will do. It avoid calling 
   
   for_each_online_cpu(cpu) {
   smp_call_function_single() 
   
   on multiple architecture.
  
   I agree with Alex.
  
   The current code is not specially awesome; having
   kvm_arch_check_processor_compat take an int* disguised as a void* is a
   bit ugly indeed.
  
   However, the API makes sense and tells you that it is being passed as a
   callback (to smp_call_function_single in this case).
  
  But whether to check on all cpus or not is arch dependent right?.
  IIUC only x86 and ppc64 need to do that. Also on ppc64 it really
  depends on whether HV or PR. We need to check on all cpus only if it is
  HV. 
  
  
   You are making the API more complicated to use on the arch layer
   (because arch maintainers now have to think do I need to check this on
   all online CPUs?) and making the leaf POWER code less legible because
   it still has the weird void()(void *) calling convention.
  
  
  IIUC what we wanted to check is to find out whether kvm can run on this
  system. That is really an arch specific check. So for core kvm the call
  should be a simple 
  
  if (kvm_arch_check_process_compat()  0)
  error;
  We have that already, just return error from kvm_arch_hardware_setup. This
  is specific processor compatibility check and you are arguing that the
  processor check should be part of kvm_arch_hardware_setup().
 
 
 What about the success case ?. ie, on arch like arm we do
 
 void kvm_arch_check_processor_compat(void *rtn)
 {
   *(int *)rtn = 0;
 }
 
 for_each_online_cpu(cpu) {
As I said they opted out from doing the check. They may reconsider after
first bad HW will be discovered.

   smp_call_function_single(cpu,
   kvm_arch_check_processor_compat,
   r, 1);
   if (r  0)
   goto out_free_1;
 }
 
 There is no need to do that for loop for arm. 
It's done once during module initialisation. Why is this a big deal?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html