Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Michael S. Tsirkin
On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
 Gleb Natapov g...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
  H. Peter Anvin h...@zytor.com writes:
  
   On 06/05/2013 03:08 PM, Anthony Liguori wrote:
  
   Definitely an option.  However, we want to be able to boot from native
   devices, too, so having an I/O BAR (which would not be used by the OS
   driver) should still at the very least be an option.
   
   What makes it so difficult to work with an MMIO bar for PCI-e?
   
   With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
   forward.  Is there something special about PCI-e here?
   
  
   It's not tracking allocation.  It is that accessing memory above 1 MiB
   is incredibly painful in the BIOS environment, which basically means
   MMIO is inaccessible.
  
  Oh, you mean in real mode.
  
  SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
  There are loads of ASSERT32FLAT()s in the code to make sure of this.
  
  Well, not exactly. Initialization is done in 32bit, but disk
  reads/writes are done in 16bit mode since it should work from int13
  interrupt handler. The only way I know to access MMIO bars from 16 bit
  is to use SMM which we do not have in KVM.
 
 Ah, if it's just the dataplane operations then there's another solution.
 
 We can introduce a virtqueue flag that asks the backend to poll for new
 requests.  Then SeaBIOS can add the request to the queue and not worry
 about kicking or reading the ISR.

This will pin a host CPU.
If we do something timer based it will likely
both increase host CPU utilization and slow device down.

If we didn't care about performance at all we could
do config cycles for signalling, which is much
more elegant than polling in host, but I don't think
that's the case.

 
 SeaBIOS is polling for completion anyway.

I think that's different because a disk will normally respond
quickly. So it polls a bit, but then it stops as
there are no outstanding requests.
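
To make the trade-off above concrete, here is a toy model in C of a queue whose backend polls instead of being kicked. It is only a sketch: the flag name VQ_F_BACKEND_POLL, struct toy_vq and both functions are hypothetical and are not part of virtio, QEMU or SeaBIOS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag: the driver asks the backend to poll for requests. */
#define VQ_F_BACKEND_POLL 0x1u

struct toy_vq {
    uint32_t flags;      /* VQ_F_BACKEND_POLL or 0 */
    uint16_t avail_idx;  /* advanced by the guest when it queues a request */
    uint16_t last_seen;  /* backend's private copy of avail_idx */
    unsigned kicks;      /* notifications the guest had to issue */
};

/* Guest side: queue one request; kick only if the backend is not polling. */
static void guest_add_request(struct toy_vq *vq)
{
    vq->avail_idx++;
    if (!(vq->flags & VQ_F_BACKEND_POLL))
        vq->kicks++;     /* stands in for the notify register write */
}

/* Backend side: one polling pass; returns how many new requests it found. */
static unsigned backend_poll_once(struct toy_vq *vq)
{
    uint16_t fresh = (uint16_t)(vq->avail_idx - vq->last_seen);

    vq->last_seen = vq->avail_idx;
    return fresh;
}
```

The guest never touches a notify register in polling mode, but the backend has to keep calling backend_poll_once() even when it returns 0, which is where the host CPU cost comes from.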

-- 
MST

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] add irq priodrop support

2013-06-11 Thread Mario Smarduch
This is the same Interrupt Priority Drop/Deactivation patch 
emailed some time back (except for 3.10-rc4) used by the initial 
device pass-through support. 

When enabled, all IRQs on the host write to the GIC CPU interface EOIR and
DIR registers to de-prioritize/de-activate an interrupt. For a device
that's passed through, only the EOIR is written
to drop the priority; the Guest deactivates it when
it handles its EOI. This supports exitless EOI that's agnostic
to bus type (e.g. PCI).
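
The split described above can be sketched as a small stand-alone C model. This is not the patch's code: the toy_* names are made up for illustration, and the two "registers" are plain variables that just record the last interrupt ID written to them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_IRQS 1020

/* One bit per interrupt ID; a set bit means "priority drop only". */
static uint32_t toy_prio_drop[(TOY_NR_IRQS + 31) / 32];

/* Stand-ins for the EOIR and DIR writes: remember the last ID written. */
static uint32_t toy_eoir_last = UINT32_MAX;
static uint32_t toy_dir_last = UINT32_MAX;

/* Mark an interrupt as passed through to a guest. */
static void toy_mark_passthrough(uint32_t irq)
{
    toy_prio_drop[irq / 32] |= 1u << (irq % 32);
}

static bool toy_is_passthrough(uint32_t irq)
{
    return (toy_prio_drop[irq / 32] & (1u << (irq % 32))) != 0;
}

/* Host EOI path: always drop priority, deactivate only host-owned IRQs. */
static void toy_eoi(uint32_t irq)
{
    toy_eoir_last = irq;        /* priority drop */
    if (!toy_is_passthrough(irq))
        toy_dir_last = irq;     /* deactivation, skipped for passthrough */
}
```

For a passed-through interrupt the model leaves the deactivation to someone else, which mirrors the Guest deactivating the interrupt when it handles its own EOI.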

The patch has been tested for all configurations:
Host: No Prio Drop  Guest: No Prio Drop
Host: Prio Drop     Guest: No Prio Drop
Host: Prio Drop     Guest: Prio Drop

- Mario

Signed-off-by: Mario Smarduch mario.smard...@huawei.com
---
 arch/arm/kvm/Kconfig            |    8 +++
 drivers/irqchip/irq-gic.c       |  145 ++-
 include/linux/irqchip/arm-gic.h |    6 ++
 3 files changed, 156 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 370e1a8..c0c9f3c 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -59,6 +59,14 @@ config KVM_ARM_VGIC
         ---help---
           Adds support for a hardware assisted, in-kernel GIC emulation.
 
+config KVM_ARM_INT_PRIO_DROP
+        bool "KVM support for Interrupt pass-through"
+        depends on KVM_ARM_VGIC && OF
+        default n
+        ---help---
+          Separates interrupt priority drop and deactivation to enable device
+          pass-through to Guests.
+
 config KVM_ARM_TIMER
         bool "KVM support for Architected Timers"
         depends on KVM_ARM_VGIC && ARM_ARCH_TIMER
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 1760ceb..9fb4ef3 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -41,10 +41,13 @@
 #include <linux/slab.h>
 #include <linux/irqchip/chained_irq.h>
 #include <linux/irqchip/arm-gic.h>
+#include <linux/irqflags.h>
+#include <linux/bitops.h>
 
 #include <asm/irq.h>
 #include <asm/exception.h>
 #include <asm/smp_plat.h>
+#include <asm/virt.h>
 
 #include "irqchip.h"
 
@@ -99,6 +102,20 @@ struct irq_chip gic_arch_extn = {
 
 static struct gic_chip_data gic_data[MAX_GIC_NR] __read_mostly;
 
+#ifdef CONFIG_KVM_ARM_INT_PRIO_DROP
+/*
+ * Priority drop/deactivation bit map, 1st 16 bits used for SGIs, this bit map
+ * is shared by several guests. If bit is set only execute EOI which drops
+ * current priority but not deactivation.
+ */
+static u32  gic_irq_prio_drop[DIV_ROUND_UP(1020, 32)] __read_mostly;
+static void gic_eoi_irq_priodrop(struct irq_data *);
+#endif
+
+static void gic_enable_gicc(void __iomem *);
+static void gic_eoi_sgi(u32, void __iomem *);
+static void gic_priodrop_remap_eoi(struct irq_chip *);
+
 #ifdef CONFIG_GIC_NON_BANKED
 static void __iomem *gic_get_percpu_base(union gic_base *base)
 {
@@ -296,7 +313,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
                         continue;
                 }
                 if (irqnr < 16) {
-                        writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
+                        gic_eoi_sgi(irqstat, cpu_base);
 #ifdef CONFIG_SMP
                         handle_IPI(irqnr, regs);
 #endif
@@ -450,7 +467,7 @@ static void __cpuinit gic_cpu_init(struct gic_chip_data *gic)
                 writel_relaxed(0xa0a0a0a0, dist_base + GIC_DIST_PRI + i * 4 / 4);
 
         writel_relaxed(0xf0, base + GIC_CPU_PRIMASK);
-        writel_relaxed(1, base + GIC_CPU_CTRL);
+        gic_enable_gicc(base);
 }
 
 #ifdef CONFIG_CPU_PM
@@ -585,7 +602,7 @@ static void gic_cpu_restore(unsigned int gic_nr)
                 writel_relaxed(0xa0a0a0a0, dist_base + GIC_DIST_PRI + i * 4);
 
         writel_relaxed(0xf0, cpu_base + GIC_CPU_PRIMASK);
-        writel_relaxed(1, cpu_base + GIC_CPU_CTRL);
+        gic_enable_gicc(cpu_base);
 }
 
 static int gic_notifier(struct notifier_block *self, unsigned long cmd,
void *v)
@@ -666,6 +683,7 @@ void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
 static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
                                 irq_hw_number_t hw)
 {
+        gic_priodrop_remap_eoi(&gic_chip);
         if (hw < 32) {
                 irq_set_percpu_devid(irq);
                 irq_set_chip_and_handler(irq, &gic_chip,
@@ -857,4 +875,125 @@ IRQCHIP_DECLARE(cortex_a9_gic, "arm,cortex-a9-gic", gic_of_init);
 IRQCHIP_DECLARE(msm_8660_qgic, "qcom,msm-8660-qgic", gic_of_init);
 IRQCHIP_DECLARE(msm_qgic2, "qcom,msm-qgic2", gic_of_init);
 
+#ifdef CONFIG_KVM_ARM_INT_PRIO_DROP
+/* If HYP mode is enabled and PRIO DROP is set, remap the EOI callback to handle PRIO DROP */
+static inline void gic_priodrop_remap_eoi(struct irq_chip *chip)
+{
+        if (is_hyp_mode_available())
+                chip->irq_eoi = gic_eoi_irq_priodrop;
+}
+
+/* If in HYP mode, enable interrupt priority drop/deactivation, and mark
+ * SGIs to deactivate through writes to GICC_DIR. For a Guest, only enable
+ * normal mode.
+ */
+static void gic_enable_gicc(void __iomem *gicc_base)
+{
+  

[PATCH 2/2] add initial kvm dev passhtrough support

2013-06-11 Thread Mario Smarduch

This is the initial device pass-through support.
At this time only host == guest is supported.
Basic Operation:

- QEMU parameters: -device kvm-device-assign,host=<device name>
  for example - kvm-device-assign,host='arm-sp804'. Essentially
  any device that does PIO should be supported.
- The host DTS contains the node for the device to be passed through.
  The host driver is unbound or not compiled in.
- For the Guest, the intent is to add a DTS node that QEMU can
  parse to find the guest attributes (memory resources, IRQs).
  For now these values default to the host's. Getting this working
  on boards other than vexpress is a future work item.
- The physical interrupt is always passed through to the CPU
  where the target vCPU executes or will execute.
  The current approach pins vCPUs to physical CPUs; when the
  Guest updates CPU affinity, it is updated in the KVM vgic dist
  code. A future work item for IRQ affinity is to allow a vCPU to
  float and to handle IRQ affinity when it is scheduled in. For high
  IRQ rates (e.g. wireless NEs) static binding may be used.
  For some other devices (e.g. environmental management, IPMI) where
  latency is not important, dynamic binding may be used; it should be
  up to the user.
- To support flexible affinity a mask is introduced (a QEMU parameter,
  although not used here yet)
  o vCPU affinity - vCPU -- CPU binding; the physical IRQ
    CPU binding follows the vCPU binding dynamically.
- Obviously DMA is not supported
  - early DMA may be supported through a 1:1 mapping, but it's unsafe
    and so far we don't know of any hardware that's not behind an SMMU.
    This option may be useful in some embedded/wireless environments
    where the guest may want to swap, secure isolation may not be
    an issue, or a device such as a look-aside crypto engine is not
    behind an IOMMU.
  - IOMMU/VFIO support is key and the next item for us to work on.
    VFIO is especially important for ETSI NFV, since 4G/IMS NEs pull
    packets off the wire and switch them directly in user space.
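
The affinity rule in the list above (the physical IRQ binding follows the vCPU binding) can be modelled with two lookup tables. The following is a toy sketch in C only; the toy_* names and table sizes are hypothetical and do not come from the patch or from KVM.

```c
#include <assert.h>

#define TOY_NR_VCPUS 4
#define TOY_NR_IRQS  8

/* Current vCPU -> physical CPU binding, maintained by the host. */
static int toy_vcpu_to_cpu[TOY_NR_VCPUS];
/* Target vCPU for each passed-through IRQ, chosen by the guest. */
static int toy_irq_to_vcpu[TOY_NR_IRQS];

/* Physical CPU that should take the IRQ: wherever its target vCPU is bound. */
static int toy_irq_target_cpu(int irq)
{
    return toy_vcpu_to_cpu[toy_irq_to_vcpu[irq]];
}

/* Guest writes smp_affinity: retarget the IRQ to another vCPU. */
static void toy_set_irq_affinity(int irq, int vcpu)
{
    toy_irq_to_vcpu[irq] = vcpu;
}

/* Host rebinds a vCPU; every IRQ targeting that vCPU follows it. */
static void toy_bind_vcpu(int vcpu, int cpu)
{
    toy_vcpu_to_cpu[vcpu] = cpu;
}
```

With static binding toy_bind_vcpu() is called once per vCPU at startup; with dynamic binding it would be called each time the vCPU is scheduled onto a different physical CPU.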

The patch has been tested on Fast Models in a couple of ways:
- UP Guest with sp804 timer only - works consistently.
- SMP Guest with sp804 timer - works consistently.
  Writes to '/proc/irq/<sp804 irq>/smp_affinity'
  confirm dynamic CPU affinity.
- IRQ rates (maybe not that important given it's an emulated env)
  reached in excess of 500.

There is a very simple QEMU piece for now that I will
email later, in case someone would like to test.

- Mario



Signed-off-by: Mario Smarduch mario.smard...@huawei.com
---
 arch/arm/include/asm/kvm_host.h |   14 +++
 arch/arm/include/asm/kvm_vgic.h |   10 +++
 arch/arm/kvm/Makefile           |    1 +
 arch/arm/kvm/arm.c              |   60 +
 arch/arm/kvm/assign-dev.c       |  189 +++
 arch/arm/kvm/vgic.c             |  106 ++
 include/linux/irqchip/arm-gic.h |    1 +
 include/uapi/linux/kvm.h        |   33 +++
 8 files changed, 414 insertions(+)
 create mode 100644 arch/arm/kvm/assign-dev.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 57cb786..c6ad3a3 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -67,6 +67,10 @@ struct kvm_arch {
 
         /* Interrupt controller */
         struct vgic_dist        vgic;
+
+        /* Device Passthrough Fields */
+        struct list_head        assigned_dev_head;
+        struct mutex            dev_pasthru_lock;
 };
 
 #define KVM_NR_MEM_OBJS 40
@@ -146,6 +150,13 @@ struct kvm_vcpu_stat {
u32 halt_wakeup;
 };
 
+struct kvm_arm_assigned_dev_kernel {
+   struct list_head list;
+   struct kvm_arm_assigned_device dev;
+   irqreturn_t (*irq_handler)(int, void *);
+   void *irq_arg;
+};
+
 struct kvm_vcpu_init;
 int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
const struct kvm_vcpu_init *init);
@@ -156,6 +167,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 u64 kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
+int kvm_arm_get_device_resources(struct kvm *,
+   struct kvm_arm_get_device_resources *);
+int kvm_arm_assign_device(struct kvm *, struct kvm_arm_assigned_device *);
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 struct kvm;
diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index 343744e..c4370ae 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -107,6 +107,16 @@ struct vgic_dist {
 
/* Bitmap indicating which CPU has something pending */
unsigned long   irq_pending_on_cpu;
+
+        /* Device passthrough fields */
+        /* Host irq to guest irq mapping */
+        u8                      guest_irq[VGIC_NR_SHARED_IRQS];
+
+        /* Pending passthrough irq */
+        struct vgic_bitmap      pasthru_spi_pending;
+
+        /* At least one passthrough IRQ pending for some vCPU */
+        u32                     pasthru_pending;
 #endif
 };
 
diff 

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Gleb Natapov
On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
 On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
  Gleb Natapov g...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
   H. Peter Anvin h...@zytor.com writes:
   
On 06/05/2013 03:08 PM, Anthony Liguori wrote:
   
Definitely an option.  However, we want to be able to boot from 
native
devices, too, so having an I/O BAR (which would not be used by the OS
driver) should still at the very least be an option.

What makes it so difficult to work with an MMIO bar for PCI-e?

With legacy PCI, tracking allocation of MMIO vs. PIO is pretty 
straight
forward.  Is there something special about PCI-e here?

   
It's not tracking allocation.  It is that accessing memory above 1 MiB
is incredibly painful in the BIOS environment, which basically means
MMIO is inaccessible.
   
   Oh, you mean in real mode.
   
   SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
   There are loads of ASSERT32FLAT()s in the code to make sure of this.
   
   Well, not exactly. Initialization is done in 32bit, but disk
   reads/writes are done in 16bit mode since it should work from int13
   interrupt handler. The only way I know to access MMIO bars from 16 bit
   is to use SMM which we do not have in KVM.
  
  Ah, if it's just the dataplane operations then there's another solution.
  
  We can introduce a virtqueue flag that asks the backend to poll for new
  requests.  Then SeaBIOS can add the request to the queue and not worry
  about kicking or reading the ISR.
 
 This will pin a host CPU.
 If we do something timer based it will likely
 both increase host CPU utilization and slow device down.
 
 If we didn't care about performance at all we could
 do config cycles for signalling, which is much
 more elegant than polling in host, but I don't think
 that's the case.
 
I wouldn't call BIOS int13 interface performance critical.

  
  SeaBIOS is polling for completion anyway.
 
 I think that's different because a disk will normally respond
 quickly. So it polls a bit, but then it stops as
 there are no outstanding requests.
 
 -- 
 MST

--
Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Michael S. Tsirkin
On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
 On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
  On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
   Gleb Natapov g...@redhat.com writes:
   
On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 03:08 PM, Anthony Liguori wrote:

 Definitely an option.  However, we want to be able to boot from 
 native
 devices, too, so having an I/O BAR (which would not be used by the 
 OS
 driver) should still at the very least be an option.
 
 What makes it so difficult to work with an MMIO bar for PCI-e?
 
 With legacy PCI, tracking allocation of MMIO vs. PIO is pretty 
 straight
 forward.  Is there something special about PCI-e here?
 

 It's not tracking allocation.  It is that accessing memory above 1 
 MiB
 is incredibly painful in the BIOS environment, which basically means
 MMIO is inaccessible.

Oh, you mean in real mode.

SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
There are loads of ASSERT32FLAT()s in the code to make sure of this.

Well, not exactly. Initialization is done in 32bit, but disk
reads/writes are done in 16bit mode since it should work from int13
interrupt handler. The only way I know to access MMIO bars from 16 bit
is to use SMM which we do not have in KVM.
   
   Ah, if it's just the dataplane operations then there's another solution.
   
   We can introduce a virtqueue flag that asks the backend to poll for new
   requests.  Then SeaBIOS can add the request to the queue and not worry
   about kicking or reading the ISR.
  
  This will pin a host CPU.
  If we do something timer based it will likely
  both increase host CPU utilization and slow device down.
  
  If we didn't care about performance at all we could
  do config cycles for signalling, which is much
  more elegant than polling in host, but I don't think
  that's the case.
  
 I wouldn't call BIOS int13 interface performance critical.

So the plan always was to
- add an MMIO BAR
- add a register for pci-config based access to devices

hpa felt performance does matter there but didn't clarify why ...
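
For readers following along, the second item of the plan can be pictured as a small window in PCI config space: firmware writes a register offset into one config register and then reads or writes a data register, so it never has to touch the MMIO BAR. The C model below is only a sketch under that assumption; the structure layout and the cfg_* names are hypothetical and were not specified in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_DEV_REGS 16

struct toy_dev {
    uint32_t regs[TOY_DEV_REGS]; /* registers normally reached via the MMIO BAR */
    uint32_t cfg_window_off;     /* hypothetical config-space "offset" register */
};

/* Config write to the offset register: select the device register to access. */
static void cfg_write_offset(struct toy_dev *d, uint32_t byte_off)
{
    d->cfg_window_off = byte_off;
}

/* Config read of the data register: returns the selected register, 0 if out of range. */
static uint32_t cfg_read_data(const struct toy_dev *d)
{
    uint32_t idx = d->cfg_window_off / 4;

    return idx < TOY_DEV_REGS ? d->regs[idx] : 0;
}

/* Config write to the data register: store into the selected register. */
static void cfg_write_data(struct toy_dev *d, uint32_t val)
{
    uint32_t idx = d->cfg_window_off / 4;

    if (idx < TOY_DEV_REGS)
        d->regs[idx] = val;
}
```

Each device register access costs at least two config cycles in this model (one to set the offset, one to move the data), which is why the approach only makes sense where speed is not the main concern.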

   
   SeaBIOS is polling for completion anyway.
  
  I think that's different because a disk will normally respond
  quickly. So it polls a bit, but then it stops as
  there are no outstanding requests.
  
  -- 
  MST
 
 --
   Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Gleb Natapov
On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote:
 On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
  On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
   On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
Gleb Natapov g...@redhat.com writes:

 On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
 H. Peter Anvin h...@zytor.com writes:
 
  On 06/05/2013 03:08 PM, Anthony Liguori wrote:
 
  Definitely an option.  However, we want to be able to boot from 
  native
  devices, too, so having an I/O BAR (which would not be used by 
  the OS
  driver) should still at the very least be an option.
  
  What makes it so difficult to work with an MMIO bar for PCI-e?
  
  With legacy PCI, tracking allocation of MMIO vs. PIO is pretty 
  straight
  forward.  Is there something special about PCI-e here?
  
 
  It's not tracking allocation.  It is that accessing memory above 1 
  MiB
  is incredibly painful in the BIOS environment, which basically 
  means
  MMIO is inaccessible.
 
 Oh, you mean in real mode.
 
 SeaBIOS runs the virtio code in 32-bit mode with a flat memory 
 layout.
 There are loads of ASSERT32FLAT()s in the code to make sure of this.
 
 Well, not exactly. Initialization is done in 32bit, but disk
 reads/writes are done in 16bit mode since it should work from int13
 interrupt handler. The only way I know to access MMIO bars from 16 bit
 is to use SMM which we do not have in KVM.

Ah, if it's just the dataplane operations then there's another solution.

We can introduce a virtqueue flag that asks the backend to poll for new
requests.  Then SeaBIOS can add the request to the queue and not worry
about kicking or reading the ISR.
   
   This will pin a host CPU.
   If we do something timer based it will likely
   both increase host CPU utilization and slow device down.
   
   If we didn't care about performance at all we could
   do config cycles for signalling, which is much
   more elegant than polling in host, but I don't think
   that's the case.
   
  I wouldn't call BIOS int13 interface performance critical.
 
 So the plan always was to
 - add an MMIO BAR
 - add a register for pci-config based access to devices
 
 hpa felt performance does matter there but didn't clarify why ...
 
You do not want to make it too slow, obviously; this is the interface that is
used to load the OS during boot.


SeaBIOS is polling for completion anyway.
   
   I think that's different because a disk will normally respond
   quickly. So it polls a bit, but then it stops as
   there are no outstanding requests.
   
   -- 
   MST
  
  --
  Gleb.

--
Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Michael S. Tsirkin
On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote:
 On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
  On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
   On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
Gleb Natapov g...@redhat.com writes:

 On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
 H. Peter Anvin h...@zytor.com writes:
 
  On 06/05/2013 03:08 PM, Anthony Liguori wrote:
 
  Definitely an option.  However, we want to be able to boot from 
  native
  devices, too, so having an I/O BAR (which would not be used by 
  the OS
  driver) should still at the very least be an option.
  
  What makes it so difficult to work with an MMIO bar for PCI-e?
  
  With legacy PCI, tracking allocation of MMIO vs. PIO is pretty 
  straight
  forward.  Is there something special about PCI-e here?
  
 
  It's not tracking allocation.  It is that accessing memory above 1 
  MiB
  is incredibly painful in the BIOS environment, which basically 
  means
  MMIO is inaccessible.
 
 Oh, you mean in real mode.
 
 SeaBIOS runs the virtio code in 32-bit mode with a flat memory 
 layout.
 There are loads of ASSERT32FLAT()s in the code to make sure of this.
 
 Well, not exactly. Initialization is done in 32bit, but disk
 reads/writes are done in 16bit mode since it should work from int13
 interrupt handler. The only way I know to access MMIO bars from 16 bit
 is to use SMM which we do not have in KVM.

Ah, if it's just the dataplane operations then there's another solution.

We can introduce a virtqueue flag that asks the backend to poll for new
requests.  Then SeaBIOS can add the request to the queue and not worry
about kicking or reading the ISR.
   
   This will pin a host CPU.
   If we do something timer based it will likely
   both increase host CPU utilization and slow device down.
   
   If we didn't care about performance at all we could
   do config cycles for signalling, which is much
   more elegant than polling in host, but I don't think
   that's the case.
   
  I wouldn't call BIOS int13 interface performance critical.
 
 So the plan always was to
 - add an MMIO BAR
 - add a register for pci-config based access to devices
 
 hpa felt performance does matter there but didn't clarify why ...

Also

<gleb> mst, well the question is if it is safe to call int13 in the
middle of pci bus enumeration/configuration
<gleb> mst, and int13 predates PCI, so who knows


SeaBIOS is polling for completion anyway.
   
   I think that's different because a disk will normally respond
   quickly. So it polls a bit, but then it stops as
   there are no outstanding requests.
   
   -- 
   MST
  
  --
  Gleb.


Dev Passthrough QEMU patch

2013-06-11 Thread Mario Smarduch


This patch is for testing only and goes along with the other
two patches for priodrop and dev passthrough; it should apply against
1.4.5.

diff --git a/cpus.c b/cpus.c
index c15ff6c..0c19214 100644
--- a/cpus.c
+++ b/cpus.c
@@ -737,6 +737,26 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
 CPUState *cpu = ENV_GET_CPU(env);
 int r;
 
+    /* For now just do a 1:1 vCPU binding as they come online for device
+     * pass through
+     */
+    cpu_set_t cpuset;
+    int ret, i;
+    unsigned long cpu_index = kvm_arch_vcpu_id(cpu);
+
+    CPU_ZERO(&cpuset);
+    CPU_SET(cpu_index, &cpuset);
+    ret = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
+    if(ret != 0) {
+        printf("pthread_setaffinity_np failed to setaffinity to CPU 0\n");
+        exit(-1);
+    }
+
+    CPU_ZERO(&cpuset);
+    pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
+    if(CPU_ISSET(cpu_index, &cpuset))
+        printf("Binding: vCPU %ld -- CPU %d\n", cpu_index, i);
+
     qemu_mutex_lock(&qemu_global_mutex);
     qemu_thread_get_self(cpu->thread);
     cpu->thread_id = qemu_get_thread_id();
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index caca979..46c2c59 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -904,6 +904,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_GET_HTAB_FD  _IOW(KVMIO,  0xaa, struct kvm_get_htab_fd)
 /* Available with KVM_CAP_ARM_SET_DEVICE_ADDR */
 #define KVM_ARM_SET_DEVICE_ADDR      _IOW(KVMIO,  0xab, struct kvm_arm_device_addr)
+#define KVM_ARM_GET_DEVICE_RESOURCES _IOW(KVMIO,  0xe1, struct kvm_arm_get_device_resources)
+#define KVM_ARM_ASSIGN_DEVICE        _IOW(KVMIO,  0xe2, struct kvm_arm_assigned_device)
 
 /*
  * ioctls for vcpu fds
@@ -1013,6 +1015,7 @@ struct kvm_assigned_irq {
};
 };
 
+
 struct kvm_assigned_msix_nr {
__u32 assigned_dev_id;
__u16 entry_nr;
@@ -1027,4 +1030,33 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+
+/* MAX 6 MMIO resources per device */
+#define MAX_RES_PER_DEVICE  6
+struct kvm_arm_get_device_resources {
+    char    devname[128];
+    __u32   resource_cnt;
+    struct {
+        __u64   hpa;
+        __u32   size;
+        __u32   attr;
+        char    host_name[64];
+    } host_resources[MAX_RES_PER_DEVICE];
+    struct {
+        __u32   hwirq;
+        __u32   attr;
+        char    host_name[64];
+    } hostirq;
+};
+
+struct kvm_guest_device_resources {
+__u64   gpa[MAX_RES_PER_DEVICE];
+__u32   girq;
+};
+
+struct kvm_arm_assigned_device {
+struct  kvm_arm_get_device_resources dev_res;
+struct  kvm_guest_device_resources guest_res;
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/target-arm/Makefile.objs b/target-arm/Makefile.objs
index d89b57c..9aee84e 100644
--- a/target-arm/Makefile.objs
+++ b/target-arm/Makefile.objs
@@ -1,5 +1,5 @@
 obj-y += arm-semi.o
 obj-$(CONFIG_SOFTMMU) += machine.o
-obj-$(CONFIG_KVM) += kvm.o
+obj-$(CONFIG_KVM) += kvm.o device-assign.o
 obj-y += translate.o op_helper.o helper.o cpu.o
 obj-y += neon_helper.o iwmmxt_helper.o
diff --git a/target-arm/device-assign.c b/target-arm/device-assign.c
new file mode 100644
index 000..e4d0e97
--- /dev/null
+++ b/target-arm/device-assign.c
@@ -0,0 +1,118 @@
+
+#include "hw/sysbus.h"
+#include "qemu-common.h"
+#include "hw/qdev.h"
+#include "hw/ptimer.h"
+#include "kvm_arm.h"
+#include "qemu/error-report.h"
+
+#define IORESOURCE_TYPE_BITS    0x1f00  /* Resource type */
+#define IORESOURCE_IO           0x0100  /* PCI/ISA I/O ports */
+#define IORESOURCE_MEM          0x0200
+#define IORESOURCE_REG          0x0300  /* Register offsets */
+#define IORESOURCE_IRQ          0x0400
+#define IORESOURCE_DMA          0x0800
+
+#define IORESOURCE_PREFETCH     0x2000  /* No side effects */
+#define IORESOURCE_READONLY     0x4000
+#define IORESOURCE_CACHEABLE    0x8000
+
+typedef struct {
+SysBusDevice busdev;
+char   *devname;
+uint64_t   hpa, gpa;
+uint32_t   dev_size;
+uint32_t   hirq,girq;
+} AssignedDevice;
+
+static Property device_assign_properties[] = {
+    DEFINE_PROP_STRING("host", AssignedDevice, devname),
+    DEFINE_PROP_UINT64("hpa", AssignedDevice, hpa, 0),
+    DEFINE_PROP_UINT64("gpa", AssignedDevice, gpa, 0),
+    DEFINE_PROP_UINT32("size", AssignedDevice, dev_size, 0),
+    DEFINE_PROP_UINT32("hostirq", AssignedDevice, hirq, 0),
+    DEFINE_PROP_UINT32("guestirq", AssignedDevice, girq, 0),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static int assign_device(AssignedDevice *dev)
+{
+int ret,i;
+struct kvm_arm_get_device_resources dev_res;
+struct kvm_arm_assigned_device dev_assigned;
+char *p, c='-';
+
+    memset(&dev_res,0,sizeof(dev_res));
+    memset(&dev_assigned,0,sizeof(dev_assigned));
+
+    if((p = strstr(dev->devname, (char *)c)) != (char *) NULL)
+       *p = ',';
+

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Michael S. Tsirkin
On Tue, Jun 11, 2013 at 11:03:50AM +0300, Gleb Natapov wrote:
 On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote:
  On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
   On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
 Gleb Natapov g...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
  H. Peter Anvin h...@zytor.com writes:
  
   On 06/05/2013 03:08 PM, Anthony Liguori wrote:
  
   Definitely an option.  However, we want to be able to boot 
   from native
   devices, too, so having an I/O BAR (which would not be used by 
   the OS
   driver) should still at the very least be an option.
   
   What makes it so difficult to work with an MMIO bar for PCI-e?
   
   With legacy PCI, tracking allocation of MMIO vs. PIO is pretty 
   straight
   forward.  Is there something special about PCI-e here?
   
  
   It's not tracking allocation.  It is that accessing memory above 
   1 MiB
   is incredibly painful in the BIOS environment, which basically 
   means
   MMIO is inaccessible.
  
  Oh, you mean in real mode.
  
  SeaBIOS runs the virtio code in 32-bit mode with a flat memory 
  layout.
  There are loads of ASSERT32FLAT()s in the code to make sure of 
  this.
  
  Well, not exactly. Initialization is done in 32bit, but disk
  reads/writes are done in 16bit mode since it should work from int13
  interrupt handler. The only way I know to access MMIO bars from 16 
  bit
  is to use SMM which we do not have in KVM.
 
 Ah, if it's just the dataplane operations then there's another 
 solution.
 
 We can introduce a virtqueue flag that asks the backend to poll for 
 new
 requests.  Then SeaBIOS can add the request to the queue and not worry
 about kicking or reading the ISR.

This will pin a host CPU.
If we do something timer based it will likely
both increase host CPU utilization and slow device down.

If we didn't care about performance at all we could
do config cycles for signalling, which is much
more elegant than polling in host, but I don't think
that's the case.

   I wouldn't call BIOS int13 interface performance critical.
  
  So the plan always was to
  - add an MMIO BAR
  - add a register for pci-config based access to devices
  
  hpa felt performance does matter there but didn't clarify why ...
  
 You do not want to make it too slow, obviously; this is the interface that is
 used to load the OS during boot.

And possibly installation?

 
 SeaBIOS is polling for completion anyway.

I think that's different because a disk will normally respond
quickly. So it polls a bit, but then it stops as
there are no outstanding requests.

-- 
MST
   
   --
 Gleb.
 
 --
   Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Gleb Natapov
On Tue, Jun 11, 2013 at 11:19:46AM +0300, Michael S. Tsirkin wrote:
 On Tue, Jun 11, 2013 at 11:03:50AM +0300, Gleb Natapov wrote:
  On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote:
   On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
 On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
  Gleb Natapov g...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
   H. Peter Anvin h...@zytor.com writes:
   
On 06/05/2013 03:08 PM, Anthony Liguori wrote:
   
Definitely an option.  However, we want to be able to boot 
from native
devices, too, so having an I/O BAR (which would not be used 
by the OS
driver) should still at the very least be an option.

What makes it so difficult to work with an MMIO bar for PCI-e?

With legacy PCI, tracking allocation of MMIO vs. PIO is 
pretty straight
forward.  Is there something special about PCI-e here?

   
It's not tracking allocation.  It is that accessing memory 
above 1 MiB
is incredibly painful in the BIOS environment, which basically 
means
MMIO is inaccessible.
   
   Oh, you mean in real mode.
   
   SeaBIOS runs the virtio code in 32-bit mode with a flat memory 
   layout.
   There are loads of ASSERT32FLAT()s in the code to make sure of 
   this.
   
   Well, not exactly. Initialization is done in 32bit, but disk
   reads/writes are done in 16bit mode since it should work from 
   int13
   interrupt handler. The only way I know to access MMIO bars from 
   16 bit
   is to use SMM which we do not have in KVM.
  
  Ah, if it's just the dataplane operations then there's another 
  solution.
  
  We can introduce a virtqueue flag that asks the backend to poll for 
  new
  requests.  Then SeaBIOS can add the request to the queue and not 
  worry
  about kicking or reading the ISR.
 
 This will pin a host CPU.
 If we do something timer based it will likely
 both increase host CPU utilization and slow device down.
 
 If we didn't care about performance at all we could
 do config cycles for signalling, which is much
 more elegant than polling in host, but I don't think
 that's the case.
 
I wouldn't call BIOS int13 interface performance critical.
   
   So the plan always was to
   - add an MMIO BAR
   - add a register for pci-config based access to devices
   
   hpa felt performance does matter there but didn't clarify why ...
   
  You do not want to make it too slow, obviously; this is the interface that is
  used to load the OS during boot.
 
 And possibly installation?
 
Only the stage that reads files from the CDROM. IIRC the actual installation
runs with native drivers. This is why Windows asks you to provide a floppy
with a driver at a very early stage of installation.

--
Gleb.


Re: [PATCH 2/2] add initial kvm dev passhtrough support

2013-06-11 Thread Alexander Graf

On 11.06.2013 at 09:43, Mario Smarduch mario.smard...@huawei.com wrote:

 
 This is the initial device pass through support.
 At this time host == guest only is supported.
 Basic Operation:
 
 - QEMU parameters: -device kvm-device-assign,host=device name
  for example - kvm-device-assign,host='arm-sp804'. Essentially
  any device that does PIO should be supported.

Yikes!

Over the last few years we've worked very hard to get rid of the unfortunate 
intertwining of device assignment and KVM. There are a number of reasons it's a 
bad idea:

  - kvm access is a potential privilege escalation
  - device assignment is limited to kvm

The solution to both of the above is VFIO. You get a completely separate 
interface for accessing your devices with a few connecting bits (irqfd, 
eventfd) to communicate quickly between vfio and kvm.
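
(Illustration only, not from the patch under review.) The eventfd primitive that irqfd builds on is just a kernel-maintained counter behind a file descriptor. The sketch below exercises that counter from user space on Linux; it does not touch the VFIO or KVM ioctls, and `demo_eventfd_roundtrip()` is a made-up helper name.

```c
#include <assert.h>	/* only needed by callers that check the result */
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Signal an eventfd `times` times, then drain it with one read.
 * Returns the accumulated counter, 0 if times is 0, or -1 on error.
 */
static long demo_eventfd_roundtrip(unsigned int times)
{
	uint64_t one = 1, total = 0;
	long result = -1;
	unsigned int i;
	int fd;

	if (times == 0)
		return 0;	/* reading a zero counter would block */

	fd = eventfd(0, 0);
	if (fd < 0)
		return -1;

	for (i = 0; i < times; i++)
		if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one))
			goto out;

	/* A single read returns the sum of all writes and resets it. */
	if (read(fd, &total, sizeof(total)) == (ssize_t)sizeof(total))
		result = (long)total;
out:
	close(fd);
	return result;
}
```

In the irqfd case the writer would be the device side and the reader would be KVM, so neither needs to know about the other's interface.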

Is there any particular reason you're not going down that path for your ARM 
implementation?

On the embedded PPC side we've been discussing vfio and how it fits into a 
device tree, non-PCI world for a while. If you like, we can dive into more 
detail on that, either via email or via phone.


Alex



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Michael S. Tsirkin
On Tue, Jun 11, 2013 at 11:22:37AM +0300, Gleb Natapov wrote:
 On Tue, Jun 11, 2013 at 11:19:46AM +0300, Michael S. Tsirkin wrote:
  On Tue, Jun 11, 2013 at 11:03:50AM +0300, Gleb Natapov wrote:
   On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote:
On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
 On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
  On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
   Gleb Natapov g...@redhat.com writes:
   
On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 03:08 PM, Anthony Liguori wrote:

 Definitely an option.  However, we want to be able to boot 
 from native
 devices, too, so having an I/O BAR (which would not be 
 used by the OS
 driver) should still at the very least be an option.
 
 What makes it so difficult to work with an MMIO bar for 
 PCI-e?
 
 With legacy PCI, tracking allocation of MMIO vs. PIO is 
 pretty straight
 forward.  Is there something special about PCI-e here?
 

 It's not tracking allocation.  It is that accessing memory 
 above 1 MiB
 is incredibly painful in the BIOS environment, which 
 basically means
 MMIO is inaccessible.

Oh, you mean in real mode.

SeaBIOS runs the virtio code in 32-bit mode with a flat memory 
layout.
There are loads of ASSERT32FLAT()s in the code to make sure of 
this.

Well, not exactly. Initialization is done in 32bit, but disk
reads/writes are done in 16bit mode since it should work from 
int13
interrupt handler. The only way I know to access MMIO bars from 
16 bit
is to use SMM which we do not have in KVM.
   
   Ah, if it's just the dataplane operations then there's another 
   solution.
   
   We can introduce a virtqueue flag that asks the backend to poll 
   for new
   requests.  Then SeaBIOS can add the request to the queue and not 
   worry
   about kicking or reading the ISR.
  
  This will pin a host CPU.
  If we do something timer based it will likely
  both increase host CPU utilization and slow device down.
  
  If we didn't care about performance at all we could
  do config cycles for signalling, which is much
  more elegant than polling in host, but I don't think
  that's the case.
  
 I wouldn't call BIOS int13 interface performance critical.

So the plan always was to
- add an MMIO BAR
- add a register for pci-config based access to devices

hpa felt performance does matter there but didn't clarify why ...

   You do not want to make it too slow, obviously; this is the interface that is
   used to load the OS during boot.
  
  And possibly installation?
  
 Only the stage that reads files from the CDROM. IIRC the actual installation
 runs with native drivers. This is why Windows asks you to provide a floppy
 with a driver at a very early stage of installation.

Have any numbers to tell us how much time is spent there?
E.g. if it's slowed down by a factor of 2, is it a problem?
How about a factor of 10?

 --
   Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-11 Thread Gleb Natapov
On Tue, Jun 11, 2013 at 11:30:11AM +0300, Michael S. Tsirkin wrote:
 On Tue, Jun 11, 2013 at 11:22:37AM +0300, Gleb Natapov wrote:
  On Tue, Jun 11, 2013 at 11:19:46AM +0300, Michael S. Tsirkin wrote:
   On Tue, Jun 11, 2013 at 11:03:50AM +0300, Gleb Natapov wrote:
On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote:
 On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote:
  On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote:
   On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
Gleb Natapov g...@redhat.com writes:

 On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori 
 wrote:
 H. Peter Anvin h...@zytor.com writes:
 
  On 06/05/2013 03:08 PM, Anthony Liguori wrote:
 
  Definitely an option.  However, we want to be able to 
  boot from native
  devices, too, so having an I/O BAR (which would not be 
  used by the OS
  driver) should still at the very least be an option.
  
  What makes it so difficult to work with an MMIO bar for 
  PCI-e?
  
  With legacy PCI, tracking allocation of MMIO vs. PIO is 
  pretty straight
  forward.  Is there something special about PCI-e here?
  
 
  It's not tracking allocation.  It is that accessing memory 
  above 1 MiB
  is incredibly painful in the BIOS environment, which 
  basically means
  MMIO is inaccessible.
 
 Oh, you mean in real mode.
 
 SeaBIOS runs the virtio code in 32-bit mode with a flat 
 memory layout.
 There are loads of ASSERT32FLAT()s in the code to make sure 
 of this.
 
 Well, not exactly. Initialization is done in 32bit, but disk
 reads/writes are done in 16bit mode since it should work from 
 int13
 interrupt handler. The only way I know to access MMIO bars 
 from 16 bit
 is to use SMM which we do not have in KVM.

Ah, if it's just the dataplane operations then there's another 
solution.

We can introduce a virtqueue flag that asks the backend to poll 
for new
requests.  Then SeaBIOS can add the request to the queue and 
not worry
about kicking or reading the ISR.
   
   This will pin a host CPU.
   If we do something timer based it will likely
   both increase host CPU utilization and slow device down.
   
   If we didn't care about performance at all we could
   do config cycles for signalling, which is much
   more elegant than polling in host, but I don't think
   that's the case.
   
  I wouldn't call BIOS int13 interface performance critical.
 
 So the plan always was to
 - add an MMIO BAR
 - add a register for pci-config based access to devices
 
 hpa felt performance does matter there but didn't clarify why ...
 
You do not want to make it too slow, obviously; this is the interface that is
used to load the OS during boot.
   
   And possibly installation?
   
  Only the stage that reads files from the CDROM. IIRC the actual installation
  runs with native drivers. This is why Windows asks you to provide a floppy
  with a driver at a very early stage of installation.
 
 Have any numbers to tell us how much time is spent there?
 E.g. if it's slowed down by a factor of 2, is it a problem?
 How about a factor of 10?
 
No I do not.

--
Gleb.


Re: [PATCH v3 3/6] KVM: MMU: make return value of mmio page fault handler more readable

2013-06-11 Thread Gleb Natapov
On Mon, Jun 10, 2013 at 10:16:04PM +0900, Takuya Yoshikawa wrote:
 On Mon, 10 Jun 2013 10:57:50 +0300
 Gleb Natapov g...@redhat.com wrote:
 
  On Fri, Jun 07, 2013 at 04:51:25PM +0800, Xiao Guangrong wrote:
 
   +
   +/*
   + * Return values of handle_mmio_page_fault_common:
   + * RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the 
   instruction
   + *directly.
   + * RET_MMIO_PF_RETRY: let CPU fault again on the address.
   + * RET_MMIO_PF_BUG: bug is detected.
   + */
   +enum {
   + RET_MMIO_PF_EMULATE = 1,
   + RET_MMIO_PF_RETRY = 0,
   + RET_MMIO_PF_BUG = -1
   +};
  I would order them from -1 to 1 and rename RET_MMIO_PF_BUG to
  RET_MMIO_PF_ERROR, but no need to resend just for that.
 
 Why not just let compilers select the values? -- It's an enum.
 Any reason to make it start from -1?
 
I am fine with this too as an additional patch. It makes sense to preserve
original values like Xiao did for initial patch, since it is easier to
verify that the patch is just a mechanical change.
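
(Illustration only.) A small sketch of why keeping the numeric values makes the conversion easy to verify. The first enum mirrors the quoted patch; the `AUTO_*` names and `old_style_check()` are hypothetical and only stand in for pre-existing callers that still compare against raw integers.

```c
#include <assert.h>	/* only needed by callers that check the result */

/* Values copied from the quoted patch: they match the integers the
 * handler returned before the enum existed. */
enum {
	RET_MMIO_PF_EMULATE = 1,
	RET_MMIO_PF_RETRY = 0,
	RET_MMIO_PF_BUG = -1
};

/* Hypothetical alternative where the compiler picks 0, 1, 2. */
enum {
	AUTO_MMIO_PF_EMULATE,
	AUTO_MMIO_PF_RETRY,
	AUTO_MMIO_PF_BUG
};

/* Hypothetical legacy caller that still tests the raw integer:
 * negative means bug, anything else is passed through unchanged. */
static int old_style_check(int ret)
{
	if (ret < 0)
		return -1;
	return ret;
}
```

With the explicit values, `old_style_check(RET_MMIO_PF_BUG)` still reports the bug. With compiler-chosen values, `AUTO_MMIO_PF_BUG` is 2 and slips through, so every caller would have to be audited in the same patch.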

--
Gleb.


Re: [PATCH net 1/2] vhost: check owner before we overwrite ubuf_info

2013-06-11 Thread David Miller
From: Michael S. Tsirkin m...@redhat.com
Date: Thu, 6 Jun 2013 15:20:39 +0300

 If device has an owner, we shouldn't touch ubuf_info
 since it might be in use.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Applied.


Re: [PATCH net 2/2] vhost: fix ubuf_info cleanup

2013-06-11 Thread David Miller
From: Michael S. Tsirkin m...@redhat.com
Date: Thu, 6 Jun 2013 15:20:46 +0300

 vhost_net_clear_ubuf_info didn't clear ubuf_info
 after kfree, this could trigger double free.
 Fix this and simplify this code to make it more robust: make sure
 ubuf info is always freed through vhost_net_clear_ubuf_info.
 
 Reported-by: Tommi Rantala tt.rant...@gmail.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Applied.


Re: [PATCH v3 03/13] nEPT: Add EPT tables support to paging_tmpl.h

2013-06-11 Thread Gleb Natapov
On Tue, May 21, 2013 at 03:52:12PM +0800, Xiao Guangrong wrote:
 On 05/19/2013 12:52 PM, Jun Nakajima wrote:
  From: Nadav Har'El n...@il.ibm.com
  
  This is the first patch in a series which adds nested EPT support to KVM's
  nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can 
  use
  EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest
  to set its own cr3 and take its own page faults without either of L0 or L1
  getting involved. This often significantly improves L2's performance over 
  the
  previous two alternatives (shadow page tables over EPT, and shadow page
  tables over shadow page tables).
  
  This patch adds EPT support to paging_tmpl.h.
  
  paging_tmpl.h contains the code for reading and writing page tables. The 
  code
  for 32-bit and 64-bit tables is very similar, but not identical, so
  paging_tmpl.h is #include'd twice in mmu.c, once with PTTYPE=32 and once
  with PTTYPE=64, and this generates the two sets of similar functions.
  
  There are subtle but important differences between the format of EPT tables
  and that of ordinary x86 64-bit page tables, so for nested EPT we need a
  third set of functions to read the guest EPT table and to write the shadow
  EPT table.
  
  So this patch adds third PTTYPE, PTTYPE_EPT, which creates functions 
  (prefixed
  with EPT) which correctly read and write EPT tables.
  
  Signed-off-by: Nadav Har'El n...@il.ibm.com
  Signed-off-by: Jun Nakajima jun.nakaj...@intel.com
  Signed-off-by: Xinhao Xu xinhao...@intel.com
  ---
   arch/x86/kvm/mmu.c |  5 +
   arch/x86/kvm/paging_tmpl.h | 43 +--
   2 files changed, 46 insertions(+), 2 deletions(-)
  
  diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
  index 117233f..6c1670f 100644
  --- a/arch/x86/kvm/mmu.c
  +++ b/arch/x86/kvm/mmu.c
  @@ -3397,6 +3397,11 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu, 
  unsigned level, unsigned gp
  return mmu->last_pte_bitmap & (1 << index);
   }
  
  +#define PTTYPE_EPT 18 /* arbitrary */
  +#define PTTYPE PTTYPE_EPT
  +#include "paging_tmpl.h"
  +#undef PTTYPE
  +
   #define PTTYPE 64
   #include "paging_tmpl.h"
   #undef PTTYPE
  diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
  index df34d4a..4c45654 100644
  --- a/arch/x86/kvm/paging_tmpl.h
  +++ b/arch/x86/kvm/paging_tmpl.h
  @@ -50,6 +50,22 @@
  #define PT_LEVEL_BITS PT32_LEVEL_BITS
  #define PT_MAX_FULL_LEVELS 2
  #define CMPXCHG cmpxchg
  +#elif PTTYPE == PTTYPE_EPT
  +   #define pt_element_t u64
  +   #define guest_walker guest_walkerEPT
  +   #define FNAME(name) EPT_##name
  +   #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
  +   #define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
  +   #define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
  +   #define PT_INDEX(addr, level) PT64_INDEX(addr, level)
  +   #define PT_LEVEL_BITS PT64_LEVEL_BITS
  +   #ifdef CONFIG_X86_64
  +   #define PT_MAX_FULL_LEVELS 4
  +   #define CMPXCHG cmpxchg
  +   #else
  +   #define CMPXCHG cmpxchg64
 
 CMPXCHG is only used in FNAME(cmpxchg_gpte), but you commented it later.
 Do we really need it?
 
  +   #define PT_MAX_FULL_LEVELS 2
 
 And the SDM says:
 
 "It uses a page-walk length of 4, meaning that at most 4 EPT paging-structure
 entries are accessed to translate a guest-physical address." Is my SDM
 obsolete?
 Which kind of processor supports page-walk length = 2?
 
 It seems your patch is not able to handle the case where the guest uses
 walk-length = 2
 while running on a host with walk-length = 4.
 (Please refer to how sp->role.quadrant is handled in FNAME(get_level1_sp_gpa)
 in the current code.)
 
But since EPT always has 4 levels on all existing cpus it is not an issue, and
the only case that we should worry about is
guest walk-length == host walk-length == 4, or have I
misunderstood what you mean here?
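
(Illustration only, for readers who have not met the paging_tmpl.h trick described in the quoted commit message.) The header is included several times, each time with different macro definitions, so one source text yields several families of functions. The sketch below approximates this in a single file with a generator macro instead of a re-included header. `DEFINE_INDEX_FN`, the `demo32`/`demo64` prefixes, and the bit widths are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One "template" body, expanded once per page-table flavour.
 * prefix     - pasted onto the function names (like FNAME() in the header)
 * elem_t     - type of one table entry (like pt_element_t)
 * level_bits - index bits consumed per level (like PT_LEVEL_BITS)
 */
#define DEFINE_INDEX_FN(prefix, elem_t, level_bits)                     \
	static unsigned prefix##_index(uint64_t addr, unsigned level)   \
	{                                                               \
		unsigned shift;                                         \
		assert(level >= 1);                                     \
		shift = 12 + (level - 1) * (level_bits);                \
		return (unsigned)((addr >> shift) &                     \
				  ((1u << (level_bits)) - 1));          \
	}                                                               \
	static unsigned prefix##_entry_size(void)                       \
	{                                                               \
		return (unsigned)sizeof(elem_t);                        \
	}

/* Two instantiations, analogous to including the header twice. */
DEFINE_INDEX_FN(demo32, uint32_t, 10)
DEFINE_INDEX_FN(demo64, uint64_t, 9)
```

A third instantiation with its own prefix would correspond to the PTTYPE_EPT case: same walker text, different entry format macros.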

--
Gleb.


[GIT PULL] KVM fixes for 3.10-rc5

2013-06-11 Thread Gleb Natapov

Linus,

Please pull from

git://git.kernel.org/pub/scm/virt/kvm/kvm.git fixes

To receive the following KVM bug fixes. There is one more fix for MIPS
KVM ABI here, MIPS and PPC build breakage fixes and a couple of PPC bug fixes.


David Daney (2):
  kvm: Add definition of KVM_REG_MIPS
  mips/kvm: Use KVM_REG_MIPS and proper size indicators for *_ONE_REG

James Hogan (1):
  KVM: add kvm_para_available to asm-generic/kvm_para.h

Mihai Caraman (1):
  kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage

Scott Wood (3):
  kvm/ppc/booke64: Disable e6500 support
  kvm/ppc/booke: Hold srcu lock when calling gfn functions
  kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()

 arch/mips/include/uapi/asm/kvm.h   |   81 +--
 arch/mips/kvm/kvm_mips.c   |   83 +++-
 arch/powerpc/include/asm/kvm_asm.h |   16 ---
 arch/powerpc/kvm/44x_tlb.c |5 +++
 arch/powerpc/kvm/booke.c   |   18 
 arch/powerpc/kvm/e500_mmu.c|5 +++
 arch/powerpc/kvm/e500mc.c  |2 -
 include/asm-generic/kvm_para.h |5 +++
 include/uapi/linux/kvm.h   |1 +
 9 files changed, 137 insertions(+), 79 deletions(-)

--
Gleb.


Re: [PATCH 1/2] add irq priodrop support

2013-06-11 Thread Grant Likely
On Tue, 11 Jun 2013 09:37:24 +0200, Mario Smarduch mario.smard...@huawei.com 
wrote:
 This is the same Interrupt Priority Drop/Deactivation patch 
 emailed some time back (except for 3.10-rc4) used by the initial 
 device pass-through support. 
 
 When enabled all IRQs on host write to distributor EOIR and 
 DIR reg to de-prioritize/de-activate an interrupt. For device 
 that's passed through only the EOIR is written
 to drop the priority, the Guest deactivates it when 
 it handles its EOI. This supports exitless EOI that's agnostic
 to bus type (i.e. PCI)
 
 The patch has been tested for all configurations:
 Host: No Prio Drop  Guest: No Prio Drop
 Host: Prio DROP   Guest: No Prio Drop
 Host: Prio Drop Guest: Prio Drop 
 
 - Mario
 
 Signed-off-by: Mario Smarduch mario.smard...@huawei.com

Hi Mario,

Comments below. I'm rather weak on how irq passthough is intended to
work, so I don't have a lot of comments on that, but I did notice some
things in this patch that should be addressed.

 ---
  arch/arm/kvm/Kconfig|8 +++
  drivers/irqchip/irq-gic.c   |  145 
 ++-
  include/linux/irqchip/arm-gic.h |6 ++
  3 files changed, 156 insertions(+), 3 deletions(-)
 
 diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
 index 370e1a8..c0c9f3c 100644
 --- a/arch/arm/kvm/Kconfig
 +++ b/arch/arm/kvm/Kconfig
 @@ -59,6 +59,14 @@ config KVM_ARM_VGIC
   ---help---
 Adds support for a hardware assisted, in-kernel GIC emulation.
  
 +config KVM_ARM_INT_PRIO_DROP
 +bool "KVM support for Interrupt pass-through"
 +depends on KVM_ARM_VGIC && OF
 +default n
 +---help---
 +  Seperates interrupt priority drop and deactivation to enable device
 +  pass-through to Guests.
 +

Nit: check your whitespace (tabs vs. spaces)

  config KVM_ARM_TIMER
   bool "KVM support for Architected Timers"
   depends on KVM_ARM_VGIC && ARM_ARCH_TIMER
 diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
 index 1760ceb..9fb4ef3 100644
 --- a/drivers/irqchip/irq-gic.c
 +++ b/drivers/irqchip/irq-gic.c
 @@ -41,10 +41,13 @@
  #include <linux/slab.h>
  #include <linux/irqchip/chained_irq.h>
  #include <linux/irqchip/arm-gic.h>
 +#include <linux/irqflags.h>
 +#include <linux/bitops.h>
  
  #include <asm/irq.h>
  #include <asm/exception.h>
  #include <asm/smp_plat.h>
 +#include <asm/virt.h>
  
  #include "irqchip.h"
  
 @@ -99,6 +102,20 @@ struct irq_chip gic_arch_extn = {
  
  static struct gic_chip_data gic_data[MAX_GIC_NR] __read_mostly;
  
 +#ifdef CONFIG_KVM_ARM_INT_PRIO_DROP
 +/*
 + * Priority drop/deactivation bit map, 1st 16 bits used for SGIs, this bit 
 map
 + * is shared by several guests. If bit is set only execute EOI which drops
 + * current priority but not deactivation.
 + */
 +static u32  gic_irq_prio_drop[DIV_ROUND_UP(1020, 32)] __read_mostly;

I believe it is possible to have more than one GIC in a system. This map
assumes only one. The prio_drop map should probably be part of
gic_chip_data so that it is per-instance.

Also, as discussed below, the code should be using DECLARE_BITMAP()
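
(Illustration only.) A rough user-space sketch of the per-instance layout suggested above. The kernel's DECLARE_BITMAP() and bit helpers are not available outside the kernel, so the `DEMO_*` macros and `demo_*` functions here are local stand-ins, and `struct demo_gic` is a hypothetical substitute for struct gic_chip_data.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_GIC_NR_IRQS 1020
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) \
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)
/* User-space stand-in for the kernel's DECLARE_BITMAP(name, bits). */
#define DEMO_DECLARE_BITMAP(name, bits) \
	unsigned long name[DEMO_BITS_TO_LONGS(bits)]

/* One priority-drop map per GIC instance instead of one global array. */
struct demo_gic {
	DEMO_DECLARE_BITMAP(prio_drop, DEMO_GIC_NR_IRQS);
};

static void demo_set_prio_drop(struct demo_gic *gic, unsigned int irq)
{
	assert(irq < DEMO_GIC_NR_IRQS);
	gic->prio_drop[irq / DEMO_BITS_PER_LONG] |=
		1UL << (irq % DEMO_BITS_PER_LONG);
}

static bool demo_test_prio_drop(const struct demo_gic *gic, unsigned int irq)
{
	assert(irq < DEMO_GIC_NR_IRQS);
	return (gic->prio_drop[irq / DEMO_BITS_PER_LONG] >>
		(irq % DEMO_BITS_PER_LONG)) & 1UL;
}
```

Marking an interrupt on one instance leaves every other instance's map untouched, which the single file-scope array cannot guarantee.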

 +static void gic_eoi_irq_priodrop(struct irq_data *);
 +#endif
 +
 +static void gic_enable_gicc(void __iomem *);
 +static void gic_eoi_sgi(u32, void __iomem *);
 +static void gic_priodrop_remap_eoi(struct irq_chip *);
 +

The typical pattern here is to actually define the static functions
above the code that uses them so that forward declarations are not
required.

  #ifdef CONFIG_GIC_NON_BANKED
  static void __iomem *gic_get_percpu_base(union gic_base *base)
  {
 @@ -296,7 +313,7 @@ static asmlinkage void __exception_irq_entry 
 gic_handle_irq(struct pt_regs *regs
   continue;
   }
   if (irqnr < 16) {
 - writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
 + gic_eoi_sgi(irqstat, cpu_base);
  #ifdef CONFIG_SMP
   handle_IPI(irqnr, regs);
  #endif
 @@ -450,7 +467,7 @@ static void __cpuinit gic_cpu_init(struct gic_chip_data 
 *gic)
   writel_relaxed(0xa0a0a0a0, dist_base + GIC_DIST_PRI + i * 4 / 
 4);
  
   writel_relaxed(0xf0, base + GIC_CPU_PRIMASK);
 - writel_relaxed(1, base + GIC_CPU_CTRL);
 + gic_enable_gicc(base);
  }
  
  #ifdef CONFIG_CPU_PM
 @@ -585,7 +602,7 @@ static void gic_cpu_restore(unsigned int gic_nr)
   writel_relaxed(0xa0a0a0a0, dist_base + GIC_DIST_PRI + i * 4);
  
   writel_relaxed(0xf0, cpu_base + GIC_CPU_PRIMASK);
 - writel_relaxed(1, cpu_base + GIC_CPU_CTRL);
 + gic_enable_gicc(cpu_base);
  }
  
  static int gic_notifier(struct notifier_block *self, unsigned long cmd,  
 void *v)
 @@ -666,6 +683,7 @@ void gic_raise_softirq(const struct cpumask *mask, 
 unsigned int irq)
  static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
   irq_hw_number_t hw)
  {
 + gic_priodrop_remap_eoi(&gic_chip);

gic_priodrop_remap_eoi() 

Re: [PATCH 2/2] add initial kvm dev passhtrough support

2013-06-11 Thread Mario Smarduch

On 6/11/2013 10:28 AM, Alexander Graf wrote:

 
 Is there any particular reason you're not going down that path for your ARM 
 implementation?

We see this as a good starting point to build on, we need baseline numbers
for performance, latency, interrupt throughput on real hardware
ASAP to build competency for NFV, which has demanding Dev. Passthrough
requirements. Over time we plan contributing to SMMU and VFIO as well
(we're looking into this now).

FYI NFV is an initiative wireless/fixed network operators are working 
towards - to virtualize Core, likely Radio Access and even Home Network 
equipment, this is an epic undertaking (i.e. Network Function Virtualization). 
So far VMware has taken the lead (mostly x86).
 
 
 On the embedded PPC side we've been discussing vfio and how it fits into a 
 device tree, non-PCI world for a while. If you like, we can dive into more 
 detail on that, either via email or via phone.

I'll email you offline, I'd like to know more what you've done on this
and see where we can align/leverage the effort.

- Mario
 
 
 Alex
 




KVM minutes for 2013-06-11

2013-06-11 Thread Juan Quintela



2013-06-11
--

- move ACPI table generation to QEMU
  - code sharing with SEABIOS
  - easier to generate there
Anthony: it is the same put in QEMU or SEABIOS

Michael: there is some information not easily available in
seabios (hot plug)

Anthony: transfer QOM tree to SEABIOS,  current interface shows its age.

  - information hardcoded that change over time
this is easier in qemu

  Example bus device number:
maintain device number stable over migration

  It is easier to maintain the mostly static tables in QEMU.

  QEMU knows the whole device tree,  so it is easy to generate.

  Where are we now?  Do we have enough ACPI support in QEMU?
  Anthony wants a mergeable tree before starting.  Still think it is the wrong 
approach.
  Create a new serial port and enable it through ACPI?

  Using iasl at run time and source table is problematic.  iasl doesn't
  work on big endian hosts.

- VFIO?
  How to do it (Alex),  will be discussed on the list

Later,  Juan.


Re: [PATCH 2/2] add initial kvm dev passhtrough support

2013-06-11 Thread Alex Williamson
On Tue, 2013-06-11 at 16:13 +0200, Mario Smarduch wrote:
 On 6/11/2013 10:28 AM, Alexander Graf wrote:
 
  
  Is there any particular reason you're not going down that path for your ARM 
  implementation?
 
 We see this as a good starting point to build on, we need baseline numbers
 for performance, latency, interrupt throughput on real hardware
 ASAP to build competency for NFV, which has demanding Dev. Passthrough
 requirements. Over time we plan contributing to SMMU and VFIO as well
 (we're looking into this now).
 
 FYI NFV is an initiative wireless/fixed network operators are working 
 towards - to virtualize Core, likely Radio Access and even Home Network 
 equipment, this is an epic undertaking (i.e. Network Function Virtualization). 
 So far VMware has taken the lead (mostly x86).
  
  
  On the embedded PPC side we've been discussing vfio and how it fits into a 
  device tree, non-PCI world for a while. If you like, we can dive into more 
  detail on that, either via email or via phone.
 
 I'll email you offline, I'd like to know more what you've done on this
 and see where we can align/leverage the effort.

Yes, please let's use VFIO rather than continue to use or invent new
device assignment interfaces for KVM.  Antonios Motakis (cc'd) already
contacted me about VFIO for ARM.  IIRC, his initial impression was that
the IOMMU backend was almost entirely reusable for ARM (a couple PCI
assumptions implicit in the IOMMU API to handle) and my hope was that
ARM and PPC could work together on a common VFIO device tree backend.
Thanks,

Alex



Re: [PATCH 2/2] add initial kvm dev passhtrough support

2013-06-11 Thread Mario Smarduch

I know Antonios very well. Yes our intent is definitely to use VFIO.

- Mario 

On 6/11/2013 4:52 PM, Alex Williamson wrote:
 On Tue, 2013-06-11 at 16:13 +0200, Mario Smarduch wrote:
 On 6/11/2013 10:28 AM, Alexander Graf wrote:


 Is there any particular reason you're not going down that path for your ARM 
 implementation?

 We see this as a good starting point to build on, we need baseline numbers
 for performance, latency, interrupt throughput on real hardware
 ASAP to build competency for NFV, which has demanding Dev. Passthrough
 requirements. Over time we plan contributing to SMMU and VFIO as well
 (we're looking into this now).

 FYI NFV is an initiative wireless/fixed network operators are working 
 towards - to virtualize Core, likely Radio Access and even Home Network 
 equipment, this is an epic undertaking (i.e. Network Function 
 Virtualization). 
 So far VMware has taken the lead (mostly x86).
  

 On the embedded PPC side we've been discussing vfio and how it fits into a 
 device tree, non-PCI world for a while. If you like, we can dive into more 
 detail on that, either via email or via phone.

 I'll email you offline, I'd like to know more what you've done on this
 and see where we can align/leverage the effort.
 
 Yes, please let's use VFIO rather than continue to use or invent new
 device assignment interfaces for KVM.  Antonios Motakis (cc'd) already
 contacted me about VFIO for ARM.  IIRC, his initial impression was that
 the IOMMU backend was almost entirely reusable for ARM (a couple PCI
 assumptions implicit in the IOMMU API to handle) and my hope was that
 ARM and PPC could work together on a common VFIO device tree backend.
 Thanks,
 
 Alex
 




Re: [PATCH 1/2] add irq priodrop support

2013-06-11 Thread Mario Smarduch
Hi Grant,
I appreciate the strong feedback. I agree with all
the coding observations and will make the changes.
I have a few inline responses.


 +static u32  gic_irq_prio_drop[DIV_ROUND_UP(1020, 32)] __read_mostly;
 
 I believe it is possible to have more than one GIC in a system. This map
 assumes only one. The prio_drop map should probably be part of
 gic_chip_data so that it is per-instance.
 
 Also, as discussed below, the code should be using DECLARE_BITMAP()

Agree.

 
 gic_priodrop_remap_eoi() is used exactly once. You should instead put
 the body of it inline like so:
 
   if (IS_ENABLED(CONFIG_KVM_ARM_INT_PRIO_DROP) && is_hyp_mode_available())
   chip->irq_eoi = gic_eoi_irq_priodrop;

Yes much cleaner.

 
 However, this block is problematic. For each map call it modifies the
 /global/ gic_chip. It's not a per-interrupt thing, but rather changes
 the callback for all gic interrupts, on *any* gic in the system. Is this
 really what you want?
 
 If it is, then I would expect the callback to be modified once sometime
 around gic_init_bases() time.

Yes, it needs to move up; right now it's being set on each IRQ domain mapping call.

 
 If it is not, and what you really want is per-irq behaviour, then what
 you need to do is have a separate gic_priodrop_chip that can be used on
 a per-irq basis instead of the gic_chip.

Prio drop/deactivate is per CPU and all IRQs are affected including SGIs.
It's possible to run mixed CPU modes, but this patch enables all CPUs for
device passthrough, similar to hyp mode enable.

Another way would be the reverse - set all non-passthrough irqs to 
gic_priodrop_chip
and the passed through IRQ to gic_chip.  I think keeping it in one function
and just setting a bit to enable/disable is cleaner.


 
  if (hw < 32) {
  irq_set_percpu_devid(irq);
  irq_set_chip_and_handler(irq, &gic_chip,
 @@ -857,4 +875,125 @@ IRQCHIP_DECLARE(cortex_a9_gic, "arm,cortex-a9-gic", 
 gic_of_init);
  IRQCHIP_DECLARE(msm_8660_qgic, "qcom,msm-8660-qgic", gic_of_init);
  IRQCHIP_DECLARE(msm_qgic2, "qcom,msm-qgic2", gic_of_init);
  
 +#ifdef CONFIG_KVM_ARM_INT_PRIO_DROP
 +/* If HYP mode enabled and PRIO DROP set EOIR function to handle PRIO DROP 
 */
 +static inline void gic_priodrop_remap_eoi(struct irq_chip *chip)
 +{
 +if (is_hyp_mode_available())
 +chip->irq_eoi = gic_eoi_irq_priodrop;
 +}
 +
 +/* If HYP mode set enable interrupt priority drop/deactivation, and mark
 + * SGIs to deactive through writes to GCICC_DIR. For Guest only enable 
 normal
 + * mode.
 + */
 
 Nit: Read Documentation/kernel-doc-nano-HOWTO.txt. It's a good idea to
 stick to that format when writing function documentation. Also,
 convention is for multiline comments to have an empty /* line before the
 first line of text.

Will do.

 


 +}
 +
 +void gic_spi_clr_priodrop(int irq)
 +{
 +struct irq_data *d = irq_get_irq_data(irq);
 +if (likely(irq >= 32 && irq < 1019)) {
 
  < 1019 ...
 
 +clear_bit(irq % 32, (void *) &gic_irq_prio_drop[irq/32]);
 +writel_relaxed(irq, gic_cpu_base(d) + GIC_CPU_DIR);
 +}
 +}
 +
 +int gic_spi_get_priodrop(int irq)
 +{
 +if (likely(irq >= 32 && irq <= 1019))
 
 ... <= 1019
 
 Looks like some off-by-one errors going on here. Also, the rest of the
 gic code uses 1020, not 1019 as the upper limit. What is the reason for
 the difference in this code block?

Hmmm a mistake.
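
For illustration only, the two range checks discussed above could share one
helper so the bounds cannot drift apart. This is a minimal user-space sketch,
not code from the patch: gic_irq_is_spi, GIC_SPI_FIRST and GIC_IRQ_LIMIT are
hypothetical names, and the exclusive limit of 1020 is taken from the review
comment that the rest of the gic code uses 1020.

```c
#include <stdbool.h>

/* Hypothetical names, chosen for this sketch only. */
#define GIC_SPI_FIRST 32   /* first shared peripheral interrupt */
#define GIC_IRQ_LIMIT 1020 /* exclusive upper bound, per the review comment */

/* Return true if irq falls in the SPI range 32..1019 inclusive. */
static bool gic_irq_is_spi(unsigned int irq)
{
    return irq >= GIC_SPI_FIRST && irq < GIC_IRQ_LIMIT;
}
```

Both the clear and the get path would then call the same predicate, so a
"< 1019" in one place and a "<= 1019" in the other could not happen.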

 ___
 linux-arm-kernel mailing list
 linux-arm-ker...@lists.infradead.org
 http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call agenda for 2013-06-11

2013-06-11 Thread Michael S. Tsirkin
On Tue, Jun 04, 2013 at 04:24:31PM +0300, Michael S. Tsirkin wrote:
 Juan is not available now, and Anthony asked for
 agenda to be sent early.
 So here comes:
 
 Agenda for the meeting Tue, June 11:
  
 - Generating acpi tables, redux

Not so much notes as a quick summary of the call:

There are the following reasons to generate ACPI tables in QEMU:

- sharing code with e.g. ovmf
Anthony thinks this is not a valid argument

- so we can make tables more dynamic and move away from iasl
Anthony thinks this is not a valid reason either,
since qemu and seabios have access to the same info.
MST noted several pieces of info not accessible to the bios.
Anthony said they can be added, e.g. by exposing
QOM to the bios.

- even though most tables are static, hardcoded
  they are likely to change over time
Anthony sees this as justified

To summarize, there's a consensus now that generating ACPI
tables in QEMU is a good idea.

Two issues that need to be addressed:
- original patches break cross-version migration. Need to fix that.

- Anthony requested that patchset is merged together with
  some new feature. I'm not sure the reasoning is clear:
  the current version intentionally generates tables
  that are bug for bug compatible with seabios,
  to simplify testing.

  It seems clear we have users for this such as
  hotplug of devices behind pci bridges, so
  why keep the infrastructure out of tree?

  Looking for something additional, smaller as the hotplug patch
  is a bit big, so might delay merging.


Going forward - would we want to move
smbios as well? Everyone seems to think it's a
good idea.

-- 
MST


KVM call agenda for 2013-06-25

2013-06-11 Thread Juan Quintela

Hi

We have now moved to one call every other week.
Please send any topics that you are interested in covering.

Thanks, Juan.

PS.  If you want to attend and you don't have the call details,
  contact me.


[PATCH 3/3] vfio-pci: Avoid deadlock on remove

2013-06-11 Thread Alex Williamson
If an attempt is made to unbind a device from vfio-pci while that
device is in use, the request is blocked until the device becomes
unused.  Unfortunately, that unbind path still grabs the device_lock,
which certain things like __pci_reset_function() also want to take.
This means we need to try to acquire the locks ourselves and use the
pre-locked version, __pci_reset_function_locked().

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/pci/vfio_pci.c |   23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index ac37254..41023e4 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -137,8 +137,27 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 */
pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
 
-   if (vdev->reset_works)
-   __pci_reset_function(pdev);
+   /*
+* Careful, device_lock may already be held.  This is the case if
+* a driver unbind is blocked.  Try to get the locks ourselves to
+* prevent a deadlock.
+*/
+   if (vdev->reset_works) {
+   bool reset_done = false;
+
+   if (pci_cfg_access_trylock(pdev)) {
+   if (device_trylock(&pdev->dev)) {
+   __pci_reset_function_locked(pdev);
+   reset_done = true;
+   device_unlock(&pdev->dev);
+   }
+   pci_cfg_access_unlock(pdev);
+   }
+
+   if (!reset_done)
+   pr_warn("%s: Unable to acquire locks for reset of %s\n",
+   __func__, dev_name(&pdev->dev));
+   }
 
pci_restore_state(pdev);
 }
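
The nested trylock pattern in this patch can be tried out in user space. The
sketch below is only an analogy under stated assumptions: fake_lock, fake_dev,
fake_trylock, fake_unlock and try_reset are hypothetical names, and a C11
atomic_flag stands in for both the config-access lock and device_lock. It
shows the same shape as the patch: take the outer lock, then the inner one,
do the work only if both were acquired, and release in reverse order.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel lock: a single test-and-set flag. */
struct fake_lock {
    atomic_flag held;
};

/* Hypothetical device with the two locks the patch has to take. */
struct fake_dev {
    struct fake_lock cfg_lock; /* plays the role of the config-access lock */
    struct fake_lock dev_lock; /* plays the role of device_lock */
    int resets;                /* counts how many resets actually ran */
};

/* Non-blocking acquire: true if we got the lock, false if already held. */
static bool fake_trylock(struct fake_lock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

static void fake_unlock(struct fake_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* Reset only if both locks can be taken without waiting. */
static bool try_reset(struct fake_dev *d)
{
    bool reset_done = false;

    if (fake_trylock(&d->cfg_lock)) {
        if (fake_trylock(&d->dev_lock)) {
            d->resets++; /* the "pre-locked" reset would run here */
            reset_done = true;
            fake_unlock(&d->dev_lock);
        }
        fake_unlock(&d->cfg_lock);
    }
    return reset_done;
}
```

If another path already holds dev_lock (the blocked unbind case), try_reset
returns false instead of waiting, which is where the patch prints its warning.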



[PATCH 1/3] vfio: Don't overreact to DEL_DEVICE

2013-06-11 Thread Alex Williamson
BUS_NOTIFY_DEL_DEVICE triggers IOMMU drivers to remove devices from
their iommu group, but there's really nothing we can do about it at
this point.  If the device is in use, then the vfio sub-driver will
block the device_del from completing until it's released.  If the
device is not in use or not owned by a vfio sub-driver, then we
really don't care that it's being removed.

The current code can be triggered just by unloading an sr-iov driver
(ex. igb) while the VFs are attached to vfio-pci because it makes an
incorrect assumption about the ordering of driver remove callbacks
vs the DEL_DEVICE notification.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/vfio.c |   29 +++--
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 6d78736..1bed313 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -492,27 +492,6 @@ static int vfio_group_nb_add_dev(struct vfio_group *group, 
struct device *dev)
return 0;
 }
 
-static int vfio_group_nb_del_dev(struct vfio_group *group, struct device *dev)
-{
-   struct vfio_device *device;
-
-   /*
-* Expect to fall out here.  If a device was in use, it would
-* have been bound to a vfio sub-driver, which would have blocked
-* in .remove at vfio_del_group_dev.  Sanity check that we no
-* longer track the device, so it's safe to remove.
-*/
-   device = vfio_group_get_device(group, dev);
-   if (likely(!device))
-   return 0;
-
-   WARN("Device %s removed from live group %d!\n", dev_name(dev),
-iommu_group_id(group->iommu_group));
-
-   vfio_device_put(device);
-   return 0;
-}
-
 static int vfio_group_nb_verify(struct vfio_group *group, struct device *dev)
 {
/* We don't care what happens when the group isn't in use */
@@ -543,7 +522,13 @@ static int vfio_iommu_group_notifier(struct notifier_block 
*nb,
vfio_group_nb_add_dev(group, dev);
break;
case IOMMU_GROUP_NOTIFY_DEL_DEVICE:
-   vfio_group_nb_del_dev(group, dev);
+   /*
+* Nothing to do here.  If the device is in use, then the
+* vfio sub-driver should block the remove callback until
+* it is unused.  If the device is unused or attached to a
+* stub driver, then it should be released and we don't
+* care that it will be going away.
+*/
break;
case IOMMU_GROUP_NOTIFY_BIND_DRIVER:
pr_debug("%s: Device %s, group %d binding to driver\n",



[PATCH 2/3] vfio: Ignore spurious notifies

2013-06-11 Thread Alex Williamson
Remove debugging WARN_ON if we get a spurious notify for a group that
no longer exists.  No reports of anyone hitting this, but it would
likely be a race and not a bug if they did.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/vfio.c |8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 1bed313..2edfecc 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -508,13 +508,11 @@ static int vfio_iommu_group_notifier(struct 
notifier_block *nb,
struct device *dev = data;
 
/*
-* Need to go through a group_lock lookup to get a reference or
-* we risk racing a group being removed.  Leave a WARN_ON for
-* debuging, but if the group no longer exists, a spurious notify
-* is harmless.
+* Need to go through a group_lock lookup to get a reference or we
+* risk racing a group being removed.  Ignore spurious notifies.
 */
group = vfio_group_try_get(group);
-   if (WARN_ON(!group))
+   if (!group)
return NOTIFY_OK;
 
switch (action) {



[PATCH 0/3] vfio: fixup notifiers and avoid possible deadlock

2013-06-11 Thread Alex Williamson
Clean up a couple of the notifier paths to remove bogus WARN_ON calls.
One is pretty easy to hit and neither really signifies a problem.
Fix the remove path to avoid a potential deadlock with other device_lock
holders.  Thanks,

Alex

---

Alex Williamson (3):
  vfio: Don't overreact to DEL_DEVICE
  vfio: Ignore spurious notifies
  vfio-pci: Avoid deadlock on remove


 drivers/vfio/pci/vfio_pci.c |   23 +--
 drivers/vfio/vfio.c |   37 ++---
 2 files changed, 31 insertions(+), 29 deletions(-)


[Bug 59521] KVM linux guest reads uninitialized pvclock values before executing rdmsr MSR_KVM_WALL_CLOCK

2013-06-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=59521





--- Comment #1 from Eugene Batalov eabatalo...@gmail.com  2013-06-11 16:03:55 
---
I have reconstructed the uninitialized pvclock read backtrace.
References to file lines are for Ubuntu-raring kernel
git://kernel.ubuntu.com/ubuntu/ubuntu-raring.git
tag is Ubuntu-3.8.0-19.30

bp: 0xf3ccbe68 ip: 0xc103cfbd
arch/x86/include/asm/pvclock.h:78
arch/x86/kernel/pvclock.c:74
bp: 0xf3ccbe70 ip: 0xc103c057
arch/x86/kernel/kvmclock.c:91
bp: 0xf3ccbe78 ip: 0xc1017598
arch/x86/kernel/tsc.c:58
bp: 0xf3ccbea8 ip: 0xc107e98d
kernel/sched/clock.c:248
bp: 0xf3ccbeb8 ip: 0xc107ea35
kernel/sched/clock.c:342
bp: 0xf3ccbf08 ip: 0xc104ad85
kernel/printk.c:356
bp: 0xf3ccbf50 ip: 0xc104c4e1
kernel/printk.c:1607
bp: 0xf3ccbf70 ip: 0xc1609bb6
kernel/printk.c:1688
bp: 0xf3ccbf90 ip: 0xc1600a51
arch/x86/include/asm/bitops.h:321
arch/x86/kernel/cpu/common.c:1325
bp: 0xf3ccbfb4 ip: 0xc1604000
??
bp: 0x


kernel/printk.c:356
calls local_clock()
calls sched_clock_cpu()
calls sched_clock()
calls paravirt_sched_clock()
calls indirectly kvm_clock_read()
uninitialized pvclock is read here

vcpu kvmclock initialization is performed in kvm_register_clock.
kvm_register_clock is called from
static void __init kvm_smp_prepare_boot_cpu(void)
called from
 ./init/main.c:524 as smp_prepare_boot_cpu

I'll think about a proper fix soon. We should probably either fix the order of
the cpu initialization stages or disable use of pvclock before it is initialized.
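
As a rough illustration of the second option (not a proposed kernel patch),
a read path can refuse to use the clock area until it has been registered.
Everything named here is hypothetical: pvclock_sketch,
pvclock_sketch_register and pvclock_sketch_read are user-space stand-ins,
and returning 0 before registration is just one possible fallback.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU clock area filled in by the host. */
struct pvclock_sketch {
    bool registered;      /* set once the area has been registered */
    uint64_t system_time; /* nanoseconds, meaningful only when registered */
};

/* Hypothetical analogue of the registration step: mark the area usable. */
static void pvclock_sketch_register(struct pvclock_sketch *pv, uint64_t now_ns)
{
    pv->system_time = now_ns;
    pv->registered = true;
}

/* Early callers (e.g. timestamped logging) get 0 until registration. */
static uint64_t pvclock_sketch_read(const struct pvclock_sketch *pv)
{
    if (!pv->registered)
        return 0;
    return pv->system_time;
}
```

The alternative, reordering initialization so that registration happens
before the first read, needs no such guard.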

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.


[PATCH v2] kvm/ppc/booke: Delay kvmppc_lazy_ee_enable

2013-06-11 Thread Scott Wood
kvmppc_lazy_ee_enable() should be called as late as possible,
or else we get things like WARN_ON(preemptible()) in enable_kernel_fp()
in configurations where preemptible() works.

Note that book3s_pr already waits until just before __kvmppc_vcpu_run
to call kvmppc_lazy_ee_enable().

Signed-off-by: Scott Wood scottw...@freescale.com
---
Rebased without patches 5 and 6 in the previous patchset.

 arch/powerpc/kvm/booke.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 5cd7ad0..1a1b511 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -673,7 +673,6 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
ret = s;
goto out;
}
-   kvmppc_lazy_ee_enable();
 
kvm_guest_enter();
 
@@ -699,6 +698,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
kvmppc_load_guest_fp(vcpu);
 #endif
 
+   kvmppc_lazy_ee_enable();
+
ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
/* No need for kvm_guest_exit. It's done in handle_exit.
-- 
1.7.10.4




Re: [Qemu-devel] KVM call agenda for 2013-06-25

2013-06-11 Thread Michael R. Hines


I don't think my presence on the call is necessary,
but I would appreciate it if you put RDMA on the agenda.

The patches have been thoroughly bug-tested and reviewed.

- Michael

On 06/11/2013 11:52 AM, Juan Quintela wrote:

Hi

We have now moved to one call every other week.
Please send any topics that you are interested in covering.

Thanks, Juan.

PS.  If you want to attend and you don't have the call details,
   contact me.





Re: KVM call agenda for 2013-06-11

2013-06-11 Thread Laszlo Ersek
On 06/11/13 17:45, Michael S. Tsirkin wrote:

 To summarize, there's a consensus now that generating ACPI
 tables in QEMU is a good idea.
 
 Two issues that need to be addressed:
 - original patches break cross-version migration. Need to fix that.
 
 - Anthony requested that patchset is merged together with
   some new feature. I'm not sure the reasoning is clear:
   the current version intentionally generates tables
   that are bug for bug compatible with seabios,
   to simplify testing.

Sorry about not following the series more closely -- is there now a qemu
interface available that allows any firmware to just take the tables, maybe
to fix them up blindly / algorithmically, and to install them?

IOW, is the interface at such a point that in OVMF we could start
looking at throwing out specific code, in favor of implementing the generic
fw-side algorithm?

   It seems clear we have users for this such as
   hotplug of devices behind pci bridges, so
   why keep the infrastructure out of tree?
 
   Looking for something additional, smaller as the hotplug patch
   is a bit big, so might delay merging.
 
 
 Going forward - would we want to move
 smbios as well? Everyone seems to think it's a
 good idea.

I think the current fw_cfg interface for SMBIOS tables is already good
enough to save a lot of work in OVMF. Namely, if all required tables
were generated (table template + field-wise patching) in qemu, and then
all exported over fw_cfg as verbatim tables, my SMBIOS series currently
pending for OVMF should be able to install them.

This would save OVMF the coding of templates (and any necessary
patching) for types 3, 4 (especially nasty), 9, 16, 17, 19, and 32.
(Basically all except type 0 and type 1, which are already implemented
(but verbatim tables from qemu would take priority even for type 0 and
type 1). Type 7 can be left out apparently; IIRC dmidecode doesn't
report it even under SeaBIOS.)

I'm not implying anyone should start working on this (myself included
:)), but yeah, moving SMBIOS would save work in OVMF. (Provided there
was any reason to support said SMBIOS tables in OVMF :))

Thanks,
Laszlo



Re: [Qemu-devel] KVM call agenda for 2013-06-25

2013-06-11 Thread Alexander Graf

On 11.06.2013, at 17:52, Juan Quintela wrote:

 
 Hi
 
 We have now moved to one call every other week.
 Please send any topics that you are interested in covering.

VFIO for device tree based platforms


Alex

 
 Thanks, Juan.
 
 PS.  If you want to attend and you don't have the call details,
  contact me.
 



Re: KVM call agenda for 2013-06-11

2013-06-11 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Tue, Jun 04, 2013 at 04:24:31PM +0300, Michael S. Tsirkin wrote:
 Juan is not available now, and Anthony asked for
 agenda to be sent early.
 So here comes:
 
 Agenda for the meeting Tue, June 11:
  
 - Generating acpi tables, redux

 Not so much notes as a quick summary of the call:

 There are the following reasons to generate ACPI tables in QEMU:

 - sharing code with e.g. ovmf
   Anthony thinks this is not a valid argument

 - so we can make tables more dynamic and move away from iasl
   Anthony thinks this is not a valid reason either,
   since qemu and seabios have access to the same info.
   MST noted several pieces of info not accessible to the bios.
   Anthony said they can be added, e.g. by exposing
   QOM to the bios.

 - even though most tables are static, hardcoded
   they are likely to change over time
   Anthony sees this as justified

 To summarize, there's a consensus now that generating ACPI
 tables in QEMU is a good idea.

I would say best worst idea ;-)

I am deeply concerned about the complexity it introduces but I don't see
many other options.


 Two issues that need to be addressed:
 - original patches break cross-version migration. Need to fix that.

 - Anthony requested that patchset is merged together with
   some new feature. I'm not sure the reasoning is clear:
   the current version intentionally generates tables
   that are bug for bug compatible with seabios,
   to simplify testing.

I expect that there will be additional issues that need to be worked out
and want to see a feature that actually uses the infrastructure before
we add it.

   It seems clear we have users for this such as
   hotplug of devices behind pci bridges, so
   why keep the infrastructure out of tree?

It's hard to evaluate the infrastructure without a user.

   Looking for something additional, smaller as the hotplug patch
   is a bit big, so might delay merging.


 Going forward - would we want to move
 smbios as well? Everyone seems to think it's a
 good idea.

Yes, independent of ACPI, I think QEMU should be generating the SMBIOS
tables.

Regards,

Anthony Liguori

 -- 
 MST


Re: KVM call agenda for 2013-06-11

2013-06-11 Thread Michael S. Tsirkin
On Tue, Jun 11, 2013 at 08:06:15PM +0200, Laszlo Ersek wrote:
 On 06/11/13 17:45, Michael S. Tsirkin wrote:
 
  To summarize, there's a consensus now that generating ACPI
  tables in QEMU is a good idea.
  
  Two issues that need to be addressed:
  - original patches break cross-version migration. Need to fix that.
  
  - Anthony requested that patchset is merged together with
some new feature. I'm not sure the reasoning is clear:
the current version intentionally generates tables
that are bug for bug compatible with seabios,
to simplify testing.
 
 Sorry about not following the series more closely -- is there now a qemu
 interface available that allows any firmware to just take the tables, maybe
 to fix them up blindly / algorithmically, and to install them?

Yes.

 IOW, is the interface at such a point that in OVMF we could start
 looking at throwing out specific code, in favor of implementing the generic
 fw-side algorithm?
 
It seems clear we have users for this such as
hotplug of devices behind pci bridges, so
why keep the infrastructure out of tree?
  
Looking for something additional, smaller as the hotplug patch
is a bit big, so might delay merging.
  
  
  Going forward - would we want to move
  smbios as well? Everyone seems to think it's a
  good idea.
 
 I think the current fw_cfg interface for SMBIOS tables is already good
 enough to save a lot of work in OVMF. Namely, if all required tables
 were generated (table template + field-wise patching) in qemu, and then
 all exported over fw_cfg as verbatim tables, my SMBIOS series currently
 pending for OVMF should be able to install them.
 
 This would save OVMF the coding of templates (and any necessary
 patching) for types 3, 4 (especially nasty), 9, 16, 17, 19, and 32.
 (Basically all except type 0 and type 1, which are already implemented
 (but verbatim tables from qemu would take priority even for type 0 and
 type 1). Type 7 can be left out apparently; IIRC dmidecode doesn't
 report it even under SeaBIOS.)
 
 I'm not implying anyone should start working on this (myself included
 :)), but yeah, moving SMBIOS would save work in OVMF. (Provided there
 was any reason to support said SMBIOS tables in OVMF :))
 
 Thanks,
 Laszlo


Re: KVM call agenda for 2013-06-11

2013-06-11 Thread Michael S. Tsirkin
On Tue, Jun 11, 2013 at 01:38:11PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Tue, Jun 04, 2013 at 04:24:31PM +0300, Michael S. Tsirkin wrote:
  Juan is not available now, and Anthony asked for
  agenda to be sent early.
  So here comes:
  
  Agenda for the meeting Tue, June 11:
   
  - Generating acpi tables, redux
 
  Not so much notes as a quick summary of the call:
 
  There are the following reasons to generate ACPI tables in QEMU:
 
  - sharing code with e.g. ovmf
  Anthony thinks this is not a valid argument
 
  - so we can make tables more dynamic and move away from iasl
  Anthony thinks this is not a valid reason either,
  since qemu and seabios have access to the same info.
  MST noted several pieces of info not accessible to the bios.
  Anthony said they can be added, e.g. by exposing
  QOM to the bios.
 
  - even though most tables are static, hardcoded
they are likely to change over time
  Anthony sees this as justified
 
  To summarize, there's a consensus now that generating ACPI
  tables in QEMU is a good idea.
 
 I would say best worst idea ;-)
 
 I am deeply concerned about the complexity it introduces but I don't see
 many other options.
 
 
  Two issues that need to be addressed:
  - original patches break cross-version migration. Need to fix that.
 
  - Anthony requested that patchset is merged together with
some new feature. I'm not sure the reasoning is clear:
the current version intentionally generates tables
that are bug for bug compatible with seabios,
to simplify testing.
 
 I expect that there will be additional issues that need to be worked out
 and want to see a feature that actually uses the infrastructure before
 we add it.

So please look at it, that code has been posted.
See:
[PATCH] qemu: piix: PCI bridge ACPI hotplug support

it does not seem to show any major issues to work out
besides the cross-version migration issue that we
know about.

It seems clear we have users for this such as
hotplug of devices behind pci bridges, so
why keep the infrastructure out of tree?
 
 It's hard to evaluate the infrastructure without a user.

But the user has been posted, even if there are still issues to work out
with it,  that should be enough to evaluate the infrastructure - the
user itself does not need to be merged for this.

So please evaluate and give feedback.

Looking for something additional, smaller as the hotplug patch
is a bit big, so might delay merging.
 
 
  Going forward - would we want to move
  smbios as well? Everyone seems to think it's a
  good idea.
 
 Yes, independent of ACPI, I think QEMU should be generating the SMBIOS
 tables.
 
 Regards,
 
 Anthony Liguori
 
  -- 
  MST


Re: [PATCH] vhost-scsi: return -ENOENT when no matching tcm_vhost_tpg found

2013-06-11 Thread wenchao

cc to Greg for 3.9.


On Tue, May 28, 2013 at 04:54:44PM +0800, Wenchao Xia wrote:

The ioctl for VHOST_SCSI_SET_ENDPOINT reports a file-exists error when I
forget to set it up correctly in configfs, which confuses the user. It
actually fails to find a matching tcm_vhost_tpg, so change the error value.

Signed-off-by: Wenchao Xia wenchaoli...@gmail.com


Acked-by: Asias He as...@redhat.com

BTW, It would be nice to print more informative info in qemu when wwpn
is not available as well.


---
  drivers/vhost/scsi.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 7014202..6325b1d 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1219,7 +1219,7 @@ static int vhost_scsi_set_endpoint(
}
ret = 0;
} else {
-   ret = -EEXIST;
+   ret = -ENOENT;
}

/*
--
1.7.1







KVM: x86: handle idiv overflow at kvm_write_tsc

2013-06-11 Thread Marcelo Tosatti

It's possible that idivl overflows (due to a large delta stored in usdiff,
a valid scenario).

Create an exception handler to catch the overflow exception (division by zero
is prevented by the vcpu->arch.virtual_tsc_khz check), and interpret it
accordingly (delta is larger than USEC_PER_SEC).

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=969644

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 094b5d9..64a4b03 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1194,20 +1194,37 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct 
msr_data *msr)
elapsed = ns - kvm->arch.last_tsc_nsec;
 
if (vcpu->arch.virtual_tsc_khz) {
+   int faulted = 0;
+
/* n.b - signed multiplication and division required */
usdiff = data - kvm->arch.last_tsc_write;
 #ifdef CONFIG_X86_64
usdiff = (usdiff * 1000) / vcpu->arch.virtual_tsc_khz;
 #else
/* do_div() only does unsigned */
-   asm("idivl %2; xor %%edx, %%edx"
-   : "=A"(usdiff)
-   : "A"(usdiff * 1000), "rm"(vcpu->arch.virtual_tsc_khz));
+   asm("1: idivl %[divisor]\n"
+   "2: xor %%edx, %%edx\n"
+   "   movl $0, %[faulted]\n"
+   "3:\n"
+   ".section .fixup,\"ax\"\n"
+   "4: movl $1, %[faulted]\n"
+   "   jmp  3b\n"
+   ".previous\n"
+
+   _ASM_EXTABLE(1b, 4b)
+
+   : "=A"(usdiff), [faulted] "=r" (faulted)
+   : "A"(usdiff * 1000), [divisor] "rm"(vcpu->arch.virtual_tsc_khz));
+
 #endif
do_div(elapsed, 1000);
usdiff -= elapsed;
if (usdiff < 0)
usdiff = -usdiff;
+
+   /* idivl overflow => difference is larger than USEC_PER_SEC */
+   if (faulted)
+   usdiff = USEC_PER_SEC;
} else
usdiff = USEC_PER_SEC; /* disable TSC match window below */
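
For comparison, the overflow condition that the fixup handler catches can
also be expressed as an explicit range check in portable C. This is a
user-space sketch, not the approach taken in the patch above:
div_s64_s32_checked and tsc_delta_to_usec are hypothetical helpers, and the
sketch assumes tsc_delta * 1000 itself fits in 64 bits. 32-bit idivl divides
a 64-bit dividend by a 32-bit divisor and faults when the quotient does not
fit in 32 bits; the helper reports that case instead of faulting.

```c
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000LL

/*
 * Hypothetical helper: signed 64-by-32 division that succeeds only when
 * the quotient fits in 32 bits, i.e. when idivl would not fault.
 */
static bool div_s64_s32_checked(int64_t dividend, int32_t divisor,
                                int32_t *quot)
{
    int64_t q;

    if (divisor == 0)
        return false; /* divide by zero */
    if (dividend == INT64_MIN && divisor == -1)
        return false; /* would overflow even in 64 bits */

    q = dividend / divisor;
    if (q > INT32_MAX || q < INT32_MIN)
        return false; /* quotient too wide for a 32-bit result */

    *quot = (int32_t)q;
    return true;
}

/*
 * Hypothetical wrapper: convert a TSC delta (cycles) to microseconds for
 * a clock of khz kilohertz, treating overflow as "at least one second".
 */
static int64_t tsc_delta_to_usec(int64_t tsc_delta, int32_t khz)
{
    int32_t q;

    if (!div_s64_s32_checked(tsc_delta * 1000, khz, &q))
        return USEC_PER_SEC;
    return q;
}
```

The patch avoids the extra comparisons on the fast path by letting the CPU
raise the exception and recovering through the exception table instead.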
 


Re: [PATCH] vhost-scsi: return -ENOENT when no matching tcm_vhost_tpg found

2013-06-11 Thread Greg Kroah-Hartman
On Wed, Jun 12, 2013 at 09:39:50AM +0800, wenchao wrote:
 cc to Greg for 3.9.

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>


Re: [PATCH 0/4 v3] KVM: PPC: IOMMU in-kernel handling

2013-06-11 Thread Benjamin Herrenschmidt
On Wed, 2013-06-05 at 16:11 +1000, Alexey Kardashevskiy wrote:
 Ben, ping! :)
 
 This series has tiny fixes (capability and ioctl numbers,
 changed documentation, compile errors in some configuration).
 More details are in the commit messages.
 Rebased on v3.10-rc4.

Alex, I assume you'll merge that once I ack it ?

Cheers,
Ben.

 
 Alexey Kardashevskiy (4):
   KVM: PPC: Add support for multiple-TCE hcalls
   powerpc: Prepare to support kernel handling of IOMMU map/unmap
   KVM: PPC: Add support for IOMMU in-kernel handling
   KVM: PPC: Add hugepage support for IOMMU in-kernel handling
 
  Documentation/virtual/kvm/api.txt|   45 +++
  arch/powerpc/include/asm/kvm_host.h  |7 +
  arch/powerpc/include/asm/kvm_ppc.h   |   40 ++-
  arch/powerpc/include/asm/pgtable-ppc64.h |4 +
  arch/powerpc/include/uapi/asm/kvm.h  |7 +
  arch/powerpc/kvm/book3s_64_vio.c |  398 -
  arch/powerpc/kvm/book3s_64_vio_hv.c  |  471 
 --
  arch/powerpc/kvm/book3s_hv.c |   39 +++
  arch/powerpc/kvm/book3s_hv_rmhandlers.S  |6 +
  arch/powerpc/kvm/book3s_pr_papr.c|   37 ++-
  arch/powerpc/kvm/powerpc.c   |   15 +
  arch/powerpc/mm/init_64.c|   77 -
  include/uapi/linux/kvm.h |3 +
  13 files changed, 1121 insertions(+), 28 deletions(-)
 



