Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-12-13 Thread Avi Kivity

On 12/12/2010 01:22 PM, Jan Kiszka wrote:

From: Jan Kiszkajan.kis...@siemens.com

PCI 2.3 allows to generically disable IRQ sources at device level. This
enables us to share IRQs of such devices on the host side when passing
them to a guest.

However, IRQ disabling via the PCI config space is more costly than
masking the line via disable_irq. Therefore we register the IRQ in adaptive
mode and switch between line and device level disabling on demand.

This feature is optional, user space has to request it explicitly as it
also has to inform us about its view of PCI_COMMAND_INTX_DISABLE. That
way, we can avoid unmasking the interrupt and signaling it if the guest
masked it via the PCI config space.



Looks fine.


+   ret =IRQ_NONE;
+


Danger, whitespace error detected.  Initiating self-destruct sequence.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-12-12 Thread Jan Kiszka
From: Jan Kiszka jan.kis...@siemens.com

PCI 2.3 allows to generically disable IRQ sources at device level. This
enables us to share IRQs of such devices on the host side when passing
them to a guest.

However, IRQ disabling via the PCI config space is more costly than
masking the line via disable_irq. Therefore we register the IRQ in adaptive
mode and switch between line and device level disabling on demand.

This feature is optional, user space has to request it explicitly as it
also has to inform us about its view of PCI_COMMAND_INTX_DISABLE. That
way, we can avoid unmasking the interrupt and signaling it if the guest
masked it via the PCI config space.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Documentation/kvm/api.txt |   27 
 arch/x86/kvm/x86.c|1 +
 include/linux/kvm.h   |6 +
 include/linux/kvm_host.h  |   10 ++-
 virt/kvm/assigned-dev.c   |  336 -
 5 files changed, 346 insertions(+), 34 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index e1a9297..1c34e25 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1112,6 +1112,14 @@ following flags are specified:
 
 /* Depends on KVM_CAP_IOMMU */
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1  0)
+/* The following two depend on KVM_CAP_PCI_2_3 */
+#define KVM_DEV_ASSIGN_PCI_2_3 (1  1)
+#define KVM_DEV_ASSIGN_MASK_INTX   (1  2)
+
+If KVM_DEV_ASSIGN_PCI_2_3 is set, the kernel will manage legacy INTx interrupts
+via the PCI-2.3-compliant device-level mask, but only if IRQ sharing with other
+assigned or host devices requires it. KVM_DEV_ASSIGN_MASK_INTX specifies the
+guest's view on the INTx mask, see KVM_ASSIGN_SET_INTX_MASK for details.
 
 4.48 KVM_DEASSIGN_PCI_DEVICE
 
@@ -1263,6 +1271,25 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+4.54 KVM_ASSIGN_SET_INTX_MASK
+
+Capability: KVM_CAP_PCI_2_3
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_assigned_pci_dev (in)
+Returns: 0 on success, -1 on error
+
+Informs the kernel about the guest's view on the INTx mask. As long as the
+guest masks the legacy INTx, the kernel will refrain from unmasking it at
+hardware level and will not assert the guest's IRQ line. User space is still
+responsible for applying this state to the assigned device's real config space.
+To avoid that the kernel overwrites the state user space wants to set,
+KVM_ASSIGN_SET_INTX_MASK has to be called prior to updating the config space.
+
+See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
+by assigned_dev_id. In the flags field, only KVM_DEV_ASSIGN_MASK_INTX is
+evaluated.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ed373ba..8775a54 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1965,6 +1965,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_PCI_2_3:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..3cadb42 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -541,6 +541,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
 #define KVM_CAP_ASYNC_PF 59
+#define KVM_CAP_PCI_2_3 60
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -677,6 +678,9 @@ struct kvm_clock_data {
 #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
 /* Available with KVM_CAP_PPC_GET_PVINFO */
 #define KVM_PPC_GET_PVINFO   _IOW(KVMIO,  0xa1, struct kvm_ppc_pvinfo)
+/* Available with KVM_CAP_PCI_2_3 */
+#define KVM_ASSIGN_SET_INTX_MASK  _IOW(KVMIO,  0xa2, \
+  struct kvm_assigned_pci_dev)
 
 /*
  * ioctls for vcpu fds
@@ -742,6 +746,8 @@ struct kvm_clock_data {
 #define KVM_SET_XCRS _IOW(KVMIO,  0xa7, struct kvm_xcrs)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1  0)
+#define KVM_DEV_ASSIGN_PCI_2_3 (1  1)
+#define KVM_DEV_ASSIGN_MASK_INTX   (1  2)
 
 struct kvm_assigned_pci_dev {
__u32 assigned_dev_id;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ac4e83a..4f95070 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -477,6 +477,12 @@ struct kvm_irq_ack_notifier {
void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
 };
 
+enum kvm_intx_state {
+   KVM_INTX_ENABLED,
+   KVM_INTX_LINE_DISABLED,
+   KVM_INTX_DEVICE_DISABLED,
+};
+
 struct kvm_assigned_dev_kernel {
struct kvm_irq_ack_notifier ack_notifier;
struct list_head list;
@@ -486,7 +492,7 @@ struct kvm_assigned_dev_kernel {
int host_devfn;
unsigned int entries_nr;
int host_irq;
-   bool host_irq_disabled;
+   unsigned long 

[PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
PCI 2.3 allows to generically disable IRQ sources at device level. This
enables us to share IRQs of such devices between on the host side when
passing them to a guest.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 include/linux/kvm_host.h |1 +
 virt/kvm/assigned-dev.c  |  194 ++
 2 files changed, 180 insertions(+), 15 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 46120da..fdc2bd9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -466,6 +466,7 @@ struct kvm_assigned_dev_kernel {
unsigned int entries_nr;
int host_irq;
bool host_irq_disabled;
+   bool pci_2_3;
struct msix_entry *host_msix_entries;
int guest_irq;
struct msix_entry *guest_msix_entries;
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index ca402ed..91fe9c8 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -55,17 +55,145 @@ static int find_index_from_host_irq(struct 
kvm_assigned_dev_kernel
return index;
 }
 
+/*
+ * Verify that the device supports Interrupt Disable bit in command register,
+ * per PCI 2.3, by flipping this bit and reading it back: this bit was readonly
+ * in PCI 2.2.
+ */
+static bool pci_2_3_supported(struct pci_dev *pdev)
+{
+   bool supported = false;
+   u16 orig, new;
+
+   pci_block_user_cfg_access(pdev);
+   pci_read_config_word(pdev, PCI_COMMAND, orig);
+   pci_write_config_word(pdev, PCI_COMMAND,
+ orig ^ PCI_COMMAND_INTX_DISABLE);
+   pci_read_config_word(pdev, PCI_COMMAND, new);
+
+   /*
+* There's no way to protect against
+* hardware bugs or detect them reliably, but as long as we know
+* what the value should be, let's go ahead and check it.
+*/
+   if ((new ^ orig)  ~PCI_COMMAND_INTX_DISABLE) {
+   dev_err(pdev-dev, Command changed from 0x%x to 0x%x: 
+   driver or HW bug?\n, orig, new);
+   goto out;
+   }
+   if (!((new ^ orig)  PCI_COMMAND_INTX_DISABLE)) {
+   dev_warn(pdev-dev, Device does not support 
+disabling interrupts: unable to bind.\n);
+   goto out;
+   }
+   supported = true;
+
+   /* Now restore the original value. */
+   pci_write_config_word(pdev, PCI_COMMAND, orig);
+
+out:
+   pci_unblock_user_cfg_access(pdev);
+   return supported;
+}
+
+static unsigned int
+pci_2_3_set_irq_mask(struct pci_dev *dev, bool mask, bool check_status)
+{
+   u32 cmd_status_dword;
+   u16 origcmd, newcmd;
+   unsigned int status;
+
+   /*
+* We do a single dword read to retrieve both command and status.
+* Document assumptions that make this possible.
+*/
+   BUILD_BUG_ON(PCI_COMMAND % 4);
+   BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
+
+   pci_block_user_cfg_access(dev);
+
+   /*
+* Read both command and status registers in a single 32-bit operation.
+* Note: we could cache the value for command and move the status read
+* out of the lock if there was a way to get notified of user changes
+* to command register through sysfs. Should be good for shared irqs.
+*/
+   pci_read_config_dword(dev, PCI_COMMAND, cmd_status_dword);
+   origcmd = cmd_status_dword;
+   status = cmd_status_dword  16;
+
+   if (check_status) {
+   bool irq_pending = status  PCI_STATUS_INTERRUPT;
+
+   /*
+* Check interrupt status register to see whether our device
+* triggered the interrupt (when masking) or the next IRQ is
+* already pending (when unmasking).
+*/
+   if (!(mask == irq_pending))
+   goto done;
+   }
+
+   newcmd = origcmd  ~PCI_COMMAND_INTX_DISABLE;
+   if (mask)
+   newcmd |= PCI_COMMAND_INTX_DISABLE;
+   if (newcmd != origcmd)
+   pci_write_config_word(dev, PCI_COMMAND, newcmd);
+
+done:
+   pci_unblock_user_cfg_access(dev);
+   return status;
+}
+
+static void pci_2_3_irq_mask(struct pci_dev *dev)
+{
+   pci_2_3_set_irq_mask(dev, true, false);
+}
+
+static unsigned int pci_2_3_irq_check_and_mask(struct pci_dev *dev)
+{
+   return pci_2_3_set_irq_mask(dev, true, true);
+}
+
+static void pci_2_3_irq_unmask(struct pci_dev *dev)
+{
+   pci_2_3_set_irq_mask(dev, false, false);
+}
+
+static unsigned int pci_2_3_irq_check_and_unmask(struct pci_dev *dev)
+{
+   return pci_2_3_set_irq_mask(dev, false, true);
+}
+
+static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
+{
+   struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
+   int ret = IRQ_WAKE_THREAD;
+
+   spin_lock(assigned_dev-intx_lock);
+   if (assigned_dev-host_irq_disabled ||
+   

Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 04:49:20PM +0100, Jan Kiszka wrote:
 PCI 2.3 allows to generically disable IRQ sources at device level. This
 enables us to share IRQs of such devices between on the host side when
 passing them to a guest.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  include/linux/kvm_host.h |1 +
  virt/kvm/assigned-dev.c  |  194 
 ++
  2 files changed, 180 insertions(+), 15 deletions(-)
 
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index 46120da..fdc2bd9 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -466,6 +466,7 @@ struct kvm_assigned_dev_kernel {
   unsigned int entries_nr;
   int host_irq;
   bool host_irq_disabled;
 + bool pci_2_3;
   struct msix_entry *host_msix_entries;
   int guest_irq;
   struct msix_entry *guest_msix_entries;
 diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
 index ca402ed..91fe9c8 100644
 --- a/virt/kvm/assigned-dev.c
 +++ b/virt/kvm/assigned-dev.c
 @@ -55,17 +55,145 @@ static int find_index_from_host_irq(struct 
 kvm_assigned_dev_kernel
   return index;
  }
  
 +/*
 + * Verify that the device supports Interrupt Disable bit in command register,
 + * per PCI 2.3, by flipping this bit and reading it back: this bit was 
 readonly
 + * in PCI 2.2.
 + */
 +static bool pci_2_3_supported(struct pci_dev *pdev)
 +{
 + bool supported = false;
 + u16 orig, new;
 +
 + pci_block_user_cfg_access(pdev);
 + pci_read_config_word(pdev, PCI_COMMAND, orig);
 + pci_write_config_word(pdev, PCI_COMMAND,
 +   orig ^ PCI_COMMAND_INTX_DISABLE);
 + pci_read_config_word(pdev, PCI_COMMAND, new);
 +
 + /*
 +  * There's no way to protect against
 +  * hardware bugs or detect them reliably, but as long as we know
 +  * what the value should be, let's go ahead and check it.
 +  */
 + if ((new ^ orig)  ~PCI_COMMAND_INTX_DISABLE) {
 + dev_err(pdev-dev, Command changed from 0x%x to 0x%x: 
 + driver or HW bug?\n, orig, new);
 + goto out;
 + }
 + if (!((new ^ orig)  PCI_COMMAND_INTX_DISABLE)) {
 + dev_warn(pdev-dev, Device does not support 
 +  disabling interrupts: unable to bind.\n);
 + goto out;
 + }
 + supported = true;
 +
 + /* Now restore the original value. */
 + pci_write_config_word(pdev, PCI_COMMAND, orig);
 +
 +out:
 + pci_unblock_user_cfg_access(pdev);
 + return supported;
 +}
 +
 +static unsigned int
 +pci_2_3_set_irq_mask(struct pci_dev *dev, bool mask, bool check_status)
 +{
 + u32 cmd_status_dword;
 + u16 origcmd, newcmd;
 + unsigned int status;
 +
 + /*
 +  * We do a single dword read to retrieve both command and status.
 +  * Document assumptions that make this possible.
 +  */
 + BUILD_BUG_ON(PCI_COMMAND % 4);
 + BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
 +
 + pci_block_user_cfg_access(dev);
 +
 + /*
 +  * Read both command and status registers in a single 32-bit operation.
 +  * Note: we could cache the value for command and move the status read
 +  * out of the lock if there was a way to get notified of user changes
 +  * to command register through sysfs. Should be good for shared irqs.
 +  */
 + pci_read_config_dword(dev, PCI_COMMAND, cmd_status_dword);
 + origcmd = cmd_status_dword;
 + status = cmd_status_dword  16;
 +
 + if (check_status) {
 + bool irq_pending = status  PCI_STATUS_INTERRUPT;
 +
 + /*
 +  * Check interrupt status register to see whether our device
 +  * triggered the interrupt (when masking) or the next IRQ is
 +  * already pending (when unmasking).
 +  */
 + if (!(mask == irq_pending))

Same as mask != irq_pending?

 + goto done;
 + }
 +
 + newcmd = origcmd  ~PCI_COMMAND_INTX_DISABLE;
 + if (mask)
 + newcmd |= PCI_COMMAND_INTX_DISABLE;
 + if (newcmd != origcmd)
 + pci_write_config_word(dev, PCI_COMMAND, newcmd);
 +
 +done:
 + pci_unblock_user_cfg_access(dev);
 + return status;
 +}
 +
 +static void pci_2_3_irq_mask(struct pci_dev *dev)
 +{
 + pci_2_3_set_irq_mask(dev, true, false);
 +}
 +
 +static unsigned int pci_2_3_irq_check_and_mask(struct pci_dev *dev)
 +{
 + return pci_2_3_set_irq_mask(dev, true, true);
 +}
 +
 +static void pci_2_3_irq_unmask(struct pci_dev *dev)
 +{
 + pci_2_3_set_irq_mask(dev, false, false);
 +}
 +
 +static unsigned int pci_2_3_irq_check_and_unmask(struct pci_dev *dev)
 +{
 + return pci_2_3_set_irq_mask(dev, false, true);
 +}
 +

IMO this is not a terribly good interface: all users check the pending bit
(PCI_STATUS_INTERRUPT) which is what the function pci_2_3_set_irq_mask
did anyway.  I'd suggest returning irqreturn_t or bool and not unsigned
int.


 

Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 18:41, Michael S. Tsirkin wrote:
 On Tue, Nov 02, 2010 at 04:49:20PM +0100, Jan Kiszka wrote:
 PCI 2.3 allows to generically disable IRQ sources at device level. This
 enables us to share IRQs of such devices between on the host side when
 passing them to a guest.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  include/linux/kvm_host.h |1 +
  virt/kvm/assigned-dev.c  |  194 
 ++
  2 files changed, 180 insertions(+), 15 deletions(-)

 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index 46120da..fdc2bd9 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -466,6 +466,7 @@ struct kvm_assigned_dev_kernel {
  unsigned int entries_nr;
  int host_irq;
  bool host_irq_disabled;
 +bool pci_2_3;
  struct msix_entry *host_msix_entries;
  int guest_irq;
  struct msix_entry *guest_msix_entries;
 diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
 index ca402ed..91fe9c8 100644
 --- a/virt/kvm/assigned-dev.c
 +++ b/virt/kvm/assigned-dev.c
 @@ -55,17 +55,145 @@ static int find_index_from_host_irq(struct 
 kvm_assigned_dev_kernel
  return index;
  }
  
 +/*
 + * Verify that the device supports Interrupt Disable bit in command 
 register,
 + * per PCI 2.3, by flipping this bit and reading it back: this bit was 
 readonly
 + * in PCI 2.2.
 + */
 +static bool pci_2_3_supported(struct pci_dev *pdev)
 +{
 +bool supported = false;
 +u16 orig, new;
 +
 +pci_block_user_cfg_access(pdev);
 +pci_read_config_word(pdev, PCI_COMMAND, orig);
 +pci_write_config_word(pdev, PCI_COMMAND,
 +  orig ^ PCI_COMMAND_INTX_DISABLE);
 +pci_read_config_word(pdev, PCI_COMMAND, new);
 +
 +/*
 + * There's no way to protect against
 + * hardware bugs or detect them reliably, but as long as we know
 + * what the value should be, let's go ahead and check it.
 + */
 +if ((new ^ orig)  ~PCI_COMMAND_INTX_DISABLE) {
 +dev_err(pdev-dev, Command changed from 0x%x to 0x%x: 
 +driver or HW bug?\n, orig, new);
 +goto out;
 +}
 +if (!((new ^ orig)  PCI_COMMAND_INTX_DISABLE)) {
 +dev_warn(pdev-dev, Device does not support 
 + disabling interrupts: unable to bind.\n);
 +goto out;
 +}
 +supported = true;
 +
 +/* Now restore the original value. */
 +pci_write_config_word(pdev, PCI_COMMAND, orig);
 +
 +out:
 +pci_unblock_user_cfg_access(pdev);
 +return supported;
 +}
 +
 +static unsigned int
 +pci_2_3_set_irq_mask(struct pci_dev *dev, bool mask, bool check_status)
 +{
 +u32 cmd_status_dword;
 +u16 origcmd, newcmd;
 +unsigned int status;
 +
 +/*
 + * We do a single dword read to retrieve both command and status.
 + * Document assumptions that make this possible.
 + */
 +BUILD_BUG_ON(PCI_COMMAND % 4);
 +BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
 +
 +pci_block_user_cfg_access(dev);
 +
 +/*
 + * Read both command and status registers in a single 32-bit operation.
 + * Note: we could cache the value for command and move the status read
 + * out of the lock if there was a way to get notified of user changes
 + * to command register through sysfs. Should be good for shared irqs.
 + */
 +pci_read_config_dword(dev, PCI_COMMAND, cmd_status_dword);
 +origcmd = cmd_status_dword;
 +status = cmd_status_dword  16;
 +
 +if (check_status) {
 +bool irq_pending = status  PCI_STATUS_INTERRUPT;
 +
 +/*
 + * Check interrupt status register to see whether our device
 + * triggered the interrupt (when masking) or the next IRQ is
 + * already pending (when unmasking).
 + */
 +if (!(mask == irq_pending))
 
 Same as mask != irq_pending?

Yes. Relict of various refactorings.

 
 +goto done;
 +}
 +
 +newcmd = origcmd  ~PCI_COMMAND_INTX_DISABLE;
 +if (mask)
 +newcmd |= PCI_COMMAND_INTX_DISABLE;
 +if (newcmd != origcmd)
 +pci_write_config_word(dev, PCI_COMMAND, newcmd);
 +
 +done:
 +pci_unblock_user_cfg_access(dev);
 +return status;
 +}
 +
 +static void pci_2_3_irq_mask(struct pci_dev *dev)
 +{
 +pci_2_3_set_irq_mask(dev, true, false);
 +}
 +
 +static unsigned int pci_2_3_irq_check_and_mask(struct pci_dev *dev)
 +{
 +return pci_2_3_set_irq_mask(dev, true, true);
 +}
 +
 +static void pci_2_3_irq_unmask(struct pci_dev *dev)
 +{
 +pci_2_3_set_irq_mask(dev, false, false);
 +}
 +
 +static unsigned int pci_2_3_irq_check_and_unmask(struct pci_dev *dev)
 +{
 +return pci_2_3_set_irq_mask(dev, false, true);
 +}
 +
 
 IMO this is not a terribly good interface: all users check the pending bit
 (PCI_STATUS_INTERRUPT) which is what the function pci_2_3_set_irq_mask
 did anyway.  I'd suggest returning irqreturn_t or bool and 

Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
  @@ -99,12 +228,23 @@ static void kvm_assigned_dev_ack_irq(struct 
  kvm_irq_ack_notifier *kian)
 /* The guest irq may be shared so this ack may be
  * from another device.
  */
  -  spin_lock(dev-intx_lock);
  +  spin_lock_irq(dev-intx_lock);
 if (dev-host_irq_disabled) {
  -  enable_irq(dev-host_irq);
  +  if (dev-pci_2_3) {
  +  if (pci_2_3_irq_check_and_unmask(dev-dev) 
  +  PCI_STATUS_INTERRUPT) {
  +  reassert = true;
  +  goto out;
  +  }
  +  } else
  +  enable_irq(dev-host_irq);
  
  Or
  
  if (!dev-pci_2_3)
  enable_irq(dev-host_irq);
  else if (pci_2_3_irq_check_and_unmask(dev-dev)  
  PCI_STATUS_INTERRUPT) {
  reassert = true;
  goto out;
  }
  
  to reduce nesting.
 
 Yeah.
 
  
 dev-host_irq_disabled = false;
 }
  -  spin_unlock(dev-intx_lock);
  +out:
  +  spin_unlock_irq(dev-intx_lock);
  +
  +  if (reassert)
  +  kvm_set_irq(dev-kvm, dev-irq_source_id, dev-guest_irq, 1);
  
  Hmm, I think this still has more overhead than it needs to have.
  Instead of setting level to 0 and then back to 1, can't we just
  avoid set to 1 in the first place? This would need a different
  interface than pci_2_3_irq_check_and_unmask to avoid a race
  where interrupt is received while we are acking another one:
  
  block userspace access
  check pending bit
  if (!pending)
  set irq (0)
  clear pending
  block userspace access
  
  Would be worth it for high volume devices.
 
 The problem is that we can't reorder guest IRQ line clearing and host
 IRQ line enabling without taking a lock across host IRQ disable + guest
 IRQ raise - and that is now distributed across hard and threaded IRQ
 handlers and we don't want to hold and IRQ-safe lock during kvm_set_irq.

Oh I think I confused you.
What I mean is:

block userspace access
check interrupt status bit
if (!interrupt status bit set)
set irq (0)
clear interrupt disable bit
block userspace access

This way we enable interrupt after set irq so not need for
extra locks I think.

Hmm one thing I noticed is that pci_block_user_cfg_access
will BUG_ON if it was already blocked. So I think we have
a bug here when interrupt handler kicks in right after
we unmask interrupts.

Probably need some kind of lock to protect against this.


  
   }
   
   static void deassign_guest_irq(struct kvm *kvm,
  @@ -151,7 +291,11 @@ static void deassign_host_irq(struct kvm *kvm,
 pci_disable_msix(assigned_dev-dev);
 } else {
 /* Deal with MSI and INTx */
  -  disable_irq(assigned_dev-host_irq);
  +  if (assigned_dev-pci_2_3) {
  +  pci_2_3_irq_mask(assigned_dev-dev);
  +  synchronize_irq(assigned_dev-host_irq);
  +  } else
  +  disable_irq(assigned_dev-host_irq);
   
 free_irq(assigned_dev-host_irq, (void *)assigned_dev);
   
  @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm *kvm,
   
 pci_reset_function(assigned_dev-dev);
   
  +  /*
  +   * Unmask the IRQ at PCI level once the reset is done - the next user
  +   * may not expect the IRQ being masked.
  +   */
  +  if (assigned_dev-pci_2_3)
  +  pci_2_3_irq_unmask(assigned_dev-dev);
  +
  
  Doesn't pci_reset_function clear mask bit? It seems to ...
 
 I was left with non-functional devices for the host here if I was not
 doing this. Need to recheck, but I think it was required.

Interesting. Could you check why please?


  
 pci_release_regions(assigned_dev-dev);
 pci_disable_device(assigned_dev-dev);
 pci_dev_put(assigned_dev-dev);
  @@ -224,15 +375,29 @@ void kvm_free_all_assigned_devices(struct kvm *kvm)
   static int assigned_device_enable_host_intx(struct kvm *kvm,
 struct kvm_assigned_dev_kernel *dev)
   {
  +  irq_handler_t irq_handler = NULL;
  +  unsigned long flags = 0;
  +
 dev-host_irq = dev-dev-irq;
  -  /* Even though this is PCI, we don't want to use shared
  -   * interrupts. Sharing host devices with guest-assigned devices
  -   * on the same interrupt line is not a happy situation: there
  -   * are going to be long delays in accepting, acking, etc.
  +
  +  /*
  +   * We can only share the IRQ line with other host devices if we are
  +   * able to disable the IRQ source at device-level - independently of
  +   * the guest driver. Otherwise host devices may suffer from unbounded
  +   * IRQ latencies when the guest keeps the line asserted.
  */
  -  if (request_threaded_irq(dev-host_irq, NULL, kvm_assigned_dev_thread,
  -   0, dev-irq_name, (void *)dev))
  +  dev-pci_2_3 = 

Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 19:24, Michael S. Tsirkin wrote:
 On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
dev-host_irq_disabled = false;
}
 -  spin_unlock(dev-intx_lock);
 +out:
 +  spin_unlock_irq(dev-intx_lock);
 +
 +  if (reassert)
 +  kvm_set_irq(dev-kvm, dev-irq_source_id, dev-guest_irq, 1);

 Hmm, I think this still has more overhead than it needs to have.
 Instead of setting level to 0 and then back to 1, can't we just
 avoid set to 1 in the first place? This would need a different
 interface than pci_2_3_irq_check_and_unmask to avoid a race
 where interrupt is received while we are acking another one:

 block userspace access
 check pending bit
 if (!pending)
 set irq (0)
 clear pending
 block userspace access

 Would be worth it for high volume devices.

 The problem is that we can't reorder guest IRQ line clearing and host
 IRQ line enabling without taking a lock across host IRQ disable + guest
 IRQ raise - and that is now distributed across hard and threaded IRQ
 handlers and we don't want to hold and IRQ-safe lock during kvm_set_irq.
 
 Oh I think I confused you.
 What I mean is:
 
   block userspace access
   check interrupt status bit
   if (!interrupt status bit set)
   set irq (0)
   clear interrupt disable bit
   block userspace access
 
 This way we enable interrupt after set irq so not need for
 extra locks I think.

OK. Would require some serious refactoring again.

But what about edge IRQs? Don't we need to toggle the bit for them? And
as we do not differentiate between level and edge, we currently have to
do this unconditionally.

 
 Hmm one thing I noticed is that pci_block_user_cfg_access
 will BUG_ON if it was already blocked. So I think we have
 a bug here when interrupt handler kicks in right after
 we unmask interrupts.
 
 Probably need some kind of lock to protect against this.
 

Or an atomic counter. Will have a look.

Alex, does VFIO take care of this already?

 

  }
  
  static void deassign_guest_irq(struct kvm *kvm,
 @@ -151,7 +291,11 @@ static void deassign_host_irq(struct kvm *kvm,
pci_disable_msix(assigned_dev-dev);
} else {
/* Deal with MSI and INTx */
 -  disable_irq(assigned_dev-host_irq);
 +  if (assigned_dev-pci_2_3) {
 +  pci_2_3_irq_mask(assigned_dev-dev);
 +  synchronize_irq(assigned_dev-host_irq);
 +  } else
 +  disable_irq(assigned_dev-host_irq);
  
free_irq(assigned_dev-host_irq, (void *)assigned_dev);
  
 @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm *kvm,
  
pci_reset_function(assigned_dev-dev);
  
 +  /*
 +   * Unmask the IRQ at PCI level once the reset is done - the next user
 +   * may not expect the IRQ being masked.
 +   */
 +  if (assigned_dev-pci_2_3)
 +  pci_2_3_irq_unmask(assigned_dev-dev);
 +

 Doesn't pci_reset_function clear mask bit? It seems to ...

 I was left with non-functional devices for the host here if I was not
 doing this. Need to recheck, but I think it was required.
 
 Interesting. Could you check why please?
 

Can't reproduce anymore. This was early code, maybe affected by some
bits or buts that no longer exist.

Spec says it's cleared on reset, so I removed those lines now.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 19:40, Jan Kiszka wrote:
 @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm *kvm,
  
   pci_reset_function(assigned_dev-dev);
  
 + /*
 +  * Unmask the IRQ at PCI level once the reset is done - the next user
 +  * may not expect the IRQ being masked.
 +  */
 + if (assigned_dev-pci_2_3)
 + pci_2_3_irq_unmask(assigned_dev-dev);
 +

 Doesn't pci_reset_function clear mask bit? It seems to ...

 I was left with non-functional devices for the host here if I was not
 doing this. Need to recheck, but I think it was required.

 Interesting. Could you check why please?

 
 Can't reproduce anymore. This was early code, maybe affected by some
 bits or buts that no longer exist.
 
 Spec says it's cleared on reset, so I removed those lines now.
 

Hmpf, it just happened again: Guest was using my ath9k, I killed the
guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
config space after the reset, bringing the disable bit back?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 19:48, Jan Kiszka wrote:
 Am 02.11.2010 19:40, Jan Kiszka wrote:
 @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm 
 *kvm,
  
  pci_reset_function(assigned_dev-dev);
  
 +/*
 + * Unmask the IRQ at PCI level once the reset is done - the 
 next user
 + * may not expect the IRQ being masked.
 + */
 +if (assigned_dev-pci_2_3)
 +pci_2_3_irq_unmask(assigned_dev-dev);
 +

 Doesn't pci_reset_function clear mask bit? It seems to ...

 I was left with non-functional devices for the host here if I was not
 doing this. Need to recheck, but I think it was required.

 Interesting. Could you check why please?


 Can't reproduce anymore. This was early code, maybe affected by some
 bits or buts that no longer exist.

 Spec says it's cleared on reset, so I removed those lines now.

 
 Hmpf, it just happened again: Guest was using my ath9k, I killed the
 guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
 config space after the reset, bringing the disable bit back?

Or does the kernel cache the control word?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 07:40:46PM +0100, Jan Kiszka wrote:
 Am 02.11.2010 19:24, Michael S. Tsirkin wrote:
  On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
   dev-host_irq_disabled = false;
   }
  -spin_unlock(dev-intx_lock);
  +out:
  +spin_unlock_irq(dev-intx_lock);
  +
  +if (reassert)
  +kvm_set_irq(dev-kvm, dev-irq_source_id, 
  dev-guest_irq, 1);
 
  Hmm, I think this still has more overhead than it needs to have.
  Instead of setting level to 0 and then back to 1, can't we just
  avoid set to 1 in the first place? This would need a different
  interface than pci_2_3_irq_check_and_unmask to avoid a race
  where interrupt is received while we are acking another one:
 
block userspace access
check pending bit
if (!pending)
set irq (0)
clear pending
block userspace access
 
  Would be worth it for high volume devices.
 
  The problem is that we can't reorder guest IRQ line clearing and host
  IRQ line enabling without taking a lock across host IRQ disable + guest
  IRQ raise - and that is now distributed across hard and threaded IRQ
  handlers and we don't want to hold and IRQ-safe lock during kvm_set_irq.
  
  Oh I think I confused you.
  What I mean is:
  
  block userspace access
  check interrupt status bit
  if (!interrupt status bit set)
  set irq (0)
  clear interrupt disable bit
  block userspace access
  
  This way we enable interrupt after set irq so not need for
  extra locks I think.
 
 OK. Would require some serious refactoring again.

That would also mean we can't just solve the nested block/unblock
problem with a simple lock.  Not sure this is worth the effort.

 But what about edge IRQs? Don't we need to toggle the bit for them? And
 as we do not differentiate between level and edge, we currently have to
 do this unconditionally.

AFAIK PCI IRQs are level, so I don't think we need to bother.

  
  Hmm one thing I noticed is that pci_block_user_cfg_access
  will BUG_ON if it was already blocked. So I think we have
  a bug here when interrupt handler kicks in right after
  we unmask interrupts.
  
  Probably need some kind of lock to protect against this.
  
 
 Or an atomic counter.

BTW block userspace access uses a global spinlock which will likely hurt
us on multi-CPU. Switching that to something more SMP friendly, e.g. a
per-device spinlock, might be a good idea: I don't see why that lock and
queue are global.

 Will have a look.

Need to also consider an interrupt running in parallel
with unmasking in thread.

 Alex, does VFIO take care of this already?

VFIO does this all under spin_lock_irq.

  
 
   }
   
   static void deassign_guest_irq(struct kvm *kvm,
  @@ -151,7 +291,11 @@ static void deassign_host_irq(struct kvm *kvm,
   pci_disable_msix(assigned_dev-dev);
   } else {
   /* Deal with MSI and INTx */
  -disable_irq(assigned_dev-host_irq);
  +if (assigned_dev-pci_2_3) {
  +pci_2_3_irq_mask(assigned_dev-dev);
  +synchronize_irq(assigned_dev-host_irq);
  +} else
  +disable_irq(assigned_dev-host_irq);
   
   free_irq(assigned_dev-host_irq, (void *)assigned_dev);
   
  @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm 
  *kvm,
   
   pci_reset_function(assigned_dev-dev);
   
  +/*
  + * Unmask the IRQ at PCI level once the reset is done - the 
  next user
  + * may not expect the IRQ being masked.
  + */
  +if (assigned_dev-pci_2_3)
  +pci_2_3_irq_unmask(assigned_dev-dev);
  +
 
  Doesn't pci_reset_function clear mask bit? It seems to ...
 
  I was left with non-functional devices for the host here if I was not
  doing this. Need to recheck, but I think it was required.
  
  Interesting. Could you check why please?
  
 
 Can't reproduce anymore. This was early code, maybe affected by some
 bits or buts that no longer exist.
 
 Spec says it's cleared on reset, so I removed those lines now.
 
 Jan
 
 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 07:51:39PM +0100, Jan Kiszka wrote:
 Am 02.11.2010 19:48, Jan Kiszka wrote:
  Am 02.11.2010 19:40, Jan Kiszka wrote:
  @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm 
  *kvm,
   
 pci_reset_function(assigned_dev-dev);
   
  +  /*
  +   * Unmask the IRQ at PCI level once the reset is done - the 
  next user
  +   * may not expect the IRQ being masked.
  +   */
  +  if (assigned_dev-pci_2_3)
  +  pci_2_3_irq_unmask(assigned_dev-dev);
  +
 
  Doesn't pci_reset_function clear mask bit? It seems to ...
 
  I was left with non-functional devices for the host here if I was not
  doing this. Need to recheck, but I think it was required.
 
  Interesting. Could you check why please?
 
 
  Can't reproduce anymore. This was early code, maybe affected by some
  bits or buts that no longer exist.
 
  Spec says it's cleared on reset, so I removed those lines now.
 
  
  Hmpf, it just happened again: Guest was using my ath9k, I killed the
  guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
  config space after the reset, bringing the disable bit back?
 
 Or does the kernel cache the control word?
 
 Jan

Maybe it does, need to dig in drivers/pci. If it does this
might have other implications.

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 19:52, Michael S. Tsirkin wrote:
 On Tue, Nov 02, 2010 at 07:40:46PM +0100, Jan Kiszka wrote:
 Am 02.11.2010 19:24, Michael S. Tsirkin wrote:
 On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
  dev-host_irq_disabled = false;
  }
 -spin_unlock(dev-intx_lock);
 +out:
 +spin_unlock_irq(dev-intx_lock);
 +
 +if (reassert)
 +kvm_set_irq(dev-kvm, dev-irq_source_id, 
 dev-guest_irq, 1);

 Hmm, I think this still has more overhead than it needs to have.
 Instead of setting level to 0 and then back to 1, can't we just
 avoid set to 1 in the first place? This would need a different
 interface than pci_2_3_irq_check_and_unmask to avoid a race
 where interrupt is received while we are acking another one:

   block userspace access
   check pending bit
   if (!pending)
   set irq (0)
   clear pending
   block userspace access

 Would be worth it for high volume devices.

 The problem is that we can't reorder guest IRQ line clearing and host
 IRQ line enabling without taking a lock across host IRQ disable + guest
 IRQ raise - and that is now distributed across hard and threaded IRQ
 handlers and we don't want to hold and IRQ-safe lock during kvm_set_irq.

 Oh I think I confused you.
 What I mean is:

 block userspace access
 check interrupt status bit
 if (!interrupt status bit set)
 set irq (0)
 clear interrupt disable bit
 block userspace access

 This way we enable interrupt after set irq so not need for
 extra locks I think.

 OK. Would require some serious refactoring again.
 
 That would also mean we can't just solve the nested block/unblock
 problem with a simple lock.  Not sure this is worth the effort.

Can't follow: what can be nested and how?

 
 But what about edge IRQs? Don't we need to toggle the bit for them? And
 as we do not differentiate between level and edge, we currently have to
 do this unconditionally.
 
 AFAIK PCI IRQs are level, so I don't think we need to bother.

Ah, indeed.

 

 Hmm one thing I noticed is that pci_block_user_cfg_access
 will BUG_ON if it was already blocked. So I think we have
 a bug here when interrupt handler kicks in right after
 we unmask interrupts.

 Probably need some kind of lock to protect against this.


 Or an atomic counter.
 
 BTW block userspace access uses a global spinlock which will likely hurt
 us on multi-CPU. Switching that to something more SMP friendly, e.g. a
 per-device spinlock, might be a good idea: I don't see why that lock and
 queue are global.

Been through that code recently, hairy stuff. pci_lock also protects the
bus operation which can be overloaded (e.g. for error injection - if
that is not the only use case). So we need a per-bus lock, but that can
still cause contentions if devices on the same bus are handled on
different cpus.

I think the whole PCI config interface is simply not designed for
performance. It's considered a slow path, which it normally is.

 
 Will have a look.
 
 Need to also consider an interrupt running in parallel
 with unmasking in thread.
 
 Alex, does VFIO take care of this already?
 
 VFIO does this all under spin_lock_irq.

Everything has its pros and cons...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 08:11:31PM +0100, Jan Kiszka wrote:
 Am 02.11.2010 19:52, Michael S. Tsirkin wrote:
  On Tue, Nov 02, 2010 at 07:40:46PM +0100, Jan Kiszka wrote:
  Am 02.11.2010 19:24, Michael S. Tsirkin wrote:
  On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
 dev-host_irq_disabled = false;
 }
  -  spin_unlock(dev-intx_lock);
  +out:
  +  spin_unlock_irq(dev-intx_lock);
  +
  +  if (reassert)
  +  kvm_set_irq(dev-kvm, dev-irq_source_id, 
  dev-guest_irq, 1);
 
  Hmm, I think this still has more overhead than it needs to have.
  Instead of setting level to 0 and then back to 1, can't we just
  avoid set to 1 in the first place? This would need a different
  interface than pci_2_3_irq_check_and_unmask to avoid a race
  where interrupt is received while we are acking another one:
 
  block userspace access
  check pending bit
  if (!pending)
  set irq (0)
  clear pending
  block userspace access
 
  Would be worth it for high volume devices.
 
  The problem is that we can't reorder guest IRQ line clearing and host
  IRQ line enabling without taking a lock across host IRQ disable + guest
  IRQ raise - and that is now distributed across hard and threaded IRQ
  handlers and we don't want to hold and IRQ-safe lock during kvm_set_irq.
 
  Oh I think I confused you.
  What I mean is:
 
block userspace access
check interrupt status bit
if (!interrupt status bit set)
set irq (0)
clear interrupt disable bit
block userspace access
 
  This way we enable interrupt after set irq so not need for
  extra locks I think.
 
  OK. Would require some serious refactoring again.
  
  That would also mean we can't just solve the nested block/unblock
  problem with a simple lock.  Not sure this is worth the effort.
 
 Can't follow: what can be nested and how?

I just mean interrupt trying to block userspace when thread
did that already.

  
  But what about edge IRQs? Don't we need to toggle the bit for them? And
  as we do not differentiate between level and edge, we currently have to
  do this unconditionally.
  
  AFAIK PCI IRQs are level, so I don't think we need to bother.
 
 Ah, indeed.
 
  
 
  Hmm one thing I noticed is that pci_block_user_cfg_access
  will BUG_ON if it was already blocked. So I think we have
  a bug here when interrupt handler kicks in right after
  we unmask interrupts.
 
  Probably need some kind of lock to protect against this.
 
 
  Or an atomic counter.
  
  BTW block userspace access uses a global spinlock which will likely hurt
  us on multi-CPU. Switching that to something more SMP friendly, e.g. a
  per-device spinlock, might be a good idea: I don't see why that lock and
  queue are global.
 
 Been through that code recently, hairy stuff. pci_lock also protects the
 bus operation which can be overloaded (e.g. for error injection - if
 that is not the only use case). So we need a per-bus lock, but that can
 still cause contentions if devices on the same bus are handled on
 different cpus.

Right. So this lock got reused for blocking userspace, I do not suggest
we rip it all out, just make userspace blocking use
a finer-grained lock.

 I think the whole PCI config interface is simply not designed for
 performance. It's considered a slow path, which it normally is.

So I guess we'll need to fix that now, at least if we
want to make the 2.3 way the default.

  
  Will have a look.
  
  Need to also consider an interrupt running in parallel
  with unmasking in thread.
  
  Alex, does VFIO take care of this already?
  
  VFIO does this all under spin_lock_irq.
 
 Everything has its pros and cons...
 
 Jan
 
 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 19:54, Michael S. Tsirkin wrote:
 On Tue, Nov 02, 2010 at 07:51:39PM +0100, Jan Kiszka wrote:
 Am 02.11.2010 19:48, Jan Kiszka wrote:
 Am 02.11.2010 19:40, Jan Kiszka wrote:
 @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm 
 *kvm,
  
pci_reset_function(assigned_dev-dev);
  
 +  /*
 +   * Unmask the IRQ at PCI level once the reset is done - the 
 next user
 +   * may not expect the IRQ being masked.
 +   */
 +  if (assigned_dev-pci_2_3)
 +  pci_2_3_irq_unmask(assigned_dev-dev);
 +

 Doesn't pci_reset_function clear mask bit? It seems to ...

 I was left with non-functional devices for the host here if I was not
 doing this. Need to recheck, but I think it was required.

 Interesting. Could you check why please?


 Can't reproduce anymore. This was early code, maybe affected by some
 bits or buts that no longer exist.

 Spec says it's cleared on reset, so I removed those lines now.


 Hmpf, it just happened again: Guest was using my ath9k, I killed the
 guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
 config space after the reset, bringing the disable bit back?

 Or does the kernel cache the control word?

 Jan
 
 Maybe it does, need to dig in drivers/pci. If it does this
 might have other implications.

OK, that mystery is resolved now: pci_reset_function saves  restores
the device state.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Alex Williamson
On Tue, 2010-11-02 at 19:40 +0100, Jan Kiszka wrote:
 Am 02.11.2010 19:24, Michael S. Tsirkin wrote:
  On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
   dev-host_irq_disabled = false;
   }
  -spin_unlock(dev-intx_lock);
  +out:
  +spin_unlock_irq(dev-intx_lock);
  +
  +if (reassert)
  +kvm_set_irq(dev-kvm, dev-irq_source_id, 
  dev-guest_irq, 1);
 
  Hmm, I think this still has more overhead than it needs to have.
  Instead of setting level to 0 and then back to 1, can't we just
  avoid set to 1 in the first place? This would need a different
  interface than pci_2_3_irq_check_and_unmask to avoid a race
  where interrupt is received while we are acking another one:
 
block userspace access
check pending bit
if (!pending)
set irq (0)
clear pending
block userspace access
 
  Would be worth it for high volume devices.
 
  The problem is that we can't reorder guest IRQ line clearing and host
  IRQ line enabling without taking a lock across host IRQ disable + guest
  IRQ raise - and that is now distributed across hard and threaded IRQ
  handlers and we don't want to hold and IRQ-safe lock during kvm_set_irq.
  
  Oh I think I confused you.
  What I mean is:
  
  block userspace access
  check interrupt status bit
  if (!interrupt status bit set)
  set irq (0)
  clear interrupt disable bit
  block userspace access
  
  This way we enable interrupt after set irq so not need for
  extra locks I think.
 
 OK. Would require some serious refactoring again.
 
 But what about edge IRQs? Don't we need to toggle the bit for them? And
 as we do not differentiate between level and edge, we currently have to
 do this unconditionally.
 
  
  Hmm one thing I noticed is that pci_block_user_cfg_access
  will BUG_ON if it was already blocked. So I think we have
  a bug here when interrupt handler kicks in right after
  we unmask interrupts.
  
  Probably need some kind of lock to protect against this.
  
 
 Or an atomic counter. Will have a look.
 
 Alex, does VFIO take care of this already?

Yes, VFIO has a lock used by the interrupt handler and the EOI handler
that prevents them from both blocking user cfg access at the same time.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 08:30:25PM +0100, Jan Kiszka wrote:
 Am 02.11.2010 19:54, Michael S. Tsirkin wrote:
  On Tue, Nov 02, 2010 at 07:51:39PM +0100, Jan Kiszka wrote:
  Am 02.11.2010 19:48, Jan Kiszka wrote:
  Am 02.11.2010 19:40, Jan Kiszka wrote:
  @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm 
  *kvm,
   
   pci_reset_function(assigned_dev-dev);
   
  +/*
  + * Unmask the IRQ at PCI level once the reset is done - the 
  next user
  + * may not expect the IRQ being masked.
  + */
  +if (assigned_dev-pci_2_3)
  +pci_2_3_irq_unmask(assigned_dev-dev);
  +
 
  Doesn't pci_reset_function clear mask bit? It seems to ...
 
  I was left with non-functional devices for the host here if I was not
  doing this. Need to recheck, but I think it was required.
 
  Interesting. Could you check why please?
 
 
  Can't reproduce anymore. This was early code, maybe affected by some
  bits or buts that no longer exist.
 
  Spec says it's cleared on reset, so I removed those lines now.
 
 
  Hmpf, it just happened again: Guest was using my ath9k, I killed the
  guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
  config space after the reset, bringing the disable bit back?
 
  Or does the kernel cache the control word?
 
  Jan
  
  Maybe it does, need to dig in drivers/pci. If it does this
  might have other implications.
 
 OK, that mystery is resolved now: pci_reset_function saves  restores
 the device state.
 
 Jan

Aha. I wonder what other state we need to clear.

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 20:14, Michael S. Tsirkin wrote:
 BTW block userspace access uses a global spinlock which will likely hurt
 us on multi-CPU. Switching that to something more SMP friendly, e.g. a
 per-device spinlock, might be a good idea: I don't see why that lock and
 queue are global.

 Been through that code recently, hairy stuff. pci_lock also protects the
 bus operation which can be overloaded (e.g. for error injection - if
 that is not the only use case). So we need a per-bus lock, but that can
 still cause contentions if devices on the same bus are handled on
 different cpus.
 
 Right. So this lock got reused for blocking userspace, I do not suggest
 we rip it all out, just make userspace blocking use
 a finer-grained lock.
 
 I think the whole PCI config interface is simply not designed for
 performance. It's considered a slow path, which it normally is.
 
 So I guess we'll need to fix that now, at least if we
 want to make the 2.3 way the default.
 

On many systems (those with direct PCI config access), there is
another lock down the road: pci_config_lock. That can't be broken up as
the protected resource is unique. So we do not gain much improving the
higher-level lock.

BTW, accessing the interrupt controller for IRQ line fiddling is not a
per-device thing either. So as long as the latency of the code under the
locks is not significantly worse with PCI-level masking, we don't lose
scalability here.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Jan Kiszka
Am 02.11.2010 20:53, Michael S. Tsirkin wrote:
 On Tue, Nov 02, 2010 at 08:30:25PM +0100, Jan Kiszka wrote:
 Am 02.11.2010 19:54, Michael S. Tsirkin wrote:
 On Tue, Nov 02, 2010 at 07:51:39PM +0100, Jan Kiszka wrote:
 Am 02.11.2010 19:48, Jan Kiszka wrote:
 Am 02.11.2010 19:40, Jan Kiszka wrote:
 @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct kvm 
 *kvm,
  
  pci_reset_function(assigned_dev-dev);
  
 +/*
 + * Unmask the IRQ at PCI level once the reset is done - the 
 next user
 + * may not expect the IRQ being masked.
 + */
 +if (assigned_dev-pci_2_3)
 +pci_2_3_irq_unmask(assigned_dev-dev);
 +

 Doesn't pci_reset_function clear mask bit? It seems to ...

 I was left with non-functional devices for the host here if I was not
 doing this. Need to recheck, but I think it was required.

 Interesting. Could you check why please?


 Can't reproduce anymore. This was early code, maybe affected by some
 bits or buts that no longer exist.

 Spec says it's cleared on reset, so I removed those lines now.


 Hmpf, it just happened again: Guest was using my ath9k, I killed the
 guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
 config space after the reset, bringing the disable bit back?

 Or does the kernel cache the control word?

 Jan

 Maybe it does, need to dig in drivers/pci. If it does this
 might have other implications.

 OK, that mystery is resolved now: pci_reset_function saves  restores
 the device state.

 Jan
 
 Aha. I wonder what other state we need to clear.
 

Maybe just save/restore before/after assigning the device?

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 4/4] KVM: Allow host IRQ sharing for passed-through PCI 2.3 devices

2010-11-02 Thread Michael S. Tsirkin
On Tue, Nov 02, 2010 at 08:58:36PM +0100, Jan Kiszka wrote:
 Am 02.11.2010 20:53, Michael S. Tsirkin wrote:
  On Tue, Nov 02, 2010 at 08:30:25PM +0100, Jan Kiszka wrote:
  Am 02.11.2010 19:54, Michael S. Tsirkin wrote:
  On Tue, Nov 02, 2010 at 07:51:39PM +0100, Jan Kiszka wrote:
  Am 02.11.2010 19:48, Jan Kiszka wrote:
  Am 02.11.2010 19:40, Jan Kiszka wrote:
  @@ -199,6 +343,13 @@ static void kvm_free_assigned_device(struct 
  kvm *kvm,
   
 pci_reset_function(assigned_dev-dev);
   
  +  /*
  +   * Unmask the IRQ at PCI level once the reset is done - the 
  next user
  +   * may not expect the IRQ being masked.
  +   */
  +  if (assigned_dev-pci_2_3)
  +  pci_2_3_irq_unmask(assigned_dev-dev);
  +
 
  Doesn't pci_reset_function clear mask bit? It seems to ...
 
  I was left with non-functional devices for the host here if I was not
  doing this. Need to recheck, but I think it was required.
 
  Interesting. Could you check why please?
 
 
  Can't reproduce anymore. This was early code, maybe affected by some
  bits or buts that no longer exist.
 
  Spec says it's cleared on reset, so I removed those lines now.
 
 
  Hmpf, it just happened again: Guest was using my ath9k, I killed the
  guest, lspci says DisINTx+. Strange. Is anyone (qemu) restoring the
  config space after the reset, bringing the disable bit back?
 
  Or does the kernel cache the control word?
 
  Jan
 
  Maybe it does, need to dig in drivers/pci. If it does this
  might have other implications.
 
  OK, that mystery is resolved now: pci_reset_function saves  restores
  the device state.
 
  Jan
  
  Aha. I wonder what other state we need to clear.
  
 
 Maybe just save/restore before/after assigning the device?
 
 Jan
 

Yea. Sounds good.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html