Re: [RFC PATCH 1/1] vfio-pci/iommu: Detach iommu group on remove path

2015-07-23 Thread Gerald Schaefer
On Wed, 22 Jul 2015 10:54:35 -0600
Alex Williamson alex.william...@redhat.com wrote:

 On Tue, 2015-07-21 at 19:44 +0200, Gerald Schaefer wrote:
  When a user completes the VFIO_SET_IOMMU ioctl and the vfio-pci
  device is removed thereafter (before any other ioctl like
  VFIO_GROUP_GET_DEVICE_FD), then the detach_dev callback of the
  underlying IOMMU API is never called.
  
  This patch adds a call to vfio_group_try_dissolve_container() to
  the remove path, which will trigger the missing detach_dev callback
  in this scenario.
  
  Signed-off-by: Gerald Schaefer gerald.schae...@de.ibm.com
  ---
   drivers/vfio/vfio.c | 3 +++
   1 file changed, 3 insertions(+)
  
  diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
  index 2fb29df..9c5c784 100644
  --- a/drivers/vfio/vfio.c
  +++ b/drivers/vfio/vfio.c
  @@ -711,6 +711,8 @@ static bool vfio_dev_present(struct vfio_group
  *group, struct device *dev) return true;
   }
   
  +static void vfio_group_try_dissolve_container(struct vfio_group
  *group); +
   /*
* Decrement the device reference count and wait for the device to
  be
* removed.  Open file descriptors for the device... */
  @@ -785,6 +787,7 @@ void *vfio_del_group_dev(struct device *dev)
  }
  } while (ret = 0);
   
  +   vfio_group_try_dissolve_container(group);
  vfio_group_put(group);
   
  return device_data;
 
 
 This won't work, vfio_group_try_dissolve_container() decrements
 container_users, which an unused device is not.  Imagine if we had
 more than one device in the iommu group, one device is removed and the
 container is dissolved despite the user holding a reference and other
 viable devices remaining.  Additionally, from an isolation
 perspective, an unbind from vfio-pci should not pull the device out
 of the iommu domain, it's part of the domain because it's not
 isolated and that continues even after unbind.
 
 I think what you want to do is detach a device from the iommu domain
 only when it's being removed from iommu group, such as through
 iommu_group_remove_device().  We already have a bit of an asymmetry
 there as iommu_group_add_device() will add devices to the currently
 active iommu domain for the group, but iommu_group_remove_device()
 does not appear to do the reverse.  Thanks,

Interesting, I haven't noticed this asymmetry so far, do you mean
something like this:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f286090..82ac8b3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -447,6 +447,9 @@ rename:
 }
 EXPORT_SYMBOL_GPL(iommu_group_add_device);
 
+static void __iommu_detach_device(struct iommu_domain *domain,
+ struct device *dev);
+
 /**
  * iommu_group_remove_device - remove a device from it's current group
  * @dev: device to be removed
@@ -466,6 +469,8 @@ void iommu_group_remove_device(struct device *dev)
 IOMMU_GROUP_NOTIFY_DEL_DEVICE,
dev); 
mutex_lock(group-mutex);
+   if (group-domain)
+   __iommu_detach_device(group-domain, dev);
list_for_each_entry(tmp_device, group-devices, list) {
if (tmp_device-dev == dev) {
device = tmp_device;

This would also fix the issue in my scenario, but like before that
doesn't need to mean it is the correct fix. Adding the iommu list and
maintainer to cc.

Joerg, what do you think? (see https://lkml.org/lkml/2015/7/21/635 for
the problem description)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 1/1] vfio-pci/iommu: Detach iommu group on remove path

2015-07-23 Thread Gerald Schaefer
On Wed, 22 Jul 2015 11:10:57 -0600
Alex Williamson alex.william...@redhat.com wrote:

 On Wed, 2015-07-22 at 10:54 -0600, Alex Williamson wrote:
  On Tue, 2015-07-21 at 19:44 +0200, Gerald Schaefer wrote:
   When a user completes the VFIO_SET_IOMMU ioctl and the vfio-pci
   device is removed thereafter (before any other ioctl like
   VFIO_GROUP_GET_DEVICE_FD), then the detach_dev callback of the
   underlying IOMMU API is never called.
   
   This patch adds a call to vfio_group_try_dissolve_container() to
   the remove path, which will trigger the missing detach_dev
   callback in this scenario.
   
   Signed-off-by: Gerald Schaefer gerald.schae...@de.ibm.com
   ---
drivers/vfio/vfio.c | 3 +++
1 file changed, 3 insertions(+)
   
   diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
   index 2fb29df..9c5c784 100644
   --- a/drivers/vfio/vfio.c
   +++ b/drivers/vfio/vfio.c
   @@ -711,6 +711,8 @@ static bool vfio_dev_present(struct
   vfio_group *group, struct device *dev) return true;
}

   +static void vfio_group_try_dissolve_container(struct vfio_group
   *group); +
/*
 * Decrement the device reference count and wait for the device
   to be
 * removed.  Open file descriptors for the device... */
   @@ -785,6 +787,7 @@ void *vfio_del_group_dev(struct device *dev)
 }
 } while (ret = 0);

   + vfio_group_try_dissolve_container(group);
 vfio_group_put(group);

 return device_data;
  
  
  This won't work, vfio_group_try_dissolve_container() decrements
  container_users, which an unused device is not.  Imagine if we had
  more than one device in the iommu group, one device is removed and
  the container is dissolved despite the user holding a reference and
  other viable devices remaining.  Additionally, from an isolation
  perspective, an unbind from vfio-pci should not pull the device out
  of the iommu domain, it's part of the domain because it's not
  isolated and that continues even after unbind.
  
  I think what you want to do is detach a device from the iommu domain
  only when it's being removed from iommu group, such as through
  iommu_group_remove_device().  We already have a bit of an asymmetry
  there as iommu_group_add_device() will add devices to the currently
  active iommu domain for the group, but iommu_group_remove_device()
  does not appear to do the reverse.  Thanks,
 
 BTW, VT-d on x86 avoids a leak using its own notifier_block,
 drivers/iommu/intel-iommu.c:device_notifier() catches
 BUS_NOTIFY_REMOVED_DEVICE and removes the device from the domain (the
 domain_exit() there is only used for non-IOMMU-API domains).  It's
 possible that's the only IOMMU driver that avoids a leak due to the
 scenario you describe.  Thanks,

Thanks, that's good to know, so as a last resort I could also use the
notifier to work around the issue. But x86 seems to be the only arch
using this notifier so far, so a general fix would be nice.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/6] iommu: add ARM short descriptor page table allocator.

2015-07-23 Thread Yong Wu
Hi Will,
Thanks for your review so detail.
When you are free, please help me check whether it's ok if it's
changed like below.
Thanks very much.

On Tue, 2015-07-21 at 18:11 +0100, Will Deacon wrote:
 Hello,
 
 This is looking better, but I still have some concerns.
 
 On Thu, Jul 16, 2015 at 10:04:32AM +0100, Yong Wu wrote:
  This patch is for ARM Short Descriptor Format.
  
  Signed-off-by: Yong Wu yong...@mediatek.com
  ---
   drivers/iommu/Kconfig|   18 +
   drivers/iommu/Makefile   |1 +
   drivers/iommu/io-pgtable-arm-short.c |  742 
  ++
   drivers/iommu/io-pgtable-arm.c   |3 -
   drivers/iommu/io-pgtable.c   |4 +
   drivers/iommu/io-pgtable.h   |   13 +
   6 files changed, 778 insertions(+), 3 deletions(-)
   create mode 100644 drivers/iommu/io-pgtable-arm-short.c
 
 [...]
 
  +#define ARM_SHORT_PGDIR_SHIFT  20
  +#define ARM_SHORT_PAGE_SHIFT   12
  +#define ARM_SHORT_PTRS_PER_PTE \
  +   (1  (ARM_SHORT_PGDIR_SHIFT - ARM_SHORT_PAGE_SHIFT))
  +#define ARM_SHORT_BYTES_PER_PTE\
  +   (ARM_SHORT_PTRS_PER_PTE * sizeof(arm_short_iopte))
  +
  +/* level 1 pagetable */
  +#define ARM_SHORT_PGD_TYPE_PGTABLE BIT(0)
  +#define ARM_SHORT_PGD_SECTION_XN   (BIT(0) | BIT(4))
  +#define ARM_SHORT_PGD_TYPE_SECTION BIT(1)
  +#define ARM_SHORT_PGD_PGTABLE_XN   BIT(2)
 
 This should be PXN, but I'm not even sure why we care for a table at
 level 1.

I will delete it.

 
  +#define ARM_SHORT_PGD_BBIT(2)
  +#define ARM_SHORT_PGD_CBIT(3)
  +#define ARM_SHORT_PGD_PGTABLE_NS   BIT(3)
  +#define ARM_SHORT_PGD_IMPLEBIT(9)
  +#define ARM_SHORT_PGD_TEX0 BIT(12)
  +#define ARM_SHORT_PGD_SBIT(16)
  +#define ARM_SHORT_PGD_nG   BIT(17)
  +#define ARM_SHORT_PGD_SUPERSECTION BIT(18)
  +#define ARM_SHORT_PGD_SECTION_NS   BIT(19)
  +
  +#define ARM_SHORT_PGD_TYPE_SUPERSECTION\
  +   (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION)
  +#define ARM_SHORT_PGD_SECTION_TYPE_MSK \
  +   (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION)
  +#define ARM_SHORT_PGD_PGTABLE_TYPE_MSK \
  +   (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_TYPE_PGTABLE)
  +#define ARM_SHORT_PGD_TYPE_IS_PGTABLE(pgd) \
  +   (((pgd)  ARM_SHORT_PGD_PGTABLE_TYPE_MSK) == 
  ARM_SHORT_PGD_TYPE_PGTABLE)
  +#define ARM_SHORT_PGD_TYPE_IS_SECTION(pgd) \
  +   (((pgd)  ARM_SHORT_PGD_SECTION_TYPE_MSK) == 
  ARM_SHORT_PGD_TYPE_SECTION)
  +#define ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(pgd)\
  +   (((pgd)  ARM_SHORT_PGD_SECTION_TYPE_MSK) == \
  +   ARM_SHORT_PGD_TYPE_SUPERSECTION)
  +#define ARM_SHORT_PGD_PGTABLE_MSK  0xfc00
  +#define ARM_SHORT_PGD_SECTION_MSK  (~(SZ_1M - 1))
  +#define ARM_SHORT_PGD_SUPERSECTION_MSK (~(SZ_16M - 1))
  +
  +/* level 2 pagetable */
  +#define ARM_SHORT_PTE_TYPE_LARGE   BIT(0)
  +#define ARM_SHORT_PTE_SMALL_XN BIT(0)
  +#define ARM_SHORT_PTE_TYPE_SMALL   BIT(1)
  +#define ARM_SHORT_PTE_BBIT(2)
  +#define ARM_SHORT_PTE_CBIT(3)
  +#define ARM_SHORT_PTE_SMALL_TEX0   BIT(6)
  +#define ARM_SHORT_PTE_IMPLEBIT(9)
 
 This is AP[2] for small pages.

Sorry, In our pagetable bit9 in PGD and PTE is PA[32] that is for  the
dram size over 4G. I didn't care it is different in PTE of the standard
spec.
And I don't use the AP[2] currently, so I only delete this line in next
time.

 
  +#define ARM_SHORT_PTE_SBIT(10)
  +#define ARM_SHORT_PTE_nG   BIT(11)
  +#define ARM_SHORT_PTE_LARGE_TEX0   BIT(12)
  +#define ARM_SHORT_PTE_LARGE_XN BIT(15)
  +#define ARM_SHORT_PTE_LARGE_MSK(~(SZ_64K - 1))
  +#define ARM_SHORT_PTE_SMALL_MSK(~(SZ_4K - 1))
  +#define ARM_SHORT_PTE_TYPE_MSK \
  +   (ARM_SHORT_PTE_TYPE_LARGE | ARM_SHORT_PTE_TYPE_SMALL)
  +#define ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(pte)   \
  +   (pte)  ARM_SHORT_PTE_TYPE_MSK)  1)  1)\
  +   == ARM_SHORT_PTE_TYPE_SMALL)
 
 I see what you're trying to do here, but the shifting is confusing. I
 think it's better doing something like:
 
   ((pte)  ARM_SHORT_PTE_TYPE_SMALL) == ARM_SHORT_PTE_TYPE_SMALL
 
 or even just:
 
   (pte)  ARM_SHORT_PTE_TYPE_SMALL

Thanks. I will use this:
((pte)  ARM_SHORT_PTE_TYPE_SMALL) == ARM_SHORT_PTE_TYPE_SMALL

This line may be close to the ARM_SHORT_PTE_TYPE_IS_LARGEPAGE below.
 
  +#define ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(pte)   \
  +   (((pte)  

Re: [PATCH v3 5/6] iommu/mediatek: Add mt8173 IOMMU driver

2015-07-23 Thread Yong Wu
On Tue, 2015-07-21 at 15:59 +0100, Will Deacon wrote:
 Hi Yong Wu,
 
 On Thu, Jul 16, 2015 at 10:04:34AM +0100, Yong Wu wrote:
  This patch adds support for mediatek m4u (MultiMedia Memory Management
  Unit).
 
 [...]
 
  +static void mtk_iommu_tlb_flush_all(void *cookie)
  +{
  +   struct mtk_iommu_domain *domain = cookie;
  +   void __iomem *base;
  +
  +   base = domain-data-base;
  +   writel(F_INVLD_EN1 | F_INVLD_EN0, base + REG_MMU_INV_SEL);
  +   writel(F_ALL_INVLD, base + REG_MMU_INVALIDATE);
 
 This needs to be synchronous, so you probably want to call
 mtk_iommu_tlb_sync at the end.

From our spec, we have to wait until HW done after tlb flush range.
But it don't need wait after tlb flush all.
so It isn't necessary to add mtk_iommu_tlb_sync in tlb_flush_all here.

 
  +}
  +
  +static void mtk_iommu_tlb_add_flush(unsigned long iova, size_t size,
  +   bool leaf, void *cookie)
  +{
  +   struct mtk_iommu_domain *domain = cookie;
  +   void __iomem *base = domain-data-base;
  +   unsigned int iova_start = iova, iova_end = iova + size - 1;
  +
  +   writel(F_INVLD_EN1 | F_INVLD_EN0, base + REG_MMU_INV_SEL);
  +
  +   writel(iova_start, base + REG_MMU_INVLD_START_A);
  +   writel(iova_end, base + REG_MMU_INVLD_END_A);
  +   writel(F_MMU_INV_RANGE, base + REG_MMU_INVALIDATE);
 
 Why are you using writel instead of writel_relaxed? I asked this before
 but I don't think you replied.

Sorry, I didn't notice _relax last time. I will improve in next version.
Thanks very much.

 
 Will


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/omap: Add support for configuring dsp iommus on DRA7xx

2015-07-23 Thread Tony Lindgren
* Suman Anna s-a...@ti.com [150723 09:25]:
 Hi Tony,
 
 On 07/23/2015 02:24 AM, Tony Lindgren wrote:
  * Suman Anna s-a...@ti.com [150722 09:25]:
  On 07/22/2015 12:26 AM, Tony Lindgren wrote:
 
  I don't like using syscon for tinkering directly with SoC registers.
 
  This is not a SoC-level register, but a register within a sub-module of
  the DSP processor sub-system. The DSP_SYSTEM sub-module in general is
  described in Section 5.3.3 of the TRM [1], and it implements different
  functionalities like the PRCM handshaking, wakeup logic and DSP
  subsystem top-level configuration. It is a module present within the DSP
  processor sub-system, so can only be accessed when the sub-system is
  clocked and the appropriate reset is released.
  
  OK so if it's specific to the DSP driver along the lines of sysc and
  syss registers.
 
 There will be those registers too within the MMU config register space,
 even for DRA7xx MMUs. This is different, think of it like a register in
 the Control module except that it is present within the DSP sub-system
 instead of at the SoC level.

And what is taking care of pm_runtime_get here to ensure the module
is powered and clocked?

I think you are missing a layer here and it's the Linux kernel side
device driver for DSP that initializes things.
 
  Typically we handle these registers by mapping them to the PM runtime
  functions for the interconnect so we can reset and idle the hardware
  modules even if there is no driver, see for example
  omap54xx_mmu_dsp_hwmod.
 
 I haven't yet submitted the DRA7xx hwmods, but they will look almost
 exactly like the OMAP5 ones. The reset and idle on these are in general
 not effective at boot time since these are also controlled by a
 hard-reset line, so that's left to the drivers to deal with it through
 the omap_device_deassert_hardreset API.

If the MMU configuration is one time init, it may make sense to add
it to the hwmod reset function. However, if the Linux kernel side
driver needs to configure things depending on the DSP firmware, it
should be done in the kernel side device driver. 
 
  We should use some Linux generic framework for configuring these
  bits to avoid nasty dependencies between various hardware modules
  on the SoC.
 
  What does DSP_SYS_MMU_CONFIG register do? It seems it's probably
  a regulator or a gate clock? If so, it should be set up as a
  regulator or a clock and then the omap-iommu driver can just
  use regulator or clcok framework to request the resource.
 
  No, its neither. It is a control bit that dictates whether the
  processor/EDMA addresses go through the respective MMU or not. The
  register currently has 4 bits (bit 0 in each nibble), one each for
  enabling each MMU and requesting an MMU abort on each. The MMU
  integration and enablement notes are detailed in Section 5.3.6 of the
  TRM [1], and the DSP_SYS_MMU_CONFIG register layout is in Table 5-28
  (Page 1641).
  
  OK yeah seems like it should be handled by the DSP driver during
  probe after doing pm_runtime_get. Then the driver can configure
  things like IOMMU and load firmware. Or am I missing something here?
 
 The DSP (remoteproc) driver uses the generic IOMMU API, and all the
 IOMMU enabling sequence is transparent to the DSP driver. So, any
 configuration/enabling sequence specific to the IOMMU has to be handled
 within the respective IOMMU platform driver that gets integrated into
 the IOMMU API.

To me it seems you still need a DSP kernel driver to manage the
DSP hardware for PM runtime. I don't see how iommu and remoteproc
alone would be sufficient here. You end up sprinkling DSP driver
specific code to the iommu and remoteproc layers.

Regards,

Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/omap: Add support for configuring dsp iommus on DRA7xx

2015-07-23 Thread Suman Anna
Hi Tony,

On 07/23/2015 02:24 AM, Tony Lindgren wrote:
 * Suman Anna s-a...@ti.com [150722 09:25]:
 On 07/22/2015 12:26 AM, Tony Lindgren wrote:

 I don't like using syscon for tinkering directly with SoC registers.

 This is not a SoC-level register, but a register within a sub-module of
 the DSP processor sub-system. The DSP_SYSTEM sub-module in general is
 described in Section 5.3.3 of the TRM [1], and it implements different
 functionalities like the PRCM handshaking, wakeup logic and DSP
 subsystem top-level configuration. It is a module present within the DSP
 processor sub-system, so can only be accessed when the sub-system is
 clocked and the appropriate reset is released.
 
 OK so if it's specific to the DSP driver along the lines of sysc and
 syss registers.

There will be those registers too within the MMU config register space,
even for DRA7xx MMUs. This is different, think of it like a register in
the Control module except that it is present within the DSP sub-system
instead of at the SoC level.

 
 Typically we handle these registers by mapping them to the PM runtime
 functions for the interconnect so we can reset and idle the hardware
 modules even if there is no driver, see for example
 omap54xx_mmu_dsp_hwmod.

I haven't yet submitted the DRA7xx hwmods, but they will look almost
exactly like the OMAP5 ones. The reset and idle on these are in general
not effective at boot time since these are also controlled by a
hard-reset line, so that's left to the drivers to deal with it through
the omap_device_deassert_hardreset API.

  
 We should use some Linux generic framework for configuring these
 bits to avoid nasty dependencies between various hardware modules
 on the SoC.

 What does DSP_SYS_MMU_CONFIG register do? It seems it's probably
 a regulator or a gate clock? If so, it should be set up as a
 regulator or a clock and then the omap-iommu driver can just
 use regulator or clcok framework to request the resource.

 No, its neither. It is a control bit that dictates whether the
 processor/EDMA addresses go through the respective MMU or not. The
 register currently has 4 bits (bit 0 in each nibble), one each for
 enabling each MMU and requesting an MMU abort on each. The MMU
 integration and enablement notes are detailed in Section 5.3.6 of the
 TRM [1], and the DSP_SYS_MMU_CONFIG register layout is in Table 5-28
 (Page 1641).
 
 OK yeah seems like it should be handled by the DSP driver during
 probe after doing pm_runtime_get. Then the driver can configure
 things like IOMMU and load firmware. Or am I missing something here?

The DSP (remoteproc) driver uses the generic IOMMU API, and all the
IOMMU enabling sequence is transparent to the DSP driver. So, any
configuration/enabling sequence specific to the IOMMU has to be handled
within the respective IOMMU platform driver that gets integrated into
the IOMMU API.

regards
Suman

 
  
 [1] http://www.ti.com/lit/ug/spruhz6/spruhz6.pdf

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/3] Generic PCI MSI + IOMMU topology bindings

2015-07-23 Thread Mark Rutland
Hi all,

Currently we have no generic/standard mechanisms for describing the
relationship between PCI root complexes and other components which may be
required to make them usable, specifically IOMMUs and MSI controllers.

There is an existing binding for IOMMUs, and there is a de-facto standard for
referring to MSI controllers, but the generic portion of these can only
describe a relationship with a root complex as opposed to a device under a root
complex. This falls apart where IOMMUs and MSI controllers may distinguish
individual devices based on non-probeable information.

This series adds a generic glue bindings for describing the relationship
between root complexes, IOMMUs, and MSI controllers. The existing de-facto
binding for MSI controllers is formalised, along with a (backwards compatible)
extension necessary for describing contemporary MSI controllers which make use
of (non-probeable) sideband data.

Thanks,
Mark.

Mark Rutland (3):
  Docs: dt: add generic MSI bindings
  Docs: dt: Add PCI MSI map bindings
  Docs: dt: add PCI IOMMU map bindings

 .../bindings/interrupt-controller/msi.txt  | 135 +
 .../devicetree/bindings/pci/pci-iommu.txt  | 171 
 Documentation/devicetree/bindings/pci/pci-msi.txt  | 220 +
 3 files changed, 526 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/interrupt-controller/msi.txt
 create mode 100644 Documentation/devicetree/bindings/pci/pci-iommu.txt
 create mode 100644 Documentation/devicetree/bindings/pci/pci-msi.txt

-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/3] Docs: dt: Add PCI MSI map bindings

2015-07-23 Thread Mark Rutland
Currently msi-parent is used by a few bindings to describe the
relationship between a PCI root complex and a single MSI controller, but
this property does not have a generic binding document.

Additionally, msi-parent is insufficient to describe more complex
relationships between MSI controllers and devices under a root complex,
where devices may be able to target multiple MSI controllers, or where
MSI controllers use (non-probeable) sideband information to distinguish
devices.

This patch adds a generic binding for mapping PCI devices to MSI
controllers. This document covers msi-parent, and a new msi-map property
(specific to PCI*) which may be used to map devices (identified by their
Requester ID) to sideband data for each MSI controller that they may
target.

Signed-off-by: Mark Rutland mark.rutl...@arm.com
---
 Documentation/devicetree/bindings/pci/pci-msi.txt | 220 ++
 1 file changed, 220 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/pci-msi.txt

diff --git a/Documentation/devicetree/bindings/pci/pci-msi.txt 
b/Documentation/devicetree/bindings/pci/pci-msi.txt
new file mode 100644
index 000..9b3cc81
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/pci-msi.txt
@@ -0,0 +1,220 @@
+This document describes the generic device tree binding for describing the
+relationship between PCI devices and MSI controllers.
+
+Each PCI device under a root complex is uniquely identified by its Requester ID
+(AKA RID). A Requester ID is a triplet of a Bus number, Device number, and
+Function number.
+
+For the purpose of this document, when treated as a numeric value, a RID is
+formatted such that:
+
+* Bits [15:8] are the Bus number.
+* Bits [7:3] are the Device number.
+* Bits [2:0] are the Function number.
+* Any other bits required for padding must be zero.
+
+MSIs may be distinguished in part through the use of sideband data accompanying
+writes. In the case of PCI devices, this sideband data may be derived from the
+Requester ID. A mechanism is required to associate a device with both the MSI
+controllers it can address, and the sideband data that will be associated with
+its writes to those controllers.
+
+For generic MSI bindings, see
+Documentation/devicetree/bindings/interrupt-controller/msi.txt.
+
+
+PCI root complex
+
+
+Optional properties
+---
+
+- msi-map: Maps a Requester ID to an MSI controller and associated
+  msi-specifier data. The property is an arbitrary number of tuples of
+  (rid-base,msi-controller,msi-base,length), where:
+
+  * rid-base is a single cell describing the first RID matched by the entry.
+
+  * msi-controller is a single phandle to an MSI controller
+
+  * msi-base is an msi-specifier describing the msi-specifier produced for the
+first RID matched by the entry.
+
+  * length is a single cell describing how many consecutive RIDs are matched
+following the rid-base.
+
+  Any RID r in the interval [rid-base, rid-base + length) is associated with
+  the listed msi-controller, with the msi-specifier (r - rid-base + msi-base).
+
+- msi-map-mask: A mask to be applied to each Requester ID prior to being mapped
+  to an msi-specifier per the msi-map property.
+
+- msi-parent: Describes the MSI parent of the root complex itself. Where
+  the root complex and MSI controller do not pass sideband data with MSI
+  writes, this property may be used to describe the MSI controller(s)
+  used by PCI devices under the root complex, if defined as such in the
+  binding for the root complex.
+
+
+Example (1)
+===
+
+/ {
+   #address-cells = 1;
+   #size-cells = 1;
+
+   msi: msi-controller@a {
+   reg = 0xa 0x1;
+   compatible = vendor,some-controller;
+   msi-controller;
+   #msi-cells = 1;
+   };
+
+   pci: pci@f {
+   reg = 0xf 0x1;
+   compatible = vendor,pcie-root-complex;
+   device_type = pci;
+
+   /*
+* The sideband data provided to the MSI controller is
+* the RID, identity-mapped.
+*/
+   msi-map = 0x0 msi_a 0x0 0x1,
+   };
+};
+
+
+Example (2)
+===
+
+/ {
+   #address-cells = 1;
+   #size-cells = 1;
+
+   msi: msi-controller@a {
+   reg = 0xa 0x1;
+   compatible = vendor,some-controller;
+   msi-controller;
+   #msi-cells = 1;
+   };
+
+   pci: pci@f {
+   reg = 0xf 0x1;
+   compatible = vendor,pcie-root-complex;
+   device_type = pci;
+
+   /*
+* The sideband data provided to the MSI controller is
+* the RID, masked to only the device and function bits.
+*/
+   msi-map = 0x0 msi_a 0x0 0x100,
+   msi-map-mask = 0xff
+   };
+};
+
+
+Example (3)
+===
+
+/ {
+   #address-cells = 1;
+   

Re: Lockdep warning in VFIO using v4.2-rc3

2015-07-23 Thread Alex Williamson
On Thu, 2015-07-23 at 11:15 +0200, Joerg Roedel wrote:
 Hi Alex,
 
 I stumbled over this lockdep warning yesterday while testing my VT-d
 changes. It looks like one code path is taking the locks:
 
   group-device_lock
   driver_lock
   pci_bus_sem
 
 while another path is taking
 
   pci_bus_sem
   group-device_lock
 
 which could lead to a deadlock. I attach the full warning, can you
 please have a look?

Hi Joerg,

Thanks for the report.  I think I found it.  I'll do further testing
myself, but would appreciate if you're able to see if this clears the
problem.  Thanks,

Alex

commit 8a8efa7943533495abfbfd18d316db64477136eb
Author: Alex Williamson alex.william...@redhat.com
Date:   Thu Jul 23 10:15:28 2015 -0600

vfio: Fix lockdep issue

When we open a device file descriptor, we currently have the
following:

vfio_group_get_device_fd()
  mutex_lock(group-device_lock);
open()
...
if (ret)
  release()

If we hit that error case, we call the backend driver release path,
which for vfio-pci looks like this:

vfio_pci_release()
  vfio_pci_disable()
vfio_pci_try_bus_reset()
  vfio_pci_get_devs()
vfio_device_get_from_dev()
  vfio_group_get_device()
mutex_lock(group-device_lock);

Whoops, we've stumbled back onto group.device_lock and created a
deadlock.  There's a low likelihood of ever seeing this play out, but
obviously it needs to be fixed.  To do that we can use a reference to
the vfio_device for vfio_group_get_device_fd() rather than holding the
lock.  There was a loop in this function, theoretically allowing
multiple devices with the same name, but in practice we don't expect
such a thing to happen and the code is already aborting from the loop
with break on any sort of error rather than coninuing and only parsing
the first match anyway, so the loop was effectively unused anyway.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Reported-by: Joerg Roedel j...@8bytes.org

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 2fb29df..563c510 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -689,6 +689,23 @@ struct vfio_device *vfio_device_get_from_dev(struct device 
*dev)
 }
 EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
 
+static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
+char *buf)
+{
+   struct vfio_device *device;
+
+   mutex_lock(group-device_lock);
+   list_for_each_entry(device, group-device_list, group_next) {
+   if (!strcmp(dev_name(device-dev), buf)) {
+   vfio_device_get(device);
+   break;
+   }
+   }
+   mutex_unlock(group-device_lock);
+
+   return device;
+}
+
 /*
  * Caller must hold a reference to the vfio_device
  */
@@ -1198,53 +1215,53 @@ static int vfio_group_get_device_fd(struct vfio_group 
*group, char *buf)
 {
struct vfio_device *device;
struct file *filep;
-   int ret = -ENODEV;
+   int ret;
 
if (0 == atomic_read(group-container_users) ||
!group-container-iommu_driver || !vfio_group_viable(group))
return -EINVAL;
 
-   mutex_lock(group-device_lock);
-   list_for_each_entry(device, group-device_list, group_next) {
-   if (strcmp(dev_name(device-dev), buf))
-   continue;
+   device = vfio_device_get_from_name(group, buf);
+   if (!device)
+   return -ENODEV;
 
-   ret = device-ops-open(device-device_data);
-   if (ret)
-   break;
-   /*
-* We can't use anon_inode_getfd() because we need to modify
-* the f_mode flags directly to allow more than just ioctls
-*/
-   ret = get_unused_fd_flags(O_CLOEXEC);
-   if (ret  0) {
-   device-ops-release(device-device_data);
-   break;
-   }
+   ret = device-ops-open(device-device_data);
+   if (ret) {
+   vfio_device_put(device);
+   return ret;
+   }
 
-   filep = anon_inode_getfile([vfio-device], vfio_device_fops,
-  device, O_RDWR);
-   if (IS_ERR(filep)) {
-   put_unused_fd(ret);
-   ret = PTR_ERR(filep);
-   device-ops-release(device-device_data);
-   break;
-   }
+   /*
+* We can't use anon_inode_getfd() because we need to modify
+* the f_mode flags directly to allow more than just ioctls
+*/
+   ret = get_unused_fd_flags(O_CLOEXEC);
+   if (ret  0) {
+   device-ops-release(device-device_data);
+

Re: Lockdep warning in VFIO using v4.2-rc3

2015-07-23 Thread Joerg Roedel
Hi Alex,

On Thu, Jul 23, 2015 at 11:07:37AM -0600, Alex Williamson wrote:
 Thanks for the report.  I think I found it.  I'll do further testing
 myself, but would appreciate if you're able to see if this clears the
 problem.  Thanks,

Just tested it and the lockdep warning is gone. Thanks for your quick
look.

Tested-by: Joerg Roedel jroe...@suse.de

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/3] Docs: dt: add generic MSI bindings

2015-07-23 Thread Mark Rutland
Currently msi-parent is used in a couple of drivers despite being fairly
underspecified. This patch adds a generic binding for MSIs (including
the existing msi-parent property) enabling the description of platform
devices capable of using MSIs.

While MSIs are primarily distinguished by doorbell and payload, some MSI
controllers (e.g. the GICv3 ITS) also use side-band information
accompanying the write to identify the master which originated the MSI,
to allow for sandboxing. This sideband information is non-probeable and
needs to be described in the DT. Other MSI controllers may have
additional configuration details which need to be described per-master.

This patch adds a generic msi-parent binding document, extending the
de-facto standard with a new (optional) #msi-cells which can be used to
express any per-master configuration and/or sideband data. This is
sufficient to describe non-hotpluggable devices.

For busses where sideband data may be derived from some bus-specific
master ID scheme, other properties will be required to describe the
mapping.

Signed-off-by: Mark Rutland mark.rutl...@arm.com
---
 .../bindings/interrupt-controller/msi.txt  | 135 +
 1 file changed, 135 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/interrupt-controller/msi.txt

diff --git a/Documentation/devicetree/bindings/interrupt-controller/msi.txt 
b/Documentation/devicetree/bindings/interrupt-controller/msi.txt
new file mode 100644
index 000..c60c034
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/msi.txt
@@ -0,0 +1,135 @@
+This document describes the generic device tree binding for MSI controllers and
+their master(s).
+
+Message Signaled Interrupts (MSIs) are a class of interrupts generated by a
+write to an MMIO address.
+
+MSIs were originally specified by PCI (and are used with PCIe), but may also be
+used with other busses, and hence a mechanism is required to relate devices on
+those busses to the MSI controllers which they are capable of using,
+potentially including additional information.
+
+MSIs are distinguished by some combination of:
+
+- The doorbell (the MMIO address written to).
+  
+  Devices may be configured by software to write to arbitrary doorbells which
+  they can address. An MSI controller may feature a number of doorbells.
+
+- The payload (the value written to the doorbell).
+  
+  Devices may be configured to write an arbitrary payload chosen by software.
+  MSI controllers may have restrictions on permitted payloads.
+
+- Sideband information accompanying the write.
+  
+  Typically this is neither configurable nor probeable, and depends on the path
+  taken through the memory system (i.e. it is a property of the combination of
+  MSI controller and device rather than a property of either in isolation).
+
+
+MSI controllers:
+
+
+An MSI controller signals interrupts to a CPU when a write is made to an MMIO
+address by some master. An MSI controller may feature a number of doorbells.
+
+Required properties:
+
+
+- msi-controller: Identifies the node as an MSI controller.
+
+Optional properties:
+
+
+- #msi-cells: The number of cells in an msi-specifier, required if not zero.
+
+  Typically this will encode information related to sideband data, and will
+  not encode doorbells or payloads as these can be configured dynamically.
+
+  The meaning of the msi-specifier is defined by the device tree binding of
+  the specific MSI controller. 
+
+
+MSI clients
+===
+
+MSI clients are devices which generate MSIs. For each MSI they wish to
+generate, the doorbell and payload may be configured, though sideband
+information may not be configurable.
+
+Required properties:
+
+
+- msi-parent: A list of phandle + msi-specifier pairs, one for each MSI
+  controller which the device is capable of using.
+
+  This property is unordered, and MSIs may be allocated from any combination of
+  MSI controllers listed in the msi-parent property.
+
+  If a device has restrictions on the allocation of MSIs, these restrictions
+  must be described with additional properties.
+
+  When #msi-cells is non-zero, busses with an msi-parent will require
+  additional properties to describe the relationship between devices on the bus
+  and the set of MSIs they can potentially generate.
+
+
+Example
+===
+
+/ {
+   #address-cells = 1;
+   #size-cells = 1;
+
+   msi_a: msi-controller@a {
+   reg = 0xa 0xf00;
+   compatible = vendor-a,some-controller;
+   msi-controller;
+   /* No sideband data, so #msi-cells omitted */
+   };
+
+   msi_b: msi-controller@b {
+   reg = 0xb 0xf00;
+   compatible = vendor-b,another-controller;
+   msi-controller;
+   /* Each device has some unique ID */
+   #msi-cells = 1;
+   };
+
+   msi_c: 

Re: [PATCH] iommu/vt-d: Fix VM domain ID leak

2015-07-23 Thread Joerg Roedel
On Wed, Jul 22, 2015 at 03:01:04PM -0600, Alex Williamson wrote:
 In the meantime, we're still leaking domain IDs, do you see any problem
 with the proposed patch for 4.2 and stable?  Thanks,

Applied and pull-request sent.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[git pull] IOMMU Fixes for Linux v4.2-rc3

2015-07-23 Thread Joerg Roedel
Hi Linus,

The following changes since commit d770e558e21961ad6cfdf0ff7df0eb5d7d4f0754:

  Linux 4.2-rc1 (2015-07-05 11:01:52 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
tags/iommu-fixes-v4.2-rc3

for you to fetch changes up to 46ebb7af7b93792de65e124e1ab8b89a108a41f2:

  iommu/vt-d: Fix VM domain ID leak (2015-07-23 14:17:39 +0200)


IOMMU Fixes for Linux v4.2-rc3

The fixes include:

* A couple of fixes for the new ARM-SMMUv3 driver to fix issues found
  on the first real implementation of that hardware.

* A patch for the Intel VT-d driver to fix a domain-id leak.


Alex Williamson (1):
  iommu/vt-d: Fix VM domain ID leak

Will Deacon (1):
  iommu/arm-smmu: Fix LOG2SIZE setting for 2-level stream tables

Zhen Lei (4):
  iommu/arm-smmu: Fix the index calculation of strtab
  iommu/arm-smmu: Fix the values of ARM64_TCR_{I,O}RGN0_SHIFT
  iommu/arm-smmu: Enlarge STRTAB_L1_SZ_SHIFT to support larger sidsize
  iommu/arm-smmu: Skip the execution of CMD_PREFETCH_CONFIG

 .../devicetree/bindings/iommu/arm,smmu-v3.txt  |  3 ++
 drivers/iommu/arm-smmu-v3.c| 60 +-
 drivers/iommu/intel-iommu.c|  9 ++--
 3 files changed, 55 insertions(+), 17 deletions(-)

Please pull.

Thanks,

Joerg



signature.asc
Description: Digital signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 3/3] Docs: dt: add PCI IOMMU map bindings

2015-07-23 Thread Mark Rutland
The existing IOMMU bindings are able to specify the relationship between
masters and IOMMUs, but they are insufficient for describing the general
case of hotpluggable busses such as PCI where the set of masters is not
known until runtime, and the relationship between masters and IOMMUs is
a property of the integration of the system.

This patch adds a generic binding for mapping PCI devices to IOMMUs,
using a new iommu-map property (specific to PCI*) which may be used to
map devices (identified by their Requester ID) to sideband data for the
IOMMU which they master through.

Signed-off-by: Mark Rutland mark.rutl...@arm.com
---
 .../devicetree/bindings/pci/pci-iommu.txt  | 171 +
 1 file changed, 171 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/pci-iommu.txt

diff --git a/Documentation/devicetree/bindings/pci/pci-iommu.txt 
b/Documentation/devicetree/bindings/pci/pci-iommu.txt
new file mode 100644
index 000..56c8296
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/pci-iommu.txt
@@ -0,0 +1,171 @@
+This document describes the generic device tree binding for describing the
+relationship between PCI(e) devices and IOMMU(s).
+
+Each PCI(e) device under a root complex is uniquely identified by its Requester
+ID (AKA RID). A Requester ID is a triplet of a Bus number, Device number, and
+Function number.
+
+For the purpose of this document, when treated as a numeric value, a RID is
+formatted such that:
+
+* Bits [15:8] are the Bus number.
+* Bits [7:3] are the Device number.
+* Bits [2:0] are the Function number.
+* Any other bits required for padding must be zero.
+
+IOMMUs may distinguish PCI devices through sideband data derived from the
+Requester ID. While a given PCI device can only master through one IOMMU, a
+root complex may split masters across a set of IOMMUs (e.g. with one IOMMU per
+bus).
+
+The generic 'iommus' property is insufficient to describe this relationship,
+and a mechanism is required to map from a PCI device to its IOMMU and sideband
+data.
+
+For generic IOMMU bindings, see
+Documentation/devicetree/bindings/iommu/iommu.txt.
+
+
+PCI root complex
+
+
+Optional properties
+---
+
+- iommu-map: Maps a Requester ID to an IOMMU and associated iommu-specifier
+  data.
+
+  The property is an arbitrary number of tuples of
+  (rid-base,iommu,iommu-base,length).
+
+  Any RID r in the interval [rid-base, rid-base + length) is associated with
+  the listed IOMMU, with the iommu-specifier (r - rid-base + iommu-base).
+
+- iommu-map-mask: A mask to be applied to each Requester ID prior to being
+  mapped to an iommu-specifier per the iommu-map property.
+
+
+Example (1)
+===
+
+/ {
+   #address-cells = 1;
+   #size-cells = 1;
+
+   iommu: iommu@a {
+   reg = 0xa 0x1;
+   compatible = vendor,some-iommu;
+   #iommu-cells = 1;
+   };
+
+   pci: pci@f {
+   reg = 0xf 0x1;
+   compatible = vendor,pcie-root-complex;
+   device_type = pci;
+
+   /*
+* The sideband data provided to the IOMMU is the RID,
+* identity-mapped.
+*/
+   iommu-map = 0x0 iommu 0x0 0x1;
+   };
+};
+
+
+Example (2)
+===
+
+/ {
+   #address-cells = 1;
+   #size-cells = 1;
+
+   iommu: iommu@a {
+   reg = 0xa 0x1;
+   compatible = vendor,some-iommu;
+   #iommu-cells = 1;
+   };
+
+   pci: pci@f {
+   reg = 0xf 0x1;
+   compatible = vendor,pcie-root-complex;
+   device_type = pci;
+
+   /*
+* The sideband data provided to the IOMMU is the RID with the
+* function bits masked out.
+*/
+   iommu-map = 0x0 iommu 0x0 0x1;
+   iommu-map-mask = 0xfff8;
+   };
+};
+
+
+Example (3)
+===
+
+/ {
+   #address-cells = 1;
+   #size-cells = 1;
+
+   iommu: iommu@a {
+   reg = 0xa 0x1;
+   compatible = vendor,some-iommu;
+   #iommu-cells = 1;
+   };
+
+   pci: pci@f {
+   reg = 0xf 0x1;
+   compatible = vendor,pcie-root-complex;
+   device_type = pci;
+
+   /*
+* The sideband data provided to the IOMMU is the RID,
+* but the high bits of the bus number are flipped.
+*/
+   iommu-map = 0x iommu 0x8000 0x8000,
+   0x8000 iommu 0x 0x8000;
+   };
+};
+
+
+Example (4)
+===
+
+/ {
+   #address-cells = 1;
+   #size-cells = 1;
+
+   iommu_a: iommu@a {
+   reg = 0xa 0x1;
+   compatible = vendor,some-iommu;
+   #iommu-cells = 1;
+   };
+
+   iommu_b: iommu@b {
+   reg = 0xb 0x1;
+   compatible = 

Re: [PATCH 2/2] iommu/omap: Add support for configuring dsp iommus on DRA7xx

2015-07-23 Thread Tony Lindgren
* Suman Anna s-a...@ti.com [150722 09:25]:
 On 07/22/2015 12:26 AM, Tony Lindgren wrote:
  
  I don't like using syscon for tinkering directly with SoC registers.
 
 This is not a SoC-level register, but a register within a sub-module of
 the DSP processor sub-system. The DSP_SYSTEM sub-module in general is
 described in Section 5.3.3 of the TRM [1], and it implements different
 functionalities like the PRCM handshaking, wakeup logic and DSP
 subsystem top-level configuration. It is a module present within the DSP
 processor sub-system, so can only be accessed when the sub-system is
 clocked and the appropriate reset is released.

OK so if it's specific to the DSP driver along the lines of sysc and
syss registers.

Typically we handle these registers by mapping them to the PM runtime
functions for the interconnect so we can reset and idle the hardware
modules even if there is no driver, see for example
omap54xx_mmu_dsp_hwmod.
 
  We should use some Linux generic framework for configuring these
  bits to avoid nasty dependencies between various hardware modules
  on the SoC.
  
  What does DSP_SYS_MMU_CONFIG register do? It seems it's probably
  a regulator or a gate clock? If so, it should be set up as a
  regulator or a clock and then the omap-iommu driver can just
  use regulator or clcok framework to request the resource.
 
 No, its neither. It is a control bit that dictates whether the
 processor/EDMA addresses go through the respective MMU or not. The
 register currently has 4 bits (bit 0 in each nibble), one each for
 enabling each MMU and requesting an MMU abort on each. The MMU
 integration and enablement notes are detailed in Section 5.3.6 of the
 TRM [1], and the DSP_SYS_MMU_CONFIG register layout is in Table 5-28
 (Page 1641).

OK yeah seems like it should be handled by the DSP driver during
probe after doing pm_runtime_get. Then the driver can configure
things like IOMMU and load firmware. Or am I missing something here?

Regards,

Tony

 
 [1] http://www.ti.com/lit/ug/spruhz6/spruhz6.pdf
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Lockdep warning in VFIO using v4.2-rc3

2015-07-23 Thread Joerg Roedel
Hi Alex,

I stumbled over this lockdep warning yesterday while testing my VT-d
changes. It looks like one code path is taking the locks:

group-device_lock
driver_lock
pci_bus_sem

while another path is taking

pci_bus_sem
group-device_lock

which could lead to a deadlock. I attach the full warning, can you
please have a look?

[  252.892008] ==
[  252.892008] [ INFO: possible circular locking dependency detected ]
[  252.892008] 4.2.0-rc3+ #16 Not tainted
[  252.892008] ---
[  252.892008] qemu-system-x86/4986 is trying to acquire lock:
[  252.892008]  (group-device_lock){+.+.+.}, at: [a0569da4] 
vfio_group_get_device+0x24/0xb0 [vfio]
[  252.892008] 
but task is already holding lock:
[  252.892008]  (pci_bus_sem){.+}, at: [813ee47e] 
pci_walk_bus+0x2e/0xa0
[  252.892008] 
which lock already depends on the new lock.

[  252.892008] 
the existing dependency chain (in reverse order) is:
[  252.892008] 
- #2 (pci_bus_sem){.+}:
[  252.892008][810c9152] __lock_acquire+0xca2/0x1550
[  252.892008][810c9b6f] lock_acquire+0xdf/0x2c0
[  252.892008][81710c4c] down_read+0x4c/0xa0
[  252.892008][813ee47e] pci_walk_bus+0x2e/0xa0
[  252.892008][a0657dcb] vfio_pci_release+0x18b/0x3c0 
[vfio_pci]
[  252.892008][a056a100] vfio_device_fops_release+0x20/0x40 
[vfio]
[  252.892008][81211460] __fput+0xf0/0x200
[  252.892008][812115be] fput+0xe/0x10
[  252.892008][8108f5fd] task_work_run+0x8d/0xc0
[  252.892008][8106d35c] do_exit+0x32c/0xc30
[  252.892008][8106f10c] do_group_exit+0x4c/0xc0
[  252.892008][8107d678] get_signal+0x328/0x9e0
[  252.892008][81003458] do_signal+0x28/0x9e0
[  252.892008][81003e75] do_notify_resume+0x65/0x80
[  252.892008][81713e2e] int_signal+0x12/0x17
[  252.892008] 
- #1 (driver_lock){+.+.+.}:
[  252.892008][810c9152] __lock_acquire+0xca2/0x1550
[  252.892008][810c9b6f] lock_acquire+0xdf/0x2c0
[  252.892008][8170e71b] mutex_lock_nested+0x6b/0x420
[  252.892008][a06576b8] vfio_pci_open+0x38/0x270 [vfio_pci]
[  252.892008][a056ab17] 
vfio_group_fops_unl_ioctl+0x267/0x460 [vfio]
[  252.892008][8122454d] do_vfs_ioctl+0x30d/0x580
[  252.892008][81224839] SyS_ioctl+0x79/0x90
[  252.892008][81713c32] entry_SYSCALL_64_fastpath+0x16/0x7a
[  252.892008] 
- #2 (group-device_lock){+.+.+.}:
[  252.892008][810c6c6c] check_prevs_add+0x8fc/0x900
[  252.892008][810c9152] __lock_acquire+0xca2/0x1550
[  252.892008][810c9b6f] lock_acquire+0xdf/0x2c0
[  252.892008][8170e71b] mutex_lock_nested+0x6b/0x420
[  252.892008][a0569da4] vfio_group_get_device+0x24/0xb0 
[vfio]
[  252.892008][a056a165] vfio_device_get_from_dev+0x45/0xa0 
[vfio]
[  252.892008][a065762c] vfio_pci_get_devs+0x2c/0x80 
[vfio_pci]
[  252.892008][a065706d] vfio_pci_walk_wrapper+0x5d/0x70 
[vfio_pci]
[  252.892008][813ee4c5] pci_walk_bus+0x75/0xa0
[  252.892008][a0657e93] vfio_pci_release+0x253/0x3c0 
[vfio_pci]
[  252.892008][a056a100] vfio_device_fops_release+0x20/0x40 
[vfio]
[  252.892008][81211460] __fput+0xf0/0x200
[  252.892008][812115be] fput+0xe/0x10
[  252.892008][8108f5fd] task_work_run+0x8d/0xc0
[  252.892008][8106d35c] do_exit+0x32c/0xc30
[  252.892008][8106f10c] do_group_exit+0x4c/0xc0
[  252.892008][8107d678] get_signal+0x328/0x9e0
[  252.892008][81003458] do_signal+0x28/0x9e0
[  252.892008][81003e75] do_notify_resume+0x65/0x80
[  252.892008][81713e2e] int_signal+0x12/0x17
[  252.892008] 
other info that might help us debug this:

[  252.892008] Chain exists of:
  group-device_lock -- driver_lock -- pci_bus_sem

[  252.892008]  Possible unsafe locking scenario:

[  252.892008]CPU0CPU1
[  252.892008]
[  252.892008]   lock(pci_bus_sem);
[  252.892008]lock(driver_lock);
[  252.892008]lock(pci_bus_sem);
[  252.892008]   lock(group-device_lock);
[  252.892008] 
 *** DEADLOCK ***

[  252.892008] 2 locks held by qemu-system-x86/4986:
[  252.892008]  #0:  (driver_lock){+.+.+.}, at: [a0657c67] 
vfio_pci_release+0x27/0x3c0 [vfio_pci]
[  252.892008]  #1:  (pci_bus_sem){.+}, at: [813ee47e] 
pci_walk_bus+0x2e/0xa0
[  252.892008] 
stack backtrace:
[  252.892008]