Re: [RFC PATCH 1/1] vfio-pci/iommu: Detach iommu group on remove path
On Wed, 22 Jul 2015 10:54:35 -0600 Alex Williamson alex.william...@redhat.com wrote: On Tue, 2015-07-21 at 19:44 +0200, Gerald Schaefer wrote: When a user completes the VFIO_SET_IOMMU ioctl and the vfio-pci device is removed thereafter (before any other ioctl like VFIO_GROUP_GET_DEVICE_FD), then the detach_dev callback of the underlying IOMMU API is never called. This patch adds a call to vfio_group_try_dissolve_container() to the remove path, which will trigger the missing detach_dev callback in this scenario. Signed-off-by: Gerald Schaefer gerald.schae...@de.ibm.com --- drivers/vfio/vfio.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 2fb29df..9c5c784 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -711,6 +711,8 @@ static bool vfio_dev_present(struct vfio_group *group, struct device *dev) return true; } +static void vfio_group_try_dissolve_container(struct vfio_group *group); + /* * Decrement the device reference count and wait for the device to be * removed. Open file descriptors for the device... */ @@ -785,6 +787,7 @@ void *vfio_del_group_dev(struct device *dev) } } while (ret = 0); + vfio_group_try_dissolve_container(group); vfio_group_put(group); return device_data; This won't work, vfio_group_try_dissolve_container() decrements container_users, which an unused device is not. Imagine if we had more than one device in the iommu group, one device is removed and the container is dissolved despite the user holding a reference and other viable devices remaining. Additionally, from an isolation perspective, an unbind from vfio-pci should not pull the device out of the iommu domain, it's part of the domain because it's not isolated and that continues even after unbind. I think what you want to do is detach a device from the iommu domain only when it's being removed from iommu group, such as through iommu_group_remove_device(). We already have a bit of an asymmetry there as iommu_group_add_device() will add devices to the currently active iommu domain for the group, but iommu_group_remove_device() does not appear to do the reverse. Thanks, Interesting, I haven't noticed this asymmetry so far, do you mean something like this: diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index f286090..82ac8b3 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -447,6 +447,9 @@ rename: } EXPORT_SYMBOL_GPL(iommu_group_add_device); +static void __iommu_detach_device(struct iommu_domain *domain, + struct device *dev); + /** * iommu_group_remove_device - remove a device from it's current group * @dev: device to be removed @@ -466,6 +469,8 @@ void iommu_group_remove_device(struct device *dev) IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev); mutex_lock(group-mutex); + if (group-domain) + __iommu_detach_device(group-domain, dev); list_for_each_entry(tmp_device, group-devices, list) { if (tmp_device-dev == dev) { device = tmp_device; This would also fix the issue in my scenario, but like before that doesn't need to mean it is the correct fix. Adding the iommu list and maintainer to cc. Joerg, what do you think? (see https://lkml.org/lkml/2015/7/21/635 for the problem description) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 1/1] vfio-pci/iommu: Detach iommu group on remove path
On Wed, 22 Jul 2015 11:10:57 -0600 Alex Williamson alex.william...@redhat.com wrote: On Wed, 2015-07-22 at 10:54 -0600, Alex Williamson wrote: On Tue, 2015-07-21 at 19:44 +0200, Gerald Schaefer wrote: When a user completes the VFIO_SET_IOMMU ioctl and the vfio-pci device is removed thereafter (before any other ioctl like VFIO_GROUP_GET_DEVICE_FD), then the detach_dev callback of the underlying IOMMU API is never called. This patch adds a call to vfio_group_try_dissolve_container() to the remove path, which will trigger the missing detach_dev callback in this scenario. Signed-off-by: Gerald Schaefer gerald.schae...@de.ibm.com --- drivers/vfio/vfio.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 2fb29df..9c5c784 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -711,6 +711,8 @@ static bool vfio_dev_present(struct vfio_group *group, struct device *dev) return true; } +static void vfio_group_try_dissolve_container(struct vfio_group *group); + /* * Decrement the device reference count and wait for the device to be * removed. Open file descriptors for the device... */ @@ -785,6 +787,7 @@ void *vfio_del_group_dev(struct device *dev) } } while (ret = 0); + vfio_group_try_dissolve_container(group); vfio_group_put(group); return device_data; This won't work, vfio_group_try_dissolve_container() decrements container_users, which an unused device is not. Imagine if we had more than one device in the iommu group, one device is removed and the container is dissolved despite the user holding a reference and other viable devices remaining. Additionally, from an isolation perspective, an unbind from vfio-pci should not pull the device out of the iommu domain, it's part of the domain because it's not isolated and that continues even after unbind. I think what you want to do is detach a device from the iommu domain only when it's being removed from iommu group, such as through iommu_group_remove_device(). We already have a bit of an asymmetry there as iommu_group_add_device() will add devices to the currently active iommu domain for the group, but iommu_group_remove_device() does not appear to do the reverse. Thanks, BTW, VT-d on x86 avoids a leak using its own notifier_block, drivers/iommu/intel-iommu.c:device_notifier() catches BUS_NOTIFY_REMOVED_DEVICE and removes the device from the domain (the domain_exit() there is only used for non-IOMMU-API domains). It's possible that's the only IOMMU driver that avoids a leak due to the scenario you describe. Thanks, Thanks, that's good to know, so as a last resort I could also use the notifier to work around the issue. But x86 seems to be the only arch using this notifier so far, so a general fix would be nice. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 3/6] iommu: add ARM short descriptor page table allocator.
Hi Will, Thanks for your review so detail. When you are free, please help me check whether it's ok if it's changed like below. Thanks very much. On Tue, 2015-07-21 at 18:11 +0100, Will Deacon wrote: Hello, This is looking better, but I still have some concerns. On Thu, Jul 16, 2015 at 10:04:32AM +0100, Yong Wu wrote: This patch is for ARM Short Descriptor Format. Signed-off-by: Yong Wu yong...@mediatek.com --- drivers/iommu/Kconfig| 18 + drivers/iommu/Makefile |1 + drivers/iommu/io-pgtable-arm-short.c | 742 ++ drivers/iommu/io-pgtable-arm.c |3 - drivers/iommu/io-pgtable.c |4 + drivers/iommu/io-pgtable.h | 13 + 6 files changed, 778 insertions(+), 3 deletions(-) create mode 100644 drivers/iommu/io-pgtable-arm-short.c [...] +#define ARM_SHORT_PGDIR_SHIFT 20 +#define ARM_SHORT_PAGE_SHIFT 12 +#define ARM_SHORT_PTRS_PER_PTE \ + (1 (ARM_SHORT_PGDIR_SHIFT - ARM_SHORT_PAGE_SHIFT)) +#define ARM_SHORT_BYTES_PER_PTE\ + (ARM_SHORT_PTRS_PER_PTE * sizeof(arm_short_iopte)) + +/* level 1 pagetable */ +#define ARM_SHORT_PGD_TYPE_PGTABLE BIT(0) +#define ARM_SHORT_PGD_SECTION_XN (BIT(0) | BIT(4)) +#define ARM_SHORT_PGD_TYPE_SECTION BIT(1) +#define ARM_SHORT_PGD_PGTABLE_XN BIT(2) This should be PXN, but I'm not even sure why we care for a table at level 1. I will delete it. +#define ARM_SHORT_PGD_BBIT(2) +#define ARM_SHORT_PGD_CBIT(3) +#define ARM_SHORT_PGD_PGTABLE_NS BIT(3) +#define ARM_SHORT_PGD_IMPLEBIT(9) +#define ARM_SHORT_PGD_TEX0 BIT(12) +#define ARM_SHORT_PGD_SBIT(16) +#define ARM_SHORT_PGD_nG BIT(17) +#define ARM_SHORT_PGD_SUPERSECTION BIT(18) +#define ARM_SHORT_PGD_SECTION_NS BIT(19) + +#define ARM_SHORT_PGD_TYPE_SUPERSECTION\ + (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION) +#define ARM_SHORT_PGD_SECTION_TYPE_MSK \ + (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION) +#define ARM_SHORT_PGD_PGTABLE_TYPE_MSK \ + (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_TYPE_PGTABLE) +#define ARM_SHORT_PGD_TYPE_IS_PGTABLE(pgd) \ + (((pgd) ARM_SHORT_PGD_PGTABLE_TYPE_MSK) == ARM_SHORT_PGD_TYPE_PGTABLE) +#define ARM_SHORT_PGD_TYPE_IS_SECTION(pgd) \ + (((pgd) ARM_SHORT_PGD_SECTION_TYPE_MSK) == ARM_SHORT_PGD_TYPE_SECTION) +#define ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(pgd)\ + (((pgd) ARM_SHORT_PGD_SECTION_TYPE_MSK) == \ + ARM_SHORT_PGD_TYPE_SUPERSECTION) +#define ARM_SHORT_PGD_PGTABLE_MSK 0xfc00 +#define ARM_SHORT_PGD_SECTION_MSK (~(SZ_1M - 1)) +#define ARM_SHORT_PGD_SUPERSECTION_MSK (~(SZ_16M - 1)) + +/* level 2 pagetable */ +#define ARM_SHORT_PTE_TYPE_LARGE BIT(0) +#define ARM_SHORT_PTE_SMALL_XN BIT(0) +#define ARM_SHORT_PTE_TYPE_SMALL BIT(1) +#define ARM_SHORT_PTE_BBIT(2) +#define ARM_SHORT_PTE_CBIT(3) +#define ARM_SHORT_PTE_SMALL_TEX0 BIT(6) +#define ARM_SHORT_PTE_IMPLEBIT(9) This is AP[2] for small pages. Sorry, In our pagetable bit9 in PGD and PTE is PA[32] that is for the dram size over 4G. I didn't care it is different in PTE of the standard spec. And I don't use the AP[2] currently, so I only delete this line in next time. +#define ARM_SHORT_PTE_SBIT(10) +#define ARM_SHORT_PTE_nG BIT(11) +#define ARM_SHORT_PTE_LARGE_TEX0 BIT(12) +#define ARM_SHORT_PTE_LARGE_XN BIT(15) +#define ARM_SHORT_PTE_LARGE_MSK(~(SZ_64K - 1)) +#define ARM_SHORT_PTE_SMALL_MSK(~(SZ_4K - 1)) +#define ARM_SHORT_PTE_TYPE_MSK \ + (ARM_SHORT_PTE_TYPE_LARGE | ARM_SHORT_PTE_TYPE_SMALL) +#define ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(pte) \ + (pte) ARM_SHORT_PTE_TYPE_MSK) 1) 1)\ + == ARM_SHORT_PTE_TYPE_SMALL) I see what you're trying to do here, but the shifting is confusing. I think it's better doing something like: ((pte) ARM_SHORT_PTE_TYPE_SMALL) == ARM_SHORT_PTE_TYPE_SMALL or even just: (pte) ARM_SHORT_PTE_TYPE_SMALL Thanks. I will use this: ((pte) ARM_SHORT_PTE_TYPE_SMALL) == ARM_SHORT_PTE_TYPE_SMALL This line may be close to the ARM_SHORT_PTE_TYPE_IS_LARGEPAGE below. +#define ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(pte) \ + (((pte)
Re: [PATCH v3 5/6] iommu/mediatek: Add mt8173 IOMMU driver
On Tue, 2015-07-21 at 15:59 +0100, Will Deacon wrote: Hi Yong Wu, On Thu, Jul 16, 2015 at 10:04:34AM +0100, Yong Wu wrote: This patch adds support for mediatek m4u (MultiMedia Memory Management Unit). [...] +static void mtk_iommu_tlb_flush_all(void *cookie) +{ + struct mtk_iommu_domain *domain = cookie; + void __iomem *base; + + base = domain-data-base; + writel(F_INVLD_EN1 | F_INVLD_EN0, base + REG_MMU_INV_SEL); + writel(F_ALL_INVLD, base + REG_MMU_INVALIDATE); This needs to be synchronous, so you probably want to call mtk_iommu_tlb_sync at the end. From our spec, we have to wait until HW done after tlb flush range. But it don't need wait after tlb flush all. so It isn't necessary to add mtk_iommu_tlb_sync in tlb_flush_all here. +} + +static void mtk_iommu_tlb_add_flush(unsigned long iova, size_t size, + bool leaf, void *cookie) +{ + struct mtk_iommu_domain *domain = cookie; + void __iomem *base = domain-data-base; + unsigned int iova_start = iova, iova_end = iova + size - 1; + + writel(F_INVLD_EN1 | F_INVLD_EN0, base + REG_MMU_INV_SEL); + + writel(iova_start, base + REG_MMU_INVLD_START_A); + writel(iova_end, base + REG_MMU_INVLD_END_A); + writel(F_MMU_INV_RANGE, base + REG_MMU_INVALIDATE); Why are you using writel instead of writel_relaxed? I asked this before but I don't think you replied. Sorry, I didn't notice _relax last time. I will improve in next version. Thanks very much. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu/omap: Add support for configuring dsp iommus on DRA7xx
* Suman Anna s-a...@ti.com [150723 09:25]: Hi Tony, On 07/23/2015 02:24 AM, Tony Lindgren wrote: * Suman Anna s-a...@ti.com [150722 09:25]: On 07/22/2015 12:26 AM, Tony Lindgren wrote: I don't like using syscon for tinkering directly with SoC registers. This is not a SoC-level register, but a register within a sub-module of the DSP processor sub-system. The DSP_SYSTEM sub-module in general is described in Section 5.3.3 of the TRM [1], and it implements different functionalities like the PRCM handshaking, wakeup logic and DSP subsystem top-level configuration. It is a module present within the DSP processor sub-system, so can only be accessed when the sub-system is clocked and the appropriate reset is released. OK so if it's specific to the DSP driver along the lines of sysc and syss registers. There will be those registers too within the MMU config register space, even for DRA7xx MMUs. This is different, think of it like a register in the Control module except that it is present within the DSP sub-system instead of at the SoC level. And what is taking care of pm_runtime_get here to ensure the module is powered and clocked? I think you are missing a layer here and it's the Linux kernel side device driver for DSP that initializes things. Typically we handle these registers by mapping them to the PM runtime functions for the interconnect so we can reset and idle the hardware modules even if there is no driver, see for example omap54xx_mmu_dsp_hwmod. I haven't yet submitted the DRA7xx hwmods, but they will look almost exactly like the OMAP5 ones. The reset and idle on these are in general not effective at boot time since these are also controlled by a hard-reset line, so that's left to the drivers to deal with it through the omap_device_deassert_hardreset API. If the MMU configuration is one time init, it may make sense to add it to the hwmod reset function. However, if the Linux kernel side driver needs to configure things depending on the DSP firmware, it should be done in the kernel side device driver. We should use some Linux generic framework for configuring these bits to avoid nasty dependencies between various hardware modules on the SoC. What does DSP_SYS_MMU_CONFIG register do? It seems it's probably a regulator or a gate clock? If so, it should be set up as a regulator or a clock and then the omap-iommu driver can just use regulator or clcok framework to request the resource. No, its neither. It is a control bit that dictates whether the processor/EDMA addresses go through the respective MMU or not. The register currently has 4 bits (bit 0 in each nibble), one each for enabling each MMU and requesting an MMU abort on each. The MMU integration and enablement notes are detailed in Section 5.3.6 of the TRM [1], and the DSP_SYS_MMU_CONFIG register layout is in Table 5-28 (Page 1641). OK yeah seems like it should be handled by the DSP driver during probe after doing pm_runtime_get. Then the driver can configure things like IOMMU and load firmware. Or am I missing something here? The DSP (remoteproc) driver uses the generic IOMMU API, and all the IOMMU enabling sequence is transparent to the DSP driver. So, any configuration/enabling sequence specific to the IOMMU has to be handled within the respective IOMMU platform driver that gets integrated into the IOMMU API. To me it seems you still need a DSP kernel driver to manage the DSP hardware for PM runtime. I don't see how iommu and remoteproc alone would be sufficient here. You end up sprinkling DSP driver specific code to the iommu and remoteproc layers. Regards, Tony ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu/omap: Add support for configuring dsp iommus on DRA7xx
Hi Tony, On 07/23/2015 02:24 AM, Tony Lindgren wrote: * Suman Anna s-a...@ti.com [150722 09:25]: On 07/22/2015 12:26 AM, Tony Lindgren wrote: I don't like using syscon for tinkering directly with SoC registers. This is not a SoC-level register, but a register within a sub-module of the DSP processor sub-system. The DSP_SYSTEM sub-module in general is described in Section 5.3.3 of the TRM [1], and it implements different functionalities like the PRCM handshaking, wakeup logic and DSP subsystem top-level configuration. It is a module present within the DSP processor sub-system, so can only be accessed when the sub-system is clocked and the appropriate reset is released. OK so if it's specific to the DSP driver along the lines of sysc and syss registers. There will be those registers too within the MMU config register space, even for DRA7xx MMUs. This is different, think of it like a register in the Control module except that it is present within the DSP sub-system instead of at the SoC level. Typically we handle these registers by mapping them to the PM runtime functions for the interconnect so we can reset and idle the hardware modules even if there is no driver, see for example omap54xx_mmu_dsp_hwmod. I haven't yet submitted the DRA7xx hwmods, but they will look almost exactly like the OMAP5 ones. The reset and idle on these are in general not effective at boot time since these are also controlled by a hard-reset line, so that's left to the drivers to deal with it through the omap_device_deassert_hardreset API. We should use some Linux generic framework for configuring these bits to avoid nasty dependencies between various hardware modules on the SoC. What does DSP_SYS_MMU_CONFIG register do? It seems it's probably a regulator or a gate clock? If so, it should be set up as a regulator or a clock and then the omap-iommu driver can just use regulator or clcok framework to request the resource. No, its neither. It is a control bit that dictates whether the processor/EDMA addresses go through the respective MMU or not. The register currently has 4 bits (bit 0 in each nibble), one each for enabling each MMU and requesting an MMU abort on each. The MMU integration and enablement notes are detailed in Section 5.3.6 of the TRM [1], and the DSP_SYS_MMU_CONFIG register layout is in Table 5-28 (Page 1641). OK yeah seems like it should be handled by the DSP driver during probe after doing pm_runtime_get. Then the driver can configure things like IOMMU and load firmware. Or am I missing something here? The DSP (remoteproc) driver uses the generic IOMMU API, and all the IOMMU enabling sequence is transparent to the DSP driver. So, any configuration/enabling sequence specific to the IOMMU has to be handled within the respective IOMMU platform driver that gets integrated into the IOMMU API. regards Suman [1] http://www.ti.com/lit/ug/spruhz6/spruhz6.pdf ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/3] Generic PCI MSI + IOMMU topology bindings
Hi all, Currently we have no generic/standard mechanisms for describing the relationship between PCI root complexes and other components which may be required to make them usable, specifically IOMMUs and MSI controllers. There is an existing binding for IOMMUs, and there is a de-facto standard for referring to MSI controllers, but the generic portion of these can only describe a relationship with a root complex as opposed to a device under a root complex. This falls apart where IOMMUs and MSI controllers may distinguish individual devices based on non-probeable information. This series adds a generic glue bindings for describing the relationship between root complexes, IOMMUs, and MSI controllers. The existing de-facto binding for MSI controllers is formalised, along with a (backwards compatible) extension necessary for describing contemporary MSI controllers which make use of (non-probeable) sideband data. Thanks, Mark. Mark Rutland (3): Docs: dt: add generic MSI bindings Docs: dt: Add PCI MSI map bindings Docs: dt: add PCI IOMMU map bindings .../bindings/interrupt-controller/msi.txt | 135 + .../devicetree/bindings/pci/pci-iommu.txt | 171 Documentation/devicetree/bindings/pci/pci-msi.txt | 220 + 3 files changed, 526 insertions(+) create mode 100644 Documentation/devicetree/bindings/interrupt-controller/msi.txt create mode 100644 Documentation/devicetree/bindings/pci/pci-iommu.txt create mode 100644 Documentation/devicetree/bindings/pci/pci-msi.txt -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/3] Docs: dt: Add PCI MSI map bindings
Currently msi-parent is used by a few bindings to describe the relationship between a PCI root complex and a single MSI controller, but this property does not have a generic binding document. Additionally, msi-parent is insufficient to describe more complex relationships between MSI controllers and devices under a root complex, where devices may be able to target multiple MSI controllers, or where MSI controllers use (non-probeable) sideband information to distinguish devices. This patch adds a generic binding for mapping PCI devices to MSI controllers. This document covers msi-parent, and a new msi-map property (specific to PCI*) which may be used to map devices (identified by their Requester ID) to sideband data for each MSI controller that they may target. Signed-off-by: Mark Rutland mark.rutl...@arm.com --- Documentation/devicetree/bindings/pci/pci-msi.txt | 220 ++ 1 file changed, 220 insertions(+) create mode 100644 Documentation/devicetree/bindings/pci/pci-msi.txt diff --git a/Documentation/devicetree/bindings/pci/pci-msi.txt b/Documentation/devicetree/bindings/pci/pci-msi.txt new file mode 100644 index 000..9b3cc81 --- /dev/null +++ b/Documentation/devicetree/bindings/pci/pci-msi.txt @@ -0,0 +1,220 @@ +This document describes the generic device tree binding for describing the +relationship between PCI devices and MSI controllers. + +Each PCI device under a root complex is uniquely identified by its Requester ID +(AKA RID). A Requester ID is a triplet of a Bus number, Device number, and +Function number. + +For the purpose of this document, when treated as a numeric value, a RID is +formatted such that: + +* Bits [15:8] are the Bus number. +* Bits [7:3] are the Device number. +* Bits [2:0] are the Function number. +* Any other bits required for padding must be zero. + +MSIs may be distinguished in part through the use of sideband data accompanying +writes. In the case of PCI devices, this sideband data may be derived from the +Requester ID. A mechanism is required to associate a device with both the MSI +controllers it can address, and the sideband data that will be associated with +its writes to those controllers. + +For generic MSI bindings, see +Documentation/devicetree/bindings/interrupt-controller/msi.txt. + + +PCI root complex + + +Optional properties +--- + +- msi-map: Maps a Requester ID to an MSI controller and associated + msi-specifier data. The property is an arbitrary number of tuples of + (rid-base,msi-controller,msi-base,length), where: + + * rid-base is a single cell describing the first RID matched by the entry. + + * msi-controller is a single phandle to an MSI controller + + * msi-base is an msi-specifier describing the msi-specifier produced for the +first RID matched by the entry. + + * length is a single cell describing how many consecutive RIDs are matched +following the rid-base. + + Any RID r in the interval [rid-base, rid-base + length) is associated with + the listed msi-controller, with the msi-specifier (r - rid-base + msi-base). + +- msi-map-mask: A mask to be applied to each Requester ID prior to being mapped + to an msi-specifier per the msi-map property. + +- msi-parent: Describes the MSI parent of the root complex itself. Where + the root complex and MSI controller do not pass sideband data with MSI + writes, this property may be used to describe the MSI controller(s) + used by PCI devices under the root complex, if defined as such in the + binding for the root complex. + + +Example (1) +=== + +/ { + #address-cells = 1; + #size-cells = 1; + + msi: msi-controller@a { + reg = 0xa 0x1; + compatible = vendor,some-controller; + msi-controller; + #msi-cells = 1; + }; + + pci: pci@f { + reg = 0xf 0x1; + compatible = vendor,pcie-root-complex; + device_type = pci; + + /* +* The sideband data provided to the MSI controller is +* the RID, identity-mapped. +*/ + msi-map = 0x0 msi_a 0x0 0x1, + }; +}; + + +Example (2) +=== + +/ { + #address-cells = 1; + #size-cells = 1; + + msi: msi-controller@a { + reg = 0xa 0x1; + compatible = vendor,some-controller; + msi-controller; + #msi-cells = 1; + }; + + pci: pci@f { + reg = 0xf 0x1; + compatible = vendor,pcie-root-complex; + device_type = pci; + + /* +* The sideband data provided to the MSI controller is +* the RID, masked to only the device and function bits. +*/ + msi-map = 0x0 msi_a 0x0 0x100, + msi-map-mask = 0xff + }; +}; + + +Example (3) +=== + +/ { + #address-cells = 1; +
Re: Lockdep warning in VFIO using v4.2-rc3
On Thu, 2015-07-23 at 11:15 +0200, Joerg Roedel wrote: Hi Alex, I stumbled over this lockdep warning yesterday while testing my VT-d changes. It looks like one code path is taking the locks: group-device_lock driver_lock pci_bus_sem while another path is taking pci_bus_sem group-device_lock which could lead to a deadlock. I attach the full warning, can you please have a look? Hi Joerg, Thanks for the report. I think I found it. I'll do further testing myself, but would appreciate if you're able to see if this clears the problem. Thanks, Alex commit 8a8efa7943533495abfbfd18d316db64477136eb Author: Alex Williamson alex.william...@redhat.com Date: Thu Jul 23 10:15:28 2015 -0600 vfio: Fix lockdep issue When we open a device file descriptor, we currently have the following: vfio_group_get_device_fd() mutex_lock(group-device_lock); open() ... if (ret) release() If we hit that error case, we call the backend driver release path, which for vfio-pci looks like this: vfio_pci_release() vfio_pci_disable() vfio_pci_try_bus_reset() vfio_pci_get_devs() vfio_device_get_from_dev() vfio_group_get_device() mutex_lock(group-device_lock); Whoops, we've stumbled back onto group.device_lock and created a deadlock. There's a low likelihood of ever seeing this play out, but obviously it needs to be fixed. To do that we can use a reference to the vfio_device for vfio_group_get_device_fd() rather than holding the lock. There was a loop in this function, theoretically allowing multiple devices with the same name, but in practice we don't expect such a thing to happen and the code is already aborting from the loop with break on any sort of error rather than coninuing and only parsing the first match anyway, so the loop was effectively unused anyway. Signed-off-by: Alex Williamson alex.william...@redhat.com Reported-by: Joerg Roedel j...@8bytes.org diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 2fb29df..563c510 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -689,6 +689,23 @@ struct vfio_device *vfio_device_get_from_dev(struct device *dev) } EXPORT_SYMBOL_GPL(vfio_device_get_from_dev); +static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group, +char *buf) +{ + struct vfio_device *device; + + mutex_lock(group-device_lock); + list_for_each_entry(device, group-device_list, group_next) { + if (!strcmp(dev_name(device-dev), buf)) { + vfio_device_get(device); + break; + } + } + mutex_unlock(group-device_lock); + + return device; +} + /* * Caller must hold a reference to the vfio_device */ @@ -1198,53 +1215,53 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf) { struct vfio_device *device; struct file *filep; - int ret = -ENODEV; + int ret; if (0 == atomic_read(group-container_users) || !group-container-iommu_driver || !vfio_group_viable(group)) return -EINVAL; - mutex_lock(group-device_lock); - list_for_each_entry(device, group-device_list, group_next) { - if (strcmp(dev_name(device-dev), buf)) - continue; + device = vfio_device_get_from_name(group, buf); + if (!device) + return -ENODEV; - ret = device-ops-open(device-device_data); - if (ret) - break; - /* -* We can't use anon_inode_getfd() because we need to modify -* the f_mode flags directly to allow more than just ioctls -*/ - ret = get_unused_fd_flags(O_CLOEXEC); - if (ret 0) { - device-ops-release(device-device_data); - break; - } + ret = device-ops-open(device-device_data); + if (ret) { + vfio_device_put(device); + return ret; + } - filep = anon_inode_getfile([vfio-device], vfio_device_fops, - device, O_RDWR); - if (IS_ERR(filep)) { - put_unused_fd(ret); - ret = PTR_ERR(filep); - device-ops-release(device-device_data); - break; - } + /* +* We can't use anon_inode_getfd() because we need to modify +* the f_mode flags directly to allow more than just ioctls +*/ + ret = get_unused_fd_flags(O_CLOEXEC); + if (ret 0) { + device-ops-release(device-device_data); +
Re: Lockdep warning in VFIO using v4.2-rc3
Hi Alex, On Thu, Jul 23, 2015 at 11:07:37AM -0600, Alex Williamson wrote: Thanks for the report. I think I found it. I'll do further testing myself, but would appreciate if you're able to see if this clears the problem. Thanks, Just tested it and the lockdep warning is gone. Thanks for your quick look. Tested-by: Joerg Roedel jroe...@suse.de ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/3] Docs: dt: add generic MSI bindings
Currently msi-parent is used in a couple of drivers despite being fairly underspecified. This patch adds a generic binding for MSIs (including the existing msi-parent property) enabling the description of platform devices capable of using MSIs. While MSIs are primarily distinguished by doorbell and payload, some MSI controllers (e.g. the GICv3 ITS) also use side-band information accompanying the write to identify the master which originated the MSI, to allow for sandboxing. This sideband information is non-probeable and needs to be described in the DT. Other MSI controllers may have additional configuration details which need to be described per-master. This patch adds a generic msi-parent binding document, extending the de-facto standard with a new (optional) #msi-cells which can be used to express any per-master configuration and/or sideband data. This is sufficient to describe non-hotpluggable devices. For busses where sideband data may be derived from some bus-specific master ID scheme, other properties will be required to describe the mapping. Signed-off-by: Mark Rutland mark.rutl...@arm.com --- .../bindings/interrupt-controller/msi.txt | 135 + 1 file changed, 135 insertions(+) create mode 100644 Documentation/devicetree/bindings/interrupt-controller/msi.txt diff --git a/Documentation/devicetree/bindings/interrupt-controller/msi.txt b/Documentation/devicetree/bindings/interrupt-controller/msi.txt new file mode 100644 index 000..c60c034 --- /dev/null +++ b/Documentation/devicetree/bindings/interrupt-controller/msi.txt @@ -0,0 +1,135 @@ +This document describes the generic device tree binding for MSI controllers and +their master(s). + +Message Signaled Interrupts (MSIs) are a class of interrupts generated by a +write to an MMIO address. + +MSIs were originally specified by PCI (and are used with PCIe), but may also be +used with other busses, and hence a mechanism is required to relate devices on +those busses to the MSI controllers which they are capable of using, +potentially including additional information. + +MSIs are distinguished by some combination of: + +- The doorbell (the MMIO address written to). + + Devices may be configured by software to write to arbitrary doorbells which + they can address. An MSI controller may feature a number of doorbells. + +- The payload (the value written to the doorbell). + + Devices may be configured to write an arbitrary payload chosen by software. + MSI controllers may have restrictions on permitted payloads. + +- Sideband information accompanying the write. + + Typically this is neither configurable nor probeable, and depends on the path + taken through the memory system (i.e. it is a property of the combination of + MSI controller and device rather than a property of either in isolation). + + +MSI controllers: + + +An MSI controller signals interrupts to a CPU when a write is made to an MMIO +address by some master. An MSI controller may feature a number of doorbells. + +Required properties: + + +- msi-controller: Identifies the node as an MSI controller. + +Optional properties: + + +- #msi-cells: The number of cells in an msi-specifier, required if not zero. + + Typically this will encode information related to sideband data, and will + not encode doorbells or payloads as these can be configured dynamically. + + The meaning of the msi-specifier is defined by the device tree binding of + the specific MSI controller. + + +MSI clients +=== + +MSI clients are devices which generate MSIs. For each MSI they wish to +generate, the doorbell and payload may be configured, though sideband +information may not be configurable. + +Required properties: + + +- msi-parent: A list of phandle + msi-specifier pairs, one for each MSI + controller which the device is capable of using. + + This property is unordered, and MSIs may be allocated from any combination of + MSI controllers listed in the msi-parent property. + + If a device has restrictions on the allocation of MSIs, these restrictions + must be described with additional properties. + + When #msi-cells is non-zero, busses with an msi-parent will require + additional properties to describe the relationship between devices on the bus + and the set of MSIs they can potentially generate. + + +Example +=== + +/ { + #address-cells = 1; + #size-cells = 1; + + msi_a: msi-controller@a { + reg = 0xa 0xf00; + compatible = vendor-a,some-controller; + msi-controller; + /* No sideband data, so #msi-cells omitted */ + }; + + msi_b: msi-controller@b { + reg = 0xb 0xf00; + compatible = vendor-b,another-controller; + msi-controller; + /* Each device has some unique ID */ + #msi-cells = 1; + }; + + msi_c:
Re: [PATCH] iommu/vt-d: Fix VM domain ID leak
On Wed, Jul 22, 2015 at 03:01:04PM -0600, Alex Williamson wrote: In the meantime, we're still leaking domain IDs, do you see any problem with the proposed patch for 4.2 and stable? Thanks, Applied and pull-request sent. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[git pull] IOMMU Fixes for Linux v4.2-rc3
Hi Linus, The following changes since commit d770e558e21961ad6cfdf0ff7df0eb5d7d4f0754: Linux 4.2-rc1 (2015-07-05 11:01:52 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v4.2-rc3 for you to fetch changes up to 46ebb7af7b93792de65e124e1ab8b89a108a41f2: iommu/vt-d: Fix VM domain ID leak (2015-07-23 14:17:39 +0200) IOMMU Fixes for Linux v4.2-rc3 The fixes include: * A couple of fixes for the new ARM-SMMUv3 driver to fix issues found on the first real implementation of that hardware. * A patch for the Intel VT-d driver to fix a domain-id leak. Alex Williamson (1): iommu/vt-d: Fix VM domain ID leak Will Deacon (1): iommu/arm-smmu: Fix LOG2SIZE setting for 2-level stream tables Zhen Lei (4): iommu/arm-smmu: Fix the index calculation of strtab iommu/arm-smmu: Fix the values of ARM64_TCR_{I,O}RGN0_SHIFT iommu/arm-smmu: Enlarge STRTAB_L1_SZ_SHIFT to support larger sidsize iommu/arm-smmu: Skip the execution of CMD_PREFETCH_CONFIG .../devicetree/bindings/iommu/arm,smmu-v3.txt | 3 ++ drivers/iommu/arm-smmu-v3.c| 60 +- drivers/iommu/intel-iommu.c| 9 ++-- 3 files changed, 55 insertions(+), 17 deletions(-) Please pull. Thanks, Joerg signature.asc Description: Digital signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3/3] Docs: dt: add PCI IOMMU map bindings
The existing IOMMU bindings are able to specify the relationship between masters and IOMMUs, but they are insufficient for describing the general case of hotpluggable busses such as PCI where the set of masters is not known until runtime, and the relationship between masters and IOMMUs is a property of the integration of the system. This patch adds a generic binding for mapping PCI devices to IOMMUs, using a new iommu-map property (specific to PCI*) which may be used to map devices (identified by their Requester ID) to sideband data for the IOMMU which they master through. Signed-off-by: Mark Rutland mark.rutl...@arm.com --- .../devicetree/bindings/pci/pci-iommu.txt | 171 + 1 file changed, 171 insertions(+) create mode 100644 Documentation/devicetree/bindings/pci/pci-iommu.txt diff --git a/Documentation/devicetree/bindings/pci/pci-iommu.txt b/Documentation/devicetree/bindings/pci/pci-iommu.txt new file mode 100644 index 000..56c8296 --- /dev/null +++ b/Documentation/devicetree/bindings/pci/pci-iommu.txt @@ -0,0 +1,171 @@ +This document describes the generic device tree binding for describing the +relationship between PCI(e) devices and IOMMU(s). + +Each PCI(e) device under a root complex is uniquely identified by its Requester +ID (AKA RID). A Requester ID is a triplet of a Bus number, Device number, and +Function number. + +For the purpose of this document, when treated as a numeric value, a RID is +formatted such that: + +* Bits [15:8] are the Bus number. +* Bits [7:3] are the Device number. +* Bits [2:0] are the Function number. +* Any other bits required for padding must be zero. + +IOMMUs may distinguish PCI devices through sideband data derived from the +Requester ID. While a given PCI device can only master through one IOMMU, a +root complex may split masters across a set of IOMMUs (e.g. with one IOMMU per +bus). + +The generic 'iommus' property is insufficient to describe this relationship, +and a mechanism is required to map from a PCI device to its IOMMU and sideband +data. + +For generic IOMMU bindings, see +Documentation/devicetree/bindings/iommu/iommu.txt. + + +PCI root complex + + +Optional properties +--- + +- iommu-map: Maps a Requester ID to an IOMMU and associated iommu-specifier + data. + + The property is an arbitrary number of tuples of + (rid-base,iommu,iommu-base,length). + + Any RID r in the interval [rid-base, rid-base + length) is associated with + the listed IOMMU, with the iommu-specifier (r - rid-base + iommu-base). + +- iommu-map-mask: A mask to be applied to each Requester ID prior to being + mapped to an iommu-specifier per the iommu-map property. + + +Example (1) +=== + +/ { + #address-cells = 1; + #size-cells = 1; + + iommu: iommu@a { + reg = 0xa 0x1; + compatible = vendor,some-iommu; + #iommu-cells = 1; + }; + + pci: pci@f { + reg = 0xf 0x1; + compatible = vendor,pcie-root-complex; + device_type = pci; + + /* +* The sideband data provided to the IOMMU is the RID, +* identity-mapped. +*/ + iommu-map = 0x0 iommu 0x0 0x1; + }; +}; + + +Example (2) +=== + +/ { + #address-cells = 1; + #size-cells = 1; + + iommu: iommu@a { + reg = 0xa 0x1; + compatible = vendor,some-iommu; + #iommu-cells = 1; + }; + + pci: pci@f { + reg = 0xf 0x1; + compatible = vendor,pcie-root-complex; + device_type = pci; + + /* +* The sideband data provided to the IOMMU is the RID with the +* function bits masked out. +*/ + iommu-map = 0x0 iommu 0x0 0x1; + iommu-map-mask = 0xfff8; + }; +}; + + +Example (3) +=== + +/ { + #address-cells = 1; + #size-cells = 1; + + iommu: iommu@a { + reg = 0xa 0x1; + compatible = vendor,some-iommu; + #iommu-cells = 1; + }; + + pci: pci@f { + reg = 0xf 0x1; + compatible = vendor,pcie-root-complex; + device_type = pci; + + /* +* The sideband data provided to the IOMMU is the RID, +* but the high bits of the bus number are flipped. +*/ + iommu-map = 0x iommu 0x8000 0x8000, + 0x8000 iommu 0x 0x8000; + }; +}; + + +Example (4) +=== + +/ { + #address-cells = 1; + #size-cells = 1; + + iommu_a: iommu@a { + reg = 0xa 0x1; + compatible = vendor,some-iommu; + #iommu-cells = 1; + }; + + iommu_b: iommu@b { + reg = 0xb 0x1; + compatible =
Re: [PATCH 2/2] iommu/omap: Add support for configuring dsp iommus on DRA7xx
* Suman Anna s-a...@ti.com [150722 09:25]: On 07/22/2015 12:26 AM, Tony Lindgren wrote: I don't like using syscon for tinkering directly with SoC registers. This is not a SoC-level register, but a register within a sub-module of the DSP processor sub-system. The DSP_SYSTEM sub-module in general is described in Section 5.3.3 of the TRM [1], and it implements different functionalities like the PRCM handshaking, wakeup logic and DSP subsystem top-level configuration. It is a module present within the DSP processor sub-system, so can only be accessed when the sub-system is clocked and the appropriate reset is released. OK so if it's specific to the DSP driver along the lines of sysc and syss registers. Typically we handle these registers by mapping them to the PM runtime functions for the interconnect so we can reset and idle the hardware modules even if there is no driver, see for example omap54xx_mmu_dsp_hwmod. We should use some Linux generic framework for configuring these bits to avoid nasty dependencies between various hardware modules on the SoC. What does DSP_SYS_MMU_CONFIG register do? It seems it's probably a regulator or a gate clock? If so, it should be set up as a regulator or a clock and then the omap-iommu driver can just use regulator or clcok framework to request the resource. No, its neither. It is a control bit that dictates whether the processor/EDMA addresses go through the respective MMU or not. The register currently has 4 bits (bit 0 in each nibble), one each for enabling each MMU and requesting an MMU abort on each. The MMU integration and enablement notes are detailed in Section 5.3.6 of the TRM [1], and the DSP_SYS_MMU_CONFIG register layout is in Table 5-28 (Page 1641). OK yeah seems like it should be handled by the DSP driver during probe after doing pm_runtime_get. Then the driver can configure things like IOMMU and load firmware. Or am I missing something here? Regards, Tony [1] http://www.ti.com/lit/ug/spruhz6/spruhz6.pdf ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Lockdep warning in VFIO using v4.2-rc3
Hi Alex, I stumbled over this lockdep warning yesterday while testing my VT-d changes. It looks like one code path is taking the locks: group-device_lock driver_lock pci_bus_sem while another path is taking pci_bus_sem group-device_lock which could lead to a deadlock. I attach the full warning, can you please have a look? [ 252.892008] == [ 252.892008] [ INFO: possible circular locking dependency detected ] [ 252.892008] 4.2.0-rc3+ #16 Not tainted [ 252.892008] --- [ 252.892008] qemu-system-x86/4986 is trying to acquire lock: [ 252.892008] (group-device_lock){+.+.+.}, at: [a0569da4] vfio_group_get_device+0x24/0xb0 [vfio] [ 252.892008] but task is already holding lock: [ 252.892008] (pci_bus_sem){.+}, at: [813ee47e] pci_walk_bus+0x2e/0xa0 [ 252.892008] which lock already depends on the new lock. [ 252.892008] the existing dependency chain (in reverse order) is: [ 252.892008] - #2 (pci_bus_sem){.+}: [ 252.892008][810c9152] __lock_acquire+0xca2/0x1550 [ 252.892008][810c9b6f] lock_acquire+0xdf/0x2c0 [ 252.892008][81710c4c] down_read+0x4c/0xa0 [ 252.892008][813ee47e] pci_walk_bus+0x2e/0xa0 [ 252.892008][a0657dcb] vfio_pci_release+0x18b/0x3c0 [vfio_pci] [ 252.892008][a056a100] vfio_device_fops_release+0x20/0x40 [vfio] [ 252.892008][81211460] __fput+0xf0/0x200 [ 252.892008][812115be] fput+0xe/0x10 [ 252.892008][8108f5fd] task_work_run+0x8d/0xc0 [ 252.892008][8106d35c] do_exit+0x32c/0xc30 [ 252.892008][8106f10c] do_group_exit+0x4c/0xc0 [ 252.892008][8107d678] get_signal+0x328/0x9e0 [ 252.892008][81003458] do_signal+0x28/0x9e0 [ 252.892008][81003e75] do_notify_resume+0x65/0x80 [ 252.892008][81713e2e] int_signal+0x12/0x17 [ 252.892008] - #1 (driver_lock){+.+.+.}: [ 252.892008][810c9152] __lock_acquire+0xca2/0x1550 [ 252.892008][810c9b6f] lock_acquire+0xdf/0x2c0 [ 252.892008][8170e71b] mutex_lock_nested+0x6b/0x420 [ 252.892008][a06576b8] vfio_pci_open+0x38/0x270 [vfio_pci] [ 252.892008][a056ab17] vfio_group_fops_unl_ioctl+0x267/0x460 [vfio] [ 252.892008][8122454d] do_vfs_ioctl+0x30d/0x580 [ 252.892008][81224839] SyS_ioctl+0x79/0x90 [ 252.892008][81713c32] entry_SYSCALL_64_fastpath+0x16/0x7a [ 252.892008] - #2 (group-device_lock){+.+.+.}: [ 252.892008][810c6c6c] check_prevs_add+0x8fc/0x900 [ 252.892008][810c9152] __lock_acquire+0xca2/0x1550 [ 252.892008][810c9b6f] lock_acquire+0xdf/0x2c0 [ 252.892008][8170e71b] mutex_lock_nested+0x6b/0x420 [ 252.892008][a0569da4] vfio_group_get_device+0x24/0xb0 [vfio] [ 252.892008][a056a165] vfio_device_get_from_dev+0x45/0xa0 [vfio] [ 252.892008][a065762c] vfio_pci_get_devs+0x2c/0x80 [vfio_pci] [ 252.892008][a065706d] vfio_pci_walk_wrapper+0x5d/0x70 [vfio_pci] [ 252.892008][813ee4c5] pci_walk_bus+0x75/0xa0 [ 252.892008][a0657e93] vfio_pci_release+0x253/0x3c0 [vfio_pci] [ 252.892008][a056a100] vfio_device_fops_release+0x20/0x40 [vfio] [ 252.892008][81211460] __fput+0xf0/0x200 [ 252.892008][812115be] fput+0xe/0x10 [ 252.892008][8108f5fd] task_work_run+0x8d/0xc0 [ 252.892008][8106d35c] do_exit+0x32c/0xc30 [ 252.892008][8106f10c] do_group_exit+0x4c/0xc0 [ 252.892008][8107d678] get_signal+0x328/0x9e0 [ 252.892008][81003458] do_signal+0x28/0x9e0 [ 252.892008][81003e75] do_notify_resume+0x65/0x80 [ 252.892008][81713e2e] int_signal+0x12/0x17 [ 252.892008] other info that might help us debug this: [ 252.892008] Chain exists of: group-device_lock -- driver_lock -- pci_bus_sem [ 252.892008] Possible unsafe locking scenario: [ 252.892008]CPU0CPU1 [ 252.892008] [ 252.892008] lock(pci_bus_sem); [ 252.892008]lock(driver_lock); [ 252.892008]lock(pci_bus_sem); [ 252.892008] lock(group-device_lock); [ 252.892008] *** DEADLOCK *** [ 252.892008] 2 locks held by qemu-system-x86/4986: [ 252.892008] #0: (driver_lock){+.+.+.}, at: [a0657c67] vfio_pci_release+0x27/0x3c0 [vfio_pci] [ 252.892008] #1: (pci_bus_sem){.+}, at: [813ee47e] pci_walk_bus+0x2e/0xa0 [ 252.892008] stack backtrace: [ 252.892008]