Re: Regression when booting 5.15 as dom0 on arm64 (WAS: Re: [linux-linus test] 161829: regressions - FAIL)
Hi Christoph,

On 10/05/2021 09:40, Christoph Hellwig wrote:
> On Sat, May 08, 2021 at 12:32:37AM +0100, Julien Grall wrote:
>> The pointer dereferenced seems to suggest that the swiotlb hasn't been
>> allocated. From what I can tell, this may be because swiotlb_force is
>> set to SWIOTLB_NO_FORCE, yet we still rely on the swiotlb when running
>> on top of Xen.
>>
>> I am not entirely sure what would be the correct fix. Any opinions?
>
> Can you try something like the patch below (not even compile tested, but
> the intent should be obvious)?
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 16a2b2b1c54d..7671bc153fb1 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -44,6 +44,8 @@
>  #include
>  #include
> +#include
> +
>  /*
>   * We need to be able to catch inadvertent references to memstart_addr
>   * that occur (potentially in generic code) before arm64_memblock_init()
> @@ -482,7 +484,7 @@ void __init mem_init(void)
>  	if (swiotlb_force == SWIOTLB_FORCE ||
>  	    max_pfn > PFN_DOWN(arm64_dma_phys_limit))
>  		swiotlb_init(1);
> -	else
> +	else if (!IS_ENABLED(CONFIG_XEN) || !xen_swiotlb_detect())
>  		swiotlb_force = SWIOTLB_NO_FORCE;
>
>  	set_max_mapnr(max_pfn - PHYS_PFN_OFFSET);

I have applied the patch on top of 5.13-rc1 and can confirm I am able to
boot dom0.

Are you going to submit the patch?

Thank you for your help!

Best regards,

-- 
Julien Grall
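For readers skimming the thread, the net effect of the proposed change is that SWIOTLB_NO_FORCE is no longer set when the kernel runs as a Xen domain that needs bounce buffering. A minimal annotated sketch of the resulting decision in mem_init() (taken from the patch above; the comments are editorial, and xen_swiotlb_detect() is the helper the patch relies on):

	if (swiotlb_force == SWIOTLB_FORCE ||
	    max_pfn > PFN_DOWN(arm64_dma_phys_limit))
		swiotlb_init(1);	/* bounce buffering is unconditionally needed */
	else if (!IS_ENABLED(CONFIG_XEN) || !xen_swiotlb_detect())
		/*
		 * Only disable the swiotlb when this is not a Xen domain that
		 * will need it; otherwise xen-swiotlb would later touch a
		 * buffer that was never allocated, which matches the
		 * unallocated-swiotlb dereference reported at the top of
		 * this thread.
		 */
		swiotlb_force = SWIOTLB_NO_FORCE;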
Regression when booting 5.15 as dom0 on arm64 (WAS: Re: [linux-linus test] 161829: regressions - FAIL)
 amd64-xl-qcow2                                        fail
 test-armhf-armhf-libvirt-raw                          pass
 test-amd64-i386-xl-raw                                fail
 test-amd64-amd64-xl-rtds                              fail
 test-armhf-armhf-xl-rtds                              pass
 test-arm64-arm64-xl-seattle                           fail
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow      pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow       fail
 test-amd64-amd64-xl-shadow                            pass
 test-amd64-i386-xl-shadow                             fail
 test-arm64-arm64-xl-thunderx                          fail
 test-amd64-amd64-libvirt-vhd                          fail
 test-armhf-armhf-xl-vhd                               fail

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Not pushing.

(No revision log; it would be 1625261 lines long.)

-- 
Julien Grall
Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy
Hi Robin,

On 16/10/2019 17:09, Robin Murphy wrote:
On 16/10/2019 17:03, Joerg Roedel wrote:
On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote:
On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:

The x86 one might just be a mistake.

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index ad05484d0c80..63c4b894751d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
 	if (iommu_prot & IOMMU_WRITE)
 		prot |= IOMMU_PROT_IW;

-	ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
+	ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);

Yeah, that is a bug, I spotted that too.

@@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 	if (!iova)
 		goto out_free_page;

-	if (iommu_map(domain, iova, msi_addr, size, prot))
+	if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
 		goto out_free_iova;

Not so sure this is a bug, this code is only about setting up MSIs on ARM.
It probably doesn't need to be atomic.

Right, since the iommu_dma_prepare_msi() operation was broken out to happen
at MSI domain setup time, I don't think the comment in there applies any
more, so we can probably stop disabling irqs around the
iommu_dma_get_msi_page() call.

Yes, I agree that it does not need to be atomic anymore.

Cheers,

-- 
Julien Grall
Re: [Xen-devel] [PATCH 08/11] swiotlb-xen: use the same foreign page check everywhere
Hi Christoph,

On 8/16/19 2:00 PM, Christoph Hellwig wrote:
> xen_dma_map_page uses a different and more complicated check for foreign
> pages than the other three cache maintainance helpers. Switch it to the
> simpler pfn_vali method a well.

NIT: s/pfn_vali/pfn_valid/

> Signed-off-by: Christoph Hellwig

Reviewed-by: Julien Grall

Cheers,

-- 
Julien Grall
Re: [Xen-devel] [PATCH 01/11] xen/arm: use dma-noncoherent.h calls for xen-swiotlb cache maintainance
Hi Christoph,

On 8/16/19 2:00 PM, Christoph Hellwig wrote:
> +static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
> +	     dma_addr_t dev_addr, unsigned long offset, size_t size,
> +	     enum dma_data_direction dir, unsigned long attrs)
> +{
> +	unsigned long page_pfn = page_to_xen_pfn(page);
> +	unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr);
> +	unsigned long compound_pages =
> +		(1 << compound_order(page)) * XEN_PFN_PER_PAGE;
> +	bool local = (page_pfn <= dev_pfn) &&
> +		(dev_pfn - page_pfn < compound_pages);

The Arm version has a comment here explaining the local/foreign check.
Could we retain it?

> +	if (local)
> +		dma_direct_map_page(hwdev, page, offset, size, dir, attrs);
> +	else
> +		__xen_dma_map_page(hwdev, page, dev_addr, offset, size, dir, attrs);
> +}
> +

Cheers,

-- 
Julien Grall
Re: [Xen-devel] [PATCH 04/11] xen/arm: remove xen_dma_ops
Hi Christoph, On 8/16/19 2:00 PM, Christoph Hellwig wrote: arm and arm64 can just use xen_swiotlb_dma_ops directly like x86, no need for a pointer indirection. Signed-off-by: Christoph Hellwig Reviewed-by: Julien Grall Cheers, -- Julien Grall
Re: [Xen-devel] [PATCH 02/11] xen/arm: use dev_is_dma_coherent
Hi Christoph,

On 8/16/19 2:00 PM, Christoph Hellwig wrote:
> Use the dma-noncoherent dev_is_dma_coherent helper instead of the home
> grown variant.

It took me a bit of time to understand that dev->archdata.dma_coherent and
dev->dma_coherent will always contain the same value. Would you mind
mentioning it in the commit message?

Other than that:

Reviewed-by: Julien Grall

> Signed-off-by: Christoph Hellwig

Cheers,

-- 
Julien Grall
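For context, the generic helper referred to here is essentially a one-line accessor. A simplified sketch of the dma-noncoherent variant from that era (the surrounding config guards are elided, so treat this as illustrative rather than an exact quote of the header):

/* include/linux/dma-noncoherent.h -- simplified sketch */
static inline bool dev_is_dma_coherent(struct device *dev)
{
	return dev->dma_coherent;
}

Both dev->dma_coherent and the Arm-specific dev->archdata.dma_coherent are presumably set from the same coherent argument when the architecture's arch_setup_dma_ops() runs, which is why the two flags always agree — the point Julien asks to have spelled out in the commit message.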
Re: [Xen-devel] [PATCH 07/11] swiotlb-xen: provide a single page-coherent.h header
Hi Christoph,

On 8/17/19 7:50 AM, Christoph Hellwig wrote:
> On Fri, Aug 16, 2019 at 11:40:43PM +0100, Julien Grall wrote:
>> I am not sure I agree with this rename. The implementation of the helpers
>> are very Arm specific as this is assuming Dom0 is 1:1 mapped. This was
>> necessary due to the lack of IOMMU on Arm platforms back then. But this
>> is now a pain to get rid of it on newer platform...
>
> So if you look at the final version of the header after the whole series,
> what assumes a 1:1 mapping? It all just is
>
> 	if (pfn_valid())
> 		local cache sync;
> 	else
> 		call into the arch code;

In the context of Xen Arm, the dev_addr is a host physical address. From my
understanding pfn_valid() is dealing with a guest physical frame. Therefore,
by passing PFN_DOWN(dev_addr) as an argument, you assume that the host and
guest address spaces are the same.

> are you concerned that the local cache sync might have to be split up more
> for a non-1:1 map in that case? We could just move the xen_dma_* routines
> into the arch instead of __xen_dma, but it really helps to have a common
> interface header.

Moving xen_dma_* routines into the arch would be a good option. Although, I
would still consider a stub version for arch not requiring specific DMA.

Cheers,

-- 
Julien Grall
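To make the "stub version" idea concrete, here is a rough, purely illustrative sketch of how a common header could keep empty helpers for architectures that need no extra cache maintenance while letting others provide real ones. None of this is code from the series, and the CONFIG_ARCH_HAS_XEN_DMA_OPS guard in particular is an invented name for the example:

/* include/xen/page-coherent.h -- illustrative sketch only */
#include <linux/dma-mapping.h>
#include <linux/types.h>

#ifdef CONFIG_ARCH_HAS_XEN_DMA_OPS	/* hypothetical guard */
/* Architectures needing cache maintenance provide the real helpers. */
#include <asm/xen/page-coherent.h>
#else
/* Stubs for architectures where no extra DMA work is required. */
static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
		dma_addr_t dev_addr, unsigned long offset, size_t size,
		enum dma_data_direction dir, unsigned long attrs)
{
}

static inline void xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
		size_t size, enum dma_data_direction dir, unsigned long attrs)
{
}
#endif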
Re: [Xen-devel] [PATCH 07/11] swiotlb-xen: provide a single page-coherent.h header
Hi,

On 8/16/19 2:00 PM, Christoph Hellwig wrote:
> Merge the various page-coherent.h files into a single one that either
> provides prototypes or stubs depending on the need for cache maintainance.
>
> For extra benefits also include in the file actually implementing the
> interfaces provided.
>
> Signed-off-by: Christoph Hellwig
> ---
>  arch/arm/include/asm/xen/page-coherent.h   |  2 --
>  arch/arm/xen/mm.c                          |  1 +
>  arch/arm64/include/asm/xen/page-coherent.h |  2 --
>  arch/x86/include/asm/xen/page-coherent.h   | 22 --
>  drivers/xen/swiotlb-xen.c                  |  4 +---
>  include/Kbuild                             |  2 +-
>  include/xen/{arm => }/page-coherent.h      | 27 +++---

I am not sure I agree with this rename. The implementation of the helpers
are very Arm specific as this is assuming Dom0 is 1:1 mapped. This was
necessary due to the lack of IOMMU on Arm platforms back then. But this is
now a pain to get rid of it on newer platform...

Cheers,

-- 
Julien Grall
[PATCH v3 7/7] iommu/dma-iommu: Remove iommu_dma_map_msi_msg()
A recent change split iommu_dma_map_msi_msg() in two new functions. The function was still implemented to avoid modifying all the callers at once. Now that all the callers have been reworked, iommu_dma_map_msi_msg() can be removed. Signed-off-by: Julien Grall Reviewed-by: Robin Murphy Reviewed-by: Eric Auger --- Changes in v3: - Add Robin's and Eric's reviewed-by Changes in v2: - Rework the commit message --- drivers/iommu/dma-iommu.c | 20 include/linux/dma-iommu.h | 5 - 2 files changed, 25 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index f847904098f7..13916fefeb27 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -935,23 +935,3 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc, msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; msg->address_lo += lower_32_bits(msi_page->iova); } - -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) -{ - struct msi_desc *desc = irq_get_msi_desc(irq); - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; - - if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { - /* -* We're called from a void callback, so the best we can do is -* 'fail' by filling the message with obviously bogus values. -* Since we got this far due to an IOMMU being present, it's -* not like the existing address would have worked anyway... -*/ - msg->address_hi = ~0U; - msg->address_lo = ~0U; - msg->data = ~0U; - } else { - iommu_dma_compose_msi_msg(desc, msg); - } -} diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index 0b781a98ee73..476e0c54de2d 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -84,7 +84,6 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr); void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg); -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg); void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list); #else @@ -124,10 +123,6 @@ static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, { } -static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) -{ -} - static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) { } -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 6/7] irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg()
The functions mbi_compose_m{b, s}i_msg may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() requires to be called from a preemptible context. A recent patch split iommu_dma_map_msi_msg in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv3 MSI driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the two MSI mappings when allocating the MSI interrupt. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v3-mbi.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3-mbi.c b/drivers/irqchip/irq-gic-v3-mbi.c index fbfa7ff6deb1..d50f6cdf043c 100644 --- a/drivers/irqchip/irq-gic-v3-mbi.c +++ b/drivers/irqchip/irq-gic-v3-mbi.c @@ -84,6 +84,7 @@ static void mbi_free_msi(struct mbi_range *mbi, unsigned int hwirq, static int mbi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct mbi_range *mbi = NULL; int hwirq, offset, i, err = 0; @@ -104,6 +105,16 @@ static int mbi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, hwirq = mbi->spi_start + offset; + err = iommu_dma_prepare_msi(info->desc, + mbi_phys_base + GICD_CLRSPI_NSR); + if (err) + return err; + + err = iommu_dma_prepare_msi(info->desc, + mbi_phys_base + GICD_SETSPI_NSR); + if (err) + return err; + for (i = 0; i < nr_irqs; i++) { err = mbi_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) @@ -142,7 +153,7 @@ static void mbi_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) msg[0].address_lo = lower_32_bits(mbi_phys_base + GICD_SETSPI_NSR); msg[0].data = data->parent_data->hwirq; - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg); } #ifdef CONFIG_PCI_MSI @@ -202,7 +213,7 @@ static void mbi_compose_mbi_msg(struct irq_data *data, struct msi_msg *msg) msg[1].address_lo = lower_32_bits(mbi_phys_base + GICD_CLRSPI_NSR); msg[1].data = data->parent_data->hwirq; - iommu_dma_map_msi_msg(data->irq, &msg[1]); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), &msg[1]); } /* Platform-MSI specific irqchip */ -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 5/7] irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg()
ls_scfg_msi_compose_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() requires to be called from a preemptible context. A recent patch split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The FreeScale SCFG MSI driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI maping when allocating the MSI interrupt. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-ls-scfg-msi.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-ls-scfg-msi.c b/drivers/irqchip/irq-ls-scfg-msi.c index c671b3212010..669d29105772 100644 --- a/drivers/irqchip/irq-ls-scfg-msi.c +++ b/drivers/irqchip/irq-ls-scfg-msi.c @@ -100,7 +100,7 @@ static void ls_scfg_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) msg->data |= cpumask_first(mask); } - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg); } static int ls_scfg_msi_set_affinity(struct irq_data *irq_data, @@ -141,6 +141,7 @@ static int ls_scfg_msi_domain_irq_alloc(struct irq_domain *domain, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct ls_scfg_msi *msi_data = domain->host_data; int pos, err = 0; @@ -157,6 +158,10 @@ static int ls_scfg_msi_domain_irq_alloc(struct irq_domain *domain, if (err) return err; + err = iommu_dma_prepare_msi(info->desc, msi_data->msiir_addr); + if (err) + return err; + irq_domain_set_info(domain, virq, pos, &ls_scfg_msi_parent_chip, msi_data, handle_simple_irq, NULL, NULL); -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 4/7] irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg()
its_irq_compose_msi_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg requires to be called from a preemptible context. A recent change split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv3 ITS driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI mapping when allocating the MSI interrupt. Signed-off-by: Julien Grall Reviewed-by: Eric Auger --- Changes in v3: - Fix typo in the commit message - Check the return of iommu_dma_prepare_msi - Add Eric's reviewed-by Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v3-its.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 7577755bdcf4..9cddf336c09d 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1179,7 +1179,7 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg) msg->address_hi = upper_32_bits(addr); msg->data = its_get_event_id(d); - iommu_dma_map_msi_msg(d->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg); } static int its_irq_set_irqchip_state(struct irq_data *d, @@ -2566,6 +2566,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, { msi_alloc_info_t *info = args; struct its_device *its_dev = info->scratchpad[0].ptr; + struct its_node *its = its_dev->its; irq_hw_number_t hwirq; int err; int i; @@ -2574,6 +2575,10 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, if (err) return err; + err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev)); + if (err) + return err; + for (i = 0; i < nr_irqs; i++) { err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 3/7] irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg()
gicv2m_compose_msi_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() requires to be called from a preemptible context. A recent change split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv2m driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI mapping when allocating the MSI interrupt. Signed-off-by: Julien Grall Reviewed-by: Eric Auger --- Changes in v3: - Add Eric's reviewed-by Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v2m.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c index f5fe0100f9ff..4359f0583377 100644 --- a/drivers/irqchip/irq-gic-v2m.c +++ b/drivers/irqchip/irq-gic-v2m.c @@ -110,7 +110,7 @@ static void gicv2m_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) if (v2m->flags & GICV2M_NEEDS_SPI_OFFSET) msg->data -= v2m->spi_offset; - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg); } static struct irq_chip gicv2m_irq_chip = { @@ -167,6 +167,7 @@ static void gicv2m_unalloc_msi(struct v2m_data *v2m, unsigned int hwirq, static int gicv2m_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct v2m_data *v2m = NULL, *tmp; int hwirq, offset, i, err = 0; @@ -186,6 +187,11 @@ static int gicv2m_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, hwirq = v2m->spi_start + offset; + err = iommu_dma_prepare_msi(info->desc, + v2m->res.start + V2M_MSI_SETSPI_NS); + if (err) + return err; + for (i = 0; i < nr_irqs; i++) { err = gicv2m_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 2/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts
On RT, iommu_dma_map_msi_msg() may be called from non-preemptible context. This will lead to a splat with CONFIG_DEBUG_ATOMIC_SLEEP as the function is using spin_lock (they can sleep on RT). iommu_dma_map_msi_msg() is used to map the MSI page in the IOMMU PT and update the MSI message with the IOVA. Only the part to lookup for the MSI page requires to be called in preemptible context. As the MSI page cannot change over the lifecycle of the MSI interrupt, the lookup can be cached and re-used later on. iomma_dma_map_msi_msg() is now split in two functions: - iommu_dma_prepare_msi(): This function will prepare the mapping in the IOMMU and store the cookie in the structure msi_desc. This function should be called in preemptible context. - iommu_dma_compose_msi_msg(): This function will update the MSI message with the IOVA when the device is behind an IOMMU. Signed-off-by: Julien Grall Reviewed-by: Robin Murphy Reviewed-by: Eric Auguer --- Changes in v3: - Update the comment to use kerneldoc format - Fix typoes in the comments - More use of msi_desc_set_iommu_cookie - Add Robin's and Eric's reviewed-by Changes in v2: - Rework the commit message to use imperative mood - Use the MSI accessor to get/set the iommu cookie - Don't use ternary on return - Select CONFIG_IRQ_MSI_IOMMU - Pass an msi_desc rather than the irq number --- drivers/iommu/Kconfig | 1 + drivers/iommu/dma-iommu.c | 46 +- include/linux/dma-iommu.h | 25 + 3 files changed, 63 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 6f07f3b21816..eb1c8cd243f9 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -94,6 +94,7 @@ config IOMMU_DMA bool select IOMMU_API select IOMMU_IOVA + select IRQ_MSI_IOMMU select NEED_SG_DMA_LENGTH config FSL_PAMU diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 77aabe637a60..f847904098f7 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -888,17 +888,18 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, return NULL; } -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) { - struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq)); + struct device *dev = msi_desc_to_dev(desc); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct iommu_dma_cookie *cookie; struct iommu_dma_msi_page *msi_page; - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; unsigned long flags; - if (!domain || !domain->iova_cookie) - return; + if (!domain || !domain->iova_cookie) { + desc->iommu_cookie = NULL; + return 0; + } cookie = domain->iova_cookie; @@ -911,7 +912,36 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain); spin_unlock_irqrestore(&cookie->msi_lock, flags); - if (WARN_ON(!msi_page)) { + msi_desc_set_iommu_cookie(desc, msi_page); + + if (!msi_page) + return -ENOMEM; + return 0; +} + +void iommu_dma_compose_msi_msg(struct msi_desc *desc, + struct msi_msg *msg) +{ + struct device *dev = msi_desc_to_dev(desc); + const struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + const struct iommu_dma_msi_page *msi_page; + + msi_page = msi_desc_get_iommu_cookie(desc); + + if (!domain || !domain->iova_cookie || WARN_ON(!msi_page)) + return; + + msg->address_hi = upper_32_bits(msi_page->iova); + msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; + msg->address_lo += lower_32_bits(msi_page->iova); +} + +void 
iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +{ + struct msi_desc *desc = irq_get_msi_desc(irq); + phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; + + if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { /* * We're called from a void callback, so the best we can do is * 'fail' by filling the message with obviously bogus values. @@ -922,8 +952,6 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) msg->address_lo = ~0U; msg->data = ~0U; } else { - msg->address_hi = upper_32_bits(msi_page->iova); - msg->address_lo &= cookie_msi_granule(cookie) - 1; - msg->address_lo += lower_32_bits(msi_page->iova); +
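Taken together with the irqchip conversions that follow in this series (patches 3/7 to 6/7), the intended call pattern looks roughly like the sketch below. foo_* and doorbell are placeholders for the example; each driver passes its own doorbell address (for instance v2m->res.start + V2M_MSI_SETSPI_NS in the GICv2m driver):

/* 1. In the irq domain .alloc callback -- preemptible context. */
static int foo_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
				unsigned int nr_irqs, void *args)
{
	msi_alloc_info_t *info = args;
	int err;

	/* Map the doorbell page in the IOMMU and cache it in the msi_desc. */
	err = iommu_dma_prepare_msi(info->desc, doorbell);
	if (err)
		return err;

	/* ... allocate the interrupts as before ... */
	return 0;
}

/* 2. In .irq_compose_msi_msg -- may run in non-preemptible context. */
static void foo_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
{
	msg->address_hi = upper_32_bits(doorbell);
	msg->address_lo = lower_32_bits(doorbell);
	msg->data = data->hwirq;

	/* Only swizzles the address using the cached cookie; no allocation. */
	iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg);
}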
[PATCH v3 1/7] genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
When an MSI doorbell is located downstream of an IOMMU, it is required to swizzle the physical address with an appropriately-mapped IOVA for any device attached to one of our DMA ops domain. At the moment, the allocation of the mapping may be done when composing the message. However, the composing may be done in non-preemtible context while the allocation requires to be called from preemptible context. A follow-up change will split the current logic in two functions requiring to keep an IOMMU cookie per MSI. A new field is introduced in msi_desc to store an IOMMU cookie. As the cookie may not be required in some configuration, the field is protected under a new config CONFIG_IRQ_MSI_IOMMU. A pair of helpers has also been introduced to access the field. Signed-off-by: Julien Grall Reviewed-by: Robin Murphy Reviewed-by: Eric Auger --- Changes in v3: - Add Robin's and Eric's reviewed-by Changes in v2: - Update the commit message to use imperative mood - Protect the field with a new config that will be selected by IOMMU_DMA later on - Add a set of helpers to access the new field --- include/linux/msi.h | 26 ++ kernel/irq/Kconfig | 3 +++ 2 files changed, 29 insertions(+) diff --git a/include/linux/msi.h b/include/linux/msi.h index 7e9b81c3b50d..82a308c19222 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -77,6 +77,9 @@ struct msi_desc { struct device *dev; struct msi_msg msg; struct irq_affinity_desc*affinity; +#ifdef CONFIG_IRQ_MSI_IOMMU + const void *iommu_cookie; +#endif union { /* PCI MSI/X specific data */ @@ -119,6 +122,29 @@ struct msi_desc { #define for_each_msi_entry_safe(desc, tmp, dev)\ list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list) +#ifdef CONFIG_IRQ_MSI_IOMMU +static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) +{ + return desc->iommu_cookie; +} + +static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, +const void *iommu_cookie) +{ + desc->iommu_cookie = iommu_cookie; +} +#else +static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) +{ + return NULL; +} + +static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, +const void *iommu_cookie) +{ +} +#endif + #ifdef CONFIG_PCI_MSI #define first_pci_msi_entry(pdev) first_msi_entry(&(pdev)->dev) #define for_each_pci_msi_entry(desc, pdev) \ diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig index 5f3e2baefca9..8fee06625c37 100644 --- a/kernel/irq/Kconfig +++ b/kernel/irq/Kconfig @@ -91,6 +91,9 @@ config GENERIC_MSI_IRQ_DOMAIN select IRQ_DOMAIN_HIERARCHY select GENERIC_MSI_IRQ +config IRQ_MSI_IOMMU + bool + config HANDLE_DOMAIN_IRQ bool -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 0/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts
Hi all, On RT, the function iommu_dma_map_msi_msg expects to be called from preemptible context. However, this is not always the case resulting a splat with !CONFIG_DEBUG_ATOMIC_SLEEP: [ 48.875777] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:974 [ 48.875779] in_atomic(): 1, irqs_disabled(): 128, pid: 2103, name: ip [ 48.875782] INFO: lockdep is turned off. [ 48.875784] irq event stamp: 10684 [ 48.875786] hardirqs last enabled at (10683): [] _raw_spin_unlock_irqrestore+0x88/0x90 [ 48.875791] hardirqs last disabled at (10684): [] _raw_spin_lock_irqsave+0x24/0x68 [ 48.875796] softirqs last enabled at (0): [] copy_process.isra.1.part.2+0x8d8/0x1970 [ 48.875801] softirqs last disabled at (0): [<>] (null) [ 48.875805] Preemption disabled at: [ 48.875805] [] __setup_irq+0xd8/0x6c0 [ 48.875811] CPU: 2 PID: 2103 Comm: ip Not tainted 5.0.3-rt1-7-g42ede9a0fed6 #45 [ 48.875815] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jan 23 2017 [ 48.875817] Call trace: [ 48.875818] dump_backtrace+0x0/0x140 [ 48.875821] show_stack+0x14/0x20 [ 48.875823] dump_stack+0xa0/0xd4 [ 48.875827] ___might_sleep+0x16c/0x1f8 [ 48.875831] rt_spin_lock+0x5c/0x70 [ 48.875835] iommu_dma_map_msi_msg+0x5c/0x1d8 [ 48.875839] gicv2m_compose_msi_msg+0x3c/0x48 [ 48.875843] irq_chip_compose_msi_msg+0x40/0x58 [ 48.875846] msi_domain_activate+0x38/0x98 [ 48.875849] __irq_domain_activate_irq+0x58/0xa0 [ 48.875852] irq_domain_activate_irq+0x34/0x58 [ 48.875855] irq_activate+0x28/0x30 [ 48.875858] __setup_irq+0x2b0/0x6c0 [ 48.875861] request_threaded_irq+0xdc/0x188 [ 48.875865] sky2_setup_irq+0x44/0xf8 [ 48.875868] sky2_open+0x1a4/0x240 [ 48.875871] __dev_open+0xd8/0x188 [ 48.875874] __dev_change_flags+0x164/0x1f0 [ 48.875877] dev_change_flags+0x20/0x60 [ 48.875879] do_setlink+0x2a0/0xd30 [ 48.875882] __rtnl_newlink+0x5b4/0x6d8 [ 48.875885] rtnl_newlink+0x50/0x78 [ 48.875888] rtnetlink_rcv_msg+0x178/0x640 [ 48.875891] netlink_rcv_skb+0x58/0x118 [ 48.875893] rtnetlink_rcv+0x14/0x20 [ 48.875896] netlink_unicast+0x188/0x200 [ 48.875898] netlink_sendmsg+0x248/0x3d8 [ 48.875900] sock_sendmsg+0x18/0x40 [ 48.875904] ___sys_sendmsg+0x294/0x2d0 [ 48.875908] __sys_sendmsg+0x68/0xb8 [ 48.875911] __arm64_sys_sendmsg+0x20/0x28 [ 48.875914] el0_svc_common+0x90/0x118 [ 48.875918] el0_svc_handler+0x2c/0x80 [ 48.875922] el0_svc+0x8/0xc Most of the patches have now been acked (missing a couple of ack from Joerg). I was able to test the changes in GICv2m and GICv3 ITS. I don't have hardware for the other interrupt controllers. 
Cheers, Julien Grall (7): genirq/msi: Add a new field in msi_desc to store an IOMMU cookie iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg() irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg() irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg() irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg() iommu/dma-iommu: Remove iommu_dma_map_msi_msg() drivers/iommu/Kconfig | 1 + drivers/iommu/dma-iommu.c | 48 +++ drivers/irqchip/irq-gic-v2m.c | 8 ++- drivers/irqchip/irq-gic-v3-its.c | 7 +- drivers/irqchip/irq-gic-v3-mbi.c | 15 ++-- drivers/irqchip/irq-ls-scfg-msi.c | 7 +- include/linux/dma-iommu.h | 24 ++-- include/linux/msi.h | 26 + kernel/irq/Kconfig| 3 +++ 9 files changed, 112 insertions(+), 27 deletions(-) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 4/7] irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg()
On 30/04/2019 13:34, Auger Eric wrote: Hi Julien, Hi Eric, Thank you for the review! On 4/29/19 4:44 PM, Julien Grall wrote: its_irq_compose_msi_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg requires to be called from a preemptible context. A recent change split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv3 ITS driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI maping when allocating the MSI interrupt. mapping Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v3-its.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 7577755bdcf4..12ddbcfe1b1e 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1179,7 +1179,7 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg) msg->address_hi = upper_32_bits(addr); msg->data= its_get_event_id(d); - iommu_dma_map_msi_msg(d->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg); } static int its_irq_set_irqchip_state(struct irq_data *d, @@ -2566,6 +2566,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, { msi_alloc_info_t *info = args; struct its_device *its_dev = info->scratchpad[0].ptr; + struct its_node *its = its_dev->its; irq_hw_number_t hwirq; int err; int i; @@ -2574,6 +2575,8 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, if (err) return err; + err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev)); Test err as in gicv2m driver? Hmmm yes. Marc, do you want me to respin the patch? Cheers, -- Julien Grall ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 7/7] iommu/dma-iommu: Remove iommu_dma_map_msi_msg()
A recent change split iommu_dma_map_msi_msg() in two new functions. The function was still implemented to avoid modifying all the callers at once. Now that all the callers have been reworked, iommu_dma_map_msi_msg() can be removed. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message --- drivers/iommu/dma-iommu.c | 20 include/linux/dma-iommu.h | 5 - 2 files changed, 25 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 2309f59cefa4..12f4464828a4 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -936,23 +936,3 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc, msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; msg->address_lo += lower_32_bits(msi_page->iova); } - -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) -{ - struct msi_desc *desc = irq_get_msi_desc(irq); - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; - - if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { - /* -* We're called from a void callback, so the best we can do is -* 'fail' by filling the message with obviously bogus values. -* Since we got this far due to an IOMMU being present, it's -* not like the existing address would have worked anyway... -*/ - msg->address_hi = ~0U; - msg->address_lo = ~0U; - msg->data = ~0U; - } else { - iommu_dma_compose_msi_msg(desc, msg); - } -} diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index 3fc48fbd6f63..ddd217c197df 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -82,7 +82,6 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr); void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg); -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg); void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list); #else @@ -122,10 +121,6 @@ static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, { } -static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) -{ -} - static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) { } -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 5/7] irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg()
ls_scfg_msi_compose_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() requires to be called from a preemptible context. A recent patch split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The FreeScale SCFG MSI driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI maping when allocating the MSI interrupt. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-ls-scfg-msi.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-ls-scfg-msi.c b/drivers/irqchip/irq-ls-scfg-msi.c index c671b3212010..669d29105772 100644 --- a/drivers/irqchip/irq-ls-scfg-msi.c +++ b/drivers/irqchip/irq-ls-scfg-msi.c @@ -100,7 +100,7 @@ static void ls_scfg_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) msg->data |= cpumask_first(mask); } - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg); } static int ls_scfg_msi_set_affinity(struct irq_data *irq_data, @@ -141,6 +141,7 @@ static int ls_scfg_msi_domain_irq_alloc(struct irq_domain *domain, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct ls_scfg_msi *msi_data = domain->host_data; int pos, err = 0; @@ -157,6 +158,10 @@ static int ls_scfg_msi_domain_irq_alloc(struct irq_domain *domain, if (err) return err; + err = iommu_dma_prepare_msi(info->desc, msi_data->msiir_addr); + if (err) + return err; + irq_domain_set_info(domain, virq, pos, &ls_scfg_msi_parent_chip, msi_data, handle_simple_irq, NULL, NULL); -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 4/7] irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg()
its_irq_compose_msi_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg requires to be called from a preemptible context. A recent change split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv3 ITS driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI maping when allocating the MSI interrupt. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v3-its.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 7577755bdcf4..12ddbcfe1b1e 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1179,7 +1179,7 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg) msg->address_hi = upper_32_bits(addr); msg->data = its_get_event_id(d); - iommu_dma_map_msi_msg(d->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg); } static int its_irq_set_irqchip_state(struct irq_data *d, @@ -2566,6 +2566,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, { msi_alloc_info_t *info = args; struct its_device *its_dev = info->scratchpad[0].ptr; + struct its_node *its = its_dev->its; irq_hw_number_t hwirq; int err; int i; @@ -2574,6 +2575,8 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, if (err) return err; + err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev)); + for (i = 0; i < nr_irqs; i++) { err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 2/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts
On RT, iommu_dma_map_msi_msg() may be called from non-preemptible context. This will lead to a splat with CONFIG_DEBUG_ATOMIC_SLEEP as the function is using spin_lock (they can sleep on RT). iommu_dma_map_msi_msg() is used to map the MSI page in the IOMMU PT and update the MSI message with the IOVA. Only the part to lookup for the MSI page requires to be called in preemptible context. As the MSI page cannot change over the lifecycle of the MSI interrupt, the lookup can be cached and re-used later on. iomma_dma_map_msi_msg() is now split in two functions: - iommu_dma_prepare_msi(): This function will prepare the mapping in the IOMMU and store the cookie in the structure msi_desc. This function should be called in preemptible context. - iommu_dma_compose_msi_msg(): This function will update the MSI message with the IOVA when the device is behind an IOMMU. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood - Use the MSI accessor to get/set the iommu cookie - Don't use ternary on return - Select CONFIG_IRQ_MSI_IOMMU - Pass an msi_desc rather than the irq number --- drivers/iommu/Kconfig | 1 + drivers/iommu/dma-iommu.c | 47 ++- include/linux/dma-iommu.h | 23 +++ 3 files changed, 62 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 6f07f3b21816..eb1c8cd243f9 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -94,6 +94,7 @@ config IOMMU_DMA bool select IOMMU_API select IOMMU_IOVA + select IRQ_MSI_IOMMU select NEED_SG_DMA_LENGTH config FSL_PAMU diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 77aabe637a60..2309f59cefa4 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -888,17 +888,18 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, return NULL; } -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) { - struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq)); + struct device *dev = msi_desc_to_dev(desc); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct iommu_dma_cookie *cookie; struct iommu_dma_msi_page *msi_page; - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; unsigned long flags; - if (!domain || !domain->iova_cookie) - return; + if (!domain || !domain->iova_cookie) { + desc->iommu_cookie = NULL; + return 0; + } cookie = domain->iova_cookie; @@ -911,7 +912,37 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain); spin_unlock_irqrestore(&cookie->msi_lock, flags); - if (WARN_ON(!msi_page)) { + msi_desc_set_iommu_cookie(desc, msi_page); + + if (!msi_page) + return -ENOMEM; + else + return 0; +} + +void iommu_dma_compose_msi_msg(struct msi_desc *desc, + struct msi_msg *msg) +{ + struct device *dev = msi_desc_to_dev(desc); + const struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + const struct iommu_dma_msi_page *msi_page; + + msi_page = msi_desc_get_iommu_cookie(desc); + + if (!domain || !domain->iova_cookie || WARN_ON(!msi_page)) + return; + + msg->address_hi = upper_32_bits(msi_page->iova); + msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; + msg->address_lo += lower_32_bits(msi_page->iova); +} + +void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +{ + struct msi_desc *desc = irq_get_msi_desc(irq); + phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; + + if 
(WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { /* * We're called from a void callback, so the best we can do is * 'fail' by filling the message with obviously bogus values. @@ -922,8 +953,6 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) msg->address_lo = ~0U; msg->data = ~0U; } else { - msg->address_hi = upper_32_bits(msi_page->iova); - msg->address_lo &= cookie_msi_granule(cookie) - 1; - msg->address_lo += lower_32_bits(msi_page->iova); + iommu_dma_compose_msi_msg(desc, msg); } } diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index e760dc5d1fa8..3fc48fbd6f63 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -71,12
[PATCH v2 0/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts
Hi all, On RT, the function iommu_dma_map_msi_msg expects to be called from preemptible context. However, this is not always the case resulting a splat with !CONFIG_DEBUG_ATOMIC_SLEEP: [ 48.875777] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:974 [ 48.875779] in_atomic(): 1, irqs_disabled(): 128, pid: 2103, name: ip [ 48.875782] INFO: lockdep is turned off. [ 48.875784] irq event stamp: 10684 [ 48.875786] hardirqs last enabled at (10683): [] _raw_spin_unlock_irqrestore+0x88/0x90 [ 48.875791] hardirqs last disabled at (10684): [] _raw_spin_lock_irqsave+0x24/0x68 [ 48.875796] softirqs last enabled at (0): [] copy_process.isra.1.part.2+0x8d8/0x1970 [ 48.875801] softirqs last disabled at (0): [<>] (null) [ 48.875805] Preemption disabled at: [ 48.875805] [] __setup_irq+0xd8/0x6c0 [ 48.875811] CPU: 2 PID: 2103 Comm: ip Not tainted 5.0.3-rt1-7-g42ede9a0fed6 #45 [ 48.875815] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jan 23 2017 [ 48.875817] Call trace: [ 48.875818] dump_backtrace+0x0/0x140 [ 48.875821] show_stack+0x14/0x20 [ 48.875823] dump_stack+0xa0/0xd4 [ 48.875827] ___might_sleep+0x16c/0x1f8 [ 48.875831] rt_spin_lock+0x5c/0x70 [ 48.875835] iommu_dma_map_msi_msg+0x5c/0x1d8 [ 48.875839] gicv2m_compose_msi_msg+0x3c/0x48 [ 48.875843] irq_chip_compose_msi_msg+0x40/0x58 [ 48.875846] msi_domain_activate+0x38/0x98 [ 48.875849] __irq_domain_activate_irq+0x58/0xa0 [ 48.875852] irq_domain_activate_irq+0x34/0x58 [ 48.875855] irq_activate+0x28/0x30 [ 48.875858] __setup_irq+0x2b0/0x6c0 [ 48.875861] request_threaded_irq+0xdc/0x188 [ 48.875865] sky2_setup_irq+0x44/0xf8 [ 48.875868] sky2_open+0x1a4/0x240 [ 48.875871] __dev_open+0xd8/0x188 [ 48.875874] __dev_change_flags+0x164/0x1f0 [ 48.875877] dev_change_flags+0x20/0x60 [ 48.875879] do_setlink+0x2a0/0xd30 [ 48.875882] __rtnl_newlink+0x5b4/0x6d8 [ 48.875885] rtnl_newlink+0x50/0x78 [ 48.875888] rtnetlink_rcv_msg+0x178/0x640 [ 48.875891] netlink_rcv_skb+0x58/0x118 [ 48.875893] rtnetlink_rcv+0x14/0x20 [ 48.875896] netlink_unicast+0x188/0x200 [ 48.875898] netlink_sendmsg+0x248/0x3d8 [ 48.875900] sock_sendmsg+0x18/0x40 [ 48.875904] ___sys_sendmsg+0x294/0x2d0 [ 48.875908] __sys_sendmsg+0x68/0xb8 [ 48.875911] __arm64_sys_sendmsg+0x20/0x28 [ 48.875914] el0_svc_common+0x90/0x118 [ 48.875918] el0_svc_handler+0x2c/0x80 [ 48.875922] el0_svc+0x8/0xc This series is a first attempt to rework how MSI are mapped and composed when an IOMMU is present. I was able to test the changes in GICv2m and GICv3 ITS. I don't have hardware for the other interrupt controllers. 
Cheers, Julien Grall (7): genirq/msi: Add a new field in msi_desc to store an IOMMU cookie iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg() irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg() irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg() irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg() iommu/dma-iommu: Remove iommu_dma_map_msi_msg() drivers/iommu/Kconfig | 1 + drivers/iommu/dma-iommu.c | 49 +++ drivers/irqchip/irq-gic-v2m.c | 8 ++- drivers/irqchip/irq-gic-v3-its.c | 5 +++- drivers/irqchip/irq-gic-v3-mbi.c | 15 ++-- drivers/irqchip/irq-ls-scfg-msi.c | 7 +- include/linux/dma-iommu.h | 22 -- include/linux/msi.h | 26 + kernel/irq/Kconfig| 3 +++ 9 files changed, 109 insertions(+), 27 deletions(-) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 6/7] irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg()
The functions mbi_compose_m{b, s}i_msg may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() requires to be called from a preemptible context. A recent patch split iommu_dma_map_msi_msg in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv3 MSI driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the two MSI mappings when allocating the MSI interrupt. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v3-mbi.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3-mbi.c b/drivers/irqchip/irq-gic-v3-mbi.c index fbfa7ff6deb1..d50f6cdf043c 100644 --- a/drivers/irqchip/irq-gic-v3-mbi.c +++ b/drivers/irqchip/irq-gic-v3-mbi.c @@ -84,6 +84,7 @@ static void mbi_free_msi(struct mbi_range *mbi, unsigned int hwirq, static int mbi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct mbi_range *mbi = NULL; int hwirq, offset, i, err = 0; @@ -104,6 +105,16 @@ static int mbi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, hwirq = mbi->spi_start + offset; + err = iommu_dma_prepare_msi(info->desc, + mbi_phys_base + GICD_CLRSPI_NSR); + if (err) + return err; + + err = iommu_dma_prepare_msi(info->desc, + mbi_phys_base + GICD_SETSPI_NSR); + if (err) + return err; + for (i = 0; i < nr_irqs; i++) { err = mbi_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) @@ -142,7 +153,7 @@ static void mbi_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) msg[0].address_lo = lower_32_bits(mbi_phys_base + GICD_SETSPI_NSR); msg[0].data = data->parent_data->hwirq; - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg); } #ifdef CONFIG_PCI_MSI @@ -202,7 +213,7 @@ static void mbi_compose_mbi_msg(struct irq_data *data, struct msi_msg *msg) msg[1].address_lo = lower_32_bits(mbi_phys_base + GICD_CLRSPI_NSR); msg[1].data = data->parent_data->hwirq; - iommu_dma_map_msi_msg(data->irq, &msg[1]); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), &msg[1]); } /* Platform-MSI specific irqchip */ -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 3/7] irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg()
gicv2m_compose_msi_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() requires to be called from a preemptible context. A recent change split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv2m driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI mapping when allocating the MSI interrupt. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v2m.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c index f5fe0100f9ff..4359f0583377 100644 --- a/drivers/irqchip/irq-gic-v2m.c +++ b/drivers/irqchip/irq-gic-v2m.c @@ -110,7 +110,7 @@ static void gicv2m_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) if (v2m->flags & GICV2M_NEEDS_SPI_OFFSET) msg->data -= v2m->spi_offset; - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg); } static struct irq_chip gicv2m_irq_chip = { @@ -167,6 +167,7 @@ static void gicv2m_unalloc_msi(struct v2m_data *v2m, unsigned int hwirq, static int gicv2m_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct v2m_data *v2m = NULL, *tmp; int hwirq, offset, i, err = 0; @@ -186,6 +187,11 @@ static int gicv2m_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, hwirq = v2m->spi_start + offset; + err = iommu_dma_prepare_msi(info->desc, + v2m->res.start + V2M_MSI_SETSPI_NS); + if (err) + return err; + for (i = 0; i < nr_irqs; i++) { err = gicv2m_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 1/7] genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
When an MSI doorbell is located downstream of an IOMMU, it is required to swizzle the physical address with an appropriately-mapped IOVA for any device attached to one of our DMA ops domain. At the moment, the allocation of the mapping may be done when composing the message. However, the composing may be done in non-preemtible context while the allocation requires to be called from preemptible context. A follow-up change will split the current logic in two functions requiring to keep an IOMMU cookie per MSI. A new field is introduced in msi_desc to store an IOMMU cookie. As the cookie may not be required in some configuration, the field is protected under a new config CONFIG_IRQ_MSI_IOMMU. A pair of helpers has also been introduced to access the field. Signed-off-by: Julien Grall --- Changes in v2: - Update the commit message to use imperative mood - Protect the field with a new config that will be selected by IOMMU_DMA later on - Add a set of helpers to access the new field --- include/linux/msi.h | 26 ++ kernel/irq/Kconfig | 3 +++ 2 files changed, 29 insertions(+) diff --git a/include/linux/msi.h b/include/linux/msi.h index 7e9b81c3b50d..82a308c19222 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -77,6 +77,9 @@ struct msi_desc { struct device *dev; struct msi_msg msg; struct irq_affinity_desc*affinity; +#ifdef CONFIG_IRQ_MSI_IOMMU + const void *iommu_cookie; +#endif union { /* PCI MSI/X specific data */ @@ -119,6 +122,29 @@ struct msi_desc { #define for_each_msi_entry_safe(desc, tmp, dev)\ list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list) +#ifdef CONFIG_IRQ_MSI_IOMMU +static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) +{ + return desc->iommu_cookie; +} + +static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, +const void *iommu_cookie) +{ + desc->iommu_cookie = iommu_cookie; +} +#else +static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) +{ + return NULL; +} + +static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, +const void *iommu_cookie) +{ +} +#endif + #ifdef CONFIG_PCI_MSI #define first_pci_msi_entry(pdev) first_msi_entry(&(pdev)->dev) #define for_each_pci_msi_entry(desc, pdev) \ diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig index 5f3e2baefca9..8fee06625c37 100644 --- a/kernel/irq/Kconfig +++ b/kernel/irq/Kconfig @@ -91,6 +91,9 @@ config GENERIC_MSI_IRQ_DOMAIN select IRQ_DOMAIN_HIERARCHY select GENERIC_MSI_IRQ +config IRQ_MSI_IOMMU + bool + config HANDLE_DOMAIN_IRQ bool -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts
Hi Marc, On 23/04/2019 11:54, Marc Zyngier wrote: On 18/04/2019 18:26, Julien Grall wrote: On RT, the function iommu_dma_map_msi_msg may be called from non-preemptible context. This will lead to a splat with CONFIG_DEBUG_ATOMIC_SLEEP as the function is using spin_lock (they can sleep on RT). The function iommu_dma_map_msi_msg is used to map the MSI page in the IOMMU PT and update the MSI message with the IOVA. Only the part to lookup for the MSI page requires to be called in preemptible context. As the MSI page cannot change over the lifecycle of the MSI interrupt, the lookup can be cached and re-used later on. This patch split the function iommu_dma_map_msi_msg in two new functions: - iommu_dma_prepare_msi: This function will prepare the mapping in the IOMMU and store the cookie in the structure msi_desc. This function should be called in preemptible context. - iommu_dma_compose_msi_msg: This function will update the MSI message with the IOVA when the device is behind an IOMMU. Signed-off-by: Julien Grall --- drivers/iommu/dma-iommu.c | 43 --- include/linux/dma-iommu.h | 21 + 2 files changed, 53 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 77aabe637a60..f5c1f1685095 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -888,17 +888,17 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, return NULL; } -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) I quite like the idea of moving from having an irq to having an msi_desc passed to the IOMMU layer... { - struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq)); + struct device *dev = msi_desc_to_dev(desc); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct iommu_dma_cookie *cookie; - struct iommu_dma_msi_page *msi_page; - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; unsigned long flags; - if (!domain || !domain->iova_cookie) - return; + if (!domain || !domain->iova_cookie) { + desc->iommu_cookie = NULL; + return 0; + } cookie = domain->iova_cookie; @@ -908,10 +908,33 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) * of an MSI from within an IPI handler. */ spin_lock_irqsave(&cookie->msi_lock, flags); - msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain); + desc->iommu_cookie = iommu_dma_get_msi_page(dev, msi_addr, domain); spin_unlock_irqrestore(&cookie->msi_lock, flags); - if (WARN_ON(!msi_page)) { + return (desc->iommu_cookie) ? 0 : -ENOMEM; +} + +void iommu_dma_compose_msi_msg(int irq, struct msi_msg *msg) ... but I'd like it even better if it was uniform. Can you please move the irq_get_msi_desc() to the callers of iommu_dma_compose_msi_msg(), and make both functions take a msi_desc? Make sense. I will modify iommu_dma_compose_msi_msg to take a msi_desc in parameter. Cheers, -- Julien Grall ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
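This request is what produced the final shape of the interface in v2 and v3 (see [PATCH v2 2/7] and [PATCH v3 2/7] earlier in this digest): both halves take the msi_desc, and compose-side callers fetch it themselves.

int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr);
void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg);

/* e.g. from an irqchip .irq_compose_msi_msg callback: */
iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg);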
Re: [PATCH 2/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts
Hi, On 4/23/19 8:08 AM, Christoph Hellwig wrote: On Thu, Apr 18, 2019 at 06:26:06PM +0100, Julien Grall wrote: +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) { + struct device *dev = msi_desc_to_dev(desc); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct iommu_dma_cookie *cookie; unsigned long flags; + if (!domain || !domain->iova_cookie) { + desc->iommu_cookie = NULL; + return 0; + } cookie = domain->iova_cookie; @@ -908,10 +908,33 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) * of an MSI from within an IPI handler. */ spin_lock_irqsave(&cookie->msi_lock, flags); + desc->iommu_cookie = iommu_dma_get_msi_page(dev, msi_addr, domain); spin_unlock_irqrestore(&cookie->msi_lock, flags); + return (desc->iommu_cookie) ? 0 : -ENOMEM; No need for the braces. Also I personally find a: if (!desc->iommu_cookie) return -ENOMEM; return 0; much more readable, but that might just be personal preference. I am happy either way. I will use your suggestion in the next version. Cheers, -- Julien Grall ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
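For clarity, this is how iommu_dma_prepare_msi() from the patch would read with that suggestion folded in; a sketch of a possible next revision, not the committed code:

int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
{
	struct device *dev = msi_desc_to_dev(desc);
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
	struct iommu_dma_cookie *cookie;
	unsigned long flags;

	if (!domain || !domain->iova_cookie) {
		desc->iommu_cookie = NULL;
		return 0;
	}

	cookie = domain->iova_cookie;

	/* Serialise against concurrent MSI page lookups, as in the original code. */
	spin_lock_irqsave(&cookie->msi_lock, flags);
	desc->iommu_cookie = iommu_dma_get_msi_page(dev, msi_addr, domain);
	spin_unlock_irqrestore(&cookie->msi_lock, flags);

	if (!desc->iommu_cookie)
		return -ENOMEM;

	return 0;
}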
Re: [PATCH 1/7] genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
On 4/23/19 11:23 AM, Marc Zyngier wrote: Hi Julien, Hi Marc, On 18/04/2019 18:26, Julien Grall wrote: When an MSI doorbell is located downstream of an IOMMU, it is required to swizzle the physical address with an appropriately-mapped IOVA for any device attached to one of our DMA ops domains. At the moment, the allocation of the mapping may be done when composing the message. However, the composing may be done in non-preemptible context while the allocation must be done in preemptible context. A follow-up patch will split the current logic into two functions, which requires an IOMMU cookie to be kept per MSI. This patch introduces a new field in msi_desc to store an IOMMU cookie when CONFIG_IOMMU_DMA is selected. Signed-off-by: Julien Grall --- include/linux/msi.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/linux/msi.h b/include/linux/msi.h index 7e9b81c3b50d..d7907feef1bb 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -77,6 +77,9 @@ struct msi_desc { struct device *dev; struct msi_msg msg; struct irq_affinity_desc*affinity; +#ifdef CONFIG_IOMMU_DMA + const void *iommu_cookie; +#endif union { /* PCI MSI/X specific data */ Given that this is the only member in this structure that is dependent on a config option, you could also add a couple of accessors that would do nothing when IOMMU_DMA is not selected (and use that in the DMA code). I haven't seen any use of the helpers so far because the DMA code is also protected by IOMMU_DMA. I can add the helpers in the next version if you see any use outside of the DMA code. Cheers, -- Julien Grall ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/7] genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
Hi Thomas, On 4/18/19 8:28 PM, Thomas Gleixner wrote: On Thu, 18 Apr 2019, Julien Grall wrote: When an MSI doorbell is located downstream of an IOMMU, it is required to swizzle the physical address with an appropriately-mapped IOVA for any device attached to one of our DMA ops domains. At the moment, the allocation of the mapping may be done when composing the message. However, the composing may be done in non-preemptible context while the allocation must be done in preemptible context. A follow-up patch will split the current logic into two functions, which requires an IOMMU cookie to be kept per MSI. This patch introduces a new field in msi_desc to store an IOMMU cookie when CONFIG_IOMMU_DMA is selected. # git grep 'This patch' Documentation/process/ Applied to the whole series. Sorry for that. I will rework all the commit messages and resend the series. Cheers, -- Julien Grall ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 6/7] irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg
The functions mbi_compose_m{b, s}i_msg may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg requires to be called from a preemptible context. A recent patch split the function iommu_dma_map_msi_msg in 2 functions: one that should be called in preemptible context, the other does not have any requirement. This patch reworks the GICv3 MBI driver to avoid executing preemptible code in non-preemptible context by preparing the MSI mappings when allocating the MSI interrupt. Signed-off-by: Julien Grall --- drivers/irqchip/irq-gic-v3-mbi.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3-mbi.c b/drivers/irqchip/irq-gic-v3-mbi.c index fbfa7ff6deb1..c812b80e3ce9 100644 --- a/drivers/irqchip/irq-gic-v3-mbi.c +++ b/drivers/irqchip/irq-gic-v3-mbi.c @@ -84,6 +84,7 @@ static void mbi_free_msi(struct mbi_range *mbi, unsigned int hwirq, static int mbi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct mbi_range *mbi = NULL; int hwirq, offset, i, err = 0; @@ -104,6 +105,16 @@ static int mbi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, hwirq = mbi->spi_start + offset; + err = iommu_dma_prepare_msi(info->desc, + mbi_phys_base + GICD_CLRSPI_NSR); + if (err) + return err; + + err = iommu_dma_prepare_msi(info->desc, + mbi_phys_base + GICD_SETSPI_NSR); + if (err) + return err; + for (i = 0; i < nr_irqs; i++) { err = mbi_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) @@ -142,7 +153,7 @@ static void mbi_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) msg[0].address_lo = lower_32_bits(mbi_phys_base + GICD_SETSPI_NSR); msg[0].data = data->parent_data->hwirq; - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(data->irq, msg); } #ifdef CONFIG_PCI_MSI @@ -202,7 +213,7 @@ static void mbi_compose_mbi_msg(struct irq_data *data, struct msi_msg *msg) msg[1].address_lo = lower_32_bits(mbi_phys_base + GICD_CLRSPI_NSR); msg[1].data = data->parent_data->hwirq; - iommu_dma_map_msi_msg(data->irq, &msg[1]); + iommu_dma_compose_msi_msg(data->irq, &msg[1]); } /* Platform-MSI specific irqchip */ -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 7/7] iommu/dma-iommu: Remove iommu_dma_map_msi_msg()
A recent patch introduced two new functions to replace iommu_dma_map_msi_msg() to avoid executing preemptible code in non-preemptible context. All the existings callers are now using the two new functions, so iommu_dma_map_msi_msg() can be removed. Signed-off-by: Julien Grall --- drivers/iommu/dma-iommu.c | 20 include/linux/dma-iommu.h | 5 - 2 files changed, 25 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index f5c1f1685095..fdc8ded62e87 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -928,23 +928,3 @@ void iommu_dma_compose_msi_msg(int irq, struct msi_msg *msg) msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; msg->address_lo += lower_32_bits(msi_page->iova); } - -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) -{ - struct msi_desc *desc = irq_get_msi_desc(irq); - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; - - if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { - /* -* We're called from a void callback, so the best we can do is -* 'fail' by filling the message with obviously bogus values. -* Since we got this far due to an IOMMU being present, it's -* not like the existing address would have worked anyway... -*/ - msg->address_hi = ~0U; - msg->address_lo = ~0U; - msg->data = ~0U; - } else { - iommu_dma_compose_msi_msg(irq, msg); - } -} diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index 2f4b2c2cc859..4fe2b2fb19bf 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -81,7 +81,6 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr); /* Update the MSI message if required. */ void iommu_dma_compose_msi_msg(int irq, struct msi_msg *msg); -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg); void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list); #else @@ -120,10 +119,6 @@ static inline void iommu_dma_compose_msi_msg(int irq, struct msi_msg *msg) { } -static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) -{ -} - static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) { } -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3/7] irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg
The function gicv2m_compose_msi_msg may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg requires to be called from a preemptible context. A recent patch split the function iommu_dma_map_msi_msg in 2 functions: one that should be called in preemptible context, the other does not have any requirement. This patch reworks the gicv2m driver to avoid executing preemptible code in non-preemptible context by preparing the MSI mapping when allocating the MSI interrupt. Signed-off-by: Julien Grall --- drivers/irqchip/irq-gic-v2m.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c index f5fe0100f9ff..e5372acd92c9 100644 --- a/drivers/irqchip/irq-gic-v2m.c +++ b/drivers/irqchip/irq-gic-v2m.c @@ -110,7 +110,7 @@ static void gicv2m_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) if (v2m->flags & GICV2M_NEEDS_SPI_OFFSET) msg->data -= v2m->spi_offset; - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(data->irq, msg); } static struct irq_chip gicv2m_irq_chip = { @@ -167,6 +167,7 @@ static void gicv2m_unalloc_msi(struct v2m_data *v2m, unsigned int hwirq, static int gicv2m_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct v2m_data *v2m = NULL, *tmp; int hwirq, offset, i, err = 0; @@ -186,6 +187,11 @@ static int gicv2m_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, hwirq = v2m->spi_start + offset; + err = iommu_dma_prepare_msi(info->desc, + v2m->res.start + V2M_MSI_SETSPI_NS); + if (err) + return err; + for (i = 0; i < nr_irqs; i++) { err = gicv2m_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts
On RT, the function iommu_dma_map_msi_msg may be called from non-preemptible context. This will lead to a splat with CONFIG_DEBUG_ATOMIC_SLEEP as the function is using spin_lock (they can sleep on RT). The function iommu_dma_map_msi_msg is used to map the MSI page in the IOMMU PT and update the MSI message with the IOVA. Only the part to lookup for the MSI page requires to be called in preemptible context. As the MSI page cannot change over the lifecycle of the MSI interrupt, the lookup can be cached and re-used later on. This patch split the function iommu_dma_map_msi_msg in two new functions: - iommu_dma_prepare_msi: This function will prepare the mapping in the IOMMU and store the cookie in the structure msi_desc. This function should be called in preemptible context. - iommu_dma_compose_msi_msg: This function will update the MSI message with the IOVA when the device is behind an IOMMU. Signed-off-by: Julien Grall --- drivers/iommu/dma-iommu.c | 43 --- include/linux/dma-iommu.h | 21 + 2 files changed, 53 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 77aabe637a60..f5c1f1685095 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -888,17 +888,17 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, return NULL; } -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) { - struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq)); + struct device *dev = msi_desc_to_dev(desc); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct iommu_dma_cookie *cookie; - struct iommu_dma_msi_page *msi_page; - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; unsigned long flags; - if (!domain || !domain->iova_cookie) - return; + if (!domain || !domain->iova_cookie) { + desc->iommu_cookie = NULL; + return 0; + } cookie = domain->iova_cookie; @@ -908,10 +908,33 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) * of an MSI from within an IPI handler. */ spin_lock_irqsave(&cookie->msi_lock, flags); - msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain); + desc->iommu_cookie = iommu_dma_get_msi_page(dev, msi_addr, domain); spin_unlock_irqrestore(&cookie->msi_lock, flags); - if (WARN_ON(!msi_page)) { + return (desc->iommu_cookie) ? 0 : -ENOMEM; +} + +void iommu_dma_compose_msi_msg(int irq, struct msi_msg *msg) +{ + struct msi_desc *desc = irq_get_msi_desc(irq); + struct device *dev = msi_desc_to_dev(desc); + const struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + const struct iommu_dma_msi_page *msi_page = desc->iommu_cookie; + + if (!domain || !domain->iova_cookie || WARN_ON(!msi_page)) + return; + + msg->address_hi = upper_32_bits(msi_page->iova); + msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; + msg->address_lo += lower_32_bits(msi_page->iova); +} + +void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +{ + struct msi_desc *desc = irq_get_msi_desc(irq); + phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; + + if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { /* * We're called from a void callback, so the best we can do is * 'fail' by filling the message with obviously bogus values. 
@@ -922,8 +945,6 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) msg->address_lo = ~0U; msg->data = ~0U; } else { - msg->address_hi = upper_32_bits(msi_page->iova); - msg->address_lo &= cookie_msi_granule(cookie) - 1; - msg->address_lo += lower_32_bits(msi_page->iova); + iommu_dma_compose_msi_msg(irq, msg); } } diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index e760dc5d1fa8..2f4b2c2cc859 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -71,12 +71,23 @@ void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle, size_t size, enum dma_data_direction dir, unsigned long attrs); /* The DMA API isn't _quite_ the whole story, though... */ +/* + * Map the MSI page in the IOMMU device and store it in @desc + * + * Return 0 if it succeeded, otherwise an error if the preparation has failed. + */ +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr); + +/* Update the MSI message if required. */ +void iommu_dma_compose_msi_msg(int irq, struct msi_msg *msg);
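As a summary of how the two halves are meant to be consumed, the sketch below shows the calling pattern an MSI irqchip driver ends up with; EXAMPLE_DOORBELL and the example_* names are placeholders, and the real conversions are in patches 3/7 to 6/7:

#define EXAMPLE_DOORBELL	0x10000000UL	/* placeholder doorbell address */

static int example_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
				    unsigned int nr_irqs, void *args)
{
	msi_alloc_info_t *info = args;
	int err;

	/* Preemptible context: map the doorbell and cache the cookie in msi_desc. */
	err = iommu_dma_prepare_msi(info->desc, EXAMPLE_DOORBELL);
	if (err)
		return err;

	/* ... allocate and set up the hwirqs as before ... */
	return 0;
}

static void example_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
{
	msg->address_hi = upper_32_bits(EXAMPLE_DOORBELL);
	msg->address_lo = lower_32_bits(EXAMPLE_DOORBELL);
	msg->data = data->hwirq;

	/* Possibly non-preemptible: only swizzle the address with the cached IOVA. */
	iommu_dma_compose_msi_msg(data->irq, msg);
}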
[PATCH 4/7] irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg
The function its_irq_compose_msi_msg may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg requires to be called from a preemptible context. A recent patch split the function iommu_dma_map_msi_msg in 2 functions: one that should be called in preemptible context, the other does not have any requirement. This patch reworks the GICv3 ITS driver to avoid executing preemptible code in non-preemptible context by preparing the MSI mapping when allocating the MSI interrupt. Signed-off-by: Julien Grall --- drivers/irqchip/irq-gic-v3-its.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 7577755bdcf4..1e8e01797d9b 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1179,7 +1179,7 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg) msg->address_hi = upper_32_bits(addr); msg->data = its_get_event_id(d); - iommu_dma_map_msi_msg(d->irq, msg); + iommu_dma_compose_msi_msg(d->irq, msg); } static int its_irq_set_irqchip_state(struct irq_data *d, @@ -2566,6 +2566,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, { msi_alloc_info_t *info = args; struct its_device *its_dev = info->scratchpad[0].ptr; + struct its_node *its = its_dev->its; irq_hw_number_t hwirq; int err; int i; @@ -2574,6 +2575,8 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, if (err) return err; + err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev)); + for (i = 0; i < nr_irqs; i++) { err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
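One detail worth confirming in review: the hunk above does not check the value returned by iommu_dma_prepare_msi(), so a mapping failure would be silently ignored. A sketch of the allocation path with the result propagated is below; whether dropping the error is intentional is for the author to say.

	/* Sketch only: same allocation path as the hunk above, with the
	 * iommu_dma_prepare_msi() result checked instead of being discarded. */
	err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev));
	if (err)
		return err;

	for (i = 0; i < nr_irqs; i++) {
		err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
		if (err)
			return err;
		/* ... */
	}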
[PATCH 5/7] irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg
The function ls_scfg_msi_compose_msg may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg requires to be called from a preemptible context. A recent patch split the function iommu_dma_map_msi_msg in 2 functions: one that should be called in preemptible context, the other does not have any requirement. This patch reworks the FreeScale SCFG MSI driver to avoid executing preemptible code in non-preemptible context by preparing the MSI mapping when allocating the MSI interrupt. Signed-off-by: Julien Grall --- drivers/irqchip/irq-ls-scfg-msi.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-ls-scfg-msi.c b/drivers/irqchip/irq-ls-scfg-msi.c index c671b3212010..8099c5b1fcb5 100644 --- a/drivers/irqchip/irq-ls-scfg-msi.c +++ b/drivers/irqchip/irq-ls-scfg-msi.c @@ -100,7 +100,7 @@ static void ls_scfg_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) msg->data |= cpumask_first(mask); } - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(data->irq, msg); } static int ls_scfg_msi_set_affinity(struct irq_data *irq_data, @@ -141,6 +141,7 @@ static int ls_scfg_msi_domain_irq_alloc(struct irq_domain *domain, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct ls_scfg_msi *msi_data = domain->host_data; int pos, err = 0; @@ -157,6 +158,10 @@ static int ls_scfg_msi_domain_irq_alloc(struct irq_domain *domain, if (err) return err; + err = iommu_dma_prepare_msi(info->desc, msi_data->msiir_addr); + if (err) + return err; + irq_domain_set_info(domain, virq, pos, &ls_scfg_msi_parent_chip, msi_data, handle_simple_irq, NULL, NULL); -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts
Hi all, On RT, the function iommu_dma_map_msi_msg expects to be called from preemptible context. However, this is not always the case resulting a splat with !CONFIG_DEBUG_ATOMIC_SLEEP: [ 48.875777] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:974 [ 48.875779] in_atomic(): 1, irqs_disabled(): 128, pid: 2103, name: ip [ 48.875782] INFO: lockdep is turned off. [ 48.875784] irq event stamp: 10684 [ 48.875786] hardirqs last enabled at (10683): [] _raw_spin_unlock_irqrestore+0x88/0x90 [ 48.875791] hardirqs last disabled at (10684): [] _raw_spin_lock_irqsave+0x24/0x68 [ 48.875796] softirqs last enabled at (0): [] copy_process.isra.1.part.2+0x8d8/0x1970 [ 48.875801] softirqs last disabled at (0): [<>] (null) [ 48.875805] Preemption disabled at: [ 48.875805] [] __setup_irq+0xd8/0x6c0 [ 48.875811] CPU: 2 PID: 2103 Comm: ip Not tainted 5.0.3-rt1-7-g42ede9a0fed6 #45 [ 48.875815] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jan 23 2017 [ 48.875817] Call trace: [ 48.875818] dump_backtrace+0x0/0x140 [ 48.875821] show_stack+0x14/0x20 [ 48.875823] dump_stack+0xa0/0xd4 [ 48.875827] ___might_sleep+0x16c/0x1f8 [ 48.875831] rt_spin_lock+0x5c/0x70 [ 48.875835] iommu_dma_map_msi_msg+0x5c/0x1d8 [ 48.875839] gicv2m_compose_msi_msg+0x3c/0x48 [ 48.875843] irq_chip_compose_msi_msg+0x40/0x58 [ 48.875846] msi_domain_activate+0x38/0x98 [ 48.875849] __irq_domain_activate_irq+0x58/0xa0 [ 48.875852] irq_domain_activate_irq+0x34/0x58 [ 48.875855] irq_activate+0x28/0x30 [ 48.875858] __setup_irq+0x2b0/0x6c0 [ 48.875861] request_threaded_irq+0xdc/0x188 [ 48.875865] sky2_setup_irq+0x44/0xf8 [ 48.875868] sky2_open+0x1a4/0x240 [ 48.875871] __dev_open+0xd8/0x188 [ 48.875874] __dev_change_flags+0x164/0x1f0 [ 48.875877] dev_change_flags+0x20/0x60 [ 48.875879] do_setlink+0x2a0/0xd30 [ 48.875882] __rtnl_newlink+0x5b4/0x6d8 [ 48.875885] rtnl_newlink+0x50/0x78 [ 48.875888] rtnetlink_rcv_msg+0x178/0x640 [ 48.875891] netlink_rcv_skb+0x58/0x118 [ 48.875893] rtnetlink_rcv+0x14/0x20 [ 48.875896] netlink_unicast+0x188/0x200 [ 48.875898] netlink_sendmsg+0x248/0x3d8 [ 48.875900] sock_sendmsg+0x18/0x40 [ 48.875904] ___sys_sendmsg+0x294/0x2d0 [ 48.875908] __sys_sendmsg+0x68/0xb8 [ 48.875911] __arm64_sys_sendmsg+0x20/0x28 [ 48.875914] el0_svc_common+0x90/0x118 [ 48.875918] el0_svc_handler+0x2c/0x80 [ 48.875922] el0_svc+0x8/0xc This series is a first attempt to rework how MSI are mapped and composed when an IOMMU is present. I was able to test the changes in GICv2m and GICv3 ITS. I don't have hardware for the other interrupt controllers. Cheers, Julien Grall (7): genirq/msi: Add a new field in msi_desc to store an IOMMU cookie iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg iommu/dma-iommu: Remove iommu_dma_map_msi_msg() drivers/iommu/dma-iommu.c | 45 --- drivers/irqchip/irq-gic-v2m.c | 8 ++- drivers/irqchip/irq-gic-v3-its.c | 5 - drivers/irqchip/irq-gic-v3-mbi.c | 15 +++-- drivers/irqchip/irq-ls-scfg-msi.c | 7 +- include/linux/dma-iommu.h | 20 +++-- include/linux/msi.h | 3 +++ 7 files changed, 74 insertions(+), 29 deletions(-) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/7] genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
When an MSI doorbell is located downstream of an IOMMU, it is required to swizzle the physical address with an appropriately-mapped IOVA for any device attached to one of our DMA ops domains. At the moment, the allocation of the mapping may be done when composing the message. However, the composing may be done in non-preemptible context while the allocation must be done in preemptible context. A follow-up patch will split the current logic into two functions, which requires an IOMMU cookie to be kept per MSI. This patch introduces a new field in msi_desc to store an IOMMU cookie when CONFIG_IOMMU_DMA is selected. Signed-off-by: Julien Grall --- include/linux/msi.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/linux/msi.h b/include/linux/msi.h index 7e9b81c3b50d..d7907feef1bb 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -77,6 +77,9 @@ struct msi_desc { struct device *dev; struct msi_msg msg; struct irq_affinity_desc*affinity; +#ifdef CONFIG_IOMMU_DMA + const void *iommu_cookie; +#endif union { /* PCI MSI/X specific data */ -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Xen-devel] [PATCH] Revert "swiotlb: remove SWIOTLB_MAP_ERROR"
Hi Christoph, On 08/03/2019 15:23, Christoph Hellwig wrote: On Tue, Mar 05, 2019 at 09:41:46AM +, Julien Grall wrote: On Xen, dma_addr_t will always be 64-bit while the phys_addr_t will depend on the MMU type. So we may have phys_addr_t smaller than dma_addr_t from the kernel point of view. How can dma_addr_t on arm have a value > 32-bit when physical addresses are 32-bit only? The number of platforms with an IOMMU protecting all DMA-capable devices is not there yet. So we decided not to require an IOMMU for Dom0. Instead, its memory is directly mapped (i.e. guest physical address == host physical address) and will always be below 4GB to cater for the short page table case. In the common case, Dom0 also contains the PV backend drivers. Those drivers may directly use the guest buffer in the DMA request (so a copy is avoided). To avoid using a bounce buffer too much, xen-swiotlb will find the host physical address associated with the guest buffer and will use it to compute the DMA address. While the Dom0 kernel may only deal with 32-bit physical addresses, the hypervisor can still deal with up to 40-bit physical addresses. This means the guest memory can be allocated above the 4GB threshold. Hence the dma_addr_t is always 64-bit with CONFIG_XEN=y. Cheers, -- Julien Grall ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
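To make that concrete, this is roughly what the phys-to-bus conversion in drivers/xen/swiotlb-xen.c does; a simplified sketch rather than a verbatim copy, with an example_ prefix on the name. The backing frame returned by pfn_to_bfn() can live above 4GB even when the kernel's phys_addr_t is only 32-bit, which is why the result has to be a 64-bit dma_addr_t.

#include <xen/page.h>

static dma_addr_t example_xen_phys_to_bus(phys_addr_t paddr)
{
	/* The backing frame may belong to another guest and sit above 4GB. */
	unsigned long bfn = pfn_to_bfn(XEN_PFN_DOWN(paddr));
	dma_addr_t dma = (dma_addr_t)bfn << XEN_PAGE_SHIFT;

	dma |= paddr & ~XEN_PAGE_MASK;
	return dma;
}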
Re: [Xen-devel] [PATCH] Revert "swiotlb: remove SWIOTLB_MAP_ERROR"
Hi Arnd, On 3/5/19 8:16 AM, Arnd Bergmann wrote: On Tue, Mar 5, 2019 at 12:56 AM Robin Murphy wrote: On 2019-03-04 7:59 pm, Arnd Bergmann wrote: This reverts commit b907e20508d0 ("swiotlb: remove SWIOTLB_MAP_ERROR"), which introduced an overflow warning in configurations that have a larger dma_addr_t than phys_addr_t: In file included from include/linux/dma-direct.h:5, from kernel/dma/swiotlb.c:23: kernel/dma/swiotlb.c: In function 'swiotlb_tbl_map_single': include/linux/dma-mapping.h:136:28: error: conversion from 'long long unsigned int' to 'phys_addr_t' {aka 'unsigned int'} changes value from '18446744073709551615' to '4294967295' [-Werror=overflow] #define DMA_MAPPING_ERROR (~(dma_addr_t)0) ^ kernel/dma/swiotlb.c:544:9: note: in expansion of macro 'DMA_MAPPING_ERROR' return DMA_MAPPING_ERROR; The configuration that caused this is on 32-bit ARM, where the DMA address space depends on the enabled hardware platforms, while the physical address space depends on the type of MMU chosen (classic vs LPAE). Are these real platforms, or random configs? Realistically I don't see a great deal of need to support DMA_ADDR_T_64BIT for non-LPAE. Particularly in this case since AFAIK the only selector of SWIOTLB on Arm is Xen, and that by definition is never going to be useful on non-LPAE hardware. ... On Mon, Mar 4, 2019 at 11:00 PM Konrad Rzeszutek Wilk wrote: What about making the phys_addr_t and dma_addr_t have the same width with some magic #ifdef hackery? As far as I can tell, only randconfig builds see this problem, in real systems phys_addr_t is normally the same as dma_addr_t, and you could reasonably have a machine with a larger phys_addr_t than dma_addr_t but wouldn't need to bother. On Xen, dma_addr_t will always be 64-bit while the phys_addr_t will depend on the MMU type. So we may have phys_addr_t smaller than dma_addr_t from the kernel point of view. Cheers, -- Julien Grall ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
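For reference, the warning quoted above can be reproduced outside the kernel with nothing more than the two type widths involved; a standalone illustration, not kernel code:

/* Standalone illustration of the -Woverflow warning: a 32-bit phys_addr_t
 * (classic short-descriptor MMU) combined with the 64-bit dma_addr_t that
 * CONFIG_XEN forces on arm. */
typedef unsigned int       phys_addr_t;
typedef unsigned long long dma_addr_t;

#define DMA_MAPPING_ERROR	(~(dma_addr_t)0)

phys_addr_t swiotlb_tbl_map_single_sketch(void)
{
	/* 0xffffffffffffffff does not fit in 32 bits, hence the warning. */
	return DMA_MAPPING_ERROR;
}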
Re: [Xen-devel] [PATCH] Revert "swiotlb: remove SWIOTLB_MAP_ERROR"
Hi Robin, On 3/4/19 11:56 PM, Robin Murphy wrote: On 2019-03-04 7:59 pm, Arnd Bergmann wrote: This reverts commit b907e20508d0 ("swiotlb: remove SWIOTLB_MAP_ERROR"), which introduced an overflow warning in configurations that have a larger dma_addr_t than phys_addr_t: In file included from include/linux/dma-direct.h:5, from kernel/dma/swiotlb.c:23: kernel/dma/swiotlb.c: In function 'swiotlb_tbl_map_single': include/linux/dma-mapping.h:136:28: error: conversion from 'long long unsigned int' to 'phys_addr_t' {aka 'unsigned int'} changes value from '18446744073709551615' to '4294967295' [-Werror=overflow] #define DMA_MAPPING_ERROR (~(dma_addr_t)0) ^ kernel/dma/swiotlb.c:544:9: note: in expansion of macro 'DMA_MAPPING_ERROR' return DMA_MAPPING_ERROR; The configuration that caused this is on 32-bit ARM, where the DMA address space depends on the enabled hardware platforms, while the physical address space depends on the type of MMU chosen (classic vs LPAE). Are these real platforms, or random configs? Realistically I don't see a great deal of need to support DMA_ADDR_T_64BIT for non-LPAE. This is selected by CONFIG_XEN no matter the type of MMU chosen (see more below). Particularly in this case since AFAIK the only selector of SWIOTLB on Arm is Xen, and that by definition is never going to be useful on non-LPAE hardware. While Xen itself requires LPAE, it is still possible to run a non-LPAE kernel in the guest. For instance, last time I checked, Debian was shipping only non-LPAE kernel for Arm32. On Arm, swiotlb is only used by the hardware domain (aka Dom0) to allow DMA in memory mapped from other guest. So the returned DMA address may be 64-bit. Hence why we select DMA_ADDR_T_64BIT above. Cheers, -- Julien Grall ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Fail to boot Dom0 on Xen Arm64 after "dma-mapping: bypass indirect calls for dma-direct"
(+ Leo Yan who reported a similar issue on xen-devel) Hi Christoph, On 16/01/2019 18:19, Christoph Hellwig wrote: Does this fix your problem? Thank you for the patch. This allows me to boot Dom0 on Juno r2. My knowledge of DMA is quite limited, so I may have missed something. Looking at the change for arm64, you will always call the dma-direct API. In previous Linux versions, xen-swiotlb would call dev->archdata.dev_dma_ops (a copy of dev->dma_ops taken before setting the Xen DMA ops) if it was not NULL. Does that mean we expect dev->dma_ops to always be NULL and hence use the dma-direct API? Cheers, -- Julien Grall ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
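For context, the pre-5.0 dispatch being described looked roughly like the sketch below; names are simplified with an example_ prefix, and dev_dma_ops only existed in the arch dev_archdata when CONFIG_XEN was enabled:

static const struct dma_map_ops *example_xen_get_dma_ops(struct device *dev)
{
	/* Use the ops saved before Xen installed its own, if any were set... */
	if (dev->archdata.dev_dma_ops)
		return dev->archdata.dev_dma_ops;

	/* ...otherwise fall back to the architecture's default ops. */
	return get_arch_dma_ops(dev->bus);
}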
Fail to boot Dom0 on Xen Arm64 after "dma-mapping: bypass indirect calls for dma-direct"
Hi, I made an attempt to boot Linux 5.0-rc2 as Dom0 on Xen Arm64 and got the following splat: [4.074264] Unable to handle kernel NULL pointer dereference at virtual address [4.083074] Mem abort info: [4.085870] ESR = 0x9604 [4.089050] Exception class = DABT (current EL), IL = 32 bits [4.095065] SET = 0, FnV = 0 [4.098138] EA = 0, S1PTW = 0 [4.101405] Data abort info: [4.104362] ISV = 0, ISS = 0x0004 [4.108289] CM = 0, WnR = 0 [4.111328] user pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval) [4.118025] [] pgd= [4.123058] Internal error: Oops: 9604 [#1] PREEMPT SMP [4.128590] Modules linked in: [4.131727] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2-00154-gc5dbed6dcf60 #1191 [4.139997] Hardware name: ARM Juno development board (r2) (DT) [4.145995] pstate: 2005 (nzCv daif -PAN -UAO) [4.150876] pc : xen_swiotlb_alloc_coherent+0x64/0x1e8 [4.156092] lr : dma_alloc_attrs+0xf4/0x110 [4.160359] sp : 1003b960 [4.163743] x29: 1003b960 x28: 112cfbb4 [4.169137] x27: 116ed000 x26: 1003ba90 [4.174533] x25: 10d6c350 x24: 0005 [4.179937] x23: 0002 x22: 0001f000 [4.185323] x21: x20: 8008b904b0b0 [4.190727] x19: 114d4000 x18: 14033fff [4.196113] x17: x16: [4.201509] x15: 4000 x14: 110fc000 [4.206903] x13: 114dd000 x12: 0068 [4.212299] x11: 0001 x10: [4.217693] x9 : x8 : 8008b9b05b00 [4.223089] x7 : x6 : [4.228483] x5 : 106d4c38 x4 : [4.233879] x3 : 006000c0 x2 : 1003ba90 [4.239274] x1 : 0002 x0 : [4.244671] Process swapper/0 (pid: 1, stack limit = 0x(ptrval)) [4.251456] Call trace: [4.253985] xen_swiotlb_alloc_coherent+0x64/0x1e8 [4.258857] dma_alloc_attrs+0xf4/0x110 [4.262774] dmam_alloc_attrs+0x64/0xb8 [4.266691] sil24_port_start+0x6c/0x100 [4.270704] ata_host_start.part.10+0x104/0x210 [4.275304] ata_host_activate+0x64/0x148 [4.279392] sil24_init_one+0x1ac/0x2b0 [4.283312] pci_device_probe+0xdc/0x168 [4.287313] really_probe+0x1f0/0x298 [4.291054] driver_probe_device+0x58/0x100 [4.295316] __driver_attach+0xdc/0xe0 [4.299145] bus_for_each_dev+0x74/0xc8 [4.303061] driver_attach+0x20/0x28 [4.306716] bus_add_driver+0x1b0/0x220 [4.310641] driver_register+0x60/0x110 [4.314549] __pci_register_driver+0x58/0x68 [4.318902] sil24_pci_driver_init+0x20/0x28 [4.323251] do_one_initcall+0xb8/0x348 [4.327166] kernel_init_freeable+0x3cc/0x474 [4.331606] kernel_init+0x10/0x100 [4.335171] ret_from_fork+0x10/0x1c [4.338829] Code: f941fa80 aa1503e4 aa1a03e2 aa1703e1 (f945) [4.345028] ---[ end trace 277757f27697a72b ]--- The bisector fingered the following commit: commit 356da6d0cde3323236977fce54c1f9612a742036 Author: Christoph Hellwig Date: Thu Dec 6 13:39:32 2018 -0800 dma-mapping: bypass indirect calls for dma-direct Avoid expensive indirect calls in the fast path DMA mapping operations by directly calling the dma_direct_* ops if we are using the directly mapped DMA operations. Signed-off-by: Christoph Hellwig Acked-by: Jesper Dangaard Brouer Tested-by: Jesper Dangaard Brouer Tested-by: Tony Luck Discussing with Robin, it looks like that the current wrappers in for Xen (see include/xen/arm/page-coherent.h) are not able to cope with direct calls. Those wrappers are used by swiotlb to call the underlying DMA ops of the device. They are unable to cope with NULL (aka direct-mapping), hence the NULL dereference crash. Do you have any suggestion how this should be fixed? Best regards, -- Julien Grall ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
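To spell out the failure mode: with 5.0's dma-direct bypass, a directly-mapped device has dev->dma_ops == NULL, so any wrapper that blindly makes an indirect call through the ops pointer (as the Xen helpers effectively do) dereferences NULL. A guarded version along the lines below avoids that; this is an illustration of the problem and of the direction a fix could take, not the actual kernel source.

static void *example_xen_alloc_coherent(struct device *dev, size_t size,
					dma_addr_t *handle, gfp_t gfp,
					unsigned long attrs)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	/* dma-direct: there is no ops structure to call into. */
	if (!ops)
		return dma_direct_alloc(dev, size, handle, gfp, attrs);

	return ops->alloc(dev, size, handle, gfp, attrs);
}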
Re: [RFC v5 03/17] iommu: introduce a reserved iova cookie
Hi Eric, On 01/03/16 18:27, Eric Auger wrote: diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 0e3b009..7b2bb94 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1072,6 +1072,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus, domain->ops = bus->iommu_ops; domain->type = type; + mutex_init(&domain->reserved_mutex); For consistency, the RB-tree reserved_binding_list should be initialized too: domain->reserved_binding_list = RB_ROOT; Cheers, -- Julien Grall ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
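In context, the suggested initialisation would sit next to the mutex_init() in the hunk above; a sketch assuming the field is an rb_root, as elsewhere in this series:

	domain->ops = bus->iommu_ops;
	domain->type = type;
	mutex_init(&domain->reserved_mutex);
	domain->reserved_binding_list = RB_ROOT;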