Re: [PATCH v4 00/13] MT8195 SMI support
On Tue, 14 Sep 2021 19:36:50 +0800, Yong Wu wrote:

> This patchset mainly adds SMI support for mt8195.
>
> Comparing with the previous version, add two new functions:
> a) add smi sub common
> b) add initial setting for smi-common and smi-larb.
>
> Change note:
> v4: 1) base on v5.15-rc1
>     2) In the dt-binding:
>        a. add "else mediatek,smi: false." in the yaml.
>        b. Remove mediatek,smi_sub_common. Since we have only 2 levels currently,
>           it should be smi-sub-common if that node has "mediatek,smi"; otherwise,
>           it is smi-common.
>
> [...]

Applied, thanks!

[01/13] dt-bindings: memory: mediatek: Add mt8195 smi binding
        commit: b01065eee432b3ae91a2c0aaab66c2cae2e9812d
[02/13] dt-bindings: memory: mediatek: Add mt8195 smi sub common
        commit: 599e681a31a2dfa7359b8e420a1157ed015f840b
[03/13] memory: mtk-smi: Use clk_bulk clock ops
        commit: 0e14917c57f9d8b9b7d4f41813849a29659447b3
[04/13] memory: mtk-smi: Rename smi_gen to smi_type
        commit: a5c18986f404206fdbc27f208620dc13bffb5657
[05/13] memory: mtk-smi: Adjust some code position
        commit: 534e0ad2ed4f9296a8c7ccb1a393bc4d5000dbad
[06/13] memory: mtk-smi: Add error handle for smi_probe
        commit: 30b869e77a1c626190bbbe6b4e5f5382b0102ac3
[07/13] memory: mtk-smi: Add device link for smi-sub-common
        commit: 47404757702ec8aa5c8310cdf58a267081f0ce6c
[08/13] memory: mtk-smi: Add clocks for smi-sub-common
        commit: 3e4f74e0ea5a6a6d6d825fd7afd8a10afbd1152c
[09/13] memory: mtk-smi: Use devm_platform_ioremap_resource
        commit: 912fea8bf8d854aef967c82a279ffd20be0326d7
[10/13] memory: mtk-smi: mt8195: Add smi support
        commit: cc4f9dcd9c1543394d6ee0d635551a2bd96bcacb
[11/13] memory: mtk-smi: mt8195: Add initial setting for smi-common
        commit: 431e9cab7097b5d5eb3cf2b04d4b12d272df85e0
[12/13] memory: mtk-smi: mt8195: Add initial setting for smi-larb
        commit: fe6dd2a4017d3dfbf4a530d95270a1d498a16a4c
[13/13] MAINTAINERS: Add entry for MediaTek SMI
        commit: 93403ede5aa4edeec2c63541b185d9c4fc9ae1e4

Best regards,
--
Krzysztof Kozlowski

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses
Hi Bjorn,

On 9/4/21 3:37 AM, Bjorn Helgaas wrote:

From: Bjorn Helgaas

719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault") changed
the DMA fault reason from hex to decimal. It also added "0x" prefixes to
the PCI bus/device, e.g.,

  - DMAR: [INTR-REMAP] Request device [00:00.5]
  + DMAR: [INTR-REMAP] Request device [0x00:0x00.5]

These no longer match dev_printk() and other similar messages in
dmar_match_pci_path() and dmar_acpi_insert_dev_scope().

Drop the "0x" prefixes from the bus and device addresses.

Fixes: 719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault")
Signed-off-by: Bjorn Helgaas

Thank you for this fix. I have queued it for v5.15.

Best regards,
baolu

---
 drivers/iommu/intel/dmar.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index d66f79acd14d..8647a355dad0 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1944,18 +1944,18 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
 	reason = dmar_get_fault_reason(fault_reason, &fault_type);

 	if (fault_type == INTR_REMAP)
-		pr_err("[INTR-REMAP] Request device [0x%02x:0x%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
 		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
 		       PCI_FUNC(source_id & 0xFF), addr >> 48,
 		       fault_reason, reason);
 	else if (pasid == INVALID_IOASID)
-		pr_err("[%s NO_PASID] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[%s NO_PASID] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
 		       type ? "DMA Read" : "DMA Write",
 		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
 		       PCI_FUNC(source_id & 0xFF), addr,
 		       fault_reason, reason);
 	else
-		pr_err("[%s PASID 0x%x] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[%s PASID 0x%x] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
 		       type ? "DMA Read" : "DMA Write", pasid,
 		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
 		       PCI_FUNC(source_id & 0xFF), addr,
[PATCH 1/1] iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses
From: Bjorn Helgaas

719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault") changed
the DMA fault reason from hex to decimal. It also added "0x" prefixes to
the PCI bus/device, e.g.,

  - DMAR: [INTR-REMAP] Request device [00:00.5]
  + DMAR: [INTR-REMAP] Request device [0x00:0x00.5]

These no longer match dev_printk() and other similar messages in
dmar_match_pci_path() and dmar_acpi_insert_dev_scope().

Drop the "0x" prefixes from the bus and device addresses.

Fixes: 719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault")
Signed-off-by: Bjorn Helgaas
Link: https://lore.kernel.org/r/20210903193711.483999-1-helg...@kernel.org
Signed-off-by: Lu Baolu
---
 drivers/iommu/intel/dmar.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 0ec5514c9980..b7708b93f3fa 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1942,18 +1942,18 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
 	reason = dmar_get_fault_reason(fault_reason, &fault_type);

 	if (fault_type == INTR_REMAP)
-		pr_err("[INTR-REMAP] Request device [0x%02x:0x%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
 		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
 		       PCI_FUNC(source_id & 0xFF), addr >> 48,
 		       fault_reason, reason);
 	else if (pasid == INVALID_IOASID)
-		pr_err("[%s NO_PASID] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[%s NO_PASID] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
 		       type ? "DMA Read" : "DMA Write",
 		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
 		       PCI_FUNC(source_id & 0xFF), addr,
 		       fault_reason, reason);
 	else
-		pr_err("[%s PASID 0x%x] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[%s PASID 0x%x] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
 		       type ? "DMA Read" : "DMA Write", pasid,
 		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
 		       PCI_FUNC(source_id & 0xFF), addr,
--
2.25.1
[PATCH 0/1] iommu/vt-d: A fix for v5.15-rc3
Hi Joerg,

A fix is queued for v5.15. It aims to fix:

- Drop "0x" prefix from PCI bus & device addresses

Please consider it for the iommu/fix branch.

Best regards,
Lu Baolu

Bjorn Helgaas (1):
  iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses

 drivers/iommu/intel/dmar.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--
2.25.1
[RFC 7/7] dma/idxd: Use dma-iommu PASID API instead of SVA lib
Showcase a partial usage of the new PASID DMA API: it replaces the SVA lib
API for obtaining a system PASID and setting up the addressing mode, but the
actual work submission is not included.

Signed-off-by: Jacob Pan
---
 drivers/dma/idxd/idxd.h |  4 +++-
 drivers/dma/idxd/init.c | 36
 2 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 507f5d5119f9..eaedaf3c3e3b 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include "registers.h"

 #define IDXD_DRIVER_VERSION	"1.00"
@@ -253,7 +254,8 @@ struct idxd_device {
 	struct iommu_sva *sva;
 	unsigned int pasid;
-
+	enum iommu_dma_pasid_mode spasid_mode;
+	struct iommu_domain *domain; /* For KVA mapping */
 	int num_groups;

 	u32 msix_perm_offset;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index c404a1320536..8f952a4c8909 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include "../dmaengine.h"
@@ -32,6 +33,11 @@ static bool sva = true;
 module_param(sva, bool, 0644);
 MODULE_PARM_DESC(sva, "Toggle SVA support on/off");

+static int spasid_mode = IOMMU_DMA_PASID_IOVA;
+module_param(spasid_mode, int, 0644);
+MODULE_PARM_DESC(spasid_mode,
+		 "Supervisor PASID mode (1: pass-through, 2: DMA API)");
+
 #define DRV_NAME "idxd"

 bool support_enqcmd;
@@ -519,35 +525,25 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d

 static int idxd_enable_system_pasid(struct idxd_device *idxd)
 {
-	int flags;
-	unsigned int pasid;
-	struct iommu_sva *sva;
-
-	flags = SVM_FLAG_SUPERVISOR_MODE;
-
-	sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
-	if (IS_ERR(sva)) {
-		dev_warn(&idxd->pdev->dev,
-			 "iommu sva bind failed: %ld\n", PTR_ERR(sva));
-		return PTR_ERR(sva);
-	}
+	int pasid;
+	struct iommu_domain *domain = NULL;

-	pasid = iommu_sva_get_pasid(sva);
-	if (pasid == IOMMU_PASID_INVALID) {
-		iommu_sva_unbind_device(sva);
+	pasid = iommu_dma_pasid_enable(&idxd->pdev->dev, &domain, spasid_mode);
+	if (pasid == INVALID_IOASID) {
+		dev_err(&idxd->pdev->dev, "No DMA PASID in mode %d\n", spasid_mode);
 		return -ENODEV;
 	}
-
-	idxd->sva = sva;
 	idxd->pasid = pasid;
-	dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
+	idxd->spasid_mode = spasid_mode;
+	idxd->domain = domain;
+
 	return 0;
 }

 static void idxd_disable_system_pasid(struct idxd_device *idxd)
 {
-
-	iommu_sva_unbind_device(idxd->sva);
+	/* TODO: remove sva, restore no PASID PT and PASIDE */
+	iommu_dma_pasid_disable(&idxd->pdev->dev, idxd->domain);
 	idxd->sva = NULL;
 }
--
2.25.1
[RFC 3/7] iommu/vt-d: Add DMA w/ PASID support for PA and IOVA
For physical address (PA) mode, the PASID entry for the given supervisor
PASID will be set up for HW pass-through (PT).

For IOVA mode, the supervisor PASID entry will be configured the same as
PASID 0, which is a special PASID for DMA requests w/o PASID, a.k.a.
RID2PASID. An additional IOTLB flush for the supervisor PASID is also
included.

Signed-off-by: Jacob Pan
---
 drivers/iommu/intel/iommu.c | 95 -
 include/linux/intel-iommu.h |  7 ++-
 2 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index be35284a2016..cbcfd178c16f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1631,7 +1631,11 @@ static void domain_flush_piotlb(struct intel_iommu *iommu,
 	if (domain->default_pasid)
 		qi_flush_piotlb(iommu, did, domain->default_pasid,
 				addr, npages, ih);
-
+	if (domain->s_pasid) {
+		pr_alert_ratelimited("%s: pasid %u", __func__, domain->s_pasid);
+		qi_flush_piotlb(iommu, did, domain->s_pasid,
+				addr, npages, ih);
+	}
 	if (!list_empty(&domain->devices))
 		qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
 }
@@ -5535,6 +5539,93 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
 	}
 }

+static int intel_enable_pasid_dma(struct device *dev, u32 pasid, int mode)
+{
+	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct device_domain_info *info;
+	unsigned long flags;
+	int ret = 0;
+
+	info = get_domain_info(dev);
+	if (!info)
+		return -ENODEV;
+
+	if (!dev_is_pci(dev) || !sm_supported(info->iommu))
+		return -EINVAL;
+
+	if (intel_iommu_enable_pasid(info->iommu, dev))
+		return -ENODEV;
+
+	spin_lock_irqsave(&iommu->lock, flags);
+	switch (mode) {
+	case IOMMU_DMA_PASID_BYPASS:
+		dev_dbg(dev, "%s PT mode", __func__);
+		/* As a precaution, translation request should be responded with
+		 * physical address.
+		 */
+		if (!hw_pass_through) {
+			ret = -ENODEV;
+			goto exit_unlock;
+		}
+		/* HW may use large page for ATS */
+		pci_disable_ats(pdev);
+		ret = intel_pasid_setup_pass_through(info->iommu, info->domain,
+						     dev, pasid);
+		if (ret)
+			dev_err(dev, "Failed SPASID %d BYPASS", pasid);
+		break;
+	case IOMMU_DMA_PASID_IOVA:
+		dev_dbg(dev, "%s IOVA mode", __func__);
+		/*
+		 * We could use SL but FL is preferred for consistency with VM
+		 * where vIOMMU exposes FL only cap
+		 */
+		if (!domain_use_first_level(info->domain))
+			return -EINVAL;
+		/* To be used for IOTLB flush at PASID granularity */
+		info->domain->s_pasid = pasid;
+		ret = domain_setup_first_level(info->iommu, info->domain, dev,
+					       pasid);
+		break;
+	default:
+		dev_err(dev, "Invalid PASID DMA mode %d", mode);
+		ret = -EINVAL;
+		goto exit_unlock;
+	}
+	info->pasid_mode = mode;
+exit_unlock:
+	spin_unlock_irqrestore(&iommu->lock, flags);
+
+	return ret;
+}
+
+static int intel_disable_pasid_dma(struct device *dev)
+{
+	struct device_domain_info *info;
+	int ret = 0;
+
+	info = get_domain_info(dev);
+	if (!info)
+		return -ENODEV;
+
+	if (!dev_is_pci(dev) || !sm_supported(info->iommu))
+		return -EINVAL;
+
+	if (intel_iommu_enable_pasid(info->iommu, dev))
+		return -ENODEV;
+
+	if (!dev->iommu) {
+		dev_err(dev, "No IOMMU params");
+		return -ENODEV;
+	}
+	dev_info(dev, "Tearing down DMA PASID %d", info->domain->s_pasid);
+	intel_pasid_tear_down_entry(info->iommu, info->dev,
+				    info->domain->s_pasid, false);
+
+	info->domain->s_pasid = 0;
+	return ret;
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
@@ -5573,6 +5664,8 @@ const struct iommu_ops intel_iommu_ops = {
 	.sva_get_pasid		= intel_svm_get_pasid,
 	.page_response		= intel_svm_page_response,
 #endif
+	.enable_pasid_dma	= intel_enable_pasid_dma,
+	.disable_pasid_dma	= intel_disable_pasid_dma,
 };

 static void quirk_iommu_igfx(struct pci_dev *dev)
diff --git a/include/linux/intel-iommu.h b
[RFC 5/7] iommu/vt-d: Add support for KVA PASID mode
To support KVA fast mode, the VT-d driver must support domain allocation
of the IOMMU_DOMAIN_KVA type. Since all devices in fast KVA mode share the
same kernel mapping, a single KVA domain is sufficient. This global KVA
domain contains the kernel mapping, i.e. init_mm.pgd. The programming of
the KVA domain follows the existing flow of auxiliary domain attachment.

Signed-off-by: Jacob Pan
---
 drivers/iommu/intel/iommu.c | 59 ++---
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cbcfd178c16f..0dabd5f75acf 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -293,6 +293,9 @@ static inline void context_clear_entry(struct context_entry *context)
  * 3. Each iommu mapps to this domain if successful.
  */
 static struct dmar_domain *si_domain;
+
+/* This domain is used for shared virtual addressing with CPU kernel mapping */
+static struct dmar_domain *kva_domain;
 static int hw_pass_through = 1;

 #define for_each_domain_iommu(idx, domain)	\
@@ -1989,6 +1992,10 @@ static void domain_exit(struct dmar_domain *domain)
 	/* Remove associated devices and clear attached or cached domains */
 	domain_remove_dev_info(domain);

+	/* There is no IOMMU page table for KVA */
+	if (domain->pgd == (struct dma_pte *)init_mm.pgd)
+		return;
+
 	/* destroy iovas */
 	if (domain->domain.type == IOMMU_DOMAIN_DMA)
 		iommu_put_dma_cookie(&domain->domain);
@@ -2533,6 +2540,10 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
 	int agaw, level;
 	int flags = 0;

+	if (domain->domain.type == IOMMU_DOMAIN_KVA) {
+		flags |= PASID_FLAG_SUPERVISOR_MODE;
+		goto do_setup;
+	}
 	/*
 	 * Skip top levels of page tables for iommu which has
 	 * less agaw than default. Unnecessary for PT mode.
@@ -2554,7 +2565,7 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
 	if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
 		flags |= PASID_FLAG_PAGE_SNOOP;
-
+do_setup:
 	return intel_pasid_setup_first_level(iommu, dev, (pgd_t *)pgd, pasid,
 					     domain->iommu_did[iommu->seq_id],
 					     flags);
@@ -2713,7 +2724,28 @@ static int iommu_domain_identity_map(struct dmar_domain *domain,
 }

 static int md_domain_init(struct dmar_domain *domain, int guest_width);
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static int __init kva_domain_init(void)
+{
+	struct dmar_domain *dmar_domain;
+	struct iommu_domain *domain;
+	kva_domain = alloc_domain(0);
+	if (!kva_domain) {
+		pr_err("Can't allocate KVA domain\n");
+		return -EFAULT;
+	}
+	kva_domain->pgd = (struct dma_pte *)init_mm.pgd;
+	domain = &kva_domain->domain;
+	domain->type = IOMMU_DOMAIN_KVA;
+	/* REVISIT: may not need this other than sanity check */
+	domain->geometry.aperture_start = 0;
+	domain->geometry.aperture_end =
+		__DOMAIN_MAX_ADDR(dmar_domain->gaw);
+	domain->geometry.force_aperture = true;
+	return 0;
+}
+#endif
 static int __init si_domain_init(int hw)
 {
 	struct dmar_rmrr_unit *rmrr;
@@ -3363,6 +3395,11 @@ static int __init init_dmars(void)
 		down_write(&dmar_global_lock);
 		if (ret)
 			goto free_iommu;
+		/* For in-kernel DMA with PASID in SVA */
+		ret = kva_domain_init();
+		if (ret)
+			goto free_iommu;
+
 	}
 #endif
 		ret = dmar_set_interrupt(iommu);
@@ -4558,6 +4595,9 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 		domain->geometry.force_aperture = true;

 		return domain;
+	case IOMMU_DOMAIN_KVA:
+		/* Use a global domain for shared KVA mapping */
+		return &kva_domain->domain;
 	case IOMMU_DOMAIN_IDENTITY:
 		return &si_domain->domain;
 	default:
@@ -4583,7 +4623,8 @@ is_aux_domain(struct device *dev, struct iommu_domain *domain)
 	struct device_domain_info *info = get_domain_info(dev);

 	return info && info->auxd_enabled &&
-			domain->type == IOMMU_DOMAIN_UNMANAGED;
+			(domain->type == IOMMU_DOMAIN_UNMANAGED ||
+			 domain->type == IOMMU_DOMAIN_KVA);
 }

 static inline struct subdev_domain_info *
@@ -4693,8 +4734,8 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
 	if (ret)
 		goto attach_failed;

-	/* Setup the PASID entry for mediated devices: */
-	if (domain_use_first_level(domain))
+	/* Setup the PASID entry for devices do DMA
[RFC 6/7] iommu: Add KVA map API
This patch adds a KVA map API. It enforces KVA address range checking and
other potential sanity checks. Currently, only the direct map range is
checked.

For trusted devices, this API returns immediately after the above sanity
check. For untrusted devices, this API serves as a simple wrapper around
the IOMMU map/unmap APIs.

OPEN: Alignment at the minimum page size is required, not as rich and
flexible as the DMA APIs.

Signed-off-by: Jacob Pan
---
 drivers/iommu/iommu.c | 57 +++
 include/linux/iommu.h |  5
 2 files changed, 62 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index acfdcd7ebd6a..45ba55941209 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2490,6 +2490,63 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_map);

+/*
+ * REVISIT: This might not be sufficient. Could also check permission match,
+ * exclude kernel text, etc.
+ */
+static inline bool is_kernel_direct_map(unsigned long start, phys_addr_t size)
+{
+	return (start >= PAGE_OFFSET) && ((start + size) <= VMALLOC_START);
+}
+
+/**
+ * @brief Map kernel virtual address for DMA remap. DMA request with
+ * domain's default PASID will target kernel virtual address space.
+ *
+ * @param domain	Domain contains the PASID
+ * @param page	Kernel virtual address
+ * @param size	Size to map
+ * @param prot	Permissions
+ * @return int	0 on success or error code
+ */
+int iommu_map_kva(struct iommu_domain *domain, struct page *page,
+		  size_t size, int prot)
+{
+	phys_addr_t phys = page_to_phys(page);
+	void *kva = phys_to_virt(phys);
+
+	/*
+	 * TODO: Limit DMA to kernel direct mapping only, avoid dynamic range
+	 * until we have mmu_notifier for making IOTLB coherent with CPU.
+	 */
+	if (!is_kernel_direct_map((unsigned long)kva, size))
+		return -EINVAL;
+	/* KVA domain type indicates shared CPU page table, skip building
+	 * IOMMU page tables. This is the fast mode where only sanity check
+	 * is performed.
+	 */
+	if (domain->type == IOMMU_DOMAIN_KVA)
+		return 0;
+
+	return iommu_map(domain, (unsigned long)kva, phys, size, prot);
+}
+EXPORT_SYMBOL_GPL(iommu_map_kva);
+
+int iommu_unmap_kva(struct iommu_domain *domain, void *kva,
+		    size_t size)
+{
+	if (!is_kernel_direct_map((unsigned long)kva, size))
+		return -EINVAL;
+
+	if (domain->type == IOMMU_DOMAIN_KVA) {
+		pr_debug_ratelimited("unmap kva skipped %llx", (u64)kva);
+		return 0;
+	}
+	/* REVISIT: do we need a fast version? */
+	return iommu_unmap(domain, (unsigned long)kva, size);
+}
+EXPORT_SYMBOL_GPL(iommu_unmap_kva);
+
 int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index cd8225f6bc23..c0fac050ca57 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -427,6 +427,11 @@ extern size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 extern size_t iommu_map_sg_atomic(struct iommu_domain *domain,
 				  unsigned long iova, struct scatterlist *sg,
 				  unsigned int nents, int prot);
+extern int iommu_map_kva(struct iommu_domain *domain,
+			 struct page *page, size_t size, int prot);
+extern int iommu_unmap_kva(struct iommu_domain *domain,
+			   void *kva, size_t size);
+
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova);
 extern void iommu_set_fault_handler(struct iommu_domain *domain,
 			iommu_fault_handler_t handler, void *token);
--
2.25.1
[RFC 1/7] ioasid: reserve special PASID for in-kernel DMA
IOASIDs associated with user processes are subject to resource
restrictions. However, in-kernel usages can target only one global kernel
virtual address mapping. Reserve a special IOASID for the devices that
perform DMA requests with PASID. This global PASID is excluded from the
IOASID allocator.

Signed-off-by: Jacob Pan
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
 drivers/iommu/ioasid.c                          | 2 ++
 drivers/iommu/iommu-sva-lib.c                   | 1 +
 include/linux/ioasid.h                          | 4
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251cab61f3..c5fb89bd6229 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -329,7 +329,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
 		return ERR_PTR(-ENOMEM);

 	/* Allocate a PASID for this mm if necessary */
-	ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) - 1);
+	ret = iommu_sva_alloc_pasid(mm, IOASID_ALLOC_START, (1U << master->ssid_bits) - 1);
 	if (ret)
 		goto err_free_bond;

diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 50ee27bbd04e..89c6132bf1ec 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -317,6 +317,8 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
 	data->private = private;
 	refcount_set(&data->refs, 1);

+	if (min < IOASID_ALLOC_BASE)
+		min = IOASID_ALLOC_BASE;
 	/*
 	 * Custom allocator needs allocator data to perform platform specific
 	 * operations.
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index bd41405d34e9..4f56843517e5 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -28,6 +28,7 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
 	int ret = 0;
 	ioasid_t pasid;

+	/* TODO: no need to check!! reserved range is checked in ioasid_alloc() */
 	if (min == INVALID_IOASID || max == INVALID_IOASID ||
 	    min == 0 || max < min)
 		return -EINVAL;
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index e9dacd4b9f6b..4d435cbd48b8 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -6,6 +6,10 @@
 #include

 #define INVALID_IOASID ((ioasid_t)-1)
+#define IOASID_DMA_NO_PASID	0 /* For DMA request w/o PASID */
+#define IOASID_DMA_PASID	1 /* For in-kernel DMA w/ PASID */
+#define IOASID_ALLOC_BASE	2 /* Start of the allocation */
+
 typedef unsigned int ioasid_t;
 typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);
 typedef void (*ioasid_free_fn_t)(ioasid_t ioasid, void *data);
--
2.25.1
[RFC 4/7] dma-iommu: Add support for DMA w/ PASID in KVA
Sharing virtual addresses between the DMA and the CPU has many advantages:
it simplifies the programming model and enhances DMA security over physical
addresses. This patch adds KVA support for the DMA IOMMU API. Strict and
fast sub-modes are supported transparently based on the device's
trustworthiness.

The strict mode is intended for untrusted devices. The KVA mapping is
established on demand by a separate IOMMU page table referenced by a
supervisor PASID. An aux domain is allocated per device to carry the
supervisor PASID.

The fast mode is for trusted devices, where the KVA mapping is shared with
the CPU via the kernel page table. A vendor IOMMU driver can choose to use
a global KVA domain for all devices in fast KVA mode. The follow-up patches
will introduce the iommu_map_kva() API, where KVA domains will be used. The
performance advantage of the fast mode rests upon the fact that there is no
need to build/tear down a separate IOMMU page table for each DMA buffer.

Though multiple PASIDs and domains per device could be supported, the lack
of compelling usages leads to a single PASID option for now.

Signed-off-by: Jacob Pan
---
 drivers/iommu/dma-iommu.c | 75 ++-
 drivers/iommu/iommu.c     |  6
 include/linux/dma-iommu.h |  6 ++--
 include/linux/iommu.h     |  6
 4 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 490731659def..5b25dbcef8ee 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -174,9 +174,15 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL(iommu_put_dma_cookie);

+static bool dev_is_untrusted(struct device *dev)
+{
+	return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
+}
+
 /**
  * iommu_dma_pasid_enable --Enable device DMA request with PASID
  * @dev:	Device to be enabled
+ * @domain:	IOMMU domain returned for KVA mode mapping
  * @mode:	DMA addressing mode
  *
  * The following mutually exclusive DMA addressing modes are supported:
  *
  * 1. Physical address or bypass mode. This is similar to DMA direct where
  * trusted devices can DMA pass-through IOMMU.
  *
  * 2. IOVA mode. This abides DMA APIs. Once set up, callers can use DMA API
  * as-is. DMA request w/ and w/o PASID will be mapped by the same page
  * tables. PCI requester ID (RID) and RID+PASID will be pointed to the same
  * PGD. i.e. the default DMA domain will be used by both RID and RID+PASID.
  *
+ * 3. KVA mode. DMA address == CPU virtual address. There are two sub-modes:
+ * strict mode and fast mode.
+ * The strict mode is intended for the untrusted devices, where DMA address
+ * is identical to KVA but restricted per device on the supervisor PASID.
+ * The fast mode is for trusted devices where its DMA is only restricted
+ * by the kernel page tables used by the CPU. iommu_domains with UNMANAGED
+ * and KVA types are returned respectively. They are used by iommu_map_kva()
+ *
+ * The performance advantage of the fast mode (based on whether the device
+ * is trusted or user allowed), relies on the fact that there is no need
+ * to build/teardown a separate IOMMU page tables for KVA mapping.
+ *
+ * Though multiple PASIDs and domains per device can be supported but the
+ * lack of compelling usages lead to a single PASID option for now.
+ *
  * @return the supervisor PASID to be used for DMA. Or INVALID_IOASID upon
  * failure.
  */
-int iommu_dma_pasid_enable(struct device *dev,
+int iommu_dma_pasid_enable(struct device *dev, struct iommu_domain **domain,
 			   enum iommu_dma_pasid_mode mode)
 {
 	int pasid = INVALID_IOASID;
@@ -201,8 +222,37 @@ int iommu_dma_pasid_enable(struct device *dev,
 	/* TODO: only allow a single mode each time, track PASID DMA enabling
 	 * status per device. Perhaps add a flag in struct device.dev_iommu.
 	 */
+	if (mode == IOMMU_DMA_PASID_KVA) {
+		if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX)) {
+			dev_err(dev, "No aux domain support");
+			goto exit;
+		};
+		/*
+		 * Untrusted devices gets an unmanaged domain which will be
+		 * returned to the caller for strict IOMMU API KVA mapping.
+		 * Trusted device gets a special KVA domain with init_mm.pgd
+		 * assigned.
+		 */
+		if (dev_is_untrusted(dev))
+			dom = iommu_domain_alloc(dev->bus);
+		else
+			dom = iommu_domain_alloc_kva(dev->bus);
+		if (!dom) {
+			dev_err(dev, "No KVA iommu domain allocated");
+			goto exit_disable_aux;
+		}
+		if (iommu_aux_attach_device(dom, dev)) {
+			dev_err(dev, "Failed to attach KVA iommu domain");
+			goto exit_free_domain;
+		};
+		pasid = iommu_aux_get_pasid(dom, dev);
+
+		dev_dbg(dev, "KVA mode pasid %d", pasi
[RFC 2/7] dma-iommu: Add API for DMA request with PASID
[RFC 2/7] dma-iommu: Add API for DMA request with PASID

The DMA API is the standard way for device drivers to perform in-kernel
DMA. When security is on, isolation is enforced by the IOMMU at
requester ID (RID) granularity. With the introduction of the Process
Address Space ID (PASID), DMA security is enforced at RID+PASID
granularity. On some modern platforms, DMA with PASID is the only
option available, e.g. the shared work queues in Intel Data Streaming
Accelerators (DSA).

To support DMA with PASID while maintaining the ubiquitous use of the
DMA API, this patch introduces a new IOMMU DMA API that supports two
addressing modes:

1. Physical address or bypass mode. This is similar to DMA direct mode,
   where trusted devices can DMA through the IOMMU untranslated.

2. IOVA mode, which abides by the DMA API. Once set up, callers can use
   the DMA API as-is. DMA requests with and without PASID will be mapped
   by the same page tables, i.e. the default DMA domain will be used by
   both RID and RID+PASID.

Signed-off-by: Jacob Pan
---
 drivers/iommu/dma-iommu.c | 54 +++
 include/linux/dma-iommu.h | 12 +
 include/linux/iommu.h     |  2 ++
 3 files changed, 68 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..490731659def 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -174,6 +174,60 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL(iommu_put_dma_cookie);
 
+/**
+ * iommu_dma_pasid_enable - Enable device DMA request with PASID
+ * @dev: Device to be enabled
+ * @mode: DMA addressing mode
+ *
+ * The following mutually exclusive DMA addressing modes are supported:
+ *
+ * 1. Physical address or bypass mode. This is similar to DMA direct where
+ *    trusted devices can DMA pass-through IOMMU.
+ *
+ * 2. IOVA mode. This abides by the DMA API. Once set up, callers can use
+ *    the DMA API as-is. DMA requests with and without PASID will be mapped
+ *    by the same page tables. PCI requester ID (RID) and RID+PASID will
+ *    point to the same PGD, i.e. the default DMA domain will be used by
+ *    both RID and RID+PASID.
+ *
+ * Return: the supervisor PASID to be used for DMA, or INVALID_IOASID upon
+ * failure.
+ */
+int iommu_dma_pasid_enable(struct device *dev,
+			   enum iommu_dma_pasid_mode mode)
+{
+	int pasid = INVALID_IOASID;
+	struct iommu_domain *dom = NULL;
+
+	/*
+	 * TODO: only allow a single mode each time, track PASID DMA enabling
+	 * status per device. Perhaps add a flag in struct device.dev_iommu.
+	 */
+
+	/* Call vendor drivers to handle IOVA, bypass mode */
+	dom = iommu_get_domain_for_dev(dev);
+	if (dom->ops->enable_pasid_dma(dev, IOASID_DMA_PASID, mode)) {
+		dev_dbg(dev, "Failed to enable DMA pasid in mode %d", mode);
+		goto exit;
+	}
+	pasid = IOASID_DMA_PASID;
+
+	dev_dbg(dev, "Enable DMA pasid %d in mode %d", pasid, mode);
+exit:
+	return pasid;
+}
+EXPORT_SYMBOL(iommu_dma_pasid_enable);
+
+int iommu_dma_pasid_disable(struct device *dev)
+{
+	struct iommu_domain *dom;
+
+	/* Call vendor iommu ops to clean up supervisor PASID context */
+	dom = iommu_get_domain_for_dev(dev);
+
+	return dom->ops->disable_pasid_dma(dev);
+}
+EXPORT_SYMBOL(iommu_dma_pasid_disable);
+
 /**
  * iommu_dma_get_resv_regions - Reserved region driver helper
  * @dev: Device from iommu_get_resv_regions()
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 6e75a2d689b4..3c1555e0fd51 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -17,6 +17,18 @@ int iommu_get_dma_cookie(struct iommu_domain *domain);
 int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
 void iommu_put_dma_cookie(struct iommu_domain *domain);
 
+enum iommu_dma_pasid_mode {
+	/* Pass-through mode, use physical address */
+	IOMMU_DMA_PASID_BYPASS = 1,
+	/* Compatible with DMA APIs, same mapping as DMA w/o PASID */
+	IOMMU_DMA_PASID_IOVA,
+};
+
+/*
+ * For devices that can do DMA request with PASID, set up the system
+ * based on the address mode selected.
+ */
+int iommu_dma_pasid_enable(struct device *dev,
+			   enum iommu_dma_pasid_mode mode);
+int iommu_dma_pasid_disable(struct device *dev);
+
 /* Setup call for arch DMA mapping code */
 void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..610cbfd03e6b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -283,6 +283,8 @@ struct iommu_ops {
 	int (*def_domain_type)(struct device *dev);
 
+	int (*enable_pasid_dma)(struct device *dev, u32 pasid, int mode);
+	int (*disable_pasid_dma)(struct device *dev);
 	unsigned long pgsize_bitmap;
 	struct module *owner;
 };
-- 
2.25.1
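[Editorial note] For illustration, the control flow of the proposed enable path can be reduced to a small user-space sketch. The device/ops machinery is replaced by a stand-in callback, and `IOASID_DMA_PASID`'s value here is arbitrary; only the mode names and the INVALID_IOASID-on-failure convention are taken from the patch:

```c
#include <assert.h>

#define INVALID_IOASID   (-1)
#define IOASID_DMA_PASID 100	/* arbitrary stand-in for the reserved kernel DMA PASID */

enum iommu_dma_pasid_mode {
	IOMMU_DMA_PASID_BYPASS = 1,	/* pass-through, use physical address */
	IOMMU_DMA_PASID_IOVA,		/* DMA-API compatible IOVA mapping */
};

/* Stand-in for the vendor driver's enable_pasid_dma() iommu op. */
static int fake_enable_pasid_dma(unsigned int pasid, int mode)
{
	(void)pasid;
	/* Pretend this particular hardware only supports IOVA mode. */
	return mode == IOMMU_DMA_PASID_IOVA ? 0 : -1;
}

/* Mirrors iommu_dma_pasid_enable(): ask the vendor driver, then either
 * hand back the reserved PASID or INVALID_IOASID on failure. */
int dma_pasid_enable(enum iommu_dma_pasid_mode mode)
{
	if (fake_enable_pasid_dma(IOASID_DMA_PASID, mode))
		return INVALID_IOASID;
	return IOASID_DMA_PASID;
}
```

A driver would then submit work using the returned PASID; the same page tables back both RID and RID+PASID traffic in IOVA mode.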
[RFC 0/7] Support in-kernel DMA with PASID and SVA
Hi Joerg/Jason/Christoph et al.,

The current in-kernel supervisor PASID support is based on the SVM/SVA
machinery in sva-lib. Kernel SVA is achieved by extending a special flag
to indicate that the binding of the device and a page table should be
performed on init_mm instead of the mm of the current process. Page
requests and other differences between user and kernel SVA are handled
as special cases.

This unrestricted binding with the kernel page table is being challenged
for security and for the convention that in-kernel DMA must be
compatible with the DMA API.
(https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
There is also a lack of IOTLB synchronization upon kernel page table
updates.

This patchset tries to address these concerns by providing an explicit,
DMA API compatible model while continuing to support in-kernel use of
DMA requests with PASID. Specifically, the following DMA-IOMMU APIs are
introduced:

	int iommu_dma_pasid_enable/disable(struct device *dev,
					   struct iommu_domain **domain,
					   enum iommu_dma_pasid_mode mode);
	int iommu_map/unmap_kva(struct iommu_domain *domain,
				void *cpu_addr, size_t size, int prot);

The following three addressing modes are supported, with example API
usage by device drivers:

1. Physical address (bypass) mode. Similar to DMA direct, where trusted
   devices can DMA pass through the IOMMU on a per-PASID basis.
   Example:
	pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
	/* Use the returned PASID and PA for work submission */

2. IOVA mode. DMA API compatible. Maps a supervisor PASID the same way
   as the PCI requester ID (RID).
   Example:
	pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);
	/* Use the PASID and DMA API allocated IOVA for work submission */

3. KVA mode. New kva map/unmap APIs. Supports fast and strict sub-modes
   transparently based on device trustworthiness.
   Example:
	pasid = iommu_dma_pasid_enable(dev, &domain, IOMMU_DMA_PASID_KVA);
	iommu_map_kva(domain, &buf, size, prot);
	/* Use the returned PASID and KVA to submit work */

   Where:
   - Fast mode: shared CPU page tables, for trusted devices only.
   - Strict mode: an IOMMU domain is returned for the untrusted device
     to replicate the KVA-PA mapping in IOMMU page tables.

On a per-device basis, DMA address and performance modes are enabled by
the device drivers. Platform information such as trustability and user
command line input could also be taken into consideration (not
implemented in this RFC).

This RFC is intended to communicate the API directions. Little testing
has been done outside IDXD and DMA engine tests. For PA and IOVA modes,
the implementation is straightforward and tested with the Intel IDXD
driver. But several opens remain in KVA fast mode, thus it is not
tested:

1. Lack of IOTLB synchronization: kernel direct map aliases can be
   updated as a result of module loading/eBPF load. Add a kernel MMU
   notifier?
2. The use of the auxiliary domain for KVA map: will the aux domain stay
   in the long term? Is there another way to represent sub-device
   granularity isolation?
3. Is limiting KVA sharing to the direct map range reasonable and
   practical for all architectures?

Many thanks to Ashok Raj, Kevin Tian, and Baolu who provided feedback
and many ideas in this set.

Thanks,
Jacob

Jacob Pan (7):
  ioasid: reserve special PASID for in-kernel DMA
  dma-iommu: Add API for DMA request with PASID
  iommu/vt-d: Add DMA w/ PASID support for PA and IOVA
  dma-iommu: Add support for DMA w/ PASID in KVA
  iommu/vt-d: Add support for KVA PASID mode
  iommu: Add KVA map API
  dma/idxd: Use dma-iommu PASID API instead of SVA lib

 drivers/dma/idxd/idxd.h                       |   4 +-
 drivers/dma/idxd/init.c                       |  36 ++--
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
 drivers/iommu/dma-iommu.c                     | 123 +-
 drivers/iommu/intel/iommu.c                   | 154 +-
 drivers/iommu/ioasid.c                        |   2 +
 drivers/iommu/iommu-sva-lib.c                 |   1 +
 drivers/iommu/iommu.c                         |  63 +++
 include/linux/dma-iommu.h                     |  14 ++
 include/linux/intel-iommu.h                   |   7 +-
 include/linux/ioasid.h                        |   4 +
 include/linux/iommu.h                         |  13 ++
 12 files changed, 390 insertions(+), 33 deletions(-)

-- 
2.25.1

___ iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
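[Editorial note] The trust-based fast/strict sub-mode selection described for KVA mode can be sketched in a few lines of user-space C. The names here are illustrative (`untrusted` stands in for a flag in the spirit of pci_dev->untrusted), not the cover letter's actual implementation:

```c
#include <assert.h>
#include <stdbool.h>

enum kva_submode { KVA_FAST, KVA_STRICT };

/*
 * Fast mode shares the CPU page tables and is reserved for trusted
 * devices; strict mode gives an untrusted device its own IOMMU domain
 * that replicates the KVA->PA mapping.
 */
enum kva_submode kva_pick_submode(bool untrusted)
{
	return untrusted ? KVA_STRICT : KVA_FAST;
}

/* Only strict mode hands a domain back for iommu_map_kva()-style calls. */
bool kva_needs_domain(enum kva_submode m)
{
	return m == KVA_STRICT;
}
```

This is the "transparent" part of the proposal: the caller asks for KVA mode once, and the sub-mode is chosen internally from platform trust information.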
Re: [RFC 04/20] iommu: Add iommu_device_get_info interface
On Wed, Sep 22, 2021 at 10:31:47AM +0800, Lu Baolu wrote:
> Hi Jason,
>
> On 9/22/21 12:19 AM, Jason Gunthorpe wrote:
>> On Sun, Sep 19, 2021 at 02:38:32PM +0800, Liu Yi L wrote:
>>> From: Lu Baolu
>>>
>>> This provides an interface for upper layers to get the per-device
>>> iommu attributes.
>>>
>>>     int iommu_device_get_info(struct device *dev,
>>>                               enum iommu_devattr attr, void *data);
>>
>> Can't we use properly typed ops and functions here instead of a
>> void *data?
>>
>>   get_snoop()
>>   get_page_size()
>>   get_addr_width()
>
> Yeah! Above are more friendly to the upper layer callers.

The other option would be a struct with all the attributes. Still type
safe, but not as many methods. It'll require a little boilerplate in
the callers, though.
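[Editorial note] The "struct with all the attributes" alternative floated in this reply could look roughly like the following. The struct layout and function names are hypothetical, invented here for illustration; only the three attributes come from the thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One typed struct instead of a void *data per attribute. */
struct iommu_dev_attrs {
	bool     force_snoop;	/* IOMMU enforces cache coherency */
	uint32_t addr_width;	/* DMA address width in bits */
	uint64_t pgsize_bitmap;	/* supported IOMMU page sizes */
};

/* A driver fills the whole struct once; callers read fields type-safely. */
void fake_get_dev_attrs(struct iommu_dev_attrs *a)
{
	a->force_snoop = true;
	a->addr_width = 48;
	a->pgsize_bitmap = (1ULL << 12) | (1ULL << 21) | (1ULL << 30);
}

/* The "little boilerplate" mentioned above: callers still need a few
 * lines to fetch the struct before reading one field. */
uint32_t fake_addr_width(void)
{
	struct iommu_dev_attrs a;

	fake_get_dev_attrs(&a);
	return a.addr_width;
}

bool fake_supports_4k(void)
{
	struct iommu_dev_attrs a;

	fake_get_dev_attrs(&a);
	return (a.pgsize_bitmap & (1ULL << 12)) != 0;
}
```

Compared with per-attribute ops (get_snoop(), get_addr_width(), ...), this trades a larger single call for fewer methods in the ops table.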
RE: [RFC 16/20] vfio/type1: Export symbols for dma [un]map code sharing
> From: Jason Gunthorpe
> Sent: Wednesday, September 22, 2021 2:15 AM
>
> On Sun, Sep 19, 2021 at 02:38:44PM +0800, Liu Yi L wrote:
> > [HACK. will fix in v2]
> >
> > There are two options to implement vfio type1v2 mapping semantics in
> > /dev/iommu.
> >
> > One is to duplicate the related code from vfio as the starting point,
> > and then merge with vfio type1 at a later time. However,
> > vfio_iommu_type1.c has over 3000 LOC with ~80% related to dma
> > management logic, including:
>
> I can't really see a way forward like this. I think some scheme to
> move the vfio datastructure is going to be necessary.
>
> > - the dma map/unmap metadata management
> > - page pinning, and related accounting
> > - iova range reporting
> > - dirty bitmap retrieving
> > - dynamic vaddr update, etc.
>
> All of this needs to be part of the iommufd anyhow..

yes

> > The alternative is to consolidate type1v2 logic in /dev/iommu
> > immediately, which requires converting vfio_iommu_type1 to be a shim
> > driver.
>
> Another choice is that the datastructure could move and the two
> drivers could share its code and continue to exist more independently

Where to put the shared code? Btw, this is one major open that I plan
to discuss at LPC. 😊
RE: [RFC 15/20] vfio/pci: Add VFIO_DEVICE_[DE]ATTACH_IOASID
> From: Jason Gunthorpe
> Sent: Wednesday, September 22, 2021 2:04 AM
>
> On Sun, Sep 19, 2021 at 02:38:43PM +0800, Liu Yi L wrote:
> > This patch adds an interface for userspace to attach a device to a
> > specified IOASID.
> >
> > Note:
> > One device can only be attached to one IOASID in this version. This
> > is on par with what vfio provides today. In the future this
> > restriction can be relaxed when multiple I/O address spaces are
> > supported per device
>
> ?? In VFIO the container is the IOS and the container can be shared
> with multiple devices. This needs to start at about the same
> functionality.

A device can only be attached to one container; one container can be
shared by multiple devices. A device can only be attached to one
IOASID; one IOASID can be shared by multiple devices. It does start at
the same functionality.

> > +	} else if (cmd == VFIO_DEVICE_ATTACH_IOASID) {
>
> This should be in the core code, right? There is nothing PCI specific
> here.

But if you insist on a pci-wrapper attach function, we still need
something here (e.g. with an .attach_ioasid() callback)?
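[Editorial note] The attachment rule discussed here (one IOASID per device, one IOASID shared by many devices, on par with vfio's one-container-per-device rule) can be modelled as a toy bookkeeping table. The array and function names are invented for this sketch:

```c
#include <assert.h>

#define MAX_DEVS 8

/* attached[i] holds the IOASID device i is attached to; 0 = detached. */
static int attached[MAX_DEVS];

/* A device may attach to at most one IOASID at a time. */
int device_attach_ioasid(int dev, int ioasid)
{
	if (dev < 0 || dev >= MAX_DEVS || ioasid <= 0)
		return -1;
	if (attached[dev])	/* already attached somewhere: reject */
		return -1;
	attached[dev] = ioasid;	/* many devices may share one ioasid */
	return 0;
}

int device_detach_ioasid(int dev)
{
	if (dev < 0 || dev >= MAX_DEVS || !attached[dev])
		return -1;
	attached[dev] = 0;
	return 0;
}
```

Sharing is expressed simply by two devices recording the same IOASID; the restriction lives entirely on the device side.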
RE: [RFC 14/20] iommu/iommufd: Add iommufd_device_[de]attach_ioasid()
> From: Jason Gunthorpe
> Sent: Wednesday, September 22, 2021 2:02 AM
>
> > +static int ioas_check_device_compatibility(struct iommufd_ioas *ioas,
> > +					   struct device *dev)
> > +{
> > +	bool snoop = false;
> > +	u32 addr_width;
> > +	int ret;
> > +
> > +	/*
> > +	 * currently we only support I/O page table with iommu
> > +	 * enforce-snoop format. Attaching a device which doesn't
> > +	 * support this format in its upstreaming iommu is rejected.
> > +	 */
> > +	ret = iommu_device_get_info(dev, IOMMU_DEV_INFO_FORCE_SNOOP, &snoop);
> > +	if (ret || !snoop)
> > +		return -EINVAL;
> > +
> > +	ret = iommu_device_get_info(dev, IOMMU_DEV_INFO_ADDR_WIDTH, &addr_width);
> > +	if (ret || addr_width < ioas->addr_width)
> > +		return -EINVAL;
> > +
> > +	/* TODO: also need to check permitted iova ranges and pgsize bitmap */
> > +
> > +	return 0;
> > +}
>
> This seems kind of weird..
>
> I expect the iommufd to hold a SW copy of the IO page table and each
> time a new domain is to be created it should push the SW copy into the
> domain. If the domain cannot support it then the domain driver should
> naturally fail a request.
>
> When the user changes the IO page table the SW copy is updated then
> all of the domains are updated too - again if any domain cannot
> support the change then it fails and the change is rolled back.
>
> It seems like this is a side effect of roughly hacking in the vfio
> code?

Actually this was one open we closed in the previous design proposal,
but it looks like you have a different thought now.

vfio maintains one ioas per container. Devices in the container can be
attached to different domains (e.g. due to snoop format). Every time
the ioas is updated, every attached domain is updated in accordance.

You recommended a one-ioas-one-domain model instead, i.e. any device
with a format incompatible with the one currently used in the ioas has
to be attached to a new ioas, even if the two ioas's have the same
mapping. This leads to a compatibility check at attach time.

Now do you want to return to the vfio model?

> > +	/*
> > +	 * Each ioas is backed by an iommu domain, which is allocated
> > +	 * when the ioas is attached for the first time and then shared
> > +	 * by following devices.
> > +	 */
> > +	if (list_empty(&ioas->device_list)) {
>
> Seems strange, what if the devices are forced to have different
> domains? We don't want to model that in the SW layer..

This is due to the background above.

> > +	/* Install the I/O page table to the iommu for this device */
> > +	ret = iommu_attach_device(domain, idev->dev);
> > +	if (ret)
> > +		goto out_domain;
>
> This is where things start to get confusing when you talk about PASID
> as the above call needs to be some PASID centric API.

Yes, for pasid a new API (e.g. iommu_attach_device_pasid()) will be
added. But here we only talk about the physical device, and
iommu_attach_device() is only for the physical device.

> > @@ -27,6 +28,16 @@ struct iommufd_device *
> >  iommufd_bind_device(int fd, struct device *dev, u64 dev_cookie);
> >  void iommufd_unbind_device(struct iommufd_device *idev);
> >
> > +int iommufd_device_attach_ioasid(struct iommufd_device *idev, int ioasid);
> > +void iommufd_device_detach_ioasid(struct iommufd_device *idev, int ioasid);
> > +
> > +static inline int
> > +__pci_iommufd_device_attach_ioasid(struct pci_dev *pdev,
> > +				   struct iommufd_device *idev, int ioasid)
> > +{
> > +	return iommufd_device_attach_ioasid(idev, ioasid);
> > +}
>
> If this is taking in the iommufd_device then there isn't a logical
> place to signal the PCIness

Can you elaborate?

> But, I think the API should at least have a kdoc that this is
> capturing the entire device and specify that for PCI this means all
> TLPs with the RID.

Yes, this should be documented.
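[Editorial note] The attach-time compatibility check quoted in this thread reduces to two comparisons. Below is a user-space reduction of that logic; the struct and wrapper are invented here, while the snoop and addr_width rules follow the quoted `ioas_check_device_compatibility()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-device attributes, queried at attach time. */
struct dev_caps {
	bool     enforce_snoop;
	uint32_t addr_width;
};

/*
 * Mirrors the quoted check: reject devices whose IOMMU cannot enforce
 * snoop, or whose address width is narrower than what the IOAS assumes.
 * (The real code would also check iova ranges and the pgsize bitmap.)
 */
int ioas_check_compat(const struct dev_caps *dev, uint32_t ioas_addr_width)
{
	if (!dev->enforce_snoop)
		return -1;
	if (dev->addr_width < ioas_addr_width)
		return -1;
	return 0;
}

/* Convenience wrapper so the check is easy to exercise. */
int compat_ok(bool snoop, uint32_t dev_width, uint32_t ioas_width)
{
	struct dev_caps c = { snoop, dev_width };

	return ioas_check_compat(&c, ioas_width);
}
```

In the one-ioas-one-domain model discussed above, a device failing this check would have to be attached to a different ioas rather than forcing a new domain under the same one.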
RE: [RFC 12/20] iommu/iommufd: Add IOMMU_CHECK_EXTENSION
> From: Jason Gunthorpe
> Sent: Wednesday, September 22, 2021 1:47 AM
>
> On Sun, Sep 19, 2021 at 02:38:40PM +0800, Liu Yi L wrote:
> > As aforementioned, userspace should check extensions for what formats
> > can be specified when allocating an IOASID. This patch adds such an
> > interface for userspace. In this RFC, iommufd reports EXT_MAP_TYPE1V2
> > support; no-snoop is not supported yet.
> >
> > Signed-off-by: Liu Yi L
> >  drivers/iommu/iommufd/iommufd.c |  7 +++
> >  include/uapi/linux/iommu.h      | 27 +++
> >  2 files changed, 34 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c
> > index 4839f128b24a..e45d76359e34 100644
> > +++ b/drivers/iommu/iommufd/iommufd.c
> > @@ -306,6 +306,13 @@ static long iommufd_fops_unl_ioctl(struct file *filep,
> >  		return ret;
> >
> >  	switch (cmd) {
> > +	case IOMMU_CHECK_EXTENSION:
> > +		switch (arg) {
> > +		case EXT_MAP_TYPE1V2:
> > +			return 1;
> > +		default:
> > +			return 0;
> > +		}
> >  	case IOMMU_DEVICE_GET_INFO:
> >  		ret = iommufd_get_device_info(ictx, arg);
> >  		break;
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 5cbd300eb0ee..49731be71213 100644
> > +++ b/include/uapi/linux/iommu.h
> > @@ -14,6 +14,33 @@
> >  #define IOMMU_TYPE	(';')
> >  #define IOMMU_BASE	100
> >
> > +/*
> > + * IOMMU_CHECK_EXTENSION - _IO(IOMMU_TYPE, IOMMU_BASE + 0)
> > + *
> > + * Check whether an uAPI extension is supported.
> > + *
> > + * It's unlikely that all planned capabilities in IOMMU fd will be ready
> > + * in one breath. User should check which uAPI extension is supported
> > + * according to its intended usage.
> > + *
> > + * A rough list of possible extensions may include:
> > + *
> > + * - EXT_MAP_TYPE1V2 for vfio type1v2 map semantics;
> > + * - EXT_DMA_NO_SNOOP for no-snoop DMA support;
> > + * - EXT_MAP_NEWTYPE for an enhanced map semantics;
> > + * - EXT_MULTIDEV_GROUP for 1:N iommu group;
> > + * - EXT_IOASID_NESTING for what the name stands;
> > + * - EXT_USER_PAGE_TABLE for user managed page table;
> > + * - EXT_USER_PASID_TABLE for user managed PASID table;
> > + * - EXT_DIRTY_TRACKING for tracking pages dirtied by DMA;
> > + * - ...
> > + *
> > + * Return: 0 if not supported, 1 if supported.
> > + */
> > +#define EXT_MAP_TYPE1V2		1
> > +#define EXT_DMA_NO_SNOOP	2
> > +#define IOMMU_CHECK_EXTENSION	_IO(IOMMU_TYPE, IOMMU_BASE + 0)
>
> I generally advocate for a 'try and fail' approach to discovering
> compatibility.
>
> If that doesn't work for the userspace then a query to return a
> generic capability flag is the next best idea. Each flag should
> clearly define what 'try and fail' it is talking about

We don't have a strong preference here; this just follows what vfio
does today. So Alex's opinion is appreciated here. 😊

> Eg dma_no_snoop is about creating an IOS with flag NO SNOOP set
>
> TYPE1V2 seems like nonsense

Just in case other mapping protocols are introduced in the future.

> Not sure about the others.
>
> IOW, this should recast to a generic 'query capabilities' IOCTL
>
> Jason
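[Editorial note] The extension-check dispatch quoted above is small enough to exercise directly as a user-space function. The constants match the quoted patch; everything else (ioctl plumbing) is stripped away:

```c
#include <assert.h>

#define EXT_MAP_TYPE1V2  1
#define EXT_DMA_NO_SNOOP 2

/*
 * Mirrors the IOMMU_CHECK_EXTENSION dispatch in the quoted patch:
 * return 1 for supported extensions, 0 otherwise. In this RFC only
 * TYPE1V2 semantics are advertised; no-snoop is not supported yet.
 */
int iommufd_check_extension(unsigned long arg)
{
	switch (arg) {
	case EXT_MAP_TYPE1V2:
		return 1;
	default:
		return 0;
	}
}
```

Under Jason's suggested recast, this per-extension switch would become a generic "query capabilities" call returning a set of flags instead of one probe per extension.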
RE: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE
> From: Jason Gunthorpe
> Sent: Wednesday, September 22, 2021 1:45 AM
>
> On Sun, Sep 19, 2021 at 02:38:39PM +0800, Liu Yi L wrote:
> > This patch adds an IOASID allocation/free interface per iommufd. When
> > allocating an IOASID, userspace is expected to specify the type and
> > format information for the target I/O page table.
> >
> > This RFC supports only one type (IOMMU_IOASID_TYPE_KERNEL_TYPE1V2),
> > implying a kernel-managed I/O page table with vfio type1v2 mapping
> > semantics. For this type the user should specify the addr_width of
> > the I/O address space and whether the I/O page table is created in
> > an iommu enforce_snoop format. enforce_snoop must be true at this
> > point, as the false setting requires additional contract with KVM on
> > handling WBINVD emulation, which can be added later.
> >
> > Userspace is expected to call IOMMU_CHECK_EXTENSION (see next patch)
> > for what formats can be specified when allocating an IOASID.
> >
> > Open:
> > - Devices on the PPC platform currently use a different iommu driver
> >   in vfio. Per previous discussion they can also use vfio type1v2 as
> >   long as there is a way to claim a specific iova range from a
> >   system-wide address space. This requirement doesn't sound PPC
> >   specific, as addr_width for pci devices can also be represented by
> >   a range [0, 2^addr_width-1]. This RFC hasn't adopted this design
> >   yet. We hope to have formal alignment in v1 discussion and then
> >   decide how to incorporate it in v2.
>
> I think the request was to include a start/end IO address hint when
> creating the ios. When the kernel creates it then it can return the

Is the hint single-range or could it be multiple ranges?

> actual geometry including any holes via a query.

I'd like to see a detailed flow from David on how the uAPI works today
with the existing spapr driver and what exact changes he'd like to make
on this proposed interface. The above info is still insufficient for us
to think about the right solution.

> > - Currently the ioasid term has already been used in the kernel
> >   (drivers/iommu/ioasid.c) to represent the hardware I/O address
> >   space ID in the wire. It covers both PCI PASID (Process Address
> >   Space ID) and ARM SSID (Sub-Stream ID). We need to find a way to
> >   resolve the naming conflict between the hardware ID and the
> >   software handle. One option is to rename the existing ioasid to be
> >   pasid or ssid, given their full names still sound generic.
> >   Appreciate more thoughts on this open!
>
> ioas works well here I think. Use ioas_id to refer to the xarray
> index.

What about when introducing pasid to this uAPI? Then use ioas_id for
the xarray index and ioasid to represent pasid/ssid? At this point the
software handle and hardware id are mixed together, thus we need a
clear terminology to differentiate them.

Thanks
Kevin
RE: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO
> From: Jason Gunthorpe
> Sent: Wednesday, September 22, 2021 1:41 AM
>
> On Sun, Sep 19, 2021 at 02:38:38PM +0800, Liu Yi L wrote:
> > After a device is bound to the iommufd, userspace can use this
> > interface to query the underlying iommu capability and format info
> > for this device. Based on this information the user then creates an
> > I/O address space in a compatible format with the to-be-attached
> > devices.
> >
> > The device cookie which is registered at binding time is used to mark
> > the device which is being queried here.
> >
> > Signed-off-by: Liu Yi L
> >  drivers/iommu/iommufd/iommufd.c | 68 +
> >  include/uapi/linux/iommu.h      | 49 
> >  2 files changed, 117 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c
> > index e16ca21e4534..641f199f2d41 100644
> > +++ b/drivers/iommu/iommufd/iommufd.c
> > @@ -117,6 +117,71 @@ static int iommufd_fops_release(struct inode *inode, struct file *filep)
> >  	return 0;
> >  }
> >
> > +static struct device *
> > +iommu_find_device_from_cookie(struct iommufd_ctx *ictx, u64 dev_cookie)
> > +{
>
> We have an xarray ID for the device, why are we allowing userspace to
> use the dev_cookie as input?
>
> Userspace should always pass in the ID. The only place dev_cookie
> should appear is if the kernel generates an event back to
> userspace. Then the kernel should return both the ID and the
> dev_cookie in the event to allow userspace to correlate it.

A little background. In the earlier design proposal we discussed two
options. One is to return a kernel-allocated ID (label) to userspace.
The other is to have the user register a cookie and use it in the
iommufd uAPI. At that time the two options were discussed exclusively
and the cookie one was preferred. Now you instead recommend a mixed
option. We can follow it for sure if nobody objects.

Thanks
Kevin
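[Editorial note] Jason's "mixed option" (the kernel hands out the ID, the user-chosen cookie only travels back in events) can be modelled in a few lines. The table and function names are invented for this sketch; the ID here is just an array index standing in for an xarray index:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEVS 4

/* Kernel-allocated ID is the array index; cookie is user-chosen. */
static uint64_t cookies[MAX_DEVS];
static int next_id;

/* Bind returns the kernel ID; userspace passes this ID into later ioctls. */
int bind_device(uint64_t user_cookie)
{
	if (next_id >= MAX_DEVS)
		return -1;
	cookies[next_id] = user_cookie;
	return next_id++;
}

/* Events back to userspace carry both ID and cookie for correlation. */
uint64_t event_cookie_for(int id)
{
	return cookies[id];
}
```

The point of the split is that the kernel never has to trust or search by the user's cookie; it only echoes it back.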
RE: [RFC 00/20] Introduce /dev/iommu for userspace I/O address space management
> From: Jason Gunthorpe
> Sent: Tuesday, September 21, 2021 9:45 PM
>
> On Sun, Sep 19, 2021 at 02:38:28PM +0800, Liu Yi L wrote:
> > Linux now includes multiple device-passthrough frameworks (e.g. VFIO
> > and vDPA) to manage secure device access from the userspace. One
> > critical task of those frameworks is to put the assigned device in a
> > secure, IOMMU-protected context so user-initiated DMAs are prevented
> > from doing harm to the rest of the system.
>
> Some bot will probably send this too, but it has compile warnings and
> needs to be rebased to 5.15-rc1

Thanks Jason, will fix the warnings. Yeah, I was using 5.14 in the
test, will rebase to 5.15-rc# in the next version.

Regards,
Yi Liu

> drivers/iommu/iommufd/iommufd.c:269:6: warning: variable 'ret' is used
> uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
>         if (refcount_read(&ioas->refs) > 1) {
>             ^~
> drivers/iommu/iommufd/iommufd.c:277:9: note: uninitialized use occurs here
>         return ret;
>                ^~~
> drivers/iommu/iommufd/iommufd.c:269:2: note: remove the 'if' if its
> condition is always true
>         if (refcount_read(&ioas->refs) > 1) {
>         ^~~~
> drivers/iommu/iommufd/iommufd.c:253:17: note: initialize the variable
> 'ret' to silence this warning
>         int ioasid, ret;
>                        ^
>                         = 0
> drivers/iommu/iommufd/iommufd.c:727:7: warning: variable 'ret' is used
> uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
>         if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
>             ^~
> drivers/iommu/iommufd/iommufd.c:767:17: note: uninitialized use occurs here
>         return ERR_PTR(ret);
>                        ^~~
> drivers/iommu/iommufd/iommufd.c:727:3: note: remove the 'if' if its
> condition is always false
>         if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
>         ^
> drivers/iommu/iommufd/iommufd.c:727:7: warning: variable 'ret' is used
> uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
>         if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
>             ^~~~
> drivers/iommu/iommufd/iommufd.c:767:17: note: uninitialized use occurs here
>         return ERR_PTR(ret);
>                        ^~~
> drivers/iommu/iommufd/iommufd.c:727:7: note: remove the '||' if its
> condition is always false
>         if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
>             ^~~
> drivers/iommu/iommufd/iommufd.c:717:9: note: initialize the variable
> 'ret' to silence this warning
>         int ret;
>                ^
>                 = 0
>
> Jason
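[Editorial note] The `-Wsometimes-uninitialized` warnings quoted in this mail all follow the same shape: `ret` is only assigned inside a conditional and then returned unconditionally. A reduced illustration of the pattern and the suggested `= 0` fix (the function and values here are made up; only the warning pattern is from the mail):

```c
#include <assert.h>

/*
 * Reduced version of the warned-about pattern: without the initializer,
 * the "refs <= 1" path would return an uninitialized 'ret'. Initializing
 * it (or assigning on every path) silences -Wsometimes-uninitialized.
 */
int put_object(int refs)
{
	int ret = 0;		/* the suggested "= 0" initialization */

	if (refs > 1)
		ret = -16;	/* stand-in for -EBUSY: object still in use */

	return ret;
}
```

The alternative fix, often preferred in kernel review, is to return explicitly on each branch so the compiler can prove every path assigns the value.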
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
> From: Tian, Kevin
> Sent: Wednesday, September 22, 2021 9:07 AM
>
> > From: Jason Gunthorpe
> > Sent: Wednesday, September 22, 2021 8:55 AM
> >
> > On Tue, Sep 21, 2021 at 11:56:06PM +0000, Tian, Kevin wrote:
> > > > The opened atomic is awful. A newly created fd should start in a
> > > > state where it has a disabled fops
> > > >
> > > > The only thing the disabled fops can do is register the device to
> > > > the iommu fd. When successfully registered the device gets the
> > > > normal fops.
> > > >
> > > > The registration steps should be done under a normal lock inside
> > > > the vfio_device. If a vfio_device is already registered then
> > > > further registration should fail.
> > > >
> > > > Getting the device fd via the group fd triggers the same sequence
> > > > as above.
> > >
> > > Above works if the group interface is also connected to iommufd,
> > > i.e. making vfio type1 a shim. In this case we can use the
> > > registration status as the exclusive switch. But if we keep vfio
> > > type1 separate as today, then a new atomic is still necessary. This
> > > all depends on how we want to deal with vfio type1 and iommufd, and
> > > possibly what's discussed here just adds another pound to the shim
> > > option...
> >
> > No, it works the same either way, the group FD path is identical to
> > the normal FD path, it just triggers some of the state transitions
> > automatically internally instead of requiring external ioctls.
> >
> > The device FDs start disabled, an internal API binds it to the iommu
> > via open coding with the group API, and then the rest of the APIs can
> > be enabled. Same as today.

After reading your comments on patch 08, I may have a clearer picture
of your suggestion. The key is to handle exclusive access at binding
time (based on vdev->iommu_dev). Please see whether the below makes
sense:

Shared sequence:

1) initialize the device with a parked fops;
2) need binding (explicit or implicit) to move away from the parked fops;
3) switch to normal fops after successful binding;

1) happens at device probe.

For nongroup, 2) and 3) are done together in VFIO_DEVICE_GET_IOMMUFD:
- 2) is done by calling the .bind_iommufd() callback;
- 3) could be done within .bind_iommufd(), or via a new callback, e.g.
  .finalize_device(). The latter may be preferred for the group
  interface;
- Two threads may open the same device simultaneously, with exclusive
  access guaranteed by iommufd_bind_device();
- open() after successful binding is rejected, since the normal fops
  has been activated. This is checked upon vdev->iommu_dev;

For group, 2) and 3) are done together in VFIO_GROUP_GET_DEVICE_FD:
- 2) is done by open coding bind_iommufd + attach_ioas. Create an
  iommufd_device object and record it to vdev->iommu_dev;
- 3) is done by calling .finalize_device();
- open() after a valid vdev->iommu_dev is rejected. This also ensures
  exclusive ownership with the nongroup path.

If Alex also agrees with it, this might be another mini-series to be
merged (just for the group path) before this one. Doing so sort of
nullifies the existing group/container attaching process, where
attach_ioas will be skipped and now the security context is established
when the device is opened.

Thanks
Kevin
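[Editorial note] The parked-then-normal fops sequence sketched in this mail is a tiny state machine. Below is a user-space model of it; the state names and functions are invented here, and the single global stands in for per-vfio_device state under its lock:

```c
#include <assert.h>
#include <stdbool.h>

enum vdev_state { VDEV_PARKED, VDEV_BOUND };

/* Stand-in for per-device state; the real code keeps this in the
 * vfio_device under a normal lock. */
static enum vdev_state state = VDEV_PARKED;

/* Only binding to an iommufd is allowed while the fops are "parked".
 * A second bind fails, which is the exclusive-access guarantee. */
int vdev_bind_iommufd(void)
{
	if (state != VDEV_PARKED)
		return -1;
	state = VDEV_BOUND;	/* switch to the normal fops */
	return 0;
}

/* Normal device access is rejected until binding has succeeded. */
bool vdev_access_allowed(void)
{
	return state == VDEV_BOUND;
}
```

The group path would drive the same transitions internally (open-coded bind + attach) instead of via explicit ioctls, which is why no separate "opened" atomic is needed.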
Re: [RFC 04/20] iommu: Add iommu_device_get_info interface
Hi Jason,

On 9/22/21 12:19 AM, Jason Gunthorpe wrote:
> On Sun, Sep 19, 2021 at 02:38:32PM +0800, Liu Yi L wrote:
>> From: Lu Baolu
>>
>> This provides an interface for upper layers to get the per-device
>> iommu attributes.
>>
>>     int iommu_device_get_info(struct device *dev,
>>                               enum iommu_devattr attr, void *data);
>
> Can't we use properly typed ops and functions here instead of a
> void *data?
>
>   get_snoop()
>   get_page_size()
>   get_addr_width()

Yeah! Above are more friendly to the upper layer callers.

Best regards,
baolu
RE: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
> From: Jason Gunthorpe
> Sent: Wednesday, September 22, 2021 1:10 AM
>
> On Sun, Sep 19, 2021 at 02:38:34PM +0800, Liu Yi L wrote:
> > From: Lu Baolu
> >
> > This extends the iommu core to manage the security context for
> > passthrough devices. Please bear with a long explanation for how we
> > reached this design instead of managing it solely in iommufd like
> > what vfio does today.
> >
> > Devices which cannot be isolated from each other are organized into
> > an iommu group. When a device is assigned to the user space, the
> > entire group must be put in a security context so that user-initiated
> > DMAs via the assigned device cannot harm the rest of the system. No
> > user access should be granted on a device before the security context
> > is established for the group which the device belongs to.
> >
> > Managing the security context must meet the below criteria:
> >
> > 1) The group is viable for user-initiated DMAs. This implies that the
> >    devices in the group must be either bound to a device-passthrough
>
> s/a/the same/
>
> >    framework, or driver-less, or bound to a driver which is known
> >    safe (does not do DMA).
> >
> > 2) The security context should only allow DMA to the user's memory
> >    and devices in this group;
> >
> > 3) After the security context is established for the group, the group
> >    viability must be continuously monitored before the user
> >    relinquishes all devices belonging to the group. The viability
> >    might be broken e.g. when a driver-less device is later bound to a
> >    driver which does DMA.
> >
> > 4) The security context should not be destroyed before user access
> >    permission is withdrawn.
> >
> > Existing vfio introduces explicit container/group semantics in its
> > uAPI to meet the above requirements. A single security context (iommu
> > domain) is created per container. Attaching a group to a container
> > moves the entire group into the associated security context, and vice
> > versa. The user can open the device only after group attach. A group
> > can be detached only after all devices in the group are closed. Group
> > viability is monitored by listening to iommu group events.
> >
> > Unlike vfio, iommufd adopts a device-centric design with all group
> > logistics hidden behind the fd. Binding a device to iommufd serves
> > as the contract to get the security context established (and vice
> > versa for unbinding). One additional requirement in iommufd is to
> > manage the switch between multiple security contexts due to decoupled
> > bind/attach:
>
> This should be a precursor series that actually does clean things up
> properly. There is no reason for vfio and iommufd to differ here, if
> we are implementing this logic into the iommu layer then it should be
> deleted from the VFIO layer, not left duplicated like this.

make sense

> IIRC in VFIO the container is the IOAS and when the group goes to
> create the device fd it should simply do the
> iommu_device_init_user_dma() followed immediately by a call to bind
> the container IOAS as your #3.

A slight correction: to meet vfio semantics we could do
init_user_dma() at group attach time and then call binding to the
container IOAS when the device fd is created. This is because vfio
requires the group to be in a security context before the device is
opened.

> Then delete all the group viability stuff from vfio, relying on the
> iommu to do it.
>
> It should have full symmetry with the iommufd.

agree

> > @@ -1664,6 +1671,17 @@ static int iommu_bus_notifier(struct notifier_block *nb,
> >  		group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
> >  		break;
> >  	case BUS_NOTIFY_BOUND_DRIVER:
> > +		/*
> > +		 * FIXME: Alternatively the attached drivers could generically
> > +		 * indicate to the iommu layer that they are safe for keeping
> > +		 * the iommu group user viable by calling some function around
> > +		 * probe(). We could eliminate this gross BUG_ON() by denying
> > +		 * probe to non-iommu-safe drivers.
> > +		 */
> > +		mutex_lock(&group->mutex);
> > +		if (group->user_dma_owner_id)
> > +			BUG_ON(!iommu_group_user_dma_viable(group));
> > +		mutex_unlock(&group->mutex);
>
> And the mini-series should fix this BUG_ON properly by interlocking
> with the driver core to simply refuse to bind a driver under these
> conditions instead of allowing userspace to crash the kernel.
>
> That alone would be justification enough to merge this work.

yes

> > +
> > +/*
> > + * IOMMU core interfaces for iommufd.
> > + */
> > +
> > +/*
> > + * FIXME: We currently simply follow vfio policy to maintain the group's
> > + * viability to user. Eventually, we should avoid the below hard-coded list
> > + * by letting drivers indicate to the iommu layer that they are safe for
> > + * keeping the iommu group's user viability.
> > + */
> > +static const char * const iommu_driver_allowed[] = {
> > +
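[Editorial note] The quoted `iommu_driver_allowed[]` array is truncated in this mail, but the viability rule it supports is simple: a group stays user-viable only while every member device is driver-less or bound to an allowlisted, DMA-safe driver. A user-space sketch with a hypothetical allowlist (the entries here are guesses, not the patch's actual list):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical allowlist in the spirit of iommu_driver_allowed[]. */
static const char *const allowed_drivers[] = {
	"vfio-pci", "pci-stub",
};

/* A device is safe if it has no driver or its driver is allowlisted. */
bool driver_is_dma_safe(const char *drv)
{
	size_t i;

	if (!drv)		/* driver-less devices are fine */
		return true;
	for (i = 0; i < sizeof(allowed_drivers) / sizeof(allowed_drivers[0]); i++)
		if (strcmp(drv, allowed_drivers[i]) == 0)
			return true;
	return false;
}
```

Per the discussion above, the longer-term fix is for drivers to declare DMA-safety to the iommu layer around probe(), so the driver core can refuse unsafe binds instead of checking names against a hard-coded list (and BUG_ON-ing after the fact).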
RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
> From: Jason Gunthorpe > Sent: Tuesday, September 21, 2021 11:42 PM > > - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does > not need locking (order it properly too, it is in the wrong order), and > don't check for duplicate devices or dev_cookie duplication, that > is user error and is harmless to the kernel. > I'm confused here. Yes, it's user error, but we check so many user errors and then return -EINVAL, -EBUSY, etc. Why is this one special? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 05/20] vfio/pci: Register device to /dev/vfio/devices
> From: Jason Gunthorpe > Sent: Wednesday, September 22, 2021 5:59 AM > > On Tue, Sep 21, 2021 at 03:09:29PM -0600, Alex Williamson wrote: > > > the iommufd uAPI at all. Isn't part of that work to understand how KVM > > will be told about non-coherent devices rather than "meh, skip it in the > > kernel"? Also let's not forget that vfio is not only for KVM. > > vfio is not only for KVM, but AFAICT the wbinv stuff is only for > KVM... But yes, I agree this should be sorted out at this stage If such devices are not even exposed in the new hierarchy at this stage, I suppose sorting it out later should be fine? > > > > > When the device is opened via /dev/vfio/devices, vfio-pci should > prevent > > > > the user from accessing the assigned device because the device is still > > > > attached to the default domain which may allow user-initiated DMAs to > > > > touch arbitrary places. The user access must be blocked until the device > > > > is later bound to an iommufd (see patch 08). The binding acts as the > > > > contract for putting the device in a security context which ensures > > > > user- > > > > initiated DMAs via this device cannot harm the rest of the system. > > > > > > > > This patch introduces a vdev->block_access flag for this purpose. It's > > > > set > > > > when the device is opened via /dev/vfio/devices and cleared after > binding > > > > to iommufd succeeds. mmap and r/w handlers check this flag to decide > whether > > > > user access should be blocked or not. > > > > > > This should not be in vfio_pci. > > > > > > AFAIK there is no condition where a vfio driver can work without being > > > connected to some kind of iommu back end, so the core code should > > > handle this interlock globally. A vfio driver's ops should not be > > > callable until the iommu is connected. > > > > > > The only vfio_pci patch in this series should be adding a new callback > > > op to take in an iommufd and register the pci_device as an iommufd > > > device.
> > > > Couldn't the same argument be made that registering a $bus device as an > > iommufd device is a common interface that shouldn't be the > > responsibility of the vfio device driver? > > The driver needs enough involvement to signal what kind of IOMMU > connection it wants, eg attaching to a physical device will use the > iofd_attach_device() path, but attaching to a SW page table should use > a different API call. PASID should also be different. Exactly > > Possibly a good arrangement is to have the core provide some generic > ioctl ops functions 'vfio_all_device_iommufd_bind' that everything > except mdev drivers can use so the code is all de-duplicated. Could this be a future enhancement when we have multiple device types supporting iommufd? > > > non-group device anything more than a reservation of that device if > > access is withheld until iommu isolation? I also don't really want to > > predict how ioctls might evolve to guess whether only blocking .read, > > .write, and .mmap callbacks are sufficient. Thanks, > > This is why I said the entire fops should be blocked in a dummy fops > so the core code keeps the vfio_device FD parked and userspace is unable to > access the ops until device attachment and thus IOMMU isolation is > completed. > > Simple and easy to reason about, a parked FD is very similar to a > closed FD. > This rationale makes sense. Just the open question of how to handle exclusive open between the group and non-group interfaces still needs some more clarification here, especially about what a parked FD means for the group interface (where parking is unnecessary since the security context is already established before the device is opened). Thanks Kevin
RE: [RFC 05/20] vfio/pci: Register device to /dev/vfio/devices
> From: Alex Williamson > Sent: Wednesday, September 22, 2021 5:09 AM > > On Tue, 21 Sep 2021 13:40:01 -0300 > Jason Gunthorpe wrote: > > > On Sun, Sep 19, 2021 at 02:38:33PM +0800, Liu Yi L wrote: > > > This patch exposes the device-centric interface for vfio-pci devices. To > > > be compatible with existing users, vfio-pci exposes both legacy group > > > interface and device-centric interface. > > > > > > As explained in last patch, this change doesn't apply to devices which > > > cannot be forced to snoop cache by their upstream iommu. Such devices > > > are still expected to be opened via the legacy group interface. > > This doesn't make much sense to me. The previous patch indicates > there's work to be done in updating the kvm-vfio contract to understand > DMA coherency, so you're trying to limit use cases to those where the > IOMMU enforces coherency, but there's QEMU work to be done to support > the iommufd uAPI at all. Isn't part of that work to understand how KVM > will be told about non-coherent devices rather than "meh, skip it in the > kernel"? Also let's not forget that vfio is not only for KVM. The policy here is that VFIO will not expose such devices (no enforce-snoop) in the new device hierarchy at all. In this case QEMU will fall back to the group interface automatically and then rely on the existing contract to connect vfio and QEMU. It doesn't need to care about whatever new contract until such devices are exposed in the new interface. Yes, vfio is not only for KVM. But here it's more a task split based on staging considerations. IMO it's not necessary to further split the task into supporting non-snoop devices for userspace drivers and then for KVM.
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
> From: Jason Gunthorpe > Sent: Wednesday, September 22, 2021 8:55 AM > > On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote: > > > The opened atomic is awful. A newly created fd should start in a > > > state where it has a disabled fops > > > > > > The only thing the disabled fops can do is register the device to the > > > iommu fd. When successfully registered the device gets the normal fops. > > > > > > The registration steps should be done under a normal lock inside the > > > vfio_device. If a vfio_device is already registered then further > > > registration should fail. > > > > > > Getting the device fd via the group fd triggers the same sequence as > > > above. > > > > > > > Above works if the group interface is also connected to iommufd, i.e. > > making vfio type1 as a shim. In this case we can use the registration > > status as the exclusive switch. But if we keep vfio type1 separate as > > today, then a new atomic is still necessary. This all depends on how > > we want to deal with vfio type1 and iommufd, and possibly what's > > discussed here just adds another pound to the shim option... > > No, it works the same either way, the group FD path is identical to > the normal FD path, it just triggers some of the state transitions > automatically internally instead of requiring external ioctls. > > The device FD starts disabled, an internal API binds it to the iommu > via open coding with the group API, and then the rest of the APIs can > be enabled. Same as today. > Still a bit confused. If vfio type1 also connects to iommufd, whether the device is registered can be centrally checked based on whether an iommu_ctx is recorded. But if type1 doesn't talk to iommufd at all, don't we still need to introduce a new state (call it 'opened' or 'registered') to protect the two interfaces? In this case, what is the point of keeping the device FD disabled even for the group path?
RE: [RFC 03/20] vfio: Add vfio_[un]register_device()
> From: Jason Gunthorpe > Sent: Wednesday, September 22, 2021 9:00 AM > > On Wed, Sep 22, 2021 at 12:54:02AM +, Tian, Kevin wrote: > > > From: Jason Gunthorpe > > > Sent: Wednesday, September 22, 2021 12:01 AM > > > > > > > One open question is how to organize the device nodes under > > > /dev/vfio/devices/. > > > > This RFC adopts a simple policy by keeping a flat layout with mixed > > > devname > > > > from all kinds of devices. The prerequisite of this model is that > devnames > > > > from different bus types are unique formats: > > > > > > This isn't reliable, the devname should just be vfio0, vfio1, etc > > > > > > The userspace can learn the correct major/minor by inspecting the > > > sysfs. > > > > > > This whole concept should disappear into the prior patch that adds the > > > struct device in the first place, and I think most of the code here > > > can be deleted once the struct device is used properly. > > > > > > > Can you help elaborate on the above flow? This is one area where we need > > more guidance. > > > > When Qemu accepts an option "-device vfio-pci,host=:BB:DD.F", > > how does Qemu identify which vfio0/1/... is associated with the specified > > :BB:DD.F? > > When done properly in the kernel the file: > > /sys/bus/pci/devices/:BB:DD.F/vfio/vfioX/dev > > Will contain the major:minor of the VFIO device. > > Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat > that the major:minor matches. Ah, that's the trick. > > in the above pattern "pci" and ":BB:DD.FF" are the arguments passed > to qemu. > > You can look at this for some general over engineered code to handle > opening from a sysfs handle like above: > > https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c > Will check. Thanks for the suggestion.
Re: [RFC 03/20] vfio: Add vfio_[un]register_device()
On Wed, Sep 22, 2021 at 12:54:02AM +, Tian, Kevin wrote: > > From: Jason Gunthorpe > > Sent: Wednesday, September 22, 2021 12:01 AM > > > > > One open question is how to organize the device nodes under > > /dev/vfio/devices/. > > > This RFC adopts a simple policy by keeping a flat layout with mixed > > devname > > > from all kinds of devices. The prerequisite of this model is that devnames > > > from different bus types are unique formats: > > > > This isn't reliable, the devname should just be vfio0, vfio1, etc > > > > The userspace can learn the correct major/minor by inspecting the > > sysfs. > > > > This whole concept should disappear into the prior patch that adds the > > struct device in the first place, and I think most of the code here > > can be deleted once the struct device is used properly. > > > > Can you help elaborate on the above flow? This is one area where we need > more guidance. > > When Qemu accepts an option "-device vfio-pci,host=:BB:DD.F", > how does Qemu identify which vfio0/1/... is associated with the specified > :BB:DD.F? When done properly in the kernel the file: /sys/bus/pci/devices/:BB:DD.F/vfio/vfioX/dev Will contain the major:minor of the VFIO device. Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat that the major:minor matches. In the above pattern "pci" and ":BB:DD.FF" are the arguments passed to qemu. You can look at this for some general over engineered code to handle opening from a sysfs handle like above: https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c Jason
RE: [RFC 03/20] vfio: Add vfio_[un]register_device()
> From: Jason Gunthorpe > Sent: Wednesday, September 22, 2021 8:54 AM > > On Tue, Sep 21, 2021 at 11:10:15PM +, Tian, Kevin wrote: > > > From: Jason Gunthorpe > > > Sent: Wednesday, September 22, 2021 12:01 AM > > > > > > On Sun, Sep 19, 2021 at 02:38:31PM +0800, Liu Yi L wrote: > > > > With /dev/vfio/devices introduced, now a vfio device driver has three > > > > options to expose its device to userspace: > > > > > > > > a) only legacy group interface, for devices which haven't been moved > to > > > > iommufd (e.g. platform devices, sw mdev, etc.); > > > > > > > > b) both legacy group interface and new device-centric interface, for > > > > devices which support iommufd but also want to keep backward > > > > compatibility (e.g. pci devices in this RFC); > > > > > > > > c) only new device-centric interface, for new devices which don't carry > > > > backward compatibility burden (e.g. hw mdev/subdev with pasid); > > > > > > We shouldn't have 'b'? Where does it come from? > > > > A vfio-pci device can be opened via the existing group interface. If there is no b), > > it > > means legacy vfio userspace can never use a vfio-pci device any more > > once the latter is moved to iommufd. > > Sorry, I think I meant a, which I guess you will say is SW mdev devices. We listed a) here in case we don't want to move all vfio device types to use iommufd in one breath. It's supposed to be a type valid only in this transition phase. In the end only b) and c) are valid. > > But even so, I think the way forward here is to still always expose > the device /dev/vfio/devices/X and some devices may not allow iommufd > usage initially. > > Providing an ioctl to bind to a normal VFIO container or group might > allow a reasonable fallback in userspace... > But doesn't a new ioctl still imply breaking existing vfio userspace? Thanks Kevin
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
> From: Alex Williamson > Sent: Wednesday, September 22, 2021 3:56 AM > > On Sun, 19 Sep 2021 14:38:30 +0800 > Liu Yi L wrote: > > > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > > userspace to directly open a vfio device w/o relying on container/group > > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > > iommufd (more specifically in iommu core by this RFC) in a device-centric > > manner. > > > > In case a device is exposed in both legacy and new interfaces (see next > > patch for how to decide it), this patch also ensures that when the device > > is already opened via one interface then the other one must be blocked. > > > > Signed-off-by: Liu Yi L > > --- > > drivers/vfio/vfio.c | 228 +++ > > include/linux/vfio.h | 2 + > > 2 files changed, 213 insertions(+), 17 deletions(-) > > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > > index 02cc51ce6891..84436d7abedd 100644 > > --- a/drivers/vfio/vfio.c > > +++ b/drivers/vfio/vfio.c > ... > > @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = { > > .mode = S_IRUGO | S_IWUGO, > > }; > > > > +static char *vfio_device_devnode(struct device *dev, umode_t *mode) > > +{ > > + return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev)); > > +} > > dev_name() doesn't provide us with any uniqueness guarantees, so this > could potentially generate naming conflicts. The similar scheme for > devices within an iommu group appends an instance number if a conflict > occurs, but that solution doesn't work here where the name isn't just a > link to the actual device. Devices within an iommu group are also > likely associated within a bus_type, so the potential for conflict is > pretty negligible, that's not the case as vfio is adopted for new > device types. Thanks, > This is also our concern. Thanks for confirming it. We'd appreciate it if you can help think of a better alternative to deal with it.
Thanks Kevin
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote: > > The opened atomic is awful. A newly created fd should start in a > > state where it has a disabled fops > > > > The only thing the disabled fops can do is register the device to the > > iommu fd. When successfully registered the device gets the normal fops. > > > > The registration steps should be done under a normal lock inside the > > vfio_device. If a vfio_device is already registered then further > > registration should fail. > > > > Getting the device fd via the group fd triggers the same sequence as > > above. > > > > Above works if the group interface is also connected to iommufd, i.e. > making vfio type1 as a shim. In this case we can use the registration > status as the exclusive switch. But if we keep vfio type1 separate as > today, then a new atomic is still necessary. This all depends on how > we want to deal with vfio type1 and iommufd, and possibly what's > discussed here just adds another pound to the shim option... No, it works the same either way, the group FD path is identical to the normal FD path, it just triggers some of the state transitions automatically internally instead of requiring external ioctls. The device FD starts disabled, an internal API binds it to the iommu via open coding with the group API, and then the rest of the APIs can be enabled. Same as today. Jason
RE: [RFC 03/20] vfio: Add vfio_[un]register_device()
> From: Jason Gunthorpe > Sent: Wednesday, September 22, 2021 12:01 AM > > > One open question is how to organize the device nodes under > /dev/vfio/devices/. > > This RFC adopts a simple policy by keeping a flat layout with mixed > devname > > from all kinds of devices. The prerequisite of this model is that devnames > > from different bus types are unique formats: > > This isn't reliable, the devname should just be vfio0, vfio1, etc > > The userspace can learn the correct major/minor by inspecting the > sysfs. > > This whole concept should disappear into the prior patch that adds the > struct device in the first place, and I think most of the code here > can be deleted once the struct device is used properly. > Can you help elaborate on the above flow? This is one area where we need more guidance. When Qemu accepts an option "-device vfio-pci,host=:BB:DD.F", how does Qemu identify which vfio0/1/... is associated with the specified :BB:DD.F? Thanks Kevin
Re: [RFC 03/20] vfio: Add vfio_[un]register_device()
On Tue, Sep 21, 2021 at 11:10:15PM +, Tian, Kevin wrote: > > From: Jason Gunthorpe > > Sent: Wednesday, September 22, 2021 12:01 AM > > > > On Sun, Sep 19, 2021 at 02:38:31PM +0800, Liu Yi L wrote: > > > With /dev/vfio/devices introduced, now a vfio device driver has three > > > options to expose its device to userspace: > > > > > > a) only legacy group interface, for devices which haven't been moved to > > > iommufd (e.g. platform devices, sw mdev, etc.); > > > > > > b) both legacy group interface and new device-centric interface, for > > > devices which support iommufd but also want to keep backward > > > compatibility (e.g. pci devices in this RFC); > > > > > > c) only new device-centric interface, for new devices which don't carry > > > backward compatibility burden (e.g. hw mdev/subdev with pasid); > > > > We shouldn't have 'b'? Where does it come from? > > A vfio-pci device can be opened via the existing group interface. If there is no b), it > means legacy vfio userspace can never use a vfio-pci device any more > once the latter is moved to iommufd. Sorry, I think I meant a, which I guess you will say is SW mdev devices. But even so, I think the way forward here is to still always expose the device /dev/vfio/devices/X and some devices may not allow iommufd usage initially. Providing an ioctl to bind to a normal VFIO container or group might allow a reasonable fallback in userspace... Jason
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
> From: Jason Gunthorpe > Sent: Tuesday, September 21, 2021 11:57 PM > > On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote: > > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > > userspace to directly open a vfio device w/o relying on container/group > > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > > iommufd (more specifically in iommu core by this RFC) in a device-centric > > manner. > > > > In case a device is exposed in both legacy and new interfaces (see next > > patch for how to decide it), this patch also ensures that when the device > > is already opened via one interface then the other one must be blocked. > > > > Signed-off-by: Liu Yi L > > --- > > drivers/vfio/vfio.c | 228 +++ > > include/linux/vfio.h | 2 + > > 2 files changed, 213 insertions(+), 17 deletions(-) > > > +static int vfio_init_device_class(void) > > +{ > > + int ret; > > + > > + mutex_init(&vfio.device_lock); > > + idr_init(&vfio.device_idr); > > + > > + /* /dev/vfio/devices/$DEVICE */ > > + vfio.device_class = class_create(THIS_MODULE, "vfio-device"); > > + if (IS_ERR(vfio.device_class)) > > + return PTR_ERR(vfio.device_class); > > + > > + vfio.device_class->devnode = vfio_device_devnode; > > + > > + ret = alloc_chrdev_region(&vfio.device_devt, 0, MINORMASK + 1, > "vfio-device"); > > + if (ret) > > + goto err_alloc_chrdev; > > + > > + cdev_init(&vfio.device_cdev, &vfio_device_fops); > > + ret = cdev_add(&vfio.device_cdev, vfio.device_devt, MINORMASK + > 1); > > + if (ret) > > + goto err_cdev_add; > > Huh? This is not how cdevs are used. This patch needs rewriting. 
> > The struct vfio_device should gain a 'struct device' and 'struct cdev' > as non-pointer members > > vfio register path should end up doing cdev_device_add() for each > vfio_device > > vfio_unregister path should do cdev_device_del() > > No idr should be needed, an ida is used to allocate minor numbers > > The struct device release function should trigger a kfree which > requires some reworking of the callers > > vfio_init_group_dev() should do a device_initialize() > vfio_uninit_group_dev() should do a device_put() All above are good suggestions! > > The opened atomic is awful. A newly created fd should start in a > state where it has a disabled fops > > The only thing the disabled fops can do is register the device to the > iommu fd. When successfully registered the device gets the normal fops. > > The registration steps should be done under a normal lock inside the > vfio_device. If a vfio_device is already registered then further > registration should fail. > > Getting the device fd via the group fd triggers the same sequence as > above. > Above works if the group interface is also connected to iommufd, i.e. making vfio type1 as a shim. In this case we can use the registration status as the exclusive switch. But if we keep vfio type1 separate as today, then a new atomic is still necessary. This all depends on how we want to deal with vfio type1 and iommufd, and possibly what's discussed here just adds another pound to the shim option... Thanks Kevin
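The lifecycle being suggested (embedded `struct device`, first reference taken at init, release callback doing the kfree on the last put) can be illustrated with a userspace analogue. Everything below is a simulation with made-up names; only the reference-counting shape mirrors the kernel's driver-core pattern:

```c
#include <assert.h>

/* Stand-in for the embedded 'struct device': just a refcount plus the
 * release callback that fires when the last reference is dropped. */
struct fake_device {
    int refs;
    void (*release)(struct fake_device *dev);
};

static void device_initialize_sim(struct fake_device *dev,
                                  void (*release)(struct fake_device *))
{
    dev->refs = 1;          /* like device_initialize(): caller holds ref #1 */
    dev->release = release;
}

static void get_device_sim(struct fake_device *dev)
{
    dev->refs++;            /* like get_device() */
}

static void put_device_sim(struct fake_device *dev)
{
    if (--dev->refs == 0)
        dev->release(dev);  /* last put runs ->release, i.e. the kfree */
}
```

The point of the pattern is that teardown (`vfio_uninit_group_dev()` doing a `device_put()`) no longer frees the structure while an open fd still holds a reference; freeing happens exactly once, from the release callback.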
RE: [RFC 03/20] vfio: Add vfio_[un]register_device()
> From: Jason Gunthorpe > Sent: Wednesday, September 22, 2021 12:01 AM > > On Sun, Sep 19, 2021 at 02:38:31PM +0800, Liu Yi L wrote: > > With /dev/vfio/devices introduced, now a vfio device driver has three > > options to expose its device to userspace: > > > > a) only legacy group interface, for devices which haven't been moved to > > iommufd (e.g. platform devices, sw mdev, etc.); > > > > b) both legacy group interface and new device-centric interface, for > > devices which support iommufd but also want to keep backward > > compatibility (e.g. pci devices in this RFC); > > > > c) only new device-centric interface, for new devices which don't carry > > backward compatibility burden (e.g. hw mdev/subdev with pasid); > > We shouldn't have 'b'? Where does it come from? A vfio-pci device can be opened via the existing group interface. If there is no b), it means legacy vfio userspace can never use a vfio-pci device any more once the latter is moved to iommufd. Thanks Kevin
Re: [RFC 05/20] vfio/pci: Register device to /dev/vfio/devices
On Tue, Sep 21, 2021 at 03:09:29PM -0600, Alex Williamson wrote: > the iommufd uAPI at all. Isn't part of that work to understand how KVM > will be told about non-coherent devices rather than "meh, skip it in the > kernel"? Also let's not forget that vfio is not only for KVM. vfio is not only for KVM, but AFAICT the wbinv stuff is only for KVM... But yes, I agree this should be sorted out at this stage. > > > When the device is opened via /dev/vfio/devices, vfio-pci should prevent > > > the user from accessing the assigned device because the device is still > > > attached to the default domain which may allow user-initiated DMAs to > > > touch arbitrary places. The user access must be blocked until the device > > > is later bound to an iommufd (see patch 08). The binding acts as the > > > contract for putting the device in a security context which ensures user- > > > initiated DMAs via this device cannot harm the rest of the system. > > > > > > This patch introduces a vdev->block_access flag for this purpose. It's set > > > when the device is opened via /dev/vfio/devices and cleared after binding > > > to iommufd succeeds. mmap and r/w handlers check this flag to decide > > > whether > > > user access should be blocked or not. > > > > This should not be in vfio_pci. > > > > AFAIK there is no condition where a vfio driver can work without being > > connected to some kind of iommu back end, so the core code should > > handle this interlock globally. A vfio driver's ops should not be > > callable until the iommu is connected. > > > > The only vfio_pci patch in this series should be adding a new callback > > op to take in an iommufd and register the pci_device as an iommufd > > device. > > Couldn't the same argument be made that registering a $bus device as an > iommufd device is a common interface that shouldn't be the > responsibility of the vfio device driver?
The driver needs enough involvement to signal what kind of IOMMU connection it wants, eg attaching to a physical device will use the iofd_attach_device() path, but attaching to a SW page table should use a different API call. PASID should also be different. Possibly a good arrangement is to have the core provide some generic ioctl ops functions 'vfio_all_device_iommufd_bind' that everything except mdev drivers can use so the code is all de-duplicated. > non-group device anything more than a reservation of that device if > access is withheld until iommu isolation? I also don't really want to > predict how ioctls might evolve to guess whether only blocking .read, > .write, and .mmap callbacks are sufficient. Thanks, This is why I said the entire fops should be blocked in a dummy fops so the core code keeps the vfio_device FD parked and userspace is unable to access the ops until device attachment and thus IOMMU isolation is completed. Simple and easy to reason about, a parked FD is very similar to a closed FD. Jason
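The parked-FD state machine Jason outlines can be sketched in miniature: the fd starts with a dummy ops table whose handlers all fail, and binding to iommufd is the one-way transition that installs the real ops. All names here are hypothetical; this models the states, not the actual vfio code:

```c
#include <assert.h>
#include <errno.h>

struct fake_vfio_device;

struct fake_ops {
    int (*read)(struct fake_vfio_device *dev);
};

struct fake_vfio_device {
    const struct fake_ops *ops;   /* starts as &parked_ops */
    int bound_to_iommufd;
};

static int parked_read(struct fake_vfio_device *dev)
{
    (void)dev;
    return -ENODEV;               /* parked FD behaves like a closed FD */
}

static int real_read(struct fake_vfio_device *dev)
{
    (void)dev;
    return 0;                     /* device access allowed after binding */
}

static const struct fake_ops parked_ops = { .read = parked_read };
static const struct fake_ops real_ops   = { .read = real_read };

/* Binding to the iommu backend is the only transition out of the parked
 * state, and registering twice fails, mirroring the "further registration
 * should fail" rule. */
static int fake_bind_iommufd(struct fake_vfio_device *dev)
{
    if (dev->bound_to_iommufd)
        return -EBUSY;
    dev->bound_to_iommufd = 1;
    dev->ops = &real_ops;
    return 0;
}
```

The group-fd path would call `fake_bind_iommufd()` internally during attach, so both interfaces converge on the same state transitions without a separate `opened` atomic.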
Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()
On Tue, Sep 21, 2021 at 04:43:59PM -0500, Tom Lendacky wrote: > On 9/21/21 4:34 PM, Kirill A. Shutemov wrote: > > On Tue, Sep 21, 2021 at 11:27:17PM +0200, Borislav Petkov wrote: > > > On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote: > > > > I still believe calling cc_platform_has() from __startup_64() is totally > > > > broken as it lacks proper wrapping while accessing global variables. > > > > > > Well, one of the issues on the AMD side was using boot_cpu_data too > > > early and the Intel side uses it too. Can you replace those checks with > > > is_tdx_guest() or whatever was the helper's name which would check > > > whether the kernel is running as a TDX guest, and see if that helps? > > > > There's no need for the Intel check this early. Only AMD needs it. Maybe just > > opencode them? > > Any way you can put a gzipped/bzipped copy of your vmlinux file somewhere I > can grab it from and take a look at it? You can find the broken vmlinux and bzImage here: https://drive.google.com/drive/folders/1n74vUQHOGebnF70Im32qLFY8iS3wvjIs?usp=sharing Let me know when I can remove it. -- Kirill A. Shutemov
Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()
On 9/21/21 4:34 PM, Kirill A. Shutemov wrote: > On Tue, Sep 21, 2021 at 11:27:17PM +0200, Borislav Petkov wrote: > > On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote: > > > I still believe calling cc_platform_has() from __startup_64() is totally > > > broken as it lacks proper wrapping while accessing global variables. > > Well, one of the issues on the AMD side was using boot_cpu_data too > > early and the Intel side uses it too. Can you replace those checks with > > is_tdx_guest() or whatever was the helper's name which would check > > whether the kernel is running as a TDX guest, and see if that helps? > There's no need for the Intel check this early. Only AMD needs it. Maybe just > opencode them? Any way you can put a gzipped/bzipped copy of your vmlinux file somewhere I can grab it from and take a look at it? Thanks, Tom
Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()
On Tue, Sep 21, 2021 at 11:27:17PM +0200, Borislav Petkov wrote: > On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote: > > I still believe calling cc_platform_has() from __startup_64() is totally > > broken as it lacks proper wrapping while accessing global variables. > > Well, one of the issues on the AMD side was using boot_cpu_data too > early and the Intel side uses it too. Can you replace those checks with > is_tdx_guest() or whatever was the helper's name which would check > whether the kernel is running as a TDX guest, and see if that helps? There's no need for the Intel check this early. Only AMD needs it. Maybe just opencode them? -- Kirill A. Shutemov
Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()
On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote: > I still believe calling cc_platform_has() from __startup_64() is totally > broken as it lacks proper wrapping while accessing global variables. Well, one of the issues on the AMD side was using boot_cpu_data too early and the Intel side uses it too. Can you replace those checks with is_tdx_guest() or whatever was the helper's name which would check whether the kernel is running as a TDX guest, and see if that helps? Thx. -- Regards/Gruss, Boris.
Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()
On Tue, Sep 21, 2021 at 07:47:15PM +0200, Borislav Petkov wrote: > On Tue, Sep 21, 2021 at 12:04:58PM -0500, Tom Lendacky wrote: > > Looks like instrumentation during early boot. I worked with Boris offline to > > exclude arch/x86/kernel/cc_platform.c from some of the instrumentation and > > that allowed an allyesconfig to boot. > > And here's the lineup I have so far, I'd appreciate it if ppc and s390 folks > could run it too: > > https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc2-cc Still broken for me with allyesconfig. gcc version 11.2.0 (Gentoo 11.2.0 p1) GNU ld (Gentoo 2.37_p1 p0) 2.37 I still believe calling cc_platform_has() from __startup_64() is totally broken as it lacks proper wrapping while accessing global variables. I think sme_get_me_mask() has the same problem. It just happened to work (until the next compiler update). This hack makes the kernel boot again: diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index f98c76a1d16c..e9110a44bf1b 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr, * there is no need to zero it after changing the memory encryption * attribute. */ - if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) { + if (0 && cc_platform_has(CC_ATTR_MEM_ENCRYPT)) { vaddr = (unsigned long)__start_bss_decrypted; vaddr_end = (unsigned long)__end_bss_decrypted; for (; vaddr < vaddr_end; vaddr += PMD_SIZE) { diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c index eff4d19f9cb4..91638ed0b1db 100644 --- a/arch/x86/mm/mem_encrypt_identity.c +++ b/arch/x86/mm/mem_encrypt_identity.c @@ -288,7 +288,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp) unsigned long pgtable_area_len; unsigned long decrypted_base; - if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) + if (1 || !cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) return; /* -- Kirill A.
Shutemov ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 05/20] vfio/pci: Register device to /dev/vfio/devices
On Tue, 21 Sep 2021 13:40:01 -0300 Jason Gunthorpe wrote: > On Sun, Sep 19, 2021 at 02:38:33PM +0800, Liu Yi L wrote: > > This patch exposes the device-centric interface for vfio-pci devices. To > > be compatiable with existing users, vfio-pci exposes both legacy group > > interface and device-centric interface. > > > > As explained in last patch, this change doesn't apply to devices which > > cannot be forced to snoop cache by their upstream iommu. Such devices > > are still expected to be opened via the legacy group interface. This doesn't make much sense to me. The previous patch indicates there's work to be done in updating the kvm-vfio contract to understand DMA coherency, so you're trying to limit use cases to those where the IOMMU enforces coherency, but there's QEMU work to be done to support the iommufd uAPI at all. Isn't part of that work to understand how KVM will be told about non-coherent devices rather than "meh, skip it in the kernel"? Also let's not forget that vfio is not only for KVM. > > When the device is opened via /dev/vfio/devices, vfio-pci should prevent > > the user from accessing the assigned device because the device is still > > attached to the default domain which may allow user-initiated DMAs to > > touch arbitrary place. The user access must be blocked until the device > > is later bound to an iommufd (see patch 08). The binding acts as the > > contract for putting the device in a security context which ensures user- > > initiated DMAs via this device cannot harm the rest of the system. > > > > This patch introduces a vdev->block_access flag for this purpose. It's set > > when the device is opened via /dev/vfio/devices and cleared after binding > > to iommufd succeeds. mmap and r/w handlers check this flag to decide whether > > user access should be blocked or not. > > This should not be in vfio_pci. 
> > AFAIK there is no condition where a vfio driver can work without being > connected to some kind of iommu back end, so the core code should > handle this interlock globally. A vfio driver's ops should not be > callable until the iommu is connected. > > The only vfio_pci patch in this series should be adding a new callback > op to take in an iommufd and register the pci_device as a iommufd > device. Couldn't the same argument be made that registering a $bus device as an iommufd device is a common interface that shouldn't be the responsibility of the vfio device driver? Is userspace opening the non-group device anything more than a reservation of that device if access is withheld until iommu isolation? I also don't really want to predict how ioctls might evolve to guess whether only blocking .read, .write, and .mmap callbacks are sufficient. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 02/13] dt-bindings: memory: mediatek: Add mt8195 smi sub common
On Tue, 14 Sep 2021 19:36:52 +0800, Yong Wu wrote: > Add the binding for smi-sub-common. The SMI block diagram like this: > > IOMMU > | | > smi-common > -- > | | > larb0 larb7 <-max is 8 > > The smi-common connects with smi-larb and IOMMU. The maximum larbs number > that connects with a smi-common is 8. If the engines number is over 8, > sometimes we use a smi-sub-common which is nearly same with smi-common. > It supports up to 8 input and 1 output(smi-common has 2 output) > > Something like: > > IOMMU > | | > smi-common > - > | | ... > larb0 sub-common ... <-max is 8 > --- >||... <-max is 8 too. > larb2 larb5 > > We don't need extra SW setting for smi-sub-common, only the sub-common has > special clocks need to enable when the engines access dram. > > If it is sub-common, it should have a "mediatek,smi" phandle to point to > its smi-common. meanwhile the sub-common only has one gals clock. > > Signed-off-by: Yong Wu > --- > change note: add "else mediatek,smi: false". > --- > .../mediatek,smi-common.yaml | 28 +++ > 1 file changed, 28 insertions(+) > Reviewed-by: Rob Herring ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Sun, 19 Sep 2021 14:38:30 +0800 Liu Yi L wrote: > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > userspace to directly open a vfio device w/o relying on container/group > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > iommufd (more specifically in iommu core by this RFC) in a device-centric > manner. > > In case a device is exposed in both legacy and new interfaces (see next > patch for how to decide it), this patch also ensures that when the device > is already opened via one interface then the other one must be blocked. > > Signed-off-by: Liu Yi L > --- > drivers/vfio/vfio.c | 228 +++ > include/linux/vfio.h | 2 + > 2 files changed, 213 insertions(+), 17 deletions(-) > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > index 02cc51ce6891..84436d7abedd 100644 > --- a/drivers/vfio/vfio.c > +++ b/drivers/vfio/vfio.c ... > @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = { > .mode = S_IRUGO | S_IWUGO, > }; > > +static char *vfio_device_devnode(struct device *dev, umode_t *mode) > +{ > + return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev)); > +} dev_name() doesn't provide us with any uniqueness guarantees, so this could potentially generate naming conflicts. The similar scheme for devices within an iommu group appends an instance number if a conflict occurs, but that solution doesn't work here where the name isn't just a link to the actual device. Devices within an iommu group are also likely associated within a bus_type, so the potential for conflict is pretty negligible, that's not the case as vfio is adopted for new device types. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 16/20] vfio/type1: Export symbols for dma [un]map code sharing
On Sun, Sep 19, 2021 at 02:38:44PM +0800, Liu Yi L wrote:
> [HACK. will fix in v2]
>
> There are two options to implement vfio type1v2 mapping semantics in
> /dev/iommu.
>
> One is to duplicate the related code from vfio as the starting point,
> and then merge with vfio type1 at a later time. However vfio_iommu_type1.c
> has over 3000 LOC with ~80% related to dma management logic, including:

I can't really see a way forward like this. I think some scheme to move
the vfio datastructure is going to be necessary.

> - the dma map/unmap metadata management
> - page pinning, and related accounting
> - iova range reporting
> - dirty bitmap retrieving
> - dynamic vaddr update, etc.

All of this needs to be part of the iommufd anyhow..

> The alternative is to consolidate type1v2 logic in /dev/iommu immediately,
> which requires converting vfio_iommu_type1 to be a shim driver.

Another choice is that the datastructure could move and the two drivers
could share its code and continue to exist more independently.

Jason
Re: [RFC 15/20] vfio/pci: Add VFIO_DEVICE_[DE]ATTACH_IOASID
On Sun, Sep 19, 2021 at 02:38:43PM +0800, Liu Yi L wrote: > This patch adds interface for userspace to attach device to specified > IOASID. > > Note: > One device can only be attached to one IOASID in this version. This is > on par with what vfio provides today. In the future this restriction can > be relaxed when multiple I/O address spaces are supported per device ?? In VFIO the container is the IOS and the container can be shared with multiple devices. This needs to start at about the same functionality. > + } else if (cmd == VFIO_DEVICE_ATTACH_IOASID) { This should be in the core code, right? There is nothing PCI specific here. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 14/20] iommu/iommufd: Add iommufd_device_[de]attach_ioasid()
On Sun, Sep 19, 2021 at 02:38:42PM +0800, Liu Yi L wrote: > An I/O address space takes effect in the iommu only after it's attached > by a device. This patch provides iommufd_device_[de/at]tach_ioasid() > helpers for this purpose. One device can be only attached to one ioasid > at this point, but one ioasid can be attached by multiple devices. > > The caller specifies the iommufd_device (returned at binding time) and > the target ioasid when calling the helper function. Upon request, iommufd > installs the specified I/O page table to the correct place in the IOMMU, > according to the routing information (struct device* which represents > RID) recorded in iommufd_device. Future variants could allow the caller > to specify additional routing information (e.g. pasid/ssid) when multiple > I/O address spaces are supported per device. > > Open: > Per Jason's comment in below link, bus-specific wrappers are recommended. > This RFC implements one wrapper for pci device. But it looks that struct > pci_device is not used at all since iommufd_ device already carries all > necessary info. So want to have another discussion on its necessity, e.g. > whether making more sense to have bus-specific wrappers for binding, while > leaving a common attaching helper per iommufd_device. > https://lore.kernel.org/linux-iommu/20210528233649.gb3816...@nvidia.com/ > > TODO: > When multiple devices are attached to a same ioasid, the permitted iova > ranges and supported pgsize bitmap on this ioasid should be a common > subset of all attached devices. iommufd needs to track such info per > ioasid and update it every time when a new device is attached to the > ioasid. This has not been done in this version yet, due to the temporary > hack adopted in patch 16-18. The hack reuses vfio type1 driver which > already includes the necessary logic for iova ranges and pgsize bitmap. > Once we get a clear direction for those patches, that logic will be moved > to this patch. 
> > Signed-off-by: Liu Yi L > drivers/iommu/iommufd/iommufd.c | 226 > include/linux/iommufd.h | 29 > 2 files changed, 255 insertions(+) > > diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c > index e45d76359e34..25373a0e037a 100644 > +++ b/drivers/iommu/iommufd/iommufd.c > @@ -51,6 +51,19 @@ struct iommufd_ioas { > bool enforce_snoop; > struct iommufd_ctx *ictx; > refcount_t refs; > + struct mutex lock; > + struct list_head device_list; > + struct iommu_domain *domain; This should just be another xarray indexed by the device id > +/* Caller should hold ioas->lock */ > +static struct ioas_device_info *ioas_find_device(struct iommufd_ioas *ioas, > + struct iommufd_device *idev) > +{ > + struct ioas_device_info *ioas_dev; > + > + list_for_each_entry(ioas_dev, &ioas->device_list, next) { > + if (ioas_dev->idev == idev) > + return ioas_dev; > + } Which eliminates this search. xarray with tightly packed indexes is generally more efficient than linked lists.. > +static int ioas_check_device_compatibility(struct iommufd_ioas *ioas, > +struct device *dev) > +{ > + bool snoop = false; > + u32 addr_width; > + int ret; > + > + /* > + * currently we only support I/O page table with iommu enforce-snoop > + * format. Attaching a device which doesn't support this format in its > + * upstreaming iommu is rejected. > + */ > + ret = iommu_device_get_info(dev, IOMMU_DEV_INFO_FORCE_SNOOP, &snoop); > + if (ret || !snoop) > + return -EINVAL; > + > + ret = iommu_device_get_info(dev, IOMMU_DEV_INFO_ADDR_WIDTH, > &addr_width); > + if (ret || addr_width < ioas->addr_width) > + return -EINVAL; > + > + /* TODO: also need to check permitted iova ranges and pgsize bitmap */ > + > + return 0; > +} This seems kind of weird.. I expect the iommufd to hold a SW copy of the IO page table and each time a new domain is to be created it should push the SW copy into the domain. If the domain cannot support it then the domain driver should naturally fail a request. 
When the user changes the IO page table the SW copy is updated then all of the domains are updated too - again if any domain cannot support the change then it fails and the change is rolled back. It seems like this is a side effect of roughly hacking in the vfio code? > + > +/** > + * iommufd_device_attach_ioasid - attach device to an ioasid > + * @idev: [in] Pointer to struct iommufd_device. > + * @ioasid: [in] ioasid points to an I/O address space. > + * > + * Returns 0 for successful attach, otherwise returns error. > + * > + */ > +int iommufd_device_attach_ioasid(struct iommufd_device *idev, int ioasid) Types for the ioas_id again.. > +{ > + struct iommufd_ioas *ioas; > + struct ioas_device_info *ioas_dev; > +
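The TODO in the commit message above — tracking the permitted iova ranges and pgsize bitmap per ioasid — reduces to intersecting the capabilities of every attached device: supported page sizes AND together, and the address width takes the minimum. A minimal userspace sketch of that intersection (struct and field names here are illustrative, not the RFC's):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-IOAS capability summary (not the RFC's types). */
struct ioas_caps {
	uint64_t pgsize_bitmap;	/* bit N set => 2^N page size usable */
	uint32_t addr_width;	/* usable IOVA bits */
};

/* Narrow the IOAS capabilities each time a device attaches: the
 * result must be honoured by every attached device, so it is the
 * intersection of all of them. */
static void ioas_caps_intersect(struct ioas_caps *ioas,
				const struct ioas_caps *dev)
{
	ioas->pgsize_bitmap &= dev->pgsize_bitmap;
	if (dev->addr_width < ioas->addr_width)
		ioas->addr_width = dev->addr_width;
}
```

Each new attach can only shrink (never grow) what the IOAS advertises, which is why the kernel must re-validate existing mappings, or reject the attach, when the intersection would invalidate them.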
Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()
On Tue, Sep 21, 2021 at 12:04:58PM -0500, Tom Lendacky wrote: > Looks like instrumentation during early boot. I worked with Boris offline to > exclude arch/x86/kernel/cc_platform.c from some of the instrumentation and > that allowed an allyesconfig to boot. And here's the lineup I have so far, I'd appreciate it if ppc and s390 folks could run it too: https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc2-cc Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 12/20] iommu/iommufd: Add IOMMU_CHECK_EXTENSION
On Sun, Sep 19, 2021 at 02:38:40PM +0800, Liu Yi L wrote: > As aforementioned, userspace should check extension for what formats > can be specified when allocating an IOASID. This patch adds such > interface for userspace. In this RFC, iommufd reports EXT_MAP_TYPE1V2 > support and no no-snoop support yet. > > Signed-off-by: Liu Yi L > drivers/iommu/iommufd/iommufd.c | 7 +++ > include/uapi/linux/iommu.h | 27 +++ > 2 files changed, 34 insertions(+) > > diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c > index 4839f128b24a..e45d76359e34 100644 > +++ b/drivers/iommu/iommufd/iommufd.c > @@ -306,6 +306,13 @@ static long iommufd_fops_unl_ioctl(struct file *filep, > return ret; > > switch (cmd) { > + case IOMMU_CHECK_EXTENSION: > + switch (arg) { > + case EXT_MAP_TYPE1V2: > + return 1; > + default: > + return 0; > + } > case IOMMU_DEVICE_GET_INFO: > ret = iommufd_get_device_info(ictx, arg); > break; > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h > index 5cbd300eb0ee..49731be71213 100644 > +++ b/include/uapi/linux/iommu.h > @@ -14,6 +14,33 @@ > #define IOMMU_TYPE (';') > #define IOMMU_BASE 100 > > +/* > + * IOMMU_CHECK_EXTENSION - _IO(IOMMU_TYPE, IOMMU_BASE + 0) > + * > + * Check whether an uAPI extension is supported. > + * > + * It's unlikely that all planned capabilities in IOMMU fd will be ready > + * in one breath. User should check which uAPI extension is supported > + * according to its intended usage. > + * > + * A rough list of possible extensions may include: > + * > + * - EXT_MAP_TYPE1V2 for vfio type1v2 map semantics; > + * - EXT_DMA_NO_SNOOP for no-snoop DMA support; > + * - EXT_MAP_NEWTYPE for an enhanced map semantics; > + * - EXT_MULTIDEV_GROUP for 1:N iommu group; > + * - EXT_IOASID_NESTING for what the name stands; > + * - EXT_USER_PAGE_TABLE for user managed page table; > + * - EXT_USER_PASID_TABLE for user managed PASID table; > + * - EXT_DIRTY_TRACKING for tracking pages dirtied by DMA; > + * - ... 
> + * > + * Return: 0 if not supported, 1 if supported. > + */ > +#define EXT_MAP_TYPE1V2 1 > +#define EXT_DMA_NO_SNOOP 2 > +#define IOMMU_CHECK_EXTENSION_IO(IOMMU_TYPE, IOMMU_BASE + 0) I generally advocate for a 'try and fail' approach to discovering compatibility. If that doesn't work for the userspace then a query to return a generic capability flag is the next best idea. Each flag should clearly define what 'try and fail' it is talking about Eg dma_no_snoop is about creating an IOS with flag NO SNOOP set TYPE1V2 seems like nonsense Not sure about the others. IOW, this should recast to a generic 'query capabilities' IOCTL Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
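Jason's suggestion — recasting IOMMU_CHECK_EXTENSION into a generic "query capabilities" ioctl — amounts to reporting one bitmask of supported features in a single call, with each bit tied to a concrete try-and-fail operation. A hedged userspace sketch of that shape (the capability names are invented for illustration, not part of any uAPI):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits; each one corresponds to a concrete
 * operation userspace could also discover by trying it and failing. */
#define IOMMUFD_CAP_DMA_NO_SNOOP	(UINT64_C(1) << 0)
#define IOMMUFD_CAP_IOASID_NESTING	(UINT64_C(1) << 1)

/* Kernel side of the sketch: report everything in one call instead of
 * answering one extension code at a time. */
static uint64_t iommufd_query_capabilities(void)
{
	return IOMMUFD_CAP_DMA_NO_SNOOP;	/* nesting not implemented */
}

/* Userspace side: test a bit before relying on the feature. */
static int cap_supported(uint64_t caps, uint64_t bit)
{
	return (caps & bit) != 0;
}
```

The bitmask also extends cleanly: new bits default to 0 on old kernels, so old userspace keeps working and new userspace degrades gracefully.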
Re: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE
On Sun, Sep 19, 2021 at 02:38:39PM +0800, Liu Yi L wrote: > This patch adds IOASID allocation/free interface per iommufd. When > allocating an IOASID, userspace is expected to specify the type and > format information for the target I/O page table. > > This RFC supports only one type (IOMMU_IOASID_TYPE_KERNEL_TYPE1V2), > implying a kernel-managed I/O page table with vfio type1v2 mapping > semantics. For this type the user should specify the addr_width of > the I/O address space and whether the I/O page table is created in > an iommu enfore_snoop format. enforce_snoop must be true at this point, > as the false setting requires additional contract with KVM on handling > WBINVD emulation, which can be added later. > > Userspace is expected to call IOMMU_CHECK_EXTENSION (see next patch) > for what formats can be specified when allocating an IOASID. > > Open: > - Devices on PPC platform currently use a different iommu driver in vfio. > Per previous discussion they can also use vfio type1v2 as long as there > is a way to claim a specific iova range from a system-wide address space. > This requirement doesn't sound PPC specific, as addr_width for pci devices > can be also represented by a range [0, 2^addr_width-1]. This RFC hasn't > adopted this design yet. We hope to have formal alignment in v1 discussion > and then decide how to incorporate it in v2. I think the request was to include a start/end IO address hint when creating the ios. When the kernel creates it then it can return the actual geometry including any holes via a query. > - Currently ioasid term has already been used in the kernel (drivers/iommu/ > ioasid.c) to represent the hardware I/O address space ID in the wire. It > covers both PCI PASID (Process Address Space ID) and ARM SSID (Sub-Stream > ID). We need find a way to resolve the naming conflict between the hardware > ID and software handle. One option is to rename the existing ioasid to be > pasid or ssid, given their full names still sound generic. 
Appreciate more > thoughts on this open! ioas works well here I think. Use ioas_id to refer to the xarray index. > Signed-off-by: Liu Yi L > drivers/iommu/iommufd/iommufd.c | 120 > include/linux/iommufd.h | 3 + > include/uapi/linux/iommu.h | 54 ++ > 3 files changed, 177 insertions(+) > > diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c > index 641f199f2d41..4839f128b24a 100644 > +++ b/drivers/iommu/iommufd/iommufd.c > @@ -24,6 +24,7 @@ > struct iommufd_ctx { > refcount_t refs; > struct mutex lock; > + struct xarray ioasid_xa; /* xarray of ioasids */ > struct xarray device_xa; /* xarray of bound devices */ > }; > > @@ -42,6 +43,16 @@ struct iommufd_device { > u64 dev_cookie; > }; > > +/* Represent an I/O address space */ > +struct iommufd_ioas { > + int ioasid; xarray id's should consistently be u32s everywhere. Many of the same prior comments repeated here Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
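The review asks for u32 ids allocated from a tightly packed xarray (xa_alloc()-style). Outside the kernel that behaves like handing out the lowest free slot in an index keyed by the id, which then makes every later lookup a direct index instead of a list walk. A toy sketch, assuming a fixed-size table in place of the real xarray:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the RFC's object; only the id matters here. */
struct iommufd_ioas {
	uint32_t ioas_id;
};

#define MAX_IOAS 64
static struct iommufd_ioas *ioas_table[MAX_IOAS];

/* Allocate the lowest free u32 id, mimicking xa_alloc() with a packed
 * index range. Returns -1 when the table is full. */
static int ioas_alloc_id(struct iommufd_ioas *ioas)
{
	for (uint32_t id = 0; id < MAX_IOAS; id++) {
		if (!ioas_table[id]) {
			ioas_table[id] = ioas;
			ioas->ioas_id = id;
			return 0;
		}
	}
	return -1;
}

/* Lookup is O(1): the id is the index. */
static struct iommufd_ioas *ioas_find(uint32_t id)
{
	return id < MAX_IOAS ? ioas_table[id] : NULL;
}
```

The kernel xarray does the same job without a fixed bound and with RCU-safe lookups; the point of the sketch is only the id-as-index shape.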
Re: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO
On Sun, Sep 19, 2021 at 02:38:38PM +0800, Liu Yi L wrote: > After a device is bound to the iommufd, userspace can use this interface > to query the underlying iommu capability and format info for this device. > Based on this information the user then creates I/O address space in a > compatible format with the to-be-attached devices. > > Device cookie which is registered at binding time is used to mark the > device which is being queried here. > > Signed-off-by: Liu Yi L > drivers/iommu/iommufd/iommufd.c | 68 + > include/uapi/linux/iommu.h | 49 > 2 files changed, 117 insertions(+) > > diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c > index e16ca21e4534..641f199f2d41 100644 > +++ b/drivers/iommu/iommufd/iommufd.c > @@ -117,6 +117,71 @@ static int iommufd_fops_release(struct inode *inode, > struct file *filep) > return 0; > } > > +static struct device * > +iommu_find_device_from_cookie(struct iommufd_ctx *ictx, u64 dev_cookie) > +{ We have an xarray ID for the device, why are we allowing userspace to use the dev_cookie as input? Userspace should always pass in the ID. The only place dev_cookie should appear is if the kernel generates an event back to userspace. Then the kernel should return both the ID and the dev_cookie in the event to allow userspace to correlate it. > +static void iommu_device_build_info(struct device *dev, > + struct iommu_device_info *info) > +{ > + bool snoop; > + u64 awidth, pgsizes; > + > + if (!iommu_device_get_info(dev, IOMMU_DEV_INFO_FORCE_SNOOP, &snoop)) > + info->flags |= snoop ? 
IOMMU_DEVICE_INFO_ENFORCE_SNOOP : 0; > + > + if (!iommu_device_get_info(dev, IOMMU_DEV_INFO_PAGE_SIZE, &pgsizes)) { > + info->pgsize_bitmap = pgsizes; > + info->flags |= IOMMU_DEVICE_INFO_PGSIZES; > + } > + > + if (!iommu_device_get_info(dev, IOMMU_DEV_INFO_ADDR_WIDTH, &awidth)) { > + info->addr_width = awidth; > + info->flags |= IOMMU_DEVICE_INFO_ADDR_WIDTH; > + } Another good option is to push the iommu_device_info uAPI struct down through to the iommu driver to fill it in and forget about the crazy enum. A big part of thinking of this iommu interface is a way to bind the HW IOMMU driver to a uAPI and allow the HW driver to expose its unique functionalities. > +static int iommufd_get_device_info(struct iommufd_ctx *ictx, > +unsigned long arg) > +{ > + struct iommu_device_info info; > + unsigned long minsz; > + struct device *dev; > + > + minsz = offsetofend(struct iommu_device_info, addr_width); > + > + if (copy_from_user(&info, (void __user *)arg, minsz)) > + return -EFAULT; > + > + if (info.argsz < minsz) > + return -EINVAL; All of these patterns everywhere are wrongly coded for forward/back compatibility. static int iommufd_get_device_info(struct iommufd_ctx *ictx, struct iommu_device_info __user *arg, size_t usize) { struct iommu_device_info info; int ret; if (usize < offsetofend(struct iommu_device_info, addr_flags)) return -EINVAL; ret = copy_struct_from_user(&info, sizeof(info), arg, usize); if (ret) return ret; 'usize' should be in a 'common' header extracted by the main ioctl handler. > +struct iommu_device_info { > + __u32 argsz; > + __u32 flags; > +#define IOMMU_DEVICE_INFO_ENFORCE_SNOOP (1 << 0) /* IOMMU enforced > snoop */ > +#define IOMMU_DEVICE_INFO_PGSIZES(1 << 1) /* supported page sizes */ > +#define IOMMU_DEVICE_INFO_ADDR_WIDTH (1 << 2) /* addr_wdith field valid */ > + __u64 dev_cookie; > + __u64 pgsize_bitmap; > + __u32 addr_width; > +}; Be explicit with padding here too. 
Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
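The corrected pattern above relies on copy_struct_from_user(), whose forward/backward-compatibility rule can be illustrated in plain userspace C: a shorter (older) user struct is zero-extended into the kernel struct, and a longer (newer) user struct is accepted only if the unknown trailing bytes are all zero. A sketch of those semantics, not the kernel implementation:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of copy_struct_from_user() semantics:
 *  - usize < ksize: old userspace, new kernel -> zero-extend.
 *  - usize > ksize: new userspace, old kernel -> only OK if the extra
 *    trailing bytes are zero (no unknown features requested).        */
static int copy_struct_compat(void *dst, size_t ksize,
			      const void *src, size_t usize)
{
	size_t size = ksize < usize ? ksize : usize;

	if (usize > ksize) {
		const unsigned char *tail = (const unsigned char *)src + ksize;

		for (size_t i = 0; i < usize - ksize; i++)
			if (tail[i] != 0)
				return -E2BIG;	/* unknown field set */
	}
	memset(dst, 0, ksize);
	memcpy(dst, src, size);
	return 0;
}
```

With this rule the uAPI struct can grow by appending fields, and the size passed by userspace (Jason's 'usize' in a common header) replaces per-ioctl offsetofend() bookkeeping.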
Re: [RFC 08/20] vfio/pci: Add VFIO_DEVICE_BIND_IOMMUFD
On Sun, Sep 19, 2021 at 02:38:36PM +0800, Liu Yi L wrote: > This patch adds VFIO_DEVICE_BIND_IOMMUFD for userspace to bind the vfio > device to an iommufd. No VFIO_DEVICE_UNBIND_IOMMUFD interface is provided > because it's implicitly done when the device fd is closed. > > In concept a vfio device can be bound to multiple iommufds, each hosting > a subset of I/O address spaces attached by this device. However as a > starting point (matching current vfio), only one I/O address space is > supported per vfio device. It implies one device can only be attached > to one iommufd at this point. > > Signed-off-by: Liu Yi L > drivers/vfio/pci/Kconfig| 1 + > drivers/vfio/pci/vfio_pci.c | 72 - > drivers/vfio/pci/vfio_pci_private.h | 8 > include/uapi/linux/vfio.h | 30 > 4 files changed, 110 insertions(+), 1 deletion(-) > > diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig > index 5e2e1b9a9fd3..3abfb098b4dc 100644 > +++ b/drivers/vfio/pci/Kconfig > @@ -5,6 +5,7 @@ config VFIO_PCI > depends on MMU > select VFIO_VIRQFD > select IRQ_BYPASS_MANAGER > + select IOMMUFD > help > Support for the PCI VFIO bus driver. This is required to make > use of PCI drivers using the VFIO framework. 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > index 145addde983b..20006bb66430 100644 > +++ b/drivers/vfio/pci/vfio_pci.c > @@ -552,6 +552,16 @@ static void vfio_pci_release(struct vfio_device > *core_vdev) > vdev->req_trigger = NULL; > } > mutex_unlock(&vdev->igate); > + > + mutex_lock(&vdev->videv_lock); > + if (vdev->videv) { > + struct vfio_iommufd_device *videv = vdev->videv; > + > + vdev->videv = NULL; > + iommufd_unbind_device(videv->idev); > + kfree(videv); > + } > + mutex_unlock(&vdev->videv_lock); > } > > mutex_unlock(&vdev->reflck->lock); > @@ -780,7 +790,66 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev, > container_of(core_vdev, struct vfio_pci_device, vdev); > unsigned long minsz; > > - if (cmd == VFIO_DEVICE_GET_INFO) { > + if (cmd == VFIO_DEVICE_BIND_IOMMUFD) { Choosing to implement this through the ioctl multiplexor is what is causing so much ugly gyration in the previous patches This should be a straightforward new function and ops: struct iommufd_device *vfio_pci_bind_iommufd(struct vfio_device *) { iommu_dev = iommufd_bind_device(bind_data.iommu_fd, &vdev->pdev->dev, bind_data.dev_cookie); if (!iommu_dev) return ERR vdev->iommu_dev = iommu_dev; } static const struct vfio_device_ops vfio_pci_ops = { .bind_iommufd = &*vfio_pci_bind_iommufd If you do the other stuff I said then you'll notice that the iommufd_bind_device() will provide automatic exclusivity. The thread that sees ops->bind_device succeed will know it is the only thread that can see that (by definition, the iommu enable user stuff has to be exclusive and race free) thus it can go ahead and store the iommu pointer. The other half of the problem '&vdev->block_access' is solved by manipulating the filp->f_ops. Start with a fops that can ONLY call the above op. When the above op succeeds switch the fops to the normal full ops. . 
The same flow happens when the group fd spawns the device fd, just parts of iommfd_bind_device are open coded into the vfio code, but the whole flow and sequence should be the same. > + /* > + * Reject the request if the device is already opened and > + * attached to a container. > + */ > + if (vfio_device_in_container(core_vdev)) > + return -ENOTTY; This is wrongly locked > + > + minsz = offsetofend(struct vfio_device_iommu_bind_data, > dev_cookie); > + > + if (copy_from_user(&bind_data, (void __user *)arg, minsz)) > + return -EFAULT; > + > + if (bind_data.argsz < minsz || > + bind_data.flags || bind_data.iommu_fd < 0) > + return -EINVAL; > + > + mutex_lock(&vdev->videv_lock); > + /* > + * Allow only one iommufd per device until multiple > + * address spaces (e.g. vSVA) support is introduced > + * in the future. > + */ > + if (vdev->videv) { > + mutex_unlock(&vdev->videv_lock); > + return -EBUSY; > + } > + > + idev = iommufd_bind_device(bind_data.iommu_fd, > +&vdev->pdev->dev, > +bind_data.dev_co
Re: [RFC 07/20] iommu/iommufd: Add iommufd_[un]bind_device()
On Sun, Sep 19, 2021 at 02:38:35PM +0800, Liu Yi L wrote: > +/* > + * A iommufd_device object represents the binding relationship > + * between iommufd and device. It is created per a successful > + * binding request from device driver. The bound device must be > + * a physical device so far. Subdevice will be supported later > + * (with additional PASID information). An user-assigned cookie > + * is also recorded to mark the device in the /dev/iommu uAPI. > + */ > +struct iommufd_device { > + unsigned int id; > + struct iommufd_ctx *ictx; > + struct device *dev; /* always be the physical device */ > + u64 dev_cookie; > }; > > static int iommufd_fops_open(struct inode *inode, struct file *filep) > @@ -32,15 +52,58 @@ static int iommufd_fops_open(struct inode *inode, struct > file *filep) > return -ENOMEM; > > refcount_set(&ictx->refs, 1); > + mutex_init(&ictx->lock); > + xa_init_flags(&ictx->device_xa, XA_FLAGS_ALLOC); > filep->private_data = ictx; > > return ret; > } > > +static void iommufd_ctx_get(struct iommufd_ctx *ictx) > +{ > + refcount_inc(&ictx->refs); > +} See my earlier remarks about how to structure the lifetime logic, this ref isn't necessary. > +static const struct file_operations iommufd_fops; > + > +/** > + * iommufd_ctx_fdget - Acquires a reference to the internal iommufd context. > + * @fd: [in] iommufd file descriptor. 
> + * > + * Returns a pointer to the iommufd context, otherwise NULL; > + * > + */ > +static struct iommufd_ctx *iommufd_ctx_fdget(int fd) > +{ > + struct fd f = fdget(fd); > + struct file *file = f.file; > + struct iommufd_ctx *ictx; > + > + if (!file) > + return NULL; > + > + if (file->f_op != &iommufd_fops) > + return NULL; Leaks the fdget > + > + ictx = file->private_data; > + if (ictx) > + iommufd_ctx_get(ictx); Use success oriented flow > + fdput(f); > + return ictx; > +} > + */ > +struct iommufd_device *iommufd_bind_device(int fd, struct device *dev, > +u64 dev_cookie) > +{ > + struct iommufd_ctx *ictx; > + struct iommufd_device *idev; > + unsigned long index; > + unsigned int id; > + int ret; > + > + ictx = iommufd_ctx_fdget(fd); > + if (!ictx) > + return ERR_PTR(-EINVAL); > + > + mutex_lock(&ictx->lock); > + > + /* check duplicate registration */ > + xa_for_each(&ictx->device_xa, index, idev) { > + if (idev->dev == dev || idev->dev_cookie == dev_cookie) { > + idev = ERR_PTR(-EBUSY); > + goto out_unlock; > + } I can't think of a reason why this expensive check is needed. > + } > + > + idev = kzalloc(sizeof(*idev), GFP_KERNEL); > + if (!idev) { > + ret = -ENOMEM; > + goto out_unlock; > + } > + > + /* Establish the security context */ > + ret = iommu_device_init_user_dma(dev, (unsigned long)ictx); > + if (ret) > + goto out_free; > + > + ret = xa_alloc(&ictx->device_xa, &id, idev, > +XA_LIMIT(IOMMUFD_DEVID_MIN, IOMMUFD_DEVID_MAX), > +GFP_KERNEL); idev should be fully initialized before being placed in the xarray, so this should be the last thing done. Why not just use the standard xa_limit_32b instead of special single use constants? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
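The fdget leak noted above is the classic unbalanced get/put on an early-return path: the !file and wrong-f_op exits return without fdput(). A userspace mock, with a counter standing in for real file references, shows the leak-free shape — success-oriented flow with the get and put always paired:

```c
#include <assert.h>

/* Counter stands in for the struct file reference taken by fdget(). */
static int refs;

static void mock_fdget(void) { refs++; }
static void mock_fdput(void) { refs--; }

/* Leak-free variant of the fd -> context lookup: every path that
 * follows a successful "get" reaches the matching "put". */
static int ctx_from_fd(int fd_is_iommufd)
{
	int ret = -1;		/* assume failure (-EINVAL) */

	mock_fdget();
	if (fd_is_iommufd)
		ret = 0;	/* would take a ctx reference here */
	mock_fdput();		/* paired on success AND error paths */
	return ret;
}
```

In the real function the ctx reference is taken while the fd reference is still held, so the ctx cannot disappear between the two; the fd reference itself is always dropped before returning.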
Re: [PATCH] iommu/arm-smmu-v3: poll cmdq until it has space
Hi John,

> But is the polarity really correct? That is, if we don't have space,
> then exit with success (the function to check for space).

You are absolutely correct, this is a mistake that I made as I was
resolving conflicts while porting this patch to iommu/next from 5.4,
where I implemented and tested it. It should be:

> -		if (!queue_full(llq))
> +		if (queue_has_space(llq, n))

> what is llq->state->val?

This is another oversight for the same reason: llq->state->val has since
been renamed llq->val.

Will fix both of these in the next revision.

Thanks and kind regards,
--Fernand

From: John Garry
Sent: Tuesday, September 21, 2021 18:22
To: Sieber, Fernand; w...@kernel.org; robin.mur...@arm.com
Cc: linux-arm-ker...@lists.infradead.org; iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
Subject: RE: [EXTERNAL] [PATCH] iommu/arm-smmu-v3: poll cmdq until it has space

On 21/09/2021 12:43, Fernand Sieber wrote:
> 	do {

I didn't follow the full logic of this change yet ...

> 		llq->val = READ_ONCE(cmdq->q.llq.val);
> -		if (!queue_full(llq))
> +		if (!queue_has_space(llq, n))

But is the polarity really correct? That is, if we don't have space,
then exit with success (the function to check for space).

> 			break;
>
> +		/*
> +		 * We must return here even if there's no space, because the producer
> +		 * having moved forward could mean that the last thread observing the
> +		 * SMMU progress has allocated space in the cmdq and moved on, leaving
> +		 * us in this waiting loop with no other thread updating
> +		 * llq->state->val.
> +		 */

what is llq->state->val?

> +		if (llq->prod != prod)
> +			return -EAGAIN;
> +
> 		ret = queue_poll(&qp);

Thanks,
John
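The two fixes being discussed — exiting the loop when the queue has space (not when it is full), and returning -EAGAIN when the producer has moved since the caller's snapshot — can be modeled with a toy queue. Types and constants below are simplified stand-ins, not the driver's:

```c
#include <assert.h>

/* Simplified view of the cmdq positions (no wrap bits). */
struct llq {
	unsigned int prod, cons, nents;
};

static int queue_has_space(const struct llq *q, unsigned int n)
{
	return q->nents - (q->prod - q->cons) >= n;
}

#define POLL_AGAIN (-11)	/* stands in for -EAGAIN */

/* One iteration of the wait loop:
 *  0          -> space available, stop polling (correct polarity)
 *  POLL_AGAIN -> producer moved past our snapshot; the caller must
 *                re-read state rather than spin on a stale view
 *  1          -> still full, no progress: keep polling            */
static int check_space_once(const struct llq *q, unsigned int n,
			    unsigned int observed_prod)
{
	if (queue_has_space(q, n))
		return 0;
	if (q->prod != observed_prod)
		return POLL_AGAIN;
	return 1;
}
```

The -EAGAIN case is what prevents the livelock described in the patch comment: if the last thread watching SMMU progress has already taken its space and moved on, nobody else will update the waiter's cached state, so the waiter must go back and re-read it.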
Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
On Sun, Sep 19, 2021 at 02:38:34PM +0800, Liu Yi L wrote:
> From: Lu Baolu
>
> This extends the iommu core to manage the security context for passthrough
> devices. Please bear with a long explanation for how we reached this design
> instead of managing it solely in iommufd like what vfio does today.
>
> Devices which cannot be isolated from each other are organized into an
> iommu group. When a device is assigned to the user space, the entire
> group must be put in a security context so that user-initiated DMAs via
> the assigned device cannot harm the rest of the system. No user access
> should be granted on a device before the security context is established
> for the group which the device belongs to.
>
> Managing the security context must meet the below criteria:
>
> 1) The group is viable for user-initiated DMAs. This implies that the
>    devices in the group must be either bound to a device-passthrough

s/a/the same/

>    framework, or driver-less, or bound to a driver which is known safe
>    (not doing DMA).
>
> 2) The security context should only allow DMA to the user's memory and
>    devices in this group;
>
> 3) After the security context is established for the group, the group
>    viability must be continuously monitored before the user relinquishes
>    all devices belonging to the group. The viability might be broken e.g.
>    when a driver-less device is later bound to a driver which does DMA.
>
> 4) The security context should not be destroyed before user access
>    permission is withdrawn.
>
> Existing vfio introduces explicit container/group semantics in its uAPI
> to meet the above requirements. A single security context (iommu domain)
> is created per container. Attaching a group to a container moves the entire
> group into the associated security context, and vice versa. The user can
> open the device only after group attach. A group can be detached only
> after all devices in the group are closed. Group viability is monitored
> by listening to iommu group events.
>
> Unlike vfio, iommufd adopts a device-centric design with all group
> logistics hidden behind the fd. Binding a device to iommufd serves
> as the contract to get the security context established (and vice versa
> for unbinding). One additional requirement in iommufd is to manage the
> switch between multiple security contexts due to decoupled bind/attach:

This should be a precursor series that actually does clean things up properly. There is no reason for vfio and iommufd to differ here; if we are implementing this logic in the iommu layer then it should be deleted from the VFIO layer, not left duplicated like this.

IIRC in VFIO the container is the IOAS, and when the group goes to create the device fd it should simply do the iommu_device_init_user_dma() followed immediately by a call to bind the container IOAS as your #3.

Then delete all the group viability stuff from vfio, relying on the iommu to do it. It should have full symmetry with the iommufd.

> @@ -1664,6 +1671,17 @@ static int iommu_bus_notifier(struct notifier_block *nb,
>  		group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
>  		break;
>  	case BUS_NOTIFY_BOUND_DRIVER:
> +		/*
> +		 * FIXME: Alternatively the attached drivers could generically
> +		 * indicate to the iommu layer that they are safe for keeping
> +		 * the iommu group user viable by calling some function around
> +		 * probe(). We could eliminate this gross BUG_ON() by denying
> +		 * probe to non-iommu-safe drivers.
> +		 */
> +		mutex_lock(&group->mutex);
> +		if (group->user_dma_owner_id)
> +			BUG_ON(!iommu_group_user_dma_viable(group));
> +		mutex_unlock(&group->mutex);

And the mini-series should fix this BUG_ON properly by interlocking with the driver core to simply refuse to bind a driver under these conditions, instead of allowing userspace to crash the kernel. That alone would be justification enough to merge this work.

> +
> +/*
> + * IOMMU core interfaces for iommufd.
> + */
> +
> +/*
> + * FIXME: We currently simply follow vfio policy to maintain the group's
> + * viability to the user. Eventually, we should avoid the hard-coded list
> + * below by letting drivers indicate to the iommu layer that they are safe
> + * for keeping the iommu group's user availability.
> + */
> +static const char * const iommu_driver_allowed[] = {
> +	"vfio-pci",
> +	"pci-stub"
> +};

Yuk. This should be done with some callback in those drivers, 'iommu_allow_user_dma()'.

Ie the basic flow would see the driver core doing some:

  ret = iommu_doing_kernel_dma()
  if (ret) do not bind
  driver_bind
   pci_stub_probe()
    iommu_allow_user_dma()

And the various functions are manipulating some atomic:

  0 = nothing happening
  1 = kernel DMA
  2 = user DMA

No BUG_ON.

Jason
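The tri-state atomic Jason sketches can be modeled in standalone C11. This is illustrative only: the function names come from the email, the error value stands in for -EBUSY, and a real implementation would refcount kernel-DMA users rather than treat the state as a simple flag.

```c
#include <assert.h>
#include <stdatomic.h>

/* 0 = nothing happening, 1 = kernel DMA, 2 = user DMA */
enum { DMA_NONE = 0, DMA_KERNEL = 1, DMA_USER = 2 };

static _Atomic int group_dma_state = DMA_NONE;

/* Driver core calls this before binding a normal (DMA-capable) driver. */
static int iommu_doing_kernel_dma(void)
{
	int expected = DMA_NONE;

	if (atomic_compare_exchange_strong(&group_dma_state, &expected,
					   DMA_KERNEL))
		return 0;
	/* Already kernel-owned is fine (a real version would refcount). */
	return expected == DMA_KERNEL ? 0 : -16;	/* -EBUSY stand-in */
}

/* A user-safe driver (pci-stub, vfio-pci) calls this from probe(). */
static int iommu_allow_user_dma(void)
{
	int expected = DMA_NONE;

	if (atomic_compare_exchange_strong(&group_dma_state, &expected,
					   DMA_USER))
		return 0;
	return expected == DMA_USER ? 0 : -16;
}

/* Releasing ownership returns the group to the neutral state. */
static void iommu_release_dma(void)
{
	atomic_store(&group_dma_state, DMA_NONE);
}
```

Because binding simply fails with -EBUSY when the states conflict, userspace can never reach the BUG_ON condition: a non-safe driver is refused while user DMA owns the group, and vice versa.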
Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()
On 9/20/21 2:23 PM, Kirill A. Shutemov wrote:
> On Wed, Sep 08, 2021 at 05:58:36PM -0500, Tom Lendacky wrote:
>> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
>> index 470b20208430..eff4d19f9cb4 100644
>> --- a/arch/x86/mm/mem_encrypt_identity.c
>> +++ b/arch/x86/mm/mem_encrypt_identity.c
>> @@ -30,6 +30,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>  #include
>>  #include
>> @@ -287,7 +288,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
>>  	unsigned long pgtable_area_len;
>>  	unsigned long decrypted_base;
>> -	if (!sme_active())
>> +	if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
>>  		return;
>
> This change breaks boot for me (in KVM on an Intel host). It only reproduces
> with allyesconfig. A more reasonable config works fine, but I didn't try to
> find the exact cause in the config.

Looks like instrumentation during early boot. I worked with Boris offline to exclude arch/x86/kernel/cc_platform.c from some of the instrumentation, and that allowed an allyesconfig to boot.

Thanks,
Tom

> Conversion to cc_platform_has() in __startup_64() in 8/8 has the same effect.
> I believe it is caused by sme_me_mask access from __startup_64() without the
> fixup_pointer() magic.
>
> I think __startup_64() requires special treatment and we should avoid
> cc_platform_has() there (or have a special version of the helper). Note that
> only AMD requires these cc_platform_has() calls to return true.
Re: [RFC 05/20] vfio/pci: Register device to /dev/vfio/devices
On Sun, Sep 19, 2021 at 02:38:33PM +0800, Liu Yi L wrote:
> This patch exposes the device-centric interface for vfio-pci devices. To
> be compatible with existing users, vfio-pci exposes both the legacy group
> interface and the device-centric interface.
>
> As explained in the last patch, this change doesn't apply to devices which
> cannot be forced to snoop cache by their upstream iommu. Such devices
> are still expected to be opened via the legacy group interface.
>
> When the device is opened via /dev/vfio/devices, vfio-pci should prevent
> the user from accessing the assigned device because the device is still
> attached to the default domain, which may allow user-initiated DMAs to
> touch arbitrary places. The user access must be blocked until the device
> is later bound to an iommufd (see patch 08). The binding acts as the
> contract for putting the device in a security context which ensures user-
> initiated DMAs via this device cannot harm the rest of the system.
>
> This patch introduces a vdev->block_access flag for this purpose. It's set
> when the device is opened via /dev/vfio/devices and cleared after binding
> to iommufd succeeds. mmap and r/w handlers check this flag to decide whether
> user access should be blocked or not.

This should not be in vfio_pci.

AFAIK there is no condition where a vfio driver can work without being connected to some kind of iommu back end, so the core code should handle this interlock globally. A vfio driver's ops should not be callable until the iommu is connected.

The only vfio_pci patch in this series should be adding a new callback op to take in an iommufd and register the pci_device as an iommufd device.

Jason
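The interlock being debated, with access blocked until an iommu back end is attached, can be shown with a small standalone model. This is not the vfio code: the struct, field, and function names here are hypothetical stand-ins that capture the shape of the gate (and, per the review, this check would live in the vfio core, not in vfio-pci).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a vfio device fd's state. */
struct vfio_device_model {
	bool iommu_attached;	/* set once bound to an iommufd/container */
};

/* All fd ops fail until the device is inside a security context. */
static long model_rw(struct vfio_device_model *dev, char *buf, size_t len)
{
	(void)buf;
	if (!dev->iommu_attached)
		return -1;	/* stand-in for -ENODEV: no security context */
	return (long)len;	/* would perform the actual access */
}

/* Binding to the iommu back end establishes the security context. */
static int model_bind_iommufd(struct vfio_device_model *dev)
{
	/* ...real code would establish the context via the iommu layer... */
	dev->iommu_attached = true;
	return 0;
}
```

Handled in the core like this, every vfio driver gets the interlock for free instead of each driver carrying its own `block_access` flag.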
Re: [RFC 04/20] iommu: Add iommu_device_get_info interface
On Sun, Sep 19, 2021 at 02:38:32PM +0800, Liu Yi L wrote:
> From: Lu Baolu
>
> This provides an interface for upper layers to get the per-device iommu
> attributes.
>
>     int iommu_device_get_info(struct device *dev,
>                               enum iommu_devattr attr, void *data);

Can't we use properly typed ops and functions here instead of a void *data?

  get_snoop()
  get_page_size()
  get_addr_width()

?

Jason
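A minimal sketch of the typed alternative being suggested, one ops function per attribute instead of a `void *data` multiplexer. The ops struct, field names, and the example values (48-bit address width, 4K pages) are assumptions for illustration, not the actual kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical typed ops: each attribute has its own accessor and type. */
struct iommu_hw_info_ops {
	bool     (*get_snoop)(void);	    /* can the iommu force snoop? */
	uint64_t (*get_page_size)(void);    /* supported page size bitmap */
	uint32_t (*get_addr_width)(void);   /* DMA address width in bits */
};

/* Example provider, loosely modeled on a 48-bit, 4K-page IOMMU. */
static bool     ex_snoop(void)      { return true; }
static uint64_t ex_page_size(void)  { return 1ULL << 12; }
static uint32_t ex_addr_width(void) { return 48; }

static const struct iommu_hw_info_ops example_ops = {
	.get_snoop	= ex_snoop,
	.get_page_size	= ex_page_size,
	.get_addr_width	= ex_addr_width,
};
```

The compiler now checks each attribute's type at the call site, which the `enum attr` + `void *data` form cannot do.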
Re: [PATCH] iommu/arm-smmu-v3: poll cmdq until it has space
On 21/09/2021 12:43, Fernand Sieber wrote:
> do {

I didn't follow the full logic of this change yet ...

> 	llq->val = READ_ONCE(cmdq->q.llq.val);
> -	if (!queue_full(llq))
> +	if (!queue_has_space(llq, n))

But is the polarity really correct? That is, if we don't have space, then exit with success (the function to check for space).

> 		break;
>
> +	/*
> +	 * We must return here even if there's no space, because the producer
> +	 * having moved forward could mean that the last thread observing the
> +	 * SMMU progress has allocated space in the cmdq and moved on, leaving
> +	 * us in this waiting loop with no other thread updating
> +	 * llq->state->val.

what is llq->state->val?

> +	 */
> +	if (llq->prod != prod)
> +		return -EAGAIN;
> +
> 	ret = queue_poll(&qp);

Thanks,
John

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH RFC v1 10/11] uapi/virtio-iommu: Add a new request type to send page response
On Fri, Apr 23, 2021 at 03:21:46PM +0530, Vivek Gautam wrote:
> Once the page faults are handled, the response has to be sent to the
> virtio-iommu backend, from where it can be sent to the host to
> prepare the response to an io page fault generated by the device.
> Add a new virt-queue request type to handle this.
>
> Signed-off-by: Vivek Gautam
> ---
>  include/uapi/linux/virtio_iommu.h | 18 ++
>  1 file changed, 18 insertions(+)
>
> diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
> index c12d9b6a7243..1b174b98663a 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -48,6 +48,7 @@ struct virtio_iommu_config {
>  #define VIRTIO_IOMMU_T_PROBE		0x05
>  #define VIRTIO_IOMMU_T_ATTACH_TABLE	0x06
>  #define VIRTIO_IOMMU_T_INVALIDATE	0x07
> +#define VIRTIO_IOMMU_T_PAGE_RESP	0x08
>
>  /* Status types */
>  #define VIRTIO_IOMMU_S_OK		0x00
> @@ -70,6 +71,23 @@ struct virtio_iommu_req_tail {
>  	__u8	reserved[3];
>  };
>
> +struct virtio_iommu_req_page_resp {
> +	struct virtio_iommu_req_head	head;
> +	__le32				domain;

I don't think we need this field, since the fault report doesn't come with a domain.

> +	__le32				endpoint;
> +#define VIRTIO_IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)

To be consistent with the rest of the document this would be VIRTIO_IOMMU_PAGE_RESP_F_PASID_VALID.

> +	__le32				flags;
> +	__le32				pasid;
> +	__le32				grpid;
> +#define VIRTIO_IOMMU_PAGE_RESP_SUCCESS	(0x0)
> +#define VIRTIO_IOMMU_PAGE_RESP_INVALID	(0x1)
> +#define VIRTIO_IOMMU_PAGE_RESP_FAILURE	(0x2)
> +	__le16				resp_code;
> +	__u8				pasid_valid;

This field isn't needed since there already is VIRTIO_IOMMU_PAGE_RESP_PASID_VALID.

> +	__u8				reserved[9];
> +	struct virtio_iommu_req_tail	tail;
> +};

I'd align the size of the struct to 16 bytes, but I don't think that's strictly necessary.
Thanks,
Jean

> +
>  struct virtio_iommu_req_attach {
>  	struct virtio_iommu_req_head	head;
>  	__le32				domain;
> --
> 2.17.1
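Folding the review comments together, one possible revised layout looks like the sketch below. This is a standalone model, not the final uapi: it drops `domain` and `pasid_valid`, uses the `_F_PASID_VALID` naming, and pads the struct to a 16-byte multiple; the `req_head`/`req_tail` shapes mirror the existing 4-byte virtio-iommu header and tail, and plain `uint32_t` stands in for `__le32`.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_IOMMU_PAGE_RESP_F_PASID_VALID	(1u << 0)

/* 4-byte head/tail, mirroring virtio_iommu_req_head/_tail. */
struct req_head { uint8_t type; uint8_t reserved[3]; };
struct req_tail { uint8_t status; uint8_t reserved[3]; };

/* Revised page response: no domain, no pasid_valid byte, 32 bytes total. */
struct req_page_resp {
	struct req_head	head;		/* offset  0 */
	uint32_t	endpoint;	/* offset  4 */
	uint32_t	flags;		/* offset  8: _F_PASID_VALID etc. */
	uint32_t	pasid;		/* offset 12 */
	uint32_t	grpid;		/* offset 16 */
	uint16_t	resp_code;	/* offset 20: SUCCESS/INVALID/FAILURE */
	uint8_t		reserved[6];	/* offset 22: pad to 16-byte multiple */
	struct req_tail	tail;		/* offset 28 */
};
```

Every field is naturally aligned, so no hidden padding is inserted and the struct comes out at exactly 32 bytes.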
Re: [PATCH RFC v1 09/11] iommu/virtio: Implement sva bind/unbind calls
On Fri, Apr 23, 2021 at 03:21:45PM +0530, Vivek Gautam wrote: > SVA bind and unbind implementations will allow to prepare translation > context with CPU page tables that can be programmed into host iommu > hardware to realize shared address space utilization between the CPU > and virtualized devices using virtio-iommu. > > Signed-off-by: Vivek Gautam > --- > drivers/iommu/virtio-iommu.c | 199 +- > include/uapi/linux/virtio_iommu.h | 2 + > 2 files changed, 199 insertions(+), 2 deletions(-) > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > index 250c137a211b..08f1294baeab 100644 > --- a/drivers/iommu/virtio-iommu.c > +++ b/drivers/iommu/virtio-iommu.c > @@ -14,6 +14,9 @@ > #include > #include > #include > +#include > +#include > +#include > #include > #include > #include > @@ -28,6 +31,7 @@ > #include > #include "iommu-pasid-table.h" > #include "iommu-sva-lib.h" > +#include "io-pgtable-arm.h" Is this used here? > > #define MSI_IOVA_BASE0x800 > #define MSI_IOVA_LENGTH 0x10 > @@ -41,6 +45,7 @@ DEFINE_XARRAY_ALLOC1(viommu_asid_xa); > > static DEFINE_MUTEX(sva_lock); > static DEFINE_MUTEX(iopf_lock); > +static DEFINE_MUTEX(viommu_asid_lock); > > struct viommu_dev_pri_work { > struct work_struct work; > @@ -88,10 +93,22 @@ struct viommu_mapping { > struct viommu_mm { > int pasid; > u64 archid; > + struct viommu_sva_bond *bond; > struct io_pgtable_ops *ops; > struct viommu_domain*domain; > }; > > +struct viommu_sva_bond { > + struct iommu_svasva; > + struct mm_struct*mm; > + struct iommu_psdtable_mmu_notifier *viommu_mn; > + struct list_headlist; > + refcount_t refs; > +}; > + > +#define sva_to_viommu_bond(handle) \ > + container_of(handle, struct viommu_sva_bond, sva) > + > struct viommu_domain { > struct iommu_domain domain; > struct viommu_dev *viommu; > @@ -136,6 +153,7 @@ struct viommu_endpoint { > boolpri_supported; > boolsva_enabled; > booliopf_enabled; > + struct list_headbonds; > }; > > struct viommu_ep_entry { > @@ -1423,14 +1441,15 
@@ static int viommu_attach_pasid_table(struct > viommu_endpoint *vdev, > > pst_cfg->iommu_dev = viommu->dev->parent; > > + mutex_lock(&viommu_asid_lock); > /* Prepare PASID tables info to allocate a new table */ > ret = viommu_prepare_pst(vdev, pst_cfg, fmt); > if (ret) > - return ret; > + goto err_out_unlock; > > ret = iommu_psdtable_alloc(tbl, pst_cfg); > if (ret) > - return ret; > + goto err_out_unlock; > > pst_cfg->iommu_dev = viommu->dev->parent; > pst_cfg->fmt = PASID_TABLE_ARM_SMMU_V3; > @@ -1452,6 +1471,7 @@ static int viommu_attach_pasid_table(struct > viommu_endpoint *vdev, > if (ret) > goto err_free_ops; > } > + mutex_unlock(&viommu_asid_lock); > } else { > /* TODO: otherwise, check for compatibility with vdev. */ > return -ENOSYS; > @@ -1467,6 +1487,8 @@ static int viommu_attach_pasid_table(struct > viommu_endpoint *vdev, > err_free_psdtable: > iommu_psdtable_free(tbl, &tbl->cfg); > > +err_out_unlock: > + mutex_unlock(&viommu_asid_lock); > return ret; > } > > @@ -1706,6 +1728,7 @@ static struct iommu_device *viommu_probe_device(struct > device *dev) > vdev->dev = dev; > vdev->viommu = viommu; > INIT_LIST_HEAD(&vdev->resv_regions); > + INIT_LIST_HEAD(&vdev->bonds); > dev_iommu_priv_set(dev, vdev); > > if (viommu->probe_size) { > @@ -1755,6 +1778,175 @@ static int viommu_of_xlate(struct device *dev, struct > of_phandle_args *args) > return iommu_fwspec_add_ids(dev, args->args, 1); > } > > +static u32 viommu_sva_get_pasid(struct iommu_sva *handle) > +{ > + struct viommu_sva_bond *bond = sva_to_viommu_bond(handle); > + > + return bond->mm->pasid; > +} > + > +static void viommu_mmu_notifier_free(struct mmu_notifier *mn) > +{ > + kfree(mn_to_pstiommu(mn)); > +} > + > +static struct mmu_notifier_ops viommu_mmu_notifier_ops = { > + .free_notifier = viommu_mmu_notifier_free, .invalidate_range and .release will be needed as well, to keep up to date with changes to the address space > +}; > + > +/* Allocate or get existing MMU notifier for this {domain, mm} pair 
*/ > +static struct iommu_psdtable_mmu_notifier * >
Re: [PATCH RFC v1 08/11] iommu/arm-smmu-v3: Implement shared context alloc and free ops
On Fri, Apr 23, 2021 at 03:21:44PM +0530, Vivek Gautam wrote: > Implementing the alloc_shared_cd and free_shared_cd in cd-lib, and > start using them for arm-smmu-v3-sva implementation. > > Signed-off-by: Vivek Gautam > --- > .../arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c | 71 > .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 83 --- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 - > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 14 > 4 files changed, 73 insertions(+), 96 deletions(-) > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c > index 537b7c784d40..b87829796596 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c > @@ -285,16 +285,14 @@ static bool arm_smmu_free_asid(struct xarray *xa, void > *cookie_cd) > * descriptor is using it, try to replace it. > */ > static struct arm_smmu_ctx_desc * > -arm_smmu_share_asid(struct mm_struct *mm, u16 asid) > +arm_smmu_share_asid(struct iommu_pasid_table *tbl, struct mm_struct *mm, > + struct xarray *xa, u16 asid, u32 asid_bits) xa and asid_bits could be stored in some arch-specific section of the iommu_pasid_table struct. Other table drivers wouldn't need those arguments. More a comment for the parent series: it may be clearer to give a different prefix to functions in this file (arm_smmu_cd_, pst_arm_?). Reading this patch I'm a little confused by what belongs in the IOMMU driver and what is done by this library. (I also keep reading 'tbl' as 'tlb'. 
Maybe we could make it 'table' since that doesn't take a lot more space) > { > int ret; > u32 new_asid; > struct arm_smmu_ctx_desc *cd; > - struct arm_smmu_device *smmu; > - struct arm_smmu_domain *smmu_domain; > - struct iommu_pasid_table *tbl; > > - cd = xa_load(&arm_smmu_asid_xa, asid); > + cd = xa_load(xa, asid); > if (!cd) > return NULL; > > @@ -306,12 +304,8 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid) > return cd; > } > > - smmu_domain = container_of(cd, struct arm_smmu_domain, s1_cfg.cd); > - smmu = smmu_domain->smmu; > - tbl = smmu_domain->tbl; > - > - ret = xa_alloc(&arm_smmu_asid_xa, &new_asid, cd, > -XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); > + ret = xa_alloc(xa, &new_asid, cd, XA_LIMIT(1, (1 << asid_bits) - 1), > +GFP_KERNEL); > if (ret) > return ERR_PTR(-ENOSPC); > /* > @@ -325,48 +319,52 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid) >* be some overlap between use of both ASIDs, until we invalidate the >* TLB. >*/ > - ret = iommu_psdtable_write(tbl, &tbl->cfg, 0, cd); > + ret = arm_smmu_write_ctx_desc(&tbl->cfg, 0, cd); > if (ret) > return ERR_PTR(-ENOSYS); > > /* Invalidate TLB entries previously associated with that context */ > - iommu_psdtable_flush_tlb(tbl, smmu_domain, asid); > + iommu_psdtable_flush_tlb(tbl, tbl->cookie, asid); > > - xa_erase(&arm_smmu_asid_xa, asid); > + xa_erase(xa, asid); > return NULL; > } > > -struct arm_smmu_ctx_desc * > -arm_smmu_alloc_shared_cd(struct iommu_pasid_table *tbl, struct mm_struct *mm) > +static struct iommu_psdtable_mmu_notifier * > +arm_smmu_alloc_shared_cd(struct iommu_pasid_table *tbl, struct mm_struct *mm, > + struct xarray *xa, u32 asid_bits) > { > u16 asid; > int err = 0; > u64 tcr, par, reg; > struct arm_smmu_ctx_desc *cd; > struct arm_smmu_ctx_desc *ret = NULL; > + struct iommu_psdtable_mmu_notifier *pst_mn; > > asid = arm64_mm_context_get(mm); > if (!asid) > return ERR_PTR(-ESRCH); > > + pst_mn = kzalloc(sizeof(*pst_mn), GFP_KERNEL); > + if (!pst_mn) { > + err = 
-ENOMEM; > + goto out_put_context; > + } > + > cd = kzalloc(sizeof(*cd), GFP_KERNEL); > if (!cd) { > err = -ENOMEM; > - goto out_put_context; > + goto out_free_mn; > } > > refcount_set(&cd->refs, 1); > > - mutex_lock(&arm_smmu_asid_lock); > - ret = arm_smmu_share_asid(mm, asid); > + ret = arm_smmu_share_asid(tbl, mm, xa, asid, asid_bits); > if (ret) { > - mutex_unlock(&arm_smmu_asid_lock); > goto out_free_cd; > } > > - err = xa_insert(&arm_smmu_asid_xa, asid, cd, GFP_KERNEL); > - mutex_unlock(&arm_smmu_asid_lock); > - > + err = xa_insert(xa, asid, cd, GFP_KERNEL); > if (err) > goto out_free_asid; > > @@ -406,21 +404,26 @@ arm_smmu_alloc_shared_cd(struct iommu_pasid_table *tbl, > struct mm_struct *mm) > cd->asid = asid; > cd->mm = mm; > > - return cd; > + pst_mn->vendor.cd = cd; > + return pst_m
Re: [PATCH RFC v1 05/11] iommu/virtio: Add SVA feature and related enable/disable callbacks
On Fri, Apr 23, 2021 at 03:21:41PM +0530, Vivek Gautam wrote: > Add a feature flag to virtio iommu for Shared virtual addressing > (SVA). This feature would indicate the availablily path for handling > device page faults, and the provision for sending page response. In this case the feature should probably be called PAGE_REQUEST or similar. SVA aggregates PF + PASID + shared page tables Thanks, Jean > Also add necessary methods to enable and disable SVA so that the > masters can enable the SVA path. This also requires enabling the > PRI capability on the device. > > Signed-off-by: Vivek Gautam > --- > drivers/iommu/virtio-iommu.c | 268 ++ > include/uapi/linux/virtio_iommu.h | 1 + > 2 files changed, 269 insertions(+) > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > index 3da5f0807711..250c137a211b 100644 > --- a/drivers/iommu/virtio-iommu.c > +++ b/drivers/iommu/virtio-iommu.c > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -26,6 +27,7 @@ > > #include > #include "iommu-pasid-table.h" > +#include "iommu-sva-lib.h" > > #define MSI_IOVA_BASE0x800 > #define MSI_IOVA_LENGTH 0x10 > @@ -37,6 +39,9 @@ > /* Some architectures need an Address Space ID for each page table */ > DEFINE_XARRAY_ALLOC1(viommu_asid_xa); > > +static DEFINE_MUTEX(sva_lock); > +static DEFINE_MUTEX(iopf_lock); > + > struct viommu_dev_pri_work { > struct work_struct work; > struct viommu_dev *dev; > @@ -71,6 +76,7 @@ struct viommu_dev { > > boolhas_map:1; > boolhas_table:1; > + boolhas_sva:1; > }; > > struct viommu_mapping { > @@ -124,6 +130,12 @@ struct viommu_endpoint { > void*pstf; > /* Preferred page table format */ > void*pgtf; > + > + /* sva */ > + boolats_supported; > + boolpri_supported; > + boolsva_enabled; > + booliopf_enabled; > }; > > struct viommu_ep_entry { > @@ -582,6 +594,64 @@ static int viommu_add_pstf(struct viommu_endpoint *vdev, > void *pstf, size_t len) > return 0; > } > > +static int 
viommu_init_ats_pri(struct viommu_endpoint *vdev) > +{ > + struct device *dev = vdev->dev; > + struct pci_dev *pdev = to_pci_dev(dev); > + > + if (!dev_is_pci(vdev->dev)) > + return -EINVAL; > + > + if (pci_ats_supported(pdev)) > + vdev->ats_supported = true; > + > + if (pci_pri_supported(pdev)) > + vdev->pri_supported = true; > + > + return 0; > +} > + > +static int viommu_enable_pri(struct viommu_endpoint *vdev) > +{ > + int ret; > + struct pci_dev *pdev; > + > + /* Let's allow only 4 requests for PRI right now */ > + size_t max_inflight_pprs = 4; > + > + if (!vdev->pri_supported || !vdev->ats_supported) > + return -ENODEV; > + > + pdev = to_pci_dev(vdev->dev); > + > + ret = pci_reset_pri(pdev); > + if (ret) > + return ret; > + > + ret = pci_enable_pri(pdev, max_inflight_pprs); > + if (ret) { > + dev_err(vdev->dev, "cannot enable PRI: %d\n", ret); > + return ret; > + } > + > + return 0; > +} > + > +static void viommu_disable_pri(struct viommu_endpoint *vdev) > +{ > + struct pci_dev *pdev; > + > + if (!dev_is_pci(vdev->dev)) > + return; > + > + pdev = to_pci_dev(vdev->dev); > + > + if (!pdev->pri_enabled) > + return; > + > + pci_disable_pri(pdev); > +} > + > static int viommu_init_queues(struct viommu_dev *viommu) > { > viommu->iopf_pri = iopf_queue_alloc(dev_name(viommu->dev)); > @@ -684,6 +754,10 @@ static int viommu_probe_endpoint(struct viommu_dev > *viommu, struct device *dev) > if (ret) > goto out_free_eps; > > + ret = viommu_init_ats_pri(vdev); > + if (ret) > + goto out_free_eps; > + > kfree(probe); > return 0; > > @@ -1681,6 +1755,194 @@ static int viommu_of_xlate(struct device *dev, struct > of_phandle_args *args) > return iommu_fwspec_add_ids(dev, args->args, 1); > } > > +static bool viommu_endpoint_iopf_supported(struct viommu_endpoint *vdev) > +{ > + /* TODO: support Stall model later */ > + return vdev->pri_supported; > +} > + > +bool viommu_endpoint_sva_supported(struct viommu_endpoint *vdev) > +{ > + struct viommu_dev *viommu = vdev->viommu; > + > + 
if (!viommu->has_sva) > + return false; > + > + return vdev->pasid_bits; > +} > + > +bool viommu_endpoint_sva_enabled(struct viommu_endpoint *vdev) > +{ > + bool enabled; > + > + mutex_lock(&sva_lock); > +
Re: [PATCH RFC v1 03/11] iommu/virtio: Handle incoming page faults
On Fri, Apr 23, 2021 at 03:21:39PM +0530, Vivek Gautam wrote: > Redirect the incoming page faults to the registered fault handler > that can take the fault information such as, pasid, page request > group-id, address and pasid flags. > > Signed-off-by: Vivek Gautam > --- > drivers/iommu/virtio-iommu.c | 80 ++- > include/uapi/linux/virtio_iommu.h | 1 + > 2 files changed, 80 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > index c970f386f031..fd237cad1ce5 100644 > --- a/drivers/iommu/virtio-iommu.c > +++ b/drivers/iommu/virtio-iommu.c > @@ -37,6 +37,13 @@ > /* Some architectures need an Address Space ID for each page table */ > DEFINE_XARRAY_ALLOC1(viommu_asid_xa); > > +struct viommu_dev_pri_work { > + struct work_struct work; > + struct viommu_dev *dev; > + struct virtio_iommu_fault *vfault; > + u32 endpoint; > +}; > + > struct viommu_dev { > struct iommu_device iommu; > struct device *dev; > @@ -49,6 +56,8 @@ struct viommu_dev { > struct list_headrequests; > void*evts; > struct list_headendpoints; > + struct workqueue_struct *pri_wq; > + struct viommu_dev_pri_work *pri_work; IOPF already has a workqueue, so the driver doesn't need one. iommu_report_device_fault() should be fast enough to be called from the event handler. 
> > /* Device configuration */ > struct iommu_domain_geometrygeometry; > @@ -666,6 +675,58 @@ static int viommu_probe_endpoint(struct viommu_dev > *viommu, struct device *dev) > return ret; > } > > +static void viommu_handle_ppr(struct work_struct *work) > +{ > + struct viommu_dev_pri_work *pwork = > + container_of(work, struct viommu_dev_pri_work, > work); > + struct viommu_dev *viommu = pwork->dev; > + struct virtio_iommu_fault *vfault = pwork->vfault; > + struct viommu_endpoint *vdev; > + struct viommu_ep_entry *ep; > + struct iommu_fault_event fault_evt = { > + .fault.type = IOMMU_FAULT_PAGE_REQ, > + }; > + struct iommu_fault_page_request *prq = &fault_evt.fault.prm; > + > + u32 flags = le32_to_cpu(vfault->flags); > + u32 prq_flags = le32_to_cpu(vfault->pr_evt_flags); > + u32 endpoint= pwork->endpoint; > + > + memset(prq, 0, sizeof(struct iommu_fault_page_request)); The fault_evt struct is already initialized > + prq->addr = le64_to_cpu(vfault->address); > + > + if (prq_flags & VIRTIO_IOMMU_FAULT_PRQ_F_LAST_PAGE) > + prq->flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE; > + if (prq_flags & VIRTIO_IOMMU_FAULT_PRQ_F_PASID_VALID) { > + prq->flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; > + prq->pasid = le32_to_cpu(vfault->pasid); > + prq->grpid = le32_to_cpu(vfault->grpid); > + } > + > + if (flags & VIRTIO_IOMMU_FAULT_F_READ) > + prq->perm |= IOMMU_FAULT_PERM_READ; > + if (flags & VIRTIO_IOMMU_FAULT_F_WRITE) > + prq->perm |= IOMMU_FAULT_PERM_WRITE; > + if (flags & VIRTIO_IOMMU_FAULT_F_EXEC) > + prq->perm |= IOMMU_FAULT_PERM_EXEC; > + if (flags & VIRTIO_IOMMU_FAULT_F_PRIV) > + prq->perm |= IOMMU_FAULT_PERM_PRIV; > + > + list_for_each_entry(ep, &viommu->endpoints, list) { > + if (ep->eid == endpoint) { > + vdev = ep->vdev; > + break; > + } > + } > + > + if ((prq_flags & VIRTIO_IOMMU_FAULT_PRQ_F_PASID_VALID) && > + (prq_flags & VIRTIO_IOMMU_FAULT_PRQ_F_NEEDS_PASID)) > + prq->flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID; > + > + if (iommu_report_device_fault(vdev->dev, 
&fault_evt)) > + dev_err(vdev->dev, "Couldn't handle page request\n"); An error likely means that nobody registered a fault handler, but we could display a few more details about the fault that would help debug the endpoint > +} > + > static int viommu_fault_handler(struct viommu_dev *viommu, > struct virtio_iommu_fault *fault) > { > @@ -679,7 +740,13 @@ static int viommu_fault_handler(struct viommu_dev > *viommu, > u32 pasid = le32_to_cpu(fault->pasid); > > if (type == VIRTIO_IOMMU_FAULT_F_PAGE_REQ) { > - dev_info(viommu->dev, "Page request fault - unhandled\n"); > + dev_info_ratelimited(viommu->dev, > + "Page request fault from EP %u\n", > + endpoint); That's rather for debugging the virtio-iommu driver, so should be dev_dbg() (or removed entirely) > + > + viommu->pri_work->vfault = fault; > + viommu->pri_work->endpoint = endpoint; > + queue_work(viommu->pri_wq, &vi
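The flag translation quoted in the patch above can be exercised as a standalone function. The mapping logic mirrors the patch; the flag *values* used here are illustrative constants for the model (only F_READ/F_WRITE/F_EXEC are standard virtio-iommu fault flags, and the generic IOMMU_FAULT_PERM_* values are redefined locally), so treat them as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values; the real ones live in the uapi headers. */
#define VIRTIO_IOMMU_FAULT_F_READ	(1u << 0)
#define VIRTIO_IOMMU_FAULT_F_WRITE	(1u << 1)
#define VIRTIO_IOMMU_FAULT_F_EXEC	(1u << 2)
#define VIRTIO_IOMMU_FAULT_F_PRIV	(1u << 3)

#define IOMMU_FAULT_PERM_READ	(1u << 0)
#define IOMMU_FAULT_PERM_WRITE	(1u << 1)
#define IOMMU_FAULT_PERM_EXEC	(1u << 2)
#define IOMMU_FAULT_PERM_PRIV	(1u << 3)

/* Translate virtio-iommu fault flags to generic fault permissions. */
static uint32_t viommu_fault_to_perm(uint32_t flags)
{
	uint32_t perm = 0;

	if (flags & VIRTIO_IOMMU_FAULT_F_READ)
		perm |= IOMMU_FAULT_PERM_READ;
	if (flags & VIRTIO_IOMMU_FAULT_F_WRITE)
		perm |= IOMMU_FAULT_PERM_WRITE;
	if (flags & VIRTIO_IOMMU_FAULT_F_EXEC)
		perm |= IOMMU_FAULT_PERM_EXEC;
	if (flags & VIRTIO_IOMMU_FAULT_F_PRIV)
		perm |= IOMMU_FAULT_PERM_PRIV;
	return perm;
}
```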
Re: [RFC 03/20] vfio: Add vfio_[un]register_device()
On Sun, Sep 19, 2021 at 02:38:31PM +0800, Liu Yi L wrote:
> With /dev/vfio/devices introduced, a vfio device driver now has three
> options to expose its device to userspace:
>
> a) only the legacy group interface, for devices which haven't been moved to
>    iommufd (e.g. platform devices, sw mdev, etc.);
>
> b) both the legacy group interface and the new device-centric interface, for
>    devices which support iommufd but also want to keep backward
>    compatibility (e.g. pci devices in this RFC);
>
> c) only the new device-centric interface, for new devices which don't carry
>    a backward-compatibility burden (e.g. hw mdev/subdev with pasid);

We shouldn't have 'b'? Where does it come from?

> This patch introduces vfio_[un]register_device() helpers for the device
> drivers to specify the device exposure policy to the vfio core. Hence the
> existing vfio_[un]register_group_dev() become wrappers of the new
> helper functions. The new device-centric interface is described as
> 'nongroup' to differentiate it from the existing 'group' stuff.

Detect what the driver supports based on the ops it declares. There should be a function provided through the ops for the driver to bind to the iommufd.

> One open is how to organize the device nodes under /dev/vfio/devices/.
> This RFC adopts a simple policy by keeping a flat layout with mixed devnames
> from all kinds of devices. The prerequisite of this model is that devnames
> from different bus types are unique formats:

This isn't reliable; the devname should just be vfio0, vfio1, etc. The userspace can learn the correct major/minor by inspecting the sysfs.

This whole concept should disappear into the prior patch that adds the struct device in the first place, and I think most of the code here can be deleted once the struct device is used properly.

Jason
Re: [PATCH RFC v1 02/11] iommu/virtio: Maintain a list of endpoints served by viommu_dev
On Fri, Apr 23, 2021 at 03:21:38PM +0530, Vivek Gautam wrote:
> Keeping a record of the endpoints that are served by the virtio-iommu
> device would help in redirecting page fault requests to the
> correct endpoint device to handle such requests.
>
> Signed-off-by: Vivek Gautam
> ---
>  drivers/iommu/virtio-iommu.c | 21 +
>  1 file changed, 21 insertions(+)
>
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 50039070e2aa..c970f386f031 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -48,6 +48,7 @@ struct viommu_dev {
>  	spinlock_t			request_lock;
>  	struct list_head		requests;
>  	void				*evts;
> +	struct list_head		endpoints;

As we're going to search by ID, an xarray or rb_tree would be more appropriate than a list.

>
>  	/* Device configuration */
>  	struct iommu_domain_geometry	geometry;
> @@ -115,6 +116,12 @@ struct viommu_endpoint {
>  	void				*pgtf;
>  };
>
> +struct viommu_ep_entry {
> +	u32				eid;
> +	struct viommu_endpoint		*vdev;
> +	struct list_head		list;
> +};

No need for a separate struct; I think you can just add the list head and id into viommu_endpoint.
> + > struct viommu_request { > struct list_headlist; > void*writeback; > @@ -573,6 +580,7 @@ static int viommu_probe_endpoint(struct viommu_dev > *viommu, struct device *dev) > size_t probe_len; > struct virtio_iommu_req_probe *probe; > struct virtio_iommu_probe_property *prop; > + struct viommu_ep_entry *ep; > struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); > struct viommu_endpoint *vdev = dev_iommu_priv_get(dev); > > @@ -640,6 +648,18 @@ static int viommu_probe_endpoint(struct viommu_dev > *viommu, struct device *dev) > prop = (void *)probe->properties + cur; > type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK; > } > + if (ret) > + goto out_free; > + > + ep = kzalloc(sizeof(*ep), GFP_KERNEL); > + if (!ep) { > + ret = -ENOMEM; > + goto out_free; > + } > + ep->eid = probe->endpoint; > + ep->vdev = vdev; > + > + list_add(&ep->list, &viommu->endpoints); This should be in viommu_probe_device() (viommu_probe_endpoint() is only called if F_PROBE is negotiated). I think we need a lock for this list/xarray Thanks, Jean > > out_free: > kfree(probe); > @@ -1649,6 +1669,7 @@ static int viommu_probe(struct virtio_device *vdev) > viommu->dev = dev; > viommu->vdev = vdev; > INIT_LIST_HEAD(&viommu->requests); > + INIT_LIST_HEAD(&viommu->endpoints); > > ret = viommu_init_vqs(viommu); > if (ret) > -- > 2.17.1 >
Re: [PATCH RFC v1 01/11] uapi/virtio-iommu: Add page request grp-id and flags information
Hi Vivek, Thanks a lot for your work on this On Fri, Apr 23, 2021 at 03:21:37PM +0530, Vivek Gautam wrote: > Add fault information for group-id and necessary flags for page > request faults that can be handled by page fault handler in > virtio-iommu driver. > > Signed-off-by: Vivek Gautam > Cc: Joerg Roedel > Cc: Will Deacon > Cc: Robin Murphy > Cc: Jean-Philippe Brucker > Cc: Eric Auger > Cc: Alex Williamson > Cc: Kevin Tian > Cc: Jacob Pan > Cc: Liu Yi L > Cc: Lorenzo Pieralisi > Cc: Shameerali Kolothum Thodi > --- > include/uapi/linux/virtio_iommu.h | 13 + > 1 file changed, 13 insertions(+) > > diff --git a/include/uapi/linux/virtio_iommu.h > b/include/uapi/linux/virtio_iommu.h > index f8bf927a0689..accc3318ce46 100644 > --- a/include/uapi/linux/virtio_iommu.h > +++ b/include/uapi/linux/virtio_iommu.h > @@ -307,14 +307,27 @@ struct virtio_iommu_req_invalidate { > #define VIRTIO_IOMMU_FAULT_F_DMA_UNRECOV 1 > #define VIRTIO_IOMMU_FAULT_F_PAGE_REQ2 > > +#define VIRTIO_IOMMU_FAULT_PRQ_F_PASID_VALID (1 << 0) > +#define VIRTIO_IOMMU_FAULT_PRQ_F_LAST_PAGE (1 << 1) > +#define VIRTIO_IOMMU_FAULT_PRQ_F_PRIV_DATA (1 << 2) > +#define VIRTIO_IOMMU_FAULT_PRQ_F_NEEDS_PASID (1 << 3) I don't think this one is necessary here. The NEEDS_PASID flags added by commit 970471914c67 ("iommu: Allow page responses without PASID") mainly helps Linux keep track of things internally. It does tell the fault handler whether to reply with PASID or not, but we don't need that here. The virtio-iommu driver knows whether a PASID is required by looking at the "PRG Response PASID Required" bit in the PCIe capability. For non-PCIe faults (e.g. SMMU stall), I'm guessing we'll need a PROBE property to declare that the endpoint supports recoverable faults anyway, so "PASID required in response" can go through there as well. 
> + > +#define VIRTIO_IOMMU_FAULT_UNREC_F_PASID_VALID (1 << 0) > +#define VIRTIO_IOMMU_FAULT_UNREC_F_ADDR_VALID(1 << 1) > +#define VIRTIO_IOMMU_FAULT_UNREC_F_FETCH_ADDR_VALID (1 << 2) > + > struct virtio_iommu_fault { > __u8reason; > __u8reserved[3]; > __le16 flt_type; > __u8reserved2[2]; > + /* flags is actually permission flags */ It's also used for declaring validity of fields. VIRTIO_IOMMU_FAULT_F_ADDRESS already tells whether the address field is valid, so all the other flags introduced by this patch can go in here. > __le32 flags; > + /* flags for PASID and Page request handling info */ > + __le32 pr_evt_flags; > __le32 endpoint; > __le32 pasid; > + __le32 grpid; I'm not sure why we made it 32-bit in Linux UAPI, it's a little wasteful. PCIe PRGI is 9-bits and SMMU STAG is 16-bits. Since the scope of the grpid is the endpoint, 16-bit means 64k in-flight faults per endpoint, which seems more than enough. New fields must be appended at the end of the struct, because old drivers will expect to find the 'endpoint' field at this offset. You could remove 'reserved3' while adding 'grpid', to keep the struct layout. 
> __u8 reserved3[4];
> __le64 address;
> __u8 reserved4[8];

So the base structure, currently in the spec, looks like this:

	struct virtio_iommu_fault {
		u8   reason;
		u8   reserved[3];
		le32 flags;
		le32 endpoint;
		le32 reserved1;
		le64 address;
	};

	#define VIRTIO_IOMMU_FAULT_F_READ    (1 << 0)
	#define VIRTIO_IOMMU_FAULT_F_WRITE   (1 << 1)
	#define VIRTIO_IOMMU_FAULT_F_ADDRESS (1 << 8)

The extended struct could be:

	struct virtio_iommu_fault {
		u8   reason;
		u8   reserved[3];
		le32 flags;
		le32 endpoint;
		le32 pasid;
		le64 address;
		/* Page request group ID */
		le16 group_id;
		u8   reserved1[6];
		/* For VT-d private data */
		le64 private_data[2];
	};

	#define VIRTIO_IOMMU_FAULT_F_READ       (1 << 0)
	#define VIRTIO_IOMMU_FAULT_F_WRITE      (1 << 1)
	#define VIRTIO_IOMMU_FAULT_F_EXEC       (1 << 2)
	#define VIRTIO_IOMMU_FAULT_F_PRIVILEGED (1 << 3)
	/* Last fault in group */
	#define VIRTIO_IOMMU_FAULT_F_LAST       (1 << 4)
	/* Fault is a recoverable page request and requires a response */
	#define VIRTIO_IOMMU_FAULT_F_PAGE_REQ   (1 << 5)
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote: > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > userspace to directly open a vfio device w/o relying on container/group > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > iommufd (more specifically in iommu core by this RFC) in a device-centric > manner. > > In case a device is exposed in both legacy and new interfaces (see next > patch for how to decide it), this patch also ensures that when the device > is already opened via one interface then the other one must be blocked. > > Signed-off-by: Liu Yi L > --- > drivers/vfio/vfio.c | 228 +++ > include/linux/vfio.h | 2 + > 2 files changed, 213 insertions(+), 17 deletions(-) > +static int vfio_init_device_class(void) > +{ > + int ret; > + > + mutex_init(&vfio.device_lock); > + idr_init(&vfio.device_idr); > + > + /* /dev/vfio/devices/$DEVICE */ > + vfio.device_class = class_create(THIS_MODULE, "vfio-device"); > + if (IS_ERR(vfio.device_class)) > + return PTR_ERR(vfio.device_class); > + > + vfio.device_class->devnode = vfio_device_devnode; > + > + ret = alloc_chrdev_region(&vfio.device_devt, 0, MINORMASK + 1, > "vfio-device"); > + if (ret) > + goto err_alloc_chrdev; > + > + cdev_init(&vfio.device_cdev, &vfio_device_fops); > + ret = cdev_add(&vfio.device_cdev, vfio.device_devt, MINORMASK + 1); > + if (ret) > + goto err_cdev_add; Huh? This is not how cdevs are used. This patch needs rewriting.

The struct vfio_device should gain a 'struct device' and a 'struct cdev' as non-pointer members. The vfio register path should end up doing cdev_device_add() for each vfio_device; the vfio unregister path should do cdev_device_del(). No idr should be needed; an ida is used to allocate minor numbers. The struct device release function should trigger a kfree, which requires some reworking of the callers: vfio_init_group_dev() should do a device_initialize() and vfio_uninit_group_dev() should do a device_put().

The opened atomic is awful.
A newly created fd should start in a state where it has a disabled fops. The only thing the disabled fops can do is register the device to the iommu fd. When successfully registered, the device gets the normal fops. The registration steps should be done under a normal lock inside the vfio_device. If a vfio_device is already registered then further registration should fail. Getting the device fd via the group fd triggers the same sequence as above. Jason
Re: [PATCH] iommu/dart: Remove iommu_flush_ops
On Tue, 21 Sep 2021 16:39:34 +0100, Sven Peter wrote: > > apple_dart_tlb_flush_{all,walk} expect to get a struct apple_dart_domain > but instead get a struct iommu_domain right now. This breaks those two > functions and can lead to kernel panics like the one below. > DART can only invalidate the entire TLB and apple_dart_iotlb_sync will > already flush everything. There's no need to do that again inside those > two functions. Let's just drop them. > > pci :03:00.0: Removing from iommu group 1 > Unable to handle kernel paging request at virtual address 00010023 > [...] > Call trace: >_raw_spin_lock_irqsave+0x54/0xbc >apple_dart_hw_stream_command.constprop.0+0x2c/0x130 >apple_dart_tlb_flush_all+0x48/0x90 >free_io_pgtable_ops+0x40/0x70 >apple_dart_domain_free+0x2c/0x44 >iommu_group_release+0x68/0xac >kobject_cleanup+0x4c/0x1fc >kobject_cleanup+0x14c/0x1fc >kobject_put+0x64/0x84 >iommu_group_remove_device+0x110/0x180 >iommu_release_device+0x50/0xa0 > [...] > > Fixes: 46d1fb072e76b161 ("iommu/dart: Add DART iommu driver") > Reported-by: Marc Zyngier > Signed-off-by: Sven Peter Thanks for addressing this so quickly. Acked-by: Marc Zyngier Tested-by: Marc Zyngier M. -- Without deviation from the norm, progress is not possible.
Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core
On Sun, Sep 19, 2021 at 02:38:29PM +0800, Liu Yi L wrote: > /dev/iommu aims to provide a unified interface for managing I/O address > spaces for devices assigned to userspace. This patch adds the initial > framework to create a /dev/iommu node. Each open of this node returns an > iommufd. And this fd is the handle for userspace to initiate its I/O > address space management. > > One open: > - We call this feature as IOMMUFD in Kconfig in this RFC. However this > name is not clear enough to indicate its purpose to user. Back to 2010 > vfio even introduced a /dev/uiommu [1] as the predecessor of its > container concept. Is that a better name? Appreciate opinions here. > > [1] https://lore.kernel.org/kvm/4c0eb470.1hmjondo00nivfm6%25p...@cisco.com/ > > Signed-off-by: Liu Yi L > drivers/iommu/Kconfig | 1 + > drivers/iommu/Makefile | 1 + > drivers/iommu/iommufd/Kconfig | 11 > drivers/iommu/iommufd/Makefile | 2 + > drivers/iommu/iommufd/iommufd.c | 112 > 5 files changed, 127 insertions(+) > create mode 100644 drivers/iommu/iommufd/Kconfig > create mode 100644 drivers/iommu/iommufd/Makefile > create mode 100644 drivers/iommu/iommufd/iommufd.c > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index 07b7c25cbed8..a83ce0acd09d 100644 > +++ b/drivers/iommu/Kconfig > @@ -136,6 +136,7 @@ config MSM_IOMMU > > source "drivers/iommu/amd/Kconfig" > source "drivers/iommu/intel/Kconfig" > +source "drivers/iommu/iommufd/Kconfig" > > config IRQ_REMAP > bool "Support for Interrupt Remapping" > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > index c0fb0ba88143..719c799f23ad 100644 > +++ b/drivers/iommu/Makefile > @@ -29,3 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o > obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o > obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o > obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o > +obj-$(CONFIG_IOMMUFD) += iommufd/ > diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig > new file mode 100644 
> index ..9fb7769a815d > +++ b/drivers/iommu/iommufd/Kconfig > @@ -0,0 +1,11 @@ > +# SPDX-License-Identifier: GPL-2.0-only > +config IOMMUFD > + tristate "I/O Address Space management framework for passthrough > devices" > + select IOMMU_API > + default n > + help > + provides unified I/O address space management framework for > + isolating untrusted DMAs via devices which are passed through > + to userspace drivers. > + > + If you don't know what to do here, say N. > diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile > new file mode 100644 > index ..54381a01d003 > +++ b/drivers/iommu/iommufd/Makefile > @@ -0,0 +1,2 @@ > +# SPDX-License-Identifier: GPL-2.0-only > +obj-$(CONFIG_IOMMUFD) += iommufd.o > diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c > new file mode 100644 > index ..710b7e62988b > +++ b/drivers/iommu/iommufd/iommufd.c > @@ -0,0 +1,112 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * I/O Address Space Management for passthrough devices > + * > + * Copyright (C) 2021 Intel Corporation > + * > + * Author: Liu Yi L > + */ > + > +#define pr_fmt(fmt)"iommufd: " fmt > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +/* Per iommufd */ > +struct iommufd_ctx { > + refcount_t refs; > +}; A private_data of a struct file should avoid having a refcount (and this should have been a kref anyhow) Use the refcount on the struct file instead. In general the lifetime models look overly convoluted to me with refcounts being used as locks and going in all manner of directions. - No refcount on iommufd_ctx, this should use the fget on the fd. The driver facing version of the API has the driver holds a fget inside the iommufd_device. - Put a rwlock inside the iommufd_ioas that is a 'destroying_lock'. The rwlock starts out unlocked. 
Acquire from the xarray is:

	rcu_lock()
	ioas = xa_load()
	if (ioas)
		if (down_read_trylock(&ioas->destroying_lock))
			// success

Unacquire is just up_read(). Do down_write when the ioas is to be destroyed, do not return EBUSY.

- Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does not need locking (order it properly too, it is in the wrong order), and don't check for duplicate devices or dev_cookie duplication, that is user error and is harmless to the kernel. > +static int iommufd_fops_release(struct inode *inode, struct file *filep) > +{ > + struct iommufd_ctx *ictx = filep->private_data; > + > + filep->private_data = NULL; unnecessary > + iommufd_ctx_put(ictx); > + > + return 0; > +} > + > +static long iommufd_fops_unl_ioctl(struct file *filep, > +unsigned int cmd, unsigned long arg) > +{ > + struct
[PATCH] iommu/dart: Remove iommu_flush_ops
apple_dart_tlb_flush_{all,walk} expect to get a struct apple_dart_domain but instead get a struct iommu_domain right now. This breaks those two functions and can lead to kernel panics like the one below. DART can only invalidate the entire TLB and apple_dart_iotlb_sync will already flush everything. There's no need to do that again inside those two functions. Let's just drop them. pci :03:00.0: Removing from iommu group 1 Unable to handle kernel paging request at virtual address 00010023 [...] Call trace: _raw_spin_lock_irqsave+0x54/0xbc apple_dart_hw_stream_command.constprop.0+0x2c/0x130 apple_dart_tlb_flush_all+0x48/0x90 free_io_pgtable_ops+0x40/0x70 apple_dart_domain_free+0x2c/0x44 iommu_group_release+0x68/0xac kobject_cleanup+0x4c/0x1fc kobject_cleanup+0x14c/0x1fc kobject_put+0x64/0x84 iommu_group_remove_device+0x110/0x180 iommu_release_device+0x50/0xa0 [...] Fixes: 46d1fb072e76b161 ("iommu/dart: Add DART iommu driver") Reported-by: Marc Zyngier Signed-off-by: Sven Peter --- drivers/iommu/apple-dart.c | 18 -- 1 file changed, 18 deletions(-) diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c index c37fb4790e8a..47ffe9e49abb 100644 --- a/drivers/iommu/apple-dart.c +++ b/drivers/iommu/apple-dart.c @@ -181,7 +181,6 @@ struct apple_dart_master_cfg { static struct platform_driver apple_dart_driver; static const struct iommu_ops apple_dart_iommu_ops; -static const struct iommu_flush_ops apple_dart_tlb_ops; static struct apple_dart_domain *to_dart_domain(struct iommu_domain *dom) { @@ -336,22 +335,6 @@ static void apple_dart_iotlb_sync_map(struct iommu_domain *domain, apple_dart_domain_flush_tlb(to_dart_domain(domain)); } -static void apple_dart_tlb_flush_all(void *cookie) -{ - apple_dart_domain_flush_tlb(cookie); -} - -static void apple_dart_tlb_flush_walk(unsigned long iova, size_t size, - size_t granule, void *cookie) -{ - apple_dart_domain_flush_tlb(cookie); -} - -static const struct iommu_flush_ops apple_dart_tlb_ops = { - .tlb_flush_all = 
apple_dart_tlb_flush_all, - .tlb_flush_walk = apple_dart_tlb_flush_walk, -}; - static phys_addr_t apple_dart_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova) { @@ -433,7 +416,6 @@ static int apple_dart_finalize_domain(struct iommu_domain *domain, .ias = 32, .oas = 36, .coherent_walk = 1, - .tlb = &apple_dart_tlb_ops, .iommu_dev = dart->dev, }; -- 2.25.1
Re: [PATCH] swiotlb: set IO TLB segment size via cmdline
Hi Robin, >> I use Xen PV display. In my case, PV display backend(Dom0) allocates >> contiguous buffer via DMA-API to >> to implement zero-copy between Dom0 and DomU. >> > Well, something's gone badly wrong there - if you have to shadow the > entire thing in a bounce buffer to import it then it's hardly zero-copy, > is it? If you want to do buffer sharing the buffer really needs to be > allocated appropriately to begin with, such that all relevant devices > can access it directly. That might be something which needs fixing in Xen. > Right, in the case where we want to use a zero-copy approach, we need to avoid using the swiotlb bounce buffer for all devices which potentially use this buffer. The root of the problem is that this buffer is mapped to foreign pages, and when I tried to retrieve the dma_addr for this buffer I got a foreign MFN bigger than 32 bits, so swiotlb tried to use the bounce buffer. I understand that I need to find a way to avoid using swiotlb in this case. At the moment, it's unclear how to do this properly. But this is another story... I guess we can have a situation where some device like rcar-du needs to use a sufficiently large buffer, greater than 256 KB (IO_TLB_SEGSIZE (128 slots) * 2048 bytes), and this parameter needs to be adjusted at boot time, not at compile time. This patch was created to address that point. Thanks, Roman Fri, 17 Sep 2021 at 12:44, Robin Murphy : > > On 2021-09-17 10:36, Roman Skakun wrote: > > Hi, Christoph > > > > I use Xen PV display. In my case, PV display backend(Dom0) allocates > > contiguous buffer via DMA-API to > > to implement zero-copy between Dom0 and DomU. > > Well, something's gone badly wrong there - if you have to shadow the > entire thing in a bounce buffer to import it then it's hardly zero-copy, > is it? If you want to do buffer sharing the buffer really needs to be > allocated appropriately to begin with, such that all relevant devices > can access it directly.
That might be something which needs fixing in Xen. > > Robin. > > > When I start Weston under DomU, I got the next log in Dom0: > > ``` > > [ 112.554471] CPU: 0 PID: 367 Comm: weston Tainted: G O > > 5.10.0-yocto-standard+ #312 > > [ 112.575149] Call trace: > > [ 112.577666] dump_backtrace+0x0/0x1b0 > > [ 112.581373] show_stack+0x18/0x70 > > [ 112.584746] dump_stack+0xd0/0x12c > > [ 112.588200] swiotlb_tbl_map_single+0x234/0x360 > > [ 112.592781] xen_swiotlb_map_page+0xe4/0x4c0 > > [ 112.597095] xen_swiotlb_map_sg+0x84/0x12c > > [ 112.601249] dma_map_sg_attrs+0x54/0x60 > > [ 112.605138] vsp1_du_map_sg+0x30/0x60 > > [ 112.608851] rcar_du_vsp_map_fb+0x134/0x170 > > [ 112.613082] rcar_du_vsp_plane_prepare_fb+0x44/0x64 > > [ 112.618007] drm_atomic_helper_prepare_planes+0xac/0x160 > > [ 112.623362] drm_atomic_helper_commit+0x88/0x390 > > [ 112.628029] drm_atomic_nonblocking_commit+0x4c/0x60 > > [ 112.633043] drm_mode_atomic_ioctl+0x9a8/0xb0c > > [ 112.637532] drm_ioctl_kernel+0xc4/0x11c > > [ 112.641506] drm_ioctl+0x21c/0x460 > > [ 112.644967] __arm64_sys_ioctl+0xa8/0xf0 > > [ 112.648939] el0_svc_common.constprop.0+0x78/0x1a0 > > [ 112.653775] do_el0_svc+0x24/0x90 > > [ 112.657148] el0_svc+0x14/0x20 > > [ 112.660254] el0_sync_handler+0x1a4/0x1b0 > > [ 112.664315] el0_sync+0x174/0x180 > > [ 112.668145] rcar-fcp fea2f000.fcp: swiotlb buffer is full (sz: > > 3686400 bytes), total 65536 (slots), used 112 (slots) > > ``` > > The problem is happened here: > > https://elixir.bootlin.com/linux/v5.14.4/source/drivers/gpu/drm/rcar-du/rcar_du_vsp.c#L202 > > > > Sgt was created in dma_get_sgtable() by dma_common_get_sgtable() and > > includes a single page chunk > > as shown here: > > https://elixir.bootlin.com/linux/v5.14.5/source/kernel/dma/ops_helpers.c#L18 > > > > After creating a new sgt, we tried to map this sgt through vsp1_du_map_sg(). > > Internally, vsp1_du_map_sg() using ops->map_sg (e.g > > xen_swiotlb_map_sg) to perform > > mapping. 
> > > > I realized that required segment is too big to be fitted to default > > swiotlb segment and condition > > https://elixir.bootlin.com/linux/latest/source/kernel/dma/swiotlb.c#L474 > > is always false. > > > > I know that I use a large buffer, but why can't I map this buffer in one > > chunk? > > > > Thanks! > > > > Wed, 15 Sep 2021 at 16:53, Christoph Hellwig : > >> > >> On Wed, Sep 15, 2021 at 03:49:52PM +0200, Jan Beulich wrote: > >>> But the question remains: Why does the framebuffer need to be mapped > >>> in a single giant chunk? > >> > >> More importantly: if you use dynamic dma mappings for your framebuffer > >> you're doing something wrong. > > > > > > -- Best Regards, Roman.
Re: [RFC 00/20] Introduce /dev/iommu for userspace I/O address space management
On Sun, Sep 19, 2021 at 02:38:28PM +0800, Liu Yi L wrote: > Linux now includes multiple device-passthrough frameworks (e.g. VFIO and > vDPA) to manage secure device access from the userspace. One critical task > of those frameworks is to put the assigned device in a secure, IOMMU- > protected context so user-initiated DMAs are prevented from doing harm to > the rest of the system. Some bot will probably send this too, but it has compile warnings and needs to be rebased to 5.15-rc1 drivers/iommu/iommufd/iommufd.c:269:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] if (refcount_read(&ioas->refs) > 1) { ^~ drivers/iommu/iommufd/iommufd.c:277:9: note: uninitialized use occurs here return ret; ^~~ drivers/iommu/iommufd/iommufd.c:269:2: note: remove the 'if' if its condition is always true if (refcount_read(&ioas->refs) > 1) { ^~~~ drivers/iommu/iommufd/iommufd.c:253:17: note: initialize the variable 'ret' to silence this warning int ioasid, ret; ^ = 0 drivers/iommu/iommufd/iommufd.c:727:7: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (idev->dev == dev || idev->dev_cookie == dev_cookie) { ^~ drivers/iommu/iommufd/iommufd.c:767:17: note: uninitialized use occurs here return ERR_PTR(ret); ^~~ drivers/iommu/iommufd/iommufd.c:727:3: note: remove the 'if' if its condition is always false if (idev->dev == dev || idev->dev_cookie == dev_cookie) { ^ drivers/iommu/iommufd/iommufd.c:727:7: warning: variable 'ret' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized] if (idev->dev == dev || idev->dev_cookie == dev_cookie) { ^~~~ drivers/iommu/iommufd/iommufd.c:767:17: note: uninitialized use occurs here return ERR_PTR(ret); ^~~ drivers/iommu/iommufd/iommufd.c:727:7: note: remove the '||' if its condition is always false if (idev->dev == dev || idev->dev_cookie == dev_cookie) { ^~~ drivers/iommu/iommufd/iommufd.c:717:9: note: 
initialize the variable 'ret' to silence this warning int ret; ^ = 0 Jason
[PATCH] iommu/arm-smmu-v3: poll cmdq until it has space
When a thread sends commands to the SMMU, it needs to allocate some space to write its commands in a ring buffer. The allocation algorithm works as follows: until enough free space is available in the queue, repeat the following outer loop. First, try to acquire an exclusive lock to read the consumer index from the SMMU register over MMIO. If that fails, it means that another thread holds the lock (or multiple threads, in shared mode, for sync commands). The current code guarantees that when a thread is holding the lock, the consumer index will be updated from the SMMU register. So when the acquisition of the exclusive lock fails, we can safely assume that another thread will eventually update the consumer index, and we enter an inner waiting loop until that happens. The problem that this patch fixes is that the waiting loop exits as soon as any space is available in the queue, so it is possible that it exits immediately if there's some space available but not enough to write the thread's commands. That means the cmdq allocation will fail (outer loop) and the thread will spin attempting to acquire the exclusive lock to update the consumer index from the SMMU register. If a lot of threads are failing to allocate commands, this can cause heavy contention on the lock, to the point where the system slows down or livelocks. The livelock is possible if some other threads are attempting to execute synchronous commands. These threads need to ensure that they control updates to the consumer index so that they can correctly observe when their command is executed; they enforce that by acquiring the lock in shared mode. If there's too much contention, they never succeed in acquiring the lock via the read+cmpxchg mechanism, and their progress stalls. But because they already hold allocated space in the command queue, their stall prevents progress from other threads attempting to allocate space in the cmdq. This causes a livelock.
This patch makes the waiting loop exit as soon as enough space is available, rather than as soon as any space is available. This means that when two threads are competing for the exclusive lock while allocating space in the queue, one of them will fail to acquire the lock in exclusive mode, be knocked back to the waiting loop, and stay there until there's enough free space, rather than exiting it immediately when any space is available. Threads in the waiting loop do not compete for the lock, reducing contention enough to enable synchronous threads to make progress, when applicable. Note that we cannot afford to have all threads parked in the waiting loop unless there are synchronous threads executing concurrently, otherwise no thread is observing the SMMU register and updating the consumer index. Thus if we succeed in acquiring the lock in exclusive mode, we cannot enter the waiting loop because we could be the last thread observing the SMMU. Similarly, if the producer index is updated, we need to exit the waiting loop because it could mean that the latest thread to observe the SMMU has succeeded in allocating commands and thus has moved on.
Signed-off-by: Fernand Sieber --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 +++-- 1 file changed, 29 insertions(+), 12 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index a388e318f86e..9ccda3bd5402 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -118,12 +118,6 @@ static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n) return space >= n; } -static bool queue_full(struct arm_smmu_ll_queue *q) -{ - return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) && - Q_WRP(q, q->prod) != Q_WRP(q, q->cons); -} - static bool queue_empty(struct arm_smmu_ll_queue *q) { return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) && @@ -582,14 +576,16 @@ static void arm_smmu_cmdq_poll_valid_map(struct arm_smmu_cmdq *cmdq, __arm_smmu_cmdq_poll_set_valid_map(cmdq, sprod, eprod, false); } -/* Wait for the command queue to become non-full */ -static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu, -struct arm_smmu_ll_queue *llq) +/* Wait for the command queue to have enough space */ +static int arm_smmu_cmdq_poll_until_has_space(struct arm_smmu_device *smmu, + struct arm_smmu_ll_queue *llq, + u32 n) { unsigned long flags; struct arm_smmu_queue_poll qp; struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu); int ret = 0; + int prod; /* * Try to update our copy of cons by grabbing exclusive cmdq access. If @@ -599,15 +595,36 @@ static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu, WRITE_ONCE(cmdq->q.llq.cons, readl_relaxed(cmdq->q.cons_reg));
Re: [PATCH v4 5/6] iova: Add iova_len argument to init_iova_domain()
On 02/08/2021 17:16, Robin Murphy wrote: Also fix up all users to set this value (at 0, meaning use default). Wrap that in init_iova_domain_defaults() to avoid the mysterious 0? Sure, I can do something like that. I actually did have separate along those lines in v3 before I decided to change it. Y'know, at this point I'm now starting to seriously wonder whether moving the rcaches into iommu_dma_cookie wouldn't make a whole lot of things simpler... :/ Does that sound like crazy talk to you, or an idea worth entertaining? Hi Robin, JFYI, to try to make inroads into my IOVA issues, I'm going to look to do this first, if you don't mind. I think that the fq stuff can also be put into a separate structure also, rather than iova_domain, and that can also be a member of iommu_dma_cookie. BTW, with regards to separating the rcache magazine code out, I see someone already trying to introduce something similar: https://lore.kernel.org/lkml/cakw4uuxperg41z8lu5qyss-yegt1anud1cuiuqxc0anfqjb...@mail.gmail.com/T/#me4cc5de775ad16ab3d6e7ca854b55f274ddcba08 https://lore.kernel.org/lkml/yukerk1vvzmht...@casper.infradead.org/T/#t Cheers, John