Re: [PATCH v4 00/13] MT8195 SMI support

2021-09-21 Thread Krzysztof Kozlowski
On Tue, 14 Sep 2021 19:36:50 +0800, Yong Wu wrote:
> This patchset mainly adds SMI support for mt8195.
> 
> Compared with the previous version, it adds two new functions:
> a) add SMI sub-common support
> b) add initial settings for smi-common and smi-larb.
> 
> Change note:
> v4: 1) Rebase on v5.15-rc1
>     2) In the dt-binding:
>        a. Add "else mediatek,smi: false." in the yaml.
>        b. Remove mediatek,smi_sub_common. Since we only have two levels
>        currently, a node should be smi-sub-common if it has "mediatek,smi";
>        otherwise it is smi-common.
> 
> [...]

Applied, thanks!

[01/13] dt-bindings: memory: mediatek: Add mt8195 smi binding
commit: b01065eee432b3ae91a2c0aaab66c2cae2e9812d
[02/13] dt-bindings: memory: mediatek: Add mt8195 smi sub common
commit: 599e681a31a2dfa7359b8e420a1157ed015f840b
[03/13] memory: mtk-smi: Use clk_bulk clock ops
commit: 0e14917c57f9d8b9b7d4f41813849a29659447b3
[04/13] memory: mtk-smi: Rename smi_gen to smi_type
commit: a5c18986f404206fdbc27f208620dc13bffb5657
[05/13] memory: mtk-smi: Adjust some code position
commit: 534e0ad2ed4f9296a8c7ccb1a393bc4d5000dbad
[06/13] memory: mtk-smi: Add error handle for smi_probe
commit: 30b869e77a1c626190bbbe6b4e5f5382b0102ac3
[07/13] memory: mtk-smi: Add device link for smi-sub-common
commit: 47404757702ec8aa5c8310cdf58a267081f0ce6c
[08/13] memory: mtk-smi: Add clocks for smi-sub-common
commit: 3e4f74e0ea5a6a6d6d825fd7afd8a10afbd1152c
[09/13] memory: mtk-smi: Use devm_platform_ioremap_resource
commit: 912fea8bf8d854aef967c82a279ffd20be0326d7
[10/13] memory: mtk-smi: mt8195: Add smi support
commit: cc4f9dcd9c1543394d6ee0d635551a2bd96bcacb
[11/13] memory: mtk-smi: mt8195: Add initial setting for smi-common
commit: 431e9cab7097b5d5eb3cf2b04d4b12d272df85e0
[12/13] memory: mtk-smi: mt8195: Add initial setting for smi-larb
commit: fe6dd2a4017d3dfbf4a530d95270a1d498a16a4c
[13/13] MAINTAINERS: Add entry for MediaTek SMI
commit: 93403ede5aa4edeec2c63541b185d9c4fc9ae1e4

Best regards,
-- 
Krzysztof Kozlowski 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses

2021-09-21 Thread Lu Baolu

Hi Bjorn,

On 9/4/21 3:37 AM, Bjorn Helgaas wrote:

From: Bjorn Helgaas 

719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault") changed
the DMA fault reason from hex to decimal.  It also added "0x" prefixes to
the PCI bus/device, e.g.,

   - DMAR: [INTR-REMAP] Request device [00:00.5]
   + DMAR: [INTR-REMAP] Request device [0x00:0x00.5]

These no longer match dev_printk() and other similar messages in
dmar_match_pci_path() and dmar_acpi_insert_dev_scope().

Drop the "0x" prefixes from the bus and device addresses.

Fixes: 719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault")
Signed-off-by: Bjorn Helgaas 


Thank you for this fix. I have queued it for v5.15.

Best regards,
baolu


---
  drivers/iommu/intel/dmar.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index d66f79acd14d..8647a355dad0 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1944,18 +1944,18 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
	reason = dmar_get_fault_reason(fault_reason, &fault_type);

	if (fault_type == INTR_REMAP)
-		pr_err("[INTR-REMAP] Request device [0x%02x:0x%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
		       PCI_FUNC(source_id & 0xFF), addr >> 48,
		       fault_reason, reason);
	else if (pasid == INVALID_IOASID)
-		pr_err("[%s NO_PASID] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[%s NO_PASID] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
		       type ? "DMA Read" : "DMA Write",
		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
		       PCI_FUNC(source_id & 0xFF), addr,
		       fault_reason, reason);
	else
-		pr_err("[%s PASID 0x%x] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[%s PASID 0x%x] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
		       type ? "DMA Read" : "DMA Write", pasid,
		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
		       PCI_FUNC(source_id & 0xFF), addr,




[PATCH 1/1] iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses

2021-09-21 Thread Lu Baolu
From: Bjorn Helgaas 

719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault") changed
the DMA fault reason from hex to decimal.  It also added "0x" prefixes to
the PCI bus/device, e.g.,

  - DMAR: [INTR-REMAP] Request device [00:00.5]
  + DMAR: [INTR-REMAP] Request device [0x00:0x00.5]

These no longer match dev_printk() and other similar messages in
dmar_match_pci_path() and dmar_acpi_insert_dev_scope().

Drop the "0x" prefixes from the bus and device addresses.

Fixes: 719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault")
Signed-off-by: Bjorn Helgaas 
Link: https://lore.kernel.org/r/20210903193711.483999-1-helg...@kernel.org
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/dmar.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 0ec5514c9980..b7708b93f3fa 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1942,18 +1942,18 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
	reason = dmar_get_fault_reason(fault_reason, &fault_type);

	if (fault_type == INTR_REMAP)
-		pr_err("[INTR-REMAP] Request device [0x%02x:0x%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
		       PCI_FUNC(source_id & 0xFF), addr >> 48,
		       fault_reason, reason);
	else if (pasid == INVALID_IOASID)
-		pr_err("[%s NO_PASID] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[%s NO_PASID] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
		       type ? "DMA Read" : "DMA Write",
		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
		       PCI_FUNC(source_id & 0xFF), addr,
		       fault_reason, reason);
	else
-		pr_err("[%s PASID 0x%x] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[%s PASID 0x%x] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
		       type ? "DMA Read" : "DMA Write", pasid,
		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
		       PCI_FUNC(source_id & 0xFF), addr,
-- 
2.25.1



[PATCH 0/1] iommu/vt-d: A fix for v5.15-rc3

2021-09-21 Thread Lu Baolu
Hi Joerg,

A fix is queued for v5.15. It aims to fix:

 - Drop "0x" prefix from PCI bus & device addresses

Please consider it for the iommu/fix branch.

Best regards,
Lu Baolu

Bjorn Helgaas (1):
  iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses

 drivers/iommu/intel/dmar.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.25.1



[RFC 7/7] dma/idxd: Use dma-iommu PASID API instead of SVA lib

2021-09-21 Thread Jacob Pan
Showcase partial usage of the new PASID DMA API: it replaces the SVA
lib API for obtaining the system PASID and setting up the addressing
mode. The actual work submission is not included.
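The enable/fallback flow this patch moves the driver to can be sketched in plain C. This is a hypothetical userspace mock, not driver code: `mock_pasid_enable()` stands in for the proposed `iommu_dma_pasid_enable()` (only the name is taken from this series), and the error value is simplified.

```c
#include <assert.h>

#define INVALID_IOASID ((unsigned int)-1)

enum dma_pasid_mode { PASID_BYPASS = 1, PASID_IOVA = 2 };

struct mock_idxd {
	unsigned int pasid;
	int spasid_mode;
};

/* Stub standing in for iommu_dma_pasid_enable(): IOVA mode succeeds and
 * hands back the reserved in-kernel PASID; other modes fail here. */
static unsigned int mock_pasid_enable(int mode)
{
	return (mode == PASID_IOVA) ? 1u : INVALID_IOASID;
}

/* Mirrors the reworked idxd_enable_system_pasid() shape in this patch:
 * one call to obtain the PASID, then record mode and PASID on success. */
static int mock_enable_system_pasid(struct mock_idxd *idxd, int mode)
{
	unsigned int pasid = mock_pasid_enable(mode);

	if (pasid == INVALID_IOASID)
		return -1; /* -ENODEV in the real driver */
	idxd->pasid = pasid;
	idxd->spasid_mode = mode;
	return 0;
}
```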

Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/idxd.h |  4 +++-
 drivers/dma/idxd/init.c | 36 
 2 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 507f5d5119f9..eaedaf3c3e3b 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "registers.h"
 
 #define IDXD_DRIVER_VERSION"1.00"
@@ -253,7 +254,8 @@ struct idxd_device {
 
struct iommu_sva *sva;
unsigned int pasid;
-
+   enum iommu_dma_pasid_mode spasid_mode;
+   struct iommu_domain *domain; /* For KVA mapping */
int num_groups;
 
u32 msix_perm_offset;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index c404a1320536..8f952a4c8909 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "../dmaengine.h"
@@ -32,6 +33,11 @@ static bool sva = true;
 module_param(sva, bool, 0644);
 MODULE_PARM_DESC(sva, "Toggle SVA support on/off");
 
+static int spasid_mode = IOMMU_DMA_PASID_IOVA;
+module_param(spasid_mode, int, 0644);
+MODULE_PARM_DESC(spasid_mode,
+		 "Supervisor PASID mode (1: pass-through, 2: DMA API)");
+
 #define DRV_NAME "idxd"
 
 bool support_enqcmd;
@@ -519,35 +525,25 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
 
 static int idxd_enable_system_pasid(struct idxd_device *idxd)
 {
-   int flags;
-   unsigned int pasid;
-   struct iommu_sva *sva;
-
-   flags = SVM_FLAG_SUPERVISOR_MODE;
-
-   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
-   if (IS_ERR(sva)) {
-   dev_warn(&idxd->pdev->dev,
-"iommu sva bind failed: %ld\n", PTR_ERR(sva));
-   return PTR_ERR(sva);
-   }
+   int pasid;
+   struct iommu_domain *domain = NULL;
 
-   pasid = iommu_sva_get_pasid(sva);
-   if (pasid == IOMMU_PASID_INVALID) {
-   iommu_sva_unbind_device(sva);
+   pasid = iommu_dma_pasid_enable(&idxd->pdev->dev, &domain, spasid_mode);
+   if (pasid == INVALID_IOASID) {
+   dev_err(&idxd->pdev->dev, "No DMA PASID in mode %d\n", spasid_mode);
return -ENODEV;
}
-
-   idxd->sva = sva;
idxd->pasid = pasid;
-   dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
+   idxd->spasid_mode = spasid_mode;
+   idxd->domain = domain;
+
return 0;
 }
 
 static void idxd_disable_system_pasid(struct idxd_device *idxd)
 {
-
-   iommu_sva_unbind_device(idxd->sva);
+   /* TODO: remove sva, restore no PASID PT and PASIDE */
+   iommu_dma_pasid_disable(&idxd->pdev->dev, idxd->domain);
idxd->sva = NULL;
 }
 
-- 
2.25.1


[RFC 3/7] iommu/vt-d: Add DMA w/ PASID support for PA and IOVA

2021-09-21 Thread Jacob Pan
For physical address (PA) mode, the PASID entry for the given supervisor
PASID will be set up for HW pass-through (PT). For IOVA mode, the
supervisor PASID entry will be configured the same as PASID 0, which is
a special PASID for DMA requests without PASID, a.k.a. RID2PASID. An
additional IOTLB flush for the supervisor PASID is also included.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 95 -
 include/linux/intel-iommu.h |  7 ++-
 2 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index be35284a2016..cbcfd178c16f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1631,7 +1631,11 @@ static void domain_flush_piotlb(struct intel_iommu *iommu,
if (domain->default_pasid)
qi_flush_piotlb(iommu, did, domain->default_pasid,
addr, npages, ih);
-
+   if (domain->s_pasid) {
+   pr_alert_ratelimited("%s: pasid %u", __func__, domain->s_pasid);
+   qi_flush_piotlb(iommu, did, domain->s_pasid,
+   addr, npages, ih);
+   }
if (!list_empty(&domain->devices))
qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
 }
@@ -5535,6 +5539,93 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
}
 }
 
+static int intel_enable_pasid_dma(struct device *dev, u32 pasid, int mode)
+{
+   struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
+   struct pci_dev *pdev = to_pci_dev(dev);
+   struct device_domain_info *info;
+   unsigned long flags;
+   int ret = 0;
+
+   info = get_domain_info(dev);
+   if (!info)
+   return -ENODEV;
+
+   if (!dev_is_pci(dev) || !sm_supported(info->iommu))
+   return -EINVAL;
+
+   if (intel_iommu_enable_pasid(info->iommu, dev))
+   return -ENODEV;
+
+   spin_lock_irqsave(&iommu->lock, flags);
+   switch (mode) {
+   case IOMMU_DMA_PASID_BYPASS:
+   dev_dbg(dev, "%s PT mode", __func__);
+   /* As a precaution, translation request should be responded with
+* physical address.
+*/
+   if (!hw_pass_through) {
+   ret = -ENODEV;
+   goto exit_unlock;
+   }
+   /* HW may use large page for ATS */
+   pci_disable_ats(pdev);
+   ret = intel_pasid_setup_pass_through(info->iommu, info->domain,
+ dev, pasid);
+   if (ret)
+   dev_err(dev, "Failed SPASID %d BYPASS", pasid);
+   break;
+   case IOMMU_DMA_PASID_IOVA:
+   dev_dbg(dev, "%s IOVA mode", __func__);
+   /*
+* We could use SL but FL is preferred for consistency with VM
+* where vIOMMU exposes FL only cap
+*/
+   if (!domain_use_first_level(info->domain)) {
+   ret = -EINVAL;
+   goto exit_unlock;
+   }
+   /* To be used for IOTLB flush at PASID granularity */
+   info->domain->s_pasid = pasid;
+   ret = domain_setup_first_level(info->iommu, info->domain, dev,
+   pasid);
+   break;
+   default:
+   dev_err(dev, "Invalid PASID DMA mode %d", mode);
+   ret = -EINVAL;
+   goto exit_unlock;
+   }
+   info->pasid_mode = mode;
+exit_unlock:
+   spin_unlock_irqrestore(&iommu->lock, flags);
+
+   return ret;
+}
+
+static int intel_disable_pasid_dma(struct device *dev)
+{
+   struct device_domain_info *info;
+   int ret = 0;
+
+   info = get_domain_info(dev);
+   if (!info)
+   return -ENODEV;
+
+   if (!dev_is_pci(dev) || !sm_supported(info->iommu))
+   return -EINVAL;
+
+   if (intel_iommu_enable_pasid(info->iommu, dev))
+   return -ENODEV;
+
+   if (!dev->iommu) {
+   dev_err(dev, "No IOMMU params");
+   return -ENODEV;
+   }
+   dev_info(dev, "Tearing down DMA PASID %d",  info->domain->s_pasid);
+   intel_pasid_tear_down_entry(info->iommu, info->dev, 
info->domain->s_pasid, false);
+
+   info->domain->s_pasid = 0;
+   return ret;
+}
+
 const struct iommu_ops intel_iommu_ops = {
.capable= intel_iommu_capable,
.domain_alloc   = intel_iommu_domain_alloc,
@@ -5573,6 +5664,8 @@ const struct iommu_ops intel_iommu_ops = {
.sva_get_pasid  = intel_svm_get_pasid,
.page_response  = intel_svm_page_response,
 #endif
+   .enable_pasid_dma   = intel_enable_pasid_dma,
+   .disable_pasid_dma  = intel_disable_pasid_dma,
 };
 
 static void quirk_iommu_igfx(struct pci_dev *dev)
diff --git a/include/linux/intel-iommu.h b

[RFC 5/7] iommu/vt-d: Add support for KVA PASID mode

2021-09-21 Thread Jacob Pan
To support KVA fast mode, the VT-d driver must support domain allocation
of IOMMU_DOMAIN_KVA type. Since all devices in fast KVA mode share the
same kernel mapping, a single KVA domain is sufficient. This global KVA
domain contains the kernel mapping, i.e. init_mm.pgd.

The programming of the KVA domain follows the existing flow of auxiliary
domain attachment.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 59 ++---
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cbcfd178c16f..0dabd5f75acf 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -293,6 +293,9 @@ static inline void context_clear_entry(struct context_entry *context)
  * 3. Each iommu mapps to this domain if successful.
  */
 static struct dmar_domain *si_domain;
+
+/* This domain is used for shared virtual addressing with CPU kernel mapping */
+static struct dmar_domain *kva_domain;
 static int hw_pass_through = 1;
 
 #define for_each_domain_iommu(idx, domain) \
@@ -1989,6 +1992,10 @@ static void domain_exit(struct dmar_domain *domain)
/* Remove associated devices and clear attached or cached domains */
domain_remove_dev_info(domain);
 
+   /* There is no IOMMU page table for KVA */
+   if (domain->pgd == (struct dma_pte *)init_mm.pgd)
+   return;
+
/* destroy iovas */
if (domain->domain.type == IOMMU_DOMAIN_DMA)
iommu_put_dma_cookie(&domain->domain);
@@ -2533,6 +2540,10 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
int agaw, level;
int flags = 0;
 
+   if (domain->domain.type == IOMMU_DOMAIN_KVA) {
+   flags |= PASID_FLAG_SUPERVISOR_MODE;
+   goto do_setup;
+   }
/*
 * Skip top levels of page tables for iommu which has
 * less agaw than default. Unnecessary for PT mode.
@@ -2554,7 +2565,7 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
 
if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
flags |= PASID_FLAG_PAGE_SNOOP;
-
+do_setup:
return intel_pasid_setup_first_level(iommu, dev, (pgd_t *)pgd, pasid,
 domain->iommu_did[iommu->seq_id],
 flags);
@@ -2713,7 +2724,28 @@ static int iommu_domain_identity_map(struct dmar_domain *domain,
 }
 
 static int md_domain_init(struct dmar_domain *domain, int guest_width);
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static int __init kva_domain_init(void)
+{
+   struct iommu_domain *domain;

+   kva_domain = alloc_domain(0);
+   if (!kva_domain) {
+   pr_err("Can't allocate KVA domain\n");
+   return -EFAULT;
+   }
+   kva_domain->pgd = (struct dma_pte *)init_mm.pgd;
+   domain = &kva_domain->domain;
+   domain->type = IOMMU_DOMAIN_KVA;
+   /* REVISIT: may not need this other than sanity check */
+   domain->geometry.aperture_start = 0;
+   domain->geometry.aperture_end   = __DOMAIN_MAX_ADDR(kva_domain->gaw);
+   domain->geometry.force_aperture = true;
+   return 0;
+}
+#endif
 static int __init si_domain_init(int hw)
 {
struct dmar_rmrr_unit *rmrr;
@@ -3363,6 +3395,11 @@ static int __init init_dmars(void)
down_write(&dmar_global_lock);
if (ret)
goto free_iommu;
+   /* For in-kernel DMA with PASID in SVA */
+   ret = kva_domain_init();
+   if (ret)
+   goto free_iommu;
+
}
 #endif
ret = dmar_set_interrupt(iommu);
@@ -4558,6 +4595,9 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
domain->geometry.force_aperture = true;
 
return domain;
+   case IOMMU_DOMAIN_KVA:
+   /* Use a global domain for shared KVA mapping */
+   return &kva_domain->domain;
case IOMMU_DOMAIN_IDENTITY:
return &si_domain->domain;
default:
@@ -4583,7 +4623,8 @@ is_aux_domain(struct device *dev, struct iommu_domain *domain)
struct device_domain_info *info = get_domain_info(dev);
 
return info && info->auxd_enabled &&
-   domain->type == IOMMU_DOMAIN_UNMANAGED;
+   (domain->type == IOMMU_DOMAIN_UNMANAGED ||
+   domain->type == IOMMU_DOMAIN_KVA);
 }
 
 static inline struct subdev_domain_info *
@@ -4693,8 +4734,8 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
if (ret)
goto attach_failed;
 
-   /* Setup the PASID entry for mediated devices: */
-   if (domain_use_first_level(domain))
+   /* Setup the PASID entry for devices do DMA

[RFC 6/7] iommu: Add KVA map API

2021-09-21 Thread Jacob Pan
This patch adds KVA map API. It enforces KVA address range checking and
other potential sanity checks. Currently, only the direct map range is
checked.
For trusted devices, this API returns immediately after the above sanity
check. For untrusted devices, this API serves as a simple wrapper around
IOMMU map/unmap APIs. 
OPEN: Alignment at the minimum page size is required, not as rich and
flexible as DMA-APIs.
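The range check plus fast/strict dispatch described above can be sketched as a userspace mock. The constants below are hypothetical stand-ins for PAGE_OFFSET/VMALLOC_START so the check is testable; only the shape of `is_kernel_direct_map()` and `iommu_map_kva()` is taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical direct-map window, standing in for PAGE_OFFSET and
 * VMALLOC_START from the kernel. */
#define MOCK_PAGE_OFFSET   0x1000UL
#define MOCK_VMALLOC_START 0x9000UL

/* Same shape as the patch's is_kernel_direct_map(): the whole
 * [start, start + size) window must sit inside the direct map. */
static bool mock_is_direct_map(unsigned long start, unsigned long size)
{
	return start >= MOCK_PAGE_OFFSET &&
	       (start + size) <= MOCK_VMALLOC_START;
}

/* Mirrors iommu_map_kva(): reject out-of-range KVA; in the fast (shared
 * page table) mode only the sanity check is performed, while the strict
 * mode would go on to call iommu_map(). */
static int mock_map_kva(bool kva_domain, unsigned long kva, unsigned long size)
{
	if (!mock_is_direct_map(kva, size))
		return -22; /* -EINVAL */
	if (kva_domain)
		return 0;   /* fast mode: nothing to map */
	return 0;           /* strict mode: iommu_map() would run here */
}
```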

Signed-off-by: Jacob Pan 
---
 drivers/iommu/iommu.c | 57 +++
 include/linux/iommu.h |  5 
 2 files changed, 62 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index acfdcd7ebd6a..45ba55941209 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2490,6 +2490,63 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
+/*
+ * REVISIT: This might not be sufficient. Could also check permission match,
+ * exclude kernel text, etc.
+ */
+static inline bool is_kernel_direct_map(unsigned long start, phys_addr_t size)
+{
+   return (start >= PAGE_OFFSET) && ((start + size) <= VMALLOC_START);
+}
+
+/**
+ * @brief Map kernel virtual address for DMA remap. DMA request with
+ * domain's default PASID will target kernel virtual address space.
+ *
+ * @param domain   Domain contains the PASID
+ * @param page Kernel virtual address
+ * @param size Size to map
+ * @param prot Permissions
+ * @return int 0 on success or error code
+ */
+int iommu_map_kva(struct iommu_domain *domain, struct page *page,
+ size_t size, int prot)
+{
+   phys_addr_t phys = page_to_phys(page);
+   void *kva = phys_to_virt(phys);
+
+   /*
+* TODO: Limit DMA to kernel direct mapping only, avoid dynamic range
+* until we have mmu_notifier for making IOTLB coherent with CPU.
+*/
+   if (!is_kernel_direct_map((unsigned long)kva, size))
+   return -EINVAL;
+   /* KVA domain type indicates shared CPU page table, skip building
+* IOMMU page tables. This is the fast mode where only sanity check
+* is performed.
+*/
+   if (domain->type == IOMMU_DOMAIN_KVA)
+   return 0;
+
+   return iommu_map(domain, (unsigned long)kva, phys, size, prot);
+}
+EXPORT_SYMBOL_GPL(iommu_map_kva);
+
+int iommu_unmap_kva(struct iommu_domain *domain, void *kva,
+   size_t size)
+{
+   if (!is_kernel_direct_map((unsigned long)kva, size))
+   return -EINVAL;
+
+   if (domain->type == IOMMU_DOMAIN_KVA) {
+   pr_debug_ratelimited("unmap kva skipped %llx", (u64)kva);
+   return 0;
+   }
+   /* REVISIT: do we need a fast version? */
+   return iommu_unmap(domain, (unsigned long)kva, size);
+}
+EXPORT_SYMBOL_GPL(iommu_unmap_kva);
+
 int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
  phys_addr_t paddr, size_t size, int prot)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index cd8225f6bc23..c0fac050ca57 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -427,6 +427,11 @@ extern size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 extern size_t iommu_map_sg_atomic(struct iommu_domain *domain,
  unsigned long iova, struct scatterlist *sg,
  unsigned int nents, int prot);
+extern int iommu_map_kva(struct iommu_domain *domain,
+struct page *page, size_t size, int prot);
+extern int iommu_unmap_kva(struct iommu_domain *domain,
+  void *kva, size_t size);
+
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova);
 extern void iommu_set_fault_handler(struct iommu_domain *domain,
iommu_fault_handler_t handler, void *token);
-- 
2.25.1


[RFC 1/7] ioasid: reserve special PASID for in-kernel DMA

2021-09-21 Thread Jacob Pan
IOASIDs associated with user processes are subject to resource
restrictions. However, in-kernel usages can target only one global
kernel virtual address mapping. Reserve a special IOASID for the devices
that perform DMA requests with PASID. This global PASID is excluded from
the IOASID allocator.
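The reservation can be sketched as a toy allocator that clamps the requested minimum, mirroring this patch's `ioasid_alloc()` change. The constants are taken from the patch's ioasid.h additions; the bitmap allocator itself is a hypothetical simplification.

```c
#include <assert.h>

/* Values from this patch's ioasid.h additions. */
#define IOASID_DMA_NO_PASID 0u /* DMA request w/o PASID */
#define IOASID_DMA_PASID    1u /* in-kernel DMA w/ PASID */
#define IOASID_ALLOC_BASE   2u /* start of dynamic allocation */

/* Toy allocator: hands out the lowest free ID in [min, max] after
 * clamping min so the two reserved PASIDs can never be allocated. */
static unsigned int mock_ioasid_alloc(unsigned int min, unsigned int max,
				      unsigned long *bitmap)
{
	if (min < IOASID_ALLOC_BASE)
		min = IOASID_ALLOC_BASE;
	for (unsigned int id = min; id <= max && id < 64; id++) {
		if (!(*bitmap & (1UL << id))) {
			*bitmap |= 1UL << id;
			return id;
		}
	}
	return (unsigned int)-1; /* INVALID_IOASID */
}
```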

Signed-off-by: Jacob Pan 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
 drivers/iommu/ioasid.c  | 2 ++
 drivers/iommu/iommu-sva-lib.c   | 1 +
 include/linux/ioasid.h  | 4 
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251cab61f3..c5fb89bd6229 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -329,7 +329,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
return ERR_PTR(-ENOMEM);
 
/* Allocate a PASID for this mm if necessary */
-   ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) - 1);
+   ret = iommu_sva_alloc_pasid(mm, IOASID_ALLOC_BASE, (1U << master->ssid_bits) - 1);
if (ret)
goto err_free_bond;
 
diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 50ee27bbd04e..89c6132bf1ec 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -317,6 +317,8 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
data->private = private;
refcount_set(&data->refs, 1);
 
+   if (min < IOASID_ALLOC_BASE)
+   min = IOASID_ALLOC_BASE;
/*
 * Custom allocator needs allocator data to perform platform specific
 * operations.
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index bd41405d34e9..4f56843517e5 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -28,6 +28,7 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
int ret = 0;
ioasid_t pasid;
 
+   /* TODO: no need to check!! reserved range is checked in ioasid_alloc() */
if (min == INVALID_IOASID || max == INVALID_IOASID ||
min == 0 || max < min)
return -EINVAL;
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index e9dacd4b9f6b..4d435cbd48b8 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -6,6 +6,10 @@
 #include 
 
 #define INVALID_IOASID ((ioasid_t)-1)
+#define IOASID_DMA_NO_PASID0 /* For DMA request w/o PASID */
+#define IOASID_DMA_PASID   1 /* For in-kernel DMA w/ PASID */
+#define IOASID_ALLOC_BASE  2 /* Start of the allocation */
+
 typedef unsigned int ioasid_t;
 typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);
 typedef void (*ioasid_free_fn_t)(ioasid_t ioasid, void *data);
-- 
2.25.1



[RFC 4/7] dma-iommu: Add support for DMA w/ PASID in KVA

2021-09-21 Thread Jacob Pan
Sharing virtual addresses between DMA and CPU has many advantages: it
simplifies the programming model and enhances DMA security over physical
addresses. This patch adds KVA support to the DMA IOMMU API. Strict and
fast sub-modes are supported transparently based on the device's
trustworthiness.

The strict mode is intended for untrusted devices. KVA mapping is
established on-demand by a separate IOMMU page table referenced by a
supervisor PASID. An aux domain is allocated per device to carry
the supervisor PASID.

The fast mode is for trusted devices where KVA mapping is shared with
the CPU via kernel page table. Vendor IOMMU driver can choose to use a
global KVA domain for all devices in fast KVA mode.

The follow-up patches will introduce the iommu_map_kva() API where KVA
domains will be used. The performance advantage of the fast mode rests
upon the fact that there is no need to build/tear down a separate IOMMU
page table for each DMA buffer.

Though multiple PASIDs and domains per device could be supported, the
lack of compelling usages leads to a single PASID option for now.
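The trust-based choice between the two sub-modes can be summarized in a small mock. Names here are illustrative, not driver API: the point is that untrusted devices get a private unmanaged domain (per-buffer map/unmap) while trusted devices share the global KVA domain (no per-buffer page-table work).

```c
#include <assert.h>
#include <stdbool.h>

enum mock_domain_type { DOM_UNMANAGED = 1, DOM_KVA = 2 };

/* Mirrors the trust-based choice described in this patch: untrusted
 * devices get strict mode (private unmanaged domain), trusted devices
 * get fast mode (shared global KVA domain). */
static enum mock_domain_type mock_pick_kva_domain(bool untrusted)
{
	return untrusted ? DOM_UNMANAGED : DOM_KVA;
}

/* Strict mode pays one map/unmap per DMA buffer; fast mode pays none,
 * which is the performance advantage the cover text claims. */
static int mock_map_cost(enum mock_domain_type t, int nbufs)
{
	return (t == DOM_UNMANAGED) ? nbufs : 0;
}
```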

Signed-off-by: Jacob Pan 
---
 drivers/iommu/dma-iommu.c | 75 ++-
 drivers/iommu/iommu.c |  6 
 include/linux/dma-iommu.h |  6 ++--
 include/linux/iommu.h |  6 
 4 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 490731659def..5b25dbcef8ee 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -174,9 +174,15 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL(iommu_put_dma_cookie);
 
+static bool dev_is_untrusted(struct device *dev)
+{
+   return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
+}
+
 /**
  * iommu_dma_pasid_enable --Enable device DMA request with PASID
  * @dev:   Device to be enabled
+ * @domain:IOMMU domain returned for KVA mode mapping
  * @mode:  DMA addressing mode
  *
  * The following mutually exclusive DMA addressing modes are supported:
@@ -189,10 +195,25 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
  * tables. PCI requester ID (RID) and RID+PASID will be pointed to the same
  * PGD. i.e. the default DMA domain will be used by both RID and RID+PASID.
  *
+ *  3. KVA mode. DMA address == CPU virtual address. There are two sub-modes:
+ * strict mode and fast mode.
+ * The strict mode is intended for the untrusted devices, where DMA address
+ * is identical to KVA but restricted per device on the supervisor PASID.
+ * The fast mode is for trusted devices where its DMA is only restricted
+ * by the kernel page tables used by the CPU. iommu_domains with UNMANAGED
+ * and KVA types are returned respectively. They are used by iommu_map_kva().
+ *
+ * The performance advantage of the fast mode (chosen when the device is
+ * trusted or allowed by the user) relies on the fact that there is no need
+ * to build/teardown separate IOMMU page tables for the KVA mapping.
+ *
+ * Though multiple PASIDs and domains per device could be supported, the
+ * lack of compelling usages leads to a single PASID option for now.
+ *
  * @return the supervisor PASID to be used for DMA. Or INVALID_IOASID upon
  * failure.
  */
-int iommu_dma_pasid_enable(struct device *dev,
+int iommu_dma_pasid_enable(struct device *dev, struct iommu_domain **domain,
   enum iommu_dma_pasid_mode mode)
 {
int pasid = INVALID_IOASID;
@@ -201,8 +222,37 @@ int iommu_dma_pasid_enable(struct device *dev,
/* TODO: only allow a single mode each time, track PASID DMA enabling
 * status per device. Perhaps add a flag in struct device.dev_iommu.
 */
+   if (mode == IOMMU_DMA_PASID_KVA) {
+   if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX)) {
+   dev_err(dev, "No aux domain support");
+   goto exit;
+   };
+   /*
+* Untrusted devices gets an unmanaged domain which will be
+* returned to the caller for strict IOMMU API KVA mapping.
+* Trusted device gets a special KVA domain with init_mm.pgd
+* assigned.
+*/
+   if (dev_is_untrusted(dev))
+   dom = iommu_domain_alloc(dev->bus);
+   else
+   dom = iommu_domain_alloc_kva(dev->bus);
+   if (!dom) {
+   dev_err(dev, "No KVA iommu domain allocated");
+   goto exit_disable_aux;
+   }
+   if (iommu_aux_attach_device(dom, dev)) {
+   dev_err(dev, "Failed to attach KVA iommu domain");
+   goto exit_free_domain;
+   };
+   pasid = iommu_aux_get_pasid(dom, dev);
+
+   dev_dbg(dev, "KVA mode pasid %d", pasi

[RFC 2/7] dma-iommu: Add API for DMA request with PASID

2021-09-21 Thread Jacob Pan
DMA-API is the standard way for device drivers to perform in-kernel
DMA requests. If security is on, isolation is enforced by the IOMMU at
requester ID (RID) granularity. With the introduction of process address
space ID (PASID), DMA security is enforced per RID+PASID
granularity. On some modern platforms, DMA request with PASID is the
only option available. e.g. The shared work queues in Intel Data
Streaming Accelerators (DSA).
To support DMA with PASID while maintaining the ubiquitous usage of DMA
API, this patch introduces a new IOMMU DMA API that supports two
addressing modes:
1. Physical address or bypass mode. This is similar to DMA direct mode,
where trusted devices can DMA straight through the IOMMU.
2. IOVA mode that abides by the DMA API. Once set up, callers can use the
DMA API as-is. DMA requests w/ and w/o PASID will be mapped by the same
page tables, i.e. the default DMA domain will be used by both RID and
RID+PASID.
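The per-mode prerequisites a vendor callback would enforce can be sketched as follows. This is a hypothetical userspace mock: the capability names and the exact checks are illustrative, though they follow the shape of the series' Intel `enable_pasid_dma` implementation (bypass needs HW pass-through, IOVA needs first-level translation).

```c
#include <assert.h>
#include <stdbool.h>

enum mock_pasid_mode { MODE_BYPASS = 1, MODE_IOVA = 2 };

/* Capability bits a vendor driver would consult before programming the
 * supervisor PASID entry (illustrative names). */
struct mock_caps {
	bool hw_pass_through;
	bool first_level;
};

/* Mirrors the shape of the vendor enable_pasid_dma() callback: each
 * addressing mode has a hard prerequisite, otherwise setup is refused. */
static int mock_enable_pasid_dma(const struct mock_caps *c, int mode)
{
	switch (mode) {
	case MODE_BYPASS:
		return c->hw_pass_through ? 0 : -19; /* -ENODEV */
	case MODE_IOVA:
		return c->first_level ? 0 : -22;     /* -EINVAL */
	default:
		return -22;                          /* unknown mode */
	}
}
```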

Signed-off-by: Jacob Pan 
---
 drivers/iommu/dma-iommu.c | 54 +++
 include/linux/dma-iommu.h | 12 +
 include/linux/iommu.h |  2 ++
 3 files changed, 68 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..490731659def 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -174,6 +174,60 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL(iommu_put_dma_cookie);
 
+/**
+ * iommu_dma_pasid_enable --Enable device DMA request with PASID
+ * @dev:   Device to be enabled
+ * @mode:  DMA addressing mode
+ *
+ * The following mutually exclusive DMA addressing modes are supported:
+ *
+ *  1. Physical address or bypass mode. This is similar to DMA direct where
+ * trusted devices can DMA pass-through IOMMU.
+ *
+ *  2. IOVA mode. This abides by the DMA API. Once set up, callers can use
+ * the DMA API as-is. DMA requests w/ and w/o PASID will be mapped by the same
+ * page tables. PCI requester ID (RID) and RID+PASID will point to the same
+ * PGD, i.e. the default DMA domain will be used by both RID and RID+PASID.
+ *
+ * Return: the supervisor PASID to be used for DMA, or INVALID_IOASID upon
+ * failure.
+ */
+int iommu_dma_pasid_enable(struct device *dev,
+  enum iommu_dma_pasid_mode mode)
+{
+   int pasid = INVALID_IOASID;
+   struct iommu_domain *dom = NULL;
+
+   /* TODO: only allow a single mode each time, track PASID DMA enabling
+* status per device. Perhaps add a flag in struct device.dev_iommu.
+*/
+
+   /* Call vendor drivers to handle IOVA, bypass mode */
+   dom = iommu_get_domain_for_dev(dev);
+   if (dom->ops->enable_pasid_dma(dev, IOASID_DMA_PASID, mode)) {
+   dev_dbg(dev, "Failed to enable DMA pasid in mode %d\n", mode);
+   goto exit;
+   }
+   pasid = IOASID_DMA_PASID;
+
+   dev_dbg(dev, "Enable DMA pasid %d in mode %d\n", pasid, mode);
+exit:
+   return pasid;
+}
+EXPORT_SYMBOL(iommu_dma_pasid_enable);
+
+int iommu_dma_pasid_disable(struct device *dev)
+{
+   struct iommu_domain *dom;
+
+   /* Call vendor iommu ops to clean up supervisor PASID context */
+   dom = iommu_get_domain_for_dev(dev);
+
+   return dom->ops->disable_pasid_dma(dev);
+}
+EXPORT_SYMBOL(iommu_dma_pasid_disable);
+
 /**
  * iommu_dma_get_resv_regions - Reserved region driver helper
  * @dev: Device from iommu_get_resv_regions()
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 6e75a2d689b4..3c1555e0fd51 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -17,6 +17,18 @@
 int iommu_get_dma_cookie(struct iommu_domain *domain);
 int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
 void iommu_put_dma_cookie(struct iommu_domain *domain);
+enum iommu_dma_pasid_mode {
+   /* Pass-through mode, use physical address */
+   IOMMU_DMA_PASID_BYPASS = 1,
+   /* Compatible with DMA APIs, same mapping as DMA w/o PASID */
+   IOMMU_DMA_PASID_IOVA,
+};
+/* For devices that can do DMA request with PASID, setup the system
+ * based on the address mode selected.
+ */
+int iommu_dma_pasid_enable(struct device *dev,
+  enum iommu_dma_pasid_mode mode);
+int iommu_dma_pasid_disable(struct device *dev);
 
 /* Setup call for arch DMA mapping code */
 void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..610cbfd03e6b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -283,6 +283,8 @@ struct iommu_ops {
 
int (*def_domain_type)(struct device *dev);
 
+   int (*enable_pasid_dma)(struct device *dev, u32 pasid, int mode);
+   int (*disable_pasid_dma)(struct device *dev);
unsigned long pgsize_bitmap;
struct module *owner;
 };
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-09-21 Thread Jacob Pan
Hi Joerg/Jason/Christoph et all,

The current in-kernel supervisor PASID support is based on the SVM/SVA
machinery in sva-lib. Kernel SVA is achieved by extending a special flag
to indicate that the binding of the device and a page table should be
performed on init_mm instead of the mm of the current process. Page
requests and other
differences between user and kernel SVA are handled as special cases.

This unrestricted binding with the kernel page table is being challenged
on security grounds and for breaking the convention that in-kernel DMA
must be compatible with the DMA API.
(https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
There is also the lack of IOTLB synchronization upon kernel page table updates.

This patchset is trying to address these concerns by having an explicit DMA
API compatible model while continuing to support in-kernel use of DMA requests
with PASID. Specifically, the following DMA-IOMMU APIs are introduced:

int iommu_dma_pasid_enable/disable(struct device *dev,
   struct iommu_domain **domain,
   enum iommu_dma_pasid_mode mode);
int iommu_map/unmap_kva(struct iommu_domain *domain,
void *cpu_addr, size_t size, int prot);

The following three addressing modes are supported with example API usages
by device drivers.

1. Physical address (bypass) mode. Similar to DMA direct where trusted devices
can DMA pass through IOMMU on a per PASID basis.
Example:
pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
/* Use the returning PASID and PA for work submission */

2. IOVA mode. DMA API compatible. Map a supervisor PASID the same way as the
PCI requester ID (RID)
Example:
pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);
/* Use the PASID and DMA API allocated IOVA for work submission */

3. KVA mode. New kva map/unmap APIs. Support fast and strict sub-modes
transparently based on device trustworthiness.
Example:
pasid = iommu_dma_pasid_enable(dev, &domain, IOMMU_DMA_PASID_KVA);
iommu_map_kva(domain, &buf, size, prot);
/* Use the returned PASID and KVA to submit work */
Where:
Fast mode: Shared CPU page tables for trusted devices only
Strict mode: IOMMU domain returned for the untrusted device to
replicate KVA-PA mapping in IOMMU page tables.

On a per-device basis, DMA addressing and performance modes are enabled by
the device drivers. Platform information such as trustworthiness and user
command-line input could also be taken into consideration (not implemented
in this RFC).

This RFC is intended to communicate the API directions. Little testing is done
outside IDXD and DMA engine tests.

For PA and IOVA modes, the implementation is straightforward and tested with
Intel IDXD driver. But several open issues remain in KVA fast mode, so it is untested:
1. Lack of IOTLB synchronization: the kernel direct-map alias can be updated
as a result of module loading or eBPF loads. Add a kernel mmu notifier?
2. The use of the auxiliary domain for KVA map, will aux domain stay in the
long term? Is there another way to represent sub-device granularity isolation?
3. Is limiting the KVA sharing to the direct map range reasonable and
practical for all architectures?


Many thanks to Ashok Raj, Kevin Tian, and Baolu who provided feedback and
many ideas in this set.

Thanks,

Jacob

Jacob Pan (7):
  ioasid: reserve special PASID for in-kernel DMA
  dma-iommu: Add API for DMA request with PASID
  iommu/vt-d: Add DMA w/ PASID support for PA and IOVA
  dma-iommu: Add support for DMA w/ PASID in KVA
  iommu/vt-d: Add support for KVA PASID mode
  iommu: Add KVA map API
  dma/idxd: Use dma-iommu PASID API instead of SVA lib

 drivers/dma/idxd/idxd.h   |   4 +-
 drivers/dma/idxd/init.c   |  36 ++--
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
 drivers/iommu/dma-iommu.c | 123 +-
 drivers/iommu/intel/iommu.c   | 154 +-
 drivers/iommu/ioasid.c|   2 +
 drivers/iommu/iommu-sva-lib.c |   1 +
 drivers/iommu/iommu.c |  63 +++
 include/linux/dma-iommu.h |  14 ++
 include/linux/intel-iommu.h   |   7 +-
 include/linux/ioasid.h|   4 +
 include/linux/iommu.h |  13 ++
 12 files changed, 390 insertions(+), 33 deletions(-)

-- 
2.25.1



Re: [RFC 04/20] iommu: Add iommu_device_get_info interface

2021-09-21 Thread Christoph Hellwig
On Wed, Sep 22, 2021 at 10:31:47AM +0800, Lu Baolu wrote:
> Hi Jason,
>
> On 9/22/21 12:19 AM, Jason Gunthorpe wrote:
>> On Sun, Sep 19, 2021 at 02:38:32PM +0800, Liu Yi L wrote:
>>> From: Lu Baolu 
>>>
>>> This provides an interface for upper layers to get the per-device iommu
>>> attributes.
>>>
>>>  int iommu_device_get_info(struct device *dev,
>>>enum iommu_devattr attr, void *data);
>>
>> Can't we use properly typed ops and functions here instead of a void
>> *data?
>>
>> get_snoop()
>> get_page_size()
>> get_addr_width()
>
> Yeah! Above are more friendly to the upper layer callers.

The other option would be a struct with all the attributes.  Still
type safe, but not as many methods.  It'll require a little boilerplate
in the callers, though.


RE: [RFC 16/20] vfio/type1: Export symbols for dma [un]map code sharing

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 2:15 AM
> 
> On Sun, Sep 19, 2021 at 02:38:44PM +0800, Liu Yi L wrote:
> > [HACK. will fix in v2]
> >
> > There are two options to implement vfio type1v2 mapping semantics in
> > /dev/iommu.
> >
> > One is to duplicate the related code from vfio as the starting point,
> > and then merge with vfio type1 at a later time. However
> vfio_iommu_type1.c
> > has over 3000LOC with ~80% related to dma management logic, including:
> 
> I can't really see a way forward like this. I think some scheme to
> move the vfio datastructure is going to be necessary.
> 
> > - the dma map/unmap metadata management
> > - page pinning, and related accounting
> > - iova range reporting
> > - dirty bitmap retrieving
> > - dynamic vaddr update, etc.
> 
> All of this needs to be part of the iommufd anyhow..

yes

> 
> > The alternative is to consolidate type1v2 logic in /dev/iommu immediately,
> > which requires converting vfio_iommu_type1 to be a shim driver.
> 
> Another choice is that the datastructure could move and the two
> drivers could share its code and continue to exist more independently
> 

where to put the shared code?

btw this is one major open that I plan to discuss in LPC. 😊

RE: [RFC 15/20] vfio/pci: Add VFIO_DEVICE_[DE]ATTACH_IOASID

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 2:04 AM
> 
> On Sun, Sep 19, 2021 at 02:38:43PM +0800, Liu Yi L wrote:
> > This patch adds interface for userspace to attach device to specified
> > IOASID.
> >
> > Note:
> > One device can only be attached to one IOASID in this version. This is
> > on par with what vfio provides today. In the future this restriction can
> > be relaxed when multiple I/O address spaces are supported per device
> 
> ?? In VFIO the container is the IOS and the container can be shared
> with multiple devices. This needs to start at about the same
> functionality.

a device can be only attached to one container. One container can be
shared by multiple devices.

a device can be only attached to one IOASID. One IOASID can be shared
by multiple devices.

it does start at the same functionality.

> 
> > +   } else if (cmd == VFIO_DEVICE_ATTACH_IOASID) {
> 
> This should be in the core code, right? There is nothing PCI specific
> here.
> 

but if you insist on a pci-wrapper attach function, we still need something
here (e.g. with .attach_ioasid() callback)?


RE: [RFC 14/20] iommu/iommufd: Add iommufd_device_[de]attach_ioasid()

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 2:02 AM
> 
> > +static int ioas_check_device_compatibility(struct iommufd_ioas *ioas,
> > +  struct device *dev)
> > +{
> > +   bool snoop = false;
> > +   u32 addr_width;
> > +   int ret;
> > +
> > +   /*
> > +* currently we only support I/O page table with iommu enforce-
> snoop
> > +* format. Attaching a device which doesn't support this format in its
> > +* upstreaming iommu is rejected.
> > +*/
> > +   ret = iommu_device_get_info(dev,
> IOMMU_DEV_INFO_FORCE_SNOOP, &snoop);
> > +   if (ret || !snoop)
> > +   return -EINVAL;
> > +
> > +   ret = iommu_device_get_info(dev,
> IOMMU_DEV_INFO_ADDR_WIDTH, &addr_width);
> > +   if (ret || addr_width < ioas->addr_width)
> > +   return -EINVAL;
> > +
> > +   /* TODO: also need to check permitted iova ranges and pgsize
> bitmap */
> > +
> > +   return 0;
> > +}
> 
> This seems kind of weird..
> 
> I expect the iommufd to hold a SW copy of the IO page table and each
> time a new domain is to be created it should push the SW copy into the
> domain. If the domain cannot support it then the domain driver should
> naturally fail a request.
> 
> When the user changes the IO page table the SW copy is updated then
> all of the domains are updated too - again if any domain cannot
> support the change then it fails and the change is rolled back.
> 
> It seems like this is a side effect of roughly hacking in the vfio
> code?

Actually this was one open we closed in the previous design proposal, but
it looks like you have a different thought now.

vfio maintains one ioas per container. Devices in the container
can be attached to different domains (e.g. due to snoop format). Every
time when the ioas is updated, every attached domain is updated
in accordance. 

You recommended a one-ioas-one-domain model instead, i.e. any device
with a format incompatible with the one currently used in the ioas has to
be attached to a new ioas, even if the two ioas's have the same mapping.
This leads to a compatibility check at attach time.

Now you want to return to the vfio model?

> 
> > +   /*
> > +* Each ioas is backed by an iommu domain, which is allocated
> > +* when the ioas is attached for the first time and then shared
> > +* by following devices.
> > +*/
> > +   if (list_empty(&ioas->device_list)) {
> 
> Seems strange, what if the devices are forced to have different
> domains? We don't want to model that in the SW layer..

this is due to the background above

> 
> > +   /* Install the I/O page table to the iommu for this device */
> > +   ret = iommu_attach_device(domain, idev->dev);
> > +   if (ret)
> > +   goto out_domain;
> 
> This is where things start to get confusing when you talk about PASID
> as the above call needs to be some PASID centric API.

yes, for PASID a new API (e.g. iommu_attach_device_pasid()) will be added.

but here we only talk about physical device, and iommu_attach_device()
is only for physical device.

> 
> > @@ -27,6 +28,16 @@ struct iommufd_device *
> >  iommufd_bind_device(int fd, struct device *dev, u64 dev_cookie);
> >  void iommufd_unbind_device(struct iommufd_device *idev);
> >
> > +int iommufd_device_attach_ioasid(struct iommufd_device *idev, int
> ioasid);
> > +void iommufd_device_detach_ioasid(struct iommufd_device *idev, int
> ioasid);
> > +
> > +static inline int
> > +__pci_iommufd_device_attach_ioasid(struct pci_dev *pdev,
> > +  struct iommufd_device *idev, int ioasid)
> > +{
> > +   return iommufd_device_attach_ioasid(idev, ioasid);
> > +}
> 
> I think since this is taking in the iommufd_device there isn't a logical
> place to signal the PCIness

can you elaborate?

> 
> But, I think the API should at least have a kdoc that this is
> capturing the entire device and specify that for PCI this means all
> TLPs with the RID.
> 

yes, this should be documented.


RE: [RFC 12/20] iommu/iommufd: Add IOMMU_CHECK_EXTENSION

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 1:47 AM
> 
> On Sun, Sep 19, 2021 at 02:38:40PM +0800, Liu Yi L wrote:
> > As aforementioned, userspace should check extension for what formats
> > can be specified when allocating an IOASID. This patch adds such
> > interface for userspace. In this RFC, iommufd reports EXT_MAP_TYPE1V2
> > support and no no-snoop support yet.
> >
> > Signed-off-by: Liu Yi L 
> >  drivers/iommu/iommufd/iommufd.c |  7 +++
> >  include/uapi/linux/iommu.h  | 27 +++
> >  2 files changed, 34 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/iommufd.c
> b/drivers/iommu/iommufd/iommufd.c
> > index 4839f128b24a..e45d76359e34 100644
> > +++ b/drivers/iommu/iommufd/iommufd.c
> > @@ -306,6 +306,13 @@ static long iommufd_fops_unl_ioctl(struct file
> *filep,
> > return ret;
> >
> > switch (cmd) {
> > +   case IOMMU_CHECK_EXTENSION:
> > +   switch (arg) {
> > +   case EXT_MAP_TYPE1V2:
> > +   return 1;
> > +   default:
> > +   return 0;
> > +   }
> > case IOMMU_DEVICE_GET_INFO:
> > ret = iommufd_get_device_info(ictx, arg);
> > break;
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 5cbd300eb0ee..49731be71213 100644
> > +++ b/include/uapi/linux/iommu.h
> > @@ -14,6 +14,33 @@
> >  #define IOMMU_TYPE (';')
> >  #define IOMMU_BASE 100
> >
> > +/*
> > + * IOMMU_CHECK_EXTENSION - _IO(IOMMU_TYPE, IOMMU_BASE + 0)
> > + *
> > + * Check whether an uAPI extension is supported.
> > + *
> > + * It's unlikely that all planned capabilities in IOMMU fd will be ready
> > + * in one breath. User should check which uAPI extension is supported
> > + * according to its intended usage.
> > + *
> > + * A rough list of possible extensions may include:
> > + *
> > + * - EXT_MAP_TYPE1V2 for vfio type1v2 map semantics;
> > + * - EXT_DMA_NO_SNOOP for no-snoop DMA support;
> > + * - EXT_MAP_NEWTYPE for an enhanced map semantics;
> > + * - EXT_MULTIDEV_GROUP for 1:N iommu group;
> > + * - EXT_IOASID_NESTING for what the name stands;
> > + * - EXT_USER_PAGE_TABLE for user managed page table;
> > + * - EXT_USER_PASID_TABLE for user managed PASID table;
> > + * - EXT_DIRTY_TRACKING for tracking pages dirtied by DMA;
> > + * - ...
> > + *
> > + * Return: 0 if not supported, 1 if supported.
> > + */
> > +#define EXT_MAP_TYPE1V2	1
> > +#define EXT_DMA_NO_SNOOP	2
> > +#define IOMMU_CHECK_EXTENSION	_IO(IOMMU_TYPE, IOMMU_BASE + 0)
> 
> I generally advocate for a 'try and fail' approach to discovering
> compatibility.
> 
> If that doesn't work for the userspace then a query to return a
> generic capability flag is the next best idea. Each flag should
> clearly define what 'try and fail' it is talking about

We don't have a strong preference here; we just followed what vfio does
today. So Alex's opinion is appreciated here. 😊

> 
> Eg dma_no_snoop is about creating an IOS with flag NO SNOOP set
> 
> TYPE1V2 seems like nonsense

just in case other mapping protocols are introduced in the future

> 
> Not sure about the others.
> 
> IOW, this should recast to a generic 'query capabilities' IOCTL
> 
> Jason

RE: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 1:45 AM
> 
> On Sun, Sep 19, 2021 at 02:38:39PM +0800, Liu Yi L wrote:
> > This patch adds IOASID allocation/free interface per iommufd. When
> > allocating an IOASID, userspace is expected to specify the type and
> > format information for the target I/O page table.
> >
> > This RFC supports only one type (IOMMU_IOASID_TYPE_KERNEL_TYPE1V2),
> > implying a kernel-managed I/O page table with vfio type1v2 mapping
> > semantics. For this type the user should specify the addr_width of
> > the I/O address space and whether the I/O page table is created in
an iommu enforce_snoop format. enforce_snoop must be true at this point,
> > as the false setting requires additional contract with KVM on handling
> > WBINVD emulation, which can be added later.
> >
> > Userspace is expected to call IOMMU_CHECK_EXTENSION (see next patch)
> > for what formats can be specified when allocating an IOASID.
> >
> > Open:
> > - Devices on PPC platform currently use a different iommu driver in vfio.
> >   Per previous discussion they can also use vfio type1v2 as long as there
> >   is a way to claim a specific iova range from a system-wide address space.
> >   This requirement doesn't sound PPC specific, as addr_width for pci
> devices
> >   can be also represented by a range [0, 2^addr_width-1]. This RFC hasn't
> >   adopted this design yet. We hope to have formal alignment in v1
> discussion
> >   and then decide how to incorporate it in v2.
> 
> I think the request was to include a start/end IO address hint when
> creating the ios. When the kernel creates it then it can return the

is the hint a single range, or could it be multiple ranges?

> actual geometry including any holes via a query.

I'd like to see a detailed flow from David on how the uAPI works today with
existing spapr driver and what exact changes he'd like to make on this
proposed interface. Above info is still insufficient for us to think about the
right solution.

> 
> > - Currently ioasid term has already been used in the kernel
> (drivers/iommu/
> >   ioasid.c) to represent the hardware I/O address space ID in the wire. It
> >   covers both PCI PASID (Process Address Space ID) and ARM SSID (Sub-
> Stream
> >   ID). We need find a way to resolve the naming conflict between the
> hardware
> >   ID and software handle. One option is to rename the existing ioasid to be
> >   pasid or ssid, given their full names still sound generic. Appreciate more
> >   thoughts on this open!
> 
> ioas works well here I think. Use ioas_id to refer to the xarray
> index.

What about when introducing pasid to this uAPI? Then use ioas_id
for the xarray index and ioasid to represent pasid/ssid? At this point
the software handle and hardware id are mixed together and thus need
clear terminology to differentiate them.


Thanks
Kevin


RE: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 1:41 AM
> 
> On Sun, Sep 19, 2021 at 02:38:38PM +0800, Liu Yi L wrote:
> > After a device is bound to the iommufd, userspace can use this interface
> > to query the underlying iommu capability and format info for this device.
> > Based on this information the user then creates I/O address space in a
> > compatible format with the to-be-attached devices.
> >
> > Device cookie which is registered at binding time is used to mark the
> > device which is being queried here.
> >
> > Signed-off-by: Liu Yi L 
> >  drivers/iommu/iommufd/iommufd.c | 68
> +
> >  include/uapi/linux/iommu.h  | 49 
> >  2 files changed, 117 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/iommufd.c
> b/drivers/iommu/iommufd/iommufd.c
> > index e16ca21e4534..641f199f2d41 100644
> > +++ b/drivers/iommu/iommufd/iommufd.c
> > @@ -117,6 +117,71 @@ static int iommufd_fops_release(struct inode
> *inode, struct file *filep)
> > return 0;
> >  }
> >
> > +static struct device *
> > +iommu_find_device_from_cookie(struct iommufd_ctx *ictx, u64
> dev_cookie)
> > +{
> 
> We have an xarray ID for the device, why are we allowing userspace to
> use the dev_cookie as input?
> 
> Userspace should always pass in the ID. The only place dev_cookie
> should appear is if the kernel generates an event back to
> userspace. Then the kernel should return both the ID and the
> dev_cookie in the event to allow userspace to correlate it.
> 

A little background.

In an earlier design proposal we discussed two options. One was to return
a kernel-allocated ID (label) to userspace. The other was to have the user
register a cookie and use it in the iommufd uAPI. At that time the two
options were discussed as mutually exclusive and the cookie one was preferred.

Now you instead recommended a mixed option. We can follow it for
sure if nobody objects.

Thanks
Kevin


RE: [RFC 00/20] Introduce /dev/iommu for userspace I/O address space management

2021-09-21 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Tuesday, September 21, 2021 9:45 PM
> 
> On Sun, Sep 19, 2021 at 02:38:28PM +0800, Liu Yi L wrote:
> > Linux now includes multiple device-passthrough frameworks (e.g. VFIO
> and
> > vDPA) to manage secure device access from the userspace. One critical
> task
> > of those frameworks is to put the assigned device in a secure, IOMMU-
> > protected context so user-initiated DMAs are prevented from doing harm
> to
> > the rest of the system.
> 
> Some bot will probably send this too, but it has compile warnings and
> needs to be rebased to 5.15-rc1

thanks Jason, will fix the warnings. yeah, I was using 5.14 in the test, will
rebase to 5.15-rc# in next version.

Regards,
Yi Liu

> drivers/iommu/iommufd/iommufd.c:269:6: warning: variable 'ret' is used
> uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
> if (refcount_read(&ioas->refs) > 1) {
> ^~
> drivers/iommu/iommufd/iommufd.c:277:9: note: uninitialized use occurs
> here
> return ret;
>^~~
> drivers/iommu/iommufd/iommufd.c:269:2: note: remove the 'if' if its
> condition is always true
> if (refcount_read(&ioas->refs) > 1) {
> ^~~~
> drivers/iommu/iommufd/iommufd.c:253:17: note: initialize the variable 'ret'
> to silence this warning
> int ioasid, ret;
>^
> = 0
> drivers/iommu/iommufd/iommufd.c:727:7: warning: variable 'ret' is used
> uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
> if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
> ^~
> drivers/iommu/iommufd/iommufd.c:767:17: note: uninitialized use occurs
> here
> return ERR_PTR(ret);
>^~~
> drivers/iommu/iommufd/iommufd.c:727:3: note: remove the 'if' if its
> condition is always false
> if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
> 
> ^
> drivers/iommu/iommufd/iommufd.c:727:7: warning: variable 'ret' is used
> uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
> if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
> ^~~~
> drivers/iommu/iommufd/iommufd.c:767:17: note: uninitialized use occurs
> here
> return ERR_PTR(ret);
>^~~
> drivers/iommu/iommufd/iommufd.c:727:7: note: remove the '||' if its
> condition is always false
> if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
> ^~~
> drivers/iommu/iommufd/iommufd.c:717:9: note: initialize the variable 'ret'
> to silence this warning
> int ret;
>^
> = 0
> 
> Jason


RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Wednesday, September 22, 2021 9:07 AM
> 
> > From: Jason Gunthorpe 
> > Sent: Wednesday, September 22, 2021 8:55 AM
> >
> > On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote:
> > > > The opened atomic is aweful. A newly created fd should start in a
> > > > state where it has a disabled fops
> > > >
> > > > The only thing the disabled fops can do is register the device to the
> > > > iommu fd. When successfully registered the device gets the normal fops.
> > > >
> > > > The registration steps should be done under a normal lock inside the
> > > > vfio_device. If a vfio_device is already registered then further
> > > > registration should fail.
> > > >
> > > > Getting the device fd via the group fd triggers the same sequence as
> > > > above.
> > > >
> > >
> > > Above works if the group interface is also connected to iommufd, i.e.
> > > making vfio type1 as a shim. In this case we can use the registration
> > > status as the exclusive switch. But if we keep vfio type1 separate as
> > > today, then a new atomic is still necessary. This all depends on how
> > > we want to deal with vfio type1 and iommufd, and possibly what's
> > > discussed here just adds another pound to the shim option...
> >
> > No, it works the same either way, the group FD path is identical to
> > the normal FD path, it just triggers some of the state transitions
> > automatically internally instead of requiring external ioctls.
> >
> > The device FDs starts disabled, an internal API binds it to the iommu
> > via open coding with the group API, and then the rest of the APIs can
> > be enabled. Same as today.
> >

After reading your comments on patch08, I may have a clearer picture
of your suggestion. The key is to handle exclusive access at the binding
time (based on vdev->iommu_dev). Please see whether below makes 
sense:

Shared sequence:

1)  initialize the device with a parked fops;
2)  need binding (explicit or implicit) to move away from parked fops;
3)  switch to normal fops after successful binding;

1) happens at device probe.

for nongroup 2) and 3) are done together in VFIO_DEVICE_GET_IOMMUFD:

  - 2) is done by calling .bind_iommufd() callback;
  - 3) could be done within .bind_iommufd(), or via a new callback e.g.
.finalize_device(). The latter may be preferred for the group interface;
  - Two threads may open the same device simultaneously, with exclusive 
access guaranteed by iommufd_bind_device();
  - Open() after successful binding is rejected, since normal fops has been
activated. This is checked upon vdev->iommu_dev;

for group 2/3) are done together in VFIO_GROUP_GET_DEVICE_FD:

  - 2) is done by open coding bind_iommufd + attach_ioas. Create an 
iommufd_device object and record it to vdev->iommu_dev
  - 3) is done by calling .finalize_device();
  - open() once vdev->iommu_dev is valid is rejected. This also ensures
exclusive ownership with the nongroup path.

If Alex also agrees with it, this might be another mini-series to be merged
(just for group path) before this one. Doing so sort of nullifies the existing
group/container attaching process, where attach_ioas will be skipped and
now the security context is established when the device is opened.

Thanks
Kevin


Re: [RFC 04/20] iommu: Add iommu_device_get_info interface

2021-09-21 Thread Lu Baolu

Hi Jason,

On 9/22/21 12:19 AM, Jason Gunthorpe wrote:

On Sun, Sep 19, 2021 at 02:38:32PM +0800, Liu Yi L wrote:

From: Lu Baolu 

This provides an interface for upper layers to get the per-device iommu
attributes.

 int iommu_device_get_info(struct device *dev,
   enum iommu_devattr attr, void *data);


Can't we use properly typed ops and functions here instead of a void
*data?

get_snoop()
get_page_size()
get_addr_width()


Yeah! Above are more friendly to the upper layer callers.






Best regards,
baolu


RE: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 1:10 AM
> 
> On Sun, Sep 19, 2021 at 02:38:34PM +0800, Liu Yi L wrote:
> > From: Lu Baolu 
> >
> > This extends iommu core to manage security context for passthrough
> > devices. Please bear a long explanation for how we reach this design
> > instead of managing it solely in iommufd like what vfio does today.
> >
> > Devices which cannot be isolated from each other are organized into an
> > iommu group. When a device is assigned to the user space, the entire
> > group must be put in a security context so that user-initiated DMAs via
> > the assigned device cannot harm the rest of the system. No user access
> > should be granted on a device before the security context is established
> > for the group which the device belongs to.
> 
> > Managing the security context must meet below criteria:
> >
> > 1)  The group is viable for user-initiated DMAs. This implies that the
> > devices in the group must be either bound to a device-passthrough
> 
> s/a/the same/
> 
> > framework, or driver-less, or bound to a driver which is known safe
> > (not do DMA).
> >
> > 2)  The security context should only allow DMA to the user's memory and
> > devices in this group;
> >
> > 3)  After the security context is established for the group, the group
> > viability must be continuously monitored before the user relinquishes
> > all devices belonging to the group. The viability might be broken e.g.
> > when a driver-less device is later bound to a driver which does DMA.
> >
> > 4)  The security context should not be destroyed before user access
> > permission is withdrawn.
> >
> > Existing vfio introduces explicit container/group semantics in its uAPI
> > to meet above requirements. A single security context (iommu domain)
> > is created per container. Attaching group to container moves the entire
> > group into the associated security context, and vice versa. The user can
> > open the device only after group attach. A group can be detached only
> > after all devices in the group are closed. Group viability is monitored
> > by listening to iommu group events.
> >
> > Unlike vfio, iommufd adopts a device-centric design with all group
> > logistics hidden behind the fd. Binding a device to iommufd serves
> > as the contract to get security context established (and vice versa
> > for unbinding). One additional requirement in iommufd is to manage the
> > switch between multiple security contexts due to decoupled bind/attach:
> 
> This should be a precursor series that actually does clean things up
> properly. There is no reason for vfio and iommufd to differ here, if
> we are implementing this logic into the iommu layer then it should be
> deleted from the VFIO layer, not left duplicated like this.

Makes sense.

> 
> IIRC in VFIO the container is the IOAS and when the group goes to
> create the device fd it should simply do the
> iommu_device_init_user_dma() followed immediately by a call to bind
> the container IOAS as your #3.

A slight correction:

To meet vfio semantics we could do init_user_dma() at group attach
time and then bind to the container IOAS when the device fd is
created. This is because vfio requires the group to be in a security
context before the device is opened.

> 
> Then delete all the group viability stuff from vfio, relying on the
> iommu to do it.
> 
> It should have full symmetry with the iommufd.

agree

> 
> > @@ -1664,6 +1671,17 @@ static int iommu_bus_notifier(struct
> notifier_block *nb,
> > group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
> > break;
> > case BUS_NOTIFY_BOUND_DRIVER:
> > +   /*
> > +* FIXME: Alternatively the attached drivers could generically
> > +* indicate to the iommu layer that they are safe for keeping
> > +* the iommu group user viable by calling some function
> around
> > +* probe(). We could eliminate this gross BUG_ON() by
> denying
> > +* probe to non-iommu-safe driver.
> > +*/
> > +   mutex_lock(&group->mutex);
> > +   if (group->user_dma_owner_id)
> > +   BUG_ON(!iommu_group_user_dma_viable(group));
> > +   mutex_unlock(&group->mutex);
> 
> And the mini-series should fix this BUG_ON properly by interlocking
> with the driver core to simply refuse to bind a driver under these
> conditions instead of allowing userspace to crash the kernel.
> 
> That alone would be justification enough to merge this work.

yes

> 
> > +
> > +/*
> > + * IOMMU core interfaces for iommufd.
> > + */
> > +
> > +/*
> > + * FIXME: We currently simply follow vfio policy to maintain the group's
> > + * viability to user. Eventually, we should avoid below hard-coded list
> > + * by letting drivers indicate to the iommu layer that they are safe for
> > + * keeping the iommu group's user viability.
> > + */
> > +static const char * const iommu_driver_allowed[] = {
> > + 

RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, September 21, 2021 11:42 PM
> 
>  - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does
>not need locking (order it properly too, it is in the wrong order), and
>don't check for duplicate devices or dev_cookie duplication, that
>is user error and is harmless to the kernel.
> 

I'm confused here. Yes, it's user error, but we check so many user errors
and then return -EINVAL, -EBUSY, etc. Why is this one special?


RE: [RFC 05/20] vfio/pci: Register device to /dev/vfio/devices

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 5:59 AM
> 
> On Tue, Sep 21, 2021 at 03:09:29PM -0600, Alex Williamson wrote:
> 
> > the iommufd uAPI at all.  Isn't part of that work to understand how KVM
> > will be told about non-coherent devices rather than "meh, skip it in the
> > kernel"?  Also let's not forget that vfio is not only for KVM.
> 
> vfio is not only for KVM, but AFAICT the wbinv stuff is only for
> KVM... But yes, I agree this should be sorted out at this stage

If such devices are not even exposed in the new hierarchy at this stage,
I suppose sorting it out later should be fine?

> 
> > > > When the device is opened via /dev/vfio/devices, vfio-pci should
> prevent
> > > > the user from accessing the assigned device because the device is still
> > > > attached to the default domain which may allow user-initiated DMAs to
> > > > touch arbitrary place. The user access must be blocked until the device
> > > > is later bound to an iommufd (see patch 08). The binding acts as the
> > > > contract for putting the device in a security context which ensures 
> > > > user-
> > > > initiated DMAs via this device cannot harm the rest of the system.
> > > >
> > > > This patch introduces a vdev->block_access flag for this purpose. It's 
> > > > set
> > > > when the device is opened via /dev/vfio/devices and cleared after
> binding
> > > > to iommufd succeeds. mmap and r/w handlers check this flag to decide
> whether
> > > > user access should be blocked or not.
> > >
> > > This should not be in vfio_pci.
> > >
> > > AFAIK there is no condition where a vfio driver can work without being
> > > connected to some kind of iommu back end, so the core code should
> > > handle this interlock globally. A vfio driver's ops should not be
> > > callable until the iommu is connected.
> > >
> > > The only vfio_pci patch in this series should be adding a new callback
> > > op to take in an iommufd and register the pci_device as a iommufd
> > > device.
> >
> > Couldn't the same argument be made that registering a $bus device as an
> > iommufd device is a common interface that shouldn't be the
> > responsibility of the vfio device driver?
> 
> The driver needs enough involvement to signal what kind of IOMMU
> connection it wants, eg attaching to a physical device will use the
> iofd_attach_device() path, but attaching to a SW page table should use
> a different API call. PASID should also be different.

Exactly

> 
> Possibly a good arrangement is to have the core provide some generic
> ioctl ops functions 'vfio_all_device_iommufd_bind' that everything
> except mdev drivers can use so the code is not all duplicated.

Could this be a future enhancement for when we have multiple device
types supporting iommufd?

> 
> > non-group device anything more than a reservation of that device if
> > access is withheld until iommu isolation?  I also don't really want to
> > predict how ioctls might evolve to guess whether only blocking .read,
> > .write, and .mmap callbacks are sufficient.  Thanks,
> 
> This is why I said the entire fops should be blocked in a dummy fops
> so the core code keeps the vfio_device FD parked and userspace is unable to
> access the ops until device attachment, and thus IOMMU isolation, is
> completed.
> 
> Simple and easy to reason about, a parked FD is very similar to a
> closed FD.
> 

This rationale makes sense. Just the open question of how to handle
exclusive open between the group and non-group interfaces still needs
some more clarification here, especially about what a parked FD means
for the group interface (where parking is unnecessary since the
security context is already established before the device is opened).

Thanks
Kevin


RE: [RFC 05/20] vfio/pci: Register device to /dev/vfio/devices

2021-09-21 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Wednesday, September 22, 2021 5:09 AM
> 
> On Tue, 21 Sep 2021 13:40:01 -0300
> Jason Gunthorpe  wrote:
> 
> > On Sun, Sep 19, 2021 at 02:38:33PM +0800, Liu Yi L wrote:
> > > This patch exposes the device-centric interface for vfio-pci devices. To
> > > be compatible with existing users, vfio-pci exposes both legacy group
> > > interface and device-centric interface.
> > >
> > > As explained in last patch, this change doesn't apply to devices which
> > > cannot be forced to snoop cache by their upstream iommu. Such devices
> > > are still expected to be opened via the legacy group interface.
> 
> This doesn't make much sense to me.  The previous patch indicates
> there's work to be done in updating the kvm-vfio contract to understand
> DMA coherency, so you're trying to limit use cases to those where the
> IOMMU enforces coherency, but there's QEMU work to be done to support
> the iommufd uAPI at all.  Isn't part of that work to understand how KVM
> will be told about non-coherent devices rather than "meh, skip it in the
> kernel"?  Also let's not forget that vfio is not only for KVM.

The policy here is that VFIO will not expose such devices (no enforce-snoop)
in the new device hierarchy at all. In this case QEMU will fall back to the
group interface automatically and then rely on the existing contract to connect
vfio and QEMU. It doesn't need to care about whatever new contract
until such devices are exposed in the new interface.

Yes, vfio is not only for KVM. But here it's more a task split based on staging
considerations. IMO it's not necessary to further split the task into supporting
non-snoop devices for userspace drivers and then for KVM.



RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 8:55 AM
> 
> On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote:
> > > The opened atomic is awful. A newly created fd should start in a
> > > state where it has a disabled fops
> > >
> > > The only thing the disabled fops can do is register the device to the
> > > iommu fd. When successfully registered the device gets the normal fops.
> > >
> > > The registration steps should be done under a normal lock inside the
> > > vfio_device. If a vfio_device is already registered then further
> > > registration should fail.
> > >
> > > Getting the device fd via the group fd triggers the same sequence as
> > > above.
> > >
> >
> > Above works if the group interface is also connected to iommufd, i.e.
> > making vfio type1 as a shim. In this case we can use the registration
> > status as the exclusive switch. But if we keep vfio type1 separate as
> > today, then a new atomic is still necessary. This all depends on how
> > we want to deal with vfio type1 and iommufd, and possibly what's
> > discussed here just adds another pound to the shim option...
> 
> No, it works the same either way, the group FD path is identical to
> the normal FD path, it just triggers some of the state transitions
> automatically internally instead of requiring external ioctls.
> 
> The device FDs starts disabled, an internal API binds it to the iommu
> via open coding with the group API, and then the rest of the APIs can
> be enabled. Same as today.
> 

Still a bit confused. If vfio type1 also connects to iommufd, whether
the device is registered can be centrally checked based on whether
an iommu_ctx is recorded. But if type1 doesn't talk to iommufd at
all, don't we still need to introduce a new state (calling it 'opened' or
'registered') to protect the two interfaces? In this case what is the
point of keeping the device FD disabled even for the group path?


RE: [RFC 03/20] vfio: Add vfio_[un]register_device()

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 9:00 AM
> 
> On Wed, Sep 22, 2021 at 12:54:02AM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Wednesday, September 22, 2021 12:01 AM
> > >
> > > >  One open about how to organize the device nodes under
> > > /dev/vfio/devices/.
> > > > This RFC adopts a simple policy by keeping a flat layout with mixed
> > > devname
> > > > from all kinds of devices. The prerequisite of this model is that
> devnames
> > > > from different bus types are unique formats:
> > >
> > > This isn't reliable, the devname should just be vfio0, vfio1, etc
> > >
> > > The userspace can learn the correct major/minor by inspecting the
> > > sysfs.
> > >
> > > This whole concept should disappear into the prior patch that adds the
> > > struct device in the first place, and I think most of the code here
> > > can be deleted once the struct device is used properly.
> > >
> >
> > Can you help elaborate above flow? This is one area where we need
> > more guidance.
> >
> > When Qemu accepts an option "-device vfio-pci,host=:BB:DD.F",
> > how does Qemu identify which vfio0/1/... is associated with the specified
> > :BB:DD.F?
> 
> When done properly in the kernel the file:
> 
> /sys/bus/pci/devices/:BB:DD.F/vfio/vfioX/dev
> 
> Will contain the major:minor of the VFIO device.
> 
> Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
> that the major:minor matches.

ah, that's the trick.

> 
> in the above pattern "pci" and ":BB:DD.FF" are the arguments passed
> to qemu.
> 
> You can look at this for some general over engineered code to handle
> opening from a sysfs handle like above:
> 
> https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c
> 

Will check. Thanks for the suggestion.


Re: [RFC 03/20] vfio: Add vfio_[un]register_device()

2021-09-21 Thread Jason Gunthorpe via iommu
On Wed, Sep 22, 2021 at 12:54:02AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, September 22, 2021 12:01 AM
> > 
> > >  One open about how to organize the device nodes under
> > /dev/vfio/devices/.
> > > This RFC adopts a simple policy by keeping a flat layout with mixed
> > devname
> > > from all kinds of devices. The prerequisite of this model is that devnames
> > > from different bus types are unique formats:
> > 
> > This isn't reliable, the devname should just be vfio0, vfio1, etc
> > 
> > The userspace can learn the correct major/minor by inspecting the
> > sysfs.
> > 
> > This whole concept should disappear into the prior patch that adds the
> > struct device in the first place, and I think most of the code here
> > can be deleted once the struct device is used properly.
> > 
> 
> Can you help elaborate above flow? This is one area where we need
> more guidance.
> 
> When Qemu accepts an option "-device vfio-pci,host=:BB:DD.F",
> how does Qemu identify which vfio0/1/... is associated with the specified 
> :BB:DD.F? 

When done properly in the kernel the file:

/sys/bus/pci/devices/:BB:DD.F/vfio/vfioX/dev

Will contain the major:minor of the VFIO device.

Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
that the major:minor matches.

in the above pattern "pci" and ":BB:DD.FF" are the arguments passed
to qemu.

You can look at this for some general over engineered code to handle
opening from a sysfs handle like above:

https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c

Jason


RE: [RFC 03/20] vfio: Add vfio_[un]register_device()

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 8:54 AM
> 
> On Tue, Sep 21, 2021 at 11:10:15PM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Wednesday, September 22, 2021 12:01 AM
> > >
> > > On Sun, Sep 19, 2021 at 02:38:31PM +0800, Liu Yi L wrote:
> > > > With /dev/vfio/devices introduced, now a vfio device driver has three
> > > > options to expose its device to userspace:
> > > >
> > > > a)  only legacy group interface, for devices which haven't been moved
> to
> > > > iommufd (e.g. platform devices, sw mdev, etc.);
> > > >
> > > > b)  both legacy group interface and new device-centric interface, for
> > > > devices which supports iommufd but also wants to keep backward
> > > > compatibility (e.g. pci devices in this RFC);
> > > >
> > > > c)  only new device-centric interface, for new devices which don't carry
> > > > backward compatibility burden (e.g. hw mdev/subdev with pasid);
> > >
> > > We shouldn't have 'b'? Where does it come from?
> >
> > a vfio-pci device can be opened via the existing group interface. if no b) 
> > it
> > means legacy vfio userspace can never use vfio-pci device any more
> > once the latter is moved to iommufd.
> 
> Sorry, I think I meant a, which I guess you will say is SW mdev devices

We listed a) here in case we don't want to move all vfio device types to
use iommufd in one breath. It's supposed to be a type valid only in this
transition phase. In the end only b) and c) are valid.

> 
> But even so, I think the way forward here is to still always expose
> the device /dev/vfio/devices/X and some devices may not allow iommufd
> usage initially.
> 
> Providing an ioctl to bind to a normal VFIO container or group might
> allow a reasonable fallback in userspace..
> 

But doesn't a new ioctl still imply breaking existing vfio userspace?

Thanks
Kevin


RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Wednesday, September 22, 2021 3:56 AM
> 
> On Sun, 19 Sep 2021 14:38:30 +0800
> Liu Yi L  wrote:
> 
> > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> > userspace to directly open a vfio device w/o relying on container/group
> > (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> > iommufd (more specifically in iommu core by this RFC) in a device-centric
> > manner.
> >
> > In case a device is exposed in both legacy and new interfaces (see next
> > patch for how to decide it), this patch also ensures that when the device
> > is already opened via one interface then the other one must be blocked.
> >
> > Signed-off-by: Liu Yi L 
> > ---
> >  drivers/vfio/vfio.c  | 228 +++
> >  include/linux/vfio.h |   2 +
> >  2 files changed, 213 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 02cc51ce6891..84436d7abedd 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> ...
> > @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = {
> > .mode = S_IRUGO | S_IWUGO,
> >  };
> >
> > +static char *vfio_device_devnode(struct device *dev, umode_t *mode)
> > +{
> > +   return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > +}
> 
> dev_name() doesn't provide us with any uniqueness guarantees, so this
> could potentially generate naming conflicts.  The similar scheme for
> devices within an iommu group appends an instance number if a conflict
> occurs, but that solution doesn't work here where the name isn't just a
> link to the actual device.  Devices within an iommu group are also
> likely associated within a bus_type, so the potential for conflict is
> pretty negligible, that's not the case as vfio is adopted for new
> device types.  Thanks,
> 

This is also our concern. Thanks for confirming it. We'd appreciate it if
you can help think of a better alternative to deal with it.

Thanks
Kevin


Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Jason Gunthorpe via iommu
On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote:
> > The opened atomic is awful. A newly created fd should start in a
> > state where it has a disabled fops
> > 
> > The only thing the disabled fops can do is register the device to the
> > iommu fd. When successfully registered the device gets the normal fops.
> > 
> > The registration steps should be done under a normal lock inside the
> > vfio_device. If a vfio_device is already registered then further
> > registration should fail.
> > 
> > Getting the device fd via the group fd triggers the same sequence as
> > above.
> > 
> 
> Above works if the group interface is also connected to iommufd, i.e.
> making vfio type1 as a shim. In this case we can use the registration
> status as the exclusive switch. But if we keep vfio type1 separate as
> today, then a new atomic is still necessary. This all depends on how
> we want to deal with vfio type1 and iommufd, and possibly what's
> discussed here just adds another pound to the shim option...

No, it works the same either way, the group FD path is identical to
the normal FD path, it just triggers some of the state transitions
automatically internally instead of requiring external ioctls.

The device FDs starts disabled, an internal API binds it to the iommu
via open coding with the group API, and then the rest of the APIs can
be enabled. Same as today.

Jason
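The "disabled fops until registration" flow described above can be reduced to a tiny state machine. This is a toy model with invented names — the real series would swap the actual file_operations, not a flag:

```c
/*
 * Toy model of a device fd that starts "parked": the only accepted
 * operation is binding to an iommufd; everything else is refused
 * until the bind establishes the security context. Names are
 * invented for illustration.
 */
enum vfio_fd_state { VFIO_FD_PARKED, VFIO_FD_ENABLED };

struct vfio_device_fd {
	enum vfio_fd_state state;
	int iommufd;			/* -1 until bound */
};

/* The one op a parked fd accepts: bind the device to an iommufd. */
static int vfio_device_bind_iommufd(struct vfio_device_fd *df, int iommufd)
{
	if (df->state != VFIO_FD_PARKED)
		return -1;		/* already bound; refuse re-registration */
	df->iommufd = iommufd;
	df->state = VFIO_FD_ENABLED;	/* normal fops now reachable */
	return 0;
}

/* Any other op (read/write/mmap/ioctl) is refused while parked. */
static int vfio_device_access(const struct vfio_device_fd *df)
{
	return df->state == VFIO_FD_ENABLED ? 0 : -1;
}
```

The group-fd path would drive the same transition internally (open-coded bind) instead of via an external ioctl, which is why both paths can share one exclusivity check.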


RE: [RFC 03/20] vfio: Add vfio_[un]register_device()

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 12:01 AM
> 
> >  One open about how to organize the device nodes under
> /dev/vfio/devices/.
> > This RFC adopts a simple policy by keeping a flat layout with mixed
> devname
> > from all kinds of devices. The prerequisite of this model is that devnames
> > from different bus types are unique formats:
> 
> This isn't reliable, the devname should just be vfio0, vfio1, etc
> 
> The userspace can learn the correct major/minor by inspecting the
> sysfs.
> 
> This whole concept should disappear into the prior patch that adds the
> struct device in the first place, and I think most of the code here
> can be deleted once the struct device is used properly.
> 

Can you help elaborate above flow? This is one area where we need
more guidance.

When Qemu accepts an option "-device vfio-pci,host=:BB:DD.F",
how does Qemu identify which vfio0/1/... is associated with the specified 
:BB:DD.F? 

Thanks
Kevin


Re: [RFC 03/20] vfio: Add vfio_[un]register_device()

2021-09-21 Thread Jason Gunthorpe via iommu
On Tue, Sep 21, 2021 at 11:10:15PM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, September 22, 2021 12:01 AM
> > 
> > On Sun, Sep 19, 2021 at 02:38:31PM +0800, Liu Yi L wrote:
> > > With /dev/vfio/devices introduced, now a vfio device driver has three
> > > options to expose its device to userspace:
> > >
> > > a)  only legacy group interface, for devices which haven't been moved to
> > > iommufd (e.g. platform devices, sw mdev, etc.);
> > >
> > > b)  both legacy group interface and new device-centric interface, for
> > > devices which supports iommufd but also wants to keep backward
> > > compatibility (e.g. pci devices in this RFC);
> > >
> > > c)  only new device-centric interface, for new devices which don't carry
> > > backward compatibility burden (e.g. hw mdev/subdev with pasid);
> > 
> > We shouldn't have 'b'? Where does it come from?
> 
> a vfio-pci device can be opened via the existing group interface. if no b) it 
> means legacy vfio userspace can never use vfio-pci device any more
> once the latter is moved to iommufd.

Sorry, I think I meant a, which I guess you will say is SW mdev devices

But even so, I think the way forward here is to still always expose
the device /dev/vfio/devices/X and some devices may not allow iommufd
usage initially.

Providing an ioctl to bind to a normal VFIO container or group might
allow a reasonable fallback in userspace..

Jason


RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, September 21, 2021 11:57 PM
> 
> On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote:
> > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> > userspace to directly open a vfio device w/o relying on container/group
> > (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> > iommufd (more specifically in iommu core by this RFC) in a device-centric
> > manner.
> >
> > In case a device is exposed in both legacy and new interfaces (see next
> > patch for how to decide it), this patch also ensures that when the device
> > is already opened via one interface then the other one must be blocked.
> >
> > Signed-off-by: Liu Yi L 
> > ---
> >  drivers/vfio/vfio.c  | 228 +++
> >  include/linux/vfio.h |   2 +
> >  2 files changed, 213 insertions(+), 17 deletions(-)
> 
> > +static int vfio_init_device_class(void)
> > +{
> > +   int ret;
> > +
> > +   mutex_init(&vfio.device_lock);
> > +   idr_init(&vfio.device_idr);
> > +
> > +   /* /dev/vfio/devices/$DEVICE */
> > +   vfio.device_class = class_create(THIS_MODULE, "vfio-device");
> > +   if (IS_ERR(vfio.device_class))
> > +   return PTR_ERR(vfio.device_class);
> > +
> > +   vfio.device_class->devnode = vfio_device_devnode;
> > +
> > +   ret = alloc_chrdev_region(&vfio.device_devt, 0, MINORMASK + 1,
> "vfio-device");
> > +   if (ret)
> > +   goto err_alloc_chrdev;
> > +
> > +   cdev_init(&vfio.device_cdev, &vfio_device_fops);
> > +   ret = cdev_add(&vfio.device_cdev, vfio.device_devt, MINORMASK +
> 1);
> > +   if (ret)
> > +   goto err_cdev_add;
> 
> Huh? This is not how cdevs are used. This patch needs rewriting.
> 
> The struct vfio_device should gain a 'struct device' and 'struct cdev'
> as non-pointer members
> 
> vfio register path should end up doing cdev_device_add() for each
> vfio_device
> 
> vfio_unregister path should do cdev_device_del()
> 
> No idr should be needed, an ida is used to allocate minor numbers
> 
> The struct device release function should trigger a kfree which
> requires some reworking of the callers
> 
> vfio_init_group_dev() should do a device_initialize()
> vfio_uninit_group_dev() should do a device_put()

All above are good suggestions!

> 
> The opened atomic is awful. A newly created fd should start in a
> state where it has a disabled fops
> 
> The only thing the disabled fops can do is register the device to the
> iommu fd. When successfully registered the device gets the normal fops.
> 
> The registration steps should be done under a normal lock inside the
> vfio_device. If a vfio_device is already registered then further
> registration should fail.
> 
> Getting the device fd via the group fd triggers the same sequence as
> above.
> 

Above works if the group interface is also connected to iommufd, i.e.
making vfio type1 as a shim. In this case we can use the registration
status as the exclusive switch. But if we keep vfio type1 separate as
today, then a new atomic is still necessary. This all depends on how
we want to deal with vfio type1 and iommufd, and possibly what's
discussed here just adds another pound to the shim option...

Thanks
Kevin


RE: [RFC 03/20] vfio: Add vfio_[un]register_device()

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 12:01 AM
> 
> On Sun, Sep 19, 2021 at 02:38:31PM +0800, Liu Yi L wrote:
> > With /dev/vfio/devices introduced, now a vfio device driver has three
> > options to expose its device to userspace:
> >
> > a)  only legacy group interface, for devices which haven't been moved to
> > iommufd (e.g. platform devices, sw mdev, etc.);
> >
> > b)  both legacy group interface and new device-centric interface, for
> > devices which supports iommufd but also wants to keep backward
> > compatibility (e.g. pci devices in this RFC);
> >
> > c)  only new device-centric interface, for new devices which don't carry
> > backward compatibility burden (e.g. hw mdev/subdev with pasid);
> 
> We shouldn't have 'b'? Where does it come from?

A vfio-pci device can be opened via the existing group interface. If there
is no b), it means legacy vfio userspace can never use a vfio-pci device
any more once the latter is moved to iommufd.

Thanks
Kevin


Re: [RFC 05/20] vfio/pci: Register device to /dev/vfio/devices

2021-09-21 Thread Jason Gunthorpe via iommu
On Tue, Sep 21, 2021 at 03:09:29PM -0600, Alex Williamson wrote:

> the iommufd uAPI at all.  Isn't part of that work to understand how KVM
> will be told about non-coherent devices rather than "meh, skip it in the
> kernel"?  Also let's not forget that vfio is not only for KVM.

vfio is not only for KVM, but AFAICT the wbinv stuff is only for
KVM... But yes, I agree this should be sorted out at this stage

> > > When the device is opened via /dev/vfio/devices, vfio-pci should prevent
> > > the user from accessing the assigned device because the device is still
> > > attached to the default domain which may allow user-initiated DMAs to
> > > touch arbitrary place. The user access must be blocked until the device
> > > is later bound to an iommufd (see patch 08). The binding acts as the
> > > contract for putting the device in a security context which ensures user-
> > > initiated DMAs via this device cannot harm the rest of the system.
> > > 
> > > This patch introduces a vdev->block_access flag for this purpose. It's set
> > > when the device is opened via /dev/vfio/devices and cleared after binding
> > > to iommufd succeeds. mmap and r/w handlers check this flag to decide 
> > > whether
> > > user access should be blocked or not.  
> > 
> > This should not be in vfio_pci.
> > 
> > AFAIK there is no condition where a vfio driver can work without being
> > connected to some kind of iommu back end, so the core code should
> > handle this interlock globally. A vfio driver's ops should not be
> > callable until the iommu is connected.
> > 
> > The only vfio_pci patch in this series should be adding a new callback
> > op to take in an iommufd and register the pci_device as a iommufd
> > device.
> 
> Couldn't the same argument be made that registering a $bus device as an
> iommufd device is a common interface that shouldn't be the
> responsibility of the vfio device driver? 

The driver needs enough involvement to signal what kind of IOMMU
connection it wants, eg attaching to a physical device will use the
iofd_attach_device() path, but attaching to a SW page table should use
a different API call. PASID should also be different.

Possibly a good arrangement is to have the core provide some generic
ioctl ops functions 'vfio_all_device_iommufd_bind' that everything
except mdev drivers can use so the code is not all duplicated.

> non-group device anything more than a reservation of that device if
> access is withheld until iommu isolation?  I also don't really want to
> predict how ioctls might evolve to guess whether only blocking .read,
> .write, and .mmap callbacks are sufficient.  Thanks,

This is why I said the entire fops should be blocked in a dummy fops
so the core code keeps the vfio_device FD parked and userspace is unable to
access the ops until device attachment, and thus IOMMU isolation, is
completed.

Simple and easy to reason about, a parked FD is very similar to a
closed FD.

Jason


Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Kirill A. Shutemov
On Tue, Sep 21, 2021 at 04:43:59PM -0500, Tom Lendacky wrote:
> On 9/21/21 4:34 PM, Kirill A. Shutemov wrote:
> > On Tue, Sep 21, 2021 at 11:27:17PM +0200, Borislav Petkov wrote:
> > > On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote:
> > > > I still believe calling cc_platform_has() from __startup_64() is totally
> > > > broken as it lacks proper wrapping while accessing global variables.
> > > 
> > > Well, one of the issues on the AMD side was using boot_cpu_data too
> > > early and the Intel side uses it too. Can you replace those checks with
> > > is_tdx_guest() or whatever was the helper's name which would check
> > > whether the kernel is running as a TDX guest, and see if that helps?
> > 
> > There's no need for the Intel check this early. Only AMD needs it. Maybe just
> > open-code them?
> 
> Any way you can put a gzipped/bzipped copy of your vmlinux file somewhere I
> can grab it from and take a look at it?

You can find broken vmlinux and bzImage here:

https://drive.google.com/drive/folders/1n74vUQHOGebnF70Im32qLFY8iS3wvjIs?usp=sharing

Let me know when I can remove it.

-- 
 Kirill A. Shutemov


Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Tom Lendacky via iommu

On 9/21/21 4:34 PM, Kirill A. Shutemov wrote:
> On Tue, Sep 21, 2021 at 11:27:17PM +0200, Borislav Petkov wrote:
> > On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote:
> > > I still believe calling cc_platform_has() from __startup_64() is totally
> > > broken as it lacks proper wrapping while accessing global variables.
> > 
> > Well, one of the issues on the AMD side was using boot_cpu_data too
> > early and the Intel side uses it too. Can you replace those checks with
> > is_tdx_guest() or whatever was the helper's name which would check
> > whether the kernel is running as a TDX guest, and see if that helps?
> 
> There's no need for the Intel check this early. Only AMD needs it. Maybe just
> open-code them?

Any way you can put a gzipped/bzipped copy of your vmlinux file somewhere
I can grab it from and take a look at it?


Thanks,
Tom






Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Kirill A. Shutemov
On Tue, Sep 21, 2021 at 11:27:17PM +0200, Borislav Petkov wrote:
> On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote:
> > I still believe calling cc_platform_has() from __startup_64() is totally
> > broken as it lacks proper wrapping while accessing global variables.
> 
> Well, one of the issues on the AMD side was using boot_cpu_data too
> early and the Intel side uses it too. Can you replace those checks with
> is_tdx_guest() or whatever was the helper's name which would check
> > whether the kernel is running as a TDX guest, and see if that helps?

There's no need for the Intel check this early. Only AMD needs it. Maybe just
open-code them?

-- 
 Kirill A. Shutemov


Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Borislav Petkov
On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote:
> I still believe calling cc_platform_has() from __startup_64() is totally
> broken as it lacks proper wrapping while accessing global variables.

Well, one of the issues on the AMD side was using boot_cpu_data too
early and the Intel side uses it too. Can you replace those checks with
is_tdx_guest() or whatever was the helper's name which would check
whether the kernel is running as a TDX guest, and see if that helps?

Thx.

-- 
Regards/Gruss,
Boris.



Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Kirill A. Shutemov
On Tue, Sep 21, 2021 at 07:47:15PM +0200, Borislav Petkov wrote:
> On Tue, Sep 21, 2021 at 12:04:58PM -0500, Tom Lendacky wrote:
> > Looks like instrumentation during early boot. I worked with Boris offline to
> > exclude arch/x86/kernel/cc_platform.c from some of the instrumentation and
> > that allowed an allyesconfig to boot.
> 
> And here's the lineup I have so far, I'd appreciate it if ppc and s390 folks
> could run it too:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc2-cc

Still broken for me with allyesconfig.

gcc version 11.2.0 (Gentoo 11.2.0 p1)
GNU ld (Gentoo 2.37_p1 p0) 2.37

I still believe calling cc_platform_has() from __startup_64() is totally
broken as it lacks proper wrapping while accessing global variables.

I think sme_get_me_mask() has the same problem. It just happened to work
(until the next compiler update).

This hack makes kernel boot again:

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index f98c76a1d16c..e9110a44bf1b 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 * there is no need to zero it after changing the memory encryption
 * attribute.
 */
-   if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
+   if (0 && cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;
for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
diff --git a/arch/x86/mm/mem_encrypt_identity.c 
b/arch/x86/mm/mem_encrypt_identity.c
index eff4d19f9cb4..91638ed0b1db 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -288,7 +288,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
unsigned long pgtable_area_len;
unsigned long decrypted_base;
 
-   if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
+   if (1 || !cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
return;
 
/*
-- 
 Kirill A. Shutemov


Re: [RFC 05/20] vfio/pci: Register device to /dev/vfio/devices

2021-09-21 Thread Alex Williamson
On Tue, 21 Sep 2021 13:40:01 -0300
Jason Gunthorpe  wrote:

> On Sun, Sep 19, 2021 at 02:38:33PM +0800, Liu Yi L wrote:
> > This patch exposes the device-centric interface for vfio-pci devices. To
> > be compatible with existing users, vfio-pci exposes both legacy group
> > interface and device-centric interface.
> > 
> > As explained in last patch, this change doesn't apply to devices which
> > cannot be forced to snoop cache by their upstream iommu. Such devices
> > are still expected to be opened via the legacy group interface.

This doesn't make much sense to me.  The previous patch indicates
there's work to be done in updating the kvm-vfio contract to understand
DMA coherency, so you're trying to limit use cases to those where the
IOMMU enforces coherency, but there's QEMU work to be done to support
the iommufd uAPI at all.  Isn't part of that work to understand how KVM
will be told about non-coherent devices rather than "meh, skip it in the
kernel"?  Also let's not forget that vfio is not only for KVM.
 
> > When the device is opened via /dev/vfio/devices, vfio-pci should prevent
> > the user from accessing the assigned device because the device is still
> > attached to the default domain which may allow user-initiated DMAs to
> > touch arbitrary place. The user access must be blocked until the device
> > is later bound to an iommufd (see patch 08). The binding acts as the
> > contract for putting the device in a security context which ensures user-
> > initiated DMAs via this device cannot harm the rest of the system.
> > 
> > This patch introduces a vdev->block_access flag for this purpose. It's set
> > when the device is opened via /dev/vfio/devices and cleared after binding
> > to iommufd succeeds. mmap and r/w handlers check this flag to decide whether
> > user access should be blocked or not.  
> 
> This should not be in vfio_pci.
> 
> AFAIK there is no condition where a vfio driver can work without being
> connected to some kind of iommu back end, so the core code should
> handle this interlock globally. A vfio driver's ops should not be
> callable until the iommu is connected.
> 
> The only vfio_pci patch in this series should be adding a new callback
> op to take in an iommufd and register the pci_device as a iommufd
> device.

Couldn't the same argument be made that registering a $bus device as an
iommufd device is a common interface that shouldn't be the
responsibility of the vfio device driver?  Is userspace opening the
non-group device anything more than a reservation of that device if
access is withheld until iommu isolation?  I also don't really want to
predict how ioctls might evolve to guess whether only blocking .read,
.write, and .mmap callbacks are sufficient.  Thanks,

Alex



Re: [PATCH v4 02/13] dt-bindings: memory: mediatek: Add mt8195 smi sub common

2021-09-21 Thread Rob Herring
On Tue, 14 Sep 2021 19:36:52 +0800, Yong Wu wrote:
> Add the binding for smi-sub-common. The SMI block diagram like this:
> 
>             IOMMU
>              |   |
>           smi-common
>           ----------
>           |        |
>        larb0 ...  larb7   <- max is 8
> 
> The smi-common connects with smi-larb and IOMMU. The maximum larbs number
> that connects with a smi-common is 8. If the engines number is over 8,
> sometimes we use a smi-sub-common which is nearly the same as smi-common.
> It supports up to 8 inputs and 1 output (smi-common has 2 outputs).
> 
> Something like:
> 
>                 IOMMU
>                  |   |
>               smi-common
>           --------------------
>           |          |    ...
>         larb0   sub-common ...   <- max is 8
>                 ----------
>                  |   |  ...      <- max is 8 too.
>                larb2 larb5
> 
> We don't need extra SW settings for smi-sub-common; only the sub-common has
> special clocks that need to be enabled when the engines access DRAM.
> 
> If it is a sub-common, it should have a "mediatek,smi" phandle to point to
> its smi-common. Meanwhile, the sub-common only has one GALS clock.
> 
> Signed-off-by: Yong Wu 
> ---
> change note: add "else mediatek,smi: false".
> ---
>  .../mediatek,smi-common.yaml  | 28 +++
>  1 file changed, 28 insertions(+)
> 

Reviewed-by: Rob Herring 


Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Alex Williamson
On Sun, 19 Sep 2021 14:38:30 +0800
Liu Yi L  wrote:

> This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> userspace to directly open a vfio device w/o relying on container/group
> (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> iommufd (more specifically in iommu core by this RFC) in a device-centric
> manner.
> 
> In case a device is exposed in both legacy and new interfaces (see next
> patch for how to decide it), this patch also ensures that when the device
> is already opened via one interface then the other one must be blocked.
> 
> Signed-off-by: Liu Yi L 
> ---
>  drivers/vfio/vfio.c  | 228 +++
>  include/linux/vfio.h |   2 +
>  2 files changed, 213 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 02cc51ce6891..84436d7abedd 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
...
> @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = {
>   .mode = S_IRUGO | S_IWUGO,
>  };
>  
> +static char *vfio_device_devnode(struct device *dev, umode_t *mode)
> +{
> + return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> +}

dev_name() doesn't provide us with any uniqueness guarantees, so this
could potentially generate naming conflicts.  The similar scheme for
devices within an iommu group appends an instance number if a conflict
occurs, but that solution doesn't work here where the name isn't just a
link to the actual device.  Devices within an iommu group are also
likely associated within a bus_type, so the potential for conflict is
pretty negligible, but that's not the case as vfio is adopted for new
device types.  Thanks,

Alex



Re: [RFC 16/20] vfio/type1: Export symbols for dma [un]map code sharing

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:44PM +0800, Liu Yi L wrote:
> [HACK. will fix in v2]
> 
> There are two options to implement vfio type1v2 mapping semantics in
> /dev/iommu.
> 
> One is to duplicate the related code from vfio as the starting point,
> and then merge with vfio type1 at a later time. However vfio_iommu_type1.c
> has over 3000 LOC with ~80% related to dma management logic, including:

I can't really see a way forward like this. I think some scheme to
move the vfio datastructure is going to be necessary.

> - the dma map/unmap metadata management
> - page pinning, and related accounting
> - iova range reporting
> - dirty bitmap retrieving
> - dynamic vaddr update, etc.

All of this needs to be part of the iommufd anyhow..

> The alternative is to consolidate type1v2 logic in /dev/iommu immediately,
> which requires converting vfio_iommu_type1 to be a shim driver. 

Another choice is that the datastructure could move and the two
drivers could share its code and continue to exist more independently.

Jason


Re: [RFC 15/20] vfio/pci: Add VFIO_DEVICE_[DE]ATTACH_IOASID

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:43PM +0800, Liu Yi L wrote:
> This patch adds interface for userspace to attach device to specified
> IOASID.
> 
> Note:
> One device can only be attached to one IOASID in this version. This is
> on par with what vfio provides today. In the future this restriction can
> be relaxed when multiple I/O address spaces are supported per device

?? In VFIO the container is the IOS and the container can be shared
with multiple devices. This needs to start at about the same
functionality.

> + } else if (cmd == VFIO_DEVICE_ATTACH_IOASID) {

This should be in the core code, right? There is nothing PCI specific
here.

Jason


Re: [RFC 14/20] iommu/iommufd: Add iommufd_device_[de]attach_ioasid()

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:42PM +0800, Liu Yi L wrote:
> An I/O address space takes effect in the iommu only after it's attached
> by a device. This patch provides iommufd_device_[de/at]tach_ioasid()
> helpers for this purpose. One device can be only attached to one ioasid
> at this point, but one ioasid can be attached by multiple devices.
> 
> The caller specifies the iommufd_device (returned at binding time) and
> the target ioasid when calling the helper function. Upon request, iommufd
> installs the specified I/O page table to the correct place in the IOMMU,
> according to the routing information (struct device* which represents
> RID) recorded in iommufd_device. Future variants could allow the caller
> to specify additional routing information (e.g. pasid/ssid) when multiple
> I/O address spaces are supported per device.
> 
> Open:
> Per Jason's comment in below link, bus-specific wrappers are recommended.
> This RFC implements one wrapper for pci device. But it looks that struct
> pci_device is not used at all since iommufd_device already carries all
> necessary info. So want to have another discussion on its necessity, e.g.
> whether making more sense to have bus-specific wrappers for binding, while
> leaving a common attaching helper per iommufd_device.
> https://lore.kernel.org/linux-iommu/20210528233649.gb3816...@nvidia.com/
> 
> TODO:
> When multiple devices are attached to a same ioasid, the permitted iova
> ranges and supported pgsize bitmap on this ioasid should be a common
> subset of all attached devices. iommufd needs to track such info per
> ioasid and update it every time when a new device is attached to the
> ioasid. This has not been done in this version yet, due to the temporary
> hack adopted in patch 16-18. The hack reuses vfio type1 driver which
> already includes the necessary logic for iova ranges and pgsize bitmap.
> Once we get a clear direction for those patches, that logic will be moved
> to this patch.
> 
> Signed-off-by: Liu Yi L 
>  drivers/iommu/iommufd/iommufd.c | 226 
>  include/linux/iommufd.h |  29 
>  2 files changed, 255 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c
> index e45d76359e34..25373a0e037a 100644
> +++ b/drivers/iommu/iommufd/iommufd.c
> @@ -51,6 +51,19 @@ struct iommufd_ioas {
>   bool enforce_snoop;
>   struct iommufd_ctx *ictx;
>   refcount_t refs;
> + struct mutex lock;
> + struct list_head device_list;
> + struct iommu_domain *domain;

This should just be another xarray indexed by the device id

> +/* Caller should hold ioas->lock */
> +static struct ioas_device_info *ioas_find_device(struct iommufd_ioas *ioas,
> +  struct iommufd_device *idev)
> +{
> + struct ioas_device_info *ioas_dev;
> +
> + list_for_each_entry(ioas_dev, &ioas->device_list, next) {
> + if (ioas_dev->idev == idev)
> + return ioas_dev;
> + }

Which eliminates this search. xarray with tightly packed indexes is
generally more efficient than linked lists..

> +static int ioas_check_device_compatibility(struct iommufd_ioas *ioas,
> +struct device *dev)
> +{
> + bool snoop = false;
> + u32 addr_width;
> + int ret;
> +
> + /*
> +  * currently we only support I/O page table with iommu enforce-snoop
> +  * format. Attaching a device which doesn't support this format in its
> +  * upstreaming iommu is rejected.
> +  */
> + ret = iommu_device_get_info(dev, IOMMU_DEV_INFO_FORCE_SNOOP, &snoop);
> + if (ret || !snoop)
> + return -EINVAL;
> +
> + ret = iommu_device_get_info(dev, IOMMU_DEV_INFO_ADDR_WIDTH, 
> &addr_width);
> + if (ret || addr_width < ioas->addr_width)
> + return -EINVAL;
> +
> + /* TODO: also need to check permitted iova ranges and pgsize bitmap */
> +
> + return 0;
> +}

This seems kind of weird..

I expect the iommufd to hold a SW copy of the IO page table and each
time a new domain is to be created it should push the SW copy into the
domain. If the domain cannot support it then the domain driver should
naturally fail a request.

When the user changes the IO page table the SW copy is updated then
all of the domains are updated too - again if any domain cannot
support the change then it fails and the change is rolled back.

It seems like this is a side effect of roughly hacking in the vfio
code?

> +
> +/**
> + * iommufd_device_attach_ioasid - attach device to an ioasid
> + * @idev: [in] Pointer to struct iommufd_device.
> + * @ioasid: [in] ioasid points to an I/O address space.
> + *
> + * Returns 0 for successful attach, otherwise returns error.
> + *
> + */
> +int iommufd_device_attach_ioasid(struct iommufd_device *idev, int ioasid)

Types for the ioas_id again..

> +{
> + struct iommufd_ioas *ioas;
> + struct ioas_device_info *ioas_dev;
> +

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Borislav Petkov
On Tue, Sep 21, 2021 at 12:04:58PM -0500, Tom Lendacky wrote:
> Looks like instrumentation during early boot. I worked with Boris offline to
> exclude arch/x86/kernel/cc_platform.c from some of the instrumentation and
> that allowed an allyesconfig to boot.

And here's the lineup I have so far, I'd appreciate it if ppc and s390 folks
could run it too:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc2-cc

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [RFC 12/20] iommu/iommufd: Add IOMMU_CHECK_EXTENSION

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:40PM +0800, Liu Yi L wrote:
> As aforementioned, userspace should check extension for what formats
> can be specified when allocating an IOASID. This patch adds such
> interface for userspace. In this RFC, iommufd reports EXT_MAP_TYPE1V2
> support and no no-snoop support yet.
> 
> Signed-off-by: Liu Yi L 
>  drivers/iommu/iommufd/iommufd.c |  7 +++
>  include/uapi/linux/iommu.h  | 27 +++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c
> index 4839f128b24a..e45d76359e34 100644
> +++ b/drivers/iommu/iommufd/iommufd.c
> @@ -306,6 +306,13 @@ static long iommufd_fops_unl_ioctl(struct file *filep,
>   return ret;
>  
>   switch (cmd) {
> + case IOMMU_CHECK_EXTENSION:
> + switch (arg) {
> + case EXT_MAP_TYPE1V2:
> + return 1;
> + default:
> + return 0;
> + }
>   case IOMMU_DEVICE_GET_INFO:
>   ret = iommufd_get_device_info(ictx, arg);
>   break;
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 5cbd300eb0ee..49731be71213 100644
> +++ b/include/uapi/linux/iommu.h
> @@ -14,6 +14,33 @@
>  #define IOMMU_TYPE   (';')
>  #define IOMMU_BASE   100
>  
> +/*
> + * IOMMU_CHECK_EXTENSION - _IO(IOMMU_TYPE, IOMMU_BASE + 0)
> + *
> + * Check whether an uAPI extension is supported.
> + *
> + * It's unlikely that all planned capabilities in IOMMU fd will be ready
> + * in one breath. User should check which uAPI extension is supported
> + * according to its intended usage.
> + *
> + * A rough list of possible extensions may include:
> + *
> + *   - EXT_MAP_TYPE1V2 for vfio type1v2 map semantics;
> + *   - EXT_DMA_NO_SNOOP for no-snoop DMA support;
> + *   - EXT_MAP_NEWTYPE for an enhanced map semantics;
> + *   - EXT_MULTIDEV_GROUP for 1:N iommu group;
> + *   - EXT_IOASID_NESTING for what the name stands;
> + *   - EXT_USER_PAGE_TABLE for user managed page table;
> + *   - EXT_USER_PASID_TABLE for user managed PASID table;
> + *   - EXT_DIRTY_TRACKING for tracking pages dirtied by DMA;
> + *   - ...
> + *
> + * Return: 0 if not supported, 1 if supported.
> + */
> +#define EXT_MAP_TYPE1V2  1
> +#define EXT_DMA_NO_SNOOP 2
> +#define IOMMU_CHECK_EXTENSION_IO(IOMMU_TYPE, IOMMU_BASE + 0)

I generally advocate for a 'try and fail' approach to discovering
compatibility.

If that doesn't work for the userspace then a query to return a
generic capability flag is the next best idea. Each flag should
clearly define what 'try and fail' it is talking about

Eg dma_no_snoop is about creating an IOS with flag NO SNOOP set

TYPE1V2 seems like nonsense

Not sure about the others.

IOW, this should recast to a generic 'query capabilities' IOCTL

Jason


Re: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:39PM +0800, Liu Yi L wrote:
> This patch adds IOASID allocation/free interface per iommufd. When
> allocating an IOASID, userspace is expected to specify the type and
> format information for the target I/O page table.
> 
> This RFC supports only one type (IOMMU_IOASID_TYPE_KERNEL_TYPE1V2),
> implying a kernel-managed I/O page table with vfio type1v2 mapping
> semantics. For this type the user should specify the addr_width of
> the I/O address space and whether the I/O page table is created in
> an iommu enforce_snoop format. enforce_snoop must be true at this point,
> as the false setting requires additional contract with KVM on handling
> WBINVD emulation, which can be added later.
> 
> Userspace is expected to call IOMMU_CHECK_EXTENSION (see next patch)
> for what formats can be specified when allocating an IOASID.
> 
> Open:
> - Devices on PPC platform currently use a different iommu driver in vfio.
>   Per previous discussion they can also use vfio type1v2 as long as there
>   is a way to claim a specific iova range from a system-wide address space.
>   This requirement doesn't sound PPC specific, as addr_width for pci devices
>   can be also represented by a range [0, 2^addr_width-1]. This RFC hasn't
>   adopted this design yet. We hope to have formal alignment in v1 discussion
>   and then decide how to incorporate it in v2.

I think the request was to include a start/end IO address hint when
creating the ios. When the kernel creates it then it can return the
actual geometry including any holes via a query.

> - Currently ioasid term has already been used in the kernel (drivers/iommu/
>   ioasid.c) to represent the hardware I/O address space ID in the wire. It
>   covers both PCI PASID (Process Address Space ID) and ARM SSID (Sub-Stream
>   ID). We need find a way to resolve the naming conflict between the hardware
>   ID and software handle. One option is to rename the existing ioasid to be
>   pasid or ssid, given their full names still sound generic. Appreciate more
>   thoughts on this open!

ioas works well here I think. Use ioas_id to refer to the xarray
index.

> Signed-off-by: Liu Yi L 
>  drivers/iommu/iommufd/iommufd.c | 120 
>  include/linux/iommufd.h |   3 +
>  include/uapi/linux/iommu.h  |  54 ++
>  3 files changed, 177 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c
> index 641f199f2d41..4839f128b24a 100644
> +++ b/drivers/iommu/iommufd/iommufd.c
> @@ -24,6 +24,7 @@
>  struct iommufd_ctx {
>   refcount_t refs;
>   struct mutex lock;
> + struct xarray ioasid_xa; /* xarray of ioasids */
>   struct xarray device_xa; /* xarray of bound devices */
>  };
>  
> @@ -42,6 +43,16 @@ struct iommufd_device {
>   u64 dev_cookie;
>  };
>  
> +/* Represent an I/O address space */
> +struct iommufd_ioas {
> + int ioasid;

xarray id's should consistently be u32s everywhere.

Many of the same prior comments repeated here

Jason


Re: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:38PM +0800, Liu Yi L wrote:
> After a device is bound to the iommufd, userspace can use this interface
> to query the underlying iommu capability and format info for this device.
> Based on this information the user then creates I/O address space in a
> compatible format with the to-be-attached devices.
> 
> Device cookie which is registered at binding time is used to mark the
> device which is being queried here.
> 
> Signed-off-by: Liu Yi L 
>  drivers/iommu/iommufd/iommufd.c | 68 +
>  include/uapi/linux/iommu.h  | 49 
>  2 files changed, 117 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c
> index e16ca21e4534..641f199f2d41 100644
> +++ b/drivers/iommu/iommufd/iommufd.c
> @@ -117,6 +117,71 @@ static int iommufd_fops_release(struct inode *inode, 
> struct file *filep)
>   return 0;
>  }
>  
> +static struct device *
> +iommu_find_device_from_cookie(struct iommufd_ctx *ictx, u64 dev_cookie)
> +{

We have an xarray ID for the device, why are we allowing userspace to
use the dev_cookie as input?

Userspace should always pass in the ID. The only place dev_cookie
should appear is if the kernel generates an event back to
userspace. Then the kernel should return both the ID and the
dev_cookie in the event to allow userspace to correlate it.

> +static void iommu_device_build_info(struct device *dev,
> + struct iommu_device_info *info)
> +{
> + bool snoop;
> + u64 awidth, pgsizes;
> +
> + if (!iommu_device_get_info(dev, IOMMU_DEV_INFO_FORCE_SNOOP, &snoop))
> + info->flags |= snoop ? IOMMU_DEVICE_INFO_ENFORCE_SNOOP : 0;
> +
> + if (!iommu_device_get_info(dev, IOMMU_DEV_INFO_PAGE_SIZE, &pgsizes)) {
> + info->pgsize_bitmap = pgsizes;
> + info->flags |= IOMMU_DEVICE_INFO_PGSIZES;
> + }
> +
> + if (!iommu_device_get_info(dev, IOMMU_DEV_INFO_ADDR_WIDTH, &awidth)) {
> + info->addr_width = awidth;
> + info->flags |= IOMMU_DEVICE_INFO_ADDR_WIDTH;
> + }

Another good option is to push the iommu_device_info uAPI struct down
through to the iommu driver to fill it in and forget about the crazy
enum.

A big part of thinking of this iommu interface is a way to bind the HW
IOMMU driver to a uAPI and allow the HW driver to expose its unique
functionalities.

> +static int iommufd_get_device_info(struct iommufd_ctx *ictx,
> +unsigned long arg)
> +{
> + struct iommu_device_info info;
> + unsigned long minsz;
> + struct device *dev;
> +
> + minsz = offsetofend(struct iommu_device_info, addr_width);
> +
> + if (copy_from_user(&info, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (info.argsz < minsz)
> + return -EINVAL;

All of these patterns everywhere are wrongly coded for forward/back
compatibility.

static int iommufd_get_device_info(struct iommufd_ctx *ictx,
   struct iommu_device_info __user *arg, size_t usize)
{
struct iommu_device_info info;
int ret;

if (usize < offsetofend(struct iommu_device_info, addr_flags))
   return -EINVAL;

ret = copy_struct_from_user(&info, sizeof(info), arg, usize);
if (ret)
  return ret;

'usize' should be in a 'common' header extracted by the main ioctl handler.

> +struct iommu_device_info {
> + __u32   argsz;
> + __u32   flags;
> +#define IOMMU_DEVICE_INFO_ENFORCE_SNOOP  (1 << 0) /* IOMMU enforced 
> snoop */
> +#define IOMMU_DEVICE_INFO_PGSIZES(1 << 1) /* supported page sizes */
> +#define IOMMU_DEVICE_INFO_ADDR_WIDTH (1 << 2) /* addr_width field valid */
> + __u64   dev_cookie;
> + __u64   pgsize_bitmap;
> + __u32   addr_width;
> +};

Be explicit with padding here too.

Jason


Re: [RFC 08/20] vfio/pci: Add VFIO_DEVICE_BIND_IOMMUFD

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:36PM +0800, Liu Yi L wrote:
> This patch adds VFIO_DEVICE_BIND_IOMMUFD for userspace to bind the vfio
> device to an iommufd. No VFIO_DEVICE_UNBIND_IOMMUFD interface is provided
> because it's implicitly done when the device fd is closed.
> 
> In concept a vfio device can be bound to multiple iommufds, each hosting
> a subset of I/O address spaces attached by this device. However as a
> starting point (matching current vfio), only one I/O address space is
> supported per vfio device. It implies one device can only be attached
> to one iommufd at this point.
> 
> Signed-off-by: Liu Yi L 
>  drivers/vfio/pci/Kconfig|  1 +
>  drivers/vfio/pci/vfio_pci.c | 72 -
>  drivers/vfio/pci/vfio_pci_private.h |  8 
>  include/uapi/linux/vfio.h   | 30 
>  4 files changed, 110 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 5e2e1b9a9fd3..3abfb098b4dc 100644
> +++ b/drivers/vfio/pci/Kconfig
> @@ -5,6 +5,7 @@ config VFIO_PCI
>   depends on MMU
>   select VFIO_VIRQFD
>   select IRQ_BYPASS_MANAGER
> + select IOMMUFD
>   help
> Support for the PCI VFIO bus driver.  This is required to make
> use of PCI drivers using the VFIO framework.
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 145addde983b..20006bb66430 100644
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -552,6 +552,16 @@ static void vfio_pci_release(struct vfio_device 
> *core_vdev)
>   vdev->req_trigger = NULL;
>   }
>   mutex_unlock(&vdev->igate);
> +
> + mutex_lock(&vdev->videv_lock);
> + if (vdev->videv) {
> + struct vfio_iommufd_device *videv = vdev->videv;
> +
> + vdev->videv = NULL;
> + iommufd_unbind_device(videv->idev);
> + kfree(videv);
> + }
> + mutex_unlock(&vdev->videv_lock);
>   }
>  
>   mutex_unlock(&vdev->reflck->lock);
> @@ -780,7 +790,66 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
>   container_of(core_vdev, struct vfio_pci_device, vdev);
>   unsigned long minsz;
>  
> - if (cmd == VFIO_DEVICE_GET_INFO) {
> + if (cmd == VFIO_DEVICE_BIND_IOMMUFD) {

Choosing to implement this through the ioctl multiplexer is what is
causing so much ugly gyration in the previous patches.

This should be a straightforward new function and ops:

struct iommufd_device *vfio_pci_bind_iommufd(struct vfio_device *core_vdev)
{
	iommu_dev = iommufd_bind_device(bind_data.iommu_fd,
					&vdev->pdev->dev,
					bind_data.dev_cookie);
	if (!iommu_dev)
		return ERR_PTR(-EINVAL);
	vdev->iommu_dev = iommu_dev;
	return iommu_dev;
}

static const struct vfio_device_ops vfio_pci_ops = {
	.bind_iommufd = vfio_pci_bind_iommufd,

If you do the other stuff I said then you'll notice that the
iommufd_bind_device() will provide automatic exclusivity.

The thread that sees ops->bind_device succeed will know it is the only
thread that can see that (by definition, the iommu enable-user stuff
has to be exclusive and race free), so it can go ahead and store the
iommu pointer.

The other half of the problem ('&vdev->block_access') is solved by
manipulating filp->f_op. Start with a fops that can ONLY call the
above op; when that op succeeds, switch the fops to the normal
full ops.

The same flow happens when the group fd spawns the device fd; just
parts of iommufd_bind_device() are open coded into the vfio code, but
the whole flow and sequence should be the same.
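
The "swap the fops" gating can be illustrated outside the kernel. The sketch below is a toy userspace model (all names are invented; a bare function-pointer table stands in for the real VFS `file->f_op`), showing a restricted table that only permits the bind op and is swapped for the full table once bind succeeds:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for struct file and its ops table. */
struct file;
struct file_ops {
	int (*bind)(struct file *f);   /* always available */
	int (*ioctl)(struct file *f);  /* only usable after bind */
};

struct file {
	const struct file_ops *f_op;
	int bound;
};

static int full_ioctl(struct file *f)
{
	return f->bound ? 0 : -1;      /* normal operation */
}

static const struct file_ops full_ops = {
	.bind  = NULL,                 /* no rebinding via the full table */
	.ioctl = full_ioctl,
};

static int restricted_bind(struct file *f)
{
	/* ... iommufd_bind_device() would establish the context here ... */
	f->bound = 1;
	/* On success, switch to the full ops table. */
	f->f_op = &full_ops;
	return 0;
}

static const struct file_ops restricted_ops = {
	.bind  = restricted_bind,
	.ioctl = NULL,                 /* everything else blocked until bound */
};

static int do_ioctl(struct file *f)
{
	if (!f->f_op->ioctl)
		return -25;            /* -ENOTTY: device not bound yet */
	return f->f_op->ioctl(f);
}
```

Because the swap happens only on the thread that won the exclusive bind, no other thread can observe the full ops table before the security context exists.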

> + /*
> +  * Reject the request if the device is already opened and
> +  * attached to a container.
> +  */
> + if (vfio_device_in_container(core_vdev))
> + return -ENOTTY;

This is wrongly locked

> +
> + minsz = offsetofend(struct vfio_device_iommu_bind_data, 
> dev_cookie);
> +
> + if (copy_from_user(&bind_data, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (bind_data.argsz < minsz ||
> + bind_data.flags || bind_data.iommu_fd < 0)
> + return -EINVAL;
> +
> + mutex_lock(&vdev->videv_lock);
> + /*
> +  * Allow only one iommufd per device until multiple
> +  * address spaces (e.g. vSVA) support is introduced
> +  * in the future.
> +  */
> + if (vdev->videv) {
> + mutex_unlock(&vdev->videv_lock);
> + return -EBUSY;
> + }
> +
> + idev = iommufd_bind_device(bind_data.iommu_fd,
> +&vdev->pdev->dev,
> +bind_data.dev_co

Re: [RFC 07/20] iommu/iommufd: Add iommufd_[un]bind_device()

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:35PM +0800, Liu Yi L wrote:

> +/*
> + * A iommufd_device object represents the binding relationship
> + * between iommufd and device. It is created per a successful
> + * binding request from device driver. The bound device must be
> + * a physical device so far. Subdevice will be supported later
> + * (with additional PASID information). An user-assigned cookie
> + * is also recorded to mark the device in the /dev/iommu uAPI.
> + */
> +struct iommufd_device {
> + unsigned int id;
> + struct iommufd_ctx *ictx;
> + struct device *dev; /* always be the physical device */
> + u64 dev_cookie;
>  };
>  
>  static int iommufd_fops_open(struct inode *inode, struct file *filep)
> @@ -32,15 +52,58 @@ static int iommufd_fops_open(struct inode *inode, struct 
> file *filep)
>   return -ENOMEM;
>  
>   refcount_set(&ictx->refs, 1);
> + mutex_init(&ictx->lock);
> + xa_init_flags(&ictx->device_xa, XA_FLAGS_ALLOC);
>   filep->private_data = ictx;
>  
>   return ret;
>  }
>  
> +static void iommufd_ctx_get(struct iommufd_ctx *ictx)
> +{
> + refcount_inc(&ictx->refs);
> +}

See my earlier remarks about how to structure the lifetime logic, this
ref isn't necessary.

> +static const struct file_operations iommufd_fops;
> +
> +/**
> + * iommufd_ctx_fdget - Acquires a reference to the internal iommufd context.
> + * @fd: [in] iommufd file descriptor.
> + *
> + * Returns a pointer to the iommufd context, otherwise NULL;
> + *
> + */
> +static struct iommufd_ctx *iommufd_ctx_fdget(int fd)
> +{
> + struct fd f = fdget(fd);
> + struct file *file = f.file;
> + struct iommufd_ctx *ictx;
> +
> + if (!file)
> + return NULL;
> +
> + if (file->f_op != &iommufd_fops)
> + return NULL;

Leaks the fdget

> +
> + ictx = file->private_data;
> + if (ictx)
> + iommufd_ctx_get(ictx);

Use success oriented flow

> + fdput(f);
> + return ictx;
> +}
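
For illustration, a restructured version that addresses both comments (fdput() on the type mismatch, success-oriented flow) could look like the sketch below. The VFS pieces are stubbed with toy types so the flow is checkable in isolation; the stub takes a struct file directly instead of an fd number, and all names outside iommufd_ctx_fdget() are invented:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stubs standing in for the VFS bits. */
struct file_operations { int dummy; };
struct iommufd_ctx { int refs; };
struct file {
	const struct file_operations *f_op;
	void *private_data;
};
struct fd { struct file *file; };

static const struct file_operations iommufd_fops;
static int fdput_calls;	/* counts fdput() so a leak would be observable */

static struct fd fdget_stub(struct file *f) { struct fd r = { f }; return r; }
static void fdput(struct fd f) { (void)f; fdput_calls++; }
static void iommufd_ctx_get(struct iommufd_ctx *ictx) { ictx->refs++; }

/* Success-oriented flow: fdput() is reached on the mismatch path too. */
static struct iommufd_ctx *iommufd_ctx_fdget(struct file *filp)
{
	struct fd f = fdget_stub(filp);
	struct iommufd_ctx *ictx = NULL;

	if (!f.file)
		return NULL;
	if (f.file->f_op == &iommufd_fops) {
		ictx = f.file->private_data;
		if (ictx)
			iommufd_ctx_get(ictx);
	}
	fdput(f);		/* dropped on mismatch too -- no leak */
	return ictx;
}
```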

> + */
> +struct iommufd_device *iommufd_bind_device(int fd, struct device *dev,
> +u64 dev_cookie)
> +{
> + struct iommufd_ctx *ictx;
> + struct iommufd_device *idev;
> + unsigned long index;
> + unsigned int id;
> + int ret;
> +
> + ictx = iommufd_ctx_fdget(fd);
> + if (!ictx)
> + return ERR_PTR(-EINVAL);
> +
> + mutex_lock(&ictx->lock);
> +
> + /* check duplicate registration */
> + xa_for_each(&ictx->device_xa, index, idev) {
> + if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
> + idev = ERR_PTR(-EBUSY);
> + goto out_unlock;
> + }

I can't think of a reason why this expensive check is needed.

> + }
> +
> + idev = kzalloc(sizeof(*idev), GFP_KERNEL);
> + if (!idev) {
> + ret = -ENOMEM;
> + goto out_unlock;
> + }
> +
> + /* Establish the security context */
> + ret = iommu_device_init_user_dma(dev, (unsigned long)ictx);
> + if (ret)
> + goto out_free;
> +
> + ret = xa_alloc(&ictx->device_xa, &id, idev,
> +XA_LIMIT(IOMMUFD_DEVID_MIN, IOMMUFD_DEVID_MAX),
> +GFP_KERNEL);

idev should be fully initialized before being placed in the xarray, so
this should be the last thing done.

Why not just use the standard xa_limit_32b instead of special single
use constants?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-v3: poll cmdq until it has space

2021-09-21 Thread Sieber, Fernand via iommu
Hi John,

> But is the polarity really correct? That is, if we don't have space,
> then exit with success (the function to check for space).

You are absolutely correct; this is a mistake I made while resolving conflicts
as I ported this patch to iommu/next from 5.4, where I implemented and tested
it.
It should be:

> - if (!queue_full(llq))
> + if (queue_has_space(llq, n))


> what is llq->state->val?

This is another oversight for the same reason: llq->state->val has since been
renamed to llq->val.

Will fix both of these in the next revision.
Thanks and kind regards,

--Fernand
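
Putting the two fixes together, the intended behaviour of the wait loop can be modelled with plain indices (a toy stand-in for the SMMU's wrapped prod/cons encoding; queue depth and return codes are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_DEPTH 8

struct llq { int prod; int cons; };

/* Is there space for n more entries? (toy model: unwrapped indices) */
static bool queue_has_space(const struct llq *llq, int n)
{
	return (QUEUE_DEPTH - (llq->prod - llq->cons)) >= n;
}

/*
 * One iteration of the corrected wait loop:
 *   0   -> space available, caller may stop polling
 *  -11  -> producer moved (-EAGAIN), caller must re-derive its position
 *   1   -> keep polling
 */
static int poll_step(const struct llq *llq, int n, int snapshot_prod)
{
	if (queue_has_space(llq, n))
		return 0;
	if (llq->prod != snapshot_prod)
		return -11;	/* -EAGAIN */
	return 1;
}
```

The -EAGAIN exit is the key addition: without it, a waiter whose snapshot went stale could spin forever with no other thread updating llq->val.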


From: John Garry 
Sent: Tuesday, September 21, 2021 18:22
To: Sieber, Fernand; w...@kernel.org; robin.mur...@arm.com
Cc: linux-arm-ker...@lists.infradead.org; iommu@lists.linux-foundation.org; 
linux-ker...@vger.kernel.org
Subject: RE: [EXTERNAL] [PATCH] iommu/arm-smmu-v3: poll cmdq until it has space

On 21/09/2021 12:43, Fernand Sieber wrote:
>   do {

I didn't follow the full logic of this change yet ...

>   llq->val = READ_ONCE(cmdq->q.llq.val);
> - if (!queue_full(llq))
> + if (!queue_has_space(llq, n))

But is the polarity really correct? That is, if we don't have space,
then exit with success (the function to check for space).

>   break;
>
> + /*
> +  * We must return here even if there's no space, because the 
> producer
> +  * having moved forward could mean that the last thread 
> observing the
> +  * SMMU progress has allocated space in the cmdq and moved on, 
> leaving
> +  * us in this waiting loop with no other thread updating
> +  * llq->state->val.

what is llq->state->val?

> +  */
> + if (llq->prod != prod)
> + return -EAGAIN;
> +
>   ret = queue_poll(&qp);

Thanks,
John


Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:34PM +0800, Liu Yi L wrote:
> From: Lu Baolu 
> 
> This extends iommu core to manage security context for passthrough
> devices. Please bear a long explanation for how we reach this design
> instead of managing it solely in iommufd like what vfio does today.
> 
> Devices which cannot be isolated from each other are organized into an
> iommu group. When a device is assigned to the user space, the entire
> group must be put in a security context so that user-initiated DMAs via
> the assigned device cannot harm the rest of the system. No user access
> should be granted on a device before the security context is established
> for the group which the device belongs to.

> Managing the security context must meet below criteria:
> 
> 1)  The group is viable for user-initiated DMAs. This implies that the
> devices in the group must be either bound to a device-passthrough

s/a/the same/

> framework, or driver-less, or bound to a driver which is known safe
> (not do DMA).
> 
> 2)  The security context should only allow DMA to the user's memory and
> devices in this group;
> 
> 3)  After the security context is established for the group, the group
> viability must be continuously monitored before the user relinquishes
> all devices belonging to the group. The viability might be broken e.g.
> when a driver-less device is later bound to a driver which does DMA.
> 
> 4)  The security context should not be destroyed before user access
> permission is withdrawn.
> 
> Existing vfio introduces explicit container/group semantics in its uAPI
> to meet above requirements. A single security context (iommu domain)
> is created per container. Attaching group to container moves the entire
> group into the associated security context, and vice versa. The user can
> open the device only after group attach. A group can be detached only
> after all devices in the group are closed. Group viability is monitored
> by listening to iommu group events.
> 
> Unlike vfio, iommufd adopts a device-centric design with all group
> logistics hidden behind the fd. Binding a device to iommufd serves
> as the contract to get security context established (and vice versa
> for unbinding). One additional requirement in iommufd is to manage the
> switch between multiple security contexts due to decoupled bind/attach:

This should be a precursor series that actually does clean things up
properly. There is no reason for vfio and iommufd to differ here, if
we are implementing this logic into the iommu layer then it should be
deleted from the VFIO layer, not left duplicated like this.

IIRC in VFIO the container is the IOAS and when the group goes to
create the device fd it should simply do the
iommu_device_init_user_dma() followed immediately by a call to bind
the container IOAS as your #3.

Then delete all the group viability stuff from vfio, relying on the
iommu to do it.

It should have full symmetry with the iommufd.

> @@ -1664,6 +1671,17 @@ static int iommu_bus_notifier(struct notifier_block 
> *nb,
>   group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
>   break;
>   case BUS_NOTIFY_BOUND_DRIVER:
> + /*
> +  * FIXME: Alternatively the attached drivers could generically
> +  * indicate to the iommu layer that they are safe for keeping
> +  * the iommu group user viable by calling some function around
> +  * probe(). We could eliminate this gross BUG_ON() by denying
> +  * probe to non-iommu-safe driver.
> +  */
> + mutex_lock(&group->mutex);
> + if (group->user_dma_owner_id)
> + BUG_ON(!iommu_group_user_dma_viable(group));
> + mutex_unlock(&group->mutex);

And the mini-series should fix this BUG_ON properly by interlocking
with the driver core to simply refuse to bind a driver under these
conditions instead of allowing userspace to crash the kernel.

That alone would be justification enough to merge this work.

> +
> +/*
> + * IOMMU core interfaces for iommufd.
> + */
> +
> +/*
> + * FIXME: We currently simply follow vifo policy to mantain the group's
> + * viability to user. Eventually, we should avoid below hard-coded list
> + * by letting drivers indicate to the iommu layer that they are safe for
> + * keeping the iommu group's user aviability.
> + */
> +static const char * const iommu_driver_allowed[] = {
> + "vfio-pci",
> + "pci-stub"
> +};

Yuk. This should be done with some callback in those drivers, e.g.
'iommu_allow_user_dma()'.

Ie the basic flow would see the driver core doing some:

 ret = iommu_doing_kernel_dma()
 if (ret) do not bind
 driver_bind
  pci_stub_probe()
 iommu_allow_user_dma()

And the various functions are manipulating some atomic.
 0 = nothing happening
 1 = kernel DMA
 2 = user DMA

No BUG_ON.
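
The three-state scheme sketched above can be prototyped with a single atomic. This is a deliberately simplified single-claim model (hypothetical helper names; a real implementation would refcount multiple kernel drivers or user contexts per group):

```c
#include <assert.h>
#include <stdatomic.h>

enum { DMA_NONE = 0, DMA_KERNEL = 1, DMA_USER = 2 };

static atomic_int group_dma_owner = DMA_NONE;

/* Driver-core path: called before binding a kernel driver. */
static int iommu_doing_kernel_dma(void)
{
	int expected = DMA_NONE;

	if (atomic_compare_exchange_strong(&group_dma_owner, &expected,
					   DMA_KERNEL))
		return 0;
	/* Refuse the bind instead of BUG_ON if userspace owns the group. */
	return expected == DMA_USER ? -16 /* -EBUSY */ : 0;
}

/* vfio-pci/pci-stub probe path: claim the group for user DMA. */
static int iommu_allow_user_dma(void)
{
	int expected = DMA_NONE;

	if (atomic_compare_exchange_strong(&group_dma_owner, &expected,
					   DMA_USER))
		return 0;
	return expected == DMA_KERNEL ? -16 : 0;
}

static void iommu_dma_release(void)
{
	atomic_store(&group_dma_owner, DMA_NONE);
}
```

The cmpxchg makes the kernel/user decision race free, so an unsafe driver bind fails cleanly with -EBUSY rather than crashing the kernel.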

Jason

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Tom Lendacky via iommu

On 9/20/21 2:23 PM, Kirill A. Shutemov wrote:
> On Wed, Sep 08, 2021 at 05:58:36PM -0500, Tom Lendacky wrote:
> > diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> > index 470b20208430..eff4d19f9cb4 100644
> > --- a/arch/x86/mm/mem_encrypt_identity.c
> > +++ b/arch/x86/mm/mem_encrypt_identity.c
> > @@ -30,6 +30,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  
> >  #include 
> > @@ -287,7 +288,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
> >  	unsigned long pgtable_area_len;
> >  	unsigned long decrypted_base;
> >  
> > -	if (!sme_active())
> > +	if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
> >  		return;
> >  
> >  	/*
> 
> This change break boot for me (in KVM on Intel host). It only reproduces
> with allyesconfig. More reasonable config works fine, but I didn't try to
> find exact cause in config.

Looks like instrumentation during early boot. I worked with Boris offline
to exclude arch/x86/kernel/cc_platform.c from some of the instrumentation
and that allowed an allyesconfig to boot.

Thanks,
Tom

> Convertion to cc_platform_has() in __startup_64() in 8/8 has the same
> effect.
> 
> I believe it caused by sme_me_mask access from __startup_64() without
> fixup_pointer() magic. I think __startup_64() requires special treatement
> and we should avoid cc_platform_has() there (or have a special version of
> the helper). Note that only AMD requires these cc_platform_has() to return
> true.




Re: [RFC 05/20] vfio/pci: Register device to /dev/vfio/devices

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:33PM +0800, Liu Yi L wrote:
> This patch exposes the device-centric interface for vfio-pci devices. To
> be compatible with existing users, vfio-pci exposes both legacy group
> interface and device-centric interface.
> 
> As explained in last patch, this change doesn't apply to devices which
> cannot be forced to snoop cache by their upstream iommu. Such devices
> are still expected to be opened via the legacy group interface.
> 
> When the device is opened via /dev/vfio/devices, vfio-pci should prevent
> the user from accessing the assigned device because the device is still
> attached to the default domain which may allow user-initiated DMAs to
> touch arbitrary place. The user access must be blocked until the device
> is later bound to an iommufd (see patch 08). The binding acts as the
> contract for putting the device in a security context which ensures user-
> initiated DMAs via this device cannot harm the rest of the system.
> 
> This patch introduces a vdev->block_access flag for this purpose. It's set
> when the device is opened via /dev/vfio/devices and cleared after binding
> to iommufd succeeds. mmap and r/w handlers check this flag to decide whether
> user access should be blocked or not.

This should not be in vfio_pci.

AFAIK there is no condition where a vfio driver can work without being
connected to some kind of iommu back end, so the core code should
handle this interlock globally. A vfio driver's ops should not be
callable until the iommu is connected.

The only vfio_pci patch in this series should be adding a new callback
op to take in an iommufd and register the pci_device as a iommufd
device.

Jason


Re: [RFC 04/20] iommu: Add iommu_device_get_info interface

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:32PM +0800, Liu Yi L wrote:
> From: Lu Baolu 
> 
> This provides an interface for upper layers to get the per-device iommu
> attributes.
> 
> int iommu_device_get_info(struct device *dev,
>   enum iommu_devattr attr, void *data);

Can't we use properly typed ops and functions here instead of a void
*data?

get_snoop()
get_page_size()
get_addr_width()

?
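
For illustration, the typed-accessor alternative might look like the sketch below (names and signatures are invented, not an actual iommu_ops layout):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-device attribute callbacks instead of a void *data multiplexer. */
struct iommu_device_attrs {
	bool     (*get_snoop)(void);
	uint64_t (*get_page_size)(void);
	unsigned (*get_addr_width)(void);
};

/* A fake driver supplying the three attributes. */
static bool fake_snoop(void) { return true; }
static uint64_t fake_page_size(void) { return 4096; }
static unsigned fake_addr_width(void) { return 48; }

static const struct iommu_device_attrs fake_attrs = {
	.get_snoop      = fake_snoop,
	.get_page_size  = fake_page_size,
	.get_addr_width = fake_addr_width,
};
```

Each caller gets a properly typed result with no casting through void *, and the compiler checks every accessor's signature.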

Jason


Re: [PATCH] iommu/arm-smmu-v3: poll cmdq until it has space

2021-09-21 Thread John Garry

On 21/09/2021 12:43, Fernand Sieber wrote:
> 	do {

I didn't follow the full logic of this change yet ...

> 		llq->val = READ_ONCE(cmdq->q.llq.val);
> -		if (!queue_full(llq))
> +		if (!queue_has_space(llq, n))

But is the polarity really correct? That is, if we don't have space,
then exit with success (the function to check for space).

> 			break;
> 
> +		/*
> +		 * We must return here even if there's no space, because the producer
> +		 * having moved forward could mean that the last thread observing the
> +		 * SMMU progress has allocated space in the cmdq and moved on, leaving
> +		 * us in this waiting loop with no other thread updating
> +		 * llq->state->val.

what is llq->state->val?

> +		 */
> +		if (llq->prod != prod)
> +			return -EAGAIN;
> +
> 		ret = queue_poll(&qp);

Thanks,
John


Re: [PATCH RFC v1 10/11] uapi/virtio-iommu: Add a new request type to send page response

2021-09-21 Thread Jean-Philippe Brucker
On Fri, Apr 23, 2021 at 03:21:46PM +0530, Vivek Gautam wrote:
> Once the page faults are handled, the response has to be sent to
> virtio-iommu backend, from where it can be sent to the host to
> prepare the response to a generated io page fault by the device.
> Add a new virt-queue request type to handle this.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  include/uapi/linux/virtio_iommu.h | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/include/uapi/linux/virtio_iommu.h 
> b/include/uapi/linux/virtio_iommu.h
> index c12d9b6a7243..1b174b98663a 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -48,6 +48,7 @@ struct virtio_iommu_config {
>  #define VIRTIO_IOMMU_T_PROBE 0x05
>  #define VIRTIO_IOMMU_T_ATTACH_TABLE  0x06
>  #define VIRTIO_IOMMU_T_INVALIDATE0x07
> +#define VIRTIO_IOMMU_T_PAGE_RESP 0x08
>  
>  /* Status types */
>  #define VIRTIO_IOMMU_S_OK0x00
> @@ -70,6 +71,23 @@ struct virtio_iommu_req_tail {
>   __u8reserved[3];
>  };
>  
> +struct virtio_iommu_req_page_resp {
> + struct virtio_iommu_req_headhead;
> + __le32  domain;

I don't think we need this field, since the fault report doesn't come with
a domain.

> + __le32  endpoint;
> +#define VIRTIO_IOMMU_PAGE_RESP_PASID_VALID   (1 << 0)

To be consistent with the rest of the document this would be
VIRTIO_IOMMU_PAGE_RESP_F_PASID_VALID

> + __le32  flags;
> + __le32  pasid;
> + __le32  grpid;
> +#define VIRTIO_IOMMU_PAGE_RESP_SUCCESS   (0x0)
> +#define VIRTIO_IOMMU_PAGE_RESP_INVALID   (0x1)
> +#define VIRTIO_IOMMU_PAGE_RESP_FAILURE   (0x2)
> + __le16  resp_code;
> + __u8pasid_valid;

This field isn't needed since there already is
VIRTIO_IOMMU_PAGE_RESP_PASID_VALID

> + __u8reserved[9];
> + struct virtio_iommu_req_tailtail;
> +};

I'd align the size of the struct to 16 bytes, but I don't think that's
strictly necessary.

Thanks,
Jean

> +
>  struct virtio_iommu_req_attach {
>   struct virtio_iommu_req_headhead;
>   __le32  domain;
> -- 
> 2.17.1
> 


Re: [PATCH RFC v1 09/11] iommu/virtio: Implement sva bind/unbind calls

2021-09-21 Thread Jean-Philippe Brucker
On Fri, Apr 23, 2021 at 03:21:45PM +0530, Vivek Gautam wrote:
> SVA bind and unbind implementations will allow to prepare translation
> context with CPU page tables that can be programmed into host iommu
> hardware to realize shared address space utilization between the CPU
> and virtualized devices using virtio-iommu.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/virtio-iommu.c  | 199 +-
>  include/uapi/linux/virtio_iommu.h |   2 +
>  2 files changed, 199 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 250c137a211b..08f1294baeab 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -14,6 +14,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -28,6 +31,7 @@
>  #include 
>  #include "iommu-pasid-table.h"
>  #include "iommu-sva-lib.h"
> +#include "io-pgtable-arm.h"

Is this used here?

>  
>  #define MSI_IOVA_BASE0x800
>  #define MSI_IOVA_LENGTH  0x10
> @@ -41,6 +45,7 @@ DEFINE_XARRAY_ALLOC1(viommu_asid_xa);
>  
>  static DEFINE_MUTEX(sva_lock);
>  static DEFINE_MUTEX(iopf_lock);
> +static DEFINE_MUTEX(viommu_asid_lock);
>  
>  struct viommu_dev_pri_work {
>   struct work_struct  work;
> @@ -88,10 +93,22 @@ struct viommu_mapping {
>  struct viommu_mm {
>   int pasid;
>   u64 archid;
> + struct viommu_sva_bond  *bond;
>   struct io_pgtable_ops   *ops;
>   struct viommu_domain*domain;
>  };
>  
> +struct viommu_sva_bond {
> + struct iommu_svasva;
> + struct mm_struct*mm;
> + struct iommu_psdtable_mmu_notifier  *viommu_mn;
> + struct list_headlist;
> + refcount_t  refs;
> +};
> +
> +#define sva_to_viommu_bond(handle) \
> + container_of(handle, struct viommu_sva_bond, sva)
> +
>  struct viommu_domain {
>   struct iommu_domain domain;
>   struct viommu_dev   *viommu;
> @@ -136,6 +153,7 @@ struct viommu_endpoint {
>   boolpri_supported;
>   boolsva_enabled;
>   booliopf_enabled;
> + struct list_headbonds;
>  };
>  
>  struct viommu_ep_entry {
> @@ -1423,14 +1441,15 @@ static int viommu_attach_pasid_table(struct 
> viommu_endpoint *vdev,
>  
>   pst_cfg->iommu_dev = viommu->dev->parent;
>  
> + mutex_lock(&viommu_asid_lock);
>   /* Prepare PASID tables info to allocate a new table */
>   ret = viommu_prepare_pst(vdev, pst_cfg, fmt);
>   if (ret)
> - return ret;
> + goto err_out_unlock;
>  
>   ret = iommu_psdtable_alloc(tbl, pst_cfg);
>   if (ret)
> - return ret;
> + goto err_out_unlock;
>  
>   pst_cfg->iommu_dev = viommu->dev->parent;
>   pst_cfg->fmt = PASID_TABLE_ARM_SMMU_V3;
> @@ -1452,6 +1471,7 @@ static int viommu_attach_pasid_table(struct 
> viommu_endpoint *vdev,
>   if (ret)
>   goto err_free_ops;
>   }
> + mutex_unlock(&viommu_asid_lock);
>   } else {
>   /* TODO: otherwise, check for compatibility with vdev. */
>   return -ENOSYS;
> @@ -1467,6 +1487,8 @@ static int viommu_attach_pasid_table(struct 
> viommu_endpoint *vdev,
>  err_free_psdtable:
>   iommu_psdtable_free(tbl, &tbl->cfg);
>  
> +err_out_unlock:
> + mutex_unlock(&viommu_asid_lock);
>   return ret;
>  }
>  
> @@ -1706,6 +1728,7 @@ static struct iommu_device *viommu_probe_device(struct 
> device *dev)
>   vdev->dev = dev;
>   vdev->viommu = viommu;
>   INIT_LIST_HEAD(&vdev->resv_regions);
> + INIT_LIST_HEAD(&vdev->bonds);
>   dev_iommu_priv_set(dev, vdev);
>  
>   if (viommu->probe_size) {
> @@ -1755,6 +1778,175 @@ static int viommu_of_xlate(struct device *dev, struct 
> of_phandle_args *args)
>   return iommu_fwspec_add_ids(dev, args->args, 1);
>  }
>  
> +static u32 viommu_sva_get_pasid(struct iommu_sva *handle)
> +{
> + struct viommu_sva_bond *bond = sva_to_viommu_bond(handle);
> +
> + return bond->mm->pasid;
> +}
> +
> +static void viommu_mmu_notifier_free(struct mmu_notifier *mn)
> +{
> + kfree(mn_to_pstiommu(mn));
> +}
> +
> +static struct mmu_notifier_ops viommu_mmu_notifier_ops = {
> + .free_notifier  = viommu_mmu_notifier_free,

.invalidate_range and .release will be needed as well, to keep up to date
with changes to the address space

> +};
> +
> +/* Allocate or get existing MMU notifier for this {domain, mm} pair */
> +static struct iommu_psdtable_mmu_notifier *
> 

Re: [PATCH RFC v1 08/11] iommu/arm-smmu-v3: Implement shared context alloc and free ops

2021-09-21 Thread Jean-Philippe Brucker
On Fri, Apr 23, 2021 at 03:21:44PM +0530, Vivek Gautam wrote:
> Implementing the alloc_shared_cd and free_shared_cd in cd-lib, and
> start using them for arm-smmu-v3-sva implementation.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  .../arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c  | 71 
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 83 ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  1 -
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 14 
>  4 files changed, 73 insertions(+), 96 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
> index 537b7c784d40..b87829796596 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
> @@ -285,16 +285,14 @@ static bool arm_smmu_free_asid(struct xarray *xa, void 
> *cookie_cd)
>   * descriptor is using it, try to replace it.
>   */
>  static struct arm_smmu_ctx_desc *
> -arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
> +arm_smmu_share_asid(struct iommu_pasid_table *tbl, struct mm_struct *mm,
> + struct xarray *xa, u16 asid, u32 asid_bits)

xa and asid_bits could be stored in some arch-specific section of the
iommu_pasid_table struct. Other table drivers wouldn't need those
arguments.

More a comment for the parent series: it may be clearer to give a
different prefix to functions in this file (arm_smmu_cd_, pst_arm_?).
Reading this patch I'm a little confused by what belongs in the IOMMU
driver and what is done by this library. (I also keep reading 'tbl' as
'tlb'. Maybe we could make it 'table' since that doesn't take a lot more
space)

>  {
>   int ret;
>   u32 new_asid;
>   struct arm_smmu_ctx_desc *cd;
> - struct arm_smmu_device *smmu;
> - struct arm_smmu_domain *smmu_domain;
> - struct iommu_pasid_table *tbl;
>  
> - cd = xa_load(&arm_smmu_asid_xa, asid);
> + cd = xa_load(xa, asid);
>   if (!cd)
>   return NULL;
>  
> @@ -306,12 +304,8 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
>   return cd;
>   }
>  
> - smmu_domain = container_of(cd, struct arm_smmu_domain, s1_cfg.cd);
> - smmu = smmu_domain->smmu;
> - tbl = smmu_domain->tbl;
> -
> - ret = xa_alloc(&arm_smmu_asid_xa, &new_asid, cd,
> -XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL);
> + ret = xa_alloc(xa, &new_asid, cd, XA_LIMIT(1, (1 << asid_bits) - 1),
> +GFP_KERNEL);
>   if (ret)
>   return ERR_PTR(-ENOSPC);
>   /*
> @@ -325,48 +319,52 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
>* be some overlap between use of both ASIDs, until we invalidate the
>* TLB.
>*/
> - ret = iommu_psdtable_write(tbl, &tbl->cfg, 0, cd);
> + ret = arm_smmu_write_ctx_desc(&tbl->cfg, 0, cd);
>   if (ret)
>   return ERR_PTR(-ENOSYS);
>  
>   /* Invalidate TLB entries previously associated with that context */
> - iommu_psdtable_flush_tlb(tbl, smmu_domain, asid);
> + iommu_psdtable_flush_tlb(tbl, tbl->cookie, asid);
>  
> - xa_erase(&arm_smmu_asid_xa, asid);
> + xa_erase(xa, asid);
>   return NULL;
>  }
>  
> -struct arm_smmu_ctx_desc *
> -arm_smmu_alloc_shared_cd(struct iommu_pasid_table *tbl, struct mm_struct *mm)
> +static struct iommu_psdtable_mmu_notifier *
> +arm_smmu_alloc_shared_cd(struct iommu_pasid_table *tbl, struct mm_struct *mm,
> +  struct xarray *xa, u32 asid_bits)
>  {
>   u16 asid;
>   int err = 0;
>   u64 tcr, par, reg;
>   struct arm_smmu_ctx_desc *cd;
>   struct arm_smmu_ctx_desc *ret = NULL;
> + struct iommu_psdtable_mmu_notifier *pst_mn;
>  
>   asid = arm64_mm_context_get(mm);
>   if (!asid)
>   return ERR_PTR(-ESRCH);
>  
> + pst_mn = kzalloc(sizeof(*pst_mn), GFP_KERNEL);
> + if (!pst_mn) {
> + err = -ENOMEM;
> + goto out_put_context;
> + }
> +
>   cd = kzalloc(sizeof(*cd), GFP_KERNEL);
>   if (!cd) {
>   err = -ENOMEM;
> - goto out_put_context;
> + goto out_free_mn;
>   }
>  
>   refcount_set(&cd->refs, 1);
>  
> - mutex_lock(&arm_smmu_asid_lock);
> - ret = arm_smmu_share_asid(mm, asid);
> + ret = arm_smmu_share_asid(tbl, mm, xa, asid, asid_bits);
>   if (ret) {
> - mutex_unlock(&arm_smmu_asid_lock);
>   goto out_free_cd;
>   }
>  
> - err = xa_insert(&arm_smmu_asid_xa, asid, cd, GFP_KERNEL);
> - mutex_unlock(&arm_smmu_asid_lock);
> -
> + err = xa_insert(xa, asid, cd, GFP_KERNEL);
>   if (err)
>   goto out_free_asid;
>  
> @@ -406,21 +404,26 @@ arm_smmu_alloc_shared_cd(struct iommu_pasid_table *tbl, 
> struct mm_struct *mm)
>   cd->asid = asid;
>   cd->mm = mm;
>  
> - return cd;
> + pst_mn->vendor.cd = cd;
> + return pst_m

Re: [PATCH RFC v1 05/11] iommu/virtio: Add SVA feature and related enable/disable callbacks

2021-09-21 Thread Jean-Philippe Brucker
On Fri, Apr 23, 2021 at 03:21:41PM +0530, Vivek Gautam wrote:
> Add a feature flag to virtio iommu for Shared virtual addressing
> (SVA). This feature would indicate the availability of a path for handling
> device page faults, and the provision for sending page response.

In this case the feature should probably be called PAGE_REQUEST or
similar. SVA aggregates PF + PASID + shared page tables

Thanks,
Jean

> Also add necessary methods to enable and disable SVA so that the
> masters can enable the SVA path. This also requires enabling the
> PRI capability on the device.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/virtio-iommu.c  | 268 ++
>  include/uapi/linux/virtio_iommu.h |   1 +
>  2 files changed, 269 insertions(+)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 3da5f0807711..250c137a211b 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -26,6 +27,7 @@
>  
>  #include 
>  #include "iommu-pasid-table.h"
> +#include "iommu-sva-lib.h"
>  
>  #define MSI_IOVA_BASE0x800
>  #define MSI_IOVA_LENGTH  0x10
> @@ -37,6 +39,9 @@
>  /* Some architectures need an Address Space ID for each page table */
>  DEFINE_XARRAY_ALLOC1(viommu_asid_xa);
>  
> +static DEFINE_MUTEX(sva_lock);
> +static DEFINE_MUTEX(iopf_lock);
> +
>  struct viommu_dev_pri_work {
>   struct work_struct  work;
>   struct viommu_dev   *dev;
> @@ -71,6 +76,7 @@ struct viommu_dev {
>  
>   boolhas_map:1;
>   boolhas_table:1;
> + boolhas_sva:1;
>  };
>  
>  struct viommu_mapping {
> @@ -124,6 +130,12 @@ struct viommu_endpoint {
>   void*pstf;
>   /* Preferred page table format */
>   void*pgtf;
> +
> + /* sva */
> + boolats_supported;
> + boolpri_supported;
> + boolsva_enabled;
> + booliopf_enabled;
>  };
>  
>  struct viommu_ep_entry {
> @@ -582,6 +594,64 @@ static int viommu_add_pstf(struct viommu_endpoint *vdev, 
> void *pstf, size_t len)
>   return 0;
>  }
>  
> +static int viommu_init_ats_pri(struct viommu_endpoint *vdev)
> +{
> + struct device *dev = vdev->dev;
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (!dev_is_pci(vdev->dev))
> + return -EINVAL;
> +
> + if (pci_ats_supported(pdev))
> + vdev->ats_supported = true;
> +
> + if (pci_pri_supported(pdev))
> + vdev->pri_supported = true;
> +
> + return 0;
> +}
> +
> +static int viommu_enable_pri(struct viommu_endpoint *vdev)
> +{
> + int ret;
> + struct pci_dev *pdev;
> +
> + /* Let's allow only 4 requests for PRI right now */
> + size_t max_inflight_pprs = 4;
> +
> + if (!vdev->pri_supported || !vdev->ats_supported)
> + return -ENODEV;
> +
> + pdev = to_pci_dev(vdev->dev);
> +
> + ret = pci_reset_pri(pdev);
> + if (ret)
> + return ret;
> +
> + ret = pci_enable_pri(pdev, max_inflight_pprs);
> + if (ret) {
> + dev_err(vdev->dev, "cannot enable PRI: %d\n", ret);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static void viommu_disable_pri(struct viommu_endpoint *vdev)
> +{
> + struct pci_dev *pdev;
> +
> + if (!dev_is_pci(vdev->dev))
> + return;
> +
> + pdev = to_pci_dev(vdev->dev);
> +
> + if (!pdev->pri_enabled)
> + return;
> +
> + pci_disable_pri(pdev);
> +}
> +
>  static int viommu_init_queues(struct viommu_dev *viommu)
>  {
>   viommu->iopf_pri = iopf_queue_alloc(dev_name(viommu->dev));
> @@ -684,6 +754,10 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   if (ret)
>   goto out_free_eps;
>  
> + ret = viommu_init_ats_pri(vdev);
> + if (ret)
> + goto out_free_eps;
> +
>   kfree(probe);
>   return 0;
>  
> @@ -1681,6 +1755,194 @@ static int viommu_of_xlate(struct device *dev, struct 
> of_phandle_args *args)
>   return iommu_fwspec_add_ids(dev, args->args, 1);
>  }
>  
> +static bool viommu_endpoint_iopf_supported(struct viommu_endpoint *vdev)
> +{
> + /* TODO: support Stall model later */
> + return vdev->pri_supported;
> +}
> +
> +bool viommu_endpoint_sva_supported(struct viommu_endpoint *vdev)
> +{
> + struct viommu_dev *viommu = vdev->viommu;
> +
> + if (!viommu->has_sva)
> + return false;
> +
> + return vdev->pasid_bits;
> +}
> +
> +bool viommu_endpoint_sva_enabled(struct viommu_endpoint *vdev)
> +{
> + bool enabled;
> +
> + mutex_lock(&sva_lock);
> + 

Re: [PATCH RFC v1 03/11] iommu/virtio: Handle incoming page faults

2021-09-21 Thread Jean-Philippe Brucker
On Fri, Apr 23, 2021 at 03:21:39PM +0530, Vivek Gautam wrote:
> Redirect the incoming page faults to the registered fault handler
> that can take the fault information such as, pasid, page request
> group-id, address and pasid flags.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/virtio-iommu.c  | 80 ++-
>  include/uapi/linux/virtio_iommu.h |  1 +
>  2 files changed, 80 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index c970f386f031..fd237cad1ce5 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -37,6 +37,13 @@
>  /* Some architectures need an Address Space ID for each page table */
>  DEFINE_XARRAY_ALLOC1(viommu_asid_xa);
>  
> +struct viommu_dev_pri_work {
> + struct work_struct  work;
> + struct viommu_dev   *dev;
> + struct virtio_iommu_fault   *vfault;
> + u32 endpoint;
> +};
> +
>  struct viommu_dev {
>   struct iommu_device iommu;
>   struct device   *dev;
> @@ -49,6 +56,8 @@ struct viommu_dev {
>   struct list_headrequests;
>   void*evts;
>   struct list_headendpoints;
> + struct workqueue_struct *pri_wq;
> + struct viommu_dev_pri_work  *pri_work;

IOPF already has a workqueue, so the driver doesn't need one.
iommu_report_device_fault() should be fast enough to be called from the
event handler.

>  
>   /* Device configuration */
>   struct iommu_domain_geometrygeometry;
> @@ -666,6 +675,58 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   return ret;
>  }
>  
> +static void viommu_handle_ppr(struct work_struct *work)
> +{
> + struct viommu_dev_pri_work *pwork =
> + container_of(work, struct viommu_dev_pri_work, 
> work);
> + struct viommu_dev *viommu = pwork->dev;
> + struct virtio_iommu_fault *vfault = pwork->vfault;
> + struct viommu_endpoint *vdev;
> + struct viommu_ep_entry *ep;
> + struct iommu_fault_event fault_evt = {
> + .fault.type = IOMMU_FAULT_PAGE_REQ,
> + };
> + struct iommu_fault_page_request *prq = &fault_evt.fault.prm;
> +
> + u32 flags   = le32_to_cpu(vfault->flags);
> + u32 prq_flags   = le32_to_cpu(vfault->pr_evt_flags);
> + u32 endpoint= pwork->endpoint;
> +
> + memset(prq, 0, sizeof(struct iommu_fault_page_request));

The fault_evt struct is already initialized

> + prq->addr = le64_to_cpu(vfault->address);
> +
> + if (prq_flags & VIRTIO_IOMMU_FAULT_PRQ_F_LAST_PAGE)
> + prq->flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
> + if (prq_flags & VIRTIO_IOMMU_FAULT_PRQ_F_PASID_VALID) {
> + prq->flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
> + prq->pasid = le32_to_cpu(vfault->pasid);
> + prq->grpid = le32_to_cpu(vfault->grpid);
> + }
> +
> + if (flags & VIRTIO_IOMMU_FAULT_F_READ)
> + prq->perm |= IOMMU_FAULT_PERM_READ;
> + if (flags & VIRTIO_IOMMU_FAULT_F_WRITE)
> + prq->perm |= IOMMU_FAULT_PERM_WRITE;
> + if (flags & VIRTIO_IOMMU_FAULT_F_EXEC)
> + prq->perm |= IOMMU_FAULT_PERM_EXEC;
> + if (flags & VIRTIO_IOMMU_FAULT_F_PRIV)
> + prq->perm |= IOMMU_FAULT_PERM_PRIV;
> +
> + list_for_each_entry(ep, &viommu->endpoints, list) {
> + if (ep->eid == endpoint) {
> + vdev = ep->vdev;
> + break;
> + }
> + }
> +
> + if ((prq_flags & VIRTIO_IOMMU_FAULT_PRQ_F_PASID_VALID) &&
> + (prq_flags & VIRTIO_IOMMU_FAULT_PRQ_F_NEEDS_PASID))
> + prq->flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
> +
> + if (iommu_report_device_fault(vdev->dev, &fault_evt))
> + dev_err(vdev->dev, "Couldn't handle page request\n");

An error likely means that nobody registered a fault handler, but we could
display a few more details about the fault that would help debug the
endpoint

> +}
> +
>  static int viommu_fault_handler(struct viommu_dev *viommu,
>   struct virtio_iommu_fault *fault)
>  {
> @@ -679,7 +740,13 @@ static int viommu_fault_handler(struct viommu_dev 
> *viommu,
>   u32 pasid   = le32_to_cpu(fault->pasid);
>  
>   if (type == VIRTIO_IOMMU_FAULT_F_PAGE_REQ) {
> - dev_info(viommu->dev, "Page request fault - unhandled\n");
> + dev_info_ratelimited(viommu->dev,
> +  "Page request fault from EP %u\n",
> +  endpoint);

That's rather for debugging the virtio-iommu driver, so should be
dev_dbg() (or removed entirely)

> +
> + viommu->pri_work->vfault = fault;
> + viommu->pri_work->endpoint = endpoint;
> + queue_work(viommu->pri_wq, &vi

Re: [RFC 03/20] vfio: Add vfio_[un]register_device()

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:31PM +0800, Liu Yi L wrote:
> With /dev/vfio/devices introduced, now a vfio device driver has three
> options to expose its device to userspace:
> 
> a)  only legacy group interface, for devices which haven't been moved to
> iommufd (e.g. platform devices, sw mdev, etc.);
> 
> b)  both legacy group interface and new device-centric interface, for
> devices which supports iommufd but also wants to keep backward
> compatibility (e.g. pci devices in this RFC);
> 
> c)  only new device-centric interface, for new devices which don't carry
> backward compatibility burden (e.g. hw mdev/subdev with pasid);

We shouldn't have 'b'? Where does it come from?

> This patch introduces vfio_[un]register_device() helpers for the device
> drivers to specify the device exposure policy to vfio core. Hence the
> existing vfio_[un]register_group_dev() become the wrapper of the new
> helper functions. The new device-centric interface is described as
> 'nongroup' to differentiate from existing 'group' stuff.

Detect what the driver supports based on the ops it declares. There
should be a function provided through the ops for the driver to bind
to the iommufd.

>  One open about how to organize the device nodes under /dev/vfio/devices/.
> This RFC adopts a simple policy by keeping a flat layout with mixed devname
> from all kinds of devices. The prerequisite of this model is that devnames
> from different bus types are unique formats:

This isn't reliable, the devname should just be vfio0, vfio1, etc

The userspace can learn the correct major/minor by inspecting the
sysfs.

This whole concept should disappear into the prior patch that adds the
struct device in the first place, and I think most of the code here
can be deleted once the struct device is used properly.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC v1 02/11] iommu/virtio: Maintain a list of endpoints served by viommu_dev

2021-09-21 Thread Jean-Philippe Brucker
On Fri, Apr 23, 2021 at 03:21:38PM +0530, Vivek Gautam wrote:
> Keeping a record of list of endpoints that are served by the virtio-iommu
> device would help in redirecting the requests of page faults to the
> correct endpoint device to handle such requests.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/virtio-iommu.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 50039070e2aa..c970f386f031 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -48,6 +48,7 @@ struct viommu_dev {
>   spinlock_t  request_lock;
>   struct list_headrequests;
>   void*evts;
> + struct list_headendpoints;

As we're going to search by ID, an xarray or rb_tree would be more
appropriate than a list
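For illustration, the difference matters once endpoint counts grow: a linear list scan is O(n) per incoming fault, while a structure indexed by endpoint ID (an xarray in the kernel; a sorted array with bsearch in this userspace sketch) makes the lookup O(log n) or better. Names here are illustrative, not from the patch:

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy endpoint record, keyed by endpoint ID like the patch's entries. */
struct ep_entry {
	uint32_t eid;
	void *vdev;
};

static int ep_cmp(const void *key, const void *elem)
{
	uint32_t k = *(const uint32_t *)key;
	const struct ep_entry *e = elem;

	return (k > e->eid) - (k < e->eid);
}

/* O(log n) lookup over an array kept sorted by eid. */
static struct ep_entry *ep_lookup(struct ep_entry *eps, size_t n, uint32_t eid)
{
	return bsearch(&eid, eps, n, sizeof(*eps), ep_cmp);
}
```

The kernel xarray additionally gives RCU-safe lockless readers, which is why it is the natural fit here.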

>  
>   /* Device configuration */
>   struct iommu_domain_geometrygeometry;
> @@ -115,6 +116,12 @@ struct viommu_endpoint {
>   void*pgtf;
>  };
>  
> +struct viommu_ep_entry {
> + u32 eid;
> + struct viommu_endpoint  *vdev;
> + struct list_headlist;
> +};

No need for a separate struct, I think you can just add the list head and
id into viommu_endpoint.

> +
>  struct viommu_request {
>   struct list_headlist;
>   void*writeback;
> @@ -573,6 +580,7 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   size_t probe_len;
>   struct virtio_iommu_req_probe *probe;
>   struct virtio_iommu_probe_property *prop;
> + struct viommu_ep_entry *ep;
>   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>   struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
>  
> @@ -640,6 +648,18 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   prop = (void *)probe->properties + cur;
>   type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
>   }
> + if (ret)
> + goto out_free;
> +
> + ep = kzalloc(sizeof(*ep), GFP_KERNEL);
> + if (!ep) {
> + ret = -ENOMEM;
> + goto out_free;
> + }
> + ep->eid = probe->endpoint;
> + ep->vdev = vdev;
> +
> + list_add(&ep->list, &viommu->endpoints);

This should be in viommu_probe_device() (viommu_probe_endpoint() is only
called if F_PROBE is negotiated). I think we need a lock for this
list/xarray

Thanks,
Jean

>  
>  out_free:
>   kfree(probe);
> @@ -1649,6 +1669,7 @@ static int viommu_probe(struct virtio_device *vdev)
>   viommu->dev = dev;
>   viommu->vdev = vdev;
>   INIT_LIST_HEAD(&viommu->requests);
> + INIT_LIST_HEAD(&viommu->endpoints);
>  
>   ret = viommu_init_vqs(viommu);
>   if (ret)
> -- 
> 2.17.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC v1 01/11] uapi/virtio-iommu: Add page request grp-id and flags information

2021-09-21 Thread Jean-Philippe Brucker
Hi Vivek,

Thanks a lot for your work on this

On Fri, Apr 23, 2021 at 03:21:37PM +0530, Vivek Gautam wrote:
> Add fault information for group-id and necessary flags for page
> request faults that can be handled by page fault handler in
> virtio-iommu driver.
> 
> Signed-off-by: Vivek Gautam 
> Cc: Joerg Roedel 
> Cc: Will Deacon 
> Cc: Robin Murphy 
> Cc: Jean-Philippe Brucker 
> Cc: Eric Auger 
> Cc: Alex Williamson 
> Cc: Kevin Tian 
> Cc: Jacob Pan 
> Cc: Liu Yi L 
> Cc: Lorenzo Pieralisi 
> Cc: Shameerali Kolothum Thodi 
> ---
>  include/uapi/linux/virtio_iommu.h | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/uapi/linux/virtio_iommu.h 
> b/include/uapi/linux/virtio_iommu.h
> index f8bf927a0689..accc3318ce46 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -307,14 +307,27 @@ struct virtio_iommu_req_invalidate {
>  #define VIRTIO_IOMMU_FAULT_F_DMA_UNRECOV 1
>  #define VIRTIO_IOMMU_FAULT_F_PAGE_REQ2
>  
> +#define VIRTIO_IOMMU_FAULT_PRQ_F_PASID_VALID (1 << 0)
> +#define VIRTIO_IOMMU_FAULT_PRQ_F_LAST_PAGE   (1 << 1)
> +#define VIRTIO_IOMMU_FAULT_PRQ_F_PRIV_DATA   (1 << 2)
> +#define VIRTIO_IOMMU_FAULT_PRQ_F_NEEDS_PASID (1 << 3)

I don't think this one is necessary here. The NEEDS_PASID flags added by
commit 970471914c67 ("iommu: Allow page responses without PASID") mainly
helps Linux keep track of things internally. It does tell the fault
handler whether to reply with PASID or not, but we don't need that here.
The virtio-iommu driver knows whether a PASID is required by looking at
the "PRG Response PASID Required" bit in the PCIe capability. For non-PCIe
faults (e.g. SMMU stall), I'm guessing we'll need a PROBE property to
declare that the endpoint supports recoverable faults anyway, so "PASID
required in response" can go through there as well.

> +
> +#define VIRTIO_IOMMU_FAULT_UNREC_F_PASID_VALID   (1 << 0)
> +#define VIRTIO_IOMMU_FAULT_UNREC_F_ADDR_VALID(1 << 1)
> +#define VIRTIO_IOMMU_FAULT_UNREC_F_FETCH_ADDR_VALID  (1 << 2)
> +
>  struct virtio_iommu_fault {
>   __u8reason;
>   __u8reserved[3];
>   __le16  flt_type;
>   __u8reserved2[2];
> + /* flags is actually permission flags */

It's also used for declaring validity of fields.
VIRTIO_IOMMU_FAULT_F_ADDRESS already tells whether the address field is
valid, so all the other flags introduced by this patch can go in here.

>   __le32  flags;
> + /* flags for PASID and Page request handling info */
> + __le32  pr_evt_flags;
>   __le32  endpoint;
>   __le32  pasid;
> + __le32  grpid;

I'm not sure why we made it 32-bit in Linux UAPI, it's a little wasteful.
PCIe PRGI is 9-bits and SMMU STAG is 16-bits. Since the scope of the grpid
is the endpoint, 16-bit means 64k in-flight faults per endpoint, which
seems more than enough.

New fields must be appended at the end of the struct, because old drivers
will expect to find the 'endpoint' field at this offset. You could remove
'reserved3' while adding 'grpid', to keep the struct layout.

>   __u8reserved3[4];
>   __le64  address;
>   __u8reserved4[8];


So the base structure, currently in the spec, looks like this:

struct virtio_iommu_fault {
u8   reason;
u8   reserved[3];
le32 flags;
le32 endpoint;
le32 reserved1;
le64 address;
};

#define VIRTIO_IOMMU_FAULT_F_READ   (1 << 0)
#define VIRTIO_IOMMU_FAULT_F_WRITE  (1 << 1)
#define VIRTIO_IOMMU_FAULT_F_ADDRESS(1 << 8)

The extended struct could be:

struct virtio_iommu_fault {
u8   reason;
u8   reserved[3];
le32 flags;
le32 endpoint;
le32 pasid;
le64 address;
/* Page request group ID */
le16 group_id;
u8   reserved1[6];
/* For VT-d private data */
le64 private_data[2];
};

#define VIRTIO_IOMMU_FAULT_F_READ   (1 << 0)
#define VIRTIO_IOMMU_FAULT_F_WRITE  (1 << 1)
#define VIRTIO_IOMMU_FAULT_F_EXEC   (1 << 2)
#define VIRTIO_IOMMU_FAULT_F_PRIVILEGED (1 << 3)
/* Last fault in group */
#define VIRTIO_IOMMU_FAULT_F_LAST   (1 << 4)
/* Fault is a recoverable page request and requires a response */
#define VIRTIO_IOMMU_FAULT_F_PAGE_REQ   (1 << 5)
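The ABI point above — appending fields must not move 'endpoint' or 'address' — can be checked mechanically. A sketch of the two layouts with fixed-width integers standing in for the le* types (struct names are illustrative; the extended layout follows the proposal above, with 'pasid' reusing the old reserved word):

```c
#include <stdint.h>
#include <stddef.h>

/* Base fault layout, as currently in the spec. */
struct fault_base {
	uint8_t  reason;
	uint8_t  reserved[3];
	uint32_t flags;
	uint32_t endpoint;
	uint32_t reserved1;
	uint64_t address;
};

/* Extended layout: pasid replaces reserved1, new fields are appended. */
struct fault_ext {
	uint8_t  reason;
	uint8_t  reserved[3];
	uint32_t flags;
	uint32_t endpoint;
	uint32_t pasid;
	uint64_t address;
	uint16_t group_id;
	uint8_t  reserved2[6];
	uint64_t private_data[2];
};
```

Old drivers reading 'endpoint' at offset 8 and 'address' at offset 16 keep working against the extended layout, which is the whole point of appending rather than inserting.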


Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote:
> This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> userspace to directly open a vfio device w/o relying on container/group
> (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> iommufd (more specifically in iommu core by this RFC) in a device-centric
> manner.
> 
> In case a device is exposed in both legacy and new interfaces (see next
> patch for how to decide it), this patch also ensures that when the device
> is already opened via one interface then the other one must be blocked.
> 
> Signed-off-by: Liu Yi L 
> ---
>  drivers/vfio/vfio.c  | 228 +++
>  include/linux/vfio.h |   2 +
>  2 files changed, 213 insertions(+), 17 deletions(-)

> +static int vfio_init_device_class(void)
> +{
> + int ret;
> +
> + mutex_init(&vfio.device_lock);
> + idr_init(&vfio.device_idr);
> +
> + /* /dev/vfio/devices/$DEVICE */
> + vfio.device_class = class_create(THIS_MODULE, "vfio-device");
> + if (IS_ERR(vfio.device_class))
> + return PTR_ERR(vfio.device_class);
> +
> + vfio.device_class->devnode = vfio_device_devnode;
> +
> + ret = alloc_chrdev_region(&vfio.device_devt, 0, MINORMASK + 1, 
> "vfio-device");
> + if (ret)
> + goto err_alloc_chrdev;
> +
> + cdev_init(&vfio.device_cdev, &vfio_device_fops);
> + ret = cdev_add(&vfio.device_cdev, vfio.device_devt, MINORMASK + 1);
> + if (ret)
> + goto err_cdev_add;

Huh? This is not how cdevs are used. This patch needs rewriting.

The struct vfio_device should gain a 'struct device' and 'struct cdev'
as non-pointer members

vfio register path should end up doing cdev_device_add() for each
vfio_device

vfio_unregister path should do cdev_device_del()

No idr should be needed, an ida is used to allocate minor numbers

The struct device release function should trigger a kfree which
requires some reworking of the callers

vfio_init_group_dev() should do a device_initialize()
vfio_uninit_group_dev() should do a device_put()

The opened atomic is awful. A newly created fd should start in a
state where it has a disabled fops

The only thing the disabled fops can do is register the device to the
iommu fd. When successfully registered the device gets the normal fops.

The registration steps should be done under a normal lock inside the
vfio_device. If a vfio_device is already registered then further
registration should fail.

Getting the device fd via the group fd triggers the same sequence as
above.
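A userspace sketch of the "disabled fops until registered" life cycle described here (the locking and names are illustrative assumptions, not the actual VFIO code):

```c
#include <pthread.h>
#include <errno.h>
#include <stdbool.h>

/* A device fd starts out disabled; only a successful registration
 * to the iommu fd enables the normal ops. */
struct vfio_dev_state {
	pthread_mutex_t lock;
	bool registered;
};

static int dev_register(struct vfio_dev_state *s)
{
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	if (s->registered)
		ret = -EBUSY;		/* already bound, refuse a second bind */
	else
		s->registered = true;	/* normal fops become available */
	pthread_mutex_unlock(&s->lock);
	return ret;
}

static int dev_ioctl(struct vfio_dev_state *s)
{
	int ret;

	pthread_mutex_lock(&s->lock);
	/* Disabled fops: everything except registration fails. */
	ret = s->registered ? 0 : -ENODEV;
	pthread_mutex_unlock(&s->lock);
	return ret;
}
```

The key property is that the registration check and state change happen under one ordinary lock, so no separate "opened" atomic is needed.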

Jason


Re: [PATCH] iommu/dart: Remove iommu_flush_ops

2021-09-21 Thread Marc Zyngier
On Tue, 21 Sep 2021 16:39:34 +0100,
Sven Peter  wrote:
> 
> apple_dart_tlb_flush_{all,walk} expect to get a struct apple_dart_domain
> but instead get a struct iommu_domain right now. This breaks those two
> functions and can lead to kernel panics like the one below.
> DART can only invalidate the entire TLB and apple_dart_iotlb_sync will
> already flush everything. There's no need to do that again inside those
> two functions. Let's just drop them.
> 
>   pci :03:00.0: Removing from iommu group 1
>   Unable to handle kernel paging request at virtual address 00010023
>   [...]
>   Call trace:
>_raw_spin_lock_irqsave+0x54/0xbc
>apple_dart_hw_stream_command.constprop.0+0x2c/0x130
>apple_dart_tlb_flush_all+0x48/0x90
>free_io_pgtable_ops+0x40/0x70
>apple_dart_domain_free+0x2c/0x44
>iommu_group_release+0x68/0xac
>kobject_cleanup+0x4c/0x1fc
>kobject_cleanup+0x14c/0x1fc
>kobject_put+0x64/0x84
>iommu_group_remove_device+0x110/0x180
>iommu_release_device+0x50/0xa0
>   [...]
> 
> Fixes: 46d1fb072e76b161 ("iommu/dart: Add DART iommu driver")
> Reported-by: Marc Zyngier 
> Signed-off-by: Sven Peter 

Thanks for addressing this so quickly.

Acked-by: Marc Zyngier 
Tested-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:29PM +0800, Liu Yi L wrote:
> /dev/iommu aims to provide a unified interface for managing I/O address
> spaces for devices assigned to userspace. This patch adds the initial
> framework to create a /dev/iommu node. Each open of this node returns an
> iommufd. And this fd is the handle for userspace to initiate its I/O
> address space management.
> 
> One open:
> - We call this feature as IOMMUFD in Kconfig in this RFC. However this
>   name is not clear enough to indicate its purpose to user. Back to 2010
>   vfio even introduced a /dev/uiommu [1] as the predecessor of its
>   container concept. Is that a better name? Appreciate opinions here.
> 
> [1] https://lore.kernel.org/kvm/4c0eb470.1hmjondo00nivfm6%25p...@cisco.com/
> 
> Signed-off-by: Liu Yi L 
>  drivers/iommu/Kconfig   |   1 +
>  drivers/iommu/Makefile  |   1 +
>  drivers/iommu/iommufd/Kconfig   |  11 
>  drivers/iommu/iommufd/Makefile  |   2 +
>  drivers/iommu/iommufd/iommufd.c | 112 
>  5 files changed, 127 insertions(+)
>  create mode 100644 drivers/iommu/iommufd/Kconfig
>  create mode 100644 drivers/iommu/iommufd/Makefile
>  create mode 100644 drivers/iommu/iommufd/iommufd.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 07b7c25cbed8..a83ce0acd09d 100644
> +++ b/drivers/iommu/Kconfig
> @@ -136,6 +136,7 @@ config MSM_IOMMU
>  
>  source "drivers/iommu/amd/Kconfig"
>  source "drivers/iommu/intel/Kconfig"
> +source "drivers/iommu/iommufd/Kconfig"
>  
>  config IRQ_REMAP
>   bool "Support for Interrupt Remapping"
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index c0fb0ba88143..719c799f23ad 100644
> +++ b/drivers/iommu/Makefile
> @@ -29,3 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
>  obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o
>  obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
> +obj-$(CONFIG_IOMMUFD) += iommufd/
> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> new file mode 100644
> index ..9fb7769a815d
> +++ b/drivers/iommu/iommufd/Kconfig
> @@ -0,0 +1,11 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config IOMMUFD
> + tristate "I/O Address Space management framework for passthrough 
> devices"
> + select IOMMU_API
> + default n
> + help
> +   provides unified I/O address space management framework for
> +   isolating untrusted DMAs via devices which are passed through
> +   to userspace drivers.
> +
> +   If you don't know what to do here, say N.
> diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
> new file mode 100644
> index ..54381a01d003
> +++ b/drivers/iommu/iommufd/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_IOMMUFD) += iommufd.o
> diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c
> new file mode 100644
> index ..710b7e62988b
> +++ b/drivers/iommu/iommufd/iommufd.c
> @@ -0,0 +1,112 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * I/O Address Space Management for passthrough devices
> + *
> + * Copyright (C) 2021 Intel Corporation
> + *
> + * Author: Liu Yi L 
> + */
> +
> +#define pr_fmt(fmt)"iommufd: " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Per iommufd */
> +struct iommufd_ctx {
> + refcount_t refs;
> +};

A private_data of a struct file should avoid having a refcount (and
this should have been a kref anyhow)

Use the refcount on the struct file instead.

In general the lifetime models look overly convoluted to me with
refcounts being used as locks and going in all manner of directions.

- No refcount on iommufd_ctx, this should use the fget on the fd.
  The driver facing version of the API has the driver holds a fget
  inside the iommufd_device.

- Put a rwlock inside the iommufd_ioas that is a
  'destroying_lock'. The rwlock starts out unlocked.
  
  Acquire from the xarray is
   rcu_lock()
   ioas = xa_load()
   if (ioas)
  if (down_read_trylock(&ioas->destroying_lock))
   // success
  Unacquire is just up_read()

  Do down_write when the ioas is to be destroyed, do not return ebusy.

 - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does
   not need locking (order it properly too, it is in the wrong order), and
   don't check for duplicate devices or dev_cookie duplication, that
   is user error and is harmless to the kernel.
  
> +static int iommufd_fops_release(struct inode *inode, struct file *filep)
> +{
> + struct iommufd_ctx *ictx = filep->private_data;
> +
> + filep->private_data = NULL;

unnecessary

> + iommufd_ctx_put(ictx);
> +
> + return 0;
> +}
> +
> +static long iommufd_fops_unl_ioctl(struct file *filep,
> +unsigned int cmd, unsigned long arg)
> +{
> + struct 

[PATCH] iommu/dart: Remove iommu_flush_ops

2021-09-21 Thread Sven Peter via iommu
apple_dart_tlb_flush_{all,walk} expect to get a struct apple_dart_domain
but instead get a struct iommu_domain right now. This breaks those two
functions and can lead to kernel panics like the one below.
DART can only invalidate the entire TLB and apple_dart_iotlb_sync will
already flush everything. There's no need to do that again inside those
two functions. Let's just drop them.

  pci :03:00.0: Removing from iommu group 1
  Unable to handle kernel paging request at virtual address 00010023
  [...]
  Call trace:
   _raw_spin_lock_irqsave+0x54/0xbc
   apple_dart_hw_stream_command.constprop.0+0x2c/0x130
   apple_dart_tlb_flush_all+0x48/0x90
   free_io_pgtable_ops+0x40/0x70
   apple_dart_domain_free+0x2c/0x44
   iommu_group_release+0x68/0xac
   kobject_cleanup+0x4c/0x1fc
   kobject_cleanup+0x14c/0x1fc
   kobject_put+0x64/0x84
   iommu_group_remove_device+0x110/0x180
   iommu_release_device+0x50/0xa0
  [...]

Fixes: 46d1fb072e76b161 ("iommu/dart: Add DART iommu driver")
Reported-by: Marc Zyngier 
Signed-off-by: Sven Peter 
---
 drivers/iommu/apple-dart.c | 18 --
 1 file changed, 18 deletions(-)

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index c37fb4790e8a..47ffe9e49abb 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -181,7 +181,6 @@ struct apple_dart_master_cfg {
 
 static struct platform_driver apple_dart_driver;
 static const struct iommu_ops apple_dart_iommu_ops;
-static const struct iommu_flush_ops apple_dart_tlb_ops;
 
 static struct apple_dart_domain *to_dart_domain(struct iommu_domain *dom)
 {
@@ -336,22 +335,6 @@ static void apple_dart_iotlb_sync_map(struct iommu_domain 
*domain,
apple_dart_domain_flush_tlb(to_dart_domain(domain));
 }
 
-static void apple_dart_tlb_flush_all(void *cookie)
-{
-   apple_dart_domain_flush_tlb(cookie);
-}
-
-static void apple_dart_tlb_flush_walk(unsigned long iova, size_t size,
- size_t granule, void *cookie)
-{
-   apple_dart_domain_flush_tlb(cookie);
-}
-
-static const struct iommu_flush_ops apple_dart_tlb_ops = {
-   .tlb_flush_all = apple_dart_tlb_flush_all,
-   .tlb_flush_walk = apple_dart_tlb_flush_walk,
-};
-
 static phys_addr_t apple_dart_iova_to_phys(struct iommu_domain *domain,
   dma_addr_t iova)
 {
@@ -433,7 +416,6 @@ static int apple_dart_finalize_domain(struct iommu_domain 
*domain,
.ias = 32,
.oas = 36,
.coherent_walk = 1,
-   .tlb = &apple_dart_tlb_ops,
.iommu_dev = dart->dev,
};
 
-- 
2.25.1



Re: [PATCH] swiotlb: set IO TLB segment size via cmdline

2021-09-21 Thread Roman Skakun
Hi Robin,

>> I use Xen PV display. In my case, PV display backend(Dom0) allocates
>> a contiguous buffer via DMA-API to implement zero-copy between Dom0 and DomU.
>>
> Well, something's gone badly wrong there - if you have to shadow the
> entire thing in a bounce buffer to import it then it's hardly zero-copy,
> is it? If you want to do buffer sharing the buffer really needs to be
> allocated appropriately to begin with, such that all relevant devices
> can access it directly. That might be something which needs fixing in Xen.
>

Right, when we want to use a zero-copy approach we need to avoid the
swiotlb bounce buffer for all devices that potentially use this buffer.
The root of the problem is that this buffer is mapped to foreign pages,
and when I tried to retrieve the dma_addr for it I got a foreign MFN
larger than 32 bits, so swiotlb tried to use the bounce buffer.
I understand that I need to find a way to avoid swiotlb in this case;
at the moment it's unclear how to do that properly.
But that is another story...

I guess we can have a situation where some device like rcar-du needs a
sufficiently large buffer, greater than 256 KB (128 slots * 2048 bytes
with the current IO_TLB_SEGSIZE), and we need to adjust this parameter
at boot time rather than at compile time.
That is why this patch was created.
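For reference, the limit involved works out as follows (a sketch using the mainline defaults of 128 slots of 2048 bytes per segment; the framebuffer size is taken from the log above):

```c
#include <stddef.h>

#define IO_TLB_SLOT_SIZE 2048	/* 1 << IO_TLB_SHIFT */
#define IO_TLB_SEGSIZE	 128	/* slots per contiguous segment */

/* Largest contiguous mapping a single swiotlb segment can bounce. */
static size_t swiotlb_max_segment_bytes(void)
{
	return (size_t)IO_TLB_SEGSIZE * IO_TLB_SLOT_SIZE;
}

/* Would a mapping of this many bytes overflow one segment? */
static int swiotlb_mapping_too_big(size_t bytes)
{
	return bytes > swiotlb_max_segment_bytes();
}
```

The 3686400-byte framebuffer from the log is roughly 14 times larger than one 256 KB segment, which is why the map attempt fails.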

Thanks,
Roman

On Fri, Sep 17, 2021 at 12:44, Robin Murphy wrote:
>
> On 2021-09-17 10:36, Roman Skakun wrote:
> > Hi, Christoph
> >
> > I use Xen PV display. In my case, PV display backend(Dom0) allocates
> > a contiguous buffer via DMA-API to implement zero-copy between Dom0 and DomU.
>
> Well, something's gone badly wrong there - if you have to shadow the
> entire thing in a bounce buffer to import it then it's hardly zero-copy,
> is it? If you want to do buffer sharing the buffer really needs to be
> allocated appropriately to begin with, such that all relevant devices
> can access it directly. That might be something which needs fixing in Xen.
>
> Robin.
>
> > When I start Weston under DomU, I got the next log in Dom0:
> > ```
> > [ 112.554471] CPU: 0 PID: 367 Comm: weston Tainted: G O
> > 5.10.0-yocto-standard+ #312
> > [ 112.575149] Call trace:
> > [ 112.577666] dump_backtrace+0x0/0x1b0
> > [ 112.581373] show_stack+0x18/0x70
> > [ 112.584746] dump_stack+0xd0/0x12c
> > [ 112.588200] swiotlb_tbl_map_single+0x234/0x360
> > [ 112.592781] xen_swiotlb_map_page+0xe4/0x4c0
> > [ 112.597095] xen_swiotlb_map_sg+0x84/0x12c
> > [ 112.601249] dma_map_sg_attrs+0x54/0x60
> > [ 112.605138] vsp1_du_map_sg+0x30/0x60
> > [ 112.608851] rcar_du_vsp_map_fb+0x134/0x170
> > [ 112.613082] rcar_du_vsp_plane_prepare_fb+0x44/0x64
> > [ 112.618007] drm_atomic_helper_prepare_planes+0xac/0x160
> > [ 112.623362] drm_atomic_helper_commit+0x88/0x390
> > [ 112.628029] drm_atomic_nonblocking_commit+0x4c/0x60
> > [ 112.633043] drm_mode_atomic_ioctl+0x9a8/0xb0c
> > [ 112.637532] drm_ioctl_kernel+0xc4/0x11c
> > [ 112.641506] drm_ioctl+0x21c/0x460
> > [ 112.644967] __arm64_sys_ioctl+0xa8/0xf0
> > [ 112.648939] el0_svc_common.constprop.0+0x78/0x1a0
> > [ 112.653775] do_el0_svc+0x24/0x90
> > [ 112.657148] el0_svc+0x14/0x20
> > [ 112.660254] el0_sync_handler+0x1a4/0x1b0
> > [ 112.664315] el0_sync+0x174/0x180
> > [ 112.668145] rcar-fcp fea2f000.fcp: swiotlb buffer is full (sz:
> > 3686400 bytes), total 65536 (slots), used 112 (slots)
> > ```
> > The problem is happened here:
> > https://elixir.bootlin.com/linux/v5.14.4/source/drivers/gpu/drm/rcar-du/rcar_du_vsp.c#L202
> >
> > Sgt was created in dma_get_sgtable() by dma_common_get_sgtable() and
> > includes a single page chunk
> > as shown here:
> > https://elixir.bootlin.com/linux/v5.14.5/source/kernel/dma/ops_helpers.c#L18
> >
> > After creating a new sgt, we tried to map this sgt through vsp1_du_map_sg().
> > Internally, vsp1_du_map_sg() using ops->map_sg (e.g
> > xen_swiotlb_map_sg) to perform
> > mapping.
> >
> > I realized that required segment is too big to be fitted to default
> > swiotlb segment and condition
> > https://elixir.bootlin.com/linux/latest/source/kernel/dma/swiotlb.c#L474
> > is always false.
> >
> > I know that I use a large buffer, but why can't I map this buffer in one 
> > chunk?
> >
> > Thanks!
> >
> > On Wed, Sep 15, 2021 at 16:53, Christoph Hellwig wrote:
> >>
> >> On Wed, Sep 15, 2021 at 03:49:52PM +0200, Jan Beulich wrote:
> >>> But the question remains: Why does the framebuffer need to be mapped
> >>> in a single giant chunk?
> >>
> >> More importantly: if you use dynamic dma mappings for your framebuffer
> >> you're doing something wrong.
> >
> >
> >



-- 
Best Regards, Roman.

Re: [RFC 00/20] Introduce /dev/iommu for userspace I/O address space management

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:28PM +0800, Liu Yi L wrote:
> Linux now includes multiple device-passthrough frameworks (e.g. VFIO and
> vDPA) to manage secure device access from the userspace. One critical task
> of those frameworks is to put the assigned device in a secure, IOMMU-
> protected context so user-initiated DMAs are prevented from doing harm to
> the rest of the system.

Some bot will probably send this too, but it has compile warnings and
needs to be rebased to 5.15-rc1

drivers/iommu/iommufd/iommufd.c:269:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
if (refcount_read(&ioas->refs) > 1) {
^~
drivers/iommu/iommufd/iommufd.c:277:9: note: uninitialized use occurs here
return ret;
   ^~~
drivers/iommu/iommufd/iommufd.c:269:2: note: remove the 'if' if its condition is always true
if (refcount_read(&ioas->refs) > 1) {
^~~~
drivers/iommu/iommufd/iommufd.c:253:17: note: initialize the variable 'ret' to silence this warning
int ioasid, ret;
   ^
= 0
drivers/iommu/iommufd/iommufd.c:727:7: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
^~
drivers/iommu/iommufd/iommufd.c:767:17: note: uninitialized use occurs here
return ERR_PTR(ret);
   ^~~
drivers/iommu/iommufd/iommufd.c:727:3: note: remove the 'if' if its condition is always false
if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
^
drivers/iommu/iommufd/iommufd.c:727:7: warning: variable 'ret' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
^~~~
drivers/iommu/iommufd/iommufd.c:767:17: note: uninitialized use occurs here
return ERR_PTR(ret);
   ^~~
drivers/iommu/iommufd/iommufd.c:727:7: note: remove the '||' if its condition is always false
if (idev->dev == dev || idev->dev_cookie == dev_cookie) {
^~~
drivers/iommu/iommufd/iommufd.c:717:9: note: initialize the variable 'ret' to silence this warning
int ret;
   ^
= 0

Jason


[PATCH] iommu/arm-smmu-v3: poll cmdq until it has space

2021-09-21 Thread Fernand Sieber via iommu
When a thread sends commands to the SMMU, it needs to allocate some
space to write its commands in a ring buffer.

The allocation algorithm works as follows: until enough free space is
available in the queue, repeat the following outer loop. First, try to
acquire an exclusive lock to read the consumer index from the SMMU
register over MMIO. If that fails, it means that another thread holds
the lock (or multiple threads, in shared mode, for sync commands). The
current code guarantees that when a thread is holding the lock, the
consumer index will be updated from the SMMU register. So when
acquisition of the exclusive lock fails, we can safely assume that
another thread will eventually update the consumer index, and we enter
an inner waiting loop until that happens.

The problem that this patch fixes is that the waiting loop exits as soon
as any space is available in the queue, so it is possible that it exits
immediately if there's some space available but not enough to write the
thread's commands. That means the cmdq space allocation will fail (outer
loop) and the thread will spin attempting to acquire the exclusive lock
to update the consumer index from the SMMU register.

If a lot of threads are failing to allocate commands, this can cause
heavy contention on the lock, to the point where the system slows down
or livelocks. The livelock is possible if some other threads are
attempting to execute synchronous commands. These threads need to
ensure that they control updates to the consumer index so that they can
correctly observe when their command is executed; they enforce that by
acquiring the lock in shared mode. If there's too much contention, they
never succeed in acquiring the lock via the read+cmpxchg mechanism, and
their progress stalls. But because they already hold allocated space in
the command queue, their stall prevents progress by other threads
attempting to allocate space in the cmdq. This causes a livelock.

This patch makes the waiting loop exit as soon as enough space is
available, rather than as soon as any space is available. This means
that when two threads are competing for the exclusive lock while
allocating space in the queue, one of them will fail to acquire the
lock in exclusive mode, be knocked back to the waiting loop, and stay
there until there's enough free space, rather than exiting immediately
when any space is available. Threads in the waiting loop do not compete
for the lock, reducing contention enough to enable synchronous threads
to make progress, when applicable.

Note that we cannot afford to have all threads parked in the waiting
loop unless there are synchronous threads executing concurrently,
otherwise no thread is observing the SMMU register and updating the
consumer index. Thus if we succeed in acquiring the lock in exclusive
mode, we cannot enter the waiting loop, because we could be the last
thread observing the SMMU. Similarly, if the producer index is updated,
we need to exit the waiting loop, because it could mean that the latest
thread to observe the SMMU has succeeded in allocating commands and has
moved on.

Signed-off-by: Fernand Sieber 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 +++--
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a388e318f86e..9ccda3bd5402 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -118,12 +118,6 @@ static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n)
return space >= n;
 }

-static bool queue_full(struct arm_smmu_ll_queue *q)
-{
-   return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) &&
-  Q_WRP(q, q->prod) != Q_WRP(q, q->cons);
-}
-
 static bool queue_empty(struct arm_smmu_ll_queue *q)
 {
return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) &&
@@ -582,14 +576,16 @@ static void arm_smmu_cmdq_poll_valid_map(struct arm_smmu_cmdq *cmdq,
__arm_smmu_cmdq_poll_set_valid_map(cmdq, sprod, eprod, false);
 }

-/* Wait for the command queue to become non-full */
-static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu,
-struct arm_smmu_ll_queue *llq)
+/* Wait for the command queue to have enough space */
+static int arm_smmu_cmdq_poll_until_has_space(struct arm_smmu_device *smmu,
+ struct arm_smmu_ll_queue *llq,
+ u32 n)
 {
unsigned long flags;
struct arm_smmu_queue_poll qp;
struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
int ret = 0;
+   int prod;

/*
 * Try to update our copy of cons by grabbing exclusive cmdq access. If
@@ -599,15 +595,36 @@ static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu,
WRITE_ONCE(cmdq->q.llq.cons, readl_relaxed(cmdq->q.cons_reg));

Re: [PATCH v4 5/6] iova: Add iova_len argument to init_iova_domain()

2021-09-21 Thread John Garry

On 02/08/2021 17:16, Robin Murphy wrote:


Also fix up all users to set this value (at 0, meaning use default).

Wrap that in init_iova_domain_defaults() to avoid the mysterious 0?


Sure, I can do something like that. I actually did have something 
separate along those lines in v3 before I decided to change it.


Y'know, at this point I'm now starting to seriously wonder whether 
moving the rcaches into iommu_dma_cookie wouldn't make a whole lot of 
things simpler... :/


Does that sound like crazy talk to you, or an idea worth entertaining?


Hi Robin,

JFYI, to try to make inroads into my IOVA issues, I'm going to look to 
do this first, if you don't mind. I think that the fq stuff could also 
be put into a separate structure, rather than iova_domain, and that 
could likewise be a member of iommu_dma_cookie.


BTW, with regards to separating the rcache magazine code out, I see 
someone already trying to introduce something similar:


https://lore.kernel.org/lkml/cakw4uuxperg41z8lu5qyss-yegt1anud1cuiuqxc0anfqjb...@mail.gmail.com/T/#me4cc5de775ad16ab3d6e7ca854b55f274ddcba08

https://lore.kernel.org/lkml/yukerk1vvzmht...@casper.infradead.org/T/#t

Cheers,
John