Re: [PATCH v7 0/9] ACPI/IORT: Support for IORT RMR node

2021-08-30 Thread Jon Nettleton
On Thu, Aug 5, 2021 at 4:09 PM Ard Biesheuvel  wrote:
>
> On Thu, 5 Aug 2021 at 15:35, Shameerali Kolothum Thodi
>  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Ard Biesheuvel [mailto:a...@kernel.org]
> > > Sent: 05 August 2021 14:23
> > > To: Shameerali Kolothum Thodi 
> > > Cc: Linux ARM ; ACPI Devel Mailing List
> > > ; Linux IOMMU
> > > ; Linuxarm ;
> > > Lorenzo Pieralisi ; Joerg Roedel
> > > ; Robin Murphy ; Will Deacon
> > > ; wanghuiqiang ; Guohanjun
> > > (Hanjun Guo) ; Steven Price
> > > ; Sami Mujawar ; Jon
> > > Nettleton ; Eric Auger ;
> > > yangyicong 
> > > Subject: Re: [PATCH v7 0/9] ACPI/IORT: Support for IORT RMR node
> > >
> > > On Thu, 5 Aug 2021 at 10:10, Shameer Kolothum
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > The series adds support for IORT RMR nodes specified in IORT
> > > > Revision E.b - ARM DEN 0049E[0]. RMR nodes are used to describe
> > > > memory ranges that are used by endpoints and require a unity
> > > > mapping in the SMMU.
> > > >
> > > > We have faced issues with 3408iMR RAID controller cards that
> > > > fail to boot when the SMMU is enabled. This is because these
> > > > controllers make use of host memory for various caching-related
> > > > purposes, and when the SMMU is enabled the iMR firmware fails to
> > > > access these memory regions as there is no mapping for them.
> > > > IORT RMR provides a way for UEFI to describe and report these
> > > > memory regions so that the kernel can create a unity mapping for
> > > > them in the SMMU.
> > > >
> > >
> > > Does this mean we are ignoring the RMR memory ranges, and exposing the
> > > entire physical address space to devices using the stream IDs in
> > > question?
> >
> > Nope. The RMR node is used to describe the memory ranges used by endpoints
> > behind the SMMU. This information is used to create 1:1 mappings for those
> > ranges in the SMMU. Anything outside those ranges will result in a
> > translation fault (if there are no other dynamic DMA mappings).
> >
>
> Excellent! It was not obvious to me from looking at the patches, so I
> had to ask.
>
> Thanks,
> Ard.
>
> >
> > >
> > > > Change History:
> > > >
> > > > v6 --> v7
> > > >
> > > > The only change from v6 is the fix pointed out by Steve to
> > > > the SMMUv2 SMR bypass install in patch #8.
> > > >
> > > > Thanks for the Tested-by tags from Laurentiu with SMMUv2 and
> > > > Hanjun/Huiqiang with SMMUv3 for v6. I haven't added the tags
> > > > yet as the series still needs more review[1].
> > > >
> > > > Feedback and tests on this series are very much appreciated.
> > > >
> > > > v5 --> v6
> > > > - Addressed comments from Robin & Lorenzo.
> > > >   : Moved iort_parse_rmr() to acpi_iort_init() from
> > > > iort_init_platform_devices().
> > > >   : Removed use of struct iort_rmr_entry during the initial
> > > > parse. Using struct iommu_resv_region instead.
> > > >   : Report RMR address alignment and overlap errors, but continue.
> > > >   : Reworked arm_smmu_init_bypass_stes() (patch # 6).
> > > > - Updated SMMUv2 bypass SMR code. Thanks to Jon N (patch #8).
> > > > - Set IOMMU protection flags (IOMMU_CACHE, IOMMU_MMIO) based
> > > >   on the type of RMR region. Suggested by Jon N.
> > > >
> > > > Thanks,
> > > > Shameer
> > > > [0] https://developer.arm.com/documentation/den0049/latest/
> > > > [1]
> > > https://lore.kernel.org/linux-acpi/20210716083442.1708-1-shameerali.koloth
> > > um.th...@huawei.com/T/#m043c95b869973a834b2fd57f3e1ed0325c84f3b7
> > > > --
> > > > v4 --> v5
> > > >  -Added a fw_data union to struct iommu_resv_region and removed
> > > >   struct iommu_rmr (Based on comments from Joerg/Robin).
> > > >  -Added iommu_put_rmrs() to release mem.
> > > >  -Thanks to Steve for verifying on SMMUv2, but not added the Tested-by
> > > >   yet because of the above changes.
> > > >
> > > > v3 --> v4
> > > > -Included the SMMUv2 SMR bypass install changes suggested by
> > > >  Steve (patch #7)
> > > > -As per Robin's comments, the RMR reserve implementation is now
> > > >  more generic (patch #8); dropped v3 patches 8 and 10.
> > > > -Rebase to 5.13-rc1
> > > >
> > > > RFC v2 --> v3
> > > >  -Dropped RFC tag as the ACPICA header changes are now ready to be
> > > >   part of 5.13[0]. But this series still has a dependency on that patch.
> > > >  -Added IORT E.b related changes (node flags, _DSM function 5 checks for
> > > >   PCIe).
> > > >  -Changed RMR to stream id mapping from M:N to M:1 as per the spec and
> > > >   discussion here[1].
> > > >  -Last two patches add support for SMMUv2(Thanks to Jon Nettleton!)
> > > > --
> > > >
> > > > Jon Nettleton (1):
> > > >   iommu/arm-smmu: Get associated RMR info and install bypass SMR
> > > >
> > > > Shameer Kolothum (8):
> > > >   iommu: Introduce a union to struct iommu_resv_region
> > > >   ACPI/IORT: Add support for RMR node parsing
> > > >   iommu/dma: Introduce generic helper to retrieve RMR info
> > > >   ACPI/IORT: Add a helper to retrieve RMR memory regions
> > > >   iommu/arm-smmu-v3: I

[RFC][PATCH v2 12/13] iommu/arm-smmu-v3: Add support for NVIDIA CMDQ-Virtualization hw

2021-08-30 Thread Nicolin Chen via iommu
From: Nate Watterson 

NVIDIA's Grace SoC has CMDQ-Virtualization (CMDQV) hardware,
which adds multiple VCMDQ interfaces (VINTFs) to supplement the
architected SMMU_CMDQ in an effort to reduce contention.

To make use of these supplemental CMDQs in the arm-smmu-v3
driver, this patch borrows the "implementation infrastructure"
design from the arm-smmu driver, and then adds implementation-
specific support for the ->device_reset() and ->get_cmdq()
functions. Since NVIDIA's ->get_cmdq() implementation needs to
check the first command of the cmdlist to determine whether to
redirect to its own VCMDQ, this patch also augments the
arm_smmu_get_cmdq() function.

For the CMDQV driver itself, this patch only adds the essential
parts for the host kernel, in terms of virtualization use cases.
VINTF0 is reserved for host kernel use, so it is also initialized
with the driver.

Note that, for the current plan, the CMDQV driver only supports
ACPI configuration.
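
As a rough sketch, an implementation-specific ->get_cmdq() can key off
the opcode of the first command like below (the function body and the
nsmmu/vcmdq members are illustrative assumptions, not the code in this
patch; only CMDQ_0_OP and the CMDQ_OP_* constants come from the
existing arm-smmu-v3.h):

    static struct arm_smmu_cmdq *
    nvidia_smmu_get_cmdq(struct arm_smmu_device *smmu, u64 *cmds, int n)
    {
        struct nvidia_smmu *nsmmu = to_nvidia_smmu(smmu); /* illustrative */

        /* The VCMDQ only accepts invalidation commands and CMD_SYNC;
         * anything else must go through the architected SMMU_CMDQ.
         */
        switch (FIELD_GET(CMDQ_0_OP, cmds[0])) {
        case CMDQ_OP_TLBI_NH_ASID:
        case CMDQ_OP_TLBI_NH_VA:
        case CMDQ_OP_TLBI_S12_VMALL:
        case CMDQ_OP_ATC_INV:
        case CMDQ_OP_CMD_SYNC:
            return nsmmu->vcmdq;    /* illustrative member */
        default:
            return &smmu->cmdq;
        }
    }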

Signed-off-by: Nate Watterson 
Signed-off-by: Nicolin Chen 
---
 MAINTAINERS   |   2 +
 drivers/iommu/arm/arm-smmu-v3/Makefile|   2 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c  |   7 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  15 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   8 +
 .../iommu/arm/arm-smmu-v3/nvidia-smmu-v3.c| 432 ++
 6 files changed, 463 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/nvidia-smmu-v3.c

diff --git a/MAINTAINERS b/MAINTAINERS
index f800abca74b0..7a2f21279d35 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18428,8 +18428,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
 R: Krishna Reddy 
+R: Nicolin Chen 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm/arm-smmu-v3/nvidia-smmu-v3.c
 F: drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index 1f5838d3351b..0aa84c0a50ea 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
-arm_smmu_v3-objs-y += arm-smmu-v3.o arm-smmu-v3-impl.o
+arm_smmu_v3-objs-y += arm-smmu-v3.o arm-smmu-v3-impl.o nvidia-smmu-v3.o
 arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c
index 6947d28067a8..37d062e40eb5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c
@@ -4,5 +4,12 @@
 
 struct arm_smmu_device *arm_smmu_v3_impl_init(struct arm_smmu_device *smmu)
 {
+   /*
+* The NVIDIA implementation supports ACPI only, so its init() is
+* called unconditionally to walk the ACPI tables and probe for the
+* device. It keeps the smmu pointer intact if it fails.
+*/
+   smmu = nvidia_smmu_v3_impl_init(smmu);
+
return smmu;
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 510e1493fd5a..1b9459592f76 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -335,8 +335,11 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
return 0;
 }
 
-static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu)
+static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu, u64 *cmds, int n)
 {
+   if (smmu->impl && smmu->impl->get_cmdq)
+   return smmu->impl->get_cmdq(smmu, cmds, n);
+
return &smmu->cmdq;
 }
 
@@ -742,7 +745,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
u32 prod;
unsigned long flags;
bool owner;
-   struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
+   struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu, cmds, n);
struct arm_smmu_ll_queue llq, head;
int ret = 0;
 
@@ -3487,6 +3490,14 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
return ret;
}
 
+   if (smmu->impl && smmu->impl->device_reset) {
+   ret = smmu->impl->device_reset(smmu);
+   if (ret) {
+   dev_err(smmu->dev, "failed at implementation specific 
device_reset\n");
+   return ret;
+   }
+   }
+
return 0;
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index c65c39336916..bb903a7fa662 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -647,6 +647,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_O

[RFC][PATCH v2 10/13] iommu/arm-smmu-v3: Pass cmdq pointer in arm_smmu_cmdq_issue_cmdlist()

2021-08-30 Thread Nicolin Chen via iommu
The driver currently calls the arm_smmu_get_cmdq() helper internally
in different places, though they are all actually reached from the
same source -- the arm_smmu_cmdq_issue_cmdlist() function.

This patch changes this to pass the cmdq pointer to these functions
instead of calling arm_smmu_get_cmdq() every time.

This also helps the NVIDIA implementation, which maintains its own
cmdq pointers and needs to redirect from the arm_smmu->cmdq pointer
to its own upon detecting unsupported commands by checking the opcode
of the cmdlist.

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6878a83582b9..216f3442aac4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -584,11 +584,11 @@ static void arm_smmu_cmdq_poll_valid_map(struct arm_smmu_cmdq *cmdq,
 
 /* Wait for the command queue to become non-full */
 static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu,
+struct arm_smmu_cmdq *cmdq,
 struct arm_smmu_ll_queue *llq)
 {
unsigned long flags;
struct arm_smmu_queue_poll qp;
-   struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
int ret = 0;
 
/*
@@ -619,11 +619,11 @@ static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu,
  * Must be called with the cmdq lock held in some capacity.
  */
 static int __arm_smmu_cmdq_poll_until_msi(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq *cmdq,
  struct arm_smmu_ll_queue *llq)
 {
int ret = 0;
struct arm_smmu_queue_poll qp;
-   struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
u32 *cmd = (u32 *)(Q_ENT(&cmdq->q, llq->prod));
 
queue_poll_init(smmu, &qp);
@@ -643,10 +643,10 @@ static int __arm_smmu_cmdq_poll_until_msi(struct arm_smmu_device *smmu,
  * Must be called with the cmdq lock held in some capacity.
  */
 static int __arm_smmu_cmdq_poll_until_consumed(struct arm_smmu_device *smmu,
+  struct arm_smmu_cmdq *cmdq,
   struct arm_smmu_ll_queue *llq)
 {
struct arm_smmu_queue_poll qp;
-   struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
u32 prod = llq->prod;
int ret = 0;
 
@@ -693,12 +693,13 @@ static int __arm_smmu_cmdq_poll_until_consumed(struct arm_smmu_device *smmu,
 }
 
 static int arm_smmu_cmdq_poll_until_sync(struct arm_smmu_device *smmu,
+struct arm_smmu_cmdq *cmdq,
 struct arm_smmu_ll_queue *llq)
 {
if (smmu->options & ARM_SMMU_OPT_MSIPOLL)
-   return __arm_smmu_cmdq_poll_until_msi(smmu, llq);
+   return __arm_smmu_cmdq_poll_until_msi(smmu, cmdq, llq);
 
-   return __arm_smmu_cmdq_poll_until_consumed(smmu, llq);
+   return __arm_smmu_cmdq_poll_until_consumed(smmu, cmdq, llq);
 }
 
 static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq, u64 *cmds,
@@ -755,7 +756,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 
while (!queue_has_space(&llq, n + sync)) {
local_irq_restore(flags);
-   if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq))
+   if (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
dev_err_ratelimited(smmu->dev, "CMDQ 
timeout\n");
local_irq_save(flags);
}
@@ -831,7 +832,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
/* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
if (sync) {
llq.prod = queue_inc_prod_n(&llq, n);
-   ret = arm_smmu_cmdq_poll_until_sync(smmu, &llq);
+   ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
if (ret) {
dev_err_ratelimited(smmu->dev,
"CMD_SYNC timeout at 0x%08x [hwprod 
0x%08x, hwcons 0x%08x]\n",
-- 
2.17.1



[RFC][PATCH v2 06/13] vfio/type1: Set/get VMID to/from iommu driver

2021-08-30 Thread Nicolin Chen via iommu
This patch adds a pair of callbacks, iommu_set_nesting_vmid() and
iommu_get_nesting_vmid(), to exchange the VMID with the IOMMU core
(and then an IOMMU driver).

As a VMID is generated in the IOMMU driver, which is called from the
vfio_iommu_attach_group() function call, add iommu_get_nesting_vmid()
right after that call creates a VMID, and add iommu_set_nesting_vmid()
before it, to let the IOMMU driver reuse an existing one.

Signed-off-by: Nicolin Chen 
---
 drivers/vfio/vfio_iommu_type1.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index bb5d949bc1af..9e72d74dedcd 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2322,12 +2322,24 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
ret = iommu_enable_nesting(domain->domain);
if (ret)
goto out_domain;
+
+   if (iommu->vmid != VFIO_IOMMU_VMID_INVALID) {
+   ret = iommu_set_nesting_vmid(domain->domain, iommu->vmid);
+   if (ret)
+   goto out_domain;
+   }
}
 
ret = vfio_iommu_attach_group(domain, group);
if (ret)
goto out_domain;
 
+   if (iommu->nesting && iommu->vmid == VFIO_IOMMU_VMID_INVALID) {
+   ret = iommu_get_nesting_vmid(domain->domain, &iommu->vmid);
+   if (ret)
+   goto out_domain;
+   }
+
/* Get aperture info */
geo = &domain->domain->geometry;
if (vfio_iommu_aper_conflict(iommu, geo->aperture_start,
-- 
2.17.1



[RFC][PATCH v2 11/13] iommu/arm-smmu-v3: Add implementation infrastructure

2021-08-30 Thread Nicolin Chen via iommu
From: Nate Watterson 

Follow the arm-smmu driver's infrastructure for handling
implementation-specific details outside the flow of architectural code.

Signed-off-by: Nate Watterson 
Signed-off-by: Nicolin Chen 
---
 drivers/iommu/arm/arm-smmu-v3/Makefile   | 2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c | 8 
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c  | 4 
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h  | 4 
 4 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c

diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index 54feb1ecccad..1f5838d3351b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
-arm_smmu_v3-objs-y += arm-smmu-v3.o
+arm_smmu_v3-objs-y += arm-smmu-v3.o arm-smmu-v3-impl.o
 arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c
new file mode 100644
index ..6947d28067a8
--- /dev/null
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "arm-smmu-v3.h"
+
+struct arm_smmu_device *arm_smmu_v3_impl_init(struct arm_smmu_device *smmu)
+{
+   return smmu;
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 216f3442aac4..510e1493fd5a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3844,6 +3844,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
return ret;
}
 
+   smmu = arm_smmu_v3_impl_init(smmu);
+   if (IS_ERR(smmu))
+   return PTR_ERR(smmu);
+
/* Set bypass mode according to firmware probing result */
bypass = !!ret;
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 20463d17fd9f..c65c39336916 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -810,4 +810,8 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
 
 static inline void arm_smmu_sva_notifier_synchronize(void) {}
 #endif /* CONFIG_ARM_SMMU_V3_SVA */
+
+/* Implementation details */
+struct arm_smmu_device *arm_smmu_v3_impl_init(struct arm_smmu_device *smmu);
+
 #endif /* _ARM_SMMU_V3_H */
-- 
2.17.1



[RFC][PATCH v2 08/13] iommu/arm-smmu-v3: Add VMID alloc/free helpers

2021-08-30 Thread Nicolin Chen via iommu
The NVIDIA implementation needs to link its Virtual Interface to a
VMID before a device gets attached to the corresponding iommu domain.
One way to ensure that is to allocate a VMID from the implementation
side and to pass it down to the virtual machine hypervisor, so that
the hypervisor can later set it back on the passthrough devices'
iommu domains by calling the newly added
arm_smmu_set/get_nesting_vmid() functions.

This patch adds a pair of helpers to allocate and free a VMID in
the bitmap.

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 ++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c0ae117711fa..497d55ec659b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2032,6 +2032,16 @@ static void arm_smmu_bitmap_free(unsigned long *map, int idx)
clear_bit(idx, map);
 }
 
+int arm_smmu_vmid_alloc(struct arm_smmu_device *smmu)
+{
+   return arm_smmu_bitmap_alloc(smmu->vmid_map, smmu->vmid_bits);
+}
+
+void arm_smmu_vmid_free(struct arm_smmu_device *smmu, u16 vmid)
+{
+   arm_smmu_bitmap_free(smmu->vmid_map, vmid);
+}
+
 static void arm_smmu_domain_free(struct iommu_domain *domain)
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index ea2c61d52df8..20463d17fd9f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -749,6 +749,9 @@ bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
 int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
unsigned long iova, size_t size);
 
+int arm_smmu_vmid_alloc(struct arm_smmu_device *smmu);
+void arm_smmu_vmid_free(struct arm_smmu_device *smmu, u16 vmid);
+
 #ifdef CONFIG_ARM_SMMU_V3_SVA
 bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
 bool arm_smmu_master_sva_supported(struct arm_smmu_master *master);
-- 
2.17.1



[RFC][PATCH v2 13/13] iommu/nvidia-smmu-v3: Add mdev interface support

2021-08-30 Thread Nicolin Chen via iommu
From: Nate Watterson 

This patch adds initial mdev interface support for the NVIDIA SMMU
CMDQV driver.

The NVIDIA SMMU CMDQV module has multiple virtual interfaces (VINTFs),
designed to be exposed to virtual machines running in user space,
while each VINTF can allocate dedicated VCMDQs for TLB invalidations.

The hypervisor can import one of these interfaces into a VM via the
VFIO mdev interface, to get access to the VINTF registers in the host
kernel.

Each VINTF has two pages of MMIO regions: PAGE0 and PAGE1. PAGE0 has
performance-sensitive registers such as CONS_INDX and PROD_INDX that
should be programmed by the guest directly, so the driver provides an
mmap implementation via the mdev interface to let user space access
PAGE0 directly. PAGE1 has two base-address configuration registers
whose addresses must be translated from guest PAs to host PAs, so
accesses to them are trapped and handled via mdev read()/write().

As the previous patch mentioned, VINTF0 is reserved for host kernel
(or hypervisor) use; a VINTFx (x > 0) should be allocated to a guest
VM. From the guest's perspective, the host VINTFx is seen as the
guest's VINTF0. Besides the two MMIO regions of VINTF0, the guest VM
also has the global configuration MMIO region, as the host kernel
does, and this global region is likewise handled via mdev
read()/write() to limit the guest to accessing only its own bits.

Additionally, there are a couple of requirements for this implementation:
1) Setting in the VINTF CONFIG register the same VMID as the SMMU's s2_cfg.
2) Before enabling the VINTF, programming up to 16 pairs of SID_REPLACE
   and SID_MATCH registers, which store the host's physical stream IDs
   and the guest's corresponding virtual stream IDs respectively.

And in this patch, we add a pair of ->attach_dev and ->detach_dev
callbacks and implement them in the following ways:
1) For each VINTF, pre-allocating a VMID in the arm-smmu-v3 driver's
   bitmap to create a link between the VINTF index and the VMID, so
   either of them can be quickly looked up from the other later.
2) Programming PHY_SID into the corresponding SID_REPLACE register,
   yet writing iommu_group_id (a fake VIRT_SID) into SID_MATCH, as it
   is the only piece of information about a passthrough device that
   is shared between the host kernel and the hypervisor. The
   hypervisor is then responsible for matching the iommu_group_id and
   replacing it with a virtual SID (see the sketch after this list).
3) Note that, by doing (1), the VMID is now created along with a VINTF
   in the nvidia_smmu_cmdqv_mdev_create() function, which is executed
   before a hypervisor or VM starts. This differs from the previous
   situation, where a few patches let the arm-smmu-v3 driver allocate
   a shared VMID in arm_smmu_attach_dev() when the first passthrough
   device was added to the VM. In the new situation, the shared VMID
   needs to be passed to the hypervisor before any passthrough device
   gets attached. So we reuse the VFIO_IOMMU_GET_VMID command via the
   mdev ioctl interface to pass the VMID to the CMDQV device model,
   then to the SMMUv3 device model, so that the hypervisor can set the
   same VMID on all IOMMU domains of passthrough devices using the
   previous pathway via the VFIO core back to the SMMUv3 driver.
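
For illustration, programming one SID_REPLACE/SID_MATCH pair as
described in 2) could look roughly like this (the register offsets and
the vintf structure are placeholders, not values from this patch;
writel() and iommu_group_id() are the real kernel APIs):

    /* Placeholder offsets -- the real layout comes from the CMDQV spec. */
    #define VINTF_SID_MATCH(s)      (0x40 + 0x8 * (s))
    #define VINTF_SID_REPLACE(s)    (0x44 + 0x8 * (s))

    static int nvidia_vintf_assign_sid(struct nvidia_vintf *vintf, int slot,
                                       u32 phys_sid, u32 group_id)
    {
        /* Up to 16 SID_REPLACE/SID_MATCH pairs are available per VINTF. */
        if (slot >= 16)
            return -ENOSPC;

        /* Host physical stream ID that the hardware substitutes in. */
        writel(phys_sid, vintf->base + VINTF_SID_REPLACE(slot));
        /* The iommu_group_id() value acts as the provisional "virtual"
         * SID that the hypervisor later matches and replaces with the
         * guest's actual stream ID.
         */
        writel(group_id, vintf->base + VINTF_SID_MATCH(slot));
        return 0;
    }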

Signed-off-by: Nate Watterson 
Signed-off-by: Nicolin Chen 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |   6 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   2 +
 .../iommu/arm/arm-smmu-v3/nvidia-smmu-v3.c| 817 ++
 3 files changed, 825 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 1b9459592f76..fc543181ddde 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2389,6 +2389,9 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master, struct device *dev)
if (!smmu_domain)
return;
 
+   if (master->smmu->impl && master->smmu->impl->detach_dev)
+   master->smmu->impl->detach_dev(smmu_domain, dev);
+
arm_smmu_disable_ats(master);
 
spin_lock_irqsave(&smmu_domain->devices_lock, flags);
@@ -2471,6 +2474,9 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
arm_smmu_enable_ats(master);
 
+   if (smmu->impl && smmu->impl->attach_dev)
+   ret = smmu->impl->attach_dev(smmu_domain, dev);
+
 out_unlock:
mutex_unlock(&smmu_domain->init_mutex);
return ret;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index bb903a7fa662..a872c0d2f23c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -817,6 +817,8 @@ static inline void arm_smmu_sva_notifier_synchronize(void) {}
 struct arm_smmu_impl {
int (*device_reset)(struct arm_smmu_device *smmu);
 struct arm_smmu_cmdq *(*get_cmdq)(struct arm_smmu_device *smmu, u64 *cmds, int n);
+   

[RFC][PATCH v2 03/13] vfio: Document VMID control for IOMMU Virtualization

2021-08-30 Thread Nicolin Chen via iommu
The VFIO API was enhanced to support VMID control with two
new ioctls to set and get the VMID between the kernel and the
virtual machine hypervisor. Update the document accordingly.

Signed-off-by: Nicolin Chen 
---
 Documentation/driver-api/vfio.rst | 34 +++
 1 file changed, 34 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index c663b6f97825..a76a17065cdd 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,40 @@ group and can access them as follows::
/* Gratuitous device reset and go... */
ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMU Virtual Machine Identifier (VMID)
+---------------------------------------
+In case of virtualization, each VM is assigned a Virtual Machine Identifier
+(VMID). This VMID is used to tag translation lookaside buffer (TLB) entries,
+to identify which VM each entry belongs to. This tagging allows translations
+for multiple different VMs to be present in the TLBs at the same time.
+
+The IOMMU Kernel driver is responsible for allocating a VMID. However, only
+a hypervisor knows what physical devices get assigned to the same VM. Thus,
+when the first physical device gets assigned to the VM, once the hypervisor
+finishes its IOCTL call of VFIO_SET_IOMMU, it should call the following:
+
+struct vm {
+   int iommu_type;
+   uint32_t vmid;  /* initial value is VFIO_IOMMU_VMID_INVALID */
+} vm0;
+
+   /* ... */
+   ioctl(container->fd, VFIO_SET_IOMMU, vm0.iommu_type);
+   /* ... */
+   if (vm0.vmid == VFIO_IOMMU_VMID_INVALID)
+   ioctl(container->fd, VFIO_IOMMU_GET_VMID, &vm0.vmid);
+
+This VMID would be the shared value, across the entire VM, between all the
+physical devices that are assigned to it. So, when other physical devices
+get assigned to the VM through new containers, once the hypervisor finishes
+its IOCTL call of VFIO_SET_IOMMU, it should call the following:
+
+   /* ... */
+   ioctl(container->fd, VFIO_SET_IOMMU, vm0.iommu_type);
+   /* ... */
+   if (vm0.vmid != VFIO_IOMMU_VMID_INVALID)
+   ioctl(container->fd, VFIO_IOMMU_SET_VMID, vm0.vmid);
+
 VFIO User API
 -------------
 
-- 
2.17.1



[RFC][PATCH v2 09/13] iommu/arm-smmu-v3: Pass dev pointer to arm_smmu_detach_dev

2021-08-30 Thread Nicolin Chen via iommu
We are adding an NVIDIA implementation that will need a ->detach_dev()
callback along with the dev pointer to grab client information.

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 497d55ec659b..6878a83582b9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2377,7 +2377,7 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
pci_disable_pasid(pdev);
 }
 
-static void arm_smmu_detach_dev(struct arm_smmu_master *master)
+static void arm_smmu_detach_dev(struct arm_smmu_master *master, struct device *dev)
 {
unsigned long flags;
struct arm_smmu_domain *smmu_domain = master->domain;
@@ -2421,7 +2421,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
return -EBUSY;
}
 
-   arm_smmu_detach_dev(master);
+   arm_smmu_detach_dev(master, dev);
 
mutex_lock(&smmu_domain->init_mutex);
 
@@ -2713,7 +2713,7 @@ static void arm_smmu_release_device(struct device *dev)
master = dev_iommu_priv_get(dev);
if (WARN_ON(arm_smmu_master_sva_enabled(master)))
iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
-   arm_smmu_detach_dev(master);
+   arm_smmu_detach_dev(master, dev);
arm_smmu_disable_pasid(master);
arm_smmu_remove_master(master);
kfree(master);
-- 
2.17.1



[RFC][PATCH v2 04/13] vfio: add set_vmid and get_vmid for vfio_iommu_type1

2021-08-30 Thread Nicolin Chen via iommu
A VMID is generated by the IOMMU driver, which is called from the
->attach_group() callback. So call ->get_vmid() right after that
callback creates a new VMID, and call ->set_vmid() before it, to let
the driver reuse an existing VMID.

Signed-off-by: Nicolin Chen 
---
 drivers/vfio/vfio.c  | 12 
 include/linux/vfio.h |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index c17b25c127a2..8b7442deca93 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1080,9 +1080,21 @@ static int __vfio_container_attach_groups(struct vfio_container *container,
int ret = -ENODEV;
 
list_for_each_entry(group, &container->group_list, container_next) {
+   if (driver->ops->set_vmid && container->vmid != VFIO_IOMMU_VMID_INVALID) {
+   ret = driver->ops->set_vmid(data, container->vmid);
+   if (ret)
+   goto unwind;
+   }
+
ret = driver->ops->attach_group(data, group->iommu_group);
if (ret)
goto unwind;
+
+   if (driver->ops->get_vmid && container->vmid == VFIO_IOMMU_VMID_INVALID) {
+   ret = driver->ops->get_vmid(data, &container->vmid);
+   if (ret)
+   goto unwind;
+   }
}
 
return ret;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index b53a9557884a..b43e7cbef4ab 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -126,6 +126,8 @@ struct vfio_iommu_driver_ops {
   struct iommu_group *group);
void(*notify)(void *iommu_data,
  enum vfio_iommu_notify_type event);
+   int (*set_vmid)(void *iommu_data, u32 vmid);
+   int (*get_vmid)(void *iommu_data, u32 *vmid);
 };
 
 extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
-- 
2.17.1



[RFC][PATCH v2 07/13] iommu/arm-smmu-v3: Add shared VMID support for NESTING

2021-08-30 Thread Nicolin Chen via iommu
A VMID can be shared among iommu domains attached to the same
virtual machine in order to improve utilization of the TLB cache.

This patch implements ->set_nesting_vmid() and ->get_nesting_vmid()
to set/get s2_cfg->vmid for nesting cases, and then changes the code
to reuse the VMID.

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 65 +++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 +
 2 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a388e318f86e..c0ae117711fa 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2051,7 +2051,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
mutex_unlock(&arm_smmu_asid_lock);
} else {
struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
-   if (cfg->vmid)
+   if (cfg->vmid && !atomic_dec_return(&smmu->vmid_refcnts[cfg->vmid]))
arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid);
}
 
@@ -2121,17 +2121,28 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
   struct arm_smmu_master *master,
   struct io_pgtable_cfg *pgtbl_cfg)
 {
-   int vmid;
struct arm_smmu_device *smmu = smmu_domain->smmu;
struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr;
 
-   vmid = arm_smmu_bitmap_alloc(smmu->vmid_map, smmu->vmid_bits);
-   if (vmid < 0)
-   return vmid;
+   /*
+* For a nested case where there are multiple passthrough devices to a
+* VM, they share a common VMID, allocated when the first passthrough
+* device is attached to the VM. So cfg->vmid might already be set
+* in arm_smmu_set_nesting_vmid(), reported from the hypervisor. In this
+* case, simply reuse the shared VMID and increase its refcount.
+*/
+   if (!cfg->vmid) {
+   int vmid = arm_smmu_bitmap_alloc(smmu->vmid_map, smmu->vmid_bits);
+
+   if (vmid < 0)
+   return vmid;
+   cfg->vmid = (u16)vmid;
+   }
+
+   atomic_inc(&smmu->vmid_refcnts[cfg->vmid]);
 
vtcr = &pgtbl_cfg->arm_lpae_s2_cfg.vtcr;
-   cfg->vmid   = (u16)vmid;
cfg->vttbr  = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
cfg->vtcr   = FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, vtcr->tsz) |
  FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, vtcr->sl) |
@@ -2731,6 +2742,44 @@ static int arm_smmu_enable_nesting(struct iommu_domain *domain)
return ret;
 }
 
+static int arm_smmu_set_nesting_vmid(struct iommu_domain *domain, u32 vmid)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
+   int ret = 0;
+
+   if (vmid == IOMMU_VMID_INVALID)
+   return -EINVAL;
+
+   mutex_lock(&smmu_domain->init_mutex);
+   if (smmu_domain->smmu || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   ret = -EPERM;
+   else
+   s2_cfg->vmid = vmid;
+   mutex_unlock(&smmu_domain->init_mutex);
+
+   return ret;
+}
+
+static int arm_smmu_get_nesting_vmid(struct iommu_domain *domain, u32 *vmid)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
+   int ret = 0;
+
+   if (!vmid)
+   return -EINVAL;
+
+   mutex_lock(&smmu_domain->init_mutex);
+   if (smmu_domain->smmu || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   ret = -EPERM;
+   else
+   *vmid = s2_cfg->vmid;
+   mutex_unlock(&smmu_domain->init_mutex);
+
+   return ret;
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2845,6 +2894,8 @@ static struct iommu_ops arm_smmu_ops = {
.release_device = arm_smmu_release_device,
.device_group   = arm_smmu_device_group,
.enable_nesting = arm_smmu_enable_nesting,
+   .set_nesting_vmid   = arm_smmu_set_nesting_vmid,
+   .get_nesting_vmid   = arm_smmu_get_nesting_vmid,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
@@ -3530,6 +3581,8 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
/* ASID/VMID sizes */
smmu->asid_bits = reg & IDR0_ASID16 ? 16 : 8;
smmu->vmid_bits = reg & IDR0_VMID16 ? 16 : 8;
+   smmu->vmid_refcnts = devm_kcalloc(smmu->dev, 1 << smmu->vmid_bits,
+  

[RFC][PATCH v2 01/13] iommu: Add set_nesting_vmid/get_nesting_vmid functions

2021-08-30 Thread Nicolin Chen via iommu
VMID stands for Virtual Machine Identifier, which is used to tag
TLB entries to indicate which VM they belong to. This is used by
some IOMMUs, such as SMMUv3, for the virtualization case in
nesting mode.

So this patch adds a pair of new iommu_ops callback functions,
with a pair of exported set/get functions, to allow the VFIO
core to access the VMID value in an IOMMU driver.
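
A minimal caller-side sketch of the intended pattern (dom_first and
dom_other are placeholders for two iommu domains of the same VM; the
flow matches the VFIO changes later in this series):

    u32 vmid = IOMMU_VMID_INVALID;
    int ret;

    /* After attaching the VM's first domain, the IOMMU driver has
     * allocated a VMID; read it back so it can be shared.
     */
    ret = iommu_get_nesting_vmid(dom_first, &vmid);

    /* Before attaching further domains of the same VM, install the
     * shared VMID so the driver reuses it instead of allocating one.
     */
    if (!ret && vmid != IOMMU_VMID_INVALID)
        ret = iommu_set_nesting_vmid(dom_other, vmid);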

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/iommu.c | 20 
 include/linux/iommu.h |  5 +
 2 files changed, 25 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3303d707bab4..051f2df36dc0 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2774,6 +2774,26 @@ int iommu_enable_nesting(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL_GPL(iommu_enable_nesting);
 
+int iommu_set_nesting_vmid(struct iommu_domain *domain, u32 vmid)
+{
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
+   if (!domain->ops->set_nesting_vmid)
+   return -EINVAL;
+   return domain->ops->set_nesting_vmid(domain, vmid);
+}
+EXPORT_SYMBOL_GPL(iommu_set_nesting_vmid);
+
+int iommu_get_nesting_vmid(struct iommu_domain *domain, u32 *vmid)
+{
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
+   if (!domain->ops->get_nesting_vmid)
+   return -EINVAL;
+   return domain->ops->get_nesting_vmid(domain, vmid);
+}
+EXPORT_SYMBOL_GPL(iommu_get_nesting_vmid);
+
 int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirk)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d2f3435e7d17..bda6b3450909 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -163,6 +163,7 @@ enum iommu_dev_features {
 };
 
 #define IOMMU_PASID_INVALID	(-1U)
+#define IOMMU_VMID_INVALID	(-1U)
 
 #ifdef CONFIG_IOMMU_API
 
@@ -269,6 +270,8 @@ struct iommu_ops {
void (*probe_finalize)(struct device *dev);
struct iommu_group *(*device_group)(struct device *dev);
int (*enable_nesting)(struct iommu_domain *domain);
+   int (*set_nesting_vmid)(struct iommu_domain *domain, u32 vmid);
+   int (*get_nesting_vmid)(struct iommu_domain *domain, u32 *vmid);
int (*set_pgtable_quirks)(struct iommu_domain *domain,
  unsigned long quirks);
 
@@ -500,6 +503,8 @@ extern int iommu_group_id(struct iommu_group *group);
 extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
 
 int iommu_enable_nesting(struct iommu_domain *domain);
+int iommu_set_nesting_vmid(struct iommu_domain *domain, u32 vmid);
+int iommu_get_nesting_vmid(struct iommu_domain *domain, u32 *vmid);
 int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks);
 
-- 
2.17.1



[RFC][PATCH v2 00/13] iommu/arm-smmu-v3: Add NVIDIA implementation

2021-08-30 Thread Nicolin Chen via iommu
The SMMUv3 devices implemented in the Grace SoC support NVIDIA's custom
CMDQ-Virtualization (CMDQV) hardware. Like the new ECMDQ feature first
introduced in the ARM SMMUv3.3 specification, CMDQV adds multiple VCMDQ
interfaces to supplement the single architected SMMU_CMDQ in an effort
to reduce contention.

This series of patches adds CMDQV support along with its preparatory changes:

* PATCH-1 to PATCH-8 are related to the shared VMID feature: it is used
  first to improve TLB utilization, and second to bind a shared VMID to a
  VCMDQ interface, which is a hardware configuration requirement.

* PATCH-9 and PATCH-10 are to accommodate the NVIDIA implementation with
  the existing arm-smmu-v3 driver.

* PATCH-11 borrows the "implementation infrastructure" from the arm-smmu
  driver so later changes can build upon it.

* PATCH-12 adds an initial NVIDIA implementation related to host features,
  and also adds the implementation-specific ->device_reset() and
  ->get_cmdq() callback functions.

* PATCH-13 adds virtualization features using the VFIO mdev interface,
  which allows a user-space hypervisor to map and get access to one of the
  VCMDQ interfaces of the CMDQV module.

( So that reviewers can get a better view of this implementation,
  I am attaching the QEMU changes here for reference purposes:
  https://github.com/nicolinc/qemu/commits/dev/cmdqv_v6.0.0-rc2
  The branch has all the preparational changes, while I'm still integrating
  the device model and ARM-VIRT changes and will push them in the next
  couple of days, though they might not yet be in good shape for review )

Above all, I marked this series as RFC, as I feel that we may come up
with a better solution. So please kindly share your reviews and insights.

Thank you!

Changelog
v1->v2:
 * Added mdev interface support for hypervisor and VMs.
 * Added preparational changes for mdev interface implementation.
 * PATCH-12 Changed ->issue_cmdlist() to ->get_cmdq() for a better
   integration with recently merged ECMDQ-related changes.

Nate Watterson (3):
  iommu/arm-smmu-v3: Add implementation infrastructure
  iommu/arm-smmu-v3: Add support for NVIDIA CMDQ-Virtualization hw
  iommu/nvidia-smmu-v3: Add mdev interface support

Nicolin Chen (10):
  iommu: Add set_nesting_vmid/get_nesting_vmid functions
  vfio: add VFIO_IOMMU_GET_VMID and VFIO_IOMMU_SET_VMID
  vfio: Document VMID control for IOMMU Virtualization
  vfio: add set_vmid and get_vmid for vfio_iommu_type1
  vfio/type1: Implement set_vmid and get_vmid
  vfio/type1: Set/get VMID to/from iommu driver
  iommu/arm-smmu-v3: Add shared VMID support for NESTING
  iommu/arm-smmu-v3: Add VMID alloc/free helpers
  iommu/arm-smmu-v3: Pass dev pointer to arm_smmu_detach_dev
  iommu/arm-smmu-v3: Pass cmdq pointer in arm_smmu_cmdq_issue_cmdlist()

 Documentation/driver-api/vfio.rst |   34 +
 MAINTAINERS   |2 +
 drivers/iommu/arm/arm-smmu-v3/Makefile|2 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c  |   15 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  121 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   18 +
 .../iommu/arm/arm-smmu-v3/nvidia-smmu-v3.c| 1249 +
 drivers/iommu/iommu.c |   20 +
 drivers/vfio/vfio.c   |   25 +
 drivers/vfio/vfio_iommu_type1.c   |   37 +
 include/linux/iommu.h |5 +
 include/linux/vfio.h  |2 +
 include/uapi/linux/vfio.h |   26 +
 13 files changed, 1537 insertions(+), 19 deletions(-)
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/nvidia-smmu-v3.c

-- 
2.17.1



[RFC][PATCH v2 05/13] vfio/type1: Implement set_vmid and get_vmid

2021-08-30 Thread Nicolin Chen via iommu
Now we have a pair of ->set_vmid() and ->get_vmid() function
pointers. This patch implements them to exchange the VMID value
between the vfio container and vfio_iommu_type1.

Signed-off-by: Nicolin Chen 
---
 drivers/vfio/vfio_iommu_type1.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 0e9217687f5c..bb5d949bc1af 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -74,6 +74,7 @@ struct vfio_iommu {
uint64_tpgsize_bitmap;
uint64_tnum_non_pinned_groups;
wait_queue_head_t   vaddr_wait;
+   uint32_tvmid;
boolv2;
boolnesting;
booldirty_page_tracking;
@@ -2674,6 +2675,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
iommu->dma_list = RB_ROOT;
iommu->dma_avail = dma_entry_limit;
iommu->container_open = true;
+   iommu->vmid = VFIO_IOMMU_VMID_INVALID;
mutex_init(&iommu->lock);
BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
init_waitqueue_head(&iommu->vaddr_wait);
@@ -3255,6 +3257,27 @@ static void vfio_iommu_type1_notify(void *iommu_data,
wake_up_all(&iommu->vaddr_wait);
 }
 
+static int vfio_iommu_type1_get_vmid(void *iommu_data, u32 *vmid)
+{
+   struct vfio_iommu *iommu = iommu_data;
+
+   *vmid = iommu->vmid;
+
+   return 0;
+}
+
+static int vfio_iommu_type1_set_vmid(void *iommu_data, u32 vmid)
+{
+   struct vfio_iommu *iommu = iommu_data;
+
+   if (vmid == VFIO_IOMMU_VMID_INVALID)
+   return -EINVAL;
+
+   iommu->vmid = vmid;
+
+   return 0;
+}
+
 static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
.name   = "vfio-iommu-type1",
.owner  = THIS_MODULE,
@@ -3270,6 +3293,8 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
.dma_rw = vfio_iommu_type1_dma_rw,
.group_iommu_domain = vfio_iommu_type1_group_iommu_domain,
.notify = vfio_iommu_type1_notify,
+   .set_vmid   = vfio_iommu_type1_set_vmid,
+   .get_vmid   = vfio_iommu_type1_get_vmid,
 };
 
 static int __init vfio_iommu_type1_init(void)
-- 
2.17.1



[RFC][PATCH v2 02/13] vfio: add VFIO_IOMMU_GET_VMID and VFIO_IOMMU_SET_VMID

2021-08-30 Thread Nicolin Chen via iommu
This patch adds a pair of new ioctl commands for communicating with
user space (the virtual machine hypervisor) to get and set the VMID,
a Virtual Machine Identifier used by some IOMMUs to tag TLB entries.
Similar to the CPU MMU, this VMID number allows the IOMMU to
invalidate the TLBs of the same VM at the same time.
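
For illustration, the expected user-space calling convention looks
like below (note that GET returns the value through a pointer argument
while SET passes it by value, matching the handlers in the diff; the
container fds are assumed to be open VFIO containers of the same VM):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    uint32_t vmid = VFIO_IOMMU_VMID_INVALID;

    /* First container of the VM: read back the kernel-allocated VMID. */
    ioctl(container_fd, VFIO_IOMMU_GET_VMID, &vmid);

    /* Further containers of the same VM: install the shared VMID. */
    if (vmid != VFIO_IOMMU_VMID_INVALID)
        ioctl(other_container_fd, VFIO_IOMMU_SET_VMID, vmid);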

Signed-off-by: Nicolin Chen 
---
 drivers/vfio/vfio.c   | 13 +
 include/uapi/linux/vfio.h | 26 ++
 2 files changed, 39 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 3c034fe14ccb..c17b25c127a2 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -59,6 +59,7 @@ struct vfio_container {
struct rw_semaphore group_lock;
struct vfio_iommu_driver*iommu_driver;
void*iommu_data;
+   u32 vmid;
boolnoiommu;
 };
 
@@ -1190,6 +1191,16 @@ static long vfio_fops_unl_ioctl(struct file *filep,
case VFIO_SET_IOMMU:
ret = vfio_ioctl_set_iommu(container, arg);
break;
+   case VFIO_IOMMU_GET_VMID:
+   ret = copy_to_user((void __user *)arg, &container->vmid,
+  sizeof(u32)) ? -EFAULT : 0;
+   break;
+   case VFIO_IOMMU_SET_VMID:
+   if ((u32)arg == VFIO_IOMMU_VMID_INVALID)
+   return -EINVAL;
+   container->vmid = (u32)arg;
+   ret = 0;
+   break;
default:
driver = container->iommu_driver;
data = container->iommu_data;
@@ -1213,6 +1224,8 @@ static int vfio_fops_open(struct inode *inode, struct file *filep)
init_rwsem(&container->group_lock);
kref_init(&container->kref);
 
+   container->vmid = VFIO_IOMMU_VMID_INVALID;
+
filep->private_data = container;
 
return 0;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index ef33ea002b0b..58c5fa6aaca6 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1216,6 +1216,32 @@ struct vfio_iommu_type1_dirty_bitmap_get {
 
 #define VFIO_IOMMU_DIRTY_PAGES _IO(VFIO_TYPE, VFIO_BASE + 17)
 
+/**
+ * VFIO_IOMMU_GET_VMID - _IO(VFIO_TYPE, VFIO_BASE + 22), takes a __u32 *vmid
+ * VFIO_IOMMU_SET_VMID - _IO(VFIO_TYPE, VFIO_BASE + 23), takes a __u32 vmid
+ *
+ * These IOCTLs are used for VMID alignment between the kernel and the user
+ * space hypervisor. In a virtualization use case, a guest owns the first
+ * stage translation and the hypervisor owns the second stage translation.
+ * A VMID is a Virtual Machine Identifier used to tag the TLB entries of a
+ * VM. If a VM has multiple physical devices assigned to it, while these
+ * devices are under different IOMMU domains, the VMIDs in the second stage
+ * configurations of these IOMMU domains can be aligned to a unified VMID
+ * value by using these two IOCTLs.
+ *
+ * The caller should get the VMID when the first physical device is assigned
+ * to the VM.
+ *
+ * The caller should then set the VMID to share the same VMID value with the
+ * other physical devices being assigned to the same VM.
+ *
+ */
+#define VFIO_IOMMU_VMID_INVALID	(-1U)
+
+#define VFIO_IOMMU_GET_VMID	_IO(VFIO_TYPE, VFIO_BASE + 22)
+
+#define VFIO_IOMMU_SET_VMID	_IO(VFIO_TYPE, VFIO_BASE + 23)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
2.17.1



Re: [PATCH v4 14/14] tpm: Allow locality 2 to be set when initializing the TPM for Secure Launch

2021-08-30 Thread Daniel P. Smith
On 8/27/21 9:30 AM, Jason Gunthorpe wrote:
> On Fri, Aug 27, 2021 at 09:28:37AM -0400, Ross Philipson wrote:
>> The Secure Launch MLE environment uses PCRs that are only accessible from
>> the DRTM locality 2. By default the TPM drivers always initialize the
>> locality to 0. When a Secure Launch is in progress, initialize the
>> locality to 2.
>>
>> Signed-off-by: Ross Philipson 
>> ---
>>  drivers/char/tpm/tpm-chip.c | 9 -
>>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> Global state like this seems quite dangerous, shouldn't the locality
> be selected based on the PCR being accessed, or passed down from
> higher up in the call chain?
> 
> Jason
> 

Hey Jason,

While locality does control which PCRs are accessible, it is meant to
reflect the component from which a TPM command originates. To quote the
TCG spec, "“Locality” is an assertion to the TPM that a command’s source
is associated with a particular component. Locality can be thought of as
a hardware-based authorization."

Thus when the SRTM chain, including the Static OS, is in control, the
hardware is required to restrict access to Locality 0 only. Once a
Dynamic Launch has occurred, the hardware grants access to Localities 1
and 2. Again referring to the TCG spec, the definition of Locality 2 is
the "Dynamically Launched OS (Dynamic OS) “runtime” environment".

When Linux is started from the SRTM, it is correct for the TPM driver to
set the locality to Locality 0 to denote that the source is the Static
OS. Now when Linux is started from the DRTM, the TPM driver should set
the locality to Locality 2 to denote that it is the "runtime" Dynamic
OS. As for the differences in access, with Locality 0 Linux (being the
Static OS) is restricted to the Static PCRs (0-15), the Debug PCR (16),
and the Application PCR (23). Whereas with Locality 2 Linux now being
the Dynamic OS will have access to all PCRs.

To summarize, TPM locality really is a global state that reflects the
component in control of the platform. Considering the definition of
locality, setting the locality to Locality 0 is saying that the Static
Environment is now in control. Doing so after the Dynamic Environment
has started would be an inaccurate setting of the platform state. The
correct time at which the locality should change back to Locality 0 is
after the Dynamic Environment has been exited. On Intel this can be done
by invoking the instruction GETSEC[SEXIT]. It should be noted that
Secure Launch adds the call to the GETSEC[SEXIT] instruction in the
kexec, reboot, and shutdown paths.

v/r,
dps

Re: [PATCH v7 0/7] Fixes for dma-iommu swiotlb bounce buffers

2021-08-30 Thread Rajat Jain via iommu
I'm wondering why I don't see v7 of these patches on patchwork (i.e. at
https://lore.kernel.org/patchwork/project/lkml/list/?series=&submitter=27643&state=&q=&archive=&delegate=)?

On Sun, Aug 29, 2021 at 10:00 PM David Stevens  wrote:
>
> This patch set includes various fixes for dma-iommu's swiotlb bounce
> buffers for untrusted devices.
>
> The min_align_mask issue was found when running fio on an untrusted nvme
> device with bs=512. The other issues were found via code inspection, so
> I don't have any specific use cases where things were not working, nor
> any concrete performance numbers.
>
> There are two issues related to min_align_mask that this patch series
> does not attempt to fix. First, it does not address the case where
> min_align_mask is larger than the IOVA granule. Doing so requires
> changes to IOVA allocation, and is not specific to when swiotlb bounce
> buffers are used. This is not a problem in practice today, since the
> only driver which uses min_align_mask is nvme, which sets it to 4096.
>
> The second issue this series does not address is the fact that extra
> swiotlb slots adjacent to a bounce buffer can be exposed to untrusted
> devices whose drivers use min_align_mask. Fixing this requires being
> able to allocate padding slots at the beginning of a swiotlb allocation.
> This is a rather significant change that I am not comfortable making.
> Without being able to handle this, there is also little point to
> clearing the padding at the start of such a buffer, since we can only
> clear based on (IO_TLB_SIZE - 1) instead of iova_mask.
>
> v6 -> v7:
>  - Remove unsafe attempt to clear padding at start of swiotlb buffer
>  - Rewrite commit message for min_align_mask commit to better explain
>the problem it's fixing
>  - Rebase on iommu/core
>  - Acknowledge unsolved issues in cover letter
>
> v5 -> v6:
>  - Remove unnecessary line break
>  - Remove redundant config check
>
> v4 -> v5:
>  - Fix xen build error
>  - Move _swiotlb refactor into its own patch
>
> v3 -> v4:
>  - Fold _swiotlb functions into _page functions
>  - Add patch to align swiotlb buffer to iovad granule
>  - Combine if checks in iommu_dma_sync_sg_* functions
>
> v2 -> v3:
>  - Add new patch to address min_align_mask bug
>  - Set SKIP_CPU_SYNC flag after syncing in map/unmap
>  - Properly call arch_sync_dma_for_cpu in iommu_dma_sync_sg_for_cpu
>
> v1 -> v2:
>  - Split fixes into dedicated patches
>  - Less invasive changes to fix arch_sync when mapping
>  - Leave dev_is_untrusted check for strict iommu
>
> David Stevens (7):
>   dma-iommu: fix sync_sg with swiotlb
>   dma-iommu: fix arch_sync_dma for map
>   dma-iommu: skip extra sync during unmap w/swiotlb
>   dma-iommu: fold _swiotlb helpers into callers
>   dma-iommu: Check CONFIG_SWIOTLB more broadly
>   swiotlb: support aligned swiotlb buffers
>   dma-iommu: account for min_align_mask w/swiotlb
>
>  drivers/iommu/dma-iommu.c | 188 +-
>  drivers/xen/swiotlb-xen.c |   2 +-
>  include/linux/swiotlb.h   |   3 +-
>  kernel/dma/swiotlb.c  |  11 ++-
>  4 files changed, 93 insertions(+), 111 deletions(-)
>
> --
> 2.33.0.259.gc128427fd7-goog
>


[PATCH v12 13/13] Documentation: Add documentation for VDUSE

2021-08-30 Thread Xie Yongji
VDUSE (vDPA Device in Userspace) is a framework to support
implementing software-emulated vDPA devices in userspace. This
document is intended to clarify the VDUSE design and usage.

Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 Documentation/userspace-api/index.rst |   1 +
 Documentation/userspace-api/vduse.rst | 233 ++
 2 files changed, 234 insertions(+)
 create mode 100644 Documentation/userspace-api/vduse.rst

diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 0b5eefed027e..c432be070f67 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -27,6 +27,7 @@ place where this information is gathered.
iommu
media/index
sysfs-platform_profile
+   vduse
 
 .. only::  subproject and html
 
diff --git a/Documentation/userspace-api/vduse.rst b/Documentation/userspace-api/vduse.rst
new file mode 100644
index ..42ef59ea5314
--- /dev/null
+++ b/Documentation/userspace-api/vduse.rst
@@ -0,0 +1,233 @@
+==================================
+VDUSE - "vDPA Device in Userspace"
+==================================
+
+A vDPA (virtio data path acceleration) device is a device that uses a
+datapath which complies with the virtio specifications and has a vendor
+specific control path. vDPA devices can be either physically located on
+the hardware or emulated by software. VDUSE is a framework that makes it
+possible to implement software-emulated vDPA devices in userspace. And
+to make the device emulation more secure, the emulated vDPA device's
+control path is handled in the kernel and only the data path is
+implemented in userspace.
+
+Note that only virtio block device is supported by VDUSE framework now,
+which can reduce security risks when the userspace process that implements
+the data path is run by an unprivileged user. The support for other device
+types can be added after the security issue of corresponding device driver
+is clarified or fixed in the future.
+
+Create/Destroy VDUSE devices
+----------------------------
+
+VDUSE devices are created as follows:
+
+1. Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
+   /dev/vduse/control.
+
+2. Setup each virtqueue with ioctl(VDUSE_VQ_SETUP) on /dev/vduse/$NAME.
+
+3. Begin processing VDUSE messages from /dev/vduse/$NAME. The first
+   messages will arrive while attaching the VDUSE instance to the vDPA bus.
+
+4. Send the VDPA_CMD_DEV_NEW netlink message to attach the VDUSE
+   instance to the vDPA bus.
+
+VDUSE devices are destroyed as follows:
+
+1. Send the VDPA_CMD_DEV_DEL netlink message to detach the VDUSE
+   instance from the vDPA bus.
+
+2. Close the file descriptor referring to /dev/vduse/$NAME.
+
+3. Destroy the VDUSE instance with ioctl(VDUSE_DESTROY_DEV) on
+   /dev/vduse/control.
+
+The netlink messages can be sent via vdpa tool in iproute2 or use the
+below sample codes:
+
+.. code-block:: c
+
+   static int netlink_add_vduse(const char *name, enum vdpa_command cmd)
+   {
+   struct nl_sock *nlsock;
+   struct nl_msg *msg;
+   int famid;
+
+   nlsock = nl_socket_alloc();
+   if (!nlsock)
+   return -ENOMEM;
+
+   if (genl_connect(nlsock))
+   goto free_sock;
+
+   famid = genl_ctrl_resolve(nlsock, VDPA_GENL_NAME);
+   if (famid < 0)
+   goto close_sock;
+
+   msg = nlmsg_alloc();
+   if (!msg)
+   goto close_sock;
+
+   if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, famid, 0, 0, cmd, 0))
+   goto nla_put_failure;
+
+   NLA_PUT_STRING(msg, VDPA_ATTR_DEV_NAME, name);
+   if (cmd == VDPA_CMD_DEV_NEW)
+   NLA_PUT_STRING(msg, VDPA_ATTR_MGMTDEV_DEV_NAME, "vduse");
+
+   if (nl_send_sync(nlsock, msg))
+   goto close_sock;
+
+   nl_close(nlsock);
+   nl_socket_free(nlsock);
+
+   return 0;
+   nla_put_failure:
+   nlmsg_free(msg);
+   close_sock:
+   nl_close(nlsock);
+   free_sock:
+   nl_socket_free(nlsock);
+   return -1;
+   }
+
+How VDUSE works
+---------------
+
+As mentioned above, a VDUSE device is created by ioctl(VDUSE_CREATE_DEV)
+on /dev/vduse/control. With this ioctl, userspace can specify some basic
+configuration for this emulated device, such as the device name (which
+uniquely identifies a VDUSE device), virtio features, the virtio
+configuration space, the number of virtqueues and so on. Then a char
+device interface (/dev/vduse/$NAME) is exported to userspace for device
+emulation. Userspace can use the VDUSE_VQ_SETUP ioctl on
+/dev/vduse/$NAME to add per-virtqueue configuration, such as the max
+size of the virtqueue, to the device.
+
+After the initialization, the VDUSE device can be attached to the vDPA bus via

[PATCH v12 12/13] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-08-30 Thread Xie Yongji
This VDUSE driver enables implementing software-emulated vDPA
devices in userspace. The vDPA device is created by
ioctl(VDUSE_CREATE_DEV) on /dev/vduse/control. Then a char device
interface (/dev/vduse/$NAME) is exported to userspace for device
emulation.

In order to make the device emulation more secure, the device's
control path is handled in the kernel. A message mechanism is
introduced to forward some dataplane-related control messages to
userspace.

In the data path, the DMA buffer is mapped into the userspace
address space in different ways depending on the vDPA bus to
which the vDPA device is attached. In the virtio-vdpa case, the
MMU-based software IOTLB is used to achieve that. In the vhost-vdpa
case, the DMA buffer resides in a userspace memory region which can
be shared with the VDUSE userspace process by transferring the shmfd.

For more details on VDUSE design and usage, please see the follow-on
Documentation commit.

Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
 drivers/vdpa/Kconfig   |   10 +
 drivers/vdpa/Makefile  |1 +
 drivers/vdpa/vdpa_user/Makefile|5 +
 drivers/vdpa/vdpa_user/vduse_dev.c | 1641 
 include/uapi/linux/vduse.h |  306 
 6 files changed, 1964 insertions(+)
 create mode 100644 drivers/vdpa/vdpa_user/Makefile
 create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
 create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 1409e40e6345..293ca3aef358 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -300,6 +300,7 @@ Code  Seq#Include File  
 Comments
 'z'   10-4F  drivers/s390/crypto/zcrypt_api.hconflict!
 '|'   00-7F  linux/media.h
 0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
 0x89  00-06  arch/x86/include/asm/sockios.h
 0x89  0B-DF  linux/sockios.h
 0x89  E0-EF  linux/sockios.h 
SIOCPROTOPRIVATE range
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index e48e2b10ca36..3d91982d8371 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
  vDPA block device simulator which terminates IO request in a
  memory buffer.
 
+config VDPA_USER
+   tristate "VDUSE (vDPA Device in Userspace) support"
+   depends on EVENTFD && MMU && HAS_DMA
+   select DMA_OPS
+   select VHOST_IOTLB
+   select IOMMU_IOVA
+   help
+ With VDUSE it is possible to emulate a vDPA Device
+ in a userspace program.
+
 config IFCVF
tristate "Intel IFC VF vDPA driver"
depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 67fe7f3d6943..f02ebed33f19 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_VDPA) += vdpa.o
 obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
 obj-$(CONFIG_IFCVF)+= ifcvf/
 obj-$(CONFIG_MLX5_VDPA) += mlx5/
 obj-$(CONFIG_VP_VDPA)+= virtio_pci/
diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
new file mode 100644
index ..260e0b26af99
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+vduse-y := vduse_dev.o iova_domain.o
+
+obj-$(CONFIG_VDPA_USER) += vduse.o
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
new file mode 100644
index ..59a93e5b967a
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -0,0 +1,1641 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+#define DRV_AUTHOR   "Yongji Xie "
+#define DRV_DESC "vDPA Device in Userspace"
+#define DRV_LICENSE  "GPL v2"
+
+#define VDUSE_DEV_MAX (1U << MINORBITS)
+#define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024)
+#define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
+#define VDUSE_MSG_DEFAULT_TIMEOUT 30
+
+struct vduse_virtqueue {
+   u16 index;
+   u16 num_max;
+   u32 num;
+   u64 desc_addr;
+   u64 driver_addr;
+   u64 device_addr;
+   struct vdpa_vq_state state;
+   bool ready;
+   bool kicked;
+   spinlock_t kick_lock;
+   spinlock_t irq_lock;
+   struct eventfd_ctx *kickfd;
+   str

[PATCH v12 11/13] vduse: Implement an MMU-based software IOTLB

2021-08-30 Thread Xie Yongji
This implements an MMU-based software IOTLB to support mapping
kernel dma buffer into userspace dynamically. The basic idea
behind it is treating MMU (VA->PA) as IOMMU (IOVA->PA). The
software IOTLB will set up MMU mapping instead of IOMMU mapping
for the DMA transfer so that the userspace process is able to
use its virtual address to access the dma buffer in kernel.

To avoid security issues, a bounce-buffering mechanism is
introduced to prevent userspace from accessing the original
buffer directly, which may contain other kernel data. During
mapping and unmapping, the software IOTLB copies the data between
the original buffer and the bounce buffer, depending on the
direction of the transfer. And the bounce-buffer addresses are
mapped into the user address space instead of the original ones.
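For intuition, the copy direction follows the usual bounce-buffer
(swiotlb-style) pattern. A minimal sketch, with an illustrative helper
name rather than the actual functions in iova_domain.c:

    /* Copy between the original kernel buffer and the bounce page,
     * depending on the DMA direction and whether we are mapping. */
    static void bounce_sync(void *orig, void *bounce, size_t len,
                            enum dma_data_direction dir, bool map)
    {
            if (map && dir != DMA_FROM_DEVICE)
                    memcpy(bounce, orig, len);      /* CPU -> device */
            else if (!map && dir != DMA_TO_DEVICE)
                    memcpy(orig, bounce, len);      /* device -> CPU */
    }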

Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vdpa/vdpa_user/iova_domain.c | 545 +++
 drivers/vdpa/vdpa_user/iova_domain.h |  73 +
 2 files changed, 618 insertions(+)
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h

diff --git a/drivers/vdpa/vdpa_user/iova_domain.c 
b/drivers/vdpa/vdpa_user/iova_domain.c
new file mode 100644
index ..1daae2608860
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -0,0 +1,545 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * MMU-based software IOTLB.
+ *
+ * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+static int vduse_iotlb_add_range(struct vduse_iova_domain *domain,
+u64 start, u64 last,
+u64 addr, unsigned int perm,
+struct file *file, u64 offset)
+{
+   struct vdpa_map_file *map_file;
+   int ret;
+
+   map_file = kmalloc(sizeof(*map_file), GFP_ATOMIC);
+   if (!map_file)
+   return -ENOMEM;
+
+   map_file->file = get_file(file);
+   map_file->offset = offset;
+
+   ret = vhost_iotlb_add_range_ctx(domain->iotlb, start, last,
+   addr, perm, map_file);
+   if (ret) {
+   fput(map_file->file);
+   kfree(map_file);
+   return ret;
+   }
+   return 0;
+}
+
+static void vduse_iotlb_del_range(struct vduse_iova_domain *domain,
+ u64 start, u64 last)
+{
+   struct vdpa_map_file *map_file;
+   struct vhost_iotlb_map *map;
+
+   while ((map = vhost_iotlb_itree_first(domain->iotlb, start, last))) {
+   map_file = (struct vdpa_map_file *)map->opaque;
+   fput(map_file->file);
+   kfree(map_file);
+   vhost_iotlb_map_free(domain->iotlb, map);
+   }
+}
+
+int vduse_domain_set_map(struct vduse_iova_domain *domain,
+struct vhost_iotlb *iotlb)
+{
+   struct vdpa_map_file *map_file;
+   struct vhost_iotlb_map *map;
+   u64 start = 0ULL, last = ULLONG_MAX;
+   int ret;
+
+   spin_lock(&domain->iotlb_lock);
+   vduse_iotlb_del_range(domain, start, last);
+
+   for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
+map = vhost_iotlb_itree_next(map, start, last)) {
+   map_file = (struct vdpa_map_file *)map->opaque;
+   ret = vduse_iotlb_add_range(domain, map->start, map->last,
+   map->addr, map->perm,
+   map_file->file,
+   map_file->offset);
+   if (ret)
+   goto err;
+   }
+   spin_unlock(&domain->iotlb_lock);
+
+   return 0;
+err:
+   vduse_iotlb_del_range(domain, start, last);
+   spin_unlock(&domain->iotlb_lock);
+   return ret;
+}
+
+void vduse_domain_clear_map(struct vduse_iova_domain *domain,
+   struct vhost_iotlb *iotlb)
+{
+   struct vhost_iotlb_map *map;
+   u64 start = 0ULL, last = ULLONG_MAX;
+
+   spin_lock(&domain->iotlb_lock);
+   for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
+map = vhost_iotlb_itree_next(map, start, last)) {
+   vduse_iotlb_del_range(domain, map->start, map->last);
+   }
+   spin_unlock(&domain->iotlb_lock);
+}
+
+static int vduse_domain_map_bounce_page(struct vduse_iova_domain *domain,
+u64 iova, u64 size, u64 paddr)
+{
+   struct vduse_bounce_map *map;
+   u64 last = iova + size - 1;
+
+   while (iova <= last) {
+   map = &domain->bounce_maps[iova >> PAGE_SHIFT];
+   if (!map->bounce_page) {
+   map->bounce_page = alloc_page(GFP_ATOMIC);
+   if (!map->bounce_page)
+

[PATCH v12 10/13] vdpa: Support transferring virtual addressing during DMA mapping

2021-08-30 Thread Xie Yongji
This patch introduces an attribute for a vDPA device to indicate
whether virtual addresses can be used. If the vDPA device driver
sets it, the vhost-vdpa bus driver will not pin user pages and
will transfer the userspace virtual address instead of the
physical address during DMA mapping. The corresponding
vma->vm_file and offset will also be passed as an opaque pointer.
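As a sketch of the driver-facing side (the names below follow the
signatures used elsewhere in this series; error handling omitted):

    /* Opt in to userspace VAs: the last argument is the new use_va flag. */
    dev = vdpa_alloc_device(struct vduse_dev, vdpa, NULL,
                            &vduse_vdpa_config_ops, name, true);

    /* In .dma_map()/.set_map(), map->addr is then a userspace VA, and
     * map->opaque carries a struct vdpa_map_file (vma->vm_file + offset). */
    struct vdpa_map_file *map_file = map->opaque;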

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vdpa/ifcvf/ifcvf_main.c   |  2 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c |  2 +-
 drivers/vdpa/vdpa.c   |  9 +++-
 drivers/vdpa/vdpa_sim/vdpa_sim.c  |  2 +-
 drivers/vdpa/virtio_pci/vp_vdpa.c |  2 +-
 drivers/vhost/vdpa.c  | 99 ++-
 include/linux/vdpa.h  | 20 ++--
 7 files changed, 117 insertions(+), 19 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 6708671a0603..358f3e2607da 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -515,7 +515,7 @@ static int ifcvf_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, 
const char *name)
pdev = ifcvf_mgmt_dev->pdev;
dev = &pdev->dev;
adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
-   dev, &ifc_vdpa_ops, name);
+   dev, &ifc_vdpa_ops, name, false);
if (IS_ERR(adapter)) {
IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
return PTR_ERR(adapter);
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 608f6b900cd9..08f39952fa6a 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2425,7 +2425,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name)
max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
 
ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, 
mdev->device, &mlx5_vdpa_ops,
-name);
+name, false);
if (IS_ERR(ndev))
return PTR_ERR(ndev);
 
diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index bb3f1d1f0422..8f01d6a7ecc5 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -71,6 +71,7 @@ static void vdpa_release_dev(struct device *d)
  * @config: the bus operations that is supported by this device
  * @size: size of the parent structure that contains private data
  * @name: name of the vdpa device; optional.
+ * @use_va: indicate whether virtual address must be used by this device
  *
  * Driver should use vdpa_alloc_device() wrapper macro instead of
  * using this directly.
@@ -80,7 +81,8 @@ static void vdpa_release_dev(struct device *d)
  */
 struct vdpa_device *__vdpa_alloc_device(struct device *parent,
const struct vdpa_config_ops *config,
-   size_t size, const char *name)
+   size_t size, const char *name,
+   bool use_va)
 {
struct vdpa_device *vdev;
int err = -EINVAL;
@@ -91,6 +93,10 @@ struct vdpa_device *__vdpa_alloc_device(struct device 
*parent,
if (!!config->dma_map != !!config->dma_unmap)
goto err;
 
+   /* It should only work for devices that use an on-chip IOMMU */
+   if (use_va && !(config->dma_map || config->set_map))
+   goto err;
+
err = -ENOMEM;
vdev = kzalloc(size, GFP_KERNEL);
if (!vdev)
@@ -106,6 +112,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device 
*parent,
vdev->index = err;
vdev->config = config;
vdev->features_valid = false;
+   vdev->use_va = use_va;
 
if (name)
err = dev_set_name(&vdev->dev, "%s", name);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a70fd2a08ff1..5f484fff8dbe 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -251,7 +251,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr 
*dev_attr)
ops = &vdpasim_config_ops;
 
vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops,
-   dev_attr->name);
+   dev_attr->name, false);
if (IS_ERR(vdpasim)) {
ret = PTR_ERR(vdpasim);
goto err_alloc;
diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c 
b/drivers/vdpa/virtio_pci/vp_vdpa.c
index cd7718b43a6e..5bcd00246d2e 100644
--- a/drivers/vdpa/virtio_pci/vp_vdpa.c
+++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
@@ -446,7 +446,7 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
return ret;
 
vp_vdpa = vdpa_alloc_device(struct vp_vdpa, vdpa,
-   dev, &vp_vdpa_ops, NULL);
+   dev, &vp_vdpa_ops, NU

[PATCH v12 09/13] vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()

2021-08-30 Thread Xie Yongji
The upcoming patch is going to support VA mapping/unmapping.
So let's first factor out the logic of PA mapping/unmapping
to make the code more readable.

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vhost/vdpa.c | 55 +---
 1 file changed, 35 insertions(+), 20 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ba030150b4b6..49a1f45ccef8 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -504,7 +504,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
return r;
 }
 
-static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last)
+static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, u64 start, u64 last)
 {
struct vhost_dev *dev = &v->vdev;
struct vhost_iotlb *iotlb = dev->iotlb;
@@ -526,6 +526,11 @@ static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, 
u64 start, u64 last)
}
 }
 
+static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last)
+{
+   return vhost_vdpa_pa_unmap(v, start, last);
+}
+
 static void vhost_vdpa_iotlb_free(struct vhost_vdpa *v)
 {
struct vhost_dev *dev = &v->vdev;
@@ -606,38 +611,28 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 
iova, u64 size)
}
 }
 
-static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
-  struct vhost_iotlb_msg *msg)
+static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
+u64 iova, u64 size, u64 uaddr, u32 perm)
 {
struct vhost_dev *dev = &v->vdev;
-   struct vhost_iotlb *iotlb = dev->iotlb;
struct page **page_list;
unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
unsigned int gup_flags = FOLL_LONGTERM;
unsigned long npages, cur_base, map_pfn, last_pfn = 0;
unsigned long lock_limit, sz2pin, nchunks, i;
-   u64 iova = msg->iova;
+   u64 start = iova;
long pinned;
int ret = 0;
 
-   if (msg->iova < v->range.first || !msg->size ||
-   msg->iova > U64_MAX - msg->size + 1 ||
-   msg->iova + msg->size - 1 > v->range.last)
-   return -EINVAL;
-
-   if (vhost_iotlb_itree_first(iotlb, msg->iova,
-   msg->iova + msg->size - 1))
-   return -EEXIST;
-
/* Limit the use of memory for bookkeeping */
page_list = (struct page **) __get_free_page(GFP_KERNEL);
if (!page_list)
return -ENOMEM;
 
-   if (msg->perm & VHOST_ACCESS_WO)
+   if (perm & VHOST_ACCESS_WO)
gup_flags |= FOLL_WRITE;
 
-   npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;
+   npages = PAGE_ALIGN(size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;
if (!npages) {
ret = -EINVAL;
goto free;
@@ -651,7 +646,7 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
goto unlock;
}
 
-   cur_base = msg->uaddr & PAGE_MASK;
+   cur_base = uaddr & PAGE_MASK;
iova &= PAGE_MASK;
nchunks = 0;
 
@@ -682,7 +677,7 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
ret = vhost_vdpa_map(v, iova, csize,
 map_pfn << PAGE_SHIFT,
-msg->perm);
+perm);
if (ret) {
/*
 * Unpin the pages that are left 
unmapped
@@ -711,7 +706,7 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
 
/* Pin the rest chunk */
ret = vhost_vdpa_map(v, iova, (last_pfn - map_pfn + 1) << PAGE_SHIFT,
-map_pfn << PAGE_SHIFT, msg->perm);
+map_pfn << PAGE_SHIFT, perm);
 out:
if (ret) {
if (nchunks) {
@@ -730,13 +725,33 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
for (pfn = map_pfn; pfn <= last_pfn; pfn++)
unpin_user_page(pfn_to_page(pfn));
}
-   vhost_vdpa_unmap(v, msg->iova, msg->size);
+   vhost_vdpa_unmap(v, start, size);
}
 unlock:
mmap_read_unlock(dev->mm);
 free:
free_page((unsigned long)page_list);
return ret;
+
+}
+
+static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
+  struct vhost_iotlb_msg *msg)
+{
+   struct vhost_dev *dev = &v->vdev;
+   struct vhost_iotlb *iotlb = dev->iotlb;
+
+   if (msg->iova < v->range.first || !msg->size ||
+   msg->iova > U64_MAX - msg->size + 1 ||
+

[PATCH v12 08/13] vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()

2021-08-30 Thread Xie Yongji
Add an opaque pointer for DMA mapping.

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 6 +++---
 drivers/vhost/vdpa.c | 2 +-
 include/linux/vdpa.h | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index f292bb05d6c9..a70fd2a08ff1 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -555,14 +555,14 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
 }
 
 static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size,
-  u64 pa, u32 perm)
+  u64 pa, u32 perm, void *opaque)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
int ret;
 
spin_lock(&vdpasim->iommu_lock);
-   ret = vhost_iotlb_add_range(vdpasim->iommu, iova, iova + size - 1, pa,
-   perm);
+   ret = vhost_iotlb_add_range_ctx(vdpasim->iommu, iova, iova + size - 1,
+   pa, perm, opaque);
spin_unlock(&vdpasim->iommu_lock);
 
return ret;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ab39805ecff1..ba030150b4b6 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -571,7 +571,7 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
return r;
 
if (ops->dma_map) {
-   r = ops->dma_map(vdpa, iova, size, pa, perm);
+   r = ops->dma_map(vdpa, iova, size, pa, perm, NULL);
} else if (ops->set_map) {
if (!v->in_batch)
r = ops->set_map(vdpa, dev->iotlb);
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index e1eae8c7483d..f3014aaca47e 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -270,7 +270,7 @@ struct vdpa_config_ops {
/* DMA ops */
int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
-  u64 pa, u32 perm);
+  u64 pa, u32 perm, void *opaque);
int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
 
/* Free device resources */
-- 
2.11.0



[PATCH v12 07/13] vhost-iotlb: Add an opaque pointer for vhost IOTLB

2021-08-30 Thread Xie Yongji
Add an opaque pointer to the vhost IOTLB, and introduce
vhost_iotlb_add_range_ctx() to accept it.
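A minimal usage sketch ("ctx" is whatever context the caller wants to
attach; vhost_iotlb_itree_first() is the existing lookup helper):

    err = vhost_iotlb_add_range_ctx(iotlb, start, last, addr, perm, ctx);

    /* The context comes back via the map entry when walking the tree. */
    map = vhost_iotlb_itree_first(iotlb, start, last);
    if (map)
            ctx = map->opaque;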

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vhost/iotlb.c   | 20 
 include/linux/vhost_iotlb.h |  3 +++
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
index 0582079e4bcc..670d56c879e5 100644
--- a/drivers/vhost/iotlb.c
+++ b/drivers/vhost/iotlb.c
@@ -36,19 +36,21 @@ void vhost_iotlb_map_free(struct vhost_iotlb *iotlb,
 EXPORT_SYMBOL_GPL(vhost_iotlb_map_free);
 
 /**
- * vhost_iotlb_add_range - add a new range to vhost IOTLB
+ * vhost_iotlb_add_range_ctx - add a new range to vhost IOTLB
  * @iotlb: the IOTLB
  * @start: start of the IOVA range
  * @last: last of IOVA range
  * @addr: the address that is mapped to @start
  * @perm: access permission of this range
+ * @opaque: the opaque pointer for the new mapping
  *
  * Returns an error last is smaller than start or memory allocation
  * fails
  */
-int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
- u64 start, u64 last,
- u64 addr, unsigned int perm)
+int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb,
+ u64 start, u64 last,
+ u64 addr, unsigned int perm,
+ void *opaque)
 {
struct vhost_iotlb_map *map;
 
@@ -71,6 +73,7 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
map->last = last;
map->addr = addr;
map->perm = perm;
+   map->opaque = opaque;
 
iotlb->nmaps++;
vhost_iotlb_itree_insert(map, &iotlb->root);
@@ -80,6 +83,15 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
 
return 0;
 }
+EXPORT_SYMBOL_GPL(vhost_iotlb_add_range_ctx);
+
+int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
+ u64 start, u64 last,
+ u64 addr, unsigned int perm)
+{
+   return vhost_iotlb_add_range_ctx(iotlb, start, last,
+addr, perm, NULL);
+}
 EXPORT_SYMBOL_GPL(vhost_iotlb_add_range);
 
 /**
diff --git a/include/linux/vhost_iotlb.h b/include/linux/vhost_iotlb.h
index 6b09b786a762..2d0e2f52f938 100644
--- a/include/linux/vhost_iotlb.h
+++ b/include/linux/vhost_iotlb.h
@@ -17,6 +17,7 @@ struct vhost_iotlb_map {
u32 perm;
u32 flags_padding;
u64 __subtree_last;
+   void *opaque;
 };
 
 #define VHOST_IOTLB_FLAG_RETIRE 0x1
@@ -29,6 +30,8 @@ struct vhost_iotlb {
unsigned int flags;
 };
 
+int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, u64 start, u64 last,
+ u64 addr, unsigned int perm, void *opaque);
 int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, u64 start, u64 last,
  u64 addr, unsigned int perm);
 void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last);
-- 
2.11.0



[PATCH v12 06/13] vhost-vdpa: Handle the failure of vdpa_reset()

2021-08-30 Thread Xie Yongji
vdpa_reset() may fail now. This adds a check of its return
value and fails vhost_vdpa_open() on error.

Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
Reviewed-by: Stefano Garzarella 
---
 drivers/vhost/vdpa.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ab7a24613982..ab39805ecff1 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -116,12 +116,13 @@ static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa 
*v, u16 qid)
irq_bypass_unregister_producer(&vq->call_ctx.producer);
 }
 
-static void vhost_vdpa_reset(struct vhost_vdpa *v)
+static int vhost_vdpa_reset(struct vhost_vdpa *v)
 {
struct vdpa_device *vdpa = v->vdpa;
 
-   vdpa_reset(vdpa);
v->in_batch = 0;
+
+   return vdpa_reset(vdpa);
 }
 
 static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
@@ -865,7 +866,9 @@ static int vhost_vdpa_open(struct inode *inode, struct file 
*filep)
return -EBUSY;
 
nvqs = v->nvqs;
-   vhost_vdpa_reset(v);
+   r = vhost_vdpa_reset(v);
+   if (r)
+   goto err;
 
vqs = kmalloc_array(nvqs, sizeof(*vqs), GFP_KERNEL);
if (!vqs) {
-- 
2.11.0



[PATCH v12 05/13] vdpa: Add reset callback in vdpa_config_ops

2021-08-30 Thread Xie Yongji
This adds a new callback to support device-specific reset
behavior. The vdpa bus driver will call the reset function
instead of setting the status to zero during resetting.
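With this, the bus-side vdpa_reset() helper presumably becomes
something like the following sketch (it previously just did
ops->set_status(vdev, 0); the exact include/linux/vdpa.h hunk is
truncated further down):

    static inline int vdpa_reset(struct vdpa_device *vdev)
    {
            const struct vdpa_config_ops *ops = vdev->config;

            vdev->features_valid = false;
            return ops->reset(vdev);
    }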

Signed-off-by: Xie Yongji 
---
 drivers/vdpa/ifcvf/ifcvf_main.c   | 35 +++---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 40 +++
 drivers/vdpa/vdpa_sim/vdpa_sim.c  | 18 +++---
 drivers/vdpa/virtio_pci/vp_vdpa.c | 15 +--
 drivers/vhost/vdpa.c  |  9 +++--
 include/linux/vdpa.h  |  8 ++--
 6 files changed, 89 insertions(+), 36 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index bfc3d7d40c09..6708671a0603 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -222,17 +222,6 @@ static void ifcvf_vdpa_set_status(struct vdpa_device 
*vdpa_dev, u8 status)
if (status_old == status)
return;
 
-   if ((status_old & VIRTIO_CONFIG_S_DRIVER_OK) &&
-   !(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
-   ifcvf_stop_datapath(adapter);
-   ifcvf_free_irq(adapter, vf->nr_vring);
-   }
-
-   if (status == 0) {
-   ifcvf_reset_vring(adapter);
-   return;
-   }
-
if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
!(status_old & VIRTIO_CONFIG_S_DRIVER_OK)) {
ret = ifcvf_request_irq(adapter);
@@ -252,6 +241,29 @@ static void ifcvf_vdpa_set_status(struct vdpa_device 
*vdpa_dev, u8 status)
ifcvf_set_status(vf, status);
 }
 
+static int ifcvf_vdpa_reset(struct vdpa_device *vdpa_dev)
+{
+   struct ifcvf_adapter *adapter;
+   struct ifcvf_hw *vf;
+   u8 status_old;
+
+   vf  = vdpa_to_vf(vdpa_dev);
+   adapter = vdpa_to_adapter(vdpa_dev);
+   status_old = ifcvf_get_status(vf);
+
+   if (status_old == 0)
+   return 0;
+
+   if (status_old & VIRTIO_CONFIG_S_DRIVER_OK) {
+   ifcvf_stop_datapath(adapter);
+   ifcvf_free_irq(adapter, vf->nr_vring);
+   }
+
+   ifcvf_reset_vring(adapter);
+
+   return 0;
+}
+
 static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev)
 {
return IFCVF_QUEUE_MAX;
@@ -435,6 +447,7 @@ static const struct vdpa_config_ops ifc_vdpa_ops = {
.set_features   = ifcvf_vdpa_set_features,
.get_status = ifcvf_vdpa_get_status,
.set_status = ifcvf_vdpa_set_status,
+   .reset  = ifcvf_vdpa_reset,
.get_vq_num_max = ifcvf_vdpa_get_vq_num_max,
.get_vq_state   = ifcvf_vdpa_get_vq_state,
.set_vq_state   = ifcvf_vdpa_set_vq_state,
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 4ba3ac48ee83..608f6b900cd9 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2154,22 +2154,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
int err;
 
print_status(mvdev, status, true);
-   if (!status) {
-   mlx5_vdpa_info(mvdev, "performing device reset\n");
-   teardown_driver(ndev);
-   clear_vqs_ready(ndev);
-   mlx5_vdpa_destroy_mr(&ndev->mvdev);
-   ndev->mvdev.status = 0;
-   ndev->mvdev.mlx_features = 0;
-   memset(ndev->event_cbs, 0, sizeof(ndev->event_cbs));
-   ndev->mvdev.actual_features = 0;
-   ++mvdev->generation;
-   if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
-   if (mlx5_vdpa_create_mr(mvdev, NULL))
-   mlx5_vdpa_warn(mvdev, "create MR failed\n");
-   }
-   return;
-   }
 
if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
@@ -2192,6 +2176,29 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED;
 }
 
+static int mlx5_vdpa_reset(struct vdpa_device *vdev)
+{
+   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+   struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+
+   print_status(mvdev, 0, true);
+   mlx5_vdpa_info(mvdev, "performing device reset\n");
+   teardown_driver(ndev);
+   clear_vqs_ready(ndev);
+   mlx5_vdpa_destroy_mr(&ndev->mvdev);
+   ndev->mvdev.status = 0;
+   ndev->mvdev.mlx_features = 0;
+   memset(ndev->event_cbs, 0, sizeof(ndev->event_cbs));
+   ndev->mvdev.actual_features = 0;
+   ++mvdev->generation;
+   if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
+   if (mlx5_vdpa_create_mr(mvdev, NULL))
+   mlx5_vdpa_warn(mvdev, "create MR failed\n");
+   }
+
+   return 0;
+}
+
 static size_t mlx5_vdpa_get_config_size(struct vdpa_device *vdev)
 {
return sizeof(struct virtio_net_config);
@@ -2305,6 +2312,7 @@ static const st

[PATCH v12 04/13] vdpa: Fix some coding style issues

2021-08-30 Thread Xie Yongji
Fix some code indentation issues and the following checkpatch warning:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
371: FILE: include/linux/vdpa.h:371:
+static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,

Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
Reviewed-by: Stefano Garzarella 
---
 include/linux/vdpa.h | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 8cfe49d201dd..8ae1134070eb 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -43,17 +43,17 @@ struct vdpa_vq_state_split {
  * @last_used_idx: used index
  */
 struct vdpa_vq_state_packed {
-u16last_avail_counter:1;
-u16last_avail_idx:15;
-u16last_used_counter:1;
-u16last_used_idx:15;
+   u16 last_avail_counter:1;
+   u16 last_avail_idx:15;
+   u16 last_used_counter:1;
+   u16 last_used_idx:15;
 };
 
 struct vdpa_vq_state {
- union {
-  struct vdpa_vq_state_split split;
-  struct vdpa_vq_state_packed packed;
- };
+   union {
+   struct vdpa_vq_state_split split;
+   struct vdpa_vq_state_packed packed;
+   };
 };
 
 struct vdpa_mgmt_dev;
@@ -131,7 +131,7 @@ struct vdpa_iova_range {
  * @vdev: vdpa device
  * @idx: virtqueue index
  * @state: pointer to returned state (last_avail_idx)
- * @get_vq_notification:   Get the notification area for a virtqueue
+ * @get_vq_notification:   Get the notification area for a virtqueue
  * @vdev: vdpa device
  * @idx: virtqueue index
  * Returns the notifcation area
@@ -350,25 +350,25 @@ static inline struct device *vdpa_get_dma_dev(struct 
vdpa_device *vdev)
 
 static inline void vdpa_reset(struct vdpa_device *vdev)
 {
-const struct vdpa_config_ops *ops = vdev->config;
+   const struct vdpa_config_ops *ops = vdev->config;
 
vdev->features_valid = false;
-ops->set_status(vdev, 0);
+   ops->set_status(vdev, 0);
 }
 
 static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features)
 {
-const struct vdpa_config_ops *ops = vdev->config;
+   const struct vdpa_config_ops *ops = vdev->config;
 
vdev->features_valid = true;
-return ops->set_features(vdev, features);
+   return ops->set_features(vdev, features);
 }
 
-
-static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
-  void *buf, unsigned int len)
+static inline void vdpa_get_config(struct vdpa_device *vdev,
+  unsigned int offset, void *buf,
+  unsigned int len)
 {
-const struct vdpa_config_ops *ops = vdev->config;
+   const struct vdpa_config_ops *ops = vdev->config;
 
/*
 * Config accesses aren't supposed to trigger before features are set.
-- 
2.11.0



[PATCH v12 03/13] file: Export receive_fd() to modules

2021-08-30 Thread Xie Yongji
Export receive_fd() so that some modules can use
it to pass file descriptors between processes without
missing any security checks.
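A hedged sketch of what a module-side caller looks like
(receive_fd() installs the struct file into the current process's
fd table, applying the usual security hooks):

    int fd = receive_fd(file, O_CLOEXEC);

    if (fd < 0)
            return fd;
    /* hand fd back to userspace, e.g. as an ioctl return value */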

Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 fs/file.c| 6 ++
 include/linux/file.h | 7 +++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 86dc9956af32..210e540672aa 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1134,6 +1134,12 @@ int receive_fd_replace(int new_fd, struct file *file, 
unsigned int o_flags)
return new_fd;
 }
 
+int receive_fd(struct file *file, unsigned int o_flags)
+{
+   return __receive_fd(file, NULL, o_flags);
+}
+EXPORT_SYMBOL_GPL(receive_fd);
+
 static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
 {
int err = -EBADF;
diff --git a/include/linux/file.h b/include/linux/file.h
index 2de2e4613d7b..51e830b4fe3a 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -94,6 +94,9 @@ extern void fd_install(unsigned int fd, struct file *file);
 
 extern int __receive_fd(struct file *file, int __user *ufd,
unsigned int o_flags);
+
+extern int receive_fd(struct file *file, unsigned int o_flags);
+
 static inline int receive_fd_user(struct file *file, int __user *ufd,
  unsigned int o_flags)
 {
@@ -101,10 +104,6 @@ static inline int receive_fd_user(struct file *file, int 
__user *ufd,
return -EFAULT;
return __receive_fd(file, ufd, o_flags);
 }
-static inline int receive_fd(struct file *file, unsigned int o_flags)
-{
-   return __receive_fd(file, NULL, o_flags);
-}
 int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags);
 
 extern void flush_delayed_fput(void);
-- 
2.11.0



[PATCH v12 02/13] eventfd: Export eventfd_wake_count to modules

2021-08-30 Thread Xie Yongji
Export eventfd_wake_count so that some modules can use
eventfd_signal_count() to check whether the
eventfd_signal() call should be deferred to a safe context.
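A sketch of the intended call site in a module (the deferral target,
kick_work, is illustrative):

    if (eventfd_signal_count()) {
            /* Nested inside another eventfd wakeup: defer to a
             * safe context instead of signalling directly. */
            schedule_work(&vq->kick_work);
            return;
    }
    eventfd_signal(vq->kickfd, 1);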

Signed-off-by: Xie Yongji 
---
 fs/eventfd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index e265b6dd4f34..1b3130b8d6c1 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -26,6 +26,7 @@
 #include 
 
 DEFINE_PER_CPU(int, eventfd_wake_count);
+EXPORT_PER_CPU_SYMBOL_GPL(eventfd_wake_count);
 
 static DEFINE_IDA(eventfd_ida);
 
-- 
2.11.0



[PATCH v12 01/13] iova: Export alloc_iova_fast() and free_iova_fast()

2021-08-30 Thread Xie Yongji
Export alloc_iova_fast() and free_iova_fast() so that
some modules can make use of the per-CPU cache and avoid
taking the rbtree spinlock in alloc_iova() and free_iova()
during IOVA allocation.
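A short usage sketch (signatures as in drivers/iommu/iova.c; sizes
are in page frames):

    unsigned long pfn;

    pfn = alloc_iova_fast(iovad, size >> PAGE_SHIFT, limit_pfn, true);
    if (!pfn)
            return -ENOMEM;
    /* ... use the IOVA range [pfn << PAGE_SHIFT, ...) ... */
    free_iova_fast(iovad, pfn, size >> PAGE_SHIFT);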

Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
Acked-by: Will Deacon 
---
 drivers/iommu/iova.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b6cf5f16123b..3941ed6bb99b 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -521,6 +521,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long 
size,
 
return new_iova->pfn_lo;
 }
+EXPORT_SYMBOL_GPL(alloc_iova_fast);
 
 /**
  * free_iova_fast - free iova pfn range into rcache
@@ -538,6 +539,7 @@ free_iova_fast(struct iova_domain *iovad, unsigned long 
pfn, unsigned long size)
 
free_iova(iovad, pfn);
 }
+EXPORT_SYMBOL_GPL(free_iova_fast);
 
 #define fq_ring_for_each(i, fq) \
for ((i) = (fq)->head; (i) != (fq)->tail; (i) = ((i) + 1) % 
IOVA_FQ_SIZE)
-- 
2.11.0



[PATCH v12 00/13] Introduce VDUSE - vDPA Device in Userspace

2021-08-30 Thread Xie Yongji
This series introduces a framework that makes it possible to implement
software-emulated vDPA devices in userspace. To make the device
emulation more secure, the emulated vDPA device's control path is handled
in the kernel and only the data path is implemented in userspace.

Since the emulated vDPA device's control path is handled in the kernel,
a message mechanism is introduced to make userspace aware of data
path related changes. Userspace can use read()/write() to receive and
reply to the control messages.

In the data path, the core maps the DMA buffer into the VDUSE daemon's
address space, which can be implemented in different ways depending on
the vdpa bus to which the vDPA device is attached.

In the virtio-vdpa case, we implement an MMU-based software IOTLB with
a bounce-buffering mechanism to achieve that. And in the vhost-vdpa
case, the DMA buffer resides in a userspace memory region which can be
shared with the VDUSE userspace process via transferring the shmfd.
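For a feel of the message flow, a daemon's control loop might look
like the sketch below; the struct layouts come from
include/uapi/linux/vduse.h in this series, and the message type and
result constants shown are assumptions made for the example:

    struct vduse_dev_request req;
    struct vduse_dev_response resp;

    for (;;) {
            if (read(dev_fd, &req, sizeof(req)) != sizeof(req))
                    break;

            memset(&resp, 0, sizeof(resp));
            resp.request_id = req.request_id;
            resp.result = VDUSE_REQ_RESULT_OK;      /* assumed constant */

            switch (req.type) {
            case VDUSE_GET_VQ_STATE:                /* assumed message type */
                    /* fill resp.vq_state from the emulated device */
                    break;
            default:
                    resp.result = VDUSE_REQ_RESULT_FAILED;
                    break;
            }

            write(dev_fd, &resp, sizeof(resp));
    }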

The details and our use case are shown below:
  Userspace:  a container using /dev/vdx, a QEMU VM using
  /dev/vhost-vdpa-x, and the VDUSE daemon, which contains the vDPA
  device emulation and a block driver speaking TCP/IP.

  Kernel:  the container's I/O goes through the virtio-blk driver,
  the virtio bus, a virtio-blk device and the virtio-vdpa driver;
  the VM's I/O goes through the vhost device and the vhost-vdpa
  driver. Both paths end at the vdpa device that the vduse driver
  exposes on the vdpa bus.

  The VDUSE daemon's block driver reaches the remote storages
  through the NIC.
We make use of it to implement a block device connecting to
our distributed storage, which can be used both in containers and
VMs. Thus, we can have a unified technology stack in these two cases.

To test it with null-blk:

  $ qemu-storage-daemon \
      --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
      --monitor chardev=charmonitor \
      --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
      --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queu

Re: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-08-30 Thread Christoph Hellwig
Sorry for the delayed answer, but I looked at the vmap_pfn usage in the
previous version and tried to come up with a better version.  This
mostly untested branch:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hyperv-vmap

gets us there for swiotlb and the channel infrastructure.  I've started
looking at the network driver but didn't get anywhere due to other work.

As far as I can tell the network driver does gigantic multi-megabyte
vmalloc allocation for the send and receive buffers, which are then
passed to the hardware, but always copied to/from when interacting
with the networking stack.  Did I see that right?  Are these big
buffers actually required unlike the normal buffer management schemes
in other Linux network drivers?

If so I suspect the best way to allocate them is by not using vmalloc
but just discontiguous pages, and then use kmap_local_pfn where the
PFN includes the share_gpa offset when actually copying from/to the
skbs.
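
For what it's worth, the copy path being suggested might look roughly
like this untested sketch (the shared_gpa_boundary offset handling is
an assumption about the Hyper-V side, not code from this thread):

    /* Copy skb data into one discontiguous, guest-shared page whose
     * pfn is offset above the shared_gpa_boundary. */
    unsigned long pfn = page_to_pfn(page) +
                        (shared_gpa_boundary >> PAGE_SHIFT);
    void *dst = kmap_local_pfn(pfn);

    memcpy(dst, src, len);
    kunmap_local(dst);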


Re: [PATCH -next v4 2/2] iommu/arm-smmu-v3: Add suspend and resume support

2021-08-30 Thread Bixuan Cui



On 2021/8/30 15:38, Bixuan Cui wrote:
> Changes in v4:
> * Restore the arm_smmu_suspend() function code to the v2 version
> (directly return 0 in it).
Hello,
I looked up the code of smmu.c and the SMMUv3 manual. Since suspend is
implemented by external clock-gating, it is not clear what SMMUv3 needs
to do. Is it the same as in smmu.c?
I rolled back the code here, and hope to get some advice.

Thanks,
Bixuan Cui


[PATCH -next v4 1/2] platform-msi: Save the msg context to desc in platform_msi_write_msg()

2021-08-30 Thread Bixuan Cui
Save the msg context to the desc when the MSI interrupt is requested.
The drivers can use it in special scenarios (such as resume).

Signed-off-by: Bixuan Cui 
---
 drivers/base/platform-msi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index 3d6c8f9caf43..60962a224fcc 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -90,6 +90,9 @@ static void platform_msi_write_msg(struct irq_data *data, 
struct msi_msg *msg)
 
priv_data = desc->platform.msi_priv_data;
 
+   desc->msg.address_lo = msg->address_lo;
+   desc->msg.address_hi = msg->address_hi;
+   desc->msg.data = msg->data;
priv_data->write_msg(desc, msg);
 }
 
-- 
2.17.1



[PATCH -next v4 2/2] iommu/arm-smmu-v3: Add suspend and resume support

2021-08-30 Thread Bixuan Cui
Add suspend and resume support for arm-smmu-v3 in low-power mode.

When the SMMU is suspended, it is powered off and its registers are
cleared. So save the msi_msg context during MSI interrupt initialization
of the SMMU. When resume happens, call arm_smmu_device_reset() to restore
the registers.
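
Based on the changelog below (v2 mentions SET_SYSTEM_SLEEP_PM_OPS() and
v4 makes suspend a no-op again), the PM hooks presumably reduce to a
sketch like this; the actual resume path is cut off in the diff:

    static int __maybe_unused arm_smmu_suspend(struct device *dev)
    {
            /* Power is cut by external clock-gating; nothing to save. */
            return 0;
    }

    static int __maybe_unused arm_smmu_resume(struct device *dev)
    {
            struct arm_smmu_device *smmu = dev_get_drvdata(dev);

            /* Registers were cleared while off: re-run the reset sequence. */
            return arm_smmu_device_reset(smmu);
    }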

Signed-off-by: Bixuan Cui 
Reviewed-by: Wei Yongjun 
Reviewed-by: Zhen Lei 
Reviewed-by: Ding Tianhong 
Reviewed-by: Hanjun Guo 
---
Changes in v4:
* Restore the arm_smmu_suspend() function code to the v2 version
  (directly return 0 in it).

Changes in v3:
* Move the code that saves the msg context into the platform-msi layer
  (a new patch: "platform-msi: Save the msg context to desc in
  platform_msi_write_msg()").

* Add a bypass member to struct arm_smmu_device for per-SMMU bypass
  control (used by resume mode).

* Separate interrupt requests from register operations in
  arm_smmu_device_reset(). arm_smmu_setup_irqs() is added to request
  interrupts and is called before arm_smmu_device_reset().
  arm_smmu_device_reset() resets the irq registers (it can also be
  called in resume mode).

Changes in v2:
* Using get_cached_msi_msg() instead of the descriptor to resume msi_msg
  in arm_smmu_resume_msis();

* Move arm_smmu_resume_msis() from arm_smmu_setup_unique_irqs() into
  arm_smmu_setup_irqs() and rename it to arm_smmu_resume_unique_irqs();

  Call arm_smmu_setup_unique_irqs() to configure the IRQ during probe and
  call arm_smmu_resume_unique_irqs() in resume mode to restore the IRQ
  registers to make the code more reasonable.

* Call arm_smmu_device_disable() to disable smmu and clear CR0_SMMUEN on
  suspend. Then the warning about CR0_SMMUEN being enabled can be cleared
  on resume.

* Using SET_SYSTEM_SLEEP_PM_OPS();
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 94 +++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 +
 2 files changed, 88 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a388e318f86e..af6752a735bb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3282,6 +3282,61 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device 
*smmu)
return 0;
 }
 
+static void arm_smmu_reset_unique_irqs(struct arm_smmu_device *smmu)
+{
+   struct msi_desc *desc;
+   struct msi_msg msg;
+
+   desc = irq_get_msi_desc(smmu->evtq.q.irq);
+   if (desc) {
+   get_cached_msi_msg(smmu->evtq.q.irq, &msg);
+   arm_smmu_write_msi_msg(desc, &msg);
+   }
+
+   desc = irq_get_msi_desc(smmu->gerr_irq);
+   if (desc) {
+   get_cached_msi_msg(smmu->gerr_irq, &msg);
+   arm_smmu_write_msi_msg(desc, &msg);
+   }
+
+   if (smmu->features & ARM_SMMU_FEAT_PRI) {
+   desc = irq_get_msi_desc(smmu->priq.q.irq);
+   if (desc) {
+   get_cached_msi_msg(smmu->priq.q.irq, &msg);
+   arm_smmu_write_msi_msg(desc, &msg);
+   }
+   }
+}
+
+static int arm_smmu_reset_irqs(struct arm_smmu_device *smmu)
+{
+   int ret, irq;
+   u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
+
+   /* Disable IRQs first */
+   ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_IRQ_CTRL,
+ ARM_SMMU_IRQ_CTRLACK);
+   if (ret) {
+   dev_err(smmu->dev, "failed to disable irqs\n");
+   return ret;
+   }
+
+   irq = smmu->combined_irq;
+   if (!irq)
+   arm_smmu_reset_unique_irqs(smmu);
+
+   if (smmu->features & ARM_SMMU_FEAT_PRI)
+   irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;
+
+   /* Enable interrupt generation on the SMMU */
+   ret = arm_smmu_write_reg_sync(smmu, irqen_flags,
+ ARM_SMMU_IRQ_CTRL, ARM_SMMU_IRQ_CTRLACK);
+   if (ret)
+   dev_warn(smmu->dev, "failed to reset irqs\n");
+
+   return 0;
+}
+
 static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
 {
int ret;
@@ -3293,7 +3348,7 @@ static int arm_smmu_device_disable(struct arm_smmu_device 
*smmu)
return ret;
 }
 
-static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
+static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
 {
int ret;
u32 reg, enables;
@@ -3401,9 +3456,9 @@ static int arm_smmu_device_reset(struct arm_smmu_device 
*smmu, bool bypass)
}
}
 
-   ret = arm_smmu_setup_irqs(smmu);
+   ret = arm_smmu_reset_irqs(smmu);
if (ret) {
-   dev_err(smmu->dev, "failed to setup irqs\n");
+   dev_err(smmu->dev, "failed to reset irqs\n");
return ret;
}
 
@@ -3411,7 +3466,7 @@ static int arm_smmu_device_reset(struct arm_smmu_device 
*smmu, bool bypass)
enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
 
/* Enable the SMMU interface, or ensure