Re: [PATCH v8 0/9] x86: tag application address space for devices

2020-09-16 Thread Joerg Roedel
On Tue, Sep 15, 2020 at 09:30:04AM -0700, Fenghua Yu wrote:
> Ashok Raj (1):
>   Documentation/x86: Add documentation for SVA (Shared Virtual
> Addressing)
> 
> Fenghua Yu (7):
>   drm, iommu: Change type of pasid to u32
>   iommu/vt-d: Change flags type to unsigned int in binding mm
>   x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
>   x86/msr-index: Define IA32_PASID MSR
>   mm: Define pasid in mm
>   x86/cpufeatures: Mark ENQCMD as disabled when configured out
>   x86/mmu: Allocate/free PASID
> 
> Yu-cheng Yu (1):
>   x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

For the IOMMU bits:

Acked-by: Joerg Roedel 


Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent

2020-09-16 Thread Neil Armstrong
Hi Robin,

On 16/09/2020 01:51, Robin Murphy wrote:
> According to a downstream commit I found in the Khadas vendor kernel,
> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> how to handle this properly) we should describe it as such. Otherwise
> the mismatch leads to all manner of fun with mismatched attributes and
> inadvertently snooping stale data from caches, which would account for
> at least some of the brokenness observed on this platform.
> 
> Signed-off-by: Robin Murphy 
> ---
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> @@ -135,3 +135,7 @@ map1 {
>   };
>   };
>  };
> +
> +&mali {
> + dma-coherent;
> +};
> 

Thanks a lot for digging, I'll run a test to confirm it fixes the issue!

Neil


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jean-Philippe Brucker
On Wed, Sep 16, 2020 at 01:19:18AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Tuesday, September 15, 2020 10:29 PM
> >
> > > Do they need a device at all?  It's not clear to me why RID based
> > > IOMMU management fits within vfio's scope, but PASID based does not.
> > 
> > In RID mode vfio-pci completely owns the PCI function, so it is more
> > natural that VFIO, as the sole device owner, would own the DMA mapping
> > machinery. Further, the RID IOMMU mode is rarely used outside of VFIO
> > so there is not much reason to try and disaggregate the API.
> 
> It is also used by vDPA.
> 
> > 
> > PASID on the other hand, is shared. vfio-mdev drivers will share the
> > device with other kernel drivers. PASID and DMA will be concurrent
> > with VFIO and other kernel drivers/etc.
> > 
> 
> Looks like you are equating PASID to host-side sharing, while ignoring
> another valid usage where a PASID-capable device is passed through
> to the guest via vfio-pci and then PASID is used by the guest
> for guest-side sharing. In such a case, it is an exclusive usage on the
> host side, so what is the problem for VFIO to manage PASID given
> that vfio-pci completely owns the function?

And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe):
the PASID space of a PCI function cannot be shared between host and guest,
so we assign the whole PASID table along with the RID. Since we need the
BIND, INVALIDATE, and report APIs introduced here to support nested
translation, a /dev/sva interface would need to support this mode as well.

Thanks,
Jean


Re: [PATCH] iommu/amd: fix interrupt remapping for avic

2020-09-16 Thread Suravee Suthikulpanit



On 9/15/20 8:19 PM, Joao Martins wrote:

On 9/15/20 1:30 PM, Suravee Suthikulpanit wrote:

On 9/15/20 6:25 PM, Maxim Levitsky wrote:

On Mon, 2020-09-14 at 21:48 +0700, Suravee Suthikulpanit wrote:

Could you please try with the following patch instead?

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3840,14 +3840,18 @@ int amd_iommu_activate_guest_mode(void *data)
{
   struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
   struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
+   u64 valid;

   if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
   !entry || entry->lo.fields_vapic.guest_mode)
   return 0;

+   valid = entry->lo.fields_vapic.valid;
+
   entry->lo.val = 0;
   entry->hi.val = 0;

+   entry->lo.fields_vapic.valid   = valid;
   entry->lo.fields_vapic.guest_mode  = 1;
   entry->lo.fields_vapic.ga_log_intr = 1;
   entry->hi.fields.ga_root_ptr   = ir_data->ga_root_ptr;
@@ -3864,12 +3868,14 @@ int amd_iommu_deactivate_guest_mode(void *data)
   struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
   struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
   struct irq_cfg *cfg = ir_data->cfg;
-   u64 valid = entry->lo.fields_remap.valid;
+   u64 valid;

   if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
   !entry || !entry->lo.fields_vapic.guest_mode)
   return 0;

+   valid = entry->lo.fields_remap.valid;
+
   entry->lo.val = 0;
   entry->hi.val = 0;

I see. I based my approach on the fact that the valid bit was always
set to true before anyway, plus that amd_iommu_activate_guest_mode should
really only be called when someone activates a valid interrupt remapping
entry, but IMHO the approach of preserving the valid bit is safer anyway.

It works on my system (I applied the patch manually, since either your or
my email client seems to mangle the patch).



Sorry for the mangled patch. I'll submit the patch w/ your information.
Thanks for your help reporting, debugging, and testing the patch.


I assume you're only doing the valid bit preservation in
amd_iommu_activate_guest_mode()?
The null deref fix in amd_iommu_deactivate_guest_mode() was fixed
elsewhere[0], or are you planning on merging both changes like the diff
you attached?


I am planning to send a separate patch just for amd_iommu_activate_guest_mode().


Asking also because commit 26e495f341 ("iommu/amd: Restore IRTE.RemapEn
bit after programming IRTE") was added in v5.4 and v5.8 stable trees but
the v5.4 backport didn't include e52d58d54a321 ("iommu/amd: Use
cmpxchg_double() when updating 128-bit IRTE").


We should probably backport e52d58d54a321 along with the fixes in
amd_iommu_activate_guest_mode() and amd_iommu_deactivate_guest_mode().
I'll work with the community to get these back-ported.


Thanks,
Suravee


Re: [PATCH v3 0/8] iommu/arm-smmu: Support maintaining bootloader mappings

2020-09-16 Thread Laurentiu Tudor


On 9/4/2020 6:55 PM, Bjorn Andersson wrote:
> Based on previous attempts and discussions this is the latest attempt at
> inheriting stream mappings set up by the bootloader, for e.g. boot splash or
> efifb.
> 
> Per Will's request this builds on the work by Jordan and Rob for the Adreno
> SMMU support. It applies cleanly ontop of v16 of their series, which can be
> found at
> https://lore.kernel.org/linux-arm-msm/20200901164707.2645413-1-robdcl...@gmail.com/
> 
> Bjorn Andersson (8):
>   iommu/arm-smmu: Refactor context bank allocation
>   iommu/arm-smmu: Delay modifying domain during init
>   iommu/arm-smmu: Consult context bank allocator for identify domains
>   iommu/arm-smmu-qcom: Emulate bypass by using context banks
>   iommu/arm-smmu-qcom: Consistently initialize stream mappings
>   iommu/arm-smmu: Add impl hook for inherit boot mappings
>   iommu/arm-smmu: Provide helper for allocating identity domain
>   iommu/arm-smmu-qcom: Setup identity domain for boot mappings
> 
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 111 ++-
>  drivers/iommu/arm/arm-smmu/arm-smmu.c  | 122 ++---
>  drivers/iommu/arm/arm-smmu/arm-smmu.h  |  14 ++-
>  3 files changed, 205 insertions(+), 42 deletions(-)
> 

Tested on an NXP LX2160A with John's tree [1] and the diff below [2], so
for the whole series:

Tested-by: Laurentiu Tudor 

[1] https://git.linaro.org/people/john.stultz/android-dev.git/log/?h=dev/db845c-mainline-WIP
[2]
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -190,11 +190,43 @@ static const struct arm_smmu_impl mrvl_mmu500_impl = {
.reset = arm_mmu500_reset,
 };

+static int nxp_smmu_inherit_mappings(struct arm_smmu_device *smmu)
+{
+   u32 smr;
+   int i, cnt = 0;
+
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
+
+   if (FIELD_GET(ARM_SMMU_SMR_VALID, smr)) {
+   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
+   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
+   smmu->smrs[i].valid = true;
+
+   smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
+   smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
+   smmu->s2crs[i].count++;
+
+   cnt++;
+   }
+   }
+
+   dev_notice(smmu->dev, "\tpreserved %d boot mapping%s\n", cnt,
+  cnt == 1 ? "" : "s");
+
+   return 0;
+}
+
+static const struct arm_smmu_impl nxp_impl = {
+   .inherit_mappings = nxp_smmu_inherit_mappings,
+};

 struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
 {
const struct device_node *np = smmu->dev->of_node;

/*
 * Set the impl for model-specific implementation quirks first,
 * such that platform integration quirks can pick it up and
@@ -229,5 +261,12 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
if (of_device_is_compatible(np, "marvell,ap806-smmu-500"))
smmu->impl = &mrvl_mmu500_impl;

+   if (of_property_read_bool(np, "nxp,keep-bypass-mappings"))
+   smmu->impl = &nxp_impl;

---
Best Regards, Laurentiu


[PATCH v2] iommu/amd: Restore IRTE.RemapEn bit for amd_iommu_activate_guest_mode

2020-09-16 Thread Suravee Suthikulpanit
Commit e52d58d54a32 ("iommu/amd: Use cmpxchg_double() when updating
128-bit IRTE") removed the assumption that modify_irte_ga always sets
the valid bit, which requires the callers to set the appropriate value
for the struct irte_ga.valid bit before calling the function.

Commit 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after
programming IRTE") already did this for the function
amd_iommu_deactivate_guest_mode().

The same change is also needed for amd_iommu_activate_guest_mode().
Otherwise, this could trigger IO_PAGE_FAULT for VFIO-based VMs with
AVIC enabled.

Reported-by: Maxim Levitsky 
Tested-by: Maxim Levitsky 
Cc: Joao Martins 
Fixes: e52d58d54a321 ("iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE")
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/iommu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e938677af8bc..db4fb840c59c 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3900,14 +3900,18 @@ int amd_iommu_activate_guest_mode(void *data)
 {
struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
+   u64 valid;
 
if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
!entry || entry->lo.fields_vapic.guest_mode)
return 0;
 
+   valid = entry->lo.fields_vapic.valid;
+
entry->lo.val = 0;
entry->hi.val = 0;
 
+   entry->lo.fields_vapic.valid   = valid;
entry->lo.fields_vapic.guest_mode  = 1;
entry->lo.fields_vapic.ga_log_intr = 1;
entry->hi.fields.ga_root_ptr   = ir_data->ga_root_ptr;
-- 
2.17.1



[PATCH 3/3] iommu: amd: Re-purpose Exclusion range registers to support SNP CWWB

2020-09-16 Thread Suravee Suthikulpanit
When the IOMMU SNP support bit is set in the IOMMU Extended Features
register, hardware re-purposes the following registers:

1. IOMMU Exclusion Base register (MMIO offset 0020h) to
   Completion Wait Write-Back (CWWB) Base register

2. IOMMU Exclusion Range Limit (MMIO offset 0028h) to
   Completion Wait Write-Back (CWWB) Range Limit register

and requires the IOMMU CWWB semaphore base and range to be programmed
into the registers at offsets 0020h and 0028h accordingly.

Cc: Brijesh Singh 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/init.c| 26 ++
 2 files changed, 27 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 1e7966c73707..f696ac7c5f89 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -93,6 +93,7 @@
 #define FEATURE_PC (1ULL<<9)
 #define FEATURE_GAM_VAPIC  (1ULL<<21)
 #define FEATURE_EPHSUP (1ULL<<50)
+#define FEATURE_SNP(1ULL<<63)
 
 #define FEATURE_PASID_SHIFT32
 #define FEATURE_PASID_MASK (0x1fULL << FEATURE_PASID_SHIFT)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index febc072f2717..c55df4347487 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -359,6 +359,29 @@ static void iommu_set_exclusion_range(struct amd_iommu *iommu)
&entry, sizeof(entry));
 }
 
+static void iommu_set_cwwb_range(struct amd_iommu *iommu)
+{
+   u64 start = iommu_virt_to_phys((void *)iommu->cmd_sem);
+   u64 entry = start & PM_ADDR_MASK;
+
+   if (!iommu_feature(iommu, FEATURE_SNP))
+   return;
+
+   /* Note:
+* Re-purpose Exclusion base/limit registers for Completion wait
+* write-back base/limit.
+*/
+   memcpy_toio(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET,
+   &entry, sizeof(entry));
+
+   /* Note:
+* Default to 4 Kbytes, which can be specified by setting base
+* address equal to the limit address.
+*/
+   memcpy_toio(iommu->mmio_base + MMIO_EXCL_LIMIT_OFFSET,
+   &entry, sizeof(entry));
+}
+
 /* Programs the physical address of the device table into the IOMMU hardware */
 static void iommu_set_device_table(struct amd_iommu *iommu)
 {
@@ -1901,6 +1924,9 @@ static int __init amd_iommu_init_pci(void)
ret = iommu_init_pci(iommu);
if (ret)
break;
+
+   /* Need to setup range after PCI init */
+   iommu_set_cwwb_range(iommu);
}
 
/*
-- 
2.17.1



[PATCH 1/3] iommu: amd: Use 4K page for completion wait write-back semaphore

2020-09-16 Thread Suravee Suthikulpanit
IOMMU SNP support requires the completion wait write-back semaphore to be
implemented using a 4K-aligned page, where the page address is to be
programmed into the newly introduced MMIO base/range registers.

This new scheme uses a per-iommu atomic variable to store the current
semaphore value, which is incremented for every completion wait command.

Since this new scheme is also compatible with non-SNP mode, generalize
the driver to use a 4K page for the completion-wait semaphore in both
modes.

Cc: Brijesh Singh 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 ++-
 drivers/iommu/amd/init.c| 18 ++
 drivers/iommu/amd/iommu.c   | 23 +++
 3 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 30a5d412255a..4c80483e78ec 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -595,7 +595,8 @@ struct amd_iommu {
 #endif
 
u32 flags;
-   volatile u64 __aligned(8) cmd_sem;
+   volatile u64 *cmd_sem;
+   u64 cmd_sem_val;
 
 #ifdef CONFIG_AMD_IOMMU_DEBUGFS
/* DebugFS Info */
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 445a08d23fed..febc072f2717 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -813,6 +813,19 @@ static int iommu_init_ga(struct amd_iommu *iommu)
return ret;
 }
 
+static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
+{
+   iommu->cmd_sem = (void *)get_zeroed_page(GFP_KERNEL);
+
+   return iommu->cmd_sem ? 0 : -ENOMEM;
+}
+
+static void __init free_cwwb_sem(struct amd_iommu *iommu)
+{
+   if (iommu->cmd_sem)
+   free_page((unsigned long)iommu->cmd_sem);
+}
+
 static void iommu_enable_xt(struct amd_iommu *iommu)
 {
 #ifdef CONFIG_IRQ_REMAP
@@ -1395,6 +1408,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
 
 static void __init free_iommu_one(struct amd_iommu *iommu)
 {
+   free_cwwb_sem(iommu);
free_command_buffer(iommu);
free_event_buffer(iommu);
free_ppr_log(iommu);
@@ -1481,6 +1495,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
int ret;
 
raw_spin_lock_init(&iommu->lock);
+   iommu->cmd_sem_val = 0;
 
/* Add IOMMU to internal data structures */
list_add_tail(&iommu->list, &amd_iommu_list);
@@ -1558,6 +1573,9 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
if (!iommu->mmio_base)
return -ENOMEM;
 
+   if (alloc_cwwb_sem(iommu))
+   return -ENOMEM;
+
if (alloc_command_buffer(iommu))
return -ENOMEM;
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 07ae8b93887e..a1d2c749a21f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -792,11 +792,11 @@ irqreturn_t amd_iommu_int_handler(int irq, void *data)
  *
  /
 
-static int wait_on_sem(volatile u64 *sem)
+static int wait_on_sem(struct amd_iommu *iommu, u64 data)
 {
int i = 0;
 
-   while (*sem == 0 && i < LOOP_TIMEOUT) {
+   while (*iommu->cmd_sem != data && i < LOOP_TIMEOUT) {
udelay(1);
i += 1;
}
@@ -827,16 +827,16 @@ static void copy_cmd_to_buffer(struct amd_iommu *iommu,
writel(tail, iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
 }
 
-static void build_completion_wait(struct iommu_cmd *cmd, u64 address)
+static void build_completion_wait(struct iommu_cmd *cmd,
+ struct amd_iommu *iommu,
+ u64 data)
 {
-   u64 paddr = iommu_virt_to_phys((void *)address);
-
-   WARN_ON(address & 0x7ULL);
+   u64 paddr = iommu_virt_to_phys((void *)iommu->cmd_sem);
 
memset(cmd, 0, sizeof(*cmd));
cmd->data[0] = lower_32_bits(paddr) | CMD_COMPL_WAIT_STORE_MASK;
cmd->data[1] = upper_32_bits(paddr);
-   cmd->data[2] = 1;
+   cmd->data[2] = data;
CMD_SET_TYPE(cmd, CMD_COMPL_WAIT);
 }
 
@@ -1045,22 +1045,21 @@ static int iommu_completion_wait(struct amd_iommu *iommu)
struct iommu_cmd cmd;
unsigned long flags;
int ret;
+   u64 data;
 
if (!iommu->need_sync)
return 0;
 
-
-   build_completion_wait(&cmd, (u64)&iommu->cmd_sem);
-
raw_spin_lock_irqsave(&iommu->lock, flags);
 
-   iommu->cmd_sem = 0;
+   data = ++iommu->cmd_sem_val;
+   build_completion_wait(&cmd, iommu, data);
 
ret = __iommu_queue_command_sync(iommu, &cmd, false);
if (ret)
goto out_unlock;
 
-   ret = wait_on_sem(&iommu->cmd_sem);
+   ret = wait_on_sem(iommu, data);
 
 out_unlock:
raw_spin_unlock_irqrestore(&iommu->lock, flags);
-- 
2.17.1


[PATCH 0/3] amd : iommu : Initial IOMMU support for SNP

2020-09-16 Thread Suravee Suthikulpanit
Introducing support for AMD Secure Nested Paging (SNP) with IOMMU,
which mainly affects the use of the IOMMU Exclusion Base and Range Limit
registers. Note that these registers are no longer used by the Linux
IOMMU driver. Patches 2 and 3 are SNP-specific and describe the
implementation in detail.

In order to support SNP, the current Completion Wait Write-back logic
is modified (patch 1/3). This change is independent of SNP.

Please see the following white paper for more info on SNP:
  
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
 

Thank you,
Suravee

Suravee Suthikulpanit (3):
  iommu: amd: Use 4K page for completion wait write-back semaphore
  iommu: amd: Add support for RMP_PAGE_FAULT and RMP_HW_ERR
  iommu: amd: Re-purpose Exclusion range registers to support SNP CWWB

 drivers/iommu/amd/amd_iommu_types.h |  6 ++-
 drivers/iommu/amd/init.c| 44 +++
 drivers/iommu/amd/iommu.c   | 84 -
 3 files changed, 121 insertions(+), 13 deletions(-)

-- 
2.17.1



[PATCH 2/3] iommu: amd: Add support for RMP_PAGE_FAULT and RMP_HW_ERR

2020-09-16 Thread Suravee Suthikulpanit
IOMMU SNP support introduces two new IOMMU events:
  * RMP Page Fault event
  * RMP Hardware Error event

Hence, add reporting functions for these events.

Cc: Brijesh Singh 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  2 +
 drivers/iommu/amd/iommu.c   | 61 +
 2 files changed, 63 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 4c80483e78ec..1e7966c73707 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -128,6 +128,8 @@
 #define EVENT_TYPE_IOTLB_INV_TO0x7
 #define EVENT_TYPE_INV_DEV_REQ 0x8
 #define EVENT_TYPE_INV_PPR_REQ 0x9
+#define EVENT_TYPE_RMP_FAULT   0xd
+#define EVENT_TYPE_RMP_HW_ERR  0xe
 #define EVENT_DEVID_MASK   0x
 #define EVENT_DEVID_SHIFT  0
 #define EVENT_DOMID_MASK_LO0x
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a1d2c749a21f..73c035161f28 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -486,6 +486,61 @@ static void dump_command(unsigned long phys_addr)
pr_err("CMD[%d]: %08x\n", i, cmd->data[i]);
 }
 
+static void amd_iommu_report_rmp_hw_error(volatile u32 *event)
+{
+   struct pci_dev *pdev;
+   struct iommu_dev_data *dev_data = NULL;
+   int devid = (event[0] >> EVENT_DEVID_SHIFT) & EVENT_DEVID_MASK;
+   int vmg_tag   = (event[1]) & 0x;
+   int flags = (event[1] >> EVENT_FLAGS_SHIFT) & EVENT_FLAGS_MASK;
+   u64 spa   = ((u64)event[3] << 32) | (event[2] & 0xFFF8);
+
+   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+  devid & 0xff);
+   if (pdev)
+   dev_data = dev_iommu_priv_get(&pdev->dev);
+
+   if (dev_data && __ratelimit(&dev_data->rs)) {
+   pci_err(pdev, "Event logged [RMP_HW_ERROR devid=0x%04x, vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
+   devid, vmg_tag, spa, flags);
+   } else {
+   pr_err_ratelimited("Event logged [RMP_HW_ERROR device=%02x:%02x.%x, vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
+   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   vmg_tag, spa, flags);
+   }
+
+   if (pdev)
+   pci_dev_put(pdev);
+}
+
+static void amd_iommu_report_rmp_fault(volatile u32 *event)
+{
+   struct pci_dev *pdev;
+   struct iommu_dev_data *dev_data = NULL;
+   int devid = (event[0] >> EVENT_DEVID_SHIFT) & EVENT_DEVID_MASK;
+   int flags_rmp = (event[0] >> EVENT_FLAGS_SHIFT) & 0xFF;
+   int vmg_tag   = (event[1]) & 0x;
+   int flags = (event[1] >> EVENT_FLAGS_SHIFT) & EVENT_FLAGS_MASK;
+   u64 gpa   = ((u64)event[3] << 32) | event[2];
+
+   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+  devid & 0xff);
+   if (pdev)
+   dev_data = dev_iommu_priv_get(&pdev->dev);
+
+   if (dev_data && __ratelimit(&dev_data->rs)) {
+   pci_err(pdev, "Event logged [RMP_PAGE_FAULT devid=0x%04x, vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, flags=0x%04x]\n",
+   devid, vmg_tag, gpa, flags_rmp, flags);
+   } else {
+   pr_err_ratelimited("Event logged [RMP_PAGE_FAULT device=%02x:%02x.%x, vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, flags=0x%04x]\n",
+   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   vmg_tag, gpa, flags_rmp, flags);
+   }
+
+   if (pdev)
+   pci_dev_put(pdev);
+}
+
 static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
u64 address, int flags)
 {
@@ -577,6 +632,12 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
pasid, address, flags);
break;
+   case EVENT_TYPE_RMP_FAULT:
+   amd_iommu_report_rmp_fault(event);
+   break;
+   case EVENT_TYPE_RMP_HW_ERR:
+   amd_iommu_report_rmp_hw_error(event);
+   break;
case EVENT_TYPE_INV_PPR_REQ:
pasid = PPR_PASID(*((u64 *)__evt));
tag = event[1] & 0x03FF;
-- 
2.17.1



Re: [PATCH v7 18/24] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update

2020-09-16 Thread Jean-Philippe Brucker
On Fri, Aug 28, 2020 at 05:28:22PM +0800, Zenghui Yu wrote:
> On 2020/5/20 1:54, Jean-Philippe Brucker wrote:
> > @@ -4454,6 +4470,12 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
> > smmu->features |= ARM_SMMU_FEAT_E2H;
> > }
> > +   if (reg & (IDR0_HA | IDR0_HD)) {
> > +   smmu->features |= ARM_SMMU_FEAT_HA;
> > +   if (reg & IDR0_HD)
> > +   smmu->features |= ARM_SMMU_FEAT_HD;
> > +   }
> > +
> 
> nitpick:
> 
> As per the IORT spec (DEN0049D, 3.1.1.2 SMMUv3 node, Table 10), the
> "HTTU Override" flag of the SMMUv3 node can override the value in
> SMMU_IDR0.HTTU. You may want to check this bit before selecting the
> {HA,HD} features and shout if there is a mismatch between firmware and
> the SMMU implementation. Just like how ARM_SMMU_FEAT_COHERENCY is
> selected.

Thanks for pointing this out, I didn't know about these flags but have
added them to the patch now.

Thanks,
Jean


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Gunthorpe
On Wed, Sep 16, 2020 at 01:19:18AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Tuesday, September 15, 2020 10:29 PM
> >
> > > Do they need a device at all?  It's not clear to me why RID based
> > > IOMMU management fits within vfio's scope, but PASID based does not.
> > 
> > In RID mode vfio-pci completely owns the PCI function, so it is more
> > natural that VFIO, as the sole device owner, would own the DMA mapping
> > machinery. Further, the RID IOMMU mode is rarely used outside of VFIO
> > so there is not much reason to try and disaggregate the API.
> 
> It is also used by vDPA.

A driver in VDPA, not VDPA itself.

> > PASID on the other hand, is shared. vfio-mdev drivers will share the
> > device with other kernel drivers. PASID and DMA will be concurrent
> > with VFIO and other kernel drivers/etc.
> 
> Looks like you are equating PASID to host-side sharing, while ignoring
> another valid usage where a PASID-capable device is passed through
> to the guest via vfio-pci and then PASID is used by the guest
> for guest-side sharing. In such a case, it is an exclusive usage on the
> host side, so what is the problem for VFIO to manage PASID given
> that vfio-pci completely owns the function?

This is no different than vfio-pci being yet another client to
/dev/sva

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Gunthorpe
On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote:
> And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe):
> the PASID space of a PCI function cannot be shared between host and guest,
> so we assign the whole PASID table along with the RID. Since we need the
> BIND, INVALIDATE, and report APIs introduced here to support nested
> translation, a /dev/sva interface would need to support this mode as well.

Well, that means this HW cannot support PASID capable 'SIOV' style
devices in guests.

I admit whole function PASID delegation might be something vfio-pci
should handle - but only if it really doesn't fit in some /dev/sva
after we cover the other PASID cases.

Jason


Re: [PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent

2020-09-16 Thread Neil Armstrong
On 16/09/2020 01:51, Robin Murphy wrote:
> According to a downstream commit I found in the Khadas vendor kernel,
> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
> how to handle this properly) we should describe it as such. Otherwise
> the mismatch leads to all manner of fun with mismatched attributes and
> inadvertently snooping stale data from caches, which would account for
> at least some of the brokenness observed on this platform.
> 
> Signed-off-by: Robin Murphy 
> ---
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> index 9b8548e5f6e5..ee8fcae9f9f0 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> @@ -135,3 +135,7 @@ map1 {
>   };
>   };
>  };
> +
> +&mali {
> + dma-coherent;
> +};
> 

Reviewed-by: Neil Armstrong 


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Gunthorpe
On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote:
> > If user space wants to bind page tables, create the PASID with
> > /dev/sva, use ioctls there to setup the page table the way it wants,
> > then pass the now configured PASID to a driver that can use it. 
> 
> Are we talking about bare metal SVA? 

What a weird term.

> If so, I don't see the need for userspace to know there is a
> PASID. All user space need is that my current mm is bound to a
> device by the driver. So it can be a one-step process for user
> instead of two.

You've missed the entire point of the conversation, VDPA already needs
more than "my current mm is bound to a device"

> > PASID managment and binding is seperated from the driver(s) that are
> > using the PASID.
> 
> Why separate? Drivers need to be involved in PASID life cycle
> management. For example, when tearing down a PASID, the driver needs to
> stop DMA, IOMMU driver needs to unbind, etc. If driver is the control
> point, then things are just in order. I am referring to bare metal SVA.

Drivers can be involved and still have the uAPIs seperate. It isn't hard.

Jason


Re: [PATCH v2] iommu/amd: Restore IRTE.RemapEn bit for amd_iommu_activate_guest_mode

2020-09-16 Thread Maxim Levitsky
On Wed, 2020-09-16 at 11:17 +, Suravee Suthikulpanit wrote:
> Commit e52d58d54a32 ("iommu/amd: Use cmpxchg_double() when updating
> 128-bit IRTE") removed the assumption that modify_irte_ga always sets
> the valid bit, which requires the callers to set the appropriate value
> for the struct irte_ga.valid bit before calling the function.
> 
> Commit 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after
> programming IRTE") already did this for the function
> amd_iommu_deactivate_guest_mode().
> 
> The same change is also needed for amd_iommu_activate_guest_mode().
> Otherwise, this could trigger IO_PAGE_FAULT for VFIO-based VMs with
> AVIC enabled.
> 
> Reported-by: Maxim Levitsky 
> Tested-by: Maxim Levitsky 
> Cc: Joao Martins 
> Fixes: e52d58d54a321 ("iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE")
> Signed-off-by: Suravee Suthikulpanit 
> ---
>  drivers/iommu/amd/iommu.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index e938677af8bc..db4fb840c59c 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3900,14 +3900,18 @@ int amd_iommu_activate_guest_mode(void *data)
>  {
>   struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
>   struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
> + u64 valid;
>  
>   if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
>   !entry || entry->lo.fields_vapic.guest_mode)
>   return 0;
>  
> + valid = entry->lo.fields_vapic.valid;
> +
>   entry->lo.val = 0;
>   entry->hi.val = 0;
>  
> + entry->lo.fields_vapic.valid   = valid;
>   entry->lo.fields_vapic.guest_mode  = 1;
>   entry->lo.fields_vapic.ga_log_intr = 1;
>   entry->hi.fields.ga_root_ptr   = ir_data->ga_root_ptr;

Reviewed-by: Maxim Levitsky 
Best regards,
Maxim Levitsky



Re: [PATCH 0/3] drm: panfrost: Coherency support

2020-09-16 Thread Neil Armstrong
Hi Robin,

On 16/09/2020 01:51, Robin Murphy wrote:
> Hi all,
> 
> I polished up my original proof-of-concept a little while back, but now
> that I've got my hands on my Juno again I've been able to actually test
> it to my satisfaction, so here are proper patches!

I tested on the Khadas VIM3, and yes, it fixes the random faults I get
*without* it:
[  152.417127] panfrost ffe4.gpu: gpu sched timeout, js=0, config=0x7300, status=0x58, head=0x3091400, tail=0x3091400, sched_job=4d83c2d7
[  152.530928] panfrost ffe4.gpu: js fault, js=1, status=INSTR_INVALID_ENC, head=0x30913c0, tail=0x30913c0
[  152.539797] panfrost ffe4.gpu: gpu sched timeout, js=1, config=0x7300, status=0x51, head=0x30913c0, tail=0x30913c0, sched_job=38cecaf6
[  156.943505] panfrost ffe4.gpu: js fault, js=0, status=TILE_RANGE_FAULT, head=0x3091400, tail=0x3091400

but, with this patchset, I get the following fps with kmscube:
Rendered 97 frames in 2.016291 sec (48.108145 fps)
Rendered 206 frames in 4.016723 sec (51.285584 fps)
Rendered 316 frames in 6.017208 sec (52.516052 fps)
Rendered 430 frames in 8.017456 sec (53.632975 fps)

but when I resurrect my BROKEN_NS patchset (simply disabling shareability), I get:
Rendered 120 frames in 2.000143 sec (59.995724 fps)
Rendered 241 frames in 4.016760 sec (59.998605 fps)
Rendered 362 frames in 6.033443 sec (59.998911 fps)
Rendered 482 frames in 8.033531 sec (59.998522 fps)

So I get a performance regression with the dma-coherent approach, even
though it's clearly the cleaner one.

So:
Tested-by: Neil Armstrong 

Neil

> 
> It probably makes sense for patches #1 and #2 to stay together and both
> go via drm-misc, provided Will's OK with that.
> 
> Robin.
> 
> 
> Robin Murphy (3):
>   iommu/io-pgtable-arm: Support coherency for Mali LPAE
>   drm/panfrost: Support cache-coherent integrations
>   arm64: dts: meson: Describe G12b GPU as coherent
> 
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 
>  drivers/gpu/drm/panfrost/panfrost_device.h  | 1 +
>  drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++
>  drivers/gpu/drm/panfrost/panfrost_gem.c | 2 ++
>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 +
>  drivers/iommu/io-pgtable-arm.c  | 5 -
>  6 files changed, 14 insertions(+), 1 deletion(-)
> 



Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jean-Philippe Brucker
On Wed, Sep 16, 2020 at 11:51:48AM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote:
> > And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe):
> > the PASID space of a PCI function cannot be shared between host and guest,
> > so we assign the whole PASID table along with the RID. Since we need the
> > BIND, INVALIDATE, and report APIs introduced here to support nested
> > translation, a /dev/sva interface would need to support this mode as well.
> 
> Well, that means this HW cannot support PASID capable 'SIOV' style
> devices in guests.

It does not yet support Intel SIOV, no. It does support the standards,
though: PCI SR-IOV to partition a device and PASIDs in a guest.

> I admit whole function PASID delegation might be something vfio-pci
> should handle - but only if it really doesn't fit in some /dev/sva
> after we cover the other PASID cases.

Wouldn't that be the duplication you're trying to avoid?  A second channel
for bind, invalidate, capability and fault reporting mechanisms?  If we
extract SVA parts of vfio_iommu_type1 into a separate chardev, PASID table
pass-through [1] will have to use that.

Thanks,
Jean

[1] https://lore.kernel.org/linux-iommu/20200320161911.27494-1-eric.au...@redhat.com/


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Gunthorpe
On Wed, Sep 16, 2020 at 06:20:52PM +0200, Jean-Philippe Brucker wrote:
> On Wed, Sep 16, 2020 at 11:51:48AM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote:
> > > And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe):
> > > the PASID space of a PCI function cannot be shared between host and guest,
> > > so we assign the whole PASID table along with the RID. Since we need the
> > > BIND, INVALIDATE, and report APIs introduced here to support nested
> > > translation, a /dev/sva interface would need to support this mode as well.
> > 
> > Well, that means this HW cannot support PASID capable 'SIOV' style
> > devices in guests.
> 
> It does not yet support Intel SIOV, no. It does support the standards,
> though: PCI SR-IOV to partition a device and PASIDs in a guest.

SIOV is basically standards based; it is better thought of as a
cookbook on how to use PASID and IOMMU together.

> > I admit whole function PASID delegation might be something vfio-pci
> > should handle - but only if it really doesn't fit in some /dev/sva
> > after we cover the other PASID cases.
> 
> Wouldn't that be the duplication you're trying to avoid?  A second
> channel for bind, invalidate, capability and fault reporting
> mechanisms?

Yes, which is why it seems like it would be nicer to avoid it. Why I
said "might" :)

> If we extract SVA parts of vfio_iommu_type1 into a separate chardev,
> PASID table pass-through [1] will have to use that.

Yes, '/dev/sva' (which is a terrible name) would want to be the uAPI
entry point for controlling the vIOMMU related to PASID.

Does anything in the [1] series have tight coupling to VFIO other than
needing to know a bus/device/function? It looks like it is mostly
exposing iommu_* functions as uAPI?

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Raj, Ashok
On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote:
> > > If user space wants to bind page tables, create the PASID with
> > > /dev/sva, use ioctls there to setup the page table the way it wants,
> > > then pass the now configured PASID to a driver that can use it. 
> > 
> > Are we talking about bare metal SVA? 
> 
> What a weird term.

Glad you noticed it at v7 :-) 

Any suggestions on something less weird than Shared Virtual Addressing?
There is a reason why we moved from SVM to SVA.
> 
> > If so, I don't see the need for userspace to know there is a
> > PASID. All user space need is that my current mm is bound to a
> > device by the driver. So it can be a one-step process for user
> > instead of two.
> 
> You've missed the entire point of the conversation, VDPA already needs
> more than "my current mm is bound to a device"

You mean current version of vDPA? or a potential future version of vDPA?

Cheers,
Ashok


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Auger Eric
Hi,
On 9/16/20 6:32 PM, Jason Gunthorpe wrote:
> On Wed, Sep 16, 2020 at 06:20:52PM +0200, Jean-Philippe Brucker wrote:
>> On Wed, Sep 16, 2020 at 11:51:48AM -0300, Jason Gunthorpe wrote:
>>> On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote:
 And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe):
 the PASID space of a PCI function cannot be shared between host and guest,
 so we assign the whole PASID table along with the RID. Since we need the
 BIND, INVALIDATE, and report APIs introduced here to support nested
 translation, a /dev/sva interface would need to support this mode as well.
>>>
>>> Well, that means this HW cannot support PASID capable 'SIOV' style
>>> devices in guests.
>>
>> It does not yet support Intel SIOV, no. It does support the standards,
>> though: PCI SR-IOV to partition a device and PASIDs in a guest.
> 
> SIOV is basically standards based, it is better thought of as a
> cookbook on how to use PASID and IOMMU together.
> 
>>> I admit whole function PASID delegation might be something vfio-pci
>>> should handle - but only if it really doesn't fit in some /dev/sva
>>> after we cover the other PASID cases.
>>
>> Wouldn't that be the duplication you're trying to avoid?  A second
>> channel for bind, invalidate, capability and fault reporting
>> mechanisms?
> 
> Yes, which is why it seems like it would be nicer to avoid it. Why I
> said "might" :)
> 
>> If we extract SVA parts of vfio_iommu_type1 into a separate chardev,
>> PASID table pass-through [1] will have to use that.
> 
> Yes, '/dev/sva' (which is a terrible name) would want to be the uAPI
> entry point for controlling the vIOMMU related to PASID.
> 
> Does anything in the [1] series have tight coupling to VFIO other than
> needing to know a bus/device/function? It looks like it is mostly
> exposing iommu_* functions as uAPI?

This series does not use any PASID, so it fits quite nicely into the VFIO
framework, I think. Besides cache invalidation, which takes the struct
device, the other operations (MSI binding and PASID table passing) operate
on the iommu domain. Also, we use the VFIO memory region and
interrupt/eventfd registration mechanism to return faults.

Thanks

Eric
> 
> Jason
> 



Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Gunthorpe
On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:
> On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote:
> > > > If user space wants to bind page tables, create the PASID with
> > > > /dev/sva, use ioctls there to setup the page table the way it wants,
> > > > then pass the now configured PASID to a driver that can use it. 
> > > 
> > > Are we talking about bare metal SVA? 
> > 
> > What a weird term.
> 
> Glad you noticed it at v7 :-) 
> 
> Any suggestions on something less weird than 
> Shared Virtual Addressing? There is a reason why we moved from SVM
> to SVA.

SVA is fine, what is "bare metal" supposed to mean?

PASID is about constructing an arbitrary DMA IOVA map for PCI-E
devices, being able to intercept device DMA faults, etc.

SVA is doing DMA IOVA 1:1 with the mm_struct CPU VA. DMA faults
trigger the same thing as CPU page faults. If it is not 1:1 then there
is no "shared". When SVA is done using PCI-E PASID it is "PASID for
SVA". Lots of existing devices already have SVA without PASID or
IOMMU, so let's not muddy the terminology.

vPASID/vIOMMU is allowing a guest to control the DMA IOVA map and
manipulate the PASIDs.

vSVA is when a guest uses a vPASID to provide SVA, not sure this is
an informative term.

This particular patch series seems to be about vPASID/vIOMMU for vfio-mdev
vs the other vPASID/vIOMMU patch which was about vPASID for vfio-pci.

> > > If so, I don't see the need for userspace to know there is a
> > > PASID. All user space need is that my current mm is bound to a
> > > device by the driver. So it can be a one-step process for user
> > > instead of two.
> > 
> > You've missed the entire point of the conversation, VDPA already needs
> > more than "my current mm is bound to a device"
> 
> You mean current version of vDPA? or a potential future version of vDPA?

Future VDPA drivers; it was made clear this was important to Intel
during the argument about VDPA as an mdev.

Jason


Re: [PATCH v2] iommu/amd: Restore IRTE.RemapEn bit for amd_iommu_activate_guest_mode

2020-09-16 Thread Joao Martins
On 9/16/20 12:17 PM, Suravee Suthikulpanit wrote:
> Commit e52d58d54a32 ("iommu/amd: Use cmpxchg_double() when updating
> 128-bit IRTE") removed the assumption that modify_irte_ga always sets
> the valid bit, which requires the callers to set the appropriate value
> for the struct irte_ga.valid bit before calling the function.
> 
> Commit 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after
> programming IRTE") already did this for the function
> amd_iommu_deactivate_guest_mode().
> 
> The same change is also needed for amd_iommu_activate_guest_mode().
> Otherwise, this could trigger IO_PAGE_FAULT for VFIO-based VMs with
> AVIC enabled.
> 
> Reported-by: Maxim Levitsky 
> Tested-by: Maxim Levitsky 
> Cc: Joao Martins 
> Fixes: e52d58d54a321 ("iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE")
> Signed-off-by: Suravee Suthikulpanit 
> ---
>  drivers/iommu/amd/iommu.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index e938677af8bc..db4fb840c59c 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3900,14 +3900,18 @@ int amd_iommu_activate_guest_mode(void *data)
>  {
>   struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
>   struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
> + u64 valid;
>  
>   if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
>   !entry || entry->lo.fields_vapic.guest_mode)
>   return 0;
>  
> + valid = entry->lo.fields_vapic.valid;
> +
>   entry->lo.val = 0;
>   entry->hi.val = 0;
>  
> + entry->lo.fields_vapic.valid   = valid;
>   entry->lo.fields_vapic.guest_mode  = 1;
>   entry->lo.fields_vapic.ga_log_intr = 1;
>   entry->hi.fields.ga_root_ptr   = ir_data->ga_root_ptr;
> 
FWIW,

Reviewed-by: Joao Martins 


Re: [PATCH 6/6] dma-mapping: introduce DMA range map, supplanting dma_pfn_offset

2020-09-16 Thread Mathieu Poirier
On Wed, Sep 16, 2020 at 08:14:59AM +0200, Christoph Hellwig wrote:
> From: Jim Quinlan 
> 
> The new field 'dma_range_map' in struct device is used to facilitate the
> use of single or multiple offsets between mapping regions of cpu addrs and
> dma addrs.  It subsumes the role of "dev->dma_pfn_offset" which was only
> capable of holding a single uniform offset and had no region bounds
> checking.
> 
> The function of_dma_get_range() has been modified so that it takes a single
> argument -- the device node -- and returns a map, NULL, or an error code.
> The map is an array that holds the information regarding the DMA regions.
> Each range entry contains the address offset, the cpu_start address, the
> dma_start address, and the size of the region.
> 
> of_dma_configure() is the typical manner to set range offsets but there are
> a number of ad hoc assignments to "dev->dma_pfn_offset" in the kernel
> driver code.  These cases now invoke the function
> dma_direct_set_offset(dev, cpu_addr, dma_addr, size).
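
For illustration, the range map described above boils down to an array of
entries along these lines; this is a sketch whose field names follow the
commit description rather than being copied verbatim from the patch (dev,
cpu_start, dma_start, size and ret below are driver locals):

        struct bus_dma_region {
                phys_addr_t     cpu_start;      /* start of the CPU physical window */
                dma_addr_t      dma_start;      /* device-visible start of the window */
                u64             size;           /* size of the window */
                u64             offset;         /* cached dma_start - cpu_start */
        };

        /* A driver with a single fixed translation window would replace
         * its old ad hoc dma_pfn_offset assignment with something like: */
        ret = dma_direct_set_offset(dev, cpu_start, dma_start, size);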
> 
> Signed-off-by: Jim Quinlan 
> [hch: various interface cleanups]
> Signed-off-by: Christoph Hellwig 
> Tested-by: Nathan Chancellor 
> ---
>  arch/arm/include/asm/dma-direct.h |  9 +--
>  arch/arm/mach-keystone/keystone.c | 17 ++---
>  arch/arm/mach-omap1/include/mach/memory.h |  4 +
>  arch/sh/drivers/pci/pcie-sh7786.c |  9 ++-
>  arch/x86/pci/sta2x11-fixup.c  |  6 +-
>  drivers/acpi/arm64/iort.c |  6 +-
>  drivers/base/core.c   |  2 +
>  drivers/gpu/drm/sun4i/sun4i_backend.c |  8 +-
>  drivers/iommu/io-pgtable-arm.c|  2 +-
>  .../platform/sunxi/sun4i-csi/sun4i_csi.c  |  9 ++-
>  .../platform/sunxi/sun6i-csi/sun6i_csi.c  | 11 ++-
>  drivers/of/address.c  | 73 ---
>  drivers/of/device.c   | 44 ++-
>  drivers/of/of_private.h   | 11 +--
>  drivers/of/unittest.c | 34 ++---
>  drivers/remoteproc/remoteproc_core.c  | 24 +-
>  .../staging/media/sunxi/cedrus/cedrus_hw.c| 10 ++-
>  include/linux/device.h|  4 +-
>  include/linux/dma-direct.h| 54 --
>  include/linux/dma-mapping.h   |  9 ++-
>  kernel/dma/coherent.c |  7 +-
>  kernel/dma/direct.c   | 51 -
>  22 files changed, 285 insertions(+), 119 deletions(-)
> 
> diff --git a/arch/arm/include/asm/dma-direct.h b/arch/arm/include/asm/dma-direct.h
> index fbcf4367b5cb1a..436544aeb83405 100644
> --- a/arch/arm/include/asm/dma-direct.h
> +++ b/arch/arm/include/asm/dma-direct.h
> @@ -12,8 +12,8 @@
>  #ifndef __arch_pfn_to_dma
>  static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn)
>  {
> - if (dev)
> - pfn -= dev->dma_pfn_offset;
> + if (dev && dev->dma_range_map)
> + pfn = PFN_DOWN(translate_phys_to_dma(dev, PFN_PHYS(pfn)));
>   return (dma_addr_t)__pfn_to_bus(pfn);
>  }
>  
> @@ -21,9 +21,8 @@ static inline unsigned long dma_to_pfn(struct device *dev, dma_addr_t addr)
>  {
>   unsigned long pfn = __bus_to_pfn(addr);
>  
> - if (dev)
> - pfn += dev->dma_pfn_offset;
> -
> + if (dev && dev->dma_range_map)
> + pfn = PFN_DOWN(translate_dma_to_phys(dev, PFN_PHYS(pfn)));
>   return pfn;
>  }
>  
> diff --git a/arch/arm/mach-keystone/keystone.c b/arch/arm/mach-keystone/keystone.c
> index dcd031ba84c2e0..09a65c2dfd7327 100644
> --- a/arch/arm/mach-keystone/keystone.c
> +++ b/arch/arm/mach-keystone/keystone.c
> @@ -8,6 +8,7 @@
>   */
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -25,8 +26,6 @@
>  #include "keystone.h"
>  
>  #ifdef CONFIG_ARM_LPAE
> -static unsigned long keystone_dma_pfn_offset __read_mostly;
> -
>  static int keystone_platform_notifier(struct notifier_block *nb,
> unsigned long event, void *data)
>  {
> @@ -39,9 +38,12 @@ static int keystone_platform_notifier(struct notifier_block *nb,
>   return NOTIFY_BAD;
>  
>   if (!dev->of_node) {
> - dev->dma_pfn_offset = keystone_dma_pfn_offset;
> - dev_err(dev, "set dma_pfn_offset%08lx\n",
> - dev->dma_pfn_offset);
> + int ret = dma_direct_set_offset(dev, KEYSTONE_HIGH_PHYS_START,
> + KEYSTONE_LOW_PHYS_START,
> + KEYSTONE_HIGH_PHYS_SIZE);
> + dev_err(dev, "set dma_offset%08llx%s\n",
> + KEYSTONE_HIGH_PHYS_START - KEYSTONE_LOW_PHYS_START,
> + ret ? " failed" : "");
>   }
>   return NOTIFY_OK;
>  }
> @@ -54,11 +56,8 @@ static struct notifier_block platform_nb = {
>  static void __init keystone_init(void)
>  {
>  #ifdef CONFIG_ARM_

Re: [PATCH 0/3] drm: panfrost: Coherency support

2020-09-16 Thread Rob Herring
On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig wrote:
>
> > So I get a performance regression with the dma-coherent approach, even
> > though it's clearly the cleaner one.
>
> That's bizarre -- this should really be the faster of the two.

Coherency may not be free. Cortex-A9 had something like 4x slower
memcpy if SMP was enabled, as an example. I don't know if there's
anything going on like that specifically here. If there are never any
CPU accesses mixed in with kmscube, then there would be no benefit to
coherency.

Rob


Re: [PATCH 0/3] drm: panfrost: Coherency support

2020-09-16 Thread Alyssa Rosenzweig
> So I get a performance regression with the dma-coherent approach, even
> though it's clearly the cleaner one.

That's bizarre -- this should really be the faster of the two.



Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jacob Pan (Jun)
Hi Jason,
On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe wrote:

> On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:
> > On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote:  
> > > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote:  
> > > > > If user space wants to bind page tables, create the PASID with
> > > > > /dev/sva, use ioctls there to setup the page table the way it
> > > > > wants, then pass the now configured PASID to a driver that
> > > > > can use it.   
> > > > 
> > > > Are we talking about bare metal SVA?   
> > > 
> > > What a weird term.  
> > 
> > Glad you noticed it at v7 :-) 
> > 
> > Any suggestions on something less weird than 
> > Shared Virtual Addressing? There is a reason why we moved from SVM
> > to SVA.  
> 
> SVA is fine, what is "bare metal" supposed to mean?
> 
What I meant here is sharing a virtual address space between DMA and a
host process. This requires devices to perform DMA requests with PASID
and use the IOMMU first level/stage 1 page tables.
This can be further divided into 1) user SVA and 2) supervisor SVA
(sharing init_mm).

My point is that /dev/sva is not useful here since the driver can
perform PASID allocation while doing SVA bind.
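
For reference, the driver-managed flow meant here maps onto the kernel's
existing SVA API roughly as follows; a minimal sketch assuming
<linux/iommu.h>, with error handling trimmed (exact return types have
shifted across kernel versions; this very series changes pasid to u32):

        struct iommu_sva *handle;
        int pasid;

        /* Bind the current process address space to the device; the IOMMU
         * core allocates a PASID as part of the bind, so user space never
         * sees a separate allocation step. */
        handle = iommu_sva_bind_device(dev, current->mm, NULL);
        if (IS_ERR(handle))
                return PTR_ERR(handle);

        pasid = iommu_sva_get_pasid(handle);

        /* ... program the device to tag its DMA with this PASID ... */

        /* Tear-down: quiesce device DMA first, then unbind. */
        iommu_sva_unbind_device(handle);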

> PASID is about constructing an arbitary DMA IOVA map for PCI-E
> devices, being able to intercept device DMA faults, etc.
> 
An arbitrary IOVA map does not need PASID. With IOVA, you do map/unmap
explicitly, so why would you need to handle IO page faults?

To me, PASID identifies an address space that is associated with a
mm_struct.
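
By contrast, a rough sketch of the explicit-map IOVA model through the
generic IOMMU API (dev, iova, phys and size are assumed driver-provided);
no PASID and no IO page fault handling involved:

        struct iommu_domain *dom = iommu_domain_alloc(&pci_bus_type);

        iommu_attach_device(dom, dev);

        /* The driver chooses the IOVA and maps it up front; nothing is
         * demand-faulted, so there is no IO page fault to service. */
        iommu_map(dom, iova, phys, size, IOMMU_READ | IOMMU_WRITE);

        /* ... device DMA to [iova, iova + size) ... */

        iommu_unmap(dom, iova, size);
        iommu_detach_device(dom, dev);
        iommu_domain_free(dom);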

> SVA is doing DMA IOVA 1:1 with the mm_struct CPU VA. DMA faults
> trigger the same thing as CPU page faults. If it is not 1:1 then there
> is no "shared". When SVA is done using PCI-E PASID it is "PASID for
> SVA". Lots of existing devices already have SVA without PASID or
> IOMMU, so let's not muddy the terminology.
> 
I agree. This conversation is about "PASID for SVA", not "SVA without
PASID".


> vPASID/vIOMMU is allowing a guest to control the DMA IOVA map and
> manipulate the PASIDs.
> 
> vSVA is when a guest uses a vPASID to provide SVA, not sure this is
> an informative term.
> 
I agree.

> This particular patch series seems to be about vPASID/vIOMMU for
> vfio-mdev vs the other vPASID/vIOMMU patch which was about vPASID for
> vfio-pci.
> 
Yi can correct me, but this set is about VFIO-PCI; VFIO-mdev will be
introduced later.

> > > > If so, I don't see the need for userspace to know there is a
> > > > PASID. All user space need is that my current mm is bound to a
> > > > device by the driver. So it can be a one-step process for user
> > > > instead of two.  
> > > 
> > > You've missed the entire point of the conversation, VDPA already
> > > needs more than "my current mm is bound to a device"  
> > 
> > You mean current version of vDPA? or a potential future version of
> > vDPA?  
> 
> Future VDPA drivers; it was made clear this was important to Intel
> during the argument about VDPA as an mdev.
> 
> Jason


Thanks,

Jacob


Re: [PATCH v9 1/7] docs: IOMMU user API

2020-09-16 Thread Randy Dunlap
On 9/11/20 2:57 PM, Jacob Pan wrote:
> IOMMU UAPI is newly introduced to support communications between guest
> virtual IOMMU and host IOMMU. There have been lots of discussions on how
> it should work with VFIO UAPI and userspace in general.
> 
> This document is intended to clarify the UAPI design and usage. The
> mechanics of how future extensions should be achieved are also covered
> in this documentation.
> 
> Reviewed-by: Eric Auger 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  Documentation/userspace-api/iommu.rst | 211 ++
>  MAINTAINERS   |   1 +
>  2 files changed, 212 insertions(+)
>  create mode 100644 Documentation/userspace-api/iommu.rst

Hi,
I have a few edit changes for you below:


> diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst
> new file mode 100644
> index ..1e68e8f05bb3
> --- /dev/null
> +++ b/Documentation/userspace-api/iommu.rst
> @@ -0,0 +1,211 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. iommu:
> +
> +=
> +IOMMU Userspace API
> +=
> +
> +IOMMU UAPI is used for virtualization cases where communications are
> +needed between physical and virtual IOMMU drivers. For baremetal
> +usage, the IOMMU is a system device which does not need to communicate
> +with user space directly.

userspace

for consistency

> +
> +The primary use cases are guest Shared Virtual Address (SVA) and
> +guest IO virtual address (IOVA), wherin the vIOMMU implementation

wherein

> +relies on the physical IOMMU and for this reason requires interactions
> +with the host driver.
> +
> +.. contents:: :local:
> +
> +Functionalities
> +===
> +Communications of user and kernel involve both directions. The
> +supported user-kernel APIs are as follows:
> +
> +1. Alloc/Free PASID
> +2. Bind/Unbind guest PASID (e.g. Intel VT-d)
> +3. Bind/Unbind guest PASID table (e.g. ARM SMMU)
> +4. Invalidate IOMMU caches upon guest requests
> +5. Report errors to the guest and serve page requests
> +
> +Requirements
> +============
> +The IOMMU UAPIs are generic and extensible to meet the following
> +requirements:
> +
> +1. Emulated and para-virtualised vIOMMUs
> +2. Multiple vendors (Intel VT-d, ARM SMMU, etc.)
> +3. Extensions to the UAPI shall not break existing user space

  userspace

> +
> +Interfaces
> +==========
> +Although the data structures defined in IOMMU UAPI are self-contained,
> +there is no user API functions introduced. Instead, IOMMU UAPI is

   there are no

> +designed to work with existing user driver frameworks such as VFIO.
> +
> +Extension Rules & Precautions
> +-----------------------------
> +When IOMMU UAPI gets extended, the data structures can *only* be
> +modified in two ways:
> +
> +1. Adding new fields by re-purposing the padding[] field. No size change.
> +2. Adding new union members at the end. May increase the structure sizes.
> +
> +No new fields can be added *after* the variable sized union in that it
> +will break backward compatibility when offset moves. A new flag must
> +be introduced whenever a change affects the structure using either
> +method. The IOMMU driver processes the data based on flags which
> +ensures backward compatibility.
> +
> +Version field is only reserved for the unlikely event of UAPI upgrade
> +at its entirety.
> +
> +It's *always* the caller's responsibility to indicate the size of the
> +structure passed by setting argsz appropriately.
> +Though at the same time, argsz is user provided data which is not
> +trusted. The argsz field allows the user app to indicate how much data
> +it is providing, it's still the kernel's responsibility to validate

 providing;

> +whether it's correct and sufficient for the requested operation.
> +
> +Compatibility Checking
> +----------------------
> +When IOMMU UAPI extension results in some structure size increase,
> +IOMMU UAPI code shall handle the following cases:
> +
> +1. User and kernel has exact size match
> +2. An older user with older kernel header (smaller UAPI size) running on a
> +   newer kernel (larger UAPI size)
> +3. A newer user with newer kernel header (larger UAPI size) running
> +   on an older kernel.
> +4. A malicious/misbehaving user pass illegal/invalid size but within

   passing

> +   range. The data may contain garbage.
> +
> +Feature Checking
> +----------------
> +While launching a guest with vIOMMU, it is strongly advised to check
> +the compatibility upfront, as some subsequent errors happening during
> +vIOMMU operation, such as cache invalidation failures cannot be nicely
> +escaladated to the guest due to IOMMU specifications. This can lead to

   escalated

> +catastrophic failures for the users.
> +
> +User applications such as QEMU are expect

Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Gunthorpe
On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote:
> Hi Jason,
> On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe 
> wrote:
> 
> > On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:
> > > On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote:  
> > > > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote:  
> > > > > > If user space wants to bind page tables, create the PASID with
> > > > > > /dev/sva, use ioctls there to setup the page table the way it
> > > > > > wants, then pass the now configured PASID to a driver that
> > > > > > can use it.   
> > > > > 
> > > > > Are we talking about bare metal SVA?   
> > > > 
> > > > What a weird term.  
> > > 
> > > Glad you noticed it at v7 :-) 
> > > 
> > > Any suggestions on something less weird than 
> > > Shared Virtual Addressing? There is a reason why we moved from SVM
> > > to SVA.  
> > 
> > SVA is fine, what is "bare metal" supposed to mean?
> > 
> What I meant here is sharing a virtual address space between DMA and a
> host process. This requires devices to perform DMA requests with PASID
> and use the IOMMU first level/stage 1 page tables.
> This can be further divided into 1) user SVA 2) supervisor SVA (sharing
> init_mm)
> 
> My point is that /dev/sva is not useful here since the driver can
> perform PASID allocation while doing SVA bind.

No, you are thinking too small.

Look at VDPA, it has an SVA uAPI. Some HW might use PASID for the SVA.

When VDPA is used by DPDK it makes sense that the PASID will be SVA and
1:1 with the mm_struct.

When VDPA is used by qemu it makes sense that the PASID will be an
arbitrary IOVA map constructed to be 1:1 with the guest vCPU physical
map. /dev/sva allows a single uAPI to do this kind of setup, and qemu
can support it while supporting a range of SVA kernel drivers. VDPA
and vfio-mdev are obvious initial targets.

*BOTH* are needed.

In general any uAPI for PASID should have the option to use either the
mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs virtually
nothing to implement this in the driver as PASID is just a number, and
gives so much more flexibility.
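
(Illustration only -- a hypothetical bind argument where every name is
invented; it just shows that supporting "either PASID source" costs the
driver one flag plus one number:)

struct example_bind_args {
    __u32 flags;
#define EXAMPLE_BIND_MM_SVA    (1 << 0) /* PASID from an SVA bind of current->mm */
#define EXAMPLE_BIND_EXTERNAL  (1 << 1) /* PASID set up elsewhere, e.g. /dev/sva */
    __u32 pasid;  /* only read when EXAMPLE_BIND_EXTERNAL is set */
};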

> Yi can correct me but this set is about VFIO-PCI; VFIO-mdev will be
> introduced later.

Last patch is:

  vfio/type1: Add vSVA support for IOMMU-backed mdevs

So pretty hard to see how this is not about vfio-mdev, at least a
little..

Jason


Re: intel_iommu=on breaks resume from suspend on several Thinkpad models

2020-09-16 Thread Jarkko Sakkinen
On Sun, Sep 06, 2020 at 11:38:08PM -0400, Ronan Jouchet wrote:
> Hi. This is a follow-up of [BUG]
> https://bugzilla.kernel.org/show_bug.cgi?id=197029 ,
> where Jarkko Sakkinen asks in comment 31 to move discussion here.
> 
> [1.] One line summary of the problem:
> 
> intel_iommu=on breaks resume from suspend on several Thinkpad models
> 
> [2.] Full description of the problem/report:
> 
> With intel_iommu=on, on several Thinkpad models (my personal T560, and
> the X1 Yoga / Yoga 460 of commenters over at [BUG]), suspend does work,
> but pressing POWER / Enter / whatever key fails to resume from suspend.
> 
> Instead, the machine doesn't do anything: system remains suspended,
> the glowing LED keeps glowing, and the only option is to force a
> hard shutdown with a long press on POWER, and start the system again.
> 
> [3.] Keywords (i.e., modules, networking, kernel):
> 
> suspend, resume, power management, laptop, lenovo, ibm, thinkpad, intel
> 
> [4.] Kernel information
> 
> [4.1.] Kernel version (from /proc/version):
> 
> Linux version 5.8.7-arch1-1 (linux@archlinux)
>   (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.35) #1 SMP PREEMPT
>   Sat, 05 Sep 2020 12:31:32 +
> 
> This is the official `linux` package currently in Arch's `core` repo:
> https://www.archlinux.org/packages/core/x86_64/linux/
> 
> [4.2.] Kernel .config file:
> 
> https://github.com/archlinux/svntogit-packages/blob/packages/linux/trunk/config
> 
> [5.] Most recent kernel version which did not have the bug:
> 
> Undetermined.
> 
> I witnessed the bug in Linux [ 4.13 , 5.8.7 ] but the bug predates 4.13.
> I first noticed it in 4.13 because it's the first version where Arch
> shipped a kernel enabling `intel_iommu=on` by default.
> 
> Since then, following the Arch Linux sister bug report linked below at
> [ARCH-BUG], Arch kernel packagers switched back to `intel_iommu=off`.
> 
> [X. Other notes and bugzilla bug summary/chronology]
> 
> X.1. This is a follow-up to these threads:
>  - [BUG] https://bugzilla.kernel.org/show_bug.cgi?id=197029
>  - [ARCH-BBS] https://bbs.archlinux.org/viewtopic.php?pid=1737688
>  - [ARCH-BUG] https://bugs.archlinux.org/task/55705
> 
> X.2. Over at [ARCH-BBS], someone suggested I try `intel_iommu=igfx_off`
>  rather than full `intel_iommu=off`. It's not enough; even with
>  `intel_iommu=igfx_off`, resume from suspend is broken.
> 
> X.3. The same commenter over at [ARCH-BBS] suggests this bug might be
>  related to https://bugs.freedesktop.org/show_bug.cgi?id=89360
> 
> X.4. Problem was brought to the Linux IOMMU list:
> 
> https://lists.linuxfoundation.org/pipermail/iommu/2017-September/024382.html
> 
> X.5. Several Reddit commenters confirmed the problem:
> 
> https://www.reddit.com/r/archlinux/comments/72z2rv/linux_41331_is_in_core/dnmjaeo/
> 
> X.6. On 2017-09-30, bugzilla commenter Albert wrote at
>  https://bugzilla.kernel.org/show_bug.cgi?id=197029#c9 that:
> 
>  > I'm seeing this on my X1 Yoga (gen1) as well.
>  >
>  > When going to suspend (via systemctl suspend) with the default
>  > (intel_iommu=on), the power light starts fading/"breathing",
>  > but the audio mute LED stays on and the machine hangs.
>  >
>  > With intel_iommu=off, the power light breathes as well and the
>  > auto mute LED turns off correctly. I can then resume it normally
>  > (by pressing the Fn key).
> 
> X.7. On 2017-10-16, Lu Baolu from Intel wrote at
>  https://bugzilla.kernel.org/show_bug.cgi?id=197029#c13 that:
> 
>  > This issue has been narrowed down to a hidden ME device which
>  > is not OS aware. The main symptom is below error log message
>  > and system fails to resume after being suspended.
>  >
>  > DMAR: DRHD: handling fault status reg 3
>  > DMAR: [DMA Read] Request device [00:12.4] fault addr b7fff000
>  >   [fault reason 02] Present bit in context entry is clear
>  >
>  > A quick workaround is make PTP OS aware in BIOS configuration.
>  > It's likely at "PCH-FW Configuration"->"PTP aware OS".
> 
>  However, I couldn't find such an option in my T560's BIOS :-/
> 
> X.8. On 2017-11-13, Lu Baolu from Intel wrote at
>  https://bugzilla.kernel.org/show_bug.cgi?id=197029#c18 that:
> 
>  > This bug is still under investigation. We have narrowed it
>  > as a regression caused by a previous commit.
>  > The commit owner is now working on a fix.
> 
> X.9. On 2020-02-03, to my followup requests, Lu Baolu wrote at
>  https://bugzilla.kernel.org/show_bug.cgi?id=197029#c21 that:
> 
>  > It seems to be caused by below commit:
>  >
>  > commit 422eac3f7deae34dbaffd08e03e27f37a5394a56
>  > Author: Jarkko Sakkinen 
>  > Date: Tue Apr 19 12:54:18 2016 +0300
>  >
>  > tpm_crb: fix mapping of the buffers
>  >
>  > On my Lenovo x250 the following situation occurs:
>  >
>  > [18697.813871] tpm_crb MSFT0101:00: can't request region for res

[PATCH V2] dma-direct: Fix potential NULL pointer dereference

2020-09-16 Thread Thomas Tai
When booting kernel v5.9-rc4 on a VM, the kernel panics while printing
a warning message in swiotlb_map(): dev->dma_mask must not be a NULL
pointer when calling into the DMA mapping layer. Add NULL pointer
checks to avoid the panic.
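
For context, the DMA mask is normally wired up long before any mapping:
the bus code points dev->dma_mask at storage, and the driver sizes the
mask in probe(). A rough sketch of the driver half (illustrative only,
not part of this patch):

#include <linux/dma-mapping.h>

static int example_probe(struct device *dev)
{
    /* Sketch: size the DMA mask in probe(). If the bus never set up
     * dev->dma_mask at all, dma_set_mask*() fails with -EIO -- and
     * with this patch a later dma_map_*() call warns once and returns
     * an error instead of crashing inside swiotlb_map().
     */
    int rc = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));

    if (rc)
        rc = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));

    return rc; /* refuse to probe rather than DMA with no mask */
}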

[drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 0
 BUG: kernel NULL pointer dereference, address: 
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x) - not-present page
 PGD 0 P4D 0
 Oops:  [#1] SMP PTI
 CPU: 1 PID: 331 Comm: systemd-udevd Not tainted 5.9.0-rc4 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
 BIOS 1.13.0-1ubuntu1 04/01/2014
 RIP: 0010:swiotlb_map+0x1ac/0x200
 Code: e8 d9 fc ff ff 80 3d 92 ee 4c 01 00 75 51 49 8b 84 24 48 02 00 00
 4d 8b 6c 24 50 c6 05 7c ee 4c 01 01 4d 8b bc 24 58 02 00 00 <4c> 8b 30
 4d 85 ed 75 04 4d 8b 2c 24 4c 89 e7 e8 10 6b 4f 00 4d 89
 RSP: 0018:9f96801af6f8 EFLAGS: 00010246
 RAX:  RBX: 1000 RCX: 0080
 RDX: 007f RSI: 0202 RDI: 0202
 RBP: 9f96801af748 R08:  R09: 0020
 R10:  R11: 8fabfffa3000 R12: 8faad02c7810
 R13:  R14: 0020 R15: 
 FS:  7fabc63588c0() GS:8fabf7c8()
 knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2:  CR3: 000151496005 CR4: 00370ee0
 DR0:  DR1:  DR2: 
 DR3:  DR6: fffe0ff0 DR7: 0400
 Call Trace:
  dma_direct_map_sg+0x124/0x210
  dma_map_sg_attrs+0x32/0x50
  drm_gem_shmem_get_pages_sgt+0x6a/0x90 [drm]
  virtio_gpu_object_create+0x140/0x2f0 [virtio_gpu]
  ? ww_mutex_unlock+0x26/0x30
  virtio_gpu_mode_dumb_create+0xab/0x160 [virtio_gpu]
  drm_mode_create_dumb+0x82/0x90 [drm]
  drm_client_framebuffer_create+0xaa/0x200 [drm]
  drm_fb_helper_generic_probe+0x59/0x150 [drm_kms_helper]
  drm_fb_helper_single_fb_probe+0x29e/0x3e0 [drm_kms_helper]
  __drm_fb_helper_initial_config_and_unlock+0x41/0xd0 [drm_kms_helper]
  drm_fbdev_client_hotplug+0xe6/0x1a0 [drm_kms_helper]
  drm_fbdev_generic_setup+0xaf/0x170 [drm_kms_helper]
  virtio_gpu_probe+0xea/0x100 [virtio_gpu]
  virtio_dev_probe+0x14b/0x1e0 [virtio]
  really_probe+0x1db/0x440
  driver_probe_device+0xe9/0x160
  device_driver_attach+0x5d/0x70
  __driver_attach+0x8f/0x150
  ? device_driver_attach+0x70/0x70
  bus_for_each_dev+0x7e/0xc0
  driver_attach+0x1e/0x20
  bus_add_driver+0x152/0x1f0
  driver_register+0x74/0xd0
  ? 0xc0529000
  register_virtio_driver+0x20/0x30 [virtio]
  virtio_gpu_driver_init+0x15/0x1000 [virtio_gpu]
  do_one_initcall+0x4a/0x1fa
  ? _cond_resched+0x19/0x30
  ? kmem_cache_alloc_trace+0x16b/0x2e0
  do_init_module+0x62/0x240
  load_module+0xe0e/0x1100
  ? security_kernel_post_read_file+0x5c/0x70
  __do_sys_finit_module+0xbe/0x120
  ? __do_sys_finit_module+0xbe/0x120
  __x64_sys_finit_module+0x1a/0x20
  do_syscall_64+0x38/0x50
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Thomas Tai 
---
 include/linux/dma-direct.h |  3 ---
 kernel/dma/mapping.c   | 11 +++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 6e87225..0648708 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -62,9 +62,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
 {
dma_addr_t end = addr + size - 1;
 
-   if (!dev->dma_mask)
-   return false;
-
if (is_ram && !IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
return false;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 0d12942..7133d5c 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -144,6 +144,10 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
dma_addr_t addr;
 
BUG_ON(!valid_dma_direction(dir));
+
+   if (WARN_ON_ONCE(!dev->dma_mask))
+   return DMA_MAPPING_ERROR;
+
if (dma_map_direct(dev, ops))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
@@ -179,6 +183,10 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
int ents;
 
BUG_ON(!valid_dma_direction(dir));
+
+   if (WARN_ON_ONCE(!dev->dma_mask))
+   return 0;
+
if (dma_map_direct(dev, ops))
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
@@ -213,6 +221,9 @@ dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
 
BUG_ON(!valid_dma_direction(dir));
 
+   if (WARN_ON_ONCE(!dev->dma_mask))
+   return DMA_MAPPING_ERROR;
+
/* Don't allow RAM to be mapped */
if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr
 

Re: [PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-09-16 Thread Bjorn Helgaas
On Tue, Jul 07, 2020 at 03:46:04PM -0700, Rajat Jain wrote:
> When enabling ACS, enable translation blocking for external facing ports
> and untrusted devices.
> 
> Signed-off-by: Rajat Jain 

Applied (slightly modified) to pci/acs for v5.10, thanks!

I think the warning is superfluous because every external_facing
device is a Root Port or Switch Downstream Port, and if those support
ACS at all, they are required to support Translation Blocking.  So we
should only see the warning if the device is defective, and I don't
think we need to go out of our way to look for those.

> ---
> v4: Add braces to avoid warning from kernel robot
> print warning for only external-facing devices.
> v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> Minor code comments fixes.
> v2: Commit log change
> 
>  drivers/pci/pci.c|  8 
>  drivers/pci/quirks.c | 15 +++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 73a8627822140..a5a6bea7af7ce 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   /* Upstream Forwarding */
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Enable Translation Blocking for external devices */
> + if (dev->external_facing || dev->untrusted) {
> + if (cap & PCI_ACS_TB)
> + ctrl |= PCI_ACS_TB;
> + else if (dev->external_facing)
> + pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> + }
> +
>   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b341628e47527..bb22b46c1d719 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct pci_dev *dev)
>   }
>  }
>  
> +/*
> + * Currently this quirk does the equivalent of
> + * PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF
> + *
> + * TODO: This quirk also needs to do equivalent of PCI_ACS_TB,
> + * if dev->external_facing || dev->untrusted
> + */
>  static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev)
>  {
>   if (!pci_quirk_intel_pch_acs_match(dev))
> @@ -4973,6 +4980,14 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
>   ctrl |= (cap & PCI_ACS_CR);
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Enable Translation Blocking for external devices */
> + if (dev->external_facing || dev->untrusted) {
> + if (cap & PCI_ACS_TB)
> + ctrl |= PCI_ACS_TB;
> + else if (dev->external_facing)
> + pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> + }
> +
>   pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl);
>  
>   pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n");
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 


Re: [PATCH v5] PCI/ACS: Enable PCI_ACS_TB and disable only when needed for ATS

2020-09-16 Thread Bjorn Helgaas
On Tue, Jul 14, 2020 at 01:15:40PM -0700, Rajat Jain wrote:
> The ACS "Translation Blocking" bit blocks Translated Requests from
> devices. We don't expect such traffic from devices unless ATS is
> enabled on them. A device sending such traffic without ATS enabled
> indicates malicious intent, and thus should be blocked.
> 
> Enable PCI_ACS_TB by default for all devices, where it stays enabled
> until at least one device downstream wants to enable ATS. It is
> disabled only to enable ATS on a device downstream of it, and
> re-enabled once no downstream device needs ATS.
> 
> Signed-off-by: Rajat Jain 

I applied v4 of this patch instead because I think the complexity of
this one, where we have to walk up the tree and disable TB in upstream
bridges, is too high.  It's always tricky to modify the state of
device Y when we're doing something for device X.

> ---
> Note that I'm ignoring the devices that require quirks to enable or
> disable ACS instead of using the standard ACS configuration registers.
> The reason is that it would require adding yet another quirk table or
> quirk function pointer that I don't know how to implement for those
> devices, and I don't have the devices to test that code either.
> 
> v5: Enable TB and disable ATS for all devices on boot. Disable TB later
> only if needed to enable ATS on downstream devices.
> v4: Add braces to avoid warning from kernel robot
> print warning for only external-facing devices.
> v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> Minor code comments fixes.
> v2: Commit log change
> 
>  drivers/pci/ats.c   |  5 
>  drivers/pci/pci.c   | 57 +
>  drivers/pci/pci.h   |  2 ++
>  drivers/pci/probe.c |  2 +-
>  include/linux/pci.h |  2 ++
>  5 files changed, 67 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index b761c1f72f67..e2ea9083f30f 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -28,6 +28,9 @@ void pci_ats_init(struct pci_dev *dev)
>   return;
>  
>   dev->ats_cap = pos;
> +
> + dev->ats_enabled = 1; /* To avoid WARN_ON from pci_disable_ats() */
> + pci_disable_ats(dev);
>  }
>  
>  /**
> @@ -82,6 +85,7 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
>   }
>   pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
>  
> + pci_disable_acs_trans_blocking(dev);
>   dev->ats_enabled = 1;
>   return 0;
>  }
> @@ -102,6 +106,7 @@ void pci_disable_ats(struct pci_dev *dev)
>   ctrl &= ~PCI_ATS_CTRL_ENABLE;
>   pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
>  
> + pci_enable_acs_trans_blocking(dev);
>   dev->ats_enabled = 0;
>  }
>  EXPORT_SYMBOL_GPL(pci_disable_ats);
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 73a862782214..614e3c1e8c56 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -876,6 +876,9 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   /* Upstream Forwarding */
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Translation Blocking */
> + ctrl |= (cap & PCI_ACS_TB);
> +
>   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> @@ -904,6 +907,60 @@ static void pci_enable_acs(struct pci_dev *dev)
>   pci_disable_acs_redir(dev);
>  }
>  
> +void pci_disable_acs_trans_blocking(struct pci_dev *pdev)
> +{
> + u16 cap, ctrl, pos;
> + struct pci_dev *dev;
> +
> + if (!pci_acs_enable)
> + return;
> +
> + for (dev = pdev; dev; dev = pci_upstream_bridge(dev)) {
> +
> + pos = dev->acs_cap;
> + if (!pos)
> + continue;
> +
> + /*
> +  * Disable translation blocking when first downstream
> +  * device that needs it (for ATS) wants to enable ATS
> +  */
> + if (++dev->ats_dependencies == 1) {
> + pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
> + pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> + ctrl &= ~(cap & PCI_ACS_TB);
> + pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> + }
> + }
> +}
> +
> +void pci_enable_acs_trans_blocking(struct pci_dev *pdev)
> +{
> + u16 cap, ctrl, pos;
> + struct pci_dev *dev;
> +
> + if (!pci_acs_enable)
> + return;
> +
> + for (dev = pdev; dev; dev = pci_upstream_bridge(dev)) {
> +
> + pos = dev->acs_cap;
> + if (!pos)
> + continue;
> +
> + /*
> +  * Enable translation blocking when last downstream device
> +  * that depends on it (for ATS), doesn't need ATS anymore
> +  */
> + if (--dev->ats_dependencies == 0) {
> + pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
> + pci_read_config_wo

Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jacob Pan (Jun)
Hi Jason,
On Wed, 16 Sep 2020 15:38:41 -0300, Jason Gunthorpe 
wrote:

> On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote:
> > Hi Jason,
> > On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe 
> > wrote:
> >   
> > > On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:  
> > > > On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe
> > > > wrote:
> > > > > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun)
> > > > > wrote:
> > > > > > > If user space wants to bind page tables, create the PASID
> > > > > > > with /dev/sva, use ioctls there to setup the page table
> > > > > > > the way it wants, then pass the now configured PASID to a
> > > > > > > driver that can use it. 
> > > > > > 
> > > > > > Are we talking about bare metal SVA? 
> > > > > 
> > > > > What a weird term.
> > > > 
> > > > Glad you noticed it at v7 :-) 
> > > > 
> > > > Any suggestions on something less weird than 
> > > > Shared Virtual Addressing? There is a reason why we moved from
> > > > SVM to SVA.
> > > 
> > > SVA is fine, what is "bare metal" supposed to mean?
> > >   
> > What I meant here is sharing a virtual address space between DMA and
> > a host process. This requires devices to perform DMA requests with
> > PASID and use the IOMMU first level/stage 1 page tables.
> > This can be further divided into 1) user SVA 2) supervisor SVA
> > (sharing init_mm)
> > 
> > My point is that /dev/sva is not useful here since the driver can
> > perform PASID allocation while doing SVA bind.  
> 
> No, you are thinking too small.
> 
> Look at VDPA, it has an SVA uAPI. Some HW might use PASID for the SVA.
> 
Could you point me to the SVA uAPI? I couldn't find it in mainline.
It seems vDPA uses the VHOST interface?

> When VDPA is used by DPDK it makes sense that the PASID will be SVA
> and 1:1 with the mm_struct.
> 
I still don't see why bare metal DPDK needs to get a handle on the
PASID. Perhaps the SVA patch would explain. Or are you talking about
the vDPA DPDK process that is used to support virtio-net-pmd in the
guest?

> When VDPA is used by qemu it makes sense that the PASID will be an
> arbitrary IOVA map constructed to be 1:1 with the guest vCPU physical
> map. /dev/sva allows a single uAPI to do this kind of setup, and qemu
> can support it while supporting a range of SVA kernel drivers. VDPA
> and vfio-mdev are obvious initial targets.
> 
> *BOTH* are needed.
> 
> In general any uAPI for PASID should have the option to use either the
> mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs virtually
> nothing to implement this in the driver as PASID is just a number, and
> gives so much more flexibility.
> 
Not really nothing in terms of PASID life cycle. For example, if a user
uses the uacce interface to open an accelerator, it gets an FD_acc. Then
it opens /dev/sva to allocate a PASID and gets another FD_pasid. Then we
pass FD_pasid to the driver to bind page tables, perhaps to multiple
drivers. Now we have to worry about what happens if FD_pasid is closed
before the FD_acc(s) are closed, and all these race conditions.

If we do not expose FD_pasid to the user, the teardown is much simpler
and streamlined. Following each FD_acc close, PASID unbind is performed.

> > Yi can correct me but this set is about VFIO-PCI; VFIO-mdev will
> > be introduced later.  
> 
> Last patch is:
> 
>   vfio/type1: Add vSVA support for IOMMU-backed mdevs
> 
> So pretty hard to see how this is not about vfio-mdev, at least a
> little..
> 
> Jason


Thanks,

Jacob


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Wang


On 2020/9/17 7:09 AM, Jacob Pan (Jun) wrote:
> Hi Jason,
> On Wed, 16 Sep 2020 15:38:41 -0300, Jason Gunthorpe 
> wrote:
>
>> On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote:
>>> Hi Jason,
>>> On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe 
>>> wrote:
>>>
>>>> On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:
>>>>> On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe
>>>>> wrote:
>>>>>> On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun)
>>>>>> wrote:
>>>>>>>> If user space wants to bind page tables, create the PASID
>>>>>>>> with /dev/sva, use ioctls there to setup the page table
>>>>>>>> the way it wants, then pass the now configured PASID to a
>>>>>>>> driver that can use it.
>>>>>>>
>>>>>>> Are we talking about bare metal SVA?
>>>>>>
>>>>>> What a weird term.
>>>>>
>>>>> Glad you noticed it at v7 :-)
>>>>>
>>>>> Any suggestions on something less weird than
>>>>> Shared Virtual Addressing? There is a reason why we moved from
>>>>> SVM to SVA.
>>>>
>>>> SVA is fine, what is "bare metal" supposed to mean?
>>>>
>>> What I meant here is sharing a virtual address space between DMA
>>> and a host process. This requires devices to perform DMA requests
>>> with PASID and use the IOMMU first level/stage 1 page tables.
>>> This can be further divided into 1) user SVA 2) supervisor SVA
>>> (sharing init_mm)
>>>
>>> My point is that /dev/sva is not useful here since the driver can
>>> perform PASID allocation while doing SVA bind.
>>
>> No, you are thinking too small.
>>
>> Look at VDPA, it has an SVA uAPI. Some HW might use PASID for the SVA.
>
> Could you point me to the SVA uAPI? I couldn't find it in mainline.
> It seems vDPA uses the VHOST interface?

It's the vhost_iotlb_msg defined in uapi/linux/vhost_types.h.
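
(For reference, a sketch of that structure as found in the v5.9 uapi
header; the inline comments here are annotations, not from the header:)

struct vhost_iotlb_msg {
    __u64 iova;   /* device/guest address being mapped */
    __u64 size;
    __u64 uaddr;  /* backing userspace virtual address */
    __u8 perm;    /* VHOST_ACCESS_RO/WO/RW */
    __u8 type;    /* VHOST_IOTLB_UPDATE/INVALIDATE/MISS */
};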
>> When VDPA is used by DPDK it makes sense that the PASID will be SVA
>> and 1:1 with the mm_struct.
>
> I still don't see why bare metal DPDK needs to get a handle on the
> PASID.

My understanding is that it may:

- have a unified uAPI with vSVA: alloc, bind, unbind, free
- leave the binding policy to userspace instead of using an implied
  one in the kernel

> Perhaps the SVA patch would explain. Or are you talking about the
> vDPA DPDK process that is used to support virtio-net-pmd in the guest?
>
>> When VDPA is used by qemu it makes sense that the PASID will be an
>> arbitrary IOVA map constructed to be 1:1 with the guest vCPU physical
>> map. /dev/sva allows a single uAPI to do this kind of setup, and qemu
>> can support it while supporting a range of SVA kernel drivers. VDPA
>> and vfio-mdev are obvious initial targets.
>>
>> *BOTH* are needed.
>>
>> In general any uAPI for PASID should have the option to use either the
>> mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs virtually
>> nothing to implement this in the driver as PASID is just a number, and
>> gives so much more flexibility.
>
> Not really nothing in terms of PASID life cycle. For example, if a user
> uses the uacce interface to open an accelerator, it gets an FD_acc. Then
> it opens /dev/sva to allocate a PASID and gets another FD_pasid. Then we
> pass FD_pasid to the driver to bind page tables, perhaps to multiple
> drivers. Now we have to worry about what happens if FD_pasid is closed
> before the FD_acc(s) are closed, and all these race conditions.

I'm not sure I understand this. But this demonstrates the flexibility of
a unified uAPI. E.g. it allows a vDPA and a VFIO device to use the same
PASID, which can be shared with a process in the guest.

For the race condition, it could probably be solved with a refcount.

Thanks

> If we do not expose FD_pasid to the user, the teardown is much simpler
> and streamlined. Following each FD_acc close, PASID unbind is performed.
>
>>> Yi can correct me but this set is about VFIO-PCI; VFIO-mdev will
>>> be introduced later.
>>
>> Last patch is:
>>
>>    vfio/type1: Add vSVA support for IOMMU-backed mdevs
>>
>> So pretty hard to see how this is not about vfio-mdev, at least a
>> little..
>>
>> Jason
>
>
> Thanks,
>
> Jacob




RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 16, 2020 10:45 PM
> 
> On Wed, Sep 16, 2020 at 01:19:18AM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Tuesday, September 15, 2020 10:29 PM
> > >
> > > > Do they need a device at all?  It's not clear to me why RID based
> > > > IOMMU management fits within vfio's scope, but PASID based does not.
> > >
> > > In RID mode vfio-pci completely owns the PCI function, so it is more
> > > natural that VFIO, as the sole device owner, would own the DMA
> mapping
> > > machinery. Further, the RID IOMMU mode is rarely used outside of VFIO
> > > so there is not much reason to try and disaggregate the API.
> >
> > It is also used by vDPA.
> 
> A driver in VDPA, not VDPA itself.

What is the difference? It is still an example of using the RID IOMMU
mode outside of VFIO (and it just implies that vDPA doesn't even do a
good abstraction internally).

> 
> > > PASID on the other hand, is shared. vfio-mdev drivers will share the
> > > device with other kernel drivers. PASID and DMA will be concurrent
> > > with VFIO and other kernel drivers/etc.
> >
> > Looks you are equating PASID to host-side sharing, while ignoring
> > another valid usage that a PASID-capable device is passed through
> > to the guest through vfio-pci and then PASID is used by the guest
> > for guest-side sharing. In such case, it is an exclusive usage in host
> > side and then what is the problem for VFIO to manage PASID given
> > that vfio-pci completely owns the function?
> 
> This is no different than vfio-pci being yet another client to
> /dev/sva
> 

My comment was to echo Alex's question about "why RID based
IOMMU management fits within vfio's scope, but PASID based
does not". And when talking about generalization, we should look
bigger, beyond SVA. What really matters here is the iommu_domain,
which covers everything related to DMA mapping. The domain
associated with a passthrough device is marked as "unmanaged" in
the kernel and allows userspace to manage the DMA mappings of this
device through a set of iommu_ops (sketched after the list below):

- alloc/free domain;
- attach/detach device/subdevice;
- map/unmap a memory region;
- bind/unbind page table and invalidate iommu cache;
- ... (and lots of other callbacks)
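
As a reference point, these callbacks map roughly onto struct iommu_ops
in the kernel. A heavily abridged sketch (member names follow
include/linux/iommu.h around v5.9, trimmed and not exact):

struct iommu_ops_abridged {
    struct iommu_domain *(*domain_alloc)(unsigned int type);
    void (*domain_free)(struct iommu_domain *domain);
    int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
    void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
    int (*map)(struct iommu_domain *domain, unsigned long iova,
               phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
    size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
                    size_t size, struct iommu_iotlb_gather *gather);
    /* guest PASID binding and cache invalidation, as used by vSVA */
    int (*sva_bind_gpasid)(struct iommu_domain *domain,
                           struct device *dev,
                           struct iommu_gpasid_bind_data *data);
    int (*sva_unbind_gpasid)(struct device *dev, u32 pasid);
    int (*cache_invalidate)(struct iommu_domain *domain,
                            struct device *dev,
                            struct iommu_cache_invalidate_info *inv_info);
    /* ... many more ... */
};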

map/unmap and bind/unbind are just different ways of managing
DMAs in an iommu domain. The passthrough framework (VFIO
or vDPA) has been providing its uAPI to manage every aspect of
the iommu_domain so far, and SVA is just a natural extension
following this design. If we really want to generalize something,
it needs to be /dev/iommu as a unified interface for managing
every aspect of the iommu_domain. Carving out an SVA abstraction
alone just creates unnecessary mess for both the kernel (syncing
domain/device association between /dev/vfio and /dev/sva) and
userspace (talking to two interfaces even for the same vfio-pci
device). It then sounds more like a band-aid for saving development
effort in vDPA (which should instead have proposed /dev/iommu when
it was invented, rather than reinventing its own bits until that
effort became unaffordable and then asking for a partial
abstraction to fix its gap).

Thanks
Kevin