[PATCH v4 1/3] iommu/arm-smmu: Allow implementation specific write_s2cr

2020-10-16 Thread Bjorn Andersson
The firmware found in some Qualcomm platforms intercepts writes to the
S2CR register in order to replace the BYPASS type with FAULT. Furthermore,
it treats faults at this level as catastrophic and restarts the
device.

Add support for providing implementation specific versions of the S2CR
write function, to allow the Qualcomm driver to work around this
behavior.
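
For illustration, an implementation hooking this would take roughly the
following shape (a minimal sketch using a hypothetical "foo" implementation;
patch 3/3 in this series adds the real Qualcomm one):

    static void foo_write_s2cr(struct arm_smmu_device *smmu, int idx)
    {
            struct arm_smmu_s2cr *s2cr = smmu->s2crs + idx;
            u32 reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, s2cr->type) |
                      FIELD_PREP(ARM_SMMU_S2CR_CBNDX, s2cr->cbndx) |
                      FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);

            /* apply implementation specific fixups to reg here */
            arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_S2CR(idx), reg);
    }

    static const struct arm_smmu_impl foo_impl = {
            .write_s2cr = foo_write_s2cr,
    };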

Signed-off-by: Bjorn Andersson 
---

Changes since v3:
- New patch

 drivers/iommu/arm/arm-smmu/arm-smmu.c | 22 ++++++++++++++--------
 drivers/iommu/arm/arm-smmu/arm-smmu.h |  1 +
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index dad7fa86fbd4..ed3f0428c110 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -929,14 +929,20 @@ static void arm_smmu_write_smr(struct arm_smmu_device *smmu, int idx)
 static void arm_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx)
 {
struct arm_smmu_s2cr *s2cr = smmu->s2crs + idx;
-   u32 reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, s2cr->type) |
- FIELD_PREP(ARM_SMMU_S2CR_CBNDX, s2cr->cbndx) |
- FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);
-
-   if (smmu->features & ARM_SMMU_FEAT_EXIDS && smmu->smrs &&
-   smmu->smrs[idx].valid)
-   reg |= ARM_SMMU_S2CR_EXIDVALID;
-   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_S2CR(idx), reg);
+   u32 reg;
+
+   if (smmu->impl && smmu->impl->write_s2cr) {
+   smmu->impl->write_s2cr(smmu, idx);
+   } else {
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, s2cr->type) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, s2cr->cbndx) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);
+
+   if (smmu->features & ARM_SMMU_FEAT_EXIDS && smmu->smrs &&
+   smmu->smrs[idx].valid)
+   reg |= ARM_SMMU_S2CR_EXIDVALID;
+   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_S2CR(idx), reg);
+   }
 }
 
 static void arm_smmu_write_sme(struct arm_smmu_device *smmu, int idx)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 1a746476927c..b71647eaa319 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -436,6 +436,7 @@ struct arm_smmu_impl {
int (*alloc_context_bank)(struct arm_smmu_domain *smmu_domain,
  struct arm_smmu_device *smmu,
  struct device *dev, int start);
+   void (*write_s2cr)(struct arm_smmu_device *smmu, int idx);
 };
 
 #define INVALID_SMENDX -1
-- 
2.28.0



[PATCH v4 0/3] iommu/arm-smmu-qcom: Support maintaining bootloader mappings

2020-10-16 Thread Bjorn Andersson
This is the fourth attempt at inheriting the stream mapping for the framebuffer
on many Qualcomm platforms, in order not to hit catastrophic faults during
arm-smmu initialization.

Based on Robin's suggestion, the new version takes a much more direct approach:
a context bank is allocated for bypass emulation, and the use of this context
bank is kept pretty much isolated to the Qualcomm-specific implementation.

As before, the patchset has been tested to boot DB845c (with splash screen) and
Lenovo Yoga C630 (with EFI framebuffer).

Bjorn Andersson (3):
  iommu/arm-smmu: Allow implementation specific write_s2cr
  iommu/arm-smmu-qcom: Read back stream mappings
  iommu/arm-smmu-qcom: Implement S2CR quirk

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 92 ++++++++++++++++++++++++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu.c      | 22 +++++---
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  1 +
 3 files changed, 107 insertions(+), 8 deletions(-)

-- 
2.28.0



[PATCH v4 2/3] iommu/arm-smmu-qcom: Read back stream mappings

2020-10-16 Thread Bjorn Andersson
The Qualcomm boot loader configures stream mappings for the peripherals
that it accesses, and in particular it sets up the stream mapping for the
display controller so that it is allowed to scan out a splash screen or
EFI framebuffer.

Read back the stream mappings during initialization and make the
arm-smmu driver maintain the streams in bypass mode.
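
For reference, each Stream Match Register packs a valid bit (bit 31), a mask
(bits 30:16) and a stream ID (bits 14:0), which is what the FIELD_GET() calls
in the patch below extract. Decoding a hypothetical leftover SMR value as an
example:

    SMR = 0x80000420  ->  VALID = 1
                          MASK  = (SMR >> 16) & 0x7fff = 0x0000
                          ID    =  SMR & 0x7fff        = 0x0420

A mask of 0 means the entry matches exactly one stream ID.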

Signed-off-by: Bjorn Andersson 
---

Changes since v3:
- Extracted from a different patch in v3.
- Now configures the stream as BYPASS, rather than translate, which should work
  for platforms with working S2CR handling as well.

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index be4318044f96..0089048342dd 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -23,6 +23,29 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
{ }
 };
 
+static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
+{
+   u32 smr;
+   int i;
+
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
+
+   if (FIELD_GET(ARM_SMMU_SMR_VALID, smr)) {
+   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
+   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
+   smmu->smrs[i].valid = true;
+
+   smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
+   smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
+   smmu->s2crs[i].cbndx = 0xff;
+   smmu->s2crs[i].count++;
+   }
+   }
+
+   return 0;
+}
+
 static int qcom_smmu_def_domain_type(struct device *dev)
 {
const struct of_device_id *match =
@@ -61,6 +84,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
 }
 
 static const struct arm_smmu_impl qcom_smmu_impl = {
+   .cfg_probe = qcom_smmu_cfg_probe,
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
 };
-- 
2.28.0



[PATCH v4 3/3] iommu/arm-smmu-qcom: Implement S2CR quirk

2020-10-16 Thread Bjorn Andersson
The firmware found in some Qualcomm platforms intercepts writes to S2CR
in order to replace BYPASS-type streams with FAULT, and ignores S2CR
updates of type FAULT.

Detect this behavior and implement a custom write_s2cr function in order
to trick the firmware into supporting bypass streams, by means of
configuring the stream for translation using a reserved and disabled
context bank.

Also work around the inability to configure faulting streams by writing
such streams as BYPASS, which the firmware then rewrites to FAULT.
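
In short, the quirk inverts the requested S2CR type before it reaches the
register, so that the firmware's substitution yields the intended behavior
(summarized from the patch below):

    requested   written to S2CR             effective result
    BYPASS  ->  TRANS + reserved cbndx  ->  translation via disabled CB (bypass)
    FAULT   ->  BYPASS                  ->  FAULT (substituted by firmware)
    TRANS   ->  TRANS (unchanged)       ->  translation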

Signed-off-by: Bjorn Andersson 
---

Changes since v3:
- Move the reservation of the "identity context bank" to the Qualcomm specific
  implementation.
- Implement the S2CR quirk with the newly introduced write_s2cr callback.

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 68 ++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 0089048342dd..c0f42d6a6e01 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -10,8 +10,14 @@
 
 struct qcom_smmu {
struct arm_smmu_device smmu;
+   u32 bypass_cbndx;
 };
 
+static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct qcom_smmu, smmu);
+}
+
 static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
{ .compatible = "qcom,adreno" },
{ .compatible = "qcom,mdp4" },
@@ -25,9 +31,32 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
 
 static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
 {
+   unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
+   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+   u32 reg;
u32 smr;
int i;
 
+   /*
+* With some firmware versions writes to S2CR of type FAULT are
+* ignored, and writing BYPASS will end up written as FAULT in the
+* register. Perform a write to S2CR to detect if this is the case and
+* if so reserve a context bank to emulate bypass streams.
+*/
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, S2CR_TYPE_BYPASS) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, 0xff) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, S2CR_PRIVCFG_DEFAULT);
+   arm_smmu_gr0_write(smmu, last_s2cr, reg);
+   reg = arm_smmu_gr0_read(smmu, last_s2cr);
+   if (FIELD_GET(ARM_SMMU_S2CR_TYPE, reg) != S2CR_TYPE_BYPASS) {
+   qsmmu->bypass_cbndx = smmu->num_context_banks - 1;
+
+   set_bit(qsmmu->bypass_cbndx, smmu->context_map);
+
+   reg = FIELD_PREP(ARM_SMMU_CBAR_TYPE, CBAR_TYPE_S1_TRANS_S2_BYPASS);
+   arm_smmu_gr1_write(smmu, ARM_SMMU_GR1_CBAR(qsmmu->bypass_cbndx), reg);
+   }
+
for (i = 0; i < smmu->num_mapping_groups; i++) {
smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
 
@@ -46,6 +75,44 @@ static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
return 0;
 }
 
+static void qcom_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx)
+{
+   struct arm_smmu_s2cr *s2cr = smmu->s2crs + idx;
+   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+   u32 cbndx = s2cr->cbndx;
+   u32 type = s2cr->type;
+   u32 reg;
+
+   if (qsmmu->bypass_cbndx) {
+   if (type == S2CR_TYPE_BYPASS) {
+   /*
+* Firmware with quirky S2CR handling will substitute
+* BYPASS writes with FAULT, so point the stream to the
+* reserved context bank and ask for translation on the
+* stream
+*/
+   type = S2CR_TYPE_TRANS;
+   cbndx = qsmmu->bypass_cbndx;
+   } else if (type == S2CR_TYPE_FAULT) {
+   /*
+* Firmware with quirky S2CR handling will ignore FAULT
+* writes, so trick it to write FAULT by asking for a
+* BYPASS.
+*/
+   type = S2CR_TYPE_BYPASS;
+   cbndx = 0xff;
+   }
+   }
+
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, type) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, cbndx) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);
+
+   if (smmu->features & ARM_SMMU_FEAT_EXIDS && smmu->smrs && smmu->smrs[idx].valid)
+   reg |= ARM_SMMU_S2CR_EXIDVALID;
+   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_S2CR(idx), reg);
+}
+
 static int qcom_smmu_def_domain_type(struct device *dev)
 {
const struct of_device_id *match =
@@ -87,6 +154,7 @@ static const struct arm_smmu_impl qcom_smmu_impl = {
.cfg_probe = qcom_smmu_cfg_probe,
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
+   .write_s2cr = qcom_smmu_write_s2cr,
 };
 
 struct a

Re: [PATCH v7 3/3] iommu/tegra-smmu: Add PCI support

2020-10-16 Thread Nicolin Chen
On Fri, Oct 16, 2020 at 03:10:26PM +0100, Robin Murphy wrote:
> On 2020-10-16 04:53, Nicolin Chen wrote:
> > On Thu, Oct 15, 2020 at 10:55:52AM +0100, Robin Murphy wrote:
> > > On 2020-10-15 05:13, Nicolin Chen wrote:
> > > > On Wed, Oct 14, 2020 at 06:42:36PM +0100, Robin Murphy wrote:
> > > > > On 2020-10-09 17:19, Nicolin Chen wrote:
> > > > > > This patch simply adds support for PCI devices.
> > > > > > 
> > > > > > Reviewed-by: Dmitry Osipenko 
> > > > > > Tested-by: Dmitry Osipenko 
> > > > > > Signed-off-by: Nicolin Chen 
> > > > > > ---
> > > > > > 
> > > > > > Changelog
> > > > > > v6->v7
> > > > > > * Renamed goto labels, suggested by Thierry.
> > > > > > v5->v6
> > > > > > * Added Dmitry's Reviewed-by and Tested-by.
> > > > > > v4->v5
> > > > > > * Added Dmitry's Reviewed-by
> > > > > > v3->v4
> > > > > > * Dropped !iommu_present() check
> > > > > > * Added CONFIG_PCI check in the exit path
> > > > > > v2->v3
> > > > > > * Replaced ternary conditional operator with if-else in 
> > > > > > .device_group()
> > > > > > * Dropped change in tegra_smmu_remove()
> > > > > > v1->v2
> > > > > > * Added error-out labels in tegra_smmu_probe()
> > > > > > * Dropped pci_request_acs() since IOMMU core would call it.
> > > > > > 
> > > > > > drivers/iommu/tegra-smmu.c | 35 +++++++++++++++++++++++++----------
> > > > > > 1 file changed, 25 insertions(+), 10 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> > > > > > index be29f5977145..2941d6459076 100644
> > > > > > --- a/drivers/iommu/tegra-smmu.c
> > > > > > +++ b/drivers/iommu/tegra-smmu.c
> > > > > > @@ -10,6 +10,7 @@
> > > > > > #include <linux/kernel.h>
> > > > > > #include <linux/of.h>
> > > > > > #include <linux/of_device.h>
> > > > > > +#include <linux/pci.h>
> > > > > > #include <linux/platform_device.h>
> > > > > > #include <linux/slab.h>
> > > > > > #include <linux/spinlock.h>
> > > > > > @@ -865,7 +866,11 @@ static struct iommu_group *tegra_smmu_device_group(struct device *dev)
> > > > > > group->smmu = smmu;
> > > > > > group->soc = soc;
> > > > > > -   group->group = iommu_group_alloc();
> > > > > > +   if (dev_is_pci(dev))
> > > > > > +   group->group = pci_device_group(dev);
> > > > > 
> > > > > Just to check, is it OK to have two or more swgroups "owning" the same
> > > > > iommu_group if an existing one gets returned here? It looks like that 
> > > > > might
> > > > > not play nice with the use of iommu_group_set_iommudata().
> > > > 
> > > > Do you mean by "gets returned here" the "IS_ERR" check below?
> > > 
> > > I mean that unlike iommu_group_alloc()/generic_device_group(),
> > > pci_device_group() may give you back a group that already contains another
> > > device and has already been set up from that device's perspective. This 
> > > can
> > > happen for topological reasons like requester ID aliasing through a 
> > > PCI-PCIe
> > > bridge or lack of isolation between functions.
> > 
> > Okay..but we don't really have two swgroups owning the same groups
> > in case of PCI devices. For Tegra210, all PCI devices inherit the
> > same swgroup from the PCI controller. And I'd think previous chips
> > do the same. The only use case currently of 2+ swgroups owning the
> > same iommu_group is for display controller.
> > 
> > Or do you suggest we need an additional check for pci_device_group?
> 
> Ah, OK - I still don't have the best comprehension of what exactly swgroups

The "swgroup" stands for "software group", which associates with
an ASID (Address Space Identifier) for SMMU to determine whether
this client is going through SMMU translation or not.

> are, and the path through .of_xlate looked like you might be using the PCI
> requester ID as the swgroup identifier, but I guess that ultimately depends
> on what your "iommu-map" is supposed to look like. If pci_device_group()

This is copied from pcie node in our downstream dtsi file:

iommus = <&mc TEGRA_SWGROUP_AFI>;
iommu-map = <0x0 &mc TEGRA_SWGROUP_AFI 0x1000>;
iommu-map-mask = <0x0>;

> will effectively only ever get called once regardless of how many endpoints
> exist, then indeed this won't be a concern (although if that's *guaranteed*
> to be the case then you may as well just stick with calling
> iommu_group_alloc() directly). Thanks for clarifying.

All PCI devices are supposed to get this swgroup of the pcie node
above. So the function will return the existing group of the pcie
controller, before giving a chance to call iommu_group_alloc().
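
For reference, this follows from the generic iommu-map parsing rules
(Documentation/devicetree/bindings/pci/pci-iommu.txt): the requester ID is
ANDed with iommu-map-mask before being matched against the map entries, so a
mask of 0x0 collapses every RID onto the single entry:

    masked_rid = rid & iommu-map-mask = rid & 0x0 = 0x0
    entry: <rid-base 0x0  &mc  iommu-base TEGRA_SWGROUP_AFI  length 0x1000>
    swgroup = iommu-base + (masked_rid - rid-base) = TEGRA_SWGROUP_AFI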

Thanks for the review and input.


Re: [PATCH 07/13] x86: Secure Launch kernel early boot stub

2020-10-16 Thread Arvind Sankar
On Thu, Oct 15, 2020 at 08:26:54PM +0200, Daniel Kiper wrote:
> 
> I am discussing with Ross the other option. We can create a
> .rodata.mle_header section and put it at a fixed offset, as
> kernel_info is. So, we would have, e.g.:
> 
> arch/x86/boot/compressed/vmlinux.lds.S:
> .rodata.kernel_info KERNEL_INFO_OFFSET : {
> *(.rodata.kernel_info)
> }
> ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, "kernel_info at bad address!")
> 
> .rodata.mle_header MLE_HEADER_OFFSET : {
> *(.rodata.mle_header)
> }
> ASSERT(ABSOLUTE(mle_header) == MLE_HEADER_OFFSET, "mle_header at bad address!")
> 
> arch/x86/boot/compressed/sl_stub.S:
> #define mleh_rva(X) (((X) - mle_header) + MLE_HEADER_OFFSET)
> 
> .section ".rodata.mle_header", "a"
> 
> SYM_DATA_START(mle_header)
> .long   0x9082ac5a    /* UUID0 */
> .long   0x74a7476f    /* UUID1 */
> .long   0xa2555c0f    /* UUID2 */
> .long   0x42b651cb    /* UUID3 */
> .long   0x00000034    /* MLE header size */
> .long   0x00020002    /* MLE version 2.2 */
> .long   mleh_rva(sl_stub_entry)    /* Linear entry point of MLE (virt. address) */
> .long   0x00000000    /* First valid page of MLE */
> .long   0x00000000    /* Offset within binary of first byte of MLE */
> .long   0x00000000    /* Offset within binary of last byte + 1 of MLE */
> .long   0x00000223    /* Bit vector of MLE-supported capabilities */
> .long   0x00000000    /* Starting linear address of command line (unused) */
> .long   0x00000000    /* Ending linear address of command line (unused) */
> SYM_DATA_END(mle_header)
> 
> Of course MLE_HEADER_OFFSET has to be defined as a constant somewhere.
> Anyway, is it acceptable?
> 
> There is also another problem. We have to put the size of the Linux
> kernel image into mle_header. Currently this is done by the bootloader,
> but I think it is not the role of the bootloader. The kernel image should
> provide all data describing its properties and not rely on the
> bootloader to do that. Ross and I investigated various options but we
> did not find a good/simple way to do that. Could you suggest how we
> should do that, or at least where we should take a look to get some
> ideas?
> 
> Daniel

What exactly is the size you need here? Is it just the size of the
protected-mode image, i.e. startup_32 to _edata? Or is it the size of
the whole bzImage file, or something else? I guess the same question
applies to "first valid page of MLE" and "first byte of MLE", and the
linear entry point -- are those all relative to startup_32 or do they
need to be relative to the start of the bzImage, i.e. do you have to add
the size of the real-mode boot stub?

If you need to include the size of the bzImage file, that's not known
when the files in boot/compressed are built. It's only known after the
real-mode stub is linked. arch/x86/boot/tools/build.c fills in various
details in the setup header and creates the bzImage file, but it does
not currently modify anything in the protected-mode portion of the
compressed kernel (i.e. arch/x86/boot/compressed/vmlinux, which then
gets converted to binary format as arch/x86/boot/vmlinux.bin), so it
would need to be extended if you need to modify the MLE header to
include the bzImage size or anything depending on the size of the
real-mode stub.
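
To make that concrete, here is a rough sketch of the kind of extension
build.c would need: patching one MLE header field once the final size is
known. The +0x24 offset ("last byte + 1 of MLE") follows the header layout
quoted above; the function names and parameters are illustrative, not
existing code:

    /* Illustrative only -- in the style of arch/x86/boot/tools/build.c */
    #include <stdint.h>
    #include <stddef.h>

    static void put_le32(uint8_t *p, uint32_t val)
    {
            p[0] = val & 0xff;
            p[1] = (val >> 8) & 0xff;
            p[2] = (val >> 16) & 0xff;
            p[3] = (val >> 24) & 0xff;
    }

    /*
     * buf points at the protected-mode image inside the bzImage buffer;
     * image_size only becomes known here, after the real-mode stub has
     * been linked and the total layout is fixed.
     */
    static void patch_mle_size(uint8_t *buf, size_t mle_header_offset,
                               uint32_t image_size)
    {
            /* "Offset within binary of last byte + 1 of MLE" lives at +0x24 */
            put_le32(buf + mle_header_offset + 0x24, image_size);
    }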


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-16 Thread Jason Gunthorpe
On Wed, Oct 14, 2020 at 03:16:22AM +, Tian, Kevin wrote:
> Hi, Alex and Jason (G),
> 
> How about your opinion for this new proposal? For now looks both
> Jason (W) and Jean are OK with this direction and more discussions
> are possibly required for the new /dev/ioasid interface. Internally 
> we're doing a quick prototype to see any unforeseen issue with this
> separation. 

Assuming VDPA and VFIO will be the only two users, so that duplicating
everything only twice is acceptable, sounds pretty restricting to me.

> > Second, IOMMU nested translation is a per IOMMU domain
> > capability. Since IOMMU domains are managed by VFIO/VDPA
> >  (alloc/free domain, attach/detach device, set/get domain attribute,
> > etc.), reporting/enabling the nesting capability is a natural
> > extension to the domain uAPI of existing passthrough frameworks.
> > Actually, VFIO already includes a nesting enable interface even
> > before this series. So it doesn't make sense to generalize this uAPI
> > out.

The subsystem that obtains an IOMMU domain for a device would have to
register it with an open FD of '/dev/sva'. That is the connection
between the two subsystems. It would be some simple kernel-internal
stuff:

  sva = get_sva_from_file(fd);
  sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain);

Not sure why this is a roadblock?
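
Expanding the sketch slightly, such a lookup helper could follow the usual
fd-validation pattern (all of these symbols are hypothetical; nothing like
this exists in the kernel today):

    /* Hypothetical glue mirroring the sketch above */
    static struct sva_context *get_sva_from_file(int fd)
    {
            struct fd f = fdget(fd);
            struct sva_context *sva;

            if (!f.file)
                    return ERR_PTR(-EBADF);
            /* only accept file descriptors opened against /dev/sva */
            if (f.file->f_op != &sva_fops) {
                    fdput(f);
                    return ERR_PTR(-EINVAL);
            }
            sva = f.file->private_data;
            refcount_inc(&sva->users);      /* pin across the registration */
            fdput(f);
            return sva;
    }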

How would this be any different from having some kernel libsva that
VDPA and VFIO would both rely on?

You don't plan to just open code all this stuff in VFIO, do you?

> > Then the tricky part comes with the remaining operations (3/4/5),
> > which are all backed by iommu_ops thus effective only within an
> > IOMMU domain. To generalize them, the first thing is to find a way
> > to associate the sva_FD (opened through generic /dev/sva) with an
> > IOMMU domain that is created by VFIO/VDPA. The second thing is
> > to replicate {domain<->device/subdevice} association in /dev/sva
> > path because some operations (e.g. page fault) is triggered/handled
> > per device/subdevice. Therefore, /dev/sva must provide both per-
> > domain and per-device uAPIs similar to what VFIO/VDPA already
> > does. 

Yes, the point here was to move the general APIs out of VFIO and into
a sharable location. So, of course one would expect some duplication
during the transition period.

> > Moreover, mapping page fault to subdevice requires pre-
> > registering subdevice fault data to IOMMU layer when binding
> > guest page table, while such fault data can be only retrieved from
> > parent driver through VFIO/VDPA.

Not sure what this means, page fault should be tied to the PASID, any
hookup needed for that should be done in-kernel when the device is
connected to the PASID.

> > space but they may be organized in multiple IOMMU domains based
> > on their bus type. How (should we let) the userspace know the
> > domain information and open an sva_FD for each domain is the main
> > problem here.

Why is one sva_FD per iommu domain required? The HW can attach the
same PASID to multiple iommu domains, right?

> > In the end we just realized that doing such generalization doesn't
> > really lead to a clear design and instead requires tight coordination
> > between /dev/sva and VFIO/VDPA for almost every new uAPI
> > (especially about synchronization when the domain/device
> > association is changed or when the device/subdevice is being reset/
> > drained). Finally it may become a usability burden to the userspace
> > on proper use of the two interfaces on the assigned device.

If you have a list of things that need to be done to attach a PCI
device to a PASID then of course they should be tidy kernel APIs
already, and not just hard-wired into VFIO.

The worst outcome would be to have VDPA and VFIO with two different
ways to do all of this, each with a different set of bugs. Bug fixes/new
features in VFIO won't flow over to VDPA.

Jason


Re: [PATCH v7 3/3] iommu/tegra-smmu: Add PCI support

2020-10-16 Thread Robin Murphy

On 2020-10-16 04:53, Nicolin Chen wrote:

On Thu, Oct 15, 2020 at 10:55:52AM +0100, Robin Murphy wrote:

On 2020-10-15 05:13, Nicolin Chen wrote:

On Wed, Oct 14, 2020 at 06:42:36PM +0100, Robin Murphy wrote:

On 2020-10-09 17:19, Nicolin Chen wrote:

This patch simply adds support for PCI devices.

Reviewed-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
Signed-off-by: Nicolin Chen 
---

Changelog
v6->v7
* Renamed goto labels, suggested by Thierry.
v5->v6
* Added Dmitry's Reviewed-by and Tested-by.
v4->v5
* Added Dmitry's Reviewed-by
v3->v4
* Dropped !iommu_present() check
* Added CONFIG_PCI check in the exit path
v2->v3
* Replaced ternary conditional operator with if-else in .device_group()
* Dropped change in tegra_smmu_remove()
v1->v2
* Added error-out labels in tegra_smmu_probe()
* Dropped pci_request_acs() since IOMMU core would call it.

drivers/iommu/tegra-smmu.c | 35 +++++++++++++++++++++++++----------
1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index be29f5977145..2941d6459076 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -10,6 +10,7 @@
#include <linux/kernel.h>
#include <linux/of.h>
#include <linux/of_device.h>
+#include <linux/pci.h>
#include <linux/platform_device.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
@@ -865,7 +866,11 @@ static struct iommu_group *tegra_smmu_device_group(struct device *dev)
group->smmu = smmu;
group->soc = soc;
-   group->group = iommu_group_alloc();
+   if (dev_is_pci(dev))
+   group->group = pci_device_group(dev);


Just to check, is it OK to have two or more swgroups "owning" the same
iommu_group if an existing one gets returned here? It looks like that might
not play nice with the use of iommu_group_set_iommudata().


Do you mean by "gets returned here" the "IS_ERR" check below?


I mean that unlike iommu_group_alloc()/generic_device_group(),
pci_device_group() may give you back a group that already contains another
device and has already been set up from that device's perspective. This can
happen for topological reasons like requester ID aliasing through a PCI-PCIe
bridge or lack of isolation between functions.


Okay..but we don't really have two swgroups owning the same groups
in case of PCI devices. For Tegra210, all PCI devices inherit the
same swgroup from the PCI controller. And I'd think previous chips
do the same. The only use case currently of 2+ swgroups owning the
same iommu_group is for display controller.

Or do you suggest we need an additional check for pci_device_group?


Ah, OK - I still don't have the best comprehension of what exactly 
swgroups are, and the path through .of_xlate looked like you might be 
using the PCI requester ID as the swgroup identifier, but I guess that 
ultimately depends on what your "iommu-map" is supposed to look like. If 
pci_device_group() will effectively only ever get called once regardless 
of how many endpoints exist, then indeed this won't be a concern 
(although if that's *guaranteed* to be the case then you may as well 
just stick with calling iommu_group_alloc() directly). Thanks for 
clarifying.


Robin.


Re: [PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-16 Thread Rob Herring
On Thu, Oct 15, 2020 at 12:42 AM Christoph Hellwig  wrote:
>
> > +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> > +{
> > + phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
> > + struct of_range_parser parser;
> > + phys_addr_t subtree_max_addr;
> > + struct device_node *child;
> > + phys_addr_t cpu_end = 0;
> > + struct of_range range;
> > + const __be32 *ranges;
> > + int len;
> > +
> > + if (!np)
> > + np = of_root;
>
> Requiring of_root to be passed explicitly would seem more natural
> to me than the magic NULL argument.  There doesn't seem to be any
> precedent for that kind of calling convention either.

I prefer that of_root is not more widely exposed and NULL regularly
means 'the whole tree'.
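
For illustration, that convention means a caller can simply do (a usage
sketch based on the function as posted; 'bus_np' is a hypothetical subtree
node):

    /* most constrained CPU address reachable by DMA anywhere in the DT */
    phys_addr_t max = of_dma_get_max_cpu_address(NULL);

    /* or restricted to one subtree */
    phys_addr_t bus_max = of_dma_get_max_cpu_address(bus_np);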

Rob


Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes

2020-10-16 Thread Jean-Philippe Brucker
On Thu, Oct 15, 2020 at 11:22:11AM -0700, Raj, Ashok wrote:
> Hi Jean
> 
> + Baolu who is looking into this.
> 
> 
> On Thu, Oct 15, 2020 at 11:00:27AM +0200, Jean-Philippe Brucker wrote:
> > Add a parameter to iommu_sva_unbind_device() that tells the IOMMU driver
> > whether the PRI queue needs flushing. When looking at the PCIe spec
> > again I noticed that most of the time the SMMUv3 driver doesn't actually
> > need to flush the PRI queue. Does this make sense for Intel VT-d as well
> > or did I overlook something?
> > 
> > Before calling iommu_sva_unbind_device(), device drivers must stop the
> > device from using the PASID. For PCIe devices, that consists of
> > completing any pending DMA, and completing any pending page request
> > unless the device uses Stop Markers. So unless the device uses Stop
> > Markers, we don't need to flush the PRI queue. For SMMUv3, stopping DMA
> > means completing all stall events, so we never need to flush the event
> > queue.
> 
> I don't think this is true. Baolu is working on an enhancement to this,
> I'll quickly summarize this below:
> 
> Stop markers are weird, I'm not certain there is any device today that
> sends STOP markers. Even if they did, markers don't have a required
> response, they are fire and forget from the device pov.

Definitely agree with this, and I hope none will implement stop marker.
For devices that *don't* use a stop marker, the PCIe spec says (10.4.1.2):

  To stop [using a PASID] without using a Stop Marker Message, the
  function shall:
  1. Stop queueing new Page Request Messages for this PASID.
  2. Finish transmitting any multi-page Page Request Messages for this
 PASID (i.e. send the Page Request Message with the L bit Set).
  3. Wait for PRG Response Messages associated with any outstanding Page
 Request Messages for the PASID.

So they have to flush their PR themselves. And since the device driver
completes this sequence before calling unbind(), then there shouldn't be
any outstanding PR for the PASID, and unbind() doesn't need to flush,
right?

> I'm not sure how other IOMMUs behave. When there is no space in
> the PRQ, the IOMMU auto-responds to the device. This puts the device in a
> while (1) loop. The fake successful response will let the device do an ATS
> lookup, and that would fail, forcing the device to do another PRQ.

But in the sequence above, step 1 should ensure that the device will not
send another PR for any successful response coming back at step 3.

So I agree with the below if we suspect there could be pending PR, but
given that pending PR are a stop marker thing and we don't know any device
using stop markers, I wondered why I bothered implementing PRIq flush at
all for SMMUv3, hence this RFC.
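
Concretely, with the flag from patch 1, a device driver that knows how its
device stopped the PASID would do something like this (a sketch against the
RFC's proposed API; the stop-marker query is a hypothetical driver-internal
helper):

    unsigned int flags = 0;

    /*
     * Only a device that used a Stop Marker can still have page requests
     * in flight; everyone else already waited for its PRG responses.
     */
    if (device_stopped_pasid_with_stop_marker(dev))     /* hypothetical */
            flags |= IOMMU_UNBIND_FAULT_PENDING;

    iommu_sva_unbind_device(handle, flags);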

Thanks,
Jean


> The idea
> is that at some point the OS has drained the others and this one will find
> a way into the PRQ. The point is this is less reliable and can't be the only
> indication. PRQ draining has a specific sequence.
> 
> The detailed steps are outlined in 
> "Chapter 7.10 "Software Steps to Drain Page Requests & Responses"
> 
> - Submit invalidation wait with fence flag to ensure all prior
>   invalidations are processed.
> - submit iotlb followed by devtlb invalidation
> - Submit invalidation wait with page-drain to make sure any page-requests
>   issued by the device are flushed when this invalidation wait completes.
> - If during the above process there was no queue overflow, SW can assume no
>   outstanding page-requests are there. If we had a queue-full condition,
>   then SW must repeat steps 2 and 3 above.
> 
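
The quoted sequence condenses to roughly this control flow (every helper
name below is illustrative shorthand for the steps above, not the VT-d
driver's actual API):

    /* illustrative condensation of the Chapter 7.10 sequence */
    static void drain_page_requests(struct intel_iommu *iommu)
    {
            bool overflowed;

            do {
                    submit_inv_wait_fenced(iommu);     /* flush prior invalidations */
                    submit_iotlb_inv(iommu);           /* IOTLB invalidation */
                    submit_devtlb_inv(iommu);          /* device-TLB invalidation */
                    submit_inv_wait_page_drain(iommu); /* wait with page-drain set */
                    overflowed = prq_overflowed_and_clear(iommu);
            } while (overflowed);   /* overflow: repeat the invalidations */
    }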
> 
> To that extent the proposal is as follows: names are suggestive :-) I'm
> making this up as I go!
> 
> - iommu_stop_page_req() - Kernel needs to make sure we respond to all
>   outstanding requests (since we can't drop responses) 
>   - Ensure we respond immediately for anything that comes before the drain
> sequence completes
> - iommu_drain_page_req() - Which does the above invalidation with
>   Page_drain set.
> 
> Once the driver has performed a reset and before issuing any new request,
> it does iommu_resume_page_req() this will ensure we start processing
> incoming page-req after this point.
> 
> 
> > 
> > First patch adds flags to unbind(), and the second one lets device
> > drivers tell whether the PRI queue needs to be flushed.
> > 
> > Other remarks:
> > 
> > * The PCIe spec (see quote on patch 2), says that the device signals
> >   whether it has sent a Stop Marker or not during Stop PASID. In reality
> >   it's unlikely that a given device will sometimes use one stop method
> >   and sometimes the other, so it could be a device-wide flag rather than
> >   passing it at each unbind(). I don't want to speculate too much about
> >   future implementation so I prefer having the flag in unbind().
> > 
> > * In patch 1, uacce passes 0 to unbind(). To pass the right flag I'm
> >   thinking that uacce->ops->stop_queue(), which tells the device driver
> >   to stop DMA, should return whet

Re: [PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-16 Thread Hanjun Guo

On 2020/10/16 15:27, Hanjun Guo wrote:
The patch only takes the address limit field into account if its value > 0.


Sorry, I missed the if (*->memory_address_limit) check, thanks
for the reminder.



Also, before commit 7fb89e1d44cb6aec ("ACPI/IORT: take _DMA methods
into account for named components"), the _DMA method was not taken
into account for named components at all, and only the IORT limit was
used, so I do not anticipate any problems with that.


Then this patch is fine to me.


Certainly we need to address Lorenzo's comments.

Thanks
Hanjun


Re: [PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-16 Thread Hanjun Guo

Hi Ard,

On 2020/10/16 14:54, Ard Biesheuvel wrote:

On Fri, 16 Oct 2020 at 08:51, Hanjun Guo  wrote:


On 2020/10/16 2:03, Catalin Marinas wrote:

On Thu, Oct 15, 2020 at 10:26:18PM +0800, Hanjun Guo wrote:

On 2020/10/15 3:12, Nicolas Saenz Julienne wrote:

From: Ard Biesheuvel 

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

Instructing the DMA layer about these limitations is straight-forward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.

This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
it turns out that having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
problems with kdump, and potentially in other places where allocations
cannot cross zone boundaries. Therefore, we should avoid having two
separate DMA zones when possible.

So let's do an early scan of the IORT, and only create the ZONE_DMA
if we encounter any devices that need it. This puts the burden on
the firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.


Sorry, I'm still a little bit confused. With this patch, if we have
a device which sets the right _DMA method (DMA size >= 32), but with the
wrong DMA size in IORT, we still have the ZONE_DMA created, which
is actually not needed?


With the current kernel, we get a ZONE_DMA already with an arbitrary
size of 1GB that matches what RPi4 needs. We are trying to eliminate
such unnecessary ZONE_DMA based on some heuristics (well, something that
looks "better" than a OEM ID based quirk). Now, if we learn that IORT
for platforms in the field is that broken as to describe few bits-wide
DMA masks, we may have to go back to the OEM ID quirk.


Some platforms use 0 as the memory size limit, for example D05 [0] and
D06 [1], so I think we need to go back to the OEM ID quirk.

For D05/D06, there are multi interrupt controllers named as mbigen,
mbigen is using the named component to describe the mappings with
the ITS controller, and mbigen is using 0 as the memory size limit.

Also, since the memory size limit for the PCI RC was introduced in a later
IORT revision, firmware people may think it's fine to set it to 0
because the system works without it.



Hello Hanjun,

The patch only takes the address limit field into account if its value > 0.


Sorry, I missed the if (*->memory_address_limit) check, thanks
for the reminder.



Also, before commit 7fb89e1d44cb6aec ("ACPI/IORT: take _DMA methods
into account for named components"), the _DMA method was not taken
into account for named components at all, and only the IORT limit was
used, so I do not anticipate any problems with that.


Then this patch is fine to me.

Thanks
Hanjun


Re: [RFC PATCH 2/2] iommu: Add IOMMU_UNBIND_FAULT_PENDING flag

2020-10-16 Thread Christoph Hellwig
On Thu, Oct 15, 2020 at 11:00:29AM +0200, Jean-Philippe Brucker wrote:
> IOMMU drivers only need to flush their PRI queue when faults might be
> pending. According to the PCIe spec (quoted below) this only happens
> when using the "Stop Marker" method. Otherwise the function waits for
> pending faults before signaling to the device driver that it can
> unbind().
> 
> Add the IOMMU_UNBIND_FAULT_PENDING flags to unbind(), to tell the IOMMU
> driver whether it's worth flushing the queue.

As we have no code using the "stop marker" method please just remove
the code entirely instead of adding a flag that is just dead code.