[PATCH] swiotlb: add overflow checks to swiotlb_bounce

2021-07-06 Thread Dominique Martinet
This is a follow-up on 5f89468e2f06 ("swiotlb: manipulate orig_addr
when tlb_addr has offset") which fixed unaligned dma mappings,
making sure the following overflows are caught:

- offset of the start of the slot within the device bigger than
requested address' offset, in other words if the base address
given in swiotlb_tbl_map_single to create the mapping (orig_addr)
was after the requested address for the sync (tlb_offset) in the
same block:

 |--| block
  <> mapped part of the block
  ^
  orig_addr
   ^
   invalid tlb_addr for sync

- if the resulting offset was bigger than the allocation size
this one could happen if the mapping was not until the end. e.g.

 |--| block
  <-> mapped part of the block
  ^   ^
  orig_addr   invalid tlb_addr

Both should never happen so print a warning and bail out without trying
to adjust the sizes/offsets: the first one could try to sync from
orig_addr to whatever is left of the requested size, but the later
really has nothing to sync there...

Signed-off-by: Dominique Martinet 
Cc: Konrad Rzeszutek Wilk 
Cc: Bumyong Lee 
Cc: Chanho Park 
Cc: Christoph Hellwig 
---

Hi Konrad,

here's the follow up for the swiotlb/caamjr regression I had promissed.
It doesn't really change anything, and I confirmed I don't hit either of
the warnings on our board, but it's probably best to have as either
could really happen.


 kernel/dma/swiotlb.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e50df8d8f87e..23f8d0b168c5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -354,13 +354,27 @@ static void swiotlb_bounce(struct device *dev, 
phys_addr_t tlb_addr, size_t size
size_t alloc_size = mem->slots[index].alloc_size;
unsigned long pfn = PFN_DOWN(orig_addr);
unsigned char *vaddr = phys_to_virt(tlb_addr);
-   unsigned int tlb_offset;
+   unsigned int tlb_offset, orig_addr_offset;
 
if (orig_addr == INVALID_PHYS_ADDR)
return;
 
-   tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) -
-swiotlb_align_offset(dev, orig_addr);
+   tlb_offset = tlb_addr & (IO_TLB_SIZE - 1);
+   orig_addr_offset = swiotlb_align_offset(dev, orig_addr);
+   if (tlb_offset < orig_addr_offset) {
+   dev_WARN_ONCE(dev, 1,
+   "Access before mapping start detected. orig offset %u, 
requested offset %u.\n",
+   orig_addr_offset, tlb_offset);
+   return;
+   }
+
+   tlb_offset -= orig_addr_offset;
+   if (tlb_offset > alloc_size) {
+   dev_WARN_ONCE(dev, 1,
+   "Buffer overflow detected. Allocation size: %zu. 
Mapping size: %zu+%u.\n",
+   alloc_size, size, tlb_offset);
+   return;
+   }
 
orig_addr += tlb_offset;
alloc_size -= tlb_offset;
-- 
2.30.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling

2021-07-06 Thread John Stultz
On Sun, Jul 4, 2021 at 11:16 AM Rob Clark  wrote:
>
> I suspect you are getting a dpu fault, and need:
>
> https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=h...@mail.gmail.com/
>
> I suppose Bjorn was expecting me to send that patch

If it's helpful, I applied that and it got the db845c booting mainline
again for me (along with some reverts for a separate ext4 shrinker
crash).
Tested-by: John Stultz 

thanks
-john
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] firmware: QCOM_SCM: Allow qcom_scm driver to be loadable as a permenent module

2021-07-06 Thread John Stultz
Allow the qcom_scm driver to be loadable as a permenent module.

This still uses the "depends on QCOM_SCM || !QCOM_SCM" bit to
ensure that drivers that call into the qcom_scm driver are
also built as modules. While not ideal in some cases its the
only safe way I can find to avoid build errors without having
those drivers select QCOM_SCM and have to force it on (as
QCOM_SCM=n can be valid for those drivers).

Reviving this now that Saravana's fw_devlink defaults to on,
which should avoid loading troubles seen before.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Andy Gross 
Cc: Bjorn Andersson 
Cc: Joerg Roedel 
Cc: Thomas Gleixner 
Cc: Marc Zyngier 
Cc: Linus Walleij 
Cc: Vinod Koul 
Cc: Kalle Valo 
Cc: Maulik Shah 
Cc: Saravana Kannan 
Cc: Todd Kjos 
Cc: Greg Kroah-Hartman 
Cc: linux-arm-...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: linux-g...@vger.kernel.org
Acked-by: Kalle Valo 
Acked-by: Greg Kroah-Hartman 
Acked-by: Will Deacon 
Reviewed-by: Bjorn Andersson 
Signed-off-by: John Stultz 
---
v3:
* Fix __arm_smccc_smc build issue reported by
  kernel test robot 
v4:
* Add "depends on QCOM_SCM || !QCOM_SCM" bit to ath10k
  config that requires it.
v5:
* Fix QCOM_QCM typo in Kconfig, it should be QCOM_SCM
---
 drivers/firmware/Kconfig| 2 +-
 drivers/firmware/Makefile   | 3 ++-
 drivers/firmware/qcom_scm.c | 4 
 drivers/iommu/Kconfig   | 2 ++
 drivers/net/wireless/ath/ath10k/Kconfig | 1 +
 5 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index db0ea2d2d75a..af53778edc7e 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -235,7 +235,7 @@ config INTEL_STRATIX10_RSU
  Say Y here if you want Intel RSU support.
 
 config QCOM_SCM
-   bool
+   tristate "Qcom SCM driver"
depends on ARM || ARM64
depends on HAVE_ARM_SMCCC
select RESET_CONTROLLER
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 5e013b6a3692..523173cbff33 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -17,7 +17,8 @@ obj-$(CONFIG_ISCSI_IBFT)  += iscsi_ibft.o
 obj-$(CONFIG_FIRMWARE_MEMMAP)  += memmap.o
 obj-$(CONFIG_RASPBERRYPI_FIRMWARE) += raspberrypi.o
 obj-$(CONFIG_FW_CFG_SYSFS) += qemu_fw_cfg.o
-obj-$(CONFIG_QCOM_SCM) += qcom_scm.o qcom_scm-smc.o qcom_scm-legacy.o
+obj-$(CONFIG_QCOM_SCM) += qcom-scm.o
+qcom-scm-objs += qcom_scm.o qcom_scm-smc.o qcom_scm-legacy.o
 obj-$(CONFIG_TI_SCI_PROTOCOL)  += ti_sci.o
 obj-$(CONFIG_TRUSTED_FOUNDATIONS) += trusted_foundations.o
 obj-$(CONFIG_TURRIS_MOX_RWTM)  += turris-mox-rwtm.o
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index ee9cb545e73b..bb9ce3f92931 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -1296,6 +1296,7 @@ static const struct of_device_id qcom_scm_dt_match[] = {
{ .compatible = "qcom,scm" },
{}
 };
+MODULE_DEVICE_TABLE(of, qcom_scm_dt_match);
 
 static struct platform_driver qcom_scm_driver = {
.driver = {
@@ -1312,3 +1313,6 @@ static int __init qcom_scm_init(void)
return platform_driver_register(&qcom_scm_driver);
 }
 subsys_initcall(qcom_scm_init);
+
+MODULE_DESCRIPTION("Qualcomm Technologies, Inc. SCM driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 07b7c25cbed8..f61516c17589 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -253,6 +253,7 @@ config SPAPR_TCE_IOMMU
 config ARM_SMMU
tristate "ARM Ltd. System MMU (SMMU) Support"
depends on ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)
+   depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU if ARM
@@ -382,6 +383,7 @@ config QCOM_IOMMU
# Note: iommu drivers cannot (yet?) be built as modules
bool "Qualcomm IOMMU Support"
depends on ARCH_QCOM || (COMPILE_TEST && !GENERIC_ATOMIC64)
+   depends on QCOM_SCM=y
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU
diff --git a/drivers/net/wireless/ath/ath10k/Kconfig 
b/drivers/net/wireless/ath/ath10k/Kconfig
index 40f91bc8514d..741289e385d5 100644
--- a/drivers/net/wireless/ath/ath10k/Kconfig
+++ b/drivers/net/wireless/ath/ath10k/Kconfig
@@ -44,6 +44,7 @@ config ATH10K_SNOC
tristate "Qualcomm ath10k SNOC support"
depends on ATH10K
depends on ARCH_QCOM || COMPILE_TEST
+   depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
select QCOM_QMI_HELPERS
help
  This module adds support for integrated WCN3990 chip connected
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling

2021-07-06 Thread Bjorn Andersson
On Sun 04 Jul 13:20 CDT 2021, Rob Clark wrote:

> I suspect you are getting a dpu fault, and need:
> 
> https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=h...@mail.gmail.com/
> 
> I suppose Bjorn was expecting me to send that patch
> 

No, I left that discussion with the same understanding as you... But I
ended up side tracked by some other craziness.

Did you post this somewhere or would you still like me to test it and
spin a patch?

Regards,
Bjorn

> BR,
> -R
> 
> On Sun, Jul 4, 2021 at 5:53 AM Dmitry Baryshkov
>  wrote:
> >
> > Hi,
> >
> > I've had splash screen disabled on my RB3. However once I've enabled it,
> > I've got the attached crash during the boot on the msm/msm-next. It
> > looks like it is related to this particular set of changes.
> >
> > On 11/06/2021 00:44, Rob Clark wrote:
> > > From: Rob Clark 
> > >
> > > This picks up an earlier series[1] from Jordan, and adds additional
> > > support needed to generate GPU devcore dumps on iova faults.  Original
> > > description:
> > >
> > > This is a stack to add an Adreno GPU specific handler for pagefaults. The 
> > > first
> > > patch starts by wiring up report_iommu_fault for arm-smmu. The next patch 
> > > adds
> > > a adreno-smmu-priv function hook to capture a handful of important 
> > > debugging
> > > registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by 
> > > the
> > > third patch to print more detailed information on page fault such as the 
> > > TTBR0
> > > for the pagetable that caused the fault and the source of the fault as
> > > determined by a combination of the FSYNR1 register and an internal GPU
> > > register.
> > >
> > > This code provides a solid base that we can expand on later for even more
> > > extensive GPU side page fault debugging capabilities.
> > >
> > > v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect case where
> > >  GPU snapshotting needs to avoid crashdumper, and check the
> > >  RBBM_STATUS3.SMMU_STALLED_ON_FAULT in GPU hang irq paths
> > > v4: [Rob] Add support to stall SMMU on fault, and let the GPU driver
> > >  resume translation after it has had a chance to snapshot the GPUs
> > >  state
> > > v3: Always clear FSR even if the target driver is going to handle resume
> > > v2: Fix comment wording and function pointer check per Rob Clark
> > >
> > > [1] 
> > > https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcro...@codeaurora.org/
> > >
> > > Jordan Crouse (3):
> > >iommu/arm-smmu: Add support for driver IOMMU fault handlers
> > >iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault
> > >  info
> > >drm/msm: Improve the a6xx page fault handler
> > >
> > > Rob Clark (2):
> > >iommu/arm-smmu-qcom: Add stall support
> > >drm/msm: devcoredump iommu fault support
> > >
> > >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  23 +++-
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 110 +++-
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  42 ++--
> > >   drivers/gpu/drm/msm/adreno/adreno_gpu.c |  15 +++
> > >   drivers/gpu/drm/msm/msm_gem.h   |   1 +
> > >   drivers/gpu/drm/msm/msm_gem_submit.c|   1 +
> > >   drivers/gpu/drm/msm/msm_gpu.c   |  48 +
> > >   drivers/gpu/drm/msm/msm_gpu.h   |  17 +++
> > >   drivers/gpu/drm/msm/msm_gpummu.c|   5 +
> > >   drivers/gpu/drm/msm/msm_iommu.c |  22 +++-
> > >   drivers/gpu/drm/msm/msm_mmu.h   |   5 +-
> > >   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |  50 +
> > >   drivers/iommu/arm/arm-smmu/arm-smmu.c   |   9 +-
> > >   drivers/iommu/arm/arm-smmu/arm-smmu.h   |   2 +
> > >   include/linux/adreno-smmu-priv.h|  38 ++-
> > >   15 files changed, 367 insertions(+), 21 deletions(-)
> > >
> >
> >
> > --
> > With best wishes
> > Dmitry
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Nathan Chancellor

Hi Will and Robin,

On 7/6/2021 10:06 AM, Will Deacon wrote:

On Tue, Jul 06, 2021 at 04:39:11PM +0100, Robin Murphy wrote:

On 2021-07-06 15:05, Christoph Hellwig wrote:

On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:

FWIW I was pondering the question of whether to do something along those
lines or just scrap the default assignment entirely, so since I hadn't got
round to saying that I've gone ahead and hacked up the alternative
(similarly untested) for comparison :)

TBH I'm still not sure which one I prefer...


Claire did implement something like your suggestion originally, but
I don't really like it as it doesn't scale for adding multiple global
pools, e.g. for the 64-bit addressable one for the various encrypted
secure guest schemes.


Ah yes, that had slipped my mind, and it's a fair point indeed. Since we're
not concerned with a minimal fix for backports anyway I'm more than happy to
focus on Will's approach. Another thing is that that looks to take us a
quiet step closer to the possibility of dynamically resizing a SWIOTLB pool,
which is something that some of the hypervisor protection schemes looking to
build on top of this series may want to explore at some point.


Ok, I'll split that nasty diff I posted up into a reviewable series and we
can take it from there.


For what it's worth, I attempted to boot Will's diff on top of Konrad's 
devel/for-linus-5.14 and it did not work; in fact, I got no output on my 
monitor period, even with earlyprintk=, and I do not think this machine 
has a serial console.


Robin's fix does work, it survived ten reboots with no issues getting to 
X and I do not see the KASAN and slub debug messages anymore but I 
understand that this is not the preferred solution it seems (although 
Konrad did want to know if it works).


I am happy to test any further patches or follow ups as needed, just 
keep me on CC.


Cheers,
Nathan
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 1/1] dma-debug: fix check_for_illegal_area() in debug_dma_map_sg()

2021-07-06 Thread Gerald Schaefer
On Tue, 6 Jul 2021 10:22:40 +0100
Robin Murphy  wrote:

> On 2021-07-05 19:52, Gerald Schaefer wrote:
> > The following warning occurred sporadically on s390:
> > DMA-API: nvme 0006:00:00.0: device driver maps memory from kernel text or 
> > rodata [addr=48cc5e2f] [len=131072]
> > WARNING: CPU: 4 PID: 825 at kernel/dma/debug.c:1083 
> > check_for_illegal_area+0xa8/0x138
> > 
> > It is a false-positive warning, due to a broken logic in debug_dma_map_sg().
> > check_for_illegal_area() should check for overlay of sg elements with kernel
> > text or rodata. It is called with sg_dma_len(s) instead of s->length as
> > parameter. After the call to ->map_sg(), sg_dma_len() contains the length
> > of possibly combined sg elements in the DMA address space, and not the
> > individual sg element length, which would be s->length.
> > 
> > The check will then use the kernel start address of an sg element, and add
> > the DMA length for overlap check, which can result in the false-positive
> > warning because the DMA length can be larger than the actual single sg
> > element length in kernel address space.
> > 
> > In addition, the call to check_for_illegal_area() happens in the iteration
> > over mapped_ents, which will not include all individual sg elements if
> > any of them were combined in ->map_sg().
> > 
> > Fix this by using s->length instead of sg_dma_len(s). Also put the call to
> > check_for_illegal_area() in a separate loop, iterating over all the
> > individual sg elements ("nents" instead of "mapped_ents").
> > 
> > Fixes: 884d05970bfb ("dma-debug: use sg_dma_len accessor")
> > Tested-by: Niklas Schnelle 
> > Signed-off-by: Gerald Schaefer 
> > ---
> >   kernel/dma/debug.c | 10 ++
> >   1 file changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
> > index 14de1271463f..d7d44b7fe7e2 100644
> > --- a/kernel/dma/debug.c
> > +++ b/kernel/dma/debug.c
> > @@ -1299,6 +1299,12 @@ void debug_dma_map_sg(struct device *dev, struct 
> > scatterlist *sg,
> > if (unlikely(dma_debug_disabled()))
> > return;
> >   
> > +   for_each_sg(sg, s, nents, i) {
> > +   if (!PageHighMem(sg_page(s))) {
> > +   check_for_illegal_area(dev, sg_virt(s), s->length);
> > +   }
> > +   }
> > +
> > for_each_sg(sg, s, mapped_ents, i) {
> > entry = dma_entry_alloc();
> > if (!entry)
> > @@ -1316,10 +1322,6 @@ void debug_dma_map_sg(struct device *dev, struct 
> > scatterlist *sg,
> >   
> > check_for_stack(dev, sg_page(s), s->offset);
> 
> Strictly this should probably be moved to the new loop as well, as it is 
> similarly concerned with validating the source segments rather than the 
> DMA mappings - I think with virtually-mapped stacks it might technically 
> be possible for a stack page to be physically adjacent to a "valid" page 
> such that it could get merged and overlooked if it were near the end of 
> the list, although in fairness that would probably be indicative of 
> something having gone far more fundamentally wrong. Otherwise, the 
> overall reasoning looks sound to me.

I see, good point. I think I can add this to my patch, and a different
subject like "dma-debug: fix sg checks in debug_dma_map_sg()".

However, I do not quite understand why check_for_stack() does not also
consider s->length. It seems to check only the first page of an sg
element.

So, shouldn't check_for_stack() behave similar to check_for_illegal_area(),
i.e. check all source sg elements for overlap with the task stack area?

If yes, then this probably should be a separate patch, but I can try
to come up with something and send a new RFC with two patches. Maybe
check_for_stack() can also be integrated into check_for_illegal_area(),
they are both called at the same places. And mapping memory from the
stack also sounds rather illegal.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: Fallback to default setting when def_domain_type() callback returns 0

2021-07-06 Thread Robin Murphy

On 2021-07-06 17:21, Kai-Heng Feng wrote:

On Tue, Jul 6, 2021 at 5:27 PM Robin Murphy  wrote:


On 2021-07-06 07:51, Kai-Heng Feng wrote:

Commit 28b41e2c6aeb ("iommu: Move def_domain type check for untrusted
device into core") not only moved the check for untrusted device to
IOMMU core, it also introduced a behavioral change by returning
def_domain_type() directly without checking its return value. That makes
many devices no longer use the default IOMMU setting.

So revert back to the old behavior which defaults to
iommu_def_domain_type when driver callback returns 0.

Fixes: 28b41e2c6aeb ("iommu: Move def_domain type check for untrusted device into 
core")


Are you sure about that? From that same commit:

@@ -1507,7 +1509,7 @@ static int iommu_alloc_default_domain(struct
iommu_group *group,
  if (group->default_domain)
  return 0;

-   type = iommu_get_def_domain_type(dev);
+   type = iommu_get_def_domain_type(dev) ? : iommu_def_domain_type;

  return iommu_group_alloc_default_domain(dev->bus, group, type);
   }

AFAICS the other two callers should also handle 0 correctly. Have you
seen a problem in practice?


Thanks for pointing out how the return value is being handled by the callers.
However, the same check is missing in probe_get_default_domain_type():
static int probe_get_default_domain_type(struct device *dev, void *data)
{
 struct __group_domain_type *gtype = data;
 unsigned int type = iommu_get_def_domain_type(dev);
...
}


I'm still not following - the next line right after that is "if (type)", 
which means it won't touch gtype, and if that happens for every 
iteration, probe_alloc_default_domain() subsequently hits its "if 
(!gtype.type)" condition and still ends up with iommu_def_domain_type. 
This *was* one of the other two callers I was talking about (the second 
being iommu_change_dev_def_domain()), and in fact on second look I think 
your proposed change will actually break this logic, since it's 
necessary to differentiate between a specific type being requested for 
the given device, and a "don't care" response which only implies to use 
the global default type if it's still standing after *all* the 
appropriate devices have been queried.



I personally prefer the old way instead of open coding with ternary
operator, so I'll do that in v2.

In practice, this causes a kernel panic when probing Realtek WiFi.
Because of the bug, dma_ops isn't set by probe_finalize(),
dma_map_single() falls back to swiotlb which isn't set and caused a
kernel panic.


Hmm, but if that's the case, wouldn't it still be a problem anyway if 
the end result was IOMMU_DOMAIN_IDENTITY? I can't claim to fully 
understand the x86 swiotlb and no_iommu dance, but nothing really stands 
out to give me confidence that it handles the passthrough options correctly.


Robin.


I didn't attach the panic log because the system simply is frozen at
that point so the message is not logged to the storage.
I'll see if I can find another way to collect the log and attach it in v2.

Kai-Heng



Robin.


Signed-off-by: Kai-Heng Feng 
---
   drivers/iommu/iommu.c | 5 +++--
   1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5419c4b9f27a..faac4f795025 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1507,14 +1507,15 @@ EXPORT_SYMBOL_GPL(fsl_mc_device_group);
   static int iommu_get_def_domain_type(struct device *dev)
   {
   const struct iommu_ops *ops = dev->bus->iommu_ops;
+ unsigned int type = 0;

   if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted)
   return IOMMU_DOMAIN_DMA;

   if (ops->def_domain_type)
- return ops->def_domain_type(dev);
+ type = ops->def_domain_type(dev);

- return 0;
+ return (type == 0) ? iommu_def_domain_type : type;
   }

   static int iommu_group_alloc_default_domain(struct bus_type *bus,


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Will Deacon
On Tue, Jul 06, 2021 at 04:39:11PM +0100, Robin Murphy wrote:
> On 2021-07-06 15:05, Christoph Hellwig wrote:
> > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > > FWIW I was pondering the question of whether to do something along those
> > > lines or just scrap the default assignment entirely, so since I hadn't got
> > > round to saying that I've gone ahead and hacked up the alternative
> > > (similarly untested) for comparison :)
> > > 
> > > TBH I'm still not sure which one I prefer...
> > 
> > Claire did implement something like your suggestion originally, but
> > I don't really like it as it doesn't scale for adding multiple global
> > pools, e.g. for the 64-bit addressable one for the various encrypted
> > secure guest schemes.
> 
> Ah yes, that had slipped my mind, and it's a fair point indeed. Since we're
> not concerned with a minimal fix for backports anyway I'm more than happy to
> focus on Will's approach. Another thing is that that looks to take us a
> quiet step closer to the possibility of dynamically resizing a SWIOTLB pool,
> which is something that some of the hypervisor protection schemes looking to
> build on top of this series may want to explore at some point.

Ok, I'll split that nasty diff I posted up into a reviewable series and we
can take it from there.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Konrad Rzeszutek Wilk
On Tue, Jul 06, 2021 at 05:57:21PM +0100, Will Deacon wrote:
> On Tue, Jul 06, 2021 at 10:46:07AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jul 06, 2021 at 04:05:13PM +0200, Christoph Hellwig wrote:
> > > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > > > FWIW I was pondering the question of whether to do something along 
> > > > those 
> > > > lines or just scrap the default assignment entirely, so since I hadn't 
> > > > got 
> > > > round to saying that I've gone ahead and hacked up the alternative 
> > > > (similarly untested) for comparison :)
> > > >
> > > > TBH I'm still not sure which one I prefer...
> > > 
> > > Claire did implement something like your suggestion originally, but
> > > I don't really like it as it doesn't scale for adding multiple global
> > > pools, e.g. for the 64-bit addressable one for the various encrypted
> > > secure guest schemes.
> > 
> > Couple of things:
> >  - I am not pushing to Linus the Claire's patchset until we have a
> >resolution on this. I hope you all agree that is a sensible way
> >forward as much as I hate doing that.
> 
> Sure, it's a pity but we could clearly use a bit more time to get these
> just right and we've run out of time for 5.14.
> 
> I think the main question I have is how would you like to see patches for
> 5.15? i.e. as patches on top of devel/for-linus-5.14 or something else?

Yes that would be perfect. If there are any dependencies on the rc1, I
can rebase it on top of that.

> 
> Cheers,
> 
> Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Will Deacon
On Tue, Jul 06, 2021 at 10:46:07AM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 06, 2021 at 04:05:13PM +0200, Christoph Hellwig wrote:
> > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > > FWIW I was pondering the question of whether to do something along those 
> > > lines or just scrap the default assignment entirely, so since I hadn't 
> > > got 
> > > round to saying that I've gone ahead and hacked up the alternative 
> > > (similarly untested) for comparison :)
> > >
> > > TBH I'm still not sure which one I prefer...
> > 
> > Claire did implement something like your suggestion originally, but
> > I don't really like it as it doesn't scale for adding multiple global
> > pools, e.g. for the 64-bit addressable one for the various encrypted
> > secure guest schemes.
> 
> Couple of things:
>  - I am not pushing to Linus the Claire's patchset until we have a
>resolution on this. I hope you all agree that is a sensible way
>forward as much as I hate doing that.

Sure, it's a pity but we could clearly use a bit more time to get these
just right and we've run out of time for 5.14.

I think the main question I have is how would you like to see patches for
5.15? i.e. as patches on top of devel/for-linus-5.14 or something else?

Cheers,

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: qcom: Revert "iommu/arm: Cleanup resources in case of probe error path"

2021-07-06 Thread Will Deacon
On Mon, Jul 05, 2021 at 08:56:57AM +0200, Marek Szyprowski wrote:
> QCOM IOMMU driver calls bus_set_iommu() for every IOMMU device controller,
> what fails for the second and latter IOMMU devices. This is intended and
> must be not fatal to the driver registration process. Also the cleanup
> path should take care of the runtime PM state, what is missing in the
> current patch. Revert relevant changes to the QCOM IOMMU driver until
> a proper fix is prepared.
> 
> This partially reverts commit 249c9dc6aa0db74a0f7908efd04acf774e19b155.
> 
> Fixes: 249c9dc6aa0d ("iommu/arm: Cleanup resources in case of probe error 
> path")
> Suggested-by: Will Deacon 
> Signed-off-by: Marek Szyprowski 
> ---
>  drivers/iommu/arm/arm-smmu/qcom_iommu.c | 13 ++---
>  1 file changed, 2 insertions(+), 11 deletions(-)

Thanks, Marek:

Acked-by: Will Deacon 

Joerg -- please can you pick this up as a fix?

Cheers,

Will

> diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c 
> b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> index 25ed444ff94d..021cf8f65ffc 100644
> --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> @@ -849,12 +849,10 @@ static int qcom_iommu_device_probe(struct 
> platform_device *pdev)
>   ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, dev);
>   if (ret) {
>   dev_err(dev, "Failed to register iommu\n");
> - goto err_sysfs_remove;
> + return ret;
>   }
>  
> - ret = bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
> - if (ret)
> - goto err_unregister_device;
> + bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
>  
>   if (qcom_iommu->local_base) {
>   pm_runtime_get_sync(dev);
> @@ -863,13 +861,6 @@ static int qcom_iommu_device_probe(struct 
> platform_device *pdev)
>   }
>  
>   return 0;
> -
> -err_unregister_device:
> - iommu_device_unregister(&qcom_iommu->iommu);
> -
> -err_sysfs_remove:
> - iommu_device_sysfs_remove(&qcom_iommu->iommu);
> - return ret;
>  }
>  
>  static int qcom_iommu_device_remove(struct platform_device *pdev)
> -- 
> 2.17.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: Fallback to default setting when def_domain_type() callback returns 0

2021-07-06 Thread Kai-Heng Feng
On Tue, Jul 6, 2021 at 5:27 PM Robin Murphy  wrote:
>
> On 2021-07-06 07:51, Kai-Heng Feng wrote:
> > Commit 28b41e2c6aeb ("iommu: Move def_domain type check for untrusted
> > device into core") not only moved the check for untrusted device to
> > IOMMU core, it also introduced a behavioral change by returning
> > def_domain_type() directly without checking its return value. That makes
> > many devices no longer use the default IOMMU setting.
> >
> > So revert back to the old behavior which defaults to
> > iommu_def_domain_type when driver callback returns 0.
> >
> > Fixes: 28b41e2c6aeb ("iommu: Move def_domain type check for untrusted 
> > device into core")
>
> Are you sure about that? From that same commit:
>
> @@ -1507,7 +1509,7 @@ static int iommu_alloc_default_domain(struct
> iommu_group *group,
>  if (group->default_domain)
>  return 0;
>
> -   type = iommu_get_def_domain_type(dev);
> +   type = iommu_get_def_domain_type(dev) ? : iommu_def_domain_type;
>
>  return iommu_group_alloc_default_domain(dev->bus, group, type);
>   }
>
> AFAICS the other two callers should also handle 0 correctly. Have you
> seen a problem in practice?

Thanks for pointing out how the return value is being handled by the callers.
However, the same check is missing in probe_get_default_domain_type():
static int probe_get_default_domain_type(struct device *dev, void *data)
{
struct __group_domain_type *gtype = data;
unsigned int type = iommu_get_def_domain_type(dev);
...
}

I personally prefer the old way instead of open coding with ternary
operator, so I'll do that in v2.

In practice, this causes a kernel panic when probing Realtek WiFi.
Because of the bug, dma_ops isn't set by probe_finalize(),
dma_map_single() falls back to swiotlb which isn't set and caused a
kernel panic.
I didn't attach the panic log because the system simply is frozen at
that point so the message is not logged to the storage.
I'll see if I can find another way to collect the log and attach it in v2.

Kai-Heng

>
> Robin.
>
> > Signed-off-by: Kai-Heng Feng 
> > ---
> >   drivers/iommu/iommu.c | 5 +++--
> >   1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 5419c4b9f27a..faac4f795025 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1507,14 +1507,15 @@ EXPORT_SYMBOL_GPL(fsl_mc_device_group);
> >   static int iommu_get_def_domain_type(struct device *dev)
> >   {
> >   const struct iommu_ops *ops = dev->bus->iommu_ops;
> > + unsigned int type = 0;
> >
> >   if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted)
> >   return IOMMU_DOMAIN_DMA;
> >
> >   if (ops->def_domain_type)
> > - return ops->def_domain_type(dev);
> > + type = ops->def_domain_type(dev);
> >
> > - return 0;
> > + return (type == 0) ? iommu_def_domain_type : type;
> >   }
> >
> >   static int iommu_group_alloc_default_domain(struct bus_type *bus,
> >
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Robin Murphy

On 2021-07-06 15:05, Christoph Hellwig wrote:

On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:

FWIW I was pondering the question of whether to do something along those
lines or just scrap the default assignment entirely, so since I hadn't got
round to saying that I've gone ahead and hacked up the alternative
(similarly untested) for comparison :)

TBH I'm still not sure which one I prefer...


Claire did implement something like your suggestion originally, but
I don't really like it as it doesn't scale for adding multiple global
pools, e.g. for the 64-bit addressable one for the various encrypted
secure guest schemes.


Ah yes, that had slipped my mind, and it's a fair point indeed. Since 
we're not concerned with a minimal fix for backports anyway I'm more 
than happy to focus on Will's approach. Another thing is that that looks 
to take us a quiet step closer to the possibility of dynamically 
resizing a SWIOTLB pool, which is something that some of the hypervisor 
protection schemes looking to build on top of this series may want to 
explore at some point.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Konrad Rzeszutek Wilk
On Tue, Jul 06, 2021 at 04:05:13PM +0200, Christoph Hellwig wrote:
> On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > FWIW I was pondering the question of whether to do something along those 
> > lines or just scrap the default assignment entirely, so since I hadn't got 
> > round to saying that I've gone ahead and hacked up the alternative 
> > (similarly untested) for comparison :)
> >
> > TBH I'm still not sure which one I prefer...
> 
> Claire did implement something like your suggestion originally, but
> I don't really like it as it doesn't scale for adding multiple global
> pools, e.g. for the 64-bit addressable one for the various encrypted
> secure guest schemes.

Couple of things:
 - I am not pushing to Linus the Claire's patchset until we have a
   resolution on this. I hope you all agree that is a sensible way
   forward as much as I hate doing that.

 - I like Robin's fix as it is simplest looking. Would love to see if it
   does fix the problem.

 - Christopher - we can always add multiple pools as the next milestone
   and just focus on this feature getting tested extensively during this
   release.

 - Would it be worth (for future or maybe in another tiny fix) to also add
   a printk in swiotlb when we de-allocate the buffer so when someone looks
   through the `dmesg` it becomes much easier to diagnose issues?

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 3/5] drm/msm: Improve the a6xx page fault handler

2021-07-06 Thread Yassine Oudjana via iommu
In-Reply-To: <20210610214431.539029-4-robdcl...@gmail.com>

On Thu 10 Jun 16:44 CDT 2021, Rob Clark wrote:

> From: Jordan Crouse 
>
> Use the new adreno-smmu-priv fault info function to get more SMMU
> debug registers and print the current TTBR0 to debug per-instance
> pagetables and figure out which GPU block generated the request.
>
> Signed-off-by: Jordan Crouse 
> Signed-off-by: Rob Clark 
> ---
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  4 +-
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 76 +--
>  drivers/gpu/drm/msm/msm_iommu.c   | 11 +++-
>  drivers/gpu/drm/msm/msm_mmu.h |  4 +-
>  4 files changed, 87 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index f46562c12022..eb030b00bff4 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1075,7 +1075,7 @@ bool a5xx_idle(struct msm_gpu *gpu, struct 
> msm_ringbuffer *ring)
>   return true;
>  }
>
> -static int a5xx_fault_handler(void *arg, unsigned long iova, int flags)
> +static int a5xx_fault_handler(void *arg, unsigned long iova, int flags, void 
> *data)
>  {
>   struct msm_gpu *gpu = arg;
>   pr_warn_ratelimited("*** gpu fault: iova=%08lx, flags=%d 
> (%u,%u,%u,%u)\n",
> @@ -1085,7 +1085,7 @@ static int a5xx_fault_handler(void *arg, unsigned long 
> iova, int flags)
>   gpu_read(gpu, REG_A5XX_CP_SCRATCH_REG(6)),
>   gpu_read(gpu, REG_A5XX_CP_SCRATCH_REG(7)));
>
> - return -EFAULT;
> + return 0;
>  }
>
>  static void a5xx_cp_err_irq(struct msm_gpu *gpu)
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index c7f0ddb12d8f..fc19db10bff1 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1032,18 +1032,88 @@ static void a6xx_recover(struct msm_gpu *gpu)
>   msm_gpu_hw_init(gpu);
>  }
>
> -static int a6xx_fault_handler(void *arg, unsigned long iova, int flags)
> +static const char *a6xx_uche_fault_block(struct msm_gpu *gpu, u32 mid)
> +{
> + static const char *uche_clients[7] = {
> + "VFD", "SP", "VSC", "VPC", "HLSQ", "PC", "LRZ",
> + };
> + u32 val;
> +
> + if (mid < 1 || mid > 3)
> + return "UNKNOWN";
> +
> + /*
> +  * The source of the data depends on the mid ID read from FSYNR1.
> +  * and the client ID read from the UCHE block
> +  */
> + val = gpu_read(gpu, REG_A6XX_UCHE_CLIENT_PF);
> +
> + /* mid = 3 is most precise and refers to only one block per client */
> + if (mid == 3)
> + return uche_clients[val & 7];
> +
> + /* For mid=2 the source is TP or VFD except when the client id is 0 */
> + if (mid == 2)
> + return ((val & 7) == 0) ? "TP" : "TP|VFD";
> +
> + /* For mid=1 just return "UCHE" as a catchall for everything else */
> + return "UCHE";
> +}
> +
> +static const char *a6xx_fault_block(struct msm_gpu *gpu, u32 id)
> +{
> + if (id == 0)
> + return "CP";
> + else if (id == 4)
> + return "CCU";
> + else if (id == 6)
> + return "CDP Prefetch";
> +
> + return a6xx_uche_fault_block(gpu, id);
> +}
> +
> +#define ARM_SMMU_FSR_TF BIT(1)
> +#define ARM_SMMU_FSR_PF  BIT(3)
> +#define ARM_SMMU_FSR_EF  BIT(4)
> +
> +static int a6xx_fault_handler(void *arg, unsigned long iova, int flags, void 
> *data)
>  {
>   struct msm_gpu *gpu = arg;
> + struct adreno_smmu_fault_info *info = data;
> + const char *type = "UNKNOWN";
>
> - pr_warn_ratelimited("*** gpu fault: iova=%08lx, flags=%d 
> (%u,%u,%u,%u)\n",
> + /*
> +  * Print a default message if we couldn't get the data from the
> +  * adreno-smmu-priv
> +  */
> + if (!info) {
> + pr_warn_ratelimited("*** gpu fault: iova=%.16lx flags=%d 
> (%u,%u,%u,%u)\n",
>   iova, flags,
>   gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(4)),
>   gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(5)),
>   gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(6)),
>   gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(7)));
>
> - return -EFAULT;
> + return 0;
> + }
> +
> + if (info->fsr & ARM_SMMU_FSR_TF)
> + type = "TRANSLATION";
> + else if (info->fsr & ARM_SMMU_FSR_PF)
> + type = "PERMISSION";
> + else if (info->fsr & ARM_SMMU_FSR_EF)
> + type = "EXTERNAL";
> +
> + pr_warn_ratelimited("*** gpu fault: ttbr0=%.16llx iova=%.16lx dir=%s 
> type=%s source=%s (%u,%u,%u,%u)\n",
> + info->ttbr0, iova,
> + flags & IOMMU_FAULT_WRITE ? "WRITE" : "READ", type,
> + a6xx_fault_block(gpu, info->fsynr1 & 0xff),
> + gpu_read(gpu, REG_A6XX_CP_SCRATCH_REG(4))

Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Christoph Hellwig
On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> FWIW I was pondering the question of whether to do something along those 
> lines or just scrap the default assignment entirely, so since I hadn't got 
> round to saying that I've gone ahead and hacked up the alternative 
> (similarly untested) for comparison :)
>
> TBH I'm still not sure which one I prefer...

Claire did implement something like your suggestion originally, but
I don't really like it as it doesn't scale for adding multiple global
pools, e.g. for the 64-bit addressable one for the various encrypted
secure guest schemes.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Robin Murphy

On 2021-07-06 14:24, Will Deacon wrote:

On Tue, Jul 06, 2021 at 06:48:48AM +0200, Christoph Hellwig wrote:

On Mon, Jul 05, 2021 at 08:03:52PM +0100, Will Deacon wrote:

So at this point, the AMD IOMMU driver does:

swiotlb= (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;

where 'swiotlb' is a global variable indicating whether or not swiotlb
is in use. It's picked up a bit later on by pci_swiotlb_late_init(), which
will call swiotlb_exit() if 'swiotlb' is false.

Now, that used to work fine, because swiotlb_exit() clears
'io_tlb_default_mem' to NULL, but now with the restricted DMA changes, I
think that all the devices which have successfully probed beforehand will
have stale pointers to the freed structure in their 'dev->dma_io_tlb_mem'
field.


Yeah.  I don't think we can do that anymore, and I also think it is
a bad idea to start with.


I've had a crack at reworking things along the following lines:

   - io_tlb_default_mem now lives in the BSS, the flexible array member
 is now a pointer and that part is allocated dynamically (downside of
 this is an extra indirection to get at the slots).

   - io_tlb_default_mem.nslabs tells you whether the thing is valid

   - swiotlb_exit() frees the slots array and clears the rest of the
 structure to 0. I also extended it to free the actual slabs, but I'm
 not sure why it wasn't doing that before.

So a non-NULL dev->dma_io_tlb_mem should always be valid to follow.


FWIW I was pondering the question of whether to do something along those 
lines or just scrap the default assignment entirely, so since I hadn't 
got round to saying that I've gone ahead and hacked up the alternative 
(similarly untested) for comparison :)


TBH I'm still not sure which one I prefer...

Robin.

->8-
diff --git a/drivers/base/core.c b/drivers/base/core.c
index ea5b85354526..394abf184c1a 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2847,9 +2847,6 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
-#ifdef CONFIG_SWIOTLB
-   dev->dma_io_tlb_mem = io_tlb_default_mem;
-#endif
 }
 EXPORT_SYMBOL_GPL(device_initialize);

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 39284ff2a6cd..620f16d89a98 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -107,16 +107,21 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;

+static inline struct io_tlb_mem *dev_iotlb_mem(struct device *dev)
+{
+   return dev->dma_io_tlb_mem ?: io_tlb_default_mem;
+}
+
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t 
paddr)

 {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev_iotlb_mem(dev);

return mem && paddr >= mem->start && paddr < mem->end;
 }

 static inline bool is_swiotlb_force_bounce(struct device *dev)
 {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev_iotlb_mem(dev);

return mem && mem->force_bounce;
 }
@@ -167,7 +172,7 @@ bool swiotlb_free(struct device *dev, struct page 
*page, size_t size);


 static inline bool is_swiotlb_for_alloc(struct device *dev)
 {
-   return dev->dma_io_tlb_mem->for_alloc;
+   return dev_iotlb_mem(dev)->for_alloc;
 }
 #else
 static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b7f76bca89bf..f4942149f87d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -359,7 +359,7 @@ static unsigned int swiotlb_align_offset(struct 
device *dev, u64 addr)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, 
size_t size,

   enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev_iotlb_mem(dev);
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
@@ -440,7 +440,7 @@ static unsigned int wrap_index(struct io_tlb_mem 
*mem, unsigned int index)

 static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
  size_t alloc_size)
 {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev_iotlb_mem(dev);
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -522,7 +522,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device 
*dev, phys_addr_t orig_addr,

size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
 {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev_iotlb_mem(dev);
unsigned int offset = swiotlb_al

Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Will Deacon
On Tue, Jul 06, 2021 at 06:48:48AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 05, 2021 at 08:03:52PM +0100, Will Deacon wrote:
> > So at this point, the AMD IOMMU driver does:
> > 
> > swiotlb= (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
> > 
> > where 'swiotlb' is a global variable indicating whether or not swiotlb
> > is in use. It's picked up a bit later on by pci_swiotlb_late_init(), which
> > will call swiotlb_exit() if 'swiotlb' is false.
> > 
> > Now, that used to work fine, because swiotlb_exit() clears
> > 'io_tlb_default_mem' to NULL, but now with the restricted DMA changes, I
> > think that all the devices which have successfully probed beforehand will
> > have stale pointers to the freed structure in their 'dev->dma_io_tlb_mem'
> > field.
> 
> Yeah.  I don't think we can do that anymore, and I also think it is
> a bad idea to start with.

I've had a crack at reworking things along the following lines:

  - io_tlb_default_mem now lives in the BSS, the flexible array member
is now a pointer and that part is allocated dynamically (downside of
this is an extra indirection to get at the slots).

  - io_tlb_default_mem.nslabs tells you whether the thing is valid

  - swiotlb_exit() frees the slots array and clears the rest of the
structure to 0. I also extended it to free the actual slabs, but I'm
not sure why it wasn't doing that before.

So a non-NULL dev->dma_io_tlb_mem should always be valid to follow.

Untested diff below... Nathan, it would be ace if you're brave enough
to give this a shot.

Will

--->8

diff --git a/drivers/base/core.c b/drivers/base/core.c
index bbad7c559901..9e1218f89e4b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2820,7 +2820,7 @@ void device_initialize(struct device *dev)
dev->dma_coherent = dma_default_coherent;
 #endif
 #ifdef CONFIG_SWIOTLB
-   dev->dma_io_tlb_mem = io_tlb_default_mem;
+   dev->dma_io_tlb_mem = &io_tlb_default_mem;
 #endif
 }
 EXPORT_SYMBOL_GPL(device_initialize);
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 785ec7e8be01..f06d9b4f1e0f 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -164,7 +164,7 @@ int __ref xen_swiotlb_init(void)
int rc = -ENOMEM;
char *start;
 
-   if (io_tlb_default_mem != NULL) {
+   if (io_tlb_default_mem.nslabs) {
pr_warn("swiotlb buffer already initialized\n");
return -EEXIST;
}
@@ -547,7 +547,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct 
scatterlist *sgl,
 static int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-   return xen_phys_to_dma(hwdev, io_tlb_default_mem->end - 1) <= mask;
+   return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask;
 }
 
 const struct dma_map_ops xen_swiotlb_dma_ops = {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 39284ff2a6cd..b0cb2a9973f4 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -103,9 +103,9 @@ struct io_tlb_mem {
phys_addr_t orig_addr;
size_t alloc_size;
unsigned int list;
-   } slots[];
+   } *slots;
 };
-extern struct io_tlb_mem *io_tlb_default_mem;
+extern struct io_tlb_mem io_tlb_default_mem;
 
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0ffbaae9fba2..91cd1d413027 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -70,7 +70,7 @@
 
 enum swiotlb_force swiotlb_force;
 
-struct io_tlb_mem *io_tlb_default_mem;
+struct io_tlb_mem io_tlb_default_mem;
 
 /*
  * Max segment that we can provide which (if pages are contingous) will
@@ -101,7 +101,7 @@ early_param("swiotlb", setup_io_tlb_npages);
 
 unsigned int swiotlb_max_segment(void)
 {
-   return io_tlb_default_mem ? max_segment : 0;
+   return io_tlb_default_mem.nslabs ? max_segment : 0;
 }
 EXPORT_SYMBOL_GPL(swiotlb_max_segment);
 
@@ -134,9 +134,9 @@ void __init swiotlb_adjust_size(unsigned long size)
 
 void swiotlb_print_info(void)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = &io_tlb_default_mem;
 
-   if (!mem) {
+   if (!mem->nslabs) {
pr_warn("No low mem\n");
return;
}
@@ -163,11 +163,11 @@ static inline unsigned long nr_slots(u64 val)
  */
 void __init swiotlb_update_mem_attributes(void)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = &io_tlb_default_mem;
void *vaddr;
unsigned long bytes;
 
-   if (!mem || mem->late_alloc)
+   if (!mem->nslabs || mem->late_alloc)
return;
vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
@@ -201,25 +201,24 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
 
 int __init swiotlb_init_with_tbl(ch

Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE

2021-07-06 Thread Stefan Hajnoczi
On Tue, Jul 06, 2021 at 11:04:18AM +0800, Yongji Xie wrote:
> On Mon, Jul 5, 2021 at 8:50 PM Stefan Hajnoczi  wrote:
> >
> > On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote:
> > >
> > > 在 2021/7/4 下午5:49, Yongji Xie 写道:
> > > > > > OK, I get you now. Since the VIRTIO specification says "Device
> > > > > > configuration space is generally used for rarely-changing or
> > > > > > initialization-time parameters". I assume the VDUSE_DEV_SET_CONFIG
> > > > > > ioctl should not be called frequently.
> > > > > The spec uses MUST and other terms to define the precise requirements.
> > > > > Here the language (especially the word "generally") is weaker and 
> > > > > means
> > > > > there may be exceptions.
> > > > >
> > > > > Another type of access that doesn't work with the VDUSE_DEV_SET_CONFIG
> > > > > approach is reads that have side-effects. For example, imagine a field
> > > > > containing an error code if the device encounters a problem unrelated 
> > > > > to
> > > > > a specific virtqueue request. Reading from this field resets the error
> > > > > code to 0, saving the driver an extra configuration space write access
> > > > > and possibly race conditions. It isn't possible to implement those
> > > > > semantics suing VDUSE_DEV_SET_CONFIG. It's another corner case, but it
> > > > > makes me think that the interface does not allow full VIRTIO 
> > > > > semantics.
> > >
> > >
> > > Note that though you're correct, my understanding is that config space is
> > > not suitable for this kind of error propagating. And it would be very hard
> > > to implement such kind of semantic in some transports.  Virtqueue should 
> > > be
> > > much better. As Yong Ji quoted, the config space is used for
> > > "rarely-changing or intialization-time parameters".
> > >
> > >
> > > > Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. And to
> > > > handle the message failure, I'm going to add a return value to
> > > > virtio_config_ops.get() and virtio_cread_* API so that the error can
> > > > be propagated to the virtio device driver. Then the virtio-blk device
> > > > driver can be modified to handle that.
> > > >
> > > > Jason and Stefan, what do you think of this way?
> >
> > Why does VDUSE_DEV_GET_CONFIG need to support an error return value?
> >
> 
> We add a timeout and return error in case userspace never replies to
> the message.
> 
> > The VIRTIO spec provides no way for the device to report errors from
> > config space accesses.
> >
> > The QEMU virtio-pci implementation returns -1 from invalid
> > virtio_config_read*() and silently discards virtio_config_write*()
> > accesses.
> >
> > VDUSE can take the same approach with
> > VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG.
> >
> 
> I noticed that virtio_config_read*() only returns -1 when we access a
> invalid field. But in the VDUSE case, VDUSE_DEV_GET_CONFIG might fail
> when we access a valid field. Not sure if it's ok to silently ignore
> this kind of error.

That's a good point but it's a general VIRTIO issue. Any device
implementation (QEMU userspace, hardware vDPA, etc) can fail, so the
VIRTIO specification needs to provide a way for the driver to detect
this.

If userspace violates the contract then VDUSE needs to mark the device
broken. QEMU's device emulation does something similar with the
vdev->broken flag.

The VIRTIO Device Status field DEVICE_NEEDS_RESET bit can be set by
vDPA/VDUSE to indicate that the device is not operational and must be
reset.

The driver code may still process the -1 value read from the
configuration space. Hopefully this isn't a problem. There is currently
no VIRTIO interface besides DEVICE_NEEDS_RESET to indicate configuration
space access failures. On the other hand, drivers need to handle
malicious devices so they should be able to cope with the -1 value
anyway.

Stefan


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE

2021-07-06 Thread Stefan Hajnoczi
On Tue, Jul 06, 2021 at 10:34:33AM +0800, Jason Wang wrote:
> 
> 在 2021/7/5 下午8:49, Stefan Hajnoczi 写道:
> > On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote:
> > > 在 2021/7/4 下午5:49, Yongji Xie 写道:
> > > > > > OK, I get you now. Since the VIRTIO specification says "Device
> > > > > > configuration space is generally used for rarely-changing or
> > > > > > initialization-time parameters". I assume the VDUSE_DEV_SET_CONFIG
> > > > > > ioctl should not be called frequently.
> > > > > The spec uses MUST and other terms to define the precise requirements.
> > > > > Here the language (especially the word "generally") is weaker and 
> > > > > means
> > > > > there may be exceptions.
> > > > > 
> > > > > Another type of access that doesn't work with the VDUSE_DEV_SET_CONFIG
> > > > > approach is reads that have side-effects. For example, imagine a field
> > > > > containing an error code if the device encounters a problem unrelated 
> > > > > to
> > > > > a specific virtqueue request. Reading from this field resets the error
> > > > > code to 0, saving the driver an extra configuration space write access
> > > > > and possibly race conditions. It isn't possible to implement those
> > > > > semantics suing VDUSE_DEV_SET_CONFIG. It's another corner case, but it
> > > > > makes me think that the interface does not allow full VIRTIO 
> > > > > semantics.
> > > 
> > > Note that though you're correct, my understanding is that config space is
> > > not suitable for this kind of error propagating. And it would be very hard
> > > to implement such kind of semantic in some transports.  Virtqueue should 
> > > be
> > > much better. As Yong Ji quoted, the config space is used for
> > > "rarely-changing or intialization-time parameters".
> > > 
> > > 
> > > > Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. And to
> > > > handle the message failure, I'm going to add a return value to
> > > > virtio_config_ops.get() and virtio_cread_* API so that the error can
> > > > be propagated to the virtio device driver. Then the virtio-blk device
> > > > driver can be modified to handle that.
> > > > 
> > > > Jason and Stefan, what do you think of this way?
> > Why does VDUSE_DEV_GET_CONFIG need to support an error return value?
> > 
> > The VIRTIO spec provides no way for the device to report errors from
> > config space accesses.
> > 
> > The QEMU virtio-pci implementation returns -1 from invalid
> > virtio_config_read*() and silently discards virtio_config_write*()
> > accesses.
> > 
> > VDUSE can take the same approach with
> > VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG.
> > 
> > > I'd like to stick to the current assumption thich get_config won't fail.
> > > That is to say,
> > > 
> > > 1) maintain a config in the kernel, make sure the config space read can
> > > always succeed
> > > 2) introduce an ioctl for the vduse usersapce to update the config space.
> > > 3) we can synchronize with the vduse userspace during set_config
> > > 
> > > Does this work?
> > I noticed that caching is also allowed by the vhost-user protocol
> > messages (QEMU's docs/interop/vhost-user.rst), but the device doesn't
> > know whether or not caching is in effect. The interface you outlined
> > above requires caching.
> > 
> > Is there a reason why the host kernel vDPA code needs to cache the
> > configuration space?
> 
> 
> Because:
> 
> 1) Kernel can not wait forever in get_config(), this is the major difference
> with vhost-user.

virtio_cread() can sleep:

  #define virtio_cread(vdev, structname, member, ptr) \
  do {\
  typeof(((structname*)0)->member) virtio_cread_v;\
  \
  might_sleep();  \
  ^^

Which code path cannot sleep?

> 2) Stick to the current assumption that virtio_cread() should always
> succeed.

That can be done by reading -1 (like QEMU does) when the read fails.

Stefan


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu: Fallback to default setting when def_domain_type() callback returns 0

2021-07-06 Thread Robin Murphy

On 2021-07-06 07:51, Kai-Heng Feng wrote:

Commit 28b41e2c6aeb ("iommu: Move def_domain type check for untrusted
device into core") not only moved the check for untrusted device to
IOMMU core, it also introduced a behavioral change by returning
def_domain_type() directly without checking its return value. That makes
many devices no longer use the default IOMMU setting.

So revert back to the old behavior which defaults to
iommu_def_domain_type when driver callback returns 0.

Fixes: 28b41e2c6aeb ("iommu: Move def_domain type check for untrusted device into 
core")


Are you sure about that? From that same commit:

@@ -1507,7 +1509,7 @@ static int iommu_alloc_default_domain(struct 
iommu_group *group,

if (group->default_domain)
return 0;

-   type = iommu_get_def_domain_type(dev);
+   type = iommu_get_def_domain_type(dev) ? : iommu_def_domain_type;

return iommu_group_alloc_default_domain(dev->bus, group, type);
 }

AFAICS the other two callers should also handle 0 correctly. Have you 
seen a problem in practice?


Robin.


Signed-off-by: Kai-Heng Feng 
---
  drivers/iommu/iommu.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5419c4b9f27a..faac4f795025 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1507,14 +1507,15 @@ EXPORT_SYMBOL_GPL(fsl_mc_device_group);
  static int iommu_get_def_domain_type(struct device *dev)
  {
const struct iommu_ops *ops = dev->bus->iommu_ops;
+   unsigned int type = 0;
  
  	if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted)

return IOMMU_DOMAIN_DMA;
  
  	if (ops->def_domain_type)

-   return ops->def_domain_type(dev);
+   type = ops->def_domain_type(dev);
  
-	return 0;

+   return (type == 0) ? iommu_def_domain_type : type;
  }
  
  static int iommu_group_alloc_default_domain(struct bus_type *bus,



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 1/1] dma-debug: fix check_for_illegal_area() in debug_dma_map_sg()

2021-07-06 Thread Robin Murphy

On 2021-07-05 19:52, Gerald Schaefer wrote:

The following warning occurred sporadically on s390:
DMA-API: nvme 0006:00:00.0: device driver maps memory from kernel text or 
rodata [addr=48cc5e2f] [len=131072]
WARNING: CPU: 4 PID: 825 at kernel/dma/debug.c:1083 
check_for_illegal_area+0xa8/0x138

It is a false-positive warning, due to a broken logic in debug_dma_map_sg().
check_for_illegal_area() should check for overlay of sg elements with kernel
text or rodata. It is called with sg_dma_len(s) instead of s->length as
parameter. After the call to ->map_sg(), sg_dma_len() contains the length
of possibly combined sg elements in the DMA address space, and not the
individual sg element length, which would be s->length.

The check will then use the kernel start address of an sg element, and add
the DMA length for overlap check, which can result in the false-positive
warning because the DMA length can be larger than the actual single sg
element length in kernel address space.

In addition, the call to check_for_illegal_area() happens in the iteration
over mapped_ents, which will not include all individual sg elements if
any of them were combined in ->map_sg().

Fix this by using s->length instead of sg_dma_len(s). Also put the call to
check_for_illegal_area() in a separate loop, iterating over all the
individual sg elements ("nents" instead of "mapped_ents").

Fixes: 884d05970bfb ("dma-debug: use sg_dma_len accessor")
Tested-by: Niklas Schnelle 
Signed-off-by: Gerald Schaefer 
---
  kernel/dma/debug.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 14de1271463f..d7d44b7fe7e2 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -1299,6 +1299,12 @@ void debug_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
if (unlikely(dma_debug_disabled()))
return;
  
+	for_each_sg(sg, s, nents, i) {

+   if (!PageHighMem(sg_page(s))) {
+   check_for_illegal_area(dev, sg_virt(s), s->length);
+   }
+   }
+
for_each_sg(sg, s, mapped_ents, i) {
entry = dma_entry_alloc();
if (!entry)
@@ -1316,10 +1322,6 @@ void debug_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
  
  		check_for_stack(dev, sg_page(s), s->offset);


Strictly this should probably be moved to the new loop as well, as it is 
similarly concerned with validating the source segments rather than the 
DMA mappings - I think with virtually-mapped stacks it might technically 
be possible for a stack page to be physically adjacent to a "valid" page 
such that it could get merged and overlooked if it were near the end of 
the list, although in fairness that would probably be indicative of 
something having gone far more fundamentally wrong. Otherwise, the 
overall reasoning looks sound to me.


Robin.

  
-		if (!PageHighMem(sg_page(s))) {

-   check_for_illegal_area(dev, sg_virt(s), sg_dma_len(s));
-   }
-
check_sg_segment(dev, s);
  
  		add_dma_entry(entry);



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu