Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops
On 2022/5/17 21:13, Jason Gunthorpe wrote: On Tue, May 17, 2022 at 01:43:03PM +0100, Robin Murphy wrote: FWIW from my point of view I'm happy with having a .detach_dev_pasid op meaning implicitly-blocked access for now. If this is the path then lets not call it attach/detach please. 'set_dev_pasid' and 'set_dev_blocking_pasid' are clearer names. Sure. And with the blocking domain implemented, the set_dev_blocking_pasid could be deprecated. On SMMUv3, PASIDs don't mix with our current notion of IOMMU_DOMAIN_IDENTITY (nor the potential one for IOMMU_DOMAIN_BLOCKED), so giving PASIDs functional symmetry with devices would need significantly more work anyway. There is no extra work in the driver, beyond SMMU having to implement a blocking domain. The blocking domain's set_dev_pasid op simply is whatever set_dev_blocking_pasid would have done on the unmanaged domain. identity doesn't come into this, identity domains should have a NULL set_dev_pasid op if the driver can't support using it on a PASID. IMHO blocking_domain->ops->set_dev_pasid() is just a more logical name than domain->ops->set_dev_blocking_pasid() - especially since VFIO would like drivers to implement blocking domain anyhow. Perhaps implementing blocking domains for intel and smmuv3 iommu drivers seem to be a more practical start. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 8/8] dt-bindings: iommu: mediatek: Require mediatek,pericfg for mt8195-infra
On Tue, 17 May 2022 15:21:07 +0200, AngeloGioacchino Del Regno wrote: > The MT8195 SoC has IOMMU related registers in the pericfg_ao iospace: > require a phandle to that. > > Signed-off-by: AngeloGioacchino Del Regno > > --- > > Note for Rob: as of now, there's no iommu node in upstream mt8195 devicetrees > yet. > > .../devicetree/bindings/iommu/mediatek,iommu.yaml | 10 ++ > 1 file changed, 10 insertions(+) > Acked-by: Rob Herring ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 7/8] dt-bindings: iommu: mediatek: Require mediatek,infracfg for mt2712/8173
On Tue, May 17, 2022 at 03:21:06PM +0200, AngeloGioacchino Del Regno wrote: > Both MT2712 and MT8173 got a mediatek,infracfg phandle: add that to > the required properties for these SoCs to deprecate the old way of > looking for SoC-specific infracfg compatible in the entire devicetree. Wait, what? If there's only one possible node that can match, I prefer the 'old way'. Until we implemented a phandle cache, searching the entire tree was how phandle lookups worked too, so not any better. But if this makes things more consistent, Acked-by: Rob Herring ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu: mtk_iommu: Add support for MT6795 Helio X10 M4Us
On Tue, 2022-05-17 at 11:26 +0200, AngeloGioacchino Del Regno wrote: > Il 17/05/22 11:08, Yong Wu ha scritto: > > On Fri, 2022-05-13 at 17:14 +0200, AngeloGioacchino Del Regno > > wrote: > > > Add support for the M4Us found in the MT6795 Helio X10 SoC. > > > > > > Signed-off-by: AngeloGioacchino Del Regno < > > > angelogioacchino.delre...@collabora.com> > > > --- > > > drivers/iommu/mtk_iommu.c | 20 +++- > > > 1 file changed, 19 insertions(+), 1 deletion(-) [...] > > > +static const struct mtk_iommu_plat_data mt6795_data = { > > > + .m4u_plat = M4U_MT6795, > > > + .flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI | > > > + HAS_LEGACY_IVRP_PADDR | MTK_IOMMU_TYPE_MM, > > > + .inv_sel_reg = REG_MMU_INV_SEL_GEN1, > > > + .banks_num= 1, > > > + .banks_enable = {true}, > > > + .iova_region = single_domain, > > > + .iova_region_nr = ARRAY_SIZE(single_domain), > > > + .larbid_remap = {{0}, {1}, {2}, {3}, {4}}, /* Linear mapping. > > > */ > > > +}; > > > > This is nearly same with mt8173_data. mt8173 has one more larb than > > mt6795, its larbid_remap is also ok for mt6795. > > > > I think that we should be explicit about the larbid_remap property, > since mt6795 has one less larb, we should explicitly say that like > I did there... that's only for human readability I admit ... but, > still, I wouldn't want to see people thinking that MT6795 has 6 LARBs > because they've read that larbid_remap having 6 entries. In the linear mapping case, It does help. Strictly larbid_remap is two- dimensional array now, It doesn't indicate how many larbs this SoC has. If someone would like to know this, he could read the mtxxx-larb- port.h. We also could document the larb numbers in the binding for readability. Anyway, It is not a big problem. A new structure is ok for me. I just complain there are too many structures, specially in the internal branch which contains many non-upstream SoCs, and there may be several IOMMU HWs in one SoC. thus I'd like to group it personally. > > > thus it looks we could use mt8173 as the backward compatible. > > compatible = "mediatek,mt6795-m4u", > > "mediatek,mt8173-m4u"; > > > > After this, the only thing is about "mediatek,mt6795-infracfg". we > > have > > to try again with mediatek,mt6795-infracfg after mediatek,mt8173- > > infracfg fail. I think we should allow the backward case in 4GB > > mode > > judgment if we have. > > > > What's your opinion? or some other suggestion? > > Thanks. > > I know, I may have a plan for that, but I wanted to have a good > reason to > propose such a thing, as if it's just about two SoCs needing that, > there > would be no good reason to get things done differently. > > ...so, in order to provide a good cleanup, we have two possible roads > to > follow here: either we add a generic "mediatek,infracfg" compatible > to the > infra node (but I don't like that), or we can do it like it was > previously > done in mtk-pm-domains.c (I prefer that approach): > > iommu: iommu@somewhere { > ... something ... > mediatek,infracfg = <&infracfg>; We like this too. But this was not suggested from Rob long before. https://lore.kernel.org/linux-mediatek/20200715205120.GA778876@bogus/ > }; > > infracfg = syscon_regmap_lookup_by_compatible(node, > "mediatek,infracfg"); > if (IS_ERR(infracfg)) { > /* try with the older way */ > switch (...) { > case p = "mediatek,mt2712-infracfg"; > ... blah blah ... > } > /* legacy also failed, ouch! */ > if (IS_ERR(infracfg)) > return PTR_ERR(infracfg); > } > > ret = regmap_read ... etc etc etc > > Cheers, > Angelo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()
On Sat, May 14, 2022 at 10:17:38AM +0200, Thomas Gleixner wrote: > On Fri, May 13 2022 at 14:19, Ricardo Neri wrote: > > On Fri, May 06, 2022 at 11:41:13PM +0200, Thomas Gleixner wrote: > >> The argument about not bloating the code > >> with an "obvious???" function which is quite small is slightly beyond my > >> comprehension level. > > > > That obvious function would look like this: > > > > void hpet_set_comparator_one_shot(int channel, u32 delta) > > { > > u32 count; > > > > count = hpet_readl(HPET_COUNTER); > > count += delta; > > hpet_writel(count, HPET_Tn_CMP(channel)); > > } > > This function only works reliably when the delta is large. See > hpet_clkevt_set_next_event(). That is a good point. One more reason to not have a hpet_set_comparator_one_shot(), IMO. Thanks and BR, Ricardo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz
On Tue, May 10, 2022 at 01:44:05PM +0200, Thomas Gleixner wrote: > On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote: > > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am: > >> + /* > >> + * If in use, the HPET hardlockup detector relies on tsc_khz. > >> + * Reconfigure it to make use of the refined tsc_khz. > >> + */ > >> + lockup_detector_reconfigure(); > > > > I don't know if the API is conceptually good. > > > > You change something that the lockup detector is currently using, > > *while* the detector is running asynchronously, and then reconfigure > > it. What happens in the window? If this code is only used for small > > adjustments maybe it does not really matter but in principle it's > > a bad API to export. > > > > lockup_detector_reconfigure as an internal API is okay because it > > reconfigures things while the watchdog is stopped [actually that > > looks untrue for soft dog which uses watchdog_thresh in > > is_softlockup(), but that should be fixed]. > > > > You're the arch so you're allowed to stop the watchdog and configure > > it, e.g., hardlockup_detector_perf_stop() is called in arch/. > > > > So you want to disable HPET watchdog if it was enabled, then update > > wherever you're using tsc_khz, then re-enable. > > The real question is whether making this refined tsc_khz value > immediately effective matters at all. IMO, it does not because up to > that point the watchdog was happily using the coarse calibrated value > and the whole use TSC to assess whether the HPET fired mechanism is just > a guestimate anyway. So what's the point of trying to guess 'more > correct'. In some of my test systems I observed that, the TSC value does not fall within the expected error window the first time the HPET channel expires. I inferred that the error computed using the coarser tsc_khz was wrong. Recalculating the error window with refined tsc_khz would correct it. However, restarting the timer has the side-effect of kicking the timer and, therefore pushing the first HPET NMI further in the future. Perhaps kicking HPET channel, not recomputing the error window, corrected (masked?) the problem. I will investigate further and rework or drop this patch as needed. Thanks and BR, Ricardo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz
On Tue, May 10, 2022 at 09:16:21PM +1000, Nicholas Piggin wrote: > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am: > > The HPET hardlockup detector relies on tsc_khz to estimate the value of > > that the TSC will have when its HPET channel fires. A refined tsc_khz > > helps to estimate better the expected TSC value. > > > > Using the early value of tsc_khz may lead to a large error in the expected > > TSC value. Restarting the NMI watchdog detector has the effect of kicking > > its HPET channel and make use of the refined tsc_khz. > > > > When the HPET hardlockup is not in use, restarting the NMI watchdog is > > a noop. > > > > Cc: Andi Kleen > > Cc: Stephane Eranian > > Cc: "Ravi V. Shankar" > > Cc: iommu@lists.linux-foundation.org > > Cc: linuxppc-...@lists.ozlabs.org > > Cc: x...@kernel.org > > Signed-off-by: Ricardo Neri > > --- > > Changes since v5: > > * Introduced this patch > > > > Changes since v4 > > * N/A > > > > Changes since v3 > > * N/A > > > > Changes since v2: > > * N/A > > > > Changes since v1: > > * N/A > > --- > > arch/x86/kernel/tsc.c | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c > > index cafacb2e58cc..cc1843044d88 100644 > > --- a/arch/x86/kernel/tsc.c > > +++ b/arch/x86/kernel/tsc.c > > @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct > > work_struct *work) > > /* Inform the TSC deadline clockevent devices about the recalibration */ > > lapic_update_tsc_freq(); > > > > + /* > > +* If in use, the HPET hardlockup detector relies on tsc_khz. > > +* Reconfigure it to make use of the refined tsc_khz. > > +*/ > > + lockup_detector_reconfigure(); > > I don't know if the API is conceptually good. > > You change something that the lockup detector is currently using, > *while* the detector is running asynchronously, and then reconfigure > it. Yes, this is what happens. > What happens in the window? If this code is only used for small > adjustments maybe it does not really matter Please see my comment > but in principle it's a bad API to export. > > lockup_detector_reconfigure as an internal API is okay because it > reconfigures things while the watchdog is stopped I see. > [actually that looks untrue for soft dog which uses watchdog_thresh in > is_softlockup(), but that should be fixed]. Perhaps there should be a watchdog_thresh_user. When the user updates it, the detector is stopped, watchdog_thresh is updated, and then the detector is resumed. > > You're the arch so you're allowed to stop the watchdog and configure > it, e.g., hardlockup_detector_perf_stop() is called in arch/. I had it like this but it did not look right to me. You are right, however, I can stop/restart the watchdog when needed. Thanks and BR, Ricardo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
On Tue, 10 May 2022 13:55:24 -0300 Jason Gunthorpe wrote: > This control causes the ARM SMMU drivers to choose a stage 2 > implementation for the IO pagetable (vs the stage 1 usual default), > however this choice has no visible impact to the VFIO user. Further qemu > never implemented this and no other userspace user is known. > > The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add > new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide > SMMU translation services to the guest operating system" however the rest > of the API to set the guest table pointer for the stage 1 was never > completed, or at least never upstreamed, rendering this part useless dead > code. > > Since the current patches to enable nested translation, aka userspace page > tables, rely on iommufd and will not use the enable_nesting() > iommu_domain_op, remove this infrastructure. However, don't cut too deep > into the SMMU drivers for now expecting the iommufd work to pick it up - > we still need to create S2 IO page tables. > > Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the > enable_nesting iommu_domain_op. > > Just in-case there is some userspace using this continue to treat > requesting it as a NOP, but do not advertise support any more. > > Signed-off-by: Jason Gunthorpe > --- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 > drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 > drivers/iommu/iommu.c | 10 -- > drivers/vfio/vfio_iommu_type1.c | 12 +--- > include/linux/iommu.h | 3 --- > include/uapi/linux/vfio.h | 2 +- > 6 files changed, 2 insertions(+), 57 deletions(-) > > It would probably make sense for this to go through the VFIO tree with Robin's > ack for the SMMU changes. I'd be in favor of applying this, but it seems Robin and Eric are looking for a stay of execution and I'd also be looking for an ack from Joerg. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category
On Mon, May 09, 2022 at 03:59:40PM +0200, Thomas Gleixner wrote: > On Thu, May 05 2022 at 17:00, Ricardo Neri wrote: > > Add a NMI_WATCHDOG as a new category of NMI handler. This new category > > is to be used with the HPET-based hardlockup detector. This detector > > does not have a direct way of checking if the HPET timer is the source of > > the NMI. Instead, it indirectly estimates it using the time-stamp counter. > > > > Therefore, we may have false-positives in case another NMI occurs within > > the estimated time window. For this reason, we want the handler of the > > detector to be called after all the NMI_LOCAL handlers. A simple way > > of achieving this with a new NMI handler category. > > > > @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs > > *regs) > > } > > raw_spin_unlock(&nmi_reason_lock); > > > > + handled = nmi_handle(NMI_WATCHDOG, regs); > > + if (handled == NMI_HANDLED) > > + goto out; > > + > > How is this supposed to work reliably? > > If perf is active and the HPET NMI and the perf NMI come in around the > same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog > won't be checked. Because MSI is strictly edge and the message is only > sent once, this can result in a stale watchdog, no? This is true. Instead, at the end of each NMI I should _also_ check if the TSC is within the expected value of the HPET NMI watchdog. In this way, unrelated NMIs (e.g., perf NMI) are handled and we don't miss the NMI from the HPET channel. Thanks and BR, Ricardo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Error when running fio against nvme-of rdma target (mlx5 driver)
Hi Robin, I ran into the exact same problem while testing with 4 connect-x6 cards, kernel 5.18-rc6. [ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts) [ 4879.122015] nvme nvme0: starting error recovery [ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error [ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe [ 4879.122037] : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 4879.122039] 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 4879.122040] 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 4879.122040] 0030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2 [ 4881.085547] nvme nvme3: Reconnecting in 10 seconds... I assume this means that the problem has still not been resolved? If so, I'll try to diagnose the problem. Thanks, --Mark On 11/02/2022, 12:35, "Linux-nvme on behalf of Robin Murphy" wrote: On 2022-02-10 23:58, Martin Oliveira wrote: > On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote: >> On 2/8/22 6:50 PM, Martin Oliveira wrote: >>> Hello, >>> >>> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions. >>> >>> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace. >>> >> >> Thanks for reporting this, if you can bisect the problem on your setup >> it will help others to help you better. >> >> -ck > > Hi Chaitanya, > > I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from. > > I also learned that I can reproduce this with as little as 3 cards and I updated the firmware on the Mellanox cards to the latest version. > > I'd be happy to try any tests if someone has any suggestions. The IOMMU is probably your friend here - one thing that might be worth trying is capturing the iommu:map and iommu:unmap tracepoints to see if the address reported in subsequent IOMMU faults was previously mapped as a valid DMA address (be warned that there will likely be a *lot* of trace generated). With 5.13 or newer, booting with "iommu.forcedac=1" should also make it easier to tell real DMA IOVAs from rogue physical addresses or other nonsense, as real DMA addresses should then look more like 0x24d08000. That could at least help narrow down whether it's some kind of use-after-free race or a completely bogus address creeping in somehow. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/8] dt-bindings: iommu: mediatek: Add mediatek,pericfg phandle
On 17/05/2022 15:21, AngeloGioacchino Del Regno wrote: > Add property "mediatek,pericfg" to let the mtk_iommu driver retrieve > a phandle to the pericfg syscon instead of performing a per-soc > compatible lookup, as it was also done with infracfg. > > Signed-off-by: AngeloGioacchino Del Regno > > --- > Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml | 4 > 1 file changed, 4 insertions(+) > > diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml > b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml > index 78c72c22740b..a6cf9678271f 100644 > --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml > +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml > @@ -116,6 +116,10 @@ properties: >Refer to bindings/memory-controllers/mediatek,smi-larb.yaml. It must > sort >according to the local arbiter index, like larb0, larb1, larb2... > > + mediatek,pericfg: > +$ref: "/schemas/types.yaml#/definitions/phandle" No need for quotes. Best regards, Krzysztof ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/8] dt-bindings: iommu: mediatek: Add mediatek,infracfg phandle
On 17/05/2022 15:21, AngeloGioacchino Del Regno wrote: > Add property "mediatek,infracfg" to let the mtk_iommu driver retrieve > a phandle to the infracfg syscon instead of performing a per-soc > compatible lookup. > > Signed-off-by: AngeloGioacchino Del Regno > > --- > Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml | 4 > 1 file changed, 4 insertions(+) > > diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml > b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml > index 2ae3bbad7f1a..78c72c22740b 100644 > --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml > +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml > @@ -101,6 +101,10 @@ properties: > items: >- const: bclk > > + mediatek,infracfg: > +$ref: "/schemas/types.yaml#/definitions/phandle" No need for quotes. They are not present in other places. > +description: The phandle to the mediatek infracfg syscon > + >mediatek,larbs: > $ref: /schemas/types.yaml#/definitions/phandle-array > minItems: 1 Best regards, Krzysztof ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/8] iommu: mtk_iommu: Lookup phandle to retrieve syscon to infracfg
On 2022-05-17 14:21, AngeloGioacchino Del Regno wrote: This driver will get support for more SoCs and the list of infracfg compatibles is expected to grow: in order to prevent getting this situation out of control and see a long list of compatible strings, add support to retrieve a handle to infracfg's regmap through a new "mediatek,infracfg" phandle. In order to keep retrocompatibility with older devicetrees, the old way is kept in place, but also a dev_warn() was added to advertise this change in hope that the user will see it and eventually update the devicetree if this is possible. Signed-off-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c | 40 +-- 1 file changed, 26 insertions(+), 14 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 71b2ace74cd6..cfaaa98d2b50 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -1134,22 +1134,34 @@ static int mtk_iommu_probe(struct platform_device *pdev) data->protect_base = ALIGN(virt_to_phys(protect), MTK_PROTECT_PA_ALIGN); if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE)) { - switch (data->plat_data->m4u_plat) { - case M4U_MT2712: - p = "mediatek,mt2712-infracfg"; - break; - case M4U_MT8173: - p = "mediatek,mt8173-infracfg"; - break; - default: - p = NULL; + infracfg = syscon_regmap_lookup_by_phandle(dev->of_node, "mediatek,infracfg"); + if (IS_ERR(infracfg)) { + dev_warn(dev, "Cannot find phandle to mediatek,infracfg:" + " Please update your devicetree.\n"); Is this really a dev_warn-level problem? There's no functional impact, given that we can't stop supporting the original binding any time soon, if ever, so I suspect this is more likely to just annoy users and CI systems than effect any significant change. + /* +* Legacy devicetrees will not specify a phandle to +* mediatek,infracfg: in that case, we use the older +* way to retrieve a syscon to infra. +* +* This is for retrocompatibility purposes only, hence +* no more compatibles shall be added to this. +*/ + switch (data->plat_data->m4u_plat) { + case M4U_MT2712: + p = "mediatek,mt2712-infracfg"; + break; + case M4U_MT8173: + p = "mediatek,mt8173-infracfg"; + break; + default: + p = NULL; + } + + infracfg = syscon_regmap_lookup_by_compatible(p); Would it not make sense to punt this over to the same mechanism as for pericfg, such that it simplifies down to something like: if (IS_ERR(infracfg) && plat_data->infracfg) { infracfg = syscon_regmap_lookup_by_compatible(plat_data->infracfg); ... } ? TBH if we're still going to have a load of per-SoC data in the driver anyway then I don't see that we really gain much by delegating one aspect of it to DT, but meh. I would note that with the phandle approach, you still need some *other* flag in the driver to know whether a phandle is expected to be present or not, whereas a NULL vs. non-NULL string is at least neatly self-describing. Robin. + if (IS_ERR(infracfg)) + return PTR_ERR(infracfg); } - infracfg = syscon_regmap_lookup_by_compatible(p); - - if (IS_ERR(infracfg)) - return PTR_ERR(infracfg); - ret = regmap_read(infracfg, REG_INFRA_MISC, &val); if (ret) return ret; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()
On 17/05/2022 13:02, Robin Murphy wrote: Indeed, sorry but NAK for this being nonsense. As I've said at least once before, if the unnecessary SAC address allocation attempt slows down your workload, make it not do that in the first place. If you don't like the existing command-line parameter then fine, > there are plenty of other options, it just needs to be done in a way that doesn't break x86 systems with dodgy firmware, as my first attempt turned out to. Sorry, but I am not interested in this. It was discussed in Jan last year without any viable solution. Er, OK, if you're not interested in solving that problem I don't see why you'd bring it up, but hey ho. *I* still think it's important, so I guess I'll revive my old patch with a CONFIG_X86 bodge and have another go at some point. Let me rephrase, I would be happy to help fix that problem if we really can get it fixed, however for my problem it's better to try to get the SCSI driver to stop requesting uncached IOVAs foremost. Anyway we still have the long-term IOVA aging issue, and requesting non-cached IOVAs is involved in that. So I would rather keep the SCSI driver to requesting cached IOVAs all the time. I did try to do it the other way around - configuring the IOVA caching range according to the drivers requirement but that got nowhere. Note that this is still not a final solution as it's not always viable to ask a user to unbind + bind the driver. FWIW I thought that all looked OK, it just kept getting drowned out by more critical things in my inbox so I hoped someone else might comment. If it turns out that I've become the de-facto IOVA maintainer in everyone else's minds now and they're all waiting for my word then fair enough, I just need to know and reset my expectations accordingly. Joerg? It would be great to see an improvement here... Furthermore, if a particular SCSI driver doesn't benefit from mappings larger than 256KB, then that driver is also free to limit its own mapping size. There are other folks out there with use-cases for mapping *gigabytes* at once; you don't get to cripple the API and say that that's suddenly not allowed just because it happens to make your thing go faster, that's absurd. I'd say less catastrophically slow, not faster. So how to inform the SCSI driver of this caching limit then so that it may limit the SGL length? Driver-specific mechanism; block-layer-specific mechanism; redefine this whole API to something like dma_opt_mapping_size(), as a limit above which mappings might become less efficient or start to fail (callback to my thoughts on [1] as well, I suppose); many options. ok, fine. Just not imposing a ridiculously low *maximum* on everyone wherein mapping calls "should not be larger than the returned value" when that's clearly bollocks. I agree that this change is in violation as the documentation clearly implies a hard limit. However, FWIW, from looking at users of dma_max_mapping_size(), they seem to use in a similar way to SCSI/block layer core, i.e. use this value to limit the max SGL total len per IO command. And not as a method to learn max DMA consistent mappings size for ring buffers, etc. Anyway I'll look at dma_opt_mapping_size() but I am not sure how keen Christoph will be on this... it is strange to introduce that API due to peculiarity of the IOVA allocator. [1] https://lore.kernel.org/linux-iommu/20220510142109.38-1-ltyker...@gmail.com/ Thanks, John ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 8/8] dt-bindings: iommu: mediatek: Require mediatek, pericfg for mt8195-infra
The MT8195 SoC has IOMMU related registers in the pericfg_ao iospace: require a phandle to that. Signed-off-by: AngeloGioacchino Del Regno --- Note for Rob: as of now, there's no iommu node in upstream mt8195 devicetrees yet. .../devicetree/bindings/iommu/mediatek,iommu.yaml | 10 ++ 1 file changed, 10 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml index 17d78b17027a..2441c2e8e55d 100644 --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml @@ -187,6 +187,16 @@ allOf: required: - mediatek,infracfg + - if: + properties: +compatible: + contains: +const: mediatek,mt8195-iommu-infra + +then: + required: +- mediatek,pericfg + - if: # The IOMMUs don't have larbs. not: properties: -- 2.35.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/8] mtk_iommu: Specify phandles to infracfg and pericfg
The IOMMU has registers in the infracfg and/or pericfg iospaces: as for the currently supported SoCs, MT2712 and MT8173 need a phandle to infracfg, while MT8195 needs one to pericfg. Before this change, the driver was checking for a SoC-specific infra/peri compatible but, sooner or later, these lists are going to grow a lot... ...and this is why it was chosen to add phandles (as it was done with some other drivers already - look at mtk-pm-domains, mt8192-afe Please note that, while it was necessary to update the devicetrees for MT8173 and MT2712e, there was no update for MT8195 because there is no IOMMU node in there yet. AngeloGioacchino Del Regno (8): dt-bindings: iommu: mediatek: Add mediatek,infracfg phandle iommu: mtk_iommu: Lookup phandle to retrieve syscon to infracfg dt-bindings: iommu: mediatek: Add mediatek,pericfg phandle iommu: mtk_iommu: Lookup phandle to retrieve syscon to pericfg arm64: dts: mediatek: mt8173: Add mediatek,infracfg phandle for IOMMU arm64: dts: mediatek: mt2712e: Add mediatek,infracfg phandle for IOMMU dt-bindings: iommu: mediatek: Require mediatek,infracfg for mt2712/8173 dt-bindings: iommu: mediatek: Require mediatek,pericfg for mt8195-infra .../bindings/iommu/mediatek,iommu.yaml| 30 + arch/arm64/boot/dts/mediatek/mt2712e.dtsi | 2 + arch/arm64/boot/dts/mediatek/mt8173.dtsi | 1 + drivers/iommu/mtk_iommu.c | 66 --- 4 files changed, 75 insertions(+), 24 deletions(-) -- 2.35.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 5/8] arm64: dts: mediatek: mt8173: Add mediatek, infracfg phandle for IOMMU
The IOMMU driver now looks for the "mediatek,infracfg" phandle as a new way to retrieve a syscon to that: even though the old way is retained, it has been deprecated and the driver will write a message in kmsg advertising to use the phandle way instead. For this reason, assign the right phandle to mediatek,infracfg in the iommu node. Signed-off-by: AngeloGioacchino Del Regno --- arch/arm64/boot/dts/mediatek/mt8173.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/boot/dts/mediatek/mt8173.dtsi b/arch/arm64/boot/dts/mediatek/mt8173.dtsi index 40d7b47fc52e..825a3c670373 100644 --- a/arch/arm64/boot/dts/mediatek/mt8173.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8173.dtsi @@ -588,6 +588,7 @@ iommu: iommu@10205000 { interrupts = ; clocks = <&infracfg CLK_INFRA_M4U>; clock-names = "bclk"; + mediatek,infracfg = <&infracfg>; mediatek,larbs = <&larb0>, <&larb1>, <&larb2>, <&larb3>, <&larb4>, <&larb5>; #iommu-cells = <1>; -- 2.35.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/8] dt-bindings: iommu: mediatek: Add mediatek, infracfg phandle
Add property "mediatek,infracfg" to let the mtk_iommu driver retrieve a phandle to the infracfg syscon instead of performing a per-soc compatible lookup. Signed-off-by: AngeloGioacchino Del Regno --- Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml index 2ae3bbad7f1a..78c72c22740b 100644 --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml @@ -101,6 +101,10 @@ properties: items: - const: bclk + mediatek,infracfg: +$ref: "/schemas/types.yaml#/definitions/phandle" +description: The phandle to the mediatek infracfg syscon + mediatek,larbs: $ref: /schemas/types.yaml#/definitions/phandle-array minItems: 1 -- 2.35.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 4/8] iommu: mtk_iommu: Lookup phandle to retrieve syscon to pericfg
On some SoCs (of which only MT8195 is supported at the time of writing), the "R" and "W" (I/O) enable bits for the IOMMUs are in the pericfg_ao register space and not in the IOMMU space: as it happened already with infracfg, it is expected that this list will grow. Instead of specifying pericfg compatibles on a per-SoC basis, following what was done with infracfg, let's lookup the syscon by phandle instead. Also following the previous infracfg change, add a warning for outdated devicetrees, in hope that the user will take action. Signed-off-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c | 26 -- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index cfaaa98d2b50..c7e2d836199e 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -138,6 +138,8 @@ /* PM and clock always on. e.g. infra iommu */ #define PM_CLK_AO BIT(15) #define IFA_IOMMU_PCIE_SUPPORT BIT(16) +/* IOMMU I/O (r/w) is enabled using PERICFG_IOMMU_1 register */ +#define HAS_PERI_IOMMU1_REGBIT(17) #define MTK_IOMMU_HAS_FLAG_MASK(pdata, _x, mask) \ pdata)->flags) & (mask)) == (_x)) @@ -187,7 +189,6 @@ struct mtk_iommu_plat_data { u32 flags; u32 inv_sel_reg; - char*pericfg_comp_str; struct list_head*hw_list; unsigned intiova_region_nr; const struct mtk_iommu_iova_region *iova_region; @@ -1214,14 +1215,19 @@ static int mtk_iommu_probe(struct platform_device *pdev) goto out_runtime_disable; } } else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA) && - data->plat_data->pericfg_comp_str) { - infracfg = syscon_regmap_lookup_by_compatible(data->plat_data->pericfg_comp_str); - if (IS_ERR(infracfg)) { - ret = PTR_ERR(infracfg); - goto out_runtime_disable; - } + MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_PERI_IOMMU1_REG)) { + data->pericfg = syscon_regmap_lookup_by_phandle(dev->of_node, "mediatek,pericfg"); + if (IS_ERR(data->pericfg)) { + dev_warn(dev, "Cannot find phandle to mediatek,pericfg:" + " Please update your devicetree.\n"); - data->pericfg = infracfg; + p = "mediatek,mt8195-pericfg_ao"; + data->pericfg = syscon_regmap_lookup_by_compatible(p); + if (IS_ERR(data->pericfg)) { + ret = PTR_ERR(data->pericfg); + goto out_runtime_disable; + } + } } platform_set_drvdata(pdev, data); @@ -1480,8 +1486,8 @@ static const struct mtk_iommu_plat_data mt8192_data = { static const struct mtk_iommu_plat_data mt8195_data_infra = { .m4u_plat = M4U_MT8195, .flags= WR_THROT_EN | DCM_DISABLE | STD_AXI_MODE | PM_CLK_AO | - MTK_IOMMU_TYPE_INFRA | IFA_IOMMU_PCIE_SUPPORT, - .pericfg_comp_str = "mediatek,mt8195-pericfg_ao", + HAS_PERI_IOMMU1_REG | MTK_IOMMU_TYPE_INFRA | + IFA_IOMMU_PCIE_SUPPORT, .inv_sel_reg = REG_MMU_INV_SEL_GEN2, .banks_num= 5, .banks_enable = {true, false, false, false, true}, -- 2.35.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 6/8] arm64: dts: mediatek: mt2712e: Add mediatek, infracfg phandle for IOMMU
The IOMMU driver now looks for the "mediatek,infracfg" phandle as a new way to retrieve a syscon to that: even though the old way is retained, it has been deprecated and the driver will write a message in kmsg advertising to use the phandle way instead. For this reason, assign the right phandle to mediatek,infracfg in the iommu node. Signed-off-by: AngeloGioacchino Del Regno --- arch/arm64/boot/dts/mediatek/mt2712e.dtsi | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/boot/dts/mediatek/mt2712e.dtsi b/arch/arm64/boot/dts/mediatek/mt2712e.dtsi index 623eb3beabf2..4797537cb368 100644 --- a/arch/arm64/boot/dts/mediatek/mt2712e.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt2712e.dtsi @@ -329,6 +329,7 @@ iommu0: iommu@10205000 { interrupts = ; clocks = <&infracfg CLK_INFRA_M4U>; clock-names = "bclk"; + mediatek,infracfg = <&infracfg>; mediatek,larbs = <&larb0>, <&larb1>, <&larb2>, <&larb3>, <&larb6>; #iommu-cells = <1>; @@ -346,6 +347,7 @@ iommu1: iommu@1020a000 { interrupts = ; clocks = <&infracfg CLK_INFRA_M4U>; clock-names = "bclk"; + mediatek,infracfg = <&infracfg>; mediatek,larbs = <&larb4>, <&larb5>, <&larb7>; #iommu-cells = <1>; }; -- 2.35.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/8] iommu: mtk_iommu: Lookup phandle to retrieve syscon to infracfg
This driver will get support for more SoCs and the list of infracfg compatibles is expected to grow: in order to prevent getting this situation out of control and see a long list of compatible strings, add support to retrieve a handle to infracfg's regmap through a new "mediatek,infracfg" phandle. In order to keep retrocompatibility with older devicetrees, the old way is kept in place, but also a dev_warn() was added to advertise this change in hope that the user will see it and eventually update the devicetree if this is possible. Signed-off-by: AngeloGioacchino Del Regno --- drivers/iommu/mtk_iommu.c | 40 +-- 1 file changed, 26 insertions(+), 14 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 71b2ace74cd6..cfaaa98d2b50 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -1134,22 +1134,34 @@ static int mtk_iommu_probe(struct platform_device *pdev) data->protect_base = ALIGN(virt_to_phys(protect), MTK_PROTECT_PA_ALIGN); if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE)) { - switch (data->plat_data->m4u_plat) { - case M4U_MT2712: - p = "mediatek,mt2712-infracfg"; - break; - case M4U_MT8173: - p = "mediatek,mt8173-infracfg"; - break; - default: - p = NULL; + infracfg = syscon_regmap_lookup_by_phandle(dev->of_node, "mediatek,infracfg"); + if (IS_ERR(infracfg)) { + dev_warn(dev, "Cannot find phandle to mediatek,infracfg:" + " Please update your devicetree.\n"); + /* +* Legacy devicetrees will not specify a phandle to +* mediatek,infracfg: in that case, we use the older +* way to retrieve a syscon to infra. +* +* This is for retrocompatibility purposes only, hence +* no more compatibles shall be added to this. +*/ + switch (data->plat_data->m4u_plat) { + case M4U_MT2712: + p = "mediatek,mt2712-infracfg"; + break; + case M4U_MT8173: + p = "mediatek,mt8173-infracfg"; + break; + default: + p = NULL; + } + + infracfg = syscon_regmap_lookup_by_compatible(p); + if (IS_ERR(infracfg)) + return PTR_ERR(infracfg); } - infracfg = syscon_regmap_lookup_by_compatible(p); - - if (IS_ERR(infracfg)) - return PTR_ERR(infracfg); - ret = regmap_read(infracfg, REG_INFRA_MISC, &val); if (ret) return ret; -- 2.35.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3/8] dt-bindings: iommu: mediatek: Add mediatek, pericfg phandle
Add property "mediatek,pericfg" to let the mtk_iommu driver retrieve a phandle to the pericfg syscon instead of performing a per-soc compatible lookup, as it was also done with infracfg. Signed-off-by: AngeloGioacchino Del Regno --- Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml index 78c72c22740b..a6cf9678271f 100644 --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml @@ -116,6 +116,10 @@ properties: Refer to bindings/memory-controllers/mediatek,smi-larb.yaml. It must sort according to the local arbiter index, like larb0, larb1, larb2... + mediatek,pericfg: +$ref: "/schemas/types.yaml#/definitions/phandle" +description: The phandle to the mediatek pericfg syscon + '#iommu-cells': const: 1 description: | -- 2.35.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 7/8] dt-bindings: iommu: mediatek: Require mediatek, infracfg for mt2712/8173
Both MT2712 and MT8173 got a mediatek,infracfg phandle: add that to the required properties for these SoCs to deprecate the old way of looking for SoC-specific infracfg compatible in the entire devicetree. Signed-off-by: AngeloGioacchino Del Regno --- .../devicetree/bindings/iommu/mediatek,iommu.yaml| 12 1 file changed, 12 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml index a6cf9678271f..17d78b17027a 100644 --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml @@ -175,6 +175,18 @@ allOf: required: - power-domains + - if: + properties: +compatible: + contains: +enum: + - mediatek,mt2712-m4u + - mediatek,mt8173-m4u + +then: + required: +- mediatek,infracfg + - if: # The IOMMUs don't have larbs. not: properties: -- 2.35.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops
On Tue, May 17, 2022 at 01:43:03PM +0100, Robin Murphy wrote: > FWIW from my point of view I'm happy with having a .detach_dev_pasid op > meaning implicitly-blocked access for now. If this is the path then lets not call it attach/detach please. 'set_dev_pasid' and 'set_dev_blocking_pasid' are clearer names. > On SMMUv3, PASIDs don't mix with our current notion of > IOMMU_DOMAIN_IDENTITY (nor the potential one for > IOMMU_DOMAIN_BLOCKED), so giving PASIDs functional symmetry with > devices would need significantly more work anyway. There is no extra work in the driver, beyond SMMU having to implement a blocking domain. The blocking domain's set_dev_pasid op simply is whatever set_dev_blocking_pasid would have done on the unmanaged domain. identity doesn't come into this, identity domains should have a NULL set_dev_pasid op if the driver can't support using it on a PASID. IMHO blocking_domain->ops->set_dev_pasid() is just a more logical name than domain->ops->set_dev_blocking_pasid() - especially since VFIO would like drivers to implement blocking domain anyhow. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops
On Tue, May 17, 2022 at 10:37:55AM +0800, Baolu Lu wrote: > Hi Jason, > > On 2022/5/16 21:57, Jason Gunthorpe wrote: > > On Mon, May 16, 2022 at 12:22:08PM +0100, Robin Murphy wrote: > > > On 2022-05-16 02:57, Lu Baolu wrote: > > > > Each IOMMU driver must provide a blocking domain ops. If the hardware > > > > supports detaching domain from device, setting blocking domain equals > > > > detaching the existing domain from the deivce. Otherwise, an UNMANAGED > > > > domain without any mapping will be used instead. > > > Unfortunately that's backwards - most of the implementations of > > > .detach_dev > > > are disabling translation entirely, meaning the device ends up effectively > > > in passthrough rather than blocked. > > Ideally we'd convert the detach_dev of every driver into either > > a blocking or identity domain. The trick is knowing which is which.. > > I am still a bit puzzled about how the blocking_domain should be used when > it is extended to support ->set_dev_pasid. > > If it's a blocking domain, the IOMMU driver knows that setting the > blocking domain to device pasid means detaching the existing one. > But if it's an identity domain, how could the IOMMU driver choose > between: The identity domain would never be used for detaching a PASID. The reason it is in this explanation is because every detach_dev op should be deleted - and the code that it was doing can remain in the driver as either a blocking or identity domain. > > So, the approach here should be to go driver by driver and convert > > detach_dev to either identity, blocking or just delete it entirely, > > excluding the above 7 that don't support default domains. And get acks > > from the driver owners. > Agreed. There seems to be a long way to go. I am wondering if we could > decouple this refactoring from my new SVA API work? We can easily switch > .detach_dev_pasid to using blocking domain later. Well, PASID has nothing to do with this. PASID enabled drivers would just need: - Must be using default domains and must have a NULL detach_dev op - Must provide a blocking_domain - Must provide set_dev_pasid for the unmanaged and blocking domains it creates That is it. Realigning the existing drivers for their base RID support is another task. The appeal of this is that it matches how base RID support should look and doesn't continue the confusing appearance that there is a strictly paired attach/detach operation. Any domain can be set ony and RID/PASID at any time. Drivers intended to be used with VFIO really want a blocking_domain anyhow for efficiency. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops
On 2022-05-17 03:37, Baolu Lu wrote: Hi Jason, On 2022/5/16 21:57, Jason Gunthorpe wrote: On Mon, May 16, 2022 at 12:22:08PM +0100, Robin Murphy wrote: On 2022-05-16 02:57, Lu Baolu wrote: Each IOMMU driver must provide a blocking domain ops. If the hardware supports detaching domain from device, setting blocking domain equals detaching the existing domain from the deivce. Otherwise, an UNMANAGED domain without any mapping will be used instead. Unfortunately that's backwards - most of the implementations of .detach_dev are disabling translation entirely, meaning the device ends up effectively in passthrough rather than blocked. Ideally we'd convert the detach_dev of every driver into either a blocking or identity domain. The trick is knowing which is which.. I am still a bit puzzled about how the blocking_domain should be used when it is extended to support ->set_dev_pasid. If it's a blocking domain, the IOMMU driver knows that setting the blocking domain to device pasid means detaching the existing one. But if it's an identity domain, how could the IOMMU driver choose between: - setting the input domain to the pasid on device; or, - detaching the existing domain. I've ever thought about below solutions: - Checking the domain types and dispatching them to different operations. - Using different blocking domains for different types of domains. But both look rough. Guessing going down the list: apple dart - blocking, detach_dev calls apple_dart_hw_disable_dma() same as IOMMU_DOMAIN_BLOCKED [I wonder if this drive ris wrong in other ways though because I dont see a remove_streams in attach_dev] exynos - this seems to disable the 'sysmmu' so I'm guessing this is identity iommu-vmsa - Comment says 'disable mmu translaction' so I'm guessing this is idenity mkt_v1 - Code looks similar to mkt, which is probably identity. rkt - No idea sprd - No idea sun50i - This driver confusingly treats identity the same as unmanaged, seems wrong, no idea. amd - Not sure, clear_dte_entry() seems to set translation on but points the PTE to 0 ? Based on the spec table 8 I would have expected TV to be clear which would be blocking. Maybe a bug?? arm smmu qcomm - not sure intel - blocking These doesn't support default domains, so detach_dev should return back to DMA API ownership, which is either identity or something weird: fsl_pamu - identity due to the PPC use of dma direct msm mkt omap s390 - platform DMA ops terga-gart - Usually something called a GART would be 0 length once disabled, guessing blocking? tegra-smmu So, the approach here should be to go driver by driver and convert detach_dev to either identity, blocking or just delete it entirely, excluding the above 7 that don't support default domains. And get acks from the driver owners. Agreed. There seems to be a long way to go. I am wondering if we could decouple this refactoring from my new SVA API work? We can easily switch .detach_dev_pasid to using blocking domain later. FWIW from my point of view I'm happy with having a .detach_dev_pasid op meaning implicitly-blocked access for now. On SMMUv3, PASIDs don't mix with our current notion of IOMMU_DOMAIN_IDENTITY (nor the potential one for IOMMU_DOMAIN_BLOCKED), so giving PASIDs functional symmetry with devices would need significantly more work anyway. Thanks, Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()
On 2022-05-17 12:26, John Garry wrote: On 17/05/2022 11:40, Robin Murphy wrote: On 2022-05-16 14:06, John Garry wrote: For streaming DMA mappings involving an IOMMU and whose IOVA len regularly exceeds the IOVA rcache upper limit (meaning that they are not cached), performance can be reduced. Add the IOMMU callback for DMA mapping API dma_max_mapping_size(), which allows the drivers to know the mapping limit and thus limit the requested IOVA lengths. This resolves the performance issue originally reported in [0] for a SCSI HBA driver which was regularly mapping SGLs which required IOVAs in excess of the IOVA caching limit. In this case the block layer limits the max sectors per request - as configured in __scsi_init_queue() - which will limit the total SGL length the driver tries to map and in turn limits IOVA lengths requested. [0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/ Signed-off-by: John Garry --- Sending as an RFC as iommu_dma_max_mapping_size() is a soft limit, and not a hard limit which I expect is the semantics of dma_map_ops.max_mapping_size Indeed, sorry but NAK for this being nonsense. As I've said at least once before, if the unnecessary SAC address allocation attempt slows down your workload, make it not do that in the first place. If you don't like the existing command-line parameter then fine, > there are plenty of other options, it just needs to be done in a way that doesn't break x86 systems with dodgy firmware, as my first attempt turned out to. Sorry, but I am not interested in this. It was discussed in Jan last year without any viable solution. Er, OK, if you're not interested in solving that problem I don't see why you'd bring it up, but hey ho. *I* still think it's important, so I guess I'll revive my old patch with a CONFIG_X86 bodge and have another go at some point. Anyway we still have the long-term IOVA aging issue, and requesting non-cached IOVAs is involved in that. So I would rather keep the SCSI driver to requesting cached IOVAs all the time. I did try to do it the other way around - configuring the IOVA caching range according to the drivers requirement but that got nowhere. FWIW I thought that all looked OK, it just kept getting drowned out by more critical things in my inbox so I hoped someone else might comment. If it turns out that I've become the de-facto IOVA maintainer in everyone else's minds now and they're all waiting for my word then fair enough, I just need to know and reset my expectations accordingly. Joerg? Furthermore, if a particular SCSI driver doesn't benefit from mappings larger than 256KB, then that driver is also free to limit its own mapping size. There are other folks out there with use-cases for mapping *gigabytes* at once; you don't get to cripple the API and say that that's suddenly not allowed just because it happens to make your thing go faster, that's absurd. I'd say less catastrophically slow, not faster. So how to inform the SCSI driver of this caching limit then so that it may limit the SGL length? Driver-specific mechanism; block-layer-specific mechanism; redefine this whole API to something like dma_opt_mapping_size(), as a limit above which mappings might become less efficient or start to fail (callback to my thoughts on [1] as well, I suppose); many options. Just not imposing a ridiculously low *maximum* on everyone wherein mapping calls "should not be larger than the returned value" when that's clearly bollocks. Cheers, Robin. [1] https://lore.kernel.org/linux-iommu/20220510142109.38-1-ltyker...@gmail.com/ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()
On 17/05/2022 11:40, Robin Murphy wrote: On 2022-05-16 14:06, John Garry wrote: For streaming DMA mappings involving an IOMMU and whose IOVA len regularly exceeds the IOVA rcache upper limit (meaning that they are not cached), performance can be reduced. Add the IOMMU callback for DMA mapping API dma_max_mapping_size(), which allows the drivers to know the mapping limit and thus limit the requested IOVA lengths. This resolves the performance issue originally reported in [0] for a SCSI HBA driver which was regularly mapping SGLs which required IOVAs in excess of the IOVA caching limit. In this case the block layer limits the max sectors per request - as configured in __scsi_init_queue() - which will limit the total SGL length the driver tries to map and in turn limits IOVA lengths requested. [0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/ Signed-off-by: John Garry --- Sending as an RFC as iommu_dma_max_mapping_size() is a soft limit, and not a hard limit which I expect is the semantics of dma_map_ops.max_mapping_size Indeed, sorry but NAK for this being nonsense. As I've said at least once before, if the unnecessary SAC address allocation attempt slows down your workload, make it not do that in the first place. If you don't like the existing command-line parameter then fine, > there are plenty of other options, it just needs to be done in a way that doesn't break x86 systems with dodgy firmware, as my first attempt turned out to. Sorry, but I am not interested in this. It was discussed in Jan last year without any viable solution. Anyway we still have the long-term IOVA aging issue, and requesting non-cached IOVAs is involved in that. So I would rather keep the SCSI driver to requesting cached IOVAs all the time. I did try to do it the other way around - configuring the IOVA caching range according to the drivers requirement but that got nowhere. Furthermore, if a particular SCSI driver doesn't benefit from mappings larger than 256KB, then that driver is also free to limit its own mapping size. There are other folks out there with use-cases for mapping *gigabytes* at once; you don't get to cripple the API and say that that's suddenly not allowed just because it happens to make your thing go faster, that's absurd. I'd say less catastrophically slow, not faster. So how to inform the SCSI driver of this caching limit then so that it may limit the SGL length? Thanks, John ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Error when running fio against nvme-of rdma target (mlx5 driver)
Hi, Can you please send the original scenario, setup details and dumps ? I can't find it in my mailbox. you can send it directly to me to avoid spam. -Max. On 5/17/2022 11:26 AM, Mark Ruijter wrote: Hi Robin, I ran into the exact same problem while testing with 4 connect-x6 cards, kernel 5.18-rc6. [ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts) [ 4879.122015] nvme nvme0: starting error recovery [ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 4, Message: local protection error [ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe [ 4879.122037] : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 4879.122039] 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 4879.122040] 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 4879.122040] 0030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2 [ 4881.085547] nvme nvme3: Reconnecting in 10 seconds... I assume this means that the problem has still not been resolved? If so, I'll try to diagnose the problem. Thanks, --Mark On 11/02/2022, 12:35, "Linux-nvme on behalf of Robin Murphy" wrote: On 2022-02-10 23:58, Martin Oliveira wrote: > On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote: >> On 2/8/22 6:50 PM, Martin Oliveira wrote: >>> Hello, >>> >>> We have been hitting an error when running IO over our nvme-of setup, using the mlx5 driver and we are wondering if anyone has seen anything similar/has any suggestions. >>> >>> Both initiator and target are AMD EPYC 7502 machines connected over RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe fabrics device, one physical SSD per namespace. >>> >> >> Thanks for reporting this, if you can bisect the problem on your setup >> it will help others to help you better. >> >> -ck > > Hi Chaitanya, > > I went back to a kernel as old as 4.15 and the problem was still there, so I don't know of a good commit to start from. > > I also learned that I can reproduce this with as little as 3 cards and I updated the firmware on the Mellanox cards to the latest version. > > I'd be happy to try any tests if someone has any suggestions. The IOMMU is probably your friend here - one thing that might be worth trying is capturing the iommu:map and iommu:unmap tracepoints to see if the address reported in subsequent IOMMU faults was previously mapped as a valid DMA address (be warned that there will likely be a *lot* of trace generated). With 5.13 or newer, booting with "iommu.forcedac=1" should also make it easier to tell real DMA IOVAs from rogue physical addresses or other nonsense, as real DMA addresses should then look more like 0x24d08000. That could at least help narrow down whether it's some kind of use-after-free race or a completely bogus address creeping in somehow. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 6/7] x86/boot/tboot: Move tboot_force_iommu() to Intel IOMMU
On Tue, May 17, 2022 at 10:05:43AM +0800, Baolu Lu wrote: > Hi Jason, > > On 2022/5/17 02:06, Jason Gunthorpe wrote: > > > +static __init int tboot_force_iommu(void) > > > +{ > > > + if (!tboot_enabled()) > > > + return 0; > > > + > > > + if (no_iommu || dmar_disabled) > > > + pr_warn("Forcing Intel-IOMMU to enabled\n"); > > Unrelated, but when we are in the special secure IOMMU modes, do we > > force ATS off? Specifically does the IOMMU reject TLPs that are marked > > as translated? > > Good question. From IOMMU point of view, I don't see a point to force > ATS off, but trust boot involves lots of other things that I am not > familiar with. Anybody else could help to answer? ATS is inherently not secure, if a rouge device can issue a TLP with the translated bit set then it has unlimited access to host memory. Many of these trusted iommu scenarios rely on the idea that a rouge device cannot DMA to arbitary system memory. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/2] dt-bindings: mediatek: Add bindings for MT6795 M4U
Il 16/05/22 18:03, Rob Herring ha scritto: On Fri, 13 May 2022 17:14:10 +0200, AngeloGioacchino Del Regno wrote: Add bindings for the MediaTek Helio X10 (MT6795) IOMMU/M4U. Signed-off-by: AngeloGioacchino Del Regno --- .../bindings/iommu/mediatek,iommu.yaml| 3 + include/dt-bindings/memory/mt6795-larb-port.h | 96 +++ 2 files changed, 99 insertions(+) create mode 100644 include/dt-bindings/memory/mt6795-larb-port.h Acked-by: Rob Herring Hello Rob, I'm sad to say that, but I have to send a new version of this patch even though you have acked it already... and I will have to drop your ack, as the changes to the yaml patch will be a bit different, as we're adding support for a phandle to mediatek,infracfg after some discussion about it on patch 2/2 of this series. The mediatek,infracfg phandle addition will come as a different series, and this patch (on v2) will add a conditional for: - if: properties: compatible: contains: enum: - mediatek,mt6795-m4u then: required: - mediatek,infracfg Sorry about wasting your time on this v1. Regards, Angelo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()
On 2022-05-16 14:06, John Garry wrote: For streaming DMA mappings involving an IOMMU and whose IOVA len regularly exceeds the IOVA rcache upper limit (meaning that they are not cached), performance can be reduced. Add the IOMMU callback for DMA mapping API dma_max_mapping_size(), which allows the drivers to know the mapping limit and thus limit the requested IOVA lengths. This resolves the performance issue originally reported in [0] for a SCSI HBA driver which was regularly mapping SGLs which required IOVAs in excess of the IOVA caching limit. In this case the block layer limits the max sectors per request - as configured in __scsi_init_queue() - which will limit the total SGL length the driver tries to map and in turn limits IOVA lengths requested. [0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/ Signed-off-by: John Garry --- Sending as an RFC as iommu_dma_max_mapping_size() is a soft limit, and not a hard limit which I expect is the semantics of dma_map_ops.max_mapping_size Indeed, sorry but NAK for this being nonsense. As I've said at least once before, if the unnecessary SAC address allocation attempt slows down your workload, make it not do that in the first place. If you don't like the existing command-line parameter then fine, there are plenty of other options, it just needs to be done in a way that doesn't break x86 systems with dodgy firmware, as my first attempt turned out to. Furthermore, if a particular SCSI driver doesn't benefit from mappings larger than 256KB, then that driver is also free to limit its own mapping size. There are other folks out there with use-cases for mapping *gigabytes* at once; you don't get to cripple the API and say that that's suddenly not allowed just because it happens to make your thing go faster, that's absurd. Thanks, Robin. diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 09f6e1c0f9c0..e2d5205cde37 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1442,6 +1442,21 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev) return (1UL << __ffs(domain->pgsize_bitmap)) - 1; } +static size_t iommu_dma_max_mapping_size(struct device *dev) +{ + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + struct iommu_dma_cookie *cookie; + + if (!domain) + return 0; + + cookie = domain->iova_cookie; + if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE) + return 0; + + return iova_rcache_range(); +} + static const struct dma_map_ops iommu_dma_ops = { .alloc = iommu_dma_alloc, .free = iommu_dma_free, @@ -1462,6 +1477,7 @@ static const struct dma_map_ops iommu_dma_ops = { .map_resource = iommu_dma_map_resource, .unmap_resource = iommu_dma_unmap_resource, .get_merge_boundary = iommu_dma_get_merge_boundary, + .max_mapping_size = iommu_dma_max_mapping_size, }; /* diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index db77aa675145..9f00b58d546e 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -26,6 +26,11 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad, static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad); static void free_iova_rcaches(struct iova_domain *iovad); +unsigned long iova_rcache_range(void) +{ + return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1); +} + static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node) { struct iova_domain *iovad; diff --git a/include/linux/iova.h b/include/linux/iova.h index 320a70e40233..ae3e18d77e6c 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -79,6 +79,8 @@ static inline unsigned long iova_pfn(struct iova_domain *iovad, dma_addr_t iova) int iova_cache_get(void); void iova_cache_put(void); +unsigned long iova_rcache_range(void); + void free_iova(struct iova_domain *iovad, unsigned long pfn); void __free_iova(struct iova_domain *iovad, struct iova *iova); struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size, @@ -105,6 +107,11 @@ static inline void iova_cache_put(void) { } +static inline unsigned long iova_rcache_range(void) +{ + return 0; +} + static inline void free_iova(struct iova_domain *iovad, unsigned long pfn) { } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/dma: Fix check for error return from iommu_map_sg_atomic()
On Tue, 2022-05-17 at 11:18 +0100, Robin Murphy wrote: > On 2022-05-17 11:17, Niklas Schnelle wrote: > > On Tue, 2022-05-17 at 10:36 +0200, Christoph Hellwig wrote: > > > On Fri, May 13, 2022 at 05:39:48PM +0200, Niklas Schnelle wrote: > > > > In __iommu_dma_alloc_noncontiguous() the value returned by > > > > iommu_map_sg_atomic() is checked for being smaller than size. Before > > > > commit ad8f36e4b6b1 ("iommu: return full error code from > > > > iommu_map_sg[_atomic]()") this simply checked if the requested size was > > > > successfully mapped. > > > > > > > > After that commit iommu_map_sg_atomic() may also return a negative > > > > error value. In principle this too would be covered by the existing > > > > check. There is one problem however, as size is of type size_t while the > > > > return type of iommu_map_sg_atomic() is now of type ssize_t the latter > > > > gets > > > > converted to size_t and negative error values end up as very large > > > > positive values making the check succeed. Fix this by making the return > > > > type visible with a local variable and add an explicit cast to ssize_t. > > > > > > > > Fixes: ad8f36e4b6b1 ("iommu: return full error code from > > > > iommu_map_sg[_atomic]()") > > > > Cc: sta...@vger.kernel.org > > > > Signed-off-by: Niklas Schnelle > > > > > > I don't see what the point of the newly added local variable is here. > > > Just casting size should be all that is needed as far as I can tell. > > > > No technical reason just found it easier to read and more descriptive. > > I'll sent a v2 with just the cast, it does simplify the commit message. > > Note that this is already fixed upstream, though: > > https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=core&id=a3884774d731f03d3a3dd4fb70ec2d9341ceb39d > > Robin. Ah oh well then nevermind and you can of course also ignore the v2 I sent out a minute ago. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2] iommu/dma: Fix check for error return from iommu_map_sg_atomic()
In __iommu_dma_alloc_noncontiguous() the return value of iommu_map_sg_atomic() is treated as an error if it is smaller than size. Before upstream commit ad8f36e4b6b1 ("iommu: return full error code from iommu_map_sg[_atomic]()") this simply checked if the requested size was successfully mapped. After that commit iommu_map_sg_atomic() may also return a negative error value. In principle this too would be covered by the existing check. There is one problem however, as size is of type size_t while the return type of iommu_map_sg_atomic() is now of type ssize_t the latter gets converted to size_t and negative error values end up as very large positive values making the check succeed even if an error was returned. Fix this by casting size to ssize_t. Fixes: ad8f36e4b6b1 ("iommu: return full error code from iommu_map_sg[_atomic]()") Cc: sta...@vger.kernel.org Signed-off-by: Niklas Schnelle --- v1 -> v2: - Don't needlessly add a local variable drivers/iommu/dma-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 09f6e1c0f9c0..80db2aa5458c 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -814,7 +814,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev, } if (iommu_map_sg_atomic(domain, iova, sgt->sgl, sgt->orig_nents, ioprot) - < size) + < (ssize_t)size) goto out_free_sg; sgt->sgl->dma_address = iova; -- 2.32.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/dma: Fix check for error return from iommu_map_sg_atomic()
On 2022-05-17 11:17, Niklas Schnelle wrote: On Tue, 2022-05-17 at 10:36 +0200, Christoph Hellwig wrote: On Fri, May 13, 2022 at 05:39:48PM +0200, Niklas Schnelle wrote: In __iommu_dma_alloc_noncontiguous() the value returned by iommu_map_sg_atomic() is checked for being smaller than size. Before commit ad8f36e4b6b1 ("iommu: return full error code from iommu_map_sg[_atomic]()") this simply checked if the requested size was successfully mapped. After that commit iommu_map_sg_atomic() may also return a negative error value. In principle this too would be covered by the existing check. There is one problem however, as size is of type size_t while the return type of iommu_map_sg_atomic() is now of type ssize_t the latter gets converted to size_t and negative error values end up as very large positive values making the check succeed. Fix this by making the return type visible with a local variable and add an explicit cast to ssize_t. Fixes: ad8f36e4b6b1 ("iommu: return full error code from iommu_map_sg[_atomic]()") Cc: sta...@vger.kernel.org Signed-off-by: Niklas Schnelle I don't see what the point of the newly added local variable is here. Just casting size should be all that is needed as far as I can tell. No technical reason just found it easier to read and more descriptive. I'll sent a v2 with just the cast, it does simplify the commit message. Note that this is already fixed upstream, though: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=core&id=a3884774d731f03d3a3dd4fb70ec2d9341ceb39d Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/dma: Fix check for error return from iommu_map_sg_atomic()
On Tue, 2022-05-17 at 10:36 +0200, Christoph Hellwig wrote: > On Fri, May 13, 2022 at 05:39:48PM +0200, Niklas Schnelle wrote: > > In __iommu_dma_alloc_noncontiguous() the value returned by > > iommu_map_sg_atomic() is checked for being smaller than size. Before > > commit ad8f36e4b6b1 ("iommu: return full error code from > > iommu_map_sg[_atomic]()") this simply checked if the requested size was > > successfully mapped. > > > > After that commit iommu_map_sg_atomic() may also return a negative > > error value. In principle this too would be covered by the existing > > check. There is one problem however, as size is of type size_t while the > > return type of iommu_map_sg_atomic() is now of type ssize_t the latter gets > > converted to size_t and negative error values end up as very large > > positive values making the check succeed. Fix this by making the return > > type visible with a local variable and add an explicit cast to ssize_t. > > > > Fixes: ad8f36e4b6b1 ("iommu: return full error code from > > iommu_map_sg[_atomic]()") > > Cc: sta...@vger.kernel.org > > Signed-off-by: Niklas Schnelle > > I don't see what the point of the newly added local variable is here. > Just casting size should be all that is needed as far as I can tell. No technical reason just found it easier to read and more descriptive. I'll sent a v2 with just the cast, it does simplify the commit message. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu: mtk_iommu: Add support for MT6795 Helio X10 M4Us
On 17/05/2022 11:26, AngeloGioacchino Del Regno wrote: Il 17/05/22 11:08, Yong Wu ha scritto: On Fri, 2022-05-13 at 17:14 +0200, AngeloGioacchino Del Regno wrote: Add support for the M4Us found in the MT6795 Helio X10 SoC. Signed-off-by: AngeloGioacchino Del Regno < angelogioacchino.delre...@collabora.com> --- drivers/iommu/mtk_iommu.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 71b2ace74cd6..3d802dd3f377 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -157,6 +157,7 @@ enum mtk_iommu_plat { M4U_MT2712, M4U_MT6779, + M4U_MT6795, M4U_MT8167, M4U_MT8173, M4U_MT8183, @@ -953,7 +954,8 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data, unsigned int ban * Global control settings are in bank0. May re-init these global registers * since no sure if there is bank0 consumers. */ - if (data->plat_data->m4u_plat == M4U_MT8173) { + if (data->plat_data->m4u_plat == M4U_MT6795 || + data->plat_data->m4u_plat == M4U_MT8173) { regval = F_MMU_PREFETCH_RT_REPLACE_MOD | F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173; } else { @@ -1138,6 +1140,9 @@ static int mtk_iommu_probe(struct platform_device *pdev) case M4U_MT2712: p = "mediatek,mt2712-infracfg"; break; + case M4U_MT6795: + p = "mediatek,mt6795-infracfg"; + break; case M4U_MT8173: p = "mediatek,mt8173-infracfg"; break; @@ -1404,6 +1409,18 @@ static const struct mtk_iommu_plat_data mt6779_data = { .larbid_remap = {{0}, {1}, {2}, {3}, {5}, {7, 8}, {10}, {9}}, }; +static const struct mtk_iommu_plat_data mt6795_data = { + .m4u_plat = M4U_MT6795, + .flags = HAS_4GB_MODE | HAS_BCLK | RESET_AXI | + HAS_LEGACY_IVRP_PADDR | MTK_IOMMU_TYPE_MM, + .inv_sel_reg = REG_MMU_INV_SEL_GEN1, + .banks_num = 1, + .banks_enable = {true}, + .iova_region = single_domain, + .iova_region_nr = ARRAY_SIZE(single_domain), + .larbid_remap = {{0}, {1}, {2}, {3}, {4}}, /* Linear mapping. */ +}; This is nearly same with mt8173_data. mt8173 has one more larb than mt6795, its larbid_remap is also ok for mt6795. I think that we should be explicit about the larbid_remap property, since mt6795 has one less larb, we should explicitly say that like I did there... that's only for human readability I admit ... but, still, I wouldn't want to see people thinking that MT6795 has 6 LARBs because they've read that larbid_remap having 6 entries. thus it looks we could use mt8173 as the backward compatible. compatible = "mediatek,mt6795-m4u", "mediatek,mt8173-m4u"; After this, the only thing is about "mediatek,mt6795-infracfg". we have to try again with mediatek,mt6795-infracfg after mediatek,mt8173- infracfg fail. I think we should allow the backward case in 4GB mode judgment if we have. What's your opinion? or some other suggestion? Thanks. I know, I may have a plan for that, but I wanted to have a good reason to propose such a thing, as if it's just about two SoCs needing that, there would be no good reason to get things done differently. ...so, in order to provide a good cleanup, we have two possible roads to follow here: either we add a generic "mediatek,infracfg" compatible to the infra node (but I don't like that), or we can do it like it was previously done in mtk-pm-domains.c (I prefer that approach): iommu: iommu@somewhere { ... something ... mediatek,infracfg = <&infracfg>; }; infracfg = syscon_regmap_lookup_by_compatible(node, "mediatek,infracfg"); if (IS_ERR(infracfg)) { /* try with the older way */ switch (...) { case p = "mediatek,mt2712-infracfg"; ... blah blah ... } /* legacy also failed, ouch! */ if (IS_ERR(infracfg)) return PTR_ERR(infracfg); } ret = regmap_read ... etc etc etc I prefer that approach as well. Regards, Matthias Cheers, Angelo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu: mtk_iommu: Add support for MT6795 Helio X10 M4Us
Il 17/05/22 11:08, Yong Wu ha scritto: On Fri, 2022-05-13 at 17:14 +0200, AngeloGioacchino Del Regno wrote: Add support for the M4Us found in the MT6795 Helio X10 SoC. Signed-off-by: AngeloGioacchino Del Regno < angelogioacchino.delre...@collabora.com> --- drivers/iommu/mtk_iommu.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 71b2ace74cd6..3d802dd3f377 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -157,6 +157,7 @@ enum mtk_iommu_plat { M4U_MT2712, M4U_MT6779, + M4U_MT6795, M4U_MT8167, M4U_MT8173, M4U_MT8183, @@ -953,7 +954,8 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data, unsigned int ban * Global control settings are in bank0. May re-init these global registers * since no sure if there is bank0 consumers. */ - if (data->plat_data->m4u_plat == M4U_MT8173) { + if (data->plat_data->m4u_plat == M4U_MT6795 || + data->plat_data->m4u_plat == M4U_MT8173) { regval = F_MMU_PREFETCH_RT_REPLACE_MOD | F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173; } else { @@ -1138,6 +1140,9 @@ static int mtk_iommu_probe(struct platform_device *pdev) case M4U_MT2712: p = "mediatek,mt2712-infracfg"; break; + case M4U_MT6795: + p = "mediatek,mt6795-infracfg"; + break; case M4U_MT8173: p = "mediatek,mt8173-infracfg"; break; @@ -1404,6 +1409,18 @@ static const struct mtk_iommu_plat_data mt6779_data = { .larbid_remap = {{0}, {1}, {2}, {3}, {5}, {7, 8}, {10}, {9}}, }; +static const struct mtk_iommu_plat_data mt6795_data = { + .m4u_plat = M4U_MT6795, + .flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI | + HAS_LEGACY_IVRP_PADDR | MTK_IOMMU_TYPE_MM, + .inv_sel_reg = REG_MMU_INV_SEL_GEN1, + .banks_num= 1, + .banks_enable = {true}, + .iova_region = single_domain, + .iova_region_nr = ARRAY_SIZE(single_domain), + .larbid_remap = {{0}, {1}, {2}, {3}, {4}}, /* Linear mapping. */ +}; This is nearly same with mt8173_data. mt8173 has one more larb than mt6795, its larbid_remap is also ok for mt6795. I think that we should be explicit about the larbid_remap property, since mt6795 has one less larb, we should explicitly say that like I did there... that's only for human readability I admit ... but, still, I wouldn't want to see people thinking that MT6795 has 6 LARBs because they've read that larbid_remap having 6 entries. thus it looks we could use mt8173 as the backward compatible. compatible = "mediatek,mt6795-m4u", "mediatek,mt8173-m4u"; After this, the only thing is about "mediatek,mt6795-infracfg". we have to try again with mediatek,mt6795-infracfg after mediatek,mt8173- infracfg fail. I think we should allow the backward case in 4GB mode judgment if we have. What's your opinion? or some other suggestion? Thanks. I know, I may have a plan for that, but I wanted to have a good reason to propose such a thing, as if it's just about two SoCs needing that, there would be no good reason to get things done differently. ...so, in order to provide a good cleanup, we have two possible roads to follow here: either we add a generic "mediatek,infracfg" compatible to the infra node (but I don't like that), or we can do it like it was previously done in mtk-pm-domains.c (I prefer that approach): iommu: iommu@somewhere { ... something ... mediatek,infracfg = <&infracfg>; }; infracfg = syscon_regmap_lookup_by_compatible(node, "mediatek,infracfg"); if (IS_ERR(infracfg)) { /* try with the older way */ switch (...) { case p = "mediatek,mt2712-infracfg"; ... blah blah ... } /* legacy also failed, ouch! */ if (IS_ERR(infracfg)) return PTR_ERR(infracfg); } ret = regmap_read ... etc etc etc Cheers, Angelo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] swiotlb: Max mapping size takes min align mask into account
Thanks, applied to the dma-mapping for-next tree. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v8 2/8] hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device
On 2022/5/17 16:21, John Garry wrote: > On 17/05/2022 09:09, Yicong Yang wrote: + target = cpumask_any(cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev))); + if (target < nr_cpumask_bits) { >>> the comment for cpumask_any() hints to check against nr_cpu_ids - any >>> specific reason to check against nr_cpumask_bits? >>> >> here should be: >> if (target >= nr_cpumask_bits) { >> >> will fix this up. >> > > I am still not sure that using nr_cpumask_bits is correct. Let's use nr_cpu_ids to match the comment of cpumask_any(). Actually we should have nr_cpu_ids(possible cpus, init to NR_CPUS) <= nr_cpumask_bits (NR_CPUS) so it's ok here. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()
On Tue, May 17, 2022 at 10:02:00AM +0100, John Garry wrote: > BTW, on a separate topic, I noticed that even with this change my ATA > devices have max_hw_sectors_kb of 32767, as opposed to 128 for SAS devices. > It seems that libata-scsi - specifically ata_scsi_dev_config() - doesn't > honour the shost max_sectors limit. I guess that is not intentional. I don't think it is. the libsas/libsata integration is a bit messy sometimes.. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu: mtk_iommu: Add support for MT6795 Helio X10 M4Us
On Fri, 2022-05-13 at 17:14 +0200, AngeloGioacchino Del Regno wrote: > Add support for the M4Us found in the MT6795 Helio X10 SoC. > > Signed-off-by: AngeloGioacchino Del Regno < > angelogioacchino.delre...@collabora.com> > --- > drivers/iommu/mtk_iommu.c | 20 +++- > 1 file changed, 19 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c > index 71b2ace74cd6..3d802dd3f377 100644 > --- a/drivers/iommu/mtk_iommu.c > +++ b/drivers/iommu/mtk_iommu.c > @@ -157,6 +157,7 @@ > enum mtk_iommu_plat { > M4U_MT2712, > M4U_MT6779, > + M4U_MT6795, > M4U_MT8167, > M4U_MT8173, > M4U_MT8183, > @@ -953,7 +954,8 @@ static int mtk_iommu_hw_init(const struct > mtk_iommu_data *data, unsigned int ban >* Global control settings are in bank0. May re-init these > global registers >* since no sure if there is bank0 consumers. >*/ > - if (data->plat_data->m4u_plat == M4U_MT8173) { > + if (data->plat_data->m4u_plat == M4U_MT6795 || > + data->plat_data->m4u_plat == M4U_MT8173) { > regval = F_MMU_PREFETCH_RT_REPLACE_MOD | >F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173; > } else { > @@ -1138,6 +1140,9 @@ static int mtk_iommu_probe(struct > platform_device *pdev) > case M4U_MT2712: > p = "mediatek,mt2712-infracfg"; > break; > + case M4U_MT6795: > + p = "mediatek,mt6795-infracfg"; > + break; > case M4U_MT8173: > p = "mediatek,mt8173-infracfg"; > break; > @@ -1404,6 +1409,18 @@ static const struct mtk_iommu_plat_data > mt6779_data = { > .larbid_remap = {{0}, {1}, {2}, {3}, {5}, {7, 8}, {10}, {9}}, > }; > > +static const struct mtk_iommu_plat_data mt6795_data = { > + .m4u_plat = M4U_MT6795, > + .flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI | > + HAS_LEGACY_IVRP_PADDR | MTK_IOMMU_TYPE_MM, > + .inv_sel_reg = REG_MMU_INV_SEL_GEN1, > + .banks_num= 1, > + .banks_enable = {true}, > + .iova_region = single_domain, > + .iova_region_nr = ARRAY_SIZE(single_domain), > + .larbid_remap = {{0}, {1}, {2}, {3}, {4}}, /* Linear mapping. > */ > +}; This is nearly same with mt8173_data. mt8173 has one more larb than mt6795, its larbid_remap is also ok for mt6795. thus it looks we could use mt8173 as the backward compatible. compatible = "mediatek,mt6795-m4u", "mediatek,mt8173-m4u"; After this, the only thing is about "mediatek,mt6795-infracfg". we have to try again with mediatek,mt6795-infracfg after mediatek,mt8173- infracfg fail. I think we should allow the backward case in 4GB mode judgment if we have. What's your opinion? or some other suggestion? Thanks. > + > static const struct mtk_iommu_plat_data mt8167_data = { > .m4u_plat = M4U_MT8167, > .flags= RESET_AXI | HAS_LEGACY_IVRP_PADDR | > MTK_IOMMU_TYPE_MM, > @@ -1515,6 +1532,7 @@ static const struct mtk_iommu_plat_data > mt8195_data_vpp = { > static const struct of_device_id mtk_iommu_of_ids[] = { > { .compatible = "mediatek,mt2712-m4u", .data = &mt2712_data}, > { .compatible = "mediatek,mt6779-m4u", .data = &mt6779_data}, > + { .compatible = "mediatek,mt6795-m4u", .data = &mt6795_data}, > { .compatible = "mediatek,mt8167-m4u", .data = &mt8167_data}, > { .compatible = "mediatek,mt8173-m4u", .data = &mt8173_data}, > { .compatible = "mediatek,mt8183-m4u", .data = &mt8183_data}, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()
On 17/05/2022 09:38, Christoph Hellwig wrote: On Mon, May 16, 2022 at 09:06:01PM +0800, John Garry wrote: For streaming DMA mappings involving an IOMMU and whose IOVA len regularly exceeds the IOVA rcache upper limit (meaning that they are not cached), performance can be reduced. Add the IOMMU callback for DMA mapping API dma_max_mapping_size(), which allows the drivers to know the mapping limit and thus limit the requested IOVA lengths. This resolves the performance issue originally reported in [0] for a SCSI HBA driver which was regularly mapping SGLs which required IOVAs in excess of the IOVA caching limit. In this case the block layer limits the max sectors per request - as configured in __scsi_init_queue() - which will limit the total SGL length the driver tries to map and in turn limits IOVA lengths requested. BTW, on a separate topic, I noticed that even with this change my ATA devices have max_hw_sectors_kb of 32767, as opposed to 128 for SAS devices. It seems that libata-scsi - specifically ata_scsi_dev_config() - doesn't honour the shost max_sectors limit. I guess that is not intentional. [0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/ Signed-off-by: John Garry --- Sending as an RFC as iommu_dma_max_mapping_size() is a soft limit, and not a hard limit which I expect is the semantics of dma_map_ops.max_mapping_size diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 09f6e1c0f9c0..e2d5205cde37 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1442,6 +1442,21 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev) return (1UL << __ffs(domain->pgsize_bitmap)) - 1; } + if (!domain) + return 0; + + cookie = domain->iova_cookie; + if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE) + return 0; Can these conditions even be true here? I don't think so. Paranoia on my part. +static inline unsigned long iova_rcache_range(void) +{ + return 0; +} Given that IOMMU_DMA select IOMMU_IOVA there is no need for this stub. hmmm.. ok. Policy was to be stub everything but I think that it has changed. Otherwise this looks sensible to me. . Great, thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()
On Mon, May 16, 2022 at 09:06:01PM +0800, John Garry wrote: > For streaming DMA mappings involving an IOMMU and whose IOVA len regularly > exceeds the IOVA rcache upper limit (meaning that they are not cached), > performance can be reduced. > > Add the IOMMU callback for DMA mapping API dma_max_mapping_size(), which > allows the drivers to know the mapping limit and thus limit the requested > IOVA lengths. > > This resolves the performance issue originally reported in [0] for a SCSI > HBA driver which was regularly mapping SGLs which required IOVAs in > excess of the IOVA caching limit. In this case the block layer limits the > max sectors per request - as configured in __scsi_init_queue() - which > will limit the total SGL length the driver tries to map and in turn limits > IOVA lengths requested. > > [0] > https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/ > > Signed-off-by: John Garry > --- > Sending as an RFC as iommu_dma_max_mapping_size() is a soft limit, and not > a hard limit which I expect is the semantics of dma_map_ops.max_mapping_size > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 09f6e1c0f9c0..e2d5205cde37 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -1442,6 +1442,21 @@ static unsigned long > iommu_dma_get_merge_boundary(struct device *dev) > return (1UL << __ffs(domain->pgsize_bitmap)) - 1; > } > > + if (!domain) > + return 0; > + > + cookie = domain->iova_cookie; > + if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE) > + return 0; Can these conditions even be true here? > +static inline unsigned long iova_rcache_range(void) > +{ > + return 0; > +} Given that IOMMU_DMA select IOMMU_IOVA there is no need for this stub. Otherwise this looks sensible to me. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/dma: Fix check for error return from iommu_map_sg_atomic()
On Fri, May 13, 2022 at 05:39:48PM +0200, Niklas Schnelle wrote: > In __iommu_dma_alloc_noncontiguous() the value returned by > iommu_map_sg_atomic() is checked for being smaller than size. Before > commit ad8f36e4b6b1 ("iommu: return full error code from > iommu_map_sg[_atomic]()") this simply checked if the requested size was > successfully mapped. > > After that commit iommu_map_sg_atomic() may also return a negative > error value. In principle this too would be covered by the existing > check. There is one problem however, as size is of type size_t while the > return type of iommu_map_sg_atomic() is now of type ssize_t the latter gets > converted to size_t and negative error values end up as very large > positive values making the check succeed. Fix this by making the return > type visible with a local variable and add an explicit cast to ssize_t. > > Fixes: ad8f36e4b6b1 ("iommu: return full error code from > iommu_map_sg[_atomic]()") > Cc: sta...@vger.kernel.org > Signed-off-by: Niklas Schnelle I don't see what the point of the newly added local variable is here. Just casting size should be all that is needed as far as I can tell. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v8 2/8] hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device
On 17/05/2022 09:09, Yicong Yang wrote: + target = cpumask_any(cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev))); + if (target < nr_cpumask_bits) { the comment for cpumask_any() hints to check against nr_cpu_ids - any specific reason to check against nr_cpumask_bits? here should be: if (target >= nr_cpumask_bits) { will fix this up. I am still not sure that using nr_cpumask_bits is correct. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v8 2/8] hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device
On 2022/5/17 0:23, John Garry wrote: > On 16/05/2022 13:52, Yicong Yang wrote: >> HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated >> Endpoint(RCiEP) device, providing the capability to dynamically monitor and >> tune the PCIe traffic and trace the TLP headers. >> >> Add the driver for the device to enable the trace function. Register PMU >> device of PTT trace, then users can use trace through perf command. The >> driver makes use of perf AUX trace function and support the following >> events to configure the trace: >> >> - filter: select Root port or Endpoint to trace >> - type: select the type of traced TLP headers >> - direction: select the direction of traced TLP headers >> - format: select the data format of the traced TLP headers >> >> This patch initially add a basic driver of PTT trace. > > Initially add basic trace support. > >> >> Signed-off-by: Yicong Yang > > Generally this looks ok, apart from nitpicking below, so, FWIW: > Reviewed-by: John Garry > >> --- >> drivers/Makefile | 1 + >> drivers/hwtracing/Kconfig | 2 + >> drivers/hwtracing/ptt/Kconfig | 12 + >> drivers/hwtracing/ptt/Makefile | 2 + >> drivers/hwtracing/ptt/hisi_ptt.c | 964 +++ >> drivers/hwtracing/ptt/hisi_ptt.h | 178 ++ >> 6 files changed, 1159 insertions(+) >> create mode 100644 drivers/hwtracing/ptt/Kconfig >> create mode 100644 drivers/hwtracing/ptt/Makefile >> create mode 100644 drivers/hwtracing/ptt/hisi_ptt.c >> create mode 100644 drivers/hwtracing/ptt/hisi_ptt.h >> [...] >> +static int hisi_ptt_cpu_teardown(unsigned int cpu, struct hlist_node *node) >> +{ >> + struct hisi_ptt *hisi_ptt; >> + int target, src; >> + >> + hisi_ptt = hlist_entry_safe(node, struct hisi_ptt, hotplug_node); >> + src = hisi_ptt->trace_ctrl.on_cpu; >> + >> + if (!hisi_ptt->trace_ctrl.started || src != cpu) >> + return 0; >> + >> + target = >> cpumask_any(cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev))); >> + if (target < nr_cpumask_bits) { > > the comment for cpumask_any() hints to check against nr_cpu_ids - any > specific reason to check against nr_cpumask_bits? > here should be: if (target >= nr_cpumask_bits) { will fix this up. >> + dev_err(hisi_ptt->hisi_ptt_pmu.dev, "no available cpu for perf >> context migration\n"); >> + return 0; >> + } >> + >> + perf_pmu_migrate_context(&hisi_ptt->hisi_ptt_pmu, src, target); >> + WARN_ON(irq_set_affinity(pci_irq_vector(hisi_ptt->pdev, >> HISI_PTT_TRACE_DMA_IRQ), >> + cpumask_of(cpu))); >> + hisi_ptt->trace_ctrl.on_cpu = target; >> + >> + return 0; >> +} >> + >> +static int __init hisi_ptt_init(void) >> +{ >> + int ret; >> + >> + ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRV_NAME, NULL, >> + hisi_ptt_cpu_teardown); >> + if (ret < 0) >> + return ret; >> + hisi_ptt_pmu_online = ret; >> + >> + ret = pci_register_driver(&hisi_ptt_driver); >> + if (ret) >> + cpuhp_remove_multi_state(hisi_ptt_pmu_online); >> + >> + return ret; >> +} >> +module_init(hisi_ptt_init); >> + >> +static void __exit hisi_ptt_exit(void) >> +{ >> + pci_unregister_driver(&hisi_ptt_driver); >> + cpuhp_remove_multi_state(hisi_ptt_pmu_online); >> +} >> +module_exit(hisi_ptt_exit); >> + >> +MODULE_LICENSE("GPL"); >> +MODULE_AUTHOR("Yicong Yang "); >> +MODULE_DESCRIPTION("Driver for HiSilicon PCIe tune and trace device"); >> diff --git a/drivers/hwtracing/ptt/hisi_ptt.h >> b/drivers/hwtracing/ptt/hisi_ptt.h >> new file mode 100644 >> index ..2344e4195648 >> --- /dev/null >> +++ b/drivers/hwtracing/ptt/hisi_ptt.h >> @@ -0,0 +1,178 @@ >> +/* SPDX-License-Identifier: GPL-2.0 */ >> +/* >> + * Driver for HiSilicon PCIe tune and trace device >> + * >> + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd. >> + * Author: Yicong Yang >> + */ >> + >> +#ifndef _HISI_PTT_H >> +#define _HISI_PTT_H >> + >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> + >> +#define DRV_NAME "hisi_ptt" >> + >> +/* >> + * The definition of the device registers and register fields. >> + */ >> +#define HISI_PTT_TRACE_ADDR_SIZE 0x0800 >> +#define HISI_PTT_TRACE_ADDR_BASE_LO_0 0x0810 >> +#define HISI_PTT_TRACE_ADDR_BASE_HI_0 0x0814 >> +#define HISI_PTT_TRACE_ADDR_STRIDE 0x8 >> +#define HISI_PTT_TRACE_CTRL 0x0850 >> +#define HISI_PTT_TRACE_CTRL_EN BIT(0) >> +#define HISI_PTT_TRACE_CTRL_RST BIT(1) >> +#define HISI_PTT_TRACE_CTRL_RXTX_SEL GENMASK(3, 2) >> +#define HISI_PTT_TRACE_CTRL_TYPE_SEL GENMASK(7, 4) >> +#define HISI_PTT_TRACE_CTRL_DATA_FORMAT BIT(14) >> +#define HISI_PTT_TRACE_CTRL_FILTER_MODE BIT(15) >> +#define HISI_PTT_TRACE_CTRL_TARGET_SEL GENMASK(31, 16) >> +#define HISI_PTT_TRACE_INT_STAT 0x0890 >> +#define HISI_PTT_TRACE_INT_STAT_MASK GENMASK(3, 0) >> +#defin
Re: [PATCH v8 2/8] hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device
On 2022/5/16 22:03, Jonathan Cameron wrote: > On Mon, 16 May 2022 20:52:17 +0800 > Yicong Yang wrote: > >> HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated >> Endpoint(RCiEP) device, providing the capability to dynamically monitor and >> tune the PCIe traffic and trace the TLP headers. >> >> Add the driver for the device to enable the trace function. Register PMU >> device of PTT trace, then users can use trace through perf command. The >> driver makes use of perf AUX trace function and support the following >> events to configure the trace: >> >> - filter: select Root port or Endpoint to trace >> - type: select the type of traced TLP headers >> - direction: select the direction of traced TLP headers >> - format: select the data format of the traced TLP headers >> >> This patch initially add a basic driver of PTT trace. >> >> Signed-off-by: Yicong Yang > > Hi Yicong, > > It's been a while since I looked at this driver, so I'll admit > I can't remember if any of the things I've raised below were > previously discussed. > > All minor stuff (biggest is question of failing cleanly in unlikely > case of failing the allocation in the filter addition vs carrying > on anyway), so feel free to add > > Reviewed-by: Jonathan Cameron > >> diff --git a/drivers/hwtracing/ptt/Makefile b/drivers/hwtracing/ptt/Makefile >> new file mode 100644 >> index ..908c09a98161 >> --- /dev/null >> +++ b/drivers/hwtracing/ptt/Makefile >> @@ -0,0 +1,2 @@ >> +# SPDX-License-Identifier: GPL-2.0 >> +obj-$(CONFIG_HISI_PTT) += hisi_ptt.o >> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c >> b/drivers/hwtracing/ptt/hisi_ptt.c >> new file mode 100644 >> index ..ef25ce98f664 >> --- /dev/null >> +++ b/drivers/hwtracing/ptt/hisi_ptt.c > > > ... > > >> + >> +static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data) >> +{ >> +struct hisi_ptt_filter_desc *filter; >> +struct hisi_ptt *hisi_ptt = data; >> + >> +filter = kzalloc(sizeof(*filter), GFP_KERNEL); >> +if (!filter) { >> +pci_err(hisi_ptt->pdev, "failed to add filter %s\n", >> pci_name(pdev)); > > If this fails we carry on anyway (no error checking on the bus_walk). > I think we should error out in that case (would need to use a flag placed > somewhere in hisi_ptt to tell we had an error). > > That would complicate the unwind though. > Easiest way to do that unwind is probably to register a separate > devm_add_action_or_reset() callback for each filter. > > If you prefer to carry on even with this allocation error, then maybe add a > comment > here somewhere to make it clear that will happen. > I'd prefer to carry on with this memory allocation error because this may not be fatal as: 1) the filter maybe partial initialized but the trace is available with resident filters and 2) the driver log here provides enough information that which filter has problem to add. 3) furthermore the tune function doesn't rely on this so we give a chance for the tune to be available if the filter allocation failed. The log message provide information about what happened here, will add a comment here why we decide to carry on: /* * We won't fail the probe if filter allocation failed here. The filters * should be partial initialized and users would know which filter fails * through the log. Other functions of PTT device are still available. */ filter = kzalloc(sizeof(*filter), GFP_KERNEL); if (!filter) { pci_err(hisi_ptt->pdev, "failed to add filter %s\n", pci_name(pdev)); return -ENOMEM; } It's better to manually manage the filters as we're going to add support of dynamically updating the filters, which means the filter can be added/removed after probe. With devres it'll be complex to keep the order (the new added one will be released before the PMU unregistration, which is not expected). Will address the rest comments. Thanks, Yicong >> +return -ENOMEM; >> +} >> + >> +filter->devid = PCI_DEVID(pdev->bus->number, pdev->devfn); >> + >> +if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) { >> +filter->is_port = true; >> +list_add_tail(&filter->list, &hisi_ptt->port_filters); >> + >> +/* Update the available port mask */ >> +hisi_ptt->port_mask |= hisi_ptt_get_filter_val(filter->devid, >> true); >> +} else { >> +list_add_tail(&filter->list, &hisi_ptt->req_filters); >> +} >> + >> +return 0; >> +} >> + >> +static void hisi_ptt_release_filters(void *data) >> +{ >> +struct hisi_ptt_filter_desc *filter, *tmp; >> +struct hisi_ptt *hisi_ptt = data; >> + >> +list_for_each_entry_safe(filter, tmp, &hisi_ptt->req_filters, list) { >> +list_del(&filter->list); >> +kfree(filter); > > I think with separate release per entry above, this bit become simpler as > we walk all the elemen
RE: [PATCH v12 0/9] ACPI/IORT: Support for IORT RMR node
> -Original Message- > From: Lorenzo Pieralisi [mailto:lorenzo.pieral...@arm.com] > Sent: 13 May 2022 10:50 > To: Robin Murphy ; Shameerali Kolothum Thodi > ; raf...@kernel.org; > j...@8bytes.org > Cc: Shameerali Kolothum Thodi ; > Guohanjun (Hanjun Guo) ; > linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org; > iommu@lists.linux-foundation.org; Linuxarm ; > w...@kernel.org; wanghuiqiang ; > steven.pr...@arm.com; sami.muja...@arm.com; j...@solid-run.com; > eric.au...@redhat.com; laurentiu.tu...@nxp.com; h...@infradead.org > Subject: Re: [PATCH v12 0/9] ACPI/IORT: Support for IORT RMR node > > [with Christoph's correct email address] > > On Tue, May 10, 2022 at 09:07:00AM +0100, Robin Murphy wrote: > > On 2022-05-10 08:23, Shameerali Kolothum Thodi wrote: > > > Hi Joerg/Robin, > > > > > > I think this series is now ready to be merged. Could you please let > > > me know if there is anything missing. > > > > Fine by me - these patches have had enough review and testing now that > > even if anything else did come up, I think it would be better done as > > follow-up work on the merged code. > > Given the ACPICA dependency I believe it is best for this series > to go via the ACPI tree, right ? > > I assume there are all the required ACKs for that to happen. The SMMUv3/SMMU related changes (patches 6 - 9) still doesn't have explicit ACK from maintainers other than the go ahead above from Robin. Just thought of highlighting it as not sure that will be an issue or not. Thanks, Shameer ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu