Re: [PATCH v3 1/2] dt-bindings: iommu: add bindings for sprd iommu
Hi Rob, On Fri, 5 Feb 2021 at 07:25, Rob Herring wrote: > > On Wed, Feb 03, 2021 at 05:07:26PM +0800, Chunyan Zhang wrote: > > From: Chunyan Zhang > > > > This iommu module can be used by Unisoc's multimedia devices, such as > > display, Image codec(jpeg) and a few signal processors, including > > VSP(video), GSP(graphic), ISP(image), and CPP(camera pixel processor), etc. > > > > Signed-off-by: Chunyan Zhang > > --- > > .../devicetree/bindings/iommu/sprd,iommu.yaml | 72 +++ > > 1 file changed, 72 insertions(+) > > create mode 100644 Documentation/devicetree/bindings/iommu/sprd,iommu.yaml > > > > diff --git a/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml > > b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml > > new file mode 100644 > > index ..4fc99e81fa66 > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml > > @@ -0,0 +1,72 @@ > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > > +# Copyright 2020 Unisoc Inc. > > +%YAML 1.2 > > +--- > > +$id: http://devicetree.org/schemas/iommu/sprd,iommu.yaml# > > +$schema: http://devicetree.org/meta-schemas/core.yaml# > > + > > +title: Unisoc IOMMU and Multi-media MMU > > + > > +maintainers: > > + - Chunyan Zhang > > + > > +properties: > > + compatible: > > +enum: > > + - sprd,iommu-v1 > > + > > + "#iommu-cells": > > +const: 0 > > +description: > > + Unisoc IOMMUs are all single-master IOMMU devices, therefore no > > + additional information needs to associate with its master device. > > + Please refer to the generic bindings document for more details, > > + Documentation/devicetree/bindings/iommu/iommu.txt > > + > > + reg: > > +maxItems: 1 > > +description: > > + Not required if 'sprd,iommu-regs' is defined. > > + > > + clocks: > > +description: > > + Reference to a gate clock phandle, since access to some of IOMMUs are > > + controlled by gate clock, but this is not required. 
> > + > > + sprd,iommu-regs: > > +$ref: /schemas/types.yaml#/definitions/phandle-array > > +description: > > + Reference to a syscon phandle plus 1 cell, the syscon defines the > > + register range used by the iommu and the media device, the cell > > + defines the offset for iommu registers. Since iommu module shares > > + the same register range with the media device which uses it. > > + > > +required: > > + - compatible > > + - "#iommu-cells" > > + > > +oneOf: > > + - required: > > + - reg > > + - required: > > + - sprd,iommu-regs > > + > > +additionalProperties: false > > + > > +examples: > > + - | > > +iommu_disp: iommu-disp { > > + compatible = "sprd,iommu-v1"; > > + sprd,iommu-regs = <&dpu_regs 0x800>; > > If the IOMMU is contained within another device, then it should just be > a child node of that device. Yes, the IOMMU can indeed be seen as a child of the multimedia devices. I considered moving the IOMMU into the multimedia device node, but multimedia devices need the IOMMU at probe time[1], so I dropped that idea. And they share the same register base, e.g. + mm { + compatible = "simple-bus"; + #address-cells = <2>; + #size-cells = <2>; + ranges; + + dpu_regs: syscon@6300 { + compatible = "sprd,sc9863a-dpuregs", "syscon"; + reg = <0 0x6300 0 0x1000>; + }; + + dpu: dpu@6300 { + compatible = "sprd,sharkl3-dpu"; + sprd,disp-regs = <&dpu_regs>; + iommus = <&iommu_dispc>; + }; + + iommu_dispc: iommu@6300 { + compatible = "sprd,iommu-v1"; + sprd,iommu-regs = <&dpu_regs 0x800>; + #iommu-cells = <0>; +}; The DPU uses the registers from offset 0 and the IOMMU from offset 0x800; the purpose of using a syscon node was to avoid remapping the same physical register range twice. > Or just make 'dpu_regs' an IOMMU provider > (i.e. just add #iommu-cells to it). xxx_regs (a syscon node) defines the register range for the IOMMU and a multimedia device (such as dpu, image codec, etc.). Hope I've explained the relationship of xxx_regs, the multimedia device, and the iommu clearly :) Any suggestions for this kind of case?
Thanks, Chunyan [1] https://elixir.bootlin.com/linux/v5.11-rc6/source/drivers/iommu/of_iommu.c#L145 > > > + #iommu-cells = <0>; > > +}; > > + > > + - | > > +iommu_jpg: iommu-jpg { > > + compatible = "sprd,iommu-v1"; > > + sprd,iommu-regs = <&jpg_regs 0x300>; > > + #iommu-cells = <0>; > > + clocks = <&mm_gate 1>; > > +}; > > + > > +... > > -- > > 2.25.1
Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()
Hi Guillaume, On Thu, Feb 04, 2021 at 11:10:15AM +, Guillaume Tucker wrote: > Hi Nicolin, > > A regression was detected by kernelci.org in IGT's drm_read tests > on mainline, it was first seen on 17th December 2020. You can > find some details here: > > https://kernelci.org/test/case/id/600b82dc1e3208f123d3dffc/ Thanks for reporting the issue. We did test on Tegra210 and Tegra30, but not on Tegra124. I am wondering what could have gone wrong... > Please let us know if you need any help debugging this issue or > to try a fix on this platform. Yes, I don't have any Tegra124 platform to run on. It'd be very nice if you could run a debugging patch (which I can provide) and then a fix once I root-cause the issue. Thanks, Nicolin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC] Use SMMU HTTU for DMA dirty page tracking
> From: Keqian Zhu > Sent: Friday, February 5, 2021 11:31 AM > > Hi Jean and Kevin, > > FYI, I have send out the SMMUv3 HTTU support for DMA dirty tracking[1] a > week ago. > > Thanks, > Keqian > > [1] https://lore.kernel.org/linux-iommu/20210128151742.18840-1- > zhukeqi...@huawei.com/ Thanks for the pointer. We will take a look. > > On 2020/5/27 17:14, Jean-Philippe Brucker wrote: > > On Wed, May 27, 2020 at 08:40:47AM +, Tian, Kevin wrote: > >>> From: Xiang Zheng > >>> Sent: Wednesday, May 27, 2020 2:45 PM > >>> > >>> > >>> On 2020/5/27 11:27, Tian, Kevin wrote: > > From: Xiang Zheng > > Sent: Monday, May 25, 2020 7:34 PM > > > > [+cc Kirti, Yan, Alex] > > > > On 2020/5/23 1:14, Jean-Philippe Brucker wrote: > >> Hi, > >> > >> On Tue, May 19, 2020 at 05:42:55PM +0800, Xiang Zheng wrote: > >>> Hi all, > >>> > >>> Is there any plan for enabling SMMU HTTU? > >> > >> Not outside of SVA, as far as I know. > >> > > > >>> I have seen the patch locates in the SVA series patch, which adds > >>> support for HTTU: > >>> https://www.spinics.net/lists/arm-kernel/msg798694.html > >>> > >>> HTTU reduces the number of access faults on SMMU fault queue > >>> (permission faults also benifit from it). > >>> > >>> Besides reducing the faults, HTTU also helps to track dirty pages for > >>> device DMA. Is it feasible to utilize HTTU to get dirty pages on > >>> device > >>> DMA during VFIO live migration? > >> > >> As you know there is a VFIO interface for this under discussion: > >> https://lore.kernel.org/kvm/1589781397-28368-1-git-send-email- > > kwankh...@nvidia.com/ > >> It doesn't implement an internal API to communicate with the > IOMMU > > driver > >> about dirty pages. > > We plan to add such API later, e.g. to utilize A/D bit in VT-d 2nd-level > page tables (Rev 3.0). > > >>> > >>> Thank you, Kevin. > >>> > >>> When will you send this series patches? Maybe(Hope) we can also > support > >>> hardware-based dirty pages tracking via common APIs based on your > >>> patches. 
:) > >> > >> Yan is working with Kirti on basic live migration support now. After that > >> part is done, we will start working on A/D bit support. Yes, common APIs > >> are definitely the goal here. > >> > >>> > > > >> > >>> If SMMU can track dirty pages, devices are not required to > implement > >>> additional dirty pages tracking to support VFIO live migration. > >> > >> It seems feasible, though tracking it in the device might be more > >> efficient. I might have misunderstood but I think for live migration of > >> the Intel NIC they trap guest accesses to the device and introspect its > >> state to figure out which pages it is accessing. > > Does HTTU implement A/D-like mechanism in SMMU page tables, or > just > report dirty pages in a log buffer? Either way tracking dirty pages in > IOMMU > side is generic thus doesn't require device-specific tweak like in Intel > NIC. > > >>> > >>> Currently HTTU just implement A/D-like mechanism in SMMU page > tables. > >>> We certainly > >>> expect SMMU can also implement PML-like feature so that we can avoid > >>> walking the > >>> whole page table to get the dirty pages. > > > > There is no reporting of dirty pages in log buffer. It might be possible > > to do software logging based on PRI or Stall, but that requires special > > support in the endpoint as well as the SMMU. > > > >> Is there a link to HTTU introduction? > > > > I don't know any gentle introduction, but there are sections D5.4.11 > > "Hardware management of the Access flag and dirty state" in the ARM > > Architecture Reference Manual (DDI0487E), and section 3.13 "Translation > > table entries and Access/Dirty flags" in the SMMU specification > > (IHI0070C). HTTU stands for "Hardware Translation Table Update". > > > > In short, when HTTU is enabled, the SMMU translation performs an atomic > > read-modify-write on the leaf translation table descriptor, setting some > > bits depending on the type of memory access. 
This can be enabled > > independently on both stage-1 and stage-2 tables (equivalent to your 1st > > and 2nd page tables levels, I think). > > > > Thanks, > > Jean > > ___ > > kvmarm mailing list > > kvm...@lists.cs.columbia.edu > > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm > > . > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC] Use SMMU HTTU for DMA dirty page tracking
Hi Jean and Kevin, FYI, I have sent out the SMMUv3 HTTU support for DMA dirty tracking[1] a week ago. Thanks, Keqian [1] https://lore.kernel.org/linux-iommu/20210128151742.18840-1-zhukeqi...@huawei.com/ On 2020/5/27 17:14, Jean-Philippe Brucker wrote: > On Wed, May 27, 2020 at 08:40:47AM +, Tian, Kevin wrote: >>> From: Xiang Zheng >>> Sent: Wednesday, May 27, 2020 2:45 PM >>> >>> >>> On 2020/5/27 11:27, Tian, Kevin wrote: > From: Xiang Zheng > Sent: Monday, May 25, 2020 7:34 PM > > [+cc Kirti, Yan, Alex] > > On 2020/5/23 1:14, Jean-Philippe Brucker wrote: >> Hi, >> >> On Tue, May 19, 2020 at 05:42:55PM +0800, Xiang Zheng wrote: >>> Hi all, >>> >>> Is there any plan for enabling SMMU HTTU? >> >> Not outside of SVA, as far as I know. >> > >>> I have seen the patch locates in the SVA series patch, which adds >>> support for HTTU: >>> https://www.spinics.net/lists/arm-kernel/msg798694.html >>> >>> HTTU reduces the number of access faults on SMMU fault queue >>> (permission faults also benifit from it). >>> >>> Besides reducing the faults, HTTU also helps to track dirty pages for >>> device DMA. Is it feasible to utilize HTTU to get dirty pages on device >>> DMA during VFIO live migration? >> >> As you know there is a VFIO interface for this under discussion: >> https://lore.kernel.org/kvm/1589781397-28368-1-git-send-email- > kwankh...@nvidia.com/ >> It doesn't implement an internal API to communicate with the IOMMU > driver >> about dirty pages. We plan to add such API later, e.g. to utilize A/D bit in VT-d 2nd-level page tables (Rev 3.0). >>> >>> Thank you, Kevin. >>> >>> When will you send this series patches? Maybe(Hope) we can also support >>> hardware-based dirty pages tracking via common APIs based on your >>> patches. :) >> >> Yan is working with Kirti on basic live migration support now. After that >> part is done, we will start working on A/D bit support. Yes, common APIs >> are definitely the goal here.
>> >>> > >> >>> If SMMU can track dirty pages, devices are not required to implement >>> additional dirty pages tracking to support VFIO live migration. >> >> It seems feasible, though tracking it in the device might be more >> efficient. I might have misunderstood but I think for live migration of >> the Intel NIC they trap guest accesses to the device and introspect its >> state to figure out which pages it is accessing. Does HTTU implement A/D-like mechanism in SMMU page tables, or just report dirty pages in a log buffer? Either way tracking dirty pages in IOMMU side is generic thus doesn't require device-specific tweak like in Intel NIC. >>> >>> Currently HTTU just implement A/D-like mechanism in SMMU page tables. >>> We certainly >>> expect SMMU can also implement PML-like feature so that we can avoid >>> walking the >>> whole page table to get the dirty pages. > > There is no reporting of dirty pages in log buffer. It might be possible > to do software logging based on PRI or Stall, but that requires special > support in the endpoint as well as the SMMU. > >> Is there a link to HTTU introduction? > > I don't know any gentle introduction, but there are sections D5.4.11 > "Hardware management of the Access flag and dirty state" in the ARM > Architecture Reference Manual (DDI0487E), and section 3.13 "Translation > table entries and Access/Dirty flags" in the SMMU specification > (IHI0070C). HTTU stands for "Hardware Translation Table Update". > > In short, when HTTU is enabled, the SMMU translation performs an atomic > read-modify-write on the leaf translation table descriptor, setting some > bits depending on the type of memory access. This can be enabled > independently on both stage-1 and stage-2 tables (equivalent to your 1st > and 2nd page tables levels, I think). > > Thanks, > Jean > ___ > kvmarm mailing list > kvm...@lists.cs.columbia.edu > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm > . 
[PATCH v2] dma-mapping: benchmark: pretend DMA is transmitting
In a real DMA mapping use case, after dma_map is done, data will be transmitted. Thus, in a multi-threaded user scenario, IOMMU contention should not be that severe. For example, if users enable multiple threads to send network packets through a 1G/10G/100Gbps NIC, usually the steps will be: map -> transmission -> unmap. Transmission delay reduces contention on the IOMMU. Here a delay is added to simulate the transmission between map and unmap so that the tested result is more accurate for TX and simple RX. A typical TX transmission for a NIC would be: map -> TX -> unmap, since the socket buffers come from the OS. A simple RX model, e.g. a disk driver, is also map -> RX -> unmap, but a real RX model in a NIC could be more complicated, considering packets can arrive spontaneously and many drivers use pre-mapped buffer pools. This is in the TBD list. Signed-off-by: Barry Song --- -v2: cleanup according to Robin's feedback. Thanks, Robin. kernel/dma/map_benchmark.c| 10 ++ .../testing/selftests/dma/dma_map_benchmark.c | 19 +-- 2 files changed, 27 insertions(+), 2 deletions(-) diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c index 1b1b8ff875cb..06636406a245 100644 --- a/kernel/dma/map_benchmark.c +++ b/kernel/dma/map_benchmark.c @@ -21,6 +21,7 @@ #define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark) #define DMA_MAP_MAX_THREADS1024 #define DMA_MAP_MAX_SECONDS300 +#define DMA_MAP_MAX_TRANS_DELAY(10 * NSEC_PER_MSEC) /* 10ms */ #define DMA_MAP_BIDIRECTIONAL 0 #define DMA_MAP_TO_DEVICE 1 @@ -36,6 +37,7 @@ struct map_benchmark { __s32 node; /* which numa node this benchmark will run on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ + __u32 dma_trans_ns; /* time for DMA transmission in ns */ __u64 expansion[10];/* For future use */ }; @@ -87,6 +89,9 @@ static int map_benchmark_thread(void *data) map_etime = ktime_get(); map_delta = ktime_sub(map_etime, map_stime); + /* Pretend DMA is transmitting */ +
ndelay(map->bparam.dma_trans_ns); + unmap_stime = ktime_get(); dma_unmap_single(map->dev, dma_addr, PAGE_SIZE, map->dir); unmap_etime = ktime_get(); @@ -218,6 +223,11 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd, return -EINVAL; } + if (map->bparam.dma_trans_ns > DMA_MAP_MAX_TRANS_DELAY) { + pr_err("invalid transmission delay\n"); + return -EINVAL; + } + if (map->bparam.node != NUMA_NO_NODE && !node_possible(map->bparam.node)) { pr_err("invalid numa node\n"); diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c index 7065163a8388..a370290d9503 100644 --- a/tools/testing/selftests/dma/dma_map_benchmark.c +++ b/tools/testing/selftests/dma/dma_map_benchmark.c @@ -11,9 +11,12 @@ #include #include +#define NSEC_PER_MSEC 100L + #define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark) #define DMA_MAP_MAX_THREADS1024 #define DMA_MAP_MAX_SECONDS 300 +#define DMA_MAP_MAX_TRANS_DELAY(10 * NSEC_PER_MSEC) /* 10ms */ #define DMA_MAP_BIDIRECTIONAL 0 #define DMA_MAP_TO_DEVICE 1 @@ -35,6 +38,7 @@ struct map_benchmark { __s32 node; /* which numa node this benchmark will run on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ + __u32 dma_trans_ns; /* delay for DMA transmission in ns */ __u64 expansion[10];/* For future use */ }; @@ -45,12 +49,12 @@ int main(int argc, char **argv) /* default single thread, run 20 seconds on NUMA_NO_NODE */ int threads = 1, seconds = 20, node = -1; /* default dma mask 32bit, bidirectional DMA */ - int bits = 32, dir = DMA_MAP_BIDIRECTIONAL; + int bits = 32, xdelay = 0, dir = DMA_MAP_BIDIRECTIONAL; int cmd = DMA_MAP_BENCHMARK; char *p; - while ((opt = getopt(argc, argv, "t:s:n:b:d:")) != -1) { + while ((opt = getopt(argc, argv, "t:s:n:b:d:x:")) != -1) { switch (opt) { case 't': threads = atoi(optarg); @@ -67,6 +71,9 @@ int main(int argc, char **argv) case 'd': dir = atoi(optarg); break; + case 'x': + xdelay = atoi(optarg); + 
break; default: return -1; } @@ -84,6 +91,12 @@ int main(int argc, char **argv) exit(1); } + if (xdelay < 0 || xdelay > DMA_MAP_MAX_TRANS_DELAY) { + fprin
Re: [PATCH v2 1/2] iommu/vt-d: Use Real PCI DMA device for IRTE
On 2/5/21 3:09 AM, Jon Derrick wrote: VMD retransmits child device MSI/X with the VMD endpoint's requester-id. In order to support direct interrupt remapping of VMD child devices, ensure that the IRTE is programmed with the VMD endpoint's requester-id using pci_real_dma_dev(). Signed-off-by: Jon Derrick --- drivers/iommu/intel/irq_remapping.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 685200a5cff0..1939e070eec8 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1276,7 +1276,8 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, break; case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: - set_msi_sid(irte, msi_desc_to_pci_dev(info->desc)); + set_msi_sid(irte, + pci_real_dma_dev(msi_desc_to_pci_dev(info->desc))); break; default: BUG_ON(1); Acked-by: Lu Baolu Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH] dma-mapping: benchmark: pretend DMA is transmitting
> -Original Message- > From: Robin Murphy [mailto:robin.mur...@arm.com] > Sent: Friday, February 5, 2021 12:51 PM > To: Song Bao Hua (Barry Song) ; > m.szyprow...@samsung.com; h...@lst.de; iommu@lists.linux-foundation.org > Cc: linux-ker...@vger.kernel.org; linux...@openeuler.org > Subject: Re: [PATCH] dma-mapping: benchmark: pretend DMA is transmitting > > On 2021-02-04 22:58, Barry Song wrote: > > In a real dma mapping user case, after dma_map is done, data will be > > transmit. Thus, in multi-threaded user scenario, IOMMU contention > > should not be that severe. For example, if users enable multiple > > threads to send network packets through 1G/10G/100Gbps NIC, usually > > the steps will be: map -> transmission -> unmap. Transmission delay > > reduces the contention of IOMMU. Here a delay is added to simulate > > the transmission for TX case so that the tested result could be > > more accurate. > > > > RX case would be much more tricky. It is not supported yet. > > I guess it might be a reasonable approximation to map several pages, > then unmap them again after a slightly more random delay. Or maybe > divide the threads into pairs of mappers and unmappers respectively > filling up and draining proper little buffer pools. Yes, good suggestions. I am actually thinking about how to support cases like networking. There is a pre-mapped list of pages; each page is bound to a hardware DMA block descriptor (BD). If Linux can consume the packets in time, those buffers are always re-used. Only when the page bound to a BD is full and the OS can't consume it in time is another temporary page allocated and mapped; the BD switches to this temporary page, which is finally unmapped when it is no longer needed. The pre-mapped pages, on the other hand, are never unmapped. For things like filesystems and disk drivers, RX is always requested by users, so the model is simpler: map -> rx -> unmap. For networking, RX can arrive spontaneously. Anyway, I'll put this into TBD.
For the moment, this mainly handles the TX path. Or maybe the current code can already handle a simple RX model :-) > > > Signed-off-by: Barry Song > > --- > > kernel/dma/map_benchmark.c | 11 +++ > > tools/testing/selftests/dma/dma_map_benchmark.c | 17 +++-- > > 2 files changed, 26 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c > > index 1b1b8ff875cb..1976db7e34e4 100644 > > --- a/kernel/dma/map_benchmark.c > > +++ b/kernel/dma/map_benchmark.c > > @@ -21,6 +21,7 @@ > > #define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark) > > #define DMA_MAP_MAX_THREADS 1024 > > #define DMA_MAP_MAX_SECONDS 300 > > +#define DMA_MAP_MAX_TRANS_DELAY(10 * 1000 * 1000) /* 10ms */ > > Using MSEC_PER_SEC might be sufficiently self-documenting? Yes, I guess you mean NSEC_PER_MSEC. Will move to it. > > > #define DMA_MAP_BIDIRECTIONAL 0 > > #define DMA_MAP_TO_DEVICE 1 > > @@ -36,6 +37,7 @@ struct map_benchmark { > > __s32 node; /* which numa node this benchmark will run on */ > > __u32 dma_bits; /* DMA addressing capability */ > > __u32 dma_dir; /* DMA data direction */ > > + __u32 dma_trans_ns; /* time for DMA transmission in ns */ > > __u64 expansion[10];/* For future use */ > > }; > > > > @@ -87,6 +89,10 @@ static int map_benchmark_thread(void *data) > > map_etime = ktime_get(); > > map_delta = ktime_sub(map_etime, map_stime); > > > > + /* Pretend DMA is transmitting */ > > + if (map->dir != DMA_FROM_DEVICE) > > + ndelay(map->bparam.dma_trans_ns); > > TBH I think the option of a fixed delay between map and unmap might be a > handy thing in general, so having the direction check at all seems > needlessly restrictive. As long as the driver implements all the basic > building blocks, combining them to simulate specific traffic patterns > can be left up to the benchmark tool. Sensible, will remove the condition check. > > Robin. 
> > > + > > unmap_stime = ktime_get(); > > dma_unmap_single(map->dev, dma_addr, PAGE_SIZE, map->dir); > > unmap_etime = ktime_get(); > > @@ -218,6 +224,11 @@ static long map_benchmark_ioctl(struct file *file, > unsigned int cmd, > > return -EINVAL; > > } > > > > + if (map->bparam.dma_trans_ns > DMA_MAP_MAX_TRANS_DELAY) { > > + pr_err("invalid transmission delay\n"); > > + return -EINVAL; > > + } > > + > > if (map->bparam.node != NUMA_NO_NODE && > > !node_possible(map->bparam.node)) { > > pr_err("invalid numa node\n"); > > diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c > b/tools/testing/selftests/dma/dma_map_benchmark.c > > index 7065163a8388..dbf426e2fb7f 100644 > > --- a/tools/testing/selftests/dma/dma_map_benchmark.c
Re: [PATCH] dma-mapping: benchmark: pretend DMA is transmitting
On 2021-02-04 22:58, Barry Song wrote: In a real dma mapping user case, after dma_map is done, data will be transmit. Thus, in multi-threaded user scenario, IOMMU contention should not be that severe. For example, if users enable multiple threads to send network packets through 1G/10G/100Gbps NIC, usually the steps will be: map -> transmission -> unmap. Transmission delay reduces the contention of IOMMU. Here a delay is added to simulate the transmission for TX case so that the tested result could be more accurate. RX case would be much more tricky. It is not supported yet. I guess it might be a reasonable approximation to map several pages, then unmap them again after a slightly more random delay. Or maybe divide the threads into pairs of mappers and unmappers respectively filling up and draining proper little buffer pools. Signed-off-by: Barry Song --- kernel/dma/map_benchmark.c | 11 +++ tools/testing/selftests/dma/dma_map_benchmark.c | 17 +++-- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c index 1b1b8ff875cb..1976db7e34e4 100644 --- a/kernel/dma/map_benchmark.c +++ b/kernel/dma/map_benchmark.c @@ -21,6 +21,7 @@ #define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark) #define DMA_MAP_MAX_THREADS 1024 #define DMA_MAP_MAX_SECONDS 300 +#define DMA_MAP_MAX_TRANS_DELAY(10 * 1000 * 1000) /* 10ms */ Using MSEC_PER_SEC might be sufficiently self-documenting? 
#define DMA_MAP_BIDIRECTIONAL 0 #define DMA_MAP_TO_DEVICE 1 @@ -36,6 +37,7 @@ struct map_benchmark { __s32 node; /* which numa node this benchmark will run on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ + __u32 dma_trans_ns; /* time for DMA transmission in ns */ __u64 expansion[10];/* For future use */ }; @@ -87,6 +89,10 @@ static int map_benchmark_thread(void *data) map_etime = ktime_get(); map_delta = ktime_sub(map_etime, map_stime); + /* Pretend DMA is transmitting */ + if (map->dir != DMA_FROM_DEVICE) + ndelay(map->bparam.dma_trans_ns); TBH I think the option of a fixed delay between map and unmap might be a handy thing in general, so having the direction check at all seems needlessly restrictive. As long as the driver implements all the basic building blocks, combining them to simulate specific traffic patterns can be left up to the benchmark tool. Robin. + unmap_stime = ktime_get(); dma_unmap_single(map->dev, dma_addr, PAGE_SIZE, map->dir); unmap_etime = ktime_get(); @@ -218,6 +224,11 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd, return -EINVAL; } + if (map->bparam.dma_trans_ns > DMA_MAP_MAX_TRANS_DELAY) { + pr_err("invalid transmission delay\n"); + return -EINVAL; + } + if (map->bparam.node != NUMA_NO_NODE && !node_possible(map->bparam.node)) { pr_err("invalid numa node\n"); diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c index 7065163a8388..dbf426e2fb7f 100644 --- a/tools/testing/selftests/dma/dma_map_benchmark.c +++ b/tools/testing/selftests/dma/dma_map_benchmark.c @@ -14,6 +14,7 @@ #define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark) #define DMA_MAP_MAX_THREADS 1024 #define DMA_MAP_MAX_SECONDS 300 +#define DMA_MAP_MAX_TRANS_DELAY(10 * 1000 * 1000) /* 10ms */ #define DMA_MAP_BIDIRECTIONAL 0 #define DMA_MAP_TO_DEVICE 1 @@ -35,6 +36,7 @@ struct map_benchmark { __s32 node; /* which numa node this benchmark will run 
on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ + __u32 dma_trans_ns; /* delay for DMA transmission in ns */ __u64 expansion[10];/* For future use */ }; @@ -45,12 +47,12 @@ int main(int argc, char **argv) /* default single thread, run 20 seconds on NUMA_NO_NODE */ int threads = 1, seconds = 20, node = -1; /* default dma mask 32bit, bidirectional DMA */ - int bits = 32, dir = DMA_MAP_BIDIRECTIONAL; + int bits = 32, xdelay = 0, dir = DMA_MAP_BIDIRECTIONAL; int cmd = DMA_MAP_BENCHMARK; char *p; - while ((opt = getopt(argc, argv, "t:s:n:b:d:")) != -1) { + while ((opt = getopt(argc, argv, "t:s:n:b:d:x:")) != -1) { switch (opt) { case 't': threads = atoi(optarg); @@ -67,6 +69,9 @@ int main(int argc, char **argv) case 'd': dir = atoi(optarg); break; + case 'x': +
Re: [PATCH v3 1/2] dt-bindings: iommu: add bindings for sprd iommu
On Wed, Feb 03, 2021 at 05:07:26PM +0800, Chunyan Zhang wrote: > From: Chunyan Zhang > > This iommu module can be used by Unisoc's multimedia devices, such as > display, Image codec(jpeg) and a few signal processors, including > VSP(video), GSP(graphic), ISP(image), and CPP(camera pixel processor), etc. > > Signed-off-by: Chunyan Zhang > --- > .../devicetree/bindings/iommu/sprd,iommu.yaml | 72 +++ > 1 file changed, 72 insertions(+) > create mode 100644 Documentation/devicetree/bindings/iommu/sprd,iommu.yaml > > diff --git a/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml > b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml > new file mode 100644 > index ..4fc99e81fa66 > --- /dev/null > +++ b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml > @@ -0,0 +1,72 @@ > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > +# Copyright 2020 Unisoc Inc. > +%YAML 1.2 > +--- > +$id: http://devicetree.org/schemas/iommu/sprd,iommu.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: Unisoc IOMMU and Multi-media MMU > + > +maintainers: > + - Chunyan Zhang > + > +properties: > + compatible: > +enum: > + - sprd,iommu-v1 > + > + "#iommu-cells": > +const: 0 > +description: > + Unisoc IOMMUs are all single-master IOMMU devices, therefore no > + additional information needs to associate with its master device. > + Please refer to the generic bindings document for more details, > + Documentation/devicetree/bindings/iommu/iommu.txt > + > + reg: > +maxItems: 1 > +description: > + Not required if 'sprd,iommu-regs' is defined. > + > + clocks: > +description: > + Reference to a gate clock phandle, since access to some of IOMMUs are > + controlled by gate clock, but this is not required. 
> + > + sprd,iommu-regs: > +$ref: /schemas/types.yaml#/definitions/phandle-array > +description: > + Reference to a syscon phandle plus 1 cell, the syscon defines the > + register range used by the iommu and the media device, the cell > + defines the offset for iommu registers. Since iommu module shares > + the same register range with the media device which uses it. > + > +required: > + - compatible > + - "#iommu-cells" > + > +oneOf: > + - required: > + - reg > + - required: > + - sprd,iommu-regs > + > +additionalProperties: false > + > +examples: > + - | > +iommu_disp: iommu-disp { > + compatible = "sprd,iommu-v1"; > + sprd,iommu-regs = <&dpu_regs 0x800>; If the IOMMU is contained within another device, then it should just be a child node of that device. Or just make 'dpu_regs' an IOMMU provider (i.e. just add #iommu-cells to it). > + #iommu-cells = <0>; > +}; > + > + - | > +iommu_jpg: iommu-jpg { > + compatible = "sprd,iommu-v1"; > + sprd,iommu-regs = <&jpg_regs 0x300>; > + #iommu-cells = <0>; > + clocks = <&mm_gate 1>; > +}; > + > +... > -- > 2.25.1 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 7/8] swiotlb: respect min_align_mask
On 2021-02-04 19:30, Christoph Hellwig wrote: Respect the min_align_mask in struct device_dma_parameters in swiotlb. There are two parts to it: 1) for the lower bits of the alignment inside the io tlb slot, just extent the size of the allocation and leave the start of the slot empty 2) for the high bits ensure we find a slot that matches the high bits of the alignment to avoid wasting too much memory Based on an earlier patch from Jianxiong Gao . Signed-off-by: Christoph Hellwig --- kernel/dma/swiotlb.c | 49 +--- 1 file changed, 42 insertions(+), 7 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 6a2439826a1ba4..ab3192142b9906 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -468,6 +468,18 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, } } +/* + * Return the offset into a iotlb slot required to keep the device happy. + */ +static unsigned int swiotlb_align_offset(struct device *dev, u64 addr) +{ + unsigned min_align_mask = dma_get_min_align_mask(dev); + + if (!min_align_mask) + return 0; I doubt that's beneficial - even if the compiler can convert it into a csel, it'll then be doing unnecessary work to throw away a cheaply-calculated 0 in favour of hard-coded 0 in the one case it matters ;) + return addr & min_align_mask & ((1 << IO_TLB_SHIFT) - 1); (BTW, for readability throughout, "#define IO_TLB_SIZE (1 << IO_TLB_SHIFT)" sure wouldn't go amiss...) +} + /* * Carefully handle integer overflow which can occur when boundary_mask == ~0UL. 
*/ @@ -478,6 +490,16 @@ static inline unsigned long get_max_slots(unsigned long boundary_mask) return nr_slots(boundary_mask + 1); } +static inline bool check_alignment(phys_addr_t orig_addr, + dma_addr_t tbl_dma_addr, unsigned int index, + unsigned int min_align_mask) +{ + if (!min_align_mask) + return true; Ditto - even the 5 or so operations this might skip is unlikely to outweigh a branch on anything that matters, and again csel would be a net loss since x & 0 == y & 0 is still the correct answer. + return ((tbl_dma_addr + (index << IO_TLB_SHIFT)) & min_align_mask) == + (orig_addr & min_align_mask); +} + static unsigned int wrap_index(unsigned int index) { if (index >= io_tlb_nslabs) @@ -489,9 +511,11 @@ static unsigned int wrap_index(unsigned int index) * Find a suitable number of IO TLB entries size that will fit this request and * allocate a buffer from that IO TLB pool. */ -static int find_slots(struct device *dev, size_t alloc_size, - dma_addr_t tbl_dma_addr) +static int find_slots(struct device *dev, phys_addr_t orig_addr, + size_t alloc_size, dma_addr_t tbl_dma_addr) { + unsigned int min_align_mask = dma_get_min_align_mask(dev) & + ~((1 << IO_TLB_SHIFT) - 1); unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev)); unsigned int nslots = nr_slots(alloc_size), stride = 1; unsigned int index, wrap, count = 0, i; @@ -503,7 +527,9 @@ static int find_slots(struct device *dev, size_t alloc_size, * For mappings greater than or equal to a page, we limit the stride * (and hence alignment) to a page size. */ - if (alloc_size >= PAGE_SIZE) + if (min_align_mask) + stride = (min_align_mask + 1) >> IO_TLB_SHIFT; So this can't underflow because "min_align_mask" is actually just the high-order bits representing the number of iotlb slots needed to meet the requirement, right? (It took a good 5 minutes to realise this wasn't doing what I initially thought it did...) 
In that case, a) could the local var be called something like iotlb_align_mask to clarify that it's *not* just a copy of the device's min_align_mask, and b) maybe just have an unconditional initialisation that works either way: stride = (min_align_mask >> IO_TLB_SHIFT) + 1; In fact with that, I think could just mask orig_addr with ~IO_TLB_SIZE in the call to check_alignment() below, or shift everything down by IO_TLB_SHIFT in check_alignment() itself, instead of mangling min_align_mask at all (I'm assuming we do need to ignore the low-order bits of orig_addr at this point). Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] dma-mapping: benchmark: pretend DMA is transmitting
In a real dma mapping user case, after dma_map is done, data will be transmit. Thus, in multi-threaded user scenario, IOMMU contention should not be that severe. For example, if users enable multiple threads to send network packets through 1G/10G/100Gbps NIC, usually the steps will be: map -> transmission -> unmap. Transmission delay reduces the contention of IOMMU. Here a delay is added to simulate the transmission for TX case so that the tested result could be more accurate. RX case would be much more tricky. It is not supported yet. Signed-off-by: Barry Song --- kernel/dma/map_benchmark.c | 11 +++ tools/testing/selftests/dma/dma_map_benchmark.c | 17 +++-- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c index 1b1b8ff875cb..1976db7e34e4 100644 --- a/kernel/dma/map_benchmark.c +++ b/kernel/dma/map_benchmark.c @@ -21,6 +21,7 @@ #define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark) #define DMA_MAP_MAX_THREADS1024 #define DMA_MAP_MAX_SECONDS300 +#define DMA_MAP_MAX_TRANS_DELAY(10 * 1000 * 1000) /* 10ms */ #define DMA_MAP_BIDIRECTIONAL 0 #define DMA_MAP_TO_DEVICE 1 @@ -36,6 +37,7 @@ struct map_benchmark { __s32 node; /* which numa node this benchmark will run on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ + __u32 dma_trans_ns; /* time for DMA transmission in ns */ __u64 expansion[10];/* For future use */ }; @@ -87,6 +89,10 @@ static int map_benchmark_thread(void *data) map_etime = ktime_get(); map_delta = ktime_sub(map_etime, map_stime); + /* Pretend DMA is transmitting */ + if (map->dir != DMA_FROM_DEVICE) + ndelay(map->bparam.dma_trans_ns); + unmap_stime = ktime_get(); dma_unmap_single(map->dev, dma_addr, PAGE_SIZE, map->dir); unmap_etime = ktime_get(); @@ -218,6 +224,11 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd, return -EINVAL; } + if (map->bparam.dma_trans_ns > DMA_MAP_MAX_TRANS_DELAY) { + pr_err("invalid 
transmission delay\n"); + return -EINVAL; + } + if (map->bparam.node != NUMA_NO_NODE && !node_possible(map->bparam.node)) { pr_err("invalid numa node\n"); diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c index 7065163a8388..dbf426e2fb7f 100644 --- a/tools/testing/selftests/dma/dma_map_benchmark.c +++ b/tools/testing/selftests/dma/dma_map_benchmark.c @@ -14,6 +14,7 @@ #define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark) #define DMA_MAP_MAX_THREADS1024 #define DMA_MAP_MAX_SECONDS 300 +#define DMA_MAP_MAX_TRANS_DELAY(10 * 1000 * 1000) /* 10ms */ #define DMA_MAP_BIDIRECTIONAL 0 #define DMA_MAP_TO_DEVICE 1 @@ -35,6 +36,7 @@ struct map_benchmark { __s32 node; /* which numa node this benchmark will run on */ __u32 dma_bits; /* DMA addressing capability */ __u32 dma_dir; /* DMA data direction */ + __u32 dma_trans_ns; /* delay for DMA transmission in ns */ __u64 expansion[10];/* For future use */ }; @@ -45,12 +47,12 @@ int main(int argc, char **argv) /* default single thread, run 20 seconds on NUMA_NO_NODE */ int threads = 1, seconds = 20, node = -1; /* default dma mask 32bit, bidirectional DMA */ - int bits = 32, dir = DMA_MAP_BIDIRECTIONAL; + int bits = 32, xdelay = 0, dir = DMA_MAP_BIDIRECTIONAL; int cmd = DMA_MAP_BENCHMARK; char *p; - while ((opt = getopt(argc, argv, "t:s:n:b:d:")) != -1) { + while ((opt = getopt(argc, argv, "t:s:n:b:d:x:")) != -1) { switch (opt) { case 't': threads = atoi(optarg); @@ -67,6 +69,9 @@ int main(int argc, char **argv) case 'd': dir = atoi(optarg); break; + case 'x': + xdelay = atoi(optarg); + break; default: return -1; } @@ -84,6 +89,12 @@ int main(int argc, char **argv) exit(1); } + if (xdelay < 0 || xdelay > DMA_MAP_MAX_TRANS_DELAY) { + fprintf(stderr, "invalid transmit delay, must be in 0-%d\n", + DMA_MAP_MAX_TRANS_DELAY); + exit(1); + } + /* suppose the mininum DMA zone is 1MB in the world */ if (bits < 20 || bits > 64) { fprintf(stderr, "invalid dma mask bit, must 
be in 20-64\n"); @@ -107,6 +118,8 @@ int main(int argc, char **argv)
Re: [PATCH 5/8] swiotlb: refactor swiotlb_tbl_map_single
On 2021-02-04 19:30, Christoph Hellwig wrote: Split out a bunch of a self-contained helpers to make the function easier to follow. Signed-off-by: Christoph Hellwig --- kernel/dma/swiotlb.c | 177 +-- 1 file changed, 87 insertions(+), 90 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 79d5b236f25f10..e78615e3be2906 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -468,134 +468,131 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, } } -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr, - size_t mapping_size, size_t alloc_size, - enum dma_data_direction dir, unsigned long attrs) +/* + * Carefully handle integer overflow which can occur when boundary_mask == ~0UL. + */ +static inline unsigned long get_max_slots(unsigned long boundary_mask) { - dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start); - unsigned long flags; - phys_addr_t tlb_addr; - unsigned int nslots, stride, index, wrap; - int i; - unsigned long mask; - unsigned long offset_slots; - unsigned long max_slots; - unsigned long tmp_io_tlb_used; - - if (no_iotlb_memory) - panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer"); - - if (mem_encrypt_active()) - pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n"); - - if (mapping_size > alloc_size) { - dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)", - mapping_size, alloc_size); - return (phys_addr_t)DMA_MAPPING_ERROR; - } - - mask = dma_get_seg_boundary(hwdev); + if (boundary_mask + 1 == ~0UL) Either "mask == ~0UL" or "mask + 1 == 0", surely? 
+ return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT); + return nr_slots(boundary_mask + 1); +} - tbl_dma_addr &= mask; +static unsigned int wrap_index(unsigned int index) +{ + if (index >= io_tlb_nslabs) + return 0; + return index; +} - offset_slots = nr_slots(tbl_dma_addr); +/* + * Find a suitable number of IO TLB entries size that will fit this request and + * allocate a buffer from that IO TLB pool. + */ +static int find_slots(struct device *dev, size_t alloc_size, + dma_addr_t tbl_dma_addr) +{ + unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev)); + unsigned int nslots = nr_slots(alloc_size), stride = 1; + unsigned int index, wrap, count = 0, i; + unsigned long flags; - /* -* Carefully handle integer overflow which can occur when mask == ~0UL. -*/ - max_slots = mask + 1 - ? nr_slots(mask + 1) - : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT); ...since the condition here is effectively the latter. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/8] swiotlb: factor out a nr_slots helper
On 2021-02-04 19:30, Christoph Hellwig wrote: Factor out a helper to find the number of slots for a given size. Signed-off-by: Christoph Hellwig --- kernel/dma/swiotlb.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 838dbad10ab916..0c0b81799edbdb 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -194,6 +194,11 @@ static inline unsigned long io_tlb_offset(unsigned long val) return val & (IO_TLB_SEGSIZE - 1); } +static unsigned long nr_slots(u64 val) +{ + return ALIGN(val, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; Would DIV_ROUND_UP(val, 1 << IOTLB_SHIFT) be even clearer? Robin. +} + /* * Early SWIOTLB allocation may be too early to allow an architecture to * perform the desired operations. This function allows the architecture to @@ -493,20 +498,20 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr, tbl_dma_addr &= mask; - offset_slots = ALIGN(tbl_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; + offset_slots = nr_slots(tbl_dma_addr); /* * Carefully handle integer overflow which can occur when mask == ~0UL. */ max_slots = mask + 1 - ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT + ? nr_slots(mask + 1) : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT); /* * For mappings greater than or equal to a page, we limit the stride * (and hence alignment) to a page size. 
*/ - nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; + nslots = nr_slots(alloc_size); if (alloc_size >= PAGE_SIZE) stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT)); else @@ -602,7 +607,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr, enum dma_data_direction dir, unsigned long attrs) { unsigned long flags; - int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; + int i, count, nslots = nr_slots(alloc_size); int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT; phys_addr_t orig_addr = io_tlb_orig_addr[index]; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [EXTERNAL] Re: Question regarding VIOT proposal
On 03 Feb 2021 09:46, Jean-Philippe Brucker wrote: > On Tue, Feb 02, 2021 at 01:27:13PM -0700, Al Stone wrote: > > On 02 Feb 2021 10:17, Jean-Philippe Brucker wrote: > > > Hi Al, > > > > > > On Fri, Dec 04, 2020 at 01:18:25PM -0700, Al Stone wrote: > > > > > I updated the doc: https://jpbrucker.net/virtio-iommu/viot/viot-v9.pdf > > > > > You can incorporate it into the ASWG proposal. > > > > > Changes since v8: > > > > > * One typo (s/programing/programming/) > > > > > * Modified the PCI Range node to include a segment range. > > > > > > > > > > I also updated the Linux and QEMU implementations on branch > > > > > virtio-iommu/devel in https://jpbrucker.net/git/linux/ and > > > > > https://jpbrucker.net/git/qemu/ > > > > > > > > > > Thanks again for helping with this > > > > > > > > > > Jean > > > > > > > > Perfect. Thanks. I'll update the ASWG info right away. > > > > > > Has there been any more feedback on the VIOT specification? I'm wondering > > > when we can start upstreaming support for it. > > > > > > Thanks, > > > Jean > > > > Ah, sorry, Jean. I meant to get back to you sooner. My apologies. > > > > The latest version that resulted from the discussion with Yinghan of > > Microsoft is the one in front of the ASWG; I think if you upstream that > > version, it should be identical to the spec when it is next published > > (post ACPI 6.4). > > > > The process is: (1) propose the change, (2) review it in committee, > > (3) give people more time to think about it, then (4) have a final > > vote. We've been iterating over (1), (2) and (3). Since there was > > no new discussion at the last meeting, we should have the final vote > > on this (4) in the next meeting. I had hoped we could do that last > > week but the meeting was canceled at the last minute. I hope to have > > the final vote this Thursday and will let you know as soon as it has > > been decided. 
> > > > My apologies for the delays; getting things done by committee always > > takes a bazillion times longer than one would think. > No worries, thanks a lot for the update! > > Thanks, > Jean Sigh. Just got a note that today's meeting has been canceled :(. So, next week for the final vote. OTOH, there have been no comments of any sort on the proposal -- and silence is good, in this case. -- ciao, al --- Al Stone Software Engineer Red Hat, Inc. a...@redhat.com ---
Re: [RFC PATCH 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log
On 2021-01-28 15:17, Keqian Zhu wrote: From: jiangkunkun During dirty log tracking, user will try to retrieve dirty log from iommu if it supports hardware dirty log. This adds a new interface named sync_dirty_log in iommu layer and arm smmuv3 implements it, which scans leaf TTD and treats it's dirty if it's writable (As we just enable HTTU for stage1, so check AP[2] is not set). Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 +++ drivers/iommu/io-pgtable-arm.c | 90 + drivers/iommu/iommu.c | 41 ++ include/linux/io-pgtable.h | 4 + include/linux/iommu.h | 17 5 files changed, 179 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 2434519e4bb6..43d0536b429a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2548,6 +2548,32 @@ static size_t arm_smmu_merge_page(struct iommu_domain *domain, unsigned long iov return ops->merge_page(ops, iova, paddr, size, prot); } +static int arm_smmu_sync_dirty_log(struct iommu_domain *domain, + unsigned long iova, size_t size, + unsigned long *bitmap, + unsigned long base_iova, + unsigned long bitmap_pgshift) +{ + struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; + struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; + + if (!(smmu->features & ARM_SMMU_FEAT_HTTU_HD)) { + dev_err(smmu->dev, "don't support HTTU_HD and sync dirty log\n"); + return -EPERM; + } + + if (!ops || !ops->sync_dirty_log) { + pr_err("don't support sync dirty log\n"); + return -ENODEV; + } + + /* To ensure all inflight transactions are completed */ + arm_smmu_flush_iotlb_all(domain); What about transactions that arrive between the point that this completes, and the point - potentially much later - that we actually access any given PTE during the walk? 
I don't see what this is supposed to be synchronising against, even if it were just a CMD_SYNC (I especially don't see why we'd want to knock out the TLBs). + + return ops->sync_dirty_log(ops, iova, size, bitmap, + base_iova, bitmap_pgshift); +} + static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args) { return iommu_fwspec_add_ids(dev, args->args, 1); @@ -2649,6 +2675,7 @@ static struct iommu_ops arm_smmu_ops = { .domain_set_attr= arm_smmu_domain_set_attr, .split_block= arm_smmu_split_block, .merge_page = arm_smmu_merge_page, + .sync_dirty_log = arm_smmu_sync_dirty_log, .of_xlate = arm_smmu_of_xlate, .get_resv_regions = arm_smmu_get_resv_regions, .put_resv_regions = generic_iommu_put_resv_regions, diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 17390f258eb1..6cfe1ef3fedd 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -877,6 +877,95 @@ static size_t arm_lpae_merge_page(struct io_pgtable_ops *ops, unsigned long iova return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot); } +static int __arm_lpae_sync_dirty_log(struct arm_lpae_io_pgtable *data, +unsigned long iova, size_t size, +int lvl, arm_lpae_iopte *ptep, +unsigned long *bitmap, +unsigned long base_iova, +unsigned long bitmap_pgshift) +{ + arm_lpae_iopte pte; + struct io_pgtable *iop = &data->iop; + size_t base, next_size; + unsigned long offset; + int nbits, ret; + + if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) + return -EINVAL; + + ptep += ARM_LPAE_LVL_IDX(iova, lvl, data); + pte = READ_ONCE(*ptep); + if (WARN_ON(!pte)) + return -EINVAL; + + if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) { + if (iopte_leaf(pte, lvl, iop->fmt)) { + if (pte & ARM_LPAE_PTE_AP_RDONLY) + return 0; + + /* It is writable, set the bitmap */ + nbits = size >> bitmap_pgshift; + offset = (iova - base_iova) >> bitmap_pgshift; + bitmap_set(bitmap, offset, nbits); + return 0; + } else { + /* To traverse next level */ + next_size = 
ARM_LPAE_BLOCK_SIZE(lvl + 1, data); +
Re: [RFC PATCH 05/11] iommu/arm-smmu-v3: Merge a span of page to block descriptor
On 2021-01-28 15:17, Keqian Zhu wrote: From: jiangkunkun When stop dirty log tracking, we need to recover all block descriptors which are splited when start dirty log tracking. This adds a new interface named merge_page in iommu layer and arm smmuv3 implements it, which reinstall block mappings and unmap the span of page mappings. It's caller's duty to find contiuous physical memory. During merging page, other interfaces are not expected to be working, so race condition does not exist. And we flush all iotlbs after the merge procedure is completed to ease the pressure of iommu, as we will merge a huge range of page mappings in general. Again, I think we need better reasoning than "race conditions don't exist because we don't expect them to exist". Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 20 ++ drivers/iommu/io-pgtable-arm.c | 78 + drivers/iommu/iommu.c | 75 include/linux/io-pgtable.h | 2 + include/linux/iommu.h | 10 +++ 5 files changed, 185 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 5469f4fca820..2434519e4bb6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2529,6 +2529,25 @@ static size_t arm_smmu_split_block(struct iommu_domain *domain, return ops->split_block(ops, iova, size); } +static size_t arm_smmu_merge_page(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size, int prot) +{ + struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; + struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; + + if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) { + dev_err(smmu->dev, "don't support BBML1/2 and merge page\n"); + return 0; + } + + if (!ops || !ops->merge_page) { + pr_err("don't support merge page\n"); + return 0; + } + + return ops->merge_page(ops, iova, paddr, size, prot); +} + static int 
arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args) { return iommu_fwspec_add_ids(dev, args->args, 1); @@ -2629,6 +2648,7 @@ static struct iommu_ops arm_smmu_ops = { .domain_get_attr= arm_smmu_domain_get_attr, .domain_set_attr= arm_smmu_domain_set_attr, .split_block= arm_smmu_split_block, + .merge_page = arm_smmu_merge_page, .of_xlate = arm_smmu_of_xlate, .get_resv_regions = arm_smmu_get_resv_regions, .put_resv_regions = generic_iommu_put_resv_regions, diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index f3b7f7115e38..17390f258eb1 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -800,6 +800,83 @@ static size_t arm_lpae_split_block(struct io_pgtable_ops *ops, return __arm_lpae_split_block(data, iova, size, lvl, ptep); } +static size_t __arm_lpae_merge_page(struct arm_lpae_io_pgtable *data, + unsigned long iova, phys_addr_t paddr, + size_t size, int lvl, arm_lpae_iopte *ptep, + arm_lpae_iopte prot) +{ + arm_lpae_iopte pte, *tablep; + struct io_pgtable *iop = &data->iop; + struct io_pgtable_cfg *cfg = &data->iop.cfg; + + if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) + return 0; + + ptep += ARM_LPAE_LVL_IDX(iova, lvl, data); + pte = READ_ONCE(*ptep); + if (WARN_ON(!pte)) + return 0; + + if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) { + if (iopte_leaf(pte, lvl, iop->fmt)) + return size; + + /* Race does not exist */ + if (cfg->bbml == 1) { + prot |= ARM_LPAE_PTE_NT; + __arm_lpae_init_pte(data, paddr, prot, lvl, ptep); + io_pgtable_tlb_flush_walk(iop, iova, size, + ARM_LPAE_GRANULE(data)); + + prot &= ~(ARM_LPAE_PTE_NT); + __arm_lpae_init_pte(data, paddr, prot, lvl, ptep); + } else { + __arm_lpae_init_pte(data, paddr, prot, lvl, ptep); + } + + tablep = iopte_deref(pte, data); + __arm_lpae_free_pgtable(data, lvl + 1, tablep); + return size; + } else if (iopte_leaf(pte, lvl, iop->fmt)) { + /* The size is too small, already merged */ + return size; + } + + /* Keep on walkin */ + ptep = 
iopte_deref(pte, data); + return __a
Re: [RFC PATCH 04/11] iommu/arm-smmu-v3: Split block descriptor to a span of page
On 2021-01-28 15:17, Keqian Zhu wrote: From: jiangkunkun Block descriptor is not a proper granule for dirty log tracking. This adds a new interface named split_block in iommu layer and arm smmuv3 implements it, which splits block descriptor to an equivalent span of page descriptors. During spliting block, other interfaces are not expected to be working, so race condition does not exist. And we flush all iotlbs after the split procedure is completed to ease the pressure of iommu, as we will split a huge range of block mappings in general. "Not expected to be" is not the same thing as "can not". Presumably the whole point of dirty log tracking is that it can be run speculatively in the background, so is there any actual guarantee that the guest can't, say, issue a hotplug event that would cause some memory to be released back to the host and unmapped while a scan might be in progress? Saying effectively "there is no race condition as long as you assume there is no race condition" isn't all that reassuring... That said, it's not very clear why patches #4 and #5 are here at all, given that patches #6 and #7 appear quite happy to handle block entries. 
Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 20 drivers/iommu/io-pgtable-arm.c | 122 drivers/iommu/iommu.c | 40 +++ include/linux/io-pgtable.h | 2 + include/linux/iommu.h | 10 ++ 5 files changed, 194 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 9208881a571c..5469f4fca820 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2510,6 +2510,25 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, return ret; } +static size_t arm_smmu_split_block(struct iommu_domain *domain, + unsigned long iova, size_t size) +{ + struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; + struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; + + if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) { + dev_err(smmu->dev, "don't support BBML1/2 and split block\n"); + return 0; + } + + if (!ops || !ops->split_block) { + pr_err("don't support split block\n"); + return 0; + } + + return ops->split_block(ops, iova, size); +} + static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args) { return iommu_fwspec_add_ids(dev, args->args, 1); @@ -2609,6 +2628,7 @@ static struct iommu_ops arm_smmu_ops = { .device_group = arm_smmu_device_group, .domain_get_attr= arm_smmu_domain_get_attr, .domain_set_attr= arm_smmu_domain_set_attr, + .split_block= arm_smmu_split_block, .of_xlate = arm_smmu_of_xlate, .get_resv_regions = arm_smmu_get_resv_regions, .put_resv_regions = generic_iommu_put_resv_regions, diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index e299a44808ae..f3b7f7115e38 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -79,6 +79,8 @@ #define ARM_LPAE_PTE_SH_IS(((arm_lpae_iopte)3) << 8) #define ARM_LPAE_PTE_NS (((arm_lpae_iopte)1) << 5) #define 
ARM_LPAE_PTE_VALID(((arm_lpae_iopte)1) << 0) +/* Block descriptor bits */ +#define ARM_LPAE_PTE_NT(((arm_lpae_iopte)1) << 16) #define ARM_LPAE_PTE_ATTR_LO_MASK (((arm_lpae_iopte)0x3ff) << 2) /* Ignore the contiguous bit for block splitting */ @@ -679,6 +681,125 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops, return iopte_to_paddr(pte, data) | iova; } +static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data, +unsigned long iova, size_t size, int lvl, +arm_lpae_iopte *ptep); + +static size_t arm_lpae_do_split_blk(struct arm_lpae_io_pgtable *data, + unsigned long iova, size_t size, + arm_lpae_iopte blk_pte, int lvl, + arm_lpae_iopte *ptep) +{ + struct io_pgtable_cfg *cfg = &data->iop.cfg; + arm_lpae_iopte pte, *tablep; + phys_addr_t blk_paddr; + size_t tablesz = ARM_LPAE_GRANULE(data); + size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data); + int i; + + if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) + return 0; + + tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg); + if (!tablep) + return 0; + + blk_paddr = iopte_to_paddr(blk_pte, data); +
Re: [RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU
On 2021-01-28 15:17, Keqian Zhu wrote: From: jiangkunkun The SMMU which supports HTTU (Hardware Translation Table Update) can update the access flag and the dirty state of TTD by hardware. It is essential to track dirty pages of DMA. This adds feature detection, none functional change. Co-developed-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 include/linux/io-pgtable.h | 1 + 3 files changed, 25 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 8ca7415d785d..0f0fe71cc10d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1987,6 +1987,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain, .pgsize_bitmap = smmu->pgsize_bitmap, .ias= ias, .oas= oas, + .httu_hd= smmu->features & ARM_SMMU_FEAT_HTTU_HD, .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENCY, .tlb= &arm_smmu_flush_ops, .iommu_dev = smmu->dev, @@ -3224,6 +3225,21 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu) if (reg & IDR0_HYP) smmu->features |= ARM_SMMU_FEAT_HYP; + switch (FIELD_GET(IDR0_HTTU, reg)) { We need to accommodate the firmware override as well if we need this to be meaningful. Jean-Philippe is already carrying a suitable patch in the SVA stack[1]. + case IDR0_HTTU_NONE: + break; + case IDR0_HTTU_HA: + smmu->features |= ARM_SMMU_FEAT_HTTU_HA; + break; + case IDR0_HTTU_HAD: + smmu->features |= ARM_SMMU_FEAT_HTTU_HA; + smmu->features |= ARM_SMMU_FEAT_HTTU_HD; + break; + default: + dev_err(smmu->dev, "unknown/unsupported HTTU!\n"); + return -ENXIO; + } + /* * The coherency feature as set by FW is used in preference to the ID * register, but warn on mismatch. 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 96c2e9565e00..e91bea44519e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -33,6 +33,10 @@ #define IDR0_ASID16 (1 << 12) #define IDR0_ATS (1 << 10) #define IDR0_HYP (1 << 9) +#define IDR0_HTTU GENMASK(7, 6) +#define IDR0_HTTU_NONE 0 +#define IDR0_HTTU_HA 1 +#define IDR0_HTTU_HAD 2 #define IDR0_COHACC (1 << 4) #define IDR0_TTF GENMASK(3, 2) #define IDR0_TTF_AARCH64 2 @@ -286,6 +290,8 @@ #define CTXDESC_CD_0_TCR_TBI0 (1ULL << 38) #define CTXDESC_CD_0_AA64 (1UL << 41) +#define CTXDESC_CD_0_HD(1UL << 42) +#define CTXDESC_CD_0_HA(1UL << 43) #define CTXDESC_CD_0_S(1UL << 44) #define CTXDESC_CD_0_R(1UL << 45) #define CTXDESC_CD_0_A(1UL << 46) @@ -604,6 +610,8 @@ struct arm_smmu_device { #define ARM_SMMU_FEAT_RANGE_INV (1 << 15) #define ARM_SMMU_FEAT_BTM (1 << 16) #define ARM_SMMU_FEAT_SVA (1 << 17) +#define ARM_SMMU_FEAT_HTTU_HA (1 << 18) +#define ARM_SMMU_FEAT_HTTU_HD (1 << 19) u32 features; #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0) diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index ea727eb1a1a9..1a00ea8562c7 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -97,6 +97,7 @@ struct io_pgtable_cfg { unsigned long pgsize_bitmap; unsigned intias; unsigned intoas; + boolhttu_hd; This is very specific to the AArch64 stage 1 format, not a generic capability - I think it should be a quirk flag rather than a common field. Robin. [1] https://jpbrucker.net/git/linux/commit/?h=sva/current&id=1ef7d512fb9082450dfe0d22ca4f7e35625a097b boolcoherent_walk; const struct iommu_flush_ops*tlb; struct device *iommu_dev; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/8] driver core: add a min_align_mask field to struct device_dma_parameters
On Thu, Feb 04, 2021 at 08:30:28PM +0100, Christoph Hellwig wrote: > From: Jianxiong Gao > > Some devices rely on the address offset in a page to function > correctly (NVMe driver as an example). These devices may use > a different page size than the Linux kernel. The address offset > has to be preserved upon mapping, and in order to do so, we > need to record the page_offset_mask first. > > Signed-off-by: Jianxiong Gao > Signed-off-by: Christoph Hellwig > --- > include/linux/device.h | 1 + > include/linux/dma-mapping.h | 16 > 2 files changed, 17 insertions(+) Acked-by: Greg Kroah-Hartman
Re: [PATCH RFC v1 2/6] swiotlb: convert variables to arrays
On Thu, Feb 04, 2021 at 11:49:23AM +, Robin Murphy wrote: > On 2021-02-04 07:29, Christoph Hellwig wrote: > > On Wed, Feb 03, 2021 at 03:37:05PM -0800, Dongli Zhang wrote: > > > This patch converts several swiotlb related variables to arrays, in > > > order to maintain stat/status for different swiotlb buffers. Here are > > > variables involved: > > > > > > - io_tlb_start and io_tlb_end > > > - io_tlb_nslabs and io_tlb_used > > > - io_tlb_list > > > - io_tlb_index > > > - max_segment > > > - io_tlb_orig_addr > > > - no_iotlb_memory > > > > > > There is no functional change and this is to prepare to enable 64-bit > > > swiotlb. > > > > Claire Chang (on Cc) already posted a patch like this a month ago, > > which looks much better because it actually uses a struct instead > > of all the random variables. > > Indeed, I skimmed the cover letter and immediately thought that this whole > thing is just the restricted DMA pool concept[1] again, only from a slightly > different angle. Kind of. Let me lay out how some of these pieces are right now: +---+ +--+ | | | | | | | | | a)Xen-SWIOTLB | | b)SWIOTLB (for !Xen) | | | | | +---XX--+ +---X--+ X XX XXX X XX +--XX---+ | | | | | c) SWIOTLB generic | | | +---+ Dongli's patches modify the SWIOTLB generic c), and Xen-SWIOTLB a) parts. Also see the IOMMU_INIT logic which lays this a bit more deepth (for example how to enable SWIOTLB on AMD boxes, or IBM with Calgary IOMMU, etc - see iommu_table.h). Furtheremore it lays the groundwork to allocate AMD SEV SWIOTLB buffers later after boot (so that you can stich different pools together). All the bits are kind of inside of the SWIOTLB code. And also it changes the Xen-SWIOTLB to do something similar. 
The mempool did it similarly by taking the internal parts (aka the various io_tlb) of SWIOTLB and exposing them out and having other code:

 +----------------+    +-----------------------+
 |                |    |                       |
 | a) Xen-SWIOTLB |    | b) SWIOTLB (for !Xen) |
 |                |    |                       |
 +--------+-------+    +-----------+-----------+
          \                       /
           \                     /
        +---+--------------------+--+    +------------------+
        |                           |    | Device tree      |
        |    c) SWIOTLB generic     |<---+ enabling SWIOTLB |
        |       mempool             |    |                  |
        +---------------------------+    +------------------+

What I was suggesting to Claire was to follow the Xen model, that is, do something like this:

 +----------------+  +-----------------------+  +----------------+
 |                |  |                       |  |                |
 | a) Xen-SWIOTLB |  | b) SWIOTLB (for !Xen) |  | e) DT-SWIOTLB  |
 |                |  |                       |  |                |
 +--------+-------+  +-----------+-----------+  +-------+--------+
           \                     |                     /
            \                    |                    /
           +-+-------------------+-------------------+-+
           |                                           |
           |            c) SWIOTLB generic             |
           |                                           |
           +-------------------------------------------+

so using the SWIOTLB generic parts, and then bolting the device-tree logic on top, along with the mempool logic.

But Christoph has an interesting suggestion which is to squash all the existing code (a, b, c) together and pepper it with various jump-tables. So:

 +----------------------------+
 | SWIOTLB:                   |
 |                            |
 |  a) SWIOTLB (for non-Xen)  |
 |  b) Xen-SWIOTLB            |
 |  c) DT-SWIOTLB             |
 |                            |
 +----------------------------+

with all the various bits (M2P/P2M for Xen, mempool for ARM, and normal allocation for BM) in one big file.
Re: [PATCH 8/8] nvme-pci: set min_align_mask
> + dma_set_min_align_mask(&pdev->dev, NVME_CTRL_PAGE_SIZE - 1);

And due to a last minute change from me this doesn't actually compile, as pdev should be dev.
[PATCH 7/8] swiotlb: respect min_align_mask
Respect the min_align_mask in struct device_dma_parameters in swiotlb. There are two parts to it: 1) for the lower bits of the alignment inside the io tlb slot, just extent the size of the allocation and leave the start of the slot empty 2) for the high bits ensure we find a slot that matches the high bits of the alignment to avoid wasting too much memory Based on an earlier patch from Jianxiong Gao . Signed-off-by: Christoph Hellwig --- kernel/dma/swiotlb.c | 49 +--- 1 file changed, 42 insertions(+), 7 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 6a2439826a1ba4..ab3192142b9906 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -468,6 +468,18 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, } } +/* + * Return the offset into a iotlb slot required to keep the device happy. + */ +static unsigned int swiotlb_align_offset(struct device *dev, u64 addr) +{ + unsigned min_align_mask = dma_get_min_align_mask(dev); + + if (!min_align_mask) + return 0; + return addr & min_align_mask & ((1 << IO_TLB_SHIFT) - 1); +} + /* * Carefully handle integer overflow which can occur when boundary_mask == ~0UL. */ @@ -478,6 +490,16 @@ static inline unsigned long get_max_slots(unsigned long boundary_mask) return nr_slots(boundary_mask + 1); } +static inline bool check_alignment(phys_addr_t orig_addr, + dma_addr_t tbl_dma_addr, unsigned int index, + unsigned int min_align_mask) +{ + if (!min_align_mask) + return true; + return ((tbl_dma_addr + (index << IO_TLB_SHIFT)) & min_align_mask) == + (orig_addr & min_align_mask); +} + static unsigned int wrap_index(unsigned int index) { if (index >= io_tlb_nslabs) @@ -489,9 +511,11 @@ static unsigned int wrap_index(unsigned int index) * Find a suitable number of IO TLB entries size that will fit this request and * allocate a buffer from that IO TLB pool. 
*/ -static int find_slots(struct device *dev, size_t alloc_size, - dma_addr_t tbl_dma_addr) +static int find_slots(struct device *dev, phys_addr_t orig_addr, + size_t alloc_size, dma_addr_t tbl_dma_addr) { + unsigned int min_align_mask = dma_get_min_align_mask(dev) & + ~((1 << IO_TLB_SHIFT) - 1); unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev)); unsigned int nslots = nr_slots(alloc_size), stride = 1; unsigned int index, wrap, count = 0, i; @@ -503,7 +527,9 @@ static int find_slots(struct device *dev, size_t alloc_size, * For mappings greater than or equal to a page, we limit the stride * (and hence alignment) to a page size. */ - if (alloc_size >= PAGE_SIZE) + if (min_align_mask) + stride = (min_align_mask + 1) >> IO_TLB_SHIFT; + else if (alloc_size >= PAGE_SIZE) stride <<= (PAGE_SHIFT - IO_TLB_SHIFT); spin_lock_irqsave(&io_tlb_lock, flags); @@ -512,6 +538,12 @@ static int find_slots(struct device *dev, size_t alloc_size, index = wrap = wrap_index(ALIGN(io_tlb_index, stride)); do { + if (!check_alignment(orig_addr, tbl_dma_addr, index, +min_align_mask)) { + index = wrap_index(index + 1); + continue; + } + /* * If we find a slot that indicates we have 'nslots' number of * contiguous buffers, we allocate the buffers from that slot @@ -557,6 +589,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr, { dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(dev, io_tlb_start) & dma_get_seg_boundary(dev); + unsigned int offset = swiotlb_align_offset(dev, orig_addr); unsigned int index, i; phys_addr_t tlb_addr; @@ -572,7 +605,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr, return (phys_addr_t)DMA_MAPPING_ERROR; } - index = find_slots(dev, alloc_size, tbl_dma_addr); + alloc_size += offset; + index = find_slots(dev, orig_addr, alloc_size, tbl_dma_addr); if (index == -1) { if (!(attrs & DMA_ATTR_NO_WARN)) dev_warn_ratelimited(dev, @@ -589,7 +623,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device 
*dev, phys_addr_t orig_addr, for (i = 0; i < nr_slots(alloc_size); i++) io_tlb_orig_addr[index + i] = orig_addr + (i << IO_TLB_SHIFT); - tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT); + tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT) + offset; if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) swiotlb_bou
[PATCH 8/8] nvme-pci: set min_align_mask
From: Jianxiong Gao

The PRP addressing scheme requires all PRP entries except for the first one to have a zero offset into the NVMe controller pages (which can be different from the Linux PAGE_SIZE). Use the min_align_mask device parameter to ensure that swiotlb does not change the address of the buffer modulo the device page size, so that the PRPs won't be malformed.

Signed-off-by: Jianxiong Gao
Signed-off-by: Christoph Hellwig
---
 drivers/nvme/host/pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 81e6389b204205..5d194b4e814719 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2629,6 +2629,7 @@ static void nvme_reset_work(struct work_struct *work)
 	 * Don't limit the IOMMU merged segment size.
 	 */
 	dma_set_max_seg_size(dev->dev, 0x);
+	dma_set_min_align_mask(&pdev->dev, NVME_CTRL_PAGE_SIZE - 1);

 	mutex_unlock(&dev->shutdown_lock);
-- 
2.29.2
[PATCH 5/8] swiotlb: refactor swiotlb_tbl_map_single
Split out a bunch of a self-contained helpers to make the function easier to follow. Signed-off-by: Christoph Hellwig --- kernel/dma/swiotlb.c | 177 +-- 1 file changed, 87 insertions(+), 90 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 79d5b236f25f10..e78615e3be2906 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -468,134 +468,131 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, } } -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr, - size_t mapping_size, size_t alloc_size, - enum dma_data_direction dir, unsigned long attrs) +/* + * Carefully handle integer overflow which can occur when boundary_mask == ~0UL. + */ +static inline unsigned long get_max_slots(unsigned long boundary_mask) { - dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start); - unsigned long flags; - phys_addr_t tlb_addr; - unsigned int nslots, stride, index, wrap; - int i; - unsigned long mask; - unsigned long offset_slots; - unsigned long max_slots; - unsigned long tmp_io_tlb_used; - - if (no_iotlb_memory) - panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer"); - - if (mem_encrypt_active()) - pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n"); - - if (mapping_size > alloc_size) { - dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)", - mapping_size, alloc_size); - return (phys_addr_t)DMA_MAPPING_ERROR; - } - - mask = dma_get_seg_boundary(hwdev); + if (boundary_mask + 1 == ~0UL) + return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT); + return nr_slots(boundary_mask + 1); +} - tbl_dma_addr &= mask; +static unsigned int wrap_index(unsigned int index) +{ + if (index >= io_tlb_nslabs) + return 0; + return index; +} - offset_slots = nr_slots(tbl_dma_addr); +/* + * Find a suitable number of IO TLB entries size that will fit this request and + * allocate a buffer from that IO TLB pool. 
+ */ +static int find_slots(struct device *dev, size_t alloc_size, + dma_addr_t tbl_dma_addr) +{ + unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev)); + unsigned int nslots = nr_slots(alloc_size), stride = 1; + unsigned int index, wrap, count = 0, i; + unsigned long flags; - /* -* Carefully handle integer overflow which can occur when mask == ~0UL. -*/ - max_slots = mask + 1 - ? nr_slots(mask + 1) - : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT); + BUG_ON(!nslots); /* * For mappings greater than or equal to a page, we limit the stride * (and hence alignment) to a page size. */ - nslots = nr_slots(alloc_size); if (alloc_size >= PAGE_SIZE) - stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT)); - else - stride = 1; - - BUG_ON(!nslots); + stride <<= (PAGE_SHIFT - IO_TLB_SHIFT); - /* -* Find suitable number of IO TLB entries size that will fit this -* request and allocate a buffer from that IO TLB pool. -*/ spin_lock_irqsave(&io_tlb_lock, flags); - if (unlikely(nslots > io_tlb_nslabs - io_tlb_used)) goto not_found; - index = ALIGN(io_tlb_index, stride); - if (index >= io_tlb_nslabs) - index = 0; - wrap = index; - + index = wrap = wrap_index(ALIGN(io_tlb_index, stride)); do { - while (iommu_is_span_boundary(index, nslots, offset_slots, - max_slots)) { - index += stride; - if (index >= io_tlb_nslabs) - index = 0; - if (index == wrap) - goto not_found; - } - /* * If we find a slot that indicates we have 'nslots' number of * contiguous buffers, we allocate the buffers from that slot * and mark the entries as '0' indicating unavailable. */ - if (io_tlb_list[index] >= nslots) { - int count = 0; - - for (i = index; i < (int) (index + nslots); i++) - io_tlb_list[i] = 0; - for (i = index - 1; -io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && -io_tlb_list[i]; i--) - io_tlb_list[i] = ++count; - tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT
[PATCH 6/8] swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
swiotlb_tbl_map_single currently never sets a tlb_addr that is not aligned to the tlb bucket size. But we're going to add such a case soon, for which this adjustment would be bogus.

Signed-off-by: Christoph Hellwig
---
 kernel/dma/swiotlb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e78615e3be2906..6a2439826a1ba4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -658,7 +658,6 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
 	if (orig_addr == INVALID_PHYS_ADDR)
 		return;
-	orig_addr += (unsigned long)tlb_addr & ((1 << IO_TLB_SHIFT) - 1);
 
 	switch (target) {
 	case SYNC_FOR_CPU:
-- 
2.29.2
preserve DMA offsets when using swiotlb
Hi all, this series makes NVMe happy when running with swiotlb. This caters to completely broken NVMe controllers that ignore the specification (hello to the biggest cloud provider on the planet!), to crappy SOCs that have addressing limitations, and to "secure" virtualization that forces bounce buffering to enhance the user experience. Or in other words, no one sane should hit it, but people do. It is basically a respin of the "SWIOTLB: Preserve swiotlb map offset when needed." series from Jianxiong Gao. It completely rewrites the swiotlb part so that the offset really is preserved and not just the offset into the swiotlb slot, and to do so it grew half a dozen patches to refactor swiotlb so that a mere mortal like me could actually understand it.
[PATCH 3/8] swiotlb: factor out a nr_slots helper
Factor out a helper to find the number of slots for a given size. Signed-off-by: Christoph Hellwig --- kernel/dma/swiotlb.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 838dbad10ab916..0c0b81799edbdb 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -194,6 +194,11 @@ static inline unsigned long io_tlb_offset(unsigned long val) return val & (IO_TLB_SEGSIZE - 1); } +static unsigned long nr_slots(u64 val) +{ + return ALIGN(val, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; +} + /* * Early SWIOTLB allocation may be too early to allow an architecture to * perform the desired operations. This function allows the architecture to @@ -493,20 +498,20 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr, tbl_dma_addr &= mask; - offset_slots = ALIGN(tbl_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; + offset_slots = nr_slots(tbl_dma_addr); /* * Carefully handle integer overflow which can occur when mask == ~0UL. */ max_slots = mask + 1 - ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT + ? nr_slots(mask + 1) : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT); /* * For mappings greater than or equal to a page, we limit the stride * (and hence alignment) to a page size. */ - nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; + nslots = nr_slots(alloc_size); if (alloc_size >= PAGE_SIZE) stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT)); else @@ -602,7 +607,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr, enum dma_data_direction dir, unsigned long attrs) { unsigned long flags; - int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; + int i, count, nslots = nr_slots(alloc_size); int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT; phys_addr_t orig_addr = io_tlb_orig_addr[index]; -- 2.29.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 4/8] swiotlb: clean up swiotlb_tbl_unmap_single
Remove a layer of pointless indentation, replace a hard to follow ternary expression with a plain if/else. Signed-off-by: Christoph Hellwig --- kernel/dma/swiotlb.c | 41 + 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 0c0b81799edbdb..79d5b236f25f10 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -626,28 +626,29 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr, * with slots below and above the pool being returned. */ spin_lock_irqsave(&io_tlb_lock, flags); - { - count = ((index + nslots) < ALIGN(index + 1, IO_TLB_SEGSIZE) ? -io_tlb_list[index + nslots] : 0); - /* -* Step 1: return the slots to the free list, merging the -* slots with superceeding slots -*/ - for (i = index + nslots - 1; i >= index; i--) { - io_tlb_list[i] = ++count; - io_tlb_orig_addr[i] = INVALID_PHYS_ADDR; - } - /* -* Step 2: merge the returned slots with the preceding slots, -* if available (non zero) -*/ - for (i = index - 1; -io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && -io_tlb_list[i]; i--) - io_tlb_list[i] = ++count; + if (index + nslots < ALIGN(index + 1, IO_TLB_SEGSIZE)) + count = io_tlb_list[index + nslots]; + else + count = 0; - io_tlb_used -= nslots; + /* +* Step 1: return the slots to the free list, merging the slots with +* superceeding slots +*/ + for (i = index + nslots - 1; i >= index; i--) { + io_tlb_list[i] = ++count; + io_tlb_orig_addr[i] = INVALID_PHYS_ADDR; } + + /* +* Step 2: merge the returned slots with the preceding slots, if +* available (non zero) +*/ + for (i = index - 1; +io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && io_tlb_list[i]; +i--) + io_tlb_list[i] = ++count; + io_tlb_used -= nslots; spin_unlock_irqrestore(&io_tlb_lock, flags); } -- 2.29.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/8] swiotlb: add a io_tlb_offset helper
Replace the very genericly named OFFSET macro with a little inline helper that hardcodes the alignment to the only value ever passed. Signed-off-by: Christoph Hellwig --- kernel/dma/swiotlb.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 7c42df6e61001d..838dbad10ab916 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -50,9 +50,6 @@ #define CREATE_TRACE_POINTS #include -#define OFFSET(val,align) ((unsigned long) \ - ( (val) & ( (align) - 1))) - #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT)) /* @@ -192,6 +189,11 @@ void swiotlb_print_info(void) bytes >> 20); } +static inline unsigned long io_tlb_offset(unsigned long val) +{ + return val & (IO_TLB_SEGSIZE - 1); +} + /* * Early SWIOTLB allocation may be too early to allow an architecture to * perform the desired operations. This function allows the architecture to @@ -241,7 +243,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose) __func__, alloc_size, PAGE_SIZE); for (i = 0; i < io_tlb_nslabs; i++) { - io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE); + io_tlb_list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i); io_tlb_orig_addr[i] = INVALID_PHYS_ADDR; } io_tlb_index = 0; @@ -375,7 +377,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs) goto cleanup4; for (i = 0; i < io_tlb_nslabs; i++) { - io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE); + io_tlb_list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i); io_tlb_orig_addr[i] = INVALID_PHYS_ADDR; } io_tlb_index = 0; @@ -546,7 +548,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr, for (i = index; i < (int) (index + nslots); i++) io_tlb_list[i] = 0; - for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && io_tlb_list[i]; i--) + for (i = index - 1; +io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && +io_tlb_list[i]; i--) io_tlb_list[i] = ++count; tlb_addr = io_tlb_start + (index << 
IO_TLB_SHIFT); @@ -632,7 +636,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr, * Step 2: merge the returned slots with the preceding slots, * if available (non zero) */ - for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--) + for (i = index - 1; +io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && +io_tlb_list[i]; i--) io_tlb_list[i] = ++count; io_tlb_used -= nslots; -- 2.29.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/8] driver core: add a min_align_mask field to struct device_dma_parameters
From: Jianxiong Gao

Some devices rely on the address offset in a page to function correctly (NVMe driver as an example). These devices may use a different page size than the Linux kernel. The address offset has to be preserved upon mapping, and in order to do so, we need to record the page_offset_mask first.

Signed-off-by: Jianxiong Gao
Signed-off-by: Christoph Hellwig
---
 include/linux/device.h      | 1 +
 include/linux/dma-mapping.h | 16
 2 files changed, 17 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 1779f90eeb4cb4..7960bf516dd7fe 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -291,6 +291,7 @@ struct device_dma_parameters {
 	 * sg limitations.
 	 */
 	unsigned int max_segment_size;
+	unsigned int min_align_mask;
 	unsigned long segment_boundary_mask;
 };

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2e49996a8f391a..9c26225754e719 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -500,6 +500,22 @@ static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask)
 	return -EIO;
 }

+static inline unsigned int dma_get_min_align_mask(struct device *dev)
+{
+	if (dev->dma_parms)
+		return dev->dma_parms->min_align_mask;
+	return 0;
+}
+
+static inline int dma_set_min_align_mask(struct device *dev,
+		unsigned int min_align_mask)
+{
+	if (WARN_ON_ONCE(!dev->dma_parms))
+		return -EIO;
+	dev->dma_parms->min_align_mask = min_align_mask;
+	return 0;
+}
+
 static inline int dma_get_cache_alignment(void)
 {
 #ifdef ARCH_DMA_MINALIGN
-- 
2.29.2
[PATCH v2 1/2] iommu/vt-d: Use Real PCI DMA device for IRTE
VMD retransmits child device MSI/X with the VMD endpoint's requester-id. In order to support direct interrupt remapping of VMD child devices, ensure that the IRTE is programmed with the VMD endpoint's requester-id using pci_real_dma_dev().

Signed-off-by: Jon Derrick
---
 drivers/iommu/intel/irq_remapping.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 685200a5cff0..1939e070eec8 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1276,7 +1276,8 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 		break;
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
+		set_msi_sid(irte,
+			    pci_real_dma_dev(msi_desc_to_pci_dev(info->desc)));
 		break;
 	default:
 		BUG_ON(1);
-- 
2.27.0
[PATCH v2 2/2] PCI: vmd: Disable MSI/X remapping when possible
VMD will retransmit child device MSI/X using its own MSI/X table and requester-id. This limits the number of MSI/X available to the whole child device domain to the number of VMD MSI/X interrupts. Some VMD devices have a mode where this remapping can be disabled, allowing child device interrupts to bypass processing with the VMD MSI/X domain interrupt handler and going straight the child device interrupt handler, allowing for better performance and scaling. The requester-id still gets changed to the VMD endpoint's requester-id, and the interrupt remapping handlers have been updated to properly set IRTE for child device interrupts to the VMD endpoint's context. Some VMD platforms have existing production BIOS which rely on MSI/X remapping and won't explicitly program the MSI/X remapping bit. This re-enables MSI/X remapping on unload. Signed-off-by: Jon Derrick --- drivers/pci/controller/vmd.c | 60 1 file changed, 48 insertions(+), 12 deletions(-) diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index 5e80f28f0119..a319ce49645b 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -59,6 +59,13 @@ enum vmd_features { * be used for MSI remapping */ VMD_FEAT_OFFSET_FIRST_VECTOR= (1 << 3), + + /* +* Device can bypass remapping MSI/X transactions into its MSI/X table, +* avoding the requirement of a VMD MSI domain for child device +* interrupt handling +*/ + VMD_FEAT_BYPASS_MSI_REMAP = (1 << 4), }; /* @@ -306,6 +313,15 @@ static struct msi_domain_info vmd_msi_domain_info = { .chip = &vmd_msi_controller, }; +static void vmd_enable_msi_remapping(struct vmd_dev *vmd, bool enable) +{ + u16 reg; + + pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG, ®); + reg = enable ? 
(reg & ~0x2) : (reg | 0x2); + pci_write_config_word(vmd->dev, PCI_REG_VMCONFIG, reg); +} + static int vmd_create_irq_domain(struct vmd_dev *vmd) { struct fwnode_handle *fn; @@ -325,6 +341,13 @@ static int vmd_create_irq_domain(struct vmd_dev *vmd) static void vmd_remove_irq_domain(struct vmd_dev *vmd) { + /* +* Some production BIOS won't enable remapping between soft reboots. +* Ensure remapping is restored before unloading the driver. +*/ + if (!vmd->msix_count) + vmd_enable_msi_remapping(vmd, true); + if (vmd->irq_domain) { struct fwnode_handle *fn = vmd->irq_domain->fwnode; @@ -679,15 +702,31 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) sd->node = pcibus_to_node(vmd->dev->bus); - ret = vmd_create_irq_domain(vmd); - if (ret) - return ret; - /* -* Override the irq domain bus token so the domain can be distinguished -* from a regular PCI/MSI domain. +* Currently MSI remapping must be enabled in guest passthrough mode +* due to some missing interrupt remapping plumbing. This is probably +* acceptable because the guest is usually CPU-limited and MSI +* remapping doesn't become a performance bottleneck. */ - irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI); + if (!(features & VMD_FEAT_BYPASS_MSI_REMAP) || offset[0] || offset[1]) { + ret = vmd_alloc_irqs(vmd); + if (ret) + return ret; + + vmd_enable_msi_remapping(vmd, true); + + ret = vmd_create_irq_domain(vmd); + if (ret) + return ret; + + /* +* Override the irq domain bus token so the domain can be +* distinguished from a regular PCI/MSI domain. 
+*/ + irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI); + } else { + vmd_enable_msi_remapping(vmd, false); + } pci_add_resource(&resources, &vmd->resources[0]); pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]); @@ -753,10 +792,6 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id) if (features & VMD_FEAT_OFFSET_FIRST_VECTOR) vmd->first_vec = 1; - err = vmd_alloc_irqs(vmd); - if (err) - return err; - spin_lock_init(&vmd->cfg_lock); pci_set_drvdata(dev, vmd); err = vmd_enable_domain(vmd, features); @@ -825,7 +860,8 @@ static const struct pci_device_id vmd_ids[] = { .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP,}, {PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_28C0), .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW | - VMD_FEAT_HAS_BUS_RESTRICTIONS,}, + VMD_FEAT_HAS_BUS_RESTRICTIONS | + VM
[PATCH v2 0/2] VMD MSI Remapping Bypass
The Intel Volume Management Device acts similarly to a PCI-to-PCI bridge in that it changes downstream devices' requester-ids to its own. As VMD supports PCIe devices, it has its own MSI/X table and transmits child device MSI/X by remapping child device MSI/X and handling them like a demultiplexer.

Some newer VMD devices (Icelake Server) have an option to bypass the VMD MSI/X remapping table. This allows for better performance scaling as the child device MSI/X won't be limited by VMD's MSI/X count and IRQ handler.

V1->V2:
Updated for 5.12-next
Moved IRQ allocation and remapping enable/disable to a more logical location

V1 patches 1-4 were already merged
V1, 5/6: https://patchwork.kernel.org/project/linux-pci/patch/20200728194945.14126-6-jonathan.derr...@intel.com/
V1, 6/6: https://patchwork.kernel.org/project/linux-pci/patch/20200728194945.14126-7-jonathan.derr...@intel.com/

Jon Derrick (2):
  iommu/vt-d: Use Real PCI DMA device for IRTE
  PCI: vmd: Disable MSI/X remapping when possible

 drivers/iommu/intel/irq_remapping.c | 3 +-
 drivers/pci/controller/vmd.c        | 60 +++-
 2 files changed, 50 insertions(+), 13 deletions(-)
-- 
2.27.0
RE: [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
From: Wei Liu Sent: Wednesday, February 3, 2021 7:05 AM > > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft > Hypervisor when Linux runs as the root partition. Implement an IRQ > domain to handle mapping and unmapping of IO-APIC interrupts. > > Signed-off-by: Wei Liu > --- > v6: > 1. Simplify code due to changes in a previous patch. > --- > arch/x86/hyperv/irqdomain.c | 25 + > arch/x86/include/asm/mshyperv.h | 4 + > drivers/iommu/hyperv-iommu.c| 177 +++- > 3 files changed, 203 insertions(+), 3 deletions(-) > Reviewed-by: Michael Kelley
Re: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
On Thu, Feb 04, 2021 at 04:41:47PM +, Michael Kelley wrote: > From: Wei Liu Sent: Wednesday, February 3, 2021 4:47 AM [...] > > > > + > > > > + if (level) > > > > + intr_desc->trigger_mode = > > > > HV_INTERRUPT_TRIGGER_MODE_LEVEL; > > > > + else > > > > + intr_desc->trigger_mode = > > > > HV_INTERRUPT_TRIGGER_MODE_EDGE; > > > > + > > > > + __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask); > > > > + > > > > + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, > > > > input, > > output) & > > > > +HV_HYPERCALL_RESULT_MASK; > > > > + local_irq_restore(flags); > > > > + > > > > + *entry = output->interrupt_entry; > > > > + > > > > + return status; > > > > > > As a cross-check, I was comparing this code against > > > hv_map_msi_interrupt(). They are > > > mostly parallel, though some of the assignments are done in a different > > > order. It's a nit, > > > but making them as parallel as possible would be nice. :-) > > > > > > > Indeed. I will see about factoring out a function. > > If factoring out a separate helper function is clumsy, just having the > parallel code > in the two functions be as similar as possible makes it easier to see what's > the > same and what's different. > No. It is not clumsy at all. I've done it in the newly posted v6. I was baffled why I wrote hv_unmap_interrupt helper to be used by both hv_unmap_ioapic_interrupt and hv_unmap_msi_interrupt in the previous patch, but didn't write a hv_map_interrupt. Maybe I didn't have enough coffee that day. :-/ Thanks for pointing out that issue. It definitely helped improve the quality of this series. Wei.
RE: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
From: Wei Liu Sent: Wednesday, February 3, 2021 4:47 AM > > On Wed, Jan 27, 2021 at 05:47:08AM +, Michael Kelley wrote: > > From: Wei Liu Sent: Wednesday, January 20, 2021 4:01 AM > > > > > > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft > > > Hypervisor when Linux runs as the root partition. Implement an IRQ > > > domain to handle mapping and unmapping of IO-APIC interrupts. > > > > > > Signed-off-by: Wei Liu > > > --- > > > arch/x86/hyperv/irqdomain.c | 54 ++ > > > arch/x86/include/asm/mshyperv.h | 4 + > > > drivers/iommu/hyperv-iommu.c| 179 +++- > > > 3 files changed, 233 insertions(+), 4 deletions(-) > > > > > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c > > > index 19637cd60231..8e2b4e478b70 100644 > > > --- a/arch/x86/hyperv/irqdomain.c > > > +++ b/arch/x86/hyperv/irqdomain.c > > > @@ -330,3 +330,57 @@ struct irq_domain * __init > > > hv_create_pci_msi_domain(void) > > > } > > > > > > #endif /* CONFIG_PCI_MSI */ > > > + > > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry > > > *entry) > > > +{ > > > + union hv_device_id device_id; > > > + > > > + device_id.as_uint64 = 0; > > > + device_id.device_type = HV_DEVICE_TYPE_IOAPIC; > > > + device_id.ioapic.ioapic_id = (u8)ioapic_id; > > > + > > > + return hv_unmap_interrupt(device_id.as_uint64, entry) & > HV_HYPERCALL_RESULT_MASK; > > > > The masking is already done in hv_unmap_interrupt. > > Fixed. 
> > > > > > +} > > > +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt); > > > + > > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int > > > vector, > > > + struct hv_interrupt_entry *entry) > > > +{ > > > + unsigned long flags; > > > + struct hv_input_map_device_interrupt *input; > > > + struct hv_output_map_device_interrupt *output; > > > + union hv_device_id device_id; > > > + struct hv_device_interrupt_descriptor *intr_desc; > > > + u16 status; > > > + > > > + device_id.as_uint64 = 0; > > > + device_id.device_type = HV_DEVICE_TYPE_IOAPIC; > > > + device_id.ioapic.ioapic_id = (u8)ioapic_id; > > > + > > > + local_irq_save(flags); > > > + input = *this_cpu_ptr(hyperv_pcpu_input_arg); > > > + output = *this_cpu_ptr(hyperv_pcpu_output_arg); > > > + memset(input, 0, sizeof(*input)); > > > + intr_desc = &input->interrupt_descriptor; > > > + input->partition_id = hv_current_partition_id; > > > + input->device_id = device_id.as_uint64; > > > + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED; > > > + intr_desc->target.vector = vector; > > > + intr_desc->vector_count = 1; > > > + > > > + if (level) > > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL; > > > + else > > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE; > > > + > > > + __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask); > > > + > > > + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, > output) & > > > + HV_HYPERCALL_RESULT_MASK; > > > + local_irq_restore(flags); > > > + > > > + *entry = output->interrupt_entry; > > > + > > > + return status; > > > > As a cross-check, I was comparing this code against hv_map_msi_interrupt(). > > They are > > mostly parallel, though some of the assignments are done in a different > > order. It's a nit, > > but making them as parallel as possible would be nice. :-) > > > > Indeed. I will see about factoring out a function. 
If factoring out a separate helper function is clumsy, just having the parallel code in the two functions be as similar as possible makes it easier to see what's the same and what's different. > > > Same 64 vCPU comment applies here as well. > > > > This is changed to use vpset instead. Took me a bit of time to get it > working because document is a bit lacking. > > > > > > +} > > > +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt); > > > diff --git a/arch/x86/include/asm/mshyperv.h > > > b/arch/x86/include/asm/mshyperv.h > > > index ccc849e25d5e..345d7c6f8c37 100644 > > > --- a/arch/x86/include/asm/mshyperv.h > > > +++ b/arch/x86/include/asm/mshyperv.h > > > @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union > > > hv_msi_entry *msi_entry, > > > > > > struct irq_domain *hv_create_pci_msi_domain(void); > > > > > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int > > > vector, > > > + struct hv_interrupt_entry *entry); > > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry > > > *entry); > > > + > > > #else /* CONFIG_HYPERV */ > > > static inline void hyperv_init(void) {} > > > static inline void hyperv_setup_mmu_ops(void) {} > > > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c > > > index b7db6024e65c..6d35e4c303c6 100644 > > > --- a/drivers/iommu/hyperv-iommu.c > > > +++ b/drivers/iommu/hyperv-iommu.c > > > @@ -116,30 +116,43 @@ static const struct irq_domain_ops > > > hyperv_ir_domain_o
Re: [RFC PATCH 3/3] iommu/arm-smmu-v3: Add debug interfaces for SMMUv3
On 2021-01-29 09:06, Zhou Wang wrote: This patch adds debug interfaces for SMMUv3 driver in sysfs. It adds debug related files under /sys/kernel/debug/iommu/smmuv3. User should firstly set device and pasid to pci_dev and pasid by: (currently only support PCI device) echo <seg>:<bus>:<dev>.<fn> > /sys/kernel/debug/iommu/smmuv3/pci_dev echo <pasid> > /sys/kernel/debug/iommu/smmuv3/pasid Then value in cd and ste can be got by: cat /sys/kernel/debug/iommu/smmuv3/ste cat /sys/kernel/debug/iommu/smmuv3/cd S1 and S2 page tables can be got by: cat /sys/kernel/debug/iommu/smmuv3/pt_dump_s1 cat /sys/kernel/debug/iommu/smmuv3/pt_dump_s2 For ste, cd and page table, related device and pasid are set in pci_dev and pasid files as above. Signed-off-by: Zhou Wang --- drivers/iommu/Kconfig | 11 + drivers/iommu/arm/arm-smmu-v3/Makefile | 1 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 + drivers/iommu/arm/arm-smmu-v3/debugfs.c | 398 5 files changed, 421 insertions(+) create mode 100644 drivers/iommu/arm/arm-smmu-v3/debugfs.c diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 192ef8f..4822c88 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -325,6 +325,17 @@ config ARM_SMMU_V3_SVA Say Y here if your system supports SVA extensions such as PCIe PASID and PRI. +config ARM_SMMU_V3_DEBUGFS + bool "Export ARM SMMUv3 internals in Debugfs" + depends on ARM_SMMU_V3 && IOMMU_DEBUGFS + help + DO NOT ENABLE THIS OPTION UNLESS YOU REALLY KNOW WHAT YOU ARE DOING! + + Expose ARM SMMUv3 internals in Debugfs. + + This option is -NOT- intended for production environments, and should + only be enabled for debugging ARM SMMUv3.
+ config S390_IOMMU def_bool y if S390 && PCI depends on S390 && PCI diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile index 54feb1ec..55b411a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/Makefile +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o arm_smmu_v3-objs-y += arm-smmu-v3.o arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o arm_smmu_v3-objs := $(arm_smmu_v3-objs-y) +obj-$(CONFIG_ARM_SMMU_V3_DEBUGFS) += debugfs.o diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index b65f63e2..aac7fdb 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3602,6 +3602,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev) return ret; } + arm_smmu_debugfs_init(); + return arm_smmu_set_bus_ops(&arm_smmu_ops); } @@ -3610,6 +3612,7 @@ static int arm_smmu_device_remove(struct platform_device *pdev) struct arm_smmu_device *smmu = platform_get_drvdata(pdev); arm_smmu_set_bus_ops(NULL); + arm_smmu_debugfs_uninit(); iommu_device_unregister(&smmu->iommu); iommu_device_sysfs_remove(&smmu->iommu); arm_smmu_device_disable(smmu); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 3e7af39..31c4580 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -752,4 +752,12 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle) static inline void arm_smmu_sva_notifier_synchronize(void) {} #endif /* CONFIG_ARM_SMMU_V3_SVA */ + +#ifdef CONFIG_ARM_SMMU_V3_DEBUGFS +void arm_smmu_debugfs_init(void); +void arm_smmu_debugfs_uninit(void); +#else +static inline void arm_smmu_debugfs_init(void) {} +static inline void arm_smmu_debugfs_uninit(void) {} +#endif /* CONFIG_ARM_SMMU_V3_DEBUGFS */ #endif /* _ARM_SMMU_V3_H */ diff --git 
a/drivers/iommu/arm/arm-smmu-v3/debugfs.c b/drivers/iommu/arm/arm-smmu-v3/debugfs.c new file mode 100644 index 000..1af219a --- /dev/null +++ b/drivers/iommu/arm/arm-smmu-v3/debugfs.c @@ -0,0 +1,398 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include "arm-smmu-v3.h" +#include "../../io-pgtable-arm.h" + +#undef pr_fmt +#define pr_fmt(fmt)"SMMUv3 debug: " fmt + +#define NAME_BUF_LEN 32 + +static struct dentry *arm_smmu_debug; +static char dump_pci_dev[NAME_BUF_LEN]; +static u32 pasid; +static struct mutex lock; + +static ssize_t master_pdev_read(struct file *filp, char __user *buf, + size_t count, loff_t *pos) +{ + char pdev_name[NAME_BUF_LEN]; + char name[NAME_BUF_LEN]; + int ret; + + mutex_lock(&lock); + strncpy(pdev_name, dump_pci_dev, NAME_BUF_LEN); + mutex_unlock(&lock); + + if (!strlen(pdev_name)) { + pr_err("Please set pci_dev firstly\n"); Even if it's "just debugfs", printing userspace-triggered errors to
Re: [PATCH][next][V2] iommu/mediatek: Fix unsigned domid comparison with less than zero
On Thu, Feb 04, 2021 at 03:00:01PM +, Colin King wrote: > From: Colin Ian King > > Currently the check for domid < 0 is always false because domid > is unsigned. Fix this by casting domid to an int before making > the comparison. > > Addresses-Coverity: ("Unsigned comparison against 0") > Signed-off-by: Colin Ian King > --- > > V2: cast domid rather than making it an int. Replace L with : in > the commit message. > > --- > drivers/iommu/mtk_iommu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c > index 0ad14a7604b1..1f262621ef19 100644 > --- a/drivers/iommu/mtk_iommu.c > +++ b/drivers/iommu/mtk_iommu.c > @@ -645,7 +645,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev, > struct iommu_resv_region *region; > int prot = IOMMU_WRITE | IOMMU_READ; > > - if (domid < 0) > + if ((int)domid < 0) > return; > curdom = data->plat_data->iova_region + domid; > for (i = 0; i < data->plat_data->iova_region_nr; i++) { Thanks, Colin. Acked-by: Will Deacon Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH][next][V2] iommu/mediatek: Fix unsigned domid comparison with less than zero
From: Colin Ian King Currently the check for domid < 0 is always false because domid is unsigned. Fix this by casting domid to an int before making the comparison. Addresses-Coverity: ("Unsigned comparison against 0") Signed-off-by: Colin Ian King --- V2: cast domid rather than making it an int. Replace L with : in the commit message. --- drivers/iommu/mtk_iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 0ad14a7604b1..1f262621ef19 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -645,7 +645,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev, struct iommu_resv_region *region; int prot = IOMMU_WRITE | IOMMU_READ; - if (domid < 0) + if ((int)domid < 0) return; curdom = data->plat_data->iova_region + domid; for (i = 0; i < data->plat_data->iova_region_nr; i++) { -- 2.29.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/7] [PULL REQUEST] iommu/vt-d: Update for v5.12
On Thu, Feb 04, 2021 at 07:52:29PM +0800, Lu Baolu wrote: > Hi Joerg, > > I just received some internal comments on the last patch > > [PATCH 7/7] iommu/vt-d: Apply SATC policy > > We need some extra work on it and probably re-target it to v5.13. > > Can you please only consider patch 1 ~ 6 for v5.12? Applied patches 1-6, thanks Baolu. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
On Wed, Feb 03, 2021 at 03:04:35PM +, Wei Liu wrote: > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft > Hypervisor when Linux runs as the root partition. Implement an IRQ > domain to handle mapping and unmapping of IO-APIC interrupts. > > Signed-off-by: Wei Liu Acked-by: Joerg Roedel > --- > v6: > 1. Simplify code due to changes in a previous patch. > --- > arch/x86/hyperv/irqdomain.c | 25 + > arch/x86/include/asm/mshyperv.h | 4 + > drivers/iommu/hyperv-iommu.c| 177 +++- > 3 files changed, 203 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c > index 117f17e8c88a..0cabc9aece38 100644 > --- a/arch/x86/hyperv/irqdomain.c > +++ b/arch/x86/hyperv/irqdomain.c > @@ -360,3 +360,28 @@ struct irq_domain * __init hv_create_pci_msi_domain(void) > } > > #endif /* CONFIG_PCI_MSI */ > + > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry > *entry) > +{ > + union hv_device_id device_id; > + > + device_id.as_uint64 = 0; > + device_id.device_type = HV_DEVICE_TYPE_IOAPIC; > + device_id.ioapic.ioapic_id = (u8)ioapic_id; > + > + return hv_unmap_interrupt(device_id.as_uint64, entry); > +} > +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt); > + > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int cpu, int vector, > + struct hv_interrupt_entry *entry) > +{ > + union hv_device_id device_id; > + > + device_id.as_uint64 = 0; > + device_id.device_type = HV_DEVICE_TYPE_IOAPIC; > + device_id.ioapic.ioapic_id = (u8)ioapic_id; > + > + return hv_map_interrupt(device_id, level, cpu, vector, entry); > +} > +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt); > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h > index ccc849e25d5e..345d7c6f8c37 100644 > --- a/arch/x86/include/asm/mshyperv.h > +++ b/arch/x86/include/asm/mshyperv.h > @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union > hv_msi_entry *msi_entry, > > struct irq_domain 
*hv_create_pci_msi_domain(void); > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector, > + struct hv_interrupt_entry *entry); > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry > *entry); > + > #else /* CONFIG_HYPERV */ > static inline void hyperv_init(void) {} > static inline void hyperv_setup_mmu_ops(void) {} > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c > index 1d21a0b5f724..e285a220c913 100644 > --- a/drivers/iommu/hyperv-iommu.c > +++ b/drivers/iommu/hyperv-iommu.c > @@ -20,6 +20,7 @@ > #include > #include > #include > +#include > > #include "irq_remapping.h" > > @@ -115,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops > = { > .free = hyperv_irq_remapping_free, > }; > > +static const struct irq_domain_ops hyperv_root_ir_domain_ops; > static int __init hyperv_prepare_irq_remapping(void) > { > struct fwnode_handle *fn; > int i; > + const char *name; > + const struct irq_domain_ops *ops; > > if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) || > x86_init.hyper.msi_ext_dest_id() || > !x2apic_supported()) > return -ENODEV; > > - fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0); > + if (hv_root_partition) { > + name = "HYPERV-ROOT-IR"; > + ops = &hyperv_root_ir_domain_ops; > + } else { > + name = "HYPERV-IR"; > + ops = &hyperv_ir_domain_ops; > + } > + > + fn = irq_domain_alloc_named_id_fwnode(name, 0); > if (!fn) > return -ENOMEM; > > ioapic_ir_domain = > irq_domain_create_hierarchy(arch_get_ir_parent_domain(), > - 0, IOAPIC_REMAPPING_ENTRY, fn, > - &hyperv_ir_domain_ops, NULL); > + 0, IOAPIC_REMAPPING_ENTRY, fn, ops, NULL); > > if (!ioapic_ir_domain) { > irq_domain_free_fwnode(fn); > return -ENOMEM; > } > > + if (hv_root_partition) > + return 0; /* The rest is only relevant to guests */ > + > /* >* Hyper-V doesn't provide irq remapping function for >* IO-APIC and so IO-APIC only accepts 8-bit APIC ID. 
> @@ -166,4 +180,161 @@ struct irq_remap_ops hyperv_irq_remap_ops = { > .enable = hyperv_enable_irq_remapping, > }; > > +/* IRQ remapping domain when Linux runs as the root partition */ > +struct hyperv_root_ir_data { > + u8 ioapic_id; > + bool is_level; > + struct hv_interrupt_entry entry; > +}; > + > +static void > +hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg > *msg) > +{ > + u64 status; > + u32 vector; > + struct irq_cfg *cfg; > + int ioapic_id; > + struct cpumask *affinity; > + int cpu; > +
Re: [PATCH][next] iommu/mediatek: Fix unsigned domid comparison with less than zero
On Thu, Feb 04, 2021 at 09:25:58AM +, Will Deacon wrote: > On Wed, Feb 03, 2021 at 01:59:36PM +, Colin King wrote: > > From: Colin Ian King > > > > Currently the check for domid < 0 is always false because domid > > is unsigned. Fix this by making it signed. > > > > Addresses-CoverityL ("Unsigned comparison against 0") > > Typo here ('L' instead of ':') > > > Fixes: ab1d5281a62b ("iommu/mediatek: Add iova reserved function") > > Signed-off-by: Colin Ian King > > --- > > drivers/iommu/mtk_iommu.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c > > index 0ad14a7604b1..823d719945b2 100644 > > --- a/drivers/iommu/mtk_iommu.c > > +++ b/drivers/iommu/mtk_iommu.c > > @@ -640,7 +640,7 @@ static void mtk_iommu_get_resv_regions(struct device > > *dev, > >struct list_head *head) > > { > > struct mtk_iommu_data *data = dev_iommu_priv_get(dev); > > - unsigned int domid = mtk_iommu_get_domain_id(dev, data->plat_data), i; > > + int domid = mtk_iommu_get_domain_id(dev, data->plat_data), i; > > Not sure if it's intentional, but this also makes 'i' signed. It probably > should remain 'unsigned' to match 'iova_region_nr' in > 'struct mtk_iommu_plat_data'. Yes, 'i' should stay unsigned. Colin, can you please fix that up and re-send? Thanks, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 11/12] platform-msi: Add platform check for subdevice irq domain
Hi Jason, On 2021/2/4 20:14, Jason Gunthorpe wrote: On Wed, Feb 03, 2021 at 12:56:44PM -0800, Megha Dey wrote: +bool arch_support_pci_device_ims(struct pci_dev *pdev) +{ Consistent language please, we are not using IMS elsewhere, this feature is called device_msi in Linux. Thanks for pointing this out. I will correct it. Jason Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 0/3] iommu/arm-smmu-v3: Add debug interfaces for SMMUv3
On 2021/2/4 5:38, Will Deacon wrote: > On Wed, Feb 03, 2021 at 11:15:18AM +0800, Zhou Wang wrote: >> On 2021/1/29 17:06, Zhou Wang wrote: >>> This RFC series is the followed patch of this discussion: >>> https://www.spinics.net/lists/arm-kernel/msg866187.html. >>> >>> Currently there is no debug interface about SMMUv3 driver, which makes it >>> not convenient when we want to dump some information, like the value of >>> CD/STE, S1/S2 page table, SMMU registers or cmd/event/pri queues. >>> >>> This series tries to add support of dumping CD/STE and page table. The >>> interface design is that user sets device/pasid firstly by sysfs files >>> and then read related sysfs file to get information: >>> >>> (currently only support PCI device) >>> echo <seg>:<bus>:<dev>.<fn> > /sys/kernel/debug/iommu/smmuv3/pci_dev >>> echo <pasid> > /sys/kernel/debug/iommu/smmuv3/pasid >>> >>> Then value in CD and STE can be got by: >>> cat /sys/kernel/debug/iommu/smmuv3/ste >>> cat /sys/kernel/debug/iommu/smmuv3/cd >>> >>> S1 and S2 page tables can be got by: >>> cat /sys/kernel/debug/iommu/smmuv3/pt_dump_s1 >>> cat /sys/kernel/debug/iommu/smmuv3/pt_dump_s2 >>> >>> For STE, CD and page table, related device and pasid are set in pci_dev >>> and pasid files as above. >>> >>> First and second patch export some help functions or macros in arm-smmu-v3 >>> and io-pgtable-arm codes, so we can reuse them in debugfs.c. As a RFC, this >>> series does not go further to dump SMMU registers and cmd/event/pri queues. >>> I am not sure this series is in the right way, so let's post it out and >>> have a >>> discussion. Looking forward to any feedback. 
>>> >>> Zhou Wang (3): >>> iommu/arm-smmu-v3: Export cd/ste get functions >>> iommu/io-pgtable: Export page table walk needed functions and macros >>> iommu/arm-smmu-v3: Add debug interfaces for SMMUv3 >>> >>> drivers/iommu/Kconfig | 11 + >>> drivers/iommu/arm/arm-smmu-v3/Makefile | 1 + >>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 +- >>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 10 + >>> drivers/iommu/arm/arm-smmu-v3/debugfs.c | 398 >>> >>> drivers/iommu/io-pgtable-arm.c | 47 +--- >>> drivers/iommu/io-pgtable-arm.h | 43 +++ >>> 7 files changed, 475 insertions(+), 45 deletions(-) >>> create mode 100644 drivers/iommu/arm/arm-smmu-v3/debugfs.c >>> >> >> Any comments about this series? > > Truthfully, I don't really see the use in dumping the state of the SMMU > data structures. They're not especially dynamic, and there are higher level > ways to determine how devices map to groups etc. Here the aim is not to find the device/group maps, but to dump the CD/STE values. Of course, we can learn them from reading the code; however, we want to dump the hardware configuration quickly and directly when debugging a hardware-related problem. It is a real pain when your hardware guy asks you to do this :) > > However, I can see some utility in dumping the page-tables. We have that > functionality for the CPU side via /sys/kernel/debug/kernel_page_tables, > so something similar in the io-pgtable code could be quite neat. In > particular, the logic to expose things in debugfs and drive the dumping > could be agnostic of the page-table format, while the formats themselves Do you mean different page-table formats, like 64_s1, 64_s2, 32_s1, 32_s2? It seems that in the io-pgtable code we only tell them apart in arm_lpae_prot_to_pte, and currently in this RFC patch, we do not print attributes. Thanks, Zhou > could implement optional callback(s) to return the data. > > Will > > . > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 11/12] platform-msi: Add platform check for subdevice irq domain
On Wed, Feb 03, 2021 at 12:56:44PM -0800, Megha Dey wrote: > +bool arch_support_pci_device_ims(struct pci_dev *pdev) > +{ Consistent language please, we are not using IMS elsewhere, this feature is called device_msi in Linux. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/7] [PULL REQUEST] iommu/vt-d: Update for v5.12
Hi Joerg, I just received some internal comments on the last patch [PATCH 7/7] iommu/vt-d: Apply SATC policy We need some extra work on it and probably re-target it to v5.13. Can you please only consider patch 1 ~ 6 for v5.12? Sorry for the inconvenience. Best regards, baolu On 2021/2/4 9:43, Lu Baolu wrote: Hi Joerg, The patches queued in this series is for v5.12. It includes: - Audit capability consistency among different IOMMUs - Add SATC reporting structure support - Add iotlb_sync_map callback support - Misc cleanup Please consider them for v5.12. Best regards, Lu Baolu Bjorn Helgaas (1): iommu/vt-d: Fix 'physical' typos Kyung Min Park (2): iommu/vt-d: Audit IOMMU Capabilities and add helper functions iommu/vt-d: Move capability check code to cap_audit files Lu Baolu (1): iommu/vt-d: Add iotlb_sync_map callback Yian Chen (3): iommu/vt-d: Add new enum value and structure for SATC iommu/vt-d: Parse SATC reporting structure iommu/vt-d: Apply SATC policy drivers/iommu/intel/Makefile| 2 +- drivers/iommu/intel/cap_audit.c | 205 + drivers/iommu/intel/cap_audit.h | 130 +++ drivers/iommu/intel/dmar.c | 8 + drivers/iommu/intel/iommu.c | 337 +++- drivers/iommu/intel/irq_remapping.c | 8 + include/acpi/actbl1.h | 11 +- include/linux/dmar.h| 2 + include/linux/intel-iommu.h | 41 ++-- 9 files changed, 615 insertions(+), 129 deletions(-) create mode 100644 drivers/iommu/intel/cap_audit.c create mode 100644 drivers/iommu/intel/cap_audit.h ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH RFC v1 2/6] swiotlb: convert variables to arrays
On 2021-02-04 07:29, Christoph Hellwig wrote: On Wed, Feb 03, 2021 at 03:37:05PM -0800, Dongli Zhang wrote: This patch converts several swiotlb related variables to arrays, in order to maintain stat/status for different swiotlb buffers. Here are variables involved: - io_tlb_start and io_tlb_end - io_tlb_nslabs and io_tlb_used - io_tlb_list - io_tlb_index - max_segment - io_tlb_orig_addr - no_iotlb_memory There is no functional change and this is to prepare to enable 64-bit swiotlb. Claire Chang (on Cc) already posted a patch like this a month ago, which looks much better because it actually uses a struct instead of all the random variables. Indeed, I skimmed the cover letter and immediately thought that this whole thing is just the restricted DMA pool concept[1] again, only from a slightly different angle. Robin. [1] https://lore.kernel.org/linux-iommu/20210106034124.30560-1-tien...@chromium.org/ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()
Hi Nicolin, A regression was detected by kernelci.org in IGT's drm_read tests on mainline, it was first seen on 17th December 2020. You can find some details here: https://kernelci.org/test/case/id/600b82dc1e3208f123d3dffc/ Then an automated bisection was run and it landed on this patch (v5.10-rc3-4-g25938c73cd79 on mainline). Normally, an email is generated automatically but I had to start this one by hand as there were issues getting it to complete. You can see the failing test cases with this patch: https://lava.collabora.co.uk/results/3126405/0_igt-kms-tegra Some errors are seen around this point in the log: https://lava.collabora.co.uk/scheduler/job/3126405#L1005 [3.029729] tegra-mc 70019000.memory-controller: display0a: read @0xfe00: EMEM address decode error (SMMU translation error [--S]) [3.042058] tegra-mc 70019000.memory-controller: display0a: read @0xfe00: Page fault (SMMU translation error [--S]) Here's the same test passing with this patch reverted: https://lava.collabora.co.uk/results/3126570/0_igt-kms-tegra For completeness, you can see all the test jobs run by the automated bisection here: https://lava.collabora.co.uk/scheduler/device_type/tegra124-nyan-big?dt_length=25&dt_search=bisection-gtucker-12#dt_ Please let us know if you need any help debugging this issue or to try a fix on this platform. Best wishes, Guillaume On 25/11/2020 10:10, Nicolin Chen wrote: > The bus_set_iommu() in tegra_smmu_probe() enumerates all clients > to call in tegra_smmu_probe_device() where each client searches > its DT node for smmu pointer and swgroup ID, so as to configure > an fwspec. But this requires a valid smmu pointer even before mc > and smmu drivers are probed. So in tegra_smmu_probe() we added a > line of code to fill mc->smmu, marking "a bit of a hack". > > This works for most of clients in the DTB, however, doesn't work > for a client that doesn't exist in DTB, a PCI device for example. 
> > Actually, if we return ERR_PTR(-ENODEV) in ->probe_device() when > it's called from bus_set_iommu(), iommu core will let everything > carry on. Then when a client gets probed, of_iommu_configure() in > iommu core will search DTB for swgroup ID and call ->of_xlate() > to prepare an fwspec, similar to tegra_smmu_probe_device() and > tegra_smmu_configure(). Then it'll call tegra_smmu_probe_device() > again, and this time we shall return smmu->iommu pointer properly. > > So we can get rid of tegra_smmu_find() and tegra_smmu_configure() > along with DT polling code by letting the iommu core handle every > thing, except a problem that we search iommus property in DTB not > only for swgroup ID but also for mc node to get mc->smmu pointer > to call dev_iommu_priv_set() and return the smmu->iommu pointer. > So we'll need to find another way to get smmu pointer. > > Referencing the implementation of sun50i-iommu driver, of_xlate() > has client's dev pointer, mc node and swgroup ID. This means that > we can call dev_iommu_priv_set() in of_xlate() instead, so we can > simply get smmu pointer in ->probe_device(). > > This patch reworks tegra_smmu_probe_device() by: > 1) Removing mc->smmu hack in tegra_smmu_probe() so as to return >ERR_PTR(-ENODEV) in tegra_smmu_probe_device() during stage of >tegra_smmu_probe/tegra_mc_probe(). > 2) Moving dev_iommu_priv_set() to of_xlate() so we can get smmu >pointer in tegra_smmu_probe_device() to replace DTB polling. > 3) Removing tegra_smmu_configure() accordingly since iommu core >takes care of it. 
> > This also fixes a problem that previously we could add clients to > iommu groups before iommu core initializes its default domain: > ubuntu@jetson:~$ dmesg | grep iommu > platform 5000.host1x: Adding to iommu group 1 > platform 5700.gpu: Adding to iommu group 2 > iommu: Default domain type: Translated > platform 5420.dc: Adding to iommu group 3 > platform 5424.dc: Adding to iommu group 3 > platform 5434.vic: Adding to iommu group 4 > > Though it works fine with IOMMU_DOMAIN_UNMANAGED, but will have > warnings if switching to IOMMU_DOMAIN_DMA: > iommu: Failed to allocate default IOMMU domain of type 0 for >group (null) - Falling back to IOMMU_DOMAIN_DMA > iommu: Failed to allocate default IOMMU domain of type 0 for >group (null) - Falling back to IOMMU_DOMAIN_DMA > > Now, bypassing the first probe_device() call from bus_set_iommu() > fixes the sequence: > ubuntu@jetson:~$ dmesg | grep iommu > iommu: Default domain type: Translated > tegra-host1x 5000.host1x: Adding to iommu group 0 > tegra-dc 5420.dc: Adding to iommu group 1 > tegra-dc 5424.dc: Adding to iommu group 1 > tegra-vic 5434.vic: Adding to iommu group 2 > nouveau 5700.gpu: Adding to iommu group 3 > > Note that dmesg log above is testing with IOMMU_DOMAIN_UNMANAGED. > > Reviewed-by: Dmitry Osipenko > Tested-by: Dmitry Osipenko
Re: [PATCH][next] iommu/mediatek: Fix unsigned domid comparison with less than zero
On Wed, Feb 03, 2021 at 01:59:36PM +, Colin King wrote: > From: Colin Ian King > > Currently the check for domid < 0 is always false because domid > is unsigned. Fix this by making it signed. > > Addresses-CoverityL ("Unsigned comparison against 0") Typo here ('L' instead of ':') > Fixes: ab1d5281a62b ("iommu/mediatek: Add iova reserved function") > Signed-off-by: Colin Ian King > --- > drivers/iommu/mtk_iommu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c > index 0ad14a7604b1..823d719945b2 100644 > --- a/drivers/iommu/mtk_iommu.c > +++ b/drivers/iommu/mtk_iommu.c > @@ -640,7 +640,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev, > struct list_head *head) > { > struct mtk_iommu_data *data = dev_iommu_priv_get(dev); > - unsigned int domid = mtk_iommu_get_domain_id(dev, data->plat_data), i; > + int domid = mtk_iommu_get_domain_id(dev, data->plat_data), i; Not sure if it's intentional, but this also makes 'i' signed. It probably should remain 'unsigned' to match 'iova_region_nr' in 'struct mtk_iommu_plat_data'. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH RFC v1 5/6] xen-swiotlb: convert variables to arrays
So one thing that has been on my mind for a while: I'd really like to kill the separate dma ops in Xen swiotlb. If we compare xen-swiotlb to swiotlb, the main differences seem to be: - additional reasons to bounce I/O vs the plain DMA capable - the possibility to do a hypercall on arm/arm64 - an extra translation layer before doing the phys_to_dma and vice versa - a special memory allocator I wonder if, in between a few jump labels or other no-overhead enablement options and possibly better use of the dma_range_map, we could kill off most of swiotlb-xen instead of maintaining all this code duplication? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 7/7] media: uvcvideo: Use dma_alloc_noncontiguous API
Tue, 2 Feb 2021 10:51:10 +0100 > From: Ricardo Ribalda > > On architectures where there is no coherent caching, such as ARM, use the > dma_alloc_noncontiguous API and manually handle the cache flushing using > dma_sync_sgtable(). > > With this patch on the affected architectures we can measure up to 20x > performance improvement in uvc_video_copy_data_work(). > > Eg: aarch64 with an external usb camera > > NON_CONTIGUOUS > frames: 999 > packets: 999 > empty: 0 (0 %) > errors: 0 > invalid: 0 > pts: 0 early, 0 initial, 999 ok > scr: 0 count ok, 0 diff ok > sof: 2048 <= sof <= 0, freq 0.000 kHz > bytes 67034480 : duration 33303 > FPS: 29.99 > URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS) > header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS) > latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS) > decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS) > raw decode speed: 9.931 Gbits/s > raw URB handling speed: 1.025 Gbits/s > throughput: 16.102 Mbits/s > URB decode CPU usage 0.162600 % > > COHERENT > frames: 999 > packets: 999 > empty: 0 (0 %) > errors: 0 > invalid: 0 > pts: 0 early, 0 initial, 999 ok > scr: 0 count ok, 0 diff ok > sof: 2048 <= sof <= 0, freq 0.000 kHz > bytes 54683536 : duration 33302 > FPS: 29.99 > URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS) > header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS) > latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS) > decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max > (uS) > raw decode speed: 365.470 Mbits/s > raw URB handling speed: 295.986 Mbits/s > throughput: 13.136 Mbits/s > URB decode CPU usage 3.594500 % > > Signed-off-by: Ricardo Ribalda > Signed-off-by: Christoph Hellwig > --- > drivers/media/usb/uvc/uvc_video.c | 79 ++- > drivers/media/usb/uvc/uvcvideo.h | 4 +- > 2 files changed, 60 insertions(+), 23 
deletions(-) > > diff --git a/drivers/media/usb/uvc/uvc_video.c > b/drivers/media/usb/uvc/uvc_video.c > index a6a441d92b9488..0a7d287dc41528 100644 > --- a/drivers/media/usb/uvc/uvc_video.c > +++ b/drivers/media/usb/uvc/uvc_video.c > @@ -6,11 +6,13 @@ > * Laurent Pinchart (laurent.pinch...@ideasonboard.com) > */ > > +#include > #include > #include > #include > #include > #include > +#include > #include > #include > #include > @@ -1097,6 +1099,26 @@ static int uvc_video_decode_start(struct uvc_streaming > *stream, > return data[0]; > } > > +static inline struct device *stream_to_dmadev(struct uvc_streaming *stream) > +{ > + return bus_to_hcd(stream->dev->udev->bus)->self.sysdev; > +} > + > +static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device) > +{ > + struct device *dma_dev = dma_dev = stream_to_dmadev(uvc_urb->stream); > + > + if (for_device) { > + dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt, > + DMA_FROM_DEVICE); > + } else { > + dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt, > + DMA_FROM_DEVICE); > + invalidate_kernel_vmap_range(uvc_urb->buffer, > + uvc_urb->stream->urb_size); > + } > +} > + > /* > * uvc_video_decode_data_work: Asynchronous memcpy processing > * > @@ -1118,6 +1140,8 @@ static void uvc_video_copy_data_work(struct work_struct > *work) > uvc_queue_buffer_release(op->buf); > } > > + uvc_urb_dma_sync(uvc_urb, true); > + > ret = usb_submit_urb(uvc_urb->urb, GFP_KERNEL); > if (ret < 0) > uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n", > @@ -1539,10 +1563,12 @@ static void uvc_video_complete(struct urb *urb) >* Process the URB headers, and optionally queue expensive memcpy tasks >* to be deferred to a work queue. >*/ > + uvc_urb_dma_sync(uvc_urb, false); > stream->decode(uvc_urb, buf, buf_meta); > > /* If no async work is needed, resubmit the URB immediately. 
*/ > if (!uvc_urb->async_operations) { > + uvc_urb_dma_sync(uvc_urb, true); > ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC); > if (ret < 0) > uvc_printk(KERN_ERR, > @@ -1559,24 +1585,46 @@ static void uvc_video_complete(struct urb *urb) > */ > static void uvc_free_urb_buffers(struct uvc_streaming *stream) > { > + struct device *dma_dev = dma_dev = stream_to_dmadev(stream); > struct uvc_urb *uvc_urb; > > for_each_uvc_urb(uvc_urb, stream) { > if (!uvc_urb->buffer) > continue; > > -#ifndef CONFIG_DMA_NONCOHERENT > - usb_free_coherent(stream->dev->udev, stream->urb_size, > - uvc_urb->buffer, uvc_urb->dma); > -#else > -