Re: [PATCH v3 1/2] dt-bindings: iommu: add bindings for sprd iommu

2021-02-04 Thread Chunyan Zhang
Hi Rob,

On Fri, 5 Feb 2021 at 07:25, Rob Herring  wrote:
>
> On Wed, Feb 03, 2021 at 05:07:26PM +0800, Chunyan Zhang wrote:
> > From: Chunyan Zhang 
> >
> > This iommu module can be used by Unisoc's multimedia devices, such as
> > display, Image codec(jpeg) and a few signal processors, including
> > VSP(video), GSP(graphic), ISP(image), and CPP(camera pixel processor), etc.
> >
> > Signed-off-by: Chunyan Zhang 
> > ---
> >  .../devicetree/bindings/iommu/sprd,iommu.yaml | 72 +++
> >  1 file changed, 72 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml 
> > b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
> > new file mode 100644
> > index ..4fc99e81fa66
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
> > @@ -0,0 +1,72 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +# Copyright 2020 Unisoc Inc.
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/iommu/sprd,iommu.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Unisoc IOMMU and Multi-media MMU
> > +
> > +maintainers:
> > +  - Chunyan Zhang 
> > +
> > +properties:
> > +  compatible:
> > +enum:
> > +  - sprd,iommu-v1
> > +
> > +  "#iommu-cells":
> > +const: 0
> > +description:
> > +  Unisoc IOMMUs are all single-master IOMMU devices, therefore no
> > +  additional information needs to associate with its master device.
> > +  Please refer to the generic bindings document for more details,
> > +  Documentation/devicetree/bindings/iommu/iommu.txt
> > +
> > +  reg:
> > +maxItems: 1
> > +description:
> > +  Not required if 'sprd,iommu-regs' is defined.
> > +
> > +  clocks:
> > +description:
> > +  Reference to a gate clock phandle, since access to some of IOMMUs are
> > +  controlled by gate clock, but this is not required.
> > +
> > +  sprd,iommu-regs:
> > +$ref: /schemas/types.yaml#/definitions/phandle-array
> > +description:
> > +  Reference to a syscon phandle plus 1 cell, the syscon defines the
> > +  register range used by the iommu and the media device, the cell
> > +  defines the offset for iommu registers. Since iommu module shares
> > +  the same register range with the media device which uses it.
> > +
> > +required:
> > +  - compatible
> > +  - "#iommu-cells"
> > +
> > +oneOf:
> > +  - required:
> > +  - reg
> > +  - required:
> > +  - sprd,iommu-regs
> > +
> > +additionalProperties: false
> > +
> > +examples:
> > +  - |
> > +iommu_disp: iommu-disp {
> > +  compatible = "sprd,iommu-v1";
> > +  sprd,iommu-regs = <&dpu_regs 0x800>;
>
> If the IOMMU is contained within another device, then it should just be
> a child node of that device.

Yes, actually the IOMMU can be seen as a child of the multimedia devices. I
considered moving the IOMMU under the multimedia device node, but
multimedia devices need the IOMMU at probe time[1], so I dropped that idea.

And they share the same register base, e.g.

+   mm {
+   compatible = "simple-bus";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+
+   dpu_regs: syscon@6300 {
+   compatible = "sprd,sc9863a-dpuregs", "syscon";
+   reg = <0 0x6300 0 0x1000>;
+   };
+
+   dpu: dpu@6300 {
+   compatible = "sprd,sharkl3-dpu";
+   sprd,disp-regs = <&dpu_regs>;
+   iommus = <&iommu_dispc>;
+   };
+
+   iommu_dispc: iommu@6300 {
+   compatible = "sprd,iommu-v1";
+   sprd,iommu-regs = <&dpu_regs 0x800>;
+   #iommu-cells = <0>;
+};

The DPU uses the registers from offset 0 and the IOMMU from offset 0x800; the
purpose of using a syscon node was to avoid remapping the same register
physical address twice.

> Or just make 'dpu_regs' an IOMMU provider
> (i.e. just add #iommu-cells to it).

xxx_regs (the syscon node) defines the register range shared by the IOMMU and
a multimedia device (such as the DPU, image codec, etc.).

Hope I've explained the relationship of xxx_regs, multimedia device,
and iommu clearly :)
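For reference, here is a rough sketch (my illustration only, not the actual
driver code; the helper name and error handling are made up) of how the
driver side could consume such a "phandle + offset" property using the
generic OF/syscon helpers:

#include <linux/err.h>
#include <linux/mfd/syscon.h>
#include <linux/of.h>
#include <linux/regmap.h>
#include <linux/types.h>

/* Hypothetical helper: fetch the shared syscon regmap and the IOMMU's
 * register offset inside it from "sprd,iommu-regs = <&xxx_regs OFFSET>". */
static int sprd_iommu_get_regs(struct device_node *np,
                               struct regmap **regmap, u32 *reg_offset)
{
        struct of_phandle_args args;
        int ret;

        /* one argument cell: the offset of the IOMMU block in the syscon */
        ret = of_parse_phandle_with_fixed_args(np, "sprd,iommu-regs",
                                               1, 0, &args);
        if (ret)
                return ret;

        *regmap = syscon_node_to_regmap(args.np);
        *reg_offset = args.args[0];
        of_node_put(args.np);

        return PTR_ERR_OR_ZERO(*regmap);
}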

Any suggestion for this kind of case?

Thanks,
Chunyan

[1] 
https://elixir.bootlin.com/linux/v5.11-rc6/source/drivers/iommu/of_iommu.c#L145
>
> > +  #iommu-cells = <0>;
> > +};
> > +
> > +  - |
> > +iommu_jpg: iommu-jpg {
> > +  compatible = "sprd,iommu-v1";
> > +  sprd,iommu-regs = <&jpg_regs 0x300>;
> > +  #iommu-cells = <0>;
> > +  clocks = <&mm_gate 1>;
> > +};
> > +
> > +...
> > --
> > 2.25.1
> >

Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-04 Thread Nicolin Chen
Hi Guillaume,

On Thu, Feb 04, 2021 at 11:10:15AM +, Guillaume Tucker wrote:
> Hi Nicolin,
> 
> A regression was detected by kernelci.org in IGT's drm_read tests
> on mainline, it was first seen on 17th December 2020.  You can
> find some details here:
> 
>   https://kernelci.org/test/case/id/600b82dc1e3208f123d3dffc/

Thanks for reporting the issue. We did test on Tegra210 and Tegra30,
but not on Tegra124. I am wondering what could have gone wrong...

> Please let us know if you need any help debugging this issue or
> to try a fix on this platform.

Yes, I don't have any Tegra124 platform to run on. It'd be very helpful
if you could run a debugging patch (which I can provide) and then a fix
once I have root-caused the issue.

Thanks
Nicolin


RE: [RFC] Use SMMU HTTU for DMA dirty page tracking

2021-02-04 Thread Tian, Kevin
> From: Keqian Zhu 
> Sent: Friday, February 5, 2021 11:31 AM
> 
> Hi Jean and Kevin,
> 
> FYI, I sent out the SMMUv3 HTTU support for DMA dirty tracking[1] a
> week ago.
> 
> Thanks,
> Keqian
> 
> [1] https://lore.kernel.org/linux-iommu/20210128151742.18840-1-
> zhukeqi...@huawei.com/

Thanks for the pointer. We will take a look.

> 
> On 2020/5/27 17:14, Jean-Philippe Brucker wrote:
> > On Wed, May 27, 2020 at 08:40:47AM +, Tian, Kevin wrote:
> >>> From: Xiang Zheng 
> >>> Sent: Wednesday, May 27, 2020 2:45 PM
> >>>
> >>>
> >>> On 2020/5/27 11:27, Tian, Kevin wrote:
> > From: Xiang Zheng
> > Sent: Monday, May 25, 2020 7:34 PM
> >
> > [+cc Kirti, Yan, Alex]
> >
> > On 2020/5/23 1:14, Jean-Philippe Brucker wrote:
> >> Hi,
> >>
> >> On Tue, May 19, 2020 at 05:42:55PM +0800, Xiang Zheng wrote:
> >>> Hi all,
> >>>
> >>> Is there any plan for enabling SMMU HTTU?
> >>
> >> Not outside of SVA, as far as I know.
> >>
> >
> >>> I have seen the patch locates in the SVA series patch, which adds
> >>> support for HTTU:
> >>> https://www.spinics.net/lists/arm-kernel/msg798694.html
> >>>
> >>> HTTU reduces the number of access faults on SMMU fault queue
> >>> (permission faults also benefit from it).
> >>>
> >>> Besides reducing the faults, HTTU also helps to track dirty pages for
> >>> device DMA. Is it feasible to utilize HTTU to get dirty pages on 
> >>> device
> >>> DMA during VFIO live migration?
> >>
> >> As you know there is a VFIO interface for this under discussion:
> >> https://lore.kernel.org/kvm/1589781397-28368-1-git-send-email-
> > kwankh...@nvidia.com/
> >> It doesn't implement an internal API to communicate with the
> IOMMU
> > driver
> >> about dirty pages.
> 
>  We plan to add such API later, e.g. to utilize A/D bit in VT-d 2nd-level
>  page tables (Rev 3.0).
> 
> >>>
> >>> Thank you, Kevin.
> >>>
> >>> When will you send this series patches? Maybe(Hope) we can also
> support
> >>> hardware-based dirty pages tracking via common APIs based on your
> >>> patches. :)
> >>
> >> Yan is working with Kirti on basic live migration support now. After that
> >> part is done, we will start working on A/D bit support. Yes, common APIs
> >> are definitely the goal here.
> >>
> >>>
> >
> >>
> >>> If SMMU can track dirty pages, devices are not required to
> implement
> >>> additional dirty pages tracking to support VFIO live migration.
> >>
> >> It seems feasible, though tracking it in the device might be more
> >> efficient. I might have misunderstood but I think for live migration of
> >> the Intel NIC they trap guest accesses to the device and introspect its
> >> state to figure out which pages it is accessing.
> 
>  Does HTTU implement A/D-like mechanism in SMMU page tables, or
> just
>  report dirty pages in a log buffer? Either way tracking dirty pages in
> IOMMU
>  side is generic thus doesn't require device-specific tweak like in Intel 
>  NIC.
> 
> >>>
> >>> Currently HTTU just implements an A/D-like mechanism in SMMU page
> >>> tables.
> >>> We certainly
> >>> expect SMMU can also implement PML-like feature so that we can avoid
> >>> walking the
> >>> whole page table to get the dirty pages.
> >
> > There is no reporting of dirty pages in log buffer. It might be possible
> > to do software logging based on PRI or Stall, but that requires special
> > support in the endpoint as well as the SMMU.
> >
> >> Is there a link to HTTU introduction?
> >
> > I don't know any gentle introduction, but there are sections D5.4.11
> > "Hardware management of the Access flag and dirty state" in the ARM
> > Architecture Reference Manual (DDI0487E), and section 3.13 "Translation
> > table entries and Access/Dirty flags" in the SMMU specification
> > (IHI0070C). HTTU stands for "Hardware Translation Table Update".
> >
> > In short, when HTTU is enabled, the SMMU translation performs an atomic
> > read-modify-write on the leaf translation table descriptor, setting some
> > bits depending on the type of memory access. This can be enabled
> > independently on both stage-1 and stage-2 tables (equivalent to your 1st
> > and 2nd page tables levels, I think).
> >
> > Thanks,
> > Jean


Re: [RFC] Use SMMU HTTU for DMA dirty page tracking

2021-02-04 Thread Keqian Zhu
Hi Jean and Kevin,

FYI, I sent out the SMMUv3 HTTU support for DMA dirty tracking[1] a week ago.

Thanks,
Keqian

[1] 
https://lore.kernel.org/linux-iommu/20210128151742.18840-1-zhukeqi...@huawei.com/

On 2020/5/27 17:14, Jean-Philippe Brucker wrote:
> On Wed, May 27, 2020 at 08:40:47AM +, Tian, Kevin wrote:
>>> From: Xiang Zheng 
>>> Sent: Wednesday, May 27, 2020 2:45 PM
>>>
>>>
>>> On 2020/5/27 11:27, Tian, Kevin wrote:
> From: Xiang Zheng
> Sent: Monday, May 25, 2020 7:34 PM
>
> [+cc Kirti, Yan, Alex]
>
> On 2020/5/23 1:14, Jean-Philippe Brucker wrote:
>> Hi,
>>
>> On Tue, May 19, 2020 at 05:42:55PM +0800, Xiang Zheng wrote:
>>> Hi all,
>>>
>>> Is there any plan for enabling SMMU HTTU?
>>
>> Not outside of SVA, as far as I know.
>>
>
>>> I have seen the patch locates in the SVA series patch, which adds
>>> support for HTTU:
>>> https://www.spinics.net/lists/arm-kernel/msg798694.html
>>>
>>> HTTU reduces the number of access faults on SMMU fault queue
>>> (permission faults also benefit from it).
>>>
>>> Besides reducing the faults, HTTU also helps to track dirty pages for
>>> device DMA. Is it feasible to utilize HTTU to get dirty pages on device
>>> DMA during VFIO live migration?
>>
>> As you know there is a VFIO interface for this under discussion:
>> https://lore.kernel.org/kvm/1589781397-28368-1-git-send-email-
> kwankh...@nvidia.com/
>> It doesn't implement an internal API to communicate with the IOMMU
> driver
>> about dirty pages.

 We plan to add such API later, e.g. to utilize A/D bit in VT-d 2nd-level
 page tables (Rev 3.0).

>>>
>>> Thank you, Kevin.
>>>
>>> When will you send this series patches? Maybe(Hope) we can also support
>>> hardware-based dirty pages tracking via common APIs based on your
>>> patches. :)
>>
>> Yan is working with Kirti on basic live migration support now. After that
>> part is done, we will start working on A/D bit support. Yes, common APIs
>> are definitely the goal here.
>>
>>>
>
>>
>>> If SMMU can track dirty pages, devices are not required to implement
>>> additional dirty pages tracking to support VFIO live migration.
>>
>> It seems feasible, though tracking it in the device might be more
>> efficient. I might have misunderstood but I think for live migration of
>> the Intel NIC they trap guest accesses to the device and introspect its
>> state to figure out which pages it is accessing.

 Does HTTU implement A/D-like mechanism in SMMU page tables, or just
 report dirty pages in a log buffer? Either way tracking dirty pages in 
 IOMMU
 side is generic thus doesn't require device-specific tweak like in Intel 
 NIC.

>>>
>>> Currently HTTU just implements an A/D-like mechanism in SMMU page tables.
>>> We certainly
>>> expect SMMU can also implement PML-like feature so that we can avoid
>>> walking the
>>> whole page table to get the dirty pages.
> 
> There is no reporting of dirty pages in log buffer. It might be possible
> to do software logging based on PRI or Stall, but that requires special
> support in the endpoint as well as the SMMU.
> 
>> Is there a link to HTTU introduction?
> 
> I don't know any gentle introduction, but there are sections D5.4.11
> "Hardware management of the Access flag and dirty state" in the ARM
> Architecture Reference Manual (DDI0487E), and section 3.13 "Translation
> table entries and Access/Dirty flags" in the SMMU specification
> (IHI0070C). HTTU stands for "Hardware Translation Table Update".
> 
> In short, when HTTU is enabled, the SMMU translation performs an atomic
> read-modify-write on the leaf translation table descriptor, setting some
> bits depending on the type of memory access. This can be enabled
> independently on both stage-1 and stage-2 tables (equivalent to your 1st
> and 2nd page tables levels, I think).
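To make the dirty-state check concrete, here is a tiny standalone sketch.
The bit positions (DBM = bit 51, AP[2] "read-only" = bit 7) and the
"writable + DBM means written" convention are my reading of the Armv8 VMSA
stage-1 descriptor format, not something stated in this thread:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PTE_DBM       (1ULL << 51)  /* descriptor is dirty-bit managed */
#define PTE_AP_RDONLY (1ULL << 7)   /* AP[2]: set = read-only, clear = writable */

static bool pte_is_dirty(uint64_t pte)
{
        /* Hardware cleared AP[2] (made the page writable) on the first write */
        return (pte & PTE_DBM) && !(pte & PTE_AP_RDONLY);
}

int main(void)
{
        uint64_t clean = PTE_DBM | PTE_AP_RDONLY; /* tracked, not written yet */
        uint64_t dirty = PTE_DBM;                 /* tracked, written by DMA  */

        printf("clean pte dirty? %d\n", pte_is_dirty(clean));
        printf("dirty pte dirty? %d\n", pte_is_dirty(dirty));
        return 0;
}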
> 
> Thanks,
> Jean


[PATCH v2] dma-mapping: benchmark: pretend DMA is transmitting

2021-02-04 Thread Barry Song
In a real dma mapping use case, after dma_map is done, data will be
transmitted. Thus, in a multi-threaded user scenario, IOMMU contention
should not be that severe. For example, if users enable multiple
threads to send network packets through a 1G/10G/100Gbps NIC, the
steps will usually be: map -> transmission -> unmap. Transmission delay
reduces the contention on the IOMMU.

Here a delay is added to simulate the transmission between map and unmap
so that the tested result is more accurate for TX and simple RX.
A typical TX transmission for a NIC would be: map -> TX -> unmap, since
the socket buffers come from the OS. The simple RX model, e.g. a disk
driver, is also map -> RX -> unmap, but the real RX model in a NIC can be
more complicated, considering that packets can arrive spontaneously and
many drivers use pre-mapped buffer pools. This is on the TBD list.

Signed-off-by: Barry Song 
---
 -v2: cleanup according to Robin's feedback. thanks, Robin.

 kernel/dma/map_benchmark.c| 10 ++
 .../testing/selftests/dma/dma_map_benchmark.c | 19 +--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
index 1b1b8ff875cb..06636406a245 100644
--- a/kernel/dma/map_benchmark.c
+++ b/kernel/dma/map_benchmark.c
@@ -21,6 +21,7 @@
 #define DMA_MAP_BENCHMARK  _IOWR('d', 1, struct map_benchmark)
 #define DMA_MAP_MAX_THREADS  1024
 #define DMA_MAP_MAX_SECONDS  300
+#define DMA_MAP_MAX_TRANS_DELAY  (10 * NSEC_PER_MSEC) /* 10ms */
 
 #define DMA_MAP_BIDIRECTIONAL  0
 #define DMA_MAP_TO_DEVICE  1
@@ -36,6 +37,7 @@ struct map_benchmark {
__s32 node; /* which numa node this benchmark will run on */
__u32 dma_bits; /* DMA addressing capability */
__u32 dma_dir; /* DMA data direction */
+   __u32 dma_trans_ns; /* time for DMA transmission in ns */
__u64 expansion[10];/* For future use */
 };
 
@@ -87,6 +89,9 @@ static int map_benchmark_thread(void *data)
map_etime = ktime_get();
map_delta = ktime_sub(map_etime, map_stime);
 
+   /* Pretend DMA is transmitting */
+   ndelay(map->bparam.dma_trans_ns);
+
unmap_stime = ktime_get();
dma_unmap_single(map->dev, dma_addr, PAGE_SIZE, map->dir);
unmap_etime = ktime_get();
@@ -218,6 +223,11 @@ static long map_benchmark_ioctl(struct file *file, 
unsigned int cmd,
return -EINVAL;
}
 
+   if (map->bparam.dma_trans_ns > DMA_MAP_MAX_TRANS_DELAY) {
+   pr_err("invalid transmission delay\n");
+   return -EINVAL;
+   }
+
if (map->bparam.node != NUMA_NO_NODE &&
!node_possible(map->bparam.node)) {
pr_err("invalid numa node\n");
diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c 
b/tools/testing/selftests/dma/dma_map_benchmark.c
index 7065163a8388..a370290d9503 100644
--- a/tools/testing/selftests/dma/dma_map_benchmark.c
+++ b/tools/testing/selftests/dma/dma_map_benchmark.c
@@ -11,9 +11,12 @@
 #include 
 #include 
 
+#define NSEC_PER_MSEC  1000000L
+
 #define DMA_MAP_BENCHMARK  _IOWR('d', 1, struct map_benchmark)
 #define DMA_MAP_MAX_THREADS  1024
 #define DMA_MAP_MAX_SECONDS  300
+#define DMA_MAP_MAX_TRANS_DELAY  (10 * NSEC_PER_MSEC) /* 10ms */
 
 #define DMA_MAP_BIDIRECTIONAL  0
 #define DMA_MAP_TO_DEVICE  1
@@ -35,6 +38,7 @@ struct map_benchmark {
__s32 node; /* which numa node this benchmark will run on */
__u32 dma_bits; /* DMA addressing capability */
__u32 dma_dir; /* DMA data direction */
+   __u32 dma_trans_ns; /* delay for DMA transmission in ns */
__u64 expansion[10];/* For future use */
 };
 
@@ -45,12 +49,12 @@ int main(int argc, char **argv)
/* default single thread, run 20 seconds on NUMA_NO_NODE */
int threads = 1, seconds = 20, node = -1;
/* default dma mask 32bit, bidirectional DMA */
-   int bits = 32, dir = DMA_MAP_BIDIRECTIONAL;
+   int bits = 32, xdelay = 0, dir = DMA_MAP_BIDIRECTIONAL;
 
int cmd = DMA_MAP_BENCHMARK;
char *p;
 
-   while ((opt = getopt(argc, argv, "t:s:n:b:d:")) != -1) {
+   while ((opt = getopt(argc, argv, "t:s:n:b:d:x:")) != -1) {
switch (opt) {
case 't':
threads = atoi(optarg);
@@ -67,6 +71,9 @@ int main(int argc, char **argv)
case 'd':
dir = atoi(optarg);
break;
+   case 'x':
+   xdelay = atoi(optarg);
+   break;
default:
return -1;
}
@@ -84,6 +91,12 @@ int main(int argc, char **argv)
exit(1);
}
 
+   if (xdelay < 0 || xdelay > DMA_MAP_MAX_TRANS_DELAY) {
+   fprintf(stderr, "invalid transmit delay, must be in 0-%d\n",
+   DMA_MAP_MAX_TRANS_DELAY);
+   exit(1);
+   }

Re: [PATCH v2 1/2] iommu/vt-d: Use Real PCI DMA device for IRTE

2021-02-04 Thread Lu Baolu

On 2/5/21 3:09 AM, Jon Derrick wrote:

VMD retransmits child device MSI/X with the VMD endpoint's requester-id.
In order to support direct interrupt remapping of VMD child devices,
ensure that the IRTE is programmed with the VMD endpoint's requester-id
using pci_real_dma_dev().

Signed-off-by: Jon Derrick 
---
  drivers/iommu/intel/irq_remapping.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/irq_remapping.c 
b/drivers/iommu/intel/irq_remapping.c
index 685200a5cff0..1939e070eec8 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1276,7 +1276,8 @@ static void intel_irq_remapping_prepare_irte(struct 
intel_ir_data *data,
break;
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
+   set_msi_sid(irte,
+   pci_real_dma_dev(msi_desc_to_pci_dev(info->desc)));
break;
default:
BUG_ON(1);



Acked-by: Lu Baolu 

Best regards,
baolu


RE: [PATCH] dma-mapping: benchmark: pretend DMA is transmitting

2021-02-04 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Friday, February 5, 2021 12:51 PM
> To: Song Bao Hua (Barry Song) ;
> m.szyprow...@samsung.com; h...@lst.de; iommu@lists.linux-foundation.org
> Cc: linux-ker...@vger.kernel.org; linux...@openeuler.org
> Subject: Re: [PATCH] dma-mapping: benchmark: pretend DMA is transmitting
> 
> On 2021-02-04 22:58, Barry Song wrote:
> > In a real dma mapping user case, after dma_map is done, data will be
> > transmit. Thus, in multi-threaded user scenario, IOMMU contention
> > should not be that severe. For example, if users enable multiple
> > threads to send network packets through 1G/10G/100Gbps NIC, usually
> > the steps will be: map -> transmission -> unmap.  Transmission delay
> > reduces the contention of IOMMU. Here a delay is added to simulate
> > the transmission for TX case so that the tested result could be
> > more accurate.
> >
> > RX case would be much more tricky. It is not supported yet.
> 
> I guess it might be a reasonable approximation to map several pages,
> then unmap them again after a slightly more random delay. Or maybe
> divide the threads into pairs of mappers and unmappers respectively
> filling up and draining proper little buffer pools.

Yes, good suggestions. I am actually thinking about how to support
cases like networking. There is a pre-mapped list of pages, and each page
is bound to a hardware DMA block descriptor (BD). If Linux can consume
the packets in time, those buffers are always re-used. Only when the page
bound to a BD is full and the OS can't consume it in time is another
temporary page allocated and mapped; the BD switches to this temporary
page, which is finally unmapped once it is no longer needed. The
pre-mapped pages, on the other hand, are never unmapped.

For things like filesystems and disk drivers, RX is always requested by
users. That model is simpler: map -> RX -> unmap. For networking,
RX can arrive spontaneously.

Anyway, I'll put this on the TBD list. For the moment, this mainly handles
the TX path. Or maybe the current code can already handle the simple RX
model :-)
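As a rough standalone model of the pre-mapped pool scheme described above
(plain userspace C, purely illustrative and not driver code; the pool size
and structure names are invented for the sketch):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define POOL_SIZE 8

struct rx_buf {
        bool from_pool;   /* pre-mapped pool buffer vs. temporary page */
        bool in_use;
};

static struct rx_buf pool[POOL_SIZE];
static unsigned long fast_reuse, temp_maps;

/* "Receive" path: prefer a pre-mapped buffer, fall back to a temp mapping */
static struct rx_buf *rx_get_buf(void)
{
        for (int i = 0; i < POOL_SIZE; i++) {
                if (!pool[i].in_use) {
                        pool[i].in_use = true;
                        fast_reuse++;          /* reuse, no dma_map needed */
                        return &pool[i];
                }
        }
        temp_maps++;                           /* pool full: would dma_map here */
        struct rx_buf *tmp = calloc(1, sizeof(*tmp));
        tmp->in_use = true;
        return tmp;
}

/* "Consume" path: pool buffers stay mapped, temp buffers are unmapped/freed */
static void rx_put_buf(struct rx_buf *buf)
{
        if (buf->from_pool)
                buf->in_use = false;
        else
                free(buf);                     /* would dma_unmap here */
}

int main(void)
{
        struct rx_buf *held[12];

        for (int i = 0; i < POOL_SIZE; i++)
                pool[i].from_pool = true;

        /* OS falls behind: 12 packets outstanding against a pool of 8 */
        for (int i = 0; i < 12; i++)
                held[i] = rx_get_buf();
        for (int i = 0; i < 12; i++)
                rx_put_buf(held[i]);

        printf("reused pre-mapped buffers: %lu, temporary mappings: %lu\n",
               fast_reuse, temp_maps);
        return 0;
}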

> 
> > Signed-off-by: Barry Song 
> > ---
> >   kernel/dma/map_benchmark.c  | 11 +++
> >   tools/testing/selftests/dma/dma_map_benchmark.c | 17 +++--
> >   2 files changed, 26 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
> > index 1b1b8ff875cb..1976db7e34e4 100644
> > --- a/kernel/dma/map_benchmark.c
> > +++ b/kernel/dma/map_benchmark.c
> > @@ -21,6 +21,7 @@
> >   #define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark)
> >   #define DMA_MAP_MAX_THREADS   1024
> >   #define DMA_MAP_MAX_SECONDS   300
> > +#define DMA_MAP_MAX_TRANS_DELAY(10 * 1000 * 1000) /* 10ms */
> 
> Using MSEC_PER_SEC might be sufficiently self-documenting?

Yes, I guess you mean NSEC_PER_MSEC. I will switch to it.

> 
> >   #define DMA_MAP_BIDIRECTIONAL 0
> >   #define DMA_MAP_TO_DEVICE 1
> > @@ -36,6 +37,7 @@ struct map_benchmark {
> > __s32 node; /* which numa node this benchmark will run on */
> > __u32 dma_bits; /* DMA addressing capability */
> > __u32 dma_dir; /* DMA data direction */
> > +   __u32 dma_trans_ns; /* time for DMA transmission in ns */
> > __u64 expansion[10];/* For future use */
> >   };
> >
> > @@ -87,6 +89,10 @@ static int map_benchmark_thread(void *data)
> > map_etime = ktime_get();
> > map_delta = ktime_sub(map_etime, map_stime);
> >
> > +   /* Pretend DMA is transmitting */
> > +   if (map->dir != DMA_FROM_DEVICE)
> > +   ndelay(map->bparam.dma_trans_ns);
> 
> TBH I think the option of a fixed delay between map and unmap might be a
> handy thing in general, so having the direction check at all seems
> needlessly restrictive. As long as the driver implements all the basic
> building blocks, combining them to simulate specific traffic patterns
> can be left up to the benchmark tool.

Sensible, will remove the condition check.

> 
> Robin.
> 
> > +
> > unmap_stime = ktime_get();
> > dma_unmap_single(map->dev, dma_addr, PAGE_SIZE, map->dir);
> > unmap_etime = ktime_get();
> > @@ -218,6 +224,11 @@ static long map_benchmark_ioctl(struct file *file,
> unsigned int cmd,
> > return -EINVAL;
> > }
> >
> > +   if (map->bparam.dma_trans_ns > DMA_MAP_MAX_TRANS_DELAY) {
> > +   pr_err("invalid transmission delay\n");
> > +   return -EINVAL;
> > +   }
> > +
> > if (map->bparam.node != NUMA_NO_NODE &&
> > !node_possible(map->bparam.node)) {
> > pr_err("invalid numa node\n");
> > diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c
> b/tools/testing/selftests/dma/dma_map_benchmark.c
> > index 7065163a8388..dbf426e2fb7f 100644
> > --- a/tools/testing/selftests/dma/dma_map_benchmark.c

Re: [PATCH] dma-mapping: benchmark: pretend DMA is transmitting

2021-02-04 Thread Robin Murphy

On 2021-02-04 22:58, Barry Song wrote:

In a real dma mapping user case, after dma_map is done, data will be
transmit. Thus, in multi-threaded user scenario, IOMMU contention
should not be that severe. For example, if users enable multiple
threads to send network packets through 1G/10G/100Gbps NIC, usually
the steps will be: map -> transmission -> unmap.  Transmission delay
reduces the contention of IOMMU. Here a delay is added to simulate
the transmission for TX case so that the tested result could be
more accurate.

RX case would be much more tricky. It is not supported yet.


I guess it might be a reasonable approximation to map several pages, 
then unmap them again after a slightly more random delay. Or maybe 
divide the threads into pairs of mappers and unmappers respectively 
filling up and draining proper little buffer pools.



Signed-off-by: Barry Song 
---
  kernel/dma/map_benchmark.c  | 11 +++
  tools/testing/selftests/dma/dma_map_benchmark.c | 17 +++--
  2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
index 1b1b8ff875cb..1976db7e34e4 100644
--- a/kernel/dma/map_benchmark.c
+++ b/kernel/dma/map_benchmark.c
@@ -21,6 +21,7 @@
  #define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark)
  #define DMA_MAP_MAX_THREADS   1024
  #define DMA_MAP_MAX_SECONDS   300
+#define DMA_MAP_MAX_TRANS_DELAY(10 * 1000 * 1000) /* 10ms */


Using MSEC_PER_SEC might be sufficiently self-documenting?


  #define DMA_MAP_BIDIRECTIONAL 0
  #define DMA_MAP_TO_DEVICE 1
@@ -36,6 +37,7 @@ struct map_benchmark {
__s32 node; /* which numa node this benchmark will run on */
__u32 dma_bits; /* DMA addressing capability */
__u32 dma_dir; /* DMA data direction */
+   __u32 dma_trans_ns; /* time for DMA transmission in ns */
__u64 expansion[10];/* For future use */
  };
  
@@ -87,6 +89,10 @@ static int map_benchmark_thread(void *data)

map_etime = ktime_get();
map_delta = ktime_sub(map_etime, map_stime);
  
+		/* Pretend DMA is transmitting */

+   if (map->dir != DMA_FROM_DEVICE)
+   ndelay(map->bparam.dma_trans_ns);


TBH I think the option of a fixed delay between map and unmap might be a 
handy thing in general, so having the direction check at all seems 
needlessly restrictive. As long as the driver implements all the basic 
building blocks, combining them to simulate specific traffic patterns 
can be left up to the benchmark tool.


Robin.


+
unmap_stime = ktime_get();
dma_unmap_single(map->dev, dma_addr, PAGE_SIZE, map->dir);
unmap_etime = ktime_get();
@@ -218,6 +224,11 @@ static long map_benchmark_ioctl(struct file *file, 
unsigned int cmd,
return -EINVAL;
}
  
+		if (map->bparam.dma_trans_ns > DMA_MAP_MAX_TRANS_DELAY) {

+   pr_err("invalid transmission delay\n");
+   return -EINVAL;
+   }
+
if (map->bparam.node != NUMA_NO_NODE &&
!node_possible(map->bparam.node)) {
pr_err("invalid numa node\n");
diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c 
b/tools/testing/selftests/dma/dma_map_benchmark.c
index 7065163a8388..dbf426e2fb7f 100644
--- a/tools/testing/selftests/dma/dma_map_benchmark.c
+++ b/tools/testing/selftests/dma/dma_map_benchmark.c
@@ -14,6 +14,7 @@
  #define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark)
  #define DMA_MAP_MAX_THREADS   1024
  #define DMA_MAP_MAX_SECONDS 300
+#define DMA_MAP_MAX_TRANS_DELAY(10 * 1000 * 1000) /* 10ms */
  
  #define DMA_MAP_BIDIRECTIONAL	0

  #define DMA_MAP_TO_DEVICE 1
@@ -35,6 +36,7 @@ struct map_benchmark {
__s32 node; /* which numa node this benchmark will run on */
__u32 dma_bits; /* DMA addressing capability */
__u32 dma_dir; /* DMA data direction */
+   __u32 dma_trans_ns; /* delay for DMA transmission in ns */
__u64 expansion[10];/* For future use */
  };
  
@@ -45,12 +47,12 @@ int main(int argc, char **argv)

/* default single thread, run 20 seconds on NUMA_NO_NODE */
int threads = 1, seconds = 20, node = -1;
/* default dma mask 32bit, bidirectional DMA */
-   int bits = 32, dir = DMA_MAP_BIDIRECTIONAL;
+   int bits = 32, xdelay = 0, dir = DMA_MAP_BIDIRECTIONAL;
  
  	int cmd = DMA_MAP_BENCHMARK;

char *p;
  
-	while ((opt = getopt(argc, argv, "t:s:n:b:d:")) != -1) {

+   while ((opt = getopt(argc, argv, "t:s:n:b:d:x:")) != -1) {
switch (opt) {
case 't':
threads = atoi(optarg);
@@ -67,6 +69,9 @@ int main(int argc, char **argv)
case 'd':
dir = atoi(optarg);
break;
+   case 'x':
+  

Re: [PATCH v3 1/2] dt-bindings: iommu: add bindings for sprd iommu

2021-02-04 Thread Rob Herring
On Wed, Feb 03, 2021 at 05:07:26PM +0800, Chunyan Zhang wrote:
> From: Chunyan Zhang 
> 
> This iommu module can be used by Unisoc's multimedia devices, such as
> display, Image codec(jpeg) and a few signal processors, including
> VSP(video), GSP(graphic), ISP(image), and CPP(camera pixel processor), etc.
> 
> Signed-off-by: Chunyan Zhang 
> ---
>  .../devicetree/bindings/iommu/sprd,iommu.yaml | 72 +++
>  1 file changed, 72 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
> 
> diff --git a/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml 
> b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
> new file mode 100644
> index ..4fc99e81fa66
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
> @@ -0,0 +1,72 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +# Copyright 2020 Unisoc Inc.
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/iommu/sprd,iommu.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Unisoc IOMMU and Multi-media MMU
> +
> +maintainers:
> +  - Chunyan Zhang 
> +
> +properties:
> +  compatible:
> +enum:
> +  - sprd,iommu-v1
> +
> +  "#iommu-cells":
> +const: 0
> +description:
> +  Unisoc IOMMUs are all single-master IOMMU devices, therefore no
> +  additional information needs to associate with its master device.
> +  Please refer to the generic bindings document for more details,
> +  Documentation/devicetree/bindings/iommu/iommu.txt
> +
> +  reg:
> +maxItems: 1
> +description:
> +  Not required if 'sprd,iommu-regs' is defined.
> +
> +  clocks:
> +description:
> +  Reference to a gate clock phandle, since access to some of IOMMUs are
> +  controlled by gate clock, but this is not required.
> +
> +  sprd,iommu-regs:
> +$ref: /schemas/types.yaml#/definitions/phandle-array
> +description:
> +  Reference to a syscon phandle plus 1 cell, the syscon defines the
> +  register range used by the iommu and the media device, the cell
> +  defines the offset for iommu registers. Since iommu module shares
> +  the same register range with the media device which uses it.
> +
> +required:
> +  - compatible
> +  - "#iommu-cells"
> +
> +oneOf:
> +  - required:
> +  - reg
> +  - required:
> +  - sprd,iommu-regs
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +iommu_disp: iommu-disp {
> +  compatible = "sprd,iommu-v1";
> +  sprd,iommu-regs = <&dpu_regs 0x800>;

If the IOMMU is contained within another device, then it should just be 
a child node of that device. Or just make 'dpu_regs' an IOMMU provider 
(i.e. just add #iommu-cells to it).

> +  #iommu-cells = <0>;
> +};
> +
> +  - |
> +iommu_jpg: iommu-jpg {
> +  compatible = "sprd,iommu-v1";
> +  sprd,iommu-regs = <&jpg_regs 0x300>;
> +  #iommu-cells = <0>;
> +  clocks = <&mm_gate 1>;
> +};
> +
> +...
> -- 
> 2.25.1
> 


Re: [PATCH 7/8] swiotlb: respect min_align_mask

2021-02-04 Thread Robin Murphy

On 2021-02-04 19:30, Christoph Hellwig wrote:

Respect the min_align_mask in struct device_dma_parameters in swiotlb.

There are two parts to it:
  1) for the lower bits of the alignment inside the io tlb slot, just
 extent the size of the allocation and leave the start of the slot
  empty
  2) for the high bits ensure we find a slot that matches the high bits
 of the alignment to avoid wasting too much memory

Based on an earlier patch from Jianxiong Gao .

Signed-off-by: Christoph Hellwig 
---
  kernel/dma/swiotlb.c | 49 +---
  1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6a2439826a1ba4..ab3192142b9906 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -468,6 +468,18 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
phys_addr_t tlb_addr,
}
  }
  
+/*

+ * Return the offset into a iotlb slot required to keep the device happy.
+ */
+static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
+{
+   unsigned min_align_mask = dma_get_min_align_mask(dev);
+
+   if (!min_align_mask)
+   return 0;


I doubt that's beneficial - even if the compiler can convert it into a 
csel, it'll then be doing unnecessary work to throw away a 
cheaply-calculated 0 in favour of hard-coded 0 in the one case it matters ;)



+   return addr & min_align_mask & ((1 << IO_TLB_SHIFT) - 1);


(BTW, for readability throughout, "#define IO_TLB_SIZE (1 << 
IO_TLB_SHIFT)" sure wouldn't go amiss...)



+}
+
  /*
   * Carefully handle integer overflow which can occur when boundary_mask == 
~0UL.
   */
@@ -478,6 +490,16 @@ static inline unsigned long get_max_slots(unsigned long 
boundary_mask)
return nr_slots(boundary_mask + 1);
  }
  
+static inline bool check_alignment(phys_addr_t orig_addr,

+   dma_addr_t tbl_dma_addr, unsigned int index,
+   unsigned int min_align_mask)
+{
+   if (!min_align_mask)
+   return true;


Ditto - even the 5 or so operations this might skip is unlikely to 
outweigh a branch on anything that matters, and again csel would be a 
net loss since x & 0 == y & 0 is still the correct answer.



+   return ((tbl_dma_addr + (index << IO_TLB_SHIFT)) & min_align_mask) ==
+   (orig_addr & min_align_mask);
+}
+
  static unsigned int wrap_index(unsigned int index)
  {
if (index >= io_tlb_nslabs)
@@ -489,9 +511,11 @@ static unsigned int wrap_index(unsigned int index)
   * Find a suitable number of IO TLB entries size that will fit this request 
and
   * allocate a buffer from that IO TLB pool.
   */
-static int find_slots(struct device *dev, size_t alloc_size,
-   dma_addr_t tbl_dma_addr)
+static int find_slots(struct device *dev, phys_addr_t orig_addr,
+   size_t alloc_size, dma_addr_t tbl_dma_addr)
  {
+   unsigned int min_align_mask = dma_get_min_align_mask(dev) &
+   ~((1 << IO_TLB_SHIFT) - 1);
unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev));
unsigned int nslots = nr_slots(alloc_size), stride = 1;
unsigned int index, wrap, count = 0, i;
@@ -503,7 +527,9 @@ static int find_slots(struct device *dev, size_t alloc_size,
 * For mappings greater than or equal to a page, we limit the stride
 * (and hence alignment) to a page size.
 */
-   if (alloc_size >= PAGE_SIZE)
+   if (min_align_mask)
+   stride = (min_align_mask + 1) >> IO_TLB_SHIFT;


So this can't underflow because "min_align_mask" is actually just the 
high-order bits representing the number of iotlb slots needed to meet 
the requirement, right? (It took a good 5 minutes to realise this wasn't 
doing what I initially thought it did...)


In that case, a) could the local var be called something like 
iotlb_align_mask to clarify that it's *not* just a copy of the device's 
min_align_mask, and b) maybe just have an unconditional initialisation 
that works either way:


stride = (min_align_mask >> IO_TLB_SHIFT) + 1;

In fact with that, I think could just mask orig_addr with ~IO_TLB_SIZE 
in the call to check_alignment() below, or shift everything down by 
IO_TLB_SHIFT in check_alignment() itself, instead of mangling 
min_align_mask at all (I'm assuming we do need to ignore the low-order 
bits of orig_addr at this point).


Robin.
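For what it's worth, here is a tiny standalone illustration (my own sketch,
assuming IO_TLB_SHIFT = 11, i.e. 2 KiB slots) of what the suggested
unconditional initialisation yields for a few example masks, with the
low-order part that would be handled inside the slot shown alongside:

#include <stdio.h>

#define IO_TLB_SHIFT 11   /* 2 KiB swiotlb slots */

int main(void)
{
        unsigned long long masks[] = { 0x0, 0xfff, 0xffff, 0x3ffff };

        for (int i = 0; i < 4; i++) {
                unsigned long long min_align_mask = masks[i];
                /* suggested unconditional initialisation: works for mask == 0 too */
                unsigned int stride = (min_align_mask >> IO_TLB_SHIFT) + 1;
                /* low-order bits satisfied by an offset inside the slot */
                unsigned int in_slot = min_align_mask & ((1 << IO_TLB_SHIFT) - 1);

                printf("min_align_mask=%#llx -> stride=%u slots, in-slot offset mask=%#x\n",
                       min_align_mask, stride, in_slot);
        }
        return 0;
}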


[PATCH] dma-mapping: benchmark: pretend DMA is transmitting

2021-02-04 Thread Barry Song
In a real dma mapping use case, after dma_map is done, data will be
transmitted. Thus, in a multi-threaded user scenario, IOMMU contention
should not be that severe. For example, if users enable multiple
threads to send network packets through a 1G/10G/100Gbps NIC, the
steps will usually be: map -> transmission -> unmap. Transmission delay
reduces the contention on the IOMMU. Here a delay is added to simulate
the transmission for the TX case so that the tested result is
more accurate.

The RX case would be much more tricky. It is not supported yet.

Signed-off-by: Barry Song 
---
 kernel/dma/map_benchmark.c  | 11 +++
 tools/testing/selftests/dma/dma_map_benchmark.c | 17 +++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
index 1b1b8ff875cb..1976db7e34e4 100644
--- a/kernel/dma/map_benchmark.c
+++ b/kernel/dma/map_benchmark.c
@@ -21,6 +21,7 @@
 #define DMA_MAP_BENCHMARK  _IOWR('d', 1, struct map_benchmark)
 #define DMA_MAP_MAX_THREADS  1024
 #define DMA_MAP_MAX_SECONDS  300
+#define DMA_MAP_MAX_TRANS_DELAY  (10 * 1000 * 1000) /* 10ms */
 
 #define DMA_MAP_BIDIRECTIONAL  0
 #define DMA_MAP_TO_DEVICE  1
@@ -36,6 +37,7 @@ struct map_benchmark {
__s32 node; /* which numa node this benchmark will run on */
__u32 dma_bits; /* DMA addressing capability */
__u32 dma_dir; /* DMA data direction */
+   __u32 dma_trans_ns; /* time for DMA transmission in ns */
__u64 expansion[10];/* For future use */
 };
 
@@ -87,6 +89,10 @@ static int map_benchmark_thread(void *data)
map_etime = ktime_get();
map_delta = ktime_sub(map_etime, map_stime);
 
+   /* Pretend DMA is transmitting */
+   if (map->dir != DMA_FROM_DEVICE)
+   ndelay(map->bparam.dma_trans_ns);
+
unmap_stime = ktime_get();
dma_unmap_single(map->dev, dma_addr, PAGE_SIZE, map->dir);
unmap_etime = ktime_get();
@@ -218,6 +224,11 @@ static long map_benchmark_ioctl(struct file *file, 
unsigned int cmd,
return -EINVAL;
}
 
+   if (map->bparam.dma_trans_ns > DMA_MAP_MAX_TRANS_DELAY) {
+   pr_err("invalid transmission delay\n");
+   return -EINVAL;
+   }
+
if (map->bparam.node != NUMA_NO_NODE &&
!node_possible(map->bparam.node)) {
pr_err("invalid numa node\n");
diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c 
b/tools/testing/selftests/dma/dma_map_benchmark.c
index 7065163a8388..dbf426e2fb7f 100644
--- a/tools/testing/selftests/dma/dma_map_benchmark.c
+++ b/tools/testing/selftests/dma/dma_map_benchmark.c
@@ -14,6 +14,7 @@
 #define DMA_MAP_BENCHMARK  _IOWR('d', 1, struct map_benchmark)
 #define DMA_MAP_MAX_THREADS  1024
 #define DMA_MAP_MAX_SECONDS 300
+#define DMA_MAP_MAX_TRANS_DELAY  (10 * 1000 * 1000) /* 10ms */
 
 #define DMA_MAP_BIDIRECTIONAL  0
 #define DMA_MAP_TO_DEVICE  1
@@ -35,6 +36,7 @@ struct map_benchmark {
__s32 node; /* which numa node this benchmark will run on */
__u32 dma_bits; /* DMA addressing capability */
__u32 dma_dir; /* DMA data direction */
+   __u32 dma_trans_ns; /* delay for DMA transmission in ns */
__u64 expansion[10];/* For future use */
 };
 
@@ -45,12 +47,12 @@ int main(int argc, char **argv)
/* default single thread, run 20 seconds on NUMA_NO_NODE */
int threads = 1, seconds = 20, node = -1;
/* default dma mask 32bit, bidirectional DMA */
-   int bits = 32, dir = DMA_MAP_BIDIRECTIONAL;
+   int bits = 32, xdelay = 0, dir = DMA_MAP_BIDIRECTIONAL;
 
int cmd = DMA_MAP_BENCHMARK;
char *p;
 
-   while ((opt = getopt(argc, argv, "t:s:n:b:d:")) != -1) {
+   while ((opt = getopt(argc, argv, "t:s:n:b:d:x:")) != -1) {
switch (opt) {
case 't':
threads = atoi(optarg);
@@ -67,6 +69,9 @@ int main(int argc, char **argv)
case 'd':
dir = atoi(optarg);
break;
+   case 'x':
+   xdelay = atoi(optarg);
+   break;
default:
return -1;
}
@@ -84,6 +89,12 @@ int main(int argc, char **argv)
exit(1);
}
 
+   if (xdelay < 0 || xdelay > DMA_MAP_MAX_TRANS_DELAY) {
+   fprintf(stderr, "invalid transmit delay, must be in 0-%d\n",
+   DMA_MAP_MAX_TRANS_DELAY);
+   exit(1);
+   }
+
/* suppose the mininum DMA zone is 1MB in the world */
if (bits < 20 || bits > 64) {
fprintf(stderr, "invalid dma mask bit, must be in 20-64\n");
@@ -107,6 +118,8 @@ int main(int argc, char **argv)
   

Re: [PATCH 5/8] swiotlb: refactor swiotlb_tbl_map_single

2021-02-04 Thread Robin Murphy

On 2021-02-04 19:30, Christoph Hellwig wrote:

Split out a bunch of a self-contained helpers to make the function easier
to follow.

Signed-off-by: Christoph Hellwig 
---
  kernel/dma/swiotlb.c | 177 +--
  1 file changed, 87 insertions(+), 90 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 79d5b236f25f10..e78615e3be2906 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -468,134 +468,131 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
phys_addr_t tlb_addr,
}
  }
  
-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,

-   size_t mapping_size, size_t alloc_size,
-   enum dma_data_direction dir, unsigned long attrs)
+/*
+ * Carefully handle integer overflow which can occur when boundary_mask == 
~0UL.
+ */
+static inline unsigned long get_max_slots(unsigned long boundary_mask)
  {
-   dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
-   unsigned long flags;
-   phys_addr_t tlb_addr;
-   unsigned int nslots, stride, index, wrap;
-   int i;
-   unsigned long mask;
-   unsigned long offset_slots;
-   unsigned long max_slots;
-   unsigned long tmp_io_tlb_used;
-
-   if (no_iotlb_memory)
-   panic("Can not allocate SWIOTLB buffer earlier and can't now provide 
you with the DMA bounce buffer");
-
-   if (mem_encrypt_active())
-   pr_warn_once("Memory encryption is active and system is using DMA 
bounce buffers\n");
-
-   if (mapping_size > alloc_size) {
-   dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd 
bytes)",
- mapping_size, alloc_size);
-   return (phys_addr_t)DMA_MAPPING_ERROR;
-   }
-
-   mask = dma_get_seg_boundary(hwdev);
+   if (boundary_mask + 1 == ~0UL)


Either "mask == ~0UL" or "mask + 1 == 0", surely?


+   return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+   return nr_slots(boundary_mask + 1);
+}
  
-	tbl_dma_addr &= mask;

+static unsigned int wrap_index(unsigned int index)
+{
+   if (index >= io_tlb_nslabs)
+   return 0;
+   return index;
+}
  
-	offset_slots = nr_slots(tbl_dma_addr);

+/*
+ * Find a suitable number of IO TLB entries size that will fit this request and
+ * allocate a buffer from that IO TLB pool.
+ */
+static int find_slots(struct device *dev, size_t alloc_size,
+   dma_addr_t tbl_dma_addr)
+{
+   unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev));
+   unsigned int nslots = nr_slots(alloc_size), stride = 1;
+   unsigned int index, wrap, count = 0, i;
+   unsigned long flags;
  
-	/*

-* Carefully handle integer overflow which can occur when mask == ~0UL.
-*/
-   max_slots = mask + 1
-   ? nr_slots(mask + 1)
-   : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);


...since the condition here is effectively the latter.

Robin.
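Just to spell out the wraparound being discussed, a trivial standalone
demonstration (my own illustration, nothing from the patch):

#include <stdio.h>

int main(void)
{
        unsigned long mask = ~0UL;   /* no segment boundary restriction */

        /* mask + 1 wraps to 0, so the original code must take the "else" arm... */
        printf("mask + 1 == %lu\n", mask + 1);
        /* ...whereas a test written as "mask + 1 == ~0UL" can never fire here */
        printf("mask + 1 == ~0UL ? %d\n", mask + 1 == ~0UL);
        printf("mask == ~0UL ? %d\n", mask == ~0UL);
        return 0;
}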


Re: [PATCH 3/8] swiotlb: factor out a nr_slots helper

2021-02-04 Thread Robin Murphy

On 2021-02-04 19:30, Christoph Hellwig wrote:

Factor out a helper to find the number of slots for a given size.

Signed-off-by: Christoph Hellwig 
---
  kernel/dma/swiotlb.c | 13 +
  1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 838dbad10ab916..0c0b81799edbdb 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -194,6 +194,11 @@ static inline unsigned long io_tlb_offset(unsigned long 
val)
return val & (IO_TLB_SEGSIZE - 1);
  }
  
+static unsigned long nr_slots(u64 val)

+{
+   return ALIGN(val, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;


Would DIV_ROUND_UP(val, 1 << IO_TLB_SHIFT) be even clearer? (A quick
standalone check of the equivalence is appended below, after the quoted patch.)

Robin.


+}
+
  /*
   * Early SWIOTLB allocation may be too early to allow an architecture to
   * perform the desired operations.  This function allows the architecture to
@@ -493,20 +498,20 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
  
  	tbl_dma_addr &= mask;
  
-	offset_slots = ALIGN(tbl_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;

+   offset_slots = nr_slots(tbl_dma_addr);
  
  	/*

 * Carefully handle integer overflow which can occur when mask == ~0UL.
 */
max_slots = mask + 1
-   ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
+   ? nr_slots(mask + 1)
: 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
  
  	/*

 * For mappings greater than or equal to a page, we limit the stride
 * (and hence alignment) to a page size.
 */
-   nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+   nslots = nr_slots(alloc_size);
if (alloc_size >= PAGE_SIZE)
stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
else
@@ -602,7 +607,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
  enum dma_data_direction dir, unsigned long attrs)
  {
unsigned long flags;
-   int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> 
IO_TLB_SHIFT;
+   int i, count, nslots = nr_slots(alloc_size);
int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = io_tlb_orig_addr[index];
  




Re: [EXTERNAL] Re: Question regarding VIOT proposal

2021-02-04 Thread Al Stone
On 03 Feb 2021 09:46, Jean-Philippe Brucker wrote:
> On Tue, Feb 02, 2021 at 01:27:13PM -0700, Al Stone wrote:
> > On 02 Feb 2021 10:17, Jean-Philippe Brucker wrote:
> > > Hi Al,
> > > 
> > > On Fri, Dec 04, 2020 at 01:18:25PM -0700, Al Stone wrote:
> > > > > I updated the doc: https://jpbrucker.net/virtio-iommu/viot/viot-v9.pdf
> > > > > You can incorporate it into the ASWG proposal.
> > > > > Changes since v8:
> > > > > * One typo (s/programing/programming/)
> > > > > * Modified the PCI Range node to include a segment range.
> > > > > 
> > > > > I also updated the Linux and QEMU implementations on branch
> > > > > virtio-iommu/devel in https://jpbrucker.net/git/linux/ and
> > > > > https://jpbrucker.net/git/qemu/
> > > > > 
> > > > > Thanks again for helping with this
> > > > > 
> > > > > Jean
> > > > 
> > > > Perfect.  Thanks.  I'll update the ASWG info right away.
> > > 
> > > Has there been any more feedback on the VIOT specification? I'm wondering
> > > when we can start upstreaming support for it.
> > > 
> > > Thanks,
> > > Jean
> > 
> > Ah, sorry, Jean.  I meant to get back to you sooner.  My apologies.
> > 
> > The latest version that resulted from the discussion with Yinghan of
> > Microsoft is the one in front the ASWG; I think if you upstream that
> > version, it should be identical to the spec when it is next published
> > (post ACPI 6.4).
> > 
> > The process is: (1) propose the change, (2) review it in committee,
> > (3) give people more time to think about it, then (4) have a final
> > vote.  We've been iterating over (1), (2) and (3).  Since there was
> > no new discussion at the last meeting, we should have the final vote
> > on this (4) in the next meeting.  I had hoped we could do that last
> > week but the meeting was canceled at the last minute.  I hope to have
> > the final vote this Thursday and will let you know as soon as it has
> > been decided.
> > 
> > My apologies for the delays; getting things done by committee always
> > takes a bazillion times longer than one would think.
> 
> No worries, thanks a lot for the update!
> 
> Thanks,
> Jean

Sigh.  Just got a note that today's meeting has been canceled :(.

So, next week for the final vote.  OTOH, there have been no comments
of any sort on the proposal -- and silence is good, in this case.

-- 
ciao,
al
---
Al Stone
Software Engineer
Red Hat, Inc.
a...@redhat.com
---



Re: [RFC PATCH 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log

2021-02-04 Thread Robin Murphy

On 2021-01-28 15:17, Keqian Zhu wrote:

From: jiangkunkun 

During dirty log tracking, the user will try to retrieve the dirty log from
the iommu if it supports hardware dirty log. This adds a new interface
named sync_dirty_log in the iommu layer and arm smmuv3 implements it,
which scans leaf TTDs and treats one as dirty if it is writable (as we
only enable HTTU for stage 1, it checks that AP[2] is not set).

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 +++
  drivers/iommu/io-pgtable-arm.c  | 90 +
  drivers/iommu/iommu.c   | 41 ++
  include/linux/io-pgtable.h  |  4 +
  include/linux/iommu.h   | 17 
  5 files changed, 179 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2434519e4bb6..43d0536b429a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2548,6 +2548,32 @@ static size_t arm_smmu_merge_page(struct iommu_domain 
*domain, unsigned long iov
return ops->merge_page(ops, iova, paddr, size, prot);
  }
  
+static int arm_smmu_sync_dirty_log(struct iommu_domain *domain,

+  unsigned long iova, size_t size,
+  unsigned long *bitmap,
+  unsigned long base_iova,
+  unsigned long bitmap_pgshift)
+{
+   struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HTTU_HD)) {
+   dev_err(smmu->dev, "don't support HTTU_HD and sync dirty 
log\n");
+   return -EPERM;
+   }
+
+   if (!ops || !ops->sync_dirty_log) {
+   pr_err("don't support sync dirty log\n");
+   return -ENODEV;
+   }
+
+   /* To ensure all inflight transactions are completed */
+   arm_smmu_flush_iotlb_all(domain);


What about transactions that arrive between the point that this 
completes, and the point - potentially much later - that we actually 
access any given PTE during the walk? I don't see what this is supposed 
to be synchronising against, even if it were just a CMD_SYNC (I 
especially don't see why we'd want to knock out the TLBs).



+
+   return ops->sync_dirty_log(ops, iova, size, bitmap,
+   base_iova, bitmap_pgshift);
+}
+
  static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
  {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2649,6 +2675,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_set_attr= arm_smmu_domain_set_attr,
.split_block= arm_smmu_split_block,
.merge_page = arm_smmu_merge_page,
+   .sync_dirty_log = arm_smmu_sync_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 17390f258eb1..6cfe1ef3fedd 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -877,6 +877,95 @@ static size_t arm_lpae_merge_page(struct io_pgtable_ops 
*ops, unsigned long iova
return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot);
  }
  
+static int __arm_lpae_sync_dirty_log(struct arm_lpae_io_pgtable *data,

+unsigned long iova, size_t size,
+int lvl, arm_lpae_iopte *ptep,
+unsigned long *bitmap,
+unsigned long base_iova,
+unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = &data->iop;
+   size_t base, next_size;
+   unsigned long offset;
+   int nbits, ret;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* It is writable, set the bitmap */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   bitmap_set(bitmap, offset, nbits);
+   return 0;
+   } else {
+   /* To traverse next level */
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+

Re: [RFC PATCH 05/11] iommu/arm-smmu-v3: Merge a span of page to block descriptor

2021-02-04 Thread Robin Murphy

On 2021-01-28 15:17, Keqian Zhu wrote:

From: jiangkunkun 

When stopping dirty log tracking, we need to recover all block descriptors
which were split when dirty log tracking started. This adds a new
interface named merge_page in the iommu layer and arm smmuv3 implements it,
which reinstalls block mappings and unmaps the span of page mappings.

It's the caller's duty to find contiguous physical memory.

While pages are being merged, other interfaces are not expected to be working,
so no race condition exists. And we flush all iotlbs after the merge
procedure is completed to ease the pressure on the iommu, as we will generally
merge a huge range of page mappings.


Again, I think we need better reasoning than "race conditions don't 
exist because we don't expect them to exist".



Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 20 ++
  drivers/iommu/io-pgtable-arm.c  | 78 +
  drivers/iommu/iommu.c   | 75 
  include/linux/io-pgtable.h  |  2 +
  include/linux/iommu.h   | 10 +++
  5 files changed, 185 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5469f4fca820..2434519e4bb6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2529,6 +2529,25 @@ static size_t arm_smmu_split_block(struct iommu_domain 
*domain,
return ops->split_block(ops, iova, size);
  }
  
+static size_t arm_smmu_merge_page(struct iommu_domain *domain, unsigned long iova,

+ phys_addr_t paddr, size_t size, int prot)
+{
+   struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2 and merge page\n");
+   return 0;
+   }
+
+   if (!ops || !ops->merge_page) {
+   pr_err("don't support merge page\n");
+   return 0;
+   }
+
+   return ops->merge_page(ops, iova, paddr, size, prot);
+}
+
  static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
  {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2629,6 +2648,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
.split_block= arm_smmu_split_block,
+   .merge_page = arm_smmu_merge_page,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f3b7f7115e38..17390f258eb1 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -800,6 +800,83 @@ static size_t arm_lpae_split_block(struct io_pgtable_ops 
*ops,
return __arm_lpae_split_block(data, iova, size, lvl, ptep);
  }
  
+static size_t __arm_lpae_merge_page(struct arm_lpae_io_pgtable *data,

+   unsigned long iova, phys_addr_t paddr,
+   size_t size, int lvl, arm_lpae_iopte *ptep,
+   arm_lpae_iopte prot)
+{
+   arm_lpae_iopte pte, *tablep;
+   struct io_pgtable *iop = &data->iop;
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return 0;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt))
+   return size;
+
+   /* Race does not exist */
+   if (cfg->bbml == 1) {
+   prot |= ARM_LPAE_PTE_NT;
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   io_pgtable_tlb_flush_walk(iop, iova, size,
+ ARM_LPAE_GRANULE(data));
+
+   prot &= ~(ARM_LPAE_PTE_NT);
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   } else {
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   }
+
+   tablep = iopte_deref(pte, data);
+   __arm_lpae_free_pgtable(data, lvl + 1, tablep);
+   return size;
+   } else if (iopte_leaf(pte, lvl, iop->fmt)) {
+   /* The size is too small, already merged */
+   return size;
+   }
+
+   /* Keep on walkin */
+   ptep = iopte_deref(pte, data);
+   return __a

Re: [RFC PATCH 04/11] iommu/arm-smmu-v3: Split block descriptor to a span of page

2021-02-04 Thread Robin Murphy

On 2021-01-28 15:17, Keqian Zhu wrote:

From: jiangkunkun 

A block descriptor is not a proper granule for dirty log tracking. This
adds a new interface named split_block in the iommu layer and arm smmuv3
implements it, which splits a block descriptor into an equivalent span of
page descriptors.

While blocks are being split, other interfaces are not expected to be working,
so no race condition exists. And we flush all iotlbs after the split
procedure is completed to ease the pressure on the iommu, as we will generally
split a huge range of block mappings.


"Not expected to be" is not the same thing as "can not". Presumably the 
whole point of dirty log tracking is that it can be run speculatively in 
the background, so is there any actual guarantee that the guest can't, 
say, issue a hotplug event that would cause some memory to be released 
back to the host and unmapped while a scan might be in progress? Saying 
effectively "there is no race condition as long as you assume there is 
no race condition" isn't all that reassuring...


That said, it's not very clear why patches #4 and #5 are here at all, 
given that patches #6 and #7 appear quite happy to handle block entries.



Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  20 
  drivers/iommu/io-pgtable-arm.c  | 122 
  drivers/iommu/iommu.c   |  40 +++
  include/linux/io-pgtable.h  |   2 +
  include/linux/iommu.h   |  10 ++
  5 files changed, 194 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9208881a571c..5469f4fca820 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2510,6 +2510,25 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
return ret;
  }
  
+static size_t arm_smmu_split_block(struct iommu_domain *domain,
+				   unsigned long iova, size_t size)
+{
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+   struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2 and split block\n");
+   return 0;
+   }
+
+   if (!ops || !ops->split_block) {
+   pr_err("don't support split block\n");
+   return 0;
+   }
+
+   return ops->split_block(ops, iova, size);
+}
+
  static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
  {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2609,6 +2628,7 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
+   .split_block= arm_smmu_split_block,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e299a44808ae..f3b7f7115e38 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -79,6 +79,8 @@
  #define ARM_LPAE_PTE_SH_IS(((arm_lpae_iopte)3) << 8)
  #define ARM_LPAE_PTE_NS   (((arm_lpae_iopte)1) << 5)
  #define ARM_LPAE_PTE_VALID(((arm_lpae_iopte)1) << 0)
+/* Block descriptor bits */
+#define ARM_LPAE_PTE_NT			(((arm_lpae_iopte)1) << 16)
  
  #define ARM_LPAE_PTE_ATTR_LO_MASK	(((arm_lpae_iopte)0x3ff) << 2)

  /* Ignore the contiguous bit for block splitting */
@@ -679,6 +681,125 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
io_pgtable_ops *ops,
return iopte_to_paddr(pte, data) | iova;
  }
  
+static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data,
+				     unsigned long iova, size_t size, int lvl,
+				     arm_lpae_iopte *ptep);
+
+static size_t arm_lpae_do_split_blk(struct arm_lpae_io_pgtable *data,
+   unsigned long iova, size_t size,
+   arm_lpae_iopte blk_pte, int lvl,
+   arm_lpae_iopte *ptep)
+{
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   arm_lpae_iopte pte, *tablep;
+   phys_addr_t blk_paddr;
+   size_t tablesz = ARM_LPAE_GRANULE(data);
+   size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+   int i;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
+   if (!tablep)
+   return 0;
+
+   blk_paddr = iopte_to_paddr(blk_pte, data);
+   

Re: [RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU

2021-02-04 Thread Robin Murphy

On 2021-01-28 15:17, Keqian Zhu wrote:

From: jiangkunkun 

An SMMU which supports HTTU (Hardware Translation Table Update) can
update the access flag and the dirty state of the TTD by hardware. This is
essential for tracking the dirty pages of DMA.

This adds feature detection only; no functional change.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 
  include/linux/io-pgtable.h  |  1 +
  3 files changed, 25 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8ca7415d785d..0f0fe71cc10d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1987,6 +1987,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
.pgsize_bitmap  = smmu->pgsize_bitmap,
.ias= ias,
.oas= oas,
+   .httu_hd= smmu->features & ARM_SMMU_FEAT_HTTU_HD,
.coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENCY,
.tlb= &arm_smmu_flush_ops,
.iommu_dev  = smmu->dev,
@@ -3224,6 +3225,21 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
if (reg & IDR0_HYP)
smmu->features |= ARM_SMMU_FEAT_HYP;
  
+	switch (FIELD_GET(IDR0_HTTU, reg)) {


We need to accommodate the firmware override as well if we need this to 
be meaningful. Jean-Philippe is already carrying a suitable patch in the 
SVA stack[1].



+   case IDR0_HTTU_NONE:
+   break;
+   case IDR0_HTTU_HA:
+   smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
+   break;
+   case IDR0_HTTU_HAD:
+   smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
+   smmu->features |= ARM_SMMU_FEAT_HTTU_HD;
+   break;
+   default:
+   dev_err(smmu->dev, "unknown/unsupported HTTU!\n");
+   return -ENXIO;
+   }
+
/*
 * The coherency feature as set by FW is used in preference to the ID
 * register, but warn on mismatch.
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 96c2e9565e00..e91bea44519e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -33,6 +33,10 @@
  #define IDR0_ASID16   (1 << 12)
  #define IDR0_ATS  (1 << 10)
  #define IDR0_HYP  (1 << 9)
+#define IDR0_HTTU  GENMASK(7, 6)
+#define IDR0_HTTU_NONE 0
+#define IDR0_HTTU_HA   1
+#define IDR0_HTTU_HAD  2
  #define IDR0_COHACC   (1 << 4)
  #define IDR0_TTF  GENMASK(3, 2)
  #define IDR0_TTF_AARCH64  2
@@ -286,6 +290,8 @@
  #define CTXDESC_CD_0_TCR_TBI0 (1ULL << 38)
  
  #define CTXDESC_CD_0_AA64		(1UL << 41)

+#define CTXDESC_CD_0_HD		(1UL << 42)
+#define CTXDESC_CD_0_HA		(1UL << 43)
  #define CTXDESC_CD_0_S(1UL << 44)
  #define CTXDESC_CD_0_R(1UL << 45)
  #define CTXDESC_CD_0_A(1UL << 46)
@@ -604,6 +610,8 @@ struct arm_smmu_device {
  #define ARM_SMMU_FEAT_RANGE_INV   (1 << 15)
  #define ARM_SMMU_FEAT_BTM (1 << 16)
  #define ARM_SMMU_FEAT_SVA (1 << 17)
+#define ARM_SMMU_FEAT_HTTU_HA  (1 << 18)
+#define ARM_SMMU_FEAT_HTTU_HD  (1 << 19)
u32 features;
  
  #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ea727eb1a1a9..1a00ea8562c7 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -97,6 +97,7 @@ struct io_pgtable_cfg {
unsigned long   pgsize_bitmap;
unsigned intias;
unsigned intoas;
+   boolhttu_hd;


This is very specific to the AArch64 stage 1 format, not a generic 
capability - I think it should be a quirk flag rather than a common field.


Robin.

[1] 
https://jpbrucker.net/git/linux/commit/?h=sva/current&id=1ef7d512fb9082450dfe0d22ca4f7e35625a097b



boolcoherent_walk;
const struct iommu_flush_ops*tlb;
struct device   *iommu_dev;


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/8] driver core: add a min_align_mask field to struct device_dma_parameters

2021-02-04 Thread Greg KH
On Thu, Feb 04, 2021 at 08:30:28PM +0100, Christoph Hellwig wrote:
> From: Jianxiong Gao 
> 
> Some devices rely on the address offset in a page to function
> correctly (NVMe driver as an example). These devices may use
> a different page size than the Linux kernel. The address offset
> has to be preserved upon mapping, and in order to do so, we
> need to record the page_offset_mask first.
> 
> Signed-off-by: Jianxiong Gao 
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/device.h  |  1 +
>  include/linux/dma-mapping.h | 16 
>  2 files changed, 17 insertions(+)

Acked-by: Greg Kroah-Hartman 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC v1 2/6] swiotlb: convert variables to arrays

2021-02-04 Thread Konrad Rzeszutek Wilk
On Thu, Feb 04, 2021 at 11:49:23AM +, Robin Murphy wrote:
> On 2021-02-04 07:29, Christoph Hellwig wrote:
> > On Wed, Feb 03, 2021 at 03:37:05PM -0800, Dongli Zhang wrote:
> > > This patch converts several swiotlb related variables to arrays, in
> > > order to maintain stat/status for different swiotlb buffers. Here are
> > > variables involved:
> > > 
> > > - io_tlb_start and io_tlb_end
> > > - io_tlb_nslabs and io_tlb_used
> > > - io_tlb_list
> > > - io_tlb_index
> > > - max_segment
> > > - io_tlb_orig_addr
> > > - no_iotlb_memory
> > > 
> > > There is no functional change and this is to prepare to enable 64-bit
> > > swiotlb.
> > 
> > Claire Chang (on Cc) already posted a patch like this a month ago,
> > which looks much better because it actually uses a struct instead
> > of all the random variables.
> 
> Indeed, I skimmed the cover letter and immediately thought that this whole
> thing is just the restricted DMA pool concept[1] again, only from a slightly
> different angle.


Kind of. Let me lay out how some of these pieces are right now:

+---+  +--+
|   |  |  |
|   |  |  |
|   a)Xen-SWIOTLB   |  | b)SWIOTLB (for !Xen) |
|   |  |  |
+---XX--+  +---X--+
   X
  XX XXX
X   XX

 +--XX---+
 |   |
 |   |
 |   c) SWIOTLB generic  |
 |   |
 +---+

Dongli's patches modify the SWIOTLB generic c), and Xen-SWIOTLB a)
parts.

Also see the IOMMU_INIT logic which lays this out in a bit more depth
(for example how to enable SWIOTLB on AMD boxes, or IBM with Calgary
IOMMU, etc - see iommu_table.h).

Furthermore it lays the groundwork to allocate AMD SEV SWIOTLB buffers
later after boot (so that you can stitch different pools together).
All the bits are kind of inside of the SWIOTLB code. And also it changes
the Xen-SWIOTLB to do something similar.

The mempool series did it similarly by taking the internal parts (aka the
various io_tlb variables) of SWIOTLB, exposing them, and having
other code:

+---+  +--+
|   |  |  |
|   |  |  |
| a)Xen-SWIOTLB |  | b)SWIOTLB (for !Xen) |
|   |  |  |
+---XX--+  +---X--+
   X
  XX XXX
X   XX

 +--XX---+ +--+
 |   | | Device tree  |
 |   +<+ enabling SWIOTLB |
 |c) SWIOTLB generic | |  |
 |   | | mempool  |
 +---+ +--+

What I was suggesting to Claire was to follow the Xen model, that is,
do something like this:

+---+  +--+   ++
|   |  |  |   ||
|   |  |  |   ||
| a)Xen-SWIOTLB |  | b)SWIOTLB (for !Xen) |   | e) DT-SWIOTLB  |
|   |  |  |   ||
+---XX--+  +---X--+   +XX-X+
   XXXX X X XX X XX
  XX XXX
X   XX X

 +--XXX--+
 |   |
 |   |
 |c) SWIOTLB generic |
 |   |
 +---+


so using the SWIOTLB generic parts, and then bolt on top
of the device-tree logic, along with the mempool logic.



But Christoph has an interesting suggestion, which is
to squash all the existing code (a, b, c) together
and pepper it with various jump-tables.


So:


-+
| SWIOTLB:   |
||
|  a) SWIOTLB (for non-Xen)  |
|  b) Xen-SWIOTLB|
|  c) DT-SWIOTLB |
||
||
-+


with all the various bits (M2P/P2M for Xen, mempool for ARM,
and normal allocation for BM) in one big file.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/l

Re: [PATCH 8/8] nvme-pci: set min_align_mask

2021-02-04 Thread Christoph Hellwig
> + dma_set_min_align_mask(&pdev->dev, NVME_CTRL_PAGE_SIZE - 1);

And due to a last minute change from me this doesn't actually compile,
as pdev should be dev.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 7/8] swiotlb: respect min_align_mask

2021-02-04 Thread Christoph Hellwig
Respect the min_align_mask in struct device_dma_parameters in swiotlb.

There are two parts to it:
 1) for the lower bits of the alignment inside the io tlb slot, just
    extend the size of the allocation and leave the start of the slot
    empty
 2) for the high bits, ensure we find a slot that matches the high bits
    of the alignment to avoid wasting too much memory

Based on an earlier patch from Jianxiong Gao .
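
To make the two parts concrete, here is a tiny standalone sketch (not part
of the patch; the 2 KiB slot size, the mask and the addresses are
assumptions for illustration) of how the low bits of the original address
survive into the bounce address:

    /* Sketch only: mirrors the in-slot offset logic described above,
     * assuming 2 KiB IO TLB slots (IO_TLB_SHIFT == 11).
     */
    #include <stdio.h>

    #define IO_TLB_SHIFT	11

    static unsigned long align_offset(unsigned long orig_addr,
    				      unsigned long min_align_mask)
    {
    	/* part 1: offset inside the slot that must be preserved */
    	return orig_addr & min_align_mask & ((1UL << IO_TLB_SHIFT) - 1);
    }

    int main(void)
    {
    	unsigned long orig = 0x12345a34;  /* original buffer address */
    	unsigned long mask = 0xfff;       /* e.g. NVME_CTRL_PAGE_SIZE - 1 */
    	/* part 2 is assumed to have picked a slot whose address already
    	 * matches the mask bits above the slot size (bit 11 here) */
    	unsigned long slot = 0x80000800;
    	unsigned long tlb_addr = slot + align_offset(orig, mask);

    	/* both print a34: the device-visible offset is unchanged */
    	printf("orig 0x%03lx, bounce 0x%03lx\n", orig & mask, tlb_addr & mask);
    	return 0;
    }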

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 49 +---
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6a2439826a1ba4..ab3192142b9906 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -468,6 +468,18 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
phys_addr_t tlb_addr,
}
 }
 
+/*
+ * Return the offset into a iotlb slot required to keep the device happy.
+ */
+static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
+{
+   unsigned min_align_mask = dma_get_min_align_mask(dev);
+
+   if (!min_align_mask)
+   return 0;
+   return addr & min_align_mask & ((1 << IO_TLB_SHIFT) - 1);
+}
+
 /*
  * Carefully handle integer overflow which can occur when boundary_mask == 
~0UL.
  */
@@ -478,6 +490,16 @@ static inline unsigned long get_max_slots(unsigned long 
boundary_mask)
return nr_slots(boundary_mask + 1);
 }
 
+static inline bool check_alignment(phys_addr_t orig_addr,
+   dma_addr_t tbl_dma_addr, unsigned int index,
+   unsigned int min_align_mask)
+{
+   if (!min_align_mask)
+   return true;
+   return ((tbl_dma_addr + (index << IO_TLB_SHIFT)) & min_align_mask) ==
+   (orig_addr & min_align_mask);
+}
+
 static unsigned int wrap_index(unsigned int index)
 {
if (index >= io_tlb_nslabs)
@@ -489,9 +511,11 @@ static unsigned int wrap_index(unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int find_slots(struct device *dev, size_t alloc_size,
-   dma_addr_t tbl_dma_addr)
+static int find_slots(struct device *dev, phys_addr_t orig_addr,
+   size_t alloc_size, dma_addr_t tbl_dma_addr)
 {
+   unsigned int min_align_mask = dma_get_min_align_mask(dev) &
+   ~((1 << IO_TLB_SHIFT) - 1);
unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev));
unsigned int nslots = nr_slots(alloc_size), stride = 1;
unsigned int index, wrap, count = 0, i;
@@ -503,7 +527,9 @@ static int find_slots(struct device *dev, size_t alloc_size,
 * For mappings greater than or equal to a page, we limit the stride
 * (and hence alignment) to a page size.
 */
-   if (alloc_size >= PAGE_SIZE)
+   if (min_align_mask)
+   stride = (min_align_mask + 1) >> IO_TLB_SHIFT;
+   else if (alloc_size >= PAGE_SIZE)
stride <<= (PAGE_SHIFT - IO_TLB_SHIFT);
 
spin_lock_irqsave(&io_tlb_lock, flags);
@@ -512,6 +538,12 @@ static int find_slots(struct device *dev, size_t 
alloc_size,
 
index = wrap = wrap_index(ALIGN(io_tlb_index, stride));
do {
+   if (!check_alignment(orig_addr, tbl_dma_addr, index,
+min_align_mask)) {
+   index = wrap_index(index + 1);
+   continue;
+   }
+
/*
 * If we find a slot that indicates we have 'nslots' number of
 * contiguous buffers, we allocate the buffers from that slot
@@ -557,6 +589,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
 {
dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(dev, io_tlb_start) &
dma_get_seg_boundary(dev);
+   unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int index, i;
phys_addr_t tlb_addr;
 
@@ -572,7 +605,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
return (phys_addr_t)DMA_MAPPING_ERROR;
}
 
-   index = find_slots(dev, alloc_size, tbl_dma_addr);
+   alloc_size += offset;
+   index = find_slots(dev, orig_addr, alloc_size, tbl_dma_addr);
if (index == -1) {
if (!(attrs & DMA_ATTR_NO_WARN))
dev_warn_ratelimited(dev,
@@ -589,7 +623,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
for (i = 0; i < nr_slots(alloc_size); i++)
io_tlb_orig_addr[index + i] = orig_addr + (i << IO_TLB_SHIFT);
 
-   tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
+   tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bou

[PATCH 8/8] nvme-pci: set min_align_mask

2021-02-04 Thread Christoph Hellwig
From: Jianxiong Gao 

The PRP addressing scheme requires all PRP entries except for the
first one to have a zero offset into the NVMe controller pages (which
can be different from the Linux PAGE_SIZE).  Use the min_align_mask
device parameter to ensure that swiotlb does not change the address
of the buffer modulo the device page size to ensure that the PRPs
won't be malformed.
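
As a quick illustration of the constraint (not from the patch; the
controller page size and addresses below are made-up values, and the real
driver switches to a PRP list once more than two entries are needed):

    /* Sketch: every PRP entry after the first must point to the start of a
     * controller page, so the offset of the first mapping must be preserved.
     */
    #include <stdio.h>

    #define CTRL_PAGE_SIZE 4096ULL

    static void build_prps(unsigned long long dma_addr, unsigned long long len)
    {
    	unsigned long long offset = dma_addr & (CTRL_PAGE_SIZE - 1);
    	unsigned long long addr;

    	printf("PRP1 = 0x%llx (offset 0x%llx is allowed here)\n",
    	       dma_addr, offset);

    	/* every further entry starts on a controller page boundary */
    	for (addr = dma_addr - offset + CTRL_PAGE_SIZE;
    	     addr < dma_addr + len; addr += CTRL_PAGE_SIZE)
    		printf("PRPn = 0x%llx (offset must be 0)\n", addr);
    }

    int main(void)
    {
    	build_prps(0x100000a00ULL, 3 * CTRL_PAGE_SIZE);
    	return 0;
    }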

Signed-off-by: Jianxiong Gao 
Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 81e6389b204205..5d194b4e814719 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2629,6 +2629,7 @@ static void nvme_reset_work(struct work_struct *work)
 * Don't limit the IOMMU merged segment size.
 */
dma_set_max_seg_size(dev->dev, 0x);
+   dma_set_min_align_mask(&pdev->dev, NVME_CTRL_PAGE_SIZE - 1);
 
mutex_unlock(&dev->shutdown_lock);
 
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 5/8] swiotlb: refactor swiotlb_tbl_map_single

2021-02-04 Thread Christoph Hellwig
Split out a bunch of self-contained helpers to make the function easier
to follow.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 177 +--
 1 file changed, 87 insertions(+), 90 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 79d5b236f25f10..e78615e3be2906 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -468,134 +468,131 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
phys_addr_t tlb_addr,
}
 }
 
-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
-   size_t mapping_size, size_t alloc_size,
-   enum dma_data_direction dir, unsigned long attrs)
+/*
+ * Carefully handle integer overflow which can occur when boundary_mask == 
~0UL.
+ */
+static inline unsigned long get_max_slots(unsigned long boundary_mask)
 {
-   dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
-   unsigned long flags;
-   phys_addr_t tlb_addr;
-   unsigned int nslots, stride, index, wrap;
-   int i;
-   unsigned long mask;
-   unsigned long offset_slots;
-   unsigned long max_slots;
-   unsigned long tmp_io_tlb_used;
-
-   if (no_iotlb_memory)
-   panic("Can not allocate SWIOTLB buffer earlier and can't now 
provide you with the DMA bounce buffer");
-
-   if (mem_encrypt_active())
-   pr_warn_once("Memory encryption is active and system is using 
DMA bounce buffers\n");
-
-   if (mapping_size > alloc_size) {
-   dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: 
%zd bytes)",
- mapping_size, alloc_size);
-   return (phys_addr_t)DMA_MAPPING_ERROR;
-   }
-
-   mask = dma_get_seg_boundary(hwdev);
+   if (boundary_mask + 1 == ~0UL)
+   return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+   return nr_slots(boundary_mask + 1);
+}
 
-   tbl_dma_addr &= mask;
+static unsigned int wrap_index(unsigned int index)
+{
+   if (index >= io_tlb_nslabs)
+   return 0;
+   return index;
+}
 
-   offset_slots = nr_slots(tbl_dma_addr);
+/*
+ * Find a suitable number of IO TLB entries size that will fit this request and
+ * allocate a buffer from that IO TLB pool.
+ */
+static int find_slots(struct device *dev, size_t alloc_size,
+   dma_addr_t tbl_dma_addr)
+{
+   unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev));
+   unsigned int nslots = nr_slots(alloc_size), stride = 1;
+   unsigned int index, wrap, count = 0, i;
+   unsigned long flags;
 
-   /*
-* Carefully handle integer overflow which can occur when mask == ~0UL.
-*/
-   max_slots = mask + 1
-   ? nr_slots(mask + 1)
-   : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+   BUG_ON(!nslots);
 
/*
 * For mappings greater than or equal to a page, we limit the stride
 * (and hence alignment) to a page size.
 */
-   nslots = nr_slots(alloc_size);
if (alloc_size >= PAGE_SIZE)
-   stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
-   else
-   stride = 1;
-
-   BUG_ON(!nslots);
+   stride <<= (PAGE_SHIFT - IO_TLB_SHIFT);
 
-   /*
-* Find suitable number of IO TLB entries size that will fit this
-* request and allocate a buffer from that IO TLB pool.
-*/
spin_lock_irqsave(&io_tlb_lock, flags);
-
if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
goto not_found;
 
-   index = ALIGN(io_tlb_index, stride);
-   if (index >= io_tlb_nslabs)
-   index = 0;
-   wrap = index;
-
+   index = wrap = wrap_index(ALIGN(io_tlb_index, stride));
do {
-   while (iommu_is_span_boundary(index, nslots, offset_slots,
- max_slots)) {
-   index += stride;
-   if (index >= io_tlb_nslabs)
-   index = 0;
-   if (index == wrap)
-   goto not_found;
-   }
-
/*
 * If we find a slot that indicates we have 'nslots' number of
 * contiguous buffers, we allocate the buffers from that slot
 * and mark the entries as '0' indicating unavailable.
 */
-   if (io_tlb_list[index] >= nslots) {
-   int count = 0;
-
-   for (i = index; i < (int) (index + nslots); i++)
-   io_tlb_list[i] = 0;
-   for (i = index - 1;
-io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
-io_tlb_list[i]; i--)
-   io_tlb_list[i] = ++count;
-   tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT

[PATCH 6/8] swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single

2021-02-04 Thread Christoph Hellwig
swiotlb_tbl_map_single currently never sets a tlb_addr that is not
aligned to the tlb bucket size.  But we're going to add such a case
soon, for which this adjustment would be bogus.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e78615e3be2906..6a2439826a1ba4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -658,7 +658,6 @@ void swiotlb_tbl_sync_single(struct device *hwdev, 
phys_addr_t tlb_addr,
 
if (orig_addr == INVALID_PHYS_ADDR)
return;
-   orig_addr += (unsigned long)tlb_addr & ((1 << IO_TLB_SHIFT) - 1);
 
switch (target) {
case SYNC_FOR_CPU:
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


preserve DMA offsets when using swiotlb

2021-02-04 Thread Christoph Hellwig
Hi all,

this series makes NVMe happy when running with swiotlb.  It caters
to completely broken NVMe controllers that ignore the
specification (hello to the biggest cloud provider on the planet!),
to crappy SoCs that have addressing limitations, or to "secure"
virtualization that forces bounce buffering to enhance the user
experience.  Or in other words, no one sane should hit it, but
people do.

It is basically a respin of the

"SWIOTLB: Preserve swiotlb map offset when needed."

series from Jianxiong Gao.  It completely rewrites the swiotlb part so that
the offset really is preserved and not just the offset into the swiotlb
slot, and to do so it grew half a dozen patches to refactor the swiotlb
so that a mere mortal like me could actually understand it.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3/8] swiotlb: factor out a nr_slots helper

2021-02-04 Thread Christoph Hellwig
Factor out a helper to find the number of slots for a given size.
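
In effect it is a round-up division by the slot size; a tiny standalone
sketch (assuming the usual 2 KiB slots, i.e. IO_TLB_SHIFT == 11):

    #include <stdio.h>

    #define IO_TLB_SHIFT 11
    /* round x up to a multiple of the power-of-two a */
    #define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((unsigned long)(a) - 1))

    static unsigned long nr_slots(unsigned long val)
    {
    	return ALIGN_UP(val, 1UL << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
    }

    int main(void)
    {
    	printf("%lu\n", nr_slots(4097));  /* 3 slots */
    	printf("%lu\n", nr_slots(2048));  /* 1 slot */
    	return 0;
    }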

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 838dbad10ab916..0c0b81799edbdb 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -194,6 +194,11 @@ static inline unsigned long io_tlb_offset(unsigned long 
val)
return val & (IO_TLB_SEGSIZE - 1);
 }
 
+static unsigned long nr_slots(u64 val)
+{
+   return ALIGN(val, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -493,20 +498,20 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
 
tbl_dma_addr &= mask;
 
-   offset_slots = ALIGN(tbl_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+   offset_slots = nr_slots(tbl_dma_addr);
 
/*
 * Carefully handle integer overflow which can occur when mask == ~0UL.
 */
max_slots = mask + 1
-   ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
+   ? nr_slots(mask + 1)
: 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
 
/*
 * For mappings greater than or equal to a page, we limit the stride
 * (and hence alignment) to a page size.
 */
-   nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+   nslots = nr_slots(alloc_size);
if (alloc_size >= PAGE_SIZE)
stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
else
@@ -602,7 +607,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
  enum dma_data_direction dir, unsigned long attrs)
 {
unsigned long flags;
-   int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> 
IO_TLB_SHIFT;
+   int i, count, nslots = nr_slots(alloc_size);
int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 4/8] swiotlb: clean up swiotlb_tbl_unmap_single

2021-02-04 Thread Christoph Hellwig
Remove a layer of pointless indentation and replace a hard-to-follow
ternary expression with a plain if/else.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 41 +
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0c0b81799edbdb..79d5b236f25f10 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -626,28 +626,29 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
 * with slots below and above the pool being returned.
 */
spin_lock_irqsave(&io_tlb_lock, flags);
-   {
-   count = ((index + nslots) < ALIGN(index + 1, IO_TLB_SEGSIZE) ?
-io_tlb_list[index + nslots] : 0);
-   /*
-* Step 1: return the slots to the free list, merging the
-* slots with superceeding slots
-*/
-   for (i = index + nslots - 1; i >= index; i--) {
-   io_tlb_list[i] = ++count;
-   io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
-   }
-   /*
-* Step 2: merge the returned slots with the preceding slots,
-* if available (non zero)
-*/
-   for (i = index - 1;
-io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
-io_tlb_list[i]; i--)
-   io_tlb_list[i] = ++count;
+   if (index + nslots < ALIGN(index + 1, IO_TLB_SEGSIZE))
+   count = io_tlb_list[index + nslots];
+   else
+   count = 0;
 
-   io_tlb_used -= nslots;
+   /*
+* Step 1: return the slots to the free list, merging the slots with
+* superceeding slots
+*/
+   for (i = index + nslots - 1; i >= index; i--) {
+   io_tlb_list[i] = ++count;
+   io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
}
+
+   /*
+* Step 2: merge the returned slots with the preceding slots, if
+* available (non zero)
+*/
+   for (i = index - 1;
+io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && io_tlb_list[i];
+i--)
+   io_tlb_list[i] = ++count;
+   io_tlb_used -= nslots;
spin_unlock_irqrestore(&io_tlb_lock, flags);
 }
 
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/8] swiotlb: add a io_tlb_offset helper

2021-02-04 Thread Christoph Hellwig
Replace the very generically named OFFSET macro with a little inline
helper that hardcodes the alignment to the only value ever passed.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e61001d..838dbad10ab916 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,9 +50,6 @@
 #define CREATE_TRACE_POINTS
 #include 
 
-#define OFFSET(val,align) ((unsigned long) \
-  ( (val) & ( (align) - 1)))
-
 #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
 
 /*
@@ -192,6 +189,11 @@ void swiotlb_print_info(void)
   bytes >> 20);
 }
 
+static inline unsigned long io_tlb_offset(unsigned long val)
+{
+   return val & (IO_TLB_SEGSIZE - 1);
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -241,7 +243,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long 
nslabs, int verbose)
  __func__, alloc_size, PAGE_SIZE);
 
for (i = 0; i < io_tlb_nslabs; i++) {
-   io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+   io_tlb_list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i);
io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
}
io_tlb_index = 0;
@@ -375,7 +377,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
goto cleanup4;
 
for (i = 0; i < io_tlb_nslabs; i++) {
-   io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+   io_tlb_list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i);
io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
}
io_tlb_index = 0;
@@ -546,7 +548,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
 
for (i = index; i < (int) (index + nslots); i++)
io_tlb_list[i] = 0;
-   for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != 
IO_TLB_SEGSIZE - 1) && io_tlb_list[i]; i--)
+   for (i = index - 1;
+io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
+io_tlb_list[i]; i--)
io_tlb_list[i] = ++count;
tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
 
@@ -632,7 +636,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
 * Step 2: merge the returned slots with the preceding slots,
 * if available (non zero)
 */
-   for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != 
IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
+   for (i = index - 1;
+io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
+io_tlb_list[i]; i--)
io_tlb_list[i] = ++count;
 
io_tlb_used -= nslots;
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/8] driver core: add a min_align_mask field to struct device_dma_parameters

2021-02-04 Thread Christoph Hellwig
From: Jianxiong Gao 

Some devices rely on the address offset in a page to function
correctly (NVMe driver as an example). These devices may use
a different page size than the Linux kernel. The address offset
has to be preserved upon mapping, and in order to do so, we
need to record the page_offset_mask first.
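
A consumer would typically set the mask once at probe time. A hedged
fragment of such a call (the device pointer and the 4 KiB mask are
assumptions for illustration; patch 8/8 adds the real NVMe user):

    /* Preserve the low 12 address bits of this device's buffers across DMA
     * mappings. Assumes dev->dma_parms has been set up by the bus code;
     * otherwise dma_set_min_align_mask() warns once and returns -EIO.
     */
    if (dma_set_min_align_mask(dev, SZ_4K - 1))
    	dev_warn(dev, "failed to set min_align_mask\n");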

Signed-off-by: Jianxiong Gao 
Signed-off-by: Christoph Hellwig 
---
 include/linux/device.h  |  1 +
 include/linux/dma-mapping.h | 16 
 2 files changed, 17 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 1779f90eeb4cb4..7960bf516dd7fe 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -291,6 +291,7 @@ struct device_dma_parameters {
 * sg limitations.
 */
unsigned int max_segment_size;
+   unsigned int min_align_mask;
unsigned long segment_boundary_mask;
 };
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2e49996a8f391a..9c26225754e719 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -500,6 +500,22 @@ static inline int dma_set_seg_boundary(struct device *dev, 
unsigned long mask)
return -EIO;
 }
 
+static inline unsigned int dma_get_min_align_mask(struct device *dev)
+{
+   if (dev->dma_parms)
+   return dev->dma_parms->min_align_mask;
+   return 0;
+}
+
+static inline int dma_set_min_align_mask(struct device *dev,
+   unsigned int min_align_mask)
+{
+   if (WARN_ON_ONCE(!dev->dma_parms))
+   return -EIO;
+   dev->dma_parms->min_align_mask = min_align_mask;
+   return 0;
+}
+
 static inline int dma_get_cache_alignment(void)
 {
 #ifdef ARCH_DMA_MINALIGN
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/2] iommu/vt-d: Use Real PCI DMA device for IRTE

2021-02-04 Thread Jon Derrick
VMD retransmits child device MSI/X with the VMD endpoint's requester-id.
In order to support direct interrupt remapping of VMD child devices,
ensure that the IRTE is programmed with the VMD endpoint's requester-id
using pci_real_dma_dev().

Signed-off-by: Jon Derrick 
---
 drivers/iommu/intel/irq_remapping.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/irq_remapping.c 
b/drivers/iommu/intel/irq_remapping.c
index 685200a5cff0..1939e070eec8 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1276,7 +1276,8 @@ static void intel_irq_remapping_prepare_irte(struct 
intel_ir_data *data,
break;
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
+   set_msi_sid(irte,
+   pci_real_dma_dev(msi_desc_to_pci_dev(info->desc)));
break;
default:
BUG_ON(1);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/2] PCI: vmd: Disable MSI/X remapping when possible

2021-02-04 Thread Jon Derrick
VMD will retransmit child device MSI/X using its own MSI/X table and
requester-id. This limits the number of MSI/X available to the whole
child device domain to the number of VMD MSI/X interrupts.

Some VMD devices have a mode where this remapping can be disabled,
allowing child device interrupts to bypass processing with the VMD MSI/X
domain interrupt handler and go straight to the child device interrupt
handler, allowing for better performance and scaling. The requester-id
still gets changed to the VMD endpoint's requester-id, and the interrupt
remapping handlers have been updated to properly set the IRTE for child
device interrupts to the VMD endpoint's context.

Some VMD platforms have existing production BIOS which rely on MSI/X
remapping and won't explicitly program the MSI/X remapping bit. This
re-enables MSI/X remapping on unload.

Signed-off-by: Jon Derrick 
---
 drivers/pci/controller/vmd.c | 60 
 1 file changed, 48 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 5e80f28f0119..a319ce49645b 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -59,6 +59,13 @@ enum vmd_features {
 * be used for MSI remapping
 */
VMD_FEAT_OFFSET_FIRST_VECTOR= (1 << 3),
+
+   /*
+* Device can bypass remapping MSI/X transactions into its MSI/X table,
+* avoiding the requirement of a VMD MSI domain for child device
+* interrupt handling
+*/
+   VMD_FEAT_BYPASS_MSI_REMAP   = (1 << 4),
 };
 
 /*
@@ -306,6 +313,15 @@ static struct msi_domain_info vmd_msi_domain_info = {
.chip   = &vmd_msi_controller,
 };
 
+static void vmd_enable_msi_remapping(struct vmd_dev *vmd, bool enable)
+{
+   u16 reg;
+
+   pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG, ®);
+   reg = enable ? (reg & ~0x2) : (reg | 0x2);
+   pci_write_config_word(vmd->dev, PCI_REG_VMCONFIG, reg);
+}
+
 static int vmd_create_irq_domain(struct vmd_dev *vmd)
 {
struct fwnode_handle *fn;
@@ -325,6 +341,13 @@ static int vmd_create_irq_domain(struct vmd_dev *vmd)
 
 static void vmd_remove_irq_domain(struct vmd_dev *vmd)
 {
+   /*
+* Some production BIOS won't enable remapping between soft reboots.
+* Ensure remapping is restored before unloading the driver.
+*/
+   if (!vmd->msix_count)
+   vmd_enable_msi_remapping(vmd, true);
+
if (vmd->irq_domain) {
struct fwnode_handle *fn = vmd->irq_domain->fwnode;
 
@@ -679,15 +702,31 @@ static int vmd_enable_domain(struct vmd_dev *vmd, 
unsigned long features)
 
sd->node = pcibus_to_node(vmd->dev->bus);
 
-   ret = vmd_create_irq_domain(vmd);
-   if (ret)
-   return ret;
-
/*
-* Override the irq domain bus token so the domain can be distinguished
-* from a regular PCI/MSI domain.
+* Currently MSI remapping must be enabled in guest passthrough mode
+* due to some missing interrupt remapping plumbing. This is probably
+* acceptable because the guest is usually CPU-limited and MSI
+* remapping doesn't become a performance bottleneck.
 */
-   irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
+   if (!(features & VMD_FEAT_BYPASS_MSI_REMAP) || offset[0] || offset[1]) {
+   ret = vmd_alloc_irqs(vmd);
+   if (ret)
+   return ret;
+
+   vmd_enable_msi_remapping(vmd, true);
+
+   ret = vmd_create_irq_domain(vmd);
+   if (ret)
+   return ret;
+
+   /*
+* Override the irq domain bus token so the domain can be
+* distinguished from a regular PCI/MSI domain.
+*/
+   irq_domain_update_bus_token(vmd->irq_domain, 
DOMAIN_BUS_VMD_MSI);
+   } else {
+   vmd_enable_msi_remapping(vmd, false);
+   }
 
pci_add_resource(&resources, &vmd->resources[0]);
pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]);
@@ -753,10 +792,6 @@ static int vmd_probe(struct pci_dev *dev, const struct 
pci_device_id *id)
if (features & VMD_FEAT_OFFSET_FIRST_VECTOR)
vmd->first_vec = 1;
 
-   err = vmd_alloc_irqs(vmd);
-   if (err)
-   return err;
-
spin_lock_init(&vmd->cfg_lock);
pci_set_drvdata(dev, vmd);
err = vmd_enable_domain(vmd, features);
@@ -825,7 +860,8 @@ static const struct pci_device_id vmd_ids[] = {
.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP,},
{PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_28C0),
.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW |
-   VMD_FEAT_HAS_BUS_RESTRICTIONS,},
+   VMD_FEAT_HAS_BUS_RESTRICTIONS |
+   VM

[PATCH v2 0/2] VMD MSI Remapping Bypass

2021-02-04 Thread Jon Derrick
The Intel Volume Management Device acts similarly to a PCI-to-PCI bridge in that
it changes downstream devices' requester-ids to its own. As VMD supports PCIe
devices, it has its own MSI/X table and transmits child device MSI/X by
remapping child device MSI/X and handling it like a demultiplexer.

Some newer VMD devices (Icelake Server) have an option to bypass the VMD MSI/X
remapping table. This allows for better performance scaling as the child device
MSI/X won't be limited by VMD's MSI/X count and IRQ handler.

V1->V2:
Updated for 5.12-next
Moved IRQ allocation and remapping enable/disable to a more logical location

V1 patches 1-4 were already merged
V1, 5/6: 
https://patchwork.kernel.org/project/linux-pci/patch/20200728194945.14126-6-jonathan.derr...@intel.com/
V1, 6/6: 
https://patchwork.kernel.org/project/linux-pci/patch/20200728194945.14126-7-jonathan.derr...@intel.com/

Jon Derrick (2):
  iommu/vt-d: Use Real PCI DMA device for IRTE
  PCI: vmd: Disable MSI/X remapping when possible

 drivers/iommu/intel/irq_remapping.c |  3 +-
 drivers/pci/controller/vmd.c| 60 +++--
 2 files changed, 50 insertions(+), 13 deletions(-)

-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-02-04 Thread Michael Kelley via iommu
From: Wei Liu  Sent: Wednesday, February 3, 2021 7:05 AM
> 
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
> 
> Signed-off-by: Wei Liu 
> ---
> v6:
> 1. Simplify code due to changes in a previous patch.
> ---
>  arch/x86/hyperv/irqdomain.c |  25 +
>  arch/x86/include/asm/mshyperv.h |   4 +
>  drivers/iommu/hyperv-iommu.c| 177 +++-
>  3 files changed, 203 insertions(+), 3 deletions(-)
> 

Reviewed-by: Michael Kelley 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-02-04 Thread Wei Liu
On Thu, Feb 04, 2021 at 04:41:47PM +, Michael Kelley wrote:
> From: Wei Liu  Sent: Wednesday, February 3, 2021 4:47 AM
[...]
> > > > +
> > > > +   if (level)
> > > > +   intr_desc->trigger_mode = 
> > > > HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > > +   else
> > > > +   intr_desc->trigger_mode = 
> > > > HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > > +
> > > > +   __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> > > > +
> > > > +   status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, 
> > > > input,
> > output) &
> > > > +HV_HYPERCALL_RESULT_MASK;
> > > > +   local_irq_restore(flags);
> > > > +
> > > > +   *entry = output->interrupt_entry;
> > > > +
> > > > +   return status;
> > >
> > > As a cross-check, I was comparing this code against 
> > > hv_map_msi_interrupt().  They are
> > > mostly parallel, though some of the assignments are done in a different 
> > > order.  It's a nit,
> > > but making them as parallel as possible would be nice. :-)
> > >
> > 
> > Indeed. I will see about factoring out a function.
> 
> If factoring out a separate helper function is clumsy, just having the 
> parallel code
> in the two functions be as similar as possible makes it easier to see what's 
> the
> same and what's different.
> 

No. It is not clumsy at all. I've done it in the newly posted v6.

I was baffled why I wrote hv_unmap_interrupt helper to be used by both
hv_unmap_ioapic_interrupt and hv_unmap_msi_interrupt in the previous
patch, but didn't write a hv_map_interrupt. Maybe I didn't have enough
coffee that day. :-/

Thanks for pointing out that issue. It definitely helped improve the
quality of this series.

Wei.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-02-04 Thread Michael Kelley via iommu
From: Wei Liu  Sent: Wednesday, February 3, 2021 4:47 AM
> 
> On Wed, Jan 27, 2021 at 05:47:08AM +, Michael Kelley wrote:
> > From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> > >
> > > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> > > Hypervisor when Linux runs as the root partition. Implement an IRQ
> > > domain to handle mapping and unmapping of IO-APIC interrupts.
> > >
> > > Signed-off-by: Wei Liu 
> > > ---
> > >  arch/x86/hyperv/irqdomain.c |  54 ++
> > >  arch/x86/include/asm/mshyperv.h |   4 +
> > >  drivers/iommu/hyperv-iommu.c| 179 +++-
> > >  3 files changed, 233 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > > index 19637cd60231..8e2b4e478b70 100644
> > > --- a/arch/x86/hyperv/irqdomain.c
> > > +++ b/arch/x86/hyperv/irqdomain.c
> > > @@ -330,3 +330,57 @@ struct irq_domain * __init 
> > > hv_create_pci_msi_domain(void)
> > >  }
> > >
> > >  #endif /* CONFIG_PCI_MSI */
> > > +
> > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> > > *entry)
> > > +{
> > > + union hv_device_id device_id;
> > > +
> > > + device_id.as_uint64 = 0;
> > > + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > > + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > > +
> > > + return hv_unmap_interrupt(device_id.as_uint64, entry) &
> HV_HYPERCALL_RESULT_MASK;
> >
> > The masking is already done in hv_unmap_interrupt.
> 
> Fixed.
> 
> >
> > > +}
> > > +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> > > +
> > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int 
> > > vector,
> > > + struct hv_interrupt_entry *entry)
> > > +{
> > > + unsigned long flags;
> > > + struct hv_input_map_device_interrupt *input;
> > > + struct hv_output_map_device_interrupt *output;
> > > + union hv_device_id device_id;
> > > + struct hv_device_interrupt_descriptor *intr_desc;
> > > + u16 status;
> > > +
> > > + device_id.as_uint64 = 0;
> > > + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > > + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > > +
> > > + local_irq_save(flags);
> > > + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > > + memset(input, 0, sizeof(*input));
> > > + intr_desc = &input->interrupt_descriptor;
> > > + input->partition_id = hv_current_partition_id;
> > > + input->device_id = device_id.as_uint64;
> > > + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > > + intr_desc->target.vector = vector;
> > > + intr_desc->vector_count = 1;
> > > +
> > > + if (level)
> > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > + else
> > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > +
> > > + __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> > > +
> > > + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input,
> output) &
> > > +  HV_HYPERCALL_RESULT_MASK;
> > > + local_irq_restore(flags);
> > > +
> > > + *entry = output->interrupt_entry;
> > > +
> > > + return status;
> >
> > As a cross-check, I was comparing this code against hv_map_msi_interrupt(). 
> >  They are
> > mostly parallel, though some of the assignments are done in a different 
> > order.  It's a nit,
> > but making them as parallel as possible would be nice. :-)
> >
> 
> Indeed. I will see about factoring out a function.

If factoring out a separate helper function is clumsy, just having the parallel 
code
in the two functions be as similar as possible makes it easier to see what's the
same and what's different.

> 
> > Same 64 vCPU comment applies here as well.
> >
> 
> This is changed to use vpset instead. Took me a bit of time to get it
> working because document is a bit lacking.
> 
> >
> > > +}
> > > +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> > > diff --git a/arch/x86/include/asm/mshyperv.h 
> > > b/arch/x86/include/asm/mshyperv.h
> > > index ccc849e25d5e..345d7c6f8c37 100644
> > > --- a/arch/x86/include/asm/mshyperv.h
> > > +++ b/arch/x86/include/asm/mshyperv.h
> > > @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union
> > > hv_msi_entry *msi_entry,
> > >
> > >  struct irq_domain *hv_create_pci_msi_domain(void);
> > >
> > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int 
> > > vector,
> > > + struct hv_interrupt_entry *entry);
> > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> > > *entry);
> > > +
> > >  #else /* CONFIG_HYPERV */
> > >  static inline void hyperv_init(void) {}
> > >  static inline void hyperv_setup_mmu_ops(void) {}
> > > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> > > index b7db6024e65c..6d35e4c303c6 100644
> > > --- a/drivers/iommu/hyperv-iommu.c
> > > +++ b/drivers/iommu/hyperv-iommu.c
> > > @@ -116,30 +116,43 @@ static const struct irq_domain_ops 
> > > hyperv_ir_domain_o

Re: [RFC PATCH 3/3] iommu/arm-smmu-v3: Add debug interfaces for SMMUv3

2021-02-04 Thread Robin Murphy

On 2021-01-29 09:06, Zhou Wang wrote:

This patch adds debug interfaces for the SMMUv3 driver in debugfs. It adds
debug related files under /sys/kernel/debug/iommu/smmuv3.

The user should first select the device and pasid by writing to pci_dev and
pasid (currently only PCI devices are supported):
echo <domain>:<bus>:<device>.<function> > /sys/kernel/debug/iommu/smmuv3/pci_dev
echo <pasid> > /sys/kernel/debug/iommu/smmuv3/pasid

Then the STE and CD values can be read with:
cat /sys/kernel/debug/iommu/smmuv3/ste
cat /sys/kernel/debug/iommu/smmuv3/cd

The stage 1 and stage 2 page tables can be dumped with:
cat /sys/kernel/debug/iommu/smmuv3/pt_dump_s1
cat /sys/kernel/debug/iommu/smmuv3/pt_dump_s2

For the STE, CD and page tables, the related device and pasid are those set
via the pci_dev and pasid files as above.

Signed-off-by: Zhou Wang 
---
  drivers/iommu/Kconfig   |  11 +
  drivers/iommu/arm/arm-smmu-v3/Makefile  |   1 +
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   3 +
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   8 +
  drivers/iommu/arm/arm-smmu-v3/debugfs.c | 398 
  5 files changed, 421 insertions(+)
  create mode 100644 drivers/iommu/arm/arm-smmu-v3/debugfs.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 192ef8f..4822c88 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -325,6 +325,17 @@ config ARM_SMMU_V3_SVA
  Say Y here if your system supports SVA extensions such as PCIe PASID
  and PRI.
  
+config ARM_SMMU_V3_DEBUGFS
+	bool "Export ARM SMMUv3 internals in Debugfs"
+   depends on ARM_SMMU_V3 && IOMMU_DEBUGFS
+   help
+ DO NOT ENABLE THIS OPTION UNLESS YOU REALLY KNOW WHAT YOU ARE DOING!
+
+ Expose ARM SMMUv3 internals in Debugfs.
+
+ This option is -NOT- intended for production environments, and should
+ only be enabled for debugging ARM SMMUv3.
+
  config S390_IOMMU
def_bool y if S390 && PCI
depends on S390 && PCI
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile 
b/drivers/iommu/arm/arm-smmu-v3/Makefile
index 54feb1ec..55b411a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
  arm_smmu_v3-objs-y += arm-smmu-v3.o
  arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
  arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
+obj-$(CONFIG_ARM_SMMU_V3_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b65f63e2..aac7fdb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3602,6 +3602,8 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
return ret;
}
  
+	arm_smmu_debugfs_init();

+
return arm_smmu_set_bus_ops(&arm_smmu_ops);
  }
  
@@ -3610,6 +3612,7 @@ static int arm_smmu_device_remove(struct platform_device *pdev)

struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
  
  	arm_smmu_set_bus_ops(NULL);

+   arm_smmu_debugfs_uninit();
iommu_device_unregister(&smmu->iommu);
iommu_device_sysfs_remove(&smmu->iommu);
arm_smmu_device_disable(smmu);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 3e7af39..31c4580 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -752,4 +752,12 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva 
*handle)
  
  static inline void arm_smmu_sva_notifier_synchronize(void) {}

  #endif /* CONFIG_ARM_SMMU_V3_SVA */
+
+#ifdef CONFIG_ARM_SMMU_V3_DEBUGFS
+void arm_smmu_debugfs_init(void);
+void arm_smmu_debugfs_uninit(void);
+#else
+static inline void arm_smmu_debugfs_init(void) {}
+static inline void arm_smmu_debugfs_uninit(void) {}
+#endif /* CONFIG_ARM_SMMU_V3_DEBUGFS */
  #endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/debugfs.c 
b/drivers/iommu/arm/arm-smmu-v3/debugfs.c
new file mode 100644
index 000..1af219a
--- /dev/null
+++ b/drivers/iommu/arm/arm-smmu-v3/debugfs.c
@@ -0,0 +1,398 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include "arm-smmu-v3.h"
+#include "../../io-pgtable-arm.h"
+
+#undef pr_fmt
+#define pr_fmt(fmt)"SMMUv3 debug: " fmt
+
+#define NAME_BUF_LEN 32
+
+static struct dentry *arm_smmu_debug;
+static char dump_pci_dev[NAME_BUF_LEN];
+static u32 pasid;
+static struct mutex lock;
+
+static ssize_t master_pdev_read(struct file *filp, char __user *buf,
+   size_t count, loff_t *pos)
+{
+   char pdev_name[NAME_BUF_LEN];
+   char name[NAME_BUF_LEN];
+   int ret;
+
+   mutex_lock(&lock);
+   strncpy(pdev_name, dump_pci_dev, NAME_BUF_LEN);
+   mutex_unlock(&lock);
+
+   if (!strlen(pdev_name)) {
+   pr_err("Please set pci_dev firstly\n");


Even if it's "just debugfs", printing userspace-triggered errors to

Re: [PATCH][next][V2] iommu/mediatek: Fix unsigned domid comparison with less than zero

2021-02-04 Thread Will Deacon
On Thu, Feb 04, 2021 at 03:00:01PM +, Colin King wrote:
> From: Colin Ian King 
> 
> Currently the check for domid < 0 is always false because domid
> is unsigned. Fix this by casting domid to an int before making
> the comparison.
> 
> Addresses-Coverity: ("Unsigned comparison against 0")
> Signed-off-by: Colin Ian King 
> ---
> 
> V2: cast domid rather than making it an int. Replace L with : in
> the commit message.
> 
> ---
>  drivers/iommu/mtk_iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 0ad14a7604b1..1f262621ef19 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -645,7 +645,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev,
>   struct iommu_resv_region *region;
>   int prot = IOMMU_WRITE | IOMMU_READ;
>  
> - if (domid < 0)
> + if ((int)domid < 0)
>   return;
>   curdom = data->plat_data->iova_region + domid;
>   for (i = 0; i < data->plat_data->iova_region_nr; i++) {

Thanks, Colin.

Acked-by: Will Deacon 

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH][next][V2] iommu/mediatek: Fix unsigned domid comparison with less than zero

2021-02-04 Thread Colin King
From: Colin Ian King 

Currently the check for domid < 0 is always false because domid
is unsigned. Fix this by casting domid to an int before making
the comparison.
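
A minimal standalone illustration of the underlying C behaviour (not taken
from the driver; the value used is made up):

    /* An unsigned value compared with '< 0' can never be true, because a
     * negative error value wraps to a huge positive number; casting to int
     * restores the intended signed comparison.
     */
    #include <stdio.h>

    int main(void)
    {
    	unsigned int domid = (unsigned int)-1;  /* e.g. an error value */

    	if (domid < 0)
    		printf("never printed: unsigned < 0 is always false\n");

    	if ((int)domid < 0)
    		printf("printed: the cast makes the comparison signed\n");

    	return 0;
    }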

Addresses-Coverity: ("Unsigned comparison against 0")
Signed-off-by: Colin Ian King 
---

V2: cast domid rather than making it an int. Replace L with : in
the commit message.

---
 drivers/iommu/mtk_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 0ad14a7604b1..1f262621ef19 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -645,7 +645,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev,
struct iommu_resv_region *region;
int prot = IOMMU_WRITE | IOMMU_READ;
 
-   if (domid < 0)
+   if ((int)domid < 0)
return;
curdom = data->plat_data->iova_region + domid;
for (i = 0; i < data->plat_data->iova_region_nr; i++) {
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/7] [PULL REQUEST] iommu/vt-d: Update for v5.12

2021-02-04 Thread Joerg Roedel
On Thu, Feb 04, 2021 at 07:52:29PM +0800, Lu Baolu wrote:
> Hi Joerg,
> 
> I just received some internal comments on the last patch
> 
> [PATCH 7/7] iommu/vt-d: Apply SATC policy
> 
> We need some extra work on it and probably re-target it to v5.13.
> 
> Can you please only consider patch 1 ~ 6 for v5.12?

Applied patches 1-6, thanks Baolu.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-02-04 Thread Joerg Roedel
On Wed, Feb 03, 2021 at 03:04:35PM +, Wei Liu wrote:
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
> 
> Signed-off-by: Wei Liu 

Acked-by: Joerg Roedel 

> ---
> v6:
> 1. Simplify code due to changes in a previous patch.
> ---
>  arch/x86/hyperv/irqdomain.c |  25 +
>  arch/x86/include/asm/mshyperv.h |   4 +
>  drivers/iommu/hyperv-iommu.c| 177 +++-
>  3 files changed, 203 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 117f17e8c88a..0cabc9aece38 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -360,3 +360,28 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
>  }
>  
>  #endif /* CONFIG_PCI_MSI */
> +
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> *entry)
> +{
> + union hv_device_id device_id;
> +
> + device_id.as_uint64 = 0;
> + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> + return hv_unmap_interrupt(device_id.as_uint64, entry);
> +}
> +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> +
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int cpu, int vector,
> + struct hv_interrupt_entry *entry)
> +{
> + union hv_device_id device_id;
> +
> + device_id.as_uint64 = 0;
> + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> + return hv_map_interrupt(device_id, level, cpu, vector, entry);
> +}
> +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ccc849e25d5e..345d7c6f8c37 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
>  
>  struct irq_domain *hv_create_pci_msi_domain(void);
>  
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> + struct hv_interrupt_entry *entry);
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> +
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index 1d21a0b5f724..e285a220c913 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "irq_remapping.h"
>  
> @@ -115,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
>   .free = hyperv_irq_remapping_free,
>  };
>  
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
>  static int __init hyperv_prepare_irq_remapping(void)
>  {
>   struct fwnode_handle *fn;
>   int i;
> + const char *name;
> + const struct irq_domain_ops *ops;
>  
>   if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>   x86_init.hyper.msi_ext_dest_id() ||
>   !x2apic_supported())
>   return -ENODEV;
>  
> - fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> + if (hv_root_partition) {
> + name = "HYPERV-ROOT-IR";
> + ops = &hyperv_root_ir_domain_ops;
> + } else {
> + name = "HYPERV-IR";
> + ops = &hyperv_ir_domain_ops;
> + }
> +
> + fn = irq_domain_alloc_named_id_fwnode(name, 0);
>   if (!fn)
>   return -ENOMEM;
>  
>   ioapic_ir_domain =
>   irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
> - 0, IOAPIC_REMAPPING_ENTRY, fn,
> - &hyperv_ir_domain_ops, NULL);
> + 0, IOAPIC_REMAPPING_ENTRY, fn, ops, NULL);
>  
>   if (!ioapic_ir_domain) {
>   irq_domain_free_fwnode(fn);
>   return -ENOMEM;
>   }
>  
> + if (hv_root_partition)
> + return 0; /* The rest is only relevant to guests */
> +
>   /*
>* Hyper-V doesn't provide irq remapping function for
>* IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
> @@ -166,4 +180,161 @@ struct irq_remap_ops hyperv_irq_remap_ops = {
>   .enable = hyperv_enable_irq_remapping,
>  };
>  
> +/* IRQ remapping domain when Linux runs as the root partition */
> +struct hyperv_root_ir_data {
> + u8 ioapic_id;
> + bool is_level;
> + struct hv_interrupt_entry entry;
> +};
> +
> +static void
> +hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
> +{
> + u64 status;
> + u32 vector;
> + struct irq_cfg *cfg;
> + int ioapic_id;
> + struct cpumask *affinity;
> + int cpu;
> +   

Re: [PATCH][next] iommu/mediatek: Fix unsigned domid comparison with less than zero

2021-02-04 Thread Joerg Roedel
On Thu, Feb 04, 2021 at 09:25:58AM +, Will Deacon wrote:
> On Wed, Feb 03, 2021 at 01:59:36PM +, Colin King wrote:
> > From: Colin Ian King 
> > 
> > Currently the check for domid < 0 is always false because domid
> > is unsigned.  Fix this by making it signed.
> > 
> > Addresses-CoverityL ("Unsigned comparison against 0")
> 
> Typo here ('L' instead of ':')
> 
> > Fixes: ab1d5281a62b ("iommu/mediatek: Add iova reserved function")
> > Signed-off-by: Colin Ian King 
> > ---
> >  drivers/iommu/mtk_iommu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 0ad14a7604b1..823d719945b2 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -640,7 +640,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev,
> >struct list_head *head)
> >  {
> > struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
> > -   unsigned int domid = mtk_iommu_get_domain_id(dev, data->plat_data), i;
> > +   int domid = mtk_iommu_get_domain_id(dev, data->plat_data), i;
> 
> Not sure if it's intentional, but this also makes 'i' signed. It probably
> should remain 'unsigned' to match 'iova_region_nr' in
> 'struct mtk_iommu_plat_data'.

Yes, 'i' should stay unsigned. Colin, can you please fix that up and
re-send?

Thanks,

Joerg
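
For illustration, the split declaration Will describes could look like the
sketch below; the V2 posted elsewhere in this archive instead keeps domid
unsigned and casts it in the comparison, and the function name here is only
for the example:

static void example_get_resv_regions(struct device *dev,
				     struct list_head *head)
{
	struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
	int domid = mtk_iommu_get_domain_id(dev, data->plat_data);
	unsigned int i;

	if (domid < 0)
		return;
	/* ... walk data->plat_data->iova_region_nr regions using i ... */
}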


Re: [PATCH 11/12] platform-msi: Add platform check for subdevice irq domain

2021-02-04 Thread Lu Baolu

Hi Jason,

On 2021/2/4 20:14, Jason Gunthorpe wrote:
> On Wed, Feb 03, 2021 at 12:56:44PM -0800, Megha Dey wrote:
> > +bool arch_support_pci_device_ims(struct pci_dev *pdev)
> > +{
>
> Consistent language please, we are not using IMS elsewhere, this
> feature is called device_msi in Linux.

Thanks for pointing this out. I will correct it.

> Jason



Best regards,
baolu


Re: [RFC PATCH 0/3] iommu/arm-smmu-v3: Add debug interfaces for SMMUv3

2021-02-04 Thread Zhou Wang
On 2021/2/4 5:38, Will Deacon wrote:
> On Wed, Feb 03, 2021 at 11:15:18AM +0800, Zhou Wang wrote:
>> On 2021/1/29 17:06, Zhou Wang wrote:
>>> This RFC series is the followed patch of this discussion:
>>> https://www.spinics.net/lists/arm-kernel/msg866187.html. 
>>>
>>> Currently there is no debug interface for the SMMUv3 driver, which is
>>> inconvenient when we want to dump information such as the values of
>>> CD/STE, the S1/S2 page tables, SMMU registers or the cmd/event/pri queues.
>>>
>>> This series tries to add support of dumping CD/STE and page table. The
>>> interface design is that user sets device/pasid firstly by sysfs files
>>> and then read related sysfs file to get information:
>>>
>>>  (currently only support PCI device)
>>>  echo ::. > /sys/kernel/debug/iommu/smmuv3/pci_dev
>>>  echo  > /sys/kernel/debug/iommu/smmuv3/pasid
>>>  
>>>  Then value in CD and STE can be got by:
>>>  cat /sys/kernel/debug/iommu/smmuv3/ste
>>>  cat /sys/kernel/debug/iommu/smmuv3/cd
>>>  
>>>  S1 and S2 page tables can be got by:
>>>  cat /sys/kernel/debug/iommu/smmuv3/pt_dump_s1
>>>  cat /sys/kernel/debug/iommu/smmuv3/pt_dump_s2
>>>
>>> For STE, CD and page table, related device and pasid are set in pci_dev
>>> and pasid files as above.
>>>
>>> The first and second patches export some helper functions and macros from
>>> the arm-smmu-v3 and io-pgtable-arm code so that we can reuse them in
>>> debugfs.c. As an RFC, this series does not go further to dump the SMMU
>>> registers and the cmd/event/pri queues. I am not sure this series takes the
>>> right approach, so let's post it and have a discussion. Looking forward to
>>> any feedback.
>>>
>>> Zhou Wang (3):
>>>   iommu/arm-smmu-v3: Export cd/ste get functions
>>>   iommu/io-pgtable: Export page table walk needed functions and macros
>>>   iommu/arm-smmu-v3: Add debug interfaces for SMMUv3
>>>
>>>  drivers/iommu/Kconfig   |  11 +
>>>  drivers/iommu/arm/arm-smmu-v3/Makefile  |   1 +
>>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  10 +-
>>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  10 +
>>>  drivers/iommu/arm/arm-smmu-v3/debugfs.c | 398 
>>> 
>>>  drivers/iommu/io-pgtable-arm.c  |  47 +---
>>>  drivers/iommu/io-pgtable-arm.h  |  43 +++
>>>  7 files changed, 475 insertions(+), 45 deletions(-)
>>>  create mode 100644 drivers/iommu/arm/arm-smmu-v3/debugfs.c
>>>
>>
>> Any comments about this series?
> 
> Truthfully, I don't really see the use in dumping the state of the SMMU
> data structures. They're not especially dynamic, and there are higher level
> ways to determine how devices map to groups etc.

Here the aim is not to find the device/group mappings, but to dump the CD/STE
values. Of course, we can work them out by reading the code; however, we want
to dump the hardware configuration quickly and directly when debugging a
hardware-related problem. It is a real pain when your hardware guy asks you to do this :)

> 
> However, I can see some utility in dumping the page-tables. We have that
> functionality for the CPU side via /sys/kernel/debug/kernel_page_tables,
> so something similar in the io-pgtable code could be quite neat. In
> particular, the logic to expose things in debugfs and drive the dumping
> could be agnostic of the page-table format, while the formats themselves

Do you mean the different page-table formats, like 64_s1, 64_s2, 32_s1 and
32_s2? It seems that in the io-pgtable code we only tell them apart in
arm_lpae_prot_to_pte, and currently this RFC series does not print attributes.

Thanks,
Zhou

> could implement optional callback(s) to return the data.
> 
> Will
> 
> .
> 
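
Not taken from the series itself: a minimal, self-contained sketch of how a
read-only debugfs dump file like the ones described above is typically wired
up. All names here are hypothetical, and the STE contents are a placeholder:

#include <linux/debugfs.h>
#include <linux/module.h>
#include <linux/seq_file.h>

static u64 example_ste[8];	/* stand-in for a cached copy of an STE */

static int example_ste_show(struct seq_file *m, void *unused)
{
	int i;

	for (i = 0; i < 8; i++)
		seq_printf(m, "STE[%d]: 0x%016llx\n", i, example_ste[i]);
	return 0;
}
DEFINE_SHOW_ATTRIBUTE(example_ste);

static struct dentry *example_dir;

static int __init example_debugfs_init(void)
{
	example_dir = debugfs_create_dir("smmuv3-example", NULL);
	debugfs_create_file("ste", 0400, example_dir, NULL, &example_ste_fops);
	return 0;
}

static void __exit example_debugfs_exit(void)
{
	debugfs_remove_recursive(example_dir);
}

module_init(example_debugfs_init);
module_exit(example_debugfs_exit);
MODULE_LICENSE("GPL");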



Re: [PATCH 11/12] platform-msi: Add platform check for subdevice irq domain

2021-02-04 Thread Jason Gunthorpe
On Wed, Feb 03, 2021 at 12:56:44PM -0800, Megha Dey wrote:
> +bool arch_support_pci_device_ims(struct pci_dev *pdev)
> +{

Consistent language please, we are not using IMS elsewhere, this
feature is called device_msi in Linux.

Jason


Re: [PATCH 0/7] [PULL REQUEST] iommu/vt-d: Update for v5.12

2021-02-04 Thread Lu Baolu

Hi Joerg,

I just received some internal comments on the last patch

[PATCH 7/7] iommu/vt-d: Apply SATC policy

We need some extra work on it and probably re-target it to v5.13.

Can you please only consider patch 1 ~ 6 for v5.12?

Sorry for the inconvenience.

Best regards,
baolu

On 2021/2/4 9:43, Lu Baolu wrote:

Hi Joerg,

The patches queued in this series is for v5.12. It includes:

  - Audit capability consistency among different IOMMUs
  - Add SATC reporting structure support
  - Add iotlb_sync_map callback support
  - Misc cleanup

Please consider them for v5.12.

Best regards,
Lu Baolu

Bjorn Helgaas (1):
   iommu/vt-d: Fix 'physical' typos

Kyung Min Park (2):
   iommu/vt-d: Audit IOMMU Capabilities and add helper functions
   iommu/vt-d: Move capability check code to cap_audit files

Lu Baolu (1):
   iommu/vt-d: Add iotlb_sync_map callback

Yian Chen (3):
   iommu/vt-d: Add new enum value and structure for SATC
   iommu/vt-d: Parse SATC reporting structure
   iommu/vt-d: Apply SATC policy

  drivers/iommu/intel/Makefile|   2 +-
  drivers/iommu/intel/cap_audit.c | 205 +
  drivers/iommu/intel/cap_audit.h | 130 +++
  drivers/iommu/intel/dmar.c  |   8 +
  drivers/iommu/intel/iommu.c | 337 +++-
  drivers/iommu/intel/irq_remapping.c |   8 +
  include/acpi/actbl1.h   |  11 +-
  include/linux/dmar.h|   2 +
  include/linux/intel-iommu.h |  41 ++--
  9 files changed, 615 insertions(+), 129 deletions(-)
  create mode 100644 drivers/iommu/intel/cap_audit.c
  create mode 100644 drivers/iommu/intel/cap_audit.h




Re: [PATCH RFC v1 2/6] swiotlb: convert variables to arrays

2021-02-04 Thread Robin Murphy

On 2021-02-04 07:29, Christoph Hellwig wrote:

On Wed, Feb 03, 2021 at 03:37:05PM -0800, Dongli Zhang wrote:

This patch converts several swiotlb related variables to arrays, in
order to maintain stat/status for different swiotlb buffers. Here are
variables involved:

- io_tlb_start and io_tlb_end
- io_tlb_nslabs and io_tlb_used
- io_tlb_list
- io_tlb_index
- max_segment
- io_tlb_orig_addr
- no_iotlb_memory

There is no functional change and this is to prepare to enable 64-bit
swiotlb.


Claire Chang (on Cc) already posted a patch like this a month ago,
which looks much better because it actually uses a struct instead
of all the random variables.


Indeed, I skimmed the cover letter and immediately thought that this 
whole thing is just the restricted DMA pool concept[1] again, only from 
a slightly different angle.


Robin.

[1] 
https://lore.kernel.org/linux-iommu/20210106034124.30560-1-tien...@chromium.org/
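
For context, grouping the variables listed above into one per-pool structure
could look roughly like the sketch below; the structure and field names are
illustrative and not taken from either series:

#include <linux/spinlock.h>
#include <linux/types.h>

struct example_io_tlb_pool {
	phys_addr_t	start;		/* io_tlb_start */
	phys_addr_t	end;		/* io_tlb_end */
	unsigned long	nslabs;		/* io_tlb_nslabs */
	unsigned long	used;		/* io_tlb_used */
	unsigned int	index;		/* io_tlb_index */
	unsigned int	max_segment;	/* max_segment */
	unsigned int	*list;		/* io_tlb_list */
	phys_addr_t	*orig_addr;	/* io_tlb_orig_addr */
	bool		no_memory;	/* no_iotlb_memory */
	spinlock_t	lock;
};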



Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-04 Thread Guillaume Tucker
Hi Nicolin,

A regression was detected by kernelci.org in IGT's drm_read tests
on mainline, it was first seen on 17th December 2020.  You can
find some details here:

  https://kernelci.org/test/case/id/600b82dc1e3208f123d3dffc/

Then an automated bisection was run and it landed on this
patch (v5.10-rc3-4-g25938c73cd79 on mainline).  Normally, an
email is generated automatically but I had to start this one by
hand as there were issues getting it to complete.

You can see the failing test cases with this patch:

  https://lava.collabora.co.uk/results/3126405/0_igt-kms-tegra

Some errors are seen around this point in the log:

  https://lava.collabora.co.uk/scheduler/job/3126405#L1005

[3.029729] tegra-mc 70019000.memory-controller: display0a: read @0xfe00: EMEM address decode error (SMMU translation error [--S])
[3.042058] tegra-mc 70019000.memory-controller: display0a: read @0xfe00: Page fault (SMMU translation error [--S])


Here's the same test passing with this patch reverted:

  https://lava.collabora.co.uk/results/3126570/0_igt-kms-tegra
  

For completeness, you can see all the test jobs run by the
automated bisection here:

  
https://lava.collabora.co.uk/scheduler/device_type/tegra124-nyan-big?dt_length=25&dt_search=bisection-gtucker-12#dt_


Please let us know if you need any help debugging this issue or
to try a fix on this platform.

Best wishes,
Guillaume

On 25/11/2020 10:10, Nicolin Chen wrote:
> The bus_set_iommu() in tegra_smmu_probe() enumerates all clients
> to call in tegra_smmu_probe_device() where each client searches
> its DT node for smmu pointer and swgroup ID, so as to configure
> an fwspec. But this requires a valid smmu pointer even before mc
> and smmu drivers are probed. So in tegra_smmu_probe() we added a
> line of code to fill mc->smmu, marking "a bit of a hack".
> 
> This works for most of clients in the DTB, however, doesn't work
> for a client that doesn't exist in DTB, a PCI device for example.
> 
> Actually, if we return ERR_PTR(-ENODEV) in ->probe_device() when
> it's called from bus_set_iommu(), iommu core will let everything
> carry on. Then when a client gets probed, of_iommu_configure() in
> iommu core will search DTB for swgroup ID and call ->of_xlate()
> to prepare an fwspec, similar to tegra_smmu_probe_device() and
> tegra_smmu_configure(). Then it'll call tegra_smmu_probe_device()
> again, and this time we shall return smmu->iommu pointer properly.
> 
> So we can get rid of tegra_smmu_find() and tegra_smmu_configure()
> along with DT polling code by letting the iommu core handle every
> thing, except a problem that we search iommus property in DTB not
> only for swgroup ID but also for mc node to get mc->smmu pointer
> to call dev_iommu_priv_set() and return the smmu->iommu pointer.
> So we'll need to find another way to get smmu pointer.
> 
> Referencing the implementation of sun50i-iommu driver, of_xlate()
> has client's dev pointer, mc node and swgroup ID. This means that
> we can call dev_iommu_priv_set() in of_xlate() instead, so we can
> simply get smmu pointer in ->probe_device().
> 
> This patch reworks tegra_smmu_probe_device() by:
> 1) Removing mc->smmu hack in tegra_smmu_probe() so as to return
>ERR_PTR(-ENODEV) in tegra_smmu_probe_device() during stage of
>tegra_smmu_probe/tegra_mc_probe().
> 2) Moving dev_iommu_priv_set() to of_xlate() so we can get smmu
>pointer in tegra_smmu_probe_device() to replace DTB polling.
> 3) Removing tegra_smmu_configure() accordingly since iommu core
>takes care of it.
> 
> This also fixes a problem that previously we could add clients to
> iommu groups before iommu core initializes its default domain:
> ubuntu@jetson:~$ dmesg | grep iommu
> platform 5000.host1x: Adding to iommu group 1
> platform 5700.gpu: Adding to iommu group 2
> iommu: Default domain type: Translated
> platform 5420.dc: Adding to iommu group 3
> platform 5424.dc: Adding to iommu group 3
> platform 5434.vic: Adding to iommu group 4
> 
> Though it works fine with IOMMU_DOMAIN_UNMANAGED, but will have
> warnings if switching to IOMMU_DOMAIN_DMA:
> iommu: Failed to allocate default IOMMU domain of type 0 for
>group (null) - Falling back to IOMMU_DOMAIN_DMA
> iommu: Failed to allocate default IOMMU domain of type 0 for
>group (null) - Falling back to IOMMU_DOMAIN_DMA
> 
> Now, bypassing the first probe_device() call from bus_set_iommu()
> fixes the sequence:
> ubuntu@jetson:~$ dmesg | grep iommu
> iommu: Default domain type: Translated
> tegra-host1x 5000.host1x: Adding to iommu group 0
> tegra-dc 5420.dc: Adding to iommu group 1
> tegra-dc 5424.dc: Adding to iommu group 1
> tegra-vic 5434.vic: Adding to iommu group 2
> nouveau 5700.gpu: Adding to iommu group 3
> 
> Note that dmesg log above is testing with IOMMU_DOMAIN_UNMANAGED.
> 
> Reviewed-by: Dmitry Osipenko 
> Tested-by: Dmitry Osipenko 
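
For readers following this report, a reconstructed sketch of the of_xlate()
approach the quoted commit message describes; it is simplified and
illustrative, not the actual driver code:

#include <linux/iommu.h>
#include <linux/of_platform.h>
#include <soc/tegra/mc.h>

static int example_smmu_of_xlate(struct device *dev,
				 struct of_phandle_args *args)
{
	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
	struct tegra_mc *mc;
	u32 swgroup_id = args->args[0];

	if (!iommu_pdev)
		return -ENODEV;

	mc = platform_get_drvdata(iommu_pdev);
	if (!mc || !mc->smmu)
		return -EPROBE_DEFER;

	/* Stash the smmu pointer so probe_device() can simply read it back. */
	dev_iommu_priv_set(dev, mc->smmu);

	return iommu_fwspec_add_ids(dev, &swgroup_id, 1);
}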

Re: [PATCH][next] iommu/mediatek: Fix unsigned domid comparison with less than zero

2021-02-04 Thread Will Deacon
On Wed, Feb 03, 2021 at 01:59:36PM +, Colin King wrote:
> From: Colin Ian King 
> 
> Currently the check for domid < 0 is always false because domid
> is unsigned.  Fix this by making it signed.
> 
> Addresses-CoverityL ("Unsigned comparison against 0")

Typo here ('L' instead of ':')

> Fixes: ab1d5281a62b ("iommu/mediatek: Add iova reserved function")
> Signed-off-by: Colin Ian King 
> ---
>  drivers/iommu/mtk_iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 0ad14a7604b1..823d719945b2 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -640,7 +640,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev,
>  struct list_head *head)
>  {
>   struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
> - unsigned int domid = mtk_iommu_get_domain_id(dev, data->plat_data), i;
> + int domid = mtk_iommu_get_domain_id(dev, data->plat_data), i;

Not sure if it's intentional, but this also makes 'i' signed. It probably
should remain 'unsigned' to match 'iova_region_nr' in
'struct mtk_iommu_plat_data'.

Will


Re: [PATCH RFC v1 5/6] xen-swiotlb: convert variables to arrays

2021-02-04 Thread Christoph Hellwig
So one thing that has been on my mind for a while:  I'd really like
to kill the separate dma ops in Xen swiotlb.  If we compare xen-swiotlb
to swiotlb the main difference seems to be:

 - additional reasons to bounce I/O vs the plain DMA capable
 - the possibility to do a hypercall on arm/arm64
 - an extra translation layer before doing the phys_to_dma and vice
   versa
 - an special memory allocator

I wonder if inbetween a few jump labels or other no overhead enablement
options and possibly better use of the dma_range_map we could kill
off most of swiotlb-xen instead of maintaining all this code duplication?
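
To illustrate the "no overhead enablement" idea, a static key could gate the
Xen-specific path so bare-metal swiotlb only pays a patched-out branch; all
names in this sketch are made up:

#include <linux/jump_label.h>
#include <linux/types.h>

DEFINE_STATIC_KEY_FALSE(example_xen_swiotlb_enabled);

/* Flipped once at boot when running as a Xen domain. */
static void __init example_enable_xen_swiotlb(void)
{
	static_branch_enable(&example_xen_swiotlb_enabled);
}

/* Hot path: compiles down to a no-op branch unless Xen enabled the key. */
static bool example_xen_needs_bounce(phys_addr_t phys, size_t size)
{
	if (!static_branch_unlikely(&example_xen_swiotlb_enabled))
		return false;

	/* Xen-specific checks (MFN contiguity, foreign pages, ...) go here. */
	return true;
}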


Re: [PATCH 7/7] media: uvcvideo: Use dma_alloc_noncontiguos API

2021-02-04 Thread Hillf Danton
Tue,  2 Feb 2021 10:51:10 +0100
> From: Ricardo Ribalda 
> 
> On architectures where there is no coherent caching, such as ARM, use the
> dma_alloc_noncontiguous API and handle the cache flushing manually using
> dma_sync_sgtable().
> 
> With this patch on the affected architectures we can measure up to 20x
> performance improvement in uvc_video_copy_data_work().
> 
> Eg: aarch64 with an external usb camera
> 
> NON_CONTIGUOUS
> frames:  999
> packets: 999
> empty:   0 (0 %)
> errors:  0
> invalid: 0
> pts: 0 early, 0 initial, 999 ok
> scr: 0 count ok, 0 diff ok
> sof: 2048 <= sof <= 0, freq 0.000 kHz
> bytes 67034480 : duration 33303
> FPS: 29.99
> URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
> header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
> latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS)
> decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
> raw decode speed: 9.931 Gbits/s
> raw URB handling speed: 1.025 Gbits/s
> throughput: 16.102 Mbits/s
> URB decode CPU usage 0.162600 %
> 
> COHERENT
> frames:  999
> packets: 999
> empty:   0 (0 %)
> errors:  0
> invalid: 0
> pts: 0 early, 0 initial, 999 ok
> scr: 0 count ok, 0 diff ok
> sof: 2048 <= sof <= 0, freq 0.000 kHz
> bytes 54683536 : duration 33302
> FPS: 29.99
> URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS)
> header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
> latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS)
> decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max (uS)
> raw decode speed: 365.470 Mbits/s
> raw URB handling speed: 295.986 Mbits/s
> throughput: 13.136 Mbits/s
> URB decode CPU usage 3.594500 %
> 
> Signed-off-by: Ricardo Ribalda 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/media/usb/uvc/uvc_video.c | 79 ++-
>  drivers/media/usb/uvc/uvcvideo.h  |  4 +-
>  2 files changed, 60 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
> index a6a441d92b9488..0a7d287dc41528 100644
> --- a/drivers/media/usb/uvc/uvc_video.c
> +++ b/drivers/media/usb/uvc/uvc_video.c
> @@ -6,11 +6,13 @@
>   *  Laurent Pinchart (laurent.pinch...@ideasonboard.com)
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1097,6 +1099,26 @@ static int uvc_video_decode_start(struct uvc_streaming *stream,
>   return data[0];
>  }
>  
> +static inline struct device *stream_to_dmadev(struct uvc_streaming *stream)
> +{
> + return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
> +}
> +
> +static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
> +{
> + struct device *dma_dev = stream_to_dmadev(uvc_urb->stream);
> +
> + if (for_device) {
> + dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt,
> + DMA_FROM_DEVICE);
> + } else {
> + dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt,
> +  DMA_FROM_DEVICE);
> + invalidate_kernel_vmap_range(uvc_urb->buffer,
> +  uvc_urb->stream->urb_size);
> + }
> +}
> +
>  /*
>   * uvc_video_decode_data_work: Asynchronous memcpy processing
>   *
> @@ -1118,6 +1140,8 @@ static void uvc_video_copy_data_work(struct work_struct *work)
>   uvc_queue_buffer_release(op->buf);
>   }
>  
> + uvc_urb_dma_sync(uvc_urb, true);
> +
>   ret = usb_submit_urb(uvc_urb->urb, GFP_KERNEL);
>   if (ret < 0)
>   uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n",
> @@ -1539,10 +1563,12 @@ static void uvc_video_complete(struct urb *urb)
>* Process the URB headers, and optionally queue expensive memcpy tasks
>* to be deferred to a work queue.
>*/
> + uvc_urb_dma_sync(uvc_urb, false);
>   stream->decode(uvc_urb, buf, buf_meta);
>  
>   /* If no async work is needed, resubmit the URB immediately. */
>   if (!uvc_urb->async_operations) {
> + uvc_urb_dma_sync(uvc_urb, true);
>   ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC);
>   if (ret < 0)
>   uvc_printk(KERN_ERR,
> @@ -1559,24 +1585,46 @@ static void uvc_video_complete(struct urb *urb)
>   */
>  static void uvc_free_urb_buffers(struct uvc_streaming *stream)
>  {
> + struct device *dma_dev = stream_to_dmadev(stream);
>   struct uvc_urb *uvc_urb;
>  
>   for_each_uvc_urb(uvc_urb, stream) {
>   if (!uvc_urb->buffer)
>   continue;
>  
> -#ifndef CONFIG_DMA_NONCOHERENT
> - usb_free_coherent(stream->dev->udev, stream->urb_size,
> -   uvc_urb->buffer, uvc_urb->dma);
> -#else
> -