Re: [EXTERNAL] Re: Question regarding VIOT proposal

2021-02-16 Thread Al Stone
On 04 Feb 2021 13:25, Al Stone wrote:
> On 03 Feb 2021 09:46, Jean-Philippe Brucker wrote:
> > On Tue, Feb 02, 2021 at 01:27:13PM -0700, Al Stone wrote:
> > > On 02 Feb 2021 10:17, Jean-Philippe Brucker wrote:
> > > > Hi Al,
> > > > 
> > > > On Fri, Dec 04, 2020 at 01:18:25PM -0700, Al Stone wrote:
> > > > > > I updated the doc: 
> > > > > > https://jpbrucker.net/virtio-iommu/viot/viot-v9.pdf
> > > > > > You can incorporate it into the ASWG proposal.
> > > > > > Changes since v8:
> > > > > > * One typo (s/programing/programming/)
> > > > > > * Modified the PCI Range node to include a segment range.
> > > > > > 
> > > > > > I also updated the Linux and QEMU implementations on branch
> > > > > > virtio-iommu/devel in https://jpbrucker.net/git/linux/ and
> > > > > > https://jpbrucker.net/git/qemu/
> > > > > > 
> > > > > > Thanks again for helping with this
> > > > > > 
> > > > > > Jean
> > > > > 
> > > > > Perfect.  Thanks.  I'll update the ASWG info right away.
> > > > 
> > > > Has there been any more feedback on the VIOT specification? I'm 
> > > > wondering
> > > > when we can start upstreaming support for it.
> > > > 
> > > > Thanks,
> > > > Jean
> > > 
> > > Ah, sorry, Jean.  I meant to get back to you sooner.  My apologies.
> > > 
> > > The latest version that resulted from the discussion with Yinghan of
> > > Microsoft is the one in front the ASWG; I think if you upstream that
> > > version, it should be identical to the spec when it is next published
> > > (post ACPI 6.4).
> > > 
> > > The process is: (1) propose the change, (2) review it in committee,
> > > (3) give people more time to think about it, then (4) have a finale
> > > vote.  We've been iterating over (1), (2) and (3).  Since there was
> > > no new discussion at the last meeting, we should have the final vote
> > > on this (4) in the next meeting.  I had hoped we could do that last
> > > week but the meeting was canceled at the last minute.  I hope to have
> > > the final vote this Thursday and will let you know as soon as it has
> > > been decided.
> > > 
> > > My apologies for the delays; getting things done by committee always
> > > takes a bazillion times longer than one would think.
> > 
> > No worries, thanks a lot for the update!
> > 
> > Thanks,
> > Jean
> 
> Sigh.  Just got a note that today's meeting has been canceled :(.
> 
> So, next week for the final vote.  OTOH, there have been no comments
> of any sort on the proposal -- and silence is good, in this case.

Would you believe last week's meeting was canceled, too?  Not sure
why the chair/co-chair are doing this but I'm finding it a little
frustrating.

We'll try again this week ... again, apologies for the delays.  I'd
recommend going with the last version posted just so progress can be
made.  The spec can always be fixed later.

-- 
ciao,
al
---
Al Stone
Software Engineer
Red Hat, Inc.
a...@redhat.com
---

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/7] dma-mapping: add a dma_alloc_noncontiguous API

2021-02-16 Thread Robin Murphy

On 2021-02-02 09:51, Christoph Hellwig wrote:

Add a new API that returns a potentiall virtually non-contigous sg_table
and a DMA address.  This API is only properly implemented for dma-iommu
and will simply return a contigious chunk as a fallback.

The intent is that media drivers can use this API if either:

  - no kernel mapping or only temporary kernel mappings are required.
That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
  - a kernel mapping is required for cached and DMA mapped pages, but
the driver also needs the pages to e.g. map them to userspace.
In that sense it is a replacement for some aspects of the recently
removed and never fully implemented DMA_ATTR_NON_CONSISTENT

Signed-off-by: Christoph Hellwig 
---
  Documentation/core-api/dma-api.rst |  74 +
  include/linux/dma-map-ops.h|  18 +
  include/linux/dma-mapping.h|  31 +
  kernel/dma/mapping.c   | 103 +
  4 files changed, 226 insertions(+)

diff --git a/Documentation/core-api/dma-api.rst 
b/Documentation/core-api/dma-api.rst
index 157a474ae54416..e24b2447f4bfe6 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -594,6 +594,80 @@ dev, size, dma_handle and dir must all be the same as 
those passed into
  dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
  dma_alloc_noncoherent().
  
+::

+
+   struct sg_table *
+   dma_alloc_noncontiguous(struct device *dev, size_t size,
+   enum dma_data_direction dir, gfp_t gfp)
+
+This routine allocates   bytes of non-coherent and possibly 
non-contiguous
+memory.  It returns a pointer to struct sg_table that describes the allocated
+and DMA mapped memory, or NULL if the allocation failed. The resulting memory
+can be used for struct page mapped into a scatterlist are suitable for.
+
+The return sg_table is guaranteed to have 1 single DMA mapped segment as
+indicated by sgt->nents, but it might have multiple CPU side segments as
+indicated by sgt->orig_nents.
+
+The dir parameter specified if data is read and/or written by the device,
+see dma_map_single() for details.
+
+The gfp parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation, but rejects flags used to specify a memory
+zone such as GFP_DMA or GFP_HIGHMEM.
+
+Before giving the memory to the device, dma_sync_sgtable_for_device() needs
+to be called, and before reading memory written by the device,
+dma_sync_sgtable_for_cpu(), just like for streaming DMA mappings that are
+reused.
+
+::
+
+   void
+   dma_free_noncontiguous(struct device *dev, size_t size,
+  struct sg_table *sgt,
+  enum dma_data_direction dir)
+
+Free memory previously allocated using dma_alloc_noncontiguous().  dev, size,
+and dir must all be the same as those passed into dma_alloc_noncontiguous().
+sgt must be the pointer returned by dma_alloc_noncontiguous().
+
+::
+
+   void *
+   dma_vmap_noncontiguous(struct device *dev, size_t size,
+   struct sg_table *sgt)
+
+Return a contiguous kernel mapping for an allocation returned from
+dma_alloc_noncontiguous().  dev and size must be the same as those passed into
+dma_alloc_noncontiguous().  sgt must be the pointer returned by
+dma_alloc_noncontiguous().
+
+Once a non-contiguous allocation is mapped using this function, the
+flush_kernel_vmap_range() and invalidate_kernel_vmap_range() APIs must be used
+to manage the coherency of the kernel mapping.


Maybe say something like "coherency between the kernel mapping and any 
userspace mappings"? Otherwise people like me may be easily confused and 
think it's referring to coherency between the kernel mapping and the 
device, where in most cases those APIs won't help at all :)



+
+::
+
+   void
+   dma_vunmap_noncontiguous(struct device *dev, void *vaddr)
+
+Unmap a kernel mapping returned by dma_vmap_noncontiguous().  dev must be the
+same the one passed into dma_alloc_noncontiguous().  vaddr must be the pointer
+returned by dma_vmap_noncontiguous().
+
+
+::
+
+   int
+   dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma,
+  size_t size, struct sg_table *sgt)
+
+Map an allocation returned from dma_alloc_noncontiguous() into a user address
+space.  dev and size must be the same as those passed into
+dma_alloc_noncontiguous().  sgt must be the pointer returned by
+dma_alloc_noncontiguous().
+
  ::
  
  	int

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 11e02537b9e01b..fe46a41130e662 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -22,6 +22,10 @@ struct dma_map_ops {
gfp_t gfp);
void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
dma_addr_t 

Re: [PATCH v3 1/2] dt-bindings: iommu: add bindings for sprd iommu

2021-02-16 Thread Robin Murphy

On 2021-02-10 19:21, Rob Herring wrote:

On Fri, Feb 5, 2021 at 1:21 AM Chunyan Zhang  wrote:


Hi Rob,

On Fri, 5 Feb 2021 at 07:25, Rob Herring  wrote:


On Wed, Feb 03, 2021 at 05:07:26PM +0800, Chunyan Zhang wrote:

From: Chunyan Zhang 

This iommu module can be used by Unisoc's multimedia devices, such as
display, Image codec(jpeg) and a few signal processors, including
VSP(video), GSP(graphic), ISP(image), and CPP(camera pixel processor), etc.

Signed-off-by: Chunyan Zhang 
---
  .../devicetree/bindings/iommu/sprd,iommu.yaml | 72 +++
  1 file changed, 72 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/iommu/sprd,iommu.yaml

diff --git a/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
new file mode 100644
index ..4fc99e81fa66
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
@@ -0,0 +1,72 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+# Copyright 2020 Unisoc Inc.
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/sprd,iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Unisoc IOMMU and Multi-media MMU
+
+maintainers:
+  - Chunyan Zhang 
+
+properties:
+  compatible:
+enum:
+  - sprd,iommu-v1
+
+  "#iommu-cells":
+const: 0
+description:
+  Unisoc IOMMUs are all single-master IOMMU devices, therefore no
+  additional information needs to associate with its master device.
+  Please refer to the generic bindings document for more details,
+  Documentation/devicetree/bindings/iommu/iommu.txt
+
+  reg:
+maxItems: 1
+description:
+  Not required if 'sprd,iommu-regs' is defined.
+
+  clocks:
+description:
+  Reference to a gate clock phandle, since access to some of IOMMUs are
+  controlled by gate clock, but this is not required.
+
+  sprd,iommu-regs:
+$ref: /schemas/types.yaml#/definitions/phandle-array
+description:
+  Reference to a syscon phandle plus 1 cell, the syscon defines the
+  register range used by the iommu and the media device, the cell
+  defines the offset for iommu registers. Since iommu module shares
+  the same register range with the media device which uses it.
+
+required:
+  - compatible
+  - "#iommu-cells"


OK, so apparently the hardware is not quite as trivial as my initial 
impression, and you should have interrupts as well.



+
+oneOf:
+  - required:
+  - reg
+  - required:
+  - sprd,iommu-regs
+
+additionalProperties: false
+
+examples:
+  - |
+iommu_disp: iommu-disp {
+  compatible = "sprd,iommu-v1";
+  sprd,iommu-regs = <_regs 0x800>;


If the IOMMU is contained within another device, then it should just be
a child node of that device.


Yes, actually IOMMU can be seen as a child of multimedia devices, I
considered moving IOMMU under into multimedia device node, but
multimedia devices need IOMMU when probe[1], so I dropped that idea.


Don't design your binding around working-around linux issues.


Having stumbled across the DRM driver patches the other day, I now see 
where this is coming from, and it's even worse than that - this whole 
binding seems to be largely working around bad driver design.



And they share the same register base, e.g.

+   mm {
+   compatible = "simple-bus";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+
+   dpu_regs: syscon@6300 {


Drop this node.


+   compatible = "sprd,sc9863a-dpuregs", "syscon";
+   reg = <0 0x6300 0 0x1000>;
+   };
+
+   dpu: dpu@6300 {
+   compatible = "sprd,sharkl3-dpu";
+   sprd,disp-regs = <_regs>;


reg = <0 0x6300 0 0x800>;


In fact judging by the other driver it looks like the length only needs 
to be 0x200 here (but maybe there's more to come in future).



+   iommus = <_dispc>;
+   };
+
+   iommu_dispc: iommu@6300 {
+   compatible = "sprd,iommu-v1";
+   sprd,iommu-regs = <_regs 0x800>;


reg = <0 0x63000800 0 0x800>;


...and this one looks to need less than 0x80, even :)




+   #iommu-cells = <0>;


Though given it seems there is only 1 client and this might really be
just 1 h/w block, you don't really need to use the iommu binding at
all. The DPU should be able to instantiate it's own IOMMU device.
There's other examples of this such as mali GPU though that is all one
driver, but that's a Linux implementation detail.


FWIW that's really a very different situation - the MMUs in a Mali GPU 
are fundamental parts of its internal pipelines and would never make 
sense to handle as 

Re: [PATCH v2] iommu: Check dev->iommu in iommu_dev_xxx functions

2021-02-16 Thread Robin Murphy

On 2021-02-12 17:28, Shameerali Kolothum Thodi wrote:




-Original Message-
From: Shameerali Kolothum Thodi
Sent: 12 February 2021 16:45
To: 'Robin Murphy' ; linux-ker...@vger.kernel.org;
iommu@lists.linux-foundation.org
Cc: j...@8bytes.org; jean-phili...@linaro.org; w...@kernel.org; Zengtao (B)
; linux...@openeuler.org
Subject: RE: [PATCH v2] iommu: Check dev->iommu in iommu_dev_xxx functions




-Original Message-
From: Robin Murphy [mailto:robin.mur...@arm.com]
Sent: 12 February 2021 16:39
To: Shameerali Kolothum Thodi ;
linux-ker...@vger.kernel.org; iommu@lists.linux-foundation.org
Cc: j...@8bytes.org; jean-phili...@linaro.org; w...@kernel.org; Zengtao (B)
; linux...@openeuler.org
Subject: Re: [PATCH v2] iommu: Check dev->iommu in iommu_dev_xxx

functions


On 2021-02-12 14:54, Shameerali Kolothum Thodi wrote:

Hi Robin/Joerg,


-Original Message-
From: Shameer Kolothum

[mailto:shameerali.kolothum.th...@huawei.com]

Sent: 01 February 2021 12:41
To: linux-ker...@vger.kernel.org; iommu@lists.linux-foundation.org
Cc: j...@8bytes.org; robin.mur...@arm.com; jean-phili...@linaro.org;
w...@kernel.org; Zengtao (B) ;
linux...@openeuler.org
Subject: [Linuxarm] [PATCH v2] iommu: Check dev->iommu in

iommu_dev_xxx

functions

The device iommu probe/attach might have failed leaving dev->iommu
to NULL and device drivers may still invoke these functions resulting
in a crash in iommu vendor driver code. Hence make sure we check that.

Also added iommu_ops to the "struct dev_iommu" and set it if the dev
is successfully associated with an iommu.

Fixes: a3a195929d40 ("iommu: Add APIs for multiple domains per

device")

Signed-off-by: Shameer Kolothum



---
v1 --> v2:
   -Added iommu_ops to struct dev_iommu based on the discussion with

Robin.

   -Rebased against iommu-tree core branch.


A gentle ping on this...


Is there a convincing justification for maintaining yet another copy of
the ops pointer rather than simply dereferencing iommu_dev->ops at point
of use?



TBH, nothing I can think of now. That was mainly the way I interpreted your
suggestion
from the v1.  Now it looks like you didn’t mean it :). I am Ok to rework it to
dereference
it from iommu_dev. Please let me know.


So we can do something like this,

index fd76e2f579fe..5fd31a3cec18 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2865,10 +2865,12 @@ EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
   */
  int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat)
  {
-   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   if (dev->iommu && dev->iommu->iommu_dev && dev->iommu->iommu_dev->ops)
+   struct iommu_ops  *ops = dev->iommu->iommu_dev->ops;
  
-   if (ops && ops->dev_enable_feat)

-   return ops->dev_enable_feat(dev, feat);
+   if (ops->dev_enable_feat)
+   return ops->dev_enable_feat(dev, feat);
+   }
  
 return -ENODEV;

  }

Again, not sure we need to do the checking for iommu->dev and ops here. If the
dev->iommu is set, is it safe to assume that we have a valid iommu->iommu_dev
and ops always? (May be it is safer to do the checking in case something
else breaks this assumption in future). Please let me know your thoughts.


I think it *could* happen that dev->iommu is set by iommu_fwspec_init() 
but iommu_probe_device() later refuses the device for whatever reason, 
so we would still need to check iommu->iommu_dev to be completely safe. 
We can assume iommu_dev->ops is valid, since if the IOMMU driver has 
returned something bogus from .probe_device then it's a major bug in 
that driver and crashing is the best indicator :)


Robin.



Thanks,
Shameer


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH 1/8] of/device: Allow specifying a custom iommu_spec to of_dma_configure

2021-02-16 Thread Mikko Perttunen

On 2/16/21 2:47 PM, Robin Murphy wrote:

Hi Mikko,

On 2021-02-08 16:38, Mikko Perttunen wrote:

To allow for more customized device tree bindings that point to IOMMUs,
allow manual specification of iommu_spec to of_dma_configure.

The initial use case for this is with Host1x, where the driver manages
a set of device tree-defined IOMMU contexts that are dynamically
allocated to various users. These contexts don't correspond to
platform devices and are instead attached to dummy devices on a custom
software bus.


I'd suggest taking a closer look at the patches that made this 
of_dma_configure_id() in the first place, and the corresponding bus code 
in fsl-mc. At this level, Host1x sounds effectively identical to DPAA2 
in terms of being a bus of logical devices composed from bits of 
implicit behind-the-scenes hardware. I mean, compare your series title 
to the fact that their identifiers are literally named "Isolation 
Context ID" ;)


Please just use the existing mechanisms to describe a mapping between 
Host1x context IDs and SMMU Stream IDs, rather than what looks like a 
giant hacky mess here.


(This also reminds me I wanted to rip out all the PCI special-cases and 
convert pci_dma_configure() over to passing its own IDs too, so thanks 
for the memory-jog...)


Thanks Robin, not sure how I missed that the first time :) Maybe because 
Host1x doesn't have a concept of its own "IDs" for these per se - the 
hardware just uses stream IDs as is. I would need to count the number of 
mapped IDs from the iommu-map property and introduce some 0..N range of 
IDs at the software level. But maybe that's not too bad if we're able to 
use the existing paths and bindings then.


I'll take a look at switching to iommu-map.

Thanks,
Mikko



Robin.


Signed-off-by: Mikko Perttunen 
---
  drivers/iommu/of_iommu.c  | 12 
  drivers/of/device.c   |  9 +
  include/linux/of_device.h | 34 +++---
  include/linux/of_iommu.h  |  6 --
  4 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index e505b9130a1c..3fefa6c63863 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -87,8 +87,7 @@ int of_get_dma_window(struct device_node *dn, const 
char *prefix, int index,

  }
  EXPORT_SYMBOL_GPL(of_get_dma_window);
-static int of_iommu_xlate(struct device *dev,
-  struct of_phandle_args *iommu_spec)
+int of_iommu_xlate(struct device *dev, struct of_phandle_args 
*iommu_spec)

  {
  const struct iommu_ops *ops;
  struct fwnode_handle *fwnode = _spec->np->fwnode;
@@ -117,6 +116,7 @@ static int of_iommu_xlate(struct device *dev,
  module_put(ops->owner);
  return ret;
  }
+EXPORT_SYMBOL_GPL(of_iommu_xlate);
  static int of_iommu_configure_dev_id(struct device_node *master_np,
   struct device *dev,
@@ -177,7 +177,8 @@ static int of_iommu_configure_device(struct 
device_node *master_np,

  const struct iommu_ops *of_iommu_configure(struct device *dev,
 struct device_node *master_np,
-   const u32 *id)
+   const u32 *id,
+   struct of_phandle_args *iommu_spec)
  {
  const struct iommu_ops *ops = NULL;
  struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
@@ -209,7 +210,10 @@ const struct iommu_ops *of_iommu_configure(struct 
device *dev,

  err = pci_for_each_dma_alias(to_pci_dev(dev),
   of_pci_iommu_init, );
  } else {
-    err = of_iommu_configure_device(master_np, dev, id);
+    if (iommu_spec)
+    err = of_iommu_xlate(dev, iommu_spec);
+    else
+    err = of_iommu_configure_device(master_np, dev, id);
  fwspec = dev_iommu_fwspec_get(dev);
  if (!err && fwspec)
diff --git a/drivers/of/device.c b/drivers/of/device.c
index aedfaaafd3e7..84ada2110c5b 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -88,8 +88,9 @@ int of_device_add(struct platform_device *ofdev)
   * can use a platform bus notifier and handle BUS_NOTIFY_ADD_DEVICE 
events

   * to fix up DMA configuration.
   */
-int of_dma_configure_id(struct device *dev, struct device_node *np,
-    bool force_dma, const u32 *id)
+int __of_dma_configure(struct device *dev, struct device_node *np,
+    bool force_dma, const u32 *id,
+    struct of_phandle_args *iommu_spec)
  {
  const struct iommu_ops *iommu;
  const struct bus_dma_region *map = NULL;
@@ -170,7 +171,7 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,

  dev_dbg(dev, "device is%sdma coherent\n",
  coherent ? " " : " not ");
-    iommu = of_iommu_configure(dev, np, id);
+    iommu = of_iommu_configure(dev, np, id, iommu_spec);
  if (PTR_ERR(iommu) == -EPROBE_DEFER) {
  kfree(map);
  return -EPROBE_DEFER;
@@ -184,7 +185,7 @@ int of_dma_configure_id(struct device *dev, 

Re: [RFC PATCH 1/8] of/device: Allow specifying a custom iommu_spec to of_dma_configure

2021-02-16 Thread Robin Murphy

Hi Mikko,

On 2021-02-08 16:38, Mikko Perttunen wrote:

To allow for more customized device tree bindings that point to IOMMUs,
allow manual specification of iommu_spec to of_dma_configure.

The initial use case for this is with Host1x, where the driver manages
a set of device tree-defined IOMMU contexts that are dynamically
allocated to various users. These contexts don't correspond to
platform devices and are instead attached to dummy devices on a custom
software bus.


I'd suggest taking a closer look at the patches that made this 
of_dma_configure_id() in the first place, and the corresponding bus code 
in fsl-mc. At this level, Host1x sounds effectively identical to DPAA2 
in terms of being a bus of logical devices composed from bits of 
implicit behind-the-scenes hardware. I mean, compare your series title 
to the fact that their identifiers are literally named "Isolation 
Context ID" ;)


Please just use the existing mechanisms to describe a mapping between 
Host1x context IDs and SMMU Stream IDs, rather than what looks like a 
giant hacky mess here.


(This also reminds me I wanted to rip out all the PCI special-cases and 
convert pci_dma_configure() over to passing its own IDs too, so thanks 
for the memory-jog...)


Robin.


Signed-off-by: Mikko Perttunen 
---
  drivers/iommu/of_iommu.c  | 12 
  drivers/of/device.c   |  9 +
  include/linux/of_device.h | 34 +++---
  include/linux/of_iommu.h  |  6 --
  4 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index e505b9130a1c..3fefa6c63863 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -87,8 +87,7 @@ int of_get_dma_window(struct device_node *dn, const char 
*prefix, int index,
  }
  EXPORT_SYMBOL_GPL(of_get_dma_window);
  
-static int of_iommu_xlate(struct device *dev,

- struct of_phandle_args *iommu_spec)
+int of_iommu_xlate(struct device *dev, struct of_phandle_args *iommu_spec)
  {
const struct iommu_ops *ops;
struct fwnode_handle *fwnode = _spec->np->fwnode;
@@ -117,6 +116,7 @@ static int of_iommu_xlate(struct device *dev,
module_put(ops->owner);
return ret;
  }
+EXPORT_SYMBOL_GPL(of_iommu_xlate);
  
  static int of_iommu_configure_dev_id(struct device_node *master_np,

 struct device *dev,
@@ -177,7 +177,8 @@ static int of_iommu_configure_device(struct device_node 
*master_np,
  
  const struct iommu_ops *of_iommu_configure(struct device *dev,

   struct device_node *master_np,
-  const u32 *id)
+  const u32 *id,
+  struct of_phandle_args *iommu_spec)
  {
const struct iommu_ops *ops = NULL;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
@@ -209,7 +210,10 @@ const struct iommu_ops *of_iommu_configure(struct device 
*dev,
err = pci_for_each_dma_alias(to_pci_dev(dev),
 of_pci_iommu_init, );
} else {
-   err = of_iommu_configure_device(master_np, dev, id);
+   if (iommu_spec)
+   err = of_iommu_xlate(dev, iommu_spec);
+   else
+   err = of_iommu_configure_device(master_np, dev, id);
  
  		fwspec = dev_iommu_fwspec_get(dev);

if (!err && fwspec)
diff --git a/drivers/of/device.c b/drivers/of/device.c
index aedfaaafd3e7..84ada2110c5b 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -88,8 +88,9 @@ int of_device_add(struct platform_device *ofdev)
   * can use a platform bus notifier and handle BUS_NOTIFY_ADD_DEVICE events
   * to fix up DMA configuration.
   */
-int of_dma_configure_id(struct device *dev, struct device_node *np,
-   bool force_dma, const u32 *id)
+int __of_dma_configure(struct device *dev, struct device_node *np,
+   bool force_dma, const u32 *id,
+   struct of_phandle_args *iommu_spec)
  {
const struct iommu_ops *iommu;
const struct bus_dma_region *map = NULL;
@@ -170,7 +171,7 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
dev_dbg(dev, "device is%sdma coherent\n",
coherent ? " " : " not ");
  
-	iommu = of_iommu_configure(dev, np, id);

+   iommu = of_iommu_configure(dev, np, id, iommu_spec);
if (PTR_ERR(iommu) == -EPROBE_DEFER) {
kfree(map);
return -EPROBE_DEFER;
@@ -184,7 +185,7 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
dev->dma_range_map = map;
return 0;
  }
-EXPORT_SYMBOL_GPL(of_dma_configure_id);
+EXPORT_SYMBOL_GPL(__of_dma_configure);
  
  int of_device_register(struct platform_device *pdev)

  {
diff --git 

Re: [PATCH 0/3] mm/page_alloc: Fix pageblock_order with HUGETLB_PAGE_SIZE_VARIABLE

2021-02-16 Thread David Hildenbrand



But again, if there are valid use cases then sure, let's make the code fully 
compatible with HUGETLB_PAGE_ORDER > MAX_ORDER.


Given that gigantic HugeTLB allocation can fallback on alloc_contig_pages()
or CMA if/when available, is there a real need for HUGETLB_PAGE_ORDER to be
upto MAX_ORDER, used as pageblock_order or as COMPACTION_HPAGE_ORDER ? With
gigantic HugeTLB pages being available, HUGETLB_PAGE_ORDER seems to be just
detached from the buddy allocator. But I am not sure, will keep looking.


Having HPAGE_PMD_ORDER >>= MAX_ORDER ("significantly larger") will make 
it very unlikely that you are able to reliably allocate any huge pages 
at all dynamically at runtime without CMA.


Gigantic pages are problematic by nature :)

--
Thanks,

David / dhildenb

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] mm/page_alloc: Fix pageblock_order with HUGETLB_PAGE_SIZE_VARIABLE

2021-02-16 Thread Anshuman Khandual

On 2/12/21 3:09 PM, David Hildenbrand wrote:
> On 12.02.21 08:02, Anshuman Khandual wrote:
>>
>> On 2/11/21 2:07 PM, David Hildenbrand wrote:
>>> On 11.02.21 07:22, Anshuman Khandual wrote:
 The following warning gets triggered while trying to boot a 64K page size
 without THP config kernel on arm64 platform.

 WARNING: CPU: 5 PID: 124 at mm/vmstat.c:1080 
 __fragmentation_index+0xa4/0xc0
 Modules linked in:
 CPU: 5 PID: 124 Comm: kswapd0 Not tainted 5.11.0-rc6-4-ga0ea7d62002 
 #159
 Hardware name: linux,dummy-virt (DT)
 [    8.810673] pstate: 2045 (nzCv daif +PAN -UAO -TCO BTYPE=--)
 [    8.811732] pc : __fragmentation_index+0xa4/0xc0
 [    8.812555] lr : fragmentation_index+0xf8/0x138
 [    8.813360] sp : 864079b0
 [    8.813958] x29: 864079b0 x28: 0372
 [    8.814901] x27: 7682 x26: 8000135b3948
 [    8.815847] x25: 1fffe00010c80f48 x24: 
 [    8.816805] x23:  x22: 000d
 [    8.817764] x21: 0030 x20: 0005ffcb4d58
 [    8.818712] x19: 000b x18: 
 [    8.819656] x17:  x16: 
 [    8.820613] x15:  x14: 8000114c6258
 [    8.821560] x13: 6000bff969ba x12: 1fffe000bff969b9
 [    8.822514] x11: 1fffe000bff969b9 x10: 6000bff969b9
 [    8.823461] x9 : dfff8000 x8 : 0005ffcb4dcf
 [    8.824415] x7 : 0001 x6 : 41b58ab3
 [    8.825359] x5 : 600010c80f48 x4 : dfff8000
 [    8.826313] x3 : 8000102be670 x2 : 0007
 [    8.827259] x1 : 86407a60 x0 : 000d
 [    8.828218] Call trace:
 [    8.828667]  __fragmentation_index+0xa4/0xc0
 [    8.829436]  fragmentation_index+0xf8/0x138
 [    8.830194]  compaction_suitable+0x98/0xb8
 [    8.830934]  wakeup_kcompactd+0xdc/0x128
 [    8.831640]  balance_pgdat+0x71c/0x7a0
 [    8.832327]  kswapd+0x31c/0x520
 [    8.832902]  kthread+0x224/0x230
 [    8.833491]  ret_from_fork+0x10/0x30
 [    8.834150] ---[ end trace 472836f79c15516b ]---

 This warning comes from __fragmentation_index() when the requested order
 is greater than MAX_ORDER.

 static int __fragmentation_index(unsigned int order,
   struct contig_page_info *info)
 {
   unsigned long requested = 1UL << order;

   if (WARN_ON_ONCE(order >= MAX_ORDER)) <= Triggered here
   return 0;

 Digging it further reveals that pageblock_order has been assigned a value
 which is greater than MAX_ORDER failing the above check. But why this
 happened ? Because HUGETLB_PAGE_ORDER for the given config on arm64 is
 greater than MAX_ORDER.

 The solution involves enabling HUGETLB_PAGE_SIZE_VARIABLE which would make
 pageblock_order a variable instead of constant HUGETLB_PAGE_ORDER. But that
 change alone also did not really work as pageblock_order still got assigned
 as HUGETLB_PAGE_ORDER in set_pageblock_order(). HUGETLB_PAGE_ORDER needs to
 be less than MAX_ORDER for its appropriateness as pageblock_order otherwise
 just fallback to MAX_ORDER - 1 as before. While here it also fixes a build
 problem via type casting MAX_ORDER in rmem_cma_setup().
>>>
>>> I'm wondering, is there any real value in allowing FORCE_MAX_ZONEORDER to 
>>> be "11" with ARM64_64K_PAGES/ARM64_16K_PAGES?
>>
>> MAX_ORDER should be as high as would be required for the current config.
>> Unless THP is enabled, there is no need for it to be any higher than 11.
>> But I might be missing historical reasons around this as well. Probably
>> others from arm64 could help here.
> 
> Theoretically yes, practically no. If nobody cares about a configuration, no 
> need to make the code more complicated for that configuration.
> 
>>
>>>
>>> Meaning: are there any real use cases that actually build a kernel without 
>>> TRANSPARENT_HUGEPAGE and with ARM64_64K_PAGES/ARM64_16K_PAGES?
>>
>> THP is always optional. Besides kernel builds without THP should always
>> be supported. Assuming that all builds will have THP enabled, might not
>> be accurate.
>>
>>>
>>> As builds are essentially broken, I assume this is not that relevant? Or 
>>> how long has it been broken?
>>
>> Git blame shows that it's been there for some time now. But how does
>> that make this irrelevant ? A problem should be fixed nonetheless.
> 
> When exactly did I say not to fix it? I'm saying if nobody uses it, we might 
> be able to simplify.

I might have misunderstood the relevance of a problematic configuration
wrt the need for it to be fixed. My bad, never mind :)

> 
>>
>>>
>>> It might be easier to just drop the "TRANSPARENT_HUGEPAGE" part from the 
>>> FORCE_MAX_ZONEORDER config.
>>>
>>
>> Not sure if it would be a good idea to unnecessarily 

Re: [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous

2021-02-16 Thread Christoph Hellwig
On Tue, Feb 16, 2021 at 05:14:55PM +0900, Tomasz Figa wrote:
> When working on the videobuf2 integration with Sergey I noticed that
> we always pass 0 as DMA attrs here, which removes the ability for
> drivers to use DMA_ATTR_ALLOC_SINGLE_PAGES.
> 
> It's quite important from a system stability point of view, because by
> default the iommu_dma allocator would prefer big order allocations for
> TLB locality reasons. For many devices, though, it doesn't really
> affect the performance, because of random access patterns, so single
> pages are good enough and reduce the risk of allocation failures or
> latency due to fragmentation.
> 
> Do you think we could add the attrs parameter to the
> dma_alloc_noncontiguous() API?

Yes, we could probably do that.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous

2021-02-16 Thread Tomasz Figa
Hi Christoph


On Tue, Feb 2, 2021 at 6:51 PM Christoph Hellwig  wrote:
>
> Implement support for allocating a non-contiguous DMA region.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/dma-iommu.c | 35 +++
>  1 file changed, 35 insertions(+)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 85cb004d7a44c6..4e0b170d38d57a 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -718,6 +718,7 @@ static struct page 
> **__iommu_dma_alloc_noncontiguous(struct device *dev,
> goto out_free_sg;
>
> sgt->sgl->dma_address = iova;
> +   sgt->sgl->dma_length = size;
> return pages;
>
>  out_free_sg:
> @@ -755,6 +756,36 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> size_t size,
> return NULL;
>  }
>
> +#ifdef CONFIG_DMA_REMAP
> +static struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev,
> +   size_t size, enum dma_data_direction dir, gfp_t gfp)
> +{
> +   struct dma_sgt_handle *sh;
> +
> +   sh = kmalloc(sizeof(*sh), gfp);
> +   if (!sh)
> +   return NULL;
> +
> +   sh->pages = __iommu_dma_alloc_noncontiguous(dev, size, >sgt, gfp,
> +   PAGE_KERNEL, 0);

When working on the videobuf2 integration with Sergey I noticed that
we always pass 0 as DMA attrs here, which removes the ability for
drivers to use DMA_ATTR_ALLOC_SINGLE_PAGES.

It's quite important from a system stability point of view, because by
default the iommu_dma allocator would prefer big order allocations for
TLB locality reasons. For many devices, though, it doesn't really
affect the performance, because of random access patterns, so single
pages are good enough and reduce the risk of allocation failures or
latency due to fragmentation.

Do you think we could add the attrs parameter to the
dma_alloc_noncontiguous() API?

Best regards,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu