Re: [PATCH 0/3] Apple M1 DART IOMMU driver

2021-03-21 Thread Sven Peter via iommu



Hi Mark,

Sorry for the spam if you get this message twice. This is pretty embarrassing
but I've just switched mail providers after ProtonMail messed up yesterday and
it looks like my new one defaulted to sending HTML messages even though I only
typed plaintext. This shouldn't have happened in the first place but it
certainly shouldn't happen again now :-(

> On 21. Mar 2021, at 17:00, Mark Kettenis  wrote:
> 
> I don't think the first option is going to work for PCIe.  PCIe
> devices will have to use "iommu-map" properties to map PCI devices to
> the right iommu, and the current implementation seems to assume that
> #iommu-cells = <1>.  The devicetree binding[1] doesn't explicitly state
> that it relies on #iommu-cells = <1>, but it isn't clear how the
> rid-base to iommu-base mapping mechanism would work when that isn't
> the case.
> 
> Now the PCIe DARTs are simpler and seem to have only one "instance"
> per DART.  So if we keep #iommu-cells = <1> for those, you'd still be
> fine using the first approach.

Good point, I guess that only leaves option two for now then.
Having some DARTs use cells = <1> and others <2> sounds confusing to me.


> 
> As I mentioned before, not all DARTs support the full 32-bit aperture.
> In particular the PCIe DARTs support a smaller address-space.  It is
> not clear whether this is a restriction of the PCIe host controller or
> the DART, but the Apple Device Tree has "vm-base" and "vm-size"
> properties that encode the base address and size of the aperture.
> These are single-cell properties, which is probably why for the USB DARTs
> only "vm-base" is given; since "vm-base" is 0, a 32-bit number
> wouldn't be able to encode the full aperture size.  We could make them
> 64-bit numbers in the Linux device tree though and always be explicit
> about the size.  Older Sun SPARC machines used a single "virtual-dma"
> property to encode the aperture.  We could do something similar.  You
> would use this property to initialize domain->geometry.aperture_start
> and domain->geometry.aperture_end in diff 3/3 of this series.
> 
> I think it would make sense to include this in this series, as this
> would make adding support for PCIe very easy, and PCIe gives you
> support for network (both wired and wireless) and the type-A USB ports
> on the mini.



Agreed, I'd ideally like to converge on a device tree binding
that won't have to change in the near future.

I've tried to use an address space larger than 32bit and that seems to
work for parts of the dwc3 controller but breaks for the xhci parts because
the upper lines don't seem to be connected there (e.g. if xhci tries to
use <0x1 0x> I get a fault for <0 0x>).

Looking at other iommu drivers I have found the following two similar
bindings:

qcom uses a ranges property with a 64bit address and 32 bit size [1]

  apps_iommu: iommu@1e20000 {
  ...
  ranges = <0 0x1e20000 0x40000>;
  ...
  };

and tegra seems to support a dma-window property with 32bit address
and size [2]

  smmu {
  [...]
  dma-window = <0 0x40000000>;   /* IOVA start & length */
  [...]
  };

I believe there already is of_get_dma_window to handle parsing this
in the common iommu code [3] but I can't find any place using it.
It's a little bit more complex than we need since it allows specifying the
number of cells for both the address and the size but it should allow us to
express all possible configurations:

  usb_dart {
  [ ... ]
  #dma-address-cells = <1>;
  #dma-size-cells = <2>;
  dma-window = <0 0x1 0x0>;
  [ ... ]
  };
  pcie_dart {
  [ ... ]
  #dma-address-cells = <1>;
  #dma-size-cells = <1>;
  dma-window = <0x100000 0x3fe00000>;
  [ ... ]
  };

where #dma-address-cells and #dma-size-cells default to
#address-cells and #size-cells respectively if I understand
the code correctly. That way we could also just always use
a 64bit address and size in the DT, e.g.

  pcie_dart {
  [ ... ]
  dma-window = <0 0x100000 0 0x3fe00000>;
  [ ... ]
  };
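
To make that concrete, here is a rough sketch of how the DART driver could
parse such a dma-window property and feed it into the domain geometry
(untested; the helper name and the fallback to of_n_addr_cells() /
of_n_size_cells() are my assumptions, this is not of_get_dma_window() itself):

  /* Hypothetical helper, for illustration only; needs <linux/of.h> */
  static int dart_parse_dma_window(struct device_node *np, u64 *base, u64 *size)
  {
          const __be32 *prop;
          u32 cells;
          int na, ns, len;

          /* #dma-address-cells / #dma-size-cells, falling back to the
           * usual address/size cells of the node */
          if (of_property_read_u32(np, "#dma-address-cells", &cells))
                  na = of_n_addr_cells(np);
          else
                  na = cells;

          if (of_property_read_u32(np, "#dma-size-cells", &cells))
                  ns = of_n_size_cells(np);
          else
                  ns = cells;

          prop = of_get_property(np, "dma-window", &len);
          if (!prop || len < (na + ns) * sizeof(__be32))
                  return -ENODEV;

          *base = of_read_number(prop, na);
          *size = of_read_number(prop + na, ns);
          return 0;
  }

The result would then end up in domain->geometry.aperture_start and
domain->geometry.aperture_end, as you suggested for diff 3/3.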


Best,

Sven


[1] Documentation/devicetree/bindings/iommu/qcom,iommu.txt
[2] Documentation/devicetree/bindings/iommu/nvidia,tegra30-smmu.txt
[3] drivers/iommu/of_iommu.c


Re: [PATCH 0/3] Apple M1 DART IOMMU driver

2021-03-21 Thread Mark Kettenis
> Date: Sat, 20 Mar 2021 15:19:33 +0000
> From: Sven Peter 
> 
> Hi,
> 
> After Hector's initial work [1] to bring up Linux on Apple's M1 it's time to
> bring up more devices. Most peripherals connected to the SoC are behind an
> iommu which Apple calls "Device Address Resolution Table", or DART for short [2].
> Unfortunately, it only shares the name with PowerPC's DART.
> Configuring this iommu is mandatory if these peripherals require DMA access.
> 
> This patchset implements initial support for this iommu. The hardware itself
> uses a pagetable format that's very similar to the one already implemented in
> io-pgtable.c. There are some minor modifications, namely some details of the
> PTE format and that there are always three pagetable levels, which I've
> implemented as a new format variant.
> 
> I have mainly tested this with the USB controller in device mode which is
> compatible with Linux's dwc3 driver. Some custom PHY initialization (which is
> not yet ready or fully understood) is required though to bring up the ports,
> see e.g. my patches to our m1n1 bootloader [3,4]. If you want to test the same
> setup you will probably need that branch for now and add the nodes from
> the DT binding specification example to your device tree.
> 
> Even though each DART instance could support up to 16 devices, usually only
> a single device is actually connected. Different devices generally just use
> an entirely separate DART instance with a separate MMIO range, IRQ, etc.
> 
> I have just noticed today though that at least the USB DWC3 controller in host
> mode uses *two* darts at the same time. I'm not sure yet which parts seem to
> require which DART instance.
> 
> This means that we might need to support devices attached to two iommus
> simultaneously and just create the same iova mappings. Currently this only
> seems to be required for USB according to Apple's Device Tree.
> 
> I see two options for this and would like to get feedback before
> I implement either one:
> 
> 1) Change #iommu-cells = <1>; to #iommu-cells = <2>; and use the first cell
>to identify the DART and the second one to identify the master.
>The DART DT node would then also take two register ranges that would
>correspond to the two DARTs. Both instances use the same IRQ and the
>same clocks according to Apple's device tree and my experiments.
>This would keep a single device node and the DART driver would then
>simply map iovas in both DARTs if required.
> 
> 2) Keep #iommu-cells as-is but support
> iommus = <&usb_dart1a 1>, <&usb_dart1b 0>;
>instead.
>This would then require two device nodes for the two DART instances and
>some housekeeping in the DART driver to support mapping iovas in both
>DARTs.
>I believe omap-iommu.c supports this setup but I will have to read
>more code to understand the details there and figure out how to implement
>this in a sane way.
> 
> I currently prefer the first option but I don't understand enough details of
> the iommu system to actually make an informed decision.
> I'm obviously also open to more options :-)

Hi Sven,

I don't think the first option is going to work for PCIe.  PCIe
devices will have to use "iommu-map" properties to map PCI devices to
the right iommu, and the current implementation seems to assume that
#iommu-cells = <1>.  The devicetree binding[1] doesn't explicitly state
that it relies on #iommu-cells = <1>, but it isn't clear how the
rid-base to iommu-base mapping mechanism would work when that isn't
the case.

Now the PCIe DARTs are simpler and seem to have only one "instance"
per DART.  So if we keep #iommu-cells = <1> for those, you'd still be
fine using the first approach.

As I mentioned before, not all DARTs support the full 32-bit aperture.
In particular the PCIe DARTs support a smaller address-space.  It is
not clear whether this is a restriction of the PCIe host controller or
the DART, but the Apple Device Tree has "vm-base" and "vm-size"
properties that encode the base address and size of the aperture.
These are single-cell properties, which is probably why for the USB DARTs
only "vm-base" is given; since "vm-base" is 0, a 32-bit number
wouldn't be able to encode the full aperture size.  We could make them
64-bit numbers in the Linux device tree though and always be explicit
about the size.  Older Sun SPARC machines used a single "virtual-dma"
property to encode the aperture.  We could do something similar.  You
would use this property to initialize domain->geometry.aperture_start
and domain->geometry.aperture_end in diff 3/3 of this series.
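
Something along these lines in the DART driver, just to illustrate (the
variable names are made up, and I'm assuming your domain struct embeds the
generic iommu_domain as "domain"):

  /* vm_base/vm_size as parsed from the property discussed above */
  dart_domain->domain.geometry.aperture_start = vm_base;
  dart_domain->domain.geometry.aperture_end   = vm_base + vm_size - 1;
  dart_domain->domain.geometry.force_aperture = true;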

I think it would make sense to include this in this series, as this
would make adding support for PCIe very easy, and PCIe gives you
support for network (both wired and wireless) and the type-A USB ports
on the mini.

Cheers,

Mark

[1] Documentation/devicetree/bindings/pci/pci-iommu.txt

Re: [PATCH 0/3] Apple M1 DART IOMMU driver

2021-03-21 Thread Mark Kettenis
> Date: Sun, 21 Mar 2021 18:22:15 +0100
> From: "Sven Peter" 
> 
> Hi Mark,
> 
>  On 21. Mar 2021, at 17:00, Mark Kettenis 
>  wrote:
> 
>  I don't think the first option is going to work for PCIe.  PCIe
>  devices will have to use "iommu-map" properties to map PCI devices to
>  the right iommu, and the current implementation seems to assume that
>  #iommu-cells = <1>.  The devicetree binding[1] doesn't explicitly state
>  that it relies on #iommu-cells = <1>, but it isn't clear how the
>  rid-base to iommu-base mapping mechanism would work when that isn't
>  the case.
> 
>  Now the PCIe DARTs are simpler and seem to have only one "instance"
>  per DART.  So if we keep #iommu-cells = <1> for those, you'd still be
>  fine using the first approach.
> 
> Good point, I guess that only leaves option two for now then.
> Having some DARTs use cells = <1> and others <2> sounds confusing to me.

Guess we do need to understand a little bit better how the USB DART
actually works.  My hypothesis (based on our discussion on #asahi) is
that the XHCI host controller and the peripheral controller of the
DWC3 block use different DMA "streams" that are handled by the
different sub-DARTs.

The Corellium folks use a DART + sub-DART model in their driver and a
single node in the device tree that represents both.  That might make sense
since the error registers and interrupts are shared.  Maybe it would
make sense to select the appropriate sub-DART based on the DMA stream
ID?

>  As I mentioned before, not all DARTs support the full 32-bit aperture.
>  In particular the PCIe DARTs support a smaller address-space.  It is
>  not clear whether this is a restriction of the PCIe host controller or
>  the DART, but the Apple Device Tree has "vm-base" and "vm-size"
>  properties that encode the base address and size of the aperture.
>  These are single-cell properties, which is probably why for the USB DARTs
>  only "vm-base" is given; since "vm-base" is 0, a 32-bit number
>  wouldn't be able to encode the full aperture size.  We could make them
>  64-bit numbers in the Linux device tree though and always be explicit
>  about the size.  Older Sun SPARC machines used a single "virtual-dma"
>  property to encode the aperture.  We could do someting similar.  You
>  would use this property to initialize domain->geometry.aperture_start
>  and domain->geometry.aperture_end in diff 3/3 of this series.
> 
>  I think it would make sense to include this in this series, as this
>  would make adding support for PCIe very easy, and PCIe gives you
>  support for network (both wired and wireless) and the type-A USB ports
>  on the mini.
> 
> Agreed, I'd ideally like to converge on a device tree binding
> that won't have to change in the near future.
> 
> I've tried to use an address space larger than 32bit and that seems to
> work for parts of the dwc3 controller but breaks for the xhci parts because
> the upper lines don't seem to be connected there (e.g. if xhci tries to
> use <0x1 0x> I get a fault for <0 0x>).
> 
> Looking at other iommu drivers I have found the following two similar
> bindings:

Ah, missed those.

> qcom uses a ranges property with a 64bit address and 32 bit size [1]
> 
>   apps_iommu: iommu@1e20000 {
>   ...
>   ranges = <0 0x1e20000 0x40000>;
>   ...
>   };

Doesn't sound like a very good idea to me to repurpose "ranges" for
this...

> 
> and tegra seems to support a dma-window property with 32bit address
> and size [2]
> 
>   smmu {
>   [...]
>   dma-window = <0 0x40000000>;   /* IOVA start & length */
>   [...]
>   };
> 
> I believe there already is of_get_dma_window to handle parsing this
> in the common iommu code [3] but I can't find any place using it.
> It's a little bit more complex than we need since it allows specifying the
> number of cells for both the address and the size but it should allow us to
> express all possible configurations:
> 
>   usb_dart {
>   [ ... ]
>   #dma-address-cells = <1>;
>   #dma-size-cells = <2>;
>   dma-window = <0 0x1 0x0>;
>   [ ... ]
>   };
>   pcie_dart {
>   [ ... ]
>   #dma-address-cells = <1>;
>   #dma-size-cells = <1>;
>   dma-window = <0x100000 0x3fe00000>;
>   [ ... ]
>   };
> 
> where #dma-address-cells and #dma-size-cells default to
> #address-cells and #size-cells respectively if I understand
> the code correctly. That way we could also just always use
> a 64bit address and size in the DT, e.g.
> 
>   pcie_dart {
>   [ ... ]
>   dma-window = <0 0x100000 0 0x3fe00000>;
>   [ ... ]
>   };

That sounds like a serious contender to me!  Hopefully one of the
Linux kernel developers can give this some sort of blessing.

I think it would make sense for us to just rely on the #address-cells
and #size-cells defaults for the M1 device tree.

RE: A problem of Intel IOMMU hardware ?

2021-03-21 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Nadav,

> -Original Message-
> From: Nadav Amit [mailto:nadav.a...@gmail.com]
> Sent: Friday, March 19, 2021 12:46 AM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> 
> Cc: Tian, Kevin ; chenjiashang
> ; David Woodhouse ;
> iommu@lists.linux-foundation.org; LKML ;
> alex.william...@redhat.com; Gonglei (Arei) ;
> w...@kernel.org
> Subject: Re: A problem of Intel IOMMU hardware ?
> 
> 
> 
> > On Mar 18, 2021, at 2:25 AM, Longpeng (Mike, Cloud Infrastructure Service
> Product Dept.)  wrote:
> >
> >
> >
> >> -Original Message-
> >> From: Tian, Kevin [mailto:kevin.t...@intel.com]
> >> Sent: Thursday, March 18, 2021 4:56 PM
> >> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >> ; Nadav Amit 
> >> Cc: chenjiashang ; David Woodhouse
> >> ; iommu@lists.linux-foundation.org; LKML
> >> ; alex.william...@redhat.com; Gonglei
> >> (Arei) ; w...@kernel.org
> >> Subject: RE: A problem of Intel IOMMU hardware ?
> >>
> >>> From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >>> 
> >>>
>  -Original Message-
>  From: Tian, Kevin [mailto:kevin.t...@intel.com]
>  Sent: Thursday, March 18, 2021 4:27 PM
>  To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
>  ; Nadav Amit 
>  Cc: chenjiashang ; David Woodhouse
>  ; iommu@lists.linux-foundation.org; LKML
>  ; alex.william...@redhat.com; Gonglei
> >>> (Arei)
>  ; w...@kernel.org
>  Subject: RE: A problem of Intel IOMMU hardware ?
> 
> > From: iommu  On Behalf
> > Of Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >
> >> 2. Consider ensuring that the problem is not somehow related to
> >> queued invalidations. Try to use __iommu_flush_iotlb() instead of
>  qi_flush_iotlb().
> >>
> >
> > I tried to force to use __iommu_flush_iotlb(), but maybe something
> > wrong, the system crashed, so I prefer to lower the priority of
> > this
> >>> operation.
> >
> 
>  The VT-d spec clearly says that register-based invalidation can be
>  used only
> >>> when
>  queued-invalidations are not enabled. Intel-IOMMU driver doesn't
>  provide
> >>> an
>  option to disable queued-invalidation though, when the hardware is
> >>> capable. If you
>  really want to try, tweak the code in intel_iommu_init_qi.
> 
> >>>
> >>> Hi Kevin,
> >>>
> >>> Thanks to point out this. Do you have any ideas about this problem ?
> >>> I tried to descript the problem much clear in my reply to Alex, hope
> >>> you could have a look if you're interested.
> >>>
> >>
> >> btw I saw you used 4.18 kernel in this test. What about latest kernel?
> >>
> >
> > Not test yet. It's hard to upgrade kernel in our environment.
> >
> >> Also one way to separate sw/hw bug is to trace the low level
> >> interface (e.g.,
> >> qi_flush_iotlb) which actually sends invalidation descriptors to the
> >> IOMMU hardware. Check the window between b) and c) and see whether
> >> the software does the right thing as expected there.
> >>
> >
> > We add some log in iommu driver these days, the software seems fine.
> > But we didn't look inside the qi_submit_sync yet, I'll try it tonight.
> 
> So here is my guess:
> 
> Intel probably used as a basis for the IOTLB an implementation of some other
> (regular) TLB design.
> 
> Intel SDM says regarding TLBs (4.10.4.2 “Recommended Invalidation”):
> 
> "Software wishing to prevent this uncertainty should not write to a
> paging-structure entry in a way that would change, for any linear address, 
> both the
> page size and either the page frame, access rights, or other attributes.”
> 
> 
> Now the aforementioned uncertainty is a bit different (multiple
> *valid* translations of a single address). Yet, perhaps this is yet another 
> thing that
> might happen.
> 
> From a brief look on the handling of MMU (not IOMMU) hugepages in Linux, 
> indeed
> the PMD is first cleared and flushed before a new valid PMD is set. This is 
> possible
> for MMUs since they allow the software to handle spurious page-faults 
> gracefully.
> This is not the case for the IOMMU though (without PRI).
> 

But in my case, the flush_iotlb is called after the range of (0x0, 0xa0000) is
unmapped, and I've no idea why this invalidation isn't effective, except that I
haven't looked inside the qi yet; there are no complaints from the driver.

Could you please point out the MMU code you mentioned above? In the MMU code, is
it possible that all the PTE entries are not-present while the PMD entry
is still present?
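
From your description, my understanding of the MMU ordering is roughly the
following (just a sketch based on a quick look at the THP collapse path, the
helper names may not be exact):

	/* 1. clear the PMD and flush the TLB: no translation, old or new,
	 *    is visible while the PMD is empty */
	_pmd = pmdp_collapse_flush(vma, address, pmd);
	/* ... prepare the new huge page while nothing is mapped ... */
	/* 2. only now install the new huge-page PMD */
	set_pmd_at(mm, address, pmd, new_huge_pmd);

Is that the ordering you mean?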

*Page table after (0x0, 0xa0000) is unmapped:
 PML4: 0x1a34fbb003
  PDPE: 0x1a34fbb003
   PDE: 0x1a34fbf003
    PTE: 0x0

*Page table after (0x0, 0xc0000000) is mapped:
 PML4: 0x1a34fbb003
  PDPE: 0x1a34fbb003
   PDE: 0x15ec00883

> Not sure this explains everything though. If that is the problem, then during 
> a
> mapping that changes page-sizes, a TLB flush is 

Re: [PATCH 2/3] dt-bindings: iommu: add DART iommu bindings

2021-03-21 Thread Rob Herring
On Sat, 20 Mar 2021 15:20:08 +, Sven Peter wrote:
> DART (Device Address Resolution Table) is the iommu found on Apple
> ARM SoCs such as the M1.
> 
> Signed-off-by: Sven Peter 
> ---
>  .../bindings/iommu/apple,t8103-dart.yaml  | 82 +++
>  MAINTAINERS   |  6 ++
>  2 files changed, 88 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/iommu/apple,t8103-dart.yaml
> 

My bot found errors running 'make dt_binding_check' on your patch:

yamllint warnings/errors:

dtschema/dtc warnings/errors:
Error: 
Documentation/devicetree/bindings/iommu/apple,t8103-dart.example.dts:23.25-26 
syntax error
FATAL ERROR: Unable to parse input tree
make[1]: *** [scripts/Makefile.lib:349: 
Documentation/devicetree/bindings/iommu/apple,t8103-dart.example.dt.yaml] Error 
1
make[1]: *** Waiting for unfinished jobs
make: *** [Makefile:1380: dt_binding_check] Error 2

See https://patchwork.ozlabs.org/patch/1456122

This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.



RE: A problem of Intel IOMMU hardware ?

2021-03-21 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)


> -Original Message-
> From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> Sent: Monday, March 22, 2021 7:51 AM
> To: 'Nadav Amit' 
> Cc: Tian, Kevin ; chenjiashang
> ; David Woodhouse ;
> iommu@lists.linux-foundation.org; LKML ;
> alex.william...@redhat.com; Gonglei (Arei) ;
> w...@kernel.org; 'Lu Baolu' ; 'Joerg Roedel'
> 
> Subject: RE: A problem of Intel IOMMU hardware ?
> 
> Hi Nadav,
> 
> > -Original Message-
> > From: Nadav Amit [mailto:nadav.a...@gmail.com]
> > Sent: Friday, March 19, 2021 12:46 AM
> > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > 
> > Cc: Tian, Kevin ; chenjiashang
> > ; David Woodhouse ;
> > iommu@lists.linux-foundation.org; LKML ;
> > alex.william...@redhat.com; Gonglei (Arei) ;
> > w...@kernel.org
> > Subject: Re: A problem of Intel IOMMU hardware ?
> >
> >
> >
> > > On Mar 18, 2021, at 2:25 AM, Longpeng (Mike, Cloud Infrastructure
> > > Service
> > Product Dept.)  wrote:
> > >
> > >
> > >
> > >> -Original Message-
> > >> From: Tian, Kevin [mailto:kevin.t...@intel.com]
> > >> Sent: Thursday, March 18, 2021 4:56 PM
> > >> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > >> ; Nadav Amit 
> > >> Cc: chenjiashang ; David Woodhouse
> > >> ; iommu@lists.linux-foundation.org; LKML
> > >> ; alex.william...@redhat.com; Gonglei
> > >> (Arei) ; w...@kernel.org
> > >> Subject: RE: A problem of Intel IOMMU hardware ?
> > >>
> > >>> From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > >>> 
> > >>>
> >  -Original Message-
> >  From: Tian, Kevin [mailto:kevin.t...@intel.com]
> >  Sent: Thursday, March 18, 2021 4:27 PM
> >  To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >  ; Nadav Amit 
> >  Cc: chenjiashang ; David Woodhouse
> >  ; iommu@lists.linux-foundation.org; LKML
> >  ; alex.william...@redhat.com;
> >  Gonglei
> > >>> (Arei)
> >  ; w...@kernel.org
> >  Subject: RE: A problem of Intel IOMMU hardware ?
> > 
> > > From: iommu  On Behalf
> > > Of Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > >
> > >> 2. Consider ensuring that the problem is not somehow related to
> > >> queued invalidations. Try to use __iommu_flush_iotlb() instead
> > >> of
> >  qi_flush_iotlb().
> > >>
> > >
> > > I tried to force to use __iommu_flush_iotlb(), but maybe
> > > something wrong, the system crashed, so I prefer to lower the
> > > priority of this
> > >>> operation.
> > >
> > 
> >  The VT-d spec clearly says that register-based invalidation can
> >  be used only
> > >>> when
> >  queued-invalidations are not enabled. Intel-IOMMU driver doesn't
> >  provide
> > >>> an
> >  option to disable queued-invalidation though, when the hardware
> >  is
> > >>> capable. If you
> >  really want to try, tweak the code in intel_iommu_init_qi.
> > 
> > >>>
> > >>> Hi Kevin,
> > >>>
> > >>> Thanks to point out this. Do you have any ideas about this problem ?
> > >>> I tried to descript the problem much clear in my reply to Alex,
> > >>> hope you could have a look if you're interested.
> > >>>
> > >>
> > >> btw I saw you used 4.18 kernel in this test. What about latest kernel?
> > >>
> > >
> > > Not test yet. It's hard to upgrade kernel in our environment.
> > >
> > >> Also one way to separate sw/hw bug is to trace the low level
> > >> interface (e.g.,
> > >> qi_flush_iotlb) which actually sends invalidation descriptors to
> > >> the IOMMU hardware. Check the window between b) and c) and see
> > >> whether the software does the right thing as expected there.
> > >>
> > >
> > > We add some log in iommu driver these days, the software seems fine.
> > > But we didn't look inside the qi_submit_sync yet, I'll try it tonight.
> >
> > So here is my guess:
> >
> > Intel probably used as a basis for the IOTLB an implementation of some
> > other
> > (regular) TLB design.
> >
> > Intel SDM says regarding TLBs (4.10.4.2 “Recommended Invalidation”):
> >
> > "Software wishing to prevent this uncertainty should not write to a
> > paging-structure entry in a way that would change, for any linear
> > address, both the page size and either the page frame, access rights, or 
> > other
> attributes.”
> >
> >
> > Now the aforementioned uncertainty is a bit different (multiple
> > *valid* translations of a single address). Yet, perhaps this is yet
> > another thing that might happen.
> >
> > From a brief look on the handling of MMU (not IOMMU) hugepages in
> > Linux, indeed the PMD is first cleared and flushed before a new valid
> > PMD is set. This is possible for MMUs since they allow the software to 
> > handle
> spurious page-faults gracefully.
> > This is not the case for the IOMMU though (without PRI).
> >
> 
> But in my case, the flush_iotlb is called after the range of (0x0, 0xa0000) is
> unmapped, I've no idea why this invalidation isn't effective except I've not 

Re: [PATCH] iommu/dma: Fix a typo in a comment

2021-03-21 Thread chenxiang (M)



On 2021/3/19 19:02, Robin Murphy wrote:

On 2021-03-19 01:02, chenxiang (M) wrote:

Hi,


On 2021/3/18 19:01, Robin Murphy wrote:

On 2021-03-18 09:55, chenxiang (M) wrote:

Hi will,


On 2021/3/18 17:34, Will Deacon wrote:

On Thu, Mar 18, 2021 at 11:21:24AM +0800, chenxiang wrote:

From: Xiang Chen 

Fix a type "SAC" to "DAC" in the comment of function
iommu_dma_alloc_iova().

Signed-off-by: Xiang Chen 
---
  drivers/iommu/dma-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index af765c8..3bc17ee 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -443,7 +443,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
iommu_domain *domain,

  if (domain->geometry.force_aperture)
  dma_limit = min(dma_limit, 
(u64)domain->geometry.aperture_end);

-/* Try to get PCI devices a SAC address */
+/* Try to get PCI devices a DAC address */
  if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
  iova = alloc_iova_fast(iovad, iova_len,
 DMA_BIT_MASK(32) >> shift, false);

This doesn't look like a typo to me... Please explain.


As the author of said comment, it is definitely not a typo.


What is SAC short for?


Single Address Cycle - this is standard terminology from the PCI spec.


Got it, thanks.
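
So if I read iommu_dma_alloc_iova() correctly, the idea is: first try to give
PCI devices an IOVA below 4G that a single address cycle can reach, and only
fall back to the full (DAC) limit if that fails, roughly (details may differ
between kernel versions):

	/* Try to get PCI devices a SAC address */
	if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
		iova = alloc_iova_fast(iovad, iova_len,
				       DMA_BIT_MASK(32) >> shift, false);

	/* otherwise fall back to the full limit, which may need DAC */
	if (!iova)
		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
				       true);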



Robin.

.





Re: [RFC PATCH 6/7] iommu/amd: Introduce amd_iommu_pgtable command-line option

2021-03-21 Thread Suravee Suthikulpanit

Joerg,

On 3/18/21 10:33 PM, Joerg Roedel wrote:

On Fri, Mar 12, 2021 at 03:04:10AM -0600, Suravee Suthikulpanit wrote:

To allow specifying whether to use the v1 or v2 IOMMU page table for
DMA remapping when calling the kernel DMA-API.

Signed-off-by: Suravee Suthikulpanit 
---
  Documentation/admin-guide/kernel-parameters.txt |  6 ++
  drivers/iommu/amd/init.c| 15 +++
  2 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..466e807369ea 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -319,6 +319,12 @@
 This mode requires kvm-amd.avic=1.
 (Default when IOMMU HW support is present.)
  
+	amd_iommu_pgtable= [HW,X86-64]

+   Specifies one of the following AMD IOMMU page table to
+   be used for DMA remapping for DMA-API:
+   v1 - Use v1 page table (Default)
+   v2 - Use v2 page table


Any reason v2 can not be the default when it is supported by the IOMMU?



Eventually, we should be able to default to v2. However, we will need to make
sure that the v2 implementation has performance comparable to the currently
used v1.
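
In the meantime the v2 table is strictly opt-in, e.g. by booting with

    amd_iommu_pgtable=v2

on the kernel command line.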

FYI: I'm also looking into adding support for SVA as well.

Thanks,
Suravee


Re: [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table

2021-03-21 Thread Keqian Zhu
Hi Eric,

On 2021/3/19 21:15, Auger Eric wrote:
> Hi Keqian,
> 
> On 3/2/21 9:35 AM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2021/2/24 4:56, Eric Auger wrote:
>>> On attach_pasid_table() we program STE S1 related info set
>>> by the guest into the actual physical STEs. At minimum
>>> we need to program the context descriptor GPA and compute
>>> whether the stage1 is translated/bypassed or aborted.
>>>
>>> On detach, the stage 1 config is unset and the abort flag is
>>> unset.
>>>
>>> Signed-off-by: Eric Auger 
>>>
>> [...]
>>
>>> +
>>> +   /*
>>> +* we currently support a single CD so s1fmt and s1dss
>>> +* fields are also ignored
>>> +*/
>>> +   if (cfg->pasid_bits)
>>> +   goto out;
>>> +
>>> +   smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>> only the "cdtab_dma" field of "cdcfg" is set, we are not able to locate a 
>> specific cd using arm_smmu_get_cd_ptr().
>>
>> Maybe we'd better use a specialized function to fill other fields of "cdcfg" 
>> or add a sanity check in arm_smmu_get_cd_ptr()
>> to prevent calling it under nested mode?
>>
>> As now we just call arm_smmu_get_cd_ptr() during finalise_s1(), no problem 
>> found. Just a suggestion ;-)
> 
> forgive me for the delay. yes I can indeed make sure that code is not
> called in nested mode. Please could you detail why you would need to
> call arm_smmu_get_cd_ptr()?
I accidentally called this function in nested mode when verifying the SMMU MPAM
feature. :)

Yes, in nested mode the context descriptor is owned by the guest, so the hypervisor
does not need to care about its content.
Maybe we'd better add an explicit comment to arm_smmu_get_cd_ptr() so that callers
pay attention to this? :)
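
Something as simple as this at the top of arm_smmu_get_cd_ptr() would also be
fine for me (just a sketch, the exact condition is up to you):

	/* the CD table is guest-owned in nested mode, the host must not
	 * walk it */
	if (WARN_ON(smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED))
		return NULL;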

Thanks,
Keqian

> 
> Thanks
> 
> Eric
>>
>> Thanks,
>> Keqian
>>
>>
>>> +   smmu_domain->s1_cfg.set = true;
>>> +   smmu_domain->abort = false;
>>> +   break;
>>> +   default:
>>> +   goto out;
>>> +   }
>>> +   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>>> +   list_for_each_entry(master, &smmu_domain->devices, domain_head)
>>> +   arm_smmu_install_ste_for_dev(master);
>>> +   spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>>> +   ret = 0;
>>> +out:
>>> +   mutex_unlock(&smmu_domain->init_mutex);
>>> +   return ret;
>>> +}
>>> +
>>> +static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>>> +{
>>> +   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>> +   struct arm_smmu_master *master;
>>> +   unsigned long flags;
>>> +
>>> +   mutex_lock(&smmu_domain->init_mutex);
>>> +
>>> +   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>>> +   goto unlock;
>>> +
>>> +   smmu_domain->s1_cfg.set = false;
>>> +   smmu_domain->abort = false;
>>> +
>>> +   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>>> +   list_for_each_entry(master, &smmu_domain->devices, domain_head)
>>> +   arm_smmu_install_ste_for_dev(master);
>>> +   spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>>> +
>>> +unlock:
>>> +   mutex_unlock(&smmu_domain->init_mutex);
>>> +}
>>> +
>>>  static bool arm_smmu_dev_has_feature(struct device *dev,
>>>  enum iommu_dev_features feat)
>>>  {
>>> @@ -2939,6 +3026,8 @@ static struct iommu_ops arm_smmu_ops = {
>>> .of_xlate   = arm_smmu_of_xlate,
>>> .get_resv_regions   = arm_smmu_get_resv_regions,
>>> .put_resv_regions   = generic_iommu_put_resv_regions,
>>> +   .attach_pasid_table = arm_smmu_attach_pasid_table,
>>> +   .detach_pasid_table = arm_smmu_detach_pasid_table,
>>> .dev_has_feat   = arm_smmu_dev_has_feature,
>>> .dev_feat_enabled   = arm_smmu_dev_feature_enabled,
>>> .dev_enable_feat= arm_smmu_dev_enable_feature,
>>>
>>
> 
> .
> 


Re: [Linuxarm] Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate

2021-03-21 Thread chenxiang (M)

Hi Eric,


On 2021/3/20 1:36, Auger Eric wrote:

Hi Chenxiang,

On 3/4/21 8:55 AM, chenxiang (M) wrote:

Hi Eric,


On 2021/2/24 4:56, Eric Auger wrote:

Implement domain-selective, pasid selective and page-selective
IOTLB invalidations.

Signed-off-by: Eric Auger 

---

v13 -> v14:
- Add domain invalidation
- do global inval when asid is not provided with addr
   granularity

v7 -> v8:
- ASID based invalidation using iommu_inv_pasid_info
- check ARCHID/PASID flags in addr based invalidation
- use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync

v6 -> v7
- check the uapi version

v3 -> v4:
- adapt to changes in the uapi
- add support for leaf parameter
- do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
   anymore

v2 -> v3:
- replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync

v1 -> v2:
- properly pass the asid
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74 +
  1 file changed, 74 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4c19a1114de4..df3adc49111c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2949,6 +2949,79 @@ static void arm_smmu_detach_pasid_table(struct 
iommu_domain *domain)
mutex_unlock(&smmu_domain->init_mutex);
  }
  
+static int

+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+ struct iommu_cache_invalidate_info *inv_info)
+{
+   struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   return -EINVAL;
+
+   if (!smmu)
+   return -EINVAL;
+
+   if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
+   inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
+   return -ENOENT;
+   }
+
+   if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
+   return -EINVAL;
+
+   /* IOTLB invalidation */
+
+   switch (inv_info->granularity) {
+   case IOMMU_INV_GRANU_PASID:
+   {
+   struct iommu_inv_pasid_info *info =
+   &inv_info->granu.pasid_info;
+
+   if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+   return -ENOENT;
+   if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
+   return -EINVAL;
+
+   __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
+   return 0;
+   }
+   case IOMMU_INV_GRANU_ADDR:
+   {
+   struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
+   size_t size = info->nb_granules * info->granule_size;
+   bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+
+   if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+   return -ENOENT;
+
+   if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
+   break;
+
+   arm_smmu_tlb_inv_range_domain(info->addr, size,
+ info->granule_size, leaf,
+ info->archid, smmu_domain);

Is it possible to add a check before the function to make sure that
info->granule_size can be recognized by the SMMU?
There is a scenario which will cause a TLBI issue: the RIL feature is enabled
on the guest but disabled on the host, so the
host can only invalidate 4K/2M/1G at a time, but from QEMU,
info->nb_granules is always 1 and info->granule_size = size;
if size is not equal to 4K, 2M or 1G (for example size = granule_size
is 5M), it will only invalidate part of the range it wants to invalidate.


Do you have any idea about this issue?



I think maybe we can add a check here: if RIL is not enabled and the
size is not a granule size (4K/2M/1G) supported by the
SMMU hardware, can we just simply use the smallest granule size
supported by the hardware all the time?
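
i.e. something like this before the invalidation call (only a rough sketch,
the field and flag names are from my reading of the driver and may not be
exact):

	size_t granule = info->granule_size;

	/* no RIL on the host and the guest granule is not a size we
	 * support: fall back to the smallest supported granule */
	if (!(smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
	    !(granule & smmu_domain->domain.pgsize_bitmap))
		granule = 1UL << __ffs(smmu_domain->domain.pgsize_bitmap);

	arm_smmu_tlb_inv_range_domain(info->addr, size, granule, leaf,
				      info->archid, smmu_domain);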


+
+   arm_smmu_cmdq_issue_sync(smmu);
+   return 0;
+   }
+   case IOMMU_INV_GRANU_DOMAIN:
+   break;

I checked the QEMU code
(https://github.com/eauger/qemu/tree/v5.2.0-2stage-rfcv8): for opcode
CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL from the guest OS
it calls smmu_inv_notifiers_all() to unmap all notifiers of all MRs,
but it does not seem to set event.entry.granularity, which I think should be
set to IOMMU_INV_GRAN_ADDR.

this is because IOMMU_INV_GRAN_ADDR = 0. But for clarity I should rather
set it explicitly ;-)


ok


BTW, for opcode CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL, it needs to unmap
size = 0x1 on a 48-bit system (which may take much time), so maybe
it is better
to set it as IOMMU_INV_GRANU_DOMAIN, then in the host OS, send