Re: [PATCH v3 1/3] iommu/vt-d: Allow 32bit devices to use DMA domain

2020-04-16 Thread Christoph Hellwig
On Thu, Apr 16, 2020 at 03:40:38PM +0800, Lu Baolu wrote:
>> description.  I'd need to look at the final code, but it seems like
>> this will still cause bounce buffering instead of using dynamic
>> mapping, which still seems like an awful idea.
>
> Yes. If the user chooses to use the identity domain by default through
> the kernel command line, the identity domain will be applied to all
> devices. For those devices with limited addressing capability, bounce
> buffering will be used when they try to access memory beyond their
> addressing capability. This won't cause any kernel regression as far as
> I can see.
>
> Switching domain during runtime with drivers loaded will cause real
> problems as I said in the commit message. That's the reason why I am
> proposing to remove it. If we want to keep it, we have to make sure that
> switching domain for one device should not impact other devices which
> share the same domain with it. Furthermore, it's better to implement it
> in the generic layer to keep device driver behavior consistent on all
> architectures.

I don't disagree with the technical points.  What I pointed out is that

 a) the actual technical change is not in the commit log, which it
should be
 b) that I still think taking away the ability to dynamically map
devices in the identity domain after all the time we allowed for
that is going to cause nasty regressions.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3] dt-bindings: iommu: renesas,ipmmu-vmsa: convert to json-schema

2020-04-16 Thread Yoshihiro Shimoda
Hi all,

> From: Yoshihiro Shimoda, Sent: Friday, April 17, 2020 1:35 PM

> +properties:
> +  compatible:
> +oneOf:
> +  - items:
> +  - enum:
> +  - renesas,ipmmu-r8a73a4  # R-Mobile APE6
> +  - renesas,ipmmu-r8a7743  # RZ/G1M
> +  - renesas,ipmmu-r8a7744  # RZ/G1N
> +  - renesas,ipmmu-r8a7745  # RZ/G1E
> +  - renesas,ipmmu-r8a7790  # R-Car H2
> +  - renesas,ipmmu-r8a7791  # R-Car M2-W
> +  - renesas,ipmmu-r8a7793  # R-Car M2-N
> +  - renesas,ipmmu-r8a7794  # R-Car E2
> +  - renesas,ipmmu-r8a7795  # R-Car H3

I'm afraid this renesas,ipmmu-r8a7795 entry should be in the section
below, so I'll fix it.

Best regards,
Yoshihiro Shimoda



RE: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE

2020-04-16 Thread Liu, Yi L
Hi Alex,
> From: Alex Williamson 
> Sent: Thursday, April 16, 2020 10:41 PM
> To: Liu, Yi L 
> Subject: Re: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE
> 
> On Thu, 16 Apr 2020 10:40:03 +
> "Liu, Yi L"  wrote:
> 
> > Hi Alex,
> > I still have a direction question for you; better to get agreement
> > before heading forward.
> >
> > > From: Alex Williamson 
> > > Sent: Friday, April 3, 2020 11:35 PM
> > [...]
> > > > > > + *
> > > > > > + * returns: 0 on success, -errno on failure.
> > > > > > + */
> > > > > > +struct vfio_iommu_type1_cache_invalidate {
> > > > > > +   __u32   argsz;
> > > > > > +   __u32   flags;
> > > > > > +   struct  iommu_cache_invalidate_info cache_info;
> > > > > > +};
> > > > > > +#define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE,
> > > VFIO_BASE
> > > > > + 24)
> > > > >
> > > > > The future extension capabilities of this ioctl worry me; I wonder
> > > > > if we should do another data[] with a flag defining that data as
> > > > > CACHE_INFO.
> > > >
> > > > Can you elaborate? Does it mean that this way we don't rely on the
> > > > iommu driver to provide version_to_size conversion, and instead we
> > > > just pass data[] to the iommu driver for further audit?
> > >
> > > No, my concern is that this ioctl has a single function, strictly tied
> > > to the iommu uapi.  If we replace cache_info with data[] then we can
> > > define a flag to specify that data[] is struct
> > > iommu_cache_invalidate_info, and if we need to, a different flag to
> > > identify data[] as something else.  For example if we get stuck
> > > expanding cache_info to meet new demands and develop a new uapi to
> > > solve that, how would we expand this ioctl to support it rather than
> > > also create a new ioctl?  There's also a trade-off in making the ioctl
> > > usage more difficult for the user.  I'd still expect the vfio layer to
> > > check the flag and interpret data[] as indicated by the flag rather
> > > than just passing a blob of opaque data to the iommu layer though.
> > > Thanks,
> >
> > Based on your comments about defining a single ioctl and a unified
> > vfio structure (with a @data[] field) for pasid_alloc/free, bind/
> > unbind_gpasid, and cache_inv: after some offline experimentation, I
> > think that works well for bind/unbind_gpasid and cache_inv, as both
> > of them use the iommu uapi definition, while the pasid alloc/free
> > operation doesn't. It would be awkward to put all of them together,
> > so pasid alloc/free may get a separate ioctl. It would look as below.
> > Does this direction look good to you?
> >
> > ioctl #22: VFIO_IOMMU_PASID_REQUEST
> > /**
> >   * @pasid: used to return the pasid alloc result when flags == ALLOC_PASID
> >   * specify a pasid to be freed when flags == FREE_PASID
> >   * @range: specify the allocation range when flags == ALLOC_PASID
> >   */
> > struct vfio_iommu_pasid_request {
> > __u32   argsz;
> > #define VFIO_IOMMU_ALLOC_PASID  (1 << 0)
> > #define VFIO_IOMMU_FREE_PASID   (1 << 1)
> > __u32   flags;
> > __u32   pasid;
> > struct {
> > __u32   min;
> > __u32   max;
> > } range;
> > };
> 
> Can't the ioctl return the pasid valid on alloc (like GET_DEVICE_FD)?

Yep, I think you mentioned that before. At that time, I believed it
would be better to return the result via a __u32 buffer to make full
use of the 32 bits. But it looks like it doesn't make much difference,
so I'll follow your suggestion.

> Would it be useful to support freeing a range of pasids?  If so then we
> could simply use range for both, ie. allocate a pasid from this range
> and return it, or free all pasids in this range?  vfio already needs to
> track pasids to free them on release, so presumably this is something
> we could support easily.

Yes, I think that is a nice improvement; then I can remove the @pasid
field. Will do it.
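The direction agreed above — return the allocated PASID as the ioctl result and reuse @range for both alloc and free — can be sketched as a userspace simulation. Everything here (the names, the toy bitmap allocator) is illustrative only, not the actual VFIO uapi:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the proposed semantics, with @pasid dropped
 * and @range reused for both allocation and freeing. */
#define ALLOC_PASID (1u << 0)
#define FREE_PASID  (1u << 1)

#define MAX_PASID 64
static uint8_t pasid_used[MAX_PASID]; /* toy allocation bitmap */

/* ALLOC: return the first free PASID in [min, max], or -1 if exhausted.
 * FREE: release every PASID in [min, max] and return 0. */
static int pasid_request(uint32_t flags, uint32_t min, uint32_t max)
{
	if (flags & ALLOC_PASID) {
		for (uint32_t p = min; p <= max && p < MAX_PASID; p++) {
			if (!pasid_used[p]) {
				pasid_used[p] = 1;
				return (int)p; /* returned like GET_DEVICE_FD */
			}
		}
		return -1;
	}
	if (flags & FREE_PASID) {
		for (uint32_t p = min; p <= max && p < MAX_PASID; p++)
			pasid_used[p] = 0;
		return 0;
	}
	return -1; /* unknown operation */
}
```

Range-based freeing also matches the release path: vfio can free everything it handed out with one call over the whole range.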

> > ioctl #23: VFIO_IOMMU_NESTING_OP
> > struct vfio_iommu_type1_nesting_op {
> > __u32   argsz;
> > __u32   flags;
> > __u32   op;
> > __u8data[];
> > };
> 
> data only has 4-byte alignment, I think we really want it at an 8-byte
> alignment.  This is why I embedded the "op" into the flag for
> DEVICE_FEATURE.  Thanks,

Got it. I may also merge the op into flags (perhaps using the lower 16
bits for the op).
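Alex's alignment concern and the op-in-flags idea can both be checked with a short sketch. The struct names and the 16-bit op mask here are assumptions for illustration, not the final uapi:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* As proposed: three u32 fields then __u8 data[] puts data at offset 12,
 * which is only 4-byte aligned. */
struct nesting_op_v1 {
	uint32_t argsz;
	uint32_t flags;
	uint32_t op;
	uint8_t  data[];
};

/* Hypothetical alternative: fold the op into the low 16 bits of @flags
 * and declare data[] as u64 so it lands on an 8-byte boundary. */
#define NESTING_OP_MASK         0xffffu
#define NESTING_OP_BIND_PGTBL   0u
#define NESTING_OP_UNBIND_PGTBL 1u
#define NESTING_OP_CACHE_INVLD  2u

struct nesting_op_hdr {
	uint32_t argsz;
	uint32_t flags;   /* op lives in the low 16 bits */
	uint64_t data[];  /* 8-byte aligned payload */
};

static uint32_t nesting_op(uint32_t flags)
{
	return flags & NESTING_OP_MASK;
}
```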

Thanks,
Yi Liu
> Alex
> 
> >
> > /* Nesting Ops */
> > #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL0
> > #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL  1
> > #define VFIO_IOMMU_NESTING_OP_CACHE_INVLD   2
> >
> > Thanks,
> > Yi Liu
> >



[PATCH v3] dt-bindings: iommu: renesas,ipmmu-vmsa: convert to json-schema

2020-04-16 Thread Yoshihiro Shimoda
Convert Renesas VMSA-Compatible IOMMU bindings documentation
to json-schema.

Note that the original documentation doesn't mention renesas,ipmmu-vmsa
for R-Mobile APE6. But R-Mobile APE6 is similar to R-Car Gen2, so
renesas,ipmmu-r8a73a4 belongs in the renesas,ipmmu-vmsa
section.

Signed-off-by: Yoshihiro Shimoda 
---
 Changes from v2:
 - Add a description for R-Mobile APE6 on the commit log.
 - Change renesas,ipmmu-r8a73a4 section on the compatible.
 - Add items on the interrupts.
 - Add power-domains to required.
 - Add oneOf for interrupts and renesas,ipmmu-main
 https://patchwork.kernel.org/patch/11490581/

 Changes from v1:
 - Fix typo in the subject.
 - Add a description on #iommu-cells.
 https://patchwork.kernel.org/patch/11485415/

 .../bindings/iommu/renesas,ipmmu-vmsa.txt  |  73 ---
 .../bindings/iommu/renesas,ipmmu-vmsa.yaml | 101 +
 2 files changed, 101 insertions(+), 73 deletions(-)
 delete mode 100644 
Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt
 create mode 100644 
Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml

diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt 
b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt
deleted file mode 100644
index 020d6f2..
--- a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt
+++ /dev/null
@@ -1,73 +0,0 @@
-* Renesas VMSA-Compatible IOMMU
-
-The IPMMU is an IOMMU implementation compatible with the ARM VMSA page tables.
-It provides address translation for bus masters outside of the CPU, each
-connected to the IPMMU through a port called micro-TLB.
-
-
-Required Properties:
-
-  - compatible: Must contain SoC-specific and generic entry below in case
-the device is compatible with the R-Car Gen2 VMSA-compatible IPMMU.
-
-- "renesas,ipmmu-r8a73a4" for the R8A73A4 (R-Mobile APE6) IPMMU.
-- "renesas,ipmmu-r8a7743" for the R8A7743 (RZ/G1M) IPMMU.
-- "renesas,ipmmu-r8a7744" for the R8A7744 (RZ/G1N) IPMMU.
-- "renesas,ipmmu-r8a7745" for the R8A7745 (RZ/G1E) IPMMU.
-- "renesas,ipmmu-r8a774a1" for the R8A774A1 (RZ/G2M) IPMMU.
-- "renesas,ipmmu-r8a774b1" for the R8A774B1 (RZ/G2N) IPMMU.
-- "renesas,ipmmu-r8a774c0" for the R8A774C0 (RZ/G2E) IPMMU.
-- "renesas,ipmmu-r8a7790" for the R8A7790 (R-Car H2) IPMMU.
-- "renesas,ipmmu-r8a7791" for the R8A7791 (R-Car M2-W) IPMMU.
-- "renesas,ipmmu-r8a7793" for the R8A7793 (R-Car M2-N) IPMMU.
-- "renesas,ipmmu-r8a7794" for the R8A7794 (R-Car E2) IPMMU.
-- "renesas,ipmmu-r8a7795" for the R8A7795 (R-Car H3) IPMMU.
-- "renesas,ipmmu-r8a7796" for the R8A7796 (R-Car M3-W) IPMMU.
-- "renesas,ipmmu-r8a77965" for the R8A77965 (R-Car M3-N) IPMMU.
-- "renesas,ipmmu-r8a77970" for the R8A77970 (R-Car V3M) IPMMU.
-- "renesas,ipmmu-r8a77980" for the R8A77980 (R-Car V3H) IPMMU.
-- "renesas,ipmmu-r8a77990" for the R8A77990 (R-Car E3) IPMMU.
-- "renesas,ipmmu-r8a77995" for the R8A77995 (R-Car D3) IPMMU.
-- "renesas,ipmmu-vmsa" for generic R-Car Gen2 or RZ/G1 VMSA-compatible
-  IPMMU.
-
-  - reg: Base address and size of the IPMMU registers.
-  - interrupts: Specifiers for the MMU fault interrupts. For instances that
-support secure mode two interrupts must be specified, for non-secure and
-secure mode, in that order. For instances that don't support secure mode a
-single interrupt must be specified. Not required for cache IPMMUs.
-
-  - #iommu-cells: Must be 1.
-
-Optional properties:
-
-  - renesas,ipmmu-main: reference to the main IPMMU instance in two cells.
-The first cell is a phandle to the main IPMMU and the second cell is
-the interrupt bit number associated with the particular cache IPMMU device.
-The interrupt bit number needs to match the main IPMMU IMSSTR register.
-Only used by cache IPMMU instances.
-
-
-Each bus master connected to an IPMMU must reference the IPMMU in its device
-node with the following property:
-
-  - iommus: A reference to the IPMMU in two cells. The first cell is a phandle
-to the IPMMU and the second cell the number of the micro-TLB that the
-device is connected to.
-
-
-Example: R8A7791 IPMMU-MX and VSP1-D0 bus master
-
-   ipmmu_mx: mmu@fe951000 {
-   compatible = "renasas,ipmmu-r8a7791", "renasas,ipmmu-vmsa";
-   reg = <0 0xfe951000 0 0x1000>;
-   interrupts = <0 222 IRQ_TYPE_LEVEL_HIGH>,
-<0 221 IRQ_TYPE_LEVEL_HIGH>;
-   #iommu-cells = <1>;
-   };
-
-   vsp@fe928000 {
-   ...
-   iommus = <&ipmmu_mx 13>;
-   ...
-   };
diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml 
b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
new file mode 100644
index ..8ade89d
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
@@ -0,0 +1,101 @@
+

Re: [PATCH v2 5/7] iommu/vt-d: Save prq descriptors in an internal list

2020-04-16 Thread Lu Baolu

Hi Kevin,

On 2020/4/16 9:46, Lu Baolu wrote:

On 2020/4/15 17:30, Tian, Kevin wrote:

From: Lu Baolu
Sent: Wednesday, April 15, 2020 1:26 PM

Currently, the page request interrupt thread handles the page
requests in the queue in this way:

- Clear PPR bit to ensure new interrupt could come in;
- Read and record the head and tail registers;
- Handle all descriptors between head and tail;
- Write tail to head register.

This might cause some descriptors to be handled multiple times.
An example sequence:

- Thread A gets scheduled with PRQ_1 and PRQ_2 in the queue;
- Thread A clears the PPR bit and records the head and tail;
- A new PRQ_3 comes and Thread B gets scheduled;
- Thread B records the head and tail, which include PRQ_1
   and PRQ_2.

I may be overlooking something, but isn't the prq interrupt thread
per-iommu? Then why would two prq threads contend here?


The prq interrupt could be masked by the PPR (Pending Page Request) bit
in the Page Request Status Register. In the interrupt handling thread,
once this bit is cleared, new prq interrupts are allowed to be generated.

So, if a page request is in process and the PPR bit is cleared, another
page request from any device under the same iommu could trigger another
interrupt thread.


Rechecked the code. You are right. As long as the interrupt thread is
per iommu, there will only be a single prq thread scheduled. I will
change this accordingly in the new version. Thank you for pointing this
out.
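For reference, the head/tail pattern discussed in this thread can be modeled as a toy ring buffer: snapshot the tail, handle every descriptor up to it, then advance head. This is a simulation of the pattern only, not the VT-d driver code:

```c
#include <assert.h>

/* Toy page-request queue: drain everything between head and the
 * snapshotted tail, the pattern described in the commit message. */
#define QSZ 8
static int prq[QSZ];
static unsigned head, tail;

static int drain_prq(void)
{
	int handled = 0;
	unsigned t = tail; /* snapshot, like reading the tail register once */

	while (head != t) {
		(void)prq[head]; /* ...handle the descriptor here... */
		head = (head + 1) % QSZ;
		handled++;
	}
	return handled; /* caller would now write head back to hardware */
}
```

With a single per-iommu thread there is exactly one drainer, so no descriptor is handled twice.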

Best regards,
baolu

RE: [PATCH v11 05/10] iommu/vt-d: Add bind guest PASID support

2020-04-16 Thread Tian, Kevin
> From: Auger Eric
> Sent: Thursday, April 16, 2020 6:43 PM
> 
[...]
> >>> + if (svm) {
> >>> + /*
> >>> +  * If we found svm for the PASID, there must be at
> >>> +  * least one device bond, otherwise svm should be
> >>> +  * freed.
> >>> +  */
> >>> + if (WARN_ON(list_empty(&svm->devs))) {
> >>> + ret = -EINVAL;
> >>> + goto out;
> >>> + }
> >>> +
> >>> + for_each_svm_dev(sdev, svm, dev) {
> > > > > + /* In case of multiple sub-devices of the same pdev
> > > > > +  * assigned, we should allow multiple bind calls with
> > > > > +  * the same PASID and pdev.
> > > > > +  */
> > > > > + sdev->users++;
> >> What if this is not an mdev device? Is it also allowed?
> > Yes. The IOMMU and VT-d drivers are not mdev aware. Here mdev is just
> > an example of a normal use case. You can bind the same PCI device (PF
> > or SR-IOV VF) more than once to the same PASID. Just need to unbind as
> > well.
> 
> I don't get the point of binding a non mdev device several times with
> the same PASID. Do you intend to allow that at userspace level or
> prevent this from happening in VFIO?

I feel it's better to prevent this from happening; otherwise VFIO also
needs to track the bind count and do multiple unbinds at mm_exit.
But it's not necessary to prevent it in VFIO. We can check here
whether aux_domain is valid, and if not, return -EBUSY.

> 
> Besides, the comment is a bit misleading as it gives the impression it
> is only true for mdev and there is no associated check.

Thanks
Kevin


RE: [PATCH v2 6/7] iommu/vt-d: Add page request draining support

2020-04-16 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Thursday, April 16, 2020 4:38 PM
> 
> Hi Kevin,
> 
> On 2020/4/15 19:10, Tian, Kevin wrote:
> > The completion of the above sequence ensures that previously queued
> > page group responses are sent out and received by the endpoint, and,
> > vice versa, that all in-flight page requests from the endpoint are
> > queued in the iommu page request queue. Then comes a problem: you
> > didn't wait for completion of those newly-queued requests and their
> > responses.
> 
> I thought about this again.
> 
> We do page request draining after the PASID table entry gets torn down
> and the devTLB gets invalidated. At this point, if any new page request
> for this pasid comes in, the IOMMU will generate an unrecoverable fault
> and respond to the device with IR (Invalid Request). The IOMMU won't
> put this page request into the queue. [VT-d spec 7.4.1]

A non-recoverable fault implies severe errors, so I don't see why we
should allow such a thing to happen when handling a normal situation.
If you look at the start of chapter 7:
--
Non-recoverable Faults: Requests that encounter non-recoverable 
address translation faults are aborted by the remapping hardware, 
and typically require a reset of the device (such as through a function-
level-reset) to recover and re-initialize the device to put it back into 
service.
--

> 
> Hence, we don't need to worry about the newly-queued requests here.
> 
> Best regards,
> Baolu

Thanks
Kevin


Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-16 Thread Qian Cai


> On Apr 13, 2020, at 9:36 PM, Qian Cai  wrote:
> 
> 
> 
>> On Apr 8, 2020, at 10:19 AM, Joerg Roedel  wrote:
>> 
>> Hi Qian,
>> 
>> On Tue, Apr 07, 2020 at 11:36:05AM -0400, Qian Cai wrote:
>>> After further testing, the change alone is insufficient. What I am
>>> chasing right now is that the swap device will go offline after heavy
>>> memory pressure below. The symptom is similar to what we have in the
>>> commit,
>>> 
>>> 754265bcab78 (“iommu/amd: Fix race in increase_address_space()”)
>>> 
>>> Apparently, it is not possible to take the domain->lock in fetch_pte()
>>> because it could sleep.
>> 
>> Thanks a lot for finding and tracking down another race in the AMD IOMMU
>> page-table code.  The domain->lock is a spin-lock and taking it can't
>> sleep. But fetch_pte() is a fast-path and must not take any locks.
>> 
>> I think the best fix is to update the pt_root and mode of the domain
>> atomically by storing the mode in the lower 12 bits of pt_root. This way
>> they are stored together and can be read/write atomically.
> 
> Like this?

So, this is still not enough; it would still trigger storage driver
offline under memory pressure after a bit longer. It looks to me that
fetch_pte() could still be racy:

level  =  domain->mode - 1;
pte= &domain->pt_root[PM_LEVEL_INDEX(level, address)];
<— 
increase_address_space();
*page_size =  PTE_LEVEL_PAGE_SIZE(level);

while (level > 0) {
*page_size = PTE_LEVEL_PAGE_SIZE(level);

Then in iommu_unmap_page(),

while (unmapped < page_size) {
pte = fetch_pte(dom, bus_addr, &unmap_size);
…
bus_addr  = (bus_addr & ~(unmap_size - 1)) + unmap_size;

bus_addr would be out of sync with dom->mode when it enters fetch_pte()
again. Could that be a problem?
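Joerg's suggestion — keep the root pointer and the mode in one word so they are always read and written together — can be sketched in isolation. The helper names echo the iommu_get_mode()/iommu_get_root() used in the diff quoted in this thread, but the packing scheme (mode in the low 12 bits of a page-aligned pointer) and the exact names are illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* pt_root is page aligned, so its low 12 bits are free to carry the
 * paging mode. A single load of the packed word then yields a
 * root/mode pair that cannot be torn by a concurrent
 * increase_address_space() (which would WRITE_ONCE() a new packed
 * word in the kernel). */
static inline uintptr_t pack_pt_root(void *root, int mode)
{
	return (uintptr_t)root | (uintptr_t)mode;
}

static inline void *unpack_root(uintptr_t v)
{
	return (void *)(v & ~(uintptr_t)0xfff);
}

static inline int unpack_mode(uintptr_t v)
{
	return (int)(v & 0xfff);
}
```

A lock-free reader like fetch_pte() would read the packed word once and derive both level and root from that one snapshot, avoiding the mismatch described above.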

> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 20cce366e951..b36c6b07cbfd 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1397,13 +1397,13 @@ static struct page *free_sub_pt(unsigned long root, 
> int mode,
> 
> static void free_pagetable(struct protection_domain *domain)
> {
> - unsigned long root = (unsigned long)domain->pt_root;
> + int level = iommu_get_mode(domain->pt_root);
> + unsigned long root = iommu_get_root(domain->pt_root);
>   struct page *freelist = NULL;
> 
> - BUG_ON(domain->mode < PAGE_MODE_NONE ||
> -domain->mode > PAGE_MODE_6_LEVEL);
> + BUG_ON(level < PAGE_MODE_NONE || level > PAGE_MODE_6_LEVEL);
> 
> - freelist = free_sub_pt(root, domain->mode, freelist);
> + freelist = free_sub_pt(root, level, freelist);
> 
>   free_page_list(freelist);
> }
> @@ -1417,24 +1417,27 @@ static bool increase_address_space(struct 
> protection_domain *domain,
>  unsigned long address,
>  gfp_t gfp)
> {
> + int level;
>   unsigned long flags;
>   bool ret = false;
>   u64 *pte;
> 
>   spin_lock_irqsave(&domain->lock, flags);
> 
> - if (address <= PM_LEVEL_SIZE(domain->mode) ||
> - WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
> + level = iommu_get_mode(domain->pt_root);
> +
> + if (address <= PM_LEVEL_SIZE(level) ||
> + WARN_ON_ONCE(level == PAGE_MODE_6_LEVEL))
>   goto out;
> 
>   pte = (void *)get_zeroed_page(gfp);
>   if (!pte)
>   goto out;
> 
> - *pte = PM_LEVEL_PDE(domain->mode,
> - iommu_virt_to_phys(domain->pt_root));
> - domain->pt_root  = pte;
> - domain->mode+= 1;
> + *pte = PM_LEVEL_PDE(level,
> + iommu_virt_to_phys((void *)iommu_get_root(domain->pt_root)));
> +
> + WRITE_ONCE(domain->pt_root, (unsigned long)pte + level + 1);
> 
>   ret = true;
> 
> @@ -1452,15 +1455,17 @@ static u64 *alloc_pte(struct protection_domain 
> *domain,
> bool *updated)
> {
>   int level, end_lvl;
> - u64 *pte, *page;
> + u64 *pte, *page, *pt_root, *root;
> 
>   BUG_ON(!is_power_of_2(page_size));
> 
> - while (address > PM_LEVEL_SIZE(domain->mode))
> + while (address > PM_LEVEL_SIZE(iommu_get_mode(domain->pt_root)))
>   *updated = increase_address_space(domain, address, gfp) || 
> *updated;
> 
> - level   = domain->mode - 1;
> - pte = &domain->pt_root[PM_LEVEL_INDEX(level, address)];
> + pt_root = READ_ONCE(domain->pt_root);
> + root= (void *)iommu_get_root(pt_root);
> + level   = iommu_get_mode(pt_root) - 1;
> + pte = &root[PM_LEVEL_INDEX(level, address)];
>   address = PAGE_SIZE_ALIGN(address, page_size);
>   end_lvl = PAGE_SIZE_LEVEL(page_size);
> 
> @@ -1536,16 +1541,18 @@ static u64 *fetch_pte(struct protection_domain 
> *domain,
> unsigned long address,
> unsign

Re: [PATCH v1 0/2] vfio/pci: expose device's PASID capability to VMs

2020-04-16 Thread Yan Zhao
On Fri, Apr 17, 2020 at 06:33:54AM +0800, Raj, Ashok wrote:
> Hi Zhao
> 
> 
> On Thu, Apr 16, 2020 at 06:12:26PM -0400, Yan Zhao wrote:
> > On Tue, Mar 31, 2020 at 03:08:25PM +0800, Lu, Baolu wrote:
> > > On 2020/3/31 14:35, Tian, Kevin wrote:
> > > >> From: Liu, Yi L
> > > >> Sent: Sunday, March 22, 2020 8:33 PM
> > > >>
> > > >> From: Liu Yi L
> > > >>
> > > >> Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > > >> Intel platforms allows address space sharing between device DMA and
> > > >> applications. SVA can reduce programming complexity and enhance 
> > > >> security.
> > > >>
> > > >> To enable SVA, device needs to have PASID capability, which is a key
> > > >> capability for SVA. This patchset exposes the device's PASID capability
> > > >> to guest instead of hiding it from guest.
> > > >>
> > > >> The second patch emulates PASID capability for VFs (Virtual Function) 
> > > >> since
> > > >> VFs don't implement such capability per PCIe spec. This patch emulates 
> > > >> such
> > > >> capability and expose to VM if the capability is enabled in PF 
> > > >> (Physical
> > > >> Function).
> > > >>
> > > >> However, there is an open for PASID emulation. If PF driver disables 
> > > >> PASID
> > > >> capability at runtime, then it may be an issue. e.g. PF should not 
> > > >> disable
> > > >> PASID capability if there is guest using this capability on any VF 
> > > >> related
> > > >> to this PF. To solve it, may need to introduce a generic communication
> > > >> framework between vfio-pci driver and PF drivers. Please feel free to 
> > > >> give
> > > >> your suggestions on it.
> > > > I'm not sure how this is addressed on bare metal today, i.e. between 
> > > > normal
> > > > kernel PF and VF drivers. I look at pasid enable/disable code in 
> > > > intel-iommu.c.
> > > > There is no check on PF/VF dependency so far. The cap is toggled when
> > > > attaching/detaching the PF to its domain. Let's see how IOMMU guys
> > > > respond, and if there is a way for VF driver to block PF driver from 
> > > > disabling
> > > > the pasid cap when it's being actively used by VF driver, then we may
> > > > leverage the same trick in VFIO when emulation is provided to guest.
> > > 
> > > IOMMU subsystem doesn't expose any APIs for pasid enabling/disabling.
> > > The PCI subsystem does. It handles VF/PF like below.
> > > 
> > > /**
> > >   * pci_enable_pasid - Enable the PASID capability
> > >   * @pdev: PCI device structure
> > >   * @features: Features to enable
> > >   *
> > >   * Returns 0 on success, negative value on error. This function checks
> > >   * whether the features are actually supported by the device and returns
> > >   * an error if not.
> > >   */
> > > int pci_enable_pasid(struct pci_dev *pdev, int features)
> > > {
> > >  u16 control, supported;
> > >  int pasid = pdev->pasid_cap;
> > > 
> > >  /*
> > >   * VFs must not implement the PASID Capability, but if a PF
> > >   * supports PASID, its VFs share the PF PASID configuration.
> > >   */
> > >  if (pdev->is_virtfn) {
> > >  if (pci_physfn(pdev)->pasid_enabled)
> > >  return 0;
> > >  return -EINVAL;
> > >  }
> > > 
> > > /**
> > >   * pci_disable_pasid - Disable the PASID capability
> > >   * @pdev: PCI device structure
> > >   */
> > > void pci_disable_pasid(struct pci_dev *pdev)
> > > {
> > >  u16 control = 0;
> > >  int pasid = pdev->pasid_cap;
> > > 
> > >  /* VFs share the PF PASID configuration */
> > >  if (pdev->is_virtfn)
> > >  return;
> > > 
> > > 
> > > It doesn't block disabling PASID on PF even if VFs are possibly using it.
> > >
> > Hi,
> > I'm not sure, but is it possible for pci_enable_pasid() and
> > pci_disable_pasid() to do the same thing as pdev->driver->sriov_configure,
> > e.g. pci_sriov_configure_simple() below?
> > 
> > It checks whether there are VFs assigned via pci_vfs_assigned(dev),
> > and we can set the VF to assigned status if vfio_pci_open() is
> > performed on the VF.
> 
> But you can still unbind the PF driver, which magically causes the VFs
> to be removed from the guest image too, correct? 
> 
> Only the IOMMU mucks with pasid_enable/disable. And it doesn't look like
> we have a path to disable without tearing down the PF binding. 
> 
> We originally had some refcounts and such and would do the real disable only
> when the refcount drops to 0, but we found it wasn't actually necessary 
> to protect these resources like that.
>
Right. Now unbinding the PF driver would cause VFs to be unplugged from
the guest. If we modify vfio_pci and set VFs to be assigned, then the
VFs could remain visible in the guest, but they cannot function well
since the PF driver has been unbound.

Thanks for the explanation :)

> > 
> > 
> > int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn)
> > {
> > int rc;
> > 
> > might_sleep();
> > 
> >   

Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code

2020-04-16 Thread Derrick, Jonathan
Hi Daniel,

On Fri, 2020-04-17 at 09:03 +0800, Daniel Drake wrote:
> Hi Joerg,
> 
> > Hi,
> > 
> > here is the second version of this patch-set. The first version with
> > some more introductory text can be found here:
> > 
> > https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/
> 
> Thanks for the continued improvements in this area!
> 
> I may have spotted a problem with setups like VMD.
> 
> The core PCI bus is set up during early boot.
> Then, for the PCI bus, we reach iommu_bus_init() -> bus_iommu_probe().
> In there, we call probe_iommu_group() -> dev_iommu_get() for each PCI
> device, which allocates dev->iommu in each case. So far so good.
> 
> The problem is that this is the last time that we'll call dev_iommu_get().
> If any PCI bus devices get added after this point, they do not get passed
> to dev_iommu_get().
> 
> So when the vmd module gets loaded later, and creates more PCI devices,
> we end up in iommu_bus_notifier() -> iommu_probe_device()
> -> __iommu_probe_device() which does:
> 
>   dev->iommu->iommu_dev = iommu_dev;
> 
> dev->iommu is NULL here, so this dereference crashes, because
> dev_iommu_get() was never called for this new device.
> 
> Daniel
> 

I should have CCed you on this, but it should temporarily resolve that
issue:
https://lists.linuxfoundation.org/pipermail/iommu/2020-April/043253.html



Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code

2020-04-16 Thread Daniel Drake
Hi Joerg,

> Hi,
> 
> here is the second version of this patch-set. The first version with
> some more introductory text can be found here:
> 
>   https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/

Thanks for the continued improvements in this area!

I may have spotted a problem with setups like VMD.

The core PCI bus is set up during early boot.
Then, for the PCI bus, we reach iommu_bus_init() -> bus_iommu_probe().
In there, we call probe_iommu_group() -> dev_iommu_get() for each PCI
device, which allocates dev->iommu in each case. So far so good.

The problem is that this is the last time that we'll call dev_iommu_get().
If any PCI bus devices get added after this point, they do not get passed
to dev_iommu_get().

So when the vmd module gets loaded later, and creates more PCI devices,
we end up in iommu_bus_notifier() -> iommu_probe_device()
-> __iommu_probe_device() which does:

dev->iommu->iommu_dev = iommu_dev;

dev->iommu is NULL here, so this dereference crashes, because
dev_iommu_get() was never called for this new device.

Daniel
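A minimal userspace model of the hole described above: devices created after bus_iommu_probe() never went through dev_iommu_get(), so the probe path must allocate dev->iommu on demand before dereferencing it. All names here are simplified stand-ins for the kernel structures, not the actual iommu core code:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for struct dev_iommu / struct device. */
struct dev_iommu {
	void *iommu_dev;
};

struct device {
	struct dev_iommu *iommu; /* NULL for late-added (e.g. VMD) devices */
};

/* Allocate dev->iommu on first use, mirroring the idea of calling
 * dev_iommu_get() from the probe path instead of only at bus init. */
static struct dev_iommu *dev_iommu_get(struct device *dev)
{
	if (!dev->iommu)
		dev->iommu = calloc(1, sizeof(*dev->iommu));
	return dev->iommu;
}

static int probe_device(struct device *dev, void *iommu_dev)
{
	if (!dev_iommu_get(dev))
		return -1; /* -ENOMEM in the kernel */

	dev->iommu->iommu_dev = iommu_dev; /* no longer a NULL dereference */
	return 0;
}
```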



Re: [PATCH v1 0/2] vfio/pci: expose device's PASID capability to VMs

2020-04-16 Thread Yan Zhao
On Tue, Mar 31, 2020 at 03:08:25PM +0800, Lu, Baolu wrote:
> On 2020/3/31 14:35, Tian, Kevin wrote:
> >> From: Liu, Yi L
> >> Sent: Sunday, March 22, 2020 8:33 PM
> >>
> >> From: Liu Yi L
> >>
> >> Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> >> Intel platforms allows address space sharing between device DMA and
> >> applications. SVA can reduce programming complexity and enhance security.
> >>
> >> To enable SVA, device needs to have PASID capability, which is a key
> >> capability for SVA. This patchset exposes the device's PASID capability
> >> to guest instead of hiding it from guest.
> >>
> >> The second patch emulates PASID capability for VFs (Virtual Function) since
> >> VFs don't implement such capability per PCIe spec. This patch emulates such
> >> capability and expose to VM if the capability is enabled in PF (Physical
> >> Function).
> >>
> >> However, there is an open for PASID emulation. If PF driver disables PASID
> >> capability at runtime, then it may be an issue. e.g. PF should not disable
> >> PASID capability if there is guest using this capability on any VF related
> >> to this PF. To solve it, may need to introduce a generic communication
> >> framework between vfio-pci driver and PF drivers. Please feel free to give
> >> your suggestions on it.
> > > I'm not sure how this is addressed on bare metal today, i.e. between normal
> > kernel PF and VF drivers. I look at pasid enable/disable code in 
> > intel-iommu.c.
> > There is no check on PF/VF dependency so far. The cap is toggled when
> > attaching/detaching the PF to its domain. Let's see how IOMMU guys
> > respond, and if there is a way for VF driver to block PF driver from 
> > disabling
> > the pasid cap when it's being actively used by VF driver, then we may
> > leverage the same trick in VFIO when emulation is provided to guest.
> 
> IOMMU subsystem doesn't expose any APIs for pasid enabling/disabling.
> The PCI subsystem does. It handles VF/PF like below.
> 
> /**
>   * pci_enable_pasid - Enable the PASID capability
>   * @pdev: PCI device structure
>   * @features: Features to enable
>   *
>   * Returns 0 on success, negative value on error. This function checks
>   * whether the features are actually supported by the device and returns
>   * an error if not.
>   */
> int pci_enable_pasid(struct pci_dev *pdev, int features)
> {
>  u16 control, supported;
>  int pasid = pdev->pasid_cap;
> 
>  /*
>   * VFs must not implement the PASID Capability, but if a PF
>   * supports PASID, its VFs share the PF PASID configuration.
>   */
>  if (pdev->is_virtfn) {
>  if (pci_physfn(pdev)->pasid_enabled)
>  return 0;
>  return -EINVAL;
>  }
> 
> /**
>   * pci_disable_pasid - Disable the PASID capability
>   * @pdev: PCI device structure
>   */
> void pci_disable_pasid(struct pci_dev *pdev)
> {
>  u16 control = 0;
>  int pasid = pdev->pasid_cap;
> 
>  /* VFs share the PF PASID configuration */
>  if (pdev->is_virtfn)
>  return;
> 
> 
> It doesn't block disabling PASID on the PF even though VFs are possibly using it.
>
hi
I'm not sure, but is it possible for pci_enable_pasid() and
pci_disable_pasid() to do the same thing as pdev->driver->sriov_configure,
e.g. pci_sriov_configure_simple() below?

It checks whether any VFs are assigned via pci_vfs_assigned(dev), and we
can mark a VF as assigned when vfio_pci_open() is performed on it.


int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn)
{
int rc;

might_sleep();

if (!dev->is_physfn)
return -ENODEV;

if (pci_vfs_assigned(dev)) {
pci_warn(dev, "Cannot modify SR-IOV while VFs are assigned\n");
return -EPERM;
}

if (nr_virtfn == 0) {
sriov_disable(dev);
return 0;
}

rc = sriov_enable(dev, nr_virtfn);
if (rc < 0)
return rc;

return nr_virtfn;
}

Thanks
Yan
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 0/2] vfio/pci: expose device's PASID capability to VMs

2020-04-16 Thread Raj, Ashok
Hi Zhao


On Thu, Apr 16, 2020 at 06:12:26PM -0400, Yan Zhao wrote:
> On Tue, Mar 31, 2020 at 03:08:25PM +0800, Lu, Baolu wrote:
> > On 2020/3/31 14:35, Tian, Kevin wrote:
> > >> From: Liu, Yi L
> > >> Sent: Sunday, March 22, 2020 8:33 PM
> > >>
> > >> From: Liu Yi L
> > >>
> > >> Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > >> Intel platforms allows address space sharing between device DMA and
> > >> applications. SVA can reduce programming complexity and enhance security.
> > >>
> > >> To enable SVA, device needs to have PASID capability, which is a key
> > >> capability for SVA. This patchset exposes the device's PASID capability
> > >> to guest instead of hiding it from guest.
> > >>
> > >> The second patch emulates PASID capability for VFs (Virtual Function)
> > >> since VFs don't implement such capability per PCIe spec. This patch
> > >> emulates such capability and exposes it to the VM if the capability is
> > >> enabled in the PF (Physical Function).
> > >>
> > >> However, there is an open issue with PASID emulation. If the PF driver
> > >> disables the PASID capability at runtime, then it may be a problem, e.g.
> > >> the PF should not disable the PASID capability if there is a guest using
> > >> this capability on any VF related to this PF. To solve it, we may need to
> > >> introduce a generic communication framework between the vfio-pci driver
> > >> and PF drivers. Please feel free to give your suggestions on it.
> > > I'm not sure how this is addressed on bare metal today, i.e. between
> > > normal kernel PF and VF drivers. I looked at the pasid enable/disable
> > > code in intel-iommu.c. There is no check on PF/VF dependency so far. The
> > > cap is toggled when attaching/detaching the PF to its domain. Let's see
> > > how the IOMMU guys respond, and if there is a way for the VF driver to
> > > block the PF driver from disabling the pasid cap when it's being actively
> > > used by the VF driver, then we may leverage the same trick in VFIO when
> > > emulation is provided to the guest.
> > 
> > IOMMU subsystem doesn't expose any APIs for pasid enabling/disabling.
> > The PCI subsystem does. It handles VF/PF like below.
> > 
> > /**
> >   * pci_enable_pasid - Enable the PASID capability
> >   * @pdev: PCI device structure
> >   * @features: Features to enable
> >   *
> >   * Returns 0 on success, negative value on error. This function checks
> >   * whether the features are actually supported by the device and returns
> >   * an error if not.
> >   */
> > int pci_enable_pasid(struct pci_dev *pdev, int features)
> > {
> >  u16 control, supported;
> >  int pasid = pdev->pasid_cap;
> > 
> >  /*
> >   * VFs must not implement the PASID Capability, but if a PF
> >   * supports PASID, its VFs share the PF PASID configuration.
> >   */
> >  if (pdev->is_virtfn) {
> >  if (pci_physfn(pdev)->pasid_enabled)
> >  return 0;
> >  return -EINVAL;
> >  }
> > 
> > /**
> >   * pci_disable_pasid - Disable the PASID capability
> >   * @pdev: PCI device structure
> >   */
> > void pci_disable_pasid(struct pci_dev *pdev)
> > {
> >  u16 control = 0;
> >  int pasid = pdev->pasid_cap;
> > 
> >  /* VFs share the PF PASID configuration */
> >  if (pdev->is_virtfn)
> >  return;
> > 
> > 
> > It doesn't block disabling PASID on the PF even though VFs are possibly using it.
> >
> hi
> I'm not sure, but is it possible for pci_enable_pasid() and
> pci_disable_pasid() to do the same thing as pdev->driver->sriov_configure,
> e.g. pci_sriov_configure_simple() below?
> 
> It checks whether any VFs are assigned via pci_vfs_assigned(dev), and we
> can mark a VF as assigned when vfio_pci_open() is performed on it.

But you can still unbind the PF driver, which magically causes the VFs to be
removed from the guest image too, correct?

Only the IOMMU mucks with pasid_enable/disable. And it doesn't look like
we have a path to disable without tearing down the PF binding.

We originally had some refcounts and such and would do the real disable only
when the refcount drops to 0, but we found it wasn't actually necessary 
to protect these resources like that.

> 
> 
> int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn)
> {
> int rc;
> 
> might_sleep();
> 
> if (!dev->is_physfn)
> return -ENODEV;
> 
> if (pci_vfs_assigned(dev)) {
> pci_warn(dev, "Cannot modify SR-IOV while VFs are assigned\n");
> return -EPERM;
> }
> 
> if (nr_virtfn == 0) {
> sriov_disable(dev);
> return 0;
> }
> 
> rc = sriov_enable(dev, nr_virtfn);
> if (rc < 0)
> return rc;
> 
> return nr_virtfn;
> }
> 
> Thanks
> Yan

Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-16 Thread Jacob Pan
On Wed, 15 Apr 2020 09:47:36 +0200
Jean-Philippe Brucker  wrote:

> On Fri, Apr 10, 2020 at 08:52:49AM -0700, Jacob Pan wrote:
> > On Thu, 9 Apr 2020 16:50:58 +0200
> > Jean-Philippe Brucker  wrote:
> >   
> > > > So unbind is coming anyway, the difference in handling in mmu
> > > > release notifier is whether we silently drop DMA fault vs.
> > > > reporting fault?
> > > 
> > > What I meant is, between mmu release notifier and unbind(), we
> > > can't print any error from DMA fault on dmesg, because an mm exit
> > > is easily triggered by userspace. Look at the lifetime of the
> > > bond:
> > > 
> > > bind()
> > >  |
> > >  : Here any DMA fault is handled by mm, and on error we don't print
> > >  : anything to dmesg. Userspace can easily trigger faults by issuing DMA
> > >  : on unmapped buffers.
> > >  |
> > > mm exit -> clear pgd, invalidate IOTLBs
> > >  |
> > >  : Here the PASID descriptor doesn't have the pgd anymore, but we don't
> > >  : print out any error to dmesg either. DMA is likely still running but
> > >  : any fault has to be ignored.
> > >  :
> > >  : We also can't free the PASID yet, since transactions are still coming
> > >  : in with this PASID.
> > >  |
> > > unbind() -> clear context descriptor, release PASID and mmu notifier
> > >  |
> > >  : Here the PASID descriptor is clear. If DMA is still running the device
> > >  : driver really messed up and we have to print out any fault.
> > > 
> > > For that middle state I had to introduce a new pasid descriptor
> > > state in the SMMU driver, to avoid reporting errors between mm
> > > exit and unbind().  
> > I must have missed something, but why bother with a state when you
> > can always check if the mm is dead by mmget_not_zero()? You would
> > not handle IOPF if the mm is dead anyway, similarly for other DMA
> > errors.  
> 
> In the SMMU a cleared PASID descriptor results in unrecoverable
> faults, which do not go through the I/O page fault handler. I've been
> thinking about injecting everything to the IOPF handler, recoverable
> or not, but filtering down the stream is complicated. Most of the
> time outside this small window, we really need to print out those
> messages because they would indicate serious bugs.
> 
VT-d also results in unrecoverable fault for a cleared PASID. I am
assuming in the fault record, SMMU can also identify the PASID and
source ID. So that should be able to find the matching mm.
Then you can check if the mm is defunct?

> > Also, since you are not freeing ioasid in mmu_notifier release
> > anymore, does it mean the IOASID notifier chain can be non-atomic?  
> 
> Unfortunately not, ioasid_free() is called from
> mmu_notifier_ops::free_notifier() in the RCU callback that results
> from mmu_notifier_put(). 
> 
I agree. I looked at the code; it is much cleaner with
mmu_notifier_get/put.

I am thinking of perhaps adding a reclaim mechanism such that an IOASID not
directly freed can stay on an inactive list (while waiting for its
states to be cleared) until it can be reclaimed. Do you see this being
useful for SMMU?

This is useful for VT-d, since we have multiple consumers for a given PASID,
i.e. VMCS, VDCM, and IOMMU. Each consumer has its own PASID context to
clean up.

Thanks for the explanation!
> Thanks,
> Jean

[Jacob Pan]


Re: [PATCH 11/29] mm: only allow page table mappings for built-in zsmalloc

2020-04-16 Thread Minchan Kim
On Tue, Apr 14, 2020 at 03:13:30PM +0200, Christoph Hellwig wrote:
> This allows to unexport map_vm_area and unmap_kernel_range, which are
> rather deep internal and should not be available to modules, as they for
> example allow fine grained control of mapping permissions, and also
> allow splitting the setup of a vmalloc area and the actual mapping and
> thus expose vmalloc internals.
> 
> zsmalloc is typically built-in and continues to work (just like the
> percpu-vm code using a similar patter), while modular zsmalloc also
> continues to work, but must use copies.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Peter Zijlstra (Intel) 
Acked-by: Minchan Kim 

Thanks!


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-16 Thread Minchan Kim
Hi Christoph,


Sorry for the late.

On Sat, Apr 11, 2020 at 09:20:52AM +0200, Christoph Hellwig wrote:
> Hi Minchan,
> 
> On Fri, Apr 10, 2020 at 04:11:36PM -0700, Minchan Kim wrote:
> > It doesn't mean we couldn't use zsmalloc as a module any longer. It means
> > we couldn't use zsmalloc as a module with pgtable mapping, which was a
> > little bit faster in microbenchmarks on some architectures (however, I was
> > often tempted to remove it since it had several problems). However, we could
> > still use zsmalloc as a module with the copy approach instead of pgtable
> > mapping. Thus, if someone really wants to roll back the feature, they should
> > provide a reasonable reason why it doesn't work for them. "A little faster"
> > wouldn't be enough to export deep internals to the module.
> 
> do you have any data how much faster it is on arm (and does that include
> arm64 as well)?  Besides the exports which were my prime concern,

https://github.com/sjenning/zsmapbench

I need to refresh my memory. IIRC, it was almost 30% faster on ARM at the
time, so it was not trivial. However, that was several years ago.

> zsmalloc with pgtable mappings also is the only user of map_kernel_range
> outside of vmalloc.c, if it really is another code base for tiny
> improvements we could mark map_kernel_range or in fact remove it entirely
> and open code it in the remaining callers.

I have also been tempted to remove it. Let me take some time to revisit it
on this occasion.

Thanks.


Re: [PATCH 2/2] iommu/arm-smmu: Allow client devices to select direct mapping

2020-04-16 Thread Sai Prakash Ranjan

Hi Robin,

On 2020-04-16 22:47, Robin Murphy wrote:

On 2020-04-16 5:23 pm, Sai Prakash Ranjan wrote:

Hi Robin,

On 2020-04-16 19:28, Robin Murphy wrote:

On 2020-01-22 11:48 am, Sai Prakash Ranjan wrote:

From: Jordan Crouse 

Some client devices want to directly map the IOMMU themselves instead
of using the DMA domain. Allow those devices to opt in to direct
mapping by way of a list of compatible strings.

Signed-off-by: Jordan Crouse 
Co-developed-by: Sai Prakash Ranjan 


Signed-off-by: Sai Prakash Ranjan 
---
  drivers/iommu/arm-smmu-qcom.c | 39 +++
  drivers/iommu/arm-smmu.c  |  3 +++
  drivers/iommu/arm-smmu.h  |  5 +
  3 files changed, 47 insertions(+)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 64a4ab270ab7..ff746acd1c81 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -3,6 +3,7 @@
   * Copyright (c) 2019, The Linux Foundation. All rights reserved.
   */
  +#include 
  #include 
    #include "arm-smmu.h"
@@ -11,6 +12,43 @@ struct qcom_smmu {
  struct arm_smmu_device smmu;
  };
  +static const struct arm_smmu_client_match_data qcom_adreno = {
+    .direct_mapping = true,
+};
+
+static const struct arm_smmu_client_match_data qcom_mdss = {
+    .direct_mapping = true,
+};


Might it make sense to group these by the desired SMMU behaviour
rather than (apparently) what kind of device the client happens to be,
which seems like a completely arbitrary distinction from the SMMU
driver's PoV?



Sorry, I did not get the "grouping by the desired SMMU behaviour" thing.
Could you please give some more details?


I mean this pattern:

device_a_data {
.thing = this;
}

device_b_data {
.thing = this;
}

device_c_data {
.thing = that;
}

match[] = {
{ .compatible = "A", .data = &device_a_data },
{ .compatible = "B", .data = &device_b_data },
{ .compatible = "C", .data = &device_c_data },
};

...vs. this pattern:

do_this {
.thing = this;
}

do_that {
.thing = that;
}

match[] = {
{ .compatible = "A", .data = &do_this },
{ .compatible = "B", .data = &do_this },
{ .compatible = "C", .data = &do_that },
};

From the perspective of the thing doing the thing, grouping the data
by device is meaningless if all that matters is whether to do this or
that. The second pattern expresses that more naturally.

Of course if every device turns out to need a unique combination of
several behaviours, then there ends up being no practical difference
except that it's probably easier to come up with nice names under the
first pattern. I guess it's up to how you see this developing in
future; whether you're likely to need fine-grained per-device control,
or don't expect it to go much beyond domain type.



Thanks for explaining *this thing* :)
I will update the patch to follow the 2nd pattern as it makes more sense
to do_this or do_that directly. I'm not expecting anything other than
domain type at least for now, but hey, we can always add the functionality
if the need arises.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

Re: [PATCH 2/2] iommu/arm-smmu: Allow client devices to select direct mapping

2020-04-16 Thread Robin Murphy

On 2020-04-16 5:23 pm, Sai Prakash Ranjan wrote:

Hi Robin,

On 2020-04-16 19:28, Robin Murphy wrote:

On 2020-01-22 11:48 am, Sai Prakash Ranjan wrote:

From: Jordan Crouse 

Some client devices want to directly map the IOMMU themselves instead
of using the DMA domain. Allow those devices to opt in to direct
mapping by way of a list of compatible strings.

Signed-off-by: Jordan Crouse 
Co-developed-by: Sai Prakash Ranjan 
Signed-off-by: Sai Prakash Ranjan 
---
  drivers/iommu/arm-smmu-qcom.c | 39 +++
  drivers/iommu/arm-smmu.c  |  3 +++
  drivers/iommu/arm-smmu.h  |  5 +
  3 files changed, 47 insertions(+)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 64a4ab270ab7..ff746acd1c81 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -3,6 +3,7 @@
   * Copyright (c) 2019, The Linux Foundation. All rights reserved.
   */
  +#include 
  #include 
    #include "arm-smmu.h"
@@ -11,6 +12,43 @@ struct qcom_smmu {
  struct arm_smmu_device smmu;
  };
  +static const struct arm_smmu_client_match_data qcom_adreno = {
+    .direct_mapping = true,
+};
+
+static const struct arm_smmu_client_match_data qcom_mdss = {
+    .direct_mapping = true,
+};


Might it make sense to group these by the desired SMMU behaviour
rather than (apparently) what kind of device the client happens to be,
which seems like a completely arbitrary distinction from the SMMU
driver's PoV?



Sorry, I did not get the "grouping by the desired SMMU behaviour" thing.
Could you please give some more details?


I mean this pattern:

device_a_data {
.thing = this;
}

device_b_data {
.thing = this;
}

device_c_data {
.thing = that;
}

match[] = {
{ .compatible = "A", .data = &device_a_data },
{ .compatible = "B", .data = &device_b_data },
{ .compatible = "C", .data = &device_c_data },
};

...vs. this pattern:

do_this {
.thing = this;
}

do_that {
.thing = that;
}

match[] = {
{ .compatible = "A", .data = &do_this },
{ .compatible = "B", .data = &do_this },
{ .compatible = "C", .data = &do_that },
};

From the perspective of the thing doing the thing, grouping the data by 
device is meaningless if all that matters is whether to do this or that. 
The second pattern expresses that more naturally.


Of course if every device turns out to need a unique combination of 
several behaviours, then there ends up being no practical difference 
except that it's probably easier to come up with nice names under the 
first pattern. I guess it's up to how you see this developing in future; 
whether you're likely to need fine-grained per-device control, or don't 
expect it to go much beyond domain type.


Robin.

Re: [PATCH 2/2] iommu/arm-smmu: Allow client devices to select direct mapping

2020-04-16 Thread Sai Prakash Ranjan

Hi Robin,

On 2020-04-16 19:28, Robin Murphy wrote:

On 2020-01-22 11:48 am, Sai Prakash Ranjan wrote:

From: Jordan Crouse 

Some client devices want to directly map the IOMMU themselves instead
of using the DMA domain. Allow those devices to opt in to direct
mapping by way of a list of compatible strings.

Signed-off-by: Jordan Crouse 
Co-developed-by: Sai Prakash Ranjan 
Signed-off-by: Sai Prakash Ranjan 
---
  drivers/iommu/arm-smmu-qcom.c | 39 +++
  drivers/iommu/arm-smmu.c  |  3 +++
  drivers/iommu/arm-smmu.h  |  5 +
  3 files changed, 47 insertions(+)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 64a4ab270ab7..ff746acd1c81 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -3,6 +3,7 @@
   * Copyright (c) 2019, The Linux Foundation. All rights reserved.
   */
  +#include 
  #include 
#include "arm-smmu.h"
@@ -11,6 +12,43 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
  };
  +static const struct arm_smmu_client_match_data qcom_adreno = {
+   .direct_mapping = true,
+};
+
+static const struct arm_smmu_client_match_data qcom_mdss = {
+   .direct_mapping = true,
+};


Might it make sense to group these by the desired SMMU behaviour
rather than (apparently) what kind of device the client happens to be,
which seems like a completely arbitrary distinction from the SMMU
driver's PoV?



Sorry, I did not get the "grouping by the desired SMMU behaviour" thing.
Could you please give some more details?


+
+static const struct of_device_id qcom_smmu_client_of_match[] = {
+   { .compatible = "qcom,adreno", .data = &qcom_adreno },
+   { .compatible = "qcom,mdp4", .data = &qcom_mdss },
+   { .compatible = "qcom,mdss", .data = &qcom_mdss },
+   { .compatible = "qcom,sc7180-mdss", .data = &qcom_mdss },
+   { .compatible = "qcom,sdm845-mdss", .data = &qcom_mdss },
+   {},
+};
+
+static const struct arm_smmu_client_match_data *
+qcom_smmu_client_data(struct device *dev)
+{
+   const struct of_device_id *match =
+   of_match_device(qcom_smmu_client_of_match, dev);
+
+   return match ? match->data : NULL;


of_device_get_match_data() is your friend.



Ok will use it.


+}
+
+static int qcom_smmu_request_domain(struct device *dev)
+{
+   const struct arm_smmu_client_match_data *client;
+
+   client = qcom_smmu_client_data(dev);
+   if (client)
+   iommu_request_dm_for_dev(dev);
+
+   return 0;
+}
+
  static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
  {
int ret;
@@ -41,6 +79,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
  }
static const struct arm_smmu_impl qcom_smmu_impl = {
+   .req_domain = qcom_smmu_request_domain,
.reset = qcom_smmu500_reset,
  };
  diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 16c4b87af42b..67dd9326247a 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1448,6 +1448,9 @@ static int arm_smmu_add_device(struct device *dev)
device_link_add(dev, smmu->dev,
DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
  + if (smmu->impl && smmu->impl->req_domain)
+   return smmu->impl->req_domain(dev);
+


There are about 5 different patchsets flying around at the moment that
all touch default domain allocation, so this is a fast-moving target,
but I think where the dust should settle is with arm_smmu_ops
forwarding .def_domain_type (or whatever it ends up as) calls to
arm_smmu_impl as appropriate.



I'll wait till the dust settles down and then post the next version.


return 0;
out_cfg_free:
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 8d1cd54d82a6..059dc9c39f64 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -244,6 +244,10 @@ enum arm_smmu_arch_version {
ARM_SMMU_V2,
  };
  +struct arm_smmu_client_match_data {
+   bool direct_mapping;
+};


Does this need to be public? I don't see the other users...



Will move this out.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE

2020-04-16 Thread Auger Eric
Hi Kevin,

On 4/16/20 3:28 PM, Tian, Kevin wrote:
>> From: Auger Eric 
>> Sent: Thursday, April 16, 2020 8:43 PM
>>
>> Hi Kevin,
>> On 4/16/20 2:09 PM, Tian, Kevin wrote:
 From: Liu, Yi L 
 Sent: Thursday, April 16, 2020 6:40 PM

 Hi Alex,
 Still have a direction question with you. Better get agreement with you
 before heading forward.

> From: Alex Williamson 
> Sent: Friday, April 3, 2020 11:35 PM
 [...]
 + *
 + * returns: 0 on success, -errno on failure.
 + */
 +struct vfio_iommu_type1_cache_invalidate {
 +  __u32   argsz;
 +  __u32   flags;
 +  struct  iommu_cache_invalidate_info cache_info;
 +};
 +#define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 24)
>>>
>>> The future extension capabilities of this ioctl worry me, I wonder if
>>> we should do another data[] with flag defining that data as
 CACHE_INFO.
>>
>> Can you elaborate? Does it mean with this way we don't rely on iommu
>> driver to provide version_to_size conversion and instead we just pass
>> data[] to iommu driver for further audit?
>
> No, my concern is that this ioctl has a single function, strictly tied
> to the iommu uapi.  If we replace cache_info with data[] then we can
> define a flag to specify that data[] is struct
> iommu_cache_invalidate_info, and if we need to, a different flag to
> identify data[] as something else.  For example if we get stuck
> expanding cache_info to meet new demands and develop a new uapi to
> solve that, how would we expand this ioctl to support it rather than
> also create a new ioctl?  There's also a trade-off in making the ioctl
> usage more difficult for the user.  I'd still expect the vfio layer to
> check the flag and interpret data[] as indicated by the flag rather
> than just passing a blob of opaque data to the iommu layer though.
> Thanks,

 Based on your comments about defining a single ioctl and a unified
 vfio structure (with a @data[] field) for pasid_alloc/free, bind/
 unbind_gpasid, cache_inv. After some offline trying, I think it would
 be good for bind/unbind_gpasid and cache_inv as both of them use the
 iommu uapi definition. While the pasid alloc/free operation doesn't.
 It would be weird to put all of them together. So pasid alloc/free
 may have a separate ioctl. It would look as below. Does this direction
 look good per your opinion?

 ioctl #22: VFIO_IOMMU_PASID_REQUEST
 /**
   * @pasid: used to return the pasid alloc result when flags ==
>> ALLOC_PASID
   * specify a pasid to be freed when flags == FREE_PASID
   * @range: specify the allocation range when flags == ALLOC_PASID
   */
 struct vfio_iommu_pasid_request {
__u32   argsz;
 #define VFIO_IOMMU_ALLOC_PASID (1 << 0)
 #define VFIO_IOMMU_FREE_PASID  (1 << 1)
__u32   flags;
__u32   pasid;
struct {
__u32   min;
__u32   max;
} range;
 };

 ioctl #23: VFIO_IOMMU_NESTING_OP
 struct vfio_iommu_type1_nesting_op {
__u32   argsz;
__u32   flags;
__u32   op;
__u8data[];
 };

 /* Nesting Ops */
 #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL0
 #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL  1
 #define VFIO_IOMMU_NESTING_OP_CACHE_INVLD   2

>>>
>>> Then why cannot we just put PASID into the header since the
>>> majority of nested usage is associated with a pasid?
>>>
>>> ioctl #23: VFIO_IOMMU_NESTING_OP
>>> struct vfio_iommu_type1_nesting_op {
>>> __u32   argsz;
>>> __u32   flags;
>>> __u32   op;
>>> __u32   pasid;
>>> __u8data[];
>>> };
>>>
>>> In case of SMMUv2 which supports nested w/o PASID, this field can
>>> be ignored for that specific case.
>> On my side I would prefer keeping the pasid in the data[]. This is not
>> always used.
>>
>> For instance, in iommu_cache_invalidate_info/iommu_inv_pasid_info we
>> devised flags to tell whether the PASID is used.
>>
> 
> But don't we include a PASID in both invalidate structures already?
The pasid presence is indicated by the IOMMU_INV_ADDR_FLAGS_PASID flag.

For instance, for nested-stage SMMUv3 I currently perform an ARCHID (asid)
based invalidation only.

Eric
> 
> struct iommu_inv_addr_info {
> #define IOMMU_INV_ADDR_FLAGS_PASID  (1 << 0)
> #define IOMMU_INV_ADDR_FLAGS_ARCHID (1 << 1)
> #define IOMMU_INV_ADDR_FLAGS_LEAF   (1 << 2)
> __u32   flags;
> __u32   archid;
> __u64   pasid;
> __u64   addr;
> __u64   granule_size;
> __u64   nb_granules;
> };
> 
> struct iommu_inv_pasid_info {
> #define IOMMU_INV_PASID_FLAGS_PASID (1 << 0)
> #define IOMMU_INV_PASID_FLAGS_ARCHID(1 << 1)
> __u32   flags;
> __u32   archid

Re: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE

2020-04-16 Thread Alex Williamson
On Thu, 16 Apr 2020 08:40:31 -0600
Alex Williamson  wrote:

> On Thu, 16 Apr 2020 10:40:03 +
> "Liu, Yi L"  wrote:
> 
> > Hi Alex,
> > Still have a direction question with you. Better get agreement with you
> > before heading forward.
> >   
> > > From: Alex Williamson 
> > > Sent: Friday, April 3, 2020 11:35 PM
> > [...]  
> > > > > > + *
> > > > > > + * returns: 0 on success, -errno on failure.
> > > > > > + */
> > > > > > +struct vfio_iommu_type1_cache_invalidate {
> > > > > > +   __u32   argsz;
> > > > > > +   __u32   flags;
> > > > > > +   struct  iommu_cache_invalidate_info cache_info;
> > > > > > +};
> > > > > > +#define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 24)
> > > > >
> > > > > The future extension capabilities of this ioctl worry me, I wonder if
> > > > > we should do another data[] with flag defining that data as 
> > > > > CACHE_INFO.
> > > >
> > > > Can you elaborate? Does it mean with this way we don't rely on iommu
> > > > driver to provide version_to_size conversion and instead we just pass
> > > > data[] to iommu driver for further audit?
> > > 
> > > No, my concern is that this ioctl has a single function, strictly tied
> > > to the iommu uapi.  If we replace cache_info with data[] then we can
> > > define a flag to specify that data[] is struct
> > > iommu_cache_invalidate_info, and if we need to, a different flag to
> > > identify data[] as something else.  For example if we get stuck
> > > expanding cache_info to meet new demands and develop a new uapi to
> > > solve that, how would we expand this ioctl to support it rather than
> > > also create a new ioctl?  There's also a trade-off in making the ioctl
> > > usage more difficult for the user.  I'd still expect the vfio layer to
> > > check the flag and interpret data[] as indicated by the flag rather
> > > than just passing a blob of opaque data to the iommu layer though.
> > > Thanks,
> > 
> > Based on your comments about defining a single ioctl and a unified
> > vfio structure (with a @data[] field) for pasid_alloc/free, bind/
> > unbind_gpasid, cache_inv. After some offline trying, I think it would
> > be good for bind/unbind_gpasid and cache_inv as both of them use the
> > iommu uapi definition. While the pasid alloc/free operation doesn't.
> > It would be weird to put all of them together. So pasid alloc/free
> > may have a separate ioctl. It would look as below. Does this direction
> > look good per your opinion?
> > 
> > ioctl #22: VFIO_IOMMU_PASID_REQUEST
> > /**
> >   * @pasid: used to return the pasid alloc result when flags == ALLOC_PASID
> >   * specify a pasid to be freed when flags == FREE_PASID
> >   * @range: specify the allocation range when flags == ALLOC_PASID
> >   */
> > struct vfio_iommu_pasid_request {
> > __u32   argsz;
> > #define VFIO_IOMMU_ALLOC_PASID  (1 << 0)
> > #define VFIO_IOMMU_FREE_PASID   (1 << 1)
> > __u32   flags;
> > __u32   pasid;
> > struct {
> > __u32   min;
> > __u32   max;
> > } range;
> > };  
> 
> Can't the ioctl return the pasid valid on alloc (like GET_DEVICE_FD)?

s/valid/value/

> Would it be useful to support freeing a range of pasids?  If so then we
> could simply use range for both, ie. allocate a pasid from this range
> and return it, or free all pasids in this range?  vfio already needs to
> track pasids to free them on release, so presumably this is something
> we could support easily.
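> 
> A range-based alloc/free as floated above could be sketched with a simple
> bitmap. The helper names below are hypothetical and the kernel side would
> more likely sit on an IDA/xarray; this only illustrates the "allocate from
> this range / free all pasids in this range" semantics:
> 
> ```c
> #include <assert.h>
> #include <stdio.h>
> 
> #define NPASIDS 64
> static unsigned char used[NPASIDS];
> 
> /* Allocate the lowest free PASID within [min, max]; -1 if exhausted. */
> static int pasid_alloc_range(int min, int max)
> {
> 	for (int p = min; p <= max && p < NPASIDS; p++) {
> 		if (!used[p]) {
> 			used[p] = 1;
> 			return p;
> 		}
> 	}
> 	return -1;
> }
> 
> /* Free every allocated PASID within [min, max] in one call. */
> static void pasid_free_range(int min, int max)
> {
> 	for (int p = min; p <= max && p < NPASIDS; p++)
> 		used[p] = 0;
> }
> 
> int main(void)
> {
> 	int a = pasid_alloc_range(1, 10);
> 	int b = pasid_alloc_range(1, 10);
> 	printf("allocated %d and %d\n", a, b);	/* lowest free first */
> 	pasid_free_range(1, 10);		/* release the whole range */
> 	printf("after free: %d\n", pasid_alloc_range(1, 10));
> 	return 0;
> }
> ```
> 
> With this shape, a single ioctl taking a range covers both single-PASID
> and bulk-free use, as suggested.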
>  
> > ioctl #23: VFIO_IOMMU_NESTING_OP
> > struct vfio_iommu_type1_nesting_op {
> > __u32   argsz;
> > __u32   flags;
> > __u32   op;
> > __u8data[];
> > };  
> 
> data only has 4-byte alignment, I think we really want it at an 8-byte
> alignment.  This is why I embedded the "op" into the flag for
> DEVICE_FEATURE.  Thanks,
> 
> Alex
> 
> > 
> > /* Nesting Ops */
> > #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL0
> > #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL  1
> > #define VFIO_IOMMU_NESTING_OP_CACHE_INVLD   2
> >  
> > Thanks,
> > Yi Liu
> >   
> 


Re: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE

2020-04-16 Thread Alex Williamson
On Thu, 16 Apr 2020 10:40:03 +
"Liu, Yi L"  wrote:

> Hi Alex,
> Still have a direction question with you. Better get agreement with you
> before heading forward.
> 
> > From: Alex Williamson 
> > Sent: Friday, April 3, 2020 11:35 PM  
> [...]
> > > > > + *
> > > > > + * returns: 0 on success, -errno on failure.
> > > > > + */
> > > > > +struct vfio_iommu_type1_cache_invalidate {
> > > > > + __u32   argsz;
> > > > > + __u32   flags;
> > > > > + struct  iommu_cache_invalidate_info cache_info;
> > > > > +};
> > > > > +#define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 24)
> > > >
> > > > The future extension capabilities of this ioctl worry me, I wonder if
> > > > we should do another data[] with flag defining that data as CACHE_INFO. 
> > > >  
> > >
> > > Can you elaborate? Does it mean with this way we don't rely on iommu
> > > driver to provide version_to_size conversion and instead we just pass
> > > data[] to iommu driver for further audit?  
> > 
> > No, my concern is that this ioctl has a single function, strictly tied
> > to the iommu uapi.  If we replace cache_info with data[] then we can
> > define a flag to specify that data[] is struct
> > iommu_cache_invalidate_info, and if we need to, a different flag to
> > identify data[] as something else.  For example if we get stuck
> > expanding cache_info to meet new demands and develop a new uapi to
> > solve that, how would we expand this ioctl to support it rather than
> > also create a new ioctl?  There's also a trade-off in making the ioctl
> > usage more difficult for the user.  I'd still expect the vfio layer to
> > check the flag and interpret data[] as indicated by the flag rather
> > than just passing a blob of opaque data to the iommu layer though.
> > Thanks,  
> 
> Based on your comments about defining a single ioctl and a unified
> vfio structure (with a @data[] field) for pasid_alloc/free, bind/
> unbind_gpasid, cache_inv. After some offline trying, I think it would
> be good for bind/unbind_gpasid and cache_inv as both of them use the
> iommu uapi definition. While the pasid alloc/free operation doesn't.
> It would be weird to put all of them together. So pasid alloc/free
> may have a separate ioctl. It would look as below. Does this direction
> look good per your opinion?
> 
> ioctl #22: VFIO_IOMMU_PASID_REQUEST
> /**
>   * @pasid: used to return the pasid alloc result when flags == ALLOC_PASID
>   * specify a pasid to be freed when flags == FREE_PASID
>   * @range: specify the allocation range when flags == ALLOC_PASID
>   */
> struct vfio_iommu_pasid_request {
>   __u32   argsz;
> #define VFIO_IOMMU_ALLOC_PASID(1 << 0)
> #define VFIO_IOMMU_FREE_PASID (1 << 1)
>   __u32   flags;
>   __u32   pasid;
>   struct {
>   __u32   min;
>   __u32   max;
>   } range;
> };

Can't the ioctl return the pasid value on alloc (like GET_DEVICE_FD)?
Would it be useful to support freeing a range of pasids?  If so then we
could simply use range for both, ie. allocate a pasid from this range
and return it, or free all pasids in this range?  vfio already needs to
track pasids to free them on release, so presumably this is something
we could support easily.
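The range semantics Alex describes (allocate one pasid from a range, or free every pasid in a range on release) can be modelled with a small standalone sketch; the bitmap allocator, its toy capacity, and the function names are illustrative stand-ins, not part of any proposed uapi or of the kernel's real pasid IDA:

```c
#include <stdint.h>

#define PASID_MAX 64 /* toy capacity for illustration; real pasids are 20-bit */

static uint64_t pasid_bitmap; /* bit p set => pasid p is allocated */

/* Allocate one pasid from [min, max]; returns the pasid or -1 if exhausted. */
static int pasid_alloc_range(uint32_t min, uint32_t max)
{
	for (uint32_t p = min; p <= max && p < PASID_MAX; p++) {
		if (!(pasid_bitmap & (1ULL << p))) {
			pasid_bitmap |= 1ULL << p;
			return (int)p;
		}
	}
	return -1;
}

/* Free every allocated pasid in [min, max], e.g. on container release. */
static void pasid_free_range(uint32_t min, uint32_t max)
{
	for (uint32_t p = min; p <= max && p < PASID_MAX; p++)
		pasid_bitmap &= ~(1ULL << p);
}
```

With this shape, a single `range` field serves both flags: ALLOC_PASID picks one id out of the range and returns it, FREE_PASID clears the whole range.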
 
> ioctl #23: VFIO_IOMMU_NESTING_OP
> struct vfio_iommu_type1_nesting_op {
>   __u32   argsz;
>   __u32   flags;
>   __u32   op;
>   __u8data[];
> };

data only has 4-byte alignment, I think we really want it at an 8-byte
alignment.  This is why I embedded the "op" into the flag for
DEVICE_FEATURE.  Thanks,
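Alex's alignment point can be checked with a standalone program; the struct names below are stand-ins for the proposed uapi, not real definitions. With argsz/flags/op as three __u32 fields, data[] lands at offset 12, a multiple of 4 only; folding op into flags, as DEVICE_FEATURE does, puts data[] at offset 8, so it is naturally 8-byte aligned whenever the containing buffer is:

```c
#include <stddef.h>
#include <stdint.h>

/* op as a separate field: data[] starts at offset 12 (4-byte aligned only) */
struct nesting_op_separate {
	uint32_t argsz;
	uint32_t flags;
	uint32_t op;
	uint8_t  data[];
};

/* op folded into flags (DEVICE_FEATURE style): data[] starts at offset 8 */
struct nesting_op_folded {
	uint32_t argsz;
	uint32_t flags; /* low bits carry the op */
	uint8_t  data[];
};

static size_t data_offset_separate(void)
{
	return offsetof(struct nesting_op_separate, data);
}

static size_t data_offset_folded(void)
{
	return offsetof(struct nesting_op_folded, data);
}
```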

Alex

> 
> /* Nesting Ops */
> #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL0
> #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL  1
> #define VFIO_IOMMU_NESTING_OP_CACHE_INVLD   2
>  
> Thanks,
> Yi Liu
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 1/4] bus: fsl-mc: add custom .dma_configure implementation

2020-04-16 Thread Laurentiu Tudor



On 4/15/2020 7:04 PM, Lorenzo Pieralisi wrote:
> On Wed, Apr 15, 2020 at 06:44:37PM +0300, Laurentiu Tudor wrote:
>>
>>
>> On 4/14/2020 5:32 PM, Lorenzo Pieralisi wrote:
>>> On Wed, Mar 25, 2020 at 06:48:55PM +0200, Laurentiu Tudor wrote:
 Hi Lorenzo,

 On 3/25/2020 2:51 PM, Lorenzo Pieralisi wrote:
> On Thu, Feb 27, 2020 at 12:05:39PM +0200, laurentiu.tu...@nxp.com wrote:
>> From: Laurentiu Tudor 
>>
>> The devices on this bus are not discovered by way of device tree
>> but by queries to the firmware. It makes little sense to trick the
>> generic OF layer into thinking that these devices are OF related so
>> that we can get our dma configuration. Instead of doing that, add
>> our custom dma configuration implementation.
>>
>> Signed-off-by: Laurentiu Tudor 
>> ---
>>  drivers/bus/fsl-mc/fsl-mc-bus.c | 31 ++-
>>  1 file changed, 30 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c 
>> b/drivers/bus/fsl-mc/fsl-mc-bus.c
>> index 36eb25f82c8e..eafaa0e0b906 100644
>> --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
>> +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
>> @@ -132,11 +132,40 @@ static int fsl_mc_bus_uevent(struct device *dev, 
>> struct kobj_uevent_env *env)
>>  static int fsl_mc_dma_configure(struct device *dev)
>>  {
>>  struct device *dma_dev = dev;
>> +struct iommu_fwspec *fwspec;
>> +const struct iommu_ops *iommu_ops;
>> +struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev);
>> +int ret;
>> +u32 icid;
>>  
>>  while (dev_is_fsl_mc(dma_dev))
>>  dma_dev = dma_dev->parent;
>>  
>> -return of_dma_configure(dev, dma_dev->of_node, 0);
>> +fwspec = dev_iommu_fwspec_get(dma_dev);
>> +if (!fwspec)
>> +return -ENODEV;
>> +iommu_ops = iommu_ops_from_fwnode(fwspec->iommu_fwnode);
>> +if (!iommu_ops)
>> +return -ENODEV;
>> +
>> +ret = iommu_fwspec_init(dev, fwspec->iommu_fwnode, iommu_ops);
>> +if (ret)
>> +return ret;
>> +
>> +icid = mc_dev->icid;
>> +ret = iommu_fwspec_add_ids(dev, &icid, 1);
>
> I see. So with this patch we would use the MC named component only to
> retrieve the iommu_ops

 Right. I'd also add that the implementation tries to follow the existing
 standard .dma_configure implementations, e.g. of_dma_configure +
 of_iommu_configure. I'd also note that similarly to the ACPI case, this
 MC FW device is probed as a platform device in the DT scenario, binding
 here [1].
 A similar approach is used for the retrieval of the msi irq domain, see
 following patch.

> - the streamid are injected directly here bypassing OF/IORT bindings 
> translations altogether. 

 Actually I've submitted a v2 [2] that calls into .of_xlate() to allow
 the smmu driver to do some processing on the raw streamid coming from
 the firmware. I have not yet tested this with ACPI but expect it to
 work, however, it's debatable how valid is this approach in the context
 of ACPI.
>>>
>>> Actually, what I think you need is of_map_rid() (and an IORT
>>> equivalent, that I am going to write - generalizing iort_msi_map_rid()).
>>>
>>> Would that be enough to enable IORT "normal" mappings in the MC bus
>>> named components ?
>>>
>>
>> At a first glance, looks like this could very well fix the ACPI
>> scenario, but I have some unclarities on the approach:
>>  * are we going to rely on the DT and ACPI generic layers even if these
>> devices are not published / enumerated through DT or ACPI tables?
>>  * the firmware manages and provides discrete streamids for the devices
>> it exposes so there's no translation involved. There's no
>>requestor_id / input_id involved but it seems that we would still do
>> some kind of translation, relying on the DT/ACPI functions for this.
>>  * MC firmware has its own stream_id (e.g. on some chips 0x4000, others
>> 0xf00, so outside the range of stream_ids used for the mc devices)
>>while for the devices on this bus, MC allocates stream_ids from a
>> range (e.g. 0x17 - 0x3f). Is it possible to describe this in the IORT table?
>>  * Regarding the of_map_rid() use you mentioned, I was planning to
>> decouple the mc bus from the DT layer by dropping the use of
>> of_map_rid(), see patch 4.
>> I briefly glanced over the iort code and spotted this static function:
>> iort_iommu_xlate(). Wouldn't it also help, of course after making it public?
> 
> Guys I have lost you honestly. I don't understand what you really need
> to do with DT and ACPI here. Are they needed to describe what you need
> or not ? If the MC dma configure function does not need any DT/ACPI
> bindings that's fine by me, I don't und

Re: [PATCH 2/2] iommu/arm-smmu: Allow client devices to select direct mapping

2020-04-16 Thread Robin Murphy

On 2020-01-22 11:48 am, Sai Prakash Ranjan wrote:

From: Jordan Crouse 

Some client devices want to directly map the IOMMU themselves instead
of using the DMA domain. Allow those devices to opt in to direct
mapping by way of a list of compatible strings.

Signed-off-by: Jordan Crouse 
Co-developed-by: Sai Prakash Ranjan 
Signed-off-by: Sai Prakash Ranjan 
---
  drivers/iommu/arm-smmu-qcom.c | 39 +++
  drivers/iommu/arm-smmu.c  |  3 +++
  drivers/iommu/arm-smmu.h  |  5 +
  3 files changed, 47 insertions(+)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 64a4ab270ab7..ff746acd1c81 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -3,6 +3,7 @@
   * Copyright (c) 2019, The Linux Foundation. All rights reserved.
   */
  
+#include 

  #include 
  
  #include "arm-smmu.h"

@@ -11,6 +12,43 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
  };
  
+static const struct arm_smmu_client_match_data qcom_adreno = {

+   .direct_mapping = true,
+};
+
+static const struct arm_smmu_client_match_data qcom_mdss = {
+   .direct_mapping = true,
+};


Might it make sense to group these by the desired SMMU behaviour rather 
than (apparently) what kind of device the client happens to be, which 
seems like a completely arbitrary distinction from the SMMU driver's PoV?
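Robin's grouping suggestion can be modelled outside the kernel for illustration; the table and lookup helper below are stand-ins for the of_device_id machinery, not real driver code. The idea is one shared match-data instance per desired SMMU behaviour, referenced by every compatible that wants that behaviour, rather than one instance per kind of client device:

```c
#include <string.h>

struct client_match_data {
	int direct_mapping;
};

/* One instance per behaviour, not per client type. */
static const struct client_match_data qcom_direct_map = {
	.direct_mapping = 1,
};

static const struct {
	const char *compatible;
	const struct client_match_data *data;
} qcom_client_table[] = {
	{ "qcom,adreno",      &qcom_direct_map },
	{ "qcom,mdp4",        &qcom_direct_map },
	{ "qcom,mdss",        &qcom_direct_map },
	{ "qcom,sdm845-mdss", &qcom_direct_map },
	{ NULL, NULL },
};

/* Analogous to an of_device_get_match_data() lookup on the client node. */
static const struct client_match_data *client_data(const char *compatible)
{
	for (size_t i = 0; qcom_client_table[i].compatible; i++)
		if (!strcmp(qcom_client_table[i].compatible, compatible))
			return qcom_client_table[i].data;
	return NULL;
}
```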



+
+static const struct of_device_id qcom_smmu_client_of_match[] = {
+   { .compatible = "qcom,adreno", .data = &qcom_adreno },
+   { .compatible = "qcom,mdp4", .data = &qcom_mdss },
+   { .compatible = "qcom,mdss", .data = &qcom_mdss },
+   { .compatible = "qcom,sc7180-mdss", .data = &qcom_mdss },
+   { .compatible = "qcom,sdm845-mdss", .data = &qcom_mdss },
+   {},
+};
+
+static const struct arm_smmu_client_match_data *
+qcom_smmu_client_data(struct device *dev)
+{
+   const struct of_device_id *match =
+   of_match_device(qcom_smmu_client_of_match, dev);
+
+   return match ? match->data : NULL;


of_device_get_match_data() is your friend.


+}
+
+static int qcom_smmu_request_domain(struct device *dev)
+{
+   const struct arm_smmu_client_match_data *client;
+
+   client = qcom_smmu_client_data(dev);
+   if (client)
+   iommu_request_dm_for_dev(dev);
+
+   return 0;
+}
+
  static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
  {
int ret;
@@ -41,6 +79,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
  }
  
  static const struct arm_smmu_impl qcom_smmu_impl = {

+   .req_domain = qcom_smmu_request_domain,
.reset = qcom_smmu500_reset,
  };
  
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c

index 16c4b87af42b..67dd9326247a 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1448,6 +1448,9 @@ static int arm_smmu_add_device(struct device *dev)
device_link_add(dev, smmu->dev,
DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
  
+	if (smmu->impl && smmu->impl->req_domain)

+   return smmu->impl->req_domain(dev);
+


There are about 5 different patchsets flying around at the moment that 
all touch default domain allocation, so this is a fast-moving target, 
but I think where the dust should settle is with arm_smmu_ops forwarding 
.def_domain_type (or whatever it ends up as) calls to arm_smmu_impl as 
appropriate.



return 0;
  
  out_cfg_free:

diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 8d1cd54d82a6..059dc9c39f64 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -244,6 +244,10 @@ enum arm_smmu_arch_version {
ARM_SMMU_V2,
  };
  
+struct arm_smmu_client_match_data {

+   bool direct_mapping;
+};


Does this need to be public? I don't see the other users...

Robin.


+
  enum arm_smmu_implementation {
GENERIC_SMMU,
ARM_MMU500,
@@ -386,6 +390,7 @@ struct arm_smmu_impl {
int (*init_context)(struct arm_smmu_domain *smmu_domain);
void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 int status);
+   int (*req_domain)(struct device *dev);
  };
  
  static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n)





Re: [PATCH 1/2] iommu: arm-smmu-impl: Convert to a generic reset implementation

2020-04-16 Thread Robin Murphy

On 2020-01-22 11:48 am, Sai Prakash Ranjan wrote:

Currently the QCOM specific smmu reset implementation is very
specific to SDM845 SoC and has a wait-for-safe logic which
may not be required for other SoCs. So move the SDM845 specific
logic to its specific reset function. Also add SC7180 SMMU
compatible for calling into QCOM specific implementation.

Signed-off-by: Sai Prakash Ranjan 
---
  drivers/iommu/arm-smmu-impl.c |  8 +---
  drivers/iommu/arm-smmu-qcom.c | 16 +---
  2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 74d97a886e93..c75b9d957b70 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -150,6 +150,8 @@ static const struct arm_smmu_impl arm_mmu500_impl = {
  
  struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)

  {
+   const struct device_node *np = smmu->dev->of_node;
+
/*
 * We will inevitably have to combine model-specific implementation
 * quirks with platform-specific integration quirks, but everything
@@ -166,11 +168,11 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
break;
}
  
-	if (of_property_read_bool(smmu->dev->of_node,

- "calxeda,smmu-secure-config-access"))
+   if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = &calxeda_impl;
  
-	if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500"))

+   if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
+   of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
  
  	return smmu;

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 24c071c1d8b0..64a4ab270ab7 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -15,8 +15,6 @@ static int qcom_sdm845_smmu500_reset(struct arm_smmu_device 
*smmu)
  {
int ret;
  
-	arm_mmu500_reset(smmu);

-
/*
 * To address performance degradation in non-real time clients,
 * such as USB and UFS, turn off wait-for-safe on sdm845 based boards,
@@ -30,8 +28,20 @@ static int qcom_sdm845_smmu500_reset(struct arm_smmu_device 
*smmu)
return ret;
  }
  
+static int qcom_smmu500_reset(struct arm_smmu_device *smmu)

+{
+   const struct device_node *np = smmu->dev->of_node;
+
+   arm_mmu500_reset(smmu);
+
+   if (of_device_is_compatible(np, "qcom,sdm845-smmu-500"))
+   return qcom_sdm845_smmu500_reset(smmu);
+
+   return 0;
+}
+
  static const struct arm_smmu_impl qcom_smmu_impl = {
-   .reset = qcom_sdm845_smmu500_reset,
+   .reset = qcom_smmu500_reset,
  };


It might be logical to have a separate SDM845 impl rather than 
indirecting within the callback itself, but I'm not too concerned either 
way. For the arm-smmu-impl.c changes,


Reviewed-by: Robin Murphy 

Thanks,
Robin.

  
  struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)





RE: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE

2020-04-16 Thread Tian, Kevin
> From: Auger Eric 
> Sent: Thursday, April 16, 2020 8:43 PM
> 
> Hi Kevin,
> On 4/16/20 2:09 PM, Tian, Kevin wrote:
> >> From: Liu, Yi L 
> >> Sent: Thursday, April 16, 2020 6:40 PM
> >>
> >> Hi Alex,
> >> Still have a direction question with you. Better get agreement with you
> >> before heading forward.
> >>
> >>> From: Alex Williamson 
> >>> Sent: Friday, April 3, 2020 11:35 PM
> >> [...]
> >> + *
> >> + * returns: 0 on success, -errno on failure.
> >> + */
> >> +struct vfio_iommu_type1_cache_invalidate {
> >> +  __u32   argsz;
> >> +  __u32   flags;
> >> +  struct  iommu_cache_invalidate_info cache_info;
> >> +};
> >> +#define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE,
> >>> VFIO_BASE
> > + 24)
> >
> > The future extension capabilities of this ioctl worry me, I wonder if
> > we should do another data[] with flag defining that data as
> >> CACHE_INFO.
> 
>  Can you elaborate? Does it mean with this way we don't rely on iommu
>  driver to provide version_to_size conversion and instead we just pass
>  data[] to iommu driver for further audit?
> >>>
> >>> No, my concern is that this ioctl has a single function, strictly tied
> >>> to the iommu uapi.  If we replace cache_info with data[] then we can
> >>> define a flag to specify that data[] is struct
> >>> iommu_cache_invalidate_info, and if we need to, a different flag to
> >>> identify data[] as something else.  For example if we get stuck
> >>> expanding cache_info to meet new demands and develop a new uapi to
> >>> solve that, how would we expand this ioctl to support it rather than
> >>> also create a new ioctl?  There's also a trade-off in making the ioctl
> >>> usage more difficult for the user.  I'd still expect the vfio layer to
> >>> check the flag and interpret data[] as indicated by the flag rather
> >>> than just passing a blob of opaque data to the iommu layer though.
> >>> Thanks,
> >>
> >> Based on your comments about defining a single ioctl and a unified
> >> vfio structure (with a @data[] field) for pasid_alloc/free, bind/
> >> unbind_gpasid, cache_inv. After some offline trying, I think it would
> >> be good for bind/unbind_gpasid and cache_inv as both of them use the
> >> iommu uapi definition. While the pasid alloc/free operation doesn't.
> >> It would be weird to put all of them together. So pasid alloc/free
> >> may have a separate ioctl. It would look as below. Does this direction
> >> look good per your opinion?
> >>
> >> ioctl #22: VFIO_IOMMU_PASID_REQUEST
> >> /**
> >>   * @pasid: used to return the pasid alloc result when flags ==
> ALLOC_PASID
> >>   * specify a pasid to be freed when flags == FREE_PASID
> >>   * @range: specify the allocation range when flags == ALLOC_PASID
> >>   */
> >> struct vfio_iommu_pasid_request {
> >>__u32   argsz;
> >> #define VFIO_IOMMU_ALLOC_PASID (1 << 0)
> >> #define VFIO_IOMMU_FREE_PASID  (1 << 1)
> >>__u32   flags;
> >>__u32   pasid;
> >>struct {
> >>__u32   min;
> >>__u32   max;
> >>} range;
> >> };
> >>
> >> ioctl #23: VFIO_IOMMU_NESTING_OP
> >> struct vfio_iommu_type1_nesting_op {
> >>__u32   argsz;
> >>__u32   flags;
> >>__u32   op;
> >>__u8data[];
> >> };
> >>
> >> /* Nesting Ops */
> >> #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL0
> >> #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL  1
> >> #define VFIO_IOMMU_NESTING_OP_CACHE_INVLD   2
> >>
> >
> > Then why cannot we just put PASID into the header since the
> > majority of nested usage is associated with a pasid?
> >
> > ioctl #23: VFIO_IOMMU_NESTING_OP
> > struct vfio_iommu_type1_nesting_op {
> > __u32   argsz;
> > __u32   flags;
> > __u32   op;
> > __u32   pasid;
> > __u8data[];
> > };
> >
> > In case of SMMUv2 which supports nested w/o PASID, this field can
> > be ignored for that specific case.
> On my side I would prefer keeping the pasid in the data[]. This is not
> always used.
> 
> For instance, in iommu_cache_invalidate_info/iommu_inv_pasid_info we
> devised flags to tell whether the PASID is used.
> 

But don't we include a PASID in both invalidate structures already?

struct iommu_inv_addr_info {
#define IOMMU_INV_ADDR_FLAGS_PASID  (1 << 0)
#define IOMMU_INV_ADDR_FLAGS_ARCHID (1 << 1)
#define IOMMU_INV_ADDR_FLAGS_LEAF   (1 << 2)
__u32   flags;
__u32   archid;
__u64   pasid;
__u64   addr;
__u64   granule_size;
__u64   nb_granules;
};

struct iommu_inv_pasid_info {
#define IOMMU_INV_PASID_FLAGS_PASID (1 << 0)
#define IOMMU_INV_PASID_FLAGS_ARCHID(1 << 1)
__u32   flags;
__u32   archid;
__u64   pasid;
};

Then consolidating the pasid field into the generic header doesn't
hurt; the specific handler still relies on flags to tell whether it
is used?
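The flags-gated interpretation Kevin describes can be sketched as a small standalone check; the struct and flag values mirror the uapi excerpts quoted above, but the helper and its name are illustrative, not kernel code:

```c
#include <stdint.h>

#define IOMMU_INV_ADDR_FLAGS_PASID  (1 << 0)
#define IOMMU_INV_ADDR_FLAGS_ARCHID (1 << 1)

struct inv_addr_info {
	uint32_t flags;
	uint32_t archid;
	uint64_t pasid;
	uint64_t addr;
};

/* A handler consumes the pasid field only when the flag marks it valid,
 * so a pasid hoisted into a generic header can be ignored the same way. */
static int inv_pasid_valid(const struct inv_addr_info *info)
{
	return !!(info->flags & IOMMU_INV_ADDR_FLAGS_PASID);
}
```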

Thanks
Kevin

Re: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE

2020-04-16 Thread Auger Eric
Hi Kevin,
On 4/16/20 2:09 PM, Tian, Kevin wrote:
>> From: Liu, Yi L 
>> Sent: Thursday, April 16, 2020 6:40 PM
>>
>> Hi Alex,
>> Still have a direction question with you. Better get agreement with you
>> before heading forward.
>>
>>> From: Alex Williamson 
>>> Sent: Friday, April 3, 2020 11:35 PM
>> [...]
>> + *
>> + * returns: 0 on success, -errno on failure.
>> + */
>> +struct vfio_iommu_type1_cache_invalidate {
>> +__u32   argsz;
>> +__u32   flags;
>> +struct  iommu_cache_invalidate_info cache_info;
>> +};
>> +#define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE,
>>> VFIO_BASE
> + 24)
>
> The future extension capabilities of this ioctl worry me, I wonder if
> we should do another data[] with flag defining that data as
>> CACHE_INFO.

 Can you elaborate? Does it mean with this way we don't rely on iommu
 driver to provide version_to_size conversion and instead we just pass
 data[] to iommu driver for further audit?
>>>
>>> No, my concern is that this ioctl has a single function, strictly tied
>>> to the iommu uapi.  If we replace cache_info with data[] then we can
>>> define a flag to specify that data[] is struct
>>> iommu_cache_invalidate_info, and if we need to, a different flag to
>>> identify data[] as something else.  For example if we get stuck
>>> expanding cache_info to meet new demands and develop a new uapi to
>>> solve that, how would we expand this ioctl to support it rather than
>>> also create a new ioctl?  There's also a trade-off in making the ioctl
>>> usage more difficult for the user.  I'd still expect the vfio layer to
>>> check the flag and interpret data[] as indicated by the flag rather
>>> than just passing a blob of opaque data to the iommu layer though.
>>> Thanks,
>>
>> Based on your comments about defining a single ioctl and a unified
>> vfio structure (with a @data[] field) for pasid_alloc/free, bind/
>> unbind_gpasid, cache_inv. After some offline trying, I think it would
>> be good for bind/unbind_gpasid and cache_inv as both of them use the
>> iommu uapi definition. While the pasid alloc/free operation doesn't.
>> It would be weird to put all of them together. So pasid alloc/free
>> may have a separate ioctl. It would look as below. Does this direction
>> look good per your opinion?
>>
>> ioctl #22: VFIO_IOMMU_PASID_REQUEST
>> /**
>>   * @pasid: used to return the pasid alloc result when flags == ALLOC_PASID
>>   * specify a pasid to be freed when flags == FREE_PASID
>>   * @range: specify the allocation range when flags == ALLOC_PASID
>>   */
>> struct vfio_iommu_pasid_request {
>>  __u32   argsz;
>> #define VFIO_IOMMU_ALLOC_PASID   (1 << 0)
>> #define VFIO_IOMMU_FREE_PASID(1 << 1)
>>  __u32   flags;
>>  __u32   pasid;
>>  struct {
>>  __u32   min;
>>  __u32   max;
>>  } range;
>> };
>>
>> ioctl #23: VFIO_IOMMU_NESTING_OP
>> struct vfio_iommu_type1_nesting_op {
>>  __u32   argsz;
>>  __u32   flags;
>>  __u32   op;
>>  __u8data[];
>> };
>>
>> /* Nesting Ops */
>> #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL0
>> #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL  1
>> #define VFIO_IOMMU_NESTING_OP_CACHE_INVLD   2
>>
> 
> Then why cannot we just put PASID into the header since the
> majority of nested usage is associated with a pasid? 
> 
> ioctl #23: VFIO_IOMMU_NESTING_OP
> struct vfio_iommu_type1_nesting_op {
>   __u32   argsz;
>   __u32   flags;
>   __u32   op;
>   __u32   pasid;
>   __u8data[];
> };
> 
> In case of SMMUv2 which supports nested w/o PASID, this field can
> be ignored for that specific case.
On my side I would prefer keeping the pasid in the data[]. This is not
always used.

For instance, in iommu_cache_invalidate_info/iommu_inv_pasid_info we
devised flags to tell whether the PASID is used.

Thanks

Eric
> 
> Thanks
> Kevin
> 



Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces

2020-04-16 Thread Christoph Hellwig
On Thu, Apr 16, 2020 at 10:54:02AM +0200, Jean-Philippe Brucker wrote:
> On Thu, Apr 16, 2020 at 12:28:52AM -0700, Christoph Hellwig wrote:
> > > + rcu_read_lock();
> > > + hlist_for_each_entry_rcu(bond, &io_mm->devices, mm_node)
> > > + io_mm->ops->invalidate(bond->sva.dev, io_mm->pasid, io_mm->ctx,
> > > +start, end - start);
> > > + rcu_read_unlock();
> > > +}
> > 
> > What is the reason that the devices don't register their own notifiers?
> > This kinds of multiplexing is always rather messy, and you do it for
> > all the methods.
> 
> This sends TLB and ATC invalidations through the IOMMU; it doesn't go
> through device drivers

I don't think we mean the same thing, probably because of my rather
imprecise use of the word device.

What I mean is that the mmu_notifier should not be embedded into the
io_mm structure (which, btw, seems to have a way too generic name, just
like all other io_* prefixed names), but instead into the
iommu_bond structure.  That avoids the whole multiplexing layer.
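Christoph's suggestion — one notifier embedded per bond rather than one per io_mm — can be modelled in a standalone way. The struct names follow the patch, container_of is re-derived locally, and none of this is the actual SVA code; it only shows how a per-bond callback recovers its own bond directly, with no io_mm-level loop over devices:

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct mmu_notifier { int placeholder; };

/* Notifier embedded in the bond: the invalidate callback gets the bond
 * (and thus the device) straight from the notifier pointer. */
struct iommu_bond {
	struct mmu_notifier mn;
	int dev_id;
};

static int bond_dev_id(struct mmu_notifier *mn)
{
	return container_of(mn, struct iommu_bond, mn)->dev_id;
}
```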
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE

2020-04-16 Thread Tian, Kevin
> From: Liu, Yi L 
> Sent: Thursday, April 16, 2020 6:40 PM
> 
> Hi Alex,
> Still have a direction question with you. Better get agreement with you
> before heading forward.
> 
> > From: Alex Williamson 
> > Sent: Friday, April 3, 2020 11:35 PM
> [...]
> > > > > + *
> > > > > + * returns: 0 on success, -errno on failure.
> > > > > + */
> > > > > +struct vfio_iommu_type1_cache_invalidate {
> > > > > + __u32   argsz;
> > > > > + __u32   flags;
> > > > > + struct  iommu_cache_invalidate_info cache_info;
> > > > > +};
> > > > > +#define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE,
> > VFIO_BASE
> > > > + 24)
> > > >
> > > > The future extension capabilities of this ioctl worry me, I wonder if
> > > > we should do another data[] with flag defining that data as
> CACHE_INFO.
> > >
> > > Can you elaborate? Does it mean with this way we don't rely on iommu
> > > driver to provide version_to_size conversion and instead we just pass
> > > data[] to iommu driver for further audit?
> >
> > No, my concern is that this ioctl has a single function, strictly tied
> > to the iommu uapi.  If we replace cache_info with data[] then we can
> > define a flag to specify that data[] is struct
> > iommu_cache_invalidate_info, and if we need to, a different flag to
> > identify data[] as something else.  For example if we get stuck
> > expanding cache_info to meet new demands and develop a new uapi to
> > solve that, how would we expand this ioctl to support it rather than
> > also create a new ioctl?  There's also a trade-off in making the ioctl
> > usage more difficult for the user.  I'd still expect the vfio layer to
> > check the flag and interpret data[] as indicated by the flag rather
> > than just passing a blob of opaque data to the iommu layer though.
> > Thanks,
> 
> Based on your comments about defining a single ioctl and a unified
> vfio structure (with a @data[] field) for pasid_alloc/free, bind/
> unbind_gpasid, cache_inv. After some offline trying, I think it would
> be good for bind/unbind_gpasid and cache_inv as both of them use the
> iommu uapi definition. While the pasid alloc/free operation doesn't.
> It would be weird to put all of them together. So pasid alloc/free
> may have a separate ioctl. It would look as below. Does this direction
> look good per your opinion?
> 
> ioctl #22: VFIO_IOMMU_PASID_REQUEST
> /**
>   * @pasid: used to return the pasid alloc result when flags == ALLOC_PASID
>   * specify a pasid to be freed when flags == FREE_PASID
>   * @range: specify the allocation range when flags == ALLOC_PASID
>   */
> struct vfio_iommu_pasid_request {
>   __u32   argsz;
> #define VFIO_IOMMU_ALLOC_PASID(1 << 0)
> #define VFIO_IOMMU_FREE_PASID (1 << 1)
>   __u32   flags;
>   __u32   pasid;
>   struct {
>   __u32   min;
>   __u32   max;
>   } range;
> };
> 
> ioctl #23: VFIO_IOMMU_NESTING_OP
> struct vfio_iommu_type1_nesting_op {
>   __u32   argsz;
>   __u32   flags;
>   __u32   op;
>   __u8data[];
> };
> 
> /* Nesting Ops */
> #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL0
> #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL  1
> #define VFIO_IOMMU_NESTING_OP_CACHE_INVLD   2
> 

Then why can't we just put the PASID into the header, since the
majority of nested usage is associated with a pasid?

ioctl #23: VFIO_IOMMU_NESTING_OP
struct vfio_iommu_type1_nesting_op {
__u32   argsz;
__u32   flags;
__u32   op;
__u32   pasid;
__u8data[];
};

In case of SMMUv2 which supports nested w/o PASID, this field can
be ignored for that specific case.

Thanks
Kevin


Re: [PATCH v2 00/33] iommu: Move iommu_group setup to IOMMU core code

2020-04-16 Thread Marek Szyprowski
Hi Joerg,

On 14.04.2020 15:15, Joerg Roedel wrote:
> here is the second version of this patch-set. The first version with
> some more introductory text can be found here:
>
>   https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/
>
> Changes v1->v2:
>
>   * Rebased to v5.7-rc1
>
>   * Re-wrote the arm-smmu changes as suggested by Robin Murphy
>
>   * Re-worked the Exynos patches to hopefully not break the
> driver anymore

Thanks for this rework. This version is much better. Works fine on 
various Exynos-based boards (ARM and ARM64).

Tested-by: Marek Szyprowski 

Acked-by: Marek Szyprowski  (for Exynos and 
core changes)

>   * Fixed a missing mutex_unlock() reported by Marek Szyprowski,
> thanks for that.
>
> There is also a git-branch available with these patches applied:
>
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v2
>
> Please review.
>
> Thanks,
>
>   Joerg
>
> Joerg Roedel (32):
>iommu: Move default domain allocation to separate function
>iommu/amd: Implement iommu_ops->def_domain_type call-back
>iommu/vt-d: Wire up iommu_ops->def_domain_type
>iommu/amd: Remove dma_mask check from check_device()
>iommu/amd: Return -ENODEV in add_device when device is not handled by
>  IOMMU
>iommu: Add probe_device() and remove_device() call-backs
>iommu: Move default domain allocation to iommu_probe_device()
>iommu: Keep a list of allocated groups in __iommu_probe_device()
>iommu: Move new probe_device path to separate function
>iommu: Split off default domain allocation from group assignment
>iommu: Move iommu_group_create_direct_mappings() out of
>  iommu_group_add_device()
>iommu: Export bus_iommu_probe() and make is safe for re-probing
>iommu/amd: Remove dev_data->passthrough
>iommu/amd: Convert to probe/release_device() call-backs
>iommu/vt-d: Convert to probe/release_device() call-backs
>iommu/arm-smmu: Convert to probe/release_device() call-backs
>iommu/pamu: Convert to probe/release_device() call-backs
>iommu/s390: Convert to probe/release_device() call-backs
>iommu/virtio: Convert to probe/release_device() call-backs
>iommu/msm: Convert to probe/release_device() call-backs
>iommu/mediatek: Convert to probe/release_device() call-backs
>iommu/mediatek-v1 Convert to probe/release_device() call-backs
>iommu/qcom: Convert to probe/release_device() call-backs
>iommu/rockchip: Convert to probe/release_device() call-backs
>iommu/tegra: Convert to probe/release_device() call-backs
>iommu/renesas: Convert to probe/release_device() call-backs
>iommu/omap: Remove orphan_dev tracking
>iommu/omap: Convert to probe/release_device() call-backs
>iommu/exynos: Use first SYSMMU in controllers list for IOMMU core
>iommu/exynos: Convert to probe/release_device() call-backs
>iommu: Remove add_device()/remove_device() code-paths
>iommu: Unexport iommu_group_get_for_dev()
>
> Sai Praneeth Prakhya (1):
>iommu: Add def_domain_type() callback in iommu_ops
>
>   drivers/iommu/amd_iommu.c   |  97 
>   drivers/iommu/amd_iommu_types.h |   1 -
>   drivers/iommu/arm-smmu-v3.c |  38 +--
>   drivers/iommu/arm-smmu.c|  39 ++--
>   drivers/iommu/exynos-iommu.c|  24 +-
>   drivers/iommu/fsl_pamu_domain.c |  22 +-
>   drivers/iommu/intel-iommu.c |  68 +-
>   drivers/iommu/iommu.c   | 393 +---
>   drivers/iommu/ipmmu-vmsa.c  |  60 ++---
>   drivers/iommu/msm_iommu.c   |  34 +--
>   drivers/iommu/mtk_iommu.c   |  24 +-
>   drivers/iommu/mtk_iommu_v1.c|  50 ++--
>   drivers/iommu/omap-iommu.c  |  99 ++--
>   drivers/iommu/qcom_iommu.c  |  24 +-
>   drivers/iommu/rockchip-iommu.c  |  26 +--
>   drivers/iommu/s390-iommu.c  |  22 +-
>   drivers/iommu/tegra-gart.c  |  24 +-
>   drivers/iommu/tegra-smmu.c  |  31 +--
>   drivers/iommu/virtio-iommu.c|  41 +---
>   include/linux/iommu.h   |  21 +-
>   20 files changed, 533 insertions(+), 605 deletions(-)
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v11 07/10] iommu/vt-d: Add svm/sva invalidate function

2020-04-16 Thread Auger Eric
Hi Jacob,
On 4/10/20 11:56 PM, Jacob Pan wrote:
> On Thu, 9 Apr 2020 10:50:34 +0200
> Auger Eric  wrote:
> 
>> Hi Jacob,
>>
>> On 4/3/20 8:42 PM, Jacob Pan wrote:
>>> When Shared Virtual Address (SVA) is enabled for a guest OS via
>>> vIOMMU, we need to provide invalidation support at IOMMU API and
>>> driver level. This patch adds Intel VT-d specific function to
>>> implement iommu passdown invalidate API for shared virtual address.
>>>
>>> The use case is for supporting caching structure invalidation
>>> of assigned SVM capable devices. Emulated IOMMU exposes queue
>>> invalidation capability and passes down all descriptors from the
>>> guest to the physical IOMMU.
>>>
>>> The assumption is that guest to host device ID mapping should be
>>> resolved prior to calling IOMMU driver. Based on the device handle,
>>> host IOMMU driver can replace certain fields before submit to the
>>> invalidation queue.
>>>
>>> ---
>>> v11 - Removed 2D map array, use -EINVAL in granularity lookup array.
>>>   Fixed devTLB invalidation granularity mapping. Disregard G=1
>>> case and use address selective invalidation only.
>>>
>>> v7 review fixed in v10
>>> ---
>>>
>>> Signed-off-by: Jacob Pan 
>>> Signed-off-by: Ashok Raj 
>>> Signed-off-by: Liu, Yi L 
>>> ---
>>>  drivers/iommu/intel-iommu.c | 158
>>>   1 file changed, 158 insertions(+)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c
>>> b/drivers/iommu/intel-iommu.c index 94c7993dac6a..045c5c08d71d
>>> 100644 --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -5594,6 +5594,163 @@ static void
>>> intel_iommu_aux_detach_device(struct iommu_domain *domain,
>>> aux_domain_remove_dev(to_dmar_domain(domain), dev); }
>>>  
>>> +/*
>>> + * 2D array for converting and sanitizing IOMMU generic TLB
>>> granularity to
>>> + * VT-d granularity. Invalidation is typically included in the
>>> unmap operation
>>> + * as a result of DMA or VFIO unmap. However, for assigned devices
>>> guest
>>> + * owns the first level page tables. Invalidations of translation
>>> caches in the
>>> + * guest are trapped and passed down to the host.
>>> + *
>>> + * vIOMMU in the guest will only expose first level page tables,
>>> therefore
>>> + * we do not support IOTLB granularity for request without PASID
>>> (second level).
>>> + *
>>> + * For example, to find the VT-d granularity encoding for IOTLB
>>> + * type and page selective granularity within PASID:
>>> + * X: indexed by iommu cache type
>>> + * Y: indexed by enum iommu_inv_granularity
>>> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
>>> + */
>>> +
>>> +const static int
>>> inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] =
>>> {
>>> +   /*
>>> +* PASID based IOTLB invalidation: PASID selective (per
>>> PASID),
>>> +* page selective (address granularity)
>>> +*/
>>> +   {-EINVAL, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
>>> +   /* PASID based dev TLBs */
>>> +   {-EINVAL, -EINVAL, QI_DEV_IOTLB_GRAN_PASID_SEL},
>>> +   /* PASID cache */
>>> +   {-EINVAL, -EINVAL, -EINVAL}
>>> +};
>>> +
>>> +static inline int to_vtd_granularity(int type, int granu)
>>> +{
>>> +   return inv_type_granu_table[type][granu];
>>> +}
>>> +
>>> +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
>>> +{
>>> +   u64 nr_pages = (granu_size * nr_granules) >>
>>> VTD_PAGE_SHIFT; +
>>> +   /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9
>>> for 2MB, etc.
>>> +* IOMMU cache invalidate API passes granu_size in bytes,
>>> and number of
>>> +* granu size in contiguous memory.
>>> +*/
>>> +   return order_base_2(nr_pages);
>>> +}
>>> +
>>> +#ifdef CONFIG_INTEL_IOMMU_SVM
>>> +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
>>> +   struct device *dev, struct
>>> iommu_cache_invalidate_info *inv_info) +{
>>> +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>>> +   struct device_domain_info *info;
>>> +   struct intel_iommu *iommu;
>>> +   unsigned long flags;
>>> +   int cache_type;
>>> +   u8 bus, devfn;
>>> +   u16 did, sid;
>>> +   int ret = 0;
>>> +   u64 size = 0;
>>> +
>>> +   if (!inv_info || !dmar_domain ||
>>> +   inv_info->version !=
>>> IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>>> +   return -EINVAL;
>>> +
>>> +   if (!dev || !dev_is_pci(dev))
>>> +   return -ENODEV;  
>>
>> Check (domain->flags & DOMAIN_FLAG_NESTING_MODE)?
> Good point
> 
>>> +
>>> +   iommu = device_to_iommu(dev, &bus, &devfn);
>>> +   if (!iommu)
>>> +   return -ENODEV;
>>> +
>>> +   spin_lock_irqsave(&device_domain_lock, flags);
>>> +   spin_lock(&iommu->lock);
>>> +   info = iommu_support_dev_iotlb(dmar_domain, iommu, bus,
>>> devfn);
>>> +   if (!info) {
>>> +   ret = -EINVAL;
>>> +   goto out_unlock;
>>> +   }
>>> +   did = dmar_domain->iommu_did[iommu->seq_id];
>>> +   sid = PCI_DEVID(bus, devfn);
>>> +
>>> +   /* Size is only valid in non-PASID selective invalidation */

Re: [PATCH v11 05/10] iommu/vt-d: Add bind guest PASID support

2020-04-16 Thread Auger Eric
Hi Jacob,

On 4/10/20 11:06 PM, Jacob Pan wrote:
> Hi Eric,
> 
> Missed a few things in the last reply.
> 
> On Thu, 9 Apr 2020 09:41:32 +0200
> Auger Eric  wrote:
> 
>>> +   intel_pasid_tear_down_entry(iommu, dev,
>>> svm->pasid);  
>> intel_svm_unbind_mm() calls intel_flush_svm_range_dev(svm, sdev, 0,
>> -1, 0); Don't we need to flush the (DEV-)IOTLBs as well?
> Right, pasid tear down should always include (DEV-)IOTLB flush, I
> initially thought it is taken care of by intel_pasid_tear_down_entry().
> 
>>> +   /* TODO: Drain in flight PRQ for the PASID
>>> since it
>>> +* may get reused soon, we don't want to
>>> +* confuse with its previous life.
>>> +* intel_svm_drain_prq(dev, pasid);
>>> +*/
>>> +   kfree_rcu(sdev, rcu);
>>> +
>>> +   if (list_empty(&svm->devs)) {
>>> +   /*
>>> +* We do not free the IOASID here
>>> in that
>>> +* IOMMU driver did not allocate
>>> it.  
>> s/in/as?
> I meant to say "in that" as "for the reason that"
ok sorry
> 
>>> +* Unlike native SVM, IOASID for
>>> guest use was
>>> +* allocated prior to the bind
>>> call.> + * In any case, if the free
>>> call comes before
>>> +* the unbind, IOMMU driver will
>>> get notified
>>> +* and perform cleanup.
>>> +*/
>>> +   ioasid_set_data(pasid, NULL);
>>> +   kfree(svm);
>>> +   }  
>> nit: you may use intel_svm_free_if_empty()
> True, but I meant to insert ioasid_notifier under the list_empty
> condition in the following ioasid patch.
ok

Thanks

Eric
> 
> 
> Thanks,
> 
> Jacob
> 



Re: [PATCH v11 05/10] iommu/vt-d: Add bind guest PASID support

2020-04-16 Thread Auger Eric
Hi Jacob,

On 4/10/20 9:45 PM, Jacob Pan wrote:
> Hi Eric,
> 
> On Thu, 9 Apr 2020 09:41:32 +0200
> Auger Eric  wrote:
> 
>> Hi Jacob,
>>
>> On 4/3/20 8:42 PM, Jacob Pan wrote:
>>> When supporting guest SVA with emulated IOMMU, the guest PASID
>>> table is shadowed in VMM. Updates to guest vIOMMU PASID table
>>> will result in PASID cache flush which will be passed down to
>>> the host as bind guest PASID calls.
>>>
>>> For the SL page tables, it will be harvested from device's
>>> default domain (request w/o PASID), or aux domain in case of
>>> mediated device.
>>>
>>> .-------------.  .---------------------------.
>>> |   vIOMMU    |  | Guest process CR3, FL only|
>>> |             |  '---------------------------'
>>> .----------------/
>>> | PASID Entry |--- PASID cache flush -
>>> '-------------'   |
>>> |             |   V
>>> |             |  CR3 in GPA
>>> '-------------'
>>> Guest
>>> ------| Shadow |--------------------------|------
>>>       v        v                          v
>>> Host
>>> .-------------.  .----------------------.
>>> |   pIOMMU    |  | Bind FL for GVA-GPA  |
>>> |             |  '----------------------'
>>> .----------------/  |
>>> | PASID Entry |     V (Nested xlate)
>>> '----------------\.------------------------------.
>>> |             |   |SL for GPA-HPA, default domain|
>>> |             |   '------------------------------'
>>> '-------------'
>>> Where:
>>>  - FL = First level/stage one page tables
>>>  - SL = Second level/stage two page tables
>>>
>>> ---
>>> v11: Fixed locking, avoid duplicated paging mode check, added
>>> helper to free svm if device list is empty. Use rate limited error
>>> message since the bind gpasid call comes from user space.
>>> ---
>>>
>>> Signed-off-by: Jacob Pan 
>>> Signed-off-by: Liu, Yi L 
>>> ---
>>>  drivers/iommu/intel-iommu.c |   4 +
>>>  drivers/iommu/intel-svm.c   | 206 
>>>  include/linux/intel-iommu.h |   8 +-
>>>  include/linux/intel-svm.h   |  17 
>>>  4 files changed, 234 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c
>>> b/drivers/iommu/intel-iommu.c index c0dadec5a6b3..94c7993dac6a
>>> 100644 --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -6178,6 +6178,10 @@ const struct iommu_ops intel_iommu_ops = {
>>> .dev_disable_feat   = intel_iommu_dev_disable_feat,
>>> .is_attach_deferred =
>>> intel_iommu_is_attach_deferred, .pgsize_bitmap  =
>>> INTEL_IOMMU_PGSIZES, +#ifdef CONFIG_INTEL_IOMMU_SVM
>>> +   .sva_bind_gpasid= intel_svm_bind_gpasid,
>>> +   .sva_unbind_gpasid  = intel_svm_unbind_gpasid,
>>> +#endif
>>>  };
>>>  
>>>  static void quirk_iommu_igfx(struct pci_dev *dev)
>>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>>> index d7f2a5358900..7cf711318b87 100644
>>> --- a/drivers/iommu/intel-svm.c
>>> +++ b/drivers/iommu/intel-svm.c
>>> @@ -226,6 +226,212 @@ static LIST_HEAD(global_svm_list);
>>> list_for_each_entry((sdev), &(svm)->devs, list) \
>>> if ((d) != (sdev)->dev) {} else
>>>  
>>> +
>>> +static inline void intel_svm_free_if_empty(struct intel_svm *svm,
>>> u64 pasid) +{
>>> +   if (list_empty(&svm->devs)) {
>>> +   ioasid_set_data(pasid, NULL);
>>> +   kfree(svm);
>>> +   }
>>> +}
>>> +
>>> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
>>> +   struct device *dev,
>>> +   struct iommu_gpasid_bind_data *data)
>>> +{
>>> +   struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
>>> +   struct dmar_domain *dmar_domain;
>>> +   struct intel_svm_dev *sdev;
>>> +   struct intel_svm *svm;
>>> +   int ret = 0;
>>> +
>>> +   if (WARN_ON(!iommu) || !data)
>>> +   return -EINVAL;
>>> +
>>> +   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
>>> +   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
>>> +   return -EINVAL;
>>> +
>>> +   if (dev_is_pci(dev)) {
>>> +   /* VT-d supports devices with full 20 bit PASIDs
>>> only */
>>> +   if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
>>> +   return -EINVAL;
>>> +   } else {
>>> +   return -ENOTSUPP;
>>> +   }
>>> +
>>> +   /*
>>> +* We only check host PASID range, we have no knowledge to
>>> check
>>> +* guest PASID range.
>>> +*/
>>> +   if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
>>> +   return -EINVAL;
>>> +
>>> +   dmar_domain = to_dmar_domain(domain);
>>> +
>>> +   mutex_lock(&pasid_mutex);
>>> +   svm = ioasid_find(NULL, data->hpasid, NULL);
>>> +   if (IS_ERR(svm)) {
>>> +   ret = PTR_ERR(svm);
>>> +   goto out;
>>> +   }
>>> +
>>> +   if (svm) {
>>> +   /*
>>> +* If we found svm for the PASID, there must be at
>>> +* least one device bond, otherwise svm should be
>>> 

RE: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE

2020-04-16 Thread Liu, Yi L
Hi Alex,
I still have a direction question for you; better to get your agreement
before heading forward.

> From: Alex Williamson 
> Sent: Friday, April 3, 2020 11:35 PM
[...]
> > > > + *
> > > > + * returns: 0 on success, -errno on failure.
> > > > + */
> > > > +struct vfio_iommu_type1_cache_invalidate {
> > > > +   __u32   argsz;
> > > > +   __u32   flags;
> > > > +   struct  iommu_cache_invalidate_info cache_info;
> > > > +};
> > > > +#define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE,
> VFIO_BASE
> > > + 24)
> > >
> > > The future extension capabilities of this ioctl worry me, I wonder if
> > > we should do another data[] with flag defining that data as CACHE_INFO.
> >
> > Can you elaborate? Does it mean with this way we don't rely on iommu
> > driver to provide version_to_size conversion and instead we just pass
> > data[] to iommu driver for further audit?
> 
> No, my concern is that this ioctl has a single function, strictly tied
> to the iommu uapi.  If we replace cache_info with data[] then we can
> define a flag to specify that data[] is struct
> iommu_cache_invalidate_info, and if we need to, a different flag to
> identify data[] as something else.  For example if we get stuck
> expanding cache_info to meet new demands and develop a new uapi to
> solve that, how would we expand this ioctl to support it rather than
> also create a new ioctl?  There's also a trade-off in making the ioctl
> usage more difficult for the user.  I'd still expect the vfio layer to
> check the flag and interpret data[] as indicated by the flag rather
> than just passing a blob of opaque data to the iommu layer though.
> Thanks,

Based on your comments about defining a single ioctl and a unified
vfio structure (with a @data[] field) for pasid_alloc/free, bind/
unbind_gpasid, and cache_inv: after some offline experimentation, I
think the unified structure works well for bind/unbind_gpasid and
cache_inv, as both use the iommu uapi definitions, while the pasid
alloc/free operation doesn't. It would be awkward to put all of them
together, so pasid alloc/free may get a separate ioctl. It would look
as below. Does this direction look good in your opinion?

ioctl #22: VFIO_IOMMU_PASID_REQUEST
/**
  * @pasid: used to return the pasid alloc result when flags == ALLOC_PASID,
  *         or specify a pasid to be freed when flags == FREE_PASID
  * @range: specify the allocation range when flags == ALLOC_PASID
  */
struct vfio_iommu_pasid_request {
__u32   argsz;
#define VFIO_IOMMU_ALLOC_PASID  (1 << 0)
#define VFIO_IOMMU_FREE_PASID   (1 << 1)
__u32   flags;
__u32   pasid;
struct {
__u32   min;
__u32   max;
} range;
};

ioctl #23: VFIO_IOMMU_NESTING_OP
struct vfio_iommu_type1_nesting_op {
__u32   argsz;
__u32   flags;
__u32   op;
__u8    data[];
};

/* Nesting Ops */
#define VFIO_IOMMU_NESTING_OP_BIND_PGTBL    0
#define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL  1
#define VFIO_IOMMU_NESTING_OP_CACHE_INVLD   2
 
Thanks,
Yi Liu


RE: [PATCH] dt-bindings: iommu: renesas,ipmmu-vmsa: convert to json-schema

2020-04-16 Thread Yoshihiro Shimoda
Hi Geert-san,

> From: Geert Uytterhoeven, Sent: Thursday, April 16, 2020 7:06 PM

> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
> 
> > > > +  renesas,ipmmu-main:
> > > > +$ref: /schemas/types.yaml#/definitions/phandle-array
> > > > +description:
> > > > +  Reference to the main IPMMU instance in two cells. The first 
> > > > cell is
> > > > +  a phandle to the main IPMMU and the second cell is the interrupt 
> > > > bit
> > > > +  number associated with the particular cache IPMMU device. The 
> > > > interrupt
> > > > +  bit number needs to match the main IPMMU IMSSTR register. Only 
> > > > used by
> > > > +  cache IPMMU instances.
> > >
> > > This property is not valid only on R-Car Gen2 and R-Mobile APE6.
> >
> > That sounds good. But, since ipmmu_mm on R-Car Gen3 also is not valid,
> > we cannot use compatible property for detecting it.
> 
> What do you mean by "ipmmu_mm is not valid"?

I just wanted to say that ipmmu_mm node on R-Car Gen3 cannot have
this renesas,ipmmu-main property.

> > However, I realized renesas,ipmmu-main is only used by "cache IPMMU"
> > and "cache IPMMU" doesn't have interrupts property. So,
> > I described the following schema and then it seems success.
> > --
> > if:
> >   properties:
> > interrupts: false
> > then:
> >   required:
> > - renesas,ipmmu-main
> 
> But all other nodes should have interrupts properties, right?

Yes, that's right.

> So they are mutually exclusive:
> 
>   oneOf:
>     - required:
>         - interrupts
>     - required:
>         - renesas,ipmmu-main

Thank you! Now I understood it and such the schema is better than mine.
I'll fix it.

> > Especially, I could find missing renesas,ipmmu-main property on 
> > r8a77980.dtsi
> > like below :)  So, I'll make and submit a fixing r8a77980.dtsi patch later 
> > (maybe tomorrow).
> 
> Good! ;-)

:)

Best regards,
Yoshihiro Shimoda

> Gr{oetje,eeting}s,
> 
> Geert
> 


Re: [PATCH] iommu/qcom:fix local_base status check

2020-04-16 Thread Tang Bin


On 2020/4/16 18:05, Robin Murphy wrote:

On 2020-04-02 7:33 am, Tang Bin wrote:

Release resources when exiting on error.

Signed-off-by: Tang Bin 
---
  drivers/iommu/qcom_iommu.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 4328da0b0..c08aa9651 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -813,8 +813,11 @@ static int qcom_iommu_device_probe(struct 
platform_device *pdev)

  qcom_iommu->dev = dev;
    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-    if (res)
+    if (res) {
  qcom_iommu->local_base = devm_ioremap_resource(dev, res);
+    if (IS_ERR(qcom_iommu->local_base))
+    return PTR_ERR(qcom_iommu->local_base);
+    }


...or just use devm_platform_ioremap_resource() to make the whole 
thing simpler.
Yes, I was going to simplify the code here, but the status check is still
required.


So I'm waiting.

Thanks,

Tang Bin




Re: [PATCH] dt-bindings: iommu: renesas,ipmmu-vmsa: convert to json-schema

2020-04-16 Thread Geert Uytterhoeven
Hi Shimoda-san,

On Thu, Apr 16, 2020 at 11:56 AM Yoshihiro Shimoda
 wrote:
> > From: Geert Uytterhoeven, Sent: Wednesday, April 15, 2020 11:21 PM
> > On Tue, Apr 14, 2020 at 2:26 AM Yoshihiro Shimoda
> >  wrote:
> > > Convert Renesas VMSA-Compatible IOMMU bindings documentation
> > > to json-schema.
> > >
> > > Signed-off-by: Yoshihiro Shimoda 
> >
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml

> > > +  renesas,ipmmu-main:
> > > +$ref: /schemas/types.yaml#/definitions/phandle-array
> > > +description:
> > > +  Reference to the main IPMMU instance in two cells. The first cell 
> > > is
> > > +  a phandle to the main IPMMU and the second cell is the interrupt 
> > > bit
> > > +  number associated with the particular cache IPMMU device. The 
> > > interrupt
> > > +  bit number needs to match the main IPMMU IMSSTR register. Only 
> > > used by
> > > +  cache IPMMU instances.
> >
> > This property is not valid only on R-Car Gen2 and R-Mobile APE6.
>
> That sounds good. But, since ipmmu_mm on R-Car Gen3 also is not valid,
> we cannot use compatible property for detecting it.

What do you mean by "ipmmu_mm is not valid"?

> However, I realized renesas,ipmmu-main is only used by "cache IPMMU"
> and "cache IPMMU" doesn't have interrupts property. So,
> I described the following schema and then it seems success.
> --
> if:
>   properties:
> interrupts: false
> then:
>   required:
> - renesas,ipmmu-main

But all other nodes should have interrupts properties, right?
So they are mutually exclusive:

  oneOf:
    - required:
        - interrupts
    - required:
        - renesas,ipmmu-main

> Especially, I could find missing renesas,ipmmu-main property on r8a77980.dtsi
> like below :)  So, I'll make and submit a fixing r8a77980.dtsi patch later 
> (maybe tomorrow).

Good! ;-)

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH] iommu/qcom:fix local_base status check

2020-04-16 Thread Robin Murphy

On 2020-04-02 7:33 am, Tang Bin wrote:

Release resources when exiting on error.

Signed-off-by: Tang Bin 
---
  drivers/iommu/qcom_iommu.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 4328da0b0..c08aa9651 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -813,8 +813,11 @@ static int qcom_iommu_device_probe(struct platform_device 
*pdev)
qcom_iommu->dev = dev;
  
  	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);

-   if (res)
+   if (res) {
qcom_iommu->local_base = devm_ioremap_resource(dev, res);
+   if (IS_ERR(qcom_iommu->local_base))
+   return PTR_ERR(qcom_iommu->local_base);
+   }


...or just use devm_platform_ioremap_resource() to make the whole thing 
simpler.


Robin.

  
  	qcom_iommu->iface_clk = devm_clk_get(dev, "iface");

if (IS_ERR(qcom_iommu->iface_clk)) {




RE: [PATCH] dt-bindings: iommu: renesas,ipmmu-vmsa: convert to json-schema

2020-04-16 Thread Yoshihiro Shimoda
Hi Geert-san,

Thank you for your review!

> From: Geert Uytterhoeven, Sent: Wednesday, April 15, 2020 11:21 PM
> 
> Hi Shimoda-san,
> 
> On Tue, Apr 14, 2020 at 2:26 AM Yoshihiro Shimoda
>  wrote:
> > Convert Renesas VMSA-Compatible IOMMU bindings documentation
> > to json-schema.
> >
> > Signed-off-by: Yoshihiro Shimoda 
> 
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
> > @@ -0,0 +1,90 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/iommu/renesas,ipmmu-vmsa.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Renesas VMSA-Compatible IOMMU
> > +
> > +maintainers:
> > +  - Yoshihiro Shimoda 
> > +
> > +description:
> > +  The IPMMU is an IOMMU implementation compatible with the ARM VMSA page 
> > tables.
> > +  It provides address translation for bus masters outside of the CPU, each
> > +  connected to the IPMMU through a port called micro-TLB.
> > +
> > +properties:
> > +  compatible:
> > +oneOf:
> > +  - items:
> > +  - enum:
> > +  - renesas,ipmmu-r8a7743  # RZ/G1M
> > +  - renesas,ipmmu-r8a7744  # RZ/G1N
> > +  - renesas,ipmmu-r8a7745  # RZ/G1E
> > +  - renesas,ipmmu-r8a7790  # R-Car H2
> > +  - renesas,ipmmu-r8a7791  # R-Car M2-W
> > +  - renesas,ipmmu-r8a7793  # R-Car M2-N
> > +  - renesas,ipmmu-r8a7794  # R-Car E2
> > +  - renesas,ipmmu-r8a7795  # R-Car H3
> > +  - const: renesas,ipmmu-vmsa  # R-Car Gen2 or RZ/G1
> > +  - items:
> > +  - enum:
> > +  - renesas,ipmmu-r8a73a4  # R-Mobile APE6
> 
> I believe the R-Mobile APE6 IPMMU is similar to the R-Car Gen2 IPMMU,
> and thus belongs in the section above instead.

I think so.
Since the original document doesn't mention R-Mobile APE6,
I wrote it here. But I also think the R-Mobile APE6 IPMMU is similar to
the R-Car Gen2 IPMMU, so I'll move renesas,ipmmu-r8a73a4
to the above section and fix the comment on renesas,ipmmu-vmsa.

> > +  - renesas,ipmmu-r8a774a1 # RZ/G2M
> > +  - renesas,ipmmu-r8a774b1 # RZ/G2N
> > +  - renesas,ipmmu-r8a774c0 # RZ/G2E
> > +  - renesas,ipmmu-r8a7796  # R-Car M3-W
> > +  - renesas,ipmmu-r8a77965 # R-Car M3-N
> > +  - renesas,ipmmu-r8a77970 # R-Car V3M
> > +  - renesas,ipmmu-r8a77980 # R-Car V3H
> > +  - renesas,ipmmu-r8a77990 # R-Car E3
> > +  - renesas,ipmmu-r8a77995 # R-Car D3
> > +
> > +  reg:
> > +maxItems: 1
> > +
> > +  interrupts:
> > +minItems: 1
> > +maxItems: 2
> > +description:
> > +  Specifiers for the MMU fault interrupts. For instances that support
> > +  secure mode two interrupts must be specified, for non-secure and 
> > secure
> > +  mode, in that order. For instances that don't support secure mode a
> > +  single interrupt must be specified. Not required for cache IPMMUs.
> 
> items:
>   - description: 
>   - description: 

I got it. I'll add items.

> > +
> > +  '#iommu-cells':
> > +const: 1
> > +
> > +  power-domains:
> > +maxItems: 1
> > +
> > +  renesas,ipmmu-main:
> > +$ref: /schemas/types.yaml#/definitions/phandle-array
> > +description:
> > +  Reference to the main IPMMU instance in two cells. The first cell is
> > +  a phandle to the main IPMMU and the second cell is the interrupt bit
> > +  number associated with the particular cache IPMMU device. The 
> > interrupt
> > +  bit number needs to match the main IPMMU IMSSTR register. Only used 
> > by
> > +  cache IPMMU instances.
> 
> This property is not valid only on R-Car Gen2 and R-Mobile APE6.

That sounds good. But, since ipmmu_mm on R-Car Gen3 also is not valid,
we cannot use compatible property for detecting it.

However, I realized renesas,ipmmu-main is only used by "cache IPMMU"
and "cache IPMMU" doesn't have interrupts property. So,
I described the following schema and then it seems success.
--
if:
  properties:
interrupts: false
then:
  required:
- renesas,ipmmu-main
--

Especially, I could find missing renesas,ipmmu-main property on r8a77980.dtsi
like below :)  So, I'll make and submit a fixing r8a77980.dtsi patch later 
(maybe tomorrow).

  CHECK   arch/arm64/boot/dts/renesas/r8a77980-condor.dt.yaml
/opt/home/shimoda/workdir/json-schema/arch/arm64/boot/dts/renesas/r8a77980-condor.dt.yaml:
 mmu@e7b0: 'renesas,ipmmu-main' is a required property
/opt/home/shimoda/workdir/json-schema/arch/arm64/boot/dts/renesas/r8a77980-condor.dt.yaml:
 mmu@e796: 'renesas,ipmmu-main' is a required property

Best regards,
Yoshihiro Shimoda

> (untested)
> 
> oneOf:
>   - properties:
>       compatible:
>         contains:
>           const: renesas,ipmmu-vmsa
>   - properties:
>       renesas,ipmmu-main:
>         $ref: /schemas/types.yaml#/definitions/phandle-array
>         descr

[PATCH v2] of_device: removed #include that caused a recursion in included headers

2020-04-16 Thread Hadar Gat
of_platform.h and of_device.h included each other.
Removed the unneeded #include of of_platform.h from of_device.h
and added an include of of_platform.h to the files that need it.

Signed-off-by: Hadar Gat 
---
v2: add an include of of_platform.h in more files (reported by other build configurations)

 arch/sparc/mm/io-unit.c   | 1 +
 arch/sparc/mm/iommu.c | 1 +
 drivers/base/platform.c   | 1 +
 drivers/bus/imx-weim.c| 1 +
 drivers/bus/vexpress-config.c | 1 +
 drivers/clk/mediatek/clk-mt7622-aud.c | 1 +
 drivers/dma/at_hdmac.c| 1 +
 drivers/dma/stm32-dmamux.c| 1 +
 drivers/dma/ti/dma-crossbar.c | 1 +
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 1 +
 drivers/gpu/drm/msm/hdmi/hdmi.c   | 1 +
 drivers/gpu/drm/msm/msm_drv.c | 1 +
 drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c   | 1 +
 drivers/gpu/drm/sun4i/sun4i_tcon.c| 1 +
 drivers/iio/adc/stm32-adc-core.c  | 1 +
 drivers/iio/adc/stm32-dfsdm-adc.c | 1 +
 drivers/iio/adc/stm32-dfsdm-core.c| 1 +
 drivers/iommu/tegra-smmu.c| 1 +
 drivers/memory/atmel-ebi.c| 1 +
 drivers/mfd/palmas.c  | 1 +
 drivers/mfd/ssbi.c| 1 +
 drivers/mtd/nand/raw/omap2.c  | 1 +
 drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 1 +
 drivers/net/ethernet/ti/cpsw.c| 1 +
 drivers/phy/tegra/xusb.c  | 1 +
 drivers/pinctrl/freescale/pinctrl-imx1-core.c | 1 +
 drivers/pinctrl/nomadik/pinctrl-nomadik.c | 1 +
 drivers/soc/samsung/exynos-pmu.c  | 1 +
 drivers/soc/sunxi/sunxi_sram.c| 1 +
 include/linux/of_device.h | 2 --
 lib/genalloc.c| 1 +
 31 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/mm/io-unit.c b/arch/sparc/mm/io-unit.c
index 289276b..5638399 100644
--- a/arch/sparc/mm/io-unit.c
+++ b/arch/sparc/mm/io-unit.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
diff --git a/arch/sparc/mm/iommu.c b/arch/sparc/mm/iommu.c
index b00dde1..9cbb2e7 100644
--- a/arch/sparc/mm/iommu.c
+++ b/arch/sparc/mm/iommu.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 520..f549274b 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/bus/imx-weim.c b/drivers/bus/imx-weim.c
index 28bb65a..8c786da 100644
--- a/drivers/bus/imx-weim.c
+++ b/drivers/bus/imx-weim.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/bus/vexpress-config.c b/drivers/bus/vexpress-config.c
index ff70575..12b8b0b 100644
--- a/drivers/bus/vexpress-config.c
+++ b/drivers/bus/vexpress-config.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 
diff --git a/drivers/clk/mediatek/clk-mt7622-aud.c 
b/drivers/clk/mediatek/clk-mt7622-aud.c
index 2bd4295..8cbb68f 100644
--- a/drivers/clk/mediatek/clk-mt7622-aud.c
+++ b/drivers/clk/mediatek/clk-mt7622-aud.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "clk-mtk.h"
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 73a2078..388f8e10 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "at_hdmac_regs.h"
diff --git a/drivers/dma/stm32-dmamux.c b/drivers/dma/stm32-dmamux.c
index 12f7637..b704896 100644
--- a/drivers/dma/stm32-dmamux.c
+++ b/drivers/dma/stm32-dmamux.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/dma/ti/dma-crossbar.c b/drivers/dma/ti/dma-crossbar.c
index 4ba8fa5..2c0fd44 100644
--- a/drivers/dma/ti/dma-crossbar.c
+++ b/drivers/dma/ti/dma-crossbar.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define TI_XBAR_DRA7   0
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index c4e71ab..f523254 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "a6xx_gpu.h"
diff --git a/drivers/gpu/drm/msm/hdmi/hdmi.c b/drivers/gpu/drm/msm/hdmi/hdmi.c
index 737453b..5034d40 100644
--- a/drivers/gpu/drm/msm/hdmi/hdmi.c
+++ b/drivers/gpu/drm/msm/hdmi/hdmi.c
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include "hd

Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces

2020-04-16 Thread Jean-Philippe Brucker
On Thu, Apr 16, 2020 at 12:28:52AM -0700, Christoph Hellwig wrote:
> > +   rcu_read_lock();
> > +   hlist_for_each_entry_rcu(bond, &io_mm->devices, mm_node)
> > +   io_mm->ops->invalidate(bond->sva.dev, io_mm->pasid, io_mm->ctx,
> > +  start, end - start);
> > +   rcu_read_unlock();
> > +}
> 
> What is the reason that the devices don't register their own notifiers?
> This kinds of multiplexing is always rather messy, and you do it for
> all the methods.

This sends TLB and ATC invalidations through the IOMMU; it doesn't go
through device drivers.

Thanks,
Jean


Re: [PATCH v2 6/7] iommu/vt-d: Add page request draining support

2020-04-16 Thread Lu Baolu

Hi Kevin,

On 2020/4/15 19:10, Tian, Kevin wrote:

> the completion of the above sequence ensures that previously queued
> page group responses are sent out and received by the endpoint,
> and conversely that all in-flight page requests from the endpoint
> are queued in the iommu page request queue. Then comes a problem -
> you didn't wait for the completion of those newly-queued requests
> and their responses.


I thought about this again.

We do page request draining after the PASID table entry gets torn down
and the devTLB gets invalidated. At this point, if any new page request
for this PASID comes in, the IOMMU will generate an unrecoverable fault
and respond to the device with IR (Invalid Request). The IOMMU won't put
this page request into the queue. [VT-d spec 7.4.1]

Hence, we don't need to worry about the newly-queued requests here.

Best regards,
baolu


Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2020-04-16 Thread Auger Eric
Hi Zhangfei,

On 4/16/20 6:25 AM, Zhangfei Gao wrote:
> 
> 
On 2020/4/14 11:05 PM, Eric Auger wrote:
>> This version fixes an issue observed by Shameer on an SMMU 3.2,
>> when moving from dual stage config to stage 1 only config.
>> The two high 64-bit words of the STE now get reset. Otherwise, leaving
>> the S2TTB set may cause a C_BAD_STE error.
>>
>> This series can be found at:
>> https://github.com/eauger/linux/tree/v5.6-2stage-v11_10.1
>> (including the VFIO part)
>> The companion QEMU series can still be found at:
>> https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6
>>
>> Users have expressed interest in that work and tested v9/v10:
>> - https://patchwork.kernel.org/cover/11039995/#23012381
>> - https://patchwork.kernel.org/cover/11039995/#23197235
>>
>> Background:
>>
>> This series brings the IOMMU part of HW nested paging support
>> in the SMMUv3. The VFIO part is submitted separately.
>>
>> The IOMMU API is extended to support 2 new API functionalities:
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>>
>> Then those capabilities gets implemented in the SMMUv3 driver.
>>
>> The virtualizer passes information through the VFIO user API,
>> which cascades it to the iommu subsystem. This allows the guest
>> to own stage 1 tables and context descriptors (so-called PASID
>> table) while the host owns stage 2 tables and main configuration
>> structures (STE).
>>
>>
> 
> Thanks Eric
> 
> Tested v11 on a Hisilicon kunpeng920 board via the hardware zip accelerator.
> 1. no-sva works, where the guest app directly uses physical addresses via ioctl.
Thank you for the testing. Glad it works for you.
> 2. vSVA still does not work, same as v10.
Yes, that's normal; this series is not meant to support vSVA at this stage.

I intend to add the missing pieces during the next weeks.

Thanks

Eric
> 3. The v10 issue reported by Shameer has been solved: first start qemu
> with iommu=smmuv3, then start qemu without iommu=smmuv3.
> 4. no-sva also works without iommu=smmuv3.
> 
> Test details in https://docs.qq.com/doc/DRU5oR1NtUERseFNL
> 
> Thanks
> 


Re: [PATCH v3 1/3] iommu/vt-d: Allow 32bit devices to use DMA domain

2020-04-16 Thread Lu Baolu

Hi Christoph,

On 2020/4/16 15:01, Christoph Hellwig wrote:

> On Thu, Apr 16, 2020 at 02:23:52PM +0800, Lu Baolu wrote:
>> Currently, if a 32bit device initially uses an identity domain,
>> the Intel IOMMU driver will convert it forcibly to a DMA one if its
>> address capability is not enough for the whole system memory.
>> The motivation was to overcome the overhead caused by possible
>> bounce buffering.
>>
>> Unfortunately, this improvement has led to many problems. For
>> example, some 32bit devices are required to use an identity
>> domain; forcing them to use a DMA domain will cause the device
>> not to work anymore. On the other hand, the VMD sub-devices
>> share a domain, but each sub-device might have a different address
>> capability. Forcing a VMD sub-device to use a DMA domain blindly
>> will impact the operation of other sub-devices without any
>> notification. Furthermore, PCI aliased devices (a PCI bridge
>> and all devices beneath it, VMD devices, and various devices
>> quirked with pci_add_dma_alias()) must use the same domain.
>> Forcing one device to switch to a DMA domain during runtime
>> will cause in-flight DMAs for other devices to abort or target
>> other memory, which might cause undefined system behavior.
>
> This commit log doesn't actually explain what you are changing, and
> as far as I can tell it just removes the code to change the domain
> at run time, which seems to not actually match the subject or


This removes the domain switching in iommu_need_mapping(). The other
place where the private domain is used is intel_iommu_add_device();
Joerg's patch set has removed that. So with the domain switching in
iommu_need_mapping() removed, the private domain helpers can be
removed now. Otherwise, the compiler will complain that some functions
are defined but not used.


> description.  I'd need to look at the final code, but it seems like
> this will still cause bounce buffering instead of using dynamic
> mapping, which still seems like an awful idea.


Yes. If the user chooses to use the identity domain by default through
the kernel command line, the identity domain will be applied to all
devices. For those devices with limited addressing capability, bounce
buffering will be used when they try to access memory beyond their
address capability. This won't cause any kernel regression as far as I
can see.

Switching domains at runtime with drivers loaded will cause real
problems, as I said in the commit message. That's the reason why I am
proposing to remove it. If we want to keep it, we have to make sure
that switching the domain for one device does not impact other devices
which share the same domain with it. Furthermore, it's better to
implement it in the generic layer to keep device driver behavior
consistent across all architectures.



> Also, from a purely stylistic perspective, a lot of the lines seem
> very short and do not use up the whole 73 characters allowed.



Yes. I will try to use up the allowed characters.

Best regards,
baolu


Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces

2020-04-16 Thread Christoph Hellwig
> + rcu_read_lock();
> + hlist_for_each_entry_rcu(bond, &io_mm->devices, mm_node)
> + io_mm->ops->invalidate(bond->sva.dev, io_mm->pasid, io_mm->ctx,
> +start, end - start);
> + rcu_read_unlock();
> +}

What is the reason that the devices don't register their own notifiers?
This kind of multiplexing is always rather messy, and you do it for
all the methods.


Re: [PATCH v3 1/3] iommu/vt-d: Allow 32bit devices to use DMA domain

2020-04-16 Thread Christoph Hellwig
On Thu, Apr 16, 2020 at 02:23:52PM +0800, Lu Baolu wrote:
> Currently, if a 32bit device initially uses an identity domain,
> Intel IOMMU driver will convert it forcibly to a DMA one if its
> address capability is not enough for the whole system memory.
> The motivation was to overcome the overhead caused by possible
> bounce buffering.
> 
> Unfortunately, this improvement has led to many problems. For
> example, some 32bit devices are required to use an identity
> domain, forcing them to use DMA domain will cause the device
> not to work anymore. On the other hand, the VMD sub-devices
> share a domain but each sub-device might have different address
> capability. Forcing a VMD sub-device to use DMA domain blindly
> will impact the operation of other sub-devices without any
> notification. Furthermore, PCI aliased devices (PCI bridge
> and all devices beneath it, VMD devices and various devices
> quirked with pci_add_dma_alias()) must use the same domain.
> Forcing one device to switch to DMA domain during runtime
> will cause in-flight DMAs for other devices to abort or target
> other memory, which might cause undefined system behavior.

This commit log doesn't actually explain what you are changing, and
as far as I can tell it just removes the code to change the domain
at run time, which seems to not actually match the subject or
description.  I'd need to look at the final code, but it seems like
this will still cause bounce buffering instead of using dynamic
mapping, which still seems like an awful idea.

Also, from a purely stylistic perspective, a lot of the lines seem
very short and do not use up the whole 73 characters allowed.