RE: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-13 Thread Tian, Kevin
> From: Christian König
> Sent: Friday, September 7, 2018 4:56 PM
> 
> 5. It would be nice to have to allocate multiple PASIDs for the same
> process address space.
>          E.g. some teams at AMD want to use a separate GPU address space
> for their userspace client library. I'm still trying to avoid that, but
> it is perfectly possible that we are going to need that.
>          Additional to that it is sometimes quite useful for debugging
> to isolate where exactly an incorrect access (segfault) is coming from.
> 
> Let me know if there are some problems with that, especially I want to
> know if there is pushback on #5 so that I can forward that :)
> 

We have similar requirement, except that it is "multiple PASIDs for
same process" instead of "for same process address space".

Intel VT-d goes to a 'true' system-wide PASID allocation policy, 
cross both host processes and guest processes. As Jacob explains,
there will be a virtual cmd register on virtual vtd, through which
guest IOMMU driver requests to get system-wide PASIDs allocated
by host IOMMU driver.

with that design, Qemu represents all guest processes in host
side, thus will get "multiple PASIDs allocated for same process".
However instead of binding all PASIDs to same host address space
of  Qemu, each of PASID entry points to guest address space if 
used by guest process.

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

RE: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-13 Thread Tian, Kevin
> From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> Sent: Saturday, September 8, 2018 5:25 AM
> > > iommu-sva expects everywhere that the device has an iommu_domain,
> > > it's the first thing we check on entry. Bypassing all of this would
> > > call idr_alloc() directly, and wouldn't have any code in common
> > > with the current iommu-sva. So it seems like you need a layer on
> > > top of iommu-sva calling idr_alloc() when an IOMMU isn't present,
> > > but I don't think it should be in drivers/iommu/
> >
> > In this case I question if the PASID handling should be under
> > drivers/iommu at all.
> >
> > See I can have a mix of VM context which are bound to processes (some
> > few) and VM contexts which are standalone and doesn't care for a
> > process address space. But for each VM context I need a distinct
> > PASID for the hardware to work.

I'm confused about VM context vs. process. Is VM referring to Virtual
Machine or something else? If yes, I don't understand the binding part
- what VM context is bound to (host?) process?

> >
> > I can live if we say if IOMMU is completely disabled we use a simple
> > ida to allocate them, but when IOMMU is enabled I certainly need a
> > way to reserve a PASID without an associated process.
> >
> VT-d would also have such requirement. There is a virtual command
> register for allocate and free PASID for VM use. When that PASID
> allocation request gets propagated to the host IOMMU driver, we need to
> allocate PASID w/o mm.
> 

VT-d is a bit different. In host side, PASID allocation always happens in
Qemu's context, so those PASIDs are recorded with Qemu process, 
though the entries may point to guest page tables instead of host mm
of Qemu.

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-12 Thread Christian König

Am 12.09.2018 um 14:40 schrieb Jean-Philippe Brucker:

On 08/09/2018 08:29, Christian König wrote:

Yes, exactly. I just need a PASID which is never used by the OS for a
process and we can easily give that back when the last FD reference is
closed.

Alright, iommu-sva can get its PASID from this external allocator as
well, as long as it has an interface similar to idr. Where would it go,
drivers/base/, mm/, kernel/...?


Good question, my initial instinct was to put it under drivers/pci.

But AFAIKS now you are supporting SVA implementations which are not 
based on PCI.


So drivers/base sounds like a good place to me.




The process dies, iommu-sva is notified and calls the mm_exit()
function passed by the device driver to iommu_sva_device_init(). In
mm_exit() the device driver needs to clear any reference to the
PASID in hardware and in its own structures. When the device driver
returns from mm_exit(), it effectively tells the core that it has
finished using the PASID, and iommu-sva can reuse the PASID for
another process. mm_exit() is allowed to block, so the device
driver has time to clean up and flush the queues.

If the device driver finishes using the PASID before the process
exits, it just calls unbind().

Exactly that's what Michal Hocko is probably going to not like at all.

Can we have a different approach where each driver is informed by the
mm_exit(), but needs to explicitly call unbind() before a PASID is
reused?

It's awful from the IOMMU driver perspective. In addition to "enabled"
and "disabled" PASID states, you add "disabled but DMA still running
normally". Between that new state and "disabled", the IOMMU will be
flooded by translation faults (non-recoverable ones), which it needs to
ignore instead of reporting to the kernel. Not all IOMMUs can deal with
this in hardware (SMMU and VT-d can quiesce translation faults
per-PASID, but I don't think AMD IOMMU can.) Some drivers will have to
filter fault events themselves, depending on the PASID state.


Puh, yeah that is probably true.

Ok let us skip that for a moment, we just need to invest more work in 
killing DMA operations quickly when the process address space is teared 
down.



During that teardown transition it would be ideal if that PASID only
points to a dummy root page directory with only invalid entries.


I guess this can be vendor specific, In VT-d I plan to mark PASID
entry not present and disable fault reporting while draining remaining
activities.

Sounds good to me.

Point is at least in the case where the process was killed by the OOM
killer we should not block in mm_exit().

Instead operations issued by the process to a device driver which uses
SVA needs to be terminated as soon as possible to make sure that the OOM
killer can advance.

I don't see how we're preventing the OOM killer from advancing, so I'm
looking for a stronger argument that justifies adding this complexity to
IOMMU drivers. Time limit of the release MMU notifier, locking
requirement, a concrete example where things break, even a comment
somewhere in mm/ would do...

In my tests I can't manage to disturb the OOM killer, but I could be
missing some special case. Even if the mm_exit() callback
(unrealistically) sleeps for 60 seconds,


Well you are *COMPLETELY* under estimating this. A compute task with a 
huge wave launch can take multiple minutes to tear down.


That's why I'm so concerned about that, but to be honest I think that 
just the hardware needs to become better and we need to be able to block 
dead tasks from spawning threads again.


Regards,
Christian.


  the OOM killer isn't affected
because oom_reap_task_mm() wipes the victim's address space in another
thread, either before or while the release notifier is running.

Thanks,
Jean


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-12 Thread Jean-Philippe Brucker
On 08/09/2018 08:29, Christian König wrote:
> Yes, exactly. I just need a PASID which is never used by the OS for a 
> process and we can easily give that back when the last FD reference is 
> closed.

Alright, iommu-sva can get its PASID from this external allocator as
well, as long as it has an interface similar to idr. Where would it go,
drivers/base/, mm/, kernel/...?

 The process dies, iommu-sva is notified and calls the mm_exit()
 function passed by the device driver to iommu_sva_device_init(). In
 mm_exit() the device driver needs to clear any reference to the
 PASID in hardware and in its own structures. When the device driver
 returns from mm_exit(), it effectively tells the core that it has
 finished using the PASID, and iommu-sva can reuse the PASID for
 another process. mm_exit() is allowed to block, so the device
 driver has time to clean up and flush the queues.

 If the device driver finishes using the PASID before the process
 exits, it just calls unbind().
>>> Exactly that's what Michal Hocko is probably going to not like at all.
>>>
>>> Can we have a different approach where each driver is informed by the
>>> mm_exit(), but needs to explicitly call unbind() before a PASID is
>>> reused?

It's awful from the IOMMU driver perspective. In addition to "enabled"
and "disabled" PASID states, you add "disabled but DMA still running
normally". Between that new state and "disabled", the IOMMU will be
flooded by translation faults (non-recoverable ones), which it needs to
ignore instead of reporting to the kernel. Not all IOMMUs can deal with
this in hardware (SMMU and VT-d can quiesce translation faults
per-PASID, but I don't think AMD IOMMU can.) Some drivers will have to
filter fault events themselves, depending on the PASID state.

>>> During that teardown transition it would be ideal if that PASID only
>>> points to a dummy root page directory with only invalid entries.
>>>
>> I guess this can be vendor specific, In VT-d I plan to mark PASID
>> entry not present and disable fault reporting while draining remaining
>> activities.
>
> Sounds good to me.
> 
> Point is at least in the case where the process was killed by the OOM 
> killer we should not block in mm_exit().
>
> Instead operations issued by the process to a device driver which uses 
> SVA needs to be terminated as soon as possible to make sure that the OOM 
> killer can advance.

I don't see how we're preventing the OOM killer from advancing, so I'm
looking for a stronger argument that justifies adding this complexity to
IOMMU drivers. Time limit of the release MMU notifier, locking
requirement, a concrete example where things break, even a comment
somewhere in mm/ would do...

In my tests I can't manage to disturb the OOM killer, but I could be
missing some special case. Even if the mm_exit() callback
(unrealistically) sleeps for 60 seconds, the OOM killer isn't affected
because oom_reap_task_mm() wipes the victim's address space in another
thread, either before or while the release notifier is running.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-08 Thread Christian König

Am 07.09.2018 um 23:25 schrieb Jacob Pan:

On Fri, 7 Sep 2018 20:02:54 +0200
Christian König  wrote:

[SNIP]

iommu-sva expects everywhere that the device has an iommu_domain,
it's the first thing we check on entry. Bypassing all of this would
call idr_alloc() directly, and wouldn't have any code in common
with the current iommu-sva. So it seems like you need a layer on
top of iommu-sva calling idr_alloc() when an IOMMU isn't present,
but I don't think it should be in drivers/iommu/

In this case I question if the PASID handling should be under
drivers/iommu at all.

See I can have a mix of VM context which are bound to processes (some
few) and VM contexts which are standalone and doesn't care for a
process address space. But for each VM context I need a distinct
PASID for the hardware to work.

I can live if we say if IOMMU is completely disabled we use a simple
ida to allocate them, but when IOMMU is enabled I certainly need a
way to reserve a PASID without an associated process.


VT-d would also have such requirement. There is a virtual command
register for allocate and free PASID for VM use. When that PASID
allocation request gets propagated to the host IOMMU driver, we need to
allocate PASID w/o mm.

If the PASID allocation is done via VFIO, can we have FD to track PASID
life cycle instead of mm_exit()? i.e. all FDs get closed before
mm_exit, I assume?


Yes, exactly. I just need a PASID which is never used by the OS for a 
process and we can easily give that back when the last FD reference is 
closed.



3. Even after destruction of a process address space we need some
grace period before a PASID is reused because it can be that the
specific PASID is still in some hardware queues etc...
           At bare minimum all device drivers using process binding
need to explicitly note to the core when they are done with a
PASID.

Right, much of the horribleness in iommu-sva deals with this:

The process dies, iommu-sva is notified and calls the mm_exit()
function passed by the device driver to iommu_sva_device_init(). In
mm_exit() the device driver needs to clear any reference to the
PASID in hardware and in its own structures. When the device driver
returns from mm_exit(), it effectively tells the core that it has
finished using the PASID, and iommu-sva can reuse the PASID for
another process. mm_exit() is allowed to block, so the device
driver has time to clean up and flush the queues.

If the device driver finishes using the PASID before the process
exits, it just calls unbind().

Exactly that's what Michal Hocko is probably going to not like at all.

Can we have a different approach where each driver is informed by the
mm_exit(), but needs to explicitly call unbind() before a PASID is
reused?

During that teardown transition it would be ideal if that PASID only
points to a dummy root page directory with only invalid entries.


I guess this can be vendor specific, In VT-d I plan to mark PASID
entry not present and disable fault reporting while draining remaining
activities.


Sounds good to me.

Point is at least in the case where the process was killed by the OOM 
killer we should not block in mm_exit().


Instead operations issued by the process to a device driver which uses 
SVA needs to be terminated as soon as possible to make sure that the OOM 
killer can advance.


Thanks,
Christian.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-07 Thread Jacob Pan
On Fri, 7 Sep 2018 20:02:54 +0200
Christian König  wrote:

> Am 07.09.2018 um 17:45 schrieb Jean-Philippe Brucker:
> > On 07/09/2018 09:55, Christian König wrote:  
> >> I will take this as an opportunity to summarize some of the
> >> requirements we have for PASID management from the amdgpu driver
> >> point of view:  
> > That's incredibly useful, thanks :)
> >  
> >> 1. We need to be able to allocate PASID between 1 and some
> >> maximum. Zero is reserved as far as I know, but we don't necessary
> >> need a minimum.  
>  [...]  
> >> 2. We need to be able to allocate PASIDs without a process address
> >> space backing it. E.g. our hardware uses PASIDs even without
> >> Shared Virtual Addressing enabled to distinct clients from each
> >> other. Would be a pity if we need to still have a separate PASID
> >> handling because the system wide is only available when IOMMU is
> >> turned on.  
>  [...]  
> 
> I agree on that.
> 
> > iommu-sva expects everywhere that the device has an iommu_domain,
> > it's the first thing we check on entry. Bypassing all of this would
> > call idr_alloc() directly, and wouldn't have any code in common
> > with the current iommu-sva. So it seems like you need a layer on
> > top of iommu-sva calling idr_alloc() when an IOMMU isn't present,
> > but I don't think it should be in drivers/iommu/  
> 
> In this case I question if the PASID handling should be under 
> drivers/iommu at all.
> 
> See I can have a mix of VM context which are bound to processes (some 
> few) and VM contexts which are standalone and doesn't care for a
> process address space. But for each VM context I need a distinct
> PASID for the hardware to work.
> 
> I can live if we say if IOMMU is completely disabled we use a simple
> ida to allocate them, but when IOMMU is enabled I certainly need a
> way to reserve a PASID without an associated process.
> 
VT-d would also have such requirement. There is a virtual command
register for allocate and free PASID for VM use. When that PASID
allocation request gets propagated to the host IOMMU driver, we need to
allocate PASID w/o mm.

If the PASID allocation is done via VFIO, can we have FD to track PASID
life cycle instead of mm_exit()? i.e. all FDs get closed before
mm_exit, I assume?

> >> 3. Even after destruction of a process address space we need some
> >> grace period before a PASID is reused because it can be that the
> >> specific PASID is still in some hardware queues etc...
> >>           At bare minimum all device drivers using process binding
> >> need to explicitly note to the core when they are done with a
> >> PASID.  
> > Right, much of the horribleness in iommu-sva deals with this:
> >
> > The process dies, iommu-sva is notified and calls the mm_exit()
> > function passed by the device driver to iommu_sva_device_init(). In
> > mm_exit() the device driver needs to clear any reference to the
> > PASID in hardware and in its own structures. When the device driver
> > returns from mm_exit(), it effectively tells the core that it has
> > finished using the PASID, and iommu-sva can reuse the PASID for
> > another process. mm_exit() is allowed to block, so the device
> > driver has time to clean up and flush the queues.
> >
> > If the device driver finishes using the PASID before the process
> > exits, it just calls unbind().  
> 
> Exactly that's what Michal Hocko is probably going to not like at all.
> 
> Can we have a different approach where each driver is informed by the 
> mm_exit(), but needs to explicitly call unbind() before a PASID is
> reused?
> 
> During that teardown transition it would be ideal if that PASID only 
> points to a dummy root page directory with only invalid entries.
> 
I guess this can be vendor specific, In VT-d I plan to mark PASID
entry not present and disable fault reporting while draining remaining
activities.

> >  
> >> 4. It would be nice to have to be able to set a "void *" for each
> >> PASID/device combination while binding to a process which then can
> >> be queried later on based on the PASID.
> >>           E.g. when you have a per PASID/device structure around
> >> anyway, just add an extra field.  
> > iommu_sva_bind_device() takes a "drvdata" pointer that is stored
> > internally for the PASID/device combination (iommu_bond). It is
> > passed to mm_exit(), but I haven't added anything for the device
> > driver to query it back.  
> 
> Nice! Looks like all we need additionally is a function to retrieve
> that based on the PASID.
> 
> >> 5. It would be nice to have to allocate multiple PASIDs for the
> >> same process address space.
> >>           E.g. some teams at AMD want to use a separate GPU
> >> address space for their userspace client library. I'm still trying
> >> to avoid that, but it is perfectly possible that we are going to
> >> need that.  
> > Two PASIDs pointing to the same process pgd? At first glance it
> > seems feasible, maybe with a flag passed to bind() and a few
> > changes to internal 

Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-07 Thread Christian König

Am 07.09.2018 um 17:45 schrieb Jean-Philippe Brucker:

On 07/09/2018 09:55, Christian König wrote:

I will take this as an opportunity to summarize some of the requirements
we have for PASID management from the amdgpu driver point of view:

That's incredibly useful, thanks :)


1. We need to be able to allocate PASID between 1 and some maximum. Zero
is reserved as far as I know, but we don't necessary need a minimum.

Should be fine. The PASID range is restricted by the PCI PASID
capability, firmware description (for non-PCI devices), the IOMMU
capacity, and what the device driver passes to iommu_sva_device_init.
Not all IOMMUs reserve PASID 0 (AMD IOMMU without GIoSup doesn't, if I'm
not mistaken), so the KFD driver will need to pass min_pasid=1 to make
sure that 0 isn't allocated.


2. We need to be able to allocate PASIDs without a process address space
backing it. E.g. our hardware uses PASIDs even without Shared Virtual
Addressing enabled to distinct clients from each other.
          Would be a pity if we need to still have a separate PASID
handling because the system wide is only available when IOMMU is turned on.

I'm still not sure about this one. From my point of view we shouldn't
add to the IOMMU subsystem helpers for devices without an IOMMU.


I agree on that.


iommu-sva expects everywhere that the device has an iommu_domain, it's
the first thing we check on entry. Bypassing all of this would call
idr_alloc() directly, and wouldn't have any code in common with the
current iommu-sva. So it seems like you need a layer on top of iommu-sva
calling idr_alloc() when an IOMMU isn't present, but I don't think it
should be in drivers/iommu/


In this case I question if the PASID handling should be under 
drivers/iommu at all.


See I can have a mix of VM context which are bound to processes (some 
few) and VM contexts which are standalone and doesn't care for a process 
address space. But for each VM context I need a distinct PASID for the 
hardware to work.


I can live if we say if IOMMU is completely disabled we use a simple ida 
to allocate them, but when IOMMU is enabled I certainly need a way to 
reserve a PASID without an associated process.



3. Even after destruction of a process address space we need some grace
period before a PASID is reused because it can be that the specific
PASID is still in some hardware queues etc...
          At bare minimum all device drivers using process binding need
to explicitly note to the core when they are done with a PASID.

Right, much of the horribleness in iommu-sva deals with this:

The process dies, iommu-sva is notified and calls the mm_exit() function
passed by the device driver to iommu_sva_device_init(). In mm_exit() the
device driver needs to clear any reference to the PASID in hardware and
in its own structures. When the device driver returns from mm_exit(), it
effectively tells the core that it has finished using the PASID, and
iommu-sva can reuse the PASID for another process. mm_exit() is allowed
to block, so the device driver has time to clean up and flush the queues.

If the device driver finishes using the PASID before the process exits,
it just calls unbind().


Exactly that's what Michal Hocko is probably going to not like at all.

Can we have a different approach where each driver is informed by the 
mm_exit(), but needs to explicitly call unbind() before a PASID is reused?


During that teardown transition it would be ideal if that PASID only 
points to a dummy root page directory with only invalid entries.





4. It would be nice to have to be able to set a "void *" for each
PASID/device combination while binding to a process which then can be
queried later on based on the PASID.
          E.g. when you have a per PASID/device structure around anyway,
just add an extra field.

iommu_sva_bind_device() takes a "drvdata" pointer that is stored
internally for the PASID/device combination (iommu_bond). It is passed
to mm_exit(), but I haven't added anything for the device driver to
query it back.


Nice! Looks like all we need additionally is a function to retrieve that 
based on the PASID.



5. It would be nice to have to allocate multiple PASIDs for the same
process address space.
          E.g. some teams at AMD want to use a separate GPU address space
for their userspace client library. I'm still trying to avoid that, but
it is perfectly possible that we are going to need that.

Two PASIDs pointing to the same process pgd? At first glance it seems
feasible, maybe with a flag passed to bind() and a few changes to
internal structures. It will duplicate ATC invalidation commands for
each process address space change (munmap etc) so you might take a
performance hit.

Intel's SVM code has the SVM_FLAG_PRIVATE_PASID which seems similar to
what you describe, but I don't plan to support it in this series (the
io_mm model is already pretty complicated). I think it can be added
without too much effort in a future series, though with a 

Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-07 Thread Jean-Philippe Brucker
On 07/09/2018 09:55, Christian König wrote:
> I will take this as an opportunity to summarize some of the requirements 
> we have for PASID management from the amdgpu driver point of view:

That's incredibly useful, thanks :)

> 1. We need to be able to allocate PASID between 1 and some maximum. Zero 
> is reserved as far as I know, but we don't necessary need a minimum.

Should be fine. The PASID range is restricted by the PCI PASID
capability, firmware description (for non-PCI devices), the IOMMU
capacity, and what the device driver passes to iommu_sva_device_init.
Not all IOMMUs reserve PASID 0 (AMD IOMMU without GIoSup doesn't, if I'm
not mistaken), so the KFD driver will need to pass min_pasid=1 to make
sure that 0 isn't allocated.

> 2. We need to be able to allocate PASIDs without a process address space 
> backing it. E.g. our hardware uses PASIDs even without Shared Virtual 
> Addressing enabled to distinct clients from each other.
>          Would be a pity if we need to still have a separate PASID 
> handling because the system wide is only available when IOMMU is turned on.

I'm still not sure about this one. From my point of view we shouldn't
add to the IOMMU subsystem helpers for devices without an IOMMU.
iommu-sva expects everywhere that the device has an iommu_domain, it's
the first thing we check on entry. Bypassing all of this would call
idr_alloc() directly, and wouldn't have any code in common with the
current iommu-sva. So it seems like you need a layer on top of iommu-sva
calling idr_alloc() when an IOMMU isn't present, but I don't think it
should be in drivers/iommu/

> 3. Even after destruction of a process address space we need some grace 
> period before a PASID is reused because it can be that the specific 
> PASID is still in some hardware queues etc...
>          At bare minimum all device drivers using process binding need 
> to explicitly note to the core when they are done with a PASID.

Right, much of the horribleness in iommu-sva deals with this:

The process dies, iommu-sva is notified and calls the mm_exit() function
passed by the device driver to iommu_sva_device_init(). In mm_exit() the
device driver needs to clear any reference to the PASID in hardware and
in its own structures. When the device driver returns from mm_exit(), it
effectively tells the core that it has finished using the PASID, and
iommu-sva can reuse the PASID for another process. mm_exit() is allowed
to block, so the device driver has time to clean up and flush the queues.

If the device driver finishes using the PASID before the process exits,
it just calls unbind().

> 4. It would be nice to have to be able to set a "void *" for each 
> PASID/device combination while binding to a process which then can be 
> queried later on based on the PASID.
>          E.g. when you have a per PASID/device structure around anyway, 
> just add an extra field.

iommu_sva_bind_device() takes a "drvdata" pointer that is stored
internally for the PASID/device combination (iommu_bond). It is passed
to mm_exit(), but I haven't added anything for the device driver to
query it back.

> 5. It would be nice to have to allocate multiple PASIDs for the same 
> process address space.
>          E.g. some teams at AMD want to use a separate GPU address space 
> for their userspace client library. I'm still trying to avoid that, but 
> it is perfectly possible that we are going to need that.

Two PASIDs pointing to the same process pgd? At first glance it seems
feasible, maybe with a flag passed to bind() and a few changes to
internal structures. It will duplicate ATC invalidation commands for
each process address space change (munmap etc) so you might take a
performance hit.

Intel's SVM code has the SVM_FLAG_PRIVATE_PASID which seems similar to
what you describe, but I don't plan to support it in this series (the
io_mm model is already pretty complicated). I think it can be added
without too much effort in a future series, though with a different flag
name since we'd like to use "private PASID" for something else
(https://www.spinics.net/lists/dri-devel/msg177007.html).

Thanks,
Jean

>          Additional to that it is sometimes quite useful for debugging 
> to isolate where exactly an incorrect access (segfault) is coming from.
> 
> Let me know if there are some problems with that, especially I want to 
> know if there is pushback on #5 so that I can forward that :)
> 
> Thanks,
> Christian.
> 
>>
>> Thanks,
>> Jean
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-07 Thread Christian König

Am 06.09.2018 um 14:45 schrieb Jean-Philippe Brucker:

On 06/09/2018 12:12, Christian König wrote:

Am 06.09.2018 um 13:09 schrieb Jean-Philippe Brucker:

Hi Eric,

Thanks for reviewing

On 05/09/2018 12:29, Auger Eric wrote:

+int iommu_sva_device_init(struct device *dev, unsigned long features,
+  unsigned int max_pasid)

what about min_pasid?

No one asked for it... The max_pasid parameter is here for drivers that
have vendor-specific PASID size limits, such as AMD KFD (see
kfd_iommu_device_init and
https://patchwork.kernel.org/patch/9989307/#21389571). But in most cases
the PASID size will only depend on the PCI PASID capability and the
IOMMU limits, both known by the IOMMU driver, so device drivers won't
have to set max_pasid.

IOMMU drivers need to set min_pasid in the sva_device_init callback
because it may be either 1 (e.g. Arm where PASID #0 is reserved) or 0
(Intel Vt-d rev2), but at the moment I can't see a reason for device
drivers to override min_pasid

Sorry to ruin your day, but if I'm not completely mistaken PASID zero is
reserved in the AMD KFD as well.

Heh, fair enough. I'll add the min_pasid parameter


I will take this as an opportunity to summarize some of the requirements 
we have for PASID management from the amdgpu driver point of view:


1. We need to be able to allocate PASID between 1 and some maximum. Zero 
is reserved as far as I know, but we don't necessary need a minimum.


2. We need to be able to allocate PASIDs without a process address space 
backing it. E.g. our hardware uses PASIDs even without Shared Virtual 
Addressing enabled to distinct clients from each other.
        Would be a pity if we need to still have a separate PASID 
handling because the system wide is only available when IOMMU is turned on.


3. Even after destruction of a process address space we need some grace 
period before a PASID is reused because it can be that the specific 
PASID is still in some hardware queues etc...
        At bare minimum all device drivers using process binding need 
to explicitly note to the core when they are done with a PASID.


4. It would be nice to have to be able to set a "void *" for each 
PASID/device combination while binding to a process which then can be 
queried later on based on the PASID.
        E.g. when you have a per PASID/device structure around anyway, 
just add an extra field.


5. It would be nice to have to allocate multiple PASIDs for the same 
process address space.
        E.g. some teams at AMD want to use a separate GPU address space 
for their userspace client library. I'm still trying to avoid that, but 
it is perfectly possible that we are going to need that.
        Additional to that it is sometimes quite useful for debugging 
to isolate where exactly an incorrect access (segfault) is coming from.


Let me know if there are some problems with that, especially I want to 
know if there is pushback on #5 so that I can forward that :)


Thanks,
Christian.



Thanks,
Jean


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-06 Thread Jean-Philippe Brucker
On 06/09/2018 12:12, Christian König wrote:
> Am 06.09.2018 um 13:09 schrieb Jean-Philippe Brucker:
>> Hi Eric,
>>
>> Thanks for reviewing
>>
>> On 05/09/2018 12:29, Auger Eric wrote:
 +int iommu_sva_device_init(struct device *dev, unsigned long features,
 +  unsigned int max_pasid)
>>> what about min_pasid?
>> No one asked for it... The max_pasid parameter is here for drivers that
>> have vendor-specific PASID size limits, such as AMD KFD (see
>> kfd_iommu_device_init and
>> https://patchwork.kernel.org/patch/9989307/#21389571). But in most cases
>> the PASID size will only depend on the PCI PASID capability and the
>> IOMMU limits, both known by the IOMMU driver, so device drivers won't
>> have to set max_pasid.
>>
>> IOMMU drivers need to set min_pasid in the sva_device_init callback
>> because it may be either 1 (e.g. Arm where PASID #0 is reserved) or 0
>> (Intel Vt-d rev2), but at the moment I can't see a reason for device
>> drivers to override min_pasid
> 
> Sorry to ruin your day, but if I'm not completely mistaken PASID zero is
> reserved in the AMD KFD as well.

Heh, fair enough. I'll add the min_pasid parameter

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-06 Thread Christian König

Am 06.09.2018 um 13:09 schrieb Jean-Philippe Brucker:

Hi Eric,

Thanks for reviewing

On 05/09/2018 12:29, Auger Eric wrote:

+int iommu_sva_device_init(struct device *dev, unsigned long features,
+ unsigned int max_pasid)

what about min_pasid?

No one asked for it... The max_pasid parameter is here for drivers that
have vendor-specific PASID size limits, such as AMD KFD (see
kfd_iommu_device_init and
https://patchwork.kernel.org/patch/9989307/#21389571). But in most cases
the PASID size will only depend on the PCI PASID capability and the
IOMMU limits, both known by the IOMMU driver, so device drivers won't
have to set max_pasid.

IOMMU drivers need to set min_pasid in the sva_device_init callback
because it may be either 1 (e.g. Arm where PASID #0 is reserved) or 0
(Intel Vt-d rev2), but at the moment I can't see a reason for device
drivers to override min_pasid


Sorry to ruin your day, but if I'm not completely mistaken PASID zero is 
reserved in the AMD KFD as well.


Regards,
Christian.




+   /*
+* IOMMU driver updates the limits depending on the IOMMU and device
+* capabilities.
+*/
+   ret = domain->ops->sva_device_init(dev, param);
+   if (ret)
+   goto err_free_param;

So you are likely to call sva_device_init even if it was already called
(ie. dev->iommu_param->sva_param is already set). Can't you test whether
dev->iommu_param->sva_param is already set first?

Right, that's probably better


+/**
+ * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a 
device
+ * @dev: the device
+ *
+ * Disable SVA. Device driver should ensure that the device isn't performing 
any
+ * DMA while this function is running.
+ */
+int iommu_sva_device_shutdown(struct device *dev)

Not sure the returned value is required for a shutdown operation.

I don't know either. I like them because they help me debug, but are
otherwise rather useless if we don't describe precise semantics. The
caller cannot do anything with it. Given that the corresponding IOMMU op
is already void, I can change this function to void as well


+struct iommu_sva_param {

What are the feature values?

At the moment only IOMMU_SVA_FEAT_IOPF, introduced by patch 09


+   unsigned long features;
+   unsigned int min_pasid;
+   unsigned int max_pasid;
+};
+
  /**
   * struct iommu_ops - iommu ops and capabilities
   * @capable: check capability
@@ -219,6 +225,8 @@ struct page_response_msg {
   * @domain_free: free iommu domain
   * @attach_dev: attach device to an iommu domain
   * @detach_dev: detach device from an iommu domain
+ * @sva_device_init: initialize Shared Virtual Adressing for a device

Addressing

+ * @sva_device_shutdown: shutdown Shared Virtual Adressing for a device

Addressing

Nice catch

Thanks,
Jean


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-06 Thread Jean-Philippe Brucker
Hi Eric,

Thanks for reviewing

On 05/09/2018 12:29, Auger Eric wrote:
>> +int iommu_sva_device_init(struct device *dev, unsigned long features,
>> +  unsigned int max_pasid)
> what about min_pasid?

No one asked for it... The max_pasid parameter is here for drivers that
have vendor-specific PASID size limits, such as AMD KFD (see
kfd_iommu_device_init and
https://patchwork.kernel.org/patch/9989307/#21389571). But in most cases
the PASID size will only depend on the PCI PASID capability and the
IOMMU limits, both known by the IOMMU driver, so device drivers won't
have to set max_pasid.

IOMMU drivers need to set min_pasid in the sva_device_init callback
because it may be either 1 (e.g. Arm where PASID #0 is reserved) or 0
(Intel Vt-d rev2), but at the moment I can't see a reason for device
drivers to override min_pasid

>> +/*
>> + * IOMMU driver updates the limits depending on the IOMMU and device
>> + * capabilities.
>> + */
>> +ret = domain->ops->sva_device_init(dev, param);
>> +if (ret)
>> +goto err_free_param;
> So you are likely to call sva_device_init even if it was already called
> (ie. dev->iommu_param->sva_param is already set). Can't you test whether
> dev->iommu_param->sva_param is already set first?

Right, that's probably better

>> +/**
>> + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a 
>> device
>> + * @dev: the device
>> + *
>> + * Disable SVA. Device driver should ensure that the device isn't 
>> performing any
>> + * DMA while this function is running.
>> + */
>> +int iommu_sva_device_shutdown(struct device *dev)
> Not sure the returned value is required for a shutdown operation.

I don't know either. I like them because they help me debug, but are
otherwise rather useless if we don't describe precise semantics. The
caller cannot do anything with it. Given that the corresponding IOMMU op
is already void, I can change this function to void as well

>> +struct iommu_sva_param {
> What are the feature values?

At the moment only IOMMU_SVA_FEAT_IOPF, introduced by patch 09

>> +unsigned long features;
>> +unsigned int min_pasid;
>> +unsigned int max_pasid;
>> +};
>> +
>>  /**
>>   * struct iommu_ops - iommu ops and capabilities
>>   * @capable: check capability
>> @@ -219,6 +225,8 @@ struct page_response_msg {
>>   * @domain_free: free iommu domain
>>   * @attach_dev: attach device to an iommu domain
>>   * @detach_dev: detach device from an iommu domain
>> + * @sva_device_init: initialize Shared Virtual Adressing for a device
> Addressing
>> + * @sva_device_shutdown: shutdown Shared Virtual Adressing for a device
> Addressing

Nice catch

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-05 Thread Auger Eric
Hi Jean-Philippe,
On 05/11/2018 09:06 PM, Jean-Philippe Brucker wrote:
> Shared Virtual Addressing (SVA) provides a way for device drivers to bind
> process address spaces to devices. This requires the IOMMU to support page
> table format and features compatible with the CPUs, and usually requires
> the system to support I/O Page Faults (IOPF) and Process Address Space ID
> (PASID). When all of these are available, DMA can access virtual addresses
> of a process. A PASID is allocated for each process, and the device driver
> programs it into the device in an implementation-specific way.
> 
> Add a new API for sharing process page tables with devices. Introduce two
> IOMMU operations, sva_device_init() and sva_device_shutdown(), that
> prepare the IOMMU driver for SVA. For example allocate PASID tables and
> fault queues. Subsequent patches will implement the bind() and unbind()
> operations.
> 
> Support for I/O Page Faults will be added in a later patch using a new
> feature bit (IOMMU_SVA_FEAT_IOPF). With the current API users must pin
> down all shared mappings. Other feature bits that may be added in the
> future are IOMMU_SVA_FEAT_PRIVATE, to support private PASID address
> spaces, and IOMMU_SVA_FEAT_NO_PASID, to bind the whole device address
> space to a process.
> 
> Signed-off-by: Jean-Philippe Brucker 
> 
> ---
> v1->v2:
> * Add sva_param structure to iommu_param
> * CONFIG option is only selectable by IOMMU drivers
> ---
>  drivers/iommu/Kconfig |   4 ++
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/iommu-sva.c | 110 ++
>  include/linux/iommu.h |  32 +++
>  4 files changed, 147 insertions(+)
>  create mode 100644 drivers/iommu/iommu-sva.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 7564237f788d..cca8e06903c7 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -74,6 +74,10 @@ config IOMMU_DMA
>   select IOMMU_IOVA
>   select NEED_SG_DMA_LENGTH
>  
> +config IOMMU_SVA
> + bool
> + select IOMMU_API
> +
>  config FSL_PAMU
>   bool "Freescale IOMMU support"
>   depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1fb695854809..1dbcc89ebe4c 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
> +obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> new file mode 100644
> index ..8b4afb7c63ae
> --- /dev/null
> +++ b/drivers/iommu/iommu-sva.c
> @@ -0,0 +1,110 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Manage PASIDs and bind process address spaces to devices.
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + */
> +
> +#include 
> +#include 
> +
> +/**
> + * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a 
> device
> + * @dev: the device
> + * @features: bitmask of features that need to be initialized
> + * @max_pasid: max PASID value supported by the device
> + *
> + * Users of the bind()/unbind() API must call this function to initialize all
> + * features required for SVA.
> + *
> + * The device must support multiple address spaces (e.g. PCI PASID). By 
> default
> + * the PASID allocated during bind() is limited by the IOMMU capacity, and by
> + * the device PASID width defined in the PCI capability or in the firmware
> + * description. Setting @max_pasid to a non-zero value smaller than this 
> limit
> + * overrides it.
> + *
> + * The device should not be performing any DMA while this function is 
> running,
> + * otherwise the behavior is undefined.
> + *
> + * Return 0 if initialization succeeded, or an error.
> + */
> +int iommu_sva_device_init(struct device *dev, unsigned long features,
> +   unsigned int max_pasid)
what about min_pasid?
> +{
> + int ret;
> + struct iommu_sva_param *param;
> + struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> + if (!domain || !domain->ops->sva_device_init)
> + return -ENODEV;
> +
> + if (features)
> + return -EINVAL;
> +
> + param = kzalloc(sizeof(*param), GFP_KERNEL);
> + if (!param)
> + return -ENOMEM;
> +
> + param->features = features;
> + param->max_pasid= max_pasid;
> +
> + /*
> +  * IOMMU driver updates the limits depending on the IOMMU and device
> +  * capabilities.
> +  */
> + ret = domain->ops->sva_device_init(dev, param);
> + if (ret)
> + goto err_free_param;
So you are likely to call sva_device_init even if it was already called
(ie. dev->iommu_param->sva_param is 

Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-05-17 Thread Jacob Pan
On Thu, 17 May 2018 11:02:02 +0100
Jean-Philippe Brucker  wrote:

> Hi Jacob,
> 
> Thanks for reviewing this
> 
> On 16/05/18 21:41, Jacob Pan wrote:
>  [...]  
> > seems the min_pasid never gets used. do you really need it?  
> 
> Yes, the SMMU sets it to 1 in patch 28/40, because it needs to reserve
> PASID 0
> 
>  [...]  
> > should it be !features?  
> 
> This checks if the user sets any unsupported bit in features. No
> feature is supported right now, but patch 09/40 adds
> IOMMU_SVA_FEAT_IOPF, and changes this line to "features &
> ~IOMMU_SVA_FEAT_IOPF"
> 
> >> +  mutex_lock(>iommu_param->lock);
> >> +  param = dev->iommu_param->sva_param;
> >> +  dev->iommu_param->sva_param = NULL;
> >> +  mutex_unlock(>iommu_param->lock);
> >> +  if (!param)
> >> +  return -ENODEV;
> >> +
> >> +  if (domain->ops->sva_device_shutdown)
> >> +  domain->ops->sva_device_shutdown(dev, param);  
> > seems a little mismatch here, do you need pass the param. I don't
> > think there is anything else model specific iommu driver need to do
> > for the param.  
> 
> SMMU doesn't use it, but maybe it would remind other IOMMU driver
> which features were enabled, so they don't have to keep track of that
> themselves? I can remove it if it isn't useful
> 
If there is a use case, I guess iommu driver can always retrieve the
param from struct device.
> Thanks,
> Jean

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-05-17 Thread Jean-Philippe Brucker
Hi Jacob,

Thanks for reviewing this

On 16/05/18 21:41, Jacob Pan wrote:
>> + * The device must support multiple address spaces (e.g. PCI PASID).
>> By default
>> + * the PASID allocated during bind() is limited by the IOMMU
>> capacity, and by
>> + * the device PASID width defined in the PCI capability or in the
>> firmware
>> + * description. Setting @max_pasid to a non-zero value smaller than
>> this limit
>> + * overrides it.
>> + *
> seems the min_pasid never gets used. do you really need it?

Yes, the SMMU sets it to 1 in patch 28/40, because it needs to reserve
PASID 0

>> + * The device should not be performing any DMA while this function
>> is running,
>> + * otherwise the behavior is undefined.
>> + *
>> + * Return 0 if initialization succeeded, or an error.
>> + */
>> +int iommu_sva_device_init(struct device *dev, unsigned long features,
>> +  unsigned int max_pasid)
>> +{
>> +int ret;
>> +struct iommu_sva_param *param;
>> +struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +
>> +if (!domain || !domain->ops->sva_device_init)
>> +return -ENODEV;
>> +
>> +if (features)
>> +return -EINVAL;
> should it be !features?

This checks if the user sets any unsupported bit in features. No feature
is supported right now, but patch 09/40 adds IOMMU_SVA_FEAT_IOPF, and
changes this line to "features & ~IOMMU_SVA_FEAT_IOPF"

>> +mutex_lock(>iommu_param->lock);
>> +param = dev->iommu_param->sva_param;
>> +dev->iommu_param->sva_param = NULL;
>> +mutex_unlock(>iommu_param->lock);
>> +if (!param)
>> +return -ENODEV;
>> +
>> +if (domain->ops->sva_device_shutdown)
>> +domain->ops->sva_device_shutdown(dev, param);
> seems a little mismatch here, do you need pass the param. I don't think
> there is anything else model specific iommu driver need to do for the
> param.

SMMU doesn't use it, but maybe it would remind other IOMMU driver which
features were enabled, so they don't have to keep track of that
themselves? I can remove it if it isn't useful

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-05-16 Thread Jacob Pan
On Fri, 11 May 2018 20:06:02 +0100
Jean-Philippe Brucker  wrote:

> Shared Virtual Addressing (SVA) provides a way for device drivers to
> bind process address spaces to devices. This requires the IOMMU to
> support page table format and features compatible with the CPUs, and
> usually requires the system to support I/O Page Faults (IOPF) and
> Process Address Space ID (PASID). When all of these are available,
> DMA can access virtual addresses of a process. A PASID is allocated
> for each process, and the device driver programs it into the device
> in an implementation-specific way.
> 
> Add a new API for sharing process page tables with devices. Introduce
> two IOMMU operations, sva_device_init() and sva_device_shutdown(),
> that prepare the IOMMU driver for SVA. For example allocate PASID
> tables and fault queues. Subsequent patches will implement the bind()
> and unbind() operations.
> 
> Support for I/O Page Faults will be added in a later patch using a new
> feature bit (IOMMU_SVA_FEAT_IOPF). With the current API users must pin
> down all shared mappings. Other feature bits that may be added in the
> future are IOMMU_SVA_FEAT_PRIVATE, to support private PASID address
> spaces, and IOMMU_SVA_FEAT_NO_PASID, to bind the whole device address
> space to a process.
> 
> Signed-off-by: Jean-Philippe Brucker 
> 
> ---
> v1->v2:
> * Add sva_param structure to iommu_param
> * CONFIG option is only selectable by IOMMU drivers
> ---
>  drivers/iommu/Kconfig |   4 ++
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/iommu-sva.c | 110
> ++ include/linux/iommu.h |
> 32 +++ 4 files changed, 147 insertions(+)
>  create mode 100644 drivers/iommu/iommu-sva.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 7564237f788d..cca8e06903c7 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -74,6 +74,10 @@ config IOMMU_DMA
>   select IOMMU_IOVA
>   select NEED_SG_DMA_LENGTH
>  
> +config IOMMU_SVA
> + bool
> + select IOMMU_API
> +
>  config FSL_PAMU
>   bool "Freescale IOMMU support"
>   depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1fb695854809..1dbcc89ebe4c 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
> +obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> new file mode 100644
> index ..8b4afb7c63ae
> --- /dev/null
> +++ b/drivers/iommu/iommu-sva.c
> @@ -0,0 +1,110 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Manage PASIDs and bind process address spaces to devices.
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + */
> +
> +#include 
> +#include 
> +
> +/**
> + * iommu_sva_device_init() - Initialize Shared Virtual Addressing
> for a device
> + * @dev: the device
> + * @features: bitmask of features that need to be initialized
> + * @max_pasid: max PASID value supported by the device
> + *
> + * Users of the bind()/unbind() API must call this function to
> initialize all
> + * features required for SVA.
> + *
> + * The device must support multiple address spaces (e.g. PCI PASID).
> By default
> + * the PASID allocated during bind() is limited by the IOMMU
> capacity, and by
> + * the device PASID width defined in the PCI capability or in the
> firmware
> + * description. Setting @max_pasid to a non-zero value smaller than
> this limit
> + * overrides it.
> + *
seems the min_pasid never gets used. do you really need it?
 
> + * The device should not be performing any DMA while this function
> is running,
> + * otherwise the behavior is undefined.
> + *
> + * Return 0 if initialization succeeded, or an error.
> + */
> +int iommu_sva_device_init(struct device *dev, unsigned long features,
> +   unsigned int max_pasid)
> +{
> + int ret;
> + struct iommu_sva_param *param;
> + struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> + if (!domain || !domain->ops->sva_device_init)
> + return -ENODEV;
> +
> + if (features)
> + return -EINVAL;
should it be !features?

> +
> + param = kzalloc(sizeof(*param), GFP_KERNEL);
> + if (!param)
> + return -ENOMEM;
> +
> + param->features = features;
> + param->max_pasid= max_pasid;
> +
> + /*
> +  * IOMMU driver updates the limits depending on the IOMMU
> and device
> +  * capabilities.
> +  */
> + ret = domain->ops->sva_device_init(dev, param);
> + if (ret)
> + goto 

[PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-05-11 Thread Jean-Philippe Brucker
Shared Virtual Addressing (SVA) provides a way for device drivers to bind
process address spaces to devices. This requires the IOMMU to support page
table format and features compatible with the CPUs, and usually requires
the system to support I/O Page Faults (IOPF) and Process Address Space ID
(PASID). When all of these are available, DMA can access virtual addresses
of a process. A PASID is allocated for each process, and the device driver
programs it into the device in an implementation-specific way.

Add a new API for sharing process page tables with devices. Introduce two
IOMMU operations, sva_device_init() and sva_device_shutdown(), that
prepare the IOMMU driver for SVA. For example allocate PASID tables and
fault queues. Subsequent patches will implement the bind() and unbind()
operations.

Support for I/O Page Faults will be added in a later patch using a new
feature bit (IOMMU_SVA_FEAT_IOPF). With the current API users must pin
down all shared mappings. Other feature bits that may be added in the
future are IOMMU_SVA_FEAT_PRIVATE, to support private PASID address
spaces, and IOMMU_SVA_FEAT_NO_PASID, to bind the whole device address
space to a process.

Signed-off-by: Jean-Philippe Brucker 

---
v1->v2:
* Add sva_param structure to iommu_param
* CONFIG option is only selectable by IOMMU drivers
---
 drivers/iommu/Kconfig |   4 ++
 drivers/iommu/Makefile|   1 +
 drivers/iommu/iommu-sva.c | 110 ++
 include/linux/iommu.h |  32 +++
 4 files changed, 147 insertions(+)
 create mode 100644 drivers/iommu/iommu-sva.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 7564237f788d..cca8e06903c7 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -74,6 +74,10 @@ config IOMMU_DMA
select IOMMU_IOVA
select NEED_SG_DMA_LENGTH
 
+config IOMMU_SVA
+   bool
+   select IOMMU_API
+
 config FSL_PAMU
bool "Freescale IOMMU support"
depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1fb695854809..1dbcc89ebe4c 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
new file mode 100644
index ..8b4afb7c63ae
--- /dev/null
+++ b/drivers/iommu/iommu-sva.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Manage PASIDs and bind process address spaces to devices.
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ */
+
+#include 
+#include 
+
+/**
+ * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
+ * @dev: the device
+ * @features: bitmask of features that need to be initialized
+ * @max_pasid: max PASID value supported by the device
+ *
+ * Users of the bind()/unbind() API must call this function to initialize all
+ * features required for SVA.
+ *
+ * The device must support multiple address spaces (e.g. PCI PASID). By default
+ * the PASID allocated during bind() is limited by the IOMMU capacity, and by
+ * the device PASID width defined in the PCI capability or in the firmware
+ * description. Setting @max_pasid to a non-zero value smaller than this limit
+ * overrides it.
+ *
+ * The device should not be performing any DMA while this function is running,
+ * otherwise the behavior is undefined.
+ *
+ * Return 0 if initialization succeeded, or an error.
+ */
+int iommu_sva_device_init(struct device *dev, unsigned long features,
+ unsigned int max_pasid)
+{
+   int ret;
+   struct iommu_sva_param *param;
+   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+   if (!domain || !domain->ops->sva_device_init)
+   return -ENODEV;
+
+   if (features)
+   return -EINVAL;
+
+   param = kzalloc(sizeof(*param), GFP_KERNEL);
+   if (!param)
+   return -ENOMEM;
+
+   param->features = features;
+   param->max_pasid= max_pasid;
+
+   /*
+* IOMMU driver updates the limits depending on the IOMMU and device
+* capabilities.
+*/
+   ret = domain->ops->sva_device_init(dev, param);
+   if (ret)
+   goto err_free_param;
+
+   mutex_lock(>iommu_param->lock);
+   if (dev->iommu_param->sva_param)
+   ret = -EEXIST;
+   else
+   dev->iommu_param->sva_param = param;
+   mutex_unlock(>iommu_param->lock);
+   if (ret)
+   goto err_device_shutdown;
+
+   return 0;
+
+err_device_shutdown:
+   if (domain->ops->sva_device_shutdown)
+