Re: [PATCH] dt-bindings: arm-smmu: Fix json-schema syntax

2021-06-22 Thread Krzysztof Kozlowski
On Mon, 21 Jun 2021 16:00:36 +0200, Thierry Reding wrote:
> Commit 4287861dca9d ("dt-bindings: arm-smmu: Add Tegra186 compatible
> string") introduced a jsonschema syntax error as a result of a rebase
> gone wrong. Fix it.

Applied, thanks!

[1/1] dt-bindings: arm-smmu: Fix json-schema syntax
  commit: bf3ec9deaa33889630722c47f7bb86ba58872ea7

Best regards,
-- 
Krzysztof Kozlowski 


Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-22 Thread Yongji Xie
On Wed, Jun 23, 2021 at 11:31 AM Jason Wang  wrote:
>
>
> 在 2021/6/22 下午4:14, Yongji Xie 写道:
> > On Tue, Jun 22, 2021 at 3:50 PM Jason Wang  wrote:
> >>
> >> 在 2021/6/22 下午3:22, Yongji Xie 写道:
>  We need to find a way to propagate the error to the userspace.
> 
>  E.g. if we want to stop the device, we will delay the status reset until
>  we get a response from the userspace?
> 
> >>> I didn't get how to delay the status reset. And should it be a DoS
> >>> that we want to fix if the userspace doesn't give a response forever?
> >>
> >> You're right. So let's first make set_status() able to fail, then propagate
> >> its failure via VHOST_VDPA_SET_STATUS.
> >>
> > OK. So we only need to propagate the failure in the vhost-vdpa case, right?
>
>
> I think not, we need to deal with the reset for virtio as well:
>
> E.g. in register_virtio_device(), we have:
>
>  /* We always start by resetting the device, in case a previous
>   * driver messed it up.  This also tests that code path a
>   * little. */
>  dev->config->reset(dev);
>
> We probably need to allow reset to fail and then fail
> register_virtio_device() as well.
>

OK, looks like virtio_add_status() and virtio_device_ready()[1] should
also be modified if we need to propagate the failure in the
virtio-vdpa case. Or do we only need to care about the reset case?

[1] https://lore.kernel.org/lkml/20210517093428.670-1-xieyon...@bytedance.com/
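
For illustration only, a rough sketch of what a failure-aware
virtio_device_ready() might look like, assuming set_status() is reworked
to return an error (the interface change itself is still under discussion):

/* Hypothetical sketch: propagate a set_status() failure to the caller. */
static inline int virtio_device_ready(struct virtio_device *dev)
{
	unsigned status = dev->config->get_status(dev);

	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
	/* assumes ->set_status() now returns an error code instead of void */
	return dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
}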

Thanks,
Yongji

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-22 Thread Jason Wang


在 2021/6/22 下午4:14, Yongji Xie 写道:

On Tue, Jun 22, 2021 at 3:50 PM Jason Wang  wrote:


在 2021/6/22 下午3:22, Yongji Xie 写道:

We need to find a way to propagate the error to the userspace.

E.g. if we want to stop the device, we will delay the status reset until
we get a response from the userspace?


I didn't get how to delay the status reset. And should it be a DoS
that we want to fix if the userspace doesn't give a response forever?


You're right. So let's first make set_status() able to fail, then propagate
its failure via VHOST_VDPA_SET_STATUS.


OK. So we only need to propagate the failure in the vhost-vdpa case, right?



I think not, we need to deal with the reset for virtio as well:

E.g. in register_virtio_device(), we have:

    /* We always start by resetting the device, in case a previous
     * driver messed it up.  This also tests that code path a
     * little. */
    dev->config->reset(dev);

We probably need to allow reset to fail and then fail
register_virtio_device() as well.
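
For illustration, a rough and abbreviated sketch of that direction
(hypothetical; the exact interface change still needs to be worked out):

/* Hypothetical sketch: let ->reset() fail and fail registration with it. */
int register_virtio_device(struct virtio_device *dev)
{
	int err;

	/* We always start by resetting the device, in case a previous
	 * driver messed it up. */
	err = dev->config->reset(dev);	/* assumes ->reset() now returns int */
	if (err)
		return err;		/* propagate the failure to the caller */

	/* ... rest of the existing registration code ... */
	return 0;
}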


Thanks




Thanks,
Yongji




Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

2021-06-22 Thread 'Dominique MARTINET'
Konrad Rzeszutek Wilk wrote on Tue, Jun 22, 2021 at 05:58:14PM -0400:
> On Tue, Jun 22, 2021 at 04:48:24PM +0900, 'Dominique MARTINET' wrote:
> > Thanks, that should be good.
> > 
> > Do you want me to send a follow-up patch with the two extra checks
> > (tlb_addr & (IO_TLB_SIZE -1)) > swiotlb_align_offset(dev, orig_addr)
> > tlb_offset < alloc_size
> > 
> > or are we certain this can't ever happen?
> 
> I would love more patches and I saw the previous one you posted.
> 
> But we only got two (or one) weeks before the next merge window opens
> so I am sending to Linus the one that was tested with NVMe and crypto
> (see above).
> 
> That is the
> https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=stable/for-linus-5.14
> 
> And then after Linus releases the 5.14 - I would love to take your
> cleanup on top of that and test it?

That sounds good to me, will send with proper formatting after release.
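
For reference, a rough sketch of what those two extra sanity checks could
look like around the tlb_offset computation in swiotlb_bounce() (hypothetical
form and placement; the actual follow-up may differ):

/* Hypothetical sketch of the two extra checks discussed above. */
unsigned int tlb_offset = tlb_addr & (IO_TLB_SIZE - 1);
unsigned int orig_offset = swiotlb_align_offset(dev, orig_addr);

if (WARN_ON_ONCE(tlb_offset < orig_offset))
	return;			/* would underflow when rebasing orig_addr */

tlb_offset -= orig_offset;
if (WARN_ON_ONCE(tlb_offset >= alloc_size))
	return;			/* offset points past the allocated slot */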

-- 
Dominique


Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-22 Thread Robin Murphy

On 2021-06-21 15:32, Lu Baolu wrote:

Hi Robin,

On 2021/6/21 19:59, Robin Murphy wrote:

On 2021-06-21 11:34, John Garry wrote:

On 21/06/2021 11:00, Lu Baolu wrote:

void iommu_set_dma_strict(bool force)
{
	if (force == true)
		iommu_dma_strict = true;
	else if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
		iommu_dma_strict = true;
}

So we would use iommu_set_dma_strict(true) for a) and b), but 
iommu_set_dma_strict(false) for c).


Yes. We need to distinguish the "must" and "nice-to-have" cases of
setting strict mode.



Then I am not sure what you want to do with the accompanying print 
for c). It was:

"IOMMU batching is disabled due to virtualization"

And now, from this series, it is:
"IOMMU batching disallowed due to virtualization"

Using iommu_get_dma_strict(domain) is not appropriate here to know 
the current mode (so we know whether to print).


Note that this change would mean that the current series would 
require non-trivial rework, which would be unfortunate so late in 
the cycle.


This patch series looks good to me and I have added my reviewed-by.
Probably we could make another patch series to improve it so that the
kernel optimization does not override the user setting.


On a personal level I would be happy with that approach, but I think 
it's better to not start changing things right away in a follow-up 
series.


So how about we add this patch (which replaces 6/6 "iommu: Remove 
mode argument from iommu_set_dma_strict()")?


Robin, any opinion?


For me it boils down to whether there are any realistic workloads 
where non-strict mode *would* still perform better under 
virtualisation. The 


At present, we see that strict mode has better performance in the
virtualization environment because it will make the shadow page table
management more efficient. When the hardware supports nested
translation, we may have to re-evaluate this since there's no need for
a shadowing page table anymore.


I guess I was assuming that in most cases, proper nested mode could look 
distinct enough that we'd be able to treat it differently in the first 
place. For instance, if it's handing guest tables directly to the 
hardware, would the host have any reason to still set the "caching mode" 
ID bit?


Robin.

Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC

2021-06-22 Thread Robin Murphy

On 2021-06-22 17:06, Doug Anderson wrote:

Hi,

On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy  wrote:


Hi Doug,

On 2021-06-22 00:52, Douglas Anderson wrote:


This patch attempts to put forward a proposal for enabling non-strict
DMA on a device-by-device basis. The patch series requests non-strict
DMA for the Qualcomm SDHCI controller as a first device to enable,
getting a nice bump in performance with what's believed to be a very
small drop in security / safety (see the patch for the full argument).

As part of this patch series I end up slightly cleaning up some of
the interactions between the PCI subsystem and the IOMMU subsystem but
I don't go all the way to fully remove all the tentacles. Specifically
this patch series only concerns itself with a single aspect: strict
vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
to talk about / reason about for more subsystems compared to overall
deciding what it means for a device to be "external" or "untrusted".

If something like this patch series ends up being landable, it will
undoubtedly need coordination between many maintainers to land. I
believe it's fully bisectable but later patches in the series
definitely depend on earlier ones. Sorry for the long CC list. :(


Unfortunately, this doesn't work. In normal operation, the default
domains should be established long before individual drivers are even
loaded (if they are modules), let alone anywhere near probing. The fact
that iommu_probe_device() sometimes gets called far too late off the
back of driver probe is an unfortunate artefact of the original
probe-deferral scheme, and causes other problems like potentially
malformed groups - I've been forming a plan to fix that for a while now,
so I for one really can't condone anything trying to rely on it.
Non-deterministic behaviour based on driver probe order for multi-device
groups is part of the existing problem, and your proposal seems equally
vulnerable to that too.


Doh! :( I definitely can't say I understand the iommu subsystem
amazingly well. It was working for me, but I could believe that I was
somehow violating a rule somewhere.

I'm having a bit of a hard time understanding where the problem is
though. Is there any chance that you missed the part of my series
where I introduced a "pre_probe" step? Specifically, I see this:

* really_probe() is called w/ a driver and a device.
* -> calls dev->bus->dma_configure() w/ a "struct device *"
* -> eventually calls iommu_probe_device() w/ the device.


This...


* -> calls iommu_alloc_default_domain() w/ the device
* -> calls iommu_group_alloc_default_domain()
* -> always allocates a new domain

...so we always have a "struct device" when a domain is allocated if
that domain is going to be associated with a device.

I will agree that iommu_probe_device() is called before the driver
probe, but unless I missed something it's after the device driver is
loaded.  ...and assuming something like patch #1 in this series looks
OK then iommu_probe_device() will be called after "pre_probe".

So assuming I'm not missing something, I'm not actually relying on the
IOMMU getting init off the back of driver probe.


...is implicitly that. Sorry that it's not obvious.

The "proper" flow is that iommu_probe_device() is called for everything 
which already exists during the IOMMU driver's own probe when it calls 
bus_set_iommu(), then at BUS_NOTIFY_ADD_DEVICE time for everything which 
appears subsequently. The only trouble is, to observe it in action on a 
DT-based system you'd currently have to go back at least 4 years, before 
09515ef5ddad...


Basically there were two issues: firstly we need the of_xlate step 
before add_device (now probe_device) for a DT-based IOMMU driver to know 
whether it should care about the given device or not. When -EPROBE_DEFER 
was the only tool we had to ensure probe ordering, and resolving the 
"iommus" DT property the only place to decide that, delaying it all 
until driver probe time was the only reasonable option, however ugly. 
The iommu_probe_device() "replay" in {of,acpi}_iommu_configure() is 
merely doing its best to fake up the previous behaviour. Try binding a 
dummy driver to your device first, then unbind it and bind the real one, 
and you'll see that iommu_probe_device() doesn't run the second or 
subsequent times. Now that we have fw_devlink to take care of ordering, 
the main reason for this weirdness is largely gone, so I'm keen to start 
getting rid of the divergence again as far as possible. Fundamentally, 
IOMMU drivers are supposed to be aware of all devices which the kernel 
knows about, regardless of whether they have a driver available or not.


The second issue is that when we have multiple IOMMU instances, the 
initial bus_set_iommu() "replay" is only useful for the first instance, 
so devices managed by other instances which aren't up and running yet 
will be glossed over. Currently this ends up being papered over by the 
solution to the first 

[GIT PULL] (stable) stable/for-linus-5.14

2021-06-22 Thread Konrad Rzeszutek Wilk
Hey Linus,

Please git pull the following branch:

git pull git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git 
stable/for-linus-5.14

which has the fix for the regression in the DMA operations where the offset
was ignored and corruption would appear.

Going forward there will be cleanups to make the offset and alignment
logic clearer, and better test cases to help with this.

Bumyong Lee (1):
  swiotlb: manipulate orig_addr when tlb_addr has offset

 kernel/dma/swiotlb.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)


Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

2021-06-22 Thread Konrad Rzeszutek Wilk
On Tue, Jun 22, 2021 at 04:48:24PM +0900, 'Dominique MARTINET' wrote:
> Konrad Rzeszutek Wilk wrote on Mon, Jun 21, 2021 at 09:16:43AM -0400:
> > The beauty of 'devel' and 'linux-next' is that they can be reshuffled and
> > mangled. I pushed the original patch from Bumyong there and will let
> > it sit for a day and then create a stable branch and give it to Linus.
> 
> Thanks, that should be good.
> 
> Do you want me to send a follow-up patch with the two extra checks
> (tlb_addr & (IO_TLB_SIZE -1)) > swiotlb_align_offset(dev, orig_addr)
> tlb_offset < alloc_size
> 
> or are we certain this can't ever happen?

I would love more patches and I saw the previous one you posted.

But we only got two (or one) weeks before the next merge window opens
so I am sending to Linus the one that was tested with NVMe and crypto
(see above).

That is the
https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=stable/for-linus-5.14

And then after Linus releases the 5.14 - I would love to take your
cleanup on top of that and test it?

> (I didn't see any hit in dmesg when I ran with these, but my opinion is
> better safe than sorry...)
> 
> 
> > Then I need to expand the test-regression bucket so that this does not
> > happen again. Dominique, how easy would it be to purchase one of those
> > devices?
> 
> My company is making such a device, but it's not on the market yet
> (was planned for August, with some delay in procurement it'll
> probably be a bit late), and would mean buying from Japan so I'm not
> sure how convenient that would be...
> 
> These are originally NXP devices so I assume Horia would have better
> suggestions, if you would?
> 
> 
> > I was originally thinking to create a crypto device in QEMU to simulate
> > this but that may take longer to write than just getting the real thing.
> > 
> > Or I could create some fake devices with weird offsets and write a driver
> > for it to exercise this.. like this one I had done some time ago that
> > needs some brushing off.
> 
> Just a fake device with fake offsets as a test is probably good enough,
> ideally would need to exercise both failures we've seen (offset in
> dma_sync_single_for_device like caam does and in the original mapping (I
> assume?) like the NVMe driver does), but that sounds possible :)

Yup. Working on that now.
> 
> 
> Thanks again!
> -- 
> Dominique


Re: [PATCH v14 01/12] swiotlb: Refactor swiotlb init functions

2021-06-22 Thread Stefano Stabellini
On Sat, 19 Jun 2021, Claire Chang wrote:
> Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
> initialization to make the code reusable.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

Acked-by: Stefano Stabellini 


> ---
>  kernel/dma/swiotlb.c | 50 ++--
>  1 file changed, 25 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 52e2ac526757..1f9b2b9e7490 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
>   memset(vaddr, 0, bytes);
>  }
>  
> -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> + unsigned long nslabs, bool late_alloc)
>  {
> + void *vaddr = phys_to_virt(start);
>   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
> +
> + mem->nslabs = nslabs;
> + mem->start = start;
> + mem->end = mem->start + bytes;
> + mem->index = 0;
> + mem->late_alloc = late_alloc;
> + spin_lock_init(&mem->lock);
> + for (i = 0; i < mem->nslabs; i++) {
> + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> + mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> + mem->slots[i].alloc_size = 0;
> + }
> + memset(vaddr, 0, bytes);
> +}
> +
> +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> +{
>   struct io_tlb_mem *mem;
>   size_t alloc_size;
>  
> @@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>   if (!mem)
>   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> __func__, alloc_size, PAGE_SIZE);
> - mem->nslabs = nslabs;
> - mem->start = __pa(tlb);
> - mem->end = mem->start + bytes;
> - mem->index = 0;
> - spin_lock_init(&mem->lock);
> - for (i = 0; i < mem->nslabs; i++) {
> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> - mem->slots[i].alloc_size = 0;
> - }
> +
> + swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
>  
>   io_tlb_default_mem = mem;
>   if (verbose)
> @@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
>  int
>  swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>  {
> - unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
>   struct io_tlb_mem *mem;
> + unsigned long bytes = nslabs << IO_TLB_SHIFT;
>  
>   if (swiotlb_force == SWIOTLB_NO_FORCE)
>   return 0;
> @@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>   if (!mem)
>   return -ENOMEM;
>  
> - mem->nslabs = nslabs;
> - mem->start = virt_to_phys(tlb);
> - mem->end = mem->start + bytes;
> - mem->index = 0;
> - mem->late_alloc = 1;
> - spin_lock_init(>lock);
> - for (i = 0; i < mem->nslabs; i++) {
> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> - mem->slots[i].alloc_size = 0;
> - }
> -
> + memset(mem, 0, sizeof(*mem));
>   set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
> - memset(tlb, 0, bytes);
> + swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
>  
>   io_tlb_default_mem = mem;
>   swiotlb_print_info();
> -- 
> 2.32.0.288.g62a8d224e6-goog
> 


Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC

2021-06-22 Thread Doug Anderson
Hi,

On Tue, Jun 22, 2021 at 1:06 PM Saravana Kannan  wrote:
>
> On Tue, Jun 22, 2021 at 1:02 PM Rob Herring  wrote:
> >
> > On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy  wrote:
> > > >
> > > > Hi Doug,
> > > >
> > > > On 2021-06-22 00:52, Douglas Anderson wrote:
> > > > >
> > > > > This patch attempts to put forward a proposal for enabling non-strict
> > > > > DMA on a device-by-device basis. The patch series requests non-strict
> > > > > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > > > > getting a nice bump in performance with what's believed to be a very
> > > > > small drop in security / safety (see the patch for the full argument).
> > > > >
> > > > > As part of this patch series I end up slightly cleaning up some of
> > > > > the interactions between the PCI subsystem and the IOMMU subsystem but
> > > > > I don't go all the way to fully remove all the tentacles. Specifically
> > > > > this patch series only concerns itself with a single aspect: strict
> > > > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > > > > to talk about / reason about for more subsystems compared to overall
> > > > > deciding what it means for a device to be "external" or "untrusted".
> > > > >
> > > > > If something like this patch series ends up being landable, it will
> > > > > undoubtedly need coordination between many maintainers to land. I
> > > > > believe it's fully bisectable but later patches in the series
> > > > > definitely depend on earlier ones. Sorry for the long CC list. :(
> > > >
> > > > Unfortunately, this doesn't work. In normal operation, the default
> > > > domains should be established long before individual drivers are even
> > > > loaded (if they are modules), let alone anywhere near probing. The fact
> > > > that iommu_probe_device() sometimes gets called far too late off the
> > > > back of driver probe is an unfortunate artefact of the original
> > > > probe-deferral scheme, and causes other problems like potentially
> > > > malformed groups - I've been forming a plan to fix that for a while now,
> > > > so I for one really can't condone anything trying to rely on it.
> > > > Non-deterministic behaviour based on driver probe order for multi-device
> > > > groups is part of the existing problem, and your proposal seems equally
> > > > vulnerable to that too.
> > >
> > > Doh! :( I definitely can't say I understand the iommu subsystem
> > > amazingly well. It was working for me, but I could believe that I was
> > > somehow violating a rule somewhere.
> > >
> > > I'm having a bit of a hard time understanding where the problem is
> > > though. Is there any chance that you missed the part of my series
> > > where I introduced a "pre_probe" step? Specifically, I see this:
> > >
> > > * really_probe() is called w/ a driver and a device.
> > > * -> calls dev->bus->dma_configure() w/ a "struct device *"
> > > * -> eventually calls iommu_probe_device() w/ the device.
> > > * -> calls iommu_alloc_default_domain() w/ the device
> > > * -> calls iommu_group_alloc_default_domain()
> > > * -> always allocates a new domain
> > >
> > > ...so we always have a "struct device" when a domain is allocated if
> > > that domain is going to be associated with a device.
> > >
> > > I will agree that iommu_probe_device() is called before the driver
> > > probe, but unless I missed something it's after the device driver is
> > > loaded.  ...and assuming something like patch #1 in this series looks
> > > OK then iommu_probe_device() will be called after "pre_probe".
> > >
> > > So assuming I'm not missing something, I'm not actually relying on the
> > > IOMMU getting init off the back of driver probe.
> > >
> > >
> > > > FWIW we already have a go-faster knob for people who want to tweak the
> > > > security/performance compromise for specific devices, namely the sysfs
> > > > interface for changing a group's domain type before binding the relevant
> > > > driver(s). Is that something you could use in your application, say from
> > > > an initramfs script?
> > >
> > > We've never had an initramfs script in Chrome OS. I don't know all the
> > > history of why (I'm trying to check), but I'm nearly certain it was a
> > > conscious decision. Probably it has to do with the fact that we're not
> > > trying to build a generic distribution where a single boot source can
> > > boot a huge variety of hardware. We generally have one kernel for a
> > > class of devices. I believe avoiding the initramfs just keeps things
> > > simpler.
> > >
> > > I think trying to revamp Chrome OS to switch to an initramfs type
> > > system would be a pretty big undertaking since (as I understand it)
> > > you can't just run a little command and then return to the normal boot
> > > flow. Once you switch to initramfs you're committing to finding /
> > > setting up the rootfs yourself and on Chrome OS I believe that means 

Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC

2021-06-22 Thread Saravana Kannan via iommu
On Tue, Jun 22, 2021 at 1:02 PM Rob Herring  wrote:
>
> On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy  wrote:
> > >
> > > Hi Doug,
> > >
> > > On 2021-06-22 00:52, Douglas Anderson wrote:
> > > >
> > > > This patch attempts to put forward a proposal for enabling non-strict
> > > > DMA on a device-by-device basis. The patch series requests non-strict
> > > > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > > > getting a nice bump in performance with what's believed to be a very
> > > > small drop in security / safety (see the patch for the full argument).
> > > >
> > > > As part of this patch series I end up slightly cleaning up some of
> > > > the interactions between the PCI subsystem and the IOMMU subsystem but
> > > > I don't go all the way to fully remove all the tentacles. Specifically
> > > > this patch series only concerns itself with a single aspect: strict
> > > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > > > to talk about / reason about for more subsystems compared to overall
> > > > deciding what it means for a device to be "external" or "untrusted".
> > > >
> > > > If something like this patch series ends up being landable, it will
> > > > undoubtedly need coordination between many maintainers to land. I
> > > > believe it's fully bisectable but later patches in the series
> > > > definitely depend on earlier ones. Sorry for the long CC list. :(
> > >
> > > Unfortunately, this doesn't work. In normal operation, the default
> > > domains should be established long before individual drivers are even
> > > loaded (if they are modules), let alone anywhere near probing. The fact
> > > that iommu_probe_device() sometimes gets called far too late off the
> > > back of driver probe is an unfortunate artefact of the original
> > > probe-deferral scheme, and causes other problems like potentially
> > > malformed groups - I've been forming a plan to fix that for a while now,
> > > so I for one really can't condone anything trying to rely on it.
> > > Non-deterministic behaviour based on driver probe order for multi-device
> > > groups is part of the existing problem, and your proposal seems equally
> > > vulnerable to that too.
> >
> > Doh! :( I definitely can't say I understand the iommu subsystem
> > amazingly well. It was working for me, but I could believe that I was
> > somehow violating a rule somewhere.
> >
> > I'm having a bit of a hard time understanding where the problem is
> > though. Is there any chance that you missed the part of my series
> > where I introduced a "pre_probe" step? Specifically, I see this:
> >
> > * really_probe() is called w/ a driver and a device.
> > * -> calls dev->bus->dma_configure() w/ a "struct device *"
> > * -> eventually calls iommu_probe_device() w/ the device.
> > * -> calls iommu_alloc_default_domain() w/ the device
> > * -> calls iommu_group_alloc_default_domain()
> > * -> always allocates a new domain
> >
> > ...so we always have a "struct device" when a domain is allocated if
> > that domain is going to be associated with a device.
> >
> > I will agree that iommu_probe_device() is called before the driver
> > probe, but unless I missed something it's after the device driver is
> > loaded.  ...and assuming something like patch #1 in this series looks
> > OK then iommu_probe_device() will be called after "pre_probe".
> >
> > So assuming I'm not missing something, I'm not actually relying on the
> > IOMMU getting init off the back of driver probe.
> >
> >
> > > FWIW we already have a go-faster knob for people who want to tweak the
> > > security/performance compromise for specific devices, namely the sysfs
> > > interface for changing a group's domain type before binding the relevant
> > > driver(s). Is that something you could use in your application, say from
> > > an initramfs script?
> >
> > We've never had an initramfs script in Chrome OS. I don't know all the
> > history of why (I'm trying to check), but I'm nearly certain it was a
> > conscious decision. Probably it has to do with the fact that we're not
> > trying to build a generic distribution where a single boot source can
> > boot a huge variety of hardware. We generally have one kernel for a
> > class of devices. I believe avoiding the initramfs just keeps things
> > simpler.
> >
> > I think trying to revamp Chrome OS to switch to an initramfs type
> > system would be a pretty big undertaking since (as I understand it)
> > you can't just run a little command and then return to the normal boot
> > flow. Once you switch to initramfs you're committing to finding /
> > setting up the rootfs yourself and on Chrome OS I believe that means a
> > whole bunch of dm-verity work.
> >
> >
> > ...so probably the initramfs is a no-go for me, but I'm still crossing
> > my fingers that the pre_probe() might be legit if you take a second
> > look at it?
>
> Couldn't you have a driver 

Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC

2021-06-22 Thread Rob Herring
On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy  wrote:
> >
> > Hi Doug,
> >
> > On 2021-06-22 00:52, Douglas Anderson wrote:
> > >
> > > This patch attempts to put forward a proposal for enabling non-strict
> > > DMA on a device-by-device basis. The patch series requests non-strict
> > > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > > getting a nice bump in performance with what's believed to be a very
> > > small drop in security / safety (see the patch for the full argument).
> > >
> > > As part of this patch series I end up slightly cleaning up some of
> > > the interactions between the PCI subsystem and the IOMMU subsystem but
> > > I don't go all the way to fully remove all the tentacles. Specifically
> > > this patch series only concerns itself with a single aspect: strict
> > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > > to talk about / reason about for more subsystems compared to overall
> > > deciding what it means for a device to be "external" or "untrusted".
> > >
> > > If something like this patch series ends up being landable, it will
> > > undoubtedly need coordination between many maintainers to land. I
> > > believe it's fully bisectable but later patches in the series
> > > definitely depend on earlier ones. Sorry for the long CC list. :(
> >
> > Unfortunately, this doesn't work. In normal operation, the default
> > domains should be established long before individual drivers are even
> > loaded (if they are modules), let alone anywhere near probing. The fact
> > that iommu_probe_device() sometimes gets called far too late off the
> > back of driver probe is an unfortunate artefact of the original
> > probe-deferral scheme, and causes other problems like potentially
> > malformed groups - I've been forming a plan to fix that for a while now,
> > so I for one really can't condone anything trying to rely on it.
> > Non-deterministic behaviour based on driver probe order for multi-device
> > groups is part of the existing problem, and your proposal seems equally
> > vulnerable to that too.
> 
> Doh! :( I definitely can't say I understand the iommu subsystem
> amazingly well. It was working for me, but I could believe that I was
> somehow violating a rule somewhere.
> 
> I'm having a bit of a hard time understanding where the problem is
> though. Is there any chance that you missed the part of my series
> where I introduced a "pre_probe" step? Specifically, I see this:
> 
> * really_probe() is called w/ a driver and a device.
> * -> calls dev->bus->dma_configure() w/ a "struct device *"
> * -> eventually calls iommu_probe_device() w/ the device.
> * -> calls iommu_alloc_default_domain() w/ the device
> * -> calls iommu_group_alloc_default_domain()
> * -> always allocates a new domain
> 
> ...so we always have a "struct device" when a domain is allocated if
> that domain is going to be associated with a device.
> 
> I will agree that iommu_probe_device() is called before the driver
> probe, but unless I missed something it's after the device driver is
> loaded.  ...and assuming something like patch #1 in this series looks
> OK then iommu_probe_device() will be called after "pre_probe".
> 
> So assuming I'm not missing something, I'm not actually relying on the
> IOMMU getting init off the back of driver probe.
> 
> 
> > FWIW we already have a go-faster knob for people who want to tweak the
> > security/performance compromise for specific devices, namely the sysfs
> > interface for changing a group's domain type before binding the relevant
> > driver(s). Is that something you could use in your application, say from
> > an initramfs script?
> 
> We've never had an initramfs script in Chrome OS. I don't know all the
> history of why (I'm trying to check), but I'm nearly certain it was a
> conscious decision. Probably it has to do with the fact that we're not
> trying to build a generic distribution where a single boot source can
> boot a huge variety of hardware. We generally have one kernel for a
> class of devices. I believe avoiding the initramfs just keeps things
> simpler.
> 
> I think trying to revamp Chrome OS to switch to an initramfs type
> system would be a pretty big undertaking since (as I understand it)
> you can't just run a little command and then return to the normal boot
> flow. Once you switch to initramfs you're committing to finding /
> setting up the rootfs yourself and on Chrome OS I believe that means a
> whole bunch of dm-verity work.
> 
> 
> ...so probably the initramfs is a no-go for me, but I'm still crossing
> my fingers that the pre_probe() might be legit if you take a second
> look at it?

Couldn't you have a driver flag that has the same effect as twiddling 
sysfs? At the beginning of probe, check the flag and go set the underlying
sysfs setting in the device.

Though you may want this to be per device, not per driver. To do that 
early, I 

Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default

2021-06-22 Thread Saravana Kannan via iommu
On Tue, Jun 22, 2021 at 9:46 AM Doug Anderson  wrote:
>
> Hi,
>
> On Mon, Jun 21, 2021 at 7:56 PM Saravana Kannan  wrote:
> >
> > On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson  
> > wrote:
> > >
> > > In the patch ("drivers: base: Add bits to struct device to control
> > > iommu strictness") we add the ability for devices to tell us about
> > > their IOMMU strictness requirements. Let's now take that into account
> > > in the IOMMU layer.
> > >
> > > A few notes here:
> > > * Presumably this is always how iommu_get_dma_strict() was intended to
> > >   behave. Had this not been the intention then it never would have
> > >   taken a domain as a parameter.
> > > * The iommu_set_dma_strict() feels awfully non-symmetric now. That
> > >   function sets the _default_ strictness globally in the system
> > >   whereas iommu_get_dma_strict() returns the value for a given domain
> > >   (falling back to the default). Presumably, at least, the fact that
> > >   iommu_set_dma_strict() doesn't take a domain makes this obvious.
> > >
> > > The function iommu_get_dma_strict() should now make it super obvious
> > > where strictness comes from and who overrides who. Though the function
> > > changed a bunch to make the logic clearer, the only two new rules
> > > should be:
> > > * Devices can force strictness for themselves, overriding the cmdline
> > >   "iommu.strict=0" or a call to iommu_set_dma_strict(false).
> > > * Devices can request non-strictness for themselves, assuming there
> > >   was no cmdline "iommu.strict=1" or a call to
> > >   iommu_set_dma_strict(true).
> > >
> > > Signed-off-by: Douglas Anderson 
> > > ---
> > >
> > >  drivers/iommu/iommu.c | 56 +--
> > >  include/linux/iommu.h |  2 ++
> > >  2 files changed, 45 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > index 808ab70d5df5..0c84a4c06110 100644
> > > --- a/drivers/iommu/iommu.c
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -28,8 +28,19 @@
> > >  static struct kset *iommu_group_kset;
> > >  static DEFINE_IDA(iommu_group_ida);
> > >
> > > +enum iommu_strictness {
> > > +   IOMMU_DEFAULT_STRICTNESS = -1,
> > > +   IOMMU_NOT_STRICT = 0,
> > > +   IOMMU_STRICT = 1,
> > > +};
> > > +static inline enum iommu_strictness bool_to_strictness(bool strictness)
> > > +{
> > > +   return (enum iommu_strictness)strictness;
> > > +}
> > > +
> > >  static unsigned int iommu_def_domain_type __read_mostly;
> > > -static bool iommu_dma_strict __read_mostly = true;
> > > +static enum iommu_strictness cmdline_dma_strict __read_mostly = 
> > > IOMMU_DEFAULT_STRICTNESS;
> > > +static enum iommu_strictness driver_dma_strict __read_mostly = 
> > > IOMMU_DEFAULT_STRICTNESS;
> > >  static u32 iommu_cmd_line __read_mostly;
> > >
> > >  struct iommu_group {
> > > @@ -69,7 +80,6 @@ static const char * const 
> > > iommu_group_resv_type_string[] = {
> > >  };
> > >
> > >  #define IOMMU_CMD_LINE_DMA_API BIT(0)
> > > -#define IOMMU_CMD_LINE_STRICT  BIT(1)
> > >
> > >  static int iommu_alloc_default_domain(struct iommu_group *group,
> > >   struct device *dev);
> > > @@ -336,25 +346,38 @@ early_param("iommu.passthrough", 
> > > iommu_set_def_domain_type);
> > >
> > >  static int __init iommu_dma_setup(char *str)
> > >  {
> > > -   int ret = kstrtobool(str, &iommu_dma_strict);
> > > +   bool strict;
> > > +   int ret = kstrtobool(str, &strict);
> > >
> > > if (!ret)
> > > -   iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
> > > +   cmdline_dma_strict = bool_to_strictness(strict);
> > > return ret;
> > >  }
> > >  early_param("iommu.strict", iommu_dma_setup);
> > >
> > >  void iommu_set_dma_strict(bool strict)
> > >  {
> > > -   if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
> > > -   iommu_dma_strict = strict;
> > > +   /* A driver can request strictness but not the other way around */
> > > +   if (driver_dma_strict != IOMMU_STRICT)
> > > +   driver_dma_strict = bool_to_strictness(strict);
> > >  }
> > >
> > >  bool iommu_get_dma_strict(struct iommu_domain *domain)
> > >  {
> > > -   /* only allow lazy flushing for DMA domains */
> > > -   if (domain->type == IOMMU_DOMAIN_DMA)
> > > -   return iommu_dma_strict;
> > > +   /* Non-DMA domains or anyone forcing it to strict makes it strict 
> > > */
> > > +   if (domain->type != IOMMU_DOMAIN_DMA ||
> > > +   cmdline_dma_strict == IOMMU_STRICT ||
> > > +   driver_dma_strict == IOMMU_STRICT ||
> > > +   domain->force_strict)
> > > +   return true;
> > > +
> > > +   /* Anyone requesting non-strict (if no forces) makes it 
> > > non-strict */
> > > +   if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
> > > +   driver_dma_strict == IOMMU_NOT_STRICT ||
> > > +   domain->request_non_strict)
> > > +   return 

Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC

2021-06-22 Thread Doug Anderson
Hi,

On Tue, Jun 22, 2021 at 10:46 AM John Garry  wrote:
>
> On 22/06/2021 00:52, Douglas Anderson wrote:
> >
> > This patch attempts to put forward a proposal for enabling non-strict
> > DMA on a device-by-device basis. The patch series requests non-strict
> > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > getting a nice bump in performance with what's believed to be a very
> > small drop in security / safety (see the patch for the full argument).
> >
> > As part of this patch series I end up slightly cleaning up some of
> > the interactions between the PCI subsystem and the IOMMU subsystem but
> > I don't go all the way to fully remove all the tentacles. Specifically
> > this patch series only concerns itself with a single aspect: strict
> > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > to talk about / reason about for more subsystems compared to overall
> > deciding what it means for a device to be "external" or "untrusted".
> >
> > If something like this patch series ends up being landable, it will
> > undoubtedly need coordination between many maintainers to land. I
> > believe it's fully bisectable but later patches in the series
> > definitely depend on earlier ones. Sorry for the long CC list. :(
> >
>
> JFYI, in case you missed it, and I know it's not the same thing as you
> want, above, but the following series will allow you to build the kernel
> to default to lazy mode:
>
> https://lore.kernel.org/linux-iommu/1624016058-189713-1-git-send-email-john.ga...@huawei.com/T/#m21bc07b9353b3ba85f2a40557645c2bcc13cbb3e
>
> So iommu.strict=0 would be no longer always required for arm64.

Excellent! I'm never a fan of command line parameters as a replacement
for Kconfig. They are great for debugging or for cases where the user
of the kernel and the person compiling the kernel are not the same
(like with off-the-shelf Linux distros) but aren't great for setting a
default for embedded environments.

I actually think that something like my patchset may be even more
important atop yours. Do you agree? If the default is non-strict it
could be extra important to be able to mark a certain device to be in
"strict" mode.

...also, unfortunately I probably won't be able to use your patchset
for my use case. I think we want the extra level of paranoia by
default and really only want to allow non-strict mode for devices that
we're really convinced are safe.

-Doug


Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default

2021-06-22 Thread Doug Anderson
Hi,

On Tue, Jun 22, 2021 at 11:45 AM Rajat Jain  wrote:
>
> On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson  
> wrote:
> >
> > In the patch ("drivers: base: Add bits to struct device to control
> > iommu strictness") we add the ability for devices to tell us about
> > their IOMMU strictness requirements. Let's now take that into account
> > in the IOMMU layer.
> >
> > A few notes here:
> > * Presumably this is always how iommu_get_dma_strict() was intended to
> >   behave. Had this not been the intention then it never would have
> >   taken a domain as a parameter.
> > * The iommu_set_dma_strict() feels awfully non-symmetric now. That
> >   function sets the _default_ strictness globally in the system
> >   whereas iommu_get_dma_strict() returns the value for a given domain
> >   (falling back to the default). Presumably, at least, the fact that
> >   iommu_set_dma_strict() doesn't take a domain makes this obvious.
> >
> > The function iommu_get_dma_strict() should now make it super obvious
> > where strictness comes from and who overrides who. Though the function
> > changed a bunch to make the logic clearer, the only two new rules
> > should be:
> > * Devices can force strictness for themselves, overriding the cmdline
> >   "iommu.strict=0" or a call to iommu_set_dma_strict(false).
> > * Devices can request non-strictness for themselves, assuming there
> >   was no cmdline "iommu.strict=1" or a call to
> >   iommu_set_dma_strict(true).
>
> Along the same lines, I believe a platform (device tree / ACPI) should
> also be able to have a say in this. I assume in your proposal, a
> platform would expose a property in device tree which the device
> driver would need to parse and then use it to set these bits in the
> "struct device"?

Nothing would prevent creating a device tree or ACPI property that
caused either "force-strict" or "request-non-strict" from being set if
everyone agrees that it's a good idea. I wouldn't reject the idea
myself, but I do worry that we'd devolve into the usual bikeshed for
exactly how this should look. I talked about this a bit in my response
to Saravana, but basically:

* If there was some generic property, would we call it "untrusted",
"external", or something else?

* How do you describe "trust" in a generic "objective" way? It's not
really boolean and trying to describe exactly how trustworthy
something should be considered is hard.

* At least for the device tree there's a general requirement that it
describes the hardware and not so much how the software should
configure the hardware. As I understand it there is _some_ leeway here
where it's OK to describe how the hardware was designed for the OS to
configure it, but it's a pretty high bar and a hard sell. In general
the device tree isn't supposed to be used to describe "policy". In
other words: if one OS might decide on one setting and another OS on
another then it doesn't really belong in the device tree.

* In general the kernel is also not really supposed to have policy
hardcoded in either, though it feels like we can get away with having
a good default/sane policy and allowing overriding the policy with
command line parameters (like iommu.strict). In the case where
something has to be configured at bootup there's not many ways to do
better.


tl;dr: I have no plans to try to make an overarching property, but my
patch series does allow subsystems to come up with and easily
implement their own rules as it makes sense. While this might seem
hodgepodge I prefer to see it as "flexible" since I'm not convinced
that we're going to be able to come up with an overarching trust
framework.

-Doug


Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default

2021-06-22 Thread Rajat Jain via iommu
On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson  wrote:
>
> In the patch ("drivers: base: Add bits to struct device to control
> iommu strictness") we add the ability for devices to tell us about
> their IOMMU strictness requirements. Let's now take that into account
> in the IOMMU layer.
>
> A few notes here:
> * Presumably this is always how iommu_get_dma_strict() was intended to
>   behave. Had this not been the intention then it never would have
>   taken a domain as a parameter.
> * The iommu_set_dma_strict() feels awfully non-symmetric now. That
>   function sets the _default_ strictness globally in the system
>   whereas iommu_get_dma_strict() returns the value for a given domain
>   (falling back to the default). Presumably, at least, the fact that
>   iommu_set_dma_strict() doesn't take a domain makes this obvious.
>
> The function iommu_get_dma_strict() should now make it super obvious
> where strictness comes from and who overrides who. Though the function
> changed a bunch to make the logic clearer, the only two new rules
> should be:
> * Devices can force strictness for themselves, overriding the cmdline
>   "iommu.strict=0" or a call to iommu_set_dma_strict(false).
> * Devices can request non-strictness for themselves, assuming there
>   was no cmdline "iommu.strict=1" or a call to
>   iommu_set_dma_strict(true).

Along the same lines, I believe a platform (device tree / ACPI) should
also be able to have a say in this. I assume in your proposal, a
platform would expose a property in device tree which the device
driver would need to parse and then use it to set these bits in the
"struct device"?

Thanks,

Rajat



>
> Signed-off-by: Douglas Anderson 
> ---
>
>  drivers/iommu/iommu.c | 56 +--
>  include/linux/iommu.h |  2 ++
>  2 files changed, 45 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 808ab70d5df5..0c84a4c06110 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -28,8 +28,19 @@
>  static struct kset *iommu_group_kset;
>  static DEFINE_IDA(iommu_group_ida);
>
> +enum iommu_strictness {
> +   IOMMU_DEFAULT_STRICTNESS = -1,
> +   IOMMU_NOT_STRICT = 0,
> +   IOMMU_STRICT = 1,
> +};
> +static inline enum iommu_strictness bool_to_strictness(bool strictness)
> +{
> +   return (enum iommu_strictness)strictness;
> +}
> +
>  static unsigned int iommu_def_domain_type __read_mostly;
> -static bool iommu_dma_strict __read_mostly = true;
> +static enum iommu_strictness cmdline_dma_strict __read_mostly = 
> IOMMU_DEFAULT_STRICTNESS;
> +static enum iommu_strictness driver_dma_strict __read_mostly = 
> IOMMU_DEFAULT_STRICTNESS;
>  static u32 iommu_cmd_line __read_mostly;
>
>  struct iommu_group {
> @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = {
>  };
>
>  #define IOMMU_CMD_LINE_DMA_API BIT(0)
> -#define IOMMU_CMD_LINE_STRICT  BIT(1)
>
>  static int iommu_alloc_default_domain(struct iommu_group *group,
>   struct device *dev);
> @@ -336,25 +346,38 @@ early_param("iommu.passthrough", 
> iommu_set_def_domain_type);
>
>  static int __init iommu_dma_setup(char *str)
>  {
> -   int ret = kstrtobool(str, &iommu_dma_strict);
> +   bool strict;
> +   int ret = kstrtobool(str, &strict);
>
> if (!ret)
> -   iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
> +   cmdline_dma_strict = bool_to_strictness(strict);
> return ret;
>  }
>  early_param("iommu.strict", iommu_dma_setup);
>
>  void iommu_set_dma_strict(bool strict)
>  {
> -   if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
> -   iommu_dma_strict = strict;
> +   /* A driver can request strictness but not the other way around */
> +   if (driver_dma_strict != IOMMU_STRICT)
> +   driver_dma_strict = bool_to_strictness(strict);
>  }
>
>  bool iommu_get_dma_strict(struct iommu_domain *domain)
>  {
> -   /* only allow lazy flushing for DMA domains */
> -   if (domain->type == IOMMU_DOMAIN_DMA)
> -   return iommu_dma_strict;
> +   /* Non-DMA domains or anyone forcing it to strict makes it strict */
> +   if (domain->type != IOMMU_DOMAIN_DMA ||
> +   cmdline_dma_strict == IOMMU_STRICT ||
> +   driver_dma_strict == IOMMU_STRICT ||
> +   domain->force_strict)
> +   return true;
> +
> +   /* Anyone requesting non-strict (if no forces) makes it non-strict */
> +   if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
> +   driver_dma_strict == IOMMU_NOT_STRICT ||
> +   domain->request_non_strict)
> +   return false;
> +
> +   /* Nobody said anything, so it's strict by default */
> return true;
>  }
>  EXPORT_SYMBOL_GPL(iommu_get_dma_strict);
> @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev)
>
>  static int iommu_group_alloc_default_domain(struct 

Re: [PATCHv2 1/3] iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk flush

2021-06-22 Thread Robin Murphy

On 2021-06-22 15:27, Sai Prakash Ranjan wrote:

Hi Robin,

On 2021-06-22 17:41, Robin Murphy wrote:

On 2021-06-22 08:11, Sai Prakash Ranjan wrote:

Hi Robin,

On 2021-06-21 21:15, Robin Murphy wrote:

On 2021-06-18 03:51, Sai Prakash Ranjan wrote:

Add a quirk IO_PGTABLE_QUIRK_TLB_INV_ALL to invalidate entire context
with tlb_flush_all() callback in partial walk flush to improve unmap
performance on select few platforms where the cost of over-invalidation
is less than the unmap latency.


I still think this doesn't belong anywhere near io-pgtable at all.
It's a driver-internal decision how exactly it implements a non-leaf
invalidation, and that may be more complex than a predetermined
boolean decision. For example, I've just realised for SMMUv3 we can't
invalidate multiple levels of table at once with a range command,
since if we assume the whole thing is mapped at worst-case page
granularity we may fail to invalidate any parts which are mapped as
intermediate-level blocks. If invalidating a 1GB region (with 4KB
granule) means having to fall back to 256K non-range commands, we may
not want to invalidate by VA then, even though doing so for a 2MB
region is still optimal.

It's also quite feasible that drivers might want to do this for leaf
invalidations too - if you don't like issuing 512 commands to
invalidate 2MB, do you like issuing 511 commands to invalidate 2044KB?
- and at that point the logic really has to be in the driver anyway.



Ok, I will move this to the tlb_flush_walk() functions in the drivers. In the
previous v1 thread you suggested making the choice in the iommu_get_dma_strict()
test; I assume you meant the test in iommu_dma_init_domain() with a flag, or was
it the leaf driver (e.g. arm-smmu.c) test of iommu_get_dma_strict() in
init_domain?


Yes, I meant literally inside the same condition where we currently
set "pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;" in
arm_smmu_init_domain_context().



Ok got it, thanks.

I am still a bit confused about where this flag would be. Should it be
a part of struct iommu_domain?


Well, if you were to rewrite the config with an alternative set of
flush_ops at that point it would be implicit. For a flag, probably
either in arm_smmu_domain or arm_smmu_impl. Maybe a flag would be less
useful than generalising straight to a "maximum number of by-VA
invalidations it's worth sending individually" threshold value?
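
Purely as an illustration of that threshold idea (all names made up, not
taken from the posted patches):

/* Illustrative only: over-invalidate once a walk covers "too many" pages. */
static void example_tlb_flush_walk(struct example_domain *dom,
				   unsigned long iova, size_t size,
				   size_t granule)
{
	/* dom->inv_max_ops is a hypothetical per-domain threshold */
	if (size / granule > dom->inv_max_ops)
		example_tlb_flush_all(dom);	/* cheaper than many by-VA commands */
	else
		example_tlb_flush_range(dom, iova, size, granule);
}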


But then we would still need some flag to make this implementation
specific (qcom specific for now) and this threshold would just be
another condition, although it would have been useful if this was
generic enough.


Well, for that approach I assume we could do something like special-case 
0, or if it's a mutable per-domain value maybe just initialise it to 
SIZE_MAX or whatever such that it would never be reached in practice. 
Whichever way, it was meant to be implied that anything at the domain 
level would still be subject to final adjustment by the init_context hook.


Robin.


It's clear to me what overall shape and separation of responsibility is
most logical, but beyond that I don't have a particularly strong
opinion on the exact implementation; I've just been chucking ideas
around :)



Your ideas are very informative and useful :)

Thanks,
Sai




Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC

2021-06-22 Thread John Garry

On 22/06/2021 00:52, Douglas Anderson wrote:


This patch attempts to put forward a proposal for enabling non-strict
DMA on a device-by-device basis. The patch series requests non-strict
DMA for the Qualcomm SDHCI controller as a first device to enable,
getting a nice bump in performance with what's believed to be a very
small drop in security / safety (see the patch for the full argument).

As part of this patch series I end up slightly cleaning up some of
the interactions between the PCI subsystem and the IOMMU subsystem but
I don't go all the way to fully remove all the tentacles. Specifically
this patch series only concerns itself with a single aspect: strict
vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
to talk about / reason about for more subsystems compared to overall
deciding what it means for a device to be "external" or "untrusted".

If something like this patch series ends up being landable, it will
undoubtedly need coordination between many maintainers to land. I
believe it's fully bisectable but later patches in the series
definitely depend on earlier ones. Sorry for the long CC list. :(



JFYI, in case you missed it, and I know it's not the same thing as you
want, above, but the following series will allow you to build the kernel 
to default to lazy mode:


https://lore.kernel.org/linux-iommu/1624016058-189713-1-git-send-email-john.ga...@huawei.com/T/#m21bc07b9353b3ba85f2a40557645c2bcc13cbb3e

So iommu.strict=0 would be no longer always required for arm64.

Thanks,
John




Douglas Anderson (6):
   drivers: base: Add the concept of "pre_probe" to drivers
   drivers: base: Add bits to struct device to control iommu strictness
   PCI: Indicate that we want to force strict DMA for untrusted devices
   iommu: Combine device strictness requests with the global default
   iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
   mmc: sdhci-msm: Request non-strict IOMMU mode

  drivers/base/dd.c | 10 +--
  drivers/iommu/dma-iommu.c |  2 +-
  drivers/iommu/iommu.c | 56 +++
  drivers/mmc/host/sdhci-msm.c  |  8 +
  drivers/pci/probe.c   |  4 ++-
  include/linux/device.h| 11 +++
  include/linux/device/driver.h |  9 ++
  include/linux/iommu.h |  2 ++
  8 files changed, 85 insertions(+), 17 deletions(-)





Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default

2021-06-22 Thread Doug Anderson
Hi,

On Tue, Jun 22, 2021 at 9:53 AM Doug Anderson  wrote:
>
> Hi,
>
> On Mon, Jun 21, 2021 at 7:05 PM Lu Baolu  wrote:
> >
> > On 6/22/21 7:52 AM, Douglas Anderson wrote:
> > > @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device 
> > > *dev)
> > >
> > >   static int iommu_group_alloc_default_domain(struct bus_type *bus,
> > >   struct iommu_group *group,
> > > - unsigned int type)
> > > + unsigned int type,
> > > + struct device *dev)
> > >   {
> > >   struct iommu_domain *dom;
> > >
> > > @@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct 
> > > bus_type *bus,
> > >   if (!dom)
> > >   return -ENOMEM;
> > >
> > > + /* Save the strictness requests from the device */
> > > + if (dev && type == IOMMU_DOMAIN_DMA) {
> > > + dom->request_non_strict = dev->request_non_strict_iommu;
> > > + dom->force_strict = dev->force_strict_iommu;
> > > + }
> > > +
> >
> > An iommu default domain might be used by multiple devices which might
> > have different "strict" attributions. Then who could override who?
>
> My gut instinct would be that if multiple devices were part of a given
> domain that it would be combined like this:
>
> 1. Any device that requests strict makes the domain force strict.
>
> 2. To request non-strict all of the devices in the domain would have
> to request non-strict.
>
> To do that I'd have to change my patchset obviously, but I don't think
> it should be hard. We can just keep a count of devices and a count of
> the strict vs. non-strict requests? If there are no other blockers
> I'll try to do that in my v2.

One issue, I guess, is that we might need to transition a non-strict
domain to become strict. Is that possible? We'd end up with an extra
"flush queue" that we didn't need to allocate, but are there other
problems?

Actually, in general would it be possible to transition between strict
and non-strict at runtime as long as we called
init_iova_flush_queue()? Maybe that's a better solution than all
this--we just boot in strict mode and can just transition to
non-strict mode after-the-fact?


-Doug
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default

2021-06-22 Thread Doug Anderson
Hi,

On Mon, Jun 21, 2021 at 7:05 PM Lu Baolu  wrote:
>
> On 6/22/21 7:52 AM, Douglas Anderson wrote:
> > @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device 
> > *dev)
> >
> >   static int iommu_group_alloc_default_domain(struct bus_type *bus,
> >   struct iommu_group *group,
> > - unsigned int type)
> > + unsigned int type,
> > + struct device *dev)
> >   {
> >   struct iommu_domain *dom;
> >
> > @@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct 
> > bus_type *bus,
> >   if (!dom)
> >   return -ENOMEM;
> >
> > + /* Save the strictness requests from the device */
> > + if (dev && type == IOMMU_DOMAIN_DMA) {
> > + dom->request_non_strict = dev->request_non_strict_iommu;
> > + dom->force_strict = dev->force_strict_iommu;
> > + }
> > +
>
> An iommu default domain might be used by multiple devices which might
> have different "strict" attributions. Then who could override who?

My gut instinct would be that if multiple devices were part of a given
domain that it would be combined like this:

1. Any device that requests strict makes the domain force strict.

2. To request non-strict all of the devices in the domain would have
to request non-strict.

To do that I'd have to change my patchset obviously, but I don't think
it should be hard. We can just keep a count of devices and a count of
the strict vs. non-strict requests? If there are no other blockers
I'll try to do that in my v2.
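
Roughly, I'm picturing something like this (purely illustrative, not
what's in the posted series; the struct and field names are made up):

struct iommu_group_strictness {
	unsigned int num_devices;
	unsigned int num_force_strict;
	unsigned int num_request_non_strict;
};

static bool group_wants_strict(const struct iommu_group_strictness *s)
{
	/* rule 1: a single strict request forces the whole group strict */
	if (s->num_force_strict)
		return true;

	/* rule 2: non-strict only if every device asked for it */
	if (s->num_devices && s->num_request_non_strict == s->num_devices)
		return false;

	/* nobody said anything, so stay strict by default */
	return true;
}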

-Doug
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default

2021-06-22 Thread Doug Anderson
Hi,

On Mon, Jun 21, 2021 at 7:56 PM Saravana Kannan  wrote:
>
> On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson  
> wrote:
> >
> > In the patch ("drivers: base: Add bits to struct device to control
> > iommu strictness") we add the ability for devices to tell us about
> > their IOMMU strictness requirements. Let's now take that into account
> > in the IOMMU layer.
> >
> > A few notes here:
> > * Presumably this is always how iommu_get_dma_strict() was intended to
> >   behave. Had this not been the intention then it never would have
> >   taken a domain as a parameter.
> > * The iommu_set_dma_strict() feels awfully non-symmetric now. That
> >   function sets the _default_ strictness globally in the system
> >   whereas iommu_get_dma_strict() returns the value for a given domain
> >   (falling back to the default). Presumably, at least, the fact that
> >   iommu_set_dma_strict() doesn't take a domain makes this obvious.
> >
> > The function iommu_get_dma_strict() should now make it super obvious
> > where strictness comes from and who overides who. Though the function
> > changed a bunch to make the logic clearer, the only two new rules
> > should be:
> > * Devices can force strictness for themselves, overriding the cmdline
> >   "iommu.strict=0" or a call to iommu_set_dma_strict(false)).
> > * Devices can request non-strictness for themselves, assuming there
> >   was no cmdline "iommu.strict=1" or a call to
> >   iommu_set_dma_strict(true).
> >
> > Signed-off-by: Douglas Anderson 
> > ---
> >
> >  drivers/iommu/iommu.c | 56 +--
> >  include/linux/iommu.h |  2 ++
> >  2 files changed, 45 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 808ab70d5df5..0c84a4c06110 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -28,8 +28,19 @@
> >  static struct kset *iommu_group_kset;
> >  static DEFINE_IDA(iommu_group_ida);
> >
> > +enum iommu_strictness {
> > +   IOMMU_DEFAULT_STRICTNESS = -1,
> > +   IOMMU_NOT_STRICT = 0,
> > +   IOMMU_STRICT = 1,
> > +};
> > +static inline enum iommu_strictness bool_to_strictness(bool strictness)
> > +{
> > +   return (enum iommu_strictness)strictness;
> > +}
> > +
> >  static unsigned int iommu_def_domain_type __read_mostly;
> > -static bool iommu_dma_strict __read_mostly = true;
> > +static enum iommu_strictness cmdline_dma_strict __read_mostly = 
> > IOMMU_DEFAULT_STRICTNESS;
> > +static enum iommu_strictness driver_dma_strict __read_mostly = 
> > IOMMU_DEFAULT_STRICTNESS;
> >  static u32 iommu_cmd_line __read_mostly;
> >
> >  struct iommu_group {
> > @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] 
> > = {
> >  };
> >
> >  #define IOMMU_CMD_LINE_DMA_API BIT(0)
> > -#define IOMMU_CMD_LINE_STRICT  BIT(1)
> >
> >  static int iommu_alloc_default_domain(struct iommu_group *group,
> >   struct device *dev);
> > @@ -336,25 +346,38 @@ early_param("iommu.passthrough", 
> > iommu_set_def_domain_type);
> >
> >  static int __init iommu_dma_setup(char *str)
> >  {
> > -   int ret = kstrtobool(str, _dma_strict);
> > +   bool strict;
> > +   int ret = kstrtobool(str, );
> >
> > if (!ret)
> > -   iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
> > +   cmdline_dma_strict = bool_to_strictness(strict);
> > return ret;
> >  }
> >  early_param("iommu.strict", iommu_dma_setup);
> >
> >  void iommu_set_dma_strict(bool strict)
> >  {
> > -   if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
> > -   iommu_dma_strict = strict;
> > +   /* A driver can request strictness but not the other way around */
> > +   if (driver_dma_strict != IOMMU_STRICT)
> > +   driver_dma_strict = bool_to_strictness(strict);
> >  }
> >
> >  bool iommu_get_dma_strict(struct iommu_domain *domain)
> >  {
> > -   /* only allow lazy flushing for DMA domains */
> > -   if (domain->type == IOMMU_DOMAIN_DMA)
> > -   return iommu_dma_strict;
> > +   /* Non-DMA domains or anyone forcing it to strict makes it strict */
> > +   if (domain->type != IOMMU_DOMAIN_DMA ||
> > +   cmdline_dma_strict == IOMMU_STRICT ||
> > +   driver_dma_strict == IOMMU_STRICT ||
> > +   domain->force_strict)
> > +   return true;
> > +
> > +   /* Anyone requesting non-strict (if no forces) makes it non-strict 
> > */
> > +   if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
> > +   driver_dma_strict == IOMMU_NOT_STRICT ||
> > +   domain->request_non_strict)
> > +   return false;
> > +
> > +   /* Nobody said anything, so it's strict by default */
>
> If iommu.strict is not set in the command line, upstream treats it as
> iommu.strict=1. Meaning, no drivers can override it.
>
> If I understand it correctly, with your series, if iommu.strict=1 is
> not set, 

Re: [PATCH] dt-bindings: arm-smmu: Fix json-schema syntax

2021-06-22 Thread Rob Herring
On Mon, Jun 21, 2021 at 7:58 AM Thierry Reding  wrote:
>
> From: Thierry Reding 
>
> Commit 4287861dca9d ("dt-bindings: arm-smmu: Add Tegra186 compatible
> string") introduced a jsonschema syntax error as a result of a rebase
> gone wrong. Fix it.
>
> Fixes: 4287861dca9d ("dt-bindings: arm-smmu: Add Tegra186 compatible string")
> Reported-by: Rob Herring 
> Signed-off-by: Thierry Reding 
> ---
>  Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)

Acked-by: Rob Herring 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC

2021-06-22 Thread Doug Anderson
Hi,

On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy  wrote:
>
> Hi Doug,
>
> On 2021-06-22 00:52, Douglas Anderson wrote:
> >
> > This patch attempts to put forward a proposal for enabling non-strict
> > DMA on a device-by-device basis. The patch series requests non-strict
> > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > getting a nice bump in performance with what's believed to be a very
> > small drop in security / safety (see the patch for the full argument).
> >
> > As part of this patch series I end up slightly cleaning up some of
> > the interactions between the PCI subsystem and the IOMMU subsystem but
> > I don't go all the way to fully remove all the tentacles. Specifically
> > this patch series only concerns itself with a single aspect: strict
> > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > to talk about / reason about for more subsystems compared to overall
> > deciding what it means for a device to be "external" or "untrusted".
> >
> > If something like this patch series ends up being landable, it will
> > undoubtedly need coordination between many maintainers to land. I
> > believe it's fully bisectable but later patches in the series
> > definitely depend on earlier ones. Sorry for the long CC list. :(
>
> Unfortunately, this doesn't work. In normal operation, the default
> domains should be established long before individual drivers are even
> loaded (if they are modules), let alone anywhere near probing. The fact
> that iommu_probe_device() sometimes gets called far too late off the
> back of driver probe is an unfortunate artefact of the original
> probe-deferral scheme, and causes other problems like potentially
> malformed groups - I've been forming a plan to fix that for a while now,
> so I for one really can't condone anything trying to rely on it.
> Non-deterministic behaviour based on driver probe order for multi-device
> groups is part of the existing problem, and your proposal seems equally
> vulnerable to that too.

Doh! :( I definitely can't say I understand the iommu subsystem
amazingly well. It was working for me, but I could believe that I was
somehow violating a rule somewhere.

I'm having a bit of a hard time understanding where the problem is
though. Is there any chance that you missed the part of my series
where I introduced a "pre_probe" step? Specifically, I see this:

* really_probe() is called w/ a driver and a device.
* -> calls dev->bus->dma_configure() w/ a "struct device *"
* -> eventually calls iommu_probe_device() w/ the device.
* -> calls iommu_alloc_default_domain() w/ the device
* -> calls iommu_group_alloc_default_domain()
* -> always allocates a new domain

...so we always have a "struct device" when a domain is allocated if
that domain is going to be associated with a device.

I will agree that iommu_probe_device() is called before the driver
probe, but unless I missed something it's after the device driver is
loaded.  ...and assuming something like patch #1 in this series looks
OK then iommu_probe_device() will be called after "pre_probe".

So assuming I'm not missing something, I'm not actually relying on the
IOMMU getting initialised off the back of driver probe.
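
To spell out the ordering I'm relying on, here's a stripped-down sketch
(the pre_probe hook is from patch #1 of the series; the rest paraphrases
drivers/base/dd.c and isn't meant to be exact):

static int really_probe(struct device *dev, struct device_driver *drv)
{
	int ret = 0;

	/* 1. new hook: driver can set dev->request_non_strict_iommu here */
	if (drv->pre_probe) {
		ret = drv->pre_probe(dev);
		if (ret)
			return ret;
	}

	/* 2. bus->dma_configure() -> iommu_probe_device() -> default domain */
	if (dev->bus->dma_configure) {
		ret = dev->bus->dma_configure(dev);
		if (ret)
			return ret;
	}

	/* 3. only then does the driver's own ->probe() run */
	return drv->probe ? drv->probe(dev) : 0;
}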


> FWIW we already have a go-faster knob for people who want to tweak the
> security/performance compromise for specific devices, namely the sysfs
> interface for changing a group's domain type before binding the relevant
> driver(s). Is that something you could use in your application, say from
> an initramfs script?

We've never had an initramfs script in Chrome OS. I don't know all the
history of why (I'm trying to check), but I'm nearly certain it was a
conscious decision. Probably it has to do with the fact that we're not
trying to build a generic distribution where a single boot source can
boot a huge variety of hardware. We generally have one kernel for a
class of devices. I believe avoiding the initramfs just keeps things
simpler.

I think trying to revamp Chrome OS to switch to an initramfs type
system would be a pretty big undertaking since (as I understand it)
you can't just run a little command and then return to the normal boot
flow. Once you switch to initramfs you're committing to finding /
setting up the rootfs yourself and on Chrome OS I believe that means a
whole bunch of dm-verity work.


...so probably the initramfs is a no-go for me, but I'm still crossing
my fingers that the pre_probe() might be legit if you take a second
look at it?

-Doug
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCHv2 1/3] iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk flush

2021-06-22 Thread Sai Prakash Ranjan

Hi Robin,

On 2021-06-22 17:41, Robin Murphy wrote:

On 2021-06-22 08:11, Sai Prakash Ranjan wrote:

Hi Robin,

On 2021-06-21 21:15, Robin Murphy wrote:

On 2021-06-18 03:51, Sai Prakash Ranjan wrote:
Add a quirk IO_PGTABLE_QUIRK_TLB_INV_ALL to invalidate entire 
context

with tlb_flush_all() callback in partial walk flush to improve unmap
performance on select few platforms where the cost of 
over-invalidation

is less than the unmap latency.


I still think this doesn't belong anywhere near io-pgtable at all.
It's a driver-internal decision how exactly it implements a non-leaf
invalidation, and that may be more complex than a predetermined
boolean decision. For example, I've just realised for SMMUv3 we can't
invalidate multiple levels of table at once with a range command,
since if we assume the whole thing is mapped at worst-case page
granularity we may fail to invalidate any parts which are mapped as
intermediate-level blocks. If invalidating a 1GB region (with 4KB
granule) means having to fall back to 256K non-range commands, we may
not want to invalidate by VA then, even though doing so for a 2MB
region is still optimal.

It's also quite feasible that drivers might want to do this for leaf
invalidations too - if you don't like issuing 512 commands to
invalidate 2MB, do you like issuing 511 commands to invalidate 
2044KB?

- and at that point the logic really has to be in the driver anyway.



Ok I will move this to tlb_flush_walk() functions in the drivers. In 
the previous
v1 thread, you suggested to make the choice in iommu_get_dma_strict() 
test,
I assume you meant the test in iommu_dma_init_domain() with a flag or 
was it
the leaf driver(ex:arm-smmu.c) test of iommu_get_dma_strict() in 
init_domain?


Yes, I meant literally inside the same condition where we currently
set "pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;" in
arm_smmu_init_domain_context().



Ok got it, thanks.

I am still a bit confused on where this flag would be? Should this be 
a part

of struct iommu_domain?


Well, if you were to rewrite the config with an alternative set of
flush_ops at that point it would be implicit. For a flag, probably
either in arm_smmu_domain or arm_smmu_impl. Maybe a flag would be less
useful than generalising straight to a "maximum number of by-VA
invalidations it's worth sending individually" threshold value?


But then we would still need some flag to make this implementation
specific (qcom specific for now) and this threshold would just be
another condition although it would have been useful if this was
generic enough.


It's clear to me what overall shape and separation of responsibility is
most logical, but beyond that I don't have a particularly strong
opinion on the exact implementation; I've just been chucking ideas
around :)



Your ideas are very informative and useful :)

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member

of Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems

2021-06-22 Thread Rob Herring
On Tue, Jun 22, 2021 at 2:17 AM Geert Uytterhoeven  wrote:
>
> Hi Rob,
>
> On Tue, Jun 15, 2021 at 9:16 PM Rob Herring  wrote:
> > If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
> > same size as the list is redundant and can be dropped. Note that is DT
> > schema specific behavior and not standard json-schema behavior. The tooling
> > will fixup the final schema adding any unspecified minItems/maxItems.
> >
> > This condition is partially checked with the meta-schema already, but
> > only if both 'minItems' and 'maxItems' are equal to the 'items' length.
> > An improved meta-schema is pending.
>
> > Signed-off-by: Rob Herring 
>
> > --- a/Documentation/devicetree/bindings/net/stm32-dwmac.yaml
> > +++ b/Documentation/devicetree/bindings/net/stm32-dwmac.yaml
> > @@ -46,7 +46,6 @@ properties:
> >
> >clocks:
> >  minItems: 3
> > -maxItems: 5
> >  items:
> >- description: GMAC main clock
> >- description: MAC TX clock
>
> While resolving the conflict with commit fea99822914039c6
> ("dt-bindings: net: document ptp_ref clk in dwmac") in soc/for-next,
> I noticed the following construct for clock-names:
>
>   clock-names:
> minItems: 3
> maxItems: 6
> contains:
>   enum:
> - stmmaceth
> - mac-clk-tx
> - mac-clk-rx
> - ethstp
> - eth-ck
> - ptp_ref
>
> Should this use items instead of enum, and drop maxItems, or is this
> a valid construct to support specifying the clocks in random order?
> If the latter, it does mean that the order of clock-names may not
> match the order of the clock descriptions.

'contains' is true if one or more entries match the strings. So it is
really saying one of these is required. That's not really much of a
constraint. There's 'minContains' and 'maxContains' in newer
json-schema versions (not yet supported) that could add some
constraints if there has to be at least N entries from contains. An
'items' schema (as opposed to a list) would say all items have to
match one of the strings. I'm sure that's too strict.

TLDR: clocks for this binding are a mess and the above is probably all
we can do here.

Rob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2] dma-mapping: use vmalloc_to_page for vmalloc addresses

2021-06-22 Thread Roman Skakun
This commit fixes an incorrect conversion from cpu_addr to a page
address in cases where we get a virtual address that was allocated
in the vmalloc range. As a result, virt_to_page() cannot convert
such an address properly and returns an incorrect page address.

We need to detect such cases and obtain the page address using
vmalloc_to_page() instead.

Signed-off-by: Roman Skakun 
Reviewed-by: Andrii Anisov 
---
Hey!
Thanks for the suggestions, Christoph!
I updated the patch according to your advice.
I'm quite surprised that nobody has caught this problem
in the common code before. It looks a bit strange to me.


 kernel/dma/ops_helpers.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index 910ae69cae77..782728d8a393 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -5,6 +5,14 @@
  */
 #include 
 
+static struct page *cpu_addr_to_page(void *cpu_addr)
+{
+   if (is_vmalloc_addr(cpu_addr))
+   return vmalloc_to_page(cpu_addr);
+   else
+   return virt_to_page(cpu_addr);
+}
+
 /*
  * Create scatter-list for the already allocated DMA buffer.
  */
@@ -12,7 +20,7 @@ int dma_common_get_sgtable(struct device *dev, struct 
sg_table *sgt,
 void *cpu_addr, dma_addr_t dma_addr, size_t size,
 unsigned long attrs)
 {
-   struct page *page = virt_to_page(cpu_addr);
+   struct page *page = cpu_addr_to_page(cpu_addr);
int ret;
 
ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
@@ -43,7 +51,7 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct 
*vma,
return -ENXIO;
 
return remap_pfn_range(vma, vma->vm_start,
-   page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff,
+   page_to_pfn(cpu_addr_to_page(cpu_addr)) + vma->vm_pgoff,
user_count << PAGE_SHIFT, vma->vm_page_prot);
 #else
return -ENXIO;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCHv2 1/3] iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk flush

2021-06-22 Thread Robin Murphy

On 2021-06-22 08:11, Sai Prakash Ranjan wrote:

Hi Robin,

On 2021-06-21 21:15, Robin Murphy wrote:

On 2021-06-18 03:51, Sai Prakash Ranjan wrote:

Add a quirk IO_PGTABLE_QUIRK_TLB_INV_ALL to invalidate entire context
with tlb_flush_all() callback in partial walk flush to improve unmap
performance on select few platforms where the cost of over-invalidation
is less than the unmap latency.


I still think this doesn't belong anywhere near io-pgtable at all.
It's a driver-internal decision how exactly it implements a non-leaf
invalidation, and that may be more complex than a predetermined
boolean decision. For example, I've just realised for SMMUv3 we can't
invalidate multiple levels of table at once with a range command,
since if we assume the whole thing is mapped at worst-case page
granularity we may fail to invalidate any parts which are mapped as
intermediate-level blocks. If invalidating a 1GB region (with 4KB
granule) means having to fall back to 256K non-range commands, we may
not want to invalidate by VA then, even though doing so for a 2MB
region is still optimal.

It's also quite feasible that drivers might want to do this for leaf
invalidations too - if you don't like issuing 512 commands to
invalidate 2MB, do you like issuing 511 commands to invalidate 2044KB?
- and at that point the logic really has to be in the driver anyway.



Ok I will move this to tlb_flush_walk() functions in the drivers. In the 
previous

v1 thread, you suggested to make the choice in iommu_get_dma_strict() test,
I assume you meant the test in iommu_dma_init_domain() with a flag or 
was it
the leaf driver(ex:arm-smmu.c) test of iommu_get_dma_strict() in 
init_domain?


Yes, I meant literally inside the same condition where we currently set 
"pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;" in 
arm_smmu_init_domain_context().


I am still a bit confused on where this flag would be? Should this be a 
part

of struct iommu_domain?


Well, if you were to rewrite the config with an alternative set of 
flush_ops at that point it would be implicit. For a flag, probably 
either in arm_smmu_domain or arm_smmu_impl. Maybe a flag would be less 
useful than generalising straight to a "maximum number of by-VA 
invalidations it's worth sending individually" threshold value? It's 
clear to me what overall shape and separation of responsibility is most 
logical, but beyond that I don't have a particularly strong opinion on 
the exact implementation; I've just been chucking ideas around :)
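
E.g. something vaguely like this in the walk path (sketch only; the
threshold field and the arithmetic are invented for illustration):

static void arm_smmu_tlb_inv_walk_s1(unsigned long iova, size_t size,
				     size_t granule, void *cookie)
{
	struct arm_smmu_domain *smmu_domain = cookie;

	/* too many by-VA ops to be worth it? over-invalidate instead */
	if (size / granule > smmu_domain->inv_walk_threshold) {
		arm_smmu_tlb_inv_context_s1(cookie);
		return;
	}

	arm_smmu_tlb_inv_range_s1(iova, size, granule, cookie,
				  ARM_SMMU_CB_S1_TLBIVA);
	arm_smmu_tlb_sync_context(cookie);
}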


Cheers,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default

2021-06-22 Thread Robin Murphy

On 2021-06-22 00:52, Douglas Anderson wrote:

In the patch ("drivers: base: Add bits to struct device to control
iommu strictness") we add the ability for devices to tell us about
their IOMMU strictness requirements. Let's now take that into account
in the IOMMU layer.

A few notes here:
* Presumably this is always how iommu_get_dma_strict() was intended to
   behave. Had this not been the intention then it never would have
   taken a domain as a parameter.


FWIW strictness does have the semantic of being a per-domain property, 
but mostly in the sense that it's only relevant to IOMMU_DOMAIN_DMA 
domains, so the main thing was encapsulating that check rather than 
duplicating it all over callsites.



* The iommu_set_dma_strict() feels awfully non-symmetric now. That
   function sets the _default_ strictness globally in the system
   whereas iommu_get_dma_strict() returns the value for a given domain
   (falling back to the default). Presumably, at least, the fact that
   iommu_set_dma_strict() doesn't take a domain makes this obvious.


It *is* asymmetric - one is for IOMMU core code and individual driver 
internals to know whether they need to do whatever bits of setting up a 
flush queue for a given domain they are responsible for, while the other 
is specifically for two drivers to force the global default in order to 
preserve legacy driver-specific behaviour. Maybe that should have been 
called something like iommu_set_dma_default_strict instead... :/


Robin.


The function iommu_get_dma_strict() should now make it super obvious
where strictness comes from and who overides who. Though the function
changed a bunch to make the logic clearer, the only two new rules
should be:
* Devices can force strictness for themselves, overriding the cmdline
   "iommu.strict=0" or a call to iommu_set_dma_strict(false)).
* Devices can request non-strictness for themselves, assuming there
   was no cmdline "iommu.strict=1" or a call to
   iommu_set_dma_strict(true).

Signed-off-by: Douglas Anderson 
---

  drivers/iommu/iommu.c | 56 +--
  include/linux/iommu.h |  2 ++
  2 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df5..0c84a4c06110 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -28,8 +28,19 @@
  static struct kset *iommu_group_kset;
  static DEFINE_IDA(iommu_group_ida);
  
+enum iommu_strictness {

+   IOMMU_DEFAULT_STRICTNESS = -1,
+   IOMMU_NOT_STRICT = 0,
+   IOMMU_STRICT = 1,
+};
+static inline enum iommu_strictness bool_to_strictness(bool strictness)
+{
+   return (enum iommu_strictness)strictness;
+}
+
  static unsigned int iommu_def_domain_type __read_mostly;
-static bool iommu_dma_strict __read_mostly = true;
+static enum iommu_strictness cmdline_dma_strict __read_mostly = 
IOMMU_DEFAULT_STRICTNESS;
+static enum iommu_strictness driver_dma_strict __read_mostly = 
IOMMU_DEFAULT_STRICTNESS;
  static u32 iommu_cmd_line __read_mostly;
  
  struct iommu_group {

@@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = {
  };
  
  #define IOMMU_CMD_LINE_DMA_API		BIT(0)

-#define IOMMU_CMD_LINE_STRICT  BIT(1)
  
  static int iommu_alloc_default_domain(struct iommu_group *group,

  struct device *dev);
@@ -336,25 +346,38 @@ early_param("iommu.passthrough", 
iommu_set_def_domain_type);
  
  static int __init iommu_dma_setup(char *str)

  {
-   int ret = kstrtobool(str, _dma_strict);
+   bool strict;
+   int ret = kstrtobool(str, );
  
  	if (!ret)

-   iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
+   cmdline_dma_strict = bool_to_strictness(strict);
return ret;
  }
  early_param("iommu.strict", iommu_dma_setup);
  
  void iommu_set_dma_strict(bool strict)

  {
-   if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-   iommu_dma_strict = strict;
+   /* A driver can request strictness but not the other way around */
+   if (driver_dma_strict != IOMMU_STRICT)
+   driver_dma_strict = bool_to_strictness(strict);
  }
  
  bool iommu_get_dma_strict(struct iommu_domain *domain)

  {
-   /* only allow lazy flushing for DMA domains */
-   if (domain->type == IOMMU_DOMAIN_DMA)
-   return iommu_dma_strict;
+   /* Non-DMA domains or anyone forcing it to strict makes it strict */
+   if (domain->type != IOMMU_DOMAIN_DMA ||
+   cmdline_dma_strict == IOMMU_STRICT ||
+   driver_dma_strict == IOMMU_STRICT ||
+   domain->force_strict)
+   return true;
+
+   /* Anyone requesting non-strict (if no forces) makes it non-strict */
+   if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
+   driver_dma_strict == IOMMU_NOT_STRICT ||
+   domain->request_non_strict)
+   return false;
+
+   /* Nobody said anything, so it's strict by default */

Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC

2021-06-22 Thread Robin Murphy

Hi Doug,

On 2021-06-22 00:52, Douglas Anderson wrote:


This patch attempts to put forward a proposal for enabling non-strict
DMA on a device-by-device basis. The patch series requests non-strict
DMA for the Qualcomm SDHCI controller as a first device to enable,
getting a nice bump in performance with what's believed to be a very
small drop in security / safety (see the patch for the full argument).

As part of this patch series I end up slightly cleaning up some of
the interactions between the PCI subsystem and the IOMMU subsystem but
I don't go all the way to fully remove all the tentacles. Specifically
this patch series only concerns itself with a single aspect: strict
vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
to talk about / reason about for more subsystems compared to overall
deciding what it means for a device to be "external" or "untrusted".

If something like this patch series ends up being landable, it will
undoubtedly need coordination between many maintainers to land. I
believe it's fully bisectable but later patches in the series
definitely depend on earlier ones. Sorry for the long CC list. :(


Unfortunately, this doesn't work. In normal operation, the default 
domains should be established long before individual drivers are even 
loaded (if they are modules), let alone anywhere near probing. The fact 
that iommu_probe_device() sometimes gets called far too late off the 
back of driver probe is an unfortunate artefact of the original 
probe-deferral scheme, and causes other problems like potentially 
malformed groups - I've been forming a plan to fix that for a while now, 
so I for one really can't condone anything trying to rely on it. 
Non-deterministic behaviour based on driver probe order for multi-device 
groups is part of the existing problem, and your proposal seems equally 
vulnerable to that too.


FWIW we already have a go-faster knob for people who want to tweak the 
security/performance compromise for specific devices, namely the sysfs 
interface for changing a group's domain type before binding the relevant 
driver(s). Is that something you could use in your application, say from 
an initramfs script?


Thanks,
Robin.


Douglas Anderson (6):
   drivers: base: Add the concept of "pre_probe" to drivers
   drivers: base: Add bits to struct device to control iommu strictness
   PCI: Indicate that we want to force strict DMA for untrusted devices
   iommu: Combine device strictness requests with the global default
   iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
   mmc: sdhci-msm: Request non-strict IOMMU mode

  drivers/base/dd.c | 10 +--
  drivers/iommu/dma-iommu.c |  2 +-
  drivers/iommu/iommu.c | 56 +++
  drivers/mmc/host/sdhci-msm.c  |  8 +
  drivers/pci/probe.c   |  4 ++-
  include/linux/device.h| 11 +++
  include/linux/device/driver.h |  9 ++
  include/linux/iommu.h |  2 ++
  8 files changed, 85 insertions(+), 17 deletions(-)


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 12/12] iommu: Do not allow IOMMU passthrough with Secure Launch

2021-06-22 Thread Robin Murphy

On 2021-06-21 18:51, Ross Philipson wrote:

On 6/18/21 2:32 PM, Robin Murphy wrote:

On 2021-06-18 17:12, Ross Philipson wrote:

The IOMMU should always be set to default translated type after
the PMRs are disabled to protect the MLE from DMA.

Signed-off-by: Ross Philipson 
---
   drivers/iommu/intel/iommu.c | 5 +
   drivers/iommu/iommu.c   | 6 +-
   2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index be35284..4f0256d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -41,6 +41,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
@@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device
*dev)
    */
   static int device_def_domain_type(struct device *dev)
   {
+    /* Do not allow identity domain when Secure Launch is configured */
+    if (slaunch_get_flags() & SL_FLAG_ACTIVE)
+    return IOMMU_DOMAIN_DMA;


Is this specific to Intel? It seems like it could easily be done
commonly like the check for untrusted external devices.


It is currently Intel only but that will change. I will look into what
you suggest.


Yeah, it's simple and unobtrusive enough that I reckon it's worth going 
straight to the common version if it's worth doing at all.



+
   if (dev_is_pci(dev)) {
   struct pci_dev *pdev = to_pci_dev(dev);
   diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d..d49b7dd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -23,6 +23,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
     static struct kset *iommu_group_kset;
@@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
   {
   if (cmd_line)
   iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
-    iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
+
+    /* Do not allow identity domain when Secure Launch is configured */
+    if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
+    iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;


Quietly ignoring the setting and possibly leaving iommu_def_domain_type
uninitialised (note that 0 is not actually a usable type) doesn't seem
great. AFAICS this probably warrants similar treatment to the


Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA even
though passthrough was requested. Or perhaps something more is needed here?


mem_encrypt_active() case - there doesn't seem a great deal of value in
trying to save users from themselves if they care about measured boot
yet explicitly pass options which may compromise measured boot. If you
really want to go down that route there's at least the sysfs interface
you'd need to nobble as well, not to mention the various ways of
completely disabling IOMMUs...


Doing a secure launch with the kernel is not a general purpose user use
case. A lot of work is done to secure the environment. Allowing
passthrough mode would leave the secure launch kernel exposed to DMA. I
think what we are trying to do here is what we intend though there may
be a better way or perhaps it is incomplete as you suggest.


On second thoughts this is overkill anyway - if you do hook 
iommu_get_def_domain_type(), you're done (in terms of the kernel-managed 
setting, at least); it doesn't matter what iommu_def_domain_type gets 
set to if will never get used. However, since this isn't really a 
per-device thing, it might be more semantically appropriate to leave 
that alone and instead only massage the default type in 
iommu_subsys_init(), as for memory encryption.


When you say "secure the environment", what's the actual threat model 
here, i.e. who's securing what against whom? If it's a device lockdown 
type thing where the system owner wants to defend against the end user 
trying to mess with the software stack or gain access to parts they 
shouldn't, then possibly you can trust the command line, but there are 
definitely other places which need consideration. If on the other hand 
it's more about giving the end user confidence that their choice of 
software stack isn't being interfered with by a malicious host or 
external third parties, then it probably leans towards the opposite 
being true...


If the command line *is* within the threat model, consider "iommu=off" 
and/or "intel_iommu=off" for example: I don't know how PMRs work, but I 
can only imagine that that's liable to leave things either wide open, or 
blocked to the point of no DMA working at all, neither of which seems to 
be what you want. I'm guessing "intel_iommu=tboot_noforce" might have 
some relevant implications too.



It might be reasonable to make IOMMU_DEFAULT_PASSTHROUGH depend on
!SECURE_LAUNCH for clarity though.


This came from a specific request to not make disabling IOMMU modes
build time dependent. This is because a secure launch enabled kernel can
also be booted as a general purpose kernel in cases where this is desired.


Ah, thanks for 

Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems

2021-06-22 Thread Geert Uytterhoeven
Hi Rob,

On Tue, Jun 15, 2021 at 9:16 PM Rob Herring  wrote:
> If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
> same size as the list is redundant and can be dropped. Note that is DT
> schema specific behavior and not standard json-schema behavior. The tooling
> will fixup the final schema adding any unspecified minItems/maxItems.
>
> This condition is partially checked with the meta-schema already, but
> only if both 'minItems' and 'maxItems' are equal to the 'items' length.
> An improved meta-schema is pending.

> Signed-off-by: Rob Herring 

> --- a/Documentation/devicetree/bindings/net/stm32-dwmac.yaml
> +++ b/Documentation/devicetree/bindings/net/stm32-dwmac.yaml
> @@ -46,7 +46,6 @@ properties:
>
>clocks:
>  minItems: 3
> -maxItems: 5
>  items:
>- description: GMAC main clock
>- description: MAC TX clock

While resolving the conflict with commit fea99822914039c6
("dt-bindings: net: document ptp_ref clk in dwmac") in soc/for-next,
I noticed the following construct for clock-names:

  clock-names:
minItems: 3
maxItems: 6
contains:
  enum:
- stmmaceth
- mac-clk-tx
- mac-clk-rx
- ethstp
- eth-ck
- ptp_ref

Should this use items instead of enum, and drop maxItems, or is this
a valid construct to support specifying the clocks in random order?
If the latter, it does mean that the order of clock-names may not
match the order of the clock descriptions.

Gr{oetje,eeting}s,

Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-22 Thread Yongji Xie
On Tue, Jun 22, 2021 at 3:50 PM Jason Wang  wrote:
>
>
> 在 2021/6/22 下午3:22, Yongji Xie 写道:
> >> We need to find a way to propagate the error to the userspace.
> >>
> >> E.g. if we want to stop the device, we will delay the status reset until
> >> we get a response from the userspace?
> >>
> > I didn't get how to delay the status reset. And should it be a DoS
> > that we want to fix if the userspace doesn't give a response forever?
>
>
> You're right. So let's first make set_status() able to fail, then propagate
> its failure via VHOST_VDPA_SET_STATUS.
>

OK. So we only need to propagate the failure in the vhost-vdpa case, right?

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-22 Thread Jason Wang


在 2021/6/22 下午3:22, Yongji Xie 写道:

We need to find a way to propagate the error to the userspace.

E.g. if we want to stop the device, we will delay the status reset until
we get a response from the userspace?


I didn't get how to delay the status reset. And should it be a DoS
that we want to fix if the userspace doesn't give a response forever?



You're right. So let's first make set_status() able to fail, then propagate
its failure via VHOST_VDPA_SET_STATUS.
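
Something like this, roughly (simplified and with details assumed, not a
real patch): change ->set_status() to return int, and let vhost-vdpa hand
the error back to the VHOST_VDPA_SET_STATUS caller:

/* drivers/vhost/vdpa.c, assuming ->set_status() now returns int */
static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
{
	struct vdpa_device *vdpa = v->vdpa;
	const struct vdpa_config_ops *ops = vdpa->config;
	u8 status;

	if (copy_from_user(&status, statusp, sizeof(status)))
		return -EFAULT;

	/* the device (e.g. VDUSE) can now fail, and userspace sees the error */
	return ops->set_status(vdpa, status);
}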






+ }
+}
+
+static size_t vduse_vdpa_get_config_size(struct vdpa_device *vdpa)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+ return dev->config_size;
+}
+
+static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int 
offset,
+   void *buf, unsigned int len)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+ memcpy(buf, dev->config + offset, len);
+}
+
+static void vduse_vdpa_set_config(struct vdpa_device *vdpa, unsigned int 
offset,
+ const void *buf, unsigned int len)
+{
+ /* Now we only support read-only configuration space */
+}
+
+static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+ return dev->generation;
+}
+
+static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
+ struct vhost_iotlb *iotlb)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+ int ret;
+
+ ret = vduse_domain_set_map(dev->domain, iotlb);
+ if (ret)
+ return ret;
+
+ ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
+ if (ret) {
+ vduse_domain_clear_map(dev->domain, iotlb);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void vduse_vdpa_free(struct vdpa_device *vdpa)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+ dev->vdev = NULL;
+}
+
+static const struct vdpa_config_ops vduse_vdpa_config_ops = {
+ .set_vq_address = vduse_vdpa_set_vq_address,
+ .kick_vq= vduse_vdpa_kick_vq,
+ .set_vq_cb  = vduse_vdpa_set_vq_cb,
+ .set_vq_num = vduse_vdpa_set_vq_num,
+ .set_vq_ready   = vduse_vdpa_set_vq_ready,
+ .get_vq_ready   = vduse_vdpa_get_vq_ready,
+ .set_vq_state   = vduse_vdpa_set_vq_state,
+ .get_vq_state   = vduse_vdpa_get_vq_state,
+ .get_vq_align   = vduse_vdpa_get_vq_align,
+ .get_features   = vduse_vdpa_get_features,
+ .set_features   = vduse_vdpa_set_features,
+ .set_config_cb  = vduse_vdpa_set_config_cb,
+ .get_vq_num_max = vduse_vdpa_get_vq_num_max,
+ .get_device_id  = vduse_vdpa_get_device_id,
+ .get_vendor_id  = vduse_vdpa_get_vendor_id,
+ .get_status = vduse_vdpa_get_status,
+ .set_status = vduse_vdpa_set_status,
+ .get_config_size= vduse_vdpa_get_config_size,
+ .get_config = vduse_vdpa_get_config,
+ .set_config = vduse_vdpa_set_config,
+ .get_generation = vduse_vdpa_get_generation,
+ .set_map= vduse_vdpa_set_map,
+ .free   = vduse_vdpa_free,
+};
+
+static dma_addr_t vduse_dev_map_page(struct device *dev, struct page *page,
+  unsigned long offset, size_t size,
+  enum dma_data_direction dir,
+  unsigned long attrs)
+{
+ struct vduse_dev *vdev = dev_to_vduse(dev);
+ struct vduse_iova_domain *domain = vdev->domain;
+
+ return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
+}
+
+static void vduse_dev_unmap_page(struct device *dev, dma_addr_t dma_addr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ struct vduse_dev *vdev = dev_to_vduse(dev);
+ struct vduse_iova_domain *domain = vdev->domain;
+
+ return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
+}
+
+static void *vduse_dev_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t *dma_addr, gfp_t flag,
+ unsigned long attrs)
+{
+ struct vduse_dev *vdev = dev_to_vduse(dev);
+ struct vduse_iova_domain *domain = vdev->domain;
+ unsigned long iova;
+ void *addr;
+
+ *dma_addr = DMA_MAPPING_ERROR;
+ addr = vduse_domain_alloc_coherent(domain, size,
+ (dma_addr_t *), flag, attrs);
+ if (!addr)
+ return NULL;
+
+ *dma_addr = (dma_addr_t)iova;
+
+ return addr;
+}
+
+static void vduse_dev_free_coherent(struct device *dev, size_t size,
+ void *vaddr, dma_addr_t dma_addr,
+ unsigned long attrs)
+{
+ struct vduse_dev *vdev = dev_to_vduse(dev);
+ struct vduse_iova_domain *domain = vdev->domain;

Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

2021-06-22 Thread 'Dominique MARTINET'
Konrad Rzeszutek Wilk wrote on Mon, Jun 21, 2021 at 09:16:43AM -0400:
> The beauty of 'devel' and 'linux-next' is that they can be reshuffled and
> mangled. I pushed the original patch from Bumyong there and will let
> it sit for a day and then create a stable branch and give it to Linus.

Thanks, that should be good.

Do you want me to send a follow-up patch with the two extra checks
(tlb_addr & (IO_TLB_SIZE -1)) > swiotlb_align_offset(dev, orig_addr)
tlb_offset < alloc_size

or are we certain this can't ever happen?
(I didn't see any hit in dmesg when I ran with these, but my opinion is
better safe than sorry...)
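
For reference, roughly what I mean, factored out as a helper for clarity
(names assumed; in practice it would sit inline where tlb_offset is
computed):

static bool swiotlb_offset_sane(struct device *dev, phys_addr_t tlb_addr,
				phys_addr_t orig_addr, size_t alloc_size,
				unsigned int *tlb_offset)
{
	unsigned int start = swiotlb_align_offset(dev, orig_addr);
	unsigned int off = tlb_addr & (IO_TLB_SIZE - 1);

	if (off < start)		/* would land before the mapped buffer */
		return false;

	off -= start;
	if (off >= alloc_size)		/* would run past the mapped size */
		return false;

	*tlb_offset = off;
	return true;
}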


> Then I need to expand the test-regression bucket so that this does not
> happen again. Dominique, how easy would it be to purchase one of those
> devices?

My company is making such a device, but it's not on the market yet
(it was planned for August; with some procurement delays it'll
probably be a bit late), and it would mean buying from Japan, so I'm not
sure how convenient that would be...

These are originally NXP devices so I assume Horia would have better
suggestions, if you would?


> I was originally thinking to create a crypto device in QEMU to simulate
> this but that may take longer to write than just getting the real thing.
> 
> Or I could create some fake devices with weird offsets and write a driver
> for it to exercise this.. like this one I had done some time ago that
> needs some brushing off.

Just a fake device with fake offsets as a test is probably good enough,
ideally it would need to exercise both failures we've seen (offset in
dma_sync_single_for_device like caam does and in the original mapping (I
assume?) like the NVMe driver does), but that sounds possible :)


Thanks again!
-- 
Dominique
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 2/5] ACPI: Move IOMMU setup code out of IORT

2021-06-22 Thread Eric Auger
Hi Jean,
On 6/18/21 5:20 PM, Jean-Philippe Brucker wrote:
> Extract the code that sets up the IOMMU infrastructure from IORT, since
> it can be reused by VIOT. Move it one level up into a new
> acpi_iommu_configure_id() function, which calls the IORT parsing
> function which in turn calls the acpi_iommu_fwspec_init() helper.
>
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  include/acpi/acpi_bus.h   |  3 ++
>  include/linux/acpi_iort.h |  8 ++---
>  drivers/acpi/arm64/iort.c | 74 +--
>  drivers/acpi/scan.c   | 73 +-
>  4 files changed, 86 insertions(+), 72 deletions(-)
>
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index 3a82faac5767..41f092a269f6 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -588,6 +588,9 @@ struct acpi_pci_root {
>  
>  bool acpi_dma_supported(struct acpi_device *adev);
>  enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev);
> +int acpi_iommu_fwspec_init(struct device *dev, u32 id,
> +struct fwnode_handle *fwnode,
> +const struct iommu_ops *ops);
>  int acpi_dma_get_range(struct device *dev, u64 *dma_addr, u64 *offset,
>  u64 *size);
>  int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
> diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
> index f7f054833afd..f1f0842a2cb2 100644
> --- a/include/linux/acpi_iort.h
> +++ b/include/linux/acpi_iort.h
> @@ -35,8 +35,7 @@ void acpi_configure_pmsi_domain(struct device *dev);
>  int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
>  /* IOMMU interface */
>  int iort_dma_get_ranges(struct device *dev, u64 *size);
> -const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
> - const u32 *id_in);
> +int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
>  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
> *head);
>  phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
>  #else
> @@ -50,9 +49,8 @@ static inline void acpi_configure_pmsi_domain(struct device 
> *dev) { }
>  /* IOMMU interface */
>  static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
>  { return -ENODEV; }
> -static inline const struct iommu_ops *iort_iommu_configure_id(
> -   struct device *dev, const u32 *id_in)
> -{ return NULL; }
> +static inline int iort_iommu_configure_id(struct device *dev, const u32 
> *id_in)
> +{ return -ENODEV; }
>  static inline
>  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
> *head)
>  { return 0; }
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index a940be1cf2af..487d1095030d 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -806,23 +806,6 @@ static struct acpi_iort_node 
> *iort_get_msi_resv_iommu(struct device *dev)
>   return NULL;
>  }
>  
> -static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device 
> *dev)
> -{
> - struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> -
> - return (fwspec && fwspec->ops) ? fwspec->ops : NULL;
> -}
> -
> -static inline int iort_add_device_replay(struct device *dev)
> -{
> - int err = 0;
> -
> - if (dev->bus && !device_iommu_mapped(dev))
> - err = iommu_probe_device(dev);
> -
> - return err;
> -}
> -
>  /**
>   * iort_iommu_msi_get_resv_regions - Reserved region driver helper
>   * @dev: Device from iommu_get_resv_regions()
> @@ -900,18 +883,6 @@ static inline bool iort_iommu_driver_enabled(u8 type)
>   }
>  }
>  
> -static int arm_smmu_iort_xlate(struct device *dev, u32 streamid,
> -struct fwnode_handle *fwnode,
> -const struct iommu_ops *ops)
> -{
> - int ret = iommu_fwspec_init(dev, fwnode, ops);
> -
> - if (!ret)
> - ret = iommu_fwspec_add_ids(dev, , 1);
> -
> - return ret;
> -}
> -
>  static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
>  {
>   struct acpi_iort_root_complex *pci_rc;
> @@ -946,7 +917,7 @@ static int iort_iommu_xlate(struct device *dev, struct 
> acpi_iort_node *node,
>   return iort_iommu_driver_enabled(node->type) ?
>  -EPROBE_DEFER : -ENODEV;
>  
> - return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops);
> + return acpi_iommu_fwspec_init(dev, streamid, iort_fwnode, ops);
>  }
>  
>  struct iort_pci_alias_info {
> @@ -1020,24 +991,13 @@ static int iort_nc_iommu_map_id(struct device *dev,
>   * @dev: device to configure
>   * @id_in: optional input id const value pointer
>   *
> - * Returns: iommu_ops pointer on configuration success
> - *  NULL on configuration failure
> + * Returns: 0 on success, <0 on failure
>   */
> -const struct iommu_ops *iort_iommu_configure_id(struct 

Re: [PATCH v5 4/5] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()

2021-06-22 Thread Eric Auger
Hi Jean,

On 6/18/21 5:20 PM, Jean-Philippe Brucker wrote:
> Passing a 64-bit address width to iommu_setup_dma_ops() is valid on
> virtual platforms, but isn't currently possible. The overflow check in
> iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass
> a limit address instead of a size, so callers don't have to fake a size
> to work around the check.
>
> The base and limit parameters are being phased out, because:
> * they are redundant for x86 callers. dma-iommu already reserves the
>   first page, and the upper limit is already in domain->geometry.
> * they can now be obtained from dev->dma_range_map on Arm.
> But removing them on Arm isn't completely straightforward so is left for
> future work. As an intermediate step, simplify the x86 callers by
> passing dummy limits.
>
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Eric
> ---
>  include/linux/dma-iommu.h   |  4 ++--
>  arch/arm64/mm/dma-mapping.c |  2 +-
>  drivers/iommu/amd/iommu.c   |  2 +-
>  drivers/iommu/dma-iommu.c   | 12 ++--
>  drivers/iommu/intel/iommu.c |  5 +
>  5 files changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> index 6e75a2d689b4..758ca4694257 100644
> --- a/include/linux/dma-iommu.h
> +++ b/include/linux/dma-iommu.h
> @@ -19,7 +19,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, 
> dma_addr_t base);
>  void iommu_put_dma_cookie(struct iommu_domain *domain);
>  
>  /* Setup call for arch DMA mapping code */
> -void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size);
> +void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
>  
>  /* The DMA API isn't _quite_ the whole story, though... */
>  /*
> @@ -50,7 +50,7 @@ struct msi_msg;
>  struct device;
>  
>  static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base,
> - u64 size)
> +u64 dma_limit)
>  {
>  }
>  
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 4bf1dd3eb041..6719f9efea09 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
> u64 size,
>  
>   dev->dma_coherent = coherent;
>   if (iommu)
> - iommu_setup_dma_ops(dev, dma_base, size);
> + iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
>  
>  #ifdef CONFIG_XEN
>   if (xen_swiotlb_detect())
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 3ac42bbdefc6..216323fb27ef 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1713,7 +1713,7 @@ static void amd_iommu_probe_finalize(struct device *dev)
>   /* Domains are initialized for this device - have a look what we ended 
> up with */
>   domain = iommu_get_domain_for_dev(dev);
>   if (domain->type == IOMMU_DOMAIN_DMA)
> - iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
> + iommu_setup_dma_ops(dev, 0, U64_MAX);
>   else
>   set_dma_ops(dev, NULL);
>  }
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7bcdd1205535..c62e19bed302 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -319,16 +319,16 @@ static bool dev_is_untrusted(struct device *dev)
>   * iommu_dma_init_domain - Initialise a DMA mapping domain
>   * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
>   * @base: IOVA at which the mappable address space starts
> - * @size: Size of IOVA space
> + * @limit: Last address of the IOVA space
>   * @dev: Device the domain is being initialised for
>   *
> - * @base and @size should be exact multiples of IOMMU page granularity to
> + * @base and @limit + 1 should be exact multiples of IOMMU page granularity 
> to
>   * avoid rounding surprises. If necessary, we reserve the page at address 0
>   * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but
>   * any change which could make prior IOVAs invalid will fail.
>   */
>  static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t 
> base,
> - u64 size, struct device *dev)
> +  dma_addr_t limit, struct device *dev)
>  {
>   struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   unsigned long order, base_pfn;
> @@ -346,7 +346,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
> *domain, dma_addr_t base,
>   /* Check the domain allows at least some access to the device... */
>   if (domain->geometry.force_aperture) {
>   if (base > domain->geometry.aperture_end ||
> - base + size <= domain->geometry.aperture_start) {
> + limit < domain->geometry.aperture_start) {
>   pr_warn("specified DMA range outside IOMMU 
> capability\n");
>   return -EFAULT;
>   

Re: [PATCH 1/1] dma-mapping: remove trailing spaces and tabs

2021-06-22 Thread Christoph Hellwig
Thanks,

applied to the dma-mapping tree.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma debug: report -EEXIST errors in add_dma_entry

2021-06-22 Thread Christoph Hellwig
Thanks,

applied to the dma-mapping tree.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-22 Thread Yongji Xie
On Tue, Jun 22, 2021 at 1:07 PM Jason Wang  wrote:
>
>
> 在 2021/6/21 下午6:41, Yongji Xie 写道:
> > On Mon, Jun 21, 2021 at 5:14 PM Jason Wang  wrote:
> >>
> >> 在 2021/6/15 下午10:13, Xie Yongji 写道:
> >>> This VDUSE driver enables implementing vDPA devices in userspace.
> >>> The vDPA device's control path is handled in kernel and the data
> >>> path is handled in userspace.
> >>>
> >>> A message mechanism is used by VDUSE driver to forward some control
> >>> messages such as starting/stopping datapath to userspace. Userspace
> >>> can use read()/write() to receive/reply those control messages.
> >>>
> >>> And some ioctls are introduced to help userspace to implement the
> >>> data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file
> >>> descriptors referring to vDPA device's iova regions. Then userspace
> >>> can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES
> >>> and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features
> >>> and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD
> >>> ioctls can be used to inject interrupt and setup the kickfd for
> >>> virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the
> >>> configuration space and inject a config interrupt.
> >>>
> >>> Signed-off-by: Xie Yongji 
> >>> ---
> >>>Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
> >>>drivers/vdpa/Kconfig   |   10 +
> >>>drivers/vdpa/Makefile  |1 +
> >>>drivers/vdpa/vdpa_user/Makefile|5 +
> >>>drivers/vdpa/vdpa_user/vduse_dev.c | 1453 
> >>> 
> >>>include/uapi/linux/vduse.h |  143 ++
> >>>6 files changed, 1613 insertions(+)
> >>>create mode 100644 drivers/vdpa/vdpa_user/Makefile
> >>>create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
> >>>create mode 100644 include/uapi/linux/vduse.h
> >>>
> >>> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
> >>> b/Documentation/userspace-api/ioctl/ioctl-number.rst
> >>> index 9bfc2b510c64..acd95e9dcfe7 100644
> >>> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> >>> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> >>> @@ -300,6 +300,7 @@ Code  Seq#Include File
> >>>Comments
> >>>'z'   10-4F  drivers/s390/crypto/zcrypt_api.h
> >>> conflict!
> >>>'|'   00-7F  linux/media.h
> >>>0x80  00-1F  linux/fb.h
> >>> +0x81  00-1F  linux/vduse.h
> >>>0x89  00-06  arch/x86/include/asm/sockios.h
> >>>0x89  0B-DF  linux/sockios.h
> >>>0x89  E0-EF  linux/sockios.h 
> >>> SIOCPROTOPRIVATE range
> >>> diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
> >>> index a503c1b2bfd9..6e23bce6433a 100644
> >>> --- a/drivers/vdpa/Kconfig
> >>> +++ b/drivers/vdpa/Kconfig
> >>> @@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
> >>>  vDPA block device simulator which terminates IO request in a
> >>>  memory buffer.
> >>>
> >>> +config VDPA_USER
> >>> + tristate "VDUSE (vDPA Device in Userspace) support"
> >>> + depends on EVENTFD && MMU && HAS_DMA
> >>> + select DMA_OPS
> >>> + select VHOST_IOTLB
> >>> + select IOMMU_IOVA
> >>> + help
> >>> +   With VDUSE it is possible to emulate a vDPA Device
> >>> +   in a userspace program.
> >>> +
> >>>config IFCVF
> >>>tristate "Intel IFC VF vDPA driver"
> >>>depends on PCI_MSI
> >>> diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
> >>> index 67fe7f3d6943..f02ebed33f19 100644
> >>> --- a/drivers/vdpa/Makefile
> >>> +++ b/drivers/vdpa/Makefile
> >>> @@ -1,6 +1,7 @@
> >>># SPDX-License-Identifier: GPL-2.0
> >>>obj-$(CONFIG_VDPA) += vdpa.o
> >>>obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
> >>> +obj-$(CONFIG_VDPA_USER) += vdpa_user/
> >>>obj-$(CONFIG_IFCVF)+= ifcvf/
> >>>obj-$(CONFIG_MLX5_VDPA) += mlx5/
> >>>obj-$(CONFIG_VP_VDPA)+= virtio_pci/
> >>> diff --git a/drivers/vdpa/vdpa_user/Makefile 
> >>> b/drivers/vdpa/vdpa_user/Makefile
> >>> new file mode 100644
> >>> index ..260e0b26af99
> >>> --- /dev/null
> >>> +++ b/drivers/vdpa/vdpa_user/Makefile
> >>> @@ -0,0 +1,5 @@
> >>> +# SPDX-License-Identifier: GPL-2.0
> >>> +
> >>> +vduse-y := vduse_dev.o iova_domain.o
> >>> +
> >>> +obj-$(CONFIG_VDPA_USER) += vduse.o
> >>> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> >>> b/drivers/vdpa/vdpa_user/vduse_dev.c
> >>> new file mode 100644
> >>> index ..5271cbd15e28
> >>> --- /dev/null
> >>> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> >>> @@ -0,0 +1,1453 @@
> >>> +// SPDX-License-Identifier: GPL-2.0-only
> >>> +/*
> >>> + * VDUSE: vDPA Device in Userspace
> >>> + *
> >>> + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All 
> >>> rights reserved.
> >>> + *
> >>> + * Author: Xie Yongji 
> >>> + *
> >>> + */
> >>> +
> >>> 
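
A hypothetical userspace skeleton of the flow described in the commit message
above is sketched below. Only the ioctl names and the read()/write()
control-message idea come from the quoted text; every struct layout, field
name and argument form here is an invented placeholder, not the real
linux/vduse.h ABI.

/*
 * Hypothetical skeleton only: struct layouts and argument forms are
 * placeholders; only the VDUSE_* ioctl names come from the commit message.
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/vduse.h>	/* assumed to define the VDUSE_* ioctl numbers */

struct example_ctrl_msg {	/* placeholder control message */
	uint32_t type;
	uint32_t reply;
};

struct example_iotlb_entry {	/* placeholder iova region descriptor */
	uint64_t start;
	uint64_t last;
};

static int example_serve_once(int dev_fd)
{
	struct example_ctrl_msg msg;
	struct example_iotlb_entry entry = { .start = 0 };
	void *iova;
	int iova_fd;

	/* Control path: read a message (e.g. start datapath) and reply to it. */
	if (read(dev_fd, &msg, sizeof(msg)) != sizeof(msg))
		return -1;
	msg.reply = 0;
	if (write(dev_fd, &msg, sizeof(msg)) != sizeof(msg))
		return -1;

	/* Data path: get an fd referring to an iova region... */
	iova_fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD, &entry);
	if (iova_fd < 0)
		return -1;

	/* ...and mmap() it to access descriptors and buffers directly. */
	iova = mmap(NULL, entry.last - entry.start + 1,
		    PROT_READ | PROT_WRITE, MAP_SHARED, iova_fd, 0);
	if (iova == MAP_FAILED) {
		close(iova_fd);
		return -1;
	}

	/* ... process requests in "iova", then notify the driver side ... */
	ioctl(dev_fd, VDUSE_INJECT_VQ_IRQ, 0);	/* vq index; argument form assumed */

	munmap(iova, entry.last - entry.start + 1);
	close(iova_fd);
	return 0;
}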

Re: [PATCHv2 1/3] iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk flush

2021-06-22 Thread Sai Prakash Ranjan

Hi Robin,

On 2021-06-21 21:15, Robin Murphy wrote:

On 2021-06-18 03:51, Sai Prakash Ranjan wrote:

Add a quirk IO_PGTABLE_QUIRK_TLB_INV_ALL to invalidate entire context
with tlb_flush_all() callback in partial walk flush to improve unmap
performance on select few platforms where the cost of over-invalidation
is less than the unmap latency.


I still think this doesn't belong anywhere near io-pgtable at all.
It's a driver-internal decision how exactly it implements a non-leaf
invalidation, and that may be more complex than a predetermined
boolean decision. For example, I've just realised for SMMUv3 we can't
invalidate multiple levels of table at once with a range command,
since if we assume the whole thing is mapped at worst-case page
granularity we may fail to invalidate any parts which are mapped as
intermediate-level blocks. If invalidating a 1GB region (with 4KB
granule) means having to fall back to 256K non-range commands, we may
not want to invalidate by VA then, even though doing so for a 2MB
region is still optimal.

It's also quite feasible that drivers might want to do this for leaf
invalidations too - if you don't like issuing 512 commands to
invalidate 2MB, do you like issuing 511 commands to invalidate 2044KB?
- and at that point the logic really has to be in the driver anyway.



OK, I will move this into the tlb_flush_walk() functions in the drivers. In
the previous v1 thread, you suggested making the choice in the
iommu_get_dma_strict() test. I assume you meant the test in
iommu_dma_init_domain() with a flag, or did you mean the leaf driver's
(e.g. arm-smmu.c) test of iommu_get_dma_strict() in init_domain?

I am still a bit confused about where this flag should live. Should it be
part of struct iommu_domain?
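
A rough sketch of the driver-side shape being discussed, where the decision
lives in the driver's walk-flush callback rather than in io-pgtable; all
identifiers and the threshold below are placeholders for illustration, not
actual arm-smmu code.

#include <stddef.h>

/*
 * Illustrative only: a driver-owned walk flush that over-invalidates with a
 * full-context invalidation once invalidating by VA would need too many
 * commands.  Every name and the threshold are made up for this sketch.
 */
struct example_domain {
	unsigned long inv_by_va_limit;	/* max per-VA invalidations worth issuing */
	/* ... ASID, hardware handle, etc. ... */
};

static void example_tlb_inv_context(struct example_domain *dom);	/* e.g. TLBIASID */
static void example_tlb_inv_va(struct example_domain *dom,
			       unsigned long iova, size_t granule);	/* e.g. TLBIVA */

static void example_tlb_flush_walk(unsigned long iova, size_t size,
				   size_t granule, void *cookie)
{
	struct example_domain *dom = cookie;

	if (size / granule > dom->inv_by_va_limit) {
		/* Cheaper to invalidate the whole context than to issue
		 * thousands of per-VA commands. */
		example_tlb_inv_context(dom);
		return;
	}

	while (size) {
		example_tlb_inv_va(dom, iova, granule);
		iova += granule;
		size -= granule;
	}
}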

Thanks,
Sai




Signed-off-by: Sai Prakash Ranjan 
---
  drivers/iommu/io-pgtable-arm.c | 3 ++-
  include/linux/io-pgtable.h | 5 +
  2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..5d362f2214bd 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -768,7 +768,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)

if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
IO_PGTABLE_QUIRK_NON_STRICT |
IO_PGTABLE_QUIRK_ARM_TTBR1 |
-   IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
+   IO_PGTABLE_QUIRK_ARM_OUTER_WBWA |
+   IO_PGTABLE_QUIRK_TLB_INV_ALL))
return NULL;
data = arm_lpae_alloc_pgtable(cfg);
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 4d40dfa75b55..45441592a0e6 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -82,6 +82,10 @@ struct io_pgtable_cfg {
 *
 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
 *  attributes set in the TCR for a non-coherent page-table walker.
+*
+* IO_PGTABLE_QUIRK_TLB_INV_ALL: Use TLBIALL/TLBIASID to invalidate
+*  entire context for partial walk flush to increase unmap
+*  performance on select few platforms.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -89,6 +93,7 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_NON_STRICT BIT(4)
#define IO_PGTABLE_QUIRK_ARM_TTBR1  BIT(5)
#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
+   #define IO_PGTABLE_QUIRK_TLB_INV_ALLBIT(7)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation