Re: [RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU

2021-03-01 Thread Keqian Zhu
Hi Robin,

I am going to send v2 next week to address the issues you reported.
Many thanks!
Do you have any further comments on patches #4, #5 and #6?

Thanks,
Keqian

On 2021/2/5 3:50, Robin Murphy wrote:
> On 2021-01-28 15:17, Keqian Zhu wrote:
>> From: jiangkunkun 
>>
>> An SMMU that supports HTTU (Hardware Translation Table Update) can
>> update the access flag and the dirty state of the TTD in hardware, which
>> is essential for tracking pages dirtied by DMA.
>>
>> This patch only adds feature detection; there is no functional change.
>>
>> Co-developed-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ++++++++++++++++
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 ++++++++
>>   include/linux/io-pgtable.h                  |  1 +
>>   3 files changed, 25 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 8ca7415d785d..0f0fe71cc10d 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1987,6 +1987,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>>  		.pgsize_bitmap	= smmu->pgsize_bitmap,
>>  		.ias		= ias,
>>  		.oas		= oas,
>> +		.httu_hd	= smmu->features & ARM_SMMU_FEAT_HTTU_HD,
>>  		.coherent_walk	= smmu->features & ARM_SMMU_FEAT_COHERENCY,
>>  		.tlb		= &arm_smmu_flush_ops,
>>  		.iommu_dev	= smmu->dev,
>> @@ -3224,6 +3225,21 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
>>  	if (reg & IDR0_HYP)
>>  		smmu->features |= ARM_SMMU_FEAT_HYP;
>>  
>> +	switch (FIELD_GET(IDR0_HTTU, reg)) {
> 
> We need to accommodate the firmware override as well if we need this to be 
> meaningful. Jean-Philippe is already carrying a suitable patch in the SVA 
> stack[1].
> 
>> +	case IDR0_HTTU_NONE:
>> +		break;
>> +	case IDR0_HTTU_HA:
>> +		smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
>> +		break;
>> +	case IDR0_HTTU_HAD:
>> +		smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
>> +		smmu->features |= ARM_SMMU_FEAT_HTTU_HD;
>> +		break;
>> +	default:
>> +		dev_err(smmu->dev, "unknown/unsupported HTTU!\n");
>> +		return -ENXIO;
>> +	}
>> +
>>   /*
>>* The coherency feature as set by FW is used in preference to the ID
>>* register, but warn on mismatch.
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 96c2e9565e00..e91bea44519e 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -33,6 +33,10 @@
>>   #define IDR0_ASID16		(1 << 12)
>>   #define IDR0_ATS		(1 << 10)
>>   #define IDR0_HYP		(1 << 9)
>> +#define IDR0_HTTU		GENMASK(7, 6)
>> +#define IDR0_HTTU_NONE		0
>> +#define IDR0_HTTU_HA		1
>> +#define IDR0_HTTU_HAD		2
>>   #define IDR0_COHACC		(1 << 4)
>>   #define IDR0_TTF		GENMASK(3, 2)
>>   #define IDR0_TTF_AARCH64	2
>> @@ -286,6 +290,8 @@
>>   #define CTXDESC_CD_0_TCR_TBI0	(1ULL << 38)
>>  
>>   #define CTXDESC_CD_0_AA64	(1UL << 41)
>> +#define CTXDESC_CD_0_HD		(1UL << 42)
>> +#define CTXDESC_CD_0_HA		(1UL << 43)
>>   #define CTXDESC_CD_0_S		(1UL << 44)
>>   #define CTXDESC_CD_0_R		(1UL << 45)
>>   #define CTXDESC_CD_0_A		(1UL << 46)
>> @@ -604,6 +610,8 @@ struct arm_smmu_device {
>>   #define ARM_SMMU_FEAT_RANGE_INV	(1 << 15)
>>   #define ARM_SMMU_FEAT_BTM	(1 << 16)
>>   #define ARM_SMMU_FEAT_SVA	(1 << 17)
>> +#define ARM_SMMU_FEAT_HTTU_HA	(1 << 18)
>> +#define ARM_SMMU_FEAT_HTTU_HD	(1 << 19)
>>  	u32			features;
>>  
>>   #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
>> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
>> index ea727eb1a1a9..1a00ea8562c7 100644
>> --- a/include/linux/io-pgtable.h
>> +++ b/include/linux/io-pgtable.h
>> @@ -97,6 +97,7 @@ struct io_pgtable_cfg {
>>  	unsigned long		pgsize_bitmap;
>>  	unsigned int		ias;
>>  	unsigned int		oas;
>> +	bool			httu_hd;
> 
> This is very specific to the AArch64 stage 1 format, not a generic capability 
> - I think it should be a quirk flag rather than a common field.
> 
> Robin.
> 
> [1] 
> https://jpbrucker.net/git/linux/commit/?h=sva/current&id=1ef7d512fb9082450dfe0d22ca4f7e35625a097b
> 
>>  	bool			coherent_walk;
>>  	const struct iommu_flush_ops	*tlb;
>>  	struct device			*iommu_dev;
>>
> .
> 
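For reference, Robin's quirk-flag suggestion could look roughly like the
sketch below; the IO_PGTABLE_QUIRK_ARM_HD name and bit position are made up
for illustration and are not part of the posted patch:

/* include/linux/io-pgtable.h: a quirk bit instead of a new cfg field */
#define IO_PGTABLE_QUIRK_ARM_HD		BIT(7)	/* hardware dirty state update */

/* arm_smmu_domain_finalise(): request it only when the SMMU supports HD */
if (smmu->features & ARM_SMMU_FEAT_HTTU_HD)
	pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;

/*
 * io-pgtable-arm would then set the DBM bit in its leaf PTEs only when
 * cfg->quirks & IO_PGTABLE_QUIRK_ARM_HD is set, keeping the AArch64
 * stage 1 specifics out of the generic io_pgtable_cfg fields.
 */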


Re: [PATCH v6 08/12] fork: Clear PASID for new mm

2021-03-01 Thread Jacob Pan
Hi Fenghua,

On Thu, 25 Feb 2021 22:17:11 +, Fenghua Yu  wrote:

> Hi, Jean,
> 
> On Wed, Feb 24, 2021 at 11:19:27AM +0100, Jean-Philippe Brucker wrote:
> > Hi Fenghua,
> > 
> > [Trimmed the Cc list]
> > 
> > On Mon, Jul 13, 2020 at 04:48:03PM -0700, Fenghua Yu wrote:  
> > > When a new mm is created, its PASID should be cleared, i.e. the PASID
> > > is initialized to its init state 0 on both ARM and X86.  
> > 
> > I just noticed this patch was dropped in v7, and am wondering whether we
> > could still upstream it. Does x86 need a child with a new address space
> > (!CLONE_VM) to inherit the PASID of the parent?  That doesn't make much
> > sense with regard to IOMMU structures - same PASID indexing multiple
> > PGDs?  
> 
> You are right: x86 should clear mm->pasid when a new mm is created.
> This patch somehow got lost :(
> 
> > 
> > Currently iommu_sva_alloc_pasid() assumes mm->pasid is always
> > initialized to 0 and fails on forked tasks. I'm trying to figure out
> > how to fix this. Could we clear the pasid on fork or does it break the
> > x86 model?  
> 
> x86 calls ioasid_alloc() instead of iommu_sva_alloc_pasid(). So
We should consolidate at some point; there is no need to store the PASID in
two places.

> functionality is not a problem without this patch on x86. But I think
I feel the reason x86 doesn't care is that mm->pasid is not used unless
bind_mm is called. For forked children, even if mm->pasid is non-zero, it has
no effect since it is never loaded into the MSRs.
Perhaps you could also add a check or WARN_ON(!mm->pasid) in load_pasid()?

> we do need to have this patch in the kernel because PASID is per addr
> space and two addr spaces shouldn't have the same PASID.
> 
Agreed.
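For reference, a minimal sketch of the dropped change (the helper name and
the INIT_PASID spelling below are assumptions based on this thread, not the
exact posted patch):

/* kernel/fork.c: give every new mm a clean PASID */
static void mm_init_pasid(struct mm_struct *mm)
{
#ifdef CONFIG_IOMMU_SUPPORT
	mm->pasid = INIT_PASID;		/* 0, i.e. no PASID allocated yet */
#endif
}

/*
 * Called from mm_init(), so a child created without CLONE_VM (and an
 * exec'd process) starts with pasid == 0, and iommu_sva_alloc_pasid() or
 * ioasid_alloc() can then assign a fresh PASID per address space.
 */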

> Who will accept this patch?
> 
> Thanks.
> 
> -Fenghua


Thanks,

Jacob


Re: [PATCH 1/1] Revert "iommu/iova: Retry from last rb tree node if iova search fails"

2021-03-01 Thread John Garry

On 01/03/2021 13:20, Robin Murphy wrote:

FWIW, I'm 99% sure that what you really want is [1], but then you get
to battle against an unknown quantity of dodgy firmware instead.


Something which has not been said before is that this only happens for
strict mode.

I think that makes sense - once you *have* actually failed to allocate
from the 32-bit space, max32_alloc_size will make subsequent attempts
fail immediately. In non-strict mode you're most likely freeing 32-bit
IOVAs back to the tree - and thus reset max32_alloc_size - much less
often, and you'll make more total space available each time, both of
which will amortise the cost of getting back into that failed state
again. Conversely, the worst case in strict mode is to have multiple
threads getting into this pathological cycle:

1: allocate, get last available IOVA
2: allocate, fail and set max32_alloc_size
3: free one IOVA, reset max32_alloc_size, goto 1

Now, given the broken behaviour where the cached PFN can get stuck near
the bottom of the address space, step 2 might well have been faster and
more premature than it should have, but I hope you can appreciate that
relying on an allocator being broken at its fundamental purpose of
allocating is not a good or sustainable thing to do.


I figure that you're talking about 4e89dce72521 now. I would have liked 
to know which real-life problem it solved in practice.




While max32_alloc_size indirectly tracks the largest *contiguous*
available space, one of the ideas from which it grew was to simply keep

count of the total number of free PFNs. If you're really spending
significant time determining that the tree is full, as opposed to just
taking longer to eventually succeed, then it might be relatively
innocuous to tack on that semi-redundant extra accounting as a
self-contained quick fix for that worst case.
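A rough sketch of that extra accounting (field name and hook placement below
are illustrative only, not a tested patch):

/* struct iova_domain: track how many PFNs below dma_32bit_pfn are free */
unsigned long	free_32bit_pfns;

/* __alloc_and_insert_iova_range(): fail fast when the request cannot fit */
if (limit_pfn <= iovad->dma_32bit_pfn && size > iovad->free_32bit_pfns)
	return -ENOMEM;

/*
 * ...decrement free_32bit_pfns when an IOVA below dma_32bit_pfn is
 * inserted into the rb tree, and increment it again on free.
 */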


Anyway, we see ~50% throughput regression, which is intolerable. As seen
in [0], I put this down to the fact that we have so many IOVA requests
which exceed the rcache size limit, which means many RB tree accesses
for non-cacheable IOVAs, which are now slower.


I will attempt to prove this by increasing RCACHE RANGE, such that all 
IOVA sizes may be cached.




On another point, as for the long-term IOVA aging issue, it seems that there
is no conclusion there. However I did mention the issue of IOVA sizes
exceeding rcache size for that issue, so maybe we can find a common
solution. Similar to a fixed rcache depot size, it seems that having a
fixed rcache max size range value (at 6) doesn't scale either.

Well, I'd say that's more of a workload tuning thing than a scalability
one -


ok


a massive system with hundreds of CPUs that spends all day
flinging 1500-byte network packets around as fast as it can might be
happy with an even smaller value and using the saved memory for
something else. IIRC the value of 6 is a fairly arbitrary choice for a
tradeoff between expected utility and memory consumption, so making it a
Kconfig or command-line tuneable does seem like a sensible thing to explore.
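To make that concrete, a sketch of a boot-time knob (the parameter name and
plumbing are made up; this is not an existing option, and the per-domain
rcaches array would also need to be sized at runtime):

/* drivers/iommu/iova.c: allow overriding the rcache upper size limit */
static unsigned long iova_rcache_max_order = IOVA_RANGE_CACHE_MAX_SIZE;

static int __init iova_rcache_max_order_setup(char *str)
{
	return kstrtoul(str, 0, &iova_rcache_max_order);
}
early_param("iommu.iova_rcache_order", iova_rcache_max_order_setup);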


Even if it were configurable, wouldn't it make sense to have it 
configurable per IOVA domain?


Furthermore, as mentioned above, I still want to solve this IOVA aging 
issue, and this fixed RCACHE RANGE size seems to be at the center of 
that problem.





As for 4e89dce72521: even if it's proper to retry a failed alloc, it is not
always necessary. I mean, if we're only limiting ourselves to the 32-bit
subspace for this SAC trick and we fail the alloc, then we can try the
space above 32 bits first (if usable), and if that fails, retry there. I
don't see a need to retry the 32-bit subspace if we're not limited to it.
How about it? We tried that idea and it looks to just about restore
performance.
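Roughly, that reads like this in iommu_dma_alloc_iova() (a sketch only;
alloc_iova_fast_noretry() is a hypothetical variant that skips the
retry-from-the-top behaviour added by 4e89dce72521):

/* SAC trick: try the 32-bit subspace opportunistically, without retrying */
if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
	iova = alloc_iova_fast_noretry(iovad, iova_len,
				       DMA_BIT_MASK(32) >> shift);

/* Only the final attempt, which covers the whole usable space, retries
 * before giving up. */
if (!iova)
	iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift, true);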

The thing is, if you do have an actual PCI device where DAC might mean a
33% throughput loss and you're mapping a long-lived buffer, or you're on
one of these systems where firmware fails to document address limits and
using the full IOMMU address width quietly breaks things, then you
almost certainly *do* want the allocator to actually do a proper job of
trying to satisfy the given request.


If those conditions were true, then it seems quite a tenuous position, 
so trying to help that scenario in general terms will have limited efficacy.




Furthermore, what you propose is still fragile for your own use-case
anyway. If someone makes internal changes to the allocator - converts it
to a different tree structure, implements split locking for concurrency,
that sort of thing - and it fundamentally loses the dodgy cached32_node
behaviour which makes the initial failure unintentionally fast for your
workload's allocation pattern, that extra complexity will suddenly just
be dead weight and you'll probably be complaining of a performance
regression again.

We're talking about an allocation that you know you don't need to make,
and that you even expect to fail, so I still maintain that it's absurd
to focus on optimising for

Re: [PATCH 1/1] Revert "iommu/iova: Retry from last rb tree node if iova search fails"

2021-03-01 Thread Robin Murphy

On 2021-02-25 13:54, John Garry wrote:

On 29/01/2021 12:03, Robin Murphy wrote:

On 2021-01-29 09:48, Leizhen (ThunderTown) wrote:


Currently, we are thinking about a solution to the problem.
However, because the end of the v5.11 cycle is approaching, this patch is
sent first.


However, that commit was made for a reason - how do we justify that 
one thing being slow is more important than another thing being 
completely broken? It's not practical to just keep doing the patch 
hokey-cokey based on whoever shouts loudest :(



On 2021/1/29 17:21, Zhen Lei wrote:

This reverts commit 4e89dce725213d3d0b0475211b500eda4ef4bf2f.

We find that this patch has a great impact on performance. According to
our test, the IOPS decrease from 1655.6K to 893.5K, about half.

Hardware: 1 SAS expander with 12 SAS SSD
Command:  Only the main parameters are listed.
   fio bs=4k rw=read iodepth=128 cpus_allowed=0-127


FWIW, I'm 99% sure that what you really want is [1], but then you get 
to battle against an unknown quantity of dodgy firmware instead.




Something which has not been said before is that this only happens for 
strict mode.


I think that makes sense - once you *have* actually failed to allocate 
from the 32-bit space, max32_alloc_size will make subsequent attempts 
fail immediately. In non-strict mode you're most likely freeing 32-bit 
IOVAs back to the tree - and thus reset max32_alloc_size - much less 
often, and you'll make more total space available each time, both of 
which will amortise the cost of getting back into that failed state 
again. Conversely, the worst case in strict mode is to have multiple 
threads getting into this pathological cycle:


1: allocate, get last available IOVA
2: allocate, fail and set max32_alloc_size
3: free one IOVA, reset max32_alloc_size, goto 1

Now, given the broken behaviour where the cached PFN can get stuck near 
the bottom of the address space, step 2 might well have been faster and 
more premature than it should have, but I hope you can appreciate that 
relying on an allocator being broken at its fundamental purpose of 
allocating is not a good or sustainable thing to do.


While max32_alloc_size indirectly tracks the largest *contiguous* 
available space, one of the ideas from which it grew was to simply keep 
count of the total number of free PFNs. If you're really spending 
significant time determining that the tree is full, as opposed to just 
taking longer to eventually succeed, then it might be relatively 
innocuous to tack on that semi-redundant extra accounting as a 
self-contained quick fix for that worst case.


Anyway, we see ~50% throughput regression, which is intolerable. As seen 
in [0], I put this down to the fact that we have so many IOVA requests 
which exceed the rcache size limit, which means many RB tree accesses 
for non-cacheable IOVAs, which are now slower.


On another point, as for the long-term IOVA aging issue, it seems that there 
is no conclusion there. However I did mention the issue of IOVA sizes 
exceeding rcache size for that issue, so maybe we can find a common 
solution. Similar to a fixed rcache depot size, it seems that having a 
fixed rcache max size range value (at 6) doesn't scale either.


Well, I'd say that's more of a workload tuning thing than a scalability 
one - a massive system with hundreds of CPUs that spends all day 
flinging 1500-byte network packets around as fast as it can might be 
happy with an even smaller value and using the saved memory for 
something else. IIRC the value of 6 is a fairly arbitrary choice for a 
tradeoff between expected utility and memory consumption, so making it a 
Kconfig or command-line tuneable does seem like a sensible thing to explore.


As for 4e89dce72521: even if it's proper to retry a failed alloc, it is not
always necessary. I mean, if we're only limiting ourselves to the 32-bit
subspace for this SAC trick and we fail the alloc, then we can try the
space above 32 bits first (if usable), and if that fails, retry there. I
don't see a need to retry the 32-bit subspace if we're not limited to it.
How about it? We tried that idea and it looks to just about restore
performance.


The thing is, if you do have an actual PCI device where DAC might mean a 
33% throughput loss and you're mapping a long-lived buffer, or you're on 
one of these systems where firmware fails to document address limits and 
using the full IOMMU address width quietly breaks things, then you 
almost certainly *do* want the allocator to actually do a proper job of 
trying to satisfy the given request.


Furthermore, what you propose is still fragile for your own use-case 
anyway. If someone makes internal changes to the allocator - converts it 
to a different tree structure, implements split locking for concurrency, 
that sort of thing - and it fundamentally loses the dodgy cached32_node 
behaviour which makes the initial failure unintentionally fast for your 
workload's allocation pattern, that extra complexity wi

[PATCH 0/3] iommu/iova: Add CPU hotplug handler to flush rcaches to core code

2021-03-01 Thread John Garry
The Intel IOMMU driver supports flushing the per-CPU rcaches when a CPU is
offlined.

Let's move it to core code, so everyone can take advantage.

Also correct a code comment.

Based on v5.12-rc1. Tested on arm64 only.

John Garry (3):
  iova: Add CPU hotplug handler to flush rcaches
  iommu/vt-d: Remove IOVA domain rcache flushing for CPU offlining
  iova: Correct comment for free_cpu_cached_iovas()

 drivers/iommu/intel/iommu.c | 31 ---
 drivers/iommu/iova.c| 32 ++--
 include/linux/cpuhotplug.h  |  2 +-
 include/linux/iova.h|  1 +
 4 files changed, 32 insertions(+), 34 deletions(-)

-- 
2.26.2



[PATCH 2/3] iommu/vt-d: Remove IOVA domain rcache flushing for CPU offlining

2021-03-01 Thread John Garry
Now that the core code handles flushing per-IOVA domain CPU rcaches,
remove the handling here.

Signed-off-by: John Garry 
---
 drivers/iommu/intel/iommu.c | 31 ---
 include/linux/cpuhotplug.h  |  1 -
 2 files changed, 32 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee0932307d64..d1e66e1b07b8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4065,35 +4065,6 @@ static struct notifier_block intel_iommu_memory_nb = {
.priority = 0
 };
 
-static void free_all_cpu_cached_iovas(unsigned int cpu)
-{
-   int i;
-
-   for (i = 0; i < g_num_of_iommus; i++) {
-   struct intel_iommu *iommu = g_iommus[i];
-   struct dmar_domain *domain;
-   int did;
-
-   if (!iommu)
-   continue;
-
-   for (did = 0; did < cap_ndoms(iommu->cap); did++) {
-   domain = get_iommu_domain(iommu, (u16)did);
-
-   if (!domain || domain->domain.type != IOMMU_DOMAIN_DMA)
-   continue;
-
-   iommu_dma_free_cpu_cached_iovas(cpu, &domain->domain);
-   }
-   }
-}
-
-static int intel_iommu_cpu_dead(unsigned int cpu)
-{
-   free_all_cpu_cached_iovas(cpu);
-   return 0;
-}
-
 static void intel_disable_iommus(void)
 {
struct intel_iommu *iommu = NULL;
@@ -4388,8 +4359,6 @@ int __init intel_iommu_init(void)
bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
if (si_domain && !hw_pass_through)
register_memory_notifier(&intel_iommu_memory_nb);
-   cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
- intel_iommu_cpu_dead);
 
down_read(&dmar_global_lock);
if (probe_acpi_namespace_devices())
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index cedac9986557..85996494bec1 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -57,7 +57,6 @@ enum cpuhp_state {
CPUHP_PAGE_ALLOC_DEAD,
CPUHP_NET_DEV_DEAD,
CPUHP_PCI_XGENE_DEAD,
-   CPUHP_IOMMU_INTEL_DEAD,
CPUHP_IOMMU_IOVA_DEAD,
CPUHP_LUSTRE_CFS_DEAD,
CPUHP_AP_ARM_CACHE_B15_RAC_DEAD,
-- 
2.26.2



[PATCH 1/3] iova: Add CPU hotplug handler to flush rcaches

2021-03-01 Thread John Garry
Like the Intel IOMMU driver already does, flush the per-IOVA domain
CPU rcache when a CPU goes offline - there's no point in keeping it.

Signed-off-by: John Garry 
---
 drivers/iommu/iova.c   | 30 +-
 include/linux/cpuhotplug.h |  1 +
 include/linux/iova.h   |  1 +
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index e6e2fa85271c..c78312560425 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -25,6 +25,17 @@ static void init_iova_rcaches(struct iova_domain *iovad);
 static void free_iova_rcaches(struct iova_domain *iovad);
 static void fq_destroy_all_entries(struct iova_domain *iovad);
 static void fq_flush_timeout(struct timer_list *t);
+
+static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node)
+{
+   struct iova_domain *iovad;
+
+   iovad = hlist_entry_safe(node, struct iova_domain, cpuhp_dead);
+
+   free_cpu_cached_iovas(cpu, iovad);
+   return 0;
+}
+
 static void free_global_cached_iovas(struct iova_domain *iovad);
 
 void
@@ -51,6 +62,7 @@ init_iova_domain(struct iova_domain *iovad, unsigned long 
granule,
iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
+   cpuhp_state_add_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD, 
&iovad->cpuhp_dead);
init_iova_rcaches(iovad);
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
@@ -257,10 +269,21 @@ int iova_cache_get(void)
 {
mutex_lock(&iova_cache_mutex);
if (!iova_cache_users) {
+   int ret;
+
+   ret = cpuhp_setup_state_multi(CPUHP_IOMMU_IOVA_DEAD, 
"iommu/iova:dead", NULL,
+   iova_cpuhp_dead);
+   if (ret) {
+   mutex_unlock(&iova_cache_mutex);
+   pr_err("Couldn't register cpuhp handler\n");
+   return ret;
+   }
+
iova_cache = kmem_cache_create(
"iommu_iova", sizeof(struct iova), 0,
SLAB_HWCACHE_ALIGN, NULL);
if (!iova_cache) {
+   cpuhp_remove_multi_state(CPUHP_IOMMU_IOVA_DEAD);
mutex_unlock(&iova_cache_mutex);
pr_err("Couldn't create iova cache\n");
return -ENOMEM;
@@ -282,8 +305,10 @@ void iova_cache_put(void)
return;
}
iova_cache_users--;
-   if (!iova_cache_users)
+   if (!iova_cache_users) {
+   cpuhp_remove_multi_state(CPUHP_IOMMU_IOVA_DEAD);
kmem_cache_destroy(iova_cache);
+   }
mutex_unlock(&iova_cache_mutex);
 }
 EXPORT_SYMBOL_GPL(iova_cache_put);
@@ -606,6 +631,9 @@ void put_iova_domain(struct iova_domain *iovad)
 {
struct iova *iova, *tmp;
 
+   cpuhp_state_remove_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD,
+   &iovad->cpuhp_dead);
+
free_iova_flush_queue(iovad);
free_iova_rcaches(iovad);
rbtree_postorder_for_each_entry_safe(iova, tmp, &iovad->rbroot, node)
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index f14adb882338..cedac9986557 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -58,6 +58,7 @@ enum cpuhp_state {
CPUHP_NET_DEV_DEAD,
CPUHP_PCI_XGENE_DEAD,
CPUHP_IOMMU_INTEL_DEAD,
+   CPUHP_IOMMU_IOVA_DEAD,
CPUHP_LUSTRE_CFS_DEAD,
CPUHP_AP_ARM_CACHE_B15_RAC_DEAD,
CPUHP_PADATA_DEAD,
diff --git a/include/linux/iova.h b/include/linux/iova.h
index c834c01c0a5b..4be6c0ab4997 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -95,6 +95,7 @@ struct iova_domain {
   flush-queues */
atomic_t fq_timer_on;   /* 1 when timer is active, 0
   when not */
+   struct hlist_node   cpuhp_dead;
 };
 
 static inline unsigned long iova_size(struct iova *iova)
-- 
2.26.2



[PATCH 3/3] iova: Correct comment for free_cpu_cached_iovas()

2021-03-01 Thread John Garry
Function free_cpu_cached_iovas() is not only called when a CPU is
hotplugged, so remove that part of the code comment.

Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index c78312560425..465b3b0eeeb0 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -996,7 +996,7 @@ static void free_iova_rcaches(struct iova_domain *iovad)
 }
 
 /*
- * free all the IOVA ranges cached by a cpu (used when cpu is unplugged)
+ * free all the IOVA ranges cached by a cpu
  */
 void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad)
 {
-- 
2.26.2



[PATCH 6/6] media: uvcvideo: Use dma_alloc_noncontiguous API

2021-03-01 Thread Christoph Hellwig
From: Ricardo Ribalda 

On architectures where there is no coherent caching, such as ARM, use the
dma_alloc_noncontiguous API and handle the cache flushing manually using
dma_sync_sgtable().

With this patch, on the affected architectures we can measure up to a 20x
performance improvement in uvc_video_copy_data_work().

Eg: aarch64 with an external usb camera

NON_CONTIGUOUS
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 67034480 : duration 33303
FPS: 29.99
URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS)
decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
raw decode speed: 9.931 Gbits/s
raw URB handling speed: 1.025 Gbits/s
throughput: 16.102 Mbits/s
URB decode CPU usage 0.162600 %

COHERENT
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 54683536 : duration 33302
FPS: 29.99
URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS)
header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS)
decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max (uS)
raw decode speed: 365.470 Mbits/s
raw URB handling speed: 295.986 Mbits/s
throughput: 13.136 Mbits/s
URB decode CPU usage 3.594500 %

Signed-off-by: Ricardo Ribalda 
Reviewed-by: Tomasz Figa 
Signed-off-by: Christoph Hellwig 
---
 drivers/media/usb/uvc/uvc_video.c | 79 ++-
 drivers/media/usb/uvc/uvcvideo.h  |  4 +-
 2 files changed, 60 insertions(+), 23 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_video.c 
b/drivers/media/usb/uvc/uvc_video.c
index f2f565281e63ff..d008c68fb6c806 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -6,11 +6,13 @@
  *  Laurent Pinchart (laurent.pinch...@ideasonboard.com)
  */
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1096,6 +1098,26 @@ static int uvc_video_decode_start(struct uvc_streaming 
*stream,
return data[0];
 }
 
+static inline struct device *stream_to_dmadev(struct uvc_streaming *stream)
+{
+   return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
+}
+
+static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
+{
+   struct device *dma_dev = stream_to_dmadev(uvc_urb->stream);
+
+   if (for_device) {
+   dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt,
+   DMA_FROM_DEVICE);
+   } else {
+   dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt,
+DMA_FROM_DEVICE);
+   invalidate_kernel_vmap_range(uvc_urb->buffer,
+uvc_urb->stream->urb_size);
+   }
+}
+
 /*
  * uvc_video_decode_data_work: Asynchronous memcpy processing
  *
@@ -1117,6 +1139,8 @@ static void uvc_video_copy_data_work(struct work_struct 
*work)
uvc_queue_buffer_release(op->buf);
}
 
+   uvc_urb_dma_sync(uvc_urb, true);
+
ret = usb_submit_urb(uvc_urb->urb, GFP_KERNEL);
if (ret < 0)
dev_err(&uvc_urb->stream->intf->dev,
@@ -1541,10 +1565,12 @@ static void uvc_video_complete(struct urb *urb)
 * Process the URB headers, and optionally queue expensive memcpy tasks
 * to be deferred to a work queue.
 */
+   uvc_urb_dma_sync(uvc_urb, false);
stream->decode(uvc_urb, buf, buf_meta);
 
/* If no async work is needed, resubmit the URB immediately. */
if (!uvc_urb->async_operations) {
+   uvc_urb_dma_sync(uvc_urb, true);
ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC);
if (ret < 0)
dev_err(&stream->intf->dev,
@@ -1560,24 +1586,46 @@ static void uvc_video_complete(struct urb *urb)
  */
 static void uvc_free_urb_buffers(struct uvc_streaming *stream)
 {
+   struct device *dma_dev = stream_to_dmadev(stream);
struct uvc_urb *uvc_urb;
 
for_each_uvc_urb(uvc_urb, stream) {
if (!uvc_urb->buffer)
continue;
 
-#ifndef CONFIG_DMA_NONCOHERENT
-   usb_free_coherent(stream->dev->udev, stream->urb_size,
- uvc_urb->buffer, uvc_urb->dma);
-#else
-   kfree(uvc_urb->buffer);
-#endif
+   dma_vunmap_noncontiguous(dma_dev, uvc_urb->buffer);
+   dma_free_noncontiguous(dma_dev, stream->urb_size, uvc_urb->sgt,
+  DMA_FROM_DEVICE);
+
 

[PATCH 5/6] dma-iommu: implement ->alloc_noncontiguous

2021-03-01 Thread Christoph Hellwig
Implement support for allocating a non-contiguous DMA region.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Tomasz Figa 
Tested-by: Ricardo Ribalda 
---
 drivers/iommu/dma-iommu.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b4d7bfffb3a0d2..714fa930d7b576 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -707,6 +707,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct 
device *dev,
goto out_free_sg;
 
sgt->sgl->dma_address = iova;
+   sgt->sgl->dma_length = size;
return pages;
 
 out_free_sg:
@@ -744,6 +745,37 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
return NULL;
 }
 
+#ifdef CONFIG_DMA_REMAP
+static struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev,
+   size_t size, enum dma_data_direction dir, gfp_t gfp,
+   unsigned long attrs)
+{
+   struct dma_sgt_handle *sh;
+
+   sh = kmalloc(sizeof(*sh), gfp);
+   if (!sh)
+   return NULL;
+
+   sh->pages = __iommu_dma_alloc_noncontiguous(dev, size, &sh->sgt, gfp,
+   PAGE_KERNEL, attrs);
+   if (!sh->pages) {
+   kfree(sh);
+   return NULL;
+   }
+   return &sh->sgt;
+}
+
+static void iommu_dma_free_noncontiguous(struct device *dev, size_t size,
+   struct sg_table *sgt, enum dma_data_direction dir)
+{
+   struct dma_sgt_handle *sh = sgt_handle(sgt);
+
+   __iommu_dma_unmap(dev, sgt->sgl->dma_address, size);
+   __iommu_dma_free_pages(sh->pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
+   sg_free_table(&sh->sgt);
+}
+#endif /* CONFIG_DMA_REMAP */
+
 static void iommu_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
@@ -1260,6 +1292,10 @@ static const struct dma_map_ops iommu_dma_ops = {
.free   = iommu_dma_free,
.alloc_pages= dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
+#ifdef CONFIG_DMA_REMAP
+   .alloc_noncontiguous= iommu_dma_alloc_noncontiguous,
+   .free_noncontiguous = iommu_dma_free_noncontiguous,
+#endif
.mmap   = iommu_dma_mmap,
.get_sgtable= iommu_dma_get_sgtable,
.map_page   = iommu_dma_map_page,
-- 
2.29.2



[PATCH 4/6] dma-iommu: refactor iommu_dma_alloc_remap

2021-03-01 Thread Christoph Hellwig
Split out a new helper that only allocates a sg_table worth of
memory without mapping it into contiguous kernel address space.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Tomasz Figa 
Tested-by: Ricardo Ribalda 
---
 drivers/iommu/dma-iommu.c | 67 ---
 1 file changed, 35 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9ab6ee22c11088..b4d7bfffb3a0d2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -649,23 +649,12 @@ static struct page **__iommu_dma_alloc_pages(struct 
device *dev,
return pages;
 }
 
-/**
- * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space
- * @dev: Device to allocate memory for. Must be a real device
- *  attached to an iommu_dma_domain
- * @size: Size of buffer in bytes
- * @dma_handle: Out argument for allocated DMA handle
- * @gfp: Allocation flags
- * @prot: pgprot_t to use for the remapped mapping
- * @attrs: DMA attributes for this allocation
- *
- * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
+/*
+ * If size is less than PAGE_SIZE, then a full CPU page will be allocated,
  * but an IOMMU which supports smaller pages might not map the whole thing.
- *
- * Return: Mapped virtual address, or NULL on failure.
  */
-static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
-   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
+static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
+   size_t size, struct sg_table *sgt, gfp_t gfp, pgprot_t prot,
unsigned long attrs)
 {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -675,11 +664,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
struct page **pages;
-   struct sg_table sgt;
dma_addr_t iova;
-   void *vaddr;
-
-   *dma_handle = DMA_MAPPING_ERROR;
 
if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
iommu_deferred_attach(dev, domain))
@@ -706,38 +691,56 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
if (!iova)
goto out_free_pages;
 
-   if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
+   if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL))
goto out_free_iova;
 
if (!(ioprot & IOMMU_CACHE)) {
struct scatterlist *sg;
int i;
 
-   for_each_sg(sgt.sgl, sg, sgt.orig_nents, i)
+   for_each_sg(sgt->sgl, sg, sgt->orig_nents, i)
arch_dma_prep_coherent(sg_page(sg), sg->length);
}
 
-   if (iommu_map_sg_atomic(domain, iova, sgt.sgl, sgt.orig_nents, ioprot)
+   if (iommu_map_sg_atomic(domain, iova, sgt->sgl, sgt->orig_nents, ioprot)
< size)
goto out_free_sg;
 
+   sgt->sgl->dma_address = iova;
+   return pages;
+
+out_free_sg:
+   sg_free_table(sgt);
+out_free_iova:
+   iommu_dma_free_iova(cookie, iova, size, NULL);
+out_free_pages:
+   __iommu_dma_free_pages(pages, count);
+   return NULL;
+}
+
+static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
+   unsigned long attrs)
+{
+   struct page **pages;
+   struct sg_table sgt;
+   void *vaddr;
+
+   pages = __iommu_dma_alloc_noncontiguous(dev, size, &sgt, gfp, prot,
+   attrs);
+   if (!pages)
+   return NULL;
+   *dma_handle = sgt.sgl->dma_address;
+   sg_free_table(&sgt);
vaddr = dma_common_pages_remap(pages, size, prot,
__builtin_return_address(0));
if (!vaddr)
goto out_unmap;
-
-   *dma_handle = iova;
-   sg_free_table(&sgt);
return vaddr;
 
 out_unmap:
-   __iommu_dma_unmap(dev, iova, size);
-out_free_sg:
-   sg_free_table(&sgt);
-out_free_iova:
-   iommu_dma_free_iova(cookie, iova, size, NULL);
-out_free_pages:
-   __iommu_dma_free_pages(pages, count);
+   __iommu_dma_unmap(dev, *dma_handle, size);
+   __iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
return NULL;
 }
 
-- 
2.29.2



[PATCH 3/6] dma-mapping: add a dma_alloc_noncontiguous API

2021-03-01 Thread Christoph Hellwig
Add a new API that returns a potentially virtually non-contiguous sg_table
and a DMA address.  This API is only properly implemented for dma-iommu
and will simply return a contiguous chunk as a fallback.

The intent is that drivers can use this API if either:

 - no kernel mapping or only temporary kernel mappings are required.
   That is, as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING.
 - a kernel mapping is required for cached and DMA mapped pages, but
   the driver also needs the pages to e.g. map them to userspace.
   In that sense it is a replacement for some aspects of the recently
   removed and never fully implemented DMA_ATTR_NON_CONSISTENT (see the
   sketch below).
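A rough usage sketch of the new calls (hypothetical driver code, not part of
this patch; dev and size stand for whatever the driver already has at hand):

struct sg_table *sgt;
void *vaddr;

sgt = dma_alloc_noncontiguous(dev, size, DMA_FROM_DEVICE, GFP_KERNEL, 0);
if (!sgt)
	return -ENOMEM;

/* optional contiguous kernel mapping for CPU access to the buffer */
vaddr = dma_vmap_noncontiguous(dev, size, sgt);

dma_sync_sgtable_for_device(dev, sgt, DMA_FROM_DEVICE);
/* ... device writes into the buffer ... */
dma_sync_sgtable_for_cpu(dev, sgt, DMA_FROM_DEVICE);
/* ... CPU reads through vaddr or the individual pages ... */

dma_vunmap_noncontiguous(dev, vaddr);
dma_free_noncontiguous(dev, size, sgt, DMA_FROM_DEVICE);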

Signed-off-by: Christoph Hellwig 
Reviewed-by: Tomasz Figa 
Tested-by: Ricardo Ribalda 
---
 Documentation/core-api/dma-api.rst |  78 +
 include/linux/dma-map-ops.h|  19 ++
 include/linux/dma-mapping.h|  32 +
 kernel/dma/mapping.c   | 106 +
 4 files changed, 235 insertions(+)

diff --git a/Documentation/core-api/dma-api.rst 
b/Documentation/core-api/dma-api.rst
index 157a474ae54416..00a1d4fa3f9e4e 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -594,6 +594,84 @@ dev, size, dma_handle and dir must all be the same as 
those passed into
 dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
 dma_alloc_noncoherent().
 
+::
+
+   struct sg_table *
+   dma_alloc_noncontiguous(struct device *dev, size_t size,
+   enum dma_data_direction dir, gfp_t gfp,
+   unsigned long attrs);
+
+This routine allocates <size> bytes of non-coherent and possibly
non-contiguous
+memory.  It returns a pointer to struct sg_table that describes the allocated
+and DMA mapped memory, or NULL if the allocation failed. The resulting memory
+can be used for anything a struct page mapped into a scatterlist is suitable for.
+
+The returned sg_table is guaranteed to have a single DMA-mapped segment, as
+indicated by sgt->nents, but it might have multiple CPU side segments as
+indicated by sgt->orig_nents.
+
+The dir parameter specifies whether data is read and/or written by the device,
+see dma_map_single() for details.
+
+The gfp parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation, but rejects flags used to specify a memory
+zone such as GFP_DMA or GFP_HIGHMEM.
+
+The attrs argument must be either 0 or DMA_ATTR_ALLOC_SINGLE_PAGES.
+
+Before giving the memory to the device, dma_sync_sgtable_for_device() needs
+to be called, and before reading memory written by the device,
+dma_sync_sgtable_for_cpu(), just like for streaming DMA mappings that are
+reused.
+
+::
+
+   void
+   dma_free_noncontiguous(struct device *dev, size_t size,
+  struct sg_table *sgt,
+  enum dma_data_direction dir)
+
+Free memory previously allocated using dma_alloc_noncontiguous().  dev, size,
+and dir must all be the same as those passed into dma_alloc_noncontiguous().
+sgt must be the pointer returned by dma_alloc_noncontiguous().
+
+::
+
+   void *
+   dma_vmap_noncontiguous(struct device *dev, size_t size,
+   struct sg_table *sgt)
+
+Return a contiguous kernel mapping for an allocation returned from
+dma_alloc_noncontiguous().  dev and size must be the same as those passed into
+dma_alloc_noncontiguous().  sgt must be the pointer returned by
+dma_alloc_noncontiguous().
+
+Once a non-contiguous allocation is mapped using this function, the
+flush_kernel_vmap_range() and invalidate_kernel_vmap_range() APIs must be used
+to manage the coherency between the kernel mapping, the device and user space
+mappings (if any).
+
+::
+
+   void
+   dma_vunmap_noncontiguous(struct device *dev, void *vaddr)
+
+Unmap a kernel mapping returned by dma_vmap_noncontiguous().  dev must be the
+same as the one passed into dma_alloc_noncontiguous().  vaddr must be the pointer
+returned by dma_vmap_noncontiguous().
+
+
+::
+
+   int
+   dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma,
+  size_t size, struct sg_table *sgt)
+
+Map an allocation returned from dma_alloc_noncontiguous() into a user address
+space.  dev and size must be the same as those passed into
+dma_alloc_noncontiguous().  sgt must be the pointer returned by
+dma_alloc_noncontiguous().
+
 ::
 
int
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 51872e736e7b1d..0d53a96a3d641f 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -22,6 +22,11 @@ struct dma_map_ops {
gfp_t gfp);
void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
dma_addr_t dma_handle, enum dma_data_direction dir);
+   struct sg_table *(*alloc_noncontiguous)(struct device *dev, size_t size,
+

[PATCH 2/6] dma-mapping: refactor dma_{alloc,free}_pages

2021-03-01 Thread Christoph Hellwig
Factor out internal versions without the dma_debug calls, in preparation
for callers that will need different dma_debug calls.

Note that this changes the dma_debug calls to see the non-page-aligned
size values, but as long as alloc and free agree on one variant we are
fine.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Tomasz Figa 
Tested-by: Ricardo Ribalda 
---
 kernel/dma/mapping.c | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 9ce86c77651c6f..07f964ebcda15e 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -477,11 +477,10 @@ void dma_free_attrs(struct device *dev, size_t size, void 
*cpu_addr,
 }
 EXPORT_SYMBOL(dma_free_attrs);
 
-struct page *dma_alloc_pages(struct device *dev, size_t size,
+static struct page *__dma_alloc_pages(struct device *dev, size_t size,
dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
-   struct page *page;
 
if (WARN_ON_ONCE(!dev->coherent_dma_mask))
return NULL;
@@ -490,31 +489,41 @@ struct page *dma_alloc_pages(struct device *dev, size_t 
size,
 
size = PAGE_ALIGN(size);
if (dma_alloc_direct(dev, ops))
-   page = dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
-   else if (ops->alloc_pages)
-   page = ops->alloc_pages(dev, size, dma_handle, dir, gfp);
-   else
+   return dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
+   if (!ops->alloc_pages)
return NULL;
+   return ops->alloc_pages(dev, size, dma_handle, dir, gfp);
+}
 
-   debug_dma_map_page(dev, page, 0, size, dir, *dma_handle);
+struct page *dma_alloc_pages(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+   struct page *page = __dma_alloc_pages(dev, size, dma_handle, dir, gfp);
 
+   if (page)
+   debug_dma_map_page(dev, page, 0, size, dir, *dma_handle);
return page;
 }
 EXPORT_SYMBOL_GPL(dma_alloc_pages);
 
-void dma_free_pages(struct device *dev, size_t size, struct page *page,
+static void __dma_free_pages(struct device *dev, size_t size, struct page 
*page,
dma_addr_t dma_handle, enum dma_data_direction dir)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
 
size = PAGE_ALIGN(size);
-   debug_dma_unmap_page(dev, dma_handle, size, dir);
-
if (dma_alloc_direct(dev, ops))
dma_direct_free_pages(dev, size, page, dma_handle, dir);
else if (ops->free_pages)
ops->free_pages(dev, size, page, dma_handle, dir);
 }
+
+void dma_free_pages(struct device *dev, size_t size, struct page *page,
+   dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+   debug_dma_unmap_page(dev, dma_handle, size, dir);
+   __dma_free_pages(dev, size, page, dma_handle, dir);
+}
 EXPORT_SYMBOL_GPL(dma_free_pages);
 
 int dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
-- 
2.29.2



[PATCH 1/6] dma-mapping: add a dma_mmap_pages helper

2021-03-01 Thread Christoph Hellwig
Add a helper to map memory allocated using dma_alloc_pages into
a user address space, similar to the dma_alloc_attrs function for
coherent allocations.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Tomasz Figa 
Tested-by: Ricardo Ribalda 
---
 Documentation/core-api/dma-api.rst | 10 ++
 include/linux/dma-mapping.h|  2 ++
 kernel/dma/mapping.c   | 13 +
 3 files changed, 25 insertions(+)

diff --git a/Documentation/core-api/dma-api.rst 
b/Documentation/core-api/dma-api.rst
index e6d23f117308df..157a474ae54416 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -563,6 +563,16 @@ Free a region of memory previously allocated using 
dma_alloc_pages().
 dev, size, dma_handle and dir must all be the same as those passed into
 dma_alloc_pages().  page must be the pointer returned by dma_alloc_pages().
 
+::
+
+   int
+   dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
+  size_t size, struct page *page)
+
+Map an allocation returned from dma_alloc_pages() into a user address space.
+dev and size must be the same as those passed into dma_alloc_pages().
+page must be the pointer returned by dma_alloc_pages().
+
 ::
 
void *
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2a984cb4d1e037..2b8dce756e1fa1 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -263,6 +263,8 @@ struct page *dma_alloc_pages(struct device *dev, size_t 
size,
dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
 void dma_free_pages(struct device *dev, size_t size, struct page *page,
dma_addr_t dma_handle, enum dma_data_direction dir);
+int dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
+   size_t size, struct page *page);
 
 static inline void *dma_alloc_noncoherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index b6a63367993328..9ce86c77651c6f 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -517,6 +517,19 @@ void dma_free_pages(struct device *dev, size_t size, 
struct page *page,
 }
 EXPORT_SYMBOL_GPL(dma_free_pages);
 
+int dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
+   size_t size, struct page *page)
+{
+   unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+   if (vma->vm_pgoff >= count || vma_pages(vma) > count - vma->vm_pgoff)
+   return -ENXIO;
+   return remap_pfn_range(vma, vma->vm_start,
+  page_to_pfn(page) + vma->vm_pgoff,
+  vma_pages(vma) << PAGE_SHIFT, vma->vm_page_prot);
+}
+EXPORT_SYMBOL_GPL(dma_mmap_pages);
+
 int dma_supported(struct device *dev, u64 mask)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
-- 
2.29.2



add a new dma_alloc_noncontiguous API v3

2021-03-01 Thread Christoph Hellwig
Hi all,

this series adds the new noncontiguous DMA allocation API requested by
various media driver maintainers.

Changes since v2:
 - rebased to Linux 5.12-rc1
 - dropped one already merged patch
 - pass an attrs argument to dma_alloc_noncontiguous
 - clarify the dma_vmap_noncontiguous documentation a bit
 - fix double assignments in uvcvideo

Changes since v1:
 - document that flush_kernel_vmap_range and invalidate_kernel_vmap_range
   must be called once an allocation is mapped into KVA
 - add dma-debug support
 - remove the separate dma_handle argument, and instead create fully formed
   DMA mapped scatterlists
 - use a directional allocation in uvcvideo
 - call invalidate_kernel_vmap_range from uvcvideo


[PATCH 17/17] iommu: remove iommu_domain_set_attr

2021-03-01 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/iommu.c | 17 -
 include/linux/iommu.h | 27 ---
 2 files changed, 44 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8490aefd4b41f8..b04e6cefe8520d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2668,23 +2668,6 @@ bool iommu_dma_use_flush_queue(struct iommu_domain 
*domain)
 }
 EXPORT_SYMBOL_GPL(iommu_dma_use_flush_queue);
 
-int iommu_domain_set_attr(struct iommu_domain *domain,
- enum iommu_attr attr, void *data)
-{
-   int ret = 0;
-
-   switch (attr) {
-   default:
-   if (domain->ops->domain_set_attr == NULL)
-   return -EINVAL;
-
-   ret = domain->ops->domain_set_attr(domain, attr, data);
-   }
-
-   return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
-
 int iommu_domain_enable_nesting(struct iommu_domain *domain)
 {
if (!domain->ops->domain_enable_nesting)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 39d3ed4d2700ac..62535f563aa491 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -97,20 +97,6 @@ enum iommu_cap {
IOMMU_CAP_NOEXEC,   /* IOMMU_NOEXEC flag */
 };
 
-/*
- * Following constraints are specifc to FSL_PAMUV1:
- *  -aperture must be power of 2, and naturally aligned
- *  -number of windows must be power of 2, and address space size
- *   of each window is determined by aperture size / # of windows
- *  -the actual size of the mapped region of a window must be power
- *   of 2 starting with 4KB and physical address must be naturally
- *   aligned.
- */
-
-enum iommu_attr {
-   DOMAIN_ATTR_MAX,
-};
-
 /* These are the possible reserved region types */
 enum iommu_resv_type {
/* Memory regions which must be mapped 1:1 at all times */
@@ -194,7 +180,6 @@ struct iommu_iotlb_gather {
  * @device_group: find iommu group for a particular device
  * @dma_use_flush_queue: Returns %true if a DMA flush queue is used
  * @dma_enable_flush_queue: Try to enable the DMA flush queue
- * @domain_set_attr: Change domain attributes
  * @domain_enable_nesting: Enable nesting
  * @domain_set_pgtable_attr: Set io page table attributes
  * @get_resv_regions: Request list of reserved regions for a device
@@ -247,8 +232,6 @@ struct iommu_ops {
struct iommu_group *(*device_group)(struct device *dev);
bool (*dma_use_flush_queue)(struct iommu_domain *domain);
void (*dma_enable_flush_queue)(struct iommu_domain *domain);
-   int (*domain_set_attr)(struct iommu_domain *domain,
-  enum iommu_attr attr, void *data);
int (*domain_enable_nesting)(struct iommu_domain *domain);
int (*domain_set_pgtable_attr)(struct iommu_domain *domain,
struct io_pgtable_domain_attr *pgtbl_cfg);
@@ -498,11 +481,7 @@ extern struct iommu_domain 
*iommu_group_default_domain(struct iommu_group *);
 bool iommu_dma_use_flush_queue(struct iommu_domain *domain);
 int iommu_domain_set_pgtable_attr(struct iommu_domain *domain,
struct io_pgtable_domain_attr *pgtbl_cfg);
-extern int iommu_domain_set_attr(struct iommu_domain *domain, enum iommu_attr,
-void *data);
 int iommu_domain_enable_nesting(struct iommu_domain *domain);
-int iommu_domain_set_pgtable_attr(struct iommu_domain *domain,
-   struct io_pgtable_domain_attr *pgtbl_cfg);
 
 extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
  unsigned long iova, int flags);
@@ -869,12 +848,6 @@ static inline int iommu_group_id(struct iommu_group *group)
return -ENODEV;
 }
 
-static inline int iommu_domain_set_attr(struct iommu_domain *domain,
-   enum iommu_attr attr, void *data)
-{
-   return -EINVAL;
-}
-
 static inline int  iommu_device_register(struct iommu_device *iommu)
 {
return -ENODEV;
-- 
2.29.2



[PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG

2021-03-01 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  2 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 40 +++--
 drivers/iommu/iommu.c   |  9 ++
 include/linux/iommu.h   |  9 +-
 4 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 0f184c3dd9d9ec..78d98ab2ee3a68 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain *iommu)
struct io_pgtable_domain_attr pgtbl_cfg;
 
pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;
-   iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg);
+   iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg);
 }
 
 struct msm_gem_address_space *
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2e17d990d04481..2858999c86dfd1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct 
iommu_domain *domain)
return ret;
 }
 
-static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
-   enum iommu_attr attr, void *data)
+static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain,
+   struct io_pgtable_domain_attr *pgtbl_cfg)
 {
-   int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   int ret = -EPERM;
 
-   mutex_lock(&smmu_domain->init_mutex);
-
-   switch(domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_IO_PGTABLE_CFG: {
-   struct io_pgtable_domain_attr *pgtbl_cfg = data;
-
-   if (smmu_domain->smmu) {
-   ret = -EPERM;
-   goto out_unlock;
-   }
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
 
-   smmu_domain->pgtbl_cfg = *pgtbl_cfg;
-   break;
-   }
-   default:
-   ret = -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   ret = -ENODEV;
-   break;
-   default:
-   ret = -EINVAL;
+   mutex_lock(&smmu_domain->init_mutex);
+   if (!smmu_domain->smmu) {
+   smmu_domain->pgtbl_cfg = *pgtbl_cfg;
+   ret = 0;
}
-out_unlock:
mutex_unlock(&smmu_domain->init_mutex);
+
return ret;
 }
 
@@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.dma_use_flush_queue= arm_smmu_dma_use_flush_queue,
.dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue,
-   .domain_set_attr= arm_smmu_domain_set_attr,
+   .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr,
.domain_enable_nesting  = arm_smmu_domain_enable_nesting,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2e9e058501a953..8490aefd4b41f8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain 
*domain)
 }
 EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting);
 
+int iommu_domain_set_pgtable_attr(struct iommu_domain *domain,
+   struct io_pgtable_domain_attr *pgtbl_cfg)
+{
+   if (!domain->ops->domain_set_pgtable_attr)
+   return -EINVAL;
+   return domain->ops->domain_set_pgtable_attr(domain, pgtbl_cfg);
+}
+EXPORT_SYMBOL_GPL(iommu_domain_set_pgtable_attr);
+
 void iommu_get_resv_regions(struct device *dev, struct list_head *list)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index aed88aa3bd3edf..39d3ed4d2700ac 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -40,6 +40,7 @@ struct iommu_domain;
 struct notifier_block;
 struct iommu_sva;
 struct iommu_fault_event;
+struct io_pgtable_domain_attr;
 
 /* iommu fault flags */
 #define IOMMU_FAULT_READ   0x0
@@ -107,7 +108,6 @@ enum iommu_cap {
  */
 
 enum iommu_attr {
-   DOMAIN_ATTR_IO_PGTABLE_CFG,
DOMAIN_ATTR_MAX,
 };
 
@@ -196,6 +196,7 @@ struct iommu_iotlb_gather {
  * @dma_enable_flush_queue: Try to enable the DMA flush queue
  * @domain_set_attr: Change domain attributes
  * @domain_enable_nesting: Enable nesting
+ * @domain_set_pgtable_attr: Set io page table attributes
  * @get_resv_regions: Request list of reserved regions for a device
  * @put_resv_regions: Free list of reserved regions for a device

[PATCH 15/17] iommu: remove DOMAIN_ATTR_NESTING

2021-03-01 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 40 ++---
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 30 ++--
 drivers/iommu/intel/iommu.c | 28 +--
 drivers/iommu/iommu.c   |  8 +
 drivers/vfio/vfio_iommu_type1.c |  5 +--
 include/linux/iommu.h   |  4 ++-
 6 files changed, 50 insertions(+), 65 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index bf96172e8c1f71..8e6fee3ea454d3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2466,41 +2466,21 @@ static void arm_smmu_dma_enable_flush_queue(struct 
iommu_domain *domain)
to_smmu_domain(domain)->non_strict = true;
 }
 
-static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
-   enum iommu_attr attr, void *data)
+static int arm_smmu_domain_enable_nesting(struct iommu_domain *domain)
 {
-   int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   int ret = -EPERM;
 
-   mutex_lock(&smmu_domain->init_mutex);
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
 
-   switch (domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_NESTING:
-   if (smmu_domain->smmu) {
-   ret = -EPERM;
-   goto out_unlock;
-   }
-
-   if (*(int *)data)
-   smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
-   else
-   smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
-   break;
-   default:
-   ret = -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   ret = -ENODEV;
-   break;
-   default:
-   ret = -EINVAL;
+   mutex_lock(&smmu_domain->init_mutex);
+   if (!smmu_domain->smmu) {
+   smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
+   ret = 0;
}
-
-out_unlock:
mutex_unlock(&smmu_domain->init_mutex);
+
return ret;
 }
 
@@ -2603,7 +2583,7 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.dma_use_flush_queue= arm_smmu_dma_use_flush_queue,
.dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue,
-   .domain_set_attr= arm_smmu_domain_set_attr,
+   .domain_enable_nesting  = arm_smmu_domain_enable_nesting,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index e7893e96f5177a..2e17d990d04481 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1497,6 +1497,24 @@ static void arm_smmu_dma_enable_flush_queue(struct 
iommu_domain *domain)
to_smmu_domain(domain)->pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 }
 
+static int arm_smmu_domain_enable_nesting(struct iommu_domain *domain)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   int ret = -EPERM;
+   
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
+
+   mutex_lock(&smmu_domain->init_mutex);
+   if (!smmu_domain->smmu) {
+   smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
+   ret = 0;
+   }
+   mutex_unlock(&smmu_domain->init_mutex);
+
+   return ret;
+}
+
 static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
enum iommu_attr attr, void *data)
 {
@@ -1508,17 +1526,6 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
switch(domain->type) {
case IOMMU_DOMAIN_UNMANAGED:
switch (attr) {
-   case DOMAIN_ATTR_NESTING:
-   if (smmu_domain->smmu) {
-   ret = -EPERM;
-   goto out_unlock;
-   }
-
-   if (*(int *)data)
-   smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
-   else
-   smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
-   break;
case DOMAIN_ATTR_IO_PGTABLE_CFG: {
struct io_pgtable_domain_attr *pgtbl_cfg = data;
 
@@ -1603,6 +1610,7 @@ static struct iommu_ops arm_smmu_ops = {
.dma_use_flush_queue= arm_smmu_dma_use_flush_queue,
.dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue,
.

[PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-01 Thread Christoph Hellwig
Use explicit methods for setting and querying the information instead.

Also remove the now unused iommu_domain_get_attr functionality.
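
As a rough caller-side sketch (not part of this patch, and the helper name is
made up), the DMA layer can now ask the driver directly instead of
round-tripping an int through the multiplexed attr interface:

	#include <linux/iommu.h>

	/* Hypothetical helper; the real callers live in dma-iommu.c. */
	static bool domain_uses_flush_queue(struct iommu_domain *domain)
	{
		/* drivers that do not implement the method never use one */
		if (!domain->ops->dma_use_flush_queue)
			return false;
		return domain->ops->dma_use_flush_queue(domain);
	}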

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/amd/iommu.c   | 23 ++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 ++---
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 56 +
 drivers/iommu/dma-iommu.c   |  8 ++-
 drivers/iommu/intel/iommu.c | 27 ++
 drivers/iommu/iommu.c   | 19 +++
 include/linux/iommu.h   | 17 ++-
 7 files changed, 51 insertions(+), 146 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b573e40d0..37a8e51db17656 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1771,24 +1771,11 @@ static struct iommu_group 
*amd_iommu_device_group(struct device *dev)
return acpihid_device_group(dev);
 }
 
-static int amd_iommu_domain_get_attr(struct iommu_domain *domain,
-   enum iommu_attr attr, void *data)
+static bool amd_iommu_dma_use_flush_queue(struct iommu_domain *domain)
 {
-   switch (domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   return -ENODEV;
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = !amd_iommu_unmap_flush;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return false;
+   return !amd_iommu_unmap_flush;
 }
 
 /*
@@ -2257,7 +2244,7 @@ const struct iommu_ops amd_iommu_ops = {
.release_device = amd_iommu_release_device,
.probe_finalize = amd_iommu_probe_finalize,
.device_group = amd_iommu_device_group,
-   .domain_get_attr = amd_iommu_domain_get_attr,
+   .dma_use_flush_queue = amd_iommu_dma_use_flush_queue,
.get_resv_regions = amd_iommu_get_resv_regions,
.put_resv_regions = generic_iommu_put_resv_regions,
.is_attach_deferred = amd_iommu_is_attach_deferred,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a8304375..bf96172e8c1f71 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2449,33 +2449,21 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
 }
 
-static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
-   enum iommu_attr attr, void *data)
+static bool arm_smmu_dma_use_flush_queue(struct iommu_domain *domain)
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   switch (domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_NESTING:
-   *(int *)data = (smmu_domain->stage == 
ARM_SMMU_DOMAIN_NESTED);
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = smmu_domain->non_strict;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return false;
+   return smmu_domain->non_strict;
+}
+
+
+static void arm_smmu_dma_enable_flush_queue(struct iommu_domain *domain)
+{
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return;
+   to_smmu_domain(domain)->non_strict = true;
 }
 
 static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
@@ -2505,13 +2493,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
}
break;
case IOMMU_DOMAIN_DMA:
-   switch(attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   smmu_domain->non_strict = *(int *)data;
-   break;
-   default:
-   ret = -ENODEV;
-   }
+   ret = -ENODEV;
break;
default:
ret = -EINVAL;
@@ -2619,7 +2601,8 @@ static struct iommu_ops arm_smmu_ops = {
.probe_device   = arm_smmu_probe_device,
.release_device = arm_smmu_release_device,
.device_group   = arm_smmu_device_group,
-   .domain_get_attr= arm_smmu_domain_get_attr,
+   .dma_use_flush_queue

[PATCH 13/17] iommu: remove DOMAIN_ATTR_GEOMETRY

2021-03-01 Thread Christoph Hellwig
The geometry information can be trivially queried from the iommu_domain
structure.
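
A sketch of what the conversion looks like at a call site (the helper below
is made up; the vfio and vdpa hunks in this patch are the real examples):

	#include <linux/iommu.h>

	/* Illustrative only: reading the aperture with and without the attr. */
	static dma_addr_t domain_aperture_end(struct iommu_domain *domain)
	{
		/*
		 * Before this patch a caller had to go through the attr API:
		 *
		 *	struct iommu_domain_geometry geo;
		 *
		 *	iommu_domain_get_attr(domain, DOMAIN_ATTR_GEOMETRY, &geo);
		 *	return geo.aperture_end;
		 *
		 * Now the geometry is read straight from the domain:
		 */
		return domain->geometry.aperture_end;
	}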

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/iommu.c   | 20 +++-
 drivers/soc/fsl/qbman/qman_portal.c |  1 +
 drivers/vfio/vfio_iommu_type1.c | 26 --
 drivers/vhost/vdpa.c| 10 +++---
 include/linux/iommu.h   |  1 -
 5 files changed, 19 insertions(+), 39 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a4cda390993e6..23daaea7883b75 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2667,23 +2667,9 @@ core_initcall(iommu_init);
 int iommu_domain_get_attr(struct iommu_domain *domain,
  enum iommu_attr attr, void *data)
 {
-   struct iommu_domain_geometry *geometry;
-   int ret = 0;
-
-   switch (attr) {
-   case DOMAIN_ATTR_GEOMETRY:
-   geometry  = data;
-   *geometry = domain->geometry;
-
-   break;
-   default:
-   if (!domain->ops->domain_get_attr)
-   return -EINVAL;
-
-   ret = domain->ops->domain_get_attr(domain, attr, data);
-   }
-
-   return ret;
+   if (!domain->ops->domain_get_attr)
+   return -EINVAL;
+   return domain->ops->domain_get_attr(domain, attr, data);
 }
 EXPORT_SYMBOL_GPL(iommu_domain_get_attr);
 
diff --git a/drivers/soc/fsl/qbman/qman_portal.c 
b/drivers/soc/fsl/qbman/qman_portal.c
index bf38eb0042ed52..4a4466cc26c232 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -53,6 +53,7 @@ static void portal_set_cpu(struct qm_portal_config *pcfg, int 
cpu)
dev_err(dev, "%s(): iommu_domain_alloc() failed", __func__);
goto no_iommu;
}
+
ret = fsl_pamu_configure_l1_stash(pcfg->iommu_domain, cpu);
if (ret < 0) {
dev_err(dev, "%s(): fsl_pamu_configure_l1_stash() = %d",
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4bb162c1d649b3..c8e57f22f421c5 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2252,7 +2252,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
int ret;
bool resv_msi, msi_remap;
phys_addr_t resv_msi_base = 0;
-   struct iommu_domain_geometry geo;
+   struct iommu_domain_geometry *geo;
LIST_HEAD(iova_copy);
LIST_HEAD(group_resv_regions);
 
@@ -2333,10 +2333,9 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
goto out_domain;
 
/* Get aperture info */
-   iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, &geo);
-
-   if (vfio_iommu_aper_conflict(iommu, geo.aperture_start,
-geo.aperture_end)) {
+   geo = &domain->domain->geometry;
+   if (vfio_iommu_aper_conflict(iommu, geo->aperture_start,
+geo->aperture_end)) {
ret = -EINVAL;
goto out_detach;
}
@@ -2359,8 +2358,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
if (ret)
goto out_detach;
 
-   ret = vfio_iommu_aper_resize(&iova_copy, geo.aperture_start,
-geo.aperture_end);
+   ret = vfio_iommu_aper_resize(&iova_copy, geo->aperture_start,
+geo->aperture_end);
if (ret)
goto out_detach;
 
@@ -2493,7 +2492,6 @@ static void vfio_iommu_aper_expand(struct vfio_iommu 
*iommu,
   struct list_head *iova_copy)
 {
struct vfio_domain *domain;
-   struct iommu_domain_geometry geo;
struct vfio_iova *node;
dma_addr_t start = 0;
dma_addr_t end = (dma_addr_t)~0;
@@ -2502,12 +2500,12 @@ static void vfio_iommu_aper_expand(struct vfio_iommu 
*iommu,
return;
 
list_for_each_entry(domain, &iommu->domain_list, next) {
-   iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
- &geo);
-   if (geo.aperture_start > start)
-   start = geo.aperture_start;
-   if (geo.aperture_end < end)
-   end = geo.aperture_end;
+   struct iommu_domain_geometry *geo = &domain->domain->geometry;
+
+   if (geo->aperture_start > start)
+   start = geo->aperture_start;
+   if (geo->aperture_end < end)
+   end = geo->aperture_end;
}
 
/* Modify aperture limits. The new aper is either same or bigger */
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ef688c8c0e0e6f..25824fab433d0a 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -826,18 +826,14 @@ static void vhost_vdpa_free_domain(struct vhost_vdpa *v)
 static void vhost_vdpa_set_i

[PATCH 12/17] iommu: remove DOMAIN_ATTR_PAGING

2021-03-01 Thread Christoph Hellwig
DOMAIN_ATTR_PAGING is never used.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/iommu.c | 5 -
 include/linux/iommu.h | 1 -
 2 files changed, 6 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b212bf0261820b..9a4cda390993e6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2668,7 +2668,6 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
  enum iommu_attr attr, void *data)
 {
struct iommu_domain_geometry *geometry;
-   bool *paging;
int ret = 0;
 
switch (attr) {
@@ -2676,10 +2675,6 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
geometry  = data;
*geometry = domain->geometry;
 
-   break;
-   case DOMAIN_ATTR_PAGING:
-   paging  = data;
-   *paging = (domain->pgsize_bitmap != 0UL);
break;
default:
if (!domain->ops->domain_get_attr)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 840864844027dc..180ff4bd7fa7ef 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -108,7 +108,6 @@ enum iommu_cap {
 
 enum iommu_attr {
DOMAIN_ATTR_GEOMETRY,
-   DOMAIN_ATTR_PAGING,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
DOMAIN_ATTR_IO_PGTABLE_CFG,
-- 
2.29.2



[PATCH 11/17] iommu/fsl_pamu: remove the snoop_id field

2021-03-01 Thread Christoph Hellwig
The snoop_id is always set to ~(u32)0.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/fsl_pamu_domain.c | 5 ++---
 drivers/iommu/fsl_pamu_domain.h | 1 -
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 21c6d9e79eddf9..701fc3f187a100 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -97,12 +97,12 @@ static int pamu_set_liodn(struct fsl_dma_domain 
*dma_domain, struct device *dev,
goto out_unlock;
ret = pamu_config_ppaace(liodn, geom->aperture_start,
 geom->aperture_end - 1, omi_index, 0,
-dma_domain->snoop_id, dma_domain->stash_id, 0);
+~(u32)0, dma_domain->stash_id, 0);
if (ret)
goto out_unlock;
ret = pamu_config_ppaace(liodn, geom->aperture_start,
 geom->aperture_end - 1, ~(u32)0,
-0, dma_domain->snoop_id, dma_domain->stash_id,
+0, ~(u32)0, dma_domain->stash_id,
 PAACE_AP_PERMS_QUERY | PAACE_AP_PERMS_UPDATE);
 out_unlock:
spin_unlock_irqrestore(&iommu_lock, flags);
@@ -210,7 +210,6 @@ static struct iommu_domain *fsl_pamu_domain_alloc(unsigned 
type)
return NULL;
 
dma_domain->stash_id = ~(u32)0;
-   dma_domain->snoop_id = ~(u32)0;
INIT_LIST_HEAD(&dma_domain->devices);
spin_lock_init(&dma_domain->domain_lock);
 
diff --git a/drivers/iommu/fsl_pamu_domain.h b/drivers/iommu/fsl_pamu_domain.h
index 5f4ed253f61b31..95ac1b3cab3b69 100644
--- a/drivers/iommu/fsl_pamu_domain.h
+++ b/drivers/iommu/fsl_pamu_domain.h
@@ -13,7 +13,6 @@ struct fsl_dma_domain {
/* list of devices associated with the domain */
struct list_headdevices;
u32 stash_id;
-   u32 snoop_id;
struct iommu_domain iommu_domain;
spinlock_t  domain_lock;
 };
-- 
2.29.2



[PATCH 10/17] iommu/fsl_pamu: enable the liodn when attaching a device

2021-03-01 Thread Christoph Hellwig
Instead of a separate call to enable all devices from the list, just
enable the liodn once the device is attached to the iommu domain.

This also removes the DOMAIN_ATTR_FSL_PAMU_ENABLE iommu_attr.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/fsl_pamu_domain.c | 47 ++---
 drivers/iommu/fsl_pamu_domain.h | 10 --
 drivers/soc/fsl/qbman/qman_portal.c | 11 ---
 include/linux/iommu.h   |  1 -
 4 files changed, 3 insertions(+), 66 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 962cdc1a4a1924..21c6d9e79eddf9 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -195,9 +195,6 @@ static void fsl_pamu_domain_free(struct iommu_domain 
*domain)
 
/* remove all the devices from the device list */
detach_device(NULL, dma_domain);
-
-   dma_domain->enabled = 0;
-
kmem_cache_free(fsl_pamu_domain_cache, dma_domain);
 }
 
@@ -285,6 +282,9 @@ static int fsl_pamu_attach_device(struct iommu_domain 
*domain,
ret = pamu_set_liodn(dma_domain, dev, liodn[i]);
if (ret)
break;
+   ret = pamu_enable_liodn(liodn[i]);
+   if (ret)
+   break;
}
spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
return ret;
@@ -341,46 +341,6 @@ int fsl_pamu_configure_l1_stash(struct iommu_domain 
*domain, u32 cpu)
return ret;
 }
 
-/* Configure domain dma state i.e. enable/disable DMA */
-static int configure_domain_dma_state(struct fsl_dma_domain *dma_domain, bool 
enable)
-{
-   struct device_domain_info *info;
-   unsigned long flags;
-   int ret;
-
-   spin_lock_irqsave(&dma_domain->domain_lock, flags);
-   dma_domain->enabled = enable;
-   list_for_each_entry(info, &dma_domain->devices, link) {
-   ret = (enable) ? pamu_enable_liodn(info->liodn) :
-   pamu_disable_liodn(info->liodn);
-   if (ret)
-   pr_debug("Unable to set dma state for liodn %d",
-info->liodn);
-   }
-   spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
-
-   return 0;
-}
-
-static int fsl_pamu_set_domain_attr(struct iommu_domain *domain,
-   enum iommu_attr attr_type, void *data)
-{
-   struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
-   int ret = 0;
-
-   switch (attr_type) {
-   case DOMAIN_ATTR_FSL_PAMU_ENABLE:
-   ret = configure_domain_dma_state(dma_domain, *(int *)data);
-   break;
-   default:
-   pr_debug("Unsupported attribute type\n");
-   ret = -EINVAL;
-   break;
-   }
-
-   return ret;
-}
-
 static struct iommu_group *get_device_iommu_group(struct device *dev)
 {
struct iommu_group *group;
@@ -505,7 +465,6 @@ static const struct iommu_ops fsl_pamu_ops = {
.attach_dev = fsl_pamu_attach_device,
.detach_dev = fsl_pamu_detach_device,
.iova_to_phys   = fsl_pamu_iova_to_phys,
-   .domain_set_attr = fsl_pamu_set_domain_attr,
.probe_device   = fsl_pamu_probe_device,
.release_device = fsl_pamu_release_device,
.device_group   = fsl_pamu_device_group,
diff --git a/drivers/iommu/fsl_pamu_domain.h b/drivers/iommu/fsl_pamu_domain.h
index cd488004acd1b3..5f4ed253f61b31 100644
--- a/drivers/iommu/fsl_pamu_domain.h
+++ b/drivers/iommu/fsl_pamu_domain.h
@@ -12,16 +12,6 @@
 struct fsl_dma_domain {
/* list of devices associated with the domain */
struct list_headdevices;
-   /* dma_domain states:
-* enabled - DMA has been enabled for the given
-* domain. This translates to setting of the
-* valid bit for the primary PAACE in the PAMU
-* PAACT table. Domain geometry should be set and
-* it must have a valid mapping before DMA can be
-* enabled for it.
-*
-*/
-   int enabled;
u32 stash_id;
u32 snoop_id;
struct iommu_domain iommu_domain;
diff --git a/drivers/soc/fsl/qbman/qman_portal.c 
b/drivers/soc/fsl/qbman/qman_portal.c
index 798b3a1ffd0b9c..bf38eb0042ed52 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -46,7 +46,6 @@ static void portal_set_cpu(struct qm_portal_config *pcfg, int 
cpu)
 {
 #ifdef CONFIG_FSL_PAMU
struct device *dev = pcfg->dev;
-   int window_count = 1;
int ret;
 
pcfg->iommu_domain = iommu_domain_alloc(&platform_bus_type);
@@ -66,14 +65,6 @@ static void portal_set_cpu(struct qm_portal_config *pcfg, 
int cpu)
ret);
goto out_domain_free;
}
-   ret = iommu_domain_set_attr(pcfg->iommu_domain

[PATCH 09/17] iommu/fsl_pamu: merge handle_attach_device into fsl_pamu_attach_device

2021-03-01 Thread Christoph Hellwig
No good reason to split this functionality over two functions.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/fsl_pamu_domain.c | 59 +++--
 1 file changed, 20 insertions(+), 39 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 4a4944332674f7..962cdc1a4a1924 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -240,45 +240,13 @@ static int update_domain_stash(struct fsl_dma_domain 
*dma_domain, u32 val)
return ret;
 }
 
-/*
- * Attach the LIODN to the DMA domain and configure the geometry
- * and window mappings.
- */
-static int handle_attach_device(struct fsl_dma_domain *dma_domain,
-   struct device *dev, const u32 *liodn,
-   int num)
-{
-   unsigned long flags;
-   int ret = 0;
-   int i;
-
-   spin_lock_irqsave(&dma_domain->domain_lock, flags);
-   for (i = 0; i < num; i++) {
-   /* Ensure that LIODN value is valid */
-   if (liodn[i] >= PAACE_NUMBER_ENTRIES) {
-   pr_debug("Invalid liodn %d, attach device failed for 
%pOF\n",
-liodn[i], dev->of_node);
-   ret = -EINVAL;
-   break;
-   }
-
-   attach_device(dma_domain, liodn[i], dev);
-   ret = pamu_set_liodn(dma_domain, dev, liodn[i]);
-   if (ret)
-   break;
-   }
-   spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
-
-   return ret;
-}
-
 static int fsl_pamu_attach_device(struct iommu_domain *domain,
  struct device *dev)
 {
struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
+   unsigned long flags;
+   int len, ret = 0, i;
const u32 *liodn;
-   u32 liodn_cnt;
-   int len, ret = 0;
struct pci_dev *pdev = NULL;
struct pci_controller *pci_ctl;
 
@@ -298,14 +266,27 @@ static int fsl_pamu_attach_device(struct iommu_domain 
*domain,
}
 
liodn = of_get_property(dev->of_node, "fsl,liodn", &len);
-   if (liodn) {
-   liodn_cnt = len / sizeof(u32);
-   ret = handle_attach_device(dma_domain, dev, liodn, liodn_cnt);
-   } else {
+   if (!liodn) {
pr_debug("missing fsl,liodn property at %pOF\n", dev->of_node);
-   ret = -EINVAL;
+   return -EINVAL;
}
 
+   spin_lock_irqsave(&dma_domain->domain_lock, flags);
+   for (i = 0; i < len / sizeof(u32); i++) {
+   /* Ensure that LIODN value is valid */
+   if (liodn[i] >= PAACE_NUMBER_ENTRIES) {
+   pr_debug("Invalid liodn %d, attach device failed for 
%pOF\n",
+liodn[i], dev->of_node);
+   ret = -EINVAL;
+   break;
+   }
+
+   attach_device(dma_domain, liodn[i], dev);
+   ret = pamu_set_liodn(dma_domain, dev, liodn[i]);
+   if (ret)
+   break;
+   }
+   spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
return ret;
 }
 
-- 
2.29.2



[PATCH 08/17] iommu/fsl_pamu: merge pamu_set_liodn and map_liodn

2021-03-01 Thread Christoph Hellwig
Merge the two functions that configure the ppaace into a single coherent
function.  I somehow doubt we need the two pamu_config_ppaace calls,
but keep the existing behavior just to be on the safe side.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/fsl_pamu_domain.c | 65 +
 1 file changed, 17 insertions(+), 48 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 40eff4b7bc5d42..4a4944332674f7 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -54,25 +54,6 @@ static int __init iommu_init_mempool(void)
return 0;
 }
 
-/* Map the DMA window corresponding to the LIODN */
-static int map_liodn(int liodn, struct fsl_dma_domain *dma_domain)
-{
-   int ret;
-   struct iommu_domain_geometry *geom = &dma_domain->iommu_domain.geometry;
-   unsigned long flags;
-
-   spin_lock_irqsave(&iommu_lock, flags);
-   ret = pamu_config_ppaace(liodn, geom->aperture_start,
-geom->aperture_end - 1, ~(u32)0,
-0, dma_domain->snoop_id, dma_domain->stash_id,
-PAACE_AP_PERMS_QUERY | PAACE_AP_PERMS_UPDATE);
-   spin_unlock_irqrestore(&iommu_lock, flags);
-   if (ret)
-   pr_debug("PAACE configuration failed for liodn %d\n", liodn);
-
-   return ret;
-}
-
 static int update_liodn_stash(int liodn, struct fsl_dma_domain *dma_domain,
  u32 val)
 {
@@ -94,11 +75,11 @@ static int update_liodn_stash(int liodn, struct 
fsl_dma_domain *dma_domain,
 }
 
 /* Set the geometry parameters for a LIODN */
-static int pamu_set_liodn(int liodn, struct device *dev,
- struct fsl_dma_domain *dma_domain,
- struct iommu_domain_geometry *geom_attr)
+static int pamu_set_liodn(struct fsl_dma_domain *dma_domain, struct device 
*dev,
+ int liodn)
 {
-   phys_addr_t window_addr, window_size;
+   struct iommu_domain *domain = &dma_domain->iommu_domain;
+   struct iommu_domain_geometry *geom = &domain->geometry;
u32 omi_index = ~(u32)0;
unsigned long flags;
int ret;
@@ -110,22 +91,25 @@ static int pamu_set_liodn(int liodn, struct device *dev,
 */
get_ome_index(&omi_index, dev);
 
-   window_addr = geom_attr->aperture_start;
-   window_size = geom_attr->aperture_end + 1;
-
spin_lock_irqsave(&iommu_lock, flags);
ret = pamu_disable_liodn(liodn);
-   if (!ret)
-   ret = pamu_config_ppaace(liodn, window_addr, window_size, 
omi_index,
-0, dma_domain->snoop_id,
-dma_domain->stash_id, 0);
+   if (ret)
+   goto out_unlock;
+   ret = pamu_config_ppaace(liodn, geom->aperture_start,
+geom->aperture_end - 1, omi_index, 0,
+dma_domain->snoop_id, dma_domain->stash_id, 0);
+   if (ret)
+   goto out_unlock;
+   ret = pamu_config_ppaace(liodn, geom->aperture_start,
+geom->aperture_end - 1, ~(u32)0,
+0, dma_domain->snoop_id, dma_domain->stash_id,
+PAACE_AP_PERMS_QUERY | PAACE_AP_PERMS_UPDATE);
+out_unlock:
spin_unlock_irqrestore(&iommu_lock, flags);
if (ret) {
pr_debug("PAACE configuration failed for liodn %d\n",
 liodn);
-   return ret;
}
-
return ret;
 }
 
@@ -265,7 +249,6 @@ static int handle_attach_device(struct fsl_dma_domain 
*dma_domain,
int num)
 {
unsigned long flags;
-   struct iommu_domain *domain = &dma_domain->iommu_domain;
int ret = 0;
int i;
 
@@ -280,21 +263,7 @@ static int handle_attach_device(struct fsl_dma_domain 
*dma_domain,
}
 
attach_device(dma_domain, liodn[i], dev);
-   /*
-* Check if geometry has already been configured
-* for the domain. If yes, set the geometry for
-* the LIODN.
-*/
-   ret = pamu_set_liodn(liodn[i], dev, dma_domain,
-&domain->geometry);
-   if (ret)
-   break;
-
-   /*
-* Create window/subwindow mapping for
-* the LIODN.
-*/
-   ret = map_liodn(liodn[i], dma_domain);
+   ret = pamu_set_liodn(dma_domain, dev, liodn[i]);
if (ret)
break;
}
-- 
2.29.2



[PATCH 07/17] iommu/fsl_pamu: replace DOMAIN_ATTR_FSL_PAMU_STASH with a direct call

2021-03-01 Thread Christoph Hellwig
Add a fsl_pamu_configure_l1_stash API that qman_portal can call directly
instead of indirecting through the iommu attr API.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/fsl_pamu_stash.h | 12 +++-
 drivers/iommu/fsl_pamu_domain.c   | 16 +++-
 drivers/iommu/fsl_pamu_domain.h   |  2 --
 drivers/soc/fsl/qbman/qman_portal.c   | 18 +++---
 include/linux/iommu.h |  1 -
 5 files changed, 9 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/fsl_pamu_stash.h 
b/arch/powerpc/include/asm/fsl_pamu_stash.h
index 30a31ad2123d86..c0fbadb70b5dad 100644
--- a/arch/powerpc/include/asm/fsl_pamu_stash.h
+++ b/arch/powerpc/include/asm/fsl_pamu_stash.h
@@ -7,6 +7,8 @@
 #ifndef __FSL_PAMU_STASH_H
 #define __FSL_PAMU_STASH_H
 
+struct iommu_domain;
+
 /* cache stash targets */
 enum pamu_stash_target {
PAMU_ATTR_CACHE_L1 = 1,
@@ -14,14 +16,6 @@ enum pamu_stash_target {
PAMU_ATTR_CACHE_L3,
 };
 
-/*
- * This attribute allows configuring stashig specific parameters
- * in the PAMU hardware.
- */
-
-struct pamu_stash_attribute {
-   u32 cpu;/* cpu number */
-   u32 cache;  /* cache to stash to: L1,L2,L3 */
-};
+int fsl_pamu_configure_l1_stash(struct iommu_domain *domain, u32 cpu);
 
 #endif  /* __FSL_PAMU_STASH_H */
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index fd2bc88b690465..40eff4b7bc5d42 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -372,27 +372,20 @@ static void fsl_pamu_detach_device(struct iommu_domain 
*domain,
 }
 
 /* Set the domain stash attribute */
-static int configure_domain_stash(struct fsl_dma_domain *dma_domain, void 
*data)
+int fsl_pamu_configure_l1_stash(struct iommu_domain *domain, u32 cpu)
 {
-   struct pamu_stash_attribute *stash_attr = data;
+   struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
unsigned long flags;
int ret;
 
spin_lock_irqsave(&dma_domain->domain_lock, flags);
-
-   memcpy(&dma_domain->dma_stash, stash_attr,
-  sizeof(struct pamu_stash_attribute));
-
-   dma_domain->stash_id = get_stash_id(stash_attr->cache,
-   stash_attr->cpu);
+   dma_domain->stash_id = get_stash_id(PAMU_ATTR_CACHE_L1, cpu);
if (dma_domain->stash_id == ~(u32)0) {
pr_debug("Invalid stash attributes\n");
spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
return -EINVAL;
}
-
ret = update_domain_stash(dma_domain, dma_domain->stash_id);
-
spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
 
return ret;
@@ -426,9 +419,6 @@ static int fsl_pamu_set_domain_attr(struct iommu_domain 
*domain,
int ret = 0;
 
switch (attr_type) {
-   case DOMAIN_ATTR_FSL_PAMU_STASH:
-   ret = configure_domain_stash(dma_domain, data);
-   break;
case DOMAIN_ATTR_FSL_PAMU_ENABLE:
ret = configure_domain_dma_state(dma_domain, *(int *)data);
break;
diff --git a/drivers/iommu/fsl_pamu_domain.h b/drivers/iommu/fsl_pamu_domain.h
index 13ee06e0ef0136..cd488004acd1b3 100644
--- a/drivers/iommu/fsl_pamu_domain.h
+++ b/drivers/iommu/fsl_pamu_domain.h
@@ -22,9 +22,7 @@ struct fsl_dma_domain {
 *
 */
int enabled;
-   /* stash_id obtained from the stash attribute details */
u32 stash_id;
-   struct pamu_stash_attribute dma_stash;
u32 snoop_id;
struct iommu_domain iommu_domain;
spinlock_t  domain_lock;
diff --git a/drivers/soc/fsl/qbman/qman_portal.c 
b/drivers/soc/fsl/qbman/qman_portal.c
index 9ee1663f422cbf..798b3a1ffd0b9c 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -47,7 +47,6 @@ static void portal_set_cpu(struct qm_portal_config *pcfg, int 
cpu)
 #ifdef CONFIG_FSL_PAMU
struct device *dev = pcfg->dev;
int window_count = 1;
-   struct pamu_stash_attribute stash_attr;
int ret;
 
pcfg->iommu_domain = iommu_domain_alloc(&platform_bus_type);
@@ -55,13 +54,9 @@ static void portal_set_cpu(struct qm_portal_config *pcfg, 
int cpu)
dev_err(dev, "%s(): iommu_domain_alloc() failed", __func__);
goto no_iommu;
}
-   stash_attr.cpu = cpu;
-   stash_attr.cache = PAMU_ATTR_CACHE_L1;
-   ret = iommu_domain_set_attr(pcfg->iommu_domain,
-   DOMAIN_ATTR_FSL_PAMU_STASH,
-   &stash_attr);
+   ret = fsl_pamu_configure_l1_stash(pcfg->iommu_domain, cpu);
if (ret < 0) {
-   dev_err(dev, "%s(): iommu_domain_set_attr() = %d",
+   dev_err(dev, "%s(): fsl_pamu_configure_l1

[PATCH 06/17] iommu/fsl_pamu: remove ->domain_window_enable

2021-03-01 Thread Christoph Hellwig
The only thing that fsl_pamu_window_enable does for the current caller
is to fill in the prot value in the only dma_window structure, and to
propagate a few values from the iommu_domain_geometry structure into the
dma_window.  Remove the dma_window entirely, hardcode the prot value and
otherwise use the iommu_domain_geometry structure instead.

Remove the now unused ->domain_window_enable iommu method.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/fsl_pamu_domain.c | 182 +++-
 drivers/iommu/fsl_pamu_domain.h |  17 ---
 drivers/iommu/iommu.c   |  11 --
 drivers/soc/fsl/qbman/qman_portal.c |   7 --
 include/linux/iommu.h   |  17 ---
 5 files changed, 14 insertions(+), 220 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index e6bdd38fc18409..fd2bc88b690465 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -54,34 +54,18 @@ static int __init iommu_init_mempool(void)
return 0;
 }
 
-static phys_addr_t get_phys_addr(struct fsl_dma_domain *dma_domain, dma_addr_t 
iova)
-{
-   struct dma_window *win_ptr = &dma_domain->win_arr[0];
-   struct iommu_domain_geometry *geom;
-
-   geom = &dma_domain->iommu_domain.geometry;
-
-   if (win_ptr->valid)
-   return win_ptr->paddr + (iova & (win_ptr->size - 1));
-
-   return 0;
-}
-
 /* Map the DMA window corresponding to the LIODN */
 static int map_liodn(int liodn, struct fsl_dma_domain *dma_domain)
 {
int ret;
-   struct dma_window *wnd = &dma_domain->win_arr[0];
-   phys_addr_t wnd_addr = dma_domain->iommu_domain.geometry.aperture_start;
+   struct iommu_domain_geometry *geom = &dma_domain->iommu_domain.geometry;
unsigned long flags;
 
spin_lock_irqsave(&iommu_lock, flags);
-   ret = pamu_config_ppaace(liodn, wnd_addr,
-wnd->size,
-~(u32)0,
-wnd->paddr >> PAMU_PAGE_SHIFT,
-dma_domain->snoop_id, dma_domain->stash_id,
-wnd->prot);
+   ret = pamu_config_ppaace(liodn, geom->aperture_start,
+geom->aperture_end - 1, ~(u32)0,
+0, dma_domain->snoop_id, dma_domain->stash_id,
+PAACE_AP_PERMS_QUERY | PAACE_AP_PERMS_UPDATE);
spin_unlock_irqrestore(&iommu_lock, flags);
if (ret)
pr_debug("PAACE configuration failed for liodn %d\n", liodn);
@@ -89,33 +73,6 @@ static int map_liodn(int liodn, struct fsl_dma_domain 
*dma_domain)
return ret;
 }
 
-/* Update window/subwindow mapping for the LIODN */
-static int update_liodn(int liodn, struct fsl_dma_domain *dma_domain, u32 
wnd_nr)
-{
-   int ret;
-   struct dma_window *wnd = &dma_domain->win_arr[wnd_nr];
-   phys_addr_t wnd_addr;
-   unsigned long flags;
-
-   spin_lock_irqsave(&iommu_lock, flags);
-
-   wnd_addr = dma_domain->iommu_domain.geometry.aperture_start;
-
-   ret = pamu_config_ppaace(liodn, wnd_addr,
-wnd->size,
-~(u32)0,
-wnd->paddr >> PAMU_PAGE_SHIFT,
-dma_domain->snoop_id, dma_domain->stash_id,
-wnd->prot);
-   if (ret)
-   pr_debug("Window reconfiguration failed for liodn %d\n",
-liodn);
-
-   spin_unlock_irqrestore(&iommu_lock, flags);
-
-   return ret;
-}
-
 static int update_liodn_stash(int liodn, struct fsl_dma_domain *dma_domain,
  u32 val)
 {
@@ -172,26 +129,6 @@ static int pamu_set_liodn(int liodn, struct device *dev,
return ret;
 }
 
-static int check_size(u64 size, dma_addr_t iova)
-{
-   /*
-* Size must be a power of two and at least be equal
-* to PAMU page size.
-*/
-   if ((size & (size - 1)) || size < PAMU_PAGE_SIZE) {
-   pr_debug("Size too small or not a power of two\n");
-   return -EINVAL;
-   }
-
-   /* iova must be page size aligned */
-   if (iova & (size - 1)) {
-   pr_debug("Address is not aligned with window size\n");
-   return -EINVAL;
-   }
-
-   return 0;
-}
-
 static void remove_device_ref(struct device_domain_info *info)
 {
unsigned long flags;
@@ -257,13 +194,10 @@ static void attach_device(struct fsl_dma_domain 
*dma_domain, int liodn, struct d
 static phys_addr_t fsl_pamu_iova_to_phys(struct iommu_domain *domain,
 dma_addr_t iova)
 {
-   struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
-
if (iova < domain->geometry.aperture_start ||
iova > domain->geometry.aperture_end)
return 0;
-
-   return get_phys_addr(dma_do

[PATCH 05/17] iommu/fsl_pamu: remove support for multiple windows

2021-03-01 Thread Christoph Hellwig
The only domains allocated force use of a single window.  Remove all
the code related to multiple window support, as well as the need for
qman_portal to force a single window.

Remove the now unused DOMAIN_ATTR_WINDOWS iommu_attr.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/fsl_pamu.c| 264 +-
 drivers/iommu/fsl_pamu.h|  10 +-
 drivers/iommu/fsl_pamu_domain.c | 275 +---
 drivers/iommu/fsl_pamu_domain.h |  12 +-
 drivers/soc/fsl/qbman/qman_portal.c |   7 -
 include/linux/iommu.h   |   1 -
 6 files changed, 59 insertions(+), 510 deletions(-)

diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index b9a974d9783113..3e1647cd5ad47a 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -63,19 +63,6 @@ static const struct of_device_id l3_device_ids[] = {
 /* maximum subwindows permitted per liodn */
 static u32 max_subwindow_count;
 
-/* Pool for fspi allocation */
-static struct gen_pool *spaace_pool;
-
-/**
- * pamu_get_max_subwin_cnt() - Return the maximum supported
- * subwindow count per liodn.
- *
- */
-u32 pamu_get_max_subwin_cnt(void)
-{
-   return max_subwindow_count;
-}
-
 /**
  * pamu_get_ppaace() - Return the primary PACCE
  * @liodn: liodn PAACT index for desired PAACE
@@ -155,13 +142,6 @@ static unsigned int map_addrspace_size_to_wse(phys_addr_t 
addrspace_size)
return fls64(addrspace_size) - 2;
 }
 
-/* Derive the PAACE window count encoding for the subwindow count */
-static unsigned int map_subwindow_cnt_to_wce(u32 subwindow_cnt)
-{
-   /* window count is 2^(WCE+1) bytes */
-   return __ffs(subwindow_cnt) - 1;
-}
-
 /*
  * Set the PAACE type as primary and set the coherency required domain
  * attribute
@@ -174,89 +154,11 @@ static void pamu_init_ppaace(struct paace *ppaace)
   PAACE_M_COHERENCE_REQ);
 }
 
-/*
- * Set the PAACE type as secondary and set the coherency required domain
- * attribute.
- */
-static void pamu_init_spaace(struct paace *spaace)
-{
-   set_bf(spaace->addr_bitfields, PAACE_AF_PT, PAACE_PT_SECONDARY);
-   set_bf(spaace->domain_attr.to_host.coherency_required, PAACE_DA_HOST_CR,
-  PAACE_M_COHERENCE_REQ);
-}
-
-/*
- * Return the spaace (corresponding to the secondary window index)
- * for a particular ppaace.
- */
-static struct paace *pamu_get_spaace(struct paace *paace, u32 wnum)
-{
-   u32 subwin_cnt;
-   struct paace *spaace = NULL;
-
-   subwin_cnt = 1UL << (get_bf(paace->impl_attr, PAACE_IA_WCE) + 1);
-
-   if (wnum < subwin_cnt)
-   spaace = &spaact[paace->fspi + wnum];
-   else
-   pr_debug("secondary paace out of bounds\n");
-
-   return spaace;
-}
-
-/**
- * pamu_get_fspi_and_allocate() - Allocates fspi index and reserves subwindows
- *required for primary PAACE in the secondary
- *PAACE table.
- * @subwin_cnt: Number of subwindows to be reserved.
- *
- * A PPAACE entry may have a number of associated subwindows. A subwindow
- * corresponds to a SPAACE entry in the SPAACT table. Each PAACE entry stores
- * the index (fspi) of the first SPAACE entry in the SPAACT table. This
- * function returns the index of the first SPAACE entry. The remaining
- * SPAACE entries are reserved contiguously from that index.
- *
- * Returns a valid fspi index in the range of 0 - SPAACE_NUMBER_ENTRIES on 
success.
- * If no SPAACE entry is available or the allocator can not reserve the 
required
- * number of contiguous entries function returns ULONG_MAX indicating a 
failure.
- *
- */
-static unsigned long pamu_get_fspi_and_allocate(u32 subwin_cnt)
-{
-   unsigned long spaace_addr;
-
-   spaace_addr = gen_pool_alloc(spaace_pool, subwin_cnt * sizeof(struct 
paace));
-   if (!spaace_addr)
-   return ULONG_MAX;
-
-   return (spaace_addr - (unsigned long)spaact) / (sizeof(struct paace));
-}
-
-/* Release the subwindows reserved for a particular LIODN */
-void pamu_free_subwins(int liodn)
-{
-   struct paace *ppaace;
-   u32 subwin_cnt, size;
-
-   ppaace = pamu_get_ppaace(liodn);
-   if (!ppaace) {
-   pr_debug("Invalid liodn entry\n");
-   return;
-   }
-
-   if (get_bf(ppaace->addr_bitfields, PPAACE_AF_MW)) {
-   subwin_cnt = 1UL << (get_bf(ppaace->impl_attr, PAACE_IA_WCE) + 
1);
-   size = (subwin_cnt - 1) * sizeof(struct paace);
-   gen_pool_free(spaace_pool, (unsigned 
long)&spaact[ppaace->fspi], size);
-   set_bf(ppaace->addr_bitfields, PPAACE_AF_MW, 0);
-   }
-}
-
 /*
  * Function used for updating stash destination for the coressponding
  * LIODN.
  */
-int  pamu_update_paace_stash(int liodn, u32 subwin, u32 value)
+int pamu_update_paace_stash(int liodn, u32 value)
 {
struct paace *paace;
 
@@ -265,11 +167,6 @@ int  pamu_update_paace_stash(int liodn, u32 s

[PATCH 04/17] iommu/fsl_pamu: merge iommu_alloc_dma_domain into fsl_pamu_domain_alloc

2021-03-01 Thread Christoph Hellwig
Keep the functionality to allocate the domain together.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/fsl_pamu_domain.c | 34 ++---
 1 file changed, 10 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 7bd08ddad07779..a4da5597755d3d 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -292,25 +292,6 @@ static int check_size(u64 size, dma_addr_t iova)
return 0;
 }
 
-static struct fsl_dma_domain *iommu_alloc_dma_domain(void)
-{
-   struct fsl_dma_domain *domain;
-
-   domain = kmem_cache_zalloc(fsl_pamu_domain_cache, GFP_KERNEL);
-   if (!domain)
-   return NULL;
-
-   domain->stash_id = ~(u32)0;
-   domain->snoop_id = ~(u32)0;
-   domain->win_cnt = pamu_get_max_subwin_cnt();
-
-   INIT_LIST_HEAD(&domain->devices);
-
-   spin_lock_init(&domain->domain_lock);
-
-   return domain;
-}
-
 static void remove_device_ref(struct device_domain_info *info, u32 win_cnt)
 {
unsigned long flags;
@@ -412,12 +393,17 @@ static struct iommu_domain 
*fsl_pamu_domain_alloc(unsigned type)
if (type != IOMMU_DOMAIN_UNMANAGED)
return NULL;
 
-   dma_domain = iommu_alloc_dma_domain();
-   if (!dma_domain) {
-   pr_debug("dma_domain allocation failed\n");
+   dma_domain = kmem_cache_zalloc(fsl_pamu_domain_cache, GFP_KERNEL);
+   if (!dma_domain)
return NULL;
-   }
-   /* defaul geometry 64 GB i.e. maximum system address */
+
+   dma_domain->stash_id = ~(u32)0;
+   dma_domain->snoop_id = ~(u32)0;
+   dma_domain->win_cnt = pamu_get_max_subwin_cnt();
+   INIT_LIST_HEAD(&dma_domain->devices);
+   spin_lock_init(&dma_domain->domain_lock);
+
+   /* default geometry 64 GB i.e. maximum system address */
dma_domain->iommu_domain. geometry.aperture_start = 0;
dma_domain->iommu_domain.geometry.aperture_end = (1ULL << 36) - 1;
dma_domain->iommu_domain.geometry.force_aperture = true;
-- 
2.29.2



[PATCH 03/17] iommu/fsl_pamu: remove support for setting DOMAIN_ATTR_GEOMETRY

2021-03-01 Thread Christoph Hellwig
The default geometry is the same as the one set by qman_portal given
that FSL_PAMU depends on having 64-bit physical and thus DMA addresses.

Remove the support to update the geometry and remove the now pointless
geom_size field.
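
For reference, the fixed default aperture that every PAMU domain gets at
allocation time (quoted from fsl_pamu_domain_alloc(), not changed by this
patch) is:

	dma_domain->iommu_domain.geometry.aperture_start = 0;
	dma_domain->iommu_domain.geometry.aperture_end = (1ULL << 36) - 1;
	dma_domain->iommu_domain.geometry.force_aperture = true;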

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/fsl_pamu_domain.c | 55 +++--
 drivers/iommu/fsl_pamu_domain.h |  6 
 drivers/soc/fsl/qbman/qman_portal.c | 12 ---
 3 files changed, 5 insertions(+), 68 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index e587ec43f7e750..7bd08ddad07779 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -62,7 +62,7 @@ static phys_addr_t get_phys_addr(struct fsl_dma_domain 
*dma_domain, dma_addr_t i
 
geom = &dma_domain->iommu_domain.geometry;
 
-   if (!win_cnt || !dma_domain->geom_size) {
+   if (!win_cnt) {
pr_debug("Number of windows/geometry not configured for the 
domain\n");
return 0;
}
@@ -72,7 +72,7 @@ static phys_addr_t get_phys_addr(struct fsl_dma_domain 
*dma_domain, dma_addr_t i
dma_addr_t subwin_iova;
u32 wnd;
 
-   subwin_size = dma_domain->geom_size >> ilog2(win_cnt);
+   subwin_size = (geom->aperture_end + 1) >> ilog2(win_cnt);
subwin_iova = iova & ~(subwin_size - 1);
wnd = (subwin_iova - geom->aperture_start) >> 
ilog2(subwin_size);
win_ptr = &dma_domain->win_arr[wnd];
@@ -234,7 +234,7 @@ static int pamu_set_liodn(int liodn, struct device *dev,
get_ome_index(&omi_index, dev);
 
window_addr = geom_attr->aperture_start;
-   window_size = dma_domain->geom_size;
+   window_size = geom_attr->aperture_end + 1;
 
spin_lock_irqsave(&iommu_lock, flags);
ret = pamu_disable_liodn(liodn);
@@ -303,7 +303,6 @@ static struct fsl_dma_domain *iommu_alloc_dma_domain(void)
domain->stash_id = ~(u32)0;
domain->snoop_id = ~(u32)0;
domain->win_cnt = pamu_get_max_subwin_cnt();
-   domain->geom_size = 0;
 
INIT_LIST_HEAD(&domain->devices);
 
@@ -502,7 +501,8 @@ static int fsl_pamu_window_enable(struct iommu_domain 
*domain, u32 wnd_nr,
return -EINVAL;
}
 
-   win_size = dma_domain->geom_size >> ilog2(dma_domain->win_cnt);
+   win_size = (domain->geometry.aperture_end + 1) >>
+   ilog2(dma_domain->win_cnt);
if (size > win_size) {
pr_debug("Invalid window size\n");
spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
@@ -665,41 +665,6 @@ static void fsl_pamu_detach_device(struct iommu_domain 
*domain,
pr_debug("missing fsl,liodn property at %pOF\n", dev->of_node);
 }
 
-static  int configure_domain_geometry(struct iommu_domain *domain, void *data)
-{
-   struct iommu_domain_geometry *geom_attr = data;
-   struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
-   dma_addr_t geom_size;
-   unsigned long flags;
-
-   geom_size = geom_attr->aperture_end - geom_attr->aperture_start + 1;
-   /*
-* Sanity check the geometry size. Also, we do not support
-* DMA outside of the geometry.
-*/
-   if (check_size(geom_size, geom_attr->aperture_start) ||
-   !geom_attr->force_aperture) {
-   pr_debug("Invalid PAMU geometry attributes\n");
-   return -EINVAL;
-   }
-
-   spin_lock_irqsave(&dma_domain->domain_lock, flags);
-   if (dma_domain->enabled) {
-   pr_debug("Can't set geometry attributes as domain is active\n");
-   spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
-   return  -EBUSY;
-   }
-
-   /* Copy the domain geometry information */
-   memcpy(&domain->geometry, geom_attr,
-  sizeof(struct iommu_domain_geometry));
-   dma_domain->geom_size = geom_size;
-
-   spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
-
-   return 0;
-}
-
 /* Set the domain stash attribute */
 static int configure_domain_stash(struct fsl_dma_domain *dma_domain, void 
*data)
 {
@@ -769,13 +734,6 @@ static int fsl_pamu_set_windows(struct iommu_domain 
*domain, u32 w_count)
return  -EBUSY;
}
 
-   /* Ensure that the geometry has been set for the domain */
-   if (!dma_domain->geom_size) {
-   pr_debug("Please configure geometry before setting the number 
of windows\n");
-   spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
-   return -EINVAL;
-   }
-
/*
 * Ensure we have valid window count i.e. it should be less than
 * maximum permissible limit and should be a power of two.
@@ -811,9 +769,6 @@ static int fsl_pamu_set_domain_attr(struct iommu_domain 
*domain,
int ret = 0;
 
switch (attr_type) {
-   case DOMAIN_ATTR_GEOME

[PATCH 02/17] iommu/fsl_pamu: remove fsl_pamu_get_domain_attr

2021-03-01 Thread Christoph Hellwig
None of the values returned by this function are ever queried.  Also
remove the DOMAIN_ATTR_FSL_PAMUV1 enum value that is not otherwise used.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/fsl_pamu_domain.c | 30 --
 include/linux/iommu.h   |  4 
 2 files changed, 34 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 53380cf1fa452f..e587ec43f7e750 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -832,35 +832,6 @@ static int fsl_pamu_set_domain_attr(struct iommu_domain 
*domain,
return ret;
 }
 
-static int fsl_pamu_get_domain_attr(struct iommu_domain *domain,
-   enum iommu_attr attr_type, void *data)
-{
-   struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
-   int ret = 0;
-
-   switch (attr_type) {
-   case DOMAIN_ATTR_FSL_PAMU_STASH:
-   memcpy(data, &dma_domain->dma_stash,
-  sizeof(struct pamu_stash_attribute));
-   break;
-   case DOMAIN_ATTR_FSL_PAMU_ENABLE:
-   *(int *)data = dma_domain->enabled;
-   break;
-   case DOMAIN_ATTR_FSL_PAMUV1:
-   *(int *)data = DOMAIN_ATTR_FSL_PAMUV1;
-   break;
-   case DOMAIN_ATTR_WINDOWS:
-   *(u32 *)data = dma_domain->win_cnt;
-   break;
-   default:
-   pr_debug("Unsupported attribute type\n");
-   ret = -EINVAL;
-   break;
-   }
-
-   return ret;
-}
-
 static struct iommu_group *get_device_iommu_group(struct device *dev)
 {
struct iommu_group *group;
@@ -987,7 +958,6 @@ static const struct iommu_ops fsl_pamu_ops = {
.domain_window_enable = fsl_pamu_window_enable,
.iova_to_phys   = fsl_pamu_iova_to_phys,
.domain_set_attr = fsl_pamu_set_domain_attr,
-   .domain_get_attr = fsl_pamu_get_domain_attr,
.probe_device   = fsl_pamu_probe_device,
.release_device = fsl_pamu_release_device,
.device_group   = fsl_pamu_device_group,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 47c8b318d8f523..52874ae164dd60 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -104,9 +104,6 @@ enum iommu_cap {
  *  -the actual size of the mapped region of a window must be power
  *   of 2 starting with 4KB and physical address must be naturally
  *   aligned.
- * DOMAIN_ATTR_FSL_PAMUV1 corresponds to the above mentioned contraints.
- * The caller can invoke iommu_domain_get_attr to check if the underlying
- * iommu implementation supports these constraints.
  */
 
 enum iommu_attr {
@@ -115,7 +112,6 @@ enum iommu_attr {
DOMAIN_ATTR_WINDOWS,
DOMAIN_ATTR_FSL_PAMU_STASH,
DOMAIN_ATTR_FSL_PAMU_ENABLE,
-   DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
DOMAIN_ATTR_IO_PGTABLE_CFG,
-- 
2.29.2



[PATCH 01/17] iommu: remove the unused domain_window_disable method

2021-03-01 Thread Christoph Hellwig
domain_window_disable is wired up by fsl_pamu, but never actually called.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/fsl_pamu_domain.c | 48 -
 include/linux/iommu.h   |  2 --
 2 files changed, 50 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index b2110767caf49c..53380cf1fa452f 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -473,53 +473,6 @@ static int update_domain_mapping(struct fsl_dma_domain 
*dma_domain, u32 wnd_nr)
return ret;
 }
 
-static int disable_domain_win(struct fsl_dma_domain *dma_domain, u32 wnd_nr)
-{
-   struct device_domain_info *info;
-   int ret = 0;
-
-   list_for_each_entry(info, &dma_domain->devices, link) {
-   if (dma_domain->win_cnt == 1 && dma_domain->enabled) {
-   ret = pamu_disable_liodn(info->liodn);
-   if (!ret)
-   dma_domain->enabled = 0;
-   } else {
-   ret = pamu_disable_spaace(info->liodn, wnd_nr);
-   }
-   }
-
-   return ret;
-}
-
-static void fsl_pamu_window_disable(struct iommu_domain *domain, u32 wnd_nr)
-{
-   struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
-   unsigned long flags;
-   int ret;
-
-   spin_lock_irqsave(&dma_domain->domain_lock, flags);
-   if (!dma_domain->win_arr) {
-   pr_debug("Number of windows not configured\n");
-   spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
-   return;
-   }
-
-   if (wnd_nr >= dma_domain->win_cnt) {
-   pr_debug("Invalid window index\n");
-   spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
-   return;
-   }
-
-   if (dma_domain->win_arr[wnd_nr].valid) {
-   ret = disable_domain_win(dma_domain, wnd_nr);
-   if (!ret) {
-   dma_domain->win_arr[wnd_nr].valid = 0;
-   dma_domain->mapped--;
-   }
-   }
-
-   spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
-}
 
 static int fsl_pamu_window_enable(struct iommu_domain *domain, u32 wnd_nr,
  phys_addr_t paddr, u64 size, int prot)
@@ -1032,7 +985,6 @@ static const struct iommu_ops fsl_pamu_ops = {
.attach_dev = fsl_pamu_attach_device,
.detach_dev = fsl_pamu_detach_device,
.domain_window_enable = fsl_pamu_window_enable,
-   .domain_window_disable = fsl_pamu_window_disable,
.iova_to_phys   = fsl_pamu_iova_to_phys,
.domain_set_attr = fsl_pamu_set_domain_attr,
.domain_get_attr = fsl_pamu_get_domain_attr,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e7fe519430af4..47c8b318d8f523 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -209,7 +209,6 @@ struct iommu_iotlb_gather {
  * @put_resv_regions: Free list of reserved regions for a device
  * @apply_resv_region: Temporary helper call-back for iova reserved ranges
  * @domain_window_enable: Configure and enable a particular window for a domain
- * @domain_window_disable: Disable a particular window for a domain
  * @of_xlate: add OF master IDs to iommu grouping
  * @is_attach_deferred: Check if domain attach should be deferred from iommu
  *  driver init to device driver init (default no)
@@ -270,7 +269,6 @@ struct iommu_ops {
/* Window handling functions */
int (*domain_window_enable)(struct iommu_domain *domain, u32 wnd_nr,
phys_addr_t paddr, u64 size, int prot);
-   void (*domain_window_disable)(struct iommu_domain *domain, u32 wnd_nr);
 
int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
bool (*is_attach_deferred)(struct iommu_domain *domain, struct device 
*dev);
-- 
2.29.2



cleanup unused or almost unused IOMMU APIs and the FSL PAMU driver

2021-03-01 Thread Christoph Hellwig
Hi all,

there are a bunch of IOMMU APIs that are entirely unused, or only used as
a private communication channel between the FSL PAMU driver and it's only
consumer, the qbman portal driver.

So this series drops a huge chunk of entirely unused FSL PAMU
functionality, then drops all kinds of unused IOMMU APIs, and then
replaces what is left of the iommu_attrs with properly typed, smaller
and easier to use specific APIs.
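
For a flavour of what the conversion looks like (sketch only, with a made-up
caller), an attr-based sequence turns from a void pointer cast into one
properly typed call per operation:

	#include <linux/iommu.h>

	static int example_enable_nesting(struct iommu_domain *domain)
	{
		/*
		 * Old style: multiplex through iommu_domain_set_attr() and
		 * pass the value as a void *:
		 *
		 *	int nesting = 1;
		 *
		 *	return iommu_domain_set_attr(domain, DOMAIN_ATTR_NESTING,
		 *				     &nesting);
		 *
		 * New style after this series:
		 */
		return iommu_domain_enable_nesting(domain);
	}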

Diffstat:
 arch/powerpc/include/asm/fsl_pamu_stash.h   |   12 
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |2 
 drivers/iommu/amd/iommu.c   |   23 
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   85 ---
 drivers/iommu/arm/arm-smmu/arm-smmu.c   |  122 +---
 drivers/iommu/dma-iommu.c   |8 
 drivers/iommu/fsl_pamu.c|  264 --
 drivers/iommu/fsl_pamu.h|   10 
 drivers/iommu/fsl_pamu_domain.c |  694 ++--
 drivers/iommu/fsl_pamu_domain.h |   46 -
 drivers/iommu/intel/iommu.c |   55 --
 drivers/iommu/iommu.c   |   75 ---
 drivers/soc/fsl/qbman/qman_portal.c |   56 --
 drivers/vfio/vfio_iommu_type1.c |   31 -
 drivers/vhost/vdpa.c|   10 
 include/linux/iommu.h   |   81 ---
 16 files changed, 214 insertions(+), 1360 deletions(-)


Re: [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous

2021-03-01 Thread Christoph Hellwig
On Mon, Mar 01, 2021 at 05:02:43PM +0900, Sergey Senozhatsky wrote:
> > I plan to resend the whole series with the comments very soon.
> 
> Oh, OK.
> 
> I thought the series was in linux-next already so a single patch
> would do.

It was, with an emphasis on was.  I hadn't realized I needed an ack
from Laurent for uvcvideo, and he didn't have time to review it by the
time we noticed.  So I'll repost it with him in the recipients list and
the small fixups accumulated now that -rc1 is out.


Re: [PATCH 4/7] dma-mapping: add a dma_alloc_noncontiguous API

2021-03-01 Thread Christoph Hellwig
On Tue, Feb 16, 2021 at 06:55:39PM +, Robin Murphy wrote:
> On 2021-02-02 09:51, Christoph Hellwig wrote:
>> Add a new API that returns a potentially virtually non-contiguous sg_table
>> and a DMA address.  This API is only properly implemented for dma-iommu
>> and will simply return a contiguous chunk as a fallback.
>>
>> The intent is that media drivers can use this API if either:
>>
>>   - no kernel mapping or only temporary kernel mappings are required.
>> That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
>>   - a kernel mapping is required for cached and DMA mapped pages, but
>> the driver also needs the pages to e.g. map them to userspace.
>> In that sense it is a replacement for some aspects of the recently
>> removed and never fully implemented DMA_ATTR_NON_CONSISTENT
>>
>> Signed-off-by: Christoph Hellwig 
>> ---
>>   Documentation/core-api/dma-api.rst |  74 +
>>   include/linux/dma-map-ops.h|  18 +
>>   include/linux/dma-mapping.h|  31 +
>>   kernel/dma/mapping.c   | 103 +
>>   4 files changed, 226 insertions(+)
>>
>> diff --git a/Documentation/core-api/dma-api.rst 
>> b/Documentation/core-api/dma-api.rst
>> index 157a474ae54416..e24b2447f4bfe6 100644
>> --- a/Documentation/core-api/dma-api.rst
>> +++ b/Documentation/core-api/dma-api.rst
>> @@ -594,6 +594,80 @@ dev, size, dma_handle and dir must all be the same as 
>> those passed into
>>   dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
>>   dma_alloc_noncoherent().
>>   +::
>> +
>> +struct sg_table *
>> +dma_alloc_noncontiguous(struct device *dev, size_t size,
>> +enum dma_data_direction dir, gfp_t gfp)
>> +
>> +This routine allocates   bytes of non-coherent and possibly 
>> non-contiguous
>> +memory.  It returns a pointer to struct sg_table that describes the 
>> allocated
>> +and DMA mapped memory, or NULL if the allocation failed. The resulting 
>> memory
>> +can be used for struct page mapped into a scatterlist are suitable for.
>> +
>> +The return sg_table is guaranteed to have 1 single DMA mapped segment as
>> +indicated by sgt->nents, but it might have multiple CPU side segments as
>> +indicated by sgt->orig_nents.
>> +
>> +The dir parameter specified if data is read and/or written by the device,
>> +see dma_map_single() for details.
>> +
>> +The gfp parameter allows the caller to specify the ``GFP_`` flags (see
>> +kmalloc()) for the allocation, but rejects flags used to specify a memory
>> +zone such as GFP_DMA or GFP_HIGHMEM.
>> +
>> +Before giving the memory to the device, dma_sync_sgtable_for_device() needs
>> +to be called, and before reading memory written by the device,
>> +dma_sync_sgtable_for_cpu(), just like for streaming DMA mappings that are
>> +reused.
>> +
>> +::
>> +
>> +void
>> +dma_free_noncontiguous(struct device *dev, size_t size,
>> +   struct sg_table *sgt,
>> +   enum dma_data_direction dir)
>> +
>> +Free memory previously allocated using dma_alloc_noncontiguous().  dev, 
>> size,
>> +and dir must all be the same as those passed into dma_alloc_noncontiguous().
>> +sgt must be the pointer returned by dma_alloc_noncontiguous().
>> +
>> +::
>> +
>> +void *
>> +dma_vmap_noncontiguous(struct device *dev, size_t size,
>> +struct sg_table *sgt)
>> +
>> +Return a contiguous kernel mapping for an allocation returned from
>> +dma_alloc_noncontiguous().  dev and size must be the same as those passed 
>> into
>> +dma_alloc_noncontiguous().  sgt must be the pointer returned by
>> +dma_alloc_noncontiguous().
>> +
>> +Once a non-contiguous allocation is mapped using this function, the
>> +flush_kernel_vmap_range() and invalidate_kernel_vmap_range() APIs must be 
>> used
>> +to manage the coherency of the kernel mapping.
>
> Maybe say something like "coherency between the kernel mapping and any 
> userspace mappings"? Otherwise people like me may be easily confused and 
> think it's referring to coherency between the kernel mapping and the 
> device, where in most cases those APIs won't help at all :)

Well, it is all of the above for a VIVT cache setup.  I've amended
it to:

Once a non-contiguous allocation is mapped using this function, the
flush_kernel_vmap_range() and invalidate_kernel_vmap_range() APIs must be used
to manage the coherency between the kernel mapping, the device and user space
mappings (if any).



Re: [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous

2021-03-01 Thread Sergey Senozhatsky
On (21/03/01 08:21), Christoph Hellwig wrote:
> On Mon, Mar 01, 2021 at 04:17:42PM +0900, Sergey Senozhatsky wrote:
> > > > Do you think we could add the attrs parameter to the
> > > > dma_alloc_noncontiguous() API?
> > > 
> > > Yes, we could probably do that.
> > 
> > I can cook a patch, unless somebody is already looking into it.
> 
> I plan to resend the whole series with the comments very soon.

Oh, OK.

I thought the series was in linux-next already so a single patch
would do.

-ss