On 22.09.2017 18:21, Tomasz Nowicki wrote:
Hi Robin,
On 21.09.2017 17:52, Robin Murphy wrote:
The cached node mechanism provides a significant performance benefit for
allocations using a 32-bit DMA mask, but in the case of non-PCI devices
or where the 32-bit space is full, the loss of this benefit can be
significant - on large systems there can be many thousands of entries in
the tree, such that walking all the way down to find free space every
time becomes increasingly awful.
Maintain a similar cached node for the whole IOVA space as a superset of
the 32-bit space so that performance can remain much more consistent.
Inspired by work by Zhen Lei <thunder.leiz...@huawei.com>.
Tested-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
Tested-by: Zhen Lei <thunder.leiz...@huawei.com>
Tested-by: Nate Watterson <nwatt...@codeaurora.org>
Signed-off-by: Robin Murphy <robin.mur...@arm.com>
---
v5: Fixed __cached_rbnode_delete_update() logic to update both nodes
when necessary
drivers/iommu/iova.c | 60
++++++++++++++++++++++++----------------------------
include/linux/iova.h | 3 ++-
2 files changed, 30 insertions(+), 33 deletions(-)
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 20be9a8b3188..c6f5a22f8d20 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -48,6 +48,7 @@ init_iova_domain(struct iova_domain *iovad, unsigned
long granule,
spin_lock_init(&iovad->iova_rbtree_lock);
iovad->rbroot = RB_ROOT;
+ iovad->cached_node = NULL;
iovad->cached32_node = NULL;
iovad->granule = granule;
iovad->start_pfn = start_pfn;
@@ -110,48 +111,44 @@ EXPORT_SYMBOL_GPL(init_iova_flush_queue);
static struct rb_node *
__get_cached_rbnode(struct iova_domain *iovad, unsigned long
*limit_pfn)
{
- if ((*limit_pfn > iovad->dma_32bit_pfn) ||
- (iovad->cached32_node == NULL))
+ struct rb_node *cached_node = NULL;
+ struct iova *curr_iova;
+
+ if (*limit_pfn <= iovad->dma_32bit_pfn)
+ cached_node = iovad->cached32_node;
+ if (!cached_node)
+ cached_node = iovad->cached_node;
+ if (!cached_node)
return rb_last(&iovad->rbroot);
- else {
- struct rb_node *prev_node = rb_prev(iovad->cached32_node);
- struct iova *curr_iova =
- rb_entry(iovad->cached32_node, struct iova, node);
- *limit_pfn = curr_iova->pfn_lo;
- return prev_node;
- }
+
+ curr_iova = rb_entry(cached_node, struct iova, node);
+ *limit_pfn = min(*limit_pfn, curr_iova->pfn_lo);
I guess this it the fix for stale pointer in iovad->cached32_node from
previous series but I think this is something more.
With this min() here we have real control over the limit_pfn we would
like to allocate at most. In other works, without your series two
subsequent calls can give us:
iova (ffff) = alloc_iova_fast(iovad, 1, DMA_BIT_MASK(32) >> shift);
iova (fffe)= alloc_iova_fast(iovad, 1, DMA_BIT_MASK(16) >> shift);
We do not see this since nobody uses limit_pfn below DMA_BIT_MASK(32)
now. That might be done intentionally so I ask for your opinion.
Also, with your patch and two the same alloc_iova_fast() calls in
iteration may get 32-bit space full much faster. Please correct me if I
messed things up.
After few more thoughts, I think this is unrealistic case.
In any case, please consider below fix:
static void
-__cached_rbnode_insert_update(struct iova_domain *iovad,
- unsigned long limit_pfn, struct iova *new)
+__cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
{
- if (limit_pfn != iovad->dma_32bit_pfn)
- return;
- iovad->cached32_node = &new->node;
+ if (new->pfn_hi == iovad->dma_32bit_pfn)
+ iovad->cached32_node = &new->node;
+ else
+ iovad->cached_node = &new->node;
}
Thanks,
Tomasz
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu