On 2021-02-04 19:30, Christoph Hellwig wrote:
Respect the min_align_mask in struct device_dma_parameters in swiotlb.

There are two parts to it:
  1) for the lower bits of the alignment inside the io tlb slot, just
     extend the size of the allocation and leave the start of the slot
     empty
  2) for the high bits ensure we find a slot that matches the high bits
     of the alignment to avoid wasting too much memory

Based on an earlier patch from Jianxiong Gao <jx...@google.com>.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
  kernel/dma/swiotlb.c | 49 +++++++++++++++++++++++++++++++++++++-------
  1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6a2439826a1ba4..ab3192142b9906 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -468,6 +468,18 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
        }
  }
+/*
+ * Return the offset into an iotlb slot required to keep the device happy.
+ */
+static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
+{
+       unsigned min_align_mask = dma_get_min_align_mask(dev);
+
+       if (!min_align_mask)
+               return 0;

I doubt that's beneficial - even if the compiler can convert it into a csel, it'll then be doing unnecessary work to throw away a cheaply-calculated 0 in favour of hard-coded 0 in the one case it matters ;)

+       return addr & min_align_mask & ((1 << IO_TLB_SHIFT) - 1);

(BTW, for readability throughout, "#define IO_TLB_SIZE (1 << IO_TLB_SHIFT)" sure wouldn't go amiss...)

+}
+
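Taken together, those two points amount to something like the sketch below (IO_TLB_SIZE being the convenience define suggested above, not something already in the tree; the masking alone already yields 0 when the device has no min_align_mask):

        #define IO_TLB_SIZE (1 << IO_TLB_SHIFT)

        static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
        {
                /* returns 0 automatically if dma_get_min_align_mask() is 0 */
                return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
        }
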
  /*
   * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
   */
@@ -478,6 +490,16 @@ static inline unsigned long get_max_slots(unsigned long boundary_mask)
        return nr_slots(boundary_mask + 1);
  }
+static inline bool check_alignment(phys_addr_t orig_addr,
+               dma_addr_t tbl_dma_addr, unsigned int index,
+               unsigned int min_align_mask)
+{
+       if (!min_align_mask)
+               return true;

Ditto - even the 5 or so operations this might skip are unlikely to outweigh a branch on anything that matters, and again csel would be a net loss since x & 0 == y & 0 is still the correct answer.

+       return ((tbl_dma_addr + (index << IO_TLB_SHIFT)) & min_align_mask) ==
+               (orig_addr & min_align_mask);
+}
+
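To spell it out, dropping the early return would simply leave the comparison, i.e. something like:

        static inline bool check_alignment(phys_addr_t orig_addr,
                        dma_addr_t tbl_dma_addr, unsigned int index,
                        unsigned int min_align_mask)
        {
                /* with min_align_mask == 0 both sides mask to 0 anyway */
                return ((tbl_dma_addr + (index << IO_TLB_SHIFT)) & min_align_mask) ==
                        (orig_addr & min_align_mask);
        }
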
  static unsigned int wrap_index(unsigned int index)
  {
        if (index >= io_tlb_nslabs)
@@ -489,9 +511,11 @@ static unsigned int wrap_index(unsigned int index)
   * Find a suitable number of IO TLB entries size that will fit this request and
   * allocate a buffer from that IO TLB pool.
   */
-static int find_slots(struct device *dev, size_t alloc_size,
-               dma_addr_t tbl_dma_addr)
+static int find_slots(struct device *dev, phys_addr_t orig_addr,
+               size_t alloc_size, dma_addr_t tbl_dma_addr)
  {
+       unsigned int min_align_mask = dma_get_min_align_mask(dev) &
+                       ~((1 << IO_TLB_SHIFT) - 1);
        unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev));
        unsigned int nslots = nr_slots(alloc_size), stride = 1;
        unsigned int index, wrap, count = 0, i;
@@ -503,7 +527,9 @@ static int find_slots(struct device *dev, size_t alloc_size,
         * For mappings greater than or equal to a page, we limit the stride
         * (and hence alignment) to a page size.
         */
-       if (alloc_size >= PAGE_SIZE)
+       if (min_align_mask)
+               stride = (min_align_mask + 1) >> IO_TLB_SHIFT;

So this can't underflow because "min_align_mask" is actually just the high-order bits representing the number of iotlb slots needed to meet the requirement, right? (It took a good 5 minutes to realise this wasn't doing what I initially thought it did...)

In that case, a) could the local var be called something like iotlb_align_mask to clarify that it's *not* just a copy of the device's min_align_mask, and b) maybe just have an unconditional initialisation that works either way:

        stride = (min_align_mask >> IO_TLB_SHIFT) + 1;
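
For illustration (assuming IO_TLB_SHIFT == 11, i.e. 2KB slots, and made-up mask values):

        /* min_align_mask == 0      ->  stride = (0      >> 11) + 1 = 1   (the existing default) */
        /* min_align_mask == 0xffff ->  stride = (0xffff >> 11) + 1 = 32  (64KB alignment)       */

so the zero case falls out naturally without the if().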

In fact with that, I think we could just mask orig_addr with ~IO_TLB_SIZE in the call to check_alignment() below, or shift everything down by IO_TLB_SHIFT in check_alignment() itself, instead of mangling min_align_mask at all (I'm assuming we do need to ignore the low-order bits of orig_addr at this point).
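
A rough (untested) sketch of that second option, keeping the check_alignment() signature from the patch but passing the device's min_align_mask through unmodified:

        static inline bool check_alignment(phys_addr_t orig_addr,
                        dma_addr_t tbl_dma_addr, unsigned int index,
                        unsigned int min_align_mask)
        {
                unsigned int slot_mask = min_align_mask >> IO_TLB_SHIFT;

                /* compare alignment at slot granularity; the low-order bits of
                 * orig_addr are ignored here rather than stripped from the mask */
                return (((tbl_dma_addr >> IO_TLB_SHIFT) + index) & slot_mask) ==
                        ((orig_addr >> IO_TLB_SHIFT) & slot_mask);
        }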

Robin.