Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-04 Thread Benjamin Herrenschmidt
On Sat, 2015-04-04 at 07:27 -0400, Sowmini Varadhan wrote:
> One last question before I spin out v9.. the dma_mask code
> is a bit confusing to me, so I want to make sure... the code is 
> 
> > if (limit + tbl->it_offset > mask) {
> >   limit = mask - tbl->it_offset + 1;
> >   /* If we're constrained on address range, first try
> >* at the masked hint to avoid O(n) search complexity,
> >* but on second pass, start at 0 in pool 0.
> >*/
> >   if ((start & mask) >= limit || pass > 0) {
> >   spin_unlock(&(pool->lock));
> >   pool = &(tbl->pools[0]);
> >   spin_lock(&(pool->lock));
> >   start = pool->start;
> 
> So here I would need to set need_flush to true, right?

Well, no but ...

You aren't changing any pool hint here, so you don't have to mark any
pool for flushing. The "n < hint" check will do the flush later on if
needed (the one I told you to add).

You only need to force a flush if you modify the hint.

*However*, you do an unlock, which means that if you have modified any
*other* pool's hint (ie, need_flush is true), you need to actually
perform a flush.

I think that business with flushing on unlock is becoming too ugly; I
would suggest you simplify things by moving need_flush into the pool
structure (maybe as a flag).

IE, when you modify a pool's hint, you mark it for flushing.

When you allocate from a pool, you flush it at the end, just before you
return success, if either it was marked for flushing or n < hint (and
clear the mark of course).

That should cleanly factor all cases into a simpler & more readable code
path. The flush is done only when needed (an actual allocation happens on
a pool); you just "mark them dirty" when you change their hints, and you
no longer have to worry about the lock-dropped case & pool hop'ing.

Basically, the logic looks like this:

.../...

n = iommu_area_alloc(...)
if (n == -1) {
    if (pass == 0) {
        change pool hint
        pool->need_flush = true;
        pass++;
        goto again;
    } else if (!largealloc && pass <= iommu->nr_pools) {
        unlock/pool hop/lock
        // no need to flush anything here
        pool->hint = pool->start;
        pool->need_flush = true;
        etc...
    } else {
        n = DMA_ERROR_CODE;
        goto fail;
    }
}
end = n + pages;

if (n < pool->hint || pool->need_flush) {
    pool->need_flush = false;
    iommu->lazy_flush(...);
}
pool->hint = end;

.../...

Now this can be further simplified imho. For example that logic:

+   if (pass == 0 && handle && *handle &&
+   (*handle >= pool->start) && (*handle < pool->end))
+   start = *handle;
+   else
+   start = pool->hint;

Can move above the again: label. That way, you remove the pass test
and you no longer have to adjust pool->hint in the "if (likely(pass ==
0))" failure case. Just adjust "start" and try again, which means you
also no longer need to set need_flush.

Additionally, when pool hop'ing, you now only need to adjust "start" as
well. By not resetting the hint, you also remove the need for marking the
pool "need flush". I would add to that: why not set "start" to the new
pool's hint instead of the new pool's start? That will save some flushes, no?
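
Putting those two changes together, the retry paths boil down to roughly
this (sketch only, untested, reusing the names from the patch under
review):

/* once, before the again: label */
if (handle && *handle &&
    (*handle >= pool->start) && (*handle < pool->end))
    start = *handle;
else
    start = pool->hint;
 again:
    ...
    if (pass == 0) {
        start = pool->start;    /* hint untouched, nothing to flush */
        pass++;
        goto again;
    } else if (!largealloc && pass <= iommu->nr_pools) {
        /* pool hop: pick up the new pool's hint, not its start */
        spin_unlock(&(pool->lock));
        pool_nr = (pool_nr + 1) & (iommu->nr_pools - 1);
        pool = &(iommu->pools[pool_nr]);
        spin_lock(&(pool->lock));
        start = pool->hint;
        pass++;
        goto again;
    }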

Anton, do you remember why you reset the hint when changing pool? I
don't see that gaining us anything, and it does make lazy flushing more
complex.

At that point, you have removed all setters of need_flush, so that logic
can be entirely removed. Or am I just too tired (it's past midnight) and
missing something entirely here?

We only need the if (n < pool->hint) check to do the lazy flush at the
end; that's it.

Cheers,
Ben.



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-04 Thread Sowmini Varadhan
One last question before I spin out v9.. the dma_mask code
is a bit confusing to me, so I want to make sure... the code is 

> if (limit + tbl->it_offset > mask) {
>   limit = mask - tbl->it_offset + 1;
>   /* If we're constrained on address range, first try
>* at the masked hint to avoid O(n) search complexity,
>* but on second pass, start at 0 in pool 0.
>*/
>   if ((start & mask) >= limit || pass > 0) {
>   spin_unlock(&(pool->lock));
>   pool = &(tbl->pools[0]);
>   spin_lock(&(pool->lock));
>   start = pool->start;

So here I would need to set need_flush to true, right?


--Sowmini


Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-03 Thread Sowmini Varadhan
On (04/04/15 08:06), Benjamin Herrenschmidt wrote:
> 
> No, I meant "n < pool->hint", ie, the start of the newly allocated
> block.

Ah, got it. I'll do my drill with the patchset and get back, probably by
Monday.

--Sowmini

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-03 Thread Benjamin Herrenschmidt
On Fri, 2015-04-03 at 14:28 -0400, Sowmini Varadhan wrote:
> 
> Just want to confirm:
> 
> > > + again:
> > > + if (pass == 0 && handle && *handle &&
> > > + (*handle >= pool->start) && (*handle < pool->end))
> > > + start = *handle;
> > > + else
> > > + start = pool->hint;
> > 
> > Now this means "handle" might be < pool->hint; in that case you also
> > need a lazy flush. Or rather, only if the resulting alloc is. My
>  :
> 
> > > + } else {
> > > + /* give up */
> > > + n = DMA_ERROR_CODE;
> > > + goto bail;
> > > + }
> > > + }
> > 
> > Here, make this something like:
> > 
> > } else if (end < pool->hint)
> > need_flush = true;
> 
> you mean 
> 
>   } else if (start < pool->hint)
> 
> right?  (so I'm not missing some corner-case that you are thinking
> about here)

No, I meant "n < pool->hint", ie, the start of the newly allocated
block.

"end" hasn't been adjusted yet at that point but we don't want to
compare "end" anyway (which will be n + npages), we really want n, ie if
the *beginning* of the newly allocated chunk is before the end of the
previous one (after all they may even overlap if the previous one has
been freed).

Ben.

> --Sowmini
>  



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-03 Thread Sowmini Varadhan


Just want to confirm:

> > + again:
> > +   if (pass == 0 && handle && *handle &&
> > +   (*handle >= pool->start) && (*handle < pool->end))
> > +   start = *handle;
> > +   else
> > +   start = pool->hint;
> 
> Now this means "handle" might be < pool->hint; in that case you also
> need a lazy flush. Or rather, only if the resulting alloc is. My
 :

> > +   } else {
> > +   /* give up */
> > +   n = DMA_ERROR_CODE;
> > +   goto bail;
> > +   }
> > +   }
> 
> Here, make this something like:
> 
>   } else if (end < pool->hint)
>   need_flush = true;

you mean 

} else if (start < pool->hint)

right?  (so I'm not missing some corner-case that you are thinking
about here)

--Sowmini
 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread David Miller
From: Sowmini Varadhan 
Date: Thu, 2 Apr 2015 17:54:53 -0400

> the other question that comes to my mind is: the whole lazy_flush
> optimization probably works best when there is exactly one pool,
> and no large pools. In most other cases, we'd end up doing a lazy_flush
> when we wrap within our pool itself, losing the benefit of that 
> optimization. 
> 
> Given that the lazy_flush is mostly there to avoid regressions for 
> the older sun4u architectures (which have other hardware bottlenecks 
> anyway), and this code is rapidly getting messy, does it make sense
> to constrain the lazy_flush check to only apply for the 1-pool, 
> no-large-pool case?

I think it's better to have the multiple pools with more frequent
global flushing.

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt
On Thu, 2015-04-02 at 18:15 -0400, Sowmini Varadhan wrote:
> On (04/03/15 08:57), Benjamin Herrenschmidt wrote:
> > 
> > > > I only just noticed too, you completely dropped the code to honor
> > > > the dma mask. Why that? Some devices rely on this.
> 
> /* Sowmini's comment about this coming from sparc origins.. */
> 
> > Probably. Not that many devices have limits on DMA mask, but they do
> > exist. It becomes more important if we decide to create a very large
> > IOMMU window that spans beyond 4G in order to support devices with
> > 32-bit DMA masks. Otherwise it's older devices mostly with <32-bit
> > masks.
> > 
> > In any case, for a generic piece of code, this should be supported.
> > Basically, assume that if we have something in the powerpc code, we need
> > it; if you remove it, we won't be able to use your code generically.
> 
> I see.
> 
> is the mask something that can be stored in the iommu_map_table as
> part of the init? 
> 
No, the mask is per device and has to be retrieved from the device.
Additionally the mask used for dma_map_* can be different from the
consistent mask used for alloc_coherent (we have a bug there on powerpc
which I'm trying to fix btw).

So it should be passed as an argument by the caller.
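
For illustration only (this assumes the v9 allocator grows mask and
align_order parameters, which don't exist in the v8 signature;
dma_get_mask() is the standard per-device accessor):

    /* caller side, at map time */
    u64 mask = dev ? dma_get_mask(dev) : ~(u64)0;

    entry = iommu_tbl_range_alloc(dev, iommu, npages, handle,
                                  mask, align_order);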

> I can see that the align_order has to be an additional arg to
> iommu_tbl_range_alloc; I'm not sure if mask falls into that category
> as well.

It does.

Cheers,
Ben.



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread Sowmini Varadhan
On (04/03/15 08:57), Benjamin Herrenschmidt wrote:
> 
> > > I only just noticed too, you completely dropped the code to honor
> > > the dma mask. Why that? Some devices rely on this.

/* Sowmini's comment about this coming from sparc origins.. */

> Probably. Not that many devices have limits on DMA mask, but they do
> exist. It becomes more important if we decide to create a very large
> IOMMU window that spans beyond 4G in order to support devices with
> 32-bit DMA masks. Otherwise it's older devices mostly with <32-bit
> masks.
> 
> In any case, for a generic piece of code, this should be supported.
> Basically, assume that if we have something in the powerpc code, we need
> it; if you remove it, we won't be able to use your code generically.

I see.

is the mask something that can be stored in the iommu_map_table as
part of the init? 

I can see that the align_order has to be an additional arg to
iommu_tbl_range_alloc; I'm not sure if mask falls into that category
as well.

--Sowmini


Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt
On Thu, 2015-04-02 at 17:54 -0400, Sowmini Varadhan wrote:
> the other question that comes to my mind is: the whole lazy_flush
> optimization probably works best when there is exactly one pool,
> and no large pools. In most other cases, we'd end up doing a lazy_flush
> when we wrap within our pool itself, losing the benefit of that 
> optimization. 

Not a big deal. The point of the optimization is to avoid flushing on
every map/unmap. As long as the pool wrapping is an order of magnitude
rarer than actual map/unmap operations, it's a win.

> Given that the lazy_flush is mostly there to avoid regressions for 
> the older sun4u architectures (which have other hardware bottlenecks 
> anyway), and this code is rapidly getting messy, does it make sense
> to constrain the lazy_flush check to only apply for the 1-pool, 
> no-large-pool case?

I was planning to use it to improve things on older G5s as well, and I
definitely want pools on these. Leave it in; we just need to get it
right.

I think it's not that messy: if you refactor the way I suggested, it
basically boils down to conditionally flushing in two places, at the
points where the pool lock is dropped, whenever the hint for that pool
has moved backward.

Do the test once in the else case of the n == -1 check for all the
"normal" cases (hint > limit, multi-pass wrap and *handle below hint are
all covered there), and once in the pool hop'ing case.
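
In code, that single test is just (sketch, untested):

    n = iommu_area_alloc(...);
    if (n == -1) {
        /* first-pass retry, pool hop or failure, as before */
    } else if (n < pool->hint) {
        /* hint > limit, multi-pass wrap and *handle below the
         * hint all funnel through here */
        need_flush = true;
    }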

Cheers,
Ben.




Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt
On Fri, 2015-04-03 at 09:01 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2015-04-03 at 08:57 +1100, Benjamin Herrenschmidt wrote:
> 
> > No, the last argument to iommu_area_alloc(), which is passed from the
> > callers when doing consistent allocs. Basically, the DMA API mandates
> > that consistent allocs are naturally aligned (to their own size); we
> > implement that on powerpc by passing that alignment argument down.
> 
> Talking of this ... I notice this is not documented in DMA-API.txt;
> however, many drivers make that assumption, and so does the DMA pool
> allocator in mm/dmapool.c.
> 
> Maybe somebody should update DMA-API.txt...

Ah it *is* documented in DMA-API-HOWTO.txt :-)

Cheers,
Ben.



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt
On Fri, 2015-04-03 at 08:57 +1100, Benjamin Herrenschmidt wrote:

> No, the last argument to iommu_area_alloc(), which is passed from the
> callers when doing consistent allocs. Basically, the DMA API mandates
> that consistent allocs are naturally aligned (to their own size); we
> implement that on powerpc by passing that alignment argument down.

Talking of this ... I notice this is not documented in DMA-API.txt;
however, many drivers make that assumption, and so does the DMA pool
allocator in mm/dmapool.c.
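
To spell out the assumption drivers make (illustrative only):

    /* a coherent allocation of a power-of-two size is expected to
     * come back naturally aligned; here dma_handle is assumed to
     * be 64KB-aligned */
    buf = dma_alloc_coherent(dev, SZ_64K, &dma_handle, GFP_KERNEL);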

Maybe somebody should update DMA-API.txt...

Cheers,
Ben.



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt
On Thu, 2015-04-02 at 17:43 -0400, Sowmini Varadhan wrote:
> On (04/03/15 07:54), Benjamin Herrenschmidt wrote:
> > > + limit = pool->end;
> > > +
> > > + /* The case below can happen if we have a small segment appended
> > > +  * to a large, or when the previous alloc was at the very end of
> > > +  * the available space. If so, go back to the beginning and flush.
> > > +  */
> > > + if (start >= limit) {
> > > + start = pool->start;
> > > + if (!large_pool && iommu->lazy_flush != NULL)
> > > + iommu->lazy_flush(iommu);
> > 
> > Add need_flush = false;
> 
> A few clarifications, while I parse the rest of your comments:
> 
> Not sure I follow: need_flush is initialized to true at the start of the
> function?

No, but you can loop back there via "goto again". However, if you follow
my other comment and move the flush back to the end, you don't need
that at all.

> > I only just noticed too, you completely dropped the code to honor
> > the dma mask. Why that? Some devices rely on this.
> 
> so that's an interesting question: the existing iommu_range_alloc() in
> arch/sparc/kernel/iommu.c does not use the mask at all. I based most of
> the code on this (except for the lock fragmentation part).
> I don't know if this is arch specific.

Probably. Not that many devices have limits on DMA mask, but they do
exist. It becomes more important if we decide to create a very large
IOMMU window that spans beyond 4G in order to support devices with
32-bit DMA masks. Otherwise it's older devices mostly with <32-bit
masks.

In any case, for a generic piece of code, this should be supported.
Basically, assume that if we have something in the powerpc code, we need
it; if you remove it, we won't be able to use your code generically.

There are a few cases we can debate, like our block allocation, but
things like the mask or the alignment constraints aren't in that group.

> > 
> > > + if (dev)
> > > + boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> > > +   1 << iommu->table_shift);
> > > + else
> > > + boundary_size = ALIGN(1UL << 32, 1 << iommu->table_shift);
> > > +
> > > + shift = iommu->table_map_base >> iommu->table_shift;
> > > + boundary_size = boundary_size >> iommu->table_shift;
> > > + /*
> > > +  * if the skip_span_boundary_check had been set during init, we set
> > > +  * things up so that iommu_is_span_boundary() merely checks if the
> > > +  * (index + npages) < num_tsb_entries
> > > +  */
> > > + if ((iommu->flags & IOMMU_NO_SPAN_BOUND) != 0) {
> > > + shift = 0;
> > > + boundary_size = iommu->poolsize * iommu->nr_pools;
> > > + }
> > > + n = iommu_area_alloc(iommu->map, limit, start, npages, shift,
> > > +  boundary_size, 0);
> > 
> > You have completely dropped the alignment support. This will break
> > drivers. There are cases (especially with consistent allocations) where
> 
> Again, not sure I follow: are you referring to the IOMMU_NO_SPAN_BOUND case?

No, the last argument to iommu_area_alloc(), which is passed from the
callers when doing consistent allocs. Basically, the DMA API mandates
that consistent allocs are naturally aligned (to their own size); we
implement that on powerpc by passing that alignment argument down.
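
Roughly, on the powerpc side (sketch; align_mask is derived from the
natural-alignment order of the allocation):

    unsigned long align_mask = (1UL << align_order) - 1;

    n = iommu_area_alloc(tbl->it_map, limit, start, npages, shift,
                         boundary_size, align_mask);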

> That's very specific to LDC (sparc ldoms virtualization infra). The default
> is to not have IOMMU_NO_SPAN_BOUND set. 
> For the rest of the drivers, the code that sets up boundary_size aligns things
> in the same way as the ppc code.
> 
> 
> > the driver has alignment constraints on the address; those must be
> > preserved.
> > 
> 
> --Sowmini



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt
On Tue, 2015-03-31 at 10:40 -0400, Sowmini Varadhan wrote:

> + if (largealloc) {
> + pool = &(iommu->large_pool);
> + spin_lock_irqsave(&pool->lock, flags);
> + pool_nr = 0; /* to keep compiler happy */
> + } else {
> + /* pick out pool_nr */
> + pool_nr =  pool_hash & (npools - 1);
> + pool = &(iommu->pools[pool_nr]);
> + spin_lock_irqsave(&(pool->lock), flags);
> + }

Move spin_lock_irqsave outside of if/else

> + again:
> + if (pass == 0 && handle && *handle &&
> + (*handle >= pool->start) && (*handle < pool->end))
> + start = *handle;
> + else
> + start = pool->hint;

Now this means "handle" might be < pool->hint; in that case you also
need a lazy flush. Or rather, only if the resulting alloc is. My
suggestion originally was to test that after the area alloc. See further
down.

> + limit = pool->end;
> +
> + /* The case below can happen if we have a small segment appended
> +  * to a large, or when the previous alloc was at the very end of
> +  * the available space. If so, go back to the beginning and flush.
> +  */
> + if (start >= limit) {
> + start = pool->start;
> + if (!large_pool && iommu->lazy_flush != NULL)
> + iommu->lazy_flush(iommu);

Add need_flush = false;

In fact, looking more closely, I see another problem:

You never flush the large pool. You should either remove all
your large_pool checks, since you treat it as a normal pool in
your code, and let it flush normally, or have an unconditional
flush somewhere in there for the large pool (or in free()), either way.
The current implementation you show me, afaik, never calls lazy_flush
on a large allocation.
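
For the second option, something like this in the free path would do
(sketch only):

    /* the large pool is never lazily flushed on the alloc side, so
     * flush when giving entries back to it */
    if (pool == &(iommu->large_pool) && iommu->lazy_flush != NULL)
        iommu->lazy_flush(iommu);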

Also, if you fix the problem I mentioned earlier about *handle < hint,
you don't need the above flush either; it will be handled further down
in the else case of the if (n == -1). See below.

> + }

I only just noticed too, you completely dropped the code to honor
the dma mask. Why that? Some devices rely on this.

> + if (dev)
> + boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> +   1 << iommu->table_shift);
> + else
> + boundary_size = ALIGN(1UL << 32, 1 << iommu->table_shift);
> +
> + shift = iommu->table_map_base >> iommu->table_shift;
> + boundary_size = boundary_size >> iommu->table_shift;
> + /*
> +  * if the skip_span_boundary_check had been set during init, we set
> +  * things up so that iommu_is_span_boundary() merely checks if the
> +  * (index + npages) < num_tsb_entries
> +  */
> + if ((iommu->flags & IOMMU_NO_SPAN_BOUND) != 0) {
> + shift = 0;
> + boundary_size = iommu->poolsize * iommu->nr_pools;
> + }
> + n = iommu_area_alloc(iommu->map, limit, start, npages, shift,
> +  boundary_size, 0);

You have completely dropped the alignment support. This will break
drivers. There are cases (especially with consistent allocations) where
the driver has alignment constraints on the address; those must be
preserved.

> + if (n == -1) {
> + if (likely(pass == 0)) {
> + /* First failure, rescan from the beginning.  */
> + pool->hint = pool->start;
> + if (!large_pool && iommu->lazy_flush != NULL)
> + need_flush = true;

Same question about the large pool check... I wouldn't bother with the
function pointer NULL check here either; only test it at the call site.

> + pass++;
> + goto again;
> + } else if (!largealloc && pass <= iommu->nr_pools) {
> + 

Here, you might have moved the hint of the pool (case above), and then
hit this code path, no? In that case, need_flush might be set, which
means you must flush before you unlock.

>   spin_unlock(&(pool->lock));
> +
>   pool_nr = (pool_nr + 1) & (iommu->nr_pools - 1);
> + pool = &(iommu->pools[pool_nr]);
> + spin_lock(&(pool->lock));
> + pool->hint = pool->start;
> + if (!large_pool && iommu->lazy_flush != NULL)
> + need_flush = true;

I wouldn't bother with the test. This can't be large_alloc afaik and
the NULL check can be done at the call site.

> + pass++;
> + goto again;
> + } else {
> + /* give up */
> + n = DMA_ERROR_CODE;
> + goto bail;
> + }
> + }

Here, make this something like:

} else if (end < pool->hint)
need_flush = true;

To handle the case where you started your alloc below the hint without
having updated the hint yet, ie, a *handle < hint or a 

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread Sowmini Varadhan
the other question that comes to my mind is: the whole lazy_flush
optimization probably works best when there is exactly one pool,
and no large pools. In most other cases, we'd end up doing a lazy_flush
when we wrap within our pool itself, losing the benefit of that 
optimization. 

Given that the lazy_flush is mostly there to avoid regressions for 
the older sun4u architectures (which have other hardware bottlenecks 
anyway), and this code is rapidly getting messy, does it make sense
to constrain the lazy_flush check to only apply for the 1-pool, 
no-large-pool case?



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread Sowmini Varadhan
On (04/03/15 07:54), Benjamin Herrenschmidt wrote:
> > +   limit = pool->end;
> > +
> > +   /* The case below can happen if we have a small segment appended
> > +* to a large, or when the previous alloc was at the very end of
> > +* the available space. If so, go back to the beginning and flush.
> > +*/
> > +   if (start >= limit) {
> > +   start = pool->start;
> > +   if (!large_pool && iommu->lazy_flush != NULL)
> > +   iommu->lazy_flush(iommu);
> 
> Add need_flush = false;

A few clarifications, while I parse the rest of your comments:

Not sure I follow: need_flush is initialized to true at the start of the
function?

> I only just noticed too, you completely dropped the code to honor
> the dma mask. Why that? Some devices rely on this.

so that's an interesting question: the existing iommu_range_alloc() in
arch/sparc/kernel/iommu.c does not use the mask at all. I based most of
the code on this (except for the lock fragmentation part).
I don't know if this is arch specific.

> 
> > +   if (dev)
> > +   boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> > + 1 << iommu->table_shift);
> > +   else
> > +   boundary_size = ALIGN(1UL << 32, 1 << iommu->table_shift);
> > +
> > +   shift = iommu->table_map_base >> iommu->table_shift;
> > +   boundary_size = boundary_size >> iommu->table_shift;
> > +   /*
> > +* if the skip_span_boundary_check had been set during init, we set
> > +* things up so that iommu_is_span_boundary() merely checks if the
> > +* (index + npages) < num_tsb_entries
> > +*/
> > +   if ((iommu->flags & IOMMU_NO_SPAN_BOUND) != 0) {
> > +   shift = 0;
> > +   boundary_size = iommu->poolsize * iommu->nr_pools;
> > +   }
> > +   n = iommu_area_alloc(iommu->map, limit, start, npages, shift,
> > +boundary_size, 0);
> 
> You have completely dropped the alignment support. This will break
> drivers. There are cases (especially with consistent allocations) where

Again, not sure I follow: are you referring to the IOMMU_NO_SPAN_BOUND case?
That's very specific to LDC (sparc ldoms virtualization infra). The default
is to not have IOMMU_NO_SPAN_BOUND set. 
For the rest of the drivers, the code that sets up boundary_size aligns things
in the same way as the ppc code.


> the driver has alignment constraints on the address; those must be
> preserved.
> 

--Sowmini

RE: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-31 Thread David Laight
From: Sowmini Varadhan
> Investigation of multithreaded iperf experiments on an ethernet
> interface shows the iommu->lock as the hottest lock identified by
> lockstat, with something of the order of 21M contentions out of
> 27M acquisitions, and an average wait time of 26 us for the lock.
> This is not efficient. A more scalable design is to follow the ppc
> model, where the iommu_map_table has multiple pools, each stretching
> over a segment of the map, and with a separate lock for each pool.
> This model allows for better parallelization of the iommu map search.

I've wondered whether the iommu setup for ethernet receive (in particular)
could be made much more efficient if there were a function that
would unmap one buffer and map a second buffer.
My thought is that the iommu pte entry used by the old buffer could just
be modified to reference the new one.
In effect each ring entry would end up using a fixed iommu pte.
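
A hypothetical shape for such a helper (every name below is invented
for illustration; no such API exists today):

    /* rewrite the pte already backing this ring slot to point at a
     * fresh buffer, flushing just that one IOTLB entry */
    static dma_addr_t ring_slot_remap(struct my_iommu *iommu,
                                      unsigned long entry, void *new_buf)
    {
        iommu->page_table[entry] = virt_to_phys(new_buf) | IOPTE_VALID;
        iommu->flush_entry(iommu, entry);
        return iommu->table_map_base + (entry << iommu->table_shift);
    }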

The other question is how much data can be copied in 26 us?
On iommu systems 'copybreak' limits on receive and transmit
may need to be quite high.
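
(Back-of-envelope: assuming a memcpy rate of ~10 GB/s, 26 us is roughly
260 KB of copying, so copybreak thresholds of a few KB look cheap next
to a contended map/unmap.)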

David


Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-31 Thread Sowmini Varadhan
On (03/31/15 15:15), David Laight wrote:
> 
> I've wondered whether the iommu setup for ethernet receive (in particular)
> could be made much more efficient if there were a function that
> would unmap one buffer and map a second buffer.
> My thought is that the iommu pte entry used by the old buffer could just
> be modified to reference the new one.
> In effect each ring entry would end up using a fixed iommu pte.

There are a number of interesting things to investigate in
this space, and the above is just one of them: ways to avoid
the overhead of a full-blown map/unmap on each call. See
  http://www.spinics.net/lists/sparclinux/msg13613.html

But the scope of this patchset is actually very rigidly defined:
to refactor the iommu pool/arena allocator into a common library,
and avoid code duplication (today even the single sparc arch
duplicates it for sun4[u,v] and ldc, and that's not even counting
the duplication across other archs/pci-drivers).

Investigating ways to provide a generalized infra that can
avoid a dma map/unmap for every packet would be a good follow-on.

> The other question is how much data can be copied in 26 us?
> On iommu systems 'copybreak' limits on receive and transmit
> may need to be quite high.

Where does the "26us" number come from? I may be missing that context.

--Sowmini

[PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-31 Thread Sowmini Varadhan
Investigation of multithreaded iperf experiments on an ethernet
interface shows the iommu->lock as the hottest lock identified by
lockstat, with something of the order of 21M contentions out of
27M acquisitions, and an average wait time of 26 us for the lock.
This is not efficient. A more scalable design is to follow the ppc
model, where the iommu_map_table has multiple pools, each stretching
over a segment of the map, and with a separate lock for each pool.
This model allows for better parallelization of the iommu map search.

This patch adds the iommu range alloc/free function infrastructure.
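
For orientation, a caller-side sketch of the new interface (hedged: the
bitmap is caller-allocated, and IO_PAGE_SHIFT plus the npools choice are
just example values):

    struct iommu_map_table tbl;
    unsigned long entry;

    tbl.map = my_bitmap;    /* storage provided by the caller */
    iommu_tbl_pool_init(&tbl, num_entries, IO_PAGE_SHIFT,
                        NULL /* no lazy_flush */,
                        false /* no large pool */,
                        1 /* number of pools */,
                        false /* keep span-boundary checks */);

    entry = iommu_tbl_range_alloc(dev, &tbl, npages, NULL);
    if (entry != DMA_ERROR_CODE)
        iommu_tbl_range_free(&tbl, dma_addr, npages, entry);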

Signed-off-by: Sowmini Varadhan 
---
v2 changes:
  - incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
  - make the npools value an input parameter, for the case when
the iommu map size is not very large
  - cookie_to_index mapping, and optimizations for span-boundary
check, for use cases such as LDC.
v3: eliminate iommu_sparc, rearrange the ->demap indirection to
be invoked under the pool lock.

v4: David Miller review changes:
  - s/IOMMU_ERROR_CODE/DMA_ERROR_CODE
  - page_table_map_base and page_table_shift are unsigned long, not u32.

v5: Feedback from b...@kernel.crashing.org and a...@ozlabs.ru
  - removed ->cookie_to_index and ->demap indirection: caller should
invoke these as needed before calling into the generic allocator

v6: Benh/DaveM discussion eliminating iommu_tbl_ops, but retaining the
flush_all optimization.

v7: one-time initialization of pool_hash from iommu_tbl_pool_init()

v8: Benh code review comments.

 include/linux/iommu-common.h |   48 +
 lib/Makefile |2 +-
 lib/iommu-common.c   |  233 ++
 3 files changed, 282 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/iommu-common.h
 create mode 100644 lib/iommu-common.c

diff --git a/include/linux/iommu-common.h b/include/linux/iommu-common.h
new file mode 100644
index 000..7b1b761
--- /dev/null
+++ b/include/linux/iommu-common.h
@@ -0,0 +1,48 @@
+#ifndef _LINUX_IOMMU_COMMON_H
+#define _LINUX_IOMMU_COMMON_H
+
+#include 
+#include 
+#include 
+
+#define IOMMU_POOL_HASHBITS 4
+#define IOMMU_NR_POOLS  (1 << IOMMU_POOL_HASHBITS)
+
+struct iommu_pool {
+   unsigned long   start;
+   unsigned long   end;
+   unsigned long   hint;
+   spinlock_t  lock;
+};
+
+struct iommu_map_table {
+   unsigned long   table_map_base;
+   unsigned long   table_shift;
+   unsigned long   nr_pools;
+   void(*lazy_flush)(struct iommu_map_table *);
+   unsigned long   poolsize;
+   struct iommu_pool   pools[IOMMU_NR_POOLS];
+   u32 flags;
+#define	IOMMU_HAS_LARGE_POOL	0x0001
+#define	IOMMU_NO_SPAN_BOUND	0x0002
+   struct iommu_pool   large_pool;
+   unsigned long   *map;
+};
+
+extern void iommu_tbl_pool_init(struct iommu_map_table *iommu,
+   unsigned long num_entries,
+   u32 table_shift,
+   void (*lazy_flush)(struct iommu_map_table *),
+   bool large_pool, u32 npools,
+   bool skip_span_boundary_check);
+
+extern unsigned long iommu_tbl_range_alloc(struct device *dev,
+  struct iommu_map_table *iommu,
+  unsigned long npages,
+  unsigned long *handle);
+
+extern void iommu_tbl_range_free(struct iommu_map_table *iommu,
+u64 dma_addr, unsigned long npages,
+unsigned long entry);
+
+#endif
diff --git a/lib/Makefile b/lib/Makefile
index 58f74d2..60c22e6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,7 +106,7 @@ obj-$(CONFIG_AUDIT_GENERIC) += audit.o
 obj-$(CONFIG_AUDIT_COMPAT_GENERIC) += compat_audit.o
 
 obj-$(CONFIG_SWIOTLB) += swiotlb.o
-obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o
+obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o iommu-common.o
 obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
 obj-$(CONFIG_NOTIFIER_ERROR_INJECTION) += notifier-error-inject.o
 obj-$(CONFIG_CPU_NOTIFIER_ERROR_INJECT) += cpu-notifier-error-inject.o
diff --git a/lib/iommu-common.c b/lib/iommu-common.c
new file mode 100644
index 000..35dacbe
--- /dev/null
+++ b/lib/iommu-common.c
@@ -0,0 +1,233 @@
+/*
+ * IOMMU mmap management and range allocation functions.
+ * Based almost entirely upon the powerpc iommu allocator.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+unsigned long iommu_large_alloc = 15;
+
+static DEFINE_PER_CPU(unsigned int, iommu_pool_hash);
+
+static void setup_iommu_pool_hash(void)
+{
+   unsigned int i;
+   static bool do_once;
+
+   if (do_o