Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-04 Thread Sowmini Varadhan
One last question before I spin out v9.. the dma_mask code
is a bit confusing to me, so I want to make sure... the code is 

 if (limit + tbl->it_offset > mask) {
         limit = mask - tbl->it_offset + 1;
         /* If we're constrained on address range, first try
          * at the masked hint to avoid O(n) search complexity,
          * but on second pass, start at 0 in pool 0.
          */
         if ((start & mask) >= limit || pass > 0) {
                 spin_unlock(&(pool->lock));
                 pool = &(tbl->pools[0]);
                 spin_lock(&(pool->lock));
                 start = pool->start;

So here I would need to mark need_flush to true, right?


--Sowmini


Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-04 Thread Benjamin Herrenschmidt
On Sat, 2015-04-04 at 07:27 -0400, Sowmini Varadhan wrote:
 One last question before I spin out v9.. the dma_mask code
 is a bit confusing to me, so I want to make sure... the code is 
 
  if (limit + tbl->it_offset > mask) {
          limit = mask - tbl->it_offset + 1;
          /* If we're constrained on address range, first try
           * at the masked hint to avoid O(n) search complexity,
           * but on second pass, start at 0 in pool 0.
           */
          if ((start & mask) >= limit || pass > 0) {
                  spin_unlock(&(pool->lock));
                  pool = &(tbl->pools[0]);
                  spin_lock(&(pool->lock));
                  start = pool->start;
 
 So here I would need to mark need_flush to true, right?

Well, no but ...

You aren't changing any pool hint here, so you don't have to mark any pool
for flushing. The n < hint check will do the flush later on if needed
(the one I told you to add).

You only need to force a flush if you modify the hint.

*However*, you do an unlock, which means that if you have modified any
*other* pool's hint (ie, need_flush is true), you need to actually
perform a flush.

I think that business with flushing on unlock is becoming too ugly; I
would suggest you simplify things by moving need_flush into the pool
structure (maybe as a flag).
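
Something along these lines (a rough sketch only; the field names are
illustrative, not the final patch):

	/* sketch: per-pool lazy-flush marker instead of a local need_flush */
	struct iommu_pool {
		unsigned long	start;
		unsigned long	end;
		unsigned long	hint;
		spinlock_t	lock;
		bool		need_flush;	/* hint moved backwards, flush on next alloc */
	};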

I.e., when you modify a pool's hint, you mark it for flushing.

When you allocate from a pool, you flush it at the end, just before you
return success, if either it was marked for flushing or n < hint (and
clear the mark of course).

That should cleanly factor all the cases into a simpler and more readable
code path. The flush is done only when needed (when an actual allocation
happens in a pool); you just mark pools dirty when you change their hints,
and you no longer have to worry about the lock-dropped case and pool hopping.

Basically the logic looks like that:

.../...

n = iommu_area_alloc(...)
if (n == -1) {
        if (pass == 0) {
                change pool hint
                pool->need_flush = true;
                pass++;
                goto again;
        } else if (!largealloc && ...) {
                unlock/pool hop/lock
                // no need to flush anything here
                pool->hint = pool->start;
                pool->need_flush = true;
                etc...
        } else {
                n = DMA_ERROR_CODE;
                goto fail;
        }
}
end = n + npages;

if (n < pool->hint || pool->need_flush) {
        pool->need_flush = false;
        iommu->lazy_flush(...);
}
pool->hint = end;

.../...

Now this can be further simplified imho. For example that logic:

+   if (pass == 0 && handle && *handle &&
+       (*handle >= pool->start) && (*handle < pool->end))
+           start = *handle;
+   else
+           start = pool->hint;

Can move above the again: label. That way, you remove the pass test
and you no longer have to adjust pool->hint in the if (likely(pass ==
0)) failure case. Just adjust start and try again, which means you
also no longer need to set need_flush.

Additionally, when pool hopping, you now only need to adjust start as
well. By not resetting the hint, you also remove the need to mark the pool
for flushing. I would add to that: why not set start to the new pool's hint
instead of the new pool's start? That would save some flushes, no?
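
In other words, roughly this (untested sketch, reusing the names from the
quoted patch): the hint/handle test moves above the again: label, and the
pool-hop path only touches start, picking up the new pool's hint:

	/* honour the caller-supplied hint once, before any retry */
	if (handle && *handle &&
	    (*handle >= pool->start) && (*handle < pool->end))
		start = *handle;
	else
		start = pool->hint;
 again:
	limit = pool->end;
	...
	/* pool hop: only start changes, taking the new pool's hint */
	pool_nr = (pool_nr + 1) & (iommu->nr_pools - 1);
	pool = &(iommu->pools[pool_nr]);
	spin_lock(&(pool->lock));
	start = pool->hint;
	pass++;
	goto again;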

Anton, do you remember why you reset the hint when changing pool ? I
don't see that gaining us anything and it does make lazy flushing more
complex.

At that point you have removed all setters of need_flush, so that logic
can be removed entirely. Or am I just too tired (it's past midnight) and
missing something here?

We only need the if (n < pool->hint) check to do the lazy flush at the
end; that's it.

Cheers,
Ben.



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-03 Thread Sowmini Varadhan


Just want to confirm:

  + again:
  +   if (pass == 0 && handle && *handle &&
  +       (*handle >= pool->start) && (*handle < pool->end))
  +           start = *handle;
  +   else
  +           start = pool->hint;
 
 Now this means handle might be < pool->hint, in that case you also
 need a lazy flush. Or rather only if the resulting alloc is. My
 :

  +   } else {
  +   /* give up */
  +   n = DMA_ERROR_CODE;
  +   goto bail;
  +   }
  +   }
 
 Here, make this something like:
 
   } else if (end < pool->hint)
   need_flush = true;

you mean 

} else if (start < pool->hint)

right?  (so I'm not missing some corner-case that you are thinking
about here)

--Sowmini
 

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-03 Thread Sowmini Varadhan
On (04/04/15 08:06), Benjamin Herrenschmidt wrote:
 
 No, I meant n < pool->hint, ie, the start of the newly allocated
 block.

ah, got it. I'll do my drill with patchset and get back, probably by
Monday.

--Sowmini

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-03 Thread Benjamin Herrenschmidt
On Fri, 2015-04-03 at 14:28 -0400, Sowmini Varadhan wrote:
 
 Just want to confirm:
 
   + again:
   +   if (pass == 0 && handle && *handle &&
   +       (*handle >= pool->start) && (*handle < pool->end))
   +           start = *handle;
   +   else
   +           start = pool->hint;
  
  Now this means handle might be < pool->hint, in that case you also
  need a lazy flush. Or rather only if the resulting alloc is. My
  :
 
   + } else {
   + /* give up */
   + n = DMA_ERROR_CODE;
   + goto bail;
   + }
   + }
  
  Here, make this something like:
  
  } else if (end < pool->hint)
  need_flush = true;
 
 you mean 
 
   } else if (start < pool->hint)
 
 right?  (so I'm not missing some corner-case that you are thinking
 about here)

No, I meant n < pool->hint, ie, the start of the newly allocated
block.

end hasn't been adjusted yet at that point, but we don't want to
compare end anyway (it will be n + npages); we really want n, ie whether
the *beginning* of the newly allocated chunk is before the end of the
previous one (after all, they may even overlap if the previous one has
been freed).
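
Concretely, the tail of the allocation path would then look something like
this (sketch only, reusing the names from the patch):

	end = n + npages;
	/* compare n, the *start* of the new block, against the old hint */
	if (n < pool->hint || need_flush) {
		need_flush = false;
		if (iommu->lazy_flush != NULL)
			iommu->lazy_flush(iommu);
	}
	pool->hint = end;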

Ben.

 --Sowmini
  



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt
On Thu, 2015-04-02 at 17:54 -0400, Sowmini Varadhan wrote:
 the other question that comes to my mind is: the whole lazy_flush
 optimization probably works best when there is exactly one pool,
 and no large pools. In most other cases, we'd end up doing a lazy_flush
 when we wrap within our pool itself, losing the benefit of that 
 optimization. 

Not a big deal. The point of the optimization is to avoid flushing on
every map/unmap. As long as the pool wrapping is an order of magnitude
rarer than actual map/unmap operations, it's a win.
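
To put rough, purely illustrative numbers on it: with a 1024-entry pool and
allocations averaging 8 entries, the hint wraps at most about once every 128
allocations, so even a fairly expensive IOTLB flush is amortized over on the
order of a hundred map/unmap pairs.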

 Given that the lazy_flush is mostly there to avoid regressions for 
 the older sun4u architectures (which have other hardware bottlenecks 
 anyway), and this code is rapidly getting messy, does it make sense
 to constrain the lazy_flush check to only apply for the 1-pool, 
 no-large-pool case?

I was planning to use it to improve things on older G5s as well and I
definitely want pools on these. Leave it in, we just need to get it
right.

I don't think it's that messy: if you refactor the way I suggested, it
basically boils down to conditionally flushing in two places (at the
points where the pool lock is dropped) if the hint for that pool was
moved backward.

Do the test once in the else case of the n == -1 check for all the normal
cases (which includes the hint >= limit case, the multi-pass wrap and
*handle below hint; they are all covered) and once in the pool-hopping case.

Cheers,
Ben.




Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Sowmini Varadhan
On (04/03/15 08:57), Benjamin Herrenschmidt wrote:
 
   I only just noticed too, you completely dropped the code to honor
   the dma mask. Why that ? Some devices rely on this.

/* Sowmini's comment about this coming from sparc origins.. */

 Probably, not that many devices have limits on DMA mask but they do
 exist. It becomes more important if we decide to create a very large
 IOMMU window that spans beyond 4G in order to support devices with
 32-bit DMA masks. Otherwise it's older devices mostly with 32-bit
 masks.
 
 In any case, for a generic piece of code, this should be supported.
 Basically, assume that if we have something in the powerpc code, we need
 it, if you remove it, we won't be able to use your code generically.

I see.

Is the mask something that can be stored in the iommu_map_table as
part of the init?

I can see that the align_order has to be an additional arg to
iommu_tbl_range_alloc; I'm not sure whether mask falls into that category
as well.

--Sowmini


Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt
On Tue, 2015-03-31 at 10:40 -0400, Sowmini Varadhan wrote:

 + if (largealloc) {
 +         pool = &(iommu->large_pool);
 +         spin_lock_irqsave(&pool->lock, flags);
 +         pool_nr = 0; /* to keep compiler happy */
 + } else {
 +         /* pick out pool_nr */
 +         pool_nr = pool_hash & (npools - 1);
 +         pool = &(iommu->pools[pool_nr]);
 +         spin_lock_irqsave(&(pool->lock), flags);
 + }

Move spin_lock_irqsave outside of if/else

 + again:
 +   if (pass == 0 && handle && *handle &&
 +       (*handle >= pool->start) && (*handle < pool->end))
 +           start = *handle;
 +   else
 +           start = pool->hint;

Now this means handle might be < pool->hint; in that case you also
need a lazy flush. Or rather, only if the resulting alloc is. My
suggestion originally was to test that after the area alloc. See further
down.

 + limit = pool->end;
 +
 + /* The case below can happen if we have a small segment appended
 +  * to a large, or when the previous alloc was at the very end of
 +  * the available space. If so, go back to the beginning and flush.
 +  */
 + if (start >= limit) {
 +         start = pool->start;
 +         if (!large_pool && iommu->lazy_flush != NULL)
 +                 iommu->lazy_flush(iommu);

Add need_flush = false;

In fact, looking more closely, I see another problem:

You never flush the large pool. You should either remove all your
large_pool checks, since you treat it as a normal pool in your code, and
let it flush normally, or you should have an unconditional flush somewhere
in there for the large pool (or in free()), either way. The current
implementation you show me, afaik, never calls lazy_flush on a large
allocation.

Also, if you fix the problem I mentioned earlier about *handle < hint,
you also don't need the above flush; it will be handled further down
in the else case for the if (n == -1), see below.

 + }

I only just noticed too, you completely dropped the code to honor
the dma mask. Why that ? Some devices rely on this.

 + if (dev)
 +         boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
 +                               1 << iommu->table_shift);
 + else
 +         boundary_size = ALIGN(1UL << 32, 1 << iommu->table_shift);
 +
 + shift = iommu->table_map_base >> iommu->table_shift;
 + boundary_size = boundary_size >> iommu->table_shift;
 + /*
 +  * if the skip_span_boundary_check had been set during init, we set
 +  * things up so that iommu_is_span_boundary() merely checks if the
 +  * (index + npages) > num_tsb_entries
 +  */
 + if ((iommu->flags & IOMMU_NO_SPAN_BOUND) != 0) {
 +         shift = 0;
 +         boundary_size = iommu->poolsize * iommu->nr_pools;
 + }
 + n = iommu_area_alloc(iommu->map, limit, start, npages, shift,
 +                      boundary_size, 0);

You have completely dropped the alignment support. This will break
drivers. There are cases (especially with consistent allocations) where
the driver has alignment constraints on the address; those must be
preserved.

 + if (n == -1) {
 + if (likely(pass == 0)) {
 + /* First failure, rescan from the beginning.  */
 + pool->hint = pool->start;
 + if (!large_pool && iommu->lazy_flush != NULL)
 + need_flush = true;

Same question about the large pool check... I wouldn't bother with the
function pointer NULL check here either, only test it at the call site.

 + pass++;
 + goto again;
 + } else if (!largealloc && pass <= iommu->nr_pools) {
 + 

Here, you might have moved the hint of the pool (case above), and then
hit this code path, no ? In that case, need_flush might be set, which
means you must flush before you unlock.

   spin_unlock(&(pool->lock));
 +
   pool_nr = (pool_nr + 1) & (iommu->nr_pools - 1);
 + pool = &(iommu->pools[pool_nr]);
 + spin_lock(&(pool->lock));
 + pool->hint = pool->start;
 + if (!large_pool && iommu->lazy_flush != NULL)
 + need_flush = true;

I wouldn't bother with the test. This can't be large_alloc afaik and
the NULL check can be done at the call site.

 + pass++;
 + goto again;
 + } else {
 + /* give up */
 + n = DMA_ERROR_CODE;
 + goto bail;
 + }
 + }

Here, make this something like:

} else if (end < pool->hint)
need_flush = true;

To handle the case where you started your alloc below the hint without
having updated the hint yet, i.e. a *handle < hint or a wrap-around.
This will also handle the start >= limit case.

Basically you need to flush iff you are going to provide an alloc below
the current hint, 

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt
On Thu, 2015-04-02 at 18:15 -0400, Sowmini Varadhan wrote:
 On (04/03/15 08:57), Benjamin Herrenschmidt wrote:
  
I only just noticed too, you completely dropped the code to honor
the dma mask. Why that ? Some devices rely on this.
 
 /* Sowmini's comment about this coming from sparc origins.. */
 
  Probably, not that many devices have limits on DMA mask but they do
  exist. It becomes more important if we decide to create a very large
  IOMMU window that spans beyond 4G in order to support devices with
  32-bit DMA masks. Otherwise it's older devices mostly with 32-bit
  masks.
  
  In any case, for a generic piece of code, this should be supported.
  Basically, assume that if we have something in the powerpc code, we need
  it, if you remove it, we won't be able to use your code generically.
 
 I see.
 
 is the mask something that can be stored in the iommu_map_table as
 part of the init? 
 
No, the mask is per device and has to be retrieved from the device.
Additionally the mask used for dma_map_* can be different from the
consistent mask used for alloc_coherent (we have a bug there on powerpc
which I'm trying to fix btw).

So it should be passed as an argument by the caller.

 I can see that the align_order has to be an additional arg to 
 iommu_tbl_range_alloc, not sure if mask falls in that category
 as well.

It does.
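
So the allocator entry point ends up taking both per call; as a sketch
(argument list illustrative, not necessarily the final form):

	/* sketch of the interface: mask and align_order come from the
	 * caller (device / DMA API layer), not from the map table itself
	 */
	unsigned long iommu_tbl_range_alloc(struct device *dev,
					    struct iommu_map_table *iommu,
					    unsigned long npages,
					    unsigned long *handle,
					    unsigned long mask,
					    unsigned int align_order);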

Cheers,
Ben.



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt
On Fri, 2015-04-03 at 09:01 +1100, Benjamin Herrenschmidt wrote:
 On Fri, 2015-04-03 at 08:57 +1100, Benjamin Herrenschmidt wrote:
 
  No, the last argument to iommu_area_alloc() which is passed from the
  callers when doing consistent allocs. Basically, the DMA api mandates
  that consistent allocs are naturally aligned (to their own size), we
  implement that on powerpc by passing that alignment argument down.
 
 Talking of this ... I notice this is not documented in DMA-API.txt...
 however many drivers make that assumption, and the DMA pool allocator
 in mm/dmapool.c as well.
 
 Maybe somebody should update DMA-API.txt...

Ah it *is* documented in DMA-API-HOWTO.txt :-)

Cheers,
Ben.



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt
On Fri, 2015-04-03 at 08:57 +1100, Benjamin Herrenschmidt wrote:

 No, the last argument to iommu_area_alloc() which is passed from the
 callers when doing consistent allocs. Basically, the DMA api mandates
 that consistent allocs are naturally aligned (to their own size), we
 implement that on powerpc by passing that alignment argument down.

Talking of this ... I notice this is not documented in DMA-API.txt...
however many drivers make that assumption, and the DMA pool allocator
in mm/dmapool.c as well.

Maybe somebody should update DMA-API.txt...

Cheers,
Ben.



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Sowmini Varadhan
On (04/03/15 07:54), Benjamin Herrenschmidt wrote:
  +   limit = pool->end;
  +
  +   /* The case below can happen if we have a small segment appended
  +    * to a large, or when the previous alloc was at the very end of
  +    * the available space. If so, go back to the beginning and flush.
  +    */
  +   if (start >= limit) {
  +           start = pool->start;
  +           if (!large_pool && iommu->lazy_flush != NULL)
  +                   iommu->lazy_flush(iommu);
 
 Add need_flush = false;

A few clarifications, while I parse the rest of your comments:

Not sure I follow: need_flush is initialized to true at the start of the
function?

 I only just noticed too, you completely dropped the code to honor
 the dma mask. Why that ? Some devices rely on this.

So that's an interesting question: the existing iommu_range_alloc() in
arch/sparc/kernel/iommu.c does not use the mask at all. I based most of
the code on this (except for the lock fragmentation part).
I don't know if this is arch-specific.

 
  +   if (dev)
  +           boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
  +                                 1 << iommu->table_shift);
  +   else
  +           boundary_size = ALIGN(1UL << 32, 1 << iommu->table_shift);
  +
  +   shift = iommu->table_map_base >> iommu->table_shift;
  +   boundary_size = boundary_size >> iommu->table_shift;
  +   /*
  +    * if the skip_span_boundary_check had been set during init, we set
  +    * things up so that iommu_is_span_boundary() merely checks if the
  +    * (index + npages) > num_tsb_entries
  +    */
  +   if ((iommu->flags & IOMMU_NO_SPAN_BOUND) != 0) {
  +           shift = 0;
  +           boundary_size = iommu->poolsize * iommu->nr_pools;
  +   }
  +   n = iommu_area_alloc(iommu->map, limit, start, npages, shift,
  +                        boundary_size, 0);
 
 You have completely dropped the alignment support. This will break
 drivers. There are cases (especially with consistent allocations) where

Again, not sure I follow: are you referring to the IOMMU_NO_SPAN_BOUND case?
That's very specific to LDC (the sparc ldoms virtualization infrastructure);
the default is to not have IOMMU_NO_SPAN_BOUND set.
For the rest of the drivers, the code that sets up boundary_size aligns
things the same way the ppc code does.


 the driver have alignment constraints on the address, those must be
 preserved.
 

--Sowmini

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt
On Thu, 2015-04-02 at 17:43 -0400, Sowmini Varadhan wrote:
 On (04/03/15 07:54), Benjamin Herrenschmidt wrote:
   + limit = pool->end;
   +
   + /* The case below can happen if we have a small segment appended
   +  * to a large, or when the previous alloc was at the very end of
   +  * the available space. If so, go back to the beginning and flush.
   +  */
   + if (start >= limit) {
   +         start = pool->start;
   +         if (!large_pool && iommu->lazy_flush != NULL)
   +                 iommu->lazy_flush(iommu);
  
  Add need_flush = false;
 
 A few clarifications, while I parse the rest of your comments:
 
 Not sure I follow- need_flush is initialized to true at the start of the 
 function?

No but you can loop back there via goto again. However if you follow
my other comment and move the flush back to the end, then you don't need
that at all.

  I only just noticed too, you completely dropped the code to honor
  the dma mask. Why that ? Some devices rely on this.
 
 so that's an interesting question: the existing iommu_range_alloc() in
 arch/sparc/kernel/iommu.c does not use the mask at all. I based most of
 the code  on this (except for the lock fragmentation part). 
 I dont know if this is arch specific.

Probably, not that many devices have limits on DMA mask but they do
exist. It becomes more important if we decide to create a very large
IOMMU window that spans beyond 4G in order to support devices with
32-bit DMA masks. Otherwise it's older devices mostly with 32-bit
masks.

In any case, for a generic piece of code, this should be supported.
Basically, assume that if we have something in the powerpc code, we need
it; if you remove it, we won't be able to use your code generically.

There are a few cases we can debate like our block allocation, but
things like the mask or the alignment constraints aren't in that group.

  
   + if (dev)
   +         boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
   +                               1 << iommu->table_shift);
   + else
   +         boundary_size = ALIGN(1UL << 32, 1 << iommu->table_shift);
   +
   + shift = iommu->table_map_base >> iommu->table_shift;
   + boundary_size = boundary_size >> iommu->table_shift;
   + /*
   +  * if the skip_span_boundary_check had been set during init, we set
   +  * things up so that iommu_is_span_boundary() merely checks if the
   +  * (index + npages) > num_tsb_entries
   +  */
   + if ((iommu->flags & IOMMU_NO_SPAN_BOUND) != 0) {
   +         shift = 0;
   +         boundary_size = iommu->poolsize * iommu->nr_pools;
   + }
   + n = iommu_area_alloc(iommu->map, limit, start, npages, shift,
   +                      boundary_size, 0);
  
  You have completely dropped the alignment support. This will break
  drivers. There are cases (especially with consistent allocations) where
 
 Again, not sure I follow? are you referring to the IOMMU_NO_SPAN_BOUND case?

No, the last argument to iommu_area_alloc(), which is passed from the
callers when doing consistent allocs. Basically, the DMA API mandates
that consistent allocs are naturally aligned (to their own size); we
implement that on powerpc by passing that alignment argument down.
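
For illustration only (roughly how the powerpc path does it, assuming the
IOMMU page size equals PAGE_SIZE): the alignment order is derived from the
allocation size and handed to iommu_area_alloc() as a mask in its last
argument:

	/* consistent allocs must be naturally aligned to their own size */
	unsigned int align_order = get_order(size);
	unsigned long align_mask = (1UL << align_order) - 1;

	n = iommu_area_alloc(iommu->map, limit, start, npages, shift,
			     boundary_size, align_mask);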

 That's very specific to LDC (sparc ldoms virtualization infra). The default
 is to not have IOMMU_NO_SPAN_BOUND set. 
 For the rest of the drivers, the code that sets up boundary_size aligns things
 in the same way as the ppc code.
 
 
  the driver have alignment constraints on the address, those must be
  preserved.
  
 
 --Sowmini



Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Sowmini Varadhan
the other question that comes to my mind is: the whole lazy_flush
optimization probably works best when there is exactly one pool,
and no large pools. In most other cases, we'd end up doing a lazy_flush
when we wrap within our pool itself, losing the benefit of that 
optimization. 

Given that the lazy_flush is mostly there to avoid regressions for 
the older sun4u architectures (which have other hardware bottlenecks 
anyway), and this code is rapidly getting messy, does it make sense
to constrain the lazy_flush check to only apply for the 1-pool, 
no-large-pool case?



RE: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-03-31 Thread David Laight
From: Sowmini Varadhan
 Investigation of multithreaded iperf experiments on an ethernet
 interface show the iommu-lock as the hottest lock identified by
 lockstat, with something of the order of  21M contentions out of
 27M acquisitions, and an average wait time of 26 us for the lock.
 This is not efficient. A more scalable design is to follow the ppc
 model, where the iommu_map_table has multiple pools, each stretching
 over a segment of the map, and with a separate lock for each pool.
 This model allows for better parallelization of the iommu map search.

I've wondered whether the iommu setup for ethernet receive (in particular)
could be made much more efficient if there were a function that
would unmap one buffer and map a second buffer?
My thought is that the iommu pte entry used by the old buffer could just
be modified to reference the new one.
In effect each ring entry would end up using a fixed iommu pte.

The other question is how much data can be copied in 26us ?
On iommu systems 'copybreak' limits on receive and transmit
may need to be quite high.

David


Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-03-31 Thread Sowmini Varadhan
On (03/31/15 15:15), David Laight wrote:
 
 I've wondered whether the iommu setup for ethernet receive (in particular)
 could be made much more efficient if there were a function that
 would unmap one buffer and map a second buffer?
 My thought is that iommu pte entry used by the old buffer could just
 be modified to reference the new one.
 In effect each ring entry would end up using a fixed iommu pte.

There are a number of interesting things to investigate in
this space, and the above is just one of them: ways to avoid
the overhead of a full-blown map/unmap on each call. See
  http://www.spinics.net/lists/sparclinux/msg13613.html

But the scope of this patchset is actually very rigidly defined:
to refactor the iommu pool/arena allocator into a common library,
and avoid code duplication (today even the single sparc arch
duplicates it for sun4[u,v] and ldc, and that's not even counting
the duplication across other archs/pci-drivers).

Investigating ways to provide a generalized infrastructure that can
avoid a dma map/unmap for every packet would be a good follow-on.

 The other question is how much data can be copied in 26us ?
 On iommu systems 'copybreak' limits on receive and transmit
 may need to be quite high.

where does the 26us number come from? I may be missing that context?

--Sowmini