David Mosberger wrote:
On Thu, 24 Mar 2005 18:18:17 +1100, Nick Piggin <[EMAIL PROTECTED]> said:


  Nick> After applying the recent freepgt patchset from Hugh (on
  Nick> lkml), the time to fork+exit a process mapping 64GB of address
  Nick> (32MB of page tables) is 0.471s. With the prefetch patch, this
  Nick> drops to 0.357s.


Sorry, above numbers were wrong: 0.118s versus 0.089s. Improvement ratio is the same, I just used the wrong divisor.
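
As a quick sanity check on the claim that the ratio is unchanged, the two pairs of timings can be compared directly. This is only an illustrative sketch: the helper name `speedup` is made up here, and the four timings are the ones quoted in this thread.

```c
/* Ratio of the unpatched time to the patched time (hypothetical helper). */
double speedup(double before_s, double after_s)
{
    return before_s / after_s;
}
```

Both `speedup(0.471, 0.357)` and `speedup(0.118, 0.089)` come out at a little over 1.3, so the two sets of figures describe essentially the same relative improvement.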

Looks like a nice improvement to me.

Does prefetching 1 line ahead give the best results?  That's only
128/8=16 PTEs.  Assuming a 200 cycle latency, this would allow
for only 12.5 cycles/iteration.  Especially for large (NUMA) machines,
prefetching further out might help more.
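
The arithmetic above can be written out as a small back-of-the-envelope model. The 128-byte line, 8-byte PTE and 200-cycle latency are the figures from this message; the function names and the `lines_ahead` parameter are assumptions made for illustration, and the model further assumes the memory system can keep that many line fetches in flight at once.

```c
/* Number of PTEs that fit in one cache line. */
unsigned ptes_per_line(unsigned line_bytes, unsigned pte_bytes)
{
    return line_bytes / pte_bytes;
}

/*
 * Cycles each loop iteration may take while still hiding the full
 * memory latency, when prefetching `lines_ahead` cache lines ahead.
 * Simplified model: the latency is spread evenly over all PTEs
 * covered by the prefetch distance.
 */
double cycle_budget_per_pte(double latency_cycles, unsigned line_bytes,
                            unsigned pte_bytes, unsigned lines_ahead)
{
    return latency_cycles * lines_ahead
           / ptes_per_line(line_bytes, pte_bytes);
}
```

With (200, 128, 8, 1) this gives 16 PTEs per line and a budget of 12.5 cycles per iteration; in this simple model, prefetching 2 lines ahead would double the budget to 25.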


Hmm... yeah it may do. Although I don't think that changes your cycles / iteration ratio, does it? Just allows for a little bit more variation.

I just retested, and prefetching 2 lines ahead gives virtually the same
performance.

But actually, my tests are set up so each pte page has only a single
'present' pte (I did it that way to speed up initial faulting time).
So the loop will almost always get stopped by the pte_none tests. So
perhaps that is able to complete in close to or less than 12 cycles.
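
To make the shape of that loop concrete, here is a rough user-space sketch. It is not the actual patch: the table is modelled as a plain array of 64-bit words, a zero word stands in for pte_none(), the name `count_present` and the `lines_ahead` parameter are invented for illustration, and `__builtin_prefetch` is the GCC/Clang builtin rather than the kernel's own prefetch helper.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES    128u                          /* line size assumed in this thread */
#define PTES_PER_LINE (LINE_BYTES / sizeof(uint64_t))

/* Walk a table of PTE-like words and count the non-empty entries. */
size_t count_present(const uint64_t *ptes, size_t n, size_t lines_ahead)
{
    size_t present = 0;

    for (size_t i = 0; i < n; i++) {
        /* At the start of each line, prefetch a line further ahead. */
        if (i % PTES_PER_LINE == 0) {
            size_t ahead = i + lines_ahead * PTES_PER_LINE;
            if (ahead < n)
                __builtin_prefetch(&ptes[ahead], 0, 0);
        }

        /* Empty entry: the cheap early-out described above. */
        if (ptes[i] == 0)
            continue;

        present++;  /* stand-in for the real per-PTE teardown work */
    }
    return present;
}
```

With only one non-zero entry per table, nearly every iteration takes the early-out branch, which is why the per-iteration cost in this test may be small enough to fit the 12-cycle budget.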

Nick


