David Miller <da...@davemloft.net> writes:

  As an aside I think we can get it down to 2.5 cycles per limb on
  T4 with 4-way unrolling, and 3.0 cycles per limb with 2-way
  unrolling.
  
  The idea is to decrease the bookkeeping instructions by only
  maintaining base pointers which do not change, and then we have an
  offset which operates as the loop index.
  
  So we'd have an 'n_off' instead of 'n', and then in some local
  registers we'd hold:
  
  l3:   up - 8
  l4:   up - 16
  l5:   rp - 8
  l6:   rp - 16
  
A clever trick!  But you will probably get 2.75 c/l for 4-way, not 2.5
c/l.  We'll need infinite unrolling for 2.5...
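
For readers following along, here is a rough C sketch of the fixed-base,
moving-offset idea with 2-way unrolling.  It is only an illustration, not
the T4 assembly under discussion: the function name addmul_1_offset, the
32-bit limb_t, and the choice of an addmul_1-style operation are all
assumptions made so the example is portable C.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: 32-bit limbs, so a limb product fits in
   a portable uint64_t.  Real GMP limbs on T4 are 64 bits wide. */
typedef uint32_t limb_t;

/* Hypothetical helper: rp[0..n-1] += up[0..n-1] * v, returns carry limb. */
limb_t addmul_1_offset(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    size_t done = 0;

    /* Peel one limb when n is odd so the main loop handles pairs. */
    if (n & 1) {
        uint64_t t = (uint64_t)up[0] * v + rp[0];
        rp[0] = (limb_t)t;
        carry = (limb_t)(t >> 32);
        done = 1;
    }

    size_t rem = n - done;          /* even number of limbs left */
    if (rem == 0)
        return carry;

    /* Four base pointers that never change inside the loop, analogous
       to the up - 8, up - 16, rp - 8, rp - 16 registers in the quote
       (here biased so that the final offset is zero). */
    const limb_t *up_lo = up + n - 2;
    const limb_t *up_hi = up + n - 1;
    limb_t *rp_lo = rp + n - 2;
    limb_t *rp_hi = rp + n - 1;

    /* The single moving value: a non-positive offset that doubles as
       the loop counter and stops once it passes zero. */
    ptrdiff_t off = -(ptrdiff_t)(rem - 2);
    do {
        uint64_t t0 = (uint64_t)up_lo[off] * v + rp_lo[off] + carry;
        rp_lo[off] = (limb_t)t0;
        uint64_t t1 = (uint64_t)up_hi[off] * v + rp_hi[off]
                      + (limb_t)(t0 >> 32);
        rp_hi[off] = (limb_t)t1;
        carry = (limb_t)(t1 >> 32);
        off += 2;                   /* the only bookkeeping update */
    } while (off <= 0);

    return carry;
}
```

Only 'off' is updated per iteration, in place of separate increments of up
and rp plus a decrement of n.  As for the cycle counts: if one assumes a
fixed overhead of roughly one cycle per loop iteration on top of 2.5 c/l of
real work (an assumption, though it matches the 3.0 and 2.75 figures
above), k-way unrolling gives about 2.5 + 1/k c/l, which approaches 2.5
only as k grows without bound.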

-- 
Torbjörn
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
