David Edelsohn <dje....@gmail.com> writes:

> Calls impose a lot of overhead on Power.

Thanks, that's good to know.

> And both the efficient loop instruction and the preferred indirect call
> instruction use the CTR register.

That's one thing I've been wondering about after taking a closer look at
the AES loops.

One rather common pattern in GMP and Nettle assembly loops is to use
the same register as both index register and loop counter. A loop that
in C would conventionally be written as

  for (i = 0; i < n; i++)
    dst[i] = f(src[i]);

is written in assembly closer to

  dst += n; src += n; // Base registers point at end of arrays
  n = -n; // Use negative index register
  for (; n != 0; n++)
    dst[n] = f(src[n]);

This saves one register (and eliminates the corresponding update
instructions), and the loop branch is based on the carry flag (or zero
flag) set by the index register update n++. If the items processed by
the loop are larger than a byte, n is also scaled by the item size, and
one does n += size rather than n++; it still works just fine.
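
As a minimal, compilable sketch of the pattern above in plain C (not taken
from GMP or Nettle): f_example is a hypothetical placeholder for f, and
map_negative_index is an illustrative name. The single variable i serves
as both index and loop counter, running from -n up to 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-element operation standing in for f. */
static uint8_t f_example(uint8_t x)
{
  return (uint8_t)(x ^ 0x5a);
}

/* Illustrative only: apply f_example to n bytes, using one variable
   as both the index and the loop counter. */
static void map_negative_index(uint8_t *dst, const uint8_t *src, size_t n)
{
  ptrdiff_t i;

  dst += n; src += n;   /* Base pointers now point at end of arrays */
  for (i = -(ptrdiff_t)n; i != 0; i++)
    dst[i] = f_example(src[i]);   /* Negative index, counts up to zero */
}
```

Whether a compiler keeps this shape in the generated code depends on the
target and optimization level; hand-written assembly uses the flags from
the index update directly.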

Would that pattern work well on Power, or is it always preferable to use
the special counter register, e.g., if it provides better branch
prediction? I'm not so familiar with Power assembly, but from the AES
code it looks like the relevant instructions are mtctr to initialize the
counter, and bdnz to decrement and branch.
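
For comparison, here is a rough C sketch of the loop shape that a
dedicated counter serves: a separate count that is only decremented and
tested, with the pointers advanced independently. The names g_example and
map_countdown are hypothetical, and the comments mapping lines to mtctr
and bdnz describe the intended correspondence only; what a compiler
actually emits is an assumption, not something checked here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-element operation standing in for f. */
static uint8_t g_example(uint8_t x)
{
  return (uint8_t)(x + 1);
}

/* Illustrative only: count-down loop with a separate counter. */
static void map_countdown(uint8_t *dst, const uint8_t *src, size_t n)
{
  size_t count = n;          /* Would be loaded into CTR (mtctr) */

  if (count == 0)
    return;
  do
    *dst++ = g_example(*src++);   /* Pointers need their own updates */
  while (--count != 0);      /* Decrement and branch if nonzero (bdnz) */
}
```

The trade-off relative to the negative-index form is that the counter no
longer doubles as the index, so the pointers (or an index register) must
be updated separately in each iteration.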

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.