David Edelsohn <dje....@gmail.com> writes:

> Calls impose a lot of overhead on Power.

Thanks, that's good to know.

> And both the efficient loop instruction and the preferred indirect call
> instruction use the CTR register.

That's one thing I wonder about after having a closer look at the AES
loops. One rather common pattern in GMP and Nettle assembly loops is to
use the same register as both index register and loop counter. A loop
that in C would conventionally be written as

  for (i = 0; i < n; i++)
    dst[i] = f(src[i]);

is written in assembly closer to

  dst += n; src += n; // Base registers point at end of arrays
  n = -n;             // Use negative index register
  for (; n != 0; n++)
    dst[n] = f(src[n]);

This saves one register (and eliminates the corresponding update
instructions), and the loop branch is based on the carry flag (or zero
flag) from the index register update n++. (If the items processed by the
loop are larger than a byte, n would also be scaled by the size, and one
would do n += size rather than n++, and it still works just fine).

Would that pattern work well on Power, or is it always preferable to use
the special counter register, e.g., if it provides better branch
prediction?

I'm not so familiar with Power assembly, but from the AES code it looks
like the relevant instructions are mtctr to initialize the counter, and
bdnz to decrement and branch.

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
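For reference, the negative-index pattern described above can be written as compilable C roughly as follows. This is only a sketch for comparing the two loop shapes: the function names map_indexed and map_negative_index, the byte-sized element type, and the stand-in f are assumptions made for illustration and are not taken from GMP or Nettle. Whether a compiler keeps this shape in the generated code, or uses the CTR register on Power, is not something the sketch shows.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-element function, standing in for f in the text. */
static uint8_t f(uint8_t x)
{
  return (uint8_t) (x ^ 0x5a);
}

/* Conventional form: separate index i, compared against n each time. */
static void map_indexed(uint8_t *dst, const uint8_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = f(src[i]);
}

/* Negative-index form: the base pointers are moved to the end of the
   arrays, and a single signed index runs from -n up to zero. The loop
   condition is just "index != 0", so the index doubles as the loop
   counter. Assumes n fits in ptrdiff_t. */
static void map_negative_index(uint8_t *dst, const uint8_t *src, size_t n)
{
  dst += n;
  src += n;
  for (ptrdiff_t i = -(ptrdiff_t) n; i != 0; i++)
    dst[i] = f(src[i]);
}
```

Both functions should produce identical output for the same input, including n = 0, where neither loop body runs. For elements larger than a byte, the assembly version would scale the index by the element size, as noted above; in C the pointer arithmetic does that scaling implicitly.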