- compile with the loop unrolled 1x, 2x, 4x, 8x, 16x, 32x and
measure the time the benchmark takes

The optimal unrolling factor may not be a power of two. It can depend on the icache size (which might hold, say, 11 copies of the loop body), on the iteration count (say, 13*n for some unknown n), and on whether the loop performs some action once or twice every N passes, for N not a power of two.

The powers of two would probably hit a lot of the common cases, but you might want to throw in some intermediate values too, if it's too costly to check all practical values.
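
As a rough sketch of what "powers of two plus some intermediate values" could look like, here is one way to build the candidate list and pick a winner. The names candidate_factors, pick_best and measure are made up for illustration, and taking the midpoint between consecutive powers of two is just one assumption about which intermediate values are worth trying. The measure callback stands in for "compile with this unroll factor and time the benchmark", which is not shown here.

```python
def candidate_factors(max_factor=32):
    """Powers of two up to max_factor, plus the midpoint between
    each consecutive pair (3, 6, 12, 24, ...)."""
    factors = set()
    p = 1
    while p <= max_factor:
        factors.add(p)
        mid = p + p // 2  # for p = 1 this is just 1 again
        if mid <= max_factor:
            factors.add(mid)
        p *= 2
    return sorted(factors)


def pick_best(factors, measure):
    """Return the factor with the smallest measured time.

    measure is a hypothetical callback: given an unroll factor, it
    would build the benchmark with that factor and return its run time.
    """
    return min(factors, key=measure)


if __name__ == "__main__":
    print(candidate_factors(32))
```

This keeps the number of builds small while still probing between the powers of two; if a midpoint wins, that is a hint to search its neighbours more densely.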

Ken
