Hello Gareth,
I found some free time to look at this, and it appears that static foreach is slower than hand-unrolling a loop. Fortunately, there is no detectable difference between hand-unrolling a loop and building the loop body in a compile-time string and using a mixin. Results are here:
Last time I checked, static foreach tended to load the index onto the stack for each iteration even if it was never used. Did you take a look at the ASM?
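For anyone following along, here is a minimal sketch of the three variants being compared. It is not the benchmark code from the post: the function names, the `unrolledBody` helper, and the four-element sum are all made up for illustration, and the "static foreach" variant is assumed to mean a foreach over a compile-time tuple (written here with `AliasSeq`).

```d
import std.conv : to;
import std.meta : AliasSeq;
import std.stdio : writeln;

// Hypothetical helper: builds "sum += a[0];\nsum += a[1];\n..." as a string.
// It is an ordinary function, evaluated at compile time when used in a mixin.
string unrolledBody(size_t n)
{
    string code;
    foreach (i; 0 .. n)
        code ~= "sum += a[" ~ i.to!string ~ "];\n";
    return code;
}

// Variant 1: unrolled by hand.
int sumHand(const int[4] a)
{
    int sum = 0;
    sum += a[0];
    sum += a[1];
    sum += a[2];
    sum += a[3];
    return sum;
}

// Variant 2: foreach over a compile-time tuple; the compiler expands
// the body once per element, with i as a compile-time constant.
int sumTupleForeach(const int[4] a)
{
    int sum = 0;
    foreach (i; AliasSeq!(0, 1, 2, 3))
        sum += a[i];
    return sum;
}

// Variant 3: loop body generated as a string and mixed in.
int sumMixin(const int[4] a)
{
    int sum = 0;
    mixin(unrolledBody(4));
    return sum;
}

void main()
{
    int[4] data = [1, 2, 3, 4];
    writeln(sumHand(data), " ", sumTupleForeach(data), " ", sumMixin(data));
}
```

All three should compute the same result, so any timing difference would come from the generated code. Comparing the disassembly of `sumHand` and `sumTupleForeach` is one way to see whether the unused index is being stored on the stack for each iteration.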
-- ... <IXOYE><