Modern compiler optimizations are a sight to behold. When I extract the bitrev32() function from sys/dev/fdt/if_dwge.c and compile it on its own on aarch64, clang with optimization recognizes the purpose and reduces the arithmetic to a single "rbit" instruction. Amazing.
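
For readers without the tree at hand, here is a minimal sketch of the kind of mask-and-shift bit reversal in question. It is an assumption about the general shape of such a function, not a verbatim copy of the code in if_dwge.c; the names bitrev32_sketch() and bitrev32_selfcheck() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a bitrev32()-style helper: reverse the
 * bit order of a 32-bit word by swapping progressively larger groups.
 */
static inline uint32_t
bitrev32_sketch(uint32_t x)
{
	/* swap adjacent bits */
	x = ((x & 0xaaaaaaaaU) >> 1) | ((x & 0x55555555U) << 1);
	/* swap adjacent 2-bit pairs */
	x = ((x & 0xccccccccU) >> 2) | ((x & 0x33333333U) << 2);
	/* swap adjacent nibbles */
	x = ((x & 0xf0f0f0f0U) >> 4) | ((x & 0x0f0f0f0fU) << 4);
	/* swap adjacent bytes */
	x = ((x & 0xff00ff00U) >> 8) | ((x & 0x00ff00ffU) << 8);
	/* swap the two 16-bit halves */
	return (x >> 16) | (x << 16);
}

/* Small sanity check: lowest bit becomes highest, and vice versa. */
static void
bitrev32_selfcheck(void)
{
	assert(bitrev32_sketch(0x00000001U) == 0x80000000U);
	assert(bitrev32_sketch(0x80000000U) == 0x00000001U);
	assert(bitrev32_sketch(0x0000000fU) == 0xf0000000U);
}
```

Putting something like this in a file of its own and compiling it with optimization enabled (for example "clang -O2 -S") on aarch64 should allow the generated assembly to be inspected for "rbit". Whether the substitution actually happens may depend on the clang version and on the surrounding code.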

Somewhat less (or more?) amazingly, this appears to be a late-stage optimization that is sabotaged by earlier steps. When compiling the full if_dwge.c, clang seems to inline the function and then strip it down to the fragments needed to reverse the few bits actually used.

Over in the sys/dev/rasops code, the same bit reversal is performed by the MBE() macro. Here clang recognizes that the code is called inside some loops and extracts the invariant parts: it loads the eight constants into registers up front and leaves only the and/or/shift operations inside the loop.

So those clever optimizations of the arithmetic prevent the even more clever substitution with "rbit".

It's just an observation I thought I'd share. Let's file it under "the compiler moves in a mysterious way".

-- 
Christian "naddy" Weisgerber
