Walter Bright wrote:
> ... To generate the code directly, assuming the existence of SSE,
> is to mean the code will only run on modern chips. Whether or not this
> is a problem depends on your application.

If MMX/SSE/SSE2 optimizations are low-hanging fruit, I'd at least like to have an -sse switch (and maybe -sse2, -sse3, and -no-sse) to control whether the compiler emits those instructions.

I'm also wondering whether a better approach (perhaps as an additional option to those above) would be to borrow the best of JIT compilation and emit multiple code paths. The program could have a bootstrap phase at startup where it calls cpuid, finds out what is available, rewrites the main binary to use the optimal paths, and then executes it. That way feature detection doesn't happen while the program proper is running, so it doesn't slow down the computations as they happen. Passing -sse* would then make the compiler skip the bootstrap and just assume the instructions are available.
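
To make the bootstrap idea a bit more concrete, here is a rough sketch in C. It is not the binary-rewriting scheme described above: it swaps in a simpler stand-in, a function pointer that is set once at startup, so the hot loop never re-checks the CPU. The names add_scalar, add_sse, add_floats, and bootstrap are made up for illustration, and the sketch assumes GCC or Clang on x86, where __builtin_cpu_supports wraps the cpuid query. On any other compiler or architecture it just keeps the scalar path.

```c
#include <stddef.h>

/* Hypothetical example: element-wise float addition with two code paths. */
typedef void (*add_fn)(float *dst, const float *a, const float *b, size_t n);

/* Baseline path: runs on any chip. */
static void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_SSE_PATH 1
#include <xmmintrin.h>

/* SSE path: four floats per iteration. The target attribute lets the
   compiler emit SSE here even if the rest of the file is built without it. */
__attribute__((target("sse")))
static void add_sse(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned loads */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)                     /* leftover elements */
        dst[i] = a[i] + b[i];
}
#endif

/* The "main binary" calls through this pointer; it starts out safe. */
static add_fn add_floats = add_scalar;

/* Bootstrap phase: query the CPU once, pick the best path, and never
   check again while the real computation runs. */
static void bootstrap(void)
{
#ifdef HAVE_SSE_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse"))
        add_floats = add_sse;
#endif
}
```

The program would call bootstrap() once at the top of main and then use add_floats(dst, a, b, n) everywhere. The cost compared with true rewriting is one indirect call per invocation. An -sse style switch would amount to dropping bootstrap() and calling add_sse directly.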