On Mar 29, 2010, at 16:30, Tim Prince wrote:
> gcc used to have the ability to replace division by a power of 2 by an fscale
> instruction, for appropriate targets (maybe still does).

The problem (again) is that floating point multiplication is just too damn fast. On x86, even though the latency may be 5 cycles, since the multiplier is fully pipelined, the throughput is one multiplication per clock cycle, and that's for non-vectorized code!
For comparison, the fscale instruction breaks down to 30 µops or something like that, compared to a single µop for most forms of floating point multiplication. Given that Jeroen also needs to do floating-point additions, just bouncing the values between integer and float registers will be more expensive than the entire multiplication is in the first place.

> Such targets have nearly disappeared from everyday usage. What remains is
> the possibility of replacing the division by constant power of 2 by
> multiplication, but it's generally considered the programmer should have done
> that in the beginning.

No, this is something the compiler does and should do. It is well understood that for binary floating point, division by a power of two is identical to multiplication by its reciprocal, and it's the compiler's job to select the fastest instruction.

  -Geert
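
A minimal sketch of the last point, assuming IEEE 754 binary doubles and a C99 compiler. The function names (scale_div, scale_mul, same_bits, forms_agree), the divisor 8.0 and the sample values are made up for illustration and do not come from the thread. It checks that dividing by 2^3 and multiplying by the exactly representable reciprocal 2^-3 give bit-identical results, including for subnormal inputs and signed zero:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Division by a power of two, as the programmer might write it. */
static double scale_div(double x) { return x / 8.0; }

/* Multiplication by the reciprocal; 0.125 == 2^-3 is exact in binary. */
static double scale_mul(double x) { return x * 0.125; }

/* Compare bit patterns, so that 0.0 and -0.0 count as different. */
static bool same_bits(double a, double b) {
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Return true if both forms agree bit for bit on every sample. */
static bool forms_agree(void) {
    const double samples[] = {
        0.0, -0.0, 1.0, 3.0, 0.1, -7.25,
        1e308,       /* close to the largest finite double */
        1e-308,      /* subnormal */
        0x1p-1074    /* smallest positive subnormal; result rounds to zero */
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (!same_bits(scale_div(samples[i]), scale_mul(samples[i])))
            return false;
    }
    return true;
}
```

Calling forms_agree() from a test harness should return true on such a target. Comparing the assembly for scale_div and scale_mul (for example with gcc -O2 -S) is one way to see whether the compiler already picks the multiply for both; the exact output depends on the target and compiler version.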