On Mar 29, 2010, at 16:30, Tim Prince wrote:
> gcc used to have the ability to replace division by a power of 2 by an fscale 
> instruction, for appropriate targets (maybe still does).
The problem (again) is that floating point multiplication is 
just too damn fast. On x86, even though the latency may 
be 5 cycles, since the multiplier is fully pipelined, the 
throughput is one multiplication per clock cycle, and that's
for non-vectorized code!

For comparison, the fscale instruction breaks down into roughly
30 µops, compared to a single µop for most forms of floating
point multiplication. Given that Jeroen also needs to do
floating-point additions, just bouncing the values between
integer and float registers would cost more than the entire
multiplication does in the first place.

> Such targets have nearly disappeared from everyday usage.  What remains is 
> the possibility of replacing the division by constant power of 2 by 
> multiplication, but it's generally considered the programmer should have done 
> that in the beginning.

No, this is something the compiler does and should do.
It is well understood that in binary floating point, division
by a power of two is identical to multiplication by its reciprocal
(as long as that reciprocal is itself representable), and it's
the compiler's job to select the fastest instruction.
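
As a minimal sketch of that equivalence (the function names and the
sample values below are made up for illustration, and IEEE 754 binary
doubles are assumed), the two forms can be compared directly:

```c
#include <assert.h>
#include <stddef.h>

/* Division by a constant power of two, as written in the source. */
double div_by_8(double x) { return x / 8.0; }

/* The same operation as a multiplication: 0.125 is exactly 2^-3. */
double mul_by_eighth(double x) { return x * 0.125; }

/* Counts the hypothetical sample inputs for which the two forms
   give different results; on IEEE 754 binary hardware this should
   be zero, since both round the same exact value. */
int count_mismatches(void)
{
    static const double samples[] = {
        0.0, 1.0, -3.0, 0.1, 123456.789, 1e308, 1e-308, 5e-324
    };
    int mismatches = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (div_by_8(samples[i]) != mul_by_eighth(samples[i]))
            mismatches++;
    }
    return mismatches;
}

/* Small self-check tying the pieces together. */
void self_check(void)
{
    assert(count_mismatches() == 0);
}
```

The same would not hold for a divisor like 10.0, whose reciprocal is
not exactly representable in binary, which is why a compiler can only
do this substitution unconditionally for powers of two.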

  -Geert
