On 2/11/22 19:25, XenoAmess wrote:
On Fri, 11 Feb 2022 18:24:49 GMT, Andrew Haley <a...@openjdk.org> wrote:
Just multiply by 0.75.
On a modern design, floating-point multiply is 4 clocks latency, 4 ops/clock
throughput. FP max is 2 clocks latency, conversions int-float and vice versa 3
clocks latency, 4 ops/clock throughput. Long division is 7-9 clocks, 2 ops/clock
throughput. Shift and add 2 clocks, 2/3 ops/clock throughput. Compare is 1
clock, 3 ops/clock throughput, conditional move is 1 clock, 3 ops/clock
throughput.
Seems like it's a wash.
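
To make the "wash" concrete, here is a rough tally of the latencies quoted above. The grouping of operations into a floating-point path and an integer path is an assumption, since the message does not spell it out, and the class name LatencyTally is purely illustrative. It adds latencies along a dependent chain and ignores throughput, so treat it as back-of-the-envelope arithmetic rather than a measurement.

```java
public final class LatencyTally {
    // Latencies in clocks, copied from the figures quoted in the message.
    static final int FP_MULTIPLY = 4;
    static final int FP_MAX = 2;
    static final int INT_FLOAT_CONVERSION = 3; // either direction
    static final int LONG_DIVISION_MIN = 7;
    static final int LONG_DIVISION_MAX = 9;
    static final int SHIFT_AND_ADD = 2;
    static final int COMPARE = 1;
    static final int CONDITIONAL_MOVE = 1;

    // Assumed floating-point path: int->float, multiply, max, float->int.
    static int floatPath() {
        return INT_FLOAT_CONVERSION + FP_MULTIPLY + FP_MAX + INT_FLOAT_CONVERSION;
    }

    // Assumed integer path: shift-and-add, long division, compare, conditional move.
    static int integerPath(int divisionLatency) {
        return SHIFT_AND_ADD + divisionLatency + COMPARE + CONDITIONAL_MOVE;
    }

    public static void main(String[] args) {
        System.out.println("floating-point path: " + floatPath() + " clocks");
        System.out.println("integer path: " + integerPath(LONG_DIVISION_MIN)
                + "-" + integerPath(LONG_DIVISION_MAX) + " clocks");
    }
}
```

Under those assumptions the floating-point chain comes to 12 clocks and the integer chain to 11-13 clocks, which is consistent with calling it a wash.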
@theRealAph
No multiply, but divide.
Well yes, but that doesn't look at all hard to change.
Besides, did you count the cost of Math.ceil? It is the heaviest part.
Yes. 3 clocks latency, 4 ops/clock throughput. Your hardware may vary.
And that instruction does both the ceil() and the float-int conversion.
(Having said that, I don't know if we currently generate optimal code
for this operation. But of course that can be fixed.)
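
For readers without the patch at hand, the two styles under discussion probably look something like the sketch below. This is an assumption: the thread does not quote the code, and the class and method names (CapacitySketch, viaFloat, viaLong) are hypothetical. It only checks that the two forms agree over a modest range of sizes for a load factor of 0.75. It says nothing about which is faster or what code the JIT actually emits.

```java
public final class CapacitySketch {
    // Floating-point form: divide by the load factor 0.75, round up,
    // then convert back to int.
    static int viaFloat(int expectedSize) {
        return (int) Math.ceil(expectedSize / 0.75);
    }

    // Integer-only form: ceil(4n/3) computed as floor((4n + 2) / 3),
    // using long arithmetic so that 4n cannot overflow an int.
    static int viaLong(int expectedSize) {
        return (int) ((expectedSize * 4L + 2L) / 3L);
    }

    public static void main(String[] args) {
        // Compare the two forms over a range of non-negative sizes.
        for (int n = 0; n <= 1_000_000; n++) {
            if (viaFloat(n) != viaLong(n)) {
                throw new AssertionError("mismatch at n = " + n);
            }
        }
        System.out.println("both forms agree for 0..1000000");
    }
}
```

The two forms behave differently near Integer.MAX_VALUE (the double-to-int cast saturates, the long-to-int cast wraps), so the comparison is deliberately restricted to a range where both results fit in an int.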
I don't think this is terribly important, but I don't like to see
attempts at hand optimization in the standard library.