On 2/11/22 19:25, XenoAmess wrote:
On Fri, 11 Feb 2022 18:24:49 GMT, Andrew Haley <a...@openjdk.org> wrote:

Just multiply by 0.75.

On a modern design, floating-point multiply is 4 clocks latency, 4 ops/clock
throughput. FP max is 2 clocks latency; int-to-float and float-to-int
conversions are 3 clocks latency, 4 ops/clock throughput. Long division is 7-9
clocks, 2 ops/clock throughput. Shift and add are 2 clocks, 2/3 ops/clock
throughput. Compare is 1 clock, 3 ops/clock throughput; conditional move is 1
clock, 3 ops/clock throughput.

Seems like it's a wash.
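The thread does not show the code being costed. Assuming it computes ceil(n / 0.75) for a non-negative count n (for example, a table size derived from an expected element count with a 0.75 load factor), a minimal Java sketch of a floating-point formulation next to an integer-only one might look like this. The class and method names are hypothetical.

```java
// Hypothetical sketch: two ways to compute ceil(n / 0.75) for n >= 0.
public final class CapacitySketch {

    // Floating-point path: one FP divide, then ceil, then a double-to-long conversion.
    static long viaFloatingPoint(int n) {
        return (long) Math.ceil(n / 0.75);
    }

    // Integer-only path: ceil(n / 0.75) == ceil(4n / 3) == (4n + 2) / 3 for n >= 0.
    // Done in long so that 4n cannot overflow; not valid for negative n.
    static long viaIntegers(int n) {
        return (4L * n + 2) / 3;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 10, 12, Integer.MAX_VALUE}) {
            System.out.println(n + " -> " + viaFloatingPoint(n) + " / " + viaIntegers(n));
        }
    }
}
```

Both methods return long so that results above Integer.MAX_VALUE are not clipped; which one is cheaper depends on the instruction costs listed above and on the code the JIT actually emits.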

@theRealAph

It's not a multiply but a divide.

Well yes, but that doesn't look at all hard to change.
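Turning the divide into a multiply could look like the sketch below, again assuming the value being computed is ceil(n / 0.75) for 0 <= n <= Integer.MAX_VALUE. The class, method, and constant names are hypothetical. Because the reciprocal 4.0 / 3.0 is not exactly representable in binary, the comments explain why the rounded-up result still matches the divide.

```java
// Hypothetical sketch: replacing an FP divide by a multiply by the reciprocal.
public final class ReciprocalSketch {

    // 4.0 / 3.0 is a compile-time constant, so no division happens at run time.
    private static final double INVERSE_LOAD_FACTOR = 4.0 / 3.0;

    // Original form: divide by the 0.75 load factor, then round up.
    static long byDivide(int n) {
        return (long) Math.ceil(n / 0.75);
    }

    // Rewritten form: multiply by the rounded reciprocal, then round up.
    // For 0 <= n <= Integer.MAX_VALUE the rounding error is far below 1/3, and
    // 4.0 / 3.0 is slightly below 4/3, so when n / 0.75 is an exact integer the
    // product cannot round above it. The ceiling therefore matches byDivide.
    static long byMultiply(int n) {
        return (long) Math.ceil(n * INVERSE_LOAD_FACTOR);
    }
}
```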

Besides, did you count the cost of Math.ceil? It is the heaviest part.

Yes. 3 clocks latency, 4 ops/clock throughput. Your hardware may vary.
And that instruction does both the ceil() and the float-int conversion.
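The combined operation here is "round toward positive infinity, then narrow to int". A minimal Java sketch of it (the class and method names are hypothetical) shows the edge cases that a single round-up-and-convert instruction has to reproduce to match Java semantics: the double-to-int cast maps NaN to 0 and saturates out-of-range values. On AArch64, for instance, FCVTPS rounds toward +infinity and converts with the same saturating behaviour. As noted below, whether HotSpot emits such an instruction for this expression is a separate question.

```java
// Hypothetical sketch of ceil() fused with the float-to-int conversion.
public final class CeilToIntSketch {

    // Round toward positive infinity, then narrow to int. Java's (int) cast maps
    // NaN to 0 and saturates out-of-range values to Integer.MIN_VALUE/MAX_VALUE.
    static int ceilToInt(double x) {
        return (int) Math.ceil(x);
    }

    public static void main(String[] args) {
        System.out.println(ceilToInt(2.1));        // 3
        System.out.println(ceilToInt(-2.1));       // -2
        System.out.println(ceilToInt(Double.NaN)); // 0
        System.out.println(ceilToInt(1e20));       // 2147483647
    }
}
```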

(Having said that, I don't know if we currently generate optimal code
for this operation. But of course that can be fixed.)

I don't think this is terribly important, but I don't like to see
attempts at hand optimization in the standard library.
