Hi, I currently make heavy use of a utility class in the XRender Java2D backend that performs saturated casts:
1.) return (short) (x > Short.MAX_VALUE ? Short.MAX_VALUE : (x < Short.MIN_VALUE ? Short.MIN_VALUE : x));
2.) return (short) (x > 65535 ? 65535 : (x < 0) ? 0 : x);

I spent quite some time benchmarking and tuning the protocol-generation methods, and a lot of cycles are spent in those saturated casts, even though the utility methods are static. E.g. XRenderFillRectangle takes 40 cycles without clamping, but already 70 cycles with it, on my Core 2 Duo with hotspot-server/jdk 14.0.

HotSpot always seems to implement the clamping with conditional jumps, although well-predictable ones. Modern processors seem to have support for this kind of operation; on x86 there's packssdw in MMX/SSE2.

I think something like a saturated cast could be quite useful. There are already cast methods in Long/Integer/Short - what do you think about adding saturated casts to that API? Those could be intrinsified to use MMX/SSE2 if available.

If that would be too specific, how hard would it be to add this kind of optimization to HotSpot? How far does SIMD support in HotSpot go (I read some time ago that there had been some optimizations)? If SIMD were supported, 4 casts could be done in a single cycle :)

Thanks, Clemens
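
P.S. For anyone who wants to try this, here is a minimal self-contained sketch of the two casts above. The class and method names (SaturatedCasts, clampToShort, clampToUShort) are placeholders made up for illustration; they are not the names used in the backend, and an int argument is assumed.

```java
// Hypothetical helper class; names are illustrative only.
final class SaturatedCasts {
    private SaturatedCasts() {}

    // Saturates x to the signed 16-bit range [Short.MIN_VALUE, Short.MAX_VALUE].
    static short clampToShort(int x) {
        return (short) (x > Short.MAX_VALUE ? Short.MAX_VALUE
                : (x < Short.MIN_VALUE ? Short.MIN_VALUE : x));
    }

    // Saturates x to the unsigned 16-bit range [0, 65535].
    // The result is stored in a Java short, so values above 32767 read back
    // as negative; mask with 0xFFFF to recover the unsigned value.
    static short clampToUShort(int x) {
        return (short) (x > 65535 ? 65535 : (x < 0 ? 0 : x));
    }
}
```

Both versions compile to compare-and-select code in plain Java, which is exactly the pattern I would hope an intrinsic could replace.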