The number of plausible variants is astonishing!

---
Your use of -client and -server is outdated, which explains why you get the same results for both (-client is ignored). I'm not sure what's blessed by the HotSpot team, but for C1 I use -XX:+TieredCompilation -XX:TieredStopAtLevel=1 and for C2 I use -XX:-TieredCompilation -server

---

Now I understand the advantage of using ~i & (i - 1): it is 0 for every odd number, so the subsequent zero check is a short-circuit for all odd numbers. That is better than i & -i, and it explains your results: they depend on being able to short-circuit. So just use a more faithful inlining of nlz without trying to improve on it.

    static int ntz_inlineNlz5(int i) {
        // Mask of the trailing zero bits: 0 for odd i, -1 for i == 0.
        i = ~i & (i - 1);
        if (i <= 0) return (i == 0) ? 0 : 32;
        int n = 1;
        if (i >= 1 << 16) { n += 16; i >>>= 16; }
        if (i >= 1 <<  8) { n +=  8; i >>>=  8; }
        if (i >= 1 <<  4) { n +=  4; i >>>=  4; }
        if (i >= 1 <<  2) { n +=  2; i >>>=  2; }
        return n + (i >>> 1);
    }

But it's hard to resist the urge to optimize out a branch:

    static int ntz_inlineNlz6(int i) {
        i = ~i & (i - 1);
        // Here i is either 0 or -1, so i & 32 yields 0 or 32 without a branch.
        if (i <= 0) return i & 32;
        int n = 1;
        if (i >= 1 << 16) { n += 16; i >>>= 16; }
        if (i >= 1 <<  8) { n +=  8; i >>>=  8; }
        if (i >= 1 <<  4) { n +=  4; i >>>=  4; }
        if (i >= 1 <<  2) { n +=  2; i >>>=  2; }
        return n + (i >>> 1);
    }
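For anyone who wants to convince themselves of the short-circuit argument, here is a small sanity check, offered as a sketch rather than a benchmark. The class name NtzCheck and the helpers trailingZeroMask and lowestSetBit are made up for illustration; ntz has the same body as ntz_inlineNlz6, and the reference result comes from the standard Integer.numberOfTrailingZeros.

```java
public class NtzCheck {
    // Hypothetical helper: mask of the trailing zero bits of i.
    // Odd i gives 0, i == 0 gives -1, so one "i <= 0" test catches both.
    static int trailingZeroMask(int i) {
        return ~i & (i - 1);
    }

    // Hypothetical helper: lowest set bit of i.
    // Odd i gives 1 (not 0), so a zero check does not short-circuit odd inputs.
    static int lowestSetBit(int i) {
        return i & -i;
    }

    // Same body as ntz_inlineNlz6 from the comment.
    static int ntz(int i) {
        i = ~i & (i - 1);
        if (i <= 0) return i & 32;
        int n = 1;
        if (i >= 1 << 16) { n += 16; i >>>= 16; }
        if (i >= 1 <<  8) { n +=  8; i >>>=  8; }
        if (i >= 1 <<  4) { n +=  4; i >>>=  4; }
        if (i >= 1 <<  2) { n +=  2; i >>>=  2; }
        return n + (i >>> 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 3, 8, 12, -1, 1 << 16, 0x00F00000,
                         Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            int expected = Integer.numberOfTrailingZeros(x);
            if (ntz(x) != expected) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("all samples agree");
    }
}
```

A handful of samples is of course not a proof; looping over all 2^32 int values against Integer.numberOfTrailingZeros is feasible if you want an exhaustive check.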