Re: Why a Java method invocation is slower when you call it somewhere else in your code?

2017-04-09 Thread Kirk Pepperdine
Hi Gil,

Interesting, I’ve run stuff in the past where simply adding a return prevented 
the code from being JIT’ed away. What I’ve also noted is that any time you 
stray from int the expected optimization will not be applied. Something like

  public int sum(int a, int b, int n) {
    int temp = 0;
    for (int i = 0; i < n; i++) {
      temp += a + b; // loop body reconstructed; the archive truncated the original at the '<'
    }
    return temp;
  }

> On Apr 9, 2017, at 5:49 PM, Gil Tene wrote:

Re: Why a Java method invocation is slower when you call it somewhere else in your code?

2017-04-09 Thread Gil Tene


On Saturday, April 8, 2017 at 9:40:46 AM UTC-7, Kirk Pepperdine wrote:
>
>
> >>> 
> >>> - Your mySleep won't actually do what you think it does. The entire 
> >>> method can be optimized away to nothing after inlining at the call site by 
> >>> the JIT once the calls to it actually warm up enough, since it has no side 
> >>> effects and nothing is done with its return value. 
> >> 
> >> Well, this won’t happen in OpenJDK because of the return value. 
> > 
> > The return value "saves" you only as long as the method doesn't get 
> > inlined. After it is inlined, the fact that the return value isn't used 
> > allows the JIT to kill the entire code… 
>
> You’d think but not in my experience. 
>

Stock OpenJDK currently inlines and completely eliminates:

  public static int wasteSomeTime(int t) {
    int x = 0;
    for (int i = 0; i < t * 1; i++) {
      x += (t ^ x) % 93;
    }
    return x;
  }

When called like this:

  wasteSomeTime(sleepArg);


So return values demonstrably don't prevent the optimization...
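
To see why the JIT is allowed to do this, note that wasteSomeTime is a pure 
function of its argument: no side effects, and the result depends only on t. 
A quick plain-Java check (my own sketch, no JMH; the class name is mine, not 
from the thread):

```java
// wasteSomeTime has no observable side effects and a deterministic result,
// which is exactly what lets the JIT discard the whole inlined body when
// the caller ignores the return value.
public class PurityDemo {
    public static int wasteSomeTime(int t) {
        int x = 0;
        for (int i = 0; i < t * 1; i++) {
            x += (t ^ x) % 93;
        }
        return x;
    }

    public static void main(String[] args) {
        // Same input, same output, every time; no state changes anywhere.
        int first = wasteSomeTime(3);
        int second = wasteSomeTime(3);
        System.out.println(first == second);
    }
}
```

Once the method is inlined and the result goes unused, nothing in the caller 
can observe whether the loop ever ran, so eliminating it is a legal 
transformation.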


The optimization will not happen if inlining the method at the call site is prevented. 


I built a small set of jmh benchmarks to demonstrate this. They result in this:


Benchmark                                             (benchLoopCount)  (sleepArg)   Mode  Cnt           Score           Error  Units
MethodInliningExampleBench.noRetValIntLoop                          10           1  thrpt    5  2830940580.903 ±  52900090.474  ops/s
MethodInliningExampleBench.noRetValIntLoopNoInlining                10           1  thrpt    5        5500.356 ±       245.758  ops/s
MethodInliningExampleBench.retValIntLoop                            10           1  thrpt    5  2877030926.237 ± 134788500.109  ops/s
MethodInliningExampleBench.retValIntLoopNoInlining                  10           1  thrpt    5           0.219 ±         0.007  ops/s



Which demonstrates that when inlining is **prevented** at the caller there 
is a real difference between having a return value and not (the loop in the 
method gets optimized away only if there is no return value), but when 
inlining is not prevented at the caller and the return value is not used, 
both cases get optimized away the same way. 

And since it is "hard" to reliably disallow inlining (without e.g. using 
Aleksey's cool @CompilerControl(CompilerControl.Mode.DONT_INLINE) 
annotation in jmh), inlining can bite you and wreck your assumptions at 
any time...

Interestingly, as you can see from the same jmh tests above, while stock 
OpenJDK will optimize away the above code, it *currently* won't optimize 
away this code:

  public static long mySleepL1(long t) {
long x = 0;
for(int i = 0; i < t * 1; i++) {
  x += (t ^ x) % 93;
}
return x;
  }

Which differs only in using longs instead of ints.
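
Since the two variants compute the same numbers for the same input, any 
throughput gap between them comes purely from what the JIT can prove and 
eliminate, not from the arithmetic. A quick side-by-side check (my own 
sketch, not from the thread):

```java
// The int and long variants from the post, verified to agree for small
// inputs (no overflow is possible here, since each addend is < 93 and the
// iteration count is tiny). Only the element type differs.
public class IntVsLongDemo {
    public static int wasteSomeTime(int t) {
        int x = 0;
        for (int i = 0; i < t * 1; i++) {
            x += (t ^ x) % 93;
        }
        return x;
    }

    public static long mySleepL1(long t) {
        long x = 0;
        for (int i = 0; i < t * 1; i++) {
            x += (t ^ x) % 93;
        }
        return x;
    }

    public static void main(String[] args) {
        for (int t = 0; t < 100; t++) {
            if (wasteSomeTime(t) != mySleepL1(t)) {
                throw new AssertionError("mismatch at t=" + t);
            }
        }
        System.out.println("int and long variants agree for t in [0, 100)");
    }
}
```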

The results for the longs tests are:

Benchmark                                             (benchLoopCount)  (sleepArg)   Mode  Cnt           Score           Error  Units
MethodInliningExampleBench.noRetValLongLoop                         10           1  thrpt    5  2924098828.778 ± 234409260.906  ops/s
MethodInliningExampleBench.noRetValLongLoopNoInlining               10           1  thrpt    5           0.243 ±         0.013  ops/s
MethodInliningExampleBench.retValLongLoop                           10           1  thrpt    5           0.254 ±         0.014  ops/s
MethodInliningExampleBench.retValLongLoopNoInlining                 10           1  thrpt    5           0.246 ±         0.012  ops/s



So using longs seems to defeat some of the *current* OpenJDK 
optimizations. But how much would you want to bet on that staying the same 
in the next release? 

Similarly, *current* stock OpenJDK won't recognize that System.nanoTime() 
and System.currentTimeMillis() have no side effects, so the original 
example method:
 
  public static long mySleep(long t) {
    long x = 0;
    for (int i = 0; i < t * 1; i++) {
      x += System.currentTimeMillis() / System.nanoTime();
    }
    return x;
  }

will not optimize away at the call site on *current* OpenJDK builds. But 
this can change at any moment as new optimizations and metadata about 
intrinsics are added in coming versions or with better optimizing JITs.
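
Outside JMH, where Blackhole.consume() exists for exactly this purpose, one 
common way to keep a computation observably live is to write its result to a 
volatile field. A minimal sketch of that trick (the class and field names 
are mine, not from the thread):

```java
// A volatile write is an observable side effect, so the JIT cannot prove
// the computation dead even after the method is fully inlined at the call
// site. This is the same role Blackhole.consume() plays in JMH benchmarks.
public class SinkDemo {
    static volatile long sink; // observable side channel for results

    public static long mySleep(long t) {
        long x = 0;
        for (int i = 0; i < t * 1; i++) {
            x += System.currentTimeMillis() / System.nanoTime();
        }
        return x;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        sink = mySleep(1000); // result is live: the JIT must keep the work
        long elapsedNanos = System.nanoTime() - start;
        System.out.println("elapsed ns: " + elapsedNanos);
    }
}
```

This keeps a comparison honest even if a future JIT learns that nanoTime() 
and currentTimeMillis() are side-effect-free.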

In all these cases, dead code *might* be removed. And whether or not it 
does can depend on the length of the run, the data you use, the call site, 
the phase of the moon 🌙, or the version of the JDK or JIT that happens to 
run your code. Any form of comparison (between call sites, versions, etc.) 
with such dead code involved is flaky, and will often lead to "surprising" 
conclusions. Sometimes those surprising conclusions happen right away. 
Sometimes they happen a year later, when you test again using your 
previously established, tried-and-tested, based-on-experience tests that no 
long