On 14.05.2015 2:06, Vitaly Davidovich wrote:
Why not look at the generated asm instead of guessing? :) The
branch-avoiding versions may introduce data-dependence hazards, whereas the
branchy one just has branches and, assuming they are perfectly predicted
(as they typically are in microbenchmarks), can pipeline straight through.
Ivan, could you please post the asm here, assuming you guys are interested
in investigating this further?
Sure, here they are:
void substring_1(int, int, char[]);
  Code:
     0: iload_1
     1: iflt          15
     4: iload_2
     5: aload_3
     6: arraylength
     7: if_icmpgt     15
    10: iload_1
    11: iload_2
    12: if_icmple     23
    15: new           #4    // class java/lang/Error
    18: dup
    19: invokespecial #5    // Method java/lang/Error."<init>":()V
    22: athrow
    23: return
void substring_2(int, int, char[]);
  Code:
     0: iload_1
     1: aload_3
     2: arraylength
     3: iload_2
     4: isub
     5: ior
     6: iload_2
     7: iload_1
     8: isub
     9: ior
    10: ifge          21
    13: new           #4    // class java/lang/Error
    16: dup
    17: invokespecial #5    // Method java/lang/Error."<init>":()V
    20: athrow
    21: return
void substring_3(int, int, char[]);
  Code:
     0: iload_1
     1: aload_3
     2: arraylength
     3: iload_2
     4: isub
     5: ior
     6: iflt          18
     9: iload_2
    10: iload_1
    11: isub
    12: dup
    13: istore        4
    15: ifge          26
    18: new           #4    // class java/lang/Error
    21: dup
    22: invokespecial #5    // Method java/lang/Error."<init>":()V
    25: athrow
    26: return
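
For reference, the Java source these three variants correspond to looks
roughly like the sketch below; it is a reconstruction from the bytecode, and
the parameter names beginIndex, endIndex and value are illustrative:

    void substring_1(int beginIndex, int endIndex, char[] value) {
        // Three separate, short-circuiting branchy checks.
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex)
            throw new Error();
    }

    void substring_2(int beginIndex, int endIndex, char[] value) {
        // All three checks folded into a single sign test via |.
        if ((beginIndex | (value.length - endIndex) | (endIndex - beginIndex)) < 0)
            throw new Error();
    }

    void substring_3(int beginIndex, int endIndex, char[] value) {
        // First two checks folded via |; the length check kept separate,
        // with the computed length stored in a local (the istore 4 above).
        int length;
        if ((beginIndex | (value.length - endIndex)) < 0
                || (length = endIndex - beginIndex) < 0)
            throw new Error();
    }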
Sincerely yours,
Ivan
sent from my phone
On May 13, 2015 6:51 PM, "Martin Buchholz" <marti...@google.com> wrote:

On Wed, May 13, 2015 at 2:25 PM, Ivan Gerasimov <ivan.gerasi...@oracle.com>
wrote:
>
> Benchmark                  Mode  Cnt           Score          Error  Units
> MyBenchmark.testMethod_1  thrpt   60  1132911599.680 ± 42375177.640  ops/s
> MyBenchmark.testMethod_2  thrpt   60   813737659.576 ± 14226427.823  ops/s
> MyBenchmark.testMethod_3  thrpt   60   810406621.145 ± 12316864.045  ops/s
>
> The plain old ||-combined check was faster in this round.
> Some other tests showed different results.
> The speed seems to depend on the scope of the checked variables and on
> the complexity of the expressions being evaluated.
> However, I still don't have a clear understanding of all the aspects
> we need to pay attention to when doing such optimizations.
>
I'm not sure, but the only thing that could explain such a huge performance
gap is that HotSpot was able to determine at JIT time that some of the
comparisons did not need to be performed at all. If true, is this cheating
or not? (You could retry with -Xint.) One of the ideas is to separate hot
and cold code (HotSpot does not yet split code inside a single method) so
that HotSpot is more likely to inline, and therefore more likely to
optimize, and optimizing beginIndex < 0 away entirely is much easier than
optimizing away my more complex expression. So yeah, I could be persuaded
that keeping beginIndex < 0 as an independent expression makes it more
likely to be eliminated. Micro-optimizing is hard, but for the very core of
the platform it is important (more so than readability).
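
To make the hot/cold separation concrete, here is a minimal sketch of that
idea (checkBounds and throwError are made-up names for illustration, not
from any actual change):

    // Hot path: a check simple enough for the JIT to inline, and for it to
    // eliminate individual comparisons it can prove are always false.
    static void checkBounds(int beginIndex, int endIndex, char[] value) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex)
            throwError(beginIndex, endIndex, value.length);
    }

    // Cold path: allocation and message formatting moved out of line, so the
    // hot method stays small and inlining-friendly.
    private static void throwError(int beginIndex, int endIndex, int length) {
        throw new Error("beginIndex=" + beginIndex + ", endIndex=" + endIndex
                + ", length=" + length);
    }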
One of these days I have to learn how to write a JMH benchmark.
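
For what it's worth, a minimal JMH benchmark for this kind of check could
look roughly like the sketch below; the benchmark names match the results
quoted above, but the bodies are a reconstruction from the bytecode, not the
actual benchmark that produced those numbers:

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    public class MyBenchmark {

        // Non-final fields, so the JIT cannot constant-fold the checks away.
        private char[] value = new char[16];
        private int beginIndex = 1;
        private int endIndex = 15;

        @Benchmark
        public void testMethod_1() {
            // Variant 1: plain ||-combined branchy checks.
            if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex)
                throw new Error();
        }

        @Benchmark
        public void testMethod_2() {
            // Variant 2: all three checks folded into one sign test via |.
            if ((beginIndex | (value.length - endIndex) | (endIndex - beginIndex)) < 0)
                throw new Error();
        }
    }

Built with the standard JMH Maven archetype, this can then be run with
java -jar target/benchmarks.jar.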