On 14.05.2015 2:06, Vitaly Davidovich wrote:

Why not look at the generated asm and not guess? :) The branch-avoiding versions may cause data dependence hazards, whereas the branchy one just has branches, which, assuming they are perfectly predicted (as they typically are in microbenchmarks), can pipeline through. Ivan, could you please post the asm here? Assuming you guys are interested in investigating this further.

Sure, here they are:

  void substring_1(int, int, char[]);
    Code:
       0: iload_1
       1: iflt          15
       4: iload_2
       5: aload_3
       6: arraylength
       7: if_icmpgt     15
      10: iload_1
      11: iload_2
      12: if_icmple     23
      15: new           #4                  // class java/lang/Error
      18: dup
      19: invokespecial #5                  // Method java/lang/Error."<init>":()V
      22: athrow
      23: return

  void substring_2(int, int, char[]);
    Code:
       0: iload_1
       1: aload_3
       2: arraylength
       3: iload_2
       4: isub
       5: ior
       6: iload_2
       7: iload_1
       8: isub
       9: ior
      10: ifge          21
      13: new           #4                  // class java/lang/Error
      16: dup
      17: invokespecial #5                  // Method java/lang/Error."<init>":()V
      20: athrow
      21: return

  void substring_3(int, int, char[]);
    Code:
       0: iload_1
       1: aload_3
       2: arraylength
       3: iload_2
       4: isub
       5: ior
       6: iflt          18
       9: iload_2
      10: iload_1
      11: isub
      12: dup
      13: istore        4
      15: ifge          26
      18: new           #4                  // class java/lang/Error
      21: dup
      22: invokespecial #5                  // Method java/lang/Error."<init>":()V
      25: athrow
      26: return
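
For readers who prefer source to bytecode, the three listings can be read back into Java roughly as follows. This is only a sketch reconstructed from the javap output above: the class name SubstringChecks, the parameter names, the local name subLen and the rejects helper are assumptions for illustration, not the original benchmark source.

```java
// Hypothetical reconstruction of the three checks from the bytecode listings.
// Class, parameter and helper names are assumed, not taken from the benchmark.
final class SubstringChecks {

    // Variant 1: three separate comparisons joined with || (three branches).
    static void substring1(int beginIndex, int endIndex, char[] value) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new Error();
        }
    }

    // Variant 2: OR the three quantities together and test the sign once.
    static void substring2(int beginIndex, int endIndex, char[] value) {
        if ((beginIndex | (value.length - endIndex) | (endIndex - beginIndex)) < 0) {
            throw new Error();
        }
    }

    // Variant 3: one combined sign test, then a separate test that also
    // stores the length in a local (the istore 4 in the listing).
    static void substring3(int beginIndex, int endIndex, char[] value) {
        int subLen;
        if ((beginIndex | (value.length - endIndex)) < 0
                || (subLen = endIndex - beginIndex) < 0) {
            throw new Error();
        }
    }

    // Helper (assumed): reports whether the chosen variant throws.
    static boolean rejects(int variant, int beginIndex, int endIndex, char[] value) {
        try {
            switch (variant) {
                case 1: substring1(beginIndex, endIndex, value); break;
                case 2: substring2(beginIndex, endIndex, value); break;
                default: substring3(beginIndex, endIndex, value); break;
            }
            return false;
        } catch (Error e) {
            return true;
        }
    }
}
```

The three variants agree on ordinary arguments, e.g. rejects(v, 2, 5, new char[10]) is false and rejects(v, 6, 5, new char[10]) is true for v = 1, 2, 3. Extreme values where the subtractions overflow are deliberately left out of that comparison.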

Sincerely yours,
Ivan

sent from my phone

On May 13, 2015 6:51 PM, "Martin Buchholz" <marti...@google.com> wrote:

    On Wed, May 13, 2015 at 2:25 PM, Ivan Gerasimov
    <ivan.gerasi...@oracle.com> wrote:

    >
    > Benchmark                  Mode  Cnt           Score          Error  Units
    > MyBenchmark.testMethod_1  thrpt   60  1132911599.680 ± 42375177.640  ops/s
    > MyBenchmark.testMethod_2  thrpt   60   813737659.576 ± 14226427.823  ops/s
    > MyBenchmark.testMethod_3  thrpt   60   810406621.145 ± 12316864.045  ops/s
    >
    > The plain old ||-combined check was faster in this round.
    > Some other tests showed different results.
    > The speed seems to depend on the scope of the checked variables and
    > complexity of the expressions to calculate.
    > However, I still don't have a clear understanding of all the aspects
    > we need to pay attention to when doing such optimizations.
    >

    I'm not sure, but the only thing that could explain such a huge
    performance gap is that hotspot was able to determine at jit time that
    some of the comparisons did not need to be performed at all.  If true,
    is this cheating or not?  (you could retry with -Xint)  One of the ideas
    is to separate hot and cold code (hotspot does not yet split code inside
    a single method) so that hotspot is more likely to inline, so that
    hotspot is more likely to optimize, and optimizing beginIndex < 0 away
    entirely is much easier than my more complex expression.  So yeah, I
    could be persuaded to keep beginIndex < 0 as an independent expression
    that is likely to be eliminated.  Micro-optimizing is hard, but for the
    very core of the platform, it is important (more than readability).

    One of these days I have to learn how to write a jmh benchmark.

