Hi all,

I've gathered statistics for the DaCapo jython bench (the worst DaCapo
bench from a performance point of view) and identified several places
for optimization. For every hot spot a small testcase was created; you
can find them below, together with the estimated speedup for each case.
I believe that the optimizations below could significantly improve the
current "horrible" situation for jython (7570 on DRLVM vs 2916 on Sun 1.6).

Throwing/catching exceptions (HARMONY-4549 was created to track the issue)
Expected boost: 700 ms = ~5-7% of the overall jython bench
Description: Raising/catching exceptions is very slow in comparison
with Sun. The TryRaiseExcept sub-bench of the jython bench throws and
catches thousands of exceptions and, as you can see from the numbers
below, it runs more than 3 times slower on DRLVM. AFAIU, since there
are some operations on the exception object in the catch block, the VM
unwinds the stack every time an exception is caught.
Small testcase:
public class TestExceptions {

   public static void main(String[] args) {
       //warmup VM first
       tryRaiseExceptions(1);
       long start = System.currentTimeMillis();
       tryRaiseExceptions(1000000);
       long res = System.currentTimeMillis() -start;
       System.out.println("completed in "+res+" msec");
   }

   public static void tryRaiseExceptions(int n) {
       for(int i=0; i<n; i++)
           try{
               throw new TException();
           }catch(TException throwable){
               TException ts = Test2.test(throwable);
           }
   }
}


public class Test2  {
  public static TException test(TException thr) {
      return thr;
  }
}

public class TException  extends RuntimeException {
}
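
To separate the cost of stack trace capture from the cost of the throw/catch itself, the testcase can be run against two exception types. This is only a diagnostic sketch: LightException and the class name ExceptionCostSketch are hypothetical, and whether skipping fillInStackTrace() changes anything on DRLVM is an assumption to be measured, not a known result.

```java
public class ExceptionCostSketch {

    // Same shape as TException in the testcase above: captures a stack trace.
    static class FullException extends RuntimeException {
    }

    // Hypothetical variant that skips stack trace capture on construction.
    static class LightException extends RuntimeException {
        @Override
        public synchronized Throwable fillInStackTrace() {
            return this;
        }
    }

    // Throws and catches n exceptions that carry a stack trace.
    static int throwFull(int n) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new FullException();
            } catch (FullException e) {
                caught++;
            }
        }
        return caught;
    }

    // Throws and catches n exceptions that carry no stack trace.
    static int throwLight(int n) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new LightException();
            } catch (LightException e) {
                caught++;
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        // warm up both paths before timing
        throwFull(10000);
        throwLight(10000);

        long t0 = System.currentTimeMillis();
        throwFull(1000000);
        long t1 = System.currentTimeMillis();
        throwLight(1000000);
        long t2 = System.currentTimeMillis();

        System.out.println("with stack trace:    " + (t1 - t0) + " msec");
        System.out.println("without stack trace: " + (t2 - t1) + " msec");
    }
}
```

If both loops take about the same time, the overhead is more likely in the throw/catch path than in stack trace construction.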



System.identityHashCode re-implementation using magics (HARMONY-4551)
Expected boost: 1000 ms = ~10% overall
Description: The System.identityHashCode() method is frequently used in
the jython bench (more than 22,000,000 invocations). The reason for so
many invocations is the use of IdentityHashMap for storing ThreadLocal
objects. I assume the method could be implemented through magics; a
small experiment with the following (incorrect) implementation shows a
huge speedup on the small testcase (from 1609 msec for the unpatched
version to 409 msec for the patched one):

return ObjectReference.fromObject(object).toAddress().toInt();

Small testcase:
public class test {
   public static void main(String[] args) {
       runTest(1000, new Object());
       long start = System.currentTimeMillis();
       runTest(10000000, new Object());
       long end = System.currentTimeMillis() - start;
       System.out.println("completed in "+end);
   }

   public static void runTest(int num, Object obj) {
       for(int i=0; i<num; i++) {
           System.identityHashCode(new Object());
       }
   }
}
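
As a reminder of why IdentityHashMap leans on the identity hash: it compares keys by reference, so it cannot use the key's own hashCode()/equals(). The sketch below only demonstrates that behaviour with standard java.util classes; the class and method names are made up for illustration and it does not model how jython or the class library stores ThreadLocal values.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityMapSketch {

    // Inserts two keys that are equal() but are distinct objects,
    // and reports how many entries the map ends up with.
    static int entriesAfterTwoPuts(Map<String, Integer> map) {
        map.put(new String("key"), 1);
        map.put(new String("key"), 2);
        return map.size();
    }

    public static void main(String[] args) {
        // HashMap uses equals()/hashCode(): the second put overwrites the first.
        System.out.println("HashMap entries: "
                + entriesAfterTwoPuts(new HashMap<String, Integer>()));

        // IdentityHashMap uses reference equality: both keys are kept.
        System.out.println("IdentityHashMap entries: "
                + entriesAfterTwoPuts(new IdentityHashMap<String, Integer>()));

        // The identity hash of a given object must stay the same for its lifetime.
        Object o = new Object();
        System.out.println("stable identity hash: "
                + (System.identityHashCode(o) == System.identityHashCode(o)));
    }
}
```

The last check is the property any magics-based implementation has to preserve: the value must not change for a live object, which is one reason a raw address is not sufficient when the GC can move objects.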


Instanceof modification (HARMONY-4552)
Expected boost: 700 ms = ~5-7%
Description: instanceof is used in many places in DaCapo, but the
hottest places are the arithmetic operations, in particular
CompareFloats, CompareIntegers, SimpleFloatArithmetic, etc. The typical
code for those benches is the following:
PyInteger add(PyObject obj)
  if (obj instanceof PyInteger)
    int v = ((PyInteger)obj).value

It means that we have thousands of instanceof checks for the same
type, i.e. PyInteger instanceof PyInteger. The small testcase
illustrates the problem. I should mention that the test runs very fast
on Sun 1.6 in server mode: 15 msec, while in client mode it completes in
2600 msec. On the Harmony VM in server mode the test completes in 2700 msec.

Small testcase:
public class Test {
   public static void main(String[] args) {
       runTest(1000, new String());
       long start = System.currentTimeMillis();
       runTest(1000000000, new String());
       long end = System.currentTimeMillis() - start;
       System.out.println("completed in "+end);
   }

   public static void runTest(int num, String obj) {
       for(int i=0; i<num; i++) {
           if(obj instanceof String){}
       }
   }
}
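
The testcase above declares the parameter as String, so the check can in principle be decided from the static type (only a null check remains). A variant closer to the jython pattern keeps the static type as a base class. The sketch below uses hypothetical stand-ins for PyObject and PyInteger; it is not jython's actual code, only the same instanceof-then-cast shape.

```java
public class InstanceofSketch {

    // Hypothetical stand-ins for the jython classes named above.
    static class PyObject {
    }

    static class PyInteger extends PyObject {
        final int value;

        PyInteger(int value) {
            this.value = value;
        }

        // instanceof check followed by a cast, as in the pattern described above.
        PyObject add(PyObject other) {
            if (other instanceof PyInteger) {
                return new PyInteger(value + ((PyInteger) other).value);
            }
            return null; // the sketch does not handle other operand types
        }
    }

    // Adds 1 to an accumulator n times through the PyObject-typed API.
    static int sum(int n) {
        PyObject acc = new PyInteger(0);
        PyObject one = new PyInteger(1);
        for (int i = 0; i < n; i++) {
            acc = ((PyInteger) acc).add(one);
        }
        return ((PyInteger) acc).value;
    }

    public static void main(String[] args) {
        sum(100000); // warm up
        long start = System.currentTimeMillis();
        int result = sum(100000000);
        long end = System.currentTimeMillis() - start;
        System.out.println("sum=" + result + " completed in " + end + " msec");
    }
}
```

Printing the result keeps the loop from being removed as dead code, which makes the numbers easier to compare between VMs.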

String.compareTo and equals method optimizations (HARMONY-4553)
Expected boost: 700 ms = ~5-7%
Description: The compareTo and equals methods are used in the
CompareStrings and CompareInternedStrings sub-benches and in several
places inside jython. The test below shows that DRLVM is significantly
slower on these operations.

Small testcase:
public class CompareToTest{
   public static void main(String[] args){
       String st1 = new String("0 1 2 3 4 5 6 7 8 9");
       String st2 = new String("0 1 2 3 4 5 6 7 8 9");
       //warmup VM
       stringCompareTo(st1, st2, 100000);
       long start = System.currentTimeMillis();
       stringCompareTo(st1, st2, 20000000);
       long end = System.currentTimeMillis() -start;
       System.out.println("String compareTo for equal strings completed in "+end +" msec");
       st1 = new String("0 1 2 3 4 5 6 7 8 9abc");
       //warmup VM
       stringCompareTo(st1, st2, 100000);
       long start1 = System.currentTimeMillis();
       stringCompareTo(st1, st2, 20000000);
       long end1 = System.currentTimeMillis() -start1;
       System.out.println("String compareTo for non-equal strings completed in "+end1 +" msec");

       System.out.println("Total in "+(end1+end) +" msec");

   }

   public static void stringCompareTo(String st1, String st2, int num){
       for(int x=0; x<num; x++) {
           st1.compareTo(st2);
       }

   }
}
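
The testcase above exercises only compareTo. A matching sketch for equals, in the same style, could look like the following; the class name EqualsTest and the hit counter are additions for illustration, and no timings for it are claimed here.

```java
public class EqualsTest {

    // Calls equals num times and counts the matches, so the call has a visible result.
    static int stringEquals(String st1, String st2, int num) {
        int hits = 0;
        for (int x = 0; x < num; x++) {
            if (st1.equals(st2)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // new String(...) forces distinct objects, so equals cannot
        // succeed on the reference comparison alone
        String st1 = new String("0 1 2 3 4 5 6 7 8 9");
        String st2 = new String("0 1 2 3 4 5 6 7 8 9");

        // warmup VM
        stringEquals(st1, st2, 100000);
        long start = System.currentTimeMillis();
        int hits = stringEquals(st1, st2, 20000000);
        long end = System.currentTimeMillis() - start;
        System.out.println("String equals for equal strings completed in "
                + end + " msec (" + hits + " matches)");

        st1 = new String("0 1 2 3 4 5 6 7 8 9abc");
        // warmup VM
        stringEquals(st1, st2, 100000);
        long start1 = System.currentTimeMillis();
        int hits1 = stringEquals(st1, st2, 20000000);
        long end1 = System.currentTimeMillis() - start1;
        System.out.println("String equals for non-equal strings completed in "
                + end1 + " msec (" + hits1 + " matches)");
    }
}
```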


Thread.currentThread() method optimization (HARMONY-4555)
Expected boost: ~5%
Description: Thread.currentThread() is also one of the hot methods for
the jython bench. The method is invoked more than 7.5 million times
during jython execution. Despite the fact that the method has already
been optimized several times, it is still slower than on the RI.
I experimented with a magics implementation several weeks ago and got a
good speedup for the small test and for the jython bench. Since the
threading system is being redesigned at the moment, I think it would be
great to add the currentThread() optimization to the plan.

Testcase:
public class CurrentThreadTest {
   public static void main(String[] args) {
       long st = System.currentTimeMillis();
       for(int i=0; i< 100000000; i++) {
           Thread.currentThread();
       }
       long res = System.currentTimeMillis()-st;
       System.out.println("res="+res);
   }
}


Could the JIT, GC and Thread gurus please have a look at the mentioned issues?


Thanks.
Vladimir.

Sub-bench statistics in milliseconds (H vs JDK = HARMONY time / JDK time):

                Sub-bench HARMONY JDK H vs JDK

     BuiltinFunctionCalls 63 78 0,807692
      BuiltinMethodLookup 265 203 1,305419
            CompareFloats 110 31 3,548387
    CompareFloatsIntegers 94 63 1,492063
          CompareIntegers 156 31 5,032258
   CompareInternedStrings 187 31 6,032258
             CompareLongs 94 32 2,9375
           CompareStrings 125 31 4,032258
           CompareUnicode 94 31 3,032258
            ConcatStrings 797 656 1,214939
            ConcatUnicode 562 188 2,989362
          CreateInstances 203 62 3,274194
       CreateNewInstances 344 204 1,686275
  CreateStringsWithConcat 344 156 2,205128
  CreateUnicodeWithConcat 141 78 1,807692
             DictCreation 156 78 2
        DictWithFloatKeys 328 141 2,326241
      DictWithIntegerKeys 157 78 2,012821
       DictWithStringKeys 62 62 1
                 ForLoops 78 94 0,829787
               IfThenElse 172 234 0,735043
              ListSlicing 63 32 1,96875
           NestedForLoops 109 109 1
     NormalClassAttribute 156 78 2
  NormalInstanceAttribute 125 63 1,984127
      PythonFunctionCalls 188 78 2,410256
        PythonMethodCalls 250 109 2,293578
                Recursion 250 94 2,659574
             SecondImport 141 109 1,293578
      SecondPackageImport 156 141 1,106383
    SecondSubmoduleImport 234 187 1,251337
  SimpleComplexArithmetic 110 16 6,875
   SimpleDictManipulation 156 94 1,659574
    SimpleFloatArithmetic 109 62 1,758065
 SimpleIntFloatArithmetic 78 16 4,875
  SimpleIntegerArithmetic 63 31 2,032258
   SimpleListManipulation 62 31 2
     SimpleLongArithmetic 157 188 0,835106
               SmallLists 343 141 2,432624
              SmallTuples 250 125 2
    SpecialClassAttribute 141 93 1,516129
 SpecialInstanceAttribute 125 63 1,984127
           StringMappings 328 125 2,624
         StringPredicates 219 109 2,009174
            StringSlicing 140 78 1,794872
                TryExcept 16 0
           TryRaiseExcept 1641 500 3,282
             TupleSlicing 172 94 1,829787
          UnicodeMappings 156 110 1,418182
        UnicodePredicates 219 78 2,807692
           UnicodeSlicing 140 62 2,258065


                    Total 10829 5578
