Vladimir,
this is a REALLY AWESOME analysis that you performed!!
We should definitely pick up all these items and optimise them,
and I am sure we will!
Many thanks!
On the exceptions: I wonder why lazyexc does not apply here. Maybe
this is a recompilation problem? Vladimir, did you try running
tryRaiseExceptions(...) several times in a loop? Does it help DRLVM's
performance?
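
To make the question concrete, here is a minimal sketch of the loop I
have in mind. It is not the quoted testcase itself: the class name
RepeatedRaise, the helpers touch() and timeRound(), and the round count
are made up for illustration, and TException is nested only to keep
everything in one file. If later rounds get noticeably faster, that
would hint at a recompilation effect.

```java
// Sketch: repeat the measurement so later rounds run after any recompilation.
class RepeatedRaise {

    // Same shape as TException in the quoted testcase.
    static class TException extends RuntimeException {
    }

    // Stand-in for Test2.test(): uses the exception object in the catch block.
    static TException touch(TException e) {
        return e;
    }

    // Throws and catches n exceptions; returns how many were caught.
    static int tryRaiseExceptions(int n) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new TException();
            } catch (TException e) {
                if (touch(e) != null) {
                    caught++;
                }
            }
        }
        return caught;
    }

    // Times one round of n throw/catch pairs, in milliseconds.
    static long timeRound(int n) {
        long start = System.currentTimeMillis();
        tryRaiseExceptions(n);
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        int rounds = 5;   // arbitrary choice
        int n = 200000;   // raise to 1000000 to match the quoted testcase
        for (int r = 0; r < rounds; r++) {
            System.out.println("round " + r + ": " + timeRound(n) + " msec");
        }
    }
}
```

Comparing the first round against the last one on DRLVM would be the
interesting number.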
On the 0x31F day of Apache Harmony Vladimir Strigun wrote:
> Hi all,
>
> I've gathered statistics for the DaCapo jython bench (the worst DaCapo
> bench from a performance point of view) and identified several places
> for optimization. For every hot spot a small testcase was created; you
> can find them below, along with the estimated speedup for each case. I
> believe that the optimizations below could significantly improve the
> current "horrible" situation for jython (7570 on DRLVM vs 2916 on Sun
> 1.6).
>
> Throwing/catching exception (HARMONY-4549 was created to track the issue)
> Expected boost: 700 ms = ~5-7% of the overall jython bench
> Description: Raising/catching exceptions is very slow in comparison
> with Sun. The TryRaiseExcept sub-bench of the jython bench throws and
> catches thousands of exceptions and, as you can see from the numbers
> below, it runs more than 3 times slower on DRLVM. AFAIU, since the
> catch block performs some operations on the exception object, the VM
> unwinds the stack every time an exception is caught.
> Small testcase:
> public class TestExceptions {
>
>     public static void main(String[] args) {
>         // warm up the VM first
>         tryRaiseExceptions(1);
>         long start = System.currentTimeMillis();
>         tryRaiseExceptions(1000000);
>         long res = System.currentTimeMillis() - start;
>         System.out.println("completed in " + res + " msec");
>     }
>
>     public static void tryRaiseExceptions(int n) {
>         for (int i = 0; i < n; i++) {
>             try {
>                 throw new TException();
>             } catch (TException throwable) {
>                 TException ts = Test2.test(throwable);
>             }
>         }
>     }
> }
>
>
> public class Test2 {
>     public static TException test(TException thr) {
>         return thr;
>     }
> }
>
> public class TException extends RuntimeException {
> }
>
>
>
> System.identityHashCode re-implementation using magics (HARMONY-4551)
> Expected boost: 1000 ms = ~10% overall
> Description: The System.identityHashCode() method is used frequently
> in the jython bench (more than 22000000 invocations). The reason for so
> many invocations is the use of IdentityHashMap for storing ThreadLocal
> objects. I assume the method could be implemented through magics; a
> small experiment with the following (incorrect) implementation shows a
> huge speedup on the small testcase (from 1609 msec for the un-patched
> version to 409 msec for the patched one):
>
> return ObjectReference.fromObject(object).toAddress().toInt();
>
> Small testcase:
> public class test {
>     public static void main(String[] args) {
>         // warm up the VM first
>         runTest(1000, new Object());
>         long start = System.currentTimeMillis();
>         runTest(10000000, new Object());
>         long end = System.currentTimeMillis() - start;
>         System.out.println("completed in " + end);
>     }
>
>     public static void runTest(int num, Object obj) {
>         // obj is unused; a fresh object is hashed on every iteration
>         for (int i = 0; i < num; i++) {
>             System.identityHashCode(new Object());
>         }
>     }
> }
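
One thing about this testcase: runTest() allocates a new Object on
every iteration, so the number mixes allocation cost with hashing cost.
A rough sketch of how the two could be separated is below. The class
and method names (IdentityHashSketch, hashSameObject, lookups) are
hypothetical, and the IdentityHashMap part only mirrors the usage
pattern described in the quoted text; it is not taken from jython.

```java
import java.util.IdentityHashMap;

// Sketch: measure identityHashCode without allocating inside the loop.
class IdentityHashSketch {

    // Hashes the same object n times and sums the results so the calls
    // have a visible effect.
    static int hashSameObject(int n, Object obj) {
        int acc = 0;
        for (int i = 0; i < n; i++) {
            acc += System.identityHashCode(obj);
        }
        return acc;
    }

    // Looks up one key n times in an IdentityHashMap, the usage pattern
    // the quoted description blames for the many invocations.
    static int lookups(int n) {
        IdentityHashMap<Object, Integer> map = new IdentityHashMap<Object, Integer>();
        Object key = new Object();
        map.put(key, Integer.valueOf(1));
        int found = 0;
        for (int i = 0; i < n; i++) {
            if (map.get(key) != null) {
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Object obj = new Object();
        hashSameObject(1000, obj); // warm up the VM first
        long start = System.currentTimeMillis();
        int acc = hashSameObject(10000000, obj);
        long hashMs = System.currentTimeMillis() - start;

        lookups(1000); // warm up the VM first
        start = System.currentTimeMillis();
        int found = lookups(10000000);
        long mapMs = System.currentTimeMillis() - start;

        System.out.println("hash only: " + hashMs + " msec (acc=" + acc + ")");
        System.out.println("map lookups: " + mapMs + " msec (found=" + found + ")");
    }
}
```

If the "hash only" time is still far from the RI, the magics
implementation is the right target; if not, allocation may be a bigger
part of the 1609 msec than it looks.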
>
>
> Instanceof modification (HARMONY-4552)
> Expected boost: 700 ms = ~5-7%
> Description: instanceof is used in many places in DaCapo, but the
> hottest places are arithmetic operations, in particular CompareFloats,
> CompareIntegers, SimpleFloatArithmetic, etc. The typical code for
> those benches is the following:
>     PyInteger add(PyObject obj)
>         if (obj instanceof PyInteger)
>             int v = ((PyInteger) obj).value
>
> It means that we have thousands of instanceof checks for the same
> object, i.e. PyInteger instanceof PyInteger. The small testcase
> illustrates the problem. I should mention that the test runs very fast
> on Sun 1.6 server: 15 msec, while in client mode it completes in 2600
> msec. On the Harmony VM in server mode the test completes in 2700 msec.
>
> Small testcase:
> public class Test {
>     public static void main(String[] args) {
>         // warm up the VM first
>         runTest(1000, new String());
>         long start = System.currentTimeMillis();
>         runTest(1000000000, new String());
>         long end = System.currentTimeMillis() - start;
>         System.out.println("completed in " + end);
>     }
>
>     public static void runTest(int num, String obj) {
>         for (int i = 0; i < num; i++) {
>             if (obj instanceof String) {}
>         }
>     }
> }
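
A caveat on the 15 msec number: the loop body is empty and obj is
already typed String, so my guess (not verified) is that the Sun server
compiler drops most of the work rather than doing the check very fast.
A sketch of a variant that keeps the result live is below. The names
InstanceofSketch and countMatches are hypothetical, and the iteration
count is reduced from the quoted testcase. A JIT may still simplify
this loop, so it is only a somewhat fairer comparison, not a proof.

```java
// Variant of the quoted instanceof testcase that uses the check result.
class InstanceofSketch {

    // Counts how many of the num checks succeed. The parameter is typed
    // Object, so the static type alone does not decide the check.
    static int countMatches(int num, Object obj) {
        int matches = 0;
        for (int i = 0; i < num; i++) {
            if (obj instanceof String) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // warm up the VM first
        countMatches(1000, new String());
        long start = System.currentTimeMillis();
        int matches = countMatches(100000000, new String());
        long end = System.currentTimeMillis() - start;
        // Printing the count keeps the loop result live.
        System.out.println(matches + " matches, completed in " + end + " msec");
    }
}
```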
>
> String.compareTo and equals methods optimizations (HARMONY-4553)
> Expected boost: 700 ms = ~5-7%
> Description: The compareTo and equals methods are used in the
> CompareStrings and CompareInternedStrings sub-benches and in several
> places inside jython. The test below shows that DRLVM is significantly
> slower on these operations.
>
> Small testcase:
> public class CompareToTest {
>     public static void main(String[] args) {
>         String st1 = new String("0 1 2 3 4 5 6 7 8 9");
>         String st2 = new String("0 1 2 3 4 5 6 7 8 9");
>         // warm up the VM
>         stringCompareTo(st1, st2, 100000);
>         long start = System.currentTimeMillis();
>         stringCompareTo(st1, st2, 20000000);
>         long end = System.currentTimeMillis() - start;
>         System.out.println("String compareTo for equal strings "
>                 + "completed in " + end + " msec");
>         st1 = new String("0 1 2 3 4 5 6 7 8 9abc");
>         // warm up the VM
>         stringCompareTo(st1, st2, 100000);
>         long start1 = System.currentTimeMillis();
>         stringCompareTo(st1, st2, 20000000);
>         long end1 = System.currentTimeMillis() - start1;
>         System.out.println("String compareTo for non-equal strings "
>                 + "completed in " + end1 + " msec");
>
>         System.out.println("Total in " + (end1 + end) + " msec");
>     }
>
>     public static void stringCompareTo(String st1, String st2, int num) {
>         for (int x = 0; x < num; x++) {
>             st1.compareTo(st2);
>         }
>     }
> }
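
To see how much work each of the two cases asks of compareTo, here is a
small sketch that follows the lexicographic definition from the
String.compareTo javadoc and counts character comparisons. The class
and method names (CompareToSketch, compareCounting) are hypothetical,
and this is not how any particular class library implements the method.

```java
// Sketch: lexicographic comparison that also counts character comparisons.
class CompareToSketch {

    // Returns {result, steps}: result has the same value as
    // a.compareTo(b), steps is the number of characters compared.
    static int[] compareCounting(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int steps = 0;
        for (int k = 0; k < limit; k++) {
            steps++;
            char ca = a.charAt(k);
            char cb = b.charAt(k);
            if (ca != cb) {
                return new int[] { ca - cb, steps };
            }
        }
        // One string is a prefix of the other: the lengths decide.
        return new int[] { a.length() - b.length(), steps };
    }

    public static void main(String[] args) {
        String st2 = "0 1 2 3 4 5 6 7 8 9";
        int[] equal = compareCounting("0 1 2 3 4 5 6 7 8 9", st2);
        int[] longer = compareCounting("0 1 2 3 4 5 6 7 8 9abc", st2);
        System.out.println("equal strings: result=" + equal[0] + ", steps=" + equal[1]);
        System.out.println("non-equal strings: result=" + longer[0] + ", steps=" + longer[1]);
    }
}
```

Both cases walk the full 19-character common prefix, since the second
pair differs only in length. A pair that differs in the first few
characters would be a useful third case.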
>
>
> Thread.currentThread() method optimization (HARMONY-4555)
> Expected boost: ~5%
> Description: Thread.currentThread() is also one of the hot methods for
> the jython bench. The method is invoked more than 7.5 million times
> during jython execution. Despite the fact that the method has already
> been optimized several times, it still works slower in comparison with
> the RI. I made some experiments with a magics implementation several
> weeks ago and got a good speedup for a small test and for the jython
> bench. Since the threading system is being redesigned at the moment, I
> think it would be great to add the currentThread() optimization to the
> plan.
>
> Testcase:
> public class CurrentThreadTest {
>     public static void main(String[] args) {
>         long st = System.currentTimeMillis();
>         for (int i = 0; i < 100000000; i++) {
>             Thread.currentThread();
>         }
>         long res = System.currentTimeMillis() - st;
>         System.out.println("res=" + res);
>     }
> }
>
>
> Could JIT, GC and Thread gurus please have a look at the mentioned issues?
>
>
> Thanks.
> Vladimir.
>
> Sub-bench statistics in milliseconds:
>
> Sub-bench                  Harmony    JDK   H vs JDK
>
> BuiltinFunctionCalls            63     78   0.807692
> BuiltinMethodLookup            265    203   1.305419
> CompareFloats                  110     31   3.548387
> CompareFloatsIntegers           94     63   1.492063
> CompareIntegers                156     31   5.032258
> CompareInternedStrings         187     31   6.032258
> CompareLongs                    94     32   2.9375
> CompareStrings                 125     31   4.032258
> CompareUnicode                  94     31   3.032258
> ConcatStrings                  797    656   1.214939
> ConcatUnicode                  562    188   2.989362
> CreateInstances                203     62   3.274194
> CreateNewInstances             344    204   1.686275
> CreateStringsWithConcat        344    156   2.205128
> CreateUnicodeWithConcat        141     78   1.807692
> DictCreation                   156     78   2
> DictWithFloatKeys              328    141   2.326241
> DictWithIntegerKeys            157     78   2.012821
> DictWithStringKeys              62     62   1
> ForLoops                        78     94   0.829787
> IfThenElse                     172    234   0.735043
> ListSlicing                     63     32   1.96875
> NestedForLoops                 109    109   1
> NormalClassAttribute           156     78   2
> NormalInstanceAttribute        125     63   1.984127
> PythonFunctionCalls            188     78   2.410256
> PythonMethodCalls              250    109   2.293578
> Recursion                      250     94   2.659574
> SecondImport                   141    109   1.293578
> SecondPackageImport            156    141   1.106383
> SecondSubmoduleImport          234    187   1.251337
> SimpleComplexArithmetic        110     16   6.875
> SimpleDictManipulation         156     94   1.659574
> SimpleFloatArithmetic          109     62   1.758065
> SimpleIntFloatArithmetic        78     16   4.875
> SimpleIntegerArithmetic         63     31   2.032258
> SimpleListManipulation          62     31   2
> SimpleLongArithmetic           157    188   0.835106
> SmallLists                     343    141   2.432624
> SmallTuples                    250    125   2
> SpecialClassAttribute          141     93   1.516129
> SpecialInstanceAttribute       125     63   1.984127
> StringMappings                 328    125   2.624
> StringPredicates               219    109   2.009174
> StringSlicing                  140     78   1.794872
> TryExcept                       16      0
> TryRaiseExcept                1641    500   3.282
> TupleSlicing                   172     94   1.829787
> UnicodeMappings                156    110   1.418182
> UnicodePredicates              219     78   2.807692
> UnicodeSlicing                 140     62   2.258065
>
>
> Total                        10829   5578
>
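
For reading the last column: as far as I can tell it is simply the
Harmony time divided by the JDK time, which also explains the empty
cell for TryExcept (the JDK time is 0). A tiny sketch of that
arithmetic, with a hypothetical RatioSketch class and ratio() helper:

```java
// Sketch: recompute the "H vs JDK" column from the two timing columns.
class RatioSketch {

    // Harmony time divided by JDK time; NaN when the JDK time is 0 ms.
    static double ratio(long harmonyMs, long jdkMs) {
        if (jdkMs == 0) {
            return Double.NaN;
        }
        return (double) harmonyMs / jdkMs;
    }

    public static void main(String[] args) {
        System.out.println("TryRaiseExcept: " + ratio(1641, 500));
        System.out.println("TryExcept: " + ratio(16, 0));
        System.out.println("Total: " + ratio(10829, 5578));
    }
}
```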
--
Egor Pasko