Re: Benchmarking Smalltalk on JVM
Rémi, on your comment:

> The idea is just to propagate the type you need if you can.
> So for a = 2 = 3, '=' will return a RtObject, but for if (2 = 3), '=' will return a boolean because it's called in an if.

Ah yes, this is something that ST compilers do, and I think it is a good idea. Basically, some selectors (message names) are known to be sent mostly between, say, Integers and to result in booleans. The compiler inlines some guards and the operation to eliminate message sends for the most expected case. The downside is that you cannot override these selectors for the types the compiler recognizes. So aInt = aInt will always yield aBoolean, even if you add a method to Integer overriding =.

I have left that out for now as I focus on a solid implementation of the environment, but it, along with adding type hints and embedded JVM code, is a way to improve performance.

thanks
mark
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
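A minimal Java sketch of the guard-and-inline idea Mark describes, not code from his implementation. The names RtObj, RtInt, sendEquals and inlinedEquals are hypothetical. The slow path deliberately uses identity equality so that the bypass of the "user-defined" method on the fast path is visible.

```java
final class GuardInlineSketch {
    /** Hypothetical root of the runtime's object model. */
    static class RtObj { }

    /** Hypothetical boxed integer. */
    static final class RtInt extends RtObj {
        final int value;
        RtInt(int value) { this.value = value; }
    }

    /** Slow path: stands in for a full dynamic send of '='. Here it is identity equality. */
    static boolean sendEquals(RtObj receiver, RtObj argument) {
        return receiver.equals(argument);
    }

    /** What a compiler might emit for 'a = b' when '=' is a recognized selector. */
    static boolean inlinedEquals(RtObj a, RtObj b) {
        // Guard: both operands are the expected integer class.
        if (a instanceof RtInt && b instanceof RtInt) {
            // Fast path: primitive compare, no message send, any override is ignored.
            return ((RtInt) a).value == ((RtInt) b).value;
        }
        // Guard failed: fall back to the ordinary send.
        return sendEquals(a, b);
    }
}
```

With two distinct RtInt boxes holding 2, inlinedEquals answers true while sendEquals answers false, which is exactly the "you cannot override it" downside: the guard decides the result before any method lookup happens.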
Re: Benchmarking Smalltalk on JVM
On 02/02/2012 04:45 AM, Mark Roos wrote:
> From Rémi:
>> Without the descriptors of invokedynamic and the code of the BSM, it's hard to tell.
>
> Yes, but they have no invoke dynamics, and I was just wondering if my indy part was causing the issue. Your answer told me that I should be OK, so that was helpful.
>
> This same code was much faster on jdk8-b20 for some reason.

It can be an escape analysis change. As far as I know, escape analysis doesn't work through an indy call, but if Charles sees the same performance as Java, escape analysis has to work??

> I will play around and see where the time is going. It would be nice to have a way to get the 8086 object code.

https://wikis.oracle.com/display/HotSpotInternals/PrintAssembly

> thanks
> mark

Rémi
Re: Benchmarking Smalltalk on JVM
On 02/02/2012 04:45 AM, Mark Roos wrote:
> From Rémi:
>> If you know it will never escape, you should use an int directly.
>
> Well, I am trying to build a Smalltalk system which has no static types, so I have to box the ints. Since the code I showed was programmer entered, I need to stay with the boxes. There are cases where the compiler generates the index code, and there I do use static ints if I can be sure they are not passed.

Or you can box only just before it's passed.

The MutableInteger trick only works because the VM does the escape analysis for you, but the escape analysis done by the VM is more brittle than the one you can write. For example, you know that increment() is a pure function; the VM has to inline it to know. So if one call is not inlined in the middle of the body of the loop, then the VM will not remove your MutableInteger.

> It does cause some issues when I open a debugger on the stack, so I may want to keep them boxed anyway, and thus the MutableInteger.

??, yes, your debugger has to support it, but if you want a typed Smalltalk you will need that anyway.

> mark

Rémi
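A minimal Java sketch of "box only just before it's passed", assuming the language compiler has already proven that the loop index and the running sum do not escape. BoxedInt, box and the allocation counter are hypothetical and exist only to make the effect observable.

```java
final class BoxAtEscapeSketch {
    /** Hypothetical boxed integer, standing in for the runtime's integer object. */
    static final class BoxedInt {
        final int value;
        BoxedInt(int value) { this.value = value; }
    }

    /** Counts how many boxes were created, for demonstration only. */
    static int allocations = 0;

    static BoxedInt box(int value) {
        allocations++;
        return new BoxedInt(value);
    }

    /** Sums the bytes with raw ints and boxes only the value that escapes: the result. */
    static BoxedInt sum(byte[] bytes) {
        int sum = 0;                                  // never escapes: stays a primitive
        for (int pos = 0; pos < bytes.length; pos++) { // never escapes: stays a primitive
            sum += bytes[pos];
        }
        return box(sum);                              // escapes to the caller: box here
    }
}
```

The design point is the one Rémi makes: the compiler's own analysis does not depend on the JIT inlining every call in the loop body, so the single allocation at the escape point is guaranteed rather than hoped for.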
Re: Benchmarking Smalltalk on JVM
On Thu, Feb 2, 2012 at 4:47 AM, Rémi Forax fo...@univ-mlv.fr wrote:
> It can be an escape analysis change. As far as I know, escape analysis doesn't work through an indy call, but if Charles sees the same performance as Java, escape analysis has to work??

My comment was about using an iterator/cursor for iteration (no object creation visible to Ruby) rather than numeric indices (a Fixnum created per iteration). When object overhead is equivalent between Ruby and Java, we can match perf.

- Charlie
Re: Benchmarking Smalltalk on JVM
Some nice comments from Rémi:

> So if one call is not inlined in the middle of the body of the loop, then the VM will not remove your MutableInteger.

This could be what is causing the difference in time. I have seen some mails that indicate indy GWT depth (method handle stacks) impacts the inlining budget. So a change in the size of my polymorphic cache could have a big impact. I would think that a GWT test is cheap to inline, though.

You are correct that I can replace the indy calls on MutableInteger with my own inline bytecodes, which I think is a good idea.

and

> ??, yes, your debugger has to support it, but if you want a typed Smalltalk you will need that anyway.

My intent for 'typed' Smalltalk code is to replace the 1000 lines or so of Java code I have to have to support primitive methods. If I could generate the JVM bytecodes from a Smalltalk syntax, I would cut the need to write Java just to get the performance improvements available from having static type information.

My debugger issue is with unboxed primitives, which I would like to hide as much as possible until Fixnums appear.

thanks
mark
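To make "GWT depth" concrete, here is a minimal Java sketch of a polymorphic cache built from nested MethodHandles.guardWithTest calls. It is an illustration, not Mark's call site code: the class tests and describe methods are hypothetical, and a real runtime would install the chain in a call site rather than rebuild it per call.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class GwtCacheSketch {
    static boolean isInteger(Object o) { return o instanceof Integer; }
    static boolean isString(Object o) { return o instanceof String; }
    static String describeInteger(Object o) { return "int:" + o; }
    static String describeString(Object o) { return "str:" + o; }
    /** Cache miss: a real runtime would do a full lookup and relink here. */
    static String fallback(Object o) { return "miss:" + o; }

    /** Builds a depth-2 chain: test Integer, else test String, else fall back. */
    static MethodHandle buildChain() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType test = MethodType.methodType(boolean.class, Object.class);
        MethodType target = MethodType.methodType(String.class, Object.class);
        MethodHandle miss = lookup.findStatic(GwtCacheSketch.class, "fallback", target);
        MethodHandle inner = MethodHandles.guardWithTest(
                lookup.findStatic(GwtCacheSketch.class, "isString", test),
                lookup.findStatic(GwtCacheSketch.class, "describeString", target),
                miss);
        return MethodHandles.guardWithTest(
                lookup.findStatic(GwtCacheSketch.class, "isInteger", test),
                lookup.findStatic(GwtCacheSketch.class, "describeInteger", target),
                inner);
    }

    /** Convenience wrapper so callers do not have to handle Throwable. */
    static String dispatch(Object receiver) {
        try {
            return (String) buildChain().invoke(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Each additional cached class adds one more guard and one more level of nesting that the JIT has to inline through, which is why the size of the cache can plausibly interact with the inlining budget.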
Re: Benchmarking Smalltalk on JVM
On 02/01/2012 01:52 AM, Mark Roos wrote:
> I just loaded about 250K lines of Smalltalk code into my JVM implementation, so now I can start some real benchmarks using our application. All of this was done on a Mac.
>
> My first try was an object load which takes about 20 files and creates a pretty complex object set. This takes 100 seconds in ST, and using the initial jdk7 release I also get 100 seconds. Not bad.
>
> But I see that one of the major slowdowns is in my use of boxed integers vs ST's use of Fixnums. So I did some more detailed experiments, using this code snippet, which creates and drops about 2 million Integers and which ST runs in about 10ms:
>
>     | bytes pos sum |
>     bytes := ByteArray new: 100.
>     sum := 0.
>     pos := 1.
>     [pos <= 100] whileTrue: [
>         sum := bytes at: pos.
>         pos := pos + 1].
>     ^sum
>
> For the initial JDK7 I get 400ms; moving to jdk8 b20 it drops to 117ms (very nice). I then converted some constructor lookups to statics to get to 66ms. Then the obvious move to make an integer cache, for which I used the jTalk range of -2000 to 4000, gave 30ms. And finally (to handle the index integer) I created a MutableInteger, which dropped me to 5ms. So 2X better than the ST I started with.
>
> But then I upgraded to jdk8b23, and now the best I see is 16ms. It also seems like the JIT sometimes compiles and sometimes does not, even using the same startup sequence. Bleeding edge, I would guess.
>
> But for the final test I used jdk7u4, and my load is 73 seconds. Not as good as the best jdk8b20 (60 seconds) but faster than native Smalltalk.

Hi Mark,
I believe tiered compilation was enabled by default between jdk8b20 and jdk8b23. I have seen some weird compilation patterns too, but have had no time to really investigate.

> looking good
> mark

Rémi
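The integer cache step can be sketched as follows. This is a minimal illustration, not Mark's code: only the range -2000 to 4000 comes from the post, while BoxedInt and valueOf are hypothetical names.

```java
final class IntCacheSketch {
    /** Hypothetical immutable boxed integer. */
    static final class BoxedInt {
        final int value;
        BoxedInt(int value) { this.value = value; }
    }

    static final int LOW = -2000;   // lower bound of the range quoted in the post
    static final int HIGH = 4000;   // upper bound of the range quoted in the post
    private static final BoxedInt[] CACHE = new BoxedInt[HIGH - LOW + 1];

    static {
        // Preallocate one shared box per value in the cached range.
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new BoxedInt(LOW + i);
        }
    }

    /** Returns a shared box for values in range and a fresh box otherwise. */
    static BoxedInt valueOf(int value) {
        if (value >= LOW && value <= HIGH) {
            return CACHE[value - LOW];
        }
        return new BoxedInt(value);
    }
}
```

Sharing boxes is only safe because the boxes are immutable, which is also why a mutable counter needs a separate, non-cached box.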
Re: Benchmarking Smalltalk on JVM
On Feb 1, 2012, at 9:34 AM, Rémi Forax wrote:
> I believe tiered compilation was enabled by default between jdk8b20 and jdk8b23. I have seen some weird compilation patterns too, but have had no time to really investigate.

I was thinking about the same. Try -XX:-TieredCompilation to know for sure.

-- Chris
Re: Benchmarking Smalltalk on JVM
On Tue, Jan 31, 2012 at 6:52 PM, Mark Roos mr...@roos.com wrote:
> For the initial JDK7 I get 400ms; moving to jdk8 b20 it drops to 117ms (very nice). I then converted some constructor lookups to statics to get to 66ms. Then the obvious move to make an integer cache, for which I used the jTalk range of -2000 to 4000, gave 30ms. And finally (to handle the index integer) I created a MutableInteger, which dropped me to 5ms.

Can you explain MutableInteger a bit more?

- Charlie
Re: Benchmarking Smalltalk on JVM
Thanks.

Adding -XX:-TieredCompilation made the run time consistent at 21ms. Still not as fast as b20 (5ms) but faster than 7u4, which is 29ms.

mark
Re: Benchmarking Smalltalk on JVM
Hi Charles,

It's pretty simple. All of my integers are boxed and are by definition immutable. However, I noticed that many uses of integers were for loop counters and indexes where the integer never escapes from the method. So I added two primitives: one to copy an integer into a new box and the other to increment the Java primitive held inside the box. In all other ways it inherits from my Integer class. The value is in reducing Integer creation for big loop/index ints.

Usage looks like:

    position := 1 newMutable.    "gets a mutable integer with an initial value of 1"
    position increment: 1.       "increments the internal primitive"
    position = 10                "normal integer compare method"

I'll probably add a mutable bit to the header to protect the unwary in case it escapes, but for now it's a power tool.

regards
mark
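On the Java side, the two primitives might look roughly like this. It is a minimal sketch under assumptions: RtInteger, RtMutableInteger, isEqualTo and countTo are hypothetical names, and the real classes presumably carry much more behavior.

```java
final class MutableIntegerSketch {
    /** Hypothetical boxed integer; treated as immutable by the language. */
    static class RtInteger {
        protected int value;
        RtInteger(int value) { this.value = value; }
        int intValue() { return value; }
        /** Normal compare, shared by the immutable and mutable variants. */
        boolean isEqualTo(RtInteger other) { return value == other.value; }
        /** 'newMutable': copy the value into a fresh, mutable box. */
        RtMutableInteger newMutable() { return new RtMutableInteger(value); }
    }

    /** Inherits everything from RtInteger and adds in-place increment. */
    static final class RtMutableInteger extends RtInteger {
        RtMutableInteger(int value) { super(value); }
        /** 'increment:': bump the primitive inside the box, allocating nothing. */
        RtMutableInteger increment(int delta) {
            value += delta;
            return this;
        }
    }

    /** Counts from 1 up to limit with a single mutable box as the loop counter. */
    static int countTo(int limit) {
        RtMutableInteger position = new RtInteger(1).newMutable();
        RtInteger end = new RtInteger(limit);
        int steps = 0;
        while (!position.isEqualTo(end)) {
            position.increment(1);
            steps++;
        }
        return steps;
    }
}
```

The loop allocates two boxes in total regardless of the iteration count, where the immutable version would allocate one per increment. The risk Mark mentions is visible too: if position were stored somewhere else, later increments would silently change that stored value.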
Re: Benchmarking Smalltalk on JVM
This may be a little much to ask but...

These bytecodes take about 20ns per cycle to run on my 2.8 GHz Mac using jdk8-b23 without tiered compilation. Does this seem reasonable given the number of indy calls? The GWT depth on the method sends is 1.

thanks
mark

    LABEL        56  LABEL 1
                 56  aload 4
                 58  aload 3
                 59  astore 1
                 60  aload 1
    INDY (asm)   61  [at:] RtCallSite, (6) {RtTestCases class benchmarkLoop, 19}
                 66  astore 1
                 67  aload 1
                 68  astore 5
    INDY (asm)   70  [41] ConstantCallSite, (6) {dummy}
                 75  aload 4
                 77  astore 1
                 78  aload 1
    INDY (asm)   79  [increment:] RtCallSite, (6) {RtTestCases class benchmarkLoop, 23}
                 84  astore 1
    LABEL        85  LABEL 0
                 85  aload 4
                 87  astore 1
    INDY (asm)   88  [4100] ConstantCallSite, (6) {dummy}
                 93  aload 1
    INDY (asm)   94  [=] RtCallSite, (6) {RtTestCases class benchmarkLoop, 24}
                 99  astore 1
                100  aload 1
                101  getstatic ri/core/rtalk/RtObject _true Lri/core/rtalk/RtObject;
    JUMP        104  if_acmpeq LABEL 1
Re: Benchmarking Smalltalk on JVM
On 02/01/2012 10:44 PM, Mark Roos wrote:
> However, I noticed that many uses of integers were for loop counters and indexes where the integer never escapes from the method. So I added two primitives: one to copy an integer into a new box and the other to increment the Java primitive held inside the box.

If you know it will never escape, you should use an int directly.

Rémi
Re: Benchmarking Smalltalk on JVM
On 02/01/2012 10:44 PM, Mark Roos wrote:
> These bytecodes take about 20ns per cycle to run on my 2.8 GHz Mac using jdk8-b23 without tiered compilation. Does this seem reasonable given the number of indy calls? The GWT depth on the method sends is 1.
>
>     LABEL        85  LABEL 0
>                  85  aload 4
>                  87  astore 1
>     INDY (asm)   88  [4100] ConstantCallSite, (6) {dummy}
>                  93  aload 1
>     INDY (asm)   94  [=] RtCallSite, (6) {RtTestCases class benchmarkLoop, 24}
>                  99  astore 1
>                 100  aload 1
>                 101  getstatic ri/core/rtalk/RtObject _true Lri/core/rtalk/RtObject;
>     JUMP        104  if_acmpeq LABEL 1

Without the descriptors of invokedynamic and the code of the BSM, it's hard to tell.

Anyway, you can optimize the last instructions: = should return a boolean, so the sequence should be:

    ldc 4100
    aload 1
    indy = (ILObject;)Z
    ifne LABEL 1

For that, you have to propagate types in both directions: from root to leaves, to type the return value of invokedynamic with the expected type (the condition of an if is a boolean), and from leaves to root (the first argument of = is an int).

cheers,
Rémi
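The descriptor Rémi abbreviates as (ILObject;)Z can be expressed with java.lang.invoke.MethodType. The sketch below contrasts it with a fully untyped send and links a handle of the typed shape; intEquals and callTyped are hypothetical helpers, not part of any runtime discussed here.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class TypedCallSiteSketch {
    /** Untyped send: both operands and the result are objects. */
    static MethodType untypedEquals() {
        return MethodType.methodType(Object.class, Object.class, Object.class);
    }

    /** Typed send: int constant, object operand, boolean result. */
    static MethodType typedEquals() {
        return MethodType.methodType(boolean.class, int.class, Object.class);
    }

    /** Hypothetical target with the typed shape. */
    static boolean intEquals(int left, Object right) {
        return right instanceof Integer && ((Integer) right) == left;
    }

    /** Looks up the typed target and calls it with an exactly matching signature. */
    static boolean callTyped(int left, Object right) {
        try {
            MethodHandle handle = MethodHandles.lookup()
                    .findStatic(TypedCallSiteSketch.class, "intEquals", typedEquals());
            return (boolean) handle.invokeExact(left, right);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Calling typedEquals().toMethodDescriptorString() gives (ILjava/lang/Object;)Z, the full form of the abbreviated descriptor, so the constant needs no box and the result can feed a conditional branch directly.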
Re: Benchmarking Smalltalk on JVM
Ahh, ok, I figured it was something like that. So is your code there actual code, or is it what you compile the code to when you realize the value won't escape?

As in your case, the biggest limitation to JRuby's performance these days is the cost of boxed numerics, so I'm always looking for ways to eliminate or reduce that cost.

FWIW, I have done experiments with using enumerators instead of integer loops over a given range, which act similarly to what you have (since an enumerator is basically a mutable cursor that creates no new language-visible values). In such cases, JRuby + indy can iterate over a range as fast as Java.

- Charlie
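A minimal Java sketch of the cursor idea Charlie describes, assuming a simple inclusive integer range. RangeCursor and sumRange are hypothetical names and this is not JRuby's enumerator implementation.

```java
final class RangeCursorSketch {
    /** Mutable cursor over [from, to]; the index never becomes a language-level value. */
    static final class RangeCursor {
        private int next;
        private final int last;

        RangeCursor(int from, int to) {
            this.next = from;
            this.last = to;
        }

        boolean hasNext() { return next <= last; }

        /** Returns the current element and moves the cursor forward in place. */
        int advance() { return next++; }
    }

    /** Sums an inclusive range using one cursor object and no per-iteration boxes. */
    static long sumRange(int from, int to) {
        long sum = 0;
        RangeCursor cursor = new RangeCursor(from, to);
        while (cursor.hasNext()) {
            sum += cursor.advance();
        }
        return sum;
    }
}
```

One cursor is allocated per loop instead of one boxed number per iteration, which is the sense in which the object overhead becomes equivalent to a plain Java loop.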
Re: Benchmarking Smalltalk on JVM
From Rémi:

> Without the descriptors of invokedynamic and the code of the BSM, it's hard to tell.

Yes, but they have no invoke dynamics, and I was just wondering if my indy part was causing the issue. Your answer told me that I should be OK, so that was helpful.

This same code was much faster on jdk8-b20 for some reason.

I will play around and see where the time is going. It would be nice to have a way to get the 8086 object code.

thanks
mark
Re: Benchmarking Smalltalk on JVM
From Rémi:

> Anyway, you can optimize the last instructions: = should return a boolean, so the sequence should be:
>
>     ldc 4100
>     aload 1
>     indy = (ILObject;)Z
>     ifne LABEL 1

I am not sure how to handle this in a Smalltalk environment. All of the objects are instances of the same Java type, so = is a method which returns an RtObject that is the singular instance of true. I have to compare that return value to 'true' to get what the if bytecode wants. = could have been a block, making type inference more interesting.

thanks for the thoughts
mark
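One way to reconcile the two views, sketched here as an assumption rather than as anything proposed in the thread, is to keep the Smalltalk method returning the true singleton and let the call site adapt its result to a boolean with MethodHandles.filterReturnValue. All names in the sketch are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class TrueSingletonSketch {
    /** Hypothetical single Java type for every Smalltalk object. */
    static class RtObject { }

    static final RtObject TRUE = new RtObject();
    static final RtObject FALSE = new RtObject();

    /** Stand-in for a Smalltalk '=' method: answers the true or false singleton. */
    static RtObject smalltalkEquals(Object left, Object right) {
        return left.equals(right) ? TRUE : FALSE;
    }

    /** Identity test against the true singleton, the job if_acmpeq did in the listing. */
    static boolean isTrue(RtObject result) {
        return result == TRUE;
    }

    /** Composes the send with the identity test, giving (Object, Object) -> boolean. */
    static MethodHandle booleanEquals() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle send = lookup.findStatic(TrueSingletonSketch.class, "smalltalkEquals",
                MethodType.methodType(RtObject.class, Object.class, Object.class));
        MethodHandle filter = lookup.findStatic(TrueSingletonSketch.class, "isTrue",
                MethodType.methodType(boolean.class, RtObject.class));
        return MethodHandles.filterReturnValue(send, filter);
    }

    /** Convenience wrapper so callers do not have to handle Throwable. */
    static boolean test(Object left, Object right) {
        try {
            return (boolean) booleanEquals().invokeExact(left, right);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

This only moves the comparison with the true singleton from the bytecode into the call site. It does not address Mark's harder case where the condition is a block or where the answer is not one of the two boolean singletons.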