Re: Benchmarking Smalltalk on JVM
Rémi, your comment:

> The idea is just to propagate the type you need if you can.
> So for a = 2 <= 3, '<=' will return a RtObject but for if (2 <= 3),
> '<=' will return a boolean because it's called in an if.

Ah yes, this is something that ST compilers do and I think it is a good idea. Basically, some selectors (message names) are known to be sent mostly between, say, Integers and to result in booleans. The compiler inlines some guards and the op to eliminate message sends for the most expected case. The downside is that you cannot override these selectors for the types the compiler recognizes. So aInt = aInt => aBoolean will always occur, even if you add a selector to Integer overriding =.

I have left that out for now as I focus on a solid implementation of the environment, but it, along with adding type hints and embedded JVM code, is a way to improve performance.

thanks
mark

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
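The guarded inlining described above can be illustrated with a minimal Java sketch. It is not how any particular Smalltalk compiler emits code; the class name, the `fullSend` fallback, and the `fullSends` counter are hypothetical and exist only to show that the fast path bypasses the normal message send when both operands are the expected type.

```java
final class InlinedCompareSketch {
    // Counts how often the (hypothetical) full message send was needed.
    static int fullSends = 0;

    // Stand-in for a real dynamic send; here it only handles string ordering.
    static boolean fullSend(String selector, Object receiver, Object arg) {
        fullSends++;
        return receiver.toString().compareTo(arg.toString()) <= 0;
    }

    // What a compiler could emit for "a <= b": a type guard, then the raw op.
    // Because the guard is compiled in, an override of <= on Integer is never seen.
    static boolean lessOrEqual(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) {
            return ((Integer) a).intValue() <= ((Integer) b).intValue();
        }
        return fullSend("<=", a, b);
    }
}
```

Calling `InlinedCompareSketch.lessOrEqual(2, 3)` takes the guarded path and leaves `fullSends` untouched, which is exactly the trade-off mentioned in the message: speed for the common case, at the cost of not honoring user overrides for the recognized types.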
Re: Benchmarking Smalltalk on JVM
On 02/02/2012 04:45 AM, Mark Roos wrote:
> From Rémi
>     Anyway, you can optimize the last instructions, <= should return a boolean
>     so the sequence should be:
>
>     ldc 4100
>     aload 1
>     indy <= (ILObject;)Z
>     if_eq LABEL 1
>
> I am not sure how to handle this in a Smalltalk environment. All of the
> objects are instances of the same java type so <= is a method which returns
> an RtObject which is the singular instance of true. I have to compare that
> return to 'true' to get what the if bytecode wants.

<= should be a method that returns a boolean, which is wrapped to a RtObject by invokedynamic if the return type is an Object but not if the return type is a boolean.

> <= could have been a block making type inference more interesting.

The idea is just to propagate the type you need if you can. So for a = 2 <= 3, '<=' will return a RtObject, but for if (2 <= 3), '<=' will return a boolean because it's called in an if.

> thanks for the thoughts
>
> mark

cheers,
Rémi
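A minimal sketch of the idea above, using plain method handles rather than a real invokedynamic bootstrap. The `RtObject` class here is a hypothetical stand-in for the runtime's object type, and `link` only imitates what a bootstrap method could do when it inspects the call site's return type; none of these names come from the actual implementation.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ReturnTypeSketch {
    // Hypothetical stand-in for the runtime's single object type.
    static final class RtObject {
        static final RtObject TRUE = new RtObject();
        static final RtObject FALSE = new RtObject();
    }

    // The primitive compares and returns a raw boolean.
    static boolean lessOrEqual(int a, int b) { return a <= b; }

    // Wraps a raw boolean into the singleton true/false objects.
    static RtObject box(boolean b) { return b ? RtObject.TRUE : RtObject.FALSE; }

    // Chooses the target from the requested call site type:
    // boolean return -> raw primitive, object return -> primitive plus boxing filter.
    static MethodHandle link(MethodType callSiteType) throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle cmp = lookup.findStatic(ReturnTypeSketch.class, "lessOrEqual",
                MethodType.methodType(boolean.class, int.class, int.class));
        if (callSiteType.returnType() == boolean.class) {
            return cmp; // used as an if condition: nothing to wrap
        }
        MethodHandle boxer = lookup.findStatic(ReturnTypeSketch.class, "box",
                MethodType.methodType(RtObject.class, boolean.class));
        return MethodHandles.filterReturnValue(cmp, boxer);
    }

    // Call site typed (II)Z, as in "if (2 <= 3)".
    static boolean asCondition(int a, int b) {
        try {
            MethodHandle mh = link(MethodType.methodType(boolean.class, int.class, int.class));
            return (boolean) mh.invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Call site typed to return an object, as in "a = 2 <= 3".
    static RtObject asValue(int a, int b) {
        try {
            MethodHandle mh = link(MethodType.methodType(RtObject.class, int.class, int.class));
            return (RtObject) mh.invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The same primitive serves both call sites; only the adapter differs, so the boxed `true` and the `if_acmpeq` comparison against it disappear whenever the result feeds a branch directly.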
Re: Benchmarking Smalltalk on JVM
Some nice comments from Rémi:

> So if one call is not inlined in the middle of the body of the loop,
> then the VM will not remove your MutableInteger.

This could be what is causing the difference in time. I have seen some mails that indicate indy GWT depth (method handle stacks) impacts the inlining budget. So a change in the size of my polymorphic cache could have a big impact. I would think that a GWT test is cheap to inline though.

You are correct that I can replace the indy calls on MutableInteger with my own inline bytecodes, which I think is a good idea.

and

> ??, yes your debugger has to support it, but if you want a typed smalltalk
> you will need that anyway.

My intent for 'typed' Smalltalk code is to replace the 1000 lines or so of Java code I have to have to support primitive methods. If I could generate the JVM bytecodes from a Smalltalk syntax, I would cut the need to write Java just to get the performance improvements available from having static type information.

My debugger issue is with unboxed primitives, which I would like to hide as much as possible until Fixnums appear.

thanks
mark
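For readers unfamiliar with the term, GWT refers to `MethodHandles.guardWithTest`. The sketch below builds a cache of depth 1: one test, one fast target, one fallback. The class and method names are hypothetical and the "paths" just return strings; a deeper polymorphic cache would nest further `guardWithTest` handles in the fallback position, which is the depth that is suspected above of consuming inlining budget.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class GwtSketch {
    // Guard: is the receiver the class we cached a target for?
    static boolean isInteger(Object receiver) { return receiver instanceof Integer; }

    // Cached target for the expected receiver class.
    static Object fastPath(Object receiver) { return "Integer fast path"; }

    // Fallback: a real runtime would do a full lookup and possibly relink here.
    static Object slowPath(Object receiver) { return "full lookup"; }

    // One level of guardWithTest, i.e. a GWT depth of 1.
    static final MethodHandle CACHE = buildCache();

    static MethodHandle buildCache() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType objToObj = MethodType.methodType(Object.class, Object.class);
            MethodHandle test = lookup.findStatic(GwtSketch.class, "isInteger",
                    MethodType.methodType(boolean.class, Object.class));
            MethodHandle fast = lookup.findStatic(GwtSketch.class, "fastPath", objToObj);
            MethodHandle slow = lookup.findStatic(GwtSketch.class, "slowPath", objToObj);
            return MethodHandles.guardWithTest(test, fast, slow);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Dispatches through the guarded handle.
    static Object send(Object receiver) {
        try {
            return CACHE.invoke(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```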
Re: Benchmarking Smalltalk on JVM
On Thu, Feb 2, 2012 at 4:47 AM, Rémi Forax wrote:
> It can be an escape analysis change.
> As far as I know, escape analysis doesn't work through indy calls, but
> if Charles sees the same performance as Java, escape analysis has to work ??

My comment was about using an iterator/cursor for iteration (no object creation visible to Ruby) rather than numeric indices (Fixnum created per iteration). When object overhead is equivalent between Ruby and Java we can match perf.

- Charlie
Re: Benchmarking Smalltalk on JVM
On 02/02/2012 04:45 AM, Mark Roos wrote:
> from Rémi
>     if you know it will never escape, you should use an int directly.
>
> Well I am trying to build a Smalltalk system which has no static types so
> I have to box the ints. Since the code I showed was programmer entered I
> need to stay with the boxes.
>
> There are cases where the compiler generates the index code and there I do
> use static ints if I can be sure they are not passed.

Or you can box only just before it's passed.

The MutableInteger trick only works because the VM does the escape analysis for you, but the escape analysis done by the VM is more brittle than the one you can write. For example, you know that increment() is a pure function; the VM has to inline it to know. So if one call is not inlined in the middle of the body of the loop, then the VM will not remove your MutableInteger.

> It does cause some issues when I open a debugger on the stack so I may want
> to keep them boxed anyway and thus the MutableInteger

??, yes your debugger has to support it, but if you want a typed smalltalk you will need that anyway.

> mark

Rémi
Re: Benchmarking Smalltalk on JVM
On 02/02/2012 04:45 AM, Mark Roos wrote:
> From Rémi
>     Without the descriptors of invokedynamic and the code of the BSM, it's
>     hard to tell.
>
> Yes but they have no invoke dynamics and I was just wondering if my indy
> part was causing the issue. Your answer told me that I should be OK so that
> was helpful. This same code was much faster on jdk8-b20 for some reason.

It can be an escape analysis change. As far as I know, escape analysis doesn't work through indy calls, but if Charles sees the same performance as Java, escape analysis has to work ??

> I will play around and see where the time is going. Would be nice to
> have a way to get the 8086 object code.

https://wikis.oracle.com/display/HotSpotInternals/PrintAssembly

> thanks
>
> mark

Rémi
Re: Benchmarking Smalltalk on JVM
From Rémi:

> Anyway, you can optimize the last instructions, <= should return a boolean
> so the sequence should be:
>
> ldc 4100
> aload 1
> indy <= (ILObject;)Z
> if_eq LABEL 1

I am not sure how to handle this in a Smalltalk environment. All of the objects are instances of the same Java type, so <= is a method which returns an RtObject which is the singular instance of true. I have to compare that return to 'true' to get what the if bytecode wants.

<= could have been a block, making type inference more interesting.

thanks for the thoughts
mark
Re: Benchmarking Smalltalk on JVM
From Rémi:

> Without the descriptors of invokedynamic and the code of the BSM, it's
> hard to tell.

Yes, but they have no invoke dynamics and I was just wondering if my indy part was causing the issue. Your answer told me that I should be OK so that was helpful. This same code was much faster on jdk8-b20 for some reason.

I will play around and see where the time is going. Would be nice to have a way to get the 8086 object code.

thanks
mark
Re: Benchmarking Smalltalk on JVM
From Charles:

> Ahh, ok, I figured it was something like that. So is your code there
> actual code, or is it what you compile the code as when you realize
> the value won't escape?
>
> FWIW, I have done experiments with using enumerators instead of
> integer loops

This is actual code, as I am trying to keep to the Smalltalk model of a very simple direct compile. So I depend on the programmer to indicate the mutable integers and to use appropriate methods.

There are a few places where the compiler does create an enumerator behind the scenes, where I will automatically generate code similar to what you are doing. In this case the integers are not visible so it's an easy catch.

I have been toying with 'typed' Smalltalk methods where I could go directly to static Java, but I have not tried that yet.

thanks
mark
Re: Benchmarking Smalltalk on JVM
From Rémi:

> if you know it will never escape, you should use an int directly.

Well, I am trying to build a Smalltalk system which has no static types, so I have to box the ints. Since the code I showed was programmer entered, I need to stay with the boxes.

There are cases where the compiler generates the index code, and there I do use static ints if I can be sure they are not passed. It does cause some issues when I open a debugger on the stack, so I may want to keep them boxed anyway, and thus the MutableInteger.

mark
Re: Benchmarking Smalltalk on JVM
Ahh, ok, I figured it was something like that. So is your code there actual code, or is it what you compile the code as when you realize the value won't escape?

As in your case, the biggest limitation to JRuby's performance these days is the cost of boxed numerics, so I'm always looking for ways to eliminate or reduce that cost.

FWIW, I have done experiments with using enumerators instead of integer loops over a given range, which acts similar to what you have (since it's basically a mutable cursor that creates no new language-visible values). In such cases, JRuby + indy can iterate over a range as fast as Java.

- Charlie

On Wed, Feb 1, 2012 at 3:44 PM, Mark Roos wrote:
> Hi Charles
>
> Its pretty simple. All of my integers are boxed and are by definition
> immutable. However I noticed that many uses of integer were for loop
> counters and indexes where the integer never escapes from the method.
> So I added two primitives, one to copy a integer into a new box and the
> other to increment the java primitive held inside the box. In all other
> ways it inherits from my Integer class. The value is in reducing Integer
> creation for big loop/index ints.
>
> Usage looks like
>     position := 1 newMutable.    gets a mutable integer with an initial value of 1
>     position increment:1.        increments the internal primitive
>     position <= 10               normal integer compare method
>
> I'll probably add a mutable bit to the header to protect the unwary in case
> it escapes but for now its a power tool.
>
> regards
> mark
>
> From: Charles Oliver Nutter
> To: Da Vinci Machine Project
> Date: 02/01/2012 12:43 PM
> Subject: Re: Benchmarking Smalltalk on JVM
> Sent by: mlvm-dev-boun...@openjdk.java.net
>
> On Tue, Jan 31, 2012 at 6:52 PM, Mark Roos wrote:
>> For the initial JDK7 I get 400ms, moving to jdk8 b20 it drops to 117ms
>> ( very nice).
>> I then converted some constructor lookups to statics to get to 66ms.
>> Then the obvious move to make an integer cache for which I used the jTalk
>> range of -2000 to 4000 gave 30ms
>> And finally ( to handle the index integer) I created a MutableInteger
>> which dropped me to 5ms.
>
> Can you explain MutableInteger a bit more?
>
> - Charlie
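The "mutable cursor" idea can be pictured with a short Java sketch. This is not JRuby's enumerator implementation; `RangeCursor` and its methods are hypothetical names, and the point is only that a single cursor object advances over the range so that no new boxed integer becomes visible per iteration.

```java
final class RangeCursor {
    private int current;      // mutable position, never handed out as a box
    private final int last;   // inclusive upper bound

    RangeCursor(int first, int last) {
        this.current = first;
        this.last = last;
    }

    boolean hasNext() { return current <= last; }

    // Returns the current value as a primitive and advances the cursor.
    int next() { return current++; }

    // Example loop: one cursor allocation in total, none per iteration.
    static long sum(int first, int last) {
        RangeCursor cursor = new RangeCursor(first, last);
        long total = 0;
        while (cursor.hasNext()) {
            total += cursor.next();
        }
        return total;
    }
}
```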
Re: Benchmarking Smalltalk on JVM
On 02/01/2012 10:44 PM, Mark Roos wrote:
> This may be a little much to ask but... These bytecodes take about
> 20ns per cycle to run on my 2.8 GHz mac using jdk8-B23 without
> TieredCompile. Does this seem reasonable given the number of indy calls?
> The GWT depth on the method sends is 1
>
> thanks
> mark
>
> LABEL       <56>  LABEL 1
>             <56>  aload 4
>             <58>  aload 3
>             <59>  astore 1
>             <60>  aload 1
> INDY (asm)  <61>  ["at:"] RtCallSite, (6) {RtTestCases class benchmarkLoop, 19}
>             <66>  astore 1
>             <67>  aload 1
>             <68>  astore 5
> INDY (asm)  <70>  ["41"] ConstantCallSite, (6) {dummy}
>             <75>  aload 4
>             <77>  astore 1
>             <78>  aload 1
> INDY (asm)  <79>  ["increment:"] RtCallSite, (6) {RtTestCases class benchmarkLoop, 23}
>             <84>  astore 1
> LABEL       <85>  LABEL 0
>             <85>  aload 4
>             <87>  astore 1
> INDY (asm)  <88>  ["4100"] ConstantCallSite, (6) {dummy}
>             <93>  aload 1
> INDY (asm)  <94>  ["<="] RtCallSite, (6) {RtTestCases class benchmarkLoop, 24}
>             <99>  astore 1
>             <100> aload 1
>             <101> getstatic ri/core/rtalk/RtObject _true Lri/core/rtalk/RtObject;
> JUMP        <104> if_acmpeq LABEL 1

Without the descriptors of invokedynamic and the code of the BSM, it's hard to tell.

Anyway, you can optimize the last instructions, <= should return a boolean so the sequence should be:

    ldc 4100
    aload 1
    indy <= (ILObject;)Z
    if_eq LABEL 1

For that you have to propagate types, from root to leaves to type the return type of invokedynamic with the expected type (the condition of an if is a boolean) and from leaves to root (the first argument of <= is an int).

cheers,
Rémi
Re: Benchmarking Smalltalk on JVM
On 02/01/2012 10:44 PM, Mark Roos wrote:
> Hi Charles
>
> Its pretty simple. All of my integers are boxed and are by definition
> immutable. However I noticed that many uses of integer were for loop
> counters and indexes where the integer never escapes from the method.
> So I added two primitives, one to copy a integer into a new box and the
> other to increment the java primitive held inside the box. In all other
> ways it inherits from my Integer class. The value is in reducing Integer
> creation for big loop/index ints.
>
> Usage looks like
>     position := 1 newMutable.    gets a mutable integer with an initial value of 1
>     position increment:1.        increments the internal primitive
>     position <= 10               normal integer compare method
>
> I'll probably add a mutable bit to the header to protect the unwary in
> case it escapes but for now its a power tool.
>
> regards
> mark

If you know it will never escape, you should use an int directly.

Rémi
Re: Benchmarking Smalltalk on JVM
This may be a little much to ask but... These bytecodes take about 20ns per cycle to run on my 2.8 GHz mac using jdk8-B23 without TieredCompile. Does this seem reasonable given the number of indy calls? The GWT depth on the method sends is 1.

thanks
mark

LABEL       <56>  LABEL 1
            <56>  aload 4
            <58>  aload 3
            <59>  astore 1
            <60>  aload 1
INDY (asm)  <61>  ["at:"] RtCallSite, (6) {RtTestCases class benchmarkLoop, 19}
            <66>  astore 1
            <67>  aload 1
            <68>  astore 5
INDY (asm)  <70>  ["41"] ConstantCallSite, (6) {dummy}
            <75>  aload 4
            <77>  astore 1
            <78>  aload 1
INDY (asm)  <79>  ["increment:"] RtCallSite, (6) {RtTestCases class benchmarkLoop, 23}
            <84>  astore 1
LABEL       <85>  LABEL 0
            <85>  aload 4
            <87>  astore 1
INDY (asm)  <88>  ["4100"] ConstantCallSite, (6) {dummy}
            <93>  aload 1
INDY (asm)  <94>  ["<="] RtCallSite, (6) {RtTestCases class benchmarkLoop, 24}
            <99>  astore 1
            <100> aload 1
            <101> getstatic ri/core/rtalk/RtObject _true Lri/core/rtalk/RtObject;
JUMP        <104> if_acmpeq LABEL 1
Re: Benchmarking Smalltalk on JVM
Hi Charles

It's pretty simple. All of my integers are boxed and are by definition immutable. However, I noticed that many uses of integer were for loop counters and indexes where the integer never escapes from the method. So I added two primitives, one to copy an integer into a new box and the other to increment the Java primitive held inside the box. In all other ways it inherits from my Integer class. The value is in reducing Integer creation for big loop/index ints.

Usage looks like

    position := 1 newMutable.    gets a mutable integer with an initial value of 1
    position increment:1.        increments the internal primitive
    position <= 10               normal integer compare method

I'll probably add a mutable bit to the header to protect the unwary in case it escapes, but for now it's a power tool.

regards
mark

From: Charles Oliver Nutter
To: Da Vinci Machine Project
Date: 02/01/2012 12:43 PM
Subject: Re: Benchmarking Smalltalk on JVM
Sent by: mlvm-dev-boun...@openjdk.java.net

On Tue, Jan 31, 2012 at 6:52 PM, Mark Roos wrote:
> For the initial JDK7 I get 400ms, moving to jdk8 b20 it drops to 117ms
> ( very nice).
> I then converted some constructor lookups to statics to get to 66ms.
> Then the obvious move to make an integer cache for which I used the jTalk
> range of -2000 to 4000 gave 30ms
> And finally ( to handle the index integer) I created a MutableInteger which
> dropped me to 5ms.

Can you explain MutableInteger a bit more?

- Charlie
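A rough Java rendering of the two primitives described above, offered as a sketch under assumptions rather than the actual runtime code: `RtInteger`, `MutableInteger`, and the method names are hypothetical translations of `newMutable`, `increment:` and `<=`.

```java
final class MutableIntegerSketch {
    // Hypothetical boxed integer; treated as immutable by convention.
    static class RtInteger {
        protected int value;

        RtInteger(int value) { this.value = value; }

        // First primitive: copy this integer into a new, mutable box.
        MutableInteger newMutable() { return new MutableInteger(value); }

        // Ordinary compare, inherited unchanged by the mutable subclass.
        boolean lessOrEqual(RtInteger other) { return value <= other.value; }
    }

    // Inherits everything from RtInteger and adds in-place increment.
    static final class MutableInteger extends RtInteger {
        MutableInteger(int value) { super(value); }

        // Second primitive: bump the primitive held inside the box.
        void increment(int by) { value += by; }
    }

    // Loop counter usage: one box for the whole loop instead of one per iteration.
    static int countLoop(int limit) {
        MutableInteger position = new RtInteger(1).newMutable();
        RtInteger bound = new RtInteger(limit);
        int iterations = 0;
        while (position.lessOrEqual(bound)) {
            iterations++;
            position.increment(1);
        }
        return iterations;
    }
}
```

The danger noted in the message is visible here too: if `position` were stored somewhere that expects an immutable integer, a later `increment` would silently change the stored value.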
Re: Benchmarking Smalltalk on JVM
Thanks

Adding -XX:-TieredCompilation made the run time consistent at 21ms. Still not as fast as b20 (5ms) but faster than 7u4, which is 29ms.

mark
Re: Benchmarking Smalltalk on JVM
On Tue, Jan 31, 2012 at 6:52 PM, Mark Roos wrote:
> For the initial JDK7 I get 400ms, moving to jdk8 b20 it drops to 117ms
> ( very nice).
> I then converted some constructor lookups to statics to get to 66ms.
> Then the obvious move to make an integer cache for which I used the jTalk
> range of -2000 to 4000 gave 30ms
> And finally ( to handle the index integer) I created a MutableInteger which
> dropped me to 5ms.

Can you explain MutableInteger a bit more?

- Charlie
Re: Benchmarking Smalltalk on JVM
On Feb 1, 2012, at 9:34 AM, Rémi Forax wrote:
> On 02/01/2012 01:52 AM, Mark Roos wrote:
>> I just loaded about 250K lines of Smalltalk code into my jvm
>> implementation so now I can start some real benchmarks using our
>> application. All of this was done on a Mac.
>>
>> My first try was an object load which takes about 20 files and creates
>> a pretty complex object set. This takes 100 seconds in ST and using the
>> initial jdk7 release I also get 100 seconds. Not bad. But I see that one
>> of the major slowdowns is in my use of boxed integers vs STs use of
>> Fixnums. So I did some more detailed experiments.
>>
>> Using this code snippet which creates and drops about 2 million
>> Integers which ST does in about 10ms.
>>
>>     | bytes pos sum |
>>     bytes := ByteArray new:100.
>>     sum := 0.
>>     pos := 1.
>>     [pos <= 100]whileTrue:[
>>         sum := bytes at:pos.
>>         pos := pos + 1].
>>     ^sum
>>
>> For the initial JDK7 I get 400ms, moving to jdk8 b20 it drops to
>> 117ms ( very nice).
>> I then converted some constructor lookups to statics to get to 66ms.
>> Then the obvious move to make an integer cache for which I used the
>> jTalk range of -2000 to 4000 gave 30ms
>> And finally ( to handle the index integer) I created a MutableInteger
>> which dropped me to 5ms.
>>
>> So 2X better than the ST I started with.
>>
>> But then I upgraded to jdk8b23 and now the best I see is 16ms. It
>> also seems like the jit sometimes compiles and sometimes not even using
>> the same startup sequence. Bleeding edge I would guess.
>>
>> But for the final test I used jdk7u4 and my load is 73 seconds. Not
>> as good as the best jdk8b20 ( 60 seconds) but faster than native Smalltalk
>
> Hi Mark, I believe tiered compilation was enabled by default between
> jdk8b20 and jdk8b23.
> I have seen some weird compilation patterns too but no time to really
> investigate.

I was thinking about the same. Try -XX:-TieredCompilation to know for sure.

-- Chris

>> looking good
>> mark
>
> Rémi
Re: Benchmarking Smalltalk on JVM
On 02/01/2012 01:52 AM, Mark Roos wrote:
> I just loaded about 250K lines of Smalltalk code into my jvm
> implementation so now I can start some real benchmarks using our
> application. All of this was done on a Mac.
>
> My first try was an object load which takes about 20 files and creates
> a pretty complex object set. This takes 100 seconds in ST and using the
> initial jdk7 release I also get 100 seconds. Not bad. But I see that one
> of the major slowdowns is in my use of boxed integers vs STs use of
> Fixnums. So I did some more detailed experiments.
>
> Using this code snippet which creates and drops about 2 million
> Integers which ST does in about 10ms.
>
>     | bytes pos sum |
>     bytes := ByteArray new:100.
>     sum := 0.
>     pos := 1.
>     [pos <= 100]whileTrue:[
>         sum := bytes at:pos.
>         pos := pos + 1].
>     ^sum
>
> For the initial JDK7 I get 400ms, moving to jdk8 b20 it drops to
> 117ms ( very nice).
> I then converted some constructor lookups to statics to get to 66ms.
> Then the obvious move to make an integer cache for which I used the
> jTalk range of -2000 to 4000 gave 30ms
> And finally ( to handle the index integer) I created a MutableInteger
> which dropped me to 5ms.
>
> So 2X better than the ST I started with.
>
> But then I upgraded to jdk8b23 and now the best I see is 16ms. It
> also seems like the jit sometimes compiles and sometimes not even using
> the same startup sequence. Bleeding edge I would guess.
>
> But for the final test I used jdk7u4 and my load is 73 seconds. Not
> as good as the best jdk8b20 ( 60 seconds) but faster than native Smalltalk

Hi Mark,
I believe tiered compilation was enabled by default between jdk8b20 and jdk8b23. I have seen some weird compilation patterns too but no time to really investigate.

> looking good
> mark

Rémi
Benchmarking Smalltalk on JVM
I just loaded about 250K lines of Smalltalk code into my JVM implementation, so now I can start some real benchmarks using our application. All of this was done on a Mac.

My first try was an object load which takes about 20 files and creates a pretty complex object set. This takes 100 seconds in ST, and using the initial jdk7 release I also get 100 seconds. Not bad. But I see that one of the major slowdowns is in my use of boxed integers vs ST's use of Fixnums. So I did some more detailed experiments.

Using this code snippet, which creates and drops about 2 million Integers, which ST does in about 10ms:

    | bytes pos sum |
    bytes := ByteArray new:100.
    sum := 0.
    pos := 1.
    [pos <= 100]whileTrue:[
        sum := bytes at:pos.
        pos := pos + 1].
    ^sum

For the initial JDK7 I get 400ms; moving to jdk8 b20 it drops to 117ms (very nice).
I then converted some constructor lookups to statics to get to 66ms.
Then the obvious move to make an integer cache, for which I used the jTalk range of -2000 to 4000, gave 30ms.
And finally (to handle the index integer) I created a MutableInteger, which dropped me to 5ms.

So 2X better than the ST I started with.

But then I upgraded to jdk8b23 and now the best I see is 16ms. It also seems like the JIT sometimes compiles and sometimes not, even using the same startup sequence. Bleeding edge, I would guess.

But for the final test I used jdk7u4 and my load is 73 seconds. Not as good as the best jdk8b20 (60 seconds) but faster than native Smalltalk.

looking good
mark
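The integer cache step can be sketched as follows. Only the range of -2000 to 4000 comes from the message; `RtInteger` and `valueOf` are hypothetical names, and the real runtime's boxed integer class is not shown in the thread.

```java
final class IntegerCacheSketch {
    // Cache bounds taken from the message: the jTalk range.
    static final int LOW = -2000;
    static final int HIGH = 4000;

    // Hypothetical immutable boxed integer.
    static final class RtInteger {
        final int value;
        RtInteger(int value) { this.value = value; }
    }

    // One preallocated box per value in [LOW, HIGH].
    private static final RtInteger[] CACHE = new RtInteger[HIGH - LOW + 1];

    static {
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new RtInteger(LOW + i);
        }
    }

    // Returns a shared box for small values and allocates only outside the range.
    static RtInteger valueOf(int value) {
        if (value >= LOW && value <= HIGH) {
            return CACHE[value - LOW];
        }
        return new RtInteger(value);
    }
}
```

This mirrors what `java.lang.Integer.valueOf` does for its own small-value range; it removes allocation for common values but does nothing for a loop index that runs past the upper bound, which is the gap the MutableInteger step addresses.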