Re: Benchmarking Smalltalk on JVM

2012-02-13 Thread Mark Roos
Rémi your comment

The idea is just to propagate the type you need if you can.
So
   for a = 2 = 3, '=' will return a RtObject but
   for if (2 = 3),  '=' will return a boolean because it's called in 
an if.

Ah yes,
this is something that ST compilers do, and I think it is a good idea.
Basically, some selectors (message names) are known to be sent mostly
between, say, Integers and to result in booleans.  The compiler inlines
some guards and the op to eliminate message sends for the most expected
case.  The downside is that you cannot override these selectors for the
types the compiler recognizes.  So aInt = aInt will always answer a
Boolean through the inlined path, even if you add a method to Integer
overriding =.

I have left that out for now as I focus on a solid implementation of the
environment, but it, along with adding type hints and embedded JVM code,
is a way to improve performance.
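A minimal Java sketch of the guard-plus-inlined-op idea, for illustration only. The names SketchObject, SketchInteger, sendEquals and compiledEquals are assumptions made up here, not classes from the runtime discussed in this thread.

```java
// Hypothetical stand-ins for a Smalltalk object model; not the real runtime classes.
class SketchObject {
    static final SketchObject TRUE = new SketchObject();
    static final SketchObject FALSE = new SketchObject();

    // Slow path: stands in for a full dynamic send of "=" (identity by default).
    SketchObject sendEquals(SketchObject arg) {
        return this == arg ? TRUE : FALSE;
    }
}

final class SketchInteger extends SketchObject {
    final long value;

    SketchInteger(long value) { this.value = value; }

    @Override
    SketchObject sendEquals(SketchObject arg) {
        boolean same = arg instanceof SketchInteger && ((SketchInteger) arg).value == value;
        return same ? TRUE : FALSE;
    }
}

final class InlinedEqualsSketch {
    // What a compiler might emit for "a = b": two type guards, then the inlined
    // primitive comparison. Any other receiver/argument pair takes a real send.
    static SketchObject compiledEquals(SketchObject a, SketchObject b) {
        if (a instanceof SketchInteger && b instanceof SketchInteger) {
            // Fast path: no lookup happens, so an override of "=" installed on
            // the integer class at run time would never be consulted here.
            return ((SketchInteger) a).value == ((SketchInteger) b).value
                    ? SketchObject.TRUE : SketchObject.FALSE;
        }
        return a.sendEquals(b);
    }

    public static void main(String[] args) {
        SketchObject r = compiledEquals(new SketchInteger(2), new SketchInteger(3));
        System.out.println(r == SketchObject.FALSE);
    }
}
```

The trade-off is visible in the fast path: it is cheap because it skips lookup, and for the same reason it cannot see user overrides.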

thanks
mark

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Benchmarking Smalltalk on JVM

2012-02-02 Thread Rémi Forax
On 02/02/2012 04:45 AM, Mark Roos wrote:
 From Rémi
 Without the descriptors of invokedynamic and the code of the BSM, it's
 hard to tell.

 Yes but they have no invoke dynamics and I was just wondering if my 
 indy part was causing the
 issue.  Your answer told me that I should be OK so that was helpful.   
 This same code was much
 faster on jdk8-b20 for some reason.

It can be an escape analysis change.
As far as I know, escape analysis doesn't work through indy calls, but
if Charles sees the same performance as Java, escape analysis has to work??


 I will play around and see where the time is going.  Would be nice to
 have a way to get the x86 object code.

https://wikis.oracle.com/display/HotSpotInternals/PrintAssembly


 thanks

 mark

Rémi



Re: Benchmarking Smalltalk on JVM

2012-02-02 Thread Rémi Forax
On 02/02/2012 04:45 AM, Mark Roos wrote:
 from Rémi

 if you know it will never escape,you should use an int directly.

 Well I am trying to build a Smalltalk system which has no static types so
 I have to box the ints. Since the code I showed was programmer entered I
 need to stay with the boxes.

 There are cases where the compiler generates the index code and there 
 I do
 use static ints if I can be sure they are not passed.

or you can box only just before it's passed.

The MutableInteger trick only works because the VM does
the escape analysis for you, but the escape analysis done
by the VM is more brittle than the one you can write.
For example, you know that increment() is a pure function;
the VM has to inline it to know. So if one call is not inlined in
the middle of the body of the loop, then the VM will
not remove your MutableInteger.
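A minimal sketch of the "box only just before it's passed" suggestion, assuming a hypothetical BoxedInt class standing in for a Smalltalk Integer object. It applies only where the compiler itself generates the loop and can prove the index is not passed anywhere.

```java
// Hypothetical boxed integer, standing in for a Smalltalk Integer object.
final class BoxedInt {
    final int value;

    BoxedInt(int value) { this.value = value; }
}

final class BoxLateSketch {
    // Compiler-generated loop: index and accumulator are plain int locals, so
    // avoiding allocation does not depend on the JIT's escape analysis.
    static BoxedInt sumBytes(byte[] bytes) {
        int sum = 0;
        for (int pos = 0; pos < bytes.length; pos++) {
            sum += bytes[pos];
        }
        // Box exactly once, at the point where the value escapes to the caller.
        return new BoxedInt(sum);
    }

    public static void main(String[] args) {
        System.out.println(sumBytes(new byte[] {1, 2, 3}).value);
    }
}
```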

 It does cause some
 issues when I open a debugger on the stack so I may want to keep them
 boxed anyway and thus the MutableInteger

??, yes your debugger has to support it, but if you
want a typed smalltalk you will need that anyway.


 mark


Rémi



Re: Benchmarking Smalltalk on JVM

2012-02-02 Thread Charles Oliver Nutter
On Thu, Feb 2, 2012 at 4:47 AM, Rémi Forax fo...@univ-mlv.fr wrote:
 It can be an escape analysis change.
 As far as I know, escape analysis doesn't work through indy calls, but
 if Charles sees the same performance as Java, escape analysis has to work??

My comment was about using an iterator/cursor for iteration (no object
creation visible to Ruby) rather than numeric indices (Fixnum created
per iteration). When object overhead is equivalent between Ruby and
Java we can match perf.

- Charlie


Re: Benchmarking Smalltalk on JVM

2012-02-02 Thread Mark Roos
Some nice comments from Rémi

So if one call is not inlined in
the middle of the body of the loop, then the VM will
not remove your MutableInteger.

This could be what is causing the difference in time.  I have seen some 
mails that indicate indy
GWT depth ( methodHandle stacks ) impacts the inlining budget.  So a 
change in the size of my
polymorphic cache could have a big impact.  I would think that a GWT test 
is cheap to inline though.
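For readers unfamiliar with the term, here is a minimal sketch of a polymorphic inline cache built from guardWithTest (GWT) handles using the standard java.lang.invoke API. The class InlineCacheSketch and its describe* targets are invented for illustration and assume non-null receivers; this is not the RtCallSite code from the thread. Each new receiver class adds one more GWT level in front of the previous target, which is the "GWT depth" referred to above.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class InlineCacheSketch {
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    private static final MethodType SITE_TYPE =
            MethodType.methodType(Object.class, Object.class);

    // Guard: does the receiver have the class this cache entry was built for?
    static boolean isClass(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    // Stand-ins for two implementations of the same selector.
    static Object describeInteger(Object receiver) { return "int:" + receiver; }
    static Object describeOther(Object receiver) { return "obj:" + receiver; }

    // Slow path: pick a target for the receiver's class, install a guarded fast
    // path in front of the current target, then perform the call.
    static Object fallback(MutableCallSite site, Object receiver) throws Throwable {
        String name = receiver instanceof Integer ? "describeInteger" : "describeOther";
        MethodHandle target = LOOKUP.findStatic(InlineCacheSketch.class, name, SITE_TYPE);
        MethodHandle test = LOOKUP.findStatic(InlineCacheSketch.class, "isClass",
                MethodType.methodType(boolean.class, Class.class, Object.class))
                .bindTo(receiver.getClass());
        // One more GWT level is added for each receiver class seen at this site.
        site.setTarget(MethodHandles.guardWithTest(test, target, site.getTarget()));
        return target.invoke(receiver);
    }

    static MutableCallSite newCallSite() {
        try {
            MutableCallSite site = new MutableCallSite(SITE_TYPE);
            MethodHandle slow = LOOKUP.findStatic(InlineCacheSketch.class, "fallback",
                    MethodType.methodType(Object.class, MutableCallSite.class, Object.class));
            site.setTarget(slow.bindTo(site));
            return site;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience wrapper so callers need not handle Throwable.
    static Object call(MutableCallSite site, Object receiver) {
        try {
            return site.dynamicInvoker().invoke(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        MutableCallSite site = newCallSite();
        System.out.println(call(site, 42));
        System.out.println(call(site, "x"));
    }
}
```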

You are correct that I can replace the indy calls on MutableInteger with
my own inline bytecodes,
which I think is a good idea.

and
??, yes your debugger has to support it, but if you
want a typed smalltalk you will need that anyway.

My intent for 'typed' Smalltalk code is to replace the 1000 lines or so of 
java code I have to have
to support primitive methods.  If I could generate the jvm byte codes from 
a Smalltalk syntax I would
cut the need to write java just to get the performance improvements 
available from having static
type information.

My debugger issue is with unboxed primitives which I would like to hide as 
much as possible until
Fixnums appear

thanks
mark


Re: Benchmarking Smalltalk on JVM

2012-02-01 Thread Rémi Forax
On 02/01/2012 01:52 AM, Mark Roos wrote:
 I just loaded about 250K lines of Smalltalk code into my jvm 
 implementation so now I can start
 some real benchmarks using our application.  All of this was done on a 
 Mac.

 My first try was an object load which takes about 20 files and creates 
 a pretty complex object set.  This
 takes 100 seconds in ST and using the initial jdk7 release I also get 
 100 seconds.  Not bad.  But
 I see that one of the major slowdowns is in my use of boxed integers 
 vs STs use of Fixnums.  So
 I did some more detailed experiments.

 Using this code snippet which creates and drops about 2 million 
 Integers which ST does in about 10ms.

 | bytes pos sum |
 bytes := ByteArray new:100.
 sum := 0.
 pos := 1.
 [pos <= 100] whileTrue: [
 sum := bytes at:pos.
 pos := pos + 1].
 ^sum

 For the initial JDK7 I get 400ms,  moving to jdk8 b20 it drops to 
 117ms ( very nice).
 I then converted some constructor lookups to statics to get to 66ms.
 Then the obvious move to make an integer cache for which I used the 
 jTalk range of -2000 to 4000 gave 30ms
 And finally ( to handle the index integer) I created a MutableInteger 
 which dropped me to 5ms.

 So 2X better than the ST I started with.

 But then I upgraded to jdk8b23 and now the best I see is 16ms.  It 
 also seems like the jit sometimes
 compiles and sometimes not even using the same startup sequence. 
  Bleeding edge I would guess.

 But for the final test I used jdk7u4 and my load is 73 seconds.  Not 
 as good as the best jdk8b20 ( 60 seconds)
 but faster than native Smalltalk

Hi Mark, I believe tiered compilation was enabled by default between
jdk8b20 and jdk8b23.
I have seen some weird compilation patterns too but have had no time to
really investigate.


 looking good
 mark

Rémi



Re: Benchmarking Smalltalk on JVM

2012-02-01 Thread Christian Thalinger

On Feb 1, 2012, at 9:34 AM, Rémi Forax wrote:

 Hi Mark, I believe tiered compilation was enabled by default between
 jdk8b20 and jdk8b23.
 I have seen some weird compilation patterns too but have had no time to
 really investigate.

I was thinking about the same.  Try -XX:-TieredCompilation to know for sure.

-- Chris

 
 


Re: Benchmarking Smalltalk on JVM

2012-02-01 Thread Charles Oliver Nutter
On Tue, Jan 31, 2012 at 6:52 PM, Mark Roos mr...@roos.com wrote:
 For the initial JDK7 I get 400ms,  moving to jdk8 b20 it drops to 117ms (
 very nice).
 I then converted some constructor lookups to statics to get to 66ms.
 Then the obvious move to make an integer cache for which I used the jTalk
 range of -2000 to 4000 gave 30ms
 And finally ( to handle the index integer) I created a MutableInteger which
 dropped me to 5ms.

Can you explain MutableInteger a bit more?

- Charlie


Re: Benchmarking Smalltalk on JVM

2012-02-01 Thread Mark Roos
Thanks

Adding

-XX:-TieredCompilation

made the run time consistent at 21ms.  Still not as fast as b20 ( 5ms ) 
but faster than 7u4 
which is 29ms.

mark



Re: Benchmarking Smalltalk on JVM

2012-02-01 Thread Mark Roos
Hi Charles

It's pretty simple.  All of my integers are boxed and are by definition
immutable.  However, I noticed that many uses of integers were for loop
counters and indexes where the integer never escapes from the method.  So
I added two primitives, one to copy an integer into a new box and the
other to increment the Java primitive held inside the box.  In all other
ways it inherits from my Integer class.  The value is in reducing Integer
creation for big loop/index ints.

Usage looks like
    position := 1 newMutable.    "gets a mutable integer with an initial value of 1"
    position increment: 1.       "increments the internal primitive"
    position = 10                "normal integer compare method"

I'll probably add a mutable bit to the header to protect the unwary in
case it escapes, but for now it's a power tool.
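A minimal Java sketch of the two techniques described in this thread: a small-integer cache (using the -2000 to 4000 range quoted earlier) and a mutable box for loop counters. The class and method names (StInteger, MutableStInteger, sameValueAs) are assumptions for illustration, not the actual implementation.

```java
// Hypothetical immutable boxed integer with a shared cache for small values.
class StInteger {
    private static final int LOW = -2000;
    private static final int HIGH = 4000;
    private static final StInteger[] CACHE = new StInteger[HIGH - LOW + 1];

    static {
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new StInteger(LOW + i);
        }
    }

    protected int value;

    protected StInteger(int value) { this.value = value; }

    // Values in the cached range are shared instead of allocated per use.
    static StInteger valueOf(int v) {
        return (v >= LOW && v <= HIGH) ? CACHE[v - LOW] : new StInteger(v);
    }

    int intValue() { return value; }

    // "newMutable": copy the value into a fresh box that may be updated in place.
    MutableStInteger newMutable() { return new MutableStInteger(value); }

    // Value comparison shared by mutable and immutable boxes.
    boolean sameValueAs(StInteger other) { return value == other.value; }
}

// Mutable variant: only safe while it never escapes the method that created it.
final class MutableStInteger extends StInteger {
    MutableStInteger(int value) { super(value); }

    // "increment:": bump the primitive inside the box; no new object is created.
    MutableStInteger increment(int delta) {
        value += delta;
        return this;
    }
}

final class MutableIntegerDemo {
    public static void main(String[] args) {
        StInteger limit = StInteger.valueOf(10);
        MutableStInteger position = StInteger.valueOf(1).newMutable();
        while (!position.sameValueAs(limit)) {
            position.increment(1);
        }
        System.out.println(position.intValue());
    }
}
```

Because newMutable copies the value, incrementing the counter never alters a cached, shared instance.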

regards
mark







Re: Benchmarking Smalltalk on JVM

2012-02-01 Thread Mark Roos
This may be a little much to ask but...   These bytecodes take about 20ns 
per cycle to run on
my 2.8 GHz mac using jdk8-B23 without TieredCompile.  Does this seem 
reasonable given the number of indy calls?
The GWT depth on the method sends is 1

thanks
mark

 LABEL       56  LABEL 1
             56  aload 4
             58  aload 3
             59  astore 1
             60  aload 1
 INDY (asm)  61  [at:] RtCallSite, (6) {RtTestCases class benchmarkLoop, 19}
             66  astore 1
             67  aload 1
             68  astore 5
 INDY (asm)  70  [41] ConstantCallSite, (6) {dummy}
             75  aload 4
             77  astore 1
             78  aload 1
 INDY (asm)  79  [increment:] RtCallSite, (6) {RtTestCases class benchmarkLoop, 23}
             84  astore 1
 LABEL       85  LABEL 0
             85  aload 4
             87  astore 1
 INDY (asm)  88  [4100] ConstantCallSite, (6) {dummy}
             93  aload 1
 INDY (asm)  94  [=] RtCallSite, (6) {RtTestCases class benchmarkLoop, 24}
             99  astore 1
            100  aload 1
            101  getstatic ri/core/rtalk/RtObject _true Lri/core/rtalk/RtObject;
 JUMP       104  if_acmpeq LABEL 1


Re: Benchmarking Smalltalk on JVM

2012-02-01 Thread Rémi Forax
On 02/01/2012 10:44 PM, Mark Roos wrote:
 Hi Charles

 It's pretty simple.  All of my integers are boxed and are by definition
 immutable.  However, I noticed that many uses of integers were for loop
 counters and indexes where the integer never escapes from the method.  So
 I added two primitives, one to copy an integer into a new box and the
 other to increment the Java primitive held inside the box.  In all other
 ways it inherits from my Integer class.  The value is in reducing Integer
 creation for big loop/index ints.

 Usage looks like
     position := 1 newMutable.    "gets a mutable integer with an initial value of 1"
     position increment: 1.       "increments the internal primitive"
     position = 10                "normal integer compare method"

 I'll probably add a mutable bit to the header to protect the unwary in
 case it escapes, but for now it's a power tool.

 regards
 mark

if you know it will never escape,you should use an int directly.

Rémi



Re: Benchmarking Smalltalk on JVM

2012-02-01 Thread Rémi Forax
On 02/01/2012 10:44 PM, Mark Roos wrote:
 This may be a little much to ask but...   These bytecodes take about 
 20ns per cycle to run on
 my 2.8 GHz mac using jdk8-B23 without TieredCompile.  Does this seem 
 reasonable given the number of indy calls?
 The GWT depth on the method sends is 1

 thanks
 mark

Without the descriptors of invokedynamic and the code of the BSM, it's 
hard to tell.

Anyway, you can optimize the last instructions, = should return a boolean
so the sequence should be:

ldc 4100
aload 1
indy = (ILObject;)Z
ifne LABEL 1

For that you have to propagate types: from root to leaves, to type the
return type of invokedynamic with the expected type (the condition of an
if is a boolean), and from leaves to root (the first argument of = is an
int).
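A minimal sketch, using the standard java.lang.invoke API, of how a call site typed (int, Object) to boolean could still be linked to an untyped runtime method that returns a true/false singleton. The names TypedCallSiteSketch, ST_TRUE, genericEquals and isTrue are hypothetical. A bootstrap method could return a handle adapted like this when the compiler requests a boolean result for the condition of an if.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class TypedCallSiteSketch {
    // Hypothetical stand-ins for the runtime's true and false singletons.
    static final Object ST_TRUE = new Object();
    static final Object ST_FALSE = new Object();

    // Untyped runtime method: every argument is an object, result is a singleton.
    static Object genericEquals(Object a, Object b) {
        return a.equals(b) ? ST_TRUE : ST_FALSE;
    }

    // Converts the singleton result into the boolean the if bytecode wants.
    static boolean isTrue(Object result) {
        return result == ST_TRUE;
    }

    // Adapts (Object, Object) -> Object to the call-site type (int, Object) -> boolean.
    static MethodHandle typedEquals() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle generic = lookup.findStatic(TypedCallSiteSketch.class,
                    "genericEquals",
                    MethodType.methodType(Object.class, Object.class, Object.class));
            MethodHandle toBoolean = lookup.findStatic(TypedCallSiteSketch.class,
                    "isTrue", MethodType.methodType(boolean.class, Object.class));
            // filterReturnValue handles the return type; asType boxes the int argument.
            return MethodHandles.filterReturnValue(generic, toBoolean)
                    .asType(MethodType.methodType(boolean.class, int.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience wrapper so callers need not handle Throwable.
    static boolean call(MethodHandle typed, int a, Object b) {
        try {
            return (boolean) typed.invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle eq = typedEquals();
        System.out.println(call(eq, 4100, Integer.valueOf(4100)));
    }
}
```

The runtime method itself stays untyped; only the adapter at the call site knows that a boolean is expected.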

cheers,
Rémi



Re: Benchmarking Smalltalk on JVM

2012-02-01 Thread Charles Oliver Nutter
Ahh, ok, I figured it was something like that. So is your code there
actual code, or is it what you compile the code as when you realize
the value won't escape?

As in your case, the biggest limitation to JRuby's performance these
days is the cost of boxed numerics, so I'm always looking for ways to
eliminate or reduce that cost.

FWIW, I have done experiments with using enumerators instead of
integer loops over a given range, which acts similar to what you have
(since it's basically a mutable cursor that creates no new
language-visible values). In such cases, JRuby + indy can iterate over
a range as fast as Java.

- Charlie



Re: Benchmarking Smalltalk on JVM

2012-02-01 Thread Mark Roos
From Rémi
Without the descriptors of invokedynamic and the code of the BSM, 
it's 
hard to tell.

Yes but they have no invoke dynamics and I was just wondering if my indy 
part was causing the
issue.  Your answer told me that I should be OK so that was helpful. This 
same code was much
faster on jdk8-b20 for some reason.

I will play around and see where the time is going.  Would be nice to
have a way to get the x86 object code.

thanks

mark


Re: Benchmarking Smalltalk on JVM

2012-02-01 Thread Mark Roos
From Rémi
Anyway, you can optimize the last instructions, = should return a 
boolean
so the sequence should be:

ldc 4100
aload 1
indy = (ILObject;)Z
ifne LABEL 1

I am not sure how to handle this in a Smalltalk environment.  All of the
objects are instances of the same Java type, so = is a method which
returns an RtObject, namely the singular instance of true.  I have to
compare that return value to 'true' to get what the if bytecode wants.

= could have been a block making type inference more interesting.

thanks for the thoughts

mark