Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread BGB

On 9/28/2012 5:33 PM, Vitaly Davidovich wrote:


Yeah, the CLR does something similar - there's a type, ValueType, 
which is the base class for all structs implicitly.  It's just a 
moniker though since you can't extend it explicitly (languages like C# 
then provide keywords to declare them) and it provides just a handful 
of basic methods inherited from object which you can override if need 
be.  They can implement interfaces, but not extend (or be extended by) 
classes.  Assignment just does bitwise copying, as well as passing 
them as args (unless ref/out is used) or returning them.  Assigning or 
referring to them via any interface that they implement boxes it.


The really nice thing about them is you can implement (and BCL does) 
some lightweight abstractions like enumerators. You can also do 
RAII-like things with them; C# actually generates code that refers to 
them as-is (not IDisposable) when desugaring using{} blocks.  It's 
very nice basically :)




my own VM does something vaguely similar as well, but it differs 
somewhat from the strategy described (in that value-types are indicated 
via a modifier flag and don't require any special treatment at the 
bytecode level), and they are not quite as efficient as they could be.


value-classes also exist, and can do RAII-like stuff via 
copy-constructors and destructors.



partly this is handled internally with them being objects as before, except:
normal objects have a copyValue method which simply returns the same object;
value-types create a new object which holds a copy of the object (for 
value-classes, creates a new instance of the object, calling the 
copy-constructor with the old object).


and, similarly:
the dropValue method for normal reference-objects is essentially no-op;
however, for value-types, it will destroy the object (for value-classes, 
calling the destructor and then freeing the memory).
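
For readers following along, here is a rough Java sketch of the copyValue/dropValue split described above. The names VmObject, RefObject, and ValueObject are invented for illustration and are not the actual internals of the VM being described; the "destroyed" flag merely stands in for running a destructor and freeing memory.

```java
// Hypothetical names throughout; this only models the described semantics.
interface VmObject {
    VmObject copyValue();   // invoked on assignment or argument passing
    void dropValue();       // invoked when a variable leaves scope
}

// Ordinary reference object: copying yields the same object, dropping is a no-op.
final class RefObject implements VmObject {
    int field;
    RefObject(int field) { this.field = field; }
    @Override public VmObject copyValue() { return this; }
    @Override public void dropValue() { }
}

// Value-type object: copying yields a fresh object, dropping "destroys" it.
final class ValueObject implements VmObject {
    int field;
    boolean destroyed;
    ValueObject(int field) { this.field = field; }
    @Override public VmObject copyValue() { return new ValueObject(field); }
    @Override public void dropValue() { destroyed = true; } // stand-in for destructor + free
}

final class CopyDropDemo {
    public static void main(String[] args) {
        RefObject r = new RefObject(1);
        System.out.println(r.copyValue() == r);      // reference object: same instance

        ValueObject v = new ValueObject(2);
        ValueObject w = (ValueObject) v.copyValue(); // value object: independent copy
        w.field = 3;
        System.out.println(v.field + " " + w.field);
        w.dropValue();                               // only the copy is destroyed
        System.out.println(v.destroyed + " " + w.destroyed);
    }
}
```

The point of the sketch is that callers treat both kinds of object uniformly; only the implementations of the two methods differ.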


note (for my VM, not JVM based): many VM objects are not class 
instances... and these methods are VM-internal, and exist separately 
from the actual methods declared in a class. some are just weird, such 
as toString, which calls a VM internal method, which may in-turn call 
the class method if the object is an instance of a class. technically, 
the vtable for these internal methods is linked to (indirectly) via the 
GC's object header.



I'm wondering why the JVM can't do something like that apart from 
having to modify the language spec and thus have VM vendors needing to 
implement it.  Not downplaying that aspect, but curious what technical 
challenges this would present.




I don't really know.

(admittedly, I haven't really done any development on the standard JVM).

I did similar before on my own miniature JVM implementation, partly by 
creating a special "Struct" class, and imparting some "magic" to it 
(more like that described before).


my own VM's implementation was based on this. underneath, they were 
largely built on the same core machinery.



or such...




Sent from my phone

On Sep 28, 2012 5:55 PM, "BGB" wrote:


On 9/28/2012 4:10 PM, Vitaly Davidovich wrote:


Since we're in wishful thinking territory now :), the two things
I'd really like are:

1) value/struct types (i.e. avoid heap and be able to pack data
closer together).  I don't know how much we can rely on EA.
2) more auto-vectorization

I think 2 is being worked on by Vladimir but unclear if there are
any concrete plans for 1.  I know John Rose has written about it,
but don't know if anything's actually planned.



yeah, agreed on 1.
I remember reading before of mention of using special signatures
or similar, but I forget the specifics.


I had before floated the idea of if it could be indicated via a
special base-class or interface.
in the latter case, the interface would essentially be "magic",
and tell the VM: "Hey! This thing here is a struct!".

this could sort of work, but would exhibit incorrect behavior on
older JVMs, unless it were done multi-part:
one part, a JVM extension to support built-in structs (indicated
via a special class or interface, as before);
the second part would be providing special
classes/interfaces/methods, which could be used to "implement" the
special struct behavior (could just be "native"?);
the 3rd part would basically be some syntax sugar in Java, mostly
so that the code isn't filled up with nasty looking method calls.


say (extensions):
ValueType interface, provides ability to construct types with
pass-by-value semantics (mostly would be handled specially by
"javac" or similar);
Struct class which implements ValueType, special class, for which
all derived classes are structs;
ValueClass class, which is like struct, but creates classes which
implement pass-by-value semantics.

so, if we have something like:
SomeStruct a, b;
a=new SomeStruct(...);
b=a;

Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Vitaly Davidovich
Yeah, the CLR does something similar - there's a type, ValueType, which is
the base class for all structs implicitly.  It's just a moniker though
since you can't extend it explicitly (languages like C# then provide
keywords to declare them) and it provides just a handful of basic methods
inherited from object which you can override if need be.  They can
implement interfaces, but not extend (or be extended by) classes.
Assignment just does bitwise copying, as well as passing them as args
(unless ref/out is used) or returning them.  Assigning or referring to them
via any interface that they implement boxes it.

The really nice thing about them is you can implement (and BCL does) some
lightweight abstractions like enumerators.  You can also do RAII-like
things with them; C# actually generates code that refers to them as-is (not
IDisposable) when desugaring using{} blocks.  It's very nice basically :)

I'm wondering why the JVM can't do something like that apart from having to
modify the language spec and thus have VM vendors needing to implement it.
Not downplaying that aspect, but curious what technical challenges this
would present.

Sent from my phone
On Sep 28, 2012 5:55 PM, "BGB"  wrote:

>  On 9/28/2012 4:10 PM, Vitaly Davidovich wrote:
>
> Since we're in wishful thinking territory now :), the two things I'd
> really like are:
>
> 1) value/struct types (i.e. avoid heap and be able to pack data closer
> together).  I don't know how much we can rely on EA.
> 2) more auto-vectorization
>
> I think 2 is being worked on by Vladimir but unclear if there are any
> concrete plans for 1.  I know John Rose has written about it, but don't
> know if anything's actually planned.
>
>
> yeah, agreed on 1.
> I remember reading before of mention of using special signatures or
> similar, but I forget the specifics.
>
>
> I had before floated the idea of if it could be indicated via a special
> base-class or interface.
> in the latter case, the interface would essentially be "magic", and tell
> the VM: "Hey! This thing here is a struct!".
>
> this could sort of work, but would exhibit incorrect behavior on older
> JVMs, unless it were done multi-part:
> one part, a JVM extension to support built-in structs (indicated via a
> special class or interface, as before);
> the second part would be providing special classes/interfaces/methods,
> which could be used to "implement" the special struct behavior (could just
> be "native"?);
> the 3rd part would basically be some syntax sugar in Java, mostly so that
> the code isn't filled up with nasty looking method calls.
>
>
> say (extensions):
> ValueType interface, provides ability to construct types with
> pass-by-value semantics (mostly would be handled specially by "javac" or
> similar);
> Struct class which implements ValueType, special class, for which all
> derived classes are structs;
> ValueClass class, which is like struct, but creates classes which
> implement pass-by-value semantics.
>
> so, if we have something like:
> SomeStruct a, b;
> a=new SomeStruct(...);
> b=a;
> the latter could generate code more like if it were:
> b=a.copyValue();
> and when they leave scope:
> a.dropValue();
> b.dropValue();
>
> and:
> public struct SomeStruct { ... }
>
> could actually be handled internally more like:
> public final class SomeStruct extends Struct { ... }
>
>
> with the VM realizing that Struct and "Struct.copyValue()" and similar are
> magic, with the JIT generating special code to handle them more efficiently.
>
> or, at least, this is my idle thinking at the moment...
>
>
> note: unrelated to "java.sql.Struct"...
>
>
>  Sent from my phone
> On Sep 28, 2012 3:59 PM, "Charles Oliver Nutter" 
> wrote:
>
>> Now what we need is a way to inject new intrinsics into the JVM, so I
>> can make an asm version of something and tell hotspot "no no, use
>> this, not the JVM bytecode" :)
>>
>> - Charlie
>>
>> On Fri, Sep 28, 2012 at 11:53 AM, Vitaly Davidovich 
>> wrote:
>> > Yup, it would have to do extensive pattern matching otherwise.  C/C++
>> > compilers do the same thing (I.e. have intimate knowledge of stdlib calls
>> > and may optimize more aggressively or replace code with intrinsic
>> > altogether).
>> >
>> > In this case, jit uses the bsf x86 assembly instruction whereas hand rolled
>> > "copy version" generates asm pretty much matching the java code.
>> >
>> > Sent from my phone
>> >
>> > On Sep 28, 2012 2:42 PM, "Raffaello Giulietti"
>> >  wrote:
>> >>
>> >> On Fri, Sep 28, 2012 at 8:15 PM, Charles Oliver Nutter
>> >>  wrote:
>> >> > On Fri, Sep 28, 2012 at 10:21 AM, Raffaello Giulietti
>> >> >  wrote:
>> >> >> I'm not sure that we are speaking about the same thing.
>> >> >>
>> >> >> The Java source code of numberOfTrailingZeros() is exactly the same in
>> >> >> Integer as it is in MyInteger. But, as far as I understand, what
>> >> >> really runs on the metal upon invocation of the Integer method is not
>> >> >> JITted code but something else that probably makes use

Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Michael Barker
> 1) value/struct types (i.e. avoid heap and be able to pack data closer
> together).  I don't know how much we can rely on EA.

I made a start on some of this based on John Rose's design,
unfortunately a promotion and a new daughter has pretty much stopped
progress.  I hope to get back to it at some point.

Mike.
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread BGB

On 9/28/2012 4:10 PM, Vitaly Davidovich wrote:


Since we're in wishful thinking territory now :), the two things I'd 
really like are:


1) value/struct types (i.e. avoid heap and be able to pack data closer 
together).  I don't know how much we can rely on EA.

2) more auto-vectorization

I think 2 is being worked on by Vladimir but unclear if there are any 
concrete plans for 1.  I know John Rose has written about it, but 
don't know if anything's actually planned.




yeah, agreed on 1.
I remember reading before of mention of using special signatures or 
similar, but I forget the specifics.



I had previously floated the idea that it could be indicated via a special
base-class or interface.
in the latter case, the interface would essentially be "magic", and tell 
the VM: "Hey! This thing here is a struct!".


this could sort of work, but would exhibit incorrect behavior on older 
JVMs, unless it were done multi-part:
one part, a JVM extension to support built-in structs (indicated via a 
special class or interface, as before);
the second part would be providing special classes/interfaces/methods, 
which could be used to "implement" the special struct behavior (could 
just be "native"?);
the 3rd part would basically be some syntax sugar in Java, mostly so 
that the code isn't filled up with nasty looking method calls.



say (extensions):
ValueType interface, provides ability to construct types with 
pass-by-value semantics (mostly would be handled specially by "javac" or 
similar);
Struct class which implements ValueType, special class, for which all 
derived classes are structs;
ValueClass class, which is like struct, but creates classes which 
implement pass-by-value semantics.


so, if we have something like:
SomeStruct a, b;
a=new SomeStruct(...);
b=a;
the latter could generate code more like if it were:
b=a.copyValue();
and when they leave scope:
a.dropValue();
b.dropValue();

and:
public struct SomeStruct { ... }

could actually be handled internally more like:
public final class SomeStruct extends Struct { ... }


with the VM realizing that Struct and "Struct.copyValue()" and similar 
are magic, with the JIT generating special code to handle them more 
efficiently.
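
A minimal sketch of how that desugaring could look as plain library code. ValueType, Struct, copyValue(), and dropValue() are the names floated above, but the method bodies are guesses written only so the example compiles and runs on a stock JVM; an extended VM would presumably replace them with something cheaper.

```java
// Names come from the proposal; the bodies are assumptions for illustration.
interface ValueType { }

abstract class Struct implements ValueType, Cloneable {
    // On an extended VM this would be "magic"; here it is a plain shallow clone.
    @SuppressWarnings("unchecked")
    public <T extends Struct> T copyValue() {
        try {
            return (T) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: Struct is Cloneable
        }
    }

    // Nothing to release on a stock JVM; the garbage collector handles it.
    public void dropValue() { }
}

// What "public struct SomeStruct { ... }" might be rewritten into.
final class SomeStruct extends Struct {
    int x, y;
    SomeStruct(int x, int y) { this.x = x; this.y = y; }
}

final class StructDesugarDemo {
    public static void main(String[] args) {
        SomeStruct a, b;
        a = new SomeStruct(1, 2);
        b = a.copyValue();   // what the compiler would emit for "b = a;"
        b.x = 10;            // changes the copy only
        System.out.println(a.x + " " + b.x);
        a.dropValue();       // emitted where a and b leave scope
        b.dropValue();
    }
}
```

Because the fallback is ordinary Java, the same class files would still behave correctly (if slowly) on a JVM that knows nothing about structs, which is the reason for splitting the proposal into parts.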


or, at least, this is my idle thinking at the moment...


note: unrelated to "java.sql.Struct"...



Sent from my phone

On Sep 28, 2012 3:59 PM, "Charles Oliver Nutter" wrote:


Now what we need is a way to inject new intrinsics into the JVM, so I
can make an asm version of something and tell hotspot "no no, use
this, not the JVM bytecode" :)

- Charlie

On Fri, Sep 28, 2012 at 11:53 AM, Vitaly Davidovich
<vita...@gmail.com> wrote:
> Yup, it would have to do extensive pattern matching otherwise.  C/C++
> compilers do the same thing (I.e. have intimate knowledge of stdlib calls
> and may optimize more aggressively or replace code with intrinsic
> altogether).
>
> In this case, jit uses the bsf x86 assembly instruction whereas hand rolled
> "copy version" generates asm pretty much matching the java code.
>
> Sent from my phone
>
> On Sep 28, 2012 2:42 PM, "Raffaello Giulietti"
> <raffaello.giulie...@gmail.com> wrote:
>>
>> On Fri, Sep 28, 2012 at 8:15 PM, Charles Oliver Nutter
>> <head...@headius.com> wrote:
>> > On Fri, Sep 28, 2012 at 10:21 AM, Raffaello Giulietti
>> > <raffaello.giulie...@gmail.com> wrote:
>> >> I'm not sure that we are speaking about the same thing.
>> >>
>> >> The Java source code of numberOfTrailingZeros() is exactly the same in
>> >> Integer as it is in MyInteger. But, as far as I understand, what
>> >> really runs on the metal upon invocation of the Integer method is not
>> >> JITted code but something else that probably makes use of CPU specific
>> >> instructions. This code is built directly into the JVM and need not
>> >> bear any resemblance with the code that would have been produced by
>> >> JITting the bytecode.
>> >
>> > Regardless of whether the method is implemented in Java or not, the
>> > JVM "knows" native/intrinsic/optimized versions of many java.lang core
>> > methods. numberOfTrailingZeros is one such method.
>> >
>> > Here, the JVM is using its intrinsified version rather than the JITed
>> > version, presumably because the intrinsified version is pre-optimized
>> > and faster than what the JVM JIT can do for the JVM bytecode version.
>> >
>> > system ~/projects/jruby-ruby $ java -XX:+PrintCompilation
>> > -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Blah
>> >  651 java.lang.String::hashCode (55 bytes)
>> >  782 Blah::doIt (5 bytes)
>> >  783 java.lang.Integer::numberOfTrailingZeros (79
>> > bytes)
>> > @ 1
>> > java.lang.Integer:

Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Charles Oliver Nutter
On Fri, Sep 28, 2012 at 2:10 PM, Vitaly Davidovich  wrote:
> 1) value/struct types (i.e. avoid heap and be able to pack data closer
> together).  I don't know how much we can rely on EA.
> 2) more auto-vectorization
>
> I think 2 is being worked on by Vladimir but unclear if there are any
> concrete plans for 1.  I know John Rose has written about it, but don't know
> if anything's actually planned.

Yeah, other than the occasional cries of "tail calls" I think the
performance of boxed numerics (or boxed anything that doesn't need to
be boxed) is by far the #1 pain point for dynlang implementers on JVM
right now.

I'm hopeful that the indy opto work happening for JDK8 will be able to
EA across dyncall boundaries, but you're rightly skeptical about the
current EA saving us much overhead. Because everything needs to inline
to EA, I don't expect to see a lot of improvement for numerics.
However, I *would* expect to see improvements in cases where we create
truly transient data structures e.g. for "out" params, since they
should be easier to inline and more localized.
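
To make the "out" params case concrete, here is a small hypothetical Java example of the kind of transient holder object meant above. The class and method names are made up, and whether the allocation is actually eliminated depends on the JIT inlining divMod() into its caller, which nothing in this code can guarantee.

```java
// Hypothetical holder used to return two values through an "out" parameter.
final class DivModResult {
    int quotient;
    int remainder;
}

final class OutParamDemo {
    // Fills a caller-supplied holder instead of returning a value.
    static void divMod(int a, int b, DivModResult out) {
        out.quotient = a / b;
        out.remainder = a % b;
    }

    static long sum(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            // The holder never escapes this iteration, so it is a candidate
            // for escape analysis once divMod() has been inlined.
            DivModResult r = new DivModResult();
            divMod(i, 7, r);
            total += r.quotient + r.remainder;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(1_000_000));
    }
}
```

The holder is small, short-lived, and only touched by one tiny callee, which is what makes it an easier target than boxed numerics flowing through many call sites.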

- Charlie


Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Vitaly Davidovich
Since we're in wishful thinking territory now :), the two things I'd really
like are:

1) value/struct types (i.e. avoid heap and be able to pack data closer
together).  I don't know how much we can rely on EA.
2) more auto-vectorization

I think 2 is being worked on by Vladimir but unclear if there are any
concrete plans for 1.  I know John Rose has written about it, but don't
know if anything's actually planned.

Sent from my phone
On Sep 28, 2012 3:59 PM, "Charles Oliver Nutter" 
wrote:

> Now what we need is a way to inject new intrinsics into the JVM, so I
> can make an asm version of something and tell hotspot "no no, use
> this, not the JVM bytecode" :)
>
> - Charlie
>
> On Fri, Sep 28, 2012 at 11:53 AM, Vitaly Davidovich 
> wrote:
> > Yup, it would have to do extensive pattern matching otherwise.  C/C++
> > compilers do the same thing (I.e. have intimate knowledge of stdlib calls
> > and may optimize more aggressively or replace code with intrinsic
> > altogether).
> >
> > In this case, jit uses the bsf x86 assembly instruction whereas hand rolled
> > "copy version" generates asm pretty much matching the java code.
> >
> > Sent from my phone
> >
> > On Sep 28, 2012 2:42 PM, "Raffaello Giulietti"
> >  wrote:
> >>
> >> On Fri, Sep 28, 2012 at 8:15 PM, Charles Oliver Nutter
> >>  wrote:
> >> > On Fri, Sep 28, 2012 at 10:21 AM, Raffaello Giulietti
> >> >  wrote:
> >> >> I'm not sure that we are speaking about the same thing.
> >> >>
> > >> >> The Java source code of numberOfTrailingZeros() is exactly the same in
> > >> >> Integer as it is in MyInteger. But, as far as I understand, what
> > >> >> really runs on the metal upon invocation of the Integer method is not
> > >> >> JITted code but something else that probably makes use of CPU specific
> >> >> instructions. This code is built directly into the JVM and need not
> >> >> bear any resemblance with the code that would have been produced by
> >> >> JITting the bytecode.
> >> >
> >> > Regardless of whether the method is implemented in Java or not, the
> >> > JVM "knows" native/intrinsic/optimized versions of many java.lang core
> >> > methods. numberOfTrailingZeros is one such method.
> >> >
> >> > Here, the JVM is using its intrinsified version rather than the JITed
> >> > version, presumably because the intrinsified version is pre-optimized
> >> > and faster than what the JVM JIT can do for the JVM bytecode version.
> >> >
> >> > system ~/projects/jruby-ruby $ java -XX:+PrintCompilation
> >> > -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Blah
> > >> >  651 java.lang.String::hashCode (55 bytes)
> > >> >  782 Blah::doIt (5 bytes)
> > >> >  783 java.lang.Integer::numberOfTrailingZeros (79 bytes)
> > >> > @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
> > >> >  791 %   Blah::main @ 2 (29 bytes)
> > >> > @ 9   Blah::doIt (5 bytes)   inline (hot)
> > >> >   @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
> > >> > @ 15   Blah::doIt (5 bytes)   inline (hot)
> > >> >   @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
> > >> >
> > >> > system ~/projects/jruby-ruby $ cat Blah.java
> > >> > public class Blah {
> > >> >   public static int value = 0;
> > >> >   public static void main(String[] args) {
> > >> >     for (int i = 0; i < 10_000_000; i++) {
> > >> >       value = doIt(i) + doIt(i * 2);
> > >> >     }
> > >> >   }
> > >> >
> > >> >   public static int doIt(int i) {
> > >> >     return Integer.numberOfTrailingZeros(i);
> > >> >   }
> > >> > }
> >> > ___
> >>
> >>
> >> Yes, this is what Vitaly stated and what happens behind the curtains.
> >>
> >> In the end, this means there are no chances for the rest of us to
> >> implement better Java code as a replacement for the intrinsified
> >> methods.
> >>
> >> For example, the following variant is about 2.5 times *faster*,
> >> averaged over all integers, than the JITted original method, the one
> >> copied verbatim! (Besides, everybody would agree that it is more
> >> readable, I hope.)
> >>
> >> But since the Integer version is intrinsified, it still runs about 2
> >> times slower than that (mysterious) code.
> >>
> >> public static int numberOfTrailingZeros(int i) {
> >>   int n = 0;
> >>   for (; n < 32 && (i & 1 << n) == 0; ++n);
> >>   return n;
> >> }
> >> ___
> >> mlvm-dev mailing list
> >> mlvm-dev@openjdk.java.net
> >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
> >
> >
> > ___
> > mlvm-dev mailing list
> > mlvm-dev@openjdk.java.net
> > http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
> >
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net

Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Mark Roos
I understand your bad luck in micro benchmarking a method that happens to 
be intrinsic.

I was just sharing my experience in creating a 'boxed' primitive that Java
does not recognize, and that it did not, on the whole, suffer the extreme
degradation you see.

As you said 'micro benchmarks are bad'

regards
mark


Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Charles Oliver Nutter
Now what we need is a way to inject new intrinsics into the JVM, so I
can make an asm version of something and tell hotspot "no no, use
this, not the JVM bytecode" :)

- Charlie

On Fri, Sep 28, 2012 at 11:53 AM, Vitaly Davidovich  wrote:
> Yup, it would have to do extensive pattern matching otherwise.  C/C++
> compilers do the same thing (I.e. have intimate knowledge of stdlib calls
> and may optimize more aggressively or replace code with intrinsic
> altogether).
>
> In this case, jit uses the bsf x86 assembly instruction whereas hand rolled
> "copy version" generates asm pretty much matching the java code.
>
> Sent from my phone
>
> On Sep 28, 2012 2:42 PM, "Raffaello Giulietti"
>  wrote:
>>
>> On Fri, Sep 28, 2012 at 8:15 PM, Charles Oliver Nutter
>>  wrote:
>> > On Fri, Sep 28, 2012 at 10:21 AM, Raffaello Giulietti
>> >  wrote:
>> >> I'm not sure that we are speaking about the same thing.
>> >>
>> >> The Java source code of numberOfTrailingZeros() is exactly the same in
>> >> Integer as it is in MyInteger. But, as far as I understand, what
>> >> really runs on the metal upon invocation of the Integer method is not
>> >> JITted code but something else that probably makes use of CPU specific
>> >> instructions. This code is built directly into the JVM and need not
>> >> bear any resemblance with the code that would have been produced by
>> >> JITting the bytecode.
>> >
>> > Regardless of whether the method is implemented in Java or not, the
>> > JVM "knows" native/intrinsic/optimized versions of many java.lang core
>> > methods. numberOfTrailingZeros is one such method.
>> >
>> > Here, the JVM is using its intrinsified version rather than the JITed
>> > version, presumably because the intrinsified version is pre-optimized
>> > and faster than what the JVM JIT can do for the JVM bytecode version.
>> >
>> > system ~/projects/jruby-ruby $ java -XX:+PrintCompilation
>> > -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Blah
>> >  651 java.lang.String::hashCode (55 bytes)
>> >  782 Blah::doIt (5 bytes)
>> >  783 java.lang.Integer::numberOfTrailingZeros (79 bytes)
>> > @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
>> >  791 %   Blah::main @ 2 (29 bytes)
>> > @ 9   Blah::doIt (5 bytes)   inline (hot)
>> >   @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
>> > @ 15   Blah::doIt (5 bytes)   inline (hot)
>> >   @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
>> >
>> > system ~/projects/jruby-ruby $ cat Blah.java
>> > public class Blah {
>> >   public static int value = 0;
>> >   public static void main(String[] args) {
>> >     for (int i = 0; i < 10_000_000; i++) {
>> >       value = doIt(i) + doIt(i * 2);
>> >     }
>> >   }
>> >
>> >   public static int doIt(int i) {
>> >     return Integer.numberOfTrailingZeros(i);
>> >   }
>> > }
>> > ___
>>
>>
>> Yes, this is what Vitaly stated and what happens behind the curtains.
>>
>> In the end, this means there are no chances for the rest of us to
>> implement better Java code as a replacement for the intrinsified
>> methods.
>>
>> For example, the following variant is about 2.5 times *faster*,
>> averaged over all integers, than the JITted original method, the one
>> copied verbatim! (Besides, everybody would agree that it is more
>> readable, I hope.)
>>
>> But since the Integer version is intrinsified, it still runs about 2
>> times slower than that (mysterious) code.
>>
>> public static int numberOfTrailingZeros(int i) {
>>   int n = 0;
>>   for (; n < 32 && (i & 1 << n) == 0; ++n);
>>   return n;
>> }
>> ___
>> mlvm-dev mailing list
>> mlvm-dev@openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>
>
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>


Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Ben Evans
Please include a link to your entire project, including the test harness.

Microbenchmarks are tricky things, and it will be easier to have a good
discussion if others can independently reproduce your results.
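
As a starting point for that, a deliberately naive harness might look like the sketch below. TzBench and its method names are made up here, the loop variant is copied from the quoted message, and hand-rolled timing loops like this remain vulnerable to warm-up and dead-code effects, so treat any numbers it prints with suspicion.

```java
// A naive timing harness; all names are illustrative only.
final class TzBench {
    // The loop variant from the quoted message, copied as posted.
    static int loopVariant(int i) {
        int n = 0;
        for (; n < 32 && (i & 1 << n) == 0; ++n);
        return n;
    }

    // Results are stored here to make it harder for the JIT to discard the loops.
    static int sink;

    static long timeLibrary(int count) {
        long start = System.nanoTime();
        int acc = 0;
        for (int i = 0; i < count; i++) {
            acc += Integer.numberOfTrailingZeros(i);
        }
        sink = acc;
        return System.nanoTime() - start;
    }

    static long timeLoop(int count) {
        long start = System.nanoTime();
        int acc = 0;
        for (int i = 0; i < count; i++) {
            acc += loopVariant(i);
        }
        sink = acc;
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Several rounds in one JVM; the first few mostly measure warm-up.
        for (int round = 0; round < 10; round++) {
            long lib = timeLibrary(10_000_000);
            long loop = timeLoop(10_000_000);
            System.out.println("round " + round + ": library=" + lib
                    + " ns, loop=" + loop + " ns");
        }
    }
}
```

Posting something this small alongside the JVM version and flags used would already let others rerun the comparison.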

Thanks,

Ben
On 28 Sep 2012 11:42, "Raffaello Giulietti" 
wrote:

> On Fri, Sep 28, 2012 at 8:15 PM, Charles Oliver Nutter
>  wrote:
> > On Fri, Sep 28, 2012 at 10:21 AM, Raffaello Giulietti
> >  wrote:
> >> I'm not sure that we are speaking about the same thing.
> >>
> >> The Java source code of numberOfTrailingZeros() is exactly the same in
> >> Integer as it is in MyInteger. But, as far as I understand, what
> >> really runs on the metal upon invocation of the Integer method is not
> >> JITted code but something else that probably makes use of CPU specific
> >> instructions. This code is built directly into the JVM and need not
> >> bear any resemblance with the code that would have been produced by
> >> JITting the bytecode.
> >
> > Regardless of whether the method is implemented in Java or not, the
> > JVM "knows" native/intrinsic/optimized versions of many java.lang core
> > methods. numberOfTrailingZeros is one such method.
> >
> > Here, the JVM is using its intrinsified version rather than the JITed
> > version, presumably because the intrinsified version is pre-optimized
> > and faster than what the JVM JIT can do for the JVM bytecode version.
> >
> > system ~/projects/jruby-ruby $ java -XX:+PrintCompilation
> > -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Blah
> >  651 java.lang.String::hashCode (55 bytes)
> >  782 Blah::doIt (5 bytes)
> >  783 java.lang.Integer::numberOfTrailingZeros (79 bytes)
> > @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
> >  791 %   Blah::main @ 2 (29 bytes)
> > @ 9   Blah::doIt (5 bytes)   inline (hot)
> >   @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
> > @ 15   Blah::doIt (5 bytes)   inline (hot)
> >   @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
> >
> > system ~/projects/jruby-ruby $ cat Blah.java
> > public class Blah {
> >   public static int value = 0;
> >   public static void main(String[] args) {
> >     for (int i = 0; i < 10_000_000; i++) {
> >       value = doIt(i) + doIt(i * 2);
> >     }
> >   }
> >
> >   public static int doIt(int i) {
> >     return Integer.numberOfTrailingZeros(i);
> >   }
> > }
> > ___
>
>
> Yes, this is what Vitaly stated and what happens behind the curtains.
>
> In the end, this means there are no chances for the rest of us to
> implement better Java code as a replacement for the intrinsified
> methods.
>
> For example, the following variant is about 2.5 times *faster*,
> averaged over all integers, than the JITted original method, the one
> copied verbatim! (Besides, everybody would agree that it is more
> readable, I hope.)
>
> But since the Integer version is intrinsified, it still runs about 2
> times slower than that (mysterious) code.
>
> public static int numberOfTrailingZeros(int i) {
>   int n = 0;
>   for (; n < 32 && (i & 1 << n) == 0; ++n);
>   return n;
> }
> ___
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>


Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Vitaly Davidovich
For things like this, it's better to grab hsdis and look at the generated
assembly - it'll answer most of these types of questions.

Sent from my phone
On Sep 28, 2012 2:54 PM, "Ben Evans"  wrote:

> Please include a link to your entire project, including the test harness.
>
> Microbenchmarks are tricky things, and it will be easier to have a good
> discussion if others can independently reproduce your results.
>
> Thanks,
>
> Ben
> On 28 Sep 2012 11:42, "Raffaello Giulietti" 
> wrote:
>
>> On Fri, Sep 28, 2012 at 8:15 PM, Charles Oliver Nutter
>>  wrote:
>> > On Fri, Sep 28, 2012 at 10:21 AM, Raffaello Giulietti
>> >  wrote:
>> >> I'm not sure that we are speaking about the same thing.
>> >>
>> >> The Java source code of numberOfTrailingZeros() is exactly the same in
>> >> Integer as it is in MyInteger. But, as far as I understand, what
>> >> really runs on the metal upon invocation of the Integer method is not
>> >> JITted code but something else that probably makes use of CPU specific
>> >> instructions. This code is built directly into the JVM and need not
>> >> bear any resemblance with the code that would have been produced by
>> >> JITting the bytecode.
>> >
>> > Regardless of whether the method is implemented in Java or not, the
>> > JVM "knows" native/intrinsic/optimized versions of many java.lang core
>> > methods. numberOfTrailingZeros is one such method.
>> >
>> > Here, the JVM is using its intrinsified version rather than the JITed
>> > version, presumably because the intrinsified version is pre-optimized
>> > and faster than what the JVM JIT can do for the JVM bytecode version.
>> >
>> > system ~/projects/jruby-ruby $ java -XX:+PrintCompilation
>> > -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Blah
>> >  651 java.lang.String::hashCode (55 bytes)
>> >  782 Blah::doIt (5 bytes)
>> >  783 java.lang.Integer::numberOfTrailingZeros (79 bytes)
>> > @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
>> >  791 %   Blah::main @ 2 (29 bytes)
>> > @ 9   Blah::doIt (5 bytes)   inline (hot)
>> >   @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
>> > @ 15   Blah::doIt (5 bytes)   inline (hot)
>> >   @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
>> >
>> > system ~/projects/jruby-ruby $ cat Blah.java
>> > public class Blah {
>> >   public static int value = 0;
>> >   public static void main(String[] args) {
>> >     for (int i = 0; i < 10_000_000; i++) {
>> >       value = doIt(i) + doIt(i * 2);
>> >     }
>> >   }
>> >
>> >   public static int doIt(int i) {
>> >     return Integer.numberOfTrailingZeros(i);
>> >   }
>> > }
>> > ___
>>
>>
>> Yes, this is what Vitaly stated and what happens behind the curtains.
>>
>> In the end, this means there are no chances for the rest of us to
>> implement better Java code as a replacement for the intrinsified
>> methods.
>>
>> For example, the following variant is about 2.5 times *faster*,
>> averaged over all integers, than the JITted original method, the one
>> copied verbatim! (Besides, everybody would agree that it is more
>> readable, I hope.)
>>
>> But since the Integer version is intrinsified, it still runs about 2
>> times slower than that (mysterious) code.
>>
>> public static int numberOfTrailingZeros(int i) {
>> int n = 0;
>> for (; n < 32 && (i & 1 << n) == 0; ++n);
>> return n;
>> }
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Vitaly Davidovich
Yup, the JIT would have to do extensive pattern matching otherwise.  C/C++
compilers do the same thing (i.e. have intimate knowledge of stdlib calls
and may optimize more aggressively or replace the code with an intrinsic
altogether).

In this case, the JIT uses the x86 bsf assembly instruction, whereas the
hand-rolled "copy version" generates asm pretty much matching the Java code.

Sent from my phone
On Sep 28, 2012 2:42 PM, "Raffaello Giulietti" <
raffaello.giulie...@gmail.com> wrote:

> On Fri, Sep 28, 2012 at 8:15 PM, Charles Oliver Nutter
>  wrote:
> > On Fri, Sep 28, 2012 at 10:21 AM, Raffaello Giulietti
> >  wrote:
> >> I'm not sure that we are speaking about the same thing.
> >>
> >> The Java source code of numberOfTrailingZeros() is exactly the same in
> >> Integer as it is in MyInteger. But, as far as I understand, what
> >> really runs on the metal upon invocation of the Integer method is not
> >> JITted code but something else that probably makes use of CPU specific
> >> instructions. This code is built directly into the JVM and need not
> >> bear any resemblance with the code that would have been produced by
> >> JITting the bytecode.
> >
> > Regardless of whether the method is implemented in Java or not, the
> > JVM "knows" native/intrinsic/optimized versions of many java.lang core
> > methods. numberOfTrailingZeros is one such method.
> >
> > Here, the JVM is using its intrinsified version rather than the JITed
> > version, presumably because the intrinsified version is pre-optimized
> > and faster than what the JVM JIT can do for the JVM bytecode version.
> >
> > system ~/projects/jruby-ruby $ java -XX:+PrintCompilation
> > -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Blah
> >  651 java.lang.String::hashCode (55 bytes)
> >  782 Blah::doIt (5 bytes)
> >  783 java.lang.Integer::numberOfTrailingZeros (79 bytes)
> > @ 1
> > java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
> >  791 %   Blah::main @ 2 (29 bytes)
> > @ 9   Blah::doIt (5 bytes)   inline (hot)
> >   @ 1
> > java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
> > @ 15   Blah::doIt (5 bytes)   inline (hot)
> >   @ 1
> > java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
> >
> > system ~/projects/jruby-ruby $ cat Blah.java
> > public class Blah {
> > public static int value = 0;
> > public static void main(String[] args) {
> >   for (int i = 0; i < 10_000_000; i++) {
> > value = doIt(i) + doIt(i * 2);
> >   }
> > }
> >
> > public static int doIt(int i) {
> >   return Integer.numberOfTrailingZeros(i);
> > }
> > }
>
>
> Yes, this is what Vitaly stated and what happens behind the curtains.
>
> In the end, this means there are no chances for the rest of us to
> implement better Java code as a replacement for the intrinsified
> methods.
>
> For example, the following variant is about 2.5 times *faster*,
> averaged over all integers, than the JITted original method, the one
> copied verbatim! (Besides, everybody would agree that it is more
> readable, I hope.)
>
> But since the Integer version is intrinsified, it still runs about 2
> times slower than that (mysterious) code.
>
> public static int numberOfTrailingZeros(int i) {
> int n = 0;
> for (; n < 32 && (i & 1 << n) == 0; ++n);
> return n;
> }


Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Raffaello Giulietti
On Fri, Sep 28, 2012 at 8:15 PM, Charles Oliver Nutter wrote:
> On Fri, Sep 28, 2012 at 10:21 AM, Raffaello Giulietti
>  wrote:
>> I'm not sure that we are speaking about the same thing.
>>
>> The Java source code of numberOfTrailingZeros() is exactly the same in
>> Integer as it is in MyInteger. But, as far as I understand, what
>> really runs on the metal upon invocation of the Integer method is not
>> JITted code but something else that probably makes use of CPU specific
>> instructions. This code is built directly into the JVM and need not
>> bear any resemblance with the code that would have been produced by
>> JITting the bytecode.
>
> Regardless of whether the method is implemented in Java or not, the
> JVM "knows" native/intrinsic/optimized versions of many java.lang core
> methods. numberOfTrailingZeros is one such method.
>
> Here, the JVM is using its intrinsified version rather than the JITed
> version, presumably because the intrinsified version is pre-optimized
> and faster than what the JVM JIT can do for the JVM bytecode version.
>
> system ~/projects/jruby-ruby $ java -XX:+PrintCompilation
> -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Blah
>  651 java.lang.String::hashCode (55 bytes)
>  782 Blah::doIt (5 bytes)
>  783 java.lang.Integer::numberOfTrailingZeros (79 bytes)
> @ 1
> java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
>  791 %   Blah::main @ 2 (29 bytes)
> @ 9   Blah::doIt (5 bytes)   inline (hot)
>   @ 1
> java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
> @ 15   Blah::doIt (5 bytes)   inline (hot)
>   @ 1
> java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
>
> system ~/projects/jruby-ruby $ cat Blah.java
> public class Blah {
> public static int value = 0;
> public static void main(String[] args) {
>   for (int i = 0; i < 10_000_000; i++) {
> value = doIt(i) + doIt(i * 2);
>   }
> }
>
> public static int doIt(int i) {
>   return Integer.numberOfTrailingZeros(i);
> }
> }


Yes, this is what Vitaly stated and what happens behind the curtains.

In the end, this means there is no chance for the rest of us to
implement better Java code as a replacement for the intrinsified
methods.

For example, the following variant is about 2.5 times *faster*,
averaged over all integers, than the JITted original method, the one
copied verbatim! (Besides, everybody would agree that it is more
readable, I hope.)

But since the Integer version is intrinsified, my variant still runs
about 2 times slower than that (mysterious) code.

public static int numberOfTrailingZeros(int i) {
    int n = 0;
    for (; n < 32 && (i & (1 << n)) == 0; ++n);
    return n;
}
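
Before comparing speeds, it may be worth confirming that the loop variant returns the same results as the library method. The following is a minimal sketch under stated assumptions: the class name TrailingZerosCheck, the helper names, and the sampling stride are illustrative and not part of the original message.

```java
// Sketch: confirm the loop variant agrees with Integer.numberOfTrailingZeros.
// The class name, method names, and sampling stride are illustrative assumptions.
final class TrailingZerosCheck {

    // The loop variant from the message above, with unchanged behavior.
    static int loopVariant(int i) {
        int n = 0;
        while (n < 32 && (i & (1 << n)) == 0) {
            ++n;
        }
        return n;
    }

    // Returns the number of sampled inputs on which the two versions disagree.
    static int countMismatches() {
        int mismatches = 0;
        // Zero and every power of two cover each possible result from 0 to 32.
        if (loopVariant(0) != Integer.numberOfTrailingZeros(0)) {
            ++mismatches;
        }
        for (int shift = 0; shift < 32; ++shift) {
            int x = 1 << shift;
            if (loopVariant(x) != Integer.numberOfTrailingZeros(x)) {
                ++mismatches;
            }
        }
        // A coarse sweep over the whole int range; the stride is arbitrary.
        for (long v = Integer.MIN_VALUE; v <= Integer.MAX_VALUE; v += 65_521) {
            int x = (int) v;
            if (loopVariant(x) != Integer.numberOfTrailingZeros(x)) {
                ++mismatches;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Note that half of all int values are odd, so for uniformly distributed inputs the loop usually exits after one or two tests. That may be part of why the variant does well "averaged over all integers"; inputs with many trailing zeros would cost it more iterations.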


Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Charles Oliver Nutter
On Fri, Sep 28, 2012 at 10:21 AM, Raffaello Giulietti wrote:
> I'm not sure that we are speaking about the same thing.
>
> The Java source code of numberOfTrailingZeros() is exactly the same in
> Integer as it is in MyInteger. But, as far as I understand, what
> really runs on the metal upon invocation of the Integer method is not
> JITted code but something else that probably makes use of CPU specific
> instructions. This code is built directly into the JVM and need not
> bear any resemblance with the code that would have been produced by
> JITting the bytecode.

Regardless of whether the method is implemented in Java or not, the
JVM "knows" native/intrinsic/optimized versions of many java.lang core
methods. numberOfTrailingZeros is one such method.

Here, the JVM is using its intrinsified version rather than the JITed
version, presumably because the intrinsified version is pre-optimized
and faster than what the JVM JIT can do for the JVM bytecode version.

system ~/projects/jruby-ruby $ java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Blah
 651 java.lang.String::hashCode (55 bytes)
 782 Blah::doIt (5 bytes)
 783 java.lang.Integer::numberOfTrailingZeros (79 bytes)
       @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
 791 %   Blah::main @ 2 (29 bytes)
       @ 9   Blah::doIt (5 bytes)   inline (hot)
         @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)
       @ 15   Blah::doIt (5 bytes)   inline (hot)
         @ 1   java.lang.Integer::numberOfTrailingZeros (79 bytes)   (intrinsic)

system ~/projects/jruby-ruby $ cat Blah.java
public class Blah {
  public static int value = 0;
  public static void main(String[] args) {
    for (int i = 0; i < 10_000_000; i++) {
      value = doIt(i) + doIt(i * 2);
    }
  }

  public static int doIt(int i) {
    return Integer.numberOfTrailingZeros(i);
  }
}


Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Raffaello Giulietti
On Fri, Sep 28, 2012 at 6:49 PM, Mark Roos wrote:
> From Raffaello:
>> are java.lang classes better served by the JVM than other classes?
>> Here's a small experiment.
>> I created a MyInteger class that exposes the very same implementation
>> of Integer.numberOfTrailingZeros(int), copied verbatim.
>
> We did similar micro-benchmarks, using Hanoi as a test case, to see the
> speed variations the various types of integers would have.  We tried int,
> long, Long and our version of a boxed long.  One case where we saw a 3-5x
> difference was between the boxed and primitive versions.  This was
> expected.
>
> The other case was when we compared our custom boxed long with the Java
> Long.  We found the issue here was with the creation and collection of
> instances.  The use of the integer cache made a huge difference.  Once we
> did that, our times became very close.  So while there may be internal
> optimization by the JVM, in the Hanoi case it had a minor effect.
>
> regards
> mark


I'm not sure that we are speaking about the same thing.

The Java source code of numberOfTrailingZeros() is exactly the same in
Integer as it is in MyInteger. But, as far as I understand, what
really runs on the metal upon invocation of the Integer method is not
JITted code but something else that probably makes use of CPU specific
instructions. This code is built directly into the JVM and need not
bear any resemblance to the code that would have been produced by
JITting the bytecode.

On the other hand, the bytecode for the method in MyInteger, which
stems from an identical source, will be JITted. It is not built into
the JVM.

Hence, in my example, there is no chance to get close to the
performance of the Integer variant, since the Integer code that runs
on the metal is probably highly optimized and probably quite different
from the JITted code of the MyInteger method.

I guess this happens for every method that is intrinsified into the
JVM. As Vitaly points out, they are listed in
http://hg.openjdk.java.net/jdk7/jdk7/hotspot/file/9b0ca45cd756/src/share/vm/opto/library_call.cpp
and numberOfTrailingZeros() is among them.
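
The comparison described above could be timed along the following lines. This is only a sketch under stated assumptions: the class TrailingBench, its method names, and the iteration count are hypothetical, and the numbers it prints depend on the JVM and hardware, so no particular ratio is implied. It warms up both paths before timing and converts nanoseconds to milliseconds by dividing by 1_000_000.

```java
// Sketch of a timing harness for the copied method versus the library method.
// All names and the iteration count are hypothetical.
final class TrailingBench {

    static final int COUNT = 1 << 24;

    // Copy of the Java-level implementation (HD, Figure 5-14), as in MyInteger.
    static int copied(int i) {
        if (i == 0) return 32;
        int y;
        int n = 31;
        y = i << 16; if (y != 0) { n -= 16; i = y; }
        y = i << 8;  if (y != 0) { n -= 8;  i = y; }
        y = i << 4;  if (y != 0) { n -= 4;  i = y; }
        y = i << 2;  if (y != 0) { n -= 2;  i = y; }
        return n - ((i << 1) >>> 31);
    }

    // Sums the copied method over 0..COUNT-1 so the calls cannot be dropped.
    static int sumCopied() {
        int t = 0;
        for (int i = 0; i < COUNT; ++i) {
            t += copied(i);
        }
        return t;
    }

    // Same loop, but calling the library method the JVM may intrinsify.
    static int sumLibrary() {
        int t = 0;
        for (int i = 0; i < COUNT; ++i) {
            t += Integer.numberOfTrailingZeros(i);
        }
        return t;
    }

    // Elapsed time since startNanos, in milliseconds.
    static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    public static void main(String[] args) {
        // Warm up both paths so each loop has a chance to be compiled first.
        int warm = sumCopied() + sumLibrary();
        System.out.println("warmup, t=" + warm);

        long begin = System.nanoTime();
        int a = sumCopied();
        System.out.println("copied:  " + elapsedMillis(begin) + " ms, t=" + a);

        begin = System.nanoTime();
        int b = sumLibrary();
        System.out.println("library: " + elapsedMillis(begin) + " ms, t=" + b);
    }
}
```

Both sums should be equal, which also serves as a sanity check that the two code paths compute the same function.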

Cheers
Raffaello


Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Mark Roos
From Raffaello:
> are java.lang classes better served by the JVM than other classes?
> Here's a small experiment.
> I created a MyInteger class that exposes the very same implementation
> of Integer.numberOfTrailingZeros(int), copied verbatim.

We did similar micro-benchmarks, using Hanoi as a test case, to see the
speed variations the various types of integers would have.  We tried int,
long, Long and our version of a boxed long.  One case where we saw a 3-5x
difference was between the boxed and primitive versions.  This was expected.

The other case was when we compared our custom boxed long with the Java
Long.  We found the issue here was with the creation and collection of
instances.  The use of the integer cache made a huge difference.  Once we
did that, our times became very close.  So while there may be internal
optimization by the JVM, in the Hanoi case it had a minor effect.
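
To illustrate the kind of cache mentioned above, here is a minimal sketch of a custom boxed long that reuses instances for small values, modeled on the documented behavior of Long.valueOf (which always caches -128 to 127). The class BoxedLong and its cache bounds are assumptions for illustration; the actual class used in the Hanoi experiment is not shown in this thread.

```java
// Sketch: a hypothetical boxed long with a small-value cache.
// Names and cache bounds are illustrative assumptions.
final class BoxedLong {

    private static final int LOW = -128;
    private static final int HIGH = 127;
    private static final BoxedLong[] CACHE = new BoxedLong[HIGH - LOW + 1];

    static {
        // Preallocate one shared instance per small value.
        for (int k = 0; k < CACHE.length; ++k) {
            CACHE[k] = new BoxedLong(LOW + k);
        }
    }

    final long value;

    private BoxedLong(long value) {
        this.value = value;
    }

    // Returns a shared instance for small values, a fresh one otherwise.
    static BoxedLong valueOf(long v) {
        if (v >= LOW && v <= HIGH) {
            return CACHE[(int) v - LOW];
        }
        return new BoxedLong(v);
    }

    public static void main(String[] args) {
        // Reference comparisons show whether an instance was reused.
        System.out.println(BoxedLong.valueOf(5) == BoxedLong.valueOf(5));
        System.out.println(BoxedLong.valueOf(1000) == BoxedLong.valueOf(1000));
        System.out.println(Long.valueOf(100L) == Long.valueOf(100L));
    }
}
```

With a cache like this, a workload that mostly touches small values allocates far fewer objects, which would reduce the creation and collection cost described above.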

regards
mark


Re: Are java.lang classes better served by the JVM?

2012-09-28 Thread Vitaly Davidovich
Yes, the JIT has intrinsic knowledge of some JDK classes and their methods,
and emits optimized code - Integer has several intrinsics.  Look at
src/share/vm/opto/library_call.cpp in the HotSpot sources.
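
One way to see this without reading the sources is to run a small probe with the diagnostic flags used elsewhere in this thread (-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining) and look for calls marked "(intrinsic)". The sketch below is an assumption-laden illustration: the class name IntrinsicProbe is made up, and which of these Integer methods are actually intrinsified depends on the JVM version and CPU.

```java
// Sketch: exercise a few Integer methods in a hot loop so that the
// inlining output can be inspected. Names are illustrative assumptions.
final class IntrinsicProbe {

    // Static sink so the JIT cannot discard the loop's work.
    static int sink;

    // Combines several Integer bit-manipulation methods in one call site each.
    static int mix(int i) {
        return Integer.bitCount(i)
             + Integer.numberOfLeadingZeros(i)
             + Integer.numberOfTrailingZeros(i)
             + Integer.reverseBytes(i);
    }

    public static void main(String[] args) {
        // Enough iterations for the loop to become hot and get compiled.
        for (int i = 0; i < 10_000_000; i++) {
            sink += mix(i);
        }
        System.out.println(sink);
    }
}
```

Running it as `java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining IntrinsicProbe` should list the call sites inside mix; any that the JIT replaces would carry the "(intrinsic)" tag.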

Cheers

Sent from my phone
On Sep 28, 2012 10:51 AM, "Raffaello Giulietti" <
raffaello.giulie...@gmail.com> wrote:

> Hello,
>
> are java.lang classes better served by the JVM than other classes?
>
> Here's a small experiment.
>
> I created a MyInteger class that exposes the very same implementation
> of Integer.numberOfTrailingZeros(int), copied verbatim.
>
> And here is a test that, on my JVM, shows that the implementation in
> Integer is about 5 times faster. I tried several JVM flags, e.g.,
> -server, -XX:+AggressiveOpts, -Xshare:off, unsatisfactorily. Similar
> results with factor of about 3-5 are observed on other platforms.
>
> Why the big performance difference? I know, micro-benchmarks are evil,
> etc, etc, ... But this is hard to understand, except if Integer were
> already super-optimized "a priori", intrinsically, while building the
> JVM. Is this the case?
>
>
>
> Greetings
> Raffaello
>
>
>
> ---
>
> public class Trailing {
>
>     private static int COUNT = 1 << 30;
>
>     public static void main(String[] args) {
>         warmup();
>         my();
>         their();
>     }
>
>     private static void warmup() {
>         int t = 0;
>         for (int i = 0; i < COUNT; ++i) {
>             t += MyInteger.numberOfTrailingZeros(i);
>         }
>         System.out.println("warmup, t=" + t);
>     }
>
>     private static void their() {
>         int t = 0;
>         long begin = System.nanoTime();
>         for (int i = 0; i < COUNT; ++i) {
>             t += Integer.numberOfTrailingZeros(i);
>         }
>         // nanoseconds to milliseconds
>         System.out.println((System.nanoTime() - begin) / 1_000_000 + "ms, t=" + t);
>     }
>
>     private static void my() {
>         int t = 0;
>         long begin = System.nanoTime();
>         for (int i = 0; i < COUNT; ++i) {
>             t += MyInteger.numberOfTrailingZeros(i);
>         }
>         // nanoseconds to milliseconds
>         System.out.println((System.nanoTime() - begin) / 1_000_000 + "ms, t=" + t);
>     }
>
> }
>
> --
>
> public class MyInteger {
>
>     public static int numberOfTrailingZeros(int i) {
>         // HD, Figure 5-14
>         int y;
>         if (i == 0) return 32;
>         int n = 31;
>         y = i <<16; if (y != 0) { n = n -16; i = y; }
>         y = i << 8; if (y != 0) { n = n - 8; i = y; }
>         y = i << 4; if (y != 0) { n = n - 4; i = y; }
>         y = i << 2; if (y != 0) { n = n - 2; i = y; }
>         return n - ((i << 1) >>> 31);
>     }
>
> }


hg: mlvm/mlvm/hotspot: anno-stable.patch: update

2012-09-28 Thread john . r . rose
Changeset: e3ab72281c5f
Author:    jrose
Date:      2012-09-28 00:23 -0700
URL:       http://hg.openjdk.java.net/mlvm/mlvm/hotspot/rev/e3ab72281c5f

anno-stable.patch: update

! anno-stable.patch
