Re: Other interesting papers and research
Found a paper from David too... http://www.research.ibm.com/people/d/dgrove/papers/cgo05.html

On 6/6/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi Rob,
>
> > From: Robert Lougher <[EMAIL PROTECTED]>
> > Date: Mon, 6 Jun 2005 14:58:45 +0100
> >
> > > > > One thing to note is that a threaded interpreter would see
> > > > > something like a 2-4x expansion over "normal" bytecodes when it
> > > > > converts from bytecodes to its internal form (arrays of function
> > > > > pointers).
> > > >
> > > > Direct threading interpreters like the JDK's work on plain Java
> > > > bytecode and do not need to expand normal bytecode instructions.
> > > > Such expansion might be required if Java bytecode were not linear
> > > > but rather a tree or some other complicated form.
> > >
> > > According to my understanding, an indirect threaded interpreter uses
> > > the original bytecode stream. It's indirect because the handler
> > > address must be looked up via the bytecode.
>
> Ah, thanks for pointing that out. My wording 'direct threading' was not
> correct.
>
> The threading (interpretation) techniques I referred to as implemented in
> JDKs should be called 'token threading', neither direct nor indirect
> threading, because they work directly on bytecode instructions without
> any expansion. Note that the interpreter provides NEXT routines for all
> native code fragments corresponding to VM instructions. For a JVM this
> '...threading' terminology is not very informative, because direct
> interpretation of portable bytecode is naturally 'token threading'.
>
> Dave's last posting was based on the direct threading technique, and what
> he said about direct threading was correct; it was my earlier posting
> that was incorrect.
>
> Kazuyuki Shudo  [EMAIL PROTECTED]  http://www.shudo.net/

--
Davanum Srinivas - http://webservices.apache.org/~dims/
Re: Other interesting papers and research
Hi Rob,

> From: Robert Lougher <[EMAIL PROTECTED]>
> Date: Mon, 6 Jun 2005 14:58:45 +0100
>
> > > One thing to note is that a threaded interpreter would see something
> > > like a 2-4x expansion over "normal" bytecodes when it converts from
> > > bytecodes to its internal form (arrays of function pointers).
> >
> > Direct threading interpreters like the JDK's work on plain Java
> > bytecode and do not need to expand normal bytecode instructions.
> > Such expansion might be required if Java bytecode were not linear
> > but rather a tree or some other complicated form.
>
> According to my understanding, an indirect threaded interpreter uses
> the original bytecode stream. It's indirect because the handler
> address must be looked up via the bytecode.

Ah, thanks for pointing that out. My wording 'direct threading' was not correct.

The threading (interpretation) techniques I referred to as implemented in JDKs should be called 'token threading', neither direct nor indirect threading, because they work directly on bytecode instructions without any expansion. Note that the interpreter provides NEXT routines for all native code fragments corresponding to VM instructions. For a JVM this '...threading' terminology is not very informative, because direct interpretation of portable bytecode is naturally 'token threading'.

Dave's last posting was based on the direct threading technique, and what he said about direct threading was correct; it was my earlier posting that was incorrect.

Kazuyuki Shudo  [EMAIL PROTECTED]  http://www.shudo.net/
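[The token-threading dispatch described above can be sketched in C. This is a hypothetical mini-VM, not JDK code: the opcodes and handlers are invented for illustration, and the inlined NEXT dispatch relies on the GCC/Clang labels-as-values extension. The point is that the handler table is indexed by the original bytecode (the "token"), so the instruction stream runs without any expansion, and every handler ends with its own NEXT routine.]

```c
#include <assert.h>

enum { T_PUSH1, T_ADD, T_HALT };   /* invented opcodes, illustration only */

/* Token threading: each handler looks its successor up in `table` via
   the next bytecode and jumps there directly; the original bytecode
   stream is executed as-is, with no rewriting into another form. */
static int run_token(const unsigned char *pc)
{
    void *table[] = { &&push1, &&add, &&halt };
    int stack[16], sp = 0;
#define NEXT goto *table[*pc++]    /* the inlined NEXT routine */
    NEXT;
push1:  stack[sp++] = 1; NEXT;
add:    sp--; stack[sp - 1] += stack[sp]; NEXT;
halt:   return stack[sp - 1];
#undef NEXT
}
```

[Running it on the stream PUSH1, PUSH1, ADD, HALT leaves 2 on top of the stack; the dispatch cost is one table lookup per instruction, which is what distinguishes this from direct threading.]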
Re: Other interesting papers and research
Hi,

On 6/6/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi Dave,
>
> > From: David P Grove <[EMAIL PROTECTED]>
> >
> > One thing to note is that a threaded interpreter would see something
> > like a 2-4x expansion over "normal" bytecodes when it converts from
> > bytecodes to its internal form (arrays of function pointers).
>
> Direct threading interpreters like the JDK's work on plain Java
> bytecode and do not need to expand normal bytecode instructions.
> Such expansion might be required if Java bytecode were not linear
> but rather a tree or some other complicated form.

According to my understanding, an indirect threaded interpreter uses the original bytecode stream. It's indirect because the handler address must be looked up via the bytecode. A direct threaded interpreter removes this step by placing the actual handler addresses in the rewritten instruction stream.

For what it's worth, JamVM supports both direct and indirect threading, with static and dynamic stack-caching respectively. I seem to recall an average code size increase of ~4x for JamVM's internal instruction format, but I'll need to recheck my figures to be sure. Note that this assumes a 32-bit architecture. Handler addresses are twice the size on a 64-bit machine, and the code increase over bytecodes is therefore larger.

Rob.
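[Rob's distinction can be sketched in C. This is a hypothetical mini-VM, not JamVM code: the opcodes are invented, and the dispatch uses the GCC/Clang labels-as-values extension. The translation loop is where the code-size expansion he mentions comes from: each 1-byte opcode becomes a sizeof(void*)-sized slot, i.e. 4x per opcode on a 32-bit machine and 8x on a 64-bit one.]

```c
#include <assert.h>
#include <stddef.h>

enum { D_PUSH1, D_ADD, D_HALT };   /* invented opcodes, illustration only */

/* Direct threading: the bytecode is rewritten once into an array of
   handler addresses; dispatch then jumps through the stored address,
   with no per-instruction table lookup. */
static int run_direct(const unsigned char *bc, size_t n)
{
    void *handler[] = { &&push1, &&add, &&halt };
    void *code[64];                 /* rewritten stream: one pointer per opcode */
    for (size_t i = 0; i < n && i < 64; i++)
        code[i] = handler[bc[i]];   /* this step is the 4-8x expansion */

    int stack[16], sp = 0;
    void **ip = code;
#define NEXT goto *(*ip++)          /* no lookup: the address is the instruction */
    NEXT;
push1:  stack[sp++] = 1; NEXT;
add:    sp--; stack[sp - 1] += stack[sp]; NEXT;
halt:   return stack[sp - 1];
#undef NEXT
}
```

[The trade-off in this sketch is exactly the one under discussion: direct threading saves one memory load per dispatch at the cost of a pointer-sized rewritten copy of the method.]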
Re: Other interesting papers and research
Hi Dave,

> From: David P Grove <[EMAIL PROTECTED]>
>
> [EMAIL PROTECTED] wrote on 06/05/2005 10:48:29 PM:
>
> > - The machine code concatenating technique consumes much memory.
> >   In my experience, generated machine code is about 10 times larger
> >   than the original instructions in Java bytecode.
> >
> > In the paper, the authors have not mentioned memory consumption of the
> > technique. We cannot guess how much it is precisely, but it could be a
> > big drawback. Yes, we can say the same for the approach of taking a
> > baseline compiler instead of an interpreter (like Jikes RVM). Memory
> > consumption of the baseline compiler of Jikes RVM is very interesting.
>
> It's platform dependent of course, but on IA32 isn't too horrible. For
> example, running SPECjvm98 we see a 6.23x expansion from the Jikes RVM
> baseline compiler machine code bytes over bytecode bytes.

Thanks for giving us such a useful number. It looks reasonable.

> One thing to note is that a threaded interpreter would see something
> like a 2-4x expansion over "normal" bytecodes when it converts from
> bytecodes to its internal form (arrays of function pointers).

Direct threading interpreters like the JDK's work on plain Java bytecode and do not need to expand normal bytecode instructions. Such expansion might be required if Java bytecode were not linear but rather a tree or some other complicated form. Then,

> So, a 6x expansion is probably only roughly 2x worse than some
> interpreted systems would actually see in practice.

we have to just say that the baseline compiler of Jikes RVM generates 6x larger native code than the original bytecode instructions.

For a Java-written JVM, it seems natural to have a baseline compiler instead of an interpreter; it looks complicated to write an interpreter for a JVM implemented in Java. We would hope that the architecture of a JVM (e.g. interpreter vs. baseline compiler) is independent of the language used to implement a given part of the JVM, but there seems to be an implication between them. Any comment?

Kazuyuki Shudo  [EMAIL PROTECTED]  http://www.shudo.net/
Re: Other interesting papers and research
[EMAIL PROTECTED] wrote on 06/05/2005 10:48:29 PM:

> - The machine code concatenating technique consumes much memory.
>   In my experience, generated machine code is about 10 times larger
>   than the original instructions in Java bytecode.
>
> In the paper, the authors have not mentioned memory consumption of the
> technique. We cannot guess how much it is precisely, but it could be a
> big drawback. Yes, we can say the same for the approach of taking a
> baseline compiler instead of an interpreter (like Jikes RVM). Memory
> consumption of the baseline compiler of Jikes RVM is very interesting.

It's platform dependent of course, but on IA32 it isn't too horrible. For example, running SPECjvm98 we see a 6.23x expansion from the Jikes RVM baseline compiler machine code bytes over bytecode bytes. One thing to note is that a threaded interpreter would see something like a 2-4x expansion over "normal" bytecodes when it converts from bytecodes to its internal form (arrays of function pointers). So, a 6x expansion is probably only roughly 2x worse than some interpreted systems would actually see in practice.

You can get this data out of Jikes RVM for your platform/program with -X:vm:measureCompilation=true.

--dave
Re: Other interesting papers and research
Hi Steve and all,

| The approach of using C Compiler generated code rather than writing a
| full compiler appeals to me:
| http://www.csc.uvic.ca/~csc586a/papers/ertlgregg04.pdf

> From: Steve Blackburn <[EMAIL PROTECTED]>
> Date: Tue, 24 May 2005 21:08:05 +1000
>
> > > They automatically build themselves simple JIT backends (by
> > > extracting fragments produced by the ahead of time compiler). This
> > > sounds like a great way to achieve portability while achieving
> > > better performance than a conventional interpreter.
> >
> > I guess it's a bit better or just comparable with a good interpreter.
>
> They say it is a lot better: "speedups of up to 1.87 over the fastest
> previous interpreter based technique, and performance comparable to
> simple native code compilers."

Their technique may be reasonable in some cases, but it is better to be aware of the following:

- On an Athlon processor, the speedup (gforth-native over gforth-super)
  is smaller than on PowerPC: between 1.06 and 1.49.

- The machine code concatenating technique consumes much memory. In my
  experience, generated machine code is about 10 times larger than the
  original instructions in Java bytecode.

  In the paper, the authors have not mentioned memory consumption of the
  technique. We cannot guess how much it is precisely, but it could be a
  big drawback. Yes, we can say the same for the approach of taking a
  baseline compiler instead of an interpreter (like Jikes RVM). Memory
  consumption of the baseline compiler of Jikes RVM is very interesting.

The next point does not reduce the value of the technique, but is better to know:

- The compared interpreter (gforth-super) could be improved by other
  techniques, including stack caching. This means that their machine
  code concatenating technique may also benefit from those remaining
  techniques.

> > In 1998, I wrote such a JIT compiler, concatenating code fragments
> > generated by GCC for each JVM instruction.
>
> Very interesting!
> > Unfortunately, the JIT was slightly slower than an interpreter in
> > Sun's Classic VM. The interpreter was written in x86 assembly language
> > and implements dynamic stack caching with 2 registers and 3 states.
> > It performs much better than the previous interpreter written in C.
> >
> > Then I rewrote the JIT.

In reality, my old JIT compiler in C cannot be strictly compared with the JDK interpreter in assembly language. Let me try to clarify the situation:

                                 inst. execution            stack handling
  (1) Ertl&Gregg's gforth-super  direct threading      <2>  TOS in a register <2>
  (2) Ertl&Gregg's gforth-native concat'd machine code <1>  TOS in a register <2>
  (3) JDK interpreter in C       switch threading      <3>  memory            <3>
  (4) JDK interpreter in asm     direct threading      <2>  stack caching     <1>
  (5) Shudo's first JIT          concat'd machine code <1>  memory            <3>
  (6) Shudo's lightweight JIT    concat'd machine code <1>  stack caching     <1>

Gforth-super (1) is Ertl&Gregg's fastest interpreter compared against the machine code concatenating technique (2). (3) is the only interpreter that an ancient JDK 1.0.2 had. (4) is the interpreter of JDK 1.1, written in assembly language. (5) is my first JIT compiler, which concatenates machine code for each VM instruction. (6) is shuJIT, the JIT compiler rewritten after (5).

<1>-<3> give the order of expected performance. Machine code concatenation as proposed in Ertl&Gregg's paper is marked with <1> because it is the best in this table for VM instruction execution. (4) and (6) use 2 machine registers for stack caching, while Ertl&Gregg's implementations use one register to cache the top of stack (TOS). Ertl&Gregg's paper says that gforth-native (2) provided speedups of up to 1.87 over gforth-super (1).

I wrote in a previous posting that (5) was slower than (4), but (5) was also inferior to (4) in stack handling. That may be the reason for the slowness, so I cannot conclude that the machine code concatenating technique is slower than a good interpreter like (4).
The following are other facts I have experienced:

- Shudo's lightweight JIT (6) generates native code about 10 times larger
  than the original Java bytecode instructions. This may be a drawback of
  an approach that takes a baseline compiler instead of an interpreter.
  Does Ertl&Gregg's machine code concatenating technique suffer from
  this? How about the baseline compiler of Jikes RVM?

- The JDK 1.1 interpreter written in assembly (4) executed a numerical
  benchmark, Linpack, about twice as fast as the previous interpreter in
  C (3). A lesson here is that interpreters are often regarded as simply
  slow, but a fast one and a slow one differ greatly in performance.

- Register utilization is still important, even for an interpreter.
  Interpreters (1) and (4) and lightweight JIT compilers (2) and (6)
  utilize one or two machine registers to cache values around the TOS.
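[The TOS-in-a-register idea from the table can be sketched in C. This is a hypothetical mini-interpreter with invented opcodes, not code from any of the systems above; a real implementation such as JDK 1.1's is written in assembly and caches two values across three states, whereas this shows the simplest one-register, one-state scheme (roughly Ertl&Gregg's). Keeping the top of stack in a local variable lets the compiler hold it in a machine register, so a binary operation touches stack memory once instead of three times.]

```c
#include <assert.h>

enum { S_PUSH1, S_ADD, S_HALT };   /* invented opcodes, illustration only */

/* One-register TOS caching: `tos` holds the current top of stack and
   can live in a machine register; `stack`/`sp` hold everything below. */
static int run_tos(const unsigned char *pc)
{
    int stack[16], sp = 0;
    int tos = 0;                   /* cached top of stack (starts as a dummy) */
    for (;;) {
        switch (*pc++) {
        case S_PUSH1:
            stack[sp++] = tos;     /* spill old TOS (a dummy slot when empty) */
            tos = 1;
            break;
        case S_ADD:
            tos += stack[--sp];    /* one memory read, versus read-read-write */
            break;
        case S_HALT:
            return tos;
        }
    }
}
```

[Without the cache, ADD would read both operands from memory and write the result back; with it, only the value below the top is touched, which is why register utilization matters even for an interpreter.]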
Re: Other interesting papers and research
[EMAIL PROTECTED] wrote:

> > They automatically build themselves simple JIT backends (by extracting
> > fragments produced by the ahead of time compiler). This sounds like a
> > great way to achieve portability while achieving better performance
> > than a conventional interpreter.
>
> I guess it's a bit better or just comparable with a good interpreter.

They say it is a lot better: "speedups of up to 1.87 over the fastest previous interpreter based technique, and performance comparable to simple native code compilers. The effort required for retargeting our implementation from the 386 to the PPC architecture was less than a person day."

> In 1998, I wrote such a JIT compiler, concatenating code fragments
> generated by GCC for each JVM instruction.

Very interesting!

> Unfortunately, the JIT was slightly slower than an interpreter in Sun's
> Classic VM. The interpreter was written in x86 assembly language and
> implements dynamic stack caching with 2 registers and 3 states. It
> performs much better than the previous interpreter written in C.
>
> Then I rewrote the JIT.

It would be interesting to hear your perspective on Ertl & Gregg's approach. Did they do something you had not done? Do they have any particular insight? You are in an excellent position to make a critical assessment of their work.

> I am not very sure which is better for us, having a portable and so-so
> baseline compiler or a good interpreter which is possibly less portable
> than the compiler. There will be a trade off between memory consumption,
> portability and so on.

Ideally we will have both as components and the capacity to choose either or both depending on the build we are targeting.

Cheers,

--Steve
Re: Other interesting papers and research
> From: Steve Blackburn <[EMAIL PROTECTED]>
>
> [EMAIL PROTECTED] wrote:
>
> > The approach of using C Compiler generated code rather than writing a
> > full compiler appeals to me:
> > http://www.csc.uvic.ca/~csc586a/papers/ertlgregg04.pdf
> >
> > I am curious on how well the approach performs compared to existing
> > JITs.
>
> They automatically build themselves simple JIT backends (by extracting
> fragments produced by the ahead of time compiler). This sounds like a
> great way to achieve portability while achieving better performance
> than a conventional interpreter.

I guess it's a bit better or just comparable with a good interpreter.

In 1998, I wrote such a JIT compiler, concatenating code fragments generated by GCC for each JVM instruction. Unfortunately, the JIT was slightly slower than an interpreter in Sun's Classic VM. The interpreter was written in x86 assembly language and implements dynamic stack caching with 2 registers and 3 states. It performs much better than the previous interpreter written in C. Then I rewrote the JIT.

I am not very sure which is better for us, having a portable and so-so baseline compiler or a good interpreter which is possibly less portable than the compiler. There will be a trade off between memory consumption, portability and so on.

Kazuyuki Shudo  [EMAIL PROTECTED]  http://www.shudo.net/
RE: Other interesting papers and research
> Summing up, I support the idea of a java/bytecode to C compiler that can
> be bundled with gcc. As stated we would gain portability and we can use
> all facilities provided by gcc.

To me it sounds a bit like gcj... In order to use a low level bytecode as an intermediate representation, LLVM bytecode (which can be emitted by a modified version of gcc [may be useful for rapid prototyping]) could be a realization of your ideas, IMHO.

Another way I see it would be to use a simplified version of Java bytecode to represent low-level instructions (as the Squeak Smalltalk VM works: only static methods and a restricted class set are supported; this way it is possible to run the VM in itself, and also to machine-generate C code from the VM sources. In modern days we might consider generating LLVM bytecode, or simply use gcj! For this kind of architecture, for instance [1]).

Regards,
RB

[1] not sure about the implications in terms of speed, though. In my point of view it is better, anyway, to have a working system first and then hook bestial optimizations in later [once the execution model is simple enough, or rather close enough to a reasonable machine abstraction, this might be somewhat easier/less difficult]

PS: I am favorable to using a "java in java" VM as a plugin to such a system (or rather, have a "stock" VM with fast startup times as bootstrap and then a "very optimizing" plugin, which does not need to tackle anything but code generation itself, if this is really possible).

-----Original Message-----
From: Ariel Sabiguero Yawelak [mailto:[EMAIL PROTECTED]
Sent: Monday, May 23, 2005 5:06 PM
To: harmony-dev@incubator.apache.org
Subject: Re: Other interesting papers and research

Other interesting things that can be achieved are some sorts of high performance "tuning" aspects, which are very interesting, and using gcc's power might be more interesting than redoing it from scratch, either at the beginning of the current project, or maybe forever.

An adequate "bundle" of gcc and harmony might produce JIT/WAT java/bytecode compilation. Moreover, the compilation parameters might be "tuneable" by the JVM administrator, choosing between compilation speed, compilation performance, memory footprint, etc. Apart from code reuse, there is also an adequate sort of abstraction that is good here, and by concentrating on this we avoid discussing machine level details, as we all agree that GCC is portable, performant and adequate.

Summing up, I support the idea of a java/bytecode to C compiler that can be bundled with gcc. As stated we would gain portability and we can use all facilities provided by gcc.

Ariel

Archie Cobbs wrote:
> [EMAIL PROTECTED] wrote:
>
> > The approach of using C Compiler generated code rather than writing a
> > full compiler appeals to me:
> > http://www.csc.uvic.ca/~csc586a/papers/ertlgregg04.pdf
> >
> > I am curious on how well the approach performs compared to existing
> > JITs.
>
> I'm admittedly biased, but the approach of using the C compiler has
> some good benefits, mainly in portability. This is especially true for
> architectures like x86 that have a complicated instruction set, where
> optimization is a subtle art. Though JC uses the C compiler as a WAT
> instead of a JIT, it is very portable (to any architecture that GCC
> targets) as a result. To the extent that portability is a goal, this
> might make sense as an approach to take, at least initially.
>
> -Archie
>
> __
> Archie Cobbs * CTO, Awarix * http://www.awarix.com
Re: Other interesting papers and research
Other interesting things that can be achieved are some sorts of high performance "tuning" aspects, which are very interesting, and using gcc's power might be more interesting than redoing it from scratch, either at the beginning of the current project, or maybe forever.

An adequate "bundle" of gcc and harmony might produce JIT/WAT java/bytecode compilation. Moreover, the compilation parameters might be "tuneable" by the JVM administrator, choosing between compilation speed, compilation performance, memory footprint, etc. Apart from code reuse, there is also an adequate sort of abstraction that is good here, and by concentrating on this we avoid discussing machine level details, as we all agree that GCC is portable, performant and adequate.

Summing up, I support the idea of a java/bytecode to C compiler that can be bundled with gcc. As stated we would gain portability and we can use all facilities provided by gcc.

Ariel

Archie Cobbs wrote:
> [EMAIL PROTECTED] wrote:
>
> > The approach of using C Compiler generated code rather than writing a
> > full compiler appeals to me:
> > http://www.csc.uvic.ca/~csc586a/papers/ertlgregg04.pdf
> >
> > I am curious on how well the approach performs compared to existing
> > JITs.
>
> I'm admittedly biased, but the approach of using the C compiler has
> some good benefits, mainly in portability. This is especially true for
> architectures like x86 that have a complicated instruction set, where
> optimization is a subtle art. Though JC uses the C compiler as a WAT
> instead of a JIT, it is very portable (to any architecture that GCC
> targets) as a result. To the extent that portability is a goal, this
> might make sense as an approach to take, at least initially.
>
> -Archie
>
> __
> Archie Cobbs * CTO, Awarix * http://www.awarix.com
Re: Other interesting papers and research
Archie Cobbs wrote:

> [EMAIL PROTECTED] wrote:
>
> > The approach of using C Compiler generated code rather than writing a
> > full compiler appeals to me:
> > http://www.csc.uvic.ca/~csc586a/papers/ertlgregg04.pdf
> >
> > I am curious on how well the approach performs compared to existing
> > JITs.
>
> I'm admittedly biased, but the approach of using the C compiler has
> some good benefits, mainly in portability.

As far as I can tell, the technical insight in this paper has nothing to do with C per se. It has to do with having a portable ahead of time compiler (be it C or Java). The idea of leveraging a portable ahead of time compiler is something that all interpreters do. The insight here is to do it far more aggressively. They automatically build themselves simple JIT backends (by extracting fragments produced by the ahead of time compiler). This sounds like a great way to achieve portability while achieving better performance than a conventional interpreter.

So long as we have a portable Java WAT compiler at our disposal (gcj), I think we can apply this neat idea independent of whether we're using C, C++ or Java (or Fortran for that matter).

--Steve
Re: Other interesting papers and research
[EMAIL PROTECTED] wrote:

> The approach of using C Compiler generated code rather than writing a
> full compiler appeals to me:
> http://www.csc.uvic.ca/~csc586a/papers/ertlgregg04.pdf
>
> I am curious on how well the approach performs compared to existing
> JITs.

I'm admittedly biased, but the approach of using the C compiler has some good benefits, mainly in portability. This is especially true for architectures like x86 that have a complicated instruction set, where optimization is a subtle art. Though JC uses the C compiler as a WAT instead of a JIT, it is very portable (to any architecture that GCC targets) as a result. To the extent that portability is a goal, this might make sense as an approach to take, at least initially.

-Archie

__
Archie Cobbs * CTO, Awarix * http://www.awarix.com
Other interesting papers and research
Thanks to the JAM post: http://www.csc.uvic.ca/~csc586a/papers/index.html

In particular this one: http://www.csc.uvic.ca/~csc586a/papers/ertlgregg04.pdf

The approach of using C Compiler generated code rather than writing a full compiler appeals to me. I am curious on how well the approach performs compared to existing JITs.

-Andy

--
Andrew C. Oliver
SuperLink Software, Inc.
Java to Excel using POI
http://www.superlinksoftware.com/services/poi
Commercial support including features added/implemented, bugs fixed.