Hi everybody.

The attached emails are from a private discussion I've been having with Gilbert Carl Herschberger, Todd Miller and John Leuner. The discussion is about us working on a combined JVM project. This discussion was one of the reasons for my email "The JOS Project?" to the general mailing list.

Robert Fitzsimons [EMAIL PROTECTED]
Hello Todd, Gilbert.

As I hope you two know, I've been playing around with writing my own JVM for the last few months. It began as an experiment when I got a bit of writer's block with RJK; its goals were to share as much information as possible between multiple Java processes and to be fast and efficient. It is currently at the stage where class files can be loaded, simple code executed and objects created. Now, I've stopped working on one code base and restarted with a new one about three times so far. I have found this a very good way to improve the quality of the code, and to include the much better ideas you come up with after you've written a piece of code.

I've reached this stage again, and this is where you two come in. We have all been working on our own JVM-related projects: decaf, Pure Reflection, my JVM, etc. As part of the architecture group we've also had a lot of really good ideas, including multiple Java processes, BCNI, MPCL, etc. The issue is that with all the work that's going on and all the ideas we've had, there still isn't that much to show for it.

So let's take everything we've learned and all the ideas, and start again. But this time as a group! Let's come up with a design using all our current ideas that allows for future expansion, then write it so that it can be fast, efficient, flexible and portable.

So what do you guys think: is it worth doing, can we do it? I'm willing to listen to all ideas and issues on design, implementation, programming language, coding style, etc.

Robert Fitzsimons [EMAIL PROTECTED]

PS. I'm making my current code base available at <URL:http://www.273k.net/jos/jvm.20000911.tar.gz>, so if you have any questions or comments give me a shout.
As you may know, I have suggested we export existing components to a VMKit. We can build common vm-related components so that each of us can build our own virtual machine. We are not limited to building a single virtual machine together. Are you again suggesting that we work together on a single virtual machine?

I want a virtual machine that runs on jJOS, Linux and Windows. I would like most of it to be written in C/C++. I don't care if some of it is written in x86 assembler. I don't care if it is compiled from Java source code.

We should learn from the mismanagement of the decaf project. A community won't volunteer to work on a virtual machine that satisfies the requirements of only one person. A community finds it easy to work on reusable components that will satisfy the requirements of everyone.
At 03:53 AM 9/13/00 +0000, you wrote:
>Yes. I think it would be to the project's advantage for there to be a
>single JVM. Although it's nice to have more, I don't think it's very
>practical at this moment in time.

I believe it might be easier to attract virtual machine designers if we support all virtual machine designs. Diversity is good. While /we/ don't have to form a team to build many virtual machines at once, we should expect our base virtual machine to be customizable and specializable. I embrace the idea of multiple virtual machines through extension of a base virtual machine. Our goal should be to build that base.

>The VMKit is a good idea, we just need to figure out how to accomplish
>it. We should try and write each component to be reusable, but doing
>this and writing two or more JVMs at the same time is just a waste.

When each member of a VMKit group is expected to write their own virtual machine, it forces certain issues. It forces us to think of each component as a plug-in. If I write a component, for example, that you can plug into your custom virtual machine, that's a good thing. Isn't it? By opening up the expectations, many more virtual machine designers will be attracted to the VMKit group, because they can get the specialized virtual machine they always wanted.

If a group wants to work on a single virtual machine, that's no problem. It's a start. I am fully opposed to the one-size-fits-all approach. Members of a VMKit group are not required to work on a group virtual machine if they don't want to. They can contribute vm-related components. Also, we can salvage vm-related components from other open source projects.

>I wouldn't say that decaf was mismanaged, it just couldn't attract many
>long-term developers like myself, and for me that was more due to JJOS
>than anything else.

This might be significant. Like you, I do not like jJOS, the decaf-specific kernel. I think you mean jJOS (the kernel) and not JJOS (the kernel and virtual machine).
The jJOS kernel is so decaf-specific that it invokes decaf_main(). It provides no well-defined kernel interface. It uses C++ and classes, but it lacks object-oriented design in its specialization for Etherboot and GRUB. What else is wrong with jJOS? Can't we replace jJOS with something better? I think so. We can salvage parts of jJOS to create a better kernel.

Reading the Linux Programming White Papers, I see that the Linux project organized its header files properly, leading to a strong parallel development process. The header files for jJOS and decaf are organized horribly. Inside Pure Reflection for C++, I tried to distinguish between public classes and private classes. I also put classes in class libraries. jJOS and decaf do not distinguish between public and private classes, making all classes public. There isn't a single class library in the entire project!

For example, many calculator libraries can implement a public calculator interface. When a "calculator" is compiled into an independent library, it can be plugged into a custom virtual machine. Only the calculator interface is exposed to a virtual machine. Only the calculator interface must be stored in a public header file. When I pre-compile the calculator library and distribute it in a binary edition, any specialized tools needed to build the calculator library are optional, not required. That is encapsulation.

An enhanced calculator can extend the calculator interface. That is inheritance. A debugging version of the calculator could log each calculator request. A remote calculator can run on another CPU. A backup calculator can be compared to a new one. Three calculators can be used simultaneously to reduce the chance of a math error at runtime. You could choose a calculator from a list of calculators at runtime. That is a good use of polymorphism. For each component in a virtual machine, there is a similar story.
I'm very interested in your ideas about how to implement multiple Java processes. Right now, I have two main priorities: first, to write a JVM amenable to integration with a class library, and then perform that integration; and second, for that JVM to support multiple Java processes. BCNI is not as important, and though I haven't thought it through as much, it should be a well-encapsulated change to make. (That is, only invokenative should have to be changed. And we have to do some thinking about handling exceptions in the 'native' bytecode, etc.)

I was thinking about architecture questions, and it occurred to me that the idea of converting bytecode into an array of function calls has the benefit that it becomes much simpler to replace parts of it with JIT-compiled code or accelerated interpretation. (For instance, many common sequences are longer than they 'need' to be because of the operand stack. Recognizing what those sequences do allows them to be shortened and the stack avoided.)

What about Jay Lepreau? ('kissme')

-_Quinn
> I'm very interested in your ideas about how to implement multiple
> Java processes. Right now, I have two main priorities: first, to write
> a JVM amenable to integration with a class library, and then perform
> that integration; and second, for that JVM to support multiple Java
> processes. BCNI is not as important, and though I haven't thought it
> through as much, it should be a well-encapsulated change to make. (That
> is, only invokenative should have to be changed. And we have to do some
> thinking about handling exceptions in the 'native' bytecode, etc.)

Basically, the goal for multiple Java processes is to save memory by not having multiple copies of the same information in memory. So what we need to decide is what information can be shared and what can't.

It's easier to start with what can't be shared between processes:

* instance data
* static data
* thread data (stack frames, ip, etc.)

And what can be shared between processes:

* class data
* method data
* field data
* interface data

I haven't listed constant data or string data because these are a little fuzzy. With this list in mind you can now start coding, but you have to make sure that the shared data has no pointers to the non-shared data.

+---------------------+-------------+
| Object <---> Class -+-> ClassData |
+---------------------+-------------+

It really is that easy to create the data structures for multiple processes. The problem is with writing an execution engine: the code needs to be written so that it doesn't reference the non-shared data directly (this is a lot easier to do with an interpreter). This is what I've spent the last month or so working on. There are a lot of other things as well; I can't really explain them in words, but the code in my JVM covers most of them.
> I was thinking about architecture questions, and it occurred to me
> that the idea of converting bytecode into an array of function calls has
> the benefit that it becomes much simpler to replace parts of it with
> JIT-compiled code or accelerated interpretation. (For instance, many
> common sequences are longer than they 'need' to be because of the operand
> stack. Recognizing what those sequences do allows them to be shortened
> and the stack avoided.)

Do you mean convert each bytecode opcode into a function call? The call overhead is just too great for that to work. I think we need to forget about interpreters and go for a native compiler or JIT. With JOS we have the best chance in the world to write the fastest JVM: the whole OS is written to run Java bytecode. Though recognizing common bytecode sequences is a good place to start with improving speed.

> What about Jay Lepreau? ('kissme')

I forwarded the same email to him a day or so after I sent it to you and Gilbert.

Robert Fitzsimons [EMAIL PROTECTED]
>Basically the goal for multiple Java processes is to save memory by not
>having multiple copies of the same information in memory.

The goal is to save "memory" by /any/ mechanism, not just multiple bytecode processes. Multiple bytecode processes are not required; they are only one approach to this problem. Unfortunately, it seems to be the approach least likely to succeed.

Most of the potential to save "memory" seems to come from specifically optimizing a virtual machine to use the kernel's virtual memory manager properly. Bytecode in virtual memory should be marked "read-only", not "read-write". It should be stored in a system-wide bytecode cache.

Think about it. The potential to save memory is limited. An application that has 10MB of object data will always have 10MB of object data. Most of the potential to save memory is found in the duplication of raw bytecode. Is it necessary to define multiple Java processes in order to save memory? No. Are multiple Java processes the only way to save memory? No. Could there be a simpler alternative that saves just as much memory, or more? Yes. Is there a platform-independent solution? Yes.

Without a bytecode cache, if there are 4MB of the same read/write bytecode per virtual machine and there are 100 virtual machines, that's 400MB of wasted swap space. With a bytecode cache, the same scene would require 4MB of swap space. By making bytecode a resource, it requires no swap space.

So much effort has been consumed (wasted?) by the theory of multiple bytecode processes. The theory has been adopted and assumed by some without a lot of thought. It is a theory that may never have been challenged. I'll challenge it. By comparing multiple bytecode processes within a virtual machine to a highly optimized virtual machine, I have concluded that multiple bytecode processes within the same virtual machine do not "save" memory. Instead, they add far too much complexity to the internal workings of a virtual machine.
When the solution is reduced, it requires just as much memory to run multiple bytecode processes as it does to run multiple virtual machines. It is six of one, half a dozen of the other.

Add a bytecode cache and bytecode resource to an off-the-shelf virtual machine, and this saves potentially all of the memory that might be saved by multiple bytecode processes.
Hi Gilbert

I don't think there's anything in your email that Todd and John couldn't see, so I've included them in my reply.

I'm not sure what you mean by "bytecode cache". Could you explain it in more detail? Are you talking about this email from last year?

<URL:http://jos.org/pipermail/arch/1999-November/000325.html>

# What is a bytecode cache? There are plenty of options. You might
# install a bytecode cache servlet in your Java-enabled HTTP server.
# Configure it for cache size and trusted Internet websites and you're
# done. Everyone on the network can use all your applications.
#
# You might install package files on a static HTTP server. You might
# install the bytecode cache daemon on a server and skip the HTTP thing.
# What does it really mean to install an application? For everyone that
# just wants to take it for a test drive, they just run it. Distributing
# the bytecode -- that is what a network is for, isn't it?

What you talk about in the above email is a way of caching information so that you don't have to download it again. This does not save any memory at the JVM level. If you look at the classfile data structure you will see that it is optimized for size. This means it has to be converted into an internal data structure before it can be used by a JVM [1]. Only having to load this internal data structure once is where you get the savings when you use multiple Java processes. If the "bytecode cache" contains this internal data structure then it might save memory, but not otherwise.

Robert Fitzsimons [EMAIL PROTECTED]

1. Any developer that writes a JVM that uses the classfile directly as its internal data structure should be shot, IMHO.

On Tue, Sep 19, 2000 at 09:49:41AM -0400, Gilbert Carl Herschberger II wrote:
> >Basically the goal for multiple Java processes is to save memory by not
> >having multiple copies of the same information in memory.
>
> The goal is to save "memory" by /any/ mechanism, not just multiple
> bytecode processes.
Hmm. Shouldn't we discuss this openly, on the kernel mailing list?

At 03:06 AM 9/20/00 +0000, you wrote:
>I'm not sure what you mean by "bytecode cache". Could you explain it in
>more detail? Are you talking about this email from last year?
>
><URL:http://jos.org/pipermail/arch/1999-November/000325.html>

No, no. That bytecode cache is a network bytecode cache. It enables a network to cache bytecode at an HTTP, SQL, application or proxy server. The kernel bytecode cache is system-wide and enables a kernel to cache bytecode for multiple virtual machines and/or multiple bytecode processes.

>What you talk about in the above email is a way of caching information
>so that you don't have to download it again. This does not save any
>memory at the JVM level.

A network bytecode cache reduces the amount of time required to download bytecode; it does not save memory. In contrast, a kernel bytecode cache saves memory; it does not reduce the amount of time required to download bytecode. This is the kind of bytecode cache I was describing.

>If you look at the classfile data structure you will see that it is
>optimized for size. This means it has to be converted into an internal
>data structure before it can be used by a JVM [1]. Only having to load
>this internal data structure once is where you get the savings when you
>use multiple Java processes.

Let's see if I understand this correctly. You say that the classfile data structure is optimized for size. You also agree that our goal is the conservation of memory, a very precious resource. If the classfile is already optimized for size and size matters, we can use the classfile data structure to save memory.

>If the "bytecode cache" contains this internal data structure then it
>might save memory, but not otherwise.

You're saying that an internal data structure cache is the /only/ way to save memory; but it's not. Such a cache can contain bytecode, internal data structures, or both, and still save memory.
There is more than one way to construct a kernel bytecode cache. While you may desire a cache that throws away the original bytecode and saves internal data structures, I would like it to keep the original bytecode. I believe there are many reasons to save the original bytecode. Here are a few.

1. The bytecode cache is vm-independent. An internal data structure is always specific to a virtual machine. The internal data structure should be internal to a virtual machine. It should not be exposed to a kernel and/or other virtual machines. Multiple virtual machines should not share a common internal data structure. If a virtual machine is optimized for speed, its internal data structure is optimized for speed. If a virtual machine is optimized for size, its internal data structure is optimized for size. And so on.

2. Two classes are able to modify the state of an object if they are equivalent. They are obviously equivalent if they share the same bytecode. They are also obviously equivalent if their bytecode matches byte-for-byte. An internal data structure is more difficult, but not impossible, to compare.

3. Boot classes can be statically linked to a kernel. The java.lang, java.util, java.io and java.net packages, for example, can be pre-loaded in a kernel bytecode cache. There are many examples of where this can save space. When a virtual machine "loads" its boot classes directly from the kernel, this saves time and opens up the possibility of downloading the remainder of the standard Java class library from across the network. An MPCL-compatible virtual machine always "loads" its boot classes directly from the kernel. Boot classes are guaranteed to be equivalent and conserve memory in every bytecode process because CLASSPATH is not used for boot classes. Also, static fields are unique for each primordial class loader, making that part of a bytecode process independent from all other bytecode processes.
On the other hand, a similar cache is required within a virtual machine. It is not a bytecode cache, but an internal class structure cache. The internal class structure cache is part of a virtual machine, not a kernel. It cannot be safely shared among virtual machines.

The operation of an internal class structure cache seems to be redundant with the operation of a kernel bytecode cache. Is it? No. Here's why. If we build five virtual machines, we must build five internal class structure caches, because the internal class structure is always unique to a virtual machine. It might be a class hierarchy like this:

InternalClassStructureCache
|
+-- DecafInternalClassStructureCache
|
+-- KaffeInternalClassStructureCache
|
+-- JapharInternalClassStructureCache
|
+-- KissmeInternalClassStructureCache

There must be five different caches because the internal class structure of one virtual machine does not match any other. If we build five virtual machines, we build one kernel bytecode cache, not five:

KernelBytecodeCache

There seems to be a fixation on the one kernel/one virtual machine implementation. Why is this? It is not the only way. When decaf, Kaffe, Japhar and Kissme are running at the same time on one kernel, there is one kernel/multiple virtual machines. The maximum amount of conservation comes from integrating the kernel bytecode cache with a virtual memory manager. Bytecode that is not "in use" does not have to be stored in real memory.

In the future, it is more likely that multiple instances of a virtual machine will be running on a kernel. It is more likely they will use a preemptive multitasking kernel. This is also one kernel/multiple virtual machines. With John's proof of concept, it is more likely we'll use more of the Linux kernel. And please don't try to convince me that only one instance of decaf will ever be running on a kernel. Today, we're using the Linux kernel to run decaf in host mode.
Multiple instances of decaf can be running at the same time on a Linux kernel. While this might not be ideal, it is real.

>1. Any developer that writes a JVM that uses the classfile directly as
>its internal data structure should be shot, IMHO.

When the stated goal is optimization for size, it seems obvious that the classfile format is the best choice. It is already optimized for size. It is undesirable and unnecessary to duplicate the data that is already stored inside raw bytecode:

1. constant pool (class names, field names, method names, attribute names)
2. field table
3. method table
4. code attribute of a method
5. exception attribute of a method

Measured in bytes, more than 90% of my internal data structure is already stored appropriately in raw bytecode. I see no need to duplicate 90% in order to save 10% for each primordial class loader. I'd rather save the 90%.

When the stated goal is optimization for speed, size doesn't matter. Conservation of memory is not an issue. The code attribute of all methods can be compiled into machine code. The machine code has to be stored somewhere in real memory. This is the domain of an internal data structure for a virtual machine, not a system-wide kernel bytecode cache. For multiple bytecode processes, classes must be compiled at least once for each CLASSPATH.

I am doing whatever I can to understand the implications of my stated goal. I am building a bytecode interpreter optimized for size. I have every reason to believe the original bytecode is critical to my design. I continue to combine bytecode resource and bytecode cache to save memory.
