Hi everybody

The attached emails are from a private discussion I've been having with
Gilbert Carl Herschberger, Todd Miller and John Leuner.  The discussion
is about us working on a combined JVM project.  This discussion was one
of the reasons for my email "The JOS Project?" to the general mailing
list.

Robert Fitzsimons
[EMAIL PROTECTED]



Hello Todd, Gilbert

As I hope you two know, I've been playing around with writing my own JVM
for the last few months.  It began as an experiment when I got a bit of
writer's block with RJK; its goals were to share as much information as
possible between multiple Java processes and to be fast and efficient.  It
is currently at the stage where class files can be loaded, simple code
executed and objects created.

Now I've stopped working on one code base and restarted with a new one
about three times so far.  I have found this a very good way to improve
the quality of the code and to include the much better ideas you come up
with after you've written a piece of code.  Now I've reached this stage
again, and this is where you two come in.

We have all been working on our own JVM-related projects: decaf, Pure
Reflection, my JVM, etc.  As part of the architecture group we've also
had a lot of really good ideas, including multiple Java processes,
BCNI, MPCL, etc.  The issue is that, with all the work that's going on
and all the ideas we've had, there still isn't much to show for it.

So let's take all we've learned and all the ideas, and start again.  But
this time as a group!  Let's come up with a design using all our current
ideas that allows for future expansion, then write it so that it can be
fast, efficient, flexible and portable.

So what do you guys think: is it worth doing, and can we do it?

I'm willing to listen to all ideas and issues on design, implementation,
programming language, coding style, etc.

Robert Fitzsimons
[EMAIL PROTECTED]

PS.  I'm making my current code base available at
<URL:http://www.273k.net/jos/jvm.20000911.tar.gz>, so if you have any
questions or comments give me a shout.




As you may know, I have suggested we export existing components to a VMKit.
We can build common vm-related components so that each of us can build our
own virtual machine. We are not limited to building a single virtual
machine together.

Are you again suggesting that we work together on a single virtual machine?

I want a virtual machine that runs on jJOS, Linux and Windows. I would like
most of it to be written in C/C++. I don't care if some of it is written in
x86 assembler. I don't care if it is compiled from Java source code.

We should learn from the mismanagement of the decaf project. A community
won't volunteer to work on a virtual machine that satisfies the
requirements of only one person. No, a community finds it easy to work on
reusable components that will satisfy the requirements of everyone.




At 03:53 AM 9/13/00 +0000, you wrote:
>Yes.  I think it would be to the project's advantage for there to be a
>single JVM.  Although it's nice to have more, I don't think it's very
>practical at this moment in time.

I believe it might be easier to attract virtual machine designers if we
support all virtual machine designs. Diversity is good. While /we/ don't
have to form a team to build many virtual machines at once, we should
expect our base virtual machine to be customizable and specializable. I
embrace the idea of multiple virtual machines through extension of a base
virtual machine. Our goal should be to build that base.

>The VMKit is a good idea, we just need to figure out how to accomplish
>it.  We should try and write each component to be reusable, but doing
>this and writing two or more JVMs at the same time is just a waste.

When each member of a VMKit group is expected to write their own virtual
machine, it forces certain issues. It forces us to think of each component
as a plug-in. If I write a component, for example, that you can plug into
your custom virtual machine, that's a good thing. Isn't it?

By opening up the expectations, many more virtual machine designers will be
attracted to the VMKit group...because they can get the specialized virtual
machine they always wanted. If a group wants to work on a single virtual
machine, that's no problem. It's a start.

I am fully opposed to the one-size-fits-all approach. Members of a VMKit
group are not required to work on a group virtual machine if they don't
want to. They can contribute vm-related components.

Also, we can salvage vm-related components from other open source projects.

>I wouldn't say that decaf was mismanaged; it just couldn't attract many
>long term developers like myself, and for me that was more due to JJOS
>than anything else.

This might be significant. Like you, I do not like jJOS, the decaf-specific
kernel. I think you mean jJOS (the kernel) and not JJOS (the kernel and
virtual machine). The jJOS kernel is so decaf-specific that it invokes
decaf_main(). It provides no well-defined kernel interface. It uses C++ and
classes, but it lacks object-oriented design in its specialization for
Etherboot and GRUB.

What else is wrong with jJOS? Can't we replace jJOS with something better?
I think so. We can salvage parts of jJOS to create a better kernel.

Reading the Linux Programming White Papers, I see that the Linux project
organized its header files properly, leading to a strong parallel
development process. The header files for jJOS and decaf are organized
horribly.

Inside Pure Reflection for C++, I tried to distinguish between public
classes and private classes. I also put classes in class libraries. jJOS
and decaf do not distinguish between public and private classes, making all
classes public. There isn't a single class library in the entire project!

For example, many calculator libraries can implement the public calculator
interface. When a "calculator" is compiled into an independent library, it
can be plugged into a custom virtual machine. Only the calculator interface
is exposed to a virtual machine. Only the calculator interface must be
stored in a public header file. When I pre-compile the calculator library
and distribute it in a binary edition, any specialized tools needed to
build the calculator library are optional, not required. That is
encapsulation.

An enhanced calculator can extend the calculator interface. That is
inheritance. A debugging version of the calculator could log each
calculator request. A remote calculator can run on another CPU. A backup
calculator can be compared to a new one. Three calculators can be used
simultaneously to reduce the chance of a math error at runtime.

You could choose a calculator from a list of calculators at runtime. That
is a good use of polymorphism.

For each component in a virtual machine, there is a similar story.




        I'm very interested in your ideas about how to implement multiple
java processes.  Right now, I have two main priorities.  The greater
priorities are first to write a JVM amenable to integration with a class
library, and then perform that integration; and second, for that JVM to
support multiple java processes.  BCNI is not as important, and though I
haven't thought it through as much, should be a well-encapsulated change
to make.  (That is, only invokenative should have to be changed.  And we
have to do some thinking about handling exceptions in the 'native'
bytecode, etc.)

        I was thinking about architecture questions, and it occurred to me
that the idea of converting bytecode into an array of function calls has
the benefit that it becomes much simpler to replace parts of it with
JIT-compiled code or accelerated interpretation.  (For instance, many
common sequences are longer than they 'need' to be because of the operand
stack.  Recognizing what those sequences do allows them to be shortened
and the stack avoided.)

        What about Jay Lepreau?  ('kissme')

-_Quinn









>       I'm very interested in your ideas about how to implement multiple
> Java processes.  Right now, I have two main priorities.  The greater
> priorities are first to write a JVM amenable to integration with a class
> library, and then perform that integration; and second, for that JVM to
> support multiple Java processes.  BCNI is not as important, and though I
> haven't thought it through as much, should be a well-encapsulated change
> to make.  (That is, only invokenative should have to be changed.  And we
> have to do some thinking about handling exceptions in the 'native'
> bytecode, etc.)

Basically the goal for multiple Java processes is to save memory by not
having multiple copies of the same information in memory.  So what we
need to decide is what information can be shared and what can't.

It's easier to start with what can't be shared between processes:
* instance data
* static data
* thread data (stack frames, ip, etc)

And what can be shared between processes:
* class data
* method data
* field data
* interface data

I haven't listed constant data or string data because these are a little
fuzzy.

With this list in your mind you can now start coding, but you have to
make sure that the shared data has no pointers to the non-shared data.

+---------------------+-------------+
| Object <---> Class -+-> ClassData |
+---------------------+-------------+

It really is that easy to create the data structures for multiple
processes.  The problem is writing an execution engine: the code
needs to be written so that it doesn't reference the non-shared data
directly (this is a lot easier to do with an interpreter).  This is what
I've spent the last month or so working on.

There are a lot of other things as well, I can't really explain them in
words but the code in my JVM covers most of them.

>       I was thinking about architecture questions, and it occurred to me
> that the idea of converting bytecode into an array of function calls has
> the benefit that it becomes much simpler to replace parts of it with
> JIT-compiled code or accelerated interpretation.  (For instance, many
> common sequences are longer than they 'need' to be because of the operand
> stack.  Recognizing what those sequences do allows them to be shortened
> and the stack avoided.)

Do you mean converting each bytecode opcode into a function call?  The call
overhead is just too great for that to work.  I think we need to forget
about interpreters and go for a native compiler or JIT.  With JOS we
have the best chance in the world to write the fastest JVM: the whole OS
is written to run Java bytecode.

Though recognizing common bytecode sequences is a good place to start
improving speed.

>       What about Jay Lepreau?  ('kissme')

I forwarded the same email to him a day or so after I sent it to you and
Gilbert.

Robert Fitzsimons
[EMAIL PROTECTED]




>Basically the goal for multiple Java processes is to save memory by not
>having multiple copies of the same information in memory.

The goal is to save "memory" by /any/ mechanism, not just multiple bytecode
processes. Multiple bytecode processes are not required; they are only one
approach to this problem. Unfortunately, it seems to be the approach least
likely to succeed.

Most of the potential to save "memory" seems to come from specifically
optimizing a virtual machine to use the kernel's virtual memory manager
properly. Bytecode in virtual memory should be marked "read-only", not
"read-write". It should be stored in a system-wide bytecode cache.

Think about it. The potential to save memory is limited. An application
that has 10MB of object data will always have 10MB of object data. Most of
the potential to save memory is found in the duplication of raw
bytecode. Is it necessary to define multiple Java processes in order to
save memory? No. Are multiple Java processes the only way to save memory?
No. Could there be a simpler alternative that saves just as much memory, or
more? Yes. Is there a platform-independent solution? Yes.

Without a bytecode cache, if there are 4MB of the same read/write bytecode
per virtual machine and there are 100 virtual machines, that's 400MB of
wasted swap space. With a bytecode cache, the same scenario would require
4MB
of swap space. By making bytecode a resource, it requires no swap space.

So much effort has been consumed (wasted?) by the theory of multiple
bytecode processes. The theory of multiple bytecode processes has been
adopted and assumed by some without a lot of thought. It is a theory that
may have never been challenged. I'll challenge it. By comparing multiple
bytecode processes within a virtual machine to a highly optimized virtual
machine, I have concluded that multiple bytecode processes within the same
virtual machine does not "save" memory. Instead, it adds far too much
complexity to the internal workings of a virtual machine. When the solution
is reduced, it requires just as much memory to run multiple bytecode
processes as it does to run multiple virtual machines. It is six of one,
half dozen of the other.

Add a bytecode cache and bytecode resource to an off-the-shelf virtual
machine and this saves potentially all of the memory that might be saved by
multiple bytecode processes.




Hi Gilbert

I don't think there's anything in your email that Todd and John couldn't
see, so I've included them in my reply.

I'm not sure what you mean by "bytecode cache".  Could you explain it in
more detail?  Are you talking about this email from last year?

<URL:http://jos.org/pipermail/arch/1999-November/000325.html>

# What is a bytecode cache? There are plenty of options. You might
# install a bytecode cache servlet in your Java-enabled HTTP server.
# Configure it for cache size and trusted Internet websites and you're
# done. Everyone on the network can use all your applications.
#
# You might install package files on a static HTTP server. You might
# install the bytecode cache daemon on a server and skip the HTTP thing.
# What does it really mean to install an application. For everyone that
# just wants to take it for a test drive, they just run it. Distributing
# the bytecode -- that is what a network is for, isn't it?

What you talk about in the above email is a way of caching information
so that you don't have to download it again.  This does not save any
memory at the JVM level.

If you look at the classfile data structure you will see that it is
optimized for size.  This means it has to be converted into an internal
data structure before it can be used by a JVM [1].  Only having to load
this internal data structure once is where you get the saving when you use
multiple Java processes.

If the "bytecode cache" contains this internal data structure then it
might save memory but not otherwise.

Robert Fitzsimons
[EMAIL PROTECTED]

1.  Any developer that writes a JVM that uses the classfile directly as
its internal data structure should be shot IMHO.

On Tue, Sep 19, 2000 at 09:49:41AM -0400, Gilbert Carl Herschberger II wrote:



Hmm. Shouldn't we discuss this openly, on the kernel mailing list?

At 03:06 AM 9/20/00 +0000, you wrote:
>I'm not sure what you mean by "bytecode cache".  Could you explain it in
>more detail?

>Are you talking about this email from last year?
>
><URL:http://jos.org/pipermail/arch/1999-November/000325.html>

No, no. That bytecode cache is a network bytecode cache. It enables a
network to cache bytecode at an HTTP, SQL, application or proxy server.

The kernel bytecode cache is system-wide and enables a kernel to cache
bytecode for multiple virtual machines and/or multiple bytecode processes.

>What you talk about in the above email is a way of caching information
>so that you don't have to download it again.  This does not save any
>memory at the JVM level.

A network bytecode cache reduces the amount of time required to download
bytecode; it does not save memory.

In contrast, a kernel bytecode cache saves memory; it does not reduce the
amount of time required to download bytecode. This is the kind of bytecode
cache I was describing.

>If you look at the classfile data structure you will see that it is
>optimized for size.  This means it has to be converted into an internal data
>structure before it can be used by a JVM [1].  Only having to load this
>internal data structure once is where you get the saving when you use
>multiple Java processes.

Let's see if I understand this correctly. You say that the classfile data
structure is optimized for size. You also agree that our goal is the
conservation of memory, a very precious resource. If the classfile is
already optimized for size and size matters, we can use the classfile data
structure to save memory.

>If the "bytecode cache" contains this internal data structure then it
>might save memory but not otherwise.

You're saying that an internal data structure cache is the /only/ way to
save memory; but it's not. Such a cache can contain bytecode, internal
data structures, or both and still save memory.

There is more than one way to construct a kernel bytecode cache. While
you may desire a cache that throws away the original bytecode and saves
internal data structures, I would like it to keep the original bytecode.

I believe there are many reasons to save the original bytecode. Here are a
few.

1. The bytecode cache is vm-independent. An internal data structure is
always specific to a virtual machine. The internal data structure should be
internal to a virtual machine. It should not be exposed to a kernel and/or
other virtual machines. Multiple virtual machines should not share a common
internal data structure. If a virtual machine is optimized for speed, its
internal data structure is optimized for speed. If a virtual machine is
optimized for size, its internal data structure is optimized for size. And
so on.

2. Two classes are able to modify the state of an object if they are
equivalent. They are obviously equivalent if they share the same bytecode.
They are also obviously equivalent if their bytecode matches
byte-for-byte. The internal data structure is more difficult, but not
impossible, to compare.

3. Boot classes can be statically linked to a kernel. The java.lang,
java.util, java.io and java.net packages, for example, can be pre-loaded in
a kernel bytecode cache. There are many examples of where this can save
space. When a virtual machine "loads" its boot classes directly from the
kernel, this saves time and opens up the possibility of downloading the
remainder of the standard Java class library from across the network.

For an MPCL-compatible virtual machine, it always "loads" its boot classes
directly from the kernel. Boot classes are guaranteed to be equivalent and
conserve memory in every bytecode process because CLASSPATH is not used for
boot classes. Also, static fields are unique for each primordial class
loader making that part of a bytecode process independent from all other
bytecode processes.

On the other hand, a similar cache is required within a virtual machine. It
is not a bytecode cache, but an internal class structure cache. The
internal class structure cache is part of a virtual machine, not a kernel.
It cannot be safely shared among virtual machines.

The operation of an internal class structure cache seems to be redundant
with the operation of a kernel bytecode cache. Is it? No. Here's why.

If we build four virtual machines, we must build four internal class
structure caches because the internal class structure is always unique to a
virtual machine. It might be a class hierarchy like this:

InternalClassStructureCache
  |
  +-- DecafInternalClassStructureCache
  |
  +-- KaffeInternalClassStructureCache
  |
  +-- JapharInternalClassStructureCache
  |
  +-- KissmeInternalClassStructureCache

There must be four different caches because the internal class structure of
one virtual machine does not match any other.

If we build four virtual machines, we build one kernel bytecode cache, not
four.

KernelBytecodeCache

There seems to be a fixation on the one kernel/one virtual machine
implementation. Why is this? It is not the only way. When decaf, Kaffe,
Japhar and Kissme are running at the same time on one kernel, there is one
kernel/multiple virtual machines. The maximum amount of conservation comes
from integrating the kernel bytecode cache with a virtual memory manager.
Bytecode that is not "in use" does not have to be stored in real memory.

In the future, it is more likely that multiple instances of a virtual
machine will be running on a kernel. It is more likely they will use a
preemptive multitasking kernel. This is also one kernel/multiple virtual
machines.

With John's proof of concept, it is more likely we'll use more of the Linux
kernel. And please don't try to convince me that only one instance of decaf
will ever be running on a kernel. Today, we're using the Linux kernel to
run decaf in host mode. Multiple instances of decaf can be running at the
same time on a Linux kernel. While this might not be ideal, it is real.

>1.  Any developer that writes a JVM that uses the classfile directly as
>its internal data structure should be shot IMHO.

When the stated goal is optimization for size, it seems obvious that a
classfile format is the best choice. It is already optimized for size. It
is undesirable and unnecessary to duplicate the data that is already
stored inside raw bytecode.

1. constant pool (class names, field names, method names, attribute names)
2. field table
3. method table
4. code attribute of a method
5. exception attribute of a method

Measured in bytes, more than 90% of my internal data structure is already
stored appropriately in raw bytecode. I see no need to duplicate 90% in
order to save 10% for each primordial class loader. I'd rather save the 90%.

When the stated goal is optimization for speed, size doesn't matter.
Conservation of memory is not an issue. The code attribute of all methods
can be compiled into machine code. The machine code has to be stored
somewhere in real memory. This is the domain of an internal data structure
for a virtual machine, not a system-wide kernel bytecode cache. For
multiple bytecode processes, classes must be compiled at least once for
each CLASSPATH.

I am doing whatever I can to understand the implications of my stated goal.
I am building a bytecode interpreter optimized for size. I have every
reason to believe the original bytecode is critical to my design. I
continue to combine bytecode resource and bytecode cache to save memory.

