Re: Some questions about the architecture

Rodrigo Kumpera Wed, 19 Oct 2005 14:49:40 -0700

On 10/19/05, Apache Harmony Bootstrap JVM <[EMAIL PROTECTED]> wrote:
>
>
> -----Original Message-----
> From: Rodrigo Kumpera <[EMAIL PROTECTED]>
> Sent: Oct 19, 2005 1:49 PM
> To: harmony-dev@incubator.apache.org, Apache Harmony Bootstrap JVM <[EMAIL PROTECTED]>
> Subject: Re: Some questions about the architecture
>
> On 10/19/05, Apache Harmony Bootstrap JVM <[EMAIL PROTECTED]> wrote:
> >
> > Rodrigo,
> >
> > At some point, _somebody_ has to wait on I/O.  I agree
> > that this is not the most efficient implementation, but one
> > of the advantages it has is that it does not need _any_
> > gc_safepoint() type calls for read or write barriers.
> > I am _definitely_ interested in your suggestions, and
> > I think others will agree with you, but let's get the code
> > up and running as it stands so we can try other approaches
> > and compare what good things they bring to the table
> > instead of, or even in addition to, the existing approach.
>
> I think I have not been clear enough. Safepoints are needed by the
> garbage collector to know when it is safe to stop a given thread (in
> bounded time) for a stop-the-world garbage collection. This has
> nothing to do with read/write barriers.
>
> ---
>
> Notice that in 'jvm/src/jvmcfg.h' there is a JVMCFG_GC_THREAD
> that is used in jvm_run() as a regular thread like any other.
> It calls gc_run() on a scheduled basis.  Also, any time an object
> finalize() is done, gc_run() is possible.  Yes, I treat GC as a
> stop-the-world process, but here is the key:  Due to the lack
> of asynchronous native POSIX threads, there are no safe points
> required.  The only thread is the SIGALRM target that sets the
> volatile boolean in timeslice_tick() for use by opcode_run() to
> test.  <b>This is the _only_ formally asynchronous data structure in
> the whole machine.</b>  (Bold if you use an HTML browser, otherwise
> clutter meant for emphasis.)  Objects that contain no references can
> be GC'd since they are merely table entries.  Depending on how the
> GC algorithm is done, gc_run() may or may not even need to look
> at a particular object.
>
> Notice also that classes are treated in the same way by the GC API.
> If a class is no longer referenced by any objects, it may be GC'd also.
> First, its intrinsic class object must be GC'd, then the class itself.  This
> may take more than one pass of gc_run() to make it happen.
>
> ---


How exactly is the Java thread stack scanned for references in this
scheme? Safepoints are required for two reasons: first, to allow native
threads to pause properly, and second, to make it easier for the garbage
collector to identify what on the stack is a reference and what is not.

The first one is a non-issue in this case, but the second one is, as
precise "Java stack" scanning is required for any moving collector
(e.g. semi-space, train or mark-sweep-compact). The solution for the
second problem is either to have a tagged stack (we tag each slot in the
stack to say whether it is a reference or not) or to generate GC maps for
all bytecodes of a method (memory-wise, this is not practical, and with
JIT'ed code it is even worse).
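
To make the tagged-stack option concrete, here is a minimal sketch in C.
It is not taken from the Harmony code: the names slot_t, tagged_stack_t,
push_int, push_ref and collect_roots are hypothetical, and the fixed
stack depth is an arbitrary choice for illustration. Each slot carries a
one-byte tag, so the collector can pick out references without any GC
maps:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 64          /* hypothetical fixed depth, illustration only */

typedef struct {
    uintptr_t value;            /* primitive bits or an object pointer */
    uint8_t   is_ref;           /* tag: 1 if the slot holds a reference */
} slot_t;

typedef struct {
    slot_t slots[STACK_SLOTS];
    size_t top;                 /* index of the next free slot */
} tagged_stack_t;

/* Push a primitive; the tag marks it as "not a reference". */
void push_int(tagged_stack_t *s, int32_t v) {
    assert(s->top < STACK_SLOTS);
    s->slots[s->top].value = (uintptr_t)(uint32_t)v;
    s->slots[s->top].is_ref = 0;
    s->top++;
}

/* Push an object reference; the tag marks it as a GC root. */
void push_ref(tagged_stack_t *s, void *obj) {
    assert(s->top < STACK_SLOTS);
    s->slots[s->top].value = (uintptr_t)obj;
    s->slots[s->top].is_ref = 1;
    s->top++;
}

/* Copy every tagged reference into out[] (at most max entries) and
 * return how many were stored. A primitive that happens to look like
 * an address is skipped, which is what makes the scan precise. */
size_t collect_roots(const tagged_stack_t *s, void **out, size_t max) {
    size_t n = 0;
    for (size_t i = 0; i < s->top && n < max; i++) {
        if (s->slots[i].is_ref)
            out[n++] = (void *)s->slots[i].value;
    }
    return n;
}
```

The cost is the extra tag per slot and a tag write on every push, which
is the trade-off against generating GC maps.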




>
> For example, as I understand it, JikesRVM implements GC safepoints (the
> points in the bytecode where GC maps are generated) at loop backedges
> and method calls.
>
> > The priorities that I set were (1) get the logic working
> > without resorting to design changes such as multi-threading,
> > then (2) optimize the implementation and evaluate
> > improvements and architectural changes, then (3) implement
> > improvements and architectural changes.  The same goes
> > for the object model using the OBJECT() macro and the
> > 'robject' structure in 'jvm/src/object.h'.  And the CLASS()
> > macro, and the STACK() macro, and other components
> > that I have tried to implement in a modular fashion (see 'README'
> > for a discussion of this issue).  Let's get it working, then look into
> > design changes, even having more than one option available at
> > configuration time, compile time, or even run time, such as is
> > now the case with the HEAP_xxx() macros and the GC_xxx()
> > macros that Robin Garner has been asking about.
> >
> > As to the 'jvm/src/timeslice.c' code, notice that each
> > time that SIGALRM is received, the handler sets a
> > volatile boolean that is read by the JVM inner loop
> > in 'while ( ... || (rfalse == pjvm->timeslice_expired))'
> > in 'jvm/src/opcode.c' to check if it is time to give the
> > next thread some time.  I don't expect this to be the
> > most efficient check, but it _should_ work properly
> > since I have unit tested the time slicing code, both
> > the while() test and the setting of the boolean in
> > timeslice_tick().  One thing I have heard on this
> > list is that one of the implementations (I think it was
> > IBM's Jikes?) chose an interpreter
> > over a JIT.  Now that is not directly related to time
> > slicing, but it does mean that a mechanism like what I
> > implemented does not have to have compile-time
> > support.
> >
> > *** How about you JVM experts out there?  Do you have
> >       any wisdom for me on the subject of time slicing
> >       on an outer/inner interpreter loop interpreter
> >       implementation?  And compared to JIT?  Archie Cobb,
> >       what do you think?  How about you lurkers out there? ***
>
> All open source JVMs I checked use native threads. You can take a look
> at what IBM did with the Native POSIX Threading Library (NPTL), as it
> implements userland threads on Linux.
>
> ---
>
> I would be interested in your evaluation of the existing implementation
> against what could be done to implement such an approach.

First, it was not NPTL but NGPT that IBM created; my fault.
The IBM site seems to be offline. From what I remember, it implemented
userland threads in coordination with the kernel for context switching
and scheduling, basically using signals to perform the context switch.

Anyway, I think it would be a good decision to switch soon to a
native thread implementation, as it requires less code to get proper
scheduling and good I/O primitives.
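
For comparison, the signal-driven time slicing discussed earlier in this
thread boils down to a flag set from a SIGALRM handler and polled by the
interpreter loop. The following is only a rough sketch of that idea using
POSIX sigaction() and setitimer(); it is not the code in
'jvm/src/timeslice.c', and the names on_alarm, timeslice_start and
run_one_slice are made up for illustration:

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Set asynchronously by the signal handler, polled by the inner loop. */
volatile sig_atomic_t timeslice_expired = 0;

/* SIGALRM handler: do nothing except raise the flag. */
void on_alarm(int signo) {
    (void)signo;
    timeslice_expired = 1;
}

/* Install the handler and arm a periodic timer of usec microseconds.
 * Returns 0 on success, -1 on failure or if usec is not positive. */
int timeslice_start(long usec) {
    struct sigaction sa;
    struct itimerval tv;

    if (usec <= 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;       /* restart interrupted system calls */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec  = usec / 1000000;
    tv.it_interval.tv_usec = usec % 1000000;
    tv.it_value = tv.it_interval;   /* first tick after one full period */
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Stand-in for the interpreter's inner loop: spin until the slice ends
 * and return how many iterations ran. */
unsigned long run_one_slice(void) {
    unsigned long executed = 0;
    timeslice_expired = 0;
    while (!timeslice_expired) {
        executed++;                 /* a real interpreter would run one opcode here */
    }
    return executed;
}
```

With native threads the kernel scheduler takes over this job, so neither
the timer nor the per-opcode flag test is needed.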


> ---
>
> > As to your question about setjmp/longjmp, I agree that
> > there are other ways to do it.  In fact, I originally used
> > stack walking in one sense to return from fatal errors
> > instead for my original implementation of the heap
> > allocator, which used malloc/free.  If I got an error
> > from malloc(), I simply returned a NULL pointer, which
> > I tested from the calling function.  If I got this error,
> > I returned to its caller with an error, and so on, all the
> > way up.  However, what happens when you have a
> > normally (void) return?  Use TRUE/FALSE instead?
> > Could be.  But the more I developed the code, the
> > harder this became to support.  Therefore, since fatal
> > errors kill the application anyway, I decided to _VASTLY_
> > simplify the code by using what is effectively the OO concept
> > of an exception as available in the 'C' runtime library
> > with setjmp/longjmp.  Notice that many complicated models
> > can end up with irresolvable terminal conditions and that
> > the simplest way to escape is back to a known good state.
> > This is the purpose of setjmp/longjmp.  Try this on for size
> > with any communication protocol implementation, such as
> > TCP/IP some time.  When you get to a snarled condition where
> > there just is not any graceful way out, the non-local character
> > of setjmp/longjmp cuts that knot instead of untying it with
> > horrible error code checking back up the stack.  This is why
> > I finally decided to go this way.  (Does this answer your main
> > question here?)
>
> It does, but by stack walking I meant not returning null, but having
> the code analyze the call stack for a proper IP address to use.
>
> ---
> What do you mean by 'IP address' in this context?  I think I am
> missing something.
> ---


By IP I mean Instruction Pointer, e.g. the EIP register on x86. What I
meant was something like this:

void throw_exception(jobject_t *ex) {
        //assumes the 32-bit x86 stack layout: the return address sits just
        //below the first argument, the old frame pointer just below that
        long * ip = (long *)(*(&ex - 1)); //the return address is after the arguments
        long * sp = (long *)(*(&ex - 2)); //the old frame pointer is after the return address
        jclass_t * cl = ex->vtable->class_obj;

        printf("obj %p ip %p sp %p\n", (void *)ex, (void *)ip, (void *)sp);

        printf("------\n");
        //this code performs stack unwinding; it misses synchronized methods.
        while(isNotThreadBaseFunction(ip)) {
                printf("trace element ip %p sp %p\n", (void *)ip, (void *)sp);
                catch_info_t * info = find_catch_info(ip, cl);
                if(info) restore_to(ip, sp, ex, info); //does not return here
                ip = (long *)*(sp + 1);
                sp = (long *)*sp;
        }
        printf("-----\n");
        fflush(stdout);
        //uncaught exception, must never happen, this is a JVM bug.
        //in my vm, at least, uncaught exceptions were handled by the
        //implementation of Thread.
}

find_catch_info was implemented in java, but looks something like this
(don't bother with the linear search for now):

catch_info_t * find_catch_info(long *ip, jclass_t *ex) {
  if(ip < vm->compiledMethodsStart || ip > vm->compiledMethodsEnd)
      return 0;
  foreach(compiled_method_t * m, vm->compiledMethods)
      if(m->owns(ip)) //this instruction pointer belongs to this method
         return m->findCatch(ip, ex); //find a catch block for the exception
  return 0;
}

restore_to is implemented this way:

static void restore_to(long *ip, long *frame, jobject_t *ex, catch_info_t *info)  {
   asm("movl %0, %%eax;"
                "movl %1, %%ebx;"
                "movl %2, %%ecx;"
                "movl %3, %%edx;"
                "movl %%ebx, %%ebp;"
                "movl %%ebp, %%esp;"
                "subl %%edx, %%esp;"
                "pushl %%ecx;"
                "pushl %%eax;"
                "ret;"
                        :
                        :"m"(ip), "m"(frame), "m"(ex), "m"(info->stackDelta)
                        //stackDelta is local storage + temp storage
                        :"%eax", "%ebx", "%ecx", "%edx");
}

This stuff works only in a JIT-only environment, but only some minor
tweaks would be required for it to work in a hybrid environment.
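
Since find_catch_info above is only pseudocode, here is a small
self-contained C sketch of the same linear lookup. Everything in it is an
assumption for illustration: the catch_entry_t and method_t layouts, the
find_catch name, and the simplification that a class is identified by an
integer id with 0 meaning "catch anything" (a real VM would need a
subclass check instead of an equality test):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t try_start;    /* first pc covered by the try block */
    uintptr_t try_end;      /* one past the last covered pc */
    uintptr_t handler_pc;   /* where execution resumes on a match */
    int       class_id;     /* hypothetical: 0 means "catch anything" */
} catch_entry_t;

typedef struct {
    uintptr_t code_start;           /* compiled code range [start, end) */
    uintptr_t code_end;
    const catch_entry_t *catches;   /* handlers, innermost first */
    size_t ncatches;
} method_t;

/* Find the method that owns ip, then the first handler in it that
 * covers ip and accepts class_id. Returns NULL when nothing matches,
 * in which case the caller keeps unwinding to the next frame. */
const catch_entry_t *find_catch(const method_t *methods, size_t nmethods,
                                uintptr_t ip, int class_id) {
    for (size_t m = 0; m < nmethods; m++) {
        const method_t *meth = &methods[m];
        if (ip < meth->code_start || ip >= meth->code_end)
            continue;               /* ip does not belong to this method */
        for (size_t c = 0; c < meth->ncatches; c++) {
            const catch_entry_t *e = &meth->catches[c];
            if (ip >= e->try_start && ip < e->try_end &&
                (e->class_id == 0 || e->class_id == class_id))
                return e;
        }
        return NULL;                /* right method, but no handler here */
    }
    return NULL;                    /* ip is outside all compiled code */
}
```

Sorting the methods by code_start and using a binary search would
replace the outer linear scan if this ever became a bottleneck.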

>
> > Also, I sort of get the impression that you may be blurring the
> > distinction between the native 'C' code runtime environment
> > and the virtual Java runtime environment when you talk
> > about serialization, security, GC, and JNI.  (This is _very_
> > easy to do!  This is why I begin my real-machine data types
> > with 'r' and Java data types with 'j'.  I was confusing myself
> > all the time!)  Obviously, there is no such thing as setjmp/longjmp
> > in the OO paradigm, but they do have a better method,
> > namely, the concept of the exception.  That is effectively
> > what I have tried to implement here in the native 'C' code
> > on the real platform, to use OO terms.  Did I misunderstand you?
> >
>
> Not exactly. GC must walk the stack to find the root set;
> serialization needs to find the last user class loader on the
> stack (since it's the one used to look up classes for deserialization);
> security needs to walk the stack to perform checks on the code
> base of each method on it; and JNI needs this as exceptions are queued
> for use by the ExceptionOccurred call.
>
> I did look at opcode.c and thread.c but I could not find the stack
> unwinding code. Could you point me to where it is located?
>
> ---
>
> Which stack do you mean?  A thread's JVM stack?  The real machine
> stack?  I think I'm confused.
>
> ---
>
> > Thanks,
> >
> >
> > Dan Lydick
> >
>
>
>
>
> Dan Lydick
>
