Re: [drlvm] run of smoke tests on overloaded box

Rana Dasgupta Mon, 18 Jun 2007 18:06:26 -0700

On 6/15/07, Xiao-Feng Li <[EMAIL PROTECTED]> wrote:


Very interesting study. This situation arises not only here but also in
finalizer thread shutdown. We have a test case that creates an infinite
loop in a finalizer (or waits on a lost socket) and requires the system
to shut down correctly by somehow detecting this situation and not
waiting for the (dead) finalizer to finish. At the same time, we have a
test case that has the finalizer do a lot of heavy-duty work and
requires the system to detect this situation and wait for the finalizer
to finish.

In GCv5, we solved the problem (or passed the tests anyway) by having
the system do a timed wait on the finalizers. If at a timeout we detect
that at least one finalizer has executed, we loop back and do the timed
wait again, since this means the finalizers are still making progress.
If at a timeout we find the finalizer count is unchanged, we decide the
finalizers are dead and go on to exit.
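
The timed wait with progress detection described above can be sketched
roughly as follows. This is only an illustration in Java, not the GCv5
code: the class FinalizerShutdownWait, its methods, and the counters are
hypothetical names, and the real implementation lives inside the VM.

```java
// Hypothetical sketch: wait for finalizers, giving up only when a full
// timeout passes without a single finalizer completing.
class FinalizerShutdownWait {
    private final Object lock = new Object();
    private long completed = 0; // finalizers that have finished so far
    private long pending;       // finalizers still to run

    FinalizerShutdownWait(long pending) {
        this.pending = pending;
    }

    // Assumed to be called by a finalizer thread after each finalizer returns.
    void finalizerDone() {
        synchronized (lock) {
            completed++;
            pending--;
            if (pending == 0) {
                lock.notifyAll(); // wake the waiter early, before the timeout
            }
        }
    }

    // Returns true if all finalizers finished, false if they look dead.
    boolean awaitFinalizers(long timeoutMillis) throws InterruptedException {
        synchronized (lock) {
            while (pending > 0) {
                long seen = completed;
                long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
                // One timed wait: ends at the timeout or when everything is done.
                while (pending > 0) {
                    long remainingNanos = deadline - System.nanoTime();
                    if (remainingNanos <= 0) {
                        break;
                    }
                    // wait(0) would block forever, so wait at least 1 ms.
                    lock.wait(Math.max(1, remainingNanos / 1_000_000L));
                }
                if (pending > 0 && completed == seen) {
                    return false; // no progress during a whole timeout
                }
                // Otherwise progress was made: loop back and wait again.
            }
            return true;
        }
    }
}
```

Because the waiter is notified as soon as the last finalizer completes,
a generous timeout only costs time in the case where the finalizers
really are stuck.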


For the finalizers that we want to allow to complete, we can
reasonably expect that they will all end at some point (except for
those in error situations like infinite loops). For shutdown, there
is no contract like runFinalizersOnExit etc. that requires us to allow
all the daemon threads to finish. So we attach a callback that fires
at the next thread safepoint and expect the thread to correctly
release all its resources (locks, memory, etc.) before exiting. This is
hard to do (e.g., in our implementation, at the next safepoint callback,
we just exit). This means that currently, every time a thread exits and
joins successfully, we are in effect increasing the probability that
some other thread(s) will hang. So I am not sure how to converge
using an iterative approach. I can try a couple of ways. That's why
we need realistic scenarios.


The problem is, we don't know which timeout value is reasonable, 1ms
or 1s. In this case, I personally think a bigger value makes more
sense. In our case, the timed wait doesn't have to wait for the
timeout; it can also be woken up by the finalizers once they are
finished, so a longer timeout value does not normally hurt
performance. I guess this is the same for the thread-joining timed
wait?


It is a little different (as above): with the timeout we are trying to
guess when the running thread's next safepoint will be taken. The
blocked threads of course cannot execute the callbacks. In theory, if
all the running threads correctly release their resources and exit
at the next safepoint, the blocked threads should become unblocked and
also exit correctly, unless there are deadlocks already. So some form
of iteration could help, but for this the safepoint callback has to be
really good.
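
To make the iteration idea concrete, here is a minimal sketch in Java.
It is an analogy at the application level, not DRLVM code: the class
IterativeJoin and the method joinAll are hypothetical, and
Thread.interrupt() merely stands in for the VM-internal safepoint
callback, which Java code cannot attach. Each pass asks every remaining
thread to exit and does a timed join; passes repeat as long as at least
one thread exited, since that exit may have unblocked others.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of an iterative shutdown join.
class IterativeJoin {
    // Returns the threads still alive once a whole pass made no progress.
    static List<Thread> joinAll(List<Thread> threads, long timeoutMillis)
            throws InterruptedException {
        if (timeoutMillis <= 0) {
            // join(0) would wait forever, which defeats the purpose here.
            throw new IllegalArgumentException("timeout must be positive");
        }
        List<Thread> remaining = new ArrayList<>(threads);
        boolean progress = true;
        while (!remaining.isEmpty() && progress) {
            progress = false;
            for (Iterator<Thread> it = remaining.iterator(); it.hasNext();) {
                Thread t = it.next();
                t.interrupt();         // stand-in for the exit request
                t.join(timeoutMillis); // guess at when it will act on it
                if (!t.isAlive()) {
                    it.remove();
                    progress = true;   // may have released a lock others need
                }
            }
        }
        return remaining; // non-empty means some threads look hung
    }
}
```

A thread blocked on a monitor held by another thread fails the first
timed join but can succeed on the next pass, once the holder has exited
and released the monitor. Threads that are already deadlocked, or that
never act on the exit request, are returned to the caller.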


Thanks,
xiaofeng
