Hi All,

While working on http://issues.apache.org/jira/browse/HARMONY-2391 I
found that the current implementation of safe-point callbacks is
incorrect and unsafe. Here is a list of the problems:

1) hythread_suspend & hythread_resume are not synchronized with each
other, which may lead to a deadlock. Here is the scenario:
   a) T1 wants to set a safe-point callback on T2, so it calls
hythread_set_safe_point_callback(T2, the_callback).
   b) T1 sets T2->safepoint_callback to the_callback.
   c) T2 executes hythread_resume(self), which is guarded by the
if (thread->safepoint_callback) check.
   d) T1 calls send_suspend_request(T2). T2 will now never be resumed,
because it already executed the corresponding hythread_resume(self)
in the previous step.
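
To make the interleaving in a)-d) concrete, here is a minimal
single-threaded replay in C. It is only a sketch under stated
assumptions: model_thread and every model_* function are hypothetical
stand-ins, not the real hythread structures or functions, and I assume
that a resume with no outstanding suspend request is a no-op.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-thread state.
 * This is NOT the real hythread structure. */
typedef void (*safepoint_cb)(void);

typedef struct {
    safepoint_cb safepoint_callback; /* published by the requesting thread */
    int suspend_request;             /* > 0: park at the next safe point */
} model_thread;

static void the_callback(void) { }

/* Step b: T1 publishes the callback on T2. */
static void model_set_callback(model_thread *t, safepoint_cb cb) {
    assert(t != NULL);
    t->safepoint_callback = cb;
}

/* Step d: T1 asks T2 to suspend. */
static void model_send_suspend_request(model_thread *t) {
    t->suspend_request++;
}

/* Step c: T2 resumes itself if it sees a callback.
 * Assumption: with no outstanding request this does nothing. */
static void model_resume_self(model_thread *t) {
    if (t->safepoint_callback != NULL && t->suspend_request > 0) {
        t->suspend_request--;
    }
}

/* Replays steps b, c, d in one of two orders and returns the number
 * of suspend requests that nobody is left to cancel. */
int replay_interleaving(int resume_runs_before_suspend) {
    model_thread t2 = { NULL, 0 };

    model_set_callback(&t2, the_callback);   /* step b */
    if (resume_runs_before_suspend) {
        model_resume_self(&t2);              /* step c: no-op, nothing pending */
        model_send_suspend_request(&t2);     /* step d: request is now orphaned */
    } else {
        model_send_suspend_request(&t2);     /* intended order */
        model_resume_self(&t2);              /* cancels the request */
    }
    return t2.suspend_request;
}
```

In this model replay_interleaving(1) leaves one outstanding suspend
request that T2 will never cancel, which corresponds to the hang
described above, while replay_interleaving(0) leaves none.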

2) The current implementation sets suspend_disable_count to 1
(thread_native_suspend.c:162) and allows execution of unsafe code
(code which must run in the suspend-disabled state) while the GC may
be working.

3) In stop_callback (thread_java_basic.c:413), suspend_request for the
current thread is set to zero. So the thread just ignores suspend
requests and continues to do its dirty things.

All these problems are fixed in HARMONY-2391. But with the fix applied,
org.apache.harmony.luni.tests.java.lang.ThreadGroupTest.test_suspend
starts to fail. The scenario is as follows:

1) T1 suspends T2 (so T2's suspend_request is > 0).
2) T1 stops T2 by setting stop_callback as T2's safe-point callback.
3) T2 needs to switch to the suspend-disabled state to be able to throw
the exception, but it can't do that because of step 1).

On the one hand, if I roll back my changes for problem 3) from the
first list, this test works fine. On the other hand, I can't do that,
because it may lead to a crash if the GC is really working at the
moment I want to throw the exception. I don't see an easy way to fix
this right now, but I want to proceed with HARMONY-2391.
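
The failing scenario and the trade-off above can be replayed with a
small C model. Again this is only a sketch: stop_thread_model and the
model_* functions are hypothetical names, not the real Harmony code,
and I assume that entering the suspend-disabled state is refused (in
real code it would block) while a suspend request is outstanding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-thread state; not the real structure. */
typedef struct {
    int suspend_request;       /* outstanding requests (from T1 or the GC) */
    int suspend_disable_count; /* > 0 while running suspend-unsafe code */
    bool exception_thrown;
} stop_thread_model;

/* Assumption: a thread may enter the suspend-disabled state only when
 * no suspend request is pending. Real code would block here; the
 * model reports failure instead. */
static bool model_try_disable_suspend(stop_thread_model *t) {
    assert(t != NULL);
    if (t->suspend_request > 0) {
        return false;
    }
    t->suspend_disable_count++;
    return true;
}

static void model_enable_suspend(stop_thread_model *t) {
    t->suspend_disable_count--;
}

/* Models the stop callback. clear_request_first == true mimics the old
 * behaviour (problem 3): the pending request is dropped, which is
 * unsafe if the request actually came from the GC. */
bool model_stop_callback(stop_thread_model *t, bool clear_request_first) {
    if (clear_request_first) {
        t->suspend_request = 0;
    }
    if (!model_try_disable_suspend(t)) {
        return false;               /* T2 is stuck, no exception thrown */
    }
    t->exception_thrown = true;     /* throwing needs suspend disabled */
    model_enable_suspend(t);
    return true;
}

/* Step 1: T1 suspends T2. Steps 2 and 3: the stop callback runs on T2. */
bool replay_stop_while_suspended(bool clear_request_first) {
    stop_thread_model t2 = { 0, 0, false };
    t2.suspend_request++;
    return model_stop_callback(&t2, clear_request_first);
}
```

With the fix for 3) in place (clear_request_first == false) the model
never reaches the throw, which matches the failing test. With the fix
undone the throw succeeds, but only because the suspend request was
discarded.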

What would you recommend to do?
1) Commit the HARMONY-2391 patch as is. File a JIRA for the failing
test case. Exclude the test until the bug is fixed.
2) Commit the HARMONY-2391 patch with the fix for 3) undone. File a
JIRA. In this case intermittent failures are possible until the
problem is fixed.

Thanks
Evgueni
