Vineeth created KARAF-7969:
------------------------------

             Summary: Apache Karaf bundles entered a hang state due to a thread wait
                 Key: KARAF-7969
                 URL: https://issues.apache.org/jira/browse/KARAF-7969
             Project: Karaf
          Issue Type: Bug
          Components: karaf
    Affects Versions: 4.2.16
            Reporter: Vineeth


Hello,

I have an AIX 7.3 system where I can reproduce the issue using IBM J9 
Java 11.

From the thread dump, I was able to locate one issue in the 
*org.apache.karaf.main.Main* class, in the awaitShutdown() method:


public void awaitShutdown() throws Exception {
    if (framework == null) {
        return;
    }
    while (true) {
        FrameworkEvent event = framework.waitForStop(0);
        if (event.getType() == FrameworkEvent.STOPPED_UPDATE) {
            if (lock != null) {
                lock.release();
            }
            while (framework.getState() != Bundle.STARTING
                    && framework.getState() != Bundle.ACTIVE) {
                Thread.sleep(10);
            }
            monitorThread = monitor();
        } else {
            return;
        }
    }
}

The main thread calls {{framework.waitForStop(0)}} with a timeout of 0, meaning 
it will wait indefinitely.
This calls into ThreadGate.await() in the Felix framework, which uses 
Object.wait() to block until notified.
The parameter 0 means "wait forever", so there is no timeout safety mechanism, 
correct?
For the main thread to unblock, the ThreadGate object needs to receive 
a notification via notify() or notifyAll().
This notification should come from the Felix framework when it completes 
shutdown.
However, the Felix framework threads (FelixDispatchQueue, FelixFrameworkWiring, 
FelixStartLevel) are all waiting themselves.
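
To illustrate the wait semantics described above, here is a minimal sketch of 
a wait/notify gate. It is a hypothetical stand-in written for this report, not 
the Felix ThreadGate source; the class and method names (GateSketch, Gate, 
timedWaitOnClosedGate, unboundedWaitWithOpener) are assumptions. It shows that 
a wait with timeout 0 ends only when another thread opens the gate, while a 
positive timeout returns with the gate still closed.

```java
public class GateSketch {

    /** Hypothetical stand-in for a wait/notify gate; not the Felix ThreadGate source. */
    static final class Gate {
        private boolean open;

        /** Opens the gate and wakes every waiting thread. */
        synchronized void open() {
            open = true;
            notifyAll();
        }

        /**
         * Waits for the gate to open. A timeout of 0 waits until open() is called;
         * a positive timeout gives up after roughly that many milliseconds.
         * Returns true if the gate is open on return.
         */
        synchronized boolean await(long timeoutMillis) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (!open) {
                if (timeoutMillis == 0) {
                    wait(); // only open() can end this wait
                } else {
                    long remaining = deadline - System.currentTimeMillis();
                    if (remaining <= 0) {
                        break; // timed out, gate still closed
                    }
                    wait(remaining);
                }
            }
            return open;
        }
    }

    /** Bounded wait on a gate that nobody opens: gives up after about 50 ms. */
    static boolean timedWaitOnClosedGate() {
        try {
            return new Gate().await(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Unbounded wait that returns only because another thread opens the gate. */
    static boolean unboundedWaitWithOpener() {
        Gate gate = new Gate();
        Thread opener = new Thread(gate::open);
        opener.start();
        try {
            boolean opened = gate.await(0);
            opener.join();
            return opened;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("closed gate, 50 ms timeout: " + timedWaitOnClosedGate());
        System.out.println("gate opened by another thread: " + unboundedWaitWithOpener());
    }
}
```

If the opener thread in the second case never ran, the await(0) call would 
block forever, which matches the state of the "main" thread in the dump.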

_*+Should a timeout value have been set here, something like 30 seconds 
(30000 ms)? What do you think? Please feel free to correct me.+*_
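
As a rough sketch of what a bounded wait could look like, the loop below 
handles a timeout result explicitly. In the OSGi API, 
Framework.waitForStop(long) returns a FrameworkEvent of type 
FrameworkEvent.WAIT_TIMEDOUT when the timeout expires. To keep the example 
runnable without OSGi on the classpath, StoppableFramework, StopResult, 
scripted() and the maxTimeouts parameter are hypothetical stand-ins, not Karaf 
or Felix code. Note that a timeout alone only wakes the main thread up; the 
caller still has to decide what to do when the framework has not stopped.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class BoundedShutdownWait {

    /** Simplified outcome of one wait; stands in for the FrameworkEvent type. */
    enum StopResult { STOPPED, STOPPED_UPDATE, WAIT_TIMEDOUT }

    /** Hypothetical stand-in for the framework, so the sketch runs without OSGi. */
    interface StoppableFramework {
        StopResult waitForStop(long timeoutMillis);
    }

    /**
     * Waits for the framework to stop, using a finite timeout per wait.
     * Returns the number of timeouts seen before a normal stop, or -1 if
     * maxTimeouts consecutive or non-consecutive timeouts were reached.
     */
    static int awaitShutdown(StoppableFramework framework, long timeoutMillis, int maxTimeouts) {
        int timeouts = 0;
        while (true) {
            StopResult result = framework.waitForStop(timeoutMillis);
            switch (result) {
                case WAIT_TIMEDOUT:
                    timeouts++;
                    if (timeouts >= maxTimeouts) {
                        // The caller decides what happens next, for example
                        // logging a thread dump or forcing an exit.
                        return -1;
                    }
                    break;
                case STOPPED_UPDATE:
                    // The real loop releases the lock and restarts the monitor here.
                    break;
                default:
                    return timeouts;
            }
        }
    }

    /** Test helper: a framework that replays the given results, then reports STOPPED. */
    static StoppableFramework scripted(StopResult... results) {
        Deque<StopResult> queue = new ArrayDeque<>(Arrays.asList(results));
        return timeoutMillis -> queue.isEmpty() ? StopResult.STOPPED : queue.poll();
    }

    public static void main(String[] args) {
        StoppableFramework framework = scripted(
                StopResult.WAIT_TIMEDOUT, StopResult.STOPPED_UPDATE, StopResult.STOPPED);
        System.out.println("timeouts before stop: " + awaitShutdown(framework, 30000, 3));
    }
}
```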

In the thread dump, the "main" thread looks like this:


3XMTHREADINFO      "main" J9VMThread:0x0000000030010700, 
omrthread_t:0x00000100100B2CE0, java/lang/Thread:0x00000000F007C2D0, state:CW, 
prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x1, isDaemon:false)
3XMJAVALTHRCCL            
jdk/internal/loader/ClassLoaders$AppClassLoader(0x00000000F0075810)
3XMTHREADINFO1            (native thread ID:0x1320327, native priority:0x5, 
native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000181)
3XMTHREADINFO2            (native stack address range from:0x0000010010017D90, 
to:0x000001001009D788, size:0x859F8)
3XMCPUTIME               CPU usage total: 0.449041000 secs, user: 0.373210000 
secs, system: 0.075831000 secs, current category="Application"
3XMTHREADBLOCK     Waiting on: 
org/apache/felix/framework/util/ThreadGate@0x00000000F05AF2D8 
Owned by: <unowned>
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/Object.waitImpl(Native Method)
4XESTACKTRACE                at java/lang/Object.wait(Object.java:251)
4XESTACKTRACE                at java/lang/Object.wait(Object.java:219)
4XESTACKTRACE                at 
org/apache/felix/framework/util/ThreadGate.await(ThreadGate.java:79)
4XESTACKTRACE                at 
org/apache/felix/framework/Felix.waitForStop(Felix.java:1075)
4XESTACKTRACE                at 
org/apache/karaf/main/Main.awaitShutdown(Main.java:671)
4XESTACKTRACE                at org/apache/karaf/main/Main.main(Main.java:190)
3XMTHREADINFO3           Native callstack:
4XENATIVESTACK               _event_wait+0x2c (0x09000000006AF470 
[libpthreads.a+0x19470])
4XENATIVESTACK               _cond_wait_local+0x2e4 (0x09000000006B75E8 
[libpthreads.a+0x215e8])
4XENATIVESTACK               _cond_wait+0x34 (0x09000000006B7D38 
[libpthreads.a+0x21d38])
4XENATIVESTACK               pthread_cond_wait+0x1a8 (0x09000000006B880C 
[libpthreads.a+0x2280c])
4XENATIVESTACK               IPRA.$monitor_wait_original+0xa10 
(0x0900000000D88CD4 [libj9thr29.so+0x9cd4])
4XENATIVESTACK               omrthread_monitor_wait_interruptable+0x50 
(0x0900000000D89474 [libj9thr29.so+0xa474])
4XENATIVESTACK               monitorWaitImpl+0x488 (0x090000000D84668C 
[libj9vm29.so+0x8168c])
4XENATIVESTACK               (0x090000000D9BC924 [libj9vm29.so+0x1f7924])
4XENATIVESTACK               (0x090000000D8A9168 [libj9vm29.so+0xe4168])
4XENATIVESTACK               runCallInMethod+0x2d0 (0x090000000D8073F4 
[libj9vm29.so+0x423f4])
4XENATIVESTACK               _ZL26gpProtectedRunCallInMethodPv+0x4c 
(0x090000000D7C66F0 [libj9vm29.so+0x16f0])
4XENATIVESTACK               signalProtectAndRunGlue+0x28 (0x090000000D822A8C 
[libj9vm29.so+0x5da8c])
4XENATIVESTACK               omrsig_protect+0x4fc (0x090000000D555760 
[libj9prt29.so+0x5f760])
4XENATIVESTACK               gpProtectAndRun+0xf0 (0x090000000D8227F4 
[libj9vm29.so+0x5d7f4])
4XENATIVESTACK               gpCheckCallin+0x118 (0x090000000D7C665C 
[libj9vm29.so+0x165c])
4XENATIVESTACK               callStaticVoidMethod+0x44 (0x090000000D8E9728 
[libj9vm29.so+0x124728])
4XENATIVESTACK               JavaMain+0xc14 (0x0000010000007258 [java+0x7258])
4XENATIVESTACK               ThreadJavaMain+0xc (0x000001000000DCB0 
[java+0xdcb0])
4XENATIVESTACK               (0x0900000000089214 [libpthreads.a+0x214])

I'm not entirely sure how everything is linked together, but it seems to form a 
larger race condition, which eventually leads to a hang state in the Karaf 
bundles. This issue can be easily reproduced in server mode.

Simply use the {{/bin/start}} script, monitor the status with {{/bin/status}}, 
and once the server has started completely, run the {{/bin/stop}} script in a 
loop. After some iterations, the system enters a hang state. I managed to 
capture thread and heap dumps while it was hung, which gave me the insights 
I’ve shared here.

Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
