[ 
https://issues.apache.org/jira/browse/KARAF-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987149#comment-17987149
 ] 

Vineeth commented on KARAF-7969:
--------------------------------

I am not sure whether this issue  is related as well

> Apache Karaf bundles entered a hang state due to a thread wait
> --------------------------------------------------------------
>
>                 Key: KARAF-7969
>                 URL: https://issues.apache.org/jira/browse/KARAF-7969
>             Project: Karaf
>          Issue Type: Bug
>          Components: karaf
>    Affects Versions: 4.2.16
>            Reporter: Vineeth
>            Priority: Major
>
> Hello 
> I have an AIX 7.3 system where I’m able to replicate the issue using IBM J9 
> Java 11, Apache Karaf bundles enter a hang state during shutdown.
> From the Thread dump , i am able to locate one issue in 
> *org.apache.karaf.main.Main* class in 
> {code:java}
> public void awaitShutdown() throws Exception {
>         if (framework == null) {
>             return;
>         }
>         while (true) {
>             FrameworkEvent event = framework.waitForStop(0);
>             if (event.getType() == FrameworkEvent.STOPPED_UPDATE) {
>                 if (lock != null) {
>                     lock.release();
>                 }
>                 while (framework.getState() != Bundle.STARTING && 
> framework.getState() != Bundle.ACTIVE) {
>                     Thread.sleep(10);
>                 }
>                 monitorThread = monitor();
>             } else {
>                 return;
>             }
>         }
>     } {code}
>  
> The main thread calls {{framework.waitForStop(0)}} with a timeout of 0, 
> meaning it will wait indefinitely
> This calls into ThreadGate.await() in the Felix framework, which uses 
> Object.wait() to block until notified
> The parameter 0 means "wait forever"  there's no timeout safety mechanism, 
> correct?For the main thread to unblock, the ThreadGate object needs to 
> receive a notification via notify() or notifyAll()
> This notification should come from the Felix framework when it completes 
> shutdown
> However, the Felix framework threads (FelixDispatchQueue, 
> FelixFrameworkWiring, FelixStartLevel) are all waiting themselves.
> _*+Here it should have set some timeout value something like 30 
> seconds(30000).?, What you think? Please feel free to correct me.+*_
> In thread dump,
> {code:java}
> 3XMTHREADINFO      "main" J9VMThread:0x0000000030010700, 
> omrthread_t:0x00000100100B2CE0, java/lang/Thread:0x00000000F007C2D0, 
> state:CW, prio=5
> 3XMJAVALTHREAD            (java/lang/Thread getId:0x1, isDaemon:false)
> 3XMJAVALTHRCCL            
> jdk/internal/loader/ClassLoaders$AppClassLoader(0x00000000F0075810)
> 3XMTHREADINFO1            (native thread ID:0x1320327, native priority:0x5, 
> native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000181)
> 3XMTHREADINFO2            (native stack address range 
> from:0x0000010010017D90, to:0x000001001009D788, size:0x859F8)
> 3XMCPUTIME               CPU usage total: 0.449041000 secs, user: 0.373210000 
> secs, system: 0.075831000 secs, current category="Application"3XMTHREADBLOCK  
>    Waiting on: org/apache/felix/framework/util/ThreadGate@0x00000000F05AF2D8 
> Owned by: <unowned>
> 3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
> 3XMTHREADINFO3           Java callstack:
> 4XESTACKTRACE                at java/lang/Object.waitImpl(Native Method)
> 4XESTACKTRACE                at java/lang/Object.wait(Object.java:251)
> 4XESTACKTRACE                at java/lang/Object.wait(Object.java:219)
> 4XESTACKTRACE                at 
> org/apache/felix/framework/util/ThreadGate.await(ThreadGate.java:79)
> 4XESTACKTRACE                at 
> org/apache/felix/framework/Felix.waitForStop(Felix.java:1075)
> 4XESTACKTRACE                at 
> org/apache/karaf/main/Main.awaitShutdown(Main.java:671)
> 4XESTACKTRACE                at org/apache/karaf/main/Main.main(Main.java:190)
> 3XMTHREADINFO3           Native callstack:
> 4XENATIVESTACK               _event_wait+0x2c (0x09000000006AF470 
> [libpthreads.a+0x19470])
> 4XENATIVESTACK               _cond_wait_local+0x2e4 (0x09000000006B75E8 
> [libpthreads.a+0x215e8])
> 4XENATIVESTACK               _cond_wait+0x34 (0x09000000006B7D38 
> [libpthreads.a+0x21d38])
> 4XENATIVESTACK               pthread_cond_wait+0x1a8 (0x09000000006B880C 
> [libpthreads.a+0x2280c])
> 4XENATIVESTACK               IPRA.$monitor_wait_original+0xa10 
> (0x0900000000D88CD4 [libj9thr29.so+0x9cd4])
> 4XENATIVESTACK               omrthread_monitor_wait_interruptable+0x50 
> (0x0900000000D89474 [libj9thr29.so+0xa474])
> 4XENATIVESTACK               monitorWaitImpl+0x488 (0x090000000D84668C 
> [libj9vm29.so+0x8168c])
> 4XENATIVESTACK               (0x090000000D9BC924 [libj9vm29.so+0x1f7924])
> 4XENATIVESTACK               (0x090000000D8A9168 [libj9vm29.so+0xe4168])
> 4XENATIVESTACK               runCallInMethod+0x2d0 (0x090000000D8073F4 
> [libj9vm29.so+0x423f4])
> 4XENATIVESTACK               _ZL26gpProtectedRunCallInMethodPv+0x4c 
> (0x090000000D7C66F0 [libj9vm29.so+0x16f0])
> 4XENATIVESTACK               signalProtectAndRunGlue+0x28 (0x090000000D822A8C 
> [libj9vm29.so+0x5da8c])
> 4XENATIVESTACK               omrsig_protect+0x4fc (0x090000000D555760 
> [libj9prt29.so+0x5f760])
> 4XENATIVESTACK               gpProtectAndRun+0xf0 (0x090000000D8227F4 
> [libj9vm29.so+0x5d7f4])
> 4XENATIVESTACK               gpCheckCallin+0x118 (0x090000000D7C665C 
> [libj9vm29.so+0x165c])
> 4XENATIVESTACK               callStaticVoidMethod+0x44 (0x090000000D8E9728 
> [libj9vm29.so+0x124728])
> 4XENATIVESTACK               JavaMain+0xc14 (0x0000010000007258 [java+0x7258])
> 4XENATIVESTACK               ThreadJavaMain+0xc (0x000001000000DCB0 
> [java+0xdcb0])
> 4XENATIVESTACK               (0x0900000000089214 [libpthreads.a+0x214]){code}
> I'm not entirely sure how everything is linked together, but it seems to form 
> a larger race condition, which eventually leads to a hang state in the Karaf 
> bundles. This issue can be easily reproduced in server mode.
> Simply use the {{/bin/start}} script, monitor the status(/bin/status), and 
> once the server has started completely, run the {{/bin/stop}} script in a 
> loop. After some iteration the system enters a hang state. I managed to 
> capture thread and heap dumps during this, which gave me some insights that 
> I’ve shared here.
> Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to