[ https://issues.apache.org/jira/browse/KARAF-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619610#comment-13619610 ]
Amichai Rothman commented on KARAF-2256: ---------------------------------------- Great point. I've switched to equinox, and in the fiddling I did since then I couldn't recreate a deadlock. Although this is inconclusive proof (as race conditions and deadlocks often are), Felix is now looking like the main suspect. > Deadlock when refreshing bundles > -------------------------------- > > Key: KARAF-2256 > URL: https://issues.apache.org/jira/browse/KARAF-2256 > Project: Karaf > Issue Type: Bug > Affects Versions: 2.3.1 > Environment: 64-bit Linux Oracle JDK 1.7.0_17 > Reporter: Amichai Rothman > Assignee: Achim Nierbeck > > When attempting to install the DOSGi feature (by running "features:chooseurl > cxf-dosgi 1.4.0" and "features:install cxf-dosgi-discovery-distributed"), the > installation hangs along with some of the bundles which can no longer be > started, stopped, checked for imports, etc. - the Karaf server must be killed > and restarted to resume. This is likely not related to this specific feature, > and can happen with other refreshed bundles and installed features as well. > At a glance it seems like this is caused by the "OPS4J Pax Web - Runtime > (1.1.12)" bundle being stuck in the stopping state due to a deadlock caused > by its Activator: > It receives a removedService notification from a service tracker, which is > handled in a separate thread using a custom executor and eventually tries to > resolve some bundle and ends up waiting for acquireGlobalLock indefinitely. > This is because at the same time, Felix calls refreshPackages which attempts > to stop the bundle (while holding the lock), whose activator puts a cleanup > task in its custom executor and then attempts to shut down the executor. This > never happens, because the previous executor task initiated from > removeService is waiting for the lock, hence the deadlock. > I'm not entirely sure which of the projects has the underlying bug in it - > probably pax web, possibly Felix if the OSGi specs allow for the behavior > that hangs it, but in any case Karaf is using these versions and exhibiting > the deadlock, so at the very least should upgrade to fixed versions of these > libraries, or patch them. > If anyone who knows these systems better thinks it should be reported in one > of the upstream projects, point me in the right direction and I'll be happy > to do it. > Here is the thread dump, the top two threads show the deadlock, and the other > two are bundles which are stuck as well due to waiting for the same lock (I > think). > "FelixFrameworkWiring" daemon prio=10 tid=0x00007f390002e000 nid=0x35d1 in > Object.wait() [0x00007f3948dd3000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x00000000f1b8ea50> (a java.lang.Object) > at java.lang.Object.wait(Object.java:503) > at > org.ops4j.pax.web.service.internal.Executor.shutdown(Executor.java:91) > - locked <0x00000000f1b8ea50> (a java.lang.Object) > at > org.ops4j.pax.web.service.internal.Activator.stop(Activator.java:140) > at > org.apache.felix.framework.util.SecureAction.stopActivator(SecureAction.java:667) > at org.apache.felix.framework.Felix.stopBundle(Felix.java:2361) > at > org.apache.felix.framework.Felix$RefreshHelper.stop(Felix.java:4629) > at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3951) > at > org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:172) > at java.lang.Thread.run(Thread.java:722) > "Pax Web Runtime worker" daemon prio=10 tid=0x00007f3904263000 nid=0x370a in > Object.wait() [0x00007f390dfa8000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x00000000e990e018> (a [Ljava.lang.Object;) > at java.lang.Object.wait(Object.java:503) > at org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:4944) > - locked <0x00000000e990e018> (a [Ljava.lang.Object;) > at > org.apache.felix.framework.StatefulResolver.resolve(StatefulResolver.java:219) > at > org.apache.felix.framework.BundleWiringImpl.searchDynamicImports(BundleWiringImpl.java:1539) > at > org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1439) > at > org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:72) > at > org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1843) > at java.lang.ClassLoader.loadClass(ClassLoader.java:356) > at > org.apache.felix.framework.BundleWiringImpl.getClassByDelegation(BundleWiringImpl.java:1317) > at > org.apache.felix.framework.ServiceRegistrationImpl$ServiceReferenceImpl.isAssignableTo(ServiceRegistrationImpl.java:521) > at > org.apache.felix.framework.util.Util.isServiceAssignable(Util.java:280) > at > org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:916) > at > org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793) > at > org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543) > at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4260) > at org.apache.felix.framework.Felix.access$000(Felix.java:74) > at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:390) > at > org.apache.felix.framework.ServiceRegistry.unregisterService(ServiceRegistry.java:148) > at > org.apache.felix.framework.ServiceRegistrationImpl.unregister(ServiceRegistrationImpl.java:127) > at > org.ops4j.pax.web.service.internal.Activator.updateController(Activator.java:231) > at > org.ops4j.pax.web.service.internal.Activator$DynamicsServiceTrackerCustomizer$2.run(Activator.java:387) > at > org.ops4j.pax.web.service.internal.Executor$Future.run(Executor.java:45) > at > org.ops4j.pax.web.service.internal.Executor$Worker.run(Executor.java:122) > "fileinstall-/opt/apache-karaf-2.3.1/deploy" daemon prio=10 > tid=0x00007f3904018800 nid=0x35a8 in Object.wait() [0x00007f394aba8000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x00000000e990e018> (a [Ljava.lang.Object;) > at java.lang.Object.wait(Object.java:503) > at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871) > - locked <0x00000000e990e018> (a [Ljava.lang.Object;) > at org.apache.felix.framework.Felix.startBundle(Felix.java:1744) > at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944) > at > org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundle(DirectoryWatcher.java:1247) > at > org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundles(DirectoryWatcher.java:1219) > at > org.apache.felix.fileinstall.internal.DirectoryWatcher.startAllBundles(DirectoryWatcher.java:1208) > at > org.apache.felix.fileinstall.internal.DirectoryWatcher.process(DirectoryWatcher.java:503) > at > org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:291) > "NioProcessor-2" prio=10 tid=0x00007f3914014000 nid=0x35fd in Object.wait() > [0x00007f394a064000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x00000000e990e018> (a [Ljava.lang.Object;) > at java.lang.Object.wait(Object.java:503) > at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4871) > - locked <0x00000000e990e018> (a [Ljava.lang.Object;) > at org.apache.felix.framework.Felix.startBundle(Felix.java:1744) > at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:944) > at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:931) > at > org.apache.karaf.features.internal.FeaturesServiceImpl.installFeatures(FeaturesServiceImpl.java:479) > at > org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:396) > at > org.apache.karaf.features.internal.FeaturesServiceImpl.installFeature(FeaturesServiceImpl.java:392) > at > org.apache.karaf.features.command.InstallFeatureCommand.doExecute(InstallFeatureCommand.java:62) > at > org.apache.karaf.features.command.FeaturesCommandSupport.doExecute(FeaturesCommandSupport.java:41) > at > org.apache.karaf.shell.console.OsgiCommandSupport.execute(OsgiCommandSupport.java:38) > at > org.apache.felix.gogo.commands.basic.AbstractCommand.execute(AbstractCommand.java:35) > at > org.apache.felix.gogo.runtime.CommandProxy.execute(CommandProxy.java:78) > at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:474) > at > org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:400) > at org.apache.felix.gogo.runtime.Pipe.run(Pipe.java:108) > at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:183) > at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:120) > at > org.apache.felix.gogo.runtime.CommandSessionImpl.execute(CommandSessionImpl.java:89) > at > org.apache.karaf.shell.ssh.ShellCommandFactory$ShellCommand$1.run(ShellCommandFactory.java:109) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.karaf.shell.ssh.ShellCommandFactory$ShellCommand.start(ShellCommandFactory.java:107) > at > org.apache.sshd.server.channel.ChannelSession.handleExec(ChannelSession.java:388) > at > org.apache.sshd.server.channel.ChannelSession.handleRequest(ChannelSession.java:235) > at > org.apache.sshd.server.channel.ChannelSession.handleRequest(ChannelSession.java:195) > at > org.apache.sshd.common.session.AbstractSession.channelRequest(AbstractSession.java:1057) > at > org.apache.sshd.server.session.ServerSession.running(ServerSession.java:229) > at > org.apache.sshd.server.session.ServerSession.handleMessage(ServerSession.java:205) > at > org.apache.sshd.common.session.AbstractSession.decode(AbstractSession.java:566) > at > org.apache.sshd.common.session.AbstractSession.messageReceived(AbstractSession.java:236) > - locked <0x00000000efd56b00> (a java.lang.Object) > at > org.apache.sshd.common.AbstractSessionIoHandler.messageReceived(AbstractSessionIoHandler.java:58) > at > org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690) > at > org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417) > at > org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47) > at > org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765) > at > org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109) > at > org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417) > at > org.apache.mina.core.filterchain.DefaultIoFilterChain.fireMessageReceived(DefaultIoFilterChain.java:410) > at > org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:710) > at > org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664) > at > org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653) > at > org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67) > at > org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124) > at > org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:722) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira