Chetan Mehrotra created SLING-2719:
--------------------------------------
Summary: Deadlock in ResourceResolverFactoryActivator.checkFactoryPreconditions
Key: SLING-2719
URL: https://issues.apache.org/jira/browse/SLING-2719
Project: Sling
Issue Type: Bug
Components: ResourceResolver
Affects Versions: Resource Resolver 1.0.2
Environment: JBoss
Reporter: Chetan Mehrotra
We are seeing intermittent deadlocks while running a Sling-based webapp in an
app server such as JBoss. The deadlock occurs between the
FelixFrameworkWiring and FelixStartLevel threads.
For example, consider the order of locks taken in threaddump-1.log (shown
below). The FelixFrameworkWiring thread holds the global bundle lock at the
Felix level [1] and is waiting for the lock in
ResourceResolverFactoryActivator.checkFactoryPreconditions, while the
FelixStartLevel thread holds the ResourceResolverFactoryActivator lock and is
waiting for the global lock. The result is a deadlock.
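The lock-order inversion described above can be sketched in isolation. This is only an illustration under stated assumptions: the two ReentrantLock instances are stand-ins for the Felix global lock and the activator's monitor (neither is a ReentrantLock in the real code), the class and method names are hypothetical, and tryLock with a timeout is used so that the sketch terminates instead of hanging.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration of two threads taking the same two locks in opposite order. */
final class LockOrderInversion {

    /** Takes 'first', waits for the peer thread, then tries 'second' with a timeout. */
    static boolean tryBoth(ReentrantLock first, ReentrantLock second,
                           CyclicBarrier barrier) throws Exception {
        first.lock();
        try {
            barrier.await(); // both threads now hold their first lock
            boolean gotSecond = second.tryLock(100, TimeUnit.MILLISECONDS);
            if (gotSecond) {
                second.unlock();
            }
            barrier.await(); // keep 'first' held until the peer has also tried
            return gotSecond;
        } finally {
            first.unlock();
        }
    }

    /** Returns { wiring thread got its second lock, start-level thread got its second lock }. */
    static boolean[] run() throws Exception {
        ReentrantLock globalLock = new ReentrantLock();    // stand-in for the Felix global lock
        ReentrantLock activatorLock = new ReentrantLock(); // stand-in for the activator monitor
        CyclicBarrier barrier = new CyclicBarrier(2);
        boolean[] result = new boolean[2];

        Thread wiring = new Thread(() -> {
            try {
                result[0] = tryBoth(globalLock, activatorLock, barrier);
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }, "FelixFrameworkWiring");

        Thread startLevel = new Thread(() -> {
            try {
                result[1] = tryBoth(activatorLock, globalLock, barrier);
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }, "FelixStartLevel");

        wiring.start();
        startLevel.start();
        wiring.join();
        startLevel.join();
        return result;
    }
}
```

Neither thread in this sketch can obtain its second lock while the other still holds it. With plain blocking acquisition instead of tryLock, both would wait forever, which matches the pattern in the thread dump.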
The FelixFrameworkWiring thread [5] is busy deactivating components because of
an earlier package refresh (which led to the repository being shut down, thus
triggering deactivation of ResourceResolverFactoryActivator), while the
FelixStartLevel thread [6] has activated ResourceResolverFactoryActivator (and
thus holds the lock) and later requires the global lock for some operation.
Looking at the code for
ResourceResolverFactoryActivator.checkFactoryPreconditions [2], it appears to
take and hold a lock (on this) while making a call to the OSGi container. Such
usage *might* cause issues like this deadlock, so it would be better if
ResourceResolverFactoryActivator did not hold any lock while calling
container services [3].
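One possible shape for such a change is sketched below. It is not the actual Sling code: the PreconditionChecker class, its Container interface, and all method names are hypothetical stand-ins. The idea is to decide what to do while holding the lock, then release the lock before calling into the container.

```java
/** Hypothetical stand-in for the activator; not the actual Sling implementation. */
final class PreconditionChecker {

    /** Hypothetical stand-in for container calls such as registering a service. */
    interface Container {
        void register();
        void unregister();
    }

    private final Object lock = new Object();
    private final Container container;
    private boolean registered; // guarded by 'lock'

    PreconditionChecker(Container container) {
        this.container = container;
    }

    /** Updates state under the lock, then calls the container with no lock held. */
    void check(boolean preconditionsMet) {
        boolean doRegister = false;
        boolean doUnregister = false;

        synchronized (lock) {
            // Only state changes and decisions happen inside the monitor.
            if (preconditionsMet && !registered) {
                registered = true;
                doRegister = true;
            } else if (!preconditionsMet && registered) {
                registered = false;
                doUnregister = true;
            }
        }

        // Foreign code runs only after the monitor has been released.
        if (doRegister) {
            container.register();
        } else if (doUnregister) {
            container.unregister();
        }
    }

    /** Test helper: reports whether the calling thread currently holds the internal lock. */
    boolean lockHeldByCurrentThread() {
        return Thread.holdsLock(lock);
    }
}
```

The trade-off is that two concurrent callers may reach the container in a different order than they updated the state, so a real fix would also need to decide how to handle that case.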
"FelixFrameworkWiring"
- locked <0x00000007944da478> (a java.util.concurrent.atomic.AtomicReference)
org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
- locked <0x00000007944da9b0> (a java.util.concurrent.atomic.AtomicReference)
org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
- locked <0x00000007944dae38> (a java.util.concurrent.atomic.AtomicReference)
org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
- locked <0x0000000796d5d030> (a java.util.concurrent.atomic.AtomicReference)
org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
- waiting to lock <0x000000079624ff08> (a org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator)
org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.checkFactoryPreconditions(ResourceResolverFactoryActivator.java:330)
"FelixStartLevel"
- locked <0x000000079624ff08> (a org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator)
org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.checkFactoryPreconditions(ResourceResolverFactoryActivator.java:324)
- locked <0x0000000796959bc8> (a java.util.concurrent.atomic.AtomicReference)
org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:660)
- locked <0x0000000796959eb8> (a java.util.concurrent.atomic.AtomicReference)
org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:660)
- locked <0x000000079695a188> (a java.util.concurrent.atomic.AtomicReference)
org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:660)
- waiting <0x000000079415eca0> (a [Ljava.lang.Object;)
org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:5019)
[1] This has been confirmed via the value of m_globalLockThread of the Felix
instance in the heap dump
[2] https://github.com/apache/sling/blob/trunk/bundles/resourceresolver/src/main/java/org/apache/sling/resourceresolver/impl/ResourceResolverFactoryActivator.java#L313
[3] http://njbartlett.name/files/osgibook_preview_20091217.pdf (Section 6.4
Don’t Hold Locks when Calling Foreign Code)