Chetan Mehrotra created SLING-2719:
--------------------------------------

             Summary: Deadlock in ResourceResolverFactoryActivator.checkFactoryPreconditions
                 Key: SLING-2719
                 URL: https://issues.apache.org/jira/browse/SLING-2719
             Project: Sling
          Issue Type: Bug
          Components: ResourceResolver
    Affects Versions: Resource Resolver 1.0.2
         Environment: JBoss
            Reporter: Chetan Mehrotra


We are seeing intermittent deadlocks while running a Sling-based webapp in an
app server such as JBoss. The deadlock occurs between the
FelixFrameworkWiring and FelixStartLevel threads.

For example, consider the order in which locks are taken in threaddump-1.log
(excerpt below). The FelixFrameworkWiring thread holds the global bundle lock
at the Felix level [1] and is waiting for the lock in
ResourceResolverFactoryActivator.checkFactoryPreconditions. Meanwhile, the
FelixStartLevel thread holds the lock on the ResourceResolverFactoryActivator
instance and is waiting for the global lock. The result is a deadlock.
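
To make the lock ordering concrete, here is a minimal, self-contained sketch of
the same inversion. It is not Felix or Sling code: the two ReentrantLock
objects are hypothetical stand-ins for the Felix global lock and the activator
monitor, and tryLock with a timeout is used so that the demo terminates instead
of hanging the way the real threads do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderSketch {

    /** Returns, per thread, whether it managed to acquire its second lock. */
    static boolean[] run() {
        // Hypothetical stand-ins; the real locks live in Felix and in the activator.
        ReentrantLock globalLock = new ReentrantLock();
        ReentrantLock activatorLock = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];

        // "wiring" takes global -> activator, "startLevel" takes activator -> global.
        Thread wiring = new Thread(() -> gotSecond[0] =
                attempt(globalLock, activatorLock, bothHoldFirst, bothTried), "wiring");
        Thread startLevel = new Thread(() -> gotSecond[1] =
                attempt(activatorLock, globalLock, bothHoldFirst, bothTried), "startLevel");
        wiring.start();
        startLevel.start();
        try {
            wiring.join();
            startLevel.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return gotSecond;
    }

    private static boolean attempt(ReentrantLock first, ReentrantLock second,
                                   CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With plain lock() this call would block forever.
            boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            // Keep holding the first lock until both attempts have finished.
            bothTried.countDown();
            bothTried.await();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("wiring got activator lock: " + result[0]);
        System.out.println("startLevel got global lock: " + result[1]);
    }
}
```

In this sketch neither thread obtains its second lock, which mirrors the two
"waiting" entries in the thread dump.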

The FelixFrameworkWiring thread [5] is busy deactivating components because of
an earlier package refresh (which led to the repository being shut down, thus
triggering deactivation of ResourceResolverFactoryActivator). The
FelixStartLevel thread [6], meanwhile, has activated
ResourceResolverFactoryActivator (and thus holds the lock) and later requires
the global lock for some operation.

Looking at the code for
ResourceResolverFactoryActivator.checkFactoryPreconditions [2], it appears to
take and hold a lock (on this) while calling into the OSGi container. Such
usage *might* cause problems such as this deadlock, so it would be better if
ResourceResolverFactoryActivator did not hold any lock while calling container
services [3].
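
One possible shape for such a change is sketched below. This is not the actual
Sling code or a proposed patch: the Registry interface, the class name and the
"factory" service name are hypothetical stand-ins for the OSGi service
registry and the activator. The idea is to decide what to do while holding the
lock, then make the container call after releasing it.

```java
import java.util.ArrayList;
import java.util.List;

class OpenCallSketch {

    /** Hypothetical stand-in for a container service such as the OSGi service registry. */
    interface Registry {
        void register(String name);
        void unregister(String name);
    }

    private final Object stateLock = new Object();
    private final Registry registry;
    private boolean registered; // guarded by stateLock

    OpenCallSketch(Registry registry) {
        this.registry = registry;
    }

    /** True if the calling thread currently holds the internal state lock. */
    boolean holdsStateLock() {
        return Thread.holdsLock(stateLock);
    }

    void checkPreconditions(boolean satisfied) {
        boolean doRegister = false;
        boolean doUnregister = false;
        // Only the state transition happens under the lock.
        synchronized (stateLock) {
            if (satisfied && !registered) {
                registered = true;
                doRegister = true;
            } else if (!satisfied && registered) {
                registered = false;
                doUnregister = true;
            }
        }
        // The call into foreign code is made after the lock has been released.
        if (doRegister) {
            registry.register("factory");
        } else if (doUnregister) {
            registry.unregister("factory");
        }
    }

    /** Runs a small scenario and records what the registry saw. */
    static List<String> runDemo() {
        List<String> events = new ArrayList<>();
        OpenCallSketch[] holder = new OpenCallSketch[1];
        Registry recording = new Registry() {
            @Override
            public void register(String name) {
                events.add("register:" + name + ":lockHeld=" + holder[0].holdsStateLock());
            }

            @Override
            public void unregister(String name) {
                events.add("unregister:" + name + ":lockHeld=" + holder[0].holdsStateLock());
            }
        };
        holder[0] = new OpenCallSketch(recording);
        holder[0].checkPreconditions(true);  // registers
        holder[0].checkPreconditions(true);  // no state change, no container call
        holder[0].checkPreconditions(false); // unregisters
        return events;
    }

    public static void main(String[] args) {
        runDemo().forEach(System.out::println);
    }
}
```

A trade-off of this approach is that register and unregister calls from
different threads may reach the container in a different order than the state
transitions were decided, so a real fix would need to account for that.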


"FelixFrameworkWiring"
- locked <0x00000007944da478> (a java.util.concurrent.atomic.AtomicReference) 
org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
- locked <0x00000007944da9b0> (a java.util.concurrent.atomic.AtomicReference) 
org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
- locked <0x00000007944dae38> (a java.util.concurrent.atomic.AtomicReference) 
org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
- locked <0x0000000796d5d030> (a java.util.concurrent.atomic.AtomicReference) 
org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
- waiting to lock <0x000000079624ff08> (a 
org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator) 
org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.checkFactoryPreconditions(ResourceResolverFactoryActivator.java:330)

"FelixStartLevel"
- locked <0x000000079624ff08> (a 
org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator) 
org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.checkFactoryPreconditions(ResourceResolverFactoryActivator.java:324)
- locked <0x0000000796959bc8> (a java.util.concurrent.atomic.AtomicReference) 
org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:660)
- locked <0x0000000796959eb8> (a java.util.concurrent.atomic.AtomicReference) 
org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:660)
- locked <0x000000079695a188> (a java.util.concurrent.atomic.AtomicReference) 
org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:660)
- waiting <0x000000079415eca0> (a [Ljava.lang.Object;) 
org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:5019)

[1] This has been confirmed via the value for m_globalLockThread of Felix 
instance in Heap Dump
[2] 
https://github.com/apache/sling/blob/trunk/bundles/resourceresolver/src/main/java/org/apache/sling/resourceresolver/impl/ResourceResolverFactoryActivator.java#L313
[3] http://njbartlett.name/files/osgibook_preview_20091217.pdf (Section 6.4 
Don’t Hold Locks when Calling Foreign Code)


