[ https://issues.apache.org/jira/browse/FELIX-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418731#comment-16418731 ]
Sander Stoel commented on FELIX-5716: ------------------------------------- My application was also experiencing a deadlock after unregistering certain services with aspects, after applying this patch it is working as expected. > Dead Lock in DM > --------------- > > Key: FELIX-5716 > URL: https://issues.apache.org/jira/browse/FELIX-5716 > Project: Felix > Issue Type: Bug > Affects Versions: org.apache.felix.dependencymanager-r1 > Reporter: Pierre De Rop > Assignee: Pierre De Rop > Priority: Critical > Attachments: deadlock.proposed.patch.txt > > > I just found an unfortunate deadlock when using latest DM r11 and latest SCR > 2.0.12 in the same JVM. > I believe that the same issue applies to both DM and SCR (I may have to open > a seperate issue for SCR). > First, here is the deadlock: > {code} > Found one Java-level deadlock: > ============================= > "CM Event Dispatcher (Fire ConfigurationEvent: > pid=com.alcatel.as.http.ioh.impl.HttpIOH)": > waiting to lock monitor 0x00007f8188004538 (object 0x00000000eb38d420, a > org.apache.felix.dm.tracker.ServiceTracker$Tracked), > which is held by "FelixDispatchQueue" > "FelixDispatchQueue": > waiting for ownable synchronizer 0x00000000c0699f30, (a > java.util.concurrent.locks.ReentrantLock$FairSync), > which is held by "CM Event Dispatcher (Fire ConfigurationEvent: > pid=com.alcatel.as.http.ioh.impl.HttpIOH)" > Java stack information for the threads listed above: > =================================================== > "CM Event Dispatcher (Fire ConfigurationEvent: > pid=com.alcatel.as.http.ioh.impl.HttpIOH)": > at > org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChangedHideAspects(ServiceTracker.java:1140) > - waiting to lock <0x00000000eb38d420> (a > org.apache.felix.dm.tracker.ServiceTracker$Tracked) > at > org.apache.felix.dm.tracker.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1054) > at > org.apache.felix.framework.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:990) > at > org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:838) > at > org.apache.felix.framework.EventDispatcher.fireServiceEvent(EventDispatcher.java:545) > at org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4579) > at org.apache.felix.framework.Felix.access$000(Felix.java:105) > at org.apache.felix.framework.Felix$1.serviceChanged(Felix.java:419) > at > org.apache.felix.framework.ServiceRegistry.servicePropertiesModified(ServiceRegistry.java:588) > at > org.apache.felix.framework.ServiceRegistrationImpl.setProperties(ServiceRegistrationImpl.java:131) > at > org.apache.felix.scr.impl.manager.SingleComponentManager.updateServiceRegistration(SingleComponentManager.java:558) > at > org.apache.felix.scr.impl.manager.SingleComponentManager.modify(SingleComponentManager.java:755) > at > org.apache.felix.scr.impl.manager.SingleComponentManager.reconfigure(SingleComponentManager.java:645) > at > org.apache.felix.scr.impl.manager.SingleComponentManager.reconfigure(SingleComponentManager.java:609) > at > org.apache.felix.scr.impl.manager.ConfigurableComponentHolder.configurationUpdated(ConfigurableComponentHolder.java:426) > at > org.apache.felix.scr.impl.manager.RegionConfigurationSupport.configurationEvent(RegionConfigurationSupport.java:284) > at > org.apache.felix.scr.impl.manager.RegionConfigurationSupport$1.configurationEvent(RegionConfigurationSupport.java:89) > at > org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.sendEvent(ConfigurationManager.java:2090) > at > org.apache.felix.cm.impl.ConfigurationManager$FireConfigurationEvent.run(ConfigurationManager.java:2058) > at org.apache.felix.cm.impl.UpdateThread.run0(UpdateThread.java:141) > at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:109) > at java.lang.Thread.run(Thread.java:748) > "FelixDispatchQueue": > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00000000c0699f30> (a > java.util.concurrent.locks.ReentrantLock$FairSync) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247) > at > java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442) > at > org.apache.felix.scr.impl.manager.AbstractComponentManager.obtainLock(AbstractComponentManager.java:228) > at > org.apache.felix.scr.impl.manager.AbstractComponentManager.obtainStateLock(AbstractComponentManager.java:279) > at > org.apache.felix.scr.impl.manager.SingleComponentManager.getService(SingleComponentManager.java:806) > at > org.apache.felix.framework.ServiceRegistrationImpl.getFactoryUnchecked(ServiceRegistrationImpl.java:347) > at > org.apache.felix.framework.ServiceRegistrationImpl.getService(ServiceRegistrationImpl.java:247) > at > org.apache.felix.framework.ServiceRegistry.getService(ServiceRegistry.java:350) > at org.apache.felix.framework.Felix.getService(Felix.java:3721) > at > org.apache.felix.framework.BundleContextImpl.getService(BundleContextImpl.java:470) > at > org.apache.felix.dm.impl.ServiceEventImpl.getEvent(ServiceEventImpl.java:86) > at > org.apache.felix.dm.impl.ServiceDependencyImpl.addingService(ServiceDependencyImpl.java:251) > at > org.apache.felix.dm.tracker.ServiceTracker$Tracked.customizerAdding(ServiceTracker.java:1276) > at > org.apache.felix.dm.tracker.AbstractTracked.trackAdding(AbstractTracked.java:279) > at > org.apache.felix.dm.tracker.AbstractTracked.trackInitial(AbstractTracked.java:188) > - locked <0x00000000eb38d420> (a > org.apache.felix.dm.tracker.ServiceTracker$Tracked) > at > org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:387) > - locked <0x00000000eb38d420> (a > org.apache.felix.dm.tracker.ServiceTracker$Tracked) > - locked <0x00000000eb38cd00> (a > org.apache.felix.dm.tracker.ServiceTracker) > at > org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:320) > at > org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:296) > at > org.apache.felix.dm.impl.ServiceDependencyImpl.start(ServiceDependencyImpl.java:235) > at > org.apache.felix.dm.impl.ComponentImpl.startDependencies(ComponentImpl.java:1360) > at > org.apache.felix.dm.impl.ComponentImpl.performTransition(ComponentImpl.java:1059) > at > org.apache.felix.dm.impl.ComponentImpl.handleChange(ComponentImpl.java:1010) > at > org.apache.felix.dm.impl.ComponentImpl.lambda$start$2(ComponentImpl.java:426) > at > org.apache.felix.dm.impl.ComponentImpl$$Lambda$7/201251057.run(Unknown Source) > at > org.apache.felix.dm.impl.SerialExecutor.runTask(SerialExecutor.java:138) > at > org.apache.felix.dm.impl.SerialExecutor.runTasks(SerialExecutor.java:120) > at > org.apache.felix.dm.impl.SerialExecutor.execute(SerialExecutor.java:86) > at > org.apache.felix.dm.impl.SerialExecutor.execute(SerialExecutor.java:105) > at > org.apache.felix.dm.impl.ComponentImpl.start(ComponentImpl.java:424) > at > org.apache.felix.dm.impl.ComponentScheduler.add(ComponentScheduler.java:69) > at > org.apache.felix.dm.DependencyManager.add(DependencyManager.java:136) > at > com.alcatel.as.service.coordinator.impl.Activator.init(Activator.java:16) > at > org.apache.felix.dm.DependencyActivatorBase.start(DependencyActivatorBase.java:75) > at > org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697) > at org.apache.felix.framework.Felix.activateBundle(Felix.java:2239) > at org.apache.felix.framework.Felix.startBundle(Felix.java:2145) > at org.apache.felix.framework.BundleImpl.start(BundleImpl.java:998) > at > com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl$DeployedBundle.start(BundleInstallerImpl.java:1056) > at > com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.startBundles(BundleInstallerImpl.java:728) > at > com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.finishInitialisation(BundleInstallerImpl.java:230) > at > com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.frameworkEvent(BundleInstallerImpl.java:213) > at > com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.start(BundleInstallerImpl.java:174) > at > org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697) > at org.apache.felix.framework.Felix.activateBundle(Felix.java:2239) > at org.apache.felix.framework.Felix.startBundle(Felix.java:2145) > at org.apache.felix.framework.Felix.updateBundle(Felix.java:2506) > at org.apache.felix.framework.BundleImpl.update(BundleImpl.java:1018) > at org.apache.felix.framework.BundleImpl.update(BundleImpl.java:1004) > at > com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.checkSelfUpdate(BundleInstallerImpl.java:273) > at > com.alcatel.as.service.bundleinstaller.impl.BundleInstallerImpl.frameworkEvent(BundleInstallerImpl.java:192) > at > org.apache.felix.framework.EventDispatcher.invokeFrameworkListenerCallback(EventDispatcher.java:881) > at > org.apache.felix.framework.EventDispatcher.fireEventImmediately(EventDispatcher.java:830) > at > org.apache.felix.framework.EventDispatcher.run(EventDispatcher.java:1147) > at > org.apache.felix.framework.EventDispatcher.access$000(EventDispatcher.java:54) > at > org.apache.felix.framework.EventDispatcher$1.run(EventDispatcher.java:102) > at java.lang.Thread.run(Thread.java:748) > Found 1 deadlock. > {code} > now, here is my first interpretation of what is going on: > in SCR: > ===== > Configuration Admin is delivering a configuration update event to SCR. So, > the > org.apache.felix.scr.impl.manager.RegionConfigurationSupport.configurationEvent() > method is then called. This method then indirectly triggers a call to the > SingleComponentManager.reconfigure method, which then calls > SingleComponentManager.modify method. Here is the modify() method: > {code} > private boolean modify(boolean configurationDeleted) > { > //0 SCR 112.7.1 If configuration is deleted, and version is < 1.3 and > no flag set, then deactivate unconditionally. > // For version 1.3 and later, or with a flag, more sensible behavior > is allowed. > if ( configurationDeleted && > !getComponentMetadata().isDeleteCallsModify()){ > return false; > } > // 1. no live update if there is no declared method > if ( getComponentMetadata().getModified() == null ) > { > log( LogService.LOG_DEBUG, "No modified method, cannot update > dynamically", null ); > return false; > } > // invariant: we have a modified method name > // 2. get and check configured method > // invariant: modify method is configured and found > // 3. check whether we can dynamically apply the configuration if > // any target filters influence the bound services > final Map<String, Object> props = getProperties(); > for ( DependencyManager<S, ?> dm: getDependencyManagers() ) > { > if ( !dm.canUpdateDynamically( props ) ) > { > log( LogService.LOG_DEBUG, > "Cannot dynamically update the configuration due to > dependency changes induced on dependency {0}", > new Object[] {dm.getName()}, null ); > return false; > } > } > // invariant: modify method existing and no static bound service > changes > // 4. call method (nothing to do when failed, since it has already > been logged) > // (call with non-null default result to continue even if the > // modify method call failed) > obtainStateLock( ); > try > { > //cf 112.5.12 where invoking modified method before updating > target services is specified. > final MethodResult result = invokeModifiedMethod(); > updateTargets( props ); > if ( result == null ) > { > // log an error if the declared method cannot be found > log( LogService.LOG_ERROR, "Declared modify method ''{0}'' > cannot be found, configuring by reactivation", > new Object[] {getComponentMetadata().getModified()}, > null ); > return false; > } > // 5. update the target filter on the services now, this may still > // result in unsatisfied dependencies, in which case we abort > // this dynamic update and have the component be deactivated > if ( !verifyDependencyManagers() ) > { > log( LogService.LOG_DEBUG, > "Updating the service references caused at least one > reference to become unsatisfied, deactivating component", > null ); > return false; > } > // 6. update service registration properties if we didn't just do > it > if ( result.hasResult() ) > { > setServiceProperties( result, null ); > } > else > { > updateServiceRegistration(); > } > ... > {code} > This is problematic because we see that the modify method is invoking the > updateServiceRegistration while holding the StateLock. (the obtainStateLock() > method is called and then the service registration is updaded). > Here, the listener is a DM component, but this component is internally > synchronized on it's service tracker, and is currently blocked, trying to > dereference the DS service, but it can't because the DS service state lock is > currently held by the modify method. > in the next section, I'm now analyzing what is doing DM and will propose a > patch in order to make sure DM does not hold any locks while calling > bundleContext.getService(ref) ... this will fix the issue. > Dependency Manager > ================= > In dependency manager, now let's analyze the the FelixDispatchQueue > stacktrace: we open the DM service tracker, but the open method (in the DM > service tracker) calls the AbstractTracked.trackInitial while holding the > lock on ServiceTracker$Tracked. Then this triggers the customizer > "addingService" method call, which then perform a getService. But the > getService then tries to dereference a declarative service component, and the > stacktrace is then blocked because the DS component state lock is currently > locked by the other "CM Event Dispatcher" > So, in the end, I think that there is something wrong in both SCR and DM : > 1) SCR obtain the state lock while update the service registration. > This is , I think, unfortunate because updating the service registration > while holding a lock is like hodling a lock while calling foreign code (you > don't know what will do the service listener when you update the service > registration). > 2) and DM calls the customizer "addingService" method while holding the lock > on the service tracker. This is also not satisfying. > So, at minimul, in DM I do believe that we should manage to not invoke the > customizer.addingService while holding the lock on the DM service tracker. > now, I have given a look into the DM ServiceTracker.open method and it is > currently implemented like this: > {code} > public void open(boolean trackAllServices, boolean trackAllAspects) { > if (debug) { > System.out.println("[ServiceTracker] " + debugKey + " > T" + Thread.currentThread().getId() + " open"); > } > final Tracked t; > synchronized (this) { > if (tracked != null) { > return; > } > if (DEBUG) { > System.out.println("ServiceTracker.open: " + > filter); > } > m_trackAllAspects = trackAllAspects; > t = trackAllServices ? new AllTracked() : new Tracked(); > synchronized (t) { > try { > context.addServiceListener(t, > listenerFilter); > ServiceReference[] references = null; > if (trackClass != null) { > references = > getInitialReferences(trackAllServices, > trackClass, > null); > } > else { > if (trackReference != null) { > if > (trackReference.getBundle() != null) { > references = > new ServiceReference[] {trackReference}; > } > } > else { /* user supplied filter > */ > references = > getInitialReferences(trackAllServices, > null, > > (listenerFilter != null) ? listenerFilter > > : filter.toString()); > } > } > /* set tracked with the initial > references */ > t.setInitial(references); > > // only actually schedules the actions > for execution within this synchronized block, > // but do the actual execution > afterwards. > t.trackInitial(); > } > catch (InvalidSyntaxException e) { > throw new RuntimeException( > "unexpected > InvalidSyntaxException: " > + > e.getMessage(), e); > } > } > tracked = t; > } > /* Call tracked outside of synchronized region */ > // just trigger the executor > t.getExecutor().execute(); > } > {code} > As you can see the "t.trackInitial()" is invoked from within the synchronized > block. This method actually schedules customizer "added" callbacks, which > will be invoked properly outside of the synchronized block (when > t.getExecutor().executor() will be called). > But please notice that the the issue is that the t.trackInitial() will also > invoke the "addingService" customizer callback from the synchronized block. > Doing so will end up dereferencing the DS service reference while we are > holding the ServiceTracker lock. > Interestingly, notice that the BundleTracker is not calling the > t.trackInitial() method from the synchronized block. > so, I have attempted to make a patch, which seems to resolve the issue. All > tests are passing OK, and we don't experience the deadlock anymore, and I > would appreciate if another DM developer or user could take a look at my > proposed patch, to make sure I don't have messed up :-) > thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)