Hi Felix, Sorry for the delay, I've been struggling to come up with an independent way to reproduce this and finally have something similar in an integration test which I've attached to the jira with a stack trace.
By messing with the timing I've gotten into a state where we're trying to log on a BundleComponentActivator that has been shut down. The activity being logged is different, but the NPE is the same. Admittedly I added the log statement in question.... The basic scenario is two threads shutting stuff down concurrently, and an unbind method that takes a long time to return. bundle A has a service s1 bundle B has a service s2 with a reference to s1, and an unbind method that takes a long time. Stop A in thread 1; the thread gets stuck in unbind of s2. Stop B in thread 2; this shuts down B so it's BundleComponentActivator no longer can log (the NPE). If the unbind method in thread 1 now returns, it will try to log in B's BCA which has no log service tracker. At the moment I'm inclined to think that there is no way to prevent these kinds of races between bundles shutting down and the best thing to do is null checks in many places and catching exceptions from attempts to use stopped bundle contexts; this is what my first two patches do. BTW to get my test to work at all I started upgrading scr to more recent pax bundles. I think this would slightly simplify the code. Also I think the more modern way to run integration tests is with the maven-failsafe-plugin rather than a separate surefire plugin execution. Would you be interested in a patch for this? thanks david jencks On Feb 16, 2012, at 12:41 AM, Felix Meschberger wrote: > Hi David, > > Thanks for reporting. I have seen the reported issue. I will look into it > ASAP. > > My guts feeling tells me that some issues are just a question of doing > null-checks properly while others might be more involved ... > > I assume you have a scenario where you can reproduce reliably ? > > Regards > Felix > > Am 15.02.2012 um 07:55 schrieb David Jencks: > >> see FELIX-3345 >> >> We've been seeing intermittent exceptions from SCR which generally seem to >> look like trying to unget a service on a bundle context, >> BundleComponentActivator, ComponentManager, or DependencyManager that are >> shut down or being shut down. I think there are 2 threads shutting bundles >> down at once. I'm not making much progress investigation exactly how this >> happens so I'd really appreciate it if one of the experts could take a look >> at the stack traces in the issue and attempt to guess whether there's a real >> concurrency bug or if the situations we're seeing are expected when more >> than one thread is shutting down bundles at once, and the "don't throw an >> exception" patch I provided would be appropriate. >> >> To try to pique your interest here is one of the stack traces: >> >> Stack Dump = org.osgi.framework.ServiceException: Exception in >> org.apache.felix.scr.impl.manager.DelayedComponentManager.ungetService() >> at >> org.eclipse.osgi.internal.serviceregistry.ServiceUse.releaseService(ServiceUse.java:287) >> >> at >> org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.releaseService(ServiceRegistrationImpl.java:562) >> >> at >> org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.releaseServicesInUse(ServiceRegistry.java:665) >> >> at >> org.eclipse.osgi.framework.internal.core.BundleContextImpl.close(BundleContextImpl.java:91) >> >> at >> org.eclipse.osgi.framework.internal.core.BundleHost.stopWorker(BundleHost.java:514) >> >> at >> org.eclipse.osgi.framework.internal.core.AbstractBundle.suspend(AbstractBundle.java:565) >> >> at >> org.eclipse.osgi.framework.internal.core.Framework.suspendBundle(Framework.java:1161) >> >> at >> org.eclipse.osgi.framework.internal.core.StartLevelManager.decFWSL(StartLevelManager.java:595) >> >> at >> org.eclipse.osgi.framework.internal.core.StartLevelManager.doSetStartLevel(StartLevelManager.java:257) >> >> at >> org.eclipse.osgi.framework.internal.core.StartLevelManager.shutdown(StartLevelManager.java:215) >> >> at >> org.eclipse.osgi.framework.internal.core.InternalSystemBundle.suspend(InternalSystemBundle.java:284) >> >> at >> org.eclipse.osgi.framework.internal.core.Framework.shutdown(Framework.java:691) >> >> at >> org.eclipse.osgi.framework.internal.core.Framework.close(Framework.java:598) >> at >> org.eclipse.osgi.framework.internal.core.InternalSystemBundle$1.run(InternalSystemBundle.java:261) >> >> at java.lang.Thread.run(Thread.java:680) >> Caused by: java.lang.NullPointerException >> at >> org.apache.felix.scr.impl.BundleComponentActivator.log(BundleComponentActivator.java:614) >> >> at >> org.apache.felix.scr.impl.BundleComponentActivator.log(BundleComponentActivator.java:589) >> >> at >> org.apache.felix.scr.impl.manager.AbstractComponentManager.log(AbstractComponentManager.java:633) >> >> at >> org.apache.felix.scr.impl.manager.AbstractComponentManager$State.log(AbstractComponentManager.java:1000) >> >> at >> org.apache.felix.scr.impl.manager.AbstractComponentManager$State.ungetService(AbstractComponentManager.java:964) >> >> at >> org.apache.felix.scr.impl.manager.DelayedComponentManager.ungetService(DelayedComponentManager.java:114) >> >> at >> org.eclipse.osgi.internal.serviceregistry.ServiceUse$3.run(ServiceUse.java:277) >> >> at java.security.AccessController.doPrivileged(Native Method) >> at >> org.eclipse.osgi.internal.serviceregistry.ServiceUse.releaseService(ServiceUse.java:275) >> >> ... 14 more >> >> I think the DelayedComponentManager.State here is Disposed but we've also >> seen this trace with state Active but the bundle context stopped (so >> ungetting throws an exception). >> >> many thanks >> david jencks >> >