Hi Felix,

Sorry for the delay, I've been struggling to come up with an independent way to 
reproduce this and finally have something similar in an integration test which 
I've attached to the jira with a stack trace.

By messing with the timing I've gotten into a state where we're trying to log 
on a BundleComponentActivator that has been shut down.  The activity being 
logged is different, but the NPE is the same.  Admittedly I added the log 
statement in question....

The basic scenario is two threads shutting stuff down concurrently, and an 
unbind method that takes a long time to return.

bundle A has a service s1
bundle B has a service s2 with a reference to s1, and an unbind method that 
takes a long time.

Stop A in thread 1; the thread gets stuck in unbind of s2.

Stop B in thread 2; this shuts down B so it's BundleComponentActivator no 
longer can log (the NPE).

If the unbind method in thread 1 now returns, it will try to log in B's BCA 
which has no log service tracker.

At the moment I'm inclined to think that there is no way to prevent these kinds 
of races between bundles shutting down and the best thing to do is null checks 
in many places and catching exceptions from attempts to use stopped bundle 
contexts; this is what my first two patches do.

BTW to get my test to work at all I started upgrading scr to more recent pax 
bundles.  I think this would slightly simplify the code.  Also I think the more 
modern way to run integration tests is with the maven-failsafe-plugin rather 
than a separate surefire plugin execution.  Would you be interested in a patch 
for this?

thanks
david jencks


On Feb 16, 2012, at 12:41 AM, Felix Meschberger wrote:

> Hi David,
> 
> Thanks for reporting. I have seen the reported issue. I will look into it 
> ASAP.
> 
> My guts feeling tells me that some issues are just a question of doing 
> null-checks properly while others might be more involved ...
> 
> I assume you have a scenario where you can reproduce reliably ?
> 
> Regards
> Felix
> 
> Am 15.02.2012 um 07:55 schrieb David Jencks:
> 
>> see FELIX-3345
>> 
>> We've been seeing intermittent exceptions from SCR which generally seem to 
>> look like trying to unget a service on a bundle context, 
>> BundleComponentActivator, ComponentManager, or DependencyManager that are 
>> shut down or being shut down.  I think there are 2 threads shutting bundles 
>> down at once.  I'm not making much progress investigation exactly how this 
>> happens so I'd really appreciate it if one of the experts could take a look 
>> at the stack traces in the issue and attempt to guess whether there's a real 
>> concurrency bug or if the situations we're seeing are expected when more 
>> than one thread is shutting down bundles at once, and the "don't throw an 
>> exception" patch I provided would be appropriate.
>> 
>> To try to pique your interest here is one of the stack traces:
>> 
>> Stack Dump = org.osgi.framework.ServiceException: Exception in 
>> org.apache.felix.scr.impl.manager.DelayedComponentManager.ungetService() 
>> at 
>> org.eclipse.osgi.internal.serviceregistry.ServiceUse.releaseService(ServiceUse.java:287)
>>  
>> at 
>> org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.releaseService(ServiceRegistrationImpl.java:562)
>>  
>> at 
>> org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.releaseServicesInUse(ServiceRegistry.java:665)
>>  
>> at 
>> org.eclipse.osgi.framework.internal.core.BundleContextImpl.close(BundleContextImpl.java:91)
>>  
>> at 
>> org.eclipse.osgi.framework.internal.core.BundleHost.stopWorker(BundleHost.java:514)
>>  
>> at 
>> org.eclipse.osgi.framework.internal.core.AbstractBundle.suspend(AbstractBundle.java:565)
>>  
>> at 
>> org.eclipse.osgi.framework.internal.core.Framework.suspendBundle(Framework.java:1161)
>>  
>> at 
>> org.eclipse.osgi.framework.internal.core.StartLevelManager.decFWSL(StartLevelManager.java:595)
>>  
>> at 
>> org.eclipse.osgi.framework.internal.core.StartLevelManager.doSetStartLevel(StartLevelManager.java:257)
>>  
>> at 
>> org.eclipse.osgi.framework.internal.core.StartLevelManager.shutdown(StartLevelManager.java:215)
>>  
>> at 
>> org.eclipse.osgi.framework.internal.core.InternalSystemBundle.suspend(InternalSystemBundle.java:284)
>>  
>> at 
>> org.eclipse.osgi.framework.internal.core.Framework.shutdown(Framework.java:691)
>>  
>> at 
>> org.eclipse.osgi.framework.internal.core.Framework.close(Framework.java:598) 
>> at 
>> org.eclipse.osgi.framework.internal.core.InternalSystemBundle$1.run(InternalSystemBundle.java:261)
>>  
>> at java.lang.Thread.run(Thread.java:680) 
>> Caused by: java.lang.NullPointerException 
>> at 
>> org.apache.felix.scr.impl.BundleComponentActivator.log(BundleComponentActivator.java:614)
>>  
>> at 
>> org.apache.felix.scr.impl.BundleComponentActivator.log(BundleComponentActivator.java:589)
>>  
>> at 
>> org.apache.felix.scr.impl.manager.AbstractComponentManager.log(AbstractComponentManager.java:633)
>>  
>> at 
>> org.apache.felix.scr.impl.manager.AbstractComponentManager$State.log(AbstractComponentManager.java:1000)
>>  
>> at 
>> org.apache.felix.scr.impl.manager.AbstractComponentManager$State.ungetService(AbstractComponentManager.java:964)
>>  
>> at 
>> org.apache.felix.scr.impl.manager.DelayedComponentManager.ungetService(DelayedComponentManager.java:114)
>>  
>> at 
>> org.eclipse.osgi.internal.serviceregistry.ServiceUse$3.run(ServiceUse.java:277)
>>  
>> at java.security.AccessController.doPrivileged(Native Method) 
>> at 
>> org.eclipse.osgi.internal.serviceregistry.ServiceUse.releaseService(ServiceUse.java:275)
>>  
>> ... 14 more 
>> 
>> I think the DelayedComponentManager.State here is Disposed but we've also 
>> seen this trace with state Active but the bundle context stopped (so 
>> ungetting throws an exception).
>> 
>> many thanks
>> david jencks
>> 
> 

Reply via email to