On 1/1/13 12:40, Jad Naous wrote:
On Tue, Jan 1, 2013 at 9:16 AM, Richard S. Hall <he...@ungoverned.org> wrote:

On 1/1/13 11:51, Jad Naous wrote:

On Tue, Jan 1, 2013 at 8:30 AM, Richard S. Hall <he...@ungoverned.org> wrote:
  On 1/1/13 11:13, Jad Naous wrote:
  On Tue, Jan 1, 2013 at 7:58 AM, Richard S. Hall <he...@ungoverned.org> wrote:

   On 1/1/13 10:37, Jad Naous wrote:

   On Tue, Jan 1, 2013 at 6:21 AM, Richard S. Hall <he...@ungoverned.org> wrote:

    On 1/1/13 06:09, Jad Naous wrote:

     Happy new year!

I'm running into another deadlock now. It seems like there needs to be a more rigorous study of how locking is used in the framework. In particular, it seems like the framework should not be invoking any listeners within the same thread that's executing the stopping/starting/refreshing of bundles... Anywhere that happens there will be a potential for a deadlock because of a misordering of lock acquisition. The framework should never call into user code with any locks held.
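
To make the "never call into user code with any locks held" idea concrete, here is a minimal sketch of the usual snapshot-then-call pattern. The class OpenCallDispatcher and its Listener interface are hypothetical names invented for illustration; this is not how Felix's EventDispatcher is written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dispatcher, not Felix's EventDispatcher. It illustrates
// delivering events without holding the dispatcher's own lock.
class OpenCallDispatcher {
    interface Listener {
        void onEvent(String event);
    }

    private final Object lock = new Object();
    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener l) {
        synchronized (lock) {
            listeners.add(l);
        }
    }

    void fire(String event) {
        List<Listener> snapshot;
        // Hold the lock only long enough to copy the listener list.
        synchronized (lock) {
            snapshot = new ArrayList<>(listeners);
        }
        // Callbacks run with the dispatcher lock released, so a listener
        // that takes its own locks cannot form a cycle with this one.
        for (Listener l : snapshot) {
            l.onEvent(event);
        }
    }

    // Lets a caller check whether the current thread holds the lock.
    boolean isLockHeldByCurrentThread() {
        return Thread.holdsLock(lock);
    }
}
```

The cost of this pattern is that a listener can still be called shortly after it has been removed, and it sees state that may already have changed, so it is not a drop-in answer where events must be delivered while state is frozen.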

    Yeah, tell me about it, but it is not possible in all cases, unfortunately. I personally feel that all events should be asynchronous, but that's another story.

-> richard


    I'm happy to help fix this, but I need some pointers as to what needs to happen. For synchronous events, what are the requirements? Or is this not fixable? Can the lock be released before firing the events?

   I don't think it can be fixed, some of these things are baked into the spec. I think the spec even states somewhere that synchronous event listeners must be careful since they may be holding framework locks, so they shouldn't try to do too much. Of course, that is not easy advice to follow since framework impls vary.

We even have bugs open that say we aren't holding locks when we should be, e.g.:

https://issues.apache.org/jira/browse/FELIX-3806

This is one of the poorly designed parts of the OSGi spec. It was meant to work in a world where services can "come and go at anytime", but it allows users to cling to synchronous events to do tons of work.

   Then I guess this is something that ipojo is doing incorrectly?

Hard for me to say, but if there is a case where we can fire events without holding locks, then you are correct that we should avoid holding them. Likewise, users should avoid doing too much when they receive synchronous events.

  Well, iPOJO is not doing anything. It is just loading a class, and that's what's causing the global lock to be acquired.

You're really going to make me look into this, aren't you? ;-)


:) Thanks from me and the tons of other people using this framework!


I'm not sure what is going on, but for whatever reason you are in the
middle of a refresh (which requires global lock) while it appears that
iPOJO is starting to manage a new bundle, which ultimately results in
someone trying to do a dynamic import (which also requires global lock).

There isn't much we can do in this case other than try to detect it and
fail for the global lock acquire...
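
One way to read "detect it and fail" is a bounded acquire that gives up instead of waiting forever. The sketch below assumes a plain ReentrantLock as a stand-in; GlobalLockGuard and runWithGlobalLock are hypothetical names, and Felix's real global lock (the Thread-2 trace in this thread shows it waiting in Object.wait inside Felix.acquireGlobalLock) is implemented differently.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a framework-wide lock, for illustration only.
class GlobalLockGuard {
    private final ReentrantLock globalLock = new ReentrantLock();

    // Runs the action under the lock and returns true, or returns false if
    // the lock could not be obtained within the timeout (or the thread was
    // interrupted while waiting). The caller decides how to fail.
    boolean runWithGlobalLock(Runnable action, long timeoutMillis) {
        boolean acquired;
        try {
            acquired = globalLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            globalLock.unlock();
        }
    }
}
```

A timeout turns a hang into a failed operation, which the caller (here, whoever triggered the dynamic import) then has to handle. It does not remove the underlying lock-ordering problem.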

And ipojo will have to deal... Seems simpler if ipojo didn't try to
unregister services or do that sort of thing (or at least not acquire any
locks of its own) synchronously.
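
As a rough illustration of keeping a synchronous callback free of its own locking, the sketch below hands the teardown to a worker thread. It deliberately avoids the OSGi API: DeferredCleanupListener and bundleStopped are hypothetical names standing in for an extender's bundle listener, and this is not iPOJO's code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical extender-style listener, for illustration only.
class DeferredCleanupListener {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Runnable cleanup;

    DeferredCleanupListener(Runnable cleanup) {
        this.cleanup = cleanup;
    }

    // Called on the framework's thread, possibly with framework locks held.
    // It only enqueues work, so it never blocks on the listener's own locks.
    void bundleStopped() {
        worker.execute(cleanup);
    }

    // Stops accepting work and waits for queued cleanup to finish.
    boolean shutdown(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The trade-off is that cleanup now finishes some time after the event callback returns, so anything that relies on the teardown being complete when the bundle event has been delivered would need separate coordination.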

iPOJO doesn't appear to be doing anything synchronously here. It looks like it is working on another thread to start management of a perhaps newly installed or updated bundle. The issue actually arises because the framework instigates a dynamic import when trying to determine if it should deliver the event to a service listener (it needs to try to load classes in some cases to see if the service's class is compatible with the listener's service class).

This is really ugly. Not really sure what we could do here other than always fail service event delivery if the listener bundle doesn't already have access to the service class. But that would still be complicated to do and would lead to other failure scenarios.

I'd have to think about that one.

-> richard



-> richard


  -> richard

    -> richard
Name: Thread-2
State: WAITING on [Ljava.lang.Object;@4a018e1b
Total blocked: 38,871,649  Total waited: 38,871,650

Stack trace:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:5033)
org.apache.felix.framework.StatefulResolver.resolve(StatefulResolver.java:451)
org.apache.felix.framework.BundleWiringImpl.searchDynamicImports(BundleWiringImpl.java:1578)
org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1478)
org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:75)
org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1882)
java.lang.ClassLoader.loadClass(ClassLoader.java:247)
org.apache.felix.framework.BundleWiringImpl.getClassByDelegation(BundleWiringImpl.java:1356)
org.apache.felix.framework.ServiceRegistrationImpl$ServiceReferenceImpl.isAssignableTo(ServiceRegistrationImpl.java:548)
org.apache.felix.framework.util.Util.isServiceAssignable(Util.java:280)
org.apache.felix.framework.util.EventDispatcher.invokeServiceListenerCallback(EventDispatcher.java:916)
org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:793)
org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(EventDispatcher.java:543)
org.apache.felix.framework.Felix.fireServiceEvent(Felix.java:4346)
org.apache.felix.framework.Felix.registerService(Felix.java:3356)
org.apache.felix.framework.BundleContextImpl.registerService(BundleContextImpl.java:346)
org.apache.felix.ipojo.IPojoFactory.start(IPojoFactory.java:613)
        - locked org.apache.felix.ipojo.ComponentFactory@468034b6
org.apache.felix.ipojo.Extender.createAbstractFactory(Extender.java:520)
org.apache.felix.ipojo.Extender.parse(Extender.java:301)
org.apache.felix.ipojo.Extender.startManagementFor(Extender.java:237)
org.apache.felix.ipojo.Extender.access$600(Extender.java:52)
org.apache.felix.ipojo.Extender$CreatorThread.run(Extender.java:769)
java.lang.Thread.run(Thread.java:662)


Name: FelixFrameworkWiring
State: BLOCKED on org.apache.felix.ipojo.ComponentFactory@468034b6 owned by: Thread-2
Total blocked: 7  Total waited: 1

Stack trace:
org.apache.felix.ipojo.IPojoFactory.removeFactoryStateListener(IPojoFactory.java:511)
org.apache.felix.ipojo.InstanceCreator.removeFactory(InstanceCreator.java:199)
org.apache.felix.ipojo.Extender.closeManagementFor(Extender.java:180)
org.apache.felix.ipojo.Extender.bundleChanged(Extender.java:153)
org.apache.felix.framework.util.EventDispatcher.invokeBundleListenerCallback(EventDispatcher.java:868)
org.apache.felix.framework.util.EventDispatcher.fireEventImmediately(EventDispatcher.java:789)
org.apache.felix.framework.util.EventDispatcher.fireBundleEvent(EventDispatcher.java:514)
org.apache.felix.framework.Felix.fireBundleEvent(Felix.java:4330)
org.apache.felix.framework.Felix.stopBundle(Felix.java:2451)
org.apache.felix.framework.Felix$RefreshHelper.stop(Felix.java:4715)
org.apache.felix.framework.Felix.refreshPackages(Felix.java:4037)
org.apache.felix.framework.FrameworkWiringImpl.run(FrameworkWiringImpl.java:178)
java.lang.Thread.run(Thread.java:662)






On Sat, Dec 22, 2012 at 6:39 PM, Richard S. Hall <he...@ungoverned.org> wrote:

      On 12/22/12 20:41 , Jad Naous wrote:

      Thanks! I was building with java6. Don't know if that is the issue. Anyway, I tested the snapshot, and looks like that fixes it.

Any idea when 4.1.0 will be released? Do you know if http://svn.apache.org/viewvc?view=revision&revision=1421958 will apply cleanly onto 4.0.3? Otherwise, how stable do you think is 4.1.0-SNAPSHOT?

     The release version will be 4.2.0, but the 4.1.0-SNAPSHOT build should be reasonably stable. We try to keep trunk stable.

I want to try to get a release out soon, but I don't have a specific timetable. I'll try to get it done in January if all goes well.

-> richard


     Thanks!

   Jad.


On Fri, Dec 21, 2012 at 6:49 AM, Richard S. Hall <he...@ungoverned.org> wrote:

        It built fine for me. I was building with Java 7. Regardless, I deployed snapshots of framework, main, and main.distribution, so just grab what you want from the Apache snapshot repo to try it out.

-> richard


On 12/20/12 18:56 , Jad Naous wrote:

      It does look like the same issue.

Got the trunk/framework. mvn clean install gives:

[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ org.apache.felix.framework ---
Dec 20, 2012 3:54:59 PM org.sonatype.guice.bean.reflect.Logs$JULSink warn
WARNING: Error injecting: org.apache.maven.plugin.CompilerMojo
java.lang.NoClassDefFoundError: org/codehaus/plexus/compiler/util/scan/InclusionScanException



Thanks,
jad.


On Thu, Dec 20, 2012 at 3:22 PM, Richard S. Hall <he...@ungoverned.org> wrote:

         On 12/20/12 4:10 PM, Jad Naous wrote:

         If a bundle undergoes a refresh while ipojo is still initializing components, a deadlock can happen. The issue is that if ipojo is attempting to register a service, it will be doing it while synchronizing on the InstanceCreator instance. It will then try to register a service which requires the framework's global lock.

       Registering a service doesn't require a global lock, just a bundle lock. I think this could be related to:

https://issues.apache.org/jira/browse/FELIX-3761

This avoids grabbing the bundle lock when registering a service, so maybe it will help your situation. You could try to build the framework from trunk and see if it makes a difference.

If you aren't able to build from trunk, let me know and I'll try to publish a snapshot build since I don't think we have a recent one (we should do this no matter what).

-> richard



       If a refresh is happening in another thread, the refresh will be holding the framework's global lock, which will then call IPOJO's extender, which then attempts to call a method on InstanceCreator, hence leading to the deadlock.

Here are the stack traces:

Daemon Thread [Thread-1] (Suspended)
Object.wait(long) line: not available [native method]
Object[](Object).wait() line: 485
Felix.acquireBundleLock(BundleImpl, int) line: 4871
Felix.registerService(BundleImpl, String[], Object, Dictionary) line: 3205
BundleContextImpl.registerService(String[], Object, Dictionary) line: 346
IPojoContext.registerService(String[], Object, Dictionary) line: 385
ProvidedService.registerService() line: 362
ProvidedServiceHandler.__M_stateChanged(int) line: 509
ProvidedServiceHandler.stateChanged(int) line: not available
InstanceManager.setState(int) line: 536
InstanceManager.start() line: 418
ComponentFactory.createInstance(Dictionary, IPojoContext, HandlerManager[]) line: 179
ComponentFactory(IPojoFactory).createComponentInstance(Dictionary, ServiceContext) line: 310
ComponentFactory(IPojoFactory).createComponentInstance(Dictionary) line: 239
InstanceCreator$ManagedInstance.create(IPojoFactory) line: 355
InstanceCreator.addInstance(Dictionary, long) line: 89
Extender.parse(Bundle, String) line: 306
Extender.startManagementFor(Bundle) line: 237
Extender.access$600(Extender, Bundle) line: 52
Extender$CreatorThread.run() line: 769
Thread.run() line: 662

Daemon Thread [FelixFrameworkWiring] (Suspended)
InstanceCreator.removeInstancesFromBundle(long) line: 116
Extender.closeManagementFor(Bundle) line: 171
Extender.bundleChanged(BundleEvent) line: 153
EventDispatcher.invokeBundleListenerCallback(Bundle, EventListener, EventObject) line: 868
EventDispatcher.fireEventImmediately(EventDispatcher, int, Map, EventObject, Dictionary) line: 789
EventDispatcher.fireBundleEvent(BundleEvent, Framework) line: 514
Felix.fireBundleEvent(int, Bundle) line: 4244
Felix.stopBundle(BundleImpl, boolean) line: 2351
Felix$RefreshHelper.stop() line: 4629
Felix.refreshPackages(Collection, FrameworkListener[]) line: 3951
FrameworkWiringImpl.run() line: 172
Thread.run() line: 662
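
For anyone wanting to capture this kind of state programmatically rather than from a debugger, the following is a small sketch using the standard java.lang.management API. DeadlockReporter is a hypothetical helper name. One caveat, stated as my understanding rather than a guarantee: the JVM's deadlock detection looks for cycles over object monitors and ownable synchronizers, so a thread parked in Object.wait (as Thread-1 is above) may not be reported as part of a deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class DeadlockReporter {
    // Returns one summary line per thread the JVM reports as deadlocked,
    // or an empty list when it finds no lock cycle.
    static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // Both calls return null when no deadlocked threads are found.
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread exited between the two calls
            }
            report.add(info.getThreadName() + " " + info.getThreadState()
                    + " on " + info.getLockName()
                    + " owned by " + info.getLockOwnerName());
        }
        return report;
    }
}
```

Calling DeadlockReporter.describeDeadlocks() from a watchdog thread and logging a non-empty result is one possible use; a full thread dump remains the more reliable record for wait-based locks.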








