[jira] Commented: (FELIX-2400) High contention (or deadlock) in PackageAdmin and StartLevel

2011-02-28 Thread Alexander Berger (JIRA)

[ 
https://issues.apache.org/jira/browse/FELIX-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000215#comment-13000215
 ] 

Alexander Berger commented on FELIX-2400:
-

I have not yet managed to test this problem with a 3.* release of the framework. 
At the moment we are using framework version 3.0.6 with our workaround (as 
described above) applied. But from a logical point of view, as long as the 
locking behaviour within the framework is the same, the contention will remain.
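
As an aside on the workaround itself: the Thread.yield() loop in magicWait keeps a core busy while it waits and never gives up. Below is a minimal sketch of a bounded variant, assuming Java 8 or later. BoundedWait, await and its parameters are hypothetical names chosen for illustration; they are not part of Felix or the OSGi API. The isDone condition stands in for the reflective call to the patched isDone() method.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not part of Felix or the OSGi API. */
final class BoundedWait {

    private BoundedWait() {
    }

    /**
     * Polls isDone until it returns true, sleeping pollMillis between polls.
     * Throws TimeoutException if the condition is still false once
     * timeoutMillis have elapsed.
     */
    static void await(BooleanSupplier isDone, long timeoutMillis, long pollMillis)
            throws InterruptedException, TimeoutException {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!isDone.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeoutMillis + " ms");
            }
            // Sleep instead of Thread.yield() so the waiting thread does not spin.
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the patched isDone(): becomes true after roughly 50 ms.
        final long start = System.nanoTime();
        await(() -> System.nanoTime() - start > TimeUnit.MILLISECONDS.toNanos(50), 5_000, 10);
        System.out.println("done");
    }
}
```

For pa.refreshPackages() specifically, the OSGi specification has the framework fire a FrameworkEvent of type PACKAGES_REFRESHED once the refresh completes, so a FrameworkListener that counts down a latch might avoid polling altogether.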

> High contention (or deadlock) in PackageAdmin and StartLevel 
> -
>
> Key: FELIX-2400
> URL: https://issues.apache.org/jira/browse/FELIX-2400
> Project: Felix
> Issue Type: Bug
> Components: Framework
> Affects Versions: framework-2.0.5
> Environment: Felix 2.0.5
>   java version "1.6.0_12"
>   Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
>   Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)
>   SunOS castor 5.10 Generic_13-06 sun4u sparc SUNW,Sun-Fire-V890
> Reporter: Alexander Berger
>
> Imagine the following code:
> void createProblem(PackageAdmin pa, StartLevel sl, Bundle[] bundles, int level) {
>     for (final Bundle b : bundles) {
>         sl.setBundleStartLevel(b, level);
>     }
>     pa.refreshPackages(null);
>     pa.resolveBundles(null);
> }
> If many bundles have been updated or uninstalled, the code above can 
> produce what looks like a deadlock (see the stack traces below) but is 
> in fact a high-contention problem. On our system (16-core Sun Sparcv9, 
> 64GB) with about 20 bundles (all updated, so the refresh will be busy), 
> runtime performance is very poor: pa.resolveBundles(null) takes about 
> 30 to 60 minutes to return.
> The problem lies in the asynchronous nature of 
> setBundleStartLevel/refreshPackages and the way that Felix uses locking 
> (acquireGlobalLock and acquireBundleLock). For example, the following code 
> works fine (pa.resolveBundles(null) returns within a few seconds) but 
> poses the problem of how to implement "magicWait":
> void createNoProblem(PackageAdmin pa, StartLevel sl, Bundle[] bundles, int level) {
>     for (final Bundle b : bundles) {
>         sl.setBundleStartLevel(b, level);
>     }
>     // wait until the asynchronous sl.setBundleStartLevel logic has finished
>     magicWait(sl);
>     pa.refreshPackages(null);
>     // wait until the asynchronous pa.refreshPackages logic has finished
>     magicWait(pa);
>     pa.resolveBundles(null);
> }
> At the moment I have solved the problem by patching PackageAdminImpl like this 
> (I know this is an ugly solution, but it's only a showcase):
> public boolean isDone() {
>     synchronized (this) {
>         final Bundle[][] tmp = m_reqBundles;
>         return tmp == null || tmp.length == 0;
>     }
> }
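> And implementing magicWait like this:
> void magicWait(final PackageAdmin pa) throws Exception {
>     // isDone() exists only on the patched PackageAdminImpl, hence the reflection
>     final Method method = pa.getClass().getMethod("isDone");
>     method.setAccessible(true);
>     // busy-wait until isDone() reports true
>     while (!(Boolean) method.invoke(pa)) {
>         Thread.yield();
>     }
> }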
> Then I did something similar for StartLevel. 
> For me this patch/workaround is fine for the moment, but I think the problem 
> should be investigated and solved in the Felix framework.
> "FelixPackageAdmin" daemon prio=3 tid=0x0001005ac800 nid=0x1a in Object.wait() [0x4f6fe000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x554000e0> (a [Ljava.lang.Object;)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:4535)
>         - locked <0x554000e0> (a [Ljava.lang.Object;)
>         at org.apache.felix.framework.Felix.refreshPackages(Felix.java:3314)
>         at org.apache.felix.framework.PackageAdminImpl.run(PackageAdminImpl.java:331)
>         at java.lang.Thread.run(Unknown Source)
>    Locked ownable synchronizers:
>         - None
>
> "FelixStartLevel" daemon prio=3 tid=0x000100848000 nid=0x19 in Object.wait() [0x4f8fe000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x554000e0> (a [Ljava.lang.Object;)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.felix.framework.Felix.acquireBundleLock(Felix.java:4462)
>         - locked <0x554000e0> (a [Ljava.lang.Object;)
>         at org.apache.felix.framework.Felix.setBundleStartLevel(Felix.java:1266)
>         at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:270)
>         at java.lang.Thread.run(Unknown Source)
>    Locked ownable synchronizers:
>         - None
>
> "OSKi" prio=3 tid=0x0001006ea800 nid=0x1b runnable [0x4f4fd000]
>

[jira] Commented: (FELIX-2400) High contention (or deadlock) in PackageAdmin and StartLevel

2011-02-21 Thread Richard S. Hall (JIRA)

[ 
https://issues.apache.org/jira/browse/FELIX-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997543#comment-12997543
 ] 

Richard S. Hall commented on FELIX-2400:


Did you ever get around to testing this on a 3.0.x release of the framework?

> High contention (or deadlock) in PackageAdmin and StartLevel 
> -
>
> "OSKi" prio=3 tid=0x0001006ea800 nid=0x1b runnable [0x4f4fd000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.felix.framework.searchpolicy.ResolvedPackage.clone(ResolvedPackage.java:62)
>         at org.apache.felix.framework.searchpolicy.Resolver.isClassSpaceConsistent(Resolver.ja

[jira] Commented: (FELIX-2400) High contention (or deadlock) in PackageAdmin and StartLevel

2010-06-09 Thread Richard S. Hall (JIRA)

[ 
https://issues.apache.org/jira/browse/FELIX-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877087#action_12877087
 ] 

Richard S. Hall commented on FELIX-2400:


It is not clear to me why there would be interference between 
pa.refreshPackages() and pa.resolveBundles(), since both need to acquire the 
global lock and therefore by definition they will be serialized. Of course, 
there is a race condition regarding which will happen first in your scenario.

Regarding the interference with start level, there could be something there, 
and that is tricky. We'd have to investigate it more. This area hasn't changed 
much for framework 3.0, so I'd expect the issue to be the same, but feel free to 
test it.
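
To make the locking discussion a bit more concrete, here is a toy model of a global lock and per-bundle locks that share a single monitor, which is what the stack traces suggest (both threads wait on the same object). This is an illustration under that assumption only, not the Felix implementation: LockModel and its method bodies are made up for the example, and only the names acquireGlobalLock and acquireBundleLock are taken from the traces. The global lock here is deliberately non-reentrant to keep the sketch short.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Toy model only: one monitor guards a global lock and all per-bundle locks. */
final class LockModel {
    private final Object monitor = new Object();
    private final Set<String> lockedBundles = new HashSet<>();
    private Thread globalOwner; // null while the global lock is free

    void acquireGlobalLock() throws InterruptedException {
        synchronized (monitor) {
            while (globalOwner != null) {
                monitor.wait();
            }
            globalOwner = Thread.currentThread();
        }
    }

    void releaseGlobalLock() {
        synchronized (monitor) {
            if (globalOwner != Thread.currentThread()) {
                throw new IllegalStateException("not the global lock owner");
            }
            globalOwner = null;
            monitor.notifyAll(); // wakes every waiter, global and per-bundle alike
        }
    }

    void acquireBundleLock(String bundle) throws InterruptedException {
        synchronized (monitor) {
            // Blocked while the bundle is locked or another thread holds the global lock.
            while (lockedBundles.contains(bundle)
                    || (globalOwner != null && globalOwner != Thread.currentThread())) {
                monitor.wait();
            }
            lockedBundles.add(bundle);
        }
    }

    void releaseBundleLock(String bundle) {
        synchronized (monitor) {
            lockedBundles.remove(bundle);
            monitor.notifyAll();
        }
    }

    public static void main(String[] args) throws Exception {
        final LockModel model = new LockModel();
        final CountDownLatch acquired = new CountDownLatch(1);
        model.acquireGlobalLock(); // plays the role of the refresh thread
        Thread startLevel = new Thread(() -> {
            try {
                model.acquireBundleLock("bundle-a"); // plays the role of the start level thread
                acquired.countDown();
                model.releaseBundleLock("bundle-a");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        startLevel.start();
        boolean early = acquired.await(100, TimeUnit.MILLISECONDS);
        System.out.println("bundle lock taken while global lock held: " + early);
        model.releaseGlobalLock();
        startLevel.join();
        System.out.println("bundle lock taken after release: " + (acquired.getCount() == 0));
    }
}
```

In this model every release calls notifyAll() on the shared monitor, so all waiting threads wake up and re-check their condition each time any lock is released. Whether that accounts for the contention reported here would still need to be verified against the actual framework code.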
