Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-19 Thread David Bosschaert
Thanks Pierre for submitting a unit test to FELIX-4866 that helped me
enormously in identifying the issue.

I have fixed the bug in my code (without degrading performance) and at
least your concurrency test, my concurrency tests and all the
framework unit tests now consistently pass. I would be very interested
in hearing whether your bigger test suite also still behaves as
expected.

Best regards,

David

On 14 May 2015 at 22:53, Pierre De Rop pierre.de...@gmail.com wrote:
 The thread dump did not help.
 I will investigate (it may be a bug somewhere on my side; if that is the
 case, I am sorry for all this noise).

 I hope to let you know soon.

 By the way, do you know how to run the SCR integration tests with the
 framework from the trunk? I know that some of the SCR integration tests
 do load testing, and I would be interested to know whether they also pass
 with the framework from the trunk.

 cheers;
 /Pierre


 On Thu, May 14, 2015 at 10:06 PM, David Bosschaert 
 david.bosscha...@gmail.com wrote:

 Hi Pierre,

 It would indeed be useful to find out more about why your test is
 hanging. Maybe analysing a thread dump might give some more
 information?

 Cheers,

 David

 On 14 May 2015 at 19:54, Pierre De Rop pierre.de...@gmail.com wrote:
  Thanks David; I just gave it a try, and indeed the parallel test passed. I
  observed a gain of around 7 to 10%. The tool is described in [1].
 
  But I only have 4 cores on my laptop, so I will run more tests in my lab
  at work (next week), where we have some servers with 32 or even 128
  processors. This will give a better idea of the gain: the more
  processors you have, the more costly synchronization becomes, so I could
  possibly observe a better performance gain.
 
  Now, I'm sorry, but I think there is still a problem (I don't know
  where): with more threads, the parallel test does not complete and
  stops with a timeout message, indicating that the expected number of
  components was not created within the 1-minute timeout.
 
  So, I just committed a modified version of the tool to the sandbox, which
  now takes a -Dthreads option to configure the number of
  threads. With -Dthreads=4, it's OK. But with -Dthreads=10, the test does
  not complete and ends with a timeout:
 
  $ java -Dthreads=10 -server -jar bin/felix.jar
 
  g! Starting benchmarks (each tested bundle will add/remove 630 components
  during bundle activation).
 
  [Starting benchmarks with no processing done in components start
  methods]
 
  Benchmarking bundle:
  org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel
  .Could not start components timely: current start latch=2, stop latch=630
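
For readers who want to reproduce a -Dthreads style switch, here is a minimal sketch of how a Java system property can size a fork-join pool. It is not taken from the benchmark tool: the class name, the method names and the default of 4 are assumptions made only for illustration.

```java
import java.util.concurrent.ForkJoinPool;

final class BenchmarkThreads {
    // Hypothetical default, used when -Dthreads is absent or invalid.
    static final int DEFAULT_THREADS = 4;

    /** Reads the "threads" system property (set with -Dthreads=N). */
    static int configuredThreads() {
        // Integer.getInteger returns the default if the property is missing or not a number.
        int n = Integer.getInteger("threads", DEFAULT_THREADS);
        return n > 0 ? n : DEFAULT_THREADS;
    }

    /** Creates a pool whose parallelism follows the configured thread count. */
    static ForkJoinPool newPool() {
        return new ForkJoinPool(configuredThreads());
    }

    public static void main(String[] args) {
        ForkJoinPool pool = newPool();
        System.out.println("parallelism=" + pool.getParallelism());
        pool.shutdown();
    }
}
```

Running this with `java -Dthreads=10 BenchmarkThreads` would build a pool with parallelism 10; without the option it falls back to the assumed default.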
 
  My current understanding is that some components are still waiting
  for unsatisfied service dependencies, as if a service tracker had
  missed a service registration.
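
The "start latch=2" in the timeout message suggests a countdown latch that never reaches zero. The sketch below shows that general pattern with java.util.concurrent.CountDownLatch; it is an assumption about the mechanism, not the benchmark's actual code, and the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class StartLatchDemo {
    /**
     * Waits until the latch reaches zero or the timeout expires.
     * Returns the number of components still missing (0 means all started).
     */
    static long awaitStarted(CountDownLatch startLatch, long timeout, TimeUnit unit) {
        try {
            return startLatch.await(timeout, unit) ? 0 : startLatch.getCount();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return startLatch.getCount();
        }
    }

    public static void main(String[] args) {
        int components = 630;
        CountDownLatch startLatch = new CountDownLatch(components);
        // Simulate two components whose dependency never arrives.
        for (int i = 0; i < components - 2; i++) {
            startLatch.countDown();
        }
        long missing = awaitStarted(startLatch, 100, TimeUnit.MILLISECONDS);
        if (missing > 0) {
            System.out.println("Could not start components timely: start latch=" + missing);
        }
    }
}
```

A component that never sees its dependency never counts down, so the remaining count is a direct measure of how many registrations were missed.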
 
  I ran the same test for two hours with the previous framework version
  and did not observe any problems.
 
  I wonder whether someone else has another tool for running a different
  kind of load test, just to see whether the same problems are observed.
 
  On my side, I will do the following: in the past, the benchmark tool
  supported not only dependencymanager but also Felix SCR and iPojo. So I
  will reintroduce Felix SCR into the benchmark and check whether I also
  observe the problem (with -Dthreads=10).
 
  I will let you know.
 
  cheers;
  /Pierre
 
  [1] http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/README
 

Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-19 Thread Pierre De Rop
Yes David, indeed our scenarios are different. In mine, I'm measuring the
assembly of components using parallel dependency manager where components
are concurrently activated, registered, and bound with each other, using a
shared thread pool.

cheers;
/Pierre




On Tue, May 19, 2015 at 3:40 PM, David Bosschaert 
david.bosscha...@gmail.com wrote:

 Hi Pierre,

 Good to hear that the problem is now gone.
 I guess the performance improvement measured hugely depends on what
 you are testing. My test focuses on multiple clients/threads/bundles
 accessing the same service (either singleton or PSF) in a very raw
 manner (via ctx.getServiceReference()).
 Good to hear that you're still seeing perf improvements but I guess
 your test exercises a number of other components as well (e.g.
 Dependency Manager) possibly using multiple service registrations, so
 that could very well explain some of the differences in our results...
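
To make the "raw" access pattern concrete, here is a minimal plain-Java sketch of many threads looking up the same entry at once. It deliberately does not use the OSGi API: the ConcurrentHashMap is only a hypothetical stand-in for a service registry, and all names are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class RawLookupLoad {
    /** Runs 'threads' workers that each look up the same key 'iterations' times. */
    static long run(int threads, int iterations) {
        // Hypothetical stand-in for a registry: one shared entry read by every worker.
        Map<String, Object> registry = new ConcurrentHashMap<>();
        registry.put("my.Service", new Object());

        LongAdder hits = new LongAdder();
        CountDownLatch go = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    go.await(); // start all workers together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < iterations; i++) {
                    if (registry.get("my.Service") != null) {
                        hits.increment();
                    }
                }
            });
        }
        go.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return hits.sum();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long hits = run(4, 100_000);
        System.out.println(hits + " lookups in " + (System.nanoTime() - start) + " ns");
    }
}
```

A test of this shape stresses only the lookup path, which is one reason its numbers can differ from a benchmark that also registers, tracks and binds components.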

 Cheers,

 David


Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-19 Thread Pierre De Rop
Hi David,

Excellent.

I'm glad to confirm that the issue is resolved, and my DM loader is now
running seamlessly.
I'm observing an overall gain of 16% compared to the previous 5.0.0
(but this has to be taken with care, because I only made a quick test).

I have not had time yet, but I expect I would observe a better performance
gain on a bigger host with more CPUs (I only have four), since synchronization
cost usually grows with the number of available cores and, as I
understand it, your fix is now based on the JDK's java.util.concurrent tools.


many thanks
/Pierre


Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-14 Thread Pierre De Rop
I just committed the benchmark tool in
http://svn.apache.org/viewvc/felix/sandbox/pderop/loadtest/, if you can
take a look.

To run the scenario:

- install jdk8:

[nxuser@nx0012 pderop]$ java -version
java version 1.8.0_40
Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)

- check out the loadtest from
http://svn.apache.org/viewvc/felix/sandbox/pderop/loadtest/

- go to the loadtest directory and start the test, just like this:

$ java -server -jar bin/felix.jar
Welcome to Apache Felix Gogo

g! Starting benchmarks (each tested bundle will add/remove 630 components
during bundle activation).

[Starting benchmarks with no processing done in components start
methods]

Benchmarking bundle:
org.apache.felix.dependencymanager.benchmark.dependencymanager
..
- results in nanos: [139,129,744 | 143,957,687 | 152,157,581 | 319,631,722
| 919,838,078]

Benchmarking bundle:
org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel .


Here, the first
org.apache.felix.dependencymanager.benchmark.dependencymanager test
(single-threaded) passes OK. But the next one hangs
(org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel).
It uses a fork-join pool of size 4.

When typing log warn, we see:

log warn

2015.05.14 13:56:10 ERROR - Bundle:
org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel -
[ForkJoinPool-1-worker-3] Error processing tasks -
java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
    at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
    at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
    at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:245)
    at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:212)
    at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:189)
    at org.apache.felix.framework.ServiceRegistry.getServiceReferences(ServiceRegistry.java:269)
    at org.apache.felix.framework.Felix.getServiceReferences(Felix.java:3577)
    at org.apache.felix.framework.Felix.getAllowedServiceReferences(Felix.java:3655)
    at org.apache.felix.framework.BundleContextImpl.getServiceReferences(BundleContextImpl.java:434)
    at org.apache.felix.dm.tracker.ServiceTracker.getInitialReferences(ServiceTracker.java:422)
    at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:375)
    at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:319)
    at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:295)
    at org.apache.felix.dm.impl.ServiceDependencyImpl.start(ServiceDependencyImpl.java:226)
    at org.apache.felix.dm.impl.ComponentImpl.startDependencies(ComponentImpl.java:657)
    at org.apache.felix.dm.impl.ComponentImpl.performTransition(ComponentImpl.java:535)
    at org.apache.felix.dm.impl.ComponentImpl.handleChange(ComponentImpl.java:492)
    at org.apache.felix.dm.impl.ComponentImpl.access$5(ComponentImpl.java:482)
    at org.apache.felix.dm.impl.ComponentImpl$3.run(ComponentImpl.java:227)
    at org.apache.felix.dm.impl.DispatchExecutor.runTask(DispatchExecutor.java:182)
    at org.apache.felix.dm.impl.DispatchExecutor.run(DispatchExecutor.java:165)
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1689)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
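
The top frames of the trace show a HashMap key iterator detecting a modification while it is being walked. The sketch below reproduces that fail-fast behaviour deterministically in a single thread and contrasts it with ConcurrentHashMap, whose iterators are weakly consistent. It only illustrates the exception; it makes no claim about where the framework modified its map or how the fix was done. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class CmeDemo {
    /** Iterates the key set while adding a new key; reports whether the iterator failed fast. */
    static boolean failsFast(Map<String, Integer> map) {
        for (int i = 0; i < 16; i++) {
            map.put("k" + i, i);
        }
        List<String> copy = new ArrayList<>();
        try {
            for (String key : map.keySet()) {
                copy.add(key);
                // The first call adds a new key, which is a structural modification.
                map.put("extra", 0);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap fails fast: " + failsFast(new HashMap<>()));
        System.out.println("ConcurrentHashMap fails fast: " + failsFast(new ConcurrentHashMap<>()));
    }
}
```

With several threads the same check fires only when the timing happens to line up, which is consistent with a failure that shows up under parallel load and not in the single-threaded run.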


(I will also investigate my own code, to check that the problem does not
come from me.)

cheers;
/Pierre


On Thu, May 14, 2015 at 1:47 PM, Pierre De Rop pierre.de...@gmail.com
wrote:

 Hi David,

 I don't know if it's me (a bug in my benchmark tool) or if there is a
 regression somewhere in the framework, but my parallel test does not pass
 anymore.

 The test first starts with a single-threaded scenario, which passes OK
 (org.apache.felix.dependencymanager.benchmark.dependencymanager), then when
 the parallel test starts
 (org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel)
 it suddenly hangs, and when I type log warn under the gogo shell, I see
 the following exception:

 (I'm using java8):

 $ java -server -Xmx4g -Xms4g -jar bin/felix.jar
 
 Welcome to Apache Felix Gogo

 Benchmarking bundle:
 org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel .

 (here, the dependencymanager.parallel test hangs and when I type log
 warn, I see this:)

 g! log warn
 2015.05.14 13:31:03 ERROR - 

Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-14 Thread David Bosschaert
Hi Pierre,

I'll take a look today.

Cheers,

David


Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-14 Thread David Bosschaert
I think I know what this is. I had some additional changes exactly in
this area that I simply forgot to apply this morning. I should have it
fixed sometime today.

Cheers,

David


Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-14 Thread David Bosschaert
I've fixed this now in svn.apache.org/viewvc?view=revision&revision=1679367

Pierre, your loadtest now runs to completion - thanks for reporting
this issue! I can see that the results for the parallel tests are a
little bit different than before, but I'm not sure how to read them so
I'll leave the interpretation of that to you :)

Cheers,

David

On 14 May 2015 at 14:38, David Bosschaert david.bosscha...@gmail.com wrote:
 I think I know what this is. I had some additional changes exactly in
 this area that I simply forgot to apply this morning. I should have it
 fixed sometime today.

 Cheers,

 David

 On 14 May 2015 at 14:03, David Bosschaert david.bosscha...@gmail.com wrote:
 Hi Pierre,

 I'll take a look today.

 Cheers,

 David

 On 14 May 2015 at 14:00, Pierre De Rop pierre.de...@gmail.com wrote:
 I just committed the benchmark tool in
 http://svn.apache.org/viewvc/felix/sandbox/pderop/loadtest/, if you can
 take a look.

 To run the scenario:

 - install jdk8:

 [nxuser@nx0012 pderop]$ java -version
 java version 1.8.0_40
 Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
 Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)

 - checkout the loadtest from
 http://svn.apache.org/viewvc/felix/sandbox/pderop/loadtest/

 - go the the loadtest directory and start the test, just like this:

 $ java -server -jar bin/felix.jar
 Welcome to Apache Felix Gogo

 g! Starting benchmarks (each tested bundle will add/remove 630 components
 during bundle activation).

 [Starting benchmarks with no processing done in components start
 methods]

 Benchmarking bundle:
 org.apache.felix.dependencymanager.benchmark.dependencymanager
 ..
 - results in nanos: [139,129,744 | 143,957,687 | 152,157,581 | 319,631,722
 | 919,838,078]

 Benchmarking bundle:
 org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel .


 Here, the first test
 (org.apache.felix.dependencymanager.benchmark.dependencymanager,
 single-threaded) passes OK. But the next one
 (org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel)
 hangs. It uses a fork join pool with size=4.

 and when typing log warn, we see:

 log warn

 2015.05.14 13:56:10 ERROR - Bundle: org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel - [ForkJoinPool-1-worker-3] Error processing tasks - java.util.ConcurrentModificationException
 at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
 at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
 at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
 at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:245)
 at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:212)
 at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:189)
 at org.apache.felix.framework.ServiceRegistry.getServiceReferences(ServiceRegistry.java:269)
 at org.apache.felix.framework.Felix.getServiceReferences(Felix.java:3577)
 at org.apache.felix.framework.Felix.getAllowedServiceReferences(Felix.java:3655)
 at org.apache.felix.framework.BundleContextImpl.getServiceReferences(BundleContextImpl.java:434)
 at org.apache.felix.dm.tracker.ServiceTracker.getInitialReferences(ServiceTracker.java:422)
 at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:375)
 at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:319)
 at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:295)
 at org.apache.felix.dm.impl.ServiceDependencyImpl.start(ServiceDependencyImpl.java:226)
 at org.apache.felix.dm.impl.ComponentImpl.startDependencies(ComponentImpl.java:657)
 at org.apache.felix.dm.impl.ComponentImpl.performTransition(ComponentImpl.java:535)
 at org.apache.felix.dm.impl.ComponentImpl.handleChange(ComponentImpl.java:492)
 at org.apache.felix.dm.impl.ComponentImpl.access$5(ComponentImpl.java:482)
 at org.apache.felix.dm.impl.ComponentImpl$3.run(ComponentImpl.java:227)
 at org.apache.felix.dm.impl.DispatchExecutor.runTask(DispatchExecutor.java:182)
 at org.apache.felix.dm.impl.DispatchExecutor.run(DispatchExecutor.java:165)
 at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1689)
 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
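
For readers unfamiliar with this exception: the top frames (a HashMap key iterator failing inside AbstractCollection.addAll) suggest that a HashMap was structurally modified while its keys were being copied. The following is only a minimal sketch of that failure mode, not the Felix code; the class name CmeDemo and the map contents are made up for illustration. It triggers the exception deterministically in a single thread by modifying the map halfway through the copy, standing in for a second thread, and runs the same steps against a ConcurrentHashMap, whose iterators are weakly consistent and do not throw.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; it is not part of Felix.
final class CmeDemo {

    // Copies the keys of 'map' into a list and adds a new key halfway through,
    // standing in for a second thread that modifies the map during the copy.
    // Returns true if the iterator detected the modification.
    static boolean failsWhenModifiedDuringCopy(Map<String, Integer> map) {
        for (int i = 0; i < 10; i++) {
            map.put("service-" + i, i);
        }
        List<String> copy = new ArrayList<>();
        try {
            for (String key : map.keySet()) {
                copy.add(key);
                if (copy.size() == 5) {
                    map.put("late-registration", 99); // structural modification
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap fails: "
                + failsWhenModifiedDuringCopy(new HashMap<>()));
        System.out.println("ConcurrentHashMap fails: "
                + failsWhenModifiedDuringCopy(new ConcurrentHashMap<>()));
    }
}
```

With real threads the failure is timing-dependent, which would be consistent with a problem that only shows up in the parallel test.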


 (I will also investigate my own code to check whether the problem comes
 from my side.)

 cheers;
 /Pierre


 On Thu, May 14, 2015 at 1:47 PM, Pierre De Rop 

Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-14 Thread Pierre De Rop
Thanks David; I just gave it a try, and indeed the parallel test passed. I
observed a gain of around 7/10%. The tool is described in [1].

But I only have 4 cores on my laptop, so I will run more tests in my lab
at work (next week), where we have servers with 32 or even 128
processors. That will give a better idea of the gain: the more
processors you have, the more costly synchronization is, so I could possibly
observe a larger performance gain.

Now, I'm sorry, but I think there is still a problem (I don't know
where): when using more threads, the parallel test does not complete and
stops with a timeout message, indicating that the expected number of
components was not created within the timeout delay of 1 minute.

So I just committed a modified version of the tool in the sandbox, which
can now take a -Dthreads option to configure the number of
threads. With -Dthreads=4, it's OK. But with -Dthreads=10, the test does
not complete and ends with a timeout:

$ java -Dthreads=10 -server -jar bin/felix.jar

g! Starting benchmarks (each tested bundle will add/remove 630 components
during bundle activation).

[Starting benchmarks with no processing done in components start
methods]

Benchmarking bundle:
org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel
.Could not start components
timely: current start latch=2, stop latch=630
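
As an aside for readers: a -Dthreads=N option is a plain JVM system property. The sketch below shows one way such a property could size a fork join pool. It is an assumption about how the option might be read, not the benchmark tool's actual code; the class name ThreadsOption and the fallback of 4 (the pool size mentioned earlier in the thread) are chosen for illustration.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical sketch; the real benchmark tool may read the option differently.
final class ThreadsOption {

    // Reads the "threads" system property (set with -Dthreads=N), falling back
    // to 'defaultSize' when it is absent, not a number, or not positive.
    static int poolSize(int defaultSize) {
        int n = Integer.getInteger("threads", defaultSize);
        return n > 0 ? n : defaultSize;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(poolSize(4));
        System.out.println("parallelism=" + pool.getParallelism());
        pool.shutdown();
    }
}
```

Running it with java -Dthreads=10 would create a pool with parallelism 10.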

My current understanding is that some components are still waiting
for unsatisfied service dependencies, as if a service tracker had
missed a service registration.
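
To make the "start latch=2" message easier to read, here is a minimal sketch of a latch-based completion check with a timeout. It assumes that the reported value is the number of components that had not yet started when the timeout expired; the class and method names are hypothetical and this is not the benchmark's actual code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a latch-based completion check.
final class StartLatchDemo {

    // Waits until 'started' reaches zero or the timeout expires.
    // Returns the number of components still missing (0 means all started).
    static long awaitStart(CountDownLatch started, long timeout, TimeUnit unit) {
        try {
            if (started.await(timeout, unit)) {
                return 0;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return started.getCount();
    }

    public static void main(String[] args) {
        int components = 630;
        CountDownLatch started = new CountDownLatch(components);
        // Simulate every component starting except two whose dependency never arrives.
        for (int i = 0; i < components - 2; i++) {
            started.countDown();
        }
        long missing = awaitStart(started, 100, TimeUnit.MILLISECONDS);
        if (missing > 0) {
            System.out.println("Could not start components timely: start latch=" + missing);
        }
    }
}
```

Under that assumption, a remaining count of 2 out of 630 would mean that almost all components started and only a couple never saw their dependency.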

I ran the same test for two hours with the previous framework version
and did not observe any problems.

I wonder whether someone else has another tool for running a different
kind of load test, just to see whether problems are observed there too.

- From my side, I will do the following: in the past, the benchmark tool
supported not only dependencymanager but also Felix SCR and iPojo. So I
will reintroduce Felix SCR in the benchmark and check whether I also
observe the problem (with -Dthreads=10).

I will let you know.

cheers;
/Pierre

[1]
http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/README

Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-14 Thread Pierre De Rop
Oops, I just realised that it makes no sense to include the SCR test
bundle in the benchmark tool, since it's not a concurrent test.

So for now, I have clues about where the problem may come from.


cheers;
/Pierre

Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-14 Thread David Bosschaert
Hi Pierre,

It would indeed be useful to find out more about why your test is
hanging. Maybe analysing a thread dump would give some more
information?
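
For reference, a thread dump can be taken from outside the JVM (for example with jstack) or from inside it. The sketch below uses the standard java.lang.management API; the class name ThreadDump is made up for illustration. Note that ThreadInfo.toString() prints only the first few stack frames of each thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class for taking an in-process thread dump.
final class ThreadDump {

    // Returns a textual dump of all live threads, including held monitors
    // and ownable synchronizers, followed by a deadlock summary line.
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append(info); // state, lock owner and a stack excerpt
        }
        long[] deadlocked = threads.findDeadlockedThreads(); // null when none found
        out.append("deadlocked threads: ")
           .append(deadlocked == null ? 0 : deadlocked.length)
           .append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

A dump like this shows threads that are blocked or deadlocked. A notification that was never delivered leaves no thread blocked, so that kind of problem would not be visible in it.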

Cheers,

David

Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-14 Thread Pierre De Rop
The thread dump did not help.
I will investigate (maybe there is a bug somewhere on my side; if that is
the case, I apologise for all this noise).

I hope to let you know soon.

By the way, do you know how to run the SCR integration tests with the
framework from trunk? I know that some SCR integration tests do load
testing, and I would be interested to know whether they also pass with
the framework from trunk.

cheers;
/Pierre


Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-14 Thread Pierre De Rop
OK, it's a bit late; I will continue tomorrow.

What I just found is that when the test fails, we are in the following
situation:
A DM component C1 that is part of the test remains inactive because it is
waiting for a service dependency on C2.
But C2 is actually registered in the OSGi service registry (I verified this
using the "inspect capability service" gogo command).

And it looks like the addingService(C2) callback of the service tracker
used by C1 to track C2 has never been called.
That is why C1 remains inactive, which makes the test fail (I added
some debug code in dependency manager to verify this).
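
To illustrate the symptom only (the thread does not establish where the actual bug was), here is a self-contained sketch of how a tracker can miss a registration when the initial lookup and the listener registration race with a concurrent register call. Registry and Tracker are hypothetical miniature classes, not the OSGi or Felix ones, and the race is simulated deterministically with a callback that runs between the two steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical miniature registry and tracker, for illustration only.
final class TrackerSketch {

    static final class Registry {
        private final Set<String> services = ConcurrentHashMap.newKeySet();
        private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

        void register(String service) {
            services.add(service);
            for (Consumer<String> l : listeners) {
                l.accept(service); // only listeners present at this moment are notified
            }
        }

        void addListener(Consumer<String> l) { listeners.add(l); }

        List<String> snapshot() { return new ArrayList<>(services); }
    }

    static final class Tracker {
        private final Set<String> tracked = ConcurrentHashMap.newKeySet();

        // Stand-in for addingService(); the set makes duplicate deliveries harmless.
        void adding(String service) { tracked.add(service); }

        // Listener first, snapshot second: a service registered between the two
        // steps is delivered by the listener, the snapshot, or both.
        void open(Registry registry, Runnable betweenSteps) {
            registry.addListener(this::adding);
            betweenSteps.run(); // simulates a registration racing with open()
            registry.snapshot().forEach(this::adding);
        }

        // Snapshot first, listener second: a registration in the gap is lost.
        void openRacy(Registry registry, Runnable betweenSteps) {
            List<String> initial = registry.snapshot();
            betweenSteps.run();
            registry.addListener(this::adding);
            initial.forEach(this::adding);
        }

        boolean tracks(String service) { return tracked.contains(service); }
    }

    public static void main(String[] args) {
        Registry r1 = new Registry();
        Tracker safe = new Tracker();
        safe.open(r1, () -> r1.register("C2"));

        Registry r2 = new Registry();
        Tracker racy = new Tracker();
        racy.openRacy(r2, () -> r2.register("C2"));

        System.out.println("safe tracker sees C2: " + safe.tracks("C2"));
        System.out.println("racy tracker sees C2: " + racy.tracks("C2"));
    }
}
```

In the racy variant C2 ends up registered in the registry while the tracker never hears about it, which matches the observed state. The same outcome could equally arise on the registry side, for instance if a lookup or an event dispatch briefly sees an inconsistent view of the registered services.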

So it will be difficult to write an integration test, but I think there is
still a problem somewhere in the framework.
I also ran the DM tests and I now have 4 failing tests.

I will continue to investigate tomorrow if I can.

cheers;
/Pierre



Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-14 Thread Pierre De Rop
the threadump did not help.
I will  investigate (may be a bug somewhere in my part; if this is the
case, I would be sorry to make all this noise).

hope to let you know soon.

by the way, do you know how to run the SCR integration tests with the
framework from the trunk ? I know that there are some SCR integration tests
that are doing some load tests, and I would be interested to know if they
are also ok with the framework from the trunk ?

cheers;
/Pierre


On Thu, May 14, 2015 at 10:06 PM, David Bosschaert 
david.bosscha...@gmail.com wrote:

 Hi Pierre,

 It would indeed be useful to find out more about why your test is
 hanging. Maybe analysing a threaddump might give some more
 information?

 Cheers,

 David

 On 14 May 2015 at 19:54, Pierre De Rop pierre.de...@gmail.com wrote:
  Thanks David; I just gave a try, and indeed the parallel test passed. I
  observed a gain of around 7/10%. The tool is described in [1].
 
  But I only have 4 cores on my laptop and I will make more tests in my lab
  at work (next week) where we have some servers having 32 or even 128
  processors. This will give a better idea of the gain because the more
  processor you have, the more synchronization is costly, so I could
 possibly
  observe a better performance gain.
 
  Now, I'm sorry but I think that there is still a problem (I don't know
  where): when using more threads, the parallel test does not complete and
  stops with a timeout message, indicating that the number of expected
  components are not created after a timeout delay of 1 minute.
 
  So, I just committed a modified version of the tool in the sandbox which
  can now take a -Dthreads option in order to configure the number of
  threads. With -Dthreads=4, its OK. But with -Dthreads=10, then test does
  not complete and ends with a timeout:
 
  $ java -Dthreads=10 -server -jar bin/felix.jar
 
  g! Starting benchmarks (each tested bundle will add/remove 630 components
  during bundle activation).
 
  [Starting benchmarks with no processing done in components start
  methods]
 
  Benchmarking bundle:
  org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel
  .Could not start
 components
  timely: current start latch=2, stop latch=630
 
  My current understanding of this is that some components are still
 awaiting
  for unsatisfied service dependencies, just like if a service tracker
 would
  have missed a service registration.
 
  I ran the same test during two hours with the previous framework version,
  and did not observe any problems.
 
  I wonder if someone else do have another tool in order to perform another
  kind of load test, just to see if some problems are also observed.
 
  - from  my side, I will do the following: in the past, the benchmark
 tool
  supported not only dependencymanager, but also Felix SCR and iPojo. So, I
  will reintroduce Felix SCR in the benchmark and will check if I also
  observe the problem (with -Dthreads=10).
 
  I will let you know.
 
  cheers;
  /Pierre
 
  [1]
 
 http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/README
 
  On Thu, May 14, 2015 at 3:41 PM, David Bosschaert 
  david.bosscha...@gmail.com wrote:
 
  I've fixed this now in
  svn.apache.org/viewvc?view=revisionrevision=1679367
 
  Pierre, your loadtest now runs to completion - thanks for reporting
  this issue! I can see that the results for the parallel tests are a
  little bit different than before, but I'm not sure how to read them so
  I'll leave the interpretation of that to you :)
 
  Cheers,
 
  David
 
  On 14 May 2015 at 14:38, David Bosschaert david.bosscha...@gmail.com
  wrote:
   I think I know what this is. I had some additional changes exactly in
   this area that I simply forgot to apply this morning. I should have it
   fixed sometime today.
  
   Cheers,
  
   David
  
   On 14 May 2015 at 14:03, David Bosschaert david.bosscha...@gmail.com
 
  wrote:
   Hi Pierre,
  
   I'll take a look today.
  
   Cheers,
  
   David
  
   On 14 May 2015 at 14:00, Pierre De Rop pierre.de...@gmail.com
 wrote:
   I just committed the benchmark tool in
   http://svn.apache.org/viewvc/felix/sandbox/pderop/loadtest/, if you
  can
   take a look.
  
   To run the scenario:
  
   - install jdk8:
  
   [nxuser@nx0012 pderop]$ java -version
   java version 1.8.0_40
   Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
   Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
  
   - checkout the loadtest from
   http://svn.apache.org/viewvc/felix/sandbox/pderop/loadtest/
  
   - go to the loadtest directory and start the test, just like this:
  
   $ java -server -jar bin/felix.jar
   Welcome to Apache Felix Gogo
  
   g! Starting benchmarks (each tested bundle will add/remove 630
  components
   during bundle activation).
  
   [Starting benchmarks with no processing done in components
  start
   methods]
  
   

Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-14 Thread Pierre De Rop
Hi David,

I don't know if it's me (a bug in my benchmark tool) or if there is a
regression somewhere in the framework, but my parallel test does not pass
anymore.

The test first starts with a single-threaded scenario, which passes OK
(org.apache.felix.dependencymanager.benchmark.dependencymanager), then when
the parallel test starts
(org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel)
it suddenly hangs, and when I type log warn under the gogo shell, I see
the following exception:

(I'm using java8):

$ java -server -Xmx4g -Xms4g -jar bin/felix.jar

Welcome to Apache Felix Gogo

Benchmarking bundle:
org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel .

(here, the dependencymanager.parallel test hangs and when I type log
warn, I see this:)

g! log warn
2015.05.14 13:31:03 ERROR - Bundle:
org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel -
[ForkJoinPool-1-worker-3] Error processing tasks -
java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
    at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
    at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
    at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:245)
    at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:212)
    at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:189)
    at org.apache.felix.framework.ServiceRegistry.getServiceReferences(ServiceRegistry.java:269)
    at org.apache.felix.framework.Felix.getServiceReferences(Felix.java:3577)
    at org.apache.felix.framework.Felix.getAllowedServiceReferences(Felix.java:3655)
    at org.apache.felix.framework.BundleContextImpl.getServiceReferences(BundleContextImpl.java:434)
    at org.apache.felix.dm.tracker.ServiceTracker.getInitialReferences(ServiceTracker.java:422)
    at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:375)
    at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:319)
    at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:295)
    at org.apache.felix.dm.impl.ServiceDependencyImpl.start(ServiceDependencyImpl.java:226)
    at org.apache.felix.dm.impl.ComponentImpl.startDependencies(ComponentImpl.java:657)
    at org.apache.felix.dm.impl.ComponentImpl.performTransition(ComponentImpl.java:535)
    at org.apache.felix.dm.impl.ComponentImpl.handleChange(ComponentImpl.java:492)
    at org.apache.felix.dm.impl.ComponentImpl.access$5(ComponentImpl.java:482)
    at org.apache.felix.dm.impl.ComponentImpl$3.run(ComponentImpl.java:227)
    at org.apache.felix.dm.impl.DispatchExecutor.runTask(DispatchExecutor.java:182)
    at org.apache.felix.dm.impl.DispatchExecutor.run(DispatchExecutor.java:165)
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1689)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

(If I configure my threadpool to 1, I have no problems, but with
threadpool=4, then I have the problem)
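
For readers unfamiliar with this exception, here is a minimal sketch of the general failure mode the trace points at: a HashMap key iterator notices that the map was structurally modified after the iterator was created. This is not the CapabilitySet code and says nothing about the actual fix; the class CmeSketch and its method are hypothetical names made up for the illustration, and the modification is done from the same thread so that the behaviour is deterministic.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; it is not part of Felix.
class CmeSketch {
    /**
     * Fills the map, then adds one new key while iterating over keySet().
     * Returns true if the iterator threw ConcurrentModificationException.
     */
    static boolean failsWhenMutatedDuringIteration(Map<String, Integer> map) {
        for (int i = 0; i < 10; i++) {
            map.put("k" + i, i);
        }
        try {
            for (String key : map.keySet()) {
                // The first put adds a new key (a structural change);
                // later puts only overwrite the same key.
                map.put("extra", key.length());
            }
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap fails: "
            + failsWhenMutatedDuringIteration(new HashMap<String, Integer>()));
        System.out.println("ConcurrentHashMap fails: "
            + failsWhenMutatedDuringIteration(new ConcurrentHashMap<String, Integer>()));
    }
}
```

The HashMap iterator is fail-fast, so the first call returns true, while the ConcurrentHashMap iterator is weakly consistent and the second call returns false. With several worker threads the same check can trip whenever one thread iterates while another registers or unregisters, which would fit the observation that a pool of size 1 shows no problem.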

I will investigate, but ideally it would be helpful if you could also run
the test yourself, so I will soon commit something to my sandbox to
reproduce the problem.

cheers;
/Pierre

On Thu, May 14, 2015 at 11:11 AM, David Bosschaert 
david.bosscha...@gmail.com wrote:

 I've committed this now in
 http://svn.apache.org/viewvc?view=revision&revision=1679327

 Curious to see what others are measuring. My tests were focused on
 multiple bundles/threads obtaining the same service, as that's were I
 saw a bit of contention.

 Cheers,

 David

 On 13 May 2015 at 15:10, Pierre De Rop pierre.de...@gmail.com wrote:
  Hi David,
 
  I'm looking forward to test your improvements using the dependencymanager
  benchmark tool ([1]).
 
 
  [1]
 
 http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/
 
  /Pierre
 
  On Wed, May 13, 2015 at 3:02 PM, David Bosschaert 
  david.bosscha...@gmail.com wrote:
 
  I have implemented the performance improvements that I was thinking of
  using Java 5 concurrency tools, they can be viewed at [1].
 
  I wrote a little performance test suite [2] that tests multithreaded
  service registry performance (10 threads) from single / multiple
  bundles with either singleton services and Prototype Service Factory
  services and the results are quite impressive. I'm getting performance
  improvements compared to the current trunk from 8 times better than
  the original (800%) to more than 30 times better 

Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-14 Thread David Bosschaert
I've committed this now in
http://svn.apache.org/viewvc?view=revision&revision=1679327

Curious to see what others are measuring. My tests were focused on
multiple bundles/threads obtaining the same service, as that's where I
saw a bit of contention.

Cheers,

David

On 13 May 2015 at 15:10, Pierre De Rop pierre.de...@gmail.com wrote:
 Hi David,

 I'm looking forward to test your improvements using the dependencymanager
 benchmark tool ([1]).


 [1]
 http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/

 /Pierre

 On Wed, May 13, 2015 at 3:02 PM, David Bosschaert 
 david.bosscha...@gmail.com wrote:

 I have implemented the performance improvements that I was thinking of
 using Java 5 concurrency tools, they can be viewed at [1].

 I wrote a little performance test suite [2] that tests multithreaded
 service registry performance (10 threads) from single / multiple
 bundles with either singleton services and Prototype Service Factory
 services and the results are quite impressive. I'm getting performance
 improvements compared to the current trunk from 8 times better than
 the original (800%) to more than 30 times better (3000%).

 Carsten has already reviewed the code (thanks Carsten!) and I'm
 planning to commit it to Felix tomorrow if nobody objects.

 Cheers,

 David

 [1]
 https://github.com/bosschaert/felix/commit/e6a1b06c6e66d9c98e6d81b91ef7003c8e725450
 [2]
 https://github.com/bosschaert/coderthoughts/tree/master/service-registry-perftest/srperf

 On 23 March 2015 at 15:39, Richard S. Hall he...@ungoverned.org wrote:
  On 3/23/15 10:17 , David Bosschaert wrote:
 
  On 23 March 2015 at 13:39, Richard S. Hall he...@ungoverned.org
 wrote:
 
  On 3/23/15 03:55 , Guillaume Nodet wrote:
 
  There's a call to interrupt() in Felix#acquireBundleLock(), not sure
 if
  it
  can be the culprit though.
  Interrupts could also be caused by a bundle being shutdown while one
 of
  its
  thread is waiting for a service, which should is a valid use case
 imho.
  Anyway, I think sanely reacting to a thread being interrupted would be
  good.
 
 
  Yes, threads can be interrupted if they are holding a bundle lock and
 the
  global lock holder needs the bundle lock.
 
  I admit that I do not recall why we ignore the interrupt here, but
 didn't
  we
  implement service lookup so that a bundle lock wasn't necessary? I
  thought
  we just checked for the validity of the bundle context before returning
  or
  something. Perhaps we felt there was no reason to be interrupted in
 that
  case. I really don't know.
 
  I think that the Service Registry could be rewritten to be completely
  free of synchronized blocks using the Java 5 concurrency libraries,
 
 
  Well, that just moves the sync blocks to the library, but yeah sure.
 
  which I think would really be a better approach. There is too much
  locking going on in the current SR implementation IMHO.
 
 
  I don't really think there is too much, but it is complicated.
  Unfortunately, it is complicated to make sure that locks aren't held
 while
  do service lookups and this is complicated because you can run into
 cycles,
  etc.
 
  But feel free to try to simplify it.
 
 
  This brings the question: can we move to Java 5 (or Java 6) for the
  Framework codebase? AFAIK we're currently still JDK 1.4 compatible but
  I would be surprised if there is anyone who still needs a JDK that
  went end-of-life 7 years ago.
 
 
  At this point, it doesn't really matter to me.
 
  - richard
 
 
  Best regards,
 
  David
 
 



Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-13 Thread Pierre De Rop
Hi David,

I'm looking forward to testing your improvements using the dependencymanager
benchmark tool ([1]).


[1]
http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/

/Pierre

On Wed, May 13, 2015 at 3:02 PM, David Bosschaert 
david.bosscha...@gmail.com wrote:

 I have implemented the performance improvements that I was thinking of
 using Java 5 concurrency tools, they can be viewed at [1].

 I wrote a little performance test suite [2] that tests multithreaded
 service registry performance (10 threads) from single / multiple
 bundles with either singleton services and Prototype Service Factory
 services and the results are quite impressive. I'm getting performance
 improvements compared to the current trunk from 8 times better than
 the original (800%) to more than 30 times better (3000%).

 Carsten has already reviewed the code (thanks Carsten!) and I'm
 planning to commit it to Felix tomorrow if nobody objects.

 Cheers,

 David

 [1]
 https://github.com/bosschaert/felix/commit/e6a1b06c6e66d9c98e6d81b91ef7003c8e725450
 [2]
 https://github.com/bosschaert/coderthoughts/tree/master/service-registry-perftest/srperf

 On 23 March 2015 at 15:39, Richard S. Hall he...@ungoverned.org wrote:
  On 3/23/15 10:17 , David Bosschaert wrote:
 
  On 23 March 2015 at 13:39, Richard S. Hall he...@ungoverned.org
 wrote:
 
  On 3/23/15 03:55 , Guillaume Nodet wrote:
 
  There's a call to interrupt() in Felix#acquireBundleLock(), not sure
 if
  it
  can be the culprit though.
  Interrupts could also be caused by a bundle being shutdown while one
 of
  its
  thread is waiting for a service, which should is a valid use case
 imho.
  Anyway, I think sanely reacting to a thread being interrupted would be
  good.
 
 
  Yes, threads can be interrupted if they are holding a bundle lock and
 the
  global lock holder needs the bundle lock.
 
  I admit that I do not recall why we ignore the interrupt here, but
 didn't
  we
  implement service lookup so that a bundle lock wasn't necessary? I
  thought
  we just checked for the validity of the bundle context before returning
  or
  something. Perhaps we felt there was no reason to be interrupted in
 that
  case. I really don't know.
 
  I think that the Service Registry could be rewritten to be completely
  free of synchronized blocks using the Java 5 concurrency libraries,
 
 
  Well, that just moves the sync blocks to the library, but yeah sure.
 
  which I think would really be a better approach. There is too much
  locking going on in the current SR implementation IMHO.
 
 
  I don't really think there is too much, but it is complicated.
  Unfortunately, it is complicated to make sure that locks aren't held
 while
  do service lookups and this is complicated because you can run into
 cycles,
  etc.
 
  But feel free to try to simplify it.
 
 
  This brings the question: can we move to Java 5 (or Java 6) for the
  Framework codebase? AFAIK we're currently still JDK 1.4 compatible but
  I would be surprised if there is anyone who still needs a JDK that
  went end-of-life 7 years ago.
 
 
  At this point, it doesn't really matter to me.
 
  - richard
 
 
  Best regards,
 
  David
 
 



Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-05-13 Thread David Bosschaert
I have implemented the performance improvements that I was thinking of
using Java 5 concurrency tools; they can be viewed at [1].

I wrote a little performance test suite [2] that tests multithreaded
service registry performance (10 threads) from single / multiple
bundles with either singleton services or Prototype Service Factory
services, and the results are quite impressive. The improvements over
the current trunk range from 8 times better than the original (800%)
to more than 30 times better (3000%).

Carsten has already reviewed the code (thanks Carsten!) and I'm
planning to commit it to Felix tomorrow if nobody objects.

Cheers,

David

[1] 
https://github.com/bosschaert/felix/commit/e6a1b06c6e66d9c98e6d81b91ef7003c8e725450
[2] 
https://github.com/bosschaert/coderthoughts/tree/master/service-registry-perftest/srperf

On 23 March 2015 at 15:39, Richard S. Hall he...@ungoverned.org wrote:
 On 3/23/15 10:17 , David Bosschaert wrote:

 On 23 March 2015 at 13:39, Richard S. Hall he...@ungoverned.org wrote:

 On 3/23/15 03:55 , Guillaume Nodet wrote:

 There's a call to interrupt() in Felix#acquireBundleLock(), not sure if
 it
 can be the culprit though.
 Interrupts could also be caused by a bundle being shutdown while one of
 its
 thread is waiting for a service, which should is a valid use case imho.
 Anyway, I think sanely reacting to a thread being interrupted would be
 good.


 Yes, threads can be interrupted if they are holding a bundle lock and the
 global lock holder needs the bundle lock.

 I admit that I do not recall why we ignore the interrupt here, but didn't
 we
 implement service lookup so that a bundle lock wasn't necessary? I
 thought
 we just checked for the validity of the bundle context before returning
 or
 something. Perhaps we felt there was no reason to be interrupted in that
 case. I really don't know.

 I think that the Service Registry could be rewritten to be completely
 free of synchronized blocks using the Java 5 concurrency libraries,


 Well, that just moves the sync blocks to the library, but yeah sure.

 which I think would really be a better approach. There is too much
 locking going on in the current SR implementation IMHO.


 I don't really think there is too much, but it is complicated.
 Unfortunately, it is complicated to make sure that locks aren't held while
 do service lookups and this is complicated because you can run into cycles,
 etc.

 But feel free to try to simplify it.


 This brings the question: can we move to Java 5 (or Java 6) for the
 Framework codebase? AFAIK we're currently still JDK 1.4 compatible but
 I would be surprised if there is anyone who still needs a JDK that
 went end-of-life 7 years ago.


 At this point, it doesn't really matter to me.

 - richard


 Best regards,

 David




Re: [Framework] ServiceRegistry.getService() endless loop with lock?

2015-03-23 Thread Richard S. Hall

On 3/23/15 03:55 , Guillaume Nodet wrote:

There's a call to interrupt() in Felix#acquireBundleLock(), not sure if it
can be the culprit though.
Interrupts could also be caused by a bundle being shutdown while one of its
thread is waiting for a service, which should is a valid use case imho.
Anyway, I think sanely reacting to a thread being interrupted would be good.


Yes, threads can be interrupted if they are holding a bundle lock and 
the global lock holder needs the bundle lock.


I admit that I do not recall why we ignore the interrupt here, but 
didn't we implement service lookup so that a bundle lock wasn't 
necessary? I thought we just checked for the validity of the bundle 
context before returning or something. Perhaps we felt there was no 
reason to be interrupted in that case. I really don't know.


- richard




2015-03-23 8:46 GMT+01:00 Carsten Ziegeler cziege...@apache.org:


Am 23.03.15 um 01:25 schrieb Richard S. Hall:

On 3/21/15 05:52 , Carsten Ziegeler wrote:

Question remains, why the thread got interrupted in the first place.

It was something that you did as part of FELIX-4806:


Yes, I noticed this as well - and I have no idea why I did it. I know
that I worked on some other code at that time where it was good to add
the interrupt call. Maybe a case of repeating the pattern :(

But my question was not why this piece of code has been added but
rather why the thread gets interrupted in the first place.

Carsten
--
Carsten Ziegeler
Adobe Research Switzerland
cziege...@apache.org





Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-03-23 Thread Richard S. Hall

On 3/23/15 10:17 , David Bosschaert wrote:

On 23 March 2015 at 13:39, Richard S. Hall he...@ungoverned.org wrote:

On 3/23/15 03:55 , Guillaume Nodet wrote:

There's a call to interrupt() in Felix#acquireBundleLock(), not sure if it
can be the culprit though.
Interrupts could also be caused by a bundle being shutdown while one of
its
thread is waiting for a service, which should is a valid use case imho.
Anyway, I think sanely reacting to a thread being interrupted would be
good.


Yes, threads can be interrupted if they are holding a bundle lock and the
global lock holder needs the bundle lock.

I admit that I do not recall why we ignore the interrupt here, but didn't we
implement service lookup so that a bundle lock wasn't necessary? I thought
we just checked for the validity of the bundle context before returning or
something. Perhaps we felt there was no reason to be interrupted in that
case. I really don't know.

I think that the Service Registry could be rewritten to be completely
free of synchronized blocks using the Java 5 concurrency libraries,


Well, that just moves the sync blocks to the library, but yeah sure.


which I think would really be a better approach. There is too much
locking going on in the current SR implementation IMHO.


I don't really think there is too much, but it is complicated.
Unfortunately, it is complicated to make sure that locks aren't held
while doing service lookups, and this is complicated because you can
run into cycles, etc.


But feel free to try to simplify it.



This brings the question: can we move to Java 5 (or Java 6) for the
Framework codebase? AFAIK we're currently still JDK 1.4 compatible but
I would be surprised if there is anyone who still needs a JDK that
went end-of-life 7 years ago.


At this point, it doesn't really matter to me.

- richard



Best regards,

David




Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)

2015-03-23 Thread David Bosschaert
On 23 March 2015 at 13:39, Richard S. Hall he...@ungoverned.org wrote:
 On 3/23/15 03:55 , Guillaume Nodet wrote:

 There's a call to interrupt() in Felix#acquireBundleLock(), not sure if it
 can be the culprit though.
 Interrupts could also be caused by a bundle being shutdown while one of
 its
 thread is waiting for a service, which should is a valid use case imho.
 Anyway, I think sanely reacting to a thread being interrupted would be
 good.


 Yes, threads can be interrupted if they are holding a bundle lock and the
 global lock holder needs the bundle lock.

 I admit that I do not recall why we ignore the interrupt here, but didn't we
 implement service lookup so that a bundle lock wasn't necessary? I thought
 we just checked for the validity of the bundle context before returning or
 something. Perhaps we felt there was no reason to be interrupted in that
 case. I really don't know.

I think that the Service Registry could be rewritten to be completely
free of synchronized blocks using the Java 5 concurrency libraries,
which I think would really be a better approach. There is too much
locking going on in the current SR implementation IMHO.
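
As a rough illustration of what "free of synchronized blocks" could look like, here is a minimal sketch of a per-service usage counter built only on java.util.concurrent classes that have existed since Java 5 (ConcurrentMap.putIfAbsent and AtomicInteger). It is an assumption-laden example, not the Felix ServiceRegistry and not the proposed refactoring; the class UsageCountSketch, its methods, and the "svc" key are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class for illustration; not the Felix ServiceRegistry.
class UsageCountSketch {
    private final ConcurrentMap<String, AtomicInteger> counts =
        new ConcurrentHashMap<String, AtomicInteger>();

    /** Increments the usage count for a service id without any synchronized block. */
    int increment(String serviceId) {
        AtomicInteger counter = counts.get(serviceId);
        if (counter == null) {
            AtomicInteger fresh = new AtomicInteger();
            // putIfAbsent returns the existing counter if another thread won the race.
            counter = counts.putIfAbsent(serviceId, fresh);
            if (counter == null) {
                counter = fresh;
            }
        }
        return counter.incrementAndGet();
    }

    /** Returns the current count, or 0 if the id was never used. */
    int current(String serviceId) {
        AtomicInteger counter = counts.get(serviceId);
        return counter == null ? 0 : counter.get();
    }

    /** Runs the given number of threads, each incrementing the same id perThread times. */
    static int countWithThreads(int threads, final int perThread) {
        final UsageCountSketch sketch = new UsageCountSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < perThread; j++) {
                        sketch.increment("svc");
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", ex);
            }
        }
        return sketch.current("svc");
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(10, 1000));
    }
}
```

No increment is lost even though nothing is synchronized explicitly, because the map handles the insertion race and the AtomicInteger handles the counting. A real registry has more state to keep consistent than a single counter, so this only shows the building blocks.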

This raises the question: can we move to Java 5 (or Java 6) for the
Framework codebase? AFAIK we're currently still JDK 1.4 compatible but
I would be surprised if there is anyone who still needs a JDK that
went end-of-life 7 years ago.

Best regards,

David


Re: [Framework] ServiceRegistry.getService() endless loop with lock?

2015-03-23 Thread Carsten Ziegeler
Am 23.03.15 um 01:25 schrieb Richard S. Hall:
 On 3/21/15 05:52 , Carsten Ziegeler wrote:

 Question remains, why the thread got interrupted in the first place.
 
 It was something that you did as part of FELIX-4806:
 
Yes, I noticed this as well - and I have no idea why I did it. I know
that I worked on some other code at that time where it was good to add
the interrupt call. Maybe a case of repeating the pattern :(

But my question was not why this piece of code has been added but
rather why the thread gets interrupted in the first place.

Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziege...@apache.org


Re: [Framework] ServiceRegistry.getService() endless loop with lock?

2015-03-23 Thread Guillaume Nodet
There's a call to interrupt() in Felix#acquireBundleLock(), not sure if it
can be the culprit though.
Interrupts could also be caused by a bundle being shut down while one of its
threads is waiting for a service, which should be a valid use case imho.
Anyway, I think sanely reacting to a thread being interrupted would be good.


2015-03-23 8:46 GMT+01:00 Carsten Ziegeler cziege...@apache.org:

 Am 23.03.15 um 01:25 schrieb Richard S. Hall:
  On 3/21/15 05:52 , Carsten Ziegeler wrote:
 
  Question remains, why the thread got interrupted in the first place.
 
  It was something that you did as part of FELIX-4806:
 
 Yes, I noticed this as well - and I have no idea why I did it. I know
 that I worked on some other code at that time where it was good to add
 the interrupt call. Maybe a case of repeating the pattern :(

 But my question was not why this piece of code has been added but
 rather why the thread gets interrupted in the first place.

 Carsten
 --
 Carsten Ziegeler
 Adobe Research Switzerland
 cziege...@apache.org



Re: [Framework] ServiceRegistry.getService() endless loop with lock?

2015-03-23 Thread Robert Munteanu
On Fri, Mar 20, 2015 at 7:41 PM, David Bosschaert
david.bosscha...@gmail.com wrote:
 Some more thoughts about this...

 The wait() call in the getService() method is as follows:

 synchronized (this)
 {
 // First make sure that no existing operation is currently
 // being performed by another thread on the service registration.
 for (Object o = m_lockedRegsMap.get(reg); (o != null); o =
 m_lockedRegsMap.get(reg))
 {
 // We don't allow cycles when we call out to the
 service factory.
 if (o.equals(Thread.currentThread()))
 {
 throw new ServiceException(
 ServiceFactory.getService() resulted in a cycle.,
 ServiceException.FACTORY_ERROR,
 null);
 }

 // Otherwise, wait for it to be freed.
 try
 {
 wait();
 }
 catch (InterruptedException ex)
 {
 Thread.currentThread().interrupt();
 }
 }


Resetting the interrupt flag on a thread after it has been interrupted
is common practice, and it allows the thread pool managing said thread
to see that a cancellation has been requested. So IMO the interrupt
call should remain.

However, the code should also break out of the loop, as an interrupt
invariably is a request to stop the thread's execution. IMO the code
should end up looking something like

    try
    {
        wait();
    }
    catch (InterruptedException ex)
    {
        Thread.currentThread().interrupt();
        break;
    }
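
A self-contained sketch of the same idea, for anyone who wants to try it outside the framework: the wait loop restores the interrupt flag and then leaves the loop, reporting to the caller that it gave up. This is only an illustration under assumptions; the class InterruptAwareWait, its String keys, and the boolean return value are hypothetical and do not reflect how ServiceRegistry is actually structured.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a registry that tracks which thread holds a registration.
class InterruptAwareWait {
    private final Map<String, Thread> locked = new HashMap<String, Thread>();

    synchronized void lock(String key) {
        locked.put(key, Thread.currentThread());
    }

    synchronized void unlock(String key) {
        locked.remove(key);
        notifyAll();
    }

    /** Returns true once the key is free, or false if interrupted while waiting. */
    synchronized boolean awaitFree(String key) {
        while (locked.get(key) != null) {
            try {
                wait();
            } catch (InterruptedException ex) {
                // Restore the flag for the caller, then stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        InterruptAwareWait w = new InterruptAwareWait();
        w.lock("reg");
        Thread.currentThread().interrupt();           // simulate an external interrupt
        System.out.println(w.awaitFree("reg"));       // gives up instead of spinning
        System.out.println(Thread.interrupted());     // flag was restored; this clears it
        w.unlock("reg");
        System.out.println(w.awaitFree("reg"));       // key is free now
    }
}
```

Returning early keeps the interrupt visible to the caller without re-entering wait() with the flag still set, which is what turns the original loop into a spin.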

Robert

 I'm wondering why the code doesn't break out of the loop in the catch block?

 Cheers,

 David

 On 20 March 2015 at 12:16, David Bosschaert david.bosscha...@gmail.com 
 wrote:
 Hi all,

 I'm looking at an issue that I'm experiencing (with Felix 4.6.1/Java
 7) where the ServiceRegistry.getService() [1] method seems to be in an
 endless loop. It doesn't happen very often, but when it does happen
 the thread executing getService() seems to never exit that method and
 apparently switches between the following two states:

 1: Thread 22059: (state = IN_VM)
  - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may
 be imprecise)
  - java.lang.Object.wait() @bci=2, line=503 (Compiled frame)
  - 
 org.apache.felix.framework.ServiceRegistry.getService(org.osgi.framework.Bundle,
 org.osgi.framework.ServiceReference, boolean) @bci=86, line=313
 (Compiled frame)

 2: Thread 22059: (state = IN_VM)
  - java.lang.Throwable.fillInStackTrace(int) @bci=0 (Compiled frame;
 information may be imprecise)
  - java.lang.Throwable.fillInStackTrace() @bci=16, line=783 (Compiled frame)
  - java.lang.Throwable.<init>() @bci=24, line=250 (Compiled frame)
  - java.lang.Exception.<init>() @bci=1, line=54 (Compiled frame)
  - java.lang.InterruptedException.<init>() @bci=1, line=57 (Compiled frame)
  - java.lang.Object.wait(long) @bci=0 (Compiled frame)
  - java.lang.Object.wait() @bci=2, line=503 (Compiled frame)
  - 
 org.apache.felix.framework.ServiceRegistry.getService(org.osgi.framework.Bundle,
 org.osgi.framework.ServiceReference, boolean) @bci=86, line=313
 (Compiled frame)

 Even though the thread is executing wait() all of the other Felix
 SR-accessing threads are blocked on the Service Registry lock. The net
 effect is that any operation on the Service Registry is blocked.
 There is one thing that I don't understand and that is that in the
 above frames the lock should really be released, as the code is in
 wait(). However, it seems like the lock is still held because none of
 the other threads are getting access to the Service Registry. For
 example another such thread is the following which is actually about
 to decrease the usage count on the service and then call notifyAll():

 Thread 48643: (state = BLOCKED)
  - 
 org.apache.felix.framework.ServiceRegistry.getService(org.osgi.framework.Bundle,
 org.osgi.framework.ServiceReference, boolean) @bci=241, line=367
 (Compiled frame)
  - 
 org.apache.felix.framework.util.EventDispatcher.filterListenersUsingHooks(org.osgi.framework.ServiceEvent,
 org.osgi.framework.launch.Framework, java.util.Map) @bci=349, line=618
 (Compiled frame)
  - 
 org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(org.osgi.framework.ServiceEvent,
 java.util.Dictionary, org.osgi.framework.launch.Framework) @bci=33,
 line=542 (Interpreted frame)
  - 
 org.apache.felix.framework.Felix.fireServiceEvent(org.osgi.framework.ServiceEvent,
 java.util.Dictionary) @bci=7, line=4547 (Compiled frame)
  - 
 org.apache.felix.framework.Felix.access$000(org.apache.felix.framework.Felix,
 org.osgi.framework.ServiceEvent, java.util.Dictionary) @bci=3,
 line=106 (Compiled frame)
  - 
 

Re: [Framework] ServiceRegistry.getService() endless loop with lock?

2015-03-22 Thread Richard S. Hall

On 3/21/15 05:52 , Carsten Ziegeler wrote:

Am 20.03.15 um 18:41 schrieb David Bosschaert:

Some more thoughts about this...

The wait() call in the getService() method is as follows:

synchronized (this)
 {
 // First make sure that no existing operation is currently
 // being performed by another thread on the service registration.
 for (Object o = m_lockedRegsMap.get(reg); (o != null); o =
m_lockedRegsMap.get(reg))
 {
 // We don't allow cycles when we call out to the
service factory.
 if (o.equals(Thread.currentThread()))
 {
 throw new ServiceException(
 ServiceFactory.getService() resulted in a cycle.,
 ServiceException.FACTORY_ERROR,
 null);
 }

 // Otherwise, wait for it to be freed.
 try
 {
 wait();
 }
 catch (InterruptedException ex)
 {
 Thread.currentThread().interrupt();
 }
 }

I'm wondering why the code doesn't break out of the loop in the catch block?


Good question - The call to interrupt() is wrong. If the thread gets
interrupted while in wait(), the catch block resets the interrupt flag
and the current thread stays in the for loop. Then it hits wait and as
the interrupted flag is set, it throws an interrupted exception, and the
game starts again. So this will basically create a busy loop once this
thread gets interrupted.

I assume, as the thread is holding the lock on this and it's a busy
loop, the other thread who wants to get the lock on this has no chance
as once this thread is interrupted, it never waits. This would explain
why it never leaves the loop.

Question remains, why the thread got interrupted in the first place.


It was something that you did as part of FELIX-4806:

https://fisheye6.atlassian.com/browse/felix/trunk/framework/src/main/java/org/apache/felix/framework/ServiceRegistry.java?r=1661592

Not sure why, though. It doesn't seem related to the issue.

- richard



Carsten




Re: [Framework] ServiceRegistry.getService() endless loop with lock?

2015-03-22 Thread David Bosschaert
Thanks Carsten!

I removed that interrupt() call on trunk.

On why it got interrupted? I'm not sure. I guess anyone could call
Thread.interrupt()...

Cheers,

David

On 21 March 2015 at 09:52, Carsten Ziegeler cziege...@apache.org wrote:
 Good question - The call to interrupt() is wrong. If the thread gets
 interrupted while in wait(), the catch block resets the interrupt flag
 and the current thread stays in the for loop. Then it hits wait and as
 the interrupted flag is set, it throws an interrupted exception, and the
 game starts again. So this will basically create a busy loop once this
 thread gets interrupted.

 I assume, as the thread is holding the lock on this and it's a busy
 loop, the other thread who wants to get the lock on this has no chance
 as once this thread is interrupted, it never waits. This would explain
 why it never leaves the loop.

 Question remains, why the thread got interrupted in the first place.

 Carsten
 --
 Carsten Ziegeler
 Adobe Research Switzerland
 cziege...@apache.org


Re: [Framework] ServiceRegistry.getService() endless loop with lock?

2015-03-21 Thread Carsten Ziegeler
Am 20.03.15 um 18:41 schrieb David Bosschaert:
 Some more thoughts about this...
 
 The wait() call in the getService() method is as follows:
 
 synchronized (this)
 {
 // First make sure that no existing operation is currently
 // being performed by another thread on the service registration.
 for (Object o = m_lockedRegsMap.get(reg); (o != null); o =
 m_lockedRegsMap.get(reg))
 {
 // We don't allow cycles when we call out to the
 service factory.
 if (o.equals(Thread.currentThread()))
 {
 throw new ServiceException(
 ServiceFactory.getService() resulted in a cycle.,
 ServiceException.FACTORY_ERROR,
 null);
 }
 
 // Otherwise, wait for it to be freed.
 try
 {
 wait();
 }
 catch (InterruptedException ex)
 {
 Thread.currentThread().interrupt();
 }
 }
 
 I'm wondering why the code doesn't break out of the loop in the catch block?
 

Good question - the call to interrupt() is wrong. If the thread gets
interrupted while in wait(), the catch block sets the interrupt flag
again and the current thread stays in the for loop. Then it hits wait()
again and, as the interrupt flag is set, wait() immediately throws an
InterruptedException, and the game starts again. So this will basically
create a busy loop once this thread gets interrupted.

I assume that, as the thread is holding the lock on this and it's a busy
loop, any other thread that wants to get the lock on this has no chance:
once this thread is interrupted, it never really waits. This would
explain why it never leaves the loop.
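
To illustrate the loop described above, here is a minimal, self-contained
sketch. It is not Felix code: the class and method names are invented for
this example, and the loop is capped so that the demo terminates. It relies
only on the documented behaviour of Object.wait(), which throws
InterruptedException straight away if the thread's interrupt status is
already set, and clears that status when it throws.

```java
/** Illustration only: shows why re-asserting the interrupt inside a wait() loop spins. */
final class InterruptSpinDemo
{
    /**
     * Runs the same loop shape as the getService() excerpt, but gives up
     * after maxIterations so that the demo terminates.
     * Returns how many times wait() threw InterruptedException.
     */
    static int countImmediateInterrupts(int maxIterations)
    {
        final Object lock = new Object();
        int thrown = 0;

        // Simulate an interrupt arriving from somewhere else.
        Thread.currentThread().interrupt();

        synchronized (lock)
        {
            for (int i = 0; i < maxIterations; i++)
            {
                try
                {
                    // The interrupt flag is set, so wait() throws at once
                    // (clearing the flag) instead of blocking.
                    lock.wait();
                }
                catch (InterruptedException ex)
                {
                    thrown++;
                    // Same as in the excerpt: the flag is set again, so the
                    // next wait() throws immediately as well.
                    Thread.currentThread().interrupt();
                }
            }
        }

        // Clear the flag so the caller is left in a clean state.
        Thread.interrupted();
        return thrown;
    }

    public static void main(String[] args)
    {
        System.out.println(countImmediateInterrupts(1000));
    }
}
```

Run as is, this prints 1000: every single wait() call throws instead of
blocking, which matches the two alternating stack traces in the original
report.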

Question remains, why the thread got interrupted in the first place.

Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziege...@apache.org


Re: [Framework] ServiceRegistry.getService() endless loop with lock?

2015-03-20 Thread David Bosschaert
Some more thoughts about this...

The wait() call in the getService() method is as follows:

synchronized (this)
{
    // First make sure that no existing operation is currently
    // being performed by another thread on the service registration.
    for (Object o = m_lockedRegsMap.get(reg); (o != null);
        o = m_lockedRegsMap.get(reg))
    {
        // We don't allow cycles when we call out to the service factory.
        if (o.equals(Thread.currentThread()))
        {
            throw new ServiceException(
                "ServiceFactory.getService() resulted in a cycle.",
                ServiceException.FACTORY_ERROR,
                null);
        }

        // Otherwise, wait for it to be freed.
        try
        {
            wait();
        }
        catch (InterruptedException ex)
        {
            Thread.currentThread().interrupt();
        }
    }

I'm wondering why the code doesn't break out of the loop in the catch block?
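
For comparison, one common alternative to breaking out of the loop is to
remember the interrupt and restore the flag only after the wait loop has
finished. The sketch below shows that pattern under stated assumptions: the
class DeferredInterruptWait, its lock()/unlock() methods and the m_locked
map are hypothetical stand-ins, not the Felix implementation and not
necessarily how the issue should be fixed there.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the registry; not the Felix implementation. */
final class DeferredInterruptWait
{
    // Hypothetical equivalent of m_lockedRegsMap: key -> owning thread.
    private final Map<Object, Thread> m_locked = new HashMap<>();

    /** Blocks until 'key' is free, then marks it as owned by the caller. */
    synchronized void lock(Object key)
    {
        boolean interrupted = false;
        while (m_locked.get(key) != null)
        {
            try
            {
                wait();
            }
            catch (InterruptedException ex)
            {
                // Only remember the interrupt. The flag stays cleared, so
                // the next wait() really blocks and releases the monitor.
                interrupted = true;
            }
        }
        m_locked.put(key, Thread.currentThread());
        if (interrupted)
        {
            // Restore the flag once, after the loop, for the caller to see.
            Thread.currentThread().interrupt();
        }
    }

    /** Frees 'key' and wakes up all waiting threads. */
    synchronized void unlock(Object key)
    {
        m_locked.remove(key);
        notifyAll();
    }

    /**
     * Lets an already interrupted thread wait for a key held by the calling
     * thread. Returns true if that thread obtained the key with its
     * interrupt flag still set.
     */
    static boolean demo()
    {
        final DeferredInterruptWait registry = new DeferredInterruptWait();
        final Object key = "reg";
        final AtomicBoolean flagKept = new AtomicBoolean(false);

        registry.lock(key);
        Thread waiter = new Thread(() -> {
            Thread.currentThread().interrupt();   // simulate an interrupt
            registry.lock(key);
            flagKept.set(Thread.currentThread().isInterrupted());
            registry.unlock(key);
        });
        waiter.start();
        try
        {
            Thread.sleep(50);                     // give the waiter time to block
        }
        catch (InterruptedException ex)
        {
            Thread.currentThread().interrupt();
        }
        registry.unlock(key);
        try
        {
            waiter.join();
        }
        catch (InterruptedException ex)
        {
            Thread.currentThread().interrupt();
        }
        return flagKept.get();
    }

    public static void main(String[] args)
    {
        System.out.println(demo());
    }
}
```

The trade-off is that the waiting thread ignores the interrupt until the key
becomes free. Breaking out of the loop (for example by throwing an
exception) would react sooner, at the price of getService() failing when the
thread is interrupted.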

Cheers,

David

On 20 March 2015 at 12:16, David Bosschaert david.bosscha...@gmail.com wrote:
 Hi all,

 I'm looking at an issue that I'm experiencing (with Felix 4.6.1/Java
 7) where the ServiceRegistry.getService() [1] method seems to be in an
 endless loop. It doesn't happen very often, but when it does happen
 the thread executing getService() seems to never exit that method and
 apparently switches between the following two states:

 1: Thread 22059: (state = IN_VM)
  - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may
 be imprecise)
  - java.lang.Object.wait() @bci=2, line=503 (Compiled frame)
  - 
 org.apache.felix.framework.ServiceRegistry.getService(org.osgi.framework.Bundle,
 org.osgi.framework.ServiceReference, boolean) @bci=86, line=313
 (Compiled frame)

 2: Thread 22059: (state = IN_VM)
  - java.lang.Throwable.fillInStackTrace(int) @bci=0 (Compiled frame;
 information may be imprecise)
  - java.lang.Throwable.fillInStackTrace() @bci=16, line=783 (Compiled frame)
  - java.lang.Throwable.<init>() @bci=24, line=250 (Compiled frame)
  - java.lang.Exception.<init>() @bci=1, line=54 (Compiled frame)
  - java.lang.InterruptedException.<init>() @bci=1, line=57 (Compiled frame)
  - java.lang.Object.wait(long) @bci=0 (Compiled frame)
  - java.lang.Object.wait() @bci=2, line=503 (Compiled frame)
  - 
 org.apache.felix.framework.ServiceRegistry.getService(org.osgi.framework.Bundle,
 org.osgi.framework.ServiceReference, boolean) @bci=86, line=313
 (Compiled frame)

 Even though the thread is executing wait(), all of the other Felix
 threads accessing the Service Registry are blocked on the Service
 Registry lock. The net effect is that any operation on the Service
 Registry is blocked.
 There is one thing that I don't understand: in the above frames the
 lock should really be released, as the code is in wait(). However, it
 seems like the lock is still held because none of the other threads
 are getting access to the Service Registry. For example, another such
 thread is the following, which is actually about to decrease the usage
 count on the service and then call notifyAll():

 Thread 48643: (state = BLOCKED)
  - 
 org.apache.felix.framework.ServiceRegistry.getService(org.osgi.framework.Bundle,
 org.osgi.framework.ServiceReference, boolean) @bci=241, line=367
 (Compiled frame)
  - 
 org.apache.felix.framework.util.EventDispatcher.filterListenersUsingHooks(org.osgi.framework.ServiceEvent,
 org.osgi.framework.launch.Framework, java.util.Map) @bci=349, line=618
 (Compiled frame)
  - 
 org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(org.osgi.framework.ServiceEvent,
 java.util.Dictionary, org.osgi.framework.launch.Framework) @bci=33,
 line=542 (Interpreted frame)
  - 
 org.apache.felix.framework.Felix.fireServiceEvent(org.osgi.framework.ServiceEvent,
 java.util.Dictionary) @bci=7, line=4547 (Compiled frame)
  - 
 org.apache.felix.framework.Felix.access$000(org.apache.felix.framework.Felix,
 org.osgi.framework.ServiceEvent, java.util.Dictionary) @bci=3,
 line=106 (Compiled frame)
  - 
 org.apache.felix.framework.Felix$1.serviceChanged(org.osgi.framework.ServiceEvent,
 java.util.Dictionary) @bci=6, line=436 (Compiled frame)
  - 
 org.apache.felix.framework.ServiceRegistry.unregisterService(org.osgi.framework.Bundle,
 org.osgi.framework.ServiceRegistration) @bci=100, line=165 (Compiled
 frame)
  - org.apache.felix.framework.ServiceRegistrationImpl.unregister()
 @bci=52, line=140 (Interpreted frame)

 I just don't understand why all the other threads are blocked on the
 service registry. I'm probably missing something simple, so would be
 grateful if someone else has an idea.
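
When it is unclear which thread holds the monitor that others are blocked
on, the JVM can be asked directly through the standard
java.lang.management API. The following is only a sketch of that
diagnostic idea: the class MonitorOwnerProbe and the thread names are
invented for the example, and it would have to run inside the affected JVM
(it is not something the framework provides).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Illustration only: reports who owns the monitor a thread is blocked on. */
final class MonitorOwnerProbe
{
    /**
     * Returns the name of the thread owning the lock that 'blocked' is
     * waiting for, or null if it is not blocked on an owned lock.
     */
    static String ownerOfLockBlocking(Thread blocked)
    {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info = mx.getThreadInfo(blocked.getId());
        return (info == null) ? null : info.getLockOwnerName();
    }

    /** Sets up one owner and one blocked thread and returns the reported owner. */
    static String demo()
    {
        final Object lock = new Object();
        final CountDownLatch ownerHasLock = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        Thread owner = new Thread(() -> {
            synchronized (lock)
            {
                ownerHasLock.countDown();
                try
                {
                    release.await();   // keep holding the monitor
                }
                catch (InterruptedException ex)
                {
                    // Fall through and release the monitor.
                }
            }
        }, "demo-owner");

        Thread contender = new Thread(() -> {
            synchronized (lock)
            {
                // Nothing to do; we only want to contend for the monitor.
            }
        }, "demo-contender");

        try
        {
            owner.start();
            ownerHasLock.await();
            contender.start();
            // Poll until the contender is really blocked on the monitor.
            while (contender.getState() != Thread.State.BLOCKED)
            {
                Thread.sleep(1);
            }
            return ownerOfLockBlocking(contender);
        }
        catch (InterruptedException ex)
        {
            Thread.currentThread().interrupt();
            return null;
        }
        finally
        {
            release.countDown();       // let both threads finish
        }
    }

    public static void main(String[] args)
    {
        System.out.println(demo());
    }
}
```

In the demo the reported owner is "demo-owner". Applied to a thread such as
48643 above, the same call would name whichever thread the JVM considers
the current owner of the Service Registry monitor.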

 Many thanks,

 David

 [1] 
 http://svn.apache.org/repos/asf/felix/releases/org.apache.felix.framework-4.6.1/src/main/java/org/apache/felix/framework/ServiceRegistry.java