Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
Thanks Pierre for submitting a unit test to FELIX-4866; it helped me enormously in identifying the issue.

I have fixed the bug in my code (without degrading performance), and at least your concurrency test, my concurrency tests and all the framework unit tests now pass consistently.

I would be very interested in hearing whether your bigger test suite also still behaves as expected.

Best regards,
David

On 14 May 2015 at 22:53, Pierre De Rop pierre.de...@gmail.com wrote:

The thread dump did not help. I will investigate (it may be a bug somewhere on my side; if so, I am sorry for all this noise). I hope to let you know soon.

By the way, do you know how to run the SCR integration tests with the framework from the trunk? I know that some of the SCR integration tests do load testing, and I would be interested to know whether they are also OK with the framework from the trunk.

cheers;
/Pierre

On Thu, May 14, 2015 at 10:06 PM, David Bosschaert david.bosscha...@gmail.com wrote:

Hi Pierre,

It would indeed be useful to find out more about why your test is hanging. Maybe analysing a thread dump would give some more information?

Cheers,
David

On 14 May 2015 at 19:54, Pierre De Rop pierre.de...@gmail.com wrote:

Thanks David;

I just gave it a try, and indeed the parallel test passed. I observed a gain of around 7/10%. The tool is described in [1].

However, I only have 4 cores on my laptop, so I will run more tests in my lab at work (next week), where we have servers with 32 or even 128 processors. This will give a better idea of the gain: the more processors you have, the more costly synchronization is, so I could possibly observe a better performance gain.

Now, I'm sorry, but I think that there is still a problem (I don't know where): when using more threads, the parallel test does not complete and stops with a timeout message, indicating that the expected number of components was not created within the timeout delay of 1 minute.

So I just committed a modified version of the tool in the sandbox, which can now take a -Dthreads option to configure the number of threads. With -Dthreads=4, it's OK. But with -Dthreads=10, the test does not complete and ends with a timeout:

    $ java -Dthreads=10 -server -jar bin/felix.jar
    g! Starting benchmarks (each tested bundle will add/remove 630 components during bundle activation).
    [Starting benchmarks with no processing done in components start methods]
    Benchmarking bundle: org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel .Could not start components timely: current start latch=2, stop latch=630

My current understanding is that some components are still waiting for unsatisfied service dependencies, as if a service tracker had missed a service registration. I ran the same test for two hours with the previous framework version and did not observe any problems.

I wonder if someone else has another tool for running a different kind of load test, just to see whether problems are observed there too.

From my side, I will do the following: in the past, the benchmark tool supported not only dependencymanager, but also Felix SCR and iPojo. So I will reintroduce Felix SCR in the benchmark and check whether I also observe the problem (with -Dthreads=10). I will let you know.

cheers;
/Pierre

[1] http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/README
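The timeout message quoted above ("current start latch=2") suggests a latch-based wait with a deadline. As a minimal sketch only, and not the benchmark tool's actual code, the following shows how a thread count could be read from -Dthreads and how a CountDownLatch with a timed await reports the components that never started. The class name, the method name and the `lost` parameter are hypothetical and exist purely for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

final class LatchBenchmarkSketch {

    /**
     * Starts {@code components} hypothetical components on a pool of {@code threads}
     * workers and waits at most {@code timeoutMillis} for all of them.
     * {@code lost} components are never started, which mimics a component that stays
     * blocked on an unsatisfied dependency.
     *
     * @return the number of components still missing when the wait ends (0 = success)
     */
    static long startComponents(int threads, int components, int lost, long timeoutMillis) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        CountDownLatch startLatch = new CountDownLatch(components);
        try {
            for (int i = 0; i < components - lost; i++) {
                // Each task stands in for one component reaching its started state.
                pool.execute(startLatch::countDown);
            }
            // Timed wait: returns early on success, or gives up after the timeout.
            startLatch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return startLatch.getCount();
    }

    public static void main(String[] args) {
        // Integer.getInteger reads a system property, e.g. java -Dthreads=10 ...
        int threads = Integer.getInteger("threads", 4);
        long missing = startComponents(threads, 630, 0, 60_000);
        if (missing == 0) {
            System.out.println("All components started with " + threads + " threads");
        } else {
            System.out.println("Could not start components timely: current start latch=" + missing);
        }
    }
}
```

A non-zero return value only says how many components are missing, not why; in the thread above that is where a thread dump or additional logging comes in.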
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
Yes David, indeed our scenarios are different. In mine, I'm measuring the assembly of components using the parallel Dependency Manager, where components are concurrently activated, registered, and bound to each other using a shared thread pool.

cheers;
/Pierre

On Tue, May 19, 2015 at 3:40 PM, David Bosschaert david.bosscha...@gmail.com wrote:

Hi Pierre,

Good to hear that the problem is now gone. I guess the measured performance improvement depends hugely on what you are testing. My test focuses on multiple clients/threads/bundles accessing the same service (either singleton or PSF) in a very raw manner (via ctx.getServiceReference()).

Good to hear that you're still seeing perf improvements, but I guess your test exercises a number of other components as well (e.g. Dependency Manager), possibly using multiple service registrations, so that could very well explain some of the differences in our results...

Cheers,
David
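To make the two scenarios discussed above more concrete (many threads looking up the same service while other threads register services), here is a small, purely illustrative sketch built on java.util.concurrent collections. It is not the Felix ServiceRegistry and does not use the OSGi API; `TinyRegistry` and its methods are hypothetical names chosen for this example.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class TinyRegistry {

    // Service name -> registered instances; both levels are safe for concurrent use.
    private final ConcurrentMap<String, List<Object>> servicesByName = new ConcurrentHashMap<>();

    void register(String name, Object service) {
        // computeIfAbsent is atomic on ConcurrentHashMap, so no registration is lost.
        servicesByName.computeIfAbsent(name, k -> new CopyOnWriteArrayList<>()).add(service);
    }

    List<Object> lookup(String name) {
        List<Object> found = servicesByName.get(name);
        return found == null ? Collections.emptyList() : found;
    }

    /** Registers and looks up services from several threads; returns the final count. */
    static int registerConcurrently(int threads, int perThread) {
        TinyRegistry registry = new TinyRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    registry.register("svc", new Object());
                    registry.lookup("svc").size(); // read while other threads write
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return registry.lookup("svc").size();
    }

    public static void main(String[] args) {
        System.out.println("registered: " + registerConcurrently(4, 250));
    }
}
```

Copy-on-write lists favour read-heavy workloads; a registry with frequent registrations might prefer a different structure, so treat the choice here as an assumption of the sketch.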
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
Hi David,

Excellent. I'm glad to confirm that the issue is resolved, and my DM loader is now running seamlessly. I'm observing an overall gain of 16% compared to the previous 5.0.0 (but this has to be taken with care, because I only ran a quick test).

I did not have time to check, but I guess I could observe a better performance gain on a bigger host with more CPUs (I only have four), since synchronization cost is usually proportional to the number of available cores and, as I understand it, your fix is now based on the java.util.concurrent JDK tools.

many thanks
/Pierre
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
I just committed the benchmark tool in http://svn.apache.org/viewvc/felix/sandbox/pderop/loadtest/, if you can take a look.

To run the scenario:

- install jdk8:

    [nxuser@nx0012 pderop]$ java -version
    java version 1.8.0_40
    Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
    Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)

- checkout the loadtest from http://svn.apache.org/viewvc/felix/sandbox/pderop/loadtest/

- go to the loadtest directory and start the test, just like this:

    $ java -server -jar bin/felix.jar
    Welcome to Apache Felix Gogo
    g! Starting benchmarks (each tested bundle will add/remove 630 components during bundle activation).
    [Starting benchmarks with no processing done in components start methods]
    Benchmarking bundle: org.apache.felix.dependencymanager.benchmark.dependencymanager ..
    - results in nanos: [139,129,744 | 143,957,687 | 152,157,581 | 319,631,722 | 919,838,078]
    Benchmarking bundle: org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel .

Here, the first test, org.apache.felix.dependencymanager.benchmark.dependencymanager (single-threaded), passes OK. But the next one (org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel) hangs. It uses a fork join pool with size=4.

When typing log warn, we see:

    log warn
    2015.05.14 13:56:10 ERROR - Bundle: org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel - [ForkJoinPool-1-worker-3] Error processing tasks - java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
        at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
        at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
        at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:245)
        at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:212)
        at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:189)
        at org.apache.felix.framework.ServiceRegistry.getServiceReferences(ServiceRegistry.java:269)
        at org.apache.felix.framework.Felix.getServiceReferences(Felix.java:3577)
        at org.apache.felix.framework.Felix.getAllowedServiceReferences(Felix.java:3655)
        at org.apache.felix.framework.BundleContextImpl.getServiceReferences(BundleContextImpl.java:434)
        at org.apache.felix.dm.tracker.ServiceTracker.getInitialReferences(ServiceTracker.java:422)
        at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:375)
        at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:319)
        at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:295)
        at org.apache.felix.dm.impl.ServiceDependencyImpl.start(ServiceDependencyImpl.java:226)
        at org.apache.felix.dm.impl.ComponentImpl.startDependencies(ComponentImpl.java:657)
        at org.apache.felix.dm.impl.ComponentImpl.performTransition(ComponentImpl.java:535)
        at org.apache.felix.dm.impl.ComponentImpl.handleChange(ComponentImpl.java:492)
        at org.apache.felix.dm.impl.ComponentImpl.access$5(ComponentImpl.java:482)
        at org.apache.felix.dm.impl.ComponentImpl$3.run(ComponentImpl.java:227)
        at org.apache.felix.dm.impl.DispatchExecutor.runTask(DispatchExecutor.java:182)
        at org.apache.felix.dm.impl.DispatchExecutor.run(DispatchExecutor.java:165)
        at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1689)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

(I will also investigate my own code, to check whether the problem comes from me.)

cheers;
/Pierre

On Thu, May 14, 2015 at 1:47 PM, Pierre De Rop pierre.de...@gmail.com wrote:

Hi David,

I don't know if it's me (a bug in my benchmark tool) or if there is a regression somewhere in the framework, but my parallel test does not pass anymore.

The test first starts with a single-threaded scenario, which passes OK (org.apache.felix.dependencymanager.benchmark.dependencymanager); then, when the parallel test starts (org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel), it suddenly hangs, and when I type log warn under the gogo shell, I see the following exception (I'm using java8):

    $ java -server -Xmx4g -Xms4g -jar bin/felix.jar
    Welcome to Apache Felix Gogo
    Benchmarking bundle: org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel .
    (here, the dependencymanager.parallel test hangs and when I type log warn, I see this:)
    g! log warn
    2015.05.14 13:31:03 ERROR -
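The top frames of the trace above (HashMap$HashIterator.nextNode reached through AbstractCollection.addAll) are the fail-fast check of a plain HashMap iterator. The sketch below is not Felix code; it reproduces the same exception deterministically on a single thread by modifying the map in the middle of a key copy, and shows that a ConcurrentHashMap, whose iterators are weakly consistent, does not throw. The class and method names are hypothetical, and the idea that a concurrent registration was the trigger in the benchmark is an assumption, not something the trace proves.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class FailFastDemo {

    /**
     * Copies the keys of the map one by one and adds a new key in the middle of the copy.
     *
     * @return true if the iteration failed with a ConcurrentModificationException
     */
    static boolean failsWhenModifiedDuringCopy(Map<String, Integer> map) {
        for (int i = 0; i < 10; i++) {
            map.put("service-" + i, i);
        }
        List<String> copy = new ArrayList<>();
        try {
            for (String key : map.keySet()) {
                copy.add(key);
                // Stands in for another thread registering a service during the iteration.
                map.put("late-registration", -1);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap fails: "
                + failsWhenModifiedDuringCopy(new HashMap<>()));
        System.out.println("ConcurrentHashMap fails: "
                + failsWhenModifiedDuringCopy(new ConcurrentHashMap<>()));
    }
}
```

With real threads the HashMap failure is timing dependent, which would be consistent with a problem that only shows up in the parallel test.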
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
Hi Pierre,

I'll take a look today.

Cheers,
David
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
I think I know what this is. I had some additional changes exactly in this area that I simply forgot to apply this morning. I should have it fixed sometime today.

Cheers,
David
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
I've fixed this now in svn.apache.org/viewvc?view=revision&revision=1679367

Pierre, your loadtest now runs to completion - thanks for reporting this issue!

I can see that the results for the parallel tests are a little bit different than before, but I'm not sure how to read them, so I'll leave the interpretation of that to you :)

Cheers,
David
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
Thanks David; I just gave it a try, and indeed the parallel test passed. I observed a gain of around 7/10%. The tool is described in [1]. But I only have 4 cores on my laptop, and I will run more tests in my lab at work (next week), where we have some servers with 32 or even 128 processors. This will give a better idea of the gain, because the more processors you have, the more costly synchronization is, so I could possibly observe a better performance gain.

Now, I'm sorry but I think that there is still a problem (I don't know where): when using more threads, the parallel test does not complete and stops with a timeout message, indicating that the expected number of components was not created within the timeout delay of 1 minute.

So, I just committed a modified version of the tool in the sandbox, which can now take a -Dthreads option to configure the number of threads. With -Dthreads=4, it's OK. But with -Dthreads=10, the test does not complete and ends with a timeout:

  $ java -Dthreads=10 -server -jar bin/felix.jar
  g! Starting benchmarks (each tested bundle will add/remove 630 components during bundle activation).
  [Starting benchmarks with no processing done in components start methods]
  Benchmarking bundle: org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel .Could not start components timely: current start latch=2, stop latch=630

My current understanding of this is that some components are still waiting for unsatisfied service dependencies, just as if a service tracker had missed a service registration. I ran the same test for two hours with the previous framework version and did not observe any problems.

I wonder if someone else has another tool for performing another kind of load test, just to see whether some problems are also observed.

From my side, I will do the following: in the past, the benchmark tool supported not only dependencymanager, but also Felix SCR and iPojo. So, I will reintroduce Felix SCR in the benchmark and will check if I also observe the problem (with -Dthreads=10). I will let you know.

cheers; /Pierre

[1] http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/README
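For readers who want to picture what the -Dthreads option and the "start latch" in the timeout message amount to, here is a minimal sketch. It assumes the tool reads the thread count from a system property and waits on a countdown latch with a timeout; that is a guess based on the output quoted in this thread, not the benchmark's actual code, and the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

public class ThreadsOptionSketch {

    /**
     * Runs `components` hypothetical start callbacks on a fork/join pool of
     * `threads` workers and returns how many had NOT run when the timeout
     * expired (0 means every component started in time).
     */
    static long startComponents(int threads, int components, long timeoutMillis) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        CountDownLatch startLatch = new CountDownLatch(components);
        try {
            for (int i = 0; i < components; i++) {
                // Stand-in for a component's start method being invoked.
                pool.execute(startLatch::countDown);
            }
            startLatch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        } finally {
            pool.shutdown();
        }
        return startLatch.getCount();
    }

    public static void main(String[] args) {
        // Integer.getInteger reads the value passed as -Dthreads=N (default 4 here).
        int threads = Integer.getInteger("threads", 4);
        long remaining = startComponents(threads, 630, 60_000);
        if (remaining == 0) {
            System.out.println("all components started");
        } else {
            System.out.println("Could not start components timely: current start latch=" + remaining);
        }
    }
}
```

Read this way, a non-zero latch value after the timeout means that some start callbacks never ran, which matches the interpretation given in the message that a few components were still waiting for a dependency.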
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
Oops, I just realised that it does not make sense to include the SCR test bundle in the benchmark tool, since it's not a concurrent test. So for now, I have clues about where the problem may come from.

cheers; /Pierre
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
Hi Pierre,

It would indeed be useful to find out more about why your test is hanging. Maybe analysing a thread dump might give some more information?

Cheers, David
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
The thread dump did not help. I will investigate (maybe there is a bug somewhere on my side; if this is the case, I am sorry for making all this noise). Hope to let you know soon.

By the way, do you know how to run the SCR integration tests with the framework from the trunk? I know that some SCR integration tests do load testing, and I would be interested to know whether they are also OK with the framework from the trunk.

cheers; /Pierre
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
OK, it's a bit late, I will continue tomorrow. What I just found is that when the test fails, we are in the following situation: a DM component C1 that is part of the test remains inactive because it is waiting for a service dependency on C2. But C2 is actually registered in the OSGi service registry (I verified it using the gogo command inspect capability service). And it looks like the service tracker used by C1 to track C2 has never been called in addingService(C2). That is why C1 remains inactive, which makes the test fail (I added some debug code in dependency manager in order to verify this).

So, it will be difficult to write an integration test, but I think there is still a problem somewhere in the framework. I also ran the DM tests and I now have 4 failing tests. Will continue to investigate tomorrow if I can.

cheers; /Pierre
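The symptom described in this message (C2 is in the registry, yet the tracker used by C1 never saw it) is what one would expect if a registration could slip between a tracker's initial query and the moment its listener becomes active, or if an event were dropped. The sketch below illustrates the ordering that closes such a window: subscribe first, then catch up on what is already registered, and tolerate duplicates. It uses a toy registry and tracker with made-up names; it is not the Felix ServiceRegistry or the Dependency Manager ServiceTracker, and it says nothing about where the actual bug was.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class TrackerSketch {

    /** Toy registry, hypothetical: just enough to show the ordering concern. */
    static class ToyRegistry {
        private final Set<String> services = ConcurrentHashMap.newKeySet();
        private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

        void register(String service) {
            services.add(service);                          // 1. visible to queries
            listeners.forEach(l -> l.accept(service));      // 2. then notify listeners
        }

        void addListener(Consumer<String> listener) {
            listeners.add(listener);
        }

        Set<String> snapshot() {
            return new HashSet<>(services);
        }
    }

    /** Toy tracker, hypothetical: listens first, then catches up. */
    static class ToyTracker {
        final Set<String> tracked = ConcurrentHashMap.newKeySet();

        void open(ToyRegistry registry) {
            registry.addListener(this::adding);             // subscribe first
            registry.snapshot().forEach(this::adding);      // then pick up existing services
        }

        private void adding(String service) {
            tracked.add(service);                           // a set, so duplicates are harmless
        }
    }

    /** Registers n services from another thread while the tracker is being opened. */
    static int raceDemo(int n) {
        ToyRegistry registry = new ToyRegistry();
        ToyTracker tracker = new ToyTracker();
        Thread registrar = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                registry.register("C" + i);
            }
        });
        registrar.start();
        tracker.open(registry);
        try {
            registrar.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return tracker.tracked.size();
    }

    public static void main(String[] args) {
        System.out.println("tracked " + raceDemo(1000) + " of 1000 services");
    }
}
```

If the two steps in open were swapped (query first, subscribe second), a service registered between them would be missed by both paths, which would look exactly like a component waiting forever on a dependency that is in fact registered.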
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
Hi David, I don't know if it's me (a bug in my benchmark tool) or if if there is a regression somewhere in the framework, by my parallel test does not pass anymore. The test first starts with a single-threaded scenario, which passes OK (org.apache.felix.dependencymanager.benchmark.dependencymanager), then when the parallel test starts (org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel) it suddenly hangs, and when I type log warn under the gogo shell, I see the following exception: (I'm using java8): $ java -server -Xmx4g -Xms4g -jar bin/felix.jar Welcome to Apache Felix Gogo Benchmarking bundle: org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel . (here, the dependencymanager.parallel test hangs and when I type log warn, I see this:) g! log warn 2015.05.14 13:31:03 ERROR - Bundle: org.apache.felix.dependencymanager.benchmark.dependencymanager.parallel - [ForkJoinPool-1-worker-3] Error processing tasks - java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429) at java.util.HashMap$KeyIterator.next(HashMap.java:1453) at java.util.AbstractCollection.addAll(AbstractCollection.java:343) at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:245) at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:212) at org.apache.felix.framework.capabilityset.CapabilitySet.match(CapabilitySet.java:189) at org.apache.felix.framework.ServiceRegistry.getServiceReferences(ServiceRegistry.java:269) at org.apache.felix.framework.Felix.getServiceReferences(Felix.java:3577) at org.apache.felix.framework.Felix.getAllowedServiceReferences(Felix.java:3655) at org.apache.felix.framework.BundleContextImpl.getServiceReferences(BundleContextImpl.java:434) at org.apache.felix.dm.tracker.ServiceTracker.getInitialReferences(ServiceTracker.java:422) at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:375) at 
org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:319) at org.apache.felix.dm.tracker.ServiceTracker.open(ServiceTracker.java:295) at org.apache.felix.dm.impl.ServiceDependencyImpl.start(ServiceDependencyImpl.java:226) at org.apache.felix.dm.impl.ComponentImpl.startDependencies(ComponentImpl.java:657) at org.apache.felix.dm.impl.ComponentImpl.performTransition(ComponentImpl.java:535) at org.apache.felix.dm.impl.ComponentImpl.handleChange(ComponentImpl.java:492) at org.apache.felix.dm.impl.ComponentImpl.access$5(ComponentImpl.java:482) at org.apache.felix.dm.impl.ComponentImpl$3.run(ComponentImpl.java:227) at org.apache.felix.dm.impl.DispatchExecutor.runTask(DispatchExecutor.java:182) at org.apache.felix.dm.impl.DispatchExecutor.run(DispatchExecutor.java:165) at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1689) at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) (If I configure my threadpool to 1, I have no problems, but with threadpool=4, then I have the problem) I will investigate, but Ideally, may be it would be helpful if you could also run the test by yourself; so I will commit soon something to reproduce the problem in my sandbox. cheers; /Pierre On Thu, May 14, 2015 at 11:11 AM, David Bosschaert david.bosscha...@gmail.com wrote: I've committed this now in http://svn.apache.org/viewvc?view=revisionrevision=1679327 Curious to see what others are measuring. My tests were focused on multiple bundles/threads obtaining the same service, as that's were I saw a bit of contention. Cheers, David On 13 May 2015 at 15:10, Pierre De Rop pierre.de...@gmail.com wrote: Hi David, I'm looking forward to test your improvements using the dependencymanager benchmark tool ([1]). 
[1] http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/ /Pierre On Wed, May 13, 2015 at 3:02 PM, David Bosschaert david.bosscha...@gmail.com wrote: I have implemented the performance improvements that I was thinking of using Java 5 concurrency tools, they can be viewed at [1]. I wrote a little performance test suite [2] that tests multithreaded service registry performance (10 threads) from single / multiple bundles with either singleton services and Prototype Service Factory services and the results are quite impressive. I'm getting performance improvements compared to the current trunk from 8 times better than the original (800%) to more than 30 times better
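A side note on the ConcurrentModificationException in Pierre's log: java.util.HashMap iterators are fail-fast, so a structural change to the map while other code is iterating over it (in the trace, via addAll() inside CapabilitySet.match()) typically ends in this exception. The sketch below is not Felix code and says nothing about how the framework bug was actually fixed; the class and method names are made up for illustration. It reproduces the failure mode in a single thread and contrasts it with ConcurrentHashMap, whose iterators are weakly consistent and do not throw.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of Felix.
final class FailFastDemo {
    /** Returns true if modifying the map while iterating over it throws. */
    static boolean throwsOnModification(Map<String, Integer> map) {
        for (int i = 0; i < 10; i++) {
            map.put("key" + i, i);
        }
        try {
            for (String key : map.keySet()) {
                // Adding a new key is a structural modification. In the real
                // failure another thread does this while the iteration runs.
                map.put("added-during-iteration", 0);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap throws: "
            + throwsOnModification(new HashMap<String, Integer>()));
        System.out.println("ConcurrentHashMap throws: "
            + throwsOnModification(new ConcurrentHashMap<String, Integer>()));
    }
}
```

Whether a concurrent map, a copy taken under a lock, or something else is the right remedy depends on the surrounding code, so this should be read only as an explanation of the symptom.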
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
I've committed this now in http://svn.apache.org/viewvc?view=revision&revision=1679327

Curious to see what others are measuring. My tests were focused on multiple bundles/threads obtaining the same service, as that's where I saw a bit of contention.

Cheers,
David

On 13 May 2015 at 15:10, Pierre De Rop pierre.de...@gmail.com wrote:

Hi David, I'm looking forward to testing your improvements using the dependencymanager benchmark tool ([1]).

[1] http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/

/Pierre

On Wed, May 13, 2015 at 3:02 PM, David Bosschaert david.bosscha...@gmail.com wrote:

I have implemented the performance improvements that I was thinking of using Java 5 concurrency tools; they can be viewed at [1]. I wrote a little performance test suite [2] that tests multithreaded service registry performance (10 threads) from single / multiple bundles with either singleton services or Prototype Service Factory services, and the results are quite impressive. I'm getting performance improvements compared to the current trunk from 8 times better than the original (800%) to more than 30 times better (3000%).

Carsten has already reviewed the code (thanks Carsten!) and I'm planning to commit it to Felix tomorrow if nobody objects.

Cheers,
David

[1] https://github.com/bosschaert/felix/commit/e6a1b06c6e66d9c98e6d81b91ef7003c8e725450
[2] https://github.com/bosschaert/coderthoughts/tree/master/service-registry-perftest/srperf

On 23 March 2015 at 15:39, Richard S. Hall he...@ungoverned.org wrote:
On 3/23/15 10:17 , David Bosschaert wrote:
On 23 March 2015 at 13:39, Richard S. Hall he...@ungoverned.org wrote:
On 3/23/15 03:55 , Guillaume Nodet wrote:

There's a call to interrupt() in Felix#acquireBundleLock(), not sure if it can be the culprit though. Interrupts could also be caused by a bundle being shut down while one of its threads is waiting for a service, which should be a valid use case imho.
Anyway, I think sanely reacting to a thread being interrupted would be good. Yes, threads can be interrupted if they are holding a bundle lock and the global lock holder needs the bundle lock. I admit that I do not recall why we ignore the interrupt here, but didn't we implement service lookup so that a bundle lock wasn't necessary? I thought we just checked for the validity of the bundle context before returning or something. Perhaps we felt there was no reason to be interrupted in that case. I really don't know. I think that the Service Registry could be rewritten to be completely free of synchronized blocks using the Java 5 concurrency libraries, Well, that just moves the sync blocks to the library, but yeah sure. which I think would really be a better approach. There is too much locking going on in the current SR implementation IMHO. I don't really think there is too much, but it is complicated. Unfortunately, it is complicated to make sure that locks aren't held while do service lookups and this is complicated because you can run into cycles, etc. But feel free to try to simplify it. This brings the question: can we move to Java 5 (or Java 6) for the Framework codebase? AFAIK we're currently still JDK 1.4 compatible but I would be surprised if there is anyone who still needs a JDK that went end-of-life 7 years ago. At this point, it doesn't really matter to me. - richard Best regards, David
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
Hi David, I'm looking forward to test your improvements using the dependencymanager benchmark tool ([1]). [1] http://svn.apache.org/viewvc/felix/trunk/dependencymanager/org.apache.felix.dependencymanager.benchmark/ /Pierre On Wed, May 13, 2015 at 3:02 PM, David Bosschaert david.bosscha...@gmail.com wrote: I have implemented the performance improvements that I was thinking of using Java 5 concurrency tools, they can be viewed at [1]. I wrote a little performance test suite [2] that tests multithreaded service registry performance (10 threads) from single / multiple bundles with either singleton services and Prototype Service Factory services and the results are quite impressive. I'm getting performance improvements compared to the current trunk from 8 times better than the original (800%) to more than 30 times better (3000%). Carsten has already reviewed the code (thanks Carsten!) and I'm planning to commit it to Felix tomorrow if nobody objects. Cheers, David [1] https://github.com/bosschaert/felix/commit/e6a1b06c6e66d9c98e6d81b91ef7003c8e725450 [2] https://github.com/bosschaert/coderthoughts/tree/master/service-registry-perftest/srperf On 23 March 2015 at 15:39, Richard S. Hall he...@ungoverned.org wrote: On 3/23/15 10:17 , David Bosschaert wrote: On 23 March 2015 at 13:39, Richard S. Hall he...@ungoverned.org wrote: On 3/23/15 03:55 , Guillaume Nodet wrote: There's a call to interrupt() in Felix#acquireBundleLock(), not sure if it can be the culprit though. Interrupts could also be caused by a bundle being shutdown while one of its thread is waiting for a service, which should is a valid use case imho. Anyway, I think sanely reacting to a thread being interrupted would be good. Yes, threads can be interrupted if they are holding a bundle lock and the global lock holder needs the bundle lock. I admit that I do not recall why we ignore the interrupt here, but didn't we implement service lookup so that a bundle lock wasn't necessary? 
I thought we just checked for the validity of the bundle context before returning or something. Perhaps we felt there was no reason to be interrupted in that case. I really don't know. I think that the Service Registry could be rewritten to be completely free of synchronized blocks using the Java 5 concurrency libraries, Well, that just moves the sync blocks to the library, but yeah sure. which I think would really be a better approach. There is too much locking going on in the current SR implementation IMHO. I don't really think there is too much, but it is complicated. Unfortunately, it is complicated to make sure that locks aren't held while do service lookups and this is complicated because you can run into cycles, etc. But feel free to try to simplify it. This brings the question: can we move to Java 5 (or Java 6) for the Framework codebase? AFAIK we're currently still JDK 1.4 compatible but I would be surprised if there is anyone who still needs a JDK that went end-of-life 7 years ago. At this point, it doesn't really matter to me. - richard Best regards, David
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
I have implemented the performance improvements that I was thinking of using Java 5 concurrency tools, they can be viewed at [1]. I wrote a little performance test suite [2] that tests multithreaded service registry performance (10 threads) from single / multiple bundles with either singleton services and Prototype Service Factory services and the results are quite impressive. I'm getting performance improvements compared to the current trunk from 8 times better than the original (800%) to more than 30 times better (3000%). Carsten has already reviewed the code (thanks Carsten!) and I'm planning to commit it to Felix tomorrow if nobody objects. Cheers, David [1] https://github.com/bosschaert/felix/commit/e6a1b06c6e66d9c98e6d81b91ef7003c8e725450 [2] https://github.com/bosschaert/coderthoughts/tree/master/service-registry-perftest/srperf On 23 March 2015 at 15:39, Richard S. Hall he...@ungoverned.org wrote: On 3/23/15 10:17 , David Bosschaert wrote: On 23 March 2015 at 13:39, Richard S. Hall he...@ungoverned.org wrote: On 3/23/15 03:55 , Guillaume Nodet wrote: There's a call to interrupt() in Felix#acquireBundleLock(), not sure if it can be the culprit though. Interrupts could also be caused by a bundle being shutdown while one of its thread is waiting for a service, which should is a valid use case imho. Anyway, I think sanely reacting to a thread being interrupted would be good. Yes, threads can be interrupted if they are holding a bundle lock and the global lock holder needs the bundle lock. I admit that I do not recall why we ignore the interrupt here, but didn't we implement service lookup so that a bundle lock wasn't necessary? I thought we just checked for the validity of the bundle context before returning or something. Perhaps we felt there was no reason to be interrupted in that case. I really don't know. 
I think that the Service Registry could be rewritten to be completely free of synchronized blocks using the Java 5 concurrency libraries, Well, that just moves the sync blocks to the library, but yeah sure. which I think would really be a better approach. There is too much locking going on in the current SR implementation IMHO. I don't really think there is too much, but it is complicated. Unfortunately, it is complicated to make sure that locks aren't held while do service lookups and this is complicated because you can run into cycles, etc. But feel free to try to simplify it. This brings the question: can we move to Java 5 (or Java 6) for the Framework codebase? AFAIK we're currently still JDK 1.4 compatible but I would be surprised if there is anyone who still needs a JDK that went end-of-life 7 years ago. At this point, it doesn't really matter to me. - richard Best regards, David
Re: [Framework] ServiceRegistry.getService() endless loop with lock?
On 3/23/15 03:55, Guillaume Nodet wrote:
> There's a call to interrupt() in Felix#acquireBundleLock(), not sure if it can be the culprit though. Interrupts could also be caused by a bundle being shut down while one of its threads is waiting for a service, which should be a valid use case imho. Anyway, I think sanely reacting to a thread being interrupted would be good.

Yes, threads can be interrupted if they are holding a bundle lock and the global lock holder needs the bundle lock. I admit that I do not recall why we ignore the interrupt here, but didn't we implement service lookup so that a bundle lock wasn't necessary? I thought we just checked for the validity of the bundle context before returning or something. Perhaps we felt there was no reason to be interrupted in that case. I really don't know.

- richard

> 2015-03-23 8:46 GMT+01:00 Carsten Ziegeler cziege...@apache.org:
>> On 23.03.15 at 01:25, Richard S. Hall wrote:
>>> On 3/21/15 05:52, Carsten Ziegeler wrote:
>>>> Question remains, why the thread got interrupted in the first place.
>>> It was something that you did as part of FELIX-4806:
>> Yes, I noticed this as well - and I have no idea why I did it. I know that I worked on some other code at that time where it was good to add the interrupt call. Maybe a case of repeating the pattern :(
>>
>> But my question was not why this piece of code was added, but rather why the thread gets interrupted in the first place.
>>
>> Carsten
>> --
>> Carsten Ziegeler
>> Adobe Research Switzerland
>> cziege...@apache.org
Re: Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
On 3/23/15 10:17, David Bosschaert wrote:
> On 23 March 2015 at 13:39, Richard S. Hall he...@ungoverned.org wrote:
>> On 3/23/15 03:55, Guillaume Nodet wrote:
>>> There's a call to interrupt() in Felix#acquireBundleLock(), not sure if it can be the culprit though. Interrupts could also be caused by a bundle being shut down while one of its threads is waiting for a service, which should be a valid use case imho. Anyway, I think sanely reacting to a thread being interrupted would be good.
>> Yes, threads can be interrupted if they are holding a bundle lock and the global lock holder needs the bundle lock. I admit that I do not recall why we ignore the interrupt here, but didn't we implement service lookup so that a bundle lock wasn't necessary? I thought we just checked for the validity of the bundle context before returning or something. Perhaps we felt there was no reason to be interrupted in that case. I really don't know.
> I think that the Service Registry could be rewritten to be completely free of synchronized blocks using the Java 5 concurrency libraries,

Well, that just moves the sync blocks to the library, but yeah sure.

> which I think would really be a better approach. There is too much locking going on in the current SR implementation IMHO.

I don't really think there is too much, but it is complicated. Unfortunately, it is complicated to make sure that locks aren't held while doing service lookups, and this is complicated because you can run into cycles, etc. But feel free to try to simplify it.

> This brings the question: can we move to Java 5 (or Java 6) for the Framework codebase? AFAIK we're currently still JDK 1.4 compatible but I would be surprised if there is anyone who still needs a JDK that went end-of-life 7 years ago.

At this point, it doesn't really matter to me.

- richard

> Best regards,
> David
Service Registry refactor using Java-5 concurrency libraries (Was Re: [Framework] ServiceRegistry.getService() endless loop with lock?)
On 23 March 2015 at 13:39, Richard S. Hall he...@ungoverned.org wrote: On 3/23/15 03:55 , Guillaume Nodet wrote: There's a call to interrupt() in Felix#acquireBundleLock(), not sure if it can be the culprit though. Interrupts could also be caused by a bundle being shutdown while one of its thread is waiting for a service, which should is a valid use case imho. Anyway, I think sanely reacting to a thread being interrupted would be good. Yes, threads can be interrupted if they are holding a bundle lock and the global lock holder needs the bundle lock. I admit that I do not recall why we ignore the interrupt here, but didn't we implement service lookup so that a bundle lock wasn't necessary? I thought we just checked for the validity of the bundle context before returning or something. Perhaps we felt there was no reason to be interrupted in that case. I really don't know. I think that the Service Registry could be rewritten to be completely free of synchronized blocks using the Java 5 concurrency libraries, which I think would really be a better approach. There is too much locking going on in the current SR implementation IMHO. This brings the question: can we move to Java 5 (or Java 6) for the Framework codebase? AFAIK we're currently still JDK 1.4 compatible but I would be surprised if there is anyone who still needs a JDK that went end-of-life 7 years ago. Best regards, David
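To make the proposal a little more concrete, here is a minimal sketch of the kind of structure that java.util.concurrent (available since Java 5) allows: a per-key usage counter kept in a ConcurrentHashMap of AtomicInteger values and updated without any synchronized block. It is an illustration only. The class UsageCounts and its methods are invented for this example and are neither the existing Service Registry code nor the eventual patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not taken from the Felix code base.
final class UsageCounts<K> {
    private final ConcurrentMap<K, AtomicInteger> counts =
        new ConcurrentHashMap<K, AtomicInteger>();

    /** Atomically increments the counter for key and returns the new value. */
    int increment(K key) {
        AtomicInteger counter = counts.get(key);
        if (counter == null) {
            // putIfAbsent returns the existing value if another thread won the race.
            AtomicInteger fresh = new AtomicInteger();
            counter = counts.putIfAbsent(key, fresh);
            if (counter == null) {
                counter = fresh;
            }
        }
        return counter.incrementAndGet();
    }

    /** Returns the current count for key, or 0 if it was never incremented. */
    int get(K key) {
        AtomicInteger counter = counts.get(key);
        return counter == null ? 0 : counter.get();
    }

    /** Increments one key from several threads at once and returns the final count. */
    static int hammer(int threads, final int perThread) {
        final UsageCounts<String> usage = new UsageCounts<String>();
        final CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        usage.increment("some.service");
                    }
                    done.countDown();
                }
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return usage.get("some.service");
    }

    public static void main(String[] args) {
        System.out.println(hammer(10, 1000));
    }
}
```

Removing an entry once its count drops back to zero is deliberately left out, because doing that correctly without a lock needs more care than this sketch shows.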
Re: [Framework] ServiceRegistry.getService() endless loop with lock?
On 23.03.15 at 01:25, Richard S. Hall wrote:
> On 3/21/15 05:52, Carsten Ziegeler wrote:
>> Question remains, why the thread got interrupted in the first place.
> It was something that you did as part of FELIX-4806:

Yes, I noticed this as well - and I have no idea why I did it. I know that I worked on some other code at that time where it was good to add the interrupt call. Maybe a case of repeating the pattern :(

But my question was not why this piece of code was added, but rather why the thread gets interrupted in the first place.

Carsten
--
Carsten Ziegeler
Adobe Research Switzerland
cziege...@apache.org
Re: [Framework] ServiceRegistry.getService() endless loop with lock?
There's a call to interrupt() in Felix#acquireBundleLock(), not sure if it can be the culprit though. Interrupts could also be caused by a bundle being shut down while one of its threads is waiting for a service, which should be a valid use case imho. Anyway, I think sanely reacting to a thread being interrupted would be good.

2015-03-23 8:46 GMT+01:00 Carsten Ziegeler cziege...@apache.org:
> On 23.03.15 at 01:25, Richard S. Hall wrote:
>> On 3/21/15 05:52, Carsten Ziegeler wrote:
>>> Question remains, why the thread got interrupted in the first place.
>> It was something that you did as part of FELIX-4806:
> Yes, I noticed this as well - and I have no idea why I did it. I know that I worked on some other code at that time where it was good to add the interrupt call. Maybe a case of repeating the pattern :(
>
> But my question was not why this piece of code was added, but rather why the thread gets interrupted in the first place.
>
> Carsten
> --
> Carsten Ziegeler
> Adobe Research Switzerland
> cziege...@apache.org
Re: [Framework] ServiceRegistry.getService() endless loop with lock?
On Fri, Mar 20, 2015 at 7:41 PM, David Bosschaert david.bosscha...@gmail.com wrote:
> Some more thoughts about this... The wait() call in the getService() method is as follows:
>
>     synchronized (this) {
>         // First make sure that no existing operation is currently
>         // being performed by another thread on the service registration.
>         for (Object o = m_lockedRegsMap.get(reg); (o != null); o = m_lockedRegsMap.get(reg)) {
>             // We don't allow cycles when we call out to the service factory.
>             if (o.equals(Thread.currentThread())) {
>                 throw new ServiceException(
>                     "ServiceFactory.getService() resulted in a cycle.",
>                     ServiceException.FACTORY_ERROR, null);
>             }
>             // Otherwise, wait for it to be freed.
>             try {
>                 wait();
>             } catch (InterruptedException ex) {
>                 Thread.currentThread().interrupt();
>             }
>         }

Resetting the interrupt flag on a thread after it has been interrupted is usual practice: it allows the thread pool managing that thread to see that a cancellation has been requested. So IMO the interrupt call should remain. However, the code should also break out of the loop, as an interrupt invariably is a request to stop the thread's execution. IMO the code should end up looking something like

    try {
        wait();
    } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
        break;
    }

Robert

> I'm wondering why the code doesn't break out of the loop in the catch block?
>
> Cheers,
> David
>
> On 20 March 2015 at 12:16, David Bosschaert david.bosscha...@gmail.com wrote:
>> Hi all,
>>
>> I'm looking at an issue that I'm experiencing (with Felix 4.6.1/Java 7) where the ServiceRegistry.getService() [1] method seems to be in an endless loop. It doesn't happen very often, but when it does, the thread executing getService() seems never to exit that method and apparently switches between the following two states:
>>
>> 1:
>> Thread 22059: (state = IN_VM)
>>  - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
>>  - java.lang.Object.wait() @bci=2, line=503 (Compiled frame)
>>  - org.apache.felix.framework.ServiceRegistry.getService(org.osgi.framework.Bundle, org.osgi.framework.ServiceReference, boolean) @bci=86, line=313 (Compiled frame)
>>
>> 2:
>> Thread 22059: (state = IN_VM)
>>  - java.lang.Throwable.fillInStackTrace(int) @bci=0 (Compiled frame; information may be imprecise)
>>  - java.lang.Throwable.fillInStackTrace() @bci=16, line=783 (Compiled frame)
>>  - java.lang.Throwable.<init>() @bci=24, line=250 (Compiled frame)
>>  - java.lang.Exception.<init>() @bci=1, line=54 (Compiled frame)
>>  - java.lang.InterruptedException.<init>() @bci=1, line=57 (Compiled frame)
>>  - java.lang.Object.wait(long) @bci=0 (Compiled frame)
>>  - java.lang.Object.wait() @bci=2, line=503 (Compiled frame)
>>  - org.apache.felix.framework.ServiceRegistry.getService(org.osgi.framework.Bundle, org.osgi.framework.ServiceReference, boolean) @bci=86, line=313 (Compiled frame)
>>
>> Even though the thread is executing wait(), all of the other Felix SR-accessing threads are blocked on the Service Registry lock. The net effect is that any operation on the Service Registry is blocked.
>>
>> There is one thing that I don't understand: in the above frames the lock should really be released, as the code is in wait(). However, it seems like the lock is still held, because none of the other threads are getting access to the Service Registry.
>>
>> For example, another such thread is the following, which is actually about to decrease the usage count on the service and then call notifyAll():
>>
>> Thread 48643: (state = BLOCKED)
>>  - org.apache.felix.framework.ServiceRegistry.getService(org.osgi.framework.Bundle, org.osgi.framework.ServiceReference, boolean) @bci=241, line=367 (Compiled frame)
>>  - org.apache.felix.framework.util.EventDispatcher.filterListenersUsingHooks(org.osgi.framework.ServiceEvent, org.osgi.framework.launch.Framework, java.util.Map) @bci=349, line=618 (Compiled frame)
>>  - org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(org.osgi.framework.ServiceEvent, java.util.Dictionary, org.osgi.framework.launch.Framework) @bci=33, line=542 (Interpreted frame)
>>  - org.apache.felix.framework.Felix.fireServiceEvent(org.osgi.framework.ServiceEvent, java.util.Dictionary) @bci=7, line=4547 (Compiled frame)
>>  - org.apache.felix.framework.Felix.access$000(org.apache.felix.framework.Felix, org.osgi.framework.ServiceEvent, java.util.Dictionary) @bci=3, line=106 (Compiled frame)
>>  -
Re: [Framework] ServiceRegistry.getService() endless loop with lock?
On 3/21/15 05:52, Carsten Ziegeler wrote:
> On 20.03.15 at 18:41, David Bosschaert wrote:
>> Some more thoughts about this... The wait() call in the getService() method is as follows:
>>
>>     synchronized (this) {
>>         // First make sure that no existing operation is currently
>>         // being performed by another thread on the service registration.
>>         for (Object o = m_lockedRegsMap.get(reg); (o != null); o = m_lockedRegsMap.get(reg)) {
>>             // We don't allow cycles when we call out to the service factory.
>>             if (o.equals(Thread.currentThread())) {
>>                 throw new ServiceException(
>>                     "ServiceFactory.getService() resulted in a cycle.",
>>                     ServiceException.FACTORY_ERROR, null);
>>             }
>>             // Otherwise, wait for it to be freed.
>>             try {
>>                 wait();
>>             } catch (InterruptedException ex) {
>>                 Thread.currentThread().interrupt();
>>             }
>>         }
>>
>> I'm wondering why the code doesn't break out of the loop in the catch block?
>
> Good question. The call to interrupt() is wrong. If the thread gets interrupted while in wait(), the catch block sets the interrupt flag again and the current thread stays in the for loop. Then it hits wait() again and, as the interrupt flag is set, wait() immediately throws an InterruptedException, and the game starts again. So this basically creates a busy loop once this thread gets interrupted. I assume that, because the thread holds the lock on this and is in a busy loop, the other thread that wants the lock on this has no chance: once this thread is interrupted, it never really waits. This would explain why it never leaves the loop.
>
> Question remains, why the thread got interrupted in the first place.

It was something that you did as part of FELIX-4806:

https://fisheye6.atlassian.com/browse/felix/trunk/framework/src/main/java/org/apache/felix/framework/ServiceRegistry.java?r=1661592

Not sure why, though. It doesn't seem related to the issue.

- richard

> Carsten
Re: [Framework] ServiceRegistry.getService() endless loop with lock?
Thanks Carsten! I removed that interrupt() call on trunk. On why it got interrupted? I'm not sure. I guess anyone could call Thread.interrupt()... Cheers, David On 21 March 2015 at 09:52, Carsten Ziegeler cziege...@apache.org wrote: Good question - The call to interrupt() is wrong. If the thread gets interrupted while in wait(), the catch block resets the interrupt flag and the current thread stays in the for loop. Then it hits wait and as the interrupted flag is set, it throws an interrupted exception, and the game starts again. So this will basically create a busy loop once this thread gets interrupted. I assume, as the thread is holding the lock on this and it's a busy loop, the other thread who wants to get the lock on this has no chance as once this thread is interrupted, it never waits. This would explain why it never leaves the loop. Question remains, why the thread got interrupted in the first place. Carsten -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org
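David's message only says that the interrupt() call was removed. One pattern that is often used when a wait loop must keep waiting despite an interrupt, without spinning and without losing the interrupt, is to record it and restore the flag only after the loop has finished. The following is a sketch of that pattern under made-up names (DeferredInterruptLock, lock, unlock, demo); it is not the change that was committed to trunk.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical example class, not the Felix ServiceRegistry.
final class DeferredInterruptLock {
    private final Set<Object> locked = new HashSet<Object>();

    /** Blocks until key is free, then marks it as locked. */
    synchronized void lock(Object key) {
        boolean interrupted = false;
        try {
            while (locked.contains(key)) {
                try {
                    wait();
                } catch (InterruptedException ex) {
                    // Remember the interrupt, but do not set the flag yet:
                    // doing so here would make the next wait() throw at once.
                    interrupted = true;
                }
            }
            locked.add(key);
        } finally {
            if (interrupted) {
                // Restore the flag for the caller once we are done waiting.
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Frees key and wakes up all waiting threads. */
    synchronized void unlock(Object key) {
        locked.remove(key);
        notifyAll();
    }

    /** Returns true if an interrupted waiter still gets the lock and keeps its flag. */
    static boolean demo() {
        final DeferredInterruptLock registry = new DeferredInterruptLock();
        final AtomicBoolean flagKept = new AtomicBoolean(false);
        registry.lock("reg");
        Thread waiter = new Thread(new Runnable() {
            public void run() {
                Thread.currentThread().interrupt(); // simulate an interrupt
                registry.lock("reg");               // waits until "reg" is freed
                flagKept.set(Thread.currentThread().isInterrupted());
                registry.unlock("reg");
            }
        });
        waiter.start();
        try {
            Thread.sleep(100);                      // let the waiter reach wait()
            registry.unlock("reg");
            waiter.join(5000);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return flagKept.get() && !waiter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Whether waiting on, as here, or giving up on interrupt is the better behaviour for getService() is a design question that the thread leaves open.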
Re: [Framework] ServiceRegistry.getService() endless loop with lock?
On 20.03.15 at 18:41, David Bosschaert wrote:
> Some more thoughts about this... The wait() call in the getService() method is as follows:
>
>     synchronized (this) {
>         // First make sure that no existing operation is currently
>         // being performed by another thread on the service registration.
>         for (Object o = m_lockedRegsMap.get(reg); (o != null); o = m_lockedRegsMap.get(reg)) {
>             // We don't allow cycles when we call out to the service factory.
>             if (o.equals(Thread.currentThread())) {
>                 throw new ServiceException(
>                     "ServiceFactory.getService() resulted in a cycle.",
>                     ServiceException.FACTORY_ERROR, null);
>             }
>             // Otherwise, wait for it to be freed.
>             try {
>                 wait();
>             } catch (InterruptedException ex) {
>                 Thread.currentThread().interrupt();
>             }
>         }
>
> I'm wondering why the code doesn't break out of the loop in the catch block?

Good question. The call to interrupt() is wrong. If the thread gets interrupted while in wait(), the catch block sets the interrupt flag again and the current thread stays in the for loop. Then it hits wait() again and, as the interrupt flag is set, wait() immediately throws an InterruptedException, and the game starts again. So this basically creates a busy loop once this thread gets interrupted.

I assume that, because the thread holds the lock on this and is in a busy loop, the other thread that wants the lock on this has no chance: once this thread is interrupted, it never really waits. This would explain why it never leaves the loop.

Question remains, why the thread got interrupted in the first place.

Carsten
--
Carsten Ziegeler
Adobe Research Switzerland
cziege...@apache.org
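The busy loop described above can be reproduced in isolation. The sketch below is not Felix code; InterruptSpinDemo and its method are hypothetical names, and the loop is capped at a fixed number of attempts so that it terminates. It relies only on the documented behaviour of Object.wait(): if the calling thread's interrupt flag is already set, wait() throws InterruptedException right away and clears the flag.

```java
// Hypothetical demo class, not part of Felix.
final class InterruptSpinDemo {
    /**
     * Mimics the shape of the quoted loop: wait() inside a loop whose catch
     * block sets the interrupt flag again. Returns how many of maxAttempts
     * calls to wait() threw InterruptedException.
     */
    static int countImmediateInterrupts(int maxAttempts) {
        final Object lock = new Object();
        int thrown = 0;
        // Simulate an interrupt arriving from some other thread.
        Thread.currentThread().interrupt();
        synchronized (lock) {
            for (int i = 0; i < maxAttempts; i++) {
                try {
                    // With the flag set, wait() throws instead of blocking.
                    lock.wait(50);
                } catch (InterruptedException ex) {
                    thrown++;
                    // Same as the quoted catch block: the flag is set again,
                    // so the next wait() throws immediately as well.
                    Thread.currentThread().interrupt();
                }
            }
        }
        // Clear the flag so the demo leaves the calling thread clean.
        Thread.interrupted();
        return thrown;
    }

    public static void main(String[] args) {
        System.out.println(countImmediateInterrupts(5));
    }
}
```

Every attempt throws, so without the cap the loop would never block again, which matches the two alternating thread states in David's dump.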
Re: [Framework] ServiceRegistry.getService() endless loop with lock?
Some more thoughts about this... The wait() call in the getService() method is as follows:

    synchronized (this) {
        // First make sure that no existing operation is currently
        // being performed by another thread on the service registration.
        for (Object o = m_lockedRegsMap.get(reg); (o != null); o = m_lockedRegsMap.get(reg)) {
            // We don't allow cycles when we call out to the service factory.
            if (o.equals(Thread.currentThread())) {
                throw new ServiceException(
                    "ServiceFactory.getService() resulted in a cycle.",
                    ServiceException.FACTORY_ERROR, null);
            }
            // Otherwise, wait for it to be freed.
            try {
                wait();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }

I'm wondering why the code doesn't break out of the loop in the catch block?

Cheers,
David

On 20 March 2015 at 12:16, David Bosschaert david.bosscha...@gmail.com wrote:
> Hi all,
>
> I'm looking at an issue that I'm experiencing (with Felix 4.6.1/Java 7) where the ServiceRegistry.getService() [1] method seems to be in an endless loop. It doesn't happen very often, but when it does, the thread executing getService() seems never to exit that method and apparently switches between the following two states:
>
> 1:
> Thread 22059: (state = IN_VM)
>  - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.lang.Object.wait() @bci=2, line=503 (Compiled frame)
>  - org.apache.felix.framework.ServiceRegistry.getService(org.osgi.framework.Bundle, org.osgi.framework.ServiceReference, boolean) @bci=86, line=313 (Compiled frame)
>
> 2:
> Thread 22059: (state = IN_VM)
>  - java.lang.Throwable.fillInStackTrace(int) @bci=0 (Compiled frame; information may be imprecise)
>  - java.lang.Throwable.fillInStackTrace() @bci=16, line=783 (Compiled frame)
>  - java.lang.Throwable.<init>() @bci=24, line=250 (Compiled frame)
>  - java.lang.Exception.<init>() @bci=1, line=54 (Compiled frame)
>  - java.lang.InterruptedException.<init>() @bci=1, line=57 (Compiled frame)
>  - java.lang.Object.wait(long) @bci=0 (Compiled frame)
>  - java.lang.Object.wait() @bci=2, line=503 (Compiled frame)
>  - org.apache.felix.framework.ServiceRegistry.getService(org.osgi.framework.Bundle, org.osgi.framework.ServiceReference, boolean) @bci=86, line=313 (Compiled frame)
>
> Even though the thread is executing wait(), all of the other Felix SR-accessing threads are blocked on the Service Registry lock. The net effect is that any operation on the Service Registry is blocked.
>
> There is one thing that I don't understand: in the above frames the lock should really be released, as the code is in wait(). However, it seems like the lock is still held, because none of the other threads are getting access to the Service Registry.
>
> For example, another such thread is the following, which is actually about to decrease the usage count on the service and then call notifyAll():
>
> Thread 48643: (state = BLOCKED)
>  - org.apache.felix.framework.ServiceRegistry.getService(org.osgi.framework.Bundle, org.osgi.framework.ServiceReference, boolean) @bci=241, line=367 (Compiled frame)
>  - org.apache.felix.framework.util.EventDispatcher.filterListenersUsingHooks(org.osgi.framework.ServiceEvent, org.osgi.framework.launch.Framework, java.util.Map) @bci=349, line=618 (Compiled frame)
>  - org.apache.felix.framework.util.EventDispatcher.fireServiceEvent(org.osgi.framework.ServiceEvent, java.util.Dictionary, org.osgi.framework.launch.Framework) @bci=33, line=542 (Interpreted frame)
>  - org.apache.felix.framework.Felix.fireServiceEvent(org.osgi.framework.ServiceEvent, java.util.Dictionary) @bci=7, line=4547 (Compiled frame)
>  - org.apache.felix.framework.Felix.access$000(org.apache.felix.framework.Felix, org.osgi.framework.ServiceEvent, java.util.Dictionary) @bci=3, line=106 (Compiled frame)
>  - org.apache.felix.framework.Felix$1.serviceChanged(org.osgi.framework.ServiceEvent, java.util.Dictionary) @bci=6, line=436 (Compiled frame)
>  - org.apache.felix.framework.ServiceRegistry.unregisterService(org.osgi.framework.Bundle, org.osgi.framework.ServiceRegistration) @bci=100, line=165 (Compiled frame)
>  - org.apache.felix.framework.ServiceRegistrationImpl.unregister() @bci=52, line=140 (Interpreted frame)
>
> I just don't understand why all the other threads are blocked on the service registry. I'm probably missing something simple, so I would be grateful if someone else has an idea.
>
> Many thanks,
> David
>
> [1] http://svn.apache.org/repos/asf/felix/releases/org.apache.felix.framework-4.6.1/src/main/java/org/apache/felix/framework/ServiceRegistry.java