Hi JB, yes, Its possible that some installs are slower, but the initial install is one doProvision command, so I doubt that this is noticeable, and the race conditions I observed show that it actually is not safe (which could be fixed, but my proposal is faster to implement).
I also noticed that there is probably excessive parsing of bundles, which I was unable to fix so far (not a real problem except for startup speed) https://github.com/apache/karaf/blob/master/features/core/src/main/java/org/apache/karaf/features/internal/region/Subsystem.java#L355 Subsystem#downloadBundles is recursively invoked. The first subsystem is root, the children are all the karaf standard features. What is happening is that recursively download tasks are created. What this means is that if subsystem A has a dependency to X and subsystem B has a dependency to X, then X will be downloaded twice. Actually it might not be downloaded twice due to policy, but at least the jar is scanned twice and the metadata is extracted twice. With the amount of interdependencies, this is actually quite excessive. According to my profiling certain karaf dependencies are attempted to download and verified a couple hundred times. What could be done is that maybe the subsystem dependency graph is carried over to the next subsystem, avoiding to rebuild it again. An easier fix would be to within each subsystem first collect all locations for "downloader.download(loc, new DownloadCallback() {}" before actually triggering the download and its callback. But I found that this is not so effective, as the overlap is mostly between subsystems, not within. Anybody thinks it could be possible to share the "bundles" state between subsystems, so that multiple downloads for the same bundle can be avoided? Fabian PS: There a a couple ways to see the problem. For example, just log out the location passed into org.apache.karaf.features.internal.download.impl.MavenDownloadManager.MavenDownloader.download(String, DownloadCallback) - you will see the same urls over and over. On Wed, Jan 27, 2016 at 6:53 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote: > Hi Fabian, > > I quick take a look on your PR and it looks good to me. I think your usage > of the ExecutorService single thread instead of cached thread pool makes > sense. The only "impact" that I would like to evaluate is when the resolver > has to install "lot of" features in the same time. However, the impact is > probably just the installation time will be a bit long, but on the other > hand more reliable, so not a big deal ;) > > Let me run some tests on your PR. > > Thanks ! > Regards > JB > > > On 01/26/2016 10:57 PM, Fabian Lange wrote: > >> Hi JB, >> it turns out this is most likely the old Issue I reported once. Having now >> more insights into what should be done, I propose this solution: >> https://github.com/apache/karaf/pull/138 >> >> It basically ensures that only one provision is concurrently executing. >> What do you think? >> >> Fabian >> >> >> On Tue, Jan 26, 2016 at 10:19 PM, Fabian Lange < >> fabian.la...@codecentric.de> >> wrote: >> >> Hi JB, >>> not really reproducable. We had this before, and I then introduced a >>> locking abstraction around the features manager. This is a programmatic >>> installation of features. So I assume that features are no longer >>> installed >>> concurrently. >>> >>> However it might race with karaf/felix bootstrap and feature >>> installation. >>> I only synchronized my installation >>> >>> What I do, is that I add a feature repository, then do a foreach and add >>> the requirements, then fire off the install >>> >>> Fabian >>> >>> >>> On Tue, Jan 26, 2016 at 10:16 PM, Jean-Baptiste Onofré <j...@nanthrax.net> >>> wrote: >>> >>> It sounds like concurrent update of feature state in the store. >>>> >>>> Does it mean users install/uninstall multiple features in the same time >>>> ? >>>> >>>> Any use case to reproduce it ? >>>> >>>> Thanks >>>> Regards >>>> JB >>>> >>>> >>>> On 01/26/2016 09:18 PM, Fabian Lange wrote: >>>> >>>> Hi all, >>>>> I am seeing a significant amount of following stacktraces reported by >>>>> our >>>>> users. they seem to go away with restarting, so it looks like a leak or >>>>> race condition to me: >>>>> >>>>> java.lang.NullPointerException >>>>> >>>>> at >>>>> java.io.FileOutputStream.<init>(FileOutputStream.java:203)[:1.8.0_72] >>>>> >>>>> at >>>>> java.io.FileOutputStream.<init>(FileOutputStream.java:162)[:1.8.0_72] >>>>> >>>>> at >>>>> >>>>> org.apache.karaf.features.internal.osgi.Activator$1.getOutputStream(Activator.java:197)[7:org.apache.karaf.features.core:4.0.4] >>>>> >>>>> at >>>>> >>>>> org.apache.karaf.features.internal.service.StateStorage.save(StateStorage.java:56)[7:org.apache.karaf.features.core:4.0.4] >>>>> >>>>> at >>>>> >>>>> org.apache.karaf.features.internal.service.FeaturesServiceImpl.saveState(FeaturesServiceImpl.java:297)[7:org.apache.karaf.features.core:4.0.4] >>>>> >>>>> at >>>>> >>>>> org.apache.karaf.features.internal.service.FeaturesServiceImpl.saveState(FeaturesServiceImpl.java:1154)[7:org.apache.karaf.features.core:4.0.4] >>>>> >>>>> at >>>>> >>>>> org.apache.karaf.features.internal.service.Deployer.deploy(Deployer.java:764)[7:org.apache.karaf.features.core:4.0.4] >>>>> >>>>> at >>>>> >>>>> org.apache.karaf.features.internal.service.FeaturesServiceImpl.doProvision(FeaturesServiceImpl.java:1089)[7:org.apache.karaf.features.core:4.0.4] >>>>> >>>>> at >>>>> >>>>> org.apache.karaf.features.internal.service.FeaturesServiceImpl$1.call(FeaturesServiceImpl.java:985)[7:org.apache.karaf.features.core:4.0.4] >>>>> >>>>> at >>>>> java.util.concurrent.FutureTask.run(FutureTask.java:266)[:1.8.0_72] >>>>> >>>>> at >>>>> >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_72] >>>>> >>>>> at >>>>> >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_72] >>>>> >>>>> at java.lang.Thread.run(Thread.java:745)[:1.8.0_72] >>>>> >>>>> >>>>> the NPE is caused by bundleContex.getDataFile(state.json) returning >>>>> null. >>>>> The javadoc says that this could be possible, but because it works most >>>>> of >>>>> the time I speculate this might be actally a race condition, or maybe >>>>> leaked filehandles somewhere. >>>>> Anybody got pointers or ideas? >>>>> I am willing to make a fix, but I need to understand this more >>>>> >>>>> Fabian >>>>> >>>>> >>>>> -- >>>> Jean-Baptiste Onofré >>>> jbono...@apache.org >>>> http://blog.nanthrax.net >>>> Talend - http://www.talend.com >>>> >>>> >>> >>> >> > -- > Jean-Baptiste Onofré > jbono...@apache.org > http://blog.nanthrax.net > Talend - http://www.talend.com >