Hi JB,
yes, Its possible that some installs are slower, but the initial install is
one doProvision command, so I doubt that this is noticeable, and the race
conditions I observed show that it actually is not safe (which could be
fixed, but my proposal is faster to implement).

I also noticed that there is probably excessive parsing of bundles, which I
was unable to fix so far (not a real problem except for startup speed)

https://github.com/apache/karaf/blob/master/features/core/src/main/java/org/apache/karaf/features/internal/region/Subsystem.java#L355

Subsystem#downloadBundles is recursively invoked.
The first subsystem is root, the children are all the karaf standard
features.
What is happening is that recursively download tasks are created. What this
means is that if subsystem A has a dependency to X and subsystem B has a
dependency to X, then X will be downloaded twice. Actually it might not be
downloaded twice due to policy, but at least the jar is scanned twice and
the metadata is extracted twice.
With the amount of interdependencies, this is actually quite excessive.
According to my profiling certain karaf dependencies are attempted to
download and verified a couple hundred times.

What could be done is that maybe the subsystem dependency graph is carried
over to the next subsystem, avoiding to rebuild it again.
An easier fix would be to within each subsystem first collect all locations
for "downloader.download(loc, new DownloadCallback() {}" before actually
triggering the download and its callback. But I found that this is not so
effective, as the overlap is mostly between subsystems, not within.

Anybody thinks it could be possible to share the "bundles" state between
subsystems, so that multiple downloads for the same bundle can be avoided?

Fabian

PS: There a a couple ways to see the problem. For example, just log out the
location passed
into 
org.apache.karaf.features.internal.download.impl.MavenDownloadManager.MavenDownloader.download(String,
DownloadCallback) - you will see the same urls over and over.


On Wed, Jan 27, 2016 at 6:53 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Fabian,
>
> I quick take a look on your PR and it looks good to me. I think your usage
> of the ExecutorService single thread instead of cached thread pool makes
> sense. The only "impact" that I would like to evaluate is when the resolver
> has to install "lot of" features in the same time. However, the impact is
> probably just the installation time will be a bit long, but on the other
> hand more reliable, so not a big deal ;)
>
> Let me run some tests on your PR.
>
> Thanks !
> Regards
> JB
>
>
> On 01/26/2016 10:57 PM, Fabian Lange wrote:
>
>> Hi JB,
>> it turns out this is most likely the old Issue I reported once. Having now
>> more insights into what should be done, I propose this solution:
>> https://github.com/apache/karaf/pull/138
>>
>> It basically ensures that only one provision is concurrently executing.
>> What do you think?
>>
>> Fabian
>>
>>
>> On Tue, Jan 26, 2016 at 10:19 PM, Fabian Lange <
>> fabian.la...@codecentric.de>
>> wrote:
>>
>> Hi JB,
>>> not really reproducable. We had this before, and I then introduced a
>>> locking abstraction around the features manager. This is a programmatic
>>> installation of features. So I assume that features are no longer
>>> installed
>>> concurrently.
>>>
>>> However it might race with karaf/felix bootstrap and feature
>>> installation.
>>> I only synchronized my installation
>>>
>>> What I do, is that I add a feature repository, then do a foreach and add
>>> the requirements, then fire off the install
>>>
>>> Fabian
>>>
>>>
>>> On Tue, Jan 26, 2016 at 10:16 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>> wrote:
>>>
>>> It sounds like concurrent update of feature state in the store.
>>>>
>>>> Does it mean users install/uninstall multiple features in the same time
>>>> ?
>>>>
>>>> Any use case to reproduce it ?
>>>>
>>>> Thanks
>>>> Regards
>>>> JB
>>>>
>>>>
>>>> On 01/26/2016 09:18 PM, Fabian Lange wrote:
>>>>
>>>> Hi all,
>>>>> I am seeing a significant amount of following stacktraces reported by
>>>>> our
>>>>> users. they seem to go away with restarting, so it looks like a leak or
>>>>> race condition to me:
>>>>>
>>>>> java.lang.NullPointerException
>>>>>
>>>>>          at
>>>>> java.io.FileOutputStream.<init>(FileOutputStream.java:203)[:1.8.0_72]
>>>>>
>>>>>          at
>>>>> java.io.FileOutputStream.<init>(FileOutputStream.java:162)[:1.8.0_72]
>>>>>
>>>>>          at
>>>>>
>>>>> org.apache.karaf.features.internal.osgi.Activator$1.getOutputStream(Activator.java:197)[7:org.apache.karaf.features.core:4.0.4]
>>>>>
>>>>>          at
>>>>>
>>>>> org.apache.karaf.features.internal.service.StateStorage.save(StateStorage.java:56)[7:org.apache.karaf.features.core:4.0.4]
>>>>>
>>>>>          at
>>>>>
>>>>> org.apache.karaf.features.internal.service.FeaturesServiceImpl.saveState(FeaturesServiceImpl.java:297)[7:org.apache.karaf.features.core:4.0.4]
>>>>>
>>>>>          at
>>>>>
>>>>> org.apache.karaf.features.internal.service.FeaturesServiceImpl.saveState(FeaturesServiceImpl.java:1154)[7:org.apache.karaf.features.core:4.0.4]
>>>>>
>>>>>          at
>>>>>
>>>>> org.apache.karaf.features.internal.service.Deployer.deploy(Deployer.java:764)[7:org.apache.karaf.features.core:4.0.4]
>>>>>
>>>>>          at
>>>>>
>>>>> org.apache.karaf.features.internal.service.FeaturesServiceImpl.doProvision(FeaturesServiceImpl.java:1089)[7:org.apache.karaf.features.core:4.0.4]
>>>>>
>>>>>          at
>>>>>
>>>>> org.apache.karaf.features.internal.service.FeaturesServiceImpl$1.call(FeaturesServiceImpl.java:985)[7:org.apache.karaf.features.core:4.0.4]
>>>>>
>>>>>          at
>>>>> java.util.concurrent.FutureTask.run(FutureTask.java:266)[:1.8.0_72]
>>>>>
>>>>>          at
>>>>>
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_72]
>>>>>
>>>>>          at
>>>>>
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_72]
>>>>>
>>>>>          at java.lang.Thread.run(Thread.java:745)[:1.8.0_72]
>>>>>
>>>>>
>>>>> the NPE is caused by bundleContex.getDataFile(state.json) returning
>>>>> null.
>>>>> The javadoc says that this could be possible, but because it works most
>>>>> of
>>>>> the time I speculate this might be actally a race condition, or maybe
>>>>> leaked filehandles somewhere.
>>>>> Anybody got pointers or ideas?
>>>>> I am willing to make a fix, but I need to understand this more
>>>>>
>>>>> Fabian
>>>>>
>>>>>
>>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbono...@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>>
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Reply via email to