[
https://issues.apache.org/jira/browse/SLING-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858099#comment-15858099
]
Karl Pauls edited comment on SLING-5457 at 2/8/17 3:04 PM:
-----------------------------------------------------------
I think I can see what is going on: while the installer is active, a start
level change is happening at the same time, and the two are racing for the
same bundle.
As a result, the interaction is sometimes:
Bundle: ACTIVE
Installer: stop bundle
Bundle: STOPPED
Startlevel: start bundle
Bundle: STARTING
Installer: update bundle
Exception: bundle STARTING
Bundle: ACTIVE
In reality, this can be generalised to any two management agents racing for the
same bundle in this sequence. The bundle update doesn’t wait for a bundle that
is in the STOPPING or STARTING state. Instead, as mentioned in the issue, an
exception is thrown. I think that is actually a bug in Felix (technically, it’s
more of a missing feature, but that is beside the point), as newer versions of
the spec mandate that on an update the framework should wait for bundles that
are STOPPING or STARTING. Hence, the real fix for this issue is to implement
that behaviour in the Felix framework.
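To illustrate the waiting behaviour described above, here is a minimal sketch in plain Java. It is a toy model and not the Felix implementation: the class WaitingBundle, its State enum, the revision counter and the timeout parameter are all hypothetical names chosen for illustration. The only point is that update blocks while the bundle is in a transient state instead of failing immediately.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of a bundle (hypothetical, not Felix code). */
final class WaitingBundle {
    enum State { RESOLVED, STARTING, ACTIVE, STOPPING }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private State state = State.RESOLVED;
    private int revision = 1;

    /** Called by whichever agent drives the lifecycle (e.g. the start level thread). */
    void setState(State newState) {
        lock.lock();
        try {
            state = newState;
            stateChanged.signalAll(); // wake up any update that is waiting
        } finally {
            lock.unlock();
        }
    }

    int getRevision() {
        lock.lock();
        try {
            return revision;
        } finally {
            lock.unlock();
        }
    }

    /**
     * Waits up to timeoutMillis for the bundle to leave STARTING/STOPPING,
     * then performs the (simulated) update. Returns false if the bundle is
     * still in a transient state when the timeout expires, or if the
     * calling thread is interrupted while waiting.
     */
    boolean update(long timeoutMillis) {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (state == State.STARTING || state == State.STOPPING) {
                if (remainingNanos <= 0) {
                    return false; // the caller decides whether to fail or retry
                }
                remainingNanos = stateChanged.awaitNanos(remainingNanos);
            }
            revision++; // stands in for the real update work
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

In this model, the sequence shown earlier would no longer end in an exception: the installer's update would simply block until the start level thread moves the bundle from STARTING to ACTIVE.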
In addition, I think that this specific interaction between the start level
change and the installer is somewhat unfortunate. It would probably be
worthwhile for the installer to try to be active only when there is no start
level change going on (I remember that there was some other bug report on the
Sling dev list recently that I suspect might be related to this interaction).
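One way to picture "only be active when there is no start level change going on" is a simple gate that the installer checks before it runs its tasks. The sketch below is plain Java with hypothetical names (StartLevelGate, beginChange, endChange, awaitQuiet); it is not how the installer works today. In a real framework, the end of a change could presumably be signalled from a listener for FrameworkEvent.STARTLEVEL_CHANGED, but that wiring is an assumption and is left out here.

```java
/** Hypothetical gate: the installer waits here while a start level change is running. */
final class StartLevelGate {
    private boolean changing;

    /** Mark the start of a start level change. */
    synchronized void beginChange() {
        changing = true;
    }

    /** Mark the end of a start level change and release any waiting installer thread. */
    synchronized void endChange() {
        changing = false;
        notifyAll();
    }

    /**
     * Blocks until no start level change is in progress or the timeout expires.
     * Returns true if the gate is open, false on timeout or interruption.
     */
    synchronized boolean awaitQuiet(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (changing) {
            long leftMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (leftMillis <= 0) {
                return false;
            }
            try {
                wait(leftMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Note that such a gate is only advisory: a new start level change can begin right after awaitQuiet returns, so it narrows the race window rather than closing it.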
Implementing a retry as proposed here should be OK as a short-term band-aid.
Ultimately, I’d say this should be addressed by an improved Felix framework and
possibly better handling of start level changes by the installer.
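For reference, a bounded retry of the kind proposed in the issue could look roughly like the sketch below. It is plain Java and not the installer's actual code: BoundedRetry, its parameters and the fixed delay between attempts are assumptions made for illustration. The task stands in for the call that currently fails with the BundleException.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: run a task up to maxAttempts times with a fixed pause in between. */
final class BoundedRetry {
    private BoundedRetry() {
    }

    /**
     * Returns true as soon as one attempt completes without throwing.
     * Returns false if all attempts fail or the thread is interrupted
     * while pausing between attempts.
     */
    static boolean retry(Callable<Void> task, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                task.call();
                return true;
            } catch (Exception e) {
                // This attempt failed; fall through and pause before the next one.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

With maxAttempts set to 3, this matches the "maybe 3 times at max" suggestion from the issue description. It only hides the race, which is why it is a stopgap rather than the real fix.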
I created FELIX-5528 to try to address this in the framework (as well as trying
to improve the error message as part of FELIX-5138).
> OsgiInstaller should retry to start bundles on failures
> -------------------------------------------------------
>
> Key: SLING-5457
> URL: https://issues.apache.org/jira/browse/SLING-5457
> Project: Sling
> Issue Type: Bug
> Components: Installer
> Affects Versions: Installer Core 3.6.4
> Reporter: Jörg Hoh
>
> The OsgiInstaller doesn't update a bundle properly, if there's an exception
> from the framework.
> I have this exception:
> {code}
> 11.12.2015 14:09:36.753 *INFO* [FelixStartLevel] my.custom.bundle BundleEvent RESOLVED
> 11.12.2015 14:09:36.753 *INFO* [FelixStartLevel] my.custom.bundle BundleEvent STARTING
> 11.12.2015 14:09:36.754 INFO [OsgiInstallerImpl] org.apache.sling.installer.core.impl.tasks.BundleUpdateTask Removing failing update task - unable to retry: BundleUpdateTask: TaskResource(url=jcrinstall:/apps/myapp/install/my.custom.bundle-1.5.6-SNAPSHOT.jar, entity=bundle:my.custom.bundle, state=INSTALL, attributes=[org.apache.sling.installer.api.tasks.ResourceTransformer=:28:84:15:, Bundle-SymbolicName=my.custom.bundle, Bundle-Version=1.5.6-SNAPSHOT], digest=1449838063263)
> org.osgi.framework.BundleException: Bundle my.custom.bundle [252] cannot be update, since it is either starting or stopping.
> at org.apache.felix.framework.Felix.updateBundle(Felix.java:2311)
> at org.apache.felix.framework.BundleImpl.update(BundleImpl.java:995)
> at org.apache.sling.installer.core.impl.tasks.BundleUpdateTask.execute(BundleUpdateTask.java:92)
> at org.apache.sling.installer.core.impl.OsgiInstallerImpl.doExecuteTasks(OsgiInstallerImpl.java:847)
> at org.apache.sling.installer.core.impl.OsgiInstallerImpl.executeTasks(OsgiInstallerImpl.java:689)
> at org.apache.sling.installer.core.impl.OsgiInstallerImpl.run(OsgiInstallerImpl.java:265)
> at java.lang.Thread.run(Thread.java:767)
> {code}
> I don't know for what reason the Felix.updateBundle() failed (see also
> FELIX-5138 to get some more information in this case), but from my point of
> view there should be a dedicated error handling just for the
> {code}BundleImpl.update{code} call. Does it make sense to retry the
> installation at a later point in time (maybe 3 times at max)?
> (I got this exception when I deployed a large number of bundles through the
> JCR installer. It happens only once in a while, but it's an annoying task to
> fix it manually.)