Hi Konrad,

On Sun, 2020-05-03 at 11:32 +0200, Konrad Windszus wrote:
> Hi, 
> I just experienced this exception when starting Sling Starter 12-
> SNAPSHOT:
> 
> 03.05.2020 11:22:33.372 *INFO* [CM Event Dispatcher (Fire ConfigurationEvent: pid=org.apache.jackrabbit.oak.spi.security.user.action.DefaultAuthorizableActionProvider)] org.apache.jackrabbit.oak.security.internal.SecurityProviderRegistration Trying to unregister the SecurityProvider...
> 03.05.2020 11:22:33.405 *ERROR* [Apache Sling Repository Startup Thread #1] org.apache.sling.jcr.oak.server.internal.OakSlingRepositoryManager start: Uncaught Throwable trying to access Repository, calling stopRepository()
> java.lang.RuntimeException: org.apache.jackrabbit.oak.api.CommitFailedException: OakSegment0002: Merge interrupted
>       at org.apache.jackrabbit.oak.OakInitializer.initialize(OakInitializer.java:50) [org.apache.jackrabbit.oak-core:1.26.0]
>       at org.apache.jackrabbit.oak.Oak.initialContent(Oak.java:689) [org.apache.jackrabbit.oak-core:1.26.0]
>       at org.apache.jackrabbit.oak.Oak.createNewContentRepository(Oak.java:734) [org.apache.jackrabbit.oak-core:1.26.0]
>       at org.apache.jackrabbit.oak.Oak.createContentRepository(Oak.java:673) [org.apache.jackrabbit.oak-core:1.26.0]
>       at org.apache.jackrabbit.oak.jcr.Jcr.createContentRepository(Jcr.java:376) [org.apache.jackrabbit.oak-jcr:1.26.0]
>       at org.apache.sling.jcr.oak.server.internal.OakSlingRepositoryManager.acquireRepository(OakSlingRepositoryManager.java:152) [org.apache.sling.jcr.oak.server:1.2.4]
>       at org.apache.sling.jcr.base.AbstractSlingRepositoryManager.initializeAndRegisterRepositoryService(AbstractSlingRepositoryManager.java:515) [org.apache.sling.jcr.base:3.1.0]
>       at org.apache.sling.jcr.base.AbstractSlingRepositoryManager.access$300(AbstractSlingRepositoryManager.java:92) [org.apache.sling.jcr.base:3.1.0]
>       at org.apache.sling.jcr.base.AbstractSlingRepositoryManager$4.run(AbstractSlingRepositoryManager.java:496) [org.apache.sling.jcr.base:3.1.0]
> Caused by: org.apache.jackrabbit.oak.api.CommitFailedException: OakSegment0002: Merge interrupted
>       at org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.schedule(LockBasedScheduler.java:284) [org.apache.jackrabbit.oak-segment-tar:1.26.0]
>       at org.apache.jackrabbit.oak.segment.SegmentNodeStore.merge(SegmentNodeStore.java:211) [org.apache.jackrabbit.oak-segment-tar:1.26.0]
>       at org.apache.jackrabbit.oak.OakInitializer.initialize(OakInitializer.java:48) [org.apache.jackrabbit.oak-core:1.26.0]
>       ... 8 common frames omitted
> Caused by: java.lang.InterruptedException: null
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
>       at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
>       at org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.schedule(LockBasedScheduler.java:262) [org.apache.jackrabbit.oak-segment-tar:1.26.0]
>       ... 10 common frames omitted
> I don't know what led to the restart of the Oak service itself
> (probably some reconfigurations) and I remember vaguely there once
> was a JIRA issue about that. Couldn't find it though, so any pointers
> are appreciated.
> 
> Should I open this in a new issue or add as comment to any existing
> one?

I see you correctly identified SLING-7811 [1] as the place where we discuss
the root cause. I think the problem surfaces at several levels:

1. We start components and then apply configurations later, thus
restarting them. I thought this was fixed in the feature model, but
apparently we get the same problem [2]. One option is to ask Oak to
make a configuration required for some components [3], but IMO this
does not fix the root cause. Oak is not to blame for the way we manage
components and configurations.

2. Oak does not play nicely with Thread.interrupt(). Unfortunately, this
is by design [4].

3. We stop the Oak repository using Thread.interrupt() [1]. As mentioned
in the Jira issue, I am working on a version that stops the repository
without using interrupts, but on its own this will not fix the problem
properly.
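To make points 2 and 3 concrete, here is a minimal plain-Java sketch of what the bottom of the stack trace shows. No Oak classes are involved; `InterruptDemo` and `runMerge` are made-up names. A thread doing a "merge" blocks in `Semaphore.acquire()` (as `LockBasedScheduler.schedule()` does in the trace), and interrupting that thread turns into an `InterruptedException`, which in the trace becomes the cause of `OakSegment0002: Merge interrupted`. The other branch assumes a stop that lets the in-flight commit finish instead of interrupting it.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model, NOT Oak code: a startup thread performs a "merge" that first has
// to acquire a Semaphore, similar to LockBasedScheduler.schedule() in the trace above.
final class InterruptDemo {

    /** Runs one fake merge and reports how it ended. */
    static String runMerge(boolean stopWithInterrupt) {
        Semaphore commitLock = new Semaphore(0); // no permit yet, so the merge must wait
        AtomicReference<String> outcome = new AtomicReference<>("not finished");

        Thread startup = new Thread(() -> {
            try {
                commitLock.acquire();               // interruptible wait, as in the trace
                try {
                    outcome.set("merged");          // the commit completes
                } finally {
                    commitLock.release();
                }
            } catch (InterruptedException e) {
                outcome.set("merge interrupted");   // the commit is abandoned half-way
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        }, "repository-startup");
        startup.start();

        try {
            // Wait until the startup thread is queued on the semaphore.
            while (!commitLock.hasQueuedThreads()) {
                Thread.sleep(1);
            }
            if (stopWithInterrupt) {
                startup.interrupt();    // stopping via Thread.interrupt()
            } else {
                commitLock.release();   // let the in-flight merge finish instead
            }
            startup.join(TimeUnit.SECONDS.toMillis(5));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo itself was interrupted", e);
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(runMerge(true));  // prints "merge interrupted"
        System.out.println(runMerge(false)); // prints "merged"
    }
}
```

The sketch only illustrates the mechanism; it does not reflect how the non-interrupting stop discussed in SLING-7811 is actually implemented, and, as said above, avoiding the interrupt alone does not address starting the repository too early.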

If we initialise the repository before all the required
services/initialisers are available, we create and expose an
inconsistent repository service, which leaves Sling unable to start
and the repository in an incorrect state. I am more and more convinced
that we must be 100% sure that all the prerequisites for the
repository service are in place before we start it. The fix in Oak [3]
will help, but I would have preferred a solution at the OSGi level.
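A rough sketch of the "all prerequisites first" idea, in plain Java and assuming nothing about the actual Sling or Oak code: a hypothetical `PrerequisiteGate` creates the repository only once every required service, initialiser or configuration has been registered, and never a second time. In OSGi Declarative Services the configuration part of this is commonly expressed with `ConfigurationPolicy.REQUIRE`; the class name and the prerequisite labels below are illustrative only.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch (not Sling or Oak API): hold back repository creation until every
// required prerequisite has been registered, and create the repository at most once.
final class PrerequisiteGate {
    private final Set<String> required;
    private final Set<String> available = new HashSet<>();
    private final Runnable createRepository;
    private boolean started;

    PrerequisiteGate(Set<String> required, Runnable createRepository) {
        this.required = Collections.unmodifiableSet(new HashSet<>(required));
        this.createRepository = createRepository;
    }

    /** Called whenever a prerequisite becomes available (e.g. a service is bound). */
    synchronized void register(String name) {
        available.add(name);
        if (!started && available.containsAll(required)) {
            started = true;           // never start twice, even if a prerequisite re-registers
            createRepository.run();   // runs under the lock; acceptable for a sketch
        }
    }

    synchronized boolean isStarted() {
        return started;
    }

    public static void main(String[] args) {
        int[] starts = {0};
        PrerequisiteGate gate = new PrerequisiteGate(
                Set.of("SecurityProvider", "DefaultAuthorizableActionProvider config"),
                () -> starts[0]++);
        gate.register("SecurityProvider");
        System.out.println(gate.isStarted()); // prints "false": configuration still missing
        gate.register("DefaultAuthorizableActionProvider config");
        System.out.println(gate.isStarted()); // prints "true"
        gate.register("SecurityProvider");
        System.out.println(starts[0]);        // prints "1"
    }
}
```

A real implementation would also have to decide what happens when a prerequisite goes away again, since a reconfiguration of that kind appears to be what triggered the restart in the log above.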

Thanks,
Robert

[1]: https://issues.apache.org/jira/browse/SLING-7811
[2]: https://issues.apache.org/jira/browse/SLING-9118?focusedCommentId=17090938&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17090938
[3]: https://issues.apache.org/jira/browse/OAK-9047
[4]: https://jackrabbit.apache.org/oak/docs/dos_and_donts.html
