Thanks Robert. We tried ensuring only a single Sling pod was hitting the database at one time with some strange results:
The first time it runs (against an empty database) everything goes well: the database is populated and the pod comes up with no issues. We then bring this pod down and try to bring the exact same one up again, and the original exception pops up again:

29.01.2020 02:58:59.571 *ERROR* [Apache Sling Repository Startup Thread #4] ERROR: Bundle '160' EventDispatcher: Error during dispatch. (org.apache.sling.api.SlingException: Can't create the JCR event listener.)
org.apache.sling.api.SlingException: Can't create the JCR event listener.
...
...
Caused by: javax.jcr.LoginException: Can neither derive user name nor principal names for bundle org.apache.sling.jcr.resource [154] and sub service observation
    at org.apache.sling.jcr.base.AbstractSlingRepository2.loginService(AbstractSlingRepository2.java:387)

I wonder if the Sling pod is leaving the database in an unusable state when being brought down.

Regards,

Carlos

On Thu, Feb 6, 2020 at 4:11 AM Robert Munteanu <romb...@apache.org> wrote:

> On Wed, 2020-02-05 at 21:17 -0500, Carlos Munoz wrote:
> > Hi all,
> >
> > I think I have a theory for our issues here, and it may have to do
> > with the fact that we are running on a heavily containerized
> > environment (kubernetes). I wanted to consult with the community
> > experts to see what you thought.
> >
> > The way our container platform works on an update is that it will
> > try to bring up a new container with sling (and our application)
> > against the same mongo database that an original (and still running)
> > container is running against. Now this works fine when the only
> > thing being updated is our application bundle, but it starts
> > encountering problems when several other bundles and configurations
> > are being updated (some removed, some added, some updated).
> > I *think* the core of the problem here is that the bundles and
> > configurations are all stored in the database itself, and two
> > containers with potentially different bundle versions and
> > configurations are attempting to use it simultaneously.
>
> That is a pretty good guess I'd say :-)
>
> I did see some similar problems when using Sling for development
> purposes on k8s. I never went to production with it, but for my own
> purposes it was enough to ensure that only one Sling pod was starting
> up at a time. Maybe you can try that as well?
>
> A more involved solution would be to use the CompositeNodeStore [1],
> which is designed to separate the storage of /libs and /apps from the
> rest of the repository. So for instance you'd have /libs and /apps
> stored on a local segment store for each pod, and the rest of the
> content in Mongo.
>
> Unfortunately there is very little documentation and no tooling around
> it available, so that makes it a difficult proposition.
>
> Thanks,
> Robert
>
> [1]: https://jackrabbit.apache.org/oak/docs/nodestore/compositens.html
>
> > If I am right, then our core problem to figure out is how to upgrade
> > a database from one sling version to the next.
> >
> > Let me know what you all think.
> >
> > Regards,
> >
> > Carlos
> >
> > On Tue, Feb 4, 2020 at 7:06 AM Carlos Munoz <camu...@redhat.com>
> > wrote:
> >
> > > Thanks Bertrand! I will continue my fact finding mission here :)
> > >
> > > Regards,
> > >
> > > Carlos
> > >
> > > On Tue, Feb 4, 2020 at 4:31 AM Bertrand Delacretaz
> > > <bdelacre...@apache.org> wrote:
> > >
> > > > Hi,
> > > >
> > > > On Sun, Feb 2, 2020 at 4:50 AM Carlos Munoz <camu...@redhat.com>
> > > > wrote:
> > > > > ...do configurations from the repoinit files get installed in
> > > > > a specific order with relation to the artifacts?...
> > > >
> > > > The repoinit configs are applied by a single
> > > > SlingRepositoryInitializer [1] service which is implemented by
> > > > org.apache.sling.jcr.repoinit.impl.RepositoryInitializer [2].
> > > >
> > > > The execution order of the SlingRepositoryInitializer services is
> > > > based on their service rankings [4] and the RepositoryInitializer
> > > > processes its configurations in the order in which they are
> > > > provided by the OSGi framework, sequentially.
> > > >
> > > > All this happens before the SlingRepository service is made
> > > > available [3]
> > > >
> > > > The logs should help understand what's going on but IIRC it all
> > > > happens in a single thread.
> > > >
> > > > -Bertrand
> > > >
> > > > [1] https://sling.apache.org/documentation/bundles/repository-initialization.html
> > > > [2] https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/master/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java
> > > > [3] https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L511
> > > > [4] https://github.com/apache/sling-org-apache-sling-jcr-base/blob/e8fe5e004b5af1802bb2a76dbbb583a437f848ee/src/main/java/org/apache/sling/jcr/base/AbstractSlingRepositoryManager.java#L581
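
For anyone following the thread, the startup sequence Bertrand describes (initializers ordered by service ranking, run one after another, and only then the repository made available) can be modelled roughly as below. This is a plain-Java sketch, not Sling code: InitOrderSketch, RankedInitializer and startRepository are hypothetical names, and running the lowest ranking first is an assumption that should be checked against the AbstractSlingRepositoryManager source linked as [4].

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of the startup sequence; none of these types exist in Sling. */
public class InitOrderSketch {

    /** Stand-in for a SlingRepositoryInitializer service and its service ranking. */
    public static final class RankedInitializer {
        public final String name;
        public final int ranking;

        public RankedInitializer(String name, int ranking) {
            this.name = name;
            this.ranking = ranking;
        }
    }

    /**
     * Runs every initializer sequentially on the calling thread, then records
     * that the repository is registered. Returns the ordered list of events.
     */
    public static List<String> startRepository(List<RankedInitializer> registered) {
        List<RankedInitializer> ordered = new ArrayList<>(registered);
        // ASSUMPTION: lowest ranking runs first. List.sort is stable, so
        // initializers with equal rankings keep their registration order.
        ordered.sort(Comparator.comparingInt((RankedInitializer r) -> r.ranking));

        List<String> events = new ArrayList<>();
        for (RankedInitializer init : ordered) {
            events.add("init:" + init.name);
        }
        // Only after all initializers have finished is the repository exposed.
        events.add("repository-registered");
        return events;
    }

    public static void main(String[] args) {
        List<RankedInitializer> registered = new ArrayList<>();
        registered.add(new RankedInitializer("repoinit", 100));
        registered.add(new RankedInitializer("early-setup", -5));
        System.out.println(startRepository(registered));
    }
}
```

The point of the model is that no consumer of the repository (such as the JCR event listener in the stack trace) can log in before the last initializer has run, so a service user that is missing at that point was most likely not created by any of them.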