Thanks Sebastian. Is there a JIRA for this already?
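
To make sure we are talking about the same thing, what I would hope for is
that the health status also reflects the two private caches, i.e. something
along these lines (just a sketch against the public API, the real fix would
of course live in the loop you pointed at; the cache names are the ones we
see in our logs):

    // cacheManager is the node's EmbeddedCacheManager
    // mirrors what CacheHealthImpl checks ("running and not in degraded
    // mode"), but for the private counter/lock caches as well
    boolean internalCachesHealthy =
          Stream.of("___counter_configuration", "org.infinispan.LOCKS")
                .map(name -> cacheManager.getCache(name).getAdvancedCache())
                .allMatch(cache -> cache.getStatus() == ComponentStatus.RUNNING
                      && cache.getAvailability() == AvailabilityMode.AVAILABLE);
    // a degraded counter/lock cache would then show up in the cluster status

(imports: java.util.stream.Stream, org.infinispan.lifecycle.ComponentStatus,
org.infinispan.partitionhandling.AvailabilityMode)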

2018-03-27 10:03 GMT+02:00 Sebastian Laskawiec <slask...@redhat.com>:

> At the moment, the cluster health status checker enumerates all caches in
> the cache manager [1] and checks whether those caches are running and not
> in degraded mode [2].
>
> I'm not sure how the counter caches have been implemented. One thing is
> for sure - they should be taken into account in this loop [3].
>
> [1] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L22
> [2] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/CacheHealthImpl.java#L25
> [3] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/health/impl/ClusterHealthImpl.java#L23-L24
>
> On Mon, Mar 26, 2018 at 1:59 PM Thomas SEGISMONT <tsegism...@gmail.com>
> wrote:
>
>> 2018-03-26 13:16 GMT+02:00 Pedro Ruivo <pe...@infinispan.org>:
>>
>>> On 23-03-2018 15:06, Thomas SEGISMONT wrote:
>>> > Hi Pedro,
>>> >
>>> > 2018-03-23 13:25 GMT+01:00 Pedro Ruivo <pe...@infinispan.org>:
>>> >
>>> >     Hi Thomas,
>>> >
>>> >     Is the test in question using any counter/lock?
>>> >
>>> > I have seen the problem on a test for counters, on another one for
>>> > locks, as well as on tests using caches only.
>>> > But Vert.x starts the ClusteredLockManager and the CounterManager in
>>> > all cases (even if no lock/counter is created/used).
>>> >
>>> >     I did see similar behavior with the counters in our server test
>>> >     suite. The partition handling makes the cache degraded because
>>> >     nodes are starting and stopping concurrently.
>>> >
>>> > As for me, I was able to observe the problem even when stopping nodes
>>> > one after the other and waiting for the cluster to go back to HEALTHY
>>> > status.
>>> > Is it possible that the status of the counter and lock caches is not
>>> > taken into account in cluster health?
>>>
>>> The counter and lock caches are private. So, they aren't in the cluster
>>> health, nor are their names returned by the getCacheNames() method.
>>
>> Thanks for the details.
>>
>> I'm not concerned with these internal caches not being listed when
>> calling getCacheNames().
>>
>> However, the cluster health status should include their status as well.
>> Cluster status testing is the recommended way to implement readiness
>> checks on Kubernetes, for example.
>>
>> What do you think, Sebastian?
>>
>>> >     I'm not sure if there is any JIRA tracking this. Ryan, Dan, do
>>> >     you know? If there is none, it should be created.
>>> >
>>> >     I improved the counters by making the cache start lazily when you
>>> >     first get or define a counter [1]. This workaround solved the
>>> >     issue for us.
>>> >
>>> >     As a workaround for your test suite, I suggest making sure the
>>> >     caches (___counter_configuration and org.infinispan.LOCKS) have
>>> >     finished their state transfer before stopping the cache managers,
>>> >     by invoking DefaultCacheManager.getCache(*cache-name*) in all the
>>> >     cache managers.
>>> >
>>> >     Sorry for the inconvenience and the delay in replying.
>>> >
>>> > No problem.
>>> >
>>> >     Cheers,
>>> >     Pedro
>>> >
>>> >     [1] https://issues.jboss.org/browse/ISPN-8860
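>>
>> Regarding the getCache() workaround: for our test suite I read that as
>> something like the following (just a sketch to check I understood,
>> assuming the managers are kept in a cacheManagers list):
>>
>>     // touch the internal caches on every manager so that their state
>>     // transfer has finished before we start stopping nodes
>>     for (DefaultCacheManager cm : cacheManagers) {
>>        cm.getCache("___counter_configuration");
>>        cm.getCache("org.infinispan.LOCKS");
>>     }
>>     // ...then stop the managers one by one as before
>>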
>>> >     On 21-03-2018 16:16, Thomas SEGISMONT wrote:
>>> >     > Hi everyone,
>>> >     >
>>> >     > I am working on integrating Infinispan 9.2.Final in
>>> >     > vertx-infinispan.
>>> >     > Before merging I wanted to make sure the test suite passed, but
>>> >     > it doesn't. It's not always the same test involved.
>>> >     >
>>> >     > In the logs, I see a lot of messages like "After merge (or
>>> >     > coordinator change), cache still hasn't recovered a majority of
>>> >     > members and must stay in degraded mode."
>>> >     > The caches involved are "___counter_configuration" and
>>> >     > "org.infinispan.LOCKS".
>>> >     >
>>> >     > Most often it's harmless but, sometimes, I also see this
>>> >     > exception: "ISPN000210: Failed to request state of cache".
>>> >     > Again the cache involved is either "___counter_configuration"
>>> >     > or "org.infinispan.LOCKS".
>>> >     > After this exception, the cache manager is unable to stop. It
>>> >     > blocks in method "terminate" (join on cache future).
>>> >     >
>>> >     > I thought the test suite was too rough (we stop all nodes at
>>> >     > the same time). So I changed it to make sure that:
>>> >     > - nodes start one after the other
>>> >     > - a new node is started only when the previous one indicates
>>> >     >   HEALTHY status
>>> >     > - nodes stop one after the other
>>> >     > - a node is stopped only when it indicates HEALTHY status
>>> >     > Pretty much what we do on Kubernetes for the readiness check,
>>> >     > actually. But it didn't get any better.
>>> >     >
>>> >     > Attached are the logs of such a failing test.
>>> >     >
>>> >     > Note that the Vert.x test itself does not fail, it's only when
>>> >     > closing nodes that we have issues.
>>> >     >
>>> >     > Here's our XML config:
>>> >     > https://github.com/vert-x3/vertx-infinispan/blob/ispn92/src/main/resources/default-infinispan.xml
>>> >     >
>>> >     > Does that ring a bell? Do you need more info?
>>> >     >
>>> >     > Regards,
>>> >     > Thomas
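
FWIW, the HEALTHY check mentioned in my original message boils down to
polling the embedded health API between node starts and stops, roughly like
this (simplified, retry/timeout handling omitted):

    // simplified readiness check from the test suite (same idea as the
    // Kubernetes readiness probe); cacheManager is the node being checked
    HealthStatus status = cacheManager.getHealth()
          .getClusterHealth()
          .getHealthStatus();
    boolean ready = status == HealthStatus.HEALTHY;
    // as discussed above, this verdict only covers the caches returned by
    // getCacheNames(), so ___counter_configuration and org.infinispan.LOCKS
    // are never part of it

(import: org.infinispan.health.HealthStatus)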
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev