[ https://issues.apache.org/jira/browse/GEODE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768480#comment-15768480 ]
Kirk Lund commented on GEODE-2238:
----------------------------------

I closed GEODE-2244 as a duplicate of GEODE-2238.

Below is where initialization of cluster config is async during locator startup. I believe we need to have incoming requests for cluster config determine that cluster config is enabled and then wait on a Future for initialization of cluster config to complete.

The method in InternalLocator is startCache(DistributedSystem):
{code:java}
private void startCache(DistributedSystem ds) {
  GemFireCacheImpl gfc = GemFireCacheImpl.getInstance();
  if (gfc == null) {
    logger.info("Creating cache for locator.");
    this.myCache = new CacheFactory(ds.getProperties()).create();
    gfc = (GemFireCacheImpl) this.myCache;
  } else {
    logger.info("Using existing cache for locator.");
    ((InternalDistributedSystem) ds).handleResourceEvent(ResourceEvent.LOCATOR_START, this);
  }
  startJmxManagerLocationService(gfc);
  startSharedConfigurationService(gfc);
}
{code}
The method startSharedConfigurationService hands off to a thread to load cluster config and use distributed lock service to become primary cluster config source. This was probably made async due to use of distributed lock service to keep startup responsive. Unfortunately, it opens up a race condition window -- if any requests come in before cluster config is ready, the locator will reply saying it doesn't have cluster config.

I think some of our Flaky tests are hitting this race condition.

> Member may fail to receive cluster configuration from locator
> -------------------------------------------------------------
>
>                 Key: GEODE-2238
>                 URL: https://issues.apache.org/jira/browse/GEODE-2238
>             Project: Geode
>          Issue Type: Bug
>          Components: management
>    Affects Versions: 1.0.0-incubating
>            Reporter: Kirk Lund
>            Assignee: Dan Smith
>              Labels: Flaky
>
> LuceneClusterConfigurationDUnitTest.indexWithAnalyzerGetsCreatedUsingClusterConfiguration is failing frequently in precheckin. I'm going to mark it as FlakyTest.
> Below is the stack trace:
> {noformat}
> :geode-lucene:distributedTest
> org.apache.geode.cache.lucene.internal.configuration.LuceneClusterConfigurationDUnitTest > indexWithAnalyzerGetsCreatedUsingClusterConfiguration FAILED
>     org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.cache.lucene.internal.configuration.LuceneClusterConfigurationDUnitTest$$Lambda$29/613305101.run in VM 2 running on Host 3fb23bc375ef with 4 VMs
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:344)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:314)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:259)
>         at org.apache.geode.test.dunit.rules.Member.invoke(Member.java:60)
>         at org.apache.geode.cache.lucene.internal.configuration.LuceneClusterConfigurationDUnitTest.indexWithAnalyzerGetsCreatedUsingClusterConfiguration(LuceneClusterConfigurationDUnitTest.java:102)
>         Caused by:
>         java.lang.AssertionError
>             at org.junit.Assert.fail(Assert.java:86)
>             at org.junit.Assert.assertTrue(Assert.java:41)
>             at org.junit.Assert.assertNotNull(Assert.java:712)
>             at org.junit.Assert.assertNotNull(Assert.java:722)
>             at org.apache.geode.cache.lucene.internal.configuration.LuceneClusterConfigurationDUnitTest.lambda$indexWithAnalyzerGetsCreatedUsingClusterConfiguration$bb17a952$1(LuceneClusterConfigurationDUnitTest.java:105)
> 94 tests completed, 1 failed
> {noformat}
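
For illustration only, the "wait on a Future" idea from the comment above might look roughly like the sketch below. The class ClusterConfigGate, its method names, and the use of a String to stand in for the configuration payload are all hypothetical and are not part of Geode; the sketch assumes the async startup thread can signal completion explicitly and that request handlers can tolerate a bounded wait.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical gate (not a Geode class): requests for cluster config block
 * until the asynchronous initialization has finished, instead of being told
 * that the locator has no cluster config.
 */
class ClusterConfigGate {
  private final boolean enabled;
  // Completed exactly once by the async startup thread.
  private final CompletableFuture<String> ready = new CompletableFuture<>();

  ClusterConfigGate(boolean enabled) {
    this.enabled = enabled;
  }

  /** Called by the startup thread once cluster config has been loaded. */
  void initializationComplete(String config) {
    ready.complete(config);
  }

  /** Called by the startup thread if loading cluster config failed. */
  void initializationFailed(Throwable cause) {
    ready.completeExceptionally(cause);
  }

  /**
   * Request path: an empty result means cluster config is disabled;
   * otherwise the caller waits, up to the timeout, for initialization.
   */
  Optional<String> handleRequest(long timeout, TimeUnit unit)
      throws InterruptedException, ExecutionException, TimeoutException {
    if (!enabled) {
      return Optional.empty();
    }
    return Optional.of(ready.get(timeout, unit));
  }
}
```

With a gate like this, a request that arrives during the startup window would wait for initialization (or time out) rather than receive a "no cluster config" reply. The timeout is an assumption added so that a failed or hung initialization cannot block request threads indefinitely.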