Juan Ramos created GEODE-8248:
---------------------------------

Summary: Member hangs waiting for missing disk-stores after gfsh shutdown
Key: GEODE-8248
URL: https://issues.apache.org/jira/browse/GEODE-8248
Project: Geode
Issue Type: Bug
Components: gfsh, persistence
Reporter: Juan Ramos
Attachments: temporal.zip

Let’s say I have two servers hosting a simple {{REPLICATE_PERSISTENT}} region and I stop both using the {{gfsh shutdown}} command. According to the [documentation|https://geode.apache.org/docs/guide/112/managing/disk_storage/starting_system_with_disk_stores.html], I should be able to start either of the servers without any problems, as both host the most up-to-date data. However, what happens in reality is that the startup hangs with the following:

{noformat}
(1) Executing - start server --name=server1 --locators=localhost[10334] --server-port=40401 --cache-xml-file=/temporal/cache.xml
.........
Region /TestRegion has potentially stale data. It is waiting for another member to recover the latest data.
My persistent id:

  DiskStore ID: 4d1abaf3-677d-4c52-b3f8-681e051f143c
  Name: server1
  Location: /temporal/server1/dataStore

Members with potentially new data:
[
  DiskStore ID: 163dfaf7-a680-4154-a278-8cec40d57d80
  Name: server2
  Location: /temporal/server2/dataStore
]

"main" #1 prio=5 os_prio=31 tid=0x00007f9b28809000 nid=0x1003 in Object.wait() [0x000070000ab04000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.apache.geode.internal.cache.persistence.MembershipChangeListener.waitForChange(MembershipChangeListener.java:62)
    - locked <0x0000000719df55e0> (a org.apache.geode.internal.cache.persistence.MembershipChangeListener)
    at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.waitForMembershipChangeForMissingDiskStores(PersistenceInitialImageAdvisor.java:218)
    at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.getAdvice(PersistenceInitialImageAdvisor.java:118)
    at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:835)
    at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
    at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1196)
    at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1076)
    at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3043)
    at org.apache.geode.pdx.internal.PeerTypeRegistration.initialize(PeerTypeRegistration.java:198)
    at org.apache.geode.pdx.internal.TypeRegistry.initialize(TypeRegistry.java:116)
    at org.apache.geode.internal.cache.GemFireCacheImpl.initializePdxRegistry(GemFireCacheImpl.java:1449)
    - locked <0x00000005c0593168> (a org.apache.geode.internal.cache.GemFireCacheImpl)
    at org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:511)
    at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
    at org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
    at org.apache.geode.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1388)
    at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1208)
    at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
    - locked <0x00000005c016a108> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
    - locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
    at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
    - locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
    at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
    at org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
    at org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
    at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
    at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
    at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
{noformat}

We should either fix the problem, making sure the members fully synchronise their data during the {{shutdown}} process so they don't have to wait on each other, or, if this is the expected behaviour, update the documentation accordingly.

The attached {{zip}} file contains a simple script to reproduce the issue; the only thing that needs to be changed after downloading and uncompressing the file is the {{GEMFIRE}} environment variable.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)