Juan Ramos created GEODE-8248:
---------------------------------

             Summary: Member hangs waiting for missing disk-stores after gfsh shutdown
                 Key: GEODE-8248
                 URL: https://issues.apache.org/jira/browse/GEODE-8248
             Project: Geode
          Issue Type: Bug
          Components: gfsh, persistence
            Reporter: Juan Ramos
         Attachments: temporal.zip

Suppose I have two servers hosting a simple {{REPLICATE_PERSISTENT}} region and I 
stop both with the {{gfsh shutdown}} command.
According to the 
[documentation|https://geode.apache.org/docs/guide/112/managing/disk_storage/starting_system_with_disk_stores.html],
 I should then be able to start either server on its own without any problems, 
because both host the most up-to-date data. In practice, however, the startup 
hangs with the following message and thread dump:
{noformat}
(1) Executing - start server --name=server1 --locators=localhost[10334] --server-port=40401 --cache-xml-file=/temporal/cache.xml

.........
Region /TestRegion has potentially stale data. It is waiting for another member to recover the latest data.
My persistent id:

  DiskStore ID: 4d1abaf3-677d-4c52-b3f8-681e051f143c
  Name: server1
  Location: /temporal/server1/dataStore

Members with potentially new data:
[
  DiskStore ID: 163dfaf7-a680-4154-a278-8cec40d57d80
  Name: server2
  Location: /temporal/server2/dataStore
]


"main" #1 prio=5 os_prio=31 tid=0x00007f9b28809000 nid=0x1003 in Object.wait() [0x000070000ab04000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.geode.internal.cache.persistence.MembershipChangeListener.waitForChange(MembershipChangeListener.java:62)
        - locked <0x0000000719df55e0> (a org.apache.geode.internal.cache.persistence.MembershipChangeListener)
        at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.waitForMembershipChangeForMissingDiskStores(PersistenceInitialImageAdvisor.java:218)
        at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.getAdvice(PersistenceInitialImageAdvisor.java:118)
        at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:835)
        at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
        at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1196)
        at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1076)
        at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3043)
        at org.apache.geode.pdx.internal.PeerTypeRegistration.initialize(PeerTypeRegistration.java:198)
        at org.apache.geode.pdx.internal.TypeRegistry.initialize(TypeRegistry.java:116)
        at org.apache.geode.internal.cache.GemFireCacheImpl.initializePdxRegistry(GemFireCacheImpl.java:1449)
        - locked <0x00000005c0593168> (a org.apache.geode.internal.cache.GemFireCacheImpl)
        at org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:511)
        at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
        at org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
        at org.apache.geode.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1388)
        at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1208)
        at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
        - locked <0x00000005c016a108> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
        - locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
        at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
        - locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
        at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
        at org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
        at org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
        at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
        at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
        at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
{noformat}
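
For readers unfamiliar with this kind of thread dump, here is a minimal sketch of the general pattern the top frames suggest: a startup thread repeatedly doing a timed wait on a listener's monitor until some membership change is signalled. This is only an illustration under stated assumptions, not Geode's actual code. The classes {{ToyMembershipListener}} and {{ToyStartup}}, their methods, and the {{maxAttempts}} bound are all hypothetical and exist only so the example terminates.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified stand-in; NOT Geode's MembershipChangeListener. */
final class ToyMembershipListener {
    private boolean changed; // guarded by "this"

    /** Signals that membership changed (for example, a peer came online). */
    synchronized void membershipChanged() {
        changed = true;
        notifyAll();
    }

    /**
     * Blocks for at most timeoutMillis waiting for a membership change.
     * Returns true if a change was signalled, false if the wait timed out.
     */
    synchronized boolean waitForChange(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!changed) {
            long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMillis <= 0) {
                return false; // timed out without any change
            }
            // While parked here, a thread dump reports the thread as
            // TIMED_WAITING (on object monitor) inside Object.wait().
            wait(remainingMillis);
        }
        changed = false; // consume the signal
        return true;
    }
}

/** Hypothetical startup loop; the bound on attempts is an assumption for the demo. */
final class ToyStartup {
    /** Returns the attempt number on which a change arrived, or -1 if none did. */
    static int waitForMissingDiskStores(ToyMembershipListener listener,
                                        long pollMillis,
                                        int maxAttempts) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (listener.waitForChange(pollMillis)) {
                return attempt;
            }
            // A real member would presumably re-check state and log a
            // warning here before waiting again (assumption).
        }
        return -1;
    }
}
```

In this toy model, if nothing ever calls {{membershipChanged()}}, every timed wait expires and the loop simply waits again. That matches the symptom reported above: with server2 still offline, server1 has no event to wake it up, so it presumably keeps waiting until server2 starts or the missing disk store is revoked (for example with gfsh's {{revoke missing-disk-store}} command).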

We should either fix the problem, making sure the members fully synchronise 
their data during the {{shutdown}} process so they don't have to wait on each 
other, or, if this is the expected behaviour, update the documentation 
accordingly.
The attached {{zip}} file contains a simple script to reproduce the issue. 
After downloading and uncompressing the file, the only thing that needs to be 
changed is the {{GEMFIRE}} environment variable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
