[ https://issues.apache.org/jira/browse/GEODE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Swapnil Bawaskar closed GEODE-3003.
-----------------------------------

> Geode doesn't start after cluster restart when using cluster-configuration
> --------------------------------------------------------------------------
>
>                 Key: GEODE-3003
>                 URL: https://issues.apache.org/jira/browse/GEODE-3003
>             Project: Geode
>          Issue Type: Bug
>          Components: configuration, membership
>            Reporter: Anton Mironenko
>            Assignee: Kenneth Howe
>            Priority: Major
>             Fix For: 1.5.0
>
>         Attachments: 20170522-geode-klyazma.zip, 20170522-geode-vyazma.zip,
> 20170608-host1-locator0.zip, 20170608-host2-locator0.zip, geode-host1.zip,
> geode-host2.zip, readme.txt
>
>
> There is a two-host Geode cluster with a locator and a server on each host.
> The first start of all nodes goes well.
> Then all nodes are gracefully stopped (kill [locator-PID] [server-PID]).
> The second start goes wrong: the locator on the first host never joins the
> rest of the cluster, and the locator log shows the error:
> "Region /_ConfigurationRegion has potentially stale data. It is waiting for
> another member to recover the latest data."
> Sometimes (about once in 5 starts) one of the servers also shuts down right
> after start with the error
> "org.apache.geode.GemFireConfigException: cluster configuration service not
> available".
> This bug started appearing only when we moved to Geode 1.1.1, and it
> completely blocks us.
> GemFire 8.2.1 did not have this bug.
> This is very easy to reproduce.
>
> Test preparation:
> ---------------------
> Two zip files are attached: "geode-host1.zip" and "geode-host2.zip".
> 1) Unzip "geode-host1.zip" into some folder on your first host.
> 2) In start-locator.sh, change the locator IPs in
> "--locators=10.50.3.38[20236],10.50.3.14[20236]" to the IPs of your host1
> and host2.
> 3) In start-server.sh, change the locator IPs in
> "locators=10.50.3.38[20236],10.50.3.14[20236]" to the IPs of your host1
> and host2.
> 4) Repeat steps 1)-3) for host2 with "geode-host2.zip". The folder you unzip
> it into must have the same path as on the first host.
>
> Test running:
> ---------------
> 1) rm -rf {locator0,server1}
> 2) Run ./start-locator.sh; ./start-server.sh on host1, then on host2. Check
> that this cluster start is successful.
> 3) Kill the locator and server processes, first on host1, then on host2:
> kill [locator-PID] [server-PID]
> 4) Run
> ./start-locator.sh; ./start-server.sh
> on host1, then on host2. Make sure the commands on the two hosts are started
> less than 1 second apart!
> 5) Use gfsh to see that there are actually two clusters, "host1-locator" and
> "host1-server, host2-locator, host2-server", instead of one. Sometimes
> "host1-server" is missing as well, because it shut down with the error
> "Region /_ConfigurationRegion has potentially stale data. It is waiting for
> another member to recover the latest data.".

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
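The manual preparation steps 2)-3) and the stop in step 3) of "Test running" can be scripted. The following is a minimal, hypothetical sketch and is not part of the attached geode-host1.zip/geode-host2.zip. It assumes that the scripts contain the literal example IPs 10.50.3.38 and 10.50.3.14, and that the locator and server run from the locator0 and server1 folders (as the rm -rf in step 1 suggests) and were launched with gfsh, which normally writes vf.gf.locator.pid and vf.gf.server.pid into those folders. The helper names set_locator_ips and stop_members are made up for illustration.

```shell
# Hypothetical helpers for the reproduction steps; assumptions:
#  - start-locator.sh / start-server.sh contain the literal IPs 10.50.3.38
#    and 10.50.3.14, each followed by "[<port>]";
#  - members run from ./locator0 and ./server1 and were started by gfsh,
#    which writes vf.gf.locator.pid and vf.gf.server.pid there.

# Test preparation, steps 2) and 3): point both scripts at your own hosts.
# Usage: set_locator_ips HOST1_IP HOST2_IP [DIR]
set_locator_ips() {
  local host1_ip="$1" host2_ip="$2" dir="${3:-.}" f
  for f in "$dir/start-locator.sh" "$dir/start-server.sh"; do
    # Replace via temporary markers so swapped or overlapping IPs cannot
    # be rewritten twice. -i.bak edits in place (GNU and BSD sed) and
    # keeps a backup copy of each script.
    sed -i.bak \
      -e "s/10\.50\.3\.38\[/@HOST1@[/g" \
      -e "s/10\.50\.3\.14\[/@HOST2@[/g" \
      -e "s/@HOST1@/${host1_ip}/g" \
      -e "s/@HOST2@/${host2_ip}/g" \
      "$f" || return 1
  done
}

# Test running, step 3): plain "kill" sends SIGTERM, the same signal the
# report uses for its graceful stop.
# Usage: stop_members [DIR]
stop_members() {
  local dir="${1:-.}" locator_pid server_pid
  locator_pid=$(cat "$dir/locator0/vf.gf.locator.pid") || return 1
  server_pid=$(cat "$dir/server1/vf.gf.server.pid") || return 1
  kill "$locator_pid" "$server_pid"
}
```

For example, run set_locator_ips 192.168.0.1 192.168.0.2 in the unzipped folder on each host during preparation, and stop_members in that folder instead of looking up the PIDs by hand. If the attached scripts start the members differently, the PID file locations above will not apply.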