Anton Mironenko created GEODE-3003:
--------------------------------------

             Summary: Geode doesn't start after cluster restart when using cluster-configuration
                 Key: GEODE-3003
                 URL: https://issues.apache.org/jira/browse/GEODE-3003
             Project: Geode
          Issue Type: Bug
          Components: configuration
            Reporter: Anton Mironenko


There is a two-host Geode cluster with a locator and a server on each host.
The first start of all nodes succeeds.
Then all nodes are stopped gracefully (kill [locator-PID] [server-PID]).
The second start fails: the locator on the first host never joins the rest of
the cluster, and its log shows the error:
"Region /_ConfigurationRegion has potentially stale data. It is waiting for
another member to recover the latest data."

In addition, sometimes (roughly once in 5 starts) one of the servers shuts down
right after starting with the error
"org.apache.geode.GemFireConfigException: cluster configuration service not available".

This bug started appearing only after we moved to Geode 1.1.1, and it
completely blocks us. GemFire 8.2.1 did not have this bug.

This is very easy to reproduce.

Test preparation:
---------------------
Two zip files are attached: "geode-host1.zip" and "geode-host2.zip".
1) unzip "geode-host1.zip" into a folder on your first host
2) in start-locator.sh, change the locator IPs in
"--locators=10.50.3.38[20236],10.50.3.14[20236]"
to the addresses of your host1 and host2
3) in start-server.sh, change the locator IPs in
"locators=10.50.3.38[20236],10.50.3.14[20236]"
to the addresses of your host1 and host2
4) repeat steps 1)-3) on host2 with "geode-host2.zip"; unzip it into a folder
with the same path as on the first host
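Steps 2) and 3) can also be done in one go on each host. A minimal sketch, assuming both scripts contain the literal list "10.50.3.38[20236],10.50.3.14[20236]" exactly as quoted in those steps and that both locators keep port 20236; set_locators is a hypothetical helper, not part of the attached archives:

```shell
# Hypothetical helper (not part of the attached zips): replace the sample
# locator list in start-locator.sh and start-server.sh with your own hosts.
# Assumes both scripts contain the literal string
#   10.50.3.38[20236],10.50.3.14[20236]
# and that both locators keep listening on port 20236.

# Sample list from the issue, escaped for a sed basic regular expression.
SAMPLE_LOCATORS='10\.50\.3\.38\[20236\],10\.50\.3\.14\[20236\]'

set_locators() {
  # $1 = IP of host1, $2 = IP of host2
  if [ "$#" -ne 2 ]; then
    echo "usage: set_locators HOST1_IP HOST2_IP" >&2
    return 2
  fi
  new_locators="$1[20236],$2[20236]"
  for f in start-locator.sh start-server.sh; do
    # Edit via a temp file (sed -i is not portable), then copy the result
    # back with cat so the script keeps its executable permission.
    sed "s/${SAMPLE_LOCATORS}/${new_locators}/g" "$f" > "$f.tmp" &&
      cat "$f.tmp" > "$f" &&
      rm -f "$f.tmp" || return 1
  done
}
```

Source the snippet in the unzipped folder on each host and call, for example, set_locators 10.0.0.1 10.0.0.2 with your own host1 and host2 addresses, in the same order on both hosts.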

Test running:
---------------
1) rm -rf {locator0,server1}
2) run ./start-locator.sh; ./start-server.sh on both hosts and check that the
cluster starts successfully.
3) kill the locator and server processes on both hosts:
kill [locator-PID] [server-PID]
4) run ./start-locator.sh; ./start-server.sh on both hosts
5) observe that there are actually two clusters instead of one: "host1-locator"
and "host1-server, host2-locator, host2-server". Sometimes "host1-server" is
missing altogether because it shut down with the error
"Region /_ConfigurationRegion has potentially stale data. It is waiting for
another member to recover the latest data."
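Step 3) can be scripted as well. A minimal sketch, assuming the start scripts use ./locator0 and ./server1 as the member working directories (as the cleanup in step 1) suggests) and that each member wrote its PID to the file Geode's launchers normally create there (vf.gf.locator.pid and vf.gf.server.pid); stop_members is a hypothetical helper. It sends the same default SIGTERM as the plain kill command in step 3):

```shell
# Hypothetical helper: stop this host's locator and server with SIGTERM,
# like "kill [locator-PID] [server-PID]" in step 3).
# Assumes the members run in ./locator0 and ./server1 and that their PIDs
# are in vf.gf.locator.pid and vf.gf.server.pid inside those directories.
stop_members() {
  for pidfile in locator0/vf.gf.locator.pid server1/vf.gf.server.pid; do
    if [ ! -f "$pidfile" ]; then
      echo "no pid file at $pidfile, skipping" >&2
      continue
    fi
    pid=$(cat "$pidfile")
    echo "sending SIGTERM to pid $pid ($pidfile)"
    # kill only delivers the signal; the JVM may still be shutting down
    # when this returns.
    kill "$pid" || echo "could not signal pid $pid" >&2
  done
}
```

Because kill returns as soon as the signal is delivered, it may be worth confirming with ps that both JVMs have exited on both hosts before running step 4).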



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
