I’d like to propose a functional change to cache creation when a cache server 
is created via a cache.xml file. This proposal originated from work on 
GEODE-1128 <https://issues.apache.org/jira/browse/GEODE-1128> dealing with 
missing colocated regions. The change is to fail cache creation if there are 
missing colocated regions in the cache.xml that will prevent recovery of 
persistent partitioned regions (PRs).

Discussion:
When persistent PRs are colocated, the parent region is created first, but 
persistent data recovery isn’t done until all the colocated regions have been 
created. Currently, if a child region is not created, cache creation succeeds 
but the persistent data is not recovered. This is the condition reported in 
the Jira ticket.

When caches and regions are created via the APIs, or interactively with gfsh, 
the cache is created, then the parent region(s), then the child region(s). 
There will always be an unknown delay between each of these steps. The parent 
region creation succeeds, but internally Geode does not know when (or if) the 
child regions will be created. Normally the child regions are created after a 
short period and recovery proceeds, so the parent region having unrecovered 
data is a transitory state. If the child region is never created, the parent 
region data will not be recovered. In this case a warning can be logged if the 
missing child regions aren’t created within a reasonable time.

However, when the cache creation is done via a cache.xml file, regions are 
created as part of the cache creation. In this case it’s known fairly quickly 
that there’s a misconfiguration that will prevent persistent PR recovery. The 
cache creation can fail immediately, alerting the user to the 
misconfiguration.
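
To make the proposed check concrete, here is a rough sketch in plain Java. It is 
not Geode code and uses no Geode APIs. The class name, the method names, and the 
idea that the parent's expected child regions are available as a set of names 
are all assumptions made for illustration. It only shows the fail-fast decision 
once every region in the cache.xml has been processed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposed check; none of these names exist in Geode.
final class ColocationCheck {

    // Returns the expected child regions that were not created, sorted by name.
    static List<String> findMissing(Set<String> expectedChildren,
                                    Set<String> createdRegions) {
        List<String> missing = new ArrayList<>();
        for (String child : expectedChildren) {
            if (!createdRegions.contains(child)) {
                missing.add(child);
            }
        }
        Collections.sort(missing);
        return missing;
    }

    // cache.xml case: every region is known once the file has been processed,
    // so a missing child is a misconfiguration and creation fails at once.
    static void verifyAfterXmlCreation(String parent,
                                       Set<String> expectedChildren,
                                       Set<String> createdRegions) {
        List<String> missing = findMissing(expectedChildren, createdRegions);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Region " + parent
                + " is missing colocated regions " + missing
                + "; persistent data cannot be recovered");
        }
    }

    public static void main(String[] args) {
        // Assumed example: "customers" expects children "orders" and "shipments",
        // but the cache.xml only defines "customers" and "orders".
        try {
            verifyAfterXmlCreation("customers",
                Set.of("orders", "shipments"),
                Set.of("customers", "orders"));
        } catch (IllegalStateException e) {
            System.out.println("Cache creation failed: " + e.getMessage());
        }
    }
}
```

In the API and gfsh case the same findMissing result would only drive a logged 
warning after a timeout, since the child regions may still be on their way.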

Ken
