I’d like to propose a functional change to cache creation when a cache server is created via a cache.xml file. This proposal originated from work on GEODE-1128 <https://issues.apache.org/jira/browse/GEODE-1128> dealing with missing colocated regions. The change is to fail cache creation if there are missing colocated regions in the cache.xml that will prevent persistent PR recovery.
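As a rough illustration of the proposed check, here is a minimal sketch in plain Java. It is not Geode code: the class, the method names, and the idea that the child regions previously colocated with a parent are available as a simple map are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch only; none of these names exist in Geode.
public class ColocationCheck {

    /**
     * Returns the child regions that are expected to be colocated with a
     * declared parent region but are not declared in cache.xml.
     *
     * expectedChildren: assumed map of parent region name to the names of
     *                   the child regions known to be colocated with it.
     * declaredRegions:  names of all regions declared in cache.xml.
     */
    public static Set<String> findMissingChildren(
            Map<String, Set<String>> expectedChildren,
            Set<String> declaredRegions) {
        Set<String> missing = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : expectedChildren.entrySet()) {
            // Only parents that are actually declared are checked here.
            if (!declaredRegions.contains(entry.getKey())) {
                continue;
            }
            for (String child : entry.getValue()) {
                if (!declaredRegions.contains(child)) {
                    missing.add(child);
                }
            }
        }
        return missing;
    }

    /** Fails fast, in the spirit of the proposal, if any child is missing. */
    public static void validateOrFail(
            Map<String, Set<String>> expectedChildren,
            Set<String> declaredRegions) {
        Set<String> missing = findMissingChildren(expectedChildren, declaredRegions);
        if (!missing.isEmpty()) {
            throw new IllegalStateException(
                "Cache creation failed: missing colocated regions " + missing
                + " would prevent persistent PR recovery");
        }
    }

    public static void main(String[] args) {
        // Made-up region names, purely for illustration.
        Map<String, Set<String>> expected =
            Map.of("customers", Set.of("orders", "shipments"));
        Set<String> declared = Set.of("customers", "orders");
        try {
            validateOrFail(expected, declared);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that with a cache.xml the full set of declared regions is known in one place, so the check can run once and fail immediately rather than wait on a timeout.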
Discussion:

When persistent PRs are colocated, the parent region is created first, but persistent data recovery isn’t done until all the colocated regions have been created. Currently, if a child region is not created, cache creation succeeds but the persistent data is not recovered. This is the condition reported in the Jira ticket.

When caches and regions are created via the APIs, or interactively with gfsh, the cache is created, then the parent region(s), then the child region(s). There will always be an unknown delay between each of these steps. The parent region creation succeeds, but internally Geode does not know when (or if) the child regions will be created. Normally the child regions are created after a short period and recovery proceeds, so the parent region having unrecovered data is a transitory state. If a child region is never created, the parent region data will not be recovered. In this case a warning can be logged if the missing child regions aren’t created within a reasonable time.

However, when cache creation is done via a cache.xml file, regions are created as part of the cache creation. In this case it’s known fairly quickly that there’s a misconfiguration that will prevent persistent PR recovery. Cache creation can then fail immediately, alerting the user to the misconfiguration.

Ken