I'm generally in favor of this idea. One place you might get in trouble
with adding this check is that we have an outstanding bug with gateway
sender manual start - GEODE-1117. The issue is that even though you have
everything defined in cache.xml, if you have manual-start set to true on
the gateway it will not create the colocated regions until you call start.
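
To make the concern concrete, here is a rough sketch of the kind of check
being proposed. It is not Geode code: ColocationCheck,
missingColocatedRegions and failIfMissing are hypothetical names, and the
two sets stand in for whatever Geode would actually know from persisted
metadata and from cache.xml processing.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch only; none of these names exist in Geode.
class ColocationCheck {

    // Colocated regions that persisted metadata expects (assumed input)
    // but that were not created while processing cache.xml (assumed input).
    static Set<String> missingColocatedRegions(Set<String> expected,
                                               Set<String> created) {
        Set<String> missing = new TreeSet<>(expected);
        missing.removeAll(created);
        return missing;
    }

    // Fail cache creation right away if any colocated region is missing.
    static void failIfMissing(Set<String> expected, Set<String> created) {
        Set<String> missing = missingColocatedRegions(expected, created);
        if (!missing.isEmpty()) {
            throw new IllegalStateException(
                "Missing colocated regions: " + missing);
        }
    }
}
```

With manual-start set to true, a colocated region that depends on the
gateway sender would not be in the created set when the check runs, so a
check like this would reject a cache.xml that is otherwise complete.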

I'm not quite sure what the use case is for manual start. The combination
of manual start with a persistent queue seems especially strange, because
manual-start means you will lose some events. Does anyone know why we have
that option?

> The right thing to do is recover what you can, when you can. Not to make
> parent recovery dependant on its children.
>

I personally think Geode is way too flexible in how people can recover
persistent regions. That leads to a lot of complicated code, complicated
configuration, and these sorts of issues which shouldn't even be possible.
Rather than adding more flexibility, I'd like to see us move to a model
where when you start a member with persistent data it just recovers all of
the regions without the user calling create region or anything. I think
that would reduce the complexity for both Geode and the user.

-Dan

On Mon, Jul 25, 2016 at 10:13 AM, Michael Stolz <mst...@pivotal.io> wrote:

> What about the case where the parent region is created via cache.xml and
> the child regions are created dynamically? I believe that could be a valid
> case. The right thing to do is recover what you can, when you can. Not to
> make parent recovery dependant on its children.
>
> --
> Mike Stolz
> Principal Engineer - Gemfire Product Manager
> Mobile: 631-835-4771
> On Jul 25, 2016 1:06 PM, "Kenneth Howe" <kh...@pivotal.io> wrote:
>
> > I’d like to propose a functional change to cache creation when a cache
> > server is created via a cache.xml file. This proposal originated from work
> > on GEODE-1128 <https://issues.apache.org/jira/browse/GEODE-1128> dealing
> > with missing colocated regions. The change is to fail cache creation if
> > there are missing colocated regions in the cache.xml that will prevent
> > persistent PR recovery.
> >
> > Discussion:
> > When persistent PRs are colocated, the parent region is created first, but
> > persistent data recovery isn’t done until all the colocated regions have
> > been created. Currently, if a child region is not created, the cache
> > creation will succeed but persistent data is not recovered. This is the
> > condition reported in the Jira ticket.
> >
> > When caches and regions are created via the APIs, or interactively with
> > gfsh, the cache is created, then the parent region(s), then the child
> > region(s). There will always be an unknown delay between each of these
> > steps. The parent region creation succeeds, but internally Geode does not
> > know when (or if) the child regions will be created. Normally the child
> > regions are created after a short period and recovery proceeds, so the
> > parent region having unrecovered data is a transitory state. If the child
> > region is not created, the parent region data will not be recovered. In
> > this case a warning can be logged if the missing child regions aren’t
> > created within a reasonable time.
> >
> > However, when the cache creation is done via a cache.xml file, regions are
> > created as part of the cache creation. In this case it’s known fairly
> > quickly that there’s a misconfiguration that will prevent persistent PR
> > recovery. The cache creation can be failed immediately, alerting the user
> > to the misconfiguration.
> >
> > Ken
> >
> >
>
