Re: GridDhtInvalidPartitionException takes the cluster down

Andrey Gura Mon, 25 Mar 2019 05:52:35 -0700

Failure handlers were introduced to prevent the cluster from hanging;
they kill the affected nodes instead.

If a critical worker was terminated by GridDhtInvalidPartitionException,
then your node is unable to work anymore.

An unexpected cluster shutdown, with the reasons that failure handlers
write to the logs, is better than hanging. So the answer is NO. We mustn't
disable failure handlers.

On Mon, Mar 25, 2019 at 2:47 PM Roman Shtykh <[email protected]> wrote:
>
> If it sticks to the behavior we had before introducing the failure handler, I
> think it's better to have it disabled instead of killing the whole cluster, as
> in my case, and create a parent issue for those ten bugs. Pavel, thanks for
> the suggestion!
>
>
>
> On Monday, March 25, 2019, 7:07:20 p.m. GMT+9, Nikolay Izhikov <[email protected]> wrote:
>
>  Guys.
>
> We should fix the SYSTEM_WORKER_TERMINATION once and for all.
> It seems we have had ten or more "cluster shutdown" bugs in this subsystem
> since it was introduced.
>
> Should we disable it by default in 2.7.5?
>
>
> Mon, Mar 25, 2019 at 13:04, Pavel Kovalenko <[email protected]>:
>
> > Hi Roman,
> >
> > I think this InvalidPartition case can simply be handled
> > in the GridCacheTtlManager.expire method.
> > As a workaround, a custom FailureHandler can be configured that will not stop
> > a node when such an exception is thrown.
> >
> > Mon, Mar 25, 2019 at 08:38, Roman Shtykh <[email protected]>:
> >
> > > Igniters,
> > >
> > > Restarting a node while injecting data and having it expire results in
> > > GridDhtInvalidPartitionException, which terminates nodes with
> > > SYSTEM_WORKER_TERMINATION one by one, taking the whole cluster down. This is
> > > really bad, and I didn't find a way to save the cluster from disappearing.
> > > I created a JIRA issue https://issues.apache.org/jira/browse/IGNITE-11620
> > > with a test case. Any clues how to fix this inconsistency when rebalancing?
> > >
> > > -- Roman
> > >
> >
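
For readers of the archive: the workaround Pavel suggests in the quoted thread (a custom FailureHandler that does not stop the node for this one exception) comes down to a check over the failure's cause chain. The sketch below shows only that check, in plain Java with no Ignite dependency. The class InvalidPartitionFilter and its methods are hypothetical names made up for illustration, and the exception is matched by simple class name only so that the sketch compiles on its own. It assumes that Ignite's FailureHandler.onFailure(Ignite, FailureContext) returns true when the node should be invalidated and false otherwise, and that the failure's Throwable is available from FailureContext.error().

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * Sketch only: decides whether a failure should be tolerated instead of stopping the node.
 * Nothing in this class is Ignite API; the class and method names are hypothetical.
 */
final class InvalidPartitionFilter {
    /** Simple class name of the exception discussed in this thread. */
    static final String INVALID_PARTITION = "GridDhtInvalidPartitionException";

    private InvalidPartitionFilter() {
        // Static helpers only.
    }

    /** Returns true if err, or any throwable in its cause chain, has the given simple class name. */
    static boolean hasCause(Throwable err, String simpleName) {
        // Identity-based set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());

        for (Throwable t = err; t != null && seen.add(t); t = t.getCause()) {
            if (t.getClass().getSimpleName().equals(simpleName))
                return true;
        }

        return false;
    }

    /**
     * Mirrors the boolean contract assumed for FailureHandler.onFailure:
     * true means "stop the node", false means "keep it running".
     */
    static boolean shouldStopNode(Throwable err) {
        return !hasCause(err, INVALID_PARTITION);
    }
}
```

In a real handler, one would presumably call something like shouldStopNode(failureCtx.error()) from onFailure, delegate to a stock handler such as StopNodeOrHaltFailureHandler when it returns true, and register the handler with IgniteConfiguration.setFailureHandler. Andrey's caveat still applies: once a critical worker has terminated, the node may not be able to do useful work, so this is a stopgap at best, not a fix for IGNITE-11620.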
