Re: GridDhtInvalidPartitionException takes the cluster down

Roman Shtykh Mon, 25 Mar 2019 04:51:59 -0700

If it falls back to the behavior we had before the failure handler was introduced, I think
it's better to have it disabled than to have it kill the whole cluster, as happened in my
case, and to create a parent issue for those ten bugs. Pavel, thanks for the
suggestion!


 

    On Monday, March 25, 2019, 7:07:20 p.m. GMT+9, Nikolay Izhikov <[email protected]> wrote:
 
 Guys.

We should fix the SYSTEM_WORKER_TERMINATION handling once and for all.
It seems we have had ten or more "cluster shutdown" bugs in this subsystem
since it was introduced.

Should we disable it by default in 2.7.5?
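A narrower option than disabling the handling by default is the custom FailureHandler workaround mentioned in this thread: keep stopping the node on every failure except a SYSTEM_WORKER_TERMINATION caused by GridDhtInvalidPartitionException. The sketch below shows only that decision logic. To stay runnable without Ignite on the classpath, it uses hypothetical stand-ins (the `FailureKind` enum in place of Ignite's `FailureType`, and a local `GridDhtInvalidPartitionException` class); the names `InvalidPartitionTolerance`, `causedByInvalidPartition` and `shouldStopNode` are illustrative, not Ignite API.

```java
// Hypothetical stand-in for org.apache.ignite.failure.FailureType (subset only),
// so this sketch compiles without Ignite on the classpath.
enum FailureKind { CRITICAL_ERROR, SYSTEM_WORKER_TERMINATION, SYSTEM_WORKER_BLOCKED }

// Hypothetical stand-in for Ignite's internal exception, used only for the demo.
class GridDhtInvalidPartitionException extends RuntimeException {
    GridDhtInvalidPartitionException(int part) {
        super("Partition is not owned locally: " + part);
    }
}

final class InvalidPartitionTolerance {
    // Matched by simple name because the real class is internal to Ignite
    // and its package is not part of the public API.
    static final String INVALID_PARTITION = "GridDhtInvalidPartitionException";

    // Guards against pathological cyclic cause chains.
    private static final int MAX_CAUSE_DEPTH = 32;

    /** True if err or any of its causes is the invalid-partition exception. */
    static boolean causedByInvalidPartition(Throwable err) {
        Throwable t = err;
        for (int depth = 0; t != null && depth < MAX_CAUSE_DEPTH; depth++, t = t.getCause()) {
            if (t.getClass().getSimpleName().equals(INVALID_PARTITION))
                return true;
        }
        return false;
    }

    /**
     * Mirrors the boolean contract of FailureHandler.onFailure: true means the
     * node should be invalidated (stopped), false means it stays up.
     */
    static boolean shouldStopNode(FailureKind kind, Throwable err) {
        boolean tolerated = kind == FailureKind.SYSTEM_WORKER_TERMINATION
            && causedByInvalidPartition(err);
        return !tolerated;
    }

    public static void main(String[] args) {
        Throwable wrapped = new IllegalStateException("worker died",
            new GridDhtInvalidPartitionException(42));
        System.out.println(shouldStopNode(FailureKind.SYSTEM_WORKER_TERMINATION, wrapped)); // false
        System.out.println(shouldStopNode(FailureKind.CRITICAL_ERROR, wrapped)); // true
        System.out.println(shouldStopNode(FailureKind.SYSTEM_WORKER_TERMINATION,
            new RuntimeException("other"))); // true
    }
}
```

With Ignite on the classpath, the same check would presumably sit inside an implementation of `org.apache.ignite.failure.FailureHandler`: its `onFailure(Ignite ignite, FailureContext failureCtx)` returns whether the node must be invalidated, `failureCtx.type()` and `failureCtx.error()` supply the two inputs, other failures can be delegated to `StopNodeOrHaltFailureHandler`, and the handler is registered via `IgniteConfiguration.setFailureHandler(...)`. Keeping the node alive this way presumably does not restart the worker that terminated, so it is a stopgap rather than a fix.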


On Mon, 25 Mar 2019 at 13:04, Pavel Kovalenko <[email protected]> wrote:

> Hi Roman,
>
> I think this InvalidPartition case can simply be handled
> in the GridCacheTtlManager.expire method.
> As a workaround, a custom FailureHandler can be configured that does not stop
> the node when such an exception is thrown.
>
> On Mon, 25 Mar 2019 at 08:38, Roman Shtykh <[email protected]> wrote:
>
> > Igniters,
> >
> > Restarting a node while injecting data and letting it expire results in a
> > GridDhtInvalidPartitionException, which terminates nodes with
> > SYSTEM_WORKER_TERMINATION one by one, taking the whole cluster down. This is
> > really bad, and I didn't find a way to save the cluster from disappearing.
> > I created a JIRA issue https://issues.apache.org/jira/browse/IGNITE-11620
> > with a test case. Any clues on how to fix this inconsistency during
> > rebalancing?
> >
> > -- Roman
> >
>
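Pavel's other suggestion, handling the InvalidPartition case inside GridCacheTtlManager.expire, amounts to catching the exception per entry instead of letting it escape the cleanup worker. The following is a generic sketch of that pattern, not Ignite's actual expire code: `TtlExpireSketch`, `PendingEntry`, `expireOne` and `InvalidPartitionException` (standing in for GridDhtInvalidPartitionException) are hypothetical, and partition ownership is modelled as a plain `IntPredicate`. It also assumes that skipping an entry is safe because the partition's new owner expires it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class TtlExpireSketch {
    // Hypothetical stand-in for GridDhtInvalidPartitionException.
    static final class InvalidPartitionException extends RuntimeException {
        InvalidPartitionException(int part) {
            super("Partition is not owned locally: " + part);
        }
    }

    // Hypothetical pending-expiry record: a key and the partition it maps to.
    static final class PendingEntry {
        final String key;
        final int part;

        PendingEntry(String key, int part) {
            this.key = key;
            this.part = part;
        }
    }

    /**
     * Expires pending entries, skipping any whose partition is no longer owned
     * locally instead of letting the exception escape and kill the worker.
     * Returns the keys that were actually expired.
     */
    static List<String> expire(List<PendingEntry> pending, IntPredicate ownsPartition) {
        List<String> expired = new ArrayList<>();
        for (PendingEntry e : pending) {
            try {
                expireOne(e, ownsPartition);
                expired.add(e.key);
            } catch (InvalidPartitionException ex) {
                // Partition moved away (e.g. during rebalancing): skip the entry.
                // Assumption: the partition's new owner will expire it.
            }
        }
        return expired;
    }

    // Throws if the entry's partition is not owned, modelling the real failure.
    private static void expireOne(PendingEntry e, IntPredicate ownsPartition) {
        if (!ownsPartition.test(e.part))
            throw new InvalidPartitionException(e.part);
        // A real implementation would remove the entry from the cache here.
    }

    public static void main(String[] args) {
        List<PendingEntry> pending = List.of(
            new PendingEntry("a", 1), new PendingEntry("b", 2), new PendingEntry("c", 1));
        // Partition 2 was rebalanced away from this node.
        System.out.println(expire(pending, p -> p == 1)); // [a, c]
    }
}
```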
