If that keeps the behavior we had before the failure handler was introduced, I think 
it's better to have it disabled than to kill the whole cluster, as happened in my 
case, and to create a parent issue for those ten bugs.

Pavel, thanks for the suggestion!

 

    On Monday, March 25, 2019, 7:07:20 p.m. GMT+9, Nikolay Izhikov 
<nizhi...@apache.org> wrote:  
 
 Guys.

We should fix SYSTEM_WORKER_TERMINATION once and for all.
It seems we have had ten or more "cluster shutdown" bugs in this subsystem
since it was introduced.

Should we disable it by default in 2.7.5?


Mon, Mar 25, 2019 at 13:04, Pavel Kovalenko <jokse...@gmail.com>:

> Hi Roman,
>
> I think this InvalidPartition case can simply be handled
> in the GridCacheTtlManager.expire method.
> As a workaround, a custom FailureHandler can be configured that will not stop
> a node when such an exception is thrown.
>
> Mon, Mar 25, 2019 at 08:38, Roman Shtykh <rsht...@yahoo.com.invalid>:
>
> > Igniters,
> >
> > Restarting a node while injecting data and having it expire results in a
> > GridDhtInvalidPartitionException, which terminates nodes with
> > SYSTEM_WORKER_TERMINATION one by one, taking the whole cluster down. This is
> > really bad, and I didn't find a way to save the cluster from disappearing.
> > I created a JIRA issue, https://issues.apache.org/jira/browse/IGNITE-11620,
> > with a test case. Any clues how to fix this inconsistency when rebalancing?
> >
> > -- Roman
> >
>  
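
For illustration, the workaround quoted above (a custom FailureHandler that does not stop the node for this particular exception) might look roughly like the sketch below. To stay self-contained it uses simplified stand-in types declared in the same file: the FailureType, FailureContext, FailureHandler and GridDhtInvalidPartitionException classes here are assumptions made for this example, not Ignite's real classes, and TolerantFailureHandler is a hypothetical name. In Ignite itself the handler would implement the library's own FailureHandler interface and be set on the node configuration; the exact signatures should be checked against the Ignite version in use.

```java
public class TolerantFailureHandlerSketch {
    /** Stand-in failure categories (illustrative only, not Ignite's enum). */
    public enum FailureType { SYSTEM_WORKER_TERMINATION, CRITICAL_ERROR }

    /** Stand-in for the context handed to a handler: failure type plus causing error. */
    public static final class FailureContext {
        private final FailureType type;
        private final Throwable error;

        public FailureContext(FailureType type, Throwable error) {
            this.type = type;
            this.error = error;
        }

        public FailureType type() { return type; }
        public Throwable error() { return error; }
    }

    /** Stand-in handler contract: return true if the node should be stopped. */
    public interface FailureHandler {
        boolean onFailure(FailureContext ctx);
    }

    /** Stand-in for the exception named in the thread (assumption for this sketch). */
    public static class GridDhtInvalidPartitionException extends RuntimeException {
        public GridDhtInvalidPartitionException(String msg) { super(msg); }
    }

    /** Hypothetical handler: tolerates invalid-partition errors, stops the node otherwise. */
    public static class TolerantFailureHandler implements FailureHandler {
        @Override public boolean onFailure(FailureContext ctx) {
            Throwable t = ctx.error();

            // Walk the cause chain (bounded, to guard against cyclic causes).
            for (int depth = 0; t != null && depth < 32; depth++) {
                if (t instanceof GridDhtInvalidPartitionException)
                    return false; // Do not stop the node for this exception.

                t = t.getCause();
            }

            return true; // Any other failure keeps the default "stop the node" outcome.
        }
    }

    public static void main(String[] args) {
        FailureHandler hnd = new TolerantFailureHandler();

        Throwable expired = new RuntimeException("ttl worker failed",
            new GridDhtInvalidPartitionException("partition is invalid"));

        System.out.println("stop node on invalid partition: "
            + hnd.onFailure(new FailureContext(FailureType.SYSTEM_WORKER_TERMINATION, expired)));
        System.out.println("stop node on other error: "
            + hnd.onFailure(new FailureContext(FailureType.CRITICAL_ERROR, new IllegalStateException("other"))));
    }
}
```

Ignoring only the one exception type, rather than every SYSTEM_WORKER_TERMINATION failure, keeps the handler's protection for other worker crashes; whether that trade-off is acceptable depends on the deployment.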
