Re: IEP-14: Ignite failures handling (Discussion)

Andrey Gura Fri, 16 Mar 2018 06:44:40 -0700

Hi!

Thank you all for your opinions and ideas!

While reading the thread I made two important conclusions:

1. Proposed API should be changed because possible actions enumeration
is bad idea. More clean and simple design should allow user provide
failure handler implementation with custom logic of failure handling
if needed.

2. Several failure handler implementations should be provided out-of
box in order to provide simple way of changing default behaviour
through configuration. The following implementations should be
provided:

     - NoOpFailureHandler - It's useful for tests and debugging.
     - RestartProcessFailureHandler - Specific implementation that
could be used only with ignite.(sh|bat).
     - StopNodeFailureHandler - This implementation will stop Ignite
node in case of critical error.
     - StopNodeOrHaltFailureHandler(boolean tryStop, long timeout) -
Default failure handler will try stop node if tryStop value is true.
If node can't be stopped or tryStop value is false then JVM process
will be terminated forcibly (Runtime.halt()). Default value for
tryStop parameter is false. Of course we should limit time of node
shutdown in order to prevent hangs.

As for the default behavior, I agree with those who believe that most
suitable default option is process termination (although I had a
different opinion before) and most strong argument for this choice is
impossibility of reasoning about system state in case of critical
error.
Also I believe that we can't choose solution that will be suitable for
any community member and the best that we can do is provide simple way
of changing this behavior.

So, I think, default behavior discussion should be finished. I'll
update IEP-14 [1] accordingly to my conclusions above. If you have any
ideas or thoughts about this conclusions, please feel free to share.

Thanks!

[1] 
https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling

On Fri, Mar 16, 2018 at 1:07 AM, Dmitriy Setrakyan
<dsetrak...@apache.org> wrote:
> On Thu, Mar 15, 2018 at 5:21 AM, Dmitry Pavlov <dpavlov....@gmail.com>
> wrote:
>
>> Hi Dmitriy,
>>
>> It seems, here everyone agrees that killing the process will give a more
>> guaranteed result. The question is that the majority in the community does
>> not consider this to be acceptable in case Ignite as started as embedded
>> lib (e.g. from Java, using Ignition.start())
>>
>> What can help to accept the community's opinion? Let's remember Apache
>> principle: "community first".
>>
>
> I am still confused about the problem the majority of the community is
> trying to solve. If our priority is to keep the cluster in frozen state,
> then what is the reason for this task altogether?
>
> The priority should be to keep the cluster operational, not frozen. The
> only solution here is "kill" or "stop+kill". If the community does not
> accept this option as a default, then I propose to drop this task
> altogether, because we do not have to do anything to keep the cluster
> frozen.
>
>
>> If release 2.5 will show us it was inpractical, we will change default to
>> kill even for library. What do you think?
>>
>
> See above. I do not see a reason to continue with this task if the end
> result is identical to what we have today.
>
> I want to give the community another chance to speak up and voice their
> opinions again, having fully understood the context and the problem being
> solved here.
>
> D.

Re: IEP-14: Ignite failures handling (Discussion)

Reply via email to