Hi! Thank you all for your opinions and ideas!
While reading the thread I came to two important conclusions:

1. The proposed API should be changed, because enumerating the possible actions is a bad idea. A cleaner and simpler design should let the user provide a failure handler implementation with custom failure-handling logic if needed.
2. Several failure handler implementations should be provided out of the box, so that the default behaviour can be changed simply through configuration. The following implementations should be provided:
   - NoOpFailureHandler - useful for tests and debugging.
   - RestartProcessFailureHandler - a specific implementation that can be used only with ignite.(sh|bat).
   - StopNodeFailureHandler - stops the Ignite node in case of a critical error.
   - StopNodeOrHaltFailureHandler(boolean tryStop, long timeout) - the default failure handler. It tries to stop the node if tryStop is true. If the node can't be stopped, or tryStop is false, the JVM process is terminated forcibly (Runtime.halt()). The default value of tryStop is false. Of course, the node shutdown time should be limited (by the timeout parameter) to prevent hangs.

As for the default behaviour, I agree with those who believe that the most suitable default is process termination (although I held a different opinion before). The strongest argument for this choice is that it is impossible to reason about the system state after a critical error. I also believe that we can't choose a solution that suits every community member; the best we can do is provide a simple way of changing this behaviour. So I think the discussion of the default behaviour should be closed.

I'll update IEP-14 [1] according to the conclusions above. If you have any ideas or thoughts about these conclusions, please feel free to share them.

Thanks!
[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling

On Fri, Mar 16, 2018 at 1:07 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:

> On Thu, Mar 15, 2018 at 5:21 AM, Dmitry Pavlov <dpavlov....@gmail.com>
> wrote:
>
>> Hi Dmitriy,
>>
>> It seems, here everyone agrees that killing the process will give a more
>> guaranteed result. The question is that the majority in the community does
>> not consider this to be acceptable in case Ignite is started as an embedded
>> lib (e.g. from Java, using Ignition.start())
>>
>> What can help to accept the community's opinion? Let's remember the Apache
>> principle: "community first".
>>
>
> I am still confused about the problem the majority of the community is
> trying to solve. If our priority is to keep the cluster in frozen state,
> then what is the reason for this task altogether?
>
> The priority should be to keep the cluster operational, not frozen. The
> only solution here is "kill" or "stop+kill". If the community does not
> accept this option as a default, then I propose to drop this task
> altogether, because we do not have to do anything to keep the cluster
> frozen.
>
>
>> If release 2.5 shows us it was impractical, we will change the default to
>> kill even for the library. What do you think?
>>
>
> See above. I do not see a reason to continue with this task if the end
> result is identical to what we have today.
>
> I want to give the community another chance to speak up and voice their
> opinions again, having fully understood the context and the problem being
> solved here.
>
> D.
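As a rough illustration of the pluggable-handler design proposed in point 2, here is a minimal Java sketch. Everything in it is hypothetical: the Node and FailureHandler interfaces, the onFailure signature and return value, the millisecond unit for the timeout, and the exit code passed to Runtime.halt() are assumptions made for illustration, not Ignite's actual API. RestartProcessFailureHandler is left out because it depends on the ignite.(sh|bat) launcher scripts.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for an Ignite node; only the stop operation matters here. */
interface Node {
    /** Stops the node. May block for a long time, hang, or throw. */
    void stop();
}

/** Hypothetical single-method handler interface: users plug in custom logic by implementing it. */
interface FailureHandler {
    /** Reacts to a critical failure; returns true if the node was invalidated (stopped or JVM halted). */
    boolean onFailure(Node node, Throwable error);
}

/** Does nothing; intended for tests and debugging only. */
class NoOpFailureHandler implements FailureHandler {
    @Override
    public boolean onFailure(Node node, Throwable error) {
        return false;
    }
}

/** Stops the node and leaves the JVM running. */
class StopNodeFailureHandler implements FailureHandler {
    @Override
    public boolean onFailure(Node node, Throwable error) {
        node.stop();
        return true;
    }
}

/** If tryStop is set, tries to stop the node within timeoutMs; otherwise, or on failure, halts the JVM. */
class StopNodeOrHaltFailureHandler implements FailureHandler {
    private final boolean tryStop;
    private final long timeoutMs;

    /** Default from the proposal: tryStop = false, i.e. halt immediately. */
    StopNodeOrHaltFailureHandler() {
        this(false, 0L);
    }

    StopNodeOrHaltFailureHandler(boolean tryStop, long timeoutMs) {
        this.tryStop = tryStop;
        this.timeoutMs = timeoutMs;
    }

    @Override
    public boolean onFailure(Node node, Throwable error) {
        if (tryStop) {
            CountDownLatch stopped = new CountDownLatch(1);

            // Stop on a separate daemon thread so a hanging stop() can neither block
            // this handler forever nor keep the JVM alive. If stop() throws, the latch
            // is never released and the wait below simply times out.
            Thread stopper = new Thread(() -> {
                node.stop();
                stopped.countDown();
            }, "failure-handler-stop");
            stopper.setDaemon(true);
            stopper.start();

            try {
                if (stopped.await(timeoutMs, TimeUnit.MILLISECONDS))
                    return true; // Node stopped in time; no need to kill the process.
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        halt(); // tryStop is false, the stop timed out or failed, or we were interrupted.
        return true;
    }

    /** Forcibly terminates the JVM without running shutdown hooks (exit code 1 is arbitrary). */
    protected void halt() {
        Runtime.getRuntime().halt(1);
    }
}
```

Running stop() on its own daemon thread is just one way to bound the shutdown time: whether stop() hangs or throws, the latch wait expires and the handler falls back to Runtime.halt(), which does not run shutdown hooks. The halt() method is overridable only so the sketch can be exercised without actually killing the JVM.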