Guys, we have an ongoing activity to implement a set of mechanisms for handling critical issues on nodes (IEP-14 [1]).
I have an idea: propagate a message about each critical issue through the entire topology and write it to the logs of all nodes. In my view this would add much more clarity. Imagine every node printing to its log: "Critical system thread failed on node XXX [details=...]". This should help a lot with investigations. Andrey Gura, Alex Goncharuk, what do you think?

--Yakov

[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling