I am not sure I understand how the tasks are split. How can we discuss graceful shutdown without discussing the reasons for this shutdown? What leads to it?
On Wed, Nov 15, 2017 at 2:10 PM, Anton Vinogradov <avinogra...@gridgain.com> wrote:

> Vova,
>
> Currently we have a lot IEPs to improve grid monitoring and behavior.
>
> Let's split tasks to:
>
> 1) Graceful shutdown.
> In this case we'd like to provide user ability to do something,
> LifecycleBean is what we looking for, thanks for tips!
> But, we have to keep shutdown reason somewhere.
> In case you know where it already kept, please let us know.
>
> 2) OOM or any other reason cause node crash.
> In this case some watchdog (like [1] or [2]) should monitor node alive
>
> 3) GC and deadlock(java and tx) issues
> Should be monitored by special thread [3] or published by metrics [4]
>
> 4) Throughput, latency and space issues
> Special metrics should be developed according to [5]
>
> Andrey asking about case #1 (graceful shutdown), lets discuss only this
> case.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-6587
> [2] https://wrapper.tanukisoftware.com/doc/english/download.jsp
> [3] https://issues.apache.org/jira/browse/IGNITE-6171
> [4] https://cwiki.apache.org/confluence/display/IGNITE/IEP-7%3A+Ignite+internal+problems+detection
> [5] https://cwiki.apache.org/confluence/display/IGNITE/IEP-6%3A+Metrics+improvements
>
> On Wed, Nov 15, 2017 at 1:34 PM, Vladimir Ozerov <voze...@gridgain.com>
> wrote:
>
> > AFAIK the idea was not only to shutdown the node, but also to give user
> > (e.g. administrator) ability to observe the problem from the outside, e.g.
> > through JMX. E.g. if we detect Java-level deadlock, it doesn't mean that
> > the only possible solution is node shutdown. In addition it could be no-op,
> > e.g. to give user chance to collect additional system info, or simply
> > because this particular deadlock is resolvable (e.g.
> > Lock.lockInterruptibly()). So as we need to expose health info through JMX
> > anyway, we could also give user programmatic access to it as well.
> > Alternatively, we can expose this info through JMX only and ask user to get
> > instance of that bean manually.
> >
> > On Wed, Nov 15, 2017 at 1:19 PM, Anton Vinogradov <avinogra...@gridgain.com>
> > wrote:
> >
> > > Vova,
> > >
> > > Could you point to metric you're talking about?
> > >
> > > On Wed, Nov 15, 2017 at 1:06 PM, Andrey Kuznetsov <stku...@gmail.com>
> > > wrote:
> > >
> > > > Vladimir,
> > > >
> > > > Could you please refine, what are local metrics? Should I extend Ignite
> > > > interface by adding something similar to dataRegionMetrics() or there is
> > > > some universal mechanism to handle metrics?
> > > >
> > > > 2017-11-15 8:30 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> > > >
> > > > > This information should be available through local metrics, so that it is
> > > > > accessible from Ignite instance.
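
For reference, the "detect a Java-level deadlock and report it instead of shutting down" idea from Vladimir's message can be sketched with the standard JDK ThreadMXBean alone. This is only an illustration under assumptions: the class name DeadlockProbe and its method are hypothetical and not part of Ignite, and how such a check would be wired into Ignite metrics or a JMX bean is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not an Ignite API): reports Java-level deadlocks without stopping the node. */
public class DeadlockProbe {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Returns the names of threads that are currently deadlocked; an empty list if there are none. */
    public static List<String> deadlockedThreadNames() {
        // findDeadlockedThreads() also covers java.util.concurrent locks, but it is
        // only available when the JVM supports ownable-synchronizer monitoring.
        long[] ids = THREADS.isSynchronizerUsageSupported()
            ? THREADS.findDeadlockedThreads()
            : THREADS.findMonitorDeadlockedThreads();

        List<String> names = new ArrayList<>();

        if (ids == null) // Both methods return null when no deadlock is found.
            return names;

        for (ThreadInfo info : THREADS.getThreadInfo(ids)) {
            if (info != null) // A thread may have terminated in the meantime.
                names.add(info.getThreadName());
        }

        return names;
    }

    public static void main(String[] args) {
        List<String> names = deadlockedThreadNames();

        // Observation only: the caller decides whether to log, publish a metric, or stop the node.
        System.out.println(names.isEmpty()
            ? "No Java-level deadlock detected"
            : "Deadlocked threads: " + names);
    }
}
```

A monitoring thread could call deadlockedThreadNames() periodically and publish the result, leaving the decision about node shutdown to the user.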