Hi, Igniters!

This discussion thread is related to https://issues.apache.org/jira/browse/IGNITE-6171.

Currently Ignite has no tools for monitoring JVM performance, for example the impact of GC pauses (e.g. stop-the-world, STW) on the operation of a node. I think we should add this functionality.

1) It is useful to know when the STW duration has increased, or when any other situation leads to similar consequences. This will allow system administrators to resolve issues before they become problems.

I propose adding a dedicated thread that records the current time every N milliseconds and checks the difference from the previously recorded value. The maximum and total pause values for a given period can be published as dedicated metrics available through JMX.

2) If a pause reaches a critical value, we need to stop the node without waiting for the pause to end.

The thread from the first part of the proposal can estimate the pause duration, but only after the pause has completed. So we need an external thread (in another JVM or in native code) that can recognize that the pause duration has passed the critical mark.

We can estimate the duration of an STW (or similar) pause by
a) reading the value updated by the first thread in some way (e.g. via JMX, shared memory or a shared file), or
b) using JVM diagnostic tools. Does anybody know of cross-platform solutions?

Feel free to suggest ideas or tips, especially about the second part of the proposal.

Thoughts?

--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
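
P.S. To make the first part more concrete, here is a minimal sketch of the timer thread, not a proposed implementation for Ignite. The class name `PauseDetector`, the method `onWakeUp`, the thread name and the two constructor parameters (the sleep interval N and a threshold below which overshoot is ignored as scheduler jitter) are all assumptions made for illustration. Publishing the values through JMX is left out.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical pause detector; every name here is illustrative, not Ignite API. */
final class PauseDetector {
    private final long intervalMs;   // N: how long the thread sleeps between checks
    private final long thresholdMs;  // overshoot below this is treated as jitter (assumption)

    private final AtomicLong maxPauseMs = new AtomicLong();
    private final AtomicLong totalPauseMs = new AtomicLong();
    private final AtomicLong pauseCount = new AtomicLong();

    private Thread worker;

    PauseDetector(long intervalMs, long thresholdMs) {
        this.intervalMs = intervalMs;
        this.thresholdMs = thresholdMs;
    }

    /** Accounts for one wake-up: time beyond the requested sleep is counted as a pause. */
    public void onWakeUp(long elapsedMs) {
        long pause = elapsedMs - intervalMs;

        if (pause < thresholdMs)
            return;

        pauseCount.incrementAndGet();
        totalPauseMs.addAndGet(pause);
        maxPauseMs.accumulateAndGet(pause, Math::max);
    }

    /** Starts the daemon thread that sleeps for N ms and measures how long it really slept. */
    public synchronized void start() {
        if (worker != null)
            return;

        Thread t = new Thread(() -> {
            long last = System.nanoTime();

            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(intervalMs);
                }
                catch (InterruptedException e) {
                    return; // stop() was called
                }

                long now = System.nanoTime();

                onWakeUp(TimeUnit.NANOSECONDS.toMillis(now - last));

                last = now;
            }
        }, "jvm-pause-detector");

        t.setDaemon(true);
        worker = t;
        t.start();
    }

    /** Stops the detector thread, if it is running. */
    public synchronized void stop() {
        if (worker != null) {
            worker.interrupt();
            worker = null;
        }
    }

    public long maxPauseMs() { return maxPauseMs.get(); }
    public long totalPauseMs() { return totalPauseMs.get(); }
    public long pauseCount() { return pauseCount.get(); }
}
```

The sketch uses `System.nanoTime()` rather than wall-clock time so that clock adjustments are not mistaken for pauses. Note that it reports any stall of the JVM (GC, safepoints, OS scheduling, swapping), not only GC, and, as said in part 2, it can only report a pause after the pause has ended.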
