Vladimir Ozerov created IGNITE-6171:
---------------------------------------

             Summary: Native facility to control excessive GC pauses
                 Key: IGNITE-6171
                 URL: https://issues.apache.org/jira/browse/IGNITE-6171
             Project: Ignite
          Issue Type: Task
          Components: general
            Reporter: Vladimir Ozerov
             Fix For: 2.2


Ignite is a Java-based application. If a node experiences long GC pauses, it 
may negatively affect other nodes. We need a way to detect long GC pauses 
from within the process and trigger some action in response, e.g. node stop. 

This is a kind of Inception [1]: you need to realize that you are asleep 
while you are sleeping. Since all Java threads are blocked at a safepoint 
during the pause, we cannot use a Java thread to detect a Java GC pause. 
Native threads should be used instead.

Proposed solution:
1) Thread 1 should periodically call a dummy JNI method that returns the 
current time, and store this time in a shared variable;
2) Thread 2 should periodically check that variable. If it has not changed 
for some time, we are most likely in a GC pause. Once a certain threshold is 
reached, trigger a compensating action, whether that is a warning, a process 
kill, or something else.
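
The two steps above could be sketched roughly as follows. This is only an 
illustration of the idea, not a proposed implementation: all names 
(java_time_fn, heartbeat_step, watchdog_step, the warn/kill thresholds) are 
hypothetical, the JNI call is replaced by a function pointer, and each function 
is one loop iteration that a native thread would run periodically. Thread 2 
measures time with a native clock only and never calls into Java.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the JNI call into a trivial Java method that
   returns the current time in milliseconds. */
typedef int64_t (*java_time_fn)(void);

/* Possible outcomes of one watchdog check (illustrative only). */
typedef enum { GC_OK, GC_WARN, GC_KILL } gc_action;

/* Value last published by Thread 1; read by Thread 2. */
static _Atomic int64_t last_heartbeat_ms;

/* Thread 2's private bookkeeping. */
typedef struct {
    int64_t last_seen;       /* last value read from the shared variable */
    int64_t last_change_ms;  /* native-clock time when that value changed */
} watchdog_state;

/* One iteration of Thread 1. In a real agent the call into Java would not
   return while an STW pause is in progress, so the store is delayed. */
void heartbeat_step(java_time_fn call_java) {
    atomic_store(&last_heartbeat_ms, call_java());
}

/* One iteration of Thread 2. now_ms must come from a native clock.
   Returns the action for the current stall duration. */
gc_action watchdog_step(watchdog_state *st, int64_t now_ms,
                        int64_t warn_ms, int64_t kill_ms) {
    assert(warn_ms > 0 && kill_ms >= warn_ms);

    int64_t seen = atomic_load(&last_heartbeat_ms);
    if (seen != st->last_seen) {
        /* Thread 1 made progress: remember when we noticed it. */
        st->last_seen = seen;
        st->last_change_ms = now_ms;
        return GC_OK;
    }

    int64_t stalled_ms = now_ms - st->last_change_ms;
    if (stalled_ms >= kill_ms)
        return GC_KILL;
    if (stalled_ms >= warn_ms)
        return GC_WARN;
    return GC_OK;
}
```

In this sketch the caller decides what GC_WARN and GC_KILL mean (log a 
message, stop the node, kill the process). Using two thresholds is an 
assumption made here to show escalating actions; a single threshold works 
the same way.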

Justification: crossing the native -> Java boundary involves a safepoint 
check, so Thread 1 will be trapped if an STW pause is in progress. The Java 
method cannot be empty, as the JVM is smart enough to reduce it to a no-op. 

[1] http://www.imdb.com/title/tt1375666/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
