[ 
https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255216#comment-16255216
 ] 

Dmitriy Sorokin edited comment on IGNITE-6171 at 11/16/17 12:22 PM:
--------------------------------------------------------------------

[~vozerov], [~avinogradov]
I think we don't need a JNI method. A standard thread that wakes up at a small 
fixed interval (20 ms, for example), reads the current system time, and 
calculates the difference from the previous value is enough.
If that difference is significantly larger than the expected interval, it 
means our thread was frozen for some time, and it does not matter whether the 
cause was an STW pause or some other degradation of system responsiveness.
A state in which our control thread no longer runs at all cannot arise 
instantaneously, so we can detect degradation of system responsiveness this 
way.
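
To make the idea concrete, here is a minimal sketch of such a thread. It is 
only an illustration, not Ignite code: the class name PauseDetector, the 
method excessDelay, and the threshold parameter are hypothetical, and it reads 
System.nanoTime() (monotonic) rather than wall-clock time so that clock 
adjustments are not mistaken for pauses.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: a plain Java thread that notices long pauses by measuring its own wake-up delay. */
final class PauseDetector implements Runnable {
    private final long intervalMs;   // expected sleep interval, e.g. 20 ms
    private final long thresholdMs;  // excess delay that counts as a long pause (assumed parameter)
    private final AtomicLong pauseCount = new AtomicLong();
    private final AtomicLong longestPauseMs = new AtomicLong();

    PauseDetector(long intervalMs, long thresholdMs) {
        this.intervalMs = intervalMs;
        this.thresholdMs = thresholdMs;
    }

    /** Delay beyond the expected interval, in milliseconds; never negative. */
    public static long excessDelay(long prevNanos, long nowNanos, long intervalMs) {
        long elapsedMs = (nowNanos - prevNanos) / 1_000_000L;
        return Math.max(0L, elapsedMs - intervalMs);
    }

    @Override
    public void run() {
        long prev = System.nanoTime();
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag and stop
                break;
            }
            long now = System.nanoTime();
            long excess = excessDelay(prev, now, intervalMs);
            if (excess > thresholdMs) {
                // The thread was frozen longer than expected: STW pause or other stall.
                pauseCount.incrementAndGet();
                longestPauseMs.accumulateAndGet(excess, Math::max);
                // A compensating action (warning, node stop) would be triggered here.
            }
            prev = now;
        }
    }

    public long pauseCount() { return pauseCount.get(); }

    public long longestPauseMs() { return longestPauseMs.get(); }
}
```

It could be started as a daemon thread, for example with 
new Thread(new PauseDetector(20, 500), "pause-detector"), where the 500 ms 
threshold is an arbitrary value chosen for illustration. Note that the pause 
is only observed after the thread resumes, so any reaction happens after the 
fact rather than during the pause.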


> Native facility to control excessive GC pauses
> ----------------------------------------------
>
>                 Key: IGNITE-6171
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6171
>             Project: Ignite
>          Issue Type: Task
>          Components: general
>            Reporter: Vladimir Ozerov
>            Assignee: Dmitriy Sorokin
>              Labels: iep-7, usability
>
> Ignite is a Java-based application. If a node experiences long GC pauses, it 
> may negatively affect other nodes. We need to find a way to detect long GC 
> pauses within the process and trigger some action in response, e.g. node stop. 
> This is a kind of Inception \[1\], where you need to realize that you are 
> asleep while sleeping. As all Java threads are blocked at a safepoint, we 
> cannot use a Java thread to detect the JVM's GC pauses. Native threads should 
> be used instead.
> Proposed solution:
> 1) Thread 1 should periodically call a dummy JNI method returning the current 
> time, and store this time in a shared variable;
> 2) Thread 2 should periodically check that variable. If it has not changed 
> for some time, we are most likely in a GC pause. Once a certain threshold is 
> reached, trigger a compensating action, whether this is a warning, a process 
> kill, or anything else.
> Justification: crossing the native -> Java boundary involves safepoints. This 
> way Thread 1 will be trapped if an STW pause is in progress. The Java method 
> cannot be empty, as the JVM is smart enough to reduce it to a no-op. 
> \[1\] http://www.imdb.com/title/tt1375666/


