[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses

2017-12-13 Thread Dmitriy Sorokin (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287401#comment-16287401
 ] 

Dmitriy Sorokin edited comment on IGNITE-6171 at 12/13/17 1:22 PM:
---

[~avinogradov], please review my new patch. I think that passing the Ignite 
Basic test suite is enough for this patch.
[Ignite 2.0 Tests :: Ignite 
Basic|https://ci.ignite.apache.org/viewType.html?buildTypeId=Ignite20Tests_IgniteBasic_Ignite20Tests=pull%2F3076%2Fhead=buildTypeStatusDiv]


was (Author: cyberdemon):
[~avinogradov], please review my new patch. I think that passing the Ignite 
Basic test suite is enough for this patch.
[Ignite20Tests_IgniteBasic|https://ci.ignite.apache.org/viewType.html?buildTypeId=Ignite20Tests_IgniteBasic_Ignite20Tests=pull%2F3076%2Fhead=buildTypeStatusDiv]

> Native facility to control excessive GC pauses
> --
>
> Key: IGNITE-6171
> URL: https://issues.apache.org/jira/browse/IGNITE-6171
> Project: Ignite
>  Issue Type: Task
>  Components: general
>Affects Versions: 2.3
>Reporter: Vladimir Ozerov
>Assignee: Dmitriy Sorokin
>  Labels: iep-7, usability
> Fix For: 2.4
>
>
> Ignite is a Java-based application. If a node experiences long GC pauses, it may 
> negatively affect other nodes. We need a way to detect long GC pauses 
> within the process and trigger some action in response, e.g. a node stop. 
> This is a kind of Inception \[1\]: you need to realize that you are asleep 
> while sleeping. Since all Java threads are blocked at a safepoint, we cannot use 
> a Java thread to detect Java's GC. Native threads should be used instead.
> Proposed solution:
> 1) Thread 1 should periodically call a dummy JNI method that returns the current 
> time, and store this time in a shared variable;
> 2) Thread 2 should periodically check that variable. If it has not 
> changed for some time, we are most likely in a GC pause. Once a certain 
> threshold is reached, trigger a compensating action, whether that is a 
> warning, a process kill, or anything else.
> Justification: crossing the native -> Java boundary involves safepoints, so 
> Thread 1 will be trapped if an STW pause is in progress. The Java method cannot 
> be empty, because the JVM is smart enough to reduce it to a no-op. 
> \[1\] http://www.imdb.com/title/tt1375666/
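
The heartbeat/watchdog logic of the proposed solution can be sketched as follows. This is only an illustration in Java under stated assumptions: the class and method names (HeartbeatWatchdog, beat, isStalled) are hypothetical and are not taken from the patch. In the design described above, the thread that checks for staleness would have to be a native thread, because a Java thread is itself stopped during an STW pause.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: a shared timestamp written by Thread 1 and checked by Thread 2. */
final class HeartbeatWatchdog {
    /** Time of the last heartbeat; this is the "shared variable" from the description. */
    private final AtomicLong lastBeatNanos = new AtomicLong(System.nanoTime());

    /** How long the heartbeat may stay unchanged before we assume a pause. */
    private final long thresholdNanos;

    HeartbeatWatchdog(long thresholdMs) {
        this.thresholdNanos = thresholdMs * 1_000_000L;
    }

    /** Thread 1: in the proposed design this would be reached through the dummy JNI call. */
    void beat(long nowNanos) {
        lastBeatNanos.set(nowNanos);
    }

    /** Thread 2: true if the heartbeat has not changed for at least the threshold. */
    boolean isStalled(long nowNanos) {
        return nowNanos - lastBeatNanos.get() >= thresholdNanos;
    }
}
```

When isStalled returns true, the checking thread would trigger the compensating action (a warning, a process kill, and so on). The timestamps are passed in as parameters only to keep the sketch easy to test.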



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses

2017-12-13 Thread Dmitriy Sorokin (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287401#comment-16287401
 ] 

Dmitriy Sorokin edited comment on IGNITE-6171 at 12/13/17 1:18 PM:
---

[~avinogradov], please review my new patch. I think that passing the Ignite 
Basic test suite is enough for this patch.
[Ignite20Tests_IgniteBasic|https://ci.ignite.apache.org/viewLog.html?buildId=991128=buildResultsDiv=Ignite20Tests_IgniteBasic]


was (Author: cyberdemon):
[~avinogradov], please review my new patch. I think that passing the Ignite 
Basic test suite is enough for this patch.



[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses

2017-12-13 Thread Anton Vinogradov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289218#comment-16289218
 ] 

Anton Vinogradov edited comment on IGNITE-6171 at 12/13/17 12:55 PM:
-

[~cyberdemon]
Looks good to me. See minor comments at upsource.

[~vozerov]
Please perform final review


was (Author: avinogradov):
[~cyberdemon]
Looks good to me/ Se minor comments at upsource.

[~vozerov]
Please perform final review



[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses

2017-12-05 Thread Anton Vinogradov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278396#comment-16278396
 ] 

Anton Vinogradov edited comment on IGNITE-6171 at 12/5/17 10:56 AM:


[~cyberdemon]
Please see my comments


was (Author: avinogradov):
Dmitriy Sorokin
Please see my comments



[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses

2017-11-23 Thread Dmitriy Sorokin (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264634#comment-16264634
 ] 

Dmitriy Sorokin edited comment on IGNITE-6171 at 11/23/17 5:22 PM:
---

[~avinogradov], please review new patch.


was (Author: cyberdemon):
Anton Vinogradov, please review new patch.



[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses

2017-11-20 Thread Anton Vinogradov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259147#comment-16259147
 ] 

Anton Vinogradov edited comment on IGNITE-6171 at 11/20/17 11:53 AM:
-

[~cyberdemon],

Since no one rejected the idea, I propose starting with the first part of the 
devlist discussion:

{noformat}
I propose to add a special thread that will record current time every N 
milliseconds and check the difference with the latest recorded value. 
The maximum and total pause values for a certain period can be published in 
the special metrics available through JMX. 
{noformat}
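
A minimal sketch of how the maximum and total pause values could be exposed through JMX. The names used here (GcPauseJmx, PauseMetricsMXBean, PauseMetrics, register) and any object name passed to register are assumptions made for illustration, not the names used in the patch. Resetting the values per period is omitted.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical holder class for the sketch. */
final class GcPauseJmx {
    /** Read-only view published through JMX; the interface must be public. */
    public interface PauseMetricsMXBean {
        long getMaxPauseMs();
        long getTotalPauseMs();
    }

    /** Thread-safe accumulator updated by the timing thread. */
    public static final class PauseMetrics implements PauseMetricsMXBean {
        private final AtomicLong maxPauseMs = new AtomicLong();
        private final AtomicLong totalPauseMs = new AtomicLong();

        /** Called whenever the timing thread measures a pause. */
        public void record(long pauseMs) {
            maxPauseMs.accumulateAndGet(pauseMs, Math::max);
            totalPauseMs.addAndGet(pauseMs);
        }

        @Override public long getMaxPauseMs() { return maxPauseMs.get(); }
        @Override public long getTotalPauseMs() { return totalPauseMs.get(); }
    }

    /** Registers the bean on the platform MBean server under the given name. */
    static ObjectName register(PauseMetrics metrics, String name) {
        try {
            ObjectName objName = new ObjectName(name);
            MBeanServer srv = ManagementFactory.getPlatformMBeanServer();
            srv.registerMBean(metrics, objName);
            return objName;
        } catch (JMException e) {
            throw new IllegalStateException("Failed to register " + name, e);
        }
    }
}
```

After registration, the two attributes appear as MaxPauseMs and TotalPauseMs in any JMX client such as JConsole.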


was (Author: avinogradov):
Since no one rejected the idea, I propose starting with the first part of the 
devlist discussion:

{noformat}
I propose to add a special thread that will record current time every N 
milliseconds and check the difference with the latest recorded value. 
The maximum and total pause values for a certain period can be published in 
the special metrics available through JMX. 
{noformat}



[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses

2017-11-16 Thread Vladimir Ozerov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255304#comment-16255304
 ] 

Vladimir Ozerov edited comment on IGNITE-6171 at 11/16/17 1:30 PM:
---

[~cyberdemon], the main problem with this approach is that we observe the GC pause 
_after_ it has finished. That is fine for logging the max GC pause somewhere and 
showing a kind of "red flag" to the user, but we cannot react to the pause in any 
way. In contrast, the solution with native threads would allow us to shut down an 
unresponsive node _during_ the GC pause. This was the original idea. 

But the question is: does the original idea make sense? Do we really want to 
shut down the node because of a long GC pause? This needs to be discussed separately.


was (Author: vozerov):
[~cyberdemon], the main problem with this approach is that we observe the GC pause 
_after_ it finished. That is fine for logging the max GC pause somewhere and 
showing a kind of "red flag" to the user, but we cannot react to the pause in any 
way. In contrast, the solution with native threads would allow us to shut down an 
unresponsive node _during_ the GC pause. This was the original idea. 

But the question is: does the original idea make sense? Do we really want to 
shut down the node because of a long GC pause? This needs to be discussed separately.



[jira] [Comment Edited] (IGNITE-6171) Native facility to control excessive GC pauses

2017-11-16 Thread Dmitriy Sorokin (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255216#comment-16255216
 ] 

Dmitriy Sorokin edited comment on IGNITE-6171 at 11/16/17 12:22 PM:


[~vozerov], [~avinogradov]
I think that we don't need a JNI method. We only need a standard thread 
that wakes up after a small fixed timeout (20 ms, for example), updates 
the time value with the current system time, and calculates the difference from 
the previous value.
If that difference deviates significantly from the expected one, it means that 
our thread was frozen for some time, and it does not matter whether the cause 
was an STW pause or some other degradation of system responsiveness.
The system cannot instantly reach a state in which our control thread no longer 
runs at all, so we can detect degraded system responsiveness this way.
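
The idea above can be sketched as a plain Java thread that sleeps for a fixed interval and compares the real elapsed time with the expected one. This is an illustration under stated assumptions, not the code from the patch: the class name PauseDetector, its methods, and the threshold parameter are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a self-timing thread that notices when it was frozen. */
final class PauseDetector implements Runnable {
    private final long intervalMs;   // expected sleep time, e.g. 20 ms
    private final long thresholdMs;  // extra delay that counts as a pause
    private final AtomicLong longestPauseMs = new AtomicLong();

    PauseDetector(long intervalMs, long thresholdMs) {
        this.intervalMs = intervalMs;
        this.thresholdMs = thresholdMs;
    }

    /** Delay beyond the expected interval, in milliseconds (never negative). */
    static long excessMs(long prevNanos, long nowNanos, long intervalMs) {
        long elapsedMs = (nowNanos - prevNanos) / 1_000_000L;
        return Math.max(0L, elapsedMs - intervalMs);
    }

    /** Records the delay if it reaches the threshold; returns true if it did. */
    boolean observe(long prevNanos, long nowNanos) {
        long excess = excessMs(prevNanos, nowNanos, intervalMs);
        if (excess < thresholdMs)
            return false;
        longestPauseMs.accumulateAndGet(excess, Math::max);
        return true;
    }

    long longestPauseMs() {
        return longestPauseMs.get();
    }

    @Override public void run() {
        long prev = System.nanoTime();
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and exit
                return;
            }
            long now = System.nanoTime();
            if (observe(prev, now))
                System.err.println("Possible long pause, ms: " + excessMs(prev, now, intervalMs));
            prev = now;
        }
    }
}
```

The detector would typically run in a daemon thread, for example new Thread(new PauseDetector(20, 500), "pause-detector"). Like any Java thread, it reports a pause only after the pause has ended.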


was (Author: cyberdemon):
I think that we don't need a JNI method. We only need a standard thread 
that wakes up after a small fixed timeout (20 ms, for example), updates 
the time value with the current system time, and calculates the difference from 
the previous value.
If that difference deviates significantly from the expected one, it means that 
our thread was frozen for some time, and it does not matter whether the cause 
was an STW pause or some other degradation of system responsiveness.
The system cannot instantly reach a state in which our control thread no longer 
runs at all, so we can detect degraded system responsiveness this way.
