[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster

2017-10-31 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-2656:

Fix Version/s: (was: 2.3)
   2.4

> Documentation on debugging and fixing the reasons of node disconnection from 
> the cluster
> 
>
> Key: IGNITE-2656
> URL: https://issues.apache.org/jira/browse/IGNITE-2656
> Project: Ignite
>  Issue Type: Bug
>  Components: documentation
>Reporter: Denis Magda
>Assignee: Denis Magda
>Priority: Minor
> Fix For: 2.4
>
>
> Sometimes a node can be abruptly kicked off from the cluster buy some reason.
> The documentation must contain information on how to get to the root of the 
> issue by looking at logs files. Usually the node that was kicked off contains 
> "Local node segmented" message and the node that failed its next neighbor 
> contains a message with more details "Failed to send message to next node".
> Next the article must list possible reasons of the disconnection:
> - long GC pauses. Give recommendations on how to check;
> - high node utilization so that it responds with a delay;
> - low network configuration parameters that are not suited for an environment;
> There should be a section about 
> {{IgniteConfiguration.failureDetectionTimeout}} describing its behavior and 
> showing all its pros and cons.
> The article must say when it makes sense to 'disable' this timeout by 
> switching to explicit configuration of TcpDiscoverySpi.socketTimeout, 
> TcpDiscoverySpi.ackTimeout, TcpDiscoverySpi.maxAckTimeout, 
> TcpDiscoverySpi.reconnectCount. Pros and cons of manual configuration has to 
> be mentioned as well.
>   
> Also I would list the usage of TcpDiscoverySpi.joinTimeout,
> TcpDiscoverySpi.networkTimeout (used on client reconnect, servers waits for 
> join result, node stop, socket reader first message.) there as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster

2017-10-31 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-2656:

Fix Version/s: (was: 2.4)
   2.3

> Documentation on debugging and fixing the reasons of node disconnection from 
> the cluster
> 
>
> Key: IGNITE-2656
> URL: https://issues.apache.org/jira/browse/IGNITE-2656
> Project: Ignite
>  Issue Type: Bug
>  Components: documentation
>Reporter: Denis Magda
>Assignee: Denis Magda
>Priority: Minor
> Fix For: 2.3
>
>
> Sometimes a node can be abruptly kicked off from the cluster buy some reason.
> The documentation must contain information on how to get to the root of the 
> issue by looking at logs files. Usually the node that was kicked off contains 
> "Local node segmented" message and the node that failed its next neighbor 
> contains a message with more details "Failed to send message to next node".
> Next the article must list possible reasons of the disconnection:
> - long GC pauses. Give recommendations on how to check;
> - high node utilization so that it responds with a delay;
> - low network configuration parameters that are not suited for an environment;
> There should be a section about 
> {{IgniteConfiguration.failureDetectionTimeout}} describing its behavior and 
> showing all its pros and cons.
> The article must say when it makes sense to 'disable' this timeout by 
> switching to explicit configuration of TcpDiscoverySpi.socketTimeout, 
> TcpDiscoverySpi.ackTimeout, TcpDiscoverySpi.maxAckTimeout, 
> TcpDiscoverySpi.reconnectCount. Pros and cons of manual configuration has to 
> be mentioned as well.
>   
> Also I would list the usage of TcpDiscoverySpi.joinTimeout,
> TcpDiscoverySpi.networkTimeout (used on client reconnect, servers waits for 
> join result, node stop, socket reader first message.) there as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster

2017-10-19 Thread Denis Magda (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Magda updated IGNITE-2656:

Fix Version/s: (was: 2.3)
   2.4

> Documentation on debugging and fixing the reasons of node disconnection from 
> the cluster
> 
>
> Key: IGNITE-2656
> URL: https://issues.apache.org/jira/browse/IGNITE-2656
> Project: Ignite
>  Issue Type: Bug
>  Components: documentation
>Reporter: Denis Magda
>Assignee: Denis Magda
>Priority: Minor
> Fix For: 2.4
>
>
> Sometimes a node can be abruptly kicked off from the cluster buy some reason.
> The documentation must contain information on how to get to the root of the 
> issue by looking at logs files. Usually the node that was kicked off contains 
> "Local node segmented" message and the node that failed its next neighbor 
> contains a message with more details "Failed to send message to next node".
> Next the article must list possible reasons of the disconnection:
> - long GC pauses. Give recommendations on how to check;
> - high node utilization so that it responds with a delay;
> - low network configuration parameters that are not suited for an environment;
> There should be a section about 
> {{IgniteConfiguration.failureDetectionTimeout}} describing its behavior and 
> showing all its pros and cons.
> The article must say when it makes sense to 'disable' this timeout by 
> switching to explicit configuration of TcpDiscoverySpi.socketTimeout, 
> TcpDiscoverySpi.ackTimeout, TcpDiscoverySpi.maxAckTimeout, 
> TcpDiscoverySpi.reconnectCount. Pros and cons of manual configuration has to 
> be mentioned as well.
>   
> Also I would list the usage of TcpDiscoverySpi.joinTimeout,
> TcpDiscoverySpi.networkTimeout (used on client reconnect, servers waits for 
> join result, node stop, socket reader first message.) there as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster

2017-09-27 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-2656:

Labels:   (was: documentation)

> Documentation on debugging and fixing the reasons of node disconnection from 
> the cluster
> 
>
> Key: IGNITE-2656
> URL: https://issues.apache.org/jira/browse/IGNITE-2656
> Project: Ignite
>  Issue Type: Bug
>  Components: documentation
>Reporter: Denis Magda
>Assignee: Denis Magda
>Priority: Minor
> Fix For: 2.3
>
>
> Sometimes a node can be abruptly kicked off from the cluster buy some reason.
> The documentation must contain information on how to get to the root of the 
> issue by looking at logs files. Usually the node that was kicked off contains 
> "Local node segmented" message and the node that failed its next neighbor 
> contains a message with more details "Failed to send message to next node".
> Next the article must list possible reasons of the disconnection:
> - long GC pauses. Give recommendations on how to check;
> - high node utilization so that it responds with a delay;
> - low network configuration parameters that are not suited for an environment;
> There should be a section about 
> {{IgniteConfiguration.failureDetectionTimeout}} describing its behavior and 
> showing all its pros and cons.
> The article must say when it makes sense to 'disable' this timeout by 
> switching to explicit configuration of TcpDiscoverySpi.socketTimeout, 
> TcpDiscoverySpi.ackTimeout, TcpDiscoverySpi.maxAckTimeout, 
> TcpDiscoverySpi.reconnectCount. Pros and cons of manual configuration has to 
> be mentioned as well.
>   
> Also I would list the usage of TcpDiscoverySpi.joinTimeout,
> TcpDiscoverySpi.networkTimeout (used on client reconnect, servers waits for 
> join result, node stop, socket reader first message.) there as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster

2017-09-27 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-2656:

Labels: documentation  (was: )

> Documentation on debugging and fixing the reasons of node disconnection from 
> the cluster
> 
>
> Key: IGNITE-2656
> URL: https://issues.apache.org/jira/browse/IGNITE-2656
> Project: Ignite
>  Issue Type: Bug
>  Components: documentation
>Reporter: Denis Magda
>Assignee: Denis Magda
>Priority: Minor
> Fix For: 2.3
>
>
> Sometimes a node can be abruptly kicked off from the cluster buy some reason.
> The documentation must contain information on how to get to the root of the 
> issue by looking at logs files. Usually the node that was kicked off contains 
> "Local node segmented" message and the node that failed its next neighbor 
> contains a message with more details "Failed to send message to next node".
> Next the article must list possible reasons of the disconnection:
> - long GC pauses. Give recommendations on how to check;
> - high node utilization so that it responds with a delay;
> - low network configuration parameters that are not suited for an environment;
> There should be a section about 
> {{IgniteConfiguration.failureDetectionTimeout}} describing its behavior and 
> showing all its pros and cons.
> The article must say when it makes sense to 'disable' this timeout by 
> switching to explicit configuration of TcpDiscoverySpi.socketTimeout, 
> TcpDiscoverySpi.ackTimeout, TcpDiscoverySpi.maxAckTimeout, 
> TcpDiscoverySpi.reconnectCount. Pros and cons of manual configuration has to 
> be mentioned as well.
>   
> Also I would list the usage of TcpDiscoverySpi.joinTimeout,
> TcpDiscoverySpi.networkTimeout (used on client reconnect, servers waits for 
> join result, node stop, socket reader first message.) there as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster

2017-09-27 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-2656:

Component/s: documentation

> Documentation on debugging and fixing the reasons of node disconnection from 
> the cluster
> 
>
> Key: IGNITE-2656
> URL: https://issues.apache.org/jira/browse/IGNITE-2656
> Project: Ignite
>  Issue Type: Bug
>  Components: documentation
>Reporter: Denis Magda
>Assignee: Denis Magda
>Priority: Minor
> Fix For: 2.3
>
>
> Sometimes a node can be abruptly kicked off from the cluster buy some reason.
> The documentation must contain information on how to get to the root of the 
> issue by looking at logs files. Usually the node that was kicked off contains 
> "Local node segmented" message and the node that failed its next neighbor 
> contains a message with more details "Failed to send message to next node".
> Next the article must list possible reasons of the disconnection:
> - long GC pauses. Give recommendations on how to check;
> - high node utilization so that it responds with a delay;
> - low network configuration parameters that are not suited for an environment;
> There should be a section about 
> {{IgniteConfiguration.failureDetectionTimeout}} describing its behavior and 
> showing all its pros and cons.
> The article must say when it makes sense to 'disable' this timeout by 
> switching to explicit configuration of TcpDiscoverySpi.socketTimeout, 
> TcpDiscoverySpi.ackTimeout, TcpDiscoverySpi.maxAckTimeout, 
> TcpDiscoverySpi.reconnectCount. Pros and cons of manual configuration has to 
> be mentioned as well.
>   
> Also I would list the usage of TcpDiscoverySpi.joinTimeout,
> TcpDiscoverySpi.networkTimeout (used on client reconnect, servers waits for 
> join result, node stop, socket reader first message.) there as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster

2017-05-22 Thread Denis Magda (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Magda updated IGNITE-2656:

Fix Version/s: (was: 2.1)
   2.2

> Documentation on debugging and fixing the reasons of node disconnection from 
> the cluster
> 
>
> Key: IGNITE-2656
> URL: https://issues.apache.org/jira/browse/IGNITE-2656
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Magda
>Assignee: Denis Magda
>Priority: Minor
> Fix For: 2.2
>
>
> Sometimes a node can be abruptly kicked off from the cluster buy some reason.
> The documentation must contain information on how to get to the root of the 
> issue by looking at logs files. Usually the node that was kicked off contains 
> "Local node segmented" message and the node that failed its next neighbor 
> contains a message with more details "Failed to send message to next node".
> Next the article must list possible reasons of the disconnection:
> - long GC pauses. Give recommendations on how to check;
> - high node utilization so that it responds with a delay;
> - low network configuration parameters that are not suited for an environment;
> There should be a section about 
> {{IgniteConfiguration.failureDetectionTimeout}} describing its behavior and 
> showing all its pros and cons.
> The article must say when it makes sense to 'disable' this timeout by 
> switching to explicit configuration of TcpDiscoverySpi.socketTimeout, 
> TcpDiscoverySpi.ackTimeout, TcpDiscoverySpi.maxAckTimeout, 
> TcpDiscoverySpi.reconnectCount. Pros and cons of manual configuration has to 
> be mentioned as well.
>   
> Also I would list the usage of TcpDiscoverySpi.joinTimeout,
> TcpDiscoverySpi.networkTimeout (used on client reconnect, servers waits for 
> join result, node stop, socket reader first message.) there as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster

2017-04-05 Thread Denis Magda (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Magda updated IGNITE-2656:

Fix Version/s: (was: 2.0)
   2.1

> Documentation on debugging and fixing the reasons of node disconnection from 
> the cluster
> 
>
> Key: IGNITE-2656
> URL: https://issues.apache.org/jira/browse/IGNITE-2656
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Magda
>Assignee: Denis Magda
>Priority: Minor
> Fix For: 2.1
>
>
> Sometimes a node can be abruptly kicked off from the cluster buy some reason.
> The documentation must contain information on how to get to the root of the 
> issue by looking at logs files. Usually the node that was kicked off contains 
> "Local node segmented" message and the node that failed its next neighbor 
> contains a message with more details "Failed to send message to next node".
> Next the article must list possible reasons of the disconnection:
> - long GC pauses. Give recommendations on how to check;
> - high node utilization so that it responds with a delay;
> - low network configuration parameters that are not suited for an environment;
> There should be a section about 
> {{IgniteConfiguration.failureDetectionTimeout}} describing its behavior and 
> showing all its pros and cons.
> The article must say when it makes sense to 'disable' this timeout by 
> switching to explicit configuration of TcpDiscoverySpi.socketTimeout, 
> TcpDiscoverySpi.ackTimeout, TcpDiscoverySpi.maxAckTimeout, 
> TcpDiscoverySpi.reconnectCount. Pros and cons of manual configuration has to 
> be mentioned as well.
>   
> Also I would list the usage of TcpDiscoverySpi.joinTimeout,
> TcpDiscoverySpi.networkTimeout (used on client reconnect, servers waits for 
> join result, node stop, socket reader first message.) there as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster

2016-08-23 Thread Denis Magda (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Magda updated IGNITE-2656:

Priority: Minor  (was: Major)

> Documentation on debugging and fixing the reasons of node disconnection from 
> the cluster
> 
>
> Key: IGNITE-2656
> URL: https://issues.apache.org/jira/browse/IGNITE-2656
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Magda
>Assignee: Denis Magda
>Priority: Minor
> Fix For: 1.8
>
>
> Sometimes a node can be abruptly kicked off from the cluster buy some reason.
> The documentation must contain information on how to get to the root of the 
> issue by looking at logs files. Usually the node that was kicked off contains 
> "Local node segmented" message and the node that failed its next neighbor 
> contains a message with more details "Failed to send message to next node".
> Next the article must list possible reasons of the disconnection:
> - long GC pauses. Give recommendations on how to check;
> - high node utilization so that it responds with a delay;
> - low network configuration parameters that are not suited for an environment;
> There should be a section about 
> {{IgniteConfiguration.failureDetectionTimeout}} describing its behavior and 
> showing all its pros and cons.
> The article must say when it makes sense to 'disable' this timeout by 
> switching to explicit configuration of TcpDiscoverySpi.socketTimeout, 
> TcpDiscoverySpi.ackTimeout, TcpDiscoverySpi.maxAckTimeout, 
> TcpDiscoverySpi.reconnectCount. Pros and cons of manual configuration has to 
> be mentioned as well.
>   
> Also I would list the usage of TcpDiscoverySpi.joinTimeout,
> TcpDiscoverySpi.networkTimeout (used on client reconnect, servers waits for 
> join result, node stop, socket reader first message.) there as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster

2016-08-09 Thread Pavel Tupitsyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Tupitsyn updated IGNITE-2656:
---
Fix Version/s: (was: 1.7)
   1.8

> Documentation on debugging and fixing the reasons of node disconnection from 
> the cluster
> 
>
> Key: IGNITE-2656
> URL: https://issues.apache.org/jira/browse/IGNITE-2656
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Magda
>Assignee: Denis Magda
>Priority: Critical
> Fix For: 1.8
>
>
> Sometimes a node can be abruptly kicked off from the cluster buy some reason.
> The documentation must contain information on how to get to the root of the 
> issue by looking at logs files. Usually the node that was kicked off contains 
> "Local node segmented" message and the node that failed its next neighbor 
> contains a message with more details "Failed to send message to next node".
> Next the article must list possible reasons of the disconnection:
> - long GC pauses. Give recommendations on how to check;
> - high node utilization so that it responds with a delay;
> - low network configuration parameters that are not suited for an environment;
> There should be a section about 
> {{IgniteConfiguration.failureDetectionTimeout}} describing its behavior and 
> showing all its pros and cons.
> The article must say when it makes sense to 'disable' this timeout by 
> switching to explicit configuration of TcpDiscoverySpi.socketTimeout, 
> TcpDiscoverySpi.ackTimeout, TcpDiscoverySpi.maxAckTimeout, 
> TcpDiscoverySpi.reconnectCount. Pros and cons of manual configuration has to 
> be mentioned as well.
>   
> Also I would list the usage of TcpDiscoverySpi.joinTimeout,
> TcpDiscoverySpi.networkTimeout (used on client reconnect, servers waits for 
> join result, node stop, socket reader first message.) there as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)