[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster
[ https://issues.apache.org/jira/browse/IGNITE-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Ozerov updated IGNITE-2656:
Fix Version/s: (was: 2.3) 2.4

> Documentation on debugging and fixing the reasons of node disconnection from the cluster
>
> Key: IGNITE-2656
> URL: https://issues.apache.org/jira/browse/IGNITE-2656
> Project: Ignite
> Issue Type: Bug
> Components: documentation
> Reporter: Denis Magda
> Assignee: Denis Magda
> Priority: Minor
> Fix For: 2.4
>
> Sometimes a node can be abruptly kicked out of the cluster for some reason.
> The documentation must explain how to get to the root of the issue by looking at the log files. Usually the node that was kicked out logs a "Local node segmented" message, and the node that failed to communicate with its next neighbor logs a more detailed "Failed to send message to next node" message.
> Next, the article must list possible reasons for the disconnection:
> - long GC pauses (give recommendations on how to check for them);
> - high node utilization, so that the node responds with a delay;
> - network configuration parameters set too low for the environment.
> There should be a section about {{IgniteConfiguration.failureDetectionTimeout}} describing its behavior and listing its pros and cons.
> The article must say when it makes sense to 'disable' this timeout by switching to explicit configuration of TcpDiscoverySpi.socketTimeout, TcpDiscoverySpi.ackTimeout, TcpDiscoverySpi.maxAckTimeout and TcpDiscoverySpi.reconnectCount. The pros and cons of manual configuration have to be mentioned as well.
>
> I would also list the usage of TcpDiscoverySpi.joinTimeout and TcpDiscoverySpi.networkTimeout in the article (used on client reconnect, while a server waits for a join result, on node stop, and when a socket reader waits for the first message).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
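The log-based triage described in the ticket can be sketched as a small stand-alone scanner. This is a minimal sketch, assuming Java 11 or later, that each node writes a plain-text UTF-8 log file, and that the two messages quoted in the ticket appear verbatim in it. The class, enum and method names are hypothetical and are not part of Ignite. Run it against the log of the node that was kicked out and against the logs of its ring neighbours.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not an Ignite API): flags log lines that match the two
// disconnection messages quoted in IGNITE-2656.
public class SegmentationLogScan {

    // Which side of the disconnection a matching line points to.
    enum Evidence {
        SEGMENTED_NODE,        // the node that was kicked out of the cluster
        NEIGHBOUR_SEND_FAILURE // the node that could not reach its next node
    }

    static final String SEGMENTED_MSG = "Local node segmented";
    static final String SEND_FAILED_MSG = "Failed to send message to next node";

    // Plain substring match; no particular log layout or timestamp format is assumed.
    static Optional<Evidence> classify(String line) {
        if (line.contains(SEGMENTED_MSG)) {
            return Optional.of(Evidence.SEGMENTED_NODE);
        }
        if (line.contains(SEND_FAILED_MSG)) {
            return Optional.of(Evidence.NEIGHBOUR_SEND_FAILURE);
        }
        return Optional.empty();
    }

    // Usage: java SegmentationLogScan <path-to-node-log>
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SegmentationLogScan <log-file>");
            System.exit(1);
        }
        // Reads the whole file as UTF-8; adequate for a one-off triage of a single log.
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        for (int i = 0; i < lines.size(); i++) {
            int lineNo = i + 1;
            String line = lines.get(i);
            classify(line).ifPresent(kind ->
                System.out.println(kind + " at line " + lineNo + ": " + line));
        }
    }
}
```

A hit on the kicked-out node only tells you that it was segmented. The ticket points to the neighbour's "Failed to send message to next node" line as the one carrying more detail, so the two hits are best matched up by timestamp across the nodes' logs.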
[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster
Vladimir Ozerov updated IGNITE-2656:
Fix Version/s: (was: 2.4) 2.3
[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster
Denis Magda updated IGNITE-2656:
Fix Version/s: (was: 2.3) 2.4
[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster
Vladimir Ozerov updated IGNITE-2656:
Labels: (was: documentation)
[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster
Vladimir Ozerov updated IGNITE-2656:
Labels: documentation (was: )
[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster
Vladimir Ozerov updated IGNITE-2656:
Component/s: documentation
[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster
Denis Magda updated IGNITE-2656:
Fix Version/s: (was: 2.1) 2.2
[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster
Denis Magda updated IGNITE-2656:
Fix Version/s: (was: 2.0) 2.1
[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster
Denis Magda updated IGNITE-2656:
Priority: Minor (was: Major)
[jira] [Updated] (IGNITE-2656) Documentation on debugging and fixing the reasons of node disconnection from the cluster
Pavel Tupitsyn updated IGNITE-2656:
Fix Version/s: (was: 1.7) 1.8