[jira] [Commented] (IGNITE-5155) Need to improve stats dump on exchange timeout
[ https://issues.apache.org/jira/browse/IGNITE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011983#comment-16011983 ]

Semen Boikov commented on IGNITE-5155:
--------------------------------------

Yes, I have already implemented exactly what Yakov suggested, plus more informative debug logging for some cache operations. I will continue to work on this issue.

> Need to improve stats dump on exchange timeout
> ----------------------------------------------
>
>                 Key: IGNITE-5155
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5155
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Yakov Zhdanov
>            Assignee: Semen Boikov
>             Fix For: 2.1
>
>
> Currently, on large topologies, the info dumped on "Failed to wait for partition
> map exchange"
> (org/apache/ignite/internal/processors/cache/GridCachePartitionExchangeManager.java:1713)
> floods the log, and we need to reduce the amount of information dumped.
> 1. Reduce output for exchange futures that are already done. Keep event,
> topology version, servers count, clients count (more?)
> 2. Do not dump the whole communication stats, but send a message to the exchange
> coordinator, asking for its status, for the number of messages received, and for
> the messages acked from the local node.
> 3. We can think of sending a new message from a cache node to the coordinator that
> may be a sign of a problem on that node (e.g. unreleased tx locks or still
> renting partitions). The coordinator may include this info in its status, so that
> every Ignite node can point to the problem node in the logs.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
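Item 1 of the issue description above (shorter output for exchange futures that are already done) could look roughly like the sketch below. This is only an illustration of the idea, not the change that was made in Ignite: the class ExchangeDumpSketch, the ExchangeInfo holder, its fields and the output format are all hypothetical names invented here, and no Ignite API is used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from the Ignite code base. */
public class ExchangeDumpSketch {
    /** Illustrative stand-in for the state of a single exchange future. */
    public static final class ExchangeInfo {
        final String event;    // e.g. the discovery event that triggered the exchange
        final String topVer;   // topology version, kept as plain text here
        final int servers;     // number of server nodes
        final int clients;     // number of client nodes
        final boolean done;    // whether the exchange has already completed
        final String details;  // verbose state, only useful while pending

        public ExchangeInfo(String event, String topVer, int servers, int clients,
            boolean done, String details) {
            this.event = event;
            this.topVer = topVer;
            this.servers = servers;
            this.clients = clients;
            this.done = done;
            this.details = details;
        }
    }

    /** Builds one dump line: short for completed exchanges, full for pending ones. */
    public static String summarize(ExchangeInfo e) {
        String s = "ExchangeFuture [evt=" + e.event + ", topVer=" + e.topVer +
            ", srvs=" + e.servers + ", clients=" + e.clients + ", done=" + e.done;

        // Completed futures keep only the four fields named in the issue.
        return e.done ? s + "]" : s + ", details=" + e.details + "]";
    }

    /** Summarizes every future in the list, preserving the input order. */
    public static List<String> dump(List<ExchangeInfo> futs) {
        List<String> lines = new ArrayList<>();

        for (ExchangeInfo e : futs)
            lines.add(summarize(e));

        return lines;
    }

    public static void main(String[] args) {
        List<ExchangeInfo> futs = new ArrayList<>();

        futs.add(new ExchangeInfo("NODE_JOINED", "10.0", 40, 8, true, "remaining=[]"));
        futs.add(new ExchangeInfo("NODE_LEFT", "11.0", 39, 8, false, "remaining=[node-17]"));

        for (String line : dump(futs))
            System.out.println(line);
    }
}
```

With this shape, the dump for a large topology grows with the number of pending exchanges rather than with the full exchange history, which is the effect the issue asks for.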
[jira] [Commented] (IGNITE-5155) Need to improve stats dump on exchange timeout
[ https://issues.apache.org/jira/browse/IGNITE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010567#comment-16010567 ]

Stanilovsky Evgeny commented on IGNITE-5155:
--------------------------------------------

Semen said that he has already implemented most of the listed enhancements.

> Need to improve stats dump on exchange timeout
> ----------------------------------------------
>
>                 Key: IGNITE-5155
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5155
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Yakov Zhdanov
>            Assignee: Stanilovsky Evgeny
>             Fix For: 2.1
[jira] [Commented] (IGNITE-5155) Need to improve stats dump on exchange timeout
[ https://issues.apache.org/jira/browse/IGNITE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995441#comment-15995441 ]

Yakov Zhdanov commented on IGNITE-5155:
---------------------------------------

[~sboikov] [~agoncharuk], Guys, can you please take a look and comment?
[jira] [Commented] (IGNITE-5155) Need to improve stats dump on exchange timeout
[ https://issues.apache.org/jira/browse/IGNITE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995440#comment-15995440 ]

Yakov Zhdanov commented on IGNITE-5155:
---------------------------------------

[~zstan], since you currently work on IGNITE-5125 I thought this issue would be interesting for you. Reassign to me if you disagree. Thanks!