[jira] [Commented] (IGNITE-5155) Need to improve stats dump on exchange timeout

2017-05-16 Thread Semen Boikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011983#comment-16011983
 ] 

Semen Boikov commented on IGNITE-5155:
--

Yes, I have already implemented exactly what Yakov suggested, plus more 
informative debug logging for some cache operations. I will continue to work 
on this issue.

> Need to improve stats dump on exchange timeout
> --
>
> Key: IGNITE-5155
> URL: https://issues.apache.org/jira/browse/IGNITE-5155
> Project: Ignite
> Issue Type: Improvement
> Reporter: Yakov Zhdanov
> Assignee: Semen Boikov
> Fix For: 2.1
>
>
> Currently, on large topologies, the info dumped on "Failed to wait for partition 
> map exchange" 
> (org/apache/ignite/internal/processors/cache/GridCachePartitionExchangeManager.java:1713)
> floods the log, so we need to reduce the amount of information dumped.
> 1. Reduce output for exchange futures that are already done. Keep the event, 
> topology version, server count, and client count (more?).
> 2. Do not dump the whole communication stats. Instead, send a message to the 
> exchange coordinator asking for its status, the number of messages received, 
> and the acked messages from the local node.
> 3. We can consider sending a new message from a cache node to the coordinator 
> that may be a sign of a problem on that node (e.g. unreleased tx locks or 
> still-renting partitions). The coordinator may include this info in its 
> status, so that every Ignite node can point to the problem node in its logs.
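
Item 1 of the description above can be illustrated with a minimal sketch. It is 
not the Ignite implementation: ExchangeDumpSketch, ExchangeInfo, and their 
fields are hypothetical names made up for this example, and the real exchange 
future class carries far more state. The idea is only that a finished exchange 
is logged as one short line, while a pending one keeps its detailed output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types exist in Ignite.
class ExchangeDumpSketch {
    /** Simplified stand-in for an exchange future (assumption, not an Ignite class). */
    static final class ExchangeInfo {
        final String event;   // discovery event that triggered the exchange
        final long topVer;    // topology version
        final int servers;    // server node count
        final int clients;    // client node count
        final boolean done;   // whether the exchange has completed
        final String details; // verbose state, only useful while pending

        ExchangeInfo(String event, long topVer, int servers, int clients,
                     boolean done, String details) {
            this.event = event;
            this.topVer = topVer;
            this.servers = servers;
            this.clients = clients;
            this.done = done;
            this.details = details;
        }
    }

    /** One-line summary: event, topology version, server and client counts. */
    static String summarize(ExchangeInfo e) {
        return "Exchange [evt=" + e.event + ", topVer=" + e.topVer
            + ", srvs=" + e.servers + ", clients=" + e.clients + "]";
    }

    /** Done exchanges get the short summary; pending ones also keep their details. */
    static List<String> dump(List<ExchangeInfo> exchanges) {
        List<String> lines = new ArrayList<>();
        for (ExchangeInfo e : exchanges) {
            String line = summarize(e);
            if (!e.done)
                line += " PENDING, details=" + e.details;
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<ExchangeInfo> exchanges = Arrays.asList(
            new ExchangeInfo("NODE_JOINED", 12L, 100, 40, true, "long state dump"),
            new ExchangeInfo("NODE_LEFT", 13L, 99, 40, false, "waiting for node X"));

        for (String line : dump(exchanges))
            System.out.println(line);
    }
}
```

Items 2 and 3 would need a new request/response message pair between a node 
and the exchange coordinator, which is not sketched here.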



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (IGNITE-5155) Need to improve stats dump on exchange timeout

2017-05-15 Thread Stanilovsky Evgeny (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16010567#comment-16010567
 ] 

Stanilovsky Evgeny commented on IGNITE-5155:


Semen said that he has already implemented most of the listed enhancements.

> Need to improve stats dump on exchange timeout
> --
>
> Key: IGNITE-5155
> URL: https://issues.apache.org/jira/browse/IGNITE-5155
> Project: Ignite
> Issue Type: Improvement
> Reporter: Yakov Zhdanov
> Assignee: Stanilovsky Evgeny
> Fix For: 2.1
>





[jira] [Commented] (IGNITE-5155) Need to improve stats dump on exchange timeout

2017-05-03 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995441#comment-15995441
 ] 

Yakov Zhdanov commented on IGNITE-5155:
---

[~sboikov] [~agoncharuk],

Guys, can you please take a look and comment?

> Need to improve stats dump on exchange timeout
> --
>
> Key: IGNITE-5155
> URL: https://issues.apache.org/jira/browse/IGNITE-5155
> Project: Ignite
> Issue Type: Improvement
> Reporter: Yakov Zhdanov
> Assignee: Stanilovsky Evgeny
> Fix For: 2.1
>





[jira] [Commented] (IGNITE-5155) Need to improve stats dump on exchange timeout

2017-05-03 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995440#comment-15995440
 ] 

Yakov Zhdanov commented on IGNITE-5155:
---

[~zstan], since you are currently working on IGNITE-5125, I thought this issue 
would be interesting for you. Reassign it to me if you disagree.

Thanks!



> Need to improve stats dump on exchange timeout
> --
>
> Key: IGNITE-5155
> URL: https://issues.apache.org/jira/browse/IGNITE-5155
> Project: Ignite
> Issue Type: Improvement
> Reporter: Yakov Zhdanov
> Assignee: Stanilovsky Evgeny
> Fix For: 2.1
>


