[ https://issues.apache.org/jira/browse/IGNITE-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denis Mekhanikov reassigned IGNITE-10808: ----------------------------------------- Assignee: Denis Mekhanikov > Discovery message queue may build up with TcpDiscoveryMetricsUpdateMessage > -------------------------------------------------------------------------- > > Key: IGNITE-10808 > URL: https://issues.apache.org/jira/browse/IGNITE-10808 > Project: Ignite > Issue Type: Bug > Reporter: Stanislav Lukyanov > Assignee: Denis Mekhanikov > Priority: Major > Attachments: IgniteMetricsOverflowTest.java > > > A node receives a new metrics update message every `metricsUpdateFrequency` > milliseconds, and the message will be put at the top of the queue (because it > is a high priority message). > If processing one message takes more than `metricsUpdateFrequency` then > multiple `TcpDiscoveryMetricsUpdateMessage` will be in the queue. A long > enough delay (e.g. caused by a network glitch or GC) may lead to the queue > building up tens of metrics update messages which are essentially useless to > be processed. Finally, if processing a message on average takes a little more > than `metricsUpdateFrequency` (even for a relatively short period of time, > say, for a minute due to network issues) then the message worker will end up > processing only the metrics updates and the cluster will essentially hang. > Reproducer is attached. In the test, the queue first builds up and then very > slowly being teared down, causing "Failed to wait for PME" messages. > Need to change ServerImpl's SocketReader not to put another metrics update > message to the top of the queue if it already has one (or replace the one at > the top with new one). -- This message was sent by Atlassian JIRA (v7.6.3#76005)