[jira] [Commented] (IGNITE-9309) LocalNodeMovingPartitionsCount metrics may calculates incorrect due to processFullPartitionUpdate

Pavel Kovalenko (JIRA) Thu, 23 Aug 2018 07:51:24 -0700


    [ https://issues.apache.org/jira/browse/IGNITE-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590338#comment-16590338 ]


Pavel Kovalenko commented on IGNITE-9309:
-----------------------------------------

The actual problem was introduced in 
https://issues.apache.org/jira/browse/IGNITE-8684 .

The key problem is that partition state changes now happen only after receiving 
a FullMap with an exchangeId (PME). There can be a race between handling a 
FullMap with exchangeId != null (PME) and a FullMap without an exchangeId. If a 
fresh FullMap without an exchangeId arrives before the one with an exchangeId, 
we overwrite our local partition states, and the FullMap with the exchangeId is 
then rejected as outdated. As a result, the partition states are not changed and 
rebalancing never starts.
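The race can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Ignite's implementation: the names `FullMap`, `LocalTopology`, `version` and `deliver` are invented for illustration, and a single monotonically increasing `version` is assumed to stand in for whatever Ignite uses to decide that a FullMap is outdated. It only encodes the two rules stated in the comment: outdated maps are rejected, and only a map carrying an exchangeId leads to partition state changes and rebalancing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the FullMap race; names are illustrative, not Ignite classes. */
public class FullMapRaceSketch {

    /** A full partition map; exchangeId == null means it was not produced by PME. */
    record FullMap(long version, String exchangeId) {}

    /** Local view of partition states, guarded by the last applied version. */
    static class LocalTopology {
        long lastAppliedVersion = 0;
        final List<String> log = new ArrayList<>();
        boolean rebalanceStarted = false;

        void onFullMap(FullMap msg) {
            // A map that is not newer than the already applied state is rejected as outdated.
            if (msg.version() <= lastAppliedVersion) {
                log.add("rejected " + describe(msg));
                return;
            }
            lastAppliedVersion = msg.version();
            log.add("applied " + describe(msg));
            // Per the comment, since IGNITE-8684 only a map with an exchangeId (PME)
            // changes partition states and therefore starts rebalancing.
            if (msg.exchangeId() != null) {
                rebalanceStarted = true;
            }
        }

        private static String describe(FullMap msg) {
            return "v" + msg.version() + (msg.exchangeId() == null ? " (no exchangeId)" : " (PME)");
        }
    }

    /** Delivers the messages in the given order to a fresh local topology. */
    static LocalTopology deliver(FullMap... msgs) {
        LocalTopology top = new LocalTopology();
        for (FullMap m : msgs) {
            top.onFullMap(m);
        }
        return top;
    }

    public static void main(String[] args) {
        FullMap pme = new FullMap(5, "exch-5");
        FullMap fresh = new FullMap(6, null);

        // Expected order: PME map first, then the newer map without an exchangeId.
        LocalTopology expected = deliver(pme, fresh);
        // Raced order: the newer map overtakes the PME map, which is then rejected.
        LocalTopology raced = deliver(fresh, pme);

        System.out.println("expected: " + expected.log + ", rebalance=" + expected.rebalanceStarted);
        System.out.println("raced:    " + raced.log + ", rebalance=" + raced.rebalanceStarted);
    }
}
```

In the raced order the PME map is dropped by the staleness check, so the only message that would have changed partition states is lost and rebalancing is never triggered, which matches the failure mode described above.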

> LocalNodeMovingPartitionsCount metrics may calculates incorrect due to 
> processFullPartitionUpdate
> -------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-9309
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9309
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.6
>            Reporter: Maxim Muzafarov
>            Priority: Major
>
> [~qvad] has found that the {{LocalNodeMovingPartitionsCount}} metric is 
> calculated incorrectly when client nodes join and leave ({{JOIN\LEFT}}). A full 
> reproducer for the issue is not available.
> Probable scenario:
> {code}
> Repeat 10 times:
> 1. stop node
> 2. clean lfs
> 3. add stopped node (trigger rebalance)
> 4. 3 times: start 2 clients, wait for topology snapshot, close clients
> 5. for each cache group, check the JMX metric LocalNodeMovingPartitionsCount 
> (like waitForFinishRebalance())
> {code}
> The whole discussion and all configuration details can be found in the comments 
> of [IGNITE-7165|https://issues.apache.org/jira/browse/IGNITE-7165].
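Step 5 of the scenario waits for the moving-partitions metric to drain, in the spirit of waitForFinishRebalance(). The sketch below is a hypothetical polling helper, not Ignite's test API: `MovingPartitionsWait` and `waitForZero` are invented names, and the `LongSupplier` is an assumed stand-in for reading a cache group's LocalNodeMovingPartitionsCount (for example over JMX). It returns false on timeout instead of hanging, which is the useful signal when the metric never reaches zero.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper in the spirit of waitForFinishRebalance(); not Ignite's test API. */
public class MovingPartitionsWait {

    /**
     * Polls a moving-partitions counter until it reaches zero or the timeout expires.
     *
     * @return true if the counter reached zero in time, false on timeout or interrupt.
     */
    static boolean waitForZero(LongSupplier movingPartitions, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (movingPartitions.getAsLong() != 0) {
            // nanoTime differences are safe against overflow, unlike direct comparison.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated counter that drops by one per poll, standing in for a completing rebalance.
        long[] moving = {3};
        boolean done = waitForZero(() -> moving[0] > 0 ? moving[0]-- : 0, 1_000, 10);
        System.out.println("drained: " + done);

        // Simulated counter stuck above zero, standing in for the incorrect metric.
        boolean stuck = waitForZero(() -> 2, 100, 10);
        System.out.println("drained: " + stuck);
    }
}
```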



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)