[ https://issues.apache.org/jira/browse/IGNITE-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590338#comment-16590338 ]
Pavel Kovalenko commented on IGNITE-9309:
-----------------------------------------
The actual problem was introduced in
https://issues.apache.org/jira/browse/IGNITE-8684 .
The key problem is that partition state changes now happen only after receiving
a FullMap with exchangeId (PME). There can be a race between handling a FullMap
with exchangeId != null (PME) and a FullMap without exchangeId. If we receive a
fresh FullMap without exchangeId earlier than the one with exchangeId, we
override our local partition states, and the FullMap with exchangeId is then
rejected as outdated. As a result, the partition states are not changed and
rebalance never starts.
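The ordering problem can be illustrated with a heavily simplified, hypothetical model. None of the names below (FullMapRaceSketch, FullMap, LocalTopology, updateSeq, simulate) are Ignite classes or APIs; they are assumptions made only to show how a version check plus "state changes only on PME maps" loses the PME update when a fresher non-PME map arrives first.

```java
/**
 * Hypothetical, heavily simplified model of the FullMap race.
 * These classes do not exist in Ignite; they only illustrate the ordering problem.
 */
public class FullMapRaceSketch {

    /** Simplified full partition map; exchangeId == null means it was not sent as part of PME. */
    static final class FullMap {
        final long updateSeq;   // growing version used to detect outdated maps (assumption)
        final Long exchangeId;  // non-null only for maps sent on exchange (PME) completion

        FullMap(long updateSeq, Long exchangeId) {
            this.updateSeq = updateSeq;
            this.exchangeId = exchangeId;
        }
    }

    /** Simplified local view of the partition topology on one node. */
    static final class LocalTopology {
        long appliedSeq = 0;              // version of the last accepted full map
        boolean rebalanceStarted = false;

        /** Applies a full map; returns false if it was rejected as outdated. */
        boolean onFullMap(FullMap msg) {
            if (msg.updateSeq <= appliedSeq)
                return false;             // outdated: ignored entirely, including its exchangeId

            appliedSeq = msg.updateSeq;   // local partition states are overridden either way

            // Modelled after the comment: local partition state changes (and hence
            // rebalance) are applied only for full maps that carry an exchangeId.
            if (msg.exchangeId != null)
                rebalanceStarted = true;

            return true;
        }
    }

    /** Delivers a PME map (seq 1) and a fresher non-PME map (seq 2) in the chosen order. */
    static boolean simulate(boolean nonPmeFirst) {
        FullMap pme = new FullMap(1, 42L);
        FullMap fresh = new FullMap(2, null);

        FullMap[] delivery = nonPmeFirst
            ? new FullMap[] {fresh, pme}
            : new FullMap[] {pme, fresh};

        LocalTopology top = new LocalTopology();
        for (FullMap msg : delivery)
            top.onFullMap(msg);

        return top.rebalanceStarted;
    }

    public static void main(String[] args) {
        System.out.println("PME map first:     rebalanceStarted=" + simulate(false));
        System.out.println("non-PME map first: rebalanceStarted=" + simulate(true));
    }
}
```

Running main prints rebalanceStarted=true for the first ordering and rebalanceStarted=false for the second: once the fresher non-PME map has been applied, the PME map fails the version check, so its state changes are never applied.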
> LocalNodeMovingPartitionsCount metrics may be calculated incorrectly due to
> processFullPartitionUpdate
> -------------------------------------------------------------------------------------------------
>
> Key: IGNITE-9309
> URL: https://issues.apache.org/jira/browse/IGNITE-9309
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.6
> Reporter: Maxim Muzafarov
> Priority: Major
>
> [~qvad] found that the {{LocalNodeMovingPartitionsCount}} metric is calculated
> incorrectly when client nodes join or leave ({{JOIN}}/{{LEFT}}). A full reproducer is not available.
> Probable scenario:
> {code}
> Repeat 10 times:
> 1. stop node
> 2. clean LFS
> 3. start the stopped node again (triggers rebalance)
> 4. 3 times: start 2 clients, wait for topology snapshot, close clients
> 5. for each cache group, check the JMX metric LocalNodeMovingPartitionsCount
> (like waitForFinishRebalance())
> {code}
> The whole discussion and all configuration details can be found in the comments of
> [IGNITE-7165|https://issues.apache.org/jira/browse/IGNITE-7165].
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)