[ https://issues.apache.org/jira/browse/IGNITE-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563003#comment-17563003 ]
Vladislav Pyatkov commented on IGNITE-17279:
--------------------------------------------

LGTM

> Mapping of partition states to nodes can erroneously skip lost partitions on the coordinator node
> -------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-17279
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17279
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Minor
>             Fix For: 2.14
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> It seems that a coordinator node does not correctly update node2part mapping for lost partitions.
> {noformat}
> [test-runner-#1%distributed.CachePartitionLostAfterSupplierHasLeftTest%][root] dump partitions state for <default>:
> ----preload sync futures----
> nodeId=b57ca812-416d-40d7-bb4f-271994900000 consistentId=distributed.CachePartitionLostAfterSupplierHasLeftTest0 isDone=true
> nodeId=20fdfa4a-ddf6-4229-b25e-38cd8d300001 consistentId=distributed.CachePartitionLostAfterSupplierHasLeftTest1 isDone=true
> ----rebalance futures----
> nodeId=b57ca812-416d-40d7-bb4f-271994900000 isDone=true res=true topVer=null remaining: {}
> nodeId=20fdfa4a-ddf6-4229-b25e-38cd8d300001 isDone=true res=false topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0] remaining: {}
> ----partition state----
> localNodeId=b57ca812-416d-40d7-bb4f-271994900000 grid=distributed.CachePartitionLostAfterSupplierHasLeftTest0
> local part=0 counters=Counter [lwm=200, missed=[], maxApplied=200, hwm=200] fullSize=200 *state=LOST* reservations=0 isAffNode=true
> nodeId=20fdfa4a-ddf6-4229-b25e-38cd8d300001 part=0 *state=LOST* isAffNode=true
> ...
> localNodeId=20fdfa4a-ddf6-4229-b25e-38cd8d300001 grid=distributed.CachePartitionLostAfterSupplierHasLeftTest1
> local part=0 counters=Counter [lwm=0, missed=[], maxApplied=0, hwm=0] fullSize=100 *state=LOST* reservations=0 isAffNode=true
> nodeId=b57ca812-416d-40d7-bb4f-271994900000 part=0 *state=OWNING* isAffNode=true
> ...
> {noformat}
> *Update*:
> The root cause of the issue is that the coordinator node incorrectly updates the mapping of nodes to partition states on PME (see GridDhtPartitionTopologyImpl.node2part). It seems to me that the coordinator node should set the partition state to LOST on all affinity nodes (if the partition is considered LOST on the coordinator) before creating and sending a "full map" message.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
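The proposed fix in the *Update* can be sketched as follows. This is a simplified illustration, not Ignite's actual implementation: the real `GridDhtPartitionTopologyImpl.node2part` stores `GridDhtPartitionMap` objects keyed by node ID, whereas here the mapping, the `State` enum, and the `markLostOnAllAffinityNodes` helper are all hypothetical stand-ins chosen to show the idea — before the coordinator builds the "full map" message, every partition it considers LOST is forced to LOST for all affinity nodes, overriding any stale OWNING/MOVING entry such as the one visible in the log above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/**
 * Illustrative sketch of the proposed coordinator-side fix (simplified;
 * names and map shapes are hypothetical, not Ignite's real API).
 */
public class LostPartitionMapping {
    /** Simplified partition state (cf. GridDhtPartitionState in Ignite). */
    enum State { OWNING, MOVING, LOST }

    /**
     * Forces LOST state onto every affinity node for each partition the
     * coordinator considers lost, so the subsequent "full map" message
     * cannot advertise a stale OWNING/MOVING state for those partitions.
     *
     * @param node2part Per-node mapping of partition id to state.
     * @param lostParts Partitions the coordinator sees as LOST.
     * @param affinity  Affinity assignment: partition id to affinity nodes.
     */
    static void markLostOnAllAffinityNodes(
        Map<UUID, Map<Integer, State>> node2part,
        Set<Integer> lostParts,
        Map<Integer, List<UUID>> affinity
    ) {
        for (Integer part : lostParts) {
            for (UUID nodeId : affinity.getOrDefault(part, List.of())) {
                node2part
                    .computeIfAbsent(nodeId, id -> new HashMap<>())
                    .put(part, State.LOST); // Override any stale state.
            }
        }
    }
}
```

In the scenario from the log above, node `b57ca812...` would stop being reported as OWNING for partition 0 once this pass runs on the coordinator before the full map is sent.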