[jira] [Commented] (IGNITE-17279) Mapping of partition states to nodes can erroneously skip lost partitions on the coordinator node
[ https://issues.apache.org/jira/browse/IGNITE-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563003#comment-17563003 ]

Vladislav Pyatkov commented on IGNITE-17279:
--------------------------------------------

LGTM

> Mapping of partition states to nodes can erroneously skip lost partitions on the coordinator node
> --------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-17279
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17279
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Minor
>             Fix For: 2.14
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> It seems that the coordinator node does not correctly update the node2part mapping for lost partitions.
> {noformat}
> [test-runner-#1%distributed.CachePartitionLostAfterSupplierHasLeftTest%][root] dump partitions state for :
> preload sync futures
>   nodeId=b57ca812-416d-40d7-bb4f-27199490 consistentId=distributed.CachePartitionLostAfterSupplierHasLeftTest0 isDone=true
>   nodeId=20fdfa4a-ddf6-4229-b25e-38cd8d31 consistentId=distributed.CachePartitionLostAfterSupplierHasLeftTest1 isDone=true
> rebalance futures
>   nodeId=b57ca812-416d-40d7-bb4f-27199490 isDone=true res=true topVer=null remaining: {}
>   nodeId=20fdfa4a-ddf6-4229-b25e-38cd8d31 isDone=true res=false topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0] remaining: {}
> partition state
>   localNodeId=b57ca812-416d-40d7-bb4f-27199490 grid=distributed.CachePartitionLostAfterSupplierHasLeftTest0
>     local part=0 counters=Counter [lwm=200, missed=[], maxApplied=200, hwm=200] fullSize=200 *state=LOST* reservations=0 isAffNode=true
>     nodeId=20fdfa4a-ddf6-4229-b25e-38cd8d31 part=0 *state=LOST* isAffNode=true
>     ...
>   localNodeId=20fdfa4a-ddf6-4229-b25e-38cd8d31 grid=distributed.CachePartitionLostAfterSupplierHasLeftTest1
>     local part=0 counters=Counter [lwm=0, missed=[], maxApplied=0, hwm=0] fullSize=100 *state=LOST* reservations=0 isAffNode=true
>     nodeId=b57ca812-416d-40d7-bb4f-27199490 part=0 *state=OWNING* isAffNode=true
>     ...
> {noformat}
>
> *Update*:
> The root cause of the issue is that the coordinator node incorrectly updates the mapping of nodes to partition states on PME (see GridDhtPartitionTopologyImpl.node2part). It seems to me that the coordinator node should set the partition state to LOST on all affinity nodes (if this partition is considered LOST on the coordinator) before creating and sending a "full map" message.
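To make the intent of the *Update* above concrete, here is a minimal sketch of the described adjustment: before the coordinator builds the full map, every affinity node of a partition that the coordinator considers LOST gets forced to LOST in the node-to-partition-states mapping. The types below (the enum and plain maps) are simplified stand-ins for illustration only, not the actual GridDhtPartitionTopologyImpl internals.
{code:java}
import java.util.*;

public class LostPartitionMappingSketch {
    enum PartitionState { MOVING, OWNING, LOST }

    /**
     * Force LOST for every affinity node of a lost partition, overriding a
     * stale OWNING entry (the situation seen in the log above, where one node
     * still reported part=0 state=OWNING while the partition was LOST).
     *
     * @param node2part Per-node partition states as seen by the coordinator.
     * @param lostParts Partitions the coordinator itself considers LOST.
     * @param affinity  Affinity assignment: partition -> nodes that must host it.
     */
    static void markLostOnAffinityNodes(
        Map<UUID, Map<Integer, PartitionState>> node2part,
        Set<Integer> lostParts,
        Map<Integer, List<UUID>> affinity
    ) {
        for (Integer part : lostParts) {
            for (UUID nodeId : affinity.getOrDefault(part, List.of())) {
                node2part.computeIfAbsent(nodeId, id -> new HashMap<>())
                    .put(part, PartitionState.LOST);
            }
        }
    }

    public static void main(String[] args) {
        UUID nodeA = UUID.randomUUID(); // coordinator: sees part 0 as LOST
        UUID nodeB = UUID.randomUUID(); // stale entry: still OWNING part 0

        Map<UUID, Map<Integer, PartitionState>> node2part = new HashMap<>();
        node2part.put(nodeA, new HashMap<>(Map.of(0, PartitionState.LOST)));
        node2part.put(nodeB, new HashMap<>(Map.of(0, PartitionState.OWNING)));

        markLostOnAffinityNodes(node2part, Set.of(0), Map.of(0, List.of(nodeA, nodeB)));

        // Both affinity nodes now report LOST, so the full map sent by the
        // coordinator no longer skips the lost partition on nodeB.
        System.out.println(node2part);
    }
}
{code}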
[jira] [Commented] (IGNITE-17279) Mapping of partition states to nodes can erroneously skip lost partitions on the coordinator node
[ https://issues.apache.org/jira/browse/IGNITE-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562295#comment-17562295 ]

Vyacheslav Koptilin commented on IGNITE-17279:
-----------------------------------------------

Hi [~v.pyatkov], could you please take a look?
[jira] [Commented] (IGNITE-17279) Mapping of partition states to nodes can erroneously skip lost partitions on the coordinator node
[ https://issues.apache.org/jira/browse/IGNITE-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562294#comment-17562294 ]

Ignite TC Bot commented on IGNITE-17279:
----------------------------------------

{panel:title=Branch: [pull/10126/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/10126/head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6657616&buildTypeId=IgniteTests24Java8_RunAll]