[jira] [Commented] (IGNITE-17279) Mapping of partition states to nodes can erroneously skip lost partitions on the coordinator node

2022-07-06 Thread Vladislav Pyatkov (Jira)


[ https://issues.apache.org/jira/browse/IGNITE-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563003#comment-17563003 ]

Vladislav Pyatkov commented on IGNITE-17279:


LGTM

> Mapping of partition states to nodes can erroneously skip lost partitions on 
> the coordinator node
> -
>
> Key: IGNITE-17279
> URL: https://issues.apache.org/jira/browse/IGNITE-17279
> Project: Ignite
> Issue Type: Bug
> Reporter: Vyacheslav Koptilin
> Assignee: Vyacheslav Koptilin
> Priority: Minor
> Fix For: 2.14
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> It seems that a coordinator node does not correctly update node2part mapping 
> for lost partitions. 
> {noformat}
> [test-runner-#1%distributed.CachePartitionLostAfterSupplierHasLeftTest%][root] dump partitions state for :
> preload sync futures
> nodeId=b57ca812-416d-40d7-bb4f-27199490 consistentId=distributed.CachePartitionLostAfterSupplierHasLeftTest0 isDone=true
> nodeId=20fdfa4a-ddf6-4229-b25e-38cd8d31 consistentId=distributed.CachePartitionLostAfterSupplierHasLeftTest1 isDone=true
> rebalance futures
> nodeId=b57ca812-416d-40d7-bb4f-27199490 isDone=true res=true topVer=null
> remaining: {}
> nodeId=20fdfa4a-ddf6-4229-b25e-38cd8d31 isDone=true res=false topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0]
> remaining: {}
> partition state
> localNodeId=b57ca812-416d-40d7-bb4f-27199490 grid=distributed.CachePartitionLostAfterSupplierHasLeftTest0
> local part=0 counters=Counter [lwm=200, missed=[], maxApplied=200, hwm=200] fullSize=200 *state=LOST* reservations=0 isAffNode=true
>  nodeId=20fdfa4a-ddf6-4229-b25e-38cd8d31 part=0 *state=LOST* isAffNode=true
> ...
> localNodeId=20fdfa4a-ddf6-4229-b25e-38cd8d31 grid=distributed.CachePartitionLostAfterSupplierHasLeftTest1
> local part=0 counters=Counter [lwm=0, missed=[], maxApplied=0, hwm=0] fullSize=100 *state=LOST* reservations=0 isAffNode=true
>  nodeId=b57ca812-416d-40d7-bb4f-27199490 part=0 *state=OWNING* isAffNode=true
> ...
> {noformat}
> *Update*:
> The root cause of the issue is that the coordinator node incorrectly updates 
> the mapping of nodes to partition states during PME (see 
> GridDhtPartitionTopologyImpl.node2part). It seems to me that the coordinator 
> node should set the partition state to LOST on all affinity nodes (if the 
> partition is considered LOST on the coordinator) before creating and sending 
> the "full map" message.
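
A minimal sketch of that direction, using simplified stand-in types: the class, fields, and method names below are illustrative assumptions and do not mirror the real GridDhtPartitionTopologyImpl internals. The idea is that, before the coordinator builds the full map on PME, every affinity node of a partition that is LOST locally is also forced to LOST in node2part, so a stale OWNING entry cannot survive into the message.

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/**
 * Simplified model of the suggested coordinator-side behavior.
 * All names here are illustrative; they do not mirror the real Ignite internals.
 */
public class LostPartitionFullMapSketch {
    /** Subset of partition states relevant to this issue. */
    enum PartState { OWNING, MOVING, LOST }

    /** node2part: per-node partition states as tracked by the coordinator. */
    final Map<UUID, Map<Integer, PartState>> node2part = new HashMap<>();

    /** Partition states as seen locally on the coordinator. */
    final Map<Integer, PartState> localStates = new HashMap<>();

    /**
     * For every partition that is LOST on the coordinator, force LOST onto all
     * of its affinity nodes in node2part before the "full map" is assembled.
     */
    void markLostOnAllAffinityNodes(Map<Integer, List<UUID>> affinity) {
        localStates.forEach((part, state) -> {
            if (state != PartState.LOST)
                return;

            for (UUID nodeId : affinity.getOrDefault(part, List.of()))
                node2part.computeIfAbsent(nodeId, id -> new HashMap<>())
                    .put(part, PartState.LOST);
        });
    }

    public static void main(String[] args) {
        UUID coordinator = UUID.randomUUID();
        UUID supplier = UUID.randomUUID();

        LostPartitionFullMapSketch top = new LostPartitionFullMapSketch();

        // Reproduce the inconsistency from the dump: partition 0 is LOST locally,
        // yet node2part still records the other node's copy as OWNING.
        top.localStates.put(0, PartState.LOST);
        top.node2part.put(coordinator, new HashMap<>(Map.of(0, PartState.LOST)));
        top.node2part.put(supplier, new HashMap<>(Map.of(0, PartState.OWNING)));

        top.markLostOnAllAffinityNodes(Map.of(0, List.of(coordinator, supplier)));

        // Both affinity nodes now report LOST for partition 0.
        System.out.println(top.node2part);
    }
}
{code}

The main method reproduces the inconsistency visible in the dump above (one copy of partition 0 still recorded as OWNING) and shows it being corrected before the full map would be assembled.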



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-17279) Mapping of partition states to nodes can erroneously skip lost partitions on the coordinator node

2022-07-04 Thread Vyacheslav Koptilin (Jira)


[ https://issues.apache.org/jira/browse/IGNITE-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562295#comment-17562295 ]

Vyacheslav Koptilin commented on IGNITE-17279:
--

Hi [~v.pyatkov],

Could you please take a look?






[jira] [Commented] (IGNITE-17279) Mapping of partition states to nodes can erroneously skip lost partitions on the coordinator node

2022-07-04 Thread Ignite TC Bot (Jira)


[ https://issues.apache.org/jira/browse/IGNITE-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562294#comment-17562294 ]

Ignite TC Bot commented on IGNITE-17279:


{panel:title=Branch: [pull/10126/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/10126/head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6657616&buildTypeId=IgniteTests24Java8_RunAll]



