[ https://issues.apache.org/jira/browse/IGNITE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Chugunov updated IGNITE-8828:
------------------------------------
Remaining Estimate: 264h (was: 240h)
Original Estimate: 264h (was: 240h)

> Detecting and stopping unresponsive nodes during Partition Map Exchange
> -----------------------------------------------------------------------
>
> Key: IGNITE-8828
> URL: https://issues.apache.org/jira/browse/IGNITE-8828
> Project: Ignite
> Issue Type: Improvement
> Components: general
> Reporter: Sergey Chugunov
> Priority: Major
> Original Estimate: 264h
> Remaining Estimate: 264h
>
> During the PME process, the coordinator (1) gathers local partition maps from
> all nodes and (2) sends the calculated full partition map back to all nodes in
> the topology.
> However, if one or more nodes fail to send their local information in step 1
> for any reason, the PME process hangs, blocking all operations. The only
> solution is to manually identify and stop the nodes that failed to send their
> information to the coordinator.
> This should be done by the coordinator itself: if it does not receive local
> partition maps from some nodes in time, it should check that stopping these
> nodes won't lead to data loss and then stop them forcibly.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
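The coordinator-side check proposed in this issue could be sketched roughly as follows. This is a minimal, hypothetical sketch, not Ignite code. The class `PmeStuckNodeCheck`, its methods, the use of plain string node IDs, and the partition-to-owners map are all assumptions made for illustration. It also ignores partition states, the timeout mechanism itself, and how a node would actually be stopped. It shows only the decision step: find the nodes whose local partition maps did not arrive in time, then stop them only if every partition still has at least one owner among the remaining nodes.

```java
import java.util.*;

/** Hypothetical sketch of the decision step; not part of Ignite's API. */
public class PmeStuckNodeCheck {

    /** Nodes expected in step 1 whose local partition maps did not arrive before the timeout. */
    static Set<String> unresponsiveNodes(Set<String> expected, Set<String> received) {
        Set<String> missing = new TreeSet<>(expected);
        missing.removeAll(received);
        return missing;
    }

    /**
     * Returns true if every partition keeps at least one owner after the given
     * nodes are stopped, i.e. stopping them cannot lose data.
     * Assumption: owners maps partition id -> IDs of nodes holding a copy.
     */
    static boolean safeToStop(Set<String> toStop, Map<Integer, Set<String>> owners) {
        for (Set<String> partOwners : owners.values()) {
            Set<String> surviving = new HashSet<>(partOwners);
            surviving.removeAll(toStop);
            if (surviving.isEmpty())
                return false; // this partition would lose its last copy
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> expected = new HashSet<>(Arrays.asList("A", "B", "C"));
        Set<String> received = new HashSet<>(Arrays.asList("A", "B")); // C timed out
        Map<Integer, Set<String>> owners = new HashMap<>();
        owners.put(0, new HashSet<>(Arrays.asList("A", "C")));
        owners.put(1, new HashSet<>(Arrays.asList("B", "C")));

        Set<String> stuck = unresponsiveNodes(expected, received);
        if (!stuck.isEmpty() && safeToStop(stuck, owners))
            System.out.println("Stopping " + stuck); // prints "Stopping [C]"
        else if (!stuck.isEmpty())
            System.out.println("Not stopping " + stuck + ": would lose data");
    }
}
```

If partition 1 were owned only by `C`, `safeToStop` would return false and, under this sketch's assumption, the coordinator would leave the node running rather than lose data. What to do in that case (keep waiting, fail the exchange, alert an operator) is not specified in the issue.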