[ https://issues.apache.org/jira/browse/IGNITE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksey Plekhanov updated IGNITE-8828: -------------------------------------- Fix Version/s: 2.10 > Detecting and stopping unresponsive nodes during Partition Map Exchange > ----------------------------------------------------------------------- > > Key: IGNITE-8828 > URL: https://issues.apache.org/jira/browse/IGNITE-8828 > Project: Ignite > Issue Type: Improvement > Components: general > Reporter: Sergey Chugunov > Priority: Major > Labels: iep-25 > Fix For: 2.10 > > Original Estimate: 264h > Remaining Estimate: 264h > > During PME process coordinator (1) gathers local partition maps from all > nodes and (2) sends calculated full partition map back to all nodes in the > topology. > However if one or more nodes fail to send local information on step 1 for any > reason, PME process hangs blocking all operations. The only solution will be > to manually identify and stop nodes which failed to send info to coordinator. > This should be done by coordinator itself: in case it didn't receive in time > local partition maps from any nodes, it should check that stopping these > nodes won't lead to data loss and then stop them forcibly. -- This message was sent by Atlassian Jira (v8.3.4#803005)