[ https://issues.apache.org/jira/browse/HDDS-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephen O'Donnell updated HDDS-2592:
------------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Add Datanode command to allow the datanode to persist its admin state
> ----------------------------------------------------------------------
>
>                 Key: HDDS-2592
>                 URL: https://issues.apache.org/jira/browse/HDDS-2592
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: Ozone Datanode, SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the operational state of a datanode changes, an async command should be
> triggered to persist the new state on the datanode. For maintenance mode,
> the datanode should also store the maintenance end time. The datanode will
> then report the new state (and optional maintenance end time) back via its
> heartbeat.
>
> The purpose of the DN persisting this information and heartbeating it back to
> SCM is to allow the operational state to be recovered after an SCM restart,
> as SCM does not persist any of this information. It also allows "Recon" to
> learn the datanode states.
>
> If SCM is restarted, then it will forget all knowledge of the datanodes. When
> they register, their operational state will be reported and SCM can set it
> correctly.
>
> Outside of registration (i.e. during normal heartbeats), the SCM state is the
> source of truth for the operational state. If the DN heartbeat reports a
> state that is not the same as SCM's, SCM should issue another command to the
> datanode to set its state to the SCM value. There is a chance the state
> mismatch is due to an unprocessed command triggered by the SCM state change,
> but the worst case is an extra command sent to the datanode. This is a very
> lightweight command, so that is not an issue.
>
> One open question is whether to persist intermediate states on the DN. That
> is, for decommissioning, the DN would first persist "Decommissioning" and
> then transition to "Decommissioned" when SCM is satisfied all containers are
> replicated. It would be quite easy to persist both of these states in turn
> on the datanode. Alternatively, we could set only the end state
> (Decommissioned) on the datanode and allow SCM to get the node to that state.
> For the latter, if SCM is restarted, then the DN will report "Decommissioned"
> on registration, but SCM will set its internal state to Decommissioning and
> then ensure all containers are replicated before transitioning the node to
> Decommissioned. This seems like a safer approach, but there are advantages to
> tracking the intermediate states on the DNs too.
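
To make the reconciliation rules in the description concrete, here is a minimal Python sketch. It is not the Ozone implementation: the names OpState, NodeState, Scm, persisted_form, register, set_state and heartbeat are all hypothetical, and it assumes the "persist only the end state" variant of the open question. It shows SCM adopting the reported state on registration (restarting a decommission from Decommissioning) and treating its own state as the source of truth on later heartbeats.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class OpState(Enum):
    # Hypothetical state names, chosen for illustration only.
    IN_SERVICE = 1
    DECOMMISSIONING = 2
    DECOMMISSIONED = 3
    IN_MAINTENANCE = 4


@dataclass(frozen=True)
class NodeState:
    state: OpState
    maintenance_end: Optional[int] = None  # epoch seconds, maintenance only


def persisted_form(s: NodeState) -> NodeState:
    """Assumed variant: the datanode stores the end state, never the intermediate one."""
    if s.state is OpState.DECOMMISSIONING:
        return NodeState(OpState.DECOMMISSIONED, s.maintenance_end)
    return s


class Scm:
    def __init__(self) -> None:
        # Held in memory only, so a restart forgets every datanode.
        self.nodes: Dict[str, NodeState] = {}

    def register(self, dn_id: str, reported: NodeState) -> NodeState:
        """At registration the datanode's report seeds SCM's state."""
        if reported.state is OpState.DECOMMISSIONED:
            # Re-verify replication before trusting the end state.
            current = NodeState(OpState.DECOMMISSIONING)
        else:
            current = reported
        self.nodes[dn_id] = current
        return current

    def set_state(self, dn_id: str, state: OpState,
                  maintenance_end: Optional[int] = None) -> NodeState:
        """Change the state in SCM; return what the datanode should persist."""
        self.nodes[dn_id] = NodeState(state, maintenance_end)
        return persisted_form(self.nodes[dn_id])

    def heartbeat(self, dn_id: str, reported: NodeState) -> Optional[NodeState]:
        """SCM wins on mismatch: return the state to resend, or None if in sync."""
        expected = persisted_form(self.nodes[dn_id])
        return None if reported == expected else expected


scm = Scm()
scm.register("dn1", NodeState(OpState.IN_SERVICE))
command = scm.set_state("dn1", OpState.IN_MAINTENANCE, maintenance_end=1700000000)
# A heartbeat sent before the command is processed still reports the old state,
# so SCM simply hands back the same lightweight command again.
resend = scm.heartbeat("dn1", NodeState(OpState.IN_SERVICE))
```

Under this variant the heartbeat check compares against the persisted form of SCM's state, so a node that SCM considers Decommissioning while the datanode reports Decommissioned is not treated as a mismatch. Persisting the intermediate states instead would reduce persisted_form to the identity function.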