[ https://issues.apache.org/jira/browse/KUDU-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Wong updated KUDU-2950:
------------------------------

> Support restarting nodes in batches
> -----------------------------------
>
>                 Key: KUDU-2950
>                 URL: https://issues.apache.org/jira/browse/KUDU-2950
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Andrew Wong
>            Priority: Major

Description:

Once Kudu has the building blocks to orchestrate a rolling restart, it'd be great if we could support restarting multiple nodes at a time.

Location awareness would play a crucial role in this because, if used to identify rack placement, we could bring down an entire rack at a time if we wanted. If we did this, though, during the controlled restart of a given rack, Kudu would be more vulnerable to the _unexpected_ downtime of another rack.

One approach would be to support something like [HDFS's upgrade domains|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUpgradeDomain.html]:

{quote}The idea is to group datanodes in a new dimension called upgrade domain, in addition to the existing rack-based grouping. For example, we can assign all datanodes in the first position of any rack to upgrade domain ud_01, nodes in the second position to upgrade domain ud_02 and so on.
...
By default, 3 replicas of any given block are placed on 3 different upgrade domains. This means all datanodes belonging to a specific upgrade domain collectively won't store more than one replica of any block.{quote}

Decoupling physical groups from restartable groups should make batch restarts more robust to rack failures.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
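For illustration, the upgrade-domain idea quoted above can be sketched roughly as follows. This is a hypothetical sketch, not Kudu or HDFS code: all names (`assign_upgrade_domains`, `violates_placement`, the rack/tablet-server layout) are invented for the example. It shows the two pieces the description relies on: deriving a domain from a node's position within its rack, and checking the invariant that no domain holds more than one replica of a given tablet.

```python
from collections import Counter


def assign_upgrade_domains(racks):
    """Map each node to an upgrade domain based on its position in its rack.

    racks: dict of rack name -> ordered list of node names.
    Returns dict of node name -> upgrade domain (e.g. "ud_01").
    """
    domains = {}
    for nodes in racks.values():
        for position, node in enumerate(nodes):
            # Nodes in the same slot of different racks share a domain,
            # so one domain can be restarted as a batch across all racks.
            domains[node] = f"ud_{position + 1:02d}"
    return domains


def violates_placement(replica_nodes, domains):
    """True if any upgrade domain holds more than one replica of a tablet."""
    counts = Counter(domains[n] for n in replica_nodes)
    return any(c > 1 for c in counts.values())


# Hypothetical 3x3 cluster layout.
racks = {
    "rack1": ["ts-1", "ts-2", "ts-3"],
    "rack2": ["ts-4", "ts-5", "ts-6"],
    "rack3": ["ts-7", "ts-8", "ts-9"],
}
domains = assign_upgrade_domains(racks)

# ts-1, ts-4, ts-7 all land in ud_01, so they form one restartable batch.
print(domains["ts-1"], domains["ts-4"], domains["ts-7"])  # ud_01 ud_01 ud_01

# Replicas spread over three domains are safe to restart one domain at a time.
print(violates_placement(["ts-1", "ts-5", "ts-9"], domains))  # False
# Two replicas in ud_01 would both go down when that domain is restarted.
print(violates_placement(["ts-1", "ts-4", "ts-9"], domains))  # True
```

Restarting a whole domain then takes down at most one replica of any tablet, even though the domain spans every rack, which is why a batch restart keyed on domains stays robust to the unexpected loss of a rack.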