[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart
[ https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDDS-1284:
    Resolution: Fixed
    Fix Version/s: 0.4.1
    Target Version/s: (was: 0.5.0)
    Status: Resolved (was: Patch Available)

Looks like this was committed to trunk. Resolving.

> Adjust default values of pipeline recovery for more resilient service restart
>
>                 Key: HDDS-1284
>                 URL: https://issues.apache.org/jira/browse/HDDS-1284
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.4.1
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> As of now we have the following algorithm to handle node failures:
> 1. In case of a missing node, the leader of the pipeline or the SCM can detect the missing heartbeats.
> 2. SCM starts to close the pipeline (CLOSING state) and tries to close the containers with the remaining nodes in the pipeline.
> 3. After 5 minutes the pipeline is destroyed (CLOSED) and a new pipeline can be created from the healthy nodes (one node can be part of only one pipeline at a time).
> While this algorithm can work well on a big cluster, it doesn't provide very good usability on small clusters:
> Use case 1:
> Given 3 nodes, in case of a service restart, if the restart takes more than 90s, the pipeline will be moved to the CLOSING state. For the next 5 minutes (ozone.scm.pipeline.destroy.timeout) the pipeline will remain in the CLOSING state. As there are no more nodes and we can't assign the same node to two different pipelines, the cluster will be unavailable for 5 minutes.
> Use case 2:
> Given 90 nodes and 30 pipelines where all the pipelines are spread across 3 racks, let's stop one rack. As all the pipelines are affected, all the pipelines will be moved to the CLOSING state. We have no free nodes, therefore we need to wait 5 minutes before we can write any data to the cluster.
> These problems can be solved in multiple ways:
> 1.) Instead of waiting 5 minutes, destroy the pipeline when all the containers are reported to be closed. (Most of the time this is enough, but some container reports can be missing.)
> 2.) Support multi-raft and open a pipeline as soon as we have enough nodes (even if the nodes already have CLOSING pipelines).
> Both options require more work on the pipeline management side. For 0.4.0 we can adjust the following parameters to get a better user experience:
> {code}
> <property>
>   <name>ozone.scm.pipeline.destroy.timeout</name>
>   <value>60s</value>
>   <tag>OZONE, SCM, PIPELINE</tag>
>   <description>
>     Once a pipeline is closed, SCM should wait for the above configured time
>     before destroying a pipeline.
>   </description>
> </property>
> <property>
>   <name>ozone.scm.stale.node.interval</name>
>   <value>90s</value>
>   <tag>OZONE, MANAGEMENT</tag>
>   <description>
>     The interval for stale node flagging. Please see
>     ozone.scm.heartbeat.thread.interval before changing this value.
>   </description>
> </property>
> {code}
> First of all, we can be more optimistic and mark a node as stale only after 5 minutes instead of 90s. 5 minutes should be enough most of the time to recover the nodes.
> Second, we can decrease ozone.scm.pipeline.destroy.timeout. Ideally the close command is sent by the SCM to the datanode with a heartbeat (HB). Between two HBs we have enough time to close all the containers via Ratis. With the next HB, the datanode can report the successful closing. (If the containers can't be closed, the SCM can manage the QUASI_CLOSED containers.)
> We need to wait 29 seconds (worst case) for the next HB, and 29+30 seconds for the confirmation. --> 66 seconds seems to be a safe choice (assuming that 6 seconds is enough to process the report about the successful closing).

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
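The worst-case timing argument above can be checked with a small sketch. This is a hypothetical illustration, not project code; the 30-second heartbeat interval and the 6-second processing margin are the description's own assumptions:

```python
# Worst-case time for SCM to confirm that a datanode closed its containers,
# following the reasoning in the issue description.

HB_INTERVAL = 30  # seconds; heartbeat interval assumed in the description

# The close command rides on a heartbeat, so in the worst case the datanode
# receives it just under one full interval after the pipeline enters CLOSING.
wait_for_close_command = HB_INTERVAL - 1                      # 29s worst case

# The datanode reports the successful close on the *next* heartbeat.
wait_for_confirmation = wait_for_close_command + HB_INTERVAL  # 29 + 30 = 59s

# Margin assumed for SCM to process the container report.
processing_margin = 6

worst_case = wait_for_confirmation + processing_margin        # 65s

# A 66s ozone.scm.pipeline.destroy.timeout therefore covers the worst case,
# well under the previous 300s (5 minute) default.
print(worst_case)  # 65
```

The point of the arithmetic is that the destroy timeout only needs to span two heartbeat round-trips plus a processing margin, so the 5-minute default is far more conservative than necessary.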
[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart
[ https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDDS-1284:
    Status: Patch Available (was: Reopened)
[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart
[ https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated HDDS-1284:
    Fix Version/s: (was: 0.4.0)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart
[ https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated HDDS-1284:
    Target Version/s: 0.5.0 (was: 0.4.0)
[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart
[ https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajay Kumar updated HDDS-1284:
    Fix Version/s: 0.4.0
[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart
[ https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajay Kumar updated HDDS-1284:
    Resolution: Fixed
    Status: Resolved (was: Patch Available)
[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart
[ https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-1284:
    Labels: pull-request-available (was: )
[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart
[ https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elek, Marton updated HDDS-1284:
    Summary: Adjust default values of pipeline recovery for more resilient service restart (was: Adjust default values of pipeline recovery for more resilient service restart )
[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart
[ https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elek, Marton updated HDDS-1284:
    Status: Patch Available (was: Open)