[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart

2019-07-23 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1284:

  Resolution: Fixed
   Fix Version/s: 0.4.1
Target Version/s:   (was: 0.5.0)
  Status: Resolved  (was: Patch Available)

Looks like this was committed to trunk. Resolving.

> Adjust default values of pipeline recovery for more resilient service restart
> 
>
> Key: HDDS-1284
> URL: https://issues.apache.org/jira/browse/HDDS-1284
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> As of now we have the following algorithm to handle node failures:
> 1. In case of a missing node, the leader of the pipeline or the SCM can 
> detect the missing heartbeats.
> 2. SCM will start to close the pipeline (CLOSING state) and try to close the 
> containers with the remaining nodes in the pipeline.
> 3. After 5 minutes the pipeline will be destroyed (CLOSED) and a new pipeline 
> can be created from the healthy nodes (one node can be part of only one 
> pipeline at a time).
> While this algorithm can work well on a big cluster, it doesn't provide very 
> good usability on small clusters:
> Use case 1:
> Given 3 nodes, in case of a service restart, if the restart takes more than 
> 90s, the pipeline will be moved to the CLOSING state. For the next 5 minutes 
> (ozone.scm.pipeline.destroy.timeout) the pipeline will remain in the CLOSING 
> state. As there are no more nodes and we can't assign the same node to two 
> different pipelines, the cluster will be unavailable for 5 minutes.
> Use case 2:
> Given 90 nodes and 30 pipelines, where all the pipelines are spread across 3 
> racks, let's stop one rack. As all the pipelines are affected, all the 
> pipelines will be moved to the CLOSING state. We have no free nodes, 
> therefore we need to wait 5 minutes before we can write any data to the cluster.
> These problems can be solved in multiple ways:
> 1.) Instead of waiting 5 minutes, destroy the pipeline when all the 
> containers are reported to be closed. (Most of the time this is enough, but 
> some container reports can be missing.)
> 2.) Support multi-raft and open a pipeline as soon as we have enough nodes 
> (even if the nodes already have CLOSING pipelines).
> Both options require more work on the pipeline management side. For 0.4.0 
> we can adjust the following parameters to get a better user experience:
> {code}
>   <property>
>     <name>ozone.scm.pipeline.destroy.timeout</name>
>     <value>60s</value>
>     <tag>OZONE, SCM, PIPELINE</tag>
>     <description>
>       Once a pipeline is closed, SCM should wait for the above configured time
>       before destroying a pipeline.
>     </description>
>   </property>
>   <property>
>     <name>ozone.scm.stale.node.interval</name>
>     <value>90s</value>
>     <tag>OZONE, MANAGEMENT</tag>
>     <description>
>       The interval for stale node flagging. Please
>       see ozone.scm.heartbeat.thread.interval before changing this value.
>     </description>
>   </property>
> {code}
> First of all, we can be more optimistic and mark a node as stale only after 5 
> minutes instead of 90s. 5 minutes should be enough most of the time to recover 
> the nodes.
> Second: we can decrease ozone.scm.pipeline.destroy.timeout. 
> Ideally the close command is sent by the SCM to the datanode with a heartbeat (HB). 
> Between two HBs we have enough time to close all the containers via Ratis. 
> With the next HB, the datanode can report the successful close. (If the 
> containers can't be closed, the SCM can manage the QUASI_CLOSED containers.)
> We need to wait 29 seconds (worst case) for the next HB, and 29+30 seconds 
> for the confirmation, so 66 seconds seems to be a safe choice (assuming that 
> 6 seconds is enough to process the report about the successful closing).
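> To make the proposal concrete, here is a rough sketch (not the committed patch, 
> just the values implied by the reasoning above) of how the adjusted defaults 
> could look in ozone-default.xml, assuming we settle on 5 minutes and 66 seconds:
> {code}
>   <!-- Sketch only: the 5m stale interval and 66s destroy timeout are taken
>        from the discussion above; the final committed values may differ. -->
>   <property>
>     <name>ozone.scm.stale.node.interval</name>
>     <value>5m</value>
>   </property>
>   <property>
>     <name>ozone.scm.pipeline.destroy.timeout</name>
>     <value>66s</value>
>   </property>
> {code}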






[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart

2019-07-23 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1284:

Status: Patch Available  (was: Reopened)







[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart

2019-03-21 Thread Xiaoyu Yao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-1284:
-
Fix Version/s: (was: 0.4.0)







[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart

2019-03-21 Thread Xiaoyu Yao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-1284:
-
Target Version/s: 0.5.0  (was: 0.4.0)







[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart

2019-03-15 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-1284:
-
Fix Version/s: 0.4.0







[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart

2019-03-15 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-1284:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)







[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart

2019-03-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-1284:
-
Labels: pull-request-available  (was: )







[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart

2019-03-14 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-1284:
---
Summary: Adjust default values of pipline recovery for more resilient 
service restart  (was: Adjust default values of pipline recovery for more 
resilient service restart )







[jira] [Updated] (HDDS-1284) Adjust default values of pipeline recovery for more resilient service restart

2019-03-14 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-1284:
---
Status: Patch Available  (was: Open)



