[jira] [Commented] (HDDS-3214) Unhealthy datanodes repeatedly participate in pipeline creation

2020-06-29 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148135#comment-17148135
 ] 

Arpit Agarwal commented on HDDS-3214:
-

Moved to 0.7.0.

> Unhealthy datanodes repeatedly participate in pipeline creation
> ---
>
> Key: HDDS-3214
> URL: https://issues.apache.org/jira/browse/HDDS-3214
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Assignee: Prashant Pogde
>Priority: Blocker
>  Labels: TriagePending, fault_injection
>
> Steps taken:
> 1) Mounted the noise injection FUSE on all datanodes.
> 2) Selected 1 datanode from each open pipeline (factor=3).
> 3) Injected WRITE FAILURE noise with error code ENOENT on the 
> "hdds.datanode.dir" path of each datanode selected in step 2.
> 4) Started a PUT key operation of size 32 MB.
>  
> Observations:
> 
>  # The commit failed and the pipelines were moved to the exclusion list.
>  # The client retries, and a new pipeline is created with the same set of 
> datanodes. Container creation fails because the WRITE FAILURE injection is 
> still present.
>  # The pipeline is closed and the process is repeated for 
> "ozone.client.max.retries" retries.
> Every time, the same set of datanodes is selected for pipeline creation, 
> including 1 unhealthy datanode. 
> Expectation: the pipeline should have been created from 3 healthy 
> datanodes that are available.
>  
> cc - [~ljain]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3214) Unhealthy datanodes repeatedly participate in pipeline creation

2020-03-17 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060823#comment-17060823
 ] 

Lokesh Jain commented on HDDS-3214:
---

There are three pipelines, and one node in each pipeline has had a fault 
injected. When a write on a datanode fails, the datanode sends a pipeline close 
action to the SCM. The SCM then destroys the pipeline and allocates a new one. 
At allocation time, the only nodes available in the SCM are the ones that 
belonged to the destroyed pipeline, so the new pipeline is created from the 
same nodes. This cycle repeats for every pipeline in the cluster, and as a 
result the client write never succeeds.

Ideally, the failing datanode should be flagged in the SCM until corrective 
action is taken, and the new pipeline should be created using only healthy 
datanodes.
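
The flagging idea above can be illustrated with a minimal sketch. It is not 
the SCM implementation: the class PipelinePlacementSketch, its methods 
(reportWriteFailure, markRepaired, allocatePipeline) and the datanode names 
dn1..dn6 are all hypothetical, chosen only to show how excluding flagged 
nodes at allocation time breaks the cycle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: these names are illustrative, not Ozone/SCM APIs.
final class PipelinePlacementSketch {
    // Number of datanodes per pipeline (factor=3 in the report).
    private final int factor;
    // Datanodes that reported a write failure and await corrective action.
    private final Set<String> flagged = new HashSet<>();

    PipelinePlacementSketch(int factor) {
        this.factor = factor;
    }

    // Called when a datanode's pipeline close action follows a write failure.
    void reportWriteFailure(String datanode) {
        flagged.add(datanode);
    }

    // Called once the corrective action has been taken.
    void markRepaired(String datanode) {
        flagged.remove(datanode);
    }

    // Picks the first `factor` free nodes that are not flagged.
    // Returns empty instead of reusing a flagged node.
    Optional<List<String>> allocatePipeline(List<String> freeNodes) {
        List<String> chosen = new ArrayList<>();
        for (String node : freeNodes) {
            if (flagged.contains(node)) {
                continue;
            }
            chosen.add(node);
            if (chosen.size() == factor) {
                return Optional.of(chosen);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        PipelinePlacementSketch scm = new PipelinePlacementSketch(3);
        scm.reportWriteFailure("dn1");
        scm.reportWriteFailure("dn4");

        // Only the nodes of one destroyed pipeline are free: two healthy
        // nodes are not enough, so no pipeline is allocated.
        System.out.println(scm.allocatePipeline(
            Arrays.asList("dn1", "dn2", "dn3")));

        // Nodes of two destroyed pipelines are free: a pipeline can be
        // built from healthy nodes only.
        System.out.println(scm.allocatePipeline(
            Arrays.asList("dn1", "dn2", "dn3", "dn4", "dn5", "dn6")));
    }
}
```

In this sketch, allocation fails fast when too few healthy nodes are free, 
rather than rebuilding the same pipeline around the faulty node. Whether the 
real fix should fail, wait, or rebalance nodes across pipelines is a design 
decision this example does not settle.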

> Unhealthy datanodes repeatedly participate in pipeline creation
> ---
>
> Key: HDDS-3214
> URL: https://issues.apache.org/jira/browse/HDDS-3214
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Priority: Major
>  Labels: fault_injection
>


