[ 
https://issues.apache.org/jira/browse/HDFS-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992347#comment-12992347
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1606:
----------------------------------------------

h5. When to add a datanode?
Since adding a datanode to an existing pipeline is an expensive operation (see 
[the previous 
comment|https://issues.apache.org/jira/browse/HDFS-1606?focusedCommentId=12991839&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12991839]),
 it should not be performed on every pipeline failure.  Suppose the replication 
factor of the file is greater than or equal to 3.  When a pipeline fails, 
a new datanode will be added only if
* the number of datanodes in the pipeline drops from 2 to 1; or
* the block is reopened for append; or
* it is specified by the user.

Note that when the replication factor is set to less than 3, the 
operation should not be invoked by default because performance is preferred 
over the data guarantee.
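
The conditions above can be sketched as a single decision function. This is only an illustration of the proposed rule, not code from the HDFS source tree: the class name, the method name and all parameters are hypothetical. It also assumes one reading of the note above, namely that an explicit user request still triggers the operation when the replication factor is below 3, since the operation is only disabled "by default" in that case.

```java
/**
 * Hypothetical sketch of the "when to add a datanode" rule.
 * None of these names come from the HDFS code base.
 */
final class ReplaceDatanodePolicySketch {
    private ReplaceDatanodePolicySketch() {}

    /**
     * @param replication       replication factor of the file
     * @param nodesBefore       datanodes in the pipeline before the failure
     * @param nodesAfter        datanodes remaining after the failed one is removed
     * @param reopenedForAppend true if the block is being reopened for append
     * @param userRequested     true if the user explicitly asked for a replacement
     * @return true if a new datanode should be added to the pipeline
     */
    static boolean shouldAddDatanode(int replication, int nodesBefore, int nodesAfter,
                                     boolean reopenedForAppend, boolean userRequested) {
        // Assumption: an explicit user request applies for any replication factor.
        if (userRequested) {
            return true;
        }
        // Below 3 replicas, performance is preferred, so never add by default.
        if (replication < 3) {
            return false;
        }
        // Pipeline dropped from 2 datanodes to 1.
        if (nodesBefore == 2 && nodesAfter == 1) {
            return true;
        }
        // Block reopened for append.
        return reopenedForAppend;
    }

    public static void main(String[] args) {
        System.out.println(shouldAddDatanode(3, 3, 2, false, false)); // false
        System.out.println(shouldAddDatanode(3, 2, 1, false, false)); // true
        System.out.println(shouldAddDatanode(3, 3, 2, true, false));  // true
        System.out.println(shouldAddDatanode(2, 2, 1, false, false)); // false
        System.out.println(shouldAddDatanode(2, 2, 1, false, true));  // true
    }
}
```

Keeping the rule in one pure function makes the cheap cases (a 3-to-2 drop with no append) easy to distinguish from the cases where the expensive transfer is justified.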

> Provide a stronger data guarantee in the write pipeline
> -------------------------------------------------------
>
>                 Key: HDFS-1606
>                 URL: https://issues.apache.org/jira/browse/HDFS-1606
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, hdfs client
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>
> In the current design, if there is a datanode/network failure in the write 
> pipeline, DFSClient will try to remove the failed datanode from the pipeline 
> and then continue writing with the remaining datanodes.  As a result, the 
> number of datanodes in the pipeline is decreased.  Unfortunately, it is 
> possible that DFSClient may incorrectly remove a healthy datanode but leave 
> the failed datanode in the pipeline because failure detection may be 
> inaccurate under erroneous conditions.
> We propose to have a new mechanism for adding new datanodes to the pipeline 
> in order to provide a stronger data guarantee.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
