[ 
https://issues.apache.org/jira/browse/HDFS-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102689#comment-14102689
 ] 

Colin Patrick McCabe commented on HDFS-4257:
--------------------------------------------

Thanks for working on this again, Nicholas.

{code}
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
  <value>false</value>
  <description>
    Best effort means that the client will try to replace the failed datanode
    (provided that the policy is satisfied); however, it will continue the
    write operation even if the datanode replacement fails.

    Suppose the datanode replacement fails.
    false: An exception should be thrown so that the write will fail.
    true : The write should be resumed with the remaining datanodes.
  </description>
</property>
{code}

This description doesn't mention write pipeline recovery.  We should make it 
clear here that this setting applies to pipeline recovery.  I agree with 
Yongjun that we should also probably mention that best effort means the 
client may resume writing with fewer datanodes than configured, which may 
lead to data loss.
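Something along these lines might work (just a rough sketch of the wording, 
not final text; cross-referencing the related .enable and .policy keys is my 
own suggestion):

{code}
  <description>
    This property is only used when a datanode in the write pipeline fails
    and the client tries to recover the pipeline by replacing the failed
    datanode (see dfs.client.block.write.replace-datanode-on-failure.enable
    and dfs.client.block.write.replace-datanode-on-failure.policy).

    Best effort means that the client will try to replace the failed datanode
    during pipeline recovery (provided that the policy is satisfied), but it
    will continue the write even if the replacement fails.

    Suppose the datanode replacement fails.
    false: An exception is thrown and the write fails.
    true : The write is resumed with the remaining datanodes.  The pipeline
           may then contain fewer datanodes than configured, so the data is
           at a higher risk of being lost if those datanodes fail before the
           block is fully replicated.
  </description>
{code}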

{code}
+            if (DFSClient.LOG.isTraceEnabled()) {
+              DFSClient.LOG.trace("Failed to replace datanode", ioe);
+            }
{code}

Failure to replace a datanode is a very serious issue.  This ought to be 
{{LOG.error}} or {{LOG.warn}}, not {{LOG.trace}}.  Also, should the message 
mention that we are continuing only because best effort is configured?
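For example, something roughly like this (just a sketch; the exact wording is 
up to you):

{code}
DFSClient.LOG.warn("Failed to replace a datanode in the write pipeline; "
    + "continuing with the remaining datanodes since "
    + "dfs.client.block.write.replace-datanode-on-failure.best-effort "
    + "is enabled.", ioe);
{code}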

{code}
+    /** Is the condition satisfied? */
+    public boolean satisfy(final short replication,
+        final DatanodeInfo[] existings, final int n, final boolean isAppend,
+        final boolean isHflushed);
{code}
Can you add more JavaDoc for this class?  For example, we should document what 
the parameters are.
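Something like the following might work (the parameter meanings here are my 
reading of the existing callers, so please correct anything I got wrong):

{code}
    /**
     * Is the condition satisfied, i.e. should the failed datanode be replaced?
     *
     * @param replication the replication factor requested for the file
     * @param existings   the datanodes remaining in the write pipeline
     * @param n           the number of datanodes in existings
     * @param isAppend    whether the block is being appended to
     * @param isHflushed  whether hflush/hsync has been called on the stream
     * @return true if the failed datanode should be replaced
     */
    public boolean satisfy(final short replication,
        final DatanodeInfo[] existings, final int n, final boolean isAppend,
        final boolean isHflushed);
{code}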

> The ReplaceDatanodeOnFailure policies could have a forgiving option
> -------------------------------------------------------------------
>
>                 Key: HDFS-4257
>                 URL: https://issues.apache.org/jira/browse/HDFS-4257
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs-client
>    Affects Versions: 2.0.2-alpha
>            Reporter: Harsh J
>            Assignee: Tsz Wo Nicholas Sze
>            Priority: Minor
>         Attachments: h4257_20140325.patch, h4257_20140325b.patch, 
> h4257_20140326.patch, h4257_20140819.patch
>
>
> A similar question has previously come up over HDFS-3091 and friends, but the 
> essential problem is: "Why can't I write to my cluster of 3 nodes when I have 
> just 1 node available at a point in time?"
> The policies cover 4 options, with {{Default}} being the default:
> {{Disable}} -> Disables the whole replacement feature by throwing an error 
> (at the server), or acts as {{Never}} at the client.
> {{Never}} -> Never replaces a DN upon pipeline failures (not too desirable in 
> many cases).
> {{Default}} -> Replaces based on a few conditions, but its minimum never 
> drops to 1. We always fail if only one DN remains and no others can be added.
> {{Always}} -> Replaces no matter what; fails if it can't replace.
> Would it not make sense to have an option similar to Always/Default that, 
> despite _trying_ to replace, does not fail if it isn't possible to have > 1 
> DN in the pipeline? I think that is what the former write behavior was, and 
> what fit with the minimum allowed replication factor.
> Why is it grossly wrong to pass a write from a client for a block with just 1 
> remaining replica in the pipeline (the minimum of 1 grows with the 
> replication factor demanded by the write), when replication is taken care of 
> immediately afterwards? How often have we seen missing blocks arise from 
> allowing this combined with a big rack failure or two?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
