[jira] [Commented] (HDFS-4600) HDFS file append failing in multinode cluster

Uma Maheswara Rao G (JIRA) Wed, 13 Mar 2013 21:40:15 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602020#comment-13602020
 ]


Uma Maheswara Rao G commented on HDFS-4600:
-------------------------------------------

If the cluster has only 2 nodes, we recommend disabling the 
replace-datanode-on-failure feature, because there is never a spare datanode 
to replace a failed one with.

Look at the comment for the configuration in hdfs-default.xml:
{quote}
If there is a datanode/network failure in the write pipeline, DFSClient will 
try to remove the failed datanode from the pipeline and then continue writing 
with the remaining datanodes. As a result, the number of datanodes in the 
pipeline is decreased. The feature is to add new datanodes to the pipeline. 
This is a site-wide property to enable/disable the feature. When the cluster 
size is extremely small, e.g. 3 nodes or less, cluster administrators may want 
to set the policy to NEVER in the default configuration file or disable this 
feature. Otherwise, users may experience an unusually high rate of pipeline 
failures since it is impossible to find new datanodes for replacement. See also 
dfs.client.block.write.replace-datanode-on-failure.policy 
{quote}

To disable it, set 
dfs.client.block.write.replace-datanode-on-failure.enable to false.

Here the existing/selected nodes will be only 2 while replication is still set 
to 3? If so, I think the policy condition is satisfied and the client tries to 
add another datanode to the pipeline. Alternatively, you can set replication 
to 2, since your cluster has at most 2 nodes. If replication is less than 3, 
this policy is not used. With smaller clusters it is always recommended to 
disable this feature.
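
To make the reasoning above concrete, here is a rough sketch of how I 
understand the DEFAULT policy condition. It is a simplified model written for 
illustration, not the actual Hadoop code: the class name, method name, and 
parameters are all hypothetical, and the rule itself (replication of at least 
3, and either half or fewer of the replicas left or an append/hflush with 
fewer datanodes than the replication factor) is my reading of the policy 
description, so please check it against hdfs-default.xml for your version.

```java
// Simplified, hypothetical model of the DEFAULT replace-datanode-on-failure
// policy. This is NOT the Hadoop implementation; names are illustrative only.
class ReplaceDatanodePolicySketch {

    /**
     * Returns true if, under the assumed DEFAULT rule, the client would try
     * to add a replacement datanode to the write pipeline.
     *
     * @param replication       replication factor of the file
     * @param existing          number of datanodes currently in the pipeline
     * @param appendOrHflushed  whether the block is being appended or was hflushed
     */
    static boolean wouldTryToAddDatanode(int replication, int existing,
                                         boolean appendOrHflushed) {
        // Assumed rule: the policy is not used when replication is below 3.
        if (replication < 3) {
            return false;
        }
        // Assumed rule: half or fewer of the replicas remain in the pipeline.
        if (existing <= replication / 2) {
            return true;
        }
        // Assumed rule: append/hflush with fewer datanodes than replicas.
        return appendOrHflushed && replication > existing;
    }

    public static void main(String[] args) {
        // Reported case: 2 datanodes, replication 3, append.
        System.out.println(wouldTryToAddDatanode(3, 2, true));  // true
        // Same cluster with replication lowered to 2.
        System.out.println(wouldTryToAddDatanode(2, 2, true));  // false
    }
}
```

In the reported setup the first case applies: the client looks for a third 
datanode, and a 2-node cluster has none to offer, which would explain the 
"no more good datanodes being available to try" error. With replication set 
to 2, the sketch never asks for a replacement.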
                
> HDFS file append failing in multinode cluster
> ---------------------------------------------
>
>                 Key: HDFS-4600
>                 URL: https://issues.apache.org/jira/browse/HDFS-4600
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.3-alpha
>            Reporter: Roman Shaposhnik
>            Priority: Blocker
>             Fix For: 2.0.4-alpha
>
>         Attachments: core-site.xml, hdfs-site.xml, X.java
>
>
> NOTE: the following only happens in a fully distributed setup (core-site.xml 
> and hdfs-site.xml are attached)
> Steps to reproduce:
> {noformat}
> $ javac -cp /usr/lib/hadoop/client/\* X.java
> $ echo aaaaa > a.txt
> $ hadoop fs -ls /tmp/a.txt
> ls: `/tmp/a.txt': No such file or directory
> $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt
> 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> Exception in thread "main" java.io.IOException: Failed to replace a bad 
> datanode on the existing pipeline due to no more good datanodes being 
> available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[10.10.37.16:50010, 10.80.134.126:50010], 
> original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {noformat}
> Given that the file actually does get created:
> {noformat}
> $ hadoop fs -ls /tmp/a.txt
> Found 1 items
> -rw-r--r--   3 root hadoop          6 2013-03-13 16:05 /tmp/a.txt
> {noformat}
> this feels like a regression in APPEND's functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
