[ https://issues.apache.org/jira/browse/HDFS-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604329#comment-13604329 ]
Uma Maheswara Rao G commented on HDFS-4600:
-------------------------------------------

[~tucu00]
{quote}why does it work OK in a pseudo cluster setup then?{quote}
In pseudo-distributed mode the replication factor would be 1, since there is a single DN. So the append call does not try to add any new node to the pipeline, because the replication target is already met. In fully distributed mode replication defaults to 3, but here we have only 2 nodes in the cluster.

{quote}and yet in the very same scenario the plain write would be successful. All I am saying is that there's a surprising inconsistency here.{quote}
This feature was originally meant to check only for pipeline failures; see the property name dfs.client.block.write.replace-datanode-on-failure.enable. Additionally we check on append, since append is a chance to add nodes to the pipeline if there are fewer than the replication factor. If the NN is not giving enough nodes, it means either 1) the cluster does not have enough healthy nodes, or 2) the cluster does not have as many nodes as expected. In both cases we cannot replace failed nodes with new ones. For case #2, we do not recommend enabling this feature. For case #1, I don't think this will happen in any normal cluster with more nodes, because pipeline setup will succeed normally when nodes are available. It may fail later if there are network issues or crashes; at that time recovery will trigger, and this feature comes into the picture to avoid reducing the number of nodes in the pipeline.
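For case #2 above (a cluster with fewer nodes than the replication factor), the error message itself names the client-side knob. A minimal client-side hdfs-site.xml sketch, assuming one chooses to keep the feature on but never attempt replacement on such a small cluster (the values shown are illustrative for this scenario, not a general recommendation):

```xml
<configuration>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>true</value>
  </property>
  <property>
    <!-- NEVER: continue with the remaining pipeline instead of failing
         the append when no replacement datanode is available. -->
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>NEVER</value>
  </property>
</configuration>
```

Setting the enable property to false disables the check entirely; either way, on clusters with more nodes than the replication factor the DEFAULT policy is the safer choice.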
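For reference, the decision the DEFAULT policy makes (as described in the Hadoop documentation for dfs.client.block.write.replace-datanode-on-failure.policy) can be sketched roughly as follows. This is a simplified illustration, not the actual HDFS client code; the function name is hypothetical:

```python
def should_add_replacement(replication, n_existing, is_append, is_hflushed):
    """Sketch of the DEFAULT replace-datanode-on-failure decision.

    Assumption: condensed from the documented DEFAULT policy condition,
    not copied from the Hadoop source.
    """
    if replication < 3:
        # Pseudo-distributed setups (replication = 1) never trigger
        # replacement, which is why the report only reproduces on a
        # multinode cluster.
        return False
    # With replication >= 3, replace when the pipeline has shrunk to
    # half or less, or unconditionally on append/hflush -- the case hit
    # in this report (replication 3, only 2 datanodes, append).
    return n_existing <= replication // 2 or is_append or is_hflushed
```

In the reported scenario this returns True (replication 3, 2 live nodes, append), so the client looks for a third datanode, finds none, and fails with the IOException shown below; in pseudo mode it returns False and the append proceeds.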
> HDFS file append failing in multinode cluster
> ---------------------------------------------
>
>                 Key: HDFS-4600
>                 URL: https://issues.apache.org/jira/browse/HDFS-4600
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.3-alpha
>            Reporter: Roman Shaposhnik
>            Priority: Minor
>             Fix For: 2.0.4-alpha
>
>         Attachments: core-site.xml, hdfs-site.xml, X.java
>
>
> NOTE: the following only happens in a fully distributed setup (core-site.xml and hdfs-site.xml are attached)
>
> Steps to reproduce:
> {noformat}
> $ javac -cp /usr/lib/hadoop/client/\* X.java
> $ echo aaaaa > a.txt
> $ hadoop fs -ls /tmp/a.txt
> ls: `/tmp/a.txt': No such file or directory
> $ HADOOP_CLASSPATH=`pwd` hadoop X /tmp/a.txt
> 13/03/13 16:05:14 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> Exception in thread "main" java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> 13/03/13 16:05:14 ERROR hdfs.DFSClient: Failed to close file /tmp/a.txt
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[10.10.37.16:50010, 10.80.134.126:50010], original=[10.10.37.16:50010, 10.80.134.126:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:793)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:858)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:964)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:470)
> {noformat}
>
> Given that the file actually does get created:
> {noformat}
> $ hadoop fs -ls /tmp/a.txt
> Found 1 items
> -rw-r--r--   3 root hadoop          6 2013-03-13 16:05 /tmp/a.txt
> {noformat}
>
> this feels like a regression in APPEND's functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira