[jira] [Commented] (HDFS-3091) Failed to add new DataNode in pipeline and will be resulted into write failure.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232474#comment-13232474 ]

liaowenrui commented on HDFS-3091:
----------------------------------

Yeah, I agree with him! Thank you for your answer! The design idea is good, but this implementation has a defect. I think this feature is meant to guarantee replication reliability. Assume the cluster size is 10 and the user sets the replication value to 10: when 9 replicas have been written and one of them goes bad, do you consider the write a success?

Failed to add new DataNode in pipeline and will be resulted into write failure.
-------------------------------------------------------------------------------

Key: HDFS-3091
URL: https://issues.apache.org/jira/browse/HDFS-3091
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node, hdfs client, name-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Uma Maheswara Rao G

When verifying the HDFS-1606 feature, I observed a couple of issues. Presently the ReplaceDatanodeOnFailure policy is satisfied even when the cluster does not have enough DNs to replace a failed one, which results in write failure.

{quote}
12/03/13 14:27:12 WARN hdfs.DFSClient: DataStreamer Exception java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[xx.xx.xx.xx:50010], original=[xx.xx.xx.xx1:50010]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416)
{quote}

Let's take some cases:
1) The replication factor is 3 and the cluster size is also 3, and unfortunately the pipeline drops to 1. ReplaceDatanodeOnFailure will be satisfied because *existings(1) <= replication/2 (3/2==1)*. But when it looks for a new node to replace the failed one, it obviously cannot find one, and the sanity check fails. This results in write failure.
2) The replication factor is 10 (the user accidentally sets the replication factor higher than the cluster size) and the cluster has only 5 datanodes. Here, even if only one node fails, the write fails for the same reason: the pipeline maximum is 5, and after one datanode is killed, existings becomes 4, so *existings(4) <= replication/2 (10/2==5)* is satisfied, and obviously it cannot replace the node since no extra nodes exist in the cluster. This results in write failure.
3) sync-related operations also fail in these situations (will post the clear scenarios)
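To make the arithmetic in cases 1) and 2) concrete, here is a minimal sketch of the replacement condition as the description states it. The class and method names are illustrative, not the actual DFSClient source.

{code:java}
// Sketch of the DEFAULT ReplaceDatanodeOnFailure condition as described
// above (hypothetical names, not the real Hadoop source). Replacement is
// attempted once the surviving pipeline size drops to replication/2 or
// below, with no check that the cluster still has a spare datanode.
public final class ReplaceOnFailureSketch {

  static boolean shouldReplace(int replication, int existings,
                               boolean isAppend, boolean isHflushed) {
    if (replication < 3) {
      return false;                      // short pipelines never replace
    }
    if (existings <= replication / 2) {
      return true;                       // cases 1) and 2) above
    }
    return isAppend || isHflushed;
  }

  public static void main(String[] args) {
    // Case 1): replication 3, 3-node cluster, pipeline drops to 1.
    System.out.println(shouldReplace(3, 1, false, false));   // true
    // Case 2): replication 10, 5-node cluster, one datanode lost.
    System.out.println(shouldReplace(10, 4, false, false));  // true
    // In both cases no spare datanode exists, so the later lookup fails
    // with the IOException quoted in the description.
  }
}
{code}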
[jira] [Created] (HDFS-3113) httpfs does not support delegation tokens
httpfs does not support delegation tokens
-----------------------------------------

Key: HDFS-3113
URL: https://issues.apache.org/jira/browse/HDFS-3113
Project: Hadoop HDFS
Issue Type: New Feature
Affects Versions: 0.23.3
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Fix For: 0.24.0, 0.23.3

httpfs does not support calls to get/renew tokens, nor delegation token authentication.
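For context, a minimal sketch of the WebHDFS-style token calls HttpFS would need to support; the host, port, and renewer below are placeholders, not tested endpoints.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of the delegation token operations missing from HttpFS
// (HDFS-3113). Host, port and renewer are placeholder values.
public class HttpFSTokenSketch {
  public static void main(String[] args) throws Exception {
    // WebHDFS-compatible operation to obtain a delegation token:
    URL get = new URL("http://httpfs-host:14000/webhdfs/v1/"
        + "?op=GETDELEGATIONTOKEN&renewer=alice");
    HttpURLConnection conn = (HttpURLConnection) get.openConnection();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      System.out.println(in.readLine());  // JSON containing the token
    }
    // Renewal would use op=RENEWDELEGATIONTOKEN&token=<urlString>,
    // and cancellation op=CANCELDELEGATIONTOKEN&token=<urlString>.
  }
}
{code}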
[jira] [Created] (HDFS-3114) Remove implementing Writable interface for the internal data types in HDFS
Remove implementing Writable interface for the internal data types in HDFS
---------------------------------------------------------------------------

Key: HDFS-3114
URL: https://issues.apache.org/jira/browse/HDFS-3114
Project: Hadoop HDFS
Issue Type: New Feature
Components: data-node, name-node
Affects Versions: 0.24.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas

With the changes done in 0.23 and trunk, there is a clear separation of wire types and implementation types. Given this, a lot of the Writable code associated with internal types can be removed.
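As a hypothetical, simplified illustration of the kind of code this would remove, consider an internal type that still implements Writable even though protobuf-based wire types now carry the data:

{code:java}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical simplified internal type. With protobuf translators doing
// the wire serialization, write()/readFields() are never invoked, so the
// Writable implementation is dead code that HDFS-3114 proposes to drop.
public class BlockInfoExample implements Writable {
  private long blockId;
  private long numBytes;

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(blockId);
    out.writeLong(numBytes);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    blockId = in.readLong();
    numBytes = in.readLong();
  }
}
{code}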
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232725#comment-13232725 ]

Milind Bhandarkar commented on HDFS-3107:
-----------------------------------------

This will be a great addition to HDFS for a couple of reasons:
1. Having an append without a truncate is a serious deficiency.
2. If a user mistakenly starts to append data to an existing large file and discovers the mistake, the only recourse is to recreate that file by rewriting the contents. This is very inefficient.

HDFS truncate
-------------

Key: HDFS-3107
URL: https://issues.apache.org/jira/browse/HDFS-3107
Project: Hadoop HDFS
Issue Type: New Feature
Components: data-node, name-node
Reporter: Lei Chang
Attachments: HDFS_truncate_semantics_Mar15.pdf
Original Estimate: 1,344h
Remaining Estimate: 1,344h

Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX operation), the reverse operation of append, which makes upper-layer applications resort to ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS.
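A sketch of the kind of API such a feature implies; the signature below is an assumption for illustration, not the semantics document's final proposal:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.Path;

// Illustrative only: truncate as the reverse of append. An aborted
// transaction records the pre-append length and rolls the file back to
// it, instead of rewriting the whole file.
public interface TruncatableFileSystem {
  /**
   * Truncate the file to the given length.
   * @return true if the file is immediately usable, false if the
   *         truncate is still being finalized (e.g. block recovery).
   */
  boolean truncate(Path src, long newLength) throws IOException;
}
{code}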
[jira] [Commented] (HDFS-3105) Add DatanodeStorage information to block recovery
[ https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232737#comment-13232737 ]

Suresh Srinivas commented on HDFS-3105:
---------------------------------------

Comments:
# Not sure how UpdateReplicaUnderRecoveryResponseProto can have storage instead of block? Also, do you need DatanodeStorage, or is just the storageID sufficient?
# Please do not update the service protocol version, as this is within a release. This is not used any more, and we need to clean it up at some point.

Add DatanodeStorage information to block recovery
-------------------------------------------------

Key: HDFS-3105
URL: https://issues.apache.org/jira/browse/HDFS-3105
Project: Hadoop HDFS
Issue Type: Sub-task
Components: data-node, hdfs client
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Attachments: h3105_20120315.patch, h3105_20120315b.patch, h3105_20120316.patch, h3105_20120316b.patch

When recovering a block, the namenode and client do not have the datanode storage information of the block, so the namenode cannot add the block to the corresponding datanode storage block list.
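Regarding the first review comment, a minimal sketch (with illustrative names, not the actual InterDatanodeProtocol or generated protobuf classes) of what returning just the storage ID from recovery would look like:

{code:java}
// Illustrative sketch of the change under discussion: block recovery
// reports back which storage holds the recovered replica, so that the
// namenode can file the block under the right DatanodeStorage. Per the
// review, a storage ID string is returned rather than a full
// DatanodeStorage object.
interface InterDatanodeRecoverySketch {
  /** @return the storage ID of the storage holding the replica. */
  String updateReplicaUnderRecovery(long blockId, long recoveryId,
                                    long newLength);
}
{code}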
[jira] [Created] (HDFS-3115) Update hdfs design doc to consider HA NNs
Update hdfs design doc to consider HA NNs
-----------------------------------------

Key: HDFS-3115
URL: https://issues.apache.org/jira/browse/HDFS-3115
Project: Hadoop HDFS
Issue Type: Bug
Components: documentation
Affects Versions: 0.24.0, 0.23.3
Reporter: Todd Lipcon
Priority: Minor

The hdfs_design_doc.xml still references the NN as an SPOF, which is no longer true. We should sweep the docs for anything else that is out of date with HA.
[jira] [Commented] (HDFS-3091) Failed to add new DataNode in pipeline and will be resulted into write failure.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232786#comment-13232786 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3091:
----------------------------------------------

> What if we log about the cluster size and recommend disabling this feature for smaller clusters? ...

Sure, let's add some comments for this.

> Assume the cluster size is 10 and the user sets the replication value to 10: when 9 replicas have been written and one of them goes bad, do you consider the write a success?

If the DEFAULT policy is used, the pipeline won't fail until the number of datanodes N drops to 5, as described in (2) in the description. In your example, if the user set replication to 18 and N drops to 9, the write should fail.
[jira] [Updated] (HDFS-3091) Failed to add new DataNode in pipeline and will be resulted into write failure.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-3091:
-----------------------------------------

Attachment: h3091_20120319.patch

h3091_20120319.patch: add comments for small clusters.
[jira] [Updated] (HDFS-3091) Failed to add new DataNode in pipeline and will be resulted into write failure.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-3091:
-----------------------------------------

Attachment: h3091_20120319.patch
[jira] [Updated] (HDFS-3091) Failed to add new DataNode in pipeline and will be resulted into write failure.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-3091:
-----------------------------------------

Attachment: (was: h3091_20120319.patch)
[jira] [Commented] (HDFS-3091) Failed to add new DataNode in pipeline and will be resulted into write failure.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232817#comment-13232817 ]

Uma Maheswara Rao G commented on HDFS-3091:
-------------------------------------------

Thanks a lot Nicholas. Patch looks good. +1
[jira] [Updated] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-3091:
--------------------------------------

Target Version/s: 0.24.0, 0.23.3 (was: 0.23.3, 0.24.0)
Summary: Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. (was: Failed to add new DataNode in pipeline and will be resulted into write failure.)
[jira] [Updated] (HDFS-3105) Add DatanodeStorage information to block recovery
[ https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-3105:
-----------------------------------------

Attachment: h3105_20120319.patch

Thanks Suresh for the review. h3105_20120319.patch: returns storageID instead of DatanodeStorage and reverts the versionID changes.
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3004:
------------------------------

Status: Patch Available (was: Open)

Implement Recovery Mode
-----------------------

Key: HDFS-3004
URL: https://issues.apache.org/jira/browse/HDFS-3004
Project: Hadoop HDFS
Issue Type: New Feature
Components: tools
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, HDFS-3004.019.patch, HDFS-3004__namenode_recovery_tool.txt

When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get into this situation. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and can result in downtime.

Recovery mode is initiated by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edit log, and then write out a new image. Then it will shut down. Unlike the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or by typing 'a' at one of the prompts.

I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is.
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232844#comment-13232844 ]

Uma Maheswara Rao G commented on HDFS-3091:
-------------------------------------------

Updated the title. Committed to trunk. Thanks Nicholas for the patch.
[jira] [Resolved] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G resolved HDFS-3091.
---------------------------------------

Resolution: Fixed
Assignee: Tsz Wo (Nicholas), SZE
Target Version/s: 0.24.0, 0.23.3 (was: 0.23.3, 0.24.0)
Hadoop Flags: Reviewed
[jira] [Updated] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-3091:
--------------------------------------

Target Version/s: 0.24.0, 0.23.3 (was: 0.23.3, 0.24.0)
Fix Version/s: 0.24.0
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232845#comment-13232845 ]

Uma Maheswara Rao G commented on HDFS-3091:
-------------------------------------------

Tomorrow, I will back-port this to the 0.23 branch as well.
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232849#comment-13232849 ]

Hudson commented on HDFS-3091:
------------------------------

Integrated in Hadoop-Common-trunk-Commit #1900 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1900/])
HDFS-3091. Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. Contributed by Nicholas. (Revision 1302624)

Result = SUCCESS
umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302624
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232854#comment-13232854 ]

Hudson commented on HDFS-3091:
------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1974 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1974/])
HDFS-3091. Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. Contributed by Nicholas. (Revision 1302624)

Result = SUCCESS
umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1302624
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232858#comment-13232858 ]

Uma Maheswara Rao G commented on HDFS-3091:
-------------------------------------------

Just merged to 0.23 also.
[jira] [Updated] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-3091:
-----------------------------------------

Target Version/s: 0.24.0, 0.23.3 (was: 0.23.3, 0.24.0)
Fix Version/s: 0.23.3
Issue Type: Improvement (was: Bug)
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232867#comment-13232867 ] Hudson commented on HDFS-3091: -- Integrated in Hadoop-Hdfs-0.23-Commit #693 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/693/]) Merge HDFS-3091. Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. Contributed by Nicholas. (Revision 1302633) Result = SUCCESS umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302633 Files : * /hadoop/common/branches/branch-0.23 * /hadoop/common/branches/branch-0.23/hadoop-common-project * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-auth * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/docs * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/core * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/native * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/secondary * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/hdfs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/conf * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-examples * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/c++ * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/block_forensics * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build-contrib.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/data_join * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/eclipse-plugin * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/index * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/vaidya * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/examples * 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/fs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/hdfs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/ipc * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/webapps/job * /hadoop/common/branches/branch-0.23/hadoop-project * /hadoop/common/branches/branch-0.23/hadoop-project/src/site Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. --- Key: HDFS-3091 URL: https://issues.apache.org/jira/browse/HDFS-3091 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, hdfs client, name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3091_20120319.patch When verifying the HDFS-1606 feature, observed a couple of issues.
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232871#comment-13232871 ] Hudson commented on HDFS-3091: -- Integrated in Hadoop-Common-0.23-Commit #702 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/702/]) Merge HDFS-3091. Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. Contributed by Nicholas. (Revision 1302633) Result = SUCCESS umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302633 Files : * /hadoop/common/branches/branch-0.23 * /hadoop/common/branches/branch-0.23/hadoop-common-project * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-auth * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/docs * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/core * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/native * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/secondary * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/hdfs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/conf * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-examples * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/c++ * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/block_forensics * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build-contrib.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/data_join * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/eclipse-plugin * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/index * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/vaidya * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/examples * 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/fs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/hdfs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/ipc * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/webapps/job * /hadoop/common/branches/branch-0.23/hadoop-project * /hadoop/common/branches/branch-0.23/hadoop-project/src/site Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. --- Key: HDFS-3091 URL: https://issues.apache.org/jira/browse/HDFS-3091 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, hdfs client, name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3091_20120319.patch When verifying the HDFS-1606 feature, observed a couple of issues.
[jira] [Commented] (HDFS-3105) Add DatanodeStorage information to block recovery
[ https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232873#comment-13232873 ] Suresh Srinivas commented on HDFS-3105: --- +1 for the patch. Add DatanodeStorage information to block recovery - Key: HDFS-3105 URL: https://issues.apache.org/jira/browse/HDFS-3105 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h3105_20120315.patch, h3105_20120315b.patch, h3105_20120316.patch, h3105_20120316b.patch, h3105_20120319.patch When recovering a block, the namenode and client do not have the datanode storage information of the block. So the namenode cannot add the block to the corresponding datanode storage block list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
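To make the change concrete, here is a minimal sketch of its shape, assuming illustrative names (the actual method and parameter names are defined by the attached patches, not by this sketch): block-recovery messages carry a storage identifier per target replica, so the namenode can file the recovered block under the right DatanodeStorage.
{code}
// Sketch only -- parameter names are assumptions, not the committed API.
// Types are the usual org.apache.hadoop.hdfs protocol classes.
void commitBlockSynchronization(
    ExtendedBlock block,
    long newGenerationStamp,
    long newLength,
    boolean closeFile,
    boolean deleteBlock,
    DatanodeID[] newTargets,
    String[] newTargetStorages  // new: storage ID holding each recovered replica
) throws IOException;
{code}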
[jira] [Commented] (HDFS-2386) with security enabled fsck calls lead to handshake_failure and hftp fails throwing the same exception in the logs
[ https://issues.apache.org/jira/browse/HDFS-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232875#comment-13232875 ] Joey Echeverria commented on HDFS-2386: --- From the testing I've been doing, it looks like KSSL won't work without at least one of the DES encryption types enabled (e.g. DES_CBC_CRC). This looks like it's caused by a bug in the JDK. Basically, AES and RC4 don't pad unless they encrypt a message whose length is not a multiple of the block size. However, the JDK assumes that the PreMasterSecret will be padded, and that the last byte of the decrypted secret is the length of the padding. When using AES or RC4, this ends up being an arbitrary byte and usually causes the JDK to end up with an invalid PreMasterSecret. In defense against this, the JDK generates a random secret, which then causes the handshake to fail later on. I need to do some more testing with another version of Kerberos, but I plan on filing a JDK bug. with security enabled fsck calls lead to handshake_failure and hftp fails throwing the same exception in the logs - Key: HDFS-2386 URL: https://issues.apache.org/jira/browse/HDFS-2386 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.205.0 Reporter: Arpit Gupta -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
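If anyone needs a stopgap while the JDK bug stands, a possible workaround is to keep a DES enctype enabled in krb5.conf. The fragment below is a sketch under that assumption, not a verified recipe; note that allow_weak_crypto is required on MIT krb5 1.8+ before DES enctypes will be used.
{code}
# krb5.conf fragment -- workaround sketch, adjust for your deployment
[libdefaults]
  allow_weak_crypto = true
  default_tkt_enctypes = des-cbc-crc des-cbc-md5
  default_tgs_enctypes = des-cbc-crc des-cbc-md5
  permitted_enctypes = des-cbc-crc des-cbc-md5
{code}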
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232885#comment-13232885 ] Hudson commented on HDFS-3091: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1908 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1908/]) HDFS-3091. Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. Contributed by Nicholas. (Revision 1302624) Result = ABORTED umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302624 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. --- Key: HDFS-3091 URL: https://issues.apache.org/jira/browse/HDFS-3091 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, hdfs client, name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3091_20120319.patch When verifying the HDFS-1606 feature, observed a couple of issues. Presently the ReplaceDatanodeOnFailure policy is satisfied even when the cluster has no spare DNs to replace with, which results in write failure. {quote} 12/03/13 14:27:12 WARN hdfs.DFSClient: DataStreamer Exception java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[xx.xx.xx.xx:50010], original=[xx.xx.xx.xx1:50010] at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416) {quote} Let's take some cases: 1) Replication factor 3 and cluster size also 3, and unfortunately the pipeline drops to 1. ReplaceDatanodeOnFailure will be satisfied because *existings(1) <= replication/2 (3/2 == 1)*. But when it tries to find a new node to replace with, it obviously cannot find one, and the sanity check will fail. This results in write failure. 2) Replication factor 10 (the user accidentally sets the replication factor higher than the cluster size), and the cluster has only 5 datanodes. Here even a single node failure makes the write fail for the same reason: the pipeline can be at most 5, and after one datanode is killed, existings will be 4. *existings(4) <= replication/2 (10/2 == 5)* will be satisfied, and it obviously cannot replace with a new node as no extra nodes exist in the cluster. This results in write failure. 3) sync-related operations also fail in these situations (will post the clear scenarios). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
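To make cases 1 and 2 concrete, here is a simplified sketch of the condition described above (not the exact ReplaceDatanodeOnFailure source): the policy is satisfied purely from the replication factor and the pipeline size, with no check that the cluster actually has a spare datanode to add.
{code}
// Simplified sketch of the replacement condition described in the report.
// Nothing here verifies that an extra, unused datanode actually exists.
static boolean shouldReplaceDatanode(short replication, int existings) {
  // case 1: replication=3, pipeline drops to 1 -> 1 <= 3/2 (==1), so replace
  // case 2: replication=10 on a 5-node cluster, one DN dies -> 4 <= 5, so replace
  return existings <= replication / 2;
}
{code}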
[jira] [Commented] (HDFS-3105) Add DatanodeStorage information to block recovery
[ https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232897#comment-13232897 ] Hadoop QA commented on HDFS-3105: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518930/h3105_20120319.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2034//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2034//console This message is automatically generated. Add DatanodeStorage information to block recovery - Key: HDFS-3105 URL: https://issues.apache.org/jira/browse/HDFS-3105 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h3105_20120315.patch, h3105_20120315b.patch, h3105_20120316.patch, h3105_20120316b.patch, h3105_20120319.patch When recovering a block, the namenode and client do not have the datanode storage information of the block. So the namenode cannot add the block to the corresponding datanode storage block list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3116) Typo in fetchdt error message
Typo in fetchdt error message - Key: HDFS-3116 URL: https://issues.apache.org/jira/browse/HDFS-3116 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.24.0 Reporter: Aaron T. Myers Priority: Trivial In {{DelegationTokenFetcher.java}} there's the following typo of the word exactly: {code} System.err.println("ERROR: Must specify exacltly one token file"); {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
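For reference, the corrected line would read:
{code}
System.err.println("ERROR: Must specify exactly one token file");
{code}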
[jira] [Commented] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232927#comment-13232927 ] Hadoop QA commented on HDFS-3004: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518778/HDFS-3004.019.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 24 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. -1 javac. The patch appears to cause tar ant target to fail. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader org.apache.hadoop.hdfs.server.namenode.TestNameNodeRecovery org.apache.hadoop.hdfs.server.namenode.TestEditLog +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2035//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2035//console This message is automatically generated. Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, HDFS-3004.019.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
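As a rough sketch of the flow the description outlines (every method name below is illustrative, not the API of the attached patches):
{code}
// Illustrative recovery-mode flow, per the description above.
void recover(boolean alwaysTakeFirstOption) throws IOException {
  FSImage image = loadLatestImage();            // 1. load the FSImage file
  EditLogInputStream edits = openEditLog();
  FSEditLogOp op;
  while ((op = edits.readOp()) != null) {       // 2. apply all the edits
    try {
      applyOp(image, op);
    } catch (IOException inconsistency) {
      // 3. on an inconsistency, prompt the operator; starting with '-f'
      //    (or answering 'a') takes the first option for all prompts
      if (!alwaysTakeFirstOption && !promptOperatorToContinue(inconsistency)) {
        throw inconsistency;
      }
    }
  }
  saveNewImage(image);                          // 4. write out a new image
  shutdown();                                   // 5. then shut down
}
{code}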
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232936#comment-13232936 ] Hudson commented on HDFS-3091: -- Integrated in Hadoop-Mapreduce-0.23-Commit #710 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/710/]) Merge HDFS-3091. Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. Contributed by Nicholas. (Revision 1302633) Result = ABORTED umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302633 Files : * /hadoop/common/branches/branch-0.23 * /hadoop/common/branches/branch-0.23/hadoop-common-project * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-auth * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/docs * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/core * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/native * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/secondary * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/hdfs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/conf * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-examples * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/c++ * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/block_forensics * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build-contrib.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/data_join * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/eclipse-plugin * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/index * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/vaidya * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/examples * 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/fs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/hdfs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/ipc * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/webapps/job * /hadoop/common/branches/branch-0.23/hadoop-project * /hadoop/common/branches/branch-0.23/hadoop-project/src/site Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. --- Key: HDFS-3091 URL: https://issues.apache.org/jira/browse/HDFS-3091 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, hdfs client, name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3091_20120319.patch When verifying the HDFS-1606 feature, observed a couple of issues.
[jira] [Updated] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-3094: -- Attachment: (was: HDFS-3094.docs.patch) add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format command prompts the user for a Y/N to set up the directories in the local file system. -force : namenode formats the directories without prompting. -nonInteractive : namenode format will return with an exit code of 1 if the dir exists. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232956#comment-13232956 ] Arpit Gupta commented on HDFS-3094: --- {code} -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.TestLeaseRecovery2 +1 contrib tests. The patch passed contrib unit tests. {code} I reran the test class multiple times and it went through. I have also created HADOOP-8185 for the documentation changes to trunk. add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format command prompts the user for a Y/N to set up the directories in the local file system. -force : namenode formats the directories without prompting. -nonInteractive : namenode format will return with an exit code of 1 if the dir exists. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
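Based on the option descriptions above, usage would look something like this (a sketch, not output captured from the patch):
{code}
# format without prompting
bin/hadoop namenode -format -force

# fail instead of prompting: exits with code 1 if the dirs already exist
bin/hadoop namenode -format -nonInteractive
echo $?
{code}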
[jira] [Updated] (HDFS-3105) Add DatanodeStorage information to block recovery
[ https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3105: - Resolution: Fixed Fix Version/s: 0.23.3 0.24.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I have committed this to trunk and 0.23. Add DatanodeStorage information to block recovery - Key: HDFS-3105 URL: https://issues.apache.org/jira/browse/HDFS-3105 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3105_20120315.patch, h3105_20120315b.patch, h3105_20120316.patch, h3105_20120316b.patch, h3105_20120319.patch When recovering a block, the namenode and client do not have the datanode storage information of the block. So the namenode cannot add the block to the corresponding datanode storage block list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3105) Add DatanodeStorage information to block recovery
[ https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232960#comment-13232960 ] Hudson commented on HDFS-3105: -- Integrated in Hadoop-Hdfs-trunk-Commit #1976 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1976/]) HDFS-3105. Add DatanodeStorage information to block recovery. (Revision 1302683) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302683 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/InterDatanodeProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/InterDatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestInterDatanodeProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java Add DatanodeStorage information to block recovery - Key: HDFS-3105 URL: https://issues.apache.org/jira/browse/HDFS-3105 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3105_20120315.patch, h3105_20120315b.patch, h3105_20120316.patch, h3105_20120316b.patch, h3105_20120319.patch When recovering a block, the namenode and client do not have the datanode storage information of the block. So namenode cannot add the block to the corresponding datanode storge block list. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3105) Add DatanodeStorage information to block recovery
[ https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232963#comment-13232963 ] Hudson commented on HDFS-3105: -- Integrated in Hadoop-Common-trunk-Commit #1902 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1902/]) HDFS-3105. Add DatanodeStorage information to block recovery. (Revision 1302683) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302683 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/InterDatanodeProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/InterDatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestInterDatanodeProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java Add DatanodeStorage information to block recovery - Key: HDFS-3105 URL: https://issues.apache.org/jira/browse/HDFS-3105 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3105_20120315.patch, h3105_20120315b.patch, h3105_20120316.patch, h3105_20120316b.patch, h3105_20120319.patch When recovering a block, the namenode and client do not have the datanode storage information of the block. So namenode cannot add the block to the corresponding datanode storge block list. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3105) Add DatanodeStorage information to block recovery
[ https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232976#comment-13232976 ] Hudson commented on HDFS-3105: -- Integrated in Hadoop-Common-0.23-Commit #704 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/704/]) svn merge -c 1302683 from trunk for HDFS-3105. (Revision 1302685) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302685 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/InterDatanodeProtocol.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/InterDatanodeProtocol.proto * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestInterDatanodeProtocol.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java Add DatanodeStorage information to block recovery - Key: HDFS-3105 URL: https://issues.apache.org/jira/browse/HDFS-3105 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3105_20120315.patch, h3105_20120315b.patch, 
h3105_20120316.patch, h3105_20120316b.patch, h3105_20120319.patch When recovering a block, the namenode and client do not have the datanode storage information of the block. So the namenode cannot add the block to the corresponding datanode storage block list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3105) Add DatanodeStorage information to block recovery
[ https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232977#comment-13232977 ] Hudson commented on HDFS-3105: -- Integrated in Hadoop-Hdfs-0.23-Commit #695 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/695/]) svn merge -c 1302683 from trunk for HDFS-3105. (Revision 1302685) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302685 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/InterDatanodeProtocol.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/InterDatanodeProtocol.proto * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestInterDatanodeProtocol.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java Add DatanodeStorage information to block recovery - Key: HDFS-3105 URL: https://issues.apache.org/jira/browse/HDFS-3105 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3105_20120315.patch, h3105_20120315b.patch, 
h3105_20120316.patch, h3105_20120316b.patch, h3105_20120319.patch When recovering a block, the namenode and client do not have the datanode storage information of the block. So the namenode cannot add the block to the corresponding datanode storage block list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-309) FSEditLog should log progress during replay
[ https://issues.apache.org/jira/browse/HDFS-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232982#comment-13232982 ] Todd Lipcon commented on HDFS-309: -- Hi Sho. This patch fell out of date when we merged the HA branch, I believe. Would you mind updating it against the current trunk? FSEditLog should log progress during replay --- Key: HDFS-309 URL: https://issues.apache.org/jira/browse/HDFS-309 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Sho Shimauchi Labels: newbie Attachments: HDFS-309.txt, HDFS-309.txt, HDFS-309.txt When the NameNode is replaying a long edit log, it's handy to have reports on how far through it is, so you can judge how much time is remaining. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
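The desired behavior is presumably something along these lines; a minimal sketch with illustrative names, not the actual edit-loading code:
{code}
// Sketch: report progress every N ops while replaying a long edit log.
// 'in' is the edit log input stream; names here are illustrative.
long totalLength = in.length();
long numEdits = 0;
FSEditLogOp op;
while ((op = in.readOp()) != null) {
  applyEditLogOp(op);
  if (++numEdits % 10000 == 0) {
    LOG.info("Replayed " + numEdits + " edits ("
        + (100 * in.getPosition() / totalLength) + "% complete)");
  }
}
{code}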
[jira] [Updated] (HDFS-2983) Relax the build version check to permit rolling upgrades within a release
[ https://issues.apache.org/jira/browse/HDFS-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2983: -- Target Version/s: 1.1.0, 0.23.2 (was: 0.23.2) Relax the build version check to permit rolling upgrades within a release - Key: HDFS-2983 URL: https://issues.apache.org/jira/browse/HDFS-2983 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.23.0 Reporter: Eli Collins Currently the version check for DN/NN communication is strict (it checks the exact svn revision or git hash; Storage#getBuildVersion calls VersionInfo#getRevision), which prevents rolling upgrades across any releases. Once we have the PB-based RPC in place (coming soon to branch-23) we'll have the necessary pieces in place to loosen this restriction, though perhaps it will take another 0.23 minor release or so before we're ready to commit to making the minor versions compatible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
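In other words, something like the following relaxation (a sketch of the idea, not committed code): compare release lines instead of exact build revisions.
{code}
// Sketch: today the check is essentially
//   nnBuildVersion.equals(dnBuildVersion)   // exact revision match
// A relaxed check could accept any two builds of the same release line:
static boolean isVersionCompatible(String nnVersion, String dnVersion) {
  String[] nn = nnVersion.split("\\.");
  String[] dn = dnVersion.split("\\.");
  // same major.minor, e.g. any two 0.23.x builds may interoperate
  return nn[0].equals(dn[0]) && nn[1].equals(dn[1]);
}
{code}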
[jira] [Commented] (HDFS-3105) Add DatanodeStorage information to block recovery
[ https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232991#comment-13232991 ] Hudson commented on HDFS-3105: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1910 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1910/]) HDFS-3105. Add DatanodeStorage information to block recovery. (Revision 1302683) Result = ABORTED szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302683 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/InterDatanodeProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/InterDatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestInterDatanodeProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java Add DatanodeStorage information to block recovery - Key: HDFS-3105 URL: https://issues.apache.org/jira/browse/HDFS-3105 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3105_20120315.patch, h3105_20120315b.patch, h3105_20120316.patch, h3105_20120316b.patch, h3105_20120319.patch When recovering a block, the namenode and client do not have the datanode storage information of the block. So namenode cannot add the block to the corresponding datanode storge block list. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3105) Add DatanodeStorage information to block recovery
[ https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233002#comment-13233002 ] Hudson commented on HDFS-3105: -- Integrated in Hadoop-Mapreduce-0.23-Commit #711 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/711/]) svn merge -c 1302683 from trunk for HDFS-3105. (Revision 1302685) Result = ABORTED szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302685 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/InterDatanodeProtocol.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/InterDatanodeProtocol.proto * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestInterDatanodeProtocol.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java Add DatanodeStorage information to block recovery - Key: HDFS-3105 URL: https://issues.apache.org/jira/browse/HDFS-3105 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3105_20120315.patch, h3105_20120315b.patch, 
h3105_20120316.patch, h3105_20120316b.patch, h3105_20120319.patch When recovering a block, the namenode and client do not have the datanode storage information of the block. So the namenode cannot add the block to the corresponding datanode storage block list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3083) HA+security: failed to run a mapred job from yarn after a manual failover
[ https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3083: - Attachment: HDFS-3083-combined.patch Here's a patch which addresses the issue. It includes changes in both HDFS and Common projects, so test-patch isn't going to work. I can create separate JIRAs if folks want, but I figure reviewing it would be easier as a single patch. No tests are included since security has to be enabled to verify the fix. To test it out, I ran the DT test script attached to HDFS-2904, with the following extra test case appended: {code} # Token issued by nn2 should work when nn2 still active kinit -k -t ~/keytabs/$ADMIN.keytab $ADMIN/simon kinit -R hdfs haadmin -failover nn1 nn2 rm -f /tmp/token hdfs fetchdt --renewer $RENEWER /tmp/token kdestroy HADOOP_TOKEN_FILE_LOCATION=/tmp/token hadoop fs -ls / {code} All of the tests in the test script passed with this patch applied. The above test fails without the patch, and passes with it. I also successfully ran some MR jobs with the second-listed NN in the active state, and confirmed that everything worked as expected. HA+security: failed to run a mapred job from yarn after a manual failover - Key: HDFS-3083 URL: https://issues.apache.org/jira/browse/HDFS-3083 Project: Hadoop HDFS Issue Type: Bug Components: ha, security Affects Versions: 0.24.0, 0.23.3 Reporter: Mingjie Lai Assignee: Aaron T. Myers Priority: Critical Fix For: 0.24.0, 0.23.3 Attachments: HDFS-3083-combined.patch Steps to reproduce: - turned on ha and security - run a mapred job, and wait to finish - failover to another namenode - run the mapred job again, it fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Mankude updated HDFS-2802: --- Attachment: snapshot-one-pager.pdf Support for RW/RO snapshots in HDFS --- Key: HDFS-2802 URL: https://issues.apache.org/jira/browse/HDFS-2802 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Affects Versions: 0.24.0 Reporter: Hari Mankude Assignee: Hari Mankude Attachments: snapshot-one-pager.pdf Snapshots are point-in-time images of parts of the filesystem or the entire filesystem. Snapshots can be a read-only or a read-write point-in-time copy of the filesystem. There are several use cases for snapshots in HDFS. I will post a detailed write-up soon with more information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233044#comment-13233044 ] Hari Mankude commented on HDFS-2802: Uploaded the one-pager. A more detailed design doc and the first version of the patch are in the works. Support for RW/RO snapshots in HDFS --- Key: HDFS-2802 URL: https://issues.apache.org/jira/browse/HDFS-2802 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Affects Versions: 0.24.0 Reporter: Hari Mankude Assignee: Hari Mankude Attachments: snapshot-one-pager.pdf Snapshots are point-in-time images of parts of the filesystem or the entire filesystem. Snapshots can be a read-only or a read-write point-in-time copy of the filesystem. There are several use cases for snapshots in HDFS. I will post a detailed write-up soon with more information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3100) failed to append data using webhdfs
[ https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3100: - Description: STEP: 1, deploy a single-node HDFS 0.23.1 cluster and configure HDFS as follows: A) enable webhdfs B) enable append C) disable permissions 2, start HDFS 3, run the attached test script RESULT: expected: a file named testFile should be created and populated with 32K * 5000 zeros, and HDFS should be OK. I got: the script could not finish; the file was created but not populated as expected, because the append operation failed. The datanode log shows that the block scanner reported a bad replica and the namenode decided to delete it. Since it is a single-node cluster, the append fails. It makes no sense that the script fails every time. Datanode and namenode logs are attached. was: STEP: 1, deploy a single-node HDFS 0.23.1 cluster and configure HDFS as follows: A) enable webhdfs B) enable append C) disable permissions 2, start HDFS 3, run the attached test script RESULT: expected: a file named testFile should be created and populated with 32K * 5000 zeros, and HDFS should be OK. I got: the script could not finish; the file was created but not populated as expected, because the append operation failed. The datanode log shows that the block scanner reported a bad replica and the namenode decided to delete it. Since it is a single-node cluster, the append fails. It makes no sense that the script fails every time. Datanode and namenode logs are attached. Assignee: Brandon Li (was: Tsz Wo (Nicholas), SZE) failed to append data using webhdfs --- Key: HDFS-3100 URL: https://issues.apache.org/jira/browse/HDFS-3100 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.1 Reporter: Zhanwei.Wang Assignee: Brandon Li Attachments: hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, test.sh, testAppend.patch STEP: 1, deploy a single-node HDFS 0.23.1 cluster and configure HDFS as follows: A) enable webhdfs B) enable append C) disable permissions 2, start HDFS 3, run the attached test script RESULT: expected: a file named testFile should be created and populated with 32K * 5000 zeros, and HDFS should be OK. I got: the script could not finish; the file was created but not populated as expected, because the append operation failed. The datanode log shows that the block scanner reported a bad replica and the namenode decided to delete it. Since it is a single-node cluster, the append fails. It makes no sense that the script fails every time. Datanode and namenode logs are attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
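Since test.sh itself is only attached to the JIRA, here is a hedged sketch of the kind of loop it presumably runs, expressed in Java against the documented WebHDFS REST protocol (create the file once, then repeated two-step appends: the namenode answers 307 with a Location header pointing at a datanode, and the data is posted there). The host, port, and file name are assumptions; with permissions disabled, as in the reproduction steps, no user.name parameter is needed.
{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsAppendTest {
  static HttpURLConnection open(String url, String method) throws Exception {
    HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
    c.setRequestMethod(method);
    c.setInstanceFollowRedirects(false);  // we follow the 307 ourselves
    return c;
  }

  public static void main(String[] args) throws Exception {
    String nn = "http://localhost:50070/webhdfs/v1/testFile";
    byte[] chunk = new byte[32 * 1024];   // 32K of zeros per iteration

    // Create the empty file once: PUT op=CREATE to the NN, then PUT to the DN.
    HttpURLConnection c = open(nn + "?op=CREATE&overwrite=true", "PUT");
    HttpURLConnection dn = open(c.getHeaderField("Location"), "PUT");
    dn.setDoOutput(true);
    dn.getOutputStream().close();
    if (dn.getResponseCode() != 201) throw new RuntimeException("create failed");

    for (int i = 0; i < 5000; i++) {
      // POST op=APPEND to the NN, then POST the bytes to the redirected DN.
      c = open(nn + "?op=APPEND", "POST");
      dn = open(c.getHeaderField("Location"), "POST");
      dn.setDoOutput(true);
      OutputStream out = dn.getOutputStream();
      out.write(chunk);
      out.close();
      if (dn.getResponseCode() != 200) {
        throw new RuntimeException("append failed at iteration " + i);
      }
    }
  }
}
{code}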
[jira] [Updated] (HDFS-3100) failed to append data using webhdfs
[ https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-3100: - Affects Version/s: 0.24.0 failed to append data using webhdfs --- Key: HDFS-3100 URL: https://issues.apache.org/jira/browse/HDFS-3100 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.24.0, 0.23.1 Reporter: Zhanwei.Wang Assignee: Brandon Li Attachments: hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, test.sh, testAppend.patch STEP: 1, deploy a single-node HDFS 0.23.1 cluster and configure HDFS as follows: A) enable webhdfs B) enable append C) disable permissions 2, start HDFS 3, run the attached test script RESULT: expected: a file named testFile should be created and populated with 32K * 5000 zeros, and HDFS should be OK. I got: the script could not finish; the file was created but not populated as expected, because the append operation failed. The datanode log shows that the block scanner reported a bad replica and the namenode decided to delete it. Since it is a single-node cluster, the append fails. It makes no sense that the script fails every time. Datanode and namenode logs are attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3089) Move FSDatasetInterface and other related classes/interfaces to a package
[ https://issues.apache.org/jira/browse/HDFS-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3089: - Attachment: svn_mv.sh h3089_20120319_svn_mv.patch
svn_mv.sh: a script to run svn mv
h3089_20120319_svn_mv.patch: updated with trunk.
Move FSDatasetInterface and other related classes/interfaces to a package - Key: HDFS-3089 URL: https://issues.apache.org/jira/browse/HDFS-3089 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h3089_20120316_svn_mv.patch, h3089_20120319_svn_mv.patch, svn_mv.sh -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3100) failed to append data using webhdfs
[ https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-3100: - Attachment: HDFS-3100.patch Attached patch for the trunk. failed to append data using webhdfs --- Key: HDFS-3100 URL: https://issues.apache.org/jira/browse/HDFS-3100 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.24.0, 0.23.1 Reporter: Zhanwei.Wang Assignee: Brandon Li Attachments: HDFS-3100.patch, hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, test.sh, testAppend.patch STEP: 1, deploy a single-node HDFS 0.23.1 cluster and configure HDFS as follows: A) enable webhdfs B) enable append C) disable permissions 2, start HDFS 3, run the attached test script RESULT: expected: a file named testFile should be created and populated with 32K * 5000 zeros, and HDFS should be OK. I got: the script could not finish; the file was created but not populated as expected, because the append operation failed. The datanode log shows that the block scanner reported a bad replica and the namenode decided to delete it. Since it is a single-node cluster, the append fails. It makes no sense that the script fails every time. Datanode and namenode logs are attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3100) failed to append data using webhdfs
[ https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3100: - Status: Patch Available (was: Open) failed to append data using webhdfs --- Key: HDFS-3100 URL: https://issues.apache.org/jira/browse/HDFS-3100 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.1, 0.24.0 Reporter: Zhanwei.Wang Assignee: Brandon Li Attachments: HDFS-3100.patch, hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, test.sh, testAppend.patch STEP: 1, deploy a single-node HDFS 0.23.1 cluster and configure HDFS as follows: A) enable webhdfs B) enable append C) disable permissions 2, start HDFS 3, run the attached test script RESULT: expected: a file named testFile should be created and populated with 32K * 5000 zeros, and HDFS should be OK. I got: the script could not finish; the file was created but not populated as expected, because the append operation failed. The datanode log shows that the block scanner reported a bad replica and the namenode decided to delete it. Since it is a single-node cluster, the append fails. It makes no sense that the script fails every time. Datanode and namenode logs are attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3100) failed to append data using webhdfs
[ https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233068#comment-13233068 ] Hadoop QA commented on HDFS-3100: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518984/HDFS-3100.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 11 new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2036//console This message is automatically generated. failed to append data using webhdfs --- Key: HDFS-3100 URL: https://issues.apache.org/jira/browse/HDFS-3100 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.24.0, 0.23.1 Reporter: Zhanwei.Wang Assignee: Brandon Li Attachments: HDFS-3100.patch, hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, test.sh, testAppend.patch STEP: 1, deploy a single-node HDFS 0.23.1 cluster and configure HDFS as follows: A) enable webhdfs B) enable append C) disable permissions 2, start HDFS 3, run the attached test script RESULT: expected: a file named testFile should be created and populated with 32K * 5000 zeros, and HDFS should be OK. I got: the script could not finish; the file was created but not populated as expected, because the append operation failed. The datanode log shows that the block scanner reported a bad replica and the namenode decided to delete it. Since it is a single-node cluster, the append fails. It makes no sense that the script fails every time. Datanode and namenode logs are attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233088#comment-13233088 ] Colin Patrick McCabe commented on HDFS-3004: Hi Todd, Thanks for looking at this. We'll have to chat about EditLogInputException, since there are a few things that are unclear to me about that exception. It's used almost nowhere in the code. Pretty much every deserialization error shows up as an IOException. If the intention was that deserialization errors would be EditLogInputExceptions, we need to make that clear and actually implement it. It will be quite a large amount of work, though; probably a patch at least as big as this one, maybe more. I don't really understand how EditLogTailer is used in practice, so I can't evaluate how reasonable this is. C. Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, HDFS-3004.019.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233095#comment-13233095 ] jirapos...@reviews.apache.org commented on HDFS-2834: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/#review6103 --- Real close now! hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java https://reviews.apache.org/r/4212/#comment13128 this comment seems like it's in the wrong spot, since the code that comes after it doesn't reference offsetFromChunkBoundary. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java https://reviews.apache.org/r/4212/#comment13130 shouldn't this be true? hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java https://reviews.apache.org/r/4212/#comment13132 no reason to use DFSClient here. Instead you can just use the filesystem, right? Then downcast the stream you get back? hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java https://reviews.apache.org/r/4212/#comment13131 don't you want an assert on sawException here? You can also use GenericTestUtils.assertExceptionContains() if you want to check the text of it - Todd On 2012-03-09 00:47:24, Henry Robinson wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4212/ bq. --- bq. bq. (Updated 2012-03-09 00:47:24) bq. bq. bq. Review request for hadoop-hdfs and Todd Lipcon. bq. bq. bq. Summary bq. --- bq. bq. New patch for HDFS-2834 (I can't update the old review request). bq. bq. bq. This addresses bug HDFS-2834. bq. http://issues.apache.org/jira/browse/HDFS-2834 bq. bq. bq. Diffs bq. - bq. bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java PRE-CREATION bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 bq. bq. Diff: https://reviews.apache.org/r/4212/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Henry bq. bq. ByteBuffer-based read API for DFSInputStream Key: HDFS-2834 URL: https://issues.apache.org/jira/browse/HDFS-2834 Project: Hadoop HDFS Issue Type: Improvement Reporter: Henry Robinson Assignee: Henry Robinson Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch, HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.7.patch, HDFS-2834.8.patch, HDFS-2834.9.patch, HDFS-2834.patch, HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated {{byte[]}}. 
Although for many clients this is desired behaviour, in certain situations, such as native-reads through libhdfs, this imposes an extra copy penalty since the {{byte[]}} needs to be copied out again into a natively readable memory area. For these cases, it would be preferable to allow the client to supply its own buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
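To make the copy-elimination argument concrete, here is a small sketch (not the patch itself) contrasting today's byte[] path with the proposed supply-your-own-buffer shape. The interface mirrors the general shape of Hadoop's {{ByteBufferReadable}}; the helper method and class names are invented for illustration.
{code}
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ByteBufferReadSketch {
  // The shape of the proposed API: the caller owns the (possibly direct)
  // buffer and the stream fills it, so no second copy is needed.
  interface ByteBufferReadable {
    int read(ByteBuffer buf) throws IOException;
  }

  // Today's path for a native caller: data first lands in a JVM byte[],
  // and must then be copied again into natively readable memory.
  static int readWithExtraCopy(InputStream in, ByteBuffer dst) throws IOException {
    byte[] tmp = new byte[dst.remaining()];
    int n = in.read(tmp);
    if (n > 0) {
      dst.put(tmp, 0, n);  // the extra copy the JIRA wants to eliminate
    }
    return n;
  }
}
{code}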
[jira] [Commented] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233099#comment-13233099 ] Todd Lipcon commented on HDFS-3004: --- The EditLogInputExceptions are currently being thrown by this code:
{code}
try {
  if ((op = in.readOp()) == null) {
    break;
  }
} catch (IOException ioe) {
  long badTxId = txId + 1; // because txId hasn't been incremented yet
  String errorMessage = formatEditLogReplayError(in, recentOpcodeOffsets, badTxId);
  FSImage.LOG.error(errorMessage);
  throw new EditLogInputException(errorMessage, ioe, numEdits);
}
{code}
It indicates that whatever exception happened was due to a deserialization error, which is distinct from an application error. EditLogTailer is used by the HA StandbyNode to tail the edits out of the edit log and apply them to the SBN's namespace. Since it's reading the same log that the active is writing, it's possible that it can see a partial edit at the end of the file, in which case it will generally see an IOException. The fact that it's being wrapped with EditLogInputException indicates that it was some problem reading the edits and can likely be retried. If the EditLogTailer gets a different type of exception, though, indicating that the _application_ of the edit failed, then it will exit, because it may have left the namespace in an inconsistent state and thus is no longer a candidate for failover. Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, HDFS-3004.019.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
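The retry-versus-abort behaviour Todd describes can be summarized in a schematic loop. This is a paraphrase of the standby's tailing logic for illustration, not the actual EditLogTailer code; the exception type is stubbed locally so the sketch is self-contained.
{code}
import java.io.IOException;

public abstract class TailerSketch {
  // Stand-in for the real exception: signals a read/deserialization problem.
  static class EditLogInputException extends IOException {
    EditLogInputException(String msg, Throwable cause) { super(msg, cause); }
  }

  abstract void tailEdits() throws IOException;  // read ops, apply to namespace

  void run() throws InterruptedException {
    while (true) {
      try {
        tailEdits();
      } catch (EditLogInputException e) {
        // Likely a partial edit at the end of a file the active NN is still
        // writing; safe to retry on the next cycle.
        System.err.println("Error reading edits, will retry: " + e);
      } catch (IOException e) {
        // Applying an edit failed: the namespace may now be inconsistent, so
        // this node stops tailing and is no longer a failover candidate.
        throw new IllegalStateException("Error applying edits, aborting", e);
      }
      Thread.sleep(60000);  // tail interval
    }
  }
}
{code}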
[jira] [Updated] (HDFS-3089) Move FSDatasetInterface and other related classes/interfaces to a package
[ https://issues.apache.org/jira/browse/HDFS-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3089: - Attachment: h3089_20120319.patch h3089_20120319.patch: generated by svn rm/add for Jenkins. Move FSDatasetInterface and other related classes/interfaces to a package - Key: HDFS-3089 URL: https://issues.apache.org/jira/browse/HDFS-3089 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h3089_20120316_svn_mv.patch, h3089_20120319.patch, h3089_20120319_svn_mv.patch, svn_mv.sh -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233127#comment-13233127 ] Todd Lipcon commented on HDFS-3094: ---
{code}
+NONINTERACTIVE("-nonInterActive");
{code}
should be {{"-nonInteractive"}} (not a capital 'A')
{code}
+//default force to false
+private boolean isForce=false;
+//default interactive to true
+private boolean isInteractive=true;
{code}
These comments are superfluous, since they just say the same thing as the code. Also, please add spaces before and after the '=' in the variable definitions.
{code}
+public boolean getisForce() {
+ return isForce;
+}
+
+public void setisForce(boolean force) {
+ isForce = force;
+}
{code}
Rename {{getisForce}} to just {{isForce}} or {{isForceEnabled()}}. Rename {{setisForce}} to {{setForceEnabled()}} or {{setForce()}}. Same goes for {{isInteractive}}/{{setisInteractive}} below it.
{code}
+//by default force is off and interactive is on
+startOpt.setisForce(false);
+startOpt.setisInteractive(true);
{code}
You already have these defaults in the variable declarations, no need to duplicate them.
- It looks like if you specify invalid options, it won't give any kind of useful error message. You should probably be throwing HadoopIllegalArgumentException instead of returning null in several of these cases.
- I don't follow the following comment: {{+//make sure the user did not sent force or noninteractive as the clusterid or an empty clusterid}} Can you clarify it?
In one of your test cases, you make a new thread and then sleep. This is not a reliable way of testing, especially since it wants to get user input. This won't work well in many test environments. I'd suggest we just use manual tests for this, or else set up a way to override System.in for the purpose of the test, so you can test without spawning a new thread. Style nits: please make the code look like the surrounding style in the rest of the codebase. Spaces around '=' signs. No spaces after '(' in if statements. Maximum 80 characters in a line, etc. No tabs (two space indentation). Space after '//'. Please read over your comments for typos as well. add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup the directories in the local file system. -force : namenode formats the directories without prompting -nonInterActive : namenode format will return with an exit code of 1 if the dir exists. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
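To illustrate the naming Todd is asking for, here is what the options holder might look like after the renames. This is a hypothetical class written for this comment, not the patch itself.
{code}
public class FormatOptions {
  private boolean force = false;        // default: prompt before formatting
  private boolean interactive = true;   // default: ask the user for Y/N

  public boolean isForceEnabled() { return force; }
  public void setForce(boolean force) { this.force = force; }

  public boolean isInteractive() { return interactive; }
  public void setInteractive(boolean interactive) { this.interactive = interactive; }
}
{code}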
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233134#comment-13233134 ] Suresh Srinivas commented on HDFS-3107: --- bq. if a user mistakenly starts to append data to an existing large file, and discovers the mistake, the only recourse is to recreate that file, by rewriting the contents. This is very inefficient. What if user accidentally truncates a file :-) HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Reporter: Lei Chang Attachments: HDFS_truncate_semantics_Mar15.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233146#comment-13233146 ] Colin Patrick McCabe commented on HDFS-3004: Ok, I think I see what you are trying to express with this exception. Exceptions reading the edit log, as opposed to exceptions applying the edits. Since I only looked at what was in trunk, I didn't really see where it was useful, but now I understand. I do kind of wonder if readOp itself should be doing this, just for consistency's sake. C. Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, HDFS-3004.019.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233148#comment-13233148 ] Milind Bhandarkar commented on HDFS-3107: - What if a user accidentally deletes a directory? You guys never supported me when I asked for a file-by-file deletion that could be aborted in time to save 70 pct of users' time, right? Instead you have always supported a directory deletion with a single misdirected RPC. Anyway, to answer your question: if a user accidentally truncates, he/she can always append again, without losing any efficiency. Can we have some mature discussion on this jira, please? -- Milind Bhandarkar Chief Architect, Greenplum Labs, Data Computing Division, EMC +1-650-523-3858 (W) +1-408-666-8483 (C) HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Reporter: Lei Chang Attachments: HDFS_truncate_semantics_Mar15.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3117) clean cache and can't start hadoop
clean cache and can't start hadoop -- Key: HDFS-3117 URL: https://issues.apache.org/jira/browse/HDFS-3117 Project: Hadoop HDFS Issue Type: Task Reporter: cldoltd I used the command {{echo 3 > /proc/sys/vm/drop_caches}} to clean the cache. Now I can't start Hadoop. Thanks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233160#comment-13233160 ] Arpit Gupta commented on HDFS-3094: --- Thanks for the review Todd. I will make the appropriate changes to branch-1 and trunk. bq. I don't follow the following comment: + //make sure the user did not sent force or noninteractive as the clusterid or an empty clusterid Can you clarify it? What I mean there is the case where the user entered a wrong command and did not actually specify a clusterid, e.g.:
{code}
./bin/hadoop namenode -format -clusterid -force
{code}
Here {{-force}} would be picked up as the clusterid, so the code needs to check that the token following {{-clusterid}} is not another option or an empty string. add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup the directories in the local file system. -force : namenode formats the directories without prompting -nonInterActive : namenode format will return with an exit code of 1 if the dir exists. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
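A minimal sketch of that guard, assuming a simple index-based argument scanner (the method and class names are invented for illustration; Todd's suggestion of HadoopIllegalArgumentException would fit equally well here):
{code}
public class ClusterIdArg {
  static String parseClusterId(String[] args, int i) {
    // i points at "-clusterid"; the id must follow it
    if (i + 1 >= args.length) {
      throw new IllegalArgumentException("-clusterid requires a value");
    }
    String id = args[i + 1];
    if (id.isEmpty() || id.startsWith("-")) {
      // e.g. "namenode -format -clusterid -force": the id was never supplied
      throw new IllegalArgumentException("Invalid cluster id: '" + id + "'");
    }
    return id;
  }
}
{code}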
[jira] [Updated] (HDFS-3071) haadmin failover command does not provide enough detail for when target NN is not ready to be active
[ https://issues.apache.org/jira/browse/HDFS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3071: -- Attachment: hdfs-3071.txt Here's a patch which addresses the issue. Unfortunately it's cross-project, and there's no real way to split it up without breaking one project or the other on commit. As an experiment, I made the change in such a way that it wouldn't break protocol compatibility. This resulted in a sort of strange API naming. Let me know if you think it's better to just break the wire protocol (since we haven't had an Apache release with HA yet, it's probably acceptable). haadmin failover command does not provide enough detail for when target NN is not ready to be active Key: HDFS-3071 URL: https://issues.apache.org/jira/browse/HDFS-3071 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 0.24.0 Reporter: Philip Zeyliger Assignee: Todd Lipcon Attachments: hdfs-3071.txt When running the failover command, you can get an error message like the following:
{quote}
$ hdfs --config $(pwd) haadmin -failover namenode2 namenode1
Failover failed: xxx.yyy/1.2.3.4:8020 is not ready to become active
{quote}
Unfortunately, the error message doesn't describe why that node isn't ready to be active. In my case, the target namenode's logs don't indicate anything either. It turned out that the issue was "Safe mode is ON. Resources are low on NN. Safe mode must be turned off manually.", but ideally the user would be told that at the time of the failover. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
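For context on the compatibility trade-off Todd mentions, the shape of such a change is roughly the following (names invented; the real HAServiceProtocol methods differ): leave the existing RPC untouched and bolt a new call on beside it, which is what produces the slightly odd naming.
{code}
public interface HaReadinessSketch {
  // Existing call, left as-is so old clients and servers still interoperate.
  boolean readyToBecomeActive();

  // New call added alongside it: null means ready, otherwise a human-readable
  // reason that haadmin can print. The awkward name is the price of leaving
  // the original method's wire signature unchanged.
  String getNotReadyReason();
}
{code}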
[jira] [Commented] (HDFS-3071) haadmin failover command does not provide enough detail for when target NN is not ready to be active
[ https://issues.apache.org/jira/browse/HDFS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233193#comment-13233193 ] Todd Lipcon commented on HDFS-3071: --- I tested this manually in addition to the unit tests. For the manual test, I put one of the NNs in safemode and then issued the failover command:
{code}
todd@todd-w510:~/git/hadoop-common/hadoop-dist/target/hadoop-0.24.0-SNAPSHOT$ ./bin/hdfs haadmin -failover nn2 nn1
Failover failed: todd-w510/127.0.0.1:8021 is not ready to become active: Not ready to go active, since the node is in safemode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
{code}
haadmin failover command does not provide enough detail for when target NN is not ready to be active Key: HDFS-3071 URL: https://issues.apache.org/jira/browse/HDFS-3071 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 0.24.0 Reporter: Philip Zeyliger Assignee: Todd Lipcon Attachments: hdfs-3071.txt When running the failover command, you can get an error message like the following:
{quote}
$ hdfs --config $(pwd) haadmin -failover namenode2 namenode1
Failover failed: xxx.yyy/1.2.3.4:8020 is not ready to become active
{quote}
Unfortunately, the error message doesn't describe why that node isn't ready to be active. In my case, the target namenode's logs don't indicate anything either. It turned out that the issue was "Safe mode is ON. Resources are low on NN. Safe mode must be turned off manually.", but ideally the user would be told that at the time of the failover. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3081) SshFenceByTcpPort uses netcat incorrectly
[ https://issues.apache.org/jira/browse/HDFS-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3081: -- Status: Patch Available (was: Open) SshFenceByTcpPort uses netcat incorrectly - Key: HDFS-3081 URL: https://issues.apache.org/jira/browse/HDFS-3081 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 0.24.0 Reporter: Philip Zeyliger Assignee: Todd Lipcon Attachments: hdfs-3081.txt SshFenceByTcpPort currently assumes that the NN is listening on localhost. Typical setups have the namenode listening just on the hostname of the namenode, which would lead nc -z to miss it. Here's an example in which the NN is running, listening on 8020, but doesn't respond on localhost 8020.
{noformat}
[root@xxx ~]# lsof -P -p 5286 | grep -i listen
java 5286 root 110u IPv4 1772357 TCP xxx:8020 (LISTEN)
java 5286 root 121u IPv4 1772397 TCP xxx:50070 (LISTEN)
[root@xxx ~]# nc -z localhost 8020
[root@xxx ~]# nc -z xxx 8020
Connection to xxx 8020 port [tcp/intu-ec-svcdisc] succeeded!
{noformat}
Here's the likely offending code:
{code}
LOG.info("Indeterminate response from trying to kill service. " +
    "Verifying whether it is running using nc...");
rc = execCommand(session, "nc -z localhost 8020");
{code}
Naively, we could run netcat against the correct hostname (since the NN ought to be listening on the hostname it's configured as), or just use fuser. Fuser catches ports independently of what IPs they're bound to:
{noformat}
[root@xxx ~]# fuser 1234/tcp
1234/tcp: 6766 6768
[root@xxx ~]# jobs
[1]- Running nc -l localhost 1234
[2]+ Running nc -l rhel56-18.ent.cloudera.com 1234
[root@xxx ~]# sudo lsof -P | grep -i LISTEN | grep -i 1234
nc 6766 root 3u IPv4 2563626 TCP localhost:1234 (LISTEN)
nc 6768 root 3u IPv4 2563671 TCP xxx:1234 (LISTEN)
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3081) SshFenceByTcpPort uses netcat incorrectly
[ https://issues.apache.org/jira/browse/HDFS-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3081: -- Attachment: hdfs-3081.txt The attached patch fixes the problem. I am still using nc to verify that the service is down, since, if the configured user is wrong, fuser won't be able to find the listening process (it has to run as either the same user or as root). I tested locally by using my external hostname and verifying the following in the logs:
{noformat}
12/03/19 21:40:19 INFO ha.SshFenceByTcpPort: Connected to todd-w510
12/03/19 21:40:19 INFO ha.SshFenceByTcpPort: Looking for process running on port 8020
12/03/19 21:40:19 DEBUG ha.SshFenceByTcpPort: Running cmd: PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp 8020
12/03/19 21:40:19 INFO ha.SshFenceByTcpPort: Indeterminate response from trying to kill service. Verifying whether it is running using nc...
12/03/19 21:40:19 DEBUG ha.SshFenceByTcpPort: Running cmd: nc -z todd-w510 8020
12/03/19 21:40:19 INFO ha.SshFenceByTcpPort: Verified that the service is down.
{noformat}
SshFenceByTcpPort uses netcat incorrectly - Key: HDFS-3081 URL: https://issues.apache.org/jira/browse/HDFS-3081 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 0.24.0 Reporter: Philip Zeyliger Assignee: Todd Lipcon Attachments: hdfs-3081.txt SshFenceByTcpPort currently assumes that the NN is listening on localhost. Typical setups have the namenode listening just on the hostname of the namenode, which would lead nc -z to miss it. Here's an example in which the NN is running, listening on 8020, but doesn't respond on localhost 8020.
{noformat}
[root@xxx ~]# lsof -P -p 5286 | grep -i listen
java 5286 root 110u IPv4 1772357 TCP xxx:8020 (LISTEN)
java 5286 root 121u IPv4 1772397 TCP xxx:50070 (LISTEN)
[root@xxx ~]# nc -z localhost 8020
[root@xxx ~]# nc -z xxx 8020
Connection to xxx 8020 port [tcp/intu-ec-svcdisc] succeeded!
{noformat}
Here's the likely offending code:
{code}
LOG.info("Indeterminate response from trying to kill service. " +
    "Verifying whether it is running using nc...");
rc = execCommand(session, "nc -z localhost 8020");
{code}
Naively, we could run netcat against the correct hostname (since the NN ought to be listening on the hostname it's configured as), or just use fuser. Fuser catches ports independently of what IPs they're bound to:
{noformat}
[root@xxx ~]# fuser 1234/tcp
1234/tcp: 6766 6768
[root@xxx ~]# jobs
[1]- Running nc -l localhost 1234
[2]+ Running nc -l rhel56-18.ent.cloudera.com 1234
[root@xxx ~]# sudo lsof -P | grep -i LISTEN | grep -i 1234
nc 6766 root 3u IPv4 2563626 TCP localhost:1234 (LISTEN)
nc 6768 root 3u IPv4 2563671 TCP xxx:1234 (LISTEN)
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
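Pulling the two shell fragments from the log above together, the fix amounts to building the remote commands roughly like this. This is a sketch of the approach, not the patch's actual code:
{code}
public class FenceCommands {
  // fuser finds and kills listeners on the port regardless of the IP they
  // bound to, so it works even when the NN is not bound to localhost.
  static String killCommand(int port) {
    return "PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp " + port;
  }

  // Verify against the *configured* hostname of the target NN; checking
  // localhost (as before) misses an NN bound only to its hostname.
  static String verifyCommand(String host, int port) {
    return "nc -z " + host + " " + port;
  }
}
{code}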
[jira] [Assigned] (HDFS-3084) FenceMethod.tryFence() and ShellCommandFencer should pass namenodeId as well as host:port
[ https://issues.apache.org/jira/browse/HDFS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reassigned HDFS-3084: - Assignee: Todd Lipcon FenceMethod.tryFence() and ShellCommandFencer should pass namenodeId as well as host:port - Key: HDFS-3084 URL: https://issues.apache.org/jira/browse/HDFS-3084 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 0.24.0, 0.23.3 Reporter: Philip Zeyliger Assignee: Todd Lipcon The FenceMethod interface passes along the host:port of the NN that needs to be fenced. That's great for the common case. However, it's likely necessary to have extra configuration parameters for fencing, and these are typically keyed off the nameserviceId.namenodeId (if, for nothing else, consistency with all the other parameters that are keyed off of namespaceId.namenodeId). Obviously this can be backed out from the host:port, but it's inconvenient, and requires iterating through all the configs. The shell interface exhibits the same issue: host:port is great for most fencers, but if you need extra configs (like the host:port of the power supply unit), those are harder to pipe through without the namenodeId. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
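A hedged sketch of the interface shape this JIRA proposes (the actual FenceMethod signature in the codebase differs; the names here are illustrative):
{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;

public interface FenceMethodSketch {
  // Pass the logical identity alongside host:port so a fencer can look up
  // parameters keyed as <nameserviceId>.<namenodeId> (e.g. the host:port of
  // a power supply unit) without reverse-mapping the address through the
  // entire configuration.
  void tryFence(String nameserviceId, String namenodeId,
                InetSocketAddress target, Configuration conf) throws IOException;
}
{code}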
[jira] [Commented] (HDFS-3081) SshFenceByTcpPort uses netcat incorrectly
[ https://issues.apache.org/jira/browse/HDFS-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233206#comment-13233206 ] Hadoop QA commented on HDFS-3081: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519021/hdfs-3081.txt against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2038//console This message is automatically generated. SshFenceByTcpPort uses netcat incorrectly - Key: HDFS-3081 URL: https://issues.apache.org/jira/browse/HDFS-3081 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 0.24.0 Reporter: Philip Zeyliger Assignee: Todd Lipcon Attachments: hdfs-3081.txt SshFenceByTcpPort currently assumes that the NN is listening on localhost. Typical setups have the namenode listening just on the hostname of the namenode, which would lead nc -z to miss it. Here's an example in which the NN is running, listening on 8020, but doesn't respond on localhost 8020.
{noformat}
[root@xxx ~]# lsof -P -p 5286 | grep -i listen
java 5286 root 110u IPv4 1772357 TCP xxx:8020 (LISTEN)
java 5286 root 121u IPv4 1772397 TCP xxx:50070 (LISTEN)
[root@xxx ~]# nc -z localhost 8020
[root@xxx ~]# nc -z xxx 8020
Connection to xxx 8020 port [tcp/intu-ec-svcdisc] succeeded!
{noformat}
Here's the likely offending code:
{code}
LOG.info("Indeterminate response from trying to kill service. " +
    "Verifying whether it is running using nc...");
rc = execCommand(session, "nc -z localhost 8020");
{code}
Naively, we could run netcat against the correct hostname (since the NN ought to be listening on the hostname it's configured as), or just use fuser. Fuser catches ports independently of what IPs they're bound to:
{noformat}
[root@xxx ~]# fuser 1234/tcp
1234/tcp: 6766 6768
[root@xxx ~]# jobs
[1]- Running nc -l localhost 1234
[2]+ Running nc -l rhel56-18.ent.cloudera.com 1234
[root@xxx ~]# sudo lsof -P | grep -i LISTEN | grep -i 1234
nc 6766 root 3u IPv4 2563626 TCP localhost:1234 (LISTEN)
nc 6768 root 3u IPv4 2563671 TCP xxx:1234 (LISTEN)
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3081) SshFenceByTcpPort uses netcat incorrectly
[ https://issues.apache.org/jira/browse/HDFS-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233211#comment-13233211 ] Philip Zeyliger commented on HDFS-3081: --- Patch looks good to me; thanks! SshFenceByTcpPort uses netcat incorrectly - Key: HDFS-3081 URL: https://issues.apache.org/jira/browse/HDFS-3081 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 0.24.0 Reporter: Philip Zeyliger Assignee: Todd Lipcon Attachments: hdfs-3081.txt SshFenceByTcpPort currently assumes that the NN is listening on localhost. Typical setups have the namenode listening just on the hostname of the namenode, which would lead nc -z to miss it. Here's an example in which the NN is running, listening on 8020, but doesn't respond on localhost 8020.
{noformat}
[root@xxx ~]# lsof -P -p 5286 | grep -i listen
java 5286 root 110u IPv4 1772357 TCP xxx:8020 (LISTEN)
java 5286 root 121u IPv4 1772397 TCP xxx:50070 (LISTEN)
[root@xxx ~]# nc -z localhost 8020
[root@xxx ~]# nc -z xxx 8020
Connection to xxx 8020 port [tcp/intu-ec-svcdisc] succeeded!
{noformat}
Here's the likely offending code:
{code}
LOG.info("Indeterminate response from trying to kill service. " +
    "Verifying whether it is running using nc...");
rc = execCommand(session, "nc -z localhost 8020");
{code}
Naively, we could run netcat against the correct hostname (since the NN ought to be listening on the hostname it's configured as), or just use fuser. Fuser catches ports independently of what IPs they're bound to:
{noformat}
[root@xxx ~]# fuser 1234/tcp
1234/tcp: 6766 6768
[root@xxx ~]# jobs
[1]- Running nc -l localhost 1234
[2]+ Running nc -l rhel56-18.ent.cloudera.com 1234
[root@xxx ~]# sudo lsof -P | grep -i LISTEN | grep -i 1234
nc 6766 root 3u IPv4 2563626 TCP localhost:1234 (LISTEN)
nc 6768 root 3u IPv4 2563671 TCP xxx:1234 (LISTEN)
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira