[jira] [Comment Edited] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906104#comment-13906104 ]

Kihwal Lee edited comment on HDFS-5583 at 2/19/14 9:26 PM:
---

Thanks for the review, Brandon.

- The admin wants to know whether the command was received by the datanode: This is determined by the return code of the command. As with other commands, when the return code is not 0, the state is non-deterministic, and only then may the command be reissued. I do not believe that this is a common case. Moreover, the shutdown normally takes less than two seconds, and reissuing the shutdown manually probably takes more than that. In my opinion, adding support for reporting progress won't have much value. If you still feel that it needs to be changed, I will change it. Please let me know what you think.
- I am planning on adding at least one more OOB ack type in the near future for write draining, which will be useful for decommissioning. The reserved enums make certain checks more efficient.

I will address the rest of the comments when you finish the review.

was (Author: kihwal):
Thanks for the review, Brandon.

- The admin wants to know whether the command was received: This is determined by the return code of the command. As with other commands, when the return code is not 0, the state is non-deterministic and only then the command may be reissued. I do not believe that this is a common case. Moreover, the shutdown normally take less than two seconds and probably the reissuing shutdown manually take more than that. In my opinion, adding support for reporting progress won't have much value. If you still feel that it needs to be changed, I will change it. Please let me know what you think.
- I am planning on adding at least one more OOB ack type in the near future for write draining, which will be useful for decommissioning. The reserved enums make certain checks more efficient.

I will address the rest of the comments when you finish the review.

> Make DN send an OOB Ack on shutdown before restarting
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Attachments: HDFS-5583.patch, HDFS-5583.patch, HDFS-5583.patch
>
> Add an ability for data nodes to send an OOB response in order to indicate an
> upcoming upgrade-restart. Client should ignore the pipeline error from the
> node for a configured amount of time and try to reconstruct the pipeline without
> excluding the restarted node. If the node does not come back in time,
> regular pipeline recovery should happen.
> This feature is useful for the applications with a need to keep blocks local.
> If the upgrade-restart is fast, the wait is preferable to losing locality.
> It could also be used in general instead of the draining-writer strategy.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
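The client behaviour proposed in the issue description above (wait a configured amount of time for the restarting node, keep it in the pipeline if it returns, otherwise fall back to regular recovery) might look roughly like the following. This is only a sketch under stated assumptions: the class, method, and enum names are hypothetical and do not come from the HDFS-5583 patch, and the `isBackUp` probe stands in for whatever reconnect attempt a real client would make.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names are taken from the HDFS-5583 patch.
final class RestartAwareRecovery {

    /** What to do with a node that announced a restart via an OOB ack. */
    enum Decision { KEEP_NODE, EXCLUDE_NODE }

    /**
     * Polls until the restarting node is reachable again or the configured
     * wait expires. isBackUp is an assumed stand-in for a connectivity probe.
     */
    static Decision awaitRestart(String node, long waitMillis, long pollMillis,
                                 Predicate<String> isBackUp) {
        long deadline = System.nanoTime() + waitMillis * 1_000_000L;
        while (true) {
            if (isBackUp.test(node)) {
                return Decision.KEEP_NODE;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return Decision.EXCLUDE_NODE;
            }
            try {
                Thread.sleep(Math.min(pollMillis, remainingMillis));
            } catch (InterruptedException e) {
                // Preserve the interrupt and fall back to regular recovery.
                Thread.currentThread().interrupt();
                return Decision.EXCLUDE_NODE;
            }
        }
    }

    /** Same nodes if the node came back; otherwise the pipeline without it. */
    static List<String> rebuildPipeline(List<String> pipeline, String node, Decision decision) {
        List<String> result = new ArrayList<>(pipeline);
        if (decision == Decision.EXCLUDE_NODE) {
            result.remove(node);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> pipeline = Arrays.asList("dn1", "dn2", "dn3");
        // The probe always succeeds here, so dn2 stays in the pipeline.
        Decision decision = awaitRestart("dn2", 200, 10, n -> true);
        System.out.println(rebuildPipeline(pipeline, "dn2", decision));
    }
}
```

The wait is bounded by a deadline rather than a retry count so that a slow probe cannot stretch the configured window; whether the actual client does it this way is not stated in the thread.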
[jira] [Comment Edited] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897548#comment-13897548 ]

Kihwal Lee edited comment on HDFS-5583 at 2/11/14 5:38 AM:
---

The patch makes DN send OOB acks to clients who are writing. The added test case currently doesn't do much, but after the client-side changes, it will be updated. The OOB Ack sending can still be verified by running the new test case. The test log should show something like the following:

{panel}
[DataNode]
2014-02-10 23:23:52,412 INFO datanode.DataNode (DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before restart
2014-02-10 23:23:52,412 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(731)) - Shutting down for restart (BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:run(1060)) - Relaying an out of band ack of type OOB_TYPE1

[Client]
2014-02-10 23:23:52,414 WARN hdfs.DFSClient (DFSOutputStream.java:run(784)) - DFSOutputStream ResponseProcessor exception for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 127.0.0.1:55182
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{panel}

was (Author: kihwal):
The patch makes DN send OOB acks to clients who are writing. The added test case currently doesn't do much, but after the client-side changes, it will be updated. The OOB Ack sending can still be verified by running the test new case. The test log should show something like the following:

{panel}
[DataNode]
2014-02-10 23:23:52,412 INFO datanode.DataNode (DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before restart
2014-02-10 23:23:52,412 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(731)) - Shutting down for restart (BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:run(1060)) - Relaying an out of band ack of type OOB_TYPE1

[Client]
2014-02-10 23:23:52,414 WARN hdfs.DFSClient (DFSOutputStream.java:run(784)) - DFSOutputStream ResponseProcessor exception for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 127.0.0.1:55182
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{panel}

> Make DN send an OOB Ack on shutdown before restarting
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Attachments: HDFS-5583.patch
>
> Add an ability for data nodes to send an OOB response in order to indicate an
> upcoming upgrade-restart. Client should ignore the pipeline error from the
> node for a configured amount of time and try to reconstruct the pipeline without
> excluding the restarted node. If the node does not come back in time,
> regular pipeline recovery should happen.
> This feature is useful for the applications with a need to keep blocks local.
> If the upgrade-restart is fast, the wait is preferable to losing locality.
> It could also be used in general instead of the draining-writer strategy.
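In the client log above, the OOB ack still surfaces as a "Bad response" because the client-side changes are not in place yet. One way a client could separate out-of-band acks from genuine errors is a range check over contiguously declared status values. The sketch below assumes this layout; only the name `OOB_TYPE1` appears in the log, while the reserved values, their ordering, and the `Action` enum are hypothetical.

```java
// Hypothetical sketch: apart from OOB_TYPE1, which appears in the test log,
// the enum values and their order are assumptions made for illustration.
final class AckStatusSketch {

    enum Status {
        SUCCESS,
        ERROR,
        ERROR_CHECKSUM,
        // Out-of-band values are declared contiguously, so a single range
        // check covers the current type and any reserved ones added later.
        OOB_TYPE1,
        OOB_RESERVED1,
        OOB_RESERVED2,
        OOB_RESERVED3;

        boolean isOutOfBand() {
            return ordinal() >= OOB_TYPE1.ordinal()
                && ordinal() <= OOB_RESERVED3.ordinal();
        }
    }

    /** What the response processor would do with a given ack status. */
    enum Action { CONTINUE, HANDLE_OOB, FAIL }

    static Action classify(Status reply) {
        if (reply == Status.SUCCESS) {
            return Action.CONTINUE;
        }
        if (reply.isOutOfBand()) {
            // For example, start waiting for the datanode to restart.
            return Action.HANDLE_OOB;
        }
        // Anything else is a genuine "Bad response".
        return Action.FAIL;
    }

    public static void main(String[] args) {
        for (Status s : Status.values()) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

With this layout, giving a reserved value a real meaning later does not require touching the range check itself.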
[jira] [Comment Edited] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897548#comment-13897548 ]

Kihwal Lee edited comment on HDFS-5583 at 2/11/14 5:37 AM:
---

The patch makes DN send OOB acks to clients who are writing. The added test case currently doesn't do much, but after the client-side changes, it will be updated. The OOB Ack sending can still be verified by running the test new case. The test log should show something like the following:

{panel}
[DataNode]
2014-02-10 23:23:52,412 INFO datanode.DataNode (DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before restart
2014-02-10 23:23:52,412 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(731)) - Shutting down for restart (BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:run(1060)) - Relaying an out of band ack of type OOB_TYPE1

[Client]
2014-02-10 23:23:52,414 WARN hdfs.DFSClient (DFSOutputStream.java:run(784)) - DFSOutputStream ResponseProcessor exception for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 127.0.0.1:55182
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{panel}

was (Author: kihwal):
The patch makes DN send OOB acks to clients who are writing. The added test case currently doesn't do much, but after the client-side changes, it will be updated. The OOB Ack sending can still be verified by running the test new case. The test log should show something like the following:

{panel}
[DataNode]
2014-02-10 23:23:52,412 INFO datanode.DataNode (DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before restart
2014-02-10 23:23:52,412 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(731)) - Shutting down for restart (BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:run(1060)) - Relaying an out of band ack of type OOB_TYPE

[Client]
2014-02-10 23:23:52,414 WARN hdfs.DFSClient (DFSOutputStream.java:run(784)) - DFSOutputStream ResponseProcessor exception for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 127.0.0.1:55182
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{panel}

> Make DN send an OOB Ack on shutdown before restarting
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Attachments: HDFS-5583.patch
>
> Add an ability for data nodes to send an OOB response in order to indicate an
> upcoming upgrade-restart. Client should ignore the pipeline error from the
> node for a configured amount of time and try to reconstruct the pipeline without
> excluding the restarted node. If the node does not come back in time,
> regular pipeline recovery should happen.
> This feature is useful for the applications with a need to keep blocks local.
> If the upgrade-restart is fast, the wait is preferable to losing locality.
> It could also be used in general instead of the draining-writer strategy.
[jira] [Comment Edited] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897548#comment-13897548 ]

Kihwal Lee edited comment on HDFS-5583 at 2/11/14 5:37 AM:
---

The patch makes DN send OOB acks to clients who are writing. The added test case currently doesn't do much, but after the client-side changes, it will be updated. The OOB Ack sending can still be verified by running the test new case. The test log should show something like the following:

{panel}
[DataNode]
2014-02-10 23:23:52,412 INFO datanode.DataNode (DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before restart
2014-02-10 23:23:52,412 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(731)) - Shutting down for restart (BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:run(1060)) - Relaying an out of band ack of type OOB_TYPE

[Client]
2014-02-10 23:23:52,414 WARN hdfs.DFSClient (DFSOutputStream.java:run(784)) - DFSOutputStream ResponseProcessor exception for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 127.0.0.1:55182
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{panel}

was (Author: kihwal):
The patch makes DN send OOB acks to clients who are writing. The added test case currently doesn't do much, but after the client-side changes, it will be updated. The OOB Ack sending can still be verified by running the test new case. The test log should show something like the following:

{noformat}
[DataNode]
2014-02-10 23:23:52,412 INFO datanode.DataNode (DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before restart
2014-02-10 23:23:52,412 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(731)) - Shutting down for restart (BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO datanode.DataNode (BlockReceiver.java:run(1060)) - Relaying an out of band ack of type OOB_TYPE

[Client]
2014-02-10 23:23:52,414 WARN hdfs.DFSClient (DFSOutputStream.java:run(784)) - DFSOutputStream ResponseProcessor exception for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 127.0.0.1:55182
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{noformat}

> Make DN send an OOB Ack on shutdown before restarting
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Attachments: HDFS-5583.patch
>
> Add an ability for data nodes to send an OOB response in order to indicate an
> upcoming upgrade-restart. Client should ignore the pipeline error from the
> node for a configured amount of time and try to reconstruct the pipeline without
> excluding the restarted node. If the node does not come back in time,
> regular pipeline recovery should happen.
> This feature is useful for the applications with a need to keep blocks local.
> If the upgrade-restart is fast, the wait is preferable to losing locality.
> It could also be used in general instead of the draining-writer strategy.