[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack
[ https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913180#action_12913180 ]

Hairong Kuang commented on HDFS-1346:
-------------------------------------

This patch has been deployed on our production cluster for a while and it seems to have fixed the bug. The fix helps reduce the chance of losing data when hflushing. Shall we commit it to the 0.20 append branch?

DFSClient receives out of order packet ack
------------------------------------------

Key: HDFS-1346
URL: https://issues.apache.org/jira/browse/HDFS-1346
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node, hdfs client
Affects Versions: 0.20-append
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.20-append
Attachments: blockrecv-diff.txt, outOfOrder.patch

When running 0.20 patched with HDFS-101, we sometimes see an error as follows:

WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-2871223654872350746_21421120
java.io.IOException: Responseprocessor: Expecting seq no for block blk_-2871223654872350746_21421120 10280 but received 10281
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2570)

This indicates that the DFS client expects an ack for packet N, but receives an ack for packet N+1.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack
[ https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913396#action_12913396 ]

dhruba borthakur commented on HDFS-1346:
----------------------------------------

Hi Hairong, I agree with your observation. Please commit it to the 0.20 append branch too. Thanks.
[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack
[ https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900465#action_12900465 ]

Hairong Kuang commented on HDFS-1346:
-------------------------------------

Todd, yours is missing this patch: https://issues.apache.org/jira/secure/attachment/12439379/pipelineHeartbeat.patch.

HDFS-101 says that it fixes a bug of incorrect handling of the pipeline heartbeat in Yahoo's Hadoop security branch 0.20, but I did not put the bug description there. Koji, do you still remember exactly what problem pipelineHeartbeat.patch fixed?
[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack
[ https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899942#action_12899942 ]

Hairong Kuang commented on HDFS-1346:
-------------------------------------

This seems to be caused by a race condition in BlockReceiver#PacketResponder.run(). It has the following piece of code, where mirrorError is shared by two threads:

{code}
1:  if (!mirrorError) {
2:    // read an ack from downstream datanode
3:    ack.readFields(mirrorIn);
4:    ...
5:    seqno = ack.getSeqno();
6:  }
7:  if (seqno >= 0 || mirrorError) {
8:    Packet pkt = null;
9:    synchronized (this) {
10:     ...
11:     pkt = ackQueue.removeFirst();
12:     expected = pkt.seqno;
13:     ...;
14:   }
15: }
{code}

Suppose mirrorError is false at line 1, so the thread reads an ack from the downstream datanode (line 3). If the ack happens to be for a heartbeat packet, seqno is -1 (line 5). If the other thread then changes mirrorError to true before line 7 is evaluated (for example, between lines 4 and 5), the condition on line 7 becomes true. A data packet is then removed from ackQueue on line 11, which should not happen because the ack is for a heartbeat packet, not for a data packet. So an ack for a data packet ends up being dropped.
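The interleaving described in this comment can be replayed deterministically in a few lines. The following is only a sketch under stated assumptions, not HDFS code: the class HeartbeatAckRaceSketch, the Responder stand-in, both method names, and the initial seqno value of -2 are hypothetical, and the write by the "other thread" is simulated inline rather than with a real second thread. The second method shows one plausible way to avoid the problem (reading the shared flag once into a local variable); it is not claimed to be what outOfOrder.patch does.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Single-threaded replay of the interleaving described above. Not HDFS code. */
final class HeartbeatAckRaceSketch {
    // Heartbeat acks carry seqno -1, as stated in the comment.
    static final long HEARTBEAT_SEQNO = -1L;

    /** Hypothetical stand-in for the PacketResponder state. */
    static final class Responder {
        final Deque<Long> ackQueue = new ArrayDeque<>(); // seqnos of unacked data packets
        boolean mirrorError = false;                     // shared between threads in the real code

        /** Follows the quoted logic: the shared flag is read a second time at "line 7". */
        Long handleAckRereadingFlag(long ackSeqno, boolean otherThreadSetsError) {
            long seqno = -2;                   // assumed "nothing read yet" value
            if (!mirrorError) {                // line 1
                seqno = ackSeqno;              // lines 3-5: ack read from downstream
                if (otherThreadSetsError) {
                    mirrorError = true;        // simulated write by the other thread
                }
            }
            if (seqno >= 0 || mirrorError) {   // line 7
                return ackQueue.removeFirst(); // line 11: a data packet is dequeued
            }
            return null;                       // nothing dequeued
        }

        /** Assumed alternative: take one snapshot of the flag and use it for both checks. */
        Long handleAckWithSnapshot(long ackSeqno, boolean otherThreadSetsError) {
            boolean sawMirrorError = mirrorError;
            long seqno = -2;
            if (!sawMirrorError) {
                seqno = ackSeqno;
                if (otherThreadSetsError) {
                    mirrorError = true;        // the write still happens, but is seen next round
                }
            }
            if (seqno >= 0 || sawMirrorError) {
                return ackQueue.removeFirst();
            }
            return null;
        }
    }

    /** Queues packets 10280 and 10281, then delivers one heartbeat ack during the race. */
    static String replay(boolean useSnapshot) {
        Responder r = new Responder();
        r.ackQueue.add(10280L);
        r.ackQueue.add(10281L);
        Long removed = useSnapshot
                ? r.handleAckWithSnapshot(HEARTBEAT_SEQNO, true)
                : r.handleAckRereadingFlag(HEARTBEAT_SEQNO, true);
        return "removed=" + removed + " next=" + r.ackQueue.peekFirst();
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // removed=10280 next=10281
        System.out.println(replay(true));  // removed=null next=10280
    }
}
```

In the first replay the heartbeat ack consumes data packet 10280, so the next data ack to be matched is for 10281, which is consistent with the "Expecting seq no ... 10280 but received 10281" error in the issue description.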
[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack
[ https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900081#action_12900081 ]

Todd Lipcon commented on HDFS-1346:
-----------------------------------

Any way to trigger this with a unit test that would inject a Thread.sleep in there using mockito? Or just too hard? (I'm surprised I haven't ever run into this in append testing.)
[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack
[ https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900105#action_12900105 ]

sam rash commented on HDFS-1346:
--------------------------------

Not easily: PacketResponder is a non-static inner class, and constructing a BlockReceiver requires a Datanode instance. If you can harness a Datanode, you then need to stub out the DataInputStream and figure out when to fire a callback (somehow when ack.readFields() reads from the DataInputStream, but not before). I think it's possible, but we haven't had time yet.
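The "fire a callback when the ack is read, but not before" part can be sketched with plain JDK classes, without mockito and without a Datanode. This is only an illustration under assumptions: CallbackOnReadSketch and HookedInputStream are hypothetical names, the ack is modelled as a single long rather than the real ack wire format, and an AtomicBoolean stands in for mirrorError. Harnessing BlockReceiver itself, which is the hard part mentioned above, is not attempted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch of a stream stub that runs a hook while the first read is in progress. */
final class CallbackOnReadSketch {

    /** Runs the hook exactly once, right after the first read from the wrapped stream. */
    static final class HookedInputStream extends FilterInputStream {
        private final Runnable afterFirstRead;
        private final AtomicBoolean fired = new AtomicBoolean(false);

        HookedInputStream(InputStream in, Runnable afterFirstRead) {
            super(in);
            this.afterFirstRead = afterFirstRead;
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            fireOnce();
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            fireOnce();
            return n;
        }

        private void fireOnce() {
            if (fired.compareAndSet(false, true)) {
                afterFirstRead.run();
            }
        }
    }

    /** Reads a fake heartbeat seqno (-1) and flips a stand-in flag during that read. */
    static String demo() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new DataOutputStream(bytes).writeLong(-1L);   // assumed ack payload: one long

            AtomicBoolean mirrorError = new AtomicBoolean(false); // stand-in for the shared flag
            DataInputStream mirrorIn = new DataInputStream(
                    new HookedInputStream(new ByteArrayInputStream(bytes.toByteArray()),
                            () -> mirrorError.set(true)));

            boolean before = mirrorError.get();  // flag is untouched before the read
            long seqno = mirrorIn.readLong();    // hook fires inside this call
            return "before=" + before + " seqno=" + seqno + " after=" + mirrorError.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // before=false seqno=-1 after=true
    }
}
```

Because the hook runs inside the read call, the flag changes after the check that preceded the read and before any check that follows it, which is the window the race needs.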