[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack

2010-09-21 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913180#action_12913180
 ] 

Hairong Kuang commented on HDFS-1346:
-

This patch has been deployed on our production cluster for a while, and it seems 
to have fixed the bug. The fix helps reduce the chance of losing data when 
hflushing.

Shall we commit it to the 0.20 append branch?

 DFSClient receives out of order packet ack
 --

 Key: HDFS-1346
 URL: https://issues.apache.org/jira/browse/HDFS-1346
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.20-append
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append

 Attachments: blockrecv-diff.txt, outOfOrder.patch


 When running 0.20 patched with HDFS-101, we sometimes see an error as follows:
 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-2871223654872350746_21421120java.io.IOException: Responseprocessor: Expecting seqno for block blk_-2871223654872350746_21421120 10280 but received 10281
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2570)
 This indicates that the DFS client expects an ack for packet N but receives an 
 ack for packet N+1.
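For context, the client-side check that raises this error can be modeled as a FIFO of outstanding data-packet sequence numbers: every non-heartbeat ack must match the oldest unacked packet. The sketch below is a simplified, hypothetical model (ClientAckCheck, sent, onAck, and the exact message text are illustrative and are not DFSClient's ResponseProcessor code); it assumes heartbeat acks carry seqno -1.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified model of a client that requires in-order acks. */
public class ClientAckCheck {
    // Assumption: heartbeat packets use seqno -1 and are never queued for acks.
    static final long HEARTBEAT_SEQNO = -1L;

    // Seqnos of data packets sent but not yet acked, oldest first.
    private final Deque<Long> ackQueue = new ArrayDeque<>();

    void sent(long seqno) {
        ackQueue.addLast(seqno);
    }

    /** Each data ack must match the oldest outstanding packet; anything else is an error. */
    void onAck(long seqno) throws IOException {
        if (seqno == HEARTBEAT_SEQNO) {
            return; // heartbeat ack: nothing to match
        }
        Long expected = ackQueue.peekFirst();
        if (expected == null || expected != seqno) {
            throw new IOException("Expecting seqno " + expected + " but received " + seqno);
        }
        ackQueue.removeFirst();
    }

    public static void main(String[] args) {
        ClientAckCheck client = new ClientAckCheck();
        client.sent(10280L);
        client.sent(10281L);
        try {
            client.onAck(10281L); // the ack for 10280 never arrived
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints `Expecting seqno 10280 but received 10281`, the N versus N+1 pattern from the log: once an upstream ack for packet N is lost, the very next data ack is reported as out of order.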

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack

2010-09-21 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913396#action_12913396
 ] 

dhruba borthakur commented on HDFS-1346:


Hi Hairong, I agree with your observation. Please commit it to the 0.20 append 
branch too. Thanks.




[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack

2010-08-19 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900465#action_12900465
 ] 

Hairong Kuang commented on HDFS-1346:
-

Todd, yours is missing this patch: 
https://issues.apache.org/jira/secure/attachment/12439379/pipelineHeartbeat.patch.
 HDFS-101 says that it fixes a bug in the handling of pipeline heartbeats in 
Yahoo's Hadoop 0.20 security branch, but I did not put the bug description 
there.

Koji, do you still remember exactly what problem pipelineHeartbeat.patch 
fixed?




[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack

2010-08-18 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899942#action_12899942
 ] 

Hairong Kuang commented on HDFS-1346:
-

This seems to be caused by a race condition in 
BlockReceiver#PacketResponder.run(). It has the following piece of code where 
mirrorError is shared by two threads:
{code}
1:  if (!mirrorError) {
2:    // read an ack from downstream datanode
3:    ack.readFields(mirrorIn);
4:    ...
5:    seqno = ack.getSeqno();
6:  }
7:  if (seqno >= 0 || mirrorError) {
8:    Packet pkt = null;
9:    synchronized (this) {
10:     ...
11:     pkt = ackQueue.removeFirst();
12:     expected = pkt.seqno;
13:     ...
14:   }
15: }
{code}

Suppose mirrorError is false at line 1, so the thread reads an ack from a 
downstream datanode (line 3). If that ack happens to be for a heartbeat 
packet, seqno is -1 (line 5). If the other thread then sets mirrorError to 
true at any point after the check on line 1 (for example, between lines 4 and 
5), the condition on line 7 becomes true. A data packet is then removed from 
ackQueue on line 11, which should not happen because the ack is for a 
heartbeat packet, not a data packet. As a result, an ack for a data packet 
ends up being dropped.
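To make the interleaving concrete, here is a single-threaded replay of the quoted logic in which "the other thread" is a callback run right after the ack is read. AckRaceDemo, racy, and snapshot are hypothetical names; the model assumes heartbeat acks carry seqno -1, uses -1 as the initial seqno, and omits the real ack forwarding. The snapshot variant, which reads mirrorError once into a local so lines 1 and 7 see the same value, is only one possible way to close the window; it is not necessarily what outOfOrder.patch does.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical single-threaded replay of the PacketResponder interleaving. */
public class AckRaceDemo {
    // Assumption: heartbeat acks carry seqno -1, as described above.
    static final long HEARTBEAT_SEQNO = -1L;

    // Shared with the (simulated) other thread that reports mirror errors.
    volatile boolean mirrorError = false;
    // Seqnos of data packets awaiting acks, oldest first.
    final Deque<Long> ackQueue = new ArrayDeque<>();

    /** Same shape as the quoted code: mirrorError is read on line 1 and again on line 7. */
    Long racy(long ackSeqno, Runnable otherThread) {
        long seqno = HEARTBEAT_SEQNO;        // modeling choice for the initial value
        if (!mirrorError) {                  // line 1
            seqno = ackSeqno;                // lines 3-5: read the ack
            otherThread.run();               // other thread sets mirrorError here
        }
        if (seqno >= 0 || mirrorError) {     // line 7: re-reads mirrorError
            return ackQueue.removeFirst();   // line 11: pops a data packet
        }
        return null;                         // heartbeat ack ignored
    }

    /** Hypothetical variant: read mirrorError once so both checks agree. */
    Long snapshot(long ackSeqno, Runnable otherThread) {
        final boolean error = mirrorError;
        long seqno = HEARTBEAT_SEQNO;
        if (!error) {
            seqno = ackSeqno;
            otherThread.run();               // a late error is seen on the next iteration
        }
        if (seqno >= 0 || error) {
            return ackQueue.removeFirst();
        }
        return null;
    }

    public static void main(String[] args) {
        for (String mode : new String[] {"racy", "snapshot"}) {
            AckRaceDemo r = new AckRaceDemo();
            r.ackQueue.add(10280L);
            r.ackQueue.add(10281L);
            Runnable flip = () -> r.mirrorError = true;
            Long popped = mode.equals("racy")
                    ? r.racy(HEARTBEAT_SEQNO, flip)
                    : r.snapshot(HEARTBEAT_SEQNO, flip);
            System.out.println(mode + ": popped=" + popped + ", still queued=" + r.ackQueue);
        }
    }
}
```

With a heartbeat ack and a late mirrorError, the racy path prints `racy: popped=10280, still queued=[10281]` (packet 10280 is dropped), while the snapshot path prints `snapshot: popped=null, still queued=[10280, 10281]`.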




[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack

2010-08-18 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900081#action_12900081
 ] 

Todd Lipcon commented on HDFS-1346:
---

Is there any way to trigger this with a unit test that injects a Thread.sleep 
in there using Mockito? Or is that just too hard? (I'm surprised I've never 
run into this in append testing.)




[jira] Commented: (HDFS-1346) DFSClient receives out of order packet ack

2010-08-18 Thread sam rash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900105#action_12900105
 ] 

sam rash commented on HDFS-1346:


Not easily: PacketResponder is a non-static inner class, and constructing a 
BlockReceiver requires a Datanode instance.
If you can harness a Datanode, you then need to stub out the DataInputStream 
and figure out when to fire a callback (somehow when ack.readFields() reads 
from the DataInputStream, but not before).

I think it's possible, but we haven't had time yet.
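One way to get the "fire when ack.readFields() reads, but not before" behavior without Mockito is a wrapping stream that runs a callback once, right after the first bytes are consumed. The sketch below is hypothetical test scaffolding: CallbackInputStream is not an HDFS class, and wiring it into a BlockReceiver (the hard part noted above) is not shown. In a real test the callback would set the responder's mirrorError.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical test hook: runs a callback once, right after the first bytes are read. */
public class CallbackInputStream extends FilterInputStream {
    private final Runnable afterFirstRead;
    private boolean fired = false;

    public CallbackInputStream(InputStream in, Runnable afterFirstRead) {
        super(in);
        this.afterFirstRead = afterFirstRead;
    }

    // Fire at most once, and only after bytes were actually consumed.
    private void maybeFire(boolean consumed) {
        if (consumed && !fired) {
            fired = true;
            afterFirstRead.run();
        }
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        maybeFire(b >= 0);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        maybeFire(n > 0);
        return n;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeLong(-1L); // a heartbeat-like seqno
        boolean[] flipped = {false};
        DataInputStream in = new DataInputStream(new CallbackInputStream(
                new ByteArrayInputStream(bytes.toByteArray()), () -> flipped[0] = true));
        System.out.println("before read: " + flipped[0]);
        long seqno = in.readLong();
        System.out.println("seqno=" + seqno + ", after read: " + flipped[0]);
    }
}
```

Overriding both read() and read(byte[], int, int) keeps the hook independent of which one DataInputStream uses internally, so the callback fires exactly once per stream either way.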

