[ 
https://issues.apache.org/jira/browse/HDFS-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768472#action_12768472
 ] 

stack commented on HDFS-720:
----------------------------

This might be cleaner example.  Its first exception in this DN's log after 
loading started.   Its like we skip/lose a packet because of the NPE?

{code}
...
2009-10-21 18:23:11,323 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 for block blk_-7356834145770439479_1586 responded my status  
for seqno 895
2009-10-21 18:23:11,323 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving one packet for block blk_-7356834145770439479_1586 of length 65024 
seqno 896 offsetInBlock 58025472 lastPacketInBlock false
2009-10-21 18:23:11,323 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_-7356834145770439479_1586 1 responded other status  for 
seqno 895
2009-10-21 18:23:11,323 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 adding seqno 896 to ack queue.
2009-10-21 18:23:11,324 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 got seqno = 896
2009-10-21 18:23:11,325 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving one packet for block blk_-7356834145770439479_1586 of length 65024 
seqno 897 offsetInBlock 58090496 lastPacketInBlock false
2009-10-21 18:23:11,325 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 adding seqno 897 to ack queue.
2009-10-21 18:23:11,325 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_-7356834145770439479_1586 1 Exception 
java.lang.NullPointerException
    at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
    at java.lang.Thread.run(Thread.java:619)

2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 got seqno = 4546
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 seqno = 4546
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 for block blk_-830254393316092139_1588 responded my status  
for seqno 4546
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_-830254393316092139_1588 1 responded other status  for 
seqno 4546
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 got seqno = 4547
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 seqno = 4547
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving one packet for block blk_-7356834145770439479_1586 of length 65024 
seqno 898 offsetInBlock 58155520 lastPacketInBlock false
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 for block blk_-830254393316092139_1588 responded my status  
for seqno 4547
2009-10-21 18:23:11,326 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_-830254393316092139_1588 1 responded other status  for 
seqno 4547
2009-10-21 18:23:11,327 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 for block blk_-7356834145770439479_1586 responded my status  
for seqno -2
2009-10-21 18:23:11,327 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_-7356834145770439479_1586 1 responded other status  for 
seqno -2
2009-10-21 18:23:11,327 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 for block blk_-7356834145770439479_1586 terminating
...
{code}

We keep going with the download... then this on the finish up:

{code}
2009-10-21 18:23:11,338 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 1 for block blk_-830254393316092139_1588 responded my status  
for seqno 4555
2009-10-21 18:23:11,338 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
Number of active connections is: 184
2009-10-21 18:23:11,338 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_-830254393316092139_1588 1 responded other status  for 
seqno 4555
2009-10-21 18:23:11,339 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: 
writeBlock receive buf size 131071 tcp no delay true
2009-10-21 18:23:11,339 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving block blk_-7356834145770439479_1586 src: /XX,XX,XX.142:49468 dest: 
/XX,XX,XX.139:51010
2009-10-21 18:23:11,339 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Recover the RBW replica blk_-7356834145770439479_1586
2009-10-21 18:23:11,339 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Recovering replica ReplicaBeingWritten, blk_-7356834145770439479_1586, RBW
  getNumBytes()     = 58415616
  getBytesOnDisk()  = 58415616
  getVisibleLength()= 58025472
  getVolume()       = /d3/stack/dfs.data.dir/current/finalized
  getBlockFile()    = 
/d3/stack/dfs.data.dir/current/rbw/blk_-7356834145770439479
  bytesAcked=58025472
  bytesOnDisk=58415616
{code}



> NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
> ----------------------------------------------------------------
>
>                 Key: HDFS-720
>                 URL: https://issues.apache.org/jira/browse/HDFS-720
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.21.0
>         Environment: Current branch-0.21 of hdfs, mapreduce, and common.  
> Here is svn info:
> URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21
> Repository Root: https://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 827883
> Node Kind: directory
> Schedule: normal
> Last Changed Author: szetszwo
> Last Changed Rev: 826906
> Last Changed Date: 2009-10-20 00:16:25 +0000 (Tue, 20 Oct 2009)
>            Reporter: stack
>         Attachments: dn.log
>
>
> Running some loadings on hdfs I had one of these on the DN XX.XX.XX.139:51010:
> {code}
> 2009-10-21 04:57:02,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block blk_6345892463926159834_1029 src: /XX,XX,XX.140:37890 dest: 
> /XX.XX.XX.139:51010
> 2009-10-21 04:57:02,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder blk_6345892463926159834_1029 1 Exception 
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
>         at java.lang.Thread.run(Thread.java:619)
> {code}
> On XX,XX,XX.140 side, it looks like this:
> {code}
> 10-21 04:57:01,866 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block blk_6345892463926159834_1029 src: /XX.XX.XX.140:37385 dest: 
> /XX.XX.XX140:51010
> 2009-10-21 04:57:02,836 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder 2 for block blk_6345892463926159834_1029 terminating
> 2009-10-21 04:57:02,885 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(XX.XX.XX.140:51010, 
> storageID=DS-1292310101-208.76.44.140-51010-1256100924816, infoPort=51075, 
> ipcPort=51020):Exception writing block blk_6345892463926159834_1029 to mirror 
> XX.XX.XX.139:51010
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcher.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:75)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
>     at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
>     at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>     at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>     at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:466)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:434)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:573)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:352)
>     at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
>     at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
>     at java.lang.Thread.run(Thread.java:619)
> {code}
> Here is the bit of code inside the run method:
> {code}
>  922                   pkt = ackQueue.getFirst();
>  923                   expected = pkt.seqno;
> {code}
> So 'pkt' is null?  But LinkedList API says that it throws 
> NoSuchElementException if list is empty so you'd think we wouldn't get a NPE 
> here.  What am I missing?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to