[ 
https://issues.apache.org/jira/browse/HDFS-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771326#action_12771326
 ] 

dhruba borthakur commented on HDFS-743:
---------------------------------------

An application wrote 200 bytes to a file but failed to close the file. The 
replicas all have 200 bytes of the file, but the namenode still thinks that 
the file size is zero bytes. When the hard lease expires, the namenode writes 
the OP_CLOSE record to the transaction log, and the file length recorded for 
this block in the transaction log is zero. This by itself is not a bug. In 
hadoop 0.17, the namenode does not contact datanodes while doing lease recovery.

If the namenode is rebooted, the file size continues to be displayed as zero. 
The block report from the datanodes gives the block size as 200 bytes, but 
because of a bug in DatanodeDescriptor.reportDiff, this block size of 200 is 
ignored by the namenode. This by itself is still not a bug, because the file 
continues to be displayed as having zero size.

Now, if a datanode dies, the block of size 200 is replicated from one of the 
original datanodes to a new datanode. The new datanode sends a blockReceived 
command. The blockReceived command has the size 200 and the namenode accepts 
this value as the true size of the block and updates the block length to be 200 
bytes. The file is now displayed as having 200 bytes.

If one restarts the namenode, the file goes back to being zero size until the 
block needs to be replicated again, at which point the file shows up as being 
of size 200 once more. This is the cause of the mystery of the "fluctuating 
file size". 
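
The sequence above can be sketched as a small toy model. This is not Hadoop 
code: ToyNameNode and all of its method names are hypothetical, and the model 
assumes (as the restart behaviour described above implies) that a length 
learned via blockReceived is kept only in memory and is not written to the 
transaction log.

```python
class ToyNameNode:
    """Hypothetical stand-in for the namenode; not the real Hadoop classes."""

    def __init__(self):
        self.edit_log = []    # survives restarts (the transaction log)
        self.block_len = {}   # in-memory block lengths, rebuilt on restart

    def hard_lease_expired(self, block_id):
        # Per the comment: in 0.17 the namenode does not contact datanodes
        # during lease recovery, so OP_CLOSE is logged with length zero.
        self.edit_log.append(("OP_CLOSE", block_id, 0))
        self.block_len[block_id] = 0

    def restart(self):
        # In-memory state is lost and rebuilt from the transaction log.
        self.block_len = {}
        for op, block_id, length in self.edit_log:
            if op == "OP_CLOSE":
                self.block_len[block_id] = length

    def block_report(self, block_id, length):
        # Models the reportDiff bug: the reported length is ignored.
        pass

    def block_received(self, block_id, length):
        # The namenode accepts this as the true size.
        # Assumption: the update is in memory only and is not logged.
        self.block_len[block_id] = length

    def file_size(self, block_id):
        return self.block_len[block_id]


def simulate():
    nn = ToyNameNode()
    blk = "blk_1"  # hypothetical block id
    sizes = []

    nn.hard_lease_expired(blk)
    sizes.append(nn.file_size(blk))   # closed with length zero

    nn.restart()
    nn.block_report(blk, 200)
    sizes.append(nn.file_size(blk))   # block report is ignored

    nn.block_received(blk, 200)       # re-replication after a datanode dies
    sizes.append(nn.file_size(blk))

    nn.restart()
    nn.block_report(blk, 200)
    sizes.append(nn.file_size(blk))   # back to the logged length

    nn.block_received(blk, 200)       # next re-replication
    sizes.append(nn.file_size(blk))
    return sizes


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions the displayed size alternates between 0 and 200 
depending on whether a blockReceived has arrived since the last restart.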

> file size is fluctuating although file is closed
> ------------------------------------------------
>
>                 Key: HDFS-743
>                 URL: https://issues.apache.org/jira/browse/HDFS-743
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>
> I am seeing that the length of a file sometimes becomes zero after a namenode 
> restart. These files have only one block. All three replicas of that 
> block on the datanode(s) have non-zero size. Increasing the replication factor 
> of the file causes the file to show its correct non-zero length.
> I am marking this as a blocker because it is still to be investigated which 
> releases it affects. I am seeing this on 0.17.x very frequently. I might have 
> seen this on 0.20.x but do not have a reproducible case yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
