[ 
https://issues.apache.org/jira/browse/HDFS-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Prakash updated HDFS-10529:
----------------------------------
    Attachment: HDFS-10529.000.patch

Attached is a patch that attempts to fix this.

Currently, when any new block is created, an RBW replica is made first and 
then finalized; the finalized replica is passed to addFinalizedBlock() in 
BlockPoolSlice along with the intended total block size.

By looking at the difference between getOriginalBytesReserved() and 
getBytesReserved() it is possible to see how many bytes were actually 
written, and dfsUsage can be increased by that delta. The usage for the meta 
file can be added in full, since the old meta file is removed when the new 
block is created.
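
A minimal sketch of the delta-based accounting (the method shape and the 
incDfsUsed() call are illustrative, not the exact patch):

{code:java}
// Sketch only: assumes ReplicaInPipeline exposes getOriginalBytesReserved()
// and getBytesReserved(); their difference is the number of bytes actually
// written during this write pipeline.
void addFinalizedBlockDelta(ReplicaInPipeline replica, File metaFile) {
  long dataDelta =
      replica.getOriginalBytesReserved() - replica.getBytesReserved();
  // The old meta file is deleted when the new block is created, so the new
  // meta file's full length counts as new usage.
  long metaDelta = metaFile.length();
  incDfsUsed(dataDelta + metaDelta); // hypothetical usage-counter update
}
{code}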


> Df reports incorrect usage when appending less than block size
> --------------------------------------------------------------
>
>                 Key: HDFS-10529
>                 URL: https://issues.apache.org/jira/browse/HDFS-10529
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.2, 3.0.0-alpha1
>            Reporter: Pranav Prakash
>            Priority: Minor
>              Labels: datanode, fs, hdfs
>         Attachments: HDFS-10529.000.patch
>
>
> Steps to recreate the issue:
> 1. Create a 100MB file on an HDFS cluster with a 128MB block size and a 
> replication factor of 3.
> 2. Append 100MB to the file.
> 3. Df reports around 900MB even though it should only be around 600MB.
> Looking at the blocks confirms that df is incorrect, as there exist only two 
> blocks on each DN -- a 128MB block and a 72MB block. (These steps are 
> sketched in code below.)
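> A hedged reproduction sketch through the FileSystem API (the path and the 
> cluster configuration are illustrative):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class AppendDfRepro {
>   public static void main(String[] args) throws Exception {
>     // Assumes dfs.blocksize=128m and dfs.replication=3 in the loaded config.
>     Configuration conf = new Configuration();
>     FileSystem fs = FileSystem.get(conf);
>     Path p = new Path("/tmp/append-df-repro"); // illustrative path
>     byte[] chunk = new byte[100 * 1024 * 1024]; // 100MB buffer
>     try (FSDataOutputStream out = fs.create(p)) {
>       out.write(chunk); // step 1: create a 100MB file
>     }
>     try (FSDataOutputStream out = fs.append(p)) {
>       out.write(chunk); // step 2: append another 100MB
>     }
>     // Step 3: compare this with the DataNodes' view (e.g. dfsadmin -report);
>     // expected ~600MB used across the cluster, but the bug yields ~900MB.
>     System.out.println("DFS used: " + fs.getStatus().getUsed());
>   }
> }
> {code}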
> This issue seems to arise because BlockPoolSlice does not account for the 
> delta increase in dfsUsage when an append happens to a partially-filled 
> block, and instead naively adds the total block size. For instance, in the 
> example scenario, when the block is "filled" from 100MB to 128MB, 
> addFinalizedBlock() in BlockPoolSlice adds the size of the newly created 
> block into the total instead of accounting for the difference/delta in block 
> size between old and new. This has the effect of double-counting the old 
> partially-filled block: it is counted once when it is first created (in the 
> example scenario, when the 100MB file is created) and again when it becomes 
> part of the filled block (in the example scenario, when the 128MB block is 
> formed from the initial 100MB block). Thus the perceived size becomes 100MB + 
> 128MB + 72MB = 300MB for each DN, or 900MB across the cluster.
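> A small worked sketch of the two accounting strategies for the scenario 
> above (all values in MB; variable names are illustrative):
> {code:java}
> long usedAfterCreate = 100;  // initial partially-filled 100MB block
> // Buggy: addFinalizedBlock() adds the full size of the re-finalized block
> // plus the new second block, double-counting the original 100MB.
> long buggyUsedPerDn = usedAfterCreate + 128 + 72;          // 300MB per DN
> // Fixed: add only the delta written into the old block, plus the new block.
> long fixedUsedPerDn = usedAfterCreate + (128 - 100) + 72;  // 200MB per DN
> // Across 3 replicas: buggy = 900MB, fixed = 600MB.
> {code}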



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
