[ https://issues.apache.org/jira/browse/HDFS-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Wang updated HDFS-10529:
-------------------------------
    Target Version/s: 3.0.0-alpha2  (was: 2.8.0, 3.0.0-alpha1)

> Df reports incorrect usage when appending less than block size
> --------------------------------------------------------------
>
>                 Key: HDFS-10529
>                 URL: https://issues.apache.org/jira/browse/HDFS-10529
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.2, 3.0.0-alpha1
>            Reporter: Pranav Prakash
>            Assignee: Pranav Prakash
>            Priority: Minor
>              Labels: datanode, fs, hdfs
>         Attachments: HDFS-10529.000.patch
>
> Steps to recreate issue:
> 1. Create a 100MB file on an HDFS cluster with a 128MB block size and replication factor 3
> 2. Append 100MB to the file
> 3. Df reports around 900MB even though it should only be around 600MB.
> Looking at the blocks confirms that df is incorrect, as there exist only two blocks on each DN -- a 128MB block and a 72MB block.
> This issue seems to arise because BlockPoolSlice does not account for the delta increase in dfsUsage when an append happens to a partially-filled block, and instead naively adds the total block size. For instance, in the example scenario, when a block is "filled" from 100MB to 128MB, addFinalizedBlock() in BlockPoolSlice adds the size of the newly created block into the total instead of accounting for the difference/delta in block size between old and new. This has the effect of double-counting the old partially-filled block: it is counted once when it is first created (in the example scenario, when the 100MB file is created) and again when it becomes part of the filled block (in the example scenario, when the 128MB block is formed from the initial 100MB block). Thus the perceived size becomes 100MB + 128MB + 72MB = 300MB for each DN, or 900MB across the cluster.
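The double-counting described above can be illustrated with a minimal sketch. This is not the actual BlockPoolSlice code; the class `VolumeUsage` and its methods are hypothetical stand-ins for the dfsUsage bookkeeping, written only to show the difference between adding the full finalized-block size and adding the delta over bytes already counted.

```java
// Illustrative sketch only -- VolumeUsage and its methods are hypothetical
// stand-ins for BlockPoolSlice's dfsUsage accounting, not HDFS source.
public class VolumeUsage {
    private long dfsUsed = 0;

    // Buggy behavior described in the report: on finalize, add the full
    // block size, double-counting bytes already counted when the
    // partially-filled replica was first created.
    void addFinalizedBlockBuggy(long newBlockBytes) {
        dfsUsed += newBlockBytes;
    }

    // Fixed behavior: add only the delta between the finalized block and
    // the bytes already counted for the pre-existing partial block.
    void addFinalizedBlockFixed(long newBlockBytes, long previouslyCountedBytes) {
        dfsUsed += newBlockBytes - previouslyCountedBytes;
    }

    long getDfsUsed() { return dfsUsed; }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;

        // Scenario from the report, per DataNode: create a 100MB file,
        // then append 100MB with a 128MB block size, ending with one
        // 128MB block and one 72MB block on disk.
        VolumeUsage buggy = new VolumeUsage();
        buggy.addFinalizedBlockBuggy(100 * MB); // initial 100MB block created
        buggy.addFinalizedBlockBuggy(128 * MB); // block "filled" to 128MB on append
        buggy.addFinalizedBlockBuggy(72 * MB);  // remaining 72MB block
        System.out.println("buggy per-DN MB: " + buggy.getDfsUsed() / MB);  // 300

        VolumeUsage fixed = new VolumeUsage();
        fixed.addFinalizedBlockFixed(100 * MB, 0);        // new block: full size
        fixed.addFinalizedBlockFixed(128 * MB, 100 * MB); // append: delta only (28MB)
        fixed.addFinalizedBlockFixed(72 * MB, 0);         // new block: full size
        System.out.println("fixed per-DN MB: " + fixed.getDfsUsed() / MB);  // 200
    }
}
```

With the delta accounting, each DN reports 200MB, giving the expected ~600MB across a replication-factor-3 cluster instead of the 900MB the report observes.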
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)