[ 
https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019492#comment-14019492
 ] 

stanley shi commented on HDFS-6489:
-----------------------------------

No, the disk usage will not go back down untill after 10 minutes (the next 
round of "du -sk" is called). 

I printed the stack trace of the BlockPoolSlice.addBlock, which does the actual 
increase of the disk usage
{quote}at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addBlock(BlockPoolSlice.ja
va:157)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlock(FsVolumeImpl.java:1
67)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeReplica(FsDatasetIm
pl.java:961)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl
.java:942)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:727)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:722){quote}
we can see it is called at "finalizeBlock" phase. at this phase, we should 
already know how much data we wrote, then increasing the dfs usage with the 
total blocksize doesn't make sense.


> DFS Used space is not correct computed on frequent append operations
> --------------------------------------------------------------------
>
>                 Key: HDFS-6489
>                 URL: https://issues.apache.org/jira/browse/HDFS-6489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.2.0
>            Reporter: stanley shi
>
> The current implementation of the Datanode will increase the DFS used space 
> on each block write operation. This is correct in most scenario (create new 
> file), but sometimes it will behave in-correct(append small data to a large 
> block).
> For example, I have a file with only one block(say, 60M). Then I try to 
> append to it very frequently but each time I append only 10 bytes;
> Then on each append, dfs used will be increased with the length of the 
> block(60M), not teh actual data length(10bytes).
> Consider in a scenario I use many clients to append concurrently to a large 
> number of files (1000+), assume the block size is 32M (half of the default 
> value), then the dfs used will be increased 1000*32M = 32G on each append to 
> the files; but actually I only write 10K bytes; this will cause the datanode 
> to report in-sufficient disk space on data write.
> {quote}2014-06-04 15:27:34,719 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock  
> BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received 
> exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: 
> Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, 
> FINALIZED{quote}
> But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda3              16G  2.9G   13G  20% /
> tmpfs                 1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1              97M   32M   61M  35% /boot
> {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to