[ https://issues.apache.org/jira/browse/HDFS-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840099#comment-17840099 ]
ASF GitHub Bot commented on HDFS-17497:
---------------------------------------

ZanderXu opened a new pull request, #6765:
URL: https://github.com/apache/hadoop/pull/6765

> Logic for committed blocks is mixed when computing file size
> -------------------------------------------------------------
>
>                 Key: HDFS-17497
>                 URL: https://issues.apache.org/jira/browse/HDFS-17497
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: ZanderXu
>            Priority: Major
>
> An HDFS file that is still being written may contain multiple committed
> blocks, as follows (assume the file contains three blocks):
>
> || ||Block 1||Block 2||Block 3||
> |Case 1|Complete|Committed|UnderConstruction|
> |Case 2|Complete|Committed|Committed|
> |Case 3|Committed|Committed|Committed|
>
> But the logic for committed blocks is inconsistent when computing the file
> size: it ignores the bytes of the last committed block while counting the
> bytes of every other committed block.
> {code:java}
> public final long computeFileSize(boolean includesLastUcBlock,
>     boolean usePreferredBlockSize4LastUcBlock) {
>   if (blocks.length == 0) {
>     return 0;
>   }
>   final int last = blocks.length - 1;
>   // check if the last block is BlockInfoUnderConstruction
>   BlockInfo lastBlk = blocks[last];
>   long size = lastBlk.getNumBytes();
>   // The last block is not complete, so its bytes may be ignored
>   // even when it is only committed.
>   if (!lastBlk.isComplete()) {
>     if (!includesLastUcBlock) {
>       size = 0;
>     } else if (usePreferredBlockSize4LastUcBlock) {
>       size = isStriped() ?
>           getPreferredBlockSize() *
>               ((BlockInfoStriped) lastBlk).getDataBlockNum() :
>           getPreferredBlockSize();
>     }
>   }
>   // The bytes of all other committed blocks are counted toward the file length.
>   for (int i = 0; i < last; i++) {
>     size += blocks[i].getNumBytes();
>   }
>   return size;
> } {code}
> The byte count of a committed block will not change, so the bytes of the
> last committed block should be counted toward the file length as well.
>
> The logic for committed blocks is also inconsistent when computing the file
> length in DFSInputStream: normally DFSInputStream does not include a
> committed block's bytes in the visible length, regardless of whether the
> committed block is the last block or not.
>
> HDFS-10843 reported a bug that was actually caused by a committed block,
> but HDFS-10843 fixed that bug in a different way.
> The number of bytes of a committed block will no longer change, so we
> should also update the quota usage when a block is committed, which
> reduces the pending delta in quota usage promptly.
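A self-contained sketch of the direction described above (model code only: Block, BlockState, currentFileSize, and proposedFileSize are illustrative stand-ins, not Hadoop's BlockInfo/BlockUCState types or the actual change in #6765; the preferred-block-size branch is omitted for brevity). The idea is that a COMMITTED block's length is final, so only a genuinely UNDER_CONSTRUCTION last block needs special-casing:

{code:java}
import java.util.List;

// Stand-ins for Hadoop's block states; not the real types.
enum BlockState { COMPLETE, COMMITTED, UNDER_CONSTRUCTION }

record Block(BlockState state, long numBytes) {}

public class CommittedBlockSizeSketch {

  // Mirrors the quoted computeFileSize: the last block's bytes are dropped
  // whenever it is not COMPLETE, even if it is merely COMMITTED.
  static long currentFileSize(List<Block> blocks, boolean includesLastUcBlock) {
    if (blocks.isEmpty()) {
      return 0;
    }
    Block last = blocks.get(blocks.size() - 1);
    long size = last.numBytes();
    if (last.state() != BlockState.COMPLETE && !includesLastUcBlock) {
      size = 0; // a committed last block is ignored here...
    }
    for (int i = 0; i < blocks.size() - 1; i++) {
      size += blocks.get(i).numBytes(); // ...but committed blocks count here
    }
    return size;
  }

  // Proposed direction: COMMITTED bytes are final and always counted; only
  // an UNDER_CONSTRUCTION last block is subject to the includes flag.
  static long proposedFileSize(List<Block> blocks, boolean includesLastUcBlock) {
    if (blocks.isEmpty()) {
      return 0;
    }
    Block last = blocks.get(blocks.size() - 1);
    long size = last.numBytes();
    if (last.state() == BlockState.UNDER_CONSTRUCTION && !includesLastUcBlock) {
      size = 0;
    }
    for (int i = 0; i < blocks.size() - 1; i++) {
      size += blocks.get(i).numBytes();
    }
    return size;
  }

  public static void main(String[] args) {
    // Case 2 from the table above: Complete | Committed | Committed.
    List<Block> caseTwo = List.of(
        new Block(BlockState.COMPLETE, 128),
        new Block(BlockState.COMMITTED, 128),
        new Block(BlockState.COMMITTED, 64));
    System.out.println(currentFileSize(caseTwo, false));  // 256: last committed block dropped
    System.out.println(proposedFileSize(caseTwo, false)); // 320: committed bytes counted
  }
}
{code}

Counting committed bytes unconditionally is consistent with the quota point in the issue: a block's length is finalized at commit time, so both the file size and the quota usage can be settled then rather than waiting for the block to become complete.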