[
https://issues.apache.org/jira/browse/HDFS-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023363#comment-18023363
]
ASF GitHub Bot commented on HDFS-17497:
---------------------------------------
github-actions[bot] commented on PR #6765:
URL: https://github.com/apache/hadoop/pull/6765#issuecomment-3342139953
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> Logic for committed blocks is mixed when computing file size
> ------------------------------------------------------------
>
> Key: HDFS-17497
> URL: https://issues.apache.org/jira/browse/HDFS-17497
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: ZanderXu
> Priority: Major
> Labels: pull-request-available
>
> An HDFS file that is still being written may contain multiple committed
> blocks, as in the following cases (assume the file contains three blocks):
> || ||Block 1||Block 2||Block 3||
> |Case 1|Complete|Committed|UnderConstruction|
> |Case 2|Complete|Committed|Committed|
> |Case 3|Committed|Committed|Committed|
>
> But the handling of committed blocks is inconsistent when computing the file
> size: the bytes of the last committed block are ignored, while the bytes of
> every other committed block are counted.
> {code:java}
> public final long computeFileSize(boolean includesLastUcBlock,
>     boolean usePreferredBlockSize4LastUcBlock) {
>   if (blocks.length == 0) {
>     return 0;
>   }
>   final int last = blocks.length - 1;
>   // check if the last block is BlockInfoUnderConstruction
>   BlockInfo lastBlk = blocks[last];
>   long size = lastBlk.getNumBytes();
>   // the last committed block is not complete, so its bytes may be ignored
>   if (!lastBlk.isComplete()) {
>     if (!includesLastUcBlock) {
>       size = 0;
>     } else if (usePreferredBlockSize4LastUcBlock) {
>       size = isStriped() ?
>           getPreferredBlockSize() *
>               ((BlockInfoStriped) lastBlk).getDataBlockNum() :
>           getPreferredBlockSize();
>     }
>   }
>   // the bytes of all other committed blocks are counted into the file length
>   for (int i = 0; i < last; i++) {
>     size += blocks[i].getNumBytes();
>   }
>   return size;
> } {code}
> The byte count of a committed block can no longer change, so the bytes of the
> last committed block should be counted in the file length as well.
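>
> A minimal, self-contained sketch of that rule (a toy model, not the HDFS
> source; {{State}} and {{Block}} here are illustrative stand-ins for
> {{BlockInfo}} and its construction state):
> {code:java}
> import java.util.List;
>
> public class CommittedBlockSketch {
>   enum State { COMPLETE, COMMITTED, UNDER_CONSTRUCTION }
>
>   record Block(State state, long numBytes) {}
>
>   // Only a genuinely under-construction last block is treated specially;
>   // a COMMITTED last block counts its real bytes like any complete block.
>   static long computeFileSize(List<Block> blocks, boolean includesLastUcBlock) {
>     if (blocks.isEmpty()) {
>       return 0;
>     }
>     long size = 0;
>     int last = blocks.size() - 1;
>     for (int i = 0; i < last; i++) {
>       size += blocks.get(i).numBytes(); // complete or committed: length is final
>     }
>     Block lastBlk = blocks.get(last);
>     if (lastBlk.state() == State.UNDER_CONSTRUCTION) {
>       size += includesLastUcBlock ? lastBlk.numBytes() : 0;
>     } else {
>       size += lastBlk.numBytes(); // COMMITTED bytes can no longer change
>     }
>     return size;
>   }
>
>   public static void main(String[] args) {
>     // Case 2 from the table above: Complete, Committed, Committed
>     List<Block> case2 = List.of(
>         new Block(State.COMPLETE, 128),
>         new Block(State.COMMITTED, 128),
>         new Block(State.COMMITTED, 64));
>     System.out.println(computeFileSize(case2, false)); // prints 320
>   }
> }
> {code}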
>
> The handling of committed blocks is likewise inconsistent when computing the
> file length in DFSInputStream. Normally DFSInputStream does not need to fetch
> the visible length of a committed block, regardless of whether the committed
> block is the last block or not.
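>
> As an illustration (again the toy model above, not the DFSInputStream source;
> {{fetchVisibleLength}} is a hypothetical stand-in for the DataNode RPC), only
> an under-construction block would need a round trip:
> {code:java}
> // Reuses the State/Block toy model from the previous sketch.
> static long readerFileLength(List<Block> blocks) {
>   long size = 0;
>   for (Block b : blocks) {
>     if (b.state() == State.UNDER_CONSTRUCTION) {
>       size += fetchVisibleLength(b); // the only case needing a DataNode RPC
>     } else {
>       size += b.numBytes(); // COMPLETE and COMMITTED lengths are final
>     }
>   }
>   return size;
> }
>
> // Hypothetical stand-in for asking a DataNode for a replica's visible length.
> static long fetchVisibleLength(Block b) {
>   return b.numBytes();
> }
> {code}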
>
> HDFS-10843 hit a bug that was actually caused by a committed block, but it
> fixed that bug by updating the quota usage when the block is completed. Since
> the number of bytes of a committed block can no longer change, we should
> update the quota usage as soon as the block is committed, which shrinks the
> window during which the quota usage is stale.
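>
> A sketch of the quota arithmetic (an assumption about the shape of the fix,
> not the actual FSDirectory code; {{quotaDeltaOnCommit}} is a hypothetical
> helper): once the block is committed its length is final, so the space
> reserved at the preferred block size can be trimmed right away:
> {code:java}
> // Returns the (typically negative) change in consumed space when a block is
> // committed: the reservation at preferred block size is replaced by the
> // block's final byte count, scaled by the replication factor.
> static long quotaDeltaOnCommit(long preferredBlockSize, long committedBytes,
>     short replication) {
>   return (committedBytes - preferredBlockSize) * replication;
> }
> {code}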
>
> So there are a few things we need to do:
> * Unify the calculation logic for all committed blocks in
> {{computeFileSize}} of {{INodeFile}}
> * Unify the calculation logic for all committed blocks in {{getFileLength}}
> of {{DFSInputStream}}
> * Update the quota usage when a block is committed