bbeaudreault commented on code in PR #5384:
URL: https://github.com/apache/hbase/pull/5384#discussion_r1323097842
########## hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java: ##########
@@ -1700,6 +1721,21 @@ protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
       }
       onDiskSizeWithHeader = getOnDiskSizeWithHeader(headerBuf, checksumSupport);
     }
+
+    // The common case is that onDiskSizeWithHeader was produced by a read without checksum
+    // validation, so give it a sanity check before trying to use it.
+    //
+    // TODO: Should we also check the value vs. some multiple of fileContext.getBlocksize() ?

Review Comment:
   Ok, that's a good point. I was thinking smaller values, but yeah, it would be good to avoid huge allocations. I think a block should probably be no larger than `fileContext.getBlocksize() + conf.getLong(HRegion.HBASE_MAX_CELL_SIZE_KEY) + someSmallBytesOverhead`. This is of course specific to DATA blocks, but I think INDEX/BLOOM/etc. should be well under that.

   Of course, someone could change the value of HBASE_MAX_CELL_SIZE_KEY after writing some larger block, and we should still be able to read that block. So maybe this is not a safe general assumption for this code path. Ideally the max block size could be stored as metadata on the storefile, and we could use that as a very accurate guardrail here. But that would warrant a new JIRA I think, so maybe we go with your idea or even make it configurable.
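   To make the idea concrete, here is a rough sketch of the bound I have in mind (not the actual patch; the standalone class, the `SMALL_BYTES_OVERHEAD` constant, and the inlined key/default standing in for `HRegion.HBASE_MAX_CELL_SIZE_KEY` are placeholders for illustration, and the real check would live inside `HFileBlock.readBlockDataInternal`):

   ```java
   import java.io.IOException;

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hbase.io.hfile.HFileContext;

   public final class OnDiskSizeSanityCheck {
     // Placeholder slack for block header, checksum, and encoding overhead;
     // the right value would need to be worked out in review.
     private static final long SMALL_BYTES_OVERHEAD = 64 * 1024;

     // Mirrors HRegion.HBASE_MAX_CELL_SIZE_KEY and its 10MB default.
     private static final String MAX_CELL_SIZE_KEY = "hbase.server.keyvalue.maxsize";
     private static final long DEFAULT_MAX_CELL_SIZE = 10L * 1024 * 1024;

     /**
      * Rejects an onDiskSizeWithHeader that could not plausibly have been
      * written, so a corrupt or unvalidated header can't trigger a huge
      * allocation.
      */
     static void checkOnDiskSizeWithHeader(long onDiskSizeWithHeader,
       HFileContext fileContext, Configuration conf) throws IOException {
       long maxCellSize = conf.getLong(MAX_CELL_SIZE_KEY, DEFAULT_MAX_CELL_SIZE);
       long maxBlockSize =
         (long) fileContext.getBlocksize() + maxCellSize + SMALL_BYTES_OVERHEAD;
       if (onDiskSizeWithHeader <= 0 || onDiskSizeWithHeader > maxBlockSize) {
         throw new IOException("Invalid onDiskSizeWithHeader=" + onDiskSizeWithHeader
           + ", expected a value in (0, " + maxBlockSize + "]");
       }
     }
   }
   ```

   The caveat above still applies, though: if HBASE_MAX_CELL_SIZE_KEY is lowered after larger blocks were written, a bound like this would falsely reject valid blocks, which is why storefile metadata would be the more accurate source.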