bbeaudreault commented on code in PR #5384:
URL: https://github.com/apache/hbase/pull/5384#discussion_r1323097842
########## hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java: ##########
@@ -1700,6 +1721,21 @@ protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
       }
       onDiskSizeWithHeader = getOnDiskSizeWithHeader(headerBuf, checksumSupport);
     }
+
+    // The common case is that onDiskSizeWithHeader was produced by a read without checksum
+    // validation, so give it a sanity check before trying to use it.
+    //
+    // TODO: Should we also check the value vs. some multiple of fileContext.getBlocksize() ?

Review Comment:
   Ok, that's a good point. I was thinking smaller values, but yeah, it would be good to avoid huge allocations. I think a block should probably be no larger than `fileContext.getBlocksize() + conf.getLong(HRegion.HBASE_MAX_CELL_SIZE_KEY) + someSmallBytesOverhead`. This is of course specific to DATA blocks, but I think INDEX/BLOOM/etc. should be well under that.

   Of course, someone could change the value of HBASE_MAX_CELL_SIZE_KEY after writing some larger block, and we should still be able to read that block. So maybe this is not a safe general assumption for this code path. Ideally the max block size could be stored as metadata on the storefile, and we could use that as a very accurate guardrail here. But that would warrant a new JIRA I think, so maybe we go with your idea or even make it configurable.
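   To make the idea concrete, here is a rough sketch of the bound I have in mind (not the actual patch; the standalone class, the `SMALL_BYTES_OVERHEAD` constant, and the inlined key/default standing in for `HRegion.HBASE_MAX_CELL_SIZE_KEY` are placeholders for illustration, and the real check would live inside `HFileBlock.readBlockDataInternal`):

   ```java
   import java.io.IOException;

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hbase.io.hfile.HFileContext;

   public final class OnDiskSizeSanityCheck {
     // Placeholder slack for block header, checksum, and encoding overhead;
     // the right value would need to be worked out in review.
     private static final long SMALL_BYTES_OVERHEAD = 64 * 1024;

     // Mirrors HRegion.HBASE_MAX_CELL_SIZE_KEY and its 10MB default.
     private static final String MAX_CELL_SIZE_KEY = "hbase.server.keyvalue.maxsize";
     private static final long DEFAULT_MAX_CELL_SIZE = 10L * 1024 * 1024;

     /**
      * Rejects an onDiskSizeWithHeader that could not plausibly have been
      * written, so a corrupt or unvalidated header can't trigger a huge
      * allocation.
      */
     static void checkOnDiskSizeWithHeader(long onDiskSizeWithHeader,
       HFileContext fileContext, Configuration conf) throws IOException {
       long maxCellSize = conf.getLong(MAX_CELL_SIZE_KEY, DEFAULT_MAX_CELL_SIZE);
       long maxBlockSize =
         (long) fileContext.getBlocksize() + maxCellSize + SMALL_BYTES_OVERHEAD;
       if (onDiskSizeWithHeader <= 0 || onDiskSizeWithHeader > maxBlockSize) {
         throw new IOException("Invalid onDiskSizeWithHeader=" + onDiskSizeWithHeader
           + ", expected a value in (0, " + maxBlockSize + "]");
       }
     }
   }
   ```

   The caveat above still applies, though: if HBASE_MAX_CELL_SIZE_KEY is lowered after larger blocks were written, a bound like this would falsely reject valid blocks, which is why storefile metadata would be the more accurate source.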