ndimiduk commented on code in PR #5384:
URL: https://github.com/apache/hbase/pull/5384#discussion_r1323068390
########## hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java: ##########

```diff
@@ -1658,16 +1676,19 @@ private ByteBuff allocate(int size, boolean intoHeap) {
   protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
     long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, boolean updateMetrics,
     boolean intoHeap) throws IOException {
-    if (offset < 0) {
-      throw new IOException("Invalid offset=" + offset + " trying to read " + "block (onDiskSize="
-        + onDiskSizeWithHeaderL + ")");
-    }
-
     final Span span = Span.current();
     final AttributesBuilder attributesBuilder = Attributes.builder();
     Optional.of(Context.current()).map(val -> val.get(CONTEXT_KEY))
       .ifPresent(c -> c.accept(attributesBuilder));
-    int onDiskSizeWithHeader = checkAndGetSizeAsInt(onDiskSizeWithHeaderL, hdrSize);
+    if (offset < 0) {
+      throw new IOException("Invalid offset=" + offset + " trying to read " + "block (onDiskSize="
+        + onDiskSizeWithHeaderL + ")");
+    }
+    if (!checkCallerProvidedOnDiskSizeWithHeader(onDiskSizeWithHeaderL)) {
+      throw new IOException(
```

Review Comment:
I think that's a reasonable approach: treat the provided block size as a hint and nothing more.

########## hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java: ##########

```diff
@@ -1700,6 +1721,21 @@ protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
       }
       onDiskSizeWithHeader = getOnDiskSizeWithHeader(headerBuf, checksumSupport);
     }
+
+    // The common case is that onDiskSizeWithHeader was produced by a read without checksum
+    // validation, so give it a sanity check before trying to use it.
+    //
+    // TODO: Should we also check the value vs. some multiple of fileContext.getBlocksize() ?
```

Review Comment:
Yeah, which is why I left it as a TODO. I'm just thinking about how to put a reasonable upper bound on the size of the memory segment we allocate before performing the read.
There's a lot of space between 33 bytes and Integer.MAX_VALUE bytes, a good portion of which is not reasonable. I was thinking of something absurdly large for a block, like 1 GiB or 10 GiB. Or something like `10^6 * fileContext.getBlocksize()`.
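A minimal sketch of the hint-validation idea discussed above. The method name `checkCallerProvidedOnDiskSizeWithHeader` comes from the diff, but the class name, constants, and exact semantics here (sentinel `-1` for "unknown", a 1 GiB upper bound) are illustrative assumptions for the example, not the actual HFileBlock implementation:

```java
// Illustrative sketch only: HDR_SIZE, MAX_ON_DISK_SIZE_WITH_HEADER, and the
// -1 "unknown" convention are assumptions for this example, not the real
// HFileBlock members.
public class BlockSizeHintCheck {
  // Smallest plausible on-disk size: just the 33-byte block header.
  static final int HDR_SIZE = 33;
  // An "absurdly large" upper bound, e.g. 1 GiB, as floated in the review.
  static final long MAX_ON_DISK_SIZE_WITH_HEADER = 1L << 30;

  /**
   * Treat the caller-provided size purely as a hint: -1 means "unknown";
   * anything else must land between the header size and the upper bound,
   * so we never allocate an unreasonable buffer on a bogus hint.
   */
  static boolean checkCallerProvidedOnDiskSizeWithHeader(long sizeWithHeader) {
    if (sizeWithHeader == -1) {
      return true; // unknown; the size will be read from the block header instead
    }
    return sizeWithHeader >= HDR_SIZE && sizeWithHeader <= MAX_ON_DISK_SIZE_WITH_HEADER;
  }

  public static void main(String[] args) {
    System.out.println(checkCallerProvidedOnDiskSizeWithHeader(-1));       // true: unknown is fine
    System.out.println(checkCallerProvidedOnDiskSizeWithHeader(65_569));   // true: plausible block
    System.out.println(checkCallerProvidedOnDiskSizeWithHeader(10));       // false: smaller than header
    System.out.println(checkCallerProvidedOnDiskSizeWithHeader(1L << 40)); // false: absurdly large
  }
}
```

A rejected hint then simply falls back to reading the size from the block header itself, which matches the "hint and nothing more" framing above.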