bbeaudreault commented on code in PR #5384: URL: https://github.com/apache/hbase/pull/5384#discussion_r1321591860
########## hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java: ########## @@ -1721,14 +1765,36 @@ protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset, if (headerBuf == null) { headerBuf = onDiskBlock.duplicate().position(0).limit(hdrSize); } - // Do a few checks before we go instantiate HFileBlock. - assert onDiskSizeWithHeader > this.hdrSize; - verifyOnDiskSizeMatchesHeader(onDiskSizeWithHeader, headerBuf, offset, checksumSupport); + ByteBuff curBlock = onDiskBlock.duplicate().position(0).limit(onDiskSizeWithHeader); // Verify checksum of the data before using it for building HFileBlock. if (verifyChecksum && !validateChecksum(offset, curBlock, hdrSize)) { + invalidateNextBlockHeader(); + span.addEvent("Falling back to HDFS checksumming.", attributesBuilder.build()); return null; } + + // TODO: is this check necessary or can we proceed with a provided value regardless of Review Comment: Thanks for tracing that. So I think 4 actually seems reasonably concerning. However, now I'm trying to think how much this even matters: - So here we're yet again validating onDiskSizeWithHeader. - However, if we've reached this point, we've already used onDiskSizeWithHeader to successfully allocate a buffer, read into it, and verify checksums - So at this point, if verifyChecksum is true then I think we can trust onDiskSizeWithHeader. Otherwise some part of the above surely would have failed. - In terms of the case where verifyChecksum is false, I suppose it depends on how much we trust HDFS checksums and/or want to validate against a non-corruption hbase-internal bug. Given the hotness of this codepath, maybe we don't want to do the work of unit tests here (protecting against hbase internal bugs)? So maybe it depends on trusting hdfs checksums, which seems reasonable? In that case, maybe don't do this verification? WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org