bbeaudreault commented on code in PR #5384:
URL: https://github.com/apache/hbase/pull/5384#discussion_r1321591860


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:
##########
@@ -1721,14 +1765,36 @@ protected HFileBlock 
readBlockDataInternal(FSDataInputStream is, long offset,
         if (headerBuf == null) {
           headerBuf = onDiskBlock.duplicate().position(0).limit(hdrSize);
         }
-        // Do a few checks before we go instantiate HFileBlock.
-        assert onDiskSizeWithHeader > this.hdrSize;
-        verifyOnDiskSizeMatchesHeader(onDiskSizeWithHeader, headerBuf, offset, 
checksumSupport);
+
         ByteBuff curBlock = 
onDiskBlock.duplicate().position(0).limit(onDiskSizeWithHeader);
         // Verify checksum of the data before using it for building HFileBlock.
         if (verifyChecksum && !validateChecksum(offset, curBlock, hdrSize)) {
+          invalidateNextBlockHeader();
+          span.addEvent("Falling back to HDFS checksumming.", 
attributesBuilder.build());
           return null;
         }
+
+        // TODO: is this check necessary or can we proceed with a provided 
value regardless of

Review Comment:
   Thanks for tracing that. So I think 4 actually seems reasonably concerning. 
However, now I'm trying to think how much this even matters:
   
   - So here we're yet again validating onDiskSizeWithHeader.
   - However, if we've reached this point, we've already used 
onDiskSizeWithHeader to successfully allocate a buffer, read into it, and 
verify checksums
   - So at this point, if verifyChecksum is true then I think we can trust 
onDiskSizeWithHeader. Otherwise some part of the above surely would have failed.
   - In terms of the case where verifyChecksum is false, I suppose it depends 
on how much we trust HDFS checksums and/or want to validate against a 
non-corruption hbase-internal bug.
   
   Given the hotness of this codepath, maybe we don't want to do the work of 
unit tests here (protecting against hbase internal bugs)? So maybe it depends 
on trusting hdfs checksums, which seems reasonable? In that case, maybe don't 
do this verification?
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to