ndimiduk commented on code in PR #5384:
URL: https://github.com/apache/hbase/pull/5384#discussion_r1323068390
########## hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java: ##########

```diff
@@ -1658,16 +1676,19 @@ private ByteBuff allocate(int size, boolean intoHeap) {
   protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
     long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, boolean updateMetrics,
     boolean intoHeap) throws IOException {
-    if (offset < 0) {
-      throw new IOException("Invalid offset=" + offset + " trying to read " + "block (onDiskSize="
-        + onDiskSizeWithHeaderL + ")");
-    }
-
     final Span span = Span.current();
     final AttributesBuilder attributesBuilder = Attributes.builder();
     Optional.of(Context.current()).map(val -> val.get(CONTEXT_KEY))
       .ifPresent(c -> c.accept(attributesBuilder));
-    int onDiskSizeWithHeader = checkAndGetSizeAsInt(onDiskSizeWithHeaderL, hdrSize);
+    if (offset < 0) {
+      throw new IOException("Invalid offset=" + offset + " trying to read " + "block (onDiskSize="
+        + onDiskSizeWithHeaderL + ")");
+    }
+    if (!checkCallerProvidedOnDiskSizeWithHeader(onDiskSizeWithHeaderL)) {
+      throw new IOException(
```

Review Comment:
I think that's a reasonable approach: treat the provided block size as a hint and nothing more.

########## hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java: ##########

```diff
@@ -1700,6 +1721,21 @@ protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
       }
       onDiskSizeWithHeader = getOnDiskSizeWithHeader(headerBuf, checksumSupport);
     }
+
+    // The common case is that onDiskSizeWithHeader was produced by a read without checksum
+    // validation, so give it a sanity check before trying to use it.
+    //
+    // TODO: Should we also check the value vs. some multiple of fileContext.getBlocksize() ?
```

Review Comment:
Yeah, which is why I left it as a TODO. I'm just thinking about how to put a reasonable upper bound on the size of the memory segment we allocate before performing the read.
There's a lot of space between 33 bytes and Integer.MAX_VALUE bytes, a good portion of which is not reasonable. I was thinking of something absurdly large for a block, like 1 GiB or 10 GiB. Or something like `10^6 * fileContext.getBlocksize()`.
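A minimal sketch of the hint-validation idea discussed above. The method name `checkCallerProvidedOnDiskSizeWithHeader` comes from the diff, but the class name, constants, and exact semantics here (sentinel `-1` for "unknown", a 1 GiB upper bound) are illustrative assumptions for the example, not the actual HFileBlock implementation:

```java
// Illustrative sketch only: HDR_SIZE, MAX_ON_DISK_SIZE_WITH_HEADER, and the
// -1 "unknown" convention are assumptions for this example, not the real
// HFileBlock members.
public class BlockSizeHintCheck {
  // Smallest plausible on-disk size: just the 33-byte block header.
  static final int HDR_SIZE = 33;
  // An "absurdly large" upper bound, e.g. 1 GiB, as floated in the review.
  static final long MAX_ON_DISK_SIZE_WITH_HEADER = 1L << 30;

  /**
   * Treat the caller-provided size purely as a hint: -1 means "unknown";
   * anything else must land between the header size and the upper bound,
   * so we never allocate an unreasonable buffer on a bogus hint.
   */
  static boolean checkCallerProvidedOnDiskSizeWithHeader(long sizeWithHeader) {
    if (sizeWithHeader == -1) {
      return true; // unknown; the size will be read from the block header instead
    }
    return sizeWithHeader >= HDR_SIZE && sizeWithHeader <= MAX_ON_DISK_SIZE_WITH_HEADER;
  }

  public static void main(String[] args) {
    System.out.println(checkCallerProvidedOnDiskSizeWithHeader(-1));       // true: unknown is fine
    System.out.println(checkCallerProvidedOnDiskSizeWithHeader(65_569));   // true: plausible block
    System.out.println(checkCallerProvidedOnDiskSizeWithHeader(10));       // false: smaller than header
    System.out.println(checkCallerProvidedOnDiskSizeWithHeader(1L << 40)); // false: absurdly large
  }
}
```

A rejected hint then simply falls back to reading the size from the block header itself, which matches the "hint and nothing more" framing above.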