[GitHub] [hadoop] steveloughran commented on a diff in pull request #4862: HADOOP-18439. Fix VectoredIO for LocalFileSystem when checksum is enabled.

GitBox Wed, 07 Sep 2022 12:05:43 -0700


steveloughran commented on code in PR #4862:
URL: https://github.com/apache/hadoop/pull/4862#discussion_r965174231



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:
##########
@@ -371,12 +371,23 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes,
       IntBuffer sums = sumsBytes.asIntBuffer();
       sums.position(offset / FSInputChecker.CHECKSUM_SIZE);
       ByteBuffer current = data.duplicate();
-      int numChunks = data.remaining() / bytesPerSum;
+      int numFullChunks = data.remaining() / bytesPerSum;
+      boolean partialChunk = (data.remaining() % bytesPerSum != 0);

Review Comment:
   wrap in () because % isn't used enough for the precedence to be immediately 
obvious



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:
##########
@@ -396,11 +407,33 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes,
       return data;
     }
 
+    /**
+     * Validates range parameters.
+     * In case of CheckSum FS, we already have calculated
+     * fileLength so failing fast here.
+     * @param ranges requested ranges.
+     * @param fileLen length of file.
+     * @throws EOFException end of file exception.
+     */
+    private void validateRangeRequest(List<? extends FileRange> ranges, long 
fileLen) throws EOFException {
+      for (FileRange range : ranges) {
+        VectoredReadUtils.validateRangeRequest(range);
+        if (range.getOffset() + range.getLength() > fileLen) {
+          LOG.warn("Requested range [{}, {}) is beyond EOF for path {}",
+                  range.getOffset(), range.getLength(), file);
+          throw new EOFException("Requested range [" + range.getOffset() + ", "
+                  + range.getLength() + ") is beyond EOF for path " + file);
+        }
+      }
+    }
+
     @Override
     public void readVectored(List<? extends FileRange> ranges,
                              IntFunction<ByteBuffer> allocate) throws 
IOException {
+      long length = fs.getFileStatus(file).getLen();

Review Comment:
   1. use `getFileLength()` which does the on demand read
   2. move that code to use getFileStatus() as the way it does it today makes 
no sense. excessively inefficient.



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:
##########
@@ -371,12 +371,23 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes,
       IntBuffer sums = sumsBytes.asIntBuffer();
       sums.position(offset / FSInputChecker.CHECKSUM_SIZE);
       ByteBuffer current = data.duplicate();
-      int numChunks = data.remaining() / bytesPerSum;
+      int numFullChunks = data.remaining() / bytesPerSum;
+      boolean partialChunk = (data.remaining() % bytesPerSum != 0);
+      int totalChunks = numFullChunks;
+      if (partialChunk) {
+        totalChunks++;
+      }
       CRC32 crc = new CRC32();
       // check each chunk to ensure they match
-      for(int c = 0; c < numChunks; ++c) {
+      for(int c = 0; c < totalChunks; ++c) {
         // set the buffer position and the limit
-        current.limit((c + 1) * bytesPerSum);
+        if (c == numFullChunks) {
+          int lastIncompleteChunk = data.remaining() % bytesPerSum;

Review Comment:
   add comment to explain what is happening



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[GitHub] [hadoop] steveloughran commented on a diff in pull request #4862: HADOOP-18439. Fix VectoredIO for LocalFileSystem when checksum is enabled.

Reply via email to