[ https://issues.apache.org/jira/browse/HADOOP-18439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601479#comment-17601479 ]
ASF GitHub Bot commented on HADOOP-18439: ----------------------------------------- steveloughran commented on code in PR #4862: URL: https://github.com/apache/hadoop/pull/4862#discussion_r965174231 ########## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java: ########## @@ -371,12 +371,23 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes, IntBuffer sums = sumsBytes.asIntBuffer(); sums.position(offset / FSInputChecker.CHECKSUM_SIZE); ByteBuffer current = data.duplicate(); - int numChunks = data.remaining() / bytesPerSum; + int numFullChunks = data.remaining() / bytesPerSum; + boolean partialChunk = (data.remaining() % bytesPerSum != 0); Review Comment: wrap in () because % isn't used enough for the precedence to be immediately obvious ########## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java: ########## @@ -396,11 +407,33 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes, return data; } + /** + * Validates range parameters. + * In case of CheckSum FS, we already have calculated + * fileLength so failing fast here. + * @param ranges requested ranges. + * @param fileLen length of file. + * @throws EOFException end of file exception. + */ + private void validateRangeRequest(List<? extends FileRange> ranges, long fileLen) throws EOFException { + for (FileRange range : ranges) { + VectoredReadUtils.validateRangeRequest(range); + if (range.getOffset() + range.getLength() > fileLen) { + LOG.warn("Requested range [{}, {}) is beyond EOF for path {}", + range.getOffset(), range.getLength(), file); + throw new EOFException("Requested range [" + range.getOffset() + ", " + + range.getLength() + ") is beyond EOF for path " + file); + } + } + } + @Override public void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer> allocate) throws IOException { + long length = fs.getFileStatus(file).getLen(); Review Comment: 1. use `getFileLength()` which does the on demand read 2. move that code to use getFileStatus() as the way it does it today makes no sense. excessively inefficient. ########## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java: ########## @@ -371,12 +371,23 @@ static ByteBuffer checkBytes(ByteBuffer sumsBytes, IntBuffer sums = sumsBytes.asIntBuffer(); sums.position(offset / FSInputChecker.CHECKSUM_SIZE); ByteBuffer current = data.duplicate(); - int numChunks = data.remaining() / bytesPerSum; + int numFullChunks = data.remaining() / bytesPerSum; + boolean partialChunk = (data.remaining() % bytesPerSum != 0); + int totalChunks = numFullChunks; + if (partialChunk) { + totalChunks++; + } CRC32 crc = new CRC32(); // check each chunk to ensure they match - for(int c = 0; c < numChunks; ++c) { + for(int c = 0; c < totalChunks; ++c) { // set the buffer position and the limit - current.limit((c + 1) * bytesPerSum); + if (c == numFullChunks) { + int lastIncompleteChunk = data.remaining() % bytesPerSum; Review Comment: add comment to explain what is happening > Fix VectoredIO for LocalFileSystem when checksum is enabled. > ------------------------------------------------------------ > > Key: HADOOP-18439 > URL: https://issues.apache.org/jira/browse/HADOOP-18439 > Project: Hadoop Common > Issue Type: Sub-task > Components: common > Affects Versions: 3.3.9 > Reporter: Mukund Thakur > Assignee: Mukund Thakur > Priority: Major > Labels: pull-request-available > > While merging the ranges in CheckSumFs, they are rounded up based on the > value of checksum bytes size > which leads to some ranges crossing the EOF thus they need to be fixed else > it will cause EOFException during actual reads. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org