saxenapranav commented on code in PR #6699: URL: https://github.com/apache/hadoop/pull/6699#discussion_r1579170762
########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java: ########## @@ -376,32 +439,48 @@ private int readLastBlock(final byte[] b, final int off, final int len) // data need to be copied to user buffer from index bCursor, // AbfsInutStream buffer is going to contain data from last block start. In // that case bCursor will be set to fCursor - lastBlockStart - long lastBlockStart = max(0, contentLength - footerReadSize); + if (!fileStatusInformationPresent.get()) { + long lastBlockStart = max(0, (fCursor + len) - footerReadSize); + bCursor = (int) (fCursor - lastBlockStart); + return optimisedRead(b, off, len, lastBlockStart, min(fCursor + len, footerReadSize), true); + } + long lastBlockStart = max(0, getContentLength() - footerReadSize); bCursor = (int) (fCursor - lastBlockStart); // 0 if contentlength is < buffersize - long actualLenToRead = min(footerReadSize, contentLength); - return optimisedRead(b, off, len, lastBlockStart, actualLenToRead); + long actualLenToRead = min(footerReadSize, getContentLength()); + return optimisedRead(b, off, len, lastBlockStart, actualLenToRead, false); } private int optimisedRead(final byte[] b, final int off, final int len, - final long readFrom, final long actualLen) throws IOException { + final long readFrom, final long actualLen, + final boolean isReadWithoutContentLengthInformation) throws IOException { fCursor = readFrom; int totalBytesRead = 0; int lastBytesRead = 0; try { buffer = new byte[bufferSize]; + boolean fileStatusInformationPresentBeforeRead = fileStatusInformationPresent.get(); for (int i = 0; - i < MAX_OPTIMIZED_READ_ATTEMPTS && fCursor < contentLength; i++) { + i < MAX_OPTIMIZED_READ_ATTEMPTS && (!fileStatusInformationPresent.get() + || fCursor < getContentLength()); i++) { lastBytesRead = readInternal(fCursor, buffer, limit, (int) actualLen - limit, true); if (lastBytesRead > 0) { totalBytesRead += lastBytesRead; + boolean shouldBreak = !fileStatusInformationPresentBeforeRead + && totalBytesRead == (int) actualLen; limit += lastBytesRead; fCursor += lastBytesRead; fCursorAfterLastRead = fCursor; + if (shouldBreak) { + break; + } } } } catch (IOException e) { + if (isNonRetriableOptimizedReadException(e)) { + throw e; Review Comment: So, this is the case where there is lazy optimization , and inputStream is not aware of contentlength. Now, on the first read, it could go into an optimized block, and would try to read. In a non-lazy case, in trunk, if there is IOException raised in optimizeRead, it tries to read with ReadOneBlock. Now, in non-lazy case, it cannot be a case that an inputStream gets created for non-existing path. But, in a lazy case, inputStream can be created for non-existing path, and optimizeRead can be tried for it. Now, when the optimizeRead is failing with FileNotFound, the inputStream should fail and not try readOneBlock. I should add a comment for better code understanding in future. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org