[ https://issues.apache.org/jira/browse/HADOOP-19139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840729#comment-17840729 ]
ASF GitHub Bot commented on HADOOP-19139:
-----------------------------------------

saxenapranav commented on code in PR #6699:
URL: https://github.com/apache/hadoop/pull/6699#discussion_r1579180763

##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##########
@@ -376,32 +439,48 @@ private int readLastBlock(final byte[] b, final int off, final int len)
     // data need to be copied to user buffer from index bCursor,
     // AbfsInutStream buffer is going to contain data from last block start. In
     // that case bCursor will be set to fCursor - lastBlockStart
-    long lastBlockStart = max(0, contentLength - footerReadSize);
+    if (!fileStatusInformationPresent.get()) {
+      long lastBlockStart = max(0, (fCursor + len) - footerReadSize);
+      bCursor = (int) (fCursor - lastBlockStart);
+      return optimisedRead(b, off, len, lastBlockStart, min(fCursor + len, footerReadSize), true);
+    }
+    long lastBlockStart = max(0, getContentLength() - footerReadSize);
     bCursor = (int) (fCursor - lastBlockStart); // 0 if contentlength is < buffersize
-    long actualLenToRead = min(footerReadSize, contentLength);
-    return optimisedRead(b, off, len, lastBlockStart, actualLenToRead);
+    long actualLenToRead = min(footerReadSize, getContentLength());
+    return optimisedRead(b, off, len, lastBlockStart, actualLenToRead, false);
   }
 
   private int optimisedRead(final byte[] b, final int off, final int len,
-      final long readFrom, final long actualLen) throws IOException {
+      final long readFrom, final long actualLen,
+      final boolean isReadWithoutContentLengthInformation) throws IOException {
     fCursor = readFrom;
     int totalBytesRead = 0;
     int lastBytesRead = 0;
     try {
       buffer = new byte[bufferSize];
+      boolean fileStatusInformationPresentBeforeRead = fileStatusInformationPresent.get();
       for (int i = 0;
-          i < MAX_OPTIMIZED_READ_ATTEMPTS && fCursor < contentLength; i++) {
+          i < MAX_OPTIMIZED_READ_ATTEMPTS && (!fileStatusInformationPresent.get()
+              || fCursor < getContentLength()); i++) {
         lastBytesRead = readInternal(fCursor, buffer, limit, (int) actualLen - limit, true);
         if (lastBytesRead > 0) {
           totalBytesRead += lastBytesRead;
+          boolean shouldBreak = !fileStatusInformationPresentBeforeRead
+              && totalBytesRead == (int) actualLen;
           limit += lastBytesRead;
           fCursor += lastBytesRead;
           fCursorAfterLastRead = fCursor;
+          if (shouldBreak) {
+            break;
+          }
         }
       }
     } catch (IOException e) {
+      if (isNonRetriableOptimizedReadException(e)) {
+        throw e;

Review Comment:
   adding:
   ```
   /*
    * FileNotFoundException in AbfsInputStream read can happen only in case of
    * lazy optimization enabled. In such case, the contentLength is not known
    * before opening the inputStream, and the first read can give a
    * FileNotFoundException, and if this exception is raised, it has to be
    * thrown back to the application and make a readOneBlock call.
    */
   ```

> [ABFS]: No GetPathStatus call for opening AbfsInputStream
> ---------------------------------------------------------
>
>                 Key: HADOOP-19139
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19139
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>            Reporter: Pranav Saxena
>            Assignee: Pranav Saxena
>            Priority: Major
>              Labels: pull-request-available
>
> Read API gives contentLen and etag of the path. This information would be
> used in future calls on that inputStream. Prior information of eTag is of not
> much importance.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
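
For readers following the readLastBlock hunk quoted in this message, here is a minimal standalone sketch of the cursor arithmetic on both branches of the diff. It only mirrors the expressions shown in the hunk; the class `FooterReadSketch`, the `Plan` holder, and both method names are hypothetical and are not part of AbfsInputStream, which updates its own fields instead of returning a value.

```java
// Hypothetical illustration of the readLastBlock arithmetic from the diff.
// Not the actual AbfsInputStream code: names and structure are assumptions.
final class FooterReadSketch {

    /** Values computed before the optimized read is issued. */
    static final class Plan {
        final long lastBlockStart;   // file offset where the footer read begins
        final int bCursor;           // index in the stream buffer to copy from
        final long actualLenToRead;  // number of bytes requested from the store

        Plan(long lastBlockStart, int bCursor, long actualLenToRead) {
            this.lastBlockStart = lastBlockStart;
            this.bCursor = bCursor;
            this.actualLenToRead = actualLenToRead;
        }
    }

    /** Branch taken when the content length is already known. */
    static Plan withKnownLength(long fCursor, long contentLength, int footerReadSize) {
        long lastBlockStart = Math.max(0, contentLength - footerReadSize);
        int bCursor = (int) (fCursor - lastBlockStart);
        long actualLenToRead = Math.min(footerReadSize, contentLength);
        return new Plan(lastBlockStart, bCursor, actualLenToRead);
    }

    /** Branch taken when no file status is available yet (lazy open). */
    static Plan withoutKnownLength(long fCursor, int len, int footerReadSize) {
        // The end of the requested range stands in for the unknown content length.
        long requestedEnd = fCursor + len;
        long lastBlockStart = Math.max(0, requestedEnd - footerReadSize);
        int bCursor = (int) (fCursor - lastBlockStart);
        long actualLenToRead = Math.min(requestedEnd, footerReadSize);
        return new Plan(lastBlockStart, bCursor, actualLenToRead);
    }

    public static void main(String[] args) {
        Plan known = withKnownLength(992L, 1000L, 512);
        Plan lazy = withoutKnownLength(992L, 8, 512);
        System.out.println(known.lastBlockStart + " " + known.bCursor + " " + known.actualLenToRead);
        System.out.println(lazy.lastBlockStart + " " + lazy.bCursor + " " + lazy.actualLenToRead);
    }
}
```

In this sketch the two branches agree whenever the caller's requested range ends exactly at the end of the file, which appears to be the situation the lazy branch assumes for a footer read.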