Re: [PR] HADOOP-19139.No GetPathStatus for opening AbfsInputStream [hadoop]

via GitHub Fri, 19 Jul 2024 01:57:53 -0700


saxenapranav commented on code in PR #6699:
URL: https://github.com/apache/hadoop/pull/6699#discussion_r1683883811



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##########
@@ -385,32 +434,74 @@ private int readLastBlock(final byte[] b, final int off, final int len)
     // data need to be copied to user buffer from index bCursor,
     // AbfsInutStream buffer is going to contain data from last block start. In
     // that case bCursor will be set to fCursor - lastBlockStart
-    long lastBlockStart = max(0, contentLength - footerReadSize);
+    if (!getFileStatusInformationPresent()) {
+      long lastBlockStart = max(0, (fCursor + len) - footerReadSize);
+      bCursor = (int) (fCursor - lastBlockStart);
+      return optimisedRead(b, off, len, lastBlockStart, min(fCursor + len, footerReadSize), true);
+    }
+    long lastBlockStart = max(0, getContentLength() - footerReadSize);
     bCursor = (int) (fCursor - lastBlockStart);
     // 0 if contentlength is < buffersize
-    long actualLenToRead = min(footerReadSize, contentLength);
-    return optimisedRead(b, off, len, lastBlockStart, actualLenToRead);
+    long actualLenToRead = min(footerReadSize, getContentLength());
+    return optimisedRead(b, off, len, lastBlockStart, actualLenToRead, false);
   }
 
   private int optimisedRead(final byte[] b, final int off, final int len,
-      final long readFrom, final long actualLen) throws IOException {
+      final long readFrom, final long actualLen,
+      final boolean isOptimizedReadWithoutContentLengthInformation) throws IOException {
     fCursor = readFrom;
     int totalBytesRead = 0;
     int lastBytesRead = 0;
     try {
       buffer = new byte[bufferSize];
+      boolean fileStatusInformationPresentBeforeRead = getFileStatusInformationPresent();
+      /*
+       * Content length would not be available for the first optimized read in case
+       * of lazy head optimization in inputStream. In such case, read of the first optimized read
+       * would be done without the contentLength constraint. Post first call, the contentLength
+       * would be present and should be used for further reads.
+       */
       for (int i = 0;
-           i < MAX_OPTIMIZED_READ_ATTEMPTS && fCursor < contentLength; i++) {
+           i < MAX_OPTIMIZED_READ_ATTEMPTS && (!getFileStatusInformationPresent()
+               || fCursor < getContentLength()); i++) {
         lastBytesRead = readInternal(fCursor, buffer, limit,
             (int) actualLen - limit, true);
         if (lastBytesRead > 0) {
           totalBytesRead += lastBytesRead;
           limit += lastBytesRead;
           fCursor += lastBytesRead;
           fCursorAfterLastRead = fCursor;
+
+          /*
+           * In non-lazily opened inputStream, the contentLength would be available before
+           * opening the inputStream. In such case, optimized read would always be done
+           * on the last part of the file.
+           *
+           * In lazily opened inputStream, the contentLength would not be available before
+           * opening the inputStream. In such case, contentLength conditioning would not be
+           * applied to execute optimizedRead. Hence, the optimized read may not be done on the
+           * last part of the file. If the optimized read is done on the non-last part of the
+           * file, inputStream should read only the amount of data requested by optimizedRead,
+           * as the buffer supplied would be only of the size of the data requested by optimizedRead.
+           */
+          boolean shouldBreak = !fileStatusInformationPresentBeforeRead
+              && totalBytesRead == (int) actualLen;
+          if (shouldBreak) {
+            break;
+          }
         }
       }
     } catch (IOException e) {
+      if (e instanceof FileNotFoundException) {

Review Comment:
   Good point. Taken.
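To make the window arithmetic in the readLastBlock/optimisedRead hunk easier to follow, here is a minimal, hypothetical Java model of the two paths: the eager case (content length known, window is the last footerReadSize bytes) and the lazy case (no content length, window ends at fCursor + len), plus the retry loop with the lazy-mode break. The class FooterReadWindow, the Window holder, simulatedRead (a stand-in for readInternal that may return short reads and rejects requests beyond the window), the value 2 for MAX_OPTIMIZED_READ_ATTEMPTS, and the assumption that the first response reveals the content length are all inventions of this sketch, not ABFS code.

```java
/**
 * Hypothetical model of the two optimized-read paths in the diff above.
 * Names mirror the diff; this class is an illustration, not ABFS code.
 */
public final class FooterReadWindow {

    // Assumption: attempt budget chosen for the sketch (the diff only references the constant).
    static final int MAX_OPTIMIZED_READ_ATTEMPTS = 2;

    /** Start offset of the optimized read, bytes requested, and bCursor. */
    static final class Window {
        final long readFrom;
        final long actualLen;
        final int bCursor;

        Window(long readFrom, long actualLen, int bCursor) {
            this.readFrom = readFrom;
            this.actualLen = actualLen;
            this.bCursor = bCursor;
        }
    }

    /** Eager open: content length known, so the window is the file's last footerReadSize bytes. */
    static Window withContentLength(long fCursor, long contentLength, int footerReadSize) {
        long lastBlockStart = Math.max(0, contentLength - footerReadSize);
        return new Window(lastBlockStart, Math.min(footerReadSize, contentLength),
                (int) (fCursor - lastBlockStart));
    }

    /** Lazy open: content length unknown, so the window ends at fCursor + len instead. */
    static Window withoutContentLength(long fCursor, int len, int footerReadSize) {
        long lastBlockStart = Math.max(0, (fCursor + len) - footerReadSize);
        return new Window(lastBlockStart, Math.min(fCursor + len, footerReadSize),
                (int) (fCursor - lastBlockStart));
    }

    /**
     * Stand-in for readInternal: returns at most maxChunk bytes (a short read),
     * -1 at end of file, and rejects a non-positive request so that the effect
     * of the lazy-mode break is observable.
     */
    static int simulatedRead(long pos, int want, long fileLength, int maxChunk) {
        if (want <= 0) {
            throw new IllegalStateException("read requested beyond the optimized-read window");
        }
        if (pos >= fileLength) {
            return -1;
        }
        return (int) Math.min(Math.min(want, maxChunk), fileLength - pos);
    }

    /** Loop shape from optimisedRead; fileLength plays the role of getContentLength(). */
    static int optimisedRead(Window w, boolean statusKnownBeforeRead,
                             long fileLength, int maxChunk) {
        long fCursor = w.readFrom;
        int limit = 0;
        int totalBytesRead = 0;
        boolean statusKnown = statusKnownBeforeRead;
        for (int i = 0; i < MAX_OPTIMIZED_READ_ATTEMPTS
                && (!statusKnown || fCursor < fileLength); i++) {
            int lastBytesRead = simulatedRead(fCursor, (int) w.actualLen - limit,
                    fileLength, maxChunk);
            statusKnown = true; // assumption: length is learned from the first response
            if (lastBytesRead > 0) {
                totalBytesRead += lastBytesRead;
                limit += lastBytesRead;
                fCursor += lastBytesRead;
                // Lazy open: the window may not reach EOF, so stop once it is full.
                if (!statusKnownBeforeRead && totalBytesRead == (int) w.actualLen) {
                    break;
                }
            }
        }
        return totalBytesRead;
    }

    public static void main(String[] args) {
        // Lazy open: 4 bytes at offset 20 of a 100-byte file, 16-byte footer size.
        Window lazy = withoutContentLength(20, 4, 16);
        System.out.println(lazy.readFrom + " " + lazy.actualLen + " " + lazy.bCursor); // 8 16 12
        System.out.println(optimisedRead(lazy, false, 100, 10)); // 16
    }
}
```

In the lazy example the window covers bytes 8 to 23 and bCursor is 12. If the server returns the whole window at once (maxChunk of 16 or more), then without the shouldBreak check the next iteration would see fCursor (24) below the now-known length of 100 and request zero more bytes, which the model rejects.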



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
