n3nash commented on a change in pull request #2440:
URL: https://github.com/apache/hudi/pull/2440#discussion_r557120061



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
##########
@@ -274,19 +275,27 @@ private boolean isBlockCorrupt(int blocksize) throws IOException {
   }
 
   private long scanForNextAvailableBlockOffset() throws IOException {
+    // Make the buffer large enough to scan through the file as quickly as possible, especially if it is on S3/GCS.
+    // Using a smaller buffer incurs many more API calls, which drastically increases the storage cost
+    // and may also take days to finish scanning through large files.
+    byte[] dataBuf = new byte[1024 * 1024];

Review comment:
       Instead of allocating a large buffer here, can we wrap the input stream in a buffered stream in the constructor?
   
   ```
       if (fsDataInputStream.getWrappedStream() instanceof FSInputStream) {
         this.inputStream = new TimedFSDataInputStream(logFile.getPath(), new FSDataInputStream(
             new BufferedFSInputStream((FSInputStream) fsDataInputStream.getWrappedStream(), bufferSize)));
       } else if (fsDataInputStream.getWrappedStream() instanceof FSDataInputStream) {
         // Proposed new branch: <initialize buffered input stream>
       }
   ```
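
To illustrate why buffering at the stream level may help, here is a minimal sketch that uses only `java.io`, not the Hadoop or Hudi classes discussed above. `CountingInputStream`, `drainByteByByte`, and `underlyingReads` are hypothetical names introduced for this example, and the counted read calls only stand in for remote storage API calls. It is a sketch under those assumptions and not the implementation in this PR.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferedScanSketch {

  /** Hypothetical helper: counts read calls that reach the underlying stream. */
  static final class CountingInputStream extends FilterInputStream {
    long readCalls = 0;

    CountingInputStream(InputStream in) {
      super(in);
    }

    @Override
    public int read() throws IOException {
      readCalls++;
      return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      readCalls++;
      return super.read(b, off, len);
    }
  }

  /** Reads the stream one byte at a time, as a naive scanner might; returns bytes consumed. */
  static long drainByteByByte(InputStream in) throws IOException {
    long total = 0;
    while (in.read() != -1) {
      total++;
    }
    return total;
  }

  /**
   * Drains `size` zero bytes and returns how many read calls reached the source.
   * A bufferSize of 0 means "no buffering wrapper".
   */
  static long underlyingReads(int size, int bufferSize) {
    CountingInputStream source =
        new CountingInputStream(new ByteArrayInputStream(new byte[size]));
    InputStream in = bufferSize > 0 ? new BufferedInputStream(source, bufferSize) : source;
    try {
      drainByteByByte(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return source.readCalls;
  }

  public static void main(String[] args) {
    int size = 4 * 1024 * 1024;
    System.out.println("unbuffered reads: " + underlyingReads(size, 0));
    System.out.println("buffered reads:   " + underlyingReads(size, 1024 * 1024));
  }
}
```

With the wrapper in place, the scanning code can keep its simple read loop while the buffer size is chosen once in the constructor, which is the trade-off the suggestion above is pointing at.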




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

