hsnusonic commented on a change in pull request #996:
URL: https://github.com/apache/orc/pull/996#discussion_r784223798



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -180,12 +180,18 @@ public static long estimateRgEndOffset(boolean isCompressed,
                                          boolean isLast,
                                          long nextGroupOffset,
                                          long streamLength) {
+    if (isCompressed && bufferSize <= 0) {
+      throw new IllegalArgumentException("BufferSize must be > 0 but was " + bufferSize);
+    }
     // figure out the worst case last location
     // if adjacent groups have the same compressed block offset then stretch the slop
-    // by factor of 2 to safely accommodate the next compression block.
-    // One for the current compression block and another for the next compression block.
+    // by a factor to safely accommodate the next compression block.
+    // We need to calculate the maximum number of blocks by bufferSize accordingly.
+    final int stretchFactor = isCompressed
+        ? 2 + (MAX_VALUES_LENGTH * MAX_BIT_WIDTH / 8 - 1) / bufferSize

Review comment:
       Thanks for the suggestion! That's better.
   
   Yes, dividing by 8 converts bits to bytes. Let me use numbers to explain.
   ```
   2 + (MAX_VALUES_LENGTH * MAX_BIT_WIDTH / 8 - 1) / bufferSize
   = 1 + 1 + (512 * 64 / 8 - 1) / bufferSize
   = 1 + 1 + (4096 - 1) / bufferSize
   // The first 1 covers the bytes already in the uncompressed buffer, since
   // these bytes fit into 1 block.
   // The remaining part, 1 + (4096 - 1) / bufferSize, should be read together
   // as the round-up of 4096 / bufferSize.
   // if bufferSize = 4096 => stretchFactor = 1 + 1 + 0 
   // if bufferSize = 2048 => stretchFactor = 1 + 1 + 1
   // if bufferSize = 2047 => stretchFactor = 1 + 1 + 2
   ```
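To make the round-up reading easier to check, here is a minimal standalone sketch of the compressed case only. The class and method names are hypothetical (they are not in the PR), and the constant values 512 and 64 are taken from the worked numbers above. It compares the formula from the diff with an explicit "1 + ceil(4096 / bufferSize)" form.

```java
// Hypothetical sketch, not the code in RecordReaderUtils.
final class StretchFactorSketch {
  // Values assumed from the worked example: 512 * 64 / 8 = 4096 bytes.
  static final int MAX_VALUES_LENGTH = 512;
  static final int MAX_BIT_WIDTH = 64;
  static final int MAX_BYTES = MAX_VALUES_LENGTH * MAX_BIT_WIDTH / 8;

  // Formula as written in the diff, compressed case only.
  static int compressedStretchFactor(int bufferSize) {
    if (bufferSize <= 0) {
      throw new IllegalArgumentException(
          "BufferSize must be > 0 but was " + bufferSize);
    }
    return 2 + (MAX_BYTES - 1) / bufferSize;
  }

  // Same value written as 1 + ceil(MAX_BYTES / bufferSize):
  // one block for the bytes already in the uncompressed buffer, plus the
  // number of blocks needed to hold MAX_BYTES.
  static int roundUpForm(int bufferSize) {
    if (bufferSize <= 0) {
      throw new IllegalArgumentException(
          "BufferSize must be > 0 but was " + bufferSize);
    }
    int blocksForMaxBytes = (MAX_BYTES + bufferSize - 1) / bufferSize; // ceil
    return 1 + blocksForMaxBytes;
  }

  public static void main(String[] args) {
    int[] sizes = {4096, 2048, 2047};
    for (int size : sizes) {
      System.out.println("bufferSize=" + size
          + " formula=" + compressedStretchFactor(size)
          + " roundUp=" + roundUpForm(size));
    }
  }
}
```

For buffer sizes 4096, 2048 and 2047 both forms give 2, 3 and 4, matching the three cases listed above.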
   Please let me know if you have any suggestions to make the formula easier to 
read.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

