csringhofer commented on a change in pull request #1008:
URL: https://github.com/apache/orc/pull/1008#discussion_r782839420



##########
File path: c++/src/Compression.cc
##########
@@ -533,24 +545,37 @@ DIAGNOSTIC_PUSH
   }
 
   /** There are three possible scenarios when seeking a position:

Review comment:
       There are now four possible scenarios.

##########
File path: c++/src/Compression.cc
##########
@@ -533,24 +545,37 @@ DIAGNOSTIC_PUSH
   }
 
   /** There are three possible scenarios when seeking a position:
-   * 1. The seeked position is already read and decompressed into
-   *    the output stream.
-   * 2. It is already read from the input stream, but has not been
-   *    decompressed yet, ie. it's not in the output stream.
-   * 3. It is not read yet from the inputstream.
+   * 1. The chunk of the seeked position is already read and decompressed into the output
+   *    stream, ie. chunk header is read and chunk contents are in the output stream.
+   * 2. The chunk of the seeked position is partially read. This only happens for
+   *    uncompressed chunks. The chunk header is read but the seeked position hasn't been
+   *    read yet.
+   * 3. It is already read from the input stream, but has not been decompressed yet, ie.
+   *    it's not in the output stream.
+   * 4. It is not read yet from the input stream.
    */
   void DecompressionStream::seek(PositionProvider& position) {
     size_t seekedPosition = position.current();

Review comment:
       Not really related to this change, but I think it would be clearer to
       rename this variable, e.g. to startOfSeekedChunk, since it is compared
       against headerPosition.

##########
File path: c++/src/Compression.cc
##########
@@ -533,24 +545,37 @@ DIAGNOSTIC_PUSH
   }
 
   /** There are three possible scenarios when seeking a position:
-   * 1. The seeked position is already read and decompressed into
-   *    the output stream.
-   * 2. It is already read from the input stream, but has not been
-   *    decompressed yet, ie. it's not in the output stream.
-   * 3. It is not read yet from the inputstream.
+   * 1. The chunk of the seeked position is already read and decompressed into the output
+   *    stream, ie. chunk header is read and chunk contents are in the output stream.
+   * 2. The chunk of the seeked position is partially read. This only happens for
+   *    uncompressed chunks. The chunk header is read but the seeked position hasn't been
+   *    read yet.
+   * 3. It is already read from the input stream, but has not been decompressed yet, ie.
+   *    it's not in the output stream.
+   * 4. It is not read yet from the input stream.
    */
   void DecompressionStream::seek(PositionProvider& position) {
     size_t seekedPosition = position.current();
-    // Case 1: the seeked position is the one that is currently buffered and
-    // decompressed. Here we only need to set the output buffer's pointer to the
-    // seeked position. Note that after the headerPosition comes the 3 bytes of
-    // the header.
+    // Case 1&2: the seeked position is in the current chunk and it's buffered and
+    // decompressed. Note that after the headerPosition comes the 3 bytes of the header.
     if (headerPosition == seekedPosition
         && inputBufferStartPosition <= headerPosition + 3 && inputBufferStart) {
       position.next(); // Skip the input level position.
       size_t posInChunk = position.next(); // Chunk level position.
-      outputBufferLength = uncompressedBufferLength - posInChunk;
-      outputBuffer = outputBufferStart + posInChunk;
+      // Case 1: The position is in the decompressed buffer. Here we only need to
+      // set the output buffer's pointer to the seeked position.
+      if (uncompressedBufferLength >= posInChunk) {
+        outputBufferLength = uncompressedBufferLength - posInChunk;
+        outputBuffer = outputBufferStart + posInChunk;
+        return;
+      }
+      // Case 2: The position is outside the decompressed buffer. Skip bytes to seek.

Review comment:
       This is a bit confusing, since it can only happen for uncompressed
       chunks; the comment could say so explicitly.
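
One way to make the four scenarios easier to follow might be to name them explicitly. The following is only a rough sketch, not the code in this PR: `SeekCase`, `StreamState`, `classifySeek` and `inputBufferEndPosition` are hypothetical names introduced for illustration, and the test used to tell case 3 from case 4 is an assumption, since that part of `seek` is not shown in the hunk above. Only the case 1 and case 2 conditions mirror the diff.

```cpp
#include <cstddef>

// Hypothetical names for the four scenarios listed in the doc comment.
enum class SeekCase {
  InOutputBuffer,      // case 1: chunk is read and decompressed
  PartiallyReadChunk,  // case 2: header read, position not read yet
                       //         (uncompressed chunks only)
  InInputBuffer,       // case 3: read from input, not decompressed yet
  NotYetRead           // case 4: not read from the input stream yet
};

// Hypothetical snapshot of the fields the decision depends on.
struct StreamState {
  std::size_t headerPosition;            // input offset of current chunk header
  std::size_t inputBufferStartPosition;  // input offset of first buffered byte
  std::size_t inputBufferEndPosition;    // ASSUMPTION: one past last buffered byte
  bool hasInputBuffer;                   // stands in for `inputBufferStart != nullptr`
  std::size_t uncompressedBufferLength;  // bytes of the chunk available as output
};

// The chunk header is 3 bytes long, as noted in the diff.
constexpr std::size_t kHeaderSize = 3;

SeekCase classifySeek(const StreamState& s,
                      std::size_t startOfSeekedChunk,
                      std::size_t posInChunk) {
  // Same condition as the `if` in the diff: the seek targets the current chunk.
  const bool inCurrentChunk =
      s.hasInputBuffer &&
      s.headerPosition == startOfSeekedChunk &&
      s.inputBufferStartPosition <= s.headerPosition + kHeaderSize;
  if (inCurrentChunk) {
    // Cases 1 and 2 differ only in whether the position is already available.
    return posInChunk <= s.uncompressedBufferLength
               ? SeekCase::InOutputBuffer
               : SeekCase::PartiallyReadChunk;
  }
  // ASSUMPTION: case 3 means the chunk header lies inside the buffered input.
  const bool buffered =
      s.hasInputBuffer &&
      startOfSeekedChunk >= s.inputBufferStartPosition &&
      startOfSeekedChunk < s.inputBufferEndPosition;
  return buffered ? SeekCase::InInputBuffer : SeekCase::NotYetRead;
}
```

With something along these lines, the comment on case 2 could state the "uncompressed chunks only" restriction right next to the branch that handles it.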

##########
File path: c++/src/Compression.cc
##########
@@ -321,6 +321,17 @@ DIAGNOSTIC_PUSH
                          DECOMPRESS_ORIGINAL,
                          DECOMPRESS_EOF};
 
+  std::string decompressStateToString(DecompressState state) {

Review comment:
       I couldn't find where this function is used.



