sodonnel commented on code in PR #10415:
URL: https://github.com/apache/ozone/pull/10415#discussion_r3375076536


##########
hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/StreamBlockInputStream.java:
##########
@@ -208,12 +214,40 @@ public synchronized void seek(long pos) throws IOException {
     if (pos == position) {
       return;
     }
-    LOG.debug("{}: seek {} -> {}", this, position, pos);
-    closeStream();
+    LOG.debug("{}: seek {} -> {}", getName(streamingReader), position, pos);
+    readBuffer = reuseReadBuffer(readBuffer, pos);

Review Comment:
   Let's say we have 32MB pre-read in the queue and we are within the first 1MB. 
Then we seek to the 10MB offset. `reuseReadBuffer` will return null.
   
   Later, in `read()`, we call `dataAvailableToRead`, which triggers a call to 
`streamingReader.read()` if the buffer is null or has nothing remaining.
   
   ```
     private synchronized boolean dataAvailableToRead(int length, boolean preRead) throws IOException {
       if (position >= blockLength) {
         return false;
       }
       initialize();
   
       if (!hasRemaining(readBuffer)) {
         readBuffer = streamingReader.read(length, preRead);
       }
       Preconditions.assertTrue(hasRemaining(readBuffer));
       return true;
     }
   ```
   
   Inside `streamingReader.read`:
   
   ```
       private ReadBuffer read(int length, boolean preRead) throws IOException {
         checkError();
         if (future.isDone()) {
           return null; // Stream ended
         }
   
         readBlock(length, preRead);  // !! Reads more data before polling
   
         while (true) {
           final ReadBlockResponseProto proto = poll();
   ```
   
   This pre-reads more data, duplicating data already on the queue, and also 
pulls more data onto the queue than the pre-read limit allows.
   
   This is the sort of thing the boundary tests for 'reading within the 
pre-read doesn't issue new read calls' would catch.
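
To make the scenario concrete, here is a minimal toy model of the behaviour I would expect. It is only a sketch under assumptions: `PreReadQueueSketch`, `Chunk`, `seek`, `requestFrom` and `getReadCalls` are hypothetical names, not classes or methods from this PR, and the queue is modelled as a list of fixed-size chunks with no real I/O. A forward seek that lands inside the pre-read window drops the skipped chunks and reuses what is already queued; a new request is counted only when the target position is not covered by the queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a pre-read queue. Hypothetical; not the Ozone implementation. */
class PreReadQueueSketch {
  /** A queued chunk covering the byte range [offset, offset + length). */
  static final class Chunk {
    final long offset;
    final long length;

    Chunk(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }

    long end() {
      return offset + length;
    }
  }

  private final Deque<Chunk> queue = new ArrayDeque<>();
  private final long chunkSize;
  private final long preReadLimit;
  private int readCalls; // number of simulated requests sent to the server

  PreReadQueueSketch(long chunkSize, long preReadLimit) {
    this.chunkSize = chunkSize;
    this.preReadLimit = preReadLimit;
  }

  /** Simulates one request that refills the queue with preReadLimit bytes starting at 'from'. */
  private void requestFrom(long from) {
    readCalls++;
    queue.clear();
    for (long off = from; off < from + preReadLimit; off += chunkSize) {
      queue.addLast(new Chunk(off, chunkSize));
    }
  }

  /** Returns the chunk containing pos, issuing a request only if the queue does not cover pos. */
  Chunk seek(long pos) {
    // Drop queued chunks that end at or before pos; they were skipped by the seek.
    while (!queue.isEmpty() && queue.peekFirst().end() <= pos) {
      queue.pollFirst();
    }
    // Queue exhausted (seek past the window) or it starts after pos (backward seek).
    if (queue.isEmpty() || queue.peekFirst().offset > pos) {
      requestFrom(pos - pos % chunkSize);
    }
    return queue.peekFirst();
  }

  int getReadCalls() {
    return readCalls;
  }

  public static void main(String[] args) {
    final long mb = 1L << 20;
    PreReadQueueSketch s = new PreReadQueueSketch(mb, 32 * mb);
    s.seek(0);       // first access fills the 32MB window
    s.seek(10 * mb); // lands inside the window, so no new request is expected
    System.out.println("read calls: " + s.getReadCalls());
  }
}
```

A boundary test along these lines would assert that the request count is unchanged after a seek inside the window and increases only for a seek past the window or a backward seek. The toy model does not top the window back up after dropping chunks; that is left out to keep the sketch short.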



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]