wj1918 opened a new pull request #8727:
URL: https://github.com/apache/kafka/pull/8727


   Bug fix only, [JIRA](https://issues.apache.org/jira/browse/KAFKA-8120)
   1. The outer loop issue, at original line 134 of FileStreamSourceTask.java:
        while (readerCopy.ready()) {
   
   Because the buffer is reused across multiple polls, this condition misses 
the case where the file has reached EOF but the buffer still holds unparsed data. 
   
   To fix it, change the while loop to an if statement and move the file-reading 
logic into extractLine:
           if (offset < buffer.length && reader.ready()) {
   
   Test case: FileStreamSourceTaskTest.testSmallFile
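
The stuck-buffer case described above can be reproduced in isolation. The following is a minimal sketch, not the code from this PR: the class `DrainBufferSketch`, its `batchSize` limit, and the helper `takeLine` are hypothetical, and it assumes that a poll returns early once a batch is full, which is one way unparsed data can be left in the buffer when the reader hits EOF. It also assumes the content fits in the fixed 1024-char buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class DrainBufferSketch {
    private final Reader reader;
    private final int batchSize;            // hypothetical per-poll line limit
    private final char[] buffer = new char[1024];
    private int offset = 0;                 // number of valid chars in buffer

    DrainBufferSketch(String content, int batchSize) {
        // An in-memory stream stands in for the file; ready() turns false at EOF.
        this.reader = new InputStreamReader(
            new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)),
            StandardCharsets.UTF_8);
        this.batchSize = batchSize;
    }

    /** Old shape: nothing is parsed unless the reader reports more input. */
    List<String> pollGatedOnReady() throws IOException {
        List<String> lines = new ArrayList<>();
        while (reader.ready()) {
            int nread = reader.read(buffer, offset, buffer.length - offset);
            if (nread > 0) offset += nread;
            String line;
            while ((line = takeLine()) != null) {
                lines.add(line);
                if (lines.size() >= batchSize) return lines; // leftovers stay buffered
            }
        }
        return lines;
    }

    /** Fixed shape: drain buffered data first, read only when a line is incomplete. */
    List<String> pollDrainFirst() throws IOException {
        List<String> lines = new ArrayList<>();
        while (lines.size() < batchSize) {
            String line = takeLine();
            if (line != null) {
                lines.add(line);
                continue;
            }
            if (offset < buffer.length && reader.ready()) {
                int nread = reader.read(buffer, offset, buffer.length - offset);
                if (nread > 0) {
                    offset += nread;
                    continue;
                }
            }
            break; // no complete line buffered and nothing more to read right now
        }
        return lines;
    }

    /** Removes and returns the first '\n'-terminated line in the buffer, or null. */
    private String takeLine() {
        for (int i = 0; i < offset; i++) {
            if (buffer[i] == '\n') {
                String line = new String(buffer, 0, i);
                System.arraycopy(buffer, i + 1, buffer, 0, offset - i - 1);
                offset -= i + 1;
                return line;
            }
        }
        return null;
    }
}
```

With the content `"a\nb\nc\n"` and a batch size of 2, the first poll returns `a` and `b` in both variants. On the second poll the ready()-gated variant returns nothing because the reader is at EOF, while the drain-first variant returns the buffered `c`.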
   
   2. The expanding buffer issue, at original line 140 of 
FileStreamSourceTask.java:
       if (offset == buffer.length) {
           char[] newbuf = new char[buffer.length * 2];
           System.arraycopy(buffer, 0, newbuf, 0, buffer.length);
           buffer = newbuf;
       }
   
   For a large file, this condition is always true even when the buffer does 
not need to grow, so every read doubles the buffer. This eventually causes a 
Java heap error, or a NegativeArraySizeException once buffer.length * 2 overflows. 
   
   To fix it, move the buffer-expansion logic to the end of extractLine:
      if (offset == buffer.length && buffer.length < maxBufferSize) {
          needExtendBuffer = true;
      }
   
   Test case: FileStreamSourceTaskTest.testLargeFile
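
The overflow and the capped growth can be illustrated with a small sketch. This is not the PR's implementation: `BufferGrowthSketch`, `nextCapacity`, and `growIfFull` are hypothetical names, the value of `maxBufferSize` is not given in this description and is passed in as a parameter here, and the buffer is assumed to start with a positive length.

```java
import java.util.Arrays;

final class BufferGrowthSketch {
    /**
     * Returns the next buffer capacity: double the current one, but never
     * above maxBufferSize. The doubling is done in long arithmetic so it
     * cannot wrap around to a negative int.
     */
    static int nextCapacity(int current, int maxBufferSize) {
        if (current >= maxBufferSize) {
            return current; // already at the cap, do not grow
        }
        long doubled = 2L * current;
        return (int) Math.min(doubled, (long) maxBufferSize);
    }

    /** Grows the buffer only when it is completely full and still below the cap. */
    static char[] growIfFull(char[] buffer, int offset, int maxBufferSize) {
        if (offset == buffer.length && buffer.length < maxBufferSize) {
            return Arrays.copyOf(buffer, nextCapacity(buffer.length, maxBufferSize));
        }
        return buffer;
    }
}
```

For comparison, the int expression `(1 << 30) * 2` evaluates to a negative number, which is what makes `new char[buffer.length * 2]` throw NegativeArraySizeException once the buffer has reached 2^30 chars.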
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

