hachikuji commented on pull request #9590:
URL: https://github.com/apache/kafka/pull/9590#issuecomment-790152137


   @jolshan In `Log.completeSwapOperations`, we have the following comment:
   ```
         // We create swap files for two cases:
         // (1) Log cleaning where multiple segments are merged into one, and
         // (2) Log splitting where one segment is split into multiple.
         //
         // Both of these mean that the resultant swap segments be composed of 
the original set, i.e. the swap segment
         // must fall within the range of existing segment(s). If we cannot 
find such a segment, it means the deletion
         // of that segment was successful. In such an event, we should simply 
rename the .swap to .log without having to
         // do a replace with an existing segment.
   ```
   This is no longer accurate following the change here. As I understand it, 
the swap recovery logic looks for each file with the `.swap` extension. It then 
tries to find all log segments which have been deleted by the compaction 
process by searching for any segments which have an end offset larger than the 
swap segment's base offset, but less than the offset of the first record in the 
swapped segment. This is handling the case when the broker crashes before it 
could delete the old segments.
   
   The recovery algorithm is described in the comment above `replaceSegments`. 
I believe the logic still works correctly if we change it so that segments that 
are smaller than the first swap segment get removed. This should work because 
the swap suffix is removed in descending order, which means that the smallest 
swap file is the last one to be updated. I think we will need some test cases 
to ensure that the various cases work correctly. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to