hachikuji commented on pull request #9590: URL: https://github.com/apache/kafka/pull/9590#issuecomment-790152137
@jolshan In `Log.completeSwapOperations`, we have the following comment: ``` // We create swap files for two cases: // (1) Log cleaning where multiple segments are merged into one, and // (2) Log splitting where one segment is split into multiple. // // Both of these mean that the resultant swap segments be composed of the original set, i.e. the swap segment // must fall within the range of existing segment(s). If we cannot find such a segment, it means the deletion // of that segment was successful. In such an event, we should simply rename the .swap to .log without having to // do a replace with an existing segment. ``` This is no longer accurate following the change here. As I understand it, the swap recovery logic looks for each file with the `.swap` extension. It then tries to find all log segments which have been deleted by the compaction process by searching for any segments which have an end offset larger than the swap segment's base offset, but less than the offset of the first record in the swapped segment. This is handling the case when the broker crashes before it could delete the old segments. The recovery algorithm is described in the comment above `replaceSegments`. I believe the logic still works correctly if we change it so that segments that are smaller than the first swap segment get removed. This should work because the swap suffix is removed in descending order, which means that the smallest swap file is the last one to be updated. I think we will need some test cases to ensure that the various cases work correctly. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org