[ 
https://issues.apache.org/jira/browse/KAFKA-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838672#comment-16838672
 ] 

ASF GitHub Bot commented on KAFKA-8351:
---------------------------------------

hachikuji commented on pull request #6722: KAFKA-8351; Cleaner should handle 
transactions spanning multiple segments
URL: https://github.com/apache/kafka/pull/6722
 
 
   When cleaning transactional data, the cleaner must track which 
transactions still have data associated with them so that it does not remove 
their markers. It already had logic for this, but the tracked state was not 
carried over when cleaning began on a new group of segments. As a result, the 
cleaner could incorrectly conclude that a transaction marker was no longer 
needed. The fix carries the transactional state between the groups of 
segments being cleaned.
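
A minimal sketch of the idea, for illustration only. It is not the actual cleaner code from the pull request: `TxnCleanerSketch`, `Entry`, `cleanGroups`, and the `carryState` flag are hypothetical names, and the log is reduced to data entries and markers tagged with a producer id. A marker is kept only if a retained data entry from the same producer has been seen since that producer's previous marker.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; not Kafka's real LogCleaner classes.
class TxnCleanerSketch {

    // A log entry is either a data record or a transaction marker.
    static final class Entry {
        final long producerId;
        final boolean isMarker;
        final boolean dataRetained; // only meaningful for data entries

        private Entry(long producerId, boolean isMarker, boolean dataRetained) {
            this.producerId = producerId;
            this.isMarker = isMarker;
            this.dataRetained = dataRetained;
        }

        static Entry data(long producerId, boolean retained) {
            return new Entry(producerId, false, retained);
        }

        static Entry marker(long producerId) {
            return new Entry(producerId, true, false);
        }
    }

    // Returns how many markers survive cleaning.
    // carryState = false models the bug: tracking restarts for every group.
    // carryState = true models the fix: tracking persists across groups.
    static int cleanGroups(List<List<Entry>> groups, boolean carryState) {
        Set<Long> txnsWithData = new HashSet<>();
        int markersKept = 0;
        for (List<Entry> group : groups) {
            if (!carryState) {
                txnsWithData.clear();
            }
            for (Entry e : group) {
                if (e.isMarker) {
                    // Keep the marker only if its transaction still has data.
                    if (txnsWithData.remove(e.producerId)) {
                        markersKept++;
                    }
                } else if (e.dataRetained) {
                    txnsWithData.add(e.producerId);
                }
            }
        }
        return markersKept;
    }

    public static void main(String[] args) {
        // A transaction whose data is in one group and whose marker
        // is the first entry of the next group.
        List<List<Entry>> groups = Arrays.asList(
            Arrays.asList(Entry.data(1L, true)),
            Arrays.asList(Entry.marker(1L)));
        System.out.println("markers kept without carry-over: " + cleanGroups(groups, false));
        System.out.println("markers kept with carry-over: " + cleanGroups(groups, true));
    }
}
```

In this model, dropping the state at a group boundary loses the marker for a transaction whose data sits in the previous group, which is the failure described above.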
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Log cleaner must handle transactions spanning multiple segments
> ---------------------------------------------------------------
>
>                 Key: KAFKA-8351
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8351
>             Project: Kafka
>          Issue Type: Bug
>          Components: log cleaner
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>
> When cleaning transactions, we have to do some bookkeeping to keep track of 
> which transactions still have data remaining. As long as there is still 
> data, we cannot remove the transaction marker. The problem is that we do this 
> tracking at the segment level and do not carry the ongoing transaction state 
> over between segments. So if the first entry in a segment is a marker, we 
> incorrectly clean it. In the worst case, data from a committed transaction 
> could become aborted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
